# OpenPiton: An Open-Source Framework for EDA Tool Development

David Wentzlaff, Princeton University

FOSDEM 2020



http://openpiton.org https://github.com/PrincetonUniversity/openpiton

#### OpenPiton: The world's first open source, general purpose, multithreaded manycore processor

- Open source manycore
- Written in Verilog RTL
- Scales to ½ billion cores
- Configurable core, uncore
- Includes synthesis and back-end flow
- Simulate in VCS, ModelSim, NCSim, Verilator, Icarus, Riviera
- ASIC & FPGA verified
- ASIC power and energy fully characterized [HPCA 2018]
- Runs full stack multi-user Debian Linux


#### OpenPiton+Ariane: The first open-source, SMP Linux-booting RISC-V system scaling from one to many cores



- Collaboration with ETH Zurich
  - Integrate OpenPiton and Ariane RISC-V Core
  - Booted SMP Linux less than 6 months after starting RTL integration




[Figure: terminal screenshot listing the booted processors; each `processor` entry reports `rv64imac`, `sv39`, and `eth, ariane`.]

# Silicon Proven Designs

- Piton (25-core instance of OpenPiton)
  - 25 modified 64-bit OpenSPARC T1 cores
  - 3 P-Mesh NoCs
  - P-Mesh Directory-Based Cache System
  - Taped-out in IBM 32nm SOI
    - 6mm x 6mm; **460 Million Transistors**
    - Among the largest chips built in academia
  - Target: 1 GHz Clock @ 900 mV
  - Received silicon and runs full-stack Debian in lab

#### Ariane RISC-V (ETH-Z)

- Taped-out in GlobalFoundries 22nm FDX
- Poseidon:
  - Area: 0.23 mm<sup>2</sup>, 175 kGE
  - 0.2 to 1.7 GHz (0.5 V to 1.15 V)
- Kosmodrom:
  - RV64GCXsmallFloat, Transprecision / Vector FPU
  - Ariane HP
    - 8T library, 0.8V, 1.3 GHz
    - 55 mW @ 1 GHz
  - Ariane LP
    - 7.5T ULP library, 0.5V, 250 MHz
    - 5 mW @ 200 MHz





Ariane Die Photos from ETH-Z

# FPGA Prototyping Platforms

Available:

- Digilent Genesys2
  - \$999 (\$600 academic)
  - 1-2 cores at 66MHz
- Xilinx VC707
  - \$3500
  - 1-4 cores at 60MHz
- Digilent Nexys Video
  - \$500 (\$250 academic)
  - 1 core at 30MHz
- BittWare XUPP3R
  - \$7000-8000
  - >100MHz (12 cores)
- Amazon AWS F1
  - ~\$1.60/hr
  - Rent by the hour
  - 12 cores





#### **OpenPiton System Overview**




#### **OpenPiton+Ariane System Overview**



### Current Status of Free and Open Source Chip EDA tools and Benchmarks

- Growing use of Free & Open Source EDA tools
  - Mainly used for verification of designs and with FPGAs
- Free & Open Source EDA tools have relied on industrial hardware design releases for verification
- Dependence on industry incurs limitations
  - Designs and scale of designs are outdated by the time they are released
    - LEON3 (2004) is still one of the largest designs
  - Lower-level information, such as that needed by placement and routing tools, is often obfuscated (ISPD, ICCAD benchmarks)

#### **OpenPiton Contains Design Variety**

- Big Cores
- Small Cores
- Caches
- Interconnect
- GPGPU (MIAOW)
- I/O
- Adding in additional accelerators



# **Configurability Options**

| Component             | Options (OpenSPARC T1 / common)               | Options (Ariane, where different)  |
|-----------------------|-----------------------------------------------|------------------------------------|
| Cores (per chip)      | Up to 65,536                                  |                                    |
| Cores (per system)    | Up to 500 million                             |                                    |
| Core Type             | OpenSPARC T1                                  | Ariane 64-bit RISC-V               |
| Threads per Core      | 1/2/4                                         | 1                                  |
| Floating-Point Unit   | FP64, FP32                                    | FP64, FP32, FP16, FP8, BFLOAT16    |
| TLBs                  | 8/16/32/64 entries                            | Number of entries (16 entries)     |
| L1 I-Cache            | Number of Sets, Ways (16kB, 4-way)            |                                    |
| L1 D-Cache            | Number of Sets, Ways (8kB, 4-way)             |                                    |
| L1.5 Cache            | Number of Sets, Ways (8kB, 4-way)             |                                    |
| L2 Cache              | Number of Sets, Ways (64kB, 4-way)            |                                    |
| Intra-chip Topologies | 2D Mesh, Crossbar                             |                                    |
| Inter-chip Topologies | 2D Mesh, 3D Mesh, Crossbar, Butterfly Network |                                    |
| Bootloading           | SD/SDHC Card, UART, RISC-V JTAG Debug         |                                    |
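
As a rough illustration of how the limits in the table above constrain a configuration, here is a minimal Python sketch. The `ChipConfig` class, its field names, and the `problems` method are hypothetical and do not correspond to OpenPiton's actual build parameters; only the allowed values are taken from the table.

```python
from dataclasses import dataclass
from typing import List

# Allowed values transcribed from the table above. All names below are
# hypothetical; they are NOT OpenPiton build parameters.
MAX_CORES_PER_CHIP = 65_536
THREADS_PER_CORE = {"OpenSPARC T1": {1, 2, 4}, "Ariane": {1}}
T1_TLB_ENTRIES = {8, 16, 32, 64}
INTRA_CHIP_TOPOLOGIES = {"2D Mesh", "Crossbar"}


@dataclass(frozen=True)
class ChipConfig:
    core_type: str
    cores: int
    threads_per_core: int
    topology: str
    tlb_entries: int = 16

    def problems(self) -> List[str]:
        """Return human-readable violations of the table's limits."""
        found = []
        if self.core_type not in THREADS_PER_CORE:
            found.append(f"unknown core type {self.core_type!r}")
        elif self.threads_per_core not in THREADS_PER_CORE[self.core_type]:
            found.append(
                f"{self.core_type} does not offer "
                f"{self.threads_per_core} threads per core"
            )
        if not 1 <= self.cores <= MAX_CORES_PER_CHIP:
            found.append(f"cores per chip must be 1..{MAX_CORES_PER_CHIP}")
        if self.topology not in INTRA_CHIP_TOPOLOGIES:
            found.append(f"unknown intra-chip topology {self.topology!r}")
        # The table lists fixed TLB sizes only for the OpenSPARC T1 core.
        if self.core_type == "OpenSPARC T1" and self.tlb_entries not in T1_TLB_ENTRIES:
            found.append("T1 TLBs come in 8/16/32/64 entries")
        return found


example = ChipConfig("OpenSPARC T1", cores=25, threads_per_core=2, topology="2D Mesh")
print(example.problems() or "configuration is within the table's limits")
```

An empty list means the configuration stays inside the documented ranges; anything else names the row of the table that was violated.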

# Scale of Designs

| Component          | Configurability Options |
|--------------------|-------------------------|
| Cores (per chip)   | Up to 65,536            |
| Cores (per system) | Up to 500 million       |
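
The two core-count limits can be related with a little arithmetic. The sketch below assumes a system of 8,192 chips and a square mesh per chip; neither figure is stated in these slides, so treat both as assumptions made for illustration.

```python
import math

CORES_PER_CHIP = 65_536   # from the table above (2**16)
CHIPS_PER_SYSTEM = 8_192  # ASSUMPTION: not stated in these slides (2**13)


def system_cores(cores_per_chip: int = CORES_PER_CHIP,
                 chips: int = CHIPS_PER_SYSTEM) -> int:
    """Total cores when every chip is fully populated."""
    return cores_per_chip * chips


# ASSUMPTION: one chip's cores arranged as a square 2D mesh.
mesh_side = math.isqrt(CORES_PER_CHIP)
total = system_cores()
print(f"{mesh_side} x {mesh_side} tiles per chip, {total:,} cores per system")
# prints: 256 x 256 tiles per chip, 536,870,912 cores per system
```

Under these assumptions the product is 2^29, about 537 million, which is consistent with the "½ billion" and "up to 500 million" figures quoted in the slides.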

#### **OpenPiton: Not Just Verilog**

- Verification testbenches
- 8000+ test cases
- Power and **Thermal** Analysis
- PCB design







https://parallel.princeton.edu/openpiton/piton\_power\_char.html

https://parallel.princeton.edu/openpiton/piton\_pcb.html

[Figure: plot against Time (s), with axis ticks from 500 to 3000.]

### **Piton Test Setup**



[Figure: test setup, labeled **Piton** + Heat Sink, Power Supply, and **Chipset FPGA** (Kintex 7).]

[McKeown et al, HotChips 2016] [McKeown et al, IEEE MICRO 2017] [McKeown et al, HPCA 2018]

#### **Piton Characterization**



Sending data 8 hops on the NoC costs the energy equivalent of one ALU op.

This debunks the conventional wisdom that NoCs dominate the energy of manycores.
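
To give the 8-hop figure some scale, the sketch below counts hops in a 2D mesh. It assumes the 25 tiles are laid out as a 5x5 grid, numbered row by row, and that a message takes a minimal (Manhattan) route; neither assumption is stated in these slides, and `mesh_hops` is a hypothetical helper.

```python
def mesh_hops(src: int, dst: int, width: int) -> int:
    """Router-to-router hops between two tiles of a row-major 2D mesh."""
    src_x, src_y = src % width, src // width
    dst_x, dst_y = dst % width, dst // width
    # A minimal route crosses |dx| links horizontally and |dy| vertically.
    return abs(src_x - dst_x) + abs(src_y - dst_y)


WIDTH = 5  # ASSUMPTION: 25 tiles arranged as a 5x5 grid
TILES = WIDTH * WIDTH

corner_to_corner = mesh_hops(0, TILES - 1, WIDTH)
longest = max(mesh_hops(a, b, WIDTH) for a in range(TILES) for b in range(TILES))
print(corner_to_corner, longest)  # prints: 8 8
```

Under these assumptions, 8 hops is the longest minimal route on the chip, from one corner tile to the opposite corner.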

[Figure: characterization chart; legible category labels include Mem. and Control.]

#### **Open Source Design and Data**



# OpenPiton and Free & Open Source EDA tools

- OpenPiton framework utilizes multiple open source tools
  - Icarus Verilog
  - Verilator
  - Yosys
  - FuseSoC
  - SV2V
- We not only use these tools but also contribute to them
  - Bug reports and fixes
  - Requests for new features
  - Contributing new features
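
For readers unfamiliar with these tools, the sketch below builds one typical command line each for Icarus Verilog, Verilator, and Yosys. It is not taken from OpenPiton's own build scripts: the design name `counter`, the file `counter.v`, and the helper `tool_commands` are hypothetical, and the commands are only printed, not run.

```python
import shlex
from typing import Dict, List


def tool_commands(top: str, sources: List[str]) -> Dict[str, List[str]]:
    """Build (without running) one command line per tool for a small design."""
    files = " ".join(sources)  # note: paths containing spaces are not handled
    return {
        # Compile for simulation with Icarus Verilog.
        "icarus": ["iverilog", "-g2012", "-s", top, "-o", f"{top}.vvp", *sources],
        # Lint only with Verilator; no simulation model is generated.
        "verilator": ["verilator", "--lint-only", "-Wall", "--top-module", top, *sources],
        # Generic synthesis with Yosys, followed by a cell-count report.
        "yosys": ["yosys", "-p", f"read_verilog {files}; synth -top {top}; stat"],
    }


commands = tool_commands("counter", ["counter.v"])  # hypothetical design
for name, argv in commands.items():
    print(f"{name}: {shlex.join(argv)}")
```

Each list can be passed to `subprocess.run(argv, check=True)` once the corresponding tool is installed.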



#### Need for a Free & Open Source EDA Flow

- EDA tools are essential components, but a flow that connects them is equally important
- A chip builder's view of the EDA flow differs from an EDA tool developer's
  - View across tools
  - Interaction between tools
  - Hierarchical synthesis
  - Two passes
  - Good support for Gate-level verification
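
One reading of the "hierarchical synthesis" item is that lower-level blocks are synthesized before the blocks that instantiate them. The sketch below derives such a bottom-up order from a module hierarchy with Python's standard `graphlib`; the module names are hypothetical and only loosely modeled on a tiled manycore.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Hypothetical hierarchy: each module maps to the modules it instantiates.
HIERARCHY: Dict[str, Set[str]] = {
    "chip": {"tile", "chip_bridge"},
    "tile": {"core", "l15_cache", "l2_cache", "noc_router"},
}


def synthesis_order(hierarchy: Dict[str, Set[str]]) -> List[str]:
    """Return modules so that every child precedes its parent."""
    # TopologicalSorter treats the mapped values as predecessors, so the
    # instantiated modules are emitted before the module that uses them.
    return list(TopologicalSorter(hierarchy).static_order())


order = synthesis_order(HIERARCHY)
print(" -> ".join(order))
```

The top-level module always comes last. The relative order of sibling leaf blocks is not significant, since they do not depend on one another.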



#### What is missing for a full Free & Open Source Flow

- The UCSD-led OpenROAD project provides the majority of chip implementation tools
- UW has packaged a flow using OpenROAD with the Free45 library

#### **Missing Components**

- Free/Open Source DRC tool
- Strengthening Yosys
- Better SystemVerilog support
- Free/Open Source Parasitic Extraction tool
- Power grid and signal integrity issues

https://github.com/The-OpenROAD-Project/OpenROAD-flow https://github.com/bsg-idea/uw\_openroad\_free45

#### Building Open Source Chips with Open Source EDA tools

- Princeton, with UW, is planning to tape out a **billion**-transistor chip
  - GlobalFoundries 14/12nm
  - Use Free & Open Source CAD tools from OpenROAD and the DARPA IDEA project

[Figure: board diagram, labeled FMC, BaseJump PCB, and two **TurboXAUI SerDes** blocks.]

### DECADES: an Open Source Heterogeneous Platform

- Software Defined Hardware (SDH)
  - Design runtime-reconfigurable hardware to accelerate data-intensive software applications
    - Machine learning and data science
    - Graph analytics and sparse linear algebra
- DECADES: heterogeneous tile-based chip
  - Combination of core, accelerator, and intelligent storage tiles
  - Princeton/Columbia collaboration led by PIs Margaret Martonosi, David Wentzlaff, Luca Carloni
- Our tools and hardware are **open-source**!
  - <u>https://decades.cs.princeton.edu/</u>



# **OpenPiton Community**

- Welcome community contributions
- Homepage <u>http://openpiton.org</u>
- Google Group <u>https://groups.google.com/group/openpiton</u>
- Direct email openpiton@princeton.edu
- GitHub <u>https://github.com/PrincetonUniversity/openpiton</u>





#### DOWNLOAD, BUILD, TAPE-OUT!

Explore and build on top of a robust research many-core processor.

# Team and Acknowledgements



#### Princeton Parallel Team

Special thanks to Georgios Tziantzioulis for help with slides

- Ariane Team as part of the PULP platform led by Prof. Luca Benini
- Prof. Michael B. Taylor and Prof. Richard Shi Groups at UW
- OpenROAD team led by Prof. Andrew Kahng

### Funding/Support





- Digilent
- Xilinx
- Swiss National Science Foundation
- Horizon 2020, European Union funding for Research & Innovation
- Air Force Office of Scientific Research

This material is based on research sponsored by the NSF under Grants No. CNS-1823222, CCF-1217553, CCF-1453112, CCF-1823032, and CCF-1438980, AFOSR under Grant No. FA9550-14-1-0148, Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA) under agreements No. FA8650-18-2-7846, FA8650-18-2-7852, and FA8650-18-2-7862 and DARPA under Grants No. N66001-14-1-4040 and HR0011-13-2-0005. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA), the NSF, AFOSR, or the U.S. Government.

#### OpenPiton: An Open-Source Framework for EDA Tool Development

David Wentzlaff (wentzlaf@princeton.edu), Princeton University

http://openpiton.org

https://github.com/PrincetonUniversity/openpiton

