# **OPEN-SOURCE CPU: RISC-V CFU AND ZEPHYR**

**Mohammed Billoo MAB Labs Embedded Solutions FOSDEM 25** 

Creative Commons 💿 🛈





Mohammed Billoo mab@mab-labs.com







LinkedIn: /mab-embedded



X @mabembedded




# THE SPEAKER



- Embedded Software Consultant (NYC, USA)
- Design Work
  - Medical Devices
  - Scientific Instruments
  - LIDAR
  - Custom ASICs
- Experience/Expertise
  - Zephyr RTOS
  - Embedded Linux
  - GUI-based applications





# BIOS FOOD NEWSLETTER Training/Workshops








www.mab-labs.com

# AGENDA

- Background
- Under the Hood
- The Plan
- How It Went (not good)
- Next Steps (need debugging help)







https://www.youtube.com/watch?v=syA3xxKfB4s&t=28s




### BACKGROUND



### FOSDEM24

- Used neorv32 project to implement RISC-V ISA in an FPGA
  - Has support for built-in bootloader
- Used vendor tools to build and flash **FPGA**
- Loaded "Hello World" Zephyr application on to FPGA.
  - Zephyr has support for "neorv32" board
  - Uploaded application using bootloader



# BACKGROUND

- After some finagling, got output!
- See previous presentation for details
- Next step was to investigate the CFU
- Roadblock
  - Design with CFU enabled didn't fit in the FPGA





CMD:> h

Available CMDs:

- h: Help
- r: Restart
- u: Upload
- s: Store to flash
- l: Load from flash
- x: Boot from flash (XIP)
- e: Execute

CMD:> e

Booting from 0x00000000...

\*\*\* Booting Zephyr OS build 6f56a6a91e2c \*\*\*

Hello World! neorv32

# MOTIVATION

- **C**ustom **F**unction **U**nit?
- Allows us to offload functionality to hardware (FPGA)
- Targets operations that are inefficient in software, in terms of:
  - Performance
  - Latency
  - **Energy Consumption**
  - **Program Memory**






# MOTIVATION

- Crypto
- Communications
- Arithmetic
- Image Processing
- **Requires CPU dependency**





# DETAILS

- Leverages the "custom-0" and "custom-1" RISC-V opcodes
- CFU is implemented using the "Zxcfu" extension of the RISC-V ISA
  - Specific to neorv32
- Sample implementation of XTEA in RTL
  - Simple block cipher
  - Embedded systems

| inst[6:5] \ inst[4:2] | 000    | 001      | 010      | 011      | 100    | 101      | 110            | 111 (> 32b) |
|-----------------------|--------|----------|----------|----------|--------|----------|----------------|-------------|
| 00                    | LOAD   | LOAD-FP  | custom-0 | MISC-MEM | OP-IMM | AUIPC    | OP-IMM-32      | 48b         |
| 01                    | STORE  | STORE-FP | custom-1 | AMO      | OP     | LUI      | OP-32          | 64b         |
| 10                    | MADD   | MSUB     | NMSUB    | NMADD    | OP-FP  | reserved | custom-2/rv128 | 48b         |
| 11                    | BRANCH | JALR     | reserved | JAL      | SYSTEM | reserved | custom-3/rv128 | ≥ 80b       |

Table 19.1: RISC-V base opcode map, inst[1:0]=11
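Reading the map: a base opcode is seven bits, made of inst[6:5] (the row), inst[4:2] (the column), and inst[1:0]=11. The sketch below composes the custom-0 and custom-1 opcodes from their table positions. The helper names (`base_opcode`, `opcode_custom0`, `opcode_custom1`) are hypothetical and are not part of neorv32 or Zephyr.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build a 7-bit RISC-V base opcode from its position
 * in the opcode map. row = inst[6:5], col = inst[4:2]; inst[1:0] is fixed
 * to 0b11 for 32-bit instructions. */
uint8_t base_opcode(unsigned row, unsigned col)
{
    assert(row < 4 && col < 8);
    return (uint8_t)((row << 5) | (col << 2) | 0x3u);
}

/* custom-0 sits at row 00, column 010 of the map. */
uint8_t opcode_custom0(void) { return base_opcode(0, 2); }

/* custom-1 sits at row 01, column 010 of the map. */
uint8_t opcode_custom1(void) { return base_opcode(1, 2); }
```

`opcode_custom0()` evaluates to 0x0B, which is the `0b0001011` constant that appears in the neorv32 intrinsic shown later in the deck.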

# **UNDER THE HOOD**

### rtl/core/neorv32\_cpu\_cp\_cfu.vhd

`-- instruction identifiers (funct3 bit-field) --`

| Constant        | Type                            | Value   |
|-----------------|---------------------------------|---------|
| `xtea_enc_v0_c` | `std_ulogic_vector(2 downto 0)` | `"000"` |
| `xtea_enc_v1_c` | `std_ulogic_vector(2 downto 0)` | `"001"` |
| `xtea_dec_v0_c` | `std_ulogic_vector(2 downto 0)` | `"010"` |
| `xtea_dec_v1_c` | `std_ulogic_vector(2 downto 0)` | `"011"` |
| `xtea_init_c`   | `std_ulogic_vector(2 downto 0)` | `"100"` |

### Use "intrinsics" to "call" functions in CFU

### sw/example/demo\_cfu/main.c

| Macro                                 | Expansion                                        |
|---------------------------------------|--------------------------------------------------|
| `#define xtea_hw_init(sum)`           | `neorv32_cfu_r3_instr(0b0000000, 0b100, sum, 0)` |
| `#define xtea_hw_enc_v0_step(v0, v1)` | `neorv32_cfu_r3_instr(0b0000000, 0b000, v0, v1)` |
| `#define xtea_hw_enc_v1_step(v0, v1)` | `neorv32_cfu_r3_instr(0b0000000, 0b001, v0, v1)` |
| `#define xtea_hw_dec_v0_step(v0, v1)` | `neorv32_cfu_r3_instr(0b0000000, 0b010, v0, v1)` |
| `#define xtea_hw_dec_v1_step(v0, v1)` | `neorv32_cfu_r3_instr(0b0000000, 0b011, v0, v1)` |
| `#define xtea_hw_illegal_inst()`      | `neorv32_cfu_r3_instr(0b0000000, 0b111, 0, 0)`   |
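To make the funct3 dispatch concrete, here is a host-side software model of a CFU that selects an XTEA step by funct3. It is a sketch for illustration only, not the neorv32 RTL: `cfu_model_t`, `cfu_model_exec`, and the two wrapper functions are hypothetical names, keeping the key inside the model struct is an assumption, and the point at which the running sum advances is an assumption chosen so that the model follows the textbook XTEA round.

```c
#include <assert.h>
#include <stdint.h>

#define XTEA_DELTA 0x9E3779B9u

/* funct3 selectors, mirroring the VHDL constants on the slide. */
enum { XTEA_ENC_V0 = 0, XTEA_ENC_V1 = 1, XTEA_DEC_V0 = 2, XTEA_DEC_V1 = 3, XTEA_INIT = 4 };

/* Hypothetical model state: four key words and the running XTEA sum. */
typedef struct {
    uint32_t key[4];
    uint32_t sum;
} cfu_model_t;

static uint32_t mix(uint32_t v) { return ((v << 4) ^ (v >> 5)) + v; }

/* Execute one "custom instruction": rs1 carries v0, rs2 carries v1. */
uint32_t cfu_model_exec(cfu_model_t *cfu, unsigned funct3, uint32_t rs1, uint32_t rs2)
{
    uint32_t r;
    switch (funct3) {
    case XTEA_INIT:   /* load the running sum from rs1 */
        cfu->sum = rs1;
        return 0;
    case XTEA_ENC_V0: /* new v0, then advance the sum (assumed ordering) */
        r = rs1 + (mix(rs2) ^ (cfu->sum + cfu->key[cfu->sum & 3]));
        cfu->sum += XTEA_DELTA;
        return r;
    case XTEA_ENC_V1: /* new v1, using the already advanced sum */
        return rs2 + (mix(rs1) ^ (cfu->sum + cfu->key[(cfu->sum >> 11) & 3]));
    case XTEA_DEC_V1: /* previous v1, then rewind the sum (assumed ordering) */
        r = rs2 - (mix(rs1) ^ (cfu->sum + cfu->key[(cfu->sum >> 11) & 3]));
        cfu->sum -= XTEA_DELTA;
        return r;
    case XTEA_DEC_V0: /* previous v0, using the rewound sum */
        return rs1 - (mix(rs2) ^ (cfu->sum + cfu->key[cfu->sum & 3]));
    default:
        assert(!"illegal funct3");
        return 0;
    }
}

/* Encrypt one 64-bit block in place, two "instructions" per round. */
void cfu_model_encrypt(cfu_model_t *cfu, unsigned rounds, uint32_t v[2])
{
    cfu_model_exec(cfu, XTEA_INIT, 0, 0);
    for (unsigned i = 0; i < rounds; i++) {
        v[0] = cfu_model_exec(cfu, XTEA_ENC_V0, v[0], v[1]);
        v[1] = cfu_model_exec(cfu, XTEA_ENC_V1, v[0], v[1]);
    }
}

/* Decrypt one 64-bit block in place; the sum starts at delta * rounds. */
void cfu_model_decrypt(cfu_model_t *cfu, unsigned rounds, uint32_t v[2])
{
    cfu_model_exec(cfu, XTEA_INIT, XTEA_DELTA * rounds, 0);
    for (unsigned i = 0; i < rounds; i++) {
        v[1] = cfu_model_exec(cfu, XTEA_DEC_V1, v[0], v[1]);
        v[0] = cfu_model_exec(cfu, XTEA_DEC_V0, v[0], v[1]);
    }
}
```

On the real target, each `cfu_model_exec` call would correspond to one of the `xtea_hw_*` intrinsics, with the switch implemented in hardware.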

# **UNDER THE HOOD**

| Macro                                 | Expansion                                        |
|---------------------------------------|--------------------------------------------------|
| `#define xtea_hw_init(sum)`           | `neorv32_cfu_r3_instr(0b0000000, 0b100, sum, 0)` |
| `#define xtea_hw_enc_v0_step(v0, v1)` | `neorv32_cfu_r3_instr(0b0000000, 0b000, v0, v1)` |
| `#define xtea_hw_enc_v1_step(v0, v1)` | `neorv32_cfu_r3_instr(0b0000000, 0b001, v0, v1)` |
| `#define xtea_hw_dec_v0_step(v0, v1)` | `neorv32_cfu_r3_instr(0b0000000, 0b010, v0, v1)` |
| `#define xtea_hw_dec_v1_step(v0, v1)` | `neorv32_cfu_r3_instr(0b0000000, 0b011, v0, v1)` |
| `#define xtea_hw_illegal_inst()`      | `neorv32_cfu_r3_instr(0b0000000, 0b111, 0, 0)`   |

### sw/lib/include/neorv32\_cpu\_cfu.h

`#define neorv32_cfu_r3_instr(funct7, funct3, rs1, rs2) CUSTOM_INSTR_R3_TYPE(funct7, rs2, rs1, funct3, 0b0001011)`

### **UNDER THE HOOD**

### sw/lib/include/neorv32\_cpu\_cfu.h

`#define neorv32_cfu_r3_instr(funct7, funct3, rs1, rs2) CUSTOM_INSTR_R3_TYPE(funct7, rs2, rs1, funct3, 0b0001011)`

### sw/lib/include/neorv32\_intrinsics.h

```
#define CUSTOM_INSTR_R3_TYPE(funct7, rs2, rs1, funct3, opcode) \
({                                                             \
    uint32_t __return;                                         \
    asm volatile (                                             \
      ".word (                                                 \
        (((" #funct7 ") & 0x7f) << 25) |                       \
        ((( reg_%2 ) & 0x1f) << 20) |                          \
        ((( reg_%1 ) & 0x1f) << 15) |                          \
        (((" #funct3 ") & 0x07) << 12) |                       \
        ((( reg_%0 ) & 0x1f) <<  7) |                          \
        (((" #opcode ") & 0x7f) <<  0)                         \
      );"                                                      \
      : [rd] "=r" (__return)                                   \
      : "r" (rs1),                                             \
        "r" (rs2)                                              \
    );                                                         \
    __return;                                                  \
})
```

### Assembler calls!
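The `.word` expression packs the fields of a standard R-type instruction. The same packing can be checked on a host machine with a small helper. This is a sketch: `encode_r_type` is a hypothetical name, and the register numbers in the usage note are an assumption, since the compiler chooses the actual registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side helper: pack R-type instruction fields using the
 * same shifts as the .word expression in CUSTOM_INSTR_R3_TYPE. */
uint32_t encode_r_type(uint32_t funct7, uint32_t rs2, uint32_t rs1,
                       uint32_t funct3, uint32_t rd, uint32_t opcode)
{
    /* Reject values that would not fit their bit-fields. */
    assert(funct7 < 128 && rs2 < 32 && rs1 < 32);
    assert(funct3 < 8 && rd < 32 && opcode < 128);

    return (funct7 << 25) |  /* inst[31:25] */
           (rs2    << 20) |  /* inst[24:20] */
           (rs1    << 15) |  /* inst[19:15] */
           (funct3 << 12) |  /* inst[14:12] */
           (rd     <<  7) |  /* inst[11:7]  */
           opcode;           /* inst[6:0]   */
}
```

For example, assuming the compiler places `sum` in x10, the second operand in x11, and the result in x10, `xtea_hw_init` (funct7 0, funct3 0b100, opcode 0b0001011) is `encode_r_type(0, 11, 10, 4, 10, 0x0B)`, which yields the word 0x00B5450B.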



# THE PLAN

- Get a board with an FPGA that can fit the design with CFU
- Enable CFU in board top-level RTL file
- Build and flash design
- Load Zephyr Blinky/Hello World
- Add function calls to Zephyr from neorv32 library to call intrinsics
  - Need to pull in header/source files from neorv32 repo
- Confirm XTEA operation
  - Compare performance of XTEA in HW vs SW
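For the HW vs SW comparison, the software side needs a plain C XTEA to time against the CFU version. Below is a minimal sketch of the standard XTEA round function. The function names `xtea_sw_encipher` and `xtea_sw_decipher` are hypothetical, and this is not necessarily the code used in the neorv32 demo.

```c
#include <assert.h>
#include <stdint.h>

#define XTEA_SW_DELTA 0x9E3779B9u

/* Encrypt one 64-bit block (v[0], v[1]) in place with a 128-bit key. */
void xtea_sw_encipher(unsigned rounds, uint32_t v[2], const uint32_t key[4])
{
    assert(rounds > 0);
    uint32_t v0 = v[0], v1 = v[1], sum = 0;

    for (unsigned i = 0; i < rounds; i++) {
        v0 += (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
        sum += XTEA_SW_DELTA;
        v1 += (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}

/* Decrypt one 64-bit block in place; exact inverse of xtea_sw_encipher. */
void xtea_sw_decipher(unsigned rounds, uint32_t v[2], const uint32_t key[4])
{
    assert(rounds > 0);
    uint32_t v0 = v[0], v1 = v[1], sum = XTEA_SW_DELTA * rounds;

    for (unsigned i = 0; i < rounds; i++) {
        v1 -= (((v0 << 4) ^ (v0 >> 5)) + v0) ^ (sum + key[(sum >> 11) & 3]);
        sum -= XTEA_SW_DELTA;
        v0 -= (((v1 << 4) ^ (v1 >> 5)) + v1) ^ (sum + key[sum & 3]);
    }
    v[0] = v0;
    v[1] = v1;
}
```

On the target, one way to compare would be to read Zephyr's cycle counter (`k_cycle_get_32()`) before and after each variant and report the difference for the same block, key, and round count.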

https://orangecrab-fpga.github.io/orangecrab-hardware/









- OrangeCrab
  - Need newer hardware version
  - Larger FPGA to support CFU functionality
- https://github.com/stnolting/neorv32-setups/tree/main/osflow
  - Describes process to build FPGA image
  - OrangeCrab "setup" uses open-source tools
  - Current implementation uses smaller FPGA
    - Targets the Lattice ECP5-25F; requires modifications





### Hardware r0.2.1

Status: Currently produced design

### Changes from r0.2:

- Changed USB micro-b to USB-C
- Swapped DCDC devices
- Added support for ECP5 85F-5G
- Added Castellated I/O pins





Installing tools directly didn't go as planned: ran into installation issues.

### Decided to leverage the container that is used as part of the GitHub Actions build in the repo

Dockerfile:

- `FROM gcr.io/hdl-containers/debian/bullseye/impl`
- `RUN apt-get update -qq && DEBIAN_FRONTEND=noninteractive apt-get -y install --no-install-recommends python3-pip && pip3 install wheel setuptools && pip3 install doit && apt-get autoclean && apt-get clean && apt-get -y autoremove && rm -rf /var/lib/apt/lists/*`
- `ENV GHDL_PLUGIN_MODULE=ghdl`
- `WORKDIR tmp/src`

Build inside the container:

- `[~]$ cd fosdem/2025/neorv32-setups`
- `[neorv32-setups]$ ls`
  - `CODE_OF_CONDUCT.md cologne_chip constraints gowineda LICENSE neorv32 osflow quartus radiant README.md vivado`
- `[neorv32-setups]$ docker run -v$PWD:/tmp/src:z -it orangecrab-neorv32 bash -c 'make -C osflow/ BOARD=OrangeCrab MinimalBoot'`

### Enable CFU in RTL

`[neorv32]$ git diff` of `rtl/core/neorv32_top.vhd` (index 1236ed22..6b5150cf 100644), hunk `@@ -45,7 +45,7 @@ entity neorv32_top is`:

|     | Generic                      | Type    | Default     | Comment                                   |
|-----|------------------------------|---------|-------------|-------------------------------------------|
|     | `CPU_EXTENSION_RISCV_Zicond` | boolean | `:= false;` | implement integer conditional operations? |
|     | `CPU_EXTENSION_RISCV_Zihpm`  | boolean | `:= false;` | implement hardware performance monitors?  |
|     | `CPU_EXTENSION_RISCV_Zmmul`  | boolean | `:= false;` | implement multiply-only M sub-extension?  |
| `-` | `CPU_EXTENSION_RISCV_Zxcfu`  | boolean | `:= false;` | implement custom (instr.) functions unit? |
| `+` | `CPU_EXTENSION_RISCV_Zxcfu`  | boolean | `:= true;`  | implement custom (instr.) functions unit? |

### Modify the appropriate Makefile to inform the tools of the different FPGA

`diff --git a/osflow/boards/index.mk b/osflow/boards/index.mk`

`index 20ff948..fd09c43 100644`

`--- a/osflow/boards/index.mk`

`+++ b/osflow/boards/index.mk`

`@@ -99,7 +99,7 @@ DEVICE_SERIES = ecp5`

`OrangeCrab_REV ?= r02-25F`

`CONSTRAINTS ?= $(PCF_PATH)/$(BOARD).lpf`

`IMPL`

`endif`






- After successfully building the FPGA image, used dfu-util to flash the FPGA
- Held button on power-up to enter programming mode
- Attach device information to file
  - cp neorv32\_OrangeCrab\_r02-25F\_MinimalBoot.bit neorv32\_OrangeCrab\_r02-25F\_MinimalBoot.dfu
  - dfu-suffix -v 1209 -p 5af0 -a neorv32\_OrangeCrab\_r02-25F\_MinimalBoot.dfu
- dfu-util -a 0 -D neorv32\_OrangeCrab\_r02-25F\_MinimalBoot.dfu





- **Bootloader working!**
- Compile Zephyr "Hello World" for RISC-V
  - west build -p -b neorv32 samples/hello\_world/
- On power-up, hit any key over console to enter bootloader
- Use neorv32 script to upload binary
- Ran into an issue when attempting to flash the application
  - https://stnolting.github.io/neorv32/

ERR\_SIZE

Your program is way too big for the internal processor's instructions memory. Increase the memory size or reduce your application code.






### << NEORV32 Bootloader >>

- BLDV: Jul 19 2024
- HWV: 0x01100209
- CLK: 0x016e3600
- MISA: 0x40800100
- XISA: 0x000008b
- SOC: 0x0013000d
- IMEM: 0x00004000
- DMEM: 0x00002000

Autoboot in 10s. Press any key to abort. Loading from SPI flash @0x00400000...

ERR\_SIZE
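The hex fields in the bootloader banner can be decoded by hand, which helps when checking what was actually synthesized. The sketch below decodes a 32-bit `misa` value using the standard RISC-V layout (MXL in bits 31:30, one extension letter per bit in bits 25:0) and checks an image size against the IMEM size. All function names are hypothetical, and the 20 KiB image size in the usage note is an illustrative assumption, not a measured value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MXL field of a 32-bit misa value: 1 means XLEN = 32. */
unsigned misa_mxl(uint32_t misa)
{
    return (unsigned)(misa >> 30);
}

/* Collect the extension letters flagged in misa[25:0] (bit 0 = 'A').
 * out must hold at least 27 bytes; returns the number of letters. */
size_t misa_extensions(uint32_t misa, char out[27])
{
    assert(out != NULL);
    size_t n = 0;

    for (unsigned bit = 0; bit < 26; bit++) {
        if (misa & (UINT32_C(1) << bit)) {
            out[n++] = (char)('A' + bit);
        }
    }
    out[n] = '\0';
    return n;
}

/* True when an image of image_bytes fits into imem_bytes of instruction memory. */
bool image_fits_imem(uint32_t image_bytes, uint32_t imem_bytes)
{
    return image_bytes <= imem_bytes;
}
```

For the banner above, MISA 0x40800100 decodes to MXL = 1 (RV32) with letters "IX" (base integer ISA plus non-standard extensions), CLK 0x016e3600 is 24,000,000 Hz, and IMEM 0x00004000 is 16 KiB. A hypothetical 20 KiB image would fail `image_fits_imem` against 16 KiB, which is the situation ERR\_SIZE reports, and would pass against 32 KiB.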

### Increase IMEM size

```
index 552a744..ed13551 100644
--- a/osflow/board_tops/neorv32_OrangeCrab_BoardTop_MinimalBoot.vhd
+++ b/osflow/board_tops/neorv32_OrangeCrab_BoardTop_MinimalBoot.vhd
@@ -102,7 +102,7 @@ begin
   neorv32_inst: entity work.neorv32_ProcessorTop_MinimalBoot
   generic map (
     CLOCK_FREQUENCY => f_clock_c, -- clock frequency of clk_i in Hz
-    MEM_INT_IMEM_SIZE => 16*1024,
+    MEM_INT_IMEM_SIZE => 32*1024,
     MEM_INT_DMEM_SIZE => 8*1024
   )
   port map (
```

- After re-building the FPGA image and re-uploading the application, no "Hello World" :-(
- Tried the demo Blinky example under neorv32/sw
- Same issue
  - No indication of error
  - Console just sits there






# **NEXT STEPS**

- Troubleshoot why the application is not loading
- What debugging tools are available, and how to debug?
  - Equivalent of JTAG/SWD and GDB in FPGA land?
  - Have used Xilinx ILA and Synopsys Identify
    - How to capture signals in Lattice?
    - Open source tools?
    - Necessary hardware connections to perform troubleshooting?





### **NEXT STEPS**

- Get it working and generate comparison metrics in Zephyr
  - Present success at FOSDEM26?
- Integrate CFU intrinsic in Zephyr and upstream
- Investigate Custom Functions Subsystem
  - **CPU** independent operations








### **NEXT STEPS**



- Try other hardware-accelerated functions
- **CFU** Playground
  - https://cfu-playground.readthedocs.io/en/latest/
  - Different design/implementation
  - Meant for machine learning
  - Integrate with Zephyr and present results!
- **Custom Functions Subsystem**
  - **CPU** independent
  - Integrate with Zephyr and present results!



# THANK YOU! **Mohammed Billoo MAB Labs Embedded Solutions FOSDEM25**




