3-Stage RISC-V Processor
This project is a 3-stage, in-order RISC-V CPU written in Verilog (IF → Decode/Execute → Mem/Writeback), targeting the Sky130 flow. It includes a direct-mapped instruction cache and data cache, plus a stall/flush scheme for hazards and cache misses. It also implements forwarding paths to eliminate common data hazards and adds system features such as CSR writes. The design was taken through synthesis and place-and-route, with post-PAR timing and PPA reported.
Background
This project was done with Nicolas Rakela for the EECS 151 (ASIC Lab) course at UC Berkeley. It was an end-to-end CPU design exercise in which we implemented synthesizable RTL, validated it with testbenches, and ran the ASIC flow, including synthesis and place-and-route, in the Skywater 130 nm process.
We built a working pipelined core first, then improved throughput via hazard handling/forwarding and a cache-based memory system.
Core Microarchitecture (3 Stages)
We have a 3-stage pipeline:
1) IF (Instruction Fetch): fetches from the instruction cache and updates the PC
2) IDX (Instruction Decode + Execute): decodes opcode/funct, reads regfile, performs ALU ops / branch decisions
3) MEM/WB (Memory + Writeback): handles loads/stores via the data cache and writes results back to the register file
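The stage boundary can be sketched as a pipeline register with stall/flush controls. The signal names below are illustrative, not the project's actual RTL:

```verilog
// Illustrative IF -> IDX pipeline register (hypothetical names).
module pipe_reg (
    input  wire        clk, stall, flush,
    input  wire [31:0] if_instr, if_pc,
    output reg  [31:0] idx_instr, idx_pc
);
    localparam NOP = 32'h0000_0013;  // addi x0, x0, 0

    always @(posedge clk) begin
        if (flush)       idx_instr <= NOP;       // squash on mispredict
        else if (!stall) idx_instr <= if_instr;  // normal advance
        if (!stall)      idx_pc    <= if_pc;
    end
endmodule
```

Flushing by injecting a NOP (rather than a dedicated valid bit) keeps the downstream control logic unchanged, since a NOP writes nothing architectural.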
Data hazards: forwarding instead of stalling
To avoid bubbles on dependent instruction sequences, we added forwarding paths from the writeback result back into:
- ALU operand inputs
- store-data path (write data toward memory)
- branch comparator inputs
This effectively removed the common RAW hazard cases for a 3-stage in-order design, keeping stalls mostly for control hazards and memory delays.
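A minimal sketch of the writeback-to-ALU forwarding muxes, under assumed signal names (the store-data and branch-comparator paths would be muxed the same way):

```verilog
// Sketch of writeback -> ALU-operand forwarding (names are assumptions).
module forward_mux (
    input  wire        wb_reg_wen,               // WB actually writes regfile
    input  wire [4:0]  wb_rd, idx_rs1, idx_rs2,  // dest / source registers
    input  wire [31:0] wb_result, rf_rdata1, rf_rdata2,
    output wire [31:0] alu_op_a, alu_op_b
);
    // Forward when WB writes a nonzero register that a source operand reads.
    wire fwd_a = wb_reg_wen && (wb_rd != 5'd0) && (wb_rd == idx_rs1);
    wire fwd_b = wb_reg_wen && (wb_rd != 5'd0) && (wb_rd == idx_rs2);

    assign alu_op_a = fwd_a ? wb_result : rf_rdata1;
    assign alu_op_b = fwd_b ? wb_result : rf_rdata2;
endmodule
```

Note the `wb_rd != 0` guard: x0 is hardwired to zero in RISC-V, so a "write" to it must never be forwarded.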
Control hazards: stall/flush on mispredict
Branches are resolved with dedicated compare logic. On a mispredicted branch, the next in-flight instruction is flushed by injecting a no-op into the appropriate stages.
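In sketch form, branch resolution reduces to a taken/not-taken decision that both redirects the PC and raises a flush for the instruction fetched behind the branch (hypothetical names):

```verilog
// Sketch of branch resolution and flush generation (hypothetical names).
module branch_resolve (
    input  wire        is_branch, branch_cond_met,  // from compare logic
    input  wire [31:0] branch_target, pc_plus4,
    output wire [31:0] pc_next,
    output wire        flush_if
);
    wire taken = is_branch && branch_cond_met;
    assign pc_next  = taken ? branch_target : pc_plus4;
    assign flush_if = taken;  // squash the instruction fetched behind the branch
endmodule
```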
Memory stalls: cache hit latency + miss servicing
Because both instruction and data memories are synchronous, the pipeline must stall when the cache/memory system can’t return in time:
- even a data-cache hit requires additional cycles (synchronous SRAM timing)
- miss servicing includes multi-cycle refills (and possible writeback on dirty eviction)
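Assuming a simple request/ready handshake between the pipeline and each cache (the handshake names are illustrative), the global stall condition might look like:

```verilog
// Sketch of the global stall (hypothetical handshake names): hold the
// pipeline whenever a cache cannot serve its request this cycle.
module stall_gen (
    input  wire icache_req, icache_ready,
    input  wire dcache_req, dcache_ready,
    output wire stall
);
    assign stall = (icache_req && !icache_ready) ||
                   (dcache_req && !dcache_ready);
endmodule
```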
Memory System: Instruction + Data Caches
We built separate direct-mapped instruction and data caches (I-cache and D-cache) to avoid structural hazards between fetch and load/store. Each address is partitioned into tag / index / block-offset fields. Per-line metadata stores valid, dirty, and tag bits. The cache controller is an FSM with states such as IDLE, TAG_CHECK, WRITE, WRITE_BACK, MEM_FETCH, and MEM_RESP.
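The address split can be sketched as below. The field widths are assumptions (here: 16 sets of 16-byte blocks), not the project's actual cache geometry:

```verilog
// Illustrative tag / index / offset split for a direct-mapped cache.
module addr_split #(
    parameter OFFSET_BITS = 4,                        // byte offset in a block
    parameter INDEX_BITS  = 4,                        // log2(number of sets)
    parameter TAG_BITS    = 32 - INDEX_BITS - OFFSET_BITS
) (
    input  wire [31:0]            addr,
    output wire [OFFSET_BITS-1:0] offset,
    output wire [INDEX_BITS-1:0]  index,
    output wire [TAG_BITS-1:0]    tag
);
    assign offset = addr[OFFSET_BITS-1:0];
    assign index  = addr[OFFSET_BITS +: INDEX_BITS];  // bits above the offset
    assign tag    = addr[31 -: TAG_BITS];             // remaining high bits
endmodule
```

On a lookup, the index selects one line, the stored tag is compared against the address tag, and valid/dirty bits decide between TAG_CHECK hit, MEM_FETCH refill, or WRITE_BACK on a dirty eviction.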
Verification approach
We used assembly tests to validate instruction-level correctness, and higher-level benchmark tests written in C to validate pipeline and cache interactions.
Synthesis + place-and-route results (Sky130)
Timing
- Post-synthesis critical path: 18.067 ns (slack 1.591 ns)
- Post-PAR critical path: 19.414 ns (slack 0.246 ns), i.e. a maximum clock frequency of roughly 51.5 MHz