3-Stage RISC-V Processor
This project is a 3-stage, in-order RISC-V CPU written in Verilog (IF → Decode/Execute → Mem/Writeback), targeting the Sky130 flow. It includes a direct-mapped instruction cache and data cache, plus a stall/flush scheme for hazards and cache misses. It also implements forwarding paths to eliminate common data hazards and adds system features such as CSR writes. The design was taken through synthesis and place-and-route, with post-PAR timing and PPA reported.
Background
This project was done with Nicolas Rakela for the EECS 151 (ASIC Lab) course at UC Berkeley. It was an end-to-end CPU design exercise in which we implemented synthesizable RTL, validated it with testbenches, and ran the ASIC flow, including synthesis and place-and-route, in the Skywater 130 nm process.
We built a working pipelined core first, then improved throughput via hazard handling/forwarding and a cache-based memory system.
Core Microarchitecture (3 Stages)
We have a 3-stage pipeline:
1) IF (Instruction Fetch): fetches from the instruction cache and updates the PC
2) IDX (Instruction Decode + Execute): decodes opcode/funct, reads regfile, performs ALU ops / branch decisions
3) MEM/WB (Memory + Writeback): handles loads/stores via the data cache and writes results back to the register file
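The stage boundary can be sketched as a pipeline register with stall/flush controls. The signal names below are illustrative, not the project's actual RTL:

```verilog
// Illustrative IF -> IDX pipeline register (hypothetical names).
module pipe_reg (
    input  wire        clk, stall, flush,
    input  wire [31:0] if_instr, if_pc,
    output reg  [31:0] idx_instr, idx_pc
);
    localparam NOP = 32'h0000_0013;  // addi x0, x0, 0

    always @(posedge clk) begin
        if (flush)       idx_instr <= NOP;       // squash on mispredict
        else if (!stall) idx_instr <= if_instr;  // normal advance
        if (!stall)      idx_pc    <= if_pc;
    end
endmodule
```

Flushing by injecting a NOP (rather than a dedicated valid bit) keeps the downstream control logic unchanged, since a NOP writes nothing architectural.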
Data hazards: forwarding instead of stalling
To avoid bubbles on dependent instruction sequences, we added forwarding paths from the writeback result back into:
- ALU operand inputs
- store-data path (write data toward memory)
- branch comparator inputs
This effectively removed the common RAW hazard cases for a 3-stage in-order design, keeping stalls mostly for control hazards and memory delays.
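A minimal sketch of the writeback-to-ALU forwarding muxes, under assumed signal names (the store-data and branch-comparator paths would be muxed the same way):

```verilog
// Sketch of writeback -> ALU-operand forwarding (names are assumptions).
module forward_mux (
    input  wire        wb_reg_wen,               // WB actually writes regfile
    input  wire [4:0]  wb_rd, idx_rs1, idx_rs2,  // dest / source registers
    input  wire [31:0] wb_result, rf_rdata1, rf_rdata2,
    output wire [31:0] alu_op_a, alu_op_b
);
    // Forward when WB writes a nonzero register that a source operand reads.
    wire fwd_a = wb_reg_wen && (wb_rd != 5'd0) && (wb_rd == idx_rs1);
    wire fwd_b = wb_reg_wen && (wb_rd != 5'd0) && (wb_rd == idx_rs2);

    assign alu_op_a = fwd_a ? wb_result : rf_rdata1;
    assign alu_op_b = fwd_b ? wb_result : rf_rdata2;
endmodule
```

Note the `wb_rd != 0` guard: x0 is hardwired to zero in RISC-V, so a "write" to it must never be forwarded.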
Control hazards: stall/flush on mispredict
Branches are resolved with dedicated compare logic. On a mispredicted branch, the next in-flight instruction is flushed by injecting a no-op into the appropriate stages.
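In sketch form, branch resolution reduces to a taken/not-taken decision that both redirects the PC and raises a flush for the instruction fetched behind the branch (hypothetical names):

```verilog
// Sketch of branch resolution and flush generation (hypothetical names).
module branch_resolve (
    input  wire        is_branch, branch_cond_met,  // from compare logic
    input  wire [31:0] branch_target, pc_plus4,
    output wire [31:0] pc_next,
    output wire        flush_if
);
    wire taken = is_branch && branch_cond_met;
    assign pc_next  = taken ? branch_target : pc_plus4;
    assign flush_if = taken;  // squash the instruction fetched behind the branch
endmodule
```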
Memory stalls: cache hit latency + miss servicing
Because both instruction and data memories are synchronous, the pipeline must stall when the cache/memory system can’t return in time:
- even a data-cache hit requires additional cycles (synchronous SRAM timing)
- miss servicing includes multi-cycle refills (and possible writeback on dirty eviction)
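Assuming a simple request/ready handshake between the pipeline and each cache (the handshake names are illustrative), the global stall condition might look like:

```verilog
// Sketch of the global stall (hypothetical handshake names): hold the
// pipeline whenever a cache cannot serve its request this cycle.
module stall_gen (
    input  wire icache_req, icache_ready,
    input  wire dcache_req, dcache_ready,
    output wire stall
);
    assign stall = (icache_req && !icache_ready) ||
                   (dcache_req && !dcache_ready);
endmodule
```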
Memory System: Instruction + Data Caches
We built separate direct-mapped instruction and data caches (I-cache and D-cache) to avoid structural hazards between fetch and load/store. Each address is partitioned into tag / index / block-offset fields. Per-line metadata stores valid, dirty, and tag bits. The cache controller is an FSM with states such as IDLE, TAG_CHECK, WRITE, WRITE_BACK, MEM_FETCH, and MEM_RESP.
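The address split can be sketched as below. The field widths are assumptions (here: 16 sets of 16-byte blocks), not the project's actual cache geometry:

```verilog
// Illustrative tag / index / offset split for a direct-mapped cache.
module addr_split #(
    parameter OFFSET_BITS = 4,                        // byte offset in a block
    parameter INDEX_BITS  = 4,                        // log2(number of sets)
    parameter TAG_BITS    = 32 - INDEX_BITS - OFFSET_BITS
) (
    input  wire [31:0]            addr,
    output wire [OFFSET_BITS-1:0] offset,
    output wire [INDEX_BITS-1:0]  index,
    output wire [TAG_BITS-1:0]    tag
);
    assign offset = addr[OFFSET_BITS-1:0];
    assign index  = addr[OFFSET_BITS +: INDEX_BITS];  // bits above the offset
    assign tag    = addr[31 -: TAG_BITS];             // remaining high bits
endmodule
```

On a lookup, the index selects one line, the stored tag is compared against the address tag, and valid/dirty bits decide between TAG_CHECK hit, MEM_FETCH refill, or WRITE_BACK on a dirty eviction.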
Verification approach
We used assembly tests to validate instruction-level correctness, and higher-level benchmark tests written in C to validate pipeline and cache interactions.
Synthesis + place-and-route results (Sky130)
Timing
- Post-synthesis critical path: 18.067 ns (slack 1.591 ns)
- Post-PAR critical path: 19.414 ns (slack 0.246 ns), i.e. a maximum clock frequency of roughly 51.5 MHz