EECE 476: Computer Architecture

Fall 2004

Assignment 2: Datapath and Control Design (revised Oct 16, 2004)

Due Date:  Monday, October 18, 2004


NOTE:  Two problems will be randomly chosen and marked.  Each problem will be graded out of 3 (0=wrong/missing, 1=good effort, 2=almost perfect, 3=perfect).
Submit your assignment in the box outside the CICSR362 lab.  Please do not disturb the lab occupants (they are not 476 TAs), although they may respond well to gentle feeding.



You will design a CPU datapath and a controller.  You will implement part of this design in a CAD tool using Verilog.  The designs created in this assignment are the first stages of your project, so put good effort into them!  Performing the design work reinforces your understanding of the datapath.  The times given below assume you have already reviewed the lecture material.



·        For the single-cycle MIPS datapath, what changes are needed to add the JR instruction?  You will explore this in detail in a problem below.

·        For the multi-cycle MIPS datapath, what is the RTL sequence needed to execute a J instruction?  How many cycles does it take?  If you wanted it to take one fewer cycle, what changes to the datapath and controller would be necessary?

·        For the multi-cycle MIPS datapath, what is the RTL sequence needed to execute a SW instruction?  How many cycles does it take?


1.      Design the basic datapath for a single-cycle NIOS-II processor for the essential instructions listed below.  For each instruction, ensure you have placed proper muxes and datapath components together so the computation can proceed.  You will find the NIOS II Processor Reference Handbook on the course web site.  You can use the MIPS datapath in the textbook/lecture as the reference design, but you must make some small changes to ensure that every instruction below works in your datapath.  Highlight these differences.  For this question, hand in a block-level diagram and label all of the blocks and signals.

·        Essential:  add, addi, and, andi, beq, br, cmplt, jmp, ldw, or, ori, stw, sub, xor

2.      Modify your NIOS-II datapath from the previous problem so that it supports the additional instructions below.  These will require additional changes to your datapath that may require a bit more thought.  Hand in separate block-level datapath diagrams for this question and the previous one.

·        Additional:  blt, bne, callr, orhi, ret, sll, slli, sra, srai

3.      Your NIOS-II datapath from the previous question is capable of executing the following free instructions without any modifications.  Verify this by giving a complete example of equivalent essential or additional instruction(s) for each instruction below.

·        Free:  mov, movhi, movi, movia, movui, nop, subi

4.      Give the NIOS-II instruction sequence needed to load a 32-bit constant in your datapath.

5.      Your datapath will not be implementing the remaining NIOS-II instructions.  They are listed here for completeness.  There is nothing for you to do for this question.

·        Arithmetic:  div, divu, mul, muli, mulxss, mulxsu, mulxuu

·        Logical and Shifting:  andhi, nor, xorhi, xori, rol, roli, ror, srl, srli

·        Branching:  bge, bgeu, bgt, bgtu, ble, bleu, bltu, call

·        Memory:  ldb/ldbio, ldbu/ldbuio, ldh/ldhio, ldhu/ldhuio, ldwio, stb/stbio, sth/sthio, stwio

·        Comparisons:  cmpeq, cmpeqi, cmpge, cmpgei, cmpgeu, cmpgeui, cmpgt(cmplt), cmpgti(cmpgei), cmpgtu(cmpltu), cmpgtui(cmpgeui), cmple(cmpge), cmplei(cmplti), cmpleu(cmpgeu), cmpleui(cmpltui), cmplt, cmplti, cmpltu, cmpltui, cmpne, cmpnei

·        Advanced:  break, bret, custom, eret, flushd, flushi, flushp, initd, initi, nextpc, rdctl, sync, trap, wrctl


6.      Design the datapath for UBCPU, a simplified multi-cycle processor.  This CPU does not have any general-purpose registers for computation, so there are no “load word” or “store word” instructions.  Instead, all calculations operate directly on values in memory.  Each instruction is 32 bits long, and each calculation uses 32 bits of data.  The following instructions are supported:  ADD, SUB, BZ, BNEG.  For example:

·         ADD  0x124,0x125              Mem[0x124] ← Mem[0x124] + Mem[0x125]

·         ADD  0x124,[0x125]           Mem[0x124] ← Mem[0x124] + Mem[Mem[0x125]]

·         BZ     [0x012],0x023            if (Mem[Mem[0x012]] = 0)  PC ← PC + SignExt{0x023}

During each instruction (including a taken branch), PC is also incremented by 1.

Instructions are encoded as follows:  opcode = Instr{31..26}, Op1Indirect = Instr{25}, Op2Indirect = Instr{24}, Operand1 = Instr{23..12}, Operand2 = Instr{11..0}.
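
As an illustration of the encoding above, the field extraction can be sketched in Python.  This is only a behavioural model for checking your understanding, not part of the required hardware.  The function name decode and the opcode value 0 used in the example are assumptions; this assignment does not specify opcode numbers.

```python
def decode(instr):
    """Split a 32-bit UBCPU instruction word into its five fields."""
    instr &= 0xFFFFFFFF                          # keep only 32 bits
    return {
        "opcode":       (instr >> 26) & 0x3F,    # Instr{31..26}
        "op1_indirect": (instr >> 25) & 0x1,     # Instr{25}
        "op2_indirect": (instr >> 24) & 0x1,     # Instr{24}
        "operand1":     (instr >> 12) & 0xFFF,   # Instr{23..12}
        "operand2":     instr & 0xFFF,           # Instr{11..0}
    }

# Hypothetical encoding of "ADD 0x124,[0x125]" with an ASSUMED opcode of 0.
word = (0 << 26) | (0 << 25) | (1 << 24) | (0x124 << 12) | 0x125
fields = decode(word)
```

In a datapath these fields are simply wire slices of the InstructionRegister output; no logic is needed to separate them.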

Operands and Addressing:  Each of the instructions uses 2 operands.  Both operands are used as inputs.  For ADD or SUB, Operand1 also specifies where the result is stored.  Each operand is the word address of some value in memory, specified in the assembly code as “0x” followed by 3 hex digits (12 bits).  Normally, this is a “direct” operand, meaning the operand is a 12-bit memory word address located in the lowest 2^12 (4096) words of memory.  [Square brackets] around an operand mean that “indirect” mode is being used, which sets the corresponding “Indirect” operand bit to 1.  In indirect mode, the CPU first uses the 12-bit word address to read a 32-bit word, then uses this 32-bit word as the effective word address of the final value.
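
The direct and indirect modes described above can be modelled in a few lines of Python.  This is a sketch under stated assumptions: memory is represented as a dictionary of word addresses, the names effective_address and execute_add are hypothetical, and the sum is assumed to wrap to 32 bits.

```python
def effective_address(mem, operand, indirect):
    """Return the word address of the final value for one operand."""
    addr = operand & 0xFFF          # 12-bit direct word address
    if indirect:
        addr = mem[addr]            # extra memory read: the word is a pointer
    return addr

def execute_add(mem, op1, op1_indirect, op2, op2_indirect):
    """Model of ADD: Mem[EA1] <- Mem[EA1] + Mem[EA2]."""
    ea1 = effective_address(mem, op1, op1_indirect)
    ea2 = effective_address(mem, op2, op2_indirect)
    mem[ea1] = (mem[ea1] + mem[ea2]) & 0xFFFFFFFF   # ASSUMED 32-bit wraparound

# ADD 0x124,[0x125]: word 0x125 holds a pointer to word 0x4000.
mem = {0x124: 5, 0x125: 0x4000, 0x4000: 7}
execute_add(mem, 0x124, 0, 0x125, 1)
```

Each call to effective_address with indirect set corresponds to one additional read of the single memory, which is why indirect operands cost extra clock cycles in a multi-cycle design.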

For branch instructions (BZ for “branch if zero”, BNEG for “branch if negative”), Operand1 is the address of the value that is to be tested (i.e., if zero or if negative).  The Indirect bit can be used as before for Operand1.  Operand2 is a signed 12-bit branch target offset amount in words.  If Operand2 has the Indirect bit set to 1, then the 32-bit word at that 12-bit word address is the target word address (do not shift or add to the 32-bit value – this is an absolute address like a jump, not an offset).
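
The next-PC arithmetic for a branch with a direct Operand2 can be sketched as follows.  The helper names are hypothetical, and the sketch assumes that a taken branch applies both the increment by 1 and the sign-extended offset.  The indirect (absolute target) case is deliberately not modelled here.

```python
def sign_extend12(value):
    """Interpret the low 12 bits of value as a signed two's-complement number."""
    value &= 0xFFF
    return value - 0x1000 if value & 0x800 else value

def next_pc_direct(pc, taken, operand2):
    """Next PC for BZ/BNEG when Operand2 is a direct (offset) operand."""
    pc += 1                              # every instruction increments PC by 1
    if taken:
        pc += sign_extend12(operand2)    # ASSUMPTION: offset is added on top of +1
    return pc

forward = next_pc_direct(0x100, True, 0x023)    # offset +0x23
backward = next_pc_direct(0x100, True, 0xFFE)   # offset -2
fall_through = next_pc_direct(0x100, False, 0x023)
```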

Your CPU must contain only one 32-bit wide memory (combined instructions + data) with at least 16 address bits.  Assume the memory has one Address port, one WriteData port, one ReadData port, and one WriteEnable port.  The instructions can be located anywhere in memory.  Direct-mode operands can reach only the lowest 2^12 (4096) words of memory, but indirect-mode operands can reach any memory location.

a.       Hand in the datapath for your design.  Be sure to label all registers, muxes, and important signals.  You will need a PC, an InstructionRegister, plus numerous other temporary registers.  For registers that must hold their value for more than one clock cycle, be sure to label an ‘enable’ control signal.

b.      Give the RTL steps that are needed for proper execution of each instruction.  Assume direct mode is always used for both operands.  How many clock cycles are required for each instruction?

c.       For the ADD instruction, show the modified RTL steps if both operands use indirect mode.  How many clock cycles are needed?

d.      For the BZ instruction, show the modified RTL steps if both operands use indirect mode.  How many clock cycles are needed?

CONTROLLER DESIGN (approx 2-3 hours)

7.      For each of the essential NIOS-II instructions, determine the required control signal values for your single-cycle datapath designed earlier.  Use a table similar to the one in lecture:  each of the bottom rows lists one control signal, each column lists one instruction.  The values in these table cells must indicate a ‘1’, ‘0’, or ‘X’ value.  Insert some additional top rows to show the instruction encoding of each instruction.

8.      Design the Finite State Machine controller for the multi-cycle UBCPU.  Instead of showing all states for all instructions, consider only the following cases:  the four instructions (all operands use direct mode), ADD with two indirect operands, and BZ with two indirect operands.  You only have to draw the FSM bubble diagram (don’t implement it in gates).


VERILOG DESIGN (3-5 hours, not on midterm, do not hand in for assignment 2)

1.      Construct an ALU bitslice in Verilog for the essential NIOS-II instructions.  This should be similar to your design from assignment 1.  A bitslice is the 1-bit portion of the ALU.  Compile the ALU bitslice using the Quartus II software.  You should exhaustively simulate it to verify that it works.  Hand in a printout of your ALU bitslice and a screen-capture of the simulation waveform (Alt-PrintScreen).  The waveform must show support for the following operations:  ADD, AND, SUBTRACT, OR, XOR.

2.      Create a shifter for use with the following NIOS-II instructions:  sll, slli, sra, srai.

You can let Verilog do the shifting for you, or build your own shifter as follows.  Shifting in one direction by an arbitrary amount up to 31 positions is specified by a 5-bit value, called S4 S3 S2 S1 S0.  The shifting itself is done with 5 levels of multiplexers.  The first level uses S4 and thirty-two two-to-one multiplexers to shift the data by either 16 positions or 0 positions.  When the data is shifted by 16 positions, new bit values of ‘0’ are inserted.  The second level uses S3 and thirty-two two-to-one multiplexers to shift the output of the previous level by 8 positions or by 0 positions.  Repeat for the 3rd, 4th, and 5th levels to optionally shift by 4, 2, or 1 positions, respectively.  You must find a way to modify this to shift in two directions (left or right).
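
The five-level structure described above can be checked with a small Python behavioural model before writing any Verilog.  This sketch covers only one direction (left shift with zero fill); the function name is hypothetical, and the two-direction and arithmetic-shift modifications are left as the exercise asks.

```python
MASK32 = 0xFFFFFFFF

def shift_left_5level(data, amount):
    """Left-shift a 32-bit value using five optional stages: 16, 8, 4, 2, 1."""
    data &= MASK32
    for level in (4, 3, 2, 1, 0):            # S4 controls the first level, S0 the last
        select = (amount >> level) & 1       # one select bit per mux level
        if select:
            # Shift by 2**level positions; zeros enter at the low end.
            data = (data << (1 << level)) & MASK32
        # When select is 0, the level passes its input through unchanged.
    return data

example = shift_left_5level(0x00000001, 21)  # 21 = 16 + 4 + 1, so S4, S2, S0 are set
```

Each loop iteration corresponds to one row of thirty-two two-to-one multiplexers, so any shift amount from 0 to 31 is built from at most five fixed shifts.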

Using the Quartus II software, implement, compile, and simulate these two operations (shifting left or right) together in the same Verilog module.  Hand in this Verilog code and a screen-capture of the simulation waveform.  Show at least 3 shift amounts applied to each direction.

3.      Construct the full ALU by creating a new Verilog component that combines your ALU bitslices and the shifter.  The ALU should have two 32-bit inputs, ALUA and ALUB, and a 32-bit result ALUR. Your ALU should also have two 1-bit outputs named N and Z to indicate NEGATIVE and ZERO results.  Hand in this Verilog code.
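
The expected behaviour of the N and Z outputs can be stated as a short Python reference model, which is handy when reading simulation waveforms.  It assumes that N is the most significant bit (bit 31) of ALUR, interpreted as a two's-complement sign bit, and that Z is 1 only when all 32 result bits are 0.  The function name is hypothetical.

```python
def alu_flags(alur):
    """Return (N, Z) for a 32-bit ALU result."""
    alur &= 0xFFFFFFFF               # model a 32-bit result bus
    n = (alur >> 31) & 1             # ASSUMPTION: N is the sign bit of the result
    z = 1 if alur == 0 else 0        # Z is a 32-input NOR of the result bits
    return n, z

flags_zero = alu_flags(5 - 5)        # subtraction giving zero
flags_negative = alu_flags(3 - 5)    # subtraction giving -2 (0xFFFFFFFE)
flags_positive = alu_flags(7 + 1)
```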

4.      Create a new Verilog component to hold the program counter and address calculation logic.  Include the adder(s) necessary to fetch the next instruction (PC ← PC + 4) as well as the branch, JMP, and CALLR instructions.  You do not have to re-use your ALU bitslices; use the + Verilog operator instead.

5.      You will assemble the register file, instruction memory, data memory, and other components above into a completed datapath.  You can get a head start on this now if you want.  For the missing pieces (register file and memories), you will have to modify the Verilog modules from last year’s web site.  Target an APEX20K200 device in Quartus.