EE380 Assignment 1 Solution

100% From EE280, you should know how to build a decoder. How do the circuit characteristics of a decoder change in response to a linear increase in the number of address bits handled by the decoder?
The delay through the decoder increases linearly
The total number of gates needed increases linearly
The delay through the decoder increases exponentially
The total number of gates needed increases logarithmically
The number of memory addresses that can be decoded increases linearly
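As a rough illustration of how sensitive a decoder is to address width, the sketch below models an idealized decoder as a one-hot function in Python. The name `decode` is hypothetical, and the model says nothing about gate-level delay; it only shows that each added address bit doubles the number of output lines.

```python
def decode(address: int, n_bits: int) -> list:
    """Return the one-hot outputs of an idealized n_bits-to-2**n_bits decoder."""
    if not 0 <= address < (1 << n_bits):
        raise ValueError("address does not fit in n_bits")
    # Exactly one output line is asserted: the one whose index equals the address.
    return [1 if line == address else 0 for line in range(1 << n_bits)]

# Number of output lines for 1..4 address bits: 2, 4, 8, 16.
for n in range(1, 5):
    print(n, len(decode(0, n)))
```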
100% Which of the following statements is/are true?
The MFC signal is not used when writing to memory
Allowing a few simultaneous writers to a bus is easy
DRAM usually takes less power per bit held than SRAM
In general, outputs are enabled at the start of a clock cycle
There is a race between the address and data lines when writing to memory
100% Given this processor hardware design, add control states to the following to implement an XOR-with-immediate instruction (as decoded by the when below), such that xori $rt,$rs,immed yields rt=(rs^immed). This is actually a MIPS instruction, as we'll discuss later. Hint: it's a lot like the addi given to you, isn't it? You should add initial values and test your design using the simulator before submitting it here.

when (op()) {addi} Addi
when (op()) {xori} Xori

Start:
 PCout, MARin, MEMread, Yin
 CONST(4), ALUadd, Zin, UNTILmfc
 MDRout, IRin
 Zout, PCin, JUMPonop
 HALT /* Should end here on undecoded op */

Addi:
 SELrs, REGout, Yin
 IRimmedout, ALUadd, Zin
 Zout, SELrt, REGin, JUMP(Start)

Xori:
 SELrs, REGout, Yin
 IRimmedout, ALUxor, Zin
 Zout, SELrt, REGin, JUMP(Start)

100% Given this processor hardware design, add control states to the following to implement an exchange-with-memory instruction (as decoded by the when below), such that xchg $rt,immed($rs) swaps the values in register rt and memory[rs+immed]. Hint: swaps are usually done using a temporary register -- Y is a good choice. Another hint: you only need to compute rs+immed once, because reading from memory doesn't change the value in the MAR. You should add initial values and test your design using the simulator before submitting it here.

when (op()) op(1) Xchg

Start:
 PCout, MARin, MEMread, Yin
 CONST(4), ALUadd, Zin, UNTILmfc
 MDRout, IRin
 Zout, PCin, JUMPonop
 HALT /* Should end here on undecoded op */

Xchg:
 SELrs, REGout, Yin
 IRimmedout, ALUadd, Zin
 Zout, MARin, MEMread
 SELrt, REGout, Yin, UNTILmfc
 MDRout, SELrt, REGin
 Yout, MDRin, MEMwrite, JUMP(Start)

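The Xchg states can be traced with a small register-transfer model. This is only a sketch in Python, not the course simulator: the function `xchg` and the dictionary-based `regs` and `mem` are hypothetical, and memory latency (the UNTILmfc wait) is not modeled.

```python
def xchg(regs: dict, mem: dict, rs: int, rt: int, immed: int) -> None:
    """Swap regs[rt] with mem[regs[rs] + immed], one statement per control state."""
    y = regs[rs]                    # SELrs, REGout, Yin
    z = (y + immed) & 0xFFFFFFFF    # IRimmedout, ALUadd, Zin
    mar = z                         # Zout, MARin, MEMread
    mdr = mem[mar]                  #   (the read started here completes by UNTILmfc)
    y = regs[rt]                    # SELrt, REGout, Yin, UNTILmfc
    regs[rt] = mdr                  # MDRout, SELrt, REGin
    mem[mar] = y                    # Yout, MDRin, MEMwrite, JUMP(Start)

# Illustrative initial values (assumed, not from the assignment).
regs = {1: 4, 2: 7}
mem = {44: 601}
xchg(regs, mem, rs=1, rt=2, immed=40)
print(regs[2], mem[44])  # prints "601 7"
```

The address rs+immed is computed once and held in the MAR, so the same `mar` value serves both the read and the write, as the second hint suggests.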
100% What high-level languages call goto is usually called a jump in assembly language. Our goto is implemented by loading the PC with the address we wish to go to, which is very easy, except that the address is 32 bits long, so it cannot fit in a 32-bit instruction word together with an opcode field. Thus, our goto place instruction will be encoded in two consecutive words: the first holds the goto opcode, the second holds the address place. Implement this jump instruction. Hint: the PC is pointing at the second word of the instruction as the Goto control state is entered. You should add initial values and test your design using the simulator before submitting it here.

when op() op(2) Goto

Start:
 PCout, MARin, MEMread, Yin
 CONST(4), ALUadd, Zin, UNTILmfc
 MDRout, IRin
 Zout, PCin, JUMPonop
 HALT /* Should end here on undecoded op */

Goto:
 PCout, MARin, MEMread
 UNTILmfc
 MDRout, PCin, JUMP(Start)

100% Given this processor hardware design and the control sequence below, describe in words (or C-like pseudo code) the function of the instruction xyzzy $rd.

when op() op(1) Xyzzy

Start:
 PCout, MARin, MEMread, Yin
 CONST(4), ALUadd, Zin, UNTILmfc
 MDRout, IRin
 Zout, PCin, JUMPonop
 HALT /* Should end here on undecoded op */

Xyzzy:
 SELrd, REGout, Yin
 CONST(-1), ALUxor, Zin
 Zout, SELrd, REGin, JUMP(Start)
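
One way to read the Xyzzy states: XORing a value with the constant -1 (all ones in two's complement) flips every bit, so the sequence appears to compute rd = ~rd. A quick Python check of that identity on 32-bit values follows; `xyzzy` is a hypothetical helper, not part of the simulator.

```python
MASK32 = 0xFFFFFFFF

def xyzzy(rd_value: int) -> int:
    """Trace the Xyzzy control states on a 32-bit register value."""
    y = rd_value & MASK32           # SELrd, REGout, Yin
    z = y ^ (-1 & MASK32)           # CONST(-1), ALUxor, Zin
    return z                        # Zout, SELrd, REGin, JUMP(Start)

# XOR with all ones matches the 32-bit bitwise complement.
for value in (0, 1, 0x0000FFFF, 0xDEADBEEF):
    assert xyzzy(value) == (~value & MASK32)

print(hex(xyzzy(0x0000FFFF)))  # prints "0xffff0000"
```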

100% Given the xyzzy rd instruction as defined above, and assuming that a memory load request takes 3 clock cycles to complete (after MEMread has been issued), how many clock cycles would it take to execute each xyzzy instruction? You may use the simulator to get or check your answer. In any case, give and briefly explain your answer here:
There are a total of 7 states passed through (4 in Start, 3 in Xyzzy), but we stay in the UNTILmfc one for 3 cycles instead of 1. That's a total of 6+3=9 clock cycles per instruction, i.e., 9 CPI.
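The count above can be reproduced with a short Python sketch. It assumes, as the explanation does, that every state takes one cycle except the UNTILmfc state, which occupies as many cycles as the memory latency; `count_cycles` and `XYZZY_STATES` are hypothetical names.

```python
# The seven control states an xyzzy instruction passes through, in order.
XYZZY_STATES = [
    "PCout, MARin, MEMread, Yin",
    "CONST(4), ALUadd, Zin, UNTILmfc",
    "MDRout, IRin",
    "Zout, PCin, JUMPonop",
    "SELrd, REGout, Yin",
    "CONST(-1), ALUxor, Zin",
    "Zout, SELrd, REGin, JUMP(Start)",
]

def count_cycles(states, mem_latency):
    """One cycle per state; a state containing UNTILmfc takes mem_latency cycles."""
    return sum(mem_latency if "UNTILmfc" in state else 1 for state in states)

print(count_cycles(XYZZY_STATES, mem_latency=3))  # prints "9"
```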
100% Given this processor hardware design, suppose that the following control state is the limiting factor in determining the maximum clock speed. Given that the propagation delay associated with Zin is 1ns, MARin is 4ns, SELrd is 8ns, ALUadd is 16ns, and REGout is 2ns, what is the period (in nanoseconds) of the fastest allowable clock? You may use the simulator to get or check your answer. In any case, give and briefly explain your answer here:
```
Zin, MARin, SELrd, ALUadd, REGout
```
There are two circuit paths:
SELrd->REGout->MARin, delay 8+2+4=14ns
SELrd->REGout->ALUadd->Zin, delay 8+2+16+1=27ns
27>14, so we need a clock period >=27ns.
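The same arithmetic as a small Python sketch: sum the delays along each path and take the slowest one as the minimum clock period. The two paths are the ones named in the answer above; the variable names are illustrative only.

```python
# Propagation delays in nanoseconds, as given in the question.
delay_ns = {"Zin": 1, "MARin": 4, "SELrd": 8, "ALUadd": 16, "REGout": 2}

# The two signal paths through this control state.
paths = [
    ["SELrd", "REGout", "MARin"],
    ["SELrd", "REGout", "ALUadd", "Zin"],
]

path_delays = [sum(delay_ns[signal] for signal in path) for path in paths]
min_period_ns = max(path_delays)  # the slowest path limits the clock

print(path_delays, min_period_ns)  # prints "[14, 27] 27"
```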
100% Given this processor hardware design, add control states to the following to implement a multiply-by-3 instruction (as decoded by the when below), such that mul3 rd makes rd=3*rd;. Note that there is no multiplier per se in the ALU. You should add initial values and test your design using the simulator before submitting it here.

when op() op(1) Mul3

Start:
 PCout, MARin, MEMread, Yin
 CONST(4), ALUadd, Zin, UNTILmfc
 MDRout, IRin
 Zout, PCin, JUMPonop
 HALT /* Should end here on undecoded op */

Mul3:
 SELrd, REGout, Yin
 Yout, ALUadd, Zin
 Zout, ALUadd, Zin
 Zout, SELrd, REGin, JUMP(Start)

You can test your code with:
```
MEM[0]=op(1)+rd(8)
MEM[4]=0
$8=5
```
Register $8 should end up holding the value 0x0000000f.
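The Mul3 states can also be traced by hand or with a few lines of Python. The sketch assumes that ALUadd adds the Y register to whatever value is on the bus, which is how the Start sequence uses it to compute PC+4; `mul3` is a hypothetical helper, not simulator code.

```python
MASK32 = 0xFFFFFFFF

def mul3(rd_value: int) -> int:
    """Trace the Mul3 control states on a 32-bit register value."""
    y = rd_value & MASK32       # SELrd, REGout, Yin
    z = (y + y) & MASK32        # Yout, ALUadd, Zin  (bus carries Y, so Z = 2*rd)
    z = (y + z) & MASK32        # Zout, ALUadd, Zin  (bus carries Z, so Z = 3*rd)
    return z                    # Zout, SELrd, REGin, JUMP(Start)

print(hex(mul3(5)))  # prints "0xf"
```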
100% Given this processor hardware design, add control states to the following to implement a clear memory instruction (as decoded by the when below), such that clr immed(rs) places the value zero in memory, i.e., mem[rs+immed]=0;. You should add initial values and test your design using the simulator before submitting it here.

when op() op(2) Clr

Start:
 PCout, MARin, MEMread, Yin
 CONST(4), ALUadd, Zin, UNTILmfc
 MDRout, IRin
 Zout, PCin, JUMPonop
 HALT /* Should end here on undecoded op */

Clr:
 SELrs, REGout, Yin
 IRimmedout, ALUadd, Zin
 Zout, MARin
 CONST(0), MDRin, MEMwrite, JUMP(Start)

You can test your code with:
```
MEM[0]=op(2)+rs(1)+immed(40)
MEM[4]=0
MEM[44]=601
$1=4
```
Memory MEM[44] should end up holding the value 0x00000000.
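A minimal Python trace of the Clr states with the test values above ($1=4, immed=40, MEM[44]=601). The function `clr` and the dictionary-based `regs` and `mem` are hypothetical stand-ins for the simulator state.

```python
def clr(regs: dict, mem: dict, rs: int, immed: int) -> None:
    """Store zero at mem[regs[rs] + immed], one statement per control state."""
    y = regs[rs]                    # SELrs, REGout, Yin
    z = (y + immed) & 0xFFFFFFFF    # IRimmedout, ALUadd, Zin
    mar = z                         # Zout, MARin
    mem[mar] = 0                    # CONST(0), MDRin, MEMwrite, JUMP(Start)

regs = {1: 4}
mem = {44: 601}
clr(regs, mem, rs=1, immed=40)
print(hex(mem[44]))  # prints "0x0"
```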
100% A particular program expressed in a particular ISA executes 100 ALU instructions, 5 Loads, 8 Stores, and 2 Branches. A simple, non-pipelined, implementation of that ISA takes 8 CPI for each ALU instruction, 20 CPI for each Load, 10 CPI for each Store, and 10 CPI for each Branch. The original clock period is 10ns. How many clock cycles would the program take to execute? How many microseconds would the program take to execute?

         Executed   CPI   Cycles
ALU           100     8      800
Load            5    20      100
Store           8    10       80
Branch          2    10       20
                            ----
                            1000 cycles @ 10ns = 10,000ns, or 10us

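The table above is a weighted sum: instruction count times CPI for each class, then total cycles times the clock period. A short Python sketch of that calculation, with illustrative variable names:

```python
# Instruction class -> (number executed, cycles per instruction).
mix = {"ALU": (100, 8), "Load": (5, 20), "Store": (8, 10), "Branch": (2, 10)}
clock_period_ns = 10

total_cycles = sum(count * cpi for count, cpi in mix.values())
time_us = total_cycles * clock_period_ns / 1000  # 1000 ns per microsecond

print(total_cycles, time_us)  # prints "1000 10.0"
```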
100% For this question, check all that apply. Given the circumstances described in the previous question, which of the following changes by itself would yield at least 2X speedup?
Adding a cache memory reduces both Load and Store to 4 CPI
An improved design reduces the CPI for ALU instructions from 8 to 4
A new compiler reduces the number of ALU instructions from 100 to 50
New VLSI fabrication technology changes the clock frequency to 250MHz
A multi-cycle ALU makes ALU instructions take 18 CPI, but allows a 2ns clock period
100*18 + 100 + 80 + 20 = 2000 cycles @ 2ns = 4us, a 2.5X speedup over the original 10us
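Each option can be checked by recomputing the execution time and dividing it into the original 10,000ns. The Python sketch below does this under the instruction mix from the previous question; the option labels and helper names are illustrative. For example, the last option works out to 100*18 + 200 = 2000 cycles, or 4us at a 2ns clock, and a 250MHz clock corresponds to a 4ns period.

```python
# Instruction class -> (number executed, cycles per instruction).
BASE_MIX = {"ALU": (100, 8), "Load": (5, 20), "Store": (8, 10), "Branch": (2, 10)}
BASE_PERIOD_NS = 10

def exec_time_ns(mix, period_ns):
    """Total execution time: sum of count*CPI over all classes, times the period."""
    return sum(count * cpi for count, cpi in mix.values()) * period_ns

# Each option: (modified instruction mix, clock period in ns).
options = {
    "cache, Load/Store at 4 CPI": (dict(BASE_MIX, Load=(5, 4), Store=(8, 4)), 10),
    "ALU CPI 8 -> 4": (dict(BASE_MIX, ALU=(100, 4)), 10),
    "ALU count 100 -> 50": (dict(BASE_MIX, ALU=(50, 8)), 10),
    "250MHz clock (4ns period)": (BASE_MIX, 4),
    "ALU at 18 CPI, 2ns period": (dict(BASE_MIX, ALU=(100, 18)), 2),
}

base_ns = exec_time_ns(BASE_MIX, BASE_PERIOD_NS)
speedups = {name: base_ns / exec_time_ns(mix, period)
            for name, (mix, period) in options.items()}

for name, speedup in speedups.items():
    verdict = "at least 2X" if speedup >= 2 else "less than 2X"
    print(f"{name}: {speedup:.2f}X ({verdict})")
```

Under these numbers only the last two options reach 2X; both give 2.5X.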
100% A benchmark program that does no useful computation but is carefully constructed to have very similar statistical properties to the code you care about is called a Synthetic benchmark.

Computer Organization and Design.