EE380 Assignment 1 Solution

  1. 100% From EE280, you should know how to build a decoder. How do the circuit characteristics of a decoder change in response to a linear increase in the number of address bits handled by the decoder?
    The delay through the decoder increases linearly
    The total number of gates needed increases linearly
    The delay through the decoder increases exponentially
    The total number of gates needed increases logarithmically
    The number of memory addresses that can be decoded increases linearly
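    The scaling behaviour asked about here can be explored with a small model. The sketch below assumes a tree-structured decoder built from 2-input AND gates, where each added address bit adds one more level of gating; that structure, and the function name decoder_stats, are illustrative assumptions rather than the EE280 design itself.

```python
def decoder_stats(address_bits: int) -> dict:
    """Rough size/depth model of an assumed tree decoder with 2-input AND gates."""
    outputs = 2 ** address_bits  # one select line per decodable address
    # Level k of the assumed tree drives 2**k lines, one AND gate per line.
    and_gates = sum(2 ** k for k in range(1, address_bits + 1))
    inverters = address_bits  # one inverter per address bit for its complement
    return {
        "outputs": outputs,
        "gates": and_gates + inverters,
        "levels": address_bits,  # gate levels on the longest path in this model
    }


for n in range(1, 6):
    print(n, decoder_stats(n))
```

    Under this model the number of gate levels grows by one per address bit, while the output count and gate count roughly double per address bit.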
  2. 100% Which of the following statements is/are true?
    The MFC signal is not used when writing to memory
    Allowing a few simultaneous writers to a bus is easy
    DRAM usually takes less power per bit held than SRAM
    In general, outputs are enabled at the start of a clock cycle
    There is a race between the address and data lines when writing to memory
  3. 100% Given this processor hardware design, add control states to the following to implement an XOR-with-immediate instruction (as decoded by the when below), such that xori $rt,$rs,immed yields rt=(rs^immed). This is actually a MIPS instruction, as we'll discuss later. Hint: it's a lot like the addi given to you, isn't it? You should add initial values and test your design using the simulator before submitting it here.
  4. 100% Given this processor hardware design, add control states to the following to implement an exchange-with-memory instruction (as decoded by the when below), such that xchg $rt,immed($rs) swaps the values in register rt and memory[rs+immed]. Hint: swaps are usually done using a temporary register -- Y is a good choice. Another hint: you only need to compute rs+immed once, because reading from memory doesn't change the value in the MAR. You should add initial values and test your design using the simulator before submitting it here.
  5. 100% What high-level languages call goto is usually called a jump in assembly language. Our goto is implemented by loading the PC with the address we wish to go to... which is very easy, except that the address is 32 bits long, so it cannot fit in a 32-bit instruction word alongside an opcode field. Thus, our goto place instruction will be encoded in two consecutive words: the first holds the goto opcode, and the second holds the address place. Implement this jump instruction. Hint: the PC is pointing at the second word of the instruction as the Goto control state is entered. You should add initial values and test your design using the simulator before submitting it here.
  6. 100% Given this processor hardware design and the control sequence below, describe in words (or C-like pseudo code) the function of the instruction xyzzy $rd.
    when op() op(1) Xyzzy
    
    Start:
     PCout, MARin, MEMread, Yin
     CONST(4), ALUadd, Zin, UNTILmfc
     MDRout, IRin
     Zout, PCin, JUMPonop
     HALT /* Should end here on undecoded op */
    
    Xyzzy:
     SELrd, REGout, Yin
     CONST(-1), ALUxor, Zin
     Zout, SELrd, REGin, JUMP(Start)
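
    One way to read the Xyzzy states is to mirror them as register transfers. The sketch below assumes 32-bit registers and that CONST(-1) supplies the two's-complement all-ones pattern; the function name xyzzy and the variables y and z are illustrative stand-ins for the Y and Z latches.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit register width


def xyzzy(rd_value: int) -> int:
    """Mirror the three Xyzzy control states as plain register transfers."""
    y = rd_value & MASK32     # SELrd, REGout, Yin
    z = y ^ (-1 & MASK32)     # CONST(-1), ALUxor, Zin
    return z & MASK32         # Zout, SELrd, REGin


print(hex(xyzzy(0x0000000F)))
```

    Under those assumptions, XOR with all ones flips every bit, so the result matches a C-like rd = ~rd;.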
    

  7. 100% Given the xyzzy rd instruction as defined above, and assuming that a memory load request takes 3 clock cycles to complete (after MEMread has been issued), how many clock cycles would it take to execute each xyzzy instruction? You may use the simulator to get or check your answer. In any case, give and briefly explain your answer here:
  8. 100% Given this processor hardware design, suppose that the following control state is the limiting factor in determining the maximum clock speed. Given that the propagation delay associated with Zin is 1ns, MARin is 4ns, SELrd is 8ns, ALUadd is 16ns, and REGout is 2ns, what is the period (in nanoseconds) of the fastest allowable clock? You may use the simulator to get or check your answer. In any case, give and briefly explain your answer here:
    Zin, MARin, SELrd, ALUadd, REGout
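
    The clock period is set by the slowest chain of dependent signals, not by the sum of all five delays. The sketch below assumes the usual single-bus ordering: SELrd must settle before REGout drives the bus, and the bus value then feeds both the ALU (latched by Zin) and the MAR. Those dependency chains are an assumption about the datapath, not something stated in the question.

```python
DELAY_NS = {"Zin": 1, "MARin": 4, "SELrd": 8, "ALUadd": 16, "REGout": 2}

# Assumed dependency chains within the single control state.
PATHS = [
    ["SELrd", "REGout", "ALUadd", "Zin"],  # register -> bus -> ALU -> Z
    ["SELrd", "REGout", "MARin"],          # register -> bus -> MAR
]


def path_delay(path):
    """Total propagation delay along one chain of dependent signals."""
    return sum(DELAY_NS[signal] for signal in path)


def min_clock_period(paths):
    """The clock must be at least as long as the slowest chain."""
    return max(path_delay(p) for p in paths)


print(min_clock_period(PATHS), "ns")
```

    With these chains the ALU path (8 + 2 + 16 + 1) dominates the MAR path (8 + 2 + 4), so the period under this model is 27ns.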
    

  9. 100% Given this processor hardware design, add control states to the following to implement a multiply-by-3 instruction (as decoded by the when below), such that mul3 rd makes rd=3*rd;. Note that there is no multiplier per se in the ALU. You should add initial values and test your design using the simulator before submitting it here.

    You can test your code with:

    MEM[0]=op(1)+rd(8)
    MEM[4]=0
    $8=5
    

    Register $8 should end up holding the value 0x0000000f.
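
    Because the ALU has no multiplier, 3*rd can be built from two additions. The sketch below models that idea at the register-transfer level; it assumes 32-bit wraparound arithmetic, and the names mul3, y, and z are illustrative stand-ins for the instruction and the Y and Z latches, not simulator syntax.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit register width


def mul3(rd_value: int) -> int:
    """Compute 3*rd using only additions, as (rd + rd) + rd."""
    rd_value &= MASK32
    y = rd_value                  # copy rd into the Y latch
    z = (y + rd_value) & MASK32   # first add: Z = 2*rd
    y = z                         # move the partial sum back into Y
    z = (y + rd_value) & MASK32   # second add: Z = 3*rd
    return z                      # written back to rd


print(hex(mul3(5)))
```

    For the test value $8=5 this model gives 0xf, which agrees with the expected register contents stated above.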

  10. 100% Given this processor hardware design, add control states to the following to implement a clear memory instruction (as decoded by the when below), such that clr immed(rs) places the value zero in memory, i.e., mem[rs+immed]=0;. You should add initial values and test your design using the simulator before submitting it here.

    You can test your code with:

    MEM[0]=op(2)+rs(1)+immed(40)
    MEM[4]=0
    MEM[44]=601
    $1=4
    

    Memory MEM[44] should end up holding the value 0x00000000.

  11. 100% A particular program expressed in a particular ISA executes 100 ALU instructions, 5 Loads, 8 Stores, and 2 Branches. A simple, non-pipelined implementation of that ISA takes 8 CPI for each ALU instruction, 20 CPI for each Load, 10 CPI for each Store, and 10 CPI for each Branch. The original clock period is 10ns. How many clock cycles would the program take to execute? How many microseconds would the program take to execute?
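
    The arithmetic for this question is a weighted sum of instruction counts and CPI values, multiplied by the clock period. The sketch below works it through with the numbers given; the dictionary and function names are illustrative.

```python
COUNTS = {"alu": 100, "load": 5, "store": 8, "branch": 2}
CPI = {"alu": 8, "load": 20, "store": 10, "branch": 10}
CLOCK_PERIOD_NS = 10


def total_cycles(counts, cpi):
    """Sum of (instruction count * cycles per instruction) over all classes."""
    return sum(counts[kind] * cpi[kind] for kind in counts)


def run_time_us(counts, cpi, period_ns):
    """Execution time in microseconds (1 us = 1000 ns)."""
    return total_cycles(counts, cpi) * period_ns / 1000


print(total_cycles(COUNTS, CPI), "cycles")
print(run_time_us(COUNTS, CPI, CLOCK_PERIOD_NS), "us")
```

    That is 800 + 100 + 80 + 20 = 1000 cycles, which at 10ns per cycle is 10 microseconds.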
  12. 100% For this question, check all that apply. Given the circumstances described in the previous question, which of the following changes by itself would yield at least 2X speedup?
    Adding a cache memory reduces both Load and Store to 4 CPI
    An improved design reduces the CPI for ALU instructions from 8 to 4
    A new compiler reduces the number of ALU instructions from 100 to 50
    New VLSI fabrication technology changes the clock frequency to 250MHz
    A multi-cycle ALU makes ALU instructions take 18 CPI, but allows a 2ns clock period
    (100*18 + 5*20 + 8*10 + 2*10) = 2000 cycles @ 2ns = 4us
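
    Each option can be checked by recomputing the run time with only that one change applied and dividing the baseline time by the result. The sketch below does this for the five options as worded above; the option labels and helper names are illustrative, and the baseline figures are the ones given in the previous question.

```python
BASE_COUNTS = {"alu": 100, "load": 5, "store": 8, "branch": 2}
BASE_CPI = {"alu": 8, "load": 20, "store": 10, "branch": 10}
BASE_PERIOD_NS = 10.0


def run_time_ns(counts, cpi, period_ns):
    """Total cycles times clock period."""
    return sum(counts[k] * cpi[k] for k in counts) * period_ns


def speedup(counts=None, cpi=None, period_ns=BASE_PERIOD_NS):
    """Baseline time divided by the time with the given overrides applied."""
    new_counts = {**BASE_COUNTS, **(counts or {})}
    new_cpi = {**BASE_CPI, **(cpi or {})}
    base = run_time_ns(BASE_COUNTS, BASE_CPI, BASE_PERIOD_NS)
    return base / run_time_ns(new_counts, new_cpi, period_ns)


OPTIONS = {
    "cache (Load/Store at 4 CPI)": speedup(cpi={"load": 4, "store": 4}),
    "ALU CPI 8 -> 4": speedup(cpi={"alu": 4}),
    "ALU count 100 -> 50": speedup(counts={"alu": 50}),
    "250MHz clock (4ns period)": speedup(period_ns=1000.0 / 250.0),
    "18 CPI ALU with 2ns clock": speedup(cpi={"alu": 18}, period_ns=2.0),
}

for name, s in OPTIONS.items():
    print(f"{name}: {s:.2f}x", "(at least 2X)" if s >= 2.0 else "")
```

    Under these numbers only the 250MHz clock and the multi-cycle ALU with a 2ns clock reach 2X; both come out at 2.5X, while the other three stay below 1.7X.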
  13. 100% A benchmark program that does no useful computation but is carefully constructed to have very similar statistical properties to the code you care about is called a synthetic benchmark.


EE380 Computer Organization and Design.