-
100%
From EE280, you should know how to build a decoder. How do the
circuit characteristics of a decoder change in response to a
linear increase in the number of address bits handled by the
decoder?
The delay through the decoder increases linearly
The total number of gates needed increases linearly
The delay through the decoder increases exponentially
The total number of gates needed increases logarithmically
The number of memory addresses that can be decoded increases linearly
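As a reminder of what a decoder does, here is a minimal behavioral sketch in Python (not a gate-level model). It turns an n-bit address into a one-hot list of select lines. The function name `decode` and the list representation are assumptions made for illustration only; they do not come from EE280 or the simulator.

```python
def decode(address: int, n_bits: int) -> list[int]:
    """Return one-hot select lines for an n_bits-wide address (behavioral model)."""
    n_outputs = 1 << n_bits  # one select line per distinct address
    if not 0 <= address < n_outputs:
        raise ValueError("address does not fit in n_bits")
    return [1 if line == address else 0 for line in range(n_outputs)]


if __name__ == "__main__":
    # Show how the number of select lines changes as address bits are added.
    for n_bits in range(1, 5):
        print(n_bits, "address bits ->", len(decode(0, n_bits)), "select lines")
```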
-
100%
Which of the following statements is/are true?
The MFC signal is not used when writing to memory
Allowing a few simultaneous writers to a bus is easy
DRAM usually takes less power per bit held than SRAM
In general, outputs are enabled at the start of a clock cycle
There is a race between the address and data lines when writing to memory
-
100%
Given this processor
hardware design, add control states to the following to
implement an XOR-with-immediate instruction (as decoded by the when
below), such that xori $rt,$rs,immed yields rt=(rs^immed).
This is actually a MIPS instruction, as we'll discuss later.
Hint: it's a lot like the addi given to you, isn't it?
You should add initial values and test your design using the
simulator before submitting it here.
-
100%
Given this processor hardware design, add control states to the
following to implement an exchange-with-memory instruction (as decoded
by the when below), such that xchg $rt,immed($rs) swaps the
values in register rt and memory[rs+immed].
Hint: swaps are usually done using a temporary register --
Y is a good choice.
Another hint: you only need to compute rs+immed once,
because reading from memory doesn't change the value in the MAR.
You should add initial values and test your design using the
simulator before submitting it here.
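Before working out the control states, it can help to write down the intended effect of xchg as a small behavioral model. The Python sketch below is only that: `regs`, `mem`, and the function name are hypothetical, and it does not use the simulator's control-signal syntax. It follows the two hints by computing the address once and parking the old memory value in a temporary.

```python
def xchg(regs: dict, mem: dict, rt: int, rs: int, immed: int) -> None:
    """Swap register rt with the memory word at address regs[rs] + immed."""
    addr = (regs[rs] + immed) & 0xFFFFFFFF  # computed once, like a value held in the MAR
    temp = mem.get(addr, 0)                 # old memory value, parked in a temporary (Y's role)
    mem[addr] = regs[rt]                    # write the register value to memory
    regs[rt] = temp                         # finish the swap


if __name__ == "__main__":
    regs = {1: 4, 2: 0x1234}   # made-up initial values for illustration
    mem = {44: 601}
    xchg(regs, mem, rt=2, rs=1, immed=40)
    print(regs[2], hex(mem[44]))
```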
-
100%
What high-level languages call goto is usually called a
jump in assembly language. Our goto is implemented by loading
the PC with the address we wish to go to, which is very easy except
that the address is 32 bits long, so it cannot fit in a 32-bit
instruction word alongside an opcode field. Thus, our goto place
instruction will be encoded in two consecutive words: the first holds
the goto opcode, and the second holds the address place.
Implement this jump instruction.
Hint: the PC is pointing at the second word of the instruction
as the Goto control state is entered.
You should add initial values and test your design using the
simulator before submitting it here.
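The effect of the two-word goto can be modeled in a few lines of Python. This is a behavioral sketch under stated assumptions, not a control sequence for the simulator: memory is a dict keyed by byte address, the opcode word is a placeholder value, and the name `goto` is hypothetical. As in the hint, the PC already points at the second word on entry.

```python
def goto(pc: int, mem: dict) -> int:
    """Return the new PC for a two-word goto.

    On entry, pc already points at the second word, which holds the target address.
    """
    return mem[pc] & 0xFFFFFFFF


if __name__ == "__main__":
    GOTO_OPCODE_WORD = 0xDEAD0000  # placeholder encoding, not the real opcode
    mem = {0: GOTO_OPCODE_WORD, 4: 0x40}
    pc = 4  # fetching word 0 has already advanced the PC by 4
    pc = goto(pc, mem)
    print(hex(pc))
```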
-
100%
Given this processor
hardware design and the control sequence below, describe in
words (or C-like pseudo code) the function of the instruction
xyzzy $rd.
when op() op(1) Xyzzy
Start:
PCout, MARin, MEMread, Yin
CONST(4), ALUadd, Zin, UNTILmfc
MDRout, IRin
Zout, PCin, JUMPonop
HALT /* Should end here on undecoded op */
Xyzzy:
SELrd, REGout, Yin
CONST(-1), ALUxor, Zin
Zout, SELrd, REGin, JUMP(Start)
-
100%
Given the xyzzy rd instruction as defined
above, and assuming that a memory load request takes 3
clock cycles to complete (after MEMread has
been issued), how many clock cycles would it take to execute
each xyzzy instruction? You may use the simulator to
get or check your answer. In any case, give and briefly explain
your answer here:
-
100%
Given this processor
hardware design, suppose that the following control state is
the limiting factor in determining the maximum clock speed.
Given that the propagation delay associated with Zin
is 1ns, MARin is 4ns, SELrd is 8ns,
ALUadd is 16ns, and REGout is 2ns, what is the
period (in nanoseconds) of the fastest allowable clock? You may
use the simulator to get or check your answer. In any case, give
and briefly explain your answer here:
Zin, MARin, SELrd, ALUadd, REGout
-
100%
Given this processor
hardware design, add control states to the following to
implement a multiply-by-3 instruction (as decoded by the when
below), such that mul3 rd makes rd=3*rd;.
Note that there is no multiplier per se in the ALU.
You should add initial values and test your design using the
simulator before submitting it here.
You can test your code with:
MEM[0]=op(1)+rd(8)
MEM[4]=0
$8=5
Register $8 should end up holding the value 0x0000000f.
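Since the ALU has no multiplier, one possible approach is to build 3*rd from additions. The Python sketch below models only the arithmetic, with 32-bit wraparound; the name `mul3` and the two-step order are illustrative assumptions, and the matching control states depend on the hardware design.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register


def mul3(value: int) -> int:
    """Triple a 32-bit value using two additions instead of a multiply."""
    doubled = (value + value) & MASK32   # first pass through the adder: 2*rd
    return (doubled + value) & MASK32    # second pass: 2*rd + rd = 3*rd


if __name__ == "__main__":
    print(f"0x{mul3(5):08x}")  # prints 0x0000000f, matching the test case above
```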
-
100%
Given this processor
hardware design, add control states to the following to
implement a clear memory instruction (as decoded by the when
below), such that clr immed(rs) places the value
zero in memory, i.e., mem[rs+immed]=0;.
You should add initial values and test your design using the
simulator before submitting it here.
You can test your code with:
MEM[0]=op(2)+rs(1)+immed(40)
MEM[4]=0
MEM[44]=601
$1=4
Memory MEM[44] should end up holding the value 0x00000000.
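The test case can be checked against a behavioral model of clr. The Python sketch below is an assumption-laden illustration: `regs` and `mem` are plain dicts, the function name is hypothetical, and the instruction word at MEM[0] is not encoded. It shows that with $1=4 and immed=40 the cleared word is MEM[44].

```python
def clr(regs: dict, mem: dict, rs: int, immed: int) -> int:
    """Store zero at address regs[rs] + immed and return that address."""
    addr = (regs[rs] + immed) & 0xFFFFFFFF  # effective address, 32-bit wraparound
    mem[addr] = 0
    return addr


if __name__ == "__main__":
    regs = {1: 4}
    mem = {4: 0, 44: 601}
    addr = clr(regs, mem, rs=1, immed=40)
    print(addr, f"0x{mem[addr]:08x}")
```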
-
100%
A particular program expressed in a particular ISA executes 100
ALU instructions, 5 Loads, 8 Stores, and 2 Branches. A simple,
non-pipelined, implementation of that ISA takes 8 CPI for each
ALU instruction, 20 CPI for each load, 10 CPI for each Store,
and 10 CPI for each Branch. The original clock period is 10ns.
How many clock cycles would the program take to execute?
How many microseconds would the program take to execute?
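The arithmetic can be set up as a weighted sum: cycles = sum over instruction classes of (count × CPI), and time = cycles × clock period. The Python sketch below uses the numbers from the question; the dictionary keys and function names are illustrative choices.

```python
COUNTS = {"alu": 100, "load": 5, "store": 8, "branch": 2}   # instructions executed
CPI = {"alu": 8, "load": 20, "store": 10, "branch": 10}     # cycles per instruction
CLOCK_PERIOD_NS = 10


def total_cycles(counts: dict, cpi: dict) -> int:
    """Sum count * CPI over every instruction class."""
    return sum(counts[kind] * cpi[kind] for kind in counts)


def run_time_us(cycles: int, period_ns: float) -> float:
    """Convert a cycle count to microseconds (1 us = 1000 ns)."""
    return cycles * period_ns / 1000


if __name__ == "__main__":
    cycles = total_cycles(COUNTS, CPI)
    print(cycles, "cycles")
    print(run_time_us(cycles, CLOCK_PERIOD_NS), "microseconds")
```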
-
100%
For this question, check all that apply.
Given the circumstances described in the previous question, which of
the following changes by itself would yield at least 2X speedup?
Adding a cache memory reduces both Load and Store to 4 CPI
An improved design reduces the CPI for ALU instructions from 8 to 4
A new compiler reduces the number of ALU instructions from 100 to 50
New VLSI fabrication technology changes the clock frequency to 250MHz
A multi-cycle ALU makes ALU instructions take 18 CPI, but allows a 2ns clock period
1000 @ 2ns = 2us
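Each candidate change can be checked by recomputing execution time and dividing the baseline time by the new time. The Python sketch below is one way to do that; the helper names and the short labels for the options are assumptions, and the baseline numbers are taken from the previous question.

```python
BASE_COUNTS = {"alu": 100, "load": 5, "store": 8, "branch": 2}
BASE_CPI = {"alu": 8, "load": 20, "store": 10, "branch": 10}
BASE_PERIOD_NS = 10.0


def run_time_ns(counts: dict, cpi: dict, period_ns: float) -> float:
    """Execution time = (sum of count * CPI) * clock period."""
    return sum(counts[k] * cpi[k] for k in counts) * period_ns


def speedup(counts=None, cpi=None, period_ns=BASE_PERIOD_NS) -> float:
    """Speedup over the baseline after overriding some counts, CPIs, or the period."""
    new_counts = {**BASE_COUNTS, **(counts or {})}
    new_cpi = {**BASE_CPI, **(cpi or {})}
    base = run_time_ns(BASE_COUNTS, BASE_CPI, BASE_PERIOD_NS)
    return base / run_time_ns(new_counts, new_cpi, period_ns)


if __name__ == "__main__":
    candidates = {
        "cache": speedup(cpi={"load": 4, "store": 4}),
        "faster ALU": speedup(cpi={"alu": 4}),
        "new compiler": speedup(counts={"alu": 50}),
        "250MHz clock": speedup(period_ns=1000 / 250),  # 250 MHz -> 4 ns period
        "multi-cycle ALU": speedup(cpi={"alu": 18}, period_ns=2),
    }
    for name, factor in candidates.items():
        print(f"{name}: {factor:.2f}x")
```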
-
100%
A benchmark program that does no useful computation but is carefully
constructed to have very similar statistical properties to the code you care
about is called a ______ benchmark.