EE380 Practice Assignment Solution Key

Here are answers for the practice assignment.


  1. In a typical Athlon XP or Pentium 4 PC, which one of the following four memory subsystems is not found on the processor chip?
    TLB
    L1 Cache
    L2 Cache
    Registers
    All four of the above typically are on the processor chip
  2. Which one of the following four statements about the memory hierarchy is false?
    For comparable cache size, a direct-mapped cache is easier to build (simpler logic) than a set-associative cache
    Temporal Locality refers to an object being likely to be referenced again soon after being referenced once
    Modern processors often have separate caches for instructions and data
    Larger cache line sizes take better advantage of Spatial Locality
    All four of the above statements are true
  3. Consider the following two MIPS subset implementations:


    Which of the following four statements about how pipelining changes the architecture is false?
    The ALU used for operations like add and xor could be the same circuit in both implementations
    The ALU used to add 4 to the PC could be the same circuit in both implementations
    The Instruction Memory module could be the same circuit in both implementations
    The Data Memory module could be the same circuit in both implementations
    None of the above four statements is false; in fact, all of the modules can be the same circuits in both implementations because pipelining only adds buffers, changes/adds some datapaths, and modifies the control logic
  4. Pipelined designs generally achieve higher performance than similar single-cycle designs by allowing a higher clock rate, but the clock rate with a 5-stage pipeline is generally somewhat less than 5X the speed of the single-stage design it was derived from (e.g., compare the two MIPS implementations given in question 3). Give one reason why the clock rate is less than 5X.
  5. The first time a modern processor executes a particular branch instruction, it must compute the target address by adding the offset encoded within the branch instruction to the PC value. However, if the same instruction is executed again soon enough, the processor does not have to recompute the target address. Which hardware structure implements this feature?
    BTB
    TLB
    TLC
    Data Cache
    Instruction Cache
  6. Consider executing each of the following code sequences on the pipelined MIPS implementation given below (which does not incorporate value forwarding):

    Incidentally, both code sequences produce the same final results. Which of the following statements best describes the execution times you would expect to observe?
    (A)  addi $t1,$t0,4
         lw   $t2,0($t0)
         xor  $t2,$t2,$t3
    
    (B)  lw   $t2,0($t0)
         addi $t1,$t0,4
         xor  $t2,$t2,$t3
    

    (A) would be faster than (B)
    (B) would be faster than (A)
    (A) would take the same number of clock cycles as (B)
  7. Consider executing each of the following code sequences on the pipelined MIPS implementation given below:

    Also consider executing them on this design with value forwarding logic and datapaths added. Which of the following statements best describes how the forwarding logic would alter the execution times?
    (A)  lw   $t1,4($t0)
         sw   $t1,16($t2)
         beq  $t1,$t3,lab
    
    (B)  lw   $t1,4($t0)
         sw   $t2,16($t3)
         beq  $t0,$t3,lab
    

    Neither (A) nor (B) is affected by forwarding
    (A) is not affected, (B) would be faster using forwarding
    (A) would be faster using forwarding, (B) is not affected
    Both (A) and (B) would be faster using forwarding
    The execution time improvements due to forwarding depend on the values in the registers, not on the instructions being executed; thus, it is impossible to say how execution times for (A) and (B) are affected
  8. The Intel Pentium 4 has gone through several revisions; the following diagram shows the internals of the version known as Prescott. According to the diagram, which of the following techniques is not used in this design?

    Branch prediction
    Set-associative cache
    Separate L2 caches for code and data
    Superscalar execution of integer arithmetic
    Instruction scheduling with register renaming
  9. Suppose that a simple system has a single cache with an access time of 3 clock cycles. Cache misses are satisfied with an average memory latency of 200 clock cycles. Assuming a cache hit ratio of 0.9 (90%), how long does the average reference take? Don't worry about the numerical value of the answer; just show the formula that would give the answer.
  10. Given the C declarations int a[N][N]; and int i, j; a compiler would allocate N*N words in memory for the array a, such that a[i][j] is (i*N)+j words after the memory location that holds a[0][0]. Given that N is large, which of the following two loop nests is likely to execute faster, and why?
    (1) for (i=0; i<N; ++i) for (j=0; j<N; ++j) a[i][j] = 0;
    (2) for (j=0; j<N; ++j) for (i=0; i<N; ++i) a[i][j] = 0;
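
For question 10, the two loop nests can be written as complete C functions so that their access patterns are easier to compare. This is only an illustrative sketch: the value of N, the function names, and the word_offset helper are assumptions introduced here and are not part of the assignment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size for illustration; the question only says N is large. */
#define N 1024

static int a[N][N];

/* Word offset of m[i][j] from m[0][0], as stated in the question. */
size_t word_offset(size_t i, size_t j) {
    assert(i < N && j < N);
    return i * N + j;
}

/* Loop nest (1): j varies fastest, so consecutive stores are 1 word apart. */
void clear_row_order(int m[N][N]) {
    int i, j;
    for (i = 0; i < N; ++i)
        for (j = 0; j < N; ++j)
            m[i][j] = 0;
}

/* Loop nest (2): i varies fastest, so consecutive stores are N words apart. */
void clear_column_order(int m[N][N]) {
    int i, j;
    for (j = 0; j < N; ++j)
        for (i = 0; i < N; ++i)
            m[i][j] = 0;
}
```

Both functions leave the array in the same final state; they differ only in the order of the stores. With loop nest (1), successive stores fall in the same cache line until the line is exhausted, whereas with loop nest (2) each store lands N words away from the previous one, so for large N it typically touches a different line every time.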
    
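
For question 9, the average reference time is a weighted sum over hits and misses. The question does not say whether the 200-cycle memory latency already includes the 3-cycle cache access, so the sketch below shows both common readings. The function and parameter names are assumptions made for illustration, not notation from the course.

```c
#include <assert.h>

/* Reading A (assumption): a miss costs the memory latency alone.
 * average = hit_ratio * hit_cycles + (1 - hit_ratio) * miss_cycles */
double avg_cycles_weighted(double hit_cycles, double miss_cycles,
                           double hit_ratio) {
    assert(hit_ratio >= 0.0 && hit_ratio <= 1.0);
    return hit_ratio * hit_cycles + (1.0 - hit_ratio) * miss_cycles;
}

/* Reading B (assumption): every reference pays the cache access time,
 * and a miss adds the memory latency on top of it.
 * average = hit_cycles + (1 - hit_ratio) * miss_penalty */
double avg_cycles_hit_plus_penalty(double hit_cycles, double miss_penalty,
                                   double hit_ratio) {
    assert(hit_ratio >= 0.0 && hit_ratio <= 1.0);
    return hit_cycles + (1.0 - hit_ratio) * miss_penalty;
}
```

With the numbers from the question (3 cycles, 200 cycles, hit ratio 0.9), reading A gives 0.9*3 + 0.1*200 = 22.7 cycles and reading B gives 3 + 0.1*200 = 23 cycles. Either formula shows that the 10% of references that miss dominate the average.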


EE380 Computer Organization and Design.