EE380 Assignment 2 Solution
100%
A particular program expressed in a particular ISA executes 100 ALU instructions, 8 Loads, 5 Stores, and 1 Branch. A simple, non-pipelined implementation of that ISA has a CPI of 8 for each ALU instruction, 10 for each Load, 20 for each Store, and 20 for each Branch. The original clock period is 3ns. How many clock cycles would the program take to execute? How many microseconds would the program take to execute?
Type    #    CPI  Clock
ALU     100  8    3ns
Load    8    10   3ns
Store   5    20   3ns
Branch  1    20   3ns

(100*8 + 8*10 + 5*20 + 1*20) * 3ns
= (800 + 80 + 100 + 20) * 3ns
= (1000 clock cycles) * 3ns
= 3000ns = 3us
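The same arithmetic can be checked with a short script. This is only a sketch: the names MIX, CLOCK_NS, and total_cycles are made up for illustration and do not come from the assignment.

```python
# Instruction mix from question 1: type -> (instruction count, CPI).
# MIX, CLOCK_NS, and total_cycles are illustrative names, not from the assignment.
MIX = {
    "ALU": (100, 8),
    "Load": (8, 10),
    "Store": (5, 20),
    "Branch": (1, 20),
}
CLOCK_NS = 3  # clock period in nanoseconds


def total_cycles(mix):
    """Sum count * CPI over every instruction type."""
    return sum(count * cpi for count, cpi in mix.values())


cycles = total_cycles(MIX)
time_ns = cycles * CLOCK_NS
print(cycles, "clock cycles")
print(time_ns, "ns =", time_ns / 1000, "us")
```

Storing the mix as (count, CPI) pairs makes it easy to change one entry and recompute the total.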
100%
For this question, check all that apply.
Given the circumstances described in question 1 above, which of the following changes by itself would yield at least 2X speedup?
Adding a cache memory reduces both Load and Store to 4 CPI
Can't -- ALU instructions are unaffected and account for 80% of the time (800 of 1000 cycles), so the speedup is at most 1.25X
An improved design reduces the CPI for ALU instructions from 8 to 4
Can't -- this only halves the ALU time (800 to 400 cycles); the other 200 cycles are unaffected, so the speedup is 1000/600 = 1.67X
A new compiler reduces the number of ALU instructions from 100 to 50
Can't -- this only halves the ALU time (50*8 = 400 cycles); the other 200 cycles are unaffected, so the speedup is 1000/600 = 1.67X
New VLSI fabrication technology changes the clock frequency to 500MHz
Can't -- 500MHz is a 2ns clock period, which is longer than the 1.5ns (half of 3ns) needed for 2X
A multi-cycle ALU makes ALU instructions take 15 CPI, but allows a 1ns clock period
(100*15 + 8*10 + 5*20 + 1*20) * 1ns = 1700ns... still not fast enough! A 2X speedup needs 1500ns or less.
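All five options can be compared against the 3000ns baseline in one pass. The sketch below assumes the same (count, CPI) layout as the table in question 1; the option labels and helper names are invented for illustration.

```python
# Baseline from question 1: type -> (instruction count, CPI), 3 ns clock.
# All names here are illustrative, not from the assignment.
BASE_MIX = {"ALU": (100, 8), "Load": (8, 10), "Store": (5, 20), "Branch": (1, 20)}
BASE_CLOCK_NS = 3.0


def run_time_ns(mix, clock_ns):
    """Execution time = total cycles * clock period."""
    return sum(count * cpi for count, cpi in mix.values()) * clock_ns


def changed(**updates):
    """Return a copy of BASE_MIX with some entries replaced."""
    mix = dict(BASE_MIX)
    mix.update(updates)
    return mix


# Each option: (instruction mix, clock period in ns).
options = {
    "cache (Load/Store 4 CPI)": (changed(Load=(8, 4), Store=(5, 4)), 3.0),
    "ALU CPI 8 -> 4": (changed(ALU=(100, 4)), 3.0),
    "ALU count 100 -> 50": (changed(ALU=(50, 8)), 3.0),
    "500 MHz clock": (BASE_MIX, 2.0),
    "multi-cycle ALU, 1 ns clock": (changed(ALU=(100, 15)), 1.0),
}

base = run_time_ns(BASE_MIX, BASE_CLOCK_NS)
speedups = {name: base / run_time_ns(mix, clk) for name, (mix, clk) in options.items()}
for name, s in speedups.items():
    verdict = "at least 2X" if s >= 2 else "less than 2X"
    print(f"{name}: {s:.2f}X ({verdict})")
```

Every option comes out below 2X, which matches the explanations given for each choice.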
100%
What is a synthetic benchmark?
A simple program constructed to have the same statistical properties as the application you care about, but that does not actually perform the same (or any useful) computation; for example, it should have the same mix of different types of instructions.
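As a toy illustration of "same mix, no useful computation", the sketch below builds a shuffled stream of instruction-type labels whose proportions match the mix from question 1. It is an assumption-laden stand-in: a real synthetic benchmark would emit actual instructions, and the names TARGET_MIX and synthetic_stream are hypothetical.

```python
import random
from collections import Counter

# Target instruction mix, borrowed from question 1 purely as an example.
TARGET_MIX = {"ALU": 100, "Load": 8, "Store": 5, "Branch": 1}


def synthetic_stream(mix, scale=1, seed=0):
    """Return a shuffled list of instruction-type labels with the given mix.

    The stream does no useful work; it only reproduces the proportions.
    """
    stream = [op for op, count in mix.items() for _ in range(count * scale)]
    random.Random(seed).shuffle(stream)  # fixed seed keeps runs repeatable
    return stream


stream = synthetic_stream(TARGET_MIX, scale=10)
counts = Counter(stream)
for op, n in counts.most_common():
    print(f"{op}: {n} ({100 * n / len(stream):.1f}%)")
```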
100%
A particular program consists of two functions, a() and b(). Initially, a() takes 10 clock cycles and b() takes 90 clock cycles. What is the maximum possible overall speedup that could be obtained by making changes that only affect the execution speed of b()?
slow original is 10+90 = 100 clocks
fastest possible (not likely) is b takes 0 clocks, so 10+0 = 10 clocks
speedup is 100/10 = 10X
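This limit is an instance of Amdahl's law, and a few lines of code show how the overall speedup approaches 10X as b() gets faster. The function name overall_speedup and the sample speedup factors are chosen here for illustration only.

```python
def overall_speedup(a_cycles, b_cycles, b_speedup):
    """Overall speedup when only b() runs b_speedup times faster."""
    old = a_cycles + b_cycles
    new = a_cycles + b_cycles / b_speedup
    return old / new


# Sample speedup factors for b(); these values are arbitrary examples.
for s in (2, 10, 100, 1_000_000):
    print(f"b() {s}X faster -> overall {overall_speedup(10, 90, s):.3f}X")

# Limiting case: b() takes 0 cycles, so only a()'s 10 cycles remain.
limit = (10 + 90) / 10
print("upper bound:", limit, "X")
```

No finite speedup of b() reaches the bound, because the 10 cycles spent in a() never shrink.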
100%
For this question, check all that apply.
Which of the following statements about performance is/are true?
Many processors now have a tick count performance register
On a single-core processor, Unix user time cannot exceed real time
FLOPS are commonly used to measure supercomputer performance
It is very easy to measure time in a computer to within a processor clock cycle
The SPEC benchmarks are the best predictor of performance for your application
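To experiment with the timing-related statements, the sketch below measures real (wall-clock) time and user CPU time for a small loop, and then looks at the gap between back-to-back timer reads. It relies on Python's standard time.perf_counter, time.perf_counter_ns, and os.times; busy_work and its loop count are hypothetical, and the numbers printed depend on the machine and operating system.

```python
import os
import time


def busy_work(n=200_000):
    """Hypothetical CPU-bound workload used only to have something to time."""
    total = 0
    for i in range(n):
        total += i * i
    return total


real_start = time.perf_counter()
user_start = os.times().user  # user CPU time of this process, in seconds
busy_work()
user_s = os.times().user - user_start
real_s = time.perf_counter() - real_start
print(f"real {real_s:.6f} s, user {user_s:.6f} s")

# Gap between two consecutive timer reads, in nanoseconds.
# This includes the cost of the calls themselves, not just timer resolution.
deltas = []
for _ in range(1000):
    t0 = time.perf_counter_ns()
    t1 = time.perf_counter_ns()
    deltas.append(t1 - t0)
smallest_gap_ns = min(deltas)
print("smallest gap between reads:", smallest_gap_ns, "ns")
print("reported resolution:", time.get_clock_info("perf_counter").resolution, "s")
```

Comparing the smallest gap with a single clock period (for example 3ns in question 1) gives a feel for how fine-grained software timing is on a given system.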
Computer Organization and Design.