
Appendix H

Authors: John Hennessy & David Patterson

Copyright © 2011, Elsevier Inc. All rights reserved.


Figure H.1 A software-pipelined loop chooses instructions from different loop iterations, thus separating
the dependent instructions within one iteration of the original loop. The start-up and finish-up code will
correspond to the portions above and below the software-pipelined iteration.



Figure H.2 The execution pattern for (a) a software-pipelined loop and (b) an unrolled loop. The shaded areas are the times
when the loop is not running with maximum overlap or parallelism among instructions. This occurs once at the beginning and once
at the end for the software-pipelined loop. For the unrolled loop it occurs m/n times if the loop has a total of m iterations and is
unrolled n times. Each block represents an unroll of n iterations. Increasing the number of unrollings will reduce the start-up and
clean-up overhead. The overhead of one iteration overlaps with the overhead of the next, thereby reducing the impact. The total
area under the polygonal region in each case will be the same, since the total number of operations is just the execution rate
multiplied by the time.



Figure H.3 A code fragment and the common path shaded with gray. Moving the assignments to B or C requires a more
complex analysis than for straight-line code. In this section we focus on scheduling this code segment efficiently without hardware
assistance. Predication or conditional instructions, which we discuss in the next section, provide another way to schedule this code.



Figure H.4 This trace is obtained by assuming that the program fragment in Figure H.3 is the inner loop and unwinding it
four times, treating the shaded portion in Figure H.3 as the likely path. The trace exits correspond to jumps off the frequent
path, and the trace entrances correspond to returns to the trace.



Figure H.5 This superblock results from unrolling the code in Figure H.3 four times.



Figure H.11 The performance of four multiple-issue processors for five SPECfp and SPECint benchmarks. The clock rates
of the four processors are Itanium 2 at 1.5 GHz, Pentium 4 Extreme Edition at 3.8 GHz, AMD Athlon 64 at 2.8 GHz, and the IBM
Power5 at 1.9 GHz.

