
Branch Prediction: Case For Branch Prediction When Issue N Instructions Per Clock Cycle

1. Branch prediction is important for processors that can issue multiple instructions per cycle, since branches will arrive faster than the processor can handle without prediction.
2. According to Amdahl's law, the relative impact of control stalls due to branches will be larger for processors that can issue more instructions per cycle.
3. Common branch prediction schemes include 1-bit and 2-bit prediction buffers, correlating branch predictors that use histories of recent branches, and branch target buffers that predict target addresses.

Uploaded by

shubham anurag


Branch Prediction

Case for Branch Prediction when Issue N instructions per clock cycle

1. Branches will arrive up to n times faster in an n-issue processor
2. Amdahl’s Law => relative impact of the control stalls will be larger with the lower potential CPI in an n-issue processor

Conversely, need branch prediction to ‘see’ potential parallelism

• gzip: loop branch A @ 0x1200098d8
  – Executed: 1359575 times
  – Taken: 1359565 times
  – Not-taken: 10 times
  – % time taken: 99% - 100%
  Easy to predict (direction and address)

• gzip: if branch B @ 0x12000fa04
  – Executed: 151409 times
  – Taken: 71480 times
  – Not-taken: 79929 times
  – % time taken: ~47%
  Easy to predict? (maybe not / maybe dynamically)

Branch Backwards

[Figure: % of total branches, taken vs. not taken, plotted against distance of branch target]
Most backward branches are heavily TAKEN
Forward branches slightly more likely to be NOT-TAKEN
Ref: The Effects of Predicated Execution on Branch Prediction

Dynamic Branch Prediction

• Performance = ƒ(accuracy, cost of prediction and misprediction)
• 1-bit branch prediction scheme – simplest.
• Branch History Table (BHT): Lower bits of PC address index table of 1-bit values
  – Says whether or not branch taken last time
  – No address check (saves HW, but may not be right branch)
• Problem: in a loop, 1-bit BHT will cause 2 mispredictions (avg is 9 iterations before exit):
  – End of loop case, when it exits instead of looping as before
  – First time through loop on next time through code, when it predicts exit instead of looping
  – Only 80% accuracy even if loop 90% of the time
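The 80% figure for the 1-bit scheme can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: it assumes a single loop branch that is taken 9 times and then falls through once, with nothing else sharing its history-table entry. The function name and trace are illustrative.

```python
def one_bit_accuracy(outcomes, initial=False):
    """Predict that each branch repeats its previous direction.

    outcomes: list of booleans, True = taken.
    initial:  assumed starting value of the 1-bit entry.
    Returns the fraction of correct predictions.
    """
    last = initial
    correct = 0
    for taken in outcomes:
        if last == taken:
            correct += 1
        last = taken  # the 1-bit entry remembers only the latest direction
    return correct / len(outcomes)


# One visit to the loop: 9 taken back-edges, then 1 not-taken exit.
loop_visit = [True] * 9 + [False]
# Many visits in a row, as when the loop sits inside an outer loop.
trace = loop_visit * 1000
accuracy = one_bit_accuracy(trace)
print(f"1-bit accuracy: {accuracy:.0%}")
```

Each visit costs two mispredictions (the exit, then the first iteration of the next visit), so 8 of every 10 predictions are correct even though the branch is taken 90% of the time.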
1-bit predictor

• 1-bit history (direction predictor)
  – Remember the last direction for a branch
[Figure: branchPC indexes the Branch History Table; two states, Predict Taken and Predict Not Taken, with T and NT transitions between them]
How big is the BHT?

Dynamic Branch Prediction (Jim Smith, 1981)

• Better Solution: 2-bit scheme where change prediction only if get misprediction twice:
[Figure: state diagram with four states, 11 and 10 (Predict Taken) and 01 and 00 (Predict Not Taken), with edges labeled T and NT]
• Red: stop, not taken
• Green: go, taken
• Adds hysteresis to decision making process
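To see the effect of the hysteresis, the same loop trace can be run through a 2-bit entry. The sketch below assumes a plain saturating counter (count up on taken, down on not taken, predict taken when the value is 2 or 3). The state diagram on the slide may use slightly different intermediate transitions, so treat this as one common variant and not as the exact machine drawn there.

```python
def two_bit_accuracy(outcomes, state=3):
    """Accuracy of one 2-bit saturating counter on a branch outcome trace.

    state: assumed initial counter value, 0..3 (3 = strongly taken).
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Saturate at 0 and 3 so one odd outcome cannot flip the prediction.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


# Same loop as before: 9 taken iterations, then 1 not-taken exit.
trace = ([True] * 9 + [False]) * 1000
accuracy = two_bit_accuracy(trace)
print(f"2-bit accuracy: {accuracy:.0%}")
```

Only the loop exit is mispredicted, because a single not-taken outcome moves the counter from 3 to 2 and the prediction stays "taken" for the next visit.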

2-bit predictor

• 2-bit history (direction predictor)
[Figure: branchPC indexes the Branch History Table; each entry is one of four states SN, NT, T, ST, which gives the Prediction]
How big is the BHT?

Correlating Branches (2-level Branch Predictors)

Idea: taken/not taken of recently executed branches is related to behavior of next branch (as well as the history of that branch behavior)
  – Then behavior of recent branches selects between, say, 4 predictions of next branch, updating just that prediction
• (2,2) predictor: 2-bit global, 2-bit local

    if (aa == 2)
        aa = 0;
    if (bb == 2)
        bb = 0;
    if (aa != bb) {

[Figure: Branch address (4 bits) selects among 2-bits per branch local predictors; the 2-bit recent global branch history (01 = not taken then taken) picks the Prediction]
Correlating Branches (2-level Branch Predictors)

(m, n) branch predictor
  m-bit global history
  n-bit local predictor

The number of bits in an (m,n) predictor is: 2^m x n x 2^k
A 16 entry (k=4) (2, 2) predictor has: 2^2 x 2 x 2^4 = 128 bits.

[Figure: Branch address (k bits) selects among n-bits per branch local predictors; the m-bit recent global branch history picks the Prediction]

Example

        BNEZ R1, L1       ; branch b1 (d!=0)
        ADDI R1, R0, #1   ; d==0, so d=1
    L1: SUBI R3, R1, #1
        BNEZ R3, L2       ; branch b2 (d!=1)
        …
    L2:

    if (d == 0)
        d = 1;
    if (d == 1)

Local history only: Behavior of a 1-bit predictor initialized to NT (not taken)

    Initial value of d   Pred b1   real b1   Pred b2   real b2
    2                    NT        T         NT        T
    0                    T         NT        T         NT
    2                    NT        T         NT        T
    0                    T         NT        T         NT
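The storage formula for an (m,n) predictor, 2^m x n x 2^k, is easy to tabulate. The helper below is a small illustrative sketch (the function name is not from the slides). It assumes that "entries" counts the 2^k rows selected by the branch address, so a 1024-entry (2,2) predictor has k = 10.

```python
def predictor_bits(m, n, k):
    """Total state bits of an (m, n) predictor indexed by k branch-address bits."""
    return (2 ** m) * n * (2 ** k)


small = predictor_bits(2, 2, 4)         # the 16-entry (2,2) example
plain_bht = predictor_bits(0, 2, 12)    # 4096-entry 2-bit BHT, no global history
correlating = predictor_bits(2, 2, 10)  # 1024-entry (2,2) predictor

print(small, plain_bht, correlating)
```

Under that reading, the 4096-entry 2-bit BHT and the 1024-entry (2,2) predictor hold the same number of bits, so comparing their accuracy is a comparison at equal storage.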

Example

        BNEZ R1, L1       ; branch b1 (d!=0)
        ADDI R1, R0, #1   ; d==0, so d=1
    L1: SUBI R3, R1, #1
        BNEZ R3, L2       ; branch b2 (d!=1)
        …
    L2:

    if (d == 0)
        d = 1;
    if (d == 1)

The action of the 1-bit predictor with 1-bit correlation -- (1,1) predictor
Initialized to NT/NT

    Initial value of d   Pred b1   real b1   Pred b2   real b2
    2                    NT/NT     T         NT/NT     T
    0                    T/NT      NT        NT/T      NT
    2                    T/NT      T         NT/T      T
    0                    T/NT      NT        NT/T      NT

The action of the 1-bit predictor with 1-bit correlation -- (1,1) predictor
Initialized to T/T

    Initial value of d   Pred b1   real b1   Pred b2   real b2
    2                    T/T       T         T/T       T
    0                    T/T       NT        T/T       NT
    2                    T/NT      T         NT/T      T
    0                    T/NT      NT        NT/T      NT
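The difference between the local-only 1-bit predictor and the (1,1) predictor on this example can be reproduced with a short simulation. This is a sketch under stated assumptions: d alternates 2, 0, 2, 0; in a pair X/Y the first prediction is used when the last executed branch was not taken and the second when it was taken; the global history is assumed to start as "not taken". The function and variable names are illustrative.

```python
def count_mispredictions(d_values, m, init_taken=False):
    """Run branches b1 and b2 with 1-bit entries and m bits of global history.

    m = 0 gives the plain 1-bit predictor, m = 1 the (1,1) predictor.
    """
    size = 2 ** m
    table = {"b1": [init_taken] * size, "b2": [init_taken] * size}
    history = 0                # assumption: history starts as all not-taken
    mask = size - 1
    misses = 0
    for d in d_values:
        b1_taken = d != 0      # BNEZ R1, L1
        if not b1_taken:
            d = 1              # ADDI R1, R0, #1
        b2_taken = d != 1      # BNEZ R3, L2 with R3 = d - 1
        for name, taken in (("b1", b1_taken), ("b2", b2_taken)):
            if table[name][history] != taken:
                misses += 1
            table[name][history] = taken  # 1-bit entry: store last outcome
            history = ((history << 1) | int(taken)) & mask
    return misses


trace = [2, 0, 2, 0]
local_only = count_mispredictions(trace, m=0)
correlated_nt = count_mispredictions(trace, m=1, init_taken=False)
correlated_t = count_mispredictions(trace, m=1, init_taken=True)
print(local_only, correlated_nt, correlated_t)
```

With local history only, all eight predictions are wrong. With one bit of correlation, only two predictions miss for either initialization, which agrees with the tables above.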

Example

[Figure: Branch PC selects a row of 2-bit counters, all initialized to 00; the global history bits (initially 0) select the counter that gives the Prediction, NT or T]

The action of the 2-bit predictor with 1-bit correlation -- (1,2) predictor
Initialized to NT(00)/NT(00)

    Initial value of d   Pred b1          real b1   Pred b2          real b2
    2                    NT(00)/NT(00)    T         NT(00)/NT(00)    T
    0                    NT(01)/NT(00)    NT        NT(00)/NT(01)    NT
    2                    NT(01)/NT(00)    T         NT(00)/NT(01)    T
    0                    T(11)/NT(00)     NT        NT(00)/T(11)     NT

Example

The action of the 2-bit predictor with 2-bit correlation -- (2,2) predictor
Initialized to NT/NT/NT/NT

    Initial value of d   Pred b1        real b1   Pred b2        real b2
    2                    NT/NT/NT/NT    T         NT/NT/NT/NT    T
    0                                   NT                       NT
    2                                   T                        T
    0                                   NT                       NT

Accuracy of Different Schemes

[Figure: Frequency of Mispredictions (0% to 20%) for three predictors: 4,096 entries: 2-bits per entry; Unlimited entries: 2-bits/entry; 1,024 entries (2,2)]

What’s missing in this picture?

Re-evaluating Correlation

• Several of the SPEC benchmarks have fewer than a dozen branches responsible for 90% of taken branches:

    program     branch %   static   # = 90%
    compress    14%        236      13
    eqntott     25%        494      5
    gcc         15%        9531     2020
    mpeg        10%        5598     532
    real gcc    13%        17361    3214

• Real programs + OS more like gcc
• Small benefits beyond benchmarks for correlation? problems with branch aliases?

BHT Accuracy

• Mispredict because either:
  – Wrong guess for that branch
  – Got branch history of wrong branch when index the table
• 4096 entry table programs vary from 1% misprediction (nasa7, tomcatv) to 18% (eqntott), with spice at 9% and gcc at 12%
• For SPEC92, 4096 about as good as infinite table

Tournament Predictors

• Motivation for correlating branch predictors is 2-bit predictor failed on important branches; by adding global information, performance improved
• Tournament predictors: use 2 predictors, 1 based on global information and 1 based on local information, and combine with a selector
• Hopes to select right predictor for right branch (or right context of branch)

7 Branch Prediction Schemes

1. 1-bit Branch-Prediction Buffer
2. 2-bit Branch-Prediction Buffer
3. Correlating Branch Prediction
4. Tournament Branch Predictor
5. Branch Target Buffer
6. Integrated Instruction Fetch Units
7. Return Address Predictors

Parts of the predictor

• Direction Predictor
  – For conditional branches
    » Predicts whether the branch will be taken
  – Examples:
    » Always taken; backwards taken
• Address Predictor
  – Predicts the target address (use if predicted taken)
  – Examples:
    » BTB; Return Address Stack; Precomputed Branch
• Recovery logic
Example

        BNEZ R1, L1       ; branch b1 (d!=0)
        ADDI R1, R0, #1   ; d==0, so d=1
    L1: SUBI R3, R1, #1
        BNEZ R3, L2       ; branch b2 (d!=1)
        …
    L2:

[Figure: 2-bit state diagram with states T(11), T(10), NT(01), NT(00) and T/NT transitions; pattern tables "PC of b1: 00 00 00 00" and "PC of b2: 00 00 00 00"; global history 0 0]

The action of the 2-bit (per address (local) history) predictor with 2-bit (global history) correlation -- (2,2) predictor (GAp predictor (Yeh&Patt’s terminology))
Initialized to NT(00)/NT(00)/NT(00)/NT(00)

    Initial value of d   Pred b1                        real b1   Pred b2                        real b2
    2                    NT(00)/NT(00)/NT(00)/NT(00)    T         NT(00)/NT(00)/NT(00)/NT(00)    T
    0                    NT(01)/NT(00)/NT(00)/NT(00)    NT        NT(00)/NT(01)/NT(00)/NT(00)    NT
    2                    NT(01)/NT(00)/NT(00)/NT(00)    T         NT(00)/NT(01)/NT(00)/NT(00)    T
    0                    T(11)/NT(00)/NT(00)/NT(00)     NT        NT(00)/T(11)/NT(00)/NT(00)     NT

Example

[Figure: 2-bit state diagram with states T(11), T(10), NT(01), NT(00) and T/NT transitions; pattern tables "PC of b1: 10 01 10 00" and "PC of b2: 01 01 00 11"; global history 0 0; the prediction is marked "?"]

The action of the 2-bit (per address (local) history) predictor with 2-bit (global history) correlation -- (2,2) predictor (GAp predictor (Yeh&Patt’s terminology))

    Pred b1                       real b1   Pred b2                       real b2
    T(10)/NT(01)/T(10)/NT(00)     NT        NT(01)/NT(01)/NT(00)/T(11)    T
    NT(00)/NT(01)/T(10)/NT(00)    T         T(11)/NT(01)/NT(00)/T(11)     NT
    NT(00)/T(11)/T(10)/NT(00)     T         T(11)/NT(01)/NT(00)/T(10)     NT
    NT(00)/T(11)/T(11)/NT(00)     T         T(11)/NT(00)/NT(00)/T(10)     NT
    NT(00)/T(11)/T(11)/NT(00)     NT        T(11)/NT(00)/NT(00)/T(10)     NT
    NT(00)/T(11)/T(10)/NT(00)     T         T(10)/NT(00)/NT(00)/T(10)     T
    NT(01)/T(11)/T(10)/NT(00)     NT        T(10)/NT(01)/NT(00)/T(10)     NT
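The rows of this (2,2) table can be regenerated mechanically. The sketch below makes its assumptions explicit: the four entries of each branch are ordered by global history 00, 01, 10, 11 (older outcome first); the history starts at 00; and the 2-bit state machine is the variant in which two wrong outcomes in a row flip the prediction (01 goes to 11 on taken, 10 goes to 00 on not taken). The names `NEXT`, `fmt` and `run` are illustrative and not from the slides.

```python
# Assumed 2-bit state machine: (state, taken) -> next state.
NEXT = {
    (0b00, True): 0b01, (0b00, False): 0b00,
    (0b01, True): 0b11, (0b01, False): 0b00,
    (0b10, True): 0b11, (0b10, False): 0b00,
    (0b11, True): 0b11, (0b11, False): 0b10,
}


def fmt(entries):
    """Render four counters in the slide's notation, e.g. T(10)/NT(01)/..."""
    return "/".join(("T" if s >= 2 else "NT") + f"({s:02b})" for s in entries)


def run(b1_init, b2_init, outcomes):
    """Return one (Pred b1, Pred b2) pair of strings per row of the table."""
    tables = {"b1": list(b1_init), "b2": list(b2_init)}
    history = 0  # 2-bit global history, assumed to start at 00
    rows = []
    for real_b1, real_b2 in outcomes:
        row = []
        for name, taken in (("b1", real_b1), ("b2", real_b2)):
            row.append(fmt(tables[name]))  # table state before the branch
            state = tables[name][history]
            tables[name][history] = NEXT[(state, taken)]
            history = ((history << 1) | int(taken)) & 0b11
        rows.append(tuple(row))
    return rows


# Real outcomes of (b1, b2) read off the table, True = taken.
outcomes = [(False, True), (True, False), (True, False), (True, False),
            (False, False), (True, True), (False, False)]
rows = run([0b10, 0b01, 0b10, 0b00], [0b01, 0b01, 0b00, 0b11], outcomes)
for pred_b1, pred_b2 in rows:
    print(pred_b1, pred_b2)
```

Under these assumptions the printed states match the "Pred b1" and "Pred b2" columns row for row, which is a useful check on how the global history selects and updates a single counter per branch.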
Tournament (hybrid) predictors

[Figure: a Local predictor (e.g. 2-bit) produces Prediction 1 and a Global/gshare predictor (much more state) produces Prediction 2; a Selection table (2-bit state machine) chooses the final Prediction]
How do you select which predictor to use?
How do you update the various predictor/selector?

Tournament Predictor in Alpha 21264

• 4K 2-bit counters to choose from among a global predictor and a local predictor
• Global predictor also has 4K entries and is indexed by the history of the last 12 branches; each entry in the global predictor is a standard 2-bit predictor
  – 12-bit pattern: ith bit 0 => ith prior branch not taken; ith bit 1 => ith prior branch taken;
• Local predictor consists of a 2-level predictor:
  – Top level: a local history table consisting of 1024 10-bit entries; each 10-bit entry corresponds to the most recent 10 branch outcomes for the entry. 10-bit history allows patterns of up to 10 branches to be discovered and predicted.
  – Next level: selected entry from the local history table is used to index a table of 1K entries consisting of 3-bit saturating counters, which provide the local prediction
• Total size: 4K*2 + 4K*2 + 1K*10 + 1K*3 = 29K bits! (~180,000 transistors)
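Two pieces of the tournament design lend themselves to a quick sketch: the storage budget quoted for the Alpha 21264 and the 2-bit selector. The selector policy below is an assumption, since the slides only pose the question: it moves toward whichever predictor was right when the two disagree and stays put otherwise, with values 2 and 3 meaning "use the global predictor". All names are illustrative.

```python
K = 1024


def alpha_21264_bits():
    """Add up the four tables listed for the 21264 tournament predictor."""
    selector = 4 * K * 2         # 4K 2-bit choice counters
    global_pred = 4 * K * 2      # 4K 2-bit counters, indexed by 12-bit history
    local_history = 1 * K * 10   # 1024 entries of 10-bit local history
    local_counters = 1 * K * 3   # 1K 3-bit saturating counters
    return selector + global_pred + local_history + local_counters


def choose(counter, local_pred, global_pred):
    """Pick the final prediction (assumed encoding: 2..3 selects global)."""
    return global_pred if counter >= 2 else local_pred


def update_selector(counter, local_correct, global_correct):
    """Train the 2-bit selector only when exactly one predictor was right."""
    if global_correct and not local_correct:
        return min(counter + 1, 3)
    if local_correct and not global_correct:
        return max(counter - 1, 0)
    return counter


total_bits = alpha_21264_bits()
print(total_bits, total_bits // K)
```

Training only on disagreement keeps the selector from drifting when both predictors are right or both are wrong, since those cases say nothing about which one to trust.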

% of predictions from local predictor in Tournament Prediction Scheme

    benchmark    % from local predictor
    nasa7        98%
    matrix300    100%
    tomcatv      94%
    doduc        90%
    spice        55%
    fpppp        76%
    gcc          72%
    espresso     63%
    eqntott      37%
    li           69%

Accuracy of Branch Prediction

    benchmark    Profile-based   2-bit counter   Tournament
    tomcatv      99%             99%             100%
    doduc        95%             84%             97%
    fpppp        86%             82%             98%
    li           88%             77%             98%
    espresso     86%             82%             96%
    gcc          88%             70%             94%

• Profile: branch profile from last execution (static in that it is encoded in the instruction, but based on a profile)
Accuracy v. Size (SPEC89)

[Figure: curves labeled Local, Correlating and Tournament; vertical axis 0% to 10%, horizontal axis Total predictor size (Kbits), 0 to 128]

Need Address at Same Time as Prediction

• Branch Target Buffer (BTB): Address of branch index to get prediction AND branch address (if taken)
  – Note: must check for branch match now, since can’t use wrong branch address

[Figure: during FETCH, the PC of instruction is compared (=?) with the Branch PC field of the indexed entry; each entry also holds a Predicted PC and extra prediction state bits.
  Yes: instruction is branch and use predicted PC as next PC
  No: branch not predicted, proceed normally (Next PC = PC+4)]
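The lookup described for the Branch Target Buffer can be sketched as a small direct-mapped table with a full-address check. This is an illustrative model only: the class name, the 16-entry size, the 4-byte instruction width, and the policy of storing an entry only for taken branches are assumptions, not details from the slides.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB sketch: low PC bits index, full PC is the tag."""

    def __init__(self, entries=16):
        self.entries = entries
        self.table = [None] * entries  # each slot: (branch_pc, predicted_pc)

    def _index(self, pc):
        return (pc >> 2) % self.entries  # assumes 4-byte instructions

    def next_pc(self, pc):
        """Return the predicted next PC for the instruction fetched at pc."""
        slot = self.table[self._index(pc)]
        if slot is not None and slot[0] == pc:  # the "=?" check in the figure
            return slot[1]                      # hit: use predicted PC
        return pc + 4                           # miss: proceed sequentially

    def record_taken(self, pc, target):
        """Remember the target of a branch that was taken."""
        self.table[self._index(pc)] = (pc, target)


btb = BranchTargetBuffer()
before = btb.next_pc(0x100)   # nothing known yet, so fall through
btb.record_taken(0x100, 0x200)
after = btb.next_pc(0x100)    # now predicted to jump to 0x200
alias = btb.next_pc(0x140)    # same slot, different PC: no match
print(hex(before), hex(after), hex(alias))
```

The aliasing case shows why the address check matters here: unlike a BHT, a wrong match would redirect fetch to another branch's target.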

Overriding Predictors

• Big predictors are slow, but more accurate
• Use a single cycle predictor in fetch
• Start the multi-cycle predictor
  – When it completes, compare it to the fast prediction.
    » If same, do nothing
    » If different, assume the slow predictor is right and flush pipeline.
• Advantage: reduced branch penalty for those branches mispredicted by the fast predictor and correctly predicted by the slow predictor

Predicated Execution

• Avoid branch prediction by turning branches into conditionally executed instructions:
    if (x) then A = B op C else NOP
  – If false, then neither store result nor cause exception
  – Expanded ISA of Alpha, MIPS, PowerPC, SPARC have conditional move; PA-RISC can annul any following instr.
  – IA-64: 64 1-bit condition fields selected so conditional execution of any instruction
  – This transformation is called “if-conversion”
• Drawbacks to conditional instructions
  – Still takes a clock even if “annulled”
  – Stall if condition evaluated late
  – Complex conditions reduce effectiveness; condition becomes known late in pipeline

[Figure: condition x guarding the operation A = B op C]
Pitfall: Sometimes bigger and
Special Case Return Addresses

• Register Indirect branch hard to predict address
• SPEC89 85% such branches for procedure return
• Since stack discipline for procedures, save return address in small buffer that acts like a stack: 8 to 16 entries has small miss rate

Pitfall: Sometimes bigger and dumber is better

• 21264 uses tournament predictor (29 Kbits)
• Earlier 21164 uses a simple 2-bit predictor with 2K entries (or a total of 4 Kbits)
• SPEC95 benchmarks, 21264 outperforms
  – 21264 avg. 11.5 mispredictions per 1000 instructions
  – 21164 avg. 16.5 mispredictions per 1000 instructions
• Reversed for transaction processing (TP) !
  – 21264 avg. 17 mispredictions per 1000 instructions
  – 21164 avg. 15 mispredictions per 1000 instructions
• TP code much larger & 21164 hold 2X branch predictions based on local behavior (2K vs. 1K local predictor in the 21264)
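The small stack-like buffer described under "Special Case Return Addresses" can be modelled in a few lines. This is a sketch with assumed details: the class name is illustrative, the default of 8 entries is taken from the low end of the "8 to 16 entries" range, and overflow is assumed to overwrite the oldest entry.

```python
from collections import deque


class ReturnAddressStack:
    """Fixed-size stack of predicted return addresses."""

    def __init__(self, entries=8):
        # deque(maxlen=...) silently drops the oldest entry on overflow,
        # which stands in for a hardware buffer that wraps around.
        self.stack = deque(maxlen=entries)

    def on_call(self, return_address):
        self.stack.append(return_address)

    def predict_return(self):
        """Pop the most recent return address, or None if the buffer is empty."""
        return self.stack.pop() if self.stack else None


ras = ReturnAddressStack(entries=2)
ras.on_call(0x104)   # outer call
ras.on_call(0x208)   # nested call
ras.on_call(0x30C)   # third level overflows the 2-entry buffer
predictions = [ras.predict_return() for _ in range(3)]
print(predictions)
```

The two innermost returns are predicted correctly and the outermost one has no prediction, which is the kind of miss that a buffer deeper than the usual call depth avoids.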

Dynamic Branch Prediction Summary

• Prediction becoming important part of scalar execution
• Branch History Table: 2 bits for loop accuracy
• Correlation: Recently executed branches correlated with next branch.
  – Either different branches
  – Or different executions of same branches
• Tournament Predictor: more resources to competitive solutions and pick between them
• Branch Target Buffer: include branch address & prediction
• Predicated Execution can reduce number of branches, number of mispredicted branches
• Return address stack for prediction of indirect jump

General speculation

• Control speculation
  – “I think this branch will go to address 90004”
• Data speculation
  – “I’ll guess the result of the load will be zero”
• Memory conflict speculation
  – “I don’t think this load conflicts with any preceding store.”
• Error speculation
  – “I don’t think there were any errors in this calculation”
Speculation in general

• Need to be 100% sure on final correctness!
  – So need a recovery mechanism
  – Must make forward progress!
• Want to speed up overall performance
  – So recovery cost should be low or expected rate of occurrence should be low.
  – There can be a real trade-off on accuracy, cost of recovery, and speedup when correct.
• Should keep the worst case in mind…

Control Flow Speculation

[Figure: binary tree of NT/T branch outcomes, with successive levels labeled tag1, tag2, tag3]
• Leading Speculation
  – Tag speculative instructions
  – Advance branch and following instructions
  – Buffer addresses of speculated branch instructions

Mis-speculation Recovery

[Figure: binary tree of NT/T branch outcomes, with paths labeled tag1, tag2, tag3]
• Eliminate Incorrect Path
  – Must ensure that the mis-speculated instructions produce no side effects
• Start New Correct Path
  – Must have remembered the alternate (non-predicted) path

Mis-speculation Recovery

• Eliminate Incorrect Path
  – Use branch tag(s) to deallocate completion buffer entries occupied by speculative instructions (now determined to be mis-speculated).
  – Invalidate all instructions in the decode and dispatch buffers, as well as those in reservation stations
  How expensive is a misprediction?
• Start New Correct Path
  – Update PC with computed branch target (if it was predicted NT)
  – Update PC with sequential instruction address (if it was predicted T)
  – Can begin speculation once again when encounter a new branch
  How soon can you restart?
Trailing Confirmation

[Figure: binary tree of NT/T branch outcomes, with paths labeled tag1, tag2, tag3]
• Trailing Confirmation
  – When branch is resolved, remove/deallocate speculation tag
  – Permit completion of branch and following instructions

Fast Branch Rewind and Restart:

• Discard all ROB entries (and corresponding operations) younger than the mispredicted branches
• Can restart immediately from the corrected branch target because the ROB has sufficient information (rename & value) to continue from where left off
• Works with nested mispredictions!!

[Figure: ROB drawn from oldest to youngest, with a misprediction followed by "another miss" markers]
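The "discard everything younger than the mispredicted branch" step can be illustrated with a list standing in for the reorder buffer. This is a hypothetical model: entries are plain dictionaries ordered oldest first, and the `tag` field is an illustrative sequence number, not the tagging scheme of any particular processor.

```python
def rewind(rob, branch_tag):
    """Split the ROB at a mispredicted branch.

    rob:        list of entries ordered oldest first (hypothetical layout).
    branch_tag: tag of the branch found to be mispredicted.
    Returns (kept, squashed); the branch itself is kept.
    """
    for position, entry in enumerate(rob):
        if entry["tag"] == branch_tag:
            return rob[: position + 1], rob[position + 1:]
    raise ValueError(f"branch tag {branch_tag} is not in the ROB")


rob = [
    {"tag": 1, "op": "branch"},
    {"tag": 2, "op": "add"},
    {"tag": 3, "op": "branch"},
    {"tag": 4, "op": "load"},
]

# The younger branch (tag 3) resolves first and turns out to be wrong.
rob, squashed_first = rewind(rob, 3)
# Then the older branch (tag 1) is also found to be mispredicted.
rob, squashed_second = rewind(rob, 1)
print([e["tag"] for e in squashed_first], [e["tag"] for e in squashed_second])
```

Because each rewind only truncates the younger end of the buffer, a second misprediction on an older branch can be handled by the same operation, which is the nested case mentioned above.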

Rewinding/Flushing of Rename Table

[Figure: ARF (data, busy, tag) indexed by logical register name; Map Table; PRF (data, rdy) with "next to free" and "next to allocate" pointers; the lookup yields Operand Value/Tag]

• To reinitiate renaming:
  – wait for all instructions older than the rewind point to drain clear of the pipeline and then reset register remapping to null
    Long restart latency
  – Reorder buffer has to remember how to restore the map table to the point of the mispredicted branch
    Complicated multi-cycle logic
  – Cache rename map after branch prediction
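The last option, caching the rename map at each predicted branch, amounts to taking a checkpoint and restoring it on a misprediction. The sketch below is a simplified illustration with assumed names and structure: the map is a dictionary from logical to physical register numbers, and checkpoints are kept in program order so that restoring an older branch also discards the checkpoints of younger ones.

```python
class RenameMap:
    """Logical-to-physical register map with per-branch checkpoints."""

    def __init__(self, num_logical):
        self.map = {r: r for r in range(num_logical)}  # identity to start
        self.checkpoints = []  # (branch_tag, saved_map), oldest first

    def rename(self, logical, physical):
        self.map[logical] = physical

    def checkpoint(self, branch_tag):
        """Save a copy of the map when a branch is predicted."""
        self.checkpoints.append((branch_tag, dict(self.map)))

    def restore(self, branch_tag):
        """Roll back to the map saved at branch_tag, dropping younger ones."""
        while self.checkpoints:
            tag, saved = self.checkpoints.pop()
            if tag == branch_tag:
                self.map = saved
                return
        raise KeyError(branch_tag)


rm = RenameMap(num_logical=4)
rm.rename(1, 10)
rm.checkpoint("b1")      # predicted branch b1
rm.rename(2, 11)         # speculative rename after b1
rm.checkpoint("b2")      # predicted branch b2
rm.rename(1, 12)         # speculative rename after b2
rm.restore("b1")         # b1 was mispredicted
print(rm.map)
```

Restoring is a single copy instead of a walk back through the reorder buffer, at the cost of storing one map per unresolved branch.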
