Query Execution
Query Execution
Chapter 6
Outline
Introduction
An algebra for queries
Introduction to physical query plan operators
One pass algorithms for Database Operations
Nested-loop joins
Two-pass algorithms Based on Sorting
Two-pass algorithms based on hashing
Index-based algorithms
Buffer Management
Overview of parallel algorithms
Introduction
Previous chapters
Finding tuples given a search key
Query processor
Converts queries into a sequence of operations on
the database and executes the same.
Nave conversion may require more time.
Query Processor
Query
Query compilation
Chapter 7
Query execution
Meta data
Query
Compilation
Chapter 6
Query
Execution
data
Query rewrite
Parse tree is converted to initial query plan
Algebraic representation of a query
Query Execution
Execution of relational algebra operations
Union
Intersection
Difference
Selection
Projection
Joins
sorting
Selection:
Where clause
Projection
SECLECT clause
Product
Relations are FROM clause, whose products forms he relation with condition of
WHERE clause and projection of SELECT clause.
JOIN
JOIN, NATURAL JOIN, OUTER JOIN
Duplicate elimination
DISTINCT
Grouping
GROUP BY clause
Sorting
ORDER BY clause
SELECTION
(R)
C
Arithmetic
Comparison
Boolean
PROJECTION
L(R) projection R in list L.
L can have the following elements
An expression xy, where x and y are names
for attributes.
An expression Ez, where E is an expression
involving attributes of R, constants, arithmetic
operators and string operators and z is a new
names of the attribute.
Joins
Natural join
L(C(R X S))
C is the condition that equates all the pairs of R and S that
have the same name
L is a list of all the attributes of R and S, except one copy of
each pair of attributes is omitted.
Theta join
C(R X S), where condition C is of the form like x any
condition y.
Equijoin
C(R X S), where condition C is of equal form like x = y.
Outer join
Adding dangling tuples from R and S
L-170
L-230
L-260
3000
4000
1700
Relation borrower
customer-name
Jones
Smith
Hayes
loan-number
L-170
L-230
L-155
branch-name
L-170
L-230
L-260
L-155
Downtown
Redwood
Perryridge
null
amount customer-name
3000
4000
1700
null
loan-number
branch-name
L-170
L-230
Downtown
Redwood
amount customer-name
3000
4000
Jones
Smith
Borrower
branch-name
Downtown
Redwood
Perryridge
amount
3000
4000
1700
customer-name
Jones
Smith
null
borrower
loan-number
Borrower
L-170
L-230
L-260
loan
Jones
Smith
null
Hayes
loan
loan-number
L-170
L-230
L-155
borrower
branch-name
Downtown
Redwood
null
amount customer-name
3000
4000
null
Jones
Smith
Hayes
Duplicate Elimination
(R ) returns one copy of each tuple.
Group By
The relation is grouped according to the value of
the attribute
Having
Provides a condition
Sorting operator
We use the operator to sort a relation
SQL ORDER BY
L(R), L is a list of attributes (a1, a2,, an)
Expression Trees
Schema
MovieStar(name, address, gender, birthdate)
StarsIn(title, year, starName)
Query
Select title, birthdate
From MovieStar, StarsIn
Where Year = 1996 gender = F
starName=name
Expression Tree 1
19
20
Introduction to Physical
Query Plan Operators
Scanning tables
Read the entire contents of R.
Two basic operators
Get the blocks one by one
Table scan
Three functions
Open:
Starts the process of getting tuples, but does not get a tuple. It initializes
any data structures needed to perform the operation and calls Open for any
orguments of the operation.
GetNext
The function returns the next tuple in the result and adjusts data structures
as necessary to allow subsequent tuples to be obtained. It returns null, if
there are no more tuples produced.
Close
The function ends iteration after all tuples or all the tuples the consumer
wanted, have been obtained.
Open(R){
b:= the first block of R;
t:= the first tuple of block b;
found := true;
}
GetNext {
IF(t is past the last tuple on block b) { increment b to the next block;
IF(there is no next block) {Found:=FALSE; RETURN;}
ELSE /* b is a new block */
t:= first tuple on bock b;
Oldt:=t;
Increment t to the next tuple of b;
RETURN oldt;
}
CLOSE(R) {}
Iterators
Iterators can be combined by calling other
iterators.
Open(R,S){ R.open(); CurRel:=R;}
GetNext(R,S) { IF CurRel=R {t=R.GetNext()
IF (Fund) RETURN t ELSE S.open();
CurRel=S;}} Return S.GetNext();
Close(R,S){ R.Close(); S.Close();}
Three categories
Sorting-based methods
Hash-based methods
Index-based methods
Full relation
Seeing all the tuples in memory once
Grouping
The grouping operation L gives zero or more grouping
attributes and one or more aggregated attributes.
For each of grouping attribute create one entry for each value
and scan each block at a time
For a MIN(a) or MAX(a) aggregate, record min and max value,
respectively and change min or maximum whenever each time a tuple
of a group seen
For COUNT, add one to the corresponding group
For SUM(a), add the value of a to the accumulated sum.
AVG(a): maintain two accumulations: count and sum of values.
Until all the tuples are seen, we can not output the result.
Entire grouping is done in OPEN function.
Hash tables and balanced trees are used to find the entry in
each group.
The number o I/Os is B.
Set Intersection
Read S into M-1 buffers and build a search structure with full tuples as
search key. Read each block of R, for each t of R, see if t is also in S.
If so, copt t to the output. If not, ignore it.
Set difference
Assume R is larger relation. Read S to M-1 buffers and build a search
structure with full tuple as a search key.
R-S
Read each block of R. IF t is in S, ignore t; if it is not in S, copy t to the
output
S-R
Read each block of R. IF t is in S, ignore t; if it is in S, delete t from the
copy of S. If t is not in S we do nothing. After completing R, output the
remaining S tuples.
Bag difference
Similar.. (Check the book)
Product
Product
Read S into M-1 buffers. Read each
block of R. For each tuple of R,
concatenate t with each tuple of S.
Output concatenated file.
Natural Join
Natural join
R(X,Y) being joined with S(Y,Z)
Read all the tuples of S into M-1 blocks.
Read each block of R. For each t of R, find
the tuples that agree with all the attributes of
Y. For each matching tuple of S, form a
tuple by joining it with t, and move the
resulting tuple to the output.
Takes B(R)+B(S) disk I/Os
Example
B(R)=1000 and B(S)=500 and let M=101.
We use 100 blocks of memory for S
Outer loop iterates five times
Each iteration we have 100 disk I/Os
Second loop we must read entire R.
1000 I/Os
5500 I/Os
Summary
Operators
Approximate M
Required
Disk I/O
,
,
1
B
B
B
U, , , , Join
Min(B(R),B(S))
B(R )+B(S)
Join
Any M 2
B(R)B(S)/M
Complexity
Ignore the writing of output
Memory requirement
B M2 is required for the two pass algorithm to be feasible
(for one pass B<= M)
(R ) requires B(R ) blocks of memory
(for one pass B=M)
Complexity is similar to
B M2 is required for the two pass algorithm to be feasible
(R ) requires B(R ) blocks of memory
Bag intersection
Output t the minimum number of times t appears in R and in S.
R-S (set)
Output t, if only it appears in R but not in S.
R-S (bag)
Output t the number of times it appears in R minus the number of times
it appears in S. If both occur at equal number of times, we do not
output any tuples.
Analysis
3(B(S)+B(S)) disk I/Os.
B(R)+B(S) M2 for the algorithm to work.
If neither relation has sufficiently few tuples with Yvalue y that they all fit into M-1 buffers, do nestedloop join
Use M buffers to perform nested loop join on the tuples
with Y-value y from both relations.
Approximate M
Required
Disk I/O
3B
U, ,
Join
(B(R)+B(S))
3(B(R )+B(S))
max(B(R),B(S)) 5(B(R )+B(S))
Join
(B(R)+B(S)
3(B(R )+B(S))
Intersection or difference
Apply appropriate one pass algorithm
(B(R) + B(S)) I/Os.
Summary
Total 3(B(R)+B(S)) I/Os
The algorithms work iff Min(B(R), B(S)) M2
Summary
Total 3(B(R)+B(S)) I/Os
The algorithms work iff Min(B(R), B(S)) M2
Approximate M
Required
Disk I/O
3B
U, ,
B(S)
3(B(R )+B(S))
Join
B(S)
5(B(R )+B(S))
Join
B(S)
(3-2M/B(S))(B(R )
+B(S))
Index-based Algorithms
Index is available on one or more attributes of
a relation
Clustering idexes
Indexes on an attribute or attributes such that all
the tuples with a fixed value for the search key of
this index appear roughly as few blocks that hold
them
Only clustered relations have clustered index.
Index-based Selection
If there are no indexes on R, one has to read all the tuples of
R
Number of disk I/Os=B(R) or t(R )
Notions of clustering
Clustered file organization
Tuples of two relations are placed together in the
same block based on certain value.
Clustered relation
One block contains the tuples of one relation.
Clustered index
Tuples having given search key value are stored
together.
Reading S
For each tuple t of R, we must read an average of T(S)/V(S,Y) tuples of S.
S has non clustered index on Y, the number of disk I/Os required is T(R)
T(S)/V(S,Y) disk I/Os
If the index is clustered we only need T(R )B(S)/V(S,Y) disk I/Os office.
Buffer Management
Value of M vary depending on system
conditions
Buffer manager allocates buffers to processes
such that delay is minimized.
FIFO
Throw out the bock which has stayed for longer time.
CLOCK algorithm
Approximation of LRU
Buffers are arranged in a circle.
When the block is read it is set to 1. When the contents are accessed it
is set to 1.
Buffer manager replaces a block with 0. If it finds 1, it sets to 0.
System control
Take advice from query processor about the algorithm and pinned
blocks.
Pinned blocks
The blocks can not be moved with out
modifying other blocks that point to it.
Buffer manager avoid expelling pinned
blocks.
If the block is pinned, it will remain in
memory
Root block of B-tree is pinned
Pin the blocks of smaller relation in the join
Observations
If we use sort-based algorithm, it is possible to
adapt to changes to M.
If M shrinks, we can change the h=size of sublist.
Parallel Algorithms
Database operations can profit from parallel
processing
Models of parallelism
Shared memory
Each processor can access all the memory of all the
processors.
Single physical address space
Shared disk
Every processor has its own memory and not accessible by
other processors
Disks are accessible by every processor through network
Shared nothing
Each processor has its own memory and disk
Communication is through network
Shared Nothing
Tuple-at-a time operations in parallel
If there are P processors, divide Rs tuples
evenly among the P processors.
Send the selection and projection to the
processors.
Performance
Number of disk I/Os do not change
Time = 1/p of time + cost of shipping
Shipment over network is cheaper than disk
utilization.