Hashing

The document discusses various searching algorithms, including Sequential Search, Binary Search, and Hashing, highlighting their efficiency and applications. It explains the concept of hash tables, their operations, and methods for handling collisions, such as chaining and linear probing. Additionally, it covers the characteristics of good hash functions and provides examples of hash function methods and their implementations.

Data Structure Algorithms and

Applications
CT-157

Course Instructor
Engr. Vanesh Kumar
SEARCHING
Introduction
 Searching is the process of finding an element within a list of elements, whether the list is ordered or unordered.
 Retrieval: a successful search.
 A table of records in which a key is used for retrieval is often called a SEARCH TABLE or DICTIONARY.
 Internal searching – all data is held in main memory.
 External searching – most data is kept in auxiliary memory.
Searching Methods
 Sequential or Linear Searching.

 Binary Search.

 Hashing.
Sequential Search
 Searches ordered or unordered tables sequentially until the desired record is found or the end of the table is reached.
 It is simple and works well for small arrays.
 Mostly used when data is not sorted.
 Efficiency:
  Best – O(1)
  Average – O(n/2)
  Worst – O(n)
 Less efficient when the array size is large.
 Not efficient on sorted arrays.
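As a sketch, sequential search can be written in a few lines of Python (the list and keys below are made-up example data):

```python
# A minimal sketch of sequential (linear) search over a plain Python list.
def sequential_search(items, target):
    """Return the index of target in items, or -1 if it is absent."""
    for i, value in enumerate(items):
        if value == target:
            return i  # best case: O(1), when the target is first
    return -1         # worst case: O(n), when the target is last or absent

print(sequential_search([7, 3, 9, 4], 9))   # found at index 2
print(sequential_search([7, 3, 9, 4], 5))   # not found: -1
```

Note that the loop never assumes the data is sorted, which is why the method applies to unordered tables.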
Binary Search
 This technique works on sorted arrays and can be applied only to sorted arrays.
 Not applicable to linked lists, which do not support O(1) random access.
 Requires fewer comparisons than linear search.
 Efficiency: O(log₂ n).
 Logic behind the technique:

        First Half                 Second Half
First Value         Mid Value                  Last Value
low = 0             mid = (low + high)/2       high = n - 1
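The halving logic above can be sketched directly in Python (the sorted list below is made-up example data):

```python
# A minimal sketch of binary search on a sorted Python list.
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        elif items[mid] < target:
            low = mid + 1   # target can only be in the second half
        else:
            high = mid - 1  # target can only be in the first half
    return -1

print(binary_search([3, 4, 7, 9], 7))   # index 2
print(binary_search([3, 4, 7, 9], 5))   # -1
```

Each iteration halves the search interval, giving the O(log₂ n) bound.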


HASHING
Introduction
 The search operation on a sorted array using the binary search method takes O(log₂ n).
 We can improve the search time by using an approach called hashing.
 Hashing is usually used to implement dictionaries.
 Many applications need to support ONLY the operations INSERT, SEARCH, and DELETE. These are known as “dictionary” operations.
 Hashing can support these operations in O(1) time on average and is quite fast in practice.
Dictionary
 A dictionary is a collection of elements
 Each element has a field called key
– (key, value)
 Every key is usually distinct.
 Typical dictionary operations are:
– Insert a pair into the dictionary
– Search the pair with a specified key
– Delete the pair with a specified key
 Example: a collection of student records in a class
– (key, value) = (student-number, a list of assignment and exam marks)
– All keys are distinct
Dictionary as an Ordered Linear List
 L = (e1, e2, e3, …, en)
 Each ei is a pair (key, value)
 Array or chain representation
– unsorted array: O(n) search time
– sorted array: O(log n) search time
– unsorted chain: O(n) search time
– sorted chain: O(n) search time
Hash Table
 A hash table is a data structure that stores elements and
allows insertions, lookups, and deletions to be performed in
𝑂(1) time.
 A hash table is an alternative method for representing a
dictionary
 In a hash table, a hash function is used to map keys into
positions in a table. This act is called hashing
 Hash Table Operations
– Search: compute f(k) and see if a pair exists
– Insert: compute f(k) and place it in that position
– Delete: compute f(k) and delete the pair in that position
 In the ideal situation, a hash table search, insert, or delete takes O(1) time.
Why we need Hash Tables
 Internet routers are a good example of why hash tables are required.
 A router table (especially in the routers in the backbone networks of internet operators) may contain hundreds of thousands or millions of entries.
 When a packet has to be routed to a specific IP address, the router has to determine the best route by querying the router table in an efficient manner. Hash tables are used as an efficient lookup structure, having the IP address as the key and the path that should be followed for that address as the value.
How Does it Work
 The table part is just an ordinary array; it is the hash that we are interested in.
 The hash is a function that transforms a key into an address, or index, of the array (table) where the record will be stored. If the size of the table is N, then the integer will be in the range 0 to N-1. The integer is used as an index into the array. Thus, in essence, the key itself indexes the array.
 If h is a hash function and k is a key, then h(k) is called the hash of the key and is the index at which a record with the key k should be placed.
 The hash function generates this address by performing some simple arithmetic or logical operations on the key.
Ideal Hashing Example
 Pairs are: (22,a), (33,c), (3,d), (72,e), (85,f) — (key, value) pairs
 Hash table is ht[0:7], m = 8 (where m is the number of positions in the hash table)
 Hash function h is k % m = k % 8
 Where are the pairs stored?

  [0]     [1]    [2]    [3]    [4]    [5]     [6]    [7]
 (72,e)  (33,c)        (3,d)        (85,f)  (22,a)
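The placement above can be reproduced with a short Python sketch of the same example:

```python
# The ideal-hashing example: m = 8 positions, h(k) = k % 8, no collisions.
pairs = [(22, 'a'), (33, 'c'), (3, 'd'), (72, 'e'), (85, 'f')]
m = 8
ht = [None] * m                 # ht[0:7], all positions initially empty

for key, value in pairs:
    ht[key % m] = (key, value)  # each key hashes to a distinct position

print(ht)
# (72,e) at index 0, (33,c) at 1, (3,d) at 3, (85,f) at 5, (22,a) at 6
```

Because all five keys hash to distinct indexes, this is the ideal case: every operation is a single array access.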
Hashing Function Methods
(Hashing Methods)
 Division Hash Method
 The key k is divided by some number m and the remainder is used as the hash address of k:
 h(k) = k mod m
 This gives indexes in the range 0 to m-1, so the hash table should be of size m.
 This is an example of a uniform hash function if the value of m is chosen carefully.
 Generally, a prime number is the best choice, as it will spread the keys evenly.
 A uniform hash function is designed to distribute the keys roughly evenly into the available positions within the array (or hash table).
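A minimal sketch of the division method, assuming a table size of m = 13 (a prime chosen for illustration, not taken from the slides):

```python
# Division hash method: h(k) = k mod m, with m an assumed prime table size.
def division_hash(k, m=13):
    return k % m  # always in the range 0..m-1

print(division_hash(2501))  # 2501 mod 13 = 5
```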
Hashing Function Methods
 The Folding Method
 The key k is partitioned into a number of parts, each of which has the same length as the required address, with the possible exception of the last part.
 The parts are then added together, ignoring the final carry, to form an address.
 Example: If key = 356942781 is to be transformed into a three-digit address:
 P1 = 356, P2 = 942, P3 = 781 are added to yield 2079; ignoring the final carry gives 079.
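The folding steps can be sketched in Python, reproducing the 079 result from the example:

```python
# Folding method: split the key into address-sized chunks, add them,
# and drop the final carry by keeping only the last `digits` digits.
def folding_hash(key, digits=3):
    s = str(key)
    parts = [int(s[i:i + digits]) for i in range(0, len(s), digits)]
    return sum(parts) % (10 ** digits)

print(folding_hash(356942781))  # 356 + 942 + 781 = 2079 -> address 79 (i.e. 079)
```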
Hashing Function Methods
 The Mid-Square Method
 The key k is multiplied by itself and the address is obtained by selecting an appropriate number of digits from the middle of the square.
 The number of digits selected depends on the size of the table.
 Example: If key = 123456 is to be transformed:
 (123456)² = 15241383936
 If a three-digit address is required, digit positions 5 to 7 could be chosen, giving address 138.
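The mid-square selection can be sketched as string slicing on the square, using the digit positions from the example:

```python
# Mid-square method: square the key, then select `width` digits starting
# at (1-indexed) digit position `start` of the square.
def mid_square_hash(key, start=5, width=3):
    square = str(key * key)
    return int(square[start - 1:start - 1 + width])

print(mid_square_hash(123456))  # 123456^2 = 15241383936 -> digits 5..7 = 138
```

A real implementation would normally pick positions relative to the length of the square; the fixed positions here simply mirror the worked example.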
Hashing a string key
 Table size: [0..99]
 A..Z ---> 1, 2, …, 26
 0..9 ---> 27, …, 36
 Key: CS1 ---> 3, 19, 28 (concatenated) = 31,928
 (31,928)² = 1,019,397,184 – 10 digits
 Extract the middle 2 digits (5th and 6th), as the table size is 0..99.
 This gives 39, so H(CS1) = 39.
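The whole string scheme — encode, concatenate, then mid-square — can be sketched as:

```python
# String-key hashing as in the slides: A..Z -> 1..26, digits 0..9 -> 27..36,
# concatenate the codes, square, and take the middle 2 digits (table 0..99).
def char_code(c):
    if c.isalpha():
        return ord(c.upper()) - ord('A') + 1   # A..Z -> 1..26
    return int(c) + 27                         # 0..9 -> 27..36

def hash_string(key):
    number = int(''.join(str(char_code(c)) for c in key))  # CS1 -> 31928
    square = str(number * number)                          # 31928^2 = 1019397184
    mid = len(square) // 2
    return int(square[mid - 1:mid + 1])                    # middle 2 digits

print(hash_string("CS1"))  # 39
```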
Characteristics of a Good Hash Function
 The hash value is fully determined by the data being hashed.
 The hash function uses all the input data.
 The hash function "uniformly" distributes the data across the entire set of possible hash values.
 The hash function generates very different hash values for similar strings.
Hash Function Examples
Let h(k) = k % 15. Then,
if k =   25  129  35  2501  47  36
h(k) =   10    9   5    11   2   6

Storing the keys in the array is straightforward:

 0  1   2  3  4   5   6  7  8    9   10    11  12 13 14
 _  _  47  _  _  35  36  _  _  129   25  2501   _  _  _

Thus, delete and find can be done in O(1), and also insert, except…
Hash Function
What happens when you try to insert k = 65?
k = 65
h(k) = 65 % 15 = 5

 0  1   2  3  4   5   6  7  8    9   10    11  12 13 14
 _  _  47  _  _  35  36  _  _  129   25  2501   _  _  _
               65(?)

This is called a collision.
Handling Collisions
 Chaining (Hashing with Chaining)

Open Addressing
– Linear Probing
Handling Collisions

Chaining
Separate Chaining
Let each array element be the head of a chain.

 0  1   2  3  4   5   6  7  8    9   10    11  12 13 14
        47       65  36       129   25  2501
                  |
                 35

Where would you store: 29, 16, 14, 99, 127?


Separate Chaining
Let each array element be the head of a chain:

Where would you store: 29, 16, 14, 99, 127?
[29%15=14, 16%15=1, 14%15=14, 99%15=9, 127%15=7]

 0   1   2  3  4   5   6    7  8    9   10    11  12 13  14
    16  47       65  36  127      99   25  2501         14
                  |                |                     |
                 35              129                    29

New keys go at the front of the relevant chain.
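The chained table above can be sketched with Python lists standing in for the chains (a production table would use linked nodes, as the slides describe):

```python
# A minimal sketch of a hash table with separate chaining, h(k) = k % 15.
class ChainedHashTable:
    def __init__(self, size=15):
        self.table = [[] for _ in range(size)]  # one (initially empty) chain per slot

    def insert(self, key):
        chain = self.table[key % len(self.table)]
        chain.insert(0, key)        # new keys go at the front of the chain

    def search(self, key):
        return key in self.table[key % len(self.table)]  # scan one chain only

ht = ChainedHashTable()
for k in [25, 129, 35, 2501, 47, 36, 65, 29, 16, 14, 99, 127]:
    ht.insert(k)
print(ht.table[5])   # [65, 35] -- both hash to 5; 65 was inserted later
print(ht.search(99)) # True
```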


Separate Chaining: Disadvantages
• Parts of the array might never be used.
• As chains get longer, search time increases to O(n) in the worst case.
• Constructing new chain nodes is relatively expensive.
• Is there a way to use the "unused" space in the array instead of using chains to make more space?
Handling Collisions

Linear Probing
Linear Probing
Let key k be stored in element h(k) = t of the array.

 0  1   2  3  4   5   6  7  8    9   10    11  12 13 14
        47       35  36       129   25  2501
               65(?)

What do you do in case of a collision?

If the hash table is not full, attempt to store the key in the next array element (in this case (t+1)%N, (t+2)%N, (t+3)%N, …) until you find an empty slot.
Linear Probing
Where do you store 65? [Here N is 15.]
65 % 15 = 5
 0  1   2  3  4   5   6   7  8    9   10    11  12 13 14
        47       35  36  65     129   25  2501
                  ^   ^   ^
               attempts

Where would you store: 29?


Linear Probing
If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, …
[29%15=14]
 0  1   2  3  4   5   6   7  8    9   10    11  12 13  14
        47       35  36  65     129   25  2501         29
                                                        ^
                                                   attempts

Where would you store: 16?


Linear Probing

• If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, …
• [16%15=1]
 0   1   2  3  4   5   6   7  8    9   10    11  12 13  14
    16  47       35  36  65      129   25  2501         29

Where would you store: 14?
[14%15=14]
Linear Probing

• If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, …
• [14%15=14]
 0   1   2  3  4   5   6   7  8    9   10    11  12 13  14
14  16  47       35  36  65      129   25  2501         29
 ^                                                       ^
                  attempts (14 was taken, so wrap to 0)

Where would you store: 99?


Linear Probing

• If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, …
• [99%15=9]
 0   1   2  3  4   5   6   7  8    9   10    11   12 13  14
14  16  47       35  36  65      129   25  2501   99     29
                                   ^    ^     ^    ^
                                       attempts

Where would you store: 127?


Linear Probing

• If the hash table is not full, attempt to store the key in array elements (t+1)%N, (t+2)%N, …
• [127%15=7]
 0   1   2  3  4   5   6   7    8    9   10    11   12 13  14
14  16  47       35  36  65  127  129   25  2501   99     29
                          ^    ^
                       attempts
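The whole walkthrough can be reproduced with a short Python sketch of linear probing:

```python
# A minimal sketch of open addressing with linear probing: N = 15,
# h(k) = k % 15, probing (t+1)%N, (t+2)%N, ... until a slot is free.
def insert_linear_probe(table, key):
    n = len(table)
    t = key % n
    for i in range(n):                 # probe t, t+1, t+2, ... (mod N)
        slot = (t + i) % n
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")

table = [None] * 15
for k in [47, 35, 36, 129, 25, 2501, 65, 29, 16, 14, 99, 127]:
    insert_linear_probe(table, k)
print(table.index(127))  # 127 collides at 7 (taken by 65) and lands at 8
```

Inserting the keys in the slide order reproduces the final array: 14 wraps around to index 0, 99 settles at 12, and 127 at 8.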
Linear Probing
• Eliminates the need for separate data structures (chains), and the cost of constructing nodes.

• Leads to the problem of clustering: elements tend to cluster in dense intervals in the array.

• The search-efficiency problem remains.
