Chapter One ISR
Chapter One
Overview of Information Retrieval
Introduction to Information Retrieval, 10/16/2017
IR and IR Systems
Information retrieval (IR) is the process of searching a large,
unstructured corpus for relevant documents that satisfy a user's
information need.
According to Baeza-Yates & Ribeiro-Neto, information retrieval
deals with the representation, storage, organization of, and
access to information items.
➢ “take a picture”
[Figure: the USER accesses the DB through Retrieval and Browsing.]
Given:
A corpus of textual natural-language documents.
A user query in the form of a textual string.
Find:
A ranked set of documents that are relevant to the
query.
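The Given/Find statement above can be illustrated with a minimal sketch. It assumes a toy scoring rule (count how often each query term occurs in a document), which is only a stand-in for a real retrieval model; the function name `rank` and the sample corpus are hypothetical.

```python
from collections import Counter


def rank(corpus, query):
    """Return (doc_id, score) pairs, best first; zero-score documents are dropped.

    Assumption: the score is the summed count of query terms in the document.
    """
    query_terms = query.lower().split()
    results = []
    for doc_id, text in corpus.items():
        counts = Counter(text.lower().split())
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            results.append((doc_id, score))
    # Sort by descending score, then by document id to break ties.
    results.sort(key=lambda pair: (-pair[1], pair[0]))
    return results


# Hypothetical corpus of textual natural-language documents.
corpus = {
    "Doc1": "information retrieval finds relevant documents",
    "Doc2": "databases store structured records",
    "Doc3": "retrieval of documents from a large corpus of documents",
}
ranked = rank(corpus, "retrieval documents")
```

Here `ranked` lists Doc3 before Doc1 and omits Doc2, which shares no term with the query.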
[Figure: a Query String and a Document corpus enter the IR System, which returns Ranked Documents: 1. Doc1, 2. Doc2, 3. Doc3, ...]
[Figure: architecture of an IR system.
Components: User Interface, Text Operations, Query Operations, Indexing, Searching, Ranking, DB Manager Module, Index (inverted file), Text Database.
Labeled flows: User Need, Text, Logical View, User Feedback, Query, Retrieved Docs, Ranked Docs.]
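The inverted file and the Searching component in the diagram can be sketched as follows. This is a minimal illustration that assumes whitespace tokenization and a conjunctive (AND) query; the names `build_inverted_index` and `search_all` are hypothetical and not taken from the slides.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_all(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # .get avoids creating empty entries for unseen terms.
    postings = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*postings))


# Hypothetical text database.
docs = {
    1: "text database index",
    2: "query the text index",
    3: "ranked docs",
}
inverted = build_inverted_index(docs)
hits = search_all(inverted, "text index")
```

Looking up terms in the index avoids scanning every document at query time, which is the point of building the inverted file in advance.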
Issues that arise in IR
Text representation
  What makes a "good" representation?
  How is a representation generated from text?
  What are the retrievable objects, and how are they organized?
Information need representation
  What is an appropriate query language?
  How can interactive query formulation and refinement be supported?
Comparing representations (to identify relevant documents)
  What weighting scheme and similarity measure should be used?
  What is a "good" model of retrieval?
Evaluating the effectiveness of retrieval
  What are good metrics?
  What constitutes a good experimental test bed?
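As one possible answer to the question about metrics, the sketch below computes precision and recall. The slides do not name these measures, so treating them as the metrics of choice is an assumption; the function name and the sample judgments are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical system output and relevance judgments for a single query.
retrieved = ["Doc1", "Doc2", "Doc3", "Doc4"]
relevant = ["Doc1", "Doc3", "Doc5"]
precision, recall = precision_recall(retrieved, relevant)
```

Both values need relevance judgments for each query, which is one reason the experimental test bed matters.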
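[Figure: document indexing pipeline.
Documents -> Assign document identifier -> document IDs and text
text -> Tokenize -> tokens
tokens -> Stop list -> non-stoplist tokens
non-stoplist tokens -> Stemming & Normalize -> stemmed terms
stemmed terms -> Term weighting -> terms with weights
terms with weights -> Index -> index terms]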
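The indexing steps listed above (assign identifiers, tokenize, apply a stop list, stem and normalize, weight terms, index) can be sketched end to end. Several parts are assumptions made for illustration: the tiny `STOP_LIST`, the suffix-stripping `crude_stem` (a toy stand-in, not a real stemming algorithm), and TF-IDF as the weighting scheme, which the slides do not specify.

```python
import math
import re
from collections import Counter

# Illustrative stop list only; real stop lists are much longer.
STOP_LIST = {"a", "an", "and", "the", "of", "is", "are", "to", "in"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def crude_stem(token):
    """Toy stemmer: strip one common suffix if enough of the word remains."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def index_terms(text):
    """Tokenize, drop stop-list tokens, then stem what is left."""
    return [crude_stem(t) for t in tokenize(text) if t not in STOP_LIST]


def build_weighted_index(documents):
    """Assign ids 1..N and return {term: {doc_id: tf-idf weight}}."""
    doc_terms = {
        doc_id: Counter(index_terms(text))
        for doc_id, text in enumerate(documents, start=1)
    }
    n_docs = len(doc_terms)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for counts in doc_terms.values() for term in counts)
    index = {}
    for doc_id, counts in doc_terms.items():
        for term, tf in counts.items():
            weight = tf * math.log(n_docs / doc_freq[term])
            index.setdefault(term, {})[doc_id] = weight
    return index


# Hypothetical documents.
documents = [
    "Indexing the documents",
    "Ranking of ranked documents",
    "Searching an index",
]
weighted_index = build_weighted_index(documents)
```

With this weighting, a term that occurs in every document gets weight 0, so only terms that help tell documents apart carry weight in the index.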