Ruchitha_paper
IMPAIRED PEOPLE
Nadipelli Ruchitha*, P.Shiva Kumar**, Marugalla Sridhar***, agadam Jyotsna
* Information Technology
** J.B.Institute of Engineering and Technology
Abstract- As object recognition technology has developed recently, various technologies have been applied to autonomous vehicles, robots, and industrial facilities. However, the benefits of these technologies are not reaching the visually impaired, who need them the most. In this paper, we propose an object detection system for the blind using deep learning technologies. We use voice recognition technology in order to know what objects a blind person wants, and then find those objects via object recognition. Furthermore, a voice guidance technique is used to inform sight-impaired persons of the location of objects. The object recognition deep learning model utilizes a deep neural network architecture, and voice recognition is designed through speech-to-text (STT) technology. In addition, a voice announcement is synthesized using text-to-speech (TTS) to make it easier for the blind to get information about objects. The system is built using the Python OpenCV tool. As a result, we implement an efficient object detection system that helps the blind find objects in a specific space without help from others, and the system is analyzed through experiments to verify its performance.

Index Terms- CNN, Image Processing, Bayes Theorem, YOLO v3, Object Detection, API

I. INTRODUCTION

Object detection is the process of finding and recognizing real-world object instances, such as cars, bikes, TVs, flowers, and humans, in images or videos. An object detection technique lets you understand the details of an image or a video, as it allows for the recognition, localization, and detection of multiple objects within an image. It is usually utilized in applications like image retrieval, security, surveillance, and advanced driver assistance systems (ADAS).

Object detection from video is a major task in video surveillance applications these days. Object detection techniques are used to identify required objects in video sequences and to cluster the pixels of these objects. The detection of an object in a video sequence plays a major role in several applications, specifically video surveillance. Object detection in a video stream can be done through processes like pre-processing, segmentation, foreground and background extraction, and feature extraction. Humans can easily detect and identify objects present in an image. The human visual system is fast and accurate and can perform complex tasks like identifying multiple objects with little conscious thought. With the availability of large amounts of data, faster GPUs, and better algorithms, we can now easily train computers to detect and classify multiple objects within an image with high accuracy.

TensorFlow is an open-source software library for high-performance numerical computation. It allows simple deployment of computation across a range of platforms (CPUs, GPUs, TPUs) due to its versatile design, from desktops to clusters of servers to mobile and edge devices. TensorFlow was designed and developed by researchers and engineers from the Google Brain team within Google's AI organization. It comes with strong support for machine learning and deep learning, and its versatile numerical computation core is used across several other scientific domains. TensorFlow makes it easy to construct, train, and deploy object detection models, and it provides a collection of detection models pre-trained on the COCO dataset, the KITTI dataset, and the Open Images dataset. One of the numerous detection models is the combination of the Single Shot Detector (SSD) and MobileNet architectures, which is quick, efficient, and does not need huge computational capability to accomplish object detection.

YOLO is a real-time object detector. It applies a single neural network to the complete image, dividing the image into regions and predicting bounding boxes and probabilities for every region. The predicted probabilities are the basis on which these bounding boxes are weighted. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the full detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.
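The YOLO-style weighting of boxes by predicted probabilities, followed by keeping only the strongest non-overlapping detections, can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation; the box format ([x1, y1, x2, y2]), the thresholds, and the function names are illustrative assumptions:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def decode_detections(boxes, objectness, class_probs, score_thr=0.5, iou_thr=0.45):
    """Weight each box by objectness * class probability, then suppress
    overlapping boxes (non-maximum suppression), keeping the best scores."""
    scores = objectness[:, None] * class_probs   # per-box, per-class confidence
    cls = scores.argmax(axis=1)                  # best class for each box
    best = scores.max(axis=1)
    keep = best >= score_thr                     # drop low-confidence boxes
    boxes, cls, best = boxes[keep], cls[keep], best[keep]
    order = best.argsort()[::-1]                 # highest score first
    picked = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_thr for j in picked):
            picked.append(i)
    return [(boxes[i], int(cls[i]), float(best[i])) for i in picked]
```

For two heavily overlapping boxes predicting the same class, only the higher-scoring one survives, which is exactly why the predicted probabilities serve as the weights for the bounding boxes.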
Image recognition/image processing is at the forefront of Artificial Intelligence today. It is, however, far from perfect. Seemingly simple scenarios, such as object detection, face recognition, and removing motion blur, and more complex scenarios, such as compression artefacts, scratch detection, sensor noise, and spilling detection, are applications of image recognition/image processing. Digitized images are often represented as a two-dimensional (2D) array of pixel values. Each pixel value, which makes up the colour scheme of the image, is often influenced by an array of factors such as light intensity. A visual scene is projected onto a surface, where receptors (natural or artificial) produce values that depend on the intensity of the incident light. These exciting concepts are, however, hard to implement. Forming an image leads to a loss of detail, as a three-dimensional (3D) scene is collapsed into a two-dimensional image. Many other factors are responsible for why image recognition/image processing is hard, such as noise in the image (pixel values that are off from their surrounding pixels) and the mapping from scene to image.
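The 2D pixel-array view and the notion of noise described above can be illustrated with a toy example. This is a sketch only; the 5x5 image and the 3x3 median filter are illustrative choices, not taken from the paper:

```python
import numpy as np

# A digitized image as a 2D array of intensity values (0-255).
img = np.full((5, 5), 100, dtype=np.uint8)
img[2, 2] = 255          # a noisy pixel: its value is far off from its neighbours

def median_filter3(img):
    """Replace each interior pixel by the median of its 3x3 neighbourhood,
    a standard way to suppress isolated noisy pixels."""
    out = img.copy()
    for y in range(1, img.shape[0] - 1):
        for x in range(1, img.shape[1] - 1):
            out[y, x] = np.median(img[y - 1:y + 2, x - 1:x + 2])
    return out

clean = median_filter3(img)
```

The noisy pixel is restored to the value of its surroundings, since eight of the nine values in its neighbourhood agree.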
II. RESEARCH AND IDEA

In various fields, there is a necessity to detect the target
object and also track them effectively while handling occlusions and other included complexities. Many researchers (Almeida and Guting 2004, Hsiao-Ping Tsai 2011, Nicolas Papadakis and Aurelie Bugeau 2010) attempted various approaches to object tracking. The nature of the techniques largely depends on the application domain. Some of the research works which led to the proposed work in the field of object tracking are depicted as follows. Object detection is an important, yet challenging, vision task. It is a critical part of many applications such as image search, image auto-annotation, scene understanding, and object tracking. Moving object tracking in video image sequences was one of the most important subjects in computer vision. It had already been applied in many computer vision fields, such as smart video surveillance (Arun Hampapur 2005), artificial intelligence, military guidance, safety detection and robot navigation, and medical and biological applications. In recent years, a number of successful single-object tracking systems have appeared, but in the presence of several objects, object detection becomes difficult, and when objects are fully or partially occluded, they are obtruded from the human vision, which further increases the problem of detection, as do decreasing illumination and acquisition angle. The proposed MLP-based object tracking system is made robust by an optimum selection of unique features and also by implementing the Adaboost strong classification method.

Background Subtraction: The background subtraction method by Horprasert et al (1999) was able to cope with local illumination changes, such as shadows and highlights, and even global illumination changes. In this method, the background was statistically modelled at each pixel. The computational colour model includes the brightness distortion and the chromaticity distortion, which were used to distinguish a shaded background from the ordinary background or from moving foreground objects. The background and foreground subtraction method used the following approach. A pixel was modelled by a 4-tuple [Ei, si, ai, bi], where Ei is a vector with the expected colour value, si is a vector with the standard deviation of the colour value, ai is the variation of the brightness distortion, and bi is the variation of the chromaticity distortion of the ith pixel. In the next step, the difference between the background image and the current image was evaluated. Each pixel was finally classified into one of four categories: original background, shaded background or shadow, highlighted background, and moving foreground object. Liyuan Li et al (2003) contributed a method for detecting foreground objects in non-stationary complex environments containing moving background objects. A Bayes decision rule was used for classification of background and foreground changes based on inter-frame colour co-occurrence statistics. An approach to store and quickly retrieve colour co-occurrence statistics was also established.
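The Horprasert-style pixel classification described above, using brightness and chromaticity distortions against an expected background colour, can be sketched as follows. This is a simplified sketch, not the original algorithm: the thresholds, the normalisation, and the function names are illustrative assumptions:

```python
import numpy as np

def distortions(I, E, s):
    """Brightness distortion alpha and chromaticity distortion CD of observed
    colour I against expected background colour E with per-channel std s."""
    In, En = I / s, E / s                  # normalise each channel by its std
    alpha = In.dot(En) / En.dot(En)        # scale that best fits I to alpha*E
    cd = np.linalg.norm(In - alpha * En)   # residual off the brightness axis
    return alpha, cd

def classify(I, E, s, t_cd=2.0, a_lo=0.6, a_hi=1.2):
    """Label a pixel as background, shadow, highlight, or moving foreground."""
    alpha, cd = distortions(I, E, s)
    if cd > t_cd:
        return "foreground"                # chromaticity changed: moving object
    if alpha < a_lo:
        return "shadow"                    # darker version of the background
    if alpha > a_hi:
        return "highlight"                 # brighter version of the background
    return "background"
```

A pixel that merely darkens keeps its chromaticity (small CD, alpha below 1) and is labelled shadow, while a colour change pushes CD up and yields a foreground label, matching the four categories in the text.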
In this method, foreground objects were detected in two steps. First, both the foreground and the background changes were extracted using background subtraction and temporal differencing. The frequent background changes were then recognized using the Bayes decision rule based on the learned colour co-occurrence statistics. Both short-term and long-term strategies to learn the frequent background changes were used. An algorithm focused on obtaining the stationary foreground regions, as described by Álvaro Bayona et al (2010), was useful for applications like the detection of abandoned/stolen objects and parked vehicles. This algorithm mainly used two steps. Firstly, a sub-sampling scheme based on background subtraction techniques was implemented to obtain stationary foreground regions; this detects foreground changes at different time instants in the same pixel locations and was done using a Gaussian distribution function. Secondly, some modifications were introduced to this base algorithm, such as thresholding the previously computed subtraction. The main purpose of this algorithm was to reduce the amount of stationary foreground detected.

3.1.2 Template Matching: Template Matching is the technique of finding small parts of an image which match a template image. It slides the template from the top left to the bottom right of the image and compares for the best match with the template. The template dimensions should be equal to or smaller than those of the reference image. It recognizes the segment with the highest correlation as the target. Given an image S and an image T, where the dimensions of S are larger than those of T, output whether S contains a subset image I where I and T are suitably similar in pattern and, if such an I exists, output the location of I in S, as in Hager and Belhumeur (1998). Schweitzer et al (2011) derived an algorithm which used both upper and lower bounds to detect the 'k' best matches. Euclidean distance and Walsh transform kernels are used to calculate the match measure. On the positive side, the usage of a priority queue improved the quality of the decision as to which bound to improve, and when good matches existed the inherent cost was dominant and performance improved. But there were constraints: in the absence of good matches the queue cost dominated, and the arithmetic operation cost was higher. The proposed method did not use a queue, thereby avoiding the queue cost, and instead used template matching. Visual tracking methods can be roughly categorized in two ways, namely the feature-based and region-based methods, as proposed by Ken Ito and Shigeyuki Sakane (2001). The feature-based approach estimates the 3D pose of a target object to fit the image features, e.g. the edges, given a 3D geometrical model of the object. This method requires much computational cost. Region-based methods can be classified into two categories, namely the parametric method and the view-based method. The parametric method assumes a parametric model of the images in the target image and calculates the optimal fitting of the model to the pixel data in a region. The view-based method was used to find the best match of a region in a search area given the reference template. This has the advantage that it does not require as much computational complexity as the feature-based approach.
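The sliding-window search described above can be sketched as follows. This is a minimal sum-of-squared-differences matcher for illustration, not the cited algorithms; production code would typically use a normalised correlation measure (e.g. OpenCV's matchTemplate):

```python
import numpy as np

def match_template(S, T):
    """Slide template T over image S from the top left to the bottom right,
    returning the (row, col) of the best match and its SSD score
    (lower SSD means a better match; 0 means an exact match)."""
    H, W = S.shape
    h, w = T.shape                        # template must be no larger than S
    best, best_pos = float("inf"), None
    for y in range(H - h + 1):
        for x in range(W - w + 1):
            ssd = np.sum((S[y:y + h, x:x + w] - T) ** 2)  # dissimilarity
            if ssd < best:
                best, best_pos = ssd, (y, x)
    return best_pos, best
```

Embedding the template exactly inside a larger image gives an SSD of zero at the true location, which corresponds to the subset image I in the description above.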
VI. CONCLUSION