
CNN BASED TRACKING SYSTEM FOR VISUALLY IMPAIRED PEOPLE

Nadipelli Ruchitha*, P. Shiva Kumar**, Marugalla Sridhar***, Jagadam Jyotsna

* Information Technology
** J.B. Institute of Engineering and Technology

Abstract- As object recognition technology has developed recently, various technologies have been applied to autonomous vehicles, robots, and industrial facilities. However, the benefits of these technologies are not reaching the visually impaired, who need them the most. In this paper, we propose an object detection system for the blind using deep learning technologies. We use voice recognition technology to learn what object a blind person wants, and then find that object via object recognition. Furthermore, a voice guidance technique is used to inform visually impaired persons of the location of objects. The object recognition deep learning model utilizes a deep neural network architecture, and voice recognition is designed through speech-to-text (STT) technology. In addition, a voice announcement is synthesized using text-to-speech (TTS) to make it easier for the blind to get information about objects. The system is built using Python and the OpenCV library. As a result, we implement an efficient object detection system that helps the blind find objects in a specific space without help from others, and the system is analyzed through experiments to verify its performance.

Index Terms- CNN, Image Processing, Bayes Theorem, YOLO v3, Object Detection, API

I. INTRODUCTION

Object detection is the process of finding and recognizing real-world object instances, such as cars, bikes, TVs, flowers, and humans, in images or videos. An object detection technique lets you understand the details of an image or a video, as it allows for the recognition, localization, and detection of multiple objects within an image. It is usually utilized in applications like image retrieval, security, surveillance, and advanced driver assistance systems (ADAS).

Object detection from video is a major task in video surveillance applications these days. Object detection techniques are used to identify required objects in video sequences and to cluster the pixels of these objects. The detection of an object in a video sequence plays a major role in several applications, specifically video surveillance. Object detection in a video stream can be done by processes such as pre-processing, segmentation, foreground and background extraction, and feature extraction. Humans can easily detect and identify objects present in an image. The human visual system is fast and accurate and can perform complex tasks like identifying multiple objects with little conscious thought. With the availability of large amounts of data, faster GPUs, and better algorithms, we can now easily train computers to detect and classify multiple objects within an image with high accuracy.

TensorFlow is an open-source software library for high-performance numerical computation. Due to its versatile design, it allows simple deployment of computation across a range of platforms (CPUs, GPUs, TPUs), from desktops to clusters of servers to mobile and edge devices. TensorFlow was designed and developed by researchers and engineers from the Google Brain team within Google's AI organization; it comes with robust support for machine learning and deep learning, and its versatile numerical computation core is used across several other scientific domains. TensorFlow is used to construct, train, and deploy object detection models easily, and it provides a collection of detection models pre-trained on the COCO dataset, the KITTI dataset, and the Open Images dataset. One among the numerous detection models is the combination of the Single Shot Detector (SSD) and MobileNets architecture, which is fast, efficient, and does not need huge computational capability to accomplish object detection.

YOLO is a real-time object detector. It applies a single neural network to the complete image, dividing the image into regions and predicting bounding boxes and probabilities for every region. The predicted probabilities are the basis on which these bounding boxes are weighted. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the full detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.
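
To make this single-pass design concrete, the following minimal Python sketch loads a pre-trained YOLOv3 network with OpenCV's DNN module and runs one forward pass over a whole image. The file names, the 416x416 input size, and the test image are assumptions based on the standard Darknet distribution, not details given in this paper.

    import cv2

    # Standard Darknet config and COCO-trained weights (assumed file names).
    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
    out_layers = net.getUnconnectedOutLayersNames()  # the YOLO output layers

    img = cv2.imread("room.jpg")  # hypothetical test image
    # Scale to [0, 1], resize to 416x416, and swap BGR to RGB.
    blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(out_layers)  # one evaluation of the full image

    # Each detection row holds [cx, cy, w, h, objectness, 80 class scores],
    # so boxes and class probabilities come out of the same single network.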

Image recognition/image processing is at the forefront of Artificial Intelligence today. It is, however, far from perfect. Seemingly simple scenarios, such as object detection, face recognition, and removing motion blur, and more complex scenarios, such as compression artefacts, scratch detection, sensor noise, and spilling detection, are applications of image recognition/image processing. Digitized images are often represented as a two-dimensional (2D) array of pixel values. Each pixel value, which makes up the colour scheme of the image, is often influenced by factors such as light intensity. The visual scene is projected onto a surface, where receptors (natural or artificial) produce values that depend on the intensity of incident light. These exciting concepts are, however, hard to implement. Forming an image leads to a loss of detail as a three-dimensional (3D) scene is collapsed into a two-dimensional image. Many other factors are responsible for why image recognition/image processing is hard. Some of these factors are noise in the image (pixel values that are off from their surrounding pixels), the mapping from scene to image, etc.
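
Since the argument above leans on the 2D pixel-array view of an image, a short sketch may help; the file name is a placeholder, and the additive Gaussian noise is an illustrative assumption, not a claim about any particular sensor.

    import cv2
    import numpy as np

    img = cv2.imread("scene.jpg")   # hypothetical input image
    print(img.shape)                # e.g. (480, 640, 3): rows x columns x BGR channels
    print(img[100, 200])            # one pixel's [B, G, R] intensity values

    # Pixels whose values are off from their surroundings act as noise;
    # additive Gaussian noise is one simple way to model this effect.
    noise = np.random.normal(0.0, 10.0, img.shape)
    noisy = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)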

II. RESEARCH AND IDEA

In various fields, there is a necessity to detect a target object and to track it effectively while handling occlusions and other complexities. Many researchers (Almeida and Guting 2004, Hsiao-Ping Tsai 2011, Nicolas Papadakis and Aurelie Bugeau 2010) have attempted various approaches to object tracking. The nature of the techniques largely depends on the application domain. Some of the research works which led to the proposed work in the field of object tracking are described as follows. Object detection is an important, yet challenging, vision task. It is a critical part of many applications such as image search, image auto-annotation, scene understanding, and object tracking. Moving-object tracking in video image sequences has been one of the most important subjects in computer vision. It has already been applied in many computer vision fields, such as smart video surveillance (Arun Hampapur 2005), artificial intelligence, military guidance, safety detection and robot navigation, and medical and biological applications. In recent years, a number of successful single-object tracking systems have appeared, but in the presence of several objects, object detection becomes difficult, and when objects are fully or partially occluded, they are obscured from view, which further increases the problem of detection, as do decreasing illumination and acquisition angle. The proposed MLP-based object tracking system is made robust by an optimum selection of unique features and also by implementing the AdaBoost strong classification method.

Background Subtraction: The background subtraction method by Horprasert et al (1999) was able to cope with local illumination changes, such as shadows and highlights, and even global illumination changes. In this method, the background was statistically modelled on each pixel. The computational colour model includes the brightness distortion and the chromaticity distortion, which were used to distinguish a shaded background from the ordinary background or moving foreground objects. The background and foreground subtraction method used the following approach. A pixel was modelled by a 4-tuple [Ei, si, ai, bi], where Ei is a vector with the expected colour value, si is a vector with the standard deviation of the colour value, ai is the variation of the brightness distortion, and bi is the variation of the chromaticity distortion of the i-th pixel. In the next step, the difference between the background image and the current image was evaluated. Each pixel was finally classified into one of four categories: original background, shaded background or shadow, highlighted background, and moving foreground object.
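
A simplified per-pixel version of this classification can be written down directly from the 4-tuple description. The distortion formulas below follow the standard Horprasert et al (1999) definitions, while the threshold values are illustrative assumptions, not the trained parameters of that paper.

    import numpy as np

    def classify_pixel(I, E, s, cd_thresh=10.0, a_low=0.6, a_high=1.4):
        """Classify one RGB pixel against its statistical background model.
        I: observed colour (3,), E: expected colour Ei, s: per-channel std si."""
        # Brightness distortion: the scale alpha that best fits I to E.
        alpha = np.sum(I * E / s**2) / np.sum((E / s)**2)
        # Chromaticity distortion: residual colour difference after scaling.
        cd = np.sqrt(np.sum(((I - alpha * E) / s)**2))
        if cd > cd_thresh:
            return "moving foreground object"
        if alpha < a_low:
            return "shaded background or shadow"
        if alpha > a_high:
            return "highlighted background"
        return "original background"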

Liyuan Li et al (2003) contributed a method for detecting foreground objects in non-stationary complex environments containing moving background objects. A Bayes decision rule was used for the classification of background and foreground changes based on inter-frame colour co-occurrence statistics, and an approach to store and quickly retrieve colour co-occurrence statistics was also established. In this method, foreground objects were detected in two steps. First, both the foreground and the background changes were extracted using background subtraction and temporal differencing. The frequent background changes were then recognized using the Bayes decision rule based on the learned colour co-occurrence statistics. Both short-term and long-term strategies to learn the frequent background changes were used.
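
The Bayes decision rule at the heart of this classification reduces to a simple posterior comparison. The sketch below states it for a single feature vector; the probability values would come from the learned colour co-occurrence statistics, and the function itself is an illustrative assumption, not Liyuan Li et al's implementation.

    def is_background(p_v_given_b, p_b, p_v):
        """Bayes decision rule: label feature vector v as background when
        P(b|v) > P(f|v). Since P(b|v) + P(f|v) = 1, this is equivalent to
        testing whether the posterior P(b|v) = P(v|b) * P(b) / P(v) > 0.5."""
        posterior_b = p_v_given_b * p_b / p_v
        return posterior_b > 0.5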

An algorithm focused on obtaining stationary foreground regions, as described by Álvaro Bayona et al (2010), was useful for applications like the detection of abandoned/stolen objects and parked vehicles. This algorithm mainly used two steps. Firstly, a sub-sampling scheme based on background subtraction techniques was implemented to obtain stationary foreground regions; it detects foreground changes at different time instants in the same pixel locations using a Gaussian distribution function. Secondly, some modifications were introduced to this base algorithm, such as thresholding the previously computed subtraction. The main purpose of this algorithm was to reduce the amount of stationary foreground detected.

3.1.2 Template Matching: Template matching is the technique of finding small parts of an image which match a template image. It slides the template from the top left to the bottom right of the image and compares it for the best match with the template. The template dimensions should be equal to or smaller than those of the reference image. It recognizes the segment with the highest correlation as the target. Given an image S and a template T, where the dimensions of S are larger than those of T, the task is to output whether S contains a subset image I where I and T are suitably similar in pattern and, if such an I exists, to output the location of I in S, as in Hager and Bellhumear (1998). Schweitzer et al (2011) derived an algorithm which used both upper and lower bounds to detect the 'k' best matches; Euclidean distance and Walsh transform kernels were used to calculate the match measure. On the positive side, the use of a priority queue improved the quality of the decision as to which bound to improve, and when good matches existed, the inherent cost was dominant and performance improved. But there were constraints: in the absence of good matches, the queue cost dominated, and the arithmetic operation cost was higher. The proposed method does not use a queue, thereby avoiding the queue cost, and instead uses template matching. Visual tracking methods can be roughly categorized in two ways, namely the feature-based and region-based methods, as proposed by Ken Ito and Shigeyuki Sakane (2001). The feature-based approach estimates the 3D pose of a target object to fit image features such as edges, given a 3D geometrical model of the object. This method requires much computational cost. Region-based methods can be classified into two categories, namely the parametric method and the view-based method. The parametric method assumes a parametric model of the images in the target image and calculates the optimal fitting of the model to the pixel data in a region. The view-based method is used to find the best match of a region in a search area given a reference template. This has the advantage that it does not require as much computational complexity as the feature-based approach.
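
The sliding-and-comparing procedure described in 3.1.2 is available directly in OpenCV. This minimal sketch, with assumed file names and an illustrative correlation threshold, locates the best match of a template T in an image S:

    import cv2

    S = cv2.imread("scene.jpg")      # reference image S (hypothetical files)
    T = cv2.imread("template.jpg")   # template T, no larger than S
    h, w = T.shape[:2]

    # Slide T over S, scoring each position by normalized correlation coefficient.
    scores = cv2.matchTemplate(S, T, cv2.TM_CCOEFF_NORMED)
    _, best_score, _, best_loc = cv2.minMaxLoc(scores)

    if best_score > 0.8:             # treat high correlation as a match
        top_left = best_loc
        bottom_right = (top_left[0] + w, top_left[1] + h)
        cv2.rectangle(S, top_left, bottom_right, (0, 255, 0), 2)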

III. SCOPE OF THE PROJECT


It is a field of Computer Vision that detects instances of semantic objects in images/videos (by creating bounding boxes around them, in our case). We can then convert the annotated text into voice responses and give the basic positions of the objects in the person's/camera's view.

1. Training Data: The model is trained with the Common Objects In Context (COCO) dataset. You can explore the images that they labelled in the link; it's pretty cool.

2. Model: The model here is the You Only Look Once (YOLO) algorithm that runs through a variation of an extremely complex Convolutional Neural Network architecture called Darknet. Even though we are using the more enhanced and complex YOLO v3 model, I will explain the original YOLO algorithm. Also, the Python cv2 package has a method to set up Darknet from our configuration in the yolov3.cfg file. I am more interested in getting something to work as soon as possible this time round, so I will be using a pre-trained model. This means that YOLO v3 has already been trained on COCO by others and we have already obtained the weights, stored in a 200+ MB file. If you are not sure what weights are, think of it as trying to find the best-fit line in linear regression: we need to find the right values of m and c in y = mx + c such that our line minimizes the error across all points. Now, in our more complex prediction task, we have millions of xs when we feed images into the network. Each of these xs has an m, and these are the predicted weights stored in our yolov3.weights file. The ms have been repeatedly adjusted to minimize some loss function.

3. Input Data: We will be using our webcam to feed images at 30 frames per second to this trained model, and we can set it to process only every other frame to speed things up. We can then send the text description to the Google Text-to-Speech API using the gTTS package. We will also obtain the coordinates of the bounding box of every object detected in our frames, overlay the boxes on the detected objects, and return the stream of frames as a video playback. We will also schedule voice feedback on the first frame of each second (instead of 30 fps), e.g. “bottom left cat”, meaning a cat was detected at the bottom-left of the camera view; a sketch of this feedback step follows below.

Understanding the YOLO algorithm: Previously, classification-based models were used to detect objects using localization, region-based classification, or approaches such as the sliding window, where only high-scoring regions of the image are considered detections; these could be very time-consuming.
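
To make the voice-feedback step concrete, here is a minimal sketch that maps a detected box centre to a coarse position phrase and synthesizes it with the gTTS package. The 3x3 grid split and the function name are assumptions for illustration; only the gTTS usage and the “bottom left cat” phrasing come from the text above.

    from gtts import gTTS

    def position_phrase(cx, cy, frame_w, frame_h, label):
        """Map a box centre (cx, cy) to a phrase such as 'bottom left cat'."""
        col = ["left", "center", "right"][min(int(3 * cx / frame_w), 2)]
        row = ["top", "middle", "bottom"][min(int(3 * cy / frame_h), 2)]
        return f"{row} {col} {label}"

    phrase = position_phrase(80, 400, 640, 480, "cat")      # -> "bottom left cat"
    gTTS(text=phrase, lang="en").save("announcement.mp3")   # played back to the user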

IV. THE PROPOSED SYSTEM

The most widely used state-of-the-art version of the R-CNN family, Faster R-CNN, was first published in 2015. This article, the third and final one of a series written to explain the fundamentals of current-day object detection, elaborates the technical details of the Faster R-CNN detection pipeline. For a review of its predecessors, check out the summaries of Regions with CNN (R-CNN) and Fast R-CNN. In the R-CNN family of papers, the evolution between versions was usually in terms of computational efficiency (integrating the different training stages), reduction in test time, and improvement in performance (mAP). These networks usually consist of: a) a region proposal algorithm to generate “bounding boxes”, i.e. locations of possible objects in the image; b) a feature generation stage to obtain features of these objects, usually using a CNN; c) a classification layer to predict which class each object belongs to; and d) a regression layer to make the coordinates of the object bounding box more precise.
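
The four components map directly onto off-the-shelf implementations. As a minimal sketch, assuming torchvision (which this paper does not itself use), a pre-trained Faster R-CNN can be queried as follows; the random tensor is a placeholder for a real RGB frame scaled to [0, 1].

    import torch
    import torchvision

    # COCO-pre-trained Faster R-CNN with a ResNet-50 FPN backbone.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    image = torch.rand(3, 480, 640)   # placeholder for a camera frame
    with torch.no_grad():
        pred = model([image])[0]

    # (a) the region proposal network suggests candidate boxes,
    # (b) the CNN backbone computes their features, and the heads return
    # (c) class labels with scores and (d) regressed box coordinates:
    print(pred["boxes"], pred["labels"], pred["scores"])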

V. RESULTS

In this program we used the YOLO algorithm for our machine-learning model. YOLO is a machine-learning model that was designed in 2016 to work best with the Darknet framework, but it was later made compatible with OpenCV, which we used in this project. To understand the workings of this project, we first need to understand how the YOLO algorithm actually works. With the help of the laptop's webcam, the system takes an image every second and passes those images to the YOLO algorithm. The algorithm then identifies the different objects in that particular image with the help of the COCO dataset; the COCO dataset contains the names and data of every object class the algorithm was trained on. The program takes an image every second and passes it to the algorithm, which detects objects in real time and gives out results with roughly 90% accuracy. You can further enhance the accuracy and differentiate a wider variety of objects depending on the size of your dataset and the processing power of your laptop.

Figure 3: Object Detection
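
The per-frame flow described above condenses into a short OpenCV loop. The sketch below is an illustrative reconstruction rather than the authors' exact code; it assumes the standard yolov3.cfg and yolov3.weights files, a 0.5 confidence threshold, and OpenCV's NMSBoxes call for suppressing overlapping boxes.

    import cv2
    import numpy as np

    net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
    out_layers = net.getUnconnectedOutLayersNames()

    cap = cv2.VideoCapture(0)                   # laptop webcam
    frame_id = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame_id += 1
        if frame_id % 2 == 0:                   # process every other frame
            h, w = frame.shape[:2]
            blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                         swapRB=True, crop=False)
            net.setInput(blob)
            boxes, scores = [], []
            for output in net.forward(out_layers):
                for det in output:
                    conf = det[5:].max()        # best class score
                    if conf > 0.5:
                        cx, cy = det[0] * w, det[1] * h
                        bw, bh = det[2] * w, det[3] * h
                        boxes.append([int(cx - bw / 2), int(cy - bh / 2),
                                      int(bw), int(bh)])
                        scores.append(float(conf))
            keep = cv2.dnn.NMSBoxes(boxes, scores, 0.5, 0.4)
            for i in np.array(keep).flatten():  # overlay surviving boxes
                x, y, bw, bh = boxes[i]
                cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
        cv2.imshow("detections", frame)
        if cv2.waitKey(1) == 27:                # Esc to stop
            break
    cap.release()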

VI. CONCLUSION

Object detection is a key ability for most computer and robot vision systems. Although great progress has been observed in recent years, and some existing techniques are now part of many consumer electronics (e.g., face detection for auto-focus in smartphones) or have been integrated into assistant driving technologies, we are still far from achieving human-level performance, in particular in terms of open-world learning. It should be noted that object detection has not been used much in many areas where it could be of great help. As mobile robots, and autonomous machines in general, start to be more widely deployed (e.g., quadcopters, drones and soon service robots), the need for object detection systems is gaining importance. Finally, we need to consider that we will need object detection systems for nano-robots, or for robots that will explore areas that have not been seen by humans, such as the deep parts of the sea or other planets, and there the detection systems will have to learn new object classes as they are encountered. In such cases, a real-time open-world learning ability will be critical.

In conclusion, this system strengthens cybersecurity by offering real-time detection, proactive threat mitigation, and scalability; it ensures reliable protection against cyber threats. Future enhancements will focus on expanding data sources and refining detection capabilities to keep up with the ever-evolving cybersecurity landscape.

AUTHORS

First Author – Nadipelli Ruchitha, B.Tech (IT), JBIET, [email protected]
Second Author – P. Shiva Kumar, B.Tech (IT), JBIET, [email protected]
Third Author – Marugalla Sridhar, B.Tech (IT), JBIET, [email protected]
Internal Guide – Jagadam Jyotsna, Asst. Professor (IT), JBIET, [email protected]
