
ISSN 2347-3983
Volume 9, No. 11, November 2021

International Journal of Emerging Trends in Engineering Research
Available Online at http://www.warse.org/IJETER/static/pdf/file/ijeter039112021.pdf
https://doi.org/10.30534/ijeter/2021/039112021

Object Detectors' Convolutional Neural Network Backbones: A Review and a Comparative Study

Sara Bouraya (1), Abdessamad Belangour (2)
(1) Laboratory of Information Technology and Modeling, Hassan II University, Faculty of Sciences Ben M'Sik, Casablanca, Morocco, [email protected]
(2) Laboratory of Information Technology and Modeling, Hassan II University, Faculty of Sciences Ben M'Sik, Casablanca, Morocco, [email protected]

Received Date: October 04, 2021    Accepted Date: October 25, 2021    Published Date: November 07, 2021

ABSTRACT

Computer vision is a scientific field that deals with how computers can acquire high-level understanding from digital images or videos. One of its keystones is object detection, which aims to identify relevant features in an image or video in order to detect objects. The backbone is the first stage of an object detection algorithm and plays a crucial role in detection quality. Object detectors are usually built on backbone networks originally designed for image classification, and detection performance depends heavily on the features those backbones extract: simply replacing a backbone with a deeper variant can yield a large gain in accuracy. The backbone also largely determines whether a detector can run in real time. In this paper, we review the crucial role of the deep learning era, and of convolutional neural networks in particular, in object detection tasks. We analyze a wide range of convolutional neural networks used as backbones of object detection models, building a review that researchers and scientists can use as a guideline for their work.

Key words: Object Detection, Deep Learning, Computer Vision, Backbone.

1. INTRODUCTION

Object detection is a computer vision technique for locating instances of objects in images or videos. Object detection models typically rely on machine learning or deep learning to produce meaningful results, and deep learning techniques for object detection have grown rapidly over the last decades, so a variety of models based on deep learning approaches is now available. Deep learning approaches can be divided into two categories: one-stage detectors such as YOLO [1], RetinaNet [2], and SSD [3], and two-stage detectors such as R-CNN [4], Fast R-CNN [5], Faster R-CNN [6], and Mask R-CNN [7] (see Figure 1).

Figure 1: Object Detection Methodologies' Categories

Traditional methodologies should not be ignored; they are generally based on three stages. First, informative region selection searches for object locations, which can appear at different scales and positions; based on a sliding window, this stage can be computationally expensive and capture irrelevant regions. Second, features are extracted with hand-crafted algorithms such as HOG or SIFT. Finally, the third stage relies on a classifier to label the target object. The main drawback of these methods is their computational cost.

Deep learning-based methods, on the other hand, follow the steps summarized in Figure 2.

Figure 2: Object Detection Based Deep Learning Architecture

As Figure 2 shows, object detection based on deep learning proceeds in several steps, starting from an input image or video frame. The next step is feature extraction, performed by the backbone networks that we review in this paper.


Backbones are convolutional neural networks composed of many layers. The neck refers to a collection of layers that aggregate feature maps, typically arranged as several top-down and several bottom-up paths. Finally, the head of the model predicts the bounding boxes of objects and their classes; it can belong to either a one-stage or a two-stage detector. Two-stage detectors are more complicated than one-stage detectors, which are elegant and straightforward.
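To make this division of labor concrete, the sketch below wires a pretrained ImageNet backbone, a minimal neck, and a one-stage-style prediction head together in TensorFlow/Keras. It is an illustrative skeleton under our own assumptions (the 1x1-convolution neck, the anchor and class counts), not the architecture of any specific detector.

```python
import tensorflow as tf

num_classes = 80   # illustrative value
num_anchors = 9    # illustrative value

# Backbone: a classification network with its classifier removed,
# reused as a feature extractor.
backbone = tf.keras.applications.ResNet50(include_top=False,
                                          weights="imagenet",
                                          input_shape=(512, 512, 3))

# Neck: a single 1x1 convolution standing in for feature aggregation
# (real detectors use FPN-style top-down/bottom-up paths).
neck = tf.keras.layers.Conv2D(256, 1, activation="relu")(backbone.output)

# Head: per-cell class scores and box offsets, as in one-stage detectors.
cls_head = tf.keras.layers.Conv2D(num_anchors * num_classes, 3,
                                  padding="same", name="class_logits")(neck)
box_head = tf.keras.layers.Conv2D(num_anchors * 4, 3,
                                  padding="same", name="box_offsets")(neck)

detector = tf.keras.Model(backbone.input, [cls_head, box_head])
detector.summary()
```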
Let us examine the architectures of these more complicated two-stage detectors and observe their improvements, ranging from object detection to object segmentation; in other words, from R-CNN [4] to Mask R-CNN [7].

R-CNN [4] stands for "Region-based Convolutional Neural Networks". It is one of the famous models that brought a large performance gain to object detection. Its architecture consists of two steps. First, it relies on selective search to identify several candidate object bounding boxes, named regions of interest (RoIs). Second, a CNN extracts features from each region separately for classification (see Figure 3).

Figure 3: R-CNN Architecture

Fast R-CNN [5] stands for "Fast Region-based Convolutional Neural Networks". To make R-CNN faster, the authors proposed additional training tricks to gain accuracy (see Figure 4). They improved the training process by unifying the three models of R-CNN into one jointly trained framework and increasing the shared computation: the network computes a feature map in a single CNN forward pass over the whole input instead of treating each region separately, and this shared feature map is then used as input for the classification and bounding box regression tasks, with fixed-size per-region features obtained by RoI pooling.

Figure 4: Fast R-CNN Architecture
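Fast R-CNN's key trick, extracting per-region features from one shared feature map, can be approximated in TensorFlow with tf.image.crop_and_resize, which behaves like a bilinear variant of RoI pooling. The sketch below is a simplified illustration with made-up shapes, not the exact RoI pooling layer of the paper.

```python
import tensorflow as tf

# One shared feature map for the whole image: a single backbone forward pass.
# Shapes are illustrative (batch=1, 32x32 spatial grid, 256 channels).
feature_map = tf.random.normal([1, 32, 32, 256])

# Region proposals in normalized [y1, x1, y2, x2] coordinates.
rois = tf.constant([[0.1, 0.1, 0.5, 0.4],
                    [0.3, 0.2, 0.9, 0.8]])
box_indices = tf.zeros([2], dtype=tf.int32)  # both RoIs come from image 0

# Crop each RoI from the shared map and resize it to a fixed 7x7 grid,
# approximating RoI pooling with bilinear sampling.
roi_features = tf.image.crop_and_resize(feature_map, rois, box_indices,
                                        crop_size=(7, 7))
print(roi_features.shape)  # (2, 7, 7, 256): one fixed-size feature per RoI
```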

Faster R-CNN [8] stands for "Faster Region-based Convolutional Neural Networks". The main idea behind Faster R-CNN [8] is to integrate the region proposal model into the CNN itself, which makes the R-CNN [4] family train much faster. First presented in 2015, its architecture builds a unified model composed of a region proposal network (RPN) and Fast R-CNN [5], which share convolutional feature layers.

Figure 5: Faster R-CNN Architecture

Mask R-CNN [7] stands for "Mask Region-based Convolutional Neural Networks". This model was proposed in 2017 to extend Faster R-CNN [6] to image segmentation (see Figure 6). Its main idea is to predict pixel-level masks: building on Faster R-CNN [6], Mask R-CNN [7] adds a third branch to the architecture, which predicts the mask at the same time as the classification and bounding box prediction tasks. The mask branch is a small fully convolutional network that performs segmentation on each region.

Figure 6: Mask R-CNN Architecture
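The region proposal network introduced by Faster R-CNN slides over the shared feature map and scores a fixed set of reference boxes (anchors) at every position. The snippet below generates such an anchor grid with NumPy; the stride, scales, and aspect ratios are illustrative defaults of our own choosing, not the exact values of the paper.

```python
import numpy as np

def anchor_grid(feat_h, feat_w, stride=16,
                scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Centered (cx, cy, w, h) anchors for every feature-map cell."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # cell center in pixels
            for s in scales:
                for r in ratios:
                    # Same area s*s for each scale, varied width/height ratio.
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return np.array(anchors)

# A 32x32 feature map with 9 anchors per cell yields 9216 candidate boxes.
print(anchor_grid(32, 32).shape)  # (9216, 4)
```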

2. BACKGROUND

All the architectures discussed in the previous section rely on a backbone. In this section, we discuss some of the backbones that are useful in object detection, such as VGG [9], ResNet [10], and others.

Convolutional neural networks have been used in several visual tasks, one of which is image classification. Their main role there is feature extraction, which leads us back to backbones: many scientists adopt models that were successful in the ImageNet classification contest as the backbone of their own models to gain better performance. These convolutional neural networks have different architectures and characteristics.
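Reusing an ImageNet classifier as a backbone usually means dropping its classification head and keeping the convolutional feature extractor. A minimal sketch in TensorFlow/Keras follows; the build_backbone helper and the set of networks offered are our own illustration, not code from any of the cited papers.

```python
import tensorflow as tf

# Swap backbones by name, keeping only the convolutional feature
# extractor (include_top=False drops the ImageNet classifier).
BACKBONES = {
    "vgg16": tf.keras.applications.VGG16,
    "resnet50": tf.keras.applications.ResNet50,
    "mobilenet": tf.keras.applications.MobileNet,
}

def build_backbone(name, input_shape=(224, 224, 3)):
    return BACKBONES[name](include_top=False, weights="imagenet",
                           input_shape=input_shape)

# Replacing a backbone with a deeper or lighter variant is a one-line change.
features = build_backbone("resnet50")(tf.random.normal([1, 224, 224, 3]))
print(features.shape)  # (1, 7, 7, 2048) for ResNet50
```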


AlexNet [11] is widely considered the pioneer of convolutional neural networks and the starting point of the deep learning boom. AlexNet [11] competed in the famous ImageNet Large Scale Visual Recognition Challenge in 2012, where the proposed network achieved high accuracy. The AlexNet [11] architecture, depicted in Figure 7, is composed of eight learned layers: five convolutional layers, several of them followed by max pooling, and three fully connected layers, with a softmax classifier at the end.

Figure 7: An illustration of the AlexNet architecture (Conv 1-5, with max pooling after Conv 1, Conv 2, and Conv 5, followed by Dense 6-8)
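As a rough sketch, the eight-layer pattern of Figure 7 can be written down in Keras as follows. The filter counts and kernel sizes follow the original paper as commonly reported, but treat them as approximate; the point is the five-convolution/three-dense structure, not a faithful reproduction.

```python
import tensorflow as tf
from tensorflow.keras import layers

# An AlexNet-style network: 5 convolutional + 3 fully connected layers.
# Hyperparameters are approximate, for illustration only.
alexnet = tf.keras.Sequential([
    layers.Conv2D(96, 11, strides=4, activation="relu",
                  input_shape=(227, 227, 3)),                  # Conv 1
    layers.MaxPooling2D(3, strides=2),                         # Max Pooling 1
    layers.Conv2D(256, 5, padding="same", activation="relu"),  # Conv 2
    layers.MaxPooling2D(3, strides=2),                         # Max Pooling 2
    layers.Conv2D(384, 3, padding="same", activation="relu"),  # Conv 3
    layers.Conv2D(384, 3, padding="same", activation="relu"),  # Conv 4
    layers.Conv2D(256, 3, padding="same", activation="relu"),  # Conv 5
    layers.MaxPooling2D(3, strides=2),                         # Max Pooling 5
    layers.Flatten(),
    layers.Dense(4096, activation="relu"),                     # Dense 6
    layers.Dense(4096, activation="relu"),                     # Dense 7
    layers.Dense(1000, activation="softmax"),                  # Dense 8
])
alexnet.summary()
```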

VGG16 [9] is a convolutional neural network that achieved top results in the ImageNet Large Scale Visual Recognition Challenge 2014, winning the localization task and placing second in classification; it was regarded as one of the best models of its time. The 16 in VGG16 [9] refers to its 16 weight layers. VGG16 [9] is indeed a large model, with approximately 138 million parameters. As shown in Figure 8, VGG16 [9] has five convolution blocks and one fully connected block. Each convolution block contains a set of convolutional layers followed by a pooling layer, and the final three fully connected layers are labeled Dense in Figure 8.

Figure 8: An illustration of the VGG16 architecture (convolution blocks Conv 1-1 through Conv 5-3, each block followed by pooling, then three Dense layers)
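The repetitive block structure makes VGG easy to express programmatically. Below is a small sketch that assembles VGG16's convolutional part from a per-block configuration; the (filters, repeats) tuples mirror Figure 8 and the original paper, while the helper itself is our own illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

# VGG16's five convolution blocks: (number of filters, number of conv layers).
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]

def vgg16_features(input_shape=(224, 224, 3)):
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for filters, repeats in VGG16_BLOCKS:
        for _ in range(repeats):
            # 3x3 convolutions throughout, as in the original VGG design.
            x = layers.Conv2D(filters, 3, padding="same",
                              activation="relu")(x)
        x = layers.MaxPooling2D(2, strides=2)(x)  # halve the spatial size
    return tf.keras.Model(inputs, x)

print(vgg16_features().output_shape)  # (None, 7, 7, 512)
```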

ResNet18 [12] is a convolutional neural network from the Residual Network family, which won the ImageNet Large Scale Visual Recognition Challenge classification competition in 2015; its authors also trained networks with over 100 and even 1000 layers. The defining feature of residual networks is the shortcut connection, which adds a block's input to its output and makes very deep networks trainable. The 18 in ResNet18 refers to its 18 weight layers: an initial convolution, sixteen convolutions arranged in residual blocks, and a final fully connected layer, with max pooling after the first convolution and average pooling before the classifier (see Figure 9).

Figure 9: An illustration of the ResNet18 architecture (Conv 1 with pooling, residual convolutions Conv 2-1 through Conv 5-4, average pooling, and a Dense layer)
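A hedged Keras sketch of the basic two-convolution residual block used in ResNet18 follows; the projection shortcut for changing strides or widths is included, though details such as normalization placement follow common practice rather than a line-by-line reading of the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, filters, stride=1):
    """Basic ResNet block: two 3x3 convolutions plus a shortcut connection."""
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # 1x1 projection so the shortcut matches the new shape when needed.
    if stride != 1 or shortcut.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride)(shortcut)
    return layers.ReLU()(layers.Add()([y, shortcut]))  # the residual sum

inputs = tf.keras.Input(shape=(56, 56, 64))
outputs = residual_block(residual_block(inputs, 64), 128, stride=2)
print(tf.keras.Model(inputs, outputs).output_shape)  # (None, 28, 28, 128)
```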

GoogleNet [13] is based on Inception modules, as shown in Figure 10. Each Inception module is composed of several convolutional layers and a max pooling layer arranged in four parallel branches, whose outputs are merged by filter concatenation.

Figure 10: An illustration of the Inception architecture (parallel 1*1, 3*3, and 5*5 convolutions and 3*3 max pooling over the previous layer, with 1*1 convolutions for dimension reduction, merged by filter concatenation)
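The four parallel branches of Figure 10 translate directly into the Keras functional API. The default filter counts below happen to match the first Inception module as commonly reported, but treat them as illustrative; the structure, with 1*1 reductions before the expensive 3*3 and 5*5 convolutions plus a pooled branch, is the point.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, f1=64, f3_reduce=96, f3=128,
                     f5_reduce=16, f5=32, pool_proj=32):
    """Inception block: four parallel branches merged by filter concatenation."""
    b1 = layers.Conv2D(f1, 1, activation="relu", padding="same")(x)

    b2 = layers.Conv2D(f3_reduce, 1, activation="relu", padding="same")(x)
    b2 = layers.Conv2D(f3, 3, activation="relu", padding="same")(b2)

    b3 = layers.Conv2D(f5_reduce, 1, activation="relu", padding="same")(x)
    b3 = layers.Conv2D(f5, 5, activation="relu", padding="same")(b3)

    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(pool_proj, 1, activation="relu", padding="same")(b4)

    return layers.Concatenate()([b1, b2, b3, b4])  # stack along channels

inputs = tf.keras.Input(shape=(28, 28, 192))
print(tf.keras.Model(inputs, inception_module(inputs)).output_shape)
# (None, 28, 28, 256): 64 + 128 + 32 + 32 channels
```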

The GoogleNet [13] architecture contains 22 layers with learned parameters (27 layers counting pooling), including 9 Inception modules in total. After the Inception modules comes a global average pooling layer, as illustrated in Figure 11.


Figure 11: An illustration of the GoogleNet architecture (stem convolutions with max pooling, Inception modules 3a-3b, 4a-4e, and 5a-5b interleaved with max pooling, then average pooling, 40% dropout, a linear layer, and softmax)

In the DenseNet [14] architecture, each layer is connected to every subsequent layer, hence the name Densely Connected Convolutional Network. This is the main idea of DenseNet, and it is extremely powerful: the input of each layer inside DenseNet [14] is the concatenation of the feature maps of all preceding layers (see Figure 12).

Figure 12: An illustration of the DenseNet architecture (initial convolution and max pooling; dense blocks of 6, 12, 24, and 16 dense layers separated by transition layers of convolution and average pooling; then average pooling, a Dense layer, and softmax)
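Concretely, a dense block keeps a running list of feature maps and concatenates them before every new layer. A minimal sketch follows, assuming a simplified layer of BatchNorm-ReLU-3x3 convolution and a "growth rate" of new channels per layer; both choices are ours for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def dense_block(x, num_layers=6, growth_rate=32):
    """Each new layer sees the concatenation of ALL previous feature maps."""
    features = [x]
    for _ in range(num_layers):
        y = layers.Concatenate()(features) if len(features) > 1 else features[0]
        y = layers.BatchNormalization()(y)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding="same")(y)  # adds 32 channels
        features.append(y)
    return layers.Concatenate()(features)

inputs = tf.keras.Input(shape=(56, 56, 64))
# 64 input channels + 6 layers x 32 new channels each = 256 output channels.
print(tf.keras.Model(inputs, dense_block(inputs)).output_shape)
# (None, 56, 56, 256)
```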

MobileNet [15] utilizes depthwise separable convolutions instead of standard convolutions (except in the first layer) to reduce computation and model size, so it can be used to construct lightweight deep neural networks for mobile and embedded vision applications. All layers are followed by batch normalization and a ReLU non-linearity, except the final layer, which is a fully connected layer without any non-linearity feeding a softmax for classification (see Figure 13).
Figure 13: An illustration of the MobileNet architecture (a standard convolution followed by alternating depthwise (Conv dw) and pointwise (Conv) convolutions, with one block repeated 5 times, then average pooling, a Dense layer, and softmax)
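A depthwise separable convolution factorizes a standard convolution into a per-channel (depthwise) 3x3 filter followed by a 1x1 (pointwise) convolution that mixes channels. Below is a sketch of one MobileNet-style block in Keras; as in the paper, batch normalization and ReLU follow each of the two convolutions, while the shapes are our own example.

```python
import tensorflow as tf
from tensorflow.keras import layers

def separable_block(x, pointwise_filters, stride=1):
    """Depthwise 3x3 conv (one filter per channel) + pointwise 1x1 conv."""
    x = layers.DepthwiseConv2D(3, strides=stride, padding="same")(x)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    x = layers.Conv2D(pointwise_filters, 1)(x)  # 1x1 conv mixes channels
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

inputs = tf.keras.Input(shape=(112, 112, 32))
outputs = separable_block(inputs, 64)
model = tf.keras.Model(inputs, outputs)
# A standard 3x3, 32->64 convolution would need 18,432 weights;
# this factorized block uses about 2,300 (288 depthwise + 2,048 pointwise).
model.summary()
```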

3. COMPARISON OF BACKBONES

Table 1 lists the deep learning models used for the classification task of the ImageNet Large Scale Visual Recognition Challenge; the number in each model's name refers to its number of layers. The table gives the model name, reference, paper title, accuracy, and time (see Table 1). Our comparison criteria are time and accuracy. Time refers to the training time on the ImageNet dataset. Accuracy is an evaluation metric that describes how the model performs across all classes: it is the ratio of the number of correct predictions to the total number of predictions, expressed between 0% and 100%. Other performance metrics exist as well, such as recall and precision.

Table 1: Accuracy and Time of classification models based on deep learning

Model             | Ref  | Paper title                                                                        | Accuracy % | Time
vgg16             | [9]  | Very Deep Convolutional Networks for Large-Scale Image Recognition                | 70.79      | 24.95
vgg19             | [9]  |                                                                                    | 70.89      | 24.95
resnet18          | [10] | Deep Residual Learning for Image Recognition                                      | 68.24      | 16.07
resnet50          | [10] |                                                                                    | 74.81      | 22.62
resnet101         | [10] |                                                                                    | 76.58      | 33.03
resnet152         | [10] |                                                                                    | 76.66      | 42.37
resnet50v2        | [10] |                                                                                    | 69.73      | 19.56
resnet101v2       | [10] |                                                                                    | 71.93      | 28.80
resnet152v2       | [10] |                                                                                    | 72.29      | 41.09
resnext50         | [16] | Aggregated Residual Transformations for Deep Neural Networks                      | 77.36      | 37.57
resnext101        | [16] |                                                                                    | 78.48      | 60.07
densenet121       | [14] | Densely Connected Convolutional Networks                                          | 74.67      | 27.66
densenet169       | [14] |                                                                                    | 75.85      | 33.71
densenet201       | [14] |                                                                                    | 77.13      | 42.40
inceptionv3       | [17] | Rethinking the Inception Architecture for Computer Vision                         | 77.55      | 38.94
xception          | [18] | Xception: Deep Learning with Depthwise Separable Convolutions                     | 78.87      | 42.18
inceptionresnetv2 | [19] | Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning | 80.03      | 54.77
seresnet18        | [20] | Squeeze-and-Excitation Networks                                                   | 69.41      | 20.19
seresnet34        | [20] |                                                                                    | 72.60      | 22.20
seresnet50        | [20] |                                                                                    | 76.44      | 23.64
seresnet101       | [20] |                                                                                    | 77.92      | 32.55
seresnet152       | [20] |                                                                                    | 78.34      | 47.88
seresnext50       | [20] |                                                                                    | 78.74      | 38.29
seresnext101      | [20] |                                                                                    | 79.88      | 62.80
senet154          | [20] |                                                                                    | 81.06      | 137.36
nasnetlarge       | [21] | Learning Transferable Architectures for Scalable Image Recognition                | 82.12      | 116.53
nasnetmobile      | [21] |                                                                                    | 74.04      | 27.73
mobilenet         | [15] | MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications | 70.36      | 15.50
mobilenetv2       | [22] | MobileNetV2: Inverted Residuals and Linear Bottlenecks                            | 71.63      | 18.31

Having gathered the main methods, we compare them (see Table 1): on the one hand to find the best model in terms of time (see Figure 14), and on the other hand in terms of accuracy. The bar plots below visualize the results.


Figure 14: Training time comparison of classification models based on deep learning

On the other hand, in terms of accuracy, the comparison is shown in the bar plot of Figure 15.

Figure 15: Accuracy comparison of classification models based on deep learning
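Such plots are straightforward to reproduce from Table 1. A minimal matplotlib sketch follows, shown for a representative subset of the models, with the values copied from Table 1.

```python
import matplotlib.pyplot as plt

# A subset of Table 1: model -> (accuracy %, training time).
results = {
    "mobilenet":         (70.36,  15.50),
    "resnet18":          (68.24,  16.07),
    "resnet50":          (74.81,  22.62),
    "densenet121":       (74.67,  27.66),
    "xception":          (78.87,  42.18),
    "inceptionresnetv2": (80.03,  54.77),
    "nasnetlarge":       (82.12, 116.53),
    "senet154":          (81.06, 137.36),
}

names = list(results)
acc = [results[m][0] for m in names]
times = [results[m][1] for m in names]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.bar(names, times)               # cf. Figure 14: training time
ax1.set_ylabel("Time")
ax2.bar(names, acc)                 # cf. Figure 15: accuracy
ax2.set_ylabel("Accuracy (%)")
for ax in (ax1, ax2):
    ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
plt.show()
```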

4. DISCUSSION

This paper covers many models. We started by gathering the relevant object detection methods, which divide into two categories: traditional approaches and deep learning-based approaches. We are interested in the deep learning-based ones, which split into two techniques: one-stage detectors and two-stage detectors.

As a second step, we observed that deep learning-based detectors are assembled from smaller deep learning components: their architecture contains a backbone, a neck, and finally a sparse or dense prediction head, depending on the category. We concentrated on the backbone part and gathered its most famous methodologies. After gathering these deep learning backbones, each a stack of layers, we discussed every method separately, without ignoring the architecture of each backbone model.

Finally, after discussing and analyzing the architectures, we defined a benchmark table that reports performance in terms of time and accuracy, measured on the ImageNet dataset. From these results we produced bar plots that visualize the best and the worst methods on both criteria.

In terms of time, MobileNet and ResNet18 take the least training time, unlike SENet154 and NASNetLarge. Based on our comparison, NASNetLarge and SENet154 reach the highest performance in terms of accuracy, but in terms of time they are the worst. ResNeXt101, InceptionResNetV2, and SEResNeXt101 all exceed 78% accuracy while taking a middle place in training time. Other models, such as ResNet18 and MobileNet, are excellent in terms of time but reach only around 68-70% accuracy.

In general, more layers increase accuracy but also increase training time, which is undesirable. The main goal of researchers in the deep learning era is a higher accuracy metric with less training time.
5. CONCLUSION

This paper has given a global vision of object detection, covering one-stage and two-stage detectors, as well as a close-up view of their backbone part, and it has provided a comparison of classification models. We presented the techniques of object detection, both the traditional ones and those based on deep learning, focusing on two-stage detectors, which are built on the backbone, or feature extraction, stage. Furthermore, we stated the most relevant deep learning techniques and their architectures. Additionally, we surveyed the most relevant image classification techniques from the ImageNet Large Scale Visual Recognition Challenge classification competition, discussing and dissecting the architecture of each. After gathering these image classification techniques, we compared the models in terms of time and accuracy, given the importance of both criteria in this field.

Future work is to implement these techniques with TensorFlow for object detection. Some of the models are good in terms of accuracy and others in terms of time; our future work will therefore focus on finding a new model that combines low training time with high accuracy. This is our main challenge, to be implemented in deep learning-based object detection approaches.

REFERENCES

1. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 779-788, 2016.
2. T. Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal Loss for Dense Object Detection," IEEE Trans. Pattern Anal. Mach. Intell., pp. 318-327, 2020.
3. W. Liu et al., "SSD: Single Shot MultiBox Detector," Lect. Notes Comput. Sci., pp. 21-37, 2016.
4. R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 580-587, 2014.
5. R. Girshick, "Fast R-CNN," Proc. IEEE Int. Conf. Comput. Vis., pp. 1440-1448, 2015.
6. S. Ren, K. He, and R. Girshick, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," pp. 1-9.
7. K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," IEEE Trans. Pattern Anal. Mach. Intell., pp. 386-397, 2020.
8. S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Trans. Pattern Anal. Mach. Intell., pp. 1137-1149, 2017.
9. K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," Int. Conf. Learn. Represent. (ICLR), 2015.
10. K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," pp. 1-9.
11. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," NIPS, pp. 84-90, 2012.
12. K. He, X. Zhang, S. Ren, and J. Sun, "Deep Residual Learning for Image Recognition," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 770-778, 2016.
13. C. Szegedy et al., "Going Deeper with Convolutions," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 1-9, 2015.
14. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, "Densely Connected Convolutional Networks," Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017.


15. A. G. Howard et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," 2017.
16. S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, "Aggregated Residual Transformations for Deep Neural Networks," Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, pp. 5987-5995, 2017.
17. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the Inception Architecture for Computer Vision," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 2818-2826, 2016.
18. F. Chollet, "Xception: Deep Learning with Depthwise Separable Convolutions," Proc. 30th IEEE Conf. Comput. Vis. Pattern Recognition, CVPR 2017, pp. 1800-1807, 2017.
19. C. Szegedy, S. Ioffe, V. Vanhoucke, and A. A. Alemi, "Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning," Proc. AAAI Conf. Artif. Intell., 2017.
20. J. Hu, L. Shen, S. Albanie, G. Sun, and E. Wu, "Squeeze-and-Excitation Networks," IEEE Trans. Pattern Anal. Mach. Intell., pp. 2011-2023, 2020.
21. B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le, "Learning Transferable Architectures for Scalable Image Recognition," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 8697-8710, 2018.
22. M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. C. Chen, "MobileNetV2: Inverted Residuals and Linear Bottlenecks," Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit., pp. 4510-4520, 2018.
23. S. Kumar and J. Tiwari, "A Review: Machine Learning Approach and Deep Learning Approach for Fake News Detection," International Journal of Emerging Technologies in Engineering Research (IJETER), Volume 9, Issue 8, August 2021.
