8 Object Detection
Computer Vision
Object Detection
Steve Elston
https://huggingface.co/models?pipeline_tag=object-detection
Overview of Object Detection
As an alternative, semantic segmentation can be used – the algorithms are closely related
• Semantic segmentation is a primary tool for scene understanding
• More on this method in another lesson
Overview of Object Detection
Object detection is a hard problem - Real-world scenes are cluttered
[Figure: classification accuracy with increasing number of categories vs. increasing model complexity; confidence increases along the curve]
• Lower complexity straight-through models are faster
• Choose a model to meet requirements, e.g. single shot detectors
[Figure: a predicted bounding box (bx, by, bw, bh) drawn relative to a prior box with dimensions (pw, ph) at grid cell offset (cx, cy)]
One common, stable parameterization (used in the YOLO family) of the raw outputs (tx, ty, tw, th) is:

bx = σ(tx) + cx
by = σ(ty) + cy
bw = pw e^{tw}
bh = ph e^{th}

• The bounding box is now constrained and the parameterization is stable
• p0 is the probability the box contains an object
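This constrained decoding can be sketched in a few lines; the function and argument names below are illustrative, assuming raw network outputs t = (tx, ty, tw, th), a grid-cell offset, and a prior box:

```python
import math

def decode_box(t, cell_xy, prior_wh):
    """Decode raw outputs t = (tx, ty, tw, th) into a box (bx, by, bw, bh).

    cell_xy:  (cx, cy) offset of the grid cell containing the prediction.
    prior_wh: (pw, ph) dimensions of the prior (default) box.
    """
    tx, ty, tw, th = t
    sigmoid = lambda z: 1.0 / (1.0 + math.exp(-z))
    bx = sigmoid(tx) + cell_xy[0]    # center constrained to lie in the cell
    by = sigmoid(ty) + cell_xy[1]
    bw = prior_wh[0] * math.exp(tw)  # width/height scale the prior box
    bh = prior_wh[1] * math.exp(th)
    return bx, by, bw, bh
```

Because the sigmoid is bounded in (0, 1), the predicted center cannot drift out of its grid cell, which is what makes this parameterization stable.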
Evaluation of bounding box proposals
How can we evaluate bounding boxes computed with object detection?
• Compare the computed bounding box with the marked bounding box
(label)
• Use the ratio of the area of the intersection divided by the area of the
union
• Intersection over union or IoU metric
• Range:
• 0.0 – no overlap
• 1.0 – perfect match
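A minimal sketch of the IoU computation, assuming corner-format boxes (x1, y1, x2, y2):

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes give 1.0, disjoint boxes give 0.0, and partial overlap falls in between.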
Evaluation of bounding box proposals
How can we evaluate bounding boxes computed with object
detection?
[Figure: the intersection and the union of two overlapping bounding boxes]
Evaluation of bounding box proposals
How can we evaluate bounding boxes computed with object detection?
• The closer the prediction is to the ground-truth bounding box the higher
the IoU
• We say higher IoU predictions have greater confidence
Loss functions for object detection
How can we construct a multi-task loss function for this problem?
• The overall loss is a weighted sum of a confidence term and a localization term (the SSD formulation):

L(x, c, l, g) = (1/N) [L_conf(x, c) + α L_loc(x, l, g)]

Where:
• α is a trade-off parameter between confidence and location accuracy
• x_ij^p is a binary indicator tensor for matching the i-th default box to the j-th ground truth box of category p
• c is the class of the object
• l are the parameters of the predicted box and g are the parameters of the ground truth box
• N is the number of matched default boxes
Loss functions for object detection
How can we construct a multi-task loss function for this problem?
• The loss component for bounding box localization accuracy uses a smooth L1 distance with respect to the encoded ground truth box location, ĝ, of the i-th box:

L_loc(x, l, g) = Σ_{i ∈ Pos} Σ_{m ∈ {cx, cy, w, h}} x_ij^k smooth_L1(l_i^m - ĝ_j^m)

Where the bounding box prediction, l, has four components for the center, {cx, cy}, and dimensions, {w, h}, encoded with respect to the default bounding box, d:

ĝ_j^cx = (g_j^cx - d_i^cx) / d_i^w
ĝ_j^cy = (g_j^cy - d_i^cy) / d_i^h
ĝ_j^w = log(g_j^w / d_i^w)
ĝ_j^h = log(g_j^h / d_i^h)
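The smooth L1 distance and the encoding of a ground truth box against a default box can be sketched as follows; boxes are in center-size format (cx, cy, w, h) and the names are illustrative:

```python
import math

def smooth_l1(x, beta=1.0):
    """Smooth L1 (Huber-style) distance: quadratic near zero, linear in the tails."""
    ax = abs(x)
    return 0.5 * ax * ax / beta if ax < beta else ax - 0.5 * beta

def encode_box(g, d):
    """Encode a ground truth box g = (cx, cy, w, h) relative to a default box d."""
    return ((g[0] - d[0]) / d[2],       # center offsets, scaled by default size
            (g[1] - d[1]) / d[3],
            math.log(g[2] / d[2]),      # log-space width/height ratios
            math.log(g[3] / d[3]))
```

The quadratic region of smooth L1 keeps gradients small for near-correct boxes, while the linear tails avoid exploding gradients from outliers.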
Loss functions for object detection
How can we construct a multi-task loss function for this problem?
• The confidence loss component for correct identification at each location is a softmax cross-entropy over positive and negative (background) boxes:

L_conf(x, c) = - Σ_{i ∈ Pos} x_ij^p log(ĉ_i^p) - Σ_{i ∈ Neg} log(ĉ_i^0)

And,

ĉ_i^p = exp(c_i^p) / Σ_p exp(c_i^p)

Where:
• ĉ_i^p is the softmax confidence for the p-th category in box prediction i
• ĉ_i^0 is the confidence for the background (no object) category in box prediction i
Loss Functions for Object Detection
Class imbalance with object detection
• Class imbalance is a significant problem when training object
detection models
• Example: Foreground objects are generally only a small fraction of pixels
• Example: Many types of small-area background categories – e.g. stripes on a
road
• To overcome class imbalance problems, Li, et al., use two approaches:
• Focal loss is applied in the position head
• Training the end-to-end network uses Dice loss
Loss Function for Object Detection
Dice-Sørensen coefficient, or Dice loss, is considered more robust to
class imbalance
• For two sets, X, Y, the Dice-Sørensen coefficient is defined:

DSC = 2 |X ∩ Y| / (|X| + |Y|)

• As a loss, with y_i the binary label and p_i the binary category prediction:

Dice loss = 1 - 2 Σ_i y_i p_i / (Σ_i y_i + Σ_i p_i)
• Dice loss is equivalent to F1 loss
• Full details on loss functions for training semantic segmentation models
can be found in Jadon, 2020, or Jeremy Jordan’s blog post
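A soft Dice loss over binary labels and predicted probabilities might look like this; the small eps term, an assumption here, guards against division by zero:

```python
def dice_loss(y_true, y_pred, eps=1e-7):
    """Soft Dice loss for binary predictions.

    y_true: iterable of 0/1 labels; y_pred: predicted probabilities in [0, 1].
    """
    inter = sum(t * p for t, p in zip(y_true, y_pred))   # |X ∩ Y| analogue
    total = sum(y_true) + sum(y_pred)                    # |X| + |Y| analogue
    return 1.0 - (2.0 * inter + eps) / (total + eps)
```

Because the loss is a ratio of overlap to total mass, a tiny foreground class is not swamped by the many background pixels, which is why Dice is more robust to class imbalance.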
Loss Function for Object Detection
Focal loss addresses class imbalance by reweighting cross-entropy
• We can write binary cross-entropy in the well known form:

CE(p_t) = - log(p_t)

where p_t = p for the positive class and p_t = 1 - p otherwise
• Focal loss adds a modulating factor:

FL(p_t) = - (1 - p_t)^γ log(p_t)

Where:
• γ ≥ 0 is a focusing hyperparameter
• - log(p_t) is the cross-entropy
• The term (1 - p_t)^γ down-weights easy to learn categories
Find many more details in Lin, et al., 2018
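A direct transcription of the binary focal loss; with gamma = 0 it reduces to ordinary cross-entropy:

```python
import math

def focal_loss(p, y, gamma=2.0):
    """Binary focal loss: down-weights easy, well-classified examples.

    p: predicted probability of the positive class; y: label in {0, 1}.
    """
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)   # (1 - p_t)^gamma modulates CE
```

Confident, correct predictions (p_t near 1) contribute almost nothing, so training gradient is dominated by the hard, rare examples.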
Evaluation of object detection
Need multiple criteria to evaluate object detection
• Is there an object in the box?
• Is the bounding box correct?
• Is the object classification in the box correct?
• Need metrics for accuracy of bounding box and object class predictions
• Average precision is measured on the recall-precision trade-off curve
• Use mean average precision – mAP
• Mean is taken over the average precision of all object classes
Evaluation of object detection
Review of the classification model metrics
• Selectivity or Precision: TP / (TP + FP)
– Fraction of cases classified as positive which are correctly classified
• Sensitivity or Recall: TP / (TP + FN)
– The fraction of positive cases correctly classified
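In code, from the counts of true positives, false positives and false negatives:

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that are found."""
    return tp / (tp + fn)
```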
[Figure: precision-recall curve, with recall increasing along the horizontal axis and the IoU threshold increasing along the curve]
• Average precision is the Area Under the Curve (AUC) of the precision-recall curve
• Approximate the AUC as the sum of the areas of rectangles at each threshold sample
• Usually sample precision at 10 threshold (IoU) values
Evaluation of object detection
Computing mean average precision - mAP
1. Compute the average precision for each of the c classes
2. Compute the mean of the average precisions over all c classes
3. Report mAP as a percentage
• Perfect performance = 100%
• No correct detection and classification = 0%
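The rectangle-sum approximation of AP and the mean over classes can be sketched as follows, assuming precision/recall samples ordered by increasing recall:

```python
def average_precision(precisions, recalls):
    """Approximate AP as a sum of rectangle areas under the precision-recall curve.

    precisions/recalls: matched curve samples, ordered by increasing recall.
    """
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)   # rectangle: height p, width delta-recall
        prev_recall = r
    return ap

def mean_average_precision(ap_per_class):
    """mAP: mean of the per-class APs, reported as a percentage."""
    return 100.0 * sum(ap_per_class) / len(ap_per_class)
```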
Architectures of object detectors
Architectural components of single shot object detectors – example YOLOv4:
• Backbone: a convolutional NN that creates the feature map
• Neck: accommodates objects at multiple scales
• Head: detects objects, identifies them, and applies non-maximal suppression to produce a sparse prediction
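The head's non-maximal suppression step can be sketched greedily: repeatedly keep the highest-scoring box and discard remaining boxes that overlap it above an IoU threshold. Box format and names below are illustrative:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximal suppression; returns indices of the boxes kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # drop remaining boxes that overlap the kept box too strongly
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

This is the simple greedy variant; production detectors often use batched or soft NMS, but the idea is the same.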
Backbones: CNNs Create Feature Maps
Many choices have been tried
• VGG-16
• ResNet-50
• EfficientNet-B0/B7
• Darknet-53
• Others…
Neck: Working with multiple scales
Images contain objects at multiple scales
• Need to detect objects across a wide range of scales
• There is a trade-off between semantics and detail
• Large scale has better semantics
• Fine scale has more detail
• Deep neural network architecture produces multiple scales
• Convolution with max pooling reduces detail
• Deeper layers with better semantics
Neck: Working with multiple scales
Convolutional neural network with multi-scale feature map (pyramid)
[Figure: the backbone's convolution/max-pooling layers create the feature map; the neck's convolution/up-sampling layers produce the multi-scale feature map (pyramid); the head predicts bounding boxes and identifications at each scale]
Straight-Through Architectures
Example Single Shot Detector, SSD
[Figure: SSD architecture, with head layers producing an output for each box and class]
Straight-Through Architectures
Architectural components of single shot object detectors – example YOLOv4:
• SD = Spatial down-sampling
• MF = Multi-scale features
• 2x = Doubled channels
Transformer Architecture for Object Detection
Chen, et al., 2022 showed that a simple architecture is superior to more complex hand-engineered architectures
• SD = Spatial down-sampling
• MF = Multi-scale features
• 2x = Doubled channels
Transformer Architecture for Object Detection
Chen, et al., 2022 propose a transformer architecture which may be a path forward for future dense CV tasks