Aiav Unit 2 Notes
Introduction
Datasets
Large datasets are essential for improving perception algorithms in autonomous vehicles.
These datasets help in quantitative evaluation, exposing weaknesses, and enabling fair
comparisons. Common datasets for computer vision tasks include those for image
classification, semantic segmentation, optical flow, stereo vision, and tracking.
Detection
Autonomous vehicles must detect various objects on the road, including cars, pedestrians,
obstacles, and lane markers. Object detection typically involves three main stages: generating
candidate regions, extracting features from each candidate, and classifying each candidate
into an object category.
Pedestrian detection is particularly critical for safety, as human behavior is unpredictable and
pedestrians are often occluded. Modern detectors rely on convolutional neural networks
(CNNs), which are discussed in the next chapter.
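To make the three-stage pipeline concrete, here is a minimal runnable Python sketch: a sliding window proposes regions, a toy feature vector is extracted from each, and a stub classifier accepts or rejects candidates. Every function and threshold here is an illustrative stand-in, not taken from any specific detector; classical systems used e.g. HOG features with a trained SVM.

```python
import numpy as np

# Toy three-stage detection pipeline:
# propose regions -> extract features -> classify.

def propose_regions(image, win=32, stride=16):
    """Stage 1: sliding-window region proposals."""
    h, w = image.shape
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield (x, y, win, win)

def extract_features(image, box):
    """Stage 2: a toy feature vector (patch mean and variance)."""
    x, y, w, h = box
    patch = image[y:y + h, x:x + w]
    return np.array([patch.mean(), patch.var()])

def classify(features, threshold=0.55):
    """Stage 3: stub classifier; a real system uses a trained model."""
    return features[0] > threshold  # call bright patches "object"

image = np.random.rand(128, 128)
detections = [box for box in propose_regions(image)
              if classify(extract_features(image, box))]
print(len(detections), "candidate detections")
```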
Segmentation
Semantic segmentation enhances object detection by assigning a class label to each pixel,
providing a structured understanding of the environment.
Traditional Approach:
• Conditional Random Fields (CRFs): a graphical model in which nodes (pixels) are
assigned labels based on extracted features, enforcing spatial smoothness and object
coherence (see the energy sketch after this list).
• Challenges: CRFs struggle to capture long-range dependencies and are
computationally expensive.
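As a sketch of the CRF formulation (notation assumed here, not taken from a specific paper), a labeling x is chosen to minimize an energy combining a per-pixel data term with a pairwise smoothness term:

```latex
E(\mathbf{x}) = \sum_{i} \psi_u(x_i) + \sum_{(i,j) \in \mathcal{N}} \psi_p(x_i, x_j)
```

Here ψ_u(x_i) measures how well label x_i fits the features extracted at pixel i, and ψ_p penalizes neighboring pixels (i, j) in the neighborhood system N that take different labels, which is what enforces spatial smoothness and object coherence.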
Advancements:
• Deep learning approaches, in particular fully convolutional networks (FCNs) and
successors such as PSPNet, have largely superseded CRF-based pipelines; these
models are covered in the CNN sections later in these notes.
Stereo Vision
Autonomous vehicles require 3D spatial information for navigation. While LiDAR provides
precise but sparse depth data, stereo cameras offer dense visual information.
• Stereo vision mimics human binocular vision by capturing images from two slightly
different angles and solving a correspondence problem to estimate depth.
• Feature-based methods use distinctive features (e.g., SIFT, SURF) for matching but
provide sparse results.
• Area-based methods use spatial smoothness constraints to compute dense disparity
maps but require more computation.
• Global methods (e.g., Semi-Global Matching (SGM)) optimize disparity estimation
by minimizing an energy function (sketched after this list), improving accuracy and
efficiency.
• Deep learning-based methods now achieve the best performance in stereo matching
(discussed in the next chapter).
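For reference, the energy minimized by SGM-style methods has roughly the following form (Hirschmüller's formulation, with notation lightly adapted):

```latex
E(D) = \sum_{p} C(p, D_p)
     + \sum_{p} \sum_{q \in \mathcal{N}_p} P_1 \, [\,|D_p - D_q| = 1\,]
     + \sum_{p} \sum_{q \in \mathcal{N}_p} P_2 \, [\,|D_p - D_q| > 1\,]
```

C(p, D_p) is the cost of assigning disparity D_p to pixel p, while the penalties P_1 and P_2 discourage small and large disparity jumps between neighboring pixels, favoring smooth disparity maps with sharp object boundaries.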
Once a pixel's disparity is known, depth follows by triangulation:

Z = fB / d

where B is the camera baseline, d is the disparity, and f is the focal length.
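A minimal area-based sketch in Python, assuming rectified images and illustrative parameter values (window size, disparity range, focal length, baseline): it computes a dense disparity map with sum-of-absolute-differences (SAD) block matching, then converts disparity to depth via Z = fB/d.

```python
import numpy as np

# SAD block matching over a horizontal disparity search (area-based
# method), followed by depth from disparity. Parameters are illustrative.

def disparity_map(left, right, window=5, max_disp=32):
    h, w = left.shape
    half = window // 2
    disp = np.zeros((h, w))
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.abs(patch - right[y - half:y + half + 1,
                                          x - d - half:x - d + half + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = np.argmin(costs)  # disparity with lowest SAD cost
    return disp

f, B = 700.0, 0.54                       # focal length (px), baseline (m)
left = np.random.rand(64, 96)
right = np.roll(left, -4, axis=1)        # synthetic scene at disparity ~4
d = disparity_map(left, right)
depth = np.where(d > 0, f * B / np.maximum(d, 1e-6), 0.0)  # Z = fB / d
print(depth[32, 64])
```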
Optical Flow
Optical flow estimates 2D motion by tracking intensity changes between consecutive images.
Unlike stereo vision, which captures both images simultaneously, optical flow must account for:
• motion in arbitrary 2D directions, since no epipolar constraint reduces the search to
one dimension;
• illumination changes between frames, which violate the brightness constancy
assumption;
• independently moving objects in addition to camera motion.
To improve robustness, alternative cost functions have been introduced to replace the
quadratic penalty in classical methods.
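As a sketch (variational notation assumed here), the classical energy penalizes data and smoothness violations with a penalty function ρ applied to the brightness-constancy error and to the flow gradients:

```latex
E(u, v) = \iint \rho\big(I(x+u,\, y+v,\, t+1) - I(x, y, t)\big)
        + \lambda \big( \rho(\|\nabla u\|) + \rho(\|\nabla v\|) \big) \, dx \, dy
```

With ρ(s) = s² this is the classical quadratic penalty; choosing a robust alternative such as the Charbonnier penalty ρ(s) = √(s² + ε²) reduces the influence of outliers at motion boundaries and occlusions.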
Scene Flow
Autonomous vehicles need 3D motion estimation rather than just 2D optical flow. Scene
flow extends optical flow by using two consecutive stereo image pairs to estimate both:
• 3D positions of points.
• 3D motion between the two time steps.
The KITTI Scene Flow 2015 benchmark evaluates methods for accurate 3D motion
estimation, crucial for vehicle navigation and obstacle avoidance.
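As a hedged sketch of how both quantities are obtained (assuming rectified stereo with focal length f, baseline B, and principal point (c_u, c_v), none of which appear in the text above): each stereo pair yields a 3D point by triangulation, optical flow links the pixel across the two time steps, and the difference of the reconstructed points is the 3D motion:

```latex
Z = \frac{fB}{d}, \quad X = \frac{(u - c_u)\, Z}{f}, \quad Y = \frac{(v - c_v)\, Z}{f},
\quad \mathbf{m} = \mathbf{P}_{t+1} - \mathbf{P}_t
```

Here (u, v) are the pixel coordinates of a point with disparity d, P_t = (X, Y, Z) is its 3D position at time t, and m is the estimated 3D motion.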
Object Tracking in Autonomous Vehicles
Tracking Overview
Tracking estimates an object's location, speed, and acceleration over time, allowing
autonomous vehicles to maintain safe distances and predict movement. This is particularly
challenging for pedestrians and cyclists due to sudden direction changes.
Challenges in tracking:
• occlusions and missed detections;
• abrupt changes in speed and direction;
• appearance changes across frames.
Tracking is traditionally modeled as a sequential Bayesian filtering problem with two main
steps:
1. Prediction: propagate the object's state forward in time using a motion model.
2. Update: correct the predicted state using the latest detection or measurement.
A commonly used method is the Particle Filter, but its recursive nature makes recovery from
missed detections difficult.
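A minimal bootstrap particle filter sketch in Python for a 1D constant-velocity target; the particle count and all noise levels are illustrative. It shows the prediction, update, and resampling steps described above.

```python
import numpy as np

# Bootstrap particle filter for a 1D target with state [position, velocity].

rng = np.random.default_rng(0)
N = 500
particles = rng.normal(0.0, 1.0, size=(N, 2))   # initial state guesses
weights = np.full(N, 1.0 / N)

def predict(particles, dt=1.0, q=0.1):
    """Prediction: propagate each particle with the motion model."""
    particles[:, 0] += particles[:, 1] * dt
    particles += rng.normal(0.0, q, size=particles.shape)  # process noise
    return particles

def update(particles, weights, z, r=0.5):
    """Update: reweight particles by the measurement likelihood."""
    likelihood = np.exp(-0.5 * ((z - particles[:, 0]) / r) ** 2)
    weights = weights * likelihood + 1e-300     # avoid all-zero weights
    return weights / weights.sum()

def resample(particles, weights):
    """Resample to concentrate particles on high-probability states."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

for t in range(10):
    z = 1.0 * t + rng.normal(0.0, 0.5)          # noisy position detection
    particles = predict(particles)
    weights = update(particles, weights, z)
    particles, weights = resample(particles, weights)
    print(t, (weights * particles[:, 0]).sum()) # posterior mean estimate
```

Because each iteration resamples around the current belief, a run of missed or wrong detections can collapse the particle set far from the target, which is exactly the recovery problem noted above.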
Alternative Approaches
Because recursive filters can lose a target after missed detections, recent work has turned to
learning-based methods; the remainder of these notes covers the CNN techniques that
underpin modern detection, segmentation, and tracking.
Convolutional Neural Networks (CNNs)
CNNs are a type of deep neural network that use convolution as the primary computational
operation. They were first introduced by LeCun et al. in 1989, inspired by the structure of
the visual cortex. CNNs excel in computer vision tasks due to:
• local connectivity, so each unit responds to a small receptive field;
• weight sharing, which yields translation equivariance and far fewer parameters;
• hierarchical feature learning, from edges up to object parts.
CNNs revolutionized computer vision, with models like AlexNet (2012) leading to state-of-
the-art autonomous driving perception systems.
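To ground the term "convolution", here is a minimal Python sketch of the sliding-window operation a convolutional layer applies at every spatial position (deep learning frameworks actually implement cross-correlation, shown here, rather than flipped-kernel convolution):

```python
import numpy as np

# Valid 2D convolution (cross-correlation) of an image with a kernel.

def conv2d(image, kernel):
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            # each output value is a weighted sum over a local patch
            out[y, x] = (image[y:y + kh, x:x + kw] * kernel).sum()
    return out

edge_kernel = np.array([[1, 0, -1]] * 3, dtype=float)  # vertical-edge filter
image = np.random.rand(8, 8)
print(conv2d(image, edge_kernel).shape)  # (6, 6): valid convolution
```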
Early detection methods relied on hand-crafted features and structured classifiers, but these
struggled with large data volumes and object variations.
Girshick et al. introduced R-CNN, demonstrating that CNNs substantially outperform
traditional methods. Faster R-CNN improved detection by using a Region Proposal Network
(RPN) to generate potential object locations. Detection proceeds in three stages:
1. RPN generates region proposals by scanning feature maps using anchor boxes of
different sizes (e.g., 128×128, 256×256, 512×512).
2. ROI pooling refines proposals, mapping them to a fixed-size feature map.
3. Final classification and bounding box regression predict object type and precise
location.
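A small Python sketch of step 1, anchor generation: at each feature-map cell, boxes of several scales are placed around the cell center. The scales come from the text; the stride of 16 and the three aspect ratios are standard choices used here as assumptions.

```python
import numpy as np

# RPN-style anchor generation over a feature map.

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride  # cell center
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)  # area stays ~s^2
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)

print(generate_anchors(2, 3).shape)  # (2*3 cells * 9 anchors, 4) = (54, 4)
```

The RPN then scores each anchor as object/background and regresses offsets; surviving anchors become the region proposals that ROI pooling consumes in step 2.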
Proposal-Free Algorithms
Some models avoid the region proposal step for real-time performance:
• SSD (Single Shot MultiBox Detector): Uses multiple convolutional layers to detect
objects of varying sizes in a single pass.
• YOLO (You Only Look Once): Directly predicts object locations and classes in one
forward pass, achieving high speed.
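A hedged sketch of the single-pass idea: the network emits one tensor in which every grid cell directly predicts a box, an objectness score, and class scores. The tensor layout, grid size, and threshold below are simplified assumptions, not the exact YOLO format.

```python
import numpy as np

# Decode a toy single-shot output tensor: S x S grid, each cell predicts
# (cx, cy, w, h, objectness) plus C class scores.

S, C = 7, 3                                   # grid size, #classes (toy)
pred = np.random.rand(S, S, 5 + C)            # stand-in for network output

boxes = []
for gy in range(S):
    for gx in range(S):
        cx, cy, w, h, obj = pred[gy, gx, :5]
        if obj > 0.9:                         # keep confident cells only
            cls = int(pred[gy, gx, 5:].argmax())
            # offsets are relative to the cell; convert to image fractions
            boxes.append(((gx + cx) / S, (gy + cy) / S, w, h, cls))
print(len(boxes), "boxes kept from one forward pass")
```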
While proposal-free methods are faster, Faster R-CNN still achieves the highest accuracy in
benchmarks like PASCAL VOC. However, it struggles with small, occluded objects in
datasets like KITTI.
Recent methods such as FCOS (Fully Convolutional One-Stage Object Detection) remove
predefined anchor boxes, making detection more flexible.
CNN-Based Semantic Segmentation
• FCNs transform traditional CNNs (e.g., VGG-19) by removing the softmax layer
and replacing fully connected layers with 1×1 convolutions.
• They allow input of any size and predict per-pixel labels for segmentation.
• However, small objects are hard to segment because features with large receptive
fields dominate and fine detail is lost.
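A minimal PyTorch sketch of the idea, using a toy backbone rather than VGG-19's exact configuration: a 1×1 convolution replaces the fully connected head, so any input size yields a per-pixel score map, which is upsampled back to the input resolution.

```python
import torch
import torch.nn as nn

# Fully convolutional segmentation head: 1x1 conv instead of fc layers.

backbone = nn.Sequential(                 # stand-in convolutional backbone
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
)
num_classes = 21
head = nn.Conv2d(128, num_classes, kernel_size=1)  # replaces fc layers

x = torch.randn(1, 3, 96, 128)            # any spatial size works
scores = head(backbone(x))                # per-location class scores
out = nn.functional.interpolate(          # upsample to input resolution
    scores, size=x.shape[2:], mode="bilinear", align_corners=False)
print(out.shape)                          # torch.Size([1, 21, 96, 128])
```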
To address global-local feature integration, Zhao et al. proposed PSPNet, which enhances
FCNs using a pyramid pooling module.
PSPNet Workflow:
1. Feature extraction: A CNN (ResNet) extracts feature maps from the input image.
2. Pyramid pooling: Multi-level pooling (1×1, 2×2, 3×3, 6×6) aggregates contextual
information.
3. Feature compression: Feature maps are passed through 1×1 convolutions for
dimensional reduction.
4. Upsampling & fusion: Pooled features are upsampled and concatenated with original
feature maps for final pixel-wise classification.
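A compact PyTorch sketch of a pyramid pooling module with the listed bin sizes; the channel counts are illustrative rather than PSPNet's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Pyramid pooling: pool features to 1x1, 2x2, 3x3, 6x6, compress each
# level with a 1x1 conv, upsample, and concatenate with the input.

class PyramidPooling(nn.Module):
    def __init__(self, in_ch=64, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)              # per-branch compression
        self.branches = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),      # multi-level pooling
                          nn.Conv2d(in_ch, out_ch, 1))  # 1x1 compression
            for b in bins)

    def forward(self, x):
        size = x.shape[2:]
        pooled = [F.interpolate(branch(x), size=size,   # upsample & fuse
                                mode="bilinear", align_corners=False)
                  for branch in self.branches]
        return torch.cat([x] + pooled, dim=1)

feats = torch.randn(1, 64, 32, 32)          # e.g. ResNet feature maps
print(PyramidPooling()(feats).shape)        # torch.Size([1, 128, 32, 32])
```

Concatenating the original features with all pooled levels is what combines local detail with global scene context in a single map.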
PSPNet won 1st place in the ImageNet Scene Parsing Challenge 2016 and achieved state-
of-the-art performance on PASCAL VOC 2012 and Cityscapes datasets.
Conclusion
Stereo vision and optical flow are key techniques for depth estimation and motion analysis
in autonomous driving. Both involve matching corresponding points between two images.