Multimodal end-to-end autonomous driving

dc.contributor.author
Xiao, Yi
dc.contributor.author
Codevilla Moraes, Felipe
dc.contributor.author
Gurram, Akhil
dc.contributor.author
Urfalioglu, Onay
dc.contributor.author
López Peña, Antonio M.
dc.date.issued
2022
dc.identifier
https://ddd.uab.cat/record/274828
dc.identifier
urn:10.1109/TITS.2020.3013234
dc.identifier
urn:oai:ddd.uab.cat:274828
dc.identifier
urn:oai:egreta.uab.cat:publications/f77d5206-3a59-49e7-bf8e-01ee712973a6
dc.identifier
urn:pure_id:293376217
dc.identifier
urn:scopus_id:85122405200
dc.identifier
urn:articleid:15580016v23n1p537
dc.description.abstract
Other grants: Antonio M. Lopez acknowledges the financial support by ICREA under the ICREA Academia Program. We also thank the Generalitat de Catalunya CERCA Program, as well as its ACCIO agency
dc.description.abstract
A crucial component of an autonomous vehicle (AV) is the artificial intelligence (AI) that is able to drive towards a desired destination. Today, there are different paradigms addressing the development of AI drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, maneuver planning, and control. On the other hand, we find end-to-end driving approaches that try to learn a direct mapping from raw sensor data to vehicle control signals. The latter are relatively less studied, but are gaining popularity since they are less demanding in terms of sensor data annotation. This paper focuses on end-to-end autonomous driving. So far, most proposals relying on this paradigm assume RGB images as input sensor data. However, AVs will not be equipped only with cameras, but also with active sensors providing accurate depth information (e.g., LiDARs). Accordingly, this paper analyses whether combining RGB and depth modalities, i.e. using RGBD data, produces better end-to-end AI drivers than relying on a single modality. We consider multimodality based on early, mid, and late fusion schemes, both in multisensory and single-sensor (monocular depth estimation) settings. Using the CARLA simulator and conditional imitation learning (CIL), we show how, indeed, early fusion multimodality outperforms single-modality.
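To make the early-fusion idea described in the abstract concrete, the following is a minimal sketch, assuming PyTorch; the RGB and depth images are concatenated channel-wise into a single 4-channel RGBD tensor before the first convolution, and a high-level navigation command selects one of several control branches, in the spirit of conditional imitation learning. The class name, layer sizes, and number of commands are illustrative assumptions, not the architecture used in the paper.

# Minimal early-fusion CIL-style sketch (assumed PyTorch; illustrative sizes only).
import torch
import torch.nn as nn


class EarlyFusionCIL(nn.Module):
    def __init__(self, num_commands: int = 4):
        super().__init__()
        # Early fusion: the first conv consumes 4 channels (RGB + depth).
        self.perception = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One output branch per high-level command (e.g. follow lane, turn left,
        # turn right, go straight), each predicting steer/throttle/brake.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3))
            for _ in range(num_commands)
        ])

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor,
                command: torch.Tensor) -> torch.Tensor:
        # rgb: (B, 3, H, W), depth: (B, 1, H, W), command: (B,) integer index.
        rgbd = torch.cat([rgb, depth], dim=1)      # channel-wise early fusion
        features = self.perception(rgbd)           # (B, 128)
        outputs = torch.stack([b(features) for b in self.branches], dim=1)
        # Select the branch matching each sample's high-level command.
        idx = command.view(-1, 1, 1).expand(-1, 1, outputs.size(-1))
        return outputs.gather(1, idx).squeeze(1)   # (B, 3) control signals


# Usage example: one RGBD frame driven by the hypothetical command index 1.
model = EarlyFusionCIL()
controls = model(torch.rand(1, 3, 88, 200), torch.rand(1, 1, 88, 200),
                 torch.tensor([1]))
print(controls.shape)  # torch.Size([1, 3])

Mid and late fusion would instead keep separate RGB and depth encoders and merge their feature maps (mid) or their predictions (late); the early-fusion variant above is the one the abstract reports as best-performing.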
dc.format
application/pdf
dc.language
eng
dc.publisher
IEEE
dc.relation
Agencia Estatal de Investigación TIN2017-88709-R
dc.relation
Agència de Gestió d'Ajuts Universitaris i de Recerca 2017/FI-B1-00162
dc.relation
IEEE transactions on intelligent transportation systems ; Vol. 23, issue 1 (Jan. 2022), p. 537-547
dc.rights
open access
dc.rights
This material is protected by copyright and/or related rights. You may use this material to the extent permitted by the copyright and related rights legislation applicable to your case. For other uses you must obtain permission from the rights holder(s).
dc.rights
https://rightsstatements.org/vocab/InC/1.0/
dc.subject
Semantics
dc.subject
Task analysis
dc.subject
Laser radar
dc.subject
Autonomous vehicles
dc.subject
Cameras
dc.title
Multimodal end-to-end autonomous driving
dc.type
Article

