Multimodal end-to-end autonomous driving

dc.contributor.author
Xiao, Yi
dc.contributor.author
Codevilla Moraes, Felipe
dc.contributor.author
Gurram, Akhil
dc.contributor.author
Urfalioglu, Onay
dc.contributor.author
López Peña, Antonio M.
dc.date.issued
2022
dc.identifier
https://ddd.uab.cat/record/274828
dc.identifier
urn:10.1109/TITS.2020.3013234
dc.identifier
urn:oai:ddd.uab.cat:274828
dc.identifier
urn:oai:egreta.uab.cat:publications/f77d5206-3a59-49e7-bf8e-01ee712973a6
dc.identifier
urn:pure_id:293376217
dc.identifier
urn:scopus_id:85122405200
dc.identifier
urn:articleid:15580016v23n1p537
dc.description.abstract
Other grants: Antonio M. Lopez acknowledges the financial support by ICREA under the ICREA Academia Program. We also thank the Generalitat de Catalunya CERCA Program, as well as its ACCIO agency
dc.description.abstract
A crucial component of an autonomous vehicle (AV) is the artificial intelligence (AI) that is able to drive towards a desired destination. Today, there are different paradigms addressing the development of AI drivers. On the one hand, we find modular pipelines, which divide the driving task into sub-tasks such as perception, maneuver planning, and control. On the other hand, we find end-to-end driving approaches that try to learn a direct mapping from raw sensor data to vehicle control signals. The latter are relatively less studied, but are gaining popularity since they are less demanding in terms of sensor data annotation. This paper focuses on end-to-end autonomous driving. So far, most proposals relying on this paradigm assume RGB images as input sensor data. However, AVs will not be equipped only with cameras, but also with active sensors providing accurate depth information (e.g., LiDARs). Accordingly, this paper analyses whether combining RGB and depth modalities, i.e. using RGBD data, produces better end-to-end AI drivers than relying on a single modality. We consider multimodality based on early, mid, and late fusion schemes, both in multisensory and single-sensor (monocular depth estimation) settings. Using the CARLA simulator and conditional imitation learning (CIL), we show how, indeed, early fusion multimodality outperforms single-modality.
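To make the early-fusion idea described in the abstract concrete, the following is a minimal sketch, assuming PyTorch; the RGB and depth images are concatenated channel-wise into a single 4-channel RGBD tensor before the first convolution, and a high-level navigation command selects one of several control branches, in the spirit of conditional imitation learning. The class name, layer sizes, and number of commands are illustrative assumptions, not the architecture used in the paper.

# Minimal early-fusion CIL-style sketch (assumed PyTorch; illustrative sizes only).
import torch
import torch.nn as nn


class EarlyFusionCIL(nn.Module):
    def __init__(self, num_commands: int = 4):
        super().__init__()
        # Early fusion: the first conv consumes 4 channels (RGB + depth).
        self.perception = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # One output branch per high-level command (e.g. follow lane, turn left,
        # turn right, go straight), each predicting steer/throttle/brake.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 3))
            for _ in range(num_commands)
        ])

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor,
                command: torch.Tensor) -> torch.Tensor:
        # rgb: (B, 3, H, W), depth: (B, 1, H, W), command: (B,) integer index.
        rgbd = torch.cat([rgb, depth], dim=1)      # channel-wise early fusion
        features = self.perception(rgbd)           # (B, 128)
        outputs = torch.stack([b(features) for b in self.branches], dim=1)
        # Select the branch matching each sample's high-level command.
        idx = command.view(-1, 1, 1).expand(-1, 1, outputs.size(-1))
        return outputs.gather(1, idx).squeeze(1)   # (B, 3) control signals


# Usage example: one RGBD frame driven by the hypothetical command index 1.
model = EarlyFusionCIL()
controls = model(torch.rand(1, 3, 88, 200), torch.rand(1, 1, 88, 200),
                 torch.tensor([1]))
print(controls.shape)  # torch.Size([1, 3])

Mid and late fusion would instead keep separate RGB and depth encoders and merge their feature maps (mid) or their predictions (late); the early-fusion variant above is the one the abstract reports as best-performing.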
dc.format
application/pdf
dc.language
eng
dc.publisher
IEEE
dc.relation
Agencia Estatal de Investigación TIN2017-88709-R
dc.relation
Agència de Gestió d'Ajuts Universitaris i de Recerca 2017/FI-B1-00162
dc.relation
IEEE transactions on intelligent transportation systems ; Vol. 23, issue 1 (Jan. 2022), p. 537-547
dc.rights
open access
dc.rights
This material is protected by copyright and/or related rights. You may use this material to the extent permitted by the copyright and related rights legislation applicable to your case. For other uses you must obtain permission from the rights holder(s).
dc.rights
https://rightsstatements.org/vocab/InC/1.0/
dc.subject
Semantics
dc.subject
Task analysis
dc.subject
Laser radar
dc.subject
Autonomous vehicles
dc.subject
Cameras
dc.title
Multimodal end-to-end autonomous driving
dc.type
Article

