A combination of monocular depth estimation and depth completion for an enhanced RGB-D sensing

Martínez Lanza, Hugo

dc.contributor

Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions

dc.contributor

Vilaplana Besler, Verónica

dc.contributor.author

Martínez Lanza, Hugo

dc.date.issued

2024-09-13

dc.identifier

https://hdl.handle.net/2117/424603

dc.identifier

ETSETB-230.188735

dc.description.abstract

Depth perception is the ability of a system to estimate the distance to objects in a 3D scene by analyzing one or more 2D images or video frames. It is critical for applications such as robotics, autonomous vehicles, and augmented reality. This thesis explores two primary approaches to depth estimation: Monocular Depth Estimation (MDE) and Depth Completion (DC). In both cases, the best-performing methods rely on Deep Learning (DL). Deep learning models are multi-layer neural networks that learn from data without hand-crafted rules. MDE estimates depth from a single RGB image; because this is an ill-posed problem, most methods leverage scene knowledge to solve it. This includes geometric cues, such as the relative size and position of objects within the scene, as well as texture gradients that suggest how surfaces are oriented. These methods also often rely on prior information about the typical sizes and shapes of objects. MDE can be divided into relative depth estimation, which focuses on the depth relationships within the scene, and absolute depth estimation, which aims to provide precise distance measurements from the camera to objects. This thesis evaluates state-of-the-art MDE methods for both relative and absolute depth estimation. The evaluation requires ground truth depth, which is obtained using RGB-D cameras and Structure from Motion (SfM) techniques. In addition, a depth completion method is studied and used as the basis for a DL model that leverages relative MDE predictions to enhance sparse depth data from sensors such as LiDAR. The evaluation results show that errors are reduced and highlight the potential of these methods for real-life scenarios.

dc.format

application/pdf

dc.language

eng

dc.publisher

Universitat Politècnica de Catalunya

dc.rights

Dissemination of this work is authorized under a Creative Commons or similar license: 'Attribution-NonCommercial-NoDerivatives'

dc.rights

Open Access

dc.subject

UPC subject areas::Telecommunications engineering::Signal processing

dc.subject

Deep learning

dc.subject

Image processing

dc.subject

Computer vision

dc.subject

Depth

dc.subject

Monocular

dc.subject

DeepLearning

dc.subject

Completion

dc.subject

RGB-D

dc.subject

Lidar

dc.subject

Sensing

dc.subject

Deep learning (Aprenentatge profund)

dc.subject

Images -- Processing (Imatges -- Processament)

dc.subject

Computer vision (Visió per ordinador)

dc.title

A combination of monocular depth estimation and depth completion for an enhanced RGB-D sensing

dc.type

Master thesis

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collection(s)

Treballs acadèmics [82502]