Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
Vilaplana Besler, Verónica
2024-09-13
Depth perception is the capability of a system to estimate the distance of objects in a 3D scene by analyzing one or more 2D images or video frames. It is critical for applications such as robotics, autonomous vehicles, and augmented reality. This thesis explores two primary approaches to depth estimation: Monocular Depth Estimation (MDE) and Depth Completion (DC). In both cases, the best-performing methods rely on Deep Learning (DL), in which multi-layer neural networks learn their features from data rather than from hand-crafted rules. MDE estimates depth from a single RGB image; most methods leverage scene knowledge to solve what is considered an ill-posed problem. This knowledge includes geometric cues, such as the relative size and position of objects within the scene, as well as texture gradients that suggest how surfaces are oriented. These methods also often rely on prior information about the typical sizes and shapes of objects. MDE can be categorized into relative depth estimation, which focuses on the depth relationships within the scene, and absolute depth estimation, which aims to provide precise distance measurements from the camera to objects. This thesis evaluates state-of-the-art MDE methods for both relative and absolute depth estimation. The evaluation requires ground truth depth, which is obtained using RGB-D cameras and Structure from Motion (SfM) techniques. In addition, a depth completion method is studied and used as a base to build a DL model that leverages the performance of relative MDE methods to enhance sparse depth data from sensors such as LiDAR. The evaluation results show that errors are reduced and highlight the potential of these methods for real-life scenarios.
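The relationship between relative depth, absolute depth, and sparse sensor data described above can be illustrated with a minimal sketch. It is not the model built in the thesis: it assumes a simple least-squares scale and shift that maps a relative depth prediction onto a few metric samples, and then scores the result with the absolute relative error. The function names `fit_scale_shift` and `abs_rel_error`, the use of `None` to mark pixels without a sensor return, and the toy values are all hypothetical.

```python
def fit_scale_shift(relative, metric):
    """Least-squares scale s and shift t so that s * relative + t fits metric.

    Assumption for this sketch: `metric` uses None for pixels with no measurement.
    """
    pairs = [(r, m) for r, m in zip(relative, metric) if m is not None and m > 0]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two valid depth samples")
    sum_r = sum(r for r, _ in pairs)
    sum_m = sum(m for _, m in pairs)
    sum_rr = sum(r * r for r, _ in pairs)
    sum_rm = sum(r * m for r, m in pairs)
    denom = n * sum_rr - sum_r ** 2
    if denom == 0:
        raise ValueError("relative depth is constant over the valid samples")
    scale = (n * sum_rm - sum_r * sum_m) / denom
    shift = (sum_m - scale * sum_r) / n
    return scale, shift


def abs_rel_error(predicted, metric):
    """Mean of |predicted - metric| / metric over pixels with a valid metric depth."""
    pairs = [(p, m) for p, m in zip(predicted, metric) if m is not None and m > 0]
    return sum(abs(p - m) / m for p, m in pairs) / len(pairs)


# Toy data (hypothetical): four pixels of relative depth and three sparse
# metric samples in meters; the second pixel has no sensor return.
relative = [0.1, 0.2, 0.4, 0.8]
sparse_metric = [1.5, None, 3.0, 5.0]

scale, shift = fit_scale_shift(relative, sparse_metric)
dense_metric = [scale * r + shift for r in relative]  # fills the missing pixel
error = abs_rel_error(dense_metric, sparse_metric)
```

A global scale and shift is the simplest way to turn a relative prediction into metric depth; a learned depth completion model can correct errors that vary across the image, which a single pair of parameters cannot. Many relative MDE models predict inverse depth, in which case the same fit would be applied in inverse-depth space.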
Master thesis
English
UPC subject areas::Telecommunications engineering::Signal processing; Deep learning; Image processing; Computer vision; Depth; Monocular; DeepLearning; Completion; RGB-D; Lidar; Sensing
Universitat Politècnica de Catalunya
Distribution of the work is authorized under a Creative Commons or similar license: 'Attribution-NonCommercial-NoDerivs'
Open Access