Abstract:
|
The field of automatic action and gesture recognition has attracted growing interest over the last few
years. Action recognition can be understood as the automatic classification of generic human actions or
activities, such as walking, reading, or jumping, while gesture recognition focuses on the analysis of
more specific movements, usually of the upper body, that carry a meaning of their own, such as waving,
saluting, or negating. Interest in the domain stems mainly from its many applications, which include
human-computer interaction, ambient assisted living systems, health care monitoring systems,
surveillance, communications, and entertainment. The domain shares many similarities with object
recognition in still images; nevertheless, one characteristic makes it especially challenging: the
temporal evolution of actions and gestures. The research community is currently competing to find the
best way to deal with this extra dimension. The project therefore begins with an exhaustive
state-of-the-art analysis that summarizes the most common approaches for dealing with time.
Hand-crafted features rely on extending 2D descriptors such as HOG or SIFT to a third (temporal)
dimension, or on defining descriptors built on motion features such as optical flow or scene flow.
Deep learning models, meanwhile, can be grouped into four non-mutually exclusive categories according
to how they deal with time: 2D CNNs that perform recognition on still frames extracted from videos and
average the per-frame results; 2D CNNs applied to motion features; 3D CNNs that compute 3D convolutions
over two spatial dimensions and one temporal dimension; and neural networks that can model temporal
evolution, such as RNNs and LSTMs. After reviewing the literature, a selection of these methods is
tested in order to identify the direction in which future research on the domain should point.
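For illustration, the following is a minimal sketch, in PyTorch (an assumption; the abstract does not
fix a framework), of two of these categories: averaging per-frame scores from a 2D CNN, and a 3D
convolution whose kernel slides over two spatial dimensions and one temporal dimension at once.

    import torch
    import torch.nn as nn

    # Category 1: per-frame 2D CNN scores, averaged over time.
    # Hypothetical scores for 16 frames and 101 classes.
    frame_scores = torch.randn(16, 101)
    video_score = frame_scores.mean(dim=0)    # one score vector for the whole video

    # Category 3: a 3D convolution over (channels, frames, height, width).
    clip = torch.randn(1, 3, 16, 112, 112)    # (batch, C, T, H, W)
    conv3d = nn.Conv3d(3, 64, kernel_size=3, padding=1)
    features = conv3d(clip)                   # -> (1, 64, 16, 112, 112)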
Additionally, the recent increase in the availability of depth sensors (Microsoft's Kinect V1 and V2)
allows the exploration of multi-modal techniques that take advantage of multiple data sources (RGB
and depth). Prior work in the domain has shown that many algorithms can benefit from this extra
modality, either by itself or combined with classical RGB. For this reason, techniques that rely on
multi-modal data must be tested as well; to do so, one of the selected algorithms has been modified
to use both RGB data and depth maps.
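One common way to combine the two modalities is a two-stream design in which each modality is encoded
separately and the features are concatenated before classification. The sketch below assumes this
scheme with hypothetical backbone networks; it is not the exact modification used in the project.

    import torch
    import torch.nn as nn

    class TwoStreamFusion(nn.Module):
        """Hypothetical two-stream network: one branch per modality."""
        def __init__(self, rgb_net, depth_net, feat_dim, n_classes):
            super().__init__()
            self.rgb_net = rgb_net
            self.depth_net = depth_net
            self.classifier = nn.Linear(2 * feat_dim, n_classes)

        def forward(self, rgb, depth):
            # Concatenate per-modality features, then classify.
            feats = torch.cat([self.rgb_net(rgb), self.depth_net(depth)], dim=1)
            return self.classifier(feats)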
Hand-crafted algorithms still compete with deep learning approaches in this challenging domain, as
neural networks require much higher complexity to deal with the extra temporal dimension. This
increases the number of parameters the model must learn, which in turn calls for larger datasets and
more computational resources; in this domain, however, datasets remain few and small. Many authors
therefore propose workarounds such as pre-training on image recognition datasets or multi-task
learning, which lets a model learn from several datasets at once. Given this situation, the algorithms
tested within the scope of this project are of both types: hand-crafted features and deep learning
models. A late fusion strategy is also tested to see how well the results of both kinds of approaches
can be combined.
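As a sketch of such a late fusion (assuming the two pipelines produce per-class scores on a common
scale; the mixing weight is a hypothetical hyper-parameter tuned on validation data):

    import numpy as np

    def late_fusion(scores_handcrafted, scores_deep, alpha=0.5):
        # Weighted average of per-class scores from the two pipelines.
        return alpha * scores_handcrafted + (1.0 - alpha) * scores_deep

    # prediction = np.argmax(late_fusion(s_hc, s_dl), axis=-1)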
Finally, the results obtained are compared with other state-of-the-art techniques applied to the same
datasets, and conclusions on the topic are drawn.