dc.contributor
Universitat Politècnica de Catalunya. Universitat Rovira i Virgili
dc.contributor
Universitat Rovira i Virgili
dc.contributor
Universitat de Barcelona
dc.contributor
PUNTO-FA, SL
dc.contributor
Ferrando Hernández, Pol
dc.contributor
Moreno Ribas, Antonio
dc.contributor.author
Fernández Coronado, Alba
dc.date.accessioned
2026-04-17T04:01:50Z
dc.date.available
2026-04-17T04:01:50Z
dc.date.issued
2026-01-27
dc.identifier
https://hdl.handle.net/2117/460649
dc.identifier.uri
https://hdl.handle.net/2117/460649
dc.description.abstract
This thesis presents a series of improvements to the semantic search system deployed on the mango.com e-commerce platform, with the goal of enhancing retrieval accuracy, robustness, and relevance across multiple languages and query types. The work focuses on addressing key limitations of the existing semantic component within a hybrid lexical-semantic search architecture. The main contributions include the training of a multilingual BERT uncased model to improve robustness to capitalization and diacritics, as well as fine-tuning on fashion-specific data to enhance domain understanding. In addition, limitations in handling attribute-only queries are addressed through the use of intelligent image cropping and a weighted fusion strategy that combines image embeddings with short textual metadata. Furthermore, the CLIP image encoder is fine-tuned to generate semantically richer and more discriminative visual representations, leading to higher similarity scores and improved ranking stability. Experimental results, evaluated on a manually annotated multilingual dataset, demonstrate consistent improvements across locales, increased semantic similarity scores, and more stable ranking performance, without introducing catastrophic forgetting or degrading standard query retrieval. The integration of these enhancements into the full production search pipeline provides a robust foundation for future improvements, including tighter alignment between multilingual text embeddings and the fine-tuned image encoder.
dc.format
application/pdf
dc.publisher
Universitat Politècnica de Catalunya
dc.rights
Restricted access - confidentiality agreement
dc.subject
Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic
dc.subject
Àrees temàtiques de la UPC::Economia i organització d'empreses::Comerç electrònic
dc.subject
Semantics--Data processing
dc.subject
Computer vision
dc.subject
Electronic commerce
dc.subject
Cerca semàntica
dc.subject
Cerca en comerç electrònic
dc.subject
Recuperació multilingüe
dc.subject
Sistemes de cerca híbrids
dc.subject
Cerca multimodal
dc.subject
Models visió-llenguatge
dc.subject
Recuperació de moda
dc.subject
Adaptació al domini
dc.subject
Embeddings imatge-text
dc.subject
Semantic search
dc.subject
E-commerce search
dc.subject
Multilingual retrieval
dc.subject
Hybrid search systems
dc.subject
Multimodal search
dc.subject
Vision-language models
dc.subject
Image-text embeddings
dc.subject
Aprenentatge profund
dc.subject
Semàntica--Informàtica
dc.subject
Visió per ordinador
dc.subject
Comerç electrònic
dc.title
Improvement and integration of a deep learning-based semantic model into a hybrid lexical-semantic search engine