dc.contributor
École polytechnique fédérale de Lausanne
dc.contributor
Cevher, Volkan
dc.contributor
Krawczuk, Igor
dc.contributor.author
Pujol Perich, David
dc.date.issued
2022-07-10
dc.identifier
https://hdl.handle.net/2117/376954
dc.description.abstract
Recent years have seen the vast potential of the Transformer model, arguably the first general-purpose architecture in the sense that it achieves state-of-the-art performance in numerous fields (e.g., computer vision, natural language processing, and autonomous driving) with minimal architectural modifications. The success of Transformers relies heavily on the self-attention mechanism, which nevertheless remains only partially understood. In this thesis, we first address this gap from an empirical perspective, studying the mechanism's main inductive biases and limitations. We then propose the ReLA Nyströmformer, a novel architecture that attains linear complexity (a considerable improvement over the quadratic complexity of the original self-attention) while proving empirically superior to a broad set of state-of-the-art baselines. We further observe that Transformer architectures are ultimately third-order-interaction models (i.e., they can be formalized as third-order polynomials) whose tractability strongly depends on a number of inductive biases. This motivates the last part of the thesis, where we discuss the suitability of devising higher-order models (i.e., models based on higher-order polynomials) from both a predictive and an interpretability point of view. Finally, we propose two novel architectures, the Low-rank Deep Polynomial Network and Adaptive Attention, based on low-rank projections and automatic learning of attention patterns. The state-of-the-art performance of both models underscores the need for further research in this direction to fully elucidate their potential.
dc.format
application/pdf
dc.publisher
Universitat Politècnica de Catalunya
dc.rights
Restricted access - author's decision
dc.subject
Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Llenguatge natural
dc.subject
Deep learning (Machine learning)
dc.subject
Natural language processing (Computer science)
dc.subject
Computer vision
dc.subject
deep learning
dc.subject
self-attention
dc.subject
efficient Transformers
dc.subject
natural language processing
dc.subject
computer vision
dc.subject
higher-order models
dc.subject
machine learning
dc.title
Understanding and improving self-attention mechanisms