dc.contributor
École polytechnique fédérale de Lausanne
dc.contributor
Cevher, Volkan
dc.contributor
Krawczuk, Igor
dc.contributor.author
Pujol Perich, David
dc.date.issued
2022-07-10
dc.identifier
https://hdl.handle.net/2117/376954
dc.description.abstract
Recent years have seen the vast potential of the Transformer model, arguably the first general-purpose architecture in the sense that it achieves state-of-the-art performance in numerous fields (e.g., computer vision, natural language processing, and autonomous driving) with minimal architectural modifications. The success of Transformers relies heavily on the self-attention mechanism, which nevertheless remains only partially understood. In this thesis, we first address this gap from an empirical perspective, studying the mechanism's main inductive biases and limitations. We then propose the ReLA Nyströmformer, a novel architecture that attains linear complexity (a considerable improvement over the quadratic complexity of the original self-attention) while proving empirically superior to a broad set of state-of-the-art baselines. We further observe that Transformer architectures are ultimately third-order-interaction models (i.e., they can be formalized as third-order polynomials) whose tractability strongly depends on a number of inductive biases. This motivates the last part of the thesis, where we discuss the suitability of devising higher-order models (i.e., models based on higher-order polynomials) from both a predictive and an interpretability point of view. Finally, we propose two novel architectures, the Low-rank Deep Polynomial Network and Adaptive Attention, based on low-rank projections and automatic learning of attention patterns. The state-of-the-art performance of both models underscores the need for further research in this direction to fully elucidate their potential.
dc.format
application/pdf
dc.publisher
Universitat Politècnica de Catalunya
dc.rights
Restricted access - author's decision
dc.subject
Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Llenguatge natural
dc.subject
Deep learning (Machine learning)
dc.subject
Natural language processing (Computer science)
dc.subject
Computer vision
dc.subject
deep learning
dc.subject
self-attention
dc.subject
efficient Transformers
dc.subject
natural language processing
dc.subject
computer vision
dc.subject
higher-order models
dc.subject
machine learning
dc.title
Understanding and improving self-attention mechanisms