École polytechnique fédérale de Lausanne
Cevher, Volkan
Krawczuk, Igor
2022-07-10
Recent years have seen the vast potential of the Transformer model, which is arguably the first general-purpose architecture in the sense that it achieves state-of-the-art performance in numerous fields (e.g., computer vision, natural language processing, and autonomous driving) with minimal architectural modifications. The success of Transformers relies heavily on the self-attention mechanism, which is still not fully understood. In this thesis, we first work toward bridging this gap from an empirical perspective, studying the main inductive biases and limitations of self-attention. We also propose the ReLA Nyströmformer, a novel architecture that attains linear complexity (considerably improving on the quadratic complexity of vanilla self-attention) while proving empirically superior to a broad set of state-of-the-art baselines. We further observe that Transformer architectures are ultimately third-order-interaction-based models, i.e., they can be formalized as third-order polynomials, whose tractability strongly depends on a number of inductive biases. This motivates the last part of this thesis, where we discuss the suitability of devising higher-order models, i.e., models based on higher-order polynomials, from both a predictive and an interpretability point of view. Finally, we propose two novel architectures, the Low-rank Deep Polynomial Network and the Adaptive Attention, based on low-rank projections and automatic attention pattern learning, respectively. The state-of-the-art performance of both models underscores the need for further research in this direction to fully elucidate their potential.
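The abstract contrasts the quadratic cost of vanilla self-attention with the linear complexity of the proposed ReLA Nyströmformer. As an illustration only (this is not code from the thesis), the following is a minimal NumPy sketch of standard softmax self-attention; the n × n score matrix is the source of the quadratic time and memory cost that linear-complexity variants avoid:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Vanilla softmax self-attention over n tokens.

    The (n, n) score matrix makes time and memory scale quadratically
    with sequence length n -- the cost that linear-complexity variants
    (such as Nystroem-based approximations) are designed to avoid.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # projections, each (n, d)
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (n, n): the quadratic term
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                         # (n, d) attended output

# toy usage with hypothetical dimensions
rng = np.random.default_rng(0)
n, d = 8, 4
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (8, 4)
```

Nyström-style methods sidestep the full (n, n) matrix by attending through a small set of landmark tokens, reducing the cost to linear in n.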
Master's thesis
English
UPC subject areas::Computer science::Artificial intelligence::Natural language; Deep learning (Machine learning); Natural language processing (Computer science); Computer vision; deep learning; self-attention; efficient Transformers; natural language processing; computer vision; high-order models; machine learning; Transformers
Universitat Politècnica de Catalunya
Restricted access - author's decision
Academic works (Treballs acadèmics) [82502]