École polytechnique fédérale de Lausanne
Cevher, Volkan
Krawczuk, Igor
2022-07-10
Recent years have seen the vast potential of the Transformer model, which is arguably the first general-purpose architecture in the sense that it achieves state-of-the-art performance in numerous fields (e.g., computer vision, natural language processing, and autonomous driving) with minimal architectural modifications. The success of Transformers relies heavily on the self-attention mechanism, which is still not fully understood. In this thesis, we first work toward bridging this gap from an empirical perspective, studying the main inductive biases and limitations of self-attention. We also propose the ReLA Nyströmformer, a novel architecture that attains linear complexity (considerably improving on the quadratic complexity of vanilla self-attention) while proving empirically superior to a broad set of state-of-the-art baselines. We further observe that Transformer architectures are ultimately third-order-interaction-based models, i.e., they can be formalized as third-order polynomials, whose tractability strongly depends on a number of inductive biases. This motivates the last part of this thesis, where we discuss the suitability of devising higher-order models, i.e., models based on higher-order polynomials, from both a predictive and an interpretability point of view. Finally, we propose two novel architectures, the Low-rank Deep Polynomial Network and the Adaptive Attention, based on low-rank projections and automatic attention pattern learning, respectively. The state-of-the-art performance of both models underscores the need for further research in this direction to fully elucidate their potential.
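The abstract contrasts the quadratic cost of vanilla self-attention with the linear complexity of the proposed ReLA Nyströmformer. As an illustration only (this is not code from the thesis), the following is a minimal NumPy sketch of standard softmax self-attention; the n × n score matrix is the source of the quadratic time and memory cost that linear-complexity variants avoid:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Vanilla softmax self-attention over n tokens.

    The (n, n) score matrix makes time and memory scale quadratically
    with sequence length n -- the cost that linear-complexity variants
    (such as Nystroem-based approximations) are designed to avoid.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # projections, each (n, d)
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # (n, n): the quadratic term
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                         # (n, d) attended output

# toy usage with hypothetical dimensions
rng = np.random.default_rng(0)
n, d = 8, 4
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (8, 4)
```

Nyström-style methods sidestep the full (n, n) matrix by attending through a small set of landmark tokens, reducing the cost to linear in n.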
Master's thesis
English
UPC subject areas::Computer science::Artificial intelligence::Natural language; Deep learning (Machine learning); Natural language processing (Computer science); Computer vision; deep learning; self-attention; efficient Transformers; natural language processing; computer vision; high-order models; machine learning; Transformers
Universitat Politècnica de Catalunya
Restricted access - author's decision
Academic works (Treballs acadèmics) [82502]