Textual Entailment (TE), also known as Natural Language Inference (NLI), refers to the problem of determining a directional relation between two text fragments. Specifically, given a sentence pair (a, b), the task is to predict whether b is entailed by a, whether b contradicts a, or whether the relation between a and b is neutral.
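To illustrate the three relations, consider the following sentence pairs (our own invented examples, not drawn from any corpus):

```python
# Illustrative (premise, hypothesis, label) triples for the three NLI classes.
# The sentences are invented for illustration only.
examples = [
    ("A man is playing a guitar on stage.",
     "A person is performing music.", "entailment"),
    ("A man is playing a guitar on stage.",
     "The man is asleep in his bed.", "contradiction"),
    ("A man is playing a guitar on stage.",
     "The concert hall is sold out.", "neutral"),
]
for premise, hypothesis, label in examples:
    print(f"{label:>13}: {premise} -> {hypothesis}")
```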
NLI is a central problem in natural language understanding. Recently, the dominant trend in NLI research has been based on artificial neural networks, which aim at building deep and complex encoders that transform a sentence into encoded vectors. End-to-end artificial neural networks have reached state-of-the-art performance in the NLI field. For instance, there are recurrent neural network (RNN) based encoders, which recurrently combine each word with the previous memory state until the information of the whole sentence has been consumed. The most common RNN encoders are Long Short-Term Memory networks (LSTM; Hochreiter and Schmidhuber, 1997) and Gated Recurrent Units (GRU; Cho et al., 2014). RNNs have surpassed the performance of traditional baselines in many NLP tasks (Dai et al., 2015). There are also convolutional neural network (CNN; LeCun et al., 1989) based encoders, which extract sentence information by applying multiple convolutional filters over the sentence. CNNs have achieved state-of-the-art results in computer vision (Krizhevsky et al., 2012), machine translation (Costa-jussà and Fonollosa, 2016) and various NLP tasks (Collobert et al., 2011).
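As a concrete illustration of the two encoder families, the following sketch builds a minimal RNN-based and a minimal CNN-based sentence encoder with the Keras functional API. The vocabulary size, embedding dimension, and sequence length are assumed values for illustration, not the settings used in this thesis.

```python
from tensorflow.keras import layers, Model

VOCAB, EMB_DIM, MAX_LEN = 20000, 300, 50

tokens = layers.Input(shape=(MAX_LEN,), dtype="int32")   # token ids of one sentence
embedded = layers.Embedding(VOCAB, EMB_DIM)(tokens)

# RNN-style encoder: the LSTM reads the sentence word by word, combining each
# word with its previous memory; the final state summarizes the sentence.
rnn_vec = layers.LSTM(300)(embedded)

# CNN-style encoder: convolutional filters slide over the sentence and
# max-over-time pooling keeps the strongest feature detected by each filter.
conv = layers.Conv1D(filters=300, kernel_size=3, activation="relu")(embedded)
cnn_vec = layers.GlobalMaxPooling1D()(conv)

rnn_encoder = Model(tokens, rnn_vec)
cnn_encoder = Model(tokens, cnn_vec)
```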
In this paper, we use the model introduced by Williams et al. (2017) as the baseline model for the NLI task. The baseline model consists of a word-level embedding layer and a BiLSTM encoder. We augment the baseline model and propose our Character-level Intra Attention Networks (CIAN). In our CIAN model, we replace the standard word-level embedding layer with a character-level convolutional network, and we add an intra attention layer to capture intra-sentence semantics.
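The following is a minimal sketch of a BiLSTM sentence-pair classifier in the spirit of the baseline described above. The layer sizes and the feature combination [u; v; |u - v|; u * v] are our assumptions for illustration, not necessarily the exact baseline configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB, EMB_DIM, MAX_LEN, HIDDEN = 20000, 300, 50, 300

def build_sentence_encoder():
    # Word-level embedding followed by a BiLSTM that returns a fixed-size vector.
    tokens = layers.Input(shape=(MAX_LEN,), dtype="int32")
    x = layers.Embedding(VOCAB, EMB_DIM)(tokens)
    x = layers.Bidirectional(layers.LSTM(HIDDEN))(x)
    return Model(tokens, x)

encoder = build_sentence_encoder()            # shared by premise and hypothesis
premise = layers.Input(shape=(MAX_LEN,), dtype="int32")
hypothesis = layers.Input(shape=(MAX_LEN,), dtype="int32")
u, v = encoder(premise), encoder(hypothesis)

# Combine the two sentence vectors and classify into the three NLI labels.
abs_diff = layers.Lambda(lambda t: tf.abs(t[0] - t[1]))([u, v])
features = layers.Concatenate()([u, v, abs_diff, layers.Multiply()([u, v])])
hidden = layers.Dense(512, activation="relu")(features)
label = layers.Dense(3, activation="softmax")(hidden)
baseline = Model([premise, hypothesis], label)
```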
One contribution of our CIAN model is that we implement the character-level convolutional network introduced by Kim et al. (2016). Most sequence encoders use a word-level embedding layer initialized with pre-trained word vectors such as GloVe (Pennington et al., 2014). In that way, the words in a sentence are no longer independent, which helps the encoder capture more of the internal information of a sentence. However, as vocabulary sizes grow in modern corpora, there are more and more out-of-vocabulary (OOV) words that are not present in the pre-trained word embedding vectors. Since word-level embeddings are blind to subword information (e.g. morphemes), they lead to high perplexities for those OOV words. We use a character-level convolutional network in our model to exploit character-level information, computing the representation of each word from its characters. By doing so, our model gains the ability to learn rich semantic and orthographic features from the encoding of characters.
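A minimal sketch of such a character-aware word embedding, following the design of Kim et al. (2016): the characters of one word are embedded, convolved with filters of several widths, and max-pooled over time. All sizes below are assumed for illustration.

```python
from tensorflow.keras import layers, Model

N_CHARS, CHAR_DIM, MAX_WORD_LEN = 70, 15, 20

chars = layers.Input(shape=(MAX_WORD_LEN,), dtype="int32")  # one word as character ids
x = layers.Embedding(N_CHARS, CHAR_DIM)(chars)

# Filters of width 2..5 detect character n-grams such as prefixes, suffixes
# and stems; max-over-time pooling keeps the strongest match of each filter.
pooled = []
for width in (2, 3, 4, 5):
    conv = layers.Conv1D(filters=50, kernel_size=width, activation="tanh")(x)
    pooled.append(layers.GlobalMaxPooling1D()(conv))

word_vector = layers.Concatenate()(pooled)   # 200-dimensional word representation
char_cnn_embedding = Model(chars, word_vector)
```

To embed a whole sentence, this module can be wrapped in a TimeDistributed layer so that it is applied to every word independently.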
Another contribution of our CIAN model is that we implement the intra attention mechanism introduced by Yang et al. (2017). The major advantage of the attention mechanism is the ability to efficiently encode long sentences: as the input grows, a model without attention loses information and precision if it relies only on the final representation, and attention is an effective way to fix this issue, which experiments indeed confirm. Another advantage of the attention mechanism is that it enhances the interpretability of the model: we can visualize the attention weights of an encoded sentence. We visualize the attention weights in Chapter 5, which helps us understand how the model judges the textual entailment relation between two sentences.
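A minimal sketch of an intra attention layer over the BiLSTM hidden states Y: scores a = softmax(w^T tanh(W Y)) assign one weight per time step, and the attended sentence vector is the weighted sum of the states. The returned weights can be plotted directly for the visualization discussed above; the dimensions are assumed, not the thesis settings.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

MAX_LEN, HIDDEN = 50, 300

states = layers.Input(shape=(MAX_LEN, 2 * HIDDEN))       # BiLSTM outputs Y
proj = layers.Dense(HIDDEN, activation="tanh")(states)   # tanh(W Y)
scores = layers.Dense(1, use_bias=False)(proj)           # w^T tanh(W Y)
weights = layers.Softmax(axis=1)(scores)                 # one weight per time step

# Weighted sum of the hidden states yields the attended sentence vector.
attended = layers.Lambda(
    lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([states, weights])
intra_attention = Model(states, [attended, weights])     # weights kept for plotting
```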
The proposed CIAN model is implemented using Keras and evaluated on the newly published Multi-Genre NLI (MultiNLI) corpus from the RepEval 2017 workshop. The test accuracy of the CIAN model on the matched test set is improved by 0.9 percent compared with the baseline model. Based on this improved result, we published a paper titled Character-level Intra Attention Networks for Natural Language Inference at the RepEval 2017 workshop, as an achievement of this thesis.
To summarize, the CIAN model presented in this paper is a sequence encoder that has the ability to encode long sentences at the character level with rich semantic and orthographic features. In addition, the attention mechanism provides high interpretability, allowing people to understand how the model performs its task. As it is an end-to-end neural network that does not need any task-specific pre-processing or outside data such as pre-trained word embeddings, it can easily be applied to other encoding tasks such as language modeling, sentiment analysis and question answering.