Recurrent Neural Networks

Update: 2023-10-26

A Recurrent Neural Network (RNN) [1] maintains hidden states of previous inputs and uses them to predict outputs, allowing it to model temporal dependencies in sequential data.

The hidden state is a vector representing the network's internal memory. It captures information from previous time steps and influences the prediction made at the current time step, and it is updated at each time step as the RNN processes a sequence of inputs.

RNN for Sequence Tagging

Given an input sequence $X = [x_1, \ldots, x_n]$ where $x_i \in \mathbb{R}^{d \times 1}$, an RNN for sequence tagging defines two functions, $f$ and $g$ (see the sketch after this list):

  • $f$ takes the current input $x_i \in X$ and the hidden state $h_{i-1}$ of the previous input $x_{i-1}$, and returns a hidden state $h_i \in \mathbb{R}^{e \times 1}$ such that $f(x_i, h_{i-1}) = \alpha(W^x x_i + W^h h_{i-1}) = h_i$, where $W^x \in \mathbb{R}^{e \times d}$, $W^h \in \mathbb{R}^{e \times e}$, and $\alpha$ is an activation function.

  • $g$ takes the hidden state $h_i$ and returns an output $y_i \in \mathbb{R}^{o \times 1}$ such that $g(h_i) = W^o h_i = y_i$, where $W^o \in \mathbb{R}^{o \times e}$.
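The following is a minimal NumPy sketch of these two functions, assuming $\tanh$ as the activation $\alpha$ and randomly initialized weights; the dimensions $d$, $e$, and $o$ (and the sequence length) are illustrative choices, not values prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)
d, e, o = 4, 3, 2                  # input, hidden, and output dimensions (illustrative)
Wx = rng.normal(size=(e, d))       # W^x in R^{e x d}
Wh = rng.normal(size=(e, e))       # W^h in R^{e x e}
Wo = rng.normal(size=(o, e))       # W^o in R^{o x e}

def f(x_i, h_prev):
    """h_i = alpha(W^x x_i + W^h h_{i-1}), with alpha = tanh."""
    return np.tanh(Wx @ x_i + Wh @ h_prev)

def g(h_i):
    """y_i = W^o h_i."""
    return Wo @ h_i

# Sequence tagging: one output per input, with h_0 = 0 for the first time step.
X = [rng.normal(size=(d,)) for _ in range(5)]
h = np.zeros(e)
Y = []
for x_i in X:
    h = f(x_i, h)
    Y.append(g(h))
print(len(Y), Y[0].shape)          # 5 outputs, each of dimension o
```

Looping $f$ over the sequence and applying $g$ at every step yields one output per input, which is exactly the sequence tagging setting described above.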

Figure 1 shows an example of an RNN for sequence tagging, such as part-of-speech tagging:

Figure 1 - An example of an RNN and its application in part-of-speech (POS) tagging.

Notice that the output $y_1$ for the first input $x_1$ is predicted by considering only the input itself such that $f(x_1, \mathbf{0}) = \alpha(W^x x_1) = h_1$ (e.g., the POS tag of the first word "I" is predicted solely using that word). However, the output $y_i$ for every other input $x_i$ is predicted by considering both $x_i$ and $h_{i-1}$, an intermediate representation created explicitly for the task. This enables RNNs to capture sequential information that Feedforward Neural Networks cannot.


Q6: How does each hidden state $h_i$ in an RNN encode information relevant to sequence tagging tasks?

RNN for Text Classification

Unlike sequence tagging, where the RNN predicts a sequence of outputs $Y = [y_1, \ldots, y_n]$ for the input $X = [x_1, \ldots, x_n]$, an RNN designed for text classification predicts only one output $y$ for the entire input sequence such that:

  • Sequence Tagging: $\text{RNN}_{st}(X) \rightarrow Y$

  • Text Classification: $\text{RNN}_{tc}(X) \rightarrow y$

To accomplish this, a common practice is to predict the output $y$ from the last hidden state $h_n$ using the function $g$. Figure 2 shows an example of an RNN for text classification, such as sentiment analysis:

Figure 2 - An example of an RNN and its application in sentiment analysis.
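As a rough illustration of this practice, the sketch below runs the same recurrence over the whole sequence but applies $g$ only to the final hidden state $h_n$; all dimensions, weights, and the sequence length are again illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(1)
d, e, o = 4, 3, 2                       # input, hidden, and number-of-classes dimensions
Wx, Wh = rng.normal(size=(e, d)), rng.normal(size=(e, e))
Wo = rng.normal(size=(o, e))

def classify(X):
    """Return a single output y = g(h_n) for the entire sequence X."""
    h = np.zeros(e)
    for x_i in X:                       # run the RNN over the whole sequence
        h = np.tanh(Wx @ x_i + Wh @ h)  # h_i = alpha(W^x x_i + W^h h_{i-1})
    return Wo @ h                       # apply g only to the last hidden state h_n

X = [rng.normal(size=(d,)) for _ in range(6)]
print(classify(X).shape)                # one o-dimensional score vector per sequence
```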


Q7: In text classification tasks, what specific information is captured by the final hidden state $h_n$ of an RNN?

Bidirectional RNN

The above RNN for sequence tagging does not consider the words that follow the current word when predicting the output. This limitation can significantly impact model performance because contextual information following the current word can be crucial.

For example, let us consider the word "early" in the following two sentences:

  • They are early birds -> "early" is an adjective.

  • They are early today -> "early" is an adverb.

The POS tags of "early" depend on the following words, "birds" and "today"; without that context, making the correct predictions becomes challenging.

To overcome this challenge, a Bidirectional RNN has been suggested [2] that processes the sequence in both the forward and backward directions, creating twice as many hidden states to capture a more comprehensive context. Figure 3 illustrates a bidirectional RNN for sequence tagging:

Figure 3 - An overview of a bidirectional RNN.

For every $x_i$, the hidden states $\overrightarrow{h}_i$ and $\overleftarrow{h}_i$ are created by considering $\overrightarrow{h}_{i-1}$ and $\overleftarrow{h}_{i+1}$, respectively. The function $g$ takes both $\overrightarrow{h}_i$ and $\overleftarrow{h}_i$ and returns an output $y_i \in \mathbb{R}^{o \times 1}$ such that $g(\overrightarrow{h}_i, \overleftarrow{h}_i) = W^o (\overrightarrow{h}_i \oplus \overleftarrow{h}_i) = y_i$, where $(\overrightarrow{h}_i \oplus \overleftarrow{h}_i) \in \mathbb{R}^{2e \times 1}$ is the concatenation of the two hidden states and $W^o \in \mathbb{R}^{o \times 2e}$.
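A hedged sketch of this bidirectional computation, assuming separate forward and backward parameters and the same $\tanh$ activation as before, might look as follows; note that $W^o$ now maps the $2e$-dimensional concatenation to the output. All sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d, e, o, n = 4, 3, 2, 5
Wx_f, Wh_f = rng.normal(size=(e, d)), rng.normal(size=(e, e))  # forward-direction weights
Wx_b, Wh_b = rng.normal(size=(e, d)), rng.normal(size=(e, e))  # backward-direction weights
Wo = rng.normal(size=(o, 2 * e))                               # W^o in R^{o x 2e}

X = [rng.normal(size=(d,)) for _ in range(n)]

# Forward pass: h_fwd[i] depends on x_i and h_fwd[i-1].
h, h_fwd = np.zeros(e), []
for x_i in X:
    h = np.tanh(Wx_f @ x_i + Wh_f @ h)
    h_fwd.append(h)

# Backward pass: h_bwd[i] depends on x_i and h_bwd[i+1].
h, h_bwd = np.zeros(e), [None] * n
for i in range(n - 1, -1, -1):
    h = np.tanh(Wx_b @ X[i] + Wh_b @ h)
    h_bwd[i] = h

# g concatenates both hidden states before the output projection.
Y = [Wo @ np.concatenate([h_fwd[i], h_bwd[i]]) for i in range(n)]
print(len(Y), Y[0].shape)                                      # n outputs, each of dimension o
```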


Q8: What are the advantages and limitations of implementing bidirectional RNNs for text classification and sequence tagging tasks?

Advanced Topics

  • Long Short-Term Memory (LSTM) Networks [3-5]

  • Gated Recurrent Units (GRUs) [6-7]
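As a brief, non-authoritative illustration of how these gated variants are used in practice, the snippet below instantiates the standard nn.LSTM and nn.GRU modules from PyTorch (a library choice assumed here, not prescribed by these notes); the batch size, sequence length, and dimensions are arbitrary examples.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 7, 4)   # (batch, sequence length, input dimension d)

# Bidirectional LSTM: the output concatenates forward and backward hidden states (2 * hidden_size).
lstm = nn.LSTM(input_size=4, hidden_size=3, batch_first=True, bidirectional=True)
out, (h_n, c_n) = lstm(x)  # out: (1, 7, 6); h_n, c_n hold the final hidden and cell states

# Unidirectional GRU: a gated cell with no separate cell state.
gru = nn.GRU(input_size=4, hidden_size=3, batch_first=True)
out, h_n = gru(x)          # out: (1, 7, 3)
print(out.shape)
```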

References

  1. Finding Structure in Time, Elman, Cognitive Science, 14(2), 1990.

  2. Bidirectional Recurrent Neural Networks, Schuster and Paliwal, IEEE Transactions on Signal Processing, 45(11), 1997.

  3. Long Short-Term Memory, Hochreiter and Schmidhuber, Neural Computation, 9(8), 1997 (PDF available at ResearchGate).

  4. End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF, Ma and Hovy, ACL, 2016.*

  5. Contextual String Embeddings for Sequence Labeling, Akbik et al., COLING, 2018.*

  6. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, Cho et al., EMNLP, 2014.*

  7. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, Chung et al., NeurIPS Workshop on Deep Learning and Representation Learning, 2014.*
