
Recurrent Neural Networks


Introduction

RNNs are a good architecture for working with sequential data of the form $x^{(1)}, \ldots, x^{(\tau)}$. They are better suited than perceptrons or feed-forward layers for long sequences and for sequences of varying length, because they can use the relationship between adjacent members of a sequence to make their predictions. Due to their ability to retain information about previous states and to process long sequences, RNNs are a good option for time-series analysis and natural language processing.

RNNs process sequential data by defining a recurrence relation over time steps, typically of the following form:

$$ S_{k} = f(S_{k-1}\cdot W_{rec} + X_{k}\cdot W_{X}) $$

where $S_{k}$ is the state at time $k$, $X_{k}$ is an exogenous input at time $k$, and $W_{rec}$ and $W_{X}$ are weight parameters. The RNN can be viewed as a state model with a feedback loop: the state evolves over time through the recurrence relation, and the state is fed back into itself with a delay of one time step. While processing, the intermediate, or "hidden", state is continuously updated and passed to the next step of the sequence. This delayed feedback loop gives the model memory, because it can "remember" information between time steps in its states.

The final output of the network at a certain time step is typically computed from one or more states.

This structure allows us to predict the next state $S_{k+1}$ from the current state $S_{k}$ and current input $X_{k}$.
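To make the recurrence concrete, here is a minimal NumPy sketch that unrolls it over a toy sequence. The choice of $\tanh$ for $f$, the dimensions, and the random weights are illustrative assumptions, not values from the tutorial:

```python
import numpy as np

def rnn_forward(X, W_rec, W_x, s0):
    """Unroll the recurrence S_k = f(S_{k-1} @ W_rec + X_k @ W_x).

    X:     (n_steps, input_dim) input sequence
    W_rec: (state_dim, state_dim) recurrent weights
    W_x:   (input_dim, state_dim) input weights
    s0:    (state_dim,) initial state
    Returns one hidden state per time step.
    """
    states = [s0]
    for x_k in X:
        # Each new state mixes the previous state with the current input.
        states.append(np.tanh(states[-1] @ W_rec + x_k @ W_x))
    return np.stack(states[1:])

# Toy example: 5 time steps, 3-dimensional inputs, 4-dimensional state.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
states = rnn_forward(X, rng.normal(size=(4, 4)), rng.normal(size=(3, 4)), np.zeros(4))
print(states.shape)  # (5, 4): one hidden state per time step
```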

In the graphical representation below, we "unfold" the formula given above. On the left is a compact view of the mathematical representation of an RNN. On the right, the recurrence is unrolled so that every state generated by a sequence is shown explicitly:

Source for information and image: How to implement an RNN (1/2) - Minimal example

Use Cases

RNNs are great for tasks that require a many-to-one or many-to-many mapping. For example, a "character-level RNN" uses the individual characters in a given string as its input, rather than words or sentences. The network learns the underlying patterns governing which characters can follow which in a given language. In a many-to-one architecture, the RNN generates only one output, such as the class the string belongs to.
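As a sketch of the many-to-one case, the following PyTorch snippet classifies an entire string from the final hidden state alone. The vocabulary, hidden size, and class count are placeholder assumptions, not taken from the tutorial linked below:

```python
import torch
import torch.nn as nn

vocab = "abcdefghijklmnopqrstuvwxyz"            # assumed character set
char_to_idx = {c: i for i, c in enumerate(vocab)}

class CharClassifier(nn.Module):
    def __init__(self, vocab_size, hidden_size, n_classes):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.rnn = nn.RNN(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, n_classes)

    def forward(self, x):                       # x: (batch, seq_len) of char indices
        emb = self.embed(x)                     # (batch, seq_len, hidden)
        _, h_n = self.rnn(emb)                  # h_n: final state, (1, batch, hidden)
        return self.out(h_n.squeeze(0))         # one prediction per sequence

model = CharClassifier(len(vocab), hidden_size=32, n_classes=5)
x = torch.tensor([[char_to_idx[c] for c in "hello"]])   # (1, 5)
print(model(x).shape)   # torch.Size([1, 5]): one output for the whole string
```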

In many-to-many architectures, the network receives an input sequence of characters and generates an output sequence of characters, for example mapping a string in a source language to a corresponding string in a target language.
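A many-to-many mapping can reuse the same recurrent core but read a prediction off every time step instead of only the last one. The sketch below uses assumed sizes and a random input for illustration; real translation systems typically use encoder-decoder variants rather than this one-state-per-character scheme:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=32, hidden_size=32, batch_first=True)
to_chars = nn.Linear(32, 26)        # assumed 26-character target vocabulary

emb = torch.randn(1, 5, 32)         # a batch of one 5-step embedded sequence
output, h_n = rnn(emb)              # output: (1, 5, 32), one state per step
print(to_chars(output).shape)       # torch.Size([1, 5, 26]): a prediction per input character
```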

Source: Going Under the Hood of Character-Level RNNs: A NumPy-based Implementation Guide

RNNs for NLP

We will transition into this Python notebook for our demonstration: pytorch_char_rnn_classification_tutorial.ipynb

Further reading