Minimal character-level language model with a Vanilla Recurrent Neural Network, in Python/numpy

Reference: The Unreasonable Effectiveness of Recurrent Neural Networks

This model uses a simple (vanilla) RNN rather than an LSTM. It takes characters as input: each character is encoded as a one-hot vector X whose dimension is the number of distinct characters in the input file.

Let us assume that the character vocabulary size is V and the hidden size is H. The output layer size is then also V.

This simple RNN model contains 3 weight matrices:

* Whh: H * H, hidden layer to hidden layer
* Wxh: H * V, input layer to hidden layer
* Why: V * H, hidden layer to output layer
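The shapes of the three matrices can be checked with a short numpy sketch. The values of `V` and `H` and the 0.01 scale of the random initial weights are assumptions chosen for illustration, not values taken from this README. Note that with a V * 1 input column vector, Wxh needs H rows and V columns for the product Wxh * X to be defined.

```python
import numpy as np

V = 4   # vocabulary size (assumed for illustration)
H = 8   # hidden layer size (assumed for illustration)

rng = np.random.default_rng(0)

# Small random initial weights; the 0.01 scale is an assumption.
Wxh = rng.standard_normal((H, V)) * 0.01   # input layer to hidden layer
Whh = rng.standard_normal((H, H)) * 0.01   # hidden layer to hidden layer
Why = rng.standard_normal((V, H)) * 0.01   # hidden layer to output layer

x = np.zeros((V, 1))
x[0] = 1.0                  # one-hot input column vector
h_prev = np.zeros((H, 1))   # previous hidden state

# Both hidden-layer products are H x 1, and the output is V x 1.
print((Wxh @ x).shape, (Whh @ h_prev).shape, (Why @ h_prev).shape)
```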

In the output layer, softmax is used to compute the probability distribution over characters, from which we can sample the next character given the previous input.

**update hidden state**

h^{t} = tanh(Whh * h^{t-1} + Wxh*X)

**compute output vector**

y = Why * h
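The two equations above, followed by the softmax mentioned earlier, might be sketched in numpy as follows. The function name `rnn_step` and the sizes used in the usage lines are hypothetical. Bias terms are left out so that the code matches the equations as written here.

```python
import numpy as np

def rnn_step(x, h_prev, Wxh, Whh, Why):
    """One forward step: returns the new hidden state and softmax probabilities."""
    h = np.tanh(Whh @ h_prev + Wxh @ x)   # update hidden state
    y = Why @ h                           # output vector (unnormalized scores)
    e = np.exp(y - np.max(y))             # subtract the max for numerical stability
    p = e / np.sum(e)                     # softmax: probabilities that sum to 1
    return h, p

# Usage with made-up sizes and random weights (assumptions for illustration).
V, H = 4, 8
rng = np.random.default_rng(0)
Wxh = rng.standard_normal((H, V)) * 0.01
Whh = rng.standard_normal((H, H)) * 0.01
Why = rng.standard_normal((V, H)) * 0.01

x = np.zeros((V, 1))
x[0] = 1.0                                # one-hot input
h, p = rnn_step(x, np.zeros((H, 1)), Wxh, Whh, Why)
print(h.shape, p.shape)
```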

This model reads characters from the input.txt file and uses each character together with the character that follows it as a training pair. Each character is represented by a one-hot vector; the target is the character we expect to follow the current character.

For example:
Suppose we read "hello" from the input file. Then 'h' and 'e' are used as a training pair and are encoded into vectors.
Let's assume that our vocab has only 4 characters, ('h','e','l','o'). Then 'h' is encoded as [1,0,0,0]^{T} and 'e' is encoded as [0,1,0,0]^{T}.
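The "hello" example can be reproduced with a few lines of numpy. This is a minimal sketch: the names `char_to_ix`, `one_hot` and `pairs` are hypothetical, and the vocab order is fixed by hand to match the example.

```python
import numpy as np

data = "hello"
vocab = ['h', 'e', 'l', 'o']   # fixed order, matching the example
char_to_ix = {ch: i for i, ch in enumerate(vocab)}

def one_hot(ch):
    """Encode a character as a V x 1 one-hot column vector."""
    v = np.zeros((len(vocab), 1))
    v[char_to_ix[ch]] = 1.0
    return v

# Each character is paired with the character that follows it.
pairs = list(zip(data[:-1], data[1:]))
print(pairs)                    # [('h', 'e'), ('e', 'l'), ('l', 'l'), ('l', 'o')]
print(one_hot('h').ravel())     # [1. 0. 0. 0.]
print(one_hot('e').ravel())     # [0. 1. 0. 0.]
```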

The input layer size is V, so the input is a V * 1 one-hot vector.

The hidden layer size is H. We also need to keep track of the hidden state (the value of the hidden layer).

The output layer size is V. The output layer gives a probability distribution over characters, from which we can sample a character given a sequence of inputs.
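Sampling a character from the output distribution could look like the sketch below. The probabilities in `p` are made up for illustration, and the use of numpy's `Generator.choice` is one possible way to draw the sample, not necessarily what the script does.

```python
import numpy as np

vocab = ['h', 'e', 'l', 'o']
p = np.array([[0.1], [0.6], [0.2], [0.1]])   # made-up V x 1 probabilities

rng = np.random.default_rng(0)
# Draw one index in [0, V) with probability p[i] for index i.
ix = rng.choice(len(vocab), p=p.ravel())
next_char = vocab[ix]
print(next_char)
```

To generate a longer sequence, the sampled character would be one-hot encoded and fed back in as the next input, with the hidden state carried over from step to step.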

This RNN model uses cross entropy as its cost function (error). For one training example with one-hot target t and predicted probability distribution y, the cross entropy is:
$$H(t, y) = -\sum_{i=1}^{V} t_{i} \log y_{i}$$

Because every entry of t is 0 except for a single 1, the above equation can be rewritten as:
$$H(t, y) = -t_{i} \log y_{i} = -\log y_{i}$$

Here i is the index of the 1 in vector t.
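A small numeric check shows that the full sum and the single-term form of the cross entropy agree for a one-hot target. The vectors `t` and `y` below are made-up values for illustration.

```python
import numpy as np

def cross_entropy(t, y):
    """Full form: minus the sum over i of t_i * log(y_i)."""
    return float(-np.sum(t * np.log(y)))

t = np.array([0.0, 1.0, 0.0, 0.0])   # one-hot target (made up)
y = np.array([0.1, 0.6, 0.2, 0.1])   # predicted probabilities (made up)

full = cross_entropy(t, y)

i = int(np.argmax(t))                # index of the 1 in t
short = float(-np.log(y[i]))         # single-term form

print(full, short)                   # both are -log(0.6), about 0.51
```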

- Install numpy: `pip install numpy`

- Run this code: `python min-char-rnn.py`

The output of this model is characters sampled given the current input characters. Example output:

```
iter 92400, loss: 48.542305

cet for dons of he wast oune tofus shee loolf the was hering thity, Youpres To make his it you gain fell must out you yie t.
```