Lecture 8 Deep Learning for NLP: Recurrent Networks
- Problem of N-gram Language Model
- Recurrent Neural Network (RNN)
- RNN Language Model
- Long Short-Term Memory Model (LSTM)
- Gating Vector
- Forget Gate
- Input Gate
- Update Memory Cell
- Output Gate
- Disadvantages of LSTM
- Example Applications
- Variants of LSTM
Recurrent Networks
Problem of N-gram Language Model
Can be implemented using counts with smoothing
Can be implemented using feed-forward neural networks
Problem: limited context
E.g. when generating sentences with a trigram model, each word is chosen from only the two preceding words.
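The limited context can be seen in a minimal count-based sketch. The toy corpus and the helper names `train_trigram` and `generate` are hypothetical, and smoothing is left out for brevity, so this is only an illustration of the idea rather than a full language model.

```python
import random
from collections import Counter, defaultdict

def train_trigram(tokens):
    """Count how often each word follows each pair of preceding words."""
    counts = defaultdict(Counter)
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        counts[(w1, w2)][w3] += 1
    return counts

def generate(counts, start, max_len=10, seed=0):
    """Sample one word at a time; each choice sees only the last two words."""
    rng = random.Random(seed)
    out = list(start)
    while len(out) < max_len:
        followers = counts.get((out[-2], out[-1]))
        if not followers:
            break  # unseen context: no smoothing in this sketch
        words = list(followers)
        weights = [followers[w] for w in words]
        out.append(rng.choices(words, weights=weights)[0])
    return out

# Hypothetical toy corpus, used only for illustration.
corpus = "a cow eats grass . a cow eats hay . a goat eats grass .".split()
counts = train_trigram(corpus)
print(" ".join(generate(counts, ("a", "cow"))))
```

Because the sampler conditions on two words only, anything said earlier in the sentence cannot influence the next word.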
Recurrent Neural Network (RNN)
Allows representation of arbitrarily sized inputs
Core idea: processes the input sequence one element at a time by applying a recurrence formula
Uses a state vector to represent the context that has been processed so far
RNN Neuron:
RNN States:
Activation:
RNN Unrolled:
- The same parameters are used across all time steps
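A minimal NumPy sketch of the unrolled computation follows. It assumes the commonly used recurrence s_i = tanh(W_s s_{i-1} + W_x x_i + b); the names `W_s`, `W_x`, `b`, and `rnn_forward` are assumptions of this sketch, and the weights are random rather than trained.

```python
import numpy as np

def rnn_forward(xs, W_s, W_x, b, s0):
    """Run a simple RNN over a sequence, reusing the same parameters at every step."""
    s = s0
    states = []
    for x in xs:
        # Assumed recurrence: new state from previous state and current input.
        s = np.tanh(W_s @ s + W_x @ x + b)
        states.append(s)
    return states

rng = np.random.default_rng(0)
hidden, embed = 4, 3
W_s = rng.normal(scale=0.5, size=(hidden, hidden))  # state-to-state weights
W_x = rng.normal(scale=0.5, size=(hidden, embed))   # input-to-state weights
b = np.zeros(hidden)

# The same three parameters handle sequences of different lengths.
short = [rng.normal(size=embed) for _ in range(2)]
long = [rng.normal(size=embed) for _ in range(7)]
states_short = rnn_forward(short, W_s, W_x, b, np.zeros(hidden))
states_long = rnn_forward(long, W_s, W_x, b, np.zeros(hidden))
print(len(states_short), len(states_long))
```

The loop body is the recurrence; unrolling it simply means writing out one copy of that body per time step.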
Training RNN:
- An unrolled RNN is a very deep neural network, but its parameters are shared across all time steps
- To train an RNN, create the unrolled computation graph for a given input sequence and use the backpropagation algorithm to compute gradients as usual
- This procedure is called backpropagation through time
E.g. of an unrolled equation:
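As an illustration, assume the recurrence takes the common form $s_i = \tanh(W_s s_{i-1} + W_x x_i + b)$, where $W_s$, $W_x$, and $b$ are assumed names for the shared state weights, input weights, and bias. Unrolling two steps then gives:

```latex
\begin{aligned}
s_1 &= \tanh(W_s s_0 + W_x x_1 + b) \\
s_2 &= \tanh(W_s s_1 + W_x x_2 + b) \\
    &= \tanh\bigl(W_s \tanh(W_s s_0 + W_x x_1 + b) + W_x x_2 + b\bigr)
\end{aligned}
```

The same $W_s$, $W_x$, and $b$ appear at every level of nesting, which is why gradients for these parameters accumulate contributions from every time step.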
RNN Language Model:
The input is the current word (e.g. "eats") mapped to an embedding
The previous state contains information about the preceding words (e.g. "a" and "cow")
The output is the next word (e.g. "grass")
Vocabulary:
[a, cow, eats, grass]
Training example:
a cow eats grass
Training process:
Total loss:
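One way to read the training setup is sketched below: each word is fed in turn, the model predicts the next word, and the total loss is the sum of the per-step cross-entropy losses. The tanh recurrence, the softmax output layer, the dimensions, and all parameter names (`E`, `W_s`, `W_x`, `b`, `W_y`) are assumptions of this sketch, and the weights are random rather than trained.

```python
import numpy as np

vocab = ["a", "cow", "eats", "grass"]
word_to_id = {w: i for i, w in enumerate(vocab)}
sentence = "a cow eats grass".split()

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
V, embed, hidden = len(vocab), 3, 5
E = rng.normal(scale=0.1, size=(V, embed))          # embedding table
W_s = rng.normal(scale=0.1, size=(hidden, hidden))  # state-to-state weights
W_x = rng.normal(scale=0.1, size=(hidden, embed))   # input-to-state weights
b = np.zeros(hidden)
W_y = rng.normal(scale=0.1, size=(V, hidden))       # state-to-output weights

def total_loss(words):
    """Sum of cross-entropy losses for predicting each next word."""
    s = np.zeros(hidden)
    losses = []
    for current, target in zip(words, words[1:]):
        x = E[word_to_id[current]]
        s = np.tanh(W_s @ s + W_x @ x + b)
        y = softmax(W_y @ s)                       # distribution over the vocabulary
        losses.append(-np.log(y[word_to_id[target]]))
    return sum(losses), losses

loss, per_step = total_loss(sentence)
print(per_step, loss)
```

For this four-word sentence there are three prediction steps (a → cow, cow → eats, eats → grass), so the total loss is a sum of three terms.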
Problems of RNN:
- Error propagation: unable to recover from errors in intermediate steps
- Low diversity in generated language
- Tends to generate bland or generic language
Long Short-Term Memory Networks
Long Short-Term Memory Model (LSTM)
In principle an RNN can model unbounded context, but in practice it cannot capture long-range dependencies because of vanishing gradients.
Vanishing gradient: gradients from later steps diminish quickly as they are backpropagated, so earlier inputs receive little update.
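The effect can be illustrated numerically. The sketch below assumes a tanh recurrence s_t = tanh(W_s s_{t-1} + W_x x_t) and tracks the norm of the Jacobian of the current state with respect to the initial state. The weights are random, and `W_s` is deliberately rescaled so its largest singular value is 0.9; that rescaling is a choice made for this demonstration, not a property of every RNN.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, steps = 8, 40
W_s = rng.normal(size=(hidden, hidden))
W_s *= 0.9 / np.linalg.norm(W_s, 2)   # rescale: largest singular value becomes 0.9
W_x = rng.normal(scale=0.5, size=(hidden, hidden))

s = np.zeros(hidden)
grad = np.eye(hidden)   # running product of Jacobians: d s_t / d s_0
norms = []
for _ in range(steps):
    x = rng.normal(size=hidden)
    s = np.tanh(W_s @ s + W_x @ x)
    jacobian = np.diag(1.0 - s**2) @ W_s   # d s_t / d s_{t-1} for a tanh recurrence
    grad = jacobian @ grad
    norms.append(np.linalg.norm(grad))

print(f"step 1: {norms[0]:.3e}, step {steps}: {norms[-1]:.3e}")
```

Each step multiplies the gradient by another Jacobian whose norm is below 1 in this setup, so the influence of the earliest state shrinks geometrically with distance.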
LSTM was introduced to address vanishing gradients
Core idea: have memory cells that preserve gradients across time. Access to the memory cells is controlled by gates.
Gates: for each input, a gate decides:
- How much of the new input should be written to the memory cell
- How much of the current memory cell's content should be forgotten
Comparison between simple RNN and LSTM:
Gating Vector
A gate is a vector. Each element of the gate has a value between 0 and 1. A sigmoid function is used to produce the gate.
The gate is multiplied component-wise with another vector to determine how much of that vector's information to keep.
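A small numeric sketch of gating follows. The names `g`, `v`, and `scores` and the specific numbers are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

scores = np.array([-4.0, 0.0, 4.0])   # hypothetical pre-activation values
g = sigmoid(scores)                   # every element lies strictly between 0 and 1
v = np.array([10.0, 10.0, 10.0])      # the vector being gated
kept = g * v                          # component-wise product

print(g)
print(kept)
```

A gate value near 0 suppresses the corresponding component of `v`, a value near 1 passes it through almost unchanged, and 0.5 keeps half.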
Forget Gate
Controls how much information to forget in the memory cell
E.g. given "The cats that the boy", predict the next word "likes"
- Memory cell was storing noun information
- The cell should now forget "cats" and store "boy" to correctly predict the singular verb "likes"
Input Gate
The input gate controls how much new information to put into the memory cell
A candidate vector holds the new distilled information to be added
Update Memory Cell
- Use the forget and input gates to update the memory cell
Output Gate
- The output gate controls how much of the memory cell's content to distill to create the next state
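Putting the forget gate, input gate, memory update, and output gate together, one LSTM step can be sketched as follows. This uses one widely used formulation in which every gate is computed from the concatenation of the previous state and the current input. The parameter names (`W_f`, `W_i`, `W_c`, `W_o` and their biases) and the helpers `lstm_step` and `init_params` are assumptions of this sketch, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step under an assumed, commonly used formulation."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(params["W_f"] @ z + params["b_f"])        # forget gate
    i = sigmoid(params["W_i"] @ z + params["b_i"])        # input gate
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])  # new distilled information
    c = f * c_prev + i * c_tilde                          # update memory cell
    o = sigmoid(params["W_o"] @ z + params["b_o"])        # output gate
    h = o * np.tanh(c)                                    # next state
    return h, c

def init_params(hidden, embed, rng):
    """Random parameters for the forget, input, candidate, and output parts."""
    params = {}
    for name in "fico":
        params[f"W_{name}"] = rng.normal(scale=0.1, size=(hidden, hidden + embed))
        params[f"b_{name}"] = np.zeros(hidden)
    return params

rng = np.random.default_rng(0)
hidden, embed = 4, 3
params = init_params(hidden, embed, rng)
h, c = np.zeros(hidden), np.zeros(hidden)
for x in rng.normal(size=(5, embed)):
    h, c = lstm_step(x, h, c, params)
print(h, c)
```

The memory update is additive (`f * c_prev + i * c_tilde`), so when the forget gate stays close to 1 the cell content, and the gradient flowing through it, is carried across steps largely unchanged.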
Disadvantages of LSTM
- Introduces some additional parameters, though not many
- Still unable to capture very long-range dependencies
- Slower than a simple RNN, though not by much
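To put rough numbers on the parameter cost, the sketch below counts weights under two assumed formulations: a simple RNN with a single weight matrix over the concatenated state and input plus a bias, and an LSTM with four such sets (three gates plus the candidate). The formulations and the embedding size of 128 are assumptions chosen for illustration.

```python
def simple_rnn_params(hidden, embed):
    """One weight matrix over [state; input] plus one bias vector."""
    return hidden * (hidden + embed) + hidden

def lstm_params(hidden, embed):
    """Four such sets: forget gate, input gate, output gate, candidate."""
    return 4 * simple_rnn_params(hidden, embed)

hidden, embed = 512, 128   # embed = 128 is a hypothetical choice
print(simple_rnn_params(hidden, embed))   # 328192
print(lstm_params(hidden, embed))         # 1312768
```

Under these assumptions the recurrent layer grows by a constant factor of four, independent of sequence length.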
Applications of RNN
Example Applications
Shakespeare Generator:
- Training data: all works of Shakespeare
- Model: character RNN, hidden dimension = 512
Wikipedia Generator:
- Training data: 100MB of raw Wikipedia data
Code Generator
Text Classification
- RNNs can be used in a variety of NLP tasks. They are particularly suited to tasks where word order matters, e.g. sentiment analysis
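A minimal sketch of how an RNN could be used for classification: run the recurrence over the word embeddings and pass the final state through a sigmoid to get a probability. The tanh recurrence, the output vector `w_out`, and the random stand-in embeddings are assumptions of this sketch; nothing here is trained.

```python
import numpy as np

def final_state(xs, W_s, W_x, b):
    """Run a simple RNN and return only the last state."""
    s = np.zeros(W_s.shape[0])
    for x in xs:
        s = np.tanh(W_s @ s + W_x @ x + b)
    return s

def classify(xs, W_s, W_x, b, w_out):
    """Map the final state to a probability, e.g. of positive sentiment."""
    s = final_state(xs, W_s, W_x, b)
    return 1.0 / (1.0 + np.exp(-(w_out @ s)))

rng = np.random.default_rng(0)
hidden, embed = 6, 4
W_s = rng.normal(scale=0.5, size=(hidden, hidden))
W_x = rng.normal(scale=0.5, size=(hidden, embed))
b = np.zeros(hidden)
w_out = rng.normal(size=hidden)

words = rng.normal(size=(3, embed))   # stand-ins for three word embeddings
p_original = classify(words, W_s, W_x, b, w_out)
p_reordered = classify(words[::-1], W_s, W_x, b, w_out)
print(p_original, p_reordered)
```

Feeding the same three vectors in reverse order produces a different final state and therefore a different score, which is the sense in which the model is sensitive to word order; a bag-of-words average would give the same result for both orders.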
Sequence Labeling: e.g. POS tagging
Variants of LSTM
Peephole connections: allow gates to look at the cell state
Gated recurrent unit (GRU): simplified variant with only 2 gates and no memory cell
Multi-layer LSTM
Bidirectional LSTM
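For the GRU variant listed above, one step can be sketched as follows. This uses one common formulation with an update gate and a reset gate; conventions differ between references (some swap the roles of `z` and `1 - z`), and the parameter names and the helper `gru_step` are assumptions of this sketch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x, h_prev, params):
    """One GRU step: two gates, and the state itself acts as the memory."""
    z_in = np.concatenate([h_prev, x])
    z = sigmoid(params["W_z"] @ z_in + params["b_z"])   # update gate
    r = sigmoid(params["W_r"] @ z_in + params["b_r"])   # reset gate
    h_tilde = np.tanh(
        params["W_h"] @ np.concatenate([r * h_prev, x]) + params["b_h"]
    )                                                   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde             # blend old and new state

rng = np.random.default_rng(0)
hidden, embed = 4, 3
params = {}
for name in "zrh":
    params[f"W_{name}"] = rng.normal(scale=0.1, size=(hidden, hidden + embed))
    params[f"b_{name}"] = np.zeros(hidden)

h = np.zeros(hidden)
for x in rng.normal(size=(5, embed)):
    h = gru_step(x, h, params)
print(h)
```

Compared with an LSTM step, there is no separate memory cell and no output gate: the update gate alone decides how much of the previous state to carry forward.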