In the paper "Capacity and Trainability in Recurrent Neural Networks" (https://arxiv.org/pdf/1611.09913.pdf), the authors claim that all common RNN architectures have similar capacity, but the vanilla RNN is very hard to train. If the task is hard to learn, one should choose a gated architecture: the GRU is the most trainable for shallow networks, and the +RNN (Intersection RNN) performs best for deep networks. Although the LSTM is extremely reliable, it does not perform the best. If the training environment is uncertain, the authors suggest using a GRU or a +RNN.
Another paper, "On the State of the Art of Evaluation in Neural Language Models" (https://arxiv.org/pdf/1707.05589.pdf), found that the standard LSTM performs best among three architectures (LSTM, Recurrent Highway Networks, and Neural Architecture Search). The models are trained with a modified Adam optimizer. Hyperparameters, including the learning rate, input embedding ratio, input dropout, output dropout, and weight decay, are tuned with batched GP bandits.
The paper also shows that, in the Penn Treebank experiments, variational dropout on the recurrent state helps, while recurrent dropout shows no advantage.
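To make "variational dropout on the recurrent state" concrete, here is a minimal plain-Python sketch: one dropout mask is sampled per sequence and reused at every time step, instead of drawing a fresh mask at each step. The toy tanh RNN, its weights, and all function names (`dropout_mask`, `per_step_masks`, `variational_masks`, `run_toy_rnn`) are hypothetical illustrations, not the paper's code, and the "recurrent dropout" variant the paper compares against is not implemented here.

```python
import math
import random


def dropout_mask(size, p, rng):
    """Inverted-dropout mask: each unit is kept with prob 1 - p and scaled by 1 / (1 - p)."""
    keep = 1.0 - p
    return [1.0 / keep if rng.random() < keep else 0.0 for _ in range(size)]


def per_step_masks(size, num_steps, p, rng):
    """Per-step (naive) dropout on the state: a fresh mask is drawn at every time step."""
    return [dropout_mask(size, p, rng) for _ in range(num_steps)]


def variational_masks(size, num_steps, p, rng):
    """Variational dropout: one mask per sequence, reused at every time step."""
    mask = dropout_mask(size, p, rng)
    return [mask for _ in range(num_steps)]


def run_toy_rnn(inputs, w_in, w_rec, masks):
    """Hypothetical toy tanh RNN; the mask for step t is applied to the recurrent state h_{t-1}."""
    h = [0.0] * len(w_in)
    for x, mask in zip(inputs, masks):
        h_drop = [hj * mj for hj, mj in zip(h, mask)]  # drop units of the recurrent state
        h = [
            math.tanh(w_in[i] * x + sum(w_rec[i][j] * h_drop[j] for j in range(len(h))))
            for i in range(len(h))
        ]
    return h


# Usage: with variational dropout, every time step sees the same dropped units.
rng = random.Random(0)
masks = variational_masks(4, 6, p=0.5, rng=rng)
print(masks[0])
```

The design point is only where the randomness is sampled: once per sequence (variational) versus once per step (naive), so the same hidden units stay dropped for the whole sequence.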
Wednesday, December 27, 2017
Sunday, December 24, 2017
Deep learning practice and trends, some key points
I went through the first part of the NIPS 2017 tutorial, "Practice". Below are some key points from Oriol's talk:
CNN:
(1) Slide 7, "Deep learning: zooming in", is amazing! He lists the building blocks of a deep learning model and sorts them into categories: non-linearities, optimizer, connectivity pattern, loss, and hyper-parameters.
(2) Slide 21, which shows the convolution animation, is great: it makes the convolution mechanism very intuitive.
(3) Slide 27, building very deep ConvNets: a deeper architecture with small 3×3 filters gives a large receptive field with fewer parameters than using large filters.
(4) Slide 35, U-Net: for image segmentation, a bottleneck encoder-decoder with skip connections.
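Points (2) and (3) can be checked with a few lines of plain Python. This sketch assumes stride-1 convolutions without padding and ignores biases; like most deep learning libraries, it actually computes cross-correlation. The helper names and the channel count C = 64 are hypothetical, chosen only for illustration; the numbers are not taken from the slides.

```python
def conv2d_valid(image, kernel):
    """Naive 2-D 'valid' cross-correlation (what deep learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + di][j + dj] * kernel[di][dj] for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def receptive_field(num_layers, k=3):
    """Receptive field of a stack of stride-1 k x k convolutions."""
    return num_layers * (k - 1) + 1


def conv_params(k, c_in, c_out):
    """Number of weights in one k x k conv layer (biases ignored)."""
    return k * k * c_in * c_out


ones5 = [[1] * 5 for _ in range(5)]
ones3 = [[1] * 3 for _ in range(3)]
# Two stacked 3x3 layers turn a 5x5 input into a single output pixel that sees all of it.
print(conv2d_valid(conv2d_valid(ones5, ones3), ones3))  # [[81]]
print(receptive_field(2), receptive_field(3))  # 5 7
C = 64  # hypothetical channel count, same for input and output
print(2 * conv_params(3, C, C), conv_params(5, C, C))  # 73728 102400
print(3 * conv_params(3, C, C), conv_params(7, C, C))  # 110592 200704
```

In general, two 3×3 layers cover a 5×5 patch with 18C² weights instead of 25C², and three cover 7×7 with 27C² instead of 49C².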
Seq2seq:
(1) Attention!
(2) Slide 62: tricks!
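Attention lets the decoder look at all encoder states instead of a single fixed-length vector. Below is a minimal plain-Python sketch of one common form, scaled dot-product attention for a single query. The slides may use a different scoring function (e.g. additive attention), and using the encoder states as both keys and values is an assumption made here for simplicity.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query: list of d floats (e.g. the current decoder state)
    keys, values: lists of n vectors (e.g. the encoder states)
    Returns (context vector, attention weights).
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Context = attention-weighted sum of the value vectors.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return context, weights


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy encoder outputs
decoder_state = [1.0, 0.0]
context, weights = dot_product_attention(decoder_state, encoder_states, encoder_states)
print(weights)  # states 0 and 2 get equal weights, larger than state 1
```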
Video:
Slides: https://docs.google.com/presentation/d/e/2PACX-1vQMZsWfjjLLz_wi8iaMxHKawuTkdqeA3Gw00wy5dBHLhAkuLEvhB7k-4LcO5RQEVFzZXfS6ByABaRr4/pub?slide=id.g2a19ddb012_0_654
Labels: Deep learning, NIPS 2017
Location: Beijing, China