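Assume you have an input batch of variable-length sequences, padded to a common length max_time. The batch has shape:
inputs: [batch_size, max_time, dim_feature]
and the true length of each sequence is stored in a vector, say sequence_length, of shape [batch_size]. You can then get the final state of every sequence with:
_, state = tf.nn.dynamic_rnn(some_RNN_cell, inputs, sequence_length=sequence_length, dtype=tf.float32)
Note that dynamic_rnn needs either initial_state or dtype, so dtype is passed here. Once a sequence ends, its state is copied through the remaining time steps, so state corresponds to the last valid step of each sequence rather than to step max_time. If some_RNN_cell is an LSTM cell, state is an LSTMStateTuple holding both the hidden and the cell state:
state.h: final hidden state, [batch_size, hidden_states_size]
state.c: final cell state, [batch_size, hidden_states_size]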
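To make the effect of sequence_length concrete, here is a minimal pure-Python sketch of the copy-through behaviour. It is not TensorFlow's implementation: toy_cell, its weights, and final_states are hypothetical names made up for illustration, and the hidden state is a single scalar to keep things short.

```python
import math

def toy_cell(x_t, h_prev, w_x=0.5, w_h=0.8):
    """Hypothetical scalar RNN cell: h_t = tanh(w_x * sum(x_t) + w_h * h_prev)."""
    return math.tanh(w_x * sum(x_t) + w_h * h_prev)

def final_states(batch, sequence_length):
    """Run the toy cell over a padded batch [batch_size][max_time][dim_feature].

    Past a sequence's true length the state is left unchanged (copied through),
    so padding never influences the returned state.
    """
    states = []
    for sequence, length in zip(batch, sequence_length):
        h = 0.0  # zero initial state
        for t, x_t in enumerate(sequence):
            if t < length:
                h = toy_cell(x_t, h)
            # else: padding step, keep h as it is
        states.append(h)
    return states

# Two sequences padded to max_time = 3 with dim_feature = 2.
batch = [
    [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]],  # true length 3
    [[1.0, 0.0], [9.0, 9.0], [9.0, 9.0]],  # true length 1, the 9.0 rows are padding
]
sequence_length = [3, 1]
state = final_states(batch, sequence_length)
```

The second entry of state depends only on the first time step of the second sequence; changing the padding rows leaves it untouched.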
I give credit to these two sources:
https://danijar.com/variable-sequence-lengths-in-tensorflow/
https://github.com/shane-settle/neural-acoustic-word-embeddings/blob/4cc3878e6715860bcce202aea7c5a6b7284292a1/code/lstm.py#L25
Tuesday, February 20, 2018
Sunday, January 14, 2018
Sheet music and audio multimodal learning
https://arxiv.org/abs/1612.05050
Towards score following in sheet music images: uses classification to find the note head position in the sheet music. Given an audio spectrogram patch, the model classifies which location bucket it corresponds to.
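As I read it, treating localisation as classification means quantising the horizontal position into a fixed number of buckets and predicting the bucket index. A minimal sketch of that quantisation step follows; location_bucket, the sheet width, and the bucket count are assumptions for illustration, not values from the paper.

```python
def location_bucket(x_position, sheet_width, num_buckets):
    """Map a horizontal position in [0, sheet_width) to a class index in [0, num_buckets)."""
    if not 0 <= x_position < sheet_width:
        raise ValueError("x_position must lie inside the sheet image")
    return int(x_position * num_buckets / sheet_width)

# Hypothetical example: a 200-pixel-wide sheet snippet split into 40 buckets.
target_class = location_bucket(100, sheet_width=200, num_buckets=40)
```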
https://arxiv.org/abs/1707.09887
Learning audio-sheet music correspondences for score identification and offline alignment: pairwise ranking objective vs. contrastive loss (Siamese), what's the difference?
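One way to look at the question is to write both losses down in their common textbook forms. The sketch below assumes a max-margin hinge on cosine similarity for the ranking objective and the squared-distance form for the contrastive loss; the paper's exact formulation may differ, and the margins and vectors are made up.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def contrastive_loss(a, b, is_match, margin=1.0):
    """Judges one pair at a time against an absolute distance threshold."""
    d = euclidean(a, b)
    if is_match:
        return d ** 2                    # pull matching pairs together
    return max(0.0, margin - d) ** 2     # push non-matching pairs beyond the margin

def pairwise_ranking_loss(query, positive, negatives, margin=0.7):
    """Only asks that the match scores higher than each negative by the margin."""
    s_pos = cosine(query, positive)
    return sum(max(0.0, margin - s_pos + cosine(query, n)) for n in negatives)

# Hypothetical 2-D embeddings of an audio excerpt and two sheet snippets.
audio = [1.0, 0.0]
sheet_match = [0.8, 0.6]
sheet_other = [0.0, 1.0]

rank = pairwise_ranking_loss(audio, sheet_match, [sheet_other])
contrast_pos = contrastive_loss(audio, sheet_match, is_match=True)
contrast_neg = contrastive_loss(audio, sheet_other, is_match=False)
```

With these numbers the ranking loss is already zero, because the match outranks the negative by more than the margin, while the contrastive loss on the matching pair is still about 0.4. The ranking objective only cares about relative order within a set of candidates; the contrastive loss constrains absolute distances pair by pair.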
Wednesday, January 3, 2018
If I were to write this paper... Drum transcription CRNN
https://ismir2017.smcnus.org/wp-content/uploads/2017/10/123_Paper.pdf
(1) I would specify the dropout rate used for the BGRU layers; otherwise we cannot rule out that overfitting plays a part in the performance difference between the CBGRU and the other models.
(2) I would report the number of parameters of each model. A model with more parameters generally has more capacity, so the better performance of CBGRU-b over CNN-b could simply be attributed to its larger parameter count.
(3) CNN-b seems to perform really well. I would keep the Conv layers of the CNN-b model fixed and replace the Dense layers with GRU layers, to see whether the GRU really outperforms them.
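For points (2) and (3), a rough parameter count shows how quickly a recurrent layer outgrows a dense layer of the same width. This sketch assumes the textbook GRU with one bias vector per gate (some implementations keep two), and the layer sizes are hypothetical, not taken from the paper.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

def gru_params(n_in, n_hidden):
    """Update gate, reset gate and candidate state: each has input weights,
    recurrent weights and one bias vector."""
    return 3 * (n_in * n_hidden + n_hidden * n_hidden + n_hidden)

def bigru_params(n_in, n_hidden):
    """A bidirectional GRU is two independent GRUs (forward and backward)."""
    return 2 * gru_params(n_in, n_hidden)

# Hypothetical sizes: 256 input features, 64 units per layer.
n_in, n_units = 256, 64
counts = {
    "dense": dense_params(n_in, n_units),
    "gru": gru_params(n_in, n_units),
    "bigru": bigru_params(n_in, n_units),
}
```

Under these assumptions the bidirectional GRU layer has several times the parameters of the dense layer, so a comparison at equal width is not a comparison at equal capacity.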