Google+

Tuesday, February 20, 2018

Retrieve the final hidden states output of variable length sequences in Tensorflow

Assume you have an input batch containing variable-length sequences, with shape:

input: [batch_size, max_time, dim_feature]

and that you have also stored the length of each sequence in a vector, say sequence_length. You can then get the final state output with:

_, state = tf.nn.dynamic_rnn(some_RNN_cell, input, sequence_length=sequence_length)

then, for an LSTM cell, state is an LSTMStateTuple holding both the final hidden and cell states:

state.h: final hidden state, [batch_size, hidden_state_size]
state.c: final cell state, [batch_size, hidden_state_size]
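To see what sequence_length buys you, here is a NumPy sketch (the per-step RNN outputs are random placeholders, not a real RNN) of picking the final valid state per sequence rather than the padded last step:

```python
import numpy as np

batch_size, max_time, hidden_size = 3, 5, 4
rng = np.random.default_rng(0)

# Stand-in for the per-step RNN outputs: [batch_size, max_time, hidden_size]
outputs = rng.standard_normal((batch_size, max_time, hidden_size))
sequence_length = np.array([5, 3, 1])  # true length of each sequence

# The final state of each sequence is the output at its last valid step;
# this is what passing sequence_length to dynamic_rnn gives you, instead
# of the output at step max_time - 1 (which would be computed on padding).
final_states = outputs[np.arange(batch_size), sequence_length - 1]
```

Without sequence_length, you would have to do this gather yourself on the full outputs tensor.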

I give credit to these two sources:
https://danijar.com/variable-sequence-lengths-in-tensorflow/
https://github.com/shane-settle/neural-acoustic-word-embeddings/blob/4cc3878e6715860bcce202aea7c5a6b7284292a1/code/lstm.py#L25


Sunday, January 14, 2018

Sheet music and audio multimodal learning

https://arxiv.org/abs/1612.05050

Towards score following in sheet music: uses classification to find the note-head position in the sheet music. Given an audio spectrogram patch, classify which location bucket it belongs to.

https://arxiv.org/abs/1707.09887

Learning audio–sheet music correspondences for score identification and offline alignment: a pairwise ranking objective with a siamese network. How does this differ from a contrastive loss?
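One way to see the difference is to write both objectives down (a sketch in plain Python on scalar distances; the margin value of 1.0 is an arbitrary assumption, not taken from either paper). The contrastive (siamese) loss acts on one pair at a time, in absolute terms: pull matching pairs together, push non-matching pairs past a margin. The pairwise ranking loss compares a positive pair against a negative pair and only asks for a relative margin between the two distances:

```python
def contrastive_loss(d, is_positive, margin=1.0):
    """Siamese loss on a single pair at distance d."""
    if is_positive:
        return d ** 2                     # pull matching pairs together
    return max(0.0, margin - d) ** 2      # push non-matching pairs past the margin

def pairwise_ranking_loss(d_pos, d_neg, margin=1.0):
    """Hinge loss: only requires the positive pair to be closer than the negative."""
    return max(0.0, margin + d_pos - d_neg)

# Positive pair at distance 0.4, negative pair at distance 0.9:
# the contrastive loss is nearly satisfied pair by pair, but the
# ranking loss still penalizes because 0.9 - 0.4 < margin.
print(contrastive_loss(0.4, True))      # 0.16
print(pairwise_ranking_loss(0.4, 0.9))  # 0.5
```

So a ranking objective never cares about absolute distances, only about the ordering of positives before negatives.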

Wednesday, January 3, 2018

If I were to write this paper... Drum transcription CRNN

https://ismir2017.smcnus.org/wp-content/uploads/2017/10/123_Paper.pdf

(1) I would specify the dropout rate used for the BGRU layers; otherwise, the better performance of the CBGRU could simply be attributed to differences in overfitting.

(2) I would report the number of parameters in each model. A model with more parameters has more capacity, so the better performance of CBGRU-b over CNN-b could be attributed merely to its larger parameter count.

(3) The CNN-b seems to perform really well. I would keep the Conv layers of the CNN-b model fixed and swap its Dense layers for GRU layers, to see whether the GRU really does outperform.
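Point (2) is easy to act on: parameter counts for Conv and GRU layers can be computed by hand. A sketch with made-up layer sizes (chosen for illustration, not taken from the paper):

```python
def conv2d_params(k_h, k_w, in_ch, out_ch):
    # one k_h x k_w x in_ch kernel per output channel, plus a bias each
    return k_h * k_w * in_ch * out_ch + out_ch

def gru_params(input_dim, hidden):
    # 3 gates (update, reset, candidate), each with input weights,
    # recurrent weights, and a bias vector
    return 3 * (input_dim * hidden + hidden * hidden + hidden)

# Hypothetical sizes, just to show how quickly a GRU layer dominates:
print(conv2d_params(3, 3, 32, 64))  # 18496
print(gru_params(256, 128))         # 147840
```

A bidirectional GRU would double the recurrent count again, which is exactly why reporting these numbers matters when comparing a CNN against a CBGRU.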

Wednesday, December 27, 2017

Capacity and trainability of different RNNs

In the paper "Capacity and trainability in RNNs": https://arxiv.org/pdf/1611.09913.pdf

The authors claim that all common RNNs have similar capacity. The vanilla RNN is very hard to train. For tasks that are hard to learn, one should choose a gated architecture: the GRU is the most learnable for shallow networks, while the +RNN (Intersection RNN) performs best for deep networks. The LSTM is extremely reliable but does not perform the best. If the training environment is uncertain, the authors suggest using the GRU or +RNN.

Another paper, "On the state of the art of evaluation in neural language models" (https://arxiv.org/pdf/1707.05589.pdf), found that the standard LSTM performs best among three architectures (LSTM, Recurrent Highway Networks, and Neural Architecture Search). The models are trained with a modified Adam optimizer, and hyperparameters (learning rate, input embedding ratio, input dropout, output dropout, weight decay) are tuned by batched GP bandits.

It is also shown, in the Penn Treebank experiment, that for the recurrent state variational dropout helps, while recurrent dropout shows no advantage.
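The key property of variational dropout can be shown with masks alone (a NumPy sketch of how the masks are sampled, not a full RNN; note that "recurrent dropout" in the paper instead drops part of the recurrent state update): variational dropout samples one mask per sequence and reuses it at every time step, whereas naive dropout resamples the mask at each step.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, steps, keep_prob = 6, 4, 0.5

# Variational dropout: one mask, shared across all time steps of a sequence
var_mask = (rng.random(hidden) < keep_prob).astype(float)
variational = np.stack([var_mask] * steps)   # [steps, hidden], identical rows

# Naive dropout: a fresh mask at every time step
per_step = (rng.random((steps, hidden)) < keep_prob).astype(float)
```

Sharing the mask means the same hidden units are dropped for the whole sequence, which is what makes the method a proper approximation to Bayesian inference in the original variational-dropout paper.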

Sunday, December 24, 2017

Deep learning practice and trends, some key points

I went through the first part of the tutorial, "Practice". Below are some key points from Oriol's talk:

CNN:

(1) Slide 7, "Deep learning: zooming in", is amazing! It lists the building blocks of deep learning models and sorts them into categories: non-linearities, optimizers, connectivity patterns, losses, and hyper-parameters.

(2) Slide 21, showing the convolution animation, is great: very intuitive for understanding the convolution mechanism.

(3) Slide 27, building very deep ConvNets: a deeper architecture with small 3×3 filters yields a large receptive field with fewer parameters than using large filters.

(4) Slide 35, U-Net: for image segmentation, a bottleneck encoder-decoder with skip connections.
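The claim in point (3) can be checked with simple arithmetic: a stack of n 3×3 conv layers has a receptive field of 2n+1, with fewer weights than a single filter of the same receptive field (a sketch assuming C input and C output channels everywhere, biases ignored):

```python
def stacked_3x3(n_layers, channels):
    receptive_field = 2 * n_layers + 1
    weights = n_layers * 3 * 3 * channels * channels
    return receptive_field, weights

def single_filter(k, channels):
    return k, k * k * channels * channels

# Two 3x3 layers vs one 5x5 filter: same 5x5 receptive field,
# 73728 vs 102400 weights at 64 channels.
print(stacked_3x3(2, 64))    # (5, 73728)
print(single_filter(5, 64))  # (5, 102400)
```

The stacked version also interposes extra non-linearities between the layers, which is a second advantage beyond the parameter savings.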

Seq2seq:

(1) Attention!

(2) Slide 62: tricks!

Video:

Slides: https://docs.google.com/presentation/d/e/2PACX-1vQMZsWfjjLLz_wi8iaMxHKawuTkdqeA3Gw00wy5dBHLhAkuLEvhB7k-4LcO5RQEVFzZXfS6ByABaRr4/pub?slide=id.g2a19ddb012_0_654

Monday, November 27, 2017

Formant and resonance in Western and Kunqu opera singing: literature

http://somaticvoicework.com/resonance-strategies-and-formant-tuning/
This blog explains that in classical singing one should tune, for example, the first formant to the fundamental frequency and the second formant to the second harmonic, and that timbre is shaped by the first five formants. But explaining this jargon to students is useless, because students don't know how to act on it. There are a great many variables involved in adjusting the voice, for example:

The formants align because of the pitch, the volume and the vowel sound, and the shape we make while singing one. There are multitudes of possibilities with vowel sound shapes and very small differences can make the sound “maximally efficient” or not quite “good enough”. The jaw, the tongue, the mouth/lips, the back of the mouth (velo-pharyngeal port), the height of the back of the tongue, the height of the larynx and the amount of open/closed quotient as well as the depth of the vocal folds during vibration all play a part in the overall sound we hear when someone sings. The “at rest” position of the length of the folds, the size of the larynx, the size (both diameter and length of the vocal tract) of the throat and mouth cavities, and the bones of the head and face all play a part as well. And “resonance” as a destination isn’t needed in anything but classical repertoire and some kinds of music that might be done acoustically.

The blog points out that the way to find a good timbre is still for the student to practice continually with a teacher, who corrects the student based on the auditory and visual information of the student's singing. In effect, the teacher is a function whose input is the audio and video of the student's singing and whose output is the way to correct and improve it.

https://www.ncbi.nlm.nih.gov/pubmed/23453594
Sundberg discusses the relationship between the position of the first formant and the fundamental pitch. He measured classically and non-classically trained singers singing several different vowels at different frequencies, and found no systematic shift of the first formant with the fundamental. Likewise, the higher formants showed no systematic change with the fundamental.

https://www.ncbi.nlm.nih.gov/pubmed/24902631
CONCLUSIONS: Formant tuning may be applied by a singer of the OM (old man) role, and both CF (color face) and OM role singers may use a rather pressed type of phonation, CF singers more than OM singers in the lower part of the pitch range. Most singers increased glottal adduction with rising F0.

https://www.ncbi.nlm.nih.gov/pubmed/24131362
The authors measured the long-term average spectrum (LTAS) of 10 Kunqu opera performers across 5 role types. No singer's formant was found, but the hualian (painted-face) role showed a speaker's formant around 3 kHz. The LTAS differs greatly both from ordinary speech and from Western classical singing.

Saturday, November 25, 2017

Optimizing DTW-based audio-to-MIDI alignment and matching, Colin Raffel paper

This paper introduces a method for optimizing various DTW parameters on a synthetic MIDI dataset. He optimizes the mean absolute alignment error with Bayesian optimization and the confidence score by exhaustive search.

Some interesting points in the paper:
(1) The best alignment systems don't use beat-synchronous features.

(2) He introduces two penalties: the first penalizes non-diagonal moves, and the second ensures the entire subsequence is used when doing subsequence alignment. The best systems use median values for both penalties.

(3) The synthetic MIDI corruptions include changing the tempo, cropping a MIDI segment, deleting the vocal track, changing instrument timbre, and changing velocities. All of this is done with pretty_midi.

(4) He evaluates the matching confidence score by computing the Kendall rank correlation between the score and the absolute alignment error, i.e., the error serves as the ground-truth matching confidence.

(5) All of the systems achieve the highest correlation when including the penalties in the score calculation, normalizing by the path length, and normalizing by the mean distance across the aligned portions.
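The first penalty from point (2) can be sketched as a plain additive cost phi charged on every horizontal or vertical step (an illustrative NumPy DTW, not Raffel's exact formulation):

```python
import numpy as np

def dtw_with_penalty(D, phi):
    """DTW over a pairwise distance matrix D, adding an additive
    penalty phi for each non-diagonal (horizontal/vertical) move."""
    n, m = D.shape
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i, j] = D[i - 1, j - 1] + min(
                cost[i - 1, j - 1],    # diagonal move, no penalty
                cost[i - 1, j] + phi,  # vertical move, penalized
                cost[i, j - 1] + phi,  # horizontal move, penalized
            )
    return cost[n, m]

rng = np.random.default_rng(0)
D = rng.random((8, 10))
# A larger penalty can only raise the total cost, pushing the
# optimal path toward the diagonal.
print(dtw_with_penalty(D, 1.0) >= dtw_with_penalty(D, 0.0))  # True
```

Point (5)'s normalizations would then divide this total by the path length and by the mean distance over the aligned region before using it as a confidence score.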