We are going to create a pkg installer that copies some files to the /Library directory. We will use the tool macos-installer-builder, which can be downloaded from https://github.com/KosalaHerath/macos-installer-builder
The usage of this tool is well explained on its GitHub page. The default build-macos-x64.sh script requires two parameters: the product name and the product version. We are going to modify this script to generate a pkg installer that copies the application files into a fixed directory:
1. copy the application's files into the ./application/ folder.
2. comment out the #Argument validation section, which validates the product name and product version parameters, since we are not going to use the input parameters.
3. change the PRODUCT and VERSION variables to fixed strings.
4. in the copyBuildDirectory function, change the three commands below #Copy cellery product to /Library/Cellery so that ${PRODUCT}/${VERSION} is replaced by the fixed target directory.
5. comment out the createUninstaller command, since we don't use it.
6. add commands to the postinstall script, such as changing permissions (see the sketch after this list).
Finally, run bash build-macos-x64.sh to generate the pkg installer.
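To make step 6 concrete, here is a minimal sketch of what the postinstall script could contain; the install path /Library/MyApp and the permission values are placeholders of mine, not part of macos-installer-builder:
#!/bin/bash
# hypothetical postinstall script; /Library/MyApp stands for the fixed target directory
INSTALL_DIR="/Library/MyApp"
# fix ownership and permissions of the installed files
chown -R root:wheel "${INSTALL_DIR}"
chmod -R 755 "${INSTALL_DIR}"
exit 0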
Rong GONG's music/speech processing blog
Saturday, November 16, 2019
Monday, October 28, 2019
Compile and load kext
The problem is that I use macOS 10.15, and its SDK is already installed in the folder:
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/
So if I run "ls" on this folder, it shows:
DriverKit19.0.sdk MacOSX.sdk MacOSX10.15.sdk
Now I need to copy the 10.14 SDK into this folder. The first thing I did was to "svn export" the 10.14 SDK from
https://github.com/alexey-lysiuk/macos-sdk/tree/master/MacOSX10.14.sdk
I used this command, replacing "tree/master" with "trunk":
svn export https://github.com/alexey-lysiuk/macos-sdk/trunk/MacOSX10.14.sdk
Then I "sudo cp -r" this folder to my SDKs folder. For compiling the project, I used:
xcodebuild -project "projectName.xcodeproj" -scheme "schemeName" -sdk macosx10.14 -derivedDataPath build clean build
Finally, I changed the ownership of the kext:
sudo chown -R root:wheel yourKext.kext
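With the ownership fixed, the kext can then be loaded; the commands below are a generic sketch (yourKext.kext is a placeholder), and on recent macOS versions the system may additionally ask you to approve the extension under Security & Privacy:
sudo kextload yourKext.kext      # load the kext
kextstat | grep yourKext         # check that it is loaded
sudo kextunload yourKext.kext    # unload it again when needed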
Labels: drivers, kernel extensions, Kext, MacOS
Location: Vienna, Austria
Wednesday, August 28, 2019
My understanding of Word Insertion Penalty
In Daniel Jurafsky and James H. Martin's book Speech and Language Processing, 2nd edition, page 317, the concept of the Word Insertion Penalty is introduced, but it is very confusing to me. The original description is:
(on average) the language model probability decreases (causing a larger penalty), the decoder will prefer fewer, longer words. If the language model probability increases (larger penalty), the decoder will prefer more shorter words.
Note that "larger penalty" appears twice, under two opposite conditions: when the language model probability decreases and when it increases. This is not logical, and I guess it is a typo in the book.
My understanding is:
If the language model probability does not carry enough weight (importance), the decoder will prefer more, shorter words, since the language model probabilities of all of the words would count for little compared to the acoustic probabilities. The language model scaling factor does reduce the weight of the language model, so the Word Insertion Penalty has to be introduced to avoid many insertions.
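To make this concrete, the decoding score is often written in the log domain with a language model scale factor (LMSF) and a per-word insertion penalty (WIP); the formula below is a common formulation, not the book's exact notation:
\hat{W} = \operatorname*{arg\,max}_{W} \left[ \log P(A \mid W) + \mathrm{LMSF} \cdot \log P(W) + N(W) \cdot \log \mathrm{WIP} \right]
where A is the acoustic observation sequence, W a candidate word sequence, and N(W) the number of words in W. With WIP < 1 we have \log \mathrm{WIP} < 0, so every additional word lowers the total score, which counteracts the tendency to insert many short words when the language model term carries too little weight.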
Tuesday, February 20, 2018
Retrieve the final hidden states output of variable length sequences in Tensorflow
Assume you have an input batch which contains variable-length sequences. The batch dimensions are:
input: [batch_size, max_time, dim_feature]
and you have also stored the length of each sequence in a vector, say sequence_length. Now you can easily get the final state by:
_, state = tf.nn.dynamic_rnn(some_RNN_cell, input, sequence_length=sequence_length)
then you can access both the hidden and cell state outputs:
state.h: hidden state output, [batch_size, hidden_states_size]
state.c: cell state output, [batch_size, hidden_states_size]
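A self-contained sketch (TensorFlow 1.x, where tf.nn.dynamic_rnn still exists; the hidden size and placeholder shapes below are my own assumptions):
import tensorflow as tf  # TensorFlow 1.x

max_time, dim_feature, hidden_size = 10, 8, 16

# inputs: [batch_size, max_time, dim_feature]; sequence_length: [batch_size]
inputs = tf.placeholder(tf.float32, [None, max_time, dim_feature])
sequence_length = tf.placeholder(tf.int32, [None])

cell = tf.nn.rnn_cell.LSTMCell(hidden_size)
# state holds the output at the last valid step of every sequence
_, state = tf.nn.dynamic_rnn(cell, inputs,
                             sequence_length=sequence_length,
                             dtype=tf.float32)

final_hidden = state.h  # [batch_size, hidden_size]
final_cell = state.c    # [batch_size, hidden_size]
Note that state.h and state.c exist here because the cell is an LSTM (its state is an LSTMStateTuple); a GRU cell would return a single state tensor instead.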
I give credit to these two sources:
https://danijar.com/variable-sequence-lengths-in-tensorflow/
https://github.com/shane-settle/neural-acoustic-word-embeddings/blob/4cc3878e6715860bcce202aea7c5a6b7284292a1/code/lstm.py#L25
Sunday, January 14, 2018
Sheet music and audio multimodal learning
https://arxiv.org/abs/1612.05050
Toward score following in sheet music: uses classification to find the note-head position in the sheet music; given an audio spectrogram patch, classify the location bucket.
https://arxiv.org/abs/1707.09887
Learning audio–sheet music correspondences for score identification and offline alignment: pairwise ranking objective vs. contrastive loss (siamese), what's the difference?
Wednesday, January 3, 2018
If I were to write this paper... Drum transcription CRNN
https://ismir2017.smcnus.org/wp-content/uploads/2017/10/123_Paper.pdf
(1) I would specify the dropout rate used for the BGRU layers; otherwise the better performance of the CBGRU might simply be attributed to reduced overfitting.
(2) I would report the number of parameters of the different models. A model with more parameters certainly has more capacity, so the better performance of CBGRU-b compared to CNN-b could be attributed to its larger parameter count.
(3) The CNN-b seems to perform really well. I would keep the Conv layers of the CNN-b model fixed and switch the Dense layers to GRU layers to see whether the GRU really outperforms them.
Wednesday, December 27, 2017
Capacity and trainability of different RNNs
In the paper "Capacity and trainability in RNNs": https://arxiv.org/pdf/1611.09913.pdf
The authors claim that all common RNNs have similar capacity. The vanilla RNN is very hard to train. If the task is hard to learn, one should choose a gated architecture: the GRU is the most learnable for shallow networks, while the +RNN (Intersection RNN) performs best for deep networks. Although the LSTM is extremely reliable, it does not perform the best. If the training environment is uncertain, the authors suggest using the GRU or the +RNN.
Another paper, "On the state of the art of evaluation in neural language models" (https://arxiv.org/pdf/1707.05589.pdf), found that the standard LSTM performs the best among three different architectures (LSTM, Recurrent Highway Networks, and Neural Architecture Search). The models are trained using a modified Adam optimizer, and the hyperparameters, including the learning rate, input embedding ratio, input dropout, output dropout, and weight decay, are tuned by batched GP bandits.
It is also shown that, in the Penn Treebank experiment, variational dropout on the recurrent state helps, while recurrent dropout shows no advantage.
Labels: Deep learning, GRU, LSTM, Recurrent neural networks, RNNs
Location: Beijing, China