Saturday, November 15, 2014

Melody extraction: Goto's and Salamon's methods

Anyone who works on melody extraction algorithms almost certainly knows Goto's method (PreFEst) and Salamon's method. I don't know exactly why the first one is so well known; perhaps it was the first melody extraction algorithm that performed well. Salamon's method (first published in 2011) has the best overall performance so far, and it is also easy to implement.

1. Salience function
Both methods are based on computing a salience function, but they define it differently.

1.1 Goto's F0 pdf
Goto first constructs two harmonic-structure tone models that share the same fundamental frequency (F0) but have different harmonic amplitudes $\mu_{1,2}$. Each tone model is a weighted Gaussian mixture in which every harmonic component is modeled by a Gaussian function. A second set of weight coefficients $w_{1,2}$ balances the overall amplitudes of the tone models in the mixture.

He then uses the EM algorithm to estimate the coefficients $\mu_{1,2}$ and $w_{1,2}$, assuming that each frame of the spectrogram is a probability density function (pdf) that can be generated by this mixture of tone models.
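To make the EM step more concrete, here is a minimal Python sketch of the idea. It is a simplification and not Goto's actual implementation: it uses a single tone model with fixed harmonic amplitudes and only estimates one weight per F0 candidate, whereas the method described above also re-estimates the amplitudes $\mu$ and handles two tone models. The function names, the cent-scaled frequency axis, and the Gaussian width of 17 cents are assumptions made for illustration.

```python
import math


def gaussian(x, mean, sigma):
    """Value of a normal density with the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def tone_model(x_cents, f0_cents, amps, sigma=17.0):
    """Harmonic tone model on a log-frequency axis (cents).

    amps[h-1] is the relative amplitude of the h-th harmonic; the h-th
    Gaussian is centered 1200*log2(h) cents above the F0.
    """
    return sum(
        c * gaussian(x_cents, f0_cents + 1200.0 * math.log2(h), sigma)
        for h, c in enumerate(amps, start=1)
    )


def em_f0_weights(grid, spectrum_pdf, f0_candidates, amps, n_iter=30):
    """Estimate one weight per F0 candidate with EM (amplitudes stay fixed).

    grid          -- frequencies in cents at which the spectrum is sampled
    spectrum_pdf  -- spectrum of one frame, normalized so that it sums to 1
    f0_candidates -- candidate F0s in cents
    """
    n = len(f0_candidates)
    weights = [1.0 / n] * n
    # Precompute p(x | F0) for every candidate and every grid point.
    models = [[tone_model(x, f0, amps) for x in grid] for f0 in f0_candidates]
    for _ in range(n_iter):
        new_weights = [0.0] * n
        for j, p_x in enumerate(spectrum_pdf):
            mix = sum(weights[k] * models[k][j] for k in range(n))
            if mix <= 0.0 or p_x <= 0.0:
                continue
            for k in range(n):
                # Responsibility of candidate k for grid point j (E-step),
                # accumulated with the observed spectral mass p_x (M-step).
                new_weights[k] += p_x * weights[k] * models[k][j] / mix
        total = sum(new_weights)
        weights = [w / total for w in new_weights]
    return weights
```

The returned weights play the role of the F0 pdf for one frame: the candidate whose tone model explains most of the spectrum ends up with the largest weight.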

1.2 Salamon's salience function
Salamon's salience function is based on the harmonic summation method. He assumes that the frequency $f$ of each peak detected in the spectrogram is the $h$-th harmonic partial of a fundamental frequency $f_0=f/h$. He then computes the distance between $f/h$ and each salience frequency bin $b$; the contribution of the peak to that bin depends on this distance and is weighted by the harmonic number $h$. A higher $h$ means a lower contribution from the peak at frequency $f$.

This calculation is based on Hermes's pitch detection theory (Measurement of pitch by subharmonic summation). If we treat the peak frequency $f$ as the $h$-th harmonic of the fundamental frequency $f/h$, its contribution to the detection of pitch $f/h$ is $\alpha^{h-1}$, which is a decreasing sequence for $0<\alpha<1$.
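The harmonic summation described above can be sketched in a few lines of Python. This is an illustrative sketch rather than Salamon's exact implementation: the bin layout (10-cent bins starting at 55 Hz), the number of harmonics, $\alpha = 0.8$, and the $\cos^2$ weighting within one semitone are assumptions chosen for the example, and the magnitude compression and thresholding steps are left out.

```python
import math

BIN_CENTS = 10.0   # width of one salience bin in cents (assumed)
F_MIN = 55.0       # frequency of bin 0 in Hz (assumed)
N_BINS = 600       # number of salience bins, about five octaves (assumed)


def hz_to_bin(freq_hz):
    """Map a frequency in Hz to a fractional salience bin index."""
    return 1200.0 * math.log2(freq_hz / F_MIN) / BIN_CENTS


def salience(peaks, n_harmonics=20, alpha=0.8):
    """Harmonic summation over spectral peaks of one frame.

    peaks -- list of (frequency_hz, magnitude) pairs
    Returns a list of N_BINS salience values.
    """
    sal = [0.0] * N_BINS
    for freq, mag in peaks:
        for h in range(1, n_harmonics + 1):
            # Treat the peak as the h-th harmonic of the candidate F0 = f / h.
            center = hz_to_bin(freq / h)
            lo = max(0, math.ceil(center - 10))
            hi = min(N_BINS - 1, math.floor(center + 10))
            for b in range(lo, hi + 1):
                # Distance between f/h and bin b, in semitones.
                delta = abs(center - b) * BIN_CENTS / 100.0
                if delta <= 1.0:
                    # Closer bins and lower harmonic numbers contribute more.
                    sal[b] += mag * math.cos(delta * math.pi / 2.0) ** 2 * alpha ** (h - 1)
    return sal
```

For a frame with peaks at 220, 440, 660 and 880 Hz, the contributions of all four peaks pile up at the bin of 220 Hz, while the sub-octave at 110 Hz only receives contributions damped by higher powers of $\alpha$.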

Below are the salience functions of the same audio clip:
Goto's F0 pdf (salience)
Salamon's salience function

2. F0 contour tracking
Because the melody line is less disturbed in Goto's salience function, relatively few peaks are detected in each frame, which results in fewer F0 candidate contours.

Although Salamon sets two magnitude thresholds to filter out excess peaks (one for each frame, another for the overall spectrogram), a lot of contours are still tracked. Apart from the inherent characteristics of his salience function, the large number of peaks and contours may be due to the small STFT hop size (which leads to more frames) and the tight pitch continuity limit (within 80 cents) in Salamon's implementation.
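The peak filtering and contour tracking discussed here can be illustrated with a simplified Python sketch. It is not Salamon's full procedure (which, among other things, allows short gaps inside a contour); it only shows the two thresholds and the 80-cent continuity limit at work. The function names and the threshold values of 0.9 are assumptions for illustration.

```python
import statistics


def filter_peaks(frames, frame_ratio=0.9, std_factor=0.9):
    """Drop weak salience peaks.

    frames -- one list of (pitch_cents, salience) peaks per frame
    First keep peaks close to their frame's maximum, then drop peaks that
    fall too far below the mean salience of all remaining peaks.
    """
    kept = []
    for peaks in frames:
        top = max((s for _, s in peaks), default=0.0)
        kept.append([(p, s) for p, s in peaks if s >= frame_ratio * top])
    all_sal = [s for peaks in kept for _, s in peaks]
    if len(all_sal) < 2:
        return kept
    floor = statistics.mean(all_sal) - std_factor * statistics.pstdev(all_sal)
    return [[(p, s) for p, s in peaks if s >= floor] for peaks in kept]


def track_contours(frames, max_jump_cents=80.0):
    """Greedy tracking; returns contours as lists of (frame, pitch_cents, salience)."""
    pool = [list(peaks) for peaks in frames]  # peaks not yet assigned to a contour
    contours = []
    while any(pool):
        # Seed a new contour with the most salient unassigned peak.
        t0 = max(
            (t for t in range(len(pool)) if pool[t]),
            key=lambda t: max(s for _, s in pool[t]),
        )
        seed = max(pool[t0], key=lambda ps: ps[1])
        pool[t0].remove(seed)
        contour = [(t0, seed[0], seed[1])]
        for step in (1, -1):  # track forward in time, then backward
            t, pitch = t0 + step, seed[0]
            while 0 <= t < len(pool):
                near = [ps for ps in pool[t] if abs(ps[0] - pitch) <= max_jump_cents]
                if not near:
                    break  # no peak within the continuity limit: contour ends
                nxt = min(near, key=lambda ps: abs(ps[0] - pitch))
                pool[t].remove(nxt)
                contour.append((t, nxt[0], nxt[1]))
                pitch = nxt[0]
                t += step
        contours.append(sorted(contour))
    return contours
```

With such a tight continuity limit, a pitch jump of more than 80 cents between neighboring frames ends the current contour and starts a new one, which is one reason many short contours appear.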
Goto's contour tracking

Salamon's contour tracking

3. Melody selection
The two methods have their own rules for selecting the melody from the F0 candidate contours. Goto chooses the contour with the highest total salience power as the melody. Salamon's rules are much more complicated: voicing detection, octave error elimination, and removal of melody outliers based on contour characteristics (pitch mean and standard deviation, salience mean and standard deviation, etc.) are all integrated into this final selection step.
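As a rough illustration of the two selection styles, the Python sketch below computes the contour characteristics mentioned above, picks the contour with the highest total salience (the Goto-style rule), and applies a simple salience-based voicing filter. The voicing rule and its parameter `nu=0.2` are simplified assumptions, not Salamon's complete set of rules, and the octave error and outlier steps are omitted.

```python
import statistics


def contour_features(contour):
    """Summarize a contour given as a list of (frame, pitch_cents, salience)."""
    pitches = [p for _, p, _ in contour]
    saliences = [s for _, _, s in contour]
    return {
        "pitch_mean": statistics.mean(pitches),
        "pitch_std": statistics.pstdev(pitches),
        "salience_mean": statistics.mean(saliences),
        "salience_std": statistics.pstdev(saliences),
        "salience_total": sum(saliences),
        "length": len(contour),
    }


def select_by_total_salience(contours):
    """Goto-style rule: the contour with the highest total salience wins."""
    return max(contours, key=lambda c: contour_features(c)["salience_total"])


def voicing_filter(contours, nu=0.2):
    """Keep contours whose mean salience is not too far below the average.

    The threshold is the mean of the contour salience means minus nu times
    their standard deviation (a simplified, assumed voicing rule).
    """
    means = [contour_features(c)["salience_mean"] for c in contours]
    if len(means) < 2:
        return list(contours)
    threshold = statistics.mean(means) - nu * statistics.pstdev(means)
    return [c for c, m in zip(contours, means) if m >= threshold]
```

The single-rule selection is trivial to implement, while the feature-based approach needs the statistics of every contour before any decision can be made.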
Goto's melody selection

Salamon's melody selection

4. Matlab Code
If you would like the Matlab code for these two methods, please contact me: leave a message below or send me an email. :=)


  1. Hi there,
    first of all, you deserve many kudos for all this great work!
    I am a newcomer in the field of audio information retrieval and would like to take a look at the Matlab code for this example.
    Thanks in advance,

  2. Respected sir,

    I am a second-year postgraduate student in the signal processing branch at the Indian Institute of Technology.

    As I am a novice who has just started working on a melody extraction project, I am currently simulating the melody extraction paper by J. Salamon.

    I have implemented it, but I am getting some bugs and the melody extraction does not work properly.

    Could I have the Matlab code for the same, if you don't mind?

    I have done it with the synchrosqueezing transform.

    Thanking you sir.

  3. Hi Rong Gong! You have done great work!
    I left you a message on Facebook. I am a music technology student at NYU, doing research on melody extraction using Salamon's method. Would you mind sending me the Matlab code?
    For more details, please take a look at my Facebook message. Thank you so much

  4. Hi Rong~ It is a wonderful post !!
    I am a student at Fudan University working on melody extraction, and I'm currently constructing a baseline system. But it seems that something is wrong and I can't get the expected results. At your convenience, would you please send me the code for reference?
    Thank you for your assistance :P~~

  5. Respected sir,

    I'm an undergraduate student from National Chiao Tung University in Taiwan. I have a project based on melody extraction. After reading a bunch of papers on melody extraction methods, I still couldn't figure out how to implement the work described in them. May I have the Matlab code, if you don't mind? It would be extremely helpful if you could send me the code.
    Thanks for your assistance!!

    1. Hi, if you really want to try the method in this post, I strongly suggest using the Essentia package. Its implementation replicates Justin's paper, and it is open source and written in C++.