Joan Serrà published first time his musical structure segmentation method in 2012 in article "Unsupervised Detection of Music Boundaries by Time Series Structure Features" where he dealt with the segment boundary detection. In 2014, he published another article "Unsupervised Music Structure Annotation by Time Series Structure Features and Segment Similarity" which presents his segment labelling method.
1. Method overview
The method contains three main steps:
1. Descriptor extraction
2. Segment boundaries detection
3. Segment labeling
The descriptor used in article is HPCPs (Harmonic Pitch Class Profile) which is the an extension of Chroma descriptor. For the latter, we use all the spectrum content filtered by Chroma filter bank. Whereas, we just pick the harmonic peaks in the spectrum to calculate HPCPs, which theoretically immune to the spectrum noise content assuming the accuracy of harmonic peak picking.
Considering the contribution of each harmonic to the pitch perception, Harmonic summation method is integrated in the calculation of HPCPs.
2. Segment boundaries detection
The so-called music descriptor time series is the feature vector (HPCPs) of STFT spectrum. I think his innovation is incorporating the delayed coordinates to the feature vector (concatenate the near past feature vectors with the current one) and this does improve enormously the segment labeling accuracy.
Then he calculated the recurrence plot of the dissimilarity matrix which I think is the most ingenious detail of this method. The recurrence plot highlights the mutual similar segments (the similarities of frame $i$ to $j$ and $j$ to $i$ should both be above a threshold) by filtering out the dissimilarities with a dynamic threshold. The result of this plot is similar to Goto's de-noised self-similarity matrix, but with much less computation time.
Recurrence plot |
Smoothed lag-time recurrence plot |
3. Segment labeling
This procedure is very clear in article. The enforcement of transitivity by matrix multiplication is one of my favorite tricks in this method. However, I don't understand why the $Q_{max}$ measure can stand for the global similarity. If someone could point it out I will be so grateful.
4. Matlab Code
Those who want the Matlab code of this method please leave a message below or send me an email. :)
Hello,
ReplyDeleteI need to matlab code where we can get structural parts of a song.