**Goto**'s RefraiD method is accurate for detecting the repetitive sections (verse, chorus etc.) of a song. But it's not a efficient method due to its tedious post-processing procedures, for example, line segments grouping. And it can't detect automatically segment boundaries or group similar segments.

**Joan Serrà**published first time his musical structure segmentation method in 2012 in article "Unsupervised Detection of Music Boundaries by Time Series Structure Features" where he dealt with the segment boundary detection. In 2014, he published another article "Unsupervised Music Structure Annotation by Time Series Structure Features and Segment Similarity" which presents his segment labelling method.

**1. Method overview**

The method contains three main steps:

1. Descriptor extraction

2. Segment boundaries detection

3. Segment labeling

The descriptor used in article is

**HPCPs**(Harmonic Pitch Class Profile) which is the an extension of

**Chroma**descriptor. For the latter, we use all the spectrum content filtered by Chroma filter bank. Whereas, we just pick the harmonic peaks in the spectrum to calculate HPCPs, which theoretically immune to the spectrum noise content assuming the accuracy of harmonic peak picking.

Considering the contribution of each harmonic to the pitch perception,

**Harmonic summation**method is integrated in the calculation of HPCPs.

**2. Segment boundaries detection**

The so-called

*music descriptor time series*is the feature vector (HPCPs) of STFT spectrum. I think his innovation is incorporating the delayed coordinates to the feature vector (concatenate the near past feature vectors with the current one) and this does improve enormously the segment labeling accuracy.

Then he calculated the

**recurrence plot**of the dissimilarity matrix which I think is the most ingenious detail of this method. The recurrence plot highlights the mutual similar segments (the similarities of frame $i$ to $j$ and $j$ to $i$ should both be above a threshold) by filtering out the dissimilarities with a dynamic threshold. The result of this plot is similar to Goto's de-noised self-similarity matrix, but with much less computation time.

Recurrence plot |

**lag-time form**and smoothed the latter with a gaussian kernel which is similar to the gaussian-tapered kernel used by

**J. Foote.**The difference here is that the convolution should be done on the whole matrix (plot) instead of only the diagonal.

Smoothed lag-time recurrence plot |

**3. Segment labeling**

This procedure is very clear in article. The enforcement of transitivity by matrix multiplication is one of my favorite tricks in this method. However, I don't understand why the $Q_{max}$ measure can stand for the global similarity. If someone could point it out I will be so grateful.

**4. Matlab Code**

Those who want the Matlab code of this method please leave a message below or send me an email. :)

## No comments:

## Post a Comment