Clustering songs using distances

Once the distances have been calculated (the output of foucluster.distance.distance_matrix), the distances between the songs are used as features for clustering.

Several methodologies from sklearn are imported:

Algorithms that require the number of clusters (KMeans, AgglomerativeClustering, SpectralClustering).
Algorithms that do not require the number of clusters (AffinityPropagation, MeanShift).

For the first type of algorithm:

foucluster.cluster.determinist_cluster(dist_df, method, n_clusters)

Clustering of the songs from the dataframe, indicating the number of clusters to use.

Parameters:
    dist_df (pandas.DataFrame) –
    method (str) – name of the sklearn.cluster algorithm: cluster.AgglomerativeClustering, cluster.SpectralClustering, or cluster.KMeans.
    n_clusters (int) –
Returns:	pandas.DataFrame with a column with clusters.
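
As a rough illustration of the idea (not foucluster's actual implementation), the sketch below treats each row of a small distance matrix as a feature vector and clusters it with a fixed number of clusters. The helper name cluster_with_k, the METHODS lookup table, its keys, and the "Cluster" column name are all assumptions made for this example.

```python
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans, SpectralClustering

# Hypothetical lookup table; foucluster's own dispatch and key names may differ.
METHODS = {
    "KMeans": KMeans,
    "AgglomerativeClustering": AgglomerativeClustering,
    "SpectralClustering": SpectralClustering,
}


def cluster_with_k(dist_df, method, n_clusters):
    """Illustrative stand-in, not foucluster.cluster.determinist_cluster."""
    model = METHODS[method](n_clusters=n_clusters)
    # Each row of the distance matrix is used as the feature vector of a song.
    labels = model.fit_predict(dist_df.to_numpy())
    out = dist_df.copy()
    out["Cluster"] = labels  # assumed column name
    return out


# Toy symmetric distance matrix: songs a and b are close, as are c and d.
songs = ["a", "b", "c", "d"]
dist_df = pd.DataFrame(
    [[0.0, 0.1, 0.9, 1.0],
     [0.1, 0.0, 1.0, 0.9],
     [0.9, 1.0, 0.0, 0.1],
     [1.0, 0.9, 0.1, 0.0]],
    index=songs, columns=songs,
)

result = cluster_with_k(dist_df, "AgglomerativeClustering", 2)
print(result["Cluster"])
```

In this toy case songs a and b end up in one cluster and c and d in the other; which integer label each group receives is arbitrary.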

For both types of algorithm:

foucluster.cluster.automatic_cluster(dist_df, method)

Parameters:
    dist_df (pandas.DataFrame) –
    method (str) – name of the sklearn.cluster algorithm: cluster.AffinityPropagation, cluster.MeanShift, cluster.AgglomerativeClustering, cluster.SpectralClustering, or cluster.KMeans.
Returns:	pandas.DataFrame with a column with clusters.
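
For algorithms that infer the number of clusters themselves, a minimal sketch might look like the following. It is not foucluster's code: the helper cluster_without_k, the AUTO_METHODS table, the "Cluster" column name, and the toy data are assumptions, and the MeanShift bandwidth is chosen by hand for this toy data only.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AffinityPropagation, MeanShift

# Hypothetical lookup table; foucluster's own dispatch and key names may differ.
AUTO_METHODS = {
    "AffinityPropagation": AffinityPropagation,
    "MeanShift": MeanShift,
}


def cluster_without_k(dist_df, method, **kwargs):
    """Illustrative stand-in, not foucluster.cluster.automatic_cluster."""
    model = AUTO_METHODS[method](**kwargs)
    out = dist_df.copy()
    # Rows of the distance matrix act as feature vectors; no n_clusters is given.
    out["Cluster"] = model.fit_predict(dist_df.to_numpy())
    return out


# Toy data: 20 "songs" drawn from two well-separated groups in a 2-D space.
rng = np.random.default_rng(0)
points = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 2)),
    rng.normal(5.0, 0.1, size=(10, 2)),
])
names = [f"song_{i}" for i in range(len(points))]

# Pairwise Euclidean distances, stored like a song-by-song distance matrix.
dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
dist_df = pd.DataFrame(dists, index=names, columns=names)

# The bandwidth is hand-picked for this toy example.
result = cluster_without_k(dist_df, "MeanShift", bandwidth=10.0)
print(result["Cluster"].nunique())
```

The algorithm is never told how many groups exist; the count follows from the data and the bandwidth.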

When an algorithm that needs the number of clusters, such as KMeans, is used with automatic_cluster, it calls jump_method to calculate the number of clusters.

foucluster.cluster.jump_method(dist_df, n_max=50)

Method based on information theory to determine the best number of clusters.

Parameters:
    dist_df (pandas.DataFrame) –
    n_max (int) – maximum number of clusters to test.
Returns:	optimal number of clusters
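
The general idea of a jump-style selection can be sketched as follows, based on the jump statistic of Sugar and James: fit KMeans for each candidate k, compute the per-dimension distortion, raise it to the power -p/2 (p being the number of features), and pick the k with the largest increase. This is a simplified sketch under those assumptions; the function name jump_sketch and its parameters are hypothetical, and foucluster's jump_method may differ in its details.

```python
import numpy as np
from sklearn.cluster import KMeans


def jump_sketch(X, n_max=10, random_state=0):
    """Simplified jump statistic; not foucluster.cluster.jump_method.

    Assumes n_max is smaller than the number of distinct samples, so the
    distortion never reaches zero.
    """
    n_samples, n_features = X.shape
    power = n_features / 2.0
    # transformed[0] stays 0 by convention, so k = 1 also has a defined jump.
    transformed = np.zeros(n_max + 1)
    for k in range(1, n_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        # Average squared distance to the nearest centre, per dimension.
        distortion = km.inertia_ / (n_samples * n_features)
        transformed[k] = distortion ** (-power)
    # jumps[k - 1] = transformed[k] - transformed[k - 1]
    jumps = np.diff(transformed)
    return int(np.argmax(jumps)) + 1


# Toy data: three well-separated groups in two dimensions.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([rng.normal(c, 0.1, size=(30, 2)) for c in centres])

best_k = jump_sketch(X, n_max=6)
print(best_k)
```

When the rows of a large distance matrix are used as features, p equals the number of songs, so the exponent -p/2 can overflow or underflow in floating point; a practical implementation would likely need to guard against that.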