Clustering songs using distances

Once the distance matrix has been computed (the output of foucluster.distance.distance_matrix), the distances between the songs are used as features for clustering.
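As a rough illustration of what using distances as features means, the sketch below builds a small symmetric distance table by hand. The song names and distance values are invented, and plain Python lists stand in for the pandas.DataFrame that the clustering functions expect. Each row is then the feature vector of one song.

```python
# Hypothetical pairwise distances between three songs (values are invented).
pairwise = {
    ("song_a", "song_b"): 0.2,
    ("song_a", "song_c"): 0.9,
    ("song_b", "song_c"): 0.8,
}
songs = sorted({name for pair in pairwise for name in pair})


def build_distance_rows(songs, pairwise):
    """Return {song: [distance to every song]}, symmetric with a zero diagonal."""
    rows = {}
    for a in songs:
        row = []
        for b in songs:
            if a == b:
                row.append(0.0)
            else:
                # Distances are symmetric, so look the pair up in either order.
                row.append(pairwise.get((a, b), pairwise.get((b, a))))
        rows[a] = row
    return rows


features = build_distance_rows(songs, pairwise)
for name, row in features.items():
    print(name, row)
```

Here song_a and song_b have similar rows, so a clustering algorithm working on these rows would tend to place them in the same group.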

Several clustering algorithms from sklearn are available, in two groups:

  • Algorithms that need the number of clusters to be given (KMeans,
    AgglomerativeClustering, SpectralClustering).
  • Algorithms that determine the number of clusters themselves
    (AffinityPropagation, MeanShift).

For the first type of algorithm, where the number of clusters is given:

foucluster.cluster.determinist_cluster(dist_df, method, n_clusters)[source]

Clustering of the songs from the dataframe, indicating the number of clusters to use.

Parameters:
  • dist_df (pandas.DataFrame) – distances between the songs, as output by foucluster.distance.distance_matrix.
  • method (str) –

    name of the sklearn.cluster algorithm to use:

    • cluster.AgglomerativeClustering.
    • cluster.SpectralClustering.
    • cluster.KMeans.
  • n_clusters (int) – number of clusters to use.
Returns:

pandas.DataFrame with a column with clusters.

For both types of algorithm:

foucluster.cluster.automatic_cluster(dist_df, method)[source]
Parameters:
  • dist_df (pandas.DataFrame) – distances between the songs, as output by foucluster.distance.distance_matrix.
  • method (str) –

    name of the sklearn.cluster algorithm to use:

    • cluster.AffinityPropagation.
    • cluster.MeanShift.
    • cluster.AgglomerativeClustering.
    • cluster.SpectralClustering.
    • cluster.KMeans.
Returns:

pandas.DataFrame with a column with clusters.

When an algorithm that needs the number of clusters, such as KMeans, is used with automatic_cluster, automatic_cluster calls jump_method to estimate the number of clusters.
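The dispatch described above can be sketched roughly as follows. This is not the library's code: the method strings are copied from the lists on this page and their exact format is an assumption, and plan_clustering and estimate_n_clusters are hypothetical names, with estimate_n_clusters standing in for jump_method.

```python
# Assumed method strings, copied from the lists on this page.
NEEDS_N_CLUSTERS = {
    "cluster.KMeans",
    "cluster.AgglomerativeClustering",
    "cluster.SpectralClustering",
}
FINDS_N_CLUSTERS = {"cluster.AffinityPropagation", "cluster.MeanShift"}


def plan_clustering(method, estimate_n_clusters):
    """Return the keyword arguments to pass to the chosen algorithm.

    `estimate_n_clusters` is a hypothetical callable standing in for
    jump_method; it is only called when the method needs a cluster count.
    """
    if method in NEEDS_N_CLUSTERS:
        return {"n_clusters": estimate_n_clusters()}
    if method in FINDS_N_CLUSTERS:
        return {}
    raise ValueError(f"unknown clustering method: {method!r}")


print(plan_clustering("cluster.KMeans", lambda: 4))
print(plan_clustering("cluster.MeanShift", lambda: 4))
```

Only the first call uses the estimator; MeanShift and AffinityPropagation are left to find the number of clusters on their own.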

foucluster.cluster.jump_method(dist_df, n_max=50)[source]

Method based on information theory to determine the best number of clusters.

Parameters:
  • dist_df (pandas.DataFrame) – distances between the songs, as output by foucluster.distance.distance_matrix.
  • n_max (int) – maximum number of clusters to test.
Returns:

optimal number of clusters
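The description is consistent with the jump method of Sugar and James, an information-theoretic approach; whether foucluster follows it exactly is an assumption. A minimal sketch of the selection step is given below. It takes already-computed distortions (average squared distance to the nearest cluster centre, per dimension) for k = 1, 2, ... and picks the k with the largest jump in the transformed distortion. The function name, the choice of power n_features / 2 and the example values are all illustrative.

```python
def jump_method_from_distortions(distortions, n_features):
    """Pick the number of clusters with the largest jump.

    distortions[k - 1] is the per-dimension distortion obtained with k
    clusters. Returns (best_k, jumps), where jumps[k - 1] belongs to k.
    """
    if not distortions:
        raise ValueError("at least one distortion value is required")
    if any(d <= 0 for d in distortions):
        raise ValueError("distortions must be strictly positive")

    # Transformation power; n_features / 2 is the usual choice.
    power = n_features / 2.0

    # Transformed distortions, with the convention that the value for
    # k = 0 is zero.
    transformed = [0.0] + [d ** (-power) for d in distortions]

    # Jump for each k: difference between consecutive transformed values.
    jumps = [transformed[k] - transformed[k - 1]
             for k in range(1, len(transformed))]

    best_k = max(range(len(jumps)), key=lambda i: jumps[i]) + 1
    return best_k, jumps


# Invented distortions for k = 1..5: the sharp drop at k = 3 gives the
# largest jump.
distortions = [10.0, 4.0, 0.5, 0.45, 0.4]
best_k, jumps = jump_method_from_distortions(distortions, n_features=2)
print(best_k)
```

In practice the distortions would come from fitting a clustering algorithm such as KMeans once for each k up to n_max.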