Legacy Product

Fusion 5.10

    2D K-Means Clustering (kmeans)

    The kmeans function performs 2D k-means clustering, which can be used to visualize patterns within 2D scatter plots. The function takes three parameters:

    1. The numeric field for the first dimension

    2. The numeric field for the second dimension

    3. K, the number of clusters

    Sample syntax

    select kmeans(petal_length_d, petal_width_d, 5) as cluster,
           petal_length_d,
           petal_width_d
    from iris
    limit 150

    Result set

    The result set contains a random sample of the records that match the WHERE clause. If no WHERE clause is included, the sample is drawn from all records. The LIMIT clause controls the size of the random sample; if no LIMIT is specified, the default sample size is 25,000.
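
    As an illustration of how the WHERE and LIMIT clauses shape the sample, the following sketch reuses the collection and fields from the sample syntax. The filter threshold (1.0), the cluster count (3), and the sample size (100) are arbitrary values chosen for this example, and the query assumes that a range predicate is supported on petal_length_d.

    ```sql
    -- Hypothetical variation on the sample syntax: cluster only the rows
    -- that match the WHERE clause, sampling at most 100 of them.
    select kmeans(petal_length_d, petal_width_d, 3) as cluster,
           petal_length_d,
           petal_width_d
    from iris
    where petal_length_d > 1.0  -- illustrative threshold, not from the source
    limit 100                   -- sample size; the default is 25,000 if omitted
    ```

    Because the rows are a random sample, repeated runs of a query like this may return different rows.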

    The kmeans function returns the cluster name of each row in the result set. The two fields used for clustering are also available in the result set.

    Sample result set in Apache Zeppelin


    Visualization

    Sample visualization of kmeans clusters in an Apache Zeppelin scatter plot.