pkmeans

Performs K-means clustering on a set of objects.

Synopsis

pkmeans attr_in attr_out k maxiter [col_in|-] [col_out|-]

Description

pkmeans classifies a given set of objects into k clusters according to their features. The object features are stored in the collection col_in as a set of vectors attr_in.1, attr_in.2, ..., attr_in.n.

K-means partitions a set of objects into k clusters with the following steps:

1. Place k points into the space represented by the objects that are being clustered. These points represent the initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the k centroids.
4. Repeat steps 2 and 3 until the centroids no longer move or maxiter iterations have been performed.

This produces a separation of the objects into groups from which the distance to be minimized can be calculated.
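
The four steps above can be sketched in C++ as follows. This is a minimal illustration, not the Pandore implementation: the function name kmeansSketch and the Point alias are hypothetical, the initial centroids are assumed to be the first k objects (the page does not say how they are placed), and an empty cluster is assumed to keep its previous centroid.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;

// Squared Euclidean distance. The square root is not needed to find
// the closest centroid, because it does not change the ordering.
static double squaredDistance(const Point &a, const Point &b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return sum;
}

// Hypothetical helper (not part of Pandore).
// Returns, for each object, the index (0 .. k-1) of its cluster.
std::vector<unsigned long> kmeansSketch(const std::vector<Point> &objects,
                                        std::size_t k, int maxiter) {
    assert(!objects.empty());
    assert(k >= 1 && k <= objects.size());
    const std::size_t n = objects.size();
    const std::size_t dim = objects[0].size();

    // Step 1: initial centroids (assumption: the first k objects).
    std::vector<Point> centroids(
        objects.begin(), objects.begin() + static_cast<std::ptrdiff_t>(k));
    std::vector<unsigned long> assignment(n, 0);

    for (int iter = 0; iter < maxiter; ++iter) {
        // Step 2: assign each object to the closest centroid.
        for (std::size_t i = 0; i < n; ++i) {
            double best = std::numeric_limits<double>::max();
            for (std::size_t j = 0; j < k; ++j) {
                const double dist = squaredDistance(objects[i], centroids[j]);
                if (dist < best) {
                    best = dist;
                    assignment[i] = static_cast<unsigned long>(j);
                }
            }
        }

        // Step 3: recompute each centroid as the mean of its members.
        std::vector<Point> sums(k, Point(dim, 0.0));
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < n; ++i) {
            ++counts[assignment[i]];
            for (std::size_t d = 0; d < dim; ++d) {
                sums[assignment[i]][d] += objects[i][d];
            }
        }
        bool moved = false;
        for (std::size_t j = 0; j < k; ++j) {
            if (counts[j] == 0) continue;  // empty cluster keeps its centroid
            for (std::size_t d = 0; d < dim; ++d) {
                const double mean = sums[j][d] / static_cast<double>(counts[j]);
                if (mean != centroids[j][d]) moved = true;
                centroids[j][d] = mean;
            }
        }

        // Step 4: stop as soon as no centroid has moved.
        if (!moved) break;
    }
    return assignment;
}
```

Calling kmeansSketch(objects, 2, 100) on four two-dimensional objects returns one cluster index per object, in the same order as the input. The maxiter bound plays the same role as the maxiter parameter of pkmeans: it stops the loop when the centroids keep moving.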

The distance between an object i and the cluster center C_j is the Euclidean distance:

    D_{ij} = \left[ \sum_{d=1}^{n} (x_{id} - c_{jd})^2 \right]^{1/2}

where x_{id} is feature d of object i, c_{jd} is feature d of the centroid C_j, and n is the number of features.
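
As an illustration of this formula, the distance can be computed as below. The function name euclideanDistance is hypothetical and is not part of the Pandore API. The two vectors hold the n features of one object and of one centroid.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Pandore): Euclidean distance between
// the feature vector x of an object and the feature vector c of a centroid.
double euclideanDistance(const std::vector<double> &x,
                         const std::vector<double> &c) {
    assert(x.size() == c.size());  // both must have the same n features
    double sum = 0.0;
    for (std::size_t d = 0; d < x.size(); ++d) {
        const double diff = x[d] - c[d];
        sum += diff * diff;  // accumulates the squared differences
    }
    return std::sqrt(sum);  // the power 1/2 of the formula
}
```

For an object with features (0, 0) and a centroid at (3, 4), the sum of squares is 25 and the distance is 5.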

Parameters

attr_in is the base name of the feature vectors. The vectors are named attr_in.1, attr_in.2, ..., attr_in.n in the input collection. Item j of the array attr_in.i contains the i-th feature of the (j+1)-th object. They are Double arrays.
attr_out is the name of the output array. It is an array of unsigned longs where attr_out[i] is the number of the cluster to which object i is assigned.
k is the desired number of clusters.
maxiter is the maximum number of iterations (in case of divergence).
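
The input layout is feature-major: there is one array per feature, indexed by object. The sketch below shows how such arrays map to one feature vector per object. The function name toObjectVectors is hypothetical and plain std::vector values stand in for the collection arrays, so this is only an illustration of the indexing, not of the Pandore API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Pandore).
// features[f] stands in for the array attr_in.(f+1): one value per object.
// The result holds one vector per object, with one entry per feature.
std::vector<std::vector<double>> toObjectVectors(
    const std::vector<std::vector<double>> &features) {
    assert(!features.empty());
    const std::size_t nFeatures = features.size();
    const std::size_t nObjects = features[0].size();

    std::vector<std::vector<double>> objects(
        nObjects, std::vector<double>(nFeatures, 0.0));
    for (std::size_t f = 0; f < nFeatures; ++f) {
        // Every feature array must describe the same number of objects.
        assert(features[f].size() == nObjects);
        for (std::size_t j = 0; j < nObjects; ++j) {
            objects[j][f] = features[f][j];
        }
    }
    return objects;
}
```

With two feature arrays of three items each, the result contains three objects of two features each. The output array attr_out then holds one cluster number per object, in the same object order.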

Inputs

col_in: a collection which contains the object features.

Outputs

col_out: a collection which contains the assignment vector (object -> cluster).

Result

Returns SUCCESS or FAILURE.

Examples

Segments the tangram.pan image using K-means clustering of the pixels based on mean and variance features:

   pmeanfiltering 1 tangram.pan moy.pan
   pvariancefiltering 0 255 tangram.pan var.pan

   pim2array data.1 moy.pan data1.colc
   pim2array data.2 var.pan data2.colc
   parray2array data.1 Float data1.colc data1.cold
   parray2array data.2 Float data2.colc data2.cold
   pcolcatenateitem data1.cold data2.cold data3.cold
   parraysnorm data data3.cold data3.cold

   pkmeans data attrib 5 100 data3.cold cluster.cold

   pproperty 0 tangram.pan
   w=`pstatus`
   pproperty 1 tangram.pan
   h=`pstatus`

   parray2im $h $w 0 attrib cluster.cold kmeans.pan
   pim2rg kmeans.pan classif1_out.pan

C++ prototype

Errc PKmeans( const std::string &a_in, const Collection &c_in, const std::string &a_out, Collection &c_out, int k, int max );

Author: Alexandre Duret-Lutz