pknn

Performs k-nearest-neighbors classification on a set of objects.

Synopsis

pknn attr_base attr_in attr_out k [col_base|-] [col_in|-] [col_out|-]

Description

pknn classifies a set of objects from a set of already classified objects (the base), using the k-nearest-neighbors rule. For each query object, the classifier computes the distance to every base object and keeps the K nearest ones. The predicted class of the query object is the one held by the simple majority of these K nearest neighbors.

The distance between two objects x_i and x_j is the Euclidean distance:

    D_ij = [ SUM_{d=1..n} (x_id - x_jd)² ]^(1/2)

where x_id is feature d of object i, x_jd is feature d of object j, and n is the number of features.
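
The rule described above can be sketched in standalone C++ as follows. This is only an illustration, not the Pandore implementation: the function names euclidean and knn_classify are hypothetical, and the tie-breaking rule (smallest class number wins) is an assumption, since the text above does not say how ties are resolved.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Euclidean distance between two feature vectors of equal length n.
double euclidean(const std::vector<double> &a, const std::vector<double> &b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Hypothetical helper: predicts the class of `query` by simple majority
// among its k nearest base objects.
// Assumption: on a tie, the smallest class number wins.
unsigned long knn_classify(const std::vector<std::vector<double>> &base,
                           const std::vector<unsigned long> &labels,
                           const std::vector<double> &query,
                           std::size_t k) {
    // Distance from the query to every base object, paired with its class.
    std::vector<std::pair<double, unsigned long>> dist;
    dist.reserve(base.size());
    for (std::size_t i = 0; i < base.size(); ++i)
        dist.emplace_back(euclidean(base[i], query), labels[i]);

    // Bring the k smallest distances to the front.
    k = std::min(k, dist.size());
    std::partial_sort(dist.begin(),
                      dist.begin() + static_cast<std::ptrdiff_t>(k),
                      dist.end());

    // Count the votes of the k nearest neighbors.
    std::map<unsigned long, std::size_t> votes;
    for (std::size_t i = 0; i < k; ++i)
        ++votes[dist[i].second];

    // std::map iterates in ascending key order, so the strict comparison
    // keeps the smallest class number when counts are equal.
    unsigned long best = 0;
    std::size_t best_count = 0;
    for (const auto &v : votes) {
        if (v.second > best_count) {
            best = v.first;
            best_count = v.second;
        }
    }
    return best;
}
```

With base objects (0,0) and (0,1) in class 1 and (5,5), (6,5), (5,6) in class 2, calling knn_classify with the query (0.2, 0.1) and k = 3 picks two neighbors of class 1 and one of class 2, so the query is assigned to class 1.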

Parameters

attr_base is the base name of the feature vectors of the classified objects. The vectors are named attr_base.1, attr_base.2, ..., attr_base.n in the input collection. The item j of the array attr_base.i contains the (i)th feature of the (j+1)th object. They are Double arrays. If the array attr_base.C is present, it contains the class number of each object. Otherwise, the ith object falls into class i.
attr_in is the base name of the feature vectors of the objects to be classified. The vectors are named attr_in.1, attr_in.2, ..., attr_in.n in the input collection. The item j of the array attr_in.i contains the (i)th feature of the (j+1)th object. They are Double arrays.
attr_out is the name of the output array. It is an array of unsigned longs where attr_out[i] is the number of the class to which the (i)th object is assigned.
k is the number of nearest neighbors used for the majority vote.
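
The feature arrays are stored per feature (one array per attr.i, one item per object), whereas a distance is computed per object. The standalone C++ sketch below shows one way to regroup such arrays into one feature vector per object. The helper to_objects is hypothetical and not part of Pandore, and it assumes that all feature arrays have the same length.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Pandore function): features[i] plays the role
// of the array attr.(i+1), and features[i][j] is that feature for the
// (j+1)th object. Returns one feature vector per object.
// Assumption: every feature array has the same number of items.
std::vector<std::vector<double>> to_objects(
        const std::vector<std::vector<double>> &features) {
    const std::size_t n = features.size();                     // number of features
    const std::size_t count = n ? features[0].size() : 0;      // number of objects
    std::vector<std::vector<double>> objects(count, std::vector<double>(n));
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < count; ++j)
            objects[j][i] = features[i][j];
    return objects;
}
```

For two feature arrays {1, 2, 3} and {10, 20, 30}, the result holds three objects: (1, 10), (2, 20) and (3, 30).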

Inputs

col_base: a collection which contains the feature vectors of the classified objects.
col_in: a collection which contains the feature vectors of the objects to be classified.

Outputs

col_out: a collection which contains the output array attr_out.

Result

Returns SUCCESS or FAILURE.

Examples

Classifies the beans in the jellybeans.pan image from a sample of each bean stored in the directory 'base' (Unix version).

# Learning
   classes=1;
   for i in base/*.pan
   do
      pim2array ind $i /tmp/tmp1 
      parraysize ind.1 /tmp/tmp1
      size=`pstatus`
      pcreatearray ind.C Ushort $size $classes | pcolcatenateitem /tmp/tmp1 - i-01.pan
      if [ -f base.pan ]
      then pcolcatenateitem i-01.pan base.pan base.pan
      else cp i-01.pan base.pan
      fi
      classes=`expr $classes + 1`
   done 

# Classification
   pproperty 0 jellybeans.pan
   ncol=`pstatus`
   pproperty 1 jellybeans.pan
   nrow=`pstatus`

   pim2array ind jellybeans.pan | pknn ind ind ind 10 base.pan - | parray2im $ncol $nrow 0 ind | pim2rg - out.pan

C++ prototype

Errc PKnn(const std::string &a_base, const Collection &c_base, const std::string &a_in, const Collection &c_in, const std::string &a_out, Collection &c_out, int K);

Author: Alexandre Duret-Lutz