Racket Machine Learning — K-Nearest Neighbors
1.0

Simon Johnston

This package provides an implementation of the k-Nearest Neighbors (k-NN) algorithm for classification. It provides both a straightforward classifier function, which takes a data set and an individual and returns the list of predicted classifier values for that individual, and a function that returns the nearest neighbors the classifier bases its prediction on.

The classifier function provided by this module can be used by the higher-order classification functions classify, cross-classify, and partitioned-test-classify provided by the package rml-core.

For more information on the k-NN algorithm, see the Wikipedia and Scholarpedia articles on the topic.

1 Module rml-knn/classifier

 (require rml-knn/classifier) package: rml-knn

This module contains the procedures that implement the k-NN classifier itself. The classifier function returned from make-knn-classifier will in turn provide a list of classifier values predicted for an individual. Alternatively, the nearest-k function will provide the set of closest neighbors that the classifier uses.

Examples:
> (require rml/data rml/individual rml-knn/classifier)
> (define iris-data
    (load-data-set (path->string (collection-file-path
                                   "test/iris_training_data.csv" "rml"))
                   'csv
                   (list
                     (make-feature "sepal-length" #:index 0)
                     (make-feature "sepal-width" #:index 1)
                     (make-feature "petal-length" #:index 2)
                     (make-feature "petal-width" #:index 3)
                     (make-classifier "classification" #:index 4))))
> (define an-iris
    (make-individual #:data-set iris-data
                     "sepal-length" 6.3
                     "sepal-width" 2.5
                     "petal-length" 4.9
                     "petal-width" 1.5
                     "classification" "Iris-versicolor"))
> (define classify (make-knn-classifier 5))
> (classify iris-data default-partition an-iris)

'("Iris-virginica")

The code block above demonstrates the classifier by constructing an individual and classifying it against the loaded data set. Note that in this example the classifier returned Iris-virginica, whereas the individual was labeled Iris-versicolor, so this particular prediction is a misclassification.

constructor

(make-knn-classifier k) → classifier/c

  k : exact-positive-integer?
This procedure will produce a classifier function that conforms to the classifier/c contract. The resulting function returns a list of classifier values predicted for the provided individual based on the k-nearest neighbors in dataset.
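
The voting step that turns a set of neighbors into a prediction can be sketched in plain Racket. This is a minimal illustration, not the rml-knn implementation: the helper name majority-vote is hypothetical, and returning every label tied for the highest count is an assumption about how ties might be handled.

```racket
#lang racket

;; Hypothetical helper, not part of rml-knn: given the class labels of the
;; k nearest neighbors, return the label(s) that occur most often.
;; Assumes `labels` is non-empty, which holds when k is a positive integer
;; and the data set has at least one row.
(define (majority-vote labels)
  ;; Count the occurrences of each label in an immutable hash.
  (define counts
    (for/fold ([acc (hash)]) ([label (in-list labels)])
      (hash-update acc label add1 0)))
  ;; The highest count reached by any label.
  (define top (apply max (hash-values counts)))
  ;; Keep every label that reaches the highest count (assumed tie handling),
  ;; sorted so the result order is deterministic.
  (sort (for/list ([(label n) (in-hash counts)] #:when (= n top)) label)
        string<?))

;; Three of the five neighbors are Iris-virginica, so that label wins.
(majority-vote '("Iris-virginica" "Iris-versicolor" "Iris-virginica"
                 "Iris-virginica" "Iris-versicolor"))
;; => '("Iris-virginica")
```

A larger k smooths out noisy neighbors but can pull in rows from other classes, so an odd k is a common choice for two-class problems to reduce ties.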

procedure

(nearest-k dataset partition individual k) → list?

  dataset : data-set?
  partition : exact-nonnegative-integer?
  individual : individual?
  k : exact-positive-integer?
This procedure will return the k nearest neighbors to the provided individual in dataset.
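
The neighbor search itself can be sketched without the rml data structures. The sketch below is an illustration under stated assumptions: rows are represented as plain pairs of a feature list and a label rather than as a data-set?, the names euclidean-distance and nearest-rows are hypothetical, and Euclidean distance is assumed because this documentation does not say which metric nearest-k uses.

```racket
#lang racket

;; Illustrative representation (an assumption, not rml's data-set type):
;; each row is a pair of a feature list (numbers) and a class label.

;; Euclidean distance between two equal-length lists of numbers.
(define (euclidean-distance xs ys)
  (sqrt (for/sum ([x (in-list xs)] [y (in-list ys)])
          (sqr (- x y)))))

;; Return the k rows closest to `query`, nearest first.
;; If there are fewer than k rows, all of them are returned.
(define (nearest-rows rows query k)
  (define sorted
    (sort rows < #:key (lambda (row) (euclidean-distance (car row) query))))
  (take sorted (min k (length sorted))))

;; Made-up sample rows, for illustration only.
(define rows
  (list (cons '(1.0 1.0) "a")
        (cons '(5.0 5.0) "b")
        (cons '(1.5 1.0) "a")
        (cons '(4.0 4.5) "b")))

;; Labels of the three rows nearest to the query point.
(map cdr (nearest-rows rows '(1.2 1.1) 3))
;; => '("a" "a" "b")
```

Sorting every row is the simplest approach and costs O(n log n) per query; it is adequate for small data sets such as the iris example.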

1.1 Data Transformations

transform

(fuzzify features) → data-set?

  features : (listof string?)
Attempts to improve the accuracy of classification by mapping feature values into fuzzy membership sets. This is sometimes known as Fuzzy k-Nearest Neighbors, or FkNN.

From Scholarpedia:

… is a transformation which exploits uncertainty in feature values in order to increase classification performance. Fuzzification replaces the original features by mapping original values of an input feature into 3 fuzzy sets representing linguistic membership functions in order to facilitate the semantic interpretation of each fuzzy set.
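
As a rough illustration of mapping one value into three fuzzy sets, the sketch below computes membership degrees for a single feature value. It is not the fuzzify implementation: the function names, the set names low, medium, and high, the triangular membership shapes, and the explicit lo and hi range arguments are all assumptions made for this example.

```racket
#lang racket

;; Hypothetical sketch, not the rml-knn implementation.

;; Restrict x to the interval [0, 1].
(define (clamp01 x) (max 0.0 (min 1.0 x)))

;; Map `value` to membership degrees in three fuzzy sets, given the feature's
;; range [lo, hi]. Assumes lo < hi. The set names and triangular shapes are
;; illustrative choices.
(define (fuzzy-memberships value lo hi)
  ;; Position of value within the range, scaled to [0, 1].
  (define t (clamp01 (/ (- value lo) (exact->inexact (- hi lo)))))
  (list (cons 'low    (clamp01 (- 1.0 (* 2.0 t))))      ; 1 at lo, 0 from the midpoint up
        (cons 'medium (- 1.0 (abs (- (* 2.0 t) 1.0))))  ; peaks at the midpoint
        (cons 'high   (clamp01 (- (* 2.0 t) 1.0)))))    ; 0 up to the midpoint, 1 at hi

;; A value one quarter of the way through the range [0, 4].
(fuzzy-memberships 1.0 0.0 4.0)
;; => '((low . 0.5) (medium . 0.5) (high . 0.0))
```

With these shapes the three degrees always sum to 1, so each original feature is replaced by three values that describe how strongly it belongs to each set.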