Classification OTB applications

A new classification framework has been made available for OTB applications since OTB 3.18. It is based on the Machine Learning framework of OpenCV.

Users of the former OTB classification framework, which was based exclusively on the SVM method (through the libSVM library), need to take the following modifications into account.

Why?

This brings the following benefits:

  • a generic classification application with selectable classifiers
  • additional independent classifiers
  • easier to add new classifiers
  • easier to maintain the existing classifiers

What?

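The main modification is the addition of new classifiers implemented within the OpenCV library. They are selected through the new -classifier parameter, whose available values are listed in the training section below.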

In order to handle these additional classifiers within the generic classification applications, the following applications were removed:

  • TrainSVMImagesClassifier, replaced by TrainImagesClassifier
  • ImageSVMClassifier, replaced by ImageClassifier
  • ValidateSVMImagesClassifier, NOT replaced because its use is redundant with the ComputeConfusionMatrix application
  • ValidateImagesClassifier, removed because its use is redundant with the ComputeConfusionMatrix application

How?

Here are examples of how to use the new applications:

Training with the former libSVM based OTB framework

Replace the old command line:

otbcli_TrainSVMImagesClassifier \
 -io.il input_training_multichannel_image_1.tif input_training_multichannel_image_2.tif input_training_multichannel_image_3.tif \
 -io.vd input_training_vector_data_1.shp input_training_vector_data_2.shp input_training_vector_data_3.shp \
 -io.imstat output_training_statistics_file.xml \
 -svm.opt true \
 -io.out output_SVM_model.svm \
 -rand 121212

by the equivalent new command line:

otbcli_TrainImagesClassifier \
 -io.il input_training_multichannel_image_1.tif input_training_multichannel_image_2.tif input_training_multichannel_image_3.tif \
 -io.vd input_training_vector_data_1.shp input_training_vector_data_2.shp input_training_vector_data_3.shp \
 -io.imstat output_training_statistics_file.xml \
 -classifier libsvm \
 -classifier.libsvm.opt true \
 -io.out output_SVM_model.svm \
 -rand 121212

The new -classifier parameter selects the classification method. Available values are libsvm, svm, boost, dt, gbt, ann, bayes, rf and knn; each method has its own additional parameters, as -classifier.libsvm.opt illustrates for libsvm.


Classification

Replace the old command line:

otbcli_ImageSVMClassifier \
 -in input_multichannel_image_to_classify.tif \
 -imstat input_training_statistics_file.xml \
 -svm input_SVM_model.svm \
 -out output_monochannel_classified_image.tif

by the equivalent new command line:

otbcli_ImageClassifier \
 -in input_multichannel_image_to_classify.tif \
 -imstat input_training_statistics_file.xml \
 -model input_SVM_model.svm \
 -out output_monochannel_classified_image.tif


Model validation

As explained before, the applications ValidateSVMImagesClassifier and ValidateImagesClassifier were both removed because they are redundant with the ComputeConfusionMatrix application, which is the recommended way to validate a model.

However, ValidateSVMImagesClassifier and ComputeConfusionMatrix are not strictly equivalent, since they do not expose the same parameters.

  • ValidateSVMImagesClassifier:
Parameters: 
       -il           <string list>    Input Multichannel Image List  (mandatory)
       -vd           <string list>    Vector Data List (mandatory)
       -imstat       <string>         XML image statistics file (optional, off by default)
       -elev.dem     <string>         DEM directory (optional, off by default)
       -elev.geoid   <string>         Geoid File (optional, off by default)
       -elev.default <float>          Default elevation (mandatory, default value is 0)
       -out          <string>         Output filename (optional, off by default)
       -svm          <string>         SVM validation filename (mandatory)
       -rand         <int32>          set user defined seed (optional, off by default)


Thus, the former ValidateSVMImagesClassifier computed an SVM classification from an input *.SVM model over samples randomly selected in the input images. The classification results were then compared to the actual class labels given by the input vector data files, in order to generate a global confusion matrix merging the performance of the *.SVM classifier over ALL the input images and validation vector data files. The output *.TXT file then contained several measurements computed from this confusion matrix (such as the precision, the recall and the Kappa index).
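
The measurements mentioned above can be illustrated with a minimal Python sketch. It is not the OTB implementation: it applies the usual textbook definitions of precision, recall and Cohen's Kappa, and it assumes that rows of the matrix are reference classes and columns are produced classes. The function name confusion_measurements and the sample values are hypothetical.

```python
def confusion_measurements(matrix):
    """Return per-class precision and recall, overall accuracy and Cohen's kappa.

    Assumption: matrix[i][j] counts the samples of reference class i
    that the classifier labelled as class j (square matrix).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]  # samples per reference class
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # per produced class
    diag = [matrix[k][k] for k in range(n)]  # correctly classified samples

    # A class with no produced (or no reference) samples gets 0.0 by convention here.
    precision = [diag[k] / col_sums[k] if col_sums[k] else 0.0 for k in range(n)]
    recall = [diag[k] / row_sums[k] if row_sums[k] else 0.0 for k in range(n)]

    observed = sum(diag) / total  # overall accuracy
    expected = sum(row_sums[k] * col_sums[k] for k in range(n)) / (total * total)
    kappa = (observed - expected) / (1.0 - expected) if expected != 1.0 else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "accuracy": observed,
        "kappa": kappa,
    }


# Hypothetical two-class confusion matrix, for illustration only.
measures = confusion_measurements([[40, 10], [5, 45]])
```

With the transposed convention (rows as produced classes), precision and recall swap roles while the accuracy and the Kappa index are unchanged.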


  • ComputeConfusionMatrix:
Parameters: 
       -in               <string>         Input Monochannel Labelled Image  (mandatory)
       -out              <string>         Matrix output (*.CSV file) (mandatory)
       -ref              <string>         Ground truth [raster/vector] (mandatory, default value is raster)
       -ref.raster.in    <string>         Input reference image (mandatory)
       -ref.vector.in    <string>         Input reference vector data (mandatory)
       -ref.vector.field <string>         Field name (optional, off by default, default value is Class)
       -nodatalabel      <int32>          Value for nodata pixels ignored by the validation process (optional, off by default, default value is 0)
       -ram              <int32>          Available RAM (Mb) (optional, on by default, default value is 1

Unlike the former ValidateSVMImagesClassifier application, the ComputeConfusionMatrix application directly computes the confusion matrix of a single labelled classification map against a validation/reference labelled raster image or vector data containing labelled polygons. The output is a *.CSV file representing the confusion matrix itself. The measurements computed from it are only written to the application log, for information.
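
As an illustration of the parameters listed above, the following Python sketch assembles the argument list for one ComputeConfusionMatrix run, in raster or vector reference mode. The executable name otbcli_ComputeConfusionMatrix is assumed by analogy with the otbcli_ commands shown earlier; the helper confusion_matrix_command and all file names are hypothetical.

```python
def confusion_matrix_command(classified_map, reference, out_csv,
                             ref_type="raster", vector_field=None):
    """Build the argument list for one ComputeConfusionMatrix run.

    Only flags from the parameter list above are used. The executable
    name is an assumption based on the otbcli_ naming of the other tools.
    """
    if ref_type not in ("raster", "vector"):
        raise ValueError("ref_type must be 'raster' or 'vector'")

    args = ["otbcli_ComputeConfusionMatrix",
            "-in", classified_map,
            "-out", out_csv,
            "-ref", ref_type,
            "-ref.%s.in" % ref_type, reference]

    # The field name only applies to vector ground truth.
    if ref_type == "vector" and vector_field is not None:
        args += ["-ref.vector.field", vector_field]
    return args


# Hypothetical file names, for illustration only.
raster_cmd = confusion_matrix_command("classified_map.tif", "ground_truth.tif",
                                      "confusion_matrix.csv")
vector_cmd = confusion_matrix_command("classified_map.tif", "ground_truth.shp",
                                      "confusion_matrix.csv",
                                      ref_type="vector", vector_field="Class")
```

The resulting list can be passed to subprocess.run(raster_cmd, check=True) on a machine where the OTB applications are installed.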


The two applications are therefore not equivalent, because their interfaces differ. To obtain results from the current ComputeConfusionMatrix application that are equivalent to those given by the former ValidateSVMImagesClassifier application, the following method can be used.

Consider N classification maps and their N corresponding reference raster/vector data (ground truth), from which we want to compute a global confusion matrix and then performance measurements such as the precision, the recall or the Kappa index:

  • Generate one *.CSV confusion matrix file for each of the N classification maps with the ComputeConfusionMatrix application
  • Merge these N *.CSV confusion matrix files into a global confusion matrix by summing their values element by element
  • Compute the measurements over this merged confusion matrix with the ConfusionMatrixMeasurements OTB filter

An example is available in the test otbConfusionMatrixConcatenateTest, in the file otbConfusionMatrixMeasurementsTest.cxx.
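
The merging step can also be sketched outside OTB. The following Python example is not the OTB test mentioned above: it only sums confusion matrices element by element. It assumes that every *.CSV file contains a matrix of the same size with classes in the same order, and that any line starting with # is a comment to be skipped; both points should be checked against the files actually produced. The helper names read_matrix and merge_matrices are hypothetical.

```python
import csv
import io


def read_matrix(stream):
    """Read a confusion matrix from CSV text, skipping blank and '#' lines."""
    rows = []
    for record in csv.reader(stream):
        if not record or record[0].lstrip().startswith("#"):
            continue  # assumed comment or empty line
        rows.append([int(float(cell)) for cell in record])
    return rows


def merge_matrices(matrices):
    """Sum confusion matrices element by element.

    Assumption: all matrices have the same shape and class ordering.
    """
    matrices = list(matrices)
    if not matrices:
        raise ValueError("at least one confusion matrix is required")
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    merged = [[0] * n_cols for _ in range(n_rows)]
    for matrix in matrices:
        if len(matrix) != n_rows or any(len(row) != n_cols for row in matrix):
            raise ValueError("all confusion matrices must have the same shape")
        for i, row in enumerate(matrix):
            for j, value in enumerate(row):
                merged[i][j] += value
    return merged


# In-memory stand-ins for two *.CSV files (hypothetical values).
csv_a = io.StringIO("#illustrative comment line\n40,10\n5,45\n")
csv_b = io.StringIO("10,0\n2,8\n")
merged = merge_matrices([read_matrix(csv_a), read_matrix(csv_b)])
```

With real files, each stream would come from open(path, newline=""). If the maps do not all contain the same classes, the matrices have to be aligned on a common label list before summing.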