Request for Changes-58: Sampling on multiple images



Status

  • Author: Guillaume Pasero
  • Additional Contributors (if different than authors)
  • Submitted on 05.09.2016
  • Proposed target release : 5.8
  • Adopted : +4 from Julien, Victor, Rémi, Guillaume
  • Merged : bf4b24dc5ed52f6c7dcf343c8bb26b186e2b9454

Summary

This RFC adds features to generate samples from several images and feed them into a classifier. See RFComments 20.

Rationale

The existing TrainImagesClassifier application can train on several images. The new sampling framework needs the same capability : preparing samples from multiple images and feeding them into a classifier.

Implementation details

Classes and files

M       Modules/Filtering/Statistics/src/otbSamplerBase.cxx
M       Modules/Learning/Sampling/include/otbSamplingRateCalculator.h
A       Modules/Learning/Sampling/include/otbSamplingRateCalculatorList.h
M       Modules/Learning/Sampling/src/CMakeLists.txt
M       Modules/Learning/Sampling/src/otbSamplingRateCalculator.cxx
A       Modules/Learning/Sampling/src/otbSamplingRateCalculatorList.cxx

The base sampler class otb::SamplerBase now clamps the requested number of samples to the total number available and issues a warning. This situation can arise when the user experiments with the sampling parameters. Issuing a warning and proceeding with corrected settings was preferred over crashing.
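The clamping behaviour can be illustrated with a minimal Python sketch. It is not the otb::SamplerBase code; the function name clamp_requested and the warning text are assumptions made for illustration only.

```python
import warnings


def clamp_requested(requested, total):
    """Return the number of samples to draw, never more than `total`.

    Hypothetical helper: it mirrors the behaviour described above
    (warn and continue), not the actual C++ implementation.
    """
    if requested > total:
        warnings.warn(
            f"Requested {requested} samples but only {total} are available; "
            f"using {total} instead."
        )
        return total
    return requested


print(clamp_requested(500, 120))  # clamped to the total, with a warning
print(clamp_requested(50, 120))   # left unchanged
```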

A new class has been added : otb::SamplingRateCalculatorList, which is an ObjectList of SamplingRateCalculator. It computes the sampling rates in each of its elements, based on :

  • the existing strategy : all / byClass / constant / smallest
  • a specific mode for multi-image :
    • proportional mode : the requested number of samples is divided proportionally among images.
    • equal mode : the requested number of samples is divided equally among images.
    • custom mode : the requested number of samples is split among images by the user.
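The three multi-image modes listed above can be sketched in Python for a single class. This is only an illustration under stated assumptions: the function name split_among_images and its arguments are hypothetical, and the results are kept as floats because the RFC does not say how the real class rounds them.

```python
def split_among_images(requested, totals, mode, custom=None):
    """Split `requested` samples of one class among several images.

    totals[i] is the number of available samples in image i.
    custom[i] is the user-supplied count for image i (custom mode only).
    All names here are hypothetical, chosen for illustration.
    """
    n_images = len(totals)
    if mode == "proportional":
        grand_total = sum(totals)
        if grand_total == 0:
            # No samples anywhere: nothing can be requested.
            return [0.0 for _ in totals]
        return [requested * t / grand_total for t in totals]
    if mode == "equal":
        return [requested / n_images for _ in totals]
    if mode == "custom":
        if custom is None or len(custom) != n_images:
            raise ValueError("custom mode needs one value per image")
        return list(custom)
    raise ValueError(f"unknown mode: {mode}")


totals = [300, 100]  # available samples in image 0 and image 1
print(split_among_images(200, totals, "proportional"))
print(split_among_images(200, totals, "equal"))
print(split_among_images(200, totals, "custom", custom=[120, 80]))
```

In the equal mode of this example, image 1 is asked for as many samples as it contains, so a slightly larger request would exceed its total; that is the kind of overflow the clamping in otb::SamplerBase is meant to absorb.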

The class otb::SamplingRateCalculator now has a static method to read the required sample numbers from a file. That method used to live in the application SampleSelection, but it belongs more naturally here, where several applications can share it.

Applications

M       Modules/Applications/AppClassification/app/CMakeLists.txt
A       Modules/Applications/AppClassification/app/otbMultiImageSamplingRate.cxx
M       Modules/Applications/AppClassification/app/otbSampleSelection.cxx
M       Modules/Applications/AppClassification/app/otbTrainVectorClassifier.cxx

The application MultiImageSamplingRate has been added to compute per-image sampling rates, based on the polygon statistics of each input image.

The application TrainVectorClassifier now supports several input files, both for training and for validation.

The application SampleSelection now signals overflows (required > total number). The function to read required samples from file has been moved to otb::SamplingRateCalculator.

The intended multi-image workflow is :

  1. Use PolygonClassStatistics on each image
  2. Use MultiImageSamplingRate to compute sampling rates for each image (input : the XML statistics file of each image)
  3. Use SampleSelection with each image (the computed rates are set via the parameter strategy.byclass.in)
  4. Use SampleExtraction for each image.
  5. Use TrainVectorClassifier with the list of sample files.


Tests

M       Modules/Learning/Sampling/test/CMakeLists.txt
A       Modules/Learning/Sampling/test/otbSamplingRateCalculatorListTest.cxx
M       Modules/Learning/Sampling/test/otbSamplingTestDriver.cxx
M       Modules/Applications/AppClassification/test/CMakeLists.txt

Tests have been added for the class otb::SamplingRateCalculatorList and for the application MultiImageSamplingRate.

Documentation

Documentation to be added in CookBook.

Additional notes

The behaviour of each mode is described here.

Ti( c ) and Ni( c ) refer respectively to the total number and the needed number of samples in image i for class c. Let's call L the total number of images.

  • Strategy = all
    • Same behaviour for all modes proportional, equal, custom : take all samples
  • Strategy = constant (let's call M the global number of samples per class required)
    • Mode = proportional : For each image i and each class c,
      • Ni( c ) = M * Ti( c ) / sum_k( Tk(c) )
    • Mode = equal : For each image i and each class c,
      • Ni( c ) = M / L
    • Mode = custom : For each image i and each class c,
      • Ni( c ) = Mi where Mi is the custom requested number of samples for image i
  • Strategy = byClass (let's call M(c) the global number of samples for class c)
    • Mode = proportional : For each image i and each class c
      • Ni( c ) = M(c) * Ti( c ) / sum_k( Tk(c) )
    • Mode = equal : For each image i and each class c,
      • Ni( c ) = M(c) / L
    • Mode = custom : For each image i and each class c,
      • Ni( c ) = Mi(c) where Mi(c) is the custom requested number of samples for image i and class c
  • Strategy = smallest class
    • Mode = proportional : the smallest class is computed globally, then this smallest size is used for the strategy constant+proportional
    • Mode = equal : the smallest class is computed globally, then this smallest size is used for the strategy constant+equal
    • Mode = custom : the smallest class is computed and used for each image separately
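As a worked example, the "smallest class" strategy can be sketched in Python. Two points are assumptions, not statements from the RFC: "computed globally" is read as taking the smallest per-class total summed over all images, and in custom mode each image uses the size of its own smallest class for every class. The function name smallest_class_counts is hypothetical, and no rounding is applied.

```python
def smallest_class_counts(totals, mode):
    """Compute Ni(c) for the 'smallest class' strategy.

    totals[i][c] is Ti(c), the number of samples of class c in image i.
    Returns one dict per image mapping each class to Ni(c).
    """
    if mode == "custom":
        # Assumed reading: each image uses its own smallest class size.
        return [{c: min(image.values()) for c in image} for image in totals]

    classes = sorted({c for image in totals for c in image})
    n_images = len(totals)
    # sum_k( Tk(c) ) for every class c
    global_totals = {c: sum(img.get(c, 0) for img in totals) for c in classes}
    # Assumed reading of "smallest class computed globally"
    m = min(global_totals.values())

    if mode == "equal":
        # Same as strategy constant + equal with M = m
        return [{c: m / n_images for c in classes} for _ in totals]
    if mode == "proportional":
        # Same as strategy constant + proportional with M = m
        return [
            {
                c: (m * img.get(c, 0) / global_totals[c]) if global_totals[c] else 0.0
                for c in classes
            }
            for img in totals
        ]
    raise ValueError(f"unknown mode: {mode}")


totals = [
    {"water": 300, "forest": 900},  # image 0
    {"water": 100, "forest": 300},  # image 1
]
for mode in ("proportional", "equal", "custom"):
    print(mode, smallest_class_counts(totals, mode))
```

With these figures the global smallest class is water (400 samples), so the proportional mode requests 300 samples per class in image 0 and 100 in image 1, while the equal mode requests 200 per class in both images.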