Request for Changes-58: Sampling on multiple images
Status
- Author: Guillaume Pasero
- Additional Contributors (if different than authors)
- Submitted on 05.09.2016
- Proposed target release : 5.8
- Adopted : +4 from Julien, Victor, Rémi, Guillaume
- Merged : bf4b24dc5ed52f6c7dcf343c8bb26b186e2b9454
Summary
This RFC adds the features needed to generate samples from several images and feed them into a classifier. See RFComments 20.
Rationale
The existing TrainImagesClassifier can train on several images. The new sampling framework needs the same capability : preparing samples from multiple images and feeding them into a classifier.
Implementation details
Classes and files
- M Modules/Filtering/Statistics/src/otbSamplerBase.cxx
- M Modules/Learning/Sampling/include/otbSamplingRateCalculator.h
- A Modules/Learning/Sampling/include/otbSamplingRateCalculatorList.h
- M Modules/Learning/Sampling/src/CMakeLists.txt
- M Modules/Learning/Sampling/src/otbSamplingRateCalculator.cxx
- A Modules/Learning/Sampling/src/otbSamplingRateCalculatorList.cxx
The base sampler class otb::SamplerBase has been modified to clamp the requested number of samples to the total number available, and to issue a warning when it does so. This situation may happen when the user experiments with the sampling parameters. Issuing a warning and proceeding with corrected settings was preferred over crashing.
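The clamping behaviour can be illustrated with a minimal Python sketch. The function name clamp_needed is hypothetical and is not part of the otb::SamplerBase API; it only shows the warn-and-continue choice described above.

```python
import warnings


def clamp_needed(needed: int, total: int) -> int:
    """Return the number of samples to draw, never more than `total`.

    Hypothetical helper: it mimics the described behaviour (warn, then
    continue with a corrected value) rather than the actual C++ code.
    """
    if needed > total:
        warnings.warn(
            f"requested {needed} samples but only {total} are available; "
            f"using {total} instead"
        )
        return total
    return needed
```

With this sketch, clamp_needed(150, 100) emits a warning and returns 100, while clamp_needed(50, 100) returns 50 silently.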
A new class has been added : otb::SamplingRateCalculatorList, which is an ObjectList of SamplingRateCalculator. It computes the sampling rates in each of its elements, based on :
- the existing strategy : all / byClass / constant / smallest
- a specific mode for multi-image :
  - proportional mode : the requested number of samples is divided proportionally among images.
  - equal mode : the requested number of samples is divided equally among images.
  - custom mode : the requested number of samples is split among images by the user.
The class otb::SamplingRateCalculator now has a static method to read the required sample numbers from a file. That method used to live in the application SampleSelection, but it makes more sense to put it here, where several applications can share it.
Applications
- M Modules/Applications/AppClassification/app/CMakeLists.txt
- A Modules/Applications/AppClassification/app/otbMultiImageSamplingRate.cxx
- M Modules/Applications/AppClassification/app/otbSampleSelection.cxx
- M Modules/Applications/AppClassification/app/otbTrainVectorClassifier.cxx
The application MultiImageSamplingRate has been added to compute per-image sampling rates from the polygon statistics of each input image.
The application TrainVectorClassifier now supports several input files, both for training and for validation.
The application SampleSelection now signals overflows (required number of samples greater than the total number available). The function that reads the required samples from a file has been moved to otb::SamplingRateCalculator.
The intended multi-image workflow is :
- Use PolygonClassStatistics on each image
- Use MultiImageSamplingRate to compute sampling rates for each image (input : the statistics of each image, as XML files)
- Use SampleSelection with each image. The computed rates are set via the parameter strategy.byclass.in.
- Use SampleExtraction for each image.
- Use TrainVectorClassifier with the list of sample files.
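The workflow above can be sketched as a small Python helper that only builds the command lines, without running anything. Everything beyond the application names and the strategy.byclass.in parameter is an assumption : the otbcli_ launcher names, the other option names (-in, -vec, -field, -il, -mim, -instats, -io.vd, ...), the rates_1.csv / rates_2.csv output naming and all file names are illustrative and should be checked against the documentation of the installed OTB version.

```python
from typing import List


def build_workflow(images: List[str], vectors: List[str], features: List[str],
                   field: str = "class", nb_per_class: int = 1000) -> List[List[str]]:
    """Return the command lines (as argument lists) of the multi-image workflow.

    Option names and file names are assumptions made for illustration.
    """
    n = len(images)
    stats = [f"stats_{i}.xml" for i in range(1, n + 1)]
    # Assumed naming of the per-image rate files written by MultiImageSamplingRate.
    rates = [f"rates_{i}.csv" for i in range(1, n + 1)]
    selected = [f"selected_{i}.sqlite" for i in range(1, n + 1)]
    samples = [f"samples_{i}.sqlite" for i in range(1, n + 1)]

    cmds = []
    # Step 1: polygon statistics for each image.
    for img, vec, st in zip(images, vectors, stats):
        cmds.append(["otbcli_PolygonClassStatistics", "-in", img, "-vec", vec,
                     "-field", field, "-out", st])
    # Step 2: per-image sampling rates from all statistics files.
    cmds.append(["otbcli_MultiImageSamplingRate", "-il", *stats,
                 "-out", "rates.csv", "-strategy", "constant",
                 "-strategy.constant.nb", str(nb_per_class),
                 "-mim", "proportional"])
    # Step 3: sample selection on each image, driven by the computed rates.
    for img, vec, st, rate, sel in zip(images, vectors, stats, rates, selected):
        cmds.append(["otbcli_SampleSelection", "-in", img, "-vec", vec,
                     "-instats", st, "-field", field, "-strategy", "byclass",
                     "-strategy.byclass.in", rate, "-out", sel])
    # Step 4: sample extraction on each image.
    for img, sel, out in zip(images, selected, samples):
        cmds.append(["otbcli_SampleExtraction", "-in", img, "-vec", sel,
                     "-field", field, "-out", out])
    # Step 5: training on the list of sample files.
    cmds.append(["otbcli_TrainVectorClassifier", "-io.vd", *samples,
                 "-cfield", field, "-feat", *features, "-io.out", "model.txt"])
    return cmds
```

Each returned list could be passed to subprocess.run once the option names have been verified. Building the commands separately from running them makes the sequence easy to inspect first.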
Tests
- M Modules/Learning/Sampling/test/CMakeLists.txt
- A Modules/Learning/Sampling/test/otbSamplingRateCalculatorListTest.cxx
- M Modules/Learning/Sampling/test/otbSamplingTestDriver.cxx
- M Modules/Applications/AppClassification/test/CMakeLists.txt
Tests have been added for the class otb::SamplingRateCalculatorList and for the application MultiImageSamplingRate.
Documentation
Documentation is to be added to the CookBook.
Additional notes
The behaviour of each strategy and mode is described here.
Ti(c) and Ni(c) refer respectively to the total number and the needed number of samples in image i for class c. Let's call L the total number of images.
- Strategy = all
  - Same behaviour for all modes (proportional, equal, custom) : take all samples
- Strategy = constant (let's call M the global number of samples per class required)
  - Mode = proportional : For each image i and each class c,
    - Ni(c) = M * Ti(c) / sum_k( Tk(c) )
  - Mode = equal : For each image i and each class c,
    - Ni(c) = M / L
  - Mode = custom : For each image i and each class c,
    - Ni(c) = Mi where Mi is the custom requested number of samples for image i
- Strategy = byClass (let's call M(c) the global number of samples for class c)
  - Mode = proportional : For each image i and each class c,
    - Ni(c) = M(c) * Ti(c) / sum_k( Tk(c) )
  - Mode = equal : For each image i and each class c,
    - Ni(c) = M(c) / L
  - Mode = custom : For each image i and each class c,
    - Ni(c) = Mi(c) where Mi(c) is the custom requested number of samples for image i and class c
- Strategy = smallest class
  - Mode = proportional : the smallest class is computed globally, then this smallest size is used for the strategy constant+proportional
  - Mode = equal : the smallest class is computed globally, then this smallest size is used for the strategy constant+equal
  - Mode = custom : the smallest class is computed and used for each image separately
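The rules above can be sketched in Python as follows. This is an illustration of the formulas, not the otb::SamplingRateCalculatorList implementation : the function name needed_samples and the shape of its arguments are hypothetical, results are kept as floats because the rounding rule is not stated here, and "smallest class computed globally" is assumed to mean the class with the smallest sum_k( Tk(c) ).

```python
from typing import Dict, List

Counts = Dict[str, float]  # class name -> number of samples


def needed_samples(totals: List[Counts], strategy: str, mode: str,
                   requested=None) -> List[Counts]:
    """Return one dict per image i mapping each class c to Ni(c).

    totals[i][c] is Ti(c). `requested` depends on strategy and mode:
      constant + proportional/equal : a number M
      constant + custom             : a list of numbers Mi
      byClass  + proportional/equal : a dict M(c)
      byClass  + custom             : a list of dicts Mi(c)
      all, smallest                 : ignored
    """
    nb_images = len(totals)  # L
    classes = sorted({c for t in totals for c in t})
    # sum_k( Tk(c) ) for every class c
    global_total = {c: sum(t.get(c, 0) for t in totals) for c in classes}

    if strategy == "all":
        # Take every available sample, whatever the mode.
        return [{c: float(t.get(c, 0)) for c in classes} for t in totals]

    if strategy == "smallest":
        if mode == "custom":
            # Smallest class computed separately inside each image
            # (assumption: only classes present in that image count).
            return [{c: float(min(t.values())) for c in t} for t in totals]
        # Otherwise fall back to constant with M = smallest global class size.
        requested = min(global_total.values())
        strategy = "constant"

    if mode == "custom":
        if strategy == "constant":
            return [{c: float(requested[i]) for c in classes}
                    for i in range(nb_images)]
        return [{c: float(requested[i].get(c, 0)) for c in classes}
                for i in range(nb_images)]

    # M(c): the same M for every class with constant, user-given with byClass.
    if strategy == "constant":
        per_class = {c: requested for c in classes}
    else:
        per_class = {c: requested.get(c, 0) for c in classes}

    result = []
    for t in totals:
        if mode == "proportional":
            result.append({
                c: (per_class[c] * t.get(c, 0) / global_total[c]
                    if global_total[c] else 0.0)
                for c in classes})
        elif mode == "equal":
            result.append({c: per_class[c] / nb_images for c in classes})
        else:
            raise ValueError(f"unknown mode: {mode}")
    return result
```

For instance, with two images whose totals are {water: 100, forest: 300} and {water: 100, forest: 500}, the constant strategy with M = 40 in proportional mode gives 20 water and 15 forest samples for the first image, and 20 water and 25 forest samples for the second. In equal mode, each image gets 40 / 2 = 20 samples per class.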