Request for Comments-20: New sampling module for the classification framework
Status
- Author: Victor Poughon
- Submitted on 2016/01/19
- Open for comments
Content
What changes will be made and why would they make a better Orfeo ToolBox?
This RFC introduces a new OTB module that provides a framework for selecting and extracting the samples used to train classification models. It continues the work done by Paul Gely: https://github.com/PaulGely/AppSampling
The objective is to develop filters and applications that are modular and reusable and that support several sampling strategies:
- Exhaustive
- Random
- Periodic
- Periodic with randomness
- Stratified
- (Possibly more)
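To make the difference between the first strategies concrete, here is a minimal sketch that selects candidate indices 0..n-1 under each rule. The function names, the `rate` and `period` parameters, and the use of std::mt19937 are illustrative assumptions, not the module's API. Stratified sampling would apply one of these selectors separately to each class, with a class-specific rate or period.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helpers: each returns the indices (0..n-1) of the kept candidates.

// Exhaustive: keep every candidate.
std::vector<std::size_t> SelectExhaustive(std::size_t n)
{
  std::vector<std::size_t> kept;
  kept.reserve(n);
  for (std::size_t i = 0; i < n; ++i)
    kept.push_back(i);
  return kept;
}

// Random: keep each candidate independently with probability `rate`.
std::vector<std::size_t> SelectRandom(std::size_t n, double rate, std::mt19937& rng)
{
  assert(rate >= 0.0 && rate <= 1.0);
  std::bernoulli_distribution keep(rate);
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < n; ++i)
    if (keep(rng))
      kept.push_back(i);
  return kept;
}

// Periodic: keep the first candidate of every window of `period` candidates.
std::vector<std::size_t> SelectPeriodic(std::size_t n, std::size_t period)
{
  assert(period > 0);
  std::vector<std::size_t> kept;
  for (std::size_t i = 0; i < n; i += period)
    kept.push_back(i);
  return kept;
}

// Periodic with randomness: keep one candidate per window of `period`,
// drawn at a random offset inside the window (the last window may be partial).
std::vector<std::size_t> SelectPeriodicRandom(std::size_t n, std::size_t period, std::mt19937& rng)
{
  assert(period > 0);
  std::vector<std::size_t> kept;
  for (std::size_t start = 0; start < n; start += period)
  {
    const std::size_t width = std::min(period, n - start);
    std::uniform_int_distribution<std::size_t> offset(0, width - 1);
    kept.push_back(start + offset(rng));
  }
  return kept;
}
```

For example, SelectPeriodic(10, 3) keeps indices 0, 3, 6 and 9, while SelectPeriodicRandom(10, 3, rng) keeps one index from each of the windows [0,3), [3,6), [6,9) and [9,10). Adding randomness inside each period avoids aliasing with regular spatial patterns while keeping the spatial spread of periodic sampling.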
MaskedIteratorDecorator
Decorates an existing iterator so that it keeps the same behavior but skips masked pixels. It was developed for use in PolygonClassStatisticsFilter but is reusable elsewhere.
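The decorator idea can be sketched with plain standard-library iterators. This is a hypothetical illustration: a real OTB implementation would wrap ITK image iterators, and only the GoToBegin / IsAtEnd / operator++ / Get method names are borrowed from ITK iterator conventions. The image and mask iterators advance in lockstep, and every position where the mask value is 0 is skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical masked iterator decorator: same interface as the wrapped
// iterator, but positions whose mask value is 0 are never visited.
template <typename TIter, typename TMaskIter>
class MaskedIteratorDecorator
{
public:
  MaskedIteratorDecorator(TIter begin, TIter end, TMaskIter maskBegin)
    : m_Begin(begin), m_End(end), m_It(begin), m_MaskBegin(maskBegin), m_MaskIt(maskBegin)
  {
    GoToBegin();
  }

  // Move to the first unmasked position (or to the end if there is none).
  void GoToBegin()
  {
    m_It = m_Begin;
    m_MaskIt = m_MaskBegin;
    SkipMasked();
  }

  bool IsAtEnd() const { return m_It == m_End; }

  // Advance to the next unmasked position.
  MaskedIteratorDecorator& operator++()
  {
    assert(!IsAtEnd());
    ++m_It;
    ++m_MaskIt;
    SkipMasked();
    return *this;
  }

  // Value of the wrapped iterator at the current (unmasked) position.
  auto Get() const { return *m_It; }

private:
  // Advance both iterators while the current mask value is 0.
  void SkipMasked()
  {
    while (m_It != m_End && *m_MaskIt == 0)
    {
      ++m_It;
      ++m_MaskIt;
    }
  }

  TIter m_Begin, m_End, m_It;
  TMaskIter m_MaskBegin, m_MaskIt;
};
```

Because the decorator exposes the same interface as the iterator it wraps, code written against that interface (such as a statistics accumulation loop) does not need to change to become mask-aware.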
PolygonClassStatisticsFilter
Input: Image metadata, shapefile, Mask (optional)
Output: Class Statistics (xml format)
This filter computes statistics over the labelled classes using a persistent filter. It only needs the image metadata, not the image content. An optional input to this filter is a mask: statistics are computed only where the mask is valid (non-zero), which makes it possible to handle no-data values or other masks.
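A minimal sketch of the core computation, counting pixels per class where the mask is valid, is shown below. It assumes, for simplicity, that the polygons have already been rasterized onto the image grid so that each pixel carries a class label (or -1 when no polygon covers it); the function names and the XML element names are hypothetical and do not describe the actual output schema.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: per-class pixel counts restricted to valid mask pixels.
// labels[i] is the class of the polygon covering pixel i, or -1 if none.
// An empty `mask` means "no mask": every pixel is valid.
std::map<int, std::size_t> CountClassPixels(const std::vector<int>& labels,
                                            const std::vector<unsigned char>& mask)
{
  assert(mask.empty() || mask.size() == labels.size());
  std::map<int, std::size_t> counts;
  for (std::size_t i = 0; i < labels.size(); ++i)
  {
    const bool valid = mask.empty() || mask[i] != 0;
    if (valid && labels[i] >= 0)
      ++counts[labels[i]];
  }
  return counts;
}

// Serialize the counts to a small XML document (element names are illustrative).
std::string ToXml(const std::map<int, std::size_t>& counts)
{
  std::ostringstream os;
  os << "<ClassStatistics>\n";
  for (const auto& kv : counts)
    os << "  <Class id=\"" << kv.first << "\" count=\"" << kv.second << "\"/>\n";
  os << "</ClassStatistics>\n";
  return os.str();
}
```

Counts are additive, which is what makes a persistent (streamed) filter a natural fit: each streamed region can produce its own map, and the maps are summed class by class once all regions have been processed.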
SampleSelectionFilter
Input: Image metadata, class statistics, sampling strategy parameters, shapefile
Output: Sample list (OGR GDAL format)
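As an illustration of how class statistics can drive selection, the sketch below derives a per-class sampling rate from the class counts. The balancing rule (the same number of samples per class, capped by the smallest class) is only one possible strategy chosen for this example, and the function name and parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical strategy: request `requestedPerClass` samples in every class,
// capped by the smallest class so that every class can satisfy the request.
// Returns, per class, the fraction of its pixels to select.
std::map<int, double> BalancedRates(const std::map<int, std::size_t>& classCounts,
                                    std::size_t requestedPerClass)
{
  assert(!classCounts.empty());
  std::size_t smallest = classCounts.begin()->second;
  for (const auto& kv : classCounts)
    smallest = std::min(smallest, kv.second);
  const std::size_t target = std::min(requestedPerClass, smallest);

  std::map<int, double> rates;
  for (const auto& kv : classCounts)
    rates[kv.first] = kv.second == 0 ? 0.0 : static_cast<double>(target) / kv.second;
  return rates;
}
```

With counts {1: 100, 2: 400, 3: 50} and a request of 80 samples per class, the target is capped at 50, giving rates of 0.5, 0.125 and 1.0. These rates could then be handed to a per-class random or periodic selector, which is one way to implement the stratified strategy.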
SampleExtractionFilter
Input: Sample list, Image
Output: Samples (libSVM format, possibly OGR)
The SampleExtractionFilter could open the input OGR sample list in update mode and add the pixel values to it as fields.
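For the libSVM output, each extracted sample becomes one text line of the form "<label> <index>:<value> ...", with 1-based feature indices. The sketch below formats one pixel this way, assuming one feature per image band; the function name is hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Sketch: format one extracted sample as a libSVM line,
// "<label> 1:<band 1 value> 2:<band 2 value> ...".
// libSVM feature indices are 1-based; this sketch writes every band, although
// the format also allows zero-valued features to be omitted.
std::string ToLibSvmLine(int label, const std::vector<double>& pixel)
{
  std::ostringstream os;
  os << label;
  for (std::size_t b = 0; b < pixel.size(); ++b)
    os << ' ' << (b + 1) << ':' << pixel[b];
  return os.str();
}
```

For instance, ToLibSvmLine(3, {12.5, 7, 0.25}) yields "3 1:12.5 2:7 3:0.25".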
Applications
New applications to be developed:
- PolygonClassStatistics: Exposes PolygonClassStatisticsFilter
- SampleSelection: Exposes SampleSelectionFilter
- SampleExtraction: Exposes SampleExtractionFilter
- ImageSampling: Exposes the complete sampling pipeline
- ConvertSampleFile: Converts a sample file to another format (OGR, libSVM, CSV).
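One of the conversions ConvertSampleFile would need, libSVM to CSV, can be sketched per line as follows. Because libSVM is a sparse format (omitted indices mean 0), the number of features must be known to produce fixed-width CSV rows. The function name, the column layout (label first) and the error handling are assumptions made for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: convert one libSVM line ("<label> <index>:<value> ...",
// 1-based indices, omitted indices mean 0) into a CSV row
// "label,f1,f2,...,fN" with exactly `nbFeatures` feature columns.
std::string LibSvmLineToCsv(const std::string& line, std::size_t nbFeatures)
{
  std::istringstream in(line);
  std::string label;
  if (!(in >> label))
    throw std::invalid_argument("empty libSVM line");

  std::vector<double> features(nbFeatures, 0.0);
  std::string token;
  while (in >> token)
  {
    const std::size_t colon = token.find(':');
    if (colon == std::string::npos)
      throw std::invalid_argument("malformed feature: " + token);
    const std::size_t index = std::stoul(token.substr(0, colon));
    if (index == 0 || index > nbFeatures)
      throw std::out_of_range("feature index out of range: " + token);
    features[index - 1] = std::stod(token.substr(colon + 1));
  }

  std::ostringstream out;
  out << label;
  for (double v : features)
    out << ',' << v;
  return out.str();
}
```

For example, LibSvmLineToCsv("3 1:12.5 3:0.25", 3) yields "3,12.5,0,0.25", with the omitted second feature written as 0.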
Existing classification applications:
- Retrofit to use the new sampling module, keep the same user interface.
Perspectives
This architecture should support extensions for object based sampling and distributed computing.
When will those changes be available (target release or date)?
Target release is 5.4.
Who will be developing the proposed changes?
TBD