Refactoring of the classification chain
Currently, the classification tools perform quite well in most cases with the default setup, but they lack fine tuning and flexibility, especially in the sample-related steps that precede the training algorithms.
These are some of the issues that could be taken into account:
Sample sources:
- images
- vector data (GIS files)
- CSV files
- other
Split sampling, sample normalisation, learning and validation (at the application level)
- needs the definition of a sample I/O format and drivers (a minimal sketch follows)
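As a rough illustration of what such a driver could look like, here is a minimal Python sketch. All names (`CSVSampleDriver`, `read`) are hypothetical, not an existing OTB API; the idea is only that every driver, whatever the source, exposes the same contract returning features and labels.

```python
# Minimal sketch of a sample I/O driver (hypothetical names, not OTB code).
# Drivers for images or vector files would expose the same read() contract.
import csv
import numpy as np

class CSVSampleDriver:
    """Reads samples from a CSV file: one row per sample, last column = label."""

    def __init__(self, path):
        self.path = path

    def read(self):
        # Assumes a purely numeric file with no header row.
        with open(self.path, newline="") as f:
            rows = list(csv.reader(f))
        data = np.asarray(rows, dtype=float)
        return data[:, :-1], data[:, -1].astype(int)  # features, labels
```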
Define sampling strategies
- raw vs balanced
- systematic (1/N) vs random vs stratified
- the current default is balanced random (illustrated below)
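To make the distinction concrete, here is an illustrative sketch of the selection schemes named above, in plain NumPy (function names are made up for the example; `rng` is a `numpy.random.Generator`):

```python
import numpy as np

def systematic(indices, n):
    """Systematic sampling: keep every n-th sample (1/N)."""
    return indices[::n]

def random_subset(indices, size, rng):
    """Random sampling: draw 'size' samples uniformly, without replacement."""
    return rng.choice(indices, size=size, replace=False)

def stratified(labels, fraction, rng):
    """Stratified: keep the same fraction of each class, preserving proportions."""
    picked = [rng.choice(np.flatnonzero(labels == c),
                         size=int(fraction * np.count_nonzero(labels == c)),
                         replace=False)
              for c in np.unique(labels)]
    return np.concatenate(picked)

def balanced_random(labels, per_class, rng):
    """Balanced random (current default): same number of samples per class."""
    picked = [rng.choice(np.flatnonzero(labels == c),
                         size=min(per_class, np.count_nonzero(labels == c)),
                         replace=False)
              for c in np.unique(labels)]
    return np.concatenate(picked)
```

Raw sampling would simply keep whatever class proportions the selection scheme produces, while the balanced variant caps every class at the same count.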
Allow for the generation of disjoint training and validation sample sets (a split sketch follows)
- amount of overlap between the sets (usually 0)
- ratio between training and validation samples
- the validation set may not need balancing even if the training one does
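A possible shape for such a split, with the ratio and the overlap as explicit parameters (illustrative sketch, hypothetical names):

```python
import numpy as np

def split_train_validation(n_samples, train_ratio, rng, overlap=0):
    """Shuffle sample indices, then split; 'overlap' samples are shared (usually 0)."""
    perm = rng.permutation(n_samples)
    n_train = int(round(train_ratio * n_samples))
    train = perm[:n_train]
    valid = perm[n_train - overlap:]  # overlap=0 gives fully disjoint sets
    return train, valid
```

With train_ratio = 0.75 and overlap = 0, this yields a disjoint 3:1 split; balancing could then be applied to the training indices only, leaving the validation set untouched.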
Define normalisation strategies (sketched below)
- centered-reduced (zero mean, unit variance)
- min-max
- other
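Both named strategies reduce to simple per-feature affine transforms; an illustrative sketch (not OTB code):

```python
import numpy as np

def center_reduce(x):
    """Centered-reduced: zero mean, unit standard deviation per feature."""
    std = x.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unchanged
    return (x - x.mean(axis=0)) / std

def min_max(x):
    """Min-max: rescale each feature to [0, 1]."""
    mn, mx = x.min(axis=0), x.max(axis=0)
    span = np.where(mx > mn, mx - mn, 1.0)  # avoid division by zero
    return (x - mn) / span
```

In either case the statistics should be estimated on the training samples and then reused as-is on the validation samples.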
Define synthetic sample generation strategies (sketched below)
- add random noise
- combine existing samples
- other
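For instance, noise jittering and pairwise interpolation (the latter in the spirit of SMOTE) could look like this (illustrative sketch, hypothetical names):

```python
import numpy as np

def jitter(samples, sigma, rng):
    """Synthesise samples by adding Gaussian noise to existing ones."""
    return samples + rng.normal(scale=sigma, size=samples.shape)

def interpolate(samples, n_new, rng):
    """Synthesise samples on segments between random pairs of existing ones."""
    i = rng.integers(0, len(samples), size=n_new)
    j = rng.integers(0, len(samples), size=n_new)
    t = rng.random((n_new, 1))
    return samples[i] + t * (samples[j] - samples[i])
```

Either strategy would typically be applied per class, after the training/validation split, so that synthetic samples never leak into the validation set.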