Refactoring of the classification chain

From OTBWiki

Currently, the classification tools perform quite well in most cases with the default setup, but they lack fine tuning and flexibility, especially in the sample-related steps that precede the training algorithms.

These are some of the issues that could be taken into account:

Sample sources:

  • images
  • vector data (GIS files)
  • CSV files
  • other

Split sampling, sample normalisation, learning and validation (at the application level)

  • needs the definition of sample I/O format and drivers
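As a starting point for such a sample I/O format, here is a minimal sketch of a driver that round-trips samples through plain CSV (one row per sample, feature values followed by the class label). The format and the function names `write_samples`/`read_samples` are hypothetical, not an existing OTB API:

```python
import csv
import io

def write_samples(stream, samples):
    """Write (features, label) pairs as CSV rows: f1, f2, ..., label."""
    writer = csv.writer(stream)
    for features, label in samples:
        writer.writerow(list(features) + [label])

def read_samples(stream):
    """Read CSV rows back into (features, label) pairs."""
    samples = []
    for row in csv.reader(stream):
        *features, label = row
        samples.append(([float(v) for v in features], label))
    return samples

# Round trip through an in-memory buffer instead of a file.
buf = io.StringIO()
write_samples(buf, [([1.0, 2.0], "forest"), ([3.5, 0.1], "water")])
buf.seek(0)
restored = read_samples(buf)
```

A real driver would also need to carry metadata (feature names, band order, projection of the source), which the flat CSV above deliberately ignores.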

Define sampling strategies

  • raw vs balanced
  • systematic (1/N) vs random vs stratified
  • current one is balanced random
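The three selection schemes above can be sketched as follows; this is a minimal illustration over in-memory `(features, label)` pairs, and the function names are made up for the example, not part of OTB:

```python
import random
from collections import defaultdict

def systematic_sample(samples, n):
    """Keep every n-th sample (the '1/N' systematic strategy)."""
    return samples[::n]

def random_sample(samples, k, rng):
    """Draw k samples uniformly at random, without replacement (raw random)."""
    return rng.sample(samples, k)

def balanced_random_sample(samples, per_class, rng):
    """Draw the same number of samples from each class, at random
    (the 'balanced random' strategy currently in use)."""
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append((features, label))
    picked = []
    for group in by_class.values():
        picked.extend(rng.sample(group, min(per_class, len(group))))
    return picked
```

A stratified strategy would instead draw from each class in proportion to its frequency in the source, which is a small variation on `balanced_random_sample`.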

Allow for the generation of disjoint training and validation sample sets

  • amount of overlap between the sets (usually 0)
  • ratio between training and validation samples
  • the validation set may not need balancing even if the training one does
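A disjoint split driven by a training/validation ratio could look like the sketch below (hypothetical helper, shuffle-and-cut, overlap is zero by construction):

```python
import random

def split_train_validation(samples, train_ratio, rng):
    """Shuffle and split into disjoint training and validation sets.

    train_ratio is the fraction of samples assigned to training;
    the two returned lists never share a sample.
    """
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(round(train_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

Supporting a configurable (non-zero) overlap between the sets would mean drawing the validation set from the full pool rather than from the remainder.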

Define normalisation strategies

  • centered-reduced
  • min-max
  • other
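The two named normalisation strategies amount to the following, sketched here per feature column (function names are illustrative only):

```python
def centered_reduced(values):
    """Centre on the mean and reduce by the standard deviation
    (zero mean, unit variance)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = var ** 0.5 or 1.0  # guard against a constant feature
    return [(v - mean) / std for v in values]

def min_max(values, lo=0.0, hi=1.0):
    """Rescale linearly so the minimum maps to lo and the maximum to hi."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0  # guard against a constant feature
    return [lo + (hi - lo) * (v - vmin) / span for v in values]
```

Whichever strategy is chosen, the statistics (mean/std or min/max) must be estimated on the training set only and then applied unchanged to the validation set and to the image at classification time.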

Define synthetic sample generation strategies

  • add random noise
  • combine existing samples
  • other
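The two listed generation strategies can be sketched as below; `jitter` adds random noise to one sample, and `interpolate` combines two existing samples of the same class (a SMOTE-like scheme, named here only as an example):

```python
import random

def jitter(sample, sigma, rng):
    """Create a new sample by adding Gaussian noise to each feature."""
    return [v + rng.gauss(0.0, sigma) for v in sample]

def interpolate(a, b, rng):
    """Create a new sample on the segment between two existing samples
    of the same class, at a random position."""
    t = rng.random()
    return [x + t * (y - x) for x, y in zip(a, b)]
```

Both assume the feature space is such that small perturbations and convex combinations still belong to the original class, which should be checked per classifier and per feature set.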