OGRSS Data Fusion

From OTBWiki
Revision as of 15:14, 24 July 2013 by Jonathan guinet (Talk | contribs)

Submission

Proposed framework

We develop a complete framework to jointly exploit hyperspectral and Lidar data. It consists of extracting relevant features from the images, combining them, and performing supervised classification on this set of features. The resulting classification maps are then combined using a classifier fusion methodology.

The main steps of the proposed framework are:

Feature extraction

We extract several features from both data sources:

- Dimensionality reduction using Principal Component Analysis (PCA)
- Radiometric index (NDVI)
- Band selection
- Mean-Shift segmentation of both the Lidar and hyperspectral data, followed by feature extraction over the segments. On each segment we compute first-order statistics as well as geometric features such as area, perimeter and elongation.
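As a minimal sketch of two of the pixel-level features listed above, the following NumPy code computes an NDVI map and a PCA projection of a hyperspectral cube, then concatenates them. The function names, the band indices `red_band` and `nir_band`, and the synthetic cube are assumptions made for illustration; the actual chain relies on OTB applications, which are not reproduced here.

```python
import numpy as np


def ndvi(cube, red_band, nir_band, eps=1e-12):
    """NDVI = (NIR - Red) / (NIR + Red), computed per pixel.

    cube has shape (rows, cols, bands); the band indices are hypothetical.
    """
    red = cube[..., red_band].astype(np.float64)
    nir = cube[..., nir_band].astype(np.float64)
    return (nir - red) / (nir + red + eps)


def pca_reduce(cube, n_components):
    """Project each pixel spectrum onto the first principal components."""
    rows, cols, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(np.float64)
    centered = pixels - pixels.mean(axis=0)
    # Rows of vt are the principal axes, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    return reduced.reshape(rows, cols, n_components)


# Synthetic stand-in for a hyperspectral image (not the contest data).
rng = np.random.default_rng(0)
cube = rng.random((20, 30, 50))
ndvi_map = ndvi(cube, red_band=20, nir_band=40)
pca_map = pca_reduce(cube, n_components=10)
# Concatenate the features into one vector per pixel.
features = np.dstack([pca_map, ndvi_map[..., np.newaxis]])
```

The resulting array holds 10 PCA components plus one NDVI value per pixel, which is the kind of stacked feature image a supervised classifier can be trained on.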

Supervised classification

These features are concatenated into a single feature set, which is used for training together with the provided samples. Two classification algorithms are used: Support Vector Machine and Random Forests. We combine these supervised classifications with another method that uses a connected-component labeling approach based on user-defined criteria (cf. [1]). We compute two connected-component segmentations based on the Lidar data and a spectral angle map.
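The spectral angle mentioned above is the angle between a pixel spectrum and a reference spectrum. The sketch below shows one way to compute such a map with NumPy; the function name and the toy spectra are assumptions, and how the connected-component criteria of [1] actually use this map is not covered here.

```python
import numpy as np


def spectral_angle_map(cube, reference):
    """Angle (radians) between each pixel spectrum and a reference spectrum."""
    pixels = cube.astype(np.float64)
    ref = np.asarray(reference, dtype=np.float64)
    dot = pixels @ ref
    norms = np.linalg.norm(pixels, axis=-1) * np.linalg.norm(ref)
    # Clip to guard against rounding slightly outside [-1, 1].
    cos_angle = np.clip(dot / np.maximum(norms, 1e-12), -1.0, 1.0)
    return np.arccos(cos_angle)


# Toy 2 x 2 image with 3 bands (hypothetical values).
cube = np.zeros((2, 2, 3))
cube[0, 0] = [1.0, 2.0, 3.0]   # same direction as the reference
cube[0, 1] = [2.0, 4.0, 6.0]   # scaled copy: the angle ignores gain
cube[1, 0] = [3.0, 0.0, -1.0]  # orthogonal to the reference
cube[1, 1] = [1.0, 1.0, 1.0]
angles = spectral_angle_map(cube, reference=[1.0, 2.0, 3.0])
```

Because the angle is insensitive to a global scaling of the spectrum, it is less affected by illumination differences than a plain Euclidean distance.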

Fusion of classifiers

We then combine these classification maps using classifier fusion. Two algorithms have been tested:

- Majority voting: for each pixel, the class with the highest number of votes is selected
- A fusion method using the Dempster-Shafer theory, based on [2]
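Majority voting can be sketched in a few lines of NumPy. The function name, the toy label maps and the tie-breaking rule (lowest label wins) are assumptions for illustration; the Dempster-Shafer fusion of [2] is not covered by this sketch.

```python
import numpy as np


def majority_vote(label_maps, n_classes):
    """Fuse classification maps by per-pixel majority voting.

    label_maps: sequence of integer arrays of identical shape, with labels
    in 1..n_classes. Ties go to the lowest label (an arbitrary choice).
    """
    stack = np.stack(label_maps, axis=0)
    # votes[c] counts, for every pixel, how many maps chose class c + 1.
    votes = np.stack([(stack == c).sum(axis=0) for c in range(1, n_classes + 1)])
    return votes.argmax(axis=0) + 1


# Three hypothetical classification maps of the same 2 x 2 area.
svm_map = np.array([[1, 2], [3, 3]])
rf_map = np.array([[1, 2], [2, 3]])
cc_map = np.array([[4, 2], [2, 1]])
fused = majority_vote([svm_map, rf_map, cc_map], n_classes=4)
```

In this toy case every pixel has a class chosen by at least two of the three maps, so the tie-breaking rule is never triggered.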

Implementation

This methodology relies entirely on algorithms available in the ORFEO ToolBox library [3]. The ORFEO ToolBox (OTB) is distributed as an open source library and offers functionalities for remote sensing image processing in general and for high spatial resolution images in particular. OTB is funded by the French Space Agency (CNES) and distributed under the CeCILL free software license (similar to the GPL) to encourage contributions from users and to promote reproducible research.

Reproducibility

The product of this research will be a paper describing the methodology, together with the full computational environment used to produce the results.

References

[1] http://www.orfeo-toolbox.org/CookBook/CookBooksu108.html#x139-7820004.9.1

[2] L. Xu, A. Krzyzak, and C. Y. Suen, "Methods of combining multiple classifiers and their applications to handwriting recognition," IEEE Transactions on Systems, Man and Cybernetics, vol. 22, no. 3, pp. 418-435, May/Jun. 1992.

[3] http://www.orfeo-toolbox.org/

Notes

Work schedule

  • From 5th April Meeting
  • Data :
    • push label map
    • create a link to DFC official data (via a CMake option)
    • update VD (create SandBox repository)
  • Processing :
    1. create a first complete chain : pixel based, based on OTB existing modules (OTB-Applications), scripted in python
    2. create the validation benchmark framework (goal : give a quantitative evaluation of classified data)
    3. enhance the chain :
      1. pre- and post-processing
      2. dimensionality reduction -> test MAF
      3. smarter integration of Lidar data (for segmentation and feature extraction to characterize samples)
      4. OBIA scheme integration
  • Misc :
    • question about GT Data OK

Data Description

Tests

Training Samples description

  1. Grass healthy
  2. Grass stressed
  3. Grass synthetic
  4. Tree
  5. Soil
  6. Water
  7. Residential
  8. Commercial
  9. Road
  10. Highway
  11. Railway
  12. Parking lot#1
  13. Parking lot#2
  14. Tennis court
  15. Running track

Processing chain

Different strategies (top-down, bottom-up):

  • OBIA based : segment the data and then classify each object (each segmented area) using high-level features computed on each object
  • pixel based : classify the data at pixel level and then post-process the classified map (segmentation, majority voting ...)


  • Pre-processing
    • eliminate the first bands, which show a spectral "spreading" effect
    • pre-processing on spectral data : filtering, dimensionality reduction ...
    • working with independent samples : whitening the spatial information
    • evaluate the covariance matrix for each class
    • reduce the dimension of the multi-spectral data using criteria such as
      • Akaike source separation criterion : working on useful information
      • PCA-like methods : ICA, MNF ...


  • Extraction
    • take the spectral correlation into account : Mahalanobis distance
    • spectral distance
  • Segmentation
    • Meanshift
      • Lidar data only (x100) : spatial radius 5, range radius 8, min region size 30
      • TODO add spectral data
    • watershed ...
  • Classification
    • SVM ?
  • PostProcessing
    • majority voting (on pixel based classification, useless if a segmentation step is present)
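The Mahalanobis distance listed under Extraction uses the covariance matrix of a class to account for the correlation between spectral bands. The sketch below is a minimal NumPy illustration under assumed names and synthetic training spectra; it is not taken from the processing chain itself.

```python
import numpy as np


def class_statistics(samples):
    """Mean vector and inverse covariance of one class (samples: n x bands)."""
    samples = np.asarray(samples, dtype=np.float64)
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    # The pseudo-inverse tolerates singular covariances (few samples, many bands).
    return mean, np.linalg.pinv(cov)


def mahalanobis(pixels, mean, inv_cov):
    """Mahalanobis distance of each pixel spectrum (n x bands) to a class."""
    diff = np.asarray(pixels, dtype=np.float64) - mean
    squared = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.sqrt(np.maximum(squared, 0.0))


rng = np.random.default_rng(1)
# Hypothetical training spectra for one class: two strongly correlated bands.
base = rng.normal(size=500)
train = np.column_stack([base, base + 0.1 * rng.normal(size=500)])
mean, inv_cov = class_statistics(train)

along = mean + np.array([1.0, 1.0])    # follows the band correlation
across = mean + np.array([1.0, -1.0])  # goes against it
d_along, d_across = mahalanobis(np.vstack([along, across]), mean, inv_cov)
```

Both test spectra are at the same Euclidean distance from the class mean, but the one that goes against the band correlation gets a much larger Mahalanobis distance.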

First Processing chain

  • SVM Classification on spectral data

Smarter Processing chain

Fusion of classification

OBIA

Two main steps:

  • create objects : the difficulty is to produce a segmentation map that is as accurate as possible
  • extract features : what kind of features do we have to extract (radiometric, shape)? -> be careful with the segmentation of the training samples (the official training map has to be extended to take shape into account, for example)
    • Min, Max, Mean, Median, Std for the Lidar image
    • Min, Max, Mean, Median, Std for the NDVI image
    • Geometric features (using Matlab and regionprops, equivalent to itkLabelGeometryImageFilter)
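The per-object statistics listed above can be sketched as follows, given an image (Lidar or NDVI) and a segmentation label map of the same size. The function name and the toy arrays are assumptions; the only geometric feature included is the pixel count, as a stand-in for the richer shape features obtained from Matlab.

```python
import numpy as np


def segment_statistics(image, labels):
    """Min, max, mean, median and std of `image` over each labelled segment.

    Returns a dict mapping segment label -> dict of statistics.
    """
    stats = {}
    for label in np.unique(labels):
        values = image[labels == label]
        stats[int(label)] = {
            "min": float(values.min()),
            "max": float(values.max()),
            "mean": float(values.mean()),
            "median": float(np.median(values)),
            "std": float(values.std()),
            "area": int(values.size),  # pixel count, the simplest shape feature
        }
    return stats


# Toy Lidar height image and segmentation map (hypothetical values).
lidar = np.array([[1.0, 2.0, 10.0],
                  [3.0, 4.0, 12.0]])
segments = np.array([[1, 1, 2],
                     [1, 1, 2]])
stats = segment_statistics(lidar, segments)
```

Each segment then carries one feature vector, which is what an object-based classifier is trained on instead of individual pixels.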

Results

  • explore misclassification
  • preprocessing : add a mask on the image border
  • remove spectral bands that are subject to atmospheric effects
    • explore atmospheric info :
      • first step : ratio of spectral data inside and outside a cloudy area

The classification chain gives better results if we discard the bands that show too high a ratio between cloudy and non-cloudy areas.
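One possible reading of this band-selection rule is sketched below: for each band, compare the mean value inside a cloud mask with the mean value outside it, and keep only the bands whose ratio does not exceed a threshold. The function names, the mask, the direction of the ratio and the threshold `max_ratio` are all assumptions; the notes do not specify them.

```python
import numpy as np


def cloud_ratio_per_band(cube, cloud_mask, eps=1e-12):
    """Ratio of the mean band value inside vs outside the cloudy area."""
    inside = cube[cloud_mask].mean(axis=0)
    outside = cube[~cloud_mask].mean(axis=0)
    return inside / (outside + eps)


def select_bands(cube, cloud_mask, max_ratio):
    """Keep the bands whose cloudy/clear ratio does not exceed `max_ratio`."""
    ratios = cloud_ratio_per_band(cube, cloud_mask)
    keep = np.flatnonzero(ratios <= max_ratio)
    return keep, cube[..., keep]


# Synthetic cube with 4 bands; band 2 is strongly affected in the cloudy area.
cube = np.ones((4, 4, 4))
cloud_mask = np.zeros((4, 4), dtype=bool)
cloud_mask[:2, :] = True
cube[:2, :, 2] = 5.0
kept_bands, filtered = select_bands(cube, cloud_mask, max_ratio=2.0)
```

Here band 2 has a ratio of about 5 and is discarded, while the three unaffected bands are kept.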


Dimensionality Reduction

The PCA algorithm seems to perform better than the other algorithms (for our purpose).



Cross Validation classification comparison

Random Forest seems to perform better than SVM (whatever the kernel and parameters), especially for some classes (soil, residential ?) which are heavily confused.


Below are the classification results for the reduced spectral data (first 10 PCA bands).


SVM classifier (linear), confusion matrix (selected rows; columns follow the class order of the training samples):

soil -> 0 7 35 0 80 0 3 2 14 20 1 10 11 0 0

residential -> 0 10 66 0 19 0 20 16 51 0 0 0 0 0 0 (51 for road, 66 for synthetic grass)

commercial -> 0 19 55 0 9 0 32 68 0 0 0 0 0 0 0 (55 for synthetic grass)

parking lot 1 -> 0 13 48 8 52 0 12 2 13 3 0 30 3 0 0 (48 for synthetic grass, 52 for soil)


Precision of the different classes: [0.98895, 0.646707, 0.309038, 0.947644, 0.377358, 1, 0.185185, 0.623853, 0.631579, 0.885572, 0.994505, 0.75, 0.919786, 1, 0.994681]
Recall of the different classes: [0.98895, 0.596685, 0.588889, 0.989071, 0.437158, 1, 0.10989, 0.371585, 0.917647, 0.994413, 1, 0.163043, 0.950276, 0.994505, 1]
F-score of the different classes: [0.98895, 0.62069, 0.405354, 0.967914, 0.405063, 1, 0.137931, 0.465753, 0.748201, 0.936842, 0.997245, 0.267857, 0.934783, 0.997245, 0.997333]
Kappa index: 0.720281
Overall accuracy index: 0.738875


Random Forest (100 trees, max depth = 30), confusion matrix (selected rows):

soil -> 0 0 3 0 173 0 2 1 0 0 0 4 0 0 0

residential -> 0 1 4 0 0 0 166 2 3 0 0 6 0 0 0

commercial -> 0 4 1 0 0 0 3 174 1 0 0 0 0 0 0

parking lot 1 -> 0 3 4 0 12 0 13 2 3 2 2 140 2 1 0


Precision of the different classes: [0.989011, 0.931937, 0.928962, 0.994536, 0.935135, 1, 0.887701, 0.940541, 0.948571, 0.983333, 0.989011, 0.921053, 0.989011, 0.989071, 0.994681]
Recall of the different classes: [0.994475, 0.983425, 0.944444, 0.994536, 0.945355, 0.994505, 0.912088, 0.95082, 0.976471, 0.988827, 0.994475, 0.76087, 0.994475, 0.994505, 1]
F-score of the different classes: [0.991736, 0.956989, 0.936639, 0.994536, 0.940217, 0.997245, 0.899729, 0.945652, 0.962319, 0.986072, 0.991736, 0.833333, 0.991736, 0.991781, 0.997333]
Kappa index: 0.959018
Overall accuracy index: 0.961751
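For reference, the indices reported above can all be derived from a confusion matrix. The sketch below assumes that rows are reference classes and columns are predicted classes, which is consistent with the SVM soil row (80 correct out of 183 samples, matching the reported recall of 0.437158). The function name and the toy 2-class matrix are hypothetical, and the code assumes every class is predicted at least once.

```python
import numpy as np


def classification_metrics(confusion):
    """Per-class precision/recall/F-score, kappa and overall accuracy.

    Assumes confusion[i, j] counts reference class i predicted as class j.
    """
    confusion = np.asarray(confusion, dtype=np.float64)
    total = confusion.sum()
    diag = np.diag(confusion)
    row_sums = confusion.sum(axis=1)  # reference samples per class
    col_sums = confusion.sum(axis=0)  # predicted samples per class
    precision = diag / col_sums
    recall = diag / row_sums
    fscore = 2 * precision * recall / (precision + recall)
    overall_accuracy = diag.sum() / total
    # Agreement expected by chance, from the row and column marginals.
    expected = (row_sums * col_sums).sum() / total**2
    kappa = (overall_accuracy - expected) / (1 - expected)
    return precision, recall, fscore, kappa, overall_accuracy


# Toy 2-class matrix (hypothetical counts, not the contest results).
confusion = [[40, 10],
             [20, 30]]
precision, recall, fscore, kappa, oa = classification_metrics(confusion)
```

The kappa index corrects the overall accuracy for chance agreement, which is why it is always somewhat lower than the overall accuracy in the results above.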

Misclassification

Parking lot 2 map using SVM classification


These results have to be validated with heterogeneous data (Lidar + PCA (+ NDVI ...)).

Overfitting?

TODO

  • Spectral Info of each sample
  • Useless Band ? (F. liege interview ?)
  • collect information about image data
  • smarter use of multi source data (X-Spe+ Lidar)
    • basic testing : Lidar integration as input of SVM classif
  • 5th April Meeting
  • Data :
    • push label map
    • create a link to DFC official data (via a CMake option)
    • update VD (create SandBox repository)
  • Processing :
    1. create a first complete chain : pixel based, based on OTB existing modules (OTB-Applications), scripted in python
    2. create the validation benchmark framework (goal : give a quantitative evaluation of classified data)
    3. enhance the chain :
      1. pre- and post-processing
      2. dimensionality reduction
      3. smarter integration of Lidar data (for segmentation and feature extraction to characterize samples)
        1. basic testing : Lidar integration as input of SVM classif
      4. OBIA scheme integration


  • Misc :
    • meeting : 17th April
  • Agenda
    • question about GT Data
    • first processing chain (pixel based)
    • validation/comparison scheme


Work Schedule

18/04

 * Push DATA on large input [OK]
 * Update Mercurial [OK]
 * Broadcast WIKI [OK]
 * add Validation scheme :
  ** Validation with new VD created using segmented map and Matlab [OK]

19/04

 * Lidar feature generation to enhance ConnectedComponent criteria [OK]
 * Matlab function to synthesize spectral data [OK]
 * Broadcast Matlab generated HTML [WIP]
 * add Validation scheme :
  ** add Cross Validation scheme [WIP: Push it into OTB repository - OK]
  ** create dedicated Cross Validation scheme in RT Hyper repository [WIP]

23/04

 * Cross validation scheme [OK]
 * clean Matlab generated validation map [OK]
 * display cross validation results [WIP]
  ** first results can be seen in DFC repository : OTB-LargeInput/OGRSS_DFC/Results/CrossValidation/

25/04

 * Cross validation scheme for RandomForest [OK] -> better results than SVM
 * Dimensionality reduction comparison [OK] -> it should be better to use a simple algorithm (PCA)
 * integration of OBIA in the validation scheme
 * add connected component results to perform fusion of classification [WIP]

26/04

 * interpret confusion matrix and misclassification (classification expertise section) [WIP]
 * implement OBIA

29/04

 * generation of shape OBIA map
 * try cloud estimator filter to remove cloudy area

TO DO

 * Add feature maps extracted using Matlab
 * integrate segmentation map with shape properties
 * smarter integration of Lidar (min, max, stddev)
 * display all the test results
 * cloudy area processing
 * identify problematic classes : create specific classifications and then fuse them


daily mail with progress report

    • Submission deadline : May 1st PST

Results

a result example can be found here