Request for Comments-17: Pledge for a greener dashboard



  • Author: Julien Michel
  • Submitted on 11.12.2015
  • Open for comments


What changes will be made and why they would make a better Orfeo ToolBox?

While our reference dashboard platform is all green as of today, the next best platform shows 54 failing tests. In the worst case we have no fewer than 139 failing tests. For the record, this is what a dashboard is supposed to look like.

The purpose of this RFC is to get the OTB dashboard closer to the all-green state. It will be a collection of otherwise unrelated patches in OTB and OTB-Data, composed of:

  • Baseline updates
  • Test rewrites
  • Code fixes

The idea is to focus on what can be tested, what is really our responsibility to test, and what accuracy can be expected. I propose that patches related to a test family go to a dedicated git branch named dg-testname. This way, it will be easy to sort out which patch is related to which test.

To address the many failing tests, I propose the following strategy. We should focus on:

  1. Platforms that are closest to the all-green state
  2. Tests that have the highest impact (i.e. tests failing on most platforms go first)

I therefore propose to start with this platform.

We will focus on the following tests:

  • ioTvImageKeywordlist* from OSSIMAdapters module
  • prTvSensorModel* from OTBTransform module
  • prTvForwardBackwardProjectionConsistency* from OTBTransform module
  • prTvGenericRSTransformGenericConversionCheckingFromGCP_* from OTBProjection module
  • prTvSensorModelGrid* and prTvForwardSensorModelGrid* from OTBTransform module
  • prTvOrthoRectification* from OTBProjection module
  • ioTvVectorImageReadingInfo_* from ImageIOBase module
  • apTvReadImageInfo* from AppImageUtil module

Those tests represent almost 90% of all remaining failing tests. They do not show anything useful since they have been failing almost everywhere for years.

Purpose analysis
Projection related tests
  • prTvSensorModel (120 tests failing, all platforms)
    Purpose: checks that Forward and Inverse SensorModel can be instantiated, and that image point (10,10) can be projected forward, then backward.
    Success condition: sensor model is instantiated (otherwise the test returns false); point transformations are written to a text file and compared to a baseline.
  • prTvForwardBackwardProjectionConsistency (35 tests failing, all platforms)
    Purpose: checks the round-tripping error (img -> geo -> img) of random points for GenericRSTransform.
    Success condition: threshold on the distance between source and output points (test parameter).
  • prTvSensorModelGrid (50 tests failing, all platforms)
    Purpose: checks that Forward and Inverse SensorModel can be instantiated, and builds a grid of forward round-tripping (x,y) errors (img -> geo -> img).
    Success condition: image comparison of error grids with a baseline.
  • prTvForwardSensorModelGrid (30 tests failing, all platforms)
    Purpose: checks that Forward SensorModel can be instantiated, and builds a grid of forward location (x,y) errors (img -> geo).
    Success condition: image comparison of location grids with a baseline.
  • prTvGenericRSTransformGenericConversionCheckingFromGCP (66 tests failing, all platforms)
    Purpose: checks the accuracy of the forward/inverse sensor model against embedded GCPs.
    Success condition: internal thresholds on residuals (different thresholds for geo and img); the geo residual uses the Haversine distance.
  • prTvOrthoRectification (102 tests failing, all platforms)
    Purpose: checks that the orthorectification filter succeeds.
    Success condition: comparison with a baseline ortho-image.
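For reference, the geo residual mentioned above is the Haversine great-circle distance. A minimal sketch in Python (OTB's actual implementation is C++, and the mean Earth radius value here is an assumption):

```python
import math

def haversine_m(lon1, lat1, lon2, lat2, radius_m=6371000.0):
    """Great-circle distance in meters between two (lon, lat) points,
    given in degrees, using the Haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 \
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))
```

For example, one degree of latitude at the equator gives roughly 111 km, which is a convenient sanity check for the residual thresholds.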
Metadata related tests
  • ioTvImageKeywordlist (99 tests failing, all platforms)
    Purpose: checks that we can read a keywordlist, write it to a geom file, and read the geom file again.
    Success condition: kwl->geom and kwl->geom->kwl->geom ASCII compared to a baseline.
  • apTvReadImageInfo (33 tests failing, all platforms)
    Purpose: checks the output of the ReadImageInfo application.
    Success condition: baseline comparison.
  • ioTvVectorImageReadingInfo (44 tests failing, all platforms)
    Purpose: checks PrintSelf output.
    Success condition: baseline comparison.

Proposed solution
Projection related tests

Looking at the table above, we can draw the following conclusions:

  • There is a lot of overlap between what each test covers, and no clear responsibility for each test family
  • Some tests are questionable (are embedded GCPs always accurate? Why test this (10,10) pixel so intensively?)

The tests should verify a clear contract of what we want ForwardSensorModel, InverseSensorModel and GenericRSTransform to do:

  1. Instantiate properly with image metadata (keywordlist)
  2. GenericRSTransform encapsulates Inverse/Forward SensorModel: check that the encapsulation does not modify results
  3. Results should be plausible (no NaN and no clearly out-of-bound results)
  4. Round-tripping error should be accurate enough (< 1 pixel?)
  5. Stability should be checked (i.e. results from a sensor model should not vary with new versions of OTB or OSSIM, with respect to reference data generated when validating the model)


Two additional remarks:

  • These tests could be made independent of large inputs by generating the appropriate geom files.
  • Assessing the absolute accuracy of sensor models should not be the concern of regression testing.
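Points (3) and (4) of the contract can be sketched as follows. This is a minimal Python sketch: `forward` and `inverse` are hypothetical callables standing in for the sensor model, since the real otb::ForwardSensorModel and otb::InverseSensorModel are C++ classes, and the bounds and default tolerance are assumptions:

```python
import math

def check_sensor_model_contract(forward, inverse, image_points, tol_px=1.0):
    """Check plausibility and round-tripping of a sensor model.

    forward: callable (x, y) -> (lon, lat), i.e. img -> geo
    inverse: callable (lon, lat) -> (x, y), i.e. geo -> img
    Returns False on NaN, out-of-bound geographic coordinates,
    or a round-tripping error above tol_px pixels."""
    for (x, y) in image_points:
        lon, lat = forward(x, y)
        # (3) plausibility: no NaN, coordinates within geographic bounds
        if any(math.isnan(v) for v in (lon, lat)):
            return False
        if not (-180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0):
            return False
        # (4) round-tripping error (img -> geo -> img) below threshold
        xr, yr = inverse(lon, lat)
        if math.hypot(xr - x, yr - y) > tol_px:
            return False
    return True
```

With a toy pair of transforms the check passes, while an inverse transform shifted by a few pixels fails, which is exactly the behavior the rewritten test should exhibit.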

I suggest removing all the previous tests and writing one consistent test that checks the five points of the contract. Only (3) and (5) need reference data.

This is the contract all sensor models should enforce. If a sensor model does not enforce it, then it should not be exposed to users: this may lead to the (at least temporary) removal of some SAR sensor models, for instance.

The test could have a debug mode where additional data are produced (such as OGR layers of reference and produced points, to help understand errors).

Metadata related tests

Again, we need to establish a clear contract of what metadata reading/writing should do.

ioTvVectorImageReadingInfo only checks PrintSelf against a baseline file, and most failures are due to incorrect comparison against the baseline:

Nb lines differents : 2
  Base <<       Pointer: 0
  Test >>       Pointer: 0x0
  Base <<       Reference Count: 1
  Test >>       Pointer: 0x0
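If we did want to keep such PrintSelf comparisons, one option would be to mask run-dependent tokens before comparing, so that only stable content is tested. A hypothetical sketch (the patterns below are illustrative, not the actual test driver's tolerance rules):

```python
import re

def normalize_printself(line):
    """Mask run-dependent tokens in a PrintSelf output line so that
    pointer addresses and reference counts never cause spurious diffs."""
    line = re.sub(r"Pointer:\s*(0x)?[0-9a-fA-F]+", "Pointer: <addr>", line)
    line = re.sub(r"Reference Count:\s*\d+", "Reference Count: <n>", line)
    return line
```

Under such a normalization, the `Pointer: 0` vs `Pointer: 0x0` mismatch shown above would disappear.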

The same goes for apTvReadImageInfo: the test driver gets confused because it tries to ignore line order, which leads to incorrect line matching. We could perhaps fix this, but we would then be left with tuning tolerance values to match spacing and lat/lon coordinates. The point is that while we need to check that the application executes correctly, we probably do not need to check its outputs.

Then come the ioTvImageKeywordlist tests. What do we need to check?

  • The official products are correctly read and translated to a keywordlist (we can check for keywords that should be present)
  • We can write and read this keywordlist again, without errors

Do we need to check whether the keywordlist content changes over time? I do not think so. If a keywordlist changes over time, and this change has an impact on OTB features, then it should be detected by other tests (sensor modeling, radiometric calibration, metadata interface).

I think we can therefore rewrite the ioTvImageKeywordlist test to check that all necessary keywords are read from the product (with respect to OTB use) and have plausible values (no NaN, bounds checking where possible). The test will then write/read the geom file and check that the values are identical. No baseline is needed (but this time the dependency on large inputs is mandatory).
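The plausibility part of this rewrite can be sketched as follows. This is a minimal Python sketch: the keywordlist is modeled as a plain dict, and the required keyword names are illustrative (RPC-style keywords), not an authoritative list of what OTB actually needs:

```python
import math

# Illustrative set of keywords assumed required for OTB use.
REQUIRED_KEYWORDS = ("sensor", "line_off", "samp_off", "lat_off", "lon_off")

def check_keywordlist(kwl):
    """Check that required keywords are present and that geographic
    offsets have plausible values (no NaN, bounded coordinates)."""
    for key in REQUIRED_KEYWORDS:
        if key not in kwl:
            return False
    for key in ("lat_off", "lon_off"):
        v = float(kwl[key])
        if math.isnan(v) or not (-180.0 <= v <= 180.0):
            return False
    return True
```

A missing keyword or a NaN offset makes the check fail immediately, with no baseline involved.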

This test will check that we can safely translate metadata to geom files, and the sensor modeling tests will check that those geom files give correct sensor models. This strategy could be applied to other tests that depend on large inputs.
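A geom file is essentially an ASCII key/value listing, so the write/read round-trip at the heart of the rewritten test can be sketched as below. The format is deliberately simplified for illustration and is not OSSIM's exact .geom syntax:

```python
def write_geom(kwl, path):
    """Write a keywordlist dict as ASCII 'key: value' lines
    (simplified view of a .geom file)."""
    with open(path, "w") as f:
        for key in sorted(kwl):
            f.write(f"{key}: {kwl[key]}\n")

def read_geom(path):
    """Read the simplified .geom format back into a dict."""
    kwl = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition(": ")
            if key:
                kwl[key] = value
    return kwl
```

The rewritten test would then simply assert that the keywordlist read back from the geom file equals the one that was written.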

When will those changes be available (target release or date)?

Those changes can be available for release 5.4.

Who will be developing the proposed changes?

I will.




Corresponding Requests for Changes