Request for Comments-17: Pledge for a greener dashboard
- 1 Status
- 2 Content
- 2.1 What changes will be made and why they would make a better Orfeo ToolBox?
- 2.2 When will those changes be available (target release or date)?
- 2.3 Who will be developing the proposed changes?
- 3 Community
- 4 Corresponding Requests for Changes
- Author: Julien Michel
- Submitted on 11.12.2015
- Open for comments
What changes will be made and why they would make a better Orfeo ToolBox?
While our reference dashboard platform is all green as of today, the next best platform shows 54 failing tests, and in the worst case no fewer than 139 tests fail. For the record, this is what a dashboard is supposed to look like.
The purpose of this RFC is to bring the OTB dashboard closer to the all-green state. It will be a collection of unrelated patches in OTB and OTB-Data, composed of:
- Baseline updates
- Test rewrites
- Code fixes
The idea is to focus on what can be tested, what is really our responsibility to test, and what accuracy can be expected. I propose that patches related to a given test family go to a dedicated dg-testname git branch; this way, it will be easy to sort out which patch relates to which test.
To address the many failing tests, I propose the following strategy. We should focus on:
- Platforms that are closest to the all-green state
- Tests that have the highest impact (i.e. tests failing on most platforms go first)
I therefore propose to start with this platform.
We will focus on the following tests:
- ioTvImageKeywordlist* from OSSIMAdapters module
- prTvSensorModel* from OTBTransform module
- prTvForwardBackwardProjectionConsistency* from OTBTransform module
- prTvGenericRSTransformGenericConversionCheckingFromGCP_* from OTBProjection module
- prTvSensorModelGrid* and prTvForwardSensorModelGrid* from OTBTransform module
- prTvOrthoRectification* from OTBProjection module
- ioTvVectorImageReadingInfo_* from ImageIOBase module
- apTvReadImageInfo* from AppImageUtil module
Those tests represent almost 90% of all remaining failing tests. They do not show anything useful since they have been failing almost everywhere for years.
| | prTvSensorModel* | prTvForwardBackwardProjectionConsistency* | prTvSensorModelGrid* | prTvForwardSensorModelGrid* | prTvGenericRSTransformGenericConversionCheckingFromGCP_* | prTvOrthoRectification* |
|---|---|---|---|---|---|---|
| Test purpose | Checks we can instantiate Forward and Inverse SensorModel; checks that image point (10,10) can be projected forward, then backward | Checks round-tripping error (img -> geo -> img) of random points for GenericRSTransform | Checks we can instantiate Forward and Inverse SensorModel; builds a grid of forward round-tripping (x,y) errors (img -> geo -> img) | Checks we can instantiate Forward SensorModel; builds a grid of forward location (x,y) errors (img -> geo) | Checks the accuracy of forward/inverse sensor models wrt. embedded GCPs | Checks that the orthorectification filter succeeds |
| Success condition | Sensor model instantiated (otherwise the test returns false); point transformations written to a text file and checked wrt. baseline | Threshold on distance between source and output points (test parameter) | Image comparison of error grids with baseline | Image comparison of location grids with baseline | Internal thresholds on residuals (different thresholds for geo and img); geo residual uses the Haversine distance | Comparison with baseline ortho-image |
| Failing (all platforms) | 120 tests | 35 tests | 50 tests | 30 tests | 66 tests | 102 tests |
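The geo residual mentioned above relies on the Haversine (great-circle) distance. As an illustration only, here is a minimal sketch assuming a spherical Earth of mean radius 6 371 km; the function name and radius value are assumptions, and the actual implementation used by the tests may differ:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

# One degree of latitude is roughly 111 km on this sphere.
print(round(haversine_m(0.0, 0.0, 1.0, 0.0)))  # → 111195
```

Expressing the geo residual in meters this way allows a single threshold regardless of latitude, unlike a raw lat/lon difference.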
| | ioTvImageKeywordlist* | apTvReadImageInfo* | ioTvVectorImageReadingInfo_* |
|---|---|---|---|
| Test purpose | Checks that we can read a keywordlist, write it to geom, and read the geom again | Checks the output of the ReadImageInfo application | Checks PrintSelf output |
| Success condition | kwl->geom and kwl->geom->kwl->geom ASCII-compared to baseline | Baseline comparison | Baseline comparison |
| Failing (all platforms) | 99 tests | 33 tests | 44 tests |
Looking at the tables above, we can draw the following conclusions:
- There is a lot of overlap between what each test checks, and no clear responsibility for each test family
- Some tests are questionable (are embedded GCPs always accurate? Why test the (10,10) pixel so intensively?)
The test should verify a clear contract of what we want ForwardSensorModel, InverseSensorModel and GenericRSTransform to do:
1. Instantiate properly from image metadata (keywordlist)
2. GenericRSTransform encapsulates Inverse/Forward SensorModel: check that the encapsulation does not modify results
3. Results should be plausible (no NaN and no clearly out-of-bounds values)
4. Round-tripping error should be small enough (< 1 pixel?)
5. Stability should be checked (i.e. results from a sensor model should not vary with new versions of OTB or OSSIM, with respect to reference data generated when validating the model)

Two further remarks:
- These tests could be made independent of large inputs by generating the appropriate geom files
- Assessing the absolute accuracy of sensor models should not be the concern of regression testing
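The plausibility and round-tripping checks of this contract can be sketched generically. This is a minimal sketch, not OTB code: `forward` and `inverse` are hypothetical callables standing in for a sensor model, mapping pixel coordinates to (lon, lat) and back:

```python
import math
import random

def check_round_trip(forward, inverse, width, height, n_points=100, tol_px=1.0, seed=42):
    """Check that forward outputs are plausible (no NaN, within geographic
    bounds) and that the img -> geo -> img round-tripping error stays below
    tol_px pixels. Returns (passed, worst_error_px)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_points):
        x, y = rng.uniform(0, width), rng.uniform(0, height)
        lon, lat = forward(x, y)
        # Plausibility: no NaN, coordinates within geographic bounds
        if any(math.isnan(v) for v in (lon, lat)) or not (-180 <= lon <= 180 and -90 <= lat <= 90):
            return False, float("nan")
        xb, yb = inverse(lon, lat)
        worst = max(worst, math.hypot(xb - x, yb - y))
    return worst <= tol_px, worst

# Toy affine "sensor model", purely for illustration
def fwd(x, y): return (1.0 + x * 1e-5, 45.0 - y * 1e-5)
def inv(lon, lat): return ((lon - 1.0) / 1e-5, (45.0 - lat) / 1e-5)

ok, worst = check_round_trip(fwd, inv, 1000, 1000)
```

Because the check is expressed against the transform interface only, the same driver could exercise every sensor model uniformly, instead of one ad-hoc test per family.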
I suggest removing all the previous tests and rewriting them as one consistent test checking the five points of the contract. Only (3) and (5) need reference data.
This is the contract all sensor models should enforce. If a sensor model does not enforce it, then it should not be exposed to users: this may lead to the (at least temporary) removal of some SAR sensor models for instance.
The test could have a debug mode producing additional data (such as OGR layers of reference and produced points, to help understand errors).
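Such a debug dump could be as simple as writing the point pairs to a GeoJSON file, which OGR-based tools (ogrinfo, QGIS) can open directly. A minimal sketch written by hand to avoid a GDAL dependency; the function name and property layout are assumptions, not the actual OTB debug output:

```python
import json

def dump_point_pairs_geojson(pairs, path):
    """Write (reference, produced) (lon, lat) point pairs as a GeoJSON
    FeatureCollection, tagging each point with its pair id and role."""
    features = []
    for i, (ref, out) in enumerate(pairs):
        for role, (lon, lat) in (("reference", ref), ("produced", out)):
            features.append({
                "type": "Feature",
                "properties": {"id": i, "role": role},
                "geometry": {"type": "Point", "coordinates": [lon, lat]},
            })
    with open(path, "w") as f:
        json.dump({"type": "FeatureCollection", "features": features}, f, indent=2)

# One pair: where the reference says the point is vs. where the model put it
dump_point_pairs_geojson([((1.0, 45.0), (1.0001, 45.0001))], "debug_points.geojson")
```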
Again, we need to establish a clear contract of what metadata reading/writing should do.
ioTvVectorImageReadingInfo only checks PrintSelf output against a baseline file, and most failures are due to incorrect baseline comparisons:
```
Nb lines differents : 2
-------------------------------
Base << Pointer: 0
Test >> Pointer: 0x0
-------------------------------
Base << Reference Count: 1
Test >> Pointer: 0x0
```
The same goes for apTvReadImageInfo: the test driver gets confused because it tries to ignore line order, which leads to incorrect line matching. We could maybe fix this, but we would then still have to choose a tolerance value for spacing and lat/lon coordinates. The thing is, while we need to check that the application executes correctly, we probably do not need to check its outputs.
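If output checking were kept, a more robust approach than order-agnostic line diffing would be to parse the `key: value` output into a map and compare values key by key, with a tolerance on numeric fields. A sketch of the idea, not the actual test driver:

```python
def parse_kv(text):
    """Parse 'key: value' lines into a dict (order becomes irrelevant)."""
    out = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            out[key.strip()] = value.strip()
    return out

def compare_kv(baseline, result, tol=1e-6):
    """Compare two outputs key by key; numeric values within tol, others exact."""
    base, res = parse_kv(baseline), parse_kv(result)
    if base.keys() != res.keys():
        return False
    for key, bval in base.items():
        rval = res[key]
        try:
            if abs(float(bval) - float(rval)) > tol:
                return False
        except ValueError:
            if bval != rval:
                return False
    return True

# Reordered lines and a tiny floating-point drift are both tolerated
baseline = "sizex: 250\nspacingx: 0.5999999999\norigin: somewhere"
result = "origin: somewhere\nsizex: 250\nspacingx: 0.6"
print(compare_kv(baseline, result))  # → True
```

Matching by key rather than by line position removes the ambiguity the current driver trips over.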
Then come the ioTvImageKeywordlist tests. What do we need to check?
- The official products are correctly read and translated to a keywordlist (we can check for keywords that should be present)
- We can write and read again this keywordlist, without errors.
Do we need to check whether the keywordlist content changes over time? I do not think so. If a keywordlist changes over time and this change impacts OTB features, then it should be detected by other tests (sensor modeling, radiometric calibration, metadata interface).
I think we can therefore rewrite the ioTvImageKeywordlist test to check that all necessary keywords are read from the product (wrt OTB use) and have plausible values (no NaN, bound checking where possible). The test will then write/read a geom file and check that the values are the same. No baseline is needed (but this time the dependency on large inputs is mandatory).
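The shape of such a test can be sketched as follows. This is illustrative only: the keywordlist is modeled as a plain dict, the geom file as `key: value` text, and the required-keyword list is a made-up placeholder, not the real set of OSSIM keywords:

```python
import math

# Hypothetical required keywords; the real list depends on the sensor/product.
REQUIRED_KEYS = ("sensor", "line_scale", "samp_scale")

def validate_keywordlist(kwl):
    """Required keywords present, numeric values plausible (no NaN)."""
    if any(k not in kwl for k in REQUIRED_KEYS):
        return False
    return not any(isinstance(v, float) and math.isnan(v) for v in kwl.values())

def write_geom(kwl, path):
    """Serialize the keywordlist to a 'key: value' geom-style text file."""
    with open(path, "w") as f:
        for key in sorted(kwl):
            f.write(f"{key}: {kwl[key]}\n")

def read_geom(path):
    """Read it back, restoring floats where possible."""
    kwl = {}
    with open(path) as f:
        for line in f:
            key, _, value = line.rstrip("\n").partition(": ")
            try:
                kwl[key] = float(value)
            except ValueError:
                kwl[key] = value
    return kwl

kwl = {"sensor": "hypothetical_sensor", "line_scale": 2048.0, "samp_scale": 2048.0}
assert validate_keywordlist(kwl)
write_geom(kwl, "test.geom")
assert read_geom("test.geom") == kwl  # write/read round trip preserves values
```

No baseline file appears anywhere: the test only asserts presence, plausibility, and write/read consistency, which is exactly what makes it robust to harmless keywordlist evolution.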
This test will check that we can safely translate metadata to geom files, and the sensor modelling tests will check that those geom files give correct sensor models. This strategy could be applied to other tests depending on large inputs.
When will those changes be available (target release or date)?
Those changes can be available for release 5.4.
Who will be developing the proposed changes?