Object Based Image Analysis whitepaper

From OTBWiki

Introduction

One of the main goals for OTB 3.14 is to enhance Object Based Image Analysis (OBIA) support in OTB.

We would like to provide an extended set of tools to perform OBIA tasks within the application framework developed in OTB (which opens interesting tracks, like using OTB to perform OBIA in QGIS, for instance).

Of course we would like to do this in the "spirit of OTB" way, which means:

  • We would like to be able to perform OBIA in sensor geometry and to produce outputs understandable by GIS systems,
  • We want scalability and performance, to perform large-scale OBIA with streaming and multi-threading,
  • And of course we want to provide each user with the appropriate level of access: filters for developers, applications for users, etc.

To do so, we have to address several issues, the two most straightforward being:

  1. How do we perform large-scale segmentation efficiently?
  2. How do we represent and manipulate objects? How do we convert efficiently between representations?

A set of low-level tasks is already in progress, as can be seen in Jira. Two of them are to investigate the GDAL API to rasterize and polygonize properly, and another is to refactor our star segmentation algorithm (MeanShift) in a more efficient way. We also have a task to design a filter that performs streamed segmentation + polygonization of a large image. This will yield polygons split at stream boundaries, but it is a start.

This page intends to log ideas and tracks to follow to address the issues of OBIA and large-scale image segmentation.

What has been done so far

  1. We now have a non-streamed filter that exactly polygonizes a label image (each pixel is vectorized as a unit square) into a VectorData using GDAL capabilities. The output VectorData also keeps track of the original label value in a dedicated field.
  2. We now have two streamed filters that properly rasterize VectorData using GDAL capabilities:
    1. One burns a color into a multi-band image and is dedicated to rendering (we could replace the OpenGL rendering in the viewer with this, for instance)
    2. One writes precisely the labels from a given field of the VectorData into the image, without anti-aliasing, producing a label image
  3. These filters can be combined to obtain an exact conversion of a label image into a VectorData and vice-versa (exact means that even the labeling is exact; attributes of the features in the VectorData are not taken into account). We have tests proving that this conversion is lossless both ways.
  4. Conversion between LabelMap and label images or VectorData is straightforward with these new filters and will be exact. We are working on implementing this.
  5. We have a persistent filter templated by a segmentation filter that performs, for each stream, a segmentation and a polygonization using the new filters, and concatenates the output VectorData from each stream (OBIA/otbStreamingVectorizedSegmentation). This can be used to segment a large image, provided that the output VectorData fits in RAM. The long-standing OTB assumption that "a VectorData fits in memory" can be false here.
  6. We have an on-going new implementation of the mean-shift algorithm which we hope will be faster, more versatile and less memory-consuming than the previous one.
  7. We are currently conducting experiments on segmenting large images so as to measure the amount of RAM needed. It is likely that we will switch to an OGR-based interface to allow progressive dumping to disk or to a database (see next section).
  8. We are currently analysing the multi-scale option for polygon stitching, with the following idea (similar to the one suggested in a later section):
    1. Build a multi-scale image pyramid
    2. Segment all scales with the same streaming (tiling) scheme: this only costs about 1.33 times the full-resolution segmentation alone. To do this we use the new segmentation-vectorization persistent filter
    3. At each scale, tag segments (or polygons) that touch tile boundaries
    4. For a given object, find the first coarser scale at which it is not split by streaming (this gives a full object, but with imprecise boundaries)
    5. Use this coarse segmented object to determine which of the boundary polygons of the finer scale are part of this object
    6. Merge these polygons with the coarse polygon to refine its boundaries.

This is barely more than an idea for now, but any feedback is welcome!
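Steps 1-6 above can be sketched in a few lines. The cost figure in step 2 comes from the geometric series 1 + 1/4 + 1/16 + ... = 4/3 ≈ 1.33 when each pyramid level halves the resolution in both directions. Below is a toy, language-agnostic model (none of it is OTB API): segments and polygons are plain sets of pixel cells, and `tile_of`, `is_split`, `upsample` and `stitch` are illustrative names.

```python
# Toy sketch of the proposed multi-scale stitching (steps 3-6 above).
# Segments and polygons are modelled as sets of (row, col) pixel cells.

def tile_of(cell, tile_size):
    """Index of the streaming tile containing a cell."""
    r, c = cell
    return (r // tile_size, c // tile_size)

def is_split(segment, tile_size):
    """Step 3: a segment touching several tiles was cut by streaming."""
    return len({tile_of(cell, tile_size) for cell in segment}) > 1

def upsample(segment, factor):
    """Project a coarse-scale segment onto the finer grid."""
    return {(r * factor + dr, c * factor + dc)
            for (r, c) in segment
            for dr in range(factor) for dc in range(factor)}

def stitch(coarse_segment, factor, fine_boundary_polygons):
    """Steps 4-6: use an unsplit coarse object to gather and merge the
    fine-scale boundary polygons that belong to it."""
    footprint = upsample(coarse_segment, factor)
    merged = set()
    for poly in fine_boundary_polygons:
        if poly & footprint:          # step 5: overlap test
            merged |= poly            # step 6: merge
    return merged
```

This ignores the imprecise-boundary refinement of step 6 (here the fine polygons simply replace the coarse footprint), but it captures the selection logic.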

Large Scale Segmentation (mono-scale approach)

With the ongoing tasks, we will soon have the following filter:

  1. Stream a large image and, for each stream:
    1. Apply a segmentation algorithm
    2. Vectorize the results into a VectorData and keep it
  2. Concatenate all VectorData into one (this is already almost done by the persistent filter above)

This filter will allow us to segment and vectorize large images (with errors on polygons due to streaming).

  • Issue: The pitfall is that the whole output VectorData has to fit in memory, and we do not know up to which image size this can be considered true. Imagine a very large image yielding billions of polygons: can we assume that this set of polygons is still small enough to fit in memory? Can this set of polygons be efficiently manipulated within OTB? The first task after coding the strategy above will be to check this on real data.
  • Possible solution: A possible solution we discussed during the meeting would be to dump polygons to OGR at each tile (which will in turn dump them to a file or to a database, for instance). In this case, the filter will not return a VectorData, but will accept an OGR descriptor (file path or database parameters) which will be updated during streaming. We can maybe offer both behaviours (building a VectorData or directly dumping to disk). On one hand, this is in contradiction with the processing pipeline architecture, but on the other hand it might be much more efficient to manipulate.
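The per-tile dump pattern can be illustrated without OTB or OGR. In this sketch, `segment_tile` is a hypothetical stand-in for the per-stream segmentation + vectorization, its geometry strings are placeholders for real WKT, and Python's built-in sqlite3 stands in for the OGR datasource; the point is only that each tile's polygons are committed and released before the next tile is processed, so RAM usage stays bounded by one tile.

```python
# Sketch of "dump polygons at each tile": instead of accumulating one
# big VectorData in RAM, each streamed tile appends its polygons to a
# datastore and frees them. Real code would go through OGR (file or
# database); sqlite3 is used here only as a self-contained stand-in.
import sqlite3

def segment_tile(tile_index):
    """Hypothetical per-tile segmentation + vectorization: returns
    (label, geometry) rows. Placeholder strings stand in for real WKT."""
    return [(tile_index * 10 + k, f"GEOM-{tile_index}-{k}") for k in range(3)]

def stream_to_datastore(n_tiles, conn):
    conn.execute("CREATE TABLE IF NOT EXISTS polygons (label INTEGER, wkt TEXT)")
    for t in range(n_tiles):
        rows = segment_tile(t)                     # only one tile in RAM
        conn.executemany("INSERT INTO polygons VALUES (?, ?)", rows)
        conn.commit()                              # polygons now persisted
        del rows                                   # RAM stays bounded

conn = sqlite3.connect(":memory:")
stream_to_datastore(4, conn)
```

As noted above, this breaks the pure pipeline model (the output is a side effect on a datasource, not a DataObject), which is the trade-off being discussed.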

The rule of thumb would be: VectorData are for small sets and are compatible with the pipeline (since they are DataObjects). They are convenient, for instance, for setting training areas for supervised classification. Direct OGR dump is efficient for extracting a large amount of polygons from images.

Simple stitching strategy

How do we perform simple polygon stitching on streaming region borders?
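One possible answer, sketched here only as a discussion aid (this is not OTB code, and the cell-set model and function names are illustrative): after streaming, merge polygons that carry the same label and are adjacent to one another, which is exactly the situation of a region cut in two by a tile border.

```python
# Minimal greedy sketch: polygons are (label, cell set) pairs; two
# fragments of the same region share a label and touch across the
# tile border, so they get unioned.

def touches(p, q):
    """True if two cell sets are 4-adjacent somewhere."""
    return any((r + dr, c + dc) in q
               for (r, c) in p
               for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)))

def stitch_borders(polygons):
    """Merge same-label neighbouring fragments; returns merged pairs."""
    merged = []
    for label, cells in polygons:
        for item in merged:
            if item[0] == label and touches(cells, item[1]):
                item[1] |= cells
                break
        else:
            merged.append([label, set(cells)])
    return merged
```

A real implementation would restrict the adjacency test to cells lying on tile borders and use a union-find to handle chains of more than two fragments; the quadratic greedy pass above is only meant to show the merge criterion.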

Alternate multi-scale scheme

One of the OTB contributors has suggested a different strategy to overcome the streaming issue.

The starting point is a multi-resolution pyramid generated from the original image. You start with the lowest resolution level, which you segment; you then recursively apply the mean-shift to the obtained regions at the next level of the pyramid, and so on. At each step, you can store the obtained regions in a database, so that you do not have memory problems. The approach can also be massively parallel, one task per tree branch. No tiling is needed, thanks to the multi-resolution approach. Some code for the framework is available here.
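The recursive structure of this scheme can be sketched as follows. Everything here is a toy stand-in, not OTB code: `segment` splits a cell set into one region per image row (a placeholder for the mean-shift), `upsample` moves a region to the next pyramid level, and `store` plays the role of the database.

```python
# Sketch of the recursive multi-resolution scheme: segment the
# coarsest level, then re-segment each obtained region at the next
# (finer) pyramid level, and so on. Only one branch of the region
# tree lives in memory at a time, and branches are independent, so
# they could be processed in parallel.

def segment(cells):
    """Toy segmentation: one region per image row present in `cells`."""
    regions = {}
    for r, c in cells:
        regions.setdefault(r, set()).add((r, c))
    return list(regions.values())

def upsample(cells, factor=2):
    """Project a region onto the next (finer) pyramid level."""
    return {(r * factor + dr, c * factor + dc)
            for (r, c) in cells
            for dr in range(factor) for dc in range(factor)}

def refine(region, level, store):
    """Recursively re-segment `region` on finer levels; store leaves."""
    if level == 0:
        store.append(region)          # would be a database insert
        return
    for sub in segment(upsample(region)):
        refine(sub, level - 1, store)
```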

Representing and converting data for OBIA

Currently, we have three ways to represent data for OBIA:

  • Images of unique labels (not compatible with large data)
  • VectorData (or file or database)
  • LabelMap

We need to have efficient, exact conversions between these representations, and to decide which one we need for what.
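What "exact conversion" means can be illustrated with a toy round trip, in the spirit of the GDAL-based filters described earlier (each pixel treated as a unit square). None of this is OTB code; `polygonize` and `rasterize` are illustrative stand-ins showing the lossless-both-ways property the tests above verify for the real filters.

```python
# Toy round trip: a label image becomes one cell set per label
# (each pixel a unit square), then is burned back into an image.

def polygonize(image):
    """Label image (list of rows) -> {label: set of (row, col) cells}."""
    shapes = {}
    for r, row in enumerate(image):
        for c, label in enumerate(row):
            shapes.setdefault(label, set()).add((r, c))
    return shapes

def rasterize(shapes, height, width, background=0):
    """Inverse operation: burn each label back into an image."""
    image = [[background] * width for _ in range(height)]
    for label, cells in shapes.items():
        for r, c in cells:
            image[r][c] = label
    return image

img = [[1, 1, 2],
       [1, 3, 2]]
assert rasterize(polygonize(img), 2, 3) == img   # lossless round trip
```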