Request for Comments-38: Streaming for OGR data
- 1 [Request for Comments - 38] Streaming for OGR data
- 1.1 Status
- 1.2 Content
- 1.2.1 What changes will be made and why they would make a better Orfeo ToolBox?
- 1.2.2 When will those changes be available (target release or date)?
- 1.2.3 Who will be developing the proposed changes?
- 1.3 Community
- 1.4 Corresponding Requests for Changes
[Request for Comments - 38] Streaming for OGR data
Status
- Author: Guillaume
- Submitted on 04.08.2017
- Open for comments / development
Content
What changes will be made and why they would make a better Orfeo ToolBox?
This is a proposal to provide a streaming mechanism for OGR data structures, which contain geometries (points, lines, polygons, ...).
Here is the outline of the development plan.
A DataObject for geometries
In order to use the ITK pipeline streaming mechanism, the OGRFeatures need to be encapsulated in an itk::DataObject. There are already 2 classes that are compatible:
otb::ogr::DataSource: wrapper class around GDALDataset/OGRDataSource.
otb::GeometriesSet: wrapper class to manipulate transparently
Using otb::ogr::DataSource is more straightforward, but otb::GeometriesSet comes with a set of base filters that already handle the creation and processing of Layers.
Some prototyping work has been done based on otb::GeometriesSet, see streaming_ogr.
The streaming mechanism has been partially implemented in this class, by overriding some generic functions:
virtual void SetRequestedRegionToLargestPossibleRegion()
virtual void CopyInformation(const DataObject *data)
virtual void SetRequestedRegion( const DataObject *data )
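As an illustration of these three overrides, here is a minimal, self-contained sketch. It does not use ITK: `DataObject` is a stand-in for itk::DataObject reduced to the three functions listed above, and `Range` and `GeometriesSetSketch` are hypothetical names introduced only for this example. It is not the code of the prototype.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for itk::DataObject, reduced to the three functions listed above.
class DataObject
{
public:
  virtual ~DataObject() = default;
  virtual void SetRequestedRegionToLargestPossibleRegion() = 0;
  virtual void CopyInformation(const DataObject* data) = 0;
  virtual void SetRequestedRegion(const DataObject* data) = 0;
};

// Hypothetical region type: a contiguous range of features.
struct Range
{
  long        first; // identifier of the first feature
  std::size_t count; // number of features in the range
};

// Hypothetical sketch, not the actual otb::GeometriesSet.
class GeometriesSetSketch : public DataObject
{
public:
  void SetLargestPossibleRegion(const Range& r) { m_Largest = r; }
  void SetRequestedRegion(const Range& r) { m_Requested = r; }
  const Range& GetLargestPossibleRegion() const { return m_Largest; }
  const Range& GetRequestedRegion() const { return m_Requested; }

  // Request everything the data set can provide.
  void SetRequestedRegionToLargestPossibleRegion() override
  {
    m_Requested = m_Largest;
  }

  // Copy the meta-information (here, only the largest region) from another set.
  void CopyInformation(const DataObject* data) override
  {
    const auto* other = dynamic_cast<const GeometriesSetSketch*>(data);
    assert(other != nullptr && "CopyInformation expects a geometries set");
    m_Largest = other->m_Largest;
  }

  // Copy the requested region from another set, if the types match.
  void SetRequestedRegion(const DataObject* data) override
  {
    const auto* other = dynamic_cast<const GeometriesSetSketch*>(data);
    if (other != nullptr)
    {
      m_Requested = other->m_Requested;
    }
  }

private:
  Range m_Largest{0, 0};
  Range m_Requested{0, 0};
};
```

In this sketch, CopyInformation() only transfers the largest region, so the requested region of the destination stays untouched until one of the two SetRequestedRegion functions is called.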
A region for geometries
The first step is to define a region that can apply to a set of geometries. This region can be used, as with image regions, to define:
- the total size of the set (i.e. largest region)
- the subset to process (i.e. requested region)
- the subset stored (i.e. buffered region)
The different geometries are stored in an OGRLayer, which behaves as a simple list of geometries. A simple type of region can be defined from a contiguous range of features:
- the FID of the first feature in the range
- the number of features in the range
Let's call this region a Range region.
Since the features also have a spatial extent, a Spatial region can also be defined using rectangular boxes:
- the coordinates of the top-left corner of the region
- the coordinates of the lower-right corner of the region
The features "inside" this region can be defined as the features intersecting the rectangular box. The only difference with a Range region is that a geometry can be inside several disjoint regions.
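The two region types can be sketched as plain structures. This is only an illustration under stated assumptions: the names `RangeRegion`, `SpatialRegion` and `Envelope` are hypothetical, the Y axis is assumed to point upwards (so the upper-left corner has the largest Y), and the intersection test is done on the bounding box of the feature rather than on its exact geometry.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical Range region: a contiguous range of feature identifiers.
struct RangeRegion
{
  long        firstFid; // FID of the first feature in the range
  std::size_t count;    // number of features in the range

  // True when fid falls inside [firstFid, firstFid + count).
  bool Contains(long fid) const
  {
    return fid >= firstFid && fid < firstFid + static_cast<long>(count);
  }
};

// Bounding box of a feature (assumption: minX <= maxX and minY <= maxY).
struct Envelope
{
  double minX, minY, maxX, maxY;
};

// Hypothetical Spatial region: a rectangular box given by two corners.
struct SpatialRegion
{
  double ulx, uly; // upper-left corner
  double lrx, lry; // lower-right corner

  // A feature is "inside" when its bounding box intersects the rectangle.
  bool Intersects(const Envelope& e) const
  {
    assert(ulx <= lrx && lry <= uly); // Y axis assumed to point upwards
    return e.minX <= lrx && e.maxX >= ulx && e.minY <= uly && e.maxY >= lry;
  }
};
```

With two adjacent boxes, one covering X from 0 to 10 and the other from 10 to 20, a feature whose envelope spans X from 8 to 12 intersects both of them, whereas a given FID belongs to only one of several non-overlapping Range regions.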
These two types of region are compatible with streaming. But there could be more ways to partition the set of geometries (by geometric shape, by field value, ...). This raises the question: what is the best implementation for these regions? Obviously, we will need a common interface for these different types of region. I can think of two implementations:
- A tagged union: we create a single class that has several modes (Range/Spatial/...). All the functions using this region class will have switch/case sections to handle the different modes. In this case, the region class is not a template, so it can be added as a member of GeometriesSet. This is the solution used in the prototype.
- A base class + N derived classes, one for each type of region: in this case, we have to define virtual methods that will be overridden to adapt the behavior of each type of region. With this solution, the storage in GeometriesSet is less straightforward: either store a pointer to the base class, or use a template for the region type.
As the 2 region types proposed (range & spatial) should be compatible for most of the use cases, I would go with the first solution.
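A minimal sketch of the tagged-union option could look like the following. The class name `GeometriesRegion`, its factory functions and the `IsInside()` method are hypothetical and chosen for the example; the region class of the prototype may be organized differently. The Y axis is again assumed to point upwards.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tagged-union region: one non-template class, several modes.
class GeometriesRegion
{
public:
  enum class Mode { Range, Spatial };

  static GeometriesRegion MakeRange(long firstFid, std::size_t count)
  {
    GeometriesRegion r;
    r.m_Mode = Mode::Range;
    r.m_FirstFid = firstFid;
    r.m_Count = count;
    return r;
  }

  static GeometriesRegion MakeSpatial(double ulx, double uly, double lrx, double lry)
  {
    assert(ulx <= lrx && lry <= uly); // Y axis assumed to point upwards
    GeometriesRegion r;
    r.m_Mode = Mode::Spatial;
    r.m_Ulx = ulx; r.m_Uly = uly; r.m_Lrx = lrx; r.m_Lry = lry;
    return r;
  }

  Mode GetMode() const { return m_Mode; }

  // True when this region is fully contained in 'larger'.
  // Regions of different modes are never considered nested.
  bool IsInside(const GeometriesRegion& larger) const
  {
    if (m_Mode != larger.m_Mode)
    {
      return false;
    }
    switch (m_Mode)
    {
      case Mode::Range:
        return m_FirstFid >= larger.m_FirstFid &&
               m_FirstFid + static_cast<long>(m_Count) <=
                 larger.m_FirstFid + static_cast<long>(larger.m_Count);
      case Mode::Spatial:
        return m_Ulx >= larger.m_Ulx && m_Lrx <= larger.m_Lrx &&
               m_Uly <= larger.m_Uly && m_Lry >= larger.m_Lry;
    }
    return false;
  }

private:
  GeometriesRegion() = default;

  Mode        m_Mode = Mode::Range;
  long        m_FirstFid = 0; // used in Range mode
  std::size_t m_Count = 0;    // used in Range mode
  double      m_Ulx = 0.0, m_Uly = 0.0, m_Lrx = 0.0, m_Lry = 0.0; // Spatial mode
};
```

The switch/case in IsInside() is the pattern every function using the region would have to repeat. The cost is that adding a new mode means revisiting each of these switches, which is acceptable as long as the number of modes stays small.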
Common filter for geometries
OGRDataSource objects differ from images because they don't have Reader/Writer classes. When an OGRDataSource is created on a file, it is actually a Reader/Writer itself. When created without a file, it is an in-memory DataObject.
It means that the wrapper class otb::GeometriesSet will have to do the job of reader and writer. The future pipeline will look like this:
                 +---------+                    +---------+
GeometriesSet1-->| Filter1 |-->GeometriesSet2-->| Filter2 |-->GeometriesSet3
                 +---------+                    +---------+
In this example:
- GeometriesSet1 ("the reader") will have to determine the largest region. It can be detected as a "reader" because it has no m_Source.
- GeometriesSet2 is an in-memory dataset.
- GeometriesSet3 ("the writer") will be in charge of the streaming execution: split the largest region, propagate each block and call UpdateOutputData()
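The writer-side loop can be sketched as follows for a Range region. This is an illustration only: `RangeRegion`, `SplitRange()` and `StreamOverBlocks()` are hypothetical names, FIDs are assumed to be contiguous, and the `process` callable stands in for "set the requested region, propagate it upstream and call UpdateOutputData()".

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical Range region: first FID and number of features.
struct RangeRegion
{
  long        firstFid;
  std::size_t count;
};

// Split the largest region into blocks of at most blockSize features.
std::vector<RangeRegion> SplitRange(const RangeRegion& largest, std::size_t blockSize)
{
  assert(blockSize > 0);
  std::vector<RangeRegion> blocks;
  for (std::size_t offset = 0; offset < largest.count; offset += blockSize)
  {
    const std::size_t n = std::min(blockSize, largest.count - offset);
    blocks.push_back({largest.firstFid + static_cast<long>(offset), n});
  }
  return blocks;
}

// Run 'process' once per block and return the number of features covered.
// In a real pipeline, 'process' would update the upstream filters for the block.
template <typename TProcess>
std::size_t StreamOverBlocks(const RangeRegion& largest, std::size_t blockSize, TProcess process)
{
  std::size_t processed = 0;
  for (const RangeRegion& block : SplitRange(largest, blockSize))
  {
    process(block);
    processed += block.count;
  }
  return processed;
}
```

For example, a largest region of 10 features streamed with blocks of 4 gives three blocks of 4, 4 and 2 features.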
The base class to derive the filters can be otb::GeometriesToGeometriesFilter. This class contains helper functions to explore the geometries in an otb::GeometriesSet. They use the available iterators over a Layer (with begin() and end()). In order to process a subset of the dataset, we may need different iterators. I can see 2 options:
- Create a new iterator that will take a RangeRegion as parameter. Then we may have to adapt existing code in
- Use a strategy similar to otb::ogr::Layer::SetSpatialFilter(): when the spatial filter is set on a Layer, only the Features in that filter are accessible through the standard begin()/end() iterators. We could define a SetRangeFilter() in the class Layer, so that begin()/end() will only run through the desired range. It would make the handling of requested regions transparent to the filter, and the behavior would be the same with SpatialRegions.
My preference goes to the second option.
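The second option can be illustrated with a small mock. The `Layer` and `Feature` classes below are simplified stand-ins, not otb::ogr::Layer, and `SetRangeFilter()` is the hypothetical function proposed above. The range is expressed here as positions in the layer, which assumes that features are stored contiguously.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified stand-in for a feature: only its identifier is kept.
struct Feature
{
  long fid;
};

// Mock layer whose begin()/end() honor an optional range filter.
class Layer
{
public:
  using const_iterator = std::vector<Feature>::const_iterator;

  explicit Layer(std::vector<Feature> features)
    : m_Features(std::move(features)), m_First(0), m_Count(m_Features.size())
  {
  }

  // Hypothetical function: restrict iteration to 'count' features from 'first'.
  void SetRangeFilter(std::size_t first, std::size_t count)
  {
    assert(first <= m_Features.size());
    m_First = first;
    m_Count = std::min(count, m_Features.size() - first);
  }

  const_iterator begin() const
  {
    return m_Features.begin() + static_cast<std::ptrdiff_t>(m_First);
  }

  const_iterator end() const
  {
    return begin() + static_cast<std::ptrdiff_t>(m_Count);
  }

private:
  std::vector<Feature> m_Features;
  std::size_t          m_First;
  std::size_t          m_Count;
};

// Filter-side code: it only uses begin()/end() and is unaware of regions.
long SumFids(const Layer& layer)
{
  long sum = 0;
  for (const Feature& f : layer)
  {
    sum += f.fid;
  }
  return sum;
}
```

The point of the design is visible in SumFids(): the processing code is unchanged whether or not a range filter is set, which is what makes the requested region transparent to the filter.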
To test and validate the pipeline, there is also a kind of "UnaryFunctorImageFilter" for geometries: otb::DefaultGeometriesToGeometriesFilter.
There isn't a lot to implement in the filter base class:
- the setting of the output buffered region after processing
- multi-threading using temporary Layers (as in PersistentSamplingFilterBase).
Integration into existing pipelines
One side target would be to integrate this framework into existing pipelines, such as:
- the sampling framework, which already supports streaming through a support image and spatial regions. This is difficult because the pipeline is a mix of images and vectors.
- the segmentation framework
This streaming framework could also allow a better implementation of rasterization/vectorization applications.
When will those changes be available (target release or date)?
Probably not before release 6.4.
Who will be developing the proposed changes?
Community
List here important comments (possibly links) received.
List here community members that support this RFComments.
Corresponding Requests for Changes
No request for change yet.