Metadata handling

Introduction

Metadata handling in OTB is currently messy. We need a better way to organize this information so that the metadata attached to objects (images and vector data) is easier to use, generate and export. Metadata covers the information attached to images (sensor model parameters, acquisition date and time, radiometric correction parameters, etc.) and the information attached to vector data (class of a polygon, name of a street, etc.).

History and current situation

Images

All this information is currently stored in the itk::MetaDataDictionary (http://www.orfeo-toolbox.org/doxygen-current/classitk_1_1MetaDataDictionary.html) which is tied to any itk::Object. The class otb::MetaDataKey (http://www.orfeo-toolbox.org/doxygen-current/classotb_1_1MetaDataKey.html) was created to facilitate the identification of the correct keynames.

Information in this structure is currently accessed mostly through a verbose, impractical pattern:

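// Read the projection reference from the dictionary, if the key is present.
std::string metadata;
if (dict.HasKey(MetaDataKey::ProjectionRefKey))
 {
   // ExposeMetaData copies the stored value into 'metadata'.
   itk::ExposeMetaData<std::string>(dict, static_cast<std::string>(MetaDataKey::ProjectionRefKey), metadata);
   return metadata;
 }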


Similarly, metadata is added through a call to EncapsulateMetaData. Both functions are defined in http://www.orfeo-toolbox.org/doxygen-current/itkMetaDataObject_8h-source.html

We need to explore whether the [] operator and the MetaDataObject could lead to a better and more practical syntax.
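
As a rough illustration of what a more practical syntax might look like, here is a minimal, self-contained sketch. It deliberately does not use the ITK classes: Dictionary, Get and ProjectionRefOrEmpty are hypothetical names, and the "ProjectionRef" string is only an illustrative key. The idea is that operator[] handles writing, while a typed Get returns an empty result when the key is missing or holds another type.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a string-keyed metadata dictionary.
// It is NOT itk::MetaDataDictionary; it only mimics "key -> value of any type".
class Dictionary
{
public:
  // Write access: creates the entry if it does not exist yet.
  std::any& operator[](const std::string& key)
  {
    assert(!key.empty()); // assumption of this sketch: keys are never empty
    return m_Entries[key];
  }

  bool HasKey(const std::string& key) const { return m_Entries.count(key) != 0; }

  // Typed read access: empty optional if the key is missing or holds another type.
  template <typename T>
  std::optional<T> Get(const std::string& key) const
  {
    const auto it = m_Entries.find(key);
    if (it == m_Entries.end())
    {
      return std::nullopt;
    }
    if (const T* value = std::any_cast<T>(&it->second))
    {
      return *value;
    }
    return std::nullopt;
  }

private:
  std::map<std::string, std::any> m_Entries;
};

// A single call replaces the HasKey / ExposeMetaData pair.
inline std::string ProjectionRefOrEmpty(const Dictionary& dict)
{
  return dict.Get<std::string>("ProjectionRef").value_or("");
}
```

With such a wrapper, writing becomes dict["ProjectionRef"] = std::string("...") and reading becomes dict.Get<std::string>("ProjectionRef"). Whether the real dictionary can offer this without extra copies would have to be checked against the ITK implementation.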


Originally, the methods that retrieve information from the dictionary were direct members of the otb::Image and otb::VectorImage classes. This unnecessarily complicated these classes and increased their dependencies.

To better separate this access, otb::ImageMetadataInterface (http://www.orfeo-toolbox.org/doxygen-current/classotb_1_1ImageMetadataInterface.html) was created. It works directly on the itk::MetaDataDictionary and defines accessors for the most common requests. This class is used, for example, to retrieve the radiometric correction parameters in http://www.orfeo-toolbox.org/doxygen-current/otbImageToLuminanceImageFilter_8h-source.html

The syntax and usage of this latter class are still not intuitive and need to be improved.

Internally, in the itk::MetaDataDictionary, all information retrieved from OSSIM is stored directly as an otb::ImageKeywordlist (http://www.orfeo-toolbox.org/doxygen-current/classotb_1_1ImageKeywordlist.html), which is simply a storage place for an ossimKeywordlist. Carrying an ossimKeywordlist has both advantages and drawbacks: no conversion is needed to store the data, so we are sure that no information is lost, but the operations needed to read a given parameter are specific to OSSIM.

Vector Data

The structure developed for the VectorData metadata is very similar to the one used for images: an otb::VectorDataKeywordlist (http://www.orfeo-toolbox.org/doxygen-current/classotb_1_1VectorDataKeywordlist.html) has been defined. This class encapsulates OGRFieldDefn and OGRField, which are the OGR structures that handle the metadata.

Access to the metadata has been added to the otb::DataNode (http://www.orfeo-toolbox.org/doxygen-current/classotb_1_1DataNode.html), which appears to repeat the mistake made earlier with the image classes. This highlights the importance of providing generic and practical accessors directly from the object, without the need to go into the details of the implementation.

What we would like to have

We should make clear here whether we just want a simple way to access meta-data (syntactic sugar) or whether we want to go further and model the meta-data. The second option seems more interesting, but it needs to be fairly complete with respect to the meta-data we want to deal with. We also have to make sure that it really makes sense.

The first question that arises is: should we have a single model for vector and raster meta-data?

If no meta-data modeling is needed, the most flexible approach seems to be to keep storing meta-data as it is stored right now (in a dictionary) and to concentrate on the syntactic sugar aspect. In this case, the aspects that have to be analyzed are:

  • What is an intuitive syntax? Do we want to use the [] operator? Do we prefer one accessor per field, etc.?
  • How do we find a good trade-off between flexibility (easy to add new meta-data fields) and fool-proofing (don't let the user ask for a nonexistent field)?
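
One possible answer to the trade-off in the second point is to keep the dictionary but replace raw string keys with typed key objects. The sketch below is self-contained and entirely hypothetical: Key, TypedDictionary and the three declared keys are invented for illustration and are not the actual otb::MetaDataKey entries. Adding a field stays a one-line change, while a misspelled key name or a wrong value type is rejected at compile time. A field that is declared but was never set is still only detected at run time.

```cpp
#include <any>
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical typed key: the value type is part of the key itself.
template <typename T>
struct Key
{
  using ValueType = T;
  const char* name;
};

// Keys are declared once, in one place (illustrative names only).
inline constexpr Key<std::string> SensorName{"SensorName"};
inline constexpr Key<double>      SunElevation{"SunElevation"};
inline constexpr Key<std::string> AcquisitionTime{"AcquisitionTime"};

class TypedDictionary
{
public:
  // T is deduced from the key only, so the value must convert to the key's type.
  template <typename T>
  void Set(Key<T> key, typename Key<T>::ValueType value)
  {
    assert(key.name != nullptr);
    m_Entries[key.name] = std::move(value);
  }

  template <typename T>
  bool Has(Key<T> key) const
  {
    return m_Entries.count(key.name) != 0;
  }

  // The return type follows from the key; no type has to be spelled by the caller.
  template <typename T>
  T Get(Key<T> key) const
  {
    const auto it = m_Entries.find(key.name);
    if (it == m_Entries.end())
    {
      throw std::out_of_range(std::string("missing metadata field: ") + key.name);
    }
    return std::any_cast<T>(it->second);
  }

private:
  std::map<std::string, std::any> m_Entries;
};
```

With this sketch, dict.Set(SunElevation, 42.5) compiles and dict.Get(SunElevation) returns a double, whereas dict.Set(SunElevation, std::string("high")) or a reference to an undeclared key does not compile.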


New implementation details

It is important to note that meta-data are usually stored as strings, because this is the "most generic" way to store different data types. However, this introduces some complexity (translation from and to strings). It would be nice to store each meta-data field using the right type for it (date, coordinates, names of sensors, etc.).

If we try to stay consistent with the ITK style, we should use accessors for the meta-data, which helps to fool-proof the calling code. However, the macro mechanism used in ITK to create setters and getters for class attributes is not flexible enough: we would have to write the appropriate macros for each type of meta-data field.

One solution to that can be the use of type lists. In this way, the meta-data is just a list of types (date, coordinates, etc.). The accessors can be automatically generated by the compiler. We can also force a type of data (image, vector) to only have the meta-data which makes sense for it (no resolution for a vector data), while keeping the same approach in terms of data modeling.

As a conclusion, we could define a set of types for describing meta-data (acquisition date, projection, etc.) and build type lists for the meta-data of each specific type of object (image, vector data). The automatic code generation capabilities of the compiler could produce the accessors for the meta-data from the object (image->GetAcquisitionDate()), so that the meta-data access is checked at compile time.
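
The type-list idea can be sketched with variadic templates, as below. This is only one possible realization under stated assumptions: the field types (AcquisitionDate, ProjectionRef, PixelSpacing), the Metadata template, the MakeDate helper and the two aliases are all hypothetical, and the sketch exposes a generic Get<Field>() rather than a named accessor such as GetAcquisitionDate().

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical field types: each metadata field is its own C++ type.
struct AcquisitionDate { int year = 0; int month = 0; int day = 0; };
struct ProjectionRef   { std::string wkt; };
struct PixelSpacing    { double x = 0.0; double y = 0.0; };

// Small helper with a basic sanity check on the month.
inline AcquisitionDate MakeDate(int year, int month, int day)
{
  assert(month >= 1 && month <= 12);
  return AcquisitionDate{year, month, day};
}

// A metadata set is a list of field types; one value is stored per type.
template <typename... Fields>
class Metadata
{
public:
  // True if Field is one of the listed types.
  template <typename Field>
  static constexpr bool Contains = (std::is_same_v<Field, Fields> || ...);

  template <typename Field>
  const Field& Get() const
  {
    static_assert(Contains<Field>, "this object type does not carry that metadata field");
    return std::get<Field>(m_Values);
  }

  template <typename Field>
  void Set(Field value)
  {
    static_assert(Contains<Field>, "this object type does not carry that metadata field");
    std::get<Field>(m_Values) = std::move(value);
  }

private:
  std::tuple<Fields...> m_Values;
};

// Each kind of object lists only the fields that make sense for it.
using ImageMetadata      = Metadata<AcquisitionDate, ProjectionRef, PixelSpacing>;
using VectorDataMetadata = Metadata<ProjectionRef>;
```

Here a call such as VectorDataMetadata().Get<PixelSpacing>() fails to compile, which is the compile-time check described above. A named accessor like image->GetAcquisitionDate() could simply forward to Get<AcquisitionDate>().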


Implementation

OTB reads metadata for several sensors, so the otbImageMetadataInterface file is becoming huge. To simplify it, we can create one specific file per sensor to read that sensor's metadata. All those classes can inherit from a virtual base class that declares every needed method. otbImageMetadataInterface will then just have to gather all the sensor-specific files.
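
A minimal sketch of this split is given below, under loud assumptions: the class names, the CreateInterface function, the KeywordList alias and the keys "sensor", "sun_elevation" and "sun_zenith" are all invented for illustration and do not correspond to the real OTB classes or OSSIM keywords. Each sensor class knows how to recognize and read its own metadata, and a small factory picks the first one that applies.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical keyword list (plain strings, as a stand-in for the real storage).
using KeywordList = std::map<std::string, std::string>;

// Abstract base class declaring every method the sensor classes must provide.
class SensorMetadataInterface
{
public:
  virtual ~SensorMetadataInterface() = default;
  virtual bool CanRead(const KeywordList& kwl) const = 0;
  virtual double GetSunElevation(const KeywordList& kwl) const = 0;

protected:
  static bool IsSensor(const KeywordList& kwl, const std::string& id)
  {
    const auto it = kwl.find("sensor");
    return it != kwl.end() && it->second == id;
  }
};

// Invented sensor A: stores the sun elevation directly.
class SensorAMetadataInterface : public SensorMetadataInterface
{
public:
  bool CanRead(const KeywordList& kwl) const override { return IsSensor(kwl, "SensorA"); }
  double GetSunElevation(const KeywordList& kwl) const override
  {
    assert(CanRead(kwl));
    return std::stod(kwl.at("sun_elevation"));
  }
};

// Invented sensor B: stores the zenith angle, so elevation = 90 - zenith.
class SensorBMetadataInterface : public SensorMetadataInterface
{
public:
  bool CanRead(const KeywordList& kwl) const override { return IsSensor(kwl, "SensorB"); }
  double GetSunElevation(const KeywordList& kwl) const override
  {
    assert(CanRead(kwl));
    return 90.0 - std::stod(kwl.at("sun_zenith"));
  }
};

// The generic entry point only gathers the sensor classes and picks one.
inline std::unique_ptr<SensorMetadataInterface> CreateInterface(const KeywordList& kwl)
{
  std::vector<std::unique_ptr<SensorMetadataInterface>> candidates;
  candidates.push_back(std::make_unique<SensorAMetadataInterface>());
  candidates.push_back(std::make_unique<SensorBMetadataInterface>());
  for (auto& candidate : candidates)
  {
    if (candidate->CanRead(kwl))
    {
      return std::move(candidate);
    }
  }
  return nullptr; // no known sensor matches
}
```

Supporting a new sensor then means adding one class and one line in the factory, without touching the existing readers.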