OptimizingVectorImageProcessing

From OTBWiki
Revision as of 17:44, 26 May 2015 by Luc.hermitte (Talk | contribs) (First draft)

Here I trace the process I followed to optimize the processing of an otb::VectorImage.

The context

We start from a mono-band image that we resample with an itk::LinearInterpolateImageFunction used by an otb::StreamingResampleImageFilter.

When the image is processed as an otb::Image, the execution takes around 20 seconds. When it is processed as an otb::VectorImage, the execution takes around 1 minute and 5 to 10 seconds. Nothing here is really unexpected, as vector images are known to be inefficient. Aren't they?

Let's see what callgrind has to say about this situation.

BeforeOptims-Warp.png

As we can see, most of the time is spent in memory allocations and releases. If we zoom into itk::LinearInterpolateImageFunction<>::EvaluateOptimized(), we can see that each call to this function is accompanied by 4 calls to itk::VariableLengthVector::AllocateElements(), and 6 calls to delete[].

BeforeOptims-LinerInterp.png

(Actually a few optimizations have already been done, but they are not enough)
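
To make the source of these allocations more concrete, here is a minimal sketch of the mechanism. It does not use ITK at all: ToyPixel, g_allocations and interpolate_naive are hypothetical names invented for this illustration, and ToyPixel is only a rough stand-in for itk::VariableLengthVector. It assumes that each arithmetic operator returns a new pixel, so the counts differ from the 4 allocations and 6 releases seen in the profile, but the pattern is the same: every intermediate value of the interpolation formula needs its own heap buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Global counter, bumped every time a ToyPixel acquires a new heap buffer.
std::size_t g_allocations = 0;

// Hypothetical stand-in for itk::VariableLengthVector: the number of bands
// is only known at run time, so the values live on the heap.
class ToyPixel {
public:
    ToyPixel(std::size_t bands, double value) : m_data(bands, value) { ++g_allocations; }
    ToyPixel(const ToyPixel& other) : m_data(other.m_data) { ++g_allocations; }
    ToyPixel(ToyPixel&&) noexcept = default;  // moving steals the buffer: no allocation

    std::size_t size() const { return m_data.size(); }
    double& operator[](std::size_t i) { return m_data[i]; }
    double operator[](std::size_t i) const { return m_data[i]; }

private:
    std::vector<double> m_data;
};

// Each binary operator builds its result in a brand new pixel: one allocation.
ToyPixel operator-(const ToyPixel& a, const ToyPixel& b) {
    assert(a.size() == b.size());
    ToyPixel r(a);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] -= b[i];
    return r;
}

ToyPixel operator*(const ToyPixel& a, double d) {
    ToyPixel r(a);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] *= d;
    return r;
}

ToyPixel operator+(const ToyPixel& a, const ToyPixel& b) {
    assert(a.size() == b.size());
    ToyPixel r(a);
    for (std::size_t i = 0; i < r.size(); ++i) r[i] += b[i];
    return r;
}

// Linear interpolation written the natural way: v0 + (v1 - v0) * d.
// Three operators are evaluated, so three pixels are allocated per call.
ToyPixel interpolate_naive(const ToyPixel& v0, const ToyPixel& v1, double d) {
    return v0 + (v1 - v0) * d;
}
```

In this toy model, one call to interpolate_naive() on two already existing pixels bumps g_allocations by three, whatever the number of bands. Repeated for every pixel of an image, this is the kind of cost that dominates the profile above.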

The problems

New pixel value => new pixel created

Temporaries elimination

Pixel casting => new pixel created

Pixel assignment => reallocation + copy of old values

Dynamic polymorphism

Proxy Pixels

MT safety of cached pixel

The solutions

The results

Iterative arithmetic

The solution with:

  • thread-safe cached pixels,
  • iterative pixel arithmetic (i.e. m_val01 = GetPixel(...); m_val02 = GetPixel(...); m_val02 -= m_val01; m_val02 *= d; m_val02 += m_val01;),
  • EvaluateOptimized() that returns pixel values through an [out] parameter.

runs in 20 seconds (with 1 thread), and we can see in the callgrind profile that no allocations (or releases) are performed for each pixel.

AfterOptims-Warp.png
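
The three ingredients listed above can be sketched as follows. This is not the actual ITK/OTB patch: BandPixel, ToyInterpolator, Scratch and g_buffer_resizes are hypothetical names, the interpolation is reduced to one dimension, and "thread-safe cached pixels" is assumed here to mean one pair of scratch pixels per thread, selected with a thread index. The counter tracks buffer resizes as a proxy for heap allocations.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Bumped whenever a BandPixel has to resize its buffer (proxy for allocations).
std::size_t g_buffer_resizes = 0;

// Hypothetical variable-length pixel offering in-place arithmetic only.
class BandPixel {
public:
    // Copy `bands` values from src, resizing only if the band count changed.
    void assign(const double* src, std::size_t bands) {
        if (m_data.size() != bands) { m_data.resize(bands); ++g_buffer_resizes; }
        std::copy(src, src + bands, m_data.begin());
    }
    void assign(const BandPixel& other) { assign(other.m_data.data(), other.m_data.size()); }

    BandPixel& operator-=(const BandPixel& rhs) {
        assert(m_data.size() == rhs.m_data.size());
        for (std::size_t i = 0; i < m_data.size(); ++i) m_data[i] -= rhs.m_data[i];
        return *this;
    }
    BandPixel& operator+=(const BandPixel& rhs) {
        assert(m_data.size() == rhs.m_data.size());
        for (std::size_t i = 0; i < m_data.size(); ++i) m_data[i] += rhs.m_data[i];
        return *this;
    }
    BandPixel& operator*=(double d) {
        for (double& v : m_data) v *= d;
        return *this;
    }
    double operator[](std::size_t i) const { return m_data[i]; }

private:
    std::vector<double> m_data;
};

// Hypothetical 1-D linear interpolator over a row of interleaved band samples.
class ToyInterpolator {
public:
    ToyInterpolator(std::vector<double> samples, std::size_t bands, std::size_t threads)
        : m_samples(std::move(samples)), m_bands(bands), m_cache(threads) {}

    // The result is written through the [out] parameter; thread_id selects the
    // scratch pixels, so two threads never share m_val01/m_val02 equivalents.
    void evaluate(std::size_t index, double d, std::size_t thread_id, BandPixel& out) {
        assert(thread_id < m_cache.size());
        assert((index + 2) * m_bands <= m_samples.size());
        Scratch& s = m_cache[thread_id];
        s.val01.assign(m_samples.data() + index * m_bands, m_bands);
        s.val02.assign(m_samples.data() + (index + 1) * m_bands, m_bands);
        s.val02 -= s.val01;   // v1 - v0
        s.val02 *= d;         // (v1 - v0) * d
        s.val02 += s.val01;   // v0 + (v1 - v0) * d
        out.assign(s.val02);  // reuses the buffer of out once it has the right size
    }

private:
    struct Scratch { BandPixel val01, val02; };
    std::vector<double> m_samples;
    std::size_t m_bands;
    std::vector<Scratch> m_cache;  // one pair of cached pixels per thread
};
```

In this sketch the first call to evaluate() on a given thread sizes the two scratch pixels and the output pixel. Every later call with the same band count reuses those buffers, which mirrors the absence of per-pixel allocations in the profile above.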

Expression Templates

Perspectives