OptimizingVectorImageProcessing

From OTBWiki

Revision as of 17:44, 26 May 2015

Here I'll trace the process I followed to optimize the processing of an otb::VectorImage.

The context

We start from a mono-band image that we resample with an itk::LinearInterpolateImageFunction, used by an otb::StreamingResampleImageFilter.

When the image is processed as an otb::Image, the execution takes around 20 seconds. When it is processed as an otb::VectorImage, the execution takes between 1 min 05 s and 1 min 10 s, i.e. more than three times longer. Nothing really unexpected here, as vector images are known to be inefficient. Aren't they?

Let's see what callgrind has to say about this situation.

BeforeOptims-Warp.png

As we can see, most of the time is spent in memory allocations and releases. If we zoom into itk::LinearInterpolateImageFunction<>::EvaluateOptimized(), we can see that each call to this function is accompanied by 4 calls to itk::VariableLengthVector::AllocateElements() and 6 calls to delete[].
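To see where such allocations come from, here is a deliberately simplified model. It is not ITK code: Pixel is a hypothetical stand-in for a variable-length pixel whose values live in a heap buffer, and g_allocs is a counter added for the demonstration. Written the "natural" way, v0 + (v1 - v0) * d creates a new pixel for every arithmetic operator, so each interpolation allocates (and then frees) several buffers. The counts are not meant to match the ones callgrind reports for itk::VariableLengthVector; only the pattern matters.

```cpp
#include <cassert>
#include <cstddef>

// Counts the heap allocations made by Pixel (instrumentation for the demo).
static std::size_t g_allocs = 0;

// Hypothetical, heavily simplified variable-length pixel (NOT
// itk::VariableLengthVector): the number of bands is only known at run
// time, so the values live in a heap buffer.
class Pixel {
public:
  explicit Pixel(std::size_t n) : m_size(n), m_data(new double[n]()) { ++g_allocs; }
  Pixel(Pixel&& o) noexcept : m_size(o.m_size), m_data(o.m_data) {
    o.m_size = 0;
    o.m_data = nullptr;  // moving steals the buffer: no allocation
  }
  Pixel(const Pixel&) = delete;
  Pixel& operator=(const Pixel&) = delete;
  ~Pixel() { delete[] m_data; }

  std::size_t size() const { return m_size; }
  double& operator[](std::size_t i) { return m_data[i]; }
  double operator[](std::size_t i) const { return m_data[i]; }

private:
  std::size_t m_size;
  double* m_data;
};

// Each operator returns a brand-new pixel, hence one allocation per operator.
Pixel operator-(const Pixel& a, const Pixel& b) {
  assert(a.size() == b.size());
  Pixel r(a.size());
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] - b[i];
  return r;
}
Pixel operator+(const Pixel& a, const Pixel& b) {
  assert(a.size() == b.size());
  Pixel r(a.size());
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] + b[i];
  return r;
}
Pixel operator*(const Pixel& a, double d) {
  Pixel r(a.size());
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = a[i] * d;
  return r;
}

// Linear interpolation returning by value: (v1 - v0), then * d, then the
// sum with v0, i.e. three new pixels per call, two of them discarded at once.
Pixel LinearInterpolate(const Pixel& v0, const Pixel& v1, double d) {
  return v0 + (v1 - v0) * d;
}

// Number of pixel allocations performed by a single interpolation.
std::size_t AllocationsPerCall(std::size_t nbBands) {
  Pixel v0(nbBands), v1(nbBands);
  const std::size_t before = g_allocs;
  Pixel result = LinearInterpolate(v0, v1, 0.5);
  return g_allocs - before;
}
```

With this toy, AllocationsPerCall(4) returns 3: one buffer for v1 - v0, one for the scaled difference and one for the final sum. The two temporaries are released before the call returns, which is where the matching delete[] calls come from.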

BeforeOptims-LinerInterp.png

(Note that a few optimizations had already been applied at this point, but they were not enough.)

The problems

New pixel value => new pixel created

Temporaries elimination

Pixel casting => new pixel created

Pixel assignment => reallocation + copy of old values
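The assignment problem can be reproduced with the same kind of toy pixel (hypothetical, not itk::VariableLengthVector). AssignRealloc() always throws the destination buffer away and allocates a new one, whereas AssignReuse() only reallocates when the number of bands differs. Since all the pixels of a given vector image have the same number of bands, the second policy never allocates inside a pixel loop. AllocsForRepeatedAssign() is a hypothetical helper that counts the allocations of such a loop.

```cpp
#include <cassert>
#include <cstddef>

// Counts the heap allocations made by Pixel (instrumentation for the demo).
static std::size_t g_allocs = 0;

// Hypothetical toy pixel (NOT itk::VariableLengthVector) exposing two
// assignment policies so that they can be compared.
class Pixel {
public:
  explicit Pixel(std::size_t n) : m_size(n), m_data(new double[n]()) { ++g_allocs; }
  Pixel(const Pixel&) = delete;
  Pixel& operator=(const Pixel&) = delete;
  ~Pixel() { delete[] m_data; }

  // Policy 1: always release the old buffer and allocate a new one,
  // even when both pixels already have the same number of bands.
  Pixel& AssignRealloc(const Pixel& o) {
    if (this != &o) {
      double* fresh = new double[o.m_size];  // allocate first (exception safety)
      ++g_allocs;
      delete[] m_data;
      m_data = fresh;
      m_size = o.m_size;
      CopyValues(o);
    }
    return *this;
  }

  // Policy 2: reallocate only when the number of bands changes; otherwise
  // overwrite the existing values in place.
  Pixel& AssignReuse(const Pixel& o) {
    if (this == &o) return *this;
    if (m_size != o.m_size) {
      double* fresh = new double[o.m_size];
      ++g_allocs;
      delete[] m_data;
      m_data = fresh;
      m_size = o.m_size;
    }
    CopyValues(o);
    return *this;
  }

private:
  void CopyValues(const Pixel& o) {
    for (std::size_t i = 0; i < m_size; ++i) m_data[i] = o.m_data[i];
  }
  std::size_t m_size;
  double* m_data;
};

// Assigns the same source pixel `count` times (as a pixel loop would do)
// and returns the number of allocations triggered by the loop.
std::size_t AllocsForRepeatedAssign(bool reuse, std::size_t count, std::size_t nbBands) {
  Pixel src(nbBands), dst(nbBands);
  const std::size_t before = g_allocs;
  for (std::size_t k = 0; k < count; ++k) {
    if (reuse) dst.AssignReuse(src);
    else dst.AssignRealloc(src);
  }
  return g_allocs - before;
}
```

For 1000 assignments of a 4-band pixel, AllocsForRepeatedAssign(false, 1000, 4) returns 1000 while AllocsForRepeatedAssign(true, 1000, 4) returns 0.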

Dynamic polymorphism

Proxy Pixels

MT safety of cached pixels
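Caching a scratch pixel as a data member of the interpolator removes the per-call allocation, but in a multi-threaded filter the same interpolator instance is usually called by all the worker threads, so a single cached pixel would be overwritten concurrently. One possible fix, sketched below as an assumption rather than as OTB's actual solution, is to keep one cached pixel per thread and let each caller pass its thread index (ITK filters receive such an index in ThreadedGenerateData()). CachedInterpolator, Evaluate() and RunConcurrently() are hypothetical names, and a plain std::vector<double> stands in for the pixel type.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical interpolator (not OTB/ITK code) that caches one scratch
// pixel per thread instead of a single shared member. Each caller passes
// its thread index, assumed to be in [0, nbThreads).
class CachedInterpolator {
public:
  CachedInterpolator(unsigned nbThreads, std::size_t nbBands)
      : m_cache(nbThreads, std::vector<double>(nbBands)) {}  // allocated once

  // out = v0 + (v1 - v0) * d, using the calling thread's own scratch pixel.
  void Evaluate(const std::vector<double>& v0, const std::vector<double>& v1,
                double d, std::vector<double>& out, unsigned threadId) const {
    assert(threadId < m_cache.size());
    std::vector<double>& diff = m_cache[threadId];  // private to this thread
    assert(v0.size() == diff.size() && v1.size() == diff.size() && out.size() == diff.size());
    for (std::size_t i = 0; i < diff.size(); ++i) diff[i] = (v1[i] - v0[i]) * d;
    for (std::size_t i = 0; i < diff.size(); ++i) out[i] = v0[i] + diff[i];
  }

private:
  // Never resized after construction, so threads only touch distinct slots.
  mutable std::vector<std::vector<double>> m_cache;
};

// Shares one interpolator between nbThreads threads and checks that every
// thread always gets its own expected result.
bool RunConcurrently(unsigned nbThreads) {
  const std::size_t nbBands = 8;
  const CachedInterpolator interp(nbThreads, nbBands);
  std::vector<int> ok(nbThreads, 0);  // not vector<bool>: one object per thread
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < nbThreads; ++t) {
    workers.emplace_back([&, t] {
      const double base = static_cast<double>(t);
      std::vector<double> v0(nbBands, base), v1(nbBands, base + 2.0), out(nbBands);
      bool good = true;
      for (int it = 0; it < 10000; ++it) {
        interp.Evaluate(v0, v1, 0.5, out, t);
        for (double x : out) good = good && (x == base + 1.0);
      }
      ok[t] = good ? 1 : 0;
    });
  }
  for (std::thread& w : workers) w.join();
  for (int flag : ok) {
    if (flag == 0) return false;
  }
  return true;
}
```

RunConcurrently(4) should return true: each thread only ever writes to m_cache[threadId], so no lock is needed on the hot path. The price is one scratch pixel per thread, and the cache must not be resized while threads are running.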

The solutions

The results

Iterative arithmetic

The solution with:

  • thread-safe cached pixels,
  • iterative pixel arithmetic (i.e. m_val01 = GetPixel(...); m_val02 = GetPixel(...); m_val02 -= m_val01; m_val02 *= d; m_val02 += m_val01;),
  • EvaluateOptimized() that returns pixel values through an [out] parameter.
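The following is a minimal single-threaded sketch of how these ideas combine, using a toy Pixel type (hypothetical, not ITK's) and a toy 1-D interpolator over a band-interleaved buffer; LinearInterpolator and its members are illustrative names only. m_val01 and m_val02 are allocated once in the constructor; each EvaluateOptimized() call then only copies values, applies the in-place operators -=, *= and +=, and writes its result into the caller's out pixel. Because the scratch pixels are plain members here, one instance must not be shared between threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts the heap allocations made by Pixel (instrumentation for the demo).
static std::size_t g_allocs = 0;

// Hypothetical toy pixel (NOT itk::VariableLengthVector) offering only
// in-place operations, so no arithmetic can silently create a new pixel.
class Pixel {
public:
  explicit Pixel(std::size_t n) : m_size(n), m_data(new double[n]()) { ++g_allocs; }
  Pixel(const Pixel&) = delete;  // copies would allocate: forbid them
  Pixel& operator=(const Pixel&) = delete;
  ~Pixel() { delete[] m_data; }

  std::size_t Size() const { return m_size; }
  const double* Data() const { return m_data; }
  double operator[](std::size_t i) const { return m_data[i]; }

  // Overwrites the values in place (same number of bands, no allocation).
  void CopyFrom(const double* src) {
    for (std::size_t i = 0; i < m_size; ++i) m_data[i] = src[i];
  }
  Pixel& operator-=(const Pixel& o) {
    assert(o.m_size == m_size);
    for (std::size_t i = 0; i < m_size; ++i) m_data[i] -= o.m_data[i];
    return *this;
  }
  Pixel& operator+=(const Pixel& o) {
    assert(o.m_size == m_size);
    for (std::size_t i = 0; i < m_size; ++i) m_data[i] += o.m_data[i];
    return *this;
  }
  Pixel& operator*=(double d) {
    for (std::size_t i = 0; i < m_size; ++i) m_data[i] *= d;
    return *this;
  }

private:
  std::size_t m_size;
  double* m_data;
};

// Hypothetical 1-D linear interpolator over a band-interleaved buffer
// (pixel k occupies buffer[k * nbBands] .. buffer[k * nbBands + nbBands - 1]).
class LinearInterpolator {
public:
  LinearInterpolator(const std::vector<double>& buffer, std::size_t nbBands)
      : m_buffer(buffer), m_nbBands(nbBands), m_val01(nbBands), m_val02(nbBands) {}

  // Evaluates at continuous position x and writes the result into `out`
  // ([out] parameter): after construction, nothing is allocated here.
  void EvaluateOptimized(double x, Pixel& out) {
    assert(x >= 0.0);
    const std::size_t i = static_cast<std::size_t>(x);
    assert((i + 2) * m_nbBands <= m_buffer.size());  // pixels i and i+1 exist
    assert(out.Size() == m_nbBands);
    const double d = x - static_cast<double>(i);

    m_val01.CopyFrom(&m_buffer[i * m_nbBands]);        // m_val01 = GetPixel(i)
    m_val02.CopyFrom(&m_buffer[(i + 1) * m_nbBands]);  // m_val02 = GetPixel(i+1)
    m_val02 -= m_val01;                                // v1 - v0
    m_val02 *= d;                                      // (v1 - v0) * d
    m_val02 += m_val01;                                // v0 + (v1 - v0) * d
    out.CopyFrom(m_val02.Data());
  }

private:
  const std::vector<double>& m_buffer;  // must outlive the interpolator
  std::size_t m_nbBands;
  Pixel m_val01;  // cached scratch pixels, allocated once
  Pixel m_val02;
};
```

With a buffer holding the two pixels {0, 10, 20} and {2, 12, 22}, a loop of 1000 calls to EvaluateOptimized(0.5, out) leaves g_allocs unchanged and produces {1, 11, 21} in out.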

runs in 20 seconds (with 1 thread), and the callgrind profile shows that no allocations (or releases) are performed per pixel anymore.

AfterOptims-Warp.png

Expression Templates
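Expression templates make it possible to keep the natural form v0 + (v1 - v0) * d while still evaluating it band by band in a single loop, without any intermediate pixel: the operators only build lightweight expression objects, and the loop runs when the expression is assigned to a pixel. The following is a minimal self-contained sketch of the technique; Expr, Add, Sub, Scale and Stored are hypothetical names, and this is not ITK's or OTB's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// CRTP base: every pixel expression knows its concrete type.
template <typename E>
struct Expr {
  const E& self() const { return static_cast<const E&>(*this); }
};

// Concrete pixel: the only type that owns values.
struct Pixel : Expr<Pixel> {
  std::vector<double> data;
  explicit Pixel(std::size_t n) : data(n) {}
  Pixel(std::initializer_list<double> v) : data(v) {}
  std::size_t size() const { return data.size(); }
  double operator[](std::size_t i) const { return data[i]; }

  // Assigning an expression runs the single loop over the bands.
  template <typename E>
  Pixel& operator=(const Expr<E>& expr) {
    const E& e = expr.self();
    assert(e.size() == size());
    for (std::size_t i = 0; i < size(); ++i) data[i] = e[i];
    return *this;
  }
};

// Storage policy: pixels are held by reference, expression nodes by value,
// so a stored expression never refers to a destroyed temporary node.
template <typename T> struct Stored { using type = const T; };
template <> struct Stored<Pixel> { using type = const Pixel&; };

template <typename L, typename R>
struct Add : Expr<Add<L, R>> {
  typename Stored<L>::type l;
  typename Stored<R>::type r;
  Add(const L& a, const R& b) : l(a), r(b) {}
  std::size_t size() const { return l.size(); }
  double operator[](std::size_t i) const { return l[i] + r[i]; }
};

template <typename L, typename R>
struct Sub : Expr<Sub<L, R>> {
  typename Stored<L>::type l;
  typename Stored<R>::type r;
  Sub(const L& a, const R& b) : l(a), r(b) {}
  std::size_t size() const { return l.size(); }
  double operator[](std::size_t i) const { return l[i] - r[i]; }
};

template <typename E>
struct Scale : Expr<Scale<E>> {
  typename Stored<E>::type e;
  double d;
  Scale(const E& x, double factor) : e(x), d(factor) {}
  std::size_t size() const { return e.size(); }
  double operator[](std::size_t i) const { return e[i] * d; }
};

// The operators build expression objects; no band is computed here.
template <typename L, typename R>
Add<L, R> operator+(const Expr<L>& a, const Expr<R>& b) { return Add<L, R>(a.self(), b.self()); }
template <typename L, typename R>
Sub<L, R> operator-(const Expr<L>& a, const Expr<R>& b) { return Sub<L, R>(a.self(), b.self()); }
template <typename E>
Scale<E> operator*(const Expr<E>& e, double d) { return Scale<E>(e.self(), d); }
```

For a = {0, 10, 20} and b = {2, 12, 22}, the assignment out = a + (b - a) * 0.5 runs one loop over the bands and yields {1, 11, 21}. Storing intermediate nodes by value means the expression can even be kept in an auto variable before being assigned, as long as the pixels it refers to are still alive.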

Perspectives