I'll trace here the process I followed to optimize the processing of an otb::VectorImage.
- 1 The context
- 2 The problems
- 3 The solutions
- 4 The results
- 5 Perspectives
We start from a mono-band image that we resample with an itk::LinearInterpolateImageFunction used from a resampling filter. When the processed image is treated as an
otb::Image, the execution takes around 20 seconds. When the same image is treated as an
otb::VectorImage, the execution takes around 1 minute and 5-10 seconds. Nothing here is really unexpected, as VectorImages are known to be inefficient. Aren't they?
Let's see what callgrind has to say about this situation.
As we can see, most of the time is spent in memory allocations and releases. If we zoom into
itk::LinearInterpolateImageFunction<>::EvaluateOptimized(), we can see that each call to this function is accompanied by 4 calls to
itk::VariableLengthVector::AllocateElements(), and 6 matching releases.
(Actually, a few optimizations have already been done, but they are not enough.)
- New pixel value => new pixel created
- Pixel casting => new pixel created
- Pixel assignment => reallocation + copy of old values
- MT (multi-thread) safety of the cached pixel
The solution with:
- thread-safe cached pixels,
- iterative pixel arithmetic (i.e.
m_val01 = GetPixel(...); m_val02 = GetPixel(...); m_val02 -= m_val01; m_val02 *= d; m_val02 += m_val01;),
- an EvaluateOptimized() that returns pixel values through an [out] parameter,
runs in 20 seconds (with 1 thread), and we can see in the callgrind profile that no allocations (or releases) are performed for each pixel.