This specification covers porting and optimizing a JPEG decoder library for the ARM Cortex-A9. The plan is to start from the existing open source JPEG library code base that is most conducive to SIMD optimization and then add a NEON optimization backend to it. The goal is a higher-performance drop-in replacement for IJG’s libjpeg library that does not break compatibility with existing applications.


  • Development platform: OMAP4 board (Panda or Blaze)
  • GCC compiler and assembler with complete support for the NEON instruction set
  • GDB and OProfile on Cortex-A9 for finding hotspots


  • Study and compare the two likely candidates for initial development: libjpeg and libjpeg-turbo.
  • Once the initial library is selected, build a pure C version of it and verify that it operates correctly.
  • Profile the JPEG codec to find the most time-consuming operations. List and prioritize the operations to be coded in NEON assembly or macros.
  • Port a test program to measure performance.
  • Implement each operation in order of priority and measure the performance improvement after each one.
  • Performance will be measured in megapixels decoded per second.
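As a rough illustration of the megapixels-per-second metric, the sketch below times repeated calls to a decode callback and converts the result to throughput. The names `decode_fn`, `time_decode_runs`, and `megapixels_per_second` are hypothetical and not part of libjpeg or libjpeg-turbo. It uses the standard C `clock()` function, which reports CPU time rather than wall-clock time; a real benchmark may prefer a wall-clock timer.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical callback type: decodes one image, returns 0 on success. */
typedef int (*decode_fn)(void *ctx);

/* Run the decoder `runs` times and return the CPU time consumed in seconds,
 * or a negative value if any decode call fails. */
double time_decode_runs(decode_fn decode, void *ctx, int runs)
{
    assert(decode != NULL && runs > 0);
    clock_t start = clock();
    for (int i = 0; i < runs; i++) {
        if (decode(ctx) != 0)
            return -1.0;
    }
    return (double)(clock() - start) / (double)CLOCKS_PER_SEC;
}

/* Throughput in megapixels per second for `frames` decodes of a
 * width x height image that took `elapsed_seconds` in total. */
double megapixels_per_second(size_t width, size_t height,
                             size_t frames, double elapsed_seconds)
{
    if (elapsed_seconds <= 0.0)
        return 0.0; /* timer resolution too coarse, or failed run */
    double pixels = (double)width * (double)height * (double)frames;
    return pixels / (elapsed_seconds * 1e6);
}
```

For example, ten decodes of a 1920x1080 image in 2 seconds give about 10.37 megapixels per second. Reporting the same figure for the C baseline and for each NEON routine makes the improvements directly comparable.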


The following is a typical block diagram of a JPEG decoder implementation.

  • jpeg_decoder_block_diagram.JPG

Many of the core operations repeat the same computation over multiple pixels or data elements. This presents an opportunity to optimize them using SIMD instructions such as NEON.

The first stage of the implementation is to identify the core data processing functions that can be optimized with SIMD. Once that is done, those functions are written in NEON assembly.
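To make the idea of a SIMD-friendly routine concrete, here is a minimal scalar sketch of one candidate, YCbCr to RGB color conversion of a single row, using the JFIF conversion constants in 16-bit fixed point. The function name `ycbcr_to_rgb_row` and its layout are assumptions for illustration and are not taken from libjpeg or libjpeg-turbo. Every loop iteration is independent of the others, which is the property that makes such a loop a natural fit for NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a real constant to 16.16 fixed point, rounded to nearest. */
#define FIX(x) ((int32_t)((x) * 65536.0 + 0.5))

static uint8_t clamp_u8(int32_t v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar reference: converts `width` pixels from planar
 * Y, Cb, Cr to interleaved RGB. Assumes `>>` on negative values is an
 * arithmetic shift, as it is with GCC. */
void ycbcr_to_rgb_row(const uint8_t *y, const uint8_t *cb, const uint8_t *cr,
                      uint8_t *rgb, size_t width)
{
    assert(y != NULL && cb != NULL && cr != NULL && rgb != NULL);
    for (size_t i = 0; i < width; i++) {
        int32_t yy = y[i];
        int32_t u = (int32_t)cb[i] - 128; /* centered Cb */
        int32_t v = (int32_t)cr[i] - 128; /* centered Cr */

        int32_t r = yy + ((FIX(1.40200) * v + 32768) >> 16);
        int32_t g = yy - ((FIX(0.34414) * u + FIX(0.71414) * v + 32768) >> 16);
        int32_t b = yy + ((FIX(1.77200) * u + 32768) >> 16);

        rgb[3 * i + 0] = clamp_u8(r);
        rgb[3 * i + 1] = clamp_u8(g);
        rgb[3 * i + 2] = clamp_u8(b);
    }
}
```

A scalar version like this can also serve as the reference output when a NEON version of the same routine is tested later.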

Code Changes

  1. Modify libjpeg-turbo or libjpeg to compile and run as pure C code on a PC.
  2. Compile and run the modified C code to confirm that functionality is not broken.
  3. Move the modified code to the Cortex-A9 and confirm that it works correctly.
  4. Benchmark this code and use it as the baseline for the optimization work.
  5. Write the infrastructure for enabling ARM-optimized NEON functions.
  6. Run the test suite to confirm that functionality is not broken.
  7. Code the identified routines in NEON one by one. After each routine, run both the functional tests and the performance test to confirm correct operation.
  8. Repeat (7) until all identified routines are covered and a significant performance improvement is achieved.
  9. Deploy the optimized libjpeg in the system and perform system-level testing with various applications that use libjpeg.
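Steps 5 to 7 could look roughly like the sketch below: a function pointer selects between a C routine and a NEON routine, and a small check compares the two outputs. The routine shown (adding 128 to IDCT output samples and clamping to 0..255) and all names are hypothetical, chosen only to keep the example short. Selection here happens at compile time through the `__ARM_NEON` / `__ARM_NEON__` macros that GCC defines when NEON is enabled; runtime CPU detection is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON_BACKEND 1
#else
#define HAVE_NEON_BACKEND 0
#endif

/* Common signature shared by every backend of this routine. */
typedef void (*level_shift_fn)(const int16_t *in, uint8_t *out, size_t count);

/* Pure C reference: add 128 and clamp to the 8-bit sample range. */
void level_shift_clamp_c(const int16_t *in, uint8_t *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        int v = in[i] + 128;
        out[i] = (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
    }
}

#if HAVE_NEON_BACKEND
/* NEON version: 8 samples per iteration, scalar code for the remainder. */
void level_shift_clamp_neon(const int16_t *in, uint8_t *out, size_t count)
{
    const int16x8_t bias = vdupq_n_s16(128);
    size_t i = 0;
    for (; i + 8 <= count; i += 8) {
        int16x8_t v = vqaddq_s16(vld1q_s16(in + i), bias); /* saturating add */
        vst1_u8(out + i, vqmovun_s16(v)); /* saturate to 0..255 and narrow */
    }
    level_shift_clamp_c(in + i, out + i, count - i);
}
#endif

/* Step 5: pick the backend for this build. */
level_shift_fn select_level_shift(void)
{
#if HAVE_NEON_BACKEND
    return level_shift_clamp_neon;
#else
    return level_shift_clamp_c;
#endif
}

/* Steps 6 and 7: return 1 if `candidate` matches the C reference
 * on up to 64 input samples, 0 otherwise. */
int outputs_match(level_shift_fn candidate, const int16_t *in, size_t count)
{
    uint8_t expected[64], actual[64];
    assert(count <= 64);
    level_shift_clamp_c(in, expected, count);
    candidate(in, actual, count);
    for (size_t i = 0; i < count; i++) {
        if (expected[i] != actual[i])
            return 0;
    }
    return 1;
}
```

Keeping the C routine behind the same function pointer type means each NEON routine can be added and verified on its own, and the library still builds on targets without NEON.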

Test/Demo Plan

Show the performance of a typical application that uses JPEG decoding before and after the optimized JPEG library is deployed.

Unresolved issues

Decide whether to use libjpeg or libjpeg-turbo as the starting code base.

BoF agenda and discussion

WorkingGroups/Middleware/Multimedia/Specs/1105/OptimizeJPEGDecodingforARM (last modified 2010-11-17 00:07:39)