During the 2012 Q1 Connect in SFO, we investigated the top regressions in Linaro GCC 4.6 compared to Linaro GCC 4.5.

This page is a dumping ground for the notes from those investigations.

PRE_DEC and floating point addressing forms yet again - AAC

   void foo (float *x, float *y, float *z, float *m, int l)
   {
      int i;
      for (i = 0; i < l; i++)
        z[i] = x[i] * y[i] + m[i];
   }

Connect Q2.12 status: now upstream and being backported to Linaro GCC.

Identical instructions but of opposite conditions in the instruction stream


typedef unsigned char uint8_t;
typedef struct {
    int high;
    int bits; /* stored negated (i.e. negative "bits" is a positive number of
                 bits left) in order to eliminate a negate in cache refilling */
    const uint8_t *buffer;
    const uint8_t *end;
    unsigned int code_word;
} VP56RangeCoder;
extern const uint8_t ff_vp56_norm_shift[256];
extern unsigned int bytestream_get_be16 (const uint8_t ** buffer);

unsigned int __attribute__((noinline)) vp56_rac_renorm(VP56RangeCoder *c)
{
    int shift = ff_vp56_norm_shift[c->high];
    int bits = c->bits;
    unsigned int code_word = c->code_word;

    c->high   <<= shift;
    code_word <<= shift;
    bits       += shift;
    if (bits >= 0 && c->buffer < c->end) {
        code_word |= bytestream_get_be16(&c->buffer) << bits;
        bits -= 16;
    }
    c->bits = bits;
    return code_word;
}

int vp56_rac_get_prob(VP56RangeCoder *c, uint8_t prob)
{
    unsigned int code_word = vp56_rac_renorm(c);
    unsigned int low = 1 + (((c->high - 1) * prob) >> 8);
    unsigned int low_shift = low << 16;

    /* Each field is stored under one condition and again under its opposite. */
    if (code_word >= low_shift) {
        c->high = c->high - low;
        c->code_word = code_word - low_shift;
    } else {
        c->high = low;
        c->code_word = low_shift;
    }

    return (code_word >= low_shift);
}

With Linaro GCC 4.6, conditional stores of opposite conditions end up in the final instruction stream under quite a few circumstances. This example came from the vp8 benchmark.

Update: Unfortunately, in this case Linaro GCC 4.6 generates the instructions of opposite conditions within basic blocks, so standard tail merging in if-conversion is not going to catch them. It might be worth revisiting this when we look at moving instructions of opposite conditions out or sinking them, but we have to be careful around memory accesses.
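
As a source-level illustration of what sinking the stores would amount to, the sketch below selects the values first and then stores each field once. It is only a sketch: `struct coder_state`, `update_branchy` and `update_sunk` are hypothetical names that mirror the reduced test case, not anything GCC does internally, and the rewrite is only valid here because both arms store to the same locations.

```c
/* Hypothetical stand-in for the two fields updated in vp56_rac_get_prob. */
struct coder_state {
    int high;
    unsigned int code_word;
};

/* Branchy form: each field is stored under a condition and again under
   the opposite condition, as in the reduced test case. */
static int update_branchy(struct coder_state *c, unsigned int code_word,
                          unsigned int low, unsigned int low_shift)
{
    if (code_word >= low_shift) {
        c->high = c->high - low;
        c->code_word = code_word - low_shift;
    } else {
        c->high = low;
        c->code_word = low_shift;
    }
    return code_word >= low_shift;
}

/* Sunk form: select the values first, then store each field exactly once. */
static int update_sunk(struct coder_state *c, unsigned int code_word,
                       unsigned int low, unsigned int low_shift)
{
    int bit = code_word >= low_shift;
    int high = bit ? (int)(c->high - low) : (int)low;
    unsigned int cw = bit ? code_word - low_shift : low_shift;

    c->high = high;
    c->code_word = cw;
    return bit;
}
```

The sunk form turns two pairs of conditional stores into two unconditional stores. In general this is unsafe when only one arm writes a location, since it would introduce a store that the original program never performed.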

ssat / usat instruction idioms


Now implemented upstream and delivered into Linaro GCC.
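
For context, the kind of source idiom these instructions target is a clamp to a power-of-two range. The sketch below uses hypothetical helper names (`clamp_s8`, `clamp_u8`). ARM `ssat` saturates to a signed n-bit range and `usat` saturates a signed value to an unsigned n-bit range, but whether a given compiler version recognises exactly this source form is an assumption, not something these notes confirm.

```c
/* Clamp to the signed 8-bit range [-128, 127]; this is the result that
   "ssat rd, #8, rn" computes. */
static int clamp_s8(int x)
{
    if (x < -128) return -128;
    if (x > 127) return 127;
    return x;
}

/* Clamp a signed value to the unsigned 8-bit range [0, 255]; this is the
   result that "usat rd, #8, rn" computes. */
static int clamp_u8(int x)
{
    if (x < 0) return 0;
    if (x > 255) return 255;
    return x;
}
```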

Aliasing issues with the vectorizer

  • Not related to the performance issues that we are seeing.

  • This was a case where the code itself had aliasing violations.

This is not really GCC's problem, but it would be good to know where libav went from here.
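
To illustrate the class of problem (a generic example, not the libav code in question, which these notes do not show): reading a `float` object through a `uint32_t` pointer violates C's strict aliasing rules, so the optimizer and the vectorizer are entitled to assume the two accesses never overlap. Copying the bytes with `memcpy` is the well-defined alternative. The function name `float_bits` is hypothetical, and the sketch assumes a 32-bit `float`.

```c
#include <stdint.h>
#include <string.h>

/* Undefined behaviour under strict aliasing, shown only as a comment:
 *     uint32_t bits = *(uint32_t *)&f;
 */

/* Well-defined: copy the object representation instead of casting pointers.
   Assumes sizeof(float) == sizeof(uint32_t). */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

Building with GCC's `-fno-strict-aliasing` is the other common workaround, at the cost of some optimization opportunities.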

Sub-optimal end of loop counter optimization


Example and details discussed here

Status: currently in progress upstream. Uli is working through this.

RamanaRadhakrishnan/Sandbox/RRQ112ConnectLibavgcc46Reg (last modified 2012-05-26 03:19:45)