ARM Assembly Optimization - Raspberry Pi Forums


I have noticed that there are many references on ARM assembly, though few on optimizing for the ARM CPU. The nice thing about the ARM architecture is that if you optimize for ARMv7, you have equally optimized for ARMv3, ARMv4, ARMv5, and ARMv6. The only limit on the minimum ARM version is the instructions used in your code.

Optimizing for ARM is a lot simpler than for others. The basic rules are:
    1: Minimize the use of branching. Unroll loops, and the like.
    2: Avoid using the destination register of the last operation as a source or destination register of the current instruction.
    3: Minimize memory access as much as possible; use registers for local variables as much as you can.
    4: If you must do heavy memory access, try to interleave at least 2 non-memory-accessing instructions between each memory-accessing instruction.
Other than making sure that the caches are enabled if you are writing bare-metal, following the above simple rules will provide the best speed in 99% of cases on ARM. These simple rules also allow for multi-issue of instructions on the newer ARM CPUs (ARMv5, ARMv6, and ARMv7), allowing more than 1 op per CPU clock pulse.
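As a rough illustration of rules 1 to 3 above, here is a minimal sketch in C rather than assembly. The function names `sum_simple` and `sum_unrolled` are made up for this example, and the compiler still decides the final instruction order, so treat it as the shape of the idea, not a guaranteed schedule. The unrolled version takes one branch per four elements, keeps its working values in locals that can live in registers, and uses four separate accumulators so that no add has to wait on the result of the add just before it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Straightforward loop: one branch per element, and every add
 * depends on the result of the previous add. */
uint32_t sum_simple(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += data[i];
    return sum;
}

/* Unrolled by four: one branch per four elements, and four
 * independent accumulators so consecutive adds do not reuse the
 * register written by the previous add. */
uint32_t sum_unrolled(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);

    uint32_t s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        /* Group the loads, then do the adds on local values. */
        uint32_t a = data[i];
        uint32_t b = data[i + 1];
        uint32_t c = data[i + 2];
        uint32_t d = data[i + 3];
        s0 += a;
        s1 += b;
        s2 += c;
        s3 += d;
    }

    /* Handle the last 0 to 3 elements that did not fill a group. */
    for (; i < n; i++)
        s0 += data[i];

    return s0 + s1 + s2 + s3;
}
```

Both functions return the same value for any input, since unsigned addition wraps the same way in either order. Whether the unrolled form is actually faster depends on the core and the compiler, so it is worth timing on the target.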

As long as things are thought through, it is possible to produce tight code while following these simple rules. Optimizing for size while also optimizing for speed, though, is an exercise in refining the algorithm being implemented.

I should note that I did not mention algorithmic optimizations, which should always be done when coding for best efficiency. These are simple things like moving whatever can be moved outside of a loop, using the most efficient method of calculation, taking advantage of power-of-2 operations for many things, avoiding division if at all possible, preferring integer operations, etc. This is far from an exhaustive list.
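A couple of these can be sketched in C. The helpers `wrap_index` and `scale_samples` are hypothetical names invented for this example, and it assumes the buffer size and the divisor are powers of 2. The first replaces a modulo (which needs a division) with a single AND; the second computes a loop-invariant product once before the loop and divides with a shift, using integer math throughout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Wrap an index into a buffer whose size is a power of 2.
 * index & (size - 1) gives the same result as index % size,
 * without a division. */
uint32_t wrap_index(uint32_t index, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return index & (size - 1);
}

/* Multiply every sample by (a * b) and divide by 2^shift.
 * The product a * b does not change inside the loop, so it is
 * computed once outside it; the division is a right shift. */
void scale_samples(uint32_t *samples, size_t n,
                   uint32_t a, uint32_t b, unsigned shift)
{
    assert(shift < 32);

    const uint32_t factor = a * b;  /* hoisted out of the loop */

    for (size_t i = 0; i < n; i++)
        samples[i] = (samples[i] * factor) >> shift;
}
```

This matters more than usual on ARM because a number of cores have no integer divide instruction at all, so a division turns into a call to a software routine.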

Algorithmic optimization makes more of a difference than any other form of optimization.

