Thursday, December 6, 2012

Performance of Unaligned Memory Access in Raspberry Pi

The Raspberry Pi's SoC is a Broadcom BCM2835, which contains an ARM1176JZF-S core implementing the ARMv6 architecture.

ARMv6 adds hardware support for unaligned word (4-byte) and halfword (2-byte) load and store accesses.
For details, please check.
I would like to measure the performance when a user-space process performs an unaligned memory access (loads only).
Here are my test environment and test cases.

[Environment]

Kernel: Linux xbian 3.6.1 #4 PREEMPT Thu Nov 8 18:54:20 CET 2012 armv6l GNU/Linux
GCC: gcc version 4.6.3 (Debian 4.6.3-12+rpi1)

# cat /proc/cpuinfo
Processor       : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS        : 697.95
Features        : swp half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xb76
CPU revision    : 7

Hardware        : BCM2708
Revision        : 0002
Serial          : 0000000009b752ff


[Program]

unaligned.c
#include <stdio.h>
#include <stdint.h>

#ifdef USE_GCC_FIXUP
struct __una_u32 {
    uint32_t x;  
} __attribute__((packed));

static inline uint32_t get_unaligned_32(const void *p)
{
    const struct __una_u32 *ptr = 
        (const struct __una_u32 *)p;
    return ptr->x;
}
#endif

int main()
{
    uint8_t buf[16];
    uint32_t i, j;

    for (i = 0; i < sizeof(buf); i++)
        buf[i] = i;

    for (j = 0; j < 100000000; j++) {
        /* unaligned access */
#ifndef USE_GCC_FIXUP
        i = *(unsigned int*)(&buf[1]);
#else
        i = get_unaligned_32(&buf[1]);
#endif
    }

    printf("0x%X\n", i);

    return 0;
}
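
To illustrate why the USE_GCC_FIXUP path can cost more, here is a rough sketch of what a software fixup amounts to: the word is assembled from four single-byte loads, so no individual load can be misaligned. The helper name load_le32_bytewise is hypothetical, and the byte order assumes a little-endian target such as the Raspberry Pi. What GCC actually emits for the packed struct depends on the compiler version and flags, so treat this as an approximation rather than the generated code.

```c
#include <stdint.h>

/* Hypothetical helper (not part of unaligned.c): build a 32-bit value
 * from four byte loads. Assumes little-endian byte order. Because every
 * load is one byte wide, the address p may have any alignment. */
static inline uint32_t load_le32_bytewise(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;

    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

With buf filled as in unaligned.c, load_le32_bytewise(&buf[1]) yields 0x4030201, the same value the test program prints. The cost is four loads plus shifts and ORs where the hardware path needs a single load.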

[Case]

  1. unaligned word access
    • fix up by hardware
      # gcc -o unaligned unaligned.c
      # time ./unaligned
      0x4030201

      real    0m2.053s
      user    0m2.040s
      sys     0m0.010s

    • fix up by software (gcc)
      # gcc -DUSE_GCC_FIXUP -o unaligned_gcc unaligned.c
      # time ./unaligned_gcc
      0x4030201

      real    0m6.384s
      user    0m6.370s
      sys     0m0.010s

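    The result is clear: the hardware fixup (about 2.05 s) is roughly three times faster than the GCC software fixup (about 6.38 s).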

    For comparison, I also added an aligned-access case.
    The only change is:
    #ifndef USE_GCC_FIXUP
            i = *(unsigned int*)(&buf[0]);
    #else
    
    # time ./aligned
    0x3020100

    real    0m1.934s
    user    0m1.920s
    sys     0m0.000s

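  2. unaligned doubleword access
    Modify unaligned.c to load a doubleword (8 bytes) instead of a word. The ARMv6 hardware fixup covers only word and halfword accesses, so here the unaligned load traps and is fixed up by the kernel (note the large sys time).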
    • fix up by kernel
      # time ./unaligned64
      0x807060504030201

      real    1m8.754s
      user    0m7.800s
      sys     1m0.830s


    • fix up by software (gcc)
      # time ./unaligned64_gcc
      0x807060504030201

      real    0m9.753s
      user    0m9.700s
      sys     0m0.030s

    I also added an aligned-access case.
    # time ./aligned64
    0x706050403020100

    real    0m2.413s
    user    0m2.400s
    sys     0m0.000s
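
The modified source for case 2 is not listed, so the following is only a guess at what the software-fixup side could look like: the packed-struct trick from unaligned.c widened to 64 bits. The names una_u64, get_unaligned_64 and get_unaligned_64_memcpy are assumptions, not taken from the original program. The memcpy variant is a different technique, shown as a portable alternative that does not need the GCC packed attribute.

```c
#include <stdint.h>
#include <string.h>

/* Assumed 64-bit counterpart of struct __una_u32 in unaligned.c.
 * The packed attribute (a GCC/Clang extension) tells the compiler that
 * x may sit at any address, so it must not assume 8-byte alignment. */
struct una_u64 {
    uint64_t x;
} __attribute__((packed));

static inline uint64_t get_unaligned_64(const void *p)
{
    const struct una_u64 *ptr = (const struct una_u64 *)p;
    return ptr->x;
}

/* Portable alternative: copy the bytes into an aligned local variable.
 * Compilers are free to turn this into whatever load sequence is safe
 * for the target. */
static inline uint64_t get_unaligned_64_memcpy(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```

In the test loop this would replace the 32-bit load, for example get_unaligned_64(&buf[1]) with a uint64_t result, and the final printf would need a 64-bit format such as "%llX" with a cast to unsigned long long.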
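So avoid unaligned memory access whenever possible, especially for doublewords, where the kernel fixup ran roughly 28 times slower than the aligned case (68.8 s versus 2.4 s)!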
