The SoC of the Raspberry Pi is a Broadcom BCM2835, which contains an ARM1176JZF-S core implementing the ARMv6 architecture.
ARMv6 adds support for unaligned word (4-byte) and halfword (2-byte) load and store data accesses.
For details, please check.
I would like to measure the performance of a user-space process that performs unaligned memory accesses (loads only).
Here are my test environment and test cases.
[Environment]
Kernel: Linux xbian 3.6.1 #4 PREEMPT Thu Nov 8 18:54:20 CET 2012 armv6l GNU/Linux
GCC: gcc version 4.6.3 (Debian 4.6.3-12+rpi1)
# cat /proc/cpuinfo
Processor : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS : 697.95
Features : swp half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xb76
CPU revision : 7
Hardware : BCM2708
Revision : 0002
Serial : 0000000009b752ff
[Program]
unaligned.c
#include <stdio.h>
#include <stdint.h>

#ifdef USE_GCC_FIXUP
/* packed tells gcc that x may sit at any address,
   so it generates code that is safe for unaligned data */
struct __una_u32 {
    uint32_t x;
} __attribute__((packed));

static inline uint32_t get_unaligned_32(const void *p)
{
    const struct __una_u32 *ptr = (const struct __una_u32 *)p;
    return ptr->x;
}
#endif

int main(void)
{
    uint8_t buf[16];
    uint32_t i, j;

    /* fill buf with 0x00, 0x01, ..., 0x0F */
    for (i = 0; i < sizeof(buf); i++)
        buf[i] = i;

    for (j = 0; j < 100000000; j++) {
        /* unaligned access: load 4 bytes starting at offset 1 */
#ifndef USE_GCC_FIXUP
        i = *(unsigned int*)(&buf[1]);   /* plain word load, left to the hardware */
#else
        i = get_unaligned_32(&buf[1]);   /* fixup generated by gcc */
#endif
    }
    printf("0x%X\n", i);
    return 0;
}
[Case]
- unaligned word access
- fixed up by hardware
# gcc -o unaligned unaligned.c
# time ./unaligned
0x4030201
real 0m2.053s
user 0m2.040s
sys 0m0.010s
- fixed up by software (gcc)
# gcc -DUSE_GCC_FIXUP -o unaligned_gcc unaligned.c
# time ./unaligned_gcc
0x4030201
real 0m6.384s
user 0m6.370s
sys 0m0.010s
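The result is clear: letting the hardware handle the unaligned word load (2.05 s) is about three times faster than the software fixup generated by gcc (6.38 s).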
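Both binaries print 0x4030201 because the core runs little-endian here: the byte at the lowest address (buf[1] = 0x01) becomes the least significant byte of the result. The gap in run time comes from the number of instructions per load. Below is a minimal sketch of a byte-by-byte software fixup. load_le32_bytewise is a hypothetical name, and it is an assumption (not verified here) that gcc 4.6 emits a comparable byte-load-and-shift sequence for get_unaligned_32. Compiling with gcc -S and reading the output would confirm or refute this.

```c
#include <stdint.h>

/* Hypothetical illustration, not taken from unaligned.c:
 * assemble a little-endian 32-bit value from four single-byte loads.
 * Byte loads are always aligned, so this is safe at any address,
 * but it costs four loads plus shifts/ORs instead of one word load.
 * Assumption: the gcc fixup for packed data is of this kind. */
static inline uint32_t load_le32_bytewise(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

With buf filled as in unaligned.c, load_le32_bytewise(&buf[1]) yields 0x04030201 and load_le32_bytewise(&buf[0]) yields 0x03020100, which match the outputs of the unaligned and aligned runs.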
Here I add a case for aligned access.
The only change is loading from &buf[0] instead of &buf[1]:
#ifndef USE_GCC_FIXUP
        i = *(unsigned int*)(&buf[0]);
#else
# time ./aligned
0x3020100
real 0m1.934s
user 0m1.920s
sys 0m0.000s
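The aligned case assumes that buf itself starts on a word boundary. However, C only guarantees 1-byte alignment for a uint8_t array, so the stack placement happened to work rather than being guaranteed. The following sketch shows a hypothetical helper, is_aligned, that checks an address, along with a buffer that requests 8-byte alignment explicitly through gcc's aligned attribute (available in gcc 4.6). Neither name appears in unaligned.c.

```c
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of `align` bytes.
 * `align` must be a power of two (e.g. 4 for word, 8 for doubleword). */
static inline int is_aligned(const void *p, uintptr_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical buffer: request 8-byte alignment explicitly instead of
 * relying on where the compiler happens to place a uint8_t array.
 * With this, &aligned_buf[0] is valid for both word and doubleword
 * loads, while &aligned_buf[1] is never word-aligned. */
static uint8_t aligned_buf[16] __attribute__((aligned(8)));
```

Calling is_aligned(&buf[0], 4) inside main() is a cheap way to confirm that the "aligned" benchmark really measures aligned loads.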
- unaligned doubleword (8-byte) access
Modify unaligned.c to perform doubleword (uint64_t) loads instead of word loads.
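The doubleword source is not shown. The following sketch illustrates one plausible way to write the change, assuming it mirrors the 32-bit program. get_unaligned_64 and struct __una_u64 are hypothetical names modeled on get_unaligned_32 and struct __una_u32.

```c
#include <stdint.h>

/* Hypothetical 64-bit counterpart of get_unaligned_32: packed tells
 * gcc that x may be misaligned, so it generates code that is safe
 * for any address. */
struct __una_u64 {
    uint64_t x;
} __attribute__((packed));

static inline uint64_t get_unaligned_64(const void *p)
{
    const struct __una_u64 *ptr = (const struct __una_u64 *)p;
    return ptr->x;
}

/* Assumed loop body in main(), with i declared as uint64_t:
 *
 *   #ifndef USE_GCC_FIXUP
 *       i = *(uint64_t *)(&buf[1]);    // direct unaligned 8-byte load
 *   #else
 *       i = get_unaligned_64(&buf[1]);
 *   #endif
 *
 * and the result printed with
 *   printf("0x%llX\n", (unsigned long long)i);
 */
```

On a little-endian machine, get_unaligned_64(&buf[1]) yields 0x0807060504030201, which matches the output of the doubleword runs.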
- fixed up by kernel
# time ./unaligned64
0x807060504030201
real 1m8.754s
user 0m7.800s
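sys 1m0.830s
Almost all of this time is spent in the kernel (sys). ARMv6's unaligned support covers single word and halfword loads and stores, but not doubleword loads such as LDRD or LDM. As a result, every unaligned doubleword load raises an alignment fault, and the kernel's alignment trap handler emulates the load in software.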
- fixed up by software (gcc)
# time ./unaligned64_gcc
0x807060504030201
real 0m9.753s
user 0m9.700s
sys 0m0.030s
I also added a case for aligned doubleword access.
# time ./aligned64
0x706050403020100
real 0m2.413s
user 0m2.400s
sys 0m0.000s
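So, avoid unaligned memory accesses where possible. Compared with aligned loads, a hardware-handled unaligned word load costs only about 6% more here (2.05 s vs 1.93 s). By contrast, the gcc software fixup costs about 3.3x for words (6.38 s vs 1.93 s) and about 4x for doublewords (9.75 s vs 2.41 s), and the kernel fixup costs about 28x (68.75 s vs 2.41 s).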
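Sometimes the data is inherently unaligned, for example fields inside a packed byte stream. In that case, a common portable idiom (a different technique from the packed-struct fixup measured above) is to memcpy the bytes into an aligned local variable. Unlike casting &buf[1] to (unsigned int *), this has defined behavior in C at any address. Its speed on this board depends on what the compiler emits, and it was not measured in these tests. The function names below are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Portable unaligned loads: copy the bytes into an aligned local.
 * The result is in host byte order (little-endian on the Raspberry Pi).
 * Performance is compiler/target dependent and not benchmarked here. */
static inline uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

static inline uint64_t load_u64_memcpy(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

To compare this approach against the numbers above, use it in place of get_unaligned_32 in unaligned.c and time the result in the same way.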