Thursday, December 6, 2012

Performance of Unaligned Memory Access in Raspberry Pi

The SoC of the Raspberry Pi is a Broadcom BCM2835, which contains an ARM1176JZF-S core implementing the ARMv6 architecture.

ARMv6 adds hardware support for unaligned word (4-byte) and halfword (2-byte) load and store accesses.
For details, please check.
I would like to measure the performance when a user-space process performs an unaligned memory access (loads only).
Here are my test environment and test cases.

[Environment]

Kernel: Linux xbian 3.6.1 #4 PREEMPT Thu Nov 8 18:54:20 CET 2012 armv6l GNU/Linux
GCC: gcc version 4.6.3 (Debian 4.6.3-12+rpi1)

# cat /proc/cpuinfo
Processor       : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS        : 697.95
Features        : swp half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xb76
CPU revision    : 7

Hardware        : BCM2708
Revision        : 0002
Serial          : 0000000009b752ff


[Program]

unaligned.c
#include <stdio.h>
#include <stdint.h>

#ifdef USE_GCC_FIXUP
struct __una_u32 {
    uint32_t x;  
} __attribute__((packed));

static inline uint32_t get_unaligned_32(const void *p)
{
    const struct __una_u32 *ptr = 
        (const struct __una_u32 *)p;
    return ptr->x;
}
#endif

int main()
{
    uint8_t buf[16];
    uint32_t i, j;

    for (i = 0; i < sizeof(buf); i++)
        buf[i] = i;

    for (j = 0; j < 100000000; j++) {
        /* unaligned access */
#ifndef USE_GCC_FIXUP
        i = *(unsigned int*)(&buf[1]);
#else
        i = get_unaligned_32(&buf[1]);
#endif
    }

    printf("0x%X\n", i);

    return 0;
}

[Case]

  1. Unaligned word access
    • Fixed up by hardware
      # gcc -o unaligned unaligned.c
      # time ./unaligned
      0x4030201

      real    0m2.053s
      user    0m2.040s
      sys     0m0.010s

    • Fixed up by software (gcc)
      # gcc -DUSE_GCC_FIXUP -o unaligned_gcc unaligned.c
      # time ./unaligned_gcc
      0x4030201

      real    0m6.384s
      user    0m6.370s
      sys     0m0.010s

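    The result is clear: the hardware fix-up (2.05 s) is about three times faster than the gcc software fix-up (6.38 s).

    For comparison, here is an aligned access case.
    The only change is the load address:
    #ifndef USE_GCC_FIXUP
            i = *(unsigned int*)(&buf[0]);
    #else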
    
    # time ./aligned
    0x3020100

    real    0m1.934s
    user    0m1.920s
    sys     0m0.000s
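
The gap between the two unaligned builds is easier to picture once a software fix-up is written out by hand. The sketch below is only an assumption about what the packed-struct access roughly amounts to on a little-endian target when the compiler does not rely on the hardware: four byte loads combined with shifts instead of one word load. The name load_u32_bytewise is hypothetical, and the instructions gcc 4.6.3 actually emits were not inspected here.

```c
#include <stdint.h>

/*
 * Hand-written equivalent of a software fix-up (little-endian byte order
 * assumed): read four single bytes and assemble them into one 32-bit value.
 * Byte loads have no alignment requirement, so p may point anywhere.
 */
static inline uint32_t load_u32_bytewise(const void *p)
{
    const uint8_t *b = (const uint8_t *)p;

    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}
```

With buf filled as in unaligned.c, load_u32_bytewise(&buf[1]) yields 0x4030201, the same value the test programs print.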

  2. Unaligned doubleword access
    Modify unaligned.c to load a doubleword (8 bytes) instead of a word.
    • Fixed up by kernel
      # time ./unaligned64
      0x807060504030201

      real    1m8.754s
      user    0m7.800s
      sys     1m0.830s


    • Fixed up by software (gcc)
      # time ./unaligned64_gcc
      0x807060504030201

      real    0m9.753s
      user    0m9.700s
      sys     0m0.030s

    An aligned access case is added here as well.
    # time ./aligned64
    0x706050403020100

    real    0m2.413s
    user    0m2.400s
    sys     0m0.000s
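
The modified source for the doubleword case is not shown in this post. As a minimal sketch, the gcc fix-up variant might be extended to 64 bits by mirroring get_unaligned_32() from unaligned.c. The names una_u64 and get_unaligned_64 are assumptions and are not taken from the original test program.

```c
#include <stdint.h>

/* Packed wrapper: tells gcc that x may sit at any byte address. */
struct una_u64 {
    uint64_t x;
} __attribute__((packed));

/* 64-bit counterpart of get_unaligned_32(); p needs no alignment. */
static inline uint64_t get_unaligned_64(const void *p)
{
    const struct una_u64 *ptr = (const struct una_u64 *)p;

    return ptr->x;
}
```

In the test loop, the load would then become get_unaligned_64(&buf[1]) with a uint64_t result, printed with "0x%llX" after a cast to unsigned long long.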
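So avoid unaligned memory access if possible! Even with the hardware fix-up, the unaligned word loads were about 6% slower than the aligned ones (2.05 s vs. 1.93 s). The kernel fix-up for doublewords was roughly 28 times slower than the aligned case (68.8 s vs. 2.4 s), with almost all of that time spent in the kernel (sys 1m0.830s).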

Monday, December 3, 2012

Notes on Linux Kernel Char Device Drivers

There are two ways to create a struct cdev:
  1. static way
    • declare
      • embedded in your device structure statically
        struct dummy_device {
            struct cdev cdev;
        };
    • initialize
      • use the cdev_init() function to initialize the cdev structure
        struct dummy_device dev;
        ...
        cdev_init(&dev.cdev, &dummy_fops);
    • remove
      • call cdev_del()
  2. dynamic way
    • declare
      • memory of cdev structure is allocated dynamically
        struct dummy_device {
            struct cdev *cdev;
        };
        ...
        struct dummy_device dev;
        dev.cdev = cdev_alloc();
    • initialize
      • dev.cdev->ops = &dummy_fops;
    • remove
      • call cdev_del()
      • You don't need to call kfree() to free the memory. The kernel releases it automatically (explained below).
There is one thing you need to know about the dynamic way.
If you create the cdev structure with cdev_alloc(), you should not use cdev_init() to initialize it.

Let's check the source code.

void cdev_init(struct cdev *cdev, 
               const struct file_operations *fops)
{
    memset(cdev, 0, sizeof *cdev);
    INIT_LIST_HEAD(&cdev->list);
    kobject_init(&cdev->kobj, &ktype_cdev_default);

    cdev->ops = fops;
}

struct cdev *cdev_alloc(void)
{

    /* cdev is allocated here! */
    struct cdev *p = kzalloc(sizeof(struct cdev),
                     GFP_KERNEL);
    if (p) {
        INIT_LIST_HEAD(&p->list);
        kobject_init(&p->kobj, &ktype_cdev_dynamic);
    }
    return p;
}

If you create the cdev structure dynamically, its kobject uses ktype_cdev_dynamic, so cdev_dynamic_release() is called to release it.
If you then call cdev_init() on that structure, the kobject type is replaced with ktype_cdev_default, whose release function is cdev_default_release().
In that case, cdev_del() no longer frees the structure after removing the device from the kernel, which causes a memory leak.
You would need to free the memory manually, which is not recommended.

static void cdev_default_release(struct kobject *kobj)
{
    struct cdev *p = container_of(kobj, struct cdev, kobj);
    cdev_purge(p);
}

static void cdev_dynamic_release(struct kobject *kobj)
{
    struct cdev *p = container_of(kobj, struct cdev, kobj);
    cdev_purge(p);
    kfree(p);    /* cdev is freed here! */
}

static struct kobj_type ktype_cdev_default = {
    .release        = cdev_default_release,
};

static struct kobj_type ktype_cdev_dynamic = {
    .release        = cdev_dynamic_release,
};
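
To make the effect of the two release callbacks concrete, here is a small userspace model of the mechanism. It is not kernel code: every model_* name is hypothetical, reference counting is ignored, and malloc/free stand in for kzalloc/kfree. It only shows that overwriting the release hook of a heap-allocated object, as cdev_init() does, leaves nobody to free it on delete.

```c
#include <stdlib.h>

/* Userspace stand-in for struct cdev: only the release hook is modeled. */
struct model_cdev {
    void (*release)(struct model_cdev *p);
};

static int freed_count;   /* how many objects a release hook has freed */

/* Mirrors cdev_default_release(): does NOT free p. */
static void model_default_release(struct model_cdev *p)
{
    (void)p;
}

/* Mirrors cdev_dynamic_release(): frees p. */
static void model_dynamic_release(struct model_cdev *p)
{
    free(p);
    freed_count++;
}

/* Mirrors cdev_alloc(): heap allocation plus the dynamic release hook. */
static struct model_cdev *model_cdev_alloc(void)
{
    struct model_cdev *p = calloc(1, sizeof(*p));

    if (p)
        p->release = model_dynamic_release;
    return p;
}

/* Mirrors cdev_init(): overwrites the release hook with the default one. */
static void model_cdev_init(struct model_cdev *p)
{
    p->release = model_default_release;
}

/* Mirrors the final kobject put that cdev_del() leads to. */
static void model_cdev_del(struct model_cdev *p)
{
    p->release(p);
}

/* Returns 1 if the object was freed on delete, 0 if it leaked, -1 on error. */
static int model_run(int call_init)
{
    struct model_cdev *p = model_cdev_alloc();
    int before = freed_count;

    if (!p)
        return -1;
    if (call_init)
        model_cdev_init(p);   /* the mistake described above */
    model_cdev_del(p);
    if (freed_count == before) {
        free(p);              /* manual cleanup so the demo itself does not leak */
        return 0;
    }
    return 1;
}
```

Calling model_run(0) follows the recommended dynamic path and reports that the object was freed on delete, while model_run(1) adds the cdev_init()-style call and reports a leak.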