How does Duff’s Device work?

Duff’s device is a trick to express loop unrolling directly in C or C++ without extra code to treat the leftover partial loop.The trick is to use a switch statement where all but one of the cases labels are in the middle of a while loop. Further, all cases fall through to the end of the while loop. Despite the impression, it makes Duff’s device is legal C and C++ code (however, it is not valid in Java).

How is it useful?

There are two key things to Duff’s device.

  1. First, the loop is unrolled. We get more speed by avoiding some of the overhead involved in checking whether the loop is finished and jumping back to the top of the loop. The CPU can run faster when it’s executing straight-line code instead of jumping. However code size becomes larger.
  2. The second aspect is the switch statement. It allows the code to jump into the middle of the loop the first time through. The surprising part to most people is that such a thing is allowed. Well, it’s allowed.

Example

filter_none

edit
close

play_arrow

link
brightness_4
code

// C program to illustrate the working of
// Duff's Device. The program copies given
// number of elements bool array src[] to
// dest[]
#include<stdio.h>
#include<string.h>
#include <stdlib.h>
  
// Copies size bits from src[] to dest[]
void copyBoolArray(bool src[], bool dest[],
                               int size)
{
    // Do copy in rounds of size 8.
    int rounds = size / 8;
  
    int i = 0;
    switch (size % 8)
    {
    case 0:
        while (!(rounds == 0))
        {
            rounds = rounds - 1;
            dest[i] = src[i];
            i = i + 1;
  
        // An important point is, the switch
        // control can directly reach below labels
        case 7:
            dest[i] = src[i];
            i = i + 1;
        case 6:
            dest[i] = src[i];
            i = i + 1;
        case 5:
            dest[i] = src[i];
            i = i + 1;
        case 4:
            dest[i] = src[i];
            i = i + 1;
        case 3:
            dest[i] = src[i];
            i = i + 1;
        case 2:
            dest[i] = src[i];
            i = i + 1;
        case 1:
            dest[i] = src[i];
            i = i + 1;
        };
    }
}
  
// Driver code
int main()
{
    int size = 20;
    bool dest[size], src[size];
  
    // Assign some random values to src[]
    int i;
    for (i=0; i<size; i++)
        src[i] = rand()%2;
  
    copyBoolArray(src, dest, size);
  
    for (i=0; i<size ; i++)
        printf("%d\t%d\n", src[i], dest[i]);
}

chevron_right


Output:

1    1
0    0
1    1
1    1
1    1
1    1
0    0
0    0
1    1
1    1
0    0
1    1
0    0
1    1
1    1
0    0
0    0
0    0
0    0
1       1

How does the above program work?
In the above example, we are dealing with 20 bytes and copying these 20 random bits from src array to destination array. A number of passes for which it will run also depends on the size of the source array.

For the first pass, execution starts at the calculated case label, and then it falls through to each successive assignment statement, just like any other switch statement. After the last case label, execution reaches the bottom of the loop, at which point it jumps back to the top. The top of the loop is inside the switch statement, so the switch is not re-evaluated anymore.

The original loop is unwound eight times, so the number of iterations is divided by eight. If the number of bytes to be copied isn’t a multiple of eight, then there are some bytes left over. Most algorithms that copy blocks of bytes at a time will handle the remainder bytes at the end, but Duff’s device handles them at the beginning. The function calculates count % 8 for the switch statement to figure what the remainder will be, jumps to the case label for that many bytes, and copies them. Then the loop continues to copy groups of eight bytes in the left passes.

Flow Chart

For first pass:

rounds = count / 8;

// rounds = 2 for count =20
i = 0;
switch(count % 8)
{

// The remainder is 4 (20 modulo 8)
// so jump to the case 4
case 0:
    while( !(rounds == 0) )
    {
        //[skipped]
        rounds = rounds - 1;
        dest[i] = src[i];
        i = i + 1;

    case 7:
        dest[i] = src[i];
        i = i + 1;     //[skipped]
    case 6:
        dest[i] = src[i];
        i = i + 1;     //[skipped]
    case 5:
        dest[i] = src[i];
        i = i + 1;     //[skipped]

    case 4:
        dest[i] = src[i];
        i = i + 1;     //Start here. Copy 1 byte (total 1)
    case 3:
        dest[i] = src[i];
        i = i + 1;     //Copy 1 byte (total 2)
    case 2:
        dest[i] = src[i];
        i = i + 1;     //Copy 1 byte (total 3)
    case 1:
        dest[i] = src[i];
        i = i + 1;     //Copy 1 byte (total 4)
    };
}

For second pass:

rounds = count / 8;
i = 0;
switch(count % 8)
{
case 0:
    while( !(rounds == 0) )
    {
        // rounds is decremented by 1 as while
        // loop works, now rounds=1
        rounds = rounds - 1;
        dest[i] = src[i];
        i = i + 1;     // Copy 1 byte (total 5)
    case 7:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 6)
    case 6:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 7)
    case 5:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 8)
    case 4:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 9)
    case 3:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 10)
    case 2:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 11)
    case 1:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 12)
    }
}

For third pass:

rounds = count / 8;
i = 0;
switch(count % 8)
{
case 0:
    while( !(rounds == 0) )
    {
        //now while loop works
        rounds = rounds - 1;    //rounds is decremented by 1, now rounds=0
        dest[i] = src[i];
        i = i + 1;         // Copy 1 byte (total 13)

    case 7:
        dest[i] = src[i];
        i = i + 1;    // Copy 1 byte (total 14)
    case 6:
        dest[i] = src[i];
        i = i + 1;    // Copy 1 byte (total 15)
    case 5:
        dest[i] = src[i];
        i = i + 1;    // Copy 1 byte (total 16)
    case 4:
        dest[i] = src[i];
        i = i + 1;    // Copy 1 byte (total 17)
    case 3:
        dest[i] = src[i];
        i = i + 1;     // Copy 1 byte (total 18)
    case 2:
        dest[i] = src[i];
        i = i + 1;     // Copy 1 byte (total 19)
    case 1:
        dest[i] = src[i];
        i = i + 1;     // Copy 1 byte (total 20)
    };
}

References:
Wikipedia
http://www.lysator.liu.se/c/duffs-device.html

This article is contributed by Nikhil Kumar Sharma. If you like GeeksforGeeks and would like to contribute, you can also write an article using contribute.geeksforgeeks.org or mail your article to contribute@geeksforgeeks.org. See your article appearing on the GeeksforGeeks main page and help other Geeks.

Please write comments if you find anything incorrect, or you want to share more information about the topic discussed above.



My Personal Notes arrow_drop_up


Article Tags :
Practice Tags :


Be the First to upvote.


Please write to us at contribute@geeksforgeeks.org to report any issue with the above content.