Open In App

How does Duff’s Device work?

Improve
Improve
Like Article
Like
Save
Share
Report

Duff’s device is a trick to express loop unrolling directly in C or C++ without extra code to treat the leftover partial loop.The trick is to use a switch statement where all but one of the cases labels are in the middle of a while loop. Further, all cases fall through to the end of the while loop. Despite the impression, it makes Duff’s device is legal C and C++ code (however, it is not valid in Java).

How is it useful?

There are two key things to Duff’s device.

  1. First, the loop is unrolled. We get more speed by avoiding some of the overhead involved in checking whether the loop is finished and jumping back to the top of the loop. The CPU can run faster when it’s executing straight-line code instead of jumping. However code size becomes larger.
  2. The second aspect is the switch statement. It allows the code to jump into the middle of the loop the first time through. The surprising part to most people is that such a thing is allowed. Well, it’s allowed.

Example




// C program to illustrate the working of
// Duff's Device. The program copies given
// number of elements bool array src[] to
// dest[]
#include<stdio.h>
#include<string.h>
#include <stdlib.h>
  
// Copies size bits from src[] to dest[]
void copyBoolArray(bool src[], bool dest[],
                               int size)
{
    // Do copy in rounds of size 8.
    int rounds = size / 8;
  
    int i = 0;
    switch (size % 8)
    {
    case 0:
        while (!(rounds == 0))
        {
            rounds = rounds - 1;
            dest[i] = src[i];
            i = i + 1;
  
        // An important point is, the switch
        // control can directly reach below labels
        case 7:
            dest[i] = src[i];
            i = i + 1;
        case 6:
            dest[i] = src[i];
            i = i + 1;
        case 5:
            dest[i] = src[i];
            i = i + 1;
        case 4:
            dest[i] = src[i];
            i = i + 1;
        case 3:
            dest[i] = src[i];
            i = i + 1;
        case 2:
            dest[i] = src[i];
            i = i + 1;
        case 1:
            dest[i] = src[i];
            i = i + 1;
        };
    }
}
  
// Driver code
int main()
{
    int size = 20;
    bool dest[size], src[size];
  
    // Assign some random values to src[]
    int i;
    for (i=0; i<size; i++)
        src[i] = rand()%2;
  
    copyBoolArray(src, dest, size);
  
    for (i=0; i<size ; i++)
        printf("%d\t%d\n", src[i], dest[i]);
}


Output:

1    1
0    0
1    1
1    1
1    1
1    1
0    0
0    0
1    1
1    1
0    0
1    1
0    0
1    1
1    1
0    0
0    0
0    0
0    0
1       1

How does the above program work?
In the above example, we are dealing with 20 bytes and copying these 20 random bits from src array to destination array. A number of passes for which it will run also depends on the size of the source array.

For the first pass, execution starts at the calculated case label, and then it falls through to each successive assignment statement, just like any other switch statement. After the last case label, execution reaches the bottom of the loop, at which point it jumps back to the top. The top of the loop is inside the switch statement, so the switch is not re-evaluated anymore.

The original loop is unwound eight times, so the number of iterations is divided by eight. If the number of bytes to be copied isn’t a multiple of eight, then there are some bytes left over. Most algorithms that copy blocks of bytes at a time will handle the remainder bytes at the end, but Duff’s device handles them at the beginning. The function calculates count % 8 for the switch statement to figure what the remainder will be, jumps to the case label for that many bytes, and copies them. Then the loop continues to copy groups of eight bytes in the left passes.

Flow Chart

For first pass:

rounds = count / 8;

// rounds = 2 for count =20
i = 0;
switch(count % 8)
{

// The remainder is 4 (20 modulo 8)
// so jump to the case 4
case 0:
    while( !(rounds == 0) )
    {
        //[skipped]
        rounds = rounds - 1;
        dest[i] = src[i];
        i = i + 1;

    case 7:
        dest[i] = src[i];
        i = i + 1;     //[skipped]
    case 6:
        dest[i] = src[i];
        i = i + 1;     //[skipped]
    case 5:
        dest[i] = src[i];
        i = i + 1;     //[skipped]

    case 4:
        dest[i] = src[i];
        i = i + 1;     //Start here. Copy 1 byte (total 1)
    case 3:
        dest[i] = src[i];
        i = i + 1;     //Copy 1 byte (total 2)
    case 2:
        dest[i] = src[i];
        i = i + 1;     //Copy 1 byte (total 3)
    case 1:
        dest[i] = src[i];
        i = i + 1;     //Copy 1 byte (total 4)
    };
}

For second pass:

rounds = count / 8;
i = 0;
switch(count % 8)
{
case 0:
    while( !(rounds == 0) )
    {
        // rounds is decremented by 1 as while
        // loop works, now rounds=1
        rounds = rounds - 1;
        dest[i] = src[i];
        i = i + 1;     // Copy 1 byte (total 5)
    case 7:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 6)
    case 6:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 7)
    case 5:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 8)
    case 4:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 9)
    case 3:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 10)
    case 2:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 11)
    case 1:
        dest[i] = src[i];
        i = i + 1; // Copy 1 byte (total 12)
    }
}

For third pass:

rounds = count / 8;
i = 0;
switch(count % 8)
{
case 0:
    while( !(rounds == 0) )
    {
        //now while loop works
        rounds = rounds - 1;    //rounds is decremented by 1, now rounds=0
        dest[i] = src[i];
        i = i + 1;         // Copy 1 byte (total 13)

    case 7:
        dest[i] = src[i];
        i = i + 1;    // Copy 1 byte (total 14)
    case 6:
        dest[i] = src[i];
        i = i + 1;    // Copy 1 byte (total 15)
    case 5:
        dest[i] = src[i];
        i = i + 1;    // Copy 1 byte (total 16)
    case 4:
        dest[i] = src[i];
        i = i + 1;    // Copy 1 byte (total 17)
    case 3:
        dest[i] = src[i];
        i = i + 1;     // Copy 1 byte (total 18)
    case 2:
        dest[i] = src[i];
        i = i + 1;     // Copy 1 byte (total 19)
    case 1:
        dest[i] = src[i];
        i = i + 1;     // Copy 1 byte (total 20)
    };
}

References:
Wikipedia
http://www.lysator.liu.se/c/duffs-device.html



Last Updated : 13 Sep, 2023
Like Article
Save Article
Previous
Next
Share your thoughts in the comments
Similar Reads