# Optional CRC Instructions in ARMv8

## Introduction

The 32-bit Cyclic-Redundancy-Check (CRC) algorithm is defined to be: CRC (M(x)) = x^deg(P(x)) * M(x) mod P(x)

Where M(x) is the message (the data we need the CRC for), P(x) is the 33-bit CRC polynomial and deg(P(x)) is the degree of the polynomial will be equal to 32. mod is the modulus (or remainder). Or in words, the CRC is the remainder of the scaled message when divided by our checking polynomial.

We are dealing with polynomial arithmetic over {0,1}, so multiplication is a special kind of multiplication (corresponding to the type found in the instruction: PMULL), and addition is XOR.

As an example for 4-bit polynomials: 1 + 3 = 0001 + 0011 = (XOR) = 0010 = 2. Multiplication is trickier: 3 * 3 = 0011 * 0011, which is equivalent to: (x + 1) * (x + 1) = x2 + 2*x + 1 = x2 + 1 = 0101 = 5. This is also referred to as carry-less multiplication as the 2* term disappears (in conventional arithmetic 2 == x and we arrive at the result 3*3 = 9). We can also think about this in terms of long multiplication with xor instead of addition at the end:

 0 0 1 1 ⊗ 0 0 1 1 0 0 1 1 ⊕ 0 0 1 1 0 0 0 1 0 1

There is an optional set of instructions to compute 32-bit CRCs for the following polynomials (the 33rd bit of P(x) is assumed to be 1):

 Polynomial P(x) Colloquial Name Instructions 0x04C11DB7 CRC-32 CRC32B, CRC32H, CRC32W, CRC32X 0x1EDC6F41 CRC-32c (Castagnoli) CRC32CB, CRC32CH, CRC32CW, CRC32CX

One uses the instructions in the following way:

 Suffix Data type Example B 1-byte CRC32B w2, w1, w0 H 2-bytes CRC32H w2, w1, w0 W 4-bytes CRC32W w2, w1, w0 X 8-bytes CRC32X w2, w1, x0

All four instructions above take an input 32-bit CRC value in w1, update the CRC to take into account the new information in w0 (or x0) and store the updated CRC in w2.

## Example Code

The following code computes the CRC32c in the kernel:

```   1 #define CRC32CX(crc, value) __asm__("crc32cx %w[c], %w[c], %x[v]":[c]"+r"(crc):[v]"r"(value))
2 #define CRC32CW(crc, value) __asm__("crc32cw %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value))
3 #define CRC32CH(crc, value) __asm__("crc32ch %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value))
4 #define CRC32CB(crc, value) __asm__("crc32cb %w[c], %w[c], %w[v]":[c]"+r"(crc):[v]"r"(value))
5
6 static u32 crc32c_arm64_le_hw(u32 crc, const u8 *p, unsigned int len)
7 {
8         s64 length = len;
9
10         while ((length -= sizeof(u64)) >= 0) {
11                 CRC32CX(crc, get_unaligned_le64(p));
12                 p += sizeof(u64);
13         }
14
15         /* The following is more efficient than the straight loop */
16         if (length & sizeof(u32)) {
17                 CRC32CW(crc, get_unaligned_le32(p));
18                 p += sizeof(u32);
19         }
20         if (length & sizeof(u16)) {
21                 CRC32CH(crc, get_unaligned_le16(p));
22                 p += sizeof(u16);
23         }
24         if (length & sizeof(u8))
25                 CRC32CB(crc, *p);
26
27         return crc;
28 }
```

Due to a lack of general availability of compiler intrinsics for the CRC instruction on AArch64, we use the gcc inline assembler definitions.

## Detection of the CRC instruction

The preferred mechanism to detect the presence of the CRC instruction under Linux is to use HWCAPs. Here is an example:

```   1 #include <stdio.h>
2 #include <sys/auxv.h>
3
4 #ifndef HWCAP_CRC32
5 #define HWCAP_CRC32             (1 << 7)
6 #endif /* HWCAP_CRC32 */
7
8 int main(void)
9 {
10         unsigned long hwcap = getauxval(AT_HWCAP);
11         if (hwcap & HWCAP_CRC32) {
12                 printf("CRC32 instructions present\n");
13                 return 1;
14         }
15
16         printf("CRC32 instructions NOT present\n");
17         return 0;
18 }
```

One can also manually inspect /proc/cpuinfo for the presence of CRC, but applications should always adopt the HWCAP approach shown above.