A CPU has a 32 KB direct-mapped cache with a 128-byte block size. Suppose A is a two-dimensional array of size 512×512 whose elements occupy 8 bytes each. Consider the following two C code segments, P1 and P2.
P1:
for (i=0; i<512; i++) {
    for (j=0; j<512; j++) {
        x += A[i][j];
    }
}
P2:
for (i=0; i<512; i++) {
    for (j=0; j<512; j++) {
        x += A[j][i];
    }
}
P1 and P2 are executed independently with the same initial state: the array A is not in the cache, and i, j, and x are in registers. Let the number of cache misses experienced by P1 be M1 and that experienced by P2 be M2. The value of M1 is:
0
2048
16384
262144
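
One way to check an answer is to simulate the cache directly. The sketch below is not part of the original question: the function name count_misses, the enum constants, and the assumption that A starts at address 0 (so it is block-aligned) are all hypothetical choices made for illustration. It relies on C storing arrays in row-major order and, as a simplification, keeps the full block number as the tag for each cache line.

```c
#include <assert.h>
#include <stdint.h>

enum {
    CACHE_BYTES = 32 * 1024,                 /* 32 KB cache */
    BLOCK_BYTES = 128,                       /* 128-byte blocks */
    NUM_LINES   = CACHE_BYTES / BLOCK_BYTES, /* 256 lines, direct mapped */
    N           = 512,                       /* A is N x N */
    ELEM_BYTES  = 8                          /* 8 bytes per element */
};

/* Counts cache misses for one full traversal of A, starting from an
   empty cache. column_order == 0 models P1 (A[i][j]); any other value
   models P2 (A[j][i]). Assumes A begins at address 0. */
long count_misses(int column_order)
{
    int64_t tag[NUM_LINES];
    long misses = 0;

    assert(CACHE_BYTES % BLOCK_BYTES == 0);

    /* -1 marks an empty line; real block numbers are never negative. */
    for (int k = 0; k < NUM_LINES; k++)
        tag[k] = -1;

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++) {
            int row = column_order ? j : i;
            int col = column_order ? i : j;

            /* Row-major byte offset of A[row][col]. */
            int64_t addr  = ((int64_t)row * N + col) * ELEM_BYTES;
            int64_t block = addr / BLOCK_BYTES;
            int     line  = (int)(block % NUM_LINES);

            if (tag[line] != block) {  /* miss: load block into its line */
                misses++;
                tag[line] = block;
            }
        }
    }
    return misses;
}
```

Under these assumptions, count_misses(0) gives 16384 for P1: each 128-byte block holds 16 elements and is loaded exactly once, so 512×512/16 = 16384. count_misses(1) gives 262144 for P2, because consecutive accesses are 4096 bytes apart and the blocks that share a cache line evict one another before they can be reused.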
