The Intel Software Developer's Manual describes VPSRLDQ as: "Shift zmm2/m512 right by imm8 bytes while shifting in 0s and store result in zmm1."
VPSRLDQ zmm1, zmm2/m512, imm8
I want to shift zmm5 right by 8 bytes (1 quadword) into zmm1. Here's the code:
vpxorq zmm1, zmm1, zmm1
vpsrldq zmm1, zmm5, 8
This is the zmm5 register, which holds the same value before and after VPSRLDQ executes:
zmm5:
v8_int64 = {0xc4, 0x1ae9, 0x441, 0x144, 0x0, 0x0, 0x0, 0x0}
This is the zmm1 register after VPSRLDQ executes:
zmm1:
v8_int64 = {0x1ae9, 0x0, 0x144, 0x0, 0x0, 0x0, 0x0, 0x0}
I expected zmm1 to be {0x0, 0xc4, 0x1ae9, 0x441, 0x144, 0x0, 0x0, 0x0}. Instead, the first qword (0xc4) is dropped and the result alternates between the 2nd element, zero, the 4th element, zero, and so on.
My encoding appears to be by the book. Why does it not just shift right by 8 bytes?
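For comparison, here is a minimal Python sketch that models a byte shift applied separately to each 128-bit lane of the register rather than to all 512 bits at once. The per-lane behaviour is an assumption on my part, picked because it matches the debugger output above. The function names are made up for illustration and are not part of any library.

```python
# Model of a byte-wise right shift applied independently to each 128-bit
# lane of a 512-bit register. The per-lane behaviour is an assumption,
# chosen because it reproduces the zmm1 value seen in the debugger.

MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def qwords_to_lanes(qwords):
    """Pack 8 qwords (element 0 is least significant) into four 128-bit lanes."""
    return [qwords[i] | (qwords[i + 1] << 64) for i in range(0, 8, 2)]


def lanes_to_qwords(lanes):
    """Split four 128-bit lanes back into 8 qwords, low qword first."""
    out = []
    for lane in lanes:
        out.append(lane & MASK64)
        out.append(lane >> 64)
    return out


def shift_right_bytes_per_lane(qwords, imm8):
    """Shift every 128-bit lane right by imm8 bytes, filling with zeros."""
    count = min(imm8, 16)  # a count above 15 clears the whole lane
    lanes = qwords_to_lanes(qwords)
    shifted = [(lane >> (8 * count)) & MASK128 for lane in lanes]
    return lanes_to_qwords(shifted)


zmm5 = [0xc4, 0x1ae9, 0x441, 0x144, 0x0, 0x0, 0x0, 0x0]
zmm1 = shift_right_bytes_per_lane(zmm5, 8)
print([hex(q) for q in zmm1])
# ['0x1ae9', '0x0', '0x144', '0x0', '0x0', '0x0', '0x0', '0x0']
```

Under this model the high qword of each lane moves into the low qword and nothing crosses a lane boundary, which is exactly the pattern in zmm1. Note also that the debugger prints element 0 (the least significant qword) first, so even a full-width right shift by 8 bytes would move elements toward index 0 and give {0x1ae9, 0x441, 0x144, 0x0, ...}, not the value I expected above.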