Hi,
I'm testing a custom implementation of strcmp() which involves SSE4.2 and this instruction in particular:
pcmpistri $0x18,(%rsi,%rax,1),%xmm1
I wrote a test that passes unaligned pointers to the custom strcmp(). It looks like this:
#include <string.h>

const char a[] = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";
const char b[] = "abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyz";

int main()
{
    return strcmp(a + 1, b + 1);
}
I verified in the debugger that the pointers are actually not 16-byte aligned when the above instruction is executed. I expected the instruction to fault, but it did not. In fact, the function works correctly, i.e. it returns 0.
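The same alignment check could also be done in code rather than in the debugger. This is only a minimal sketch: is_aligned16() is a hypothetical helper name, not part of the strcmp() under test, and it assumes that converting a pointer to uintptr_t yields the linear address, as it does on x86-64.

```c
#include <stdint.h>

/* Hypothetical helper (not from the strcmp() under test):
 * returns 1 if p is 16-byte aligned, 0 otherwise.
 * Assumes uintptr_t holds the linear address, as on x86-64. */
static int is_aligned16(const void *p)
{
    /* A 16-byte aligned address has its low four bits clear. */
    return ((uintptr_t)p & 15u) == 0;
}
```

Calling it on the effective address the instruction uses (base plus index, not just the original argument) would show whether that particular access is aligned.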
My question is: does pcmpistri not actually require its memory operand to be 16-byte aligned? I looked in the "Intel(R) 64 and IA-32 Architectures Software Developer's Manual", and the instruction is documented as taking an m128 operand, which, as I understand it, is required to be aligned.
I'm running on a Sandy Bridge CPU.