Hi,
I'm trying to use SSE intrinsics in the Linux kernel, following a previous post in this forum: https://software.intel.com/en-us/forums/intel-isa-extensions/topic/543853
I've included x86intrin.h as described there, and I call kernel_fpu_begin before calling my intrinsics. However, I get a General Protection Fault (0) when the movdqa instruction executes.
Basically, what my C code is doing is:
const u8 *someFunction(...)
{
    const __m128i var = _mm_setzero_si128();
    const __m128i var2 = _mm_set1_epi8(0xf);
    .....
    __m128i var3 = _mm_loadu_si128(some_pointer);
    ....
}
The corresponding disassembly around the faulting instruction, as given in the panic report, is:
All code
========
   0:   00 48 c7                add    %cl,-0x39(%rax)
   3:   c1                      (bad)
   4:   f0 fe                   lock (bad)
   6:   f4                      hlt
   7:   81 ba 11 06 00 00 eb    cmpl   $0x2e66a6eb,0x611(%rdx)
   e:   a6 66 2e
  11:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  18:   00
  19:   0f 1f 00                nopl   (%rax)
  1c:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  21:   55                      push   %rbp
  22:   48 8d 2c 24             lea    (%rsp),%rbp
  26:   48 8d 64 24 e0          lea    -0x20(%rsp),%rsp
  2b:*  66 0f 7f 45 f0          movdqa %xmm0,-0x10(%rbp)     <-- trapping instruction
  30:   48 85 ff                test   %rdi,%rdi
  33:   66 0f 7f 4d e0          movdqa %xmm1,-0x20(%rbp)
  38:   0f 84 f0 01 00 00       je     0x22e
  3e:   48                      rex.W
  3f:   85                      .byte 0x85

Code starting with the faulting instruction
===========================================
   0:   66 0f 7f 45 f0          movdqa %xmm0,-0x10(%rbp)
   5:   48 85 ff                test   %rdi,%rdi
   8:   66 0f 7f 4d e0          movdqa %xmm1,-0x20(%rbp)
   d:   0f 84 f0 01 00 00       je     0x203
  13:   48                      rex.W
  14:   85                      .byte 0x85
It seems that the memory operand I give to movdqa is not 16-byte aligned, but I don't really know how to check that.
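For reference, a minimal sketch of the kind of alignment check being asked about. It assumes the operand address is rebuilt by hand from the register dump in the panic report (for the trapping instruction, the value of RBP plus the displacement -0x10). The names is_sse_aligned and effective_address are hypothetical helpers for illustration, not kernel APIs, and the block is plain user-space C rather than kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

/* movdqa requires its memory operand to be aligned to 16 bytes. */
#define SSE_ALIGNMENT 16u

/* Hypothetical helper: true if addr is a multiple of 16. */
static bool is_sse_aligned(uintptr_t addr)
{
    return (addr & (SSE_ALIGNMENT - 1u)) == 0;
}

/*
 * Hypothetical helper: effective address of a base+displacement operand
 * such as -0x10(%rbp), given the base register value from the oops dump.
 */
static uintptr_t effective_address(uintptr_t base, intptr_t disp)
{
    return base + (uintptr_t)disp;
}
```

With the RBP value printed in the register dump, is_sse_aligned(effective_address(rbp, -0x10)) would tell whether the store at the trapping instruction was misaligned. Inside the kernel, the same test on a pointer can be written with the existing IS_ALIGNED() macro, for example IS_ALIGNED((unsigned long)ptr, 16), and logged with printk.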
According to the panic report, it happens right before I call _mm_setzero_si128. To make my code work, I had to add -mpreferred-stack-boundary=4 when compiling the unit that contains the SSE instructions. I also tried -mstackrealign in case it was my stack that was not aligned, but it had no effect. So my compile command line is basically:
gcc ...
(default kernel for Atom CPU)
    -fno-strict-aliasing -fno-common -mpreferred-stack-boundary=3 -march=atom -mtune=atom -m64 -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -fno-omit-frame-pointer -fno-optimize-sibling-calls
(added compiling arguments)
    -mpreferred-stack-boundary=4 -mstackrealign
Has anyone had a similar problem, or does anyone have an idea for debugging this further? At the very least, how can I check whether the address given to movdqa is aligned?
Thanks!