Channel: Intel® Software - Intel ISA Extensions

Using SSE intrinsics/movdqa in a Linux driver?


Hi,

 

I'm trying to use SSE intrinsics in the Linux kernel, following a previous post in this forum: https://software.intel.com/en-us/forums/intel-isa-extensions/topic/543853

 

I've included x86intrin.h as described in that post, and I call kernel_fpu_begin before calling my intrinsics. However, I get a General Protection Fault (0) when the movdqa instruction executes.

Basically, what my C code is doing is:

const u8 *someFunction(...) {
   const __m128i var = _mm_setzero_si128();
   const __m128i var2 = _mm_set1_epi8(0xf);
.....
   __m128i var3 =  _mm_loadu_si128(some_pointer);
....
}

The corresponding disassembly around the faulting instruction is:

All code
========
   0:   00 48 c7                add    %cl,-0x39(%rax)
   3:   c1                      (bad)
   4:   f0 fe                   lock (bad)
   6:   f4                      hlt
   7:   81 ba 11 06 00 00 eb    cmpl   $0x2e66a6eb,0x611(%rdx)
   e:   a6 66 2e
  11:   0f 1f 84 00 00 00 00    nopl   0x0(%rax,%rax,1)
  18:   00
  19:   0f 1f 00                nopl   (%rax)
  1c:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  21:   55                      push   %rbp
  22:   48 8d 2c 24             lea    (%rsp),%rbp
  26:   48 8d 64 24 e0          lea    -0x20(%rsp),%rsp
  2b:*  66 0f 7f 45 f0          movdqa %xmm0,-0x10(%rbp)                <-- trapping instruction
  30:   48 85 ff                test   %rdi,%rdi
  33:   66 0f 7f 4d e0          movdqa %xmm1,-0x20(%rbp)
  38:   0f 84 f0 01 00 00       je     0x22e
  3e:   48                      rex.W
  3f:   85                      .byte 0x85

Code starting with the faulting instruction
===========================================
   0:   66 0f 7f 45 f0          movdqa %xmm0,-0x10(%rbp)
   5:   48 85 ff                test   %rdi,%rdi
   8:   66 0f 7f 4d e0          movdqa %xmm1,-0x20(%rbp)
   d:   0f 84 f0 01 00 00       je     0x203
  13:   48                      rex.W
  14:   85                      .byte 0x85
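
For what it's worth, the trapping store writes to -0x10(%rbp), so whether its target is 16-byte aligned depends only on the value of RBP at the time of the fault. The arithmetic can be sketched in plain user-space C as below. The helper names are made up for illustration, and the RBP value would have to be copied by hand from the register dump in the oops, which is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address that a "disp(%rbp)" operand resolves to (wraps modulo 2^64). */
static uint64_t rbp_operand_address(uint64_t rbp, int64_t disp)
{
    return rbp + (uint64_t)disp;
}

/*
 * movdqa faults unless its memory operand is 16-byte aligned.
 * rbp is a value copied by hand from the oops register dump.
 */
static bool movdqa_operand_is_aligned(uint64_t rbp, int64_t disp)
{
    /* Sanity check: a 64-bit kernel stack frame is at least 8-byte aligned. */
    assert((rbp & 0x7) == 0);
    return (rbp_operand_address(rbp, disp) & 0xf) == 0;
}
```

With a hypothetical RBP ending in 0x8, movdqa_operand_is_aligned(rbp, -0x10) returns false, which would be consistent with a stack that is only 8-byte aligned on entry to the function. With an RBP ending in 0x0 it returns true for both the -0x10 and -0x20 stores.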

 

It seems that the address I give to movdqa is not 16-byte aligned, but I don't really know how to check that.
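
One way to check a pointer at run time might be to look at its low four bits, since movdqa needs a memory operand on a 16-byte boundary. Below is a minimal user-space sketch of that test. The helper name misalignment is hypothetical. Inside the kernel, the existing IS_ALIGNED() macro should do the same job, with the result printed via printk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Number of bytes by which p sits past the previous align-byte boundary.
 * A result of 0 means p is aligned. align must be a power of two.
 */
static size_t misalignment(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (size_t)((uintptr_t)p & (align - 1));
}
```

Calling misalignment(&local, 16) on a stack variable, or on the pointer passed to a load or store intrinsic, gives 0 for an address that movdqa can use and a nonzero offset (typically 8 on a stack that is only 8-byte aligned) otherwise.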

According to the panic report, the fault happens right before I call _mm_setzero_si128. To make my code work, I had to add -mpreferred-stack-boundary=4 when compiling the unit that contains the SSE instructions. I also tried -mstackrealign in case it was my stack that was misaligned, but it had no effect. So my compile command line is basically:

gcc ... (default kernel for Atom CPU) -fno-strict-aliasing -fno-common -mpreferred-stack-boundary=3 -march=atom -mtune=atom -m64 -mno-red-zone -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args -fstack-protector -fno-omit-frame-pointer -fno-optimize-sibling-calls (added compiling arguments) -mpreferred-stack-boundary=4 -mstackrealign

 

Has anyone run into a similar problem, or does anyone have an idea for debugging this further? At the very least, how can I check whether the address given to movdqa is aligned?

 

Thanks!

 

 

Thread Topic: 

How-To
