Channel: Intel® Software - Intel ISA Extensions

Dynamic Shift

Hello,

I am trying to implement a dynamic shift. Let me explain the task: I process data with SSE and AVX. Data is loaded, processed, and the results are stored afterwards. To support arbitrary lengths, I need some kind of masked load, and I need it for SSE as well, not only for AVX.

Suppose my length is 9 elements and I work with int32 and SSE, so 4 elements per vector. The first and second loads are fine. The third load is also fine with respect to memory bounds; reading past the end is not a problem here. But only element 0 of the vector register is valid, and the others need to be zero. What is the best way to achieve this?

I get the rest count as length AND (vectorelements - 1), which is 1 for the case with 9 elements. So I would need some shift with a variable count: start with a register filled with all-one bits, shift in the right number of zeros, and AND this mask with the loaded data. But are there any shifts with a variable count? I did not find them. Another idea would be to fill a register with the ascending values 0,1,2,3 and do a less-than compare against the rest count:

0,1,2,3 LT 1,1,1,1 = 1,0,0,0
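
A minimal sketch of this compare idea for SSE2 and int32 follows. The helper names tail_count and load_tail_epi32_sse2 are hypothetical and only serve the illustration. The sketch assumes, as stated above, that reading a full 16 bytes at the tail is permitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 */

/* Hypothetical helper: number of leftover elements when `lanes`
   is a power of two (4 for int32 in a 128-bit register). */
static int tail_count(size_t length, size_t lanes)
{
    return (int)(length & (lanes - 1));
}

/* Hypothetical helper: loads 4 int32 values from `tail`, keeps the first
   `rest` of them and zeroes the remaining lanes. The full 16 bytes are
   read, so the memory behind the valid elements must be readable. */
static void load_tail_epi32_sse2(const int32_t *tail, int rest, int32_t out[4])
{
    assert(rest >= 0 && rest <= 4);

    const __m128i lane  = _mm_setr_epi32(0, 1, 2, 3);   /* 0,1,2,3 */
    const __m128i count = _mm_set1_epi32(rest);         /* rest in every lane */

    /* lane < rest gives all-one bits in valid lanes, zero elsewhere. */
    const __m128i mask = _mm_cmplt_epi32(lane, count);

    const __m128i data = _mm_loadu_si128((const __m128i *)tail);
    _mm_storeu_si128((__m128i *)out, _mm_and_si128(data, mask));
}
```

Note that the compare produces all-one bits (0xFFFFFFFF) rather than the value 1 in each valid lane, which is the form needed for the AND. For float data the same mask could be reinterpreted with _mm_castsi128_ps and applied with _mm_and_ps.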

This would be the correct mask. But I have trouble doing this in AVX, as even AVX2 does not have a full set of compare instructions. So basically I want a convenient way to implement a kind of masked load for SSE and AVX, for int32 and float. The code is allowed to load all of the data; that is no problem. For AVX there is a maskload, but how do I create a mask for my problem?
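
One possible way to build the mask for the AVX maskload without any compare instruction is to slide an unaligned load over a small constant table of all-one lanes followed by zero lanes. The sketch below assumes float data, plain AVX, and a GCC- or Clang-style target attribute. The names tail_mask_table and load_tail_ps_avx are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <immintrin.h>  /* AVX */

/* Eight all-ones lanes followed by eight zero lanes. Loading 8 lanes
   starting at index (8 - rest) yields `rest` leading all-ones lanes. */
static const int32_t tail_mask_table[16] = {
    -1, -1, -1, -1, -1, -1, -1, -1,
     0,  0,  0,  0,  0,  0,  0,  0
};

/* Hypothetical helper: loads the first `rest` floats from `src` and
   zeroes the remaining lanes. The attribute lets GCC/Clang compile this
   function for AVX even without -mavx; the CPU must support AVX. */
__attribute__((target("avx")))
static void load_tail_ps_avx(const float *src, int rest, float out[8])
{
    assert(rest >= 0 && rest <= 8);

    /* Unaligned load of the mask window from the table. */
    const __m256i mask =
        _mm256_loadu_si256((const __m256i *)(tail_mask_table + 8 - rest));

    /* Lanes whose mask sign bit is clear are returned as zero. */
    const __m256 v = _mm256_maskload_ps(src, mask);
    _mm256_storeu_ps(out, v);
}
```

The same table idea carries over to SSE with a table of 4 + 4 entries and an AND instead of the maskload. With AVX2, _mm256_maskload_epi32 accepts the same kind of mask for int32 data, and the compare approach also becomes possible through _mm256_cmpgt_epi32 with swapped operands, since a < b is the same as b > a.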

