Hi everyone,
I need to convert __m512d to __m512 in my current project to get better performance, since single precision lets me process twice as many numbers per vector.
I'm not very familiar with the AVX-512 extensions, so I'm not sure whether my code is the most efficient way to do this.
My code is below:
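inline void pd2ps(__m512d *a1, __m512d *a2, __m512 *b)
{
    __m256 t1, t2;
    __m512 tb1, tb2;

    /* The rounding argument has to be a compile-time constant, so the
       flags are passed directly instead of through an int variable. */
    t1 = _mm512_cvt_roundpd_ps(*a1, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    t2 = _mm512_cvt_roundpd_ps(*a2, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);

    /* Widen to 512 bits; the upper 256 bits are undefined after the cast. */
    tb1 = _mm512_castps256_ps512(t1);
    tb2 = _mm512_castps256_ps512(t2);

    /* 0x44 picks 128-bit lanes 0,1 of tb1 (low half) and lanes 0,1 of tb2 (high half). */
    *b = _mm512_shuffle_f32x4(tb1, tb2, 0x44);
}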
Is there a better way to convert __m512d to __m512?
I hope you can give me some advice or share a more efficient implementation.
Thank you.
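For comparison, here is a minimal sketch of one alternative way to do the combine step in pd2ps above: convert each half with _mm512_cvtpd_ps and place the second result into the upper 256 bits with _mm512_insertf64x4, which only needs AVX512F. The helper name pd2ps_16 and its array-based signature are hypothetical, chosen so the example is self-contained. It uses the current MXCSR rounding mode (round-to-nearest by default) rather than an explicit rounding argument, and the target attribute assumes GCC or Clang. This is not claimed to be faster than the shuffle version; both are expected to compile to two conversions plus one lane-crossing instruction, so it is worth measuring on the actual hardware.

```c
#include <immintrin.h>

/* Hypothetical helper (not from the original post): converts 16 doubles
   at src into 16 floats at dst. The target attribute is a GCC/Clang
   extension; with other compilers, enable AVX-512F through build flags. */
__attribute__((target("avx512f")))
static void pd2ps_16(const double *src, float *dst)
{
    /* Unaligned loads, so the caller needs no special alignment. */
    __m512d a1 = _mm512_loadu_pd(src);
    __m512d a2 = _mm512_loadu_pd(src + 8);

    /* Each conversion narrows 8 doubles to 8 floats (a 256-bit result),
       rounding according to the current MXCSR rounding mode. */
    __m256 lo = _mm512_cvtpd_ps(a1);
    __m256 hi = _mm512_cvtpd_ps(a2);

    /* Put lo in the lower 256 bits, then overwrite the (undefined) upper
       256 bits with hi. The pd casts are bitwise reinterpretations only. */
    __m512d wide = _mm512_castps_pd(_mm512_castps256_ps512(lo));
    wide = _mm512_insertf64x4(wide, _mm256_castps_pd(hi), 1);

    _mm512_storeu_ps(dst, _mm512_castpd_ps(wide));
}
```

Calling pd2ps_16(src, dst) with a double src[16] and a float dst[16] should leave dst[i] equal to (float)src[i] for every i, provided the CPU supports AVX-512F.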