Hi all,
I ran a quick test to understand the behavior of the _mm256_load_ps and _mm256_loadu_ps SIMD intrinsics, and the result is quite unexpected.
I am wondering whether this is a bug.
When I load a register from an unaligned address with _mm256_load_ps, I expect to get a general-protection exception, whereas no exception is expected with _mm256_loadu_ps.
However, no exception occurs when I use the aligned-load intrinsic. For instance, in the code below I would expect an exception on the second iteration at the latest, since &a[0] and &a[1] are only 4 bytes apart and cannot both be 32-byte aligned.
    for (i = 0; i + 8 <= size; i += 1) {   /* 1-float step; i + 8 <= size keeps loads in bounds */
        t0 = _mm256_load_ps(&a[i]);
        t1 = _mm256_load_ps(&b[i]);
        t2 = _mm256_add_ps(t0, t1);
        _mm256_store_ps(&c[i], t2);
    }
This happens regardless of whether the a, b, and c arrays are aligned or unaligned.
Is there any documentation that explains this behavior and the performance implications of such unaligned accesses?
The full code is attached below.
Thanks,
Aketh