Hi,
Can two AVX instructions be executed in parallel?
For example,
Version 1:
a1 = _mm256_load_ps(Rin + offset);
a2 = _mm256_load_ps(Gin + offset);
a3 = _mm256_load_ps(Bin + offset);
ac0 = _mm256_mul_ps(a1, in2outAvx_11);
ac1 = _mm256_mul_ps(a2, in2outAvx_12);
ac2 = _mm256_mul_ps(a3, in2outAvx_13);
z0 = _mm256_add_ps(ac0, ac1);
z1 = _mm256_add_ps(z0, ac2);
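For reference, a minimal sketch of Version 1 wrapped in a complete C function might look like this. The function name rgb_combine_v1, the scalar coefficients c11/c12/c13 that are broadcast into in2outAvx_11..13, and the requirement that n be a multiple of 8 are assumptions, not part of the original code. The AVX path needs 32-byte-aligned arrays (as _mm256_load_ps requires) and is only compiled when __AVX__ is defined (for example with -mavx); otherwise a scalar loop with the same operation order is used.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical wrapper around Version 1:
 *   out[i] = (Rin[i]*c11 + Gin[i]*c12) + Bin[i]*c13
 * Assumes n is a multiple of 8 and, on the AVX path, that all four
 * arrays are 32-byte aligned (required by _mm256_load_ps/_mm256_store_ps). */
static void rgb_combine_v1(const float *Rin, const float *Gin, const float *Bin,
                           float *out, size_t n,
                           float c11, float c12, float c13)
{
    assert(n % 8 == 0); /* one __m256 holds 8 floats */
#ifdef __AVX__
    /* Broadcast each coefficient into all 8 lanes. */
    const __m256 in2outAvx_11 = _mm256_set1_ps(c11);
    const __m256 in2outAvx_12 = _mm256_set1_ps(c12);
    const __m256 in2outAvx_13 = _mm256_set1_ps(c13);
    for (size_t offset = 0; offset < n; offset += 8) {
        __m256 a1 = _mm256_load_ps(Rin + offset);
        __m256 a2 = _mm256_load_ps(Gin + offset);
        __m256 a3 = _mm256_load_ps(Bin + offset);
        __m256 ac0 = _mm256_mul_ps(a1, in2outAvx_11);
        __m256 ac1 = _mm256_mul_ps(a2, in2outAvx_12);
        __m256 ac2 = _mm256_mul_ps(a3, in2outAvx_13);
        __m256 z0 = _mm256_add_ps(ac0, ac1);
        __m256 z1 = _mm256_add_ps(z0, ac2);
        _mm256_store_ps(out + offset, z1);
    }
#else
    /* Scalar fallback with the same association: (R*c11 + G*c12) + B*c13. */
    for (size_t i = 0; i < n; ++i) {
        float z0 = Rin[i] * c11 + Gin[i] * c12;
        out[i] = z0 + Bin[i] * c13;
    }
#endif
}
```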
If I change this code to
Version 2:
a1 = _mm256_load_ps(Rin + offset);
a2 = _mm256_load_ps(Gin + offset);
a3 = _mm256_load_ps(Bin + offset);
ac0 = _mm256_mul_ps(a1, in2outAvx_11);
ac1 = _mm256_mul_ps(a2, in2outAvx_12);
/* The first two instructions below are data-independent and might run in parallel. */
z0 = _mm256_add_ps(ac0, ac1);
ac2 = _mm256_mul_ps(a3, in2outAvx_13);
z1 = _mm256_add_ps(z0, ac2);
Will Version 2 run faster because the add and the mul can execute at the same time?
Or do Version 1 and Version 2 take the same time because the compiler reorders the instructions by itself?
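One rough way to compare the two versions is to model each as a dependency graph and compute, for every operation, the earliest step at which it could start once its operands are ready. This is a simplified sketch, not a performance model: it assumes every operation (load, mul, add) costs one unit of time and ignores real latencies, execution-port limits and reorder-buffer size. The names Op, schedule, version1 and version2 are hypothetical.

```c
#include <assert.h>

/* One operation and the indices of up to two earlier operations it reads
 * (-1 = no dependency). The name is only for documentation. */
typedef struct {
    const char *name;
    int dep[2];
} Op;

/* Fills depth[i] with the earliest step operation i could run in, assuming
 * every operation takes one unit of time, and returns the longest chain. */
static int schedule(const Op *ops, int n, int depth[])
{
    int longest = 0;
    for (int i = 0; i < n; ++i) {
        int d = 0;
        for (int k = 0; k < 2; ++k) {
            int p = ops[i].dep[k];
            if (p >= 0) {
                assert(p < i); /* operands must be produced earlier in program order */
                if (depth[p] > d)
                    d = depth[p];
            }
        }
        depth[i] = d + 1;
        if (depth[i] > longest)
            longest = depth[i];
    }
    return longest;
}

/* Version 1 in source order. */
static const Op version1[] = {
    {"a1 = load R",  {-1, -1}}, /* 0 */
    {"a2 = load G",  {-1, -1}}, /* 1 */
    {"a3 = load B",  {-1, -1}}, /* 2 */
    {"ac0 = a1*c11", { 0, -1}}, /* 3 */
    {"ac1 = a2*c12", { 1, -1}}, /* 4 */
    {"ac2 = a3*c13", { 2, -1}}, /* 5 */
    {"z0 = ac0+ac1", { 3,  4}}, /* 6 */
    {"z1 = z0+ac2",  { 6,  5}}, /* 7 */
};

/* Version 2: the ac2 multiply is moved after z0; operands are unchanged. */
static const Op version2[] = {
    {"a1 = load R",  {-1, -1}}, /* 0 */
    {"a2 = load G",  {-1, -1}}, /* 1 */
    {"a3 = load B",  {-1, -1}}, /* 2 */
    {"ac0 = a1*c11", { 0, -1}}, /* 3 */
    {"ac1 = a2*c12", { 1, -1}}, /* 4 */
    {"z0 = ac0+ac1", { 3,  4}}, /* 5 */
    {"ac2 = a3*c13", { 2, -1}}, /* 6 */
    {"z1 = z0+ac2",  { 5,  6}}, /* 7 */
};
```

Under this model both versions have a longest chain of four steps (load, mul, add, add), and ac2 lands at depth 2 in both, alongside ac0 and ac1, wherever it appears in the source. Only the textual order differs; the data dependencies are the same.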