Greetings,
I have recently written some code that uses AVX intrinsic calls to perform a convolution in my software. I have compiled and run this code on two platforms with the following compilation settings of note:
1. Windows 7 w/ Visual Studio 2010 on a i7-2760QM
Optimization: Maximize Speed (/O2)
Inline Function Expansion: Only __inline (/Ob1)
Enable Intrinsic Functions: No
Favor Size or Speed: Favor fast code (/Ot)
2. Fedora Linux 15 w/ gcc 4.6 on a i7-3612QE
Flags: -O3 -mavx -m64 -march=corei7-avx -mtune=corei7-avx
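For context, the AVX path is a 1-D convolution kernel built from intrinsics. The sketch below only illustrates the kind of inner loop I mean; the function name, arguments, and the missing scalar tail are simplifications, not my exact code:

#include <immintrin.h>

/* Illustrative 8-wide AVX inner loop for a 1-D convolution
   (correlation-style, no kernel flip); names and layout are
   placeholders, not my exact code. */
static void conv1d_avx(const float *in, const float *taps, float *out,
                       int n, int klen)
{
    int nout = n - klen + 1;
    int i, k;
    for (i = 0; i + 8 <= nout; i += 8) {
        __m256 acc = _mm256_setzero_ps();
        for (k = 0; k < klen; ++k) {
            __m256 x = _mm256_loadu_ps(in + i + k);        /* 8 consecutive inputs */
            __m256 c = _mm256_set1_ps(taps[k]);            /* broadcast one tap    */
            acc = _mm256_add_ps(acc, _mm256_mul_ps(x, c)); /* multiply-accumulate  */
        }
        _mm256_storeu_ps(out + i, acc);                    /* 8 outputs at once    */
    }
    /* scalar tail for the remaining nout - i outputs omitted */
}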
For my testing I ran the C implementation and the AVX implementation on both platforms and got the following timing results:
In Visual Studio:
C Implementation: 30ms
AVX Implementation: 5ms
In GCC:
C Implementation: 9ms
AVX Implementation: 57ms
As you can see, my AVX numbers on Linux are much larger by comparison. My concern, and the reason for this post, is that I may not properly understand how to use AVX and which compiler settings are needed to build it correctly in each scenario. For example, take my Visual Studio run: if I change Enable Intrinsic Functions to Yes, my AVX time goes from 5ms to 59ms. Does that mean that preventing the compiler from substituting its own intrinsics and writing them manually gives that much better results in Visual Studio? Last I checked, there is nothing similar in gcc. Could Microsoft's compiler really produce that much better code than gcc in this case? Any ideas why my AVX numbers under gcc are so much larger? Any help is most appreciated. Cheers.