I've rewritten SSE2 code to AVX2, but performance is only 30-40% better, even though the number of instructions has halved. What could be the problem?
How can I find out the running time of each instruction? Is there any difference in timing between an SSE instruction and its AVX2 analog?
CPU: Intel Core i7-4790K
Compiler: GCC
I would really appreciate your help.