Channel: Intel® Software - Intel ISA Extensions
Browsing all 685 articles

AVX Optimizations and Performance: Visual Studio vs GCC

Greetings, I have recently written some code using AVX intrinsics to perform a convolution in my software. I have compiled and run this code on two platforms with the following compilation...

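The poster's code is elided in this excerpt. As a point of reference only, here is a minimal sketch of a 1-D AVX convolution that vectorizes across output positions. The function name `conv1d_avx`, the valid-mode output length, and the unflipped (correlation-style) kernel are assumptions. The GCC/Clang `target` attribute is used so the file builds without `-mavx`; MSVC does not accept this attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical valid-mode 1-D convolution (kernel not flipped):
   out[i] = sum_j in[i + j] * k[j], for 0 <= i <= n - klen. */
__attribute__((target("avx")))
void conv1d_avx(const float *in, size_t n, const float *k, size_t klen, float *out)
{
    assert(klen > 0 && n >= klen);
    size_t nout = n - klen + 1;
    size_t i = 0;
    /* Vector path: 8 consecutive outputs per iteration. */
    for (; i + 8 <= nout; i += 8) {
        __m256 acc = _mm256_setzero_ps();
        for (size_t j = 0; j < klen; ++j) {
            __m256 x = _mm256_loadu_ps(in + i + j);  /* in[i+j .. i+j+7] */
            __m256 w = _mm256_set1_ps(k[j]);         /* broadcast tap j */
            acc = _mm256_add_ps(acc, _mm256_mul_ps(x, w));
        }
        _mm256_storeu_ps(out + i, acc);
    }
    /* Scalar tail for the remaining (< 8) outputs. */
    for (; i < nout; ++i) {
        float s = 0.0f;
        for (size_t j = 0; j < klen; ++j)
            s += in[i + j] * k[j];
        out[i] = s;
    }
}
```

Because every load is unaligned (`_mm256_loadu_ps`), `in` needs no particular alignment. When comparing compilers on code like this, it is worth confirming that both builds actually emit VEX-encoded AVX instructions.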


Bloated instruction counts in SDE compared with those from HW PMC 0xC0

I have noted in multiple circumstances (infrequent, but frequent enough) that the instruction count for a binary executed in SDE and the count reported by PMC 0xC0 differ by ORDERS of magnitude....



Poor Code Gen of FMA3 instructions in SPEC FP 06 using Intel 14.0.0 compiler...

I have compiled SPEC FP 06 using the Intel 14.0.0 compiler suite. I've observed great performance, but upon looking at the code-gen distributions through SDE, I note that only about 0.1% of the...


AVX-512 is a big step forward - but repeating past mistakes!

AVX512 is arguably the biggest step yet in the evolution of the x86 instruction set in terms of new instructions, new registers and new features. The first try was the Knights Corner instruction set....


Studying Intel TSX Performance: strange results

Dear all, I have studied Intel TSX performance: its abort cases and a comparison with a spin lock. The study, with references to the source code, is available at...



mem address directly from SSE/AVX register

Hello, I would like to make a suggestion. Very often, [otherwise well-vectorizable] algorithms require reading/writing from/to memory addresses that are calculated per channel (reading from a table, sampling...

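For the read side of this suggestion, AVX2 (Haswell onward) provides gather instructions in which each lane supplies its own index into a table; scattered stores only arrived later with AVX-512. A minimal sketch of a per-lane table lookup, assuming GCC/Clang and a hypothetical function `table_lookup_avx2` with valid 32-bit indices:

```c
#include <stddef.h>
#include <immintrin.h>

/* out[i] = table[idx[i]]; each lane forms its own address
   table + idx[i] * sizeof(float). Requires AVX2 (vgatherdps). */
__attribute__((target("avx2")))
void table_lookup_avx2(const float *table, const int *idx, size_t n, float *out)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256i vi = _mm256_loadu_si256((const __m256i *)(idx + i));
        __m256 v = _mm256_i32gather_ps(table, vi, 4);  /* scale = sizeof(float) */
        _mm256_storeu_ps(out + i, v);
    }
    for (; i < n; ++i)          /* scalar tail */
        out[i] = table[idx[i]];
}
```

A gather is not necessarily faster than scalar loads; whether it pays off depends on the microarchitecture and the access pattern.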

Intel® Software Development Emulator, Release 6.7

Hello, we just released version 6.7 of the Intel® Software Development Emulator. It is available here: http://www.intel.com/software/sde. It includes: Debugging with GDB is now supported with Intel®...


MOVNTI and alignment for real mode

In the SDM rev. 48, vol. 2A, page 3-546, in the description of the exceptions for the MOVNTI instruction in real mode, it is specified that the instruction can generate #GP: If a memory operand is...



Intel® Software Development Emulator, Release 6.12

Hello, we just released version 6.12 of the Intel® Software Development Emulator. It is available here: http://www.intel.com/software/sde. It includes: Support for Mac OS X version 10.9. Improved the TSX...



Instruction set extensions programming reference, revision 17

An updated instruction set extensions programming reference, revision 17, has been posted here. It includes information about: Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions, Intel®...


Are there any books about SIMD (SSE, AVX, and so on) optimization?

Can someone please recommend a few books on program optimization? I use multithreading and SIMD to improve the performance of my program. I have always learned SIMD through the website, and I ask questions in...


Latest ASM compiler other than Intel C and C++ Compilers

Hi, I am trying to code my application in assembly to run on x86. Please suggest a suitable compiler that supports all SSE4.2 assembly instructions (other than the Intel compiler). If any links which...


Unaligned loads: AVX-128 vs. AVX-256

I just saw that my cases using _mm256_loadu_ps show better performance than _mm_loadu_ps on corei7-4, where the latter was faster on earlier AVX platforms (in part due to the ability of ICL/icc to...

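To reproduce this kind of comparison, the two load widths can be placed side by side in a simple reduction. This is a hypothetical kernel, not the poster's code; timing is left out, and both functions are compiled for AVX so the 128-bit version is VEX-encoded too (GCC/Clang `target` attribute assumed).

```c
#include <stddef.h>
#include <immintrin.h>

/* Sum using 128-bit unaligned loads (4 floats per step). */
__attribute__((target("avx")))
float sum_loadu128(const float *p, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; ++i) s += p[i];      /* scalar tail */
    return s;
}

/* Same reduction using 256-bit unaligned loads (8 floats per step). */
__attribute__((target("avx")))
float sum_loadu256(const float *p, size_t n)
{
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(p + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float s = 0.0f;
    for (int k = 0; k < 8; ++k) s += lanes[k];
    for (; i < n; ++i) s += p[i];      /* scalar tail */
    return s;
}
```

Which variant wins depends on the microarchitecture and on how often loads cross cache-line boundaries, so both should be timed on the target machine with the same data.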


Will AVX-512 replace the need for dedicated GPUs?

I do not expect it to replace high-end graphics cards, and it will likely be less power-efficient than a dedicated GPU (integrated or discrete). As far as I can tell, performance-wise it will easily...


ICPC 13.0.2 generates scalar load instead of packed load

Hi all, I'm a little puzzled about the generated assembly code for this little piece of Cilk code: void gemv(const float* restrict A[4], const float *restrict x, float * restrict y){...



Gather instructions and the size of indexes for a given base GPR size

Hi, I have a simple question. When performing address computations, the sizes of the BASE and the INDEX are required to be the same. I presumed this was the case in the GATHER instructions, but I...

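In the gather instructions the base is still a general-purpose register (64-bit in 64-bit mode), while the width of each vector index is fixed by the instruction form: dword indices for the `d` forms (e.g. vgatherdpd) and qword indices for the `q` forms (e.g. vgatherqpd). Each index is sign-extended and scaled before being added to the base. A minimal sketch, assuming GCC/Clang and hypothetical wrapper names, that gathers the same doubles with 32-bit and with 64-bit indices around a base pointer in the middle of an array:

```c
#include <stdint.h>
#include <immintrin.h>

/* 4 x 32-bit indices with a 64-bit base pointer (vgatherdpd). */
__attribute__((target("avx2")))
void gather_pd_idx32(const double *base, const int32_t idx[4], double out[4])
{
    __m128i vi = _mm_loadu_si128((const __m128i *)idx);
    _mm256_storeu_pd(out, _mm256_i32gather_pd(base, vi, 8));  /* scale = sizeof(double) */
}

/* 4 x 64-bit indices with the same base pointer (vgatherqpd). */
__attribute__((target("avx2")))
void gather_pd_idx64(const double *base, const int64_t idx[4], double out[4])
{
    __m256i vi = _mm256_loadu_si256((const __m256i *)idx);
    _mm256_storeu_pd(out, _mm256_i64gather_pd(base, vi, 8));
}
```

Negative indices such as -5 work in both forms because the indices are sign-extended, so with `base = arr + 5` an index of -5 reads `arr[0]`.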


FMA manipulation of register’s content for XMM, YMM and ZMM register sets

Hello, there wasn’t a typical introduction thread, so since it’s my first post I thought I would introduce myself. My name is Mile (yes, like the measuring unit) and I’m a student. I’m a noob in this area. I’m...

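At the intrinsic level the same fused multiply-add is available at each register width; only the vector type changes. A minimal sketch for XMM and YMM, assuming GCC/Clang and hypothetical function names. The ZMM form, `_mm512_fmadd_ps`, requires AVX-512F and is not exercised here.

```c
#include <immintrin.h>

/* d = a*b + c with a single rounding, 4 floats in an XMM register. */
__attribute__((target("avx,fma")))
void fma_xmm(const float *a, const float *b, const float *c, float *d)
{
    __m128 r = _mm_fmadd_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), _mm_loadu_ps(c));
    _mm_storeu_ps(d, r);
}

/* Same operation on 8 floats in a YMM register. */
__attribute__((target("avx,fma")))
void fma_ymm(const float *a, const float *b, const float *c, float *d)
{
    __m256 r = _mm256_fmadd_ps(_mm256_loadu_ps(a), _mm256_loadu_ps(b),
                               _mm256_loadu_ps(c));
    _mm256_storeu_ps(d, r);
}
```

At the instruction level, FMA3 overwrites one of its three source registers. The 132/213/231 suffixes (e.g. vfmadd231ps computes dest = src2*src3 + dest) say which operands are multiplied and which is added, and the compiler picks the form that fits its register allocation.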


Get _mm_alignr_epi8 functionality on 256-bit vector registers (AVX2)

Hello, I'm porting an application from SSE to AVX2 and KNC. I have some _mm_alignr_epi8 intrinsics. While I just had to replace this intrinsic with the _mm512_alignr_epi32 intrinsic for KNC (by the way, I...

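The AVX2 `_mm256_alignr_epi8` works within each 128-bit lane, so it is not a drop-in replacement for a full-width byte alignment. One common workaround first builds the lane-crossing middle vector with `_mm256_permute2x128_si256`. A minimal sketch, assuming GCC/Clang, a compile-time shift `n` with 0 < n < 16, and the hypothetical macro and wrapper names below:

```c
#include <stdint.h>
#include <immintrin.h>

/* Full-width analogue of _mm_alignr_epi8(hi, lo, n) for 256-bit vectors:
   treats hi:lo as 64 bytes (hi in the upper half) and returns bytes n..n+31.
   Valid only for a compile-time constant 0 < n < 16.
   permute2x128 with 0x21 yields [lo.upper128, hi.lower128], which supplies
   the bytes that must cross the 128-bit lane boundary. */
#define ALIGNR256_EPI8(hi, lo, n) \
    _mm256_alignr_epi8(_mm256_permute2x128_si256((lo), (hi), 0x21), (lo), (n))

/* Example wrapper: shift by 4 bytes (one 32-bit element). */
__attribute__((target("avx2")))
void alignr256_by4(const uint8_t hi[32], const uint8_t lo[32], uint8_t out[32])
{
    __m256i vhi = _mm256_loadu_si256((const __m256i *)hi);
    __m256i vlo = _mm256_loadu_si256((const __m256i *)lo);
    _mm256_storeu_si256((__m256i *)out, ALIGNR256_EPI8(vhi, vlo, 4));
}
```

Shifts of 16 bytes or more need a different combination (for n == 16 the permute alone is the answer), so this covers only the lower half of the range.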

How to clear the upper 128 bits of an __m256 value?

How can I clear the upper 128 bits of m2:
    __m256i m2 = _mm256_set1_epi32(2);
    __m128i m1 = _mm_set1_epi32(1);
    m2 = _mm256_castsi128_si256(_mm256_castsi256_si128(m2));
    m2 =...

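A cast round trip is not enough here, because `_mm256_castsi128_si256` leaves the upper 128 bits of its result undefined. Two explicit ways to zero them are sketched below, assuming GCC/Clang; the helper and demo names are hypothetical.

```c
#include <stdint.h>
#include <immintrin.h>

/* AVX2: mask bits 4..7 (0xF0) take dwords 4..7 from the all-zero operand,
   dwords 0..3 are kept from v. */
__attribute__((target("avx2")))
static inline __m256i zero_upper128_avx2(__m256i v)
{
    return _mm256_blend_epi32(v, _mm256_setzero_si256(), 0xF0);
}

/* AVX-only alternative: insert the low half of v into an all-zero vector. */
__attribute__((target("avx")))
static inline __m256i zero_upper128_avx(__m256i v)
{
    return _mm256_insertf128_si256(_mm256_setzero_si256(),
                                   _mm256_castsi256_si128(v), 0);
}

/* Demo: apply both helpers to m2 = set1(2) and store the results. */
__attribute__((target("avx2")))
void zero_upper_demo(int32_t out_avx2[8], int32_t out_avx[8])
{
    __m256i m2 = _mm256_set1_epi32(2);
    _mm256_storeu_si256((__m256i *)out_avx2, zero_upper128_avx2(m2));
    _mm256_storeu_si256((__m256i *)out_avx, zero_upper128_avx(m2));
}
```

Both versions leave the lower four dwords equal to 2 and the upper four equal to 0.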

Different ways to turn an AoS into an SoA

Hi, I'm trying to implement a permutation that turns an AoS (where the structure has 4 floats) into an SoA, using SSE, AVX, AVX2 and KNC, and without using gather operations, to find out whether it is worth...

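For the SSE case with a 4-float structure, one gather-free approach treats four consecutive structures as a 4x4 matrix and transposes it in registers with the `_MM_TRANSPOSE4_PS` macro. A minimal sketch, assuming a hypothetical `vec4` structure with no padding and hypothetical output arrays. The AVX, AVX2 and KNC variants are not shown.

```c
#include <stddef.h>
#include <immintrin.h>

/* Hypothetical 4-float structure (16 bytes, no padding). */
typedef struct { float x, y, z, w; } vec4;

/* AoS -> SoA with SSE: load four structures as matrix rows, transpose,
   and each resulting row holds one field of four consecutive structures. */
void aos_to_soa_sse(const vec4 *aos, size_t n,
                    float *xs, float *ys, float *zs, float *ws)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        const float *p = (const float *)(aos + i);
        __m128 r0 = _mm_loadu_ps(p + 0);    /* x0 y0 z0 w0 */
        __m128 r1 = _mm_loadu_ps(p + 4);    /* x1 y1 z1 w1 */
        __m128 r2 = _mm_loadu_ps(p + 8);    /* x2 y2 z2 w2 */
        __m128 r3 = _mm_loadu_ps(p + 12);   /* x3 y3 z3 w3 */
        _MM_TRANSPOSE4_PS(r0, r1, r2, r3);  /* r0 = x0..x3, r1 = y0..y3, ... */
        _mm_storeu_ps(xs + i, r0);
        _mm_storeu_ps(ys + i, r1);
        _mm_storeu_ps(zs + i, r2);
        _mm_storeu_ps(ws + i, r3);
    }
    for (; i < n; ++i) {                /* scalar tail */
        xs[i] = aos[i].x; ys[i] = aos[i].y;
        zs[i] = aos[i].z; ws[i] = aos[i].w;
    }
}
```

The transpose costs a fixed number of shuffles per four structures, which gives a baseline to compare against gather-based versions on wider instruction sets.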