Intel® Software - Intel ISA Extensions

AVX Optimizations and Performance: VisualStudio vs GCC

October 1, 2013, 6:46 pm

Greetings, I have recently written some code using AVX function calls to perform a convolution in my software. I have compiled and run this code on two platforms with the following compilation...

Bloated Instruction counts in SDE as compared with that from HW PMC 0xC0

October 2, 2013, 6:56 am

I have noted in multiple (infrequent, but frequent enough) circumstances that the instruction counts for execution of a binary in SDE and those reported by PMC 0xC0 differ by ORDERS of magnitude....

Poor Code Gen of FMA3 instructions in SPEC FP 06 using Intel 14.0.0 compiler...

October 2, 2013, 7:10 am

I have compiled SPEC FP 06 using the Intel 14.0.0 compiler suite. I've observed great performance, but upon looking at the code-gen distributions through SDE, I note that only about 0.1% of the...

AVX-512 is a big step forward - but repeating past mistakes!

October 11, 2013, 3:37 am

AVX-512 is arguably the biggest step yet in the evolution of the x86 instruction set in terms of new instructions, new registers and new features. The first try was the Knights Corner instruction set....

Studying Intel TSX Performance: strange results

November 11, 2013, 2:13 pm

Dear all, I've studied Intel TSX performance: its abort cases and how it compares with a spin lock. The study, with references to source code, is available at...

mem address directly from SSE/AVX register

November 14, 2013, 4:02 am

Hello, I would like to make a suggestion. Very often, [otherwise well-vectorizable] algorithms require reading/writing from/to memory addresses which are calculated per channel (reading from a table, sampling...

Intel® Software Development Emulator, Release 6.7

September 24, 2013, 6:35 am

Hello, we just released version 6.7 of the Intel® Software Development Emulator. It is available here: http://www.intel.com/software/sde It includes: Debugging with GDB is now supported with Intel®...

MOVNTI and alignment for real mode

December 3, 2013, 8:00 am

In the SDM rev. 48, vol. 2A, page 3-546, in the description of the exceptions for the MOVNTI instruction in real mode, it is specified that the instruction can generate #GP if a memory operand is...

Intel® Software Development Emulator, Release 6.12

December 4, 2013, 6:08 am

Hello, we just released version 6.12 of the Intel® Software Development Emulator. It is available here: http://www.intel.com/software/sde It includes: Support for Mac OS X version 10.9. Improved the TSX...

Instruction set extensions programming reference, revision 17

December 4, 2013, 10:01 am

An updated instruction set extensions programming reference, revision 17, has been posted here. It includes information about: Intel® Advanced Vector Extensions 512 (Intel® AVX-512) instructions, Intel®...

Are there any books about SIMD (SSE, AVX, and so on) optimization?

December 17, 2013, 2:08 am

Can someone please recommend a few books on program optimization? I use multithreading and SIMD to improve the performance of my program. I have always learned SIMD through websites, and asked questions in...

Latest ASM compiler other than Intel C and C++ Compilers

December 25, 2013, 11:21 pm

Hi, I am trying to code my application in assembly to run on x86. Please suggest a suitable compiler that supports all SSE4.2 assembly instructions (other than the Intel compiler). If any links which...

unaligned loads avx-128 vs. -256

January 4, 2014, 6:00 am

I just saw that my cases using _mm256_loadu_ps show better performance than _mm_loadu_ps on corei7-4, where the latter was faster on earlier AVX platforms (in part due to the ability of ICL/icc to...

Will AVX-512 replace the need for dedicated GPUs?

January 13, 2014, 1:44 am

I do not expect it to replace high-end graphics cards, and it will likely be less power-efficient than a dedicated GPU (integrated or discrete). As far as I can tell, performance-wise it will easily...

ICPC 13.0.2 generates scalar load instead of packed load

January 15, 2014, 1:45 am

Hi all, I'm a little puzzled about the generated assembly code for this little piece of Cilk code: void gemv(const float* restrict A[4], const float *restrict x, float * restrict y){...

gather instructions and the size of indices for a given base GPR size

January 15, 2014, 7:18 am

Hi, I have a simple question. When performing address computations, the size of the BASE and the INDEX are required to be the same. I presumed this was the case in the GATHER instructions, but I...
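
For readers following the question, here is a minimal scalar sketch in C of how a gather with 32-bit indices forms its addresses against a pointer-sized base. It assumes the commonly documented behaviour that each vector index is sign-extended to the address size, multiplied by the scale, and added to the base; masking and fault handling are left out. The name `gather_f32_model` and the 8-lane width are illustrative choices, not part of any Intel API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of an 8-lane float gather with 32-bit indices.
 * Assumption: each index is sign-extended to pointer width, multiplied by
 * scale_bytes (1, 2, 4 or 8 in the real encoding) and added to the base.
 * Masking and fault behaviour are deliberately not modelled. */
static void gather_f32_model(float dst[8], const float *base,
                             const int32_t idx[8], int scale_bytes)
{
    for (int lane = 0; lane < 8; ++lane) {
        /* Widen the 32-bit index before scaling so negative indices work. */
        ptrdiff_t offset = (ptrdiff_t)idx[lane] * scale_bytes;
        const unsigned char *addr = (const unsigned char *)base + offset;
        memcpy(&dst[lane], addr, sizeof(float));
    }
}
```

With a base pointing into the middle of a table, negative indices reach the elements before it, which is the visible effect of the sign extension.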

FMA manipulation of register’s content for XMM, YMM and ZMM register sets

January 21, 2014, 7:23 pm

Hello, there wasn’t a typical introduction thread, so since it’s my first post I thought I would introduce myself. My name is Mile (yes, like the measuring unit) and I’m a student. I’m a noob in this area. I’m...

Get _mm_alignr_epi8 functionality on 256-bit vector registers (AVX2)

January 27, 2014, 3:52 am

Hello, I'm porting an application from SSE to AVX2 and KNC. I have some _mm_alignr_epi8 intrinsics. While I just had to replace this intrinsic by the _mm512_alignr_epi32 intrinsic for KNC (by the way, I...
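
As background to this question, the scalar C sketch below models why the port is not a drop-in replacement: the 256-bit AVX2 form, _mm256_alignr_epi8, applies the byte shift independently within each 128-bit lane instead of across the whole register. The function names are hypothetical models written for illustration, not intrinsics, and the "full-width" variant only shows the result the poster appears to want, not how to obtain it with AVX2 instructions.

```c
#include <stdint.h>
#include <string.h>

/* Model of _mm_alignr_epi8(hi, lo, shift): concatenate hi:lo into 32 bytes,
 * shift right by `shift` bytes, keep the low 16 bytes (zeros shift in). */
static void alignr128_model(uint8_t dst[16], const uint8_t hi[16],
                            const uint8_t lo[16], unsigned shift)
{
    uint8_t concat[32];
    memcpy(concat, lo, 16);
    memcpy(concat + 16, hi, 16);
    for (unsigned i = 0; i < 16; ++i) {
        unsigned src = i + shift;
        dst[i] = (src < 32) ? concat[src] : 0;
    }
}

/* Model of the AVX2 form: the 128-bit operation runs once per lane. */
static void alignr256_per_lane_model(uint8_t dst[32], const uint8_t hi[32],
                                     const uint8_t lo[32], unsigned shift)
{
    alignr128_model(dst, hi, lo, shift);
    alignr128_model(dst + 16, hi + 16, lo + 16, shift);
}

/* Hypothetical full-width behaviour: one shift across all 64 bytes. */
static void alignr256_full_model(uint8_t dst[32], const uint8_t hi[32],
                                 const uint8_t lo[32], unsigned shift)
{
    uint8_t concat[64];
    memcpy(concat, lo, 32);
    memcpy(concat + 32, hi, 32);
    for (unsigned i = 0; i < 32; ++i) {
        unsigned src = i + shift;
        dst[i] = (src < 64) ? concat[src] : 0;
    }
}
```

Feeding both models the same inputs shows the two results diverging at the lane boundary, which is the gap an AVX2 port has to close with extra permutes.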

How to clear the upper 128 bits of __m256 value?

January 27, 2014, 4:47 am

How can I clear the upper 128 bits of m2: __m256i m2 = _mm256_set1_epi32(2); __m128i m1 = _mm_set1_epi32(1); m2 = _mm256_castsi128_si256(_mm256_castsi256_si128(m2)); m2 =...

Different ways to turn an AoS into an SoA

February 8, 2014, 3:13 am

Hi, I'm trying to implement a permutation that turns an AoS (where the structure has 4 floats) into an SoA, using SSE, AVX, AVX2 and KNC, and without using gather operations, to find out if it is worth...
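
For reference, a plain scalar version of the permutation being described might look like the C sketch below; any SIMD variant would need to reproduce its output. The struct name `Vec4`, its field names, and the function `aos_to_soa` are assumptions made for illustration, since the excerpt does not show the poster's types.

```c
#include <stddef.h>

/* Hypothetical 4-float structure; the original post does not name its fields. */
typedef struct {
    float x, y, z, w;
} Vec4;

/* Scalar reference: split an array of structures into four separate arrays.
 * For n == 4 this is a 4x4 transpose of the input viewed as a matrix. */
static void aos_to_soa(const Vec4 *aos, size_t n,
                       float *xs, float *ys, float *zs, float *ws)
{
    for (size_t i = 0; i < n; ++i) {
        xs[i] = aos[i].x;
        ys[i] = aos[i].y;
        zs[i] = aos[i].z;
        ws[i] = aos[i].w;
    }
}
```

A version like this is mainly useful as a correctness baseline and as the timing reference against which shuffle-based or gather-based variants can be compared.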
