Intel® Software - Intel ISA Extensions

Why is FMA slower than SSE here?

December 16, 2016, 2:23 am

I am optimizing an app that computes correlation coefficients many times. The loops were easy to vectorize, but some calculations are also made outside of them. I tried to partially optimize them using...

AVX add slow due to vinsertf128

December 17, 2016, 5:44 pm

CPU is a 3820, ICL 14.0 and VS2013; the variables are doubles:

auto newvel = velocitiesy[i] + force;

That line is slow because the vinsertf128 instruction in this case has a high CPI of over 3.2, this is the...
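The thread is truncated, so this is a frequently reported explanation, offered as an assumption. When the compiler cannot prove that a 256-bit access is 32-byte aligned, it may split the unaligned load into a 128-bit load plus vinsertf128. It does this because cache-line-split 256-bit loads are expensive on Sandy Bridge. GCC, for instance, exposes this choice as -mavx256-split-unaligned-load.

A minimal sketch of the usual remedy follows: allocate the arrays on a 32-byte boundary and tell the compiler so. The names alloc_aligned_doubles and add_force are hypothetical. __builtin_assume_aligned is GCC/Clang-specific, and MSVC would need _aligned_malloc or _mm_malloc instead of C11 aligned_alloc.

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical helper: allocate n doubles on a 32-byte boundary so that
 * 256-bit AVX loads/stores never straddle a cache line. C11 aligned_alloc
 * requires the size to be a multiple of the alignment, so round up. */
double *alloc_aligned_doubles(size_t n)
{
    size_t bytes = n * sizeof(double);
    bytes = (bytes + 31) & ~(size_t)31;
    if (bytes == 0)
        bytes = 32;
    return aligned_alloc(32, bytes);
}

/* Illustrative update loop in the spirit of velocitiesy[i] + force.
 * __builtin_assume_aligned (GCC/Clang) tells the vectorizer the pointer
 * is 32-byte aligned, so it has no reason to split 256-bit loads into
 * 128-bit halves joined by vinsertf128. */
void add_force(double *vel, size_t n, double force)
{
    double *v = __builtin_assume_aligned(vel, 32);
    for (size_t i = 0; i < n; ++i)
        v[i] += force;
}
```

Release the buffer with free() when done.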

[XED] how to encode mov instruction

December 27, 2016, 12:16 am

Hello, all. I am trying to encode mov and call instructions on CentOS 7, but encountered an error. The source code is below:

#ifdef __X86_64__
#define XED_MMODE...

AVX512 suboptimal intrinsics compilation

January 3, 2017, 9:28 am

I'm looking into what the Intel compiler generates from AVX512 intrinsics (latest trial compiler, downloaded a few weeks ago). There are several strange things I notice, to...

Go programs (even empty ones) hang on exit

January 5, 2017, 12:56 am

Hi, if I compile an empty Go program and run it under SDE64 7.49 on Linux, the program does not exit unless I send it multiple signals (4 or 5 SIGINT or SIGQUIT signals seem to do the trick). By "empty" I mean...

Parallelization + Vectorization using OpenMP in Sandy Bridge

January 9, 2017, 12:05 am

Hi, I would like to ask a question about parallelization + vectorization:
1) Is it possible to implement parallelization and vectorization at the same time (i.e. access AVX on a Sandy Bridge processor using...
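OpenMP 4.0 added a combined construct that distributes loop iterations across threads and also asks the compiler to vectorize each thread's chunk. A minimal sketch, assuming a compiler with OpenMP 4.0 support and an AVX target flag (for example gcc -O2 -fopenmp -mavx). The function dot is a hypothetical example.

```c
/* Sketch: a dot product that OpenMP (4.0 or later) splits across threads
 * while the simd clause asks the compiler to vectorize each thread's
 * chunk (AVX if the target flags allow it). The reduction clause gives
 * each thread/lane a private partial sum that is combined at the end.
 * Without OpenMP enabled, the pragma is ignored and the loop runs
 * serially, producing the same result. */
double dot(const double *a, const double *b, long n)
{
    double sum = 0.0;
    #pragma omp parallel for simd reduction(+:sum)
    for (long i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Note that MSVC's /openmp implements only OpenMP 2.0, which has no simd construct. There the pragma would not provide the vectorization half.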

Code scales poorly with AVX

January 11, 2017, 6:32 pm

This code scales poorly with AVX on my Sandy Bridge. How can I make it more vectorizer-friendly?

for (auto i = 0; i < pcount; i += 2){
    for (auto j = 0; j < pcount; j += 2){
        if (i == j) continue;...
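The loop body is truncated in the teaser, so the following sketch assumes a hypothetical pairwise kernel (pair_term and its formula are made up). It shows one common vectorizer-friendly change: the if (i == j) continue; branch is removed by splitting the inner range into j < i and j > i, which leaves two branch-free loops. For simplicity it steps by 1 rather than by 2.

```c
#include <stddef.h>

/* Hypothetical pairwise term: a made-up smooth interaction between
 * particle i and particle j. */
static double pair_term(double xi, double xj)
{
    double d = xi - xj;
    return 1.0 / (d * d + 1.0);
}

/* For each i, sum pair_term over every j != i. Splitting the j range
 * into [0, i) and (i, n) replaces the `if (i == j) continue;` test with
 * two straight-line loops that the vectorizer can handle. */
void accumulate(const double *x, double *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        double xi = x[i], acc = 0.0;
        for (size_t j = 0; j < i; ++j)
            acc += pair_term(xi, x[j]);
        for (size_t j = i + 1; j < n; ++j)
            acc += pair_term(xi, x[j]);
        out[i] = acc;
    }
}
```

Because acc is a floating-point sum, the compiler may also need permission to reorder the additions before it will vectorize them. Options include #pragma omp simd reduction(+:acc) on the inner loops or a relaxed floating-point model.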

Is xend treated as a full memory barrier?

January 13, 2017, 6:25 am

I've started learning the RTM extensions. The most common examples I can find online use them to implement a mutex or concurrent lock. Often they are similar to:

#include...

_mm_prefetch usage

January 15, 2017, 6:01 am

Hi, I couldn't find an answer to this question, and it might be silly: does _mm_prefetch need vzeroupper when mixed with AVX or AVX2 code, since it is an SSE intrinsic and a non-VEX instruction? I am...
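As far as I know, the prefetch instructions behind _mm_prefetch take only a memory operand and never read or write XMM/YMM registers. They should therefore not trigger the SSE/AVX transition penalty that vzeroupper guards against.

Here is a minimal sketch of prefetching ahead inside a loop. PREFETCH_DIST and sum_with_prefetch are hypothetical, and the distance would need tuning. The __builtin_prefetch fallback for non-x86 targets assumes GCC or Clang.

```c
#include <stddef.h>
#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>
/* x86: PREFETCHT0 via the SSE intrinsic (no XMM/YMM register involved). */
#define PREFETCH_T0(p) _mm_prefetch((const char *)(p), _MM_HINT_T0)
#else
/* GCC/Clang fallback on other architectures: read prefetch, high locality. */
#define PREFETCH_T0(p) __builtin_prefetch((p), 0, 3)
#endif

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_DIST 64

double sum_with_prefetch(const double *a, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i) {
        /* Forming a pointer far past the end of the array is undefined
         * behaviour in C, so only prefetch while still inside it. */
        if (i + PREFETCH_DIST < n)
            PREFETCH_T0(&a[i + PREFETCH_DIST]);
        s += a[i];
    }
    return s;
}
```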

mitigating permute costs in AVX 256?

January 15, 2017, 9:21 am

Hello, I'm investigating converting a number of compute kernels from AVX 128 to AVX 256 and would appreciate any available guidance on getting a small number of operations on port 0+1...

How to speed up this code?

January 17, 2017, 4:26 pm

Hello everyone, many thanks to all who contributed to my previous question. Crazy things happen: 2 years ago I was moved internally to UI & Communication development to speed things up there :) So my...

CPI rate blows up

January 20, 2017, 4:01 pm

Hi, I'm trying to solve my last question here (sadly still unanswered) myself. Now I have some metrics from VTune Amplifier and see some strange results as well (the assembly code is generated by VS...

Cannot access compiler intrinsics for logarithm in Visual Studio

January 25, 2017, 8:43 am

Hello, I cannot use the compiler intrinsics related to logarithms in either Visual Studio 2013 or 2015. I tried to use _mm_log_ps and it was not found. I used the "immintrin.h" header file. I looked...

Question about latency

January 29, 2017, 7:09 am

AVXOP xmm0, xmm0, xmm1

Hi, years ago I read and heard various mysterious things about latencies caused by register choice when using AVX, and why it is better to use AVX instead of SSE -...

Random slowdowns with AVX2 code.

January 31, 2017, 4:40 pm

I wrote a subroutine mostly using AVX2 and AVX compiler intrinsics. I used some SSE instructions too, but I did set the enhanced instruction set to AVX2 in the Visual Studio project settings. My...

E5-1650 v4: What are the AVX 'Base' and 'Turbo' Speeds?

January 31, 2017, 6:04 pm

Hi, I'm trying to determine whether (what appears to be) unexpected throttling (below base frequency) on my new system is caused by AVX usage when I run various stress programs like Prime95 and the...

Question about performance difference SSE4/AVX vs. AVX2 with dual-channel vs....

February 1, 2017, 4:43 pm

Hi, today I have an interesting question about your experience (not just the theoretical improvement) with the code performance difference between SSE4/AVX on a dual-channel memory board vs. AVX2 with quad-channel...

Slightly OT, but maybe somebody has an idea.

February 6, 2017, 4:11 pm

Hi, (my question about ISA extensions is near the bottom of the post) today I found an old piece of code, profiled it and... nearly fell off my chair. What happened? This very small piece...

Why is the ‘_mm512d load/store’ intrinsic compiled to vmovups instead of vmovupd?

February 12, 2017, 11:09 pm

In my application, speed is very important, so I ran Intel Advisor on it and found that there are some type conversions. I think it is weird, because there are some float types but I...

Skylake Xeon and AVX-512VL

February 16, 2017, 12:28 am

Hi all, please excuse my ignorance, but I am wondering whether the Skylake Xeon processor has been released to the market yet. As I need to use the AVX-512VL (not AVX-512F or others) instruction set, and...
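Whichever processors are on the market, code that needs AVX-512VL should confirm at run time that the CPU supports it before taking that path. Here is a minimal sketch for GCC or Clang on x86 using __builtin_cpu_supports. The name has_avx512vl is hypothetical. On other compilers or architectures, the fallback simply reports no support. MSVC users would query CPUID with __cpuidex instead (not shown).

```c
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* GCC/Clang builtin; "avx512vl" is one of the feature names it accepts.
 * __builtin_cpu_init() is only strictly required before use in
 * constructors, but calling it here is harmless. */
int has_avx512vl(void)
{
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512vl") ? 1 : 0;
}
#else
/* Hypothetical fallback for other compilers/architectures: report "no". */
int has_avx512vl(void)
{
    return 0;
}
#endif
```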
