Channel: Intel® Software - Intel ISA Extensions
Browsing all 685 articles

Why is FMA slower than SSE here?

I am optimizing an app that computes correlation coefficients many times. The loops were easy to vectorize, but there are also some calculations made outside of them. I tried to partially optimize them using...

AVX add slow due to vinsertf128

The CPU is a 3820, with ICL 14.0 and VS2013; the variables are doubles: auto newvel = velocitiesy[i] + force; That line is slow because the vinsertf128 instruction in this case has a high CPI of over 3.2; this is the...

[XED] how to encode mov instruction

Hello, all. I am trying to encode mov and call instructions on CentOS 7, but encountered an error. The source code is below. #ifdef __X86_64__ #define XED_MMODE...

AVX512 suboptimal intrinsics compilation

I'm looking into what the Intel compiler makes out of AVX512 intrinsics (latest trial compiler, downloaded a few weeks ago). There are several strange things I notice, to...

Go programs (even an empty one) hang on exit

Hi, if I compile an empty Go program and run it under SDE64 7.49 on Linux, the program does not exit unless I send it multiple signals (4 or 5 SIGINT or SIGQUIT seems to do the trick). By "empty" I mean...

Parallelization + Vectorization using OpenMP in Sandy Bridge

Hi, I would like to ask a question about parallelization + vectorization: 1) Is it possible to implement parallelization + vectorization at the same time (i.e. access AVX in a Sandy Bridge processor using...

Code scales poorly with AVX

This code scales poorly with AVX on my Sandy Bridge. How can I make it more vectorizer-friendly? for (auto i = 0; i < pcount; i += 2) { for (auto j = 0; j < pcount; j += 2) { if (i == j) continue;...

Is xend treated as a full memory barrier?

I've started learning the RTM extensions. The most common examples I can find online use them to implement a mutex or concurrent lock. Often they are similar to: #include...

_mm_prefetch usage

Hi, I couldn't find an answer to this question, and it might be silly, but does _mm_prefetch need vzeroupper if mixed with AVX or AVX2 code, since it is an SSE intrinsic and a non-VEX instruction? I am...

mitigating permute costs in AVX 256?

Hello, I'm investigating conversion of a number of compute kernels from AVX 128 to AVX 256 and would appreciate any guidance which might be available on getting a small number of operations on port 0+1...

How to speed up this code?

Hello everyone, many thanks to all contributors to my past question. Crazy things happen: 2 years ago I was moved internally to UI & Communication development to speed things up there :) So my...

CPI rate blows up

Hi, I'm trying to solve the last question here (sadly without an answer so far) myself. Now I got some ratings from VTune Amplifier and see some strange results as well (the assembler code is generated by VS...

Cannot access compiler intrinsics for logarithm in Visual Studio

Hello, I cannot use the compiler intrinsics related to logarithms in either Visual Studio 2013 or 2015. I tried to use _mm_log_ps and it was not found. I used the "immintrin.h" header file. I looked...

Question about latency

AVXOP xmm0, xmm0, xmm1 Hi, years ago I read and heard various mysterious things about latencies caused by register choice when using AVX, and why it is better to use AVX instead of SSE -...

Random slowdowns with AVX2 code.

I wrote a subroutine mostly using AVX2 and AVX compiler intrinsics. I used some SSE instructions too, but I did set the enhanced instruction set to AVX2 in the project settings of Visual Studio. My...

E5-1650 v4, What are the AVX 'Base' and 'Turbo' Speeds?

Hi; I'm trying to determine whether what appears to be unexpected (below base frequency) throttling on my new system is being caused by AVX usage when I run various stress programs like Prime-95 and the...

Question about performance difference SSE4/AVX vs. AVX2 with dual-channel vs....

Hi, today I have an interesting question about your experience (not only theoretical improvement) with the code performance difference on SSE4/AVX with a dual-channel memory board vs. AVX2 with quad-channel...

Slightly OT, but maybe somebody has an idea.

Hi, (my question about ISA extensions is near the bottom of the post) today I found an old piece of code, did some time profiling and... nearly fell off my chair. What happened? This very small piece...

Why is the ‘_mm512d load/store’ intrinsic changed to vmovups, not vmovupd?

In my application, speed is very important, so I ran Intel Advisor on it and found that there are some type conversions. I think this is weird, because there are some float types but I...

Skylake Xeon and AVX-512VL

Hi all, please excuse my ignorance, but I am just wondering if the Skylake Xeon processor has been released to the market now? As I need to use the AVX-512VL (not AVX-512F or others) instruction set, and...
