Intel® Software - Intel ISA Extensions

LDDQU vs. MOVDQU guidelines

May 3, 2018, 9:59 am

Hi, I'm wondering what the guidelines are on using the LDDQU vs. MOVDQU instructions on the latest and future Intel CPUs. I know that during the Netburst era LDDQU was supposed to be a more efficient way of loading...

View Article

The memory ordering semantics of mfence versus those of locked instructions

May 9, 2018, 8:51 pm

Even after many years of the existence of the mfence instruction (and even more time with the lock prefix), and a fairly careful study of the system programming manual, something still isn't clear to...

View Article

AVX-512 release date

June 3, 2018, 9:56 pm

Hello, do you know when this extension will be released for Intel Core CPUs? I have been working on a 3D engine with 3D matrix calculations done entirely on the CPU since 2014, and I currently use AVX with its 256 bit...

View Article

gcc not finding a _mm256_storeu2_m128i

June 6, 2018, 7:10 am

Hi, I am developing code in which I need to use the intrinsic function _mm256_storeu2_m128i. Although I have included the necessary header file and compilation flag, gcc is not able to find the function. What might...

View Article

Using AVX opcodes slow my proc

June 8, 2018, 8:34 am

Strangely, the AVX version of this code is 20 times slower than the XMM version. CPU: i7-6950X 3.0GHz, running Win10/x64. Any idea how to run this code properly? The IDA Pro disassembly shows that the code is right. No...

View Article

KUNPCK* instructions behavior in SDM and Intrinsics Guide

June 14, 2018, 5:08 pm

Hi, I wonder what the behavior of the KUNPCKBW/KUNPCKWD/KUNPCKDQ instructions is. In the SDM, the description of the instructions implies that they interleave individual bits of the input registers. This is...

View Article
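
A minimal software model of the behavior asked about above, assuming the concatenation semantics given by the operation pseudo-code for KUNPCKBW/KUNPCKWD/KUNPCKDQ (the second source fills the low half of the destination, the first source fills the high half) rather than bit interleaving. The function names are hypothetical, and this is plain C for illustration, not the hardware intrinsics.

```c
#include <stdint.h>

/* Software model only (assumed semantics: concatenation, not interleaving).
 * src1 supplies the upper half of the result, src2 the lower half. */

/* Models KUNPCKBW: two 8-bit mask halves -> one 16-bit mask. */
static uint16_t model_kunpckbw(uint8_t src1, uint8_t src2) {
    return (uint16_t)(((uint16_t)src1 << 8) | src2);
}

/* Models KUNPCKWD: two 16-bit mask halves -> one 32-bit mask. */
static uint32_t model_kunpckwd(uint16_t src1, uint16_t src2) {
    return ((uint32_t)src1 << 16) | src2;
}

/* Models KUNPCKDQ: two 32-bit mask halves -> one 64-bit mask. */
static uint64_t model_kunpckdq(uint32_t src1, uint32_t src2) {
    return ((uint64_t)src1 << 32) | src2;
}
```

Under this model, `model_kunpckbw(0xAB, 0xCD)` yields `0xABCD`; a bit-interleaving reading would give a different value, which is the discrepancy the thread is about.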

Error in pseudo-code for RDPMC in SWDM Volume 2

June 19, 2018, 8:33 am

I am pretty sure there is a typo (and an inconsistency in notation) in the pseudo-code for the RDPMC instruction in Volume 2 of the SW Developer's Manual. (I am working from document 325383-067, May...

View Article

what are the performance implications of using vmovups and vmovapd...

June 20, 2018, 1:22 pm

Hi all, I see two instructions that perform virtually the same operation, vmovups and vmovapd, as per the Intel intrinsics guide...

View Article

Confusion in behavior of _mm256_load_ps and _mm256_loadu_ps intrinsics

June 21, 2018, 7:05 am

Hi all, I performed a quick test to understand the behavior of the _mm256_load_ps and _mm256_loadu_ps SIMD intrinsics, and the behavior is quite unexpected. I am wondering if this is a bug by...

View Article

I understand Why SSE is slower than ANSI C

July 19, 2018, 3:52 am

Hi, I wrote a very simple C program some days ago. I tried to optimize my program's code with SSE, but I found that SSE is slower than plain C. This program is very important for my work; if you can help...

View Article

RDTSC to measure performance of small # of FP calculations

July 20, 2018, 3:48 am

Hey there, I found my problem in an old topic here: https://software.intel.com/en-us/forums/intel-isa-extensions/topic/306222 but I cannot understand which solution was correct, so I decided to repeat the question. In...

View Article

When will SnowRidge be available?

July 22, 2018, 7:16 pm

Hi, I'm interested in a few new instructions that will be available in Snow Ridge, but there is very little information about it, and Google does not help much either. I wonder when these instructions will...

View Article

MWAIT is not improving performance and why my machine stucks?

August 3, 2018, 2:14 am

Hi, I'm writing a simple kernel module to test the monitor/mwait instructions on my machine, which has an i7-7700K processor. I use a char[64] for each core, so false wakeups should be minimized. I was...

View Article

Intrinsic functions _rdtsc and _rdtscp

August 13, 2018, 10:56 am

Hello. There is an intrinsic _rdtsc according to [1]. The questions are: 1- What is the unit of the output? It is an unsigned number. Is it nanoseconds? Clock cycles? ... 2- Why is there a form _rdtscp...

View Article

Disabling HW prefetcher

August 25, 2018, 3:02 am

Hi. With _mm_clflush(), I flushed an array from all cache levels. Next, I measured two accesses with __rdtsc(). While I know the distance between the two accesses is larger than the cache line size, e.g. 80...

View Article

Determining wake up reason for MWAIT

September 13, 2018, 2:21 am

Hello, I'm trying to figure out how one can check the reason for an MWAIT wakeup. I know there are several reasons for MWAIT to wake up, including a write to the monitored address (of course,...

View Article

could not decode some pattern of vgatherdps

September 13, 2018, 10:41 pm

The byte code of `vgatherdps zmm0{k1}, [rax + zmm0]` is 62F27D49920400. But:
> xed -d 62F27D49920400
> 62F27D49920400
> ERROR: GATHER_REGS Could not decode at offset: 0x0 PC: 0x0:...

View Article

Incorrect links in the Architectures Software Developer’s Manual

September 23, 2018, 6:43 am

Hi, there are a couple of incorrect links (references) to "Figure 6-4. Stack Usage on Transfers to Interrupt and Exception-Handling Routines" in the Intel® 64 and IA-32 Architectures Software...

View Article

Unable to generate Vectorization report in icc/icpc compilers.(icc...

October 1, 2018, 3:39 am

When I executed the icc command, I got the following error message:
$ icc -vec-report2 p1.c
icc: command line remark #10148: option '-vec-report2' not supported
I am also unable to pass flags for SSE and AVX. What...

View Article

Skylake documentation bug

October 1, 2018, 1:39 pm

This is for Intel® 64 and IA-32 Architectures Optimization Reference Manual, Order Number: 248966-039 December 2017, Page...

View Article