Channel: Intel® Software - Intel ISA Extensions
Browsing all 685 articles

LDDQU vs. MOVDQU guidelines

Hi, I'm wondering what the guidelines are for using LDDQU vs. MOVDQU instructions on the latest and future Intel CPUs. I know that during the NetBurst era LDDQU was supposed to be a more efficient way of loading...

View Article


The memory ordering semantics of mfence versus those of locked instructions

Even after many years of the existence of the mfence instruction (and even more time with the lock prefix), and a fairly careful study of the system programming manual, something still isn't clear to...

View Article


AVX-512 release date

Hello, do you know when this extension will be released for Intel Core processors? I have been working since 2014 on a 3D engine that does its 3D matrix calculations entirely on the CPU, and I currently use AVX with its 256 bit...

View Article

gcc not finding a _mm256_storeu2_m128i

Hi, I am developing code in which I need to use the intrinsic function _mm256_storeu2_m128i. Although I have included the necessary header file and compilation flag, gcc is not able to find the function. What might...

View Article

Using AVX opcodes slows my proc

Strange, but the AVX version of this code is 20 times slower than the XMM version. CPU: i7-6950X 3.0GHz, running Win10/x64. Any idea how to run this code properly? The IDA Pro disassembly shows that the code is right. No...

View Article


KUNPCK* instructions behavior in SDM and Intrinsics Guide

Hi, I wonder what the behavior of the KUNPCKBW/KUNPCKWD/KUNPCKDQ instructions is. In the SDM, the description of the instructions implies that they interleave individual bits of the input registers. This is...

View Article

Error in pseudo-code for RDPMC in SWDM Volume 2

I am pretty sure there is a typo (and an inconsistency in notation) in the pseudo-code for the RDPMC instruction in Volume 2 of the SW Developer's Manual.  (I am working from document 325383-067, May...

View Article

what are the performance implications of using vmovups and vmovapd...

Hi all, I see two instructions that perform virtually the same operation, vmovups and vmovapd, as per the Intel Intrinsics Guide...

View Article



Confusion in behavior of _mm256_load_ps and _mm256_loadu_ps intrinsics

Hi all, I performed a quick test to understand the behavior of the _mm256_load_ps and _mm256_loadu_ps SIMD intrinsics, and the behavior is quite unexpected. I am wondering if this is a bug by...

View Article


I understand Why SSE is slower than ANSI C

Hi, I wrote a very simple C program some days ago. I tried to optimize my program's code with SSE, but found that the SSE version is slower than the C version. This program is very important for my work; if you can help...

View Article

RDTSC to measure performance of small # of FP calculations

Hey there, I found my problem in an old topic here: https://software.intel.com/en-us/forums/intel-isa-extensions/topic/306222 but I cannot tell which solution was correct, so I decided to repeat the question. In...

View Article

When will SnowRidge be available?

Hi, I'm interested in a few new instructions that will be available in Snow Ridge, but there is very little information about it, and Google does not help much either. I wonder when these instructions will...

View Article

MWAIT is not improving performance, and why does my machine get stuck?

Hi, I'm writing a simple kernel module to test the monitor/mwait instructions on my machine, which has an i7-7700K processor. I use a char[64] for each core, so false wakeups should be minimized. I was...

View Article


Intrinsic functions _rdtsc and _rdtscp

Hello. There is an intrinsic _rdtsc according to [1]. The questions are: 1- What is the unit of the output? It is an unsigned number. Is that nanoseconds? Clock cycles? ... 2- Why is there a form _rdtscp...

View Article

Disabling HW prefetcher

Hi. With _mm_clflush(), I flushed an array from all cache levels. Next, I tried to measure two accesses with __rdtsc(). While I know the distance between the two accesses is larger than the cache line size, e.g. 80...

View Article


Determining wake up reason for MWAIT

Hello, I'm trying to figure out how one can check the reason for an MWAIT wakeup. I know there are several reasons for MWAIT to wake up, including a write to the monitored address (of course,...

View Article

could not decode some pattern of vgatherdps

The byte code of `vgatherdps zmm0{k1}, [rax + zmm0]` is 62F27D49920400. But
>xed -d 62F27D49920400
>62F27D49920400
>ERROR: GATHER_REGS Could not decode at offset: 0x0 PC: 0x0:...

View Article


Incorrect links in the Architectures Software Developer’s Manual

Hi, there are a couple of incorrect links (references) to "Figure 6-4. Stack Usage on Transfers to Interrupt and Exception-Handling Routines" in the Intel® 64 and IA-32 Architectures Software...

View Article

Unable to generate Vectorization report in icc/icpc compilers.(icc...

When I executed the icc command I got the following error message: $ icc -vec-report2 p1.c icc: command line remark #10148: option '-vec-report2' not supported. I am even unable to pass flags for SSE and AVX. What...

View Article

Skylake documentation bug

This is for Intel® 64 and IA-32 Architectures Optimization Reference Manual, Order Number: 248966-039 December 2017, Page...

View Article

