Is it OK to create an array of __m256i?
Hi all! I am parallelizing a certain dynamic programming problem using AVX2. In the main iteration of my calculation, I calculate a column of a matrix in which each cell is an AVX2 register (__m256i). I...
Alignment requirements for _mm256_maskload_pd
Hi, Are there any alignment requirements (beyond 8 bytes) for _mm256_maskload_pd and likewise for _mm256_maskstore_pd? Thanks
Extract non-zero byte from __m128i
Hi, I have 4 __m128i 64byte elements which can contain zero or non-zero (positive or negative) values. I want to extract the non-zero values from them. I looked at _mm_extract_epi8/_mm_extract_epi16 but the syntax is int...
_mm256_shuffle_epi8
Hi, I am going through the documentation for _mm256_shuffle_epi8 https://software.intel.com/sites/products/documentation/doclib/iss/2013/... The pseudo code shows only up to 16 bytes... for (i = 0; i < 16;...
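A minimal scalar sketch of why the pseudo code only iterates over 16 bytes: the 256-bit form of the byte shuffle works independently within each 128-bit lane, so the same 16-byte loop is applied twice. The function name `shuffle_epi8_256_model` is hypothetical and is not part of any Intel header; it only models the documented behavior in plain C and uses no intrinsics.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the in-lane byte shuffle done by _mm256_shuffle_epi8.
   Hypothetical helper for illustration only; dst must not alias a or b. */
void shuffle_epi8_256_model(const uint8_t a[32], const uint8_t b[32],
                            uint8_t dst[32])
{
    /* Two independent 128-bit lanes of 16 bytes each. */
    for (size_t lane = 0; lane < 2; ++lane) {
        size_t base = lane * 16;
        for (size_t i = 0; i < 16; ++i) {
            uint8_t ctrl = b[base + i];
            if (ctrl & 0x80) {
                /* High bit of the control byte set: result byte is zero. */
                dst[base + i] = 0;
            } else {
                /* Low 4 bits select a byte from the SAME lane of a. */
                dst[base + i] = a[base + (ctrl & 0x0F)];
            }
        }
    }
}
```

Because only the low 4 bits of each control byte are used as an index, a control byte in the upper lane can never select a byte from the lower lane, and vice versa.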
Alignment requirement for pcmpistri
Hi, I'm testing a custom implementation of strcmp() which involves SSE4.2 and this instruction in particular: pcmpistri $0x18,(%rsi,%rax,1),%xmm1 I've made a test that passes unaligned pointers to the...
Why is my AVX slower than SSE?
According to the description in "IIR Gaussian Blur Filter Implementation using Intel® Advanced Vector Extensions", AVX should be faster than SSE. But my performance measurements are as follows: The...
SGX EGETKEY clarification?
I've been looking at a variety of things with SGX, and while looking into the EGETKEY description, I think I've found an inconsistency in the October 2014 spec. Specifically: Table 5-43 says that the...
Dynamic Shift
Hello, I am trying to achieve a dynamic shift. Well, let me explain the task. I process data with SSE, AVX. Data gets loaded, worked with and later results are stored. To support arbitrary lengths, I...
Issue with APIC dropping MSI-X interrupts
Hello, I have a difficult problem. The scenario is as follows: the hardware environment is an Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz and an Altera FPGA board. The OS is Linux debian-rss 3.16.7-ckt7. The FPGA creates 32 DMA...
Guaranteed atomic operation clarification
Hello, I'm trying to understand a line in the Intel Architecture manual. It's a description of a memory operation that is guaranteed to be atomic. The line is at Chapter 8, Section 8.1.1 "Guaranteed...
Small typo in Intel® 64 and IA-32 Architectures Software Developer’s Manual
Hi, It seems that there is a small typo in the Intel® 64 and IA-32 Architectures Software Developer’s Manual (Order Number: 253665-054US April 2015), page 3-149 (cmpss instruction): 128-bit Legacy SSE...
MPX instructions not in the Appendix A opcode map
Hi, In the latest release (55) of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, in Vol 2C, A-11, we can't see the MPX instructions. In fact, I usually use opcode maps to find instructions...
Ooops - wrong instruction description in volume 2 of the SDM
Looking at the new version of Volume 2 of the SDM (document 325383-055), I just noticed that the "Description" field for the VINSERTF128 instruction (page 4-514) is incorrect. It appears to have been...
Processor Trace decoding support library for Atom
Dear Intel guru, Could I ask whether libipt on GitHub will support decoding small-core (Atom) processor trace packets (pt pkt)? Or is it already supported in another commercial product like PAL (Platform Analysis...
PCI Legacy Mode - Why does it use subtractive decoding?
Hello, Most modern Intel boards have a feature called 'PCI Legacy Mode' that allows users to add old PCI cards. The datasheets say - "PCI functionality is not supported on new generation of...
Encodings for instructions with {sae} are unclear in the doc
Chapter 4.6 indicates that EVEX.L'L is encoded for the vector length, and that {sae} is supported for all vector lengths. However, the various instruction pages, such as VCMPPD, only show {sae} for...
New extension needed for Maps and Sets
Idea: In current software, every app spends a lot of time walking Maps and Sets (besides arrays, those are the most often used data structures). I think this is a place where the CPU can provide enormous acceleration...
IRET Pseudo-code Bug
Hi, I believe that there is a documentation bug in the pseudo-code for the IRET instruction in the current edition of Volume 2A of the Architectures Software Developers' Manual. The case we're looking at...
What is the syntax for the broadcast decorator?
The ISE doc only describes the decorator syntax with the single example {1to16} (document 319433-022 page 7).I would assume that generally you write {1ton} where n = the full vector size / the single...
Wrong memory size for VGATHERQPS (?)
My version of the document, 319433-022, page 350 shows: EVEX.128.66.0F38.W0 93 /vsib VGATHERQPS xmm1 {k1}, vm64x. I think this should be vm32x, not vm64x, since the operands are single-precision...