
RDTSC to measure performance of small # of FP calculations

Hi, I found my problem described in an old topic here:

https://software.intel.com/en-us/forums/intel-isa-extensions/topic/306222

but I cannot tell which of the proposed solutions was correct, so I decided to ask the question again.

In response to the original question, I suggest that on late PIV
hardware (Northwood and Prescott core machines) you have little
chance of getting reliable timings for a short instruction sequence,
for a variety of reasons.

In the Intel staff responses it has already been mentioned that the
first iteration is almost always slower than later iterations, but
there is another factor that has always affected timings under ring3
access in 32-bit Windows versions. Because higher privileged
processes can interfere with lower privilege level operations, you
will generally get at least a few percent variation on small samples,
and it gets worse as the sample gets smaller.

You can reduce this effect by setting the process priority to high or
time critical, but you will not escape it under ring3 access. I have
found from practice that for real-time testing you need a duration
of over half a second before the deviation comes down to within a
percent or two.
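As a rough way to check whether a set of repeated timings has settled to "within a percent or two", the sketch below computes the relative spread of several measurements. The function name `relative_spread` and the choice of metric, (max - min) / mean, are assumptions made for illustration; the post does not say how the deviation was measured.

```c
#include <stddef.h>

/* Hypothetical helper, not from the original post.
   Relative spread of a set of timings: (max - min) / mean.
   Returns 0.0 for an empty set or a zero mean. */
double relative_spread(const double *samples, size_t n)
{
    if (n == 0)
        return 0.0;

    double min = samples[0];
    double max = samples[0];
    double sum = 0.0;

    for (size_t i = 0; i < n; ++i) {
        if (samples[i] < min) min = samples[i];
        if (samples[i] > max) max = samples[i];
        sum += samples[i];
    }

    double mean = sum / (double)n;
    return mean != 0.0 ? (max - min) / mean : 0.0;
}
```

Feeding it the elapsed times of, say, ten runs of the same test gives a single number to compare against a 0.01 to 0.02 threshold.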

What I would suggest is that you isolate the code in a separate module, written in assembler, and use a loop of this type.

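push esi                 ; preserve the registers used by the loop
push edi

mov esi, large_number    ; loop counter: number of iterations to time
mov edi, 1               ; decrement step
align 16                 ; align the loop entry on a 16-byte boundary
@@:
; your code to time here
sub esi, edi             ; count down by one
jnz @B                   ; repeat until the counter reaches zero

pop edi                  ; restore the registers in reverse order
pop esi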
Adjust the immediate "large_number" so that the code you are timing
runs for over half a second (over 1 second is better), and set your
process priority high enough to reduce the higher privilege
interference to some extent. You should then start to get timings
with around 1% variation or lower.
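One way to pick a suitable iteration count without guessing is to double it until a timed run lasts long enough. The following is a minimal sketch in C rather than assembler, assuming a C11 library that provides `timespec_get`; the names `now_seconds`, `time_loop`, `calibrate_iterations` and the stand-in workload `fp_work` are hypothetical and not part of the original post.

```c
#include <limits.h>
#include <time.h>

/* Hypothetical stand-in for "your code to time here".
   The volatile sink keeps the compiler from removing the work. */
volatile double sink = 1.0;
void fp_work(void)
{
    sink = sink * 0.5 + 1.0;
}

/* Wall-clock time in seconds, using C11 timespec_get.
   Assumption: the wall clock is not adjusted during the measurement. */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Run body() the given number of times and return the elapsed seconds. */
double time_loop(void (*body)(void), unsigned long iterations)
{
    double start = now_seconds();
    for (unsigned long i = 0; i < iterations; ++i)
        body();
    return now_seconds() - start;
}

/* Double the iteration count until one timed run lasts at least
   min_seconds, or until doubling again would overflow.
   The duration of the final run is stored in *elapsed_out if non-NULL. */
unsigned long calibrate_iterations(void (*body)(void), double min_seconds,
                                   double *elapsed_out)
{
    unsigned long iterations = 1;
    double elapsed = time_loop(body, iterations);

    while (elapsed < min_seconds && iterations <= ULONG_MAX / 2) {
        iterations *= 2;
        elapsed = time_loop(body, iterations);
    }

    if (elapsed_out)
        *elapsed_out = elapsed;
    return iterations;
}
```

Calling `calibrate_iterations(fp_work, 0.5, &elapsed)` would return a count playing the role of "large_number" for a run of at least half a second. The call overhead through the function pointer is included in the timing, so this only approximates the tight assembler loop.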

Two trailing comments. First, the next generation of Intel cores will
behave differently, on a scale something like the difference between
the PIII and PIV processors, so be careful not to lock yourself into
one architecture. Second, as far as I remember, the FP instruction
range, while still available on current core hardware, is being
replaced by the much faster SSE/SSE2/SSE3 instructions, so if your
target hardware is recent enough to support them, you will probably
get a big performance gain if you can use the later instructions.

Regards,


