Hi
With _mm_clflush(), I flushed an array from all cache levels. Next, I to measure two accesses with __rdtsc(). While I know the distance between two accesses is larger than cache line size, e.g. 80 bytes distance, the TSC for the first access sounds like a miss (which is true), while the TSC for the second element sounds like a hit (which is wrong).
It seems that HW stride prefetcher brings the second element. Is there any way to force the processor not to prefetch?