by Spike » Wed Apr 10, 2013 11:20 am
fewer C operators, perhaps.
in reality though, if r_turb_s/t is a memory operand, the shift just means that it reads a short instead of a long.
while if it's a register operand then the added shift instruction will only cost 1 cycle, and the extra code bytes for that instruction will at least partially fit inside the extra 3 bytes required for the 32-bit mask value.
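As a rough sketch of the two index forms being compared, assuming r_turb_s/t are 16.16 fixed-point values and the original lookup table has a power-of-two length (CYCLE, index_shifted and index_masked are illustrative names and values, not taken from the code under discussion):

```c
#include <stdint.h>

#define CYCLE 128  /* assumed table length; must be a power of two */

/* Shift then mask: only the integer part of the 16.16 value is used,
   so the index always lands in 0..CYCLE-1. */
static uint32_t index_shifted(int32_t fixed)
{
    return ((uint32_t)fixed >> 16) & (CYCLE - 1);
}

/* Mask only: the 16 fractional bits stay in the index,
   so the index can be anywhere in 0..0x7FFFFF. */
static uint32_t index_masked(int32_t fixed)
{
    return (uint32_t)fixed & 0x7FFFFF;
}
```

Both forms pick out the same 7 integer bits; the mask-only version just keeps 16 more bits below them, which is what blows up the range of the index.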
your real issue is that 0x7FFFFF is in the region of 8 million, so the table you index with it spans about 8MB (four times that if r_turb_turb is an array of ints instead of chars/bytes). At this point you should be asking yourself how much L1 cache your cpu has. I'll help you out: not nearly that much.
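To put numbers on that, here is a small sketch of the arithmetic. The helper names are made up for illustration, and the cache sizes you would pass in are the Pentium 4 figures quoted just below:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes spanned by a table indexed with (value & mask):
   valid indices run 0..mask, so the table needs mask + 1 entries. */
static size_t table_bytes(uint32_t mask, size_t entry_size)
{
    return ((size_t)mask + 1u) * entry_size;
}

/* How many times over the table would fill a cache of the given size. */
static size_t times_cache_size(size_t bytes, size_t cache_bytes)
{
    return bytes / cache_bytes;
}
```

With a mask of 0x7FFFFF and 1-byte entries, table_bytes gives 8388608 bytes, which is 1024 times an 8 KB L1 and 32 times a 256 KB L2. With 4-byte ints it is 33554432 bytes.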
'The original Pentium 4 had a 4-way set associative L1 data cache of size 8 KB'
'The original Pentium 4 also had an 8-way set associative L2 integrated cache of size 256 KB'
It also depends on what else you have in memory too, like the instructions you're executing and things (so that's 2KB of your L1 gone for each separate region of memory).
We might have some awesome clock speeds nowadays, but that just means performance is more dependent upon cache speed+size than ever.
Your instructions will remain in cache the whole time. Your turb_s lookup will require 1 of your 4-way blocks, turb_t will require another, and your write to r_turb_pbase will consume the fourth. Any accesses outside of the 2KB cache block will result in a cache miss. If your r_turb_t value is changing by more than 1<<11 with each iteration then you're guaranteed 2 cache misses each loop. Least-Recently-Used allocation schemes will probably result in your memory write region getting flushed at the same time (but the cpu should be smart enough to at least not flush the cache around eip).
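A minimal sketch of that 2KB-block model, for counting how often a stepped index leaves the block it was last in. This follows the simplified picture above (one 2KB block per region), not real cache-line behaviour, and the names are hypothetical:

```c
#include <stdint.h>

#define BLOCK_SHIFT 11  /* 2 KB = 1 << 11, i.e. 8 KB L1 split 4 ways */

/* True if both byte offsets fall inside the same 2 KB block. */
static int same_block(uint32_t a, uint32_t b)
{
    return (a >> BLOCK_SHIFT) == (b >> BLOCK_SHIFT);
}

/* Count how many iterations move the index into a different block
   when it advances by a fixed step each time. */
static unsigned count_block_changes(uint32_t start, uint32_t step,
                                    unsigned iterations)
{
    unsigned changes = 0;
    uint32_t idx = start;
    for (unsigned i = 0; i < iterations; i++) {
        uint32_t next = idx + step;
        if (!same_block(idx, next))
            changes++;
        idx = next;
    }
    return changes;
}
```

A step of 1 over 100 iterations never leaves the first block, while any step above 1<<11 changes block on every single iteration.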
From memory, a cache miss is about 32 cycles, and will replace part of your cpu cache resulting in more cache misses elsewhere.
A shift instruction (with register source+dest) is 1 clock.
Long story short, you've traded 2 clocks for 0-96 clocks in enough iterations of your loop, and your loop is short enough that the extra clocks are *really* noticeable.
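The trade can be written out as a back-of-envelope calculation. This sketch assumes the from-memory figure of about 32 cycles per miss and reads the 96 as three misses (the two lookups plus the write); the constants and function names are illustrative only:

```c
#define MISS_CYCLES  32  /* rough from-memory cost of one cache miss */
#define SHIFT_CYCLES 1   /* register-to-register shift */

/* Cycles lost to cache misses in one loop iteration. */
static unsigned miss_penalty(unsigned misses)
{
    return misses * MISS_CYCLES;
}

/* Net change per iteration after removing some shifts:
   negative means faster, positive means slower. */
static int net_cycles(unsigned shifts_removed, unsigned misses)
{
    return (int)miss_penalty(misses) - (int)(shifts_removed * SHIFT_CYCLES);
}
```

With no misses, removing two shifts saves 2 cycles. With all three accesses missing, the same iteration comes out 94 cycles worse.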