Tuesday, November 30, 2004

The FPU stack is your friend.

This is an issue that has always intrigued me. Many people, especially on technophile web sites like slashdot.org, seem to dislike the x87 fpu stack. Far too often I notice messages that propose an alternative register-based model, like the one used for integer calculations and logic. This seems very strange to me for two separate reasons:
  • The "stack" structure is inherently very appropriate for formula evaluation. As a matter of fact, those familiar with Reverse Polish Notation, a (mostly) lost art that was once popular with HP scientific calculators, will quickly point out that it is the quickest way to do formula evaluation without parentheses: (3 + 4) * 2 simply becomes 3 4 + 2 *. It is a mathematical fact that all formulas can be evaluated in a stack-based model, as long as the stack has enough space. Admittedly, the stack model can be a bit cumbersome at first, but this is only a matter of habit and does not make it inferior to a register model.
  • Back in the days when the fpu was an extra component (anyone remember the 387?), floating point instructions were very, very expensive. The co-processor did not have its own bus and it used bandwidth from the main processor (40MHz FSB * 32bit plus overhead for 386DX40). Things have changed a lot since. With the arrival of the Pentium processor, the first x86 processor to include a decent fpu by default (note that the 486 had DX and SX versions, so not every 486 chip had an fpu, and it wasn't that great anyway), the relative cost of floating point math plummeted. A telling example is the transition from integer based 3d mathematics, as seen in the Doom source code (486 era), to floating point based 3d mathematics, as seen in the Quake source code (Pentium era). The unlucky competitors in that period were Cyrix and AMD, whose "Pentium-class" processors would always get severely beaten in fpu intensive Quake benchmarks, even though they offered good integer performance. A little known fact about the Pentium fpu is that Intel made the fxch instruction extremely fast by allowing it to execute in parallel with most common floating point instructions, so in practice it costs next to nothing. This essentially means that, since the Pentium processor, anyone can use the fpu stack like a register set by freely executing the fxch instruction. There it is. Quit whining and use the fpu however you like.
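To make both points concrete, here is a minimal sketch in C of a toy model of the 8-slot stack. It is only an illustration: the `FpuStack` type and the two example functions are made up for this post, and the `fld`, `fxch`, `faddp`, `fsubp` and `fmulp` helpers merely imitate what the x87 instructions of the same name do to the stack; they are not real fpu code. The first function evaluates (a + b) * (c - d) in pure RPN order, the second uses fxch to reach a value that is not on top, the way one would reach for a register.

```c
#include <assert.h>

#define X87_DEPTH 8              /* the real x87 stack has 8 registers */

typedef struct {
    double st[X87_DEPTH];        /* st[0] models ST(0), the top of the stack */
    int depth;                   /* number of slots currently in use */
} FpuStack;

/* fld: push a value; every existing entry moves one slot down. */
void fld(FpuStack *f, double v)
{
    int i;
    assert(f->depth < X87_DEPTH);
    for (i = f->depth; i > 0; i--)
        f->st[i] = f->st[i - 1];
    f->st[0] = v;
    f->depth++;
}

/* Pop ST(0); every remaining entry moves one slot up. */
static void pop(FpuStack *f)
{
    int i;
    assert(f->depth > 0);
    for (i = 0; i + 1 < f->depth; i++)
        f->st[i] = f->st[i + 1];
    f->depth--;
}

/* fxch: swap ST(0) with ST(i), bringing any live slot to the top. */
void fxch(FpuStack *f, int i)
{
    double tmp;
    assert(i > 0 && i < f->depth);
    tmp = f->st[0];
    f->st[0] = f->st[i];
    f->st[i] = tmp;
}

/* The "p" forms store the result in ST(1) and then pop ST(0). */
void faddp(FpuStack *f) { assert(f->depth >= 2); f->st[1] += f->st[0]; pop(f); }
void fsubp(FpuStack *f) { assert(f->depth >= 2); f->st[1] -= f->st[0]; pop(f); }
void fmulp(FpuStack *f) { assert(f->depth >= 2); f->st[1] *= f->st[0]; pop(f); }

/* (a + b) * (c - d), written as the RPN sequence  a b + c d - *  */
double sum_times_difference(double a, double b, double c, double d)
{
    FpuStack f = { {0}, 0 };
    fld(&f, a); fld(&f, b); faddp(&f);   /* ST(0) = a + b */
    fld(&f, c); fld(&f, d); fsubp(&f);   /* ST(0) = c - d, ST(1) = a + b */
    fmulp(&f);                           /* ST(0) = (a + b) * (c - d) */
    return f.st[0];
}

/* a - b when b happened to be loaded first: fxch fixes the order. */
double subtract_loaded_backwards(double a, double b)
{
    FpuStack f = { {0}, 0 };
    fld(&f, b);
    fld(&f, a);                          /* ST(0) = a, ST(1) = b */
    fxch(&f, 1);                         /* ST(0) = b, ST(1) = a */
    fsubp(&f);                           /* ST(0) = a - b */
    return f.st[0];
}
```

The formula needs no parentheses and no temporaries, and the single fxch is all it takes to operate on a slot other than the top. On real hardware there is also a reversed subtract that would do the second job directly; the swap is used here only to show the register-like access.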
PKT

P.S. Note that new RPN HP calculators are still being made. These are great tools, but most people think that it probably makes more sense to buy a Palm and throw calculator software on top. Not the same thing, though, for several reasons.

1 Comment:

vvas said...

Stack vs. register set issues aside, people do not necessarily dislike x87 because it's inconvenient to program with or anything; they mostly dislike it because it's slow. The SIMD instruction sets, 3DNow! and SSE, have superseded it for floating-point computation; even though they started out as special instruction sets for specific applications, they are now complete enough to allow generic floating-point calculations to be performed with them.

Or at least I'm guessing that this is so, based on the observation that GCC has the capability these days to output floating point code in SSE rather than x87. Try it out. I've written the following toy program to test this capability:

int main(void)
{
    float x = 1000000.0f;
    while (x > 1.0f) x /= 1.000001f;
    return 0;
}

When I compile it with "gcc -O2 -march=athlon-xp", the generated object code uses the x87 instructions as usual. However, if I also add "-mfpmath=sse" to the mix, no x87 instructions are generated; SSE instructions are used instead. The SSE executable runs in 2/3 the time of the x87 executable.
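For anyone who wants to repeat the comparison, here is one possible way to time the loop from inside the program rather than from the shell. This is only a sketch: `run_decay_loop` and `time_decay_loop` are names invented for this example, and the timing uses the standard `clock()` function, which is an assumption about how to measure, not how the 2/3 figure above was obtained.

```c
#include <stddef.h>
#include <time.h>

/* Same loop as the toy program, but it returns the final value and
 * counts iterations, so the work has a result the caller can observe. */
float run_decay_loop(float start, float divisor, unsigned long *iterations)
{
    float x = start;
    unsigned long n = 0;

    while (x > 1.0f) {
        x /= divisor;
        n++;
    }
    if (iterations != NULL)
        *iterations = n;
    return x;
}

/* CPU seconds spent in one run of the loop, as reported by clock(). */
double time_decay_loop(void)
{
    unsigned long n;
    clock_t begin, end;

    begin = clock();
    (void)run_decay_loop(1000000.0f, 1.000001f, &n);
    end = clock();
    return (double)(end - begin) / CLOCKS_PER_SEC;
}
```

Calling `time_decay_loop()` from `main` and printing the result, once for a build with "-mfpmath=sse" and once without, should give the two numbers to compare. A single run is short, so repeating the measurement a few times is probably wise.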

Note that I was careful to only use single-precision floats in this case, since double-precision instructions were only added in SSE2, and my Athlon XP doesn't support that. Your Athlon 64, on the other hand, should be able to handle any kind of floating-point computation with SSE.

12:48 AM
