Hi guys, I’m struggling to get those extra FPS to make my demo run smoothly on hardware. After a lot of tricks and algorithm optimizations I’m running out of ideas, and since I don’t think I’ll be able to write better asm than the compiler produces any time soon, I was thinking about changing my whole arithmetic from float to int, using a scale to get the precision I need:
1.67f = 167
and converting back to float as late in the execution as possible.
The question is: will this make a significant enough difference to be worth the effort? I really don’t know how much overhead there is in working with floats rather than ints on the V810, so I can’t make an educated guess on this matter.
Thanks for your responses.
- This topic was modified 14 years, 2 months ago by jorgeche.
I would definitely wait for DogP to weigh in, but here’s what I have:
This file from David “Reality Boy” Tucker’s site includes some fixed point math routines using the VB’s native 13.3 and 7.9 fixed point formats (used for affine, as you already know).
Yup, I knew about that, but going back and forth between fixed point and floats involves multiplications and divisions which I don’t want, and fixed point multiplication is really costly:
#define FIX7_9_MULT(a,b) ((fix7_9)((((s32)(a)) * ((s32)(b))) >> 9))
if you compare it with a multiplication between ints:
a * b; // here you don't have the shift
so I would just scale everything by 100 (seems enough) and do int operations, and when I really need the float (for setting the scale in affine mode, for example), only at that point do I divide my int by the scale: 98 / 100.0f… I think it is faster (but more cumbersome to program) than using the fixed point formats.
Isn’t that for multiplying two ints to get a fix7_9? If you used two fix7_9’s it seems like it would keep dividing a number by 512 each time you used it…
Multiplying two fixed point nums is no different than what you describe; i.e. using the inbuilt * operator.
DEFINITELY use fixed point for as much stuff as possible… floating point is horribly slow. My first Mario Kart engine was floating point (so I could get a working prototype which I could then convert small pieces at a time to fixed point), and it barely ran (just a few FPS). In fixed point, it runs full speed with no problem. I’ve also done some raycaster stuff with floating point and it really is slow.
I don’t know what you’re using floating point stuff for, but I’d assume you can get the precision you need from a 32 bit fixed point number… just use a bunch of bit shifts and keep track of where the decimal is. Oh… and for Affine… you really don’t need to use floating point. The Affine hardware is 16 bit fixed point, so using a 32 bit fixed point number for the calculations leading up to it is more than enough.
On a side note… I’ve never looked at the gcc output from floating point code… it’s possible that it’s emulating the floating point in software rather than using the floating point instructions, which could explain why it’s so slow (this is how it’s done on fixed point processors). The floating point instructions should take nearly the same amount of time as the fixed point mul/div instructions (8 to 30 clocks compared to 13 for multiply, 44 clocks compared to 38 for divide), although the add and subtract take much longer (9 to 28 clocks compared to 1 for add and 12 to 28 compared to 1 for sub). But there shouldn’t be THAT many instructions that those few clock cycles pile up… the software emulation is always REALLY slow, though. And conversion from floating to fixed and back using the dedicated CPU instruction is also relatively quick (5 to 16 and 9 to 14 cycles).
Ok guys, thanks a lot for your replies. I will begin changing as much of my math as I can to fixed point. I will try to have it ready for the competition deadline… hope to see your programs soon!
Another question: what happens with the number sign when converting back and forth between fixed point and int? With floats there is no problem since it always divides or multiplies, but since the conversion to int is done with shifts… must I take care of the sign?
About using int everywhere instead of u8, etc.: I need to save as much memory as possible, so I can’t waste space using, for example, a whole int to hold a flag when a bit field in a u8 would do; I won’t make that concession. So the question is: is it worth operating on those values (u8, u16, etc.) as ints at the expense of making the code a lot less readable and less type safe?
I’m not sure if I understand your questions… fixed point is the same as integers… you’re just interpreting where the point is. If you do shift a signed number to the right (division by 2), you need to make sure you sign extend (or arithmetic shift), but there’s no way to do that with an instruction in C, except division. Most CPUs have an instruction SAR (shift arithmetic right), which the optimizer may take advantage of if the divisor is a power of 2. But if you’re shifting right to interpret as a 16 bit number (like in the FIX7_9_MULT you posted above), you don’t need to worry about it, because the result from the multiply will be sign extended to 32 bits, and shifting right 9 won’t lose the sign bit.
About the bit fields, what is the problem? I would use ints when possible, unless you’re using several bytes in a row, since ints must be word aligned (and shorts half-word aligned), so you waste the space between a byte and an int if you put them consecutively.
Ok, I understand. I was just worried about the sign bit, but as you say, it works just like using integers.
About int vs byte, I must revisit the memory alignment stuff; I will check where I can group bytes and will use ints where I can’t.
Thanks again for your help DogP.
BTW I’m half way changing everything from float to fixed point, it would be great to see the FPS boosted by the change… fingers crossed.
I just wanted to update a couple things in this post… first, I was talking to RunnerPack in IRC the other day and he checked out the disassembly of the latest gccvb using floating point numbers, and it does use the proper CPU floating point instructions for the math as well as for converting to/from ints. Looking at the v810 datasheet, the instruction to convert back and forth can take between 5 and 16 cycles, so if you’re gonna use floats, you should stay with it until you no longer need it (don’t convert back and forth every time). Floating point add/sub is slow though (9 to 28 cycles), so try to limit doing those, and use as many mul/div instructions while you’re in floating point, since they’re comparable to fixed point mul/div (8 to 44 compared to 13 or 38 in fixed point)… mul and div are usually where you want floating point anyway, since it gives you a large range to work with. Floating point is still slow, so I’d recommend using fixed point if possible though.
Also, I wanted to clarify something I stated in one of the earlier posts about the sign bit… in C, right shifting an unsigned number does a logical right shift (no sign extension), but right shifting a signed number should do an arithmetic right shift (sign extending). You do need to be aware that right shifting a negative number isn’t identical to dividing by 2 though, since it “rounds” the opposite way from division: -4/2=-2 and -4>>1=-2… but -5/2=-2 while -5>>1=-3… which may not be a big deal for a video game… except that it also means -1/2=0 but -1>>1=-1 (it never reaches 0).
Thanks for the follow up, it makes things clear. I’ve changed all my math to fixed point, it indeed runs faster than with floats.
BTW, congratulations on your first place in the competition.