Here’s a quick and easy performance tip: manually enabling the cache before an intense loop and disabling it afterwards gives a VERY noticeable performance increase (if it’s used correctly, and where it’s actually needed, of course). The loop that builds the affine table in Mario Kart was running a little slow on real hardware, so I figured I needed to either use more LUTs to get rid of divisions, rewrite it in ASM, or see what happens when I explicitly enable the cache. Just enabling the cache made the game run MUCH faster (probably ~2x as fast). I already knew that loop was the bottleneck, so it makes sense to enable the cache there, but I didn’t expect that much of a difference.
Anyway, just wanted to give a heads up… I was never sure whether gccvb handled the cache or always left it disabled. I kinda figured it left it disabled, and I believe that’s correct.
Yeah, there’s 1KB of cache. ROM (with the default wait states) takes 3 cycles per access, RAM takes 2 cycles, and cache takes 1 cycle… so it’s definitely faster. You could also change the ROM wait states to 1 to get ROM access down to 2 cycles… that would probably be fine, since most flash ROMs are faster than 100ns (at least the ones that I’m using).
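To put rough numbers on those access times, here is a small C sketch. It only counts fetch cycles and ignores the time the instructions themselves take to execute, so treat it as an upper bound on the gain, not a prediction. The helper name `fetch_cycles` and the loop sizes in the usage note are made up for illustration.

```c
/* Cycles per access, taken from the figures quoted in the post. */
enum {
    ROM_DEFAULT_CYCLES = 3, /* ROM with the default wait states */
    ROM_FAST_CYCLES    = 2, /* ROM with wait states set to 1 */
    RAM_CYCLES         = 2,
    CACHE_CYCLES       = 1
};

/*
 * Hypothetical helper: cycles spent purely on fetching when a loop body
 * of `fetches_per_iteration` accesses runs `iterations` times.
 * Simplifying assumption: every access costs the same fixed amount.
 */
static unsigned long fetch_cycles(unsigned long fetches_per_iteration,
                                  unsigned long iterations,
                                  unsigned int cycles_per_fetch)
{
    return fetches_per_iteration * iterations * cycles_per_fetch;
}
```

For example, a loop body of 100 fetches run 224 times costs `fetch_cycles(100, 224, ROM_DEFAULT_CYCLES)` = 67200 cycles from ROM and `fetch_cycles(100, 224, CACHE_CYCLES)` = 22400 cycles from cache. That 3x is on fetches alone, so a real-world gain of around 2x once execution time is included seems plausible.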
Anyway, for the code, you can either put the asm directly in your code, or do what I do and #define CACHE_ENABLE and CACHE_DISABLE:
asm("mov 2,r1 \n ldsr r1,sr24"
: /* No Output */
: /* No Input */
: "r1" /* Reg r1 Used */
);

#define CACHE_ENABLE asm("mov 2,r1 \n ldsr r1,sr24": /* No Output */: /* No Input */: "r1" /* Reg r1 Used */)

CACHE_ENABLE;
//do lots of stuff
CACHE_DISABLE;
BTW, I believe you need to have the tabs in there (you usually need them in asm), and there’s a tab after the \n in the CACHE_ENABLE line (you can’t continue a line with a ‘\’ in the newest gccvb without making some changes). I think you can use \t instead of literal tabs if you’d prefer.
Edit: There’s a ‘\’ before the n in the CACHE_ENABLE line (if you quote it in the reply it’s there correctly, but in the post it disappeared).
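For anyone who wants both macros in one place, here is a minimal sketch that uses `\n\t` escapes instead of literal tabs. Several things in it are assumptions and not from the post: `VB_HARDWARE` is a hypothetical build flag (not something gccvb defines), the host fallback only exists so the file compiles on a PC, `sum_table` is a made-up stand-in for a hot loop, and the disable macro assumes that writing zero to sr24 (via r0, which always reads as zero on the V810) turns the cache back off.

```c
#ifdef VB_HARDWARE /* hypothetical flag: define it when building with gccvb */
/* Enable: write 2 to system register 24, as in the post. */
#define CACHE_ENABLE  asm("mov 2,r1 \n\t ldsr r1,sr24" \
                          : /* No Output */            \
                          : /* No Input */             \
                          : "r1" /* Reg r1 Used */)
/* Disable (assumption): write 0 to sr24 using r0, which is always zero. */
#define CACHE_DISABLE asm("ldsr r0,sr24")
#else
/* Host fallback so the sketch builds off-target; it only tracks state. */
static int cache_enabled = 0;
#define CACHE_ENABLE  (cache_enabled = 1)
#define CACHE_DISABLE (cache_enabled = 0)
#endif

/* Made-up hot loop: wrap only the expensive part in enable/disable. */
static long sum_table(const short *table, int count)
{
    long total = 0;
    int i;

    CACHE_ENABLE;
    for (i = 0; i < count; i++) {
        total += table[i];
    }
    CACHE_DISABLE;

    return total;
}
```

The enable macro spans several lines here only for readability. If your gccvb build rejects ‘\’ line continuations, as mentioned above, put the whole definition on one line.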