Original Post

Hey guys, still slowly chugging along in the background here between work and life. 😉

Today I have a question about registers 1-5 on the CPU of the Virtual Boy. I’m confused about what exactly is going on with them and am hoping someone can clear it up for me (and hopefully for any others who are also confused).

So according to the unofficial doc, r1-r5 *seem* to be free from what I saw. The V810 Architecture manual seems to suggest they are used for the stack among other things. The seminar slideshow suggests something similar, but says it depends on what tools you’re using. And finally, the V810 Architecture Summary flyer claims they are utilized by assemblers and compilers.

What I want to know is, which ones are exactly in use and which aren’t? And would it be wise to not utilize them, for sake of possible future debugging tools or such things?

I have suspicions that the “os” on the system that manages the stack and heap is utilizing at least a few, but does anyone know which ones are used and which are free of that group of registers? And are these registers compiler dependent? As in does the compiler with VBDE use different ones compared to the gccVB one?

I’m trying to convert a function I wrote in C to assembly and make it fit into the cache. Crazily, I am a few registers short even with all of r6-r25 available. I’m considering going into the bit string register range, as I’m almost certain I won’t use bit string instructions during the function, but even then I’m two or three registers short.

Likely I will need to do some loads/stores to make room for the extra variables, but I was hoping I could keep it all in the CPU for the speed. The function is quite expensive, being drawing related. Though maybe a rewrite to knock out some variables would be good…

Any information or direction on this would be great!

6 Replies

Short answer is yes it is compiler-dependent. At least, GCC 4 reserves r3 for the stack pointer, and the remaining registers in the range r1-r5 save for r2 are set aside for internal uses. As I look at the v810 patch, r11 may also be used for long jumps, but I’ve never seen that code generated in my own projects. r30 might cause a problem with bitstring instructions depending on what compiler you’re using– you can either disable the frame pointer using -fomit-frame-pointer or just use GCC 4 where it’s patched to use r25 as the frame pointer. 🙂

There is no “OS” built into the system, nor is there a “heap” (unless you implement one)– Whatever code you write is all that it runs. I personally have my own crt0.s that does nothing but initialize the stack, data and bss sections; the rest is handled in my C code. If you don’t have any interrupts firing though, then I would say feel free to use any register you want other than r3 and I believe r31, since it’s used as the link pointer for returning from the jal instruction.
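For anyone wondering what "initialize the data and bss sections" amounts to, here is a minimal sketch in C. The function name `init_sections` and its parameters are hypothetical; in a real crt0 the addresses and sizes would come from symbols in your linker script, and this is not the code from the crt0.s mentioned above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical startup helper (illustration only).
 * Copies the initialized-data image from ROM into RAM, then clears .bss so
 * that uninitialized globals start out as zero, as C expects.
 */
void init_sections(void *data_ram, const void *data_rom, size_t data_size,
                   void *bss_start, size_t bss_size)
{
    memcpy(data_ram, data_rom, data_size); /* .data: ROM image -> RAM */
    memset(bss_start, 0, bss_size);        /* .bss: zero fill */
}
```

Nothing here allocates memory at run time, which fits the point that no heap exists unless you write one.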

Caveat emptor; it’s been a while since I’ve done any VB coding. Maybe dasi or DogP can chime in with some insight. 😉

Hmm, I’ll have to check out that GCC patch you made there! My makefile already uses -fomit-frame-pointer, so at least that means I have one more free register. Coincidentally, a brief look at that topic already gave me insight into how to assign C variables to specific registers, so thanks for that!

I never knew that r11 also had the possibility of being used as well, I’ll have to watch out for that.

And I guess saying “OS” before was probably misleading, I was more just referring to whatever is managing the stack as the “OS” since I’m not doing it manually.

Interesting about the heap… I always thought the compilers may have snuck in a simple heap manager. I should look at the disassembly of my code more closely to see what’s going on in it, I suppose. 😛 It also sheds light on why my pointers into memory were never overwritten by anything. …But I’ll cut it right there, before this becomes my previous memory topic again!

Thanks for the input though!

Now this part is unrelated, but I just realized I was missing the optimization option on the VBDE compiler in the makefile, and wow… one function of my C code went from 115 ticks to 32 ticks… simply beautiful. Just had to share that in my excitement!

HollowedEmpire wrote:
Now this part is unrelated, but I just realized I was missing the optimization option on the VBDE compiler in the makefile, and wow… one function of my C code went from 115 ticks to 32 ticks… simply beautiful. Just had to share that in my excitement!

Greg figured out that his code would run much faster when he used the old 2.95 compiler: http://www.planetvb.com/modules/newbb/viewtopic.php?post_id=27790#forumpost27790

Maybe you want to give that a try.

HollowedEmpire wrote:
So according to the unofficial doc, r1-r5 *seem* to be free from what I saw. The V810 Architecture manual seems to suggest they are used for the stack among other things. The seminar slideshow suggests something similar, but says it depends on what tools you’re using. And finally, the V810 Architecture Summary flyer claims they are utilized by assemblers and compilers.

What I want to know is, which ones are exactly in use and which aren’t? And would it be wise to not utilize them, for sake of possible future debugging tools or such things?

Register 1 is supposed to be used for loading large constants into other registers, as in:

movhi 0xABCE, $0, $1
movea 0xEF12, $1, $10 ; $10 is now 0xABCDEF12

Apparently nobody at NEC noticed that you can use the same destination register in both instructions. VUCC (the compiler used for commercial games) also uses it for comparing values when a constant is too large to be encoded into the CMP instructions. I don’t think GCC uses it at all though.
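In case the 0xABCE in the first instruction looks like a typo: MOVEA sign-extends its 16-bit immediate, so when the low half has bit 15 set (0xEF12 does), the high half has to be bumped by one to compensate. Here is a small C model of that arithmetic. The function names are made up for illustration, and this only models the addition, not the real instruction encoding.

```c
#include <stdint.h>

/* Model of MOVHI imm16, reg1, reg2: reg2 = reg1 + (imm16 << 16). */
uint32_t v810_movhi(uint16_t imm, uint32_t reg1)
{
    return reg1 + ((uint32_t)imm << 16);
}

/* Model of MOVEA imm16, reg1, reg2: reg2 = reg1 + sign_extend(imm16). */
uint32_t v810_movea(uint16_t imm, uint32_t reg1)
{
    uint32_t ext = (imm & 0x8000u) ? ((uint32_t)imm | 0xFFFF0000u)
                                   : (uint32_t)imm;
    return reg1 + ext; /* wraps modulo 2^32, like the hardware add */
}

/* Hypothetical helper: pick the immediates for a MOVHI/MOVEA pair. */
void split_const(uint32_t value, uint16_t *hi, uint16_t *lo)
{
    *lo = (uint16_t)(value & 0xFFFFu);
    /* Add 1 to the high half when the low half will be seen as negative. */
    *hi = (uint16_t)((value >> 16) + ((*lo & 0x8000u) ? 1u : 0u));
}
```

Calling `split_const(0xABCDEF12, &hi, &lo)` gives the same two immediates as the assembly above, and feeding them through `v810_movhi` and then `v810_movea` reproduces the original constant.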

I have never seen register 2 used.

Register 3 is the stack pointer. You can use it as a general purpose register if you store it into a global variable and disable interrupts, since interrupt handlers save all other registers onto the stack (this depends on the compiler though, so maybe gccVB/VBDE does not do that, but VUCC does).

Register 4 is intended to be used for fast (one instruction – LD or ST with a non-zero offset) access to global variables, and VUCC does use it for that, but I think GCC doesn’t, or maybe you need to declare variables specially to take advantage of it.
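To put a number on the "one instruction" access: LD and ST take a 16-bit displacement that is sign-extended, so a single instruction can reach anything within -32768 to +32767 bytes of whatever address r4 holds. A rough C check of that range follows. The name `fits_disp16` is hypothetical, and the sketch ignores address wraparound.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when `addr` is reachable as disp16[base] in one LD/ST, i.e. when the
 * distance from `base` fits a sign-extended 16-bit displacement.
 */
bool fits_disp16(uint32_t base, uint32_t addr)
{
    int64_t delta = (int64_t)addr - (int64_t)base;
    return delta >= -32768 && delta <= 32767;
}
```

As an illustration, pointing the base at the middle of a 64 KB block (say 0x05008000 for RAM starting at 0x05000000) would put the whole block in range, while a base at the start of the block only reaches the lower half.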

I have never seen register 5 used.

If you use register 30 and you don’t use bitstring instructions, remember that it is also implicitly used as the destination for the remainder of DIV and DIVU and the high 32 bits of the result of MUL and MULU.
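A small C model of that side effect, for the unsigned variants only. The struct and function names are invented for illustration; the signed forms also write r30, but I have left them out rather than guess at the exact sign rules.

```c
#include <stdint.h>

/* What one multiply or divide leaves behind: the named destination register
 * plus the implicit write to r30. */
typedef struct {
    uint32_t dest; /* value written to the destination register */
    uint32_t r30;  /* value written to r30 as a side effect */
} v810_result;

/* Model of MULU: low 32 bits go to the destination, high 32 bits to r30. */
v810_result v810_mulu(uint32_t a, uint32_t b)
{
    uint64_t product = (uint64_t)a * (uint64_t)b;
    v810_result out;
    out.dest = (uint32_t)(product & 0xFFFFFFFFu);
    out.r30 = (uint32_t)(product >> 32);
    return out;
}

/* Model of DIVU: quotient goes to the destination, remainder to r30.
 * The caller must pass a non-zero divisor; this sketch does not model what
 * the CPU does on division by zero. */
v810_result v810_divu(uint32_t dividend, uint32_t divisor)
{
    v810_result out;
    out.dest = dividend / divisor;
    out.r30 = dividend % divisor;
    return out;
}
```

So any value you were keeping in r30 is gone after either operation, even when the product fits in 32 bits (r30 is then just overwritten with zero).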

You can also use register 31 if you store it somewhere before calling another function. You don’t have to disable interrupts because interrupt handlers will save and restore it.

@HorvatM
Ah, thank you for that nice rundown there! Very nice! You should see about having that added to the development wiki under the CPU section or something. I was always confused about those first five for the longest time.


@thunderstruck

That topic was pretty fantastic; I actually saw it a few days ago. I’ve been toying around with that compiler, but I must have messed up my makefile for it: I ended up with a compiled ROM the same size as the one VBDE would produce, but it would not seem to run on any emulator. I ought to look into that some more, but I was mostly interested in trying assembly, so I quickly dropped it and went back to the other compiler for the time being.

The tech scroll has some more information regarding registers, notation and the calling convention:

http://www.planetvb.com/content/downloads/documents/stsvb.html#cpucallingconvention

 
