Original Post

PCM, short for Pulse Code Modulation, is the simplest way to represent an audio stream digitally. It’s effectively a two-dimensional graph where the vertical axis is value and the horizontal axis is time. By plotting data points, called samples, along this graph, a waveform can be expressed as data and later reproduced by computer audio systems.

PCM on Virtual Boy

First things first, Virtual Boy was not designed with PCM audio in mind. It was designed as a feature-rich “chiptune” generator, so all PCM techniques are dark wizardry and not intended use of the hardware. That’s not to say we can’t do it, though.

The most obvious solution is to rapidly update the Virtual Boy’s wave memory as a sort of small audio buffer. Unfortunately, this isn’t feasible for two reasons:

* Wave memory cannot be updated while sound is being generated, even by channels that are using different wave buffers. Yeah. Updating wave memory, therefore, absolutely necessitates stopping audio entirely, which produces a dead spot every 32 samples.

* Virtual Boy samples are unsigned, meaning the minimum value is 0. The aforementioned dead spots, then, drop the output level to 0. Why does this matter? It matters because PCM streams situate “silence” in the middle of their range, which for Virtual Boy would be 31 or 32. If samples less than “silence” can be considered negative, then 0 is the maximum negative value: it will produce a high-frequency, low duty cycle pulse every time wave memory is updated.

If only Virtual Boy allowed you to update wave memory while sound is being generated… You know, like Mednafen does! But alas, it does not. However, there is an alternative.

Software Modulation

So using wave memory as PCM buffers might be a wash, but that doesn’t mean we’re out of options. At the end of the day, what we want is to modify the output level of the audio being generated, and modify it at some predetermined interval. As long as all the ups and downs occur at the right moments, that’s what counts.

Here’s what we can do: take only one wave pattern, and initialize all of its samples to 63, the maximum positive value. Then fire up a sound channel using that wave, with any frequency, and let it fly. This will produce a constant high signal, which is inaudible, but it lets us do funky stuff with it.

The high-frequency hardware timer can be used to schedule changes to the volume level of that sound channel. The channel never stops emitting its constant high signal; we simply change on our own exactly how loud that signal is. If we do this fast enough, we can replicate the effects of PCM audio, and all on hardware that doesn’t natively support PCM.
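
Put into (very rough) code, the whole idea looks something like this. Every name below is a placeholder for illustration, not a real register or library call:

// 1. Fill one wave pattern with the maximum sample value (63).
// 2. Start a channel playing that wave; the frequency doesn't matter.
// 3. On every tick of a fast timer, write the next PCM value into that
//    channel's volume. The wave never changes; only its loudness does.

extern unsigned char WAVE_PATTERN_0[32];                  /* placeholder wave memory   */
extern void start_channel(int channel, int frequency);    /* placeholder channel start */
extern void set_channel_volume(int channel, unsigned char left,
                               unsigned char right);      /* placeholder level write   */
extern unsigned char pcm_data[];    /* the PCM stream, one value (0-15) per sample     */
static unsigned int pcm_pos = 0;

void pcm_init(void)
{
    int i;
    for (i = 0; i < 32; i++)
        WAVE_PATTERN_0[i] = 63;     /* constant "high" signal                    */
    start_channel(0, 0);            /* frequency is irrelevant; the output is flat */
}

void pcm_timer_handler(void)        /* scheduled at the PCM sampling rate        */
{
    unsigned char level = pcm_data[pcm_pos++];
    set_channel_volume(0, level, level);    /* only the loudness ever changes    */
}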

The 16-Level Approach (4-bit)

I strongly suspect this is what Galactic Pinball does, though I haven’t actually seen it in action to verify one way or the other. But hey, it’s an option!

The volume level of a sound channel can be independently configured for both the left and right speakers. These levels are 4-bit, meaning there are 16 volume levels, with 0 being silence.

If a PCM stream is also 4 bits per sample, then it’s a simple matter of changing the volume of a sound channel each time. This works for both mono and stereo streams, and it couldn’t be easier!
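
In code, that’s just one write per sample; for stereo, the left and right levels ride along in the same update. The stream packing below (two 4-bit values per byte) is just my own illustration, not any established format:

extern void set_channel_volume(int channel, unsigned char left,
                               unsigned char right);     /* placeholder, as above */

// One stereo sample pair per call, with the left value in the upper nibble
// and the right value in the lower (my own packing, purely illustrative).
void play_4bit_stereo(const unsigned char *stream, unsigned int index)
{
    unsigned char pair = stream[index];
    set_channel_volume(0, pair >> 4, pair & 0x0F);
}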

The 46-Level Approach

4 bits might be fast and easy, but it’s also rather restrictive. Not that sounds are bad at 4 bits, but they certainly aren’t good either. Fortunately for us, we can step it up a bit.

A single sound channel can output levels from 0 to 15. Two sound channels mix together through addition. So using a second sound channel, we add levels 16 through 30. And using three channels, we add levels 31 through 45. Modifying three channels is much the same as modifying one; we just move on to the next channel if the value is above 15, or above 30.
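
Spelled out with the same placeholder names as before, spreading one 0-45 value across three channels looks like this:

extern void set_channel_volume(int channel, unsigned char left,
                               unsigned char right);     /* placeholder */

// Spread one 0-45 level across three channels, 15 apiece: e.g. 37 -> 15 + 15 + 7.
// Mono here, so the same value goes to both speakers.
void play_46_level(unsigned char level)    /* level: 0..45 */
{
    int ch;
    for (ch = 0; ch < 3; ch++) {
        unsigned char part = (level > 15) ? 15 : level;  /* this channel's share */
        set_channel_volume(ch, part, part);
        level -= part;                                   /* remainder moves on   */
    }
}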

Slight problem, though: Virtual Boy only has 5 wave channels, and we’d need 6 for 46-level stereo. We luck out again, because the noise channel can actually be used as that sixth channel. See, the way the noise algorithm works, the first 7 samples generated during a sequence will always be 63, the maximum value. And the sequence is reset whenever the tap location is configured. So if we just set the tap location every time we update the volume, we get a constant high signal out of the noise channel!
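
In placeholder terms, keeping the noise channel pinned at its constant high output just means poking the tap setting again whenever the volumes get updated:

extern void set_noise_tap(unsigned char tap);            /* placeholder */
extern void set_channel_volume(int channel, unsigned char left,
                               unsigned char right);     /* placeholder */

// Re-writing the tap location restarts the noise sequence, so as long as that
// happens at least as often as the volume updates, its output stays at 63.
void update_noise_channel(unsigned char left, unsigned char right)
{
    set_noise_tap(0);                    /* any value; the write itself resets the sequence */
    set_channel_volume(5, left, right);  /* noise channel, assumed to be index 5 here       */
}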

The 181-Level Approach

If you don’t need stereo, you can use all 6 sound channels with additive mixing for a single audio stream. By adjusting volume alone, you have 181 levels of output, and they can get quite loud.

The 256-Level Approach (8-bit)

Sound channels have more than just volume: they also have an envelope setting. The envelope value is also 4-bit, so 0 to 15 for that one as well. When a sample is generated, the volume and the envelope are multiplied with one another, producing the output sample. This can range from 0 * 0 = 0, to 15 * 15 = 225.

The full list of unique products is attached to this post as Products.txt.

As it turns out, there are only 90 of those. That’s wedged between 6 and 7 bits of data, and they’re not evenly spaced. If you plot them out, the curve is roughly exponential. Not all that useful in its current form.

If you use two sound channels, on the other hand, it’s like taking two of these products and adding them together. And if you do that, you can make a whole slew of different values. Every level from 0 to 255 can be expressed as some sum of two sound channels.

The full list of sums, from 0 to 255, is attached to this post as Sums.txt.

Let’s take one as an example:

* Sample: 254
* Volume 1: 15
* Envelope 1: 14
* Volume 2: 11
* Envelope 2: 4

We know from earlier that each sound channel outputs a level that is the product of its volume and its envelope. That means channel 1 here will output 15 * 14 = 210, and channel 2 will output 11 * 4 = 44. When mixed, the combined output will be 210 + 44 = 254.
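
Finding a volume/envelope pair for every 8-bit value is easy to brute-force ahead of time. Something like this quick PC-side search would do it (it won’t necessarily land on the same combinations as the attached Sums.txt):

#include <stdio.h>

/* For every 8-bit level, find one pair of (volume, envelope) settings per
 * channel whose products sum to that level.  Meant to run on a PC as a
 * conversion-time tool; nothing here is VB-specific. */
int main(void)
{
    int level, v1, e1, v2, e2, found;
    for (level = 0; level < 256; level++) {
        found = 0;
        for (v1 = 0; v1 < 16 && !found; v1++)
        for (e1 = 0; e1 < 16 && !found; e1++)
        for (v2 = 0; v2 < 16 && !found; v2++)
        for (e2 = 0; e2 < 16 && !found; e2++)
            if (v1 * e1 + v2 * e2 == level) {
                printf("%3d = %2d*%2d + %2d*%2d\n", level, v1, e1, v2, e2);
                found = 1;
            }
    }
    return 0;
}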

By using 4 sound channels in this manner, you can have true 8-bit stereo coming out of the speakers.

Static!

While the high-pitched whine of the 0-based dead spots isn’t a problem when using software modulation, static certainly is. I haven’t put a great deal of research into why this happens, but I suspect it has more to do with the audio data I’ve been using and less to do with the techniques of software modulation. All I know is that when I did the 16- and 256-level methods, the static was prohibitively audible, but the 46-level method sounded okay. Go figure.

With some additional testing and experimentation, I’m hoping that we’ll find that the 256-level approach is fully adequate for general-purpose PCM output on Virtual Boy.

Demo

Attached to this post are the following files:

* AtlasParkSector.ogg – A sound clip from City of Heroes
* pcm_46.zip – Contains a Virtual Boy ROM showcasing the 46-level PCM technique
* pcm_46.ogg – A recording of that program, run on the actual hardware.

Mednafen has different timing than the Virtual Boy hardware, so its output will not be identical. You can adjust the delay between samples by pressing Up and Down on the left D-Pad.

22 Replies

Wow, that is impressive. Quite a bit over my head... but the demo is extremely impressive.

Can these approaches also recreate voice?

-Eric

After consulting with the scriptures, it would seem that 8-bit PCM doesn’t want to happen.

* First of all, as it turns out, the product of volume * envelope actually loses its lower 3 bits. Unless both operands were 0, 1 is then added to the result (there’s a code sketch of this after these bullets). This drops the total number of valid products from the 90 I thought it was down to 27 on the hardware.

* It’s still possible to generate 256 evenly-spaced levels, using all 5 wave patterns and combinations of volume and envelope. But the process of setting all of those values on the sound channels at run time is even more prohibitive than my previous attempt: the static almost completely drowns out the PCM clip.
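
For clarity, here’s the rule from the first bullet again, written out as code the way I read it:

/* Restating the bullet above; placeholder names, just a transcription. */
unsigned int effective_level(unsigned int volume, unsigned int envelope)
{
    unsigned int level = (volume * envelope) >> 3;   /* lower 3 bits are lost     */
    if (volume != 0 || envelope != 0)
        level += 1;                                  /* +1 unless both were 0     */
    return level;                                    /* only 27 distinct results  */
}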

So as far as 8-bit audio is concerned, bus latency is the limiting factor. It simply takes so long to set that many properties on the sound channels that the time between sets becomes audible.

Analyzing my resampled source clips, I discovered that the 16- and 46-level runs did indeed contain static in the source audio, and that it wasn’t being caused by playing the sounds on Virtual Boy. I unfortunately haven’t been able to craft a noiseless version of Atlas Park Sector for demonstration (once converted into 46-level samples), but other, less-noisy files sounded great.

All that being said, it should be possible to generate sounds and music notes at run-time using a wave table, and producing sounds in that fashion is much more resilient when it comes to noise. That’s an experiment for another day, but I’m going to go on record saying that it IS possible to do noiseless PCM in this fashion, and that doing so is a feasible means of providing PCM audio to games.

Guy Perfect wrote:
We luck out again, because the noise channel can actually be used as that sixth channel. See, the way the noise algorithm works, the first 7 samples generated during a sequence will always be 63, the maximum value. And the sequence is reset whenever the tap location is configured. So if we just set the tap location every time we update the volume, we get a constant high signal out of the noise channel!

Nice. I had not realized this. Could definitely do six independent PCM voices then!

So I’ve been scrounging around on the internet to find out the mysterious source of the staticky noise in my sound clips. And I’ve found what it is, what causes it, and to an extent, what to do about it!

When reducing the sample size of a stream, a process called quantization, you won’t always be able to land in good spots. Drawing samples down from 16 bits to 8 bits tends to work pretty well, since there’s an evenly-spaced multiple there. But in this case, I’m drawing a 65536-level sample down to a 46-level sample, so there’s going to be a lot of “in betweens” that get messed up due to rounding. This messed-up-ness is referred to as the quantization error.
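
As a concrete example, squashing a 16-bit sample down to one of 46 levels amounts to something like this, and whatever the rounding throws away is that error:

/* Map a signed 16-bit sample onto one of 46 levels.  Illustrative only. */
int quantize_46(short sample16)
{
    double exact = (sample16 + 32768) * 45.0 / 65535.0;  /* 0.0 .. 45.0        */
    int level = (int)(exact + 0.5);                      /* nearest level      */
    /* quantization error = exact - level; that residue is what we hear as noise */
    return level;
}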

There’s a method called noise shaping that takes the quantization error for each sample in a stream and feeds it back into the input prior to quantization, which supposedly helps reduce the apparent noise in the output. Supposedly. I found all sorts of sources that were all “You can use it as feedback yaaaay!” but not a one of them told me how to apply it, where the sample rate figures in, etc. If anyone knows more about noise shaping, I’d love to have a pointer or two.
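
For what it’s worth, the closest thing to a recipe I’ve been able to piece together is first-order error feedback: carry each sample’s quantization error forward and subtract it from the next input before quantizing. Something like the following, building on the quantizer above, though I make no promises this is the right interpretation:

static double carried_error = 0.0;    /* previous sample's error, in levels */

int quantize_46_shaped(short sample16)
{
    double exact  = (sample16 + 32768) * 45.0 / 65535.0;
    double wanted = exact - carried_error;       /* feed the old error back in */
    int level = (int)(wanted + 0.5);
    if (level < 0)  level = 0;
    if (level > 45) level = 45;
    carried_error = level - wanted;              /* error made on this sample  */
    return level;
}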

The other thing that will help, especially in conjunction with noise shaping, is to use a low-pass filter to blot out the noise in frequencies above the maximum signal frequency. Noise shaping, I’m told, can shove the quantization noise up into the high ranges, meaning a low-pass filter can snip it right off at the neck.

But that’s neither here nor there. I won’t be able to progress until I get some form of noise shaping working.

blitter wrote:
Nice. I had not realized this. Could definitely do six independent PCM voices then!

If you control a PCM stream, you can put as many voices as you want in it. You won’t need multiple sound channels in that regard. I’m only using them here to increase my sample size.

Guy Perfect wrote:

blitter wrote:
Nice. I had not realized this. Could definitely do six independent PCM voices then!

If you control a PCM stream, you can put as many voices as you want in it. You won’t need multiple sound channels in that regard. I’m only using them here to increase my sample size.

Doing that in real-time though will significantly bump up the CPU usage, right? Probably better to leverage the VB’s mixing hardware as much as possible. Six channels of 4-bit sound looks to me to be the most efficient way of doing multi-channel PCM audio on the VB.

Nice brain dump. A few comments:

First… yes, I believe Galactic Pinball and the T&E Soft games used the 4-bit approach, as does my wav converter.

Regarding the 46-level… you’re saying you can get 46 levels in 3 channels, but you need 6 channels to do 46 levels? I quickly skimmed through it, so maybe I’m missing what you’re saying, but are you saying because of stereo, you’re using 3 channels for each? Why wouldn’t you use the L/R levels for stereo (you get 4 bits for each channel independently, right)?

I took a different approach for more bits when I was experimenting with PCM years ago. I used varying levels of the waveform RAM, since 63 is (basically) 2x of 32, which is 2x of 16, and so on. I liked the waveform RAM, because it’s later in the chain, so all 6-bits get applied (though they could get chopped later if the level is too low).

I tried a couple different methods… one was a dynamic method using a single channel, and always outputting just 5 bits (4-bits L/R, plus 1 bit of ENV), but sliding which 5 bits were output (for example, if the sample was 5, it’d output the lower 5 bits from the low-bit RAM… but if the sample was 250, it’d output the upper 5 bits from the high-bit RAM). This was pretty cool, because it gave more overall dynamic range, but still used the simple 5-bit resolution. Of course you could round, rather than just masking the bits, but that’s more computation (it’d be better to do that ahead of time in the conversion process). The disadvantage of this is the small error, that it really only works for mono (stereo would need a second channel), and that it uses several waveform RAM slots.

The other method was using multiple channels with multiple RAM slots. This way is nice, because you CAN use it with stereo, and you can easily get more bits. Set the first channel’s waveform to 63, and the second channel’s waveform to 4… put the upper 4 bits in the first waveform and the lower 4 bits in the second channel. I guess if you really cared about perfection, you could do 32 and 2, but your low bits may get wiped away if you go too low.
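
Per sample it’d boil down to something like this (names made up, just to show the split):

extern void set_channel_volume(int channel, unsigned char left,
                               unsigned char right);     /* made-up helper */

/* Channel 0's waveform is all 63s, channel 1's is all 4s (roughly 63/16),
 * so the low nibble comes out about 16x quieter.  Mono shown here. */
void write_sample_8bit(unsigned char s)
{
    set_channel_volume(0, s >> 4,   s >> 4);    /* upper 4 bits on the "63" wave */
    set_channel_volume(1, s & 0x0F, s & 0x0F);  /* lower 4 bits on the "4" wave  */
}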

The problem with any PCM though is that it’s processor intensive, and ROM hungry. It’s cool for short effects, where nothing else is happening… like “T&E Soft Presents”, but it’s not really practical for music or in-game special effects. I stuck with 4-bit for the wav converter because the sound is reasonable, you can do 2 samples/byte (5KB/s @ 10KSps mono), and it’s (relatively) easy on the processor.

BTW, are you sure that the RAM is unsigned? It’s been a long time since I’ve looked at any of this sound stuff (I’m pretty much going by memory on all of it), but I seem to remember looking at the output on the scope, and determining that it was signed. But I could be mistaken… IIRC, the dev manual didn’t specify signed or unsigned (which is why I checked it out in the first place). If I could just find half my notes and test code, I’d probably have some better input. 😉

Oh, and I’m not sure that I’d jump to blame quantization error for the static sound. I’d run a few tests first. A simple one would be to make a wav file on your computer with the same 46 levels and see how it sounds. Then convert that, which should (in theory) map perfectly to the VB. Does it sound identical? Also, I’d try generating a tone and look at it on a scope to inspect the quality of the signal (does it look sinusoidal, are the samples evenly spaced, is there any strange behavior). Looking at the spectrum of that signal would probably tell you a lot as well.

DogP

DogP wrote:
Regarding the 46-level… you’re saying you can get 46 levels in 3 channels, but you need 6 channels to do 46 levels? I quickly skimmed through it, so maybe I’m missing what you’re saying, but are you saying because of stereo, you’re using 3 channels for each? Why wouldn’t you use the L/R levels for stereo (you get 4 bits for each channel independently, right)?

You’re right. My brain was still in “mutually exclusive” mode, which necessitated isolating left and right to their own channels when modifying the envelope and/or wave parameters. If only volume is being adjusted, then you can certainly use one sound channel for both left and right, since the input sample is always the same. Using all 6 sound channels, this gives us 91 levels in stereo!

DogP wrote:
The problem with any PCM though is that it’s processor intensive, and ROM hungry. It’s cool for short effects, where nothing else is happening… like “T&E Soft Presents”, but it’s not really practical for music or in-game special effects. I stuck with 4-bit for the wav converter because the sound is reasonable, you can do 2 samples/byte (5KB/s @ 10KSps mono), and it’s (relatively) easy on the processor.

I wouldn’t say it’s processor-intensive or ROM hungry. You’re making a couple of assumptions that don’t have to be the case. (-:

The physical upper bound on PCM sampling rate for Virtual Boy is 41700hz, which is the sampling frequency of the VSU when converting to an analog signal. Timing techniques notwithstanding, let’s look at that in terms of CPU power. The processor runs at 20000000hz, which is 479.6 times the maximum theoretical rate of the sound samples.

Now, how much CPU does it take to output a sample? Let’s say we’re using the 46-level approach on three sound channels (which is sufficient for stereo!). That’s three writes to the VSU, plus a handful of overhead cycles to prepare the samples. Let’s say, in some terrible worst-case scenario, a sample requires 50 cycles to produce. That still leaves the CPU running 9.5 times faster than the sampling rate, meaning 41700hz output costs only about 10.4% of the CPU, which isn’t bad.

41700hz is approaching the rate of CD audio, 44100hz. Even chopping that in half to 20850hz, thus halving the CPU usage, still produces a good sound. Using a simpler sampling method, like 31- or 16-level, will further reduce CPU usage.

As for being ROM hungry, what novice tries to use uncompressed wave files for music on a system like Virtual Boy? It’d be far more effective to use sampled instruments and generate a PCM stream at run-time using the other 90% of the CPU. (-:

DogP wrote:
BTW, are you sure that the RAM is unsigned? It’s been a long time since I’ve looked at any of this sound stuff (I’m pretty much going by memory on all of it), but I seem to remember looking at the output on the scope, and determining that it was signed. But I could be mistaken…

Absolutely. Here’s a wave pattern I just tested:

And here’s the signal that the hardware produced:

DogP wrote:
Oh, and I’m not sure that I’d jump to blame quantization error for the static sound. I’d run a few tests first. A simple one would be to make a wav file on your computer with the same 46 levels and see how it sounds.

Jumping to conclusions would be foolish. That’s why I made a wav file on my computer with the same 46 levels to see how it sounded. It contained the same static that I heard on the Virtual Boy. (-:

Guy Perfect wrote:
I wouldn’t say it’s processor-intensive or ROM hungry. You’re making a couple of assumptions that don’t have to be the case. (-:

What about the game? Does that get 0% of the CPU? 😉 Like I said… if you’re not doing anything else, it works great, but I don’t want to spend anywhere near 10% of the CPU on audio in a real application. And 50 cycles/sample would be optimistic IMO, if you count all the overhead involved (assuming it’s actually playing a game, firing an interrupt ~40,000x per second)… and if you’re generating the samples on the fly, obviously that’s even more overhead. Plus, didn’t you say your 8-bit didn’t work, because it took too long? IMO, the advantage of PCM is for realistic sounds that wouldn’t be generated on the fly, like speech… not for creating a better VSU using the CPU). But in the end, if you make a game that uses PCM for everything, and it works well… that’s all that matters.

Guy Perfect wrote:
Absolutely. Here’s a wave pattern I just tested:

Ah, good… I must have been thinking of something else I was working on.

Guy Perfect wrote:
Jumping to conclusions would be foolish. That’s why I made a wav file on my computer with the same 46 levels to see how it sounded. It contained the same static that I heard on the Virtual Boy. (-:

If 46 levels doesn’t have sufficient quality, why even consider it? How many bits do you need before you’re happy with the sound? It sounds like that’s the number you should be shooting for.

DogP

DogP wrote:
What about the game? Does that get 0% of the CPU? 😉 Like I said… if you’re not doing anything else, it works great, but I don’t want to spend anywhere near 10% of the CPU on audio in a real application.

100% – 10% > 0%, you know. I’m not sure what you’re trying to say.

DogP wrote:
And 50 cycles/sample would be optimistic IMO, if you count all the overhead involved (assuming it’s actually playing a game, firing an interrupt ~40,000x per second)… and if you’re generating the samples on the fly, obviously that’s even more overhead.

In the program attached to the first post, I was using the hardware timer interrupt, which proved to be less than what it said on the label in terms of speed. With the interrupt breaking, stack management and other overhead, it took more than 20 microseconds to get in and out of the interrupt handler.

But that’s if you’re using the hardware timer. My next experiment won’t be using the hardware timer. (And don’t go conjecturing at me that THAT won’t work. I’ll do it, and then you’ll have to come up with an explanation. (-: )

DogP wrote:
Plus, didn’t you say your 8-bit didn’t work, because it took too long?

I said it couldn’t update four sound channel properties for each of the left and right signals without audible side-effects. This is an entirely separate issue from the sampling rate.

DogP wrote:
IMO, the advantage of PCM is for realistic sounds that wouldn’t be generated on the fly, like speech… not for creating a better VSU using the CPU). But in the end, if you make a game that uses PCM for everything, and it works well… that’s all that matters.

Ah ah ah, no conjecturing allowed. That is not the scientific method! If you’re creative enough, you’d be surprised just what can be accomplished, even when others say it can’t be done.

DogP wrote:
If 46 levels doesn’t have sufficient quality, why even consider it? How many bits do you need before you’re happy with the sound? It sounds like that’s the number you should be shooting for.

Did someone say 46 levels had insufficient quality while I wasn’t looking?

46 levels has beautiful quality. Check the sound file I attached to this post. The only problem is that quantization noise. If I can eliminate that, we’re in business.

Very cool! It sounds great, nice work! I just don’t get how simply changing the volume on a constant tone can result in such different sounds?

Guy Perfect wrote:
100% – 10% > 0%, you know. I’m not sure what you’re trying to say.

I’m referring to:
>As for being ROM hungry, what novice tries to use uncompressed wave files for music on a system like Virtual
>Boy? It’d be far more effective to use sampled instruments and generate a PCM stream at run-time using the
>other 90% of the CPU. (-:
10%+90% = 100% 😉

Guy Perfect wrote:
In the program attached to the first post, I was using the hardware timer interrupt, which proved to be less than what it said on the label in terms of speed. With the interrupt breaking, stack management and other overhead, it took more than 20 microseconds to get in and out of the interrupt handler.

Right… that’s exactly what I’m saying. I’m not saying it can’t be done any other way, but the intuitive (and friendly) way, while also running a game, is to use an interrupt.

Guy Perfect wrote:
I said it couldn’t update four sound channel properties for each of the left and right signals without audible side-effects. This is an entirely separate issue from the sampling rate.

I wasn’t talking about sampling rate, I’m just saying that computing and updating register values DOES take time.

Guy Perfect wrote:

DogP wrote:
IMO, the advantage of PCM is for realistic sounds that wouldn’t be generated on the fly, like speech… not for creating a better VSU using the CPU). But in the end, if you make a game that uses PCM for everything, and it works well… that’s all that matters.

Ah ah ah, no conjecturing allowed. That is not the scientific method! If you’re creative enough, you’d be surprised just what can be accomplished, even when others say it can’t be done.

Uh… yes, note the _IMO_… I’m entitled to an opinion. Also, note the last sentence… I’m saying I believe it can be done, and if you do it, good for you. I would truly like to see a 41KHz PCM implementation in a full-blown VB game… it would be quite awesome. But my _opinion_ is that utilizing the VSU for all that it’s worth is more worthwhile than doing everything in the CPU. It’s almost like doing everything manually by drawing directly to the framebuffer, rather than using worlds, objs, etc. Yes, you can do it, and in some cases it’s a good idea, but in a lot of cases, it’s not.

Guy Perfect wrote:
46 levels has beautiful quality. Check the sound file I attached to this post. The only problem is that quantization noise. If I can eliminate that, we’re in business.

Well… quantization noise is caused by quantization error. Quantization error is the difference between the desired value and the value that it’s represented by. So, unless you generate a signal where all the values land right on your 46 possible levels, you’re always going to have quantization noise. You can reduce the quantization error (and therefore noise) by increasing the number of bits.

DanB wrote:
Very cool! It sounds great, nice work! I just don’t get how simply changing the volume on a constant tone can result in such different sounds?

No… not changing volume on a constant tone, but on a constant DC waveform.

DanB wrote:
I just don’t get how simply changing the volume on a constant tone can result in such different sounds?

The sound channel isn’t emitting a tone; it’s emitting a constant voltage. At full volume, for the sake of explanation, let’s say the value of the signal is 1. It could also be said that the signal’s maximum value is 1. Just 1, a horizontal line, infinitely extending to the right as time goes on.

When a sample from a wave comes in, it expresses some fraction of that voltage. Let’s say the sample’s value is 0.5. Therefore, the output voltage should also be 0.5 times the maximum value of the signal, so in this case, the output signal is also 0.5 (since the “max value” is 1).

Now let’s say that rather than changing the value of the signal coming out, we just set the volume to 0.5 (aka 50%) instead. What will that do to the output? It’s multiplied, just like the sample, so the result is still 0.5 * 1 = 0.5.

The input and output don’t change, but the means by which varying levels are instructed to come out of the sound channel changes.

DogP wrote:
Well… quantization noise is caused by quantization error. Quantization error is the difference between the desired value and the value that it’s represented by. So, unless you generate a signal where all the values land right on your 46 possible levels, you’re always going to have quantization noise. You can reduce the quantization error (and therefore noise) by increasing the number of bits.

But if there’s some other way to reduce the apparent quantization error, then we get the best of both worlds: 46 levels, low or non-existent noise. Perfect!

But my research has turned up an unfortunate truth: it’s impossible. If a sample calls for a 4, but the stream can only express odd numbers, then the sample necessarily gets distorted. This will cause, in general, white noise at a frequency equal to the sampling rate. Whenever something gets rounded away, the increase or decrease (aka the quantization noise) is introduced into the stream.

When preserving the signal is strictly necessary while reducing the number of levels (generally described in terms of bit depth), there is a way to do it. But it also decreases the signal-to-noise ratio, since it works by deliberately introducing noise. How can that possibly be useful? Take a look:

* The top row is a sine wave, the source signal.

* The middle row is that same sine wave, converted into a stream that has only 3 levels for samples: -1, 0 and 1. It doesn’t much look like a sine wave. In fact, it’s two pulse waves with different duty cycles stacked on top of each other. If you listen to it, it certainly doesn’t sound like a sine wave.

* The bottom row is also a 3-level stream, but with white noise added to the sine wave prior to quantization. The rounding points are -0.5 and 0.5, so the random noise was generated between those two values. The result looks like those pulse waves from the second row, but with a bunch of garbage tossed in.

This is called dithering. While the bottom row is still only 3-level samples, it sounds pretty close to the original sine wave when you listen to it… It also sounds like a bunch of white noise on top of that. The original signal may have been preserved better than the straightforward quantization did, but at a terrible cost.
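
If a code version helps, the difference between the straight and the dithered quantization boils down to this (a simplified sketch, not the exact code behind the attached files):

#include <stdlib.h>

/* Quantize a sample in the range -1.0..1.0 to three levels: -1, 0, +1. */
int quantize_3(double x)
{
    if (x >  0.5) return  1;    /* rounding points at -0.5 and +0.5 */
    if (x < -0.5) return -1;
    return 0;
}

/* Same thing, but with white noise between the rounding points added first. */
int quantize_3_dithered(double x)
{
    double noise = (double)rand() / RAND_MAX - 0.5;   /* white noise, -0.5 .. +0.5 */
    return quantize_3(x + noise);
}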

Attached to this post are sound files for each of the sounds in the image above.

In some cases, the apparent strength of the dither noise can be alleviated by carefully designing noise in the “shape” of the desired spectrum, hence the name “noise shaping”. This generally involves using the quantization error per sample as input, but also involves some hocus pocus and finger-crossing.

The unfortunate truth is this: regardless of how you dither, noise shaping or otherwise, there’s no guaranteed, tried-and-true method that will perform best for arbitrary audio streams. It’s mostly a matter of trial-and-error to see what works best on a case-by-case basis.

In terms of Virtual Boy, it means PCM will always contain noise, unless the PCM signal itself could just as well have been made using the tone generators that the VSU was designed to use. The absolute upper bound of sample levels on the hardware is 10 bits due to how samples are processed. Even if through some mystic arts we could get samples that large out to the speakers, it’s still subject to the unavoidable quantization noise.

So how many bits is enough that quantization noise is negligible? Not 16; CD audio is generally dithered before being finalized. 24 is said to be enough, and at 32 bits, quantization noise will be on the molecular scale, so it’s not worth worrying about.

Okay, I have good news and bad news.

The good news is, Atlas Park Sector at 40000hz fit into the 2 MB ROM size limit that FlashBoy affords, and I got it to render on the hardware! The file is attached to this post. That is to say, the first recording of a Virtual Boy PCM stream with 46-level samples at 40KHz is attached to this post. (-:

I’ve also attached the source clip so you can hear what it sounded like after quantization but before coming out of the Virtual Boy. The hardware recording does have a stray pop and crack that wasn’t in the source clip, and I believe that was from when the output got sampled while the multiple sound channels were being configured.

ALSO attached to this post is a .zip file containing a ROM that plays the clip at 22050hz, more or less, on the hardware. I only did that to reduce the file size, as the 40KHz clip was juuuuust under 2 MB (it exceeded 2 million bytes). I can post the 40KHz ROM if need be, but there’s also the bad news to cover…

Up and Down on the left D-Pad change the sampling delay as before. But this demo doesn’t use the hardware timing; it just uses a loop. The algorithm looks roughly like this:

for (x = 0; x < delay; x++);
OutputSample(left, right);

There's some extra code in there to process video frames, but you get the idea.

Now take a look at the delay in the 22050 program. It's 10. Ten. As in, an "x++" loop that runs ten times between samples. And that happens approximately 22050 times a second. Ten.

Knock it down to 11025 samples per second and that loop will run a big ol' whopping 20 times between samples. That takes, what, an addition, a comparison and a branch? Something like 5 cycles total before the C compiler adds its bloat?

I don't know what all the compiler does, but I sure as heck know that a budget of twenty +, < and goto iterations between samples isn't going to be enough to run a game on. You can hypothetically squeeze out some better performance with a strictly assembly-coded program, but who's gonna take the time to make a game in assembly these days? So I'm gonna have to bite the bullet on this one. Unless you drive down the sampling quality (bit depth and sample rate) to pitiful territory, the Virtual Boy just doesn't have enough power in practice to use PCM as the predominant source of audio in games. )-:

In today’s lesson, we learn that “pitiful territory” is in the ear of the beholder. Turns out 16-level samples don’t necessarily sound bad! The quantization error is more pronounced, but holy shnap, it really looks like the sampling rate is more of a factor in sound quality than the sample size.

I pulled out all the stops to accelerate processing. Firstly, I need to get dasi to configure the wait controller in crt0. Second, I used a different -O switch in the compiler, which did seem to help. Last, I sized down the audio stream to 16-level samples, just for kicks.

The result? 40KHz output with a “delay” parameter of 9. Mednafen sounds better with it set at 32. Recall that in the previous post, it was 10 for 22050 (albeit 46-level samples). Due to the reduction in bit depth, the pops and crackles are gone. It now sounds just like the source audio!

Granted, this is still, what, something like 36 “x++”s between samples for 11025hz? I’m still not sure it’s feasible to do a PCM game using C, but I’m rather convinced that the system is physically capable of pulling it off well.

Applicable files are attached to this post.

Great work building on the groundwork laid by DogP and HorvatM! This whole thread has been an entertaining and informative read. It would be nice to take a peek at your code, though 😉

Although, given your final analysis, it seems as though, if we want to upgrade the sound hardware on the VB (which I still think hasn’t been taken anywhere near its true potential), we’ll have to take advantage of the cartridge port audio pass-through system the designers built in.

On that note, since you seem to like OGG Vorbis (and who doesn’t?) you (and/or others) might be interested in this thing.

It’s a ~$10 chip designed to decode OGG Vorbis audio. There might be better products and/or prices; it’s the first one I found with a quick search (I actually found it quite a while ago with this very idea in mind).

RunnerPack wrote:
Great work building on the groundwork laid by DogP and HorvatM!

Built nothin’. I did this all from scratch. (-:

RunnerPack wrote:
It would be nice to take a peek at your code, though 😉

What, it wasn’t good enough when I said “writes volume in a loop”? )-:

// Repeat until the VIP handler raises a flag
for (new_frame = 0; !new_frame;) {

    // Increment quantum until it matches target
    if (++quantum >= target) {
        quantum = 0; // Reset quantum

        // Write out one sample, returning to 0 when the end is reached
        // (the source stream happens to be 1000001 samples long)
        VUE_VSU_CHANNELS[0].volume = resSamples[sample++];
        if (sample >= 1000001) sample = 0;
    }

} // new_frame

In this case, “target” is the number that gets written out to the screen, and that the user can configure. It’s just a delay between samples. It’s the oldest trick in the book as far as time delays are concerned, but the depressing part is that it only allows for some small number of iterations.

RunnerPack wrote:
Although, given your final analysis, it seems as though, if we want to upgrade the sound hardware on the VB (which I still think hasn’t been taken anywhere near its true potential), we’ll have to take advantage of the cartridge port audio pass-through system the designers built in.

Precisely. I think (and don’t quote me on this) that the analog signal produced by the VSU first passes through the cartridge before heading out to the speakers. Whatever happens inside the cartridge is a matter of IC design, but all the commercial games just directly connect the in and out pins, with no additional audio functionality.

Guy Perfect wrote:

RunnerPack wrote:
Great work building on the groundwork laid by DogP and HorvatM!

Built nothin’. I did this all from scratch. (-:

Well, then, good work reinventing DogP’s wheel 😉

He did this waaaay back in 2008. I thought it was on the site somewhere, but I guess he only shared it via IRC. And then there’s HorvatM’s 8-bit wav player he just posted a few days ago which I assume must use the same idea.

Guy Perfect wrote:
What, it wasn’t good enough when I said “writes volume in a loop”? )-:

Well, yes, I did understand that part (having seen DogP’s implementation) but I thought you came up with some kind of filtering/dithering trick to make it sound better. Sorry for the misunderstanding.

Good analysis… I would say “told you so”, but it’s never time wasted to prove something to yourself. 🙂

RunnerPack wrote:
Although, given your final analysis, it seems as though, if we want to upgrade the sound hardware on the VB (which I still think hasn’t been taken anywhere near its true potential), we’ll have to take advantage of the cartridge port audio pass-through system the designers built in.

I keep considering doing some sort of add-on using the pass-through… but if only 1% of people playing it can experience it (and require adding support for this unofficial hardware to emulators), it’s probably not worth it (we’d need a Flashboy++ or something to get it into the hands of more people 😉 ).

RunnerPack wrote:

Guy Perfect wrote:

RunnerPack wrote:
Great work building on the groundwork laid by DogP and HorvatM!

Built nothin’. I did this all from scratch. (-:

Well, then, good work reinventing DogP’s wheel 😉

He did this waaaay back in 2008. I thought it was on the site somewhere, but I guess he only shared it via IRC. And then there’s HorvatM’s 8-bit wav player he just posted a few days ago which I assume must use the same idea.

There were a lot of tests and revisions (which were probably only posted on IRC, or maybe a few forum threads), but it was all rolled into my wav converter, which is posted here:
http://www.planetvb.com/modules/tech/?sec=tools&pid=wav2vb , and technical description/discussion here:
http://www.planetvb.com/modules/newbb/viewtopic.php?post_id=8921

Heh, and this reminds me of: http://knowyourmeme.com/memes/events/you-didnt-build-that .

BTW… you don’t need to modify the crt0 to set the waitstates… you can set it in your code like: HW_REGS[WCR]|=ROM1W; (where WCR is 0x24 and ROM1W is 0x01).

DogP

DogP wrote:
I would say “told you so”, but it’s never time wasted to prove something to yourself. 🙂

What you said was that the CPU doesn’t have enough power to make PCM feasible. What I said is that making a Virtual Boy program in C won’t get you enough power to make PCM feasible. I will continue to testify that a VB program written in assembly can be powerful enough to make some level of PCM useful as the sole source of audio in a game.

In C, every “x++” involves both a load and a store, and reaching out to system memory will be the bottleneck thanks to bus waiting. I rewrote the sampling loop in assembly. The *only* RAM access was when loading the sample; even the VIP flag was set in ADTRE to prevent bus waiting. The increase of speed was rather pronounced: roughly 15 times as fast as the C code. However, incorporating that loop into the C program introduced a function call, which drove the required delay parameter down to 8. |-:

Dodgamn, gcc, what is it about function calls that makes them so gosh darn hard?

The next time I take a look at this, I’m going to code the entire program in assembly to see what happens.

DogP wrote:
BTW… you don’t need to modify the crt0 to set the waitstates… you can set it in your code like: HW_REGS[WCR]|=ROM1W; (where WCR is 0x24 and ROM1W is 0x01).

We libvueians have no need for such archaic, outdated technologies. (-:

VUE_WAIT_CONTROL = vueWaitControl(VUE_WAIT_1, 0);

Guy Perfect wrote:
What you said was that the CPU doesn’t have enough power to make PCM feasible. What I said is that making a Virtual Boy program in C won’t get you enough power to make PCM feasible. I will continue to testify that a VB program written in assembly can be powerful enough to make some level of PCM useful as the sole source of audio in a game.

No… I never said that the CPU doesn’t have enough power… actually, I said the opposite. If you read back, I said that I _do_ think that it’s possible, but not worth it. If you write a game in assembly, counting cycles, etc… sure, you can probably make something work. But like you said, nobody is going to do that (unless you call pong with an amazing sound engine a worthwhile game).

And I’m also not talking about “some level of PCM”… I’ve already seen and shown that 4-bit from ROM is kinda quick, easy, etc. But it is pretty ROM hungry, still wasteful of the CPU, and not great quality (the Gameboy had 4-bit audio… we have 10 bits @ 41KHz, why throw that away?). You were talking about 40KHz, 8-bit, with waveforms generated on the fly (so the audio, PLUS a game fit in 2MB, not just the audio, like your one track that by itself barely fit in 2MB). I say a functional technical demo of this would be reasonable, but a full blown game where a large portion of CPU time is wasted on audio… I don’t see the point. Reducing the sample rate reduces the load on the CPU, which is fine for speech, or special effects… but who wants an entire music score/sound effects limited to ~5KHz?

Though you seem to want to think I’m saying it’s not possible, so you can prove me wrong… so if that’s motivation for you to do it, I’ll gladly say that it’s not possible.

Also, regarding dithering… that won’t work in this case. You’d need an LPF after the dithering… since the highest rate that you can dither is within the system’s producible output range (it is being output), so you’re just summing a 1-bit high frequency signal on top of your other signal. Look at the design of a sigma-delta DAC… that’s similar… without the LPF, it’s not much of a DAC (well, it’s a 1-bit DAC).

Guy Perfect wrote:

DogP wrote:
BTW… you don’t need to modify the crt0 to set the waitstates… you can set it in your code like: HW_REGS[WCR]|=ROM1W; (where WCR is 0x24 and ROM1W is 0x01).

We libvueians have no need for such archaic, outdated technologies. (-:

VUE_WAIT_CONTROL = vueWaitControl(VUE_WAIT_1, 0);

Woah! You libvueians must be some advanced race. Why did you modify the crt0, when you had such an amazing and intuitive control mechanism for it? x++; is pretty archaic too, right? Is the libvueian way to do it x=addOneToThisValue(x); ? That’s too advanced for me… I’ll just stick with my stone and chisel.

On a serious note, is the last 0 saying that it’s the ROM bit, not the EXP bit? Maybe defining that would make it a bit more clear… everything else is actually descriptive, but a stray, undescriptive 0 looks arbitrary and out of place.

I still prefer specifically setting the bits, because I can read the dev manual, decide which bits I want to set, and set them, without digging through header files to figure out that what I want to do requires something called VUE_WAIT_CONTROL, vueWaitControl, and an argument of VUE_WAIT_1 (and apparently a 0 for some reason)… my way (as in the way it’s been done until the libvueians descended to Earth to spread their superior knowledge) has the names defined the same as the register names from the manual… but if that’s too “archaic” for you, that’s your own call. But I’m sure you’re gonna say that libvue will have full documentation explaining every part of it… yay. My archaic way already does.

DogP

 
