Posted on May 23, 2017
Fast palette lookups
Cascade Quest uses a fixed palette that is based on the 16 default EGA colors. The 16-color Sierra games used dithering to give the appearance of more than 16 colors. Given the relatively blurry monitors and TVs of the day, this produced a more convincing effect than it does on modern crisp LCD monitors. As a result, Cascade Quest uses the “undithered” versions of these dithered colors. That is, each base color blended with every other base color. This results in the palette on the left below. Note that due to duplicates, the result is 136 unique colors (16 + 15 + 14 + … + 1), not 256.
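The 136 figure can be checked by enumerating unordered pairs of base-color indices, where a color paired with itself stands for the unblended base color. This is only a counting sketch; it does not compute any blended RGB values.

```python
from itertools import combinations_with_replacement

BASE_COLOR_COUNT = 16  # the 16 default EGA colors

# Each palette entry corresponds to an unordered pair (i, j) with i <= j.
# Blending A with B is the same as blending B with A, so order is ignored.
pairs = list(combinations_with_replacement(range(BASE_COLOR_COUNT), 2))

print(len(pairs))                             # 136
print(sum(range(1, BASE_COLOR_COUNT + 1)))    # 136, i.e. 16 + 15 + ... + 1
```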
Using any limited-color palette (with arbitrary colors like VGA, or fixed colors like mine) presents some pretty significant limitations. The most obvious is that blending two colors together becomes much more difficult. We need to take the two RGB values, combine them, and then map this color back to the closest matching color in our palette.
The brute force method for palette lookup is to calculate the Euclidean distance between the reference color and each entry in the palette. The closest one wins.
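A minimal sketch of that brute force lookup follows. The names `nearest_index` and `blend` are made up for illustration, and the 16 base EGA colors stand in for the game's real 136-entry palette to keep the example short.

```python
# Stand-in palette: the 16 default EGA colors (the real palette has 136 entries).
EGA = [
    (0, 0, 0), (0, 0, 170), (0, 170, 0), (0, 170, 170),
    (170, 0, 0), (170, 0, 170), (170, 85, 0), (170, 170, 170),
    (85, 85, 85), (85, 85, 255), (85, 255, 85), (85, 255, 255),
    (255, 85, 85), (255, 85, 255), (255, 255, 85), (255, 255, 255),
]

def dist_sq(a, b):
    """Squared Euclidean distance; the square root is not needed for comparisons."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_index(color, palette):
    """Brute force: test every palette entry and keep the closest."""
    return min(range(len(palette)), key=lambda i: dist_sq(color, palette[i]))

def blend(a, b, t=0.5):
    """Plain per-component blend of two RGB colors (no gamma handling)."""
    return tuple(round(x + (y - x) * t) for x, y in zip(a, b))

# Blend two palette colors, then map the result back to a palette index.
mixed = blend(EGA[4], EGA[15])
print(mixed, nearest_index(mixed, EGA))
```

Every lookup costs one distance calculation per palette entry, which is why this is fine for a handful of colors but not for every pixel.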
This brute force approach is fine for a simple remap of each palette index to another. For instance, Cascade Quest does this to support automatic darkening (for sprites in shadow) or converting sprites and backgrounds to a certain lighting setup. These are fixed “remaps” that only need to be calculated once at startup.
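Such a remap might be built roughly like this. The `darken` transform and its 0.5 factor are hypothetical (the post does not say how darkening is computed), and the 16 base EGA colors again stand in for the full palette.

```python
EGA = [
    (0, 0, 0), (0, 0, 170), (0, 170, 0), (0, 170, 170),
    (170, 0, 0), (170, 0, 170), (170, 85, 0), (170, 170, 170),
    (85, 85, 85), (85, 85, 255), (85, 255, 85), (85, 255, 255),
    (255, 85, 85), (255, 85, 255), (255, 255, 85), (255, 255, 255),
]

def nearest_index(color, palette):
    """Brute force nearest palette entry by squared Euclidean distance."""
    def dist_sq(p):
        return sum((x - y) ** 2 for x, y in zip(color, p))
    return min(range(len(palette)), key=lambda i: dist_sq(palette[i]))

def darken(color, factor=0.5):
    """Hypothetical darkening transform: scale each component."""
    return tuple(round(c * factor) for c in color)

def build_remap(palette, transform):
    """One slow lookup per palette entry, done once at startup."""
    return [nearest_index(transform(c), palette) for c in palette]

DARKEN_REMAP = build_remap(EGA, darken)

def darken_pixels(indices):
    """At draw time, darkening is a single table read per pixel."""
    return [DARKEN_REMAP[i] for i in indices]

print(darken_pixels([0, 7, 15]))
```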
However, it is much too slow to do on a per-pixel basis. We need per-pixel blending to support high quality scaling and pseudo alpha-blending. Below is an example of a mushroom. The original image is on the left. In the middle is what happens when you scale it to 73% of the original size using nearest neighbor sampling. This is what Sierra’s SCI1+ engines used for scaling, since it’s quick and you are only ever dealing with existing colors in your palette (SCI0 did not support scaling). On the right is a version that is scaled using bilinear filtering, with the resulting colors remapped to the palette.
You could argue that the middle mushroom is more true to the retro aesthetic (but the right-hand one is clearly a more accurate representation of the mushroom). However, look what happens when you have an image with high frequency details:
All the bars have gone missing in the center image, since the nearest neighbor sampling at that zoom level ended up sampling only the white pixels. On the right is the result using bilinear filtering and remapping to the palette. When the scaling level changes smoothly in game, the problems with nearest neighbor sampling become even more distracting.
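The effect is easy to reproduce in one dimension. The sketch below scales a row of alternating white and black pixels to 50% (a simpler ratio than the 73% used in the images), and the exact sample positions are an assumption of this example rather than what the game or SCI does.

```python
def scale_nearest(src, dst_len):
    """Nearest neighbor: each output pixel copies one source pixel."""
    return [src[i * len(src) // dst_len] for i in range(dst_len)]

def scale_linear(src, dst_len):
    """Linear filtering: each output pixel blends its two closest source pixels."""
    ratio = len(src) / dst_len
    out = []
    for i in range(dst_len):
        # Center of the output pixel, expressed in source pixel coordinates.
        pos = (i + 0.5) * ratio - 0.5
        pos = min(max(pos, 0.0), len(src) - 1.0)
        lo = int(pos)
        hi = min(lo + 1, len(src) - 1)
        t = pos - lo
        out.append(src[lo] * (1 - t) + src[hi] * t)
    return out

row = [255, 0] * 8  # 16 pixels: alternating white and black bars

print(scale_nearest(row, 8))  # only the white pixels get sampled
print(scale_linear(row, 8))   # an even mix of white and black
```

With nearest neighbor the black bars vanish entirely. The filtered version keeps their contribution as a mid grey, which then has to be mapped back to a palette entry.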
How can we more quickly map an arbitrary RGB value to a palette index? We could generate a lookup table. However, to do this accurately, we’d need an entry for every possible color: more than 16 million of them! For optimum speed, whatever lookup table we use needs to be small enough to fit well within the processor’s L1 data cache so that memory access doesn’t become a bottleneck. A 16 megabyte array doesn’t fit the bill.
We could of course just make the grid coarser. Instead of 256 values for each RGB component, we quantize to 16 or 10 or whatever. Once we do this, however, we start to get significant amounts of incorrect results.
We can perhaps produce more intelligent quantization. If we look at all our palette colors, the combinations are such that there end up only being 10 unique values for each RGB component (in fact, the same ones for each component). This is a property of the source colors we are using, and wouldn’t be true for an arbitrary palette of course.
In the above image, I’ve drawn in grey lines halfway between the 10 possible component values. We can define buckets between each of the grey lines (a total of 10 buckets). Those buckets have the property that any component value that falls within them will have the contained value (black line) as the closest color component. So we end up with two lookup tables. One (of size 256) maps a single RGB component to a bucket. And the other (of size bucketCount ^ 3, or 1000 in this case) maps the three buckets (R, G, B) to an actual palette index. Of course, we also have to do a one time pass to calculate the closest palette color for each bucket.
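The two tables could be built along these lines. To stay short, this sketch uses the 4 component values of the base EGA colors (0, 85, 170, 255) and the 16 base colors as the palette, rather than the 10 values and 136 colors described above; the function names are made up.

```python
from itertools import product

EGA = [
    (0, 0, 0), (0, 0, 170), (0, 170, 0), (0, 170, 170),
    (170, 0, 0), (170, 0, 170), (170, 85, 0), (170, 170, 170),
    (85, 85, 85), (85, 85, 255), (85, 255, 85), (85, 255, 255),
    (255, 85, 85), (255, 85, 255), (255, 255, 85), (255, 255, 255),
]
LEVELS = [0, 85, 170, 255]  # stand-in for the 10 component values

def nearest_index(color, palette):
    def dist_sq(p):
        return sum((x - y) ** 2 for x, y in zip(color, p))
    return min(range(len(palette)), key=lambda i: dist_sq(palette[i]))

def build_tables(levels, palette):
    n = len(levels)
    # Table 1 (256 entries): component value -> bucket of the closest level.
    # This puts the bucket boundaries halfway between neighboring levels.
    comp_to_bucket = [min(range(n), key=lambda k: abs(v - levels[k]))
                      for v in range(256)]
    # Table 2 (n**3 entries): (R, G, B) buckets -> palette index, flattened
    # so that entry (r, g, b) lives at (r * n + g) * n + b.
    bucket_to_index = [nearest_index((levels[r], levels[g], levels[b]), palette)
                       for r, g, b in product(range(n), repeat=3)]
    return comp_to_bucket, bucket_to_index

def quick_lookup(color, comp_to_bucket, bucket_to_index, n):
    """Three reads from the small table, then one read from the main table."""
    r, g, b = (comp_to_bucket[c] for c in color)
    return bucket_to_index[(r * n + g) * n + b]

comp_to_bucket, bucket_to_index = build_tables(LEVELS, EGA)
print(len(comp_to_bucket), len(bucket_to_index))
print(quick_lookup((250, 250, 90), comp_to_bucket, bucket_to_index, len(LEVELS)))
```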
So, this would actually work perfectly if our palette contained every combination of those 10 discrete values (10 * 10 * 10 = 1000 palette entries) – but it of course doesn’t. As a result, it falls apart quite spectacularly. We can precalculate the nearest palette index for a particular (R, G, B) bucket, but the actual nearest color to any arbitrary (R, G, B) value in that bucket might be different (again, this wouldn’t be an issue if our palette had every combination of our 10 discrete component values).
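A deliberately tiny, made-up example shows the failure. Take a palette of just black and white, so each component has two possible values and two buckets. The bucket whose representative is pure green has no matching palette entry, and its precalculated answer is wrong for some of the colors that land in it.

```python
PALETTE = [(0, 0, 0), (255, 255, 255)]  # toy palette: black and white
LEVELS = [0, 255]                        # the only component values it uses

def dist_sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_index(color, palette):
    return min(range(len(palette)), key=lambda i: dist_sq(color, palette[i]))

def bucket(v):
    """Boundary halfway between 0 and 255."""
    return 0 if v < 128 else 1

color = (120, 250, 120)  # a pale green, fairly close to white

# Bucket method: snap each component to its level, then look up that color.
representative = tuple(LEVELS[bucket(c)] for c in color)
via_bucket = nearest_index(representative, PALETTE)

# Exact method: compare the original color against the palette directly.
exact = nearest_index(color, PALETTE)

print(representative)      # (0, 255, 0)
print(via_bucket, exact)   # 0 1 -> the bucket says black, the truth is white
```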
I thought it would be useful to visualize my palette’s color distribution, so I put together a quick Unity scene that shows the RGB color space and where each palette value is within it.
Looking at a 2D cross section also helps. Here’s looking along the blue axis (thus we see the red and green distribution):
Getting it done
Once I realized that a perfect solution to this problem is probably not possible, I set out to do the best I could. To get more concrete results, I came up with a random test corpus of 1000 colors. They aren’t completely random though. Instead, they are random blends of two randomly chosen palette colors. This results in a color corpus that is closer to what might actually be used in the game for blending operations.
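A corpus like this might be generated as follows. The details are assumptions: the post does not say how the blend weight is chosen or whether the two colors must differ, so this sketch uses a uniform random weight and allows repeats, and it draws from the 16 base EGA colors rather than the full palette.

```python
import random

EGA = [
    (0, 0, 0), (0, 0, 170), (0, 170, 0), (0, 170, 170),
    (170, 0, 0), (170, 0, 170), (170, 85, 0), (170, 170, 170),
    (85, 85, 85), (85, 85, 255), (85, 255, 85), (85, 255, 255),
    (255, 85, 85), (255, 85, 255), (255, 255, 85), (255, 255, 255),
]

def make_corpus(palette, count=1000, seed=0):
    """Random blends of two randomly chosen palette colors."""
    rng = random.Random(seed)  # fixed seed so runs are comparable
    corpus = []
    for _ in range(count):
        a, b = rng.choice(palette), rng.choice(palette)
        t = rng.random()  # assumed: uniform blend weight in [0, 1)
        corpus.append(tuple(round(x + (y - x) * t) for x, y in zip(a, b)))
    return corpus

corpus = make_corpus(EGA)
print(len(corpus), corpus[:3])
```

Fixing the seed matters here: comparing bucket layouts only makes sense if every layout is scored against the same colors.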
For this corpus, the bucket method described above resulted in 349 of the 1000 colors being incorrectly matched. Pretty bad – in fact, barely better than using an even distribution. I tried doubling the resolution of the buckets. That improved the wrong matches to 215 out of 1000. But at the expense of a bucket lookup table of 8000 entries (20 * 20 * 20).
I then tried some even distributions, everything from 10 evenly distributed buckets to 20. While using more buckets generally reduced the number of mismatches, the relationship wasn’t monotonic. Fifteen buckets proved to be good bang for the buck (et): 234 mismatches, which ended up being fewer than with 16 or 17 buckets.
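Counting mismatches for evenly sized buckets could look something like this. It is a sketch under several assumptions: the 16 base EGA colors replace the real palette, each bucket is represented by its center, and the corpus is regenerated here with a uniform random blend weight, so the counts it prints will not match the figures quoted above.

```python
import random
from itertools import product

EGA = [
    (0, 0, 0), (0, 0, 170), (0, 170, 0), (0, 170, 170),
    (170, 0, 0), (170, 0, 170), (170, 85, 0), (170, 170, 170),
    (85, 85, 85), (85, 85, 255), (85, 255, 85), (85, 255, 255),
    (255, 85, 85), (255, 85, 255), (255, 255, 85), (255, 255, 255),
]

def dist_sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_index(color, palette):
    return min(range(len(palette)), key=lambda i: dist_sq(color, palette[i]))

def build_even_table(n, palette):
    """Nearest palette index for the center of each of the n**3 buckets."""
    width = 256 / n
    centers = [(k + 0.5) * width for k in range(n)]
    return [nearest_index((centers[r], centers[g], centers[b]), palette)
            for r, g, b in product(range(n), repeat=3)]

def quick_lookup(color, table, n):
    r, g, b = (c * n // 256 for c in color)  # even buckets need no first table
    return table[(r * n + g) * n + b]

def count_mismatches(n, corpus, palette):
    table = build_even_table(n, palette)
    wrong = 0
    for color in corpus:
        quick = palette[quick_lookup(color, table, n)]
        exact = palette[nearest_index(color, palette)]
        # Compare distances, so an equally close entry is not counted as wrong.
        if dist_sq(color, quick) > dist_sq(color, exact):
            wrong += 1
    return wrong

rng = random.Random(0)
corpus = []
for _ in range(1000):
    a, b, t = rng.choice(EGA), rng.choice(EGA), rng.random()
    corpus.append(tuple(round(x + (y - x) * t) for x, y in zip(a, b)))

for n in (10, 15, 20):
    print(n, "buckets:", count_mismatches(n, corpus, EGA), "mismatches")
```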
Still sure that a specific bucket distribution could produce better results, I threw some computational power at the problem. I tried lots of slight variations for the boundaries on each bucket. This turned up a bucket distribution that results in only 149 mismatches out of the 1000 color corpus. Not too shabby.
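The post does not describe the search itself, so the following is only one plausible way to do it: a simple hill climb that nudges one boundary at a time and keeps the change if the mismatch count does not get worse. To run quickly it uses 6 buckets, the 16 base EGA colors, and bucket centers as representatives, all of which are assumptions of this sketch.

```python
import random
from bisect import bisect_right
from itertools import product

EGA = [
    (0, 0, 0), (0, 0, 170), (0, 170, 0), (0, 170, 170),
    (170, 0, 0), (170, 0, 170), (170, 85, 0), (170, 170, 170),
    (85, 85, 85), (85, 85, 255), (85, 255, 85), (85, 255, 255),
    (255, 85, 85), (255, 85, 255), (255, 255, 85), (255, 255, 255),
]

def dist_sq(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_index(color, palette):
    return min(range(len(palette)), key=lambda i: dist_sq(color, palette[i]))

def build(cuts, palette):
    """cuts: increasing boundaries; bucket k covers edges[k] <= v < edges[k+1]."""
    edges = [0] + list(cuts) + [256]
    n = len(edges) - 1
    centers = [(edges[k] + edges[k + 1] - 1) / 2 for k in range(n)]
    comp_to_bucket = [bisect_right(cuts, v) for v in range(256)]
    table = [nearest_index((centers[r], centers[g], centers[b]), palette)
             for r, g, b in product(range(n), repeat=3)]
    return comp_to_bucket, table, n

def mismatches(cuts, corpus, exact_dist, palette):
    comp_to_bucket, table, n = build(cuts, palette)
    wrong = 0
    for color, best in zip(corpus, exact_dist):
        r, g, b = (comp_to_bucket[c] for c in color)
        if dist_sq(color, palette[table[(r * n + g) * n + b]]) > best:
            wrong += 1
    return wrong

def search(cuts, corpus, palette, steps=100, seed=0):
    rng = random.Random(seed)
    # Distance to the true nearest color, computed once per corpus entry.
    exact_dist = [dist_sq(c, palette[nearest_index(c, palette)]) for c in corpus]
    best = list(cuts)
    best_score = mismatches(best, corpus, exact_dist, palette)
    for _ in range(steps):
        trial = list(best)
        trial[rng.randrange(len(trial))] += rng.choice((-2, -1, 1, 2))
        # Skip layouts whose boundaries collide, cross, or leave 1..255.
        if trial != sorted(set(trial)) or trial[0] < 1 or trial[-1] > 255:
            continue
        score = mismatches(trial, corpus, exact_dist, palette)
        if score <= best_score:
            best, best_score = trial, score
    return best, best_score

rng = random.Random(1)
corpus = []
for _ in range(1000):
    a, b, t = rng.choice(EGA), rng.choice(EGA), rng.random()
    corpus.append(tuple(round(x + (y - x) * t) for x, y in zip(a, b)))

start = [256 * k // 6 for k in range(1, 6)]  # evenly spaced starting point
best_cuts, best_score = search(start, corpus, EGA)
print(start, "->", best_cuts, "mismatches:", best_score)
```

A hill climb like this can get stuck in a local minimum, so restarting from several starting layouts or seeds would be a natural extension.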
So the approach I’m using (for now) maps each RGB component to a bucket index. The three bucket indices are then used to index into the main lookup array (15 * 15 * 15 = 3375 entries).
Where is this used
I use this quick lookup for alpha blending and bilinear filtering during scaling. However, I use the slower accurate version for the global color remapping, since this is only done once at startup and only needs to evaluate 136 unique palette colors.
I timed the drawing performance of the above outhouse alpha blending cycle. The frame draw times for the outhouse were:
- Alpha blended, using the slow (correct) palette lookup: 13.3ms
- Alpha blended using the quick lookup: 0.3ms (44 times faster)
- No blending: 0.03ms.
I also tried seeing if the size of the lookup table made a difference. That is, 10 buckets instead of 15. It did not (though I’m sure increasing the number of buckets would start to slow things down).
A note on gamma-correctness
I’ve been ignoring this throughout this post, but it’s an important bit to touch upon. The base EGA colors are in gamma-corrected space. That is, those values are exactly what is displayed on the monitor. To properly combine them (to get our 136 color palette) so that they look just like the dithered colors, we need to convert them to linear, take the average of the two, and then convert back to gamma space. If we don’t do this, things will look significantly different.
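As a rough illustration of the difference, the sketch below blends two colors both ways. It assumes the standard sRGB transfer function for the gamma conversion; the post does not say which curve the game uses, and a plain 2.2 power curve would give slightly different numbers.

```python
def srgb_to_linear(v):
    """8-bit gamma-encoded value -> linear light in 0..1 (sRGB curve assumed)."""
    c = v / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def linear_to_srgb(x):
    """Linear light in 0..1 -> 8-bit gamma-encoded value (sRGB curve assumed)."""
    c = x * 12.92 if x <= 0.0031308 else 1.055 * x ** (1 / 2.4) - 0.055
    return round(c * 255.0)

def blend_component_linear(x, y):
    """Average two component values in linear space."""
    return linear_to_srgb((srgb_to_linear(x) + srgb_to_linear(y)) / 2)

def blend_linear(a, b):
    return tuple(blend_component_linear(x, y) for x, y in zip(a, b))

def blend_gamma(a, b):
    """Average the stored values directly, with no conversion."""
    return tuple(round((x + y) / 2) for x, y in zip(a, b))

black, white = (0, 0, 0), (255, 255, 255)
print(blend_gamma(black, white))    # (128, 128, 128)
print(blend_linear(black, white))   # noticeably lighter than the line above

# Distinct component values produced by blending the four EGA levels in pairs.
BASE_LEVELS = (0, 85, 170, 255)
levels = sorted({blend_component_linear(x, y)
                 for x in BASE_LEVELS for y in BASE_LEVELS})
print(len(levels), levels)
```

Under this assumption the pairwise blends of the four EGA levels give ten distinct component values, which is consistent with the count mentioned earlier. Averaging directly in gamma space would give only seven, since for example 0 blended with 255 and 85 blended with 170 both land on 127.5.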
Now, this should also be done when blending. I don’t do this (I just directly blend colors in gamma space) for a few reasons though:
- It would be slower, since we need to do the conversion to linear and back for every pixel
- The resulting colors will already be wrong anyway, since we’re mapping to our fixed palette. So the “perfection” won’t be noticeable.
- Older graphics hardware has gotten away with not doing this for texture filtering (although in general modern graphics hardware does this properly).