About:

The brief here is an in-theory (possibly emulated at some point) thought-exercise to create a useful screen format for an early-32-bit computer system within the technological constraints of the time.

And... Wheeee! You are getting the tail-end of several hundred hours of sleepless nights circling through all manner of different ways of doing it, including more than one quite-innovative, but ultimately flawed, C64-VIC2-like hard-wired 8-bit system. ('Flawed' in that they likely would have worked fine, but in the end the much simpler, yet more flexible, system below is inherently better.)

Sample Interface

It should be sufficiently simple to implement within the per-chip transistor budgets of the early 1980s: I am targeting a start of 1983, since that was around the time I first had access to entry-level computers.

For reference:


Definitions:

LSBs: "Least Significant Bits" - the bits towards the low-value end of a byte.

MSBs: "Most Significant Bits" - the bits towards the high-value end of a byte.

ARGB: Alpha (transparency), Red, Green, Blue. In the proposed system, this is an 8-bit colour space with two bits for each value. This provides 64 colours and four levels of transparency, which I feel is optimal for 8-bit-era-like block-graphics games.

In ARGB mode, full alpha is a bit wasteful, as all 64 colours are fully transparent and so render the same. I have played around with the idea of 5 levels per colour channel, providing 125 colours in 7 bits, with 1 bit for no-alpha/half-alpha, plus one of the spare colour codes for full-alpha, but eventually decided this was too complex for too little gain. It also makes alpha-blending far more complex than you really want on a speed-constrained system.
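
For concreteness, here is a minimal sketch in C of how such a 2-2-2-2 ARGB byte might be packed and unpacked. The field order (alpha in the top two bits) and the helper names are just my assumptions for illustration, not a fixed spec:

    #include <stdint.h>

    /* Hypothetical 2-2-2-2 ARGB packing: A in the top two bits, then R, G, B.
       Each channel value is in the range 0..3. */
    static inline uint8_t argb_pack(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
    {
        return (uint8_t)(((a & 3u) << 6) | ((r & 3u) << 4) |
                         ((g & 3u) << 2) |  (b & 3u));
    }

    static inline uint8_t argb_alpha(uint8_t c) { return (c >> 6) & 3u; }
    static inline uint8_t argb_red(uint8_t c)   { return (c >> 4) & 3u; }
    static inline uint8_t argb_green(uint8_t c) { return (c >> 2) & 3u; }
    static inline uint8_t argb_blue(uint8_t c)  { return  c       & 3u; }

With only four levels per channel, everything (including alpha-blending) stays as cheap shifts, masks and adds.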

8-bit era refers to the spirit of the 8-bit era, and the general user-facing simplicity of these systems, not the actual size of the CPU's data bus or ALU. I am specifying a simple 32-bit CPU as it was both entirely possible in the era I am targeting, and also because multi-byte arithmetic is tedious, especially in assembly code!

CPU

I am thinking about a relatively simple RISC system analogous to an ARM2 or RV32IM, both of which easily fit within about half of our 64k-transistor budget. The other 32kT could be used for timers, serial UARTs and other common system components, plus a simple MMU and two Burst-Bus controllers.

On-package with the CPU could be two 64k-bit SRAM chips, for caching 16kBytes of Burst-Bus data, or 4x 4k blocks.

This is really tight, but using DRAM to quadruple it would also restrict me to a 1.74MHz memory system, so it wouldn't really be worth running the CPU any faster either. Because this is a cache of slower off-chip memory, it isn't as bad as it immediately seems, though programmers would have to be very careful to avoid cache-thrashing!

STORAGE

The primary Burst-Bus supports up to 8 EEPROM-like cards (also rather FLASH-like, with 4k erasure blocks), which I envision being somewhat like CF-cards in form-factor, but with far fewer pins (probably 14-or-so standard pin-header pins in a single line). Such a card can take (up to) 8 chips internally for (initially, in 1979) 256k-Bytes of storage per card, and 2M-Bytes for a fully-populated system. This would be hellishly expensive, at least compared to consumer storage such as audio tapes and 180k floppy disks, but (surprisingly) isn't actually too bad compared to magnetic hard-disks of the time. Since this is all pure fantasy anyway, let's assume I am fantasy-wealthy, too! :-P

The secondary Burst-Bus supports only one card, but is hot-plug-safe. It is for using cards as removable media.

As well as FLASH-like memory cards, ROM cards are also entirely possible for distributing commercial software/data, and could be installed internally or on the hot-plug bus, depending on application need.

Video

Here I am a bit memory-constrained, and am going to start with a 64-colour, double-scanned PAL video output.

Why PAL?

Because it is friendly: it is your pal, after all! :-P

We are targeting a PAL TV system and so have a theoretical 768x576 displayable square pixels, though overscan means we are limited to more like 704x512, with a 32-pixel/32-line border (and even that assumes the TV picture is properly aligned in the tube, which was not the case as often as it was, though manually re-aligning to fit won't put the TV outside of broadcast-content overscan expectations).

We are discarding NTSC (AKA: Never The Same Colour :-P ) TVs because 1: the spec. is rubbish, not worth the time of caring about, and 2: they don't have enough scan lines to efficiently hold an interface with 256/512 lines of image, the realistic maximum being 200/400 lines once overscan is accounted for. 2ⁿ values are important on constrained systems because working with them can use simple single-instruction/cycle shifts in place of complex multi-instruction/cycle multiply operations, which can represent a 5-10x speed-up in the execution of display-element calculations on a CPU without a fast hardware multiplier (or, in the 6502's case, any multiplier at all!)

Note that I am not suggesting programmers should be 'cleverly' substituting shifts for multiplies in their code higgledy-piggledy! That is a lovely way to produce unmaintainable code! These days, just multiply, and let the compiler do the optimising! Especially since programmers (myself included, undoubtedly) are notoriously bad at recognising the limits of their own capacity for 'cleverness'! But where it can be utilised consistently and rigorously, and after some actual thought... ideally in thoroughly audited system libraries following an actual design guideline!

Also, I'm talking early-computational-era here, where every clock cycle was actually measurable with a pocket stop-watch (hyperbole). These days 96.3% of global computational capacity is tied up in advertising, and in tracking people for advertising (just as the users, apparently, prefer it), so I doubt it really matters anymore outside of scratching my personal OCD-itch!

Due to the way video memory is usually mapped, this is more an issue for horizontal resolution, which is why I am using 2ⁿ data widths for my video buffer elements. On an NTSC system, you could implement a 512x384 display and stay within the overscan areas, but then you don't have a visually-pleasing square content pane. ... But maybe North Americans don't care about visually pleasing video anyway: they spent many decades watching NTSC-format video, after all!
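
To make the 2ⁿ point concrete, a little sketch (in C, standing in for what would really be one or two machine instructions, and assuming the byte-per-pixel buffer layout described further down) of the pixel-address calculation for a 256-pixel-wide buffer versus an arbitrary-width one:

    #include <stddef.h>
    #include <stdint.h>

    /* 256-pixel-wide buffer: the offset is a single shift and an add (or even
       just y in the high byte and x in the low byte of a 16-bit offset). */
    static inline size_t offset_pow2(unsigned x, unsigned y)
    {
        return ((size_t)y << 8) + x;      /* y * 256 + x, no multiply needed */
    }

    /* Arbitrary width (say 320): a genuine multiply, which on a CPU without a
       hardware multiplier becomes a multi-instruction shift-and-add loop. */
    static inline size_t offset_general(unsigned x, unsigned y, unsigned width)
    {
        return (size_t)y * width + x;
    }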

The non-square-pane issue can sometimes be worked around, somewhat, anyway. Scrollable content like text won't really care, and I have even seen 8-bit versions of Pac-Man which, in order to use the original maps rather than 4:3 adaptations, auto-scroll the play-field with the character. So if fixed-pane-content programmers are prepared to do a little extra work, the loss of 25% of their pixels off the bottom of the screen can be dealt with.

But, in the end, I feel there are just too many compromises for a market that, while large enough, was running on a decidedly second-rate standard (one of many, many such!). They chose to use a different power frequency, and hence TV system, to deliberately isolate themselves from international competition, so I probably should respect their wishes here anyway!

Also, as a final - and more serious - note, NTSC (and PAL) TVs don't exist much anymore, and I could emulate my interface on any modern digital display in any country. Or on a web page, really!

So with PAL output, I am going to provide memory for a 256x256 frame-buffer. That is 64k pixels of data to be stored, which is just within budget if I store one bit of each pixel per 64k-bit Video-RAM chip. The caveat is that I will have big black borders down either side of the screen (assuming I want to keep my pixels square, which I do!)

Some ad-hoc testing indicates to me that a square field is visually optimal for text and simple graphical display, and is both readable and pleasing to the eye. Most early computers (pre home-computer era) used either square or vertical (portrait-orientation) displays. Even in the 8-bit era, the majority of arcade cabinets had vertical screens, often using a square section for the game play and the extra vertical space for displaying scores and lives. The horizontal screen was largely forced by early home computers' heavy reliance on domestic television sets, which were optimised for video, not text/graphics, and generally not able to be physically reoriented.

Speaking of video, it should be noted that for 'cinematic' content, such as video and rendered 3D worlds, a square display is entirely unsuitable. A rectangular horizontal display of around 2:1 aspect is ideal. The 4:3 of an analog TV was a compromise driven by the difficulty of building non-square picture tubes.

Early vacuum picture tubes were actually round (a structurally-ideal shape for a vacuum vessel) and - at least early on - there was a limit to how rectangular you could make a tube before it was either too thick-walled, and hence too heavy, to practically make and use, or in danger of physically imploding under atmospheric pressure.

So this system is specifically not intended for 'cinematic' content. It is envisioned that in the age of 3D, this system might be rendered inside a 3D world, rather than the world being rendered on it! It is explicitly a 2D interface primarily for text, and simple flat graphics, mostly 8-bit-era-like game graphics which, as mentioned, mostly work better with a square play-field themselves.

VRAM

VRAM technology was all about cleverly-re-using systems already present on existing DRAM chips, so I don't have any extra limitations beyond my previously-mentioned transistor constraints.

So with 1 bit of each pixel per VRAM chip, I need a minimum of 6 chips to support 64 colours. And if I pack 2 chips per package, then that is 3 chip-packages, 1 per colour channel. Possibly with a 2-bit video-DAC also in-package.

VRAM-layout.

I am going with - at this stage - a byte-mapped video buffer, with each pixel aligned to a whole-byte boundary, to save on the need to mess around with bit-addressing. The format would be ARGB ('A' being Alpha, or pixel-transparency), with two bits for each channel.

The VRAM is therefore 6-bits, using 6 chips to provide 64 colours, and the Alpha-channel is left unimplemented in VRAM.

While useful for graphical data in CPU memory, the two alpha bits are of no use to normal users at the display, so are not provided in VRAM by default. However, pads would be provided on the circuit board for them to be added after-market, for professional applications which can use that channel for video overlay.

Basics Down

So I have a 13.95MHz 32-bit system with 16k of on-die SRAM, caching up to 2MB of storage. Video is 256x256 square 64-colour pixels, output double-scanned to a PAL-TV. Audio would be via my C64-SID-inspired Synthesiser+Sequencer Chip.

Rather high-end for 1979-ish (Apple-Lisa levels of high-end!) but within the constraints set.

256 pixels wide is rather limiting, but not at all unreasonable for the day. It does restrict you to the following:

Hardware Pointer

One more thing....

Since this is an inherently-graphical system, a pointer device for GUI use is definitely a thing that is going to be wanted. It is a sufficiently common and relatively-simple addition that it is well worth implementing in hardware.

So an 8x8 pixel pointer, using 64 bytes of its own on-video-chip VRAM, in 8-bit ARGB colour, is merged into the video output via hardware. This saves the CPU a significant amount of draw/redraw effort for relatively little extra complexity.
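
Purely as an illustration - the register block and the byte-at-a-time copy into the pointer's private VRAM are my assumptions, not a defined interface - loading and positioning the 8x8 ARGB pointer might look something like this:

    #include <stdint.h>

    #define POINTER_W 8
    #define POINTER_H 8

    /* 8x8 pointer image, one 2-2-2-2 ARGB byte per pixel (64 bytes total).
       0x00 = fully transparent black, 0xFF = fully opaque white, etc. */
    typedef struct {
        uint8_t pixels[POINTER_W * POINTER_H];
    } pointer_image_t;

    /* Hypothetical video-chip interface: a 64-byte window onto the pointer's
       on-chip VRAM plus x/y position registers. */
    typedef struct {
        volatile uint8_t image[POINTER_W * POINTER_H];
        volatile uint8_t x;
        volatile uint8_t y;
    } pointer_regs_t;

    static void pointer_load(pointer_regs_t *regs, const pointer_image_t *img,
                             uint8_t x, uint8_t y)
    {
        for (unsigned i = 0; i < sizeof img->pixels; i++)
            regs->image[i] = img->pixels[i];   /* volatile-safe byte copy */
        regs->x = x;
        regs->y = y;
    }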



Blitter

The above describes a system that is suitable for the very-early end of the 32-bit era. It has a fairly low transistor count for quite a high level of functionality, particularly the 6-bit colour.

But while the CPU certainly can do all the grunt-work of graphics manipulation, not even too slowly on a relatively-small 256x256 byte-mapped graphics field, a blitter is a definite value-add.

The Blitter was first popularised by the rather-clever engineers that created the video system that eventually became the Commodore Amiga. It is, at its most fundamental, a DMAC (Direct Memory Access Controller) that works in 2 dimensions.

A plain old 1-D DMAC, as has existed since the days of mainframes and minis, is given a source address, a destination address, and a length by the CPU. It then copies data from source to destination, making sure not to trip over itself if the two ranges overlap, while the CPU goes off and does more specialised things with its cycles. The DMAC is also inherently faster than the CPU for this specific task: while the CPU has to be able to do many different things adequately, the DMAC is a specialist with just one job. Even the simplest DMAC will be at least 2x faster than a CPU for memory copies, and there are various tricks that can squeeze a good bit more out.

Yes, I am aware that some CPUs have explicit bulk-memory-copy instructions. This is just integrating a DMAC into the CPU, so what's your point?

A simple blitter takes this concept and applies it to two-dimensional structures in memory. It takes a few extra commands from the CPU to set up its parameters for these extra dimensions, but once going, it again flies.
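
In software terms, what the blitter does in hardware amounts to the following reference loop (a conceptual sketch only, with the overlap-handling of a real DMAC omitted):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Reference model of a simple 2-D copy: a 'width' x 'height' rectangle of
       bytes is moved between two buffers, each with its own row stride.  The
       hardware does exactly this, just without spending CPU cycles on it. */
    static void blit_copy(uint8_t *dst, size_t dst_stride,
                          const uint8_t *src, size_t src_stride,
                          size_t width, size_t height)
    {
        for (size_t row = 0; row < height; row++)
            memcpy(dst + row * dst_stride, src + row * src_stride, width);
    }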

A more complete blitter will also have other features integrated, such as:

The reason the Amiga graphics system was able to perform so well, despite using 'planar' graphics, was entirely down to its blitter. Planar graphics have some advantages over other approaches, but speed is definitely not one of them! At least not without good hardware assist, and the Amiga's blitter did all the tedious chopping up of the graphical data and distributing it appropriately to the different graphics planes, so the CPU wouldn't have to trudge through the task itself.

The hardware sprites of the Commodore 64's Video Interface Chip II could also do something like this, with a fixed-function hardware setup that could do it blindingly fast, but only in the very-specific and by-necessity-limited ways it was designed for. The Amiga Blitter was far more flexible, sacrificing a little (though not that much) speed to achieve this flexibility.

Another potential advantage of a Blitter, even the simplest 2D-DMAC kind, is that you can use it to separate the video buffer out of the CPU memory space. This has its own advantages:

Separate VRAM! What's not to like?

The CPU loses direct access to VRAM, but we have already established that CPUs suck at graphics, and a blitter (under CPU control anyway) is far better.

Blit me!

So, my initial blitter would have the following features.

And that's a basic blitter!

Using the setup described so far, you can draw sprite-like entities to the screen. Sprites are small mobile graphical blocks, very commonly used in games. Sprites of any size, even, rather than a hardware-pre-set size, though I would likely limit them to 2ˣ by 2ʸ dimensions to simplify the transistor logic - if you want intermediate sizes, you can just pad the next size up with transparency.

Adding the ability to flip the copies on the horizontal and vertical axis during copy is also a relatively-easy add. Rotating at 90/180/270 degrees is a bit harder, but not actually difficult, and might be worth the extra transistors too.

Likewise, pixel-doubling, and pixel skipping (both 1/2 and 1/4), to crudely scale graphic elements up and down. Explicitly not supporting any kind of 'smooth' scaling - a thoughtfully-designed graphical element can scale smaller (at a 2ⁿ stepping) cleanly without special help, and scaling up is just going to have to look chunky!

You also have the ability to draw fields of tiles, those being similarly-sized graphic blocks used for laying out maps and the like, again often in games. This comes at the cost of the CPU still initiating each tile-draw, though that is still orders of magnitude faster than doing every tile byte-by-byte.
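
A sketch of that CPU-driven tile loop (the tile and map sizes are assumptions for illustration, and the inner copy is done in C here where the real system would hand it to the blitter):

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    #define FB_W      256           /* 256-pixel-wide byte-mapped buffer   */
    #define TILE_SIZE 16            /* assumed 16x16-pixel tiles           */
    #define MAP_W     16            /* assumed 16x16-tile field            */
    #define MAP_H     16

    /* tileset: tile images packed back to back, TILE_SIZE*TILE_SIZE bytes each.
       map:     one tile index per cell.
       fb:      the byte-mapped framebuffer.
       The CPU's only job per tile is working out source/destination addresses
       and kicking off the copy; the blitter would do the byte shoveling. */
    static void draw_tile_field(uint8_t *fb, const uint8_t *tileset,
                                const uint8_t *map)
    {
        for (unsigned ty = 0; ty < MAP_H; ty++) {
            for (unsigned tx = 0; tx < MAP_W; tx++) {
                const uint8_t *tile = tileset
                    + (size_t)map[ty * MAP_W + tx] * TILE_SIZE * TILE_SIZE;
                uint8_t *dst = fb + (size_t)ty * TILE_SIZE * FB_W
                                  + tx * TILE_SIZE;
                for (unsigned row = 0; row < TILE_SIZE; row++)   /* one 'blit' */
                    memcpy(dst + row * FB_W, tile + row * TILE_SIZE, TILE_SIZE);
            }
        }
    }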

But wait! There's more!

Tile fields are such a common thing in games that I would definitely want to implement hardware to drive them directly: a simple sequencer that, given some register values, walks the blitter through the sequence of drawing each tile under its own control. Again, not excessively difficult, with similar features to the sprites:

Text.

Text (at least with fixed-width characters!) is really just a special-case of block-graphics. I'd support two text modes:

Out of scope for my personal use, but some additional features that would be fairly trivial to add:

Characters would also work a little differently to graphical tiles in how they handle data, in that each character consumes 2 bytes, the low byte being the character code, defining 1 of 256 glyphs, and the high byte being 'attributes' for colour and effects:

The 8 text colours are standard binary RGB. How the 'I' (intensity) bit behaves depends on the pane's colour-mode:

Dark Mode:

This results in black/dark-grey/light-grey/white and medium/bright of every colour.

Light Mode:

This results in black/dark-grey/light-grey/white and bright/medium of every colour.
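
Since I haven't pinned down the exact attribute-bit layout above, here is one possible decode. The IRGB-in-the-low-nibble layout and the special-casing of the greys are entirely my reading of the dark/light-mode behaviour just described:

    #include <stdbool.h>
    #include <stdint.h>

    /* One text cell: low byte = glyph code, high byte = attributes.
       Assumed attribute layout (not specified above): bits 0-2 = R,G,B,
       bit 3 = I (intensity), remaining bits reserved for effects. */
    typedef struct {
        uint8_t glyph;
        uint8_t attr;
    } text_cell_t;

    /* Map the IRGB colour bits to a 2-bit-per-channel display colour.
       RGB=000 gives black or dark grey; any other RGB gives the medium or
       bright version of that colour.  Light mode flips the sense of I. */
    static uint8_t text_colour_rgb222(uint8_t attr, bool light_mode)
    {
        bool i = (attr >> 3) & 1u;
        if (light_mode)
            i = !i;

        uint8_t r_on = (attr >> 0) & 1u;
        uint8_t g_on = (attr >> 1) & 1u;
        uint8_t b_on = (attr >> 2) & 1u;

        if (!(r_on | g_on | b_on))               /* RGB=000: black/dark grey */
            return i ? 0x15 : 0x00;              /* (1,1,1) or (0,0,0)       */

        uint8_t level = i ? 3u : 2u;             /* bright or medium         */
        return (uint8_t)(((r_on * level) << 4) |
                         ((g_on * level) << 2) |
                          (b_on * level));
    }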

As 'trendy' (and sometimes legitimately useful) as 'Dark Mode' can be, it is actually harder on the eyes. 'Light mode' floods the eye with light (go figure), which causes the iris to contract. This sharpens the image on the retina a little more, compared to a mostly-dark screen, making the text easier to read and reducing eye-strain slightly.

And I still come across YouTubers trying to claim that 'dark mode' is more 'energy efficient'!:

On a CRT, it was a marginal difference, as the bulk of the energy was used driving the coils, not the electron beam (so much so that the electron beam's drive energy was generally harvested from the left-over coil energy after each row scan).

On an LCD, dark is very-marginally less energy efficient! The light is there all the time, it takes (a tiny amount of) extra energy to twist the liquid crystals to block it.

On a micro-LED-backlit display, it might sometimes be true, as on these the backlight does dim, but only in areas where there are no light pixels at all (so nowhere there is light text on a dark background, which is kind-of the point!)

True-LED displays (OLED, µLED - so the latest phones and high-end TVs, but not many computer screens yet) are the first technology where the idea is significantly true! And those devices are so energy-efficient, the effect is still a bit on the marginal side.

And the moral is: don't believe anything you hear/read on the internet, which tends to be largely incestuous-little-pools of hearsay-hawkers reverberating unfounded 'sounds right' folklore off each other!

I did also play with the idea of using 2-bit colour (black/grey/white/a-colour or black/dark-colour/light-version/white both work well, conceptually), but in text mode very little framebuffer space is taken up by colour data anyway (mostly hardware registers and otherwise-unused bits in the text-attribute byte), so there was no real point in shrinking the text colour palette.

6-bit RGB (64-colours) was also considered and discarded as it is too many bits for available character-attribute space for too little extra gain (the ARGB colour mode is used in graphical modes where it is more useful).

The characters themselves are 2-bit-per-pixel greyscale, allowing them to have slightly-softened jagged edges. Colour is added by the blitter according to attribute settings.

Finally, characters are stored in a 64-character-wide (so 128-byte-wide) array - even in 42-column mode, where the visible pane can be sideways-scrolled under CPU control to display the entire buffer. The buffer itself is intended as a fixed 64-column display, mainly for use in terminals, or other applications where variable-width fonts are not appropriate. It is explicitly not for game use (since we already have far-superior colour-tiling support for that!)
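
A small sketch of addressing into that buffer. The cell layout follows the 2-bytes-per-character description above; the scroll-offset handling is just my assumption of how the CPU-side sideways scroll would work:

    #include <stdint.h>

    #define TEXT_COLS 64                 /* buffer is always 64 cells wide */

    typedef struct {                     /* 2 bytes per cell, as above     */
        uint8_t glyph;
        uint8_t attr;
    } text_cell_t;

    /* The buffer is a fixed 64-cell-wide (128-byte-wide) array.  In the
       narrower 42-column mode only part of each row is on screen at once;
       a CPU-held scroll offset chooses which part. */
    static text_cell_t *text_cell(text_cell_t *buf, unsigned row, unsigned col,
                                  unsigned scroll_offset)
    {
        return &buf[row * TEXT_COLS + scroll_offset + col];
    }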

In Sequence.

In a separate small brain-vomit concerning improvements to the C64-SID synthesiser, I also added a sub-concept for a DMA Sequencer: basically a linear DMAC (with a few extra features not applicable here) that walked a list of sound-chip register values, plugging them in timed to a 'beat' setting.

Why not do the same sort of thing here?

I am envisioning a CPU-memory-resident data structure that is a list of every blitter-drawable graphical element that is wanted on the screen, in the order they are to be drawn.

In the name of keeping things simple, the CPU manages the list itself, and the Blitter just walks it in order. Moving one sprite above another, for example, involves the CPU swapping their draw-commands in the list (not needing to touch the graphical data itself, though). I would probably try to make all data structures that could reasonably be expected to interchange the same size, padding with null data to the next 2ⁿ bytes, and include entire null data structures of each size that can be inserted, rather than having to bulk-move data about to close gaps every time a graphic element is deleted (and to leave space for any new ones that are anticipated).
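
A sketch of what such a display-list entry might look like. The individual fields are guesses; the points being illustrated are the fixed 2ⁿ-padded entry size, the explicit null entries, and the in-place swap:

    #include <stdint.h>

    /* Hypothetical display-list opcodes.  OP_NULL entries are skipped by the
       sequencer, so elements can be deleted (or space reserved) without
       shuffling the rest of the list. */
    enum { OP_NULL = 0, OP_SPRITE = 1, OP_TILE_FIELD = 2 };

    /* Every entry padded to the same 2^n size (16 bytes here), so any two
       entries can simply be swapped in place to change draw order. */
    typedef struct {
        uint32_t src;               /* address of the graphic data           */
        uint16_t x, y;              /* destination position                  */
        uint16_t w, h;              /* element size in pixels                */
        uint8_t  op;                /* OP_NULL / OP_SPRITE / OP_TILE_FIELD   */
        uint8_t  flags;             /* flip / double / skip bits, etc.       */
        uint8_t  pad[2];            /* pad the entry out to 16 bytes         */
    } draw_entry_t;

    /* Moving one sprite above another is an in-place swap of two entries;
       the graphic data itself is never touched. */
    static void draw_list_swap(draw_entry_t *list, unsigned a, unsigned b)
    {
        draw_entry_t tmp = list[a];
        list[a] = list[b];
        list[b] = tmp;
    }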

So now the CPU is managing the list of screen elements, but the blitter is doing all the grunt-work of drawing them into video buffer. This is optimal, as each element is dealing with the kinds of data it is best at.

A particularly nice effect of this is also that the programmer gets full control over the balance: want more than 8 sprites on screen at once? Add more to the list. Just make sure everything else is kept simple enough that the Blitter can get through that list by the end of the screen-redraw (or accept flickering - or just lower frame-rates once we get to double-buffering). Fewer, bigger sprites/tiles or more, smaller ones is another choice that can now be made to match content needs.

And, of course, the CPU can still ignore the sequencer and talk to the blitter directly if the programmer needs that level of control.

This would have the following implications:


2D or not 2D

Beyond blitter functionality, some other useful 'graphics acceleration' functions can be added for a relatively low transistor count.

A simple pixel-write and pixel-read function (effectively a 1x1 pixel blit) is probably free, and saves the CPU a decent chunk of 2D-coordinate calculations.

More complex graphical operations such as lines, triangles, rectangles and ovals are a good bit more complex, and might be best handed over to their own on-video-chip CPU. Such a CPU can be super-specialised, and so likely simpler in design than a general-purpose processor, with its internals optimised for its specific tasks.

Such a graphics-processing-unit (GPU) would likely have special functions not included in CPUs for many decades to come, if ever:

Floating-point graphics tends to be a bit over-rated. While the original PlayStation showed you can certainly screw fixed-point up really badly, if you actually do it competently it is both faster and more energy-efficient than floating-point by about an order of magnitude.

'Fun' fact: the reason the addition of an FPU to the x86 line was so phenomenally impactful on early 3D games wasn't because floating-point maths was some kind of magic bullet, but simply because Intel's implementation of integer multiplication and division pre-Pentium was so abysmally bad that it made the good-but-not-exceptional x87 maths actually look fast by comparison!

And people have been disregarding fixed-point maths (and even integers!) in favour of much-slower floats ever since. (Early GPUs also made floats an advantage by fudging the precision, which was fine for graphics, but bit a lot of researchers on the bum when GPGPU first started to become a thing!) Floats have very good uses. I am not convinced graphics are one of them.

It also helps to have proper fixed-point support in your CPU instruction set, as simulating it (particularly multiply, divide, and other complex operations) with integers can be quite error-prone for programmers who don't have proper training in numerics (so most of them!), though even that approach is still better than floats if appropriate care is taken. DSP programmers know what I mean!
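
As a tiny, generic illustration of why the care is needed (16.16 fixed point, nothing specific to this design): the naive multiply overflows, and the corrected one still needs a deliberate rounding choice:

    #include <stdint.h>

    typedef int32_t fx16_16;                 /* generic 16.16 fixed point     */

    /* Naive version: the 32x32-bit product overflows long before the shift. */
    static fx16_16 fx_mul_broken(fx16_16 a, fx16_16 b)
    {
        return (a * b) >> 16;                /* WRONG: overflows int32        */
    }

    /* Widen to 64 bits for the product, and round to nearest rather than
       silently truncating - exactly the kind of detail that gets missed. */
    static fx16_16 fx_mul(fx16_16 a, fx16_16 b)
    {
        int64_t p = (int64_t)a * b;
        return (fx16_16)((p + (1 << 15)) >> 16);
    }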

While I wouldn't expect to be able to implement even simple 3D graphics in the first iteration, leaving gaps in hardware registers and commands to add a third dimension would be easy enough future-proofing, costing nothing at all.

Also it is worth repeating that this system is not intended to be a '3D' system anyway. The 3D presented does not have to be 'game speed' but would be intended for embedding things like simple 3D models in text documents (such as molecules in a chemistry text). Letting it be done by the CPU (excepting, probably, the almost-free set/get pixel operation) is quite acceptable.

In fact, my primary use-case for this whole idea is to implement it inside my own VR-world as a virtual mono-tasking computer that I can actually program at a low level, as its simplicity is within my ability to understand the whole-system at once. Just for fun. So this system is supposed to go in the advanced-3D, not the advanced-3D in it!


New-year Resolution

In 1984, chip transistor densities have quadrupled to 1M-transistors. I'd be inclined to use this as an opportunity to bump my display resolution to an interlaced 512x512 pixel output.

Within the revised Blitter for this, all pixel ratios would be doubled: pixel-doubling becomes pixel-quadrupling, regular becomes pixel-doubling, 1/2 pixel-skipping becomes regular, and 1/4 becomes 1/2. The net effect is that everything that uses the blitter properly should just keep on working transparently, at higher detail for formerly pixel-skipped content.
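
A software model of that remapping (the setting names are invented for illustration), showing how content written against the old scale settings keeps its apparent size on the new output:

    /* Programmer-visible scale settings from the 256x256-era blitter, plus a
       new 4x step that only exists on the revised part. */
    enum scale { SKIP_QUARTER, SKIP_HALF, SCALE_1X, SCALE_2X, SCALE_4X };

    /* The revised blitter shifts every legacy ratio one step up, so old
       content keeps its apparent on-screen size on the 512x512 output. */
    static enum scale remap_for_512(enum scale old)
    {
        switch (old) {
        case SKIP_QUARTER: return SKIP_HALF;
        case SKIP_HALF:    return SCALE_1X;
        case SCALE_1X:     return SCALE_2X;
        case SCALE_2X:     return SCALE_4X;
        default:           return old;
        }
    }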

Text, likewise, doubles resolution to 8x16 pixel fonts for continuing 64-column mode, with sharper text (I'd just pixel-double 6x8 mode, as it is really just a sop for the Latin alphabet anyway, and anything important can just re-compile to use the proper 8x16 characters, which are now big enough to support wide Latin letters).

With the bump to - effectively - 80-column-text-resolution-equivalent detail, a PAL-TV is going to be a bit problematic, especially with Latin-text (my 4x8-optimised alphabetic set will still display fine at this resolution, just more smoothly). A composite video input (rather than RF-modulated) on the TV would definitely be wanted for adequate sharpness of high-contrast text. You would likely get away with lower contrast for games.

The hardware-pointer also needs to be bumped to a 16x16 graphical element, ideally with pixel-doubling support for legacy 8x8 pointer graphics.

...

We also get to quadruple our SRAM cache in the CPU package to 64k, which is probably the point at which the system becomes usable as more than a neat toy! (Though I would undoubtedly be genuinely awed at what truly-clever programmers manage to pack into the original 16k cache! - remember, we potentially have plenty more STORAGE memory off-chip, but it is the avoidance of cache-thrashing that is the limit.) The new CPU package would be a drop-in replacement for the old too, for home upgraders.

And our STORAGE cards themselves also quadruple to (up to) 2MBytes each (the unformatted capacity of a 1.44MB floppy disk), or 16MB (a respectable small-HDD size for the era) for a fully-loaded system - at the same cost as the prior 2MB load-out. Still ludicrously expensive, but reduced-chip lower-capacity cards (say, a 2-chip 256kByte card, comparable to a contemporary 180kB floppy disk) are now 1/4 the price, and when accounting for speed, reliability and size might be starting to look price-justified, with high-end users at least! Plus upgraders have all their old, smaller internal cards, which can be re-used as small-capacity externals, since it is all forward and backward compatible. As with the SRAM cache, we are really only now entering the properly-useful level here. But it is around 1981/2, so pretty good!

There would, of course, be nothing stopping a third party making compatible floppy/hard-disk or tape drives too. It's just not something I care about in my context here. But my peripheral-bus would be fully open-specification.


Drop-through double-buffering:

Another year, another transistor-doubling!

An idea I had for VRAM is to interleave two sets of memory cells, one accessible R/W to the CPU (or to the blitter if used), the other reading out to the video display. The trick is that a single command simultaneously dumps all data from the 'working' framebuffer to the 'display' framebuffer. It is double-buffering, but with an instant (single cycle) buffer transfer.

It has an advantage over dual-frame-flipping in that you don't lose your last screen draw from the working buffer, so partial updates are still possible, rather than having to redraw the whole thing from scratch each time. The same advantage applies versus triple-buffering.

The instant nature of the display buffer update also eliminates one of the disadvantages of double-buffering, namely having to wait for the buffer to copy out before starting to draw the new one (or at least being very careful to keep re-draws behind the ongoing copy-out).

That doesn't eliminate all the advantages of triple-buffering (you should still be waiting for the vertical retrace before pushing the buffer down), but it should bring the difference down a good bit, such that triple-buffering is a lot less advantageous, and possibly no longer worth the extra framebuffer memory and redraw-from-scratch, depending on the specific application.
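
In use, the per-frame loop becomes about as simple as it can get. The wait_for_vblank() and vram_drop() names here are placeholders for whatever the real hardware interface would be:

    /* Placeholder hardware hooks - not a defined interface. */
    void wait_for_vblank(void);        /* block until vertical retrace       */
    void vram_drop(void);              /* single-cycle working->display copy */
    void redraw_changed_elements(void);

    /* With drop-through buffering the working buffer survives the transfer,
       so each frame only the elements that actually changed need redrawing. */
    static void render_loop(void)
    {
        for (;;) {
            redraw_changed_elements();   /* partial update of working buffer */
            wait_for_vblank();           /* avoid tearing                    */
            vram_drop();                 /* instant buffer hand-over         */
        }
    }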

...

And at this point you can fit the CPU and 64k SRAM all on one chip, which will dramatically drop costs. STORAGE cards would be up to 4MB (and half-price for last year's 2MB ones) for a 32MB fully-loaded system storage capacity, for 1985-ish.


Going forward, as transistor densities continue to increase, I'd be integrating more chips together (VRAMs merged on one chip, then on-chip with the blitter and synth, then all on-chip with the CPU+SRAM), and accordingly reducing system prices, rather than pushing for more SRAM or resolution. STORAGE density would keep going up, and prices for lower-capacity cards going down.

Beyond that, I would be looking for an entirely new paradigm, probably starting with a 1024x512 display with strong 3D, a multi-tasking OS, and the above 2D system fully emulated as an overlay (or multiple overlays), or optionally embedded into a 3D world, if using one.


In Conclusion

So we have an early-32-bit-era system with 64-colour graphics and a blitter providing both relatively-sophisticated tile graphics and some basic fixed-width text modes, all keeping to the square 2ⁿ content pane. And a graphics sequencer to further off-load the CPU from much of the graphical grunt-work.

As for physical implementation, I am not really 'into' retro hardware enough to have the patience to go FPGA-programming, but simulating it in a WASM environment for a virtual version on a web page or in a VR-world would be interesting.

These days, I could also easily go to a 1024x1024 frame, for even sharper imagery, but I am not convinced that is really useful for the kind of system this is intended to be. And if it is live on the web or VR rather than executing locally, then keeping the resolution down is going to help with data-bandwidth anyway (a 512x512 screen is roughly-equivalent area to a 480p video; 256x256 is about 360p area. I'd be targeting around a 13.3fps refresh rate.).

Caveat

The thing is, in a VR environment, I can likely better implement 'tile games' as tile-like 3D entities laid out in a 2D grid, so do I really even need a 2D system that supports tile graphics? An API to do tile-like faux-2D in my preferred VR environment is likely a better approach from this end of time!

So I really just want a mono-spaced text panel for terminal-like things, and a rich-text reader panel for flow-text!