Graphics processing in hardware and software

I've got a peculiar hobby: I like to worry about very specific implementation details of technologies I don't really understand at all, GPUs and graphics drivers among them.

On one hand, it's really simple: In almost every computing device, there is a GPU. This is basically a programmable, special-purpose, massively parallel processor, and until recently, its only purpose was drawing triangles in different colors; and not just one or two, but lots of them – per second. Parts of it are dedicated to the triangle-drawing business, because that's still the most efficient way to do it, but most of the hard work happens in the programmable parts.

Since every device seems to need a driver, there is one for every GPU. And how hard can that be? Identify the triangle-drawing chip in question, figure out a way to talk to it, throw some triangle coordinates at it and marvel at the results.

But the more I think and read about those two components, the more I get the impression that it might not be that simple.

Concerning the GPU itself, I'm wondering which parts of the rendering pipeline (the process of interpreting large amounts of bits as triangle coordinates and textures and converting them to a rasterized 2D projection of a three-dimensional scene) still happen in dedicated circuits, and how much really happens on general-purpose processor cores, programmed by firmware internal to the GPU or possibly even by the driver, and therefore the CPU. From what I've learned so far (mostly by reading lots of introductions to OpenGL and modern GPUs, technical documentation and source code), everything is possible: there are software renderers that run on the shader units of a GPU, and, on the other end of the spectrum, "hardware" components that are fed with ASCII representations of OpenGL shaders (with the help of not-so-open-source drivers).
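
To make the parenthetical definition of the pipeline a bit more concrete, here is a toy sketch of its two central steps, perspective projection and rasterization, done entirely in software. It is not how any particular GPU or driver does it; the function names (`project`, `edge`, `rasterize`, `render`) and the tiny "framebuffer" made of text characters are invented for illustration.

```python
# Toy software rasterizer. All names are hypothetical, chosen for this sketch.

def project(vertex, focal_length=1.0):
    """Perspective-project a 3D point (x, y, z) onto the plane z = focal_length."""
    x, y, z = vertex
    return (focal_length * x / z, focal_length * y / z)


def edge(a, b, p):
    """Twice the signed area of triangle (a, b, p).

    The sign tells on which side of the edge a -> b the point p lies.
    """
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(triangle, width, height):
    """Return the set of (x, y) pixels covered by a 2D triangle.

    The triangle is given in normalized coordinates (-1..1 on both axes).
    """
    # Map normalized coordinates to pixel space.
    a, b, c = [((x + 1) / 2 * width, (y + 1) / 2 * height) for x, y in triangle]
    if edge(a, b, c) < 0:
        b, c = c, b  # make the winding consistent so "inside" means ">= 0"
    covered = set()
    for py in range(height):
        for px in range(width):
            p = (px + 0.5, py + 0.5)  # sample at the pixel centre
            if edge(a, b, p) >= 0 and edge(b, c, p) >= 0 and edge(c, a, p) >= 0:
                covered.add((px, py))
    return covered


def render(triangle_3d, width=16, height=8):
    """Project a 3D triangle and draw it as text, '#' for covered pixels."""
    pixels = rasterize([project(v) for v in triangle_3d], width, height)
    return "\n".join(
        "".join("#" if (x, y) in pixels else "." for x in range(width))
        for y in range(height)
    )
```

Calling `print(render([(-1, -1, 1), (1, -1, 1), (-1, 1, 1)]))` draws one triangle; moving it further away (a larger z) makes it cover fewer pixels. The nested loop over pixels is the part that, as far as I understand, is worth doing in dedicated circuits, since every pixel can be tested independently.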

Some GPUs need blobs of firmware in order to do their job (which hints at a partially software-based approach to the problem); others don't – but that doesn't say anything, since firmware can also be stored inside a chip, similar to the microcode of common "CISC" CPUs.

The more I think about that, the more I realize that, for this topic as for almost every other technical subject, there is no easy or general answer, and finding it the hard way takes lots of time, and also luck with finding the right documentation. Which brings me to the topic of open source graphics drivers.

Since most of the magic seems to be happening at least partially in software, whether on the host CPU or in embedded DSPs of the GPU (though I realize that there are quite a few ASICs left), there is an understandable, but still annoying tendency of GPU vendors to treat their driver software with as much secrecy as their hardware – simply because the actual product is the combination of the chip and the driver.

This brings us to the obvious problems that all closed-source drivers share: We have no way of fixing problems when they arise, and also no way of making assertions, or even educated guesses, about the security properties of software that runs with the highest privileges possible on millions, possibly billions, of machines storing sensitive data, both commercial and private.

Apart from actual vulnerabilities in the driver code running on the CPU, I'm wondering to what extent processes running on the GPU itself can access the main memory of the system, and how the various drivers ensure that such memory accesses don't circumvent the process separation that is now commonplace on most operating systems thanks to the memory virtualization provided by the combination of the memory management unit of the CPU and the security mechanisms of the operating system kernel.

Since shaders, the programs running on the GPU execution units, can be provided in source and sometimes also binary form by any user of the graphics (OpenGL) or general purpose (OpenCL) API, memory accesses of those shaders obviously have to be limited to something less than the whole system memory space. There seem to be two approaches:

  • For some GPU drivers, that protection is provided by the driver verifying all commands that are submitted from user space to the GPU. It checks for illegal memory accesses and other potentially dangerous operations.
  • Other, mostly newer models provide a hardware MMU themselves that can be programmed by the operating system or the driver to disallow all memory accesses, except for the ones for data that is located in buffers owned by the same user.

According to a presentation on the subject, the first approach is currently used by the Linux drivers for AMD and Intel GPUs, while the second one seems to be only supported by the open nouveau driver for Nvidia GPUs.
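
The two approaches listed above can be sketched roughly as follows. This is a simplified model, not the code of any real driver: the names `Buffer`, `Command`, `validate_stream` and `GpuMmu`, the 4 KiB page size and the assumption of page-aligned buffers are all made up for the example, and real command streams are vendor-specific binary formats rather than Python objects.

```python
from dataclasses import dataclass

# All types and functions here are hypothetical illustrations.


@dataclass(frozen=True)
class Buffer:
    owner: int  # ID of the process that allocated the buffer
    start: int  # first address of the buffer
    size: int   # length in bytes


@dataclass(frozen=True)
class Command:
    opcode: str
    address: int  # memory the GPU would touch when executing the command
    length: int


def validate_stream(commands, pid, buffers):
    """Approach 1: software checks every command before it reaches the GPU.

    Raises PermissionError if a command touches memory outside the
    buffers owned by process `pid`.
    """
    owned = [b for b in buffers if b.owner == pid]
    for cmd in commands:
        end = cmd.address + cmd.length
        inside = any(b.start <= cmd.address and end <= b.start + b.size for b in owned)
        if cmd.length < 0 or not inside:
            raise PermissionError(
                f"{cmd.opcode} at {cmd.address:#x} is outside the buffers of process {pid}"
            )


PAGE_SIZE = 4096  # assumed page size


class GpuMmu:
    """Approach 2: a per-process page table, consulted on every access."""

    def __init__(self):
        self.page_table = {}  # virtual page number -> physical page number

    def map_buffer(self, virt_addr, phys_addr, size):
        """Set up once by the driver when a buffer is allocated.

        Assumes both addresses are page-aligned.
        """
        pages = (size + PAGE_SIZE - 1) // PAGE_SIZE
        for i in range(pages):
            self.page_table[virt_addr // PAGE_SIZE + i] = phys_addr // PAGE_SIZE + i

    def translate(self, virt_addr):
        """What the hardware would do per access: translate or fault."""
        page, offset = divmod(virt_addr, PAGE_SIZE)
        if page not in self.page_table:
            raise MemoryError(f"GPU page fault at {virt_addr:#x}")
        return self.page_table[page] * PAGE_SIZE + offset
```

The difference in where the work happens is visible even in this toy version: `validate_stream` has to understand every command and runs on each submission, while `GpuMmu` only needs to be set up once per buffer and knows nothing about what the commands mean.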

The situation for OpenGL on Android seems different, even though it also uses the Linux kernel: judging from references in the kernel source code of almost all Android platforms I examined, I suspect that most or all of the Android drivers actually use an IOMMU, that is, the hardware approach to the problem. I suspect that this is because it allows the mobile GPU vendors to open-source the kernel portion of their drivers – the verification approach can obviously only be executed in the kernel (or a trusted userspace daemon, with even more overhead), and needs a lot of knowledge about the format of the command stream sent to the GPU, which would thereby be openly documented.

As I've mentioned, most of the drivers are released as closed source by their vendors (Intel, and possibly AMD, which I haven't researched, being laudable exceptions), but there are some open-source alternatives, most of them created by tediously reverse-engineering the GPUs. At least for Nvidia's Tegra line of mobile GPUs, that might change, though: after fingers had been pointed in both directions, Nvidia finally seems to be releasing a bit more to the open source community, in the form of both documentation and actual code commits. One of them is especially interesting to me, since it confirms that the IOMMU approach is being used. On the mainline Linux kernel, it also seems possible to use the stream validation approach.

So what is my point? As I've said, I have a peculiar hobby, and somehow I find the topic of GPU drivers really interesting. I still don't know nearly enough even to understand the kernel source code, but I'll keep trying to get a clearer overview nevertheless. If you've got any hints for me, please go ahead and write to me (blog at lxgr dot net)!
