John Carmack’s annual QuakeCon addresses are gold mines of insight on 3D graphics and game development (2011 link).
One interesting point in this year’s talk was about how to structure files and I/O for data-intensive graphics. John came out very strongly in favor of what you might call an “mmap-in-place” approach. Today his code still uses heavily-compressed “pack” files that are read serially, but that’s mostly due to storage space limitations on game consoles. Going forward, he favors binary formats that closely match the final in-memory representation of the data, so that you just mmap() the file and use it in place rather than running it through some kind of parser.
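Here’s a minimal sketch of the idea in POSIX C, assuming a hypothetical level file whose bytes are laid out exactly like the in-memory structs (the file name and the LevelHeader layout are invented for illustration):

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical on-disk layout: the file *is* the in-memory format. */
typedef struct {
    uint32_t magic;
    uint32_t vertex_count;
    /* vertex data follows immediately after the header */
} LevelHeader;

int main(void) {
    int fd = open("level.bin", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

    /* Map the whole file; no parse step, no second copy in user space. */
    void *base = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    const LevelHeader *hdr = (const LevelHeader *)base;
    printf("%u vertices\n", hdr->vertex_count);

    munmap(base, st.st_size);
    close(fd);
    return 0;
}
```

The pages come straight out of the buffer cache on demand; the cost is that the file format is now welded to a particular struct layout, byte order, and pointer-free design.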
This surprised me because most of my experiments with mmap have not shown big wins relative to conventional serial file I/O. (For example, I once hacked an MP3 player to use mmap() rather than read(), only to find that it performed poorly and trashed the buffer cache.) Modern operating systems are very good at loading and caching big linear files on disk, but not as good at handling lots of random paging on memory-mapped files. I couldn’t figure out why John thought mmap-in-place was the ideal solution, until it occurred to me that his use cases are qualitatively different from mine.
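For reference, the conventional pattern that kernels optimize so well is just a sequential read loop; the posix_fadvise() hint below is optional, since modern kernels detect sequential access on their own:

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define CHUNK (64 * 1024)  /* buffer size is arbitrary here */

int main(void) {
    int fd = open("track.mp3", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* Tell the kernel we'll read front-to-back so it can
       read ahead aggressively. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

    char buf[CHUNK];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0) {
        /* feed buf[0..n) to the decoder */
    }

    close(fd);
    return 0;
}
```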
Let’s contrast the two I/O approaches in a few ways. First, does a given access actually need to hit the disk? If all of one’s data fits into RAM, then neither I/O system will require disk access. mmap() will be more efficient because there is only one copy of the data in memory rather than two (the buffer cache’s copy and the application’s own), and program code can access it directly. This is actually a very important consideration for future-proofing. Any code that does complicated things to get around memory limitations should have a good “fast path” that kicks in once Moore’s Law makes those limitations obsolete. For example, a Linux kernel developer once remarked that any database that uses unbuffered disk I/O should include an option to fall back to regular buffered I/O, or else it will perform very poorly in cases where RAM is actually big enough to hold the entire working set. Note that in John’s game-engine world, game levels are specifically designed to always fit into the available memory on target platforms, so his engines will always be on this “fast path.” In the offline rendering world, by contrast, I can’t guarantee that my datasets will always fit in RAM, so mmap-in-place may end up causing more I/O than reading everything serially.
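That suggests structuring loaders with an explicit fork: probe how much physical memory the machine has and take the simple path when the data fits. The sysconf() probe and the loader names below are placeholders of my own, not anything from the talk:

```c
#include <stdint.h>
#include <unistd.h>

/* Crude probe of physical memory. _SC_PHYS_PAGES is non-standard but
   widely supported; a real renderer would want something smarter
   (cgroup limits, memory used by other processes, etc.). */
static uint64_t physical_ram_bytes(void) {
    long pages = sysconf(_SC_PHYS_PAGES);
    long psize = sysconf(_SC_PAGE_SIZE);
    return (uint64_t)pages * (uint64_t)psize;
}

void load_scene(const char *path, uint64_t scene_bytes) {
    (void)path;  /* consumed by the hypothetical loaders below */
    if (scene_bytes < physical_ram_bytes() / 2) {
        /* Fast path: everything fits, so map or load it whole. */
        /* load_scene_resident(path);   -- hypothetical */
    } else {
        /* Slow path: stream and tessellate on demand to stay in budget. */
        /* load_scene_streaming(path);  -- hypothetical */
    }
}
```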
Second, consider the issues of disk access locality and latency. At first glance, serial I/O on a well-designed, compressed file format seems ideal, because disk reads are large and linear, whereas mmap() I/O looks inefficient because the access pattern is random. However, I believe John makes an unstated assumption that most of the bulky data consists of graphical details that can be loaded asynchronously, like high-resolution textures and models, rather than “core” data structures that must be present for the engine to run at all. In that case, I/O latency and locality matter much less. I also think John assumes a clever virtual-memory scheme like the one in his MegaTexture system, which improves locality of access.
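At the POSIX level, “asynchronous paging” of detail data can be as simple as a readahead hint: when the engine predicts it will soon need a tile of a mapped asset file, it can kick off the I/O without blocking. The tile layout here is assumed, not anything specified in the talk:

```c
#include <stddef.h>
#include <sys/mman.h>

/* base: start of the mmap()ed asset file. The offset and size are
   assumed to fall on page-aligned tile boundaries in the baked format. */
void prefetch_tile(void *base, size_t tile_offset, size_t tile_size)
{
    /* MADV_WILLNEED starts readahead and returns immediately, so the
       pages are (hopefully) resident by the time the renderer touches
       them; a miss just costs an ordinary page fault. */
    madvise((char *)base + tile_offset, tile_size, MADV_WILLNEED);
}
```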
So, in a game engine where the working set usually fits into available RAM, and where data can be paged in asynchronously, mmap-in-place does make a lot of sense as a data storage architecture. But for offline applications where you don’t have enough RAM for everything, and where reads have to be synchronous, mmap may not be the ideal approach.
All of this has got me thinking in more detail about what the true disk/memory storage needs are for high-end offline rendering. We spend a lot of time developing clever tricks to minimize memory needs, like on-demand geometry tessellation (REYES/procedurals), mip-mapping, and brickmaps. Most of my rendering optimizations boil down to trying very hard to minimize how much geometry needs to be kept in RAM. It’s interesting to take a step back and think about how much of this work is really necessary. After all, RAM is getting ridiculously cheap. Optimizations to squeeze a scene into 4GB might be useless or even counterproductive when you’ve got 16GB. Is there some point at which we can just dump everything into a naive ray-tracer and forget about all of this annoying optimization work?
Mip-mapping and brickmaps have more or less solved the problem of texture memory access. By selecting mip-map tiles using screen-space metrics, we’ve gotten pretty close to optimal in terms of I/O and memory needs for 2D and 3D textures. The remaining problem is geometry. Smart culling and REYES do a fantastic job on camera-visible geometry; the real difficulty is ray-visible geometry. You can only fit so many million tessellated micropolygons in RAM, and given the poor locality and wide scope of ray paths, there isn’t a clear upper bound on what might need to be tessellated, as there is with straight camera-visible geometry.
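That screen-space metric is essentially the textbook mip LOD computation: measure the texture-space footprint of a pixel in texels and take the log. A sketch in C, assuming the shading system supplies the texture-coordinate derivatives:

```c
#include <math.h>

/* Given texture-coordinate derivatives across one pixel, pick the mip
   level whose texels are about the size of the pixel's footprint. */
float mip_level(float dudx, float dvdx, float dudy, float dvdy,
                float tex_width, float tex_height)
{
    float fx = hypotf(dudx * tex_width, dvdx * tex_height); /* footprint along x */
    float fy = hypotf(dudy * tex_width, dvdy * tex_height); /* footprint along y */
    float rho = fmaxf(fx, fy);
    return fmaxf(0.0f, log2f(rho)); /* level 0 = finest */
}
```

Brickmaps extend the same footprint-matching idea to 3D, picking a brick resolution from the ray’s footprint rather than screen derivatives.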
You’ve also got the problem of modifications to geometry – clever ray-tracing data structures usually aren’t designed for cases where major pieces of the scene are transforming or deforming every frame. This is why ray tracing hasn’t completely taken over from REYES in production. Ray tracing is theoretically O(log N), but only after you have built an acceleration data structure; in practice it’s more like O(N), because you still need to stream all that source geometry into the system to get it ready to be traced. As of today, this means storing your models on disk, then serially reading those files and translating them into a ray-tracer-friendly data structure in memory. For my current project, which isn’t all that geometry-heavy, this involves going through 100-200MB of data every frame. If we are ever going to do high-quality rendering at interactive frame rates, this will need to change. John’s talk suggests the interesting approach of encoding models into some kind of virtually-paged ray acceleration structure. Perhaps we could run a pre-pass on baked models and animations, converting them into a special binary format that the renderer can mmap on demand.
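To make that concrete, here is one way such a baked format might be laid out: fixed-size BVH nodes in a flat array, with child links stored as indices rather than pointers, so the file is position-independent and traversable directly out of an mmap(). The layout and field names are my own guesses, not anything Carmack described:

```c
#include <stdint.h>

/* One fixed-size BVH node; child links are array indices, not pointers,
   so the structure is valid straight off disk at any base address. */
typedef struct {
    float    bounds_min[3];
    float    bounds_max[3];
    uint32_t left_child;   /* index into the node array, or ~0u for a leaf */
    uint32_t prim_offset;  /* leaf: first primitive in the primitive array */
    uint32_t prim_count;   /* leaf: number of primitives */
    uint32_t pad;          /* keep the node at an even 40 bytes */
} BvhNode;

/* File = header + node array + primitive array. mmap the file, add the
   header-declared offsets to the base pointer, and start traversing. */
typedef struct {
    uint32_t magic;
    uint32_t node_count;
    uint64_t nodes_offset;  /* byte offset of the BvhNode[] from file start */
    uint64_t prims_offset;  /* byte offset of the primitive data */
} BvhFileHeader;
```

Baking would then be the O(N) pre-pass, paid once per model rather than once per frame; at render time the only per-frame cost is faulting in the nodes a given set of rays actually visits.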