
Part 15: Huge Tracts of Land

March 24, 2011

Looking over the work done since Part 4, the last three months have been pretty frustrating. The ray tracing was fun, but since I haven't done lighting yet, it hasn't been very useful. The world design needed to be done, but I'm not convinced this is the final version. The whole space theme seems a bit overdone to me, and I want to explore some other ideas.

So far, the performance and portability work feels like a huge detour, with uncertain benefits. On the one hand, more of you can run the demos. On the other hand, there are going to be a lot more bugs (from three different platforms). It's still not clear to me how portable OpenGL will prove to be. It's definitely a lot more work to support all three platforms.

I also have to admit that there's a reason I'm retired and living on disability -- my health stinks! If I could still work a 40-hour week, every week, I'd still have a Silicon Valley job. As it is, I'm averaging about 20 hours a week on this. The occasional 30-hour weeks are being canceled out by 5-hour weeks like the last one.

I hope that the erratic posts don't drive you all away. I really appreciate the comments, suggestions and level of interest. I'm not going to talk about my health or apologize for delays again. It feels like I'm whining and making excuses. You'll just have to understand that I'm doing my best.

Chunks

As the player moves through the larger game world, we have to render the landscape around him. As the view distance increases, we want to summarize and remove objects, to keep performance up.

A completely procedural world would just generate the landscape on the fly, at whatever detail is required. Eventually, my asteroid backgrounds will work that way. The sky, sun and ring will be fixed, the distant asteroid landscape will be procedural, and only the nearby landscape will come from the database.

I'm still not sure what the final organization of the DB will be. I might just store objects, indexed by object id and chunk coordinates. Perhaps there will be some grid over the whole thing, like the Minecraft database structure, which has a file per chunk.

For this part, I'm just interested in rendering a larger world, so I've imported more Minecraft data, and used a similar DB organization. Where in the speed test case I was rendering a 128 by 128 by 128 block, now I'm rendering a 1024 by 1024 by 128 landscape. The data is broken into chunks. Chunks are loaded by background threads in the program, and rendered as available in the UI thread.

Minecraft uses a 16 by 16 by 128 column as its chunk size. In my game, the world isn't going to be made of cubes. There will be a polygonal landscape, with cube-based buildings on top. For now, I've used a 32 by 32 by 32 chunk size. This is the same number of cubes as a Minecraft chunk (32K). It's a reasonably small amount of data to load and display.
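
Mapping a world position to its chunk is just a floor-divide by the chunk size. Here's a minimal sketch in C++ (the names are mine, not the project's):

    #include <math.h>

    const int CHUNK_SIZE = 32;

    struct ChunkCoords { int x, y, z; };

    // Find the chunk containing a world-space point. Using floor()
    // instead of integer division keeps negative coordinates correct.
    ChunkCoords worldToChunk(double wx, double wy, double wz)
    {
      ChunkCoords c;
      c.x = (int) floor(wx / CHUNK_SIZE);
      c.y = (int) floor(wy / CHUNK_SIZE);
      c.z = (int) floor(wz / CHUNK_SIZE);
      return c;
    }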

Resources

Since the world will be (effectively) infinite, we have to worry about managing our resources. Those include system disk space, system memory, system CPU, graphics memory and graphics CPU. Let's look at each:

System disk space

Nowadays, disk space is effectively unlimited, but if we had a large multiplayer world, the user will probably want us to put a limit on DB size. You don't want the game using up gigabytes of disk, saving every object you've ever seen over the course of months.

In the demo for this part, or in any single-player game, this isn't really a consideration. The entire world DB is on disk. The only thing we could do is try to compress our data better.

System memory

On any modern system, we should assume at least a gigabyte of RAM. The only reason we'd support less than that is if a version of the game ran on a tablet, like the iPad. In memory, the chunks are Octrees, which are reasonably compact. We can hold a lot of terrain in memory. Still, as you move around, you will see a lot of terrain, and a lot of other objects. We need to have a cache and drop old items out of memory.

In this demo, I haven't done anything about this. The entire world is only 120 meg, so it all fits in memory. I am reading the files on demand, but I never unload them from memory after reading them.
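
When I do get to it, the cache will probably be an LRU list over the loaded chunks. A rough sketch of the idea -- the class and method names here are hypothetical:

    #include <stddef.h>
    #include <list>
    #include <unordered_map>

    struct Chunk
    {
      long long key;       // chunk coordinates packed into one value
      size_t memorySize;   // bytes used by this chunk's octree
    };

    class ChunkCache
    {
      typedef std::unordered_map<long long,
        std::list<Chunk*>::iterator> ChunkIndex;

      std::list<Chunk*> m_lru;   // front = most recently used
      ChunkIndex m_index;
      size_t m_memoryUsed;
      size_t m_memoryLimit;

    public:
      ChunkCache(size_t limit): m_memoryUsed(0), m_memoryLimit(limit) {}

      // move a chunk to the front whenever it is used
      void touch(long long key)
      {
        ChunkIndex::iterator it = m_index.find(key);
        if (it != m_index.end())
          m_lru.splice(m_lru.begin(), m_lru, it->second);
      }

      // drop least-recently-used chunks until under the limit
      void evict()
      {
        while (m_memoryUsed > m_memoryLimit && !m_lru.empty())
        {
          Chunk* oldest = m_lru.back();
          m_lru.pop_back();
          m_index.erase(oldest->key);
          m_memoryUsed -= oldest->memorySize;
          delete oldest;
        }
      }
    };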

System CPU

CPU use is going to be a serious consideration. As we've seen, rendering the transparent data in the right order requires a significant amount of CPU time. In the speed test, it was taking around 10ms to output the transparent vertex list. Since there are only 16ms to render a frame at 60 fps, this is a really large number. In the eventual game, there will be lots of other objects like avatars and critters to render, and scripts to run.

In fact, I'm thinking that at some point, I have to revisit transparency. Either I can limit transparent textures to ones that don't need to be sorted (use alpha-testing, like Minecraft does), or use different kinds of transparency, or find a better algorithm for the general transparency I'm supporting now.
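
For comparison, here's roughly what the alpha-testing route looks like in fixed-function OpenGL state calls. This is a sketch, not the project's code; drawTransparentChunks and sortBackToFront are placeholders:

    #include <GL/gl.h>

    // placeholders for the real rendering code
    void drawTransparentChunks();
    void sortBackToFront();

    // Alpha testing: each fragment is either fully opaque or discarded,
    // so no depth sorting is needed. The cost: no partial transparency,
    // just cutouts like Minecraft's leaves.
    void drawWithAlphaTest()
    {
      glDisable(GL_BLEND);
      glEnable(GL_ALPHA_TEST);
      glAlphaFunc(GL_GREATER, 0.5f);  // keep fragments over half opaque
      drawTransparentChunks();        // any draw order works
    }

    // True blending needs back-to-front ordering -- the expensive CPU step.
    void drawWithBlending()
    {
      glDisable(GL_ALPHA_TEST);
      glEnable(GL_BLEND);
      glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
      sortBackToFront();
      drawTransparentChunks();
    }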

Graphics memory

Currently, a vertex is a position (x,y,z), a normal (x,y,z) and texture coordinates (u,v, and texture index). That's nine floating point numbers, or 36 bytes per vertex. Plus there will be at least another 4 bytes in an index buffer to call out that vertex. In the speed test, there were 697,576 vertexes, so we used about 28 megabytes of display memory to hold that data.
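
In struct form (the name is mine), that layout is:

    // nine floats = 36 bytes per vertex, as described above
    struct ChunkVertex
    {
      float position[3];  // x, y, z
      float normal[3];    // x, y, z
      float texture[3];   // u, v, texture array index
    };
    // sizeof(ChunkVertex) == 36, plus 4 bytes per index buffer entry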

None of the display cards out there, not even the integrated graphics chips, have a problem with that big a vertex object. This demo handles a much larger world though (64 times as much data), so we definitely need to track the total display memory we are using, and delete objects based on distance.

This is a bit of a problem to implement. I have no idea how a device driver manages the display memory. Does it keep a vertex list in contiguous memory? In that case, is memory going to be fragmented and not allow a new vertex list to be moved to the display? If that happens, OpenGL will just keep it in system memory, but performance will suffer dramatically.

How do I find out how much memory is available in the first place? I don't see anything in either the DirectX9 or the OpenGL specs to let me do this. For now, I've just made display memory size an item in the options, and defaulted it to 200 meg. That should keep the demo running with any of the displays out there. I'd like to adapt to whatever display I have though.

I also have some hopes that custom shaders will cut memory use dramatically. After all, my texture coordinates are not arbitrary u,v values. I just render an entire texture on the face of a cube, so coordinates will always be one of (0,0), (0,1), (1,0), (1,1). I should be able to just send a single byte to indicate which corner a vertex is, and let the shader convert that to a real u,v. The final z-value of the texture coordinate is an integer index into the texture array, which has fewer than 256 entries. So instead of 12 bytes per vertex for texture, I should be able to do it in 2 bytes.

The same is true of my normals. They always face outwards from the cube, so there are only six possible values -- (1,0,0), (0,1,0), (0,0,1), etc. A smarter shader could cut this from 12 bytes to one byte.

Also, you may remember that each cube has 24 possible vertexes. This is because the normal is combined with the vertex. An upwards-facing point is not the same as an outwards-facing point, meaning each corner generates three vertexes. If I tell the shader which of the six faces it is displaying, letting the normal be calculated in the shader, then the number of vertexes per cube drops from 24 to 8.

Finally, if I can get the shaders to understand cubes, instead of triangles, then each cube can really be specified with 3 bytes (the integer coordinates relative to an origin), one byte for the face visibility and one byte for the type (to select textures from a table). Instead of up to 24 vertexes of 40 bytes each (960 bytes) per cube, we would have 5 bytes per cube.
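
As a struct, that 5-byte encoding might look like this (the name and field layout are my guess at the endpoint, not working code from the project):

    #pragma pack(push, 1)
    struct PackedCube
    {
      unsigned char x, y, z;  // integer coords relative to the chunk origin
      unsigned char faces;    // one visibility bit for each of the 6 faces
      unsigned char type;     // index into a table of texture sets
    };
    #pragma pack(pop)
    // sizeof(PackedCube) == 5, versus up to 960 bytes of vertexes per cube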

For this demo, I'm sticking with my simple shaders, but this clearly is something that needs to be investigated.

Graphics CPU

Surprisingly to me, the GPU isn't even breathing hard rendering all the data from my 128 by 128 by 128 case. If it weren't for the CPU use to handle that transparent data, we'd be flying along on all the platforms, even the ones with integrated graphics.

In the demo, I haven't put in any code that worries about load on the GPU. In the longer term, once we are handling more complex shaders and more interesting visual effects, I may have to put an estimate of GPU use on each loaded object. Then as I call out objects to be rendered, I will set them to less demanding versions (less detail, or slower effects) as distance increases, in order to keep GPU use under some limit.

What good is a fast display?

My original thought was to find out what a display is capable of (memory, speed), and use it all. This might actually be a problem in a multi-user system. People with good displays would see farther in the world than people with average displays.

Is this what I want? Or would it be better if people with poor displays just got lower frame rates, or less detail, but had the same view distance? I know WoW players who complain the cities are unusable due to low frame rates. Would it be better if all the other players were converted to stick figures in cities, or dropped completely? What do you think?

Since I'm not sure how to query display capabilities anyway, or adapt to the speed of each display/system, I've just used a fixed view distance. In the demo, hit the plus and minus keys (- and =, actually, so you don't have to shift) to change the view distance. The default is set in "options.xml", as the "viewDistance" attribute, in world units. Since whole chunks are rendered, you will actually see a bit farther than this.
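
For anyone poking at the options file, the shape is something like this. The "viewDistance" and "worldDir" attribute names are real; the root element, the display-memory attribute name, and all the values shown are just my examples:

    <!-- sketch only: names other than viewDistance and worldDir,
         and all values, are guesses -->
    <options
      viewDistance="300"
      worldDir="world"
      displayMemory="200" />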

A Video

The demo runs perfectly for minutes at a time, then steps on itself and crashes somewhere inside OpenGL. I'm still debugging it. In the meantime, here's a video:

Update

I worked on this same piece of code all week, so I'll just continue this part rather than starting another. It's been a frustrating week!

Multiple threads

Since we have multiple cores in the CPU, it makes sense to use multiple threads in the program. And I have two perfect pieces of work to do in the background: 1) loading the chunk files from disk, and 2) creating the vertex lists used to draw the chunk.

On the other hand, debugging multithreaded programs is a huge pain in the neck. First, (under Visual C++ anyway) the debugger becomes a lot less useful. As you single-step through a piece of threaded code, the debugger continually switches to other threads, making it very difficult to follow.

Second, you can no longer get repeatable test cases. Tiny little changes in the environment cause threads to switch at different times. This means allocations and method calls happen in different order, and the program is not exactly repeatable. Some of my standard tricks, like putting in breakpoints that fire after a certain number of memory allocations, don't work in a multi-threaded environment.

To get around this, I usually write my thread procedures to do work in small chunks. Then to debug the logic, I call all these "worker" procedures one after another, from a single thread, simulating multithreaded use. For example, in the demo, I call the workers after each display update.

This lets me debug the thread logic without actually using multiple threads. Once I have the worker procedures debugged, I can create real threads and call the worker procedures there. Any bugs I have at that point will be due to thread race conditions, not due to the logic of the threads.
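
In outline, the pattern looks like this (the names are mine):

    // Each worker does one small unit of work per call.
    class Worker
    {
    public:
      virtual ~Worker() {}
      virtual bool doSomeWork() = 0;  // returns false when idle
    };

    // Debug mode: drive every worker from the UI thread, once per
    // display update. Single-threaded, so fully repeatable.
    void updateWorkers(Worker** workers, int count)
    {
      for (int i = 0; i < count; i++)
        workers[i]->doSomeWork();
    }

    // Release mode: the same workers, each looping on a real thread.
    // (Thread startup, shutdown and wakeup signaling omitted.)
    void workerThreadProc(Worker* worker)
    {
      while (true)
      {
        if (!worker->doSomeWork())
        {
          // sleep or wait on an event until more work arrives
        }
      }
    }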

The video above was produced using the single-threaded debug version, with chunks read and vertex objects created in small steps after each display update. As I mentioned, it ran for several minutes at a time, then crashed. The graphics for a chunk would suddenly turn into a random snarl of triangles, this would flicker for a few seconds, and then the program would die deep inside OpenGL.

It turns out this was just a memory leak. I was creating the vertex objects, but never freeing them. After total memory use got near 2 gigabytes, allocations would start to fail, both in my code and (I think) somewhere inside OpenGL. This crashed the library pretty quickly (not instantly though, which is surprising!). Anyway, this was easy to fix once I realized what was happening.

Multi-threading OpenGL

With the non-threaded version working well, I created threads and called my worker procedures. Of course, it didn't work.

In fact, it worked fine under DirectX, which is a thread-safe library. It did not work under OpenGL, despite the fact that I'd created a lock (mutex) over the graphics calls. A Google search turned up some pages that said I should just set the current rendering context (using wglMakeCurrent) before each batch of graphics calls. I could not get this to work.

This page in the OpenGL wiki said I should create multiple rendering contexts and set them up to share object ids (wglShareLists). This seemed like the thing to do, since creating vertex buffer objects in the worker threads was exactly what I wanted to do. Unfortunately, it didn't work.

Another page said the problem was you have to create each of the rendering contexts on a new Windows display context, and you have to create them within the threads doing the OpenGL calls, and you have to release the first rendering context before you can call wglShareLists to share it.

This is a pain in the neck. It means when the thread starts up, the main rendering thread has to release the rendering context (easy), but then grab it back again after the thread has finished its initialization. Which means some kind of signaling between the worker thread and the main thread, just for this context switching business. Yuck.
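
Pieced together from those pages, the sequence looks roughly like this. It's Windows-specific, the context and DC variables are assumed to exist, and as you'll see below, even doing all of this didn't get it drawing for me:

    #include <windows.h>

    // assumed to exist: the main thread's contexts, and a display
    // context created inside the worker thread
    extern HDC mainDC, workerDC;
    extern HGLRC mainRC;

    // main thread, before starting the worker:
    void releaseMainContext()
    {
      wglMakeCurrent(NULL, NULL);  // main context must be released first
    }

    // worker thread startup:
    void initWorkerContext()
    {
      HGLRC workerRC = wglCreateContext(workerDC);
      wglShareLists(mainRC, workerRC);   // share object ids with main
      wglMakeCurrent(workerDC, workerRC);
      // ... signal the main thread that setup is done ...
    }

    // main thread, after the worker signals:
    void reacquireMainContext()
    {
      wglMakeCurrent(mainDC, mainRC);
    }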

I hacked in a solution just to see if this technique really worked and was worth the trouble. I got it to create all the contexts and share the lists the way the page recommended, got good return codes from everything, got OpenGL to create objects in the worker threads, and tried to draw them in the main rendering thread. It still didn't work!

At this point, I was sick of messing with it. I had spent a couple of days fighting to get OpenGL to do what DirectX had done without any trouble at all, and I was annoyed.

The most common comment on forums about multithreading OpenGL was "Why bother?" The idea is that since all the graphics calls are serialized at the display driver, there's no point in doing them in different threads. And I had to admit this was true of the graphics calls themselves. What I wanted done in the other threads (other processors) was the loading of the chunks and the creation of the vertex list. Actually creating an OpenGL object might as well be done in the main rendering thread.

With a bit of fussing around, I restructured the code to do this. I don't actually like this version as much as the previous one. It's a bit more subtle, and reading the code at the top level, it's not as clear when things happen. But it does all the OpenGL calls in the main thread, and (finally!) it worked.
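
The structure now is basically a producer-consumer queue. A sketch with hypothetical names -- buildVertexList and createVertexBuffer stand in for the real code:

    #include <queue>

    struct Chunk;
    struct VertexList { /* positions, normals, texture coords */ };

    // stand-ins for the real code
    VertexList* buildVertexList(Chunk* chunk);
    void createVertexBuffer(VertexList* list);  // the only OpenGL calls

    std::queue<VertexList*> g_finished;  // guarded by a lock in real code

    // worker thread: disk loading and vertex-list building only,
    // no OpenGL calls at all
    void workerStep(Chunk* chunk)
    {
      VertexList* list = buildVertexList(chunk);
      g_finished.push(list);  // lock around this in the real code
    }

    // main rendering thread, once per frame: turn finished lists
    // into OpenGL vertex buffers
    void createPendingBuffers()
    {
      while (!g_finished.empty())
      {
        VertexList* list = g_finished.front();
        g_finished.pop();
        createVertexBuffer(list);
        delete list;  // system-memory copy no longer needed
      }
    }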

Then I cranked everything up by using 5 worker threads at once, to max out my 6-core processor. It didn't work... sigh. Turns out I had forgotten something about my own code: to speed up the Octrees, I manage my own allocation of tree nodes. This allocation mechanism was shared between all instances of an Octree, and it was not thread safe. Fixing that gave me a working 6-core version that I could wander around in.
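
The fix is just a lock around the shared free list, something like this (sketched with a Windows critical section; the real allocator is more involved):

    #include <windows.h>
    #include <stdlib.h>

    class NodeAllocator
    {
      CRITICAL_SECTION m_lock;
      void* m_freeList;  // singly-linked list of recycled nodes

    public:
      NodeAllocator(): m_freeList(NULL)
      {
        InitializeCriticalSection(&m_lock);
      }

      ~NodeAllocator()
      {
        DeleteCriticalSection(&m_lock);
      }

      void* allocNode(size_t size)
      {
        EnterCriticalSection(&m_lock);
        void* node = m_freeList;
        if (node != NULL)
          m_freeList = *(void**) node;  // pop the head of the free list
        LeaveCriticalSection(&m_lock);
        if (node == NULL)
          node = malloc(size);          // free list empty; hit the heap
        return node;
      }

      void freeNode(void* node)
      {
        EnterCriticalSection(&m_lock);
        *(void**) node = m_freeList;    // push onto the free list
        m_freeList = node;
        LeaveCriticalSection(&m_lock);
      }
    };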

I said previously that I wasn't going to manage system memory, since the world didn't use more than 120 meg. It turned out this wasn't true. I am keeping a bit of state now on each Octree node, and they add up to quite a bit more memory than I expected. I decided to add a system-memory limit and delete objects when memory use hits the limit.

This still isn't working quite right. In the demo, you will see it drop and reload a huge piece of scenery all at once. This is definitely in the system memory management code (it stops if I turn that off), and is some kind of bug. Still, it doesn't prevent the thing from working, so I've decided to release the demo anyway.

At this point, I only have the Windows version working. I'll add the other versions later, and the source when all three platforms are working again.

Update 2: The source and demo are available for all three platforms.

Update 3: I fixed the bug where landscape was unloaded and then reloaded at random. New source and demos are available for all three platforms.

The Demo

The world file has been replaced with a world directory containing a sample of 3300 chunks. Zipped, this is about 25 megabytes. Since some of you will want to try this on multiple platforms, I've separated out the world into its own zip file.

Download The Part 15 Demo World. Unzip it into the same directory as the demo. The directory "world" should be next to "docs" and "options.xml". Or you can edit the "worldDir" attribute in "options.xml" to point to it wherever you like.

For Windows, download The Part 15 Demo - Windows. I've put 2D graphics back into the demo, so you can just hit F1 for help.

For Linux, download The Part 15 Demo - Linux. There's no 2D graphics support yet on Linux.

For Mac, download The Part 15 Demo - Mac. There's no 2D graphics support yet on the Mac.

If the program fails, you will find a trace file called "errors.txt" in the demo directory. Please email the file to me at

The Source Code

Download The Part 15 Source for the source code of all three versions. In a change from previous parts, this does not contain built versions of the demo. It contains the supporting docs directory and the options.xml file. Combine with the world directory downloaded separately to run.

To build on Windows, use Visual C++ or equivalent compiler. Build the JpegLib first in release or debug mode, then the Crafty build.

To build on Linux, use the supplied makefile. See the readme.txt in the BuildsLinux directory for a list of packages you need to install.

To build on Mac, use XCode on the Crafty.xcodeproj file. See the notes there in case file names are not picked up correctly.

Statistics

There was growth in both the framework (threading, event and lock support, across all three platforms), 2D graphics for Windows, and a bit of internal reorganization. There was new game code (finally!) supporting the loading and rendering of chunks.

                    Full Project    New for Part 15
TheGame lines              8,112              2,290
Framework lines           14,790              3,941
Utilities lines            6,860                  7
Total                     29,762              6,238

Coding hours               405.2               65.9
Writeup hours               75.9                4.6

