Home       About Me       Contact       Downloads       Part 58    Part 60   

Part 59: Gathering Statistics

August 6, 2012

Some of you may have noticed that there are numbered and unnumbered posts to this blog. The unnumbered ones like the last post, "Where Was I?", are all in the "delay" directory and are just me checking in to remind you I haven't given up. :-)

Normally, I don't say anything interesting in those posts, because I haven't finished anything. Last time, I mentioned that I was considering adding code to gather statistics, and ran a poll about how to handle that. Since I drop the unnumbered posts out of the chain of "next part" links, here it is again for completeness:

As you can see, people were pretty evenly divided over whether the demos should just send the logs, or ask the user first.

I hadn't planned to do anything on this, but then several events changed my mind. A reader named Corey Powell found the blog, left numerous comments, downloaded the source and showed all the enthusiasm I could ask for. And his 1-year old Lenovo laptop could not run the demos!

His laptop has Intel integrated graphics and only implements OpenGL 2.1. Now, I have supported that, but only tested it on my MacBook under OSX Snow Leopard. I have three desktop machines and an old Dell laptop, and all of them have drivers for OpenGL 3.3. So I've never been able to test the Windows or Linux 2.1 code properly.

I have a "platform" option in my code that forces it to use OpenGL 2.1. I have tested that, and thought it was enough. However, I never tested the case where the 3.3 code failed initialization. It turns out that was crashing sometimes. That prevents it from even trying OpenGL 2.1.

I also have problem with OpenGL extensions. I'm using the glew library, and it finds all the entry points defined by the driver. This means that even if I tell the system I want an OpenGL 2.1 context, if later routines are available, glew will find them, and my 2.1 code will work, even if it's not strictly 2.1 compliant. On top of all that, it turns out that MacOSX does not implement one of the 2.1 texture parameters (GL_GENERATE_MIPMAP) and implements it with a later OpenGL routine instead (glGenerateMipmap).

I decided that I had to have a machine with real 2.1 graphics support, and figured the way to get it was to use out of date drivers. So I spent a couple of afternoons rebuilding all my machines with old copies of Windows XP and Vista and old copies of Ubuntu, looking for a 2.1 driver.

Doing that, I made a couple of other unpleasant discoveries. First, my Part 58 Linux distribution had been incorrectly linked with the Debug version of the libraries. Not only did this make them 10 times the size they should have been, but they also didn't run unless you had a program development environment installed. Second, I discovered that the 32-bit binaries won't run on a 64-bit AMD machine unless you install the "ia32-libs" supporting packages.

I had no idea that either of these problems were out there, which made me realize I really do need to monitor these demos somehow. No one had told me they couldn't run the Linux demos, or that they were 10 times the size they should be!

There is no OpenGL 2.1

Despite all the searching for old driver, I couldn't find a real 2.1 driver. I finally got an old version of Ubuntu to claim that my integrated display on the laptop was 2.1, but instead of giving me errors like it should, it just crashes the app when I try to use CubeMap textures. So that wasn't much help.

I finally broke down and bought a refurbished Lenovo laptop with Intel integrated graphics. That arrived Friday, and I could finally debug the code properly. And yes, I know, this is extremely stubborn of me, to the point of being compulsive!

Here it is, with a 12.5 inch display, case color "Mocha". Or maybe Molasses, to go with the speed. I guess I don't have to worry about it being stolen.

Fig 1: I will name it 'Molasses'
Fig 2: Bad texture!

Corey had already recompiled the code and fought his way through the OpenGL 2.1 initialization. Unfortunately, once the app started to run, he reported that Crafty looked like Figure 2. I'm not even sure how you would get this effect if you wanted it. It's clearly some shader problem, not in the high-level code.

First, I changed the OpenGL 2.1 initialization to use the oldest form. This should always work, since it uses the Windows native OpenGL support, not assuming anything about the driver. Once that worked, I got the same shader problem Corey had seen.

Under OpenGL 2.1, I don't have texture arrays. Instead, I use a "texture atlas". This is a big texture with all the array elements laid out in a grid. In the vertex shader, I implement the texture array by converting the array index into an origin and size. In the fragment shader, I implement wrapping by taking the modulus of the texture position inside this small rectangle of the full texture.

Clearly something was not working right in that area. All other types of texture eventually worked, once I stopped using some non-2.1 calls. But fixing the texture arrays did not go well. After hours of cutting away code in the shader and trying experiments, I had the following:

// ------- Vertex shader
varying vec4 vFragColor;
varying vec2 vTex;
varying vec2 vTexOrigin;
varying vec2 vTexSize;

void main(void) 
  vTex.xy = vertTexCoord0.xy;

  vTexOrigin.x = 0.0;
  vTexOrigin.y = 0.0;
  vTexSize.x = 1.0;
  vTexSize.y = 1.0;

  vFragColor = vec4(1.0, 1.0, 1.0, 1.0);

  gl_Position = mgMVPMatrix * vertPoint; 

// ------- Fragment shader
uniform sampler2D mgTextureUnit0;

varying vec4 vFragColor;
varying vec2 vTex;
varying vec2 vTexOrigin;
varying vec2 vTexSize;

void main(void) 
  vec2 tc;
  tc.x = vTexOrigin.x + vTexSize.x * mod(vTex.x, 1.0);
  tc.y = vTexOrigin.y + vTexSize.y * mod(vTex.y, 1.0);

  gl_FragColor = vFragColor * texture2D(mgTextureUnit0, tc);

Fig 3: This is impossible!
Fig 4: This is right!

This produced the mangled texture shown in Figure 3. If you can read shaders, you will see that there's absolutely nothing wrong with this now-trivial shader. It's impossible for this to be the result. Experimenting with the code, it appears that on some vertexes, vTexSize.y gets set to an odd value. Everything else is fine.

After staring at this awhile and trying variations, I accidentally happened to reverse the order of the two assignments to "vTexOrigin" and "vTexSize", and got Figure 4, which is correct! Since the two pairs of assignments are completely unrelated, this meant I had been chasing a shader compiler bug all along, not a difference in the OpenGL 2.1 implementation. Fortunately, I did not throw the new laptop across the room....

The next question was "Do I care?" Do I really want to be coding around some shader compiler bug to make this demo work on these crappy old drivers? Well, of course I did.

I decided that the most likely problem if these two assignments were interfering with one another is that the compiler had the size of the item wrong. It was somehow setting data into the next slot. So I tried making all the "vec2" variables into "vec4". I think the native type in the display is vec4, so I figured there would be no confusion. And that worked.

With this change (in all the copies of the 2.1 shaders), all the demos seem to run under this 2.1 implementation. Who knows if they will run on the next bad driver? All I can do is wait and see.

If the demos previously haven't run for you, give this new set a try and let me know what result you get.

The Logs

The standard way to report statistics would be to define a set of variables and create a report format. I didn't really want to do this, since it requires deciding in advance what I want to monitor.

Up to now, I've been writing messages to a file "errors.txt". These usually start as debugging messages, and then I leave in the ones that seem interesting. I wanted to keep entire logs, since they don't just show statistics, but also the order of events. Unfortunately, interesting information was just mixed in with the text, making them difficult to parse. The logs aren't even consistent across platforms.

To make the logs more usable as reports, I added a bit of syntax. Colon-delimited lines now give information about the machine, the operating system, and OpenGL variables. A typical report now has these variable lines mixed in with the trace:

:Program: TestCube
:Program Version: Part 59
:OS: Windows
:OS Windows version: 6.1
:OS Windows build: 7601
:OS Windows service pack: S 1.0
:Machine Id: 3594f3fc-33e3-47f4-b6de-72bce25a6962
:Machine 64-bit: true
:Machine CPU count: 6
:Machine Monitor count: 1
:Machine Monitor res: 1920 by 1200
------ initialize display mode
running on monitor=\\.\DISPLAY1
current display mode: 1920 by 1200 by 32, 59 hz, flags=0
create window at (91, 62) (818 by 647)
window created
------ try to create OpenGL 2.1 context, fullscreen=false, multiSample=true
mgGLInit finds GL_VERSION = 3.3.0
wglGetPixelFormatARB = R8G8B8A0, 24 depth, 8 multisample
compile overlay shader:
vertex shader compiled.
fragment shader compiled.
:OpenGL device vendor: NVIDIA Corporation
:OpenGL device renderer: GeForce GTS 250/PCIe/SSE2
:OpenGL version: 2.1.2
:OpenGL shader version: 1.20 NVIDIA via Cg compiler

This entire log is sent to the server, and all the variables are parsed and statistics generated from that. Now if I want to add more information, I can just add more colon-delimited lines. If I see evidence of some kind of crash, I can pull the raw log and try to figure out what happened.

For the user, you can see what I'm sending by looking in "errors.txt". No other identifying information is sent or recorded, not even the IP address. The file is sent to the server with the "mgSendLog" program. The source for that is available with the rest of the code, in Milligram/mgSendLog.

Machine Id

After I put all this together and started piling up statistics on the server (due to my own testing of the apps), I realized I had a problem. I was counting the percentage of each OS used, graphics card, etc., but since I had multiple reports coming in from the same machine, I wasn't getting good numbers. If Linux people ran the apps more than Windows people, it would look like there were more Linux machines.

To fix this, I had to add a unique "Machine Id" so that I can spot repeat runs. According to StackOverflow forums, on Windows, I can use a value from the registry:

On Linux, I can read the value of the file:
and on Mac, there's a strange incantation I don't understand, using a "semi-documented" API. The official documentation under XCode just redirects me to a list that does not include these routines, so I have no idea what I'm doing here. Mac address?
// get a unique machine id
void getPlatformUUID(
  char* buf, 
  int bufSize) 
  io_registry_entry_t ioRegistryRoot = 
     IORegistryEntryFromPath(kIOMasterPortDefault, "IOService:/");
  CFStringRef uuidCf = (CFStringRef) IORegistryEntryCreateCFProperty(
    ioRegistryRoot, CFSTR(kIOPlatformUUIDKey), kCFAllocatorDefault, 0);
  CFStringGetCString(uuidCf, buf, bufSize, kCFStringEncodingMacRoman);

Sending the Logs

I was originally thinking I would convert "errors.txt" into an HTML page with a form. A "submit" button would post the page to my server. The more I thought about it, this seemed like a pain in the neck. I don't know how to get a web page opened on each platform, and I don't think the user really wants to deal with a page opening every time he runs my code.

Instead I wrote a program "mgSendLog" which just opens a connection to the server, and posts the file (as HTML form data actually.) The program takes a options from "logoptions.xml". In there, you see the server host name/port, and an option controlling when mgSendLog runs.

If you hate the idea of reporting statistics, change action="always" to action="never", which never reports, or action="error", which will report only on errors. Eventually, there will be an action="prompt" value that opens a UI displaying the log, and lets you set the action option interactively. For now, that just seemed like more busywork.

If editing the logoptions.xml is too much work, just delete the mgSendLog program, and the code can't report.

The Log Server

Finally, I had to write a server that accepted the reports and produced some kind of statistics. For me, the easy way to do that was take an HTTP server I had lying around (the same one I used for the chat server back in Part 39) and add code to handle the logs. It's all very lightly tested at this point, but the Chat server has been running uninterrupted for months, so hopefully this will keep running as well.

You can see the report at SeaOfMemes Log Statistics

In case someday thousands of people are running my game, I don't generate the report on request. Instead, a background thread regenerates the report as static HTML. This is done whenever there's idle time in the server, but no more often than once a minute. So if you run a demo and go to the page, it might not have reflected your log yet.

For now, I'm reporting on each combination of Demo/Part, and overall statistics for each unique machine. As I add new variables or have new questions about the code, I'll probably rearrange the reports.

I'd appreciate it if a few of you would run the demos and generate some statistics. If this is going to crash, I'd like to know about it before I release the DontHitMe game.


The downloads page has been updated with Part 59 versions. Please give them a try, especially if demos have failed for you before.

Note that the latest source isn't in Github. While rebuilding all my machines, I managed to nuke the local git directories...

Probably still needs more testing!

Despite all the time I've put into this, there will of course be machines that don't run the code. I'm pretty much sick of messing with these platform issues at this point, but I'd still like to know what isn't running. Send me email please.

Home       About Me       Contact       Downloads       Part 58    Part 60   

blog comments powered by Disqus