Author Topic: Rendering Time

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
Rendering Time
« on: May 13, 2008, 03:40:49 AM »
I thought this might interest some people.


I had Outpost 2 running full screen, and I was thinking my units were moving rather sluggishly. So I decided to reduce the window size to about a quarter of the screen, and suddenly my units started moving a lot faster. So I got curious as to just how long it takes to render a scene. (After all, my computer is a number of times faster than what was on the market when OP2 was released, so why the slowdown?)

I ended up hooking the virtual function responsible for drawing the detail pane, and added some timing code, using the RDTSC instruction (Read Time Stamp Counter), to see just how long it took to draw a single frame. This was only for the detail pane drawing, not the mini map or command pane, or any of the rest of the game logic. For some perspective, my computer runs at 800 MHz, and the RDTSC instruction can be used to find how many processor cycles a given section of code took to execute. (Note that it's hard to get accurate results for small sections of code with this instruction, possibly due to caching or pipelining issues, I'm not really sure. But this wasn't a very small section of code being tested, so hopefully that's not as much of an issue.) I used the full 64 bit precision, since 32 bits can roll over pretty fast, although even 32 bits should have been enough to time a bit more than 1 second on even the fastest CPUs. The tests were done on a single CPU machine, so the timing values likely include time spent executing other processes, and context switch time. Basically these results kind of blow, and add in the fact that I don't really know anything about collecting good statistical results. But, numbers are usually interesting to look at anyways. =)
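A minimal sketch of this kind of RDTSC timing, assuming GCC or Clang on x86 (where `__rdtsc()` comes from `<x86intrin.h>`). The names `read_counter`, `time_call` and `work_under_test` are made up for illustration, and the hook into the game's drawing function is not shown. It is not the actual test code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>              /* __rdtsc() on GCC/Clang for x86 */
/* Reads the full 64-bit time stamp counter using the RDTSC instruction. */
uint64_t read_counter(void) { return __rdtsc(); }
#else
#include <time.h>
/* Fallback so the sketch still builds elsewhere: nanoseconds, NOT cycles. */
uint64_t read_counter(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#endif

/* Hypothetical stand-in for the code being measured. */
volatile uint32_t sink;
void work_under_test(void)
{
    for (uint32_t i = 0; i < 100000; i++)
        sink += i;
}

/* Returns the counter ticks that elapsed while fn() ran. Like the numbers
   in the post, this includes any time the OS spent in other processes. */
uint64_t time_call(void (*fn)(void))
{
    assert(fn != NULL);
    uint64_t start = read_counter();
    fn();
    uint64_t end = read_counter();
    return end - start;             /* 64-bit difference, so no rollover worries */
}
```

Calling `time_call(work_under_test)` a number of times and looking at the spread, rather than trusting a single reading, is probably the least that can be done about the noise mentioned above.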


I'm afraid some of the results were also a bit too sporadic to draw much of a conclusion, so take everything with a grain of salt. Mind you, I haven't done much more than simply eyeball the numbers so far either. Typical timing results between tests were anywhere from about 1 million cycles, up to about 65 million cycles, with a few sporadic results outside of that range.

When running full screen and doing nothing but looking at my base, it took around 13 million to 19 million cycles, with an average of probably somewhere in the 14 million range. When scrolling in full screen those values shot up to about 65 million cycles or so. That's about 81 ms for a single frame. Definitely more time than is needed to keep the game running at full speed.
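For reference, the cycles-to-milliseconds conversion used here is just a division by the clock rate. A small sketch, where `cycles_to_ms` is a made-up helper name and 800 MHz is the clock speed given earlier in the post:

```c
#include <assert.h>

/* Converts a cycle count to milliseconds for a CPU running at clock_hz. */
double cycles_to_ms(double cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return cycles * 1000.0 / clock_hz;
}

/* At 800 MHz:
     65 million cycles (full screen, scrolling) -> 81.25 ms
     14 million cycles (full screen, idle)      -> 17.5 ms  */
```

At 81.25 ms per frame the detail pane alone would cap out at roughly 12 frames per second.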

With a smaller window, the results seemed a bit more sporadic. On some runs, it typically took about 2 million to 4 million cycles, with an average of about 3 million cycles, although these results didn't typically replicate very well, and the average and typical range may have been a few million higher at some times (about 5-8 million cycles). When looking at some random part of the map without units, the times ranged from about 1.1 million cycles to about 1.5 million cycles. Scrolling brought the time up to about 17 million to 26 million cycles. The average seemed to remain consistently higher after scrolling around for a while.

I also tried checking into the effects of day and night. The numbers looked about 0.3 million higher on average for an empty nighttime screen, than an empty daytime screen.

The number of units on the map also seems to have a big impact. On a map with no units, and no day/night, it took about 0.1 million cycles per frame. On a map with over 2000 Lynx, about half of which were in view (and no day/night), it took about 26 million cycles per frame (about 32 ms per frame).



One of the reasons why I wanted to look into this, is the drawing code I've seen is a bit odd. Plus, certain functions look like there is a lot of overhead before it even gets down to copying the bitmaps around. When it does get to copying the bitmaps, it does it using an indirect function call to draw each scanline. I suspect that function call overhead may be rather significant.
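To illustrate the pattern being described, here is a hypothetical sketch of a bitmap copy that goes through a function pointer once per scanline. None of these names (`ScanlineFn`, `copy_scanline`, `blit`) come from the game's code; they are only there to show where the per-row call overhead would sit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature for a routine that draws one row of 16-bit pixels. */
typedef void (*ScanlineFn)(uint16_t *dst, const uint16_t *src, size_t width);

/* Plain opaque copy of a single row. */
void copy_scanline(uint16_t *dst, const uint16_t *src, size_t width)
{
    memcpy(dst, src, width * sizeof *dst);
}

/* Copies a width x height bitmap, making one indirect call per scanline.
   Pitches are measured in pixels, not bytes. */
void blit(uint16_t *dst, size_t dst_pitch,
          const uint16_t *src, size_t src_pitch,
          size_t width, size_t height, ScanlineFn draw_line)
{
    assert(draw_line != NULL);
    for (size_t y = 0; y < height; y++)
        draw_line(dst + y * dst_pitch, src + y * src_pitch, width);
}
```

With narrow graphics, each call only moves a handful of pixels, so the fixed cost of the call (argument setup, the indirect jump, the return) is paid once for every short row. Whether that really dominates in the game is a guess until it's measured.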
« Last Edit: May 13, 2008, 03:44:43 AM by Hooman »

Offline Brazilian Fan

  • Sr. Member
  • ****
  • Posts: 302
Rendering Time
« Reply #1 on: May 13, 2008, 10:55:16 AM »
30 million cycles just to draw a frame?!  :blink: It can be done in just, well, 1000 cycles. What a waste  :find:  

Offline Leviathan

  • Hero Member
  • *****
  • Posts: 4055
Rendering Time
« Reply #2 on: May 13, 2008, 11:32:38 AM »
Interesting and crazy :)

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
Rendering Time
« Reply #3 on: May 14, 2008, 12:24:10 PM »
Quote
30 million cycles just to draw a frame?!  It can be done in just, well, 1000 cycles. What a waste 

Clearly not.

Consider a screen resolution of 1024x768, at 16 bits per pixel. Drawing the entire screen would require 1572864 bytes to be written to the screen buffer. If these writes are accomplished 32 bits at a time, which is the usual write size for a 32 bit CPU, then it requires 393216 writes. (Of course you could say that many CPUs use a 64 bit bus to transfer data with the rest of the system, but the bus speed is also typically much slower than the internal CPU clock speed). That is assuming each pixel is written once, and the pixels are written two at a time. That's not always going to happen. If graphics aren't of an even width, they'll probably do the final pixel as a 16 bit write, and possibly all of them, depending on the code. Also, when units are drawn, the tiles under them are drawn first, and then the unit graphics are overlaid on top. This usually means drawing the same area of the screen twice. There isn't usually much of a way to optimize away that double drawing. Also, that's just counting the number of writes needed. There's still going to be loop overhead within the bitmap copy, and likely other overhead too. For instance, to get the unit graphics to overlap each other properly, you pretty much need to sort the visible ones by their Y coordinate, and then draw them so the ones highest on the screen are drawn first, and are thus overlapped by the lower ones. Doing all that in just 1000 processor cycles is just plain impossible. Having a full screen update take at least 1 million cycles shouldn't be too unexpected. However, 60 million is perhaps a bit wasteful.
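The byte and write counts above can be checked with a couple of lines of arithmetic. The helper names `frame_bytes` and `frame_writes` are made up for this sketch:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to touch every pixel of the screen buffer once. */
uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bits_per_pixel)
{
    assert(bits_per_pixel % 8 == 0);
    return width * height * (bits_per_pixel / 8);
}

/* Number of stores needed if each store moves write_size bytes
   (rounded up, so a trailing partial store still counts). */
uint32_t frame_writes(uint32_t bytes, uint32_t write_size)
{
    assert(write_size > 0);
    return (bytes + write_size - 1) / write_size;
}

/* frame_bytes(1024, 768, 16)  -> 1572864 bytes
   frame_writes(1572864, 4)    -> 393216 writes (32 bits at a time)
   frame_writes(1572864, 2)    -> 786432 writes (16 bits at a time) */
```

Even at an optimistic one write per cycle with no loop overhead and no overdraw, that is already hundreds of times more than 1000 cycles.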

Of course, you should keep in mind that I was only measuring the detail pane update, not the full screen update.


Also remember that we don't know how all of the code works. There might be a few other things it needs to do that we don't really know about. Some of the lower cycle times suggest the game avoids redrawing parts of the screen that don't need to be updated, and maintaining a data structure to optimize away some of those updates is going to require a few cycles to keep updated as well. That would be good when there are frequently parts of the screen that don't need to be updated, but is going to be extra overhead without any real benefit when the whole screen needs to be updated. I suspect that's why scrolling takes so much longer.



Edit: Just went to do a few speed tests with MemClear and MemCopy. It seems to be a little slower than I expected. Perhaps due to the speed of memory in my system, although I'm not entirely sure.

I created two arrays of type "short", each of size [1024*768]. I did two types of tests. One was a MemClear type test, using 3 different methods. The first method used memset, the second used explicitly coded assembly using REP STOSD, and the third used explicitly coded assembly using a loop with MOV. Similarly, for the MemCopy type test, I also used 3 different methods. The first method used memcpy, the second used explicitly coded assembly using REP MOVSD, and the third used explicitly coded assembly using a loop with MOV.

The MemClear of the array (short array1[1024*768]) took about 10 million cycles using either memset or the explicitly coded REP STOSD (which is probably how the compiler implements calls to memset). The explicitly coded loop with MOV took about 11 million cycles.

The MemCopy of array1 into array2 took about 16 million cycles with either memcpy or REP MOVSD (which is probably how the compiler implements calls to memcpy). The explicitly coded loop with MOV took about 23 million cycles.
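A rough C approximation of these tests, for anyone who wants to try something similar. It is not the original test code: the assembly versions (REP STOSD / REP MOVSD / MOV loop) are swapped for a plain C loop over 32-bit words, the timing uses the standard `clock()` instead of RDTSC so it reports seconds rather than cycles, and the `Frame` union and all function names are made up. It also assumes `short` is 16 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define PIXELS (1024 * 768)

_Static_assert(sizeof(short) == 2, "sketch assumes 16-bit short");

/* The same memory viewed as 16-bit pixels or as 32-bit words. */
typedef union {
    short    pixels[PIXELS];       /* short array[1024*768], as in the post */
    uint32_t words[PIXELS / 2];    /* two pixels per 32-bit store */
} Frame;

Frame array1, array2;

void clear_with_memset(Frame *f)
{
    memset(f->pixels, 0, sizeof f->pixels);
}

/* C stand-in for the hand-written MOV loop: one 32-bit store per iteration. */
void clear_with_loop(Frame *f)
{
    for (size_t i = 0; i < PIXELS / 2; i++)
        f->words[i] = 0;
}

void copy_with_memcpy(Frame *dst, const Frame *src)
{
    memcpy(dst->pixels, src->pixels, sizeof dst->pixels);
}

/* C stand-in for the MOV copy loop: one 32-bit load and store per iteration. */
void copy_with_loop(Frame *dst, const Frame *src)
{
    for (size_t i = 0; i < PIXELS / 2; i++)
        dst->words[i] = src->words[i];
}

/* CPU time in seconds taken by one call of a clear routine. */
double time_clear(void (*clear)(Frame *), Frame *f)
{
    clock_t start = clock();
    clear(f);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* CPU time in seconds taken by one call of a copy routine. */
double time_copy(void (*copy)(Frame *, const Frame *), Frame *dst, const Frame *src)
{
    clock_t start = clock();
    copy(dst, src);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Something like `time_copy(copy_with_memcpy, &array2, &array1)` gives one sample. An optimizing compiler may turn the C loops into the same code as memset/memcpy, so the gap between methods seen with hand-written assembly may not show up here.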

 
« Last Edit: May 14, 2008, 02:06:09 PM by Hooman »

Offline Brazilian Fan

  • Sr. Member
  • ****
  • Posts: 302
Rendering Time
« Reply #4 on: May 16, 2008, 08:27:56 AM »
True, but you're assuming that the whole screen is redrawn each frame, and not just the parts that need it. But yeah, 1000 cycles is a bit dull.
« Last Edit: May 16, 2008, 08:29:40 AM by Brazilian Fan »

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
Rendering Time
« Reply #5 on: May 16, 2008, 11:46:07 AM »
Drawing 1 tile takes about 1000 cycles according to my test code.

Although, it's not unreasonable to expect the detail pane to be completely redrawn fairly often. Particularly if day and night is on. As the daylight moves across the screen, each tile that changes light intensity will need to be redrawn. Plus, units have ambient animations, and each frame might cover a slightly different set of pixels. The only easy way to draw the next frame for that unit's animation is to mark all the tiles behind that unit for redraw, and then draw over top of that. Failing that, you'll probably start accumulating garbage from all the old frames around the edges of the unit. Plus, units tend to overlap, so the structure that keeps track of which areas need to be updated will probably have a number of double writes to the same part of it. Think what happens when a vehicle moves behind a building, or when two vehicles are close enough to have overlapping graphics, or when a vehicle is moving between two tiles diagonally, and forces the 4 overlapped tiles to be redrawn. Then there's probably the biggest problem: Scrolling. If you move the detail pane at all in any direction, the whole thing is going to need to be redrawn.

It starts to make you wonder why they bothered.  :(
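For the curious, a hypothetical sketch of the kind of redraw bookkeeping discussed in this post. How the game actually tracks this is not known; the 32 pixel tile size, the grid dimensions and every name here are assumptions picked for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed values: 32x32 pixel tiles covering a 1024x768 area. */
enum { TILE_SIZE = 32, GRID_W = 32, GRID_H = 24 };

typedef struct {
    bool dirty[GRID_H][GRID_W];    /* true = tile must be redrawn this frame */
} DirtyGrid;

/* Marks every tile clean (all-zero bytes read back as false). */
void clear_grid(DirtyGrid *g)
{
    memset(g->dirty, 0, sizeof g->dirty);
}

/* Marks every tile overlapped by the pixel rectangle at (x, y) of size w x h.
   Tiles that are already dirty just get written again (the "double writes"). */
void mark_rect(DirtyGrid *g, int x, int y, int w, int h)
{
    assert(x >= 0 && y >= 0 && w > 0 && h > 0);
    int tx0 = x / TILE_SIZE;
    int ty0 = y / TILE_SIZE;
    int tx1 = (x + w - 1) / TILE_SIZE;
    int ty1 = (y + h - 1) / TILE_SIZE;
    if (tx1 >= GRID_W) tx1 = GRID_W - 1;   /* clip to the visible grid */
    if (ty1 >= GRID_H) ty1 = GRID_H - 1;
    for (int ty = ty0; ty <= ty1; ty++)
        for (int tx = tx0; tx <= tx1; tx++)
            g->dirty[ty][tx] = true;
}

/* Scrolling invalidates everything, so the grid buys nothing that frame. */
void mark_all(DirtyGrid *g)
{
    for (int ty = 0; ty < GRID_H; ty++)
        for (int tx = 0; tx < GRID_W; tx++)
            g->dirty[ty][tx] = true;
}

/* Counts how many tiles would have to be redrawn. */
int count_dirty(const DirtyGrid *g)
{
    int n = 0;
    for (int ty = 0; ty < GRID_H; ty++)
        for (int tx = 0; tx < GRID_W; tx++)
            n += g->dirty[ty][tx] ? 1 : 0;
    return n;
}
```

A 32x32 sprite sitting halfway between tiles, say `mark_rect(&g, 48, 48, 32, 32)`, dirties 4 tiles, which is the diagonal movement case. After `mark_all` every one of the 768 tiles is dirty and all the bookkeeping was pure overhead.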