Author Topic: Graphics Drawing Code (Read 2782 times)

Hooman · « **on:** November 05, 2008, 02:58:35 AM »

So, I started playing around with the graphics drawing code today. I was basically looking at the low level software bitmap copy for sprites. (Sprites are bitmaps with a transparent background, or irregular shape, such as vehicles, that are overlaid on top of some other background).

For some time I had suspected they could be rewritten to increase their speed. Mostly because of a call made for each scanline drawn in the bitmap. I figured a double loop with inline code instead of a function call could be faster. It would require more code overall though, since the part that iterates over the scanline is the same for a number of different types of bitmap copies.

In short, I have some code that seems to run about 14% faster. Mind you, this doesn't result in a noticeable speed difference while playing, since not a whole lot of time is spent in the sprite drawing code. For each frame drawn, measurements suggested that almost 240000 processor cycles are spent with the original code, and a little over 200000 cycles are spent with the modified code. The measurements are of course subject to error due to paging and task switching effects. Improving the non-sprite drawing code could possibly result in more of a speedup. This would be the code that draws the screen background, for instance. As it covers much more area than the sprites drawn on top of it, it would likely account for more CPU time.

The object of this exercise was basically to write a replacement function. It takes a struct parameter with the following layout:

Code: [Select]

BitmapCopyInfo
--------------
0x0	4	void* sourceImageBuffer
0x4	4	void* destImageBuffer
0x8	4	void* overlayMask  [blight or day and night]
0xC	4	int overlayMaskBitOffset
0x10	4	int width
0x14	4	int height
0x18	4	int sourcePitch
0x1C	4	int destPitch
0x20	4	enum DrawMethod drawMethod
0x24	4	union
 0x24  4  short[256]* darkPal16  [current light level]
 0x24  4  int bitXOffset  	[For 1 bpp images]
0x28	4	int blightOverlay  [0..15]
0x2C	4	short[256]* lightPal16  [full daylight]
---------------

The function prototype would be:

Code: [Select]

void __cdecl DrawSprite(BitmapCopyInfo* bitmapCopyInfo);
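
For anyone who would rather read the layout as a C declaration, here is a rough sketch of the same struct. The field names come from the table above; the `Ptr32` typedef and the union member name `u` are made up for illustration. Pointer fields are declared as 32-bit integers purely so the offsets still match when the sketch is compiled as 64-bit code. In a 32-bit build they could be real pointers.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: pointers are stored as 32-bit values, as in the 32-bit game binary. */
typedef uint32_t Ptr32;

typedef struct BitmapCopyInfo {
    Ptr32   sourceImageBuffer;     /* 0x00 */
    Ptr32   destImageBuffer;       /* 0x04 */
    Ptr32   overlayMask;           /* 0x08: blight or day and night */
    int32_t overlayMaskBitOffset;  /* 0x0C */
    int32_t width;                 /* 0x10 */
    int32_t height;                /* 0x14 */
    int32_t sourcePitch;           /* 0x18 */
    int32_t destPitch;             /* 0x1C */
    int32_t drawMethod;            /* 0x20: enum DrawMethod */
    union {
        Ptr32   darkPal16;         /* 0x24: short[256]*, current light level */
        int32_t bitXOffset;        /* 0x24: for 1 bpp images */
    } u;
    int32_t blightOverlay;         /* 0x28: 0..15 */
    Ptr32   lightPal16;            /* 0x2C: short[256]*, full daylight */
} BitmapCopyInfo;

/* Compile-time checks that the sketch agrees with the table. */
_Static_assert(offsetof(BitmapCopyInfo, width) == 0x10, "width offset");
_Static_assert(offsetof(BitmapCopyInfo, lightPal16) == 0x2C, "lightPal16 offset");
```

The static asserts are only there to catch a mistyped field; they cost nothing at run time.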

The original code was the following:

Code: [Select]

00586CEF >  >PUSH EBP                                       ; Function: DrawSprite(BitmapCopyInfo* bitmapCopyInfo)
00586CF0    >MOV EBP,ESP
00586CF2    >PUSHAD
00586CF3    >MOV EBP,DWORD PTR SS:[EBP+8]                   ; EBP := [param1] BitmapCopyInfo* bitmapCopyInfo
00586CF6    >MOV ECX,DWORD PTR SS:[EBP+20]                  ; ECX := BitmapCopyInfo.drawMethod
00586CF9    >SHL ECX,2                                      ; [ECX := drawMethod * 4]
00586CFC    >MOV EDI,DWORD PTR DS:[ECX+586CDF]
00586D02    >SUB EDI,Outpost2.00586D3C                      ; [* Calculate offset from end of CALL instruction *]
00586D08    >MOV DWORD PTR DS:[586D38],EDI                  ; [* Hardcode CALL address into instruction *]
00586D0E    >MOV EAX,DWORD PTR SS:[EBP+18]                  ; EAX := BitmapCopyInfo.sourcePitch
00586D11    >MOV EBX,DWORD PTR SS:[EBP+10]                  ; EBX := BitmapCopyInfo.sourceWidth
00586D14    >MOV EDX,EBX
00586D16    >MOV ESI,DWORD PTR SS:[EBP+1C]                  ; ESI := BitmapCopyInfo.destPitch
00586D19    >SHL EDX,1
00586D1B    >SUB ESI,EDX
00586D1D    >SUB EAX,EBX                                    ; EAX := sourcePitch - sourceWidth
00586D1F    >MOV DWORD PTR DS:[<int destPitchDelta>],ESI    ; [global] int destPitchDelta := ESI
00586D25    >MOV DWORD PTR DS:[<int sourcePitchDelta>],EAX  ; [global] int sourcePitchDelta := EAX
00586D2A    >MOV ESI,DWORD PTR SS:[EBP]                     ; ESI := BitmapCopyInfo.sourceImageBuffer*
00586D2D    >MOV EDI,DWORD PTR SS:[EBP+4]                   ; EDI := BitmapCopyInfo.destImageBuffer*
00586D30    >MOV ECX,DWORD PTR SS:[EBP+10]                  ; [LoopStart]:  ECX := BitmapCopyInfo.sourceWidth
00586D33    >PUSH EBP                                       ; [Save EBP]
00586D34    >MOV EBP,DWORD PTR SS:[EBP+24]                  ; EBP := BitmapCopyInfo.darkPal16*
00586D37    >CALL 128CC3B4
00586D3C    >POP EBP                                        ; [Restore EBP]
00586D3D    >ADD ESI,DWORD PTR DS:[<int sourcePitchDelta>]  ; ESI := sourceImageBuffer* + sourcePitchDelta
00586D43    >ADD EDI,DWORD PTR DS:[<int destPitchDelta>]    ; EDI := destImageBuffer* + destPitchDelta
00586D49    >DEC DWORD PTR SS:[EBP+14]                      ; BitmapCopyInfo.sourceHeight--
00586D4C  ^ >JNZ SHORT Outpost2.00586D30                    ; -> LoopStart
00586D4E    >POPAD
00586D4F    >LEAVE
00586D50    >RETN


00586D51 >  >MOV EDX,EBP                                    ; Function: DrawScanline8Pal16Transparent0
00586D53    >SHR ECX,1
00586D55    >JNB SHORT Outpost2.00586D6D
00586D57    >MOVZX EAX,BYTE PTR DS:[ESI]                    ; EAX := *sourceImageBuffer
00586D5A    >ADD EAX,EAX                                    ; [EAX := sourcePixel * 2]
00586D5C    >JE SHORT Outpost2.00586D65
00586D5E    >MOV AX,WORD PTR DS:[EAX+EDX]                   ; AX := pal16[sourcePixel * 2]
00586D62    >MOV WORD PTR DS:[EDI],AX                       ; *destImageBuffer := AX
00586D65    >INC ESI                                        ; ESI := sourceImageBuffer*++
00586D66    >ADD EDI,2                                      ; EDI := destImageBuffer* += 2
00586D69    >OR ECX,ECX
00586D6B    >JE SHORT Outpost2.00586D96                     ; -> Return
00586D6D    >XOR EBX,EBX                                    ; [LoopStart]:
00586D6F    >XOR EAX,EAX
00586D71    >MOV BL,BYTE PTR DS:[ESI+1]                     ; BL := *(sourceImageBuffer + 1)
00586D74    >MOV AL,BYTE PTR DS:[ESI]                       ; AL := *sourceImageBuffer
00586D76    >ADD EBX,EBX                                    ; [EBX := sourcePixel2 * 2]
00586D78    >JE SHORT Outpost2.00586D97
00586D7A    >ADD EAX,EAX                                    ; [EAX := sourcePixel1 * 2]
00586D7C    >JE SHORT Outpost2.00586DAC
00586D7E    >MOV BX,WORD PTR DS:[EBX+EDX]                   ; [DrawBothPixels]:  BX := pal16[sourcePixel2 * 2]
00586D82    >ADD ESI,2                                      ; ESI := sourceImageBuffer* += 2
00586D85    >MOV AX,WORD PTR DS:[EAX+EDX]                   ; AX := pal16[sourcePixel1 * 2]
00586D89    >SHL EBX,10                                     ; EBX := destPixel2 << 16
00586D8C    >OR EAX,EBX                                     ; EAX := destPixel1 | (destPixel2 << 16)
00586D8E    >MOV DWORD PTR DS:[EDI],EAX                     ; *destImageBuffer := EAX
00586D90    >ADD EDI,4                                      ; EDI := destImageBuffer* += 4
00586D93    >DEC ECX
00586D94  ^ >JNZ SHORT Outpost2.00586D6D                    ; -> LoopStart
00586D96    >RETN
00586D97    >ADD EAX,EAX                                    ; [EAX := sourcePixel1 * 2]
00586D99    >JE SHORT Outpost2.00586DA2                     ; -> Skip drawing pixels
00586D9B    >MOV BX,WORD PTR DS:[EAX+EDX]                   ; [DrawFirstPixelOnly]:
00586D9F    >MOV WORD PTR DS:[EDI],BX                       ; *destImageBuffer := BX
00586DA2    >ADD ESI,2                                      ; [LoopEpilog]:  ESI := sourceImageBuffer* += 2
00586DA5    >ADD EDI,4                                      ; EDI := destImageBuffer* += 4
00586DA8    >DEC ECX
00586DA9  ^ >JNZ SHORT Outpost2.00586D6D                    ; -> LoopStart
00586DAB    >RETN
00586DAC    >MOV AX,WORD PTR DS:[EBX+EDX]                   ; [DrawSecondPixelOnly]:  AX := pal16[sourcePixel2 * 2]
00586DB0    >ADD ESI,2                                      ; ESI := sourceImageBuffer* += 2
00586DB3    >MOV WORD PTR DS:[EDI+2],AX                     ; *(destImageBuffer + 2) := AX
00586DB7    >ADD EDI,4                                      ; EDI := destImageBuffer* += 4
00586DBA    >DEC ECX
00586DBB  ^ >JNZ SHORT Outpost2.00586D6D                    ; -> LoopStart
00586DBD    >RETN
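
Structurally, the listing above is an outer loop over scanlines that calls a per-method scanline routine once per line. A rough C equivalent might look like the sketch below. The names `ScanlineFn`, `scanlineTable` and `DrawSpriteDispatch` are hypothetical, the table has only one entry here, and a plain function pointer stands in for the original's trick of patching the CALL instruction.

```c
#include <stdint.h>

/* Hypothetical signature for a per-scanline routine. */
typedef void (*ScanlineFn)(const uint8_t *src, uint16_t *dst, int width,
                           const uint16_t *pal16);

/* Mirrors DrawScanline8Pal16Transparent0: palette index 0 is transparent. */
void Scanline8Pal16Transparent0(const uint8_t *src, uint16_t *dst, int width,
                                const uint16_t *pal16)
{
    for (int x = 0; x < width; x++) {
        if (src[x] != 0)
            dst[x] = pal16[src[x]];
    }
}

/* Stand-in for the table indexed by drawMethod; only one entry is filled in. */
const ScanlineFn scanlineTable[] = { Scanline8Pal16Transparent0 };

void DrawSpriteDispatch(const uint8_t *src, uint16_t *dst,
                        int width, int height,
                        int sourcePitch, int destPitch,
                        int drawMethod, const uint16_t *pal16)
{
    ScanlineFn drawLine = scanlineTable[drawMethod];   /* looked up once */
    for (int y = 0; y < height; y++) {
        drawLine(src, dst, width, pal16);              /* one call per scanline */
        src += sourcePitch;                            /* both pitches are in bytes */
        dst = (uint16_t *)((uint8_t *)dst + destPitch);
    }
}
```

The per-line call is the overhead I was hoping to get rid of by inlining the scanline loop.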

This was replaced with this new code:

Code: [Select]

	// About 14% faster
	__asm
	{
  sub esp, 0x8
  push ebx
  push esi
  push edi
  push ebp

	; Make sure we have something to draw
  mov edx, [ecx + 0x14]; height
  mov ebx, [ecx + 0x10]; width
  or edx, edx
  jz Return
  or ebx, ebx
  jz Return

	; Precalculate
  lea esi, [ebx * 2]	; width*2
  mov eax, [ecx + 0x18]; sourcePitch
  mov edi, [ecx + 0x1C]; destPitch
  sub eax, ebx  ; sourcePitch - width
  sub edi, esi  ; destPitch - width*2
  mov [esp + 0x10], eax; sourcePitchDelta
  mov [esp + 0x14], edi; destPitchDelta

	; Cache values in registers
  mov esi, [ecx]  ; sourceImage*
  mov edi, [ecx + 0x4]; destImage*
  mov ebp, [ecx + 0x24]; palette16*

  mov ecx, ebx  ; width
DrawPixelLoopStart:
  mov al, [esi]
  inc esi
	;lodsb
  or al, al
  jz SkipPixelDraw
  movzx eax, al
  mov ax, [ebp + eax*2]
	;stosw
  mov [edi], ax
SkipPixelDraw:
  add edi, 2
  dec ecx
  jnz DrawPixelLoopStart

  mov ecx, ebx  ; width
  add esi, [esp + 0x10]; sourcePitchDelta
  add edi, [esp + 0x14]; destPitchDelta
  dec edx    ; heightRemaining
  jnz DrawPixelLoopStart

Return:
  pop ebp
  pop edi
  pop esi
  pop ebx
  add esp, 0x8
	}
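
In C terms, the replacement loop above amounts to roughly the following. This is only a sketch for readability, not the code that was timed, and the function name and parameter list are made up (the real routine takes the struct). The one deliberate difference is that it also rejects negative sizes, where the assembly only tests for zero.

```c
#include <stdint.h>

/* Inlined double loop: no call per scanline, palette index 0 is transparent. */
void DrawSpriteInline(const uint8_t *src, uint16_t *dst,
                      int width, int height,
                      int sourcePitch, int destPitch,
                      const uint16_t *pal16)
{
    if (width <= 0 || height <= 0)
        return;                                    /* nothing to draw */

    int sourcePitchDelta = sourcePitch - width;    /* bytes from row end to next row */
    int destPitchDelta = destPitch - width * 2;    /* same, for 16-bit dest pixels */

    for (int y = height; y != 0; y--) {
        for (int x = width; x != 0; x--) {
            uint8_t index = *src++;
            if (index != 0)
                *dst = pal16[index];               /* palette lookup and write */
            dst++;
        }
        src += sourcePitchDelta;
        dst = (uint16_t *)((uint8_t *)dst + destPitchDelta);
    }
}
```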

I had a few other ideas that I'd tried. For instance, I tried blocking memory reads and writes to try and take advantage of the full 32-bit register size. I also tried a few memory alignment tricks, since access to a 32-bit value that's aligned on a 32-bit boundary is faster than an unaligned access. Unfortunately, these changes led to quite an increase in code size, and a lot of added complexity. The simpler code was both faster, and easier to write/debug. This is possibly due to code caching effects, as the simpler code had a much smaller loop that is more likely to fit in a code cache. It may also be due to the simpler loop structure, which might have led to fewer pipeline stalls due to branching.

I still have a few tricks up my sleeve that I'd like to try though. For one, this could be an excellent excuse to try and use MMX instructions/registers. Although, I wanted to write normal integer code first that didn't rely on MMX. There are some early Pentiums that don't have MMX, so I wanted something a little more universal. Another thing I wanted to try was to modify the output blocking code so that it avoided branches by reading the destination bitmap, so that the original value could be written back inside of a block instead of branching to partial block writing code. This would probably work best with the MMX idea.
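
To make the branch avoidance idea concrete, here is a minimal sketch in plain C for a block of two pixels. It reads the destination, builds a mask from the source indices, and always writes the whole 32-bit block back, so transparent pixels get their old value rewritten instead of being skipped. The function name is hypothetical, and the packing assumes a little-endian machine such as x86.

```c
#include <stdint.h>
#include <string.h>

/* Draw two 8-bit source pixels over two 16-bit destination pixels with one
   32-bit read-modify-write. Palette index 0 is transparent.
   Assumption: little-endian, so pixel 0 sits in the low 16 bits. */
void DrawPixelPairMasked(const uint8_t *src, uint16_t *dst,
                         const uint16_t *pal16)
{
    /* All ones if the pixel is opaque, all zeros if transparent. */
    uint32_t mask0 = (uint32_t)0 - (uint32_t)(src[0] != 0);
    uint32_t mask1 = (uint32_t)0 - (uint32_t)(src[1] != 0);
    uint32_t mask = (mask0 & 0x0000FFFFu) | (mask1 & 0xFFFF0000u);

    /* Both palette lookups happen unconditionally. */
    uint32_t color = (uint32_t)pal16[src[0]] | ((uint32_t)pal16[src[1]] << 16);

    uint32_t old;
    memcpy(&old, dst, sizeof old);                 /* read the destination block */
    uint32_t out = (color & mask) | (old & ~mask); /* keep old where transparent */
    memcpy(dst, &out, sizeof out);                 /* one full block write */
}
```

The same mask-and-merge pattern is what would map onto wider MMX registers, just with more pixels per block.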

Sirbomber · « **Reply #1 on:** November 05, 2008, 04:29:10 PM »

Hmm, will this be included in the final release 1.3.5?

Hooman · « **Reply #2 on:** November 05, 2008, 07:30:40 PM »

Not likely. At least not in its current form. There is too little gain to justify it. I'd probably try modding the non-sprite drawing code and seeing if there was a significant change before adding something like this. Besides, I'd want to have finished fooling around with it so I can remove the timing code, which adds a bit to the runtime.

I think for there to be a significant improvement in drawing times, I'd need to use some kind of hardware acceleration. Such a change would probably require some fairly extensive work though.

As for some of the other things I tried, here are two other attempts at rewriting the function. They were both complete rewrites, although the first one was based on the same structure as the original but with inlined code.

Code: [Select]

	// About 5% faster
	__asm
	{
  sub esp, 0xC
	; push ebx
  push esi
  push edi
  push ebp

	; Make sure we have something to draw
  mov edx, [ecx + 0x14]; height
  mov ebx, [ecx + 0x10]; width
  or edx, edx
  jz Return
  or ebx, ebx
  jz Return

	; Precalculate
  mov [esp + 0x14], edx; height
  lea esi, [ebx * 2]
  mov eax, [ecx + 0x18]; sourcePitch
  mov edx, [ecx + 0x1C]; destPitch
  sub eax, ebx
  sub edx, esi
  mov [esp + 0xC], eax; sourcePitchDelta
  mov [esp + 0x10], edx; destPitchDelta

	; Cache values in registers
  mov esi, [ecx]  ; sourceImage*
  mov edi, [ecx + 0x4]; destImage*
  mov ebp, [ecx + 0x24]; palette16*
  mov ecx, ebx  ; width

DrawLine:
  xor eax, eax  ; clear stale palette bits left in ah before lodsb
  shr ecx, 1
  jnc DrawPixelPair; ->
	; Handle single pixel draw
  lodsb
  or al, al
  jz SkipOddPixelDraw; ->
  mov ax, [ebp + eax*2]
  mov [edi], ax
SkipOddPixelDraw:
  add edi, 2
	; Make sure there are still pixel pairs left to draw
  or ecx, ecx
  jz DrawLineLoopEpilog; -> 

DrawPixelPair:
  lodsw
  
  or ax, ax
  jz NextOutputPosition; ->
  or al, al
  jz DrawPixel2  ; -> 
  or ah, ah
  jz DrawPixel1  ; ->

	; DrawBothPixels
  movzx edx, ah
  xor ah, ah
  mov dx, [ebp + edx*2]; Palette lookup
  mov ax, [ebp + eax*2]; Palette lookup
  shl edx, 16
  or eax, edx
  stosd
  xor eax, eax
  dec ecx
  jnz DrawPixelPair	; ->
  jmp DrawLineLoopEpilog; ->
DrawPixel1:
	; Draw first pixel
  mov ax, [ebp + eax*2]; Palette lookup
  mov [edi], ax  ; Draw pixel
  jmp NextOutputPosition; ->
DrawPixel2:
  or ah, ah
  jz NextOutputPosition; ->
  movzx edx, ah
  mov ax, [ebp + edx*2]; Palette lookup
  mov [edi + 2], ax	; Draw pixel
NextOutputPosition:
  add edi, 4
DrawPixelsLoopEpilog:
  dec ecx
  jnz DrawPixelPair	; ->

DrawLineLoopEpilog:
  dec [esp + 0x14]	; height
  jz Return

  mov ecx, ebx  ; width
  add esi, [esp + 0xC]; sourcePitchDelta
  add edi, [esp + 0x10]; destPitchDelta
  jmp DrawLine

Return:
  pop ebp
  pop edi
  pop esi
	; pop ebx
  add esp, 0xC
	}

This second copy tries to align memory reads, and to read 4 bytes at a time while possible. It also tries to combine writes when possible. The branching can probably be simplified slightly.

Code: [Select]

	// About 10% faster
	__asm
	{
  sub esp, 0x18
  push ebx
  push esi
  push edi
  push ebp

	; Make sure we have something to draw
  mov edx, [ecx + 0x14]; height
  mov ebx, [ecx + 0x10]; width
  or edx, edx
  jz Return
  or ebx, ebx
  jz Return

	; Precalculate
  mov [esp + 0x18], edx; heightRemaining
  lea esi, [ebx * 2]
  mov eax, [ecx + 0x18]; sourcePitch
  mov edx, [ecx + 0x1C]; destPitch
  sub eax, ebx
  sub edx, esi
  mov [esp + 0x10], eax; sourcePitchDelta
  mov [esp + 0x14], edx; destPitchDelta

	; Cache values in registers
  mov esi, [ecx]  ; sourceImage*
  mov edi, [ecx + 0x4]; destImage*
  mov ebp, [ecx + 0x24]; palette16*

	; Calculate number of leading pixels
  mov ecx, esi
  neg ecx
  and ecx, 3  	; Number of pixels until an aligned boundary [0..3]
  cmp ecx, ebx  ; Check number of lead pixels against width
  jbe SkipLeadPixelCap; -> 
  mov ecx, ebx  ; Number of lead pixels is capped at width
SkipLeadPixelCap:
  mov [esp + 0x1C], ecx; numLeadPixels
	; Calculate number of tail pixels and middle DWORDs
  sub ebx, ecx  ; width - numLeadPixels
  mov edx, ebx
  and ebx, 3  	; numTailPixels
  shr edx, 2  	; numMiddleDwords
  mov [esp + 0x24], ebx; numTailPixels
  mov [esp + 0x20], edx; numMiddleDwords
	; Combine lead and tail pixels when numMiddleDwords is 0
  jnz SkipCombineLeadAndTail
  add ecx, ebx  ; numLeadPixels += numTailPixels (edx is 0 here)
  mov [esp + 0x1C], ecx; numLeadPixels
  mov [esp + 0x24], 0	; numTailPixels
SkipCombineLeadAndTail:


  xor eax, eax
DrawLine:
	; Get number of leading pixels
  mov ecx, [esp + 0x1C]; numLeadPixels
  or ecx, ecx
  jz DrawAlignedPixels
DrawLeadPixelLoop:
	; Draw single pixels up to an aligned boundary
  lodsb
  xor ah, ah
  or al, al
  jz SkipLeadPixelDraw
  mov ax, [ebp + eax*2]; Palette lookup
	;stosw    ; Write pixel
  mov [edi], ax
SkipLeadPixelDraw:
  add edi, 2
  dec ecx
  jnz DrawLeadPixelLoop

DrawAlignedPixels:
	; Get number of aligned pixels
  mov ecx, [esp + 0x20]; numMiddleDwords
  or ecx, ecx
  jz DrawLineLoopEpilog
DrawPixelDword:
  lodsd
	; Check transparent pixel block
  or eax, eax
  jz SkipOverTransparentBlock
	;or ax, ax
	;jz DrawSecondPixelWord

	; Draw first pixel word
  or al, al
  jz DrawFirstBlockSecondPixel
  or ah, ah
  jz DrawFirstBlockFirstPixel

	; Draw both pixels in first block
  movzx edx, ah
  movzx ebx, al
  mov dx, [ebp + edx*2]; Palette lookup
  mov ax, [ebp + ebx*2]; Palette lookup
  shl edx, 16
  mov dx, ax
  mov [edi], edx  ; Write pixels
DrawSecondPixelWord:
  add edi, 4
  shr eax, 16
	;or ax, ax
	;jz DrawPixelDwordLoopEpilog

  or al, al
  jz DrawSecondBlockSecondPixel
  or ah, ah
  jz DrawSecondBlockFirstPixel

	; Draw both pixels in second block
  movzx edx, ah
  movzx ebx, al
  mov dx, [ebp + edx*2]; Palette lookup
  mov ax, [ebp + ebx*2]; Palette lookup
  shl edx, 16
  or eax, edx
  stosd

DrawPixelDwordLoopEpilog:
  dec ecx
  jnz DrawPixelDword	; ->
  jmp DrawTailPixels	; ->

DrawFirstBlockSecondPixel:
  or ah, ah
  jz DrawSecondPixelWord
  movzx edx, ah
  mov ax, [ebp + edx*2]; Palette lookup
  mov [edi + 2], ax
  jmp DrawSecondPixelWord
DrawFirstBlockFirstPixel:
  movzx ebx, al
  mov ax, [ebp + ebx*2]; Palette lookup
  mov [edi], ax
  jmp DrawSecondPixelWord

DrawSecondBlockSecondPixel:
  or ah, ah
  jz SkipDrawSecondBlockSecondPixel
  movzx edx, ah
  mov ax, [ebp + edx*2]; Palette lookup
  mov [edi + 2], ax
SkipDrawSecondBlockSecondPixel:
  add edi, 4
  jmp DrawPixelDwordLoopEpilog
DrawSecondBlockFirstPixel:
  mov ax, [ebp + eax*2]; Palette lookup
  mov [edi], ax
  add edi, 4
  jmp DrawPixelDwordLoopEpilog
SkipOverTransparentBlock:
  add edi, 8
  jmp DrawPixelDwordLoopEpilog

DrawTailPixels:
	; Get number of tail pixels
  mov ecx, [esp + 0x24]; numTailPixels
  xor eax, eax
  or ecx, ecx
  jz DrawLineLoopEpilog
DrawTailPixelLoop:
	; Draw single pixels until end of line
  lodsb
  xor ah, ah
  or al, al
  jz SkipTailPixelDraw
  mov ax, [ebp + eax*2]; Palette lookup
	;stosw    ; Write pixel
  mov [edi], ax
SkipTailPixelDraw:
  add edi, 2
  dec ecx
  jnz DrawTailPixelLoop

DrawLineLoopEpilog:
	; Increment pointers
  add esi, [esp + 0x10]; sourcePitchDelta
  add edi, [esp + 0x14]; destPitchDelta
	; Decrement loop count
  dec [esp + 0x18]	; heightRemaining
  jnz DrawLine


Return:
  pop ebp
  pop edi
  pop esi
  pop ebx
  add esp, 0x18
	}
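
The setup part of that last version is easier to follow in C. The sketch below shows how a scanline is meant to be split into lead pixels, aligned 4-pixel blocks, and tail pixels, with the lead and tail counts merged when there is no aligned block in between. The `LineSplit` struct and `SplitScanline` name are hypothetical.

```c
#include <stdint.h>

typedef struct LineSplit {
    uint32_t numLeadPixels;    /* single pixels before the first aligned DWORD */
    uint32_t numMiddleDwords;  /* aligned blocks of 4 source pixels */
    uint32_t numTailPixels;    /* single pixels after the last aligned DWORD */
} LineSplit;

LineSplit SplitScanline(uintptr_t sourceAddress, uint32_t width)
{
    LineSplit s;

    /* Bytes until the next 4-byte boundary: the neg / and 3 pair. */
    s.numLeadPixels = (uint32_t)(((uintptr_t)0 - sourceAddress) & 3);
    if (s.numLeadPixels > width)
        s.numLeadPixels = width;               /* capped at width */

    uint32_t rest = width - s.numLeadPixels;
    s.numMiddleDwords = rest >> 2;
    s.numTailPixels = rest & 3;

    /* With no aligned block, draw everything in the lead loop. */
    if (s.numMiddleDwords == 0) {
        s.numLeadPixels += s.numTailPixels;
        s.numTailPixels = 0;
    }
    return s;
}
```

For example, a 10 pixel wide line starting at an address ending in 1 splits into 3 lead pixels, 1 aligned block, and 3 tail pixels.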

Sirbomber · « **Reply #3 on:** November 06, 2008, 08:32:20 AM »

Quote

Not likely. At least not in its current form. There is too little gain to justify it.

Yeah, that's what I thought. I mean, how many people would this actually affect? I'd be surprised if anyone here was still playing OP2 on Windows 95.

Now, if you could get rid of the 5-second delay between giving a command and the command being processed...
And while you're at it make it so units don't have to be stuck to a grid.
And when you're done with that you might as well do a remake, just for fun.

For the sake of challenge, do it all by the end of the month.
That sounds reasonable, right Hooman?

Hooman · « **Reply #4 on:** November 06, 2008, 06:48:22 PM »

Wise guy. <_<

And who said anything about Windows 95? I'm running Windows XP, and the changes I'm making don't depend on the version of Windows. Simply put, Outpost 2 uses software rendering, and it's kind of slow. You don't notice it much if the window is at a lower resolution, but I certainly notice it if I try to maximize my Outpost 2 window. I suppose most people have faster computers than me, but then, I'm sort of doing this for me.

Sirbomber · « **Reply #5 on:** November 06, 2008, 08:12:56 PM »

Quote

I suppose most people have faster computers

When I made the Windows 95 comment, I was referring to how this change would only affect people with older, slower computers available when OP2 first came out in '97. I'd imagine most people no longer use those kinds of computers (assuming "computer rot" wouldn't have made them totally inoperable by now).

BlackBox · « **Reply #6 on:** November 07, 2008, 04:09:48 PM »

I notice pretty bad (as in not playable) graphical slowdowns when I play the game at full res (1600x1200)... granted, the computer on which I did this was a P3, 700 MHz. It's only noticeable at these higher resolutions. If you would like someone to test / collect benchmark data, I would be willing to do so.

Although, a more useful fix would probably be to figure out what causes the game to crash when it runs at resolutions in excess of this. If I run the game fullscreen at 2048x1536 (yes, I do have a monitor that is capable of this), it seems to crash when I scroll to the far bottom of a larger map. I used Colony Builder II - Plymouth, Starship, and it crashed after I scrolled to the bottom edge of the map, with the screen showing the approximate horizontal center of that edge.

Hooman · « **Reply #7 on:** November 07, 2008, 07:06:14 PM »

Yeah, I have an 800 MHz Celeron, and I notice it slowing to a crawl at 1280x1024. It runs at about the same rate as it did on my 120 MHz machine at the lower native resolution.

The high resolution crashing bug would also be nice to fix. I believe I found some fixed sized buffers that are probably to blame there. I don't think I have the hardware to test that stuff out though.

I'll maybe try cleaning up the code so other people can run their own timing tests if they want to. Currently, I've just cannibalized a copy of another project to add timing code into it, and a few code hooks. It's not pretty, and you'd need to recompile to test the different replacement functions.

Hooman · « **Reply #8 on:** November 09, 2008, 03:48:23 AM »

Ok, so I've added a few more code hooks so I could time the tile drawing code. At the default resolution, the total drawing time usually varies between 3-6 million cycles. About 600 thousand cycles are spent drawing tiles, and about 200 thousand cycles are spent drawing sprites. That means about 1/8 - 1/4 of the time is spent drawing to the back buffer. That data will need to then be copied to the primary buffer. (The double buffering prevents flickering). That still leaves a significant amount of time that isn't accounted for by moving bits around.

Not surprisingly, the tile drawing code accounted for more time than the sprite drawing code, as it needs to draw more. However, I don't feel there is too much room for improvement here. The time taken for both the sprite and tile drawing code is still only a relatively small part of the total drawing time.

I've been thinking one of the areas for improvement would be to lift some of the clipping code out to a higher level function. As it stands, each bitmap drawn is clipped on its own. This is somewhat wasteful, as much of the clipping results are shared between the edge tiles, and the center tiles don't need any clipping at all. The clipping code also seems to be rather large, so reducing its size, and how frequently it runs, could help with code caching.
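
As a rough illustration of what hoisting the clipping could look like, the sketch below works out, for one axis, which tiles overlap the view at all and which lie completely inside it. Only the tiles outside the inside range would need the clipping path. The names are hypothetical, the 32 pixel tile size is assumed, and coordinates are assumed to be non-negative with `viewStart < viewEnd`.

```c
enum { TILE_SIZE = 32 };   /* assumed tile size in pixels */

typedef struct TileSpan {
    int firstVisible;  /* first tile overlapping the view */
    int lastVisible;   /* last tile overlapping the view */
    int firstInside;   /* first tile completely inside the view */
    int lastInside;    /* last such tile; less than firstInside if none */
} TileSpan;

/* One axis only; viewEnd is exclusive. Run once per axis per frame. */
TileSpan ClassifyTiles(int viewStart, int viewEnd)
{
    TileSpan span;
    span.firstVisible = viewStart / TILE_SIZE;                   /* round down */
    span.lastVisible  = (viewEnd - 1) / TILE_SIZE;
    span.firstInside  = (viewStart + TILE_SIZE - 1) / TILE_SIZE; /* round up */
    span.lastInside   = viewEnd / TILE_SIZE - 1;
    return span;
}
```

With a view covering pixels 40 to 199, tiles 1 and 6 are edge tiles that need clipping, while tiles 2 through 5 can go straight to an unclipped copy.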

It may also be worth further investigating how the unit draw lists are formed. The units have to be drawn in such a way that they overlap correctly. Perhaps there is room for improvement in how this is done.

There are also some remains of a periodic CD check that was emasculated long ago. I suppose the rest of the calling code can be removed to save a little bit of time. Of course it only checks once every 4096 ticks, so I doubt there'd be any noticeable change in speed from that. But why not, it's easy to do.

Hooman · « **Reply #9 on:** November 16, 2008, 08:05:44 AM »

I've been further documenting the detail pane drawing code. I've got most of the Viewport class member variables worked out.

The Viewport class has 5 arrays, each with 400 bytes in them. The first two are bit vectors of tiles that need to be redrawn. The last two are bit vectors of tiles that are within the sight range of a unit. It does a bit of pointer swapping between each pair of buffers, so you have both a new and old copy around. These bit vectors are only for the tiles within the current view area.
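
For anyone unfamiliar with the pattern, here is a minimal sketch of a pair of 400 byte bit vectors with pointer swapping. None of these names come from the game. The bit order within a byte, and clearing the new buffer after a swap, are assumptions on my part.

```c
#include <stdint.h>
#include <string.h>

enum { TILE_BITS_BYTES = 400 };

typedef struct TileBitPair {
    uint8_t bufferA[TILE_BITS_BYTES];
    uint8_t bufferB[TILE_BITS_BYTES];
    uint8_t *current;    /* bits for the frame being built */
    uint8_t *previous;   /* bits left over from the last frame */
} TileBitPair;

void InitTileBits(TileBitPair *pair)
{
    memset(pair->bufferA, 0, sizeof pair->bufferA);
    memset(pair->bufferB, 0, sizeof pair->bufferB);
    pair->current = pair->bufferA;
    pair->previous = pair->bufferB;
}

/* Assumed bit order: tile index 0 is the lowest bit of byte 0. */
void SetTileBit(uint8_t *bits, unsigned tileIndex)
{
    bits[tileIndex >> 3] |= (uint8_t)(1u << (tileIndex & 7));
}

int TestTileBit(const uint8_t *bits, unsigned tileIndex)
{
    return (bits[tileIndex >> 3] >> (tileIndex & 7)) & 1;
}

/* Swap roles instead of copying 400 bytes, then start the new frame empty. */
void SwapTileBits(TileBitPair *pair)
{
    uint8_t *temp = pair->current;
    pair->current = pair->previous;
    pair->previous = temp;
    memset(pair->current, 0, TILE_BITS_BYTES);
}
```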

Note that since there are only 400 bytes, this means there can be at most 400*8 = 3200 tiles visible. If the detail pane took up the entire viewable area, this corresponds to a maximum resolution of a little over 2048x1536. (2048/32 = 64, 1536/32 = 48, 64 x 48 = 3072 tiles). I had suspected these buffers before for that high resolution bug, but it seems they might actually be big enough. I guess the problem lies elsewhere.
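
The arithmetic can be checked with a few lines of C. The sketch below also covers the case where the view starts part-way into a tile, so that one extra column and row are partly visible. Whether the game can actually scroll to such a position is an assumption, and the function names are made up.

```c
enum { TILE_SIZE = 32, BIT_VECTOR_BYTES = 400 };

/* Number of tiles one bit vector can describe. */
unsigned BitVectorCapacity(void)
{
    return BIT_VECTOR_BYTES * 8;
}

/* Tiles touched along one axis when the view starts on a tile boundary. */
unsigned TilesAcrossAligned(unsigned pixels)
{
    return (pixels + TILE_SIZE - 1) / TILE_SIZE;
}

/* Worst case along one axis when the view starts anywhere within a tile. */
unsigned TilesAcrossWorstCase(unsigned pixels)
{
    return (pixels + TILE_SIZE - 2) / TILE_SIZE + 1;
}

unsigned TilesAligned(unsigned widthPx, unsigned heightPx)
{
    return TilesAcrossAligned(widthPx) * TilesAcrossAligned(heightPx);
}

unsigned TilesWorstCase(unsigned widthPx, unsigned heightPx)
{
    return TilesAcrossWorstCase(widthPx) * TilesAcrossWorstCase(heightPx);
}
```

For 2048x1536 this gives 3072 tiles when aligned and 65 x 49 = 3185 in the worst case, both under the 3200 bit capacity, which fits with the buffers being big enough.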

I still need to figure out one of the functions for the unit draw list. It does some processing between when the list of units is built, and when it is displayed. Some of it seems to be related to shadows, and marking background areas to be redrawn.

Savant 231-A · « **Reply #10 on:** November 16, 2008, 10:13:43 AM »

(although I don't understand any of Hooman's words)
Respect!

That means... errm... keep up the good work
