Ahh gee, I didn't see your post until after we finished.
So the OllyDbg session went well. We were able to reproduce the bug while watching things unfold with OllyDbg. We spent a bit of time on initial setup getting the values we needed, and setting breakpoints at reasonable places. Brett did a quick run without OllyDbg to see if the bug could be reproduced by restarting the level, without having to go through the main menu, as that would save us quite a bit of time. After about 15-20 tries we found that was indeed possible. We then got setup in OllyDbg, and repeated the process. As the unit array potentially moved around each time the level was restarted, we found we had to clear and reset the hardware breakpoint on the field we were watching each round. This was a very tedious process. Brett managed to successfully reproduce the bug under OllyDbg after about 15-20 rounds. We kind of noticed something peculiar was up slightly before we witnessed what we were looking for. The
next pointer on the unit record of a vehicle in a garage was first unexpectedly cleared to 0, and then shortly after was set to the -1 value we were looking for. As expected, the game crashed when we examined the garage contents after this.
The two writes before the crash were done in
Map.UnitIndexToPtrs() and
Map.SaveUnits. These were both functions that had been previously analysed, and were responsible for the translation between unit indexes and unit pointers during saving/loading. It was also related to the corruption of the value mentioned in earlier posts. Units have their
next pointer set to -2 when loaded into a garage. This -2 value is not handled correctly by the conversion done during saving/loading. With a bit of luck, this corruption can lead the value to be converted back to 0. Normally a value of -1 is special cased to map to 0, and no valid value should map to 0. This means that during the reverse process, the 0 is special cased and maps to -1. Hence the -2 value is corrupted, and then converted 0, which is then mapped to -1, which indicates a dead unit, and so the game stops mapping the unit type back to a virtual function table, which leads to the crash.
To exhibit this bug, the -2 value must be corrupted in such a way that the reverse process maps the corrupted value back to 0. For this to happen, it depends on the address of the unit array. Each time the level is run, the unit array is reallocated, and may move around in memory. Certain addresses will trigger corruption in a way that the value gets mapped to 0. Hence the bug does not always exhibit itself every run.
This confirms the bug comes directly from the conversion process of the -2 value. There are no other factors at play here. This also shows the bug is in the game itself, and not part of the level. Any level with a garage, with units loaded into it, which is saved twice, runs the risk of corrupting the unit records of the stored units, and whether or not this happens depends on where in memory the unit array happened to be allocated.
The good news is, we can patch this. It shouldn't be too hard to write a replacement function that includes special casing for the -2 value as well as the -1 value. That should stop the corruption from happening during saving/loading. It won't fix already corrupted saved games, but it should prevent the problem from happening again. Perhaps we can roll something out in the next update of Outpost 2.
I may post further details later on about the exact sequence of events and values, and sample problem memory locations for the unit array. For now though, I'll leave the addresses of the two final writes that lead to values of 0 and -1 in the
next field.
In
Map.UnitIndexesToPtrs() the value 0 is unexpectedly stored at:
00435D34 MOV DWORD PTR SS:[EBP],EAX ; unit.ptr[j]* = [Unit*]nextUnit
In
Map.SaveUnits(StreamIO* savedGameFile) the value -1 is stored at:
00435BC9 MOV DWORD PTR DS:[EDI],-1 ; Unit.ptr[j] = -1
Of particular note, is how unexpected the write of 0 was. The larger code sequence is:
00435D1B |/MOV EAX,DWORD PTR SS:[EBP] ; EAX = [Unit*]currentUnit.ptr[j]*
00435D1E ||CMP EAX,-1 ; Check if ([Unit*]ptr == -1) [IsDead]
00435D21 ||JE SHORT Outpost2.00435D39 ; -> Set unit.next* = 0
00435D23 ||SHL EAX,3 ; [EAX = unitIndex * 8]
00435D26 ||LEA EDX,DWORD PTR DS:[EAX+EAX*2] ; [EDX = unitIndex * 24]
00435D29 ||LEA EAX,DWORD PTR DS:[EDX+EDX*4] ; [EAX = unitIndex * 120]
00435D2C ||MOV EDX,DWORD PTR DS:[<Map.unitArray[]*>] ; EDX = Map.unitArray[]*
00435D32 ||ADD EAX,EDX ; EAX = &unitArray[unitIndex]
00435D34 ||MOV DWORD PTR SS:[EBP],EAX ; unit.ptr[j]* = [Unit*]nextUnit
00435D37 ||JMP SHORT Outpost2.00435D40 ; -> Skip instruction
00435D39 ||MOV DWORD PTR SS:[EBP],0 ; unit.ptr[j]* = 0 [null]
00435D40 ||ADD EBP,4 ; EBP = &ptr[j+1]
00435D43 ||DEC ECX ; numPtr--
00435D44 |\JNZ SHORT Outpost2.00435D1B
The bad write of 0 occurs at
00435D34. This is a calculated 0 value. It occurs when the corrupted field value is such that
&Map.unitArray[corruptedIndex] == 0. For that to happen, we must have
corruptedIndex * 120 = -&Map.unitArray, which is a rather bizarre thing to have happen. (Here
sizeof(Unit) == 120). This is in contrast to the special cased write of 0 that would normally happen at
00435D39, which happens when an index value of -1 is converted to a nullptr.