Like I said, I didn't bother to read into your round starting/ending logic too much. There was a lot of code, and it looked somewhat twisted. I just figured that the point where everything is created and the point where everything dies were safe places for demonstration purposes. I wasn't trying to finish your level for you.
Hmm, you seem to have missed one of my SVN log comments. That's not a good way to keep track of the paused count.
Consider how that assembly block works. It jumps (JMP) to the target function, which means it doesn't push a return address the way a CALL would. When the jumped-to function finishes, it executes a return (RET) instruction. That instruction takes whatever is on the top of the stack, removes it, and jumps to it. Now, if that assembly block is placed in its own little function with no local variables (like the example function I wrote), then the value on the top of the stack will be the return address from that function. But if you place it in a larger function, or try to put code after the jump, guess what: it's not returning there. It's returning to the parent function and skipping any code beyond that point (or crashing because it read a local variable instead of a return address). In other words, the trick relies on the "CALL" being the last instruction in a function with an equivalent parameter list, which is what allows it to be optimized to a JMP to avoid the extra step while returning.
If you want that to work, then remove those ASM blocks, and just call the function I provided. It's important that it's actually a call.
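To make the "just call it" advice concrete, here is a minimal C sketch. All three function names are hypothetical stand-ins (I'm not reproducing the actual function from the repository), and the arithmetic in target() is only there so the result can be checked.

```c
/* Hypothetical stand-in for the real target function. */
int target(int a, int b)
{
    return a * 10 + b;
}

/* The call is the last thing this function does and the parameter
 * lists match, so a compiler is free to turn the CALL into a JMP. */
int forward(int a, int b)
{
    return target(a, b);
}

/* Work follows the call here, so control has to come back to this
 * function. Written as an ordinary call, the compiler emits a real
 * CALL and the increment below still runs. */
int forward_then_count(int a, int b, int *counter)
{
    int result = target(a, b);
    ++*counter;
    return result;
}
```

Both wrappers return the same value as target(). The difference is that forward_then_count() has code after the call, which is exactly the situation where a hand-written JMP would go wrong.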
Attempted simple explanation:
f():
1: ... f()'s code
2: CALL g(1, 2)
3: ... rest of f()'s code
g(int a, int b):
4: ... g()'s code
5: PUSH b
6: PUSH a
7: CALL h(a, b)
8: RET 8
h():
9: ... h()'s code
10: RET 8 // Remove 2 parameters (8 bytes) from the stack while returning
Stack contents after line X:
2: <2> <1> <Address of 3>
7: <2> <1> <Address of 3> <b=2> <a=1> <Address of 8>
10: <2> <1> <Address of 3>
8: [empty]
Notice the similarity between the first 3 values and the last 3 values on the stack after executing line 7. This duplication can be optimized away by noting that in g there is no real code after the function call to h, and that the parameter lists of the two functions are identical. Instead of duplicating the parameter list and then removing it twice, we can just feed through the first set.
f():
1: ... f()'s code
2: CALL g(1, 2)
3: ... rest of f()'s code
g(int a, int b):
4: ... g()'s code
7: JMP h(a, b)
h():
9: ... h()'s code
10: RET 8 // Remove 2 parameters (8 bytes) from the stack while returning
Stack contents after line X:
2: <2> <1> <Address of 3>
7: <2> <1> <Address of 3>
10: [empty]
Note that this removed the PUSH instructions for the parameter passing, and the RET instruction after the original CALL. The CALL was then replaced by a JMP.
Now, what you've done is make the code after the replaced CALL non-empty. But at line 10, when h() returns, it sees the return address pointing into f(), so it skips completely over any remaining code in g(). This optimization is not safe in that case.
This is almost like appending h onto the end of g, and to make it work, the call, where the two functions are joined, must be the last thing in g.
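If it helps, the two listings can be replayed on a toy stack machine. This is only a sketch under my own assumptions: the opcodes, the struct names, and the L_* labels are invented for illustration, return addresses are plain instruction indices, and RET counts parameter slots rather than bytes. It also gives g some trailing code in both versions, which is the case you ran into. It doesn't model local variables, so it shows the "skipped code" failure but not the crash.

```c
enum op { OP_WORK, OP_PUSH, OP_CALL, OP_JMP, OP_RET, OP_HALT };
enum label { L_F_REST, L_G_BODY, L_G_TAIL, L_H_BODY };

struct insn {
    enum op op;
    int arg; /* WORK: label, PUSH: value, CALL/JMP: target index,
                RET: number of parameter slots to discard */
};

#define STACK_MAX 32
#define TRACE_MAX 32

struct machine {
    int stack[STACK_MAX];
    int sp;               /* number of used stack slots */
    int trace[TRACE_MAX]; /* labels of the WORK instructions that ran */
    int ntrace;
};

/* Runs prog from entry. Returns 0 on HALT, -1 on any error. */
int run(const struct insn *prog, int nprog, int entry, struct machine *m)
{
    int pc = entry;
    m->sp = 0;
    m->ntrace = 0;
    for (int steps = 0; steps < 1000; steps++) {
        if (pc < 0 || pc >= nprog) return -1;
        struct insn in = prog[pc];
        switch (in.op) {
        case OP_WORK:
            if (m->ntrace >= TRACE_MAX) return -1;
            m->trace[m->ntrace++] = in.arg;
            pc++;
            break;
        case OP_PUSH:
            if (m->sp >= STACK_MAX) return -1;
            m->stack[m->sp++] = in.arg;
            pc++;
            break;
        case OP_CALL: /* push the return address, then jump */
            if (m->sp >= STACK_MAX) return -1;
            m->stack[m->sp++] = pc + 1;
            pc = in.arg;
            break;
        case OP_JMP:  /* jump without pushing anything */
            pc = in.arg;
            break;
        case OP_RET:  /* pop the return address, then drop the parameters */
            if (m->sp < 1 + in.arg) return -1;
            pc = m->stack[--m->sp];
            m->sp -= in.arg;
            break;
        case OP_HALT:
            return 0;
        }
    }
    return -1;
}

/* g calls h with CALL, then runs its trailing code and returns. */
int run_with_call(struct machine *m)
{
    static const struct insn prog[] = {
        {OP_PUSH, 2}, {OP_PUSH, 1}, {OP_CALL, 5},  /* f: CALL g(1, 2) */
        {OP_WORK, L_F_REST}, {OP_HALT, 0},
        {OP_WORK, L_G_BODY},                       /* g starts at index 5 */
        {OP_PUSH, 2}, {OP_PUSH, 1}, {OP_CALL, 11},
        {OP_WORK, L_G_TAIL}, {OP_RET, 2},
        {OP_WORK, L_H_BODY}, {OP_RET, 2},          /* h starts at index 11 */
    };
    return run(prog, (int)(sizeof prog / sizeof prog[0]), 0, m);
}

/* Same program, but g reaches h with JMP and still has trailing code. */
int run_with_jmp(struct machine *m)
{
    static const struct insn prog[] = {
        {OP_PUSH, 2}, {OP_PUSH, 1}, {OP_CALL, 5},  /* f: CALL g(1, 2) */
        {OP_WORK, L_F_REST}, {OP_HALT, 0},
        {OP_WORK, L_G_BODY}, {OP_JMP, 9},          /* g starts at index 5 */
        {OP_WORK, L_G_TAIL}, {OP_RET, 2},          /* never reached */
        {OP_WORK, L_H_BODY}, {OP_RET, 2},          /* h starts at index 9 */
    };
    return run(prog, (int)(sizeof prog / sizeof prog[0]), 0, m);
}
```

With the CALL version the trace comes out as g's body, h's body, g's tail, then the rest of f. With the JMP version g's tail never shows up, because h's RET pops the address that f pushed and goes straight back there. The stack ends up empty in both cases, which is why the bug is silent rather than an immediate crash.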