Author Topic: Interesting Compiler Behavior (Read 1936 times)

Hooman · « **on:** August 24, 2008, 11:11:20 PM »

I came across something peculiar that I've been puzzling about. I played around with the source a bit, trying to find exactly what conditions generates this kind of code. I'm not entriely sure why, but with code something like this:

Code: [Select]

typedef unsigned int  uint;
uint T(int a, int b)
{
	int d;
	int r;

	// Interesting assembly code generated by:
	d = b - a;  // Difference
	r = d * 8;  // Multiply by power of 2

	return r;	// Convert to unsigned
}

You get assembly code that looks like this:

Code: [Select]

// Note: Turn on "Optimize for Speed"
?T@@YAIHH@Z PROC NEAR    ; T, COMDAT
; Line 22
	mov	ecx, DWORD PTR _a$[esp-4]
	mov	eax, ecx; a
	shl	eax, 29	; ?  ; 0000001dH
	sub	eax, ecx; ?  "-a"
	mov	ecx, DWORD PTR _b$[esp-4]
	add	eax, ecx; d = b - a  [(-a) + b]
	shl	eax, 3	; r = d * 8
; Line 23
	ret	0
?T@@YAIHH@Z ENDP

I don't really understand why there is a SHL EAX, 29 instruction. Note that we are multiplying by 8, which accounts for the shift by 3 near the end (8 = 2^3). This code is only generated when multiplying by a power of 2, and the two shift constants always add up to 32. Further, this code only seems to be generated under the three conditions in the source comments.

So, why is SHL EAX, 29 particularly odd here? Well, it will fill the lower 29 bits with 0s, and the old lower 3 bits will become the new upper 3 bits. Now, the SUB and ADD instructions that follow work in such a way that the lower bits can affect the upper bits, but not the other way around. That is, those new upper 3 bits can only affect the upper 3 bits of the result. But then you have the SHL EAX, 3, which will shift those 3 bits out and discard them.

The only possible reason that I can guess at for the SHL EAX, 29, is to affect the flags register. I'm thinking maybe it does something special to the carry or overflow flags. Of course these flags aren't used anywhere here, so why bother generating special (and extra) code to affect them. Plus, from the standard, the upper bits of a result that overflows simply get discarded. The only way I can think of to use the carry or overflow flags might be to use certain boolean operators like <, <=, >, >=. (The use of carry/overflow is controlled by the unsigned/signed part of the type).

Or, if instead you optimized for size, you get:

Code: [Select]

// Note: Turn on "Optimize for Size"
?T@@YAIHH@Z PROC NEAR    ; T, COMDAT
; Line 22
	mov	eax, DWORD PTR _a$[esp-4]
	imul	eax, 536870911  	; 1fffffffH
	add	eax, DWORD PTR _b$[esp-4]
	shl	eax, 3
; Line 23
	ret	0
?T@@YAIHH@Z ENDP    ; T

I'm not too sure where that IMUL comes from.

Another oddity is that if the return type is changed from uint to int, then it simply generates:

Code: [Select]

?T@@YAHHH@Z PROC NEAR    ; T, COMDAT
; Line 22
	mov	eax, DWORD PTR _b$[esp-4]
	mov	ecx, DWORD PTR _a$[esp-4]
	sub	eax, ecx
	shl	eax, 3
; Line 23
	ret	0
?T@@YAHHH@Z ENDP    ; T

You also get this same (simple) generated code by eliminating the local variables, even under different optimization schemes, such as Speed, Size, or Disabled (Debug). Although, the Disabled/Debug version does generate a few extra instructions for stack maintenance, but they don't affect the calculation at all.

That is, changing the code to:

Code: [Select]

uint T(int a, int b)
{
	return (b-a) * 8;
}

produces the same results as changing the return type to int did.

TH300 · « **Reply #1 on:** August 25, 2008, 01:29:27 PM »

Just some guesses from a person who doesn't understand too much of assembly...

Lets start with the size-optimized code, because its simpler.

Code: [Select]

// Note: Turn on "Optimize for Size"
?T@@YAIHH@Z PROC NEAR  ; T, COMDAT
; Line 22
mov eax, DWORD PTR _a$[esp-4]
imul eax, 536870911  ; 1fffffffH
add eax, DWORD PTR _b$[esp-4]
shl eax, 3
; Line 23
ret 0
?T@@YAIHH@Z ENDP  ; T

The thing that was not clear to you is the imul instruction. Normally, one would expect a multiplier of FFFFFFFF, because that would turn a into -a. But in that case the following add could cause an overflow if the result is in the upper uint range (i.e. greater than 2^31). By multiplying with 1FFFFFFF we get enough space to prevent an overflow and the information thats lost would be lost anyway. What happens?

mov...: eax = a
imul...: eax = (2000,0000 - 1) * a
add...: eax = (2000,0000 - 1) * a + b
shl...: eax = (2000,0000 * a - a + B) << 3 = (2000,0000 << 3) * a + ((b - a) << 3) = (b - a) << 3

At least this proofs that the result should be correct. Now, the other assembly:

Code: [Select]

// Note: Turn on "Optimize for Speed"
?T@@YAIHH@Z PROC NEAR  ; T, COMDAT
; Line 22
mov ecx, DWORD PTR _a$[esp-4]
mov eax, ecx; a
shl eax, 29; ?; 0000001dH
sub eax, ecx; ?  "-a"
mov ecx, DWORD PTR _b$[esp-4]
add eax, ecx; d = b - a  [(-a) + b]
shl eax, 3; r = d * 8
; Line 23
ret 0
?T@@YAIHH@Z ENDP

The interesting part is

Code: [Select]

shl eax, 29; ?; 0000001dH
sub eax, ecx; ?  "-a"

So, what does that do?

eax = a << 29
eax = (2000,0000 * a) - a

We see, it does essentially the same thing in both variants.

Hooman · « **Reply #2 on:** August 28, 2008, 08:02:29 AM »

Interesting how you showed the equivalence.

Of course this still begs the question, why not just calculated (b-a) << 3 directly?

I suspect this may just be some obscure internal compiler detail that just happens to show up under obscure conditions.

News:

Author Topic: Interesting Compiler Behavior (Read 1936 times)

Hooman

Interesting Compiler Behavior

TH300

Interesting Compiler Behavior

Hooman

Interesting Compiler Behavior