Compiler-produced code is often pretty good. It depends, of course, on the quality of the code in the source language: you can always write terrible code that an optimizer can't do much with. I don't think hand-optimizing everything is really the proper way to reduce bloat, though.
The real culprit, I think, is library use. Trimming out unused functions has done a lot to reduce compiled executable sizes, but plenty of extra code still gets brought in:

- Rarely used code paths. A path may even be impossible in practice, yet the code it references still gets linked in.
- Library functions that provide far more functionality than is strictly needed, dragging their support code in with them.
- A generic function plus a set of specialized variants. If the program hits several specific cases and one general case, all of them get linked in, even though the one generic function could have handled everything.
- Code included to support legacy systems.
- String encodings that are selected at runtime, so every way of handling strings has to be present.
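A tiny illustration of the pull-in problem, under the classic static-linking model where the unit of linking is a whole object file (the file and symbol names here are made up for the example): a program that calls only tiny_helper() still gets huge_table and general_convert(), because they live in the same translation unit.

```c
/* biglib.c -- one translation unit in a hypothetical static library */

/* A large table used only by the general-purpose routine. */
static const unsigned char huge_table[64 * 1024] = { 1 };

/* The heavyweight general-purpose entry point. */
int general_convert(int x) {
    return huge_table[x & 0xFFFF];
}

/* The tiny helper the program actually wants. */
int tiny_helper(int x) {
    return x + 1;
}
```

```c
/* main.c -- references only the tiny helper, but a classic linker
 * still pulls in all of biglib.o, huge_table included. */
int tiny_helper(int x);

int main(void) {
    return tiny_helper(41) == 42 ? 0 : 1;
}
```

Splitting the library into one function per source file is the old-school fix; section-based garbage collection (below) is the newer one.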
There are many reasons why more code can get pulled in than you'd expect. Smarter linkers help cut down on the bloat, but there's only so much you can reasonably expect them to do.
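GCC/binutils section garbage collection is one example of what linkers can do: compile each function and datum into its own section, then let the linker discard unreferenced sections. A sketch of the build, continuing the example above:

```
# Plain link: whole object files go in, huge_table and all.
cc -c biglib.c main.c
cc main.o biglib.o -o plain

# Per-section link: unreferenced functions and data get discarded.
cc -ffunction-sections -fdata-sections -c biglib.c main.c
cc main.o biglib.o -Wl,--gc-sections -o trimmed

size plain trimmed    # the text/rodata difference is the dead weight
```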
As for optimization settings, optimizing for size often produces the fastest code, since smaller code fits better in the instruction cache. When it doesn't produce the fastest code, it's usually not much slower than speed-optimized code. I optimize for size by default.
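This is easy to test on your own code. With GCC or Clang the size-optimization level is -Os and the usual speed levels are -O2/-O3 (prog.c here is a stand-in for whatever you're building):

```
cc -O2 prog.c -o prog_speed      # optimize for speed
cc -Os prog.c -o prog_size       # optimize for size
size prog_speed prog_size        # compare code size
time ./prog_speed                # rough speed comparison
time ./prog_size
```

Whether -Os actually wins on speed depends on how cache-bound the workload is, so it's worth measuring rather than assuming either way.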