Author Topic: CPU Cycle Cost  (Read 4750 times)

Offline lordpalandus

  • Banned
  • Hero Member
  • *****
  • Posts: 825
CPU Cycle Cost
« on: February 25, 2016, 02:29:14 AM »
Hi, I have a question for the more technically minded people here regarding the cost of CPU Cycles for doing certain coding structures. In the grand scheme of things it may not be entirely useful to know, but I'm interested in knowing.

By reading through a tutorial explaining C++, it states that for most nested loops or nested if-else statements, a branch should be implemented as a function call instead of having a loop or if-else statement. I know that to call the function, the program gives control up to the function until it reaches a return statement at which point control is given back to whatever initially made the function call. So, here is the question:

What costs more in CPU cycles? Performing a function call or executing a loop or executing an if-else statement? Assuming that the calculation would be performed in exactly the same way (ie if the calculation was to add two numbers together, and then output the result to the screen), which would cost more CPU cycles to perform, or would they have an equal CPU cost?

I ask because I'm curious if excessively using function calls instead of loops or if-else statements to make a program easier to read has some kind of program cost at run time.
Currently working on Cataclysm of Chaos, Remade.
Link to OPU page = http://forum.outpost2.net/index.php/topic,6073.0.html

Offline leeor_net

  • Administrator
  • Hero Member
  • *****
  • Posts: 2352
  • OPHD Lead Developer
    • LairWorks Entertainment
Re: CPU Cycle Cost
« Reply #1 on: February 25, 2016, 11:37:20 AM »
This is entirely based on the context of the code, the compiler used and its settings.

In debug mode, most compilers will just spit out what you write in machine instructions call for call. The idea is to make it easy to find logic mistakes. When set to release, optimizing compilers will find common code path patterns and optimize them via loop unrolling, using bitshifts when dividing by 2, copying function code into the body of the code instead of making function calls, etc.

But again -- it's very context dependent. Hooman knows these optimizations probably a lot better than I do but I also know that premature optimization is a waste of time and effort when instead you should build your code and then use a profiler to see where your code is spending most of its time and optimize that.

Offline Vagabond

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1015
Re: CPU Cycle Cost
« Reply #2 on: February 25, 2016, 02:16:38 PM »
I agree with Leeor. Favor readability and reusability of code base over optimization unless you are encountering performance problems. Then profile to determine the problem instead of guessing.

Typically calling a function from a for loop will `cost more` than placing the body of the function inside the for loop. However, long functions or functions with multiple nested for loops or if/else statements are difficult to read and debug.

Offline lordpalandus

  • Banned
  • Hero Member
  • *****
  • Posts: 825
Re: CPU Cycle Cost
« Reply #3 on: February 25, 2016, 11:51:24 PM »
Personally, I do agree with you both. I do prefer readability over optimization; a more easily understandable program is far more valuable to me right now as a noob at programming, than optimization.

I'm just curious at the actual costs of doing these things for future interests, where squeezing every last bit of optimization possible would be important.
Currently working on Cataclysm of Chaos, Remade.
Link to OPU page = http://forum.outpost2.net/index.php/topic,6073.0.html

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
Re: CPU Cycle Cost
« Reply #4 on: February 26, 2016, 12:27:45 AM »
I'm rather curious about this source article. There may be something context specific in it. Do you have a link?

A function call is an unconditional branch, which makes it non-equivalent to an if/else or a loop, which uses conditional branches. You can not replace a conditional branch with an unconditional one and maintain equivalent code behaviour. If you move an inner loop to another function, you'll still need the conditional branch structure for the looping inside the function, plus you'll be adding the overhead of the unconditional branch to enter and leave the function.

Function calls can add significant overhead in some cases. In the case of adding two numbers together, a function call would add significant overhead. Inlined code could be as simple as a single add instruction, which on old CPUs take 1 clock cycle, and on newer CPUs can complete in parallel with other instructions, giving an average cost of less than 1 clock cycle. A function call to add two numbers would likely contain two push instructions to load the parameters on the stack, the call, two memory loads to load the parameters, the add, and the return. Such a simple function would likely omit function prologue and epilogue code, so there are some cost savings there, but you're still looking at 7 instructions. Using the fastcall convention, you can pass the parameters in registers, which if already conveniently loaded in the correct registers, means you can omit the two push instructions, and the two memory loads inside the function, in which case there are still 3 instructions needed. Using a function call may also have other secondary effects on optimization. The fastcall convention uses a specific set of registers to pass the values, which can increase the pressure on the register allocator, whereas an inline add could work just fine with any registers. The call and return may have implications for branch prediction, which can further increase call costs. Further, passing parameters on the stack means memory is touched, which is slower than using values in registers, although for such a simple case, the cache memory should catch this so it shouldn't incur the full cost of actually writing and reading back the values from main system RAM.

Of course if optimizations are turned on (which they won't be for a debug compile), the compiler could just inline a function call, producing equivalent output whether you wrote the code inline or as a function call. The way C++ works though, with separate compilation units and a linker, the structure of your source code could prevent this. If you wrote both functions in the same source file, or placed one function as inline code in a header file (making them both visible in the same compilation unit), and do a release build, there's a good chance the compiler will inline a small function. If you place the two functions in separate source files they probably won't benefit from inlining.


As to whether or not you should break up code into functions, that's an entirely different question. My general rule of thumb is if your function is longer than a screen full of code, you should break it up. Code readability and maintainability is usually far more important than the performance impacts of a few clock cycles. You get billions of clock cycles per second, they're not so special. You won't ever notice the overhead of a single function call. It's difficult to even measure using the high performance counters that count clock cycles. There's too much noise in the results from other effects, like caching, pipelining, branch prediction, and the general superscalar nature of modern CPUs that issue multiple instructions per clock cycle. You really need to aggregate a large number of function calls before the performance impacts become measurable, and more still before they become noticeable. For the vast majority of code, it simply doesn't matter, it will never be called often enough to matter. The usual advice is always write for readability and maintainability first. Don't worry about optimization until you have data to say it matters.

Offline lordpalandus

  • Banned
  • Hero Member
  • *****
  • Posts: 825
Re: CPU Cycle Cost
« Reply #5 on: February 26, 2016, 02:50:58 AM »
Thanks for the indepth and technical reply, Hooman.

There isn't a source article. Just an idea that at some point I'd like to look into for personal interests. You are right in the grand scheme of things for most programs or games, it wouldn't matter.

If you are wondering what my personal interest is, well, I'm a fan of large-space based battles. My interest is being able to create a game that runs at 60+ FPS that looks like that scene from Episode 3: Revenge of the Sith (at the start of the movie), with all those ships, all those weapon blasts, at the SAME polygon count and SAME texture quality of it as well. Even considering that the movie is several years old, and technology has advanced quite a bit, one still must remember that the entire scene was pre-rendered likely on a render farm, frame per frame and certainly not at real-time. So my personal interest is to reproduce that at 60 FPS which many would likely say is impossible considering the amounts of calculations, RAM requirements, etc... that would be needed to run that at 60 FPS. However, I feel that by utilizing EXTREME levels of optimization, utilizing old rendering techniques that have been made "obsolete" and utilizing old RAM boosting techniques it "may" be feasible (maybe not at 60 fps, but above single digits). However, the amount of work and research still left to be done to determine if something like this would be even partially feasible is enormous and quite honestly, I'm trying to keep myself on task and focus on learning what I need for making a RTS much less, something of this scale or complexity. However, I still feel learning about the CPU cost of functions is useful right now, so thanks for answering that.
Currently working on Cataclysm of Chaos, Remade.
Link to OPU page = http://forum.outpost2.net/index.php/topic,6073.0.html

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
Re: CPU Cycle Cost
« Reply #6 on: February 26, 2016, 03:44:26 AM »
For something of massive scale, the solution is often to fake it rather than do an in depth calculation. For all the computation that might go into making a movie scene on a rendering farm for a big budget movie, a standard computer can still playback a video of that render. If the action far off in the distance is simply a movie playing, how can the user ever tell? You can design the world big enough so the player can't get close enough in any reasonable amount of time, or even design the movie so things appear and disappear in such a way that the movie can always be played back at a fixed distance, and the player would never really be able to tell. Think ships warping in and out, or blowing up and disappearing. If nothing remains fixed for long enough, nothing can be approached. Only populate the world with real objects that are close enough to interact with.

Offline lordpalandus

  • Banned
  • Hero Member
  • *****
  • Posts: 825
Re: CPU Cycle Cost
« Reply #7 on: February 26, 2016, 03:20:13 PM »
Interesting. What the user/player doesn't know won't hurt their playing experience. If the background changed sufficiently enough, and no identifiable patterns were noticed (ie there are those players that will stop and watch the background to see if it is a background or not) then you are right that user/players wouldn't be able to tell the difference.

Currently working on Cataclysm of Chaos, Remade.
Link to OPU page = http://forum.outpost2.net/index.php/topic,6073.0.html

Offline leeor_net

  • Administrator
  • Hero Member
  • *****
  • Posts: 2352
  • OPHD Lead Developer
    • LairWorks Entertainment
Re: CPU Cycle Cost
« Reply #8 on: July 22, 2016, 04:14:54 PM »
Quote
owever, I feel that by utilizing EXTREME levels of optimization, utilizing old rendering techniques that have been made "obsolete" and utilizing old RAM boosting techniques it "may" be feasible (maybe not at 60 fps, but above single digits)

Quote
For something of massive scale, the solution is often to fake it rather than do an in depth calculation.

Bingo. Right on the money.

I think you need to do a little research into real time graphics and how you can produce the deisred results with minimal resource usage. That's got a lot less to do with code optmizations and a lot more to do with GPU/Shader optimization. More importantly... it's about faking it really really well. Even if you're zoomed in all the way on such a battle you're only going to be seeing so much of the scene at once.

See Sins of a Solar Empire for ways you can fake it. There are MASSIVE, almost EPIC space battles with sometimes literally hundereds of ships and projectiles flying around and yet it tends to maintain 30 - 60 FPS.