Thanks for posting.
I liked the reference to Douglas Crockford's JavaScript: The Good Parts and his follow-up jab at JavaScript.
Good insights into treating your code as a manuscript that is meant to be read by others. I agreed with the curly brace comments for separating the function arguments from the body of the function.
I really appreciated the parts about not justifying argument lists or comments at some arbitrary indentation like:
int FairlyLongFunctionName(int argument1,
                           int argument2);
It always creates a nightmare when refactoring a codebase by changing the function name size.
Or not separating the list on different sides of the screen (which I am guilty of fairly often):
int ReallyLongFunctionNameThatShouldBeShortened(int argument1,
                                                int argument2);
I used to be really good about trying not to spill past 80 chars on a line. In fact, when I was a newer programmer, I used to print my code out and edit it on paper with a pen. Then I'd go in and make the changes. I found myself constantly fiddling with how to break the code across multiple lines when long functions were encountered. It also made for annoying refactors when function names or the number of arguments changed, which is part of the reason I stopped doing it.
I still need to work on my unit test writing as well, which the video reminded me of.
-Brett
Without having watched the presentation yet (now I really want to get into it), the last example you have with the really long function name, at least for me, indicates that the function does too much and needs to be refactored.
As for having too many parameters, except in rare cases (like in NAS2D::Renderer's drawing functions which make sense to have a lot of parameters for drawing functions), generally I will break off many of those parameters into a struct or public class and take a reference to that instead. E.g.:
void someContrivedFunction(int with, int lots, const std::string& of, const std::string& parameters, float that, float are, float better, char* off, char* in, char* a, char* structure)
{}
would become:
/**
* This can now be stuffed into a header somewhere so
* it's out of the way. Also, the struct can be better
* documented versus having a giant documentation block
* for the function parameter list.
*/
struct StructOfRelatedData
{
    int Int1;
    int Int2;
    const std::string String1;
    const std::string String2;
    float Float1;
    float Float2;
    float Float3;
    char* Buffer1;
    char* Buffer2;
    char* Buffer3;
    char* Buffer4;
};
/**
* Much cleaner interface. Also, less likely to run into cache misses
* or whatever other issues come about with functions that have huge
* parameter lists.
*/
void someContrivedFunction(StructOfRelatedData& _s)
{}
Again, this code is very much contrived and assumes that the data moved into a struct is used in multiple places.
Hell, even the NAS2D::Renderer draw functions I mentioned could be better served by a different design paradigm (e.g., having a separate function call that sets drawing colors versus putting the color information in the function parameter list), but now we're getting into design and architecture choices of an API.
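To illustrate that stateful-color idea (purely a hypothetical sketch, not NAS2D's actual API; Renderer, Color, setDrawColor, and drawRect are all made-up names here), the color becomes renderer state that is set once, instead of being repeated in every draw call:

```cpp
#include <cstdint>

struct Color { uint8_t r, g, b; };

class Renderer {
public:
    // Set once; subsequent draw calls use this color
    void setDrawColor(Color c) { drawColor = c; }
    Color currentDrawColor() const { return drawColor; }

    void drawRect(int x, int y, int w, int h) {
        // ... would draw using drawColor ...
    }

private:
    Color drawColor{255, 255, 255};
};
```

Usage would then look like `renderer.setDrawColor({30, 50, 10});` followed by several draw calls, which keeps each draw call's parameter list focused on geometry.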
Thank you Brett for the code examples! :)
(I think that was the transcript that Leeor required. ;) )
I suppose we should add an example of one of the better ways to write and justify parameter lists:
void SomeRatherLongOverlyVerboseFunctionName(
    int param1,
    int param2
) {
    // ...
}
The parameter list is then kept together as one unit, with consistent indentation that depends only on structure, and not on name lengths. In particular, if someone renames the function to have a different name length, you don't have to re-justify the parameters to maintain a consistent indentation, and the parameter list is not split across both sides of the screen.
Leeor, that's a good point about passing a struct reference, rather than a bunch of individual parameters. There are many cases where that's the best way to do it. I suppose it depends on context though. If the parameters would be naturally bundled together, it makes sense to exploit that. If they are not naturally bundled together, or would need to be bundled together just for this one function call, then maybe not. I'd probably prefer a long parameter list to bundling together parameters in a forced artificial way. Though certainly long parameter lists can be a sign that something about the design is not so great.
As an example, here's a method with different levels of parameter bundling:
void DrawRect(int x1, int y1, int x2, int y2, unsigned char red, unsigned char green, unsigned char blue);
void DrawRect(Point p1, Point p2, Color color);
void DrawRect(Rect rect, Color color);
void SomeMethod() {
    /* Parameter meaning is not so clear at point of call */
    DrawRect(5, 10, 10, 20, 30, 50, 10);

    /* These two are more self documenting, without needing comments */
    DrawRect(Point(5, 10), Point(10, 20), Color(30, 50, 10));
    DrawRect(Rect(Point(5, 10), Point(10, 20)), Color(30, 50, 10));
}
On a related matter, when constructing a Rect, there is a choice of parameters, which might not be immediately clear:
/* Parameter list types are identical, so you can't have both overloads. Choose only one. */
Rect::Rect(int x1, int y1, int x2, int y2);
Rect::Rect(int x1, int y1, int width, int height);
/* These options are more clear, and can both be available at the same time */
Rect::Rect(Point p1, Point p2);
Rect::Rect(Point p1, Vector size);
The 80 character limit does seem a bit archaic. Even if you're using a terminal, modern systems still allow terminals of varying sizes. Granted, I default mine to 80 characters wide, so there's that. At any rate, I don't particularly mind a limit of 120 characters, or pushing past the margin just a little, but by and large, I prefer code to be largely left justified.
His comment about how people read was noteworthy. We read in columns, like newspaper articles. I've heard elsewhere that's much easier on the eye, and takes less effort to move from one line to the next without losing your place. Though I suspect that's less of a problem with code, since the greatly varying line lengths of most code already create a visual indicator that helps your eyes track which line is which.
At any rate, I've long found these widescreen computer monitors a bit silly. I spend most of my time on the computer reading, and yeah, I prefer to read in a column rather than going all the way across a wide screen monitor. I've long felt computer monitors were being expanded in the wrong direction. I'm sure widescreen is just fine for movies, but it's not what I want for reading. Curiously, I once got a monitor at work that could be swivelled 90 degrees, so it was tall rather than wide. One of the selling features was being able to see more lines of code at one time. The only problem was that the monitor's viewing angle was not the same between top-bottom and left-right. In normal widescreen mode it seemed fine, with a decent viewing angle to the sides, but when you rotated it 90 degrees, it had a rather narrow viewing angle to the sides, which affected color display. In the end, the lackluster viewing angle meant I never bothered to use it in tall mode. Would have been nice though.
Hmm, I was reading something recently about mixing levels of abstraction, and how that can lead to hard to read code. Some of the examples involved if statement expressions.
In terms of adding comments to label if/else if branches, I think that would mostly be necessary if you were mixing levels of abstraction. One solution might be to extract the checks into a method, where the name of the method provides a higher level of abstraction, and hence labelling of the if/else if branches.
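A quick sketch of that extraction idea (isValidUsername and greet are hypothetical names, not from the video): the low-level checks move into a named method, and the name then labels the branch instead of a comment.

```cpp
#include <string>

// The checks themselves live at a low level of abstraction; the method
// name presents them at a higher level.
bool isValidUsername(const std::string& name) {
    return !name.empty() && name.size() <= 32 && name.find(' ') == std::string::npos;
}

const char* greet(const std::string& name) {
    if (isValidUsername(name)) { // the method name documents this branch
        return "Welcome";
    }
    return "Invalid username";
}
```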
The main difference between pointers and references is that references are non-nullable and can not be re-seated (the reference itself is constant). It's a compile error if you try to assign null to a reference. Instead of documenting that a pointer can not accept null, consider using a reference instead. A reference implies the value refers to something valid.
Indeed, you have to go out of your way to trick the compiler into getting a null value stored in a reference variable. One way is to first create a pointer, set it to null, and then create a reference that points to the same thing as the pointer. Failing that, you'd need an ugly cast, or a really awful union, to subvert the type system.
Obj& ref = 0; // Error!

Obj* ptr = 0;
Obj& ref = *ptr; // Okay! (Does not check if ptr is null)
Obj& ref = *(Obj*)0; // Okay! (Casting has subverted the type checks)

union {
    Obj* ptr;
    // References can not be stored directly in a union, so wrap in a struct
    struct Ref {
        // Explicitly declare a constructor, since otherwise it is deleted due to the reference
        Ref();
        Obj& ref;
    } refWrapper;
} unionVar;
unionVar.ptr = 0; // Okay! (unionVar.refWrapper.ref is now also null)
Another indication you might want to use a reference is if the pointer itself is constant. References are only ever set when they are defined; they can not be re-assigned. This corresponds to a const pointer (as opposed to a pointer to const).
Or, put the other way, the two reasons for using a pointer rather than a reference are to support null values, or to be able to re-assign the pointer dynamically. If you don't need either feature, I'd recommend using a reference over a pointer.
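A minimal sketch of both properties (referenceDemo is just an illustrative name): a reference must be bound when defined, and later assignments write through it rather than re-seating it.

```cpp
inline int referenceDemo() {
    int a = 1;
    int b = 2;
    int& ref = a;       // must be initialized here; a bare "int& ref;" would not compile
    ref = b;            // copies b's value into a; ref is NOT rebound to b
    ref = 5;            // writes through ref, so a becomes 5
    return a * 10 + b;  // a == 5, b == 2
}
```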
It's probably a good idea to use division rather than a shift when the actual intent really is to divide. Although easy to forget due to its prevalence, the shift is an optimization, and one that isn't clear to non-programmers, or people unfamiliar with binary numbers. Using a division really is more clear. Plus, I don't know of any compiler that can't optimize a division by a constant power of 2 into a bit shift.
There is however one gotcha. The rounding mode (towards 0) for signed integers does not match up with a simple bit shift. If you try to divide by a constant power of 2, and only care about positive numbers, but forget to declare the variable as unsigned, you'll get a less efficient opcode sequence with extra instructions to adjust for the rounding mode.
int divideBy2(int value) {
    return value / 2;
}
_Z9divideBy2i:
.LFB1564:
    .cfi_startproc
    mov eax, edi ; Copy the value (we need some scratch space to work with the sign bit)
    shr eax, 31  ; Shift right (logical), moving the sign bit to lowest bit position
    add eax, edi ; Add 1 (round up towards 0) if value was negative, otherwise add 0 (positive)
    sar eax      ; Shift right (arithmetic), divide by 2, maintain signedness of result
    ret          ; Return result (in eax register)
    .cfi_endproc
For unsigned values, a rounding mode of towards 0 matches a rounding mode of round down, which is a simple bit shift.
unsigned int divideBy2(unsigned int value) {
    return value / 2;
}
_Z9divideBy2j:
.LFB1565:
    .cfi_startproc
    mov eax, edi ; Copy the value (this could be removed if the method was inlined)
    shr eax      ; Shift right (logical): Unsigned divide by 2
    ret          ; Return result (in eax register)
    .cfi_endproc
Ahh, that makes sense. Frontload the work. I was stuck thinking of a single divide rather than an array divide.
Thanks for the pop count link, that was a good review. The bit manipulations have a rather interesting structure.
As for the algorithm complexity, if you were being pedantic, any algorithm is constant time when you fix the input size. As the pop count instruction works on a fixed number of bits (the register size), it is of course constant time. Though you may notice from the algorithm, the actual number of instructions depends on the number of bits involved in the counting. Looks to be logarithmic in the number of bits, which is pretty efficient. And of course, having a machine opcode to do it is always handy. Looks like it can be done quite efficiently at the hardware level.
I do rather enjoy the conversion (A % B) => (A & (B - 1)) when B is a power of 2. I use a slight variant of that for rounding up to the next multiple of a power of 2: roundUp(value, multiple) = (value + (multiple - 1)) & ~(multiple - 1)
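A direct transcription of that roundUp variant, in case anyone wants to play with it (only valid when multiple is a power of 2):

```cpp
#include <cstdint>

// Rounds value up to the next multiple of 'multiple'.
// Requires 'multiple' to be a power of 2; otherwise the mask trick breaks.
constexpr uint32_t roundUp(uint32_t value, uint32_t multiple) {
    return (value + (multiple - 1)) & ~(multiple - 1);
}
```

For example, roundUp(13, 8) yields 16, while roundUp(16, 8) stays at 16 since it is already a multiple.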
I tried your suggestion to pass a constexpr through a non-constexpr function. It seems the g++ optimizer is too good. I was not able to fool it with my simple test:
unsigned int addOne(unsigned int x) {
    return x + 1;
}
unsigned int divideByTwo(unsigned int x) {
    return x / addOne(1);
}
The result of g++ -c -S -masm=intel -O2 constExprArg.cpp is:
_Z6addOnej:
.LFB0:
    .cfi_startproc
    lea eax, 1[rdi]
    ret
    .cfi_endproc
...
_Z11divideByTwoj:
.LFB1:
    .cfi_startproc
    mov eax, edi
    shr eax ; Shift right (logical): Divide by 2
    ret
    .cfi_endproc
You're right about truncating the constants, though I also believe you'll need to adjust the number of constants based on the bit size. From the Wikipedia article you linked on Hamming Weight (https://en.wikipedia.org/wiki/Hamming_weight):
For processors lacking those features, the best solutions known are based on adding counts in a tree pattern.
The height of that tree depends on the number of bits at the bottom.
If you were to look at the first algorithm listed:
//types and constants used in the functions below
//uint64_t is an unsigned 64-bit integer variable type (defined in C99 version of C language)
const uint64_t m1 = 0x5555555555555555; //binary: 0101...
const uint64_t m2 = 0x3333333333333333; //binary: 00110011..
const uint64_t m4 = 0x0f0f0f0f0f0f0f0f; //binary: 4 zeros, 4 ones ...
const uint64_t m8 = 0x00ff00ff00ff00ff; //binary: 8 zeros, 8 ones ...
const uint64_t m16 = 0x0000ffff0000ffff; //binary: 16 zeros, 16 ones ...
const uint64_t m32 = 0x00000000ffffffff; //binary: 32 zeros, 32 ones
const uint64_t h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3...
//This is a naive implementation, shown for comparison,
//and to help in understanding the better functions.
//This algorithm uses 24 arithmetic operations (shift, add, and).
int popcount64a(uint64_t x)
{
    x = (x & m1 ) + ((x >> 1) & m1 ); //put count of each 2 bits into those 2 bits
    x = (x & m2 ) + ((x >> 2) & m2 ); //put count of each 4 bits into those 4 bits
    x = (x & m4 ) + ((x >> 4) & m4 ); //put count of each 8 bits into those 8 bits
    x = (x & m8 ) + ((x >> 8) & m8 ); //put count of each 16 bits into those 16 bits
    x = (x & m16) + ((x >> 16) & m16); //put count of each 32 bits into those 32 bits
    x = (x & m32) + ((x >> 32) & m32); //put count of each 64 bits into those 64 bits
    return x;
}
The constant m32 would be useless as a bitmask on 32-bit operations, as would the x >> 32 operation. Accounting for the halving/truncating of the constants, and removal of dead operations, the 32-bit equivalent should be:
const uint32_t m1 = 0x55555555; //binary: 0101...
const uint32_t m2 = 0x33333333; //binary: 00110011..
const uint32_t m4 = 0x0f0f0f0f; //binary: 4 zeros, 4 ones ...
const uint32_t m8 = 0x00ff00ff; //binary: 8 zeros, 8 ones ...
const uint32_t m16 = 0x0000ffff; //binary: 16 zeros, 16 ones ...
int popcount32a(uint32_t x)
{
    x = (x & m1 ) + ((x >> 1) & m1 ); //put count of each 2 bits into those 2 bits
    x = (x & m2 ) + ((x >> 2) & m2 ); //put count of each 4 bits into those 4 bits
    x = (x & m4 ) + ((x >> 4) & m4 ); //put count of each 8 bits into those 8 bits
    x = (x & m8 ) + ((x >> 8) & m8 ); //put count of each 16 bits into those 16 bits
    x = (x & m16) + ((x >> 16) & m16); //put count of each 32 bits into those 32 bits
    return x;
}
That is, one less tree operation for half the number of bits. Or one extra tree operation if you double the number of bits.
Well, I think I got wildly off topic on this thread :P
That's not the most optimized form of the algorithm. Here:
constexpr uint64 m1 = 0x5555555555555555;
constexpr uint64 m2 = 0x3333333333333333;
constexpr uint64 m3 = 0x0F0F0F0F0F0F0F0F;
constexpr uint64 m4 = 0x0101010101010101;
uint64 PopCount64(uint64 x) {
    x = x - ((x >> 1) & m1);
    x = (x & m2) + ((x >> 2) & m2);
    x = (((x + (x >> 4)) & m3) * m4) >> ((64 /* bit depth */) - 8);
    return x;
}
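For what it's worth, here's the same routine written with standard uint64_t (rather than the uint64 alias above), plus a naive one-bit-at-a-time loop to sanity check it against:

```cpp
#include <cstdint>

constexpr uint64_t m1 = 0x5555555555555555;
constexpr uint64_t m2 = 0x3333333333333333;
constexpr uint64_t m3 = 0x0F0F0F0F0F0F0F0F;
constexpr uint64_t m4 = 0x0101010101010101;

uint64_t PopCount64(uint64_t x) {
    x = x - ((x >> 1) & m1);                // count of each 2 bits
    x = (x & m2) + ((x >> 2) & m2);         // count of each 4 bits
    x = (((x + (x >> 4)) & m3) * m4) >> 56; // byte counts, summed via multiply
    return x;
}

// Naive loop, for comparison
uint64_t PopCountNaive(uint64_t x) {
    uint64_t count = 0;
    for (; x != 0; x >>= 1) {
        count += x & 1;
    }
    return count;
}
```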