Print Page - Static initializers will murder your family

Off Topic => Computers & Programming General => Topic started by: Hooman on September 18, 2019, 02:09:01 AM

Title: Static initializers will murder your family
Post by: Hooman on September 18, 2019, 02:09:01 AM

Hehe, saw this and found it kind of funny:
Static initializers will murder your family (https://meowni.ca/posts/static-initializers/)

We ran into a static initialization order problem with op2ext a little while back. It was causing crashes in the test code on the continuous integration servers. It was somewhat random when it crashed, making it hard to know quite what was causing it.

Title: Re: Static initializers will murder your family
Post by: leeor_net on September 18, 2019, 06:27:06 PM

Holy shit, that post is awesome! I don't know if C++17 or c++20 will improve any of the issues but the writing style is fabulous. :D

Title: Re: Static initializers will murder your family
Post by: Hooman on September 19, 2019, 05:20:53 PM

Yeah, no kidding.

Quote

As Douglas Adams, the inventor of C++ said, ...

Lol

Title: Re: Static initializers will murder your family
Post by: Arklon on September 19, 2019, 10:19:43 PM

It should be noted that constexpr guarantees it will be static initialized at compile time rather than runtime.

Title: Re: Static initializers will murder your family
Post by: Hooman on September 20, 2019, 07:00:24 AM

Yes, constexpr is wonderful.

Now if only we could constexpr Outpost 2. Then we can have our fun before even playing the game. :P

Title: Re: Static initializers will murder your family
Post by: Vagabond on September 20, 2019, 08:24:26 PM

I consider the static initialization order as it stands in op2ext a decent sized flaw in how we have programmed it. It bothers me, but I don't know how to fix it yet.

I'd like to read the article, but hardly have enough time to program and keep up with the op2ext changes right now as other things going on in life. Hopefully will get to it later.

-Brett

Title: Re: Static initializers will murder your family
Post by: lordpalandus on September 20, 2019, 10:49:18 PM

Question: I don't fully understand the problem, but can't you use a preprocessor to force specific files to compile in a specific order?

Title: Re: Static initializers will murder your family
Post by: Hooman on September 21, 2019, 02:29:32 AM

That's not really what the pre-processor is for. At least not if you're talking about how it's typically used with header files.

In C++ each compilation unit (.cpp file, and recursively all the header files it includes), are compiled into an object file. Once all the source files are compiled into object files, they are linked into an executable file (or a DLL, or a static library, depending on project type).

During the link step, the order in which the object files are linked together is not specified. It might be the order the object files were specified on the command line to the compiler. Depending on the build system, that order might simply match the order of the directory listing returned by the filesystem, or it might be the order source code was added to the project file, or it might just be alphabetical order. At any rate, the order is not specified by the C++ standard.

The order of linking of the object files will affect the initialization order of global variables in those object files. If file A is linked before file B, you might expect the globals in file A to be initialized before the globals in file B. If you reverse the ordering of the linking, you might expect to reverse the order in which global variables are initialized.

This could be a problem if the initialization of a global in file A depends on the value of a global in file B.

Code: [Select]

// A.cpp
ObjectTypeA objectA(objectB.getValue());


// B.cpp
ObjectTypeB objectB;

If objectA is initialized first, then its constructor call will try to use the not yet initialized objectB in the expressions objectB.getValue(). That will likely result in accessing uninitialized memory, which is quite likely to result in a crash.

There is a workaround, though it can seem a bit hacky at times. You put all global variables in the same .cpp file. The order of initialization within a compilation unit is well defined. The variables are initialized in the order they are defined. That means you could write the following:

Code: [Select]

// Globals.cpp
ObjectTypeB objectB;
ObjectTypeA objectA(objectB.getValue());

So could you pre-process stuff and fix the problem? I suppose you could for this particular case. You could do something really ugly and non-standard, like include .cpp files instead of .h files. That could combine the whole project into one compilation unit, wherein initialization order is well defined, based on definition order. You'd then need to carefully order your #include lines:

Code: [Select]

// CombinedFiles.cpp
// Include .cpp files (not .h files!)
#include "B.cpp"
#include "A.cpp"  // A depends on B, so must come after it

Of course then you lose the firewall aspect of having multiple compilation units. If you change anything, even a comment, you now have to recompile the entire project since it's all one compilation unit. Considering how slow C++ compilers typically are, even for medium sized projects, that's a major pain. For a large project, a full recompile could take a few hours. If you keep code separated into different source files, you only need to recompile a section when something in that compilation unit changes. The linking step at the end is pretty fast.

Another problem where globals can get you in trouble with initialization, and where re-ordering things won't help you, is when a global ends up making a call to itself from its constructor. That can happen in all sorts of strange and convoluted ways, and if often quite indirect.

Case in point, I just noticed something today in op2ext. We have a global logging object. The logging object logs messages to a file. When it is constructed, it tries to open the log file for output, which could fail. If it fails, it reports the error to the user with a message box. Over time, use of the message box grew, so now it is used to report errors in several places in the code. At some point, probably weeks later, it was decided that these errors shouldn't just be displayed, but also logged.

So now, if the logger fails to open it's log file, it reports the error, which tries to log the error.

Amazingly it doesn't crash, but that's a bit to do with luck. The logging object is initialized just enough before reporting the error that the reporting system can use it without crashing, though it can't write anything to the non-opened file. I'm pretty sure it's still undefined behaviour though.

The structure looks something like this:

Code: [Select]

// Logger.cpp
Logger::Logger() : logFile(logFileName) {
  if (!logFile.open()) {
    PostErrorMessage("Failed to open log file");
  }
}

// LotsOtherFiles.cpp
// ...
// PostErrorMessage("Blah!");
// ...
// PostErrorMessage("Blah!");
// ...
// PostErrorMessage("Blah!");


// PostErrorMessage.cpp
PostErrorMessage(std::string message) {
  auto formattedMessage = getTimeString() + getExtraDetail() + message
  logger.Log(formattedMessage); // Added a few weeks later after PostErrorMessage became a general reporting tool
  MessageBox(formattedMessage);
}

Edit: I would just like to add that I was fully responsible for adding the logger.Log(formattedMessage); line. It seemed like a good idea at the time.

Title: Re: Static initializers will murder your family
Post by: lordpalandus on September 22, 2019, 02:35:19 PM

I'm just surprised that you cannot state what thing to compile in a specific order. Everything else about programming is sequential; do this, then this, then this, finish with this. I'd have thought that you could force the compiler to compile things in a specific order. Or maybe you can, but its a pain in the ass to setup and do.

Title: Re: Static initializers will murder your family
Post by: Vagabond on September 22, 2019, 08:52:43 PM

From Hooman:

Quote

I would just like to add that I was fully responsible for adding the logger.Log(formattedMessage); line. It seemed like a good idea at the time.

Hey, I was going to take credit for that line of code.

@lordpalandus,

I think to some degree it does compile in a specific order, but if the required behaviour is not specified in the C++ standard, each compiler is allowed to implement their solution in their own way. So you cannot be sure it will work between each vendor's compiler (or between different versions of the same vendor's compiler). It is best to avoid depending on this sort of behaviour in your code because it makes it very brittle. Someone can correct this paragraph since I'm probably mis-speaking a decent amount.

-Brett

Title: Re: Static initializers will murder your family
Post by: TechCor on September 23, 2019, 09:09:44 PM

I'm curious why you would want to rely on static init for anything complex. Initializing a system should be a direct function call, such as Initialize().

Of course, I'm not a fan of globals to begin with and try to keep them to a minimum.

Title: Re: Static initializers will murder your family
Post by: lordpalandus on September 23, 2019, 10:14:50 PM

I love global variables personally, but it makes saving them to file a real chore.

Title: Re: Static initializers will murder your family
Post by: Hooman on September 24, 2019, 06:15:29 AM

Quote

Hey, I was going to take credit for that line of code.

Ok, fine, you can have it! :P

...

Actually, if we want to play the blame game... ;)

Code: [Select]

git blame <previous commit SHA> -- srcStatic/GlobalDefines.cpp

Quote

... somewhere in PostErrorMessage
...
f52c2711 GlobalDefines.cpp (Brett208 2018-12-21 15:34:42 -0500 29) logger.Log(formattedMessage);
...

It was you that wrote it! ;D

Don't you just love version history. It never lies. ;)

... Though I'm pretty sure I was the one that suggested that line.

I normally try to avoid globals. Though with something like logging, you really don't want to be passing a logging object around to *every* function in your app that might want to log some event or problem.

In the end, we changed the logging to use global functions, and added protective layers around the underlying objects so they aren't really global anymore. At least their visibility isn't global. They may still be statically initialized if so desired, though it would be awkward to activate a logger at that point.

Global functions seem much less offensive than global objects.

Title: Re: Static initializers will murder your family
Post by: Hooman on November 13, 2019, 09:14:07 AM

I came across a presentation on YouTube that touched on many of the points above:
CppCon 2017: Nir Friedman “What C++ developers should know about globals (and the linker)” (https://www.youtube.com/watch?v=xVT1y0xWgww&list=WL&index=9)

Curiously, he mentioned logging as a good use case for global variables.

Title: Re: Static initializers will murder your family
Post by: leeor_net on November 15, 2019, 08:09:41 AM

Quote from: lordpalandus on September 23, 2019, 10:14:50 PM

I love global variables personally, but it makes saving them to file a real chore.

Global variables are generally evil (https://www.learncpp.com/cpp-tutorial/4-2a-why-global-variables-are-evil/)... though it really depends on how you use them, when and why. The very abridged version of Global Variables are Evil^TM is that they can be changed anywhere at any time by any part of the program and it's difficult to know when they're changed. For smaller programs this isn't really a problem but when you get into programs that are tens of thousands (or even millions) of lines of code spanning dozens of source files, it becomes painful. That's where the get/set type functions come into place because there's only one way to get to and set the values of them.

If I need a 'global' value, coding best practices usually indicate use of get/set type functions. In my code they take the form of this:

Code: [Select]

/* Declared static so it can only be directly accessed within this module */
static int SomeValueIllNeedEverwhere = 0;

int SomeValueIllNeedEverywhere() { return SomeValueIllNeedEverywhere; }
void SomeValueIllNeedEverywhere(int i) { SomeValueIllNeedEverywhere = i; }

Granted this is useful in C and C++ -- C# has the concept of Properties which is an amazing thing that I truly love about its design (e.g., public int SomeValue { get; set; }, though everything does have to be wrapped in a class like in Java).

Keep in mind, of course, that the above code sample is trivial. But say, for instance, that later on you want value validation when you set the variable. Instead of having to modify every piece of code that touches the variable, you can just put the validation code in the setter function and you're done.

All of this has a term -- encapsulation. It's the cornerstone of defensive programming and a good concept to learn and use liberally.

Title: Re: Static initializers will murder your family
Post by: Hooman on November 15, 2019, 11:03:22 PM

Quote

What’s the best naming prefix for a global variable?

Answer: //

Hah! Nice :)

One trick I've come to like for global variables is to wrap them in an anonymous namespace. That prevents references to them from other translation units. They are still stored in the global section of the executable image, but they can only ever be referenced from within the translation unit they are declared in due to name scoping rules. It reduces project global down to translation unit global.

I'd like to add a bit of nuance to get/set. Using get/set is fine, and often very useful when adding validation. However, if it's used liberally for every private field of a class, without validation, you've pretty much already destroyed encapsulation and should probably have just used a struct. If the internal format is known, and not validated, you basically just have a struct. So yes, use it, but don't take it too far so as to abuse it. And maybe just consider a struct with public fields if that's all you're going for anyway.

As a motivating example, consider two implementations of a Point class. One implementation could use Cartesian coordinates (x, y), whereas another could use polar coordinates (angle, distance). You could set either using an (x, y) pair, or using an (angle, distance) pair. You could also receive an (x, y) pair, or an (angle, distance) pair from either, they are both capable of providing such a transformation. It doesn't make sense to set either using just an x value, or just a y value, nor with just an angle value or just a distance value. Allowing that would break encapsulation. At least if the level of encapsulation is a point in some (possibly unknown) internal format. Getting or setting pairs of points is just fine though.

Outpost Universe Forums

Off Topic => Computers & Programming General => Topic started by: Hooman on September 18, 2019, 02:09:01 AM