Author Topic: When to use Return Value Optimization (RVO) in C++  (Read 9808 times)

Offline Vagabond

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1015
When to use Return Value Optimization (RVO) in C++
« on: September 09, 2017, 12:39:36 AM »
I'm looking for a general rule of thumb on when it is or is not appropriate to assume RVO kicks in when returning a non-trivial object (such as a class or container of some sort).

For example, I wrote a function to instantiate an archive and return a pointer to the new ArchiveFile class as listed below:

Code: [Select]
ArchiveFile* archive = ConsoleHelper::openArchive(archiveFilename);

Code: [Select]
ArchiveFile* ConsoleHelper::openArchive(const string& archivePath)
{
if (XFile::extensionMatches(archivePath, "VOL"))
return new Archives::VolFile(archivePath.c_str());
else
return new Archives::ClmFile(archivePath.c_str());
}

However, this would be cleaner to work with if I rewrote it like the following:

Code: [Select]
ArchiveFile archive = ConsoleHelper::openArchive(archiveFilename);

Code: [Select]
ArchiveFile ConsoleHelper::openArchive(const string& archivePath)
{
if (XFile::extensionMatches(archivePath, "VOL"))
return Archives::VolFile(archivePath.c_str());
else
return Archives::ClmFile(archivePath.c_str());
}

I like this because I don't have to worry about dealing with the pointer or destroying the ArchiveFile later. Can I expect the compiler to optimize this by returning a reference to the Archive instead of a new copy?

-Brett

Offline leeor_net

  • Administrator
  • Hero Member
  • *****
  • Posts: 2352
  • OPHD Lead Developer
    • LairWorks Entertainment
Re: When to use Return Value Optimization (RVO) in C++
« Reply #1 on: September 09, 2017, 11:12:09 AM »
It's somewhat compiler dependent but you can generally expect a decent optimizing compiler to find a better way to do this besides constructing an object, copying said object, then destroying it. However, since you're passing by value to a base class, you run the risk of object slicing (partial copy of an object).

To avoid having to worry about memory management by returning pointers (and prevent object slicing), you can use a smart pointer. Something along these lines:

Code: [Select]
typedef std::shared_ptr<Archive> ArchivePtr;


ArchivePtr ConsoleHelper::openArchive(const string& archivePath)
{
if (XFile::extensionMatches(archivePath, "VOL"))
{
return std::make_shared<Archives::VolFile(archivePath.c_str())>();
}
else
{
return std::make_shared<Archives::ClmFile(archivePath.c_str())>();
}
}

By using smart pointers you can eliminate the need for memory handling and have a clean way of creating derived types without having to use raw pointers.



SIDE NOTE 1: You'll notice that I included the braces -- I would recommend you adopt this as a coding best practice to avoid potential mistakes that are easy to overlook. If it appears too verbos you can condense it as such:

Code: [Select]
if (XFile::extensionMatches(archivePath, "VOL")) { return std::make_shared<Archives::VolFile(archivePath.c_str())>(); }

Though I tend to reserve this for very short statements:

Code: [Select]
if(someCondition){ return; }



SIDE NOTE 2: Determining file type by extension is dubious. It probably works in most cases but it lends itself to people doing stupid things like changing extensions. A better way to do this would be to check for some sort of magic value (like a header string). I'm not sure if vol/clm have such identifiers so this may not be possible -- in which case you have to assume that the user isn't an idiot and if they are they deserve whatever explosion ensues though I would add some sort of a try/catch block in an effort to fail gracefully vs. simply crashing.
« Last Edit: September 09, 2017, 11:23:07 AM by leeor_net »

Offline Vagabond

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1015
Re: When to use Return Value Optimization (RVO) in C++
« Reply #2 on: September 09, 2017, 03:23:25 PM »
Quote
It's somewhat compiler dependent but you can generally expect a decent optimizing compiler to find a better way to do this besides constructing an object, copying said object, then destroying it. However, since you're passing by value to a base class, you run the risk of object slicing (partial copy of an object).

To avoid having to worry about memory management by returning pointers (and prevent object slicing), you can use a smart pointer.

I started reading up on object slicing after your post. The concept is making my head hurt :|. I bought a book on more advanced C++ than the beginner book I had earlier but haven't made it to the chapter on smart pointers yet. I'll try and read up on them some more this weekend.

Thanks for the tip on using them.







Quote
SIDE NOTE 1: You'll notice that I included the braces -- I would recommend you adopt this as a coding best practice to avoid potential mistakes that are easy to overlook.

Code: [Select]
    
if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
    goto fail;
if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
    goto fail;
    goto fail;
if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
    goto fail;

Spotting the mistake in the code above seems very easy visually to me. I'd say removing the braces reduces vertical used space and does not make it any more difficult to read. Since people have reported bugs like this, I suppose it happens in real life. Although I cannot think of a single time that I have expanded an if statement to include 2 lines of code and not thought about the fact that a set of braces needs to be added.

I used to always include the braces, but after a long time, I found it just easier to read and more succinct to not use them for single lines of code after an if statement.

Perhaps not having the braces makes things more difficult when merging conflicting changes using a repository with other programmers, but I don't have any real examples or experience here to pull from?

Quote
Code: [Select]
if(someCondition){ return; }

If would typically write the above statement as

Code: [Select]
if(someCondition)
    return;

I find including content that is typically spread across multiple lines of code less readable when condensed to one line.







Quote
SIDE NOTE 2: Determining file type by extension is dubious. It probably works in most cases but it lends itself to people doing stupid things like changing extensions. A better way to do this would be to check for some sort of magic value (like a header string).

There are magic values. vol files all start with "VOL" and CLM files all start with "OP2 Clump File Version 1.0".

I could look at designing some sort of function to stream into a string the first X bytes of the file into. I'm not sure that I want to support this feature though. Outpost 2 is unable to find and read CLM or VOL files if they do not contain their extensions. Either way you are right that I'm missing some sort of error exception. See below for a correction.

Code: [Select]
ArchiveFile* ConsoleHelper::openArchive(const string& archivePath)
{
if (XFile::extensionMatches(archivePath, "VOL"))
return new Archives::VolFile(archivePath.c_str());

if (XFile::extensionMatches(archivePath, "CLM"))
return new Archives::ClmFile(archivePath.c_str());

throw invalid_argument("Provided filename is not an archive file (.VOL/.CLM)");
}


Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
Re: When to use Return Value Optimization (RVO) in C++
« Reply #3 on: September 10, 2017, 07:15:10 AM »
Leeor_net is right about object slicing. You'll want to return the value by pointer or reference, rather than by value. If you're going to use smart pointers, I think unique_ptr might be appropriate here. Admittedly though, I have little experience so far with using the smart pointer classes.

If your main objective is to use the dot syntax (archive.method), instead of the arrow syntax (archive->method), using a reference is sufficient. You can convert at multiple locations, such as the new statement, the return statement, or at the point of call. Remember that references use value like notation, even though they are pointers under the hood:
Code: [Select]
Archive* ptr = new VolFile(fileName);
Archive& ref = *ptr;  // You could inline this above

If your main object is to get rid of the delete, you're kind of out of luck with the current design. You have a function that returns a pointer to an object of either type. This has to be done by base class pointer. Since the object is returned by pointer/reference form the function, it's can't be a local variable stored on the stack. That means it's either a heap variable (new/delete), or a global (generally bad practice to use). At best you can hide the new/delete by using smart pointers.

A topic of interest for smart pointers is Resource Acquisition Is Initialization (RAII).

To avoid the new/delete, and global variables, you'd need a code structure where some function creates the appropriate object on the stack as a local variable, which can pass that object into other methods for future processing, but never returns it. In that way, the lifetime of the object never escapes the function that created it. This isn't always appropriate.



I have no qualms about checking file extensions in your code to guess the archive type. At any rate, the archive code already checks the file headers after trying to open the files. The code will throw an exception if it's opened by an archive class that doesn't match the file contents. If you wanted, you could use try/catch blocks to try opening the file as both types, and going with the first that succeeds, assuming one of them does. That would make it independent of the extension, though the code is a bit uglier, and this is supporting a use case people should really be avoiding, namely badly named files with misleading extensions.



A lot of languages have keyword delimited blocks, such as being/end:
Code: [Select]
if someBooleanCondition then begin
  blockOfCode
end

If you're used to that format, than the C++ equivalent makes a lot of sense:
Code: [Select]
if (someBooleanCondition) {
  blockOfCode;
}

Some people loathe this format. I used to. Though I began to try it after programming in languages that delimit blocks with keywords. It makes a big difference in the amount of code you have to scroll through. It takes one less line of vertical space, so it's more compact than having braces at matching indentation levels. It also passes all the lint validations for checkers that ensure "if" statements always having matching braces. The linters check this because bugs like the one leeor_net posted are surprisingly common. I too used to be of the opinion that it's quite visually obvious, though it's still possible to get wrong during refactoring due to competing demands for attention when making complicated changes involving many lines of code. This type of bug is not something that is generally written as is. It gets introduced during code changes, since without the braces, the structure of the code is very dependent on how many lines of code you have in a block.


Offline leeor_net

  • Administrator
  • Hero Member
  • *****
  • Posts: 2352
  • OPHD Lead Developer
    • LairWorks Entertainment
Re: When to use Return Value Optimization (RVO) in C++
« Reply #4 on: September 10, 2017, 02:31:22 PM »
Not only are they common but the goto fail bug mentioned above was in an open-source component for years before someone finally noticed it.

Is it more concise to omit the braces? Yes. But the point is not about concise code, it's about correct code. Defensive programming -- it's a way to avoid common and simple mistakes that even the best of us can make. Plus, as Hooman stated, it allows lint checkers and code analyzers to find possible mistakes.

Anyway, code style and best practices guidelines are the subjects of entire books.

I would still recommend the use of C++11 smart pointers in this case. shared_ptr is a good choice if you're going to be sharing the pointer across several possible interface. unique_ptr is good if you never intend to copy that pointer. The first answer to this question provides an excellent explanation of the differences between them.
« Last Edit: September 10, 2017, 02:33:17 PM by leeor_net »

Offline Vagabond

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1015
Re: When to use Return Value Optimization (RVO) in C++
« Reply #5 on: September 10, 2017, 11:39:27 PM »
Just got a unique_ptr working in the code. I had to format it as below with the curly brackets instead of the format Leeor had recommended. Not sure why. I ran out of time to research the subject today. I had tried using make_unique to create the pointer but it didn't like this.

Code: [Select]
unique_ptr<ArchiveFile> ConsoleHelper::openArchiveUniquePtr(const string& archivePath)
{
if (XFile::extensionMatches(archivePath, "VOL"))
return unique_ptr<VolFile>{new VolFile(archivePath.c_str())};

if (XFile::extensionMatches(archivePath, "CLM"))
return unique_ptr<ClmFile>{new ClmFile(archivePath.c_str())};

throw invalid_argument("Provided filename is not an archive file (.VOL/.CLM)");
}

-Brett

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
Re: When to use Return Value Optimization (RVO) in C++
« Reply #6 on: September 11, 2017, 02:04:50 AM »
These might be an interesting and relevant read:
https://stackoverflow.com/questions/37514509/advantages-of-using-make-unique-over-new-operator
https://stackoverflow.com/questions/22571202/differences-between-stdmake-unique-and-stdunique-ptr



I think I see the difficulty. You need to specify the type (VolFile) to call the proper constructor, but you're returning a type based on the base class (ArchiveFile). The question is, are unique_ptr's covariant if their template parameters are? I'm guessing no, even though you'd want them to be here. I don't think there is an easy type safe way for the compiler to assume this, particularly since the compiler wouldn't have special info to know the smart pointers are really pointer types. This is all just me guessing here. I assume you tried like this?

Code: [Select]
    return make_unique<VolFile>(archivePath.c_str());
« Last Edit: September 11, 2017, 02:10:38 AM by Hooman »

Offline leeor_net

  • Administrator
  • Hero Member
  • *****
  • Posts: 2352
  • OPHD Lead Developer
    • LairWorks Entertainment
Re: When to use Return Value Optimization (RVO) in C++
« Reply #7 on: September 11, 2017, 01:22:08 PM »
Code: [Select]
unique_ptr<ArchiveFile> ConsoleHelper::openArchiveUniquePtr(const string& archivePath)
{
if (XFile::extensionMatches(archivePath, "VOL"))
return unique_ptr<VolFile>{new VolFile(archivePath.c_str())};

if (XFile::extensionMatches(archivePath, "CLM"))
return unique_ptr<ClmFile>{new ClmFile(archivePath.c_str())};

throw invalid_argument("Provided filename is not an archive file (.VOL/.CLM)");
}

In C++11 this is the equivalent of:

Code: [Select]
unique_ptr<VolFile> volPtr = new VolFile(archivePath.c_str());
return volPtr;

See this article about C++11 initialization forms;

Hooman's suggestion of using make_unique should be correct.
« Last Edit: September 11, 2017, 01:23:44 PM by leeor_net »

Offline Vagabond

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1015
Re: When to use Return Value Optimization (RVO) in C++
« Reply #8 on: September 11, 2017, 07:52:29 PM »
It is working now. Thanks for the tips. I wasn't understanding that you simply feed make_unique the values required for the constructor. I was trying to feed make_unique a new object instead. The right syntax felt weird.

Incorrect
Code: [Select]
return make_unique<VolFile>(VolFile(archivePath.c_str()));

Correct
Code: [Select]
return make_unique<VolFile>(archivePath.c_str());

I don't have time right now, but hopefully sometime soon I'll have enough time to clean up all the calls to the function to the unique pointer and push to the repo.

-Brett

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
Re: When to use Return Value Optimization (RVO) in C++
« Reply #9 on: September 11, 2017, 08:21:53 PM »
Ahh, interesting. Good to see you got that sorted out.

I also did a bit of testing on Linux, and made a few discoveries.

First, the header for unique_ptr and make_unique is:
Code: [Select]
#include <memory>

Although unique_ptr is a C++11 feature, the "make_unique" function is a C++14 feature. It got left out of the C++11 standard, which seems to have been an oversight. To get code to compile with g++, you'll need to pass the appropriate flag:

Code: [Select]
g++ -std=c++14 ...

As per unique_ptr on cppreference:
Quote
If T is a derived class of some base B, then std::unique_ptr<T> is implicitly convertible to std::unique_ptr<B>.

Thus, it should be perfectly fine to construct a unique_ptr of the derived type, and then return it as a unique_ptr to the base class. Everything will just work automatically.

Offline Vagabond

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1015
Re: When to use Return Value Optimization (RVO) in C++
« Reply #10 on: September 14, 2017, 03:00:13 AM »

Quote
Although unique_ptr is a C++11 feature, the "make_unique" function is a C++14 feature. It got left out of the C++11 standard, which seems to have been an oversight. To get code to compile with g++, you'll need to pass the appropriate flag:

I didn't realize make_unique is C++14. I use VS2017 with default settings, which I think leave C++14 features in the code.

Quote
it should be perfectly fine to construct a unique_ptr of the derived type, and then return it as a unique_ptr to the base class. Everything will just work automatically.

Good to know. This is working from my couple of tests without issue in practice as well.



I had one more related question.

So, now that I have decided to return a unique_ptr to the ArchiveFile, I will want to pass the unique_ptr to a couple of subclasses to perform work with it.

for example
Code: [Select]
unique_ptr<ArchiveFile> archive = ConsoleHelper::openArchive(archiveFilename);

checkFilesAvailableToRemove(*archive, filesToRemove, consoleArgs.consoleSettings.quiet);

Would it be preferred to pass the ArchiveFile as a reference to a private class function as opposed to passing the pointer itself? I don't think in this case the private function actually cares at all the location/status of ArchiveFile as long as it can consume it? It would be up to the function calling checkFilesAvailableToRemove to ensure it is passing a valid ArchiveFile...

So the declaration for checkFilesAvailableToRemove would read as:

Code: [Select]
void checkFilesAvailableToRemove(ArchiveFile& archive, const vector<string>& filesToRemove, bool quiet);

-Brett

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
Re: When to use Return Value Optimization (RVO) in C++
« Reply #11 on: September 14, 2017, 06:27:11 AM »
That makes sense. I think passing the object by reference is the right thing to do. In this case, you are not transferring ownership of the object. The called function does not take control over the object's lifetime. It's just getting temporary access to the object to do some operation.


It would be different if you were transferring ownership, such as to store an object in a long lived data structure. In that case, you'd want to std::move the unique_ptr into the function, transferring ownership, and causing the original unique_ptr to be set to nullptr. If you try to simple pass a unique_ptr into a function (without std::move, and without a reference) you'll get a compile error. This is due to unique_ptr not having a copy constructor, thus it can not be copied from the calling function into the argument of the callee. See:
How Can I Pass unique_ptr Into A Function?

If you read the above, you'll also see it's possible to pass the unique_ptr by reference (rather than the object itself). This makes less sense here, since that would require the caller to wrap any references to the object into a unique_ptr before calling the function. That's an unnecessary restriction on the caller. Maybe it's actually using a shared pointer, or maybe the object is just a local variable on the stack, not wrapped by a smart pointer. The called function doesn't really care if the reference it gets is held uniquely or not. Hence, it's best to just pass a reference to the object itself, not wrapped in a smart pointer that dictates control of object lifetime.

Offline leeor_net

  • Administrator
  • Hero Member
  • *****
  • Posts: 2352
  • OPHD Lead Developer
    • LairWorks Entertainment
Re: When to use Return Value Optimization (RVO) in C++
« Reply #12 on: September 14, 2017, 12:30:54 PM »
Was going to say -- smart pointers are just C++ objects... pass it by reference, let the compiler deal with the specifics and don't overthink it.

I didn't realize make_xxxx was C++14... works just fine in Visual Studio 2015 so I just assumed it was all part of the same spec. Either way, useful functions. :)

Offline Vagabond

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1015
Re: When to use Return Value Optimization (RVO) in C++
« Reply #13 on: September 15, 2017, 10:47:13 PM »
Okay, thanks for the reference/pointer advice.

I finally finished going through all the OP2Archive code and replaced functions taking raw pointers as function inputs to instead take a reference to the object being pointed at.

I also moved all the raw pointers to unique_ptrs.

Quote
If you try to simple pass a unique_ptr into a function (without std::move, and without a reference) you'll get a compile error. This is due to unique_ptr not having a copy constructor, thus it can not be copied from the calling function into the argument of the callee.

Yeah, I found this out by experience and is why I started passing the objects stored in memory by reference instead of pointers to functions. It makes sense in light of unique_ptrs actually being unique and not allowing multiple copies now that I think about it more.

It is good to keep learning about C++, but it is really delaying finishing OP2Archive.

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
Re: When to use Return Value Optimization (RVO) in C++
« Reply #14 on: September 17, 2017, 07:44:42 AM »
Yes, preference references over pointers. They are newer, and the syntax is nicer and more consistent with non-pointers.

You might still need pointers for things that are nullable, or that can be updated to point to new objects. A reference is assumed to always point to a valid object, and is set once when it is initialized, and can not be reassigned. A reference is perhaps like a non-null const pointer, with nicer syntax.


Quote
It is good to keep learning about C++, but it is really delaying finishing OP2Archive.

Welcome to the world of programming. Things always take longer than you expect  ::)

Though to be fair, I've been dragging my feet on my part.

Offline Arklon

  • Administrator
  • Hero Member
  • *****
  • Posts: 1269
Re: When to use Return Value Optimization (RVO) in C++
« Reply #15 on: October 04, 2017, 10:36:21 PM »
Was going to say -- smart pointers are just C++ objects... pass it by reference, let the compiler deal with the specifics and don't overthink it.

I didn't realize make_xxxx was C++14... works just fine in Visual Studio 2015 so I just assumed it was all part of the same spec. Either way, useful functions. :)
make_shared was in C++11, make_unique wasn't added until 14. 14 wasn't as big of an update as 11 but it did add some nice features, like that, and generic lambdas.