Topic: C++ std::string subtle difference from char*

Offline Vagabond

C++ std::string subtle difference from char*
« on: June 13, 2018, 12:29:46 AM »
I've been using std::string and char* as we convert the legacy archive code in OP2Utility to C++11 standards. I've had some problems dealing with strings as we read or write the volume and CLM files.

I stumbled across this article which sort of turned on a lightbulb about some differences between std::string and char*.
https://akrzemi1.wordpress.com/2014/03/20/strings-length/

In particular, std::string does not track its size by using a null terminator at the end. It stores its length separately, which means null characters can appear in the middle of the string. You have to account for this when inserting data into the string or transferring the data out of the string. Previously, I was treating std::string as a char* with built-in memory management and stored size information. Not considering that its size is independent of any null terminator sort of messed up my thought process.
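A minimal sketch of that difference, assuming nothing beyond the standard library. The helper names (MakeWithEmbeddedNull, StoredLength, CStringLength) are made up for illustration and are not part of OP2Utility.

```cpp
#include <cstring>
#include <string>

// Build a std::string from a buffer with an explicit length, so the
// embedded '\0' is kept as an ordinary character.
inline std::string MakeWithEmbeddedNull()
{
    const char raw[] = {'a', 'b', '\0', 'c', 'd'};
    return std::string(raw, sizeof(raw));
}

// Length as std::string sees it: the stored size.
inline std::size_t StoredLength(const std::string& s)
{
    return s.size();
}

// Length as C string functions see it: bytes before the first '\0'.
inline std::size_t CStringLength(const std::string& s)
{
    return std::strlen(s.c_str());
}
```

For the five-byte buffer above, StoredLength reports 5 while CStringLength reports 2, because strlen stops at the first null it meets.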

For example, when storing a fixed-length string in std::string, trailing \0s will be counted in the length. However, if that string is converted to a char* and measured or compared with C string functions such as strlen or strcmp, none of the \0s are counted anymore. This can cause weird bugs when testing for equality since the sizes differ.
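One possible way to sidestep the equality bug is to trim the padding right after reading the fixed-width field. This is only a sketch: ReadFixedField and TrimAtFirstNull are hypothetical helpers, not existing OP2Utility functions, and the 8-byte field width in the usage note is an arbitrary example.

```cpp
#include <cstddef>
#include <string>

// Copy a fixed-width field verbatim, padding included (hypothetical helper).
inline std::string ReadFixedField(const char* buffer, std::size_t width)
{
    return std::string(buffer, width);
}

// Drop everything from the first '\0' onward (hypothetical helper).
// If there is no '\0', find() returns npos and the whole string is kept.
inline std::string TrimAtFirstNull(const std::string& s)
{
    return s.substr(0, s.find('\0'));
}
```

Reading "op2" from an 8-byte zero-padded field gives a string of size 8 that does not compare equal to "op2". After TrimAtFirstNull the size is 3 and the comparison succeeds.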

So I get the feeling we should probably clean up and standardize the std::string code for reading/writing the archives and map files in OP2Utility at some point. Unfortunately, there seem to be lots of places in C++ where I don't understand the subtleties of the language. Taking time to fully learn them all would probably mean the project would stall out and never be finished. :| I never felt that way about C#. I think coding properly in C++ definitely takes longer and takes more discipline than in C#. I can see why someone might want to avoid C++ and use a newer language like Java or C# to reduce development time.

Offline leeor_net

Re: C++ std::string subtle difference from char*
« Reply #1 on: June 13, 2018, 07:03:15 PM »
Subtle? They're entirely different beasts.

char* isn't a string, it's a pointer to a char. This allows you to store a sequence of char values that can be interpreted by humans as a string but the concept of a 'string' is entirely a human concept and C has no understanding of it. This is why there are functions to work with 'strings' that are flaky and extremely error prone. The 'null terminator' is a sentinel value that has become standard practice but ultimately is an arbitrary sentinel value. You're still dealing with raw memory and pointers and any 'string manipulations' really just translate to pointer arithmetic. It sucks. Hardcore.
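To make the "it's just pointer arithmetic" point concrete, here is a hand-written equivalent of what strlen does. CountUntilNull is a made-up name for illustration; real code should just call std::strlen.

```cpp
#include <cstddef>

// Walk the pointer forward until the sentinel '\0' is found, then
// return the distance travelled. Nothing else marks the "end".
inline std::size_t CountUntilNull(const char* p)
{
    const char* start = p;
    while (*p != '\0')
    {
        ++p;
    }
    return static_cast<std::size_t>(p - start);
}
```

If the sentinel is missing, the loop simply keeps reading past the end of the buffer, which is exactly why these functions are so error prone.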

std::string, on the other hand, is a class designed specifically to model what we humans call a 'string'. This is why they can contain null terminators within them. Remember, the 'null terminator' is just an arbitrary sentinel value. std::string manages memory and offers operators that make it easier to work with 'string' type data (though it's hardly perfect... this is why Boost has such things as the lexical_cast and an entire string manipulation library).

But yeah, thinking of them as the same thing with one having some memory management is certainly going to trip you up. One is a pointer to memory, the other is an object that owns a sequence of bytes and knows its own length. Most definitely not the same thing though they achieve similar results.

All that stated, this is why I suggested you choose one or the other with a strong suggestion of switching to std::string. It's far less prone to error than using raw char* and you can easily get a const char* out of it via c_str() if/when needed.
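A rough sketch of keeping std::string everywhere and only dropping to raw pointers at the boundary. LengthViaCApi and WriteFixedField are hypothetical helpers written for this post, not OP2Utility code; the zero-padded fixed-width layout is assumed from the discussion above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hand a std::string to a C-style API: c_str() returns a
// null-terminated const char* owned by the string.
inline std::size_t LengthViaCApi(const std::string& s)
{
    return std::strlen(s.c_str());
}

// Write a string into a fixed-width field, zero-padding the remainder.
// Strings longer than the field are truncated, in which case the field
// holds no terminating '\0'.
inline void WriteFixedField(const std::string& s, char* out, std::size_t width)
{
    const std::size_t count = std::min(s.size(), width);
    std::memset(out, 0, width);
    std::memcpy(out, s.data(), count);
}
```

The pointer returned by c_str() is only valid while the string is alive and unmodified, so it's best used immediately rather than stored.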
« Last Edit: June 13, 2018, 07:05:02 PM by leeor_net »
- Leeor
LairWorks Entertainment

Titanum UFO's