I was going over some code that used the Windows style defines for pointers to wide strings.
So a `const LPWSTR` is just a `const wchar_t*`, right? Right?
Wrong!
Yet another dark corner of the C++ language.
The following test written with Google Test passes:
#include <gtest/gtest.h>
#include <type_traits>

TEST(WideStrings, WideStringTypes)
{
    using LPWSTR = wchar_t*;

    EXPECT_FALSE((std::is_same<const LPWSTR, const wchar_t*>::value));
    EXPECT_FALSE((std::is_same<const LPWSTR, wchar_t const*>::value));
    EXPECT_TRUE((std::is_same<const LPWSTR, wchar_t* const>::value));

    EXPECT_FALSE((std::is_same<LPWSTR const, const wchar_t*>::value));
    EXPECT_FALSE((std::is_same<LPWSTR const, wchar_t const*>::value));
    EXPECT_TRUE((std::is_same<LPWSTR const, wchar_t* const>::value));

    EXPECT_TRUE((std::is_same<LPWSTR const, const LPWSTR>::value));
}
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from WideStrings
[ RUN ] WideStrings.WideStringTypes
[ OK ] WideStrings.WideStringTypes (0 ms)
[----------] 1 test from WideStrings (39 ms total)
[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (101 ms total)
[ PASSED ] 1 test.
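To my shock and horror, a const LPWSTR is actually the same thing as an LPWSTR const. That is, the pointer is const, not the wchar_t it points to. The const applies to the alias as a whole, and the alias is a pointer type, so it is the pointer that ends up const.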
If you really want a const wchar_t* you need to use the LPCWSTR alias.
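Here is a minimal sketch of the difference. The two aliases are local stand-ins that mirror the Windows ones, declared here so the snippet compiles without <windows.h>, and the helper functions are made up purely for illustration:

```cpp
#include <cassert>
#include <type_traits>

// Local stand-ins for the Windows aliases (assumption: declared here for illustration).
using LPWSTR = wchar_t*;
using LPCWSTR = const wchar_t*;

// const applies to the alias as a whole, so it makes the pointer const.
static_assert(std::is_same<const LPWSTR, wchar_t* const>::value,
              "const pointer, mutable characters");
// LPCWSTR puts the const on the pointed-to characters instead.
static_assert(std::is_same<LPCWSTR, const wchar_t*>::value,
              "mutable pointer, const characters");

// Hypothetical helper: compiles because the characters behind a const LPWSTR are writable.
void upperCaseFirst(const LPWSTR text) {
    assert(text != nullptr);
    if (text[0] >= L'a' && text[0] <= L'z') {
        text[0] = static_cast<wchar_t>(text[0] - L'a' + L'A');
    }
    // text = nullptr;  // Error: the pointer itself is const
}

// Hypothetical helper: read-only access to the characters.
wchar_t firstChar(LPCWSTR text) {
    assert(text != nullptr);
    // text[0] = L'x';  // Error: the characters are const
    return text[0];
}
```

So a function taking a const LPWSTR makes no promise at all about leaving your string alone.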
Way back in the 1970s, C was invented, on what is now primitive computer hardware with many limitations, both in speed and in memory. Sometimes people did strange things in an attempt to optimize for them. C-strings are a horrible abomination that came from that, and are probably the worst programming decision that we are still paying for today.
If you're using C++, you have the option to use std::string, which provides many (though not typically all) of the conveniences afforded by other modern programming languages. If you're using just plain C though, then you're pretty much stuck with C-strings.
The Windows API is built for C. Just plain C, not C++. One of the primary reasons is that C has a well-defined ABI (Application Binary Interface), while C++ does not. That means C compilers from different vendors can all build to the same ABI standard, and their compiled code will interoperate. This isn't the case for C++, particularly not on Windows. C++ code compiled with one compiler is unlikely to be linkable or usable by C++ code compiled with a different compiler. As the Windows API needed to provide the base-level interface for all programs running on the operating system, it was written for C. This is of course fine for C++ compilers, which support most of C as a sublanguage. You just can't use any of the additional C++ features at the core OS interface level.
A similar decision was made with the op2ext module loader interface. The module loader is written in C++, but the interface to other modules uses a limited, C-only subset of the language. This was intentional, to allow for modules written with other compilers, and probably even other languages.
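As a rough sketch of what a C-only boundary around C++ code can look like (this is not the actual op2ext interface; the function name, the stored string, and the buffer convention are all hypothetical):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

namespace {
    // Internal C++ state, hidden from callers (hypothetical example data).
    const std::string moduleName = "example";
}

// extern "C" turns off C++ name mangling, and the signature uses only C-compatible types.
// Copies as much of the name as fits, always null terminates, and returns the full length.
extern "C" std::size_t ExampleGetModuleName(char* buffer, std::size_t bufferSize) {
    if (buffer != nullptr && bufferSize > 0) {
        const std::size_t copyLength = std::min(moduleName.size(), bufferSize - 1);
        std::memcpy(buffer, moduleName.data(), copyLength);
        buffer[copyLength] = '\0';
    }
    return moduleName.size();
}
```

The std::string never crosses the boundary. The caller only ever sees a char pointer and a size, which any C compiler can deal with.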
Anyway, ranting about the finer points of ABIs aside, the design of C-strings is part of the reason why there is a funny pointer interface to strings.
In C, a C-string is an array of char, with a bad decision at the end. Err, I mean, 0 null terminator byte at the end.
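To make the terminator concrete, here is a small sketch. The countChars helper is hypothetical and simply walks the array until it hits the 0 byte, which is the same idea strlen is built on:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// The literal "abc" occupies four bytes: 'a', 'b', 'c', and the terminating 0.
static_assert(sizeof("abc") == 4, "three characters plus the terminator");

// Hypothetical helper: counts characters by walking to the terminator.
std::size_t countChars(const char* text) {
    assert(text != nullptr);
    std::size_t length = 0;
    while (text[length] != '\0') {
        ++length;
    }
    return length;
}
```

Nothing stores the length anywhere, so every length query is a linear scan.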
In C, an array decays to a pointer. If you try to pass an array to a function, it decays to a pointer to the first element of the array. This means you only need to copy a small pointer onto the stack during the call sequence, rather than an entire array, so it's much more efficient for anything but the smallest of strings.
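A quick sketch of the decay, using two made-up helper functions. The callee only ever receives a pointer to the first element, so the array's size is not visible to it:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: inside the function the parameter is just a pointer.
std::size_t sizeSeenByCallee(const char* text) {
    assert(text != nullptr);
    return sizeof(text);  // Size of a pointer, not of the caller's array
}

// Hypothetical helper: returns the address the callee was handed.
const char* addressSeenByCallee(const char* text) {
    return text;  // Points at the caller's original first element, no copy was made
}
```

Calling either one with a `char buffer[64]` passes the address of `buffer[0]` and nothing else.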
As a consequence of passing a pointer, rather than copying the array, it means the function gets access to the original data, rather than a copy. It might be that you don't want the function to modify the original array. Maybe the function should only be allowed to read the data. This is where const comes in. Data that is declared const is checked by the compiler so writes to it are disallowed. A function can take a pointer to const data. This is a contract in that the function is saying it won't modify any of the data passed to it through the pointer.
The calling function might store the data as a mutable array, but only provide a pointer to const when passing the data to other functions. This means the data can change, but it is limited in what functions are allowed to change it.
If you try to pass const data to a function that accepts a pointer to non-const data, it is a compile error. The type checking system disallows this. If this wasn't disallowed, the called function might happily write all over the data that wasn't supposed to change. This means for a function to accept const data, it must declare the parameter to accept const data.
Conversion in the other direction is automatic. If you have non-const data, you can pass it to a function that accepts const data. The caller doesn't care. Write to it or not, it's allowed. Though once the function is declared to accept const data, the compiler does ensure the function lives up to that promise of not writing to the data.
A consequence of all this is that if you write a function that takes data in through a pointer, and it only ever reads that data, it should declare the parameter as a pointer to const data. That way it can accept data regardless of its constness.
Examples:
struct Data {
    int field;
};

// Function accepting non-const data
void f1(Data* data) {
    int local = data->field; // Read allowed
    data->field = 0; // Write allowed
}

// Function accepting const data
void f2(const Data* data) {
    int local = data->field; // Read allowed
    //data->field = 0; // Error, data is const
}

void f3() {
    Data data = { 1 }; // Create some data
    f1(&data); // Allowed (data is allowed to be changed)
    f2(&data); // Allowed (data can not be changed)
}

void f4() {
    const Data data = { 1 }; // Create some const data
    //f1(&data); // Error, this would give a function write access to const data
    f2(&data); // Allowed (data can not be changed)
}

void f5() {
    Data data = { 1 }; // Create some data
    Data* dataPtr = &data; // This pointer allows a writable view of the data
    const Data* dataReadPtr = &data; // This pointer allows a read-only view of the data
    f1(dataPtr); // Allowed (data is allowed to be changed)
    f2(dataPtr); // Allowed (data can not be changed)
    //f1(dataReadPtr); // Error, this would give a function write access to const data
    f2(dataReadPtr); // Allowed (data can not be changed)
}
In C, a C-string is an array of char, with a bad decision at the end. Err, I mean, 0 null terminator byte at the end.
Hey man, you couldn't lead with the length because you didn't know what size to use. I mean, if your string is only 32 characters, using an int would waste a whole 3 bytes!
I think saying "we are still paying for [it] today" is a bit dramatic. Times were different, needs were different. We've moved on. Newer languages have replaced C++. The "Windows API" you are describing was called Win32 when I was in college. Win32!! We're on 64-bit machines these days. Win32 was forever ago. We passed through MFC, Visual C++ (remember that auto-generated nightmare?), WPF and WinForms. The only people who care about this stuff now are working with legacy code.... oh wait.
I was sitting here with Xcode open and immediately noticed this line:
PushNotificationWasClicked(const char* body)
Look at that C-String goodness.
Honestly, I've used C-strings so many times that I don't even think about it. The basic C-string operations are:
strlen
strcmp
strcpy
strcat
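Something like this, as a rough sketch. The helper names and the 64-byte buffer are made up for the example, and the size check is there because strcpy and strcat will happily run off the end of the buffer if you let them:

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical helper: joins two C-strings using strlen, strcpy, and strcat.
// Returns an empty string if the result would not fit in the fixed-size buffer.
std::string joinWithCStringCalls(const char* first, const char* second) {
    assert(first != nullptr && second != nullptr);
    char buffer[64];
    // +1 leaves room for the null terminator
    if (std::strlen(first) + std::strlen(second) + 1 > sizeof(buffer)) {
        return "";
    }
    std::strcpy(buffer, first);   // Copy first, including its terminator
    std::strcat(buffer, second);  // Append second over that terminator
    return buffer;
}

// Hypothetical helper: strcmp returns 0 when the two strings match.
bool sameText(const char* a, const char* b) {
    assert(a != nullptr && b != nullptr);
    return std::strcmp(a, b) == 0;
}
```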
It's pretty simple until you throw in Windows' god-awful typedefs for everything.
Yeah, I'm killing time waiting for a build. So what?