Quick C++ memory management question.
Can I write the code below, or will the created VolFile become a memory leak? The underlying function listContents requires a pointer to a VolFile class.
if (XFile::extensionMatches(filename, ".vol"))
archiveConsoleListing.listContents(new VolFile(filename.c_str()));
-Brett
The existing CLM packing code will already error on this condition. The packing code will try to load the WAVE header from all input files, which will error out if they are not proper .wav files. It also then compares the headers to make sure they are all the same format, and errors out if they are not.
Perfect. I'll just display the error uses cerr for the user.
I'll upload my progress shortly. It will live under the GameResources directory in the repository. So far, I've hard coded the commands LIST (earlier CONTENTS) and FIND. Next step will be to design the command argument parser and hook LIST and FIND to command arguments. I'll base it off the code I wrote for OP2MapImager. The usage statement is also partially written.
I was surprised how easy it was to do formatting with cout. It breaks the rule that most everything is harder with C++. :)
(http://forum.outpost2.net/index.php?action=dlattach;topic=5987.0;attach=902;image)
Nice looking output. Does it handle long filenames well? Do long filenames show in full, or get truncated?
Thanks Hooman. Really long file names currently jack up the formatting. I'll try to fix this in the near future by setting a max filename character length before ignoring the length on the filename column width. This way really long filenames will only mess up their row on the table and none of the others.
I've programmed the LIST and FIND commands to accept command line arguments. They seem pretty robust. I'm partially through the EXTRACT command.
Below is a draft of the usage statement. I'm trying to go for a POSIX syntax. Let me know if it is confusing or out of line. () means required and [] means optional and ... means multiple allowed.
(http://forum.outpost2.net/index.php?action=dlattach;topic=5987.0;attach=905;image)
-Brett
Just uploaded code to the repository. I fixed the LIST command to play nicely(ish) with too long of filenames. See screenshot for what I mean. Basically, the normal list will cap out at 35 character length for the filename including extension. If the filename length is greater than 35 characters, it allows the longer filename to extend beyond the width of the file size column without affecting the rest of the table.
I think this should suffice unless anyone has better ideas on the matter. I would prefer if no one uses filenames that are so large though for my own sanity. ;)
(http://forum.outpost2.net/index.php?action=dlattach;topic=5987.0;attach=914;image)
Also, I feel like the smiley face choices change every other time I post. I am with Hooman that they are better not being animated.
Back to work on the EXTRACT command...
-Brett
The "KB" in the header, but not the individual rows seems weird to me. Something about a file of size "2", with no units, and it not being bytes feels odd. If using human readable units (B, KB, MB, GB), I'd suggest they change based on the row data, and hence be included for each row. Either that or fix it at bytes for all rows. Looks like there is sufficient space available to display bytes, and the result may be more meaningful.
Hmmm, I just ran the 'dir' command for windows in a command prompt. It defaults to showing data in bytes. I'm assuming a similar command in Linux would also display file size in bytes? So, probably best for consistency to switch to bytes. Bytes always seemed too small to me when comparing file sizes, but I'd rather be uniform.
It also occurred to me the sizes should be right aligned instead of left aligned.
I'll plan to make another pass on the LIST command's output at some point and make these changes.
This sounds like a very good plan. I actually regret some of the design choices of the old code. The class to read from a Vol or CLM file doesn't have much in common with code to create them. The multiple inheritance design was bad. A static method for archive creation makes way more sense.
I think the old design is bad for the same reason that "bidirectional streams" are a bad design. It really makes no sense.
I'm just really glad there is boiler plate code that can handle archive file access. This is something that would have been well beyond my ability. I don't mind if we want to rewrite the archive code to cut out the multiple inheritance.
I noted that when trying to create a vol file, the current code packs the files properly into temp.vol, but doesn't create a vol file with the passed filename.
Below is the function in question. I think if I just change the hard coded "temp.vol" to be volumeName from the function's argument, it will work properly.
bool VolFile::CreateVolume(const char *volumeName, int numFilesToPack,
const char **filesToPack, const char **internalNames)
{
CreateVolumeInfo volInfo;
volInfo.numFilesToPack = numFilesToPack;
volInfo.filesToPack = filesToPack;
volInfo.internalNames = internalNames;
// Make sure files were specified
if (numFilesToPack < 1 || filesToPack == NULL || internalNames == NULL) return false;
// Open a file for output
if (OpenOutputFile("Temp.vol") == 0) return false; // Return false on error
// Open input files and prepare header and indexing info
if (PrepareHeader(volInfo) == false)
{
CloseOutputFile();
return false;
}
// Write volume contents
// ---------------------
// Write the header
if (WriteHeader(volInfo) == false)
{
CleanUpVolumeCreate(volInfo);
return false;
}
if (WriteFiles(volInfo) == false)
{
CleanUpVolumeCreate(volInfo);
return false;
}
CleanUpVolumeCreate(volInfo);
return true;
}
My current goal will be just getting the console interface working properly. Once that is done, I guess we can discuss what we want to do to the archive library code.
I was noticing that we are passing Boolean success values out right now for CreateArchive and Repack functions. I'm just adding a cerr message saying creating the archive failed right if the CreateArchive function passes back false. I'm thinking passing the Boolean success out is probably driven by using the windows specific file handles. Perhaps when we switch the code over to something less operating system specific we can switch to more of a throwing exceptions error handling than Boolean? I think we could probably provide a little more fine grained report to the user on why the action failed.
I would highly highly highly recommend switching
to
and
to
const std::vector<std::string>&
Perhaps
typedef std::vector<std::string> stringlist;
const stringlist&;
(really wish there was an inline code tag, hope all the above makes sense)
Alright, the LIST command is fixed up for right aligned and showing results in bytes. I think LIST is pretty much done.
(http://forum.outpost2.net/index.php?action=dlattach;topic=5987.0;attach=957;image)
Hmm, good point. I was planning mostly implementation changes, with hopefully little impact on the interface. At least that's how I see changes to remove dependence on windows.h. There are other areas I might like to update though.
One thing on my mind is the idea of using string and vector rather than char* and array. That could lead to some interface changes. I suspect the impact on your code would be minimal, since you're already using string and vector. Just remove the conversions before the calls.
I also had thoughts of reworking the class hierarchy to remove multiple inheritance. Perhaps make some methods static. It seems you've been working around problems related to that, so it could simplify some of your code. I'm not really looking at this idea yet though.
I took a look at your updates. Seems you've done a lot to simplify the command line argument processing.
I noticed in selectCommand, you have a bit long switch statement. I think that can be simplified. A switch statement is basically a jump through a jump table, and in your example each jump destination is basically just a call to some method. This is similar to virtual function dispatch, which is a call through a table of pointers to functions. You could potentially replace the switch with a call that uses virtual function dispatch. The result should be more compact and easier to maintain. It gets rid of the tedious process of adding to the switch statement when you add new commands. I hardly ever use switch anymore, now that I understand virtual functions.
To make it work, define a common base class, perhaps named ConsoleCommand. It would have derived classes List, Find, Extract, Create, Help, Add, Remove. The base class would contain a pure virtual method Execute(const ConsoleArgs& consoleArgs). The common Execute name for each command means your cases all collapse down to the same call, which is dispatched virtually based on the class type of the underlying command.
command->Execute(consoleArgs);
The part I haven't addressed yet, is how command is set to an object of the appropriate type. There are many possible solutions there. I'll take a further look at your code before suggesting something there. It seems you're mapping commands to an enum index. Where that is done could be an appropriate tie in point.
I'm getting hung up on a couple of bugs during the final testing. I've fixed one myself, but there was another related to exceptions that I wanted to mention.
The ClmFile constructor contains a couple of throws for exceptions when trying to access and parse a clm file. These throws are followed by a char* statement only and not an exception class. Like this:
ClmFile::ClmFile(const char *fileName) : ArchiveFile(fileName)
{
... code to open a file handle ...
if (m_FileHandle == INVALID_HANDLE_VALUE)
throw "Could not open file";
if (ReadHeader() == false)
throw "Invalid header";
... more code ...
}
When these exceptions are triggered, even though I wrapped the call to this function in a try catch statement, it blows up the application.
Below I rewrote the function to throw an actual exception as opposed to just a char*. This now makes it behave properly and allows my try/catch to catch the exception and spit out a message to the user about the malformed file.
ClmFile::ClmFile(const char *fileName) : ArchiveFile(fileName)
{
... code to open a file handle ...
if (m_FileHandle == INVALID_HANDLE_VALUE)
throw exception("Could not open file");
if (ReadHeader() == false)
throw exception("Invalid header");
... more code ...
}
Since C++ allowed the code to compile and operate by throwing the char* without including the exception class, I assume there must be a time/place to do this?
So, when would I want to throw char* instead of an exception? Is it fine with everyone if I just throw an exception instead of the char* so that I can properly catch the exception?
VolFile.cpp and possibly other code in the archive project have similar throw statements. I would probably change them all out if this sounds agreeable.
Someday I'll stop procrastinating on the release of this project, I promise. :)
Thanks,
Brett
I very seriously doubt a catch(char*) will work. Probably need a catch(const char*) but even then I suspect there's an implicit conversion going on. You should either catch(const std::string&) OR throw std::runtime_error(const std::string& what).
Most of the time I use std::runtime_error but in some cases it makes sense to derive 'specialized' exception objects from runtime_error:
class SomeExceptionClass : public std::runtime_error
{
public:
SomeExceptionClass() : std::runtime_error("Message Here") {}
virtual ~SomeExceptionClass() {}
};
I use this in NAS2D for very specific errors that the user can recover from. For example:
class filesystem_already_initialized;
class font_bad_data;
class image_bad_copy;
class mixer_backend_init_failure;
class renderer_opengl_context_failure;
And etc. This allows the program to catch a specific error or provide a generic std::runtime_error& handler:
NAS2D::Image image1("path/to/image.png");
try
{
NAS2D::Image image2 = image1;
NAS2D::Font font("path/to/font.ttf");
/* exception of type font_bad_data thrown here */
}
catch(const image_bad_copy& e)
{
/* Some sort of exception handling in the case of image_bad_copy */
std::cout << "Failed to copy image: " << e.what() << std::endl;
/* Possibly set up a default image of some sort here to visually indicate an error */
}
catch(const std::runtime_error& e)
{
/* No handler for font_bad_data, so catch by reference to base type as a generic handler */
std::cout << "Some other runtime error that doesn't require exploding: " << e.what() << std::endl;
}
Note that the above code doesn't attempt to catch possible other exception types, e.g., a std::bad_alloc. In a case like a failed memory allocation it would be appropriate in many cases to allow the program to unwind back to the original try block and possibly to just exit alltogether.
You're quite right, it would need to be catch(const char*). I suspect this is a somewhat recent change in the language where embedded text strings no longer implicitly convert to char* and you're forced to use const char* instead. That means my old code is broken now, since it was indeed catching char*.
Not too sure what you meant by suspecting there was an implicit conversion going on. Should be no conversion for const char*, since an embedded text string will be of type const char*. I tried catching with a string& and a const string&, but neither would work.
Example:
#include <iostream>
#include <string>
using namespace std;
int main()
{
try {
throw "A text string";
}
catch(char *) {
cout << "char*" << endl;
}
catch(const char*) {
cout << "const char*" << endl;
}
catch(const string&) {
cout << "string&" << endl;
}
return 0;
}
Output:
If I remove the catch(const char*) block, leaving the others, the output is:
Good point on the double catch design. That is useful to handle error of different severity, or that require different logging output.
I spent some time on OP2Archive recently.
Now when you attempt to upload a filename that contains a non ASCII character (above decimal 127), an error is thrown and displayed to the user. This is to protect Outpost 2's alphabetical sorting algorithm for archive files that does not include support for things like characters with accents.
I'm using the function below to check if a non ASCII character is present. I'm not thrilled with the solution. Since characters above 127 come back as a negative number, I'm checking both below 0 and above 127. I'm wondering if there are holes to approaching the problem this way. If anyone has any suggestions, I would be will to implement them instead. Since the sort is isolated in this function, it will be easy to change out in the future if bugs are discovered:
#include <algorithm>
bool StringHelper::ContainsNonAsciiChars(string str)
{
return std::any_of(str.begin(), str.end(), [](char i) { return (i < 0 || i > 127); });
}
I also opened a new branch on OP2Utility. It separates each class inside Archive into a discrete file of source code. Previously, we had ArchiveUnpacker, ArchivePacker, MemoryMappedFile, and ArchvieFile all defined in the same .h/.cpp file. I found this pretty confusing. I opened the branch so that Hooman could review my changes to his code before affecting the master.
Plan going forward:
- Update the Archive function below to not take an internalNames variable. Instead, produce the internal names from the filesToPack variable organically.
virtual bool CreateVolume(const char *volumeFileName, int numFilesToPack, const char **filesToPack, const char **internalNames) = 0;
- Move code to alphabetically sort the list of files to pack into the Archive code. (Currently, anyone wanting to create an archive must manually sort their filenames).
- Move code to check for non-approved characters in filenames into the archive code. (Same reason as above)
- Push another release of OP2Archive.
I'm planning to implement these functions using modern C++ practices within the Archive code. I've already been sprinkling in things like std:unique_ptr into the Archive code. Hooman, if this bothers you, please let me know and I'll stop. I'll do all this work in a separate branch for merging once everyone is happy with it.
Thanks,
Brett
...
I'm using the function below to check if a non ASCII character is present. I'm not thrilled with the solution. Since characters above 127 come back as a negative number, I'm checking both below 0 and above 127. I'm wondering if there are holes to approaching the problem this way. If anyone has any suggestions, I would be will to implement them instead.
...
Use an unsigned char instead. Ascii characters aren't defined in negatives so it's odd to look at values less than zero.
/* Note explicit cast, quiets static code analyzers */
return std::any_of(str.begin(), str.end(), [](unsigned char i) { return (static_cast<unsigned short>(i) > 127); });
Since it is unusual to use high ASCII characters, and encoding considerations were likely not accounted for, it may make sense to warn the user of a potential problem, though I really feel this should be a warning rather than an error. Besides, how can someone test the behaviour of Outpost 2's algorithm if you program refuses to create the test data.
Hooman, since you don't agree with aborting the operation in this case, I'll drop integrating this feature into OP2Utility's Archive library code. It will remain in the OP2Archive as it currently exists. Feel free to update OP2Archve to display a warning instead of aborting if you wish.
I researched and used TortoiseGit to perform both the interactive rebase and the force push. First time using --force, but it appears everything worked out right. Not sure how rewriting the public history will affect your copy of the repository. Thanks for the quick tutorial on how to accomplish it.
The sort commit isn't really related. It would be better on a different branch. Though I do understand the temptation, as keeping work linear is easier, and you often have changes that depend on other pending changes. In this case, you could have created another branch off of master. If the changes were more dependent on the branch changes, you can create a branch off another branch.
I think you are talking about the commit Fix function StringHelper::ContainsNonAsciiChars(std::string str) to use only unsigned chars. I made this commit to the master and then pushed it also to the branch in an effort to keep both branches as closely aligned as possbile.
However, now that I merged the branch, the commit Fix function StringHelper::ContainsNonAsciiChars(std::string str) to use only unsigned chars is listed twice which isn't desirable.
Hooman, how do you feel about changing this function:
virtual bool CreateVolume(const char *volumeFileName, int numFilesToPack, const char **filesToPack, const char **internalNames) = 0;
instead to read:
virtual bool CreateVolume(std::string volumeFileName, std::vector<std::string> filesToPack) = 0;
I think we both agree on removing const char** internalNames and generating these internal to the OP2Utility Archives code.
Changing filesToPack from const char** to std::vector<std::string> removes the need to pass the int numFilesToPack. I'm working with vectors on my side and it doesn't make sense to me to convert them into a char** if I don't have to. Perhaps I'm missing something though.
Thanks,
Brett
Leeor,
My intent is to provide the archive code with a copy of the string and vector to do with what they please without modifying the values kept by OP2Archive. So I think in C++11 or newer the best way to achieve this is as stated: I have no real idea what I'm trying to say here, it just seems to be the right way to do it based on some research online. The whole topic of how to pass arguments in C++ is really confusing to me. :-\
virtual bool CreateVolume(std::string volumeFileName, std::vector<std::string> filesToPack) = 0;
Hooman,
Check Leeor's 4 APR post in this thread. I added the static_cast based on his recommendation.
Plan to start working a new branch starting today to make these changes.
Thanks,
-Brett
Okay, I just created a new branch and pull request for OP2Utility.
I updated:
virtual bool CreateVolume(const char *volumeFileName, int numFilesToPack, const char **filesToPack, const char **internalNames) = 0;
to read:
virtual bool CreateVolume(std::string volumeFileName, std::vector<std::string> filesToPack, std::vector<std::string> internalNames) = 0;
I stopped here because this created a lot of changes in the source code and I didn't want the branch to become too big to review. Once this branch is merged, I'll create a new branch to remove internalNames from CreateVolume.
I tested the code in this branch by creating a new .vol and .clm file. Both packed successfully. I even tested the clm file in game and it worked.
Commit Message
* Update ArchivePacker::CreateVolume() to use std::string and std::vector in input arguments
- Update VolFile.h/.cpp to use size_t to represent the number of files in archive instead of using int.
- Store filesToPack and internalNames as std::vector<std::string> instead of char**.
- Remove destruction subroutines where char** was replaced with std::vector<std::string>.
Pull Request URL: https://github.com/OutpostUniverse/OP2Utility/pull/9
I requested review from Hooman, but don't have any issues with others reviewing/commenting on it as well through GitHub.
Thanks,
Brett
To readress my statement of passing by reference to const, you _still_ should do that. What you're doing now is passing a std::string by value. This means that you're invoking a constructor and a destructor when you don't actually need to. By passing by reference to const (e.g., const std::string& volumeFileName), you avoid the cost of the construction and destruction of a copy of the object. Instead, you are passing by reference. Since the function doesn't appear to be changing the string at all and it appears it's used once in the function CreateFileA, passing by reference to const is the most efficient way to do this.
Likewise, the second and third arguments:
std::vector<std::string> filesToPack, std::vector<std::string> internalNames
should also be either references or references to const. E.g.:
const std::vector<std::string>& filesToPack, const std::vector<std::string>& internalNames
Basically, your function prototype should look like this:
bool ClmFile::CreateVolume(const std::string& volumeFileName, const std::vector<std::string>& filesToPack, const std::vector<std::string>& internalNames)
Again, the function appears to call const member functions of std::vector... that being the case, by passing by value as you are, you're creating copies of each std::vector which invokes a constructor for the vector and a constructor for each item in the vector. For small file counts this is trivial but if you are doing any processing on any less than trivial count of files and internal names, you're potentially copying hundreds to thousands of objects. As you can imagine this gets very slow very fast.
This is in direct opposition to languages like Visual Basic, Java and C# where passing by value is as fast or faster (basically it's handled internally as pointers but you as the user don't see that).
Pass by reference to const whenever and wherever possible. If you don't actually need a copy of an object, don't make said copy. By changing the arguments to reference to const, you won't need to make an changes to the function itself unless you're calling non-const member functions but from a quick skim of the function it doesn't look like you're doing that.
As a semi-related side note, some really good compilers can see these kinds of things and will probably optimize that sort of needless copying, allocation and deallocation on its own but you can't count on that.
As another semi-related side note mostly regarding your post about C, in the case of C you'd be passing arguments via pointers and const pointers... e.g., const char* _string, etc.