Outpost Universe Forums

Projects & Development => Projects => Topic started by: Vagabond on August 04, 2017, 07:34:24 PM

Title: OP2Archive Application Development
Post by: Vagabond on August 04, 2017, 07:34:24 PM
I was thinking we should create a finished version of a console application that allows for manipulation of Outpost 2 Archive files. We have the GUIs Vol Creator and Vol Extractor that work (http://wiki.outpost2.net/doku.php?id=outpost_2:helper_programs). Unfortunately, as they have aged they now require registering some components with newer versions of the Windows operating system that are not present by default. They also do not work with the clm file (I think they cannot anyways?). Also, the code for these two GUIs does not appear to be in the repository.

As an added benefit, we may eventually be able to get the application to compile for Linux or other platforms.

I was thinking a console application because they are simple to program and usually age very well. I'm also not experienced with GUI work in C++ which would take me a really long time to spool up on.

We have a console application called VolDecompress that Hooman made, which works well for extracting archive files. I was thinking OP2Archive might be a better name for this application since it will do more than extracting. Hooman has already written the code to extract, create, and modify both .vol files and .clm files.



Command line example convention syntax: https://www.ibm.com/support/knowledgecenter/SSZJPZ_8.5.0/com.ibm.swg.im.iis.common.doc/common/command_conventions.html

Below are the console commands I was thinking it should support:

OP2Archive CONTENTS archiveName.[vol|clm]
  * Lists the contents of an archive file.

OP2Archive FIND filename
  * Lists the archive filename that contains the specified file. Or lets you know the file is not archived.

OP2Archive CREATE archiveName.[vol|clm] [filename...]
  * Creates an archive with the given name. Adds the listed files to the archive. Also allows creating an empty archive.

OP2Archive ADD archiveName.[vol|clm] filename...
  * Adds the specified file to the archive.

OP2Archive REMOVE archiveName.[vol|clm] filename...
  * Removes the specified file from the archive.

OP2Archvie EXTRACT archiveName.[vol|clm]... [filename]...
  * Extracts the specified file from the archive. If no filenames provided, extracts the entire archive.



File Whitelisting

I was thinking we could whitelist file types that are supported by Outpost 2 loading from a given archive type. (It looks like maybe anything could be stuffed in a .vol file and loaded by Outpost 2, so I'm a bit on the fence about this). I don't want to get too wrapped up in checking files though. In general the user will need to be familiar with what is appropriate and not to stuff in the archives.

 * .vol file include whitelist: .bmp .map .prt .raw .txt .wav .rtf
 * .clm file include whitelist: .wav



The initial program will be written in C++ using VS2017.

Any thoughts or additions to this are appreciated. I haven't started any actual coding yet, so it is easy to change. : )

Once this is done, we can look at removing windows specific code from the OP2Utility library archive section and test the results to ensure everything is working properly.

-Brett
Title: Re: OP2Archive Application Development
Post by: leeor_net on August 05, 2017, 01:38:00 AM
GUI work in C++ is extremely painful. It would be better to build a console app in C++ (as you suggested) and then, if demand exists, develop a GUI front-end in either C# or VB.Net.

The old programs didn't age well because they were written in VB. I'm all for replacing them with better open-source programs.

Want to state again that all of your work is very much appreciated!
Title: Re: OP2Archive Application Development
Post by: Hooman on August 05, 2017, 09:48:05 PM
I would love to have a console application like this that could run on Linux.


The whitelist makes sense for the CLM files. The CLM files are only really useful for WAV data, and all data must have the same header parameters. The header is stored once in the CLM. The WAV data is stored in a stripped form.

The VOL files are more generic, and can basically store any kind of data. Though with the way Outpost 2 works, it wouldn't make sense to store the DLLs inside the VOL files, since the Windows loader wouldn't be able to access them there, and that's needed for the levels to load and be playable. They might show up in a list of available games if they were in a VOL, but they wouldn't run. Hence the VOL files might have a blacklist for DLL files. Not sure if you'd really want to enforce this though, or maybe just give a warning higher up in the user interface. Nothing wrong with transporting a DLL in a VOL file, though there is no reason to do this.


Something I've learned from experience, the ADD and REMOVE functionality is a pain to implement in an efficient way, and often not very useful in practice. It might be nice to have them for a sense of completeness, but I think they should get lower implementation priority. The issue is that added files means updating the VOL header, which might cause expansion of the header if there isn't sufficient blank space, and so you're left copying the contents to a later point in the file to make space. In effect, it becomes a CREATE operation, where some of the source files are an existing VOL file. Similarly, if you want to remove a file and don't want dead space in the VOL header, you're again copying data back towards the front, again effectively doing a CREATE operation. Hence why I figure the CREATE and EXTRACT methods are the most useful, and you can get away with just those.

Might consider renaming CONTENTS to LIST. I think that would be more standard.
Title: Re: OP2Archive Application Development
Post by: Vagabond on August 08, 2017, 08:24:07 AM
Leeor, thank you for the encouragement! Hooman, Thank you for the input!

Quote
The whitelist makes sense for the CLM files. The CLM files are only really useful for WAV data, and all data must have the same header parameters. The header is stored once in the CLM. The WAV data is stored in a stripped form.

Sounds good. For the first iteration I think we can just cerr if something besides a .wav file is attempted to be added to .clm. Maybe work up to actually error checking the contents of the .wav file if we continue far enough with the project.

Quote
The VOL files are more generic, and can basically store any kind of data. Though with the way Outpost 2 works, it wouldn't make sense to store the DLLs inside the VOL files

I agree. We can put a note in the Readme about the problems of storing dlls in the .vol file.



Quote
Something I've learned from experience, the ADD and REMOVE functionality is a pain to implement in an efficient way, and often not very useful in practice. It might be nice to have them for a sense of completeness, but I think they should get lower implementation priority. The issue is that added files means updating the VOL header, which might cause expansion of the header if there isn't sufficient blank space, and so you're left copying the contents to a later point in the file to make space. In effect, it becomes a CREATE operation, where some of the source files are an existing VOL file. Similarly, if you want to remove a file and don't want dead space in the VOL header, you're again copying data back towards the front, again effectively doing a CREATE operation. Hence why I figure the CREATE and EXTRACT methods are the most useful, and you can get away with just those.

Good to know. For modifying and creating archive files, I think we could implement what you are saying by encouraging the workflow below:


What if for ADD/REMOVE we:


The downsides to this approach is that a CREATE/ADD command are going to be more expensive than one would intuitively think and if an unhandled exception occurs the temp folder may remain or the original archive file be deleted and gone. Hmm, I could see this maybe not being a good idea to implement.



Optional Arguments

I was thinking about the following optional arguments

    -H / --Help / -?: Displays Help File
    -Q / --Quiet: [Default false] Add switch to run application without issuing console messages. (Not available for LIST/CONTENTS)
    -O / --Overwrite: [Default false] Add switch to allow application to overwrite existing files. (Not available for LIST/CONTENTS)
    -C / --Compression: [Default None]. Allows None|JPG|BMP. Sets the compression alghorithim used in archive. (Only available for CREATE.)
    -D / --DefaultDirectory: [Default to a directory called archiveName for extraction of an entire archive file or the current working directory for a single file]. Allows changing where the extracted contents of an archive file (either all contents or a single file) are placed.



Help / Usage Statements

We can have a short initial usage statement and then have the user type -h on the given command for details on the command. For example, 'OP2MapImager CREATE -h' for help on the CREATE command.



Format of LIST command

I was thinking about the following format for the LIST command. I think size might be the compressed size of the file inside the archive if compression is used instead of the uncompressed size for ease of implementation? It would be the results of ArchiveUnpacker::GetInternalFileSize. Either way, it would be useful for seeing the relative size of each file.

Contents of maps01.vol (12 files)
ID  NAME          SIZE (KB)
--------------------------------------------------
00  blue.bmp      x,xxx
01  color.bmp     xxx,xxx
02  eden04.map    xxx
03  op2_art.prt   xxx
04  tutorial.map  xxx
05  virmask.raw   xxx
06  well0000.bmp  xxx




Quote
Might consider renaming CONTENTS to LIST. I think that would be more standard.

I was on the fence about this. Since you are mentioning it, probably best to go with LIST.
Title: Re: OP2Archive Application Development
Post by: Vagabond on August 09, 2017, 06:39:29 AM
Quick C++ memory management question.

Can I write the code below, or will the created VolFile become a memory leak? The underlying function listContents requires a pointer to a VolFile class.

Code: [Select]
if (XFile::extensionMatches(filename, ".vol"))
archiveConsoleListing.listContents(new VolFile(filename.c_str()));

-Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on August 09, 2017, 08:56:36 AM
Quote
Sounds good. For the first iteration I think we can just cerr if something besides a .wav file is attempted to be added to .clm. Maybe work up to actually error checking the contents of the .wav file if we continue far enough with the project.

The existing CLM packing code will already error on this condition. The packing code will try to load the WAVE header from all input files, which will error out if they are not proper .wav files. It also then compares the headers to make sure they are all the same format, and errors out if they are not.


To repack an archive, you can read the files right out of the old archive, same as the game can, while writing them into a new temporary file. Once the new file is packed, you'd want to do an atomic rename to the original filename. If the OS supports atomic renames, the original filename should at all times remain valid, and either point to the original archive, or the new fully packed and ready archive. Should be safe. Failing that, a quick delete and rename at the end is unlikely to be caught, and if the operation is interrupted after the delete, but before the rename, the user could always just complete that operation manually themselves. Again, that's fairly safe.

Quote
I think size might be the compressed size of the file inside the archive if compression is used instead of the uncompressed size for ease of implementation? It would be the results of ArchiveUnpacker::GetInternalFileSize.

I agree. The VOL structure doesn't make it easy to get the unpacked size without actually doing the decompression. They could have made it easy by using the unpacked size in the index entry, and the VBLK size as the packed size, but they didn't. They are both the packed size.
Title: Re: OP2Archive Application Development
Post by: Vagabond on August 10, 2017, 06:30:50 AM
Quote
The existing CLM packing code will already error on this condition. The packing code will try to load the WAVE header from all input files, which will error out if they are not proper .wav files. It also then compares the headers to make sure they are all the same format, and errors out if they are not.

Perfect. I'll just display the error uses cerr for the user.


I'll upload my progress shortly. It will live under the GameResources directory in the repository. So far, I've hard coded the commands LIST (earlier CONTENTS) and FIND. Next step will be to design the command argument parser and hook LIST and FIND to command arguments. I'll base it off the code I wrote for OP2MapImager. The usage statement is also partially written.

I was surprised how easy it was to do formatting with cout. It breaks the rule that most everything is harder with C++. :)


(http://forum.outpost2.net/index.php?action=dlattach;topic=5987.0;attach=902;image)
Title: Re: OP2Archive Application Development
Post by: leeor_net on August 10, 2017, 03:47:48 PM
A lot of the new features in C++11 have made C++ considerably easier to use. I really love the new additions... not that this affects what you did but still. ;D

Anyway, I'm looking forward to having a CLI tool to do the packing/unpacking. I had to extract some map and graphics files and the VOLExtractor program is reasonably okay but it requires VB6 OCX files and I was somewhat unhappy that I had to register it as a COM object in the registry. Super not happy about that. :-\

So anyway yeah, I'm very much looking forward to that.

Separate note -- maybe now is a good time to start migrating some of the code from SVN to Git? I've started a new project forked from another one of mine and uploaded it to the OPU GitHub repository.
Title: Re: OP2Archive Application Development
Post by: Hooman on August 12, 2017, 04:28:12 AM
Quote
Separate note -- maybe now is a good time to start migrating some of the code from SVN to Git?

I was thinking the same thing. I'm enjoying the experience with GitHub. Let me know if you'd like help with moving things. I think git-svn can be used to make the transition easier, while preserving history.


Nice looking output. Does it handle long filenames well? Do long filenames show in full, or get truncated?
Title: Re: OP2Archive Application Development
Post by: leeor_net on August 12, 2017, 05:22:42 AM
Quote
Separate note -- maybe now is a good time to start migrating some of the code from SVN to Git?

I was thinking the same thing. I'm enjoying the experience with GitHub. Let me know if you'd like help with moving things. I think git-svn can be used to make the transition easier, while preserving history.

There's a few things that we need to pull out of the SVN before we can move it over (namely the gamerelease directory) but otherwise git-svn should be fairly straight forward.

End Thread Hijack
Title: Re: OP2Archive Application Development
Post by: Vagabond on August 13, 2017, 10:08:12 AM
Code: [Select]
Nice looking output. Does it handle long filenames well? Do long filenames show in full, or get truncated?

Thanks Hooman. Really long file names currently jack up the formatting. I'll try to fix this in the near future by setting a max filename character length before ignoring the length on the filename column width. This way really long filenames will only mess up their row on the table and none of the others.


I've programmed the LIST and FIND commands to accept command line arguments. They seem pretty robust. I'm partially through the EXTRACT command.

Below is a draft of the usage statement. I'm trying to go for a POSIX syntax. Let me know if it is confusing or out of line. () means required and [] means optional and ... means multiple allowed.

(http://forum.outpost2.net/index.php?action=dlattach;topic=5987.0;attach=905;image)

-Brett
Title: Re: OP2Archive Application Development
Post by: Vagabond on August 13, 2017, 08:01:18 PM
Just uploaded code to the repository. I fixed the LIST command to play nicely(ish) with too long of filenames. See screenshot for what I mean. Basically, the normal list will cap out at 35 character length for the filename including extension. If the filename length is greater than 35 characters, it allows the longer filename to extend beyond the width of the file size column without affecting the rest of the table.

I think this should suffice unless anyone has better ideas on the matter. I would prefer if no one uses filenames that are so large though for my own sanity.  ;)

(http://forum.outpost2.net/index.php?action=dlattach;topic=5987.0;attach=914;image)

Also, I feel like the smiley face choices change every other time I post. I am with Hooman that they are better not being animated.

Back to work on the EXTRACT command...

-Brett
Title: Re: OP2Archive Application Development
Post by: Vagabond on August 19, 2017, 09:06:46 AM
I've been plugging away at designing OP2Archive. The EXTRACT command is mostly designed but not really tested.

I have the CREATE command actually taking files and saving them in a new archive.

It currently requires opening an existing vol file before I have access to saving a new vol file. A cursory look at the ClmFile class code makes it look like it also requires opening an existing clm file in order to create a new clm file.

@Hooman,

How do you feel about & how hard do you think it would be to add a static function to the VolFile and ClmFile class that allows creating a the respective archive file without opening an existing archive?

If this isn't feasible, we can include a VolTemplate.vol and a ClmTemplate.clm with the executable to reference when creating new archives.

-Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on August 20, 2017, 07:49:17 AM
Nice output. Looks like you've got the long filenames handled. Good that they don't get truncated. Better to be correct than pretty. Also good they don't mess up the formatting of other rows.

The "KB" in the header, but not the individual rows seems weird to me. Something about a file of size "2", with no units, and it not being bytes feels odd. If using human readable units (B, KB, MB, GB), I'd suggest they change based on the row data, and hence be included for each row. Either that or fix it at bytes for all rows. Looks like there is sufficient space available to display bytes, and the result may be more meaningful.


Quote
How do you feel about & how hard do you think it would be to add a static function to the VolFile and ClmFile class that allows creating a the respective archive file without opening an existing archive?

This sounds like a very good plan. I actually regret some of the design choices of the old code. The class to read from a Vol or CLM file doesn't have much in common with code to create them. The multiple inheritance design was bad. A static method for archive creation makes way more sense.

I think the old design is bad for the same reason that "bidirectional streams" are a bad design. It really makes no sense.
Title: Re: OP2Archive Application Development
Post by: Vagabond on August 20, 2017, 02:52:37 PM
Quote
The "KB" in the header, but not the individual rows seems weird to me. Something about a file of size "2", with no units, and it not being bytes feels odd. If using human readable units (B, KB, MB, GB), I'd suggest they change based on the row data, and hence be included for each row. Either that or fix it at bytes for all rows. Looks like there is sufficient space available to display bytes, and the result may be more meaningful.

Hmmm, I just ran the 'dir' command for windows in a command prompt. It defaults to showing data in bytes. I'm assuming a similar command in Linux would also display file size in bytes? So, probably best for consistency to switch to bytes. Bytes always seemed too small to me when comparing file sizes, but I'd rather be uniform.

It also occurred to me the sizes should be right aligned instead of left aligned.

I'll plan to make another pass on the LIST command's output at some point and make these changes.



Quote
This sounds like a very good plan. I actually regret some of the design choices of the old code. The class to read from a Vol or CLM file doesn't have much in common with code to create them. The multiple inheritance design was bad. A static method for archive creation makes way more sense.

I think the old design is bad for the same reason that "bidirectional streams" are a bad design. It really makes no sense.

I'm just really glad there is boiler plate code that can handle archive file access. This is something that would have been well beyond my ability. I don't mind if we want to rewrite the archive code to cut out the multiple inheritance.

I noted that when trying to create a vol file, the current code packs the files properly into temp.vol, but doesn't create a vol file with the passed filename.

Below is the function in question. I think if I just change the hard coded "temp.vol" to be volumeName from the function's argument, it will work properly.

Code: [Select]
bool VolFile::CreateVolume(const char *volumeName, int numFilesToPack,
const char **filesToPack, const char **internalNames)
{
CreateVolumeInfo volInfo;

volInfo.numFilesToPack = numFilesToPack;
volInfo.filesToPack = filesToPack;
volInfo.internalNames = internalNames;

// Make sure files were specified
if (numFilesToPack < 1 || filesToPack == NULL || internalNames == NULL) return false;
// Open a file for output
if (OpenOutputFile("Temp.vol") == 0) return false; // Return false on error

// Open input files and prepare header and indexing info
if (PrepareHeader(volInfo) == false)
{
CloseOutputFile();
return false;
}


// Write volume contents
// ---------------------
// Write the header
if (WriteHeader(volInfo) == false)
{
CleanUpVolumeCreate(volInfo);
return false;
}
if (WriteFiles(volInfo) == false)
{
CleanUpVolumeCreate(volInfo);
return false;
}

CleanUpVolumeCreate(volInfo);

return true;
}

My current goal will be just getting the console interface working properly. Once that is done, I guess we can discuss what we want to do to the archive library code.

I was noticing that we are passing Boolean success values out right now for CreateArchive and Repack functions. I'm just adding a cerr message saying creating the archive failed right if the CreateArchive function passes back false. I'm thinking passing the Boolean success out is probably driven by using the windows specific file handles. Perhaps when we switch the code over to something less operating system specific we can switch to more of a throwing exceptions error handling than Boolean? I think we could probably provide a little more fine grained report to the user on why the action failed.
Title: Re: OP2Archive Application Development
Post by: leeor_net on August 20, 2017, 03:18:39 PM
I would highly highly highly recommend switching

Code: [Select]
const char*

to

Code: [Select]
const std::string&

and

Code: [Select]
const char**

to

Code: [Select]
const std::vector<std::string>&

Perhaps

Code: [Select]
typedef std::vector<std::string> stringlist;
const stringlist&;


(really wish there was an inline code tag, hope all the above makes sense)
Title: Re: OP2Archive Application Development
Post by: Vagabond on August 20, 2017, 10:00:01 PM
Leeor,

I understand you wanting to switch to C++11 strings, but there are currently 13 files of code involved in manipulating archive files. I'd have to switch them all from char* to std::string, which would be a fair commitment of time. I've avoided any C++11 includes in the archive code since this is how Hooman left it. I'd probably want his thought on it before changing it all.

I do agree with you in the long term of wanting to move away from char* where not required to interface with legacy Outpost 2 code.



I did some experimentation trying to create empty vol and clm files. I had to remove an if statement from the volFile's code to allow it to create an empty vol file (the if statement was returning false if no files provided to pack). After this, I was able to create and manipulate the empty vol file without issue. I'll plan to re-add part of the if statement to still check for improper values, just not reject empty values.

The ClmFile class allowed creating an empty CLM file. After creation, the file could not be loaded though. I suspect this is because a ClmFile requires at least one source audio file to pull valid audio parameters from. I'll plan to add a check to the ClmFile that returns false if no files are included to keep from creating the invalid ClmFile. 

Curiously, the VolFile class allowed overwriting an existing file while the ClmFile class did not. I fixed ClmFile to allow overwriting a file.

To summarize:
 * You can now create empty vol files
 * I plan to disable ability to create empty clmfiles since they are currently badly formed
 * clm files can now be written over when creating new ones

My plan was to create both an empty vol and clm template to use when creating new archives. Unfortunately, I'll have to pack at least one audio file in the clm file for it to be valid (which will be 9mb). An alternative would be clipping the majority of the contents of the audio file to get it down in size. I'd have to research how to create an audio file to do this (I'm sure the data is floating around the forum).

Also, I noticed while you can get the compression type used on a vol file, there is no way to set the compression on a vol file when created. It might be as simple as changing the compression in the CreateVolume function, but I haven't tested this yet. Otherwise, we can just support reading compressed files and only support creating uncompressed vol files.

-Brett
Title: Re: OP2Archive Application Development
Post by: leeor_net on August 21, 2017, 07:01:38 AM
Just a note, std::string is not a C++11 deal, that's been in there since C++98.

I recommend it as it's a lot safer than using raw C strings. You can make small changes to the interfaces as you go. Except for the std::vector<std::string>, because of the way std::string is built, passing a const char* as a paramter for a std::string& will work just fine (under the hood it creates a std::string class, passes a reference to that, then destroys said string so there are some efficiency issues until everything is converted to std::string, but the point is it'll work). Using a std::vector<> though would complicate matters by requiring changes to any code that uses the original form (array of const char*).

Anyway, the intent behind it is to produce better code that's harder to break. By using objects that encapsulate pointers and handles C strings for the user, you provide some insulation against pointer mistakes. Having worked with C strings for a couple of months now, I've discovered how much of a headache they are.

As a side note, for functions not yet converted, you can use std::string::c_str() to get a const char*.
Title: Re: OP2Archive Application Development
Post by: Hooman on August 21, 2017, 01:47:41 PM
Quote
It also occurred to me the sizes should be right aligned instead of left aligned.

Good call, I missed that.

Multiple inheritance is the work of the devil  :P

Quote
I noted that when trying to create a vol file, the current code packs the files properly into temp.vol, but doesn't create a vol file with the passed filename.

I vaguely remember code to pack stuff into a temporary file, and then rename it to the proper file name. That avoided problems of corrupt files if the program or computer crashed while packing an archive. In particular, it was safer if you were repacking the current archive, in which case you wouldn't want to corrupt it before you had a reliable replacement, and you wouldn't want to replace it if you were still reading from it. Though from what you've pointed out, it does indeed look like a bug if the parameter is ignored. Perhaps the code was badly structured.

Quote
I was noticing that we are passing Boolean success values out right now for CreateArchive and Repack functions. I'm just adding a cerr message saying creating the archive failed right if the CreateArchive function passes back false. I'm thinking passing the Boolean success out is probably driven by using the windows specific file handles. Perhaps when we switch the code over to something less operating system specific we can switch to more of a throwing exceptions error handling than Boolean? I think we could probably provide a little more fine grained report to the user on why the action failed.

Indeed, exceptions would provide better error messages. I was mostly just following an older code style at the time. Also somewhat following how Outpost 2 seemed to work, which seemed to have exceptions disabled. The choice was not related to Windows or file handles. Back then, I tended to avoid exceptions since even having them enabled in the compiler, would cause less efficient code to be output, even if you weren't using exceptions. I didn't see the value in them at the time. I also tended to avoid any sort of runtime inefficiency, even when other concerns may have warranted it. Some of those concerns may have been dealt with in future compilers. At any rate, it would make sense to use exceptions here.


It may also be worth switching things over to using string and vector. I used to avoid a lot of standard library stuff for little good reason.


Quote
The ClmFile class allowed creating an empty CLM file. After creation, the file could not be loaded though. I suspect this is because a ClmFile requires at least one source audio file to pull valid audio parameters from. I'll plan to add a check to the ClmFile that returns false if no files are included to keep from creating the invalid ClmFile.

You're probably correct here. Maybe set a default WaveFormat if no input files are given, and see what happens?

It's good that you're testing corner cases.

Quote
My plan was to create both an empty vol and clm template to use when creating new archives.

I'm not sure I follow here. Was this to work around some issue I've forgotten about now? Was this for repacking or something? I think there must be a better way than relying on empty template files.

Quote
Also, I noticed while you can get the compression type used on a vol file, there is no way to set the compression on a vol file when created. It might be as simple as changing the compression in the CreateVolume function, but I haven't tested this yet. Otherwise, we can just support reading compressed files and only support creating uncompressed vol files.

Yes, we never wrote any compression code. We've only ever supported repacking of uncompressed data. The compression scheme was complex enough that I never quite wrapped my head around how to write a compressed stream. Plus there was little reason to work on the problem. I suspect it can be done, and shouldn't be too hard given decompression code already exists, but it may take some thought and a detailed look at the decompression code.

I can't remember now if the game included any compression code. I don't think it did, or if it did, I don't think it got documented in the disassembly efforts.
Title: Re: OP2Archive Application Development
Post by: Vagabond on August 21, 2017, 10:53:23 PM
Alright, the LIST command is fixed up for right aligned and showing results in bytes. I think LIST is pretty much done.

(http://forum.outpost2.net/index.php?action=dlattach;topic=5987.0;attach=957;image)


Title: Re: OP2Archive Application Development
Post by: Vagabond on August 21, 2017, 11:37:31 PM
Besides fixing up the LIST command as indicated in the last post, I spent some time reviewing the Archives source code and creating more clm and vol files.

I committed changes to the repository. Below is the overview of changes.

OP2Archive:
 * Implemented the CREATE command.
 * Fixed bug with ClmFile that did not allow over writing an existing CLM file.
 * Allowed VolFile to create empty Vol Files.
 * Added error checking to ClmFile and VolFile that stop file creation if NULL pointers provided to filenames.
 * Fixed Archive LIST command to show file sizes right aligned and in bytes.



I think there is sort of a crossroads here on where I continue work. I can either spend time refactoring the archive code or leave the archive code alone and have some funny peculiarities in the OP2Archive console application.


Currently, the archive library code requires loading an existing vol or clm file into memory before being allowed to create a new file. To instantiate either a ClmFile class or VolFile class, you must provide the filename of the Vol or Clm file to load. Additionally, both classes reference memory locations to pull information out of the archive. (Sorry, if I'm not explaining the well.) Basically, when Hooman and I made up the OP2 Map Structure, we wrote up a structure that is then filled up with data when the stream is loaded. In the ClmFile and VolFile case, the code references the memory location in the raw loaded data, and sometimes also records important data to actual variables.

My first thought was to create a constructor taking no input parameters that represented an empty file. Then I could write a static function that creates the archive using the parameterless constructor. But, because the full contents of the archive are not contained in a structure, I cannot just go in and set reasonable default values without doing what looks to me like a very major overhaul of the archive code. To complicate this, I'm not familiar with the design of audio files, and I would want to bone up on this subject before really overhauling the ClmFile class anyways.

Currently, I don't really plan on spending enough time on this project to do a major overhaul of the archive library code and write a decent console application to use it. Someone with a better understanding of the Windows specific file handles and Windows specific audio file code could probably do it faster than me?

Anyways, after all that info, the archive library code works fine, just not how I was expecting to use it. I have noticed a couple of small bugs, but nothing that was not easy to patch with one line of code change here or there.

So, using the current archive library code means that OP2Archive will require a vol archive and clm archive template file in order to construct new archives. Since I can create an empty vol archive, this will be about 2kb I think. The problem is that a clm file will not save in good form without at least one audio file included to pull the proper audio meta-data out of. The smallest audio file included in OP2 is about 9 megabytes. OP2 also has some weird rules on how to properly format an audio file. See http://forum.outpost2.net/index.php?topic=3906.0. If I can figure out how to create a short 1 second clip that is well formed, I will do this, but I may just leave it at 9 megabytes for now.

Also, I verified that there is no plumbing in VolFile for creating a compressed archive. I didn't continue digging into the LZH code to see if it would support the operation. I think if we are content keeping all the files uncompressed in the archives, there probably isn't a lot of reason to dig down this rabbit hole. I was pretty curious how much hard drive space we could save by compressing it all. I'll plan to take the compression argument out of the usage statement.

To recap: I'm planning on finishing OP2Archive in a way that requires a template of each archive in order to create a new file. If we get around to reformatting the archive library code, it will be easy to go back through the OP2Archive application code and change how the archive files are creating.



List of things to consider in improving Archive library code sort of in order of importance in my eyes

 * Remove Windows Specific file manipulation code (windows.h).
 * Remove Windows Specific audio file code from ClmFile class (windows.h).
 * Create a way to make a default, empty vol and clm file with a parameterless constructor.
 * Add static Create functions to both ClmFile and VolFile.
 * Consider switching fileName to filename (or change all the filename in my code to read FileName :) )
 * Consider switching from char* to std::string.

Thanks for everyone's help so far on this project!

-Brett
Title: Re: OP2Archive Application Development
Post by: Vagabond on August 22, 2017, 04:55:13 PM
Quick update. I used Audacity to crop the EDEN11 audio track down to a half second or less and then exported it. This made the ClmTemplate.clm only about 2kb in size. the file formed well and was able to act as a template when encoding a new op2.clm file that in turn was able to play the in game videos without problem. So, no more worry about a 9 MB template file.

I'll plan on mentioning Audacity as a starting point in the ReadMe for formatting audio files in an Outpost 2 specific format. Unless someone else has a better free tooling solution?

My only worry is that I don't have sound in the main menu anymore? Not sure what caused this. I'm trying to download the original clm file from OPU to test if that fixes it, but it wants about an hour and a half to download. I doubt it is a problem with how the clm archive was created though or it would probably manifest in all the in game music and not just on the title track...

-Brett
Title: Re: OP2Archive Application Development
Post by: leeor_net on August 22, 2017, 05:40:00 PM
I'm actually going to be doing something very similar with a mapper project for the same sorts of reasons. There is information that we don't yet know how to fill exactly so we need an existing file to get that going.
Title: Re: OP2Archive Application Development
Post by: Hooman on August 22, 2017, 11:53:52 PM
Audacity is a wonderful tool.

Quote
... was able to play the in game videos without problem

I assume you mean in game music? I would be very surprised if you got anything else working from a CLM file.

The main menu music is a separate wav file. It's not packed in the CLM.


I still feel like this template file idea is kind of going down the wrong path.

Quote
* Remove Windows Specific file manipulation code (windows.h).
 * Remove Windows Specific audio file code from ClmFile class (windows.h).
 * Create a way to make a default, empty vol and clm file with a parameterless constructor.
 * Add static Create functions to both ClmFile and VolFile.
 * Consider switching fileName to filename (or change all the filename in my code to read FileName :) )
 * Consider switching from char* to std::string.

These are all good ideas. I'm thinking getting rid of the memory mapped files would make conversion to platform independent code easier, though it's a change I kind of dread. Another change is getting rid of the multiple inheritance. Maybe that can be done independently, and I think it would solve your template file issue.

Quote
I'm actually going to be doing something very similar with a mapper project for the same sorts of reasons. There is information that we don't yet know how to fill exactly so we need an existing file to get that going.

The maps do contain a large amount of stock data. I think this should be extracted into special data files that get shipped with the editor. Or perhaps compiled into the editor. Most useful is the tile set information, which should probably get packaged into a tileset meta info file, and the tile group info, which can also be a part of the tileset meta info. This doesn't need to be in the same format as the current map files. In fact, I think we could probably do something more useful with a custom format. Like adding tileset transition information, or marking tile groups as either palettes (take single tiles from these groups) or doodads (paste the group as a whole).

At any rate, in either case, I don't think a full template file is necessary, nor desirable. Though I do understand the desire to just get something working for now.


Edit: I'm thinking the CreateVolume function can be made static instead of virtual. That may allow you to create archives without instantiating an object first.
Title: Re: OP2Archive Application Development
Post by: Hooman on August 23, 2017, 12:58:56 AM
Came across this:
filename or fileName (https://stackoverflow.com/questions/742115/parameter-naming-filename-or-filename)

The argument seems to be whether you consider "filename" to be one word or two. I've gone both ways on this in the past, though I tend to stick with the two word convention now.

The two word convention seems to match the Microsoft convention listed in the answer, and leads to consistent casing of similar items: fileName, fileSize

A counter argument is that some people consider "file name" and "filename" to have subtly different meanings.

Here's a question: Would you write "directoryname" as one word? How about "foldername"?
Title: Re: OP2Archive Application Development
Post by: Vagabond on August 25, 2017, 04:39:00 AM
I have the CREATE command working well now. The REMOVE command is now written but untested. I'll probably test the REMOVE command and then write the ADD command after that.

For REMOVE and ADD, I'm creating a temp folder labeled OP2ArchiveTemp-RANDOMNUMBER. Then temp storing the files in this folder while deleting the existing archive and creating the new archive with updated files. I'm using cstdlib with the rand function seeded to the current system time to create a random number. Then the temp directory is deleted.

Quote
The main menu music is a separate wav file. It's not packed in the CLM.

I feel a little dumb for not knowing that. So yeah, the CLM pack is working fine.



I actually own one of the books mentioned in the Stack Overflow discussion on filename/FileName, and I agree with the book's advice where it specifically discusses compound words. It does matter if one considers filename one word.

Cwalina, Krzysztof. Framework Design Guidelines: Conventions, Idioms, and Patterns for Reusable .NET Libraries (2nd Edition) (Microsoft Windows Development Series) (Kindle Locations 1294-1301). Pearson Education. Kindle Edition.

Quote
3.1.3 Capitalizing Compound Words and Common Terms

Most compound terms are treated as single words for purposes of capitalization.   DO NOT capitalize each word in so-called closed-form compound words. These are compound words written as a single word, such as endpoint. For the purpose of casing guidelines, treat a closed-form compound word as a single word. Use a current dictionary to determine if a compound word is written in closed form. Table 3-2 shows capitalization for some of the most commonly used compound words and common terms.

It is a Microsoft/.Net heavy response, but this book is actually why I use filename over fileName. It is pretty much my goto for formatting my code variable and method names and other conventions in C#. The book is a pretty good read for trying to create a coherent large library of code. (Not that you would know it reading a lot of my code... Although that is exacerbated by my lack of experience in C++ and learning on the fly.)

Quote
filename
[fahyl-neym]

noun
1. an identifying name given to an electronically stored computer file, conforming to limitations imposed by the operating system, as in length or restricted choice of characters.

Based on the Random House Dictionary, © Random House, Inc. 2017.

In common English I would write "update the filename to read XYZ." I would not write "check the filesize." I also would also not write "change the foldername."

Ultimately though, it is more important to be consistent, so I'm ok switching all my code over to fileName if you much prefer it.

On a similar topic, I noticed you are capitalizing the first letter of your function names in the archive code. I was starting to use lowercase for the first letter to match the standard library. But now that I think about it, I'm refusing to use underscores everywhere, so maybe emulating the C++ standard library is actually a bad idea. I'm leaning towards going back and changing them all to uppercase first letter to match your archive code... I like the idea of consistency with the standard library, but maybe it is just too out of date with naming conventions to be worth emulating?

-Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on August 25, 2017, 06:51:32 AM
goto  =>  goTo   ;)


I picked up the PascalCase (or UpperCamelCase) from Microsoft. That's how the Windows API functions are written. Some sources use camelCase (or lowerCamelCase) for function names. I'm ok with either. Just so long as it's not snake_case.

Indeed, I would say a lot of the C standard library naming conventions are terrible, and not worth emulating. Particularly the excssv abbrvtns.  ;)


Quote
Quote
The main menu music is a separate wav file. It's not packed in the CLM.

I feel a little dumb for not knowing that. So yeah, the CLM pack is working fine.
I probably didn't realize that for years.


I've been tempted to work on this stuff a number of times. Though each time I open one of the projects, I quickly realize I have no way to compile or test small changes. Changing it to compile on Linux doesn't seem like such a small task either. To counter this, I've recently setup a Windows VM with a copy of Visual Studio. (That also wasn't a small easy task). Maybe I'll get to something soon.
Title: Re: OP2Archive Application Development
Post by: Vagabond on August 25, 2017, 11:24:26 PM
Glad to hear you have the vm working. Looking forward to any changes you make to the archive library or the rest of the source code. Feel free to break it as much as you want.

I will be mostly out of pocket this weekend and the following week, but should be able to jump back in next weekend.

Did a little work today troubleshooting the REMOVE command. I can confirm it is pretty broken. I'll have to wait to get it working though. :) Will post code to repo shortly. Everything compiles. REMOVE and some parts of EXTRACT just don't work properly.

-Brett
Title: Re: OP2Archive Application Development
Post by: Vagabond on September 02, 2017, 03:07:47 AM
I was working on the REMOVE Command. In the process, I stumbled on the fact that the syntax

Code: [Select]
for each ( auto i in v ) 
is actually Microsoft specific.

I was using it because the platform agnostic call is a bit awkward looking and not as intuitive:

Code: [Select]
for ( auto i : myints )
Anyways, it is an easy switch to move everything over to using the second approach. I'll try to get it switched over on Saturday before the pair programming.

When using the REMOVE command, an error is thrown when trying to delete the original file. I ran out of time for troubleshooting today though.

Looking forward to getting this project wrapped up.  :-\

-Brett
Title: Re: OP2Archive Application Development
Post by: leeor_net on September 03, 2017, 12:33:46 PM
Dafuq does Microsoft keep doing stuff like that? I hate when they add their own brand of cuteness into the language. It's almost like they want code to break.

Had I seen that line I would have questioned it immediately as it most definitely doesn't conform to C++11 standards.  :-[
Title: Re: OP2Archive Application Development
Post by: Vagabond on September 03, 2017, 10:21:52 PM
Unfortunately, the pair programming wasn't possible due to some internet bandwidth issues. However, we did manage to discuss some issues over IRC and make some progress.

A small change was made to the clmFile class to allow its destructor to work when clmFile is being referenced by its base class, ArchiveFile.

The REMOVE command is now complete and tested. Next up is the ADD command.

Hooman spent some time removing the Windows.h code from the ClmFile, although there is still more to go.

Leeor managed to port OP2Archive and OP2Utility to GitHub. You can find them here: https://github.com/OutpostUniverse.

-Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on September 06, 2017, 12:51:39 AM
Quote
for each ( auto i in v )

I wouldn't have even noticed that. Particularly not if it was buried in a project which I assume already compiles and works. It looks too much like other languages, and language extensions.


As for the virtual destructor thing, I learned a thing or two about C++ these last couple of days when I tried making a destructor pure virtual. It seems pure virtual functions can have implementations, and in the case of destructors, the implementation is required. Further, you can't provide an implementation in the declaration that marks the destructor as pure virtual. You can however implement it later in the header after the close of the class, and mark it inline to avoid violating the One Definition Rule (ODR) should the header be included by more than one compilation unit. Setting it "= default" for the implementation works just fine, if that's what you want.

Other strange corner cases I found myself exploring, do abstract base classes have virtual function tables? What about classes with inherited virtual methods, which don't override any? And on a related note, both to this problem, and Microsoft creating their own extensions, there is the __declspec(novtable) (https://msdn.microsoft.com/en-us/library/k13k85ky.aspx) extension. All very fascinating topics.

So many years of using C++, yet I'm still surprised by all the corner cases.
Title: Re: OP2Archive Application Development
Post by: Vagabond on September 06, 2017, 01:59:38 AM
Quote
So many years of using C++, yet I'm still surprised by all the corner cases.

I'm learning C++ is not a language that one ever masters all uses and corner cases for... (at least I probably never will).

I once read a book by Douglas Crockford called JavaScript, The Good Parts. In part of it he said something to the effect that a good programmer should learn what each programming language is designed to do and attempt to use each language as designed. They should avoid oddball or corner case features in a language whenever possible. This increases readability for others, reduces the chance of exposing underlying bugs in the framework or language, and reduces the chance of using a feature that will become deprecated in the future. This always stuck with me as being a good practice.



I'm up and running just fine with GitHub. For cases where I'm developing by myself and just committing common changes, it is pretty much the same as my experience with HG and Bitbucket.

I've been cleaning the code a fair amount. The main.cpp function had approached 500 lines. It is now back down to about 170. Each major COMMAND (ADD, REMOVE, EXTRACT, etc) is now separated into a different subclass. Not much reason for the subclasses except encapsulating the code so only the required one or two functions to execute the class is visible to the rest of the project and not all the helper functions.

I've just started the ADD command, but it will need a lot of work. The time I can put in the project in the coming week may be fairly erratic.



Hooman, if you are planning to significantly alter the way the class ArchiveFile works, please let me know so I can plan accordingly. I'm thinking that you are just pulling out the Windows specific portions from VolFile, ClmFile, and MemoryMapFile and not changing how it works externally but we haven't really directly stated this. Not a big deal if you are changing, it just may mean that I should wait some before really polishing my portion if a lot of changes will need to be made.

-Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on September 06, 2017, 07:54:34 AM
Hmm, good point. I was planning mostly implementation changes, with hopefully little impact on the interface. At least that's how I see changes to remove dependence on windows.h. There are other areas I might like to update though.

One thing on my mind is the idea of using string and vector rather than char* and array. That could lead to some interface changes. I suspect the impact on your code would be minimal, since you're already using string and vector. Just remove the conversions before the calls.

I also had thoughts of reworking the class hierarchy to remove multiple inheritance. Perhaps make some methods static. It seems you've been working around problems related to that, so it could simplify some of your code. I'm not really looking at this idea yet though.



I took a look at your updates. Seems you've done a lot to simplify the command line argument processing.

I noticed in selectCommand, you have a bit long switch statement. I think that can be simplified. A switch statement is basically a jump through a jump table, and in your example each jump destination is basically just a call to some method. This is similar to virtual function dispatch, which is a call through a table of pointers to functions. You could potentially replace the switch with a call that uses virtual function dispatch. The result should be more compact and easier to maintain. It gets rid of the tedious process of adding to the switch statement when you add new commands. I hardly ever use switch anymore, now that I understand virtual functions.

To make it work, define a common base class, perhaps named ConsoleCommand. It would have derived classes List, Find, Extract, Create, Help, Add, Remove. The base class would contain a pure virtual method Execute(const ConsoleArgs& consoleArgs). The common Execute name for each command means your cases all collapse down to the same call, which is dispatched virtually based on the class type of the underlying command.
Code: [Select]
command->Execute(consoleArgs);

The part I haven't addressed yet, is how command is set to an object of the appropriate type. There are many possible solutions there. I'll take a further look at your code before suggesting something there. It seems you're mapping commands to an enum index. Where that is done could be an appropriate tie in point.
Title: Re: OP2Archive Application Development
Post by: Vagabond on September 07, 2017, 03:10:54 AM
Quote
One thing on my mind is the idea of using string and vector rather than char* and array. That could lead to some interface changes. I suspect the impact on your code would be minimal, since you're already using string and vector. Just remove the conversions before the calls.

I also had thoughts of reworking the class hierarchy to remove multiple inheritance. Perhaps make some methods static. It seems you've been working around problems related to that, so it could simplify some of your code. I'm not really looking at this idea yet though.

Switching to std::string and or vector would be an easy fix on my side. Adding static create functions could simply my CREATE, ADD, and REMOVE commands by not requiring an initial .VOL and .CLM template file, which would be nice.

However, I've found the archive code pretty stable and easy to work with, so I'm fairly happy if it just ends up becoming platform agnostic without the rest.

I'll take a look at command->Execute() in the near future.

You are correct about mapping the COMMAND into an enum. This is done in ConsoleArgumentParser.h/.cpp. It is supposed to take the char** command line input and put it into an easier to understand structure (ConsoleArgs). Since ConsoleArgs is used by all the available commands, there is still some messiness associated outside of ConsoleArgumentParser. For example, how the member variable paths is used will depend on which COMMAND is being used. In CREATE, only the first path should by a .vol/.clm where in EXTRACT several .vol/.clm paths could be provided. Also, some optional arguments do not apply to all possible commands. Like -Q / --Quiet and -O / --Overwrite doesn't do anything for the LIST command.

I could pass more customized structures out of ConsoleArgumentParser that exactly lined up with each command, but this would force me to have multiple custom structures instead of using one general structure. The generalized structure seems to be generally working fine for now, but I wonder if it might be a little clearer to make the specialized structures (sorry for the bad pun).

I could not encapsulate ConsoleArgumentParser.h/.cpp into a class. I'm using the #include <functional> to allocate a delegate function based on which command argument is being parsed. When I pushed the code into a class, it seemed to always break the struct containing the functional delegate. (I call them delegates in C#, I don't think they are 100% the same in C++ though). Anyways, I just encapsulated ConsoleArgumentParser.h/.cpp in a namespace instead and everything is working. This is something I'd like to return to and clean up at some point though.



I finished a working draft of the ADD command. While dealing with some memory management in the ADD command class, the purpose of the RAII paradigm suddenly clicked. Basically, I needed to ensure that 2 objects were destroyed whether an exception is thrown or not. So I abstracted out their destruction into a separate function. Then I placed the function call cleanup() in the catch block and re-throw the exception and place the function call at the end of the task for when no exception occurred.

Then it occurred to me that I should probably be destroying the object in the class's destructor and not be adding all the random cleanup() function calls. Unfortunately I designed the class to not have any class level member variables, so RAII doesn't really work yet. But it might make sense to move the pointers to class level variables to leverage RAII. I'll have to consider the design implications.

Hopefully that makes some sense. If not, I'm still learning and probably am not explaining things very well. :)

-Brett
Title: Re: OP2Archive Application Development
Post by: Vagabond on September 10, 2017, 11:42:27 PM
Another quick update.

I have all the commands working as advertised except for the EXTRACT command which needs a couple of tweaks. A couple of tasks left to finish besides refactoring EXTRACT is pushing in use of unique_ptr over the raw pointer, building up the post build event, and finalizing the Usage Statement/ReadMe. I also have some more general cleaning of the code that I'd like to accomplish.

-Brett
Title: Re: OP2Archive Application Development
Post by: Vagabond on September 21, 2017, 02:55:14 PM
Hey everyone, we are on the home stretch for completing OP2Archive for Windows.

I finally rewrote the EXTRACT command code into something that makes reasonable sense and is cleaner to understand.

I finished a post build event for RELEASE compilation mode that automatically creates a zip file containing the executable, 2 template archive files, and the readme with the version number included in the zip name.

I created a draft page on the wiki to post OP2Archive at. It will still need a bit more work.

I completed another pass on ReadMe.txt and finalized the command list. I separated out information on how to compile and distribute the software into a separate file called CompileInstructions.txt. I did this because the average user isn't interested in compiling, but just how to use the application. ReadMe.txt points to CompileInstructions.txt for anyone interested in examining the source code.

If someone could read through the ReadMe.txt and CompileInstructions.txt below and let me know if they see any grammatical errors or omissions, it would be helpful. The first dozen lines are the usage statement that will be displayed in the command prompt if you type -h/-?, so let me know if it seems like a reasonable usage statement as well.

Next up is final refactoring and testing of the code.

-Brett



ReadMe.txt
OP2Archive - Outpost 2 Archive Access and Maintenance
Developed by Hooman and Brett208 (Vagabond)

Allows examining, creating, and extracting files from Outpost 2 .vol and .clm archives.

+++ COMMANDS +++
  * OP2Archive LIST (archivename.(vol|clm) | directory)...
    * Lists the contents and size of all files contained in provided archives.

  * OP2Archive FIND filename...
    * Determines which archive contains the file. Stops searching at first instance.
    * Pulls archives from the directory appended to the provided filename to search.

  * OP2Archive CREATE archivename.(vol|clm) [filename | directory]... [-q] [-o]
    * If no filename(s) or directory(s) provided,
      archives all contents of the default source directory (./archiveFilename).

  * OP2Archive EXTRACT archivename.(vol|clm) [filename]... [-q] [-d destDirectory] [-o]
    * If no filename(s) provided, extracts entire contents of archive.

  * OP2Archive EXTRACT directoryname [-q] [-d destDirectory] [-o].
    * Extracts entire contents of all archives in the provided directory.

  * OP2Archive ADD archiveName.[vol|clm] filename... [-q] [-o]

  * OP2Archive REMOVE archiveName.[vol|clm] filename... [-q] [-o]

+++ OPTIONAL ARGUMENTS +++
  -H / --Help / -?: Displays help information.
  -Q / --Quiet: [Default false] Prevents application from issuing console messages.
  -O / --Overwrite: [Default false] Allows application to overwrite existing files.
  -D / --DestinationDirectory: [Default is './']. Sets the destination directory for extracted file(s).
  -S / --SourceDirectory: CREATE: [Deafault is archive's filename]. Sets the source directory when creating an archive.

For more information about Outpost 2 visit the Outpost Universe (http://outpost2.net/).


+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +


+++ INSTALL DIRECTIONS +++

Unzip OP2Archive.exe using WINZIP compatible decompression algorithm. The typical place to put OP2Archive.exe is in the root OP2 install directory for ease of use, but this is not required.

Open a command prompt. Navigate to the directory containing OP2Archive and type 'OP2Archive -h' to see Usage Message.


+++ ARCHIVE (.VOL/.CLM) FILE CREATION BEST PRACTICES +++

If you wish to overwrite a file currently in an archive file, use the ADD command and include the optional argument -O / --Overwrite to allow overwriting the original file.

The ADD and REMOVE command will create a new temp directory with the name ./OP2ArchiveTemp-(RANDOM INT) to store the contents of the archive file while rebuilding it. The ADD and REMOVE command will also eventually delete the original archive file before rebuilding it. If certain fatal exceptions occur during this process such as a power loss, the original archive file may be lost and/or the temp directory may not be deleted. If you are performing ADD and REMOVE commands on a heavily modifyied archive file, consider keeping a backup of the file somewhere. In case of a fatal error in the critical steps for the ADD and REMOVE coomands, you should be able to just recover the archive file from the Outpost Universe website by redownloading the game.


+++ CLM FILES (AUDIO STORAGE) +++

Outpost 2 stores all music tracks except for the track that plays on the main menu in the archive file op2.clm.

If you wish to change out the sound tracks in Outpost 2, you must use specific settings and use the WAV file format.

As a starting point for manipulating audio tracks for Outpost 2, consider starting with the free program Audacity (http://www.audacityteam.org/).

When naming audio files for storage in a CLM archive, the filename will be clipped down to 8 characters. Without siginificant modifications to the Outpost 2 application, you must use the names of the audio tracks provided with the stock download of the game to get modified or new tracks to play in game. The game will then select the music tracks as it sees appropriate based on name. So, the Eden tracks will play for Eden missions, etc.

Outpost 2 audio tracks must be formatted as WAV files with the following settings:
 * Frequency = 22050 Hz
 * Channels = 1 (Mono)
 * BitsPerSample = 16

The file size of each WAV audio track (the actual wav data) must be a multiple of 32768 bytes (0x8000). Output 2 fills the music buffer in chunks that large, and doesn't zero fill if a chunk (at the end of a file) is too short. If the file is not a multiple of 32768 bytes (0x8000), some garbage data will be played in Outpost 2 before switches to the next track. The audio data must be zero padded to bring it up to the right size before adding to the .clm file.


+++ COMPRESSION +++

Outpost 2 contains references to 3 types of compression, RLE (Run - Length Encoded), LZ (Lempel - Ziv), and LZH (Lempel - Ziv, with adaptive Huffman encoding).

Only LZH was used in the final release of Outpost 2. Only one archive file was compressed, sheets.vol. In subsequent releases of Outpost 2 by the Outpost Universe community, sheets.vol was decompressed and included in the game download in uncompressed format.

OP2Archive is capable of reading and decompressing archives using LZH compression. However, it currently cannot CREATE archives or modify via the ADD or REMOVE file an archive using LZH compression.


+++ SOURCE CODE LOCATION AND COMPILIATION +++

Source code may be found at: https://github.com/OutpostUniverse/OP2Archive. See the file CompileInstructions.txt in the source code for specific instructions.


+++ CHANGE LOG +++

Ver 1.0.0 (XXSep2017)
 * Initial Release




CompileInstructions.txt
OP2Archive Compilation Instructions
Developed by Hooman and Brett208 (Vagabond)

If you wish to edit and compile OP2Archive's source code, read this document first. See ReadMe for basic description of project, change log, and instructions on use once compiled.

Source code may be found at: https://github.com/OutpostUniverse/OP2Archive and https://github.com/OutpostUniverse/OP2Utility.

OP2Archive is written in C++ and the solution/project files are built for Visual Studio 2017. C++11 and C++14 features are used. The currently expiremental C++ standard library filesystem is used and is supported by both VS2017 and GCC.

OP2Archive depends on the project OP2Utility. OP2Utility requires a Windows machine to compile due to use of windows.h. In particular, the archive read/write code depends on Windows.h. There is some desire to remove the Windows specific code, but the efforts are not yet completed. The project must be compiled for x86.

+++ RELEASE COMPILATION INSTRUCTIONS +++

Post Build Event Notes: When in RELEASE mode, a command prompt script will run once the source code is compiled. If compiling for test purposes, compile in DEBUG mode to prevent the script from running. This script automates staging all required files for distribution. Included in the source code is 7za.exe, the 7-zip command line tool that facilitates zipping the final release package. This is the Windows version of 7 zip and is not compatible with Linux. See http://www.7-zip.org/ for source files and https://www.dotnetperls.com/7-zip-examples for examples of use.

 1. If changing Major/Minor revision number, set new version number at top of OP2Archive main.cpp AND in post build event batch script.
 2. Run Commit and then push Updates to master repository.
 3. Set Solution Configuration to Release.
 4. Set Solution Platform to x86. (x64 is not currently supported due to some WINAPI code within OP2Utility VolDecompress).
 5. Compile Code.
 6. The following files will automatically be copied into the zipped directory'OP2MapImager 1.0.XXXX' (XXXX is svn revision number):
    * OP2Archive.exe (From Release Directory)
   * VolTemplate.vol
   * ClmTemplate.clm
   * ReadMe.txt (this file)
 7. Place zip file on the Outpost Universe Website.
Title: Re: OP2Archive Application Development
Post by: Hooman on September 29, 2017, 01:41:35 AM
Hey, good to see this released.

In the ReadMe.txt, I found this one sentence a bit unclear:
Quote
    * Pulls archives from the directory appended to the provided filename to search.

As for the credits, you should probably put your name first on this one.
Title: Re: OP2Archive Application Development
Post by: Vagabond on October 01, 2017, 03:25:56 AM
I just finished testing the different modes of OP2Archive. It seems to be working properly. I tested it both using the current directory and manipulating files on relative paths.

Several bugs were found and squashed and the code further refactored.

I'm starting to see the value of Unit Tests! Every time I make a change to OP2Utility, it affects both OP2MapImager and OP2Archive, both of which are now mature, stable projects (although small). It would nice to just open each project and push a button to make sure the change didn't blow something up in either project!

There is definitely room for improvements in the source code, and introduction of Git submodules (I think???) for more proper linking of OP2Utility. For now though, I want to call it done besides fixing bugs if anyone finds them while using it.

Thanks for the help from Hooman, Arklon, Leeor, and sirbomber answering my random Outpost 2 or C++ questions working through this project and other projects!

I'll push the actual release in a separate thread and host it on a Wiki page.

Quote
In the ReadMe.txt, I found this one sentence a bit unclear:

Yeah, that is pretty bad wording. I reworded it in the release usage statement to try and make it more clear. If people are using the FIND command and would prefer the directory to search for archives in be an argument/switch instead of appended to the filename being searched for, we could try that instead.

-Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on October 01, 2017, 03:37:30 AM
We should totally look into both Git submodules and unit tests. I think the Git submodule thing should be easy enough. Unit tests could take a bit of time to write a meaningful number of. Lately I've often been finding myself curious about the Google unit test library that was mentioned here a while back.
Title: Re: OP2Archive Application Development
Post by: Vagabond on October 03, 2017, 04:14:50 AM
I'm getting hung up on a couple of bugs during the final testing. I've fixed one myself, but there was another related to exceptions that I wanted to mention.

The ClmFile constructor contains a couple of throws for exceptions when trying to access and parse a clm file. These throws are followed by a char* statement only and not an exception class. Like this:

Code: [Select]
ClmFile::ClmFile(const char *fileName) : ArchiveFile(fileName)
{
... code to open a file handle ...

if (m_FileHandle == INVALID_HANDLE_VALUE)
throw "Could not open file";

if (ReadHeader() == false)
throw "Invalid header";

... more code ...
}

When these exceptions are triggered, even though I wrapped the call to this function in a try catch statement, it blows up the application.

Below I rewrote the function to throw an actual exception as opposed to just a char*. This now makes it behave properly and allows my try/catch to catch the exception and spit out a message to the user about the malformed file.

Code: [Select]
ClmFile::ClmFile(const char *fileName) : ArchiveFile(fileName)
{
... code to open a file handle ...

if (m_FileHandle == INVALID_HANDLE_VALUE)
throw exception("Could not open file");

if (ReadHeader() == false)
throw exception("Invalid header");

... more code ...
}

Since C++ allowed the code to compile and operate by throwing the char* without including the exception class, I assume there must be a time/place to do this?

So, when would I want to throw char* instead of an exception? Is it fine with everyone if I just throw an exception instead of the char* so that I can properly catch the exception?

VolFile.cpp and possibly other code in the archive project have similar throw statements. I would probably change them all out if this sounds agreeable.

Someday I'll stop procrastinating on the release of this project, I promise. :)

Thanks,
Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on October 03, 2017, 06:11:00 AM
Someday I'll stop procrastinating on the release of this project, I promise. :)

Ahahahaha! Now there's something every programmer can relate to  ;)


Yes, you can make this change, and yes it would be a good idea. Though I would recommend using std::runtime_error instead of std::exception. The reason being the standard does not define a constructor which takes a string for std::exception. This is a Microsoft extension. Code that tries to pass a string to an std::exception constructor will fail on Linux. Note that std::runtime_error derives from std::exception, so you can still catch by a reference to an std::exception.

As for why the code was written the way it was. Simply, I was fairly new to using exceptions at the time. In C++ you can throw and catch anything you want. They don't have to be derived from std::exception. Though if you throw something that is not derived from std::exception, you can not catch by reference to an std::exception object. Hence why you're likely not catching the exception. If instead you'd caught a char*, it would have worked.

Note that a try block can have multiple catch blocks, each with a different data type that is caught. You could have one catch block that catches char*, while the next catch block catches a std::exception&. Ideally though, exceptions should be derived from std::exception, and not just some arbitrary datatype.


Throw runtime_error everywhere and it should be good.
Title: Re: OP2Archive Application Development
Post by: leeor_net on October 03, 2017, 07:12:37 PM
I very seriously doubt a catch(char*) will work. Probably need a catch(const char*) but even then I suspect there's an implicit conversion going on. You should either catch(const std::string&) OR throw std::runtime_error(const std::string& what).

Most of the time I use std::runtime_error but in some cases it makes sense to derive 'specialized' exception objects from runtime_error:

Code: [Select]
class SomeExceptionClass : public std::runtime_error
{
public:
    SomeExceptionClass() : std::runtime_error("Message Here") {}
    virtual ~SomeExceptionClass() {}
};

I use this in NAS2D for very specific errors that the user can recover from. For example:

Code: [Select]
class filesystem_already_initialized;
class font_bad_data;
class image_bad_copy;
class mixer_backend_init_failure;
class renderer_opengl_context_failure;

And etc. This allows the program to catch a specific error or provide a generic std::runtime_error& handler:

Code: [Select]
NAS2D::Image image1("path/to/image.png");

try
{
    NAS2D::Image image2 = image1;

    NAS2D::Font font("path/to/font.ttf");
    /* exception of type font_bad_data thrown here */
}
catch(const image_bad_copy& e)
{
    /* Some sort of exception handling in the case of image_bad_copy */
    std::cout << "Failed to copy image: " << e.what() << std::endl;
    /* Possibly set up a default image of some sort here to visually indicate an error */
}
catch(const std::runtime_error& e)
{
    /* No handler for font_bad_data, so catch by reference to base type as a generic handler */
    std::cout << "Some other runtime error that doesn't require exploding: " << e.what() << std::endl;
}

Note that the above code doesn't attempt to catch possible other exception types, e.g., a std::bad_alloc. In a case like a failed memory allocation it would be appropriate in many cases to allow the program to unwind back to the original try block and possibly to just exit alltogether.
Title: Re: OP2Archive Application Development
Post by: Hooman on October 03, 2017, 08:24:26 PM
You're quite right, it would need to be catch(const char*). I suspect this is a somewhat recent change in the language where embedded text strings no longer implicitly convert to char* and you're forced to use const char* instead. That means my old code is broken now, since it was indeed catching char*.

Not too sure what you meant by suspecting there was an implicit conversion going on. Should be no conversion for const char*, since an embedded text string will be of type const char*. I tried catching with a string& and a const string&, but neither would work.

Example:
Code: [Select]
#include <iostream>
#include <string>

using namespace std;

int main()
{
  try {
    throw "A text string";
  }
  catch(char *) {
    cout << "char*" << endl;
  }
  catch(const char*) {
    cout << "const char*" << endl;
  }
  catch(const string&) {
    cout << "string&" << endl;
  }

  return 0;
}

Output:
Code: [Select]
const char*

If I remove the catch(const char*) block, leaving the others, the output is:
Code: [Select]
Aborted (core dumped)


Good point on the double catch design. That is useful to handle error of different severity, or that require different logging output.
Title: Re: OP2Archive Application Development
Post by: Vagabond on October 04, 2017, 02:42:19 AM
Leeor and Hooman,

Thanks for the input. I went ahead and switched all my throws from exception to runtime_error or logic_error or invalid_argument as I saw fit. I also included <stdexcept> in all the .cpp files that throw exceptions. Turns out I hadn't included <stdexcept> or <exception> in any of the .cpp files, but probably dragged them into my code by including <string> and <vector> everywhere.

I also appended runtime_error to all the throws in the archive code that was throwing char*.

Interesting to know that you can throw anything you want. It didn't occur to me to do something like this though.



On another note, I've been thinking more about namespaces. Just like I brought in exceptions to my code without explicity including them, the same can happen with namespaces.

Code: [Select]
using namespace std;

I've been putting this in a lot of my header files to keep from typing std:: everywhere in my code just to use a string.

I'm realizing this forces everyone else to merge the entire std library in their code if they ever include one of my headers. Not good.

I'm going to move all using statements out of header files and below any includes. This way it is a private choice of the .cpp file and doesn't force the std namespace on anyone else.

I think moving using statements out of header files and below all includes is universally seen as a right practice. There seems to be a lot of debate whether it is okay to bring std into the global namespace of a .cpp file. One camp says it could cause future naming collisions when the std library is expanded or you add another include to your code. The other camp says it makes the code in the .cpp file harder to read and bloats your code by typing std:: everyone.

I haven't tried it yet, but I would hope the compiler would flag name collisions for me so I can sort out which function I mean to call by explicitly tagging it with its namespace.

Either way I think they both have reasonable points. For now I'm going to keep the using statements in the .cpp files for clarity and ease of typing. The important thing is for them to be out of the header files so I don't force the std library into everyone else's global namespace.

Technical bulletin #435 dealing with the oddities of C++, understanding the implications of include and using statements in modern C++ code.

So one more thing to work through in OP2Archive before it is released. I'm also learning more how TortoiseGit handles submodules, but not enough to write intelligently about it and probably not something I want to pollute this thread with.

-Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on October 04, 2017, 01:12:25 PM
Ahh, very good call on the using namespace std; thing.

I used to really hate typing std:: everywhere, so I always used using. It does have issues in header files though. I've also come across a few articles that suggested never using using. One mentioned people being bothered by it at first, but quickly getting used to it, and eventually liking the more verbose and precise code. I've been giving it a try. I'm now at a point where the extra typing bothers me much less. I still have a lot of forgetful moments that the compiler catches for me. I figure I'll continue with this style of not using using, and instead typing std:: everywhere.

So, any other users still musing about using using for the sake of the users using your using-free code, or is your using-free code still free of users, leaving you usually unused to using criticism?  :D  :P


Will be looking forward to anything you write about Git submodules. I still need to learn that better. I've seen command line example that largely make sense now, but I don't know how Git tracks it all under the hood, or how to do these operations with TortoiseGit.
Title: Re: OP2Archive Application Development
Post by: leeor_net on October 04, 2017, 05:19:15 PM
I tend to go with explicitly stating std::whatever in headers and using namespace std; in definitions (e.g., cpp codes) if it's a giant file. Lately I've been finding I prefer to explicitly state where things are being pulled from -- sure it's 'more typing' but in the event you have a library that pollutes the namespace you could potentially end up with annoying errors related to ambiguity. Explicitly naming the namespace avoids this though ultimately it's a preference.

I think it's safe to say that best practices require explicitly naming namespaces in headers and allowing the end user to decide if they want to use a using directive.
Title: Re: OP2Archive Application Development
Post by: Hooman on October 05, 2017, 01:36:50 AM
I'd also like to point out that typing speed is never the limiting factor for programming speed. Plus with IDEs and autocomplete, you can get away withReallyReallyLongIdentifiersSuchAsInLanguagesLikeJava, and it won't slow you down very much. Just hope you have a widescreen monitor. :P

There are some interesting stats thrown about concerning the average number of lines of code a developer writes in a day. Typical figures are about 10 lines. Granted, people can write a few thousand lines of code in a day, but then might refactor, and remove a few hundred to a thousand lines the next day. Then they might spend a few days thinking about a problem, perhaps reading and testing, or reviewing other people's code, and not actually write any of their own. They might also need to fix a bug in someone else's code, which requires significant amounts of time to read and understand it before they can fix it. When all is said and done, the actually number of lines of code added per developer per day is typically much lower than most people assume. It's also a horrible metric for developer productivity, for many of the reasons just stated.

Quote from: E.W. Dijkstra
My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger.

All this is to say, a little extra typing probably won't affect productivity too much. At least not if it doesn't add to mental complexity, or some kind of anguish over working on the problem that makes you want to not do it.
Title: Re: OP2Archive Application Development
Post by: Vagabond on October 08, 2017, 12:48:21 AM
I will be releasing version 1.0.0 shortly. The recent fixes include:

* Listing the name of the file when a .clm or .vol file fails to load.
* Fixed bug not allowing -D / --DestDirectory to work properly
* Removed all namespace using statements from header files.
* Kept EXTRACT command from overwriting files unless given permission with the -O / --Overwrite switch.
* Removed -S / --SourceDirectory switch from the ReadMe.txt since it was removed from the application.

See: https://forum.outpost2.net/index.php/topic,6032.0.html

-Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on October 08, 2017, 04:34:17 AM
Awesome job
Title: Re: OP2Archive Application Development
Post by: Vagabond on April 04, 2018, 07:52:33 AM
I spent some time on OP2Archive recently.

Now when you attempt to upload a filename that contains a non ASCII character (above decimal 127), an error is thrown and displayed to the user. This is to protect Outpost 2's alphabetical sorting algorithm for archive files that does not include support for things like characters with accents.

I'm using the function below to check if a non ASCII character is present. I'm not thrilled with the solution. Since characters above 127 come back as a negative number, I'm checking both below 0 and above 127. I'm wondering if there are holes to approaching the problem this way. If anyone has any suggestions, I would be will to implement them instead. Since the sort is isolated in this function, it will be easy to change out in the future if bugs are discovered:

Code: [Select]
#include <algorithm>

bool StringHelper::ContainsNonAsciiChars(string str)
{
return std::any_of(str.begin(), str.end(), [](char i) { return (i < 0 || i > 127); });
}

I also opened a new branch on OP2Utility. It separates each class inside Archive into a discrete file of source code. Previously, we had ArchiveUnpacker, ArchivePacker, MemoryMappedFile, and ArchvieFile all defined in the same .h/.cpp file. I found this pretty confusing. I opened the branch so that Hooman could review my changes to his code before affecting the master.

Plan going forward:


Code: [Select]
virtual bool CreateVolume(const char *volumeFileName, int numFilesToPack, const char **filesToPack, const char **internalNames) = 0;


I'm planning to implement these functions using modern C++ practices within the Archive code. I've already been sprinkling in things like std:unique_ptr into the Archive code. Hooman, if this bothers you, please let me know and I'll stop. I'll do all this work in a separate branch for merging once everyone is happy with it.

Thanks,
Brett
Title: Re: OP2Archive Application Development
Post by: leeor_net on April 04, 2018, 08:38:34 PM
...

I'm using the function below to check if a non ASCII character is present. I'm not thrilled with the solution. Since characters above 127 come back as a negative number, I'm checking both below 0 and above 127. I'm wondering if there are holes to approaching the problem this way. If anyone has any suggestions, I would be will to implement them instead.

...

Use an unsigned char instead. Ascii characters aren't defined in negatives so it's odd to look at values less than zero.

Code: [Select]
/* Note explicit cast, quiets static code analyzers */
return std::any_of(str.begin(), str.end(), [](unsigned char i) { return (static_cast<unsigned short>(i) > 127); });
Title: Re: OP2Archive Application Development
Post by: Hooman on April 05, 2018, 05:11:58 AM
Yeah, C++ standard doesn't specify if char is signed or unsigned. It's compiler dependent. If you want to ensure it's unsigned, then specify unsigned.


I'll try to take a look at the new branch over the next couple of days. Though I am generally in favor of the changes you're proposing.


One question though, are the high ASCII values actually causing demonstrable problems? I don't believe the game tries to treat high ASCII values in any kind of special way, so I would assume ignoring them would produce the most consistent results. Is this perhaps an encoding issue, where some method assumes an unexpected (non-ASCII) encoding?
Title: Re: OP2Archive Application Development
Post by: Vagabond on April 07, 2018, 12:40:53 AM
Quote
Use an unsigned char instead. Ascii characters aren't defined in negatives so it's odd to look at values less than zero.

Thanks Leeor. I went ahead an updated my function as suggested. I used this opportunity to try out the git cherry-pick command to push it to both branches.


Quote
One question though, are the high ASCII values actually causing demonstrable problems? I don't believe the game tries to treat high ASCII values in any kind of special way, so I would assume ignoring them would produce the most consistent results. Is this perhaps an encoding issue, where some method assumes an unexpected (non-ASCII) encoding?

Hooman, to be more clear on what is going on:

1. The alphabetical sort command I created for the filenames is sorting ASCII characters above decimal 127 incorrectly. So Windows Explorer will sort them differently than my sort alogrithim.

2. Based on your comment below from the OP2Archive release thread, I feel like it would allow for undefined behavior in Outpost 2 if these characters are used in filenames. Especially since I am not verifying my sort algorithm exactly matches the one used by Outpost2.exe.

Quote
From my brief debugging session, it didn't look like Outpost 2 tried to do anything special with accented characters. The code path that was actually used only took special consideration for the 26 uppercase characters.

I don't want to expend effort testing Outpost 2's internal binary sort to see if it fails or not for the edge case of someone using a character like an accented e, so I figured blocking the behavior and providing the user an error message was a better idea than allowing possible undefined behavior.

I'm itching to get started on this, so soon I'd like to merge the branch that separated out the classes into discrete files and create the development branch for the actual changes.

-Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on April 08, 2018, 01:11:56 AM
Quote
1. The alphabetical sort command I created for the filenames is sorting ASCII characters above decimal 127 incorrectly. So Windows Explorer will sort them differently than my sort algorithm.

I wouldn't call that incorrect. There are many different types of sorts, with different purposes. Sort rules can get quite complex, but without a reason for extra rules, it just slows the sort down with no real gain. The purpose of the sort was to implement a binary search for efficiency purposes, not to sort in a nice order that would appeal to people. It's done purely for machine processing reasons, with no user visible display of the sort order.

Quote
2. Based on your comment below from the OP2Archive release thread, I feel like it would allow for undefined behavior in Outpost 2 if these characters are used in filenames. Especially since I am not verifying my sort algorithm exactly matches the one used by Outpost2.exe.

You may have a point here about undefined behaviour. However, by making it check for high ASCII characters, you're making the algorithm diverge from the Outpost 2 algorithm, which chose to ignore that aspect. As this is a machine processing step, it's probably wise to have it match as close as possible.

Since it is unusual to use high ASCII characters, and encoding considerations were likely not accounted for, it may make sense to warn the user of a potential problem, though I really feel this should be a warning rather than an error. Besides, how can someone test the behaviour of Outpost 2's algorithm if you program refuses to create the test data.  ;)
Title: Re: OP2Archive Application Development
Post by: Hooman on April 08, 2018, 02:44:17 AM
Ok, I've had a chance to look at your branch that splits classes into separate files. Looks fine. You basically did the obvious thing.

One suggestion I have is to squash the 3 main commits into 1 commit. The 2 minor commits look like bug fixes for the larger commit, and hence the larger commit seems to be broken. By squashing, you reduce the noise, and ensure all commits (the 1 remaining commit in this case) works.

You can look up "interactive rebase" in the Git documentation to figure out how to squash commits. This falls into the realm of rewriting history. Whenever possible, it's good to do this before pushing to the central repo.

Side note: This is also why pushing after every commit is considered an anti-pattern in Git, since you inevitably catch these mistakes sometime after making the commit. By delaying the push, you give yourself a chance to find and fix these errors before publishing the results.

At any rate, things are already pushed to a branch in the public repo. There are two choices at this point. You can squash, and then do a force push (very Darth Vader like), to update the branch on the central repo. This can be a nuisance for other people that have downloaded that branch. Don't worry, I'm sure I can deal with it. Plus the branch gets deleted after merging into master, so meh. The other option, is to do the squash, and merge into master locally, and push the changes on master. The branch can then be deleted without ever having been updated in the central repo, no force push required (sorry my young apprentice).



I noticed you added the sort update to this branch. Is that the part that was making you itch?  ;)

The sort commit isn't really related. It would be better on a different branch. Though I do understand the temptation, as keeping work linear is easier, and you often have changes that depend on other pending changes. In this case, you could have created another branch off of master. If the changes were more dependent on the branch changes, you can create a branch off another branch.

At any rate, if all changes are merged into master using a fast-forward merge, you won't be able to tell where the branches were merged in, or how many, so it won't really matter. That's the case I expect here. If the merge requires a merge commit (a commit with more than 1 parent), then it's a little more obvious, and may stand out as strange to have both changes come from the same branch.



For code review, using a GitHub pull request may be a good option. It's fine to open a pull request from different branches in the same repository. That's a fairly standard way to start a code review. People can then review the changes right from the browser, with color coded diffs, and comments. They also have handy buttons for performing the merge should the code pass review, and to delete the branch after the merge is done.

If you want to give it a try, you can do so now, or after attempting the squash. Totally optional though.
Title: Re: OP2Archive Application Development
Post by: Vagabond on April 09, 2018, 02:23:13 PM
Quote
Since it is unusual to use high ASCII characters, and encoding considerations were likely not accounted for, it may make sense to warn the user of a potential problem, though I really feel this should be a warning rather than an error. Besides, how can someone test the behaviour of Outpost 2's algorithm if you program refuses to create the test data.

Hooman, since you don't agree with aborting the operation in this case, I'll drop integrating this feature into OP2Utility's Archive library code. It will remain in the OP2Archive as it currently exists. Feel free to update OP2Archve to display a warning instead of aborting if you wish.

I researched and used TortoiseGit to perform both the interactive rebase and the force push. First time using --force, but it appears everything worked out right. Not sure how rewriting the public history will affect your copy of the repository. Thanks for the quick tutorial on how to accomplish it.

Quote
The sort commit isn't really related. It would be better on a different branch. Though I do understand the temptation, as keeping work linear is easier, and you often have changes that depend on other pending changes. In this case, you could have created another branch off of master. If the changes were more dependent on the branch changes, you can create a branch off another branch.

I think you are talking about the commit Fix function StringHelper::ContainsNonAsciiChars(std::string str) to use only unsigned chars. I made this commit to the master and then pushed it also to the branch in an effort to keep both branches as closely aligned as possbile.

However, now that I merged the branch, the commit Fix function StringHelper::ContainsNonAsciiChars(std::string str) to use only unsigned chars is listed twice which isn't desirable.



Hooman, how do you feel about changing this function:

Code: [Select]
virtual bool CreateVolume(const char *volumeFileName, int numFilesToPack, const char **filesToPack, const char **internalNames) = 0;

instead to read:

Code: [Select]
virtual bool CreateVolume(std::string volumeFileName, std::vector<std::string> filesToPack) = 0;

I think we both agree on removing const char** internalNames and generating these internal to the OP2Utility Archives code.

Changing filesToPack from const char** to std::vector<std::string> removes the need to pass the int numFilesToPack. I'm working with vectors on my side and it doesn't make sense to me to convert them into a char** if I don't have to. Perhaps I'm missing something though.

Thanks,
Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on April 10, 2018, 06:13:34 AM
Quote
However, now that I merged the branch, the commit Fix function StringHelper::ContainsNonAsciiChars(std::string str) to use only unsigned chars is listed twice which isn't desirable.

Yeah, it's probably not worthwhile to cherry pick a commit between branches that will be merged together. I think cherry pick is more for maintenance branches of old releases. You wouldn't merge new features into an old branch, but a bug fix might appear on both branches.


Side note, you have a static_cast<unsigned short>, which I don't think is needed. I assume that got added during bug hunting/fixing?


The API change is good. The newer C++ features should make it simpler and less error prone to use. Dropping the extra set of file names is a good simplification, especially for default API usage. Ideally, that's all people should have to deal with. There is some slight loss of functionality, in that files can't be renamed as they are packed, but honestly, why would that be done, and how on earth would a user interface support it? I'm fine with losing that ability. The simplification seems like a bigger win.
Title: Re: OP2Archive Application Development
Post by: leeor_net on April 10, 2018, 08:08:13 PM
I'm seeing some pass by value's using std::string -- these should instead be pass by reference to const.
Title: Re: OP2Archive Application Development
Post by: Hooman on April 11, 2018, 03:19:59 AM
Quote
I'm seeing some pass by value's using std::string -- these should instead be pass by reference to const.

Maybe. Maybe not.

Are the Days of Passing Const Std::String as a Parameter Over? (https://stackoverflow.com/questions/10231349/are-the-days-of-passing-const-stdstring-as-a-parameter-over)

The answer may depend on the specifics of the function.
Title: Re: OP2Archive Application Development
Post by: Vagabond on April 11, 2018, 04:17:48 AM
Leeor,

My intent is to provide the archive code with a copy of the string and vector to do with what they please without modifying the values kept by OP2Archive. So I think in C++11 or newer the best way to achieve this is as stated: I have no real idea what I'm trying to say here, it just seems to be the right way to do it based on some research online. The whole topic of how to pass arguments in C++ is really confusing to me.  :-\

Code: [Select]
virtual bool CreateVolume(std::string volumeFileName, std::vector<std::string> filesToPack) = 0;

Hooman,

Check Leeor's 4 APR post in this thread. I added the static_cast based on his recommendation.

Plan to start working a new branch starting today to make these changes.

Thanks,
-Brett
Title: Re: OP2Archive Application Development
Post by: Vagabond on April 11, 2018, 08:26:16 AM
Okay, I just created a new branch and pull request for OP2Utility.

I updated:

Code: [Select]
virtual bool CreateVolume(const char *volumeFileName, int numFilesToPack, const char **filesToPack, const char **internalNames) = 0;

to read:

Code: [Select]
virtual bool CreateVolume(std::string volumeFileName, std::vector<std::string> filesToPack, std::vector<std::string> internalNames) = 0;

I stopped here because this created a lot of changes in the source code and I didn't want the branch to become too big to review. Once this branch is merged, I'll create a new branch to remove internalNames from CreateVolume.

I tested the code in this branch by creating a new .vol and .clm file. Both packed successfully. I even tested the clm file in game and it worked.

Commit Message
* Update ArchivePacker::CreateVolume() to use std::string and std::vector in input arguments

 - Update VolFile.h/.cpp to use size_t to represent the number of files in archive instead of using int.
 - Store filesToPack and internalNames as std::vector<std::string> instead of char**.
 - Remove destruction subroutines where char** was replaced with std::vector<std::string>.

Pull Request URL: https://github.com/OutpostUniverse/OP2Utility/pull/9

I requested review from Hooman, but don't have any issues with others reviewing/commenting on it as well through GitHub.

Thanks,
Brett
Title: Re: OP2Archive Application Development
Post by: leeor_net on April 11, 2018, 06:52:34 PM
To readress my statement of passing by reference to const, you _still_ should do that. What you're doing now is passing a std::string by value. This means that you're invoking a constructor and a destructor when you don't actually need to. By passing by reference to const (e.g., const std::string& volumeFileName), you avoid the cost of the construction and destruction of a copy of the object. Instead, you are passing by reference. Since the function doesn't appear to be changing the string at all and it appears it's used once in the function CreateFileA, passing by reference to const is the most efficient way to do this.

Likewise, the second and third arguments:

Code: [Select]
std::vector<std::string> filesToPack, std::vector<std::string> internalNames

should also be either references or references to const. E.g.:

Code: [Select]
const std::vector<std::string>& filesToPack, const std::vector<std::string>& internalNames

Basically, your function prototype should look like this:

Code: [Select]
bool ClmFile::CreateVolume(const std::string& volumeFileName, const std::vector<std::string>& filesToPack, const std::vector<std::string>& internalNames)

Again, the function appears to call const member functions of std::vector... that being the case, by passing by value as you are, you're creating copies of each std::vector which invokes a constructor for the vector and a constructor for each item in the vector. For small file counts this is trivial but if you are doing any processing on any less than trivial count of files and internal names, you're potentially copying hundreds to thousands of objects. As you can imagine this gets very slow very fast.

This is in direct opposition to languages like Visual Basic, Java and C# where passing by value is as fast or faster (basically it's handled internally as pointers but you as the user don't see that).

Pass by reference to const whenever and wherever possible. If you don't actually need a copy of an object, don't make said copy. By changing the arguments to reference to const, you won't need to make an changes to the function itself unless you're calling non-const member functions but from a quick skim of the function it doesn't look like you're doing that.



As a semi-related side note, some really good compilers can see these kinds of things and will probably optimize that sort of needless copying, allocation and deallocation on its own but you can't count on that.

As another semi-related side note mostly regarding your post about C, in the case of C you'd be passing arguments via pointers and const pointers... e.g., const char* _string, etc.
Title: Re: OP2Archive Application Development
Post by: Hooman on April 12, 2018, 02:11:12 AM
Those changes were a bit extensive. They look fine though. Code appears much more modern. Looked a bit like more modernization could be done too.


After reviewing, there is one tangentially related note that I'd like to bring up for possible future review. It's not related to the code changes, or at least the code changes don't impact this aspect of the code:

The size of size_t (and the size of int) is dependent on the target platform. The use of size_t is appropriate here for in memory handling of data. It would not be appropriate for the structures serialized to disk, as the file format should remain platform independent. It may be worth revisiting this possible discrepancy in future edits. There may be corner cases for 64-bit builds, such as the in memory structures being too large to fit the structures saved to disk. This is extremely unlikely to be an issue in practice. The other side is, a 64-bit build might try to enlarge the disk structure sizes, making it unable to work with existing files.



The recommendation was to use unsigned char, however the cast is for unsigned short.



Leeor, with C++11 there are now move semantics, which complicate the issue. This already eliminates a lot of copying. I believe the const reference prevents use of the new move semantics. It may actually be less efficient in some cases to pass by const reference. Further, newer compilers mandate the use of move semantics in some circumstances, which seems to negate concerns over some compilers implementing or not implementing a certain optimization. It is actually a complicated muddy issue. I'm afraid I can't offer a recommendation here without doing further reading.
Title: Re: OP2Archive Application Development
Post by: Vagabond on April 12, 2018, 02:03:07 PM
Quote
Looked a bit like more modernization could be done too.

Thanks for reviewing so quickly Hooman. I just merged the pull request back into the master. Now that the bulk of the changes were made to pass the filenames in via vector<string>, the other changes should be a lot smaller in scope. I am trying to move the code to C++11 standards that I touch making the changes.

Quote
The size of size_t (and the size of int) is dependent on the target platform. The use of size_t is appropriate here for in memory handling of data. It would not be appropriate for the structures serialized to disk, as the file format should remain platform independent. It may be worth revisiting this possible discrepancy in future edits. There may be corner cases for 64-bit builds, such as the in memory structures being too large to fit the structures saved to disk. This is extremely unlikely to be an issue in practice. The other side is, a 64-bit build might try to enlarge the disk structure sizes, making it unable to work with existing files.

I think I basically understand what you are saying, but it is beyond my scope of knowledge to try to fix.

I pushed a second commit to the repository cleaning StringHelper. I fixed the typo in the static_cast and deleted 2 helper functions I was using to transfer vector<string> into char** and vice versa as they are no longer needed. These functions are not needed anymore since we removed the char** calls.

Quote
It is actually a complicated muddy issue. I'm afraid I can't offer a recommendation here without doing further reading.

I agree with this. My thoughts are:

1. It is complicated to determine the exact most efficient way to pass arguments in C++ when you consider move semantics.
2. The code is cleaner not peppering it everywhere with const &.
3. The code base has not expressed any performance issues. Until performance issues are noted, it is better to go with the cleaner syntax and let the compiler figure the performance details out on its own.

I've basically finished the next branch, but will wait a bit to push as I need to test a couple of archive packs first.

-Brett
Title: Re: OP2Archive Application Development
Post by: leeor_net on April 12, 2018, 09:38:57 PM
...
2. The code is cleaner not peppering it everywhere with const &.
...

I disagree. Partially because I'm used to it, but in a lot of ways because it makes way more sense to me. Move semantics in C++ are a relatively new thing and as stated, muddy's the waters. That muddiness is something I as a developer don't appreciate so I tend to stick with what I know... Plus I imagine even with said move semantics compiler developers wouldn't strip out optimizations regarding references.

But that's just my humble opinion.
Title: Re: OP2Archive Application Development
Post by: Vagabond on April 13, 2018, 12:27:55 PM
I pushed another branch and associated merge request for removing the argument internalNames from ArchivePacker::CreateVolume. ArchivePacker now also checks if 2 filenames (not including paths) are the same in the provided fileToPack and throws an exception if found. I had implemented this in OP2Archive, but believe it belongs on the OP2Utility side.

The following tests were completed:

1. Packing a new CLM file and testing in game.
2. Packing a new vol file.
3. Attempting to pack a vol file with 2 files with the same name. Operation properly aborted by new code.



I also recommended merging the branch turning XFile into a namespace instead of a Utility class. After Hooman pushes this merge, I'll test against modified versions of OP2Archive and OP2MapImager.

Once we stabilize development of OP2Utility, I'll push versions of OP2Archive and OP2MapImager that use the newest version of OP2Utility.

Thanks,
Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on April 17, 2018, 04:12:08 AM
Quote
Plus I imagine even with said move semantics compiler developers wouldn't strip out optimizations regarding references.

Yes, you're right, there is of course no reason to remove old optimizations, but that's kind of missing the point. There are certain cases where the old way of doing things couldn't be optimized, which can be optimized with move semantics. Consider the case of copying and holding a string into an internal data buffer. You might still declare such a parameter as const &, as the original, doesn't need to be modified by the called function. It might however be modified or deallocated later by the caller, hence a copy is stored rather than a reference to the original. But if the input parameter was a temporary, you can avoid the copy by simply taking ownership of the string buffer. This is where move semantics help. The compiler knows to call either the copy operator or the move operator based on whether the source object is a variable (lvalue), or a temporary (rvalue). The optimization goes beyond just the point of the call, taking into account the lifetimes of the objects.

There are also some convenience and syntax benefits, such as passing literals that have no associate memory address. Declaring a parameter as const & requires the parameter to have a memory address. That doesn't work so well for literals, such as 5. If you've ever found yourself declaring a variable just to pass a constant into a function, you know how annoying this is.

Quote
That muddiness is something I as a developer don't appreciate

Heh, I tend to agree. I'm not a fan when languages have multiple syntax options for the same thing. I've noticed Ruby violated that principle a few times to ill effect. I'm also rather against aliases to account for different spellings in different regions, such as US versus UK spellings of function names. I would prefer there only being one correct way.



Vagabond, I've merged the branch.

I'm thinking about the workflow right now. In the future, it probably makes sense for you as the project owner to do the merge, and was probably silly of me to request your review after opening the pull request. I think requesting review is more suitable when there is no main project owner, or when the project owner wants code reviewed by someone else. For a lot of single author projects, opening a pull request is in a way requesting review and inclusion by the project owner.

I suppose I'm still figuring out GitHub and GitHub etiquette.
Title: Re: OP2Archive Application Development
Post by: Vagabond on April 17, 2018, 10:40:58 AM
I went ahead and merged the Archive change that removed the internalName argument from the CreateVolume function.

A new branch and merge request were created for sorting filesToPack alphabetically inside the CreateVolume function.

This is the last change I currently had in mind for OP2Utility. If Leeor is looking to move is OP2-Landlord code to using OP2Utility, we could start looking at the differences between OP2Utility and his code base for merging. Otherwise, I'll look to release a new version of OP2Archive. Unless Hooman has something else he wants to review, change, fix, etc.

Quote
Vagabond, I've merged the branch.

I'm thinking about the workflow right now. In the future, it probably makes sense for you as the project owner to do the merge, and was probably silly of me to request your review after opening the pull request. I think requesting review is more suitable when there is no main project owner, or when the project owner wants code reviewed by someone else. For a lot of single author projects, opening a pull request is in a way requesting review and inclusion by the project owner.

I suppose I'm still figuring out GitHub and GitHub etiquette.

I figured you just tagged me for review since I wrote the original code as a courtesy. Which is why I keep tagging you in changes to the Archive code since you wrote it. I don't really consider myself the owner of the code. You wrote all of the archive code which is the largest piece. We co-wrote the code to load a map into memory, but we used all your notes that would have otherwise made it impossible for me to write.

Thanks,
-Brett
Title: Re: OP2Archive Application Development
Post by: leeor_net on April 21, 2018, 06:58:17 PM
In reference to Hooman's comment (https://github.com/OutpostUniverse/OP2Utility/pull/12) about subfolders in VOL files:

Quote
We may want to consider the possibility of subfolders within VOL files, and how the path component interacts with the filename.

It got me thinking (especially with Brett's comments (https://forum.outpost2.net/index.php/topic,6089.0.html) about OP2-Landlord), are the VOL files absolutely 100% required or is it possible to replace them with standard ZIP files? Is this something that our recent mods are capable of implementing or is this hard coded into the main executable?
Title: Re: OP2Archive Application Development
Post by: Vagabond on April 22, 2018, 08:01:15 AM
We have been spending a fair amount of time working on OP2Utility and commenting on the progress. If you are wanting to follow the progress closely you can subrscribe to the repo on GitHub. I'd also be happy if others want to work alongside or add constructive comments, merge requests, bug reports, etc.

I have uncommitted code that saves about 2/3rds of an Outpost 2 map to file. I've been reviewing the process by loading the created file using HxD and comparing to the original file. The last portion I need to work on is adding the TileGroups at the end of the map file. I may need some help interpreting some items in the ollydbg notes section after the next time I work on it.

No one has commented negatively on the plan to use a MapReader and MapWriter as discussed above, so I will let go with that plan in addition to adding the terrain enums to OP2Utility as suggested by Leeor.

Leeor, I wouldn't mind switching to zip files but this is beyond my skills.

You could implement the mapper without involving vol files, but it would require using a separate tpool like OP2Archive to extract the tile sets and maps for use in the mapper. I would recommend considering allowing for reading of vol files to load tile sets and maps at least. It would probably be okay to force the user to repack Vols using a separate application as you only need to do this before redistributing a new version of the game. Just my opinion here though.

Thanks,
Brett
Title: Re: OP2Archive Application Development
Post by: leeor_net on April 22, 2018, 08:07:01 AM
What I mean is, are the VOL files absolutely required by Outpost 2 or can we modify it to instead use a standard archive format like ZIP? ZIP is simple enough to implement via zlib or similar and I use archives like that in many of my games via PhysicsFS... so it popped into my mind that maybe we could just eliminate an archaic format and replace it with a well documented and well supported one.
Title: Re: OP2Archive Application Development
Post by: lordpalandus on April 22, 2018, 05:50:16 PM
Well, as the game can load loose VOL files, I'd say it could also load zip files. Or renamed zip files.
Title: Re: OP2Archive Application Development
Post by: leeor_net on April 22, 2018, 06:15:41 PM
VOL and ZIP are two totally different formats... if the main executable (outpost2.exe) is what loads the VOL files, changing it to load ZIP files instead would be non trivial. On the other hand, if the VOL files are loaded up by one of the various DLL's, adding ZIP support should be straight forward.

But I haven't followed along closely enough to know this (which is why I'm posing the question to Hooman/Vagabond ;)  )
Title: Re: OP2Archive Application Development
Post by: Vagabond on April 23, 2018, 01:16:49 PM
Leeor,

Unfortunately how Outpost 2 consumes vol and clm files internally is beyond my knowledge. I only know a fair amount about the actual file format due to Hooman's coaching and excellent documentation. He would probably be in a better spot to answer a question like this.

-Brett
Title: Re: OP2Archive Application Development
Post by: Hooman on April 26, 2018, 12:48:39 PM
It would probably require reasonably substantial changes to get the game to load ZIP files. Though I'm not sure it would be so much work as to preclude the possibility. I wouldn't consider this a high priority though.

If someone were to make a remake, then yeah, just pack the resources into zip files and go that route.


I have some concerns about the split between MapReader and MapWriter. Guess it's been a while since I've gone through OPU stuff. I think it makes sense to split stream reading from stream writing. I also made a mess of the VOL and CLM code, trying to support both operations in one class, and in hindsight would probably have split it into a reader and a writer class. Though I don't think I'd have extended the reader/writer split to map data.

Thinking about this deeper, my reasons are the map data is often both loaded and saved in the same session, the entire data set is processed as one atomic unit, and both loading and saving are highly symmetric.

On the other hand, VOL files are generally not both read and written in the same session, may be packed atomically but internal files are picked and chosen when reading from them, and are processed in a way that is far less symmetric. The packing code needs to collect files and filenames, possibly transform filenames for CLM, build a sorted index, add appropriate padding, and might even optionally compress the data though that was never implemented. For unpacking, you need to binary search the index, jump around in the file to find the data, verify headers are correct and match the index, and decompress data.