Excellent.
I'm going to suggest using Ruby for this project. It should be easier to work with than C++, and may lead somewhere interesting.
You can find a Windows Ruby installer at http://rubyinstaller.org/. Download one of the packages to get started. Currently the newest are the 2.2.X releases. There is a note about gem compatibility and general stability that suggests possibly using the 2.1.X releases, but until we run into a problem, let's try the newest version first. :)
You'll also want a decent editor for writing Ruby code. I recommend Notepad++ (https://notepad-plus-plus.org/). It's a great programming editor with syntax highlighting for a great many languages, including Ruby, and many other features useful for programming.
In the Ruby package you should find the following executables:
ruby.exe (the main Ruby interpreter which runs console mode .rb applications with an associated terminal window)
rubyw.exe (an alternate Ruby interpreter for graphical .rbw applications, without an attached terminal window)
irb.exe ("Interactive-Ruby", executes each line as you type it. Great for testing out simple code ideas, or to use as a calculator)
gem.exe (The "Gem" package manager)
You can test out the install by running irb:
irb
irb(main):001:0> 5*5
=> 25
irb(main):002:0> 1+10
=> 11
irb(main):003:0> [1,3,7,10].map{|x| x*3+1}
=> [4, 10, 22, 31]
Let me know when you're set up, and we'll continue. If you're waiting, you can check out the Ruby website (https://www.ruby-lang.org/). There is a Documentation section, which includes some tutorials. An interesting one is TryRuby (http://tryruby.org/levels/1/challenges/0), which is basically an IRB session in your browser, with some added tutorial text that suggests input to try out and learn from. Very neat. You don't even need to download and install anything to get started trying it.
@Dave: I'm a bit confused by that.
Here's a rough idea of what I'm thinking.
First you'll need a parser. The initial design will look something roughly like:
techFileText = File.read(techFileName)
techArray = Parser.parse(techFileText)
Of course that assumes a properly structured input file with no misspelt keywords, missing keywords, or syntax errors. This includes typos such as BEGIN_TEX, or a missing END_TECH tag, or forgetting to specify a tech ID after BEGIN_TECH. Such errors should be properly reported to the user, and without the program exploding in any horrible way. I'm going to suggest keeping the reporting generic though. You don't want to report the errors to the user in the parser itself, since that makes the parsing code less portable. Rather, you want parsing errors to be reported to the caller, which may then report the errors to the user. By decoupling error reporting from the parsing code you allow the parser to be reused in more scenarios (say parsing in a console app, a GUI app, or a web app). This can be done using the exception handling mechanism to report the details of the error. Using exception handling means error information is returned to the caller without specifying how the details are reported to the user, such as terminal output, popup window (yuck), or error web page (this sounds a lot like a popup window). That will look something like:
techFileText = File.read(techFileName)
begin
  techArray = Parser.parse(techFileText)
rescue ParseError => parseError
  # Report error message to user
  # ... fileName:lineNumber:columnNumber
  # ... copy of line with the problem
end
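To make the decoupling concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the ParseError class with its line_number, column_number, and line_text fields, the tiny stand-in Parser that only checks the first word of each line against a made-up keyword list, and the format_parse_error helper. The point is only that the parser raises an error carrying position details, and the caller decides how to show it.

```ruby
# Hypothetical error class: carries position details so the caller
# can decide how (and where) to report the problem.
class ParseError < StandardError
  attr_reader :line_number, :column_number, :line_text

  def initialize(message, line_number, column_number, line_text)
    super(message)
    @line_number = line_number
    @column_number = column_number
    @line_text = line_text
  end
end

# Hypothetical stand-in for the real parser. It only checks that each
# non-blank line starts with a known keyword (an illustrative subset).
module Parser
  KEYWORDS = %w[BEGIN_TECH END_TECH CATEGORY COST].freeze

  def self.parse(text)
    text.each_line.with_index(1) do |line, line_number|
      keyword = line[/\S+/]
      next if keyword.nil? || KEYWORDS.include?(keyword)
      column_number = line.index(keyword) + 1
      raise ParseError.new("Unknown keyword #{keyword}", line_number, column_number, line.chomp)
    end
    [] # A real parser would return the parsed techs here
  end
end

# The caller owns the reporting format (console here, but it could be anything)
def format_parse_error(file_name, error)
  "#{file_name}:#{error.line_number}:#{error.column_number}: #{error.message}\n#{error.line_text}"
end

begin
  Parser.parse("BEGIN_TEX 03401\nEND_TECH\n")
rescue ParseError => parse_error
  puts format_parse_error("techs.txt", parse_error)
end
```

A GUI or web app could rescue the same ParseError and present the same fields differently, without any change to the parser.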
Then you'll need to design some lint checking rules. This will check for things like:
- dependency cycles (tech A depends on tech B, while tech B depends on tech A)
- unreachable techs (tech A depends on tech B, but tech B is disabled)
- undefined tech IDs (tech A depends on tech B, but tech B doesn't exist)
- missing tags (either COST is present, exclusive-or both EDEN_COST and PLYMOUTH_COST are present)
- inappropriate category (marked as civilian tech, but upgrades a military unit)
- static buffer limitations on upgrades (a tech can upgrade many units, but a unit can only be upgraded by at most 2 techs).
- static buffer limitations on techs? (can Outpost 2 only support a limited number of techs?)
That program will look something like:
techFileText = File.read(techFileName)
begin
  techArray = Parser.parse(techFileText)
rescue ParseError => parseError
  # Report error message
  # ... fileName:lineNumber:columnNumber
  # ... copy of line with the problem
end
lintCheck(techArray)
The concept of decoupling the error reporting from the error checking applies here too. You might use exception handling, or you might use return values. A return value is reasonable here, since a lint check isn't expected to do anything other than find errors. In the case of the parser, the parser was expected to return valid parsed data, and so returning an error instead is an exceptional case. It's also possible to report multiple errors at once.
A C++ compiler often tries to report multiple errors at once with parsing, but usually with horrible results, and so programmers are accustomed to ignoring everything other than the first error. This is largely because a parse error means the input data is invalid, and so trying to continue parsing invalid data to make sense of it is a bit of a lost cause. With a semantic check, done after parsing is complete, the data is at least properly structured even if it doesn't make sense, so there is more hope of reporting multiple errors at once reliably.
What I'm getting at, is the lint check could potentially return an array of errors, all of which would need to be reported. It also means a lint check could be more useful if it keeps going after marking an error, rather than stopping to report the error right away. Of course, there are alternatives, such as having the errors reported to a callback (code block) as they are found, with a return value simply indicating overall success or if any errors were found. That has the advantage that errors can be reported quickly for long running checks. But of course, a simple version of the algorithms that just reports the first error to a console and terminates is still useful and a good starting point.
But, that's a detail for later. First step is to get a parser going.
Don't feel bad, I'm a bit confused too.
I'm thinking my hubris about learning new things got carried away. I'll learn coding, but it's going to be awhile, longer than I expected.
I've read over this a couple times and was wondering how much could be applied?
http://thingsaaronmade.com/blog/a-quick-intro-to-writing-a-parser-using-treetop.html
Looks to me like a good start for being able to read the file and have it search for BEGIN_TECH and then write rules for how it should be spelled?
I've read multiple definitions for what parsing is and not a one has made sense to me.
I'm sure I'm getting ahead here.
So would having Ruby use the scan and .length command to count number of times BEGIN_TECH shows up and then match that number to END_TECH work okay or is that not specific enough?
file = (techFileName)
1st array = file.scan (BEGIN_TECH).length
2nd array = file.scan (END_TECH).length
3rd array = 1st array == 2nd array
Something like this work? And how would you tell Ruby to spit out the answer to say a new text file? So that if the 3rd array = false it would log it and move on, if = true it just moves on? Or would you be looking for it to read and output specifically where the missing BEGIN or END tech is? I guess having it read top down and have it scan for BEGIN_TECH then scan for END_TECH would be more difficult as how does it know it didn't just scan past a BEGIN_TECH to find a END_TECH?
My head hurts.
As per our discussion on IRC about Ruby parsers, the Citrus (https://github.com/mjackson/citrus) library actually looked quite promising. Instead of using Treetop for parsing (which was an arbitrary library choice), we'll amend the plan to use Citrus. The Citrus library can be installed with the gem package manager using the command line:
gem install citrus
To use this library, the Ruby code will need to include it using require. As the library is a Ruby gem, and not a core library, you will need to first load the core "rubygems" library, (or another alternative gem package manager library).
Side note: The rubygems library (or an alternative) will update the require method to also look in the gem package folder, rather than just the default bundled core library folder. Since there are alternative gem package managers, the require for rubygems is generally only present in the top level project file, and not sprinkled into every library file that uses a gem. Sprinkling require 'rubygems' throughout libraries would take away the choice of library users as to what package manager they want to use, since the library would then be causing rubygems to be loaded.
require 'rubygems' # Sometimes omitted depending on context, or a different gem package manager
require 'citrus'
A grammar is then loaded from an external .citrus file, using the Citrus.load method.
Citrus.load("grammarFileName.citrus")
The Citrus.load method will create a new Ruby Module, with a few parser specific methods added to it. Module names typically start with an upper case letter. The name of the grammar module is specified in a .citrus file using the syntax:
grammar StuffGrammar
...
end
The module name (StuffGrammar, in the example above) can be used directly in the code following the call to Citrus.load, much like when you require a source code file. Strictly speaking, a .citrus file is not plain Ruby: Citrus reads it using its own grammar file syntax. Ruby code embedded in the file (such as the blocks attached to rules) does get evaluated though, so a .citrus file can run arbitrary code (including malicious code). The file format is a DSL (Domain Specific Language) for defining a grammar for use in parsing, built around keywords such as "grammar" and "rule". A closely related and very common practice in Ruby is to create a DSL which is actually Ruby code in disguise, but used in a context where additional supporting methods are available.
Once the grammar has been loaded, it can be used to parse an input string using the parse method.
StuffGrammar.parse(inputText)
Near the end of the Citrus page, there is a section on Debugging that gives an example of catching and reporting errors. It follows very closely to the rough idea I outlined in a previous post. Their example is as follows:
def parse_some_stuff(stuff)
  match = StuffGrammar.parse(stuff)
rescue Citrus::ParseError => e
  raise ArgumentError, "Invalid stuff on line %d, offset %d!" %
    [e.line_number, e.line_offset]
end
Notice how the Citrus::ParseError class provides information such as line_number and line_offset. Couple that with outputting the actual text of the line where the parse error occurred (the line property), and the user should be able to pinpoint the source of parse errors quite quickly.
You can read the Citrus documentation (http://mjackson.github.io/citrus/api/) for more details, but at this point you should be able to start playing around with the example Citrus grammar files.
Here is a description of the implementation of the parsing code in Outpost 2. Excuse the rough translation from assembly code to not quite C++, not quite Ruby pseudo code. There are 3 main functions for transforming the text into meaningful data, and one bit of inlined code to skip over comments. Each function processes input data one byte at a time.
ReadInt:
EatNonNumberLoop: // Eat whitespace and comments. Error if other non-digit characters are found
char = buffer[0] = input.next
break if (isDigit(char) || char == '-') // Start of number
EatComment if (char == ';') // Start of comment
ParseError if (isAlpha(char)) // Unexpected character
i = 1
ScanNumberLoop: // Read over number to find where it ends
char = buffer[i] = input.next
break if !(isDigit(char) || char == '-') // End of number (seems to allow embedded "-" signs)
i++
return buffer.to_int
ReadToken:
EatNonAlphaLoop:
char = buffer[0] = input.next
break if (isAlpha(char)) // Start of token
EatComment if (char == ';') // Start of comment
i = 1
ScanTokenLoop:
char = buffer[i] = input.next
break if !(isAlpha(char) || char == '_') // End of token
i++
buffer[i] = 0 // Null terminate string
return buffer
ReadString:
EatNonOpeningDoubleQuoteLoop:
break if (char == '"') // Start of string
EatComment if (char == ';') // Start of comment
i = 0
ReadStringLoop:
char = buffer[i] = input.next
break if (char == '"') // End of string
continue if (char == '\t' || char == '\n') // Skip over tabs and newlines without recording them
i++
buffer[i] = 0 // Null terminate string
return buffer
EatComment:
Loop:
char = input.next
break if (char == 10) // End of line (10 = Linefeed character)
You can do better than this using regular expressions in Ruby. A rough untested stab at needed regular expressions is:
Comment: /;.*/
Int: /-?[0-9]+/
Token: /[a-zA-Z_]+/
String: /"[^"]*"/
For the Comment regex, I assumed . won't match newlines, which it can depending on regular expression flags. For the String regex, I assumed newlines would be included within the match, which they might not be, depending on regular expression flags. Those will be points I'll leave you to google. Remember you can test out your regular expressions on Rubular (http://rubular.com/).
Further, I will assign the task of testing regular expressions in Rubular. Paste an example section of a tech file into Rubular as the input text, and try the above regular expressions, and see if they match what you think they will match. Make sure to include an example where a quoted string contains an embedded newline.
I can't get Rubular to do more than one regular expression at a time (which is the way it's supposed to be?)
Anyways
In this test string
BEGIN_TECH "Cybernetic Teleoperation" 03401
CATEGORY 4
DESCRIPTION "Structure Factories may now produce Robot Command Center and Vehicle Factory structure kits. _______________________________________ Our research has resulted in a specialized variant of the Command Center, with dedicated computers and communications capabilities. In addition, all vehicle designs now include the less expensive Noesis computer, utilizing elements of the Savant technology. This transfers much of the computing burden from the Robot Command Center to the vehicle itself."
TEASER "Allows production of Robot Command Center and Vehicle Factory structure kits at the Structure Factory. _______________________________________ Prior to the evacuation from our original colony site, Workers remotely operated our vehicles using a technology called Teleoperation. Since the catastrophe, we no longer have enough Workers to Teleoperate our vehicles. The Savant computers at the Command Center have taken on part of this burden, but the job is taxing their capacity. We need a specialized computer vehicle control system. This Cybernetic Teleoperation project should allows us to operate a much larger number of vehicles."
EDEN_COST 800
PLYMOUTH_COST 1000
MAX_SCIENTISTS 10
LAB 2
END_TECH
BEGIN_TECH "Emergency Response Systems" 03301
CATEGORY 11
DESCRIPTION "Structure Factories may now produce DIRT structure kits. _______________________________________ Disaster Instant Response Teams (DIRTs) can reduce damage to structures. Once the DIRT structure has been deployed, DIRT members trained in emergency medical care and structural reinforcement will be on the scene in a matter of seconds."
TEASER "Allows production of DIRT structure kits at the Structure Factory. _______________________________________ Given the new dangers confronting our colony, we need more protection against disaster than our emergency shelters are able to provide. This project will develop new methods, tools, and techniques to respond to structural damage."
COST 1000
MAX_SCIENTISTS 10
LAB 2
END_TECH
This => /\d[0-9*]*/ matches all integers of a completed size (i.e. 0, 45, 1007, 6 and so on) and puts them in match groups 1 to 11 (it puts them in match groups when I enclose the regex in parentheses), with no repeats (there are 11 different integer combos)
This => /[a-zA-Z]+/ matches all tokens as full words, as well as single-letter tokens like a and I, as 275 separate match groups. The exception is that BEGIN_TECH shows up split into BEGIN then TECH, and likewise END and so on and so forth (that's on me, I forgot to add the _ to the line of code). Adding that _ also accounts for the long line ____________________ separating the tech from the description
Just inputting this => /;*/ matches all white spaces (spoiler, there's a lot)
This => /"[^"]*"/ matches the entire string in one match group for anything enclosed in quotations, so tech descriptions, begin_tech header desc etc etc
That's as far as I got, work time.
Work more on this later.
Continuing on with using the regex, you can try something like the following:
str = 'BEGIN_TECH "Cybernetic Teleoperation" ... END_TECH'
re = /\d+|[a-zA-Z_]+|\"[^"]*\"|;.*/
# Create an array of matches:
matchArray = str.scan(re) # => ["BEGIN_TECH", "\"Cybernetic Teleoperation\"", ..., "END_TECH"]
# Or
# Create an enum to process matches one at a time
matchEnum = str.enum_for(:scan, re)
matchEnum.next # => "BEGIN_TECH"
matchEnum.next # => "\"Cybernetic Teleoperation\""
...
You can iterate over the matches using .each. It works in either case.
matchArray.each do |match|
  # ...
end
matchEnum.each do |match|
  # ...
end
Mind you, the .each loop doesn't quite match how Outpost 2 handles processing. It uses a more complicated code structure that's more similar to a while loop with .next calls. They're validated calls though, so it would be more like nextToken, nextString, nextInt. It doesn't just grab the next match. It validates the next match is of the expected type.
I can elaborate further, but I want to stop here for the moment. I'd also like to assign a small task. Rather than having one big regex, you can define multiple smaller regexes and then combine them with a Regexp method. You can check the Ruby docs for the Regexp class to find the needed method.
RegexInt = /\d+/
RegexToken = /[a-zA-Z_]+/
RegexString = /\"[^"]*\"/
RegexComment = /;.*/
RegexAll = Regexp.something(...) # *figure this out*
That would allow splitting the text into component parts using the combined expression, and then using the individual regular expressions while going through the data to validate each part is of the expected type.
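The validating reads (nextToken, nextString, nextInt) could be sketched along the following lines. This is a minimal sketch and not Outpost 2's actual code: the TechReader class, its method names, and the choice of Ruby's standard StringScanner are all assumptions for illustration. It applies the individual regular expressions one at a time, so the combined-regex task is left for you. The Int regex here also allows a leading minus sign, as in the earlier list of regular expressions.

```ruby
require 'strscan'

# Hypothetical reader that validates each part is of the expected type
class TechReader
  REGEX_INT     = /-?\d+/
  REGEX_TOKEN   = /[a-zA-Z_]+/
  REGEX_STRING  = /"[^"]*"/
  REGEX_COMMENT = /;.*/

  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  def next_token
    read(REGEX_TOKEN, "token")
  end

  def next_string
    read(REGEX_STRING, "string")[1..-2] # Drop the surrounding quotes
  end

  def next_int
    read(REGEX_INT, "integer").to_i
  end

  # True once only whitespace and comments remain
  def done?
    skip_filler
    @scanner.eos?
  end

  private

  # Skip whitespace and comments between the parts we care about
  def skip_filler
    loop do
      break unless @scanner.skip(/\s+/) || @scanner.skip(REGEX_COMMENT)
    end
  end

  # Read the next part, raising if it is not of the expected type
  def read(regex, description)
    skip_filler
    value = @scanner.scan(regex)
    raise ArgumentError, "Expected #{description} at offset #{@scanner.pos}" if value.nil?
    value
  end
end

sample = "; sample input\nBEGIN_TECH \"Cybernetic Teleoperation\" 03401\nCATEGORY 4\nEND_TECH\n"
reader = TechReader.new(sample)
reader.next_token  # => "BEGIN_TECH"
reader.next_string # => "Cybernetic Teleoperation"
reader.next_int    # => 3401
```

Asking for the wrong type, such as calling next_int when the next part is the CATEGORY keyword, raises an error instead of silently grabbing whatever comes next.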
As per the IRC discussion, I thought I'd add some notes. The immediate goal is to parse the tech data into an array of structs. What you might do, is create a function to parse data for a single tech, and then work that into a loop to parse all techs into an array of structs.
You should look into the Ruby Struct class (http://ruby-doc.org/core-2.3.0/Struct.html). You can define it using syntax such as:
Tech = Struct.new(
:name,
:id,
:description,
...
)
You should start by defining all the needed fields. Play around with the struct to get a feel for using them.
tech = Tech.new
tech.id = 1007
tech.name = "Tech Name"
puts tech.id
puts tech[:id]
tech2 = Tech.new("Tech Name 2", 1008, ...)
puts tech2.name
puts tech2[:name]
You'll also want to be familiar with the Ruby String class (http://ruby-doc.org/core-2.3.0/String.html).
In particular, know the "scan", "match", and "=~" methods. It's also good to have a general sense of what other methods are available.
For handling arrays and collections, it's good to read up on the Ruby Enumerable module (http://ruby-doc.org/core-2.3.0/Enumerable.html). Many other objects include this module to provide a standard set of methods for working with collections.
In particular, try to be familiar with "each" (provided by any class that uses Enumerable), along with provided methods "map", and "inject". Try to get a sense of what other methods are offered in this module. You may or may not need this right away, but it will be useful for later.
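Here is a small sketch of those three methods on an array of structs. The SampleTech struct is a trimmed-down stand-in with made-up values, not the full field list you'll need.

```ruby
# Hypothetical, trimmed-down struct with made-up values
SampleTech = Struct.new(:name, :id, :cost)

techs = [
  SampleTech.new("Tech Name", 1007, 800),
  SampleTech.new("Tech Name 2", 1008, 1000),
]

# each: run a block once per element
techs.each { |tech| puts "#{tech.id}: #{tech.name}" }

# map: build a new array from the block's return values
ids = techs.map { |tech| tech.id } # => [1007, 1008]

# inject: fold the collection down to a single value
total_cost = techs.inject(0) { |sum, tech| sum + tech.cost } # => 1800

# select is another Enumerable method worth knowing
expensive = techs.select { |tech| tech.cost > 900 }
puts expensive.map { |tech| tech.name }.inspect # => ["Tech Name 2"]
```

Array gets these methods by including Enumerable, and the same calls work on other collections that include it.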