Author Topic: Project Idea: Tech File Lint Checker  (Read 11216 times)

Offline Hooman

  • Administrator
  • Hero Member
  • *****
  • Posts: 4955
Project Idea: Tech File Lint Checker
« on: December 01, 2015, 07:41:28 PM »
I had an idea for a small project that might be good for someone who wants to learn some programming.

Tech File Lint Checker:
Read an Outpost 2 formatted tech file, parse the data, and perform a few sanity checks to make sure it won't crash the game or cause weird behaviour. Report any problems found to the user.

Sirbomber, you've worked on creating new tech files. I remember you mentioning some problems along the way with limitations and things that could go wrong. I suspect you could mention a few issues such a tool should check for.


I think this could be a fairly easy project for someone new to programming. If someone wants to learn, I can provide detailed guidance on the project.

Offline dave_erald

  • Sr. Member
  • ****
  • Posts: 262
Re: Project Idea: Tech File Lint Checker
« Reply #1 on: December 01, 2015, 10:06:16 PM »
I'm game.
-David R.V.

-GMT400 fan
-OPU Influencer

Offline Hooman

Re: Project Idea: Tech File Lint Checker
« Reply #2 on: December 02, 2015, 08:27:51 AM »
Excellent.

I'm going to suggest using Ruby for this project. It should be easier to work with than C++, and may lead somewhere interesting.

You can find a Windows Ruby installer at http://rubyinstaller.org/. Download one of the packages to get started. Currently the newest are the 2.2.X releases. There is a note about gem compatibility and general stability that suggests possibly using the 2.1.X releases, but until we run into a problem, let's try the newest version first. :)

You'll also want a decent editor for writing Ruby code. I recommend Notepad++. It's a great programming editor with syntax highlighting for many languages, including Ruby, plus many other features useful for programming.

In the Ruby package you should find the following executables:
ruby.exe  (the main Ruby interpreter which runs console mode .rb applications with an associated terminal window)
rubyw.exe (an alternate Ruby interpreter for graphical .rbw applications, without an attached terminal window)
irb.exe ("Interactive-Ruby", executes each line as you type it. Great for testing out simple code ideas, or to use as a calculator)
gem.exe (The "Gem" package manager)


You can test out the install by running irb:
Code:
irb
irb(main):001:0> 5*5
=> 25
irb(main):002:0> 1+10
=> 11
irb(main):003:0> [1,3,7,10].map{|x| x*3+1}
=> [4, 10, 22, 31]

Let me know when you're set up, and we'll continue. If you're waiting, you can check out the Ruby website https://www.ruby-lang.org/. There is a Documentation section, which includes some tutorials. An interesting one is TryRuby, which is basically an IRB session in your browser, with some added tutorial text suggesting input to try out and learn from. Very neat. You don't even need to download and install anything to get started trying it.
« Last Edit: December 02, 2015, 08:29:25 AM by Hooman »

Offline dave_erald

Re: Project Idea: Tech File Lint Checker
« Reply #3 on: December 02, 2015, 09:56:41 AM »
*raises hand*
"I have to go to the bathroom"






...also, done. Next?

Offline Hooman

Re: Project Idea: Tech File Lint Checker
« Reply #4 on: December 03, 2015, 08:01:14 PM »
Excellent. Did you actually play with any of the Ruby tutorials?


You'll need a way of parsing the tech data into some sort of memory structure. The tech file is simple enough that you could write a parser by hand. There are also parsing libraries that can help though. In particular, I'm aware of Treetop, which is a Ruby parser generator.

You can install Treetop by running the following command in a terminal window:
Code:
gem install treetop

It may be worth doing a bit of research on the topic. The page for Treetop is a good starting place.


Also, for reference, here is an example tech taken from multitek.txt. This is the sort of thing that needs to be parsed.
Quote
BEGIN_TECH "Hot-Cracking Column Efficiency" 07202
    CATEGORY        6
    DESCRIPTION     "Common Ore Smelter, Rare Ore Smelter, and GORF Power requirements reduced."
    TEASER          "Reduces Common Ore Smelter, Rare Ore Smelter, and GORF Power requirements. _______________________________________  Smelters and GORFs are dependent on hot cracking columns to separate the Metal content of Ores or rubble.  This equipment has a very high Power demand.  We believe that we may be able to apply our high-temperature superconductive material to some elements of this system and reduce the Power demand."
    IMPROVE_DESC    "Reduced Power demand"
    REQUIRES        03302
    REQUIRES        05110
    COST            1400
    MAX_SCIENTISTS  14
    LAB             2
    UNIT_PROP SMELTER_ADV Power_Required 40
    UNIT_PROP SMELTER Power_Required 40
    UNIT_PROP GORF Power_Required 40
END_TECH

Most are of the form: Key (whitespace) Value
Most are singular values, and many fields are required.
A few fields can be repeated, such as UNIT_PROP.
The BEGIN_TECH and END_TECH tags wrap all data for one tech, and there can be an arbitrary number of techs.
(However, Outpost 2 may have a hard limit on how many it will read, so that can be part of the lint checking.)
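Here is a rough sketch of reading one tech block in that format. It assumes one "KEY value" pair per line, no ; comments, and no quoted strings spanning multiple lines. The helper name `parse_tech_block` and the list of repeatable keys (REQUIRES and UNIT_PROP, taken from the example above) are assumptions. Values are kept as raw strings, so the tech ID keeps its leading zero and DESCRIPTION keeps its quotes.

```ruby
# Fields that may appear more than once per tech (assumed from the example above).
REPEATABLE_KEYS = ["REQUIRES", "UNIT_PROP"]

# Hypothetical helper: parse the lines of one BEGIN_TECH ... END_TECH block into a Hash.
# Assumes one "KEY value" pair per line and no comments.
def parse_tech_block(text)
  tech = { "REQUIRES" => [], "UNIT_PROP" => [] }
  text.each_line do |line|
    line = line.strip
    next if line.empty?
    key, value = line.split(/\s+/, 2)   # key, then the rest of the line as the value
    case key
    when "BEGIN_TECH"
      # Header line: quoted tech name followed by the numeric tech ID
      m = value.to_s.match(/"([^"]*)"\s+(\d+)/) or raise ArgumentError, "bad header: #{line}"
      tech["NAME"], tech["ID"] = m[1], m[2]
    when "END_TECH"
      break
    when *REPEATABLE_KEYS
      tech[key] << value                # repeatable fields are collected into arrays
    else
      tech[key] = value                 # singular fields keep their raw string value
    end
  end
  tech
end

sample = <<-TECH
  BEGIN_TECH "Hot-Cracking Column Efficiency" 07202
      CATEGORY        6
      REQUIRES        03302
      REQUIRES        05110
      COST            1400
      UNIT_PROP GORF Power_Required 40
  END_TECH
  TECH

tech = parse_tech_block(sample)
p tech["ID"]        # "07202"
p tech["REQUIRES"]  # ["03302", "05110"]
```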

Offline Sirbomber

  • Hero Member
  • *****
  • Posts: 3238
Re: Project Idea: Tech File Lint Checker
« Reply #5 on: December 03, 2015, 09:50:17 PM »
I'm not sure what was actually going on here, but I remember running into a problem where upgrading the same unit more than two different times would cause some strange side effects, like Earthworkers suddenly costing 13 rare ore for another player.  But that's definitely the kind of thing to look out for.
"As usual, colonist opinion is split between those who think the plague is a good idea, and those who are dying from it." - Outpost Evening Star

Outpost 2 Coding 101 Tutorials

Offline dave_erald

Re: Project Idea: Tech File Lint Checker
« Reply #6 on: December 03, 2015, 11:57:00 PM »
I thought Ruby came bundled with RACC. I gotta check later.

I ran that IRB bit you posted and that's as far as I got.

Offline Hooman

Re: Project Idea: Tech File Lint Checker
« Reply #7 on: December 04, 2015, 02:16:53 AM »
Yes, I remember something about only two upgrades per unit, due to statically allocated space. That is something to check for. I believe there are other things, but I can't remember them all now. I suspect the total number of techs is limited. Circular dependency checks might also be useful. Maybe reachability, in case a tech depends on another that has been disabled. There is also the issue of using inappropriate UNIT_PROP tags, although some neat features can be added by abusing them just a little bit.
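A check for the two-upgrades-per-unit limit could count how many distinct techs upgrade each unit through UNIT_PROP. This sketch assumes each tech is a Hash with an "ID" and a "UNIT_PROP" array of strings such as "GORF Power_Required 40" (unit name first). The limit of 2 is only what is remembered in this thread, not a confirmed value, and the sample IDs 07203 and 07204 are made up.

```ruby
MAX_UPGRADES_PER_UNIT = 2  # limit as recalled in this thread; unverified

# Return warnings for units that are upgraded by more than +limit+ different techs.
# techs: array of Hashes with "ID" and "UNIT_PROP" (array of "UNIT PROPERTY VALUE" strings).
def check_upgrade_limits(techs, limit = MAX_UPGRADES_PER_UNIT)
  upgraders = Hash.new { |hash, unit| hash[unit] = [] }
  techs.each do |tech|
    # Count each tech once per unit, even if it changes several properties of that unit
    units = tech["UNIT_PROP"].map { |prop| prop.split.first }.uniq
    units.each { |unit| upgraders[unit] << tech["ID"] }
  end
  upgraders.select { |_unit, ids| ids.size > limit }.map do |unit, ids|
    "Unit #{unit} is upgraded by #{ids.size} techs (#{ids.join(', ')}); limit is #{limit}."
  end
end

techs = [
  { "ID" => "07202", "UNIT_PROP" => ["GORF Power_Required 40", "SMELTER Power_Required 40"] },
  { "ID" => "07203", "UNIT_PROP" => ["GORF Power_Required 35"] },   # hypothetical tech
  { "ID" => "07204", "UNIT_PROP" => ["GORF Power_Required 30"] }    # hypothetical tech
]
puts check_upgrade_limits(techs)
```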


What is RACC?

I suppose I can tailor instructions to be very explicit if needed. I was originally thinking there could be some research parts though, like go read this article and that article, or read the documentation on this tool. Perhaps it's a bit too early for that though. I would recommend you check out the TryRuby link though.

Offline dave_erald

Re: Project Idea: Tech File Lint Checker
« Reply #8 on: December 04, 2015, 02:32:58 AM »
RACC is a parser generator written in Ruby, which is as far as I got. Sleep time...
« Last Edit: December 05, 2015, 11:20:28 AM by dave_erald »

Offline Hooman

Re: Project Idea: Tech File Lint Checker
« Reply #9 on: December 04, 2015, 08:43:05 AM »
Ahh, ok. I think I have heard of it before. The page says it outputs Ruby code, and the runtime is included with Ruby 1.8+, which I believe means the output will run out of the box. You'd still need to gem install it to compile a grammar description to generate Ruby output though.

Offline dave_erald

Re: Project Idea: Tech File Lint Checker
« Reply #10 on: December 17, 2015, 06:54:49 PM »
So this program should then be written to follow a certain order when lint checking. I'm generalizing, as I'm still learning how to code... from the beginning, basically.

  • Load file
  • Spell Check
  • Check tech order, begin_tech, desc, teaser...end_tech
  • Start at level one with first begin tech integer (34052) for example
  • Check current level for required integer/s assuming there are none
  • Check free tech list, and then previous level
  • Any iterations add +1 to an array

...and I'll stop there till someone says I'm an idiot or not.

I should go back to cars and trucks... way simpler.

Offline Vagabond

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 1015
Re: Project Idea: Tech File Lint Checker
« Reply #11 on: December 19, 2015, 01:41:46 AM »
Dave,

Sounds like a good start. When you say spell check, I'm assuming you mean just checking the required BEGIN_TECH, CATEGORY, TEASER, etc are present as opposed to a word processing application style spelling/grammar check.

Have you thought about how you are going to report errors to the user yet? Is it going to be a command line program that spits out a log file or a simple GUI that lists them, or something else?

You could just list errors as you iterate through the tech file. For example,

TECH 07202, MAX_SCIENTISTS property missing.
TECH 07203, LAB value is not an integer.

This data could be saved to an array of some sort or directly written to a log file.
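A minimal sketch of collecting messages in that format into an array and leaving the output destination up to the caller. The `REQUIRED_KEYS` list and the method names are assumptions for illustration, not the official tech file rules. `report` accepts any IO-like object, so the same code can print to the console or write a log file.

```ruby
require 'stringio'

# Keys treated as mandatory in this sketch (an assumed list, not the official rules).
REQUIRED_KEYS = ["CATEGORY", "DESCRIPTION", "TEASER", "MAX_SCIENTISTS", "LAB"]

# Build a list of error messages for techs stored as Hashes keyed by field name.
def find_errors(techs)
  errors = []
  techs.each do |tech|
    REQUIRED_KEYS.each do |key|
      errors << "TECH #{tech['ID']}, #{key} property missing." unless tech.key?(key)
    end
    lab = tech["LAB"]
    if lab && lab !~ /\A-?\d+\z/
      errors << "TECH #{tech['ID']}, LAB value is not an integer."
    end
  end
  errors
end

# Write the messages to any IO-like object: $stdout, an open File, or a StringIO.
def report(errors, io = $stdout)
  errors.each { |message| io.puts(message) }
end

# Made-up sample data mirroring the two example messages above
techs = [
  { "ID" => "07202", "CATEGORY" => "6", "DESCRIPTION" => "text", "TEASER" => "text", "LAB" => "2" },
  { "ID" => "07203", "CATEGORY" => "6", "DESCRIPTION" => "text", "TEASER" => "text",
    "MAX_SCIENTISTS" => "10", "LAB" => "two" }
]
errors = find_errors(techs)
report(errors)   # prints the two messages to the console
```

Writing a log file instead is then `File.open("lint.log", "w") { |f| report(errors, f) }` (the file name is arbitrary).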


Offline Hooman

Re: Project Idea: Tech File Lint Checker
« Reply #12 on: December 19, 2015, 01:49:40 AM »
@Dave: I'm a bit confused by that.

Here's a rough idea of what I'm thinking.

First you'll need a parser. The initial design will look something roughly like:
Code:
techFileText = File.read(techFileName)
techArray = Parser.parse(techFileText)

Of course that assumes a properly structured input file with no misspelt keywords, missing keywords, or syntax errors. This includes typos such as BEGIN_TEX, or a missing END_TECH tag, or forgetting to specify a tech ID after BEGIN_TECH. Such errors should be properly reported to the user, and without the program exploding in any horrible way. I'm going to suggest keeping the reporting generic though. You don't want to report the errors to the user in the parser itself, since that makes the parsing code less portable. Rather, you want parsing errors to be reported to the caller, which may then report the errors to the user. By decoupling error reporting from the parsing code you allow the parser to be reused in more scenarios (say parsing in a console app, a GUI app, or a web app). This can be done using the exception handling mechanism to report the details of the error. Using exception handling means error information is returned to the caller without specifying how the details are reported to the user, such as terminal output, popup window (yuck), or error web page (this sounds a lot like a popup window). That will look something like:

Code:
techFileText = File.read(techFileName)
begin
  techArray = Parser.parse(techFileText)
rescue ParseError => parseError
  # Report error message to user
  # ... fileName:lineNumber:columnNumber
  # ... copy of line with the problem
end
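To make that outline concrete, here is a minimal sketch of a custom `ParseError` that carries the line number, column, and line text, raised by a deliberately tiny hypothetical parser (`TinyParser`, which only checks that each BEGIN_TECH line has a quoted name and a numeric ID). The parser never prints anything; the caller decides how to present the error, here as fileName:lineNumber:columnNumber followed by a copy of the line.

```ruby
# Raised by the parser; carries position info but does no reporting itself.
class ParseError < StandardError
  attr_reader :line_number, :column, :line

  def initialize(message, line_number, column, line)
    super(message)
    @line_number = line_number
    @column = column
    @line = line
  end
end

# Hypothetical toy parser: only validates BEGIN_TECH header lines.
module TinyParser
  HEADER = /\ABEGIN_TECH\s+"[^"]*"\s+\d+\s*\z/

  def self.parse(text)
    headers = []
    text.each_line.with_index(1) do |line, line_number|
      next unless line.lstrip.start_with?("BEGIN_TECH")
      unless line.strip =~ HEADER
        column = line.index("BEGIN_TECH") + 1   # 1-based column where the header starts
        raise ParseError.new("malformed BEGIN_TECH header", line_number, column, line.chomp)
      end
      headers << line.strip
    end
    headers
  end
end

# Caller-side formatting: fileName:lineNumber:columnNumber, then the offending line.
def format_parse_error(file_name, error)
  "#{file_name}:#{error.line_number}:#{error.column}: #{error.message}\n#{error.line}"
end

begin
  TinyParser.parse("BEGIN_TECH \"Hot-Cracking Column Efficiency\"\nEND_TECH\n")
rescue ParseError => parse_error
  puts format_parse_error("multitek.txt", parse_error)
end
```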

Then you'll need to design some lint checking rules. This will check for things like:
  • dependency cycles (tech A depends on tech B, while tech B depends on tech A)
  • unreachable techs (tech A depends on tech B, but tech B is disabled)
  • undefined tech IDs (tech A depends on tech B, but tech B doesn't exist)
  • missing tags (either COST is present, exclusive-or both EDEN_COST and PLYMOUTH_COST are present)
  • inappropriate category (marked as civilian tech, but upgrades a military unit)
  • static buffer limitations on upgrades (a tech can upgrade many units, but a unit can only be upgraded by at most 2 techs).
  • static buffer limitations on techs? (can Outpost 2 only support a limited number of techs?)
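For the first three rules, here is a sketch working on the REQUIRES graph. It assumes the techs have already been parsed into a Hash mapping each tech ID to the array of IDs it requires, and it models a "disabled" tech simply as an ID missing from that Hash (an assumption). Cycles are found with a depth-first search that marks techs as visiting or done; unreachable techs are whatever never becomes researchable when repeatedly unlocking techs whose requirements are all met. The IDs 08001, 08002, 09001 and 99999 are made up.

```ruby
# requires: Hash of tech ID => array of required tech IDs.

# Requirements that point at IDs with no tech definition.
def undefined_requirements(requires)
  requires.flat_map do |id, deps|
    deps.reject { |dep| requires.key?(dep) }
        .map { |dep| "Tech #{id} requires undefined tech #{dep}." }
  end
end

# Depth-first search; returns one cycle such as ["A", "B", "A"], or nil.
def find_cycle(requires)
  state = {}   # nil = unvisited, :visiting = on the current path, :done = finished
  requires.each_key do |id|
    cycle = cycle_from(id, requires, state, [])
    return cycle if cycle
  end
  nil
end

def cycle_from(id, requires, state, path)
  return nil if state[id] == :done
  return path[path.index(id)..-1] + [id] if state[id] == :visiting   # back edge: cycle found
  state[id] = :visiting
  path.push(id)
  requires.fetch(id, []).each do |dep|
    next unless requires.key?(dep)   # undefined IDs are reported separately
    cycle = cycle_from(dep, requires, state, path)
    return cycle if cycle
  end
  path.pop
  state[id] = :done
  nil
end

# Techs that can never be researched: repeatedly unlock techs whose requirements are met.
def unreachable_techs(requires)
  researchable = {}
  loop do
    newly = requires.keys.select do |id|
      !researchable[id] && requires[id].all? { |dep| researchable[dep] }
    end
    break if newly.empty?
    newly.each { |id| researchable[id] = true }
  end
  requires.keys.reject { |id| researchable[id] }
end

requires = {
  "03302" => [],
  "05110" => ["03302"],
  "07202" => ["03302", "05110"],
  "08001" => ["08002"],   # hypothetical pair forming a cycle
  "08002" => ["08001"],
  "09001" => ["99999"]    # hypothetical tech requiring an undefined ID
}
p undefined_requirements(requires)
p find_cycle(requires)
p unreachable_techs(requires)
```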
That program will look something like:

Code:
techFileText = File.read(techFileName)
begin
  techArray = Parser.parse(techFileText)
rescue ParseError => parseError
  # Report error message
  # ... fileName:lineNumber:columnNumber
  # ... copy of line with the problem
end

lintCheck(techArray)

The concept of decoupling the error reporting from the error checking applies here too. You might use exception handling, or you might use return values. A return value is reasonable here, since a lint check isn't expected to do anything other than find errors. In the case of the parser, the parser was expected to return valid parsed data, and so returning an error instead is an exceptional case. It's also possible to report multiple errors at once.

A C++ compiler often tries to report multiple errors at once with parsing, but usually with horrible results, and so programmers are accustomed to ignoring everything other than the first error. This is largely because a parse error means the input data is invalid, and so trying to continue parsing invalid data to make sense of it is a bit of a lost cause. With a semantic check, done after parsing is complete, the data is at least properly structured even if it doesn't make sense, so there is more hope of reporting multiple errors at once reliably.

What I'm getting at, is the lint check could potentially return an array of errors, all of which would need to be reported. It also means a lint check could be more useful if it keeps going after marking an error, rather than stopping to report the error right away. Of course, there are alternatives, such as having the errors reported to a callback (code block) as they are found, with a return value simply indicating overall success or if any errors were found. That has the advantage that errors can be reported quickly for long running checks. But of course, a simple version of the algorithms that just reports the first error to a console and terminates is still useful and a good starting point.
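Both reporting styles can live in one method: if the caller passes a block, each error is handed to it as soon as it is found and the return value says whether everything passed; otherwise the errors are collected and returned as an array. `lint_check` is a hypothetical name, and its single rule is the COST exclusive-or EDEN_COST/PLYMOUTH_COST check from the list above; real lint rules would be added alongside it.

```ruby
# Run lint rules over parsed techs (Hashes keyed by field name).
# With a block: yields each error as found, returns true if no errors were found.
# Without a block: returns an array of all errors (empty when everything passed).
def lint_check(techs, &on_error)
  errors = []
  report = on_error || ->(message) { errors << message }
  found = false

  techs.each do |tech|
    has_cost  = tech.key?("COST")
    has_split = tech.key?("EDEN_COST") && tech.key?("PLYMOUTH_COST")
    # Exactly one of the two costing styles must be present
    unless has_cost ^ has_split
      found = true
      report.call("TECH #{tech['ID']}, needs either COST or both EDEN_COST and PLYMOUTH_COST.")
    end
  end

  on_error ? !found : errors
end

techs = [
  { "ID" => "03401", "EDEN_COST" => "800", "PLYMOUTH_COST" => "1000" },
  { "ID" => "03301", "COST" => "1000" },
  { "ID" => "09999", "EDEN_COST" => "500" }   # hypothetical broken tech
]
p lint_check(techs)                        # collect everything, report later
ok = lint_check(techs) { |e| puts e }      # or report each error immediately
```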

But, that's a detail for later. First step is to get a parser going.

Offline dave_erald

Re: Project Idea: Tech File Lint Checker
« Reply #13 on: December 19, 2015, 12:51:19 PM »
Don't feel bad, I'm a bit confused too.

I'm thinking my hubris about learning new things got carried away. I'll learn coding, but it's going to be a while, longer than I expected.

I've read over this a couple of times and was wondering how much of it could be applied?
http://thingsaaronmade.com/blog/a-quick-intro-to-writing-a-parser-using-treetop.html
Looks to me like a good start for being able to read the file, have it search for BEGIN_TECH, and then write rules for how it should be spelled?

I've read multiple definitions for what parsing is and not a one has made sense to me.

I'm sure I'm getting ahead of myself here.
So would having Ruby use scan and .length to count the number of times BEGIN_TECH shows up, and then compare that number to the END_TECH count, work okay, or is that not specific enough?

Code:
text = File.read(techFileName)
beginCount = text.scan(/BEGIN_TECH/).length
endCount = text.scan(/END_TECH/).length
balanced = (beginCount == endCount)

Something like this work? And how would you tell Ruby to spit out the answer to, say, a new text file? So that if balanced is false it would log it and move on, and if it's true it just moves on? Or would you be looking for it to read and output specifically where the missing BEGIN or END tech is? I guess having it read top down, scanning for BEGIN_TECH and then for END_TECH, would be more difficult, as how does it know it didn't just scan past a BEGIN_TECH to find an END_TECH?
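Counting only tells you that the tags are unbalanced, not where. A sketch of the top-down approach: walk the file line by line, remember whether a BEGIN_TECH is currently open, and record the line number whenever the order is wrong. `check_tech_tags` is a hypothetical name, and it does not understand comments or quoted text, so a BEGIN_TECH inside a description would confuse it.

```ruby
# Walk the text top-down and report BEGIN_TECH / END_TECH ordering problems with line numbers.
def check_tech_tags(text)
  problems = []
  open_line = nil   # line number of the currently open BEGIN_TECH, or nil if none is open
  text.each_line.with_index(1) do |line, number|
    case line.strip.split.first
    when "BEGIN_TECH"
      if open_line
        problems << "Line #{number}: BEGIN_TECH before END_TECH for the tech started on line #{open_line}"
      end
      open_line = number
    when "END_TECH"
      problems << "Line #{number}: END_TECH without a matching BEGIN_TECH" unless open_line
      open_line = nil
    end
  end
  problems << "Line #{open_line}: BEGIN_TECH is never closed" if open_line
  problems
end

sample = "BEGIN_TECH \"A\" 1\nEND_TECH\nBEGIN_TECH \"B\" 2\nBEGIN_TECH \"C\" 3\nEND_TECH\nEND_TECH\n"
problems = check_tech_tags(sample)
puts problems
```

To log instead of printing, `File.write("lint_log.txt", problems.join("\n"))` writes the list to a new text file (the file name is arbitrary).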

My head hurts.


Offline Hooman

Re: Project Idea: Tech File Lint Checker
« Reply #14 on: December 28, 2015, 12:55:41 AM »
As per our discussion on IRC about Ruby parsers, the Citrus library actually looked quite promising. Instead of using Treetop for parsing (which was an arbitrary library choice), we'll amend the plan to use Citrus. The Citrus library can be installed with the gem package manager using the command line:

Code:
gem install citrus

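To use this library, the Ruby code will need to include it using require. As the library is a Ruby gem, and not a core library, RubyGems (or another alternative gem package manager library) has to be loaded first so that require can find it. On Ruby 1.8 that means an explicit require of "rubygems"; Ruby 1.9 and later load RubyGems automatically, so the line is usually redundant there, though harmless.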

Side note: The rubygems library (or an alternative) will update the require method to also look in the gem package folder, rather than just the default bundled core library folder. Since there are alternative gem package managers, the require for rubygems is generally only present in the top level project file, and not sprinkled into every library file that uses a gem. Sprinkling require 'rubygems' throughout libraries would take away the choice of library users as to what package manager they want to use, since the library would then be causing rubygems to be loaded.

Code:
require 'rubygems'  # Sometimes omitted depending on context, or a different gem package manager
require 'citrus'

A grammar is then loaded from an external .citrus file, using the Citrus.load method.
Code:
Citrus.load("grammarFileName.citrus")

The Citrus.load method will create a new Ruby Module, with a few parser specific methods added to it. Module names typically start with an upper case letter. The name of the grammar module is specified in a .citrus file using the syntax:

Code:
grammar StuffGrammar
...
end

The module name (StuffGrammar, in the example above) can be used directly in the code following the call to Citrus.load, much like when you require a source code file. In fact, the .citrus file is Ruby code, which is passed to eval, and so any valid Ruby code can be used in a .citrus file, (including malicious code). It's a very common practice in Ruby to create a DSL (Domain Specific Language) which is actually Ruby code in disguise, but used in a context where additional supporting methods are available. Here our domain is defining a grammar for use in parsing, and so extra methods such as "grammar" and "rule" are made available.

Once the grammar has been loaded, it can be used to parse an input string using the parse method.

Code:
StuffGrammar.parse(inputText)

Near the end of the Citrus page, there is a section on Debugging that gives an example of catching and reporting errors. It closely follows the rough idea I outlined in a previous post. Their example is as follows:

Code:
def parse_some_stuff(stuff)
  match = StuffGrammar.parse(stuff)
rescue Citrus::ParseError => e
  raise ArgumentError, "Invalid stuff on line %d, offset %d!" %
    [e.line_number, e.line_offset]
end

Notice how the Citrus::ParseError class provides information such as line_number and line_offset. Couple that with outputting the actual text of the line where the parse error occurred (the line property), and the user should be able to pinpoint the source of parse errors quite quickly.

You can read the Citrus documentation for more details, but at this point you should be able to start playing around with the example Citrus grammar files.
« Last Edit: December 29, 2015, 04:35:29 AM by Hooman »

Offline dave_erald

Re: Project Idea: Tech File Lint Checker
« Reply #15 on: December 28, 2015, 01:21:48 AM »
Alright, after a brake job on the wife's car, this is priority. Get it done so I can move on and stop giving myself a headache.

I loaded Citrus the other day, and have read thru some of the literature. I'll read more and maybe write some.

Offline dave_erald

Re: Project Idea: Tech File Lint Checker
« Reply #16 on: December 30, 2015, 10:50:04 PM »
This is the next project I swear.


I didn't even see the Citrus API documentation links on any pages. Good Find.

Offline Hooman

Re: Project Idea: Tech File Lint Checker
« Reply #17 on: January 03, 2016, 06:17:31 AM »
I looked through the Outpost 2 code for parsing tech files today and wrote some info here:
Tech File Parser Analysis

It should be useful for building a parser and designing some lint checking rules.

I was reminded how simple the parser in Outpost 2 actually is. It appears to have been written by hand, without help from libraries. Always an option if you want to try it out yourself using simple programming primitives rather than a library. Tempting, since learning to use a parsing library involves working at it. Honestly though, I think I need to sit down with Citrus and provide some sample code for you to look at.

Offline dave_erald

Re: Project Idea: Tech File Lint Checker
« Reply #18 on: January 07, 2016, 12:58:00 PM »
Is the OP2 built in parser that simple? Would it be easier to copy what it does and then add error reporting to it? Maybe?

Offline Hooman

Re: Project Idea: Tech File Lint Checker
« Reply #19 on: January 07, 2016, 02:14:21 PM »
I'm quite tempted to go that route. It would be a good exercise for working with some language basics, doing string processing, and regular expression matching. That has its own value. Being able to use a parsing library to parse more complicated grammars can also be an asset. Maybe we can start with a hand-written parser, and consider using the library later on.

Alright then, perhaps you should learn about Regular Expressions. You can try Rubular, which is an online regular expression testing tool. Great for learning. You'll want to learn how to match comments, tokens, quoted strings, and numbers.

Example: /Jan(uary)?|Feb(ruary)?/
Matches: Jan, January, Feb, February

Example: /[0-9]+/
Matches: 0, 10, 155, 16777216
(and many others)

Example: /[a-zA-Z0-9_]+\s*/
Matches: "A", "cat", "snake_case", "area51"
(a word (letters, numbers, underscore), followed by optional space)

Some things to note about regular expressions:
? Matches 0 or 1 of the preceding item (an optional item)
+ Matches 1 or more of the preceding item (any number, but at least once)
* Matches 0 or more of the preceding item (an optional item, that can appear any number of times)
. Matches any single character
[] Denotes a character class, where ranges of character values can be specified
\s Represents whitespace (space, tab, newline, and other whitespace such as carriage return)
\d Represents a digit (same as [0-9])
\w Represents a word character (letters, numbers, underscore: [a-zA-Z0-9_])
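The examples above can be tried directly in irb with String#scan, which returns every non-overlapping match, and String#[] with a regex, which returns the first match. One tweak: the month example uses (?:...), a non-capturing group, because with plain (...) groups scan returns the group contents instead of the whole match.

```ruby
months    = "Jan January Feb February Mar".scan(/Jan(?:uary)?|Feb(?:ruary)?/)
numbers   = "0 10 155 16777216".scan(/[0-9]+/)
words     = "A cat snake_case area51".scan(/\w+/)   # \w includes digits and underscore
no_digits = "area51"[/[a-zA-Z_]+/]                  # digits are not in this class, so the match stops early
parts     = "CATEGORY\t 6\n".split(/\s+/)           # \s covers spaces, tabs and newlines
p months, numbers, words, no_digits, parts
```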

Offline Hooman

Re: Project Idea: Tech File Lint Checker
« Reply #20 on: January 12, 2016, 06:37:07 AM »
Here is a description of the implementation of the parsing code in Outpost 2. Excuse the rough translation from assembly code to not quite C++, not quite Ruby pseudo code. There are 3 main functions for transforming the text into meaningful data, and one bit of inlined code to skip over comments. Each function processes input data one byte at a time.

Code:
ReadInt:
EatNonNumberLoop:  // Eat whitespace and comments. Error if other non-digit characters are found
char = buffer[0] = input.next
break if (isDigit(char) || char == '-')  // Start of number
EatComment if (char == ';')  // Start of comment
ParseError if (isAlpha(char))  // Unexpected character
i = 1
ScanNumberLoop:  // Read over number to find where it ends
char = buffer[i] = input.next
break if !(isDigit(char) || char == '-')  // End of number (seems to allow embedded "-" signs)
i++
return buffer.to_int

ReadToken:
EatNonAlphaLoop:
char = buffer[0] = input.next
break if (isAlpha(char))  // Start of token
EatComment if (char == ';')  // Start of comment
i = 1
ScanTokenLoop:
char = buffer[i] = input.next
break if !(isAlpha(char) || char == '_')  // End of token
i++
buffer[i] = 0  // Null terminate string
return buffer

ReadString:
EatNonOpeningDoubleQuoteLoop:
char = input.next
break if (char == '"')  // Start of string
EatComment if (char == ';')  // Start of comment
i = 0
ReadStringLoop:
char = buffer[i] = input.next
break if (char == '"')  // End of string
continue if (char == '\t' || char == '\n')  // Skip over tabs and newlines without recording them
i++
buffer[i] = 0  // Null terminate string
return buffer


EatComment:
Loop:
char = input.next
break if (char == 10)  // End of line  (10 = Linefeed character)

You can do better than this using regular expressions in Ruby. A rough untested stab at needed regular expressions is:
Comment: /;.*/
Int: /-?[0-9]+/
Token: /[a-zA-Z_]+/
String: /"[^"]*"/

For the Comment regex, I assumed . won't match newlines, which it can depending on regular expression flags. For the String regex, I assumed newlines would be included within the match, which they might not be, depending on regular expression flags. Those will be points I'll leave you to google. Remember you can test out your regular expressions on Rubular.
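For reference, this is how those two flag questions behave in Ruby (worth confirming on Rubular too): by default . does not match a newline, while a negated character class like [^"] does. In Ruby the option that lets . match newlines is /m (many other regex engines call this the "dotall" or s flag).

```ruby
text = "COST 1400 ; a comment\nLAB 2\nTEASER \"first line\nsecond line\"\n"

comments = text.scan(/;.*/)      # . stops at the newline, so only the comment is matched
strings  = text.scan(/"[^"]*"/)  # [^"] also matches the embedded newline
greedy   = text.scan(/;.*/m)     # with /m, . matches newlines too and runs to the end of the text
p comments, strings, greedy
```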

Further, I will assign the task of testing regular expressions in Rubular. Paste an example section of a tech file into Rubular as the input text, and try the above regular expressions, and see if they match what you think they will match. Make sure to include an example where a quoted string contains an embedded newline.
« Last Edit: January 12, 2016, 06:44:07 AM by Hooman »

Offline dave_erald

Re: Project Idea: Tech File Lint Checker
« Reply #21 on: January 14, 2016, 12:58:49 PM »
I can't get Rubular to do more than one regular expression at a time (which is the way it's supposed to be?)

Anyways

In this test string

Code:
BEGIN_TECH "Cybernetic Teleoperation" 03401
    CATEGORY        4
    DESCRIPTION     "Structure Factories may now produce Robot Command Center and Vehicle Factory structure kits.  _______________________________________ Our research has resulted in a specialized variant of the Command Center, with dedicated computers and communications capabilities.  In addition, all vehicle designs now include the less expensive Noesis computer, utilizing elements of the Savant technology.  This transfers much of the computing burden from the Robot Command Center to the vehicle itself."
    TEASER          "Allows production of Robot Command Center and Vehicle Factory structure kits at the Structure Factory.  _______________________________________ Prior to the evacuation from our original colony site, Workers remotely operated our vehicles using a technology called Teleoperation.  Since the catastrophe, we no longer have enough Workers to Teleoperate our vehicles.  The Savant computers at the Command Center have taken on part of this burden, but the job is taxing their capacity.  We need a specialized computer vehicle control system.  This Cybernetic Teleoperation project should allows us to operate a much larger number of vehicles."
    EDEN_COST       800
    PLYMOUTH_COST   1000
    MAX_SCIENTISTS  10
    LAB             2
END_TECH

BEGIN_TECH "Emergency Response Systems" 03301
    CATEGORY        11
    DESCRIPTION     "Structure Factories may now produce DIRT structure kits.  _______________________________________ Disaster Instant Response Teams (DIRTs) can reduce damage to structures.  Once the DIRT structure has been deployed, DIRT members trained in emergency medical care and structural reinforcement will be on the scene in a matter of seconds."
    TEASER          "Allows production of DIRT structure kits at the Structure Factory.  _______________________________________ Given the new dangers confronting our colony, we need more protection against disaster than our emergency shelters are able to provide. This project will develop new methods, tools, and techniques to respond to structural damage."
    COST            1000
    MAX_SCIENTISTS  10
    LAB             2
END_TECH

This => /\d[0-9*]*/ matches all complete integers (i.e. 0, 45, 1007, 6 and so on) and, when I enclose the regex in parentheses, puts them in match groups 1 to 11 with no repeats (there are 11 different integer combos).

This => /[a-zA-Z]+/ matches all tokens as full words, including single-letter words like a and I, as 275 separate match groups. The exception is that BEGIN_TECH shows up separately as BEGIN and then TECH, likewise END, and so on (that's on me: I forgot to add the _ to the character class). Adding that _ also matches the long ____________________ divider line inside the descriptions.

Just inputting this => /;*/ matches all the whitespace (spoiler: there's a lot).

This => /"[^"]*"/ matches the entire string in one match group for anything enclosed in quotation marks, so tech descriptions, BEGIN_TECH header names, etc.

That's as far as I got, work time.

Work more on this later.
« Last Edit: January 15, 2016, 11:54:21 AM by dave_erald »

Offline Hooman

Re: Project Idea: Tech File Lint Checker
« Reply #22 on: January 15, 2016, 05:16:20 PM »
Nice to hear you've tried it.

Rubular only does one regular expression at a time. It's just a quick tool to let you see what a regular expression matches, which helps determine if you're using the right regular expression, or if it needs adjustment.

Note that \d = [0-9], so your first regex could have been: /\d+/
These are all equivalent: /\d+/ = /\d\d*/ = /\d[0-9]*/ = /[0-9][0-9]*/

If you put a "*" within the [], it will match a literal "*" character, rather than any number of the preceding character. Hence /[0-9*]*/ will match "01*23" as one match group.
If you put the "*" only after, /[0-9]*/, then there will be two non-empty match groups, "01" and "23" (along with some zero-length matches, since * also accepts zero digits).
If you omit the "*" completely /[0-9]/, then you'll have 4 match groups "0", "1", "2", "3".
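The cases above can be checked in irb with String#scan. Patterns that can match zero characters (anything ending in *) also produce empty matches wherever nothing else matched, including at the end of the string; switching to + avoids them.

```ruby
s = "01*23"
p s.scan(/[0-9*]*/)  # => ["01*23", ""]          "*" inside [] is a literal star
p s.scan(/[0-9]*/)   # => ["01", "", "23", ""]   zero-length matches show up too
p s.scan(/[0-9]/)    # => ["0", "1", "2", "3"]   one digit per match
p s.scan(/[0-9]+/)   # => ["01", "23"]           "+" needs at least one digit
```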

Nice that you caught your missing underscore. It does affect matching within quoted strings when used on its own, but quoted strings should be matched by a different regular expression.

You seem to have missed the dot "." in the /;.*/ comment regular expression. The dot matches any character. The output without the . looks funny, since you're then matching zero or more occurrences of ";". There are no ";" in your sample input string. Hence it matches all empty strings containing zero occurrences of ";". The highlighting shows the empty matches between all the characters. It doesn't actually match spaces, but rather the zero length gaps between characters.

You can try combining your regular expressions using or "|". Also include parentheses to see the match groups. This regex should match the 3 main types of info:
/(\d+)|([a-zA-Z_]+)|(\"[^"]*\")/

That should basically highlight everything except the whitespace. Try adding comments to the input text though. They will mess things up a bit with that regex. You can also try adding in a \s to match whitespace, which should then match everything, but you wouldn't really want to capture that in a match group.
/(\d+)|([a-zA-Z_]+)|(\"[^"]*\")|\s+/

Here's another regex that should also capture comments:
/(\d+)|([a-zA-Z_]+)|(\"[^"]*\")|(;.*)/
You can remove the parentheses around the comment part if you don't want them included in the match groups.
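Once the regex contains capture groups, Ruby's String#scan returns one array per match holding each group's text, with nil for the groups that did not take part. That makes it easy to tell which kind of item matched. A small sketch (the kind labels are just for illustration):

```ruby
re   = /(\d+)|([a-zA-Z_]+)|("[^"]*")|(;.*)/
text = "CATEGORY 6 ; civilian\nDESCRIPTION \"Text\"\n"

groups = text.scan(re)
p groups.first   # [nil, "CATEGORY", nil, nil]: only the token group took part

KINDS = [:int, :token, :string, :comment]   # one label per capture group, in order
items = groups.map do |captures|
  index = captures.index { |capture| !capture.nil? }   # which group matched
  [KINDS[index], captures[index]]
end
p items
```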
« Last Edit: January 15, 2016, 05:18:09 PM by Hooman »

Offline Hooman

Re: Project Idea: Tech File Lint Checker
« Reply #23 on: January 18, 2016, 02:19:44 AM »
Continuing on with using the regex, you can try something like the following:

Code:
str = 'BEGIN_TECH "Cybernetic Teleoperation" ... END_TECH'
re = /\d+|[a-zA-Z_]+|\"[^"]*\"|;.*/

# Create an array of matches:
matchArray = str.scan(re)  # => ["BEGIN_TECH", "\"Cybernetic Teleoperation\"", ..., "END_TECH"]

# Or

# Create an enumerator to process matches one at a time
matchEnum = str.enum_for(:scan, re)
matchEnum.next  # => "BEGIN_TECH"
matchEnum.next  # => "\"Cybernetic Teleoperation\""
...

You can iterate over the matches using .each. It works in either case.
Code:
matchArray.each do |match|
  # ...
end

matchEnum.each do |match|
  # ...
end

Mind you, the .each loop doesn't quite match how Outpost 2 handles processing. It uses a more complicated code structure that's more similar to a while loop with .next calls. They're validated calls though, so it would be more like nextToken, nextString, nextInt. It doesn't just grab the next match. It validates the next match is of the expected type.
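One way to get validated calls like nextToken, nextString and nextInt in Ruby is the standard library's StringScanner (require 'strscan'), which tries a regex at the current position and advances past the match. This is a sketch under assumptions: the `TechReader` class, its method names, and raising ArgumentError on a mismatch are illustrative choices, and comments are skipped from ; to the end of the line, as described for Outpost 2.

```ruby
require 'strscan'

# Reads tech file items one at a time, checking each is of the expected type.
class TechReader
  SKIP   = /(?:\s|;[^\n]*)+/   # whitespace and ; comments
  INT    = /-?\d+/
  TOKEN  = /[a-zA-Z_]+/
  STRING = /"([^"]*)"/

  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  def next_int
    expect(INT, "integer").to_i
  end

  def next_token
    expect(TOKEN, "token")
  end

  # Returns the string contents without the surrounding quotes.
  def next_string
    expect(STRING, "quoted string")
    @scanner[1]
  end

  def eos?
    @scanner.skip(SKIP)
    @scanner.eos?
  end

  private

  # Skip whitespace/comments, then require the next item to match +pattern+.
  def expect(pattern, description)
    @scanner.skip(SKIP)
    @scanner.scan(pattern) or
      raise ArgumentError, "expected #{description} at offset #{@scanner.pos}"
  end
end

reader = TechReader.new("BEGIN_TECH \"Hot-Cracking Column Efficiency\" 07202 ; note\n  COST 1400\nEND_TECH\n")
p reader.next_token, reader.next_string, reader.next_int   # "BEGIN_TECH", the name, 7202
```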

I can elaborate further, but I want to stop here for the moment. I'd also like to assign a small task. Rather than having one big regex, you can define multiple smaller regexes and then combine them with a Regexp method. You can check the Ruby docs for the Regexp class to find the needed method.
Code:
RegexInt = /\d+/
RegexToken = /[a-zA-Z_]+/
RegexString = /\"[^"]*\"/
RegexComment = /;.*/
RegexAll = Regexp.something(...) # *figure this out*
That would allow splitting the text into component parts using the combined expression, and then using the individual regular expressions while going through the data to validate each part is of the expected type.
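For checking an answer: one method that fits is Regexp.union, which joins the given patterns with |. A sketch of splitting with the combined pattern and then classifying each piece by testing the individual patterns against the whole piece (`kind_of_part` is a hypothetical helper name):

```ruby
RegexInt     = /\d+/
RegexToken   = /[a-zA-Z_]+/
RegexString  = /"[^"]*"/
RegexComment = /;.*/

# Regexp.union joins the patterns with "|" into a single Regexp.
RegexAll = Regexp.union(RegexInt, RegexToken, RegexString, RegexComment)

# Decide which kind a piece is by anchoring each pattern to the whole piece.
def kind_of_part(part)
  { int: RegexInt, token: RegexToken, string: RegexString, comment: RegexComment }.each do |kind, re|
    return kind if part =~ /\A#{re}\z/
  end
  :unknown
end

text  = "COST 1400 ; note\nTEASER \"Hi\"\n"
parts = text.scan(RegexAll)
kinds = parts.map { |part| kind_of_part(part) }
p parts   # ["COST", "1400", "; note", "TEASER", "\"Hi\""]
p kinds   # [:token, :int, :comment, :token, :string]
```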

Offline Hooman

Re: Project Idea: Tech File Lint Checker
« Reply #24 on: January 30, 2016, 10:37:49 PM »
As per the IRC discussion, I thought I'd add some notes. The immediate goal is to parse the tech data into an array of structs. What you might do is create a function to parse data for a single tech, and then work that into a loop to parse all techs into an array of structs.

You should look into the Ruby Struct class. You can define it using syntax such as:
Code:
Tech = Struct.new(
  :name,
  :id,
  :description,
  ...
)

You should start by defining all the needed fields. Play around with the struct to get a feel for using them.
Code:
tech = Tech.new
tech.id = 1007
tech.name = "Tech Name"
puts tech.id
puts tech[:id]

tech2 = Tech.new("Tech Name 2", 1008, ...)
puts tech2.name
puts tech2[:name]
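As a starting point, a Tech struct covering the fields seen in the example techs in this thread might look like the following. The field list is inferred from those samples and may be incomplete; storing the repeatable REQUIRES and UNIT_PROP fields as arrays, and keeping the ID as a string so its leading zero survives, are design assumptions.

```ruby
# Fields taken from the example techs in this thread (possibly incomplete).
Tech = Struct.new(
  :name, :id, :category, :description, :teaser, :improve_desc,
  :requires,       # array of required tech IDs (repeatable field)
  :cost, :eden_cost, :plymouth_cost,
  :max_scientists, :lab,
  :unit_props      # array of [unit, property, value] (repeatable field)
)

tech = Tech.new
tech.name = "Hot-Cracking Column Efficiency"
tech.id = "07202"                     # a string keeps the leading zero
tech.requires = ["03302", "05110"]
tech.unit_props = [["GORF", "Power_Required", 40]]
tech.cost = 1400

puts tech.name
puts tech[:id]
p tech.members.size    # number of fields
p tech.to_h[:cost]     # structs convert to a Hash
```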

You'll also want to be familiar with the Ruby String class.
In particular, know the "scan", "match", and "=~" methods. It's also good to have a general sense of what other methods are available.
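A quick comparison of the three on a line from the example tech:

```ruby
line = "UNIT_PROP SMELTER_ADV Power_Required 40"

# scan: every match, as an array
words = line.scan(/\w+/)

# match: the first match as a MatchData (or nil), with access to capture groups
m = line.match(/UNIT_PROP\s+(\w+)\s+(\w+)\s+(-?\d+)/)
unit, property, value = m[1], m[2], m[3].to_i

# =~: index of the first match (or nil), handy in if conditions
position = line =~ /Power/

p words, [unit, property, value], position
```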

For handling arrays and collections, it's good to read up on the Ruby Enumerable module. Many other objects include this module to provide a standard set of methods for working with collections.
In particular, try to be familiar with "each" (provided by any class that uses Enumerable), along with provided methods "map", and "inject". Try to get a sense of what other methods are offered in this module. You may or may not need this right away, but it will be useful for later.
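A small illustration of each, map and inject (plus select) on an array of structs, using the two example techs from this thread with only three fields kept for brevity:

```ruby
Tech = Struct.new(:name, :id, :cost)   # cut-down struct just for this example

techs = [
  Tech.new("Hot-Cracking Column Efficiency", "07202", 1400),
  Tech.new("Emergency Response Systems", "03301", 1000)
]

techs.each { |tech| puts "#{tech.id}: #{tech.name}" }        # visit each item
ids   = techs.map(&:id)                                        # transform each item
total = techs.inject(0) { |sum, tech| sum + tech.cost }        # fold into one value
cheap = techs.select { |tech| tech.cost < 1200 }.map(&:name)   # filter, then transform
p ids, total, cheap
```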