As per our discussion on IRC about Ruby parsers, the Citrus library looks quite promising. Since Treetop was an arbitrary choice of parsing library, we'll amend the plan to use Citrus instead. Citrus can be installed from the command line with the gem package manager:
gem install citrus
To use the library, your Ruby code needs to load it with require. Because Citrus is a Ruby gem and not a core library, you first need to load the core "rubygems" library (or an alternative gem package manager library).
Side note: The rubygems library (or an alternative) extends the require method to also search the gem package folder, rather than just the default bundled core library folder. Since there are alternative gem package managers, require 'rubygems' is generally only present in the top-level project file, and not sprinkled into every library file that uses a gem. Sprinkling require 'rubygems' throughout libraries would take away the library users' choice of package manager, since the library itself would then cause rubygems to be loaded.
require 'rubygems' # Sometimes omitted depending on context, or a different gem package manager
require 'citrus'
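As a rough sketch of the side note above, a library file can simply try to require what it needs and report a missing gem, leaving the choice of package manager to the top-level project file. The helper name library_available? is made up for this illustration and is not part of Citrus or rubygems.

```ruby
# Hypothetical helper (not part of Citrus or rubygems): try to load a
# library by name and report whether the require succeeded.
def library_available?(name)
  require name
  true
rescue LoadError
  # Raised when no file matching the name can be found on the load path
  # (or through whichever gem package manager the top-level file set up).
  false
end

if library_available?('citrus')
  puts "Citrus is loaded"
else
  warn "Citrus not found; install the gem or check your package manager setup"
end
```

Note that the helper never requires 'rubygems' itself, so it works the same way regardless of which package manager made the gem findable.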
A grammar is then loaded from an external .citrus file using the Citrus.load method.
Citrus.load("grammarFileName.citrus")
The Citrus.load method creates a new Ruby Module with a few parser-specific methods added to it. As with any Ruby module, the name is a constant, so it starts with an upper case letter. The name of the grammar module is specified in the .citrus file using the syntax:
grammar StuffGrammar
...
end
The module name (StuffGrammar in the example above) can be used directly in the code following the call to Citrus.load, much like when you require a source code file. In fact, the .citrus file is Ruby code, which is passed to eval, and so any valid Ruby code can be used in a .citrus file (including malicious code). It's a very common practice in Ruby to create a DSL (Domain Specific Language) which is actually Ruby code in disguise, but used in a context where additional supporting methods are available. Here our domain is defining a grammar for use in parsing, and so extra methods such as "grammar" and "rule" are made available.
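To illustrate the general "Ruby code in disguise" idea, here is a minimal sketch of a toy DSL that makes "grammar" and "rule" methods available to evaluated code. This is not how Citrus is implemented, and it does not use Citrus's actual grammar syntax; the GrammarBuilder class and the block-based rule format are assumptions invented for this example.

```ruby
# Toy DSL for illustration only. GrammarBuilder and its methods are
# hypothetical and unrelated to Citrus's real internals.
class GrammarBuilder
  attr_reader :name, :rules

  def initialize(name)
    @name = name
    @rules = {}
  end

  # Available inside a grammar block: stores the block's result under the rule name.
  def rule(rule_name, &body)
    @rules[rule_name] = body.call
  end
end

# Entry point of the DSL: runs the block with a GrammarBuilder as self,
# so bare calls to "rule" inside the block reach the builder.
def grammar(name, &block)
  builder = GrammarBuilder.new(name)
  builder.instance_eval(&block)
  builder
end

# Text as it might be read from an external file. It is plain Ruby,
# so eval will run anything in it, which is why untrusted files are risky.
dsl_source = <<~'RUBY'
  grammar :StuffGrammar do
    rule(:number) { /\d+/ }
    rule(:word)   { /[a-z]+/ }
  end
RUBY

stuff = eval(dsl_source)
puts stuff.name
puts stuff.rules.keys.inspect
```

The key trick is instance_eval, which changes what self is while the block runs. That is what lets the DSL text call "rule" without naming a receiver.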
Once the grammar has been loaded, it can be used to parse an input string using the parse method.
StuffGrammar.parse(inputText)
Near the end of the Citrus page, there is a section on Debugging that gives an example of catching and reporting errors. It closely follows the rough idea I outlined in a previous post. Their example is as follows:
def parse_some_stuff(stuff)
  match = StuffGrammar.parse(stuff)
rescue Citrus::ParseError => e
  raise ArgumentError, "Invalid stuff on line %d, offset %d!" %
    [e.line_number, e.line_offset]
end
Notice how the Citrus::ParseError class provides information such as line_number and line_offset. Couple that with the actual text of the line where the parse error occurred (the line property), and the user should be able to pinpoint the source of parse errors quite quickly.
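As a sketch of what such a report could look like, the following formats the line number, offset, and offending line into a message with a caret under the error position. To keep it runnable without Citrus installed, it uses a hypothetical FakeParseError struct carrying the same three fields mentioned above, and it assumes line_offset counts characters from zero within the line.

```ruby
# Hypothetical stand-in for the three fields discussed above, so the
# sketch runs without Citrus. A real Citrus::ParseError would be used instead.
FakeParseError = Struct.new(:line_number, :line_offset, :line)

# Builds a three-line report: a summary, the offending line, and a caret.
# Assumes line_offset is a zero-based character position within the line.
def format_parse_error(error)
  pointer = " " * error.line_offset + "^"
  [
    "Parse error on line #{error.line_number}, offset #{error.line_offset}:",
    error.line.chomp,
    pointer
  ].join("\n")
end

err = FakeParseError.new(3, 4, "foo bar\n")
puts format_parse_error(err)
```

In real code, the same formatting method would be called from the rescue Citrus::ParseError branch, with the message passed along or printed for the user.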
You can read the Citrus documentation for more details, but at this point you should be able to start playing around with the example Citrus grammar files.