What the heck is a parser-combinator?
Background and Basics
In a recent project engagement, we were tasked with migrating a COBOL-based mainframe application to a new environment. The project follows a re-hosting approach, which we agreed on with the customer mainly for reasons of time and cost.
Of course, the scope and effort for such a transformation are huge, and many tasks are involved in getting the complete application to run on a completely different platform. Some of the activities include:
- migrating the database (flat files, network database, etc.) and data to a common data store in Microsoft SQL Server
- re-platforming the COBOL source code to commodity hardware running a virtualized environment using Micro Focus
- adjusting the DB access layer to reflect changes
- redesigning the batch processing and job control
The list above covers just a few of the tasks involved. In this post, I want to focus on the JCL (job control language) portion and how we managed to transform these scripts for the new environment. Our idea was to keep the scripting style, and we chose Microsoft PowerShell as the target platform.
With time and cost in mind, you typically want to reduce the effort as much as you can. One way to achieve this is automation, which minimizes the manual work. I came up with the idea of automatically translating the JCL statements into corresponding (or at least similar) MS PowerShell commands. I started with a spreadsheet listing the majority of the JCL statements and tried to understand what each one is used for. I then looked for patterns in these statements and a meaningful way of implementing them on the future platform.
Scala and FastParse
Once this was achieved, I started to think about an easy and flexible way to translate JCL into PowerShell. The idea was born to use parsers to understand the JCL scripts. A colleague presented a very interesting project by Li Haoyi named FastParse. “FastParse is a parser-combinator library for Scala that lets you quickly and easily write recursive descent text- and binary data parsers in Scala.” Li also wrote a great post on parsing with parser-combinators. In short, a parser-combinator library lets you build a parser by combining small parsing functions into larger ones, which is a much more powerful way of parsing structured text into data structures. It has great advantages over simple regular expressions since it not only finds such patterns, but also returns the matched content in a custom structure. Wikipedia has a lot of additional information, if you are interested.
As a matter of fact, it is simple to use this library to create a Scala script that reads in a JCL file and converts it into a PowerShell script with a high degree of automatic translation. To give you an example, here is a sample line of JCL code:
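To make the idea of "combining small parsers" concrete, here is a minimal hand-rolled sketch in plain Scala. It is not FastParse and not how FastParse is implemented; the names Parser, MiniParsers, lit, takeWhile1, rest and parseNote are made up for illustration. It only shows the principle: each parser is a small function, and operators such as ~ (sequence) and | (alternative) glue them together.

```scala
// A parser takes the input and a position and either fails (None)
// or returns a value together with the next position.
final case class Parser[A](run: (String, Int) => Option[(A, Int)]) {
  // Sequence: run this parser, then `next`, and keep both results.
  def ~[B](next: Parser[B]): Parser[(A, B)] = Parser { (in, i) =>
    run(in, i).flatMap { case (a, j) =>
      next.run(in, j).map { case (b, k) => ((a, b), k) }
    }
  }
  // Alternative: try this parser, fall back to `other` if it fails.
  def |(other: Parser[A]): Parser[A] = Parser { (in, i) =>
    run(in, i).orElse(other.run(in, i))
  }
  // Transform the result without consuming more input.
  def map[B](f: A => B): Parser[B] = Parser { (in, i) =>
    run(in, i).map { case (a, j) => (f(a), j) }
  }
}

object MiniParsers {
  // Match a literal string.
  def lit(s: String): Parser[String] = Parser { (in, i) =>
    if (in.startsWith(s, i)) Some((s, i + s.length)) else None
  }
  // Match one or more characters satisfying `p` and return them.
  def takeWhile1(p: Char => Boolean): Parser[String] = Parser { (in, i) =>
    var j = i
    while (j < in.length && p(in.charAt(j))) j += 1
    if (j > i) Some((in.substring(i, j), j)) else None
  }
  // Match whatever is left of the input (possibly nothing).
  val rest: Parser[String] = Parser { (in, i) => Some((in.substring(i), in.length)) }

  val ws: Parser[String] = takeWhile1(_ == ' ')
  val note: Parser[String] = lit("NOTE") | lit("note")

  // "$", blanks, NOTE, blanks, then the comment text (the only part we keep).
  val noteLine: Parser[String] =
    (lit("$") ~ ws ~ note ~ ws ~ rest).map { case (_, text) => text }

  def parseNote(line: String): Option[String] = noteLine.run(line, 0).map(_._1)

  def main(args: Array[String]): Unit =
    // prints Some(some notes and comments written here)
    println(parseNote("$ NOTE some notes and comments written here"))
}
```

A library like FastParse gives you the same building blocks ready-made, plus error reporting and performance work that this sketch leaves out.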
$ NOTE some notes and comments written here
A typical JCL line always starts with a ‘$’ sign in the first column. The command name occupies columns 8 to 15, and the content of the command starts in column 16; in this case that is the actual comment or note.
The steps to identify and process this line of code would look like this:
- read in a line of code from the JCL
- use a regular expression to find a suitable pattern
- if a notes section is identified, parse this line of code using the corresponding FastParse clause
- convert it into an equivalent MS PowerShell command
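The steps above can be sketched in plain Scala. This is a minimal sketch and not the project's actual script: the object and method names are made up for illustration, and a regular expression with capture groups stands in for the FastParse clause so that the example runs without any library. Lines that are not handled are kept as comments, which is an assumption borrowed from the C# version later in this post.

```scala
import scala.util.matching.Regex

object JclToPowerShell {
  // "$", blanks, a command word, blanks, then the rest of the line.
  private val JclLine: Regex = """^\$\s+(\S+)\s*(.*)$""".r

  // Translate one JCL line into one PowerShell line.
  def translate(line: String): String = line match {
    // A NOTE becomes a PowerShell comment.
    case JclLine(cmd, content) if cmd.equalsIgnoreCase("NOTE") => s"\t#$content"
    // Anything not handled yet is kept as a comment for later review.
    case other => s"\t#$other"
  }

  // Translate a whole script, line by line.
  def translateAll(lines: Seq[String]): Seq[String] = lines.map(translate)

  def main(args: Array[String]): Unit =
    translateAll(Seq(
      "$ NOTE some notes and comments written here",
      "$ RUN RUFILE=folder1/folder2/folder3/folder4/filename,"
    )).foreach(println)
}
```

Adding support for another JCL command then means adding one more case with its own clause.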
The FastParse clause for a NOTE line looks like this:
val note = P("note" | "NOTE")
val noteClause = P(Start ~ dollar ~ ws ~ note ~ ws ~ identifier.! ~ AnyChar.rep ~ End)
Some syntax highlights here:
- Start and End anchor the clause to the beginning and the end of the line
- ws is simply one or more whitespace characters
- note is specified in another clause since it can be upper or lower case
- identifier.! looks for the actual content; the exclamation mark is FastParse's capture operator, so the matched text is returned in the parse result and we can make use of it later
- AnyChar.rep matches any character, zero or more times
This clause can then be executed like this:
noteClause.parse(line) match {
  // on success, write the captured note as a PowerShell comment
  case Success(noteStmt, index) => println("\t#%s".format(noteStmt))
  // a failed parse is skipped here; binding it with `val Success(...) =` would throw a MatchError
  case _ =>
}
The first line executes the FastParse clause and, in the case of success, returns the actual notes in the noteStmt variable. This is then printed into a new file (the PowerShell file that is created earlier in the script) with a simple println statement. This shows the real power of this tool. It not only parses the content, but also returns one or many structured values (lists included) in a very easy and clever way.
Here is another example:
$ RUN RUFILE=folder1/folder2/folder3/folder4/filename,
val runClause1 = P(Start ~ dollar ~ ws ~ run ~ ws ~ rufile ~ equal ~ (identifier.!).rep(sep = "/") ~ comma ~ AnyChar.rep ~ End)
runClause1.parse(line) match {
  case Success(runOptions, index) =>
    for (runOpt <- runOptions) {
      // each folder and filename is available in 'runOpt' here
    }
  case _ => // not a RUN line with a RUFILE option
}
The first line shows the original JCL command, a RUN command that allows the script to execute an external file, typically a COBOL program. The RUFILE option tells the command where the COBOL program is located (folders and filename). As you can see in the FastParse clause, the folder and filename structure is simply a repeated identifier with “/” as the separator. When this clause is executed, all the folders and the filename are stored in the runOptions sequence, which you can then simply loop through with runOpt.
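For reference, here is a self-contained sketch of the RUN clause. It assumes FastParse 1.x on the classpath (the `import fastparse.all._` style that matches the syntax used in this post). The helper clauses dollar, ws, run, rufile, equal, comma and identifier are not shown in this post, so the definitions below are assumptions, and RunClauseSketch and pathSegments are hypothetical names.

```scala
// Assumes FastParse 1.x, e.g. "com.lihaoyi" %% "fastparse" % "1.0.0" in sbt.
import fastparse.all._

object RunClauseSketch {
  // Assumed helper clauses; the real definitions may differ.
  val dollar     = P("$")
  val ws         = P(" ".rep(1))                 // one or more blanks
  val run        = P("RUN" | "run")
  val rufile     = P("RUFILE" | "rufile")
  val equal      = P("=")
  val comma      = P(",")
  // Letters and digits only, which is enough for the sample line.
  val identifier = P(CharIn('a' to 'z', 'A' to 'Z', '0' to '9').rep(1))

  // Same shape as runClause1 above: capture every path segment between the slashes.
  val runClause1 = P(Start ~ dollar ~ ws ~ run ~ ws ~ rufile ~ equal ~
    identifier.!.rep(sep = "/") ~ comma ~ AnyChar.rep ~ End)

  // Returns the folders and the file name, or None if the line does not match.
  def pathSegments(line: String): Option[Seq[String]] =
    runClause1.parse(line) match {
      case Parsed.Success(segments, _) => Some(segments)
      case _                           => None
    }
}
```

Matching on the parse result instead of destructuring it directly keeps the script running when a line does not fit the clause.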
Parser-Combinator using C# .NET
The FastParse library for Scala mentioned above is great, and I love its simple yet powerful way of creating such a parser-combinator environment. However, my key background and passion is in the Microsoft development stack, and I always wondered whether there is something similar for C# and the .NET framework. After a little bit of research and investigation, I found a couple of very interesting tools and libraries:
- Building an External DSL in C#: a very interesting blog about creating an external DSL (Domain Specific Language); the blog seems pretty outdated, however, the source repository on Git (Tiny C# Monadic Parser Framework) is quite up to date
- Real World Haskell - Using Parsec: this focuses on a Haskell approach, but there are also some quite interesting implementations for .NET
- Library of monads for C#: A C# library of monads and a full set of parser-combinators based on the Haskell Parsec library
Out of the tools and libraries listed above, I found the first to be the most interesting for the work I wanted to accomplish. I re-used a lot of the “Sprache” implementation and then built the JCL to PowerShell logic on top of it. So, what does that look like?
Here are the steps I took to implement my custom logic:
- I created a new Console App (just for demo purposes; you could use any other project type)
- I re-used library functionality such as Parse, Parser and Result
- I added a new class, called from the main program, that reads in a source file
- each line of code from the source is then parsed and processed
The approach is pretty similar to the one above with Scala and FastParse. However, the clause implementation is a bit different. The C# code of a sample transformation looks like this:
$ GLOBAL envment=(dev)
public static readonly Parser<JCLCommand> JCLText =
    from open in Parse.Char('$')
    from ws1 in Parse.Char(' ').Many()
    from command in Parse.CharExcept(' ').Many().Text()
    from ws2 in Parse.Char(' ').Many()
    from content in Parse.CharExcept('"').Many().Text()
    select new JCLCommand(command, content);

public static readonly Parser<JCLCommand> GlobalText =
    from variablename in Parse.CharExcept('=').Many().Text()
    from ws2 in Parse.Char('=')
    from openbrack in Parse.Char('(')
    from filepath in Parse.CharExcept(')').Many().Text()
    from closebrack in Parse.Char(')')
    select new JCLCommand(variablename, filepath);
The first line is again a sample JCL statement. It defines a global variable named envment and assigns the value dev to it. I use the JCLText clause first to determine the JCL command being used (along with any options and parameters specified). The C# code to call this clause is quite simple (yet again powerful):
var p = JCLText.TryParse(line);
if (p.WasSuccessful)
{
    …
}
The TryParse method takes a line of code and tries to parse it. If that is successful, I know I have a valid JCL statement and can process it further, or ignore it and just comment out this line in PowerShell. In the positive case, I use the GlobalText clause to parse the option section of the GLOBAL command. As you can see, the definition is again quite straightforward. The code looks for a variable name, an “=” sign and then a content section enclosed in parentheses (in the example above, I am typically looking for a file path that is specified in the global variable). The source to implement this logic is as simple as this:
var globalval = GlobalText.TryParse(p.Value.Content);
if (globalval.WasSuccessful)
{
    sb.Append(string.Format("\t${0} = \"{1}\"", globalval.Value.Command, globalval.Value.Content) + "\n");
    globals.Add(globalval.Value.Command.ToLower());
}
else
{
    sb.Append("\t#" + line + "\n");
}
I use a StringBuilder to build the new PowerShell script, which is finally written to a new file. If the parse succeeds, I append the PowerShell logic to the StringBuilder (sb). If not, I comment out the line and still write it to the target file (I regularly do that for tracking purposes, so that a future-state SME can validate what’s happening in the old and new world). The “globals.Add(…)” call simply adds the name of the global variable to a HashSet so that I can easily reference this global variable in subsequent sections of the same PowerShell script.
Conclusions
Leveraging such parser-combinators is extremely powerful. It reduces the amount of code needed to look for expressions and process their content. I can only recommend these tools and libraries since they are typically quite easy to use and add a lot of value.
It was very interesting to see how similar the approach is across these programming languages and how low the learning curve is. The FastParse syntax for defining a clause looks cleaner to me and is easier to read.
However, the best part of all of this is that the parser-combinator approach, which I really appreciated in the actual project, is also available in my favorite development stack, .NET.
Happy parsing and let me know if you have any questions.