The Wolf Bytes: MGrammar

Tuesday, October 13, 2009

Oslo - MGraph2POCO

When working with Oslo, one of my biggest wants was the ability to deserialize MGraph to POCO (Plain Old CLR Objects). It didn’t look like this was going to happen anytime soon, so I decided to just go ahead and write a deserializer myself. It actually turned out fairly well, and I’m sure other people might find some use in it, so I decided to put it on CodePlex.

http://mgraph2poco.codeplex.com/

Enjoy!

-ec

Thursday, June 25, 2009

Parsing MGraph

OK, now you have created your MGrammar and have a project to turn your user’s input from text into MGraph. Now what? You have a few options. For my little toy project I want to take the user’s input and turn it into a graph of POCO objects that I can do something with. When defining your grammar you have the opportunity to define the projections, that is, how you want your MGraph to look. You can find a great article on MGraph at MSDN.

Intellipad is great when defining, testing and building the projections for your grammar. Here is the relevant portion of the MGraph generated by the About My Pets grammar:

Sample MGraph Output Fragment, this would be considered one record within MGraph.

{
    PetInfo => {
      Name => "Razor",
      AnimalType => "Terrier",
      Age => "9",
      Sex => "female",
      PrimaryColor => "white",
      SecondaryColor => "brown"
    },
    Activities => {
      {
        Type => "jump",
        Minutes => null,
        Seconds => "12"
      },
      {
        Type => "bark",
        Minutes => "2",
        Seconds => null
      }
    },
    Meals => {
      {
        Cups => "1",
        MealTime => "breakfast"
      },
      {
        Cups => "2",
        MealTime => "dinner"
      }
    }
}

This is nice to review but what can we do with it?

First off, let’s generate some XAML from our input. I found this was a good way to understand what was going on under the hood, as well as some of the terminology.

Configuration to run M Command Line Tools

First off, we need to configure a command prompt to make it a little easier to work with the Oslo command line tools. To do this, create a simple .CMD file somewhere and add the following text:

@set PATH=%PATH%;%programfiles%\Microsoft Oslo\1.0\bin

Then create a shortcut to that file with the following information:

Click on that shortcut to open the command prompt. You should now have the Oslo tools in your path. To test this, simply type “m”

and you should see something like this:

Compiling our Grammar

Now back to our regularly scheduled post. First we need to compile our grammar, so change the directory to where your file was saved and type:

C:\MyDirectory> m [MyGrammarFile].mg

When you do so, you should see the following:

This creates a compiled version of your grammar with an .mx extension; confirm this by looking at the contents of the directory.

Generate some XAML for your User Input

Now we are ready to do something with our input, so save a sample of user input into a text file named something like UserInput.txt.

The sample I saved is similar to:

About my pets

Razor is a Terrier that is a
9 year old female, and her color is white.
She will jump for 15 seconds and bark for 2 minutes.
She eats 1 cup of food for breakfast and
eats 2 cups of food for dinner.

Rocket is a Rat Terrier that is a 6 year old male,
and his color is black. He will whine for 3 minutes.
He eats 2 cups of food for dinner

Next run the following:

C:\MyDirectory> mgx /r:[MyGrammarFile].mx UserInput.txt /t:xaml

This will generate a XAML file from your sample user input or let you know any errors that may have occurred.

From my sample grammar, here is a portion of the generated XAML.

Getting at this data in .NET

This is nice, and may be a little easier to parse, but what I really want to do is deserialize MGraph into a tree of POCO (plain old CLR objects) so I can do some “stuff” with the data. I built a simple method to load the compiled grammar, and build an in-memory representation of the parsed user input with an instance of System.Dataflow.GraphBuilder from the Oslo runtime library.

Before you get started, you need to add references to your project:

using Microsoft.M;
using System.Dataflow;

Both can be found in the <Program Files>\Microsoft Oslo\1.0\bin directory

Create our Parser

First we need a class that knows how to format parsing errors. It will inherit from System.Dataflow.ErrorReporter.

using System;
using System.Dataflow;
using System.Linq;

namespace PetParser
{
    class ParserErrorReporter : ErrorReporter
    {
        protected override void OnError(ErrorInformation errorInformation)
        {
            string msg = string.Format(errorInformation.Message, errorInformation.Arguments.ToArray());

            throw new FormatException(
                string.Format("Syntax error at [{0}, {1}]: {2}",
                errorInformation.Location.Span.Start.Line,
                errorInformation.Location.Span.Start.Column,
                msg));
        }
    }
}

Parse the User Input

Next we’ll create a simple method to load our compiled grammar, build a parser instance, perform the parse, and hand back the MGraph representation as an object graph of System.Dataflow.Node instances.

public void ParseGrammar(string grammar)
{
    //Load the grammar that was compiled into the assembly as a resource.
    using (var img = MImage.LoadFromResource(System.Reflection.Assembly.GetExecutingAssembly(), "PetParser.mx"))
    {
        //Load up the specific MGrammar file.
        var factory = img.ParserFactories["Pets.MyPets"];

        //Create our parser.
        var parser = factory.Create();

        //Create an instance of NodeGraphBuilder to let the
        //parser know what type of graph to build.
        parser.GraphBuilder = new NodeGraphBuilder();

        try
        {
            var grammarTextStream = new StringTextStream(grammar);
            //Attempt to parse the grammar, any errors will be
            //formatted via the ParserErrorReporter and handed
            //via Try/Catch
            var root = (Node)parser.Parse(grammarTextStream, new ParserErrorReporter());

            //Start walking at the root node, at level 0.
            WalkTree(root, 0);

        }
        catch (FormatException exc)
        {
            //If we get an error parsing the grammar, just write it (for now)
            LogIt(0, exc.Message);
        }
    }
}

Walk the System.Dataflow.Node tree

void WalkTree(Node node, int level)
{
    //Nothing to walk if an edge has no target node.
    if (node == null)
        return;

    foreach (Edge recordEdge in node.Edges)
    {
        var value = string.Empty;
        if (recordEdge.Node != null && recordEdge.Node.AtomicValue != null)
            value = string.Format(@"Value: ""{0}""", recordEdge.Node.AtomicValue.ToString());

        LogIt(level, "{0}. [{1}] Brand Text: \"{2}\" Label Text: \"{3}\" {4}",
            level, node.NodeKind, node.Brand.Text, recordEdge.Label.Text, value);

        //Recurse into the child node, one level deeper.
        WalkTree(recordEdge.Node, level + 1);
    }
}
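For readers who want to try the recursion without the Oslo assemblies installed, here is a minimal sketch of the same depth-first walk. SketchNode and TreeWalkSketch are hypothetical stand-ins invented for this example; they are not part of System.Dataflow, and the real Node type exposes edges, brands and labels rather than a simple child list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for System.Dataflow.Node: a label, an optional
// atomic value, and child nodes. It is NOT the Oslo type.
public sealed class SketchNode
{
    public string Label { get; private set; }
    public string Value { get; private set; }
    public List<SketchNode> Children { get; private set; }

    public SketchNode(string label, string value = null)
    {
        Label = label;
        Value = value;
        Children = new List<SketchNode>();
    }

    // Adds a child and returns this node so calls can be chained.
    public SketchNode Add(SketchNode child)
    {
        Children.Add(child);
        return this;
    }
}

public static class TreeWalkSketch
{
    // Depth-first walk that returns one indented line per node.
    public static List<string> Walk(SketchNode node, int level = 0, List<string> lines = null)
    {
        if (lines == null) lines = new List<string>();
        if (node == null) return lines;

        string indent = new string(' ', level * 2);
        string value = node.Value == null ? "" : " = \"" + node.Value + "\"";
        lines.Add(indent + node.Label + value);

        // Children are visited one level deeper than their parent.
        foreach (SketchNode child in node.Children)
            Walk(child, level + 1, lines);

        return lines;
    }
}
```

Passing the level as a parameter, rather than incrementing and decrementing a shared counter, keeps each recursive call independent.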

Output

Now when we execute this we get the following output

The following table is from the MSDN Page “MGraph Object Model”

Now that we are walking the tree, we just need to create instances of and populate our simple .NET types. This could be done with some sort of state machine or an implementation of the visitor pattern.
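One way to picture the populate step is a small reflection-based mapper. This is only a sketch under stated assumptions: the PetInfo class and the PocoMapperSketch helper are hypothetical names made up for illustration, and the label/value pairs stand in for what you would collect from the edges of a record node while walking the tree. It is not how MGraph2POCO or the Oslo runtime works.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Plain CLR type shaped like the PetInfo record in the MGraph fragment.
public class PetInfo
{
    public string Name { get; set; }
    public string AnimalType { get; set; }
    public string Age { get; set; }
    public string Sex { get; set; }
}

public static class PocoMapperSketch
{
    // Copies each label/value pair onto the public, writable string property
    // with the same name. Labels without a matching property are ignored.
    public static T Populate<T>(IEnumerable<KeyValuePair<string, string>> fields)
        where T : class, new()
    {
        T target = new T();
        foreach (KeyValuePair<string, string> field in fields)
        {
            PropertyInfo property = typeof(T).GetProperty(field.Key);
            if (property != null && property.CanWrite && property.PropertyType == typeof(string))
                property.SetValue(target, field.Value, null);
        }
        return target;
    }
}
```

A call such as PocoMapperSketch.Populate<PetInfo>(fields) would return a PetInfo with Name, AnimalType, Age and Sex filled in from the matching labels. A state machine or visitor would give you more control over nested collections such as Activities and Meals.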

I wouldn’t be very surprised if upcoming releases of Oslo include built-in mechanisms to create some sort of .NET code/assemblies and provide automatic serialization/deserialization using MSchema, but for now this isn’t too bad.

Next task is to build a front end to collect user input. I’m thinking along the lines of an Azure-hosted Silverlight application. I’m thinking the grammar will be passed to the server via WCF and all the actual parsing will happen there.

-ec

Embedding MGrammar Files Within Your Project

Intellipad is an awesome tool, but if you really want to do some work with Oslo you are probably going to need to write a little code. This implies that you will need to get your MGrammar definition files into your Visual Studio project.

Once you have created and tested your MGrammar definition files within Intellipad, save them with the extension .MG and include them within your Visual Studio.NET project. Next you will need to do a bit of hand-coding on your project file; the easiest way to do this is just to click on “Unload” and then “Edit”.

Once you do that, you will need to modify your project file with the following changes:

and

and then finally change the action for your .MG file

Here are the actual lines for your cut-and-paste enjoyment:

<MgTarget>Mgx</MgTarget>
<MgTarget>MgxResource</MgTarget>
<MgIncludeReferences>false</MgIncludeReferences>

<MGrammarLibraryPath Condition="'$(MgCompilerPath)'!=''">$(MgCompilerPath)</MGrammarLibraryPath>
<MGrammarLibraryPath Condition="'$(MGrammarLibraryPath)'==''">$(ProgramFilesx86)\Microsoft Oslo\1.0\bin</MGrammarLibraryPath>

Once you’ve done this, you will need to reload your project.

At this point when you compile, your MGrammar should get compiled and included in your project as a resource. You can verify this by using Reflector.
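If you would rather check from code than open Reflector, the standard Assembly.GetManifestResourceNames() call lists everything embedded in an assembly. The sketch below assumes the compiled grammar shows up as a resource whose name ends in ".mx" (as in "PetParser.mx" elsewhere in these posts); EmbeddedGrammarCheck is a hypothetical helper name, not something from the Oslo SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class EmbeddedGrammarCheck
{
    // Returns the names of embedded resources that end in ".mx".
    // An empty list means no compiled grammar was embedded in the assembly.
    public static List<string> FindCompiledGrammars(Assembly assembly)
    {
        var found = new List<string>();
        foreach (string name in assembly.GetManifestResourceNames())
        {
            if (name.EndsWith(".mx", StringComparison.OrdinalIgnoreCase))
                found.Add(name);
        }
        return found;
    }
}
```

Calling EmbeddedGrammarCheck.FindCompiledGrammars(Assembly.GetExecutingAssembly()) from your project should return the resource name you later pass to MImage.LoadFromResource.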

Now to actually do something with this in your application you need to load the resource. The following chunk of code will do that for you:

DynamicParser parser = null;

using (var img = MImage.LoadFromResource(System.Reflection.Assembly.GetExecutingAssembly(), "AboutMyPets.Web.mx"))
{
    var factory = img.ParserFactories["Pets.MyPets"];
    parser = factory.Create();
}

Also, as a side note, you will need to add the following references to your project.

using Microsoft.M;
using System.Dataflow;

Both can be found in the <Program Files>\Microsoft Oslo\1.0\bin directory

Now that we have our grammar loaded from a resource into M’s DynamicParser instance we can pass in some text and hopefully get something meaningful out of it. Stay tuned for an upcoming post…

-ec

Tuesday, May 26, 2009

My First Oslo MGrammar

Ever since working with FORTH, I’ve been a big fan of languages and understanding how they work. FORTH did a great job of extending FORTH using FORTH, very cool. Last fall I was introduced to Oslo, and was very impressed. At the time, it looked like a pre-bleeding edge CTP, so I didn’t spend too much time with it. Today (May 26, 2009) Microsoft released the May CTP of Oslo. Although I’m sure the standards aren’t 100% baked, if you are interested, now is the time to start playing with it. I have good intentions of publishing a better description of how this grammar was built, but with the limited time I have, I wanted to at least get my first grammar on my site. There are three components you need to know about when using Intellipad: the first is where you develop the grammar, the second is where you enter your sample input text, and the third is the output. The output is in the form of MGraph, which you can use in your application.

Intellipad (better detail on the content below)

Here’s my very first MGrammar

module Pets
{
    @{CaseSensitive[false]}
    language MyPets
    {
        syntax Main = i:InitialStatement
          additional:(ThePet)+
          friends:(Friends)+
          foes:(Foes)+
          => MyPets {
                     valuesof(additional),
                     Friends [valuesof(friends)],
                     Foes [valuesof(foes)]
                     } ;
        syntax ThePet = t:Pet w:Activity+ h:HowMuchTheyEat+
            => {t, Activities [valuesof(w)], Meals [valuesof(h)]} ;
        syntax Friends = f:PetsFriends => {valuesof(f)};
        syntax Foes = f:PetsFoes => {valuesof(f)};
        token Upper = 'A'..'Z';
        token Lower = 'a'..'z';
        token Digit = '0'..'9';
        token Name = Upper Lower*;
        token Time = '0'..'9' Digit?;
        token Age = '0'..'9' Digit?;
        token Cups = '0'..'9' Digit?;
        token Type = Upper Lower*;
        syntax Action = 'run' | 'jump' | 'play' | 'bark' | 'fetch'
                              | 'hunt' | 'whine';
        syntax Sex = 'male' | 'female';
        syntax Color = 'white' | 'black' | 'brown';
        syntax MealTime = 'breakfast' | 'lunch' | 'dinner';
        syntax AnimalTypes = 'Rat Terrier' | 'Terrier' | 'dog' | 'cat'
                              | 'fish' | 'bird' | 'pig' | 'Poodle';
        syntax Pronoun = "he" | "He" | "she" | "She" | "Her"
                              | "His" | "her";
        @{Classification["Keyword"]} token AnimalColor = "color is";
        @{Classification["Keyword"]} token Will = "will" | "will also";
        @{Classification["Keyword"]} token Eats = "eat" | "eats";
        @{Classification["Keyword"]} token CupsOfFood = "cups of food for"
                              | "cup of food for";
        @{Classification["Keyword"]} token For = "for";
        @{Classification["Keyword"]} token IsFriends = "is friends with";
        @{Classification["Keyword"]} token DoesntLike = "does not like";
        @{Classification["Keyword"]} token IsA= "is a";
        @{Classification["Keyword"]} token Minutes = "minutes"
                              | "minute";
        @{Classification["Keyword"]} token Seconds = "seconds"
                              | "second";
        @{Classification["Keyword"]} token And = "and";
        @{Classification["Keyword"]} token YearOld = "year old";
        syntax InitialStatement = "About my pets";

        syntax Pet = n:Name IsA t:AnimalTypes IsA a:Age YearOld s:Sex
                 Pronoun AnimalColor c:Color
                 => Profile {Name=>n, AnimalType=>t, Age=>a, Sex=>s, Color=>c};
        syntax Activity = Pronoun? Will? a:Action For? m:Time? Minutes?
                 s:Time? Seconds?
                 => Activity {Type=>a, Minutes => m, Seconds => s };
        syntax HowMuchTheyEat = Pronoun? Eats q:Cups CupsOfFood m:MealTime
                 => HowMuchTheyEat {Cups=>q, MealTime=>m};
        syntax PetsFriends = n:Name IsFriends f:Name
                 => Friends {Name=>n, Friend=>f};
        syntax PetsFoes = n:Name DoesntLike f:Name => Foes {Name=>n, Foe=>f};
        interleave whitespace = ("and" | "," | " " | "\r"
                  | "\n" | "\t" | "." | "that")+;
    }
}

Sample Text

What you see in bold are the tokens that make up my grammar; the plain text holds the values I want to extract as data.

About my pets

Razor is a Terrier that is a 9 year old male, and
her color is white. She will jump for 15 seconds
and bark for 2 minutes. She eats 1 cup of food for
breakfast and eats 2 cups of food for dinner.

Razor is friends with Rocket

Rocket does not like Rascal

MGraph

MyPets{
[
    {
      Profile{
        Name => "Razor",
        AnimalType => AnimalTypes[
          "Terrier"
        ],
        Age => "9",
        Sex => Sex[
          "male"
        ],
        Color => Color[
          "white"
        ]
      },
      Activities[
        Activity{
          Type => Action[
            "jump"
          ],
          Minutes => null,
          Seconds => "15"
        },
        Activity{
          Type => Action[
            "bark"
          ],
          Minutes => "2",
          Seconds => null
        }
      ],
      Meals[
        HowMuchTheyEat{
          Cups => "1",
          MealTime => MealTime[
            "breakfast"
          ]
        },
        HowMuchTheyEat{
          Cups => "2",
          MealTime => MealTime[
            "dinner"
          ]
        }
      ]
    }
],
Friends[
    [
      {
        Name => "Razor",
        Friend => "Rocket"
      }
    ]
],
Foes[
    [
      {
        Name => "Rocket",
        Foe => "Rascal"
      }
    ]
]
}
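Notice that every value in the MGraph comes back as a string (or null), so Minutes and Seconds need converting before you can do arithmetic with them. Here is a minimal sketch of that conversion; ActivityDurationSketch is a hypothetical helper name, and treating a null field as zero is my assumption about what a missing unit should mean.

```csharp
using System;
using System.Globalization;

public static class ActivityDurationSketch
{
    // Converts the Minutes/Seconds strings of an Activity record into a TimeSpan.
    // A null or empty field counts as zero (an assumption for this sketch).
    public static TimeSpan ToDuration(string minutes, string seconds)
    {
        return new TimeSpan(0, ParseOrZero(minutes), ParseOrZero(seconds));
    }

    private static int ParseOrZero(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0;
        return int.Parse(text, CultureInfo.InvariantCulture);
    }
}
```

For the two activities above, ToDuration(null, "15") gives 15 seconds and ToDuration("2", null) gives 2 minutes.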

More to come, I’m really excited about this technology!

Updated 6/12/2009 – Added Friends and Foes grammar and changed MGraph Structure

-ec