Farkle


Quick Start: Creating a calculator

Hello everyone. This guide will help you use Farkle. We will be using F#, but during the process, you will learn some useful things about Farkle itself. There's another guide that explains what's different with C#. Familiarity with context-free grammars and parsing will be very helpful.

How Farkle works

Note: This is an oversimplified, high-level description of Farkle. I will write a more technical description of it in the future.

While parser combinator libraries like FParsec combine many small parsers into a big parser, Farkle combines simple grammars (i.e. descriptions of languages) into more complex ones, and has a single, multi-purpose parser. These composable grammars are called designtime Farkles.

Also, as with FParsec, a designtime Farkle can "return" something. To accomplish this, there is a special object called a post-processor. Post-processors create a domain-specific type from the input text. For our calculator, we will automatically create a post-processor that returns a number, which is the numerical result of our mathematical expression.

To be able to use a designtime Farkle, we will first feed it to a component called the builder. The builder will check the grammar for errors, create parsing tables with the LALR algorithm, and give us another special object called a runtime Farkle. Runtime Farkles contain a grammar and a post-processor. With a runtime Farkle, we can parse text to our heart's desires.

10: By the way, Farkle means: "FArkle Recognizes Known Languages Easily".

20: And "FArkle" means: (GOTO 10).

30: I guess you can't read this line.

Designing our grammar

We want to design a grammar that represents a mathematical expressions on floating-point numbers. The supported operations will be addition, subtraction, multiplication, division, and unary negation. The operator precedence has to be honored, as well as parentheses.

A similar grammar but on the integers, can be found here

For those that don't know, a context-free grammar is made of terminals, nonterminals and productions.

* Nonterminals are composite symbols made of terminals and other nonterminals.

One of the nonterminals is designated as the _start symbol_, and it is the nonterminal from which parsing will start.

The terminals

We don't have to explicitly write a terminal for the mathematical symbols. They are just symbols that can have only one value and do not contain any meaningful information other than their presence. Farkle calls these types of symbols literals and treats them specially to reduce boilerplate.

So we are now left with one terminal to implement; the terminal for our decimal numbers. In Farkle, terminals are made of a regular expression that specifies the text that corresponds to this terminal, and a delegate called transformer that converts the text to an arbitrary object.

There are three ways to create this terminal, starting from the simplest:


The Farkle.Builder.Terminals has functions that allow you to create some commonly needed terminals, like integers or floating-point numbers. We create our terminal this way:

open Farkle
open Farkle.Builder
open System

let number = Terminals.genericReal<float> false "Number"

The boolean parameter specifies whether to allow a minus sign at the beginning (we don't). The last parameter is the terminal's name, used for error reporting.

The Terminals module has more functions. You can see them all in the documentation.


If Farkle doesn't have a ready to use function for your terminal, we have to create the terminal ourselves. The most easy way to do it is to write a regex using a string:

let numberStringRegex =
    Regex.regexString @"\d+(\.\d+)?(e[+-]?\d+)?"
    |> terminal "Number" (T(fun _ x -> float (x.ToString())))

The regexString function uses a quite familiar regex syntax. You can learn more about it at its own documentation page.

Let's take a look at the terminal function. Its last parameter is the regex, which we passed at the beggining for convenience and its first parameter is the terminal's name; nothing unusual here. Its second parameter is called a transformer and is a delegate that convert's the characters matched by our regex to an arbitrary object; in our case an integer. Its first parameter is an object of type ITransformerContext and is useful if you want to access the token's position or share some state between transformers. Its second parameter is a ReadOnlySpan of characters, which were converted to a floating-point number by our transformer.

T is the delegate's F# name; it was shortened to one letter for brevity.


For the most advanced users, Farkle allows you to construct a regex from code. Directly constructing a regex from code is rarely useful for the average user of Farkle, but might come in handy when for example the regex's structure is not known at compile time, or it is very complex.

open Farkle.Builder.Regex

let numberConstructedRegex =
    // Regexes are composable!
    let atLeastOneNumber = chars PredefinedSets.Number |> atLeast 1
    concat [
        atLeastOneNumber
        optional <| (char '.' <&> atLeastOneNumber)
        [chars "eE"; chars "+-" |> optional; atLeastOneNumber]
        |> concat
        |> optional
    ]
    |> terminal "Number" (T(fun _ x -> float (x.ToString())))

You can learn more about the functions above at the documentation. More character sets of the Farkle.Builder.PredefinedSets module can also be found at the documentation

Note: The regexes' type is Farkle.Builder.Regex. They are totally unrelated to System.Text.RegularExpressions.Regex. We can't convert between these two types, or directly match text against Farkle's regexes.

The terminal we created is of type DesigntimeFarkle<float>. This means that we can use it to parse floating-point numbers from text, but we want to create something bigger than that. As we are going to see, we can compose designtime Farkles into bigger ones, using nonterminals.

The nonterminals.

Writing simple nonterminals.

Because the calculator's nonterminals are a bit complicated, we have to take a brief interlude and tell how to create simpler ones.

Say we want to make a very simple calculator that can either add or subtract two numbers together. And let's say that an empty string would result to zero. This is the grammar of our calculator in Backus-Naur Form (don't worry if you can't understand it):

<Exp> ::= Number + Number
<Exp> ::= Number - Number
<Exp> ::= <>

Writing this in Farkle is actually surprisingly simple:

let justTwoNumbers = "Exp" ||= [
    !@ number .>> "+" .>>. number => (fun x1 x2 -> x1 + x2)
    !@ number .>> "-" .>>. number => (fun x1 x2 -> x1 - x2)
    empty =% 0.0
]

Let's explain what was going here. With the ||= operator, we define a nonterminal with its productions. In its left side goes its name, and in its right side go the productions that can produce it.

See these strange symbols inside the list? They chain designtime Farkles together and signify which of them have information we care about. !@ starts defining a production with its first member carrying significant information (the first operand). To start a production with a designtime Farkle that does not carry significant information, we can use !%.

The .>> and .>>. operators resemble FParsec's ones. .>> chains a new designtime Farkle we don't care what contains, and .>>. chains one we do.

With .>>, we can also chain string literals, instead of creating a terminal for each. We can also start a production with a literal using the !& operator.

The => operator finishes the creation of a production with a function that combines its members that we marked as significant. Such functions are called fusers. In the first production we added the numbers and in the second we subtracted them. So, depending on the expression we entered, _justTwoNumbers would return either the sum, or the difference of them. Obviously, all productions of a nonterminal have to return the same type.

In the third case, we defined an empty production using empty (what a coincidence!) We used empty =% 0. as a shortcut instead of writing empty => (fun () -> 0.).

An unfinished production is called a production builder. You can mark up to 16 significant members in a production builder.

You can pass an empty list in the right hand of the ||= operator but the grammar will be invalid. A nonterminal must always have at least one production.

Writing more complex nonterminals

We want our calculator to implement the following grammar taken from Bison's documentation:

%left '-' '+'
%left '*' '/'
%precedence NEG
%right '^'

exp:
    NUM
    | exp '+' exp        { $$ = $1 + $3;      }
    | exp '-' exp        { $$ = $1 - $3;      }
    | exp '*' exp        { $$ = $1 * $3;      }
    | exp '/' exp        { $$ = $1 / $3;      }
    | '-' exp  %prec NEG { $$ = -$2;          }
    | exp '^' exp        { $$ = pow ($1, $3); }
    | '(' exp ')'        { $$ = $2;           }
;

And this is how to implement it in Farkle:

open Farkle.Builder.OperatorPrecedence

let expression =
    let NEG = obj()

    let expression = nonterminal "Expression"
    expression.SetProductions(
        !@ number |> asIs,
        !@ expression .>> "+" .>>. expression => (fun x1 x2 -> x1 + x2),
        !@ expression .>> "-" .>>. expression => (fun x1 x2 -> x1 - x2),
        !@ expression .>> "*" .>>. expression => (fun x1 x2 -> x1 * x2),
        !@ expression .>> "/" .>>. expression => (fun x1 x2 -> x1 / x2),
        !& "-" .>>. expression |> prec NEG => (fun x -> -x),
        !@ expression .>> "^" .>>. expression => (fun x1 x2 -> Math.Pow(x1, x2)),
        !& "(" .>>. expression .>> ")" |> asIs
    )

    let opScope =
        OperatorScope(
            LeftAssociative("+", "-"),
            LeftAssociative("*", "/"),
            PrecedenceOnly(NEG),
            RightAssociative("^")
        )

    DesigntimeFarkle.withOperatorScope opScope expression

As you see, our grammar in Farkle looks pretty similar to the one in Bison. Let's take a look at some newly introduced things:

Defining recursive nonterminals

The nonterminal function is useful for recursive nonterminals. It creates a nonterminal whose productions will be set later with the SetProductions method. We can only once set them, all together. Calling the method again will have no effect.

Our nonterminal is of type Nonterminal<float>, but implements DesigntimeFarkle<float> which is actually an interface.

Warning: Despite designtime Farkles being interfaces, implementing it in your code is not allowed and will throw an exception if a custom designtime Farkle implementation is somehow passed to Farkle.

Operator Precedence

We specify the operator associativity and precedence using an operator scope that is made of associativity groups. There are four associativity group types: LeftAssociative, RightAssociative, NonAssociative and PrecedenceOnly that behave similarly to Bison's %left, %right, %nonassoc and %precedence. Their difference is best explained at Bison's documentation.

The groups in an operator scope are sorted by precedence in ascending order. In our grammar, the + and - symbols have the lowest precedence, followed by * and /. Until now, all these operators are left-associative, meaning that 1 + 2 + 3 is interpreted as (1 + 2) + 3.

Next in the precedence hierarchy is the unary negation. We can't define - again; instead we use the prec function to assign the unary negation's production a contextual reflection token, which is a dummy object that represents the production in the operator scope. This way, Farkle will recognize that - has higher precedence in unary negation than in subtraction. That new group is of type PrecedenceOnly, meaning that it doesn't specify associativity; only precedence.

And at the highest priority we have the exponentiation operator, which is right associative, meaning that 2 ^ 3 ^ 4 is interpreted as 2 ^ (3 ^ 4).

We set this operator scope to our designtime Farkle using the DesigntimeFarkle.withOperatorScope function. There are some more things to be careful about when using operator scopes:

Building our grammar

With our nonterminals being ready, it's time to create a runtime Farkle that can parse mathematical expressions. The builder uses the LALR algorithm to create parser tables for our parser. It also creates a special object called a post-processor that will execute the transformers and fusers when it is needed.

All that stuff can be done with a single line of code:

let myMarvelousRuntimeFarkle = RuntimeFarkle.build expression

Using the runtime Farkle

Now that we got it, it's time to put it to action. Farkle supports parsing text from various sources, namely strings, arbitrary character buffers on the heap (like substrings, arrays or parts of arrays) using System.ReadOnlyMemory<char>, files and System.IO.TextReaders.

The functions return an F# Result type whose error value (if it unfortunately exists), can show exactly what did go wrong.

Note: If a grammar is invalid (has an LALR conflict, two terminals are indistinguishable or something else), building would still succeed, but parsing would fail every time.

Let's look at some some examples:

open System.IO

// You can consume the parsing result like this:
match RuntimeFarkle.parseString myMarvelousRuntimeFarkle "103 + 137+281" with
| Ok result -> printfn "The answer is %f" result
// The %O format specifier (or alternatively, calling ToString())
// will create human-readable error messages.
| Error err -> printfn "Error: %O" err

// You can parse any Memory<char>, such a substring or even an array of characters!
let mem = "The answer is 45".AsMemory().Slice(14)
RuntimeFarkle.parseMemory myMarvelousRuntimeFarkle mem

RuntimeFarkle.parseFile myMarvelousRuntimeFarkle "example.txt"

let myStringReader = new StringReader("45 + 198 - 647 + 2 * 478 - 488 + 801 - 248")
RuntimeFarkle.parseTextReader myMarvelousRuntimeFarkle myStringReader

Customizing our designtime Farkle

Before we finish, let's take a look at one more thing; how to further customize a designtime Farkle.

We will see some customizations as an example:

let _customized =
    expression
    // You can add as many types of block or line comments as you want.
    |> DesigntimeFarkle.addBlockComment "/*" "*/"
    |> DesigntimeFarkle.addLineComment "//"
    |> DesigntimeFarkle.caseSensitive true
    // Whether to ignore whitespace between terminals; true by default.
    |> DesigntimeFarkle.autoWhitespace false
    // Adds an arbitrary symbol that will be ignored by Farkle.
    // It needs a regex and a terminal
    |> DesigntimeFarkle.addNoiseSymbol "Letters" (chars AllLetters)

Note: These customizations have to be done at the top-level designtime Farkle that is going to be built (or they will have no effect) and always apply to the entire grammar.


So, I hope you enjoyed this little tutorial. If you did, don't forget to give Farkle a try, and maybe you have any question, found a bug, or want a feature, and want to open a GitHub issue as well. I hope that all of you have a wonderful day and to see you soon. Goodbye!

namespace Farkle
namespace Farkle.Builder
namespace System
val number : DesigntimeFarkle<float>
module Terminals

from Farkle.Builder
val genericReal : allowSign:bool -> name:string -> DesigntimeFarkle<'TReal> (requires member Parse)
Multiple items
val float : value:'T -> float (requires member op_Explicit)

--------------------
[<Struct>]
type float = Double

--------------------
type float<'Measure> =
  float
val numberStringRegex : DesigntimeFarkle<float>
Multiple items
module Regex

from Farkle.Builder

--------------------
type Regex =
  private | Concat of Regex list
          | Alt of Regex list
          | Star of Regex
          | Chars of Set<char>
          | AllButChars of Set<char>
          | RegexString of RegexStringHolder
    member And : x2:Regex -> Regex
    member AtLeast : num:int -> Regex
    member Between : from:int -> upTo:int -> Regex
    member Optional : unit -> Regex
    member Or : x2:Regex -> Regex
    member Repeat : num:int -> Regex
    member ZeroOrMore : unit -> Regex
    static member Choice : [<ParamArray>] regexes:Regex [] -> Regex
    static member FromRegexString : x:string -> Regex
    static member Join : [<ParamArray>] regexes:Regex [] -> Regex
    ...
val regexString : x:string -> Regex
val terminal : name:string -> fTransform:T<'a> -> regex:Regex -> DesigntimeFarkle<'a>
T (ITransformerContext -> ReadOnlySpan<char> -> float)
val x : ReadOnlySpan<char>
ReadOnlySpan.ToString() : string
val numberConstructedRegex : DesigntimeFarkle<float>
val atLeastOneNumber : Regex
val chars : str:seq<char> -> Regex
module PredefinedSets

from Farkle.Builder
val Number : PredefinedSet
val atLeast : num:int -> x:Regex -> Regex
val concat : xs:seq<Regex> -> Regex
val optional : x:Regex -> Regex
Multiple items
val char : c:char -> Regex

--------------------
[<Struct>]
type char = Char
val justTwoNumbers : DesigntimeFarkle<float>
val x1 : float
val x2 : float
val empty : ProductionBuilder
val exp : value:'T -> 'T (requires member Exp)
namespace Farkle.Builder.OperatorPrecedence
val expression : DesigntimeFarkle<float>
val NEG : obj
type obj = Object
val expression : Nonterminal<float>
val nonterminal : name:string -> Nonterminal<'a>
member Nonterminal.SetProductions : firstProd:Production<'T> * [<ParamArray>] prods:Production<'T> [] -> unit
val asIs : pb:ProductionBuilders.ProductionBuilder<'T> -> Production<'T>
val prec : token:obj -> pb:'TBuilder -> 'TBuilder (requires member WithPrecedence)
val x : float
type Math =
  static member Abs : value: decimal -> decimal + 6 overloads
  static member Acos : d: float -> float
  static member Acosh : d: float -> float
  static member Asin : d: float -> float
  static member Asinh : d: float -> float
  static member Atan : d: float -> float
  static member Atan2 : y: float * x: float -> float
  static member Atanh : d: float -> float
  static member BigMul : a: int * b: int -> int64 + 2 overloads
  static member BitDecrement : x: float -> float
  ...
Math.Pow(x: float, y: float) : float
val opScope : OperatorScope
Multiple items
type OperatorScope =
  new : resolvesReduceReduceConflicts:bool * assocGroups:seq<AssociativityGroup> -> OperatorScope + 2 overloads
  member private AssociativityGroups : AssociativityGroup list
  member ResolvesReduceReduceConflict : bool

--------------------
new : [<ParamArray>] assocGroups:AssociativityGroup [] -> OperatorScope
new : resolvesReduceReduceConflicts:bool * assocGroups:seq<AssociativityGroup> -> OperatorScope
new : resolveReduceReduceConflicts:bool * [<ParamArray>] assocGroups:AssociativityGroup [] -> OperatorScope
Multiple items
type LeftAssociative =
  inherit AssociativityGroup
  new : [<ParamArray>] symbols:obj [] -> LeftAssociative

--------------------
new : [<ParamArray>] symbols:obj [] -> LeftAssociative
Multiple items
type PrecedenceOnly =
  inherit AssociativityGroup
  new : [<ParamArray>] symbols:obj [] -> PrecedenceOnly

--------------------
new : [<ParamArray>] symbols:obj [] -> PrecedenceOnly
Multiple items
type RightAssociative =
  inherit AssociativityGroup
  new : [<ParamArray>] symbols:obj [] -> RightAssociative

--------------------
new : [<ParamArray>] symbols:obj [] -> RightAssociative
Multiple items
module DesigntimeFarkle

from Farkle.Builder

--------------------
type DesigntimeFarkle =
  abstract member Metadata : GrammarMetadata
  abstract member Name : string

--------------------
type DesigntimeFarkle<'T> =
  inherit DesigntimeFarkle
val withOperatorScope : opScope:OperatorScope -> df:DesigntimeFarkle<'a> -> DesigntimeFarkle<'a>
val myMarvelousRuntimeFarkle : RuntimeFarkle<float>
Multiple items
module RuntimeFarkle

from Farkle

--------------------
type RuntimeFarkle<'TResult> =
  private { Grammar: Result<Grammar,BuildError list>
            PostProcessor: PostProcessor<'TResult>
            TokenizerFactory: TokenizerFactory }
    interface IGrammarProvider
    member Cast : unit -> RuntimeFarkle<obj>
    member ChangePostProcessor : pp:PostProcessor<'TNewResult> -> RuntimeFarkle<'TNewResult>
    member ChangeTokenizer : tokenizerFactory:TokenizerFactory -> RuntimeFarkle<'TResult> + 1 overload
    member GetBuildErrorMessage : unit -> string
    member GetBuildErrors : unit -> BuildError list
    member GetGrammar : unit -> Grammar
    member Parse : input:CharStream -> Result<'TResult,FarkleError> + 3 overloads
    member ParseFile : path:string -> Result<'TResult,FarkleError>
    member SyntaxCheck : unit -> RuntimeFarkle<obj>
    ...
val build : df:DesigntimeFarkle<'a> -> RuntimeFarkle<'a>
namespace System.IO
val parseString : rf:RuntimeFarkle<'a> -> inputString:string -> Result<'a,FarkleError>
union case Result.Ok: ResultValue: 'T -> Result<'T,'TError>
val result : float
val printfn : format:Printf.TextWriterFormat<'T> -> 'T
union case Result.Error: ErrorValue: 'TError -> Result<'T,'TError>
val err : FarkleError
val mem : ReadOnlyMemory<char>
val parseMemory : rf:RuntimeFarkle<'a> -> input:ReadOnlyMemory<char> -> Result<'a,FarkleError>
val parseFile : rf:RuntimeFarkle<'a> -> path:string -> Result<'a,FarkleError>
val myStringReader : StringReader
Multiple items
type StringReader =
  inherit TextReader
  new : s: string -> unit
  member Close : unit -> unit
  member Dispose : disposing: bool -> unit
  member Peek : unit -> int
  member Read : unit -> int + 2 overloads
  member ReadAsync : buffer: char [] * index: int * count: int -> Task<int> + 1 overload
  member ReadBlock : buffer: Span<char> -> int
  member ReadBlockAsync : buffer: char [] * index: int * count: int -> Task<int> + 1 overload
  member ReadLine : unit -> string
  ...

--------------------
StringReader(s: string) : StringReader
val parseTextReader : rf:RuntimeFarkle<'a> -> textReader:TextReader -> Result<'a,FarkleError>
val _customized : DesigntimeFarkle<float>
val addBlockComment : commentStart:string -> commentEnd:string -> df:DesigntimeFarkle<'a> -> DesigntimeFarkle<'a>
val addLineComment : commentStart:string -> df:DesigntimeFarkle<'a> -> DesigntimeFarkle<'a>
val caseSensitive : flag:bool -> df:DesigntimeFarkle<'a> -> DesigntimeFarkle<'a>
val autoWhitespace : flag:bool -> df:DesigntimeFarkle<'a> -> DesigntimeFarkle<'a>
val addNoiseSymbol : name:string -> regex:Regex -> df:DesigntimeFarkle<'a> -> DesigntimeFarkle<'a>
val AllLetters : PredefinedSet