Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 9 changed files with 686 additions and 0 deletions.
@@ -0,0 +1,83 @@

````markdown
# Instructions

Parsing a Smart Game Format string.

[SGF][sgf] is a standard format for storing board game files, in particular go.

SGF is a fairly simple format. An SGF file usually contains a single
tree of nodes where each node is a property list. The property list
contains key-value pairs; each key can only occur once but may have
multiple values.

The exercise will have you parse an SGF string and return a tree structure of properties.

An SGF file may look like this:

```text
(;FF[4]C[root]SZ[19];B[aa];W[ab])
```

This is a tree with three nodes:

- The top level node has three properties: FF\[4\] (key = "FF", value
  = "4"), C\[root\] (key = "C", value = "root") and SZ\[19\] (key =
  "SZ", value = "19"). (FF indicates the version of SGF, C is a
  comment and SZ is the size of the board.)
- The top level node has a single child which has a single property:
  B\[aa\]. (Black plays on the point encoded as "aa", which is the
  1-1 point.)
- The B\[aa\] node has a single child which has a single property:
  W\[ab\].

As you can imagine, an SGF file contains a lot of nodes with a single
child, which is why there's a shorthand for it.

SGF can encode variations of play. Go players do a lot of backtracking
in their reviews (let's try this, doesn't work, let's try that) and SGF
supports variations of play sequences. For example:

```text
(;FF[4](;B[aa];W[ab])(;B[dd];W[ee]))
```

Here the root node has two variations. The first (which by convention
indicates what's actually played) is where black plays on 1-1. Black was
sent this file by his teacher who pointed out a more sensible play in
the second child of the root node: `B[dd]` (4-4 point, a very standard
opening to take the corner).

A key can have multiple values associated with it. For example:

```text
(;FF[4];AB[aa][ab][ba])
```

Here `AB` (add black) is used to add three black stones to the board.

All property values will be the [SGF Text type][sgf-text].
You don't need to implement any other value type.
Although you can read the [full documentation of the Text type][sgf-text], a summary of the important points is below:

- Newlines are removed if they come immediately after a `\`; otherwise they remain as newlines.
- All whitespace characters other than newline are converted to spaces.
- `\` is the escape character.
  Any non-whitespace character after `\` is inserted as-is.
  Any whitespace character after `\` follows the above rules.
  Note that SGF does **not** have escape sequences for whitespace characters such as `\t` or `\n`.

Be careful not to get confused between:

- The string as it is represented in a string literal in the tests
- The string that is passed to the SGF parser

Escape sequences in the string literals may have already been processed by the programming language's parser before they are passed to the SGF parser.

There are a few more complexities to SGF (and parsing in general), which
you can mostly ignore. You should assume that the input is encoded in
UTF-8; the tests won't contain a charset property, so don't worry about
that. Furthermore, you may assume that all newlines are Unix style (`\n`,
no `\r` or `\r\n` will be in the tests) and that no optional whitespace
between properties, nodes, etc. will be in the tests.

[sgf]: https://en.wikipedia.org/wiki/Smart_Game_Format
[sgf-text]: https://www.red-bean.com/sgf/sgf4.html#text
````
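To make the expected output concrete, the three example strings in the instructions can be written out by hand as `GameTree` values. This is a standalone sketch: it repeats the `NodeProperties` and `GameTree` aliases used by the exercise, and `node`, `chain`, `variations`, `multiValue` and `countNodes` are hypothetical names introduced only for illustration. It shows that each `;` inside the same parentheses adds a single child, each parenthesized variation adds a sibling, and a multi-value property keeps all of its values in one list.

```roc
module [chain, variations, multiValue, countNodes]

NodeProperties : Dict Str (List Str)

# Same shape as GameTree in the exercise; `Empty` is never constructed here.
GameTree : [Empty, GameNode { properties : NodeProperties, children : List GameTree }]

## Hypothetical shorthand: builds a GameNode from property pairs and children.
node : List (Str, List Str), List GameTree -> GameTree
node = \props, children ->
    GameNode { properties: Dict.fromList props, children }

## Expected tree for "(;FF[4]C[root]SZ[19];B[aa];W[ab])": a chain of three nodes.
chain : GameTree
chain =
    node [("FF", ["4"]), ("C", ["root"]), ("SZ", ["19"])] [
        node [("B", ["aa"])] [
            node [("W", ["ab"])] [],
        ],
    ]

## Expected tree for "(;FF[4](;B[aa];W[ab])(;B[dd];W[ee]))":
## the root has two children, one per parenthesized variation.
variations : GameTree
variations =
    node [("FF", ["4"])] [
        node [("B", ["aa"])] [node [("W", ["ab"])] []],
        node [("B", ["dd"])] [node [("W", ["ee"])] []],
    ]

## Expected tree for "(;FF[4];AB[aa][ab][ba])": one key with three values.
multiValue : GameTree
multiValue =
    node [("FF", ["4"])] [node [("AB", ["aa", "ab", "ba"])] []]

## Counts the GameNode values in a tree; Empty contributes nothing.
countNodes : GameTree -> U64
countNodes = \tree ->
    when tree is
        Empty -> 0
        GameNode { children } ->
            List.walk children 1 \total, child -> total + countNodes child

expect countNodes chain == 3
expect countNodes variations == 5
```

Writing the expected values with a small constructor such as `node` keeps nested literals short; the test template in this commit instead spells out `GameNode { properties: Dict.fromList [...], children: [...] }` in full.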
@@ -0,0 +1,126 @@

```roc
module [parse]

import parser.Core as P
import parser.String as S

# --- SGF GRAMMAR ---
# Source: https://homepages.cwi.nl/~aeb/go/misc/sgfnotes.html
#
# Collection = GameTree+
# GameTree = '(' Sequence GameTree* ')'
# Sequence = Node+
# Node = ';' Property*
# Property = PropIdent PropValue+
# PropIdent = UcLetter+
# PropValue = '[' CValueType ']'
# CValueType = (ValueType | Compose)
# ValueType = (None | Number | Real | Double | Color | SimpleText | Text | Point | Move | Stone)
# UcLetter = 'A'..'Z'
# Compose = ValueType ':' ValueType
#
# Note: In this exercise, we only support `Text` values for `CValueType`, where
# `Text` is escaped as per the instructions. `Compose` is not supported.

NodeProperties : Dict Str (List Str)

# Note: Empty is unused, it's only here to avoid infinite type recursion because
# the Roc compiler does not yet understand that an empty List can end the
# recursion.
GameTree : [Empty, GameNode { properties : NodeProperties, children : List GameTree }]

## This function builds a GameTree given a list of `NodeProperties` and a list
## of alternative children `GameTree`.
##
## For example, "(;A[1]B[2];C[3]D[4];E[5](;F[6]G[7];H[8])(;I[6]))" will be
## parsed by the `gameTree` parser into a `List NodeProperties` and a
## `List GameTree`:
## - The `List NodeProperties` will be [AB, CD, E] in this example, where XY
##   represents a `NodeProperties` value containing the properties X & Y (along
##   with their values).
## - The `List GameTree` will be [FG->[H->[]], I->[]] where XY->[t1,t2] represents
##   a GameTree node with properties XY and with children trees t1 and t2.
##
## With these two input lists, the function will build the final GameTree:
## AB->[CD->[E->[FG->[H->[]], I->[]]]]
buildGameNode : List NodeProperties, List GameTree -> GameTree
buildGameNode = \nodeProps, alternatives ->
    help = \remainingNodeProps, subTrees ->
        when remainingNodeProps is
            [rootNode] ->
                GameNode { properties: rootNode, children: subTrees }

            [.. as rest, last] ->
                help rest [GameNode { properties: last, children: subTrees }]

            [] -> crash "Unreachable: remainingNodeProps list cannot be empty"
    help nodeProps alternatives

gameTree : P.Parser (List U8) GameTree
gameTree =
    P.const (\nodeProps -> \alternatives -> buildGameNode nodeProps alternatives)
    |> P.skip (S.codeunit '(')
    |> P.keep (P.oneOrMore node)
    |> P.keep (P.many subTree)
    |> P.skip (S.codeunit ')')

subTree : P.Parser (List U8) GameTree
subTree =
    P.const (\t -> t)
    |> P.keep
        (
            P.oneOf [
                P.const (\t -> t) |> P.keep (P.lazy (\_ -> gameTree)),
                P.const (\_ -> Empty) |> P.keep (P.fail "empty"),
            ]
        )

node : P.Parser (List U8) NodeProperties
node =
    P.const (\s -> s)
    |> P.skip (S.codeunit ';')
    |> P.keep (P.many property)
    |> P.map \properties -> Dict.fromList properties

## NOTE: UTF-8 error handling is very basic here. SGF actually uses ISO-8859-1
## by default, not UTF-8. Moreover, any other encoding can be specified
## via the CA property. This is not supported in this exercise.
property : P.Parser (List U8) (Str, List Str)
property =
    P.map2 propIdent (P.oneOrMore propValue) \id, values -> (
        id |> Str.fromUtf8 |> Result.withDefault "<BadUTF8>",
        values |> List.map \value -> value |> Str.fromUtf8 |> Result.withDefault "<BadUTF8>",
    )

propIdent : P.Parser (List U8) (List U8)
propIdent =
    P.oneOrMore ucLetter

propValue : P.Parser (List U8) (List U8)
propValue =
    P.const (\value -> value)
    |> P.skip (S.codeunit '[')
    |> P.keep valueType # in this exercise we don't support 'Compose'
    |> P.skip (S.codeunit ']')

## In this exercise, we only support Text values.
valueType : P.Parser (List U8) (List U8)
valueType =
    P.buildPrimitiveParser \input ->
        help = \result, chars ->
            when chars is
                [] -> Err (ParsingFailure "No closing bracket")
                [']', ..] -> Ok { val: result, input: chars }
                ['\\', '\t', .. as rest] -> help (result |> List.append ' ') rest
                ['\\', '\n', .. as rest] -> help result rest
                ['\\', c, .. as rest] -> help (result |> List.append c) rest
                ['\t', .. as rest] -> help (result |> List.append ' ') rest
                [c, .. as rest] -> help (result |> List.append c) rest
        help [] input

ucLetter : P.Parser (List U8) U8
ucLetter =
    S.codeunitSatisfies \b -> b >= 'A' && b <= 'Z'

parse : Str -> Result GameTree [ParsingFailure Str, ParsingIncomplete Str]
parse = \sgf ->
    S.parseStr gameTree sgf
```
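The Text rules from the instructions can also be sketched as a standalone byte-level function, which may help when reading `valueType` above. This is a hypothetical helper (`unescapeText` and `isSpaceLike` are not part of the exercise or of `roc-parser`); it assumes it receives only the bytes between `[` and `]`, so unlike `valueType` it never stops at a closing bracket. It also converts every ASCII whitespace byte other than newline (tab, vertical tab, form feed, carriage return) to a space, whereas `valueType` only converts tab, the one whitespace character the test descriptions mention.

```roc
module [unescapeText]

## Hypothetical helper: applies the SGF Text rules to the raw bytes found
## between `[` and `]` (the brackets themselves must already be stripped).
unescapeText : List U8 -> List U8
unescapeText = \raw ->
    help = \acc, chars ->
        when chars is
            [] -> acc
            # `\` followed by a newline is a soft line break: drop both bytes.
            ['\\', '\n', .. as rest] -> help acc rest
            # `\` followed by other whitespace: that whitespace becomes a space.
            ['\\', c, .. as rest] if isSpaceLike c -> help (List.append acc ' ') rest
            # `\` followed by anything else: keep that character as-is.
            ['\\', c, .. as rest] -> help (List.append acc c) rest
            # Unescaped whitespace other than newline becomes a space.
            [c, .. as rest] if isSpaceLike c -> help (List.append acc ' ') rest
            # Everything else, including unescaped newlines, is kept.
            [c, .. as rest] -> help (List.append acc c) rest
    help [] raw

## ASCII whitespace other than newline: tab, vertical tab, form feed, carriage return.
isSpaceLike : U8 -> Bool
isSpaceLike = \c -> c == '\t' || c == 0x0B || c == 0x0C || c == 0x0D

expect unescapeText (Str.toUtf8 "\\]") == Str.toUtf8 "]"
expect unescapeText (Str.toUtf8 "a\tb\\\nc") == Str.toUtf8 "a bc"
```

This also illustrates the literal-versus-parser distinction from the instructions: in Roc source, `"\\]"` is the two bytes `\` and `]`, which the sketch turns into a single `]`, while `"\\n"` is a backslash followed by the letter `n` and becomes just `n`.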
@@ -0,0 +1,17 @@

```json
{
  "authors": [
    "ageron"
  ],
  "files": {
    "solution": [
      "SgfParsing.roc"
    ],
    "test": [
      "sgf-parsing-test.roc"
    ],
    "example": [
      ".meta/Example.roc"
    ]
  },
  "blurb": "Parsing a Smart Game Format string."
}
```
@@ -0,0 +1,34 @@

```jinja
{%- import "generator_macros.j2" as macros with context -%}
{{ macros.canonical_ref() }}
{{ macros.header(imports=["parser"]) }}

import {{ exercise | to_pascal }} exposing [{{ cases[0]["property"] | to_camel }}]

{%- macro to_node(node) %}
GameNode {
    properties: Dict.fromList [
        {%- for name, values in node["properties"].items() %}
        ({{ name | to_roc }}, {{ values | to_roc }}),
        {%- endfor %}],
    children: [
        {%- for child_node in node["children"] %}
        {{ to_node(child_node) }},
        {%- endfor %}],
}

{%- endmacro %}

{% for case in cases -%}
# {{ case["description"] }}
expect
    sgf = {{ case["input"]["encoded"] | to_roc }}
    result = {{ case["property"] | to_camel }} sgf
    {%- if case["expected"]["error"] %}
    result |> Result.isErr
    {%- else %}
    expected = {{ to_node(case["expected"]) | indent(4) }}
    result == Ok expected
    {%- endif %}


{% endfor %}
```
@@ -0,0 +1,84 @@

```toml
# This is an auto-generated file.
#
# Regenerating this file via `configlet sync` will:
# - Recreate every `description` key/value pair
# - Recreate every `reimplements` key/value pair, where they exist in problem-specifications
# - Remove any `include = true` key/value pair (an omitted `include` key implies inclusion)
# - Preserve any other key/value pair
#
# As user-added comments (using the # character) will be removed when this file
# is regenerated, comments can be added via a `comment` key.

[2668d5dc-109f-4f71-b9d5-8d06b1d6f1cd]
description = "empty input"

[84ded10a-94df-4a30-9457-b50ccbdca813]
description = "tree with no nodes"

[0a6311b2-c615-4fa7-800e-1b1cbb68833d]
description = "node without tree"

[8c419ed8-28c4-49f6-8f2d-433e706110ef]
description = "node without properties"

[8209645f-32da-48fe-8e8f-b9b562c26b49]
description = "single node tree"

[6c995856-b919-4c75-8fd6-c2c3c31b37dc]
description = "multiple properties"

[a771f518-ec96-48ca-83c7-f8d39975645f]
description = "properties without delimiter"

[6c02a24e-6323-4ed5-9962-187d19e36bc8]
description = "all lowercase property"

[8772d2b1-3c57-405a-93ac-0703b671adc1]
description = "upper and lowercase property"

[a759b652-240e-42ec-a6d2-3a08d834b9e2]
description = "two nodes"

[cc7c02bc-6097-42c4-ab88-a07cb1533d00]
description = "two child trees"

[724eeda6-00db-41b1-8aa9-4d5238ca0130]
description = "multiple property values"

[28092c06-275f-4b9f-a6be-95663e69d4db]
description = "within property values, whitespace characters such as tab are converted to spaces"

[deaecb9d-b6df-4658-aa92-dcd70f4d472a]
description = "within property values, newlines remain as newlines"

[8e4c970e-42d7-440e-bfef-5d7a296868ef]
description = "escaped closing bracket within property value becomes just a closing bracket"

[cf371fa8-ba4a-45ec-82fb-38668edcb15f]
description = "escaped backslash in property value becomes just a backslash"

[dc13ca67-fac0-4b65-b3fe-c584d6a2c523]
description = "opening bracket within property value doesn't need to be escaped"

[a780b97e-8dbb-474e-8f7e-4031902190e8]
description = "semicolon in property value doesn't need to be escaped"

[0b57a79e-8d89-49e5-82b6-2eaaa6b88ed7]
description = "parentheses in property value don't need to be escaped"

[c72a33af-9e04-4cc5-9890-1b92262813ac]
description = "escaped tab in property value is converted to space"

[3a1023d2-7484-4498-8d73-3666bb386e81]
description = "escaped newline in property value is converted to nothing at all"

[25abf1a4-5205-46f1-8c72-53273b94d009]
description = "escaped t and n in property value are just letters, not whitespace"

[08e4b8ba-bb07-4431-a3d9-b1f4cdea6dab]
description = "mixing various kinds of whitespace and escaped characters in property value"
reimplements = "11c36323-93fc-495d-bb23-c88ee5844b8c"

[11c36323-93fc-495d-bb23-c88ee5844b8c]
description = "escaped property"
include = false
```
@@ -0,0 +1,19 @@

```roc
module [parse]

# HINT: we have added the `roc-parser` package to the app's header in
# sgf-parsing-test.roc. You can use it if you want, particularly the
# Core module, and perhaps the String module as well.
# However, if you prefer to roll your own solution, that's fine too!
# import parser.Core
# import parser.String

NodeProperties : Dict Str (List Str)

# Note: Empty is unused, it's only here to avoid infinite type recursion because
# the Roc compiler does not yet understand that an empty List can end the
# recursion.
GameTree : [Empty, GameNode { properties : NodeProperties, children : List GameTree }]

parse : Str -> Result GameTree _
parse = \sgf ->
    crash "Please implement the 'parse' function"
```
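If you follow the stub's suggestion and roll your own parser instead of using `roc-parser`, one piece you will need is reading the bracketed values that follow a property identifier, such as `[aa][ab][ba]` in `AB[aa][ab][ba]`. A minimal sketch, assuming a hypothetical `readValues` helper over `List U8` (not part of the exercise or of any package): it keeps escapes in the raw value bytes, so `\]` does not close a value, and leaves the Text conversion to a separate step.

```roc
module [readValues]

## Hypothetical helper for a hand-rolled parser: reads the consecutive `[...]`
## groups at the start of `input` and returns each value's raw bytes (escapes
## kept) plus whatever follows the last `]`.
readValues : List U8 -> Result { values : List (List U8), rest : List U8 } [NoClosingBracket]
readValues = \input ->
    # Reads one value after its `[`, stopping at the first unescaped `]`.
    readOne = \acc, chars ->
        when chars is
            [] -> Err NoClosingBracket
            [']', .. as after] -> Ok { value: acc, after }
            ['\\', c, .. as after] -> readOne (acc |> List.append '\\' |> List.append c) after
            [c, .. as after] -> readOne (List.append acc c) after

    # Keeps reading values for as long as the next byte is `[`.
    loop = \values, chars ->
        when chars is
            ['[', .. as afterOpen] ->
                when readOne [] afterOpen is
                    Ok { value, after } -> loop (List.append values value) after
                    Err e -> Err e

            _ -> Ok { values, rest: chars }

    loop [] input

expect
    readValues (Str.toUtf8 "[aa][ab][ba])")
    == Ok { values: [Str.toUtf8 "aa", Str.toUtf8 "ab", Str.toUtf8 "ba"], rest: Str.toUtf8 ")" }
```

Returning the unconsumed bytes in `rest` lets a caller continue with the next property identifier, `;`, `(` or `)`, and an unterminated value surfaces as `Err NoClosingBracket` rather than a crash.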