-
Notifications
You must be signed in to change notification settings - Fork 5.9k
/
doc.go
40 lines (39 loc) · 1.6 KB
/
doc.go
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
// Copyright (c) 2015 The golex Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
// Package lex is a Unicode-friendly run time library for golex[0] generated
// lexical analyzers[1].
//
// Changelog
//
// 2015-04-08: Initial release.
//
// Character classes
//
// Golex internally handles only 8 bit "characters". Many Unicode-aware
// tokenizers do not actually need to recognize every Unicode rune, but only
// some particular partitions/subsets. Like, for example, a particular Unicode
// category, say upper case letters: Lu.
//
// The idea is to convert all runes in a particular set as a single 8 bit
// character allocated outside the ASCII range of codes. The token value, a
// string of runes and their exact positions is collected as usual (see the
// Token and TokenBytes method), but the tokenizer DFA is simpler (and thus
// smaller and perhaps also faster) when this technique is used. In the example
// program (see below), recognizing (and skipping) white space, integer
// literals, one keyword and Go identifiers requires only an 8 state DFA[5].
//
// To provide the conversion from runes to character classes, "install" your
// converting function using the RuneClass option.
//
// References
//
// -
//
// [0]: http://godoc.org/github.com/cznic/golex
// [1]: http://en.wikipedia.org/wiki/Lexical_analysis
// [2]: http://golang.org/cmd/yacc/
// [3]: https://github.com/cznic/golex/blob/master/lex/example.l
// [4]: http://golang.org/pkg/io/#RuneReader
// [5]: https://github.com/cznic/golex/blob/master/lex/dfa
package lex