Adding upstream version 0.5.0.
Signed-off-by: Daniel Baumann <daniel@debian.org>
parent f9051e9424, commit 16e40566d2
8 changed files with 1303 additions and 0 deletions
LICENSE.md (new file, 29 lines)
@@ -0,0 +1,29 @@
BSD 3-Clause License

Copyright (c) 2022-present, Gani Georgiev
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
   contributors may be used to endorse or promote products derived from
   this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md (new file, 118 lines)
@@ -0,0 +1,118 @@
fexpr
[Go Report Card](https://goreportcard.com/report/github.com/ganigeorgiev/fexpr)
[Package documentation](https://pkg.go.dev/github.com/ganigeorgiev/fexpr)
================================================================================

**fexpr** is a filter query language parser that generates an easy-to-work-with AST structure, so that you can safely create SQL, Elasticsearch, etc. queries from user input.

Or in other words, it transforms the string `"id > 1"` into the struct `[{&& {{identifier id} > {number 1}}}]`.

It supports parentheses and various conditional expression operators (see [Grammar](https://github.com/ganigeorgiev/fexpr#grammar)).


## Example usage

```
go get github.com/ganigeorgiev/fexpr
```

```go
package main

import "github.com/ganigeorgiev/fexpr"

func main() {
	// [{&& {{identifier id} = {number 123}}} {&& {{identifier status} = {text active}}}]
	result, err := fexpr.Parse("id=123 && status='active'")
	if err != nil {
		panic(err)
	}

	_ = result // work with the parsed expression groups
}
```

> Note that each parsed expression statement contains a join/union operator (`&&` or `||`) so that the result can be consumed in small chunks without having to rely on the group/nesting context.

> See the [package documentation](https://pkg.go.dev/github.com/ganigeorgiev/fexpr) for more details and examples.
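
For illustration, here is one way the parsed `[]ExprGroup` slice could be consumed. This is only a sketch and not part of the library: the hypothetical `buildWhere` helper type-switches on `ExprGroup.Item`, which holds either a single `Expr` or a nested `[]ExprGroup`, and renders a parameterized SQL-like condition.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/ganigeorgiev/fexpr"
)

// buildWhere is an illustrative helper (not part of fexpr) that renders a
// parsed filter into a SQL-like condition string with "?" placeholders.
func buildWhere(groups []fexpr.ExprGroup, params *[]interface{}) string {
	var b strings.Builder

	for i, g := range groups {
		if i > 0 {
			b.WriteString(" " + string(g.Join) + " ")
		}

		switch item := g.Item.(type) {
		case fexpr.Expr:
			// single expression: left operand, sign operator, right operand
			b.WriteString(item.Left.Literal + " " + string(item.Op) + " ?")
			*params = append(*params, item.Right.Literal)
		case []fexpr.ExprGroup:
			// nested/parenthesized group
			b.WriteString("(" + buildWhere(item, params) + ")")
		}
	}

	return b.String()
}

func main() {
	result, err := fexpr.Parse("(id=1 || id=2) && status='active'")
	if err != nil {
		panic(err)
	}

	params := []interface{}{}
	fmt.Println(buildWhere(result, &params)) // (id = ? || id = ?) && status = ?
	fmt.Println(params)                      // [1 2 active]
}
```

In real code the left operand (column/identifier) would of course still need to be validated against an allow-list before being interpolated.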

## Grammar

**fexpr** grammar resembles the SQL `WHERE` expression syntax. It recognizes several token types (identifiers, numbers, quoted text, expression operators, whitespaces, etc.). A short parse example follows the operator list below.

> You can find all supported tokens in [`scanner.go`](https://github.com/ganigeorgiev/fexpr/blob/master/scanner.go).

#### Operators

- **`=`** Equal operator (e.g. `a=b`)
- **`!=`** NOT Equal operator (e.g. `a!=b`)
- **`>`** Greater than operator (e.g. `a>b`)
- **`>=`** Greater than or equal operator (e.g. `a>=b`)
- **`<`** Less than operator (e.g. `a<b`)
- **`<=`** Less than or equal operator (e.g. `a<=b`)
- **`~`** Like/Contains operator (e.g. `a~b`)
- **`!~`** NOT Like/Contains operator (e.g. `a!~b`)
- **`?=`** Array/Any Equal operator (e.g. `a?=b`)
- **`?!=`** Array/Any NOT Equal operator (e.g. `a?!=b`)
- **`?>`** Array/Any Greater than operator (e.g. `a?>b`)
- **`?>=`** Array/Any Greater than or equal operator (e.g. `a?>=b`)
- **`?<`** Array/Any Less than operator (e.g. `a?<b`)
- **`?<=`** Array/Any Less than or equal operator (e.g. `a?<=b`)
- **`?~`** Array/Any Like/Contains operator (e.g. `a?~b`)
- **`?!~`** Array/Any NOT Like/Contains operator (e.g. `a?!~b`)
- **`&&`** AND join operator (e.g. `a=b && c=d`)
- **`||`** OR join operator (e.g. `a=b || c=d`)
- **`()`** Parentheses (e.g. `(a=1 && b=2) || (a=3 && b=4)`)
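
As a rough sketch of how these operators surface in the parsed result (the format follows the `ExprGroup`/`Expr`/`Token` struct layout, so treat the printed form as approximate):

```go
result, _ := fexpr.Parse("a ?= 1 || b ?!~ 'x'")

fmt.Println(result)
// should print something like:
// [{{{<nil> identifier a} ?= {<nil> number 1}} &&} {{{<nil> identifier b} ?!~ {<nil> text x}} ||}]
```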

#### Numbers

Number tokens are any integer or decimal numbers.

_Example_: `123`, `10.50`, `-14`.

#### Quoted text

Text tokens are any literals that are wrapped in `'` or `"` quotes.

_Example_: `'Lorem ipsum dolor 123!'`, `"escaped \"word\""`, `"mixed 'quotes' are fine"`.

#### Identifiers

Identifier tokens are literals that start with a letter, `_`, `@` or `#` and may further contain any number of letters, digits, `.` (usually used as a separator) or `:` (usually used as a modifier) characters.

_Example_: `id`, `a.b.c`, `field123`, `@request.method`, `author.name:length`.

#### Functions

Function tokens are similar to identifiers but additionally accept a list of arguments enclosed in parentheses `()`.
The function arguments must be separated by a comma (_a single trailing comma is also allowed_) and each argument can be an identifier, quoted text, number or another nested function (_up to 2 nested functions_).

_Example_: `test()`, `test(a.b, 123, "abc")`, `@a.b.c:test(true)`, `a(b(c(1, 2)))`.
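
When a function token reaches your code (from the scanner directly, or as an operand in a parsed `Expr`), its arguments are exposed as a `[]Token` slice in the token's `Meta` field. A minimal sketch (assuming `fexpr` and `fmt` are imported and the parse succeeded):

```go
result, _ := fexpr.Parse(`test(a.b, 123, "abc") = 1`)

expr := result[0].Item.(fexpr.Expr)    // the single parsed expression
args := expr.Left.Meta.([]fexpr.Token) // the function token's arguments

for _, arg := range args {
	fmt.Println(arg.Type, arg.Literal)
}
// identifier a.b
// number 123
// text abc
```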

#### Comments

Comment tokens are any single line text literals starting with `//`.
Similar to whitespaces, comments are ignored by `fexpr.Parse()`.

_Example_: `// test`.


## Using only the scanner

The tokenizer (aka. `fexpr.Scanner`) can be used without the parser's state machine so that you can write your own custom token processing:

```go
s := fexpr.NewScanner([]byte("id > 123"))

// scan a single token at a time until EOF or an error is reached
for {
	t, err := s.Scan()
	if t.Type == fexpr.TokenEOF || err != nil {
		break
	}

	fmt.Println(t)
}

// Output:
// {<nil> identifier id}
// {<nil> whitespace }
// {<nil> sign >}
// {<nil> whitespace }
// {<nil> number 123}
```

examples_test.go (new file, 36 lines)
@@ -0,0 +1,36 @@
package fexpr_test

import (
	"fmt"

	"github.com/ganigeorgiev/fexpr"
)

func ExampleScanner_Scan() {
	s := fexpr.NewScanner([]byte("id > 123"))

	for {
		t, err := s.Scan()
		if t.Type == fexpr.TokenEOF || err != nil {
			break
		}

		fmt.Println(t)
	}

	// Output:
	// {<nil> identifier id}
	// {<nil> whitespace }
	// {<nil> sign >}
	// {<nil> whitespace }
	// {<nil> number 123}
}

func ExampleParse() {
	result, _ := fexpr.Parse("id > 123")

	fmt.Println(result)

	// Output:
	// [{{{<nil> identifier id} > {<nil> number 123}} &&}]
}

go.mod (new file, 3 lines)
@@ -0,0 +1,3 @@
module github.com/ganigeorgiev/fexpr

go 1.16

parser.go (new file, 130 lines)
@@ -0,0 +1,130 @@
package fexpr

import (
	"errors"
	"fmt"
)

var ErrEmpty = errors.New("empty filter expression")
var ErrIncomplete = errors.New("invalid or incomplete filter expression")
var ErrInvalidComment = errors.New("invalid comment")

// Expr represents an individual tokenized expression consisting
// of a left operand, an operator and a right operand.
type Expr struct {
	Left  Token
	Op    SignOp
	Right Token
}

// IsZero checks if the current Expr has zero-valued props.
func (e Expr) IsZero() bool {
	return e.Op == "" && e.Left.Literal == "" && e.Left.Type == "" && e.Right.Literal == "" && e.Right.Type == ""
}

// ExprGroup represents a wrapped expression and its join type.
//
// The group's Item could be either an `Expr` instance or a `[]ExprGroup` slice (for nested expressions).
type ExprGroup struct {
	Item interface{}
	Join JoinOp
}

// parser's state machine steps
const (
	stepBeforeSign = iota
	stepSign
	stepAfterSign
	StepJoin
)

// Parse parses the provided text and returns its processed AST
// in the form of `ExprGroup` slice(s).
//
// Comments and whitespaces are ignored.
func Parse(text string) ([]ExprGroup, error) {
	result := []ExprGroup{}
	scanner := NewScanner([]byte(text))
	step := stepBeforeSign
	join := JoinAnd

	var expr Expr

	for {
		t, err := scanner.Scan()
		if err != nil {
			return nil, err
		}

		if t.Type == TokenEOF {
			break
		}

		if t.Type == TokenWS || t.Type == TokenComment {
			continue
		}

		if t.Type == TokenGroup {
			groupResult, err := Parse(t.Literal)
			if err != nil {
				return nil, err
			}

			// append only if non-empty group
			if len(groupResult) > 0 {
				result = append(result, ExprGroup{Join: join, Item: groupResult})
			}

			step = StepJoin
			continue
		}

		switch step {
		case stepBeforeSign:
			if t.Type != TokenIdentifier && t.Type != TokenText && t.Type != TokenNumber && t.Type != TokenFunction {
				return nil, fmt.Errorf("expected left operand (identifier, function, text or number), got %q (%s)", t.Literal, t.Type)
			}

			expr = Expr{Left: t}

			step = stepSign
		case stepSign:
			if t.Type != TokenSign {
				return nil, fmt.Errorf("expected a sign operator, got %q (%s)", t.Literal, t.Type)
			}

			expr.Op = SignOp(t.Literal)
			step = stepAfterSign
		case stepAfterSign:
			if t.Type != TokenIdentifier && t.Type != TokenText && t.Type != TokenNumber && t.Type != TokenFunction {
				return nil, fmt.Errorf("expected right operand (identifier, function, text or number), got %q (%s)", t.Literal, t.Type)
			}

			expr.Right = t
			result = append(result, ExprGroup{Join: join, Item: expr})

			step = StepJoin
		case StepJoin:
			if t.Type != TokenJoin {
				return nil, fmt.Errorf("expected && or ||, got %q (%s)", t.Literal, t.Type)
			}

			join = JoinAnd
			if t.Literal == "||" {
				join = JoinOr
			}

			step = stepBeforeSign
		}
	}

	if step != StepJoin {
		if len(result) == 0 && expr.IsZero() {
			return nil, ErrEmpty
		}

		return nil, ErrIncomplete
	}

	return result, nil
}

parser_test.go (new file, 142 lines)
@@ -0,0 +1,142 @@
package fexpr

import (
	"fmt"
	"testing"
)

func TestExprIsZero(t *testing.T) {
	scenarios := []struct {
		expr   Expr
		result bool
	}{
		{Expr{}, true},
		{Expr{Op: SignAnyEq}, false},
		{Expr{Left: Token{Literal: "123"}}, false},
		{Expr{Left: Token{Type: TokenWS}}, false},
		{Expr{Right: Token{Literal: "123"}}, false},
		{Expr{Right: Token{Type: TokenWS}}, false},
	}

	for i, s := range scenarios {
		t.Run(fmt.Sprintf("s%d", i), func(t *testing.T) {
			if v := s.expr.IsZero(); v != s.result {
				t.Fatalf("Expected %v, got %v for \n%v", s.result, v, s.expr)
			}
		})
	}
}

func TestParse(t *testing.T) {
	scenarios := []struct {
		input         string
		expectedError bool
		expectedPrint string
	}{
		{`> 1`, true, "[]"},
		{`a >`, true, "[]"},
		{`a > >`, true, "[]"},
		{`a > %`, true, "[]"},
		{`a ! 1`, true, "[]"},
		{`a - 1`, true, "[]"},
		{`a + 1`, true, "[]"},
		{`1 - 1`, true, "[]"},
		{`1 + 1`, true, "[]"},
		{`> a 1`, true, "[]"},
		{`a || 1`, true, "[]"},
		{`a && 1`, true, "[]"},
		{`test > 1 &&`, true, `[]`},
		{`|| test = 1`, true, `[]`},
		{`test = 1 && ||`, true, "[]"},
		{`test = 1 && a`, true, "[]"},
		{`test = 1 && a`, true, "[]"},
		{`test = 1 && "a"`, true, "[]"},
		{`test = 1 a`, true, "[]"},
		{`test = 1 a`, true, "[]"},
		{`test = 1 "a"`, true, "[]"},
		{`test = 1@test`, true, "[]"},
		{`test = .@test`, true, "[]"},
		// mismatched text quotes
		{`test = "demo'`, true, "[]"},
		{`test = 'demo"`, true, "[]"},
		{`test = 'demo'"`, true, "[]"},
		{`test = 'demo''`, true, "[]"},
		{`test = "demo"'`, true, "[]"},
		{`test = "demo""`, true, "[]"},
		{`test = ""demo""`, true, "[]"},
		{`test = ''demo''`, true, "[]"},
		{"test = `demo`", true, "[]"},
		// comments
		{"test = / demo", true, "[]"},
		{"test = // demo", true, "[]"},
		{"// demo", true, "[]"},
		{"test = 123 // demo", false, "[{{{<nil> identifier test} = {<nil> number 123}} &&}]"},
		{"test = // demo\n123", false, "[{{{<nil> identifier test} = {<nil> number 123}} &&}]"},
		{`
		a = 123 &&
		// demo
		b = 456
		`, false, "[{{{<nil> identifier a} = {<nil> number 123}} &&} {{{<nil> identifier b} = {<nil> number 456}} &&}]"},
		// functions
		{`test() = 12`, false, `[{{{[] function test} = {<nil> number 12}} &&}]`},
		{`(a.b.c(1) = d.e.f(2)) || 1=2`, false, `[{[{{{[{<nil> number 1}] function a.b.c} = {[{<nil> number 2}] function d.e.f}} &&}] &&} {{{<nil> number 1} = {<nil> number 2}} ||}]`},
		// valid simple expression and sign operators check
		{`1=12`, false, `[{{{<nil> number 1} = {<nil> number 12}} &&}]`},
		{` 1 = 12 `, false, `[{{{<nil> number 1} = {<nil> number 12}} &&}]`},
		{`"demo" != test`, false, `[{{{<nil> text demo} != {<nil> identifier test}} &&}]`},
		{`a~1`, false, `[{{{<nil> identifier a} ~ {<nil> number 1}} &&}]`},
		{`a !~ 1`, false, `[{{{<nil> identifier a} !~ {<nil> number 1}} &&}]`},
		{`test>12`, false, `[{{{<nil> identifier test} > {<nil> number 12}} &&}]`},
		{`test > 12`, false, `[{{{<nil> identifier test} > {<nil> number 12}} &&}]`},
		{`test >="test"`, false, `[{{{<nil> identifier test} >= {<nil> text test}} &&}]`},
		{`test<@demo.test2`, false, `[{{{<nil> identifier test} < {<nil> identifier @demo.test2}} &&}]`},
		{`1<="test"`, false, `[{{{<nil> number 1} <= {<nil> text test}} &&}]`},
		{`1<="te'st"`, false, `[{{{<nil> number 1} <= {<nil> text te'st}} &&}]`},
		{`demo='te\'st'`, false, `[{{{<nil> identifier demo} = {<nil> text te'st}} &&}]`},
		{`demo="te\'st"`, false, `[{{{<nil> identifier demo} = {<nil> text te\'st}} &&}]`},
		{`demo="te\"st"`, false, `[{{{<nil> identifier demo} = {<nil> text te"st}} &&}]`},
		// invalid parenthesis
		{`(a=1`, true, `[]`},
		{`a=1)`, true, `[]`},
		{`((a=1)`, true, `[]`},
		{`{a=1}`, true, `[]`},
		{`[a=1]`, true, `[]`},
		{`((a=1 || a=2) && c=1))`, true, `[]`},
		// valid parenthesis
		{`()`, true, `[]`},
		{`(a=1)`, false, `[{[{{{<nil> identifier a} = {<nil> number 1}} &&}] &&}]`},
		{`(a="test(")`, false, `[{[{{{<nil> identifier a} = {<nil> text test(}} &&}] &&}]`},
		{`(a="test)")`, false, `[{[{{{<nil> identifier a} = {<nil> text test)}} &&}] &&}]`},
		{`((a=1))`, false, `[{[{[{{{<nil> identifier a} = {<nil> number 1}} &&}] &&}] &&}]`},
		{`a=1 || 2!=3`, false, `[{{{<nil> identifier a} = {<nil> number 1}} &&} {{{<nil> number 2} != {<nil> number 3}} ||}]`},
		{`a=1 && 2!=3`, false, `[{{{<nil> identifier a} = {<nil> number 1}} &&} {{{<nil> number 2} != {<nil> number 3}} &&}]`},
		{`a=1 && 2!=3 || "b"=a`, false, `[{{{<nil> identifier a} = {<nil> number 1}} &&} {{{<nil> number 2} != {<nil> number 3}} &&} {{{<nil> text b} = {<nil> identifier a}} ||}]`},
		{`(a=1 && 2!=3) || "b"=a`, false, `[{[{{{<nil> identifier a} = {<nil> number 1}} &&} {{{<nil> number 2} != {<nil> number 3}} &&}] &&} {{{<nil> text b} = {<nil> identifier a}} ||}]`},
		{`((a=1 || a=2) && (c=1))`, false, `[{[{[{{{<nil> identifier a} = {<nil> number 1}} &&} {{{<nil> identifier a} = {<nil> number 2}} ||}] &&} {[{{{<nil> identifier c} = {<nil> number 1}} &&}] &&}] &&}]`},
		// https://github.com/pocketbase/pocketbase/issues/5017
		{`(a='"')`, false, `[{[{{{<nil> identifier a} = {<nil> text "}} &&}] &&}]`},
		{`(a='\'')`, false, `[{[{{{<nil> identifier a} = {<nil> text '}} &&}] &&}]`},
		{`(a="'")`, false, `[{[{{{<nil> identifier a} = {<nil> text '}} &&}] &&}]`},
		{`(a="\"")`, false, `[{[{{{<nil> identifier a} = {<nil> text "}} &&}] &&}]`},
	}

	for i, scenario := range scenarios {
		t.Run(fmt.Sprintf("s%d:%s", i, scenario.input), func(t *testing.T) {
			v, err := Parse(scenario.input)

			if scenario.expectedError && err == nil {
				t.Fatalf("Expected error, got nil (%q)", scenario.input)
			}

			if !scenario.expectedError && err != nil {
				t.Fatalf("Did not expect error, got %q (%q).", err, scenario.input)
			}

			vPrint := fmt.Sprintf("%v", v)

			if vPrint != scenario.expectedPrint {
				t.Fatalf("Expected %s, got %s", scenario.expectedPrint, vPrint)
			}
		})
	}
}

scanner.go (new file, 679 lines)
@@ -0,0 +1,679 @@
package fexpr

import (
	"bytes"
	"fmt"
	"strings"
	"unicode/utf8"
)

// eof represents a marker rune for the end of the reader.
const eof = rune(0)

// JoinOp represents a join type operator.
type JoinOp string

// supported join type operators
const (
	JoinAnd JoinOp = "&&"
	JoinOr  JoinOp = "||"
)

// SignOp represents an expression sign operator.
type SignOp string

// supported expression sign operators
const (
	SignEq    SignOp = "="
	SignNeq   SignOp = "!="
	SignLike  SignOp = "~"
	SignNlike SignOp = "!~"
	SignLt    SignOp = "<"
	SignLte   SignOp = "<="
	SignGt    SignOp = ">"
	SignGte   SignOp = ">="

	// array/any operators
	SignAnyEq    SignOp = "?="
	SignAnyNeq   SignOp = "?!="
	SignAnyLike  SignOp = "?~"
	SignAnyNlike SignOp = "?!~"
	SignAnyLt    SignOp = "?<"
	SignAnyLte   SignOp = "?<="
	SignAnyGt    SignOp = "?>"
	SignAnyGte   SignOp = "?>="
)

// TokenType represents a Token type.
type TokenType string

// token type constants
const (
	TokenUnexpected TokenType = "unexpected"
	TokenEOF        TokenType = "eof"
	TokenWS         TokenType = "whitespace"
	TokenJoin       TokenType = "join"
	TokenSign       TokenType = "sign"
	TokenIdentifier TokenType = "identifier" // variable, column name, placeholder, etc.
	TokenFunction   TokenType = "function"   // function
	TokenNumber     TokenType = "number"
	TokenText       TokenType = "text"  // ' or " quoted string
	TokenGroup      TokenType = "group" // grouped/nested tokens
	TokenComment    TokenType = "comment"
)

// Token represents a single scanned literal (one or more combined runes).
type Token struct {
	Meta    interface{}
	Type    TokenType
	Literal string
}

// NewScanner creates and returns a new scanner instance loaded with the specified data.
func NewScanner(data []byte) *Scanner {
	return &Scanner{
		data:         data,
		maxFuncDepth: 3,
	}
}

// Scanner represents a filter expression lexical scanner.
type Scanner struct {
	data         []byte
	pos          int
	maxFuncDepth int
}

// Scan reads and returns the next available token value from the scanner's buffer.
func (s *Scanner) Scan() (Token, error) {
	ch := s.read()

	if ch == eof {
		return Token{Type: TokenEOF, Literal: ""}, nil
	}

	if isWhitespaceRune(ch) {
		s.unread()
		return s.scanWhitespace()
	}

	if isGroupStartRune(ch) {
		s.unread()
		return s.scanGroup()
	}

	if isIdentifierStartRune(ch) {
		s.unread()
		return s.scanIdentifier(s.maxFuncDepth)
	}

	if isNumberStartRune(ch) {
		s.unread()
		return s.scanNumber()
	}

	if isTextStartRune(ch) {
		s.unread()
		return s.scanText(false)
	}

	if isSignStartRune(ch) {
		s.unread()
		return s.scanSign()
	}

	if isJoinStartRune(ch) {
		s.unread()
		return s.scanJoin()
	}

	if isCommentStartRune(ch) {
		s.unread()
		return s.scanComment()
	}

	return Token{Type: TokenUnexpected, Literal: string(ch)}, fmt.Errorf("unexpected character %q", ch)
}

// scanWhitespace consumes all contiguous whitespace runes.
func (s *Scanner) scanWhitespace() (Token, error) {
	var buf bytes.Buffer

	// Reads every subsequent whitespace character into the buffer.
	// Non-whitespace runes and EOF will cause the loop to exit.
	for {
		ch := s.read()

		if ch == eof {
			break
		}

		if !isWhitespaceRune(ch) {
			s.unread()
			break
		}

		// write the whitespace rune
		buf.WriteRune(ch)
	}

	return Token{Type: TokenWS, Literal: buf.String()}, nil
}

// scanNumber consumes all contiguous digit runes
// (complex numbers and scientific notations are not supported).
func (s *Scanner) scanNumber() (Token, error) {
	var buf bytes.Buffer

	var hadDot bool

	// Read every subsequent digit rune into the buffer.
	// Non-digit runes and EOF will cause the loop to exit.
	for {
		ch := s.read()

		if ch == eof {
			break
		}

		// not a digit rune
		if !isDigitRune(ch) &&
			// minus sign but not at the beginning
			(ch != '-' || buf.Len() != 0) &&
			// dot but there was already another dot
			(ch != '.' || hadDot) {
			s.unread()
			break
		}

		// write the rune
		buf.WriteRune(ch)

		if ch == '.' {
			hadDot = true
		}
	}

	total := buf.Len()
	literal := buf.String()

	var err error
	// only "-" or starts with "." or ends with "."
	if (total == 1 && literal[0] == '-') || literal[0] == '.' || literal[total-1] == '.' {
		err = fmt.Errorf("invalid number %q", literal)
	}

	return Token{Type: TokenNumber, Literal: buf.String()}, err
}

// scanText consumes all contiguous quoted text runes.
func (s *Scanner) scanText(preserveQuotes bool) (Token, error) {
	var buf bytes.Buffer

	// read the first rune to determine the quotes type
	firstCh := s.read()
	buf.WriteRune(firstCh)
	var prevCh rune
	var hasMatchingQuotes bool

	// Read every subsequent text rune into the buffer.
	// EOF and matching unescaped ending quote will cause the loop to exit.
	for {
		ch := s.read()

		if ch == eof {
			break
		}

		// write the text rune
		buf.WriteRune(ch)

		// unescaped matching quote, aka. the end
		if ch == firstCh && prevCh != '\\' {
			hasMatchingQuotes = true
			break
		}

		prevCh = ch
	}

	literal := buf.String()

	var err error
	if !hasMatchingQuotes {
		err = fmt.Errorf("invalid quoted text %q", literal)
	} else if !preserveQuotes {
		// unquote
		literal = literal[1 : len(literal)-1]
		// remove escaped quotes prefix (aka. \)
		firstChStr := string(firstCh)
		literal = strings.ReplaceAll(literal, `\`+firstChStr, firstChStr)
	}

	return Token{Type: TokenText, Literal: literal}, err
}

// scanComment consumes all contiguous single line comment runes until
// a newline character (\n) or EOF is reached.
func (s *Scanner) scanComment() (Token, error) {
	var buf bytes.Buffer

	// Read the first 2 characters without writing them to the buffer.
	if !isCommentStartRune(s.read()) || !isCommentStartRune(s.read()) {
		return Token{Type: TokenComment}, ErrInvalidComment
	}

	// Read every subsequent comment text rune into the buffer.
	// \n and EOF will cause the loop to exit.
	for i := 0; ; i++ {
		ch := s.read()

		if ch == eof || ch == '\n' {
			break
		}

		buf.WriteRune(ch)
	}

	return Token{Type: TokenComment, Literal: strings.TrimSpace(buf.String())}, nil
}

// scanIdentifier consumes all contiguous ident runes.
func (s *Scanner) scanIdentifier(funcDepth int) (Token, error) {
	var buf bytes.Buffer

	// read the first rune in case it is a special start identifier character
	buf.WriteRune(s.read())

	// Read every subsequent identifier rune into the buffer.
	// Non-ident runes and EOF will cause the loop to exit.
	for {
		ch := s.read()

		if ch == eof {
			break
		}

		// func
		if ch == '(' {
			funcName := buf.String()
			if funcDepth <= 0 {
				return Token{Type: TokenFunction, Literal: funcName}, fmt.Errorf("max nested function arguments reached (max: %d)", s.maxFuncDepth)
			}
			if !isValidIdentifier(funcName) {
				return Token{Type: TokenFunction, Literal: funcName}, fmt.Errorf("invalid function name %q", funcName)
			}
			s.unread()
			return s.scanFunctionArgs(funcName, funcDepth)
		}

		// not an identifier character
		if !isLetterRune(ch) && !isDigitRune(ch) && !isIdentifierCombineRune(ch) && ch != '_' {
			s.unread()
			break
		}

		// write the identifier rune
		buf.WriteRune(ch)
	}

	literal := buf.String()

	var err error
	if !isValidIdentifier(literal) {
		err = fmt.Errorf("invalid identifier %q", literal)
	}

	return Token{Type: TokenIdentifier, Literal: literal}, err
}

// scanSign consumes all contiguous sign operator runes.
func (s *Scanner) scanSign() (Token, error) {
	var buf bytes.Buffer

	// Read every subsequent sign rune into the buffer.
	// Non-sign runes and EOF will cause the loop to exit.
	for {
		ch := s.read()

		if ch == eof {
			break
		}

		if !isSignStartRune(ch) {
			s.unread()
			break
		}

		// write the sign rune
		buf.WriteRune(ch)
	}

	literal := buf.String()

	var err error
	if !isSignOperator(literal) {
		err = fmt.Errorf("invalid sign operator %q", literal)
	}

	return Token{Type: TokenSign, Literal: literal}, err
}

// scanJoin consumes all contiguous join operator runes.
func (s *Scanner) scanJoin() (Token, error) {
	var buf bytes.Buffer

	// Read every subsequent join operator rune into the buffer.
	// Non-join runes and EOF will cause the loop to exit.
	for {
		ch := s.read()

		if ch == eof {
			break
		}

		if !isJoinStartRune(ch) {
			s.unread()
			break
		}

		// write the join operator rune
		buf.WriteRune(ch)
	}

	literal := buf.String()

	var err error
	if !isJoinOperator(literal) {
		err = fmt.Errorf("invalid join operator %q", literal)
	}

	return Token{Type: TokenJoin, Literal: literal}, err
}

// scanGroup consumes all runes within a group/parenthesis.
func (s *Scanner) scanGroup() (Token, error) {
	var buf bytes.Buffer

	// read the first group bracket without writing it to the buffer
	firstChar := s.read()
	openGroups := 1

	// Read every subsequent group rune into the buffer.
	// EOF and the matching closing bracket will cause the loop to exit.
	for {
		ch := s.read()

		if ch == eof {
			break
		}

		if isGroupStartRune(ch) {
			// nested group
			openGroups++
			buf.WriteRune(ch)
		} else if isTextStartRune(ch) {
			s.unread()
			t, err := s.scanText(true) // with quotes to preserve the exact text start/end runes
			if err != nil {
				// write the errored literal as it is
				buf.WriteString(t.Literal)
				return Token{Type: TokenGroup, Literal: buf.String()}, err
			}

			buf.WriteString(t.Literal)
		} else if ch == ')' {
			openGroups--

			if openGroups <= 0 {
				// main group end
				break
			} else {
				buf.WriteRune(ch)
			}
		} else {
			buf.WriteRune(ch)
		}
	}

	literal := buf.String()

	var err error
	if !isGroupStartRune(firstChar) || openGroups > 0 {
		err = fmt.Errorf("invalid formatted group - missing %d closing bracket(s)", openGroups)
	}

	return Token{Type: TokenGroup, Literal: literal}, err
}

// scanFunctionArgs consumes all contiguous function call runes to
// extract its arguments and returns a function token with the found
// Token arguments loaded in Token.Meta.
func (s *Scanner) scanFunctionArgs(funcName string, funcDepth int) (Token, error) {
	var args []Token

	var expectComma, isComma, isClosed bool

	ch := s.read()
	if ch != '(' {
		return Token{Type: TokenFunction, Literal: funcName}, fmt.Errorf("invalid or incomplete function call %q", funcName)
	}

	// Read every subsequent rune until ')' or EOF has been reached.
	for {
		ch := s.read()

		if ch == eof {
			break
		}

		if ch == ')' {
			isClosed = true
			break
		}

		// skip whitespaces
		if isWhitespaceRune(ch) {
			_, err := s.scanWhitespace()
			if err != nil {
				return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("failed to scan whitespaces in function %q: %w", funcName, err)
			}
			continue
		}

		// skip comments
		if isCommentStartRune(ch) {
			s.unread()
			_, err := s.scanComment()
			if err != nil {
				return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("failed to scan comment in function %q: %w", funcName, err)
			}
			continue
		}

		isComma = ch == ','

		if expectComma && !isComma {
			return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("expected comma after the last argument in function %q", funcName)
		}

		if !expectComma && isComma {
			return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("unexpected comma in function %q", funcName)
		}

		expectComma = false // reset

		if isComma {
			continue
		}

		if isIdentifierStartRune(ch) {
			s.unread()
			t, err := s.scanIdentifier(funcDepth - 1)
			if err != nil {
				return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("invalid identifier argument %q in function %q: %w", t.Literal, funcName, err)
			}
			args = append(args, t)
			expectComma = true
		} else if isNumberStartRune(ch) {
			s.unread()
			t, err := s.scanNumber()
			if err != nil {
				return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("invalid number argument %q in function %q: %w", t.Literal, funcName, err)
			}
			args = append(args, t)
			expectComma = true
		} else if isTextStartRune(ch) {
			s.unread()
			t, err := s.scanText(false)
			if err != nil {
				return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("invalid text argument %q in function %q: %w", t.Literal, funcName, err)
			}
			args = append(args, t)
			expectComma = true
		} else {
			return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("unsupported argument character %q in function %q", ch, funcName)
		}
	}

	if !isClosed {
		return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("invalid or incomplete function %q (expected ')')", funcName)
	}

	return Token{Type: TokenFunction, Literal: funcName, Meta: args}, nil
}

// unread unreads the last character and reverts the position 1 step back.
func (s *Scanner) unread() {
	if s.pos > 0 {
		s.pos = s.pos - 1
	}
}

// read reads the next rune and moves the position forward.
func (s *Scanner) read() rune {
	if s.pos >= len(s.data) {
		return eof
	}

	ch, n := utf8.DecodeRune(s.data[s.pos:])
	s.pos += n

	return ch
}

// Lexical helpers:
// -------------------------------------------------------------------

// isWhitespaceRune checks if a rune is a space, tab, or newline.
func isWhitespaceRune(ch rune) bool { return ch == ' ' || ch == '\t' || ch == '\n' }

// isLetterRune checks if a rune is a letter.
func isLetterRune(ch rune) bool {
	return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
}

// isDigitRune checks if a rune is a digit.
func isDigitRune(ch rune) bool {
	return (ch >= '0' && ch <= '9')
}

// isTextStartRune checks if a rune is a valid quoted text first character
// (aka. single or double quote).
func isTextStartRune(ch rune) bool {
	return ch == '\'' || ch == '"'
}

// isNumberStartRune checks if a rune is a valid number start character (aka. a digit or a minus sign).
func isNumberStartRune(ch rune) bool {
	return ch == '-' || isDigitRune(ch)
}

// isSignStartRune checks if a rune is a valid sign operator start character.
func isSignStartRune(ch rune) bool {
	return ch == '=' ||
		ch == '?' ||
		ch == '!' ||
		ch == '>' ||
		ch == '<' ||
		ch == '~'
}

// isJoinStartRune checks if a rune is a valid join type start character.
func isJoinStartRune(ch rune) bool {
	return ch == '&' || ch == '|'
}

// isGroupStartRune checks if a rune is a valid group/parenthesis start character.
func isGroupStartRune(ch rune) bool {
	return ch == '('
}

// isCommentStartRune checks if a rune is a valid comment start character.
func isCommentStartRune(ch rune) bool {
	return ch == '/'
}

// isIdentifierStartRune checks if a rune is a valid identifier's first character.
func isIdentifierStartRune(ch rune) bool {
	return isLetterRune(ch) || isIdentifierSpecialStartRune(ch)
}

// isIdentifierSpecialStartRune checks if a rune is a valid identifier's first special character.
func isIdentifierSpecialStartRune(ch rune) bool {
	return ch == '@' || ch == '_' || ch == '#'
}

// isIdentifierCombineRune checks if a rune is a valid identifier's combine character.
func isIdentifierCombineRune(ch rune) bool {
	return ch == '.' || ch == ':'
}

// isSignOperator checks if a literal is a valid sign operator.
func isSignOperator(literal string) bool {
	switch SignOp(literal) {
	case
		SignEq,
		SignNeq,
		SignLt,
		SignLte,
		SignGt,
		SignGte,
		SignLike,
		SignNlike,
		SignAnyEq,
		SignAnyNeq,
		SignAnyLike,
		SignAnyNlike,
		SignAnyLt,
		SignAnyLte,
		SignAnyGt,
		SignAnyGte:
		return true
	}

	return false
}

// isJoinOperator checks if a literal is a valid join type operator.
func isJoinOperator(literal string) bool {
	switch JoinOp(literal) {
	case
		JoinAnd,
		JoinOr:
		return true
	}

	return false
}

// isValidIdentifier validates the literal against common identifier requirements.
func isValidIdentifier(literal string) bool {
	length := len(literal)

	return (
		// doesn't end with combine rune
		!isIdentifierCombineRune(rune(literal[length-1])) &&
			// is not just a special start rune
			(length != 1 || !isIdentifierSpecialStartRune(rune(literal[0]))))
}

scanner_test.go (new file, 166 lines)
@@ -0,0 +1,166 @@
package fexpr

import (
	"fmt"
	"testing"
)

func TestNewScanner(t *testing.T) {
	s := NewScanner([]byte("test"))

	data := string(s.data)

	if data != "test" {
		t.Errorf("Expected the scanner reader data to be %q, got %q", "test", data)
	}
}

func TestScannerScan(t *testing.T) {
	type output struct {
		error bool
		print string
	}
	testScenarios := []struct {
		text    string
		expects []output
	}{
		// whitespace
		{" ", []output{{false, "{<nil> whitespace }"}}},
		{"test 123", []output{{false, "{<nil> identifier test}"}, {false, "{<nil> whitespace }"}, {false, "{<nil> number 123}"}}},
		// identifier
		{`test`, []output{{false, `{<nil> identifier test}`}}},
		{`@`, []output{{true, `{<nil> identifier @}`}}},
		{`test:`, []output{{true, `{<nil> identifier test:}`}}},
		{`test.`, []output{{true, `{<nil> identifier test.}`}}},
		{`@test.123:c`, []output{{false, `{<nil> identifier @test.123:c}`}}},
		{`_test_a.123`, []output{{false, `{<nil> identifier _test_a.123}`}}},
		{`#test.123:456`, []output{{false, `{<nil> identifier #test.123:456}`}}},
		{`.test.123`, []output{{true, `{<nil> unexpected .}`}, {false, `{<nil> identifier test.123}`}}},
		{`:test.123`, []output{{true, `{<nil> unexpected :}`}, {false, `{<nil> identifier test.123}`}}},
		{`test#@`, []output{{false, `{<nil> identifier test}`}, {true, `{<nil> identifier #}`}, {true, `{<nil> identifier @}`}}},
		{`test'`, []output{{false, `{<nil> identifier test}`}, {true, `{<nil> text '}`}}},
		{`test"d`, []output{{false, `{<nil> identifier test}`}, {true, `{<nil> text "d}`}}},
		// number
		{`123`, []output{{false, `{<nil> number 123}`}}},
		{`-123`, []output{{false, `{<nil> number -123}`}}},
		{`-123.456`, []output{{false, `{<nil> number -123.456}`}}},
		{`123.456`, []output{{false, `{<nil> number 123.456}`}}},
		{`12.34.56`, []output{{false, `{<nil> number 12.34}`}, {true, `{<nil> unexpected .}`}, {false, `{<nil> number 56}`}}},
		{`.123`, []output{{true, `{<nil> unexpected .}`}, {false, `{<nil> number 123}`}}},
		{`- 123`, []output{{true, `{<nil> number -}`}, {false, `{<nil> whitespace }`}, {false, `{<nil> number 123}`}}},
		{`12-3`, []output{{false, `{<nil> number 12}`}, {false, `{<nil> number -3}`}}},
		{`123.abc`, []output{{true, `{<nil> number 123.}`}, {false, `{<nil> identifier abc}`}}},
		// text
		{`""`, []output{{false, `{<nil> text }`}}},
		{`''`, []output{{false, `{<nil> text }`}}},
		{`'test'`, []output{{false, `{<nil> text test}`}}},
		{`'te\'st'`, []output{{false, `{<nil> text te'st}`}}},
		{`"te\"st"`, []output{{false, `{<nil> text te"st}`}}},
		{`"tes@#,;!@#%^'\"t"`, []output{{false, `{<nil> text tes@#,;!@#%^'"t}`}}},
		{`'tes@#,;!@#%^\'"t'`, []output{{false, `{<nil> text tes@#,;!@#%^'"t}`}}},
		{`"test`, []output{{true, `{<nil> text "test}`}}},
		{`'test`, []output{{true, `{<nil> text 'test}`}}},
		{`'АБЦ`, []output{{true, `{<nil> text 'АБЦ}`}}},
		// join types
		{`&&||`, []output{{true, `{<nil> join &&||}`}}},
		{`&& ||`, []output{{false, `{<nil> join &&}`}, {false, `{<nil> whitespace }`}, {false, `{<nil> join ||}`}}},
		{`'||test&&'&&123`, []output{{false, `{<nil> text ||test&&}`}, {false, `{<nil> join &&}`}, {false, `{<nil> number 123}`}}},
		// expression signs
		{`=!=`, []output{{true, `{<nil> sign =!=}`}}},
		{`= != ~ !~ > >= < <= ?= ?!= ?~ ?!~ ?> ?>= ?< ?<=`, []output{
			{false, `{<nil> sign =}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign !=}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign ~}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign !~}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign >}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign >=}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign <}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign <=}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign ?=}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign ?!=}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign ?~}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign ?!~}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign ?>}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign ?>=}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign ?<}`},
			{false, `{<nil> whitespace }`},
			{false, `{<nil> sign ?<=}`},
		}},
		// comments
		{`/ test`, []output{{true, `{<nil> comment }`}, {false, `{<nil> identifier test}`}}},
		{`/ / test`, []output{{true, `{<nil> comment }`}, {true, `{<nil> comment }`}, {false, `{<nil> identifier test}`}}},
		{`//`, []output{{false, `{<nil> comment }`}}},
		{`//test`, []output{{false, `{<nil> comment test}`}}},
		{`// test`, []output{{false, `{<nil> comment test}`}}},
		{`// test1 //test2 `, []output{{false, `{<nil> comment test1 //test2}`}}},
		{`///test`, []output{{false, `{<nil> comment /test}`}}},
		// funcs
		{`test()`, []output{{false, `{[] function test}`}}},
		{`test(a, b`, []output{{true, `{[{<nil> identifier a} {<nil> identifier b}] function test}`}}},
		{`@test:abc()`, []output{{false, `{[] function @test:abc}`}}},
		{`test( a )`, []output{{false, `{[{<nil> identifier a}] function test}`}}}, // with whitespaces
		{`test(a, b)`, []output{{false, `{[{<nil> identifier a} {<nil> identifier b}] function test}`}}},
		{`test(a, b, )`, []output{{false, `{[{<nil> identifier a} {<nil> identifier b}] function test}`}}}, // single trailing comma
		{`test(a,,)`, []output{{true, `{[{<nil> identifier a}] function test}`}, {true, `{<nil> unexpected )}`}}}, // unexpected trailing commas
		{`test(a,,,b)`, []output{{true, `{[{<nil> identifier a}] function test}`}, {true, `{<nil> unexpected ,}`}, {false, `{<nil> identifier b}`}, {true, `{<nil> unexpected )}`}}}, // unexpected mid-args commas
		{`test( @test.a.b:test , 123, "ab)c", 'd,ce', false)`, []output{{false, `{[{<nil> identifier @test.a.b:test} {<nil> number 123} {<nil> text ab)c} {<nil> text d,ce} {<nil> identifier false}] function test}`}}},
		{"test(a //test)", []output{{true, `{[{<nil> identifier a}] function test}`}}}, // invalid simple comment
		{"test(a //test\n)", []output{{false, `{[{<nil> identifier a}] function test}`}}}, // valid simple comment
		{"test(a, //test\n, b)", []output{{true, `{[{<nil> identifier a}] function test}`}, {false, `{<nil> whitespace }`}, {false, `{<nil> identifier b}`}, {true, `{<nil> unexpected )}`}}},
		{"test(a, //test\n b)", []output{{false, `{[{<nil> identifier a} {<nil> identifier b}] function test}`}}},
		{"test(a, test(test(b), c), d)", []output{{false, `{[{<nil> identifier a} {[{[{<nil> identifier b}] function test} {<nil> identifier c}] function test} {<nil> identifier d}] function test}`}}},
		// max funcs depth
		{"a(b(c(1)))", []output{{false, `{[{[{[{<nil> number 1}] function c}] function b}] function a}`}}},
		{"a(b(c(d(1))))", []output{{true, `{[] function a}`}, {false, `{<nil> number 1}`}, {true, `{<nil> unexpected )}`}, {true, `{<nil> unexpected )}`}, {true, `{<nil> unexpected )}`}, {true, `{<nil> unexpected )}`}}},
		// groups/parenthesis
		{`a)`, []output{{false, `{<nil> identifier a}`}, {true, `{<nil> unexpected )}`}}},
		{`(a b c`, []output{{true, `{<nil> group a b c}`}}},
		{`(a b c)`, []output{{false, `{<nil> group a b c}`}}},
		{`((a b c))`, []output{{false, `{<nil> group (a b c)}`}}},
		{`((a )b c))`, []output{{false, `{<nil> group (a )b c}`}, {true, `{<nil> unexpected )}`}}},
		{`("ab)("c)`, []output{{false, `{<nil> group "ab)("c}`}}},
		{`("ab)(c)`, []output{{true, `{<nil> group "ab)(c)}`}}},
		{`( func(1, 2, 3, func(4)) a b c )`, []output{{false, `{<nil> group func(1, 2, 3, func(4)) a b c }`}}},
	}

	for _, scenario := range testScenarios {
		t.Run(scenario.text, func(t *testing.T) {
			s := NewScanner([]byte(scenario.text))

			// scan the text tokens
			for j, expect := range scenario.expects {
				token, err := s.Scan()

				hasErr := err != nil
				if expect.error != hasErr {
					t.Errorf("[%d] Expected hasErr %v, got %v: %v (%v)", j, expect.error, hasErr, err, token)
				}

				tokenPrint := fmt.Sprintf("%v", token)
				if tokenPrint != expect.print {
					t.Errorf("[%d] Expected token %s, got %s", j, expect.print, tokenPrint)
				}
			}

			// the last remaining token should be the eof
			lastToken, err := s.Scan()
			if err != nil || lastToken.Type != TokenEOF {
				t.Fatalf("Expected EOF token, got %v (%v)", lastToken, err)
			}
		})
	}
}