
Adding upstream version 0.5.0.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-05-22 20:57:13 +02:00
parent f9051e9424
commit 16e40566d2
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
8 changed files with 1303 additions and 0 deletions

29
LICENSE.md Normal file

@@ -0,0 +1,29 @@
BSD 3-Clause License
Copyright (c) 2022-present, Gani Georgiev
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
3. Neither the name of the copyright holder nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

118
README.md Normal file

@@ -0,0 +1,118 @@
fexpr
[![Go Report Card](https://goreportcard.com/badge/github.com/ganigeorgiev/fexpr)](https://goreportcard.com/report/github.com/ganigeorgiev/fexpr)
[![GoDoc](https://godoc.org/github.com/ganigeorgiev/fexpr?status.svg)](https://pkg.go.dev/github.com/ganigeorgiev/fexpr)
================================================================================
**fexpr** is a filter query language parser that generates an easy to work with AST structure, so that you can safely create SQL, Elasticsearch, etc. queries from user input.
Or in other words, it transforms the string `"id > 1"` into the struct `[{&& {{identifier id} > {number 1}}}]`.
It supports parentheses and various conditional expression operators (see [Grammar](https://github.com/ganigeorgiev/fexpr#grammar)).
## Example usage
```
go get github.com/ganigeorgiev/fexpr
```
```go
package main

import (
	"fmt"

	"github.com/ganigeorgiev/fexpr"
)

func main() {
	// [{&& {{identifier id} = {number 123}}} {&& {{identifier status} = {text active}}}]
	result, _ := fexpr.Parse("id=123 && status='active'")
	fmt.Println(result)
}
```
> Note that each parsed expression statement contains a join/union operator (`&&` or `||`) so that the result can be consumed in small chunks without having to rely on the group/nesting context.
> See the [package documentation](https://pkg.go.dev/github.com/ganigeorgiev/fexpr) for more details and examples.
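
As a rough sketch (reusing the imports from the example above), the parsed groups can be consumed one at a time without tracking any nesting state yourself:

```go
result, _ := fexpr.Parse("id=123 && status='active'")

for _, group := range result {
	switch item := group.Item.(type) {
	case fexpr.Expr:
		// a single expression: left operand, sign operator, right operand
		fmt.Println(group.Join, item.Left.Literal, item.Op, item.Right.Literal)
	case []fexpr.ExprGroup:
		// a parenthesized sub-expression, parsed into its own group slice
		fmt.Println(group.Join, "nested group with", len(item), "item(s)")
	}
}
```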
## Grammar
**fexpr** grammar resembles the SQL `WHERE` expression syntax. It recognizes several token types (identifiers, numbers, quoted text, expression operators, whitespaces, etc.).
> You can find all supported tokens in [`scanner.go`](https://github.com/ganigeorgiev/fexpr/blob/master/scanner.go).
#### Operators
- **`=`** Equal operator (eg. `a=b`)
- **`!=`** NOT Equal operator (eg. `a!=b`)
- **`>`** Greater than operator (eg. `a>b`)
- **`>=`** Greater than or equal operator (eg. `a>=b`)
- **`<`** Less than operator (eg. `a<b`)
- **`<=`** Less than or equal operator (eg. `a<=b`)
- **`~`** Like/Contains operator (eg. `a~b`)
- **`!~`** NOT Like/Contains operator (eg. `a!~b`)
- **`?=`** Array/Any equal operator (eg. `a?=b`)
- **`?!=`** Array/Any NOT Equal operator (eg. `a?!=b`)
- **`?>`** Array/Any Greater than operator (eg. `a?>b`)
- **`?>=`** Array/Any Greater than or equal operator (eg. `a?>=b`)
- **`?<`** Array/Any Less than operator (eg. `a?<b`)
- **`?<=`** Array/Any Less than or equal operator (eg. `a?<=b`)
- **`?~`** Array/Any Like/Contains operator (eg. `a?~b`)
- **`?!~`** Array/Any NOT Like/Contains operator (eg. `a?!~b`)
- **`&&`** AND join operator (eg. `a=b && c=d`)
- **`||`** OR join operator (eg. `a=b || c=d`)
- **`()`** Parentheses (eg. `(a=1 && b=2) || (a=3 && b=4)`)
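
As a rough, non-exhaustive illustration (error handling omitted, `fmt` assumed to be imported), several of the operators above can be combined freely:

```go
result, _ := fexpr.Parse(`(title~'foo' || tags?~'bar') && count?>=10`)

// result holds 2 ExprGroup items, each carrying its own && join operator:
// the parenthesized group (a nested slice) and the count?>=10 expression
fmt.Println(result)
```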
#### Numbers
Number tokens are any integer or decimal numbers.
_Example_: `123`, `10.50`, `-14`.
#### Quoted text
Text tokens are any literals that are wrapped by `'` or `"` quotes.
_Example_: `'Lorem ipsum dolor 123!'`, `"escaped \"word\""`, `"mixed 'quotes' are fine"`.
#### Identifiers
Identifier tokens are literals that start with a letter, `_`, `@` or `#` and may further contain any number of letters, digits, `.` (usually used as a separator) or `:` (usually used as a modifier) characters.
_Example_: `id`, `a.b.c`, `field123`, `@request.method`, `author.name:length`.
#### Functions
Function tokens are similar to the identifiers but additionally accept a list of arguments enclosed in parentheses `()`.
The function arguments must be separated by a comma (_a single trailing comma is also allowed_) and each argument can be an identifier, quoted text, number or another nested function (_up to 2 nested_).
_Example_: `test()`, `test(a.b, 123, "abc")`, `@a.b.c:test(true)`, `a(b(c(1, 2)))`.
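
When the input is scanned directly (see the scanner section below), the extracted arguments of a function token are attached to its `Meta` field as a `[]fexpr.Token` slice. A minimal sketch:

```go
s := fexpr.NewScanner([]byte(`test(a.b, 123, "abc")`))

t, err := s.Scan()
if err == nil && t.Type == fexpr.TokenFunction {
	args, _ := t.Meta.([]fexpr.Token)
	for _, arg := range args {
		fmt.Println(arg.Type, arg.Literal)
	}
	// identifier a.b
	// number 123
	// text abc
}
```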
#### Comments
Comment tokens are any single line text literals starting with `//`.
Similar to whitespaces, comments are ignored by `fexpr.Parse()`.
_Example_: `// test`.
## Using only the scanner
The tokenizer (aka. `fexpr.Scanner`) can be used without the parser's state machine, allowing you to write your own custom token processing:
```go
s := fexpr.NewScanner([]byte("id > 123"))
// scan single token at a time until EOF or error is reached
for {
t, err := s.Scan()
if t.Type == fexpr.TokenEOF || err != nil {
break
}
fmt.Println(t)
}
// Output:
// {<nil> identifier id}
// {<nil> whitespace }
// {<nil> sign >}
// {<nil> whitespace }
// {<nil> number 123}
```

36
examples_test.go Normal file

@@ -0,0 +1,36 @@
package fexpr_test
import (
"fmt"
"github.com/ganigeorgiev/fexpr"
)
func ExampleScanner_Scan() {
s := fexpr.NewScanner([]byte("id > 123"))
for {
t, err := s.Scan()
if t.Type == fexpr.TokenEOF || err != nil {
break
}
fmt.Println(t)
}
// Output:
// {<nil> identifier id}
// {<nil> whitespace }
// {<nil> sign >}
// {<nil> whitespace }
// {<nil> number 123}
}
func ExampleParse() {
result, _ := fexpr.Parse("id > 123")
fmt.Println(result)
// Output:
// [{{{<nil> identifier id} > {<nil> number 123}} &&}]
}

3
go.mod Normal file

@@ -0,0 +1,3 @@
module github.com/ganigeorgiev/fexpr
go 1.16

130
parser.go Normal file

@@ -0,0 +1,130 @@
package fexpr
import (
"errors"
"fmt"
)
var ErrEmpty = errors.New("empty filter expression")
var ErrIncomplete = errors.New("invalid or incomplete filter expression")
var ErrInvalidComment = errors.New("invalid comment")
// Expr represents an individual tokenized expression consisting
// of left operand, operator and a right operand.
type Expr struct {
Left Token
Op SignOp
Right Token
}
// IsZero checks if the current Expr has zero-valued props.
func (e Expr) IsZero() bool {
return e.Op == "" && e.Left.Literal == "" && e.Left.Type == "" && e.Right.Literal == "" && e.Right.Type == ""
}
// ExprGroup represents a wrapped expression and its join type.
//
// The group's Item could be either an `Expr` instance or `[]ExprGroup` slice (for nested expressions).
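//
// An illustrative traversal of a parsed result (a sketch, not part of the package API) could look like:
//
//	func walk(groups []ExprGroup) {
//		for _, g := range groups {
//			switch item := g.Item.(type) {
//			case Expr:
//				fmt.Println(item.Left, item.Op, item.Right)
//			case []ExprGroup:
//				walk(item) // recurse into the nested group
//			}
//		}
//	}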
type ExprGroup struct {
Item interface{}
Join JoinOp
}
// parser's state machine steps
const (
stepBeforeSign = iota
stepSign
stepAfterSign
StepJoin
)
// Parse parses the provided text and returns its processed AST
// in the form of `ExprGroup` slice(s).
//
// Comments and whitespaces are ignored.
func Parse(text string) ([]ExprGroup, error) {
result := []ExprGroup{}
scanner := NewScanner([]byte(text))
step := stepBeforeSign
join := JoinAnd
var expr Expr
for {
t, err := scanner.Scan()
if err != nil {
return nil, err
}
if t.Type == TokenEOF {
break
}
if t.Type == TokenWS || t.Type == TokenComment {
continue
}
if t.Type == TokenGroup {
groupResult, err := Parse(t.Literal)
if err != nil {
return nil, err
}
// append only if non-empty group
if len(groupResult) > 0 {
result = append(result, ExprGroup{Join: join, Item: groupResult})
}
step = StepJoin
continue
}
switch step {
case stepBeforeSign:
if t.Type != TokenIdentifier && t.Type != TokenText && t.Type != TokenNumber && t.Type != TokenFunction {
return nil, fmt.Errorf("expected left operand (identifier, function, text or number), got %q (%s)", t.Literal, t.Type)
}
expr = Expr{Left: t}
step = stepSign
case stepSign:
if t.Type != TokenSign {
return nil, fmt.Errorf("expected a sign operator, got %q (%s)", t.Literal, t.Type)
}
expr.Op = SignOp(t.Literal)
step = stepAfterSign
case stepAfterSign:
if t.Type != TokenIdentifier && t.Type != TokenText && t.Type != TokenNumber && t.Type != TokenFunction {
return nil, fmt.Errorf("expected right operand (identifier, function text or number), got %q (%s)", t.Literal, t.Type)
}
expr.Right = t
result = append(result, ExprGroup{Join: join, Item: expr})
step = StepJoin
case StepJoin:
if t.Type != TokenJoin {
return nil, fmt.Errorf("expected && or ||, got %q (%s)", t.Literal, t.Type)
}
join = JoinAnd
if t.Literal == "||" {
join = JoinOr
}
step = stepBeforeSign
}
}
if step != StepJoin {
if len(result) == 0 && expr.IsZero() {
return nil, ErrEmpty
}
return nil, ErrIncomplete
}
return result, nil
}

142
parser_test.go Normal file

@@ -0,0 +1,142 @@
package fexpr
import (
"fmt"
"testing"
)
func TestExprIsZero(t *testing.T) {
scenarios := []struct {
expr Expr
result bool
}{
{Expr{}, true},
{Expr{Op: SignAnyEq}, false},
{Expr{Left: Token{Literal: "123"}}, false},
{Expr{Left: Token{Type: TokenWS}}, false},
{Expr{Right: Token{Literal: "123"}}, false},
{Expr{Right: Token{Type: TokenWS}}, false},
}
for i, s := range scenarios {
t.Run(fmt.Sprintf("s%d", i), func(t *testing.T) {
if v := s.expr.IsZero(); v != s.result {
t.Fatalf("Expected %v, got %v for \n%v", s.result, v, s.expr)
}
})
}
}
func TestParse(t *testing.T) {
scenarios := []struct {
input string
expectedError bool
expectedPrint string
}{
{`> 1`, true, "[]"},
{`a >`, true, "[]"},
{`a > >`, true, "[]"},
{`a > %`, true, "[]"},
{`a ! 1`, true, "[]"},
{`a - 1`, true, "[]"},
{`a + 1`, true, "[]"},
{`1 - 1`, true, "[]"},
{`1 + 1`, true, "[]"},
{`> a 1`, true, "[]"},
{`a || 1`, true, "[]"},
{`a && 1`, true, "[]"},
{`test > 1 &&`, true, `[]`},
{`|| test = 1`, true, `[]`},
{`test = 1 && ||`, true, "[]"},
{`test = 1 && a`, true, "[]"},
{`test = 1 && a`, true, "[]"},
{`test = 1 && "a"`, true, "[]"},
{`test = 1 a`, true, "[]"},
{`test = 1 a`, true, "[]"},
{`test = 1 "a"`, true, "[]"},
{`test = 1@test`, true, "[]"},
{`test = .@test`, true, "[]"},
// mismatched text quotes
{`test = "demo'`, true, "[]"},
{`test = 'demo"`, true, "[]"},
{`test = 'demo'"`, true, "[]"},
{`test = 'demo''`, true, "[]"},
{`test = "demo"'`, true, "[]"},
{`test = "demo""`, true, "[]"},
{`test = ""demo""`, true, "[]"},
{`test = ''demo''`, true, "[]"},
{"test = `demo`", true, "[]"},
// comments
{"test = / demo", true, "[]"},
{"test = // demo", true, "[]"},
{"// demo", true, "[]"},
{"test = 123 // demo", false, "[{{{<nil> identifier test} = {<nil> number 123}} &&}]"},
{"test = // demo\n123", false, "[{{{<nil> identifier test} = {<nil> number 123}} &&}]"},
{`
a = 123 &&
// demo
b = 456
`, false, "[{{{<nil> identifier a} = {<nil> number 123}} &&} {{{<nil> identifier b} = {<nil> number 456}} &&}]"},
// functions
{`test() = 12`, false, `[{{{[] function test} = {<nil> number 12}} &&}]`},
{`(a.b.c(1) = d.e.f(2)) || 1=2`, false, `[{[{{{[{<nil> number 1}] function a.b.c} = {[{<nil> number 2}] function d.e.f}} &&}] &&} {{{<nil> number 1} = {<nil> number 2}} ||}]`},
// valid simple expression and sign operators check
{`1=12`, false, `[{{{<nil> number 1} = {<nil> number 12}} &&}]`},
{` 1 = 12 `, false, `[{{{<nil> number 1} = {<nil> number 12}} &&}]`},
{`"demo" != test`, false, `[{{{<nil> text demo} != {<nil> identifier test}} &&}]`},
{`a~1`, false, `[{{{<nil> identifier a} ~ {<nil> number 1}} &&}]`},
{`a !~ 1`, false, `[{{{<nil> identifier a} !~ {<nil> number 1}} &&}]`},
{`test>12`, false, `[{{{<nil> identifier test} > {<nil> number 12}} &&}]`},
{`test > 12`, false, `[{{{<nil> identifier test} > {<nil> number 12}} &&}]`},
{`test >="test"`, false, `[{{{<nil> identifier test} >= {<nil> text test}} &&}]`},
{`test<@demo.test2`, false, `[{{{<nil> identifier test} < {<nil> identifier @demo.test2}} &&}]`},
{`1<="test"`, false, `[{{{<nil> number 1} <= {<nil> text test}} &&}]`},
{`1<="te'st"`, false, `[{{{<nil> number 1} <= {<nil> text te'st}} &&}]`},
{`demo='te\'st'`, false, `[{{{<nil> identifier demo} = {<nil> text te'st}} &&}]`},
{`demo="te\'st"`, false, `[{{{<nil> identifier demo} = {<nil> text te\'st}} &&}]`},
{`demo="te\"st"`, false, `[{{{<nil> identifier demo} = {<nil> text te"st}} &&}]`},
// invalid parenthesis
{`(a=1`, true, `[]`},
{`a=1)`, true, `[]`},
{`((a=1)`, true, `[]`},
{`{a=1}`, true, `[]`},
{`[a=1]`, true, `[]`},
{`((a=1 || a=2) && c=1))`, true, `[]`},
// valid parenthesis
{`()`, true, `[]`},
{`(a=1)`, false, `[{[{{{<nil> identifier a} = {<nil> number 1}} &&}] &&}]`},
{`(a="test(")`, false, `[{[{{{<nil> identifier a} = {<nil> text test(}} &&}] &&}]`},
{`(a="test)")`, false, `[{[{{{<nil> identifier a} = {<nil> text test)}} &&}] &&}]`},
{`((a=1))`, false, `[{[{[{{{<nil> identifier a} = {<nil> number 1}} &&}] &&}] &&}]`},
{`a=1 || 2!=3`, false, `[{{{<nil> identifier a} = {<nil> number 1}} &&} {{{<nil> number 2} != {<nil> number 3}} ||}]`},
{`a=1 && 2!=3`, false, `[{{{<nil> identifier a} = {<nil> number 1}} &&} {{{<nil> number 2} != {<nil> number 3}} &&}]`},
{`a=1 && 2!=3 || "b"=a`, false, `[{{{<nil> identifier a} = {<nil> number 1}} &&} {{{<nil> number 2} != {<nil> number 3}} &&} {{{<nil> text b} = {<nil> identifier a}} ||}]`},
{`(a=1 && 2!=3) || "b"=a`, false, `[{[{{{<nil> identifier a} = {<nil> number 1}} &&} {{{<nil> number 2} != {<nil> number 3}} &&}] &&} {{{<nil> text b} = {<nil> identifier a}} ||}]`},
{`((a=1 || a=2) && (c=1))`, false, `[{[{[{{{<nil> identifier a} = {<nil> number 1}} &&} {{{<nil> identifier a} = {<nil> number 2}} ||}] &&} {[{{{<nil> identifier c} = {<nil> number 1}} &&}] &&}] &&}]`},
// https://github.com/pocketbase/pocketbase/issues/5017
{`(a='"')`, false, `[{[{{{<nil> identifier a} = {<nil> text "}} &&}] &&}]`},
{`(a='\'')`, false, `[{[{{{<nil> identifier a} = {<nil> text '}} &&}] &&}]`},
{`(a="'")`, false, `[{[{{{<nil> identifier a} = {<nil> text '}} &&}] &&}]`},
{`(a="\"")`, false, `[{[{{{<nil> identifier a} = {<nil> text "}} &&}] &&}]`},
}
for i, scenario := range scenarios {
t.Run(fmt.Sprintf("s%d:%s", i, scenario.input), func(t *testing.T) {
v, err := Parse(scenario.input)
if scenario.expectedError && err == nil {
t.Fatalf("Expected error, got nil (%q)", scenario.input)
}
if !scenario.expectedError && err != nil {
t.Fatalf("Did not expect error, got %q (%q).", err, scenario.input)
}
vPrint := fmt.Sprintf("%v", v)
if vPrint != scenario.expectedPrint {
t.Fatalf("Expected %s, got %s", scenario.expectedPrint, vPrint)
}
})
}
}

679
scanner.go Normal file

@@ -0,0 +1,679 @@
package fexpr
import (
"bytes"
"fmt"
"strings"
"unicode/utf8"
)
// eof represents a marker rune for the end of the reader.
const eof = rune(0)
// JoinOp represents a join type operator.
type JoinOp string
// supported join type operators
const (
JoinAnd JoinOp = "&&"
JoinOr JoinOp = "||"
)
// SignOp represents an expression sign operator.
type SignOp string
// supported expression sign operators
const (
SignEq SignOp = "="
SignNeq SignOp = "!="
SignLike SignOp = "~"
SignNlike SignOp = "!~"
SignLt SignOp = "<"
SignLte SignOp = "<="
SignGt SignOp = ">"
SignGte SignOp = ">="
// array/any operators
SignAnyEq SignOp = "?="
SignAnyNeq SignOp = "?!="
SignAnyLike SignOp = "?~"
SignAnyNlike SignOp = "?!~"
SignAnyLt SignOp = "?<"
SignAnyLte SignOp = "?<="
SignAnyGt SignOp = "?>"
SignAnyGte SignOp = "?>="
)
// TokenType represents a Token type.
type TokenType string
// token type constants
const (
TokenUnexpected TokenType = "unexpected"
TokenEOF TokenType = "eof"
TokenWS TokenType = "whitespace"
TokenJoin TokenType = "join"
TokenSign TokenType = "sign"
TokenIdentifier TokenType = "identifier" // variable, column name, placeholder, etc.
TokenFunction TokenType = "function" // function
TokenNumber TokenType = "number"
TokenText TokenType = "text" // ' or " quoted string
TokenGroup TokenType = "group" // grouped/nested tokens
TokenComment TokenType = "comment"
)
// Token represents a single scanned literal (one or more combined runes).
type Token struct {
Meta interface{} // for function tokens holds the extracted arguments as a []Token slice
Type TokenType
Literal string
}
// NewScanner creates and returns a new scanner instance loaded with the specified data.
func NewScanner(data []byte) *Scanner {
return &Scanner{
data: data,
maxFuncDepth: 3,
}
}
// Scanner represents a filter and lexical scanner.
type Scanner struct {
data []byte
pos int
maxFuncDepth int
}
// Scan reads and returns the next available token value from the scanner's buffer.
func (s *Scanner) Scan() (Token, error) {
ch := s.read()
if ch == eof {
return Token{Type: TokenEOF, Literal: ""}, nil
}
if isWhitespaceRune(ch) {
s.unread()
return s.scanWhitespace()
}
if isGroupStartRune(ch) {
s.unread()
return s.scanGroup()
}
if isIdentifierStartRune(ch) {
s.unread()
return s.scanIdentifier(s.maxFuncDepth)
}
if isNumberStartRune(ch) {
s.unread()
return s.scanNumber()
}
if isTextStartRune(ch) {
s.unread()
return s.scanText(false)
}
if isSignStartRune(ch) {
s.unread()
return s.scanSign()
}
if isJoinStartRune(ch) {
s.unread()
return s.scanJoin()
}
if isCommentStartRune(ch) {
s.unread()
return s.scanComment()
}
return Token{Type: TokenUnexpected, Literal: string(ch)}, fmt.Errorf("unexpected character %q", ch)
}
// scanWhitespace consumes all contiguous whitespace runes.
func (s *Scanner) scanWhitespace() (Token, error) {
var buf bytes.Buffer
// Reads every subsequent whitespace character into the buffer.
// Non-whitespace runes and EOF will cause the loop to exit.
for {
ch := s.read()
if ch == eof {
break
}
if !isWhitespaceRune(ch) {
s.unread()
break
}
// write the whitespace rune
buf.WriteRune(ch)
}
return Token{Type: TokenWS, Literal: buf.String()}, nil
}
// scanNumber consumes all contiguous digit runes
// (complex numbers and scientific notations are not supported).
func (s *Scanner) scanNumber() (Token, error) {
var buf bytes.Buffer
var hadDot bool
// Read every subsequent digit rune into the buffer.
// Non-digit runes and EOF will cause the loop to exit.
for {
ch := s.read()
if ch == eof {
break
}
// not a digit rune
if !isDigitRune(ch) &&
// minus sign but not at the beginning
(ch != '-' || buf.Len() != 0) &&
// dot but there was already another dot
(ch != '.' || hadDot) {
s.unread()
break
}
// write the rune
buf.WriteRune(ch)
if ch == '.' {
hadDot = true
}
}
total := buf.Len()
literal := buf.String()
var err error
// only "-" or starts with "." or ends with "."
if (total == 1 && literal[0] == '-') || literal[0] == '.' || literal[total-1] == '.' {
err = fmt.Errorf("invalid number %q", literal)
}
return Token{Type: TokenNumber, Literal: buf.String()}, err
}
// scanText consumes all contiguous quoted text runes.
func (s *Scanner) scanText(preserveQuotes bool) (Token, error) {
var buf bytes.Buffer
// read the first rune to determine the quotes type
firstCh := s.read()
buf.WriteRune(firstCh)
var prevCh rune
var hasMatchingQuotes bool
// Read every subsequent text rune into the buffer.
// EOF and matching unescaped ending quote will cause the loop to exit.
for {
ch := s.read()
if ch == eof {
break
}
// write the text rune
buf.WriteRune(ch)
// unescaped matching quote, aka. the end
if ch == firstCh && prevCh != '\\' {
hasMatchingQuotes = true
break
}
prevCh = ch
}
literal := buf.String()
var err error
if !hasMatchingQuotes {
err = fmt.Errorf("invalid quoted text %q", literal)
} else if !preserveQuotes {
// unquote
literal = literal[1 : len(literal)-1]
// remove escaped quotes prefix (aka. \)
firstChStr := string(firstCh)
literal = strings.ReplaceAll(literal, `\`+firstChStr, firstChStr)
}
return Token{Type: TokenText, Literal: literal}, err
}
// scanComment consumes all contiguous single line comment runes until
// a newline character (\n) or EOF is reached.
func (s *Scanner) scanComment() (Token, error) {
var buf bytes.Buffer
// Read the first 2 characters without writing them to the buffer.
if !isCommentStartRune(s.read()) || !isCommentStartRune(s.read()) {
return Token{Type: TokenComment}, ErrInvalidComment
}
// Read every subsequent comment text rune into the buffer.
// \n and EOF will cause the loop to exit.
for i := 0; ; i++ {
ch := s.read()
if ch == eof || ch == '\n' {
break
}
buf.WriteRune(ch)
}
return Token{Type: TokenComment, Literal: strings.TrimSpace(buf.String())}, nil
}
// scanIdentifier consumes all contiguous ident runes.
func (s *Scanner) scanIdentifier(funcDepth int) (Token, error) {
var buf bytes.Buffer
// read the first rune in case it is a special start identifier character
buf.WriteRune(s.read())
// Read every subsequent identifier rune into the buffer.
// Non-ident runes and EOF will cause the loop to exit.
for {
ch := s.read()
if ch == eof {
break
}
// func
if ch == '(' {
funcName := buf.String()
if funcDepth <= 0 {
return Token{Type: TokenFunction, Literal: funcName}, fmt.Errorf("max nested function arguments reached (max: %d)", s.maxFuncDepth)
}
if !isValidIdentifier(funcName) {
return Token{Type: TokenFunction, Literal: funcName}, fmt.Errorf("invalid function name %q", funcName)
}
s.unread()
return s.scanFunctionArgs(funcName, funcDepth)
}
// not an identifier character
if !isLetterRune(ch) && !isDigitRune(ch) && !isIdentifierCombineRune(ch) && ch != '_' {
s.unread()
break
}
// write the identifier rune
buf.WriteRune(ch)
}
literal := buf.String()
var err error
if !isValidIdentifier(literal) {
err = fmt.Errorf("invalid identifier %q", literal)
}
return Token{Type: TokenIdentifier, Literal: literal}, err
}
// scanSign consumes all contiguous sign operator runes.
func (s *Scanner) scanSign() (Token, error) {
var buf bytes.Buffer
// Read every subsequent sign rune into the buffer.
// Non-sign runes and EOF will cause the loop to exit.
for {
ch := s.read()
if ch == eof {
break
}
if !isSignStartRune(ch) {
s.unread()
break
}
// write the sign rune
buf.WriteRune(ch)
}
literal := buf.String()
var err error
if !isSignOperator(literal) {
err = fmt.Errorf("invalid sign operator %q", literal)
}
return Token{Type: TokenSign, Literal: literal}, err
}
// scanJoin consumes all contiguous join operator runes.
func (s *Scanner) scanJoin() (Token, error) {
var buf bytes.Buffer
// Read every subsequent join operator rune into the buffer.
// Non-join runes and EOF will cause the loop to exit.
for {
ch := s.read()
if ch == eof {
break
}
if !isJoinStartRune(ch) {
s.unread()
break
}
// write the join operator rune
buf.WriteRune(ch)
}
literal := buf.String()
var err error
if !isJoinOperator(literal) {
err = fmt.Errorf("invalid join operator %q", literal)
}
return Token{Type: TokenJoin, Literal: literal}, err
}
// scanGroup consumes all runes within a group/parenthesis.
func (s *Scanner) scanGroup() (Token, error) {
var buf bytes.Buffer
// read the first group bracket without writing it to the buffer
firstChar := s.read()
openGroups := 1
// Read every subsequent group rune into the buffer.
// EOF and the matching closing bracket will cause the loop to exit.
for {
ch := s.read()
if ch == eof {
break
}
if isGroupStartRune(ch) {
// nested group
openGroups++
buf.WriteRune(ch)
} else if isTextStartRune(ch) {
s.unread()
t, err := s.scanText(true) // with quotes to preserve the exact text start/end runes
if err != nil {
// write the errored literal as it is
buf.WriteString(t.Literal)
return Token{Type: TokenGroup, Literal: buf.String()}, err
}
buf.WriteString(t.Literal)
} else if ch == ')' {
openGroups--
if openGroups <= 0 {
// main group end
break
} else {
buf.WriteRune(ch)
}
} else {
buf.WriteRune(ch)
}
}
literal := buf.String()
var err error
if !isGroupStartRune(firstChar) || openGroups > 0 {
err = fmt.Errorf("invalid formatted group - missing %d closing bracket(s)", openGroups)
}
return Token{Type: TokenGroup, Literal: literal}, err
}
// scanFunctionArgs consumes all contiguous function call runes to
// extract its arguments and returns a function token with the found
// Token arguments loaded in Token.Meta.
func (s *Scanner) scanFunctionArgs(funcName string, funcDepth int) (Token, error) {
var args []Token
var expectComma, isComma, isClosed bool
ch := s.read()
if ch != '(' {
return Token{Type: TokenFunction, Literal: funcName}, fmt.Errorf("invalid or incomplete function call %q", funcName)
}
// Read every subsequent rune until ')' or EOF has been reached.
for {
ch := s.read()
if ch == eof {
break
}
if ch == ')' {
isClosed = true
break
}
// skip whitespaces
if isWhitespaceRune(ch) {
_, err := s.scanWhitespace()
if err != nil {
return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("failed to scan whitespaces in function %q: %w", funcName, err)
}
continue
}
// skip comments
if isCommentStartRune(ch) {
s.unread()
_, err := s.scanComment()
if err != nil {
return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("failed to scan comment in function %q: %w", funcName, err)
}
continue
}
isComma = ch == ','
if expectComma && !isComma {
return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("expected comma after the last argument in function %q", funcName)
}
if !expectComma && isComma {
return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("unexpected comma in function %q", funcName)
}
expectComma = false // reset
if isComma {
continue
}
if isIdentifierStartRune(ch) {
s.unread()
t, err := s.scanIdentifier(funcDepth - 1)
if err != nil {
return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("invalid identifier argument %q in function %q: %w", t.Literal, funcName, err)
}
args = append(args, t)
expectComma = true
} else if isNumberStartRune(ch) {
s.unread()
t, err := s.scanNumber()
if err != nil {
return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("invalid number argument %q in function %q: %w", t.Literal, funcName, err)
}
args = append(args, t)
expectComma = true
} else if isTextStartRune(ch) {
s.unread()
t, err := s.scanText(false)
if err != nil {
return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("invalid text argument %q in function %q: %w", t.Literal, funcName, err)
}
args = append(args, t)
expectComma = true
} else {
return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("unsupported argument character %q in function %q", ch, funcName)
}
}
if !isClosed {
return Token{Type: TokenFunction, Literal: funcName, Meta: args}, fmt.Errorf("invalid or incomplete function %q (expected ')')", funcName)
}
return Token{Type: TokenFunction, Literal: funcName, Meta: args}, nil
}
// unread unreads the last character and reverts the position 1 step back.
func (s *Scanner) unread() {
if s.pos > 0 {
s.pos = s.pos - 1
}
}
// read reads the next rune and moves the position forward.
func (s *Scanner) read() rune {
if s.pos >= len(s.data) {
return eof
}
ch, n := utf8.DecodeRune(s.data[s.pos:])
s.pos += n
return ch
}
// Lexical helpers:
// -------------------------------------------------------------------
// isWhitespaceRune checks if a rune is a space, tab, or newline.
func isWhitespaceRune(ch rune) bool { return ch == ' ' || ch == '\t' || ch == '\n' }
// isLetterRune checks if a rune is a letter.
func isLetterRune(ch rune) bool {
return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z')
}
// isDigitRune checks if a rune is a digit.
func isDigitRune(ch rune) bool {
return (ch >= '0' && ch <= '9')
}
// isTextStartRune checks if a rune is a valid quoted text first character
// (aka. single or double quote).
func isTextStartRune(ch rune) bool {
return ch == '\'' || ch == '"'
}
// isNumberStartRune checks if a rune is a valid number start character (aka. digit).
func isNumberStartRune(ch rune) bool {
return ch == '-' || isDigitRune(ch)
}
// isSignStartRune checks if a rune is a valid sign operator start character.
func isSignStartRune(ch rune) bool {
return ch == '=' ||
ch == '?' ||
ch == '!' ||
ch == '>' ||
ch == '<' ||
ch == '~'
}
// isJoinStartRune checks if a rune is a valid join type start character.
func isJoinStartRune(ch rune) bool {
return ch == '&' || ch == '|'
}
// isGroupStartRune checks if a rune is a valid group/parenthesis start character.
func isGroupStartRune(ch rune) bool {
return ch == '('
}
// isCommentStartRune checks if a rune is a valid comment start character.
func isCommentStartRune(ch rune) bool {
return ch == '/'
}
// isIdentifierStartRune checks if a rune is a valid identifier's first character.
func isIdentifierStartRune(ch rune) bool {
return isLetterRune(ch) || isIdentifierSpecialStartRune(ch)
}
// isIdentifierSpecialStartRune checks if a rune is a valid identifier's first special character.
func isIdentifierSpecialStartRune(ch rune) bool {
return ch == '@' || ch == '_' || ch == '#'
}
// isIdentifierCombineRune checks if a rune is a valid identifier combine character.
func isIdentifierCombineRune(ch rune) bool {
return ch == '.' || ch == ':'
}
// isSignOperator checks if a literal is a valid sign operator.
func isSignOperator(literal string) bool {
switch SignOp(literal) {
case
SignEq,
SignNeq,
SignLt,
SignLte,
SignGt,
SignGte,
SignLike,
SignNlike,
SignAnyEq,
SignAnyNeq,
SignAnyLike,
SignAnyNlike,
SignAnyLt,
SignAnyLte,
SignAnyGt,
SignAnyGte:
return true
}
return false
}
// isJoinOperator checks if a literal is a valid join type operator.
func isJoinOperator(literal string) bool {
switch JoinOp(literal) {
case
JoinAnd,
JoinOr:
return true
}
return false
}
// isValidIdentifier validates the literal against common identifier requirements.
func isValidIdentifier(literal string) bool {
length := len(literal)
return (
// doesn't end with combine rune
!isIdentifierCombineRune(rune(literal[length-1])) &&
// is not just a special start rune
(length != 1 || !isIdentifierSpecialStartRune(rune(literal[0]))))
}

166
scanner_test.go Normal file

@@ -0,0 +1,166 @@
package fexpr
import (
"fmt"
"testing"
)
func TestNewScanner(t *testing.T) {
s := NewScanner([]byte("test"))
data := string(s.data)
if data != "test" {
t.Errorf("Expected the scanner reader data to be %q, got %q", "test", data)
}
}
func TestScannerScan(t *testing.T) {
type output struct {
error bool
print string
}
testScenarios := []struct {
text string
expects []output
}{
// whitespace
{" ", []output{{false, "{<nil> whitespace }"}}},
{"test 123", []output{{false, "{<nil> identifier test}"}, {false, "{<nil> whitespace }"}, {false, "{<nil> number 123}"}}},
// identifier
{`test`, []output{{false, `{<nil> identifier test}`}}},
{`@`, []output{{true, `{<nil> identifier @}`}}},
{`test:`, []output{{true, `{<nil> identifier test:}`}}},
{`test.`, []output{{true, `{<nil> identifier test.}`}}},
{`@test.123:c`, []output{{false, `{<nil> identifier @test.123:c}`}}},
{`_test_a.123`, []output{{false, `{<nil> identifier _test_a.123}`}}},
{`#test.123:456`, []output{{false, `{<nil> identifier #test.123:456}`}}},
{`.test.123`, []output{{true, `{<nil> unexpected .}`}, {false, `{<nil> identifier test.123}`}}},
{`:test.123`, []output{{true, `{<nil> unexpected :}`}, {false, `{<nil> identifier test.123}`}}},
{`test#@`, []output{{false, `{<nil> identifier test}`}, {true, `{<nil> identifier #}`}, {true, `{<nil> identifier @}`}}},
{`test'`, []output{{false, `{<nil> identifier test}`}, {true, `{<nil> text '}`}}},
{`test"d`, []output{{false, `{<nil> identifier test}`}, {true, `{<nil> text "d}`}}},
// number
{`123`, []output{{false, `{<nil> number 123}`}}},
{`-123`, []output{{false, `{<nil> number -123}`}}},
{`-123.456`, []output{{false, `{<nil> number -123.456}`}}},
{`123.456`, []output{{false, `{<nil> number 123.456}`}}},
{`12.34.56`, []output{{false, `{<nil> number 12.34}`}, {true, `{<nil> unexpected .}`}, {false, `{<nil> number 56}`}}},
{`.123`, []output{{true, `{<nil> unexpected .}`}, {false, `{<nil> number 123}`}}},
{`- 123`, []output{{true, `{<nil> number -}`}, {false, `{<nil> whitespace }`}, {false, `{<nil> number 123}`}}},
{`12-3`, []output{{false, `{<nil> number 12}`}, {false, `{<nil> number -3}`}}},
{`123.abc`, []output{{true, `{<nil> number 123.}`}, {false, `{<nil> identifier abc}`}}},
// text
{`""`, []output{{false, `{<nil> text }`}}},
{`''`, []output{{false, `{<nil> text }`}}},
{`'test'`, []output{{false, `{<nil> text test}`}}},
{`'te\'st'`, []output{{false, `{<nil> text te'st}`}}},
{`"te\"st"`, []output{{false, `{<nil> text te"st}`}}},
{`"tes@#,;!@#%^'\"t"`, []output{{false, `{<nil> text tes@#,;!@#%^'"t}`}}},
{`'tes@#,;!@#%^\'"t'`, []output{{false, `{<nil> text tes@#,;!@#%^'"t}`}}},
{`"test`, []output{{true, `{<nil> text "test}`}}},
{`'test`, []output{{true, `{<nil> text 'test}`}}},
{`'АБЦ`, []output{{true, `{<nil> text 'АБЦ}`}}},
// join types
{`&&||`, []output{{true, `{<nil> join &&||}`}}},
{`&& ||`, []output{{false, `{<nil> join &&}`}, {false, `{<nil> whitespace }`}, {false, `{<nil> join ||}`}}},
{`'||test&&'&&123`, []output{{false, `{<nil> text ||test&&}`}, {false, `{<nil> join &&}`}, {false, `{<nil> number 123}`}}},
// expression signs
{`=!=`, []output{{true, `{<nil> sign =!=}`}}},
{`= != ~ !~ > >= < <= ?= ?!= ?~ ?!~ ?> ?>= ?< ?<=`, []output{
{false, `{<nil> sign =}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign !=}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign ~}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign !~}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign >}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign >=}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign <}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign <=}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign ?=}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign ?!=}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign ?~}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign ?!~}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign ?>}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign ?>=}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign ?<}`},
{false, `{<nil> whitespace }`},
{false, `{<nil> sign ?<=}`},
}},
// comments
{`/ test`, []output{{true, `{<nil> comment }`}, {false, `{<nil> identifier test}`}}},
{`/ / test`, []output{{true, `{<nil> comment }`}, {true, `{<nil> comment }`}, {false, `{<nil> identifier test}`}}},
{`//`, []output{{false, `{<nil> comment }`}}},
{`//test`, []output{{false, `{<nil> comment test}`}}},
{`// test`, []output{{false, `{<nil> comment test}`}}},
{`// test1 //test2 `, []output{{false, `{<nil> comment test1 //test2}`}}},
{`///test`, []output{{false, `{<nil> comment /test}`}}},
// funcs
{`test()`, []output{{false, `{[] function test}`}}},
{`test(a, b`, []output{{true, `{[{<nil> identifier a} {<nil> identifier b}] function test}`}}},
{`@test:abc()`, []output{{false, `{[] function @test:abc}`}}},
{`test( a )`, []output{{false, `{[{<nil> identifier a}] function test}`}}}, // with whitespaces
{`test(a, b)`, []output{{false, `{[{<nil> identifier a} {<nil> identifier b}] function test}`}}},
{`test(a, b, )`, []output{{false, `{[{<nil> identifier a} {<nil> identifier b}] function test}`}}}, // single trailing comma
{`test(a,,)`, []output{{true, `{[{<nil> identifier a}] function test}`}, {true, `{<nil> unexpected )}`}}}, // unexpected trailing commas
{`test(a,,,b)`, []output{{true, `{[{<nil> identifier a}] function test}`}, {true, `{<nil> unexpected ,}`}, {false, `{<nil> identifier b}`}, {true, `{<nil> unexpected )}`}}}, // unexpected mid-args commas
{`test( @test.a.b:test , 123, "ab)c", 'd,ce', false)`, []output{{false, `{[{<nil> identifier @test.a.b:test} {<nil> number 123} {<nil> text ab)c} {<nil> text d,ce} {<nil> identifier false}] function test}`}}},
{"test(a //test)", []output{{true, `{[{<nil> identifier a}] function test}`}}}, // invalid simple comment
{"test(a //test\n)", []output{{false, `{[{<nil> identifier a}] function test}`}}}, // valid simple comment
{"test(a, //test\n, b)", []output{{true, `{[{<nil> identifier a}] function test}`}, {false, `{<nil> whitespace }`}, {false, `{<nil> identifier b}`}, {true, `{<nil> unexpected )}`}}},
{"test(a, //test\n b)", []output{{false, `{[{<nil> identifier a} {<nil> identifier b}] function test}`}}},
{"test(a, test(test(b), c), d)", []output{{false, `{[{<nil> identifier a} {[{[{<nil> identifier b}] function test} {<nil> identifier c}] function test} {<nil> identifier d}] function test}`}}},
// max funcs depth
{"a(b(c(1)))", []output{{false, `{[{[{[{<nil> number 1}] function c}] function b}] function a}`}}},
{"a(b(c(d(1))))", []output{{true, `{[] function a}`}, {false, `{<nil> number 1}`}, {true, `{<nil> unexpected )}`}, {true, `{<nil> unexpected )}`}, {true, `{<nil> unexpected )}`}, {true, `{<nil> unexpected )}`}}},
// groups/parenthesis
{`a)`, []output{{false, `{<nil> identifier a}`}, {true, `{<nil> unexpected )}`}}},
{`(a b c`, []output{{true, `{<nil> group a b c}`}}},
{`(a b c)`, []output{{false, `{<nil> group a b c}`}}},
{`((a b c))`, []output{{false, `{<nil> group (a b c)}`}}},
{`((a )b c))`, []output{{false, `{<nil> group (a )b c}`}, {true, `{<nil> unexpected )}`}}},
{`("ab)("c)`, []output{{false, `{<nil> group "ab)("c}`}}},
{`("ab)(c)`, []output{{true, `{<nil> group "ab)(c)}`}}},
{`( func(1, 2, 3, func(4)) a b c )`, []output{{false, `{<nil> group func(1, 2, 3, func(4)) a b c }`}}},
}
for _, scenario := range testScenarios {
t.Run(scenario.text, func(t *testing.T) {
s := NewScanner([]byte(scenario.text))
// scan the text tokens
for j, expect := range scenario.expects {
token, err := s.Scan()
hasErr := err != nil
if expect.error != hasErr {
t.Errorf("[%d] Expected hasErr %v, got %v: %v (%v)", j, expect.error, hasErr, err, token)
}
tokenPrint := fmt.Sprintf("%v", token)
if tokenPrint != expect.print {
t.Errorf("[%d] Expected token %s, got %s", j, expect.print, tokenPrint)
}
}
// the last remaining token should be the eof
lastToken, err := s.Scan()
if err != nil || lastToken.Type != TokenEOF {
t.Fatalf("Expected EOF token, got %v (%v)", lastToken, err)
}
})
}
}