
Merging upstream version 26.16.2.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-04-25 07:27:01 +02:00
parent f03ef3fd88
commit 1e2a8571aa
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
110 changed files with 62370 additions and 61414 deletions


@ -1,6 +1,101 @@
Changelog
=========
## [v26.16.1] - 2025-04-24
### :sparkles: New Features
- [`27a9fb2`](https://github.com/tobymao/sqlglot/commit/27a9fb26a1936512a09b8b09ed2656e22918f2c6) - **clickhouse**: Support parsing CTAS with alias *(PR [#5003](https://github.com/tobymao/sqlglot/pull/5003) by [@dorranh](https://github.com/dorranh))*
- [`45cd165`](https://github.com/tobymao/sqlglot/commit/45cd165eaca96b33f1de753a147bdc352b9d56d0) - **clickhouse**: Support ClickHouse Nothing type *(PR [#5004](https://github.com/tobymao/sqlglot/pull/5004) by [@dorranh](https://github.com/dorranh))*
- [`ca61a61`](https://github.com/tobymao/sqlglot/commit/ca61a617fa67082bc0fc94853dee4d70b8ca5c59) - Support exp.PartitionByProperty for parse_into() *(PR [#5006](https://github.com/tobymao/sqlglot/pull/5006) by [@erindru](https://github.com/erindru))*
- [`a6d4c3c`](https://github.com/tobymao/sqlglot/commit/a6d4c3c901f828cdd96a16a0e55eac1b244f63be) - **snowflake**: Add numeric parameter support *(PR [#5008](https://github.com/tobymao/sqlglot/pull/5008) by [@hovaesco](https://github.com/hovaesco))*
### :bug: Bug Fixes
- [`8e9dbd4`](https://github.com/tobymao/sqlglot/commit/8e9dbd491b9516c614554e05f05cc1cb976838e3) - **duckdb**: warn on unsupported IGNORE/RESPECT NULLS *(PR [#5002](https://github.com/tobymao/sqlglot/pull/5002) by [@georgesittas](https://github.com/georgesittas))*
- :arrow_lower_right: *fixes issue [#5001](https://github.com/tobymao/sqlglot/issues/5001) opened by [@MarcoGorelli](https://github.com/MarcoGorelli)*
- [`10b02bc`](https://github.com/tobymao/sqlglot/commit/10b02bce304042fea09e9cb2369db3c873452245) - **clickhouse**: Support optional timezone argument in date_diff() *(PR [#5005](https://github.com/tobymao/sqlglot/pull/5005) by [@dorranh](https://github.com/dorranh))*
### :wrench: Chores
- [`1d4d906`](https://github.com/tobymao/sqlglot/commit/1d4d906abc60d29b6606bc8eee50c92cef21d3fd) - use _try_parse for parsing ClickHouse's CREATE TABLE .. AS <table> *(commit by [@georgesittas](https://github.com/georgesittas))*
- [`fc58c27`](https://github.com/tobymao/sqlglot/commit/fc58c273690734263b971b138ec8f0186f524672) - Refactor placeholder parsing for TokenType.COLON *(PR [#5009](https://github.com/tobymao/sqlglot/pull/5009) by [@VaggelisD](https://github.com/VaggelisD))*
## [v26.16.0] - 2025-04-22
### :boom: BREAKING CHANGES
- due to [`510984f`](https://github.com/tobymao/sqlglot/commit/510984f2ddc6ff13b8a8030f698aed9ad0e6f46b) - stop generating redundant TO_DATE calls *(PR [#4990](https://github.com/tobymao/sqlglot/pull/4990) by [@georgesittas](https://github.com/georgesittas))*:
stop generating redundant TO_DATE calls (#4990)
- due to [`da9ec61`](https://github.com/tobymao/sqlglot/commit/da9ec61e8edd5049e246390e1b638cf14d50fa2d) - Fix pretty generation of exp.Window *(PR [#4994](https://github.com/tobymao/sqlglot/pull/4994) by [@VaggelisD](https://github.com/VaggelisD))*:
Fix pretty generation of exp.Window (#4994)
- due to [`fb83fac`](https://github.com/tobymao/sqlglot/commit/fb83fac2d097d8d3e8e2556c072792857609bd94) - remove recursion from `simplify` *(PR [#4988](https://github.com/tobymao/sqlglot/pull/4988) by [@georgesittas](https://github.com/georgesittas))*:
remove recursion from `simplify` (#4988)
- due to [`890b24a`](https://github.com/tobymao/sqlglot/commit/890b24a5cec269f5595743d0a86024a23217a3f1) - remove `connector_depth` as it is now dead code *(commit by [@georgesittas](https://github.com/georgesittas))*:
remove `connector_depth` as it is now dead code
- due to [`1dc501b`](https://github.com/tobymao/sqlglot/commit/1dc501b8ed68638375d869e11f3bf188948a4990) - remove `max_depth` argument in simplify as it is now dead code *(commit by [@georgesittas](https://github.com/georgesittas))*:
remove `max_depth` argument in simplify as it is now dead code
### :sparkles: New Features
- [`76535ce`](https://github.com/tobymao/sqlglot/commit/76535ce9487186d2eb7071fac2f224238de7a9ba) - **optimizer**: add support for Spark's TRANSFORM clause *(PR [#4993](https://github.com/tobymao/sqlglot/pull/4993) by [@georgesittas](https://github.com/georgesittas))*
- :arrow_lower_right: *addresses issue [#4991](https://github.com/tobymao/sqlglot/issues/4991) opened by [@karta0807913](https://github.com/karta0807913)*
### :bug: Bug Fixes
- [`510984f`](https://github.com/tobymao/sqlglot/commit/510984f2ddc6ff13b8a8030f698aed9ad0e6f46b) - **hive**: stop generating redundant TO_DATE calls *(PR [#4990](https://github.com/tobymao/sqlglot/pull/4990) by [@georgesittas](https://github.com/georgesittas))*
- [`da9ec61`](https://github.com/tobymao/sqlglot/commit/da9ec61e8edd5049e246390e1b638cf14d50fa2d) - **generator**: Fix pretty generation of exp.Window *(PR [#4994](https://github.com/tobymao/sqlglot/pull/4994) by [@VaggelisD](https://github.com/VaggelisD))*
- :arrow_lower_right: *fixes issue [#4098](https://github.com/TobikoData/sqlmesh/issues/4098) opened by [@tanghyd](https://github.com/tanghyd)*
- [`aae9aa8`](https://github.com/tobymao/sqlglot/commit/aae9aa8f96ccaa7686cda3cdabec208ae4c3d60a) - **optimizer**: ensure there are no shared refs after qualify_tables *(PR [#4995](https://github.com/tobymao/sqlglot/pull/4995) by [@georgesittas](https://github.com/georgesittas))*
- [`adaef42`](https://github.com/tobymao/sqlglot/commit/adaef42234d8f1c9c331f53bee2c42686f29bdec) - **trino**: Dont quote identifiers in string literals for the partitioned_by property *(PR [#4998](https://github.com/tobymao/sqlglot/pull/4998) by [@erindru](https://github.com/erindru))*
- [`a547f8d`](https://github.com/tobymao/sqlglot/commit/a547f8d4292f3b3a4c85f9d6466ead2ad976dfd2) - **postgres**: Capture optional minus sign in interval regex *(PR [#5000](https://github.com/tobymao/sqlglot/pull/5000) by [@VaggelisD](https://github.com/VaggelisD))*
- :arrow_lower_right: *fixes issue [#4999](https://github.com/tobymao/sqlglot/issues/4999) opened by [@cpimhoff](https://github.com/cpimhoff)*
### :recycle: Refactors
- [`fb83fac`](https://github.com/tobymao/sqlglot/commit/fb83fac2d097d8d3e8e2556c072792857609bd94) - **optimizer**: remove recursion from `simplify` *(PR [#4988](https://github.com/tobymao/sqlglot/pull/4988) by [@georgesittas](https://github.com/georgesittas))*
### :wrench: Chores
- [`890b24a`](https://github.com/tobymao/sqlglot/commit/890b24a5cec269f5595743d0a86024a23217a3f1) - remove `connector_depth` as it is now dead code *(commit by [@georgesittas](https://github.com/georgesittas))*
- [`1dc501b`](https://github.com/tobymao/sqlglot/commit/1dc501b8ed68638375d869e11f3bf188948a4990) - remove `max_depth` argument in simplify as it is now dead code *(commit by [@georgesittas](https://github.com/georgesittas))*
- [`6572517`](https://github.com/tobymao/sqlglot/commit/6572517c1ec76f14cbd661aacc15c84bef065284) - improve tooling around benchmarks *(commit by [@georgesittas](https://github.com/georgesittas))*
## [v26.15.0] - 2025-04-17
### :boom: BREAKING CHANGES
- due to [`2b7845a`](https://github.com/tobymao/sqlglot/commit/2b7845a3a821d366ae90ba9ef5e7d61194a34874) - Add support for Athena's Iceberg partitioning transforms *(PR [#4976](https://github.com/tobymao/sqlglot/pull/4976) by [@VaggelisD](https://github.com/VaggelisD))*:
Add support for Athena's Iceberg partitioning transforms (#4976)
- due to [`ee794e9`](https://github.com/tobymao/sqlglot/commit/ee794e9c6a3b2fdb142114327d904b6c94a16cd0) - use the standard POWER function instead of ^ fixes [#4982](https://github.com/tobymao/sqlglot/pull/4982) *(commit by [@georgesittas](https://github.com/georgesittas))*:
use the standard POWER function instead of ^ fixes #4982
- due to [`2369195`](https://github.com/tobymao/sqlglot/commit/2369195635e25dabd5ce26c13e402076508bba04) - consistently parse INTERVAL value as a string *(PR [#4986](https://github.com/tobymao/sqlglot/pull/4986) by [@georgesittas](https://github.com/georgesittas))*:
consistently parse INTERVAL value as a string (#4986)
- due to [`e866cff`](https://github.com/tobymao/sqlglot/commit/e866cffbaac3b62255d0d5c8be043ab2394af619) - support RELY option for PRIMARY KEY, FOREIGN KEY, and UNIQUE constraints *(PR [#4987](https://github.com/tobymao/sqlglot/pull/4987) by [@geooo109](https://github.com/geooo109))*:
support RELY option for PRIMARY KEY, FOREIGN KEY, and UNIQUE constraints (#4987)
### :sparkles: New Features
- [`e866cff`](https://github.com/tobymao/sqlglot/commit/e866cffbaac3b62255d0d5c8be043ab2394af619) - **parser**: support RELY option for PRIMARY KEY, FOREIGN KEY, and UNIQUE constraints *(PR [#4987](https://github.com/tobymao/sqlglot/pull/4987) by [@geooo109](https://github.com/geooo109))*
- :arrow_lower_right: *addresses issue [#4983](https://github.com/tobymao/sqlglot/issues/4983) opened by [@ggadon](https://github.com/ggadon)*
### :bug: Bug Fixes
- [`2b7845a`](https://github.com/tobymao/sqlglot/commit/2b7845a3a821d366ae90ba9ef5e7d61194a34874) - Add support for Athena's Iceberg partitioning transforms *(PR [#4976](https://github.com/tobymao/sqlglot/pull/4976) by [@VaggelisD](https://github.com/VaggelisD))*
- [`fa6af23`](https://github.com/tobymao/sqlglot/commit/fa6af2302f8482c5d89ead481afe4195aaa41a9c) - **optimizer**: compare the whole type to determine if a cast can be removed *(PR [#4981](https://github.com/tobymao/sqlglot/pull/4981) by [@georgesittas](https://github.com/georgesittas))*
- :arrow_lower_right: *fixes issue [#4977](https://github.com/tobymao/sqlglot/issues/4977) opened by [@MeinAccount](https://github.com/MeinAccount)*
- [`830c9b8`](https://github.com/tobymao/sqlglot/commit/830c9b8bbf906cf5d4fa8028b67dadda73fc58a9) - **unnest_subqueries**: avoid adding GROUP BY on aggregate projections in lateral subqueries *(PR [#4970](https://github.com/tobymao/sqlglot/pull/4970) by [@skadel](https://github.com/skadel))*
- [`ee794e9`](https://github.com/tobymao/sqlglot/commit/ee794e9c6a3b2fdb142114327d904b6c94a16cd0) - **postgres**: use the standard POWER function instead of ^ fixes [#4982](https://github.com/tobymao/sqlglot/pull/4982) *(commit by [@georgesittas](https://github.com/georgesittas))*
- [`85e62b8`](https://github.com/tobymao/sqlglot/commit/85e62b88df2822797f527dce4eaa230c778cbe9e) - **bigquery**: Do not consume JOIN keywords after WITH OFFSET *(PR [#4984](https://github.com/tobymao/sqlglot/pull/4984) by [@VaggelisD](https://github.com/VaggelisD))*
- [`2369195`](https://github.com/tobymao/sqlglot/commit/2369195635e25dabd5ce26c13e402076508bba04) - consistently parse INTERVAL value as a string *(PR [#4986](https://github.com/tobymao/sqlglot/pull/4986) by [@georgesittas](https://github.com/georgesittas))*
## [v26.14.0] - 2025-04-15
### :boom: BREAKING CHANGES
- due to [`cb20038`](https://github.com/tobymao/sqlglot/commit/cb2003875fc6e149bd4a631e99c312a04435a46b) - treat GO as command *(PR [#4978](https://github.com/tobymao/sqlglot/pull/4978) by [@georgesittas](https://github.com/georgesittas))*:
@ -6358,3 +6453,6 @@ Changelog
[v26.13.1]: https://github.com/tobymao/sqlglot/compare/v26.13.0...v26.13.1
[v26.13.2]: https://github.com/tobymao/sqlglot/compare/v26.13.1...v26.13.2
[v26.14.0]: https://github.com/tobymao/sqlglot/compare/v26.13.2...v26.14.0
[v26.15.0]: https://github.com/tobymao/sqlglot/compare/v26.14.0...v26.15.0
[v26.16.0]: https://github.com/tobymao/sqlglot/compare/v26.15.0...v26.16.0
[v26.16.1]: https://github.com/tobymao/sqlglot/compare/v26.16.0...v26.16.1


@ -4,7 +4,10 @@ install:
	pip install -e .
bench: install-dev-rs-release
-	python benchmarks/bench.py
+	python -m benchmarks.bench
+bench-optimize: install-dev-rs-release
+	python -m benchmarks.optimize
install-dev-rs-release:
	cd sqlglotrs/ && python -m maturin develop -r


@ -533,6 +533,10 @@ make check # Full test suite & linter checks
| long | 0.00889 (1.0) | 0.00572 (0.643) | 0.36982 (41.56) | 0.00614 (0.690) | 0.02530 (2.844) | 0.02931 (3.294) | 0.00059 (0.066) |
| crazy | 0.02918 (1.0) | 0.01991 (0.682) | 1.88695 (64.66) | 0.02003 (0.686) | 7.46894 (255.9) | 0.64994 (22.27) | 0.00327 (0.112) |
```
make bench # Run parsing benchmark
make bench-optimize # Run optimization benchmark
```
## Optional Dependencies


@ -1,6 +1,6 @@
import collections.abc
-from helpers import ascii_table
+from benchmarks.helpers import ascii_table

# moz_sql_parser 3.10 compatibility
collections.Iterable = collections.abc.Iterable


@ -1,12 +1,12 @@
+import sys
import typing as t
from argparse import ArgumentParser
-from helpers import ascii_table
+from benchmarks.helpers import ascii_table
from sqlglot.optimizer import optimize
from sqlglot import parse_one
from tests.helpers import load_sql_fixture_pairs, TPCH_SCHEMA, TPCDS_SCHEMA
from timeit import Timer
-import sys

# Deeply nested conditions currently require a lot of recursion
sys.setrecursionlimit(10000)

File diff suppressed because one or more lines are too long (this notice was repeated for many of the changed files)

@ -62,6 +62,7 @@ dialect implementations in order to understand how their various components can
"""
import importlib
+import threading

DIALECTS = [
    "Athena",
@ -104,11 +105,14 @@ MODULE_BY_ATTRIBUTE = {
__all__ = list(MODULE_BY_ATTRIBUTE)

+_import_lock = threading.Lock()

def __getattr__(name):
    module_name = MODULE_BY_ATTRIBUTE.get(name)
    if module_name:
-        module = importlib.import_module(f"sqlglot.dialects.{module_name}")
+        with _import_lock:
+            module = importlib.import_module(f"sqlglot.dialects.{module_name}")
        return getattr(module, name)
    raise AttributeError(f"module {__name__} has no attribute {name}")
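Illustrative note (not part of the upstream diff): the `__getattr__` hook above resolves dialect classes lazily on first attribute access, so the new `_import_lock` only serializes that first import. A minimal sketch of the access pattern it guards:

```python
# Sketch only: the first access triggers importlib.import_module(...) under _import_lock;
# later accesses reuse the already-imported module.
from sqlglot import dialects

print(dialects.ClickHouse)  # lazily imports sqlglot.dialects.clickhouse on first use
```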


@ -103,6 +103,7 @@ def _datetime_delta_sql(name: str) -> t.Callable[[Generator, DATEΤΙΜΕ_DELTA]
            unit_to_var(expression),
            expression.expression,
            expression.this,
+            expression.args.get("zone"),
        )

    return _delta_sql
@ -260,6 +261,7 @@ class ClickHouse(Dialect):
            "LOWCARDINALITY": TokenType.LOWCARDINALITY,
            "MAP": TokenType.MAP,
            "NESTED": TokenType.NESTED,
+            "NOTHING": TokenType.NOTHING,
            "SAMPLE": TokenType.TABLE_SAMPLE,
            "TUPLE": TokenType.STRUCT,
            "UINT16": TokenType.USMALLINT,
@ -301,8 +303,8 @@ class ClickHouse(Dialect):
            "COUNTIF": _build_count_if,
            "DATE_ADD": build_date_delta(exp.DateAdd, default_unit=None),
            "DATEADD": build_date_delta(exp.DateAdd, default_unit=None),
-            "DATE_DIFF": build_date_delta(exp.DateDiff, default_unit=None),
-            "DATEDIFF": build_date_delta(exp.DateDiff, default_unit=None),
+            "DATE_DIFF": build_date_delta(exp.DateDiff, default_unit=None, supports_timezone=True),
+            "DATEDIFF": build_date_delta(exp.DateDiff, default_unit=None, supports_timezone=True),
            "DATE_FORMAT": _build_date_format,
            "DATE_SUB": build_date_delta(exp.DateSub, default_unit=None),
            "DATESUB": build_date_delta(exp.DateSub, default_unit=None),
@ -1018,6 +1020,7 @@ class ClickHouse(Dialect):
            exp.DataType.Type.LOWCARDINALITY: "LowCardinality",
            exp.DataType.Type.MAP: "Map",
            exp.DataType.Type.NESTED: "Nested",
+            exp.DataType.Type.NOTHING: "Nothing",
            exp.DataType.Type.SMALLINT: "Int16",
            exp.DataType.Type.STRUCT: "Tuple",
            exp.DataType.Type.TINYINT: "Int8",


@ -27,6 +27,12 @@ class Databricks(Spark):
    class JSONPathTokenizer(jsonpath.JSONPathTokenizer):
        IDENTIFIERS = ["`", '"']

+    class Tokenizer(Spark.Tokenizer):
+        KEYWORDS = {
+            **Spark.Tokenizer.KEYWORDS,
+            "VOID": TokenType.VOID,
+        }

    class Parser(Spark.Parser):
        LOG_DEFAULTS_TO_LN = True
        STRICT_CAST = True
@ -83,6 +89,11 @@ class Databricks(Spark):
        TRANSFORMS.pop(exp.TryCast)

+        TYPE_MAPPING = {
+            **Spark.Generator.TYPE_MAPPING,
+            exp.DataType.Type.NULL: "VOID",
+        }

        def columndef_sql(self, expression: exp.ColumnDef, sep: str = " ") -> str:
            constraint = expression.find(exp.GeneratedAsIdentityColumnConstraint)
            kind = expression.kind
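Illustrative note (not part of the upstream diff), mirroring the accompanying test changes: Databricks' VOID keyword now maps to the internal NULL data type and only renders back as VOID for Databricks.

```python
# Sketch based on the new test_databricks assertions.
from sqlglot import exp

null_type = exp.DataType.build("VOID", dialect="databricks")
print(null_type.sql())              # NULL
print(null_type.sql("databricks"))  # VOID
```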


@ -1238,15 +1238,20 @@ def build_date_delta(
    exp_class: t.Type[E],
    unit_mapping: t.Optional[t.Dict[str, str]] = None,
    default_unit: t.Optional[str] = "DAY",
+    supports_timezone: bool = False,
) -> t.Callable[[t.List], E]:
    def _builder(args: t.List) -> E:
-        unit_based = len(args) == 3
+        unit_based = len(args) >= 3
+        has_timezone = len(args) == 4
        this = args[2] if unit_based else seq_get(args, 0)
        unit = None
        if unit_based or default_unit:
            unit = args[0] if unit_based else exp.Literal.string(default_unit)
            unit = exp.var(unit_mapping.get(unit.name.lower(), unit.name)) if unit_mapping else unit
-        return exp_class(this=this, expression=seq_get(args, 1), unit=unit)
+        expression = exp_class(this=this, expression=seq_get(args, 1), unit=unit)
+        if supports_timezone and has_timezone:
+            expression.set("zone", args[-1])
+        return expression

    return _builder
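Illustrative note (not part of the upstream diff), based on the new ClickHouse tests: a fourth argument to DATE_DIFF/DATEDIFF is now captured as the "zone" arg and survives a round trip.

```python
# Sketch only: parse a 4-arg ClickHouse DATEDIFF and regenerate it.
from sqlglot import parse_one

sql = "SELECT DATEDIFF(SECOND, 1, bar, 'UTC')"
print(parse_one(sql, read="clickhouse").sql("clickhouse"))
# Expected (per the updated tests): SELECT DATE_DIFF(SECOND, 1, bar, 'UTC')
```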


@ -47,14 +47,6 @@ DATETIME_DELTA = t.Union[
    exp.DateAdd, exp.TimeAdd, exp.DatetimeAdd, exp.TsOrDsAdd, exp.DateSub, exp.DatetimeSub
]

-WINDOW_FUNCS_WITH_IGNORE_NULLS = (
-    exp.FirstValue,
-    exp.LastValue,
-    exp.Lag,
-    exp.Lead,
-    exp.NthValue,
-)

def _date_delta_sql(self: DuckDB.Generator, expression: DATETIME_DELTA) -> str:
    this = expression.this
@ -879,6 +871,14 @@ class DuckDB(Dialect):
        PROPERTIES_LOCATION[exp.TemporaryProperty] = exp.Properties.Location.POST_CREATE
        PROPERTIES_LOCATION[exp.ReturnsProperty] = exp.Properties.Location.POST_ALIAS

+        IGNORE_RESPECT_NULLS_WINDOW_FUNCTIONS = (
+            exp.FirstValue,
+            exp.Lag,
+            exp.LastValue,
+            exp.Lead,
+            exp.NthValue,
+        )
+
        def show_sql(self, expression: exp.Show) -> str:
            return f"SHOW {expression.name}"
@ -1098,11 +1098,21 @@ class DuckDB(Dialect):
            return super().unnest_sql(expression)

        def ignorenulls_sql(self, expression: exp.IgnoreNulls) -> str:
-            if isinstance(expression.this, WINDOW_FUNCS_WITH_IGNORE_NULLS):
+            if isinstance(expression.this, self.IGNORE_RESPECT_NULLS_WINDOW_FUNCTIONS):
                # DuckDB should render IGNORE NULLS only for the general-purpose
                # window functions that accept it e.g. FIRST_VALUE(... IGNORE NULLS) OVER (...)
                return super().ignorenulls_sql(expression)
+
+            self.unsupported("IGNORE NULLS is not supported for non-window functions.")
+            return self.sql(expression, "this")
+
+        def respectnulls_sql(self, expression: exp.RespectNulls) -> str:
+            if isinstance(expression.this, self.IGNORE_RESPECT_NULLS_WINDOW_FUNCTIONS):
+                # DuckDB should render RESPECT NULLS only for the general-purpose
+                # window functions that accept it e.g. FIRST_VALUE(... RESPECT NULLS) OVER (...)
+                return super().respectnulls_sql(expression)
+
+            self.unsupported("RESPECT NULLS is not supported for non-window functions.")
            return self.sql(expression, "this")

        def arraytostring_sql(self, expression: exp.ArrayToString) -> str:
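Illustrative note (not part of the upstream diff), matching the updated BigQuery-to-DuckDB tests: RESPECT/IGNORE NULLS on a non-window aggregate is now dropped with an "unsupported" warning when generating DuckDB SQL.

```python
# Sketch only: transpile a BigQuery aggregate that uses RESPECT NULLS into DuckDB.
from sqlglot import transpile

print(transpile("SELECT SUM(x RESPECT NULLS) AS x", read="bigquery", write="duckdb")[0])
# Expected (per the updated tests): SELECT SUM(x) AS x
```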


@ -62,6 +62,13 @@ TIME_DIFF_FACTOR = {
DIFF_MONTH_SWITCH = ("YEAR", "QUARTER", "MONTH")

+TS_OR_DS_EXPRESSIONS = (
+    exp.DateDiff,
+    exp.Day,
+    exp.Month,
+    exp.Year,
+)

def _add_date_sql(self: Hive.Generator, expression: DATE_ADD_OR_SUB) -> str:
    if isinstance(expression, exp.TsOrDsAdd) and not expression.unit:
@ -167,7 +174,7 @@ def _to_date_sql(self: Hive.Generator, expression: exp.TsOrDsToDate) -> str:
    if time_format and time_format not in (Hive.TIME_FORMAT, Hive.DATE_FORMAT):
        return self.func("TO_DATE", expression.this, time_format)
-    if isinstance(expression.this, exp.TsOrDsToDate):
+    if isinstance(expression.parent, TS_OR_DS_EXPRESSIONS):
        return self.sql(expression, "this")
    return self.func("TO_DATE", expression.this)


@ -57,11 +57,13 @@ def _no_sort_array(self: Presto.Generator, expression: exp.SortArray) -> str:
def _schema_sql(self: Presto.Generator, expression: exp.Schema) -> str:
    if isinstance(expression.parent, exp.PartitionedByProperty):
+        # Any columns in the ARRAY[] string literals should not be quoted
+        expression.transform(lambda n: n.name if isinstance(n, exp.Identifier) else n, copy=False)
        partition_exprs = [
            self.sql(c) if isinstance(c, (exp.Func, exp.Property)) else self.sql(c, "this")
            for c in expression.expressions
        ]
        return self.sql(exp.Array(expressions=[exp.Literal.string(c) for c in partition_exprs]))

    if expression.parent:


@ -401,6 +401,8 @@ class Snowflake(Dialect):
        TABLE_ALIAS_TOKENS = parser.Parser.TABLE_ALIAS_TOKENS | {TokenType.WINDOW}
        TABLE_ALIAS_TOKENS.discard(TokenType.MATCH_CONDITION)

+        COLON_PLACEHOLDER_TOKENS = ID_VAR_TOKENS | {TokenType.NUMBER}
+
        FUNCTIONS = {
            **parser.Parser.FUNCTIONS,
            "APPROX_PERCENTILE": exp.ApproxQuantile.from_arg_list,


@ -9,7 +9,7 @@ from sqlglot.errors import ExecuteError
from sqlglot.executor.context import Context
from sqlglot.executor.env import ENV
from sqlglot.executor.table import RowReader, Table
-from sqlglot.helper import csv_reader, ensure_list, subclasses
+from sqlglot.helper import csv_reader, subclasses

class PythonExecutor:
@ -370,8 +370,8 @@ def _rename(self, e):
                return self.func(e.key, *values)

            if isinstance(e, exp.Func) and e.is_var_len_args:
-                *head, tail = values
-                return self.func(e.key, *head, *ensure_list(tail))
+                args = itertools.chain.from_iterable(x if isinstance(x, list) else [x] for x in values)
+                return self.func(e.key, *args)

            return self.func(e.key, *values)
    except Exception as ex:


@ -4533,6 +4533,7 @@ class DataType(Expression):
        NAME = auto()
        NCHAR = auto()
        NESTED = auto()
+        NOTHING = auto()
        NULL = auto()
        NUMMULTIRANGE = auto()
        NUMRANGE = auto()
@ -5752,7 +5753,7 @@ class DateSub(Func, IntervalOp):
class DateDiff(Func, TimeUnit):
    _sql_names = ["DATEDIFF", "DATE_DIFF"]
-    arg_types = {"this": True, "expression": True, "unit": False}
+    arg_types = {"this": True, "expression": True, "unit": False, "zone": False}

class DateTrunc(Func):
@ -7865,7 +7866,7 @@ def parse_identifier(name: str | Identifier, dialect: DialectType = None) -> Ide
    return expression

-INTERVAL_STRING_RE = re.compile(r"\s*([0-9]+)\s*([a-zA-Z]+)\s*")
+INTERVAL_STRING_RE = re.compile(r"\s*(-?[0-9]+)\s*([a-zA-Z]+)\s*")

def to_interval(interval: str | Literal) -> Interval:
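Illustrative note (not part of the upstream diff): with the leading minus sign now captured by INTERVAL_STRING_RE, negative interval strings can be handled by helpers built on it. A hedged sketch, exact output formatting may differ:

```python
# Sketch only: to_interval() relies on INTERVAL_STRING_RE to split value and unit.
from sqlglot import exp

print(exp.to_interval("-1 day").sql())  # roughly: INTERVAL '-1' DAY
```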


@ -2782,7 +2782,9 @@ class Generator(metaclass=_Generator):
        if not partition and not order and not spec and alias:
            return f"{this} {alias}"

-        args = " ".join(arg for arg in (alias, first, partition, order, spec) if arg)
+        args = self.format_args(
+            *[arg for arg in (alias, first, partition, order, spec) if arg], sep=" "
+        )
        return f"{this} ({args})"

    def partition_by_sql(self, expression: exp.Window | exp.MatchRecognize) -> str:


@ -211,16 +211,31 @@ def while_changing(expression: Expression, func: t.Callable[[Expression], E]) ->
    Returns:
        The transformed expression.
    """
-    while True:
-        for n in reversed(tuple(expression.walk())):
-            n._hash = hash(n)
-
-        start = hash(expression)
-        expression = func(expression)
-
-        for n in expression.walk():
-            n._hash = None
-        if start == hash(expression):
-            break
+    end_hash: t.Optional[int] = None
+
+    while True:
+        # No need to walk the AST we've already cached the hashes in the previous iteration
+        if end_hash is None:
+            for n in reversed(tuple(expression.walk())):
+                n._hash = hash(n)
+
+        start_hash = hash(expression)
+        expression = func(expression)
+
+        expression_nodes = tuple(expression.walk())
+
+        # Uncache previous caches so we can recompute them
+        for n in reversed(expression_nodes):
+            n._hash = None
+            n._hash = hash(n)
+
+        end_hash = hash(expression)
+
+        if start_hash == end_hash:
+            # ... and reset the hash so we don't risk it becoming out of date if a mutation happens
+            for n in expression_nodes:
+                n._hash = None
+
+            break

    return expression


@ -5,7 +5,7 @@ import typing as t
from collections import defaultdict

from sqlglot import expressions as exp
-from sqlglot.helper import find_new_name
+from sqlglot.helper import find_new_name, seq_get
from sqlglot.optimizer.scope import Scope, traverse_scope

if t.TYPE_CHECKING:
@ -217,6 +217,7 @@ def _mergeable(
        and not _is_a_window_expression_in_unmergable_operation()
        and not _is_recursive()
        and not (inner_select.args.get("order") and outer_scope.is_union)
+        and not isinstance(seq_get(inner_select.expressions, 0), exp.QueryTransform)
    )


@ -5,6 +5,7 @@ from sqlglot.optimizer.qualify_columns import Resolver
from sqlglot.optimizer.scope import Scope, traverse_scope
from sqlglot.schema import ensure_schema
from sqlglot.errors import OptimizeError
+from sqlglot.helper import seq_get

# Sentinel value that means an outer query selecting ALL columns
SELECT_ALL = object()
@ -92,7 +93,13 @@ def pushdown_projections(expression, schema=None, remove_unused_selections=True)
        # Push the selected columns down to the next scope
        for name, (node, source) in scope.selected_sources.items():
            if isinstance(source, Scope):
-                columns = {SELECT_ALL} if scope.pivots else selects.get(name) or set()
+                select = seq_get(source.expression.selects, 0)
+                if scope.pivots or isinstance(select, exp.QueryTransform):
+                    columns = {SELECT_ALL}
+                else:
+                    columns = selects.get(name) or set()
+
                referenced_columns[source].update(columns)

                column_aliases = node.alias_column_names


@ -770,7 +770,7 @@ def qualify_outputs(scope_or_expression: Scope | exp.Expression) -> None:
    for i, (selection, aliased_column) in enumerate(
        itertools.zip_longest(scope.expression.selects, scope.outer_columns)
    ):
-        if selection is None:
+        if selection is None or isinstance(selection, exp.QueryTransform):
            break

        if isinstance(selection, exp.Subquery):
@ -787,7 +787,7 @@ def qualify_outputs(scope_or_expression: Scope | exp.Expression) -> None:
        new_selections.append(selection)

-    if isinstance(scope.expression, exp.Select):
+    if new_selections and isinstance(scope.expression, exp.Select):
        scope.expression.set("expressions", new_selections)
@ -945,7 +945,14 @@ class Resolver:
                else:
                    columns = set_op.named_selects
            else:
-                columns = source.expression.named_selects
+                select = seq_get(source.expression.selects, 0)
+                if isinstance(select, exp.QueryTransform):
+                    # https://spark.apache.org/docs/3.5.1/sql-ref-syntax-qry-select-transform.html
+                    schema = select.args.get("schema")
+                    columns = [c.name for c in schema.expressions] if schema else ["key", "value"]
+                else:
+                    columns = source.expression.named_selects

            node, _ = self.scope.selected_sources.get(name) or (None, None)
            if isinstance(node, Scope):


@ -54,10 +54,10 @@ def qualify_tables(
    def _qualify(table: exp.Table) -> None:
        if isinstance(table.this, exp.Identifier):
-            if not table.args.get("db"):
-                table.set("db", db)
-            if not table.args.get("catalog") and table.args.get("db"):
-                table.set("catalog", catalog)
+            if db and not table.args.get("db"):
+                table.set("db", db.copy())
+            if catalog and not table.args.get("catalog") and table.args.get("db"):
+                table.set("catalog", catalog.copy())

    if (db or catalog) and not isinstance(expression, exp.Query):
        for node in expression.walk(prune=lambda n: isinstance(n, exp.Query)):
@ -148,6 +148,7 @@ def qualify_tables(
                if table_alias:
                    for p in exp.COLUMN_PARTS[1:]:
                        column.set(p, None)
-                    column.set("table", table_alias)
+
+                    column.set("table", table_alias.copy())

    return expression


@ -40,7 +40,6 @@ def simplify(
    expression: exp.Expression,
    constant_propagation: bool = False,
    dialect: DialectType = None,
-    max_depth: t.Optional[int] = None,
):
    """
    Rewrite sqlglot AST to simplify expressions.
@ -54,114 +53,99 @@ def simplify(
    Args:
        expression: expression to simplify
        constant_propagation: whether the constant propagation rule should be used
-        max_depth: Chains of Connectors (AND, OR, etc) exceeding `max_depth` will be skipped

    Returns:
        sqlglot.Expression: simplified expression
    """
    dialect = Dialect.get_or_raise(dialect)
def _simplify(expression, root=True): def _simplify(expression):
if ( pre_transformation_stack = [expression]
max_depth post_transformation_stack = []
and isinstance(expression, exp.Connector)
and not isinstance(expression.parent, exp.Connector)
):
depth = connector_depth(expression)
if depth > max_depth:
logger.info(
f"Skipping simplification because connector depth {depth} exceeds max {max_depth}"
)
return expression
if expression.meta.get(FINAL): while pre_transformation_stack:
return expression node = pre_transformation_stack.pop()
# group by expressions cannot be simplified, for example if node.meta.get(FINAL):
# select x + 1 + 1 FROM y GROUP BY x + 1 + 1 continue
# the projection must exactly match the group by key
group = expression.args.get("group")
if group and hasattr(expression, "selects"): # group by expressions cannot be simplified, for example
groups = set(group.expressions) # select x + 1 + 1 FROM y GROUP BY x + 1 + 1
group.meta[FINAL] = True # the projection must exactly match the group by key
group = node.args.get("group")
for e in expression.selects: if group and hasattr(node, "selects"):
for node in e.walk(): groups = set(group.expressions)
if node in groups: group.meta[FINAL] = True
e.meta[FINAL] = True
break
having = expression.args.get("having") for s in node.selects:
if having: for n in s.walk():
for node in having.walk(): if n in groups:
if node in groups: s.meta[FINAL] = True
having.meta[FINAL] = True break
break
# Pre-order transformations having = node.args.get("having")
node = expression if having:
node = rewrite_between(node) for n in having.walk():
node = uniq_sort(node, root) if n in groups:
node = absorb_and_eliminate(node, root) having.meta[FINAL] = True
node = simplify_concat(node) break
node = simplify_conditionals(node)
if constant_propagation: parent = node.parent
node = propagate_constants(node, root) root = node is expression
exp.replace_children(node, lambda e: _simplify(e, False)) new_node = rewrite_between(node)
new_node = uniq_sort(new_node, root)
new_node = absorb_and_eliminate(new_node, root)
new_node = simplify_concat(new_node)
new_node = simplify_conditionals(new_node)
# Post-order transformations if constant_propagation:
node = simplify_not(node) new_node = propagate_constants(new_node, root)
node = flatten(node)
node = simplify_connectors(node, root)
node = remove_complements(node, root)
node = simplify_coalesce(node, dialect)
node.parent = expression.parent
node = simplify_literals(node, root)
node = simplify_equality(node)
node = simplify_parens(node)
node = simplify_datetrunc(node, dialect)
node = sort_comparison(node)
node = simplify_startswith(node)
if root: if new_node is not node:
expression.replace(node) node.replace(new_node)
return node
pre_transformation_stack.extend(
n for n in new_node.iter_expressions(reverse=True) if not n.meta.get(FINAL)
)
post_transformation_stack.append((new_node, parent))
while post_transformation_stack:
node, parent = post_transformation_stack.pop()
root = node is expression
# Resets parent, arg_key, index pointers this is needed because some of the
# previous transformations mutate the AST, leading to an inconsistent state
for k, v in tuple(node.args.items()):
node.set(k, v)
# Post-order transformations
new_node = simplify_not(node)
new_node = flatten(new_node)
new_node = simplify_connectors(new_node, root)
new_node = remove_complements(new_node, root)
new_node = simplify_coalesce(new_node, dialect)
new_node.parent = parent
new_node = simplify_literals(new_node, root)
new_node = simplify_equality(new_node)
new_node = simplify_parens(new_node)
new_node = simplify_datetrunc(new_node, dialect)
new_node = sort_comparison(new_node)
new_node = simplify_startswith(new_node)
if new_node is not node:
node.replace(new_node)
return new_node
    expression = while_changing(expression, _simplify)
    remove_where_true(expression)
    return expression
def connector_depth(expression: exp.Expression) -> int:
"""
Determine the maximum depth of a tree of Connectors.
For example:
>>> from sqlglot import parse_one
>>> connector_depth(parse_one("a AND b AND c AND d"))
3
"""
stack = deque([(expression, 0)])
max_depth = 0
while stack:
expression, depth = stack.pop()
if not isinstance(expression, exp.Connector):
continue
depth += 1
max_depth = max(depth, max_depth)
stack.append((expression.left, depth))
stack.append((expression.right, depth))
return max_depth
def catch(*exceptions):
    """Decorator that ignores a simplification function if any of `exceptions` are raised"""


@ -397,6 +397,7 @@ class Parser(metaclass=_Parser):
        TokenType.IMAGE,
        TokenType.VARIANT,
        TokenType.VECTOR,
+        TokenType.VOID,
        TokenType.OBJECT,
        TokenType.OBJECT_IDENTIFIER,
        TokenType.INET,
@ -405,6 +406,7 @@
        TokenType.IPV4,
        TokenType.IPV6,
        TokenType.UNKNOWN,
+        TokenType.NOTHING,
        TokenType.NULL,
        TokenType.NAME,
        TokenType.TDIGEST,
@ -579,6 +581,8 @@
    ALIAS_TOKENS = ID_VAR_TOKENS

+    COLON_PLACEHOLDER_TOKENS = ID_VAR_TOKENS
+
    ARRAY_CONSTRUCTORS = {
        "ARRAY": exp.Array,
        "LIST": exp.List,
@ -799,6 +803,7 @@
        exp.Order: lambda self: self._parse_order(),
        exp.Ordered: lambda self: self._parse_ordered(),
        exp.Properties: lambda self: self._parse_properties(),
+        exp.PartitionedByProperty: lambda self: self._parse_partitioned_by(),
        exp.Qualify: lambda self: self._parse_qualify(),
        exp.Returning: lambda self: self._parse_returning(),
        exp.Select: lambda self: self._parse_select(),
@ -900,7 +905,7 @@
        TokenType.PARAMETER: lambda self: self._parse_parameter(),
        TokenType.COLON: lambda self: (
            self.expression(exp.Placeholder, this=self._prev.text)
-            if self._match_set(self.ID_VAR_TOKENS)
+            if self._match_set(self.COLON_PLACEHOLDER_TOKENS)
            else None
        ),
    }
@ -1999,7 +2004,7 @@ class Parser(metaclass=_Parser):
        # exp.Properties.Location.POST_SCHEMA and POST_WITH
        extend_props(self._parse_properties())

-        self._match(TokenType.ALIAS)
+        has_alias = self._match(TokenType.ALIAS)
        if not self._match_set(self.DDL_SELECT_TOKENS, advance=False):
            # exp.Properties.Location.POST_ALIAS
            extend_props(self._parse_properties())
@ -2010,6 +2015,11 @@
            else:
                expression = self._parse_ddl_select()

+            # Some dialects also support using a table as an alias instead of a SELECT.
+            # Here we fallback to this as an alternative.
+            if not expression and has_alias:
+                expression = self._try_parse(self._parse_table_parts)
+
            if create_token.token_type == TokenType.TABLE:
                # exp.Properties.Location.POST_EXPRESSION
                extend_props(self._parse_properties())
@ -5229,6 +5239,8 @@
                this = self.expression(exp.DataType, this=self.expression(exp.Interval, unit=unit))
            else:
                this = self.expression(exp.DataType, this=exp.DataType.Type.INTERVAL)
+        elif type_token == TokenType.VOID:
+            this = exp.DataType(this=exp.DataType.Type.NULL)

        if maybe_func and check_func:
            index2 = self._index
@ -7416,7 +7428,7 @@
        if self._match_text_seq("WITH", "SYNC", "MODE") or self._match_text_seq(
            "WITH", "ASYNC", "MODE"
        ):
-            mode = f"WITH {self._tokens[self._index-2].text.upper()} MODE"
+            mode = f"WITH {self._tokens[self._index - 2].text.upper()} MODE"
        else:
            mode = None


@ -222,6 +222,7 @@ class TokenType(AutoName):
    UNKNOWN = auto()
    VECTOR = auto()
    DYNAMIC = auto()
+    VOID = auto()

    # keywords
    ALIAS = auto()
@ -333,6 +334,7 @@
    MODEL = auto()
    NATURAL = auto()
    NEXT = auto()
+    NOTHING = auto()
    NOTNULL = auto()
    NULL = auto()
    OBJECT_IDENTIFIER = auto()


@ -276,15 +276,17 @@ class TestAthena(Validator):
exp.FileFormatProperty(this=exp.Literal.string("parquet")), exp.FileFormatProperty(this=exp.Literal.string("parquet")),
exp.LocationProperty(this=exp.Literal.string("s3://foo")), exp.LocationProperty(this=exp.Literal.string("s3://foo")),
exp.PartitionedByProperty( exp.PartitionedByProperty(
this=exp.Schema(expressions=[exp.to_column("partition_col")]) this=exp.Schema(expressions=[exp.to_column("partition_col", quoted=True)])
), ),
] ]
), ),
expression=exp.select("1"), expression=exp.select("1"),
) )
# Even if identify=True, the column names should not be quoted within the string literals in the partitioned_by ARRAY[]
self.assertEqual( self.assertEqual(
ctas_hive.sql(dialect=self.dialect, identify=True), ctas_hive.sql(dialect=self.dialect, identify=True),
"CREATE TABLE \"foo\".\"bar\" WITH (format='parquet', external_location='s3://foo', partitioned_by=ARRAY['\"partition_col\"']) AS SELECT 1", "CREATE TABLE \"foo\".\"bar\" WITH (format='parquet', external_location='s3://foo', partitioned_by=ARRAY['partition_col']) AS SELECT 1",
) )
self.assertEqual( self.assertEqual(
ctas_hive.sql(dialect=self.dialect, identify=False), ctas_hive.sql(dialect=self.dialect, identify=False),
@ -303,7 +305,8 @@ class TestAthena(Validator):
expressions=[ expressions=[
exp.to_column("partition_col"), exp.to_column("partition_col"),
exp.PartitionedByBucket( exp.PartitionedByBucket(
this=exp.to_column("a"), expression=exp.Literal.number(4) this=exp.to_column("a", quoted=True),
expression=exp.Literal.number(4),
), ),
] ]
) )
@ -312,11 +315,25 @@ class TestAthena(Validator):
), ),
expression=exp.select("1"), expression=exp.select("1"),
) )
# Even if identify=True, the column names should not be quoted within the string literals in the partitioning ARRAY[]
# Technically Trino's Iceberg connector does support quoted column names in the string literals but its undocumented
# so we dont do it to keep consistency with the Hive connector
self.assertEqual( self.assertEqual(
ctas_iceberg.sql(dialect=self.dialect, identify=True), ctas_iceberg.sql(dialect=self.dialect, identify=True),
"CREATE TABLE \"foo\".\"bar\" WITH (table_type='iceberg', location='s3://foo', partitioning=ARRAY['\"partition_col\"', 'BUCKET(\"a\", 4)']) AS SELECT 1", "CREATE TABLE \"foo\".\"bar\" WITH (table_type='iceberg', location='s3://foo', partitioning=ARRAY['partition_col', 'BUCKET(a, 4)']) AS SELECT 1",
) )
self.assertEqual( self.assertEqual(
ctas_iceberg.sql(dialect=self.dialect, identify=False), ctas_iceberg.sql(dialect=self.dialect, identify=False),
"CREATE TABLE foo.bar WITH (table_type='iceberg', location='s3://foo', partitioning=ARRAY['partition_col', 'BUCKET(a, 4)']) AS SELECT 1", "CREATE TABLE foo.bar WITH (table_type='iceberg', location='s3://foo', partitioning=ARRAY['partition_col', 'BUCKET(a, 4)']) AS SELECT 1",
) )
def test_parse_partitioned_by_returns_iceberg_transforms(self):
# check that parse_into works for PartitionedByProperty and also that correct AST nodes are emitted for Iceberg transforms
parsed = self.parse_one(
"(a, bucket(4, b), truncate(3, c), month(d))", into=exp.PartitionedByProperty
)
assert isinstance(parsed, exp.PartitionedByProperty)
assert isinstance(parsed.this, exp.Schema)
assert next(n for n in parsed.this.expressions if isinstance(n, exp.PartitionedByBucket))
assert next(n for n in parsed.this.expressions if isinstance(n, exp.PartitionByTruncate))


@ -448,14 +448,13 @@ LANGUAGE js AS
            "SELECT SUM(x RESPECT NULLS) AS x",
            read={
                "bigquery": "SELECT SUM(x RESPECT NULLS) AS x",
-                "duckdb": "SELECT SUM(x RESPECT NULLS) AS x",
                "postgres": "SELECT SUM(x) RESPECT NULLS AS x",
                "spark": "SELECT SUM(x) RESPECT NULLS AS x",
                "snowflake": "SELECT SUM(x) RESPECT NULLS AS x",
            },
            write={
                "bigquery": "SELECT SUM(x RESPECT NULLS) AS x",
-                "duckdb": "SELECT SUM(x RESPECT NULLS) AS x",
+                "duckdb": "SELECT SUM(x) AS x",
                "postgres": "SELECT SUM(x) RESPECT NULLS AS x",
                "spark": "SELECT SUM(x) RESPECT NULLS AS x",
                "snowflake": "SELECT SUM(x) RESPECT NULLS AS x",
@ -465,7 +464,7 @@ LANGUAGE js AS
            "SELECT PERCENTILE_CONT(x, 0.5 RESPECT NULLS) OVER ()",
            write={
                "bigquery": "SELECT PERCENTILE_CONT(x, 0.5 RESPECT NULLS) OVER ()",
-                "duckdb": "SELECT QUANTILE_CONT(x, 0.5 RESPECT NULLS) OVER ()",
+                "duckdb": "SELECT QUANTILE_CONT(x, 0.5) OVER ()",
                "spark": "SELECT PERCENTILE_CONT(x, 0.5) RESPECT NULLS OVER ()",
            },
        )


@ -739,6 +739,12 @@ class TestClickhouse(Validator):
with self.subTest(f"Casting to ClickHouse {data_type}"):
    self.validate_identity(f"SELECT CAST(val AS {data_type})")
def test_nothing_type(self):
data_types = ["Nothing", "Nullable(Nothing)"]
for data_type in data_types:
with self.subTest(f"Casting to ClickHouse {data_type}"):
self.validate_identity(f"SELECT CAST(val AS {data_type})")
def test_aggregate_function_column_with_any_keyword(self):
    # Regression test for https://github.com/tobymao/sqlglot/issues/4723
    self.validate_all(
@ -766,6 +772,17 @@ ORDER BY (
    pretty=True,
)
def test_create_table_as_alias(self):
ctas_alias = "CREATE TABLE my_db.my_table AS another_db.another_table"
expected = exp.Create(
this=exp.to_table("my_db.my_table"),
kind="TABLE",
expression=exp.to_table("another_db.another_table"),
)
self.assertEqual(self.parse_one(ctas_alias), expected)
self.validate_identity(ctas_alias)
def test_ddl(self):
    db_table_expr = exp.Table(this=None, db=exp.to_identifier("foo"), catalog=None)
    create_with_cluster = exp.Create(
@ -1220,6 +1237,15 @@ LIFETIME(MIN 0 MAX 0)""",
f"SELECT {func_alias}(SECOND, 1, bar)",
f"SELECT {func_name}(SECOND, 1, bar)",
)
# 4-arg functions of type <func>(unit, value, date, timezone)
for func in (("DATE_DIFF", "DATEDIFF"),):
func_name = func[0]
for func_alias in func:
with self.subTest(f"Test 4-arg date-time function {func_alias}"):
self.validate_identity(
f"SELECT {func_alias}(SECOND, 1, bar, 'UTC')",
f"SELECT {func_name}(SECOND, 1, bar, 'UTC')",
)
def test_convert(self):
    self.assertEqual(


@ -7,6 +7,12 @@ class TestDatabricks(Validator):
    dialect = "databricks"

    def test_databricks(self):
+        null_type = exp.DataType.build("VOID", dialect="databricks")
+        self.assertEqual(null_type.sql(), "NULL")
+        self.assertEqual(null_type.sql("databricks"), "VOID")
+        self.validate_identity("SELECT CAST(NULL AS VOID)")
+        self.validate_identity("SELECT void FROM t")
+
        self.validate_identity("SELECT * FROM stream")
        self.validate_identity("SELECT t.current_time FROM t")
        self.validate_identity("ALTER TABLE labels ADD COLUMN label_score FLOAT")
@ -89,7 +95,7 @@ class TestDatabricks(Validator):
        self.validate_all(
            "CREATE TABLE foo (x INT GENERATED ALWAYS AS (YEAR(y)))",
            write={
-                "databricks": "CREATE TABLE foo (x INT GENERATED ALWAYS AS (YEAR(TO_DATE(y))))",
+                "databricks": "CREATE TABLE foo (x INT GENERATED ALWAYS AS (YEAR(y)))",
                "tsql": "CREATE TABLE foo (x AS YEAR(CAST(y AS DATE)))",
            },
        )

Some files were not shown because too many files have changed in this diff.