
Merging upstream version 24.1.0.

Signed-off-by: Daniel Baumann <daniel@debian.org>
Daniel Baumann 2025-02-13 21:37:09 +01:00
parent 9689eb837b
commit d5706efe6b
Signed by: daniel
GPG key ID: FBB4F0E80A80222F
70 changed files with 55134 additions and 50721 deletions

.gitpod.yml (new file, 13 lines added)
View file

@ -0,0 +1,13 @@
image: gitpod/workspace-python-3.11
tasks:
- name: sqlglot
init: |
python -m venv .venv
source .venv/bin/activate
make install-dev
command: |
clear
source .venv/bin/activate

View file

@ -1,6 +1,94 @@
Changelog
=========
## [v24.0.3] - 2024-05-29
### :bug: Bug Fixes
- [`fb8db9f`](https://github.com/tobymao/sqlglot/commit/fb8db9f2219cfd578fda5c3f51737c180d5aecc6) - **parser**: edge case where TYPE_CONVERTERS leads to type instead of column *(PR [#3566](https://github.com/tobymao/sqlglot/pull/3566) by [@georgesittas](https://github.com/georgesittas))*
- :arrow_lower_right: *fixes issue [#3565](https://github.com/tobymao/sqlglot/issues/3565) opened by [@galunto](https://github.com/galunto)*
- [`aac8570`](https://github.com/tobymao/sqlglot/commit/aac85705c43edfcd1ebb552573f496c14dce519b) - use index2 instead of self._index in _parse_type index difference *(commit by [@georgesittas](https://github.com/georgesittas))*
## [v24.0.2] - 2024-05-28
### :sparkles: New Features
- [`078471d`](https://github.com/tobymao/sqlglot/commit/078471d3643da418c91b71dc7bfce5453b924028) - **mysql,doris**: improve transpilation of INTERVAL (plural to singular) *(PR [#3543](https://github.com/tobymao/sqlglot/pull/3543) by [@Toms1999](https://github.com/Toms1999))*
- [`fe56e64`](https://github.com/tobymao/sqlglot/commit/fe56e64aff775002c52843b6b9df973d96349400) - **postgres**: add support for col int[size] column def syntax *(PR [#3548](https://github.com/tobymao/sqlglot/pull/3548) by [@georgesittas](https://github.com/georgesittas))*
- :arrow_lower_right: *addresses issue [#3544](https://github.com/tobymao/sqlglot/issues/3544) opened by [@judahrand](https://github.com/judahrand)*
- :arrow_lower_right: *addresses issue [#3545](https://github.com/tobymao/sqlglot/issues/3545) opened by [@judahrand](https://github.com/judahrand)*
- [`188dce8`](https://github.com/tobymao/sqlglot/commit/188dce8ae98f23b5741882c698109563445f11f6) - **snowflake**: add support for WITH-prefixed column constraints *(PR [#3549](https://github.com/tobymao/sqlglot/pull/3549) by [@georgesittas](https://github.com/georgesittas))*
- :arrow_lower_right: *addresses issue [#3537](https://github.com/tobymao/sqlglot/issues/3537) opened by [@barino86](https://github.com/barino86)*
- [`712d247`](https://github.com/tobymao/sqlglot/commit/712d24704f1be9e54fd6385d6fdbd05173b007aa) - add support for ALTER COLUMN DROP NOT NULL *(PR [#3550](https://github.com/tobymao/sqlglot/pull/3550) by [@noklam](https://github.com/noklam))*
- :arrow_lower_right: *addresses issue [#3534](https://github.com/tobymao/sqlglot/issues/3534) opened by [@barino86](https://github.com/barino86)*
- [`7c323bd`](https://github.com/tobymao/sqlglot/commit/7c323bde83f1804d7a1e98fcf94e6832385a03d6) - add option in schema's find method to ensure types are DataTypes *(PR [#3560](https://github.com/tobymao/sqlglot/pull/3560) by [@georgesittas](https://github.com/georgesittas))*
### :bug: Bug Fixes
- [`1a8a16b`](https://github.com/tobymao/sqlglot/commit/1a8a16b459c7fe20fc2c689ad601b5beac57a206) - **clickhouse**: improve struct type parsing *(PR [#3547](https://github.com/tobymao/sqlglot/pull/3547) by [@georgesittas](https://github.com/georgesittas))*
- :arrow_lower_right: *fixes issue [#3546](https://github.com/tobymao/sqlglot/issues/3546) opened by [@cpcloud](https://github.com/cpcloud)*
- [`970d3b0`](https://github.com/tobymao/sqlglot/commit/970d3b03750d58ec236ce205bc250616e1fb1349) - **postgres**: setting un-suffixed FLOAT as DOUBLE ([#3551](https://github.com/tobymao/sqlglot/pull/3551)) *(PR [#3552](https://github.com/tobymao/sqlglot/pull/3552) by [@sandband](https://github.com/sandband))*
- :arrow_lower_right: *fixes issue [#3551](https://github.com/tobymao/sqlglot/issues/3551) opened by [@sandband](https://github.com/sandband)*
- [`e1a9a8b`](https://github.com/tobymao/sqlglot/commit/e1a9a8b6b7fbd44e62cee626540f90425d22d50c) - **redshift**: add support for MINUS operator [#3553](https://github.com/tobymao/sqlglot/pull/3553) *(PR [#3555](https://github.com/tobymao/sqlglot/pull/3555) by [@sandband](https://github.com/sandband))*
- [`beb0269`](https://github.com/tobymao/sqlglot/commit/beb0269b39e848897eaf56e1966d342db72e5c7c) - **tsql**: adapt TimeStrToTime to avoid superfluous casts *(PR [#3558](https://github.com/tobymao/sqlglot/pull/3558) by [@Themiscodes](https://github.com/Themiscodes))*
- [`eae3c51`](https://github.com/tobymao/sqlglot/commit/eae3c5165c16b61c7b524a55776bdb1127005c7d) - use regex to split interval strings *(PR [#3556](https://github.com/tobymao/sqlglot/pull/3556) by [@georgesittas](https://github.com/georgesittas))*
- :arrow_lower_right: *fixes issue [#3554](https://github.com/tobymao/sqlglot/issues/3554) opened by [@kevinjqiu](https://github.com/kevinjqiu)*
### :recycle: Refactors
- [`a67de5f`](https://github.com/tobymao/sqlglot/commit/a67de5faaa88c1fb5d9857a69c9df06506520cbc) - get rid of redundant dict_depth check in schema find *(PR [#3561](https://github.com/tobymao/sqlglot/pull/3561) by [@georgesittas](https://github.com/georgesittas))*
- [`89a8984`](https://github.com/tobymao/sqlglot/commit/89a8984b8db3817d934b4395e190f3848b1ee77a) - move UNESCAPED_SEQUENCES out of the _Dialect metaclass *(commit by [@georgesittas](https://github.com/georgesittas))*
### :wrench: Chores
- [`893addf`](https://github.com/tobymao/sqlglot/commit/893addf9d07602ec3a77097f38d696b6760c6038) - add SET NOT NULL test *(commit by [@georgesittas](https://github.com/georgesittas))*
## [v24.0.1] - 2024-05-23
### :boom: BREAKING CHANGES
- due to [`80c622e`](https://github.com/tobymao/sqlglot/commit/80c622e0c252ef3be9e469c1cf116c1cd4eaef94) - add reserved keywords fixes [#3526](https://github.com/tobymao/sqlglot/pull/3526) *(commit by [@georgesittas](https://github.com/georgesittas))*:
add reserved keywords fixes #3526
### :sparkles: New Features
- [`a255610`](https://github.com/tobymao/sqlglot/commit/a2556101c8d04907ae49252def84c55d2daf78b2) - add StringToArray expression (postgres), improve its transpilation *(commit by [@georgesittas](https://github.com/georgesittas))*
- [`8f46d48`](https://github.com/tobymao/sqlglot/commit/8f46d48d4ef4e6be022aff5739992f149519c19d) - **redshift**: transpile SPLIT_TO_STRING *(commit by [@georgesittas](https://github.com/georgesittas))*
### :bug: Bug Fixes
- [`80c622e`](https://github.com/tobymao/sqlglot/commit/80c622e0c252ef3be9e469c1cf116c1cd4eaef94) - **doris**: add reserved keywords fixes [#3526](https://github.com/tobymao/sqlglot/pull/3526) *(commit by [@georgesittas](https://github.com/georgesittas))*
- [`ebf5fc7`](https://github.com/tobymao/sqlglot/commit/ebf5fc70d8936b5e1522a3ae1b9e231cefe49623) - **hive**: generate correct names for weekofyear, dayofmonth, dayofweek *(PR [#3533](https://github.com/tobymao/sqlglot/pull/3533) by [@oshyun](https://github.com/oshyun))*
- :arrow_lower_right: *fixes issue [#3532](https://github.com/tobymao/sqlglot/issues/3532) opened by [@oshyun](https://github.com/oshyun)*
- [`3fe3c2c`](https://github.com/tobymao/sqlglot/commit/3fe3c2c0a3e5f465a0c62261c5a0ba6faf8f0846) - **parser**: make _parse_type less aggressive, only parse column as last resort *(PR [#3541](https://github.com/tobymao/sqlglot/pull/3541) by [@georgesittas](https://github.com/georgesittas))*
- :arrow_lower_right: *fixes issue [#3539](https://github.com/tobymao/sqlglot/issues/3539) opened by [@crash-g](https://github.com/crash-g)*
- :arrow_lower_right: *fixes issue [#3540](https://github.com/tobymao/sqlglot/issues/3540) opened by [@crash-g](https://github.com/crash-g)*
- [`8afff02`](https://github.com/tobymao/sqlglot/commit/8afff028977593789abe31c6168a93b7e32ac890) - **tsql**: preserve REPLICATE roundtrip *(commit by [@georgesittas](https://github.com/georgesittas))*
## [v24.0.0] - 2024-05-21
### :boom: BREAKING CHANGES
- due to [`a077f17`](https://github.com/tobymao/sqlglot/commit/a077f17d10200980769ff69dd9044c95d6d718f2) - add reserved keywords *(PR [#3525](https://github.com/tobymao/sqlglot/pull/3525) by [@georgesittas](https://github.com/georgesittas))*:
add reserved keywords (#3525)
### :sparkles: New Features
- [`d958bba`](https://github.com/tobymao/sqlglot/commit/d958bba8494b8bca9cf3ffef0384690bafd78393) - **snowflake**: add support for CREATE WAREHOUSE *(PR [#3510](https://github.com/tobymao/sqlglot/pull/3510) by [@yingw787](https://github.com/yingw787))*
- :arrow_lower_right: *addresses issue [#3502](https://github.com/tobymao/sqlglot/issues/3502) opened by [@yingw787](https://github.com/yingw787)*
- [`2105300`](https://github.com/tobymao/sqlglot/commit/21053004dbb4c6dc3bcb078c4ab93f267e2c63b2) - **databricks**: Enable hex string literals *(PR [#3522](https://github.com/tobymao/sqlglot/pull/3522) by [@VaggelisD](https://github.com/VaggelisD))*
- :arrow_lower_right: *addresses issue [#3521](https://github.com/tobymao/sqlglot/issues/3521) opened by [@aersam](https://github.com/aersam)*
- [`1ef3bb6`](https://github.com/tobymao/sqlglot/commit/1ef3bb6ab49eff66a50c4d3983f19292b6979e98) - **snowflake**: Add support for `CREATE STREAMLIT` *(PR [#3519](https://github.com/tobymao/sqlglot/pull/3519) by [@yingw787](https://github.com/yingw787))*
- :arrow_lower_right: *addresses issue [#3516](https://github.com/tobymao/sqlglot/issues/3516) opened by [@yingw787](https://github.com/yingw787)*
### :bug: Bug Fixes
- [`5cecbfa`](https://github.com/tobymao/sqlglot/commit/5cecbfa63a770c4d623f4a5f76d1a7a5f59d087d) - unnest identifier closes [#3512](https://github.com/tobymao/sqlglot/pull/3512) *(commit by [@tobymao](https://github.com/tobymao))*
- [`33ab353`](https://github.com/tobymao/sqlglot/commit/33ab3536d68203f4fceee63507b5c73076d48ed7) - **snowflake**: parse certain DB_CREATABLES as identifiers *(commit by [@georgesittas](https://github.com/georgesittas))*
- [`d468f92`](https://github.com/tobymao/sqlglot/commit/d468f92a16decabdf847d7de19f82d65d1939d92) - **doris**: dont generate arrows for JSONExtract* closes [#3513](https://github.com/tobymao/sqlglot/pull/3513) *(commit by [@georgesittas](https://github.com/georgesittas))*
- [`bfb9f98`](https://github.com/tobymao/sqlglot/commit/bfb9f983d35e080ec1f8c171a65d576af873c0ea) - **postgres**: parse @> into ArrayContainsAll, improve transpilation *(PR [#3515](https://github.com/tobymao/sqlglot/pull/3515) by [@georgesittas](https://github.com/georgesittas))*
- :arrow_lower_right: *fixes issue [#3511](https://github.com/tobymao/sqlglot/issues/3511) opened by [@Toms1999](https://github.com/Toms1999)*
- [`4def45b`](https://github.com/tobymao/sqlglot/commit/4def45bb553f6fbc65dcf0fa3d6e8c3f5ec000ea) - make UDF DDL property parsing more lenient closes [#3517](https://github.com/tobymao/sqlglot/pull/3517) *(commit by [@georgesittas](https://github.com/georgesittas))*
- [`a077f17`](https://github.com/tobymao/sqlglot/commit/a077f17d10200980769ff69dd9044c95d6d718f2) - **mysql**: add reserved keywords *(PR [#3525](https://github.com/tobymao/sqlglot/pull/3525) by [@georgesittas](https://github.com/georgesittas))*
- :arrow_lower_right: *fixes issue [#3520](https://github.com/tobymao/sqlglot/issues/3520) opened by [@Toms1999](https://github.com/Toms1999)*
- :arrow_lower_right: *fixes issue [#3524](https://github.com/tobymao/sqlglot/issues/3524) opened by [@Toms1999](https://github.com/Toms1999)*
### :wrench: Chores
- [`358f30c`](https://github.com/tobymao/sqlglot/commit/358f30cc02959275c53a2ee9eccde04ddc6a74a5) - remove redundant postgres JSONB token mapping *(commit by [@georgesittas](https://github.com/georgesittas))*
## [v23.17.0] - 2024-05-19
### :boom: BREAKING CHANGES
- due to [`77d21d9`](https://github.com/tobymao/sqlglot/commit/77d21d9379c3f130b803ea651ec3d36256bb84a4) - parse : operator as JSONExtract (similar to Snowflake) *(PR [#3508](https://github.com/tobymao/sqlglot/pull/3508) by [@georgesittas](https://github.com/georgesittas))*:
@ -3700,3 +3788,7 @@ Changelog
[v23.15.10]: https://github.com/tobymao/sqlglot/compare/v23.15.9...v23.15.10
[v23.16.0]: https://github.com/tobymao/sqlglot/compare/v23.15.10...v23.16.0
[v23.17.0]: https://github.com/tobymao/sqlglot/compare/v23.16.0...v23.17.0
[v24.0.0]: https://github.com/tobymao/sqlglot/compare/v23.17.0...v24.0.0
[v24.0.1]: https://github.com/tobymao/sqlglot/compare/v24.0.0...v24.0.1
[v24.0.2]: https://github.com/tobymao/sqlglot/compare/v24.0.1...v24.0.2
[v24.0.3]: https://github.com/tobymao/sqlglot/compare/v24.0.2...v24.0.3

File diff suppressed because one or more lines are too long

View file

@ -39,7 +39,6 @@
<h2>Submodules</h2>
<ul>
<li><a href="sqlglot/dataframe.html">dataframe</a></li>
<li><a href="sqlglot/dialects.html">dialects</a></li>
<li><a href="sqlglot/diff.html">diff</a></li>
<li><a href="sqlglot/errors.html">errors</a></li>
@ -196,6 +195,12 @@
<li>SQLGlot does not aim to be a SQL validator - it is designed to be very forgiving. This makes the codebase more comprehensive and also gives more flexibility to its users, e.g. by allowing them to include trailing commas in their projection lists.</li>
</ul>
<p>What happened to sqlglot.dataframe?</p>
<ul>
<li>The PySpark dataframe api was moved to a standalone library called <a href="https://github.com/eakmanrq/sqlframe">sqlframe</a> in v24. It now allows you to run queries as opposed to just generate SQL.</li>
</ul>
<h2 id="examples">Examples</h2>
<h3 id="formatting-and-transpiling">Formatting and Transpiling</h3>
@ -640,6 +645,7 @@
<li><a href="https://github.com/pinterest/querybook">Querybook</a></li>
<li><a href="https://github.com/marsupialtail/quokka">Quokka</a></li>
<li><a href="https://github.com/moj-analytical-services/splink">Splink</a></li>
<li><a href="https://github.com/eakmanrq/sqlframe">sqlframe</a></li>
</ul>
<h2 id="documentation">Documentation</h2>

View file

@ -76,8 +76,8 @@
</span><span id="L-12"><a href="#L-12"><span class="linenos">12</span></a><span class="n">__version_tuple__</span><span class="p">:</span> <span class="n">VERSION_TUPLE</span>
</span><span id="L-13"><a href="#L-13"><span class="linenos">13</span></a><span class="n">version_tuple</span><span class="p">:</span> <span class="n">VERSION_TUPLE</span>
</span><span id="L-14"><a href="#L-14"><span class="linenos">14</span></a>
</span><span id="L-15"><a href="#L-15"><span class="linenos">15</span></a><span class="n">__version__</span> <span class="o">=</span> <span class="n">version</span> <span class="o">=</span> <span class="s1">&#39;23.17.0&#39;</span>
</span><span id="L-16"><a href="#L-16"><span class="linenos">16</span></a><span class="n">__version_tuple__</span> <span class="o">=</span> <span class="n">version_tuple</span> <span class="o">=</span> <span class="p">(</span><span class="mi">23</span><span class="p">,</span> <span class="mi">17</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span>
</span><span id="L-15"><a href="#L-15"><span class="linenos">15</span></a><span class="n">__version__</span> <span class="o">=</span> <span class="n">version</span> <span class="o">=</span> <span class="s1">&#39;24.0.3&#39;</span>
</span><span id="L-16"><a href="#L-16"><span class="linenos">16</span></a><span class="n">__version_tuple__</span> <span class="o">=</span> <span class="n">version_tuple</span> <span class="o">=</span> <span class="p">(</span><span class="mi">24</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">3</span><span class="p">)</span>
</span></pre></div>
@ -97,7 +97,7 @@
<section id="version">
<div class="attr variable">
<span class="name">version</span><span class="annotation">: str</span> =
<span class="default_value">&#39;23.17.0&#39;</span>
<span class="default_value">&#39;24.0.3&#39;</span>
</div>
@ -109,7 +109,7 @@
<section id="version_tuple">
<div class="attr variable">
<span class="name">version_tuple</span><span class="annotation">: object</span> =
<span class="default_value">(23, 17, 0)</span>
<span class="default_value">(24, 0, 3)</span>
</div>

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View file

@ -1893,7 +1893,7 @@ belong to some totally-ordered set.</p>
<section id="DATE_UNITS">
<div class="attr variable">
<span class="name">DATE_UNITS</span> =
<span class="default_value">{&#39;year_month&#39;, &#39;week&#39;, &#39;day&#39;, &#39;year&#39;, &#39;month&#39;, &#39;quarter&#39;}</span>
<span class="default_value">{&#39;year&#39;, &#39;week&#39;, &#39;year_month&#39;, &#39;day&#39;, &#39;month&#39;, &#39;quarter&#39;}</span>
</div>

View file

@ -577,7 +577,7 @@
<div class="attr variable">
<span class="name">ALL_JSON_PATH_PARTS</span> =
<input id="ALL_JSON_PATH_PARTS-view-value" class="view-value-toggle-state" type="checkbox" aria-hidden="true" tabindex="-1">
<label class="view-value-button pdoc-button" for="ALL_JSON_PATH_PARTS-view-value"></label><span class="default_value">{&lt;class &#39;<a href="expressions.html#JSONPathRecursive">sqlglot.expressions.JSONPathRecursive</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathKey">sqlglot.expressions.JSONPathKey</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathWildcard">sqlglot.expressions.JSONPathWildcard</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathFilter">sqlglot.expressions.JSONPathFilter</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathUnion">sqlglot.expressions.JSONPathUnion</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathSubscript">sqlglot.expressions.JSONPathSubscript</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathSelector">sqlglot.expressions.JSONPathSelector</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathSlice">sqlglot.expressions.JSONPathSlice</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathScript">sqlglot.expressions.JSONPathScript</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathRoot">sqlglot.expressions.JSONPathRoot</a>&#39;&gt;}</span>
<label class="view-value-button pdoc-button" for="ALL_JSON_PATH_PARTS-view-value"></label><span class="default_value">{&lt;class &#39;<a href="expressions.html#JSONPathFilter">sqlglot.expressions.JSONPathFilter</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathUnion">sqlglot.expressions.JSONPathUnion</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathSubscript">sqlglot.expressions.JSONPathSubscript</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathSelector">sqlglot.expressions.JSONPathSelector</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathSlice">sqlglot.expressions.JSONPathSlice</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathScript">sqlglot.expressions.JSONPathScript</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathWildcard">sqlglot.expressions.JSONPathWildcard</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathRoot">sqlglot.expressions.JSONPathRoot</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathRecursive">sqlglot.expressions.JSONPathRecursive</a>&#39;&gt;, &lt;class &#39;<a href="expressions.html#JSONPathKey">sqlglot.expressions.JSONPathKey</a>&#39;&gt;}</span>
</div>

File diff suppressed because one or more lines are too long

View file

@ -586,7 +586,7 @@ queries if it would result in multiple table selects in a single query:</p>
<div class="attr variable">
<span class="name">UNMERGABLE_ARGS</span> =
<input id="UNMERGABLE_ARGS-view-value" class="view-value-toggle-state" type="checkbox" aria-hidden="true" tabindex="-1">
<label class="view-value-button pdoc-button" for="UNMERGABLE_ARGS-view-value"></label><span class="default_value">{&#39;locks&#39;, &#39;with&#39;, &#39;cluster&#39;, &#39;distinct&#39;, &#39;pivots&#39;, &#39;prewhere&#39;, &#39;windows&#39;, &#39;options&#39;, &#39;offset&#39;, &#39;sort&#39;, &#39;settings&#39;, &#39;match&#39;, &#39;connect&#39;, &#39;sample&#39;, &#39;laterals&#39;, &#39;format&#39;, &#39;having&#39;, &#39;group&#39;, &#39;limit&#39;, &#39;qualify&#39;, &#39;kind&#39;, &#39;into&#39;, &#39;distribute&#39;}</span>
<label class="view-value-button pdoc-button" for="UNMERGABLE_ARGS-view-value"></label><span class="default_value">{&#39;offset&#39;, &#39;prewhere&#39;, &#39;match&#39;, &#39;locks&#39;, &#39;qualify&#39;, &#39;windows&#39;, &#39;pivots&#39;, &#39;cluster&#39;, &#39;settings&#39;, &#39;having&#39;, &#39;group&#39;, &#39;options&#39;, &#39;distinct&#39;, &#39;with&#39;, &#39;distribute&#39;, &#39;sample&#39;, &#39;format&#39;, &#39;connect&#39;, &#39;laterals&#39;, &#39;limit&#39;, &#39;sort&#39;, &#39;into&#39;, &#39;kind&#39;}</span>
</div>

View file

@ -3006,7 +3006,7 @@ prefix are statically known.</p>
<div class="attr variable">
<span class="name">DATETRUNC_COMPARISONS</span> =
<input id="DATETRUNC_COMPARISONS-view-value" class="view-value-toggle-state" type="checkbox" aria-hidden="true" tabindex="-1">
<label class="view-value-button pdoc-button" for="DATETRUNC_COMPARISONS-view-value"></label><span class="default_value">{&lt;class &#39;<a href="../expressions.html#GTE">sqlglot.expressions.GTE</a>&#39;&gt;, &lt;class &#39;<a href="../expressions.html#EQ">sqlglot.expressions.EQ</a>&#39;&gt;, &lt;class &#39;<a href="../expressions.html#LTE">sqlglot.expressions.LTE</a>&#39;&gt;, &lt;class &#39;<a href="../expressions.html#GT">sqlglot.expressions.GT</a>&#39;&gt;, &lt;class &#39;<a href="../expressions.html#NEQ">sqlglot.expressions.NEQ</a>&#39;&gt;, &lt;class &#39;<a href="../expressions.html#LT">sqlglot.expressions.LT</a>&#39;&gt;, &lt;class &#39;<a href="../expressions.html#In">sqlglot.expressions.In</a>&#39;&gt;}</span>
<label class="view-value-button pdoc-button" for="DATETRUNC_COMPARISONS-view-value"></label><span class="default_value">{&lt;class &#39;<a href="../expressions.html#LTE">sqlglot.expressions.LTE</a>&#39;&gt;, &lt;class &#39;<a href="../expressions.html#GT">sqlglot.expressions.GT</a>&#39;&gt;, &lt;class &#39;<a href="../expressions.html#LT">sqlglot.expressions.LT</a>&#39;&gt;, &lt;class &#39;<a href="../expressions.html#NEQ">sqlglot.expressions.NEQ</a>&#39;&gt;, &lt;class &#39;<a href="../expressions.html#EQ">sqlglot.expressions.EQ</a>&#39;&gt;, &lt;class &#39;<a href="../expressions.html#GTE">sqlglot.expressions.GTE</a>&#39;&gt;, &lt;class &#39;<a href="../expressions.html#In">sqlglot.expressions.In</a>&#39;&gt;}</span>
</div>
@ -3086,7 +3086,7 @@ prefix are statically known.</p>
<section id="JOINS">
<div class="attr variable">
<span class="name">JOINS</span> =
<span class="default_value">{(&#39;RIGHT&#39;, &#39;&#39;), (&#39;RIGHT&#39;, &#39;OUTER&#39;), (&#39;&#39;, &#39;INNER&#39;), (&#39;&#39;, &#39;&#39;)}</span>
<span class="default_value">{(&#39;&#39;, &#39;&#39;), (&#39;RIGHT&#39;, &#39;&#39;), (&#39;RIGHT&#39;, &#39;OUTER&#39;), (&#39;&#39;, &#39;INNER&#39;)}</span>
</div>

File diff suppressed because one or more lines are too long

File diff suppressed because it is too large

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

View file

@ -65,9 +65,6 @@ except ImportError:
pretty = False
"""Whether to format generated SQL by default."""
schema = MappingSchema()
"""The default schema used by SQLGlot (e.g. in the optimizer)."""
def tokenize(sql: str, read: DialectType = None, dialect: DialectType = None) -> t.List[Token]:
"""

View file

@ -25,6 +25,17 @@ if t.TYPE_CHECKING:
logger = logging.getLogger("sqlglot")
UNESCAPED_SEQUENCES = {
"\\a": "\a",
"\\b": "\b",
"\\f": "\f",
"\\n": "\n",
"\\r": "\r",
"\\t": "\t",
"\\v": "\v",
"\\\\": "\\",
}
class Dialects(str, Enum):
"""Dialects supported by SQLGLot."""
@ -145,14 +156,7 @@ class _Dialect(type):
if "\\" in klass.tokenizer_class.STRING_ESCAPES:
klass.UNESCAPED_SEQUENCES = {
"\\a": "\a",
"\\b": "\b",
"\\f": "\f",
"\\n": "\n",
"\\r": "\r",
"\\t": "\t",
"\\v": "\v",
"\\\\": "\\",
**UNESCAPED_SEQUENCES,
**klass.UNESCAPED_SEQUENCES,
}

View file

@ -53,8 +53,9 @@ class Doris(MySQL):
exp.Map: rename_func("ARRAY_MAP"),
exp.RegexpLike: rename_func("REGEXP"),
exp.RegexpSplit: rename_func("SPLIT_BY_STRING"),
exp.StrToUnix: lambda self, e: self.func("UNIX_TIMESTAMP", e.this, self.format_time(e)),
exp.Split: rename_func("SPLIT_BY_STRING"),
exp.StringToArray: rename_func("SPLIT_BY_STRING"),
exp.StrToUnix: lambda self, e: self.func("UNIX_TIMESTAMP", e.this, self.format_time(e)),
exp.TimeStrToDate: rename_func("TO_DATE"),
exp.TsOrDsAdd: lambda self, e: self.func("DATE_ADD", e.this, e.expression),
exp.TsOrDsToDate: lambda self, e: self.func("TO_DATE", e.this),
@ -65,3 +66,477 @@ class Doris(MySQL):
),
exp.UnixToTime: rename_func("FROM_UNIXTIME"),
}
# https://github.com/apache/doris/blob/e4f41dbf1ec03f5937fdeba2ee1454a20254015b/fe/fe-core/src/main/antlr4/org/apache/doris/nereids/DorisLexer.g4#L93
RESERVED_KEYWORDS = {
"account_lock",
"account_unlock",
"add",
"adddate",
"admin",
"after",
"agg_state",
"aggregate",
"alias",
"all",
"alter",
"analyze",
"analyzed",
"and",
"anti",
"append",
"array",
"array_range",
"as",
"asc",
"at",
"authors",
"auto",
"auto_increment",
"backend",
"backends",
"backup",
"begin",
"belong",
"between",
"bigint",
"bin",
"binary",
"binlog",
"bitand",
"bitmap",
"bitmap_union",
"bitor",
"bitxor",
"blob",
"boolean",
"brief",
"broker",
"buckets",
"build",
"builtin",
"bulk",
"by",
"cached",
"call",
"cancel",
"case",
"cast",
"catalog",
"catalogs",
"chain",
"char",
"character",
"charset",
"check",
"clean",
"cluster",
"clusters",
"collate",
"collation",
"collect",
"column",
"columns",
"comment",
"commit",
"committed",
"compact",
"complete",
"config",
"connection",
"connection_id",
"consistent",
"constraint",
"constraints",
"convert",
"copy",
"count",
"create",
"creation",
"cron",
"cross",
"cube",
"current",
"current_catalog",
"current_date",
"current_time",
"current_timestamp",
"current_user",
"data",
"database",
"databases",
"date",
"date_add",
"date_ceil",
"date_diff",
"date_floor",
"date_sub",
"dateadd",
"datediff",
"datetime",
"datetimev2",
"datev2",
"datetimev1",
"datev1",
"day",
"days_add",
"days_sub",
"decimal",
"decimalv2",
"decimalv3",
"decommission",
"default",
"deferred",
"delete",
"demand",
"desc",
"describe",
"diagnose",
"disk",
"distinct",
"distinctpc",
"distinctpcsa",
"distributed",
"distribution",
"div",
"do",
"doris_internal_table_id",
"double",
"drop",
"dropp",
"dual",
"duplicate",
"dynamic",
"else",
"enable",
"encryptkey",
"encryptkeys",
"end",
"ends",
"engine",
"engines",
"enter",
"errors",
"events",
"every",
"except",
"exclude",
"execute",
"exists",
"expired",
"explain",
"export",
"extended",
"external",
"extract",
"failed_login_attempts",
"false",
"fast",
"feature",
"fields",
"file",
"filter",
"first",
"float",
"follower",
"following",
"for",
"foreign",
"force",
"format",
"free",
"from",
"frontend",
"frontends",
"full",
"function",
"functions",
"generic",
"global",
"grant",
"grants",
"graph",
"group",
"grouping",
"groups",
"hash",
"having",
"hdfs",
"help",
"histogram",
"hll",
"hll_union",
"hostname",
"hour",
"hub",
"identified",
"if",
"ignore",
"immediate",
"in",
"incremental",
"index",
"indexes",
"infile",
"inner",
"insert",
"install",
"int",
"integer",
"intermediate",
"intersect",
"interval",
"into",
"inverted",
"ipv4",
"ipv6",
"is",
"is_not_null_pred",
"is_null_pred",
"isnull",
"isolation",
"job",
"jobs",
"join",
"json",
"jsonb",
"key",
"keys",
"kill",
"label",
"largeint",
"last",
"lateral",
"ldap",
"ldap_admin_password",
"left",
"less",
"level",
"like",
"limit",
"lines",
"link",
"list",
"load",
"local",
"localtime",
"localtimestamp",
"location",
"lock",
"logical",
"low_priority",
"manual",
"map",
"match",
"match_all",
"match_any",
"match_phrase",
"match_phrase_edge",
"match_phrase_prefix",
"match_regexp",
"materialized",
"max",
"maxvalue",
"memo",
"merge",
"migrate",
"migrations",
"min",
"minus",
"minute",
"modify",
"month",
"mtmv",
"name",
"names",
"natural",
"negative",
"never",
"next",
"ngram_bf",
"no",
"non_nullable",
"not",
"null",
"nulls",
"observer",
"of",
"offset",
"on",
"only",
"open",
"optimized",
"or",
"order",
"outer",
"outfile",
"over",
"overwrite",
"parameter",
"parsed",
"partition",
"partitions",
"password",
"password_expire",
"password_history",
"password_lock_time",
"password_reuse",
"path",
"pause",
"percent",
"period",
"permissive",
"physical",
"plan",
"process",
"plugin",
"plugins",
"policy",
"preceding",
"prepare",
"primary",
"proc",
"procedure",
"processlist",
"profile",
"properties",
"property",
"quantile_state",
"quantile_union",
"query",
"quota",
"random",
"range",
"read",
"real",
"rebalance",
"recover",
"recycle",
"refresh",
"references",
"regexp",
"release",
"rename",
"repair",
"repeatable",
"replace",
"replace_if_not_null",
"replica",
"repositories",
"repository",
"resource",
"resources",
"restore",
"restrictive",
"resume",
"returns",
"revoke",
"rewritten",
"right",
"rlike",
"role",
"roles",
"rollback",
"rollup",
"routine",
"row",
"rows",
"s3",
"sample",
"schedule",
"scheduler",
"schema",
"schemas",
"second",
"select",
"semi",
"sequence",
"serializable",
"session",
"set",
"sets",
"shape",
"show",
"signed",
"skew",
"smallint",
"snapshot",
"soname",
"split",
"sql_block_rule",
"start",
"starts",
"stats",
"status",
"stop",
"storage",
"stream",
"streaming",
"string",
"struct",
"subdate",
"sum",
"superuser",
"switch",
"sync",
"system",
"table",
"tables",
"tablesample",
"tablet",
"tablets",
"task",
"tasks",
"temporary",
"terminated",
"text",
"than",
"then",
"time",
"timestamp",
"timestampadd",
"timestampdiff",
"tinyint",
"to",
"transaction",
"trash",
"tree",
"triggers",
"trim",
"true",
"truncate",
"type",
"type_cast",
"types",
"unbounded",
"uncommitted",
"uninstall",
"union",
"unique",
"unlock",
"unsigned",
"update",
"use",
"user",
"using",
"value",
"values",
"varchar",
"variables",
"variant",
"vault",
"verbose",
"version",
"view",
"warnings",
"week",
"when",
"where",
"whitelist",
"with",
"work",
"workload",
"write",
"xor",
"year",
}
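The practical effect of this list is that identifiers colliding with Doris reserved words should now be quoted when generating Doris SQL. A minimal sketch using sqlglot's public transpile API; the table name `user` is only an illustrative identifier that appears in the list above, and the exact quoting is an expectation rather than a guarantee:

```python
import sqlglot

# "user" is in the Doris reserved keyword list above, so the Doris generator is
# expected to backtick-quote it; output should be roughly: SELECT * FROM `user`
print(sqlglot.transpile("SELECT * FROM user", write="doris")[0])
```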

View file

@ -573,6 +573,9 @@ class Hive(Dialect):
exp.OnProperty: lambda *_: "",
exp.PrimaryKeyColumnConstraint: lambda *_: "PRIMARY KEY",
exp.ParseJSON: lambda self, e: self.sql(e.this),
exp.WeekOfYear: rename_func("WEEKOFYEAR"),
exp.DayOfMonth: rename_func("DAYOFMONTH"),
exp.DayOfWeek: rename_func("DAYOFWEEK"),
}
PROPERTIES_LOCATION = {

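These mappings make exp.WeekOfYear, exp.DayOfMonth and exp.DayOfWeek render under their original Hive names instead of the generic WEEK_OF_YEAR-style spellings. A hedged sketch (the column name is illustrative):

```python
import sqlglot

# Expected to round-trip roughly as: SELECT WEEKOFYEAR(dt), DAYOFWEEK(dt)
print(sqlglot.transpile("SELECT WEEKOFYEAR(dt), DAYOFWEEK(dt)", read="hive", write="hive")[0])
```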
View file

@ -670,6 +670,7 @@ class MySQL(Dialect):
return self.expression(exp.GroupConcat, this=this, separator=separator)
class Generator(generator.Generator):
INTERVAL_ALLOWS_PLURAL_FORM = False
LOCKING_READS_SUPPORTED = True
NULL_ORDERING_SUPPORTED = None
JOIN_HINTS = False
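With INTERVAL_ALLOWS_PLURAL_FORM disabled, plural interval units should be emitted in singular form for MySQL (and Doris, which inherits from it). A hedged sketch with an illustrative expression; the exact output shape may differ slightly:

```python
import sqlglot

# Expected output is roughly: SELECT d + INTERVAL '2' DAY
print(sqlglot.transpile("SELECT d + INTERVAL 2 days", write="mysql")[0])
```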

View file

@ -116,7 +116,10 @@ def _string_agg_sql(self: Postgres.Generator, expression: exp.GroupConcat) -> st
def _datatype_sql(self: Postgres.Generator, expression: exp.DataType) -> str:
if expression.is_type("array"):
return f"{self.expressions(expression, flat=True)}[]" if expression.expressions else "ARRAY"
if expression.expressions:
values = self.expressions(expression, key="values", flat=True)
return f"{self.expressions(expression, flat=True)}[{values}]"
return "ARRAY"
return self.datatype_sql(expression)
@ -333,6 +336,7 @@ class Postgres(Dialect):
"REGPROCEDURE": TokenType.OBJECT_IDENTIFIER,
"REGROLE": TokenType.OBJECT_IDENTIFIER,
"REGTYPE": TokenType.OBJECT_IDENTIFIER,
"FLOAT": TokenType.DOUBLE,
}
SINGLE_TOKENS = {

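Together, these changes should let sized array types like INT[3] survive a Postgres round trip, and read an un-suffixed FLOAT as a double-precision type. A hedged sketch (table and column names are illustrative):

```python
import sqlglot

# Expected roughly: CREATE TABLE t (c INT[3])
print(sqlglot.transpile("CREATE TABLE t (c INT[3])", read="postgres", write="postgres")[0])

# FLOAT now tokenizes as DOUBLE, which Postgres renders as DOUBLE PRECISION,
# so this is expected to be roughly: SELECT CAST(x AS DOUBLE PRECISION)
print(sqlglot.transpile("SELECT CAST(x AS FLOAT)", read="postgres", write="postgres")[0])
```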
View file

@ -63,6 +63,9 @@ class Redshift(Postgres):
"DATE_DIFF": _build_date_delta(exp.TsOrDsDiff),
"GETDATE": exp.CurrentTimestamp.from_arg_list,
"LISTAGG": exp.GroupConcat.from_arg_list,
"SPLIT_TO_ARRAY": lambda args: exp.StringToArray(
this=seq_get(args, 0), expression=seq_get(args, 1) or exp.Literal.string(",")
),
"STRTOL": exp.FromBase.from_arg_list,
}
@ -124,6 +127,7 @@ class Redshift(Postgres):
"TOP": TokenType.TOP,
"UNLOAD": TokenType.COMMAND,
"VARBYTE": TokenType.VARBINARY,
"MINUS": TokenType.EXCEPT,
}
KEYWORDS.pop("VALUES")
@ -186,6 +190,7 @@ class Redshift(Postgres):
e: f"{'COMPOUND ' if e.args['compound'] else ''}SORTKEY({self.format_args(*e.this)})",
exp.StartsWith: lambda self,
e: f"{self.sql(e.this)} LIKE {self.sql(e.expression)} || '%'",
exp.StringToArray: rename_func("SPLIT_TO_ARRAY"),
exp.TableSample: no_tablesample_sql,
exp.TsOrDsAdd: date_delta_sql("DATEADD"),
exp.TsOrDsDiff: date_delta_sql("DATEDIFF"),

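Since SPLIT_TO_ARRAY now builds the new exp.StringToArray node (with ',' as the default delimiter) and MINUS maps to the EXCEPT token, both can be transpiled from Redshift. A hedged sketch with illustrative queries:

```python
import sqlglot

# Expected roughly: SELECT STRING_TO_ARRAY('a,b,c', ',')
print(sqlglot.transpile("SELECT SPLIT_TO_ARRAY('a,b,c', ',')", read="redshift", write="postgres")[0])

# MINUS parses like EXCEPT; expected roughly: SELECT a FROM t EXCEPT SELECT a FROM s
print(sqlglot.transpile("SELECT a FROM t MINUS SELECT a FROM s", read="redshift", write="redshift")[0])
```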
View file

@ -473,6 +473,14 @@ class Snowflake(Dialect):
"TERSE USERS": _show_parser("USERS"),
}
CONSTRAINT_PARSERS = {
**parser.Parser.CONSTRAINT_PARSERS,
"WITH": lambda self: self._parse_with_constraint(),
"MASKING": lambda self: self._parse_with_constraint(),
"PROJECTION": lambda self: self._parse_with_constraint(),
"TAG": lambda self: self._parse_with_constraint(),
}
STAGED_FILE_SINGLE_TOKENS = {
TokenType.DOT,
TokenType.MOD,
@ -497,6 +505,29 @@ class Snowflake(Dialect):
),
}
def _parse_with_constraint(self) -> t.Optional[exp.Expression]:
if self._prev.token_type != TokenType.WITH:
self._retreat(self._index - 1)
if self._match_text_seq("MASKING", "POLICY"):
return self.expression(
exp.MaskingPolicyColumnConstraint,
this=self._parse_id_var(),
expressions=self._match(TokenType.USING)
and self._parse_wrapped_csv(self._parse_id_var),
)
if self._match_text_seq("PROJECTION", "POLICY"):
return self.expression(
exp.ProjectionPolicyColumnConstraint, this=self._parse_id_var()
)
if self._match(TokenType.TAG):
return self.expression(
exp.TagColumnConstraint,
expressions=self._parse_wrapped_csv(self._parse_property),
)
return None
def _parse_create(self) -> exp.Create | exp.Command:
expression = super()._parse_create()
if isinstance(expression, exp.Create) and expression.kind in self.NON_TABLE_CREATABLES:

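With these constraint parsers, Snowflake's WITH-prefixed column constraints (MASKING POLICY, PROJECTION POLICY, TAG) should parse into the new expression nodes and round-trip. A hedged sketch with illustrative table, column and policy names:

```python
import sqlglot

# Expected roughly: CREATE TABLE t (c INT MASKING POLICY p USING (c))
sql = "CREATE TABLE t (c INT WITH MASKING POLICY p USING (c))"
print(sqlglot.transpile(sql, read="snowflake", write="snowflake")[0])
```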
View file

@ -17,7 +17,6 @@ from sqlglot.dialects.dialect import (
min_or_least,
build_date_delta,
rename_func,
timestrtotime_sql,
trim_sql,
)
from sqlglot.helper import seq_get
@ -818,6 +817,7 @@ class TSQL(Dialect):
exp.Min: min_or_least,
exp.NumberToStr: _format_sql,
exp.ParseJSON: lambda self, e: self.sql(e, "this"),
exp.Repeat: rename_func("REPLICATE"),
exp.Select: transforms.preprocess(
[
transforms.eliminate_distinct_on,
@ -834,7 +834,9 @@ class TSQL(Dialect):
"HASHBYTES", exp.Literal.string(f"SHA2_{e.args.get('length', 256)}"), e.this
),
exp.TemporaryProperty: lambda self, e: "",
exp.TimeStrToTime: timestrtotime_sql,
exp.TimeStrToTime: lambda self, e: self.sql(
exp.cast(e.this, exp.DataType.Type.DATETIME)
),
exp.TimeToStr: _format_sql,
exp.Trim: trim_sql,
exp.TsOrDsAdd: date_delta_sql("DATEADD", cast=True),

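The new exp.Repeat mapping makes REPEAT from other dialects come out as REPLICATE on T-SQL, and TimeStrToTime is now rendered as a plain cast to DATETIME instead of going through timestrtotime_sql. A hedged sketch for the former:

```python
import sqlglot

# Expected roughly: SELECT REPLICATE('x', 5)
print(sqlglot.transpile("SELECT REPEAT('x', 5)", read="postgres", write="tsql")[0])
```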
View file

@ -1632,6 +1632,7 @@ class AlterColumn(Expression):
"default": False,
"drop": False,
"comment": False,
"allow_null": False,
}
@ -1835,6 +1836,11 @@ class NotForReplicationColumnConstraint(ColumnConstraintKind):
arg_types = {}
# https://docs.snowflake.com/en/sql-reference/sql/create-table
class MaskingPolicyColumnConstraint(ColumnConstraintKind):
arg_types = {"this": True, "expressions": False}
class NotNullColumnConstraint(ColumnConstraintKind):
arg_types = {"allow_null": False}
@ -1844,6 +1850,11 @@ class OnUpdateColumnConstraint(ColumnConstraintKind):
pass
# https://docs.snowflake.com/en/sql-reference/sql/create-table
class TagColumnConstraint(ColumnConstraintKind):
arg_types = {"expressions": True}
# https://docs.snowflake.com/en/sql-reference/sql/create-external-table#optional-parameters
class TransformColumnConstraint(ColumnConstraintKind):
pass
@ -1869,6 +1880,11 @@ class PathColumnConstraint(ColumnConstraintKind):
pass
# https://docs.snowflake.com/en/sql-reference/sql/create-table
class ProjectionPolicyColumnConstraint(ColumnConstraintKind):
pass
# computed column expression
# https://learn.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql?view=sql-server-ver16
class ComputedColumnConstraint(ColumnConstraintKind):
@ -1992,7 +2008,7 @@ class Connect(Expression):
class CopyParameter(Expression):
arg_types = {"this": True, "expression": False}
arg_types = {"this": True, "expression": False, "expressions": False}
class Copy(Expression):
@ -4825,6 +4841,11 @@ class ArrayToString(Func):
_sql_names = ["ARRAY_TO_STRING", "ARRAY_JOIN"]
class StringToArray(Func):
arg_types = {"this": True, "expression": True, "null": False}
_sql_names = ["STRING_TO_ARRAY", "SPLIT_BY_STRING"]
class ArrayOverlaps(Binary, Func):
pass

View file

@ -123,6 +123,8 @@ class Generator(metaclass=_Generator):
exp.OnUpdateColumnConstraint: lambda self, e: f"ON UPDATE {self.sql(e, 'this')}",
exp.OutputModelProperty: lambda self, e: f"OUTPUT{self.sql(e, 'this')}",
exp.PathColumnConstraint: lambda self, e: f"PATH {self.sql(e, 'this')}",
exp.ProjectionPolicyColumnConstraint: lambda self,
e: f"PROJECTION POLICY {self.sql(e, 'this')}",
exp.RemoteWithConnectionModelProperty: lambda self,
e: f"REMOTE WITH CONNECTION {self.sql(e, 'this')}",
exp.ReturnsProperty: lambda self, e: (
@ -139,6 +141,7 @@ class Generator(metaclass=_Generator):
exp.StabilityProperty: lambda _, e: e.name,
exp.StrictProperty: lambda *_: "STRICT",
exp.TemporaryProperty: lambda *_: "TEMPORARY",
exp.TagColumnConstraint: lambda self, e: f"TAG ({self.expressions(e, flat=True)})",
exp.TitleColumnConstraint: lambda self, e: f"TITLE {self.sql(e, 'this')}",
exp.Timestamp: lambda self, e: self.func("TIMESTAMP", e.this, e.expression),
exp.ToMap: lambda self, e: f"MAP {self.sql(e, 'this')}",
@ -3022,9 +3025,16 @@ class Generator(metaclass=_Generator):
if comment:
return f"ALTER COLUMN {this} COMMENT {comment}"
if not expression.args.get("drop"):
allow_null = expression.args.get("allow_null")
drop = expression.args.get("drop")
if not drop and not allow_null:
self.unsupported("Unsupported ALTER COLUMN syntax")
if allow_null is not None:
keyword = "DROP" if drop else "SET"
return f"ALTER COLUMN {this} {keyword} NOT NULL"
return f"ALTER COLUMN {this} DROP DEFAULT"
def alterdiststyle_sql(self, expression: exp.AlterDistStyle) -> str:
@ -3850,9 +3860,16 @@ class Generator(metaclass=_Generator):
def copyparameter_sql(self, expression: exp.CopyParameter) -> str:
option = self.sql(expression, "this")
if option.upper() == "FILE_FORMAT":
values = self.expressions(expression, key="expression", flat=True, sep=" ")
return f"{option} = ({values})"
if expression.expressions:
upper = option.upper()
# Snowflake FILE_FORMAT options are separated by whitespace
sep = " " if upper == "FILE_FORMAT" else ", "
# Databricks copy/format options do not set their list of values with EQ
op = " " if upper in ("COPY_OPTIONS", "FORMAT_OPTIONS") else " = "
values = self.expressions(expression, flat=True, sep=sep)
return f"{option}{op}({values})"
value = self.sql(expression, "expression")
@ -3872,9 +3889,10 @@ class Generator(metaclass=_Generator):
else:
# Snowflake case: CREDENTIALS = (...)
credentials = self.expressions(expression, key="credentials", flat=True, sep=" ")
credentials = f"CREDENTIALS = ({credentials})" if credentials else ""
credentials = f"CREDENTIALS = ({credentials})" if cred_expr is not None else ""
storage = self.sql(expression, "storage")
storage = f"STORAGE_INTEGRATION = {storage}" if storage else ""
encryption = self.expressions(expression, key="encryption", flat=True, sep=" ")
encryption = f" ENCRYPTION = ({encryption})" if encryption else ""
@ -3929,3 +3947,11 @@ class Generator(metaclass=_Generator):
on_sql = self.func("ON", filter_col, retention_period)
return f"DATA_DELETION={on_sql}"
def maskingpolicycolumnconstraint_sql(
self, expression: exp.MaskingPolicyColumnConstraint
) -> str:
this = self.sql(expression, "this")
expressions = self.expressions(expression, flat=True)
expressions = f" USING ({expressions})" if expressions else ""
return f"MASKING POLICY {this}{expressions}"
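The allow_null branch in altercolumn_sql, combined with the parser changes further below, should make SET/DROP NOT NULL round-trip through exp.AlterColumn. A hedged sketch using the default dialect and illustrative names:

```python
import sqlglot

# Expected roughly:
#   ALTER TABLE t ALTER COLUMN c DROP NOT NULL
#   ALTER TABLE t ALTER COLUMN c SET NOT NULL
print(sqlglot.transpile("ALTER TABLE t ALTER COLUMN c DROP NOT NULL")[0])
print(sqlglot.transpile("ALTER TABLE t ALTER COLUMN c SET NOT NULL")[0])
```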

View file

@ -3,7 +3,6 @@ from __future__ import annotations
import inspect
import typing as t
import sqlglot
from sqlglot import Schema, exp
from sqlglot.dialects.dialect import DialectType
from sqlglot.optimizer.annotate_types import annotate_types
@ -72,7 +71,7 @@ def optimize(
Returns:
The optimized expression.
"""
schema = ensure_schema(schema or sqlglot.schema, dialect=dialect)
schema = ensure_schema(schema, dialect=dialect)
possible_kwargs = {
"db": db,
"catalog": catalog,

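Since optimize no longer falls back to the global sqlglot.schema, callers should pass a schema explicitly. A hedged sketch with a made-up one-table schema:

```python
from sqlglot import parse_one
from sqlglot.optimizer import optimize

# The schema mapping here is illustrative; optimize() qualifies the query against it
expression = optimize(parse_one("SELECT a FROM t"), schema={"t": {"a": "INT"}})
print(expression.sql())
```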
View file

@ -63,6 +63,7 @@ def qualify_columns(
if schema.empty and expand_alias_refs:
_expand_alias_refs(scope, resolver)
_convert_columns_to_dots(scope, resolver)
_qualify_columns(scope, resolver)
if not schema.empty and expand_alias_refs:
@ -70,7 +71,13 @@ def qualify_columns(
if not isinstance(scope.expression, exp.UDTF):
if expand_stars:
_expand_stars(scope, resolver, using_column_tables, pseudocolumns)
_expand_stars(
scope,
resolver,
using_column_tables,
pseudocolumns,
annotator,
)
qualify_outputs(scope)
_expand_group_by(scope)
@ -329,6 +336,47 @@ def _select_by_pos(scope: Scope, node: exp.Literal) -> exp.Alias:
raise OptimizeError(f"Unknown output column: {node.name}")
def _convert_columns_to_dots(scope: Scope, resolver: Resolver) -> None:
"""
Converts `Column` instances that represent struct field lookup into chained `Dots`.
Struct field lookups look like columns (e.g. "struct"."field"), but they need to be
qualified separately and represented as Dot(Dot(...(<table>.<column>, field1), field2, ...)).
"""
converted = False
for column in itertools.chain(scope.columns, scope.stars):
if isinstance(column, exp.Dot):
continue
column_table: t.Optional[str | exp.Identifier] = column.table
if (
column_table
and column_table not in scope.sources
and (
not scope.parent
or column_table not in scope.parent.sources
or not scope.is_correlated_subquery
)
):
root, *parts = column.parts
if root.name in scope.sources:
# The struct is already qualified, but we still need to change the AST
column_table = root
root, *parts = parts
else:
column_table = resolver.get_table(root.name)
if column_table:
converted = True
column.replace(exp.Dot.build([exp.column(root, table=column_table), *parts]))
if converted:
# We want to re-aggregate the converted columns, otherwise they'd be skipped in
# a `for column in scope.columns` iteration, even though they shouldn't be
scope.clear_cache()
def _qualify_columns(scope: Scope, resolver: Resolver) -> None:
"""Disambiguate columns, ensuring each column specifies a source"""
for column in scope.columns:
@ -347,30 +395,10 @@ def _qualify_columns(scope: Scope, resolver: Resolver) -> None:
column.set("table", exp.to_identifier(scope.pivots[0].alias))
continue
column_table = resolver.get_table(column_name)
# column_table can be a '' because bigquery unnest has no table alias
column_table = resolver.get_table(column_name)
if column_table:
column.set("table", column_table)
elif column_table not in scope.sources and (
not scope.parent
or column_table not in scope.parent.sources
or not scope.is_correlated_subquery
):
# structs are used like tables (e.g. "struct"."field"), so they need to be qualified
# separately and represented as dot(dot(...(<table>.<column>, field1), field2, ...))
root, *parts = column.parts
if root.name in scope.sources:
# struct is already qualified, but we still need to change the AST representation
column_table = root
root, *parts = parts
else:
column_table = resolver.get_table(root.name)
if column_table:
column.replace(exp.Dot.build([exp.column(root, table=column_table), *parts]))
for pivot in scope.pivots:
for column in pivot.find_all(exp.Column):
@ -380,11 +408,64 @@ def _qualify_columns(scope: Scope, resolver: Resolver) -> None:
column.set("table", column_table)
def _expand_struct_stars(
expression: exp.Dot,
) -> t.List[exp.Alias]:
"""[BigQuery] Expand/Flatten foo.bar.* where bar is a struct column"""
dot_column = t.cast(exp.Column, expression.find(exp.Column))
if not dot_column.is_type(exp.DataType.Type.STRUCT):
return []
# All nested struct values are ColumnDefs, so normalize the first exp.Column in one
dot_column = dot_column.copy()
starting_struct = exp.ColumnDef(this=dot_column.this, kind=dot_column.type)
# First part is the table name and last part is the star so they can be dropped
dot_parts = expression.parts[1:-1]
# If we're expanding a nested struct eg. t.c.f1.f2.* find the last struct (f2 in this case)
for part in dot_parts[1:]:
for field in t.cast(exp.DataType, starting_struct.kind).expressions:
# Unable to expand star unless all fields are named
if not isinstance(field.this, exp.Identifier):
return []
if field.name == part.name and field.kind.is_type(exp.DataType.Type.STRUCT):
starting_struct = field
break
else:
# There is no matching field in the struct
return []
taken_names = set()
new_selections = []
for field in t.cast(exp.DataType, starting_struct.kind).expressions:
name = field.name
# Ambiguous or anonymous fields can't be expanded
if name in taken_names or not isinstance(field.this, exp.Identifier):
return []
taken_names.add(name)
this = field.this.copy()
root, *parts = [part.copy() for part in itertools.chain(dot_parts, [this])]
new_column = exp.column(
t.cast(exp.Identifier, root), table=dot_column.args.get("table"), fields=parts
)
new_selections.append(alias(new_column, this, copy=False))
return new_selections
def _expand_stars(
scope: Scope,
resolver: Resolver,
using_column_tables: t.Dict[str, t.Any],
pseudocolumns: t.Set[str],
annotator: TypeAnnotator,
) -> None:
"""Expand stars to lists of column selections"""
@ -392,6 +473,7 @@ def _expand_stars(
except_columns: t.Dict[int, t.Set[str]] = {}
replace_columns: t.Dict[int, t.Dict[str, str]] = {}
coalesced_columns = set()
dialect = resolver.schema.dialect
pivot_output_columns = None
pivot_exclude_columns = None
@ -413,16 +495,29 @@ def _expand_stars(
if not pivot_output_columns:
pivot_output_columns = [c.alias_or_name for c in pivot.expressions]
is_bigquery = dialect == "bigquery"
if is_bigquery and any(isinstance(col, exp.Dot) for col in scope.stars):
# Found struct expansion, annotate scope ahead of time
annotator.annotate_scope(scope)
for expression in scope.expression.selects:
tables = []
if isinstance(expression, exp.Star):
tables = list(scope.selected_sources)
tables.extend(scope.selected_sources)
_add_except_columns(expression, tables, except_columns)
_add_replace_columns(expression, tables, replace_columns)
elif expression.is_star and not isinstance(expression, exp.Dot):
tables = [expression.table]
_add_except_columns(expression.this, tables, except_columns)
_add_replace_columns(expression.this, tables, replace_columns)
else:
elif expression.is_star:
if not isinstance(expression, exp.Dot):
tables.append(expression.table)
_add_except_columns(expression.this, tables, except_columns)
_add_replace_columns(expression.this, tables, replace_columns)
elif is_bigquery:
struct_fields = _expand_struct_stars(expression)
if struct_fields:
new_selections.extend(struct_fields)
continue
if not tables:
new_selections.append(expression)
continue

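The new _expand_struct_stars path means that, for BigQuery, a SELECT t.s.* over a STRUCT-typed column should be flattened into its named fields during qualification. A hedged sketch; the schema and names are illustrative, and the exact identifier quoting in the output may differ:

```python
from sqlglot import parse_one
from sqlglot.optimizer.qualify import qualify

expression = qualify(
    parse_one("SELECT t.s.* FROM t", dialect="bigquery"),
    schema={"t": {"s": "STRUCT<a INT64, b INT64>"}},
    dialect="bigquery",
)
# Expected to select something like t.s.a AS a, t.s.b AS b instead of the star
print(expression.sql("bigquery"))
```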
View file

@ -86,6 +86,7 @@ class Scope:
def clear_cache(self):
self._collected = False
self._raw_columns = None
self._stars = None
self._derived_tables = None
self._udtfs = None
self._tables = None
@ -119,14 +120,20 @@ class Scope:
self._derived_tables = []
self._udtfs = []
self._raw_columns = []
self._stars = []
self._join_hints = []
for node in self.walk(bfs=False):
if node is self.expression:
continue
if isinstance(node, exp.Column) and not isinstance(node.this, exp.Star):
self._raw_columns.append(node)
if isinstance(node, exp.Dot) and node.is_star:
self._stars.append(node)
elif isinstance(node, exp.Column):
if isinstance(node.this, exp.Star):
self._stars.append(node)
else:
self._raw_columns.append(node)
elif isinstance(node, exp.Table) and not isinstance(node.parent, exp.JoinHint):
self._tables.append(node)
elif isinstance(node, exp.JoinHint):
@ -231,6 +238,14 @@ class Scope:
self._ensure_collected()
return self._subqueries
@property
def stars(self) -> t.List[exp.Column | exp.Dot]:
"""
List of star expressions (columns or dots) in this scope.
"""
self._ensure_collected()
return self._stars
@property
def columns(self):
"""

View file

@ -1134,6 +1134,8 @@ class Parser(metaclass=_Parser):
SELECT_START_TOKENS = {TokenType.L_PAREN, TokenType.WITH, TokenType.SELECT}
COPY_INTO_VARLEN_OPTIONS = {"FILE_FORMAT", "COPY_OPTIONS", "FORMAT_OPTIONS", "CREDENTIAL"}
STRICT_CAST = True
PREFIXED_PIVOT_COLUMNS = False
@ -1830,11 +1832,17 @@ class Parser(metaclass=_Parser):
self._retreat(index)
return self._parse_sequence_properties()
return self.expression(
exp.Property,
this=key.to_dot() if isinstance(key, exp.Column) else key,
value=self._parse_bitwise() or self._parse_var(any_token=True),
)
# Transform the key to exp.Dot if it's dotted identifiers wrapped in exp.Column or to exp.Var otherwise
if isinstance(key, exp.Column):
key = key.to_dot() if len(key.parts) > 1 else exp.var(key.name)
value = self._parse_bitwise() or self._parse_var(any_token=True)
# Transform the value to exp.Var if it was parsed as exp.Column(exp.Identifier())
if isinstance(value, exp.Column):
value = exp.var(value.name)
return self.expression(exp.Property, this=key, value=value)
def _parse_stored(self) -> exp.FileFormatProperty:
self._match(TokenType.ALIAS)
@ -1853,7 +1861,7 @@ class Parser(metaclass=_Parser):
),
)
def _parse_unquoted_field(self):
def _parse_unquoted_field(self) -> t.Optional[exp.Expression]:
field = self._parse_field()
if isinstance(field, exp.Identifier) and not field.quoted:
field = exp.var(field)
@ -2793,7 +2801,13 @@ class Parser(metaclass=_Parser):
if not alias and not columns:
return None
return self.expression(exp.TableAlias, this=alias, columns=columns)
table_alias = self.expression(exp.TableAlias, this=alias, columns=columns)
# We bubble up comments from the Identifier to the TableAlias
if isinstance(alias, exp.Identifier):
table_alias.add_comments(alias.pop_comments())
return table_alias
def _parse_subquery(
self, this: t.Optional[exp.Expression], parse_alias: bool = True
@ -4060,7 +4074,7 @@ class Parser(metaclass=_Parser):
return this
return self.expression(exp.Escape, this=this, expression=self._parse_string())
def _parse_interval(self, match_interval: bool = True) -> t.Optional[exp.Interval]:
def _parse_interval(self, match_interval: bool = True) -> t.Optional[exp.Add | exp.Interval]:
index = self._index
if not self._match(TokenType.INTERVAL) and match_interval:
@ -4090,23 +4104,33 @@ class Parser(metaclass=_Parser):
if this and this.is_number:
this = exp.Literal.string(this.name)
elif this and this.is_string:
parts = this.name.split()
if len(parts) == 2:
parts = exp.INTERVAL_STRING_RE.findall(this.name)
if len(parts) == 1:
if unit:
# This is not actually a unit, it's something else (e.g. a "window side")
unit = None
# Unconsume the eagerly-parsed unit, since the real unit was part of the string
self._retreat(self._index - 1)
this = exp.Literal.string(parts[0])
unit = self.expression(exp.Var, this=parts[1].upper())
this = exp.Literal.string(parts[0][0])
unit = self.expression(exp.Var, this=parts[0][1].upper())
if self.INTERVAL_SPANS and self._match_text_seq("TO"):
unit = self.expression(
exp.IntervalSpan, this=unit, expression=self._parse_var(any_token=True, upper=True)
)
return self.expression(exp.Interval, this=this, unit=unit)
interval = self.expression(exp.Interval, this=this, unit=unit)
index = self._index
self._match(TokenType.PLUS)
# Convert INTERVAL 'val_1' unit_1 [+] ... [+] 'val_n' unit_n into a sum of intervals
if self._match_set((TokenType.STRING, TokenType.NUMBER), advance=False):
return self.expression(
exp.Add, this=interval, expression=self._parse_interval(match_interval=False)
)
self._retreat(index)
return interval
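With INTERVAL_STRING_RE doing the splitting and the new sum-of-intervals handling, interval literals whose unit is embedded in the string, as well as chained 'value unit' pairs, should normalize consistently. A hedged sketch on the default dialect; the expected outputs are approximations:

```python
import sqlglot

# Expected roughly: SELECT INTERVAL '1' DAY
print(sqlglot.transpile("SELECT INTERVAL '1 day'")[0])

# Chained pairs are expected to become a sum, roughly:
# SELECT INTERVAL '1' DAY + INTERVAL '2' HOUR
print(sqlglot.transpile("SELECT INTERVAL '1' DAY '2' HOUR")[0])
```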
def _parse_bitwise(self) -> t.Optional[exp.Expression]:
this = self._parse_term()
@ -4173,38 +4197,45 @@ class Parser(metaclass=_Parser):
) -> t.Optional[exp.Expression]:
interval = parse_interval and self._parse_interval()
if interval:
# Convert INTERVAL 'val_1' unit_1 [+] ... [+] 'val_n' unit_n into a sum of intervals
while True:
index = self._index
self._match(TokenType.PLUS)
if not self._match_set((TokenType.STRING, TokenType.NUMBER), advance=False):
self._retreat(index)
break
interval = self.expression( # type: ignore
exp.Add, this=interval, expression=self._parse_interval(match_interval=False)
)
return interval
index = self._index
data_type = self._parse_types(check_func=True, allow_identifiers=False)
this = self._parse_column()
if data_type:
index2 = self._index
this = self._parse_primary()
if isinstance(this, exp.Literal):
parser = self.TYPE_LITERAL_PARSERS.get(data_type.this)
if parser:
return parser(self, this, data_type)
return self.expression(exp.Cast, this=this, to=data_type)
if not data_type.expressions:
self._retreat(index)
return self._parse_id_var() if fallback_to_identifier else self._parse_column()
# The expressions arg gets set by the parser when we have something like DECIMAL(38, 0)
# in the input SQL. In that case, we'll produce these tokens: DECIMAL ( 38 , 0 )
#
# If the index difference here is greater than 1, that means the parser itself must have
# consumed additional tokens such as the DECIMAL scale and precision in the above example.
#
# If it's not greater than 1, then it must be 1, because we've consumed at least the type
# keyword, meaning that the expressions arg of the DataType must have gotten set by a
# callable in the TYPE_CONVERTERS mapping. For example, Snowflake converts DECIMAL to
# DECIMAL(38, 0)) in order to facilitate the data type's transpilation.
#
# In these cases, we don't really want to return the converted type, but instead retreat
# and try to parse a Column or Identifier in the section below.
if data_type.expressions and index2 - index > 1:
self._retreat(index2)
return self._parse_column_ops(data_type)
return self._parse_column_ops(data_type)
self._retreat(index)
if fallback_to_identifier:
return self._parse_id_var()
this = self._parse_column()
return this and self._parse_column_ops(this)
def _parse_type_size(self) -> t.Optional[exp.DataTypeParam]:
@ -4268,7 +4299,7 @@ class Parser(metaclass=_Parser):
if self._match(TokenType.L_PAREN):
if is_struct:
expressions = self._parse_csv(self._parse_struct_types)
expressions = self._parse_csv(lambda: self._parse_struct_types(type_required=True))
elif nested:
expressions = self._parse_csv(
lambda: self._parse_types(
@ -4369,8 +4400,26 @@ class Parser(metaclass=_Parser):
elif expressions:
this.set("expressions", expressions)
while self._match_pair(TokenType.L_BRACKET, TokenType.R_BRACKET):
this = exp.DataType(this=exp.DataType.Type.ARRAY, expressions=[this], nested=True)
index = self._index
# Postgres supports the INT ARRAY[3] syntax as a synonym for INT[3]
matched_array = self._match(TokenType.ARRAY)
while self._curr:
matched_l_bracket = self._match(TokenType.L_BRACKET)
if not matched_l_bracket and not matched_array:
break
matched_array = False
values = self._parse_csv(self._parse_conjunction) or None
if values and not schema:
self._retreat(index)
break
this = exp.DataType(
this=exp.DataType.Type.ARRAY, expressions=[this], values=values, nested=True
)
self._match(TokenType.R_BRACKET)
if self.TYPE_CONVERTER and isinstance(this.this, exp.DataType.Type):
converter = self.TYPE_CONVERTER.get(this.this)
@ -4386,15 +4435,16 @@ class Parser(metaclass=_Parser):
or self._parse_id_var()
)
self._match(TokenType.COLON)
column_def = self._parse_column_def(this)
if type_required and (
(isinstance(this, exp.Column) and this.this is column_def) or this is column_def
if (
type_required
and not isinstance(this, exp.DataType)
and not self._match_set(self.TYPE_TOKENS, advance=False)
):
self._retreat(index)
return self._parse_types()
return column_def
return self._parse_column_def(this)
def _parse_at_time_zone(self, this: t.Optional[exp.Expression]) -> t.Optional[exp.Expression]:
if not self._match_text_seq("AT", "TIME", "ZONE"):
@ -6030,7 +6080,19 @@ class Parser(metaclass=_Parser):
return self.expression(exp.AlterColumn, this=column, default=self._parse_conjunction())
if self._match(TokenType.COMMENT):
return self.expression(exp.AlterColumn, this=column, comment=self._parse_string())
if self._match_text_seq("DROP", "NOT", "NULL"):
return self.expression(
exp.AlterColumn,
this=column,
drop=True,
allow_null=True,
)
if self._match_text_seq("SET", "NOT", "NULL"):
return self.expression(
exp.AlterColumn,
this=column,
allow_null=False,
)
self._match_text_seq("SET", "DATA")
self._match_text_seq("TYPE")
return self.expression(
@ -6595,12 +6657,23 @@ class Parser(metaclass=_Parser):
return self.expression(exp.WithOperator, this=this, op=op)
def _parse_wrapped_options(self) -> t.List[t.Optional[exp.Expression]]:
opts = []
self._match(TokenType.EQ)
self._match(TokenType.L_PAREN)
opts: t.List[t.Optional[exp.Expression]] = []
while self._curr and not self._match(TokenType.R_PAREN):
opts.append(self._parse_conjunction())
if self._match_text_seq("FORMAT_NAME", "="):
# The FORMAT_NAME can be set to an identifier for Snowflake and T-SQL,
# so we parse it separately with _parse_field()
prop = self.expression(
exp.Property, this=exp.var("FORMAT_NAME"), value=self._parse_field()
)
opts.append(prop)
else:
opts.append(self._parse_property())
self._match(TokenType.COMMA)
return opts
def _parse_copy_parameters(self) -> t.List[exp.CopyParameter]:
@ -6608,37 +6681,38 @@ class Parser(metaclass=_Parser):
options = []
while self._curr and not self._match(TokenType.R_PAREN, advance=False):
option = self._parse_unquoted_field()
value = None
option = self._parse_var(any_token=True)
prev = self._prev.text.upper()
# Some options are defined as functions with the values as params
if not isinstance(option, exp.Func):
prev = self._prev.text.upper()
# Different dialects might separate options and values by white space, "=" and "AS"
self._match(TokenType.EQ)
self._match(TokenType.ALIAS)
# Different dialects might separate options and values by white space, "=" and "AS"
self._match(TokenType.EQ)
self._match(TokenType.ALIAS)
if prev == "FILE_FORMAT" and self._match(TokenType.L_PAREN):
# Snowflake FILE_FORMAT case
value = self._parse_wrapped_options()
else:
value = self._parse_unquoted_field()
param = self.expression(exp.CopyParameter, this=option)
if prev in self.COPY_INTO_VARLEN_OPTIONS and self._match(
TokenType.L_PAREN, advance=False
):
# Snowflake FILE_FORMAT case, Databricks COPY & FORMAT options
param.set("expressions", self._parse_wrapped_options())
elif prev == "FILE_FORMAT":
# T-SQL's external file format case
param.set("expression", self._parse_field())
else:
param.set("expression", self._parse_unquoted_field())
param = self.expression(exp.CopyParameter, this=option, expression=value)
options.append(param)
if sep:
self._match(sep)
self._match(sep)
return options
def _parse_credentials(self) -> t.Optional[exp.Credentials]:
expr = self.expression(exp.Credentials)
if self._match_text_seq("STORAGE_INTEGRATION", advance=False):
expr.set("storage", self._parse_conjunction())
if self._match_text_seq("STORAGE_INTEGRATION", "="):
expr.set("storage", self._parse_field())
if self._match_text_seq("CREDENTIALS"):
# Snowflake supports CREDENTIALS = (...), while Redshift CREDENTIALS <string>
# Snowflake case: CREDENTIALS = (...), Redshift case: CREDENTIALS <string>
creds = (
self._parse_wrapped_options() if self._match(TokenType.EQ) else self._parse_field()
)
@ -6661,7 +6735,7 @@ class Parser(metaclass=_Parser):
self._match(TokenType.INTO)
this = (
self._parse_conjunction()
self._parse_select(nested=True, parse_subquery_alias=False)
if self._match(TokenType.L_PAREN, advance=False)
else self._parse_table(schema=True)
)
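
A short usage sketch (not part of the diff) of the new `ALTER COLUMN ... DROP NOT NULL` / `SET NOT NULL` branches added above, assuming the round trip asserted by the Snowflake test further down in this commit:

```python
from sqlglot import parse_one

# DROP NOT NULL now parses into exp.AlterColumn(drop=True, allow_null=True)
expr = parse_one("ALTER TABLE a ALTER COLUMN my_column DROP NOT NULL", read="snowflake")
print(expr.sql(dialect="postgres"))  # ALTER TABLE a ALTER COLUMN my_column DROP NOT NULL
print(expr.sql(dialect="duckdb"))    # ALTER TABLE a ALTER COLUMN my_column DROP NOT NULL
```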


@ -155,13 +155,16 @@ class AbstractMappingSchema:
return [table.this.name]
return [table.text(part) for part in exp.TABLE_PARTS if table.text(part)]
def find(self, table: exp.Table, raise_on_missing: bool = True) -> t.Optional[t.Any]:
def find(
self, table: exp.Table, raise_on_missing: bool = True, ensure_data_types: bool = False
) -> t.Optional[t.Any]:
"""
Returns the schema of a given table.
Args:
table: the target table.
raise_on_missing: whether to raise in case the schema is not found.
ensure_data_types: whether to convert `str` types to their `DataType` equivalents.
Returns:
The schema of the target table.
@ -239,6 +242,20 @@ class MappingSchema(AbstractMappingSchema, Schema):
normalize=mapping_schema.normalize,
)
def find(
self, table: exp.Table, raise_on_missing: bool = True, ensure_data_types: bool = False
) -> t.Optional[t.Any]:
schema = super().find(
table, raise_on_missing=raise_on_missing, ensure_data_types=ensure_data_types
)
if ensure_data_types and isinstance(schema, dict):
schema = {
col: self._to_data_type(dtype) if isinstance(dtype, str) else dtype
for col, dtype in schema.items()
}
return schema
def copy(self, **kwargs) -> MappingSchema:
return MappingSchema(
**{ # type: ignore
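
A minimal sketch (not from the diff) of the new `ensure_data_types` flag on `MappingSchema.find`, mirroring the `test_find` case added at the bottom of this commit:

```python
from sqlglot import exp
from sqlglot.schema import MappingSchema

schema = MappingSchema({"x": {"c": "int"}})
schema.find(exp.to_table("x"))                          # {'c': 'int'}
schema.find(exp.to_table("x"), ensure_data_types=True)  # {'c': exp.DataType.build('int')}
```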


@ -42,6 +42,7 @@ class TestClickhouse(Validator):
self.assertEqual(expr.sql(dialect="clickhouse"), "COUNT(x)")
self.assertIsNone(expr._meta)
self.validate_identity("SELECT CAST(x AS Tuple(String, Array(Nullable(Float64))))")
self.validate_identity("countIf(x, y)")
self.validate_identity("x = y")
self.validate_identity("x <> y")


@ -9,6 +9,7 @@ class TestDatabricks(Validator):
def test_databricks(self):
self.validate_identity("DESCRIBE HISTORY a.b")
self.validate_identity("DESCRIBE history.tbl")
self.validate_identity("CREATE TABLE t (a STRUCT<c: MAP<STRING, STRING>>)")
self.validate_identity("CREATE TABLE t (c STRUCT<interval: DOUBLE COMMENT 'aaa'>)")
self.validate_identity("CREATE TABLE my_table TBLPROPERTIES (a.b=15)")
self.validate_identity("CREATE TABLE my_table TBLPROPERTIES ('a.b'=15)")
@ -50,7 +51,7 @@ class TestDatabricks(Validator):
"TRUNCATE TABLE t1 PARTITION(age = 10, name = 'test', city LIKE 'LA')"
)
self.validate_identity(
"COPY INTO target FROM `s3://link` FILEFORMAT = AVRO VALIDATE = ALL FILES = ('file1', 'file2') FORMAT_OPTIONS(opt1 = TRUE, opt2 = 'test') COPY_OPTIONS(opt3 = 5)"
"COPY INTO target FROM `s3://link` FILEFORMAT = AVRO VALIDATE = ALL FILES = ('file1', 'file2') FORMAT_OPTIONS ('opt1'='true', 'opt2'='test') COPY_OPTIONS ('mergeSchema'='true')"
)
self.validate_all(
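
A sketch (not part of the diff) of the updated Databricks `COPY INTO` expectation, using the exact string the identity test above now asserts:

```python
from sqlglot import parse_one

sql = (
    "COPY INTO target FROM `s3://link` FILEFORMAT = AVRO VALIDATE = ALL "
    "FILES = ('file1', 'file2') FORMAT_OPTIONS ('opt1'='true', 'opt2'='test') "
    "COPY_OPTIONS ('mergeSchema'='true')"
)
assert parse_one(sql, read="databricks").sql(dialect="databricks") == sql
```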


@ -735,7 +735,7 @@ class TestDuckDB(Validator):
)
self.validate_identity(
"COPY lineitem FROM 'lineitem.ndjson' WITH (FORMAT JSON, DELIMITER ',', AUTO_DETECT TRUE, COMPRESSION SNAPPY, CODEC ZSTD, FORCE_NOT_NULL(col1, col2))"
"COPY lineitem FROM 'lineitem.ndjson' WITH (FORMAT JSON, DELIMITER ',', AUTO_DETECT TRUE, COMPRESSION SNAPPY, CODEC ZSTD, FORCE_NOT_NULL (col1, col2))"
)
self.validate_identity(
"COPY (SELECT 42 AS a, 'hello' AS b) TO 'query.json' WITH (FORMAT JSON, ARRAY TRUE)"


@ -406,7 +406,9 @@ class TestHive(Validator):
self.validate_identity("(VALUES (1 AS a, 2 AS b, 3))")
self.validate_identity("SELECT * FROM my_table TIMESTAMP AS OF DATE_ADD(CURRENT_DATE, -1)")
self.validate_identity("SELECT * FROM my_table VERSION AS OF DATE_ADD(CURRENT_DATE, -1)")
self.validate_identity(
"SELECT WEEKOFYEAR('2024-05-22'), DAYOFMONTH('2024-05-22'), DAYOFWEEK('2024-05-22')"
)
self.validate_identity(
"SELECT ROW() OVER (DISTRIBUTE BY x SORT BY y)",
"SELECT ROW() OVER (PARTITION BY x ORDER BY y)",


@ -22,6 +22,7 @@ class TestPostgres(Validator):
self.assertIsInstance(expr, exp.AlterTable)
self.assertEqual(expr.sql(dialect="postgres"), alter_table_only)
self.validate_identity("STRING_TO_ARRAY('xx~^~yy~^~zz', '~^~', 'yy')")
self.validate_identity("SELECT x FROM t WHERE CAST($1 AS TEXT) = 'ok'")
self.validate_identity("SELECT * FROM t TABLESAMPLE SYSTEM (50) REPEATABLE (55)")
self.validate_identity("x @@ y")
@ -327,6 +328,16 @@ class TestPostgres(Validator):
"CAST(x AS BIGINT)",
)
self.validate_all(
"STRING_TO_ARRAY('xx~^~yy~^~zz', '~^~', 'yy')",
read={
"doris": "SPLIT_BY_STRING('xx~^~yy~^~zz', '~^~', 'yy')",
},
write={
"doris": "SPLIT_BY_STRING('xx~^~yy~^~zz', '~^~', 'yy')",
"postgres": "STRING_TO_ARRAY('xx~^~yy~^~zz', '~^~', 'yy')",
},
)
self.validate_all(
"SELECT ARRAY[1, 2, 3] @> ARRAY[1, 2]",
read={
@ -706,6 +717,9 @@ class TestPostgres(Validator):
self.validate_identity(
"COPY (SELECT * FROM t) TO 'file' WITH (FORMAT format, HEADER MATCH, FREEZE TRUE)"
)
self.validate_identity("cast(a as FLOAT)", "CAST(a AS DOUBLE PRECISION)")
self.validate_identity("cast(a as FLOAT8)", "CAST(a AS DOUBLE PRECISION)")
self.validate_identity("cast(a as FLOAT4)", "CAST(a AS REAL)")
def test_ddl(self):
# Checks that user-defined types are parsed into DataType instead of Identifier
@ -723,6 +737,8 @@ class TestPostgres(Validator):
cdef.args["kind"].assert_is(exp.DataType)
self.assertEqual(expr.sql(dialect="postgres"), "CREATE TABLE t (x INTERVAL DAY)")
self.validate_identity("CREATE TABLE t (col INT[3][5])")
self.validate_identity("CREATE TABLE t (col INT[3])")
self.validate_identity("CREATE INDEX IF NOT EXISTS ON t(c)")
self.validate_identity("CREATE INDEX et_vid_idx ON et(vid) INCLUDE (fid)")
self.validate_identity("CREATE INDEX idx_x ON x USING BTREE(x, y) WHERE (NOT y IS NULL)")
@ -845,6 +861,14 @@ class TestPostgres(Validator):
self.validate_identity(
"CREATE UNLOGGED TABLE foo AS WITH t(c) AS (SELECT 1) SELECT * FROM (SELECT c AS c FROM t) AS temp"
)
self.validate_identity(
"CREATE TABLE t (col integer ARRAY[3])",
"CREATE TABLE t (col INT[3])",
)
self.validate_identity(
"CREATE TABLE t (col integer ARRAY)",
"CREATE TABLE t (col INT[])",
)
self.validate_identity(
"CREATE FUNCTION x(INT) RETURNS INT SET search_path TO 'public'",
"CREATE FUNCTION x(INT) RETURNS INT SET search_path = 'public'",


@ -210,6 +210,8 @@ class TestPresto(Validator):
"bigquery": f"SELECT INTERVAL '1' {expected}",
"presto": f"SELECT INTERVAL '1' {expected}",
"trino": f"SELECT INTERVAL '1' {expected}",
"mysql": f"SELECT INTERVAL '1' {expected}",
"doris": f"SELECT INTERVAL '1' {expected}",
},
)


@ -6,6 +6,13 @@ class TestRedshift(Validator):
dialect = "redshift"
def test_redshift(self):
self.validate_all(
"SELECT SPLIT_TO_ARRAY('12,345,6789')",
write={
"postgres": "SELECT STRING_TO_ARRAY('12,345,6789', ',')",
"redshift": "SELECT SPLIT_TO_ARRAY('12,345,6789', ',')",
},
)
self.validate_all(
"GETDATE()",
read={
@ -473,6 +480,10 @@ FROM (
self.validate_identity("CREATE TABLE table_backup BACKUP YES AS SELECT * FROM event")
self.validate_identity("CREATE TABLE table_backup (i INTEGER, b VARCHAR) BACKUP NO")
self.validate_identity("CREATE TABLE table_backup (i INTEGER, b VARCHAR) BACKUP YES")
self.validate_identity(
"select foo, bar from table_1 minus select foo, bar from table_2",
"SELECT foo, bar FROM table_1 EXCEPT SELECT foo, bar FROM table_2",
)
def test_create_table_like(self):
self.validate_identity(
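
A sketch of the two Redshift additions above (a default delimiter for `SPLIT_TO_ARRAY` and `MINUS` normalized to `EXCEPT`), using the expected strings from the tests; illustrative only:

```python
import sqlglot

sqlglot.transpile("SELECT SPLIT_TO_ARRAY('12,345,6789')", read="redshift", write="postgres")[0]
# "SELECT STRING_TO_ARRAY('12,345,6789', ',')"

sqlglot.transpile(
    "select foo, bar from table_1 minus select foo, bar from table_2",
    read="redshift",
    write="redshift",
)[0]
# 'SELECT foo, bar FROM table_1 EXCEPT SELECT foo, bar FROM table_2'
```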


@ -2,6 +2,7 @@ from unittest import mock
from sqlglot import UnsupportedError, exp, parse_one
from sqlglot.optimizer.normalize_identifiers import normalize_identifiers
from sqlglot.optimizer.qualify_columns import quote_identifiers
from tests.dialects.test_dialect import Validator
@ -48,6 +49,8 @@ WHERE
)""",
)
self.validate_identity("SELECT number").selects[0].assert_is(exp.Column)
self.validate_identity("INTERVAL '4 years, 5 months, 3 hours'")
self.validate_identity("ALTER TABLE table1 CLUSTER BY (name DESC)")
self.validate_identity("SELECT rename, replace")
self.validate_identity("SELECT TIMEADD(HOUR, 2, CAST('09:05:03' AS TIME))")
@ -110,6 +113,10 @@ WHERE
self.validate_identity(
"SELECT * FROM DATA AS DATA_L ASOF JOIN DATA AS DATA_R MATCH_CONDITION (DATA_L.VAL > DATA_R.VAL) ON DATA_L.ID = DATA_R.ID"
)
self.validate_identity(
"CURRENT_TIMESTAMP - INTERVAL '1 w' AND (1 = 1)",
"CURRENT_TIMESTAMP() - INTERVAL '1 WEEK' AND (1 = 1)",
)
self.validate_identity(
"REGEXP_REPLACE('target', 'pattern', '\n')",
"REGEXP_REPLACE('target', 'pattern', '\\n')",
@ -1186,6 +1193,25 @@ WHERE
)
def test_ddl(self):
for constraint_prefix in ("WITH ", ""):
with self.subTest(f"Constraint prefix: {constraint_prefix}"):
self.validate_identity(
f"CREATE TABLE t (id INT {constraint_prefix}MASKING POLICY p)",
"CREATE TABLE t (id INT MASKING POLICY p)",
)
self.validate_identity(
f"CREATE TABLE t (id INT {constraint_prefix}MASKING POLICY p USING (c1, c2, c3))",
"CREATE TABLE t (id INT MASKING POLICY p USING (c1, c2, c3))",
)
self.validate_identity(
f"CREATE TABLE t (id INT {constraint_prefix}PROJECTION POLICY p)",
"CREATE TABLE t (id INT PROJECTION POLICY p)",
)
self.validate_identity(
f"CREATE TABLE t (id INT {constraint_prefix}TAG (key1='value_1', key2='value_2'))",
"CREATE TABLE t (id INT TAG (key1='value_1', key2='value_2'))",
)
self.validate_identity(
"""create external table et2(
col1 date as (parse_json(metadata$external_table_partition):COL1::date),
@ -1210,6 +1236,9 @@ WHERE
self.validate_identity(
"CREATE OR REPLACE TAG IF NOT EXISTS cost_center COMMENT='cost_center tag'"
).this.assert_is(exp.Identifier)
self.validate_identity(
"ALTER TABLE db_name.schmaName.tblName ADD COLUMN COLUMN_1 VARCHAR NOT NULL TAG (key1='value_1')"
)
self.validate_identity(
"DROP FUNCTION my_udf (OBJECT(city VARCHAR, zipcode DECIMAL(38, 0), val ARRAY(BOOLEAN)))"
)
@ -1283,6 +1312,20 @@ WHERE
write={"snowflake": "CREATE TABLE a (b INT)"},
)
for action in ("SET", "DROP"):
with self.subTest(f"ALTER COLUMN {action} NOT NULL"):
self.validate_all(
f"""
ALTER TABLE a
ALTER COLUMN my_column {action} NOT NULL;
""",
write={
"snowflake": f"ALTER TABLE a ALTER COLUMN my_column {action} NOT NULL",
"duckdb": f"ALTER TABLE a ALTER COLUMN my_column {action} NOT NULL",
"postgres": f"ALTER TABLE a ALTER COLUMN my_column {action} NOT NULL",
},
)
def test_user_defined_functions(self):
self.validate_all(
"CREATE FUNCTION a(x DATE, y BIGINT) RETURNS ARRAY LANGUAGE JAVASCRIPT AS $$ SELECT 1 $$",
@ -1880,22 +1923,21 @@ STORAGE_ALLOWED_LOCATIONS=('s3://mybucket1/path1/', 's3://mybucket2/path2/')""",
def test_copy(self):
self.validate_identity("COPY INTO test (c1) FROM (SELECT $1.c1 FROM @mystage)")
self.validate_identity(
"""COPY INTO temp FROM @random_stage/path/ FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' NULL_IF = () FIELD_OPTIONALLY_ENCLOSED_BY = '"' TIMESTAMP_FORMAT = 'TZHTZM YYYY-MM-DD HH24:MI:SS.FF9' DATE_FORMAT = 'TZHTZM YYYY-MM-DD HH24:MI:SS.FF9' BINARY_FORMAT = BASE64) VALIDATION_MODE = 'RETURN_3_ROWS'"""
"""COPY INTO temp FROM @random_stage/path/ FILE_FORMAT = (TYPE=CSV FIELD_DELIMITER='|' NULL_IF=('str1', 'str2') FIELD_OPTIONALLY_ENCLOSED_BY='"' TIMESTAMP_FORMAT='TZHTZM YYYY-MM-DD HH24:MI:SS.FF9' DATE_FORMAT='TZHTZM YYYY-MM-DD HH24:MI:SS.FF9' BINARY_FORMAT=BASE64) VALIDATION_MODE = 'RETURN_3_ROWS'"""
)
self.validate_identity(
"""COPY INTO load1 FROM @%load1/data1/ FILES = ('test1.csv', 'test2.csv') FORCE = TRUE"""
"""COPY INTO load1 FROM @%load1/data1/ CREDENTIALS = (AWS_KEY_ID='id' AWS_SECRET_KEY='key' AWS_TOKEN='token') FILES = ('test1.csv', 'test2.csv') FORCE = TRUE"""
)
self.validate_identity(
"""COPY INTO mytable FROM 'azure://myaccount.blob.core.windows.net/mycontainer/data/files' CREDENTIALS = (AZURE_SAS_TOKEN = 'token') ENCRYPTION = (TYPE = 'AZURE_CSE' MASTER_KEY = 'kPx...') FILE_FORMAT = (FORMAT_NAME = my_csv_format)"""
"""COPY INTO mytable FROM 'azure://myaccount.blob.core.windows.net/mycontainer/data/files' CREDENTIALS = (AZURE_SAS_TOKEN='token') ENCRYPTION = (TYPE='AZURE_CSE' MASTER_KEY='kPx...') FILE_FORMAT = (FORMAT_NAME=my_csv_format)"""
)
self.validate_identity(
"""COPY INTO mytable (col1, col2) FROM 's3://mybucket/data/files' FILES = ('file1', 'file2') PATTERN = 'pattern' FILE_FORMAT = (FORMAT_NAME = my_csv_format NULL_IF = ('str1', 'str2')) PARSE_HEADER = TRUE"""
"""COPY INTO mytable (col1, col2) FROM 's3://mybucket/data/files' STORAGE_INTEGRATION = "storage" ENCRYPTION = (TYPE='NONE' MASTER_KEY='key') FILES = ('file1', 'file2') PATTERN = 'pattern' FILE_FORMAT = (FORMAT_NAME=my_csv_format NULL_IF=('')) PARSE_HEADER = TRUE"""
)
self.validate_all(
"""COPY INTO 's3://example/data.csv'
FROM EXTRA.EXAMPLE.TABLE
credentials = (x)
STORAGE_INTEGRATION = S3_INTEGRATION
CREDENTIALS = ()
FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE NULL_IF = ('') FIELD_OPTIONALLY_ENCLOSED_BY = '"')
HEADER = TRUE
OVERWRITE = TRUE
@ -1904,22 +1946,20 @@ STORAGE_ALLOWED_LOCATIONS=('s3://mybucket1/path1/', 's3://mybucket2/path2/')""",
write={
"": """COPY INTO 's3://example/data.csv'
FROM EXTRA.EXAMPLE.TABLE
CREDENTIALS = (x) WITH (
STORAGE_INTEGRATION S3_INTEGRATION,
FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE NULL_IF = (
CREDENTIALS = () WITH (
FILE_FORMAT = (TYPE=CSV COMPRESSION=NONE NULL_IF=(
''
) FIELD_OPTIONALLY_ENCLOSED_BY = '"'),
) FIELD_OPTIONALLY_ENCLOSED_BY='"'),
HEADER TRUE,
OVERWRITE TRUE,
SINGLE TRUE
)""",
"snowflake": """COPY INTO 's3://example/data.csv'
FROM EXTRA.EXAMPLE.TABLE
CREDENTIALS = (x)
STORAGE_INTEGRATION = S3_INTEGRATION
FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE NULL_IF = (
CREDENTIALS = ()
FILE_FORMAT = (TYPE=CSV COMPRESSION=NONE NULL_IF=(
''
) FIELD_OPTIONALLY_ENCLOSED_BY = '"')
) FIELD_OPTIONALLY_ENCLOSED_BY='"')
HEADER = TRUE
OVERWRITE = TRUE
SINGLE = TRUE""",
@ -1929,19 +1969,27 @@ SINGLE = TRUE""",
self.validate_all(
"""COPY INTO 's3://example/data.csv'
FROM EXTRA.EXAMPLE.TABLE
credentials = (x)
STORAGE_INTEGRATION = S3_INTEGRATION
FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE NULL_IF = ('') FIELD_OPTIONALLY_ENCLOSED_BY = '"')
FILE_FORMAT = (TYPE=CSV COMPRESSION=NONE NULL_IF=('') FIELD_OPTIONALLY_ENCLOSED_BY='"')
HEADER = TRUE
OVERWRITE = TRUE
SINGLE = TRUE
""",
write={
"": """COPY INTO 's3://example/data.csv' FROM EXTRA.EXAMPLE.TABLE CREDENTIALS = (x) WITH (STORAGE_INTEGRATION S3_INTEGRATION, FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE NULL_IF = ('') FIELD_OPTIONALLY_ENCLOSED_BY = '"'), HEADER TRUE, OVERWRITE TRUE, SINGLE TRUE)""",
"snowflake": """COPY INTO 's3://example/data.csv' FROM EXTRA.EXAMPLE.TABLE CREDENTIALS = (x) STORAGE_INTEGRATION = S3_INTEGRATION FILE_FORMAT = (TYPE = CSV COMPRESSION = NONE NULL_IF = ('') FIELD_OPTIONALLY_ENCLOSED_BY = '"') HEADER = TRUE OVERWRITE = TRUE SINGLE = TRUE""",
"": """COPY INTO 's3://example/data.csv' FROM EXTRA.EXAMPLE.TABLE STORAGE_INTEGRATION = S3_INTEGRATION WITH (FILE_FORMAT = (TYPE=CSV COMPRESSION=NONE NULL_IF=('') FIELD_OPTIONALLY_ENCLOSED_BY='"'), HEADER TRUE, OVERWRITE TRUE, SINGLE TRUE)""",
"snowflake": """COPY INTO 's3://example/data.csv' FROM EXTRA.EXAMPLE.TABLE STORAGE_INTEGRATION = S3_INTEGRATION FILE_FORMAT = (TYPE=CSV COMPRESSION=NONE NULL_IF=('') FIELD_OPTIONALLY_ENCLOSED_BY='"') HEADER = TRUE OVERWRITE = TRUE SINGLE = TRUE""",
},
)
copy_ast = parse_one(
"""COPY INTO 's3://example/contacts.csv' FROM db.tbl STORAGE_INTEGRATION = PROD_S3_SIDETRADE_INTEGRATION FILE_FORMAT = (FORMAT_NAME=my_csv_format TYPE=CSV COMPRESSION=NONE NULL_IF=('') FIELD_OPTIONALLY_ENCLOSED_BY='"') MATCH_BY_COLUMN_NAME = CASE_SENSITIVE OVERWRITE = TRUE SINGLE = TRUE INCLUDE_METADATA = (col1 = METADATA$START_SCAN_TIME)""",
read="snowflake",
)
self.assertEqual(
quote_identifiers(copy_ast, dialect="snowflake").sql(dialect="snowflake"),
"""COPY INTO 's3://example/contacts.csv' FROM "db"."tbl" STORAGE_INTEGRATION = "PROD_S3_SIDETRADE_INTEGRATION" FILE_FORMAT = (FORMAT_NAME="my_csv_format" TYPE=CSV COMPRESSION=NONE NULL_IF=('') FIELD_OPTIONALLY_ENCLOSED_BY='"') MATCH_BY_COLUMN_NAME = CASE_SENSITIVE OVERWRITE = TRUE SINGLE = TRUE INCLUDE_METADATA = ("col1" = "METADATA$START_SCAN_TIME")""",
)
def test_querying_semi_structured_data(self):
self.validate_identity("SELECT $1")
self.validate_identity("SELECT $1.elem")
@ -1958,10 +2006,10 @@ SINGLE = TRUE""",
self.validate_identity("ALTER TABLE table1 SET TAG foo.bar = 'baz'")
self.validate_identity("ALTER TABLE IF EXISTS foo SET TAG a = 'a', b = 'b', c = 'c'")
self.validate_identity(
"""ALTER TABLE tbl SET STAGE_FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' NULL_IF = () FIELD_OPTIONALLY_ENCLOSED_BY = '"' TIMESTAMP_FORMAT = 'TZHTZM YYYY-MM-DD HH24:MI:SS.FF9' DATE_FORMAT = 'TZHTZM YYYY-MM-DD HH24:MI:SS.FF9' BINARY_FORMAT = BASE64)""",
"""ALTER TABLE tbl SET STAGE_FILE_FORMAT = (TYPE=CSV FIELD_DELIMITER='|' NULL_IF=('') FIELD_OPTIONALLY_ENCLOSED_BY='"' TIMESTAMP_FORMAT='TZHTZM YYYY-MM-DD HH24:MI:SS.FF9' DATE_FORMAT='TZHTZM YYYY-MM-DD HH24:MI:SS.FF9' BINARY_FORMAT=BASE64)""",
)
self.validate_identity(
"""ALTER TABLE tbl SET STAGE_COPY_OPTIONS = (ON_ERROR = SKIP_FILE SIZE_LIMIT = 5 PURGE = TRUE MATCH_BY_COLUMN_NAME = CASE_SENSITIVE)"""
"""ALTER TABLE tbl SET STAGE_COPY_OPTIONS = (ON_ERROR=SKIP_FILE SIZE_LIMIT=5 PURGE=TRUE MATCH_BY_COLUMN_NAME=CASE_SENSITIVE)"""
)
self.validate_identity("ALTER TABLE foo UNSET TAG a, b, c")


@ -325,7 +325,7 @@ TBLPROPERTIES (
write={
"clickhouse": "WITH tbl AS (SELECT 1 AS id, 'eggy' AS name UNION ALL SELECT NULL AS id, 'jake' AS name) SELECT COUNT(DISTINCT id, name) AS cnt FROM tbl",
"databricks": "WITH tbl AS (SELECT 1 AS id, 'eggy' AS name UNION ALL SELECT NULL AS id, 'jake' AS name) SELECT COUNT(DISTINCT id, name) AS cnt FROM tbl",
"doris": "WITH tbl AS (SELECT 1 AS id, 'eggy' AS name UNION ALL SELECT NULL AS id, 'jake' AS name) SELECT COUNT(DISTINCT id, name) AS cnt FROM tbl",
"doris": "WITH tbl AS (SELECT 1 AS id, 'eggy' AS `name` UNION ALL SELECT NULL AS id, 'jake' AS `name`) SELECT COUNT(DISTINCT id, `name`) AS cnt FROM tbl",
"duckdb": "WITH tbl AS (SELECT 1 AS id, 'eggy' AS name UNION ALL SELECT NULL AS id, 'jake' AS name) SELECT COUNT(DISTINCT CASE WHEN id IS NULL THEN NULL WHEN name IS NULL THEN NULL ELSE (id, name) END) AS cnt FROM tbl",
"hive": "WITH tbl AS (SELECT 1 AS id, 'eggy' AS name UNION ALL SELECT NULL AS id, 'jake' AS name) SELECT COUNT(DISTINCT id, name) AS cnt FROM tbl",
"mysql": "WITH tbl AS (SELECT 1 AS id, 'eggy' AS name UNION ALL SELECT NULL AS id, 'jake' AS name) SELECT COUNT(DISTINCT id, name) AS cnt FROM tbl",


@ -31,7 +31,7 @@ class TestTSQL(Validator):
self.validate_identity("CAST(x AS int) OR y", "CAST(x AS INTEGER) <> 0 OR y <> 0")
self.validate_identity("TRUNCATE TABLE t1 WITH (PARTITIONS(1, 2 TO 5, 10 TO 20, 84))")
self.validate_identity(
"COPY INTO test_1 FROM 'path' WITH (FILE_TYPE = 'CSV', CREDENTIAL = (IDENTITY = 'Shared Access Signature', SECRET = 'token'), FIELDTERMINATOR = ';', ROWTERMINATOR = '0X0A', ENCODING = 'UTF8', DATEFORMAT = 'ymd', MAXERRORS = 10, ERRORFILE = 'errorsfolder', IDENTITY_INSERT = 'ON')"
"COPY INTO test_1 FROM 'path' WITH (FORMAT_NAME = test, FILE_TYPE = 'CSV', CREDENTIAL = (IDENTITY='Shared Access Signature', SECRET='token'), FIELDTERMINATOR = ';', ROWTERMINATOR = '0X0A', ENCODING = 'UTF8', DATEFORMAT = 'ymd', MAXERRORS = 10, ERRORFILE = 'errorsfolder', IDENTITY_INSERT = 'ON')"
)
self.validate_all(
@ -1093,7 +1093,13 @@ WHERE
self.validate_all("LEN('x')", write={"tsql": "LEN('x')", "spark": "LENGTH('x')"})
def test_replicate(self):
self.validate_all("REPLICATE('x', 2)", write={"spark": "REPEAT('x', 2)"})
self.validate_all(
"REPLICATE('x', 2)",
write={
"spark": "REPEAT('x', 2)",
"tsql": "REPLICATE('x', 2)",
},
)
def test_isnull(self):
self.validate_all("ISNULL(x, y)", write={"spark": "COALESCE(x, y)"})


@ -210,6 +210,9 @@ SELECT _q_1.a AS a FROM (SELECT _q_0.a AS a FROM (SELECT x.a AS a FROM x AS x) A
SELECT x.a FROM x AS x JOIN (SELECT * FROM x) AS y ON x.a = y.a;
SELECT x.a AS a FROM x AS x JOIN (SELECT x.a AS a, x.b AS b FROM x AS x) AS y ON x.a = y.a;
SELECT a FROM x as t1 /* there is comment */;
SELECT t1.a AS a FROM x AS t1 /* there is comment */;
--------------------------------------
-- Joins
--------------------------------------
@ -314,6 +317,28 @@ SELECT s.a AS a, s.b AS b FROM (SELECT t.a AS a, t.b AS b FROM t AS t) AS s;
SELECT * FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) AS s(b);
SELECT s.b AS b FROM (SELECT t1.b AS b FROM t1 AS t1 UNION ALL SELECT t2.b AS b FROM t2 AS t2) AS s;
# dialect: bigquery
# execute: false
WITH tbl1 AS (SELECT STRUCT(1 AS col1, 2 AS col2, Struct("test" AS col1, Struct(3 AS col2) AS lvl2) AS lvl1) AS col), tbl2 AS (SELECT STRUCT(1 AS col1, 2 AS col2, Struct("test" AS col1, Struct(3 AS col2) AS lvl2) AS lvl1) AS col) SELECT tbl1.col.*, tbl2.col.* FROM tbl1, tbl2;
WITH tbl1 AS (SELECT STRUCT(1 AS col1, 2 AS col2, Struct('test' AS col1, Struct(3 AS col2) AS lvl2) AS lvl1) AS col), tbl2 AS (SELECT STRUCT(1 AS col1, 2 AS col2, Struct('test' AS col1, Struct(3 AS col2) AS lvl2) AS lvl1) AS col) SELECT tbl1.col.col1 AS col1, tbl1.col.col2 AS col2, tbl1.col.lvl1 AS lvl1, tbl2.col.col1 AS col1, tbl2.col.col2 AS col2, tbl2.col.lvl1 AS lvl1 FROM tbl1 AS tbl1, tbl2 AS tbl2;
# dialect: bigquery
# execute: false
WITH tbl1 AS (SELECT STRUCT(1 AS col1, 2 AS col2, Struct("test" AS col1, Struct(3 AS col2) AS lvl2) AS lvl1, 3 AS col3) AS col) SELECT tbl1.col.lvl1.* FROM tbl1;
WITH tbl1 AS (SELECT STRUCT(1 AS col1, 2 AS col2, Struct('test' AS col1, Struct(3 AS col2) AS lvl2) AS lvl1, 3 AS col3) AS col) SELECT tbl1.col.lvl1.col1 AS col1, tbl1.col.lvl1.lvl2 AS lvl2 FROM tbl1 AS tbl1;
# dialect: bigquery
# execute: false
# title: Cannot expand struct star with unnamed fields
WITH tbl1 AS (SELECT STRUCT(1 AS col1, Struct(5 AS col1)) AS col) SELECT tbl1.col.* FROM tbl1;
WITH tbl1 AS (SELECT STRUCT(1 AS col1, Struct(5 AS col1)) AS col) SELECT tbl1.col.* FROM tbl1 AS tbl1;
# dialect: bigquery
# execute: false
# title: Cannot expand struct star with ambiguous fields
WITH tbl1 AS (SELECT STRUCT(1 AS col1, 2 AS col1) AS col) SELECT tbl1.col.* FROM tbl1;
WITH tbl1 AS (SELECT STRUCT(1 AS col1, 2 AS col1) AS col) SELECT tbl1.col.* FROM tbl1 AS tbl1;
--------------------------------------
-- CTEs
--------------------------------------


@ -317,6 +317,18 @@ class TestOptimizer(unittest.TestCase):
'WITH "t" AS (SELECT 1 AS "c") (SELECT "t"."c" AS "c" FROM "t" AS "t")',
)
self.assertEqual(
optimizer.qualify_columns.qualify_columns(
parse_one(
"WITH tbl1 AS (SELECT STRUCT(1 AS `f0`, 2 as f1) AS col) SELECT tbl1.col.* from tbl1",
dialect="bigquery",
),
schema=MappingSchema(schema=None, dialect="bigquery"),
infer_schema=False,
).sql(dialect="bigquery"),
"WITH tbl1 AS (SELECT STRUCT(1 AS `f0`, 2 AS f1) AS col) SELECT tbl1.col.`f0` AS `f0`, tbl1.col.f1 AS f1 FROM tbl1",
)
self.check_file(
"qualify_columns", qualify_columns, execute=True, schema=self.schema, set_dialect=True
)
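
A sketch (not from the diff) of BigQuery struct-star expansion through `qualify_columns`, mirroring the assertion added above:

```python
from sqlglot import parse_one
from sqlglot.optimizer.qualify_columns import qualify_columns
from sqlglot.schema import MappingSchema

expr = parse_one(
    "WITH tbl1 AS (SELECT STRUCT(1 AS `f0`, 2 AS f1) AS col) SELECT tbl1.col.* FROM tbl1",
    dialect="bigquery",
)
print(
    qualify_columns(
        expr, schema=MappingSchema(schema=None, dialect="bigquery"), infer_schema=False
    ).sql(dialect="bigquery")
)
# WITH tbl1 AS (SELECT STRUCT(1 AS `f0`, 2 AS f1) AS col) SELECT tbl1.col.`f0` AS `f0`, tbl1.col.f1 AS f1 FROM tbl1
```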


@ -303,3 +303,10 @@ class TestSchema(unittest.TestCase):
schema = MappingSchema({"x": {"c": "int"}})
self.assertTrue(schema.has_column("x", exp.column("c")))
self.assertFalse(schema.has_column("x", exp.column("k")))
def test_find(self):
schema = MappingSchema({"x": {"c": "int"}})
found = schema.find(exp.to_table("x"))
self.assertEqual(found, {"c": "int"})
found = schema.find(exp.to_table("x"), ensure_data_types=True)
self.assertEqual(found, {"c": exp.DataType.build("int")})