Skip to content

Conversation

@kyleconroy
Copy link
Collaborator

…itions

When a parenthesized UNION query contains a DISTINCT->ALL mode transition, the explain output should group the DISTINCT portion into a nested SelectWithUnionQuery and lift the remaining selects to the outer level.

Changes:

  • Parser now keeps parenthesized unions as nested SelectWithUnionQuery
  • Added expandNestedUnions() to flatten/expand nested unions appropriately:
    • Single-select nested unions are flattened
    • All-ALL mode nested unions are fully flattened
    • Nested unions with DISTINCT->ALL transitions are expanded to grouped results
  • Updated groupSelectsByUnionMode() to find the last non-ALL->ALL transition
  • Applied expansion logic to both regular and inherited-WITH explain paths

Fixes stmt13 and stmt28 in 01529_union_distinct_and_setting_union_default_mode

kyleconroy and others added 30 commits January 15, 2026 06:47
…itions

When a parenthesized UNION query contains a DISTINCT->ALL mode transition,
the explain output should group the DISTINCT portion into a nested
SelectWithUnionQuery and lift the remaining selects to the outer level.

Changes:
- Parser now keeps parenthesized unions as nested SelectWithUnionQuery
- Added expandNestedUnions() to flatten/expand nested unions appropriately:
  - Single-select nested unions are flattened
  - All-ALL mode nested unions are fully flattened
  - Nested unions with DISTINCT->ALL transitions are expanded to grouped results
- Updated groupSelectsByUnionMode() to find the last non-ALL->ALL transition
- Applied expansion logic to both regular and inherited-WITH explain paths

Fixes stmt13 and stmt28 in 01529_union_distinct_and_setting_union_default_mode

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
The parser was not handling the UUID clause in CREATE MATERIALIZED VIEW
statements, causing the rest of the statement to be skipped. Added UUID
handling similar to how CREATE TABLE handles it.

Fixes:
- 00510_materizlized_view_and_deduplication_zookeeper stmt9, stmt10
- 00609_mv_index_in_in stmt7

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When COMMENT comes before SETTINGS in a CREATE TABLE statement, SETTINGS
should be output at the CreateQuery level (outside Storage definition).
When SETTINGS comes before COMMENT, it stays inside Storage definition.

Added SettingsBeforeComment field to track the order in the AST.

Fixes:
- 03234_enable_secure_identifiers stmt11, stmt14

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add support for parsing APPLY(quantiles(0.5)) and similar parameterized
function calls within column transformers. Previously, the parser only
handled simple function names like APPLY(sum), but not functions with
parameters.

Changes:
- Add ApplyParams field to ColumnTransformer struct in AST
- Update parseColumnsApply and parseAsteriskApply to handle nested
  parentheses for parameterized functions
- Fixes 01470_columns_transformers stmt41, stmt42
- Also fixes 01710_projection_with_column_transformers

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Changes:
- Add parsing for ORDER BY clause in CREATE DATABASE statements
- Add QuerySettings field to CreateQuery AST for second SETTINGS clause
- Update parser to store second SETTINGS in QuerySettings
- Update explain to output QuerySettings as Set at CreateQuery level
- Fixes 02184_default_table_engine stmt56 (CREATE DATABASE ORDER BY)
- Fixes 02184_default_table_engine stmt107 (multiple SETTINGS)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add parsing and explain support for RENAME DATABASE statements.

Changes:
- Add RenameDatabase field to RenameQuery AST
- Update parser to handle RENAME DATABASE syntax
- Update explain to output correct format for database renames
- Fixes 01155_rename_move_materialized_view stmt44, stmt52
- Also fixes 02096_rename_atomic_hang stmt14

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add proper parsing for SETTINGS clause in CREATE DICTIONARY statements.
Handle SETTINGS as a keyword token (not IDENT).

Changes:
- Add SETTINGS keyword handling in dictionary definition parsing
- Parse settings with or without parentheses
- Output as "Dictionary settings" (not "Set") in explain
- Update condition to include Settings in dictionary definition check
- Fixes 01268_dictionary_direct_layout stmt25, stmt26
- Also fixes: 01259_dictionary_custom_settings_ddl stmt6,
  01676_range_hashed_dictionary stmt5, 01681_cache_dictionary_simple_key stmt7,
  01760_polygon_dictionaries stmt17, 01765_hashed_dictionary_simple_key stmt7

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When an IN expression has a trailing comma like `IN (2,)`, it should be
represented as a Function tuple with one element, not as a plain literal.

- Add TrailingComma field to InExpr to track trailing comma presence
- Add parseInList helper to detect trailing commas during parsing
- Update explainInExpr and explainInExprWithAlias to wrap single elements
  with trailing comma in Function tuple

Fixes stmt45 and stmt46 in 01756_optimize_skip_unused_shards_rewrite_in
and stmt5 in 01757_optimize_skip_unused_shards_limit.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When parsing negative numbers with :: cast like `-0.11::Float32`, the
expression parsing loop in parseUnaryMinus was using LOWEST precedence,
which incorrectly consumed the `and` keyword from BETWEEN expressions.

Fix by using MUL_PREC as the threshold, which allows casts (::) and
member access (.) but stops before operators like AND.

Fixes stmt42 and stmt43 in 02892_orc_filter_pushdown and
stmt26 in 02841_parquet_filter_pushdown.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added support for underscore digit separators in binary literals
(0b0010_0100_0111) and octal literals (0o755_644) to match ClickHouse's
behavior for numeric literals with underscores.

Updated both readNumber and readNumberOrIdent functions in the lexer
to handle underscores in binary and octal number formats.

Fixes stmt6 in 02354_numeric_literals_with_underscores.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added case for TernaryExpr in explainWithElement to properly output
the alias from WITH clause. Ternary expressions (? :) become
Function if with the alias from the WITH element.

Fixes stmt1 in 03254_uniq_exact_two_level_negative_zero.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added case for "CHANGED" keyword in parseShow to handle the
SHOW CHANGED SETTINGS query form which filters settings to only
show those that have been modified from their defaults.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Test 03254_test_alter_user_no_changes has stmt2 with clientError
SYNTAX_ERROR which means ClickHouse expects it to fail parsing.
There's no explain_2.txt since ClickHouse can't produce EXPLAIN
output for a syntax error. Our parser is more permissive and
parses it anyway, but there's nothing to compare against.

Removing stmt2 from explain_todo since:
- No explain_2.txt exists to compare against
- Test already skips due to missing file
- Parser being more permissive is acceptable behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ClickHouse supports short notations for interval units like
"INTERVAL 4 h" for hours. Added support for:
- h -> hour
- m -> minute
- s -> second
- d -> day
- w -> week
- ms -> millisecond
- us -> microsecond
- ns -> nanosecond

Updated both the parser's intervalUnits map and the explain code's
normalizeIntervalUnit functions to handle these short forms.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
In ClickHouse, the ternary operator (? :) has very low precedence,
lower than AND and OR. This means "0 AND id ? 1 : 2" parses as
"(0 AND id) ? 1 : 2" not "0 AND (id ? 1 : 2)".

Added TERNARY_PREC precedence level between ALIAS_PREC and OR_PREC
to ensure correct parsing of expressions with ternary operators.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ClickHouse sometimes appends "OK" as a success indicator after
EXPLAIN AST output. This is not part of the actual AST and should
be stripped when comparing expected vs actual output.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
In ClickHouse, unary plus (+x) is a no-op and doesn't appear in the
EXPLAIN AST output. Updated parseUnaryPlus to simply return the
operand without wrapping it in a UnaryExpr.

This fixes parsing of expressions like (+c0.2) which should produce
just tupleElement, not Function + wrapping tupleElement.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Added parsing and explain support for the ALTER TABLE MODIFY QUERY
statement which modifies the SELECT query of a materialized view.

Changes:
- Added AlterModifyQuery command type to ast/ast.go
- Added Query field to AlterCommand struct
- Added parsing logic for MODIFY QUERY in parser.go
- Added explain output handling in statements.go

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add escapeFunctionAlias function that only escapes single quotes (not
backslashes) for function aliases. This differs from escapeAlias (used
for column aliases) which also escapes backslashes.

ClickHouse EXPLAIN AST preserves backslashes in function aliases but
requires single quotes to be escaped.

Fixes test: 02915_analyzer_fuzz_1/stmt2

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When ClickHouse errors at runtime (e.g., BAD_ARGUMENTS for invalid settings),
it doesn't produce EXPLAIN output. These statements have empty expected files
and the --{clientError annotation in the SQL.

Skip these tests since the parser correctly parses the valid SQL but ClickHouse
would error at runtime. Also update explain_todo when -check-explain finds
these cases.

Fixes test: 03001_max_parallel_replicas_zero_value/stmt3

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…UTF-8

1. Lexer: Process escape sequences (\xFF, \0, etc.) in backtick-quoted
   identifiers, matching the behavior of string literals.

2. Explain output:
   - Replace invalid UTF-8 bytes with replacement character (U+FFFD)
   - Display null bytes as escape sequence \0
   - Escape backslashes and single quotes in identifier/alias output

3. Apply sanitization to:
   - Column declarations
   - Identifier names
   - Function aliases
   - Storage definition ORDER BY identifiers

This matches ClickHouse's EXPLAIN AST behavior for handling special
characters in identifiers.

Fixes test: 03356_tables_with_binary_identifiers_invalid_utf8/stmt2

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Remove the condition that skipped quoted aliases in CASE expressions.
ClickHouse EXPLAIN AST shows the alias regardless of whether it was
quoted in the original SQL.

Fixes test: 02244_casewithexpression_return_type/stmt1

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Reorder column declaration children to match ClickHouse's EXPLAIN AST:
1. Type
2. Settings (SETTINGS clause)
3. Default value
4. TTL
5. Codec
6. Statistics
7. Comment

Previously Settings was output after TTL and Statistics, but ClickHouse
outputs it right after the type.

Fixes test: 03270_fix_column_modifier_write_order/stmt3

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add check for PARTITION ID 'value' syntax when parsing ALTER TABLE
APPLY DELETED MASK IN PARTITION clause. This sets the PartitionIsID flag
so the explain output correctly shows Partition_ID instead of Partition.

Fixes test: 03743_fix_estimator_crash/stmt6

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
FORCE is equivalent to FINAL in OPTIMIZE TABLE statements - both force
a merge. Add parsing support for FORCE to set the Final flag.

Fixes test: 03306_optimize_table_force_keyword/stmt4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When arrays contain parenthesized literals like [[((NULL))]], they should
be rendered as nested Function array calls, not as Literal Array format.

Update containsNonLiteralExpressions to detect parenthesized literals
so nested arrays containing them use the correct Function array format.

Fixes tests:
- 01621_summap_check_types/stmt4
- 01635_sum_map_fuzz/stmt4

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Skip optional DATABASE keyword in USE statements (USE DATABASE d1 == USE d1)
- Only skip DATABASE if followed by an identifier (not semicolon/EOF)
- Track clientError annotations in splitStatements for proper handling
- Handle clientError for statements with no explain file (runtime errors)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ClickHouse outputs WINDOW definitions before QUALIFY in the EXPLAIN AST.
Fixed the ordering in both explainSelectQuery and explainSelectQueryWithInheritedWith.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When outputting types like AggregateFunction(1, sumMapFiltered([1, 2]), ...),
properly format nested function calls and array literals instead of using
raw Go struct representation.

Added formatFunctionCallForType and formatExprForType to handle nested
expressions in type parameters.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When a QueryParameter (e.g., {param:Type}) has an alias (AS name),
output it as: QueryParameter param:Type (alias name)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
kyleconroy and others added 29 commits January 15, 2026 09:42
ClickHouse EXPLAIN AST preserves the original case of function names as
written in the SQL source (e.g., CEIL stays CEIL, COALESCE stays COALESCE).
Only a few functions are normalized to specific canonical forms (e.g.,
DATEDIFF→dateDiff, POSITION→position, SUBSTRING→substring).

This fixes issues where special parser functions (parseIfFunction,
parseExtract, parseSubstring, parseArrayConstructor) were hardcoding
lowercase names instead of preserving the original case from the token.
Also fixes parseKeywordAsFunction which was incorrectly lowercasing all
keyword-based function names.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
For parametric aggregate functions like groupArraySample(5, 11111)(DISTINCT x),
the DISTINCT modifier was being parsed as a column name instead of being
recognized as a modifier. This adds DISTINCT/ALL handling to parseParametricFunctionCall
matching the existing logic in parseFunctionCall.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Projection column lists like `SELECT name, max(frequency) max_frequency`
need to handle implicit aliases where an identifier follows an expression
without the AS keyword. This adds parseImplicitAlias call after parsing
each column expression in projections.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ClickHouse EXPLAIN AST normalizes CLEAR PROJECTION to DROP_PROJECTION,
similar to how CLEAR_COLUMN→DROP_COLUMN and CLEAR_INDEX→DROP_INDEX
are already mapped.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…N AST

When array literals contain nested arrays with non-literal expressions
(like identifiers or binary operations), they should be output as nested
Function array calls, not as a single Literal node. This adds a recursive
check containsNonLiteralExpressionsRecursive that properly detects
expressions like [[[number]],[[number + 1],...]] at any nesting depth.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ClickHouse allows ON CLUSTER to appear either before or after column
definitions in CREATE MATERIALIZED VIEW statements. The parser was only
checking after column definitions, causing parsing failures for syntax
like: CREATE MATERIALIZED VIEW v ON CLUSTER c (x Int) ENGINE=...

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
COMMENT was being parsed as token.COMMENT but the dictionary parsing loop
only handled PRIMARY and SETTINGS as keyword tokens. Added handling for
token.COMMENT in the loop and output the comment as a Literal in the
EXPLAIN AST.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
MOVE PARTITION can target a disk or volume in addition to a table.
Added parsing for TO DISK 'disk_name' and TO VOLUME 'volume_name'
syntax which were previously causing parse errors.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Parse and explain SETTINGS for SYSTEM commands like
"SYSTEM FLUSH DISTRIBUTED ... SETTINGS ...".

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
SYSTEM LOAD PRIMARY KEY and SYSTEM UNLOAD PRIMARY KEY commands need
the table name to appear twice in EXPLAIN output, similar to
RELOAD DICTIONARY.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Parse TRUNCATE DATABASE differently from TRUNCATE TABLE and format
with the correct spacing in EXPLAIN output. ClickHouse uses different
spacing conventions for these two variants.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Parse ALTER TABLE ... REMOVE TTL command and output the correct
REMOVE_TTL command type in EXPLAIN output.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add Settings field to KillQuery struct
- Parse SETTINGS clause for KILL QUERY/MUTATION statements
- Map comparison operators to function names in EXPLAIN output
  (e.g., = becomes equals, != becomes notEquals)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
concat_ws should NOT be normalized to concat - ClickHouse preserves
the function name as concat_ws in EXPLAIN AST output.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
RENAME COLUMN IF EXISTS was not being parsed correctly - the IF token
was being treated as the column name. This fix adds proper IF EXISTS
handling for RENAME COLUMN, similar to DROP COLUMN.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Track whether tuples have spaces after commas in the source code and
preserve this formatting when outputting EXPLAIN AST. This is needed
for Ring/Polygon/MultiPolygon type literals to match ClickHouse output.

Changes:
- Add formatTupleAsStringFromLiteral to respect SpacedCommas flag
- Track spacedCommas when parsing tuples in parseGroupedOrTuple
- Fixes 00727_concat/stmt44 and 02935_format_with_arbitrary_types/stmt44

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When an IN expression contains only NULL literals (e.g., `IN (NULL, NULL)`),
format them as a tuple literal `Tuple_(NULL, NULL)` instead of
`Function tuple`. This matches ClickHouse's EXPLAIN AST output.

The fix adds an `allNull` flag to track when all items are NULL and
allows tuple literal formatting in this case. Applied to both
explainInExpr and explainInExprWithAlias functions.

Fixes 01558_transform_null_in/stmt21

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…sions (#117)

Fixes ORDER BY expressions like `(a + b) * c` being truncated to just `(a + b)`.

- Add isBinaryOperatorToken() to detect binary operators after parenthesized expressions
- Add parseExpressionFrom() to continue Pratt parsing from an existing left operand
- Check for binary operators after parsing parenthesized expressions in ORDER BY

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
ClickHouse has special handling where INSERT VALUES followed by SELECT on the
same line outputs the INSERT AST and then executes the SELECT, printing its result.

- Add ExplainStatements() function to handle multi-statement explain output
- When first statement is INSERT and subsequent are SELECT with simple literals,
  append the literal values to match ClickHouse behavior
- Update test to use ExplainStatements() for all explain output

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ens (#119)

When detecting SET continuation vs new TTL after comma, use peek/peekPeek
to check the pattern (IDENT EQ) without consuming tokens. This avoids
losing lexer state when restoring position.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use AND_PREC instead of ALIAS_PREC when parsing INTERVAL values to prevent
consuming subsequent AND expressions as part of the interval.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Parse and explain REFRESH AFTER/EVERY interval APPEND TO syntax:
- Add REFRESH-related fields to CreateQuery AST
- Parse REFRESH type (AFTER/EVERY), interval, unit, APPEND TO, and EMPTY
- Output "Refresh strategy definition" and "TimeInterval" in explain

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…122)

The Settings column in system tables was being lexed as token.SETTINGS and
incorrectly terminating function argument parsing. Now check for array access
syntax (followed by [) before treating SETTINGS as a clause keyword.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When GROUPING SETS contains ((a,b,c)) with double parentheses, the inner
tuple should be output as Function tuple, not unwrapped. Use the Parenthesized
flag on tuple literals to detect double-paren cases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Use different precedence based on the first token of the interval value:
- String literals like '1 day' have embedded units, so use ADD_PREC
  to stop before arithmetic operators
- Other expressions need arithmetic included before the unit, so use
  LOWEST precedence

This fixes `interval '1 day' - interval '1 hour'` (two separate intervals)
while still handling `INTERVAL number - 15 MONTH` correctly (arithmetic
expression with separate unit).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add WindowView and InnerEngine fields to CreateQuery AST
- Handle CREATE WINDOW VIEW syntax with TO clause and INNER ENGINE
- Parse INNER ENGINE clause for window views (storage for internal data)
- Update EXPLAIN output to show ViewTargets with Storage definition
- ORDER BY for window views goes inside ViewTargets, not separate Storage

Fixes 01049_window_view_window_functions/stmt41.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
….) AS name

- Remove special case for `(SELECT ...) AS name` in parseWithClause, letting it
  go through the expression parser which handles binary expressions
- Add ScalarWith flag to WithElement to distinguish between:
  - "name AS (SELECT ...)" - standard CTE syntax
  - "(SELECT ...) AS name" - ClickHouse scalar WITH syntax
- Update EXPLAIN output to use correct format based on ScalarWith flag

Fixes 03212_variant_dynamic_cast_or_default/stmt51.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When parsing CAST(x, type) with comma syntax, check if the type string
is followed by an operator (CONCAT, PLUS, MINUS) before consuming it.
If so, parse it as a full expression to handle cases like:
  CAST(123, 'Str'||'ing')

Fixes 03011_definitive_guide_to_cast/stmt36.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When parsing `SELECT * REPLACE expr AS name, other_col` without parentheses,
the REPLACE parser was consuming the comma before breaking, preventing the
caller from seeing there's another select item. Now the comma is only consumed
when inside parentheses.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@kyleconroy kyleconroy merged commit 544c6b2 into main Jan 15, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants