[Feature] Add table function relation syntax to SQL grammar#5318
Conversation
Add table function relation support to the SQL parser: - New `tableFunctionRelation` alternative in `relation` grammar rule - Named argument syntax: `key=value` (e.g., table='index', field='vec') - Alias is required by grammar (FROM func(...) AS alias) - AstBuilder emits existing TableFunction + SubqueryAlias AST nodes - 3 parser unit tests: basic parse, with WHERE/ORDER BY/LIMIT, alias required This is a pure grammar change — no execution support yet. Queries will parse successfully but fail at the Analyzer with "unsupported function". Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
PR Reviewer Guide 🔍(Review updated until commit d73e73f)Here are some key observations to aid the review process:
|
PR Code Suggestions ✨Latest suggestions up to d73e73f Explore these optional code suggestions:
Previous suggestionsSuggestions up to commit 7b30898
Suggestions up to commit 023d31f
|
1. Canonicalize argument names at parser boundary:
unquoteIdentifier + toLowerCase(Locale.ROOT) in visitTableFunctionRelation
so FIELD='x' and `field`='x' both produce argName="field"
2. Make AS keyword optional (AS? alias) for consistency with
tableAsRelation and subqueryAsRelation grammar rules
3. Strengthen test coverage:
- Full structural AST assertion for WHERE + ORDER BY + LIMIT
(verifies Sort, Limit, Filter nodes, not just toString)
- Argument reorder test proves names resolve by name not position
- Case canonicalization test (TABLE= → table=)
- Alias-without-AS test (FROM func(...) v)
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
|
Persistent review updated to latest commit 7b30898 |
Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
|
Persistent review updated to latest commit d73e73f |
| .forEach( | ||
| arg -> { | ||
| String argName = | ||
| StringUtils.unquoteIdentifier(arg.ident().getText()).toLowerCase(Locale.ROOT); |
There was a problem hiding this comment.
Does visitAstExpression (visitIdent) handle this already?
There was a problem hiding this comment.
No. visitIdent() only unquotes identifiers; it does not lowercase them. In this path, the arg name is stored in UnresolvedArgument, and ExpressionAnalyzer.visitUnresolvedArgument() preserves that name when creating NamedArgumentExpression. The explicit toLowerCase(Locale.ROOT) in visitTableFunctionRelation() is therefore what canonicalizes named arg keys for table-function relations.
There was a problem hiding this comment.
Thanks @mengweieric — that confirms the lowercase is the only thing canonicalizing arg keys on this path. My concern isn't whether it's needed, but where it lives.
Since UnresolvedArgument → NamedArgumentExpression is a shared pipeline that PPL also feeds (PPL's visitTableFunction doesn't lowercase, it just uses arg.ident().getText()), we now have two front-ends with different casing policies converging on the same TableFunctionResolver. The existing Prometheus resolver compares names with a case-sensitive HashSet.removeAll (TableFunctionUtils.java#L73-L89), so a future vectorSearch resolver that declares args in camelCase (topK, filterType) will silently work in PPL and fail in SQL with Missing arguments:[topK,filterType].
Two ways to keep them in sync — what's your preference?
- Move the
toLowerCaseintoExpressionAnalyzer.visitUnresolvedArgument(orAnalyzer.visitTableFunction) so both parsers get the same canonicalization for free. - Keep it here, but also apply it in PPL's
visitTableFunctionand add a javadoc onTableFunction.javadocumenting the contract so future resolver authors know to declare arg names in lowercase.
Not a blocker tbh — I was trying to avoid the "works in PPL, fails in SQL" footgun before vectorSearch lands on top of this.
There was a problem hiding this comment.
My preference would be to centralize table-function arg canonicalization in Analyzer.visitTableFunction rather than ExpressionAnalyzer, so both front-ends share one table-function-specific policy without changing all named-arg handling globally. For this PR I kept the fix local to the SQL parser to minimize scope.
RyanL1997
left a comment
There was a problem hiding this comment.
Hi @mengweieric , thanks for the change. LGTM, but I do left a comment.
| .forEach( | ||
| arg -> { | ||
| String argName = | ||
| StringUtils.unquoteIdentifier(arg.ident().getText()).toLowerCase(Locale.ROOT); |
There was a problem hiding this comment.
Thanks @mengweieric — that confirms the lowercase is the only thing canonicalizing arg keys on this path. My concern isn't whether it's needed, but where it lives.
Since UnresolvedArgument → NamedArgumentExpression is a shared pipeline that PPL also feeds (PPL's visitTableFunction doesn't lowercase, it just uses arg.ident().getText()), we now have two front-ends with different casing policies converging on the same TableFunctionResolver. The existing Prometheus resolver compares names with a case-sensitive HashSet.removeAll (TableFunctionUtils.java#L73-L89), so a future vectorSearch resolver that declares args in camelCase (topK, filterType) will silently work in PPL and fail in SQL with Missing arguments:[topK,filterType].
Two ways to keep them in sync — what's your preference?
- Move the
toLowerCaseintoExpressionAnalyzer.visitUnresolvedArgument(orAnalyzer.visitTableFunction) so both parsers get the same canonicalization for free. - Keep it here, but also apply it in PPL's
visitTableFunctionand add a javadoc onTableFunction.javadocumenting the contract so future resolver authors know to declare arg names in lowercase.
Not a blocker tbh — I was trying to avoid the "works in PPL, fails in SQL" footgun before vectorSearch lands on top of this.
e56d097
into
opensearch-project:feature/vector-search-p0
Summary
Adds generic table function relation syntax to the SQL grammar, enabling
FROM funcName(key=value, ...) AS aliasin SQL queries. This is the foundational grammar change for the SQL vectorSearch() feature, but the syntax is not vectorSearch-specific — any table function registered with the storage engine can be invoked through this syntax.Changes
tableFunctionRelationalternative in therelationrule, withtableFunctionArgsandtableFunctionArgsub-rules for named argument parsingvisitTableFunctionRelation()emitsSubqueryAlias(alias, TableFunction(name, args))AST nodes. Argument names are canonicalized at the parser boundary (unquoteIdentifier+toLowerCase) for case-insensitive matching downstreamTest plan
./gradlew :sql:test --tests "org.opensearch.sql.sql.parser.AstBuilderTest"— all 51 tests pass