Skip to content

[Feature] Add table function relation syntax to SQL grammar#5318

Merged
mengweieric merged 3 commits intoopensearch-project:feature/vector-search-p0from
mengweieric:feature/vector-search-p0
Apr 7, 2026
Merged

[Feature] Add table function relation syntax to SQL grammar#5318
mengweieric merged 3 commits intoopensearch-project:feature/vector-search-p0from
mengweieric:feature/vector-search-p0

Conversation

@mengweieric
Copy link
Copy Markdown
Collaborator

@mengweieric mengweieric commented Apr 6, 2026

Summary

Adds generic table function relation syntax to the SQL grammar, enabling FROM funcName(key=value, ...) AS alias in SQL queries. This is the foundational grammar change for the SQL vectorSearch() feature, but the syntax is not vectorSearch-specific — any table function registered with the storage engine can be invoked through this syntax.

Changes

  • Grammar: New tableFunctionRelation alternative in the relation rule, with tableFunctionArgs and tableFunctionArg sub-rules for named argument parsing
  • AstBuilder: visitTableFunctionRelation() emits SubqueryAlias(alias, TableFunction(name, args)) AST nodes. Argument names are canonicalized at the parser boundary (unquoteIdentifier + toLowerCase) for case-insensitive matching downstream
  • Tests: 6 parser unit tests covering basic parse, full clause combo (WHERE + ORDER BY + LIMIT with structural AST assertion), argument reordering, case canonicalization, alias-without-AS, and alias-required negative test

Test plan

  • ./gradlew :sql:test --tests "org.opensearch.sql.sql.parser.AstBuilderTest" — all 51 tests pass
  • CI green

Add table function relation support to the SQL parser:
- New `tableFunctionRelation` alternative in `relation` grammar rule
- Named argument syntax: `key=value` (e.g., table='index', field='vec')
- Alias is required by grammar (FROM func(...) AS alias)
- AstBuilder emits existing TableFunction + SubqueryAlias AST nodes
- 3 parser unit tests: basic parse, with WHERE/ORDER BY/LIMIT, alias required

This is a pure grammar change — no execution support yet. Queries will
parse successfully but fail at the Analyzer with "unsupported function".

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

PR Reviewer Guide 🔍

(Review updated until commit d73e73f)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add table function relation grammar rules

Relevant files:

  • sql/src/main/antlr/OpenSearchSQLParser.g4

Sub-PR theme: Implement and test AstBuilder visitor for table function relation

Relevant files:

  • sql/src/main/java/org/opensearch/sql/sql/parser/AstBuilder.java
  • sql/src/test/java/org/opensearch/sql/sql/parser/AstBuilderTest.java

⚡ Recommended focus areas for review

Empty Args

The tableFunctionArgs rule requires at least one argument (tableFunctionArg (COMMA tableFunctionArg)*). There is no support for zero-argument table functions. If any registered table function takes no arguments, this grammar will reject it. Consider whether tableFunctionArgs should be optional or allow empty argument lists.

tableFunctionArgs
   : tableFunctionArg (COMMA tableFunctionArg)*
   ;
Function Name Case

The function name (qualifiedName) is not canonicalized — only argument names are lowercased. This means vectorSearch(...) and VectorSearch(...) would produce different AST nodes. If case-insensitive function name matching is desired, the function name should also be normalized.

TableFunction tableFunction =
    new TableFunction(visitAstExpression(ctx.qualifiedName()), args.build());
String alias = StringUtils.unquoteIdentifier(ctx.alias().getText());
return new SubqueryAlias(alias, tableFunction);
Grammar Ambiguity

The new tableFunctionRelation alternative uses qualifiedName LR_BRACKET ..., which may be ambiguous with existing function call expressions in the grammar. ANTLR will resolve this via ordering, but it should be verified that the tableFunctionRelation alternative does not accidentally match or conflict with other relation alternatives or expression rules involving function calls.

| qualifiedName LR_BRACKET tableFunctionArgs RR_BRACKET AS? alias # tableFunctionRelation

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

PR Code Suggestions ✨

Latest suggestions up to d73e73f

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against null alias in table function

The alias() rule is optional in the grammar (AS? alias), but the code calls
ctx.alias().getText() unconditionally. If alias() is null (no alias provided), this
will throw a NullPointerException at runtime. You should add a null check before
accessing the alias, or handle the no-alias case explicitly.

sql/src/main/java/org/opensearch/sql/sql/parser/AstBuilder.java [210-211]

+if (ctx.alias() == null) {
+  return tableFunction;
+}
 String alias = StringUtils.unquoteIdentifier(ctx.alias().getText());
 return new SubqueryAlias(alias, tableFunction);
Suggestion importance[1-10]: 4

__

Why: While the grammar shows AS? alias making the alias optional syntactically, the test table_function_relation_requires_alias explicitly verifies that a SyntaxCheckException is thrown when no alias is provided. This means the grammar or parser enforces the alias requirement at a different level, making the null check less critical. However, the suggestion is technically valid as a defensive measure.

Low
Guard against null table function arguments

If ctx.tableFunctionArgs() returns null (e.g., when no arguments are provided),
calling .tableFunctionArg() on it will throw a NullPointerException. A null check
should be added to guard against this case.

sql/src/main/java/org/opensearch/sql/sql/parser/AstBuilder.java [199-207]

-ctx.tableFunctionArgs()
-    .tableFunctionArg()
-    .forEach(
-        arg -> {
-          String argName =
-              StringUtils.unquoteIdentifier(arg.ident().getText()).toLowerCase(Locale.ROOT);
-          UnresolvedExpression argValue = visitAstExpression(arg.functionArg());
-          args.add(new UnresolvedArgument(argName, argValue));
-        });
+if (ctx.tableFunctionArgs() != null) {
+  ctx.tableFunctionArgs()
+      .tableFunctionArg()
+      .forEach(
+          arg -> {
+            String argName =
+                StringUtils.unquoteIdentifier(arg.ident().getText()).toLowerCase(Locale.ROOT);
+            UnresolvedExpression argValue = visitAstExpression(arg.functionArg());
+            args.add(new UnresolvedArgument(argName, argValue));
+          });
+}
Suggestion importance[1-10]: 2

__

Why: The grammar rule tableFunctionArgs requires at least one tableFunctionArg (no ? or * quantifier), so ctx.tableFunctionArgs() should never be null in a valid parse. The suggestion addresses a case that cannot occur given the grammar definition, making it an unnecessary defensive check.

Low

Previous suggestions

Suggestions up to commit 7b30898
CategorySuggestion                                                                                                                                    Impact
Possible issue
Make alias mandatory in grammar rule

The grammar rule makes alias optional (AS? alias) but the Java visitor code
unconditionally calls ctx.alias().getText(), which will throw a NullPointerException
when no alias is provided. Either make the alias mandatory in the grammar (remove ?
and require AS alias) or handle the optional alias in the visitor. Making it
mandatory in the grammar is consistent with the test
table_function_relation_requires_alias.

sql/src/main/antlr/OpenSearchSQLParser.g4 [114]

-| qualifiedName LR_BRACKET tableFunctionArgs RR_BRACKET AS? alias # tableFunctionRelation
+| qualifiedName LR_BRACKET tableFunctionArgs RR_BRACKET AS alias # tableFunctionRelation
Suggestion importance[1-10]: 7

__

Why: The grammar currently has AS? alias making the alias optional, but the visitor unconditionally calls ctx.alias().getText() which would cause a NullPointerException. Making the alias mandatory in the grammar (removing ?) is consistent with the test table_function_relation_requires_alias and prevents the NPE at the grammar level, which is the cleaner fix.

Medium
Guard against null alias in table function

The alias() rule is optional in the grammar (AS? alias), so ctx.alias() could be
null when no alias is provided, causing a NullPointerException. You should add a
null check before accessing ctx.alias().getText(), or throw a meaningful error if
the alias is missing.

sql/src/main/java/org/opensearch/sql/sql/parser/AstBuilder.java [211-212]

+if (ctx.alias() == null) {
+  throw new SyntaxCheckException("Table function relation requires an alias");
+}
 String alias = StringUtils.unquoteIdentifier(ctx.alias().getText());
 return new SubqueryAlias(alias, tableFunction);
Suggestion importance[1-10]: 5

__

Why: While the null check is a valid concern, the grammar rule uses AS? alias making alias optional, but the test table_function_relation_requires_alias shows this is intentionally caught as a SyntaxCheckException. The fix in the grammar (suggestion 2) is a more appropriate solution. This Java-level fix is a valid alternative but redundant if the grammar is fixed.

Low
Suggestions up to commit 023d31f
CategorySuggestion                                                                                                                                    Impact
General
Replace fragile string assertions with structural checks

Using toString() for structural assertions is fragile and tightly coupled to the
internal string representation of AST nodes, which may change without notice.
Consider using instanceof checks and casting to verify the actual AST structure,
similar to the approach used in can_build_from_table_function.

sql/src/test/java/org/opensearch/sql/sql/parser/AstBuilderTest.java [173-177]

-String planStr = plan.toString();
-assertTrue(planStr.contains("SubqueryAlias(alias=s"));
-assertTrue(planStr.contains("TableFunction(functionName=vectorSearch"));
-assertTrue(planStr.contains("UnresolvedArgument(argName=table, value=products)"));
-assertTrue(planStr.contains("Filter(condition==(s.category, shoes)"));
+assertTrue(plan instanceof Project);
+UnresolvedPlan child = ((Project) plan).getChild().get(0);
+assertTrue(child instanceof Filter);
+UnresolvedPlan filterChild = ((Filter) child).getChild().get(0);
+assertTrue(filterChild instanceof SubqueryAlias);
+assertEquals("s", ((SubqueryAlias) filterChild).getAlias());
+assertTrue(((SubqueryAlias) filterChild).getChild().get(0) instanceof TableFunction);
Suggestion importance[1-10]: 5

__

Why: Using toString() for structural assertions is indeed fragile and couples tests to internal string representations. The suggested approach using instanceof and casting is more robust, though the improved code may need minor adjustments to match the actual API of the AST node classes.

Low
Guard against missing alias in table function

The grammar rule requires AS alias (alias is mandatory for tableFunctionRelation),
but if ctx.alias() returns null for any reason, this will throw a
NullPointerException. A null check or a more descriptive error message would make
this more robust and easier to debug.

sql/src/main/java/org/opensearch/sql/sql/parser/AstBuilder.java [208-209]

+if (ctx.alias() == null) {
+    throw new SyntaxCheckException("Table function relation requires an alias");
+}
 String alias = StringUtils.unquoteIdentifier(ctx.alias().getText());
 return new SubqueryAlias(alias, tableFunction);
Suggestion importance[1-10]: 2

__

Why: The grammar rule already mandates AS alias for tableFunctionRelation, so ctx.alias() cannot be null in a valid parse tree. The test table_function_relation_requires_alias confirms that missing alias causes a SyntaxCheckException at parse time, making this defensive check redundant.

Low
Possible issue
Add null check before iterating arguments

The code does not handle the case where tableFunctionArgs() or tableFunctionArg()
returns null, which could cause a NullPointerException if the grammar allows empty
argument lists. Add a null check before iterating over the arguments.

sql/src/main/java/org/opensearch/sql/sql/parser/AstBuilder.java [198-205]

-ctx.tableFunctionArgs()
-    .tableFunctionArg()
-    .forEach(
-        arg -> {
-          String argName = arg.ident().getText();
-          UnresolvedExpression argValue = visitAstExpression(arg.functionArg());
-          args.add(new UnresolvedArgument(argName, argValue));
-        });
+if (ctx.tableFunctionArgs() != null && ctx.tableFunctionArgs().tableFunctionArg() != null) {
+    ctx.tableFunctionArgs()
+        .tableFunctionArg()
+        .forEach(
+            arg -> {
+              String argName = arg.ident().getText();
+              UnresolvedExpression argValue = visitAstExpression(arg.functionArg());
+              args.add(new UnresolvedArgument(argName, argValue));
+            });
+}
Suggestion importance[1-10]: 2

__

Why: The grammar rule tableFunctionArgs requires at least one tableFunctionArg (no optional/empty production), so tableFunctionArgs() and tableFunctionArg() cannot be null in valid parse trees. The null check is unnecessary given the grammar definition, making this a low-value suggestion.

Low

@mengweieric mengweieric changed the base branch from main to feature/vector-search-p0 April 6, 2026 21:05
1. Canonicalize argument names at parser boundary:
   unquoteIdentifier + toLowerCase(Locale.ROOT) in visitTableFunctionRelation
   so FIELD='x' and `field`='x' both produce argName="field"

2. Make AS keyword optional (AS? alias) for consistency with
   tableAsRelation and subqueryAsRelation grammar rules

3. Strengthen test coverage:
   - Full structural AST assertion for WHERE + ORDER BY + LIMIT
     (verifies Sort, Limit, Filter nodes, not just toString)
   - Argument reorder test proves names resolve by name not position
   - Case canonicalization test (TABLE= → table=)
   - Alias-without-AS test (FROM func(...) v)

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@mengweieric mengweieric changed the title [Feature] Add table function relation to SQL grammar for vectorSearch() [Feature] Add table function relation syntax to SQL grammar Apr 6, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

Persistent review updated to latest commit 7b30898

Signed-off-by: Eric Wei <mengwei.eric@gmail.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Apr 6, 2026

Persistent review updated to latest commit d73e73f

@mengweieric mengweieric added the skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. label Apr 6, 2026
@mengweieric mengweieric changed the title [Feature] Add table function relation syntax to SQL grammar [Feature] Add table function relation syntax to SQL grammar and parser Apr 6, 2026
@mengweieric mengweieric changed the title [Feature] Add table function relation syntax to SQL grammar and parser [Feature] Add table function relation syntax to SQL grammar Apr 6, 2026
.forEach(
arg -> {
String argName =
StringUtils.unquoteIdentifier(arg.ident().getText()).toLowerCase(Locale.ROOT);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does visitAstExpression (visitIdent) handle this already?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. visitIdent() only unquotes identifiers; it does not lowercase them. In this path, the arg name is stored in UnresolvedArgument, and ExpressionAnalyzer.visitUnresolvedArgument() preserves that name when creating NamedArgumentExpression. The explicit toLowerCase(Locale.ROOT) in visitTableFunctionRelation() is therefore what canonicalizes named arg keys for table-function relations.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mengweieric — that confirms the lowercase is the only thing canonicalizing arg keys on this path. My concern isn't whether it's needed, but where it lives.

Since UnresolvedArgumentNamedArgumentExpression is a shared pipeline that PPL also feeds (PPL's visitTableFunction doesn't lowercase, it just uses arg.ident().getText()), we now have two front-ends with different casing policies converging on the same TableFunctionResolver. The existing Prometheus resolver compares names with a case-sensitive HashSet.removeAll (TableFunctionUtils.java#L73-L89), so a future vectorSearch resolver that declares args in camelCase (topK, filterType) will silently work in PPL and fail in SQL with Missing arguments:[topK,filterType].

Two ways to keep them in sync — what's your preference?

  1. Move the toLowerCase into ExpressionAnalyzer.visitUnresolvedArgument (or Analyzer.visitTableFunction) so both parsers get the same canonicalization for free.
  2. Keep it here, but also apply it in PPL's visitTableFunction and add a javadoc on TableFunction.java documenting the contract so future resolver authors know to declare arg names in lowercase.

Not a blocker tbh — I was trying to avoid the "works in PPL, fails in SQL" footgun before vectorSearch lands on top of this.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference would be to centralize table-function arg canonicalization in Analyzer.visitTableFunction rather than ExpressionAnalyzer, so both front-ends share one table-function-specific policy without changing all named-arg handling globally. For this PR I kept the fix local to the SQL parser to minimize scope.

Copy link
Copy Markdown
Collaborator

@RyanL1997 RyanL1997 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @mengweieric , thanks for the change. LGTM, but I do left a comment.

.forEach(
arg -> {
String argName =
StringUtils.unquoteIdentifier(arg.ident().getText()).toLowerCase(Locale.ROOT);
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mengweieric — that confirms the lowercase is the only thing canonicalizing arg keys on this path. My concern isn't whether it's needed, but where it lives.

Since UnresolvedArgumentNamedArgumentExpression is a shared pipeline that PPL also feeds (PPL's visitTableFunction doesn't lowercase, it just uses arg.ident().getText()), we now have two front-ends with different casing policies converging on the same TableFunctionResolver. The existing Prometheus resolver compares names with a case-sensitive HashSet.removeAll (TableFunctionUtils.java#L73-L89), so a future vectorSearch resolver that declares args in camelCase (topK, filterType) will silently work in PPL and fail in SQL with Missing arguments:[topK,filterType].

Two ways to keep them in sync — what's your preference?

  1. Move the toLowerCase into ExpressionAnalyzer.visitUnresolvedArgument (or Analyzer.visitTableFunction) so both parsers get the same canonicalization for free.
  2. Keep it here, but also apply it in PPL's visitTableFunction and add a javadoc on TableFunction.java documenting the contract so future resolver authors know to declare arg names in lowercase.

Not a blocker tbh — I was trying to avoid the "works in PPL, fails in SQL" footgun before vectorSearch lands on top of this.

@mengweieric mengweieric merged commit e56d097 into opensearch-project:feature/vector-search-p0 Apr 7, 2026
44 of 46 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature skip-diff-analyzer Maintainer to skip code-diff-analyzer check, after reviewing issues in AI analysis. SQL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants