Skip to content
/ server Public

Conversation

@KhaledR57
Copy link
Contributor

@KhaledR57 KhaledR57 commented Nov 14, 2025

  • The Jira issue number for this PR is: MDEV-24943

Description

Aggregates lacked the SQL-standard FILTER clause, forcing CASE-based workarounds that reduced readability.

This update introduces the ability to specify a FILTER clause for aggregate functions, allowing for more granular control over which rows are included in the aggregation. Also, improves standards compliance and makes queries clearer and more readable.

<aggregate_function> ( <expression> ) [ FILTER ( WHERE <condition> ) ] [ OVER ( <window_spec> ) ]

The <condition> may contain any expression allowed in regular WHERE clauses, except subqueries, window functions, and outer references.

Example:

-- Using FILTER clause
SELECT AVG(value) FILTER (WHERE status = 'active') FROM sales;

-- Equivalent using CASE
SELECT AVG(CASE WHEN status = 'active' THEN value END) FROM sales;

How can this PR be tested?

Running the Test Suite

./mtr aggregates-filter

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@KhaledR57 KhaledR57 force-pushed the MDEV-24943-add-filter-clause branch from 9f3c0ff to cb7d38d Compare November 14, 2025 07:38
@svoj svoj added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Nov 15, 2025
@gkodinov
Copy link
Member

Thank for taking the effort to work on this. Indeed, the LIMIT clause is a part of (at least) SQL 2011 and, as such, is a very good feature to have.

However, the diff seems very incomplete. Can you please make sure that you:

  1. Have a working diff.
  2. Have a good functional description of the change (hopefully in a jira)

Please re-submit when you have the above.
This is not a trivial change. I would suggest reaching out to the mariaDB developers at zulip if you need to talk about it with somebody.

@gkodinov gkodinov closed this Dec 15, 2025
@vuvova
Copy link
Member

vuvova commented Dec 15, 2025

I don't see how the diff is incomplete. It applies fine. It compiles fine. It passes tests on buildbot, not every builder, but it passes on many builders, this doesn't look like a non-working diff.

@DaveGosselin-MariaDB
Copy link
Member

DaveGosselin-MariaDB commented Dec 17, 2025

Hi @KhaledR57 , I'm experimenting with the changes and I see a difference between what should otherwise be equivalent statements. What was once written using CASE as a workaround should now work with FILTER as in the following examples below. First, on Postgres 18.1, we see that these queries are equivalent:

postgres=# SELECT COUNT(*) AS unfiltered, SUM( CASE WHEN generate_series < 5 THEN 1 ELSE 0 END ) AS filtered FROM generate_series(1,10);
 unfiltered | filtered
------------+----------
         10 |        4
(1 row)

postgres=# SELECT COUNT(*) AS unfiltered, COUNT(*) FILTER (WHERE generate_series < 5) AS filtered FROM generate_series(1,10);
 unfiltered | filtered
------------+----------
         10 |        4
(1 row)

However, on MariaDB, the FILTER analogue produces different results with your patch:

MariaDB [test]> SELECT COUNT(*) AS unfiltered, SUM( CASE WHEN seq < 5 THEN 1 ELSE 0 END ) AS filtered FROM seq_1_to_10;
+------------+----------+
| unfiltered | filtered |
+------------+----------+
|         10 |        4 |
+------------+----------+
1 row in set (0.005 sec)

MariaDB [test]> SELECT COUNT(*) AS unfiltered, COUNT(*) FILTER (WHERE seq < 5) AS filtered FROM seq_1_to_10;
+------------+----------+
| unfiltered | filtered |
+------------+----------+
|         10 |       10 |
+------------+----------+
1 row in set (0.001 sec)

Why is this the case?

@DaveGosselin-MariaDB
Copy link
Member

If we store the sequence engine result to an InnoDB-backed table and run the FILTER query against that instead we get the correct result:

create table t1 (seq int);
insert into t1 select seq from seq_1_to_10;
SELECT COUNT(*) AS unfiltered, COUNT(*) FILTER (WHERE seq < 5) AS filtered FROM t1;
+------------+----------+
| unfiltered | filtered |
+------------+----------+
|         10 |        4 |
+------------+----------+
1 row in set (0.001 sec)

@KhaledR57 KhaledR57 force-pushed the MDEV-24943-add-filter-clause branch from cb7d38d to aaea60a Compare December 19, 2025 04:13
@KhaledR57
Copy link
Contributor Author

Hi @DaveGosselin-MariaDB , Sorry for the delayed response!

The issue was with the sequence storage engine's group_by_handler optimization in storage/sequence/sequence.cc. It computes SUM() and COUNT() directly using formulas without scanning rows, but it wasn't checking for the FILTER clause, so the filter was completely bypassed.

I've added a has_filter() check so it falls back to normal aggregation when FILTER is present. Fixed now!

if (item->type() != Item::SUM_FUNC_ITEM ||
    (((Item_sum*) item)->sum_func() != Item_sum::SUM_FUNC &&
     ((Item_sum*) item)->sum_func() != Item_sum::COUNT_FUNC) ||
    ((Item_sum*) item)->has_filter())  // NEW CHECK
    return 0;  // Fall back to normal aggregation

Storing the sequence data to an InnoDB table gave the correct result because InnoDB goes through the standard opt_sum.cc optimization path, which I had already updated here to check has_filter() and skip the optimization when FILTER is present.

I'll add sequence-specific tests in the next commit after I finish the stored aggregates (almost done with those).

@KhaledR57 KhaledR57 force-pushed the MDEV-24943-add-filter-clause branch 2 times, most recently from 34ffec1 to 06b76ca Compare January 4, 2026 11:30
@KhaledR57 KhaledR57 force-pushed the MDEV-24943-add-filter-clause branch 2 times, most recently from 2e31328 to 0d88cba Compare January 9, 2026 00:18
@DaveGosselin-MariaDB
Copy link
Member

Hi @KhaledR57 ,

I've added a has_filter() check so it falls back to normal aggregation when FILTER is present. Fixed now!

Will such a change be required for every storage engine? If so, is there a way to generalize this for every storage engine?

Copy link
Member

@DaveGosselin-MariaDB DaveGosselin-MariaDB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @KhaledR57 ,
Here are the places I found in your patch which are not exercised by your new tests. Please add test cases that exercise them. Feel free to reach out to me on Zulip if you need help. I still have more review work to do and will update again soon.
Thanks,
Dave

if (filter_expr->type() == Item::SUM_FUNC_ITEM ||
filter_expr->type() == Item::WINDOW_FUNC_ITEM)
{
my_error(ER_WRONG_USAGE, MYF(0), "aggregate function", "FILTER");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codepath isn't exercised by your new tests.

@@ -1698,7 +1774,7 @@ void Item_sum_sum::add_helper(bool perform_removal)
sum-= aggr->arg_val_real();
else
sum+= aggr->arg_val_real();
if (!aggr->arg_is_null(true))
if (!aggr->arg_is_null(true) && filter_passed())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codepath isn't exercised by your new tests.

@@ -2336,7 +2412,7 @@ void Item_sum_variance::reset_field()

nr= args[0]->val_real(); /* sets null_value as side-effect */

if (args[0]->null_value)
if (args[0]->null_value || !filter_passed())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codepath isn't exercised by your new tests.

@@ -2367,7 +2443,7 @@ void Item_sum_variance::update_field()

double nr= args[0]->val_real(); /* sets null_value as side-effect */

if (args[0]->null_value)
if (args[0]->null_value || !filter_passed())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codepath isn't exercised by your new tests.

@@ -2641,7 +2717,7 @@ bool Item_sum_bit::clear_as_window()
bool Item_sum_bit::remove_as_window(ulonglong value)
{
DBUG_ASSERT(as_window_function);
if (num_values_added == 0)
if (num_values_added == 0 || args[0]->null_value || !filter_passed())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codepath isn't exercised by your new tests.

@@ -3021,7 +3106,7 @@ void Item_sum_sum::update_field()
else
{
nr= args[0]->val_real();
null_flag= args[0]->null_value;
null_flag= args[0]->null_value || !filter_passed();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codepath isn't exercised by your new tests.

@@ -3045,7 +3130,7 @@ void Item_sum_count::update_field()
direct_counted= direct_reseted_field= FALSE;
nr+= direct_count;
}
else if (!args[0]->maybe_null() || !args[0]->is_null())
else if ((!args[0]->maybe_null() || !args[0]->is_null()) && filter_passed())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codepath isn't exercised by your new tests.

@@ -3080,7 +3165,7 @@ void Item_sum_avg::update_field()
double nr;

nr= args[0]->val_real();
if (!args[0]->null_value)
if (!args[0]->null_value && filter_passed())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codepath isn't exercised by your new tests.

@@ -4639,7 +4734,7 @@ bool Item_func_collect::add() {
uint current_geometry_srid;
has_cached_result= false;

if (tmp_arg[0]->null_value)
if (tmp_arg[0]->null_value || !filter_passed())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codepath isn't exercised by your new tests.

@@ -4663,7 +4758,7 @@ void Item_func_collect::remove() {
String *wkb= args[0]->val_str(&value);
has_cached_result= false;

if (args[0]->null_value) return;
if (args[0]->null_value || !filter_passed()) return;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This codepath isn't exercised by your new tests.

@KhaledR57
Copy link
Contributor Author

Hi @KhaledR57 ,

I've added a has_filter() check so it falls back to normal aggregation when FILTER is present. Fixed now!

Will such a change be required for every storage engine? If so, is there a way to generalize this for every storage engine?

Hi @DaveGosselin-MariaDB,
I actually had the same question. This issue made me realize other engines with similar optimizations might have the same problem.
I was focused on the JOIN bug with FILTER clauses and custom aggregates, so I didn’t get to generalize this yet. I plan to test other storage engines to ensure FILTER clauses works correctly.

@DaveGosselin-MariaDB
Copy link
Member

Hi @KhaledR57 ,
In case you didn't notice, please see this comment on the associated Jira ticket.
Thanks,
Dave

@KhaledR57
Copy link
Contributor Author

KhaledR57 commented Jan 13, 2026

Hi @KhaledR57 , In case you didn't notice, please see this comment on the associated Jira ticket. Thanks, Dave

Sorry, I wasn't following the ticket. I believe these are already handled.
In opt_sum_query for optimization opt_sum.cc, and for QUICK_GROUP_MIN_MAX_SELECT opt_range.cc

Or the comment meant something else. If I misunderstood something, please let me know.

Aggregates lacked the SQL-standard FILTER clause, forcing CASE-based workarounds that reduced readability across (sum, avg, count, …).

This update introduces the ability to specify a FILTER clause for aggregate functions,
allowing for more granular control over which rows are included in the aggregation.
Also, improves standards compliance and makes queries clearer and more readable.

The FILTER(WHERE ...) condition may contain any expression allowed in regular WHERE clauses,
except subqueries, window functions, and outer references.
Avoid freeing SP query arena items when the aggregate function context was
never initialized, which could happen if the first group is filtered out
by HAVING.
@KhaledR57 KhaledR57 force-pushed the MDEV-24943-add-filter-clause branch from 0d88cba to 408db34 Compare January 23, 2026 20:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

5 participants