fix: Optimize TaskQueueDB matching with composite indices and EXISTS#8462
fix: Optimize TaskQueueDB matching with composite indices and EXISTS#8462chrisburr wants to merge 2 commits intoDIRACGrid:integrationfrom
Conversation
The TaskIndex on multi-value tables (tq_TQTo*) previously only covered TQId, requiring a separate lookup for the Value column. Adding Value to the composite index makes it a covering index for all subqueries that filter on both columns, improving query performance significantly.
Replace COUNT-based subqueries with EXISTS/NOT EXISTS patterns in __generateTQMatchSQL, __generateTagSQLSubCond, and __generateTQFindSQL. EXISTS short-circuits on the first matching row instead of scanning all rows, which combined with the composite (TQId, Value) indices reduces matching query time from ~30ms to ~3ms on production.
495abee to
fe366e0
Compare
|
@fstagni At the dops it's probably worth recommending people do this: Even on lbwms this ran in a few hundred milliseconds so it's pretty safe to do even in a production system. |
aldbr
left a comment
There was a problem hiding this comment.
Not an expert of that part of the code but, given the description of the PR, the changes look good to me
| sql2 = sql1 + f" AND {tableName}.Value={tagMatchList}" | ||
| sql = "( " + sql1 + " ) = (" + sql2 + " )" | ||
| valuesStr = tagMatchList | ||
| sql = f"NOT EXISTS ( SELECT 1 FROM {tableName} WHERE {tableName}.TQId=tq.TQId AND {tableName}.Value NOT IN ( {valuesStr} ) )" |
There was a problem hiding this comment.
My favorite llm is telling me that:
When
tagMatchListis empty, the new code accepts ANY task (with zero values), but the old code required tasks to ONLY contain empty string values.Old logic:
(SELECT COUNT(all) WHERE id=tq.TQId) = (SELECT COUNT(...) AND value='')New logic:
NOT EXISTS (... WHERE value != '')
→ Accepts tasks with 0 values OR tasks where all values = ''This changes semantics - might incorrectly accept tasks that should be rejected.
I would say this is OK. Any other ideas?
There was a problem hiding this comment.
I think you need a smarter favorite LLM, I think that comment is claiming that 0 != 0
I will do, but just for the sake of organization, I would:
Like this:
|
The
__generateTQMatchSQLmatching query usesCOUNTsubqueries to check whether multi-value tables (tq_TQTo*) have rows for a given TQId. This is inefficient becauseCOUNTscans all matching rows even when we only need to know if any row exists, and the single-columnTaskIndexonTQIdrequires a separate lookup to fetch theValuecolumn.This PR makes two changes:
Composite
(TQId, Value)indices on the multi-value tables, replacing the single-columnTaskIndexonTQId. This makes the index a covering index for all subqueries that filter on both columns. The existing per-field{Field}IndexonValuealone is kept for reverse lookups (e.g. BannedSites antijoin).EXISTS/NOT EXISTSinstead ofCOUNTin__generateTQMatchSQL,__generateTagSQLSubCond, and__generateTQFindSQL.EXISTSshort-circuits on the first matching row rather than counting all of them.Together these drop
__generateTQMatchSQLquery time from ~30 ms to ~3 ms on a production-sized DB (confirmed viaEXPLAIN ANALYZE).Note: For existing deployments the schema change only takes effect on newly created tables (
__initializeDBskips tables that already exist). Production DBs need a one-offALTER TABLEto add the composite index.BEGINRELEASENOTES
*WorkloadManagement
FIX: Optimize TaskQueueDB matching queries by adding composite (TQId, Value) indices and replacing COUNT subqueries with EXISTS/NOT EXISTS
ENDRELEASENOTES