Hey @beygorghor, you were right to be skeptical: my previous PR on this issue didn't address the slow performance on the Forms page itself. This PR should be a substantial improvement. I don't think there's much leeway for more optimisation (see attached query plan), but there's always the possibility of adding an extra index (see PR notes) 👀 CC @madewulf
d510be2 to c452694
Performance testing for "latest submission"

Trying a few different approaches with 3 million instances:

    # Initial approach
    # Note, this has a side-effect of slowing down the formversion prefetch query dramatically
    # by creating a join on the instance table on the queryset
    queryset = queryset.annotate(instance_updated_at=Max("instances__updated_at"))

Total request time: 14.959s with 14 queries

    latest_instance = Instance.objects.filter(form=OuterRef("pk")).order_by("-updated_at")
    queryset = queryset.annotate(instance_updated_at=Subquery(latest_instance.values("updated_at")[:1]))

Total request time: 23.708s with 14 queries

    max_updated_sq = (
        Instance.objects.filter(form_id=OuterRef("id"))
        .values("form_id")
        .annotate(max_up=Max("updated_at"))
        .values("max_up")
    )
    queryset = queryset.annotate(
        instance_updated_at=Subquery(max_updated_sq)
    )

Total request time: 13.914s with 14 queries

    # Note: decently fast, but performs a SEQ scan on the instance table
    instance_cte = With(
        Instance.objects.values("form_id").annotate(max_updated=Max("updated_at")),
        name="instance_max_updated_cte",
    )
    queryset = instance_cte.join(queryset, id=instance_cte.col.form_id, _join_type=LOUTER)
    queryset = queryset.with_cte(instance_cte).annotate(instance_updated_at=instance_cte.col.max_updated)

Total request time: 1.677s with 14 queries

After adding a composite index

    CREATE INDEX iaso_instance_updated_at_form_id_composite ON iaso_instance (form_id, updated_at);

    latest_instance = Instance.objects.filter(form=OuterRef("pk")).order_by("-updated_at")
    queryset = queryset.annotate(instance_updated_at=Subquery(latest_instance.values("updated_at")[:1]))

Total request time: 0.428s with 14 queries

    queryset = queryset.annotate(instance_updated_at=Max("instances__updated_at"))

Total request time: 1.941s with 14 queries

    max_updated_sq = (
        Instance.objects.filter(form_id=OuterRef("id"))
        .values("form_id")
        .annotate(max_up=Max("updated_at"))
        .values("max_up")
    )
    queryset = queryset.annotate(
        instance_updated_at=Subquery(max_updated_sq)
    )

Total request time: 1.098s with 14 queries

    instance_cte = With(
        Instance.objects.values("form_id").annotate(max_updated=Max("updated_at")),
        name="instance_max_updated_cte",
    )
    queryset = instance_cte.join(queryset, id=instance_cte.col.form_id, _join_type=LOUTER)
    queryset = queryset.with_cte(instance_cte).annotate(instance_updated_at=instance_cte.col.max_updated)

Total request time: 0.840s with 14 queries
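
To illustrate why a `(form_id, updated_at)` index helps the `order_by("-updated_at")` + `[:1]` subquery in particular, here is a minimal sketch using Python's built-in `sqlite3` rather than the PostgreSQL database the timings come from. The `form` and `instance` tables, the index name `instance_form_id_updated_at`, and the sample rows are hypothetical stand-ins, and SQLite's planner is not PostgreSQL's, so the printed plans are an analogy and not a reproduction of the attached query plan.

```python
import sqlite3

# Hypothetical, simplified stand-ins for the real form and instance tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE form (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE instance (
        id INTEGER PRIMARY KEY,
        form_id INTEGER REFERENCES form (id),
        updated_at TEXT
    );
    """
)
conn.executemany(
    "INSERT INTO form (id, name) VALUES (?, ?)",
    [(1, "A"), (2, "B"), (3, "C")],
)
conn.executemany(
    "INSERT INTO instance (form_id, updated_at) VALUES (?, ?)",
    [(1, "2024-01-01"), (1, "2024-03-01"), (2, "2024-02-01")],
)

# Roughly the SQL shape of the Subquery(...order_by("-updated_at")...[:1]) variant:
# one correlated "latest row" lookup per form.
LATEST_PER_FORM = """
    SELECT f.id,
           (SELECT i.updated_at
              FROM instance AS i
             WHERE i.form_id = f.id
             ORDER BY i.updated_at DESC
             LIMIT 1) AS instance_updated_at
      FROM form AS f
     ORDER BY f.id
"""


def plan(sql):
    """Return SQLite's query plan for `sql` as a single readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


plan_before = plan(LATEST_PER_FORM)

# Same column order as the composite index in this PR: form_id, then updated_at.
conn.execute(
    "CREATE INDEX instance_form_id_updated_at ON instance (form_id, updated_at)"
)
plan_after = plan(LATEST_PER_FORM)

rows = conn.execute(LATEST_PER_FORM).fetchall()

print("before index:", plan_before)
print("after index: ", plan_after)
print(rows)
```

With the index in place, each per-form lookup can be answered from the index entries for that `form_id`, already ordered by `updated_at`, instead of collecting and sorting that form's rows. That is consistent with this variant going from the slowest to the fastest of the four once the index exists.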
tdethier left a comment
Before testing this PR, loading forms with all extra columns took ~3.5 seconds (my DB is way smaller - I have only 181k submissions).
I loaded forms again, using this PR:
- without index: ~4 seconds
- with index: ~3 seconds
The improvement is not as big as I had expected, but maybe it's related to the fact that I have fewer instances.
Anyway, kudos for the nice work and analysis, I appreciate that 👍
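
For a rough sense of scale, the timings quoted in this thread can be turned into speedup ratios. This is only arithmetic on the reported numbers, which are single, approximate measurements taken on two different databases. The dictionary labels are shorthand for the four variants, not names used in the PR.

```python
# Total request times in seconds from the 3-million-instance test,
# as (without index, with composite index).
TIMINGS = {
    "Max() over join": (14.959, 1.941),
    "Subquery, ORDER BY + [:1]": (23.708, 0.428),
    "Subquery, grouped Max()": (13.914, 1.098),
    "CTE + left outer join": (1.677, 0.840),
}

# Approximate timings from the review on a 181k-submission database.
REVIEW_BEFORE_PR = 3.5
REVIEW_PR_WITH_INDEX = 3.0


def speedup(before, after):
    """How many times faster `after` is compared to `before`."""
    return before / after


speedups = {
    name: round(speedup(before, after), 1)
    for name, (before, after) in TIMINGS.items()
}
reviewer_speedup = round(speedup(REVIEW_BEFORE_PR, REVIEW_PR_WITH_INDEX), 2)

for name, ratio in speedups.items():
    print(f"{name}: {ratio}x faster with the index")
print(f"Review database, before PR vs PR with index: {reviewer_speedup}x")
```

On these figures the `ORDER BY` + `[:1]` subquery gains the most from the index (roughly 55x), while the review database sees a much smaller gain, which fits the guess that the effect shrinks with fewer instances.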
What problem is this PR solving?
Improve performance of the forms API in the forms listing and when called from the submissions list.
Related JIRA tickets
IA-4870
Changes
How to test
Print screen / video
/
Notes
Doc
/