Optimize task_reschedule_count query with EXISTS check #62062

Closed
manipatnam wants to merge 2 commits into apache:main from manipatnam:optimize-reschedule-count-query

@manipatnam

Contributor

Optimize task_reschedule_count query with EXISTS check

For customers with a huge number of TaskReschedule rows (from heavy sensor usage), the unconditional COUNT query run on every task execution causes performance issues.

This PR adds an EXISTS check before the COUNT query. For tasks without reschedule records (99%+ of tasks), the EXISTS check returns almost immediately and avoids the more expensive COUNT on the TaskReschedule table. Sensors and tasks rescheduled due to DAG load failures still get accurate counts.

Performance impact:

  • Regular tasks: Single fast EXISTS query, no COUNT needed
  • Sensors with reschedules: EXISTS + COUNT
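The EXISTS-then-COUNT pattern described above can be sketched roughly as follows. This is a minimal illustration using the standard-library sqlite3 module, not the PR's actual SQLAlchemy change; the two-column `task_reschedule` table, the `reschedule_count` helper, and the sample `ti_id` values are all assumptions made for the example.

```python
import sqlite3

# Hypothetical, simplified stand-in for the TaskReschedule table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_reschedule (id INTEGER PRIMARY KEY, ti_id TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO task_reschedule (ti_id) VALUES (?)",
    [("sensor-ti",), ("sensor-ti",), ("sensor-ti",)],
)


def reschedule_count(conn, ti_id):
    """Return the reschedule row count for ti_id, checking EXISTS first."""
    # Cheap existence probe: can stop at the first matching row.
    exists = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM task_reschedule WHERE ti_id = ?)",
        (ti_id,),
    ).fetchone()[0]
    if not exists:
        # Regular task: one query, no COUNT issued.
        return 0
    # Rescheduled task: a second query computes the exact count.
    return conn.execute(
        "SELECT COUNT(id) FROM task_reschedule WHERE ti_id = ?",
        (ti_id,),
    ).fetchone()[0]
```

Calling `reschedule_count(conn, "regular-ti")` issues only the EXISTS query, while `reschedule_count(conn, "sensor-ti")` issues both, which matches the two cases listed above.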

session.scalar(
    select(func.count(TaskReschedule.id)).where(TaskReschedule.ti_id == task_instance_id)
)
or 0

Contributor


This amounts to double queries. I think the right fix is adding an index on the id, which was done here: #61983
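To illustrate the reviewer's point, here is a small sketch of how an index can serve the single COUNT query directly. It uses the standard-library sqlite3 module and assumes the index covers the `ti_id` filter column; the contents of #61983 are not shown in this thread, so the table layout, the index name `idx_task_reschedule_ti_id`, and the `query_plan` helper are hypothetical.

```python
import sqlite3

# Hypothetical, simplified stand-in for the TaskReschedule table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_reschedule (id INTEGER PRIMARY KEY, ti_id TEXT NOT NULL)"
)

COUNT_SQL = "SELECT COUNT(id) FROM task_reschedule WHERE ti_id = ?"


def query_plan(conn):
    """Return SQLite's query plan for the COUNT query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + COUNT_SQL, ("some-ti",)).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)


# Without an index on the filter column, the planner has to scan the table.
plan_before = query_plan(conn)

# Assumed index on the filter column (name chosen for this example only).
conn.execute("CREATE INDEX idx_task_reschedule_ti_id ON task_reschedule (ti_id)")

# With the index in place, the same single COUNT query can use it.
plan_after = query_plan(conn)
```

Comparing `plan_before` and `plan_after` shows the planner switching from a table scan to an index lookup, so the original one-query form stays cheap without a separate EXISTS round trip. Other databases report plans differently, but the idea carries over.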

@manipatnam

Contributor Author

Closing this, as the index has been added.

@manipatnam manipatnam closed this Mar 25, 2026
@manipatnam manipatnam deleted the optimize-reschedule-count-query branch March 25, 2026 15:23

2 participants