Add logging to detect try number race #62703
Pull request overview
This PR adds targeted logging (and unit tests) to help detect try_number mismatches/races in the scheduler flow, particularly around TI scheduling and executor event processing (related to #57618).
Changes:
- Add a debug-gated post-update DB read in `DagRun.schedule_tis()` to warn when the persisted `try_number` differs from the expected value.
- Add additional scheduler logs/warnings around queueing workloads and handling executor events with mismatched/multiple `try_number`s.
- Add/extend unit tests to assert the new warnings/logging behavior.
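As a rough illustration of the first change, a debug-gated post-update check could look like the sketch below. This is not the code from the PR: the `task_instance` table layout, the `verify_try_number` helper, and the logger name are all hypothetical, and `sqlite3` stands in for the real metadata database so the example stays self-contained.

```python
import logging
import sqlite3
from typing import Optional

# Hypothetical logger name, chosen only for this sketch.
log = logging.getLogger("scheduler.sketch")


def verify_try_number(conn: sqlite3.Connection, ti_id: int, expected: int) -> Optional[bool]:
    """Re-read try_number after an update and warn if it differs from `expected`.

    Returns None when the check is skipped (DEBUG logging disabled),
    True when the persisted value matches, and False on a mismatch.
    """
    # Gate the extra DB round trip behind DEBUG so normal runs pay no cost.
    if not log.isEnabledFor(logging.DEBUG):
        return None

    row = conn.execute(
        "SELECT try_number FROM task_instance WHERE id = ?", (ti_id,)
    ).fetchone()
    persisted = row[0] if row else None

    if persisted != expected:
        # A mismatch suggests another writer (e.g. a second scheduler) touched the row.
        log.warning(
            "try_number mismatch for TI %s: expected %s, persisted %s",
            ti_id, expected, persisted,
        )
        return False
    return True
```

Gating on `log.isEnabledFor(logging.DEBUG)` keeps the additional read out of the hot path unless someone is actively diagnosing the race.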
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| airflow-core/src/airflow/models/dagrun.py | Adds debug-gated DB verification and warning logging for try_number mismatches after scheduling. |
| airflow-core/src/airflow/jobs/scheduler_job_runner.py | Adds more context-rich logs for queueing/scheduling and warnings for executor events with conflicting try_numbers. |
| airflow-core/tests/unit/models/test_dagrun.py | Adds tests validating warning behavior for schedule_tis() try-number mismatch checks. |
| airflow-core/tests/unit/jobs/test_scheduler_job.py | Extends/adds tests asserting new scheduler warnings via caplog. |
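To make the "conflicting try_numbers" case concrete, the sketch below groups executor events by task identity and flags any task reported with more than one `try_number`. The event tuple shape and the `find_conflicting_try_numbers` helper are assumptions made for illustration; they are not taken from `scheduler_job_runner.py`.

```python
import logging
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical logger name, chosen only for this sketch.
log = logging.getLogger("scheduler.sketch.events")

# Assumed event shape: (dag_id, run_id, task_id, map_index, try_number).
Event = Tuple[str, str, str, int, int]
TaskIdentity = Tuple[str, str, str, int]


def find_conflicting_try_numbers(events: Iterable[Event]) -> Dict[TaskIdentity, List[int]]:
    """Return tasks that appear in the event batch with more than one try_number."""
    seen = defaultdict(set)
    for dag_id, run_id, task_id, map_index, try_number in events:
        seen[(dag_id, run_id, task_id, map_index)].add(try_number)

    conflicts = {key: sorted(tries) for key, tries in seen.items() if len(tries) > 1}
    for key, tries in conflicts.items():
        # Multiple try_numbers for one task in a single batch hints at a race.
        log.warning("Executor events for %s carry multiple try_numbers: %s", key, tries)
    return conflicts
```

Called on a batch where the same task shows up with `try_number` 1 and 2, the helper returns that task's key mapped to `[1, 2]`; a batch with one `try_number` per task yields an empty dict.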
This adds more logging to select places that try_number mismatch could happen and would help us detect and fix the issue. Related: apache#57618
Backport failed to create: v3-1-test. View the failure log: Run details.

Note: As of Merging PRs targeted for Airflow 3.X. In matter of doubt please ask in #release-management Slack channel.

You can attempt to backport this manually by running: `cherry_picker 95784d9 v3-1-test`

This should apply the commit to the v3-1-test branch and leave the commit in a conflict state. After you have resolved the conflicts, you can continue the backport process by running: `cherry_picker --continue`

If you don't have cherry-picker installed, see the installation guide.
* Add logging to detect try number race (#62703)
  * Log try_number mismatches during TI scheduling for HA race diagnosis. This adds more logging to select places that try_number mismatch could happen and would help us detect and fix the issue. Related: #57618
  * Add tests

  (cherry picked from commit 95784d9)
* fixup! Add logging to detect try number race (#62703)
* fixup! fixup! Add logging to detect try number race (#62703)
* Log try_number mismatches during TI scheduling for HA race diagnosis. This adds more logging to select places that try_number mismatch could happen and would help us detect and fix the issue. Related: apache#57618
* Add tests
This adds more logging at select places where a try_number mismatch could happen, to help us detect and fix the issue.
Related: #57618
Was generative AI tooling used to co-author this PR?
GPT-5.3-codex