Fix broken dag_processing.total_parse_time metric#62128
Merged
Conversation
kaxil
reviewed
Feb 18, 2026
kaxil
reviewed
Feb 18, 2026
09fc07d to
45c5d57
Compare
Contributor
Author
|
Updated with a test and some slightly more invasive fixes, because it turns out this was even more broken than I thought! @kaxil PTAL. |
kaxil
reviewed
Feb 20, 2026
kaxil
reviewed
Feb 20, 2026
kaxil
reviewed
Feb 20, 2026
kaxil
approved these changes
Feb 20, 2026
DagFileProcessorManager has been emitting a nonsense value for `dag_processing.total_parse_time` since 8774f28, which reversed the order in which `emit_metrics` and `prepare_file_queue` (then called `prepare_file_path_queue`) were called. As `prepare_file_path_queue` was responsible for resetting the value of `self._parsing_start_time`, the assumption made by `emit_metrics` was that it would be called once the file queue had been cleared, but crucially before `prepare_file_queue` was called to refill the queue. Additionally, there was no guarantee that we'd parsed any files at all since the last time the metric was emitted. If no work was due, we'd gladly emit near-zero metrics every time around the while loop. I've rearranged things in such a way that I hope will be harder to accidentally break in future: - `self._parsing_start_time` may be reset whenever files are added to the queue, if it was not set already. - metrics are emitted when `prepare_file_queue` is called -- when the queue is empty -- but only if `self._parsing_start_time` is set, meaning only if we've actually parsed any files since the last time metrics were emitted. Together, this means we should now emit metrics once per parsing loop. I've added a test which fails on main and passes on this branch.
45c5d57 to
0c3696a
Compare
Contributor
Author
|
Thanks for re-review @kaxil. Should be ready to merge once CI passes. |
kaxil
approved these changes
Feb 20, 2026
Contributor
Author
|
If someone could rerun the tests that would be ace. Looks like a flake to me. |
choo121600
pushed a commit
to choo121600/airflow
that referenced
this pull request
Feb 22, 2026
DagFileProcessorManager has been emitting a nonsense value for `dag_processing.total_parse_time` since 8774f28, which reversed the order in which `emit_metrics` and `prepare_file_queue` (then called `prepare_file_path_queue`) were called. As `prepare_file_path_queue` was responsible for resetting the value of `self._parsing_start_time`, the assumption made by `emit_metrics` was that it would be called once the file queue had been cleared, but crucially before `prepare_file_queue` was called to refill the queue. Additionally, there was no guarantee that we'd parsed any files at all since the last time the metric was emitted. If no work was due, we'd gladly emit near-zero metrics every time around the while loop. I've rearranged things in such a way that I hope will be harder to accidentally break in future: - `self._parsing_start_time` may be reset whenever files are added to the queue, if it was not set already. - metrics are emitted when `prepare_file_queue` is called -- when the queue is empty -- but only if `self._parsing_start_time` is set, meaning only if we've actually parsed any files since the last time metrics were emitted. Together, this means we should now emit metrics once per parsing loop. I've added a test which fails on main and passes on this branch.
vatsrahul1001
pushed a commit
that referenced
this pull request
Mar 3, 2026
DagFileProcessorManager has been emitting a nonsense value for `dag_processing.total_parse_time` since 8774f28, which reversed the order in which `emit_metrics` and `prepare_file_queue` (then called `prepare_file_path_queue`) were called. As `prepare_file_path_queue` was responsible for resetting the value of `self._parsing_start_time`, the assumption made by `emit_metrics` was that it would be called once the file queue had been cleared, but crucially before `prepare_file_queue` was called to refill the queue. Additionally, there was no guarantee that we'd parsed any files at all since the last time the metric was emitted. If no work was due, we'd gladly emit near-zero metrics every time around the while loop. I've rearranged things in such a way that I hope will be harder to accidentally break in future: - `self._parsing_start_time` may be reset whenever files are added to the queue, if it was not set already. - metrics are emitted when `prepare_file_queue` is called -- when the queue is empty -- but only if `self._parsing_start_time` is set, meaning only if we've actually parsed any files since the last time metrics were emitted. Together, this means we should now emit metrics once per parsing loop. I've added a test which fails on main and passes on this branch. (cherry picked from commit 57a7c64)
vatsrahul1001
pushed a commit
that referenced
this pull request
Mar 3, 2026
DagFileProcessorManager has been emitting a nonsense value for `dag_processing.total_parse_time` since 8774f28, which reversed the order in which `emit_metrics` and `prepare_file_queue` (then called `prepare_file_path_queue`) were called. As `prepare_file_path_queue` was responsible for resetting the value of `self._parsing_start_time`, the assumption made by `emit_metrics` was that it would be called once the file queue had been cleared, but crucially before `prepare_file_queue` was called to refill the queue. Additionally, there was no guarantee that we'd parsed any files at all since the last time the metric was emitted. If no work was due, we'd gladly emit near-zero metrics every time around the while loop. I've rearranged things in such a way that I hope will be harder to accidentally break in future: - `self._parsing_start_time` may be reset whenever files are added to the queue, if it was not set already. - metrics are emitted when `prepare_file_queue` is called -- when the queue is empty -- but only if `self._parsing_start_time` is set, meaning only if we've actually parsed any files since the last time metrics were emitted. Together, this means we should now emit metrics once per parsing loop. I've added a test which fails on main and passes on this branch. (cherry picked from commit 57a7c64)
vatsrahul1001
pushed a commit
that referenced
this pull request
Mar 3, 2026
DagFileProcessorManager has been emitting a nonsense value for `dag_processing.total_parse_time` since 8774f28, which reversed the order in which `emit_metrics` and `prepare_file_queue` (then called `prepare_file_path_queue`) were called. As `prepare_file_path_queue` was responsible for resetting the value of `self._parsing_start_time`, the assumption made by `emit_metrics` was that it would be called once the file queue had been cleared, but crucially before `prepare_file_queue` was called to refill the queue. Additionally, there was no guarantee that we'd parsed any files at all since the last time the metric was emitted. If no work was due, we'd gladly emit near-zero metrics every time around the while loop. I've rearranged things in such a way that I hope will be harder to accidentally break in future: - `self._parsing_start_time` may be reset whenever files are added to the queue, if it was not set already. - metrics are emitted when `prepare_file_queue` is called -- when the queue is empty -- but only if `self._parsing_start_time` is set, meaning only if we've actually parsed any files since the last time metrics were emitted. Together, this means we should now emit metrics once per parsing loop. I've added a test which fails on main and passes on this branch. (cherry picked from commit 57a7c64)
vatsrahul1001
pushed a commit
that referenced
this pull request
Mar 3, 2026
DagFileProcessorManager has been emitting a nonsense value for `dag_processing.total_parse_time` since 8774f28, which reversed the order in which `emit_metrics` and `prepare_file_queue` (then called `prepare_file_path_queue`) were called. As `prepare_file_path_queue` was responsible for resetting the value of `self._parsing_start_time`, the assumption made by `emit_metrics` was that it would be called once the file queue had been cleared, but crucially before `prepare_file_queue` was called to refill the queue. Additionally, there was no guarantee that we'd parsed any files at all since the last time the metric was emitted. If no work was due, we'd gladly emit near-zero metrics every time around the while loop. I've rearranged things in such a way that I hope will be harder to accidentally break in future: - `self._parsing_start_time` may be reset whenever files are added to the queue, if it was not set already. - metrics are emitted when `prepare_file_queue` is called -- when the queue is empty -- but only if `self._parsing_start_time` is set, meaning only if we've actually parsed any files since the last time metrics were emitted. Together, this means we should now emit metrics once per parsing loop. I've added a test which fails on main and passes on this branch. (cherry picked from commit 57a7c64)
vatsrahul1001
added a commit
that referenced
this pull request
Mar 3, 2026
…) (#62764) * Fix broken `dag_processing.total_parse_time` metric (#62128) DagFileProcessorManager has been emitting a nonsense value for `dag_processing.total_parse_time` since 8774f28, which reversed the order in which `emit_metrics` and `prepare_file_queue` (then called `prepare_file_path_queue`) were called. As `prepare_file_path_queue` was responsible for resetting the value of `self._parsing_start_time`, the assumption made by `emit_metrics` was that it would be called once the file queue had been cleared, but crucially before `prepare_file_queue` was called to refill the queue. Additionally, there was no guarantee that we'd parsed any files at all since the last time the metric was emitted. If no work was due, we'd gladly emit near-zero metrics every time around the while loop. I've rearranged things in such a way that I hope will be harder to accidentally break in future: - `self._parsing_start_time` may be reset whenever files are added to the queue, if it was not set already. - metrics are emitted when `prepare_file_queue` is called -- when the queue is empty -- but only if `self._parsing_start_time` is set, meaning only if we've actually parsed any files since the last time metrics were emitted. Together, this means we should now emit metrics once per parsing loop. I've added a test which fails on main and passes on this branch. (cherry picked from commit 57a7c64) * fix static checks --------- Co-authored-by: Nick Stenning <nick@whiteink.com>
vatsrahul1001
added a commit
that referenced
this pull request
Mar 4, 2026
…) (#62764) * Fix broken `dag_processing.total_parse_time` metric (#62128) DagFileProcessorManager has been emitting a nonsense value for `dag_processing.total_parse_time` since 8774f28, which reversed the order in which `emit_metrics` and `prepare_file_queue` (then called `prepare_file_path_queue`) were called. As `prepare_file_path_queue` was responsible for resetting the value of `self._parsing_start_time`, the assumption made by `emit_metrics` was that it would be called once the file queue had been cleared, but crucially before `prepare_file_queue` was called to refill the queue. Additionally, there was no guarantee that we'd parsed any files at all since the last time the metric was emitted. If no work was due, we'd gladly emit near-zero metrics every time around the while loop. I've rearranged things in such a way that I hope will be harder to accidentally break in future: - `self._parsing_start_time` may be reset whenever files are added to the queue, if it was not set already. - metrics are emitted when `prepare_file_queue` is called -- when the queue is empty -- but only if `self._parsing_start_time` is set, meaning only if we've actually parsed any files since the last time metrics were emitted. Together, this means we should now emit metrics once per parsing loop. I've added a test which fails on main and passes on this branch. (cherry picked from commit 57a7c64) * fix static checks --------- Co-authored-by: Nick Stenning <nick@whiteink.com>
dominikhei
pushed a commit
to dominikhei/airflow
that referenced
this pull request
Mar 11, 2026
DagFileProcessorManager has been emitting a nonsense value for `dag_processing.total_parse_time` since 8774f28, which reversed the order in which `emit_metrics` and `prepare_file_queue` (then called `prepare_file_path_queue`) were called. As `prepare_file_path_queue` was responsible for resetting the value of `self._parsing_start_time`, the assumption made by `emit_metrics` was that it would be called once the file queue had been cleared, but crucially before `prepare_file_queue` was called to refill the queue. Additionally, there was no guarantee that we'd parsed any files at all since the last time the metric was emitted. If no work was due, we'd gladly emit near-zero metrics every time around the while loop. I've rearranged things in such a way that I hope will be harder to accidentally break in future: - `self._parsing_start_time` may be reset whenever files are added to the queue, if it was not set already. - metrics are emitted when `prepare_file_queue` is called -- when the queue is empty -- but only if `self._parsing_start_time` is set, meaning only if we've actually parsed any files since the last time metrics were emitted. Together, this means we should now emit metrics once per parsing loop. I've added a test which fails on main and passes on this branch.
Ankurdeewan
pushed a commit
to Ankurdeewan/airflow
that referenced
this pull request
Mar 15, 2026
DagFileProcessorManager has been emitting a nonsense value for `dag_processing.total_parse_time` since 8774f28, which reversed the order in which `emit_metrics` and `prepare_file_queue` (then called `prepare_file_path_queue`) were called. As `prepare_file_path_queue` was responsible for resetting the value of `self._parsing_start_time`, the assumption made by `emit_metrics` was that it would be called once the file queue had been cleared, but crucially before `prepare_file_queue` was called to refill the queue. Additionally, there was no guarantee that we'd parsed any files at all since the last time the metric was emitted. If no work was due, we'd gladly emit near-zero metrics every time around the while loop. I've rearranged things in such a way that I hope will be harder to accidentally break in future: - `self._parsing_start_time` may be reset whenever files are added to the queue, if it was not set already. - metrics are emitted when `prepare_file_queue` is called -- when the queue is empty -- but only if `self._parsing_start_time` is set, meaning only if we've actually parsed any files since the last time metrics were emitted. Together, this means we should now emit metrics once per parsing loop. I've added a test which fails on main and passes on this branch.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
DagFileProcessorManager has been emitting a nonsense value for
dag_processing.total_parse_timesince 8774f28, which reversed the order in whichemit_metricsandprepare_file_queue(then calledprepare_file_path_queue) were called.As
prepare_file_queuewas responsible for resetting the value ofself._parsing_start_time, the assumption made byemit_metricswas that it would be called once the file queue had been cleared, but crucially beforeprepare_file_queuewas called to refill the queue.Additionally, there was no guarantee that we'd parsed any files at all since the last time the metric was emitted. If no work was due, we'd gladly emit near-zero metrics every time around the while loop.
I've rearranged things in such a way that I hope will be harder to accidentally break in future:
self._parsing_start_timemay be reset whenever files are added to the queue, if it was not set already.metrics are emitted when
prepare_file_queueis called -- when the queue is empty -- but only ifself._parsing_start_timeis set, meaning only if we've actually parsed any files since the last time metrics were emitted.Together, this means we should now emit metrics once per parsing loop. I've added a test which fails on main and passes on this branch.