
Add optional OTel service to the Airflow Helm Chart#64902

Merged
jscheffl merged 24 commits into
apache:mainfrom
xBis7:otel-helm-chart
May 5, 2026

Conversation

@xBis7

@xBis7 xBis7 commented Apr 8, 2026

Contributor

This patch adds an otel-collector to the Helm chart.

I've added two separate flags for enabling traces and metrics. OTel is the only supported backend for traces, so the traces flag is enabled by default. That is not the case for metrics, which need to be enabled manually. When the user enables OTel metrics, statsd is disabled in the Airflow config so that OTel is used instead.
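
To make the flag interaction described above concrete, here is a minimal Python sketch of the resolution logic. It is not the chart's template code: the function and parameter names are hypothetical, and the `[traces] otel_on`, `[metrics] otel_on` and `[metrics] statsd_on` option names are assumed from Airflow's configuration reference rather than taken from this PR.

```python
from __future__ import annotations


def resolve_observability_config(
    otel_traces_enabled: bool = True,    # traces flag: on by default, per the description
    otel_metrics_enabled: bool = False,  # metrics flag: must be switched on manually
    statsd_enabled: bool = True,         # assumption: statsd is on unless the user disables it
) -> dict[str, dict[str, bool]]:
    """Return the airflow.cfg switches implied by the two OTel flags (illustrative only)."""
    return {
        "traces": {"otel_on": otel_traces_enabled},
        "metrics": {
            "otel_on": otel_metrics_enabled,
            # Enabling OTel metrics turns statsd off so that only one backend is used.
            "statsd_on": statsd_enabled and not otel_metrics_enabled,
        },
    }


if __name__ == "__main__":
    print(resolve_observability_config())
    print(resolve_observability_config(otel_metrics_enabled=True))
```

With the defaults, traces go to OTel and metrics stay on statsd; passing `otel_metrics_enabled=True` flips the metrics section to OTel and switches statsd off.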


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)
    Claude Sonnet 4.6 Extended

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

@jason810496 jason810496 left a comment

Member

I didn’t look into this thoroughly, but there might be some concerns based on a high-level overview.

Just FYI, we had a discussion in https://apache-airflow.slack.com/archives/C027H098M1C/p1770794021001679 about whether to use Kustomize for this kind of optional feature, for better long-term maintainability.

That said, we haven’t yet settled on the release process or the concrete structure if we go with the Kustomize approach.

@xBis7

xBis7 commented Apr 8, 2026

Contributor Author

@jason810496 Thank you, I wasn't aware of this.

Airflow needs to talk directly to the otel-collector and to be configured to work with it. Additionally, when OTel is enabled for metrics, we have to disable statsd. Based on that, it's hard to set it up with Kustomize and it should be part of the helm chart.

But the 3 observability backends don't need to interact with Airflow, so they are very good candidates for Kustomize. I think I should be able to make it work.

As I understand from the Slack discussion, there was a consensus on using Kustomize from 1.19.0 onwards.

After this PR, I would like to add integration tests that use OTel and the backends. I don't think setting them up via Kustomize will be a problem.

I can move forward with the changes.

@jscheffl jscheffl left a comment

Contributor

I do not think it is a good idea to add more components to the Airflow chart. There are better, well-known charts for Prometheus, Grafana and the like. You should reference them rather than add more complexity to ours.

@xBis7 xBis7 force-pushed the otel-helm-chart branch from 6e988f4 to ffec048 April 8, 2026 19:10
@xBis7

xBis7 commented Apr 8, 2026

Contributor Author

@jscheffl What about the Kustomize approach that @jason810496 suggested? I just pushed it.

Comment thread chart/values.yaml
Comment thread dev/breeze/src/airflow_breeze/utils/kubernetes_utils.py Outdated

@Miretpl Miretpl left a comment

Contributor

I only checked the Helm chart-related part. I would recommend splitting it from this PR, too.

Additionally, though I did not raise it in the inline comments, the Helm chart part adds no tests, and it should.

Comment thread chart/templates/_helpers.yaml
Comment thread chart/templates/_helpers.yaml Outdated
Comment thread chart/values.yaml Outdated
Comment thread chart/templates/configmaps/otel-collector-configmap.yaml Outdated
Comment thread chart/templates/otel-collector/otel-collector-deployment.yaml
Comment thread chart/templates/otel-collector/otel-collector-deployment.yaml Outdated
Comment thread chart/templates/otel-collector/otel-collector-service.yaml Outdated
Comment thread chart/values.schema.json
Comment thread chart/values.yaml
Comment thread chart/values.yaml Outdated
@kaxil kaxil requested a review from Copilot April 10, 2026 19:55

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds OpenTelemetry-driven observability (collector + optional backends) to the dev/Kubernetes workflow and Helm chart configuration, with flags to enable traces and/or metrics.

Changes:

  • Extend Helm chart values/schema and templates to support an optional OpenTelemetry Collector and Airflow OTel configuration.
  • Add CI/dev Kubernetes manifests for Jaeger/Prometheus/Grafana and expose them via NodePorts in kind.
  • Update Breeze and test utilities to manage additional forwarded ports and to deploy observability backends based on --set flags.
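
As an illustration of the last bullet, the following Python sketch shows one way Helm-style `--set` arguments could be parsed to decide which backends to deploy. It is not the Breeze implementation: the function names and the value keys `otelCollector.tracesEnabled` and `otelCollector.metricsEnabled` are hypothetical, and the sketch ignores Helm's escaping rules (for example `\,` inside a value).

```python
from __future__ import annotations


def parse_set_flags(extra_args: list[str]) -> dict[str, str]:
    """Collect key=value pairs from `--set key=value` and `--set=key=value` arguments."""
    values: dict[str, str] = {}
    i = 0
    while i < len(extra_args):
        arg = extra_args[i]
        if arg == "--set" and i + 1 < len(extra_args):
            payload = extra_args[i + 1]
            i += 2
        elif arg.startswith("--set="):
            payload = arg[len("--set="):]
            i += 1
        else:
            # Not a --set argument: skip it.
            i += 1
            continue
        # A single --set may carry several comma-separated assignments.
        for pair in payload.split(","):
            key, sep, value = pair.partition("=")
            if sep:
                values[key.strip()] = value.strip()
    return values


def is_enabled(values: dict[str, str], key: str, default: bool) -> bool:
    """Interpret a parsed value as a boolean, falling back to the default when unset."""
    raw = values.get(key)
    return default if raw is None else raw.lower() == "true"


if __name__ == "__main__":
    args = ["--set", "otelCollector.metricsEnabled=true", "--timeout", "10m"]
    parsed = parse_set_flags(args)
    # Hypothetical keys; the defaults mirror "traces on, metrics off".
    print(is_enabled(parsed, "otelCollector.tracesEnabled", default=True))
    print(is_enabled(parsed, "otelCollector.metricsEnabled", default=False))
```

Keeping the parsing separate from the default lookup makes it straightforward to unit-test each flag combination without a cluster.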

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| scripts/ci/prek/lint_json_schema.py | Update YAML loading/validation to support multi-document YAML. |
| scripts/ci/kubernetes/observability/kustomization.yaml | Add kustomize entrypoint for observability manifests. |
| scripts/ci/kubernetes/observability/jaeger.yaml | Add Jaeger all-in-one Deployment/Service for dev/CI. |
| scripts/ci/kubernetes/observability/prometheus.yaml | Add Prometheus config + Deployment/Service for dev/CI scraping. |
| scripts/ci/kubernetes/observability/grafana.yaml | Add Grafana provisioning + Deployment/Service for dev/CI dashboards. |
| scripts/ci/kubernetes/nodeport.yaml | Expose Jaeger/Prometheus/Grafana via NodePorts. |
| scripts/ci/kubernetes/kind-cluster-conf.yaml | Map additional NodePorts to localhost via kind port mappings. |
| kubernetes-tests/tests/kubernetes_tests/test_base.py | Switch API server port env var name used by k8s tests. |
| dev/breeze/tests/test_kubernetes_commands.py | Add unit tests for parsing OTel --set flags. |
| dev/breeze/src/airflow_breeze/utils/kubernetes_utils.py | Allocate/propagate new forwarded ports and print backend URLs. |
| dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py | Apply observability manifests after deploy based on parsed OTel flags. |
| chart/values.yaml | Add OTel collector values and wire OTel vs statsd settings in airflow.cfg. |
| chart/values.schema.json | Extend JSON schema for OTel collector image and ports. |
| chart/templates/otel-collector/otel-collector-service.yaml | Add Service for the optional OTel collector. |
| chart/templates/otel-collector/otel-collector-deployment.yaml | Add Deployment for the optional OTel collector. |
| chart/templates/configmaps/otel-collector-configmap.yaml | Add OTel collector config (receivers/exporters/pipelines). |
| chart/templates/_helpers.yaml | Add OTel env vars and helper for OTel collector image string. |

Comment thread chart/values.yaml
Comment thread chart/templates/_helpers.yaml
Comment thread chart/templates/configmaps/otel-collector-configmap.yaml Outdated
Comment thread dev/breeze/src/airflow_breeze/utils/kubernetes_utils.py Outdated
Comment thread scripts/ci/prek/lint_json_schema.py Outdated
Comment thread dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py Outdated
Comment thread dev/breeze/src/airflow_breeze/commands/kubernetes_commands.py Outdated
@xBis7 xBis7 force-pushed the otel-helm-chart branch 2 times, most recently from d1da7d0 to 89f2712 April 14, 2026 19:35
@xBis7

xBis7 commented Apr 14, 2026

Contributor Author

@Miretpl Thank you for the review!

I removed all the kustomize logic and the jaeger, grafana and prometheus yaml files. We can see how to optionally include these in a follow-up PR.

I'm going to address your comments and also add tests.

@xBis7 xBis7 changed the title from "Add optional OTel, Jaeger, Grafana and Prometheus services to the Airflow Helm Chart" to "Add optional OTel service to the Airflow Helm Chart" Apr 15, 2026
@xBis7 xBis7 force-pushed the otel-helm-chart branch from 89f2712 to b8d0e0a April 15, 2026 14:49
@xBis7 xBis7 force-pushed the otel-helm-chart branch from 0cd12f4 to 2ad2c1c May 5, 2026 06:13
@xBis7

xBis7 commented May 5, 2026

Contributor Author

Green CI.

Will probably need some effort to back-port to 1.2x line after.

@jscheffl I can look into the backport.

@jscheffl jscheffl added the backport-to-chart/v1-2x-test label (Automatic backport to chart 1.2x maintenance branch) May 5, 2026
@jscheffl jscheffl merged commit 535e3cc into apache:main May 5, 2026
142 checks passed
@github-actions

github-actions Bot commented May 5, 2026

Contributor

Backport failed to create: chart/v1-2x-test. View the failure log Run details

Note: As of "Merging PRs targeted for Airflow 3.X", the committer who merges the PR is responsible for backporting the PRs that are bug fixes (generally speaking) to the maintenance branches.

In case of doubt, please ask in the #release-management Slack channel.

Status Branch Result
chart/v1-2x-test Commit Link

You can attempt to backport this manually by running:

cherry_picker 535e3cc chart/v1-2x-test

This should apply the commit to the chart/v1-2x-test branch and leave the commit in conflict state marking
the files that need manual conflict resolution.

After you have resolved the conflicts, you can continue the backport process by running:

cherry_picker --continue

If you don't have cherry-picker installed, see the installation guide.

@xBis7

xBis7 commented May 5, 2026

Contributor Author

@jscheffl @Miretpl Thank you for the help! I'm going to backport it manually and create a PR against the v1-2 branch.

@Miretpl

Miretpl commented May 5, 2026

Contributor

@xBis7 no problem. Feel free to mention me when the backport is ready

xBis7 added a commit to xBis7/airflow that referenced this pull request May 6, 2026
* add otel to helm chart

* use Kustomize for grafana, jaeger, prometheus

* enable specific service per flag + unit test

* remove grafana, jaeger and prometheus kustomization logic

* traces enabled and metrics disabled, by default

* remove otelCollector.enabled flag

* add statsd comments about otel metrics overriding the config

* make OTEL_METRIC_EXPORT_INTERVAL configurable and provide default value + entry in the values.schema.json

* remove hardcoded value for metrics otel_port in values.yaml

* add option to override the configmap

* add otelCollector.args and make the config.yml file as the default argument

* rename extraAnnotations to annotations in otel-collector-service.yaml

* parameterize the readiness and liveness probe values

* remove prometheus from the configmap

* update the default value for OTEL_TRACES_EXPORTER

* fix tests in airflow_aux + otel-collector-serviceaccount.yaml

* fix spellcheck errors in docs

* fix tests in security

* otel collector unit tests + networkpolicy file

* values.schema.json cleanup

* add a minimum to all integer configs in values.schema.json

* fix heading comments

* change config default to ~ from empty string

* fix static check error
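
One of the commits above adds a `minimum` to all integer configs in values.schema.json. As a rough sketch of how such a rule could be checked (this is not the project's lint script; the function name and the sample schema are hypothetical), the following walks a JSON schema and reports every integer-typed node that lacks a `minimum`. It treats any dictionary with an integer `type` as a schema node, so literal `default` or `examples` payloads that happen to look like schemas would also be reported.

```python
from __future__ import annotations

import json


def integers_missing_minimum(schema: object, path: str = "$") -> list[str]:
    """Return the paths of integer-typed schema nodes that declare no `minimum`."""
    missing: list[str] = []
    if isinstance(schema, dict):
        declared = schema.get("type")
        types = declared if isinstance(declared, list) else [declared]
        if "integer" in types and "minimum" not in schema:
            missing.append(path)
        # Recurse into every child so nested properties and items are covered.
        for key, child in schema.items():
            missing.extend(integers_missing_minimum(child, f"{path}.{key}"))
    elif isinstance(schema, list):
        for index, child in enumerate(schema):
            missing.extend(integers_missing_minimum(child, f"{path}[{index}]"))
    return missing


if __name__ == "__main__":
    # Hypothetical fragment, not copied from the chart's values.schema.json.
    sample = json.loads(
        """
        {
          "type": "object",
          "properties": {
            "port": {"type": "integer", "minimum": 1},
            "replicas": {"type": "integer"},
            "interval": {"type": ["integer", "null"]}
          }
        }
        """
    )
    for location in integers_missing_minimum(sample):
        print(f"missing minimum: {location}")
```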
@xBis7 xBis7 deleted the otel-helm-chart branch May 6, 2026 10:44
xBis7 added a commit to xBis7/airflow that referenced this pull request May 7, 2026
jscheffl pushed a commit that referenced this pull request May 7, 2026
@Miretpl Miretpl mentioned this pull request May 26, 2026
1 task

Labels

area:dev-tools, area:helm-chart (Airflow Helm Chart), area:kubernetes-tests, backport-to-chart/v1-2x-test (Automatic backport to chart 1.2x maintenance branch)


6 participants