
feat: add curated telegraf 1.38.2 build for Azure Linux 4.0 (#20399)#17798

Open
WithEnoughCoffee wants to merge 2 commits into microsoft:4.0 from WithEnoughCoffee:dev/autumnnash/telegraf-20399


WithEnoughCoffee commented Jun 24, 2026

Contributor

Summary

Telegraf shipped in Azure Linux 3.0 but is missing from 4.0. This restores it as a general-purpose, plugin-driven agent for collecting, processing, aggregating, and writing metrics. Resolves #20399.

Why a curated build

Built with upstream defaults, telegraf links ~400 plugins and their full transitive dependency tree, which means a large CVE surface, vendor footprint, and binary for a distro we maintain. Instead we compile a curated ("Balanced", 108 build tags) set: 63 inputs, 15 outputs, 7 processors, 4 aggregators, 12 parsers, 7 serializers. All other plugins are excluded at build time and absent from the binary.
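As a consistency check, the per-category counts above sum to the stated 108 tags. The sketch below, which is illustrative and not part of the spec, tallies a whitespace-separated tag list by the prefix before the first dot. It assumes every tag follows the `category.plugin` naming shown in the spec excerpt (`inputs.cpu`, `inputs.redis`); the `parsers.` and `serializers.` prefixes and the `tally_by_category` helper are assumptions.

```python
from collections import Counter

# Per-category plugin counts as stated in the PR description.
STATED = {
    "inputs": 63,
    "outputs": 15,
    "processors": 7,
    "aggregators": 4,
    "parsers": 12,
    "serializers": 7,
}


def tally_by_category(buildtags: str) -> Counter:
    """Count tags like 'inputs.cpu' by the prefix before the first dot.

    Tokens without a dot (for example a '\\' line continuation copied from
    the spec) are ignored.
    """
    counts = Counter()
    for tag in buildtags.split():
        category, sep, _ = tag.partition(".")
        if sep:
            counts[category] += 1
    return counts


if __name__ == "__main__":
    print("stated total:", sum(STATED.values()))  # 63+15+7+4+12+7
    # Tiny illustrative list, not the real curated set.
    sample = "inputs.cpu inputs.mem outputs.azure_monitor parsers.json"
    print(tally_by_category(sample))
```

Running this against the real `%global buildtags` value would show whether the spec still matches the counts quoted in this description.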

  • General-purpose — common system, network, database, container, and cloud inputs/outputs included, useful out of the box.
  • Full first-party Azure + GitHub coverage: azure_monitor (in/out), azure_storage_queue, eventhub_consumer, azure_data_explorer, and the github input are all included, since AzL is an Azure/Microsoft + GitHub product and these should work by default.
  • Smaller attack/maintenance surface — fewer compiled plugins → fewer linked deps, fewer CVEs, smaller binary. Concretely, go list -deps ./cmd/telegraf links 1,877 packages vs 3,386 for the full build (~45% dropped), across 344 distinct third-party modules vs 592 (~248 fewer). Note: this reduces the linked/runtime-reachable surface only — the full vendor tree is still shipped (Fedora requires it), so the source-level CVE-scan footprint is unchanged.
  • Reviewable & adjustable in one place — the entire plugin policy is a single spec macro (%global buildtags). Adding (or removing) a plugin is a one-line change — append its tag to the macro, e.g.:
         inputs.cpu inputs.disk inputs.diskio inputs.mem inputs.net inputs.netstat \
    +    inputs.redis \
    then re-render and rebuild. No %build/%install/%files changes are needed, so curation stays easy to audit and evolve as requirements change.
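Because the whole policy lives in one macro, adding a plugin really is a text edit. The sketch below shows that edit in code. It assumes the macro is written as `%global buildtags` followed by whitespace-separated tags with trailing `\` continuations, as in the excerpt above. The real spec layout may differ, and `parse_buildtags`/`add_buildtag` are hypothetical helpers, not part of go2rpm or any packaging tool.

```python
def parse_buildtags(spec_text: str) -> list[str]:
    """Return the tags of a '%global buildtags' definition, following '\\' continuations."""
    lines = spec_text.splitlines()
    for start, line in enumerate(lines):
        if not line.lstrip().startswith("%global buildtags"):
            continue
        body = [line.split("buildtags", 1)[1]]
        idx = start
        # A trailing backslash means the macro value continues on the next line.
        while body[-1].rstrip().endswith("\\") and idx + 1 < len(lines):
            idx += 1
            body.append(lines[idx])
        joined = " ".join(part.rstrip().rstrip("\\") for part in body)
        return joined.split()
    raise ValueError("no '%global buildtags' definition found")


def add_buildtag(tags: list[str], tag: str) -> list[str]:
    """Return a new list with `tag` appended, unchanged if it is already present."""
    return list(tags) if tag in tags else [*tags, tag]


# Illustrative spec fragment (not the real curated list).
SPEC = """\
%global buildtags \\
    inputs.cpu inputs.disk inputs.mem \\
    outputs.azure_monitor
%global other_macro 1
"""

if __name__ == "__main__":
    tags = parse_buildtags(SPEC)
    print(add_buildtag(tags, "inputs.redis"))
```

Making the add idempotent means a duplicated one-line change cannot silently double-list a tag.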

Note on upstream: telegraf is not packaged by Fedora; the reference is InfluxData's own RPM, which ships the full plugin set. Our curation is a deliberate deviation produced via upstream's supported custom build tag (the same mechanism InfluxData exposes to end users for slimmed builds). The full vendor tree is retained for reproducibility and easy plugin additions.

Packaging (Fedora Go guidelines)

Uses the go2rpm --profile vendor scaffold as the baseline (Go Vendor Tools, vendored deps, %gobuild with GO_BUILDTAGS/GO_LDFLAGS), so it can be upstreamed to Fedora and matches the vendored-Go pattern AzL already uses (rootlesskit, git-lfs). Divergences are marked # AzL:. The full vendor tree is retained (Fedora requires it); curation only affects what is compiled. The cumulative SPDX License tag is computed with go_vendor_license and enforced by %go_vendor_license_check; bundled(golang(...)) provides are auto-generated.

systemd unit

The upstream systemd unit is shipped unmodified (runs as User=telegraf). We intentionally add no sandboxing drop-in: telegraf is a whole-system monitoring agent, and the curated inputs include hardware collectors that shell out via sudo -n (smart, smartctl, ipmi_sensor) or need CAP_NET_RAW (ping) — NoNewPrivileges/Protect* would break them. This matches upstream InfluxData and AzL 3.0. (An earlier revision shipped an openSUSE-derived 50-hardening.conf; it was dropped after review because it diverged from upstream and conflicted with the curated hardware inputs. See the PR discussion for the full rationale.)

Contents

  • telegraf.spec: %gometa, curated %global buildtags, Go Vendor Tools license macros, sysusers (no userdel on uninstall), upstream systemd unit (unmodified), logrotate, generated default config, state dir, %check (license check + binary smoke test).
  • go-vendor-tools.toml — askalono detector + manual license entries.
  • telegraf.comp.toml — upstream source plus the full vendor tarball.
  • telegraf.sysusers, telegraf.default, generate_source_tarball.sh, locks/telegraf.lock.

Verification

Full mock build passes every phase including %check. Confirmed in mock:

  • Binary reports Telegraf 1.38.2 (branch stamped azurelinux); functional collection works (the cpu input loads and emits metrics).
  • Curated plugins present (azure_monitor in/out, azure_data_explorer, github, eventhub_consumer, docker, prometheus, snmp, …); non-curated absent (cloudwatch, sqlserver, nats, clickhouse).
  • File modes: telegraf.conf 0644 root:root (world-readable, as on Fedora); state dir /var/lib/telegraf 0770 root:telegraf (matching upstream InfluxData post-install.sh).
  • Install/erase lifecycle: sysusers creates the telegraf user with home /etc/telegraf (matching upstream InfluxData useradd -r -M -d /etc/telegraf; config is read via the unit's explicit -config flag, independent of $HOME); the unit installs; on erase the user is intentionally retained.
  • systemd-analyze verify accepts the unit; debuginfo is split into its own subpackage.

Why 1.38.2 (not 1.39.0)

telegraf 1.39.0's go.mod requires Go 1.26; AzL 4.0 currently ships Go 1.25.8, and 1.38.2 is the latest release that builds on it. We can bump to 1.39.0 once AzL golang reaches ≥ 1.26 (which also drops the logzio azure-monitor dependency).

Known follow-up

  • The reproducible vendor tarball (generate_source_tarball.sh, SHA512 1108fe48086a7051c5cb89935c6de1c675c3ea8212a979d147ad0c03aef327c6234fa9eee292e4f9594ba9ec2cb757fc9eff46630aea43551bca3d948b30b27f) must be uploaded to the lookaside store before CI source checks and package builds can fetch it; the comp.toml source URI already points at its final published path.
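Once the tarball is published, anyone can confirm that a fetched copy matches the digest above, which is equivalent to comparing against `sha512sum` output. The sketch below is minimal, with the downloaded path passed on the command line; `sha512_of` and `sha512_matches` are hypothetical helpers.

```python
import hashlib

# Digest published in the PR for the vendor tarball.
EXPECTED_SHA512 = (
    "1108fe48086a7051c5cb89935c6de1c675c3ea8212a979d147ad0c03aef327c6"
    "234fa9eee292e4f9594ba9ec2cb757fc9eff46630aea43551bca3d948b30b27f"
)


def sha512_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so a large vendor tarball is never fully loaded into memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha512_matches(path: str, expected: str = EXPECTED_SHA512) -> bool:
    """Compare a file's SHA512 with the expected hex digest (case-insensitive)."""
    return sha512_of(path) == expected.strip().lower()


if __name__ == "__main__":
    import sys

    for tarball in sys.argv[1:]:
        print(tarball, "OK" if sha512_matches(tarball) else "MISMATCH")
```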

Copilot AI review requested due to automatic review settings June 24, 2026 21:52

Copilot AI left a comment

Contributor


Pull request overview

This PR re-introduces Telegraf (missing from Azure Linux 4.0, shipped in 3.0) as a new local component (base/comps/telegraf/). Because Telegraf's default build links ~400 plugins and the full transitive dependency tree, the spec uses a curated ("Balanced") plugin set (~104 Go build tags via GO_BUILDTAGS) to shrink the binary and its CVE/dependency surface, while still vendoring the full tree per Fedora Go packaging guidelines (%gometa, Go Vendor Tools, %gobuild). It adds systemd/sysusers/logrotate integration and a %check that validates the license expression and runs the binary. It resolves #20399.

Changes:

  • Adds a hand-maintained local telegraf.spec with a curated %global buildtags plugin policy, Go Vendor Tools license macros, sysusers, systemd unit, logrotate, and default-config generation.
  • Adds the component definition (telegraf.comp.toml, manual release), go-vendor-tools.toml, telegraf.sysusers, a reproducible vendor-tarball generator script, and the rendered specs/lock/sources.
  • The vendor tarball source URI is currently a 127.0.0.1 placeholder pending lookaside upload (noted as a known follow-up).

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| base/comps/telegraf/telegraf.comp.toml | Local-spec component def, manual release, two source-files (upstream + vendor); vendor URI is a placeholder. |
| base/comps/telegraf/telegraf.spec | Curated Go build spec (buildtags, license macros, sysusers, systemd, %check). |
| base/comps/telegraf/go-vendor-tools.toml | askalono detector + manual SPDX entries for files the detector can't classify. |
| base/comps/telegraf/telegraf.sysusers | Declarative telegraf system user. |
| base/comps/telegraf/generate_source_tarball.sh | Reproducible go mod vendor tarball generator; comment references a stale macro name. |
| specs/t/telegraf/* | Rendered spec/sysusers/go-vendor-tools/sources (body matches base sources). |
| locks/telegraf.lock | Generated input-fingerprint lock. |

Key findings: the helper script's comment cross-references a non-existent %{plugin_tags} macro (spec uses %{buildtags}), and the vendor source URI is an unresolved 127.0.0.1 placeholder that blocks CI fetch/build until replaced. Because this introduces a brand-new forked local spec (a long-term maintenance commitment) for a vendored Go package with a large curated plugin policy, license-expression tracking, and an unresolved source URI, it warrants human review.

Comment thread base/comps/telegraf/generate_source_tarball.sh Outdated
Comment thread base/comps/telegraf/telegraf.comp.toml Outdated
@WithEnoughCoffee WithEnoughCoffee force-pushed the dev/autumnnash/telegraf-20399 branch from c23dfcb to a8edaec Compare June 24, 2026 22:31
Copilot AI review requested due to automatic review settings June 25, 2026 21:05
@WithEnoughCoffee WithEnoughCoffee force-pushed the dev/autumnnash/telegraf-20399 branch from a8edaec to 6a6f026 Compare June 25, 2026 21:05

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 11 out of 12 changed files in this pull request and generated 1 comment.

Comment thread base/comps/telegraf/generate_source_tarball.sh Outdated
@WithEnoughCoffee WithEnoughCoffee changed the title telegraf: curated, Fedora-compliant build for Azure Linux 4.0 (#20399) feat: add curated telegraf 1.38.2 build for Azure Linux 4.0 (#20399) Jun 26, 2026
@WithEnoughCoffee WithEnoughCoffee marked this pull request as ready for review June 26, 2026 17:46
@WithEnoughCoffee WithEnoughCoffee requested a review from a team as a code owner June 26, 2026 17:46
@tobiasb-ms

Contributor

issue(blocking): While having a list of required follow-ups is good, make sure to remove it before we merge.

@tobiasb-ms tobiasb-ms left a comment

Contributor


question(blocking): I left a couple specific comments about this, but what are the practical differences between this and how we packaged it for AZL3? I know we've changed to use a fedora-blessed way of packaging, and that seems righteous. And of course we bumped the version. But I think there are more semantic differences here and we need to know what they are and why we're making them before taking this change.

Comment thread base/comps/telegraf/generate_source_tarball.sh
Comment thread base/comps/telegraf/generate_source_tarball.sh Outdated
Comment thread base/comps/telegraf/generate_source_tarball.sh Outdated
Comment thread base/comps/telegraf/telegraf.comp.toml Outdated
Comment thread base/comps/telegraf/telegraf-hardening.conf Outdated
Comment thread base/comps/telegraf/telegraf.comp.toml Outdated
Comment thread base/comps/telegraf/telegraf.spec Outdated
@tobiasb-ms

Copy link
Copy Markdown
Contributor

issue(blocking): This isn't in components-publish-channels.toml, so it will be published to the sdk repo, not the base repo.

@binujp

binujp commented Jun 26, 2026

Contributor

rpm-layout, build (blocking): Can you please confirm this builds on koji? I also saw SUSE was the reference for defaults. We should lean towards Fedora/CentOS/Red Hat. Their home dir is /etc/telegraf.

dnf repoquery -l telegraf | grep -v build-id
Updating and loading repositories:
Repositories loaded.
/etc/logrotate.d/telegraf
/etc/telegraf/telegraf.conf
/etc/telegraf/telegraf.conf.sample
/etc/telegraf/telegraf.d
/etc/telegraf/telegraf.d/.ignore
/usr/bin/telegraf
/usr/lib/telegraf/scripts/init.sh
/usr/lib/telegraf/scripts/telegraf.service
/var/log/telegraf
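Comparing our package's file list with the reference listing above is a set difference. The sketch below is minimal and assumes both listings were saved as plain text, for example from `rpm -qlp` on the built package and the `dnf repoquery -l` output shown here. It drops build-id paths the same way `grep -v build-id` does. `diff_listings` is a hypothetical helper, and the "ours" listing is made up for illustration.

```python
def parse_listing(text: str) -> set[str]:
    """Collect absolute paths from an 'rpm -ql'/'dnf repoquery -l' style listing.

    Status lines (e.g. 'Repositories loaded.') are skipped, and build-id entries
    are dropped, mirroring 'grep -v build-id'.
    """
    return {
        line.strip()
        for line in text.splitlines()
        if line.strip().startswith("/") and "build-id" not in line
    }


def diff_listings(reference: str, ours: str) -> tuple[set[str], set[str]]:
    """Return (paths only in the reference package, paths only in ours)."""
    ref, mine = parse_listing(reference), parse_listing(ours)
    return ref - mine, mine - ref


# Excerpt of the InfluxData listing quoted above.
REFERENCE = """\
Updating and loading repositories:
Repositories loaded.
/etc/logrotate.d/telegraf
/etc/telegraf/telegraf.conf
/usr/bin/telegraf
/usr/lib/telegraf/scripts/telegraf.service
/var/log/telegraf
"""

# Hypothetical listing for our package, for illustration only.
OURS = """\
/etc/logrotate.d/telegraf
/etc/telegraf/telegraf.conf
/usr/bin/telegraf
/usr/lib/.build-id/00/0123456789
/usr/lib/systemd/system/telegraf.service
/var/log/telegraf
"""

if __name__ == "__main__":
    only_ref, only_ours = diff_listings(REFERENCE, OURS)
    print("only in reference:", sorted(only_ref))
    print("only in ours:", sorted(only_ours))
```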

@WithEnoughCoffee

Contributor Author

Good question — here's the full inventory of what changed from AZL3 (1.31.0-10) and why. I've split it into (A) behavioral/semantic changes that genuinely affect what ships and how it runs (the ones worth scrutinizing), and (B) mechanical/compliance changes from adopting the Fedora Go scaffold.

A. Semantic / behavioral changes (the ones to sign off on)

| # | AZL3 (1.31.0) | AZL4 (1.38.2) | Why |
|---|---|---|---|
| 1 | All ~415 plugins compiled in | Curated "Balanced" ~108 plugins via upstream custom-builder build tags | Headline change. Binary ~250 MB → ~105 MB; cuts the applicable CVE surface from the vendored Go tree. Risk: any plugin not on the list is absent. Set is one auditable %global buildtags macro and is trivially expandable. Confirming the exact set with the MetricsExtension team is the last open item. |
| 2 | Default telegraf.conf from full catalog | Default config generated from the curated binary | Config reflects only compiled-in plugins (no dangling references). |
| 3 | %postun runs userdel/groupdel → user removed on uninstall | User is never deleted (declarative sysusers.d; no userdel) | Fedora guideline: removing a system user can orphan files under a later-reused UID. Leaves a telegraf user behind after uninstall. |
| 4 | telegraf.conf world-readable (0755, root:root) | telegraf.conf 0640 telegraf:telegraf + telegraf.d/ drop-in dir | Tighter perms; ownership set via %attr instead of a %post chown -R. |
| 5 | No sandboxing | systemd hardening drop-in (50-hardening.conf) | Sandboxes the agent (PrivateDevices intentionally omitted so hardware inputs still work). Could restrict plugins needing extra access. |
| 6 | Plain go build → effectively static, no debuginfo | %gobuild → PIE + hardened flags + debuginfo/debugsource subpackages | Enterprise/Fedora norm; FIPS-consistent. Dynamic binary instead of AZL3's static one. |
| 7 | (none) | New: /etc/default/telegraf EnvironmentFile | Lets operators pass TELEGRAF_OPTS without editing the unit. |

B. Mechanical / compliance changes (no behavioral impact)

  • 12 inline CVE patches → 0. The 1.31.0→1.38.2 bump supersedes all of Patch0–11 and clears ~45 backported CVEs; ongoing CVE handling moves to vendor-tree bumps.
  • License accuracy. AZL3 declared License: MIT (telegraf's own license only). AZL4 ships the real cumulative SPDX expression computed over the entire vendor tree by go-vendor-tools (askalono) and enforced in %check.
  • Packaging model. AZL3 was a hand-written local spec; AZL4 is rebased on the canonical go2rpm --profile vendor scaffold (%gometa -L -f, go-vendor-tools, full vendor archive as .tar.bz2), with every AzL deviation marked # AzL: so it stays auditable and upstreamable. Telegraf isn't in Fedora, so it remains a local component.
  • Dependency wiring. Explicit shadow-utils/systemd Requires replaced by %{?systemd_requires} / %{?sysusers_requires_compat}; logrotate/procps-ng unchanged.

Net: the only changes that affect runtime behavior are the curated plugin set (#1–2), the no-user-deletion policy (#3), tighter config perms (#4), and the sandboxing drop-in (#5). Everything else is version/toolchain/compliance hygiene. Happy to expand the plugin set or relax any of these if they conflict with a known consumer.

@WithEnoughCoffee

Contributor Author

issue(blocking): While having a list of required follow-ups is good, make sure to remove it before we merge.

Agreed, good callout. I will keep that in mind.

Copilot AI review requested due to automatic review settings June 26, 2026 20:39

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.

Comment thread base/comps/telegraf/generate_source_tarball.sh
@WithEnoughCoffee

Contributor Author

rpm-layout, build (blocking): Can you please confirm this builds on koji? I also saw SUSE was the reference for defaults. We should lean towards Fedora/CentOS/Red Hat. Their home dir is /etc/telegraf.


Home directory: you're right. I switched the telegraf user's home from the SUSE-style /var/lib/telegraf to /etc/telegraf to match upstream InfluxData and Fedora/RHEL (their useradd -d /etc/telegraf). The rest of the layout already follows that convention (/etc/telegraf for config, /var/log/telegraf for logs, /usr/bin/telegraf for the binary).

One nuance worth documenting (and I've left a comment in the sysusers file): /etc/telegraf is root-owned and, under our hardening (ProtectSystem=full), read-only at runtime, so it's not a writable home. Some plugin SDKs (e.g. the Azure SDK credential cache) write under $HOME, so the service drop-in sets Environment=HOME=/var/lib/telegraf (writable, since /var stays writable under ProtectSystem=full). Net result for the supported path, running as the systemd service: config is read from /etc/telegraf, SDK caches land in /var/lib/telegraf, and everything works.

The only caveat: if telegraf is run outside systemd (e.g. manual sudo -u telegraf debugging), that override isn't applied, $HOME falls back to /etc/telegraf, and writes under $HOME would fail with EACCES. This is identical to upstream's design (they ship the same root-owned /etc/telegraf home); the workaround is HOME=/var/lib/telegraf for manual runs. I flagged this in a sysusers comment so it doesn't surprise a future maintainer.
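The two-path behavior described above can be modeled in a few lines. This is a simplified sketch. It assumes, as the comment states, that the launcher fills `HOME` from the passwd entry unless the unit overrides it, and that SDKs derive their cache paths from `$HOME` (Go's `os.UserHomeDir` reads `$HOME` on Unix). `effective_home`, `sdk_cache_dir`, and the `example-sdk` directory are illustrative and not taken from any real SDK.

```python
import posixpath
from typing import Optional

PASSWD_HOME = "/etc/telegraf"  # home field of the telegraf user, as described above


def effective_home(home_override: Optional[str], passwd_home: str = PASSWD_HOME) -> str:
    """Simplified model of the HOME a telegraf process sees.

    An explicit override (Environment=HOME=... in the unit drop-in) wins;
    otherwise the launcher is assumed to fill HOME from the passwd entry.
    """
    return home_override or passwd_home


def sdk_cache_dir(home: str) -> str:
    """Hypothetical cache location an SDK might derive from $HOME."""
    return posixpath.join(home, ".cache", "example-sdk")


if __name__ == "__main__":
    as_service = sdk_cache_dir(effective_home("/var/lib/telegraf"))
    manual_run = sdk_cache_dir(effective_home(None))  # e.g. sudo -u telegraf ...
    print("service:", as_service)
    print("manual: ", manual_run)
    print("same cache:", as_service == manual_run)
```

The last line shows the "two sources of truth" concern in miniature: under these assumptions, the same plugin would look for its cached data in different directories depending on how it was launched.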

Copilot AI review requested due to automatic review settings June 29, 2026 17:45
@WithEnoughCoffee WithEnoughCoffee force-pushed the dev/autumnnash/telegraf-20399 branch from b71823f to 5a4c9e6 Compare June 29, 2026 17:45

Copilot AI left a comment

Contributor


Pull request overview

Copilot reviewed 13 out of 14 changed files in this pull request and generated no new comments.

@WithEnoughCoffee WithEnoughCoffee force-pushed the dev/autumnnash/telegraf-20399 branch from 5a4c9e6 to a8d52fe Compare June 29, 2026 18:07
@WithEnoughCoffee WithEnoughCoffee force-pushed the dev/autumnnash/telegraf-20399 branch from a8d52fe to 60bcfe7 Compare June 29, 2026 20:29
@WithEnoughCoffee

Contributor Author

issue(blocking): This isn't in components-publish-channels.toml, so it will be published to the sdk repo, not the base repo.

fixed

@tobiasb-ms

tobiasb-ms commented Jun 30, 2026

Contributor


This seems like a large caveat that ends up creating two sources of truth. In the worst case they could end up conflicting.
From what you say above, it seems like that could be avoided by removing some of the new hardening, right?

The home dir is root-owned and never writable by telegraf. That is by upstream design, not because of our hardening. Upstream: useradd -r -M telegraf -s /bin/false -d /etc/telegraf -g telegraf (pre-install.sh:8). The -M means do not create or own a home; it just points the home field at the existing root-owned /etc/telegraf config dir. Our spec matches: %dir %{_sysconfdir}/%{name} with no %attr, so it is root:root 0755 (telegraf.spec:209). So $HOME=/etc/telegraf is unwritable by telegraf even on a stock InfluxData install with zero sandboxing, and removing our hardening would not fix it.

Config loading is independent of $HOME: ExecStart=... -config /etc/telegraf/telegraf.conf -config-directory /etc/telegraf/telegraf.d (upstream telegraf.service:13). The home-dir field has zero effect on where config is read.

The two sources of truth exist because we were trying to fit the Fedora/RHEL defaults. The home directory default came from Telegraf, not from the SUSE hardening. In AZL 3.0 we took it as-is; this is coming up now because we are trying to meet Fedora standards. I am really passionate about sending changes upstream, and I think it would be better to propose this to InfluxData upstream for the home directory and explain why. What are your thoughts?

So is this how the Fedora RPM made by InfluxData works? They also use /etc/telegraf. Do the plug-ins you're worried about here not work on Fedora because of this?

Also, earlier you said (my emphasis):

One nuance worth documenting (and I've left a comment in the sysusers file): /etc/telegraf is root-owned and, under our hardening ( ProtectSystem=full ), read-only at runtime — so it's not a writable home.

And in the hardening file, there's this comment:

# Several plugin SDKs (e.g. the Azure SDK credential cache) write under $HOME. The telegraf
# user's passwd home is /etc/telegraf (matching upstream/Fedora), but that path is read-only
# under ProtectSystem=full, so point $HOME at the writable state dir instead.

These seem at odds with this statement above:

The home dir is root-owned and never writable by  telegraf  — by upstream design, not because of our hardening.

Also:

The home-dir field has zero effect on where config is read.

Correct, it has zero effect on telegraf config reads. But the whole point of this is so plug-ins can write data which, presumably, they will read. I'm unfamiliar with the architecture, but as you point out there is potential for a plug-in to use the HOME that's set up in the hardening file most of the time and sometimes use /etc/telegraf, which means it wouldn't have any of the data it has written. Is this a realistic possibility, or will they in practice always run in the same context so we don't need to worry about this?

@WithEnoughCoffee

Contributor Author

issue(blocking): The commit message isn't a conventional commit. When doing your final rebase/reset, make sure the message is something like feat(telegraf): ....

will do

@WithEnoughCoffee

Contributor Author

issue/question(blocking): Do we have actual data/signal that reducing the number of plugins reduces the number of CVEs/attacks? It's a reasonable hypothesis, but it does move us away from the upstream (upstreamish -- it's not actually packaged by fedora), and away from AZL3. There's risk with that deviation and we need data to demonstrate that the risk is worth it.

InfluxData built the custom builder specifically to support curated builds; it's an intentional, commonly supported way of selecting plugins. I think it will be hard to claim that this will definitely lower our CVEs by something like 30%. What I do know is that the full vendor tree is still there, so plugins can be added back, and the curated build gives a smaller attack surface, a smaller binary, and faster builds.

This is a two-way door: the full vendor tree ships with the package, so if we want all the plugins at the time of accepting the package, we can make that change, or add whichever other plugins we decide on. Just as we try not to take GUI packages and other packages we don't need that could be more problematic, I think this simply follows the standard we've set for AZL 4.0 overall, and it remains reversible.

@WithEnoughCoffee

Contributor Author


Fedora doesn't package telegraf; it's made by InfluxData. We changed the home directory to match the Fedora standard proposed in an earlier comment. I think we should change it back to what InfluxData does, until we are asked to change it for upstream, if that ever happens.

@WithEnoughCoffee WithEnoughCoffee force-pushed the dev/autumnnash/telegraf-20399 branch from 60bcfe7 to 9fa5f5e Compare July 1, 2026 01:07
@tobiasb-ms

tobiasb-ms commented Jul 1, 2026

Contributor

Fedora doesn't package telegraf; it's made by InfluxData. We changed the home directory to match the Fedora standard proposed in an earlier comment. I think we should change it back to what InfluxData does, until we are asked to change it for upstream, if that ever happens.

If you install telegraf from InfluxData's repo (instructions: https://computingforgeeks.com/how-to-install-telegraf-on-fedora/), the user it creates has home directory /etc/telegraf:

bash-5.3# grep telegraf /etc/passwd
telegraf:x:999:999::/etc/telegraf:/bin/false

So using /var/lib/telegraf diverges from what we're currently considering the upstream, right?
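The `grep` output above is a standard seven-field passwd(5) record. The sketch below splits it so the home and shell fields can be compared with what our sysusers file declares. `parse_passwd_line` is a hypothetical helper; on a live system, Python's `pwd.getpwnam("telegraf")` returns the same fields.

```python
from typing import NamedTuple


class PasswdEntry(NamedTuple):
    name: str
    password: str
    uid: int
    gid: int
    gecos: str
    home: str
    shell: str


def parse_passwd_line(line: str) -> PasswdEntry:
    """Split one passwd(5) record into its seven colon-separated fields."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    name, password, uid, gid, gecos, home, shell = fields
    return PasswdEntry(name, password, int(uid), int(gid), gecos, home, shell)


if __name__ == "__main__":
    entry = parse_passwd_line("telegraf:x:999:999::/etc/telegraf:/bin/false")
    print(entry.home, entry.shell)  # /etc/telegraf /bin/false
```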

We also still need answers to the questions/issues I brought up above:

  1. Do the plug-ins that we think will be problematic work with InfluxData's telegraf package on Fedora? If so, how?
  2. The comments, both here in the PR discussion and in telegraf-hardening.conf, are inconsistent about what actually causes the need for the two HOME directories. In some places you've said that ProtectSystem=full plays into it, and in other places you've said it's only the fact that /etc/telegraf is root-owned. We need that sorted out.
  3. Directly tied to issue 2 above is determining whether the hardening strategy adds value or gets in the way. I brought that up earlier and I think we both kind of lost the thread on it.

Comment thread base/comps/telegraf/generate_source_tarball.sh Outdated
Comment thread base/comps/telegraf/telegraf.sysusers Outdated
Copilot AI review requested due to automatic review settings July 1, 2026 18:36
WithEnoughCoffee force-pushed the dev/autumnnash/telegraf-20399 branch from 9fa5f5e to df6f439 (July 1, 2026 18:36)

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.

Comment thread base/comps/telegraf/telegraf.spec Outdated
Comment thread base/comps/telegraf/telegraf.comp.toml Outdated
WithEnoughCoffee force-pushed the dev/autumnnash/telegraf-20399 branch from df6f439 to 2b4414f (July 1, 2026 18:57)
Copilot AI review requested due to automatic review settings July 1, 2026 19:18
WithEnoughCoffee force-pushed the dev/autumnnash/telegraf-20399 branch from 2b4414f to 0d96465 (July 1, 2026 19:18)

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.

Comment thread base/comps/telegraf/telegraf.spec Outdated
Comment thread base/comps/telegraf/telegraf.comp.toml Outdated
WithEnoughCoffee force-pushed the dev/autumnnash/telegraf-20399 branch from 0d96465 to bd9c5b8 (July 1, 2026 20:15)
WithEnoughCoffee (Contributor, Author) commented

We spoke privately and I addressed these comments.

Comment thread base/comps/telegraf/telegraf.spec Outdated
Comment thread base/comps/telegraf/telegraf.spec Outdated
Restore telegraf (absent from AzL 4.0) as a general-purpose metrics agent,
packaged per the Fedora Go guidelines for upstreaming.

- Curated ("Balanced") custom build via GO_BUILDTAGS: a general-purpose subset
  (~108 of ~415 plugins) is compiled in, including the full first-party Azure
  plugin set and the github input. This is a deliberate AzL deviation from the
  full upstream/AZL3 build, produced via upstream's supported `custom` build
  tag; the complete vendor tree is still shipped for reproducibility and easy
  plugin additions. Curation drops ~248 third-party modules from the linked
  binary (defense-in-depth: unlinked code is not runtime-reachable).
- Reproducible vendor tarball via generate_source_tarball.sh (deterministic tar
  flags, fixed SOURCE_DATE_EPOCH=0); pinned by SHA512. The script is an
  out-of-band maintainer tool and is never invoked during rpmbuild. Parser
  hardened with a catch-all case to avoid an infinite loop on unexpected args.
- rpmautospec %autorelease / %autochangelog (release calculation = autorelease),
  per the convention for newly authored specs.
- System user via sysusers.d with home /etc/telegraf, matching upstream
  InfluxData (useradd -M -d /etc/telegraf). The home field is unused for config
  loading: the unit reads config via explicit -config /etc/telegraf/telegraf.conf
  -config-directory /etc/telegraf/telegraf.d, and no curated plugin writes $HOME.
- Ships the upstream systemd unit as-is (no sandboxing drop-in), matching
  upstream InfluxData; curated hardware inputs (smart, ipmi_sensor, ping) rely on
  sudo/CAP_NET_RAW that aggressive Protect*/NoNewPrivileges settings would break.
- Config, sysusers and env-file drop-ins shipped as Source files;
  telegraf.conf installed 0644 root:root (world-readable, as on Fedora) and
  state dir /var/lib/telegraf installed 0770 root:telegraf, matching upstream
  InfluxData (scripts/rpm/post-install.sh).
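The reproducible-tarball bullet above rests on two shell techniques that are easy to get subtly wrong. The following is a minimal sketch under stated assumptions, not the actual generate_source_tarball.sh: the `--outFolder`/`--pkgVersion` flag names and the `parse_args`/`make_deterministic_tarball` functions are hypothetical, and GNU tar 1.28 or later is assumed (for `--sort=name`). It shows an argument loop whose catch-all `*)` branch fails fast instead of spinning on an argument it never shifts off, and tar flags that pin member order, timestamps, and ownership so identical inputs produce identical bytes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: flag and function names are illustrative,
# not taken from the real generate_source_tarball.sh.

# Parse --outFolder/--pkgVersion (assumed flag names) into globals.
parse_args() {
  OUT_FOLDER=""
  PKG_VERSION=""
  while [[ $# -gt 0 ]]; do
    case "$1" in
      --outFolder)
        OUT_FOLDER="${2-}"
        # A flag missing its value makes `shift 2` fail without shifting,
        # which would re-enter the loop on the same $1 forever.
        shift 2 || return 1
        ;;
      --pkgVersion)
        PKG_VERSION="${2-}"
        shift 2 || return 1
        ;;
      *)
        # Catch-all: an unrecognized argument is never shifted off, so
        # without this branch the while loop would never terminate.
        echo "Unknown argument: $1" >&2
        return 1
        ;;
    esac
  done
}

# Archive SRC_DIR into OUT_FILE so identical inputs give identical bytes.
# --auto-compress picks compression from OUT_FILE's suffix
# (e.g. .tar.bz2 -> bzip2; plain .tar -> uncompressed).
make_deterministic_tarball() {
  local src_dir="$1" out_file="$2"
  tar --sort=name \
      --mtime="@${SOURCE_DATE_EPOCH:-0}" \
      --owner=0 --group=0 --numeric-owner \
      --pax-option=exthdr.name=%d/PaxHeaders/%f,delete=atime,delete=ctime \
      --auto-compress \
      -C "$(dirname "$src_dir")" \
      -cf "$out_file" "$(basename "$src_dir")"
}
```

Because every member's mtime is pinned to SOURCE_DATE_EPOCH (0 by default here) and members are archived in name order, two checkouts that differ only in file timestamps or creation order should yield byte-identical archives, which is what lets a SHA512 pin stay stable. With a `.tar.bz2` output name the stream goes through bzip2, which stores no timestamp of its own.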
Copilot AI review requested due to automatic review settings July 1, 2026 21:06
WithEnoughCoffee force-pushed the dev/autumnnash/telegraf-20399 branch from bd9c5b8 to 6e032b0 (July 1, 2026 21:06)

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.

Comment thread base/comps/telegraf/telegraf.comp.toml
Comment thread base/comps/telegraf/telegraf.comp.toml Outdated
Co-authored-by: Tobias Brick <39196763+tobiasb-ms@users.noreply.github.com>
Copilot AI review requested due to automatic review settings July 1, 2026 21:58
github-actions (Bot) commented Jul 1, 2026

🔒❌ Lock files are out of date

FIX: run this and commit the result:

azldev component update -p telegraf

Or download the fix patch and apply it:

gh run download 28550470185 -R microsoft/azurelinux -n locks-patch
git apply locks.patch

Changed components (1)

| Component | New upstream commit |
|-----------|---------------------|
| telegraf  | -                   |

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.

Comment on lines +92 to +95
NAME_VER="telegraf-${PKG_VERSION}"
# Fedora forge macros expect the vendor archive as %{archivename}-vendor.tar.bz2.
VENDOR_TARBALL="$(realpath "${OUT_FOLDER}/${NAME_VER}-vendor.tar.bz2")"

Comment on lines +82 to +90
echo "Creating a tempdir."
TMPDIR=$(mktemp -d)
function cleanup {
  echo "Clean-up: removing tempdir (${TMPDIR})."
  rm -rf "${TMPDIR}"
}
trap cleanup EXIT

pushd "${TMPDIR}" > /dev/null

5 participants