feat: add curated telegraf 1.38.2 build for Azure Linux 4.0 (#20399) #17798
WithEnoughCoffee wants to merge 2 commits into
Pull request overview
This PR re-introduces Telegraf (missing from Azure Linux 4.0, shipped in 3.0) as a new local component (base/comps/telegraf/). Because Telegraf's default build links ~400 plugins and the full transitive dependency tree, the spec uses a curated ("Balanced") plugin set (~104 Go build tags via GO_BUILDTAGS) to shrink the binary and its CVE/dependency surface, while still vendoring the full tree per Fedora Go packaging guidelines (%gometa, Go Vendor Tools, %gobuild). It adds systemd/sysusers/logrotate integration and a %check that validates the license expression and runs the binary. It resolves #20399.
Changes:
- Adds a hand-maintained local `telegraf.spec` with a curated `%global buildtags` plugin policy, Go Vendor Tools license macros, sysusers, systemd unit, logrotate, and default-config generation.
- Adds the component definition (`telegraf.comp.toml`, manual release), `go-vendor-tools.toml`, `telegraf.sysusers`, a reproducible vendor-tarball generator script, and the rendered specs/lock/sources.
- The vendor tarball source URI is currently a `127.0.0.1` placeholder pending lookaside upload (noted as a known follow-up).
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| base/comps/telegraf/telegraf.comp.toml | Local-spec component def, manual release, two source-files (upstream + vendor); vendor URI is a placeholder. |
| base/comps/telegraf/telegraf.spec | Curated Go build spec (buildtags, license macros, sysusers, systemd, %check). |
| base/comps/telegraf/go-vendor-tools.toml | askalono detector + manual SPDX entries for files the detector can't classify. |
| base/comps/telegraf/telegraf.sysusers | Declarative telegraf system user. |
| base/comps/telegraf/generate_source_tarball.sh | Reproducible go mod vendor tarball generator; comment references a stale macro name. |
| specs/t/telegraf/* | Rendered spec/sysusers/go-vendor-tools/sources (body matches base sources). |
| locks/telegraf.lock | Generated input-fingerprint lock. |
Key findings: the helper script's comment cross-references a non-existent %{plugin_tags} macro (spec uses %{buildtags}), and the vendor source URI is an unresolved 127.0.0.1 placeholder that blocks CI fetch/build until replaced. Because this introduces a brand-new forked local spec (a long-term maintenance commitment) for a vendored Go package with a large curated plugin policy, license-expression tracking, and an unresolved source URI, it warrants human review.
c23dfcb to
a8edaec
Compare
a8edaec to
6a6f026
Compare
issue(blocking): While having a list of required follow-ups is good, make sure to remove it before we merge.
tobiasb-ms
left a comment
question(blocking): I left a couple specific comments about this, but what are the practical differences between this and how we packaged it for AZL3? I know we've changed to use a fedora-blessed way of packaging, and that seems righteous. And of course we bumped the version. But I think there are more semantic differences here and we need to know what they are and why we're making them before taking this change.
issue(blocking): This isn't in
rpm-layout, build (blocking): Can you please confirm this builds on koji? I also saw SUSE was the reference for defaults. We should lean towards Fedora/CentOS/Red Hat. Their homedir is /etc/telegraf. `dnf repoquery -l telegraf | grep -v build-id`
Good question — here's the full inventory of what changed from AZL3 ( A. Semantic / behavioral changes (the ones to sign off on)
B. Mechanical / compliance changes (no behavioral impact)
Net: the only changes that affect runtime behavior are the curated plugin set (#1–2), the no-user-deletion policy (#3), tighter config perms (#4), and the sandboxing drop-in (#5). Everything else is version/toolchain/compliance hygiene. Happy to expand the plugin set or relax any of these if they conflict with a known consumer. |
Agreed, good callout. I will keep that in mind.
Home directory: you're right. Switched the telegraf user's home from the SUSE-style `/var/lib/telegraf` to `/etc/telegraf` to match upstream InfluxData and Fedora/RHEL (their `useradd -d /etc/telegraf`). The rest of the layout already follows that convention (`/etc/telegraf` config, `/var/log/telegraf` logs, `/usr/bin/telegraf`).
One nuance worth documenting (and I've left a comment in the sysusers file): `/etc/telegraf` is root-owned and, under our hardening (`ProtectSystem=full`), read-only at runtime, so it's not a writable home. Some plugin SDKs (e.g. the Azure SDK credential cache) write under `$HOME`, so the service drop-in sets `Environment=HOME=/var/lib/telegraf` (writable; `/var` stays writable under `ProtectSystem=full`).
Net result for the supported path (running as the systemd service): config is read from `/etc/telegraf`, SDK caches land in `/var/lib/telegraf`, and everything works. The only caveat: if telegraf is run outside systemd (manual `sudo -u telegraf` debugging), that override isn't applied, `$HOME` falls back to `/etc/telegraf`, and writes under `$HOME` would fail with EACCES. This is identical to upstream's design (they ship the same root-owned `/etc/telegraf` home); the workaround is `HOME=/var/lib/telegraf` for manual runs. Flagged in a sysusers comment so it doesn't surprise a future maintainer.
b71823f to
5a4c9e6
Compare
5a4c9e6 to
a8d52fe
Compare
a8d52fe to
60bcfe7
Compare
fixed |
So is this how the fedora RPM made by InfluxData works? They also use Also, earlier you said (my emphasis):
And in the hardening file, there's this comment: These seem at odds with this statement above:
Also:
Correct, it has zero effect on |
will do |
InfluxData built the custom builder specifically to support curated builds. It's an intentional and commonly supported way of selecting and curating plugins. I do think it's going to be hard to say for sure that this will lower our CVEs by 30%. What I do know is that the full tree is still there, so plugins can be added back, and that we get a smaller attack surface, a smaller binary, and faster builds. This is a two-way door: the plugins all ship in the vendor tree, so if we want all the plugins at the time of accepting the package we can make that change, or add whichever other plugins we decide on. Just as we try not to take GUI packages and other packages we don't need that can be deemed more problematic, I think this is really just following the standard we've set for AZL 4.0 overall.
Fedora doesn't make telegraf; it's made by InfluxData. We changed the home directory to match the Fedora standard proposed in an earlier comment. I think we should change it back to what InfluxData does until we are asked to change it for upstream, if that happens.
60bcfe7 to
9fa5f5e
Compare
If you install So using We also still need answers to the questions/issues I brought up above:
|
9fa5f5e to
df6f439
Compare
df6f439 to
2b4414f
Compare
2b4414f to
0d96465
Compare
0d96465 to
bd9c5b8
Compare
We spoke privately and I addressed these comments. |
Restore telegraf (absent from AzL 4.0) as a general-purpose metrics agent,
packaged per the Fedora Go guidelines for upstreaming.
- Curated ("Balanced") custom build via GO_BUILDTAGS: a general-purpose subset
(~108 of ~415 plugins) is compiled in, including the full first-party Azure
plugin set and the github input. This is a deliberate AzL deviation from the
full upstream/AZL3 build, produced via upstream's supported `custom` build
tag; the complete vendor tree is still shipped for reproducibility and easy
plugin additions. Curation drops ~248 third-party modules from the linked
binary (defense-in-depth: unlinked code is not runtime-reachable).
- Reproducible vendor tarball via generate_source_tarball.sh (deterministic tar
flags, fixed SOURCE_DATE_EPOCH=0); pinned by SHA512. The script is an
out-of-band maintainer tool and is never invoked during rpmbuild. Parser
hardened with a catch-all case to avoid an infinite loop on unexpected args.
- rpmautospec %autorelease / %autochangelog (release calculation = autorelease),
per the convention for newly authored specs.
- System user via sysusers.d with home /etc/telegraf, matching upstream
InfluxData (useradd -M -d /etc/telegraf). The home field is unused for config
loading: the unit reads config via explicit -config /etc/telegraf/telegraf.conf
-config-directory /etc/telegraf/telegraf.d, and no curated plugin writes $HOME.
- Ships the upstream systemd unit as-is (no sandboxing drop-in), matching
upstream InfluxData; curated hardware inputs (smart, ipmi_sensor, ping) rely on
sudo/CAP_NET_RAW that aggressive Protect*/NoNewPrivileges settings would break.
- Config, sysusers and env-file drop-ins shipped as Source files;
telegraf.conf installed 0644 root:root (world-readable, as on Fedora) and
state dir /var/lib/telegraf installed 0770 root:telegraf, matching upstream
InfluxData (scripts/rpm/post-install.sh).
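A minimal sketch of what "deterministic tar flags" with a fixed `SOURCE_DATE_EPOCH=0` can look like, assuming GNU tar. The `make_tarball` helper and its exact flag set are illustrative assumptions, not the contents of `generate_source_tarball.sh`, and compression is left out to keep the sketch dependency-free.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not taken from generate_source_tarball.sh):
# archive a directory so the output bytes depend only on file names,
# modes and contents, not on timestamps, ownership or readdir order.
make_tarball() {
    local src_dir="$1" out_file="$2"
    # Fixed timestamp; GNU tar accepts "@<seconds since epoch>" for --mtime.
    local epoch="${SOURCE_DATE_EPOCH:-0}"
    tar --sort=name \
        --mtime="@${epoch}" \
        --owner=0 --group=0 --numeric-owner \
        --format=gnu \
        -cf "${out_file}" -C "${src_dir}" .
}
```

Running the helper twice over the same tree, even after file timestamps have changed, should yield byte-identical archives, which is what makes pinning the result by SHA512 meaningful.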
bd9c5b8 to
6e032b0
Compare
Co-authored-by: Tobias Brick <39196763+tobiasb-ms@users.noreply.github.com>
🔒❌ Lock files are out of date
FIX: run this and commit the result:
```
azldev component update -p telegraf
```
Or download the fix patch and apply it:
```
gh run download 28550470185 -R microsoft/azurelinux -n locks-patch
git apply locks.patch
```
Changed components (1)
```shell
NAME_VER="telegraf-${PKG_VERSION}"
# Fedora forge macros expect the vendor archive as %{archivename}-vendor.tar.bz2.
VENDOR_TARBALL="$(realpath "${OUT_FOLDER}/${NAME_VER}-vendor.tar.bz2")"

echo "Creating a tempdir."
TMPDIR=$(mktemp -d)
function cleanup {
    echo "Clean-up: removing tempdir (${TMPDIR})."
    rm -rf "${TMPDIR}"
}
trap cleanup EXIT

pushd "${TMPDIR}" > /dev/null
```
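The commit message mentions that the script's argument parser was hardened with a catch-all case. A rough sketch of that pattern follows, assuming a conventional `while`/`case` loop. The option names `--pkgVersion` and `--outFolder` are hypothetical; only the `PKG_VERSION` and `OUT_FOLDER` variables appear in the excerpt.

```shell
# Hypothetical option names; this is not the parser from generate_source_tarball.sh.
parse_args() {
    PKG_VERSION=""
    OUT_FOLDER="."
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --pkgVersion)
                [ "$#" -ge 2 ] || { echo "Missing value for $1" >&2; return 1; }
                PKG_VERSION="$2"
                shift 2
                ;;
            --outFolder)
                [ "$#" -ge 2 ] || { echo "Missing value for $1" >&2; return 1; }
                OUT_FOLDER="$2"
                shift 2
                ;;
            *)
                # Catch-all: without this branch an unknown argument is never
                # shifted away, so "$#" never reaches 0 and the loop spins forever.
                echo "Unknown argument: $1" >&2
                return 1
                ;;
        esac
    done
}
```

Failing fast on unknown input is one option; another would be to `shift` past the argument with a warning, at the cost of silently ignoring typos.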
Summary
Telegraf shipped in Azure Linux 3.0 but is missing from 4.0. This restores it as a general-purpose, plugin-driven agent for collecting, processing, aggregating, and writing metrics. Resolves #20399.
Why a curated build
Built upstream-default, telegraf links ~400 plugins and the full transitive dependency tree — a large CVE surface, vendor footprint, and binary for a distro we maintain. Instead we compile a curated ("Balanced", 108 build tags) set: 63 inputs, 15 outputs, 7 processors, 4 aggregators, 12 parsers, 7 serializers. The rest are absent from the binary at build time.
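The per-category breakdown quoted above can be tallied mechanically from a tag list. This is a small sketch, assuming tags follow the `category.plugin` form seen in this PR (for example `inputs.cpu`); `count_tags` is a hypothetical helper and the sample tags are not the real 108-tag macro.

```shell
# Hypothetical helper: count build tags per plugin category.
# Takes whitespace-separated tags (e.g. "inputs.cpu outputs.file") as arguments
# and prints "<category> <count>" lines, sorted by category name.
count_tags() {
    printf '%s\n' "$@" \
        | tr -s ' \t' '\n' \
        | grep -v '^$' \
        | cut -d. -f1 \
        | sort \
        | uniq -c \
        | awk '{print $2, $1}'
}

count_tags inputs.cpu inputs.disk inputs.mem outputs.file parsers.json
# inputs 3
# outputs 1
# parsers 1
```

Fed the expanded contents of the macro, the same pipeline gives a quick way to re-check the advertised counts after a plugin is added or removed.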
- `azure_monitor` (in/out), `azure_storage_queue`, `eventhub_consumer`, `azure_data_explorer`, and the `github` input are all included, since AzL is an Azure/Microsoft + GitHub product and these should work by default.
- `go list -deps ./cmd/telegraf` links 1,877 packages vs 3,386 for the full build (~45% dropped), across 344 distinct third-party modules vs 592 (~248 fewer). Note: this reduces the linked/runtime-reachable surface only; the full vendor tree is still shipped (Fedora requires it), so the source-level CVE-scan footprint is unchanged.
- The curated set lives in a single macro (`%global buildtags`). Adding (or removing) a plugin is a one-line change: append its tag to the macro, e.g.:

```diff
 inputs.cpu inputs.disk inputs.diskio inputs.mem inputs.net inputs.netstat \
+ inputs.redis \
```

No `%build`/`%install`/`%files` changes are needed, so curation stays easy to audit and evolve as requirements change.
Packaging (Fedora Go guidelines)
Uses the `go2rpm --profile vendor` scaffold as the baseline (Go Vendor Tools, vendored deps, `%gobuild` with `GO_BUILDTAGS`/`GO_LDFLAGS`), so it can be upstreamed to Fedora and matches the vendored-Go pattern AzL already uses (`rootlesskit`, `git-lfs`). Divergences are marked `# AzL:`. The full vendor tree is retained (Fedora requires it); curation only affects what is compiled. The cumulative SPDX `License` tag is computed with `go_vendor_license` and enforced by `%go_vendor_license_check`; `bundled(golang(...))` provides are auto-generated.
systemd unit
The upstream systemd unit is shipped unmodified (runs as `User=telegraf`). We intentionally add no sandboxing drop-in: telegraf is a whole-system monitoring agent, and the curated inputs include hardware collectors that shell out via `sudo -n` (smart, smartctl, ipmi_sensor) or need `CAP_NET_RAW` (ping); `NoNewPrivileges`/`Protect*` would break them. This matches upstream InfluxData and AzL 3.0. (An earlier revision shipped an openSUSE-derived `50-hardening.conf`; it was dropped after review because it diverged from upstream and conflicted with the curated hardware inputs. See the PR discussion for the full rationale.)
Contents
- `telegraf.spec`: `%gometa`, curated `%global buildtags`, Go Vendor Tools license macros, sysusers (no `userdel` on uninstall), upstream systemd unit (unmodified), logrotate, generated default config, state dir, `%check` (license check + binary smoke test).
- `go-vendor-tools.toml`: askalono detector + manual license entries.
- `telegraf.comp.toml`: upstream source plus the full vendor tarball.
- `telegraf.sysusers`, `telegraf.default`, `generate_source_tarball.sh`, `locks/telegraf.lock`.
Verification
Full mock build passes every phase including `%check`. Confirmed in mock:
- `Telegraf 1.38.2` (branch stamped `azurelinux`); functional collection works (`cpu` input loads and emits).
- Curated plugins present (`azure_monitor` in/out, `azure_data_explorer`, `github`, `eventhub_consumer`, docker, prometheus, snmp, …); non-curated absent (cloudwatch, sqlserver, nats, clickhouse).
- `telegraf.conf` `0644 root:root` (world-readable, as on Fedora); state dir `/var/lib/telegraf` `0770 root:telegraf` (matching upstream InfluxData `post-install.sh`).
- `telegraf` user with home `/etc/telegraf` (matching upstream InfluxData `useradd -r -M -d /etc/telegraf`; config is read via the unit's explicit `-config` flag, independent of `$HOME`); the unit installs; on erase the user is intentionally retained.
- `systemd-analyze verify` accepts the unit; debuginfo is split into its own subpackage.
Why 1.38.2 (not 1.39.0)
telegraf 1.39.0's `go.mod` requires Go 1.26; AzL 4.0 currently ships Go 1.25.8, and 1.38.2 is the latest release that builds on it. We can bump to 1.39.0 once AzL golang reaches ≥ 1.26 (which also drops the logzio azure-monitor dependency).
Known follow-up
- The vendor tarball (`generate_source_tarball.sh`, SHA512 `1108fe48086a7051c5cb89935c6de1c675c3ea8212a979d147ad0c03aef327c6234fa9eee292e4f9594ba9ec2cb757fc9eff46630aea43551bca3d948b30b27f`) must be uploaded to the lookaside store before CI source checks and package builds can fetch it; the `comp.toml` source URI already points at its final published path.
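
Before uploading, a locally generated tarball can be checked against the pinned digest. This is a minimal sketch using coreutils `sha512sum -c`; `verify_sha512` is a hypothetical helper, not part of the PR.

```shell
# Hypothetical helper: succeed only if FILE hashes to EXPECTED (hex SHA-512).
verify_sha512() {
    local file="$1" expected="$2"
    # sha512sum -c reads "<hash>  <file>" lines from stdin when no file is given;
    # --status suppresses output and reports the result via the exit code only.
    printf '%s  %s\n' "${expected}" "${file}" | sha512sum -c --status
}
```

With the naming from the script excerpt (`${NAME_VER}-vendor.tar.bz2`), a call would presumably look like `verify_sha512 telegraf-1.38.2-vendor.tar.bz2 "<pinned SHA512>"`, returning non-zero on any mismatch.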