fix(relay): enable Redis TLS for rediss:// (ElastiCache)#1417

Merged: tlongwell-block merged 2 commits into main from fix/redis-tls on Jul 1, 2026

@tlongwell-block

Collaborator

Problem

The relay crashes at startup on the bb-block (prod) cluster:

Error: Redis pool creation failed: Config: Redis: can't connect with TLS, the feature is not enabled - InvalidClientConfig

Prod ElastiCache is addressed via rediss:// (TLS), but the redis crate was compiled without any TLS feature (only tokio-comp and connection-manager were enabled). Local/dev uses plaintext redis://, so this never surfaced there. It has the same shape as the S3/IRSA fix (#1406): a prod-only path that dev never exercised.

This is the blocker after #1406 — kubectl confirms the relay now gets past S3/git (Postgres connects, migrations run, owner bootstrapped) and dies at Redis pool creation.

Fix — two coupled parts

Empirically, either part alone is insufficient:

  1. Enable TLS in the redis dep — add tokio-rustls-comp so the client can negotiate TLS for rediss://.
  2. Install a rustls CryptoProvider at relay startup — with the TLS feature on, both aws-lc-rs and ring are compiled in transitively, so rustls cannot auto-select a provider and panics at first TLS use. We install ring explicitly in main(), mirroring buzz-acp's existing rustls setup for wss://.

A probe confirmed adding only the feature swaps the clean startup error for a runtime panic in CryptoProvider::get_default_or_install_from_crate_features() — hence both parts.
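
Why the feature alone is not enough can be sketched with a toy model of the provider lookup. This is not rustls's real code or types: `Provider`, `auto_select`, and `resolve` are hypothetical names, and the model only assumes the behavior described above (auto-selection works when exactly one backend is compiled in, and an explicitly installed provider takes precedence). In the relay itself, the explicit install is presumably a call along the lines of `rustls::crypto::ring::default_provider().install_default()`; the PR diff is not shown here, so treat that as an assumption.

```rust
// Toy model of default-provider resolution; hypothetical types, not rustls's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    Ring,
    AwsLcRs,
}

/// Auto-selection succeeds only when exactly one backend is compiled in.
fn auto_select(ring: bool, aws_lc_rs: bool) -> Option<Provider> {
    match (ring, aws_lc_rs) {
        (true, false) => Some(Provider::Ring),
        (false, true) => Some(Provider::AwsLcRs),
        // Neither or both compiled in: the choice is ambiguous.
        _ => None,
    }
}

/// An explicitly installed provider wins; otherwise fall back to auto-selection.
/// The real library panics where this model returns Err.
fn resolve(
    installed: Option<Provider>,
    ring: bool,
    aws_lc_rs: bool,
) -> Result<Provider, &'static str> {
    installed
        .or_else(|| auto_select(ring, aws_lc_rs))
        .ok_or("no process-level CryptoProvider available")
}
```

With both backends compiled in, `resolve(None, true, true)` is an error (part 1 alone), while `resolve(Some(Provider::Ring), true, true)` succeeds (parts 1 and 2 together).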

Blast radius

  • Plaintext redis:// (local/dev) is unchanged: the TLS path is only taken for rediss:// URLs; the provider install is a cheap no-op otherwise.
  • Provider install is process-wide and happens exactly once: install_default() runs at the top of main(), before any TLS is attempted (Redis, wss, or S3-over-TLS).
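
The scheme-based split in the first bullet can be illustrated with a small standalone check. `wants_tls` is a hypothetical helper written for this note, not the redis crate's own URL parser, and the hostnames in the usage line are placeholders.

```rust
/// Returns true only when the URL scheme asks for TLS (`rediss://`).
/// Hypothetical helper for illustration; the redis crate does its own parsing.
fn wants_tls(url: &str) -> bool {
    url.split_once("://")
        .map(|(scheme, _rest)| scheme.eq_ignore_ascii_case("rediss"))
        .unwrap_or(false)
}
```

For example, `wants_tls("redis://localhost:6379")` is false, so a local/dev connection never reaches the TLS path, while `wants_tls("rediss://cache.example.internal:6379")` is true.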

Files

  • Cargo.toml — add tokio-rustls-comp to the workspace redis dep
  • crates/buzz-relay/Cargo.toml — add rustls (ring, std) dep
  • crates/buzz-relay/src/main.rs — install ring provider at startup
  • Cargo.lock

Validation

  • cargo test -p buzz-relay — 428 pass; the one Redis integration test (redis_presence_publish...) times out only under parallel full-suite pool contention and passes cleanly in isolation (env-dependent, not this change).
  • cargo fmt --check ✅ · cargo clippy -p buzz-relay
  • Pre-push hook rust-tests green (99s).

Deploy path

After this merges and the image builds, a third bb-block PR bumps the relay image tag. Deploy is not declared fixed until the pod is observed Running/Ready via read-only kubectl.

npub1qyvc0c5kl4gqv2fd97fsk46tu378sqgy35vc83rvgfwne90sel7s0ed67d and others added 2 commits June 30, 2026 20:45

Prod ElastiCache uses rediss:// (TLS), but the redis crate was compiled
without a TLS feature, so relay startup died at Redis pool creation with
"can't connect with TLS, the feature is not enabled". Dev uses plaintext
redis://, so it never surfaced.

Two coupled changes are required:

1. Add "tokio-rustls-comp" to the workspace redis dependency so the client
   can negotiate TLS for rediss:// URLs.

2. Install a rustls CryptoProvider at relay startup. With the TLS feature
   enabled, both aws-lc-rs and ring are compiled in transitively, so rustls
   cannot auto-select a provider and panics at first use. We install ring
   explicitly in main(), mirroring buzz-acp's existing rustls setup for
   wss://. Adding only the feature (without the provider) swaps the clean
   startup error for a runtime panic; both parts are needed.

Plaintext redis:// (local/dev) is unchanged: the TLS path is only taken for
rediss:// URLs, and the provider install costs effectively nothing otherwise.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>

The workspace redis TLS feature added in the prior commit also applies to
buzz-admin, which publishes membership changes (add-member/remove-member)
over Redis. Run inside the prod container against rediss:// ElastiCache, its
main() would hit the same panic the relay did: both aws-lc-rs and ring are
compiled in transitively, so rustls can't auto-select a CryptoProvider.

Install ring at the top of buzz-admin's main(), mirroring buzz-relay, so the
whole prod image is TLS-safe rather than just the relay binary.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
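
Since both binaries now install a provider at the top of main(), the "install once per process" behavior is worth a sketch. The model below uses `std::sync::OnceLock` as a stand-in for the process-wide default slot; `DEFAULT_PROVIDER`, `install_default`, and `init_tls` are hypothetical names for this note, and provider names are plain strings rather than real rustls types. It assumes only that an install can succeed at most once per process.

```rust
use std::sync::OnceLock;

// Stand-in for the process-wide default provider slot.
static DEFAULT_PROVIDER: OnceLock<&'static str> = OnceLock::new();

/// The first caller wins; a later call fails and hands back the rejected value.
fn install_default(name: &'static str) -> Result<(), &'static str> {
    DEFAULT_PROVIDER.set(name)
}

/// Hypothetical entry point that each binary's main() could call first.
/// Ignoring the error makes a repeated call harmless.
fn init_tls() -> &'static str {
    let _ = install_default("ring");
    DEFAULT_PROVIDER
        .get()
        .copied()
        .expect("a provider is installed after init_tls")
}
```

Calling `init_tls()` before any connection is opened means a later TLS user in the same process finds a provider already in place, whichever binary it runs in.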
@tlongwell-block tlongwell-block merged commit 3292b50 into main Jul 1, 2026
50 of 52 checks passed
@tlongwell-block tlongwell-block deleted the fix/redis-tls branch July 1, 2026 01:14
tellaho pushed a commit that referenced this pull request Jul 1, 2026
…-preview

* origin/main:
  fix(relay): enable Redis TLS for rediss:// (ElastiCache) (#1417)
  chore(release): release Buzz Desktop version 0.3.40 (#1414)
  fix(desktop): stabilize channel-timeline scrollback with per-row height reserves (#1413)
  fix(sidebar): trim working badge label and name working agents in tooltip (#1408)
  Mobile tab bar polish (#1368)
  feat(desktop): let thread pane expand on ultrawide monitors (#1407)
  chore(release): release Buzz Desktop version 0.3.39 (#1410)
  fix: close cross-process keychain race and namespace dev-build nest (#1409)
  feat(relay): allow agent owners to edit/manage agent-owned content (#1403)
  fix(media): support IRSA/credential-chain S3 auth and configurable signing region (#1406)
  fix(desktop): fold baked build env into in-process model discovery (#1376)
  docs: link VISION_ACTIVITY from the VISION index (#1405)
tellaho pushed a commit that referenced this pull request Jul 1, 2026
…vity-embed

* origin/main:
  fix(relay): enable Redis TLS for rediss:// (ElastiCache) (#1417)
  chore(release): release Buzz Desktop version 0.3.40 (#1414)
  fix(desktop): stabilize channel-timeline scrollback with per-row height reserves (#1413)
  fix(sidebar): trim working badge label and name working agents in tooltip (#1408)
  Mobile tab bar polish (#1368)
  feat(desktop): let thread pane expand on ultrawide monitors (#1407)
  chore(release): release Buzz Desktop version 0.3.39 (#1410)
  fix: close cross-process keychain race and namespace dev-build nest (#1409)
  feat(relay): allow agent owners to edit/manage agent-owned content (#1403)
  fix(media): support IRSA/credential-chain S3 auth and configurable signing region (#1406)
  fix(desktop): fold baked build env into in-process model discovery (#1376)
  docs: link VISION_ACTIVITY from the VISION index (#1405)