feat(chart): per-pod emptyDir git scratch when persistence disabled (multi-replica HA)#1450

Merged
tlongwell-block merged 1 commit into main from chart-release/0.1.2
Jul 1, 2026

@tlongwell-block

Collaborator

What

Make the relay's git working directory a per-pod emptyDir scratch volume when persistence.git.enabled: false, so the relay scales to N replicas cleanly.

Why

The git working dir (BUZZ_GIT_REPO_PATH) is ephemeral scratch — reads/writes hydrate repos from object storage per request, and repo-name uniqueness lives in Postgres (as of #1432). Nothing durable lives on this disk.

Two problems with the old shape:

  1. persistence.git.enabled: false mounted nothing — the git path pointed at an unmounted directory (git data landed on the container rootfs).
  2. enabled: true mounts a single ReadWriteOnce PVC, which binds to one node. On a Deployment with replicaCount > 1, pods scheduled onto other nodes stay Pending with a multi-attach error — even though the git data is fully disposable and could live on independent per-pod disks.

Change

  • templates/deployment.yaml: the git-repos volume is always mounted at mountPath; it's a PVC when persistence.git.enabled, a per-pod emptyDir when not.
  • values.yaml: document both modes; drop the misleading "RWO is fine at any replicaCount" note (true for disposability, not for multi-node scheduling).
  • tests/render_test.yaml: assert the emptyDir + zero-PVC path at replicaCount=5.
  • Chart 0.1.1 → 0.1.2.

enabled: true renders the PVC exactly as before — backward compatible.
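
The volume switch described above could be sketched roughly as follows. This is an illustrative fragment of a pod spec, not the chart's actual template: the volume name `git-repos`, the `persistence.git.enabled` flag, and the `<fullname>-git` claim name come from this PR, while the helper name `buzz.fullname`, the container name `relay`, and the values key `persistence.git.mountPath` are assumptions.

```yaml
# Fragment of templates/deployment.yaml (sketch, under the assumptions above).
# The mount is unconditional; only the volume source changes with the flag.
      containers:
        - name: relay  # hypothetical container name
          volumeMounts:
            - name: git-repos
              mountPath: {{ .Values.persistence.git.mountPath }}  # assumed key
      volumes:
        - name: git-repos
          {{- if .Values.persistence.git.enabled }}
          # Existing behavior: one shared ReadWriteOnce claim.
          persistentVolumeClaim:
            claimName: {{ include "buzz.fullname" . }}-git  # assumed helper name
          {{- else }}
          # New behavior: independent scratch disk per pod, removed with the pod.
          emptyDir: {}
          {{- end }}
```

Keeping the `volumeMounts` entry outside the conditional is what guarantees the git path never points at an unmounted directory, whichever mode is selected.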

Validation

  • helm unittest deploy/charts/buzz → 31/31.
  • helm template with persistence.git.enabled=false: replicas: 5, git-repos → emptyDir: {}, 0 PVC docs.
  • helm template with persistence.git.enabled=true: git-repos → persistentVolumeClaim: claimName: <fullname>-git, 1 PVC doc.
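
For readers unfamiliar with helm-unittest, a render test for the disabled path might look something like the sketch below. It is not the contents of tests/render_test.yaml. The suite name, the test description, and the PVC template path `templates/pvc.yaml` are assumptions; `replicaCount`, `persistence.git.enabled`, and the `git-repos` volume name come from this PR.

```yaml
# Sketch of a helm-unittest suite (file names partly assumed, see above).
suite: git scratch volume
templates:
  - templates/deployment.yaml
  - templates/pvc.yaml  # hypothetical path of the PVC template
tests:
  - it: uses a per-pod emptyDir and renders no PVC when git persistence is disabled
    set:
      replicaCount: 5
      persistence.git.enabled: false
    asserts:
      # The Deployment keeps the requested replica count.
      - equal:
          path: spec.replicas
          value: 5
        template: templates/deployment.yaml
      # The git-repos volume is backed by an emptyDir.
      - contains:
          path: spec.template.spec.volumes
          content:
            name: git-repos
            emptyDir: {}
        template: templates/deployment.yaml
      # No PersistentVolumeClaim document is rendered.
      - hasDocuments:
          count: 0
        template: templates/pvc.yaml
```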

Deploy note

bb-block prod (buzz relay, scaling to 5) will consume this by bumping its chart dep to 0.1.2 and setting buzz.persistence.git.enabled: false — the storage half of squareup/bb-block#138.
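
On the consuming side, the change described in this note might look roughly like the following. This is a guess at the shape, not the bb-block configuration: the dependency name `buzz` is inferred from the `buzz.` values prefix, the repository value is a placeholder, and placing `replicaCount` under the `buzz` key is an assumption.

```yaml
# Chart.yaml of the consuming chart (sketch)
dependencies:
  - name: buzz                              # assumed dependency name
    version: 0.1.2
    repository: "<chart repository URL>"    # placeholder, not from the source

# values.yaml of the consuming chart (sketch)
buzz:
  replicaCount: 5        # assumed key placement; "scaling to 5" is from the note
  persistence:
    git:
      enabled: false     # selects the per-pod emptyDir scratch volume
```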


Reopened from #1439: the head branch was renamed feat/git-scratch-emptydir → chart-release/0.1.2 so the auto-tag workflow cuts chart-v0.1.2 and dispatches the chart publish on merge (the chart-release/<version> lane in auto-tag-on-release-pr-merge.yml). GitHub auto-closed #1439 on the rename and refuses to reopen it. Same commit, same diff: 24679bd.

The relay's git working dir (`BUZZ_GIT_REPO_PATH`) is ephemeral scratch —
reads/writes hydrate repos from object storage per request and repo-name
uniqueness lives in Postgres, so nothing durable lives on this disk.

Before: `persistence.git.enabled: true` mounted a PVC; `false` mounted
*nothing*, leaving the git path pointing at an unmounted directory. And a
single ReadWriteOnce PVC on a Deployment can only attach to one node, so
it cannot back a multi-replica relay across nodes despite the git data
being disposable.

Now: `enabled: false` mounts a per-pod `emptyDir` at the same mountPath —
true per-pod scratch with no shared volume to multi-attach, so the relay
scales to N replicas cleanly. `enabled: true` still renders the PVC
exactly as before (backward compatible).

- deployment.yaml: git-repos volume is always mounted; PVC when enabled,
  emptyDir when disabled.
- values.yaml: document both modes; drop the misleading "RWO is fine at
  any replicaCount" note (true only for disposability, not scheduling).
- render_test.yaml: assert the emptyDir + no-PVC path at replicaCount=5.
- Chart 0.1.1 -> 0.1.2.

Verified: `helm unittest` 31/31; `helm template` renders emptyDir (0 PVC
docs) when disabled and the PVC when enabled.

Co-authored-by: Tyler Longwell <tlongwell@block.xyz>
Signed-off-by: Tyler Longwell <tlongwell@block.xyz>
@tlongwell-block tlongwell-block merged commit c88799a into main Jul 1, 2026
54 checks passed
@tlongwell-block tlongwell-block deleted the chart-release/0.1.2 branch July 1, 2026 23:55
wpfleger96 added a commit that referenced this pull request Jul 2, 2026
…into HEAD

* origin/paul/nip-am-agent-turn-metrics:
  fix(profile): consolidate agent profile runtime metadata (#1451)
  fix(desktop): simplify workspace rail badges (#1462)
  perf(desktop): instant channel switching — non-blocking first paint, persisted snapshots (#1452)
  perf(relay): bounded-concurrency multi-filter query execution (S2) (#1457)
  fix(desktop): classify timeline prepends so history loads don't bump unread (#1416)
  fix(desktop): quiet gate for workspace switches instead of boot splash (#1449)
  fix(read-path): reach complete threads, dense-second timelines, and all people in the GUI (#1418)
  E1+E3: reduce relay ingest/fan-out DB round trips; ack p99 −7–16%, fd p99 −6–28%, p999 tails −29–53% vs PR #1453 tip (#1454)
  perf(relay): defer post-commit dispatch and avoid verify clone (#1453)
  fix(relay): include git hook tools in runtime image (#1326)
  feat(chart): per-pod emptyDir git scratch when persistence disabled (multi-replica HA) (#1450)
  fix(relay): remove media bearer-token auth (#1444)
  fix(desktop): stop search shortcut from hijacking the sidebar (#1447)

Co-authored-by: Will Pfleger <pfleger.will@gmail.com>
Signed-off-by: Will Pfleger <pfleger.will@gmail.com>