Decouple Sandbox Binaries / Config from ActorTemplate#261
Conversation
| SandboxClassGvisor SandboxClass = "gvisor" | ||
| // SandboxClassMicroVM is the micro-VM runtime (cmd/ateom-microvm); needs | ||
| // /dev/kvm and vhost devices. | ||
| SandboxClassMicroVM SandboxClass = "microvm" |
There was a problem hiding this comment.
We probably shouldn't squat on the "microvm" name for one particular microvm implementation?
There was a problem hiding this comment.
I've been on the fence about this.
We could call it "cloud-hypervisor" (ignores kata, verbose) "kata" (ignores specific vm implementation).
I think it's OK to squat on this, most users that want a microvm probably don't need a more specific name, it gives us some freedom to iterate on the implementation details.
gvisor on the other hand seems to be in a class of it's own with a different model and no comparable alternatives.
If there's a really compelling additional microvm-like backend ... we'll come up with a name for it ("firecracker" ?), so far the other folks looking at this have settled on chv as well I think.
There was a problem hiding this comment.
+1 to call it "cloud-hypervisor" or "chv"
There was a problem hiding this comment.
I'd prefer to stick with microvm for now. If this is fully extensible later then a different implementation can use the same sandbox class name and be more "portable" for users.
"chv" is too opaque to an end-user and kata + chv are currently implementation details that don't even exist yet.
There was a problem hiding this comment.
Later we might want to move to an open string, but I don't think we are there yet and I didn't want to stall uVM further. When we have 2 decent ateoms we will have a better idea what extensible looks like. Modal for example offers "sandbox" (gvisor) and "vm sandbox" (chv?).
| // and decouples sandbox binary selection from ActorTemplate. | ||
| // | ||
| // +genclient | ||
| // +genclient:nonNamespaced |
There was a problem hiding this comment.
Why non-namespaced? Wouldn't namespaced, and in the same namespace as the workerpool make sense?
A cluster scoped resource might be restrictive if you're not the admin of the Kubernetes cluster.
There was a problem hiding this comment.
Why non-namespaced? Wouldn't namespaced, and in the same namespace as the workerpool make sense?
I was patterning it after StorageClass which is cluster scoped and has one tagged default.
So you could avoid configuring this repeatedly if you just want all worker pools of the same sandboxClass to use the same binaries.
It's not quite the same since you have one default SandboxConfig per sandboxClass.
A cluster scoped resource might be restrictive if you're not the admin of the Kubernetes cluster.
Yeah, at first blush I thought that might be a feature, it's pretty privileged to configure this, since we'll run the referenced binaries with high permissions.
I imagine similar to StorageClass a platform / vendor might want to configure these across all namespaces?
I don't feel super strongly about this part though.
I was mostly thinking about how to factor this out towards something semi-extensible that the casual user could ignore. I'm genuinely not sure if nonNamespaced makes sense.
There was a problem hiding this comment.
Slightly orthogonal, but I've also been mulling putting defaults into the ateom binaries, so if you really don't care you don't have to configure this at all.
There was a problem hiding this comment.
So you could avoid configuring this repeatedly if you just want all worker pools of the same sandboxClass to use the same binaries.
I also don't feel strongly. However, I think that actually having it namespaced does not preclude referencing it from a different namespace. So a cluster administrator could provide some default ones in a special namespace, and give everyone permission to refer to them (via some K8s RBAC, I guess)
There was a problem hiding this comment.
Not that it’s the same, but the GatewayClass is also cluster scoped. I agree that namespaced can be shared, but given that other similar resources are cluster scoped, it seems easier to start there and then fan out later. What do yall think?
There was a problem hiding this comment.
I think we should merge and iterate on this point. It's easy enough to refactor and we have a lot to add on top.
50c978b to
165fde6
Compare
|
[trivial rebase to pickup CI fix] |
This is a weird github-ism. AFAICT what I did was ... push with an additional commit? But it's phrased as if I went out of my way to "dismiss" the review ... |
| Default: true, | ||
| Assets: map[string]map[string]atev1alpha1.AssetFile{ | ||
| "amd64": {"runsc": { | ||
| URL: "gs://gvisor/releases/nightly/2026-05-19/x86_64/runsc", |
There was a problem hiding this comment.
when do these get pruned? (from the gcs bucket)
There was a problem hiding this comment.
No idea, but this value isn't new to the PR. This is what we ship in clusters. We can update both when a release has the fixes we need.
|
Benjamin Elder (@BenTheElder) there is consensus that anyone can create their own microvm implementation without that code living in our repo? (just making sure we are all in violent agreement). |
It sounds like a nice future but landing one that works first is the top priority for POC 2 ("alpha", end of June). This is going way out of the way to make the layering more prepared for that sort of thing over just landing it but it doesn't solve all of the blockers. EDIT: IOW: yes ... eventually, but it's not clear yet if that's an ateom, an atelet or what. It's not totally clear if the ateom/atelet split will even survive, or if we should merge these two ateoms later (and let the same worker fungibly run different sandbox types, I intentionally didn't do that for now so we can figure out different ateoms / worker pools). This PR gets us closer to extensible. |
…late Introduce a cluster-scoped SandboxConfig CRD holding the sandbox binaries (per sandbox class, per arch: url + sha256) and decouple binary selection from the ActorTemplate: - New SandboxConfig with SandboxClass (gvisor|microvm), a Default flag (cluster default per class), and Assets[arch][name]=AssetFile. - WorkerPool gains SandboxClass + SandboxConfigName (resolve by name, else the cluster default for the class). - ActorTemplate drops Runsc and the runsc/auth config types. - Regenerated CRDs, deepcopy, and clientset/listers/informers. - docs/api-guide.md: document SandboxConfig and the new WorkerPool fields.
RunRequest now carries a backend-agnostic SandboxAssets (sandbox_class + assets keyed by arch then asset name). CheckpointRequest/RestoreRequest reserve the old runsc field: checkpoint uses the version recorded on-node at Run/Restore, and restore reads it from the snapshot manifest.
- Fetch sandbox binaries from RunRequest.SandboxAssets (content-addressed, cached); for gVisor this is the "runsc" asset passed to ateom as RunscPath. - Record the running version in per-actor on-node state at Run/Restore so a later Checkpoint can re-fetch and pin it (ateompath.ActorSandboxAssetsFile). - Write a snapshot manifest (manifest.json) beside the checkpoint images at Checkpoint and read it at Restore, so restore is self-describing across nodes. Adds ategcs.SendBytesToGCS for the manifest upload.
10f24ce to
f5be2fa
Compare
On boot-from-spec (Run), resolve the actor's WorkerPool, take its SandboxClass, and pick the named or cluster-default SandboxConfig, sending the result as RunRequest.SandboxAssets. Checkpoint/Restore no longer send sandbox config (atelet uses the on-node record / snapshot manifest). - New resolveSandboxAssets helper; thread WorkerPool + SandboxConfig listers through NewService/NewActorWorkflow into the resume Run step. - Drop the RunscConfig plumbing from resume/pause/suspend. - Grant ate-api-server get/watch/list on workerpools + sandboxconfigs. - functional_test: create a WorkerPool + default gvisor SandboxConfig.
- Ship a cluster-default gvisor SandboxConfig (gvisor-default) carrying the runsc assets and apply it in install-ate.sh so gVisor pools work out of the box; ensure_crds also checks for the SandboxConfig CRD. - Remove the now-removed runsc block from all demo templates and the e2e probe fixture; they rely on the default SandboxConfig. - e2e demo_test: stop copying ActorTemplate.Runsc; carry the WorkerPool's SandboxClass/SandboxConfigName instead.
The CRD already requires each AssetFile to set url + sha256 (sha256 must be 64 hex). Add a ValidatingAdmissionPolicy enforcing the per-class requirement the CRD schema can't express: a gvisor SandboxConfig must define a "runsc" asset for every architecture under spec.assets (fail-closed at apply time, instead of only failing later when an actor tries to boot). Applied before the default SandboxConfig in install-ate.sh so the defaults are validated too. Add an envtest test that loads the shipped policy, waits for it to activate, and covers both the VAP (gvisor must have runsc) and the CRD-level url/sha256 requirements.
f5be2fa to
ca581e1
Compare
| SHA256 string `json:"sha256"` | ||
| } | ||
|
|
||
| // SandboxConfigSpec is the desired state of a SandboxConfig. |
There was a problem hiding this comment.
When I originally read the issue I think I was envisioning something more akin to SandboxClass. Wherein the Sandbox class would define the assets, and there could be any number of them with arbitrary names. This one seems to instead have hardcoded class names. If we're going to use hardcoded class names I have a couple of follow-up questions:
- why not make the config blocks for
gvisorvschexplicit rather than a generic map? - Why would a user need multiple
SandboxConfig?
There was a problem hiding this comment.
[...] This one seems to instead have hardcoded class names. [...]
To start, because it's not really possible to bring a custom one yet. But you can see how that would update pretty easily (here, the other parts are trickier)
why not make the config blocks for gvisor vs ch explicit rather than a generic map?
It's not really necessary so far. They both just need a set of assets which should be recorded alongside the snapshot. A per-class VAP (see last commit) can ensure you have the right list of assets while the generic config shape can make sure you have set url+sha etc.
I think other properties of the sandbox, like say, memory capacity, volume mounts, will be on the actortemplate side.
Why would a user need multiple SandboxConfig?
You can set different SandboxConfig on different workerpools. For example to A/B a new gvisor release.
EDIT: Which to an ActorTemplate is still the same "sandbox class", it's more akin to a different release of that class. Everything in SandboxConfig is a versioned asset for a sandboxClass. It might need a different name ...
00a90a8
into
agent-substrate:main
Fixes #253
Should make something like #239 less awkward, and also #255 (we don't have to cram more fields into actortemplate for the shim binary and update all of the demo actortemplates to include them).