feat: replace ingress-nginx smoke cluster with Gateway API and Envoy Gateway#3386
Conversation
|
Thanks for the PR. It is labeled Slash commands (own line, regular comment) move it around the queue:
See CONTRIBUTING.md for details. |
hubcio
left a comment
There was a problem hiding this comment.
a couple of findings that don't map to a line in this diff:
the helm smoke job didn't actually run on this PR. .github/config/components.yml gates the helm component on paths: helm/** only, and this PR only touches scripts/ci/*.sh, so detect-changes skipped the validate+smoke matrix - the green checks here are lint/shellcheck/license, none of them helm-named. so this whole rewrite (the envoy gateway install, the v1.5 crd apply, the port-forward, the curl loops) shipped without ci ever exercising it, and future edits to these smoke scripts will keep skipping too. adding scripts/ci/setup-helm-smoke-cluster.sh and scripts/ci/test-helm.sh to the helm component paths would gate them - rust-bench-dashboard already co-lists its scripts/dashboard/** the same way. this is the one i'd fix first, it's why everything else here went unexercised.
minor: HELM_SMOKE_GATEWAY_NAMESPACE and HELM_SMOKE_GATEWAY_NAME are defined with matching defaults in both scripts, and the HTTPRoute parentRef has to match the Gateway. overriding one of them in only one script silently breaks route attach - worth a note that they have to be set together.
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either Thank you for your contribution! |
The helm component path filter only matched helm/** so edits to the smoke scripts skipped the validate+smoke matrix entirely. Add the two scripts to the component paths so future changes here actually run. While here, address review findings on the Gateway API rewrite: - bump Envoy Gateway default to v1.8.1 (gateway API stays at the matching v1.5.1) so the kind k8s v1.35 / EG / gw-api triple lands inside the EG compat matrix - drop the EnvoyProxy NodePort override and GatewayClass parametersRef; port-forward reaches Envoy via the apiserver ClusterIP tunnel so the default provider is enough - split get_gateway_base_url into find_gateway_service + start_gateway_port_forward so the port-forward PID lands in the parent shell (the previous version set it inside a command-substitution subshell, making the cleanup kill a no-op and orphaning the tunnel on local reruns) - install one trap cleanup_smoke_state EXIT covering both the port-forward PID and the temp values file so cleanup runs on every exit path - retry the owning-gateway Service lookup 15x with stderr suppressed to tolerate the gap between Gateway Programmed and the Service appearing - drop the redundant kubectl rollout status calls; helm upgrade --install --wait already blocks until both deployments are available - add HELM_SMOKE_GATEWAY_PF_PORT env override for consistency - fix the misleading "15 s" wait comment (actual worst case is ~45 s) - document that HELM_SMOKE_GATEWAY_NAMESPACE and HELM_SMOKE_GATEWAY_NAME must be set together across both scripts Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #3386 +/- ##
============================================
- Coverage 74.72% 74.68% -0.05%
Complexity 937 937
============================================
Files 1257 1257
Lines 124910 124910
Branches 100584 100629 +45
============================================
- Hits 93343 93291 -52
+ Misses 28570 28563 -7
- Partials 2997 3056 +59
🚀 New features to boost your workflow:
|
The Validate third-party licenses job failed on a transient TLS reset from crates.io while cargo metadata was downloading `spin` (OpenSSL SSL_read: unexpected eof). No dependency or license content changed in this PR; retriggering CI. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Envoy Gateway's default envoyService.type is LoadBalancer, which has no controller on kind, so the backing Service stays Pending and the Gateway never reaches Programmed. Restore the EnvoyProxy + GatewayClass parametersRef but switch the service type from NodePort to ClusterIP - port-forward tunnels through the apiserver regardless of service type so no node port is needed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
|
/ready |
Which issue does this PR address?
Closes #3098
Rationale
Ingress NGINX was officially retired in March 2026. Our Helm smoke/CI
cluster setup still pulled the retired controller, which gets no further
security or bug fixes.
What changed?
scripts/ci/setup-helm-smoke-cluster.shmappings, admission-webhook polling helpers).
v1.5.0 with
--server-side --force-conflicts. This ordering avoids thesafe-upgrade ValidatingAdmissionPolicy blocking EG's older bundled CRDs.
then waits for the Programmed condition.
scripts/ci/test-helm.shHELM_SMOKE_INGRESS_CLASS; chart is now deployed without Ingressobjects.
Services (server :3000, UI :3050).
kubectl port-forwardto expose the gateway on127.0.0.1:8080,which works on both Linux CI runners and macOS (unlike direct NodePort
access, which fails inside Docker Desktop's VM).
Local Execution
AI Usage
If AI tools were used, please answer: