feat: replace ingress-nginx smoke cluster with Gateway API and Envoy Gateway #3386

Merged
hubcio merged 7 commits into apache:master from avirajkhare00:master
Jun 16, 2026

Conversation

@avirajkhare00

Contributor

Which issue does this PR address?

Closes #3098

Rationale

Ingress NGINX was officially retired in March 2026. Our Helm smoke/CI
cluster setup still pulled the retired controller, which gets no further
security or bug fixes.

What changed?

scripts/ci/setup-helm-smoke-cluster.sh

  • Removed nginx-specific kind config (ingress-ready label, hostPort
    mappings, admission-webhook polling helpers).
  • Install Envoy Gateway via Helm first, then upgrade Gateway API CRDs to
    v1.5.0 with --server-side --force-conflicts. This ordering avoids the
    safe-upgrade ValidatingAdmissionPolicy blocking EG's older bundled CRDs.
  • Creates an EnvoyProxy (NodePort), GatewayClass, and Gateway resource,
    then waits for the Programmed condition.
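
The ordering described above could be sketched roughly as follows. This is an illustration, not the script itself: the release name `eg`, the namespaces, the Gateway name, the chart location, the CRD bundle URL, and the `run` / `install_gateway_stack` helpers are all assumptions. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
#!/usr/bin/env bash
set -eu

# Illustrative names and defaults: assumptions, not the script's real values.
GATEWAY_API_VERSION="${GATEWAY_API_VERSION:-v1.5.0}"
EG_NAMESPACE="${EG_NAMESPACE:-envoy-gateway-system}"
GATEWAY_NAMESPACE="${GATEWAY_NAMESPACE:-default}"
GATEWAY_NAME="${GATEWAY_NAME:-smoke-gateway}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Argument: the Envoy Gateway chart version to install (supplied by the caller).
install_gateway_stack() {
  local eg_version="$1"

  # 1. Envoy Gateway first; its chart bundles its own, older Gateway API CRDs.
  run helm install eg oci://docker.io/envoyproxy/gateway-helm \
    --version "$eg_version" \
    --namespace "$EG_NAMESPACE" --create-namespace --wait

  # 2. Then upgrade the CRDs, taking over field ownership server-side.
  run kubectl apply --server-side --force-conflicts \
    -f "https://github.com/kubernetes-sigs/gateway-api/releases/download/${GATEWAY_API_VERSION}/standard-install.yaml"

  # 3. Once the GatewayClass and Gateway manifests are applied (omitted here),
  #    block until the controller marks the Gateway as Programmed.
  run kubectl wait --namespace "$GATEWAY_NAMESPACE" \
    --for=condition=Programmed "gateway/${GATEWAY_NAME}" --timeout=180s
}
```

Running `DRY_RUN=1 install_gateway_stack <version>` prints the three commands in order, which makes the install-then-upgrade sequence easy to review without a cluster.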

scripts/ci/test-helm.sh

  • Removed HELM_SMOKE_INGRESS_CLASS; chart is now deployed without Ingress
    objects.
  • After helm install, applies two HTTPRoute resources pointing at the chart
    Services (server :3000, UI :3050).
  • Uses kubectl port-forward to expose the gateway on 127.0.0.1:8080,
    which works on both Linux CI runners and macOS (unlike direct NodePort
    access, which fails inside Docker Desktop's VM).
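
As a rough sketch of what one of the two HTTPRoute resources might look like, the helper below renders a manifest whose parentRef is taken from the same `HELM_SMOKE_GATEWAY_*` variables used to create the Gateway. The `render_http_route` helper, the placeholder defaults, and the choice to separate routes by path prefix are assumptions; the PR does not say how the real routes are matched.

```shell
#!/usr/bin/env bash
set -eu

# Must match the values used when the Gateway was created.
# The defaults below are placeholders, not the scripts' real defaults.
HELM_SMOKE_GATEWAY_NAMESPACE="${HELM_SMOKE_GATEWAY_NAMESPACE:-default}"
HELM_SMOKE_GATEWAY_NAME="${HELM_SMOKE_GATEWAY_NAME:-smoke-gateway}"

# Emit an HTTPRoute that attaches to the Gateway and forwards a path prefix
# to one backend Service. Arguments: route name, path prefix, Service, port.
render_http_route() {
  local name="$1" prefix="$2" service="$3" port="$4"
  cat <<EOF
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: ${name}
spec:
  parentRefs:
    - name: ${HELM_SMOKE_GATEWAY_NAME}
      namespace: ${HELM_SMOKE_GATEWAY_NAMESPACE}
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: ${prefix}
      backendRefs:
        - name: ${service}
          port: ${port}
EOF
}
```

The output could be piped to `kubectl apply -n <release-namespace> -f -`. A command along the lines of `kubectl port-forward -n <gateway-namespace> service/<envoy-service> 8080:80` would then expose the listener on 127.0.0.1:8080; the listener port 80 is an assumption.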

Local Execution

  • Passed
  • Pre-commit hooks ran

AI Usage

If AI tools were used, please answer:

  1. Which tools? (e.g., GitHub Copilot, Claude, ChatGPT) -> Claude
  2. Scope of usage? -> fixed existing code
  3. How did you verify the generated code works correctly? -> ran and verified all locally
  4. Can you explain every line of the code if asked? -> YES

@github-actions

github-actions Bot commented Jun 1, 2026

Thanks for the PR. It is labeled S-waiting-on-review and queued for review.

Slash commands (own line, regular comment) move it around the queue:

  • /ready - back to S-waiting-on-review after addressing feedback
  • /request-review @user-or-team - request a reviewer

See CONTRIBUTING.md for details.

@github-actions github-actions Bot added the S-waiting-on-review PR is waiting on a reviewer label Jun 1, 2026
Comment thread scripts/ci/test-helm.sh
Comment thread scripts/ci/setup-helm-smoke-cluster.sh
Comment thread scripts/ci/test-helm.sh Outdated
Comment thread scripts/ci/setup-helm-smoke-cluster.sh Outdated
Comment thread scripts/ci/test-helm.sh Outdated
Comment thread scripts/ci/test-helm.sh Outdated
Comment thread scripts/ci/test-helm.sh Outdated
@github-actions github-actions Bot added S-waiting-on-author PR is waiting on author response and removed S-waiting-on-review PR is waiting on a reviewer labels Jun 1, 2026

@hubcio hubcio left a comment

Contributor

a couple of findings that don't map to a line in this diff:

the helm smoke job didn't actually run on this PR. .github/config/components.yml gates the helm component on paths: helm/** only, and this PR only touches scripts/ci/*.sh, so detect-changes skipped the validate+smoke matrix - the green checks here are lint/shellcheck/license, none of them helm-named. so this whole rewrite (the envoy gateway install, the v1.5 crd apply, the port-forward, the curl loops) shipped without ci ever exercising it, and future edits to these smoke scripts will keep skipping too. adding scripts/ci/setup-helm-smoke-cluster.sh and scripts/ci/test-helm.sh to the helm component paths would gate them - rust-bench-dashboard already co-lists its scripts/dashboard/** the same way. this is the one i'd fix first, it's why everything else here went unexercised.

minor: HELM_SMOKE_GATEWAY_NAMESPACE and HELM_SMOKE_GATEWAY_NAME are defined with matching defaults in both scripts, and the HTTPRoute parentRef has to match the Gateway. overriding one of them in only one script silently breaks route attach - worth a note that they have to be set together.

@github-actions

github-actions Bot commented Jun 9, 2026

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs.

If you need a review, please ensure CI is green and the PR is rebased on the latest master. Don't hesitate to ping the maintainers - either @core on Discord or by mentioning them directly here on the PR.

Thank you for your contribution!

@github-actions github-actions Bot added the S-stale Inactive issue or pull request label Jun 9, 2026
@github-actions github-actions Bot removed the S-stale Inactive issue or pull request label Jun 14, 2026
The helm component path filter only matched helm/** so edits to the
smoke scripts skipped the validate+smoke matrix entirely. Add the two
scripts to the component paths so future changes here actually run.

While here, address review findings on the Gateway API rewrite:

- bump Envoy Gateway default to v1.8.1 (gateway API stays at the matching
  v1.5.1) so the kind k8s v1.35 / EG / gw-api triple lands inside the EG
  compat matrix
- drop the EnvoyProxy NodePort override and GatewayClass parametersRef;
  port-forward reaches Envoy via the apiserver ClusterIP tunnel so the
  default provider is enough
- split get_gateway_base_url into find_gateway_service + start_gateway_port_forward
  so the port-forward PID lands in the parent shell (the previous version
  set it inside a command-substitution subshell, making the cleanup kill
  a no-op and orphaning the tunnel on local reruns)
- install one trap cleanup_smoke_state EXIT covering both the port-forward
  PID and the temp values file so cleanup runs on every exit path
- retry the owning-gateway Service lookup 15x with stderr suppressed to
  tolerate the gap between Gateway Programmed and the Service appearing
- drop the redundant kubectl rollout status calls; helm upgrade --install
  --wait already blocks until both deployments are available
- add HELM_SMOKE_GATEWAY_PF_PORT env override for consistency
- fix the misleading "15 s" wait comment (actual worst case is ~45 s)
- document that HELM_SMOKE_GATEWAY_NAMESPACE and HELM_SMOKE_GATEWAY_NAME
  must be set together across both scripts

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
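
The subshell problem fixed by the split is easy to reproduce in isolation. In the sketch below, `sleep` stands in for `kubectl port-forward`, and the function names and `PF_PID` variable are hypothetical rather than the script's own.

```shell
#!/usr/bin/env bash
set -eu

PF_PID=""

# Anti-pattern: when called as url="$(start_in_subshell)", the assignment to
# PF_PID happens inside the command-substitution subshell and is lost.
start_in_subshell() {
  sleep 5 >/dev/null 2>&1 &
  PF_PID=$!
  echo "http://127.0.0.1:8080"
}

# Fix: call the function directly so PF_PID is set in the parent shell.
start_in_parent() {
  sleep 5 >/dev/null 2>&1 &
  PF_PID=$!
}

# Single cleanup hook; a no-op if no tunnel was started.
cleanup() {
  if [ -n "$PF_PID" ]; then
    kill "$PF_PID" 2>/dev/null || true
  fi
}
trap cleanup EXIT
```

After `url="$(start_in_subshell)"` the parent's `PF_PID` is still empty, so `cleanup` has nothing to kill and the background process outlives the script. After a plain `start_in_parent` call, `PF_PID` holds the real PID and the EXIT trap can stop it on every exit path.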
@codecov

codecov Bot commented Jun 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 74.68%. Comparing base (5fcf40c) to head (cc1f604).

Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3386      +/-   ##
============================================
- Coverage     74.72%   74.68%   -0.05%     
  Complexity      937      937              
============================================
  Files          1257     1257              
  Lines        124910   124910              
  Branches     100584   100629      +45     
============================================
- Hits          93343    93291      -52     
+ Misses        28570    28563       -7     
- Partials       2997     3056      +59     
Components   Coverage Δ
Rust Core    75.72% <ø> (-0.01%) ⬇️
Java SDK     58.57% <ø> (ø)
C# SDK       71.40% <ø> (-0.71%) ⬇️
Python SDK   88.88% <ø> (ø)
PHP SDK      84.29% <ø> (ø)
Node SDK     91.35% <ø> (ø)
Go SDK       40.36% <ø> (ø)
see 32 files with indirect coverage changes

avirajkhare00 and others added 2 commits June 15, 2026 07:42
The Validate third-party licenses job failed on a transient TLS reset
from crates.io while cargo metadata was downloading `spin` (OpenSSL
SSL_read: unexpected eof). No dependency or license content changed in
this PR; retriggering CI.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Envoy Gateway's default envoyService.type is LoadBalancer, which has no
controller on kind, so the backing Service stays Pending and the Gateway
never reaches Programmed. Restore the EnvoyProxy + GatewayClass
parametersRef but switch the service type from NodePort to ClusterIP -
port-forward tunnels through the apiserver regardless of service type
so no node port is needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
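
For illustration, the restored EnvoyProxy and GatewayClass pair might look like the manifests rendered below. The `render_gateway_class` helper and the resource names passed to it are hypothetical; the field layout follows the Envoy Gateway `EnvoyProxy` and Gateway API `GatewayClass` resources as I understand them.

```shell
#!/usr/bin/env bash
set -eu

# Render an EnvoyProxy that pins the data-plane Service to ClusterIP, plus a
# GatewayClass that points at it through parametersRef.
# Arguments: EnvoyProxy name, EnvoyProxy namespace, GatewayClass name.
render_gateway_class() {
  local proxy="$1" namespace="$2" class="$3"
  cat <<EOF
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: ${proxy}
  namespace: ${namespace}
spec:
  provider:
    type: Kubernetes
    kubernetes:
      envoyService:
        type: ClusterIP
---
apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: ${class}
spec:
  controllerName: gateway.envoyproxy.io/gatewayclass-controller
  parametersRef:
    group: gateway.envoyproxy.io
    kind: EnvoyProxy
    name: ${proxy}
    namespace: ${namespace}
EOF
}
```

With the Service type set to ClusterIP, nothing on kind has to allocate an external address, so the Gateway is no longer held back by a Pending LoadBalancer Service.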
@avirajkhare00

Contributor Author
  • gated helm smoke on scripts/ci/setup-helm-smoke-cluster.sh and scripts/ci/test-helm.sh in components.yml so this matrix actually runs on future script edits
  • bumped Envoy Gateway to v1.8.1 + Gateway API v1.5.1 (inside the EG compat matrix on kind k8s v1.35)
  • pinned EnvoyProxy envoyService to ClusterIP (default LB never gets an address on kind, so the Gateway never reached Programmed - caught by the now-running smoke job)
  • split get_gateway_base_url into find_gateway_service + start_gateway_port_forward so the port-forward PID lands in the parent shell; added a single trap cleanup_smoke_state EXIT covering the PID and the temp values file
  • find_gateway_service retries 15x with stderr suppressed for the transient missing-Service window after Gateway Programmed
  • dropped the redundant kubectl rollout status calls (helm --wait already blocks)
  • added HELM_SMOKE_GATEWAY_PF_PORT env override and fixed the misleading "15 s" comment (worst case ~45 s)
  • documented HELM_SMOKE_GATEWAY_NAMESPACE / HELM_SMOKE_GATEWAY_NAME coupling in both scripts
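
A minimal sketch of the quiet retry behind the Service lookup, assuming a generic helper rather than the script's actual `find_gateway_service`. `retry_quiet` is a hypothetical name.

```shell
#!/usr/bin/env bash
set -eu

# Run a command up to N times with a delay between attempts and print its
# stdout on the first attempt that succeeds with non-empty output.
# stderr from the command is discarded to keep CI logs quiet.
# Arguments: attempts, delay in seconds, then the command and its arguments.
retry_quiet() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1 out
  while [ "$i" -le "$attempts" ]; do
    if out="$("$@" 2>/dev/null)" && [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

A lookup could then be written as `retry_quiet 15 3 kubectl get service ... -o jsonpath='{.items[0].metadata.name}'`. The 3-second delay is an assumption chosen to line up with the ~45 s worst case mentioned above.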

@avirajkhare00

Contributor Author

/ready

@github-actions github-actions Bot added S-waiting-on-review PR is waiting on a reviewer and removed S-waiting-on-author PR is waiting on author response labels Jun 15, 2026
@hubcio hubcio merged commit 988194d into apache:master Jun 16, 2026
92 checks passed
@github-actions github-actions Bot removed the S-waiting-on-review PR is waiting on a reviewer label Jun 16, 2026

Development

Successfully merging this pull request may close these issues.

Helm smoke tests: move off retired ingress, use Gateway API

3 participants