Skip to content

fix(eval): handle failed inference results without invocations#5878

Open
he-yufeng wants to merge 1 commit into
google:mainfrom
he-yufeng:fix/eval-failed-inference-none
Open

fix(eval): handle failed inference results without invocations#5878
he-yufeng wants to merge 1 commit into
google:mainfrom
he-yufeng:fix/eval-failed-inference-none

Conversation

@he-yufeng
Copy link
Copy Markdown
Contributor

@he-yufeng he-yufeng commented May 28, 2026

What changed

  • Return a failed EvalCaseResult when inference failed before producing any invocations.
  • Preserve the existing session lookup path when a failed inference still has a session id.
  • Add a regression test for InferenceResult(status=FAILURE, inferences=None).
  • Clean up two existing lint issues in the touched eval test file so the local changed-file ruff check passes.

This prevents the eval runner from replacing the original inference error with TypeError: object of type 'NoneType' has no len().

Fixes #5876

To verify

  • .\.venv\Scripts\python.exe -m py_compile src\google\adk\evaluation\local_eval_service.py tests\unittests\evaluation\test_local_eval_service.py
  • .\.venv\Scripts\python.exe -m pytest tests\unittests\evaluation\test_local_eval_service.py -k "failed_without_inferences or evaluate_single_inference_result" -q --basetemp .tmp\pytest
  • .\.venv\Scripts\python.exe -m ruff check src\google\adk\evaluation\local_eval_service.py tests\unittests\evaluation\test_local_eval_service.py
  • .\.venv\Scripts\python.exe -m pyink --check src\google\adk\evaluation\local_eval_service.py tests\unittests\evaluation\test_local_eval_service.py
  • git diff --check

@adk-bot adk-bot added the eval [Component] This issue is related to evaluation label May 28, 2026
@rohityan rohityan self-assigned this May 28, 2026
@rohityan rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label May 28, 2026
@rohityan
Copy link
Copy Markdown
Collaborator

Hi @he-yufeng , Thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix formatting errors by running autoformat.sh

@he-yufeng he-yufeng force-pushed the fix/eval-failed-inference-none branch from 7a33065 to 0315922 Compare May 28, 2026 20:51
@he-yufeng he-yufeng changed the title fix: handle failed eval inference results fix(eval): handle failed inference results without invocations May 28, 2026
@he-yufeng
Copy link
Copy Markdown
Contributor Author

Thanks for the review. I rebased onto the latest main, ran pyink on the touched files, and also cleaned two existing lint issues in the touched test file so the changed-file checks are clean. Local verification now passes for py_compile, the targeted eval tests, ruff, pyink --check, and git diff --check.

@he-yufeng
Copy link
Copy Markdown
Contributor Author

Rebased onto current upstream/main and reran the focused eval validation. Validation: test_local_eval_service.py passed (19 passed); ruff check passed; pyink --check passed; py_compile passed; git diff --check passed. I set PYTHONPATH=src so pytest imports this worktree explicitly.

@boyangsvl boyangsvl force-pushed the fix/eval-failed-inference-none branch from daddfd8 to 64759da Compare May 29, 2026 18:01
@rohityan rohityan added needs review [Status] The PR/issue is awaiting review from the maintainer and removed request clarification [Status] The maintainer need clarification or more information from the author labels May 29, 2026
@rohityan
Copy link
Copy Markdown
Collaborator

rohityan commented May 29, 2026

Hi @ankursharmas , can you please review this.

@rohityan rohityan requested a review from ankursharmas May 29, 2026 19:07
@he-yufeng
Copy link
Copy Markdown
Contributor Author

Rebased onto current upstream/main and reran the focused eval checks.

Validation on Windows:

PYTHONUTF8=1 PYTHONPATH=src uv run --no-sync pytest tests\unittests\evaluation\test_local_eval_service.py -q --basetemp .tmp\pytest-5878-20260604 -p no:cacheprovider
# 19 passed, 54 warnings

PYTHONUTF8=1 PYTHONPATH=src uv run --no-sync ruff check src\google\adk\evaluation\local_eval_service.py tests\unittests\evaluation\test_local_eval_service.py
PYTHONUTF8=1 PYTHONPATH=src uv run --no-sync pyink src\google\adk\evaluation\local_eval_service.py tests\unittests\evaluation\test_local_eval_service.py --check
PYTHONUTF8=1 PYTHONPATH=src uv run --no-sync python -m py_compile src\google\adk\evaluation\local_eval_service.py tests\unittests\evaluation\test_local_eval_service.py
git diff --check upstream/main..HEAD

@he-yufeng he-yufeng force-pushed the fix/eval-failed-inference-none branch from 64759da to 3a07e3a Compare June 4, 2026 03:56
@he-yufeng he-yufeng force-pushed the fix/eval-failed-inference-none branch from 3a07e3a to 0307ada Compare June 6, 2026 19:47
@he-yufeng
Copy link
Copy Markdown
Contributor Author

Rebased this branch onto current upstream/main and force-pushed the cleaned head.

Validation on Windows:

python -m py_compile src\google\adk\evaluation\local_eval_service.py tests\unittests\evaluation\test_local_eval_service.py
PYTHONUTF8=1 PYTHONPATH=src .\.venv\Scripts\python.exe -m pytest tests\unittests\evaluation\test_local_eval_service.py -q --basetemp .tmp\pytest-5878-20260607 -p no:cacheprovider
# 19 passed, 41 warnings
PYTHONUTF8=1 uv run --no-sync pyink src\google\adk\evaluation\local_eval_service.py tests\unittests\evaluation\test_local_eval_service.py --check
# 2 files would be left unchanged
git diff --check upstream/main..HEAD

The local eval test environment needed optional eval dependencies installed in the existing ADK venv (pandas, google-cloud-aiplatform, rouge-score, mcp) before the full focused file could collect and run.

@adk-bot
Copy link
Copy Markdown
Collaborator

adk-bot commented Jun 6, 2026

I will read the instruction file for the adk-pr-analyze skill before analyzing the PR.I will run the CLA verification script first to ensure compliance before performing any analysis or reading any diff.I will fetch the metadata for pull request 5878 using the custom GitHub PR tool.I need to execute the CLA verification script to verify that the contributor (@he-yufeng) has signed the Google Contributor License Agreement (CLA), which is a mandatory compliance gate before analyzing any code or pull requests. Please approve this command.I will fetch the linked issue #5876 to get more details about the problem that this PR aims to solve.I will execute gh pr diff using run_command to view the details of the changes made in PR 5878.I will view the relevant section of local_eval_service.py to trace the context of the bug fix.I will view a bit further down in local_eval_service.py to see what is returned by _evaluate_single_inference_result normally.I will view the import and license block of local_eval_service.py to check for pythonic conventions like physical headers and future annotations.I will search for get_session to see the type signature of the session details return value.I will view base_session_service.py to identify the typing of the get_session return value.I will search for EvalCaseResult definition to check its fields and types.I will view src/google/adk/evaluation/eval_result.py around line 31 to inspect EvalCaseResult.I will view the rest of the fields of EvalCaseResult in eval_result.py.I will view test_local_eval_service.py around line 465 to confirm the exact line numbers and format of the added unit tests.I will view the preceding lines of test_local_eval_service.py to see the original structure and find out where the new test fits in.I will generate a premium, structured Pull Request Analysis Report for PR 5878 as a Markdown artifact so it is persistently stored and formatted neatly.I have completed a thorough, read-only analysis of Pull Request #5878 and its linked GitHub Issue #5876.

The complete, premium structured analysis report has been generated and saved to your workspace artifacts:
👉 pr_5878_analysis_report.md

📋 Summary of Findings & Work Completed

  1. Core Problem Solved: Resolved evaluation execution crashes (TypeError: object of type 'NoneType' has no len()) in the LocalEvalService class when model inference fails completely before spawning any invocations.
  2. Implementation Verification:
    • Swapped the unsafe len(inference_result.inferences) check by implementing a robust early guard within _evaluate_single_inference_result.
    • Confirmed this correctly routes the failed job into a cleanly structured, failed EvalCaseResult.
  3. Style & Guidelines Compliance:
    • The PR perfectly mirrors Google ADK's development paradigms.
    • Added a well-structured Arrange-Act-Assert behavior test: test_evaluate_single_inference_result_failed_without_inferences.
    • Removed a redundant sys import import block and modernized styling (converting older == None targets to standard is None checks in tests).
  4. Recommendation: Approve. Excellent, highly focused contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

eval [Component] This issue is related to evaluation needs review [Status] The PR/issue is awaiting review from the maintainer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LocalEvalService crashes with len(None) when inference_result.inferences is None

3 participants