fix(api-keys): reuse shared copy button for created keys #432
Soju06 merged 1 commit into Soju06:main from
Conversation
Soju06 left a comment
Clean refactor replacing the inline clipboard logic with the shared CopyButton component. Approving.
- Shared CopyButton already has error handling (toast.error("Failed to copy")) which the previous inline version was missing — this is the user-facing improvement behind #127.
- Removes 36 lines of duplicated component code and drops 2 lucide imports.
- Frontend-only, no collisions with recently merged #421.
- CI 18/18 green.
Merging into the v1.13.2 batch.
@codex review (Post-merge audit — missed the pre-merge review step. Triggering now; any findings will be addressed in a follow-up PR.)
Codex Review: Didn't find any major issues. Already looking forward to the next diff.
@all-contributors please add @stemirkhan for code, test (Contributions across #422, #425, #432 — all merged in v1.14.0. Picker UX, request-logs plan column, and the API-key CopyButton refactor that fixes #127.)
I've put up a pull request to add @stemirkhan! 🎉
* fix(proxy): prefer budget-safe routing and support image-generation compatibility ("code":"invalid_request_error","param":"tools") (#421)
* fix(proxy): prefer budget-safe responses routing
* test(proxy): align budget-safe stickiness expectations
* chore(proxy): clarify budget-safe routing naming
* fix(proxy): support responses image-generation compatibility
* fix(api-keys): reuse shared copy button for created keys (#432)
* feat(api-keys): show assigned account availability in picker (#422)
* feat(api-keys): show account availability in picker
* chore(openspec): record validation
* docs(pr): add APIs picker screenshot
* feat(dashboard): show account plan in request logs table (#425)
* feat(dashboard): show request log account plans
* chore(pr): add request logs screenshot
* fix(request-logs): persist plan type snapshots
* style: restore import grouping blank line (ruff I001)
* style: apply ruff format
---------
Co-authored-by: Crawfish (Soju06) <crawfish@openclaw.local>
* fix(proxy): prevent context blowup by trimming input on client-supplied previous_response_id (#448)
* fix(proxy): force store=true to enable server-side context persistence and input trimming
The proxy was enforcing store=false on all requests, which prevented OpenAI
from persisting conversations server-side. Without server-side state,
previous_response_id cannot reference stored conversations, forcing the CLI
to resend the entire conversation history (~250K tokens) on every API call.
This caused context compaction to trigger after just ~4 agentic steps:
4 calls × 250K = ~1M cumulative tokens → exceeds compaction threshold
Changes:
- Force store=true in ResponsesRequest, ResponsesCompactRequest, and
V1ResponsesRequest validators (overrides client's store=false)
- Track input_item_count per request in _WebSocketRequestState
- Track last_completed_input_count per session in _HTTPBridgeSession
- Trim already-stored input items when previous_response_id is available,
reducing per-request input from ~250K to ~5K tokens
Note: store=true only enables server-side API persistence for
previous_response_id chaining. It does NOT expose conversations in the
ChatGPT UI — API and ChatGPT are completely separate systems. The stored
data appears only in the developer dashboard logs (retained 30 days) and
is never used for model training.
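As a rough illustration of the trimming described in the Changes list above, here is a minimal sketch; last_completed_input_count comes from the commit wording, while the function name and shape are assumptions, not the project's actual helper:

```python
def trim_input_items(
    input_items: list[dict],
    previous_response_id: str | None,
    last_completed_input_count: int,
) -> list[dict]:
    """Drop input items the upstream already has stored.

    When a previous_response_id is available, the stored conversation already
    contains the first `last_completed_input_count` items, so only the new
    suffix needs to be resent.
    """
    if previous_response_id is None:
        return input_items  # no server-side state to lean on
    if last_completed_input_count <= 0:
        return input_items  # nothing recorded for this session yet
    if len(input_items) <= last_completed_input_count:
        return input_items  # not an append-only extension; resend everything
    return input_items[last_completed_input_count:]
```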
* fix(proxy): scope store override to Codex bridge and preserve trim counts
* style: align bridge fixes with ruff formatter
* fix(proxy): coerce store=true to false instead of rejecting, unblocking Codex CLI
* fix(proxy): remove force_store=true upstream — ChatGPT backend rejects store=true
The ChatGPT backend API explicitly requires store=false and returns
'Store must be set to false' when store=true is sent upstream.
Remove the force_store override that was injecting store=true into the
upstream payload. The validator coercion (silently returning false
instead of raising ValueError) remains in place to accept clients like
Codex CLI that send store=true in their request body.
* fix(proxy): trim input when client sends previous_response_id, not just proxy-injected
The trimming logic at line 648 only fired when the proxy itself injected
previous_response_id (fresh_reattach). When the Codex CLI sends its own
previous_response_id (which it does on every turn after the first), the
trimming was skipped entirely — causing the full conversation history to
be resent on every turn despite the upstream already having it.
This was the actual root cause of the context blowup: 49K tokens per
turn for a simple 'hi'/'hello' conversation instead of ~1-3K.
Fix: extend the trimming condition to also trigger when the client-
supplied previous_response_id is present, not just when the proxy
injected one.
* fix(proxy): guard trim against edited-prefix silent regression
Before this change, the HTTP bridge `store_context_input_trimmed` path
assumed the incoming `input` list was always an append-only extension of
whatever the session last saw. If a client re-sent the full history with
an in-place edit to an already-stored item (e.g. a corrected prior
user/tool message), the slice-based trim silently dropped the edited
prefix and forwarded only the suffix. Upstream would then answer against
its stale stored context and ignore the correction — a silent
correctness regression with no visible error.
Guard the trim with a SHA-256 fingerprint of the already-stored input
prefix:
- Compute a fingerprint of each request's full input list at
`_prepare_response_bridge_request_state` time and park it on
`_WebSocketRequestState`.
- When a response completes, promote the fingerprint onto the session
alongside `last_completed_input_count` so the session holds a stable
hash of the stored prefix.
- On the next trim decision, hash the incoming `input[:stored_count]`
and only trim when it matches the stored fingerprint byte-for-byte.
If it does not match, fall back to forwarding the full `input` and
emit a `store_context_input_trim_skipped_prefix_mismatch` warning so
the mismatch is observable.
The fingerprint uses canonical JSON (sorted keys, no whitespace) so
equivalent payloads produce the same hash regardless of upstream dict
ordering, and scales to long histories without pinning full items in
memory.
Added `test_stream_via_http_bridge_skips_trim_when_stored_prefix_was_edited`
to cover the regression. Updated the existing trim preservation test to
supply a matching fingerprint so the positive path still trims.
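A minimal sketch of the fingerprint guard outlined above, using canonical JSON (sorted keys, no whitespace) and SHA-256; the helper names here are illustrative rather than the project's:

```python
import hashlib
import json


def fingerprint_input_items(items: list) -> str:
    """Hash a list of input items via canonical JSON so equivalent payloads
    produce the same digest regardless of dict key order."""
    canonical = json.dumps(items, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def should_trim(incoming: list, stored_count: int, stored_fingerprint: str | None) -> bool:
    """Only trim when the already-stored prefix is byte-for-byte identical to
    what the session fingerprinted at the last completed response."""
    if stored_fingerprint is None or stored_count <= 0:
        return False
    if len(incoming) < stored_count:
        return False  # shorter history can never contain the stored prefix
    return fingerprint_input_items(incoming[:stored_count]) == stored_fingerprint
```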
* fix(proxy): satisfy ty narrowing for JsonValue input list
ty doesn't eliminate the Mapping intersection type from JsonValue even
after isinstance(..., list), so pass explicit list[JsonValue] casts to
_fingerprint_input_items and the trim slice.
* fix(proxy): record original full-input fingerprint after trim
The trim path calls _prepare_http_bridge_request with trimmed_payload,
which sets request_state.input_full_fingerprint to the hash of the
trimmed suffix. response.completed then promotes that suffix hash onto
the session as last_completed_input_prefix_fingerprint. On the next
turn, the trim check compares hash(input[:stored_count]) — built from
the client's full input — against a suffix hash that will never match,
silently disabling trimming for the rest of the session and letting
long-session context growth return.
Override the fingerprint back to the ORIGINAL full input hash right
after input_item_count is restored, so the session always holds the
prefix fingerprint the next turn's check expects.
Guard the existing trim preservation test with an explicit assertion
that request_state.input_full_fingerprint matches the hash of the full
3-item input (not the 1-item suffix).
---------
Co-authored-by: Crawfish (via Soju06) <crawfish@openclaw.local>
* chore(main): release 1.14.0 (#447)
* docs: add stemirkhan as a contributor for code, and test (#452)
* docs: update README.md
* docs: update .all-contributorsrc
---------
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
* fix(bootstrap): log first-run token at WARNING, not INFO (#459)
Operators following the README quickstart expect the auto-generated
bootstrap token to appear in `docker logs codex-lb`, but the default
docker/uvicorn runtime resolves the root logger (and therefore
`app.core.bootstrap`) to WARNING. `log_bootstrap_token` was emitting
at INFO, so the one-time token line was silently dropped. The token is
still generated and persisted (`bootstrapTokenConfigured: true`), so
the dashboard stays gated behind the bootstrap screen with no way for
the operator to recover the token short of reading the DB.
Promote the token message to WARNING. It's a one-time startup artifact
the operator genuinely needs to see regardless of the configured log
level, and WARNING is visible in every default logging setup we ship.
Added a regression test that attaches a WARNING-level handler to a
namespaced logger and asserts the token line reaches it, so any future
downgrade to INFO fails CI loudly.
Closes #458.
Co-authored-by: Crawfish (Soju06) <crawfish@openclaw.local>
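A minimal sketch of the level change and the regression test described above, assuming a namespaced app.core.bootstrap logger and pytest's caplog fixture; the message wording is invented:

```python
import logging

logger = logging.getLogger("app.core.bootstrap")


def log_bootstrap_token(token: str) -> None:
    # WARNING, not INFO: the default docker/uvicorn runtime resolves the
    # root logger to WARNING, so an INFO record would be silently dropped.
    logger.warning("First-run bootstrap token: %s", token)


def test_bootstrap_token_reaches_warning_handler(caplog):
    with caplog.at_level(logging.WARNING, logger="app.core.bootstrap"):
        log_bootstrap_token("tok-123")
    # Any future downgrade to INFO makes this assertion fail loudly.
    assert "tok-123" in caplog.text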
* test(proxy): lock previous_response_id retries to reconnect-only recovery (#460)
Co-authored-by: Ruben Beuker <rubenbeuker@MacBook-Air-van-Ruben.local>
* fix(proxy): harden continuity recovery, safe WS replay, and shutdown/restart bridge lifecycle (#415)
* fix(proxy): narrow previous_response recovery to not_found semantics and add regression tests
* fix(ws): transparently replay pre-created responses on quota/rate-limit WS errors
* fix(proxy): harden shutdown and reconnect lifecycle
* test(proxy): fix typing in bridge shutdown regression coverage
* style: apply ruff formatting for bridge continuity changes
* fix(proxy): preserve scoped previous-response ownership across bridge and retry
* test(proxy): add regression coverage for bridged previous-response reconnect-only behavior
* fix(proxy): harden continuity fail-closed flows
* fix(proxy): resolve ty diagnostics in continuity tests
* fix(proxy): persist non-bridge continuity anchors
* fix(proxy): mask previous_response_not_found without breaking inflight response routing
* style(proxy): fix ruff line length
* style(proxy): format service.py with ruff
* fix(proxy): harden previous_response anchor matching for multiplexed follow-ups
* fix(proxy): fail-closed previous_response_not_found and keep WS/HTTP bridge run continuity
* fix(db): linearize request_logs migration chain after main merge
* fix(db): add alembic merge revision for request_logs heads
* test(proxy): make owner-lookup reservation-release regression test ty-compatible
* chore(main): release 1.14.1 (#453)
* fix(proxy): inject session-level previous_response_id to enable input trimming for all clients (#456)
* fix(proxy): inject session-level previous_response_id to enable input trimming for all clients
Codex CLI and other clients that don't send previous_response_id in their
request payload bypassed the input trimming logic entirely, causing input
tokens to grow monotonically within a bridge session (observed 31K → 201K
tokens across 15 requests in a single session).
The durable lookup injection only fires when a canonical session key
exists, which requires a prior successful lookup. For the common case of
a new session that stays on the same bridge WebSocket, neither the client
nor the durable layer provides a previous_response_id, so trimming never
activates.
Fix: after a response.completed event, record the response ID on the
bridge session. On the next request through the same session, if no
previous_response_id is available from the client or durable lookup,
inject the session's last_completed_response_id. This is reliable because
the upstream conversation state is guaranteed to contain this response —
it was produced on the very WebSocket connection we are about to use.
If the WebSocket reconnects, a new session is created with
last_completed_response_id=None, so no stale injection occurs.
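A minimal sketch of the session-anchor fallback described above; the class and function names are illustrative stand-ins for the proxy's real state objects:

```python
from dataclasses import dataclass


@dataclass
class BridgeSession:
    # Illustrative stand-in for the per-WebSocket bridge session state.
    last_completed_response_id: str | None = None


def resolve_previous_response_id(
    session: BridgeSession,
    client_supplied: str | None,
    durable_lookup: str | None,
) -> str | None:
    """Prefer the client's anchor, then the durable lookup, then fall back to
    the last response completed on this same WebSocket session."""
    if client_supplied is not None:
        return client_supplied
    if durable_lookup is not None:
        return durable_lookup
    return session.last_completed_response_id


def on_response_completed(session: BridgeSession, response_id: str) -> None:
    # A reconnect creates a fresh session (field back to None), so stale ids
    # from a previous connection are never injected.
    session.last_completed_response_id = response_id
```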
* fix(proxy): scope session anchor injection to codex continuity
* fix(proxy): preserve fresh-upstream retries for injected anchors
* test(proxy): align durable-anchor expectations with full-resend guard
* fix(proxy): guard session-level anchor injection against non-trimmable payloads
Codex review flagged two P1/P2 issues on the session-level
previous_response_id injection path added in this PR:
1. (P1) The injection fired for every codex-session follow-up as soon as
the session had a completed response, with no check that the trim
branch downstream would actually strip the already-stored prefix.
For full-resend payloads that cannot be trimmed (non-list input,
shorter history, or prefix fingerprint mismatch), that meant the
upstream call received both the full history and an injected
previous_response_id, duplicating context and distorting output/cost.
Gate the injection on the same trimmability check the trim branch
performs so the anchor is only attached when the prefix would also
be stripped.
2. (P2) `session.last_completed_response_id` was being updated only
inside the `input_item_count > 0` branch, so string-input turns
never refreshed the field. That weakened continuity in mixed-input
sessions because later injections could reuse a stale id or skip
injection entirely. Move the response id update out of the input
count branch so every completed turn refreshes the session anchor.
Added a regression unit test that exercises a codex continuity session
with a completed response and a non-list (string) input, asserting that
`_prepare_http_bridge_request` still sees `previous_response_id=None`
(no anchor injected) so upstream does not end up with duplicated
context.
* fix(proxy): restrict fresh-upstream replay to retry-safe injections
Codex review flagged that the fresh-upstream retry helper was willing
to drop *any* proxy-injected previous_response_id and replay
`fresh_upstream_request_text` as a fresh turn. That is safe for
durable-anchor injections (which capture the original full-resend
payload before injecting the anchor) but unsafe for the session-level
anchor injection added in this PR: the unanchored payload there may
have relied on the anchor for context preservation (for example a
single-item follow-up whose prior turns live only in the stored
conversation), so replaying without it silently turns a continuation
into a context-free fresh turn.
Introduce `_WebSocketRequestState.fresh_upstream_request_is_retry_safe`
to mark which injection paths produced a replay-safe captured payload:
- Durable-anchor injection on the reattach path -> True
- Trim-branch re-preparation that preserves the trim-safe full-resend
payload -> True
- Session-level anchor injection on codex continuity sessions -> False
`_retry_http_bridge_request_on_fresh_upstream` now requires this flag
in addition to `fresh_upstream_request_text` before dropping the
anchor, so session-level injections surface the original send failure
as a retriable error instead of executing as a fresh turn.
Added two regression unit tests:
- retry helper replays when fresh_upstream_request_is_retry_safe=True
- retry helper refuses to replay when the flag is False, leaving the
original send failure to propagate
Also updated the existing bridge-context-blowup regression that asserts
the durable-injection replay path to set the new flag explicitly.
* fix(proxy): only mark trim-verified payloads as fresh-upstream retry-safe
Codex review flagged that the previous commit marked durable-anchor
injection as fresh-upstream retry-safe, but durable injection actually
only fires when the incoming payload is *not* a full resend (the
`not _http_bridge_payload_looks_like_full_resend(payload)` guard on
the reattach branch). The captured unanchored text for that path is
typically a short single-item follow-up whose context lived only in
the injected `previous_response_id`, so replaying it as a fresh turn
would silently strip conversation context and return wrong-but-
successful output.
The only execution path that has actually verified the unanchored
payload contains a full resend is the trim branch, which checks the
stored prefix fingerprint against the incoming input before stripping
it. So narrow fresh-turn replay eligibility to that branch only:
- Durable-anchor injection on reattach -> False
- Session-level anchor injection on codex continuity -> False
- Trim branch re-preparation after a successful prefix match -> True
Retry semantics are unchanged from the caller's perspective: unsafe
injections still surface send failures as retriable errors instead of
replaying as fresh turns.
---------
Co-authored-by: Crawfish (via Soju06) <crawfish@openclaw.local>
* fix(proxy): prevent admission semaphore leak and raise concurrency limits (#466)
AdmissionLease slots were permanently lost when asyncio.CancelledError
bypassed manual release() calls, causing proxy_overloaded errors that
never recovered without a restart.
- Add __enter__/__exit__ context manager to AdmissionLease for scoped safety
- Add __del__ safety net that releases leaked semaphores with a warning
- Catch BaseException (not just Exception) in HTTP bridge prewarm cleanup
so CancelledError also triggers _cleanup_http_bridge_submit_interruption
- Raise default concurrency limits to production-realistic values:
bulkhead_proxy_limit: 200 -> 512
proxy_token_refresh_limit: 32 -> 64
proxy_upstream_websocket_connect_limit: 64 -> 128
proxy_response_create_limit: 64 -> 256
proxy_compact_response_create_limit: 16 -> 64
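A rough sketch of the scoped-lease pattern the first two bullets describe, including the __del__ safety net; this is an assumption-level illustration, not the project's AdmissionLease:

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class AdmissionLease:
    """Holds one slot on an admission semaphore and guarantees it is returned."""

    def __init__(self, semaphore: asyncio.Semaphore) -> None:
        self._semaphore = semaphore
        self._released = False

    def release(self) -> None:
        if not self._released:
            self._released = True
            self._semaphore.release()

    # Context-manager form: the slot is returned even when the body is
    # interrupted by asyncio.CancelledError (a BaseException).
    def __enter__(self) -> "AdmissionLease":
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.release()

    def __del__(self) -> None:
        # Safety net: a leaked lease still frees its slot, loudly.
        if not self._released:
            logger.warning("AdmissionLease leaked without release(); freeing slot")
            self.release()
```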
* feat(proxy): add GPT-5.5 and GPT-5.5 Pro model support (#477)
Add pricing entries, alias patterns, and websocket preference for
the new GPT-5.5 family. Pricing from the official announcement:
$5/$30 per 1M tokens (standard), batch/flex at half rate, priority
at 2.5x. GPT-5.5 Pro at $30/$180.
* chore(main): release 1.15.0 (#467)
* fix(proxy): load balancer filter (#485)
* fix(proxy): filter paused accounts from selection inputs
* test(proxy): cover paused sticky account filtering
* feat(proxy): make upstream response.create max bytes configurable via env var (#476)
The 15 MiB websocket payload limit was hardcoded in two modules. Add
CODEX_LB_UPSTREAM_RESPONSE_CREATE_MAX_BYTES to Settings so operators
can raise the ceiling without patching source. The warn threshold
scales automatically to 80% of the configured max.
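A hedged sketch of how such an env-backed ceiling is commonly wired with pydantic-settings; the env variable name and the 80% warn threshold follow the commit message, everything else is assumed:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="CODEX_LB_")

    # CODEX_LB_UPSTREAM_RESPONSE_CREATE_MAX_BYTES overrides the 15 MiB default.
    upstream_response_create_max_bytes: int = 15 * 1024 * 1024

    @property
    def upstream_response_create_warn_bytes(self) -> int:
        # Warn threshold scales automatically to 80% of the configured max.
        return int(self.upstream_response_create_max_bytes * 0.8)
```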
* fix(oauth): make manual callback idempotent (#481)
* fix(oauth): make manual callback idempotent
* fix(oauth): require state match for manual-callback idempotency
Apply codex review feedback (P2) on #481: returning success solely from
status=="success" allowed stale callback URLs from a different OAuth
attempt to bypass state/code validation. Restrict idempotent return to
the same attempt by also requiring the incoming state to match the
currently-stored state token.
Add regression tests:
- test_manual_callback_is_idempotent_for_same_attempt: re-submitting the
same callback URL for the same attempt does not re-exchange the code.
- test_manual_callback_after_success_rejects_stale_callback: a stale URL
(different state) arriving after success is rejected with the existing
state-mismatch error rather than masked as success.
Tests: tests/integration/test_oauth_flow.py 9 passed.
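A compact sketch of the idempotency rule after the review fix, with illustrative types; only a state-matching success short-circuits, and everything else still goes through state/code validation:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OAuthAttempt:
    # Illustrative stand-in for the stored OAuth attempt.
    state: str
    status: str  # "pending" or "success"


def handle_manual_callback(
    attempt: OAuthAttempt,
    incoming_state: str,
    incoming_code: str,
    exchange: Callable[[str], dict],
) -> dict:
    # Idempotent return only for the *same* attempt: success status alone is
    # not enough, the incoming state must match the stored state token.
    if attempt.status == "success" and incoming_state == attempt.state:
        return {"status": "success", "reused": True}
    if incoming_state != attempt.state:
        raise ValueError("state mismatch")  # stale callback from another attempt
    return exchange(incoming_code)
```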
* style(oauth): ruff format
---------
Co-authored-by: Soju06 <qlskssk@gmail.com>
* fix(proxy): map unsupported reasoning effort 'minimal' to a supported value (#494)
* fix(proxy): map unsupported reasoning effort 'minimal' to 'low'
The OpenAI Responses API accepts reasoning.effort='minimal' for the GPT-5
family, but the upstream ChatGPT/Codex WebSocket backend codex-lb proxies
to silently drops the field: the stream emits 'codex.rate_limits' and
then never produces 'response.completed', leaving the client to time out.
Codex CLI's '--reasoning-effort minimal' and any other client that picks
'minimal' for latency reasons therefore hangs against codex-lb today.
Normalize 'minimal' to a value the resolved model advertises in its
'supported_reasoning_levels' (lowest, defaulting to 'low' when the
registry has no metadata yet) inside apply_api_key_enforcement, with an
info log so operators can see the rewrite. This keeps schema/key-level
enforcement permissive while preventing the upstream hang.
Closes #493
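A minimal sketch of the normalization described above; the ordering assumption on supported_reasoning_levels and the function name are mine, not the project's:

```python
def normalize_reasoning_effort(effort: str | None, supported_levels: list[str] | None) -> str | None:
    """Map the unsupported 'minimal' effort onto something the resolved model
    advertises, defaulting to 'low' when the registry has no metadata yet."""
    if effort != "minimal":
        return effort
    if supported_levels:
        return supported_levels[0]  # assumes levels are ordered lowest-first
    return "low"
```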
* style: ruff format
---------
Co-authored-by: craw <craw@openclaw.local>
* docs: add rio-jeong as a contributor for code, bug, and test (#492)
* docs: update README.md
* docs: update .all-contributorsrc
---------
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
* fix(proxy): pre-validate strict JSON schemas to surface invalid_json_schema (#491) (#495)
* fix(proxy): pre-validate strict JSON schemas to surface invalid_json_schema (#491)
When clients send a strict-mode JSON schema (response_format /
text.format with strict=true) that violates OpenAI's structured-outputs
rules (missing additionalProperties:false on an object node, missing
type, etc.), the Codex backend rejects the request and closes the
websocket session with close_code=1000 — sending the original
invalid_json_schema detail in a response.failed event that the proxy
currently overwrites with a generic stream_incomplete 502. The bridge
also tries to reconnect/reattach the same permanently-invalid request,
wasting another upstream hit.
Validate strict schemas locally before any upstream connection is opened
so /v1/responses and /v1/chat/completions both return a deterministic
400 with the exact OpenAI invalid_json_schema error envelope, and no
retry/reconnect loop is triggered.
The validator only enforces strict=true constraints; schemas with strict
omitted or strict=false are passed through unchanged so the upstream API
keeps owning that policy.
Refs: #491
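A simplified sketch of the kind of strict-mode pre-check described above, covering only the two rules called out in this PR (additionalProperties:false and full required coverage); the commit mentions further rules (missing type, etc.) that this sketch omits:

```python
def validate_strict_object_schema(schema: dict, path: str = "schema") -> list[str]:
    """Collect strict structured-outputs violations for object nodes: every
    object must set additionalProperties to false and list every property
    under 'required'. Recurses into nested property schemas."""
    problems: list[str] = []
    if schema.get("type") == "object":
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        properties = schema.get("properties", {})
        required = set(schema.get("required", []))
        missing = set(properties) - required
        if missing:
            problems.append(f"{path}: properties missing from 'required': {sorted(missing)}")
        for name, sub in properties.items():
            if isinstance(sub, dict):
                problems.extend(validate_strict_object_schema(sub, f"{path}.{name}"))
    return problems
```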
* fix(strict-schema): widen helper signatures to Mapping for ty type-check
* fix(strict-schema): require every property to be listed in 'required'
Address codex-bot P1 review on #495: strict mode rejects schemas where
'properties' contains keys missing from 'required' (e.g. {'required': []}
on a non-empty 'properties'). Without this, schemas that only fail this
specific rule still escape the local pre-check and fall back to the
upstream stream_incomplete path the PR was intended to remove.
The diagnostic mirrors the upstream OpenAI API message verbatim so the
reporter sees the same body whether the failure happens locally or on
api.openai.com directly.
---------
Co-authored-by: crawfish <qlskssk+crawfish@gmail.com>
* docs: add tobwen as a contributor for code, test, and bug (#489)
* docs: update README.md
* docs: update .all-contributorsrc
---------
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Co-authored-by: Soju06 <qlskssk@gmail.com>
* fix(api-limit): Add fallback for api limit reset (#475)
* fix(api-limit): fall back to an hourly limit reset while preserving the lazy reset logic
* add review changes
* fix ruff
* small ui change: remove the clock icon and use a cleaner time selector
* update time format block so it is not rendered as a separate block
* feat(proxy): add OpenAI-compatible /v1/images API (gpt-image-2 via image_generation tool) (#498)
* feat(proxy): add OpenAI-compatible /v1/images API (gpt-image-2 via image_generation tool)
Expose POST /v1/images/generations and POST /v1/images/edits as a thin OpenAI
Images API compatibility layer. Both endpoints translate to internal
/v1/responses requests with the built-in image_generation tool, so the
existing ChatGPT account pool, sticky sessions, auth, and usage all keep
working without a separate ChatGPT-token -> openai-api-key exchange path.
Highlights:
- Public model gating: only the gpt-image-* family is accepted; gpt-image-2
is the default and uses the constrained quality/size/background matrix
(16-multiple sizes, max edge 3840 px, 3:1 ratio cap, 655_360..8_294_400
pixel envelope; rejects input_fidelity and background=transparent).
- Legacy gpt-image-1.5 / gpt-image-1 / gpt-image-1-mini accept the fixed
size set (1024x1024, 1536x1024, 1024x1536, auto) and allow input_fidelity
only on /v1/images/edits.
- /v1/images/variations is exposed but always returns 404 with a
not_found_error envelope; codex CLI does not call variations and the
ChatGPT Responses backend does not expose a tool path that maps cleanly.
- Streaming surfaces canonical OpenAI Images SSE events:
image_generation.partial_image (b64_json + partial_image_index + size /
quality / background / output_format) and image_generation.completed
(b64_json + revised_prompt + size / quality / background /
output_format + usage), followed by data: [DONE]. Internal
Responses-shape events (response.created, reasoning, content_part,
output_text, image_generation_call.{in_progress,generating},
codex.rate_limits, etc.) are intentionally dropped.
- Internal Responses request always sets stream=True regardless of the
public client's stream flag because the upstream image_generation tool
rejects non-streaming requests; non-streaming public clients get the
drained JSON envelope with usage attached from tool_usage.image_gen.
- The image_generation tool config does not accept 'n', so n is enforced
at the Images-API layer via images_max_n (default 4) instead of being
forwarded into the tool config.
- New settings: images_host_model (default gpt-5.5, never echoed to
clients), images_default_model (default gpt-image-2),
images_max_partial_images (default 3), images_max_n (default 4).
OpenSpec change add-images-api-compat documents the proposal, tasks, and
spec scenarios. Verified live against the dev server with both
non-streaming and streaming requests on /v1/images/generations and
/v1/images/edits, plus rejection paths for unsupported models, sizes,
backgrounds, and the variations endpoint.
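To make the translation concrete, here is a hedged sketch of the public-to-internal mapping; the payload shape is inferred from the bullets above and from the Responses API, not copied from the adapter, so field names may differ:

```python
def images_generation_to_responses_request(
    prompt: str,
    host_model: str = "gpt-5.5",
    size: str = "1024x1024",
    quality: str = "high",
    partial_images: int = 3,
) -> dict:
    """Rough shape of the translation: a public /v1/images/generations call
    becomes an internal /v1/responses payload invoking the built-in
    image_generation tool."""
    return {
        "model": host_model,  # internal host model, never echoed to clients
        "stream": True,       # the upstream image_generation tool rejects non-streaming requests
        "input": [
            {"role": "user", "content": [{"type": "input_text", "text": prompt}]},
        ],
        "tools": [
            {
                "type": "image_generation",
                "size": size,
                "quality": quality,
                "partial_images": partial_images,
            },
        ],
        # A later commit in this PR pins tool_choice to the image tool;
        # the initial version used "auto".
        "tool_choice": {"type": "image_generation"},
    }
```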
* fix(proxy): satisfy ruff format + ty check for /v1/images adapter
CI on PR #498 flagged two checks that were not caught locally:
- ruff format --check reformatted app/modules/proxy/images_service.py
(a couple of long-line continuations that auto-format trims).
- uv run ty check rejected several places in the adapter and tests for
using JsonValue-typed values without narrowing.
Adapter fixes:
- images_response_from_responses: cast the upstream output field to
list[JsonValue] only after isinstance-narrowing so
_select_image_items keeps its declared parameter type.
- _build_error_event: build the SSE event by inserting each entry from
the OpenAI error envelope individually, so the declared return type
dict[str, JsonValue] matches what the spread literal would otherwise
produce as a wider dict[str, str | int | float | ...].
- _proxy_images_generation_request / _proxy_images_edit_request in
api.py: dispatch on isinstance(images_result, V1ImageResponse)
instead of isinstance(..., dict) so ty can narrow the union to a
pydantic model with model_dump and to the OpenAIErrorEnvelope (a
TypedDict) on the error branch.
Test fixes:
- tests/unit/test_images_translation.py and
tests/integration/test_proxy_images.py: introduce small cast(...)
helpers (_tool, _input_msg, _content_list,
_image_response, _as_mapping) so that ty can chain-subscript
request payload values stored as JsonValue /
Mapping[str, JsonValue]. The cast pattern mirrors what other tests
in the repo already do (see test_chat_request_mapping and
test_openai_requests). Behaviour is unchanged; this is purely a
type-narrowing change for the uv run ty check step.
Local re-runs:
- uvx ruff format --check . — 424 files clean
- uv run ty check — 0 new diagnostics on this branch
- pytest tests/unit -q — 1433 passed, 3 skipped
* feat(proxy): record public gpt-image-* model in /v1/images request logs
Previously every /v1/images/* call wrote its request log row with the
internal host Responses model (e.g. gpt-5.5) because the adapter routes
through stream_responses, which records ResponsesRequest.model. That
made the dashboard and usage views show gpt-5.5 instead of the
user-visible gpt-image-2 the client actually sent.
We now correlate the upstream Responses id with the public effective
model and rewrite the request_logs row once the stream completes.
Adapter changes:
- translate_responses_stream_to_images_stream and
collect_responses_stream_for_images accept an optional
captured: dict[str, str] and store the upstream Responses id under
the response_id key the first time it appears on any event.
- collect_responses_stream_for_images no longer breaks on the
response.completed event; it now keeps draining the upstream
stream after capturing final_response so stream_responses
finalizes (and writes its request log) before we return control to
the route handler.
- New ProxyService.rewrite_request_log_model(request_id, model)
helper updates the request_logs row for the given request_id to
the publicly-requested gpt-image-* value. It retries with short
backoff while the row is still missing, because the upstream stream
generator writes its log row from a finally block that may run
after the route handler regains control on the streaming path.
- New RequestLogsRepository.update_model_for_request(request_id,
model) performs the actual update and returns the affected row
count so the helper can stop retrying once the row is present.
- Both /v1/images/generations and /v1/images/edits route
handlers now pass the captured response_id back to
rewrite_request_log_model after the upstream stream finishes
(drain for non-streaming clients, after the public stream has yielded
its terminal event for streaming clients).
Spec change:
- The 'Image routes participate in usage accounting and policy'
requirement now documents that the request_log model column carries
the public gpt-image-* value rather than the host model, with a new
Scenario covering the rewrite.
New tests:
- test_captured_response_id_populated_during_collect — covers
collect_responses_stream_for_images capturing the response id.
- test_translate_populates_captured_response_id — covers the streaming
translator capturing the response id alongside the canonical
image_generation.* events.
Live verification on dev: both /v1/images/generations (non-stream and
stream) now write request_logs.model = 'gpt-image-2' instead of
'gpt-5.5'.
* fix(proxy): address codex review findings on /v1/images adapter
Codex review on PR #498 flagged four concerns (one P1 that boiled down
to two distinct issues, plus one P2). This commit addresses all of
them.
P1.1 - n parameter is silently dropped (images_service.py)
The image_generation tool config does not accept n, and we simply dropped
the field (del n) without honouring the requested count. Multi-image fan-out
is not implemented yet, so we now reject n > 1 at the API
boundary with a clear OpenAI error envelope (param: n) and
default images_max_n to 1. Operators can raise the cap once
fan-out lands.
P1.2 - multi-image stream silently dropped earlier completions
translate_responses_stream_to_images_stream buffered only a
single pending_completed_event, so each new
response.output_item.done overwrote the previous one. The
buffer is now an ordered list and every completed image is
emitted in arrival order; only the *last* completion carries
usage to match the canonical OpenAI Images streaming shape.
P1.3 - StreamingResponse could be returned with a half-open upstream
stream_responses(propagate_http_errors=True) can raise
ProxyResponseError before yielding any chunk (exhausted retries,
upstream 5xx). The previous code wrapped the iterator into
StreamingResponse first, so those errors leaked as a broken or
truncated SSE body. New helper _prime_upstream_stream pulls the
first chunk eagerly, surfaces ProxyResponseError as a
structured OpenAI error envelope, and replays the captured chunk
through the rest of the iterator on success. Both
/v1/images/generations and /v1/images/edits route handlers
use it before constructing their downstream stream / collect call.
P2 - model is mandatory and images_default_model was unused
V1ImagesGenerationsRequest and V1ImagesEditsForm now make
model optional. New resolve_public_image_model helper
validates the resolved value (so a misconfigured default is caught
early) and returns the publicly-effective gpt-image-* value.
validate_generations_payload and validate_edits_payload now
return the payload with model populated to the resolved value
so downstream code can keep treating payload.model as a
concrete string.
Spec updates:
- Multi-image requests are rejected until upstream support
arrives scenario covers the new n > 1 rejection.
- Missing model defaults to images_default_model scenario covers
the optional model field.
- Image generation streaming uses canonical OpenAI Images events
requirement now states that every completed image is emitted in
order with usage attached only to the final event, and that
pre-first-chunk upstream errors must be surfaced as structured
envelopes.
Tests:
- test_images_generations_falls_back_to_default_model_when_omitted
- test_images_generations_rejects_n_greater_than_one
- test_images_generations_propagates_upstream_error_before_first_chunk
- test_multiple_completed_image_items_are_all_emitted
Local verification:
- pytest tests/unit -q - 1436 passed
- uvx ruff format --check . - 424 files clean
- uv run ty check - 0 new diagnostics
- Live empirical against dev confirms model omission falls back to
gpt-image-2, n=2 returns 400, streaming preserves usage on the
terminal event.
* docs(proxy): clarify n and multi-image comments after codex re-review
Codex re-review on 390e8ce flagged the same two P1 sites it raised on
the first round (lines 105 and 628 of images_service.py), even though
both findings were already addressed by:
- defaulting images_max_n to 1 and rejecting n > images_max_n at the
API boundary so the tool builder never sees n > 1 (P1.1), and
- buffering every completed image_generation_call in an ordered list
so multi-image responses are emitted in arrival order with usage
attached only to the final event (P1.2).
The bot only sees the lines themselves, not the new flow that
guarantees they are safe. Reword the inline comments so the safety
invariants are obvious in the diff and to future reviewers:
- The 'del n' line now explicitly says it is rejected upstream and
documents the contract operators must follow if they raise the cap.
- The 'pending_completed_events.append(event)' line now explicitly
contrasts with the previous overwrite-on-update bug.
No behaviour changes.
* fix(proxy): address codex re-review on /v1/images adapter
Two new findings on c0584e2 - both are correct.
P1 (api.py:624) - /v1/images/edits model was still mandatory at the
form-binding layer
v1_images_edits declared model: str = Form(...) so FastAPI
rejected requests that omitted model with 422 *before*
validate_edits_payload could fall back to
settings.images_default_model. The schema-level optional /
default-resolution from the previous round therefore never ran on
the multipart route. Make the form binding optional
(model: str | None = Form(None)) so the default resolution
actually fires. Verified live: POST /v1/images/edits with no
model field returns 200 with the default gpt-image-2 response.
P2 (images_service.py:764) -
collect_responses_stream_for_images accepted
response.incomplete as if it were response.completed, so a
half-finished upstream response could still produce a 200 image
envelope. The streaming translator already surfaces
response.incomplete as an error event; non-streaming collect now
matches by emitting an image_generation_failed envelope and
closing the drain loop, keeping the two paths consistent.
New tests:
- test_images_edits_falls_back_to_default_model_when_omitted -
multipart edits without model form field reach
validate_edits_payload and pick up the configured default.
- test_response_incomplete_returns_error_envelope -
collect_responses_stream_for_images returns an error envelope
on response.incomplete.
* fix(proxy): harden /v1/images n-cap and request-log rewrite
Codex re-review on ca050c0 surfaced two more correct findings.
P1 (images_service.py:105) -
validate_image_request_parameters only rejected n > images_max_n,
so an operator override of images_max_n > 1 would silently fall
through to the tool builder which discards n, returning fewer
images than requested. The cap is now hard-coded at n == 1
regardless of images_max_n (kept on the helper signature for
forward compatibility but no longer consulted) until client-side
fan-out lands. _build_image_generation_tool also asserts
n == 1 as a defence-in-depth check so a future regression in the
validator cannot reintroduce the silent-drop bug.
P2 (api.py:854) -
The streaming request-log model rewrite ran in the tail of the SSE
generator after the client finished consuming the body. Early client
disconnect (cancelled SSE response) skipped the rewrite, leaving
request_logs.model pinned to the internal host model. The
rewrite is now in a finally block on the inner async generator,
so it runs on both clean completion and cancellation. Verified
live: aborting the curl client at 3s still wrote the row with
model='gpt-image-2'.
Tests:
- test_n_greater_than_one_is_rejected_even_when_images_max_n_is_higher
pins the hard cap so a future configuration change cannot silently
re-enable the silent-drop path.
- Existing test_n_bounds parametrise tightened from
[(1, True), (4, True), (0, False), (5, False)] to
[(1, True), (0, False), (2, False), (5, False)] to reflect the
hard cap.
- test_stream_with_partial_images_passes_through now uses n=1
to match the new contract.
* fix(proxy): drop images_max_n setting and ignore late upstream errors
Two more correct findings on 21735d7.
P2 (images.py:178) - images_max_n is configured but never honored
Operators who set images_max_n > 1 would still get 400
param: n because the validator hard-rejects n > 1 regardless
of the cap. The setting was therefore both inert and misleading.
Drop images_max_n from settings and from
validate_image_request_parameters entirely so the runtime
surface no longer claims to be configurable. The cap is hard-coded
at 1 today; it will be lifted in the same change that introduces
client-side fan-out, alongside a new (real) configuration knob.
P2 (images_service.py:793) - late failed/error events overrode a
successful response.completed
After final_response is captured, the response.failed and
error event branches kept overwriting it with a terminal error,
so a trailing transport-level event (or an upstream nudge that
arrives after the result is already in hand) could turn a 200
image envelope into a spurious 502. Both branches now bail out
early when final_response is set, mirroring the streaming
translator's own already-emitted-terminal guard.
Tests:
- test_n_greater_than_one_is_unconditionally_rejected replaces
the previous images_max_n-aware assertion.
- test_late_failed_after_completed_is_ignored and
test_late_error_event_after_completed_is_ignored pin the
late-event behaviour for non-streaming collect.
Local re-runs: ruff format/check + ty clean, pytest tests/unit -q
1440 passed.
* fix(proxy): apply API-key enforced_model and OpenAI-shape edits 400s
Two more codex review findings on 477a445.
P1 (api.py:787) - API key enforced_model was ignored on /v1/images/*
/v1/responses and /v1/chat/completions route through
_effective_model_for_api_key(api_key, requested) so an API key
pinned to a specific model overrides the client's request before
validation, reservation, or dispatch. The image routes were only
calling validate_model_access(api_key, public_model), which lets
clients bypass the pin: a key pinned to (say) gpt-5.5 could
still call any gpt-image-* model through /v1/images/*.
Both image route handlers now compute effective_model the same
way as the rest of the proxy, fail closed with an OpenAI-shape 400
invalid_request_error (param: model) when the enforced model
is not a gpt-image-* value, and rebind payload.model so
every downstream component (validation, request log rewrite, tool
config, allowed-model check, limit reservation) sees the enforced
value. is_supported_image_model is now re-exported from
images_service so the route handlers can run the gpt-image-only
check without importing through app.core.openai.images.
P2 (api.py:633) - typed multipart fields produced framework 422s
Declaring n: int = Form(1) and friends made FastAPI 422 on
invalid scalars (n=abc, stream=yesplz) before
V1ImagesEditsForm.model_validate(...) could surface them as
OpenAI-shape invalid_request_error envelopes. Bind every typed
scalar form field as str | None = Form(None) and let Pydantic
coerce on the schema side; invalid values now flow into the
existing ValidationError -> openai_validation_error mapper.
New tests:
- test_images_edits_invalid_n_returns_openai_error and
test_images_edits_invalid_stream_returns_openai_error pin the
OpenAI-shape 400 contract for malformed multipart scalars.
Live verification: curl -F n=abc /v1/images/edits returns
HTTP 400 invalid_request_error (param: n) instead of a
framework 422; omitting model still falls back to
images_default_model and produces a 200 response.
* fix(proxy): apply enforced_model before validating image params
Codex review on 30f1f41 flagged that the enforced-model swap happened
*after* validate_generations_payload / validate_edits_payload,
so a request that satisfied the validation matrix under the client-
supplied model could be silently routed to a different
gpt-image-* variant whose matrix it does not satisfy. Both routes
now resolve effective_model = _effective_model_for_api_key(api_key,
requested_or_default) first, fail closed with an OpenAI-shape 400
when the resulting value is not in the gpt-image-* family, rebind
payload.model to the enforced value, and only then run the
cross-field validation matrix. settings.images_default_model is
also resolved up-front so the early enforced-model check never sees
None.
Behaviour changes:
- A key pinned to gpt-image-1 no longer accepts size /
quality values that are only valid for gpt-image-2 (or
vice-versa); the request is rejected at the API boundary with a
deterministic 400 instead of leaking through to upstream.
- input_fidelity is enforced against the *effective* model on
edits, so a key pinned to gpt-image-2 correctly rejects
input_fidelity no matter what the client wrote.
No new tests are needed beyond the existing matrix coverage in
test_images_schemas because the validation now runs against the
enforced model; the existing reject-cases continue to pin the
contract.
* fix(proxy): map /v1/images/* error envelopes to canonical HTTP status
Codex review on 3f5bdb6 flagged that the non-streaming
/v1/images/{generations,edits} error path hard-coded HTTP 502 for
every translated upstream error envelope. That breaks API
compatibility with /v1/responses (which uses _status_for_error)
and can trigger incorrect retry behaviour for client-originated
failures (content_policy_violation, invalid_request_error)
and throttling (rate_limit_error) by surfacing them as gateway
errors instead of 4xx/429.
New helper _status_for_image_error_envelope maps the OpenAI
error envelope dict to a canonical status:
- code precedence: content_policy_violation -> 400,
rate_limit_exceeded / insufficient_quota -> 429, plus
the existing _UNAVAILABLE_SELECTION_ERROR_CODES -> 503.
- type fallback: invalid_request_error -> 400,
authentication_error -> 401, permission_error -> 403,
not_found_error -> 404, rate_limit_error -> 429,
insufficient_quota -> 429.
- transport-level / unknown shapes still default to 502.
Both the error_envelope branch and the images_result error
branch on each route handler now route through this helper. The
streaming SSE error path is unaffected because it emits error
events instead of an HTTP status.
New test:
- test_images_generations_maps_content_policy_to_400 exercises
the code -> status precedence end-to-end through the route.
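A minimal sketch of the code-before-type precedence described above; the 503 selection-error codes are omitted and the names are illustrative:

```python
_CODE_STATUS = {
    "content_policy_violation": 400,
    "rate_limit_exceeded": 429,
    "insufficient_quota": 429,
}
_TYPE_STATUS = {
    "invalid_request_error": 400,
    "authentication_error": 401,
    "permission_error": 403,
    "not_found_error": 404,
    "rate_limit_error": 429,
    "insufficient_quota": 429,
}


def status_for_image_error_envelope(envelope: dict) -> int:
    """Error code takes precedence over error type; transport-level or
    unknown shapes keep the previous 502 default."""
    error = envelope.get("error", {})
    code = error.get("code")
    if code in _CODE_STATUS:
        return _CODE_STATUS[code]
    return _TYPE_STATUS.get(error.get("type"), 502)
```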
* fix(proxy): edit-stream event names + cost rewrite on /v1/images/*
Two more correct findings on 74c7479.
P2 (images_service.py:69) - /v1/images/edits emitted image_generation.*
The streaming translator hard-coded the downstream event names to
image_generation.partial_image / image_generation.completed
even when called from the edits route, but the OpenAI Images
streaming vocabulary distinguishes image_edit.* for the edits
surface. Edit clients listening for image_edit.* would miss
every event. The translator now takes is_edit: bool = False and
the route handler for /v1/images/edits passes is_edit=True,
so generations still emit image_generation.* and edits emit
image_edit.*. Verified live: -F stream=true on
/v1/images/edits produces image_edit.partial_image and
image_edit.completed with zero image_generation.* events.
P2 (request_logs/repository.py:242) - cost_usd kept host-model pricing
update_model_for_request only rewrote model, leaving
cost_usd at the value computed from the internal host model at
insert time. Dashboards then mixed the public gpt-image-* label
with host-model pricing. The rewrite now fetches the affected rows,
reassigns model, recomputes cost_usd via
calculated_cost_from_log against the new model, and commits the
combined update so reporting stays consistent. The update import
is no longer needed and is removed.
New tests:
- test_translator_emits_image_edit_events_when_is_edit_true pins
the event-name contract for the edits stream path.
The cost-rewrite path is exercised end-to-end by the existing
streaming integration tests; cost_usd arrives as None for
gpt-image-* until the model registry adds image pricing, which is
the same Null-as-unknown behaviour add_log produces today.
* feat(proxy): include created_at and usage details on /v1/images/* events
Two more correct codex review findings on 4e37706.
P1 (images_service.py:468) - created_at field was missing from
emitted image stream events
OpenAI Images stream event schemas expose created_at on both
partial and completed events (it lets clients sequence events and
feed observability pipelines), and SDKs that decode against the
official models reject events that drop the field. The translator
now forwards created_at from the upstream event when present
and synthesizes a current Unix timestamp otherwise, via a tiny
_coerce_created_at helper. Both image_generation.* and
image_edit.* events get the field.
P2 (images_service.py:373) - tool_usage.image_gen detail objects
were dropped
_extract_image_usage only forwarded scalar token counts and
silently discarded the nested input_tokens_details /
output_tokens_details objects (and any future detail keys
upstream may add). The OpenAI Images usage schema exposes those
per-modality breakdowns, so dropping them made the response shape
incomplete and could break clients that expect full metadata.
- V1ImageUsage now declares input_tokens_details and
output_tokens_details and switches to extra='allow' so
future upstream additions propagate without a schema bump.
- _extract_image_usage forwards both detail objects unchanged
and also threads any other non-canonical keys through as extras.
- Returns None only when token counts AND every detail object
AND every extra key are absent, so a request that only carries a
detail breakdown still surfaces usage to the client.
New tests:
- test_partial_and_completed_events_include_created_at pins the
schema field for both event kinds.
- test_input_and_output_tokens_details_are_forwarded covers the
usage detail propagation end-to-end.
Verified live on dev: streamed events carry created_at and the
non-streaming JSON envelope's usage block carries
input_tokens_details/output_tokens_details exactly as upstream
emits them.
* fix(pricing): add gpt-image-* entries so cost-based quotas bite
Codex review on 0ff2b0d flagged that /v1/images/* requests
reserved API-key usage with request_model=gpt-image-* but the
default pricing tables and aliases had no gpt-image-* entries.
ApiKeysService._calculate_cost_microdollars therefore resolved
every image call to $0, so keys constrained by cost_usd could
issue unlimited image requests without consuming budget.
Add token-based pricing entries for the entire gpt-image-*
family in DEFAULT_PRICING_MODELS (using the OpenAI-published
gpt-image-2 rates: text input $5.00/1M, image cached input $2.00/1M,
image output $30.00/1M; the legacy 1.5/1/mini variants currently
mirror gpt-image-2 until OpenAI publishes per-model deltas), and
register matching gpt-image-2* / gpt-image-1.5* /
gpt-image-1-mini* / gpt-image-1* aliases in
DEFAULT_MODEL_ALIASES so date-pinned snapshots resolve to the
canonical entry.
The current ModelPrice shape carries a single input rate, so
text and image input share the rate; once OpenAI publishes a more
nuanced split we can extend ModelPrice and the cost calculator
without changing the route surface.
Live verification: calculated_cost_from_log for
`(model='gpt-image-2', input=1659, output=22)` now returns
$0.008955 instead of None.
New tests in test_images_schemas:
- test_gpt_image_2_pricing_is_defined
- test_gpt_image_2_alias_resolves (date-pinned snapshot)
- test_calculated_cost_is_nonzero_for_gpt_image_2
- test_legacy_gpt_image_models_have_pricing
* fix(proxy): force image_generation tool and edit action on /v1/images/*
Two more correct codex review findings on 17b4127.
P1 (images_service.py:192) - tool_choice='auto' let the host model
refuse the tool call
Both image translation paths set tool_choice to "auto",
which lets the host Responses model return a refusal or plain text
instead of an image_generation_call. When that happened the
adapter fell through to image_generation_failed and surfaced a
5xx even though the request shape was valid. Both routes now pin
tool_choice to {"type": "image_generation"} so tool
invocation is deterministic and any failure surfaces as an image-
tool error rather than model-choice fallthrough.
P1 (images_service.py:113) - edits route did not set the tool
action, so the host model could pick generation behaviour
_build_image_generation_tool now takes is_edit: bool = False
and emits "action": "edit" on the tool config when
is_edit=True. images_edit_to_responses_request passes
is_edit=True, so /v1/images/edits requests are sent in edit
mode and the host model treats the attached input_image(s) as the
source/mask pair instead of inspiration for a fresh generation.
No behaviour change for non-streaming/streaming success paths -
existing image generation tests still pass. Live verification on
/v1/images/edits previously confirmed multipart edits work with
the forced action; the rate-limit window was hit when re-running but
the request shape was accepted (429 came back with an OpenAI
rate_limit_exceeded envelope, not a tool_choice/action rejection).
* fix(images): tighten quality and input_fidelity allowlists for gpt-image
Two more correct codex review findings on f10e00c.
P2 (images.py:51) - DALL-E-only quality values were accepted
_LEGACY_QUALITY previously included standard and hd,
which are DALL-E quality values and not valid for any
gpt-image-* model. Allowing them let invalid requests bypass
adapter-side validation and fail later upstream with a less
deterministic error. Drop them so the legacy quality allowlist is
{low, medium, high, auto} like the gpt-image-2 allowlist.
P2 (images.py:244) - gpt-image-1-mini accepted input_fidelity
The legacy branch accepted input_fidelity for every legacy
gpt-image-* model, but gpt-image-1-mini does NOT support
the parameter. Add _INPUT_FIDELITY_SUPPORTED_MODELS =
{gpt-image-1.5, gpt-image-1} and reject the parameter for any
other model with an OpenAI-shape invalid_request_error
(param: input_fidelity). The docstring at the top of the
module is updated to call out the gpt-image-1-mini exception.
New tests in test_images_schemas:
- test_legacy_quality_does_not_accept_dalle_only_values covers
the standard/hd rejection.
- test_input_fidelity_rejected_on_gpt_image_1_mini_edits pins
the new model-specific rejection.
- test_input_fidelity_still_accepted_on_gpt_image_1_edits is a
positive regression test for gpt-image-1 / gpt-image-1.5.
* fix(images): reject input_fidelity on /v1/images/generations
Codex review on d40f3c5 flagged that validate_generations_payload
hard-coded input_fidelity=None so the generations path never
checked the value. Combined with V1ImagesGenerationsRequest's
extra=ignore config, a client could send input_fidelity and
the field was silently dropped instead of being rejected, breaking
the documented matrix (generations should always error on
input_fidelity).
- V1ImagesGenerationsRequest now declares input_fidelity:
str | None = None so the field is captured into the payload
rather than absorbed by extra=ignore.
- validate_generations_payload forwards payload.input_fidelity
to validate_image_request_parameters. The validator already
rejects input_fidelity outside of /v1/images/edits, so a
generations request that includes the field now returns the
documented OpenAI-shape 400 invalid_request_error
(param: input_fidelity) instead of silently dropping it and
succeeding.
New test:
- test_images_generations_rejects_input_fidelity exercises the
rejection end-to-end through the route.
* fix(proxy): enforce API-key auth on /v1/images/variations
Codex review on d40f3c5 flagged that v1_images_variations had no
Security(validate_proxy_api_key) dependency, so the route
returned a public 404 to unauthenticated callers even when proxy
API-key auth was enabled. Every other /v1/images/* (and every
/v1/responses-family) route gates on validate_proxy_api_key,
so this was an inconsistent auth surface.
Add the same Security(validate_proxy_api_key) dependency to the
variations stub so the standard auth policy runs before we return
the 404 not_found_error envelope. The api_key parameter is
captured purely to trigger the dependency and explicitly discarded
with del api_key so static analysis does not flag it as unused.
* fix(proxy): record image_generation usage against API-key limits
Codex review on e71aef4 flagged that /v1/images/* requests
charged nothing to the API key: the standard stream settlement
reads response.usage (which is typically empty for the
image_generation tool path) and then releases the reservation,
so cost-based cost_usd quotas never bite on image requests and
the dashboard cost was undercounted. The real token counts already
arrive on response.tool_usage.image_gen, which the adapter
extracts for the public usage envelope.
This change wires those tokens into API-key settlement:
- ProxyService.record_image_api_key_usage(api_key, model,
input_tokens, output_tokens) calls
ApiKeysService.record_usage so the limits and cost are
incremented even after the reservation has been released. It is
CancelScope(shield=True) so client disconnects cannot skip
it, and bails out for api_key=None or all-zero counts.
- images_service now stashes image_input_tokens /
image_output_tokens on the captured dict via a small
_stash_image_usage_tokens helper, populated from both the
streaming translator and the non-streaming collector when the
trailing response.completed arrives. The captured dict
type is widened from dict[str, str] to dict[str, object]
to hold both the response_id (str) and the token counts (int).
- Both image route handlers (generations / edits, streaming /
non-streaming branches) call record_image_api_key_usage after
the request log model rewrite so the same captured value drives
both writes. The streaming branches keep the call inside the
inner generator's finally so client disconnects still credit
usage.
Live verification on dev: a single /v1/images/generations call
incremented the API key's request_count by 1, total_tokens
by 1681 (matching input=9 + output=196 plus the existing host
model accounting), and total_cost_usd by $0.008955 - the
exact gpt-image-2 token-based cost from
calculated_cost_from_log.
* fix(proxy): single-source image API-key billing and forward cached tokens
Two more correct codex review findings on 786153c.
P1 (api.py:971) - double-billing when both response.usage and
tool_usage.image_gen are present
The previous implementation kept the standard stream settlement
(which would finalize the reservation from response.usage) AND
added a post-hoc record_image_api_key_usage call from
tool_usage.image_gen. When upstream emits both, the same
request was charged twice and could prematurely throttle keys.
This change makes the image adapter the single source of API-key
billing for /v1/images/*:
- Both image route handlers now invoke stream_responses with
api_key_reservation=None, so _settle_stream_api_key_usage
no-ops and never touches the reservation.
- New _finalize_image_reservation(reservation, model, input,
output, cached) helper finalizes from tool_usage.image_gen
when token counts are present and releases otherwise. Both
streaming and non-streaming paths call it exactly once after the
request log model rewrite.
- The previous ProxyService.record_image_api_key_usage helper
is removed (no callers remain).
P2 (images_service.py:522) - cached image input tokens were dropped
_stash_image_usage_tokens only persisted input_tokens /
output_tokens, so any
tool_usage.image_gen.input_tokens_details.cached_tokens value
was lost before settlement. record_image_api_key_usage then
defaulted cached_input_tokens to 0, billing cached requests as
fully uncached input. The stash helper now extracts cached tokens
via a new _extract_cached_input_tokens helper and stores them
under image_cached_input_tokens, which the route handler
forwards to _finalize_image_reservation.
Live verification on dev: a single /v1/images/generations call
now increments the API key once - request_count +1, total_tokens by
the actual image_generation token count, total_cost_usd by exactly
calculated_cost_from_log - with no double-charge from the bypassed
standard settlement path.
* fix(images): accept image[] multipart key on /v1/images/edits
Codex review on 2c48cdc flagged that the edits route only bound
image: list[UploadFile] = File(...), so OpenAI SDKs and HTTP
clients that emit the array-style image[] form key (a common
multipart shape for repeated files) were treated as missing the
required image input and 400'd before translation.
Both keys are now accepted:
- image: list[UploadFile] | None = File(None) (canonical key)
- image_brackets: list[UploadFile] | None = File(None,
alias="image[]") (array-style key)
The handler merges the two lists in order and returns an
OpenAI-shape 400 only if the merged list is empty. Both keys
remain optional at the FastAPI binding layer; the empty-list check
runs after the merge so the rejection message is unified.
Live verification on dev: -F 'image[]=@source.png' now returns
HTTP 200 with a valid b64 image and the upstream payload includes
the input_image content part as expected.
New test test_images_edits_accepts_image_brackets_form_key
exercises the image[] path end-to-end through the route.
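A hedged FastAPI sketch of the dual form-key binding; the route body and error shape are simplified relative to the real handler:

```python
from fastapi import FastAPI, File, UploadFile
from fastapi.responses import JSONResponse

app = FastAPI()


@app.post("/v1/images/edits")
async def v1_images_edits(
    image: list[UploadFile] | None = File(None),                             # canonical key
    image_brackets: list[UploadFile] | None = File(None, alias="image[]"),   # array-style key
):
    # Merge both spellings in order; the empty-list check runs after the
    # merge so the rejection message is the same whichever key was used.
    files = [*(image or []), *(image_brackets or [])]
    if not files:
        return JSONResponse(
            status_code=400,
            content={"error": {"type": "invalid_request_error", "param": "image",
                               "message": "At least one image file is required."}},
        )
    return {"received": [f.filename for f in files]}
```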
* fix(proxy): swallow image reservation finalize errors
Codex review on bb7af27 flagged that _finalize_image_reservation
performed API-key reservation writes directly without catching
persistence errors, unlike the existing
ProxyService._settle_stream_api_key_usage path. A transient
DB/session failure during the tail accounting could turn a
successfully generated image into a user-facing 500
(non-streaming) or an abrupt stream termination (streaming).
Wrap the body in a single try/except Exception that logs and
returns, mirroring the best-effort accounting policy used by the
standard stream settlement helper. The reservation may end up in
its previous state on a partial failure, but the client always
sees its image and the operator gets an actionable warning.
New test
test_images_generations_succeeds_when_reservation_finalize_fails
patches ApiKeysService.finalize_usage_reservation to raise and
confirms the route still returns 200 with the generated image
envelope.
* fix(proxy): return api-key limits from v1 usage (#501)
Co-authored-by: Ruben Beuker <rubenbeuker@MacBook-Air-van-Ruben.local>
* feat: add API key filter for dashboard request logs (#497)
* feat(dashboard): add request-log API key filter
* test(dashboard): add missing request-log apiKeyId fixtures
* docs: add stemirkhan as a contributor for code, doc, and test (#503)
* docs: update README.md
* docs: update .all-contributorsrc
---------
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
---------
Co-authored-by: Hugh Do <mhughdo@gmail.com>
Co-authored-by: Temirkhan <99467693+stemirkhan@users.noreply.github.com>
Co-authored-by: Crawfish (Soju06) <crawfish@openclaw.local>
Co-authored-by: Bala Kumar <mail@balakumar.dev>
Co-authored-by: Soju06 <qlskssk@gmail.com>
Co-authored-by: allcontributors[bot] <46447321+allcontributors[bot]@users.noreply.github.com>
Co-authored-by: Ruben <r.beuker@ziggo.nl>
Co-authored-by: Ruben Beuker <rubenbeuker@MacBook-Air-van-Ruben.local>
Co-authored-by: Kazet <kazet111@gmail.com>
Co-authored-by: Hannah Markfort <74815681+xCatalitY@users.noreply.github.com>
Co-authored-by: tobwen <1864057+tobwen@users.noreply.github.com>
Co-authored-by: Rio <rio.jeong@thebytesize.ai>
Co-authored-by: craw <craw@openclaw.local>
Co-authored-by: crawfish <qlskssk+crawfish@gmail.com>
Co-authored-by: Jacky Fong <hello@huzky.dev>
Summary
Reuse the shared CopyButton component instead of calling navigator.clipboard.writeText(...) directly.
Why
The API key created dialog had its own minimal clipboard implementation with no error handling or user feedback on failure. Reusing the shared copy control makes the behavior consistent across browsers and aligns this dialog with the rest of the frontend.
Testing
Notes