📋 Prerequisites
🎯 Affected Service(s)
App Service
🚦 Impact/Severity
Minor inconvenience
🐛 Bug Description
We're using a Bedrock-backed agent with stream: true and noticed our users have to wait 4-5 seconds before seeing
anything, even for a simple "Hi" message. After some digging through the code, I found the culprit in
kagent/adk/models/_bedrock.py around line 320:
def _run_converse_stream(**kw):
resp = client.converse_stream(**kw)
return list(resp.get("stream", []))
That list() call collects the entire Bedrock response into memory before anything happens downstream. The rest of the
streaming chain actually works great (partial LlmResponse events, A2A SSE, the whole thing) but it's all waiting on this
one function to finish collecting everything first.
🔄 Steps To Reproduce
- Deploy an agent with stream: true and a Bedrock model config (we use eu.anthropic.claude-sonnet-4-6 in eu-central-1)
- Send a simple message like "Hi" through the A2A protocol
- Notice nothing appears for 4-5 seconds, then the full answer shows up all at once
- For comparison, calling Bedrock's converse_stream API directly gives you the first token in about 500ms
🤔 Expected Behavior
Text should start appearing within roughly 500ms (which is Bedrock's actual time-to-first-token), with the rest streaming
in progressively.
📱 Actual Behavior
The UI is blank for 4-5 seconds, then the complete response appears in one shot. Looking at the A2A events, the
kagent_adk_partial: true events do get emitted but they all arrive as a burst after buffering, not incrementally as
Bedrock produces them.
💻 Environment
- kagent-adk: 0.3.0
- google-adk: 1.31.1
- a2a-sdk: 0.3.23
- Model: eu.anthropic.claude-sonnet-4-6 via Bedrock (eu-central-1)
- Running on EKS, agent configured with stream: true
🔧 CLI Bug Report
No response
🔍 Additional Context
I think the fix could be fairly contained. Replace the list() buffering with something that bridges boto3's synchronous
iterator to the async world incrementally. Something along these lines:
async def _iter_converse_stream(client, **kw):
queue = asyncio.Queue()
def _produce():
resp = client.converse_stream(**kw)
for event in resp.get("stream", []):
queue.put_nowait(event)
queue.put_nowait(None)
loop = asyncio.get_event_loop()
loop.run_in_executor(None, _produce)
while (event := await queue.get()) is not None:
yield event
Or whatever pattern fits the project's conventions better. The important thing is just not materializing the whole stream
upfront.
All the downstream plumbing (event converter, A2A event queue, SSE to client) already handles partial events correctly.
It's really just this one spot that's holding things up.
Happy to test a fix on our cluster if you want to point me at a branch.
📋 Logs
📷 Screenshots
No response
🙋 Are you willing to contribute?
📋 Prerequisites
🎯 Affected Service(s)
App Service
🚦 Impact/Severity
Minor inconvenience
🐛 Bug Description
We're using a Bedrock-backed agent with stream: true and noticed our users have to wait 4-5 seconds before seeing
anything, even for a simple "Hi" message. After some digging through the code, I found the culprit in
kagent/adk/models/_bedrock.py around line 320:
That list() call collects the entire Bedrock response into memory before anything happens downstream. The rest of the
streaming chain actually works great (partial LlmResponse events, A2A SSE, the whole thing) but it's all waiting on this
one function to finish collecting everything first.
🔄 Steps To Reproduce
🤔 Expected Behavior
Text should start appearing within roughly 500ms (which is Bedrock's actual time-to-first-token), with the rest streaming
in progressively.
📱 Actual Behavior
The UI is blank for 4-5 seconds, then the complete response appears in one shot. Looking at the A2A events, the
kagent_adk_partial: true events do get emitted but they all arrive as a burst after buffering, not incrementally as
Bedrock produces them.
💻 Environment
🔧 CLI Bug Report
No response
🔍 Additional Context
I think the fix could be fairly contained. Replace the list() buffering with something that bridges boto3's synchronous
iterator to the async world incrementally. Something along these lines:
Or whatever pattern fits the project's conventions better. The important thing is just not materializing the whole stream
upfront.
All the downstream plumbing (event converter, A2A event queue, SSE to client) already handles partial events correctly.
It's really just this one spot that's holding things up.
Happy to test a fix on our cluster if you want to point me at a branch.
📋 Logs
📷 Screenshots
No response
🙋 Are you willing to contribute?