
feat(observability): add configurable grafana.prometheusDatasourceName to agent chart#1982

Merged
EItanya merged 7 commits into kagent-dev:main from mesutoezdil:feat/observability-prometheus-url on Jun 18, 2026


@mesutoezdil

@mesutoezdil mesutoezdil commented Jun 8, 2026

Contributor

Closes #1891

Adds a grafana.prometheusDatasourceName field to the observability agent chart values.

The observability agent reaches Prometheus exclusively through the Grafana MCP tool server.

Tools like query_prometheus take a Grafana datasource UID, not a raw endpoint, so the agent resolves the configured name to a UID (for example via list_datasources) before querying.
When grafana.prometheusDatasourceName is set, it is injected into the agent system message so that, across restarts, the agent knows which Grafana datasource to use for Prometheus queries.

When left empty (default), nothing changes.

Copilot AI review requested due to automatic review settings June 8, 2026 19:05
@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Jun 8, 2026

Copilot AI left a comment

Contributor


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds a configurable Prometheus server URL to the observability agent Helm chart and injects it into the agent’s system message so the agent can reliably query the correct Prometheus endpoint.

Changes:

  • Introduces prometheus.url in chart values as an optional configuration.
  • Conditionally renders a “Prometheus Configuration” section in the agent system message when the URL is set.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

Files:
  • helm/agents/observability/values.yaml: Adds a new prometheus.url value (default empty) with inline documentation.
  • helm/agents/observability/templates/agent.yaml: Conditionally injects the configured Prometheus URL into the agent’s system prompt.


Comment thread helm/agents/observability/templates/agent.yaml Outdated
Comment thread helm/agents/observability/values.yaml Outdated
@mesutoezdil mesutoezdil force-pushed the feat/observability-prometheus-url branch 6 times, most recently from f8d444d to 17a001d Compare June 9, 2026 20:20
@iplay88keys

iplay88keys commented Jun 15, 2026

Contributor

I'm not sure that the URL will do much in practice here. The observability agent reaches Prometheus exclusively through the Grafana tool server. The query_prometheus tool takes a Grafana datasourceUid, not a raw endpoint, and the grafana-mcp tool server is configured with only GRAFANA_URL and a service-account token. None of the tools attached to the agent take in a Prometheus URL.

I think the underlying issue is the agent not knowing which datasource to query, which would mean that the fix would be to inject the Grafana datasource name or UID rather than a raw url. Can you clarify a bit more about how the URL is used? Maybe I'm missing something here.

@mesutoezdil

Contributor Author

Thanks for catching that. Yeah, I've updated the code to use grafana.prometheusDatasourceName instead. When set, the agent is told which Grafana datasource to use for query_prometheus and related tools. This keeps the original goal (persistence across restarts) while giving the agent something it can actually act on.

@mesutoezdil mesutoezdil changed the title feat(observability): add configurable prometheus.url to agent chart feat(observability): add configurable grafana.prometheusDatasourceName to agent chart Jun 15, 2026
@mesutoezdil

Contributor Author

I have updated the PR title and the PR description.

@github-actions github-actions Bot added enhancement New feature or request and removed enhancement New feature or request labels Jun 15, 2026
@mesutoezdil

mesutoezdil commented Jun 15, 2026

Contributor Author

I dug into the grafana/mcp-grafana source to be more precise here.

query_prometheus (and all related tools like list_prometheus_metric_names, list_prometheus_label_names, etc.) strictly require a datasourceUid, not a name:

  type QueryPrometheusParams struct {
      DatasourceUID string `json:"datasourceUid" jsonschema:"required,description=The UID of the datasource to query"`
      // ...
  }

So grafana.prometheusDatasourceName is not directly usable in tool calls. The agent would need to call get_datasource_by_name first to resolve the name to a UID, then pass that UID to the Prometheus tools.
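The extra resolution step can be sketched as a plain lookup over the datasources returned by a listing tool. This is a hedged illustration, not mcp-grafana code: the Datasource struct is a hypothetical, trimmed-down shape, and only the "Prometheus" name and "webstore-metrics" UID come from the environment tested in this PR; the Jaeger and OpenSearch UIDs and all type strings are illustrative.

```go
package main

import "fmt"

// Datasource holds only the fields this sketch needs.
// Hypothetical shape: the real listing result carries more fields.
type Datasource struct {
	Name string
	UID  string
	Type string
}

// resolveUID returns the UID of the datasource whose name matches exactly.
// This mirrors the lookup the agent must do before calling query_prometheus,
// which only accepts a datasourceUid.
func resolveUID(datasources []Datasource, name string) (string, error) {
	for _, ds := range datasources {
		if ds.Name == name {
			return ds.UID, nil
		}
	}
	return "", fmt.Errorf("no datasource named %q", name)
}

func main() {
	// Jaeger/OpenSearch UIDs and the type strings are illustrative.
	datasources := []Datasource{
		{Name: "Prometheus", UID: "webstore-metrics", Type: "prometheus"},
		{Name: "Jaeger", UID: "jaeger", Type: "jaeger"},
		{Name: "OpenSearch", UID: "opensearch", Type: "grafana-opensearch-datasource"},
	}
	uid, err := resolveUID(datasources, "Prometheus")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(uid) // webstore-metrics
}
```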

Two options from here:

nr1: change to grafana.prometheusDatasourceUid so the agent can use the value directly without a lookup. Downside: users need to know their datasource UID upfront.

nr2: keep prometheusDatasourceName and update the system message to explicitly tell the agent to resolve it via get_datasource_by_name before calling any Prometheus tools. More user-friendly to configure, at the cost of 1 extra tool call at runtime.

Which one do you prefer, @iplay88keys?

@mesutoezdil mesutoezdil force-pushed the feat/observability-prometheus-url branch from a674092 to f8aba17 Compare June 17, 2026 08:03
@iplay88keys

iplay88keys commented Jun 17, 2026

Contributor

I would probably lean toward the second one, though it looks like get_datasource_by_name was replaced by get_datasource or list_datasources in this PR, so we should probably update our agent and the tools it has access to accordingly (docs).

It should really only need to be run once at the beginning of the context window and then the agent could re-use that in subsequent calls. It also allows the agent to be generic and know how to switch data sources.
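Resolving once and reusing the result amounts to memoizing the name-to-UID lookup for the lifetime of a conversation. The sketch below is a hypothetical helper, not part of kagent or mcp-grafana: lookupFunc stands in for a list_datasources/get_datasource tool call, and in practice the "cache" is the model's own context rather than Go code.

```go
package main

import (
	"fmt"
	"sync"
)

// lookupFunc stands in for a datasource-listing tool call (hypothetical).
type lookupFunc func(name string) (string, error)

// uidCache resolves each datasource name at most once and reuses the UID,
// approximating "run once at the beginning of the context window".
type uidCache struct {
	mu     sync.Mutex
	lookup lookupFunc
	uids   map[string]string
}

func newUIDCache(lookup lookupFunc) *uidCache {
	return &uidCache{lookup: lookup, uids: make(map[string]string)}
}

// UID returns the cached UID for name, calling lookup only on a cache miss.
// Failed lookups are not cached, so a later call can retry.
func (c *uidCache) UID(name string) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if uid, ok := c.uids[name]; ok {
		return uid, nil
	}
	uid, err := c.lookup(name)
	if err != nil {
		return "", err
	}
	c.uids[name] = uid
	return uid, nil
}

func main() {
	calls := 0
	cache := newUIDCache(func(name string) (string, error) {
		calls++
		if name == "Prometheus" {
			return "webstore-metrics", nil
		}
		return "", fmt.Errorf("no datasource named %q", name)
	})
	for i := 0; i < 3; i++ {
		uid, _ := cache.UID("Prometheus")
		fmt.Println(uid)
	}
	fmt.Println("lookups:", calls) // lookups: 1
}
```

Because the cache is keyed by name, the same helper also covers switching datasources: a different name triggers one new lookup.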

Another option is to have the agent's system message say to list datasources and if there's more than one, raise that as a question to the user if it's not clear from the message.

@mesutoezdil mesutoezdil force-pushed the feat/observability-prometheus-url branch from 939e2d5 to 3ec71ca Compare June 17, 2026 17:13
@mesutoezdil mesutoezdil force-pushed the feat/observability-prometheus-url branch from 2c1ba10 to 9079422 Compare June 17, 2026 17:15
Adds an optional prometheus.url value. When set, the URL is injected
into the agent system message so the agent knows which endpoint to use.

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
…asourceName

The observability agent queries Prometheus exclusively through the Grafana
MCP tool server. Tools like query_prometheus take a Grafana datasourceUid,
not a raw Prometheus endpoint, so injecting a URL into the system message
provided no actionable value.

Replace prometheus.url with grafana.prometheusDatasourceName. When set,
the agent is told which Grafana datasource to use for all Prometheus
queries, matching how the tools actually work.

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
…by_name with get_datasource

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
@mesutoezdil mesutoezdil force-pushed the feat/observability-prometheus-url branch from 9079422 to 0370e7e Compare June 17, 2026 17:16
@iplay88keys

Contributor

If we switch to using the grafana docker images for this, it looks like we could pin to a tagged version and not have to worry about the tool names going out of sync: https://hub.docker.com/r/grafana/mcp-grafana/tags. It might be worth a follow-up or at least an issue to track it, but it's not a big deal at this point.

@mesutoezdil

Contributor Author

OK, I opened a follow-up issue to track it: #2040

@iplay88keys

Contributor

Have you had a chance to test this out in your env? Can you provide some screenshots or a testing strategy showing that it works as expected?

@mesutoezdil

mesutoezdil commented Jun 17, 2026

Contributor Author

Quick check (I tried to fit the commands onto a single screen):

Screenshot 2026-06-17 at 20 43 58

@iplay88keys

iplay88keys commented Jun 17, 2026

Contributor

Sorry, I meant using the agent to show this fixes the issue raised.

@mesutoezdil

mesutoezdil commented Jun 17, 2026

Contributor Author

Sure (in my OTel env: https://github.com/mesutoezdil/myOTel):

Screenshot 2026-06-17 at 22 00 16 Screenshot 2026-06-17 at 22 00 26

@iplay88keys

iplay88keys commented Jun 17, 2026

Contributor

Looking at your otel env repo, shouldn't it have found the VictoriaMetrics datasource?

Based on the screenshots, it seems that the agent tried to do a tool call with the prometheus-uid and webstore-metrics datasource UIDs which didn't work.

@mesutoezdil

mesutoezdil commented Jun 17, 2026

Contributor Author

The list_datasources result confirms that this Grafana instance has 3 datasources: Prometheus (uid: webstore-metrics), Jaeger, and OpenSearch. There is no VictoriaMetrics. The agent queried http_server_active_requests successfully and got real data back. The myOTel repo dashboards reference a VictoriaMetrics UID, but that is a separate environment, unrelated to this test.


@iplay88keys

Contributor

Ok, would you say this is working as expected, then? It seems that the issues could be related to the LLM itself. For your last example, at least, I would have expected the model to know from the prior context that Prometheus meant uid of webstore-metrics.

…essage

Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
@mesutoezdil

mesutoezdil commented Jun 17, 2026

Contributor Author

I see. I've now updated the wording to be more explicit. The model still guesses first before resolving the correct datasource via list_datasources. The end result is correct, but first-call compliance depends on the model; a model with better instruction following should use the system message directly.

My latest test (with the updated PR), as you described:

Screenshot 2026-06-17 at 22 52 42

@iplay88keys

Contributor

Cool, I think that's a lot better. The only remaining question is whether we should have helm tests around the configuration. I'm not set on it being a requirement, just putting it out there.

@mesutoezdil

Contributor Author

What would you expect it to test: the system message content or the rendered template?

@iplay88keys

Contributor

Mostly was curious if there was much of a benefit to it. It seems that none of the agents currently have tests around them and this is pretty minor, so I don't think it'll actually be necessary.

Comment thread helm/agents/observability/templates/agent.yaml Outdated
Signed-off-by: mesutoezdil <mesudozdil@gmail.com>
iplay88keys
iplay88keys previously approved these changes Jun 17, 2026

@iplay88keys iplay88keys left a comment

Contributor


Looks good, thanks!

EItanya
EItanya previously approved these changes Jun 18, 2026
@mesutoezdil mesutoezdil dismissed stale reviews from EItanya and iplay88keys via b659f97 June 18, 2026 14:04
@mesutoezdil mesutoezdil force-pushed the feat/observability-prometheus-url branch from a97929f to b659f97 Compare June 18, 2026 14:04
@EItanya EItanya merged commit a766d22 into kagent-dev:main Jun 18, 2026
23 checks passed

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add configurable prometheus.url in values.yaml with kubectl auto-discovery fallback

4 participants