Skip to content

feat: add taxonomy run APIs#85

Merged
BhagyaAmarasinghe merged 5 commits into
mainfrom
eng-1159-taxonomy-repository-api
Jun 12, 2026
Merged

feat: add taxonomy run APIs#85
BhagyaAmarasinghe merged 5 commits into
mainfrom
eng-1159-taxonomy-repository-api

Conversation

@BhagyaAmarasinghe

@BhagyaAmarasinghe BhagyaAmarasinghe commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Summary

  • add Hub public taxonomy APIs for fields, manual run creation/status, active tree reads, node feedback drilldown, rename, and soft-remove
  • add Hub internal taxonomy service APIs for run input and result/failure persistence
  • add taxonomy repository/service models for runs, clusters, memberships, nodes, active runs, and node events
  • wire taxonomy service outbound client/config for manual run orchestration
  • address CodeRabbit tenant-scope review feedback by requiring tenant scoping on public run/tree reads and repository transitions/results

Validation

  • make migrate-validate
  • go test ./internal/...
  • go test ./internal/service ./internal/repository ./internal/api/handlers
  • go test -run '^$' ./tests
  • golangci-lint run --timeout 10m ./...
  • local smoke: Formbricks Web -> Hub -> standalone Python taxonomy service -> Hub persistence succeeded with 20 records, 20 768-dim embeddings, 9 clusters, and 10 nodes

Notes

The local smoke currently uses the taxonomy service's temporary simple generator. POC-backed clustering/LLM 5-level taxonomy generation remains tracked separately on the Python taxonomy service tickets.

@BhagyaAmarasinghe BhagyaAmarasinghe marked this pull request as draft June 11, 2026 11:27
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

This pull request introduces a complete taxonomy subsystem for organizing feedback into hierarchical classification trees. It adds taxonomy domain models (runs, nodes, clusters with status and type enums), a PostgreSQL repository layer with atomic run creation via advisory locks and transactional result storage, a service layer that validates configuration (embedding model, external starter), enforces minimum embedding thresholds, and orchestrates run lifecycle (pending → running → succeeded/failed). Public HTTP endpoints (/v1/taxonomy/*) expose field options, manual run creation (returning 202 Accepted while in-progress or 200 OK if immediately available), run/tree retrieval, and node operations (rename, remove, list records). Internal endpoints (/internal/v1/taxonomy/*) handle run completion with result clustering and tree activation, run failure, and input record fetching with embeddings. Request validation includes strict JSON parsing, UUID path parameters, and query-derived scope construction, with domain errors mapped to 503 Service Unavailable for configuration issues.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 14.29% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ⚠️ Warning PR description is comprehensive but missing required testing documentation and checklist verification. Complete the 'How should this be tested?' section with specific test instructions and verify all required checklist items, particularly database migration validation and test coverage results.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'feat: add taxonomy run APIs' is clear, concise, and directly describes the main change—adding taxonomy run APIs. It follows Conventional Commits conventions and accurately summarizes the primary focus of the changeset.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 12

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/api/handlers/taxonomy_handler.go`:
- Around line 144-158: GetTree currently returns tree data by run_id without
enforcing tenant boundaries; update GetTree to validate the requester's tenant
before calling or returning h.service.GetTree: extract tenant ID from the
request context/claims (same mechanism used by GetRun), fetch the run owner or
metadata (e.g., via h.service.GetRun or a new h.service.GetRunOwner/GetRunByID
helper) and compare it to the request tenant, and only call h.service.GetTree or
respond with a forbidden error if the tenants match; ensure you reuse
parseUUIDPathValue, h.service.GetTree, and respondTaxonomyError/standard
forbidden response so the check is performed prior to returning sensitive data.

In `@internal/api/handlers/taxonomy_internal_handler.go`:
- Around line 42-62: The internal endpoints GetRunInput, CompleteRun, and
FailRun currently act on runs by ID without tenant verification; update these
handlers (methods on TaxonomyInternalHandler) to require a tenant_id (e.g.,
query or JSON field) and validate it against the run's tenant when fetching the
run (use h.service.GetRunInput or add/consume a
h.service.GetRunMetadata/GetRunByID that returns TenantID) and return a 403 if
tenant_id does not match, or alternatively add a clear comment/logging and tests
if you intentionally mean cross-tenant behavior—ensure the change touches
GetRunInput, CompleteRun, FailRun and any service call used to fetch run
metadata so tenant enforcement is explicit.

In `@internal/repository/taxonomy_repository.go`:
- Around line 328-376: GetRunInput in TaxonomyRepository uses GetRun (which does
not enforce tenant scoping) leading to a privilege bypass; fix by enforcing
tenant verification when loading the run: either update GetRun to accept a
tenant ID and validate tenant (preferred) or add a new repository method (e.g.,
GetRunForTenant) and call that from GetRunInput, ensuring the run returned has
TenantID == run.TenantID (or matches the ctx tenant) before proceeding to query
feedback_records; update references to GetRun in TaxonomyRepository.GetRunInput
to use the tenant-checked method and return an error if the tenant does not
match.
- Around line 553-567: GetTree currently calls GetRun without enforcing tenant
constraints, allowing cross-tenant access; update GetTree in TaxonomyRepository
to verify the run belongs to the caller tenant (e.g., extract tenant from ctx or
use a tenant-aware method) before proceeding: replace or augment the GetRun call
with a tenant-checked lookup (use an existing GetRunWithTenant or add a tenant
filter) and return an authorization error if the tenant mismatch occurs, then
continue to call listVisibleNodes and buildTaxonomyTree only after the tenant
validation passes.
- Around line 236-255: MarkRunFailed updates a run by ID without enforcing
tenant boundaries; mirror the fix from MarkRunRunning by verifying tenant.
Modify MarkRunFailed to accept or extract the tenant UUID (same approach used by
MarkRunRunning), add a tenant condition to the UPDATE WHERE clause (e.g., "AND
tenant_id = $3" or equivalent positional param) and pass the tenant ID into the
call to queryTaxonomyRun so the query only updates runs belonging to that
tenant; keep taxonomyRunSelect and error handling unchanged except for using the
tenant-aware query.
- Around line 257-268: GetRun (and GetTree) currently fetch taxonomy runs by ID
without tenant scoping; update the handler (taxonomy_handler.go) to
require/obtain tenantID and pass it into the service call, update the service
method (taxonomy_service.go) signatures to accept tenantID and forward it to the
repository, and update repository methods GetRun and GetTree
(taxonomy_repository.go) to add tenant_id to their SQL WHERE clauses (e.g.,
"WHERE id = $1 AND tenant_id = $2") and accept the tenantID parameter, ensuring
you thread the tenantID through queryTaxonomyRun or equivalent functions so
every layer enforces tenant-based access control.
- Around line 215-234: The MarkRunRunning function currently updates a taxonomy
run by id without tenant scoping; change its signature to accept a tenantID
(uuid.UUID) and enforce tenant boundary by either (preferred) including
tenant_id = $2 in the UPDATE/WHERE so the UPDATE only affects runs belonging to
that tenant, or (alternatively) fetch the run with queryTaxonomyRun first and
verify run.TenantID equals the provided tenantID before performing the UPDATE;
ensure error handling returns a not-found or unauthorized error when tenant
mismatch occurs and update any callers of MarkRunRunning to pass the tenantID.
- Around line 378-551: StoreResultAndActivate currently locks a taxonomy run by
ID without verifying tenant ownership; retrieve the current tenant id from the
request context (the service's auth/claims) and change the initial
queryTaxonomyRun call that uses "FROM taxonomy_runs WHERE id = $1 FOR UPDATE" to
include "AND tenant_id = $2 FOR UPDATE" (or equivalent check) and pass the
tenant id; return a not-found or permission error if no row is returned so runs
from other tenants are never locked or processed. Ensure references:
StoreResultAndActivate, queryTaxonomyRun, and the taxonomy_runs WHERE id = $1
FOR UPDATE clause are updated accordingly.

In `@internal/service/taxonomy_service.go`:
- Around line 224-230: CompleteRun currently forwards to
s.repo.StoreResultAndActivate without verifying tenant boundaries; update
CompleteRun to first load the run (e.g., via s.repo.GetRun or s.repo.GetByID
using runID), extract the tenant ID from the request/context (use the same
tenant-extraction helper used by GetRunInput), compare the run.TenantID to the
caller tenant, and return a permission/forbidden error if they differ; only
after that, call s.repo.StoreResultAndActivate and propagate its result,
handling and returning errors consistently.
- Around line 232-243: FailRun must verify the run belongs to the caller's
tenant before marking it failed for defense-in-depth like GetRunInput and
CompleteRun; update FailRun to (1) fetch the run record (e.g., via s.repo.GetRun
or s.repo.GetRunByID) using runID, (2) obtain the caller tenant ID from the
request context using the same mechanism used by GetRunInput/CompleteRun (the
same auth/tenant extraction helper), (3) compare the run's TenantID to the
caller tenant and return a permission/forbidden error if they differ, and only
then proceed to sanitize the message and call s.repo.MarkRunFailed(ctx, runID,
sanitized).
- Around line 209-211: The TaxonomyService GetTree currently forwards directly
to s.repo.GetTree without tenant scoping; update TaxonomyService.GetTree to
extract the tenant (or workspace) identifier from the context (e.g., from auth
claims) and enforce it before returning results — either by calling a repository
method that accepts tenant (add s.repo.GetTreeWithTenant(ctx, runID, tenantID)
or change s.repo.GetTree signature) or by loading the run via s.repo.GetRun(ctx,
runID) and comparing its tenant to the context tenant and returning an error if
they differ; ensure you use the symbols TaxonomyService.GetTree, s.repo.GetTree
(or new s.repo.GetTreeWithTenant / s.repo.GetRun) to locate and change the code.
- Around line 213-222: The GetRunInput method must verify the tenant boundary
before returning data: extract the tenant ID from the context (use your existing
helper like tenant.FromContext or auth.GetTenantIDFromContext), then either
fetch the run metadata via a repository method (e.g.,
s.repo.GetRunByID/GetRunMeta) and compare its TenantID to the context tenant, or
add a repo method like s.repo.VerifyRunTenant(ctx, runID, tenantID) to perform
the check; if the tenant does not match return an authorization error (create
ErrUnauthorizedTenant if needed) and only call s.repo.GetRunInput(ctx, runID,
s.embeddingModel) after the tenant is verified, keeping the existing
embeddingModel check (ErrTaxonomyEmbeddingsNotConfigured).
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a5803349-1f1f-4978-9a9f-6f721320f8b6

📥 Commits

Reviewing files that changed from the base of the PR and between c78b002 and 1c3edf3.

📒 Files selected for processing (8)
  • cmd/api/app.go
  • cmd/api/app_test.go
  • internal/api/handlers/taxonomy_handler.go
  • internal/api/handlers/taxonomy_internal_handler.go
  • internal/config/config.go
  • internal/models/taxonomy.go
  • internal/repository/taxonomy_repository.go
  • internal/service/taxonomy_service.go

Comment thread internal/api/handlers/taxonomy_handler.go
Comment thread internal/api/handlers/taxonomy_internal_handler.go
Comment thread internal/repository/taxonomy_repository.go Outdated
Comment thread internal/repository/taxonomy_repository.go Outdated
Comment thread internal/repository/taxonomy_repository.go Outdated
Comment thread internal/repository/taxonomy_repository.go Outdated
Comment thread internal/service/taxonomy_service.go Outdated
Comment thread internal/service/taxonomy_service.go
Comment thread internal/service/taxonomy_service.go
Comment thread internal/service/taxonomy_service.go
@BhagyaAmarasinghe BhagyaAmarasinghe marked this pull request as ready for review June 11, 2026 17:15

@xernobyl xernobyl left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BhagyaAmarasinghe BhagyaAmarasinghe added this pull request to the merge queue Jun 12, 2026
Merged via the queue into main with commit 4d165fb Jun 12, 2026
12 checks passed
@BhagyaAmarasinghe BhagyaAmarasinghe deleted the eng-1159-taxonomy-repository-api branch June 12, 2026 09:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants