feat: add taxonomy run APIs#85
Conversation
WalkthroughThis pull request introduces a complete taxonomy subsystem for organizing feedback into hierarchical classification trees. It adds taxonomy domain models (runs, nodes, clusters with status and type enums), a PostgreSQL repository layer with atomic run creation via advisory locks and transactional result storage, a service layer that validates configuration (embedding model, external starter), enforces minimum embedding thresholds, and orchestrates run lifecycle (pending → running → succeeded/failed). Public HTTP endpoints ( 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 12
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/api/handlers/taxonomy_handler.go`:
- Around line 144-158: GetTree currently returns tree data by run_id without
enforcing tenant boundaries; update GetTree to validate the requester's tenant
before calling or returning h.service.GetTree: extract tenant ID from the
request context/claims (same mechanism used by GetRun), fetch the run owner or
metadata (e.g., via h.service.GetRun or a new h.service.GetRunOwner/GetRunByID
helper) and compare it to the request tenant, and only call h.service.GetTree or
respond with a forbidden error if the tenants match; ensure you reuse
parseUUIDPathValue, h.service.GetTree, and respondTaxonomyError/standard
forbidden response so the check is performed prior to returning sensitive data.
In `@internal/api/handlers/taxonomy_internal_handler.go`:
- Around line 42-62: The internal endpoints GetRunInput, CompleteRun, and
FailRun currently act on runs by ID without tenant verification; update these
handlers (methods on TaxonomyInternalHandler) to require a tenant_id (e.g.,
query or JSON field) and validate it against the run's tenant when fetching the
run (use h.service.GetRunInput or add/consume a
h.service.GetRunMetadata/GetRunByID that returns TenantID) and return a 403 if
tenant_id does not match, or alternatively add a clear comment/logging and tests
if you intentionally mean cross-tenant behavior—ensure the change touches
GetRunInput, CompleteRun, FailRun and any service call used to fetch run
metadata so tenant enforcement is explicit.
In `@internal/repository/taxonomy_repository.go`:
- Around line 328-376: GetRunInput in TaxonomyRepository uses GetRun (which does
not enforce tenant scoping) leading to a privilege bypass; fix by enforcing
tenant verification when loading the run: either update GetRun to accept a
tenant ID and validate tenant (preferred) or add a new repository method (e.g.,
GetRunForTenant) and call that from GetRunInput, ensuring the run returned has
TenantID == run.TenantID (or matches the ctx tenant) before proceeding to query
feedback_records; update references to GetRun in TaxonomyRepository.GetRunInput
to use the tenant-checked method and return an error if the tenant does not
match.
- Around line 553-567: GetTree currently calls GetRun without enforcing tenant
constraints, allowing cross-tenant access; update GetTree in TaxonomyRepository
to verify the run belongs to the caller tenant (e.g., extract tenant from ctx or
use a tenant-aware method) before proceeding: replace or augment the GetRun call
with a tenant-checked lookup (use an existing GetRunWithTenant or add a tenant
filter) and return an authorization error if the tenant mismatch occurs, then
continue to call listVisibleNodes and buildTaxonomyTree only after the tenant
validation passes.
- Around line 236-255: MarkRunFailed updates a run by ID without enforcing
tenant boundaries; mirror the fix from MarkRunRunning by verifying tenant.
Modify MarkRunFailed to accept or extract the tenant UUID (same approach used by
MarkRunRunning), add a tenant condition to the UPDATE WHERE clause (e.g., "AND
tenant_id = $3" or equivalent positional param) and pass the tenant ID into the
call to queryTaxonomyRun so the query only updates runs belonging to that
tenant; keep taxonomyRunSelect and error handling unchanged except for using the
tenant-aware query.
- Around line 257-268: GetRun (and GetTree) currently fetch taxonomy runs by ID
without tenant scoping; update the handler (taxonomy_handler.go) to
require/obtain tenantID and pass it into the service call, update the service
method (taxonomy_service.go) signatures to accept tenantID and forward it to the
repository, and update repository methods GetRun and GetTree
(taxonomy_repository.go) to add tenant_id to their SQL WHERE clauses (e.g.,
"WHERE id = $1 AND tenant_id = $2") and accept the tenantID parameter, ensuring
you thread the tenantID through queryTaxonomyRun or equivalent functions so
every layer enforces tenant-based access control.
- Around line 215-234: The MarkRunRunning function currently updates a taxonomy
run by id without tenant scoping; change its signature to accept a tenantID
(uuid.UUID) and enforce tenant boundary by either (preferred) including
tenant_id = $2 in the UPDATE/WHERE so the UPDATE only affects runs belonging to
that tenant, or (alternatively) fetch the run with queryTaxonomyRun first and
verify run.TenantID equals the provided tenantID before performing the UPDATE;
ensure error handling returns a not-found or unauthorized error when tenant
mismatch occurs and update any callers of MarkRunRunning to pass the tenantID.
- Around line 378-551: StoreResultAndActivate currently locks a taxonomy run by
ID without verifying tenant ownership; retrieve the current tenant id from the
request context (the service's auth/claims) and change the initial
queryTaxonomyRun call that uses "FROM taxonomy_runs WHERE id = $1 FOR UPDATE" to
include "AND tenant_id = $2 FOR UPDATE" (or equivalent check) and pass the
tenant id; return a not-found or permission error if no row is returned so runs
from other tenants are never locked or processed. Ensure references:
StoreResultAndActivate, queryTaxonomyRun, and the taxonomy_runs WHERE id = $1
FOR UPDATE clause are updated accordingly.
In `@internal/service/taxonomy_service.go`:
- Around line 224-230: CompleteRun currently forwards to
s.repo.StoreResultAndActivate without verifying tenant boundaries; update
CompleteRun to first load the run (e.g., via s.repo.GetRun or s.repo.GetByID
using runID), extract the tenant ID from the request/context (use the same
tenant-extraction helper used by GetRunInput), compare the run.TenantID to the
caller tenant, and return a permission/forbidden error if they differ; only
after that, call s.repo.StoreResultAndActivate and propagate its result,
handling and returning errors consistently.
- Around line 232-243: FailRun must verify the run belongs to the caller's
tenant before marking it failed for defense-in-depth like GetRunInput and
CompleteRun; update FailRun to (1) fetch the run record (e.g., via s.repo.GetRun
or s.repo.GetRunByID) using runID, (2) obtain the caller tenant ID from the
request context using the same mechanism used by GetRunInput/CompleteRun (the
same auth/tenant extraction helper), (3) compare the run's TenantID to the
caller tenant and return a permission/forbidden error if they differ, and only
then proceed to sanitize the message and call s.repo.MarkRunFailed(ctx, runID,
sanitized).
- Around line 209-211: The TaxonomyService GetTree currently forwards directly
to s.repo.GetTree without tenant scoping; update TaxonomyService.GetTree to
extract the tenant (or workspace) identifier from the context (e.g., from auth
claims) and enforce it before returning results — either by calling a repository
method that accepts tenant (add s.repo.GetTreeWithTenant(ctx, runID, tenantID)
or change s.repo.GetTree signature) or by loading the run via s.repo.GetRun(ctx,
runID) and comparing its tenant to the context tenant and returning an error if
they differ; ensure you use the symbols TaxonomyService.GetTree, s.repo.GetTree
(or new s.repo.GetTreeWithTenant / s.repo.GetRun) to locate and change the code.
- Around line 213-222: The GetRunInput method must verify the tenant boundary
before returning data: extract the tenant ID from the context (use your existing
helper like tenant.FromContext or auth.GetTenantIDFromContext), then either
fetch the run metadata via a repository method (e.g.,
s.repo.GetRunByID/GetRunMeta) and compare its TenantID to the context tenant, or
add a repo method like s.repo.VerifyRunTenant(ctx, runID, tenantID) to perform
the check; if the tenant does not match return an authorization error (create
ErrUnauthorizedTenant if needed) and only call s.repo.GetRunInput(ctx, runID,
s.embeddingModel) after the tenant is verified, keeping the existing
embeddingModel check (ErrTaxonomyEmbeddingsNotConfigured).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: a5803349-1f1f-4978-9a9f-6f721320f8b6
📒 Files selected for processing (8)
cmd/api/app.gocmd/api/app_test.gointernal/api/handlers/taxonomy_handler.gointernal/api/handlers/taxonomy_internal_handler.gointernal/config/config.gointernal/models/taxonomy.gointernal/repository/taxonomy_repository.gointernal/service/taxonomy_service.go
Summary
Validation
make migrate-validatego test ./internal/...go test ./internal/service ./internal/repository ./internal/api/handlersgo test -run '^$' ./testsgolangci-lint run --timeout 10m ./...Notes
The local smoke currently uses the taxonomy service's temporary simple generator. POC-backed clustering/LLM 5-level taxonomy generation remains tracked separately on the Python taxonomy service tickets.