feat: add taxonomy schema migration#84
Conversation
WalkthroughThis PR introduces a complete PostgreSQL taxonomy schema to persist hierarchical taxonomy generation runs, clusters, membership relationships, and change history. The migration defines ENUM types for run status, node type, and event type, then creates six interdependent tables with cascading foreign keys, uniqueness constraints (including a partial unique index ensuring one root node per run), and CHECK constraints for data integrity. A comprehensive integration test validates constraint enforcement, relationship integrity, and cascade delete behavior across all schema tables when parent records are removed. 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@migrations/009_taxonomy_schema.sql`:
- Line 1: Update the goose annotation casing in the migration to match
repository guidelines: replace the annotation lines using "-- +goose Up" and "--
+goose Down" with the canonical lowercase "-- +goose up" and "-- +goose down"
(check both the top annotation currently shown as "-- +goose Up" and any other
occurrences such as the one referenced around the other occurrence), ensuring
all goose annotations in the file use lowercase "up" and "down".
- Around line 78-90: The membership row allows attaching any feedback_record by
id regardless of run/tenant; change the single-column FK on feedback_record_id
to a composite FK that also includes run_id so membership can only point to
feedback records in the same run/tenant. Replace the current "feedback_record_id
UUID NOT NULL REFERENCES feedback_records(id) ON DELETE CASCADE" and the
separate FOREIGN KEY line with a single composite foreign key definition: add
feedback_record_id UUID NOT NULL (no single-column REFERENCES) and then "FOREIGN
KEY (feedback_record_id, run_id) REFERENCES feedback_records(id, run_id) ON
DELETE CASCADE" in the taxonomy_cluster_memberships table definition so the row
is tenant-safe (referencing table: taxonomy_cluster_memberships, columns:
feedback_record_id and run_id; referenced table: feedback_records, columns: id
and run_id).
In `@tests/taxonomy_schema_test.go`:
- Around line 184-191: The test inserts into taxonomy_active_runs is currently
reusing runID so the failure may be a duplicate PK; change the second insert to
use a distinct run id (e.g., otherRunID) while keeping the same tenantID,
sourceType, sourceID, fieldID to exercise the “only one active run per directory
field” constraint; ensure otherRunID is generated/declared (distinct from runID)
and passed into the second db.Exec call instead of runID.
- Around line 90-107: Add a negative regression case in
tests/taxonomy_schema_test.go that verifies tenant-boundary protection for
taxonomy_cluster_memberships: create or obtain a feedback_record_id that belongs
to a different tenant (e.g., differentTenantFeedbackRecordID), then attempt the
INSERT using db.Exec with that feedback_record_id and the current
runID/clusterID and assert it fails (use require.Error like the existing
negative case). Ensure the test exercises the exact insertion path used earlier
(the INSERT into taxonomy_cluster_memberships with run_id, cluster_id,
feedback_record_id, confidence, metadata) and uses a clearly named variable
(differentTenantFeedbackRecordID) so reviewers can see it’s from another tenant.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 286a54bb-5b23-4903-999e-a146a98940a8
📒 Files selected for processing (2)
migrations/009_taxonomy_schema.sqltests/taxonomy_schema_test.go
|
Confirmed this was a real deployment risk: the previous Fixed in
This follows the same pattern already used in Hub migrations for large-table index work. |
Summary
feedback_records; memberships reference Hubfeedback_records.idUUIDs through tenant-bound constraints.tenant_id + source_type + source_id + field_id.Why
ENG-1159 replaces the POC's lazy/ad hoc taxonomy schema with first-class Hub-owned persistence. This unblocks the repository layer, Hub taxonomy APIs, and the Python service output contract.
How should this be tested?
Validation
make migrate-validategolangci-lint run --timeout 10m ./...go test -count=1 ./internal/config ./internal/repository ./internal/service ./internal/api/handlersgo test -run '^$' ./testsChecklist
Local note
The new DB-backed schema test compiles locally, but the actual test execution requires Postgres. Local Postgres/Docker was unavailable in this environment (
localhost:5432 connect: connection refused; Docker daemon unavailable), so GitHub CI exercised it with the repository's Postgres test service.