Skip to content

feat: add taxonomy schema migration#84

Merged
BhagyaAmarasinghe merged 4 commits into
mainfrom
eng-1159-taxonomy-schema
Jun 10, 2026
Merged

feat: add taxonomy schema migration#84
BhagyaAmarasinghe merged 4 commits into
mainfrom
eng-1159-taxonomy-schema

Conversation

@BhagyaAmarasinghe

@BhagyaAmarasinghe BhagyaAmarasinghe commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Add explicit goose migration for taxonomy runs, clusters, cluster memberships, taxonomy nodes, active runs, and node edit events.
  • Keep taxonomy artifacts run-scoped and separate from feedback_records; memberships reference Hub feedback_records.id UUIDs through tenant-bound constraints.
  • Scope active taxonomy runs by Feedback Directory field: tenant_id + source_type + source_id + field_id.
  • Add schema constraints for same-run taxonomy trees, one root per run, soft-remove fields, and rename/soft-remove audit events.
  • Address CodeRabbit review feedback for goose annotation casing, tenant-safe membership linkage, cross-tenant membership regression coverage, and active-run uniqueness coverage.

Why

ENG-1159 replaces the POC's lazy/ad hoc taxonomy schema with first-class Hub-owned persistence. This unblocks the repository layer, Hub taxonomy APIs, and the Python service output contract.

How should this be tested?

  • Run migration validation to verify the new goose migration applies and rolls back cleanly.
  • Run Hub linting and targeted Go test packages for compile and repository/API compatibility.
  • Run DB-backed integration tests against Postgres to verify taxonomy relationships, tenant boundaries, active-run uniqueness, and cascades.
  • Confirm GitHub CI passes for migration validation, code quality, unit tests, integration tests across pg16/pg17/pg18, coverage, API contract tests, and CodeRabbit.

Validation

  • make migrate-validate
  • golangci-lint run --timeout 10m ./...
  • go test -count=1 ./internal/config ./internal/repository ./internal/service ./internal/api/handlers
  • go test -run '^$' ./tests
  • GitHub CI passed: migration validation, code quality, unit tests, integration tests on pg16/pg17/pg18, coverage, API contract tests, and CodeRabbit.

Checklist

  • Added taxonomy schema migration.
  • Added DB schema regression tests.
  • Used canonical lowercase goose annotations.
  • Enforced tenant-safe taxonomy membership linkage.
  • Added cross-tenant membership rejection coverage.
  • Verified active-run uniqueness with a distinct run ID.
  • Confirmed required GitHub checks are passing.

Local note

The new DB-backed schema test compiles locally, but the actual test execution requires Postgres. Local Postgres/Docker was unavailable in this environment (localhost:5432 connect: connection refused; Docker daemon unavailable), so GitHub CI exercised it with the repository's Postgres test service.

@BhagyaAmarasinghe BhagyaAmarasinghe marked this pull request as ready for review June 8, 2026 17:04
@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

This PR introduces a complete PostgreSQL taxonomy schema to persist hierarchical taxonomy generation runs, clusters, membership relationships, and change history. The migration defines ENUM types for run status, node type, and event type, then creates six interdependent tables with cascading foreign keys, uniqueness constraints (including a partial unique index ensuring one root node per run), and CHECK constraints for data integrity. A comprehensive integration test validates constraint enforcement, relationship integrity, and cascade delete behavior across all schema tables when parent records are removed.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check ❓ Inconclusive The description covers the main objectives, motivation, and validation steps, but lacks the 'How should this be tested?' section with specific test instructions and does not explicitly tick off all required checklist items. Add a 'How should this be tested?' section with detailed test instructions, and provide an explicit checklist indicating which required items were completed.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly summarizes the main change: adding a taxonomy schema migration. It is concise, specific, and directly related to the changeset's primary objective.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@migrations/009_taxonomy_schema.sql`:
- Line 1: Update the goose annotation casing in the migration to match
repository guidelines: replace the annotation lines using "-- +goose Up" and "--
+goose Down" with the canonical lowercase "-- +goose up" and "-- +goose down"
(check both the top annotation currently shown as "-- +goose Up" and any other
occurrences such as the one referenced around the other occurrence), ensuring
all goose annotations in the file use lowercase "up" and "down".
- Around line 78-90: The membership row allows attaching any feedback_record by
id regardless of run/tenant; change the single-column FK on feedback_record_id
to a composite FK that also includes run_id so membership can only point to
feedback records in the same run/tenant. Replace the current "feedback_record_id
UUID NOT NULL REFERENCES feedback_records(id) ON DELETE CASCADE" and the
separate FOREIGN KEY line with a single composite foreign key definition: add
feedback_record_id UUID NOT NULL (no single-column REFERENCES) and then "FOREIGN
KEY (feedback_record_id, run_id) REFERENCES feedback_records(id, run_id) ON
DELETE CASCADE" in the taxonomy_cluster_memberships table definition so the row
is tenant-safe (referencing table: taxonomy_cluster_memberships, columns:
feedback_record_id and run_id; referenced table: feedback_records, columns: id
and run_id).

In `@tests/taxonomy_schema_test.go`:
- Around line 184-191: The test inserts into taxonomy_active_runs is currently
reusing runID so the failure may be a duplicate PK; change the second insert to
use a distinct run id (e.g., otherRunID) while keeping the same tenantID,
sourceType, sourceID, fieldID to exercise the “only one active run per directory
field” constraint; ensure otherRunID is generated/declared (distinct from runID)
and passed into the second db.Exec call instead of runID.
- Around line 90-107: Add a negative regression case in
tests/taxonomy_schema_test.go that verifies tenant-boundary protection for
taxonomy_cluster_memberships: create or obtain a feedback_record_id that belongs
to a different tenant (e.g., differentTenantFeedbackRecordID), then attempt the
INSERT using db.Exec with that feedback_record_id and the current
runID/clusterID and assert it fails (use require.Error like the existing
negative case). Ensure the test exercises the exact insertion path used earlier
(the INSERT into taxonomy_cluster_memberships with run_id, cluster_id,
feedback_record_id, confidence, metadata) and uses a clearly named variable
(differentTenantFeedbackRecordID) so reviewers can see it’s from another tenant.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 286a54bb-5b23-4903-999e-a146a98940a8

📥 Commits

Reviewing files that changed from the base of the PR and between 602726b and 0a5a238.

📒 Files selected for processing (2)
  • migrations/009_taxonomy_schema.sql
  • tests/taxonomy_schema_test.go

Comment thread migrations/009_taxonomy_schema.sql Outdated
Comment thread migrations/010_taxonomy_schema.sql
Comment thread tests/taxonomy_schema_test.go
Comment thread tests/taxonomy_schema_test.go
Comment thread migrations/010_taxonomy_schema.sql
@BhagyaAmarasinghe

Copy link
Copy Markdown
Contributor Author

Confirmed this was a real deployment risk: the previous ALTER TABLE feedback_records ADD CONSTRAINT ... UNIQUE (id, tenant_id) would build the backing unique index while holding a blocking table lock on feedback_records.

Fixed in bb8f6c0 by splitting the migration:

  • 009_feedback_records_id_tenant_unique_index.sql uses -- +goose NO TRANSACTION and CREATE UNIQUE INDEX CONCURRENTLY for feedback_records (id, tenant_id).
  • 010_taxonomy_schema.sql attaches that prebuilt index with UNIQUE USING INDEX before creating the taxonomy FK that references it.

This follows the same pattern already used in Hub migrations for large-table index work.

@pandeymangg pandeymangg left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@pandeymangg pandeymangg dismissed xernobyl’s stale review June 10, 2026 11:43

addressed changes

@BhagyaAmarasinghe BhagyaAmarasinghe added this pull request to the merge queue Jun 10, 2026
Merged via the queue into main with commit c78b002 Jun 10, 2026
12 checks passed
@BhagyaAmarasinghe BhagyaAmarasinghe deleted the eng-1159-taxonomy-schema branch June 10, 2026 11:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants