From b8c6f3ba670321de93a260ef4c3d49c45e7992ae Mon Sep 17 00:00:00 2001 From: Pepe Fagoaga Date: Thu, 12 Mar 2026 17:37:43 +0000 Subject: [PATCH] chore(skills): add Django migrations skills (#10260) --- AGENTS.md | 6 +- api/AGENTS.md | 10 + skills/django-migration-psql/SKILL.md | 454 ++++++++++++++++++++++++++ skills/postgresql-indexing/SKILL.md | 392 ++++++++++++++++++++++ 4 files changed, 860 insertions(+), 2 deletions(-) create mode 100644 skills/django-migration-psql/SKILL.md create mode 100644 skills/postgresql-indexing/SKILL.md diff --git a/AGENTS.md b/AGENTS.md index 600d353ef9..7eb2262917 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -46,6 +46,8 @@ Use these skills for detailed patterns on-demand: | `prowler-commit` | Professional commits (conventional-commits) | [SKILL.md](skills/prowler-commit/SKILL.md) | | `prowler-pr` | Pull request conventions | [SKILL.md](skills/prowler-pr/SKILL.md) | | `prowler-docs` | Documentation style guide | [SKILL.md](skills/prowler-docs/SKILL.md) | +| `django-migration-psql` | Django migration best practices for PostgreSQL | [SKILL.md](skills/django-migration-psql/SKILL.md) | +| `postgresql-indexing` | PostgreSQL indexing, EXPLAIN, monitoring, maintenance | [SKILL.md](skills/postgresql-indexing/SKILL.md) | | `prowler-attack-paths-query` | Create Attack Paths openCypher queries | [SKILL.md](skills/prowler-attack-paths-query/SKILL.md) | | `gh-aw` | GitHub Agentic Workflows (gh-aw) | [SKILL.md](skills/gh-aw/SKILL.md) | | `skill-creator` | Create new AI agent skills | [SKILL.md](skills/skill-creator/SKILL.md) | @@ -85,15 +87,15 @@ When performing these actions, ALWAYS invoke the corresponding skill FIRST: | Fixing bug | `tdd` | | General Prowler development questions | `prowler` | | Implementing JSON:API endpoints | `django-drf` | -| Importing Copilot Custom Agents into workflows | `gh-aw` | | Implementing feature | `tdd` | +| Importing Copilot Custom Agents into workflows | `gh-aw` | | Inspect PR CI checks and gates 
(.github/workflows/*) | `prowler-ci` | | Inspect PR CI workflows (.github/workflows/*): conventional-commit, pr-check-changelog, pr-conflict-checker, labeler | `prowler-pr` | | Mapping checks to compliance controls | `prowler-compliance` | | Mocking AWS with moto in tests | `prowler-test-sdk` | | Modifying API responses | `jsonapi` | -| Modifying gh-aw workflow frontmatter or safe-outputs | `gh-aw` | | Modifying component | `tdd` | +| Modifying gh-aw workflow frontmatter or safe-outputs | `gh-aw` | | Refactoring code | `tdd` | | Regenerate AGENTS.md Auto-invoke tables (sync.sh) | `skill-sync` | | Review PR requirements: template, title conventions, changelog gate | `prowler-pr` | diff --git a/api/AGENTS.md b/api/AGENTS.md index e1738b4a26..2bdee1b28c 100644 --- a/api/AGENTS.md +++ b/api/AGENTS.md @@ -4,6 +4,8 @@ > - [`prowler-api`](../skills/prowler-api/SKILL.md) - Models, Serializers, Views, RLS patterns > - [`prowler-test-api`](../skills/prowler-test-api/SKILL.md) - Testing patterns (pytest-django) > - [`prowler-attack-paths-query`](../skills/prowler-attack-paths-query/SKILL.md) - Attack Paths openCypher queries +> - [`django-migration-psql`](../skills/django-migration-psql/SKILL.md) - Migration best practices for PostgreSQL +> - [`postgresql-indexing`](../skills/postgresql-indexing/SKILL.md) - PostgreSQL indexing, EXPLAIN, monitoring, maintenance > - [`django-drf`](../skills/django-drf/SKILL.md) - Generic DRF patterns > - [`jsonapi`](../skills/jsonapi/SKILL.md) - Strict JSON:API v1.1 spec compliance > - [`pytest`](../skills/pytest/SKILL.md) - Generic pytest patterns @@ -16,14 +18,20 @@ When performing these actions, ALWAYS invoke the corresponding skill FIRST: |--------|-------| | Add changelog entry for a PR or feature | `prowler-changelog` | | Adding DRF pagination or permissions | `django-drf` | +| Adding indexes or constraints to database tables | `django-migration-psql` | | Adding privilege escalation detection queries | `prowler-attack-paths-query` | +| 
Analyzing query performance with EXPLAIN | `postgresql-indexing` | | Committing changes | `prowler-commit` | | Create PR that requires changelog entry | `prowler-changelog` | | Creating API endpoints | `jsonapi` | | Creating Attack Paths queries | `prowler-attack-paths-query` | | Creating ViewSets, serializers, or filters in api/ | `django-drf` | | Creating a git commit | `prowler-commit` | +| Creating or modifying PostgreSQL indexes | `postgresql-indexing` | +| Creating or reviewing Django migrations | `django-migration-psql` | | Creating/modifying models, views, serializers | `prowler-api` | +| Debugging slow queries or missing indexes | `postgresql-indexing` | +| Dropping or reindexing PostgreSQL indexes | `postgresql-indexing` | | Fixing bug | `tdd` | | Implementing JSON:API endpoints | `django-drf` | | Implementing feature | `tdd` | @@ -32,12 +40,14 @@ When performing these actions, ALWAYS invoke the corresponding skill FIRST: | Refactoring code | `tdd` | | Review changelog format and conventions | `prowler-changelog` | | Reviewing JSON:API compliance | `jsonapi` | +| Running makemigrations or pgmakemigrations | `django-migration-psql` | | Testing RLS tenant isolation | `prowler-test-api` | | Update CHANGELOG.md in any component | `prowler-changelog` | | Updating existing Attack Paths queries | `prowler-attack-paths-query` | | Working on task | `tdd` | | Writing Prowler API tests | `prowler-test-api` | | Writing Python tests with pytest | `pytest` | +| Writing data backfill or data migration | `django-migration-psql` | --- diff --git a/skills/django-migration-psql/SKILL.md b/skills/django-migration-psql/SKILL.md new file mode 100644 index 0000000000..d292036dd0 --- /dev/null +++ b/skills/django-migration-psql/SKILL.md @@ -0,0 +1,454 @@ +--- +name: django-migration-psql +description: > + Reviews Django migration files for PostgreSQL best practices specific to Prowler. 
+ Trigger: When creating migrations, running makemigrations/pgmakemigrations, reviewing migration PRs, + adding indexes or constraints to database tables, modifying existing migration files, or writing + data backfill migrations. Always use this skill when you see AddIndex, CreateModel, AddConstraint, + RunPython, bulk_create, bulk_update, or backfill operations in migration files. +license: Apache-2.0 +metadata: + author: prowler-cloud + version: "1.0" + scope: [api, root] + auto_invoke: + - "Creating or reviewing Django migrations" + - "Adding indexes or constraints to database tables" + - "Running makemigrations or pgmakemigrations" + - "Writing data backfill or data migration" +allowed-tools: Read, Grep, Glob, Edit, Write, Bash +--- + +## When to use + +- Creating a new Django migration +- Running `makemigrations` or `pgmakemigrations` +- Reviewing a PR that adds or modifies migrations +- Adding indexes, constraints, or models to the database + +## Why this matters + +A bad migration can lock a production table for minutes, block all reads/writes, or silently skip index creation on partitioned tables. + +## Auto-generated migrations need splitting + +`makemigrations` and `pgmakemigrations` bundle everything into one file: `CreateModel`, `AddIndex`, `AddConstraint`, sometimes across multiple tables. This is the default Django behavior and it violates every rule below. + +After generating a migration, ALWAYS review it and split it: + +1. Read the generated file and identify every operation +2. Group operations by concern: + - `CreateModel` + `AddConstraint` for each new table → one migration per table + - `AddIndex` per table → one migration per table + - `AddIndex` on partitioned tables → two migrations (partition + parent) + - `AlterField`, `AddField`, `RemoveField` for each table → one migration per table +3. Rewrite the generated file into separate migration files with correct dependencies +4. 
Delete the original auto-generated migration + +When adding fields or indexes to an existing model, `makemigrations` may also bundle `AddIndex` for unrelated tables that had pending model changes. Always check for stowaways from other tables. + +## Rule 1: separate indexes from model creation + +`CreateModel` + `AddConstraint` = same migration (structural). +`AddIndex` = separate migration file (performance). + +Django runs each migration inside a transaction (unless `atomic = False`). If an index operation fails, it rolls back everything, including the model creation. Splitting means a failed index doesn't prevent the table from existing. It also lets you `--fake` index migrations independently (see Rule 4). + +### Bad + +```python +# 0081_finding_group_daily_summary.py — DON'T DO THIS +class Migration(migrations.Migration): + operations = [ + migrations.CreateModel(name="FindingGroupDailySummary", ...), + migrations.AddIndex(model_name="findinggroupdailysummary", ...), # separate this + migrations.AddIndex(model_name="findinggroupdailysummary", ...), # separate this + migrations.AddConstraint(model_name="findinggroupdailysummary", ...), # this is fine here + ] +``` + +### Good + +```python +# 0081_create_finding_group_daily_summary.py +class Migration(migrations.Migration): + operations = [ + migrations.CreateModel(name="FindingGroupDailySummary", ...), + # Constraints belong with the model — they define its integrity rules + migrations.AddConstraint(model_name="findinggroupdailysummary", ...), # unique + migrations.AddConstraint(model_name="findinggroupdailysummary", ...), # RLS + ] + +# 0082_finding_group_daily_summary_indexes.py +class Migration(migrations.Migration): + dependencies = [("api", "0081_create_finding_group_daily_summary")] + operations = [ + migrations.AddIndex(model_name="findinggroupdailysummary", ...), + migrations.AddIndex(model_name="findinggroupdailysummary", ...), + migrations.AddIndex(model_name="findinggroupdailysummary", ...), + ] +``` 
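A reviewer check for this rule can be scripted; a minimal sketch that treats the migration file as plain text (textual matching only, no Django introspection — the function name and sample snippets are illustrative):

```python
def violates_rule_1(migration_source: str) -> bool:
    """Flag migrations that mix CreateModel (structural) with AddIndex (performance)."""
    return (
        "migrations.CreateModel" in migration_source
        and "migrations.AddIndex" in migration_source
    )


# Hypothetical operation lists, abbreviated the same way the examples above are
bad = 'operations = [migrations.CreateModel(...), migrations.AddIndex(...)]'
good = 'operations = [migrations.CreateModel(...), migrations.AddConstraint(...)]'
```

A textual check like this is coarse (it cannot tell which table each operation touches), but it is enough to catch the common case of a freshly generated migration bundling both concerns.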
Flag any migration with both `CreateModel` and `AddIndex` in `operations`.

## Rule 2: one table's indexes per migration

Each table's indexes must live in their own migration file. Never mix `AddIndex` for different `model_name` values in one migration.

If the index on table B fails, the rollback also drops the index on table A. The migration name gives no hint that it touches unrelated tables. You lose the ability to `--fake` one table's indexes without affecting the other.

### Bad

```python
# 0081_finding_group_daily_summary.py — DON'T DO THIS
class Migration(migrations.Migration):
    operations = [
        migrations.CreateModel(name="FindingGroupDailySummary", ...),
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),  # table A
        migrations.AddIndex(model_name="resource", ...),  # table B!
        migrations.AddIndex(model_name="resource", ...),  # table B!
        migrations.AddIndex(model_name="finding", ...),  # table C!
    ]
```

### Good

```python
# 0081_create_finding_group_daily_summary.py — model + constraints
# 0082_finding_group_daily_summary_indexes.py — only FindingGroupDailySummary indexes
# 0083_resource_trigram_indexes.py — only Resource indexes
# 0084_finding_check_index_partitions.py — only Finding partition indexes (step 1)
# 0085_finding_check_index_parent.py — only Finding parent index (step 2)
```

Name each migration file after the table it affects. A reviewer should know which table a migration touches without opening the file.

Flag any migration where `AddIndex` operations reference more than one `model_name`.

## Rule 3: partitioned table indexes require the two-step pattern

Tables `findings` and `resource_finding_mappings` are range-partitioned. Plain `AddIndex` runs a non-concurrent `CREATE INDEX` on the parent: PostgreSQL builds the missing partition indexes inline, holding locks that block writes on every partition, and `CREATE INDEX CONCURRENTLY` is not supported on partitioned parents. New partitions inherit the definition automatically, but indexing the existing data this way locks the tables.

Use the helpers in `api.db_utils`.
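Conceptually, such a helper builds one `CREATE INDEX CONCURRENTLY` statement per partition; a simplified, database-free sketch (the partition list would really be discovered via `pg_inherits`, and this is not the actual `api.db_utils` signature):

```python
def partition_index_statements(index_name, columns, partitions):
    """Build one concurrent, idempotent CREATE INDEX statement per partition.

    `partitions` stands in for the child table names a real helper would
    discover by querying pg_inherits for the parent table.
    """
    return [
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {partition}_{index_name} "
        f"ON {partition} USING BTREE ({columns});"
        for partition in partitions
    ]
```

The real helpers also execute each statement outside a transaction and validate the result; the sketch only shows the per-partition naming scheme (`<partition>_<index_name>`) used throughout the examples below.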
+ +### Step 1: create indexes on actual partitions + +```python +# 0084_finding_check_index_partitions.py +from functools import partial +from django.db import migrations +from api.db_utils import create_index_on_partitions, drop_index_on_partitions + + +class Migration(migrations.Migration): + atomic = False # REQUIRED — CREATE INDEX CONCURRENTLY can't run inside a transaction + + dependencies = [("api", "0083_resource_trigram_indexes")] + + operations = [ + migrations.RunPython( + partial( + create_index_on_partitions, + parent_table="findings", + index_name="find_tenant_check_ins_idx", + columns="tenant_id, check_id, inserted_at", + ), + reverse_code=partial( + drop_index_on_partitions, + parent_table="findings", + index_name="find_tenant_check_ins_idx", + ), + ) + ] +``` + +Key details: +- `atomic = False` is mandatory. `CREATE INDEX CONCURRENTLY` cannot run inside a transaction. +- Always provide `reverse_code` using `drop_index_on_partitions` so rollbacks work. +- The default is `all_partitions=True`, which creates indexes on every partition CONCURRENTLY (no locks). This is the safe default. +- Do NOT use `all_partitions=False` unless you understand the consequence: Step 2's `AddIndex` on the parent will create indexes on the skipped partitions **with locks** (not CONCURRENTLY), because PostgreSQL fills in missing partition indexes inline during parent index creation. + +### Step 2: register the index with Django + +```python +# 0085_finding_check_index_parent.py +from django.db import migrations, models + + +class Migration(migrations.Migration): + dependencies = [("api", "0084_finding_check_index_partitions")] + + operations = [ + migrations.AddIndex( + model_name="finding", + index=models.Index( + fields=["tenant_id", "check_id", "inserted_at"], + name="find_tenant_check_ins_idx", + ), + ), + ] +``` + +This second migration tells Django "this index exists" so it doesn't try to recreate it. 
New partitions created after this point inherit the index definition from the parent. + +### Existing examples in the codebase + +| Partition migration | Parent migration | +|---|---| +| `0020_findings_new_performance_indexes_partitions.py` | `0021_findings_new_performance_indexes_parent.py` | +| `0024_findings_uid_index_partitions.py` | `0025_findings_uid_index_parent.py` | +| `0028_findings_check_index_partitions.py` | `0029_findings_check_index_parent.py` | +| `0036_rfm_tenant_finding_index_partitions.py` | `0037_rfm_tenant_finding_index_parent.py` | + +Flag any plain `AddIndex` on `finding` or `resourcefindingmapping` without a preceding partition migration. + +## Rule 4: large table indexes — fake the migration, apply manually + +For huge tables (findings has millions of rows), even `CREATE INDEX CONCURRENTLY` can take minutes and consume significant I/O. In production, you may want to decouple the migration from the actual index creation. + +### Procedure + +1. Write the migration normally following the two-step pattern above. + +2. Fake the migration so Django marks it as applied without executing it: + +```bash +python manage.py migrate api 0084_finding_check_index_partitions --fake +python manage.py migrate api 0085_finding_check_index_parent --fake +``` + +3. Create the index manually during a low-traffic window via `psql` or `python manage.py dbshell --database admin`: + +```sql +-- For each partition you care about: +CREATE INDEX CONCURRENTLY IF NOT EXISTS findings_2026_jan_find_tenant_check_ins_idx + ON findings_2026_jan USING BTREE (tenant_id, check_id, inserted_at); + +CREATE INDEX CONCURRENTLY IF NOT EXISTS findings_2026_feb_find_tenant_check_ins_idx + ON findings_2026_feb USING BTREE (tenant_id, check_id, inserted_at); + +-- Then register on the parent (this is fast, no data scan): +CREATE INDEX IF NOT EXISTS find_tenant_check_ins_idx + ON findings USING BTREE (tenant_id, check_id, inserted_at); +``` + +4. 
Verify the index exists on the partitions you need:

```sql
SELECT indexrelid::regclass, indrelid::regclass
FROM pg_index
WHERE indexrelid::regclass::text LIKE '%find_tenant_check_ins%';
```

### When to use this approach

- The table is very large or growing fast (e.g., findings).
- You want to control exactly when the I/O hit happens (e.g., during a maintenance window).

This is optional. For smaller tables or non-production environments, letting the migration run normally is fine.

## Rule 5: data backfills — never inline, always batched

Data backfills (updating existing rows, populating new columns, generating summary data) are the most dangerous migrations. A naive `Model.objects.all().update(...)` on a multi-million row table will hold a transaction lock for minutes, blow out WAL, and potentially OOM the worker.

### Never backfill inline in the migration

The migration should only dispatch the work. The actual backfill runs asynchronously via Celery tasks, outside the migration transaction.

```python
# 0090_backfill_finding_group_summaries.py
from django.db import migrations


def trigger_backfill(apps, schema_editor):
    from tasks.jobs.backfill import backfill_finding_group_summaries_task
    from api.db_router import MainRouter

    Tenant = apps.get_model("api", "Tenant")
    tenant_ids = Tenant.objects.using(MainRouter.admin_db).values_list("id", flat=True)
    for tenant_id in tenant_ids:
        backfill_finding_group_summaries_task.delay(tenant_id=str(tenant_id))


class Migration(migrations.Migration):
    dependencies = [("api", "0089_previous_migration")]
    operations = [
        migrations.RunPython(trigger_backfill, migrations.RunPython.noop),
    ]
```

The migration finishes in seconds. The backfill runs in the background per-tenant.
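Underneath helpers like `create_objects_in_batches`, the core primitive is fixed-size chunking of a lazy iterable; a minimal sketch (the real helpers additionally wrap each chunk in `rls_transaction` and call `bulk_create`):

```python
from itertools import islice


def chunked(iterable, batch_size=500):
    """Yield lists of at most batch_size items without materializing the input."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

Because the input is consumed through `iter()`, this works on querysets read via `.iterator()` as well as plain lists, and memory stays bounded by `batch_size`.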
+ +### Exception: trivial updates + +Single-statement bulk updates on small result sets are OK inline: + +```python +# Fine — single UPDATE, small result set, no iteration +def backfill_graph_data_ready(apps, schema_editor): + AttackPathsScan = apps.get_model("api", "AttackPathsScan") + AttackPathsScan.objects.using(MainRouter.admin_db).filter( + state="completed", graph_data_ready=False, + ).update(graph_data_ready=True) +``` + +Use inline only when you're confident the affected row count is small (< ~10K rows). + +### Batch processing in the Celery task + +The actual backfill task must process data in batches. Use the helpers in `api.db_utils`: + +```python +from api.db_utils import create_objects_in_batches, update_objects_in_batches, batch_delete + +# Creating objects in batches (500 per transaction) +create_objects_in_batches(tenant_id, ScanCategorySummary, summaries, batch_size=500) + +# Updating objects in batches +update_objects_in_batches(tenant_id, Finding, findings, fields=["status"], batch_size=500) + +# Deleting in batches +batch_delete(tenant_id, queryset, batch_size=settings.DJANGO_DELETION_BATCH_SIZE) +``` + +Each batch runs in its own `rls_transaction()` so: +- A failure in batch N doesn't roll back batches 1 through N-1 +- Lock duration is bounded to the batch size +- Memory stays constant regardless of total row count + +### Rules for backfill tasks + +1. **One RLS transaction per batch.** Never wrap the entire backfill in a single transaction. Each batch gets its own `rls_transaction(tenant_id)`. + +2. **Use `bulk_create` / `bulk_update` with explicit `batch_size`.** Never `.save()` in a loop. The default batch_size is 500. + +3. **Use `.iterator()` for reads.** When reading source data, use `queryset.iterator()` to avoid loading the entire result set into memory. + +4. **Use `.only()` / `.values_list()` for reads.** Fetch only the columns you need, not full model instances. + +5. 
**Catch and skip per-item failures.** Don't let one bad row kill the entire backfill. Log the error, count it, continue.

```python
scans_processed = 0
scans_skipped = 0

for scan_id in scan_ids:
    try:
        result = process_scan(tenant_id, scan_id)
        scans_processed += 1
    except Exception:
        logger.warning("Failed to process scan %s", scan_id)
        scans_skipped += 1

logger.info("Backfill done: %d processed, %d skipped", scans_processed, scans_skipped)
```

6. **Log totals at start and end, not per-batch.** Per-batch logging floods the logs. Log the total count at the start, and the processed/skipped counts at the end.

7. **Use `ignore_conflicts=True` for idempotent creates.** Makes the backfill safe to re-run if interrupted.

```python
Model.objects.bulk_create(objects, batch_size=500, ignore_conflicts=True)
```

8. **Iterate per-tenant.** Dispatch one Celery task per tenant. This gives you natural parallelism, bounded memory per task, and the ability to retry a single tenant without re-running everything.

### Existing examples

| Migration | Task |
|---|---|
| `0062_backfill_daily_severity_summaries.py` | `backfill_daily_severity_summaries_task` |
| `0080_backfill_attack_paths_graph_data_ready.py` | Inline (trivial update) |
| `0082_backfill_finding_group_summaries.py` | `backfill_finding_group_summaries_task` |

Task implementations: `tasks/jobs/backfill.py`
Batch utilities: `api/db_utils.py` (`batch_delete`, `create_objects_in_batches`, `update_objects_in_batches`)

## Decision tree

```
Auto-generated migration?
├── Yes → Split it following the rules below
└── No → Review it against the rules below

New model?
├── Yes → CreateModel + AddConstraint in one migration
│         AddIndex in separate migration(s), one per table
├── No, just indexes?
│   ├── Regular table → AddIndex in its own migration
│   └── Partitioned table (findings, resource_finding_mappings)?
│       ├── Step 1: RunPython + create_index_on_partitions (atomic=False)
│       └── Step 2: AddIndex on parent (separate migration)
│           └── Large table? → Consider --fake + manual apply
└── Data backfill?
    ├── Trivial update (< ~10K rows)? → Inline RunPython is OK
    └── Large backfill? → Migration dispatches Celery task(s)
        ├── One task per tenant
        ├── Batch processing (bulk_create/bulk_update, batch_size=500)
        ├── One rls_transaction per batch
        └── Catch + skip per-item failures, log totals
```

## Quick reference

| Scenario | Approach |
|---|---|
| Auto-generated migration | Split by concern and table before committing |
| New model + constraints/RLS | Same migration (constraints are structural) |
| Indexes on a regular table | Separate migration, one table per file |
| Indexes on a partitioned table | Two migrations: partitions first (`RunPython` + `atomic=False`), then parent (`AddIndex`) |
| Index on a huge partitioned table | Same two migrations, but fake + apply manually in production |
| Trivial data backfill (< ~10K rows) | Inline `RunPython` with single `.update()` call |
| Large data backfill | Migration dispatches Celery task per tenant, task batches with `rls_transaction` |

## Review output format

1. List each violation with rule number and one-line explanation
2. Show corrected migration file(s)
3. For partitioned tables, show both partition and parent migrations

If the migration passes all checks, say so.

## Context7 lookups

**Prerequisite:** Install the Context7 MCP server for up-to-date documentation lookup.
+ +When implementing or debugging migration patterns, query these libraries via `mcp_context7_query-docs`: + +| Library | Context7 ID | Use for | +|---------|-------------|---------| +| Django 5.1 | `/websites/djangoproject_en_5_1` | Migration operations, indexes, constraints, `SchemaEditor` | +| PostgreSQL | `/websites/postgresql_org_docs_current` | `CREATE INDEX CONCURRENTLY`, partitioned tables, `pg_inherits` | +| django-postgres-extra | `/SectorLabs/django-postgres-extra` | Partitioned models, `PostgresPartitionedModel`, partition management | + +**Example queries:** +``` +mcp_context7_query-docs(libraryId="/websites/djangoproject_en_5_1", query="migration operations AddIndex RunPython atomic") +mcp_context7_query-docs(libraryId="/websites/djangoproject_en_5_1", query="database indexes Meta class concurrently") +mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="CREATE INDEX CONCURRENTLY partitioned table") +mcp_context7_query-docs(libraryId="/SectorLabs/django-postgres-extra", query="partitioned model range partition index") +``` + +> **Note:** Use `mcp_context7_resolve-library-id` first if you need to find the correct library ID. 
+ +## Commands + +```bash +# Generate migrations (ALWAYS review output before committing) +python manage.py makemigrations +python manage.py pgmakemigrations + +# Apply migrations +python manage.py migrate + +# Fake a migration (mark as applied without running) +python manage.py migrate api --fake + +# Manage partitions +python manage.py pgpartition --using admin +``` + +## Resources + +- **Partition helpers**: `api/src/backend/api/db_utils.py` (`create_index_on_partitions`, `drop_index_on_partitions`) +- **Partition config**: `api/src/backend/api/partitions.py` +- **RLS constraints**: `api/src/backend/api/rls.py` +- **Existing examples**: `0028` + `0029`, `0024` + `0025`, `0036` + `0037` diff --git a/skills/postgresql-indexing/SKILL.md b/skills/postgresql-indexing/SKILL.md new file mode 100644 index 0000000000..7fac9f4ecd --- /dev/null +++ b/skills/postgresql-indexing/SKILL.md @@ -0,0 +1,392 @@ +--- +name: postgresql-indexing +description: > + PostgreSQL indexing best practices for Prowler: index design, partial indexes, partitioned table + indexing, EXPLAIN ANALYZE validation, concurrent operations, monitoring, and maintenance. + Trigger: When creating or modifying PostgreSQL indexes, analyzing query performance with EXPLAIN, + debugging slow queries, reviewing index usage statistics, reindexing, dropping indexes, or working + with partitioned table indexes. Also trigger when discussing index strategies, partial indexes, + or index maintenance operations like VACUUM or ANALYZE. 
+license: Apache-2.0 +metadata: + author: prowler-cloud + version: "1.0" + scope: [api] + auto_invoke: + - "Creating or modifying PostgreSQL indexes" + - "Analyzing query performance with EXPLAIN" + - "Debugging slow queries or missing indexes" + - "Dropping or reindexing PostgreSQL indexes" +allowed-tools: Read, Grep, Glob, Bash +--- + +## When to use + +- Creating or modifying PostgreSQL indexes +- Analyzing query plans with `EXPLAIN` +- Debugging slow queries or missing index usage +- Dropping, reindexing, or validating indexes +- Working with indexes on partitioned tables (findings, resource_finding_mappings) +- Running VACUUM or ANALYZE after index changes + +## Index design + +### Partial indexes: constant columns go in WHERE, not in the key + +When a column has a fixed value for the query (e.g., `state = 'completed'`), put it in the `WHERE` clause of the index, not in the indexed columns. Otherwise the planner cannot exploit the ordering of the other columns. + +```sql +-- Bad: state in the key wastes space and breaks ordering +CREATE INDEX idx_scans_tenant_state ON scans (tenant_id, state, inserted_at DESC); + +-- Good: state as a filter, planner uses tenant_id + inserted_at ordering +CREATE INDEX idx_scans_tenant_ins_completed ON scans (tenant_id, inserted_at DESC) + WHERE state = 'completed'; +``` + +### Column order matters + +Put high-selectivity columns first (columns that filter out the most rows). For composite indexes, the leftmost column must appear in the query's WHERE clause for the index to be used. + +## Validating index effectiveness + +### Always EXPLAIN (ANALYZE, BUFFERS) after adding indexes + +Never assume an index is being used. Run `EXPLAIN (ANALYZE, BUFFERS)` to confirm. + +```sql +EXPLAIN (ANALYZE, BUFFERS) +SELECT * +FROM users +WHERE email = 'user@example.com'; +``` + +Use [Postgres EXPLAIN Visualizer (pev)](https://tatiyants.com/pev/) to visualize query plans and identify bottlenecks. 
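In automated checks, the same validation can be asserted on the plan text; a sketch (the plan string below is illustrative — in practice it would come from running `EXPLAIN` through your database driver):

```python
def plan_uses_index(plan_text, index_name=None):
    """Heuristic check that an EXPLAIN plan uses an index (optionally a named one)."""
    scan_types = ("Index Scan", "Index Only Scan", "Bitmap Index Scan")
    if not any(scan in plan_text for scan in scan_types):
        return False
    return index_name is None or index_name in plan_text


# Illustrative plan line, in the shape EXPLAIN prints for an index scan
plan = "Index Scan using users_email_idx on users  (cost=0.42..8.44 rows=1 width=72)"
```

String matching like this is deliberately crude, but it is enough to fail a CI check when a query that should hit an index falls back to a sequential scan.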
+ +### Force index usage for testing + +The planner may choose a sequential scan on small datasets. Toggle `enable_seqscan = off` to confirm the index path works, then re-enable it. + +```sql +SET enable_seqscan = off; + +EXPLAIN (ANALYZE, BUFFERS) +SELECT DISTINCT ON (provider_id) provider_id +FROM scans +WHERE tenant_id = '95383b24-da01-44b5-a713-0d9920d554db' + AND state = 'completed' +ORDER BY provider_id, inserted_at DESC; + +SET enable_seqscan = on; -- always re-enable after testing +``` + +This is for validation only. Never leave `enable_seqscan = off` in production. + +## Over-indexing + +Every extra index has three costs that compound: + +1. **Write overhead.** Every INSERT and UPDATE must maintain all indexes. Extra indexes also kill HOT (Heap-Only-Tuple) updates, which normally skip index maintenance when unindexed columns change. + +2. **Planning time.** The planner evaluates more execution paths per index. On simple OLTP queries, planning time can exceed execution time by 4x when index count is high. + +3. **Lock contention (fastpath limit).** PostgreSQL uses a fast path for the first 16 locks per backend. After 16 relations (table + its indexes), it falls back to slower LWLock mechanisms. At high QPS (100+), this causes `LockManager` wait events. + +Rules: +- Drop unused and redundant indexes regularly +- Be especially careful with partitioned tables (each partition multiplies the index count) +- Use prepared statements to reduce planning overhead when index count is high + +## Finding redundant indexes + +Two indexes are redundant when: +- They have the same columns in the same order (duplicates) +- One is a prefix of the other: index `(a)` is redundant to `(a, b)`, but NOT to `(b, a)` + +Column order matters. For partial indexes, the WHERE clause must also match. 
+ +```sql +-- Quick check: find indexes that share a leading column on the same table +SELECT + a.indrelid::regclass AS table_name, + a.indexrelid::regclass AS index_a, + b.indexrelid::regclass AS index_b, + pg_size_pretty(pg_relation_size(a.indexrelid)) AS size_a, + pg_size_pretty(pg_relation_size(b.indexrelid)) AS size_b +FROM pg_index a +JOIN pg_index b ON a.indrelid = b.indrelid + AND a.indexrelid != b.indexrelid + AND a.indkey::text = ( + SELECT string_agg(x::text, ' ') + FROM unnest(b.indkey[:array_length(a.indkey, 1)]) AS x + ) +WHERE NOT a.indisunique; +``` + +Before dropping: verify on all workload nodes (primary + replicas), use `DROP INDEX CONCURRENTLY`, and monitor for plan regressions. + +## Monitoring index usage + +### Identify unused indexes + +Query `pg_stat_all_indexes` to find indexes that are never or rarely scanned: + +```sql +SELECT + idxstat.schemaname AS schema_name, + idxstat.relname AS table_name, + idxstat.indexrelname AS index_name, + idxstat.idx_scan AS index_scans_count, + idxstat.last_idx_scan AS last_idx_scan_timestamp, + pg_size_pretty(pg_relation_size(idxstat.indexrelid)) AS index_size +FROM pg_stat_all_indexes AS idxstat +JOIN pg_index i ON idxstat.indexrelid = i.indexrelid +WHERE idxstat.schemaname NOT IN ('pg_catalog', 'information_schema', 'pg_toast') + AND NOT i.indisunique +ORDER BY idxstat.idx_scan ASC, idxstat.last_idx_scan ASC; +``` + +Indexes with `idx_scan = 0` and no recent `last_idx_scan` are candidates for removal. 
+ +Before dropping, verify: +- Stats haven't been reset recently (check `stats_reset` in `pg_stat_database`) +- Stats cover at least 1 month of production traffic +- All workload nodes (primary + replicas) have been checked +- The index isn't used by a periodic job that runs infrequently + +```sql +-- Check when stats were last reset +SELECT stats_reset, age(now(), stats_reset) +FROM pg_stat_database +WHERE datname = current_database(); +``` + +### Monitor index creation progress + +Do not assume index creation succeeded. Use `pg_stat_progress_create_index` (Postgres 12+) to watch progress live: + +```sql +SELECT * FROM pg_stat_progress_create_index; +``` + +In psql, use `\watch 5` to refresh every 5 seconds for a live dashboard view. `CREATE INDEX CONCURRENTLY` and `REINDEX CONCURRENTLY` have more phases than standard operations: monitor for blocking sessions and wait events. + +### Validate index integrity + +Check for invalid indexes regularly: + +```sql +SELECT c.relname AS index_name, i.indisvalid +FROM pg_class c +JOIN pg_index i ON i.indexrelid = c.oid +WHERE i.indisvalid = false; +``` + +Invalid indexes are ignored by the planner. They waste space and cause inconsistent query performance, especially on partitioned tables where some partitions may have valid indexes and others do not. + +## Concurrent operations + +### Always use CONCURRENTLY in production + +Never create or drop indexes without `CONCURRENTLY` on live tables. Without it, the operation holds a lock that blocks all writes. + +```sql +-- Create +CREATE INDEX CONCURRENTLY IF NOT EXISTS index_name ON table_name (column_name); + +-- Drop +DROP INDEX CONCURRENTLY IF EXISTS index_name; +``` + +`DROP INDEX CONCURRENTLY` cannot run inside a transaction block. + +### Always use IF NOT EXISTS / IF EXISTS + +Makes scripts idempotent. Safe to re-run without errors from duplicate or missing indexes. 
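Both habits combine naturally when index DDL is generated by script; a small illustrative helper (not an existing Prowler utility) that emits an idempotent, concurrent create/drop pair:

```python
def index_ddl(index_name, table, columns, where=None):
    """Return idempotent CREATE/DROP statements using CONCURRENTLY."""
    create = (
        f"CREATE INDEX CONCURRENTLY IF NOT EXISTS {index_name} "
        f"ON {table} ({columns})"
    )
    if where:
        create += f" WHERE {where}"  # partial index predicate
    drop = f"DROP INDEX CONCURRENTLY IF EXISTS {index_name}"
    return f"{create};", f"{drop};"
```

Generating both statements together keeps rollback scripts in lockstep with the forward DDL, and the `IF NOT EXISTS` / `IF EXISTS` guards make either direction safe to re-run. Remember that the `DROP ... CONCURRENTLY` statement still cannot run inside a transaction block.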
+
+### Concurrent indexing can fail and leave invalid indexes
+
+`CREATE INDEX CONCURRENTLY` can fail partway through (deadlock, unique-constraint violation, cancellation), and the error is easy to miss in migration logs. The result is an invalid index that the planner ignores. This is particularly dangerous on partitioned tables: some partitions get valid indexes, others don't, causing inconsistent query performance.
+
+After any concurrent index creation, always validate:
+
+```sql
+SELECT c.relname, i.indisvalid
+FROM pg_class c
+JOIN pg_index i ON i.indexrelid = c.oid
+WHERE c.relname LIKE '%your_index_name%';
+```
+
+## Reindexing invalid indexes
+
+Rebuild invalid indexes without locking writes:
+
+```sql
+REINDEX INDEX CONCURRENTLY index_name;
+```
+
+### Understanding _ccnew and _ccold artifacts
+
+When `CREATE INDEX CONCURRENTLY` or `REINDEX INDEX CONCURRENTLY` is interrupted, temporary indexes may remain:
+
+| Suffix | Meaning | Action |
+|--------|---------|--------|
+| `_ccnew` | New index being built, incomplete | Drop it and retry `REINDEX CONCURRENTLY` |
+| `_ccold` | Old index being replaced, rebuild succeeded | Safe to drop |
+
+```sql
+-- Example: both original and temp are invalid
+-- users_emails_2019 btree (col) INVALID
+-- users_emails_2019_ccnew btree (col) INVALID
+
+-- Drop the failed new one, then retry
+DROP INDEX CONCURRENTLY IF EXISTS users_emails_2019_ccnew;
+REINDEX INDEX CONCURRENTLY users_emails_2019;
+```
+
+These leftovers clutter the schema, confuse developers, and waste disk space. Clean them up.
+
+## Indexing partitioned tables
+
+### Do NOT use ALTER INDEX ATTACH PARTITION
+
+As the PostgreSQL documentation notes, `ALTER INDEX ... ATTACH PARTITION` prevents dropping malfunctioning or non-performant indexes from individual partitions: an attached index cannot be dropped by itself and is only dropped automatically when its parent index is dropped. 
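+
+For reference, the pattern being warned against looks like this (hypothetical names):
+
+```sql
+-- Anti-pattern: avoid on tables we manage per-partition
+CREATE INDEX idx_parent ON ONLY parent_table (column_name);  -- parent index, INVALID until all partitions attach
+CREATE INDEX CONCURRENTLY idx_child ON child_partition (column_name);
+ALTER INDEX idx_parent ATTACH PARTITION idx_child;           -- idx_child can now only be dropped via idx_parent
+```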
+
+This removes the ability to manage indexes per-partition, which we need for:
+- Dropping broken indexes on specific partitions
+- Skipping indexes on old partitions to save storage
+- Rebuilding indexes on individual partitions without affecting others
+
+### Correct approach: create on partitions, then on parent
+
+1. Create the index on each child partition concurrently:
+
+```sql
+CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_child_partition
+    ON child_partition (column_name);
+```
+
+2. Create the index on the parent table (metadata-only, fast):
+
+```sql
+CREATE INDEX IF NOT EXISTS idx_parent
+    ON parent_table (column_name);
+```
+
+When a matching index already exists on a partition, PostgreSQL attaches it to the parent index instead of building a new one; matching is by definition (columns, operator classes, and so on), not by index name. The parent-level `CREATE INDEX` stays metadata-only only because every partition already has a matching index, which is why the partition indexes are created first.
+
+### Prioritize active partitions
+
+For time-based partitions (the `findings` table uses monthly partitions):
+
+- Create indexes on recent/current partitions where data is actively queried
+- Skip older partitions that are rarely accessed
+- The `all_partitions=False` default in `create_index_on_partitions` handles this automatically
+
+## Index maintenance and bloat
+
+Over time, B-tree indexes accumulate bloat from updates and deletes. VACUUM reclaims heap space but does NOT rebalance B-tree pages. Periodic reindexing is necessary for heavily updated tables.
+
+### Detecting bloat
+
+Indexes with estimated bloat above 50% are candidates for `REINDEX CONCURRENTLY`. Check bloat with tools like `pgstattuple` or bloat estimation queries.
+
+### Reducing bloat buildup
+
+Three things slow degradation:
+1. **Upgrade to PostgreSQL 14+** for B-tree deduplication and bottom-up deletion
+2. **Maximize HOT updates** by not indexing frequently-updated columns
+3. **Tune autovacuum** to run more aggressively on high-churn tables
+
+### Rebuilding many indexes without deadlocks
+
+If you rebuild two indexes on the same table in parallel, PostgreSQL detects a deadlock and kills one session. 
To rebuild many indexes across multiple sessions safely, assign all indexes for a given table to the same session:
+
+```sql
+\set NUMBER_OF_SESSIONS 10
+
+SELECT
+    format('%I.%I', n.nspname, c.relname) AS table_fqn,
+    format('%I.%I', n.nspname, i.relname) AS index_fqn,
+    mod(
+        hashtext(format('%I.%I', n.nspname, c.relname)) & 2147483647,
+        :NUMBER_OF_SESSIONS
+    ) AS session_id
+FROM pg_index idx
+JOIN pg_class c ON idx.indrelid = c.oid
+JOIN pg_class i ON idx.indexrelid = i.oid
+JOIN pg_namespace n ON c.relnamespace = n.oid
+WHERE n.nspname NOT IN ('pg_catalog', 'pg_toast', 'information_schema')
+ORDER BY table_fqn, index_fqn;
+```
+
+Then run each session's indexes in a separate `REINDEX INDEX CONCURRENTLY` call. Set `NUMBER_OF_SESSIONS` based on `max_parallel_maintenance_workers` and available I/O.
+
+## Dropping indexes
+
+### Post-drop maintenance
+
+After dropping an index, run `ANALYZE` (or `VACUUM (ANALYZE)`) to refresh planner statistics; the index's own disk space is released immediately when it is dropped:
+
+```sql
+-- Standard vacuum + analyze (not VACUUM FULL; can be heavy on large tables)
+VACUUM (ANALYZE) your_table;
+
+-- Lightweight alternative for huge tables: just update statistics
+ANALYZE your_table;
+```
+
+## Commands
+
+```sql
+-- Validate query uses an index
+EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
+
+-- Check index creation progress
+SELECT * FROM pg_stat_progress_create_index;
+
+-- Find invalid indexes
+SELECT c.relname, i.indisvalid
+FROM pg_class c JOIN pg_index i ON i.indexrelid = c.oid
+WHERE i.indisvalid = false;
+
+-- Find unused indexes
+SELECT relname, indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid))
+FROM pg_stat_all_indexes
+WHERE schemaname = 'public' AND idx_scan = 0;
+
+-- Create index safely
+CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_name ON table_name (columns);
+
+-- Drop index safely
+DROP INDEX CONCURRENTLY IF EXISTS idx_name;
+
+-- Rebuild invalid index
+REINDEX INDEX CONCURRENTLY idx_name;
+
+-- Post-drop maintenance
+VACUUM (ANALYZE) table_name;
+```
+
+## Context7 lookups
+
+**Prerequisite:** Install Context7 MCP server for up-to-date documentation lookup. + +| Library | Context7 ID | Use for | +|---------|-------------|---------| +| PostgreSQL | `/websites/postgresql_org_docs_current` | Index types, EXPLAIN, partitioned table indexing, REINDEX | + +**Example queries:** +``` +mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="CREATE INDEX CONCURRENTLY partitioned table") +mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="EXPLAIN ANALYZE BUFFERS query plan") +mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="partial index WHERE clause") +mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="REINDEX CONCURRENTLY invalid index") +mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="pg_stat_all_indexes monitoring") +``` + +> **Note:** Use `mcp_context7_resolve-library-id` first if you need to find the correct library ID. + +## Resources + +- **EXPLAIN Visualizer**: [pev](https://tatiyants.com/pev/)