chore(skills): add Django migrations skills (#10260)

This commit is contained in:
Pepe Fagoaga
2026-03-12 17:37:43 +00:00
committed by GitHub
parent 80a814afce
commit b8c6f3ba67
4 changed files with 860 additions and 2 deletions


@@ -46,6 +46,8 @@ Use these skills for detailed patterns on-demand:
| `prowler-commit` | Professional commits (conventional-commits) | [SKILL.md](skills/prowler-commit/SKILL.md) |
| `prowler-pr` | Pull request conventions | [SKILL.md](skills/prowler-pr/SKILL.md) |
| `prowler-docs` | Documentation style guide | [SKILL.md](skills/prowler-docs/SKILL.md) |
| `django-migration-psql` | Django migration best practices for PostgreSQL | [SKILL.md](skills/django-migration-psql/SKILL.md) |
| `postgresql-indexing` | PostgreSQL indexing, EXPLAIN, monitoring, maintenance | [SKILL.md](skills/postgresql-indexing/SKILL.md) |
| `prowler-attack-paths-query` | Create Attack Paths openCypher queries | [SKILL.md](skills/prowler-attack-paths-query/SKILL.md) |
| `gh-aw` | GitHub Agentic Workflows (gh-aw) | [SKILL.md](skills/gh-aw/SKILL.md) |
| `skill-creator` | Create new AI agent skills | [SKILL.md](skills/skill-creator/SKILL.md) |
@@ -85,15 +87,15 @@ When performing these actions, ALWAYS invoke the corresponding skill FIRST:
| Fixing bug | `tdd` |
| General Prowler development questions | `prowler` |
| Implementing JSON:API endpoints | `django-drf` |
| Implementing feature | `tdd` |
| Importing Copilot Custom Agents into workflows | `gh-aw` |
| Inspect PR CI checks and gates (.github/workflows/*) | `prowler-ci` |
| Inspect PR CI workflows (.github/workflows/*): conventional-commit, pr-check-changelog, pr-conflict-checker, labeler | `prowler-pr` |
| Mapping checks to compliance controls | `prowler-compliance` |
| Mocking AWS with moto in tests | `prowler-test-sdk` |
| Modifying API responses | `jsonapi` |
| Modifying component | `tdd` |
| Modifying gh-aw workflow frontmatter or safe-outputs | `gh-aw` |
| Refactoring code | `tdd` |
| Regenerate AGENTS.md Auto-invoke tables (sync.sh) | `skill-sync` |
| Review PR requirements: template, title conventions, changelog gate | `prowler-pr` |


@@ -4,6 +4,8 @@
> - [`prowler-api`](../skills/prowler-api/SKILL.md) - Models, Serializers, Views, RLS patterns
> - [`prowler-test-api`](../skills/prowler-test-api/SKILL.md) - Testing patterns (pytest-django)
> - [`prowler-attack-paths-query`](../skills/prowler-attack-paths-query/SKILL.md) - Attack Paths openCypher queries
> - [`django-migration-psql`](../skills/django-migration-psql/SKILL.md) - Migration best practices for PostgreSQL
> - [`postgresql-indexing`](../skills/postgresql-indexing/SKILL.md) - PostgreSQL indexing, EXPLAIN, monitoring, maintenance
> - [`django-drf`](../skills/django-drf/SKILL.md) - Generic DRF patterns
> - [`jsonapi`](../skills/jsonapi/SKILL.md) - Strict JSON:API v1.1 spec compliance
> - [`pytest`](../skills/pytest/SKILL.md) - Generic pytest patterns
@@ -16,14 +18,20 @@ When performing these actions, ALWAYS invoke the corresponding skill FIRST:
|--------|-------|
| Add changelog entry for a PR or feature | `prowler-changelog` |
| Adding DRF pagination or permissions | `django-drf` |
| Adding indexes or constraints to database tables | `django-migration-psql` |
| Adding privilege escalation detection queries | `prowler-attack-paths-query` |
| Analyzing query performance with EXPLAIN | `postgresql-indexing` |
| Committing changes | `prowler-commit` |
| Create PR that requires changelog entry | `prowler-changelog` |
| Creating API endpoints | `jsonapi` |
| Creating Attack Paths queries | `prowler-attack-paths-query` |
| Creating ViewSets, serializers, or filters in api/ | `django-drf` |
| Creating a git commit | `prowler-commit` |
| Creating or modifying PostgreSQL indexes | `postgresql-indexing` |
| Creating or reviewing Django migrations | `django-migration-psql` |
| Creating/modifying models, views, serializers | `prowler-api` |
| Debugging slow queries or missing indexes | `postgresql-indexing` |
| Dropping or reindexing PostgreSQL indexes | `postgresql-indexing` |
| Fixing bug | `tdd` |
| Implementing JSON:API endpoints | `django-drf` |
| Implementing feature | `tdd` |
@@ -32,12 +40,14 @@ When performing these actions, ALWAYS invoke the corresponding skill FIRST:
| Refactoring code | `tdd` |
| Review changelog format and conventions | `prowler-changelog` |
| Reviewing JSON:API compliance | `jsonapi` |
| Running makemigrations or pgmakemigrations | `django-migration-psql` |
| Testing RLS tenant isolation | `prowler-test-api` |
| Update CHANGELOG.md in any component | `prowler-changelog` |
| Updating existing Attack Paths queries | `prowler-attack-paths-query` |
| Working on task | `tdd` |
| Writing Prowler API tests | `prowler-test-api` |
| Writing Python tests with pytest | `pytest` |
| Writing data backfill or data migration | `django-migration-psql` |
---


@@ -0,0 +1,454 @@
---
name: django-migration-psql
description: >
  Reviews Django migration files for PostgreSQL best practices specific to Prowler.
  Trigger: When creating migrations, running makemigrations/pgmakemigrations, reviewing migration PRs,
  adding indexes or constraints to database tables, modifying existing migration files, or writing
  data backfill migrations. Always use this skill when you see AddIndex, CreateModel, AddConstraint,
  RunPython, bulk_create, bulk_update, or backfill operations in migration files.
license: Apache-2.0
metadata:
  author: prowler-cloud
  version: "1.0"
  scope: [api, root]
  auto_invoke:
    - "Creating or reviewing Django migrations"
    - "Adding indexes or constraints to database tables"
    - "Running makemigrations or pgmakemigrations"
    - "Writing data backfill or data migration"
allowed-tools: Read, Grep, Glob, Edit, Write, Bash
---
## When to use
- Creating a new Django migration
- Running `makemigrations` or `pgmakemigrations`
- Reviewing a PR that adds or modifies migrations
- Adding indexes, constraints, or models to the database
## Why this matters
A bad migration can lock a production table for minutes, block all reads/writes, or silently skip index creation on partitioned tables.
## Auto-generated migrations need splitting
`makemigrations` and `pgmakemigrations` bundle everything into one file: `CreateModel`, `AddIndex`, `AddConstraint`, sometimes across multiple tables. This is the default Django behavior and it violates every rule below.
After generating a migration, ALWAYS review it and split it:
1. Read the generated file and identify every operation
2. Group operations by concern:
- `CreateModel` + `AddConstraint` for each new table → one migration per table
- `AddIndex` per table → one migration per table
- `AddIndex` on partitioned tables → two migrations (partition + parent)
- `AlterField`, `AddField`, `RemoveField` for each table → one migration per table
3. Rewrite the generated file into separate migration files with correct dependencies
4. Delete the original auto-generated migration
When adding fields or indexes to an existing model, `makemigrations` may also bundle `AddIndex` for unrelated tables that had pending model changes. Always check for stowaways from other tables.
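The grouping in step 2 is mechanical. A minimal sketch in plain Python, using a hypothetical simplified `(operation, model_name)` view of the generated file (real operations are Django `migrations.*` objects):

```python
from collections import defaultdict

# Hypothetical simplified view of an auto-generated migration's operations
generated_ops = [
    ("CreateModel", "findinggroupdailysummary"),
    ("AddConstraint", "findinggroupdailysummary"),
    ("AddIndex", "findinggroupdailysummary"),
    ("AddIndex", "resource"),  # stowaway from another table
    ("AddIndex", "findinggroupdailysummary"),
]

def split_by_concern(ops):
    """Group operations into the separate migration files the rules require."""
    groups = defaultdict(list)
    for op_type, model in ops:
        # Structural ops (CreateModel/AddConstraint) stay together per table;
        # AddIndex goes into its own per-table migration.
        concern = "indexes" if op_type == "AddIndex" else "structure"
        groups[(model, concern)].append(op_type)
    return dict(groups)

print(split_by_concern(generated_ops))
```

Each resulting group becomes one migration file; the `resource` stowaway above would surface as its own `(model, concern)` key.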
## Rule 1: separate indexes from model creation
`CreateModel` + `AddConstraint` = same migration (structural).
`AddIndex` = separate migration file (performance).
Django runs each migration inside a transaction (unless `atomic = False`). If an index operation fails, it rolls back everything, including the model creation. Splitting means a failed index doesn't prevent the table from existing. It also lets you `--fake` index migrations independently (see Rule 4).
### Bad
```python
# 0081_finding_group_daily_summary.py — DON'T DO THIS
class Migration(migrations.Migration):
    operations = [
        migrations.CreateModel(name="FindingGroupDailySummary", ...),
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),  # separate this
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),  # separate this
        migrations.AddConstraint(model_name="findinggroupdailysummary", ...),  # this is fine here
    ]
```
### Good
```python
# 0081_create_finding_group_daily_summary.py
class Migration(migrations.Migration):
    operations = [
        migrations.CreateModel(name="FindingGroupDailySummary", ...),
        # Constraints belong with the model — they define its integrity rules
        migrations.AddConstraint(model_name="findinggroupdailysummary", ...),  # unique
        migrations.AddConstraint(model_name="findinggroupdailysummary", ...),  # RLS
    ]


# 0082_finding_group_daily_summary_indexes.py
class Migration(migrations.Migration):
    dependencies = [("api", "0081_create_finding_group_daily_summary")]
    operations = [
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),
    ]
```
Flag any migration with both `CreateModel` and `AddIndex` in `operations`.
## Rule 2: one table's indexes per migration
Each table's indexes must live in their own migration file. Never mix `AddIndex` for different `model_name` values in one migration.
If the index on table B fails, the rollback also drops the index on table A. The migration name gives no hint that it touches unrelated tables. You lose the ability to `--fake` one table's indexes without affecting the other.
### Bad
```python
# 0081_finding_group_daily_summary.py — DON'T DO THIS
class Migration(migrations.Migration):
    operations = [
        migrations.CreateModel(name="FindingGroupDailySummary", ...),
        migrations.AddIndex(model_name="findinggroupdailysummary", ...),  # table A
        migrations.AddIndex(model_name="resource", ...),  # table B!
        migrations.AddIndex(model_name="resource", ...),  # table B!
        migrations.AddIndex(model_name="finding", ...),  # table C!
    ]
```
### Good
```python
# 0081_create_finding_group_daily_summary.py — model + constraints
# 0082_finding_group_daily_summary_indexes.py — only FindingGroupDailySummary indexes
# 0083_resource_trigram_indexes.py — only Resource indexes
# 0084_finding_check_index_partitions.py — only Finding partition indexes (step 1)
# 0085_finding_check_index_parent.py — only Finding parent index (step 2)
```
Name each migration file after the table it affects. A reviewer should know which table a migration touches without opening the file.
Flag any migration where `AddIndex` operations reference more than one `model_name`.
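Both flags (Rule 1 and Rule 2) are mechanical checks. A minimal sketch, again assuming a simplified `(operation, model_name)` view of `Migration.operations`:

```python
def lint_migration(ops):
    """Flag Rule 1 / Rule 2 violations in a list of (operation, model_name) pairs."""
    violations = []
    op_types = {op for op, _ in ops}
    # Rule 1: structural creation and index creation must not share a migration
    if "CreateModel" in op_types and "AddIndex" in op_types:
        violations.append("Rule 1: CreateModel and AddIndex in the same migration")
    # Rule 2: all AddIndex operations must target a single table
    indexed_models = {model for op, model in ops if op == "AddIndex"}
    if len(indexed_models) > 1:
        violations.append(
            "Rule 2: AddIndex targets multiple tables: " + ", ".join(sorted(indexed_models))
        )
    return violations

print(lint_migration([
    ("CreateModel", "findinggroupdailysummary"),
    ("AddIndex", "findinggroupdailysummary"),
    ("AddIndex", "resource"),
]))
```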
## Rule 3: partitioned table indexes require the two-step pattern
Tables `findings` and `resource_finding_mappings` are range-partitioned. A plain `AddIndex` issues a regular `CREATE INDEX` on the parent, which builds every missing partition index inline, inside one transaction and with locks (never CONCURRENTLY). On tables this size that blocks writes for the entire build, so the partition indexes must be created concurrently first.
Use the helpers in `api.db_utils`.
### Step 1: create indexes on actual partitions
```python
# 0084_finding_check_index_partitions.py
from functools import partial

from django.db import migrations

from api.db_utils import create_index_on_partitions, drop_index_on_partitions


class Migration(migrations.Migration):
    atomic = False  # REQUIRED — CREATE INDEX CONCURRENTLY can't run inside a transaction
    dependencies = [("api", "0083_resource_trigram_indexes")]
    operations = [
        migrations.RunPython(
            partial(
                create_index_on_partitions,
                parent_table="findings",
                index_name="find_tenant_check_ins_idx",
                columns="tenant_id, check_id, inserted_at",
            ),
            reverse_code=partial(
                drop_index_on_partitions,
                parent_table="findings",
                index_name="find_tenant_check_ins_idx",
            ),
        )
    ]
```
Key details:
- `atomic = False` is mandatory. `CREATE INDEX CONCURRENTLY` cannot run inside a transaction.
- Always provide `reverse_code` using `drop_index_on_partitions` so rollbacks work.
- The default is `all_partitions=True`, which creates indexes on every partition CONCURRENTLY (no locks). This is the safe default.
- Do NOT use `all_partitions=False` unless you understand the consequence: Step 2's `AddIndex` on the parent will create indexes on the skipped partitions **with locks** (not CONCURRENTLY), because PostgreSQL fills in missing partition indexes inline during parent index creation.
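The SQL a helper like `create_index_on_partitions` plausibly emits per partition can be sketched as follows (an assumption for illustration — the real implementation lives in `api.db_utils` and may differ):

```python
def partition_index_sql(parent_table, index_name, columns, partitions, concurrently=True):
    """Sketch: build one CREATE INDEX statement per partition.

    Hypothetical helper; the real one also discovers partitions (e.g. via
    pg_inherits) and executes the statements outside a transaction.
    """
    keyword = "CONCURRENTLY " if concurrently else ""
    return [
        f"CREATE INDEX {keyword}IF NOT EXISTS {partition}_{index_name} "
        f"ON {partition} USING BTREE ({columns})"
        for partition in partitions
    ]

for stmt in partition_index_sql(
    "findings",
    "find_tenant_check_ins_idx",
    "tenant_id, check_id, inserted_at",
    ["findings_2026_jan", "findings_2026_feb"],
):
    print(stmt)
```

Note the naming convention `<partition>_<index_name>`, which matches the manual SQL shown in Rule 4 below.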
### Step 2: register the index with Django
```python
# 0085_finding_check_index_parent.py
from django.db import migrations, models
class Migration(migrations.Migration):
dependencies = [("api", "0084_finding_check_index_partitions")]
operations = [
migrations.AddIndex(
model_name="finding",
index=models.Index(
fields=["tenant_id", "check_id", "inserted_at"],
name="find_tenant_check_ins_idx",
),
),
]
```
This second migration tells Django "this index exists" so it doesn't try to recreate it. New partitions created after this point inherit the index definition from the parent.
### Existing examples in the codebase
| Partition migration | Parent migration |
|---|---|
| `0020_findings_new_performance_indexes_partitions.py` | `0021_findings_new_performance_indexes_parent.py` |
| `0024_findings_uid_index_partitions.py` | `0025_findings_uid_index_parent.py` |
| `0028_findings_check_index_partitions.py` | `0029_findings_check_index_parent.py` |
| `0036_rfm_tenant_finding_index_partitions.py` | `0037_rfm_tenant_finding_index_parent.py` |
Flag any plain `AddIndex` on `finding` or `resourcefindingmapping` without a preceding partition migration.
## Rule 4: large table indexes — fake the migration, apply manually
For huge tables (findings has millions of rows), even `CREATE INDEX CONCURRENTLY` can take minutes and consume significant I/O. In production, you may want to decouple the migration from the actual index creation.
### Procedure
1. Write the migration normally following the two-step pattern above.
2. Fake the migration so Django marks it as applied without executing it:
```bash
python manage.py migrate api 0084_finding_check_index_partitions --fake
python manage.py migrate api 0085_finding_check_index_parent --fake
```
3. Create the index manually during a low-traffic window via `psql` or `python manage.py dbshell --database admin`:
```sql
-- For each partition you care about:
CREATE INDEX CONCURRENTLY IF NOT EXISTS findings_2026_jan_find_tenant_check_ins_idx
ON findings_2026_jan USING BTREE (tenant_id, check_id, inserted_at);
CREATE INDEX CONCURRENTLY IF NOT EXISTS findings_2026_feb_find_tenant_check_ins_idx
ON findings_2026_feb USING BTREE (tenant_id, check_id, inserted_at);
-- Then register on the parent (this is fast, no data scan):
CREATE INDEX IF NOT EXISTS find_tenant_check_ins_idx
ON findings USING BTREE (tenant_id, check_id, inserted_at);
```
4. Verify the index exists on the partitions you need:
```sql
SELECT indexrelid::regclass, indrelid::regclass
FROM pg_index
WHERE indexrelid::regclass::text LIKE '%find_tenant_check_ins%';
```
### When to use this approach
- The table is huge or keeps growing quickly (e.g., `findings`).
- You want to control exactly when the I/O hit happens (e.g., during a maintenance window).
This is optional. For smaller tables or non-production environments, letting the migration run normally is fine.
## Rule 5: data backfills — never inline, always batched
Data backfills (updating existing rows, populating new columns, generating summary data) are the most dangerous migrations. A naive `Model.objects.all().update(...)` on a multi-million row table will hold a transaction lock for minutes, blow out WAL, and potentially OOM the worker.
### Never backfill inline in the migration
The migration should only dispatch the work. The actual backfill runs asynchronously via Celery tasks, outside the migration transaction.
```python
# 0090_backfill_finding_group_summaries.py
from django.db import migrations


def trigger_backfill(apps, schema_editor):
    from api.db_router import MainRouter
    from tasks.jobs.backfill import backfill_finding_group_summaries_task

    Tenant = apps.get_model("api", "Tenant")
    tenant_ids = Tenant.objects.using(MainRouter.admin_db).values_list("id", flat=True)
    for tenant_id in tenant_ids:
        backfill_finding_group_summaries_task.delay(tenant_id=str(tenant_id))


class Migration(migrations.Migration):
    dependencies = [("api", "0089_previous_migration")]
    operations = [
        migrations.RunPython(trigger_backfill, migrations.RunPython.noop),
    ]
```
The migration finishes in seconds. The backfill runs in the background per-tenant.
### Exception: trivial updates
Single-statement bulk updates on small result sets are OK inline:
```python
# Fine — single UPDATE, small result set, no iteration
from api.db_router import MainRouter


def backfill_graph_data_ready(apps, schema_editor):
    AttackPathsScan = apps.get_model("api", "AttackPathsScan")
    AttackPathsScan.objects.using(MainRouter.admin_db).filter(
        state="completed", graph_data_ready=False,
    ).update(graph_data_ready=True)
```
Use inline only when you're confident the affected row count is small (< ~10K rows).
### Batch processing in the Celery task
The actual backfill task must process data in batches. Use the helpers in `api.db_utils`:
```python
from api.db_utils import create_objects_in_batches, update_objects_in_batches, batch_delete
# Creating objects in batches (500 per transaction)
create_objects_in_batches(tenant_id, ScanCategorySummary, summaries, batch_size=500)
# Updating objects in batches
update_objects_in_batches(tenant_id, Finding, findings, fields=["status"], batch_size=500)
# Deleting in batches
batch_delete(tenant_id, queryset, batch_size=settings.DJANGO_DELETION_BATCH_SIZE)
```
Each batch runs in its own `rls_transaction()` so:
- A failure in batch N doesn't roll back batches 1 through N-1
- Lock duration is bounded to the batch size
- Memory stays constant regardless of total row count
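The bounded-batch behavior is just chunking; a minimal sketch (hypothetical helper — the real batch utilities live in `api/db_utils.py`):

```python
def chunks(items, batch_size=500):
    """Yield successive batches; each batch would run in its own rls_transaction."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# 1200 rows → three bounded transactions instead of one giant one
rows = list(range(1200))
print([len(batch) for batch in chunks(rows)])  # [500, 500, 200]
```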
### Rules for backfill tasks
1. **One RLS transaction per batch.** Never wrap the entire backfill in a single transaction. Each batch gets its own `rls_transaction(tenant_id)`.
2. **Use `bulk_create` / `bulk_update` with explicit `batch_size`.** Never `.save()` in a loop. The default batch_size is 500.
3. **Use `.iterator()` for reads.** When reading source data, use `queryset.iterator()` to avoid loading the entire result set into memory.
4. **Use `.only()` / `.values_list()` for reads.** Fetch only the columns you need, not full model instances.
5. **Catch and skip per-item failures.** Don't let one bad row kill the entire backfill. Log the error, count it, continue.
```python
scans_processed = 0
scans_skipped = 0
for scan_id in scan_ids:
    try:
        result = process_scan(tenant_id, scan_id)
        scans_processed += 1
    except Exception:
        logger.warning("Failed to process scan %s", scan_id)
        scans_skipped += 1
logger.info("Backfill done: %d processed, %d skipped", scans_processed, scans_skipped)
```
6. **Log totals at start and end, not per-batch.** Per-batch logging floods the logs. Log the total count at the start, and the processed/skipped counts at the end.
7. **Use `ignore_conflicts=True` for idempotent creates.** Makes the backfill safe to re-run if interrupted.
```python
Model.objects.bulk_create(objects, batch_size=500, ignore_conflicts=True)
```
8. **Iterate per-tenant.** Dispatch one Celery task per tenant. This gives you natural parallelism, bounded memory per task, and the ability to retry a single tenant without re-running everything.
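The rules above combine into a loop shaped like this (hypothetical names; real tasks live in `tasks/jobs/backfill.py`):

```python
import logging

logger = logging.getLogger(__name__)


def run_backfill(scan_ids, process_scan):
    """Skeleton of a per-tenant backfill loop following the rules above.

    `process_scan` stands in for one unit of work (e.g. one batched
    rls_transaction); names are illustrative, not the real task API.
    """
    logger.info("Backfill starting: %d scans", len(scan_ids))  # rule 6: log totals
    processed = skipped = 0
    for scan_id in scan_ids:
        try:
            process_scan(scan_id)
            processed += 1
        except Exception:
            # rule 5: skip the bad row, count it, keep going
            logger.warning("Failed to process scan %s", scan_id)
            skipped += 1
    logger.info("Backfill done: %d processed, %d skipped", processed, skipped)
    return processed, skipped
```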
### Existing examples
| Migration | Task |
|---|---|
| `0062_backfill_daily_severity_summaries.py` | `backfill_daily_severity_summaries_task` |
| `0080_backfill_attack_paths_graph_data_ready.py` | Inline (trivial update) |
| `0082_backfill_finding_group_summaries.py` | `backfill_finding_group_summaries_task` |
Task implementations: `tasks/jobs/backfill.py`
Batch utilities: `api/db_utils.py` (`batch_delete`, `create_objects_in_batches`, `update_objects_in_batches`)
## Decision tree
```
Auto-generated migration?
├── Yes → Split it following the rules above
└── No → Review it against the rules above

New model?
├── Yes → CreateModel + AddConstraint in one migration
│         AddIndex in separate migration(s), one per table
└── No, just indexes?
    ├── Regular table → AddIndex in its own migration
    └── Partitioned table (findings, resource_finding_mappings)?
        ├── Step 1: RunPython + create_index_on_partitions (atomic=False)
        └── Step 2: AddIndex on parent (separate migration)
            └── Large table? → Consider --fake + manual apply

Data backfill?
├── Trivial update (< ~10K rows)? → Inline RunPython is OK
└── Large backfill? → Migration dispatches Celery task(s)
    ├── One task per tenant
    ├── Batch processing (bulk_create/bulk_update, batch_size=500)
    ├── One rls_transaction per batch
    └── Catch + skip per-item failures, log totals
```
## Quick reference
| Scenario | Approach |
|---|---|
| Auto-generated migration | Split by concern and table before committing |
| New model + constraints/RLS | Same migration (constraints are structural) |
| Indexes on a regular table | Separate migration, one table per file |
| Indexes on a partitioned table | Two migrations: partitions first (`RunPython` + `atomic=False`), then parent (`AddIndex`) |
| Index on a huge partitioned table | Same two migrations, but fake + apply manually in production |
| Trivial data backfill (< ~10K rows) | Inline `RunPython` with single `.update()` call |
| Large data backfill | Migration dispatches Celery task per tenant, task batches with `rls_transaction` |
## Review output format
1. List each violation with rule number and one-line explanation
2. Show corrected migration file(s)
3. For partitioned tables, show both partition and parent migrations
If migration passes all checks, say so.
## Context7 lookups
**Prerequisite:** Install Context7 MCP server for up-to-date documentation lookup.
When implementing or debugging migration patterns, query these libraries via `mcp_context7_query-docs`:
| Library | Context7 ID | Use for |
|---------|-------------|---------|
| Django 5.1 | `/websites/djangoproject_en_5_1` | Migration operations, indexes, constraints, `SchemaEditor` |
| PostgreSQL | `/websites/postgresql_org_docs_current` | `CREATE INDEX CONCURRENTLY`, partitioned tables, `pg_inherits` |
| django-postgres-extra | `/SectorLabs/django-postgres-extra` | Partitioned models, `PostgresPartitionedModel`, partition management |
**Example queries:**
```
mcp_context7_query-docs(libraryId="/websites/djangoproject_en_5_1", query="migration operations AddIndex RunPython atomic")
mcp_context7_query-docs(libraryId="/websites/djangoproject_en_5_1", query="database indexes Meta class concurrently")
mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="CREATE INDEX CONCURRENTLY partitioned table")
mcp_context7_query-docs(libraryId="/SectorLabs/django-postgres-extra", query="partitioned model range partition index")
```
> **Note:** Use `mcp_context7_resolve-library-id` first if you need to find the correct library ID.
## Commands
```bash
# Generate migrations (ALWAYS review output before committing)
python manage.py makemigrations
python manage.py pgmakemigrations
# Apply migrations
python manage.py migrate
# Fake a migration (mark as applied without running)
python manage.py migrate api <migration_name> --fake
# Manage partitions
python manage.py pgpartition --using admin
```
## Resources
- **Partition helpers**: `api/src/backend/api/db_utils.py` (`create_index_on_partitions`, `drop_index_on_partitions`)
- **Partition config**: `api/src/backend/api/partitions.py`
- **RLS constraints**: `api/src/backend/api/rls.py`
- **Existing examples**: `0028` + `0029`, `0024` + `0025`, `0036` + `0037`


@@ -0,0 +1,392 @@
---
name: postgresql-indexing
description: >
  PostgreSQL indexing best practices for Prowler: index design, partial indexes, partitioned table
  indexing, EXPLAIN ANALYZE validation, concurrent operations, monitoring, and maintenance.
  Trigger: When creating or modifying PostgreSQL indexes, analyzing query performance with EXPLAIN,
  debugging slow queries, reviewing index usage statistics, reindexing, dropping indexes, or working
  with partitioned table indexes. Also trigger when discussing index strategies, partial indexes,
  or index maintenance operations like VACUUM or ANALYZE.
license: Apache-2.0
metadata:
  author: prowler-cloud
  version: "1.0"
  scope: [api]
  auto_invoke:
    - "Creating or modifying PostgreSQL indexes"
    - "Analyzing query performance with EXPLAIN"
    - "Debugging slow queries or missing indexes"
    - "Dropping or reindexing PostgreSQL indexes"
allowed-tools: Read, Grep, Glob, Bash
---
## When to use
- Creating or modifying PostgreSQL indexes
- Analyzing query plans with `EXPLAIN`
- Debugging slow queries or missing index usage
- Dropping, reindexing, or validating indexes
- Working with indexes on partitioned tables (findings, resource_finding_mappings)
- Running VACUUM or ANALYZE after index changes
## Index design
### Partial indexes: constant columns go in WHERE, not in the key
When a column has a fixed value for the query (e.g., `state = 'completed'`), put it in the `WHERE` clause of the index, not in the indexed columns. Otherwise the planner cannot exploit the ordering of the other columns.
```sql
-- Bad: state in the key wastes space and breaks ordering
CREATE INDEX idx_scans_tenant_state ON scans (tenant_id, state, inserted_at DESC);
-- Good: state as a filter, planner uses tenant_id + inserted_at ordering
CREATE INDEX idx_scans_tenant_ins_completed ON scans (tenant_id, inserted_at DESC)
WHERE state = 'completed';
```
### Column order matters
Put high-selectivity columns first (columns that filter out the most rows). For composite indexes, the leftmost column must appear in the query's WHERE clause for the index to be used.
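The leftmost-prefix rule of thumb can be sketched as a simple check (illustrative only — the planner's real decision also weighs statistics and costs):

```python
def index_usable(index_columns, where_columns):
    """Return the prefix of a composite B-tree index a query can exploit.

    Simplified leftmost-prefix rule: the index helps for the leading run of
    columns that appear in the WHERE clause; ordering beyond the first
    unfiltered column is lost.
    """
    usable_prefix = []
    for col in index_columns:
        if col not in where_columns:
            break
        usable_prefix.append(col)
    return usable_prefix

print(index_usable(["tenant_id", "state", "inserted_at"], {"tenant_id", "state"}))
# ['tenant_id', 'state']
```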
## Validating index effectiveness
### Always EXPLAIN (ANALYZE, BUFFERS) after adding indexes
Never assume an index is being used. Run `EXPLAIN (ANALYZE, BUFFERS)` to confirm.
```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM users
WHERE email = 'user@example.com';
```
Use [Postgres EXPLAIN Visualizer (pev)](https://tatiyants.com/pev/) to visualize query plans and identify bottlenecks.
### Force index usage for testing
The planner may choose a sequential scan on small datasets. Toggle `enable_seqscan = off` to confirm the index path works, then re-enable it.
```sql
SET enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT DISTINCT ON (provider_id) provider_id
FROM scans
WHERE tenant_id = '95383b24-da01-44b5-a713-0d9920d554db'
AND state = 'completed'
ORDER BY provider_id, inserted_at DESC;
SET enable_seqscan = on; -- always re-enable after testing
```
This is for validation only. Never leave `enable_seqscan = off` in production.
## Over-indexing
Every extra index has three costs that compound:
1. **Write overhead.** Every INSERT and UPDATE must maintain all indexes. Extra indexes also kill HOT (Heap-Only-Tuple) updates, which normally skip index maintenance when unindexed columns change.
2. **Planning time.** The planner evaluates more execution paths per index. On simple OLTP queries, planning time can exceed execution time by 4x when index count is high.
3. **Lock contention (fastpath limit).** PostgreSQL uses a fast path for the first 16 locks per backend. After 16 relations (table + its indexes), it falls back to slower LWLock mechanisms. At high QPS (100+), this causes `LockManager` wait events.
Rules:
- Drop unused and redundant indexes regularly
- Be especially careful with partitioned tables (each partition multiplies the index count)
- Use prepared statements to reduce planning overhead when index count is high
## Finding redundant indexes
Two indexes are redundant when:
- They have the same columns in the same order (duplicates)
- One is a prefix of the other: index `(a)` is redundant to `(a, b)`, but NOT to `(b, a)`
Column order matters. For partial indexes, the WHERE clause must also match.
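The prefix rule is mechanical; a minimal sketch (ignores partial-index WHERE clauses, which would also need to match):

```python
def is_redundant(index_a, index_b):
    """True when index_a's columns are a leading prefix of index_b's columns,
    in the same order — including exact duplicates."""
    return list(index_b[:len(index_a)]) == list(index_a)

print(is_redundant(("a",), ("a", "b")))  # True — (a) is a prefix of (a, b)
print(is_redundant(("a",), ("b", "a")))  # False — order matters
```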
```sql
-- Quick check: find indexes that share a leading column on the same table
SELECT
    a.indrelid::regclass AS table_name,
    a.indexrelid::regclass AS index_a,
    b.indexrelid::regclass AS index_b,
    pg_size_pretty(pg_relation_size(a.indexrelid)) AS size_a,
    pg_size_pretty(pg_relation_size(b.indexrelid)) AS size_b
FROM pg_index a
JOIN pg_index b
    ON a.indrelid = b.indrelid
    AND a.indexrelid != b.indexrelid
    AND a.indkey::text = (
        SELECT string_agg(x::text, ' ')
        FROM unnest(b.indkey[:array_length(a.indkey, 1)]) AS x
    )
WHERE NOT a.indisunique;
```
Before dropping: verify on all workload nodes (primary + replicas), use `DROP INDEX CONCURRENTLY`, and monitor for plan regressions.
## Monitoring index usage
### Identify unused indexes
Query `pg_stat_all_indexes` to find indexes that are never or rarely scanned:
```sql
SELECT
    idxstat.schemaname AS schema_name,
    idxstat.relname AS table_name,
    idxstat.indexrelname AS index_name,
    idxstat.idx_scan AS index_scans_count,
    idxstat.last_idx_scan AS last_idx_scan_timestamp,
    pg_size_pretty(pg_relation_size(idxstat.indexrelid)) AS index_size
FROM pg_stat_all_indexes AS idxstat
JOIN pg_index i ON idxstat.indexrelid = i.indexrelid
WHERE idxstat.schemaname NOT IN ('pg_catalog', 'information_schema', 'pg_toast')
    AND NOT i.indisunique
ORDER BY idxstat.idx_scan ASC, idxstat.last_idx_scan ASC;
```
Indexes with `idx_scan = 0` and no recent `last_idx_scan` are candidates for removal.
Before dropping, verify:
- Stats haven't been reset recently (check `stats_reset` in `pg_stat_database`)
- Stats cover at least 1 month of production traffic
- All workload nodes (primary + replicas) have been checked
- The index isn't used by a periodic job that runs infrequently
```sql
-- Check when stats were last reset
SELECT stats_reset, age(now(), stats_reset)
FROM pg_stat_database
WHERE datname = current_database();
```
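The drop checklist can be expressed as a simple predicate (a sketch; the one-month threshold is the guideline above, not enforced project policy):

```python
from datetime import timedelta


def drop_candidate(idx_scan, stats_age, is_unique, min_age=timedelta(days=30)):
    """Apply the checklist: never drop unique indexes, require at least one
    month of statistics, and require zero recorded scans."""
    return (not is_unique) and idx_scan == 0 and stats_age >= min_age

print(drop_candidate(idx_scan=0, stats_age=timedelta(days=45), is_unique=False))  # True
```

Even when this returns True, still check replicas and infrequent periodic jobs before dropping.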
### Monitor index creation progress
Do not assume index creation succeeded. Use `pg_stat_progress_create_index` (Postgres 12+) to watch progress live:
```sql
SELECT * FROM pg_stat_progress_create_index;
```
In psql, use `\watch 5` to refresh every 5 seconds for a live dashboard view. `CREATE INDEX CONCURRENTLY` and `REINDEX CONCURRENTLY` have more phases than standard operations: monitor for blocking sessions and wait events.
### Validate index integrity
Check for invalid indexes regularly:
```sql
SELECT c.relname AS index_name, i.indisvalid
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE i.indisvalid = false;
```
Invalid indexes are ignored by the planner. They waste space and cause inconsistent query performance, especially on partitioned tables where some partitions may have valid indexes and others do not.
## Concurrent operations
### Always use CONCURRENTLY in production
Never create or drop indexes without `CONCURRENTLY` on live tables. Without it, the operation holds a lock that blocks all writes.
```sql
-- Create
CREATE INDEX CONCURRENTLY IF NOT EXISTS index_name ON table_name (column_name);
-- Drop
DROP INDEX CONCURRENTLY IF EXISTS index_name;
```
`DROP INDEX CONCURRENTLY` cannot run inside a transaction block.
### Always use IF NOT EXISTS / IF EXISTS
Makes scripts idempotent. Safe to re-run without errors from duplicate or missing indexes.
### Concurrent indexing can fail silently
`CREATE INDEX CONCURRENTLY` can fail without raising an error. The result is an invalid index that the planner ignores. This is particularly dangerous on partitioned tables: some partitions get valid indexes, others don't, causing inconsistent query performance.
After any concurrent index creation, always validate:
```sql
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON i.indexrelid = c.oid
WHERE c.relname LIKE '%your_index_name%';
```
## Reindexing invalid indexes
Rebuild invalid indexes without locking writes:
```sql
REINDEX INDEX CONCURRENTLY index_name;
```
### Understanding _ccnew and _ccold artifacts
When `CREATE INDEX CONCURRENTLY` or `REINDEX INDEX CONCURRENTLY` is interrupted, temporary indexes may remain:
| Suffix | Meaning | Action |
|--------|---------|--------|
| `_ccnew` | New index being built, incomplete | Drop it and retry `REINDEX CONCURRENTLY` |
| `_ccold` | Old index being replaced, rebuild succeeded | Safe to drop |
```sql
-- Example: both original and temp are invalid
-- users_emails_2019 btree (col) INVALID
-- users_emails_2019_ccnew btree (col) INVALID
-- Drop the failed new one, then retry
DROP INDEX CONCURRENTLY IF EXISTS users_emails_2019_ccnew;
REINDEX INDEX CONCURRENTLY users_emails_2019;
```
These leftovers clutter the schema, confuse developers, and waste disk space. Clean them up.
## Indexing partitioned tables
### Do NOT use ALTER INDEX ATTACH PARTITION
The PostgreSQL documentation warns that `ALTER INDEX ... ATTACH PARTITION` prevents dropping malfunctioning or non-performant indexes from individual partitions: an attached index cannot be dropped by itself and is dropped automatically when its parent index is dropped.
This removes the ability to manage indexes per-partition, which we need for:
- Dropping broken indexes on specific partitions
- Skipping indexes on old partitions to save storage
- Rebuilding indexes on individual partitions without affecting others
### Correct approach: create on partitions, then on parent
1. Create the index on each child partition concurrently:
```sql
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_child_partition
ON child_partition (column_name);
```
2. Create the index on the parent table (metadata-only, fast):
```sql
CREATE INDEX IF NOT EXISTS idx_parent
ON parent_table (column_name);
```
PostgreSQL automatically attaches an existing partition-level index to the parent index when their definitions match (same columns, expressions, and operator classes); the index names do not need to match.
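To confirm which partition indexes ended up attached to the parent, note that `pg_inherits` tracks index inheritance as well as table inheritance. A sketch, assuming a hypothetical parent index named `idx_parent`:
```sql
-- List partition indexes attached to a parent partitioned index
-- ('idx_parent' is a placeholder name)
SELECT child.relname AS partition_index
FROM pg_inherits
JOIN pg_class child ON child.oid = pg_inherits.inhrelid
JOIN pg_class parent ON parent.oid = pg_inherits.inhparent
WHERE parent.relname = 'idx_parent';
```
If a partition is missing from the result, create its index concurrently and re-run `CREATE INDEX IF NOT EXISTS` on the parent.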
### Prioritize active partitions
For time-based partitions (the `findings` table is partitioned monthly):
- Create indexes on recent/current partitions where data is actively queried
- Skip older partitions that are rarely accessed
- The `all_partitions=False` default in `create_index_on_partitions` handles this automatically
## Index maintenance and bloat
Over time, B-tree indexes accumulate bloat from updates and deletes. VACUUM reclaims heap space but does NOT rebalance B-tree pages. Periodic reindexing is necessary for heavily updated tables.
### Detecting bloat
Indexes with estimated bloat above 50% are candidates for `REINDEX CONCURRENTLY`. Check bloat with tools like `pgstattuple` or bloat estimation queries.
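One concrete measurement is the `pgstattuple` extension's `pgstatindex` function. A sketch, assuming you have privileges to install the extension and substituting your own index name:
```sql
-- Requires the pgstattuple extension
CREATE EXTENSION IF NOT EXISTS pgstattuple;
-- avg_leaf_density is roughly the inverse of bloat: ~90% is healthy,
-- values below ~50% suggest the index is a REINDEX candidate
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('your_index_name');
```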
### Reducing bloat buildup
Three things slow degradation:
1. **Upgrade to PostgreSQL 14+** for B-tree deduplication (added in 13) and bottom-up index deletion (added in 14)
2. **Maximize HOT updates** by not indexing frequently-updated columns
3. **Tune autovacuum** to run more aggressively on high-churn tables
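Autovacuum can be tuned per table with storage parameters. A sketch with illustrative thresholds for a high-churn table (the table name and values are assumptions to adapt):
```sql
-- Vacuum after ~2% of rows change instead of the 20% default,
-- and analyze after ~1% instead of the 10% default
ALTER TABLE findings SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_analyze_scale_factor = 0.01
);
```
Lower scale factors mean more frequent, smaller vacuums, which keeps dead tuples from accumulating between runs.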
### Rebuilding many indexes without deadlocks
If you rebuild two indexes on the same table in parallel, PostgreSQL detects a deadlock and kills one session. To rebuild many indexes across multiple sessions safely, assign all indexes for a given table to the same session:
```sql
\set NUMBER_OF_SESSIONS 10
SELECT
format('%I.%I', n.nspname, c.relname) AS table_fqn,
format('%I.%I', n.nspname, i.relname) AS index_fqn,
mod(
hashtext(format('%I.%I', n.nspname, c.relname)) & 2147483647,
:NUMBER_OF_SESSIONS
) AS session_id
FROM pg_index idx
JOIN pg_class c ON idx.indrelid = c.oid
JOIN pg_class i ON idx.indexrelid = i.oid
JOIN pg_namespace n ON c.relnamespace = n.oid
WHERE n.nspname NOT IN ('pg_catalog', 'pg_toast', 'information_schema')
ORDER BY table_fqn, index_fqn;
```
Then run each session's indexes in a separate `REINDEX INDEX CONCURRENTLY` call. Set `NUMBER_OF_SESSIONS` based on `max_parallel_maintenance_workers` and available I/O.
## Dropping indexes
### Post-drop maintenance
After dropping an index, run VACUUM and ANALYZE on the table to reclaim any heap bloat and refresh planner statistics:
```sql
-- Full vacuum + analyze (can be heavy on large tables)
VACUUM (ANALYZE) your_table;
-- Lightweight alternative for huge tables: just update statistics
ANALYZE your_table;
```
## Commands
```sql
-- Validate query uses an index
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;
-- Check index creation progress
SELECT * FROM pg_stat_progress_create_index;
-- Find invalid indexes
SELECT c.relname, i.indisvalid
FROM pg_class c JOIN pg_index i ON i.indexrelid = c.oid
WHERE i.indisvalid = false;
-- Find unused indexes
SELECT relname, indexrelname, idx_scan, pg_size_pretty(pg_relation_size(indexrelid))
FROM pg_stat_all_indexes
WHERE schemaname = 'public' AND idx_scan = 0;
-- Create index safely
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_name ON table_name (columns);
-- Drop index safely
DROP INDEX CONCURRENTLY IF EXISTS idx_name;
-- Rebuild invalid index
REINDEX INDEX CONCURRENTLY idx_name;
-- Post-drop maintenance
VACUUM (ANALYZE) table_name;
```
## Context7 lookups
**Prerequisite:** Install Context7 MCP server for up-to-date documentation lookup.
| Library | Context7 ID | Use for |
|---------|-------------|---------|
| PostgreSQL | `/websites/postgresql_org_docs_current` | Index types, EXPLAIN, partitioned table indexing, REINDEX |
**Example queries:**
```
mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="CREATE INDEX CONCURRENTLY partitioned table")
mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="EXPLAIN ANALYZE BUFFERS query plan")
mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="partial index WHERE clause")
mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="REINDEX CONCURRENTLY invalid index")
mcp_context7_query-docs(libraryId="/websites/postgresql_org_docs_current", query="pg_stat_all_indexes monitoring")
```
> **Note:** Use `mcp_context7_resolve-library-id` first if you need to find the correct library ID.
## Resources
- **EXPLAIN Visualizer**: [pev](https://tatiyants.com/pev/)