mirror of
https://github.com/prowler-cloud/prowler.git
synced 2026-04-09 19:28:23 +00:00
Compare commits
2 Commits
fix/findin
...
improve-co
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
6ca837dfc0 | ||
|
|
b3c94cc54a |
@@ -1,16 +1,28 @@
|
||||
---
|
||||
name: prowler-compliance
|
||||
description: >
|
||||
Creates and manages Prowler compliance frameworks.
|
||||
Trigger: When working with compliance frameworks (CIS, NIST, PCI-DSS, SOC2, GDPR, ISO27001, ENS, MITRE ATT&CK).
|
||||
Creates, syncs, audits and manages Prowler compliance frameworks end-to-end.
|
||||
Covers the four-layer architecture (SDK models → JSON catalogs → output
|
||||
formatters → API/UI), upstream sync workflows, cloud-auditor check-mapping
|
||||
reviews, output formatter creation, and framework-specific attribute models.
|
||||
Trigger: When working with compliance frameworks (CIS, NIST, PCI-DSS, SOC2,
|
||||
GDPR, ISO27001, ENS, MITRE ATT&CK, CCC, C5, CSA CCM, KISA ISMS-P,
|
||||
Prowler ThreatScore, FedRAMP, HIPAA), syncing with upstream catalogs,
|
||||
auditing check-to-requirement mappings, adding output formatters, or fixing
|
||||
compliance JSON bugs (duplicate IDs, empty Version, wrong Section, stale
|
||||
check refs).
|
||||
license: Apache-2.0
|
||||
metadata:
|
||||
author: prowler-cloud
|
||||
version: "1.1"
|
||||
version: "1.2"
|
||||
scope: [root, sdk]
|
||||
auto_invoke:
|
||||
- "Creating/updating compliance frameworks"
|
||||
- "Mapping checks to compliance controls"
|
||||
- "Syncing compliance framework with upstream catalog"
|
||||
- "Auditing check-to-requirement mappings as a cloud auditor"
|
||||
- "Adding a compliance output formatter (per-provider class + table dispatcher)"
|
||||
- "Fixing compliance JSON bugs (duplicate IDs, empty Section, stale refs)"
|
||||
allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, WebSearch, Task
|
||||
---
|
||||
|
||||
@@ -18,10 +30,82 @@ allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, WebSearch, Task
|
||||
|
||||
Use this skill when:
|
||||
- Creating a new compliance framework for any provider
|
||||
- **Syncing an existing framework with an upstream source of truth** (CIS, FINOS CCC, CSA CCM, NIST, ENS, etc.)
|
||||
- Adding requirements to existing frameworks
|
||||
- Mapping checks to compliance controls
|
||||
- **Auditing existing check mappings as a cloud auditor** (user asks "are these mappings correct?", "which checks apply to this requirement?", "review the mappings")
|
||||
- **Adding a new output formatter** (new framework needs a table dispatcher + per-provider classes + CSV models)
|
||||
- **Fixing JSON bugs**: duplicate IDs, empty Version, wrong Section, stale check refs, inconsistent FamilyName, padded tangential check mappings
|
||||
- **Registering a framework in the CLI table dispatcher or API export map**
|
||||
- Investigating why a finding/check isn't showing under the expected compliance framework in the UI
|
||||
- Understanding compliance framework structures and attributes
|
||||
|
||||
## Four-Layer Architecture (Mental Model)
|
||||
|
||||
Prowler compliance is a **four-layer system** hanging off one Pydantic model tree. Bugs usually happen where one layer doesn't match another, so know all four before touching anything.
|
||||
|
||||
### Layer 1: SDK / Core Models — `prowler/lib/check/`
|
||||
|
||||
- **`compliance_models.py`** — Pydantic **v1** model tree (`from pydantic.v1 import`). One `*_Requirement_Attribute` class per framework type + `Generic_Compliance_Requirement_Attribute` as fallback.
|
||||
- `Compliance_Requirement.Attributes: list[Union[...]]` — **`Generic_Compliance_Requirement_Attribute` MUST be LAST** in the Union or every framework-specific attribute falls through to Generic (Pydantic v1 tries union members in order).
|
||||
- **`compliance.py`** — runtime linker. `get_check_compliance()` builds the key as `f"{Framework}-{Version}"` **only if `Version` is non-empty**. An empty Version makes the key just `"{Framework}"` — this breaks downstream filters and tests that expect the versioned key.
|
||||
- `Compliance.get_bulk(provider)` walks `prowler/compliance/{provider}/` and parses every `.json` file. No central index — just directory scan.
|
||||
|
||||
### Layer 2: JSON Frameworks — `prowler/compliance/{provider}/`
|
||||
|
||||
See "Compliance Framework Location" and "Framework-Specific Attribute Structures" sections below.
|
||||
|
||||
### Layer 3: Output Formatters — `prowler/lib/outputs/compliance/{framework}/`
|
||||
|
||||
**Every framework directory follows this exact convention** — do not deviate:
|
||||
|
||||
```
|
||||
{framework}/
|
||||
├── __init__.py
|
||||
├── {framework}.py # ONLY get_{framework}_table() — NO function docstring
|
||||
├── {framework}_{provider}.py # One class per provider (e.g., CCC_AWS, CCC_Azure, CCC_GCP)
|
||||
└── models.py # One Pydantic v2 BaseModel per provider (CSV columns)
|
||||
```
|
||||
|
||||
- **`{framework}.py`** holds the **table dispatcher function** `get_{framework}_table()`. It prints the pass/fail/muted summary table. **Must NOT import `Finding` or `ComplianceOutput`** — doing so creates a circular import with `prowler/lib/outputs/compliance/compliance.py`. Only imports: `colorama`, `tabulate`, `prowler.config.config.orange_color`.
|
||||
- **`{framework}_{provider}.py`** holds a per-provider class like `CCC_AWS(ComplianceOutput)` with a `transform()` method that walks findings and emits rows. This file IS allowed to import `Finding` because it's not on the dispatcher import chain.
|
||||
- **`models.py`** holds one Pydantic v2 `BaseModel` per provider. Field names become CSV column headers (**public API** — renaming breaks downstream consumers).
|
||||
- **Never collapse per-provider files into a unified parameterized class**, even when DRY-tempting. Every framework in Prowler follows the per-provider file pattern and reviewers will reject the refactor. CSV columns differ per provider (`AccountId`/`Region` vs `SubscriptionId`/`Location` vs `ProjectId`/`Location`) — three classes is the convention.
|
||||
- **No function docstring on `get_{framework}_table()`** — no other framework has one; stay consistent.
|
||||
- Register in `prowler/lib/outputs/compliance/compliance.py` → `display_compliance_table()` with an `elif compliance_framework.startswith("{framework}_"):` branch. Import the table function at the top of the file.
|
||||
|
||||
### Layer 4: API / UI
|
||||
|
||||
- **API table dispatcher**: `api/src/backend/tasks/jobs/export.py` → `COMPLIANCE_CLASS_MAP` keyed by provider. Uses `startswith` predicates: `(lambda name: name.startswith("ccc_"), CCC_AWS)`. **Never use exact match** (`name == "ccc_aws"`) — it's inconsistent and breaks versioning.
|
||||
- **API lazy loader**: `api/src/backend/api/compliance.py` — `LazyComplianceTemplate` and `LazyChecksMapping` load compliance per provider on first access.
|
||||
- **UI mapper routing**: `ui/lib/compliance/compliance-mapper.ts` routes framework names → per-framework mapper.
|
||||
- **UI per-framework mapper**: `ui/lib/compliance/{framework}.tsx` flattens `Requirements` into a 3-level tree (Framework → Category → Control → Requirement) for the accordion view. Groups by `Attributes[0].FamilyName` and `Attributes[0].Section`.
|
||||
- **UI detail panel**: `ui/components/compliance/compliance-custom-details/{framework}-details.tsx`.
|
||||
- **UI types**: `ui/types/compliance.ts` — TypeScript mirrors of the attribute metadata.
|
||||
|
||||
### The CLI Pipeline (end-to-end)
|
||||
|
||||
```
|
||||
prowler aws --compliance ccc_aws
|
||||
↓
|
||||
Compliance.get_bulk("aws") → parses prowler/compliance/aws/*.json
|
||||
↓
|
||||
update_checks_metadata_with_compliance() → attaches compliance info to CheckMetadata
|
||||
↓
|
||||
execute_checks() → runs checks, produces Finding objects
|
||||
↓
|
||||
get_check_compliance(finding, "aws", bulk_checks_metadata)
|
||||
→ dict "{Framework}-{Version}" → [requirement_ids]
|
||||
↓
|
||||
CCC_AWS(findings, compliance).transform() → per-provider class builds CSV rows
|
||||
↓
|
||||
batch_write_data_to_file() → writes {output_filename}_ccc_aws.csv
|
||||
↓
|
||||
display_compliance_table() → get_ccc_table() → prints stdout summary
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Compliance Framework Location
|
||||
|
||||
Frameworks are JSON files located in: `prowler/compliance/{provider}/{framework_name}_{provider}.json`
|
||||
@@ -455,14 +539,453 @@ Prowler ThreatScore is a custom security scoring framework developed by Prowler
|
||||
- **M365:** `cis_4.0_m365.json`, `iso27001_2022_m365.json`
|
||||
- **NHN:** `iso27001_2022_nhn.json`
|
||||
|
||||
## Workflow A: Sync a Framework With an Upstream Catalog
|
||||
|
||||
Use when the framework is maintained upstream (CIS Benchmarks, FINOS CCC, CSA CCM, NIST, ENS, etc.) and Prowler needs to catch up.
|
||||
|
||||
### Step 1 — Cache the upstream source
|
||||
|
||||
Download every upstream file to a local cache so subsequent iterations don't hit the network. For FINOS CCC:
|
||||
|
||||
```bash
|
||||
mkdir -p /tmp/ccc_upstream
|
||||
catalogs="core/ccc storage/object management/auditlog management/logging ..."
|
||||
for p in $catalogs; do
|
||||
safe=$(echo "$p" | tr '/' '_')
|
||||
gh api "repos/finos/common-cloud-controls/contents/catalogs/$p/controls.yaml" \
|
||||
-H "Accept: application/vnd.github.raw" > "/tmp/ccc_upstream/${safe}.yaml"
|
||||
done
|
||||
```
|
||||
|
||||
### Step 2 — Run the generic sync runner against a framework config
|
||||
|
||||
The sync tooling is split into three layers so adding a new framework only takes a YAML config (and optionally a new parser module for an unfamiliar upstream format):
|
||||
|
||||
```
|
||||
skills/prowler-compliance/assets/
|
||||
├── sync_framework.py # generic runner — works for any framework
|
||||
├── configs/
|
||||
│ └── ccc.yaml # per-framework config (canonical example)
|
||||
└── parsers/
|
||||
├── __init__.py
|
||||
└── finos_ccc.py # parser module for FINOS CCC YAML
|
||||
```
|
||||
|
||||
**For frameworks that already have a config + parser** (today: FINOS CCC), run:
|
||||
|
||||
```bash
|
||||
python skills/prowler-compliance/assets/sync_framework.py \
|
||||
skills/prowler-compliance/assets/configs/ccc.yaml
|
||||
```
|
||||
|
||||
The runner loads the config, validates it, dynamically imports the parser declared in `parser.module`, calls `parser.parse_upstream(config) -> list[dict]`, then applies generic post-processing (id uniqueness safety net, `FamilyName` normalization, legacy check-mapping preservation) and writes the provider JSONs.
|
||||
|
||||
**To add a new framework sync**:
|
||||
|
||||
1. **Write a config file** at `skills/prowler-compliance/assets/configs/{framework}.yaml`. See `configs/ccc.yaml` as the canonical example. Required top-level sections:
|
||||
- `framework` — `name`, `display_name`, `version` (**never empty** — empty Version silently breaks `get_check_compliance()` key construction, so the runner refuses to start), `description_template` (accepts `{provider_display}`, `{provider_key}`, `{framework_name}`, `{framework_display}`, `{version}` placeholders).
|
||||
- `providers` — list of `{key, display}` pairs, one per Prowler provider the framework targets.
|
||||
- `output.path_template` — supports `{provider}`, `{framework}`, `{version}` placeholders. Examples: `"prowler/compliance/{provider}/ccc_{provider}.json"` for unversioned file names, `"prowler/compliance/{provider}/cis_{version}_{provider}.json"` for versioned ones.
|
||||
- `upstream.dir` — local cache directory (populate via Step 1).
|
||||
- `parser.module` — name of the module under `parsers/` to load (without `.py`). Everything else under `parser.` is opaque to the runner and passed to the parser as config.
|
||||
- `post_processing.check_preservation.primary_key` — top-level field name for the primary legacy-mapping lookup (almost always `Id`).
|
||||
- `post_processing.check_preservation.fallback_keys` — **config-driven fallback keys** for preserving check mappings when ids change. Each entry is a list of `Attributes[0]` field names composed into a tuple. Examples:
|
||||
- CCC: `- [Section, Applicability]` (because `Applicability` is a CCC-only attribute, verified in `compliance_models.py:213`).
|
||||
- CIS would use `- [Section, Profile]`.
|
||||
- NIST would use `- [ItemId]`.
|
||||
- List-valued fields (like `Applicability`) are automatically frozen to `frozenset` so the tuple is hashable.
|
||||
- `post_processing.family_name_normalization` (optional) — map of raw → canonical `FamilyName` values. The UI groups by `Attributes[0].FamilyName` exactly, so inconsistent upstream variants otherwise become separate tree branches.
|
||||
|
||||
2. **Reuse an existing parser** if the upstream format matches one (currently only `finos_ccc` exists). Otherwise, **write a new parser** at `parsers/{name}.py` implementing:
|
||||
|
||||
```python
|
||||
def parse_upstream(config: dict) -> list[dict]:
|
||||
"""Return Prowler-format requirements {Id, Description, Attributes: [...], Checks: []}.
|
||||
|
||||
Ids MUST be unique in the returned list. The runner raises ValueError
|
||||
on duplicates — it does NOT silently renumber, because mutating a
|
||||
canonical upstream id (e.g. CIS '1.1.1' or NIST 'AC-2(1)') would be
|
||||
catastrophic. The parser owns all upstream-format quirks: foreign-prefix
|
||||
rewriting, genuine collision renumbering, shape handling.
|
||||
"""
|
||||
```
|
||||
|
||||
The parser reads its own settings from `config['upstream']` and `config['parser']`. It does NOT load existing Prowler JSONs (the runner does that for check preservation) and does NOT write output (the runner does that too).
|
||||
|
||||
**Gotchas the runner already handles for you** (learned from the FINOS CCC v2025.10 sync — they're documented here so you don't re-discover them):
|
||||
|
||||
- **Multiple upstream YAML shapes**. Most FINOS CCC catalogs use `control-families: [...]`, but `storage/object` uses a top-level `controls: [...]` with a `family: "CCC.X.Y"` reference id and no human-readable family name. A parser that only handles shape 1 silently drops the shape-2 catalog — this exact bug dropped ObjStor from Prowler for a full iteration. `parsers/finos_ccc.py` handles both shapes; if you write a new parser for a similar format, test with at least one file of each shape.
|
||||
- **Whitespace collapse**. Upstream YAML multi-line block scalars (`|`) preserve newlines. Prowler stores descriptions single-line. Collapse with `" ".join(value.split())` before emitting (see `parsers/finos_ccc.py::clean()`).
|
||||
- **Foreign-prefix AR id rewriting**. Upstream sometimes aliases requirements across catalogs by keeping the original prefix (e.g., `CCC.AuditLog.CN08.AR01` appears nested under `CCC.Logging.CN03`). Rewrite the foreign id to fit its parent control: `CCC.Logging.CN03.AR01`. This logic is parser-specific because the id structure varies per framework (CCC uses 3-dot depth; CIS uses numeric dots; NIST uses `AC-2(1)`).
|
||||
- **Genuine upstream collision renumbering**. Sometimes upstream has a real typo where two different requirements share the same id (e.g., `CCC.Core.CN14.AR02` defined twice for 30-day and 14-day backup variants). Renumber the second copy to the next free AR number (`.AR03`). The parser handles this; the runner asserts the final list has unique ids as a safety net.
|
||||
- **Existing check mapping preservation**. The runner uses the `primary_key` + `fallback_keys` declared in config to look up the old `Checks` list for each requirement. For CCC this means primary index by `Id` plus fallback index by `(Section, frozenset(Applicability))` — the fallback recovers mappings for requirements whose ids were rewritten or renumbered by the parser.
|
||||
- **FamilyName normalization**. Configured via `post_processing.family_name_normalization` — no code changes needed to collapse upstream variants like `"Logging & Monitoring"` → `"Logging and Monitoring"`.
|
||||
- **Populate `Version`**. The runner refuses to start on empty `framework.version` — fail-fast replaces the silent bug where `get_check_compliance()` would build the key as just `"{Framework}"`.
|
||||
|
||||
### Step 3 — Validate before committing
|
||||
|
||||
```python
|
||||
from prowler.lib.check.compliance_models import Compliance
|
||||
for prov in ['aws', 'azure', 'gcp']:
|
||||
c = Compliance.parse_file(f"prowler/compliance/{prov}/ccc_{prov}.json")
|
||||
print(f"{prov}: {len(c.Requirements)} reqs, version={c.Version}")
|
||||
```
|
||||
|
||||
Any `ValidationError` means the Attribute fields don't match the `*_Requirement_Attribute` model. Either fix the JSON or extend the model in `compliance_models.py` (remember: Generic stays last).
|
||||
|
||||
### Step 4 — Verify every check id exists
|
||||
|
||||
```python
|
||||
import json
|
||||
from pathlib import Path
|
||||
for prov in ['aws', 'azure', 'gcp']:
|
||||
existing = {p.stem.replace('.metadata','')
|
||||
for p in Path(f'prowler/providers/{prov}/services').rglob('*.metadata.json')}
|
||||
with open(f'prowler/compliance/{prov}/ccc_{prov}.json') as f:
|
||||
data = json.load(f)
|
||||
refs = {c for r in data['Requirements'] for c in r['Checks']}
|
||||
missing = refs - existing
|
||||
assert not missing, f"{prov} missing: {missing}"
|
||||
```
|
||||
|
||||
A stale check id silently becomes dead weight — no finding will ever map to it. This pre-validation **must run on every write**; bake it into the generator script.
|
||||
|
||||
### Step 5 — Add an attribute model if needed
|
||||
|
||||
Only if the framework has fields beyond `Generic_Compliance_Requirement_Attribute`. Add the class to `prowler/lib/check/compliance_models.py` and register it in `Compliance_Requirement.Attributes: list[Union[...]]`. **Generic stays last.**
|
||||
|
||||
---
|
||||
|
||||
## Workflow B: Audit Check Mappings as a Cloud Auditor
|
||||
|
||||
Use when the user asks to review existing mappings ("are these correct?", "verify that the checks apply", "audit the CCC mappings"). This is the highest-value compliance task — it surfaces padded mappings with zero actual coverage and missing mappings for legitimate coverage.
|
||||
|
||||
### The golden rule
|
||||
|
||||
> A Prowler check's title/risk MUST **literally describe what the requirement text says**. "Related" is not enough. If no check actually addresses the requirement, leave `Checks: []` (MANUAL) — **honest MANUAL is worth more than padded coverage**.
|
||||
|
||||
### Audit process
|
||||
|
||||
**Step 1 — Build a per-provider check inventory** (cache in `/tmp/`):
|
||||
|
||||
```python
|
||||
import json
|
||||
from pathlib import Path
|
||||
for provider in ['aws', 'azure', 'gcp']:
|
||||
inv = {}
|
||||
for meta in Path(f'prowler/providers/{provider}/services').rglob('*.metadata.json'):
|
||||
with open(meta) as f:
|
||||
d = json.load(f)
|
||||
cid = d.get('CheckID') or meta.stem.replace('.metadata','')
|
||||
inv[cid] = {
|
||||
'service': d.get('ServiceName', ''),
|
||||
'title': d.get('CheckTitle', ''),
|
||||
'risk': d.get('Risk', ''),
|
||||
'description': d.get('Description', ''),
|
||||
}
|
||||
with open(f'/tmp/checks_{provider}.json', 'w') as f:
|
||||
json.dump(inv, f, indent=2)
|
||||
```
|
||||
|
||||
**Step 2 — Keyword/service query helper** — see [assets/query_checks.py](assets/query_checks.py):
|
||||
|
||||
```bash
|
||||
python assets/query_checks.py aws encryption transit # keyword AND-search
|
||||
python assets/query_checks.py aws --service iam # all iam checks
|
||||
python assets/query_checks.py aws --id kms_cmk_rotation_enabled # full metadata
|
||||
```
|
||||
|
||||
**Step 3 — Dump a framework section with current mappings** — see [assets/dump_section.py](assets/dump_section.py):
|
||||
|
||||
```bash
|
||||
python assets/dump_section.py "CCC.Core." # all Core ARs across 3 providers
|
||||
python assets/dump_section.py "CCC.AuditLog." # all AuditLog ARs
|
||||
```
|
||||
|
||||
**Step 4 — Encode explicit REPLACE decisions** — see [assets/audit_framework_template.py](assets/audit_framework_template.py). Structure:
|
||||
|
||||
```python
|
||||
DECISIONS = {}
|
||||
|
||||
DECISIONS["CCC.Core.CN01.AR01"] = {
|
||||
"aws": [
|
||||
"cloudfront_distributions_https_enabled",
|
||||
"cloudfront_distributions_origin_traffic_encrypted",
|
||||
# ...
|
||||
],
|
||||
"azure": [
|
||||
"storage_secure_transfer_required_is_enabled",
|
||||
"app_minimum_tls_version_12",
|
||||
# ...
|
||||
],
|
||||
"gcp": [
|
||||
"cloudsql_instance_ssl_connections",
|
||||
],
|
||||
# Missing provider key = leave the legacy mapping untouched
|
||||
}
|
||||
|
||||
# Empty list = EXPLICITLY MANUAL (overwrites legacy)
|
||||
DECISIONS["CCC.Core.CN01.AR07"] = {
|
||||
"aws": [], # Prowler has no IANA port/protocol check
|
||||
"azure": [],
|
||||
"gcp": [],
|
||||
}
|
||||
```
|
||||
|
||||
**REPLACE, not PATCH.** Encoding every mapping as a full list (not add/remove delta) makes the audit reproducible and surfaces hidden assumptions from the legacy data.
|
||||
|
||||
**Step 5 — Pre-validation**. The audit script MUST validate every check id against the inventory and **abort with stderr listing typos**. Common typos caught during a real audit:
|
||||
|
||||
- `fsx_file_system_encryption_at_rest_using_kms` (doesn't exist)
|
||||
- `cosmosdb_account_encryption_at_rest_with_cmk` (doesn't exist)
|
||||
- `sqlserver_geo_replication` (doesn't exist)
|
||||
- `redshift_cluster_audit_logging` (should be `redshift_cluster_encrypted_at_rest`)
|
||||
- `postgresql_flexible_server_require_secure_transport` (should be `postgresql_flexible_server_enforce_ssl_enabled`)
|
||||
- `storage_secure_transfer_required_enabled` (should be `storage_secure_transfer_required_is_enabled`)
|
||||
- `sqlserver_minimum_tls_version_12` (should be `sqlserver_recommended_minimal_tls_version`)
|
||||
|
||||
**Step 6 — Apply + validate + test**:
|
||||
|
||||
```bash
|
||||
python /path/to/audit_script.py # applies decisions, pre-validates
|
||||
python -m pytest tests/lib/outputs/compliance/ tests/lib/check/ -q
|
||||
```
|
||||
|
||||
### Audit Reference Table: Requirement Text → Prowler Checks
|
||||
|
||||
Use this table to map CCC-style / NIST-style / ISO-style requirements to the checks that actually verify them. Built from a real audit of 172 CCC ARs × 3 providers.
|
||||
|
||||
| Requirement text | AWS checks | Azure checks | GCP checks |
|
||||
|---|---|---|---|
|
||||
| **TLS in transit enforced** | `cloudfront_distributions_https_enabled`, `s3_bucket_secure_transport_policy`, `elbv2_ssl_listeners`, `elbv2_insecure_ssl_ciphers`, `elb_ssl_listeners`, `elb_insecure_ssl_ciphers`, `opensearch_service_domains_https_communications_enforced`, `rds_instance_transport_encrypted`, `redshift_cluster_in_transit_encryption_enabled`, `elasticache_redis_cluster_in_transit_encryption_enabled`, `dynamodb_accelerator_cluster_in_transit_encryption_enabled`, `dms_endpoint_ssl_enabled`, `kafka_cluster_in_transit_encryption_enabled`, `transfer_server_in_transit_encryption_enabled`, `glue_database_connections_ssl_enabled`, `sns_subscription_not_using_http_endpoints` | `storage_secure_transfer_required_is_enabled`, `storage_ensure_minimum_tls_version_12`, `postgresql_flexible_server_enforce_ssl_enabled`, `mysql_flexible_server_ssl_connection_enabled`, `mysql_flexible_server_minimum_tls_version_12`, `sqlserver_recommended_minimal_tls_version`, `app_minimum_tls_version_12`, `app_ensure_http_is_redirected_to_https`, `app_ftp_deployment_disabled` | `cloudsql_instance_ssl_connections` (almost only option) |
|
||||
| **TLS 1.3 specifically** | Partial: `cloudfront_distributions_using_deprecated_ssl_protocols`, `elb*_insecure_ssl_ciphers`, `*_minimum_tls_version_12` | Partial: `*_minimum_tls_version_12` checks | None — accept as MANUAL |
|
||||
| **SSH / port 22 hardening** | `ec2_instance_port_ssh_exposed_to_internet`, `ec2_securitygroup_allow_ingress_from_internet_to_tcp_port_22`, `ec2_networkacl_allow_ingress_tcp_port_22` | `network_ssh_internet_access_restricted`, `vm_linux_enforce_ssh_authentication` | `compute_firewall_ssh_access_from_the_internet_allowed`, `compute_instance_block_project_wide_ssh_keys_disabled`, `compute_project_os_login_enabled`, `compute_project_os_login_2fa_enabled` |
|
||||
| **mTLS (mutual TLS)** | `kafka_cluster_mutual_tls_authentication_enabled`, `apigateway_restapi_client_certificate_enabled` | `app_client_certificates_on` | None — MANUAL |
|
||||
| **Data at rest encrypted** | `s3_bucket_default_encryption`, `s3_bucket_kms_encryption`, `ec2_ebs_default_encryption`, `ec2_ebs_volume_encryption`, `rds_instance_storage_encrypted`, `rds_cluster_storage_encrypted`, `rds_snapshots_encrypted`, `dynamodb_tables_kms_cmk_encryption_enabled`, `redshift_cluster_encrypted_at_rest`, `neptune_cluster_storage_encrypted`, `documentdb_cluster_storage_encrypted`, `opensearch_service_domains_encryption_at_rest_enabled`, `kinesis_stream_encrypted_at_rest`, `firehose_stream_encrypted_at_rest`, `sns_topics_kms_encryption_at_rest_enabled`, `sqs_queues_server_side_encryption_enabled`, `efs_encryption_at_rest_enabled`, `athena_workgroup_encryption`, `glue_data_catalogs_metadata_encryption_enabled`, `backup_vaults_encrypted`, `backup_recovery_point_encrypted`, `cloudtrail_kms_encryption_enabled`, `cloudwatch_log_group_kms_encryption_enabled`, `eks_cluster_kms_cmk_encryption_in_secrets_enabled`, `sagemaker_notebook_instance_encryption_enabled`, `apigateway_restapi_cache_encrypted`, `kafka_cluster_encryption_at_rest_uses_cmk`, `dynamodb_accelerator_cluster_encryption_enabled`, `storagegateway_fileshare_encryption_enabled` | `storage_infrastructure_encryption_is_enabled`, `storage_ensure_encryption_with_customer_managed_keys`, `vm_ensure_attached_disks_encrypted_with_cmk`, `vm_ensure_unattached_disks_encrypted_with_cmk`, `sqlserver_tde_encryption_enabled`, `sqlserver_tde_encrypted_with_cmk`, `databricks_workspace_cmk_encryption_enabled`, `monitor_storage_account_with_activity_logs_cmk_encrypted` | `compute_instance_encryption_with_csek_enabled`, `dataproc_encrypted_with_cmks_disabled`, `bigquery_dataset_cmk_encryption`, `bigquery_table_cmk_encryption` |
|
||||
| **CMEK required (customer-managed keys)** | `kms_cmk_are_used` | `storage_ensure_encryption_with_customer_managed_keys`, `vm_ensure_attached_disks_encrypted_with_cmk`, `vm_ensure_unattached_disks_encrypted_with_cmk`, `sqlserver_tde_encrypted_with_cmk`, `databricks_workspace_cmk_encryption_enabled` | `bigquery_dataset_cmk_encryption`, `bigquery_table_cmk_encryption`, `dataproc_encrypted_with_cmks_disabled`, `compute_instance_encryption_with_csek_enabled` |
|
||||
| **Key rotation enabled** | `kms_cmk_rotation_enabled` | `keyvault_key_rotation_enabled`, `storage_key_rotation_90_days` | `kms_key_rotation_enabled` |
|
||||
| **MFA for UI access** | `iam_root_mfa_enabled`, `iam_root_hardware_mfa_enabled`, `iam_user_mfa_enabled_console_access`, `iam_user_hardware_mfa_enabled`, `iam_administrator_access_with_mfa`, `cognito_user_pool_mfa_enabled` | `entra_privileged_user_has_mfa`, `entra_non_privileged_user_has_mfa`, `entra_user_with_vm_access_has_mfa`, `entra_security_defaults_enabled` | `compute_project_os_login_2fa_enabled` |
|
||||
| **API access / credentials** | `iam_no_root_access_key`, `iam_user_no_setup_initial_access_key`, `apigateway_restapi_authorizers_enabled`, `apigateway_restapi_public_with_authorizer`, `apigatewayv2_api_authorizers_enabled` | `entra_conditional_access_policy_require_mfa_for_management_api`, `app_function_access_keys_configured`, `app_function_identity_is_configured` | `apikeys_api_restrictions_configured`, `apikeys_key_exists`, `apikeys_key_rotated_in_90_days` |
|
||||
| **Log all admin/config changes** | `cloudtrail_multi_region_enabled`, `cloudtrail_multi_region_enabled_logging_management_events`, `cloudtrail_cloudwatch_logging_enabled`, `cloudtrail_log_file_validation_enabled`, `cloudwatch_log_metric_filter_*`, `cloudwatch_changes_to_*_alarm_configured`, `config_recorder_all_regions_enabled` | `monitor_diagnostic_settings_exists`, `monitor_diagnostic_setting_with_appropriate_categories`, `monitor_alert_*` | `iam_audit_logs_enabled`, `logging_log_metric_filter_and_alert_for_*`, `logging_sink_created` |
|
||||
| **Log integrity (digital signatures)** | `cloudtrail_log_file_validation_enabled` (exact) | None | None |
|
||||
| **Public access denied** | `s3_bucket_public_access`, `s3_bucket_public_list_acl`, `s3_bucket_public_write_acl`, `s3_account_level_public_access_blocks`, `apigateway_restapi_public`, `awslambda_function_url_public`, `awslambda_function_not_publicly_accessible`, `rds_instance_no_public_access`, `rds_snapshots_public_access`, `ec2_securitygroup_allow_ingress_from_internet_to_all_ports`, `sns_topics_not_publicly_accessible`, `sqs_queues_not_publicly_accessible` | `storage_blob_public_access_level_is_disabled`, `storage_ensure_private_endpoints_in_storage_accounts`, `containerregistry_not_publicly_accessible`, `keyvault_private_endpoints`, `app_function_not_publicly_accessible`, `aks_clusters_public_access_disabled`, `network_http_internet_access_restricted` | `cloudstorage_bucket_public_access`, `compute_instance_public_ip`, `cloudsql_instance_public_ip`, `compute_firewall_*_access_from_the_internet_allowed` |
|
||||
| **IAM least privilege** | `iam_*_no_administrative_privileges`, `iam_policy_allows_privilege_escalation`, `iam_inline_policy_allows_privilege_escalation`, `iam_role_administratoraccess_policy`, `iam_group_administrator_access_policy`, `iam_user_administrator_access_policy`, `iam_policy_attached_only_to_group_or_roles`, `iam_role_cross_service_confused_deputy_prevention` | `iam_role_user_access_admin_restricted`, `iam_subscription_roles_owner_custom_not_created`, `iam_custom_role_has_permissions_to_administer_resource_locks` | `iam_sa_no_administrative_privileges`, `iam_no_service_roles_at_project_level`, `iam_role_kms_enforce_separation_of_duties`, `iam_role_sa_enforce_separation_of_duties` |
|
||||
| **Password policy** | `iam_password_policy_minimum_length_14`, `iam_password_policy_uppercase`, `iam_password_policy_lowercase`, `iam_password_policy_symbol`, `iam_password_policy_number`, `iam_password_policy_expires_passwords_within_90_days_or_less`, `iam_password_policy_reuse_24` | None | None |
|
||||
| **Credential rotation / unused** | `iam_rotate_access_key_90_days`, `iam_user_accesskey_unused`, `iam_user_console_access_unused` | None | `iam_sa_user_managed_key_rotate_90_days`, `iam_sa_user_managed_key_unused`, `iam_service_account_unused` |
|
||||
| **VPC / flow logs** | `vpc_flow_logs_enabled` | `network_flow_log_captured_sent`, `network_watcher_enabled`, `network_flow_log_more_than_90_days` | `compute_subnet_flow_logs_enabled` |
|
||||
| **Backup / DR / Multi-AZ** | `backup_vaults_exist`, `backup_plans_exist`, `backup_reportplans_exist`, `rds_instance_backup_enabled`, `rds_*_protected_by_backup_plan`, `rds_cluster_multi_az`, `neptune_cluster_backup_enabled`, `documentdb_cluster_backup_enabled`, `efs_have_backup_enabled`, `s3_bucket_cross_region_replication`, `dynamodb_table_protected_by_backup_plan` | `vm_backup_enabled`, `vm_sufficient_daily_backup_retention_period`, `storage_geo_redundant_enabled` | `cloudsql_instance_automated_backups`, `cloudstorage_bucket_log_retention_policy_lock`, `cloudstorage_bucket_sufficient_retention_period` |
|
||||
| **Access analysis / discovery** | `accessanalyzer_enabled`, `accessanalyzer_enabled_without_findings` | None specific | `iam_account_access_approval_enabled`, `iam_cloud_asset_inventory_enabled` |
|
||||
| **Object lock / retention** | `s3_bucket_object_lock`, `s3_bucket_object_versioning`, `s3_bucket_lifecycle_enabled`, `cloudtrail_bucket_requires_mfa_delete`, `s3_bucket_no_mfa_delete` | `storage_ensure_soft_delete_is_enabled`, `storage_blob_versioning_is_enabled`, `storage_ensure_file_shares_soft_delete_is_enabled` | `cloudstorage_bucket_log_retention_policy_lock`, `cloudstorage_bucket_soft_delete_enabled`, `cloudstorage_bucket_versioning_enabled`, `cloudstorage_bucket_sufficient_retention_period` |
|
||||
| **Uniform bucket-level access** | `s3_bucket_acl_prohibited` | `storage_account_key_access_disabled`, `storage_default_to_entra_authorization_enabled` | `cloudstorage_bucket_uniform_bucket_level_access` |
|
||||
| **Container vulnerability scanning** | `ecr_registry_scan_images_on_push_enabled`, `ecr_repositories_scan_vulnerabilities_in_latest_image` | `defender_container_images_scan_enabled`, `defender_container_images_resolved_vulnerabilities` | `artifacts_container_analysis_enabled`, `gcr_container_scanning_enabled` |
|
||||
| **WAF / rate limiting** | `wafv2_webacl_with_rules`, `waf_*_webacl_with_rules`, `wafv2_webacl_logging_enabled`, `waf_global_webacl_logging_enabled` | None | None |
|
||||
| **Deployment region restriction** | `organizations_scp_check_deny_regions` | None | None |
|
||||
| **Secrets automatic rotation** | `secretsmanager_automatic_rotation_enabled`, `secretsmanager_secret_rotated_periodically` | `keyvault_rbac_secret_expiration_set`, `keyvault_non_rbac_secret_expiration_set` | None |
|
||||
| **Certificate management** | `acm_certificates_expiration_check`, `acm_certificates_with_secure_key_algorithms`, `acm_certificates_transparency_logs_enabled` | `keyvault_key_expiration_set_in_non_rbac`, `keyvault_rbac_key_expiration_set`, `keyvault_non_rbac_secret_expiration_set` | None |
|
||||
| **GenAI guardrails / input/output filtering** | `bedrock_guardrail_prompt_attack_filter_enabled`, `bedrock_guardrail_sensitive_information_filter_enabled`, `bedrock_agent_guardrail_enabled`, `bedrock_model_invocation_logging_enabled`, `bedrock_api_key_no_administrative_privileges`, `bedrock_api_key_no_long_term_credentials` | None | None |
|
||||
| **ML dev environment security** | `sagemaker_notebook_instance_root_access_disabled`, `sagemaker_notebook_instance_without_direct_internet_access_configured`, `sagemaker_notebook_instance_vpc_settings_configured`, `sagemaker_models_vpc_settings_configured`, `sagemaker_training_jobs_vpc_settings_configured`, `sagemaker_training_jobs_network_isolation_enabled`, `sagemaker_training_jobs_volume_and_output_encryption_enabled` | None | None |
|
||||
| **Threat detection / anomalous behavior** | `cloudtrail_threat_detection_enumeration`, `cloudtrail_threat_detection_privilege_escalation`, `cloudtrail_threat_detection_llm_jacking`, `guardduty_is_enabled`, `guardduty_no_high_severity_findings` | None | None |
|
||||
| **Serverless private access** | `awslambda_function_inside_vpc`, `awslambda_function_not_publicly_accessible`, `awslambda_function_url_public` | `app_function_not_publicly_accessible` | None |
|
||||
|
||||
### What Prowler Does NOT Cover (accept MANUAL honestly)
|
||||
|
||||
Don't pad mappings for these — mark `Checks: []` and move on:
|
||||
|
||||
- **TLS 1.3 version specifically** — Prowler verifies TLS is enforced, not always the exact version
|
||||
- **IANA port-protocol consistency** — no check for "protocol running on its assigned port"
|
||||
- **mTLS on most Azure/GCP services** — limited to App Service client certs on Azure, nothing on GCP
|
||||
- **Rate limiting** on monitoring endpoints, load balancers, serverless invocations, vector ingestion
|
||||
- **Session cookie expiry** (LB stickiness)
|
||||
- **HTTP header scrubbing** (Server, X-Powered-By)
|
||||
- **Certificate transparency verification for imports**
|
||||
- **Model version pinning, red teaming, AI quality review**
|
||||
- **Vector embedding validation, dimensional constraints, ANN vs exact search**
|
||||
- **Secret region replication** (cross-region residency)
|
||||
- **Lifecycle cleanup policies on container registries**
|
||||
- **Row-level / column-level security in data warehouses**
|
||||
- **Deployment region restriction on Azure/GCP** (AWS has `organizations_scp_check_deny_regions`, others don't)
|
||||
- **Cross-tenant alert silencing permissions**
|
||||
- **Field-level masking in logs**
|
||||
- **Managed view enforcement for database access**
|
||||
- **Automatic MFA delete on all S3 buckets** (only CloudTrail bucket variant exists for some frameworks — AWS has the generic `s3_bucket_no_mfa_delete` though)
|
||||
|
||||
---
|
||||
|
||||
## Workflow C: Add a New Output Formatter
|
||||
|
||||
Use when a new framework needs its own CSV columns or terminal table. Follow the c5/csa/ens layout exactly:
|
||||
|
||||
```bash
|
||||
mkdir -p prowler/lib/outputs/compliance/{framework}
|
||||
touch prowler/lib/outputs/compliance/{framework}/__init__.py
|
||||
```
|
||||
|
||||
### Step 1 — Create `{framework}.py` (table dispatcher ONLY)
|
||||
|
||||
Copy from `prowler/lib/outputs/compliance/c5/c5.py` and change the function name + framework string. The `diff` between your file and `c5.py` should be just those two lines. **No function docstring** — other frameworks don't have one, stay consistent.
|
||||
|
||||
### Step 2 — Create `models.py`
|
||||
|
||||
One Pydantic v2 `BaseModel` per provider. Field names become CSV column headers (public API — don't rename later without a migration).
|
||||
|
||||
```python
|
||||
from typing import Optional
|
||||
from pydantic import BaseModel
|
||||
|
||||
class {Framework}_AWSModel(BaseModel):
|
||||
Provider: str
|
||||
Description: str
|
||||
AccountId: str
|
||||
Region: str
|
||||
AssessmentDate: str
|
||||
Requirements_Id: str
|
||||
Requirements_Description: str
|
||||
# ... provider-specific columns
|
||||
Status: str
|
||||
StatusExtended: str
|
||||
ResourceId: str
|
||||
ResourceName: str
|
||||
CheckId: str
|
||||
Muted: bool
|
||||
```
|
||||
|
||||
### Step 3 — Create `{framework}_{provider}.py` for each provider
|
||||
|
||||
Copy from `prowler/lib/outputs/compliance/c5/c5_aws.py` etc. Contains the `{Framework}_AWS(ComplianceOutput)` class with `transform()` that walks findings and emits model rows. This file IS allowed to import `Finding`.
|
||||
|
||||
### Step 4 — Register everywhere
|
||||
|
||||
**`prowler/lib/outputs/compliance/compliance.py`** (CLI table dispatcher):
|
||||
```python
|
||||
from prowler.lib.outputs.compliance.{framework}.{framework} import get_{framework}_table
|
||||
|
||||
def display_compliance_table(...):
|
||||
...
|
||||
elif compliance_framework.startswith("{framework}_"):
|
||||
get_{framework}_table(findings, bulk_checks_metadata,
|
||||
compliance_framework, output_filename,
|
||||
output_directory, compliance_overview)
|
||||
```
|
||||
|
||||
**`prowler/__main__.py`** (CLI output writer per provider):
|
||||
Add imports at the top:
|
||||
```python
|
||||
from prowler.lib.outputs.compliance.{framework}.{framework}_aws import {Framework}_AWS
|
||||
from prowler.lib.outputs.compliance.{framework}.{framework}_azure import {Framework}_Azure
|
||||
from prowler.lib.outputs.compliance.{framework}.{framework}_gcp import {Framework}_GCP
|
||||
```
|
||||
Add provider-specific `elif compliance_name.startswith("{framework}_"):` branches that instantiate the class and call `batch_write_data_to_file()`.
|
||||
|
||||
**`api/src/backend/tasks/jobs/export.py`** (API export dispatcher):
|
||||
```python
|
||||
from prowler.lib.outputs.compliance.{framework}.{framework}_aws import {Framework}_AWS
|
||||
# ... azure, gcp
|
||||
|
||||
COMPLIANCE_CLASS_MAP = {
|
||||
"aws": [
|
||||
# ...
|
||||
(lambda name: name.startswith("{framework}_"), {Framework}_AWS),
|
||||
],
|
||||
# ... azure, gcp
|
||||
}
|
||||
```
|
||||
|
||||
**Always use `startswith`**, never `name == "framework_aws"`. Exact match is a regression.
|
||||
|
||||
### Step 5 — Add tests
|
||||
|
||||
Create `tests/lib/outputs/compliance/{framework}/` with `{framework}_aws_test.py`, `{framework}_azure_test.py`, `{framework}_gcp_test.py`. See the test template in [references/test_template.md](references/test_template.md).
|
||||
|
||||
Add fixtures to `tests/lib/outputs/compliance/fixtures.py`: one `Compliance` object per provider with 1 evaluated + 1 manual requirement to exercise both code paths in `transform()`.
|
||||
|
||||
### Circular import warning
|
||||
|
||||
**The table dispatcher file (`{framework}.py`) MUST NOT import `Finding`** (directly or transitively). The cycle is:
|
||||
|
||||
```
|
||||
compliance.compliance imports get_{framework}_table
|
||||
→ {framework}.py imports ComplianceOutput
|
||||
→ compliance_output imports Finding
|
||||
→ finding imports get_check_compliance from compliance.compliance
|
||||
→ CIRCULAR
|
||||
```
|
||||
|
||||
Keep `{framework}.py` bare — only `colorama`, `tabulate`, `prowler.config.config`. Put anything that imports `Finding` in the per-provider `{framework}_{provider}.py` files.
|
||||
|
||||
---
|
||||
|
||||
## Conventions and Hard-Won Gotchas
|
||||
|
||||
These are lessons from the FINOS CCC v2025.10 sync + 172-AR audit pass (April 2026). Learn them once; save days of debugging.
|
||||
|
||||
1. **Per-provider files are non-negotiable.** Never collapse `{framework}_aws.py`, `{framework}_azure.py`, `{framework}_gcp.py` into a single parameterized class, no matter how DRY-tempting. Every other framework in the codebase follows the per-provider pattern and reviewers will reject the refactor. The CSV column names differ per provider — three classes is the convention.
|
||||
2. **`{framework}.py` has NO function docstring.** Other frameworks don't have them. Don't add one to be "helpful".
|
||||
3. **Circular import protection**: the table dispatcher file MUST NOT import `Finding` (directly or transitively). Split the code so `{framework}.py` only has `get_{framework}_table()` with bare imports, and `{framework}_{provider}.py` holds the class that needs `Finding`.
|
||||
4. **`Generic_Compliance_Requirement_Attribute` is the fallback** — in the `Compliance_Requirement.Attributes` Union in `compliance_models.py`, Generic MUST be LAST because Pydantic v1 tries union members in order. Putting Generic first means every framework-specific attribute falls through to Generic and the specific model is never used.
|
||||
5. **Pydantic v1 imports.** `from pydantic.v1 import BaseModel` in `compliance_models.py` — not v2. Mixing causes validation errors. Pydantic v2 is used in the CSV models (`models.py`) — that's fine because they're separate trees.
|
||||
6. **`get_check_compliance()` key format** is `f"{Framework}-{Version}"` ONLY if Version is set. Empty Version → key is `"{Framework}"` (no version suffix). Tests that mock compliance dicts must match this exact format — when a framework ships with `Version: ""`, downstream code and tests break silently.
|
||||
7. **CSV column names from `models.py` are public API.** Don't rename a field without migrating downstream consumers — CSV headers change.
|
||||
8. **Upstream YAML multi-line scalars** (`|` block scalars) preserve newlines. Collapse to single-line with `" ".join(value.split())` before writing to JSON.
|
||||
9. **Upstream catalogs can use multiple shapes.** FINOS CCC uses `control-families: [...]` in most catalogs but `controls: [...]` at the top level in `storage/object`. Any sync script must handle both or silently drop entire catalogs.
|
||||
10. **Foreign-prefix AR ids.** Upstream sometimes "imports" requirements from one catalog into another by keeping the original id prefix (e.g., `CCC.AuditLog.CN08.AR01` appearing under `CCC.Logging.CN03`). Prowler's compliance model requires unique ids within a catalog — rewrite the foreign id to fit the parent control: `CCC.AuditLog.CN08.AR01` (inside `CCC.Logging.CN03`) → `CCC.Logging.CN03.AR01`.
|
||||
11. **Genuine upstream id collisions.** Sometimes upstream has a real typo where two different requirements share the same id (e.g., `CCC.Core.CN14.AR02` defined twice for 30-day and 14-day backup variants). Renumber the second copy to the next free AR number. Preserve check mappings by matching on `(Section, frozenset(Applicability))` since the renumbered id won't match by id.
|
||||
12. **`COMPLIANCE_CLASS_MAP` in `export.py` uses `startswith` predicates** for all modern frameworks. Exact match (`name == "ccc_aws"`) is an anti-pattern — it was present for CCC until April 2026 and was the reason CCC couldn't have versioned variants.
|
||||
13. **Pre-validate every check id** against the per-provider inventory before writing the JSON. A typo silently creates an unreferenced check that will fail when findings try to map to it. The audit script MUST abort with stderr listing typos, not swallow them.
|
||||
14. **REPLACE is better than PATCH** for audit decisions. Encoding every mapping explicitly makes the audit reproducible and surfaces hidden assumptions from the legacy data. A PATCH system that adds/removes is too easy to forget.
|
||||
15. **When no check applies, MANUAL is correct.** Do not pad mappings with tangential checks "just in case". Prowler's compliance reports are meant to be actionable — padding them with noise breaks that. Honest manual reqs can be mapped later when new checks land.
|
||||
16. **UI groups by `Attributes[0].FamilyName` and `Attributes[0].Section`.** If FamilyName has inconsistent variants within the same JSON (e.g., "Logging & Monitoring" vs "Logging and Monitoring"), the UI renders them as separate categories. Section empty → the requirement falls into an orphan control with label "". Normalize before shipping.
|
||||
17. **Provider coverage is asymmetric.** AWS has dense coverage (~586 checks across 80+ services): in-transit encryption, IAM, database encryption, backup. Azure (~167 checks) and GCP (~102 checks) are thinner especially for in-transit encryption, mTLS, and ML/AI. Accept the asymmetry in mappings — don't force GCP parity where Prowler genuinely can't verify.
|
||||
|
||||
---
|
||||
|
||||
## Useful One-Liners
|
||||
|
||||
```bash
|
||||
# Count requirements per service prefix (CCC, CIS sections, etc.)
|
||||
jq -r '.Requirements[].Id | split(".")[1]' prowler/compliance/aws/ccc_aws.json | sort | uniq -c
|
||||
|
||||
# Find duplicate requirement IDs
|
||||
jq -r '.Requirements[].Id' file.json | sort | uniq -d
|
||||
|
||||
# Count manual requirements (no checks)
|
||||
jq '[.Requirements[] | select((.Checks | length) == 0)] | length' file.json
|
||||
|
||||
# List all unique check references in a framework
|
||||
jq -r '.Requirements[].Checks[]' file.json | sort -u
|
||||
|
||||
# List all unique Sections (to spot inconsistency)
|
||||
jq '[.Requirements[].Attributes[0].Section] | unique' file.json
|
||||
|
||||
# List all unique FamilyNames (to spot inconsistency)
|
||||
jq '[.Requirements[].Attributes[0].FamilyName] | unique' file.json
|
||||
|
||||
# Diff requirement ids between two versions of the same framework
|
||||
diff <(jq -r '.Requirements[].Id' a.json | sort) <(jq -r '.Requirements[].Id' b.json | sort)
|
||||
|
||||
# Find where a check id is used across all frameworks
|
||||
grep -rl "my_check_name" prowler/compliance/
|
||||
|
||||
# Check if a Prowler check exists
|
||||
find prowler/providers/aws/services -name "{check_id}.metadata.json"
|
||||
|
||||
# Validate a JSON with Pydantic
|
||||
python -c "from prowler.lib.check.compliance_models import Compliance; print(Compliance.parse_file('prowler/compliance/aws/ccc_aws.json').Framework)"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Best Practices
|
||||
|
||||
1. **Requirement IDs**: Follow the original framework numbering exactly (e.g., "1.1", "A.5.1", "T1190", "ac_2_1")
|
||||
2. **Check Mapping**: Map to existing checks when possible. Use `Checks: []` for manual-only requirements
|
||||
2. **Check Mapping**: Map to existing checks when possible. Use `Checks: []` for manual-only requirements — honest MANUAL beats padded coverage
|
||||
3. **Completeness**: Include all framework requirements, even those without automated checks
|
||||
4. **Version Control**: Include framework version in `Name` and `Version` fields
|
||||
4. **Version Control**: Include framework version in `Name` and `Version` fields. **Never leave `Version: ""`** — it breaks `get_check_compliance()` key format
|
||||
5. **File Naming**: Use format `{framework}_{version}_{provider}.json`
|
||||
6. **Validation**: Prowler validates JSON against Pydantic models at startup - invalid JSON will cause errors
|
||||
6. **Validation**: Prowler validates JSON against Pydantic models at startup — invalid JSON will cause errors
|
||||
7. **Pre-validate check ids** against the provider's `*.metadata.json` inventory before every commit
|
||||
8. **Normalize FamilyName and Section** to avoid inconsistent UI tree branches
|
||||
9. **Register everywhere**: SDK model (if needed) → `compliance.py` dispatcher → `__main__.py` CLI writer → `export.py` API map → UI mapper. Skipping any layer results in silent failures
|
||||
10. **Audit, don't pad**: when reviewing mappings, apply the golden rule — the check's title/risk MUST literally describe what the requirement text says. Tangential relation doesn't count
|
||||
|
||||
## Commands
|
||||
|
||||
@@ -482,11 +1005,46 @@ prowler aws --compliance cis_5.0_aws -M csv json html
|
||||
|
||||
## Code References
|
||||
|
||||
- **Compliance Models:** `prowler/lib/check/compliance_models.py`
|
||||
- **Compliance Processing:** `prowler/lib/check/compliance.py`
|
||||
- **Compliance Output:** `prowler/lib/outputs/compliance/`
|
||||
### Layer 1 — SDK / Core
|
||||
- **Compliance Models:** `prowler/lib/check/compliance_models.py` (Pydantic v1 model tree)
|
||||
- **Compliance Processing / Linker:** `prowler/lib/check/compliance.py` (`get_check_compliance`, `update_checks_metadata_with_compliance`)
|
||||
- **Check Utils:** `prowler/lib/check/utils.py` (`list_compliance_modules`)
|
||||
|
||||
### Layer 2 — JSON Catalogs
|
||||
- **Framework JSONs:** `prowler/compliance/{provider}/` (auto-discovered via directory walk)
|
||||
|
||||
### Layer 3 — Output Formatters
|
||||
- **Per-framework folders:** `prowler/lib/outputs/compliance/{framework}/`
|
||||
- **Shared base class:** `prowler/lib/outputs/compliance/compliance_output.py` (`ComplianceOutput` + `batch_write_data_to_file`)
|
||||
- **CLI table dispatcher:** `prowler/lib/outputs/compliance/compliance.py` (`display_compliance_table`)
|
||||
- **Finding model:** `prowler/lib/outputs/finding.py` (**do not import transitively from table dispatcher files — circular import**)
|
||||
- **CLI writer:** `prowler/__main__.py` (per-provider `elif compliance_name.startswith(...)` branches that instantiate per-provider classes)
|
||||
|
||||
### Layer 4 — API / UI
|
||||
- **API lazy loader:** `api/src/backend/api/compliance.py` (`LazyComplianceTemplate`, `LazyChecksMapping`)
|
||||
- **API export dispatcher:** `api/src/backend/tasks/jobs/export.py` (`COMPLIANCE_CLASS_MAP` with `startswith` predicates)
|
||||
- **UI framework router:** `ui/lib/compliance/compliance-mapper.ts`
|
||||
- **UI per-framework mapper:** `ui/lib/compliance/{framework}.tsx`
|
||||
- **UI detail panel:** `ui/components/compliance/compliance-custom-details/{framework}-details.tsx`
|
||||
- **UI types:** `ui/types/compliance.ts`
|
||||
- **UI icon:** `ui/components/icons/compliance/{framework}.svg` + registration in `IconCompliance.tsx`
|
||||
|
||||
### Tests
|
||||
- **Output formatter tests:** `tests/lib/outputs/compliance/{framework}/{framework}_{provider}_test.py`
|
||||
- **Shared fixtures:** `tests/lib/outputs/compliance/fixtures.py`
|
||||
|
||||
## Resources
|
||||
|
||||
- **Templates:** See [assets/](assets/) for framework JSON templates
|
||||
- **JSON Templates:** See [assets/](assets/) for framework JSON templates (cis, ens, iso27001, mitre_attack, prowler_threatscore, generic)
|
||||
- **Config-driven compliance sync** (any upstream-backed framework):
|
||||
- [assets/sync_framework.py](assets/sync_framework.py) — generic runner. Loads a YAML config, dynamically imports the declared parser, applies generic post-processing (id uniqueness safety net, `FamilyName` normalization, legacy check-mapping preservation with config-driven fallback keys), and writes the provider JSONs with Pydantic post-validation. Framework-agnostic — works for any compliance framework.
|
||||
- [assets/configs/ccc.yaml](assets/configs/ccc.yaml) — canonical config example (FINOS CCC v2025.10). Copy and adapt for new frameworks.
|
||||
- [assets/parsers/finos_ccc.py](assets/parsers/finos_ccc.py) — FINOS CCC YAML parser. Handles both upstream shapes (`control-families` and top-level `controls`), foreign-prefix AR rewriting, and genuine collision renumbering. Exposes `parse_upstream(config) -> list[dict]`.
|
||||
- [assets/parsers/](assets/parsers/) — add new parser modules here for unfamiliar upstream formats (NIST OSCAL JSON, MITRE STIX, CIS Benchmarks, etc.). Each parser is a `{name}.py` file implementing `parse_upstream(config) -> list[dict]` with guaranteed-unique ids.
|
||||
- **Reusable audit tooling** (added April 2026 after the FINOS CCC v2025.10 sync):
|
||||
- [assets/audit_framework_template.py](assets/audit_framework_template.py) — explicit REPLACE decision ledger with pre-validation against the per-provider inventory. Drop-in template for auditing any framework.
|
||||
- [assets/query_checks.py](assets/query_checks.py) — keyword/service/id query helper over `/tmp/checks_{provider}.json`.
|
||||
- [assets/dump_section.py](assets/dump_section.py) — dumps every AR for a given id prefix across all 3 providers with current check mappings.
|
||||
- [assets/build_inventory.py](assets/build_inventory.py) — generates `/tmp/checks_{provider}.json` from `*.metadata.json` files.
|
||||
- **Documentation:** See [references/compliance-docs.md](references/compliance-docs.md) for additional resources
|
||||
- **Related skill:** [prowler-compliance-review](../prowler-compliance-review/SKILL.md) — PR review checklist and validator script for compliance framework PRs
|
||||
|
||||
188
skills/prowler-compliance/assets/audit_framework_template.py
Normal file
188
skills/prowler-compliance/assets/audit_framework_template.py
Normal file
@@ -0,0 +1,188 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Cloud-auditor pass template for any Prowler compliance framework.
|
||||
|
||||
Encode explicit REPLACE decisions per (requirement_id, provider) pair below.
|
||||
Each decision FULLY overwrites the legacy Checks list for that requirement.
|
||||
|
||||
Workflow:
|
||||
1. Run build_inventory.py first to cache per-provider check metadata.
|
||||
2. Run dump_section.py to see current mappings for the catalog you're auditing.
|
||||
3. Fill in DECISIONS below with explicit check lists.
|
||||
4. Run this script — it pre-validates every check id against the inventory
|
||||
and aborts with stderr listing typos before writing.
|
||||
|
||||
Decision rules (apply as a hostile cloud auditor):
|
||||
- The Prowler check's title/risk MUST literally describe what the AR text says.
|
||||
"Related" is not enough.
|
||||
- If no check actually addresses the requirement, leave `[]` (= MANUAL).
|
||||
HONEST MANUAL is worth more than padded coverage.
|
||||
- Missing provider key = leave the legacy mapping untouched.
|
||||
- Empty list `[]` = explicitly MANUAL (overwrites legacy).
|
||||
|
||||
Usage:
|
||||
# 1. Copy this file to /tmp/audit_<framework>.py and fill in DECISIONS
|
||||
# 2. Edit FRAMEWORK_KEY below to match your framework file naming
|
||||
# 3. Run:
|
||||
python /tmp/audit_<framework>.py
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Configure for your framework
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# Framework file basename inside prowler/compliance/{provider}/.
|
||||
# If your framework is called "cis_5.0_aws.json", FRAMEWORK_KEY is "cis_5.0".
|
||||
# If the file is "ccc_aws.json", FRAMEWORK_KEY is "ccc".
|
||||
FRAMEWORK_KEY = "ccc"
|
||||
|
||||
# Which providers to apply decisions to.
|
||||
PROVIDERS = ["aws", "azure", "gcp"]
|
||||
|
||||
PROWLER_DIR = Path("prowler/compliance")
|
||||
CHECK_INV = {prov: Path(f"/tmp/checks_{prov}.json") for prov in PROVIDERS}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# DECISIONS — encode one entry per requirement you want to audit
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# DECISIONS[requirement_id][provider] = list[str] of check ids
|
||||
# See SKILL.md → "Audit Reference Table: Requirement Text → Prowler Checks"
|
||||
# for a comprehensive mapping cheat sheet built from a 172-AR CCC audit.
|
||||
|
||||
DECISIONS: dict[str, dict[str, list[str]]] = {}
|
||||
|
||||
# ---- Example entries (delete and replace with your own) ----
|
||||
|
||||
# Example 1: TLS in transit enforced (non-SSH traffic)
|
||||
# DECISIONS["CCC.Core.CN01.AR01"] = {
|
||||
# "aws": [
|
||||
# "cloudfront_distributions_https_enabled",
|
||||
# "cloudfront_distributions_origin_traffic_encrypted",
|
||||
# "s3_bucket_secure_transport_policy",
|
||||
# "elbv2_ssl_listeners",
|
||||
# "rds_instance_transport_encrypted",
|
||||
# "kafka_cluster_in_transit_encryption_enabled",
|
||||
# "redshift_cluster_in_transit_encryption_enabled",
|
||||
# "opensearch_service_domains_https_communications_enforced",
|
||||
# ],
|
||||
# "azure": [
|
||||
# "storage_secure_transfer_required_is_enabled",
|
||||
# "app_minimum_tls_version_12",
|
||||
# "postgresql_flexible_server_enforce_ssl_enabled",
|
||||
# "sqlserver_recommended_minimal_tls_version",
|
||||
# ],
|
||||
# "gcp": [
|
||||
# "cloudsql_instance_ssl_connections",
|
||||
# ],
|
||||
# }
|
||||
|
||||
# Example 2: MANUAL — no Prowler check exists
|
||||
# DECISIONS["CCC.Core.CN01.AR07"] = {
|
||||
# "aws": [], # no IANA port/protocol check exists in Prowler
|
||||
# "azure": [],
|
||||
# "gcp": [],
|
||||
# }
|
||||
|
||||
# Example 3: Reuse a decision for multiple sibling ARs
|
||||
# DECISIONS["CCC.ObjStor.CN05.AR02"] = DECISIONS["CCC.ObjStor.CN05.AR01"]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Driver — do not edit below
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def load_inventory(provider: str) -> dict:
|
||||
path = CHECK_INV[provider]
|
||||
if not path.exists():
|
||||
raise SystemExit(
|
||||
f"Check inventory missing: {path}\n"
|
||||
f"Run: python skills/prowler-compliance/assets/build_inventory.py {provider}"
|
||||
)
|
||||
with open(path) as f:
|
||||
return json.load(f)
|
||||
|
||||
|
||||
def resolve_json_path(provider: str) -> Path:
|
||||
"""Resolve the JSON file path for a given provider.
|
||||
|
||||
Handles both shapes: {FRAMEWORK_KEY}_{provider}.json (ccc_aws.json) and
|
||||
cases where FRAMEWORK_KEY already contains the provider suffix.
|
||||
"""
|
||||
candidates = [
|
||||
PROWLER_DIR / provider / f"{FRAMEWORK_KEY}_{provider}.json",
|
||||
PROWLER_DIR / provider / f"{FRAMEWORK_KEY}.json",
|
||||
]
|
||||
for c in candidates:
|
||||
if c.exists():
|
||||
return c
|
||||
raise SystemExit(
|
||||
f"Could not find framework JSON for provider={provider} "
|
||||
f"with FRAMEWORK_KEY={FRAMEWORK_KEY}. Tried: {candidates}"
|
||||
)
|
||||
|
||||
|
||||
def apply_for_provider(provider: str) -> tuple[int, int, int]:
|
||||
"""Apply DECISIONS to the JSON for one provider.
|
||||
|
||||
Returns (touched, added, removed).
|
||||
"""
|
||||
path = resolve_json_path(provider)
|
||||
with open(path) as f:
|
||||
data = json.load(f)
|
||||
inv = load_inventory(provider)
|
||||
|
||||
touched = 0
|
||||
add_count = 0
|
||||
rm_count = 0
|
||||
unknown: list[tuple[str, str]] = []
|
||||
|
||||
for req in data["Requirements"]:
|
||||
rid = req["Id"]
|
||||
if rid not in DECISIONS or provider not in DECISIONS[rid]:
|
||||
continue
|
||||
new_checks = list(dict.fromkeys(DECISIONS[rid][provider]))
|
||||
for c in new_checks:
|
||||
if c not in inv:
|
||||
unknown.append((rid, c))
|
||||
before = set(req.get("Checks") or [])
|
||||
after = set(new_checks)
|
||||
rm_count += len(before - after)
|
||||
add_count += len(after - before)
|
||||
req["Checks"] = new_checks
|
||||
touched += 1
|
||||
|
||||
if unknown:
|
||||
print(f"\n!! {provider} — UNKNOWN CHECK IDS (typos?):", file=sys.stderr)
|
||||
for rid, c in unknown:
|
||||
print(f" {rid} -> {c}", file=sys.stderr)
|
||||
print("\nAborting: fix the check ids above and re-run.", file=sys.stderr)
|
||||
sys.exit(2)
|
||||
|
||||
with open(path, "w") as f:
|
||||
json.dump(data, f, indent=2, ensure_ascii=False)
|
||||
f.write("\n")
|
||||
return touched, add_count, rm_count
|
||||
|
||||
|
||||
def main() -> int:
|
||||
if not DECISIONS:
|
||||
print("No DECISIONS encoded. Fill in the DECISIONS dict and re-run.")
|
||||
return 1
|
||||
print(f"Applying {len(DECISIONS)} decisions to framework '{FRAMEWORK_KEY}'...")
|
||||
for provider in PROVIDERS:
|
||||
touched, added, removed = apply_for_provider(provider)
|
||||
print(
|
||||
f" {provider}: touched={touched} added={added} removed={removed}"
|
||||
)
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
82
skills/prowler-compliance/assets/build_inventory.py
Normal file
82
skills/prowler-compliance/assets/build_inventory.py
Normal file
@@ -0,0 +1,82 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Build a per-provider check inventory by scanning Prowler's check metadata files.
|
||||
|
||||
Outputs one JSON per provider at /tmp/checks_{provider}.json with the shape:
|
||||
{
|
||||
"check_id": {
|
||||
"service": "...",
|
||||
"subservice": "...",
|
||||
"resource": "...",
|
||||
"severity": "...",
|
||||
"title": "...",
|
||||
"description": "...",
|
||||
"risk": "..."
|
||||
},
|
||||
...
|
||||
}
|
||||
|
||||
This is the reference used by audit_framework_template.py for pre-validation
|
||||
(every check id in the audit ledger must exist in the inventory) and by
|
||||
query_checks.py for keyword/service lookup.
|
||||
|
||||
Usage:
|
||||
python skills/prowler-compliance/assets/build_inventory.py
|
||||
# Or for a specific provider:
|
||||
python skills/prowler-compliance/assets/build_inventory.py aws
|
||||
|
||||
Output:
|
||||
/tmp/checks_aws.json (~586 checks)
|
||||
/tmp/checks_azure.json (~167 checks)
|
||||
/tmp/checks_gcp.json (~102 checks)
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import sys
|
||||
from pathlib import Path
|
||||
|
||||
DEFAULT_PROVIDERS = ["aws", "azure", "gcp", "kubernetes", "m365", "github",
|
||||
"oraclecloud", "alibabacloud", "mongodbatlas", "nhn",
|
||||
"iac", "llm", "googleworkspace", "cloudflare"]
|
||||
|
||||
|
||||
def build_for_provider(provider: str) -> dict:
|
||||
inventory: dict[str, dict] = {}
|
||||
base = Path(f"prowler/providers/{provider}/services")
|
||||
if not base.exists():
|
||||
print(f" skip {provider}: no services directory", file=sys.stderr)
|
||||
return inventory
|
||||
for meta_path in base.rglob("*.metadata.json"):
|
||||
try:
|
||||
with open(meta_path) as f:
|
||||
data = json.load(f)
|
||||
except Exception as exc:
|
||||
print(f" warn: cannot parse {meta_path}: {exc}", file=sys.stderr)
|
||||
continue
|
||||
cid = data.get("CheckID") or meta_path.stem.replace(".metadata", "")
|
||||
inventory[cid] = {
|
||||
"service": data.get("ServiceName", ""),
|
||||
"subservice": data.get("SubServiceName", ""),
|
||||
"resource": data.get("ResourceType", ""),
|
||||
"severity": data.get("Severity", ""),
|
||||
"title": data.get("CheckTitle", ""),
|
||||
"description": data.get("Description", ""),
|
||||
"risk": data.get("Risk", ""),
|
||||
}
|
||||
return inventory
|
||||
|
||||
|
||||
def main() -> int:
|
||||
providers = sys.argv[1:] or DEFAULT_PROVIDERS
|
||||
for provider in providers:
|
||||
inv = build_for_provider(provider)
|
||||
out_path = Path(f"/tmp/checks_{provider}.json")
|
||||
with open(out_path, "w") as f:
|
||||
json.dump(inv, f, indent=2)
|
||||
print(f" {provider}: {len(inv)} checks → {out_path}")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
111
skills/prowler-compliance/assets/configs/ccc.yaml
Normal file
111
skills/prowler-compliance/assets/configs/ccc.yaml
Normal file
@@ -0,0 +1,111 @@
|
||||
# FINOS Common Cloud Controls (CCC) sync config for sync_framework.py.
|
||||
#
|
||||
# Usage:
|
||||
# python skills/prowler-compliance/assets/sync_framework.py \
|
||||
# skills/prowler-compliance/assets/configs/ccc.yaml
|
||||
#
|
||||
# Prerequisite: run the upstream fetch step from SKILL.md Workflow A Step 1 to
|
||||
# populate upstream.dir with the raw FINOS catalog YAML files.
|
||||
|
||||
framework:
|
||||
name: CCC
|
||||
display_name: Common Cloud Controls Catalog (CCC)
|
||||
version: v2025.10
|
||||
# The {provider_display} placeholder is replaced at output time with the
|
||||
# per-provider display string from the providers list below.
|
||||
description_template: "Common Cloud Controls Catalog (CCC) for {provider_display}"
|
||||
|
||||
providers:
|
||||
- key: aws
|
||||
display: AWS
|
||||
- key: azure
|
||||
display: Azure
|
||||
- key: gcp
|
||||
display: GCP
|
||||
|
||||
output:
|
||||
# Supported placeholders: {provider}, {framework}, {version}.
|
||||
# For versioned frameworks like CIS the template would be
|
||||
# "prowler/compliance/{provider}/cis_{version}_{provider}.json".
|
||||
path_template: "prowler/compliance/{provider}/ccc_{provider}.json"
|
||||
|
||||
upstream:
|
||||
# Directory containing the cached FINOS catalog YAMLs. Populate via
|
||||
# SKILL.md Workflow A Step 1 (gh api raw download commands).
|
||||
dir: /tmp/ccc_upstream
|
||||
fetch_docs: "See SKILL.md Workflow A Step 1 for gh api fetch commands"
|
||||
|
||||
parser:
|
||||
# Name of the parser module under parsers/ (loaded dynamically by the
|
||||
# runner). For FINOS CCC YAML this is always finos_ccc.
|
||||
module: finos_ccc
|
||||
|
||||
# FINOS CCC catalog files in load order. Core first so its ARs render
|
||||
# first in the output JSON.
|
||||
catalog_files:
|
||||
- core_ccc.yaml
|
||||
- management_auditlog.yaml
|
||||
- management_logging.yaml
|
||||
- management_monitoring.yaml
|
||||
- storage_object.yaml
|
||||
- networking_loadbalancer.yaml
|
||||
- networking_vpc.yaml
|
||||
- crypto_key.yaml
|
||||
- crypto_secrets.yaml
|
||||
- database_warehouse.yaml
|
||||
- database_vector.yaml
|
||||
- database_relational.yaml
|
||||
- devtools_build.yaml
|
||||
- devtools_container-registry.yaml
|
||||
- identity_iam.yaml
|
||||
- ai-ml_gen-ai.yaml
|
||||
- ai-ml_mlde.yaml
|
||||
- app-integration_message.yaml
|
||||
- compute_serverless-computing.yaml
|
||||
|
||||
# Shape-2 catalogs (storage/object) reference the family via id only
|
||||
# (e.g. "CCC.ObjStor.Data") with no human-readable title or description
|
||||
# in the YAML. Map the suffix (after the last dot) to a canonical title
|
||||
# and description so the generated JSON has consistent FamilyName fields
|
||||
# regardless of upstream shape.
|
||||
family_id_title:
|
||||
Data: Data
|
||||
IAM: Identity and Access Management
|
||||
Identity: Identity and Access Management
|
||||
Encryption: Encryption
|
||||
Logging: Logging and Monitoring
|
||||
Network: Network Security
|
||||
Availability: Availability
|
||||
Integrity: Integrity
|
||||
Confidentiality: Confidentiality
|
||||
family_id_description:
|
||||
Data: "The Data control family ensures the confidentiality, integrity, availability, and sovereignty of data across its lifecycle."
|
||||
IAM: "The Identity and Access Management control family ensures that only trusted and authenticated entities can access resources."
|
||||
|
||||
post_processing:
|
||||
# Collapse FamilyName variants that appear inconsistently across upstream
|
||||
# catalogs. The Prowler UI groups by Attributes[0].FamilyName exactly,
|
||||
# so each variant would otherwise become a separate tree branch.
|
||||
family_name_normalization:
|
||||
"Logging & Monitoring": "Logging and Monitoring"
|
||||
"Logging and Metrics Publication": "Logging and Monitoring"
|
||||
|
||||
# Preserve existing Checks lists from the legacy Prowler JSON when
|
||||
# regenerating. The runner builds two lookup tables from the legacy
|
||||
# output: a primary index by Id, and fallback indexes composed of
|
||||
# attribute field names.
|
||||
#
|
||||
# primary_key: the top-level requirement field to use as the primary
|
||||
# lookup key (almost always "Id")
|
||||
# fallback_keys: a list of composite keys. Each composite key is a list
|
||||
# of Attributes[0] field names to join into a tuple. List-valued fields
|
||||
# (like Applicability) are frozen to frozenset so the tuple is hashable.
|
||||
#
|
||||
# CCC uses (Section, Applicability) because Applicability is a CCC-only
|
||||
# top-level attribute field. CIS would use (Section, Profile). NIST would
|
||||
# use (ItemId,). The fallback is how renumbered or rewritten ids still
|
||||
# recover their check mappings.
|
||||
check_preservation:
|
||||
primary_key: Id
|
||||
fallback_keys:
|
||||
- [Section, Applicability]
|
||||
91
skills/prowler-compliance/assets/dump_section.py
Normal file
91
skills/prowler-compliance/assets/dump_section.py
Normal file
@@ -0,0 +1,91 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Dump every requirement of a compliance framework for a given id prefix across
|
||||
providers, with their current Check mappings.
|
||||
|
||||
Useful for reviewing a whole control family in one pass before encoding audit
|
||||
decisions in audit_framework_template.py.
|
||||
|
||||
Usage:
|
||||
# Dump all CCC.Core requirements across aws/azure/gcp
|
||||
python skills/prowler-compliance/assets/dump_section.py ccc "CCC.Core."
|
||||
|
||||
# Dump all CIS 5.0 section 1 requirements for AWS only
|
||||
python skills/prowler-compliance/assets/dump_section.py cis_5.0_aws "1."
|
||||
|
||||
Arguments:
|
||||
framework_key: file prefix inside prowler/compliance/{provider}/ without
|
||||
the provider suffix. Examples:
|
||||
- "ccc" → loads ccc_aws.json / ccc_azure.json / ccc_gcp.json
|
||||
- "cis_5.0_aws" → loads only that one file
|
||||
- "iso27001_2022" → loads all providers
|
||||
id_prefix: Requirement id prefix to filter by (e.g. "CCC.Core.",
|
||||
"1.1.", "A.5.").
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import sys
|
||||
from collections import defaultdict
|
||||
from pathlib import Path
|
||||
|
||||
PROWLER_COMPLIANCE_DIR = Path("prowler/compliance")
|
||||
|
||||
|
||||
def main() -> int:
|
||||
if len(sys.argv) < 3:
|
||||
print(__doc__)
|
||||
return 1
|
||||
|
||||
framework_key = sys.argv[1]
|
||||
id_prefix = sys.argv[2]
|
||||
|
||||
# Find matching JSON files across all providers
|
||||
candidates: list[tuple[str, Path]] = []
|
||||
for prov_dir in sorted(PROWLER_COMPLIANCE_DIR.iterdir()):
|
||||
if not prov_dir.is_dir():
|
||||
continue
|
||||
for json_path in prov_dir.glob("*.json"):
|
||||
stem = json_path.stem
|
||||
if stem == framework_key or stem.startswith(f"{framework_key}_") \
|
||||
or stem == f"{framework_key}_{prov_dir.name}":
|
||||
candidates.append((prov_dir.name, json_path))
|
||||
|
||||
if not candidates:
|
||||
print(f"No files matching '{framework_key}'", file=sys.stderr)
|
||||
return 2
|
||||
|
||||
by_id: dict[str, dict] = defaultdict(dict)
|
||||
for prov, path in candidates:
|
||||
with open(path) as f:
|
||||
data = json.load(f)
|
||||
for req in data["Requirements"]:
|
||||
if req["Id"].startswith(id_prefix):
|
||||
by_id[req["Id"]][prov] = {
|
||||
"desc": req.get("Description", ""),
|
||||
"sec": (req.get("Attributes") or [{}])[0].get("Section", ""),
|
||||
"obj": (req.get("Attributes") or [{}])[0].get(
|
||||
"SubSectionObjective", ""
|
||||
),
|
||||
"checks": req.get("Checks") or [],
|
||||
}
|
||||
|
||||
for ar_id in sorted(by_id):
|
||||
rows = by_id[ar_id]
|
||||
sample = next(iter(rows.values()))
|
||||
print(f"\n### {ar_id}")
|
||||
print(f" desc: {sample['desc']}")
|
||||
if sample["sec"]:
|
||||
print(f" sec : {sample['sec']}")
|
||||
if sample["obj"]:
|
||||
print(f" obj : {sample['obj']}")
|
||||
for prov in ["aws", "azure", "gcp", "kubernetes", "m365", "github",
|
||||
"oraclecloud", "alibabacloud"]:
|
||||
if prov in rows:
|
||||
checks = rows[prov]["checks"]
|
||||
print(f" {prov}: ({len(checks)}) {checks}")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
219
skills/prowler-compliance/assets/parsers/finos_ccc.py
Normal file
219
skills/prowler-compliance/assets/parsers/finos_ccc.py
Normal file
@@ -0,0 +1,219 @@
|
||||
"""
|
||||
FINOS Common Cloud Controls (CCC) YAML parser.
|
||||
|
||||
Reads cached upstream YAML files and emits Prowler-format requirements
|
||||
(``{Id, Description, Attributes: [...], Checks: []}``). This module is
|
||||
agnostic to providers, JSON output paths, framework metadata and legacy
|
||||
check-mapping preservation — those are handled by ``sync_framework.py``.
|
||||
|
||||
Contract
|
||||
--------
|
||||
``parse_upstream(config: dict) -> list[dict]``
|
||||
Returns a list of Prowler-format requirement dicts with **guaranteed
|
||||
unique ids**. Foreign-prefix AR rewriting and genuine collision
|
||||
renumbering both happen inside this module — the runner treats id
|
||||
uniqueness as a contract violation, not as something to fix.
|
||||
|
||||
Config keys consumed
|
||||
--------------------
|
||||
This parser reads the following config entries (the rest of the config is
|
||||
opaque to it):
|
||||
|
||||
- ``upstream.dir`` — directory containing the cached YAMLs
|
||||
- ``parser.catalog_files`` — ordered list of YAML filenames to load
|
||||
- ``parser.family_id_title`` — suffix → canonical family title (shape 2)
|
||||
- ``parser.family_id_description`` — suffix → family description (shape 2)
|
||||
|
||||
Upstream shapes
|
||||
---------------
|
||||
FINOS CCC catalogs come in two shapes:
|
||||
|
||||
1. ``control-families: [{title, description, controls: [...]}]``
|
||||
(used by most catalogs)
|
||||
2. ``controls: [{id, family: "CCC.X.Y", ...}]`` (no families wrapper; used
|
||||
by ``storage/object``). The ``family`` field references a family id with
|
||||
no human-readable title in the file — the title/description come from
|
||||
``config.parser.family_id_title`` / ``family_id_description``.
|
||||
|
||||
Id rewriting rules
|
||||
------------------
|
||||
- **Foreign-prefix rewriting**: upstream intentionally aliases requirements
|
||||
across catalogs by keeping the original prefix (e.g. ``CCC.AuditLog.CN08.AR01``
|
||||
appears nested under ``CCC.Logging.CN03``). Prowler requires unique ids
|
||||
within a catalog file, so we rename the AR to fit its parent control:
|
||||
``CCC.Logging.CN03.AR01``. See ``rewrite_ar_id()``.
|
||||
- **Genuine collision renumbering**: sometimes upstream has a real typo
|
||||
where two distinct requirements share the same id (e.g.
|
||||
``CCC.Core.CN14.AR02`` appears twice for 30-day and 14-day backup variants).
|
||||
The second copy is renumbered to the next free AR number within the
|
||||
control. See the ``seen_ids`` logic in ``emit_requirement()``.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
from pathlib import Path
|
||||
|
||||
import yaml
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Helpers
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def clean(value: str | None) -> str:
|
||||
"""Trim and collapse internal whitespace/newlines into single spaces.
|
||||
|
||||
Upstream YAML uses ``|`` block scalars that preserve newlines; Prowler
|
||||
stores descriptions as single-line text.
|
||||
"""
|
||||
if not value:
|
||||
return ""
|
||||
return " ".join(value.split())
|
||||
|
||||
|
||||
def flatten_mappings(mappings):
|
||||
"""Convert upstream ``{reference-id, entries: [{reference-id, ...}]}`` to
|
||||
Prowler's ``{ReferenceId, Identifiers: [...]}``.
|
||||
"""
|
||||
if not mappings:
|
||||
return []
|
||||
out = []
|
||||
for m in mappings:
|
||||
ids = []
|
||||
for entry in m.get("entries") or []:
|
||||
eid = entry.get("reference-id")
|
||||
if eid:
|
||||
ids.append(eid)
|
||||
out.append({"ReferenceId": m.get("reference-id", ""), "Identifiers": ids})
|
||||
return out
|
||||
|
||||
|
||||
def ar_prefix(ar_id: str) -> str:
|
||||
"""Return the first three dot-segments of an AR id (the parent control).
|
||||
|
||||
e.g. ``CCC.Core.CN01.AR01`` -> ``CCC.Core.CN01``.
|
||||
"""
|
||||
return ".".join(ar_id.split(".")[:3])
|
||||
|
||||
|
||||
def rewrite_ar_id(parent_control_id: str, original_ar_id: str, ar_index: int) -> str:
|
||||
"""If an AR's id doesn't share its parent control's prefix, rename it.
|
||||
|
||||
Example
|
||||
-------
|
||||
parent ``CCC.Logging.CN03`` + AR id ``CCC.AuditLog.CN08.AR01`` with
|
||||
index 0 -> ``CCC.Logging.CN03.AR01``.
|
||||
"""
|
||||
if ar_prefix(original_ar_id) == parent_control_id:
|
||||
return original_ar_id
|
||||
return f"{parent_control_id}.AR{ar_index + 1:02d}"
|
||||
|
||||
|
||||
def emit_requirement(
|
||||
control: dict,
|
||||
family_name: str,
|
||||
family_desc: str,
|
||||
seen_ids: set[str],
|
||||
requirements: list[dict],
|
||||
) -> None:
|
||||
"""Translate one FINOS control + its assessment-requirements into
|
||||
Prowler-format requirement dicts and append them to ``requirements``.
|
||||
|
||||
Applies foreign-prefix rewriting and genuine-collision renumbering so
|
||||
the final list is guaranteed to have unique ids.
|
||||
"""
|
||||
control_id = clean(control.get("id"))
|
||||
control_title = clean(control.get("title"))
|
||||
section = f"{control_id} {control_title}".strip()
|
||||
objective = clean(control.get("objective"))
|
||||
threat_mappings = flatten_mappings(control.get("threat-mappings"))
|
||||
guideline_mappings = flatten_mappings(control.get("guideline-mappings"))
|
||||
ars = control.get("assessment-requirements") or []
|
||||
for idx, ar in enumerate(ars):
|
||||
raw_id = clean(ar.get("id"))
|
||||
if not raw_id:
|
||||
continue
|
||||
new_id = rewrite_ar_id(control_id, raw_id, idx)
|
||||
# Renumber on genuine upstream collision (find next free AR number)
|
||||
if new_id in seen_ids:
|
||||
base = ".".join(new_id.split(".")[:-1])
|
||||
n = 1
|
||||
while f"{base}.AR{n:02d}" in seen_ids:
|
||||
n += 1
|
||||
new_id = f"{base}.AR{n:02d}"
|
||||
seen_ids.add(new_id)
|
||||
|
||||
requirements.append(
|
||||
{
|
||||
"Id": new_id,
|
||||
"Description": clean(ar.get("text")),
|
||||
"Attributes": [
|
||||
{
|
||||
"FamilyName": family_name,
|
||||
"FamilyDescription": family_desc,
|
||||
"Section": section,
|
||||
"SubSection": "",
|
||||
"SubSectionObjective": objective,
|
||||
"Applicability": list(ar.get("applicability") or []),
|
||||
"Recommendation": clean(ar.get("recommendation")),
|
||||
"SectionThreatMappings": threat_mappings,
|
||||
"SectionGuidelineMappings": guideline_mappings,
|
||||
}
|
||||
],
|
||||
"Checks": [],
|
||||
}
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Public entry point
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def parse_upstream(config: dict) -> list[dict]:
|
||||
"""Walk upstream YAMLs and emit Prowler-format requirements.
|
||||
|
||||
Handles both top-level shapes (``control-families`` and ``controls``).
|
||||
Ids are guaranteed unique in the returned list.
|
||||
"""
|
||||
upstream_dir = Path(config["upstream"]["dir"])
|
||||
parser_cfg = config.get("parser") or {}
|
||||
catalog_files = parser_cfg.get("catalog_files") or []
|
||||
family_id_title = parser_cfg.get("family_id_title") or {}
|
||||
family_id_description = parser_cfg.get("family_id_description") or {}
|
||||
|
||||
requirements: list[dict] = []
|
||||
seen_ids: set[str] = set()
|
||||
|
||||
for filename in catalog_files:
|
||||
path = upstream_dir / filename
|
||||
if not path.exists():
|
||||
# The runner handles fatal errors; a missing optional catalog
|
||||
# file is surfaced as a warning via print to stderr.
|
||||
import sys
|
||||
|
||||
print(f"warn: missing upstream file {filename}", file=sys.stderr)
|
||||
continue
|
||||
with open(path) as f:
|
||||
doc = yaml.safe_load(f) or {}
|
||||
|
||||
# Shape 1: control-families wrapper
|
||||
for family in doc.get("control-families") or []:
|
||||
family_name = clean(family.get("title"))
|
||||
family_desc = clean(family.get("description"))
|
||||
for control in family.get("controls") or []:
|
||||
emit_requirement(
|
||||
control, family_name, family_desc, seen_ids, requirements
|
||||
)
|
||||
|
||||
# Shape 2: top-level controls with family reference id
|
||||
for control in doc.get("controls") or []:
|
||||
family_ref = clean(control.get("family"))
|
||||
suffix = family_ref.split(".")[-1] if family_ref else ""
|
||||
family_name = family_id_title.get(suffix, suffix or "Data")
|
||||
family_desc = family_id_description.get(suffix, "")
|
||||
emit_requirement(
|
||||
control, family_name, family_desc, seen_ids, requirements
|
||||
)
|
||||
|
||||
return requirements
|
||||
86
skills/prowler-compliance/assets/query_checks.py
Normal file
86
skills/prowler-compliance/assets/query_checks.py
Normal file
@@ -0,0 +1,86 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Keyword/service/id lookup over a Prowler check inventory produced by
|
||||
build_inventory.py.
|
||||
|
||||
Usage:
|
||||
# Keyword AND-search across id + title + risk + description
|
||||
python skills/prowler-compliance/assets/query_checks.py aws encryption transit
|
||||
|
||||
# Show all checks for a service
|
||||
python skills/prowler-compliance/assets/query_checks.py aws --service iam
|
||||
|
||||
# Show full metadata for one check id
|
||||
python skills/prowler-compliance/assets/query_checks.py aws --id kms_cmk_rotation_enabled
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import json
|
||||
import sys
|
||||
|
||||
|
||||
def main() -> int:
|
||||
if len(sys.argv) < 3:
|
||||
print(__doc__)
|
||||
return 1
|
||||
|
||||
provider = sys.argv[1]
|
||||
try:
|
||||
with open(f"/tmp/checks_{provider}.json") as f:
|
||||
inv = json.load(f)
|
||||
except FileNotFoundError:
|
||||
print(
|
||||
f"No inventory for {provider}. Run build_inventory.py first.",
|
||||
file=sys.stderr,
|
||||
)
|
||||
return 2
|
||||
|
||||
if sys.argv[2] == "--service":
|
||||
if len(sys.argv) < 4:
|
||||
print("usage: --service <service_name>")
|
||||
return 1
|
||||
svc = sys.argv[3]
|
||||
hits = [cid for cid in sorted(inv) if inv[cid].get("service") == svc]
|
||||
for cid in hits:
|
||||
print(f" {cid}")
|
||||
print(f" {inv[cid].get('title', '')}")
|
||||
print(f"\n{len(hits)} checks in service '{svc}'")
|
||||
elif sys.argv[2] == "--id":
|
||||
if len(sys.argv) < 4:
|
||||
print("usage: --id <check_id>")
|
||||
return 1
|
||||
cid = sys.argv[3]
|
||||
if cid not in inv:
|
||||
print(f"NOT FOUND: {cid}")
|
||||
return 3
|
||||
m = inv[cid]
|
||||
print(f"== {cid} ==")
|
||||
print(f"service : {m.get('service')}")
|
||||
print(f"severity: {m.get('severity')}")
|
||||
print(f"resource: {m.get('resource')}")
|
||||
print(f"title : {m.get('title')}")
|
||||
print(f"desc : {m.get('description', '')[:500]}")
|
||||
print(f"risk : {m.get('risk', '')[:500]}")
|
||||
else:
|
||||
keywords = [k.lower() for k in sys.argv[2:]]
|
||||
hits = 0
|
||||
for cid in sorted(inv):
|
||||
m = inv[cid]
|
||||
blob = " ".join(
|
||||
[
|
||||
cid,
|
||||
m.get("title", ""),
|
||||
m.get("risk", ""),
|
||||
m.get("description", ""),
|
||||
]
|
||||
).lower()
|
||||
if all(k in blob for k in keywords):
|
||||
hits += 1
|
||||
print(f" {cid} [{m.get('service', '')}]")
|
||||
print(f" {m.get('title', '')[:120]}")
|
||||
print(f"\n{hits} matches for {' + '.join(keywords)}")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
478
skills/prowler-compliance/assets/sync_framework.py
Normal file
478
skills/prowler-compliance/assets/sync_framework.py
Normal file
@@ -0,0 +1,478 @@
|
||||
#!/usr/bin/env python3
|
||||
"""
|
||||
Generic, config-driven compliance framework sync runner.
|
||||
|
||||
Usage:
|
||||
python skills/prowler-compliance/assets/sync_framework.py \
|
||||
skills/prowler-compliance/assets/configs/ccc.yaml
|
||||
|
||||
Pipeline:
|
||||
1. Load and validate the YAML config (fail fast on missing or empty
|
||||
required fields — notably ``framework.version``, which silently
|
||||
breaks ``get_check_compliance()`` key construction if empty).
|
||||
2. Dynamically import the parser module declared in ``parser.module``
|
||||
(resolved as ``parsers.{name}`` under this script's directory).
|
||||
3. Call ``parser.parse_upstream(config) -> list[dict]`` to get raw
|
||||
Prowler-format requirements. The parser owns all upstream-format
|
||||
quirks (foreign-prefix AR rewriting, collision renumbering, shape
|
||||
handling) and MUST return ids that are unique within the returned
|
||||
list.
|
||||
4. **Safety net**: assert id uniqueness. The runner raises
|
||||
``ValueError`` on any duplicate — it does NOT silently renumber,
|
||||
because mutating a canonical upstream id (e.g. CIS ``1.1.1`` or
|
||||
NIST ``AC-2(1)``) would be catastrophic.
|
||||
5. Apply generic ``FamilyName`` normalization from
|
||||
``post_processing.family_name_normalization`` (optional).
|
||||
6. Preserve legacy ``Checks`` lists from the existing Prowler JSON
|
||||
using a config-driven primary key + fallback key chain. CCC uses
|
||||
``(Section, Applicability)`` as fallback; CIS would use
|
||||
``(Section, Profile)``; NIST would use ``(ItemId,)``.
|
||||
7. Wrap each provider's requirements in the framework metadata dict
|
||||
built from the config templates.
|
||||
8. Write each provider's JSON to the path resolved from
|
||||
``output.path_template`` (supports ``{framework}``, ``{version}``
|
||||
and ``{provider}`` placeholders).
|
||||
9. Pydantic-validate the written JSON via ``Compliance.parse_file()``
|
||||
and report the load counts per provider.
|
||||
|
||||
The runner is strictly generic — it never mentions CCC, knows nothing
|
||||
about YAML shapes, and can handle any upstream-backed framework given a
|
||||
parser module and a config file.
|
||||
"""
|
||||
from __future__ import annotations
|
||||
|
||||
import importlib
|
||||
import json
|
||||
import sys
|
||||
from pathlib import Path
|
||||
from typing import Any
|
||||
|
||||
import yaml
|
||||
|
||||
# Make sibling `parsers/` package importable regardless of the runner's
|
||||
# invocation directory.
|
||||
_SCRIPT_DIR = Path(__file__).resolve().parent
|
||||
if str(_SCRIPT_DIR) not in sys.path:
|
||||
sys.path.insert(0, str(_SCRIPT_DIR))
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Config loading and validation
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
class ConfigError(ValueError):
|
||||
"""Raised when the sync config is malformed or missing required fields."""
|
||||
|
||||
|
||||
def _require(cfg: dict, dotted_path: str) -> Any:
|
||||
"""Fetch a dotted-path key from nested dicts. Raises ConfigError on
|
||||
missing or empty values (empty-string, empty-list, None)."""
|
||||
current: Any = cfg
|
||||
parts = dotted_path.split(".")
|
||||
for i, part in enumerate(parts):
|
||||
if not isinstance(current, dict) or part not in current:
|
||||
raise ConfigError(f"config: missing required field '{dotted_path}'")
|
||||
current = current[part]
|
||||
if current in ("", None, [], {}):
|
||||
raise ConfigError(f"config: field '{dotted_path}' must not be empty")
|
||||
return current
|
||||
|
||||
|
||||
def load_config(path: Path) -> dict:
|
||||
if not path.exists():
|
||||
raise ConfigError(f"config file not found: {path}")
|
||||
with open(path) as f:
|
||||
cfg = yaml.safe_load(f) or {}
|
||||
if not isinstance(cfg, dict):
|
||||
raise ConfigError(f"config root must be a mapping, got {type(cfg).__name__}")
|
||||
|
||||
# Required fields — fail fast. Empty Version in particular silently
|
||||
# breaks get_check_compliance() key construction.
|
||||
_require(cfg, "framework.name")
|
||||
_require(cfg, "framework.display_name")
|
||||
_require(cfg, "framework.version")
|
||||
_require(cfg, "framework.description_template")
|
||||
_require(cfg, "providers")
|
||||
_require(cfg, "output.path_template")
|
||||
_require(cfg, "upstream.dir")
|
||||
_require(cfg, "parser.module")
|
||||
_require(cfg, "post_processing.check_preservation.primary_key")
|
||||
|
||||
providers = cfg["providers"]
|
||||
if not isinstance(providers, list) or not providers:
|
||||
raise ConfigError("config: 'providers' must be a non-empty list")
|
||||
for idx, p in enumerate(providers):
|
||||
if not isinstance(p, dict) or "key" not in p or "display" not in p:
|
||||
raise ConfigError(
|
||||
f"config: providers[{idx}] must have 'key' and 'display' fields"
|
||||
)
|
||||
|
||||
return cfg
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Parser loading
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def load_parser(parser_module_name: str):
|
||||
try:
|
||||
return importlib.import_module(f"parsers.{parser_module_name}")
|
||||
except ImportError as exc:
|
||||
raise ConfigError(
|
||||
f"cannot import parser 'parsers.{parser_module_name}': {exc}"
|
||||
) from exc
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Post-processing: id uniqueness safety net
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def assert_unique_ids(requirements: list[dict]) -> None:
|
||||
"""Enforce the parser contract: every requirement must have a unique Id.
|
||||
|
||||
The runner never renumbers silently — a duplicate is a parser bug.
|
||||
"""
|
||||
seen: set[str] = set()
|
||||
dups: list[str] = []
|
||||
for req in requirements:
|
||||
rid = req.get("Id")
|
||||
if not rid:
|
||||
raise ValueError(f"requirement missing Id: {req}")
|
||||
if rid in seen:
|
||||
dups.append(rid)
|
||||
seen.add(rid)
|
||||
if dups:
|
||||
raise ValueError(
|
||||
f"parser returned duplicate requirement ids: {sorted(set(dups))}"
|
||||
)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Post-processing: FamilyName normalization
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def normalize_family_names(requirements: list[dict], norm_map: dict[str, str]) -> None:
|
||||
"""Apply ``Attributes[0].FamilyName`` normalization in place."""
|
||||
if not norm_map:
|
||||
return
|
||||
for req in requirements:
|
||||
for attr in req.get("Attributes") or []:
|
||||
name = attr.get("FamilyName")
|
||||
if name in norm_map:
|
||||
attr["FamilyName"] = norm_map[name]
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Post-processing: legacy check-mapping preservation
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _freeze(value: Any) -> Any:
|
||||
"""Make a value hashable for use in composite lookup keys.
|
||||
|
||||
Lists become frozensets (order-insensitive match). Scalars pass through.
|
||||
"""
|
||||
if isinstance(value, list):
|
||||
return frozenset(value)
|
||||
return value
|
||||
|
||||
|
||||
def _build_fallback_key(attrs: dict, field_names: list[str]) -> tuple | None:
|
||||
"""Build a composite tuple key from the given attribute field names.
|
||||
|
||||
Returns None if any field is missing or falsy — that key will be
|
||||
skipped (the lookup table just won't have an entry for it).
|
||||
"""
|
||||
parts = []
|
||||
for name in field_names:
|
||||
if name not in attrs:
|
||||
return None
|
||||
value = attrs[name]
|
||||
if value in ("", None, [], {}):
|
||||
return None
|
||||
parts.append(_freeze(value))
|
||||
return tuple(parts)
|
||||
|
||||
|
||||
def load_legacy_check_maps(
|
||||
legacy_path: Path,
|
||||
primary_key: str,
|
||||
fallback_keys: list[list[str]],
|
||||
) -> tuple[dict[str, list[str]], list[dict[tuple, list[str]]]]:
|
||||
"""Read the existing Prowler JSON and build lookup tables for check
|
||||
preservation.
|
||||
|
||||
Returns
|
||||
-------
|
||||
by_primary : dict
|
||||
``{primary_value: [checks]}`` — e.g. ``{ar_id: [checks]}``.
|
||||
by_fallback : list[dict]
|
||||
One lookup dict per entry in ``fallback_keys``. Each maps a
|
||||
composite tuple key to its preserved checks list.
|
||||
"""
|
||||
by_primary: dict[str, list[str]] = {}
|
||||
by_fallback: list[dict[tuple, list[str]]] = [{} for _ in fallback_keys]
|
||||
|
||||
if not legacy_path.exists():
|
||||
return by_primary, by_fallback
|
||||
|
||||
with open(legacy_path) as f:
|
||||
data = json.load(f)
|
||||
|
||||
for req in data.get("Requirements") or []:
|
||||
checks = req.get("Checks") or []
|
||||
# Primary index
|
||||
pv = req.get(primary_key)
|
||||
if pv:
|
||||
bucket = by_primary.setdefault(pv, [])
|
||||
for c in checks:
|
||||
if c not in bucket:
|
||||
bucket.append(c)
|
||||
|
||||
# Fallback indexes — read from Attributes[0]
|
||||
attributes = req.get("Attributes") or []
|
||||
if not attributes:
|
||||
continue
|
||||
attrs = attributes[0]
|
||||
for i, field_names in enumerate(fallback_keys):
|
||||
key = _build_fallback_key(attrs, field_names)
|
||||
if key is None:
|
||||
continue
|
||||
bucket = by_fallback[i].setdefault(key, [])
|
||||
for c in checks:
|
||||
if c not in bucket:
|
||||
bucket.append(c)
|
||||
|
||||
return by_primary, by_fallback
|
||||
|
||||
|
||||
def lookup_preserved_checks(
|
||||
req: dict,
|
||||
by_primary: dict,
|
||||
by_fallback: list[dict],
|
||||
primary_key: str,
|
||||
fallback_keys: list[list[str]],
|
||||
) -> list[str]:
|
||||
"""Return preserved check ids for a requirement, trying the primary
|
||||
key first then each fallback in order."""
|
||||
pv = req.get(primary_key)
|
||||
if pv and pv in by_primary:
|
||||
return list(by_primary[pv])
|
||||
attributes = req.get("Attributes") or []
|
||||
if not attributes:
|
||||
return []
|
||||
attrs = attributes[0]
|
||||
for i, field_names in enumerate(fallback_keys):
|
||||
key = _build_fallback_key(attrs, field_names)
|
||||
if key and key in by_fallback[i]:
|
||||
return list(by_fallback[i][key])
|
||||
return []
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Provider output assembly
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def resolve_output_path(template: str, framework: dict, provider_key: str) -> Path:
|
||||
return Path(
|
||||
template.format(
|
||||
provider=provider_key,
|
||||
framework=framework["name"].lower(),
|
||||
version=framework["version"],
|
||||
)
|
||||
)
|
||||
|
||||
|
||||
def build_provider_json(
|
||||
config: dict,
|
||||
provider: dict,
|
||||
base_requirements: list[dict],
|
||||
) -> tuple[dict, dict[str, int]]:
|
||||
"""Produce the provider-specific JSON dict ready to dump.
|
||||
|
||||
Returns ``(json_dict, counts)`` where ``counts`` tracks how each
|
||||
requirement's checks were resolved (primary, fallback, or none).
|
||||
"""
|
||||
framework = config["framework"]
|
||||
preservation = config["post_processing"]["check_preservation"]
|
||||
primary_key = preservation["primary_key"]
|
||||
fallback_keys = preservation.get("fallback_keys") or []
|
||||
|
||||
legacy_path = resolve_output_path(
|
||||
config["output"]["path_template"], framework, provider["key"]
|
||||
)
|
||||
by_primary, by_fallback = load_legacy_check_maps(
|
||||
legacy_path, primary_key, fallback_keys
|
||||
)
|
||||
|
||||
counts = {"primary": 0, "fallback": 0, "none": 0}
|
||||
enriched: list[dict] = []
|
||||
for req in base_requirements:
|
||||
# Try primary key first
|
||||
pv = req.get(primary_key)
|
||||
checks: list[str] = []
|
||||
source = "none"
|
||||
if pv and pv in by_primary:
|
||||
checks = list(by_primary[pv])
|
||||
source = "primary"
|
||||
else:
|
||||
attributes = req.get("Attributes") or []
|
||||
if attributes:
|
||||
attrs = attributes[0]
|
||||
for i, field_names in enumerate(fallback_keys):
|
||||
key = _build_fallback_key(attrs, field_names)
|
||||
if key and key in by_fallback[i]:
|
||||
checks = list(by_fallback[i][key])
|
||||
source = "fallback"
|
||||
break
|
||||
counts[source] += 1
|
||||
enriched.append(
|
||||
{
|
||||
"Id": req["Id"],
|
||||
"Description": req["Description"],
|
||||
# Shallow-copy attribute dicts so providers don't share refs
|
||||
"Attributes": [dict(a) for a in req.get("Attributes") or []],
|
||||
"Checks": checks,
|
||||
}
|
||||
)
|
||||
|
||||
description = framework["description_template"].format(
|
||||
provider_display=provider["display"],
|
||||
provider_key=provider["key"],
|
||||
framework_name=framework["name"],
|
||||
framework_display=framework["display_name"],
|
||||
version=framework["version"],
|
||||
)
|
||||
out = {
|
||||
"Framework": framework["name"],
|
||||
"Version": framework["version"],
|
||||
"Provider": provider["display"],
|
||||
"Name": framework["display_name"],
|
||||
"Description": description,
|
||||
"Requirements": enriched,
|
||||
}
|
||||
return out, counts
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Pydantic post-validation
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def pydantic_validate(json_path: Path) -> int:
|
||||
"""Import Prowler lazily so the runner still works without Prowler
|
||||
installed (validation step is skipped in that case)."""
|
||||
try:
|
||||
from prowler.lib.check.compliance_models import Compliance
|
||||
except ImportError:
|
||||
print(
|
||||
" note: prowler package not importable — skipping Pydantic validation",
|
||||
file=sys.stderr,
|
||||
)
|
||||
return -1
|
||||
try:
|
||||
parsed = Compliance.parse_file(str(json_path))
|
||||
except Exception as exc:
|
||||
raise RuntimeError(
|
||||
f"Pydantic validation failed for {json_path}: {exc}"
|
||||
) from exc
|
||||
return len(parsed.Requirements)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Driver
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def main() -> int:
|
||||
if len(sys.argv) != 2:
|
||||
print("usage: sync_framework.py <config.yaml>", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
config_path = Path(sys.argv[1])
|
||||
try:
|
||||
config = load_config(config_path)
|
||||
except ConfigError as exc:
|
||||
print(f"config error: {exc}", file=sys.stderr)
|
||||
return 2
|
||||
|
||||
framework_name = config["framework"]["name"]
|
||||
upstream_dir = Path(config["upstream"]["dir"])
|
||||
if not upstream_dir.exists():
|
||||
print(
|
||||
f"error: upstream cache dir {upstream_dir} not found\n"
|
||||
f" hint: {config['upstream'].get('fetch_docs', '(see SKILL.md Workflow A Step 1)')}",
|
||||
file=sys.stderr,
|
||||
)
|
||||
return 3
|
||||
|
||||
parser_module_name = config["parser"]["module"]
|
||||
print(
|
||||
f"Sync: framework={framework_name} version={config['framework']['version']} "
|
||||
f"parser={parser_module_name}"
|
||||
)
|
||||
|
||||
try:
|
||||
parser = load_parser(parser_module_name)
|
||||
except ConfigError as exc:
|
||||
print(f"parser error: {exc}", file=sys.stderr)
|
||||
return 4
|
||||
|
||||
print(f"Parsing upstream from {upstream_dir}...")
|
||||
base_requirements = parser.parse_upstream(config)
|
||||
print(f" parser returned {len(base_requirements)} requirements")
|
||||
|
||||
# Safety-net: parser contract
|
||||
try:
|
||||
assert_unique_ids(base_requirements)
|
||||
except ValueError as exc:
|
||||
print(f"parser contract violation: {exc}", file=sys.stderr)
|
||||
return 5
|
||||
|
||||
# Post-processing: family name normalization
|
||||
norm_map = (
|
||||
config.get("post_processing", {})
|
||||
.get("family_name_normalization")
|
||||
or {}
|
||||
)
|
||||
normalize_family_names(base_requirements, norm_map)
|
||||
|
||||
# Per-provider output
|
||||
print()
|
||||
for provider in config["providers"]:
|
||||
provider_json, counts = build_provider_json(
|
||||
config, provider, base_requirements
|
||||
)
|
||||
out_path = resolve_output_path(
|
||||
config["output"]["path_template"],
|
||||
config["framework"],
|
||||
provider["key"],
|
||||
)
|
||||
out_path.parent.mkdir(parents=True, exist_ok=True)
|
||||
with open(out_path, "w") as f:
|
||||
json.dump(provider_json, f, indent=2, ensure_ascii=False)
|
||||
f.write("\n")
|
||||
|
||||
validated = pydantic_validate(out_path)
|
||||
validated_msg = (
|
||||
f" pydantic_reqs={validated}" if validated >= 0 else " pydantic=skipped"
|
||||
)
|
||||
print(
|
||||
f" {provider['key']}: total={len(provider_json['Requirements'])} "
|
||||
f"matched_primary={counts['primary']} "
|
||||
f"matched_fallback={counts['fallback']} "
|
||||
f"new_or_unmatched={counts['none']}{validated_msg}"
|
||||
)
|
||||
print(f" wrote {out_path}")
|
||||
|
||||
print("\nDone.")
|
||||
return 0
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
sys.exit(main())
|
||||
Reference in New Issue
Block a user