mirror of
https://github.com/prowler-cloud/prowler.git
synced 2026-07-04 19:21:51 +00:00
feat(sdk): replace detect-secrets library with kingfisher (#11694)
committed by
GitHub
parent ed1fec8866
commit 5dac8a0a53
@@ -445,3 +445,5 @@ The metadata structure is enforced in code using a Pydantic model. For reference
## Specific Check Patterns

Details for specific providers can be found in documentation pages named using the pattern `<provider_name>-details`.

Checks that scan resources for plaintext secrets follow a dedicated batched structure. Refer to [Secret-Scanning Checks](/developer-guide/secret-scanning-checks) before creating or updating one.
@@ -153,7 +153,6 @@ Only fields with a numeric range, a fixed value set, or a length cap are listed.
| `max_days_secret_unused` | `7..365` days | |
| `max_days_secret_unrotated` | `1..180` days | NIST IA-5: rotate quarterly; CIS ≤90 |
| `min_kinesis_stream_retention_hours` | `24..8760` h | 1 day .. 1 year |
| `detect_secrets_plugins[].limit` | `0.0..10.0` | Shannon entropy threshold |
| `shodan_api_key` | ≤512 chars | |

### Azure
@@ -0,0 +1,119 @@
---
title: 'Secret-Scanning Checks'
---

import { VersionBadge } from "/snippets/version-badge.mdx"

<VersionBadge version="5.32.0" />

Prowler scans audited resources for plaintext secrets using [Kingfisher](https://github.com/mongodb/kingfisher), an open-source secret-scanning engine that Prowler invokes as a subprocess. This guide explains the structure every secret-scanning check must follow to keep scanning correct and efficient on large accounts.

<Note>
Since Prowler 5.32.0 the secret-scanning checks scan with Kingfisher. Earlier versions used the `detect-secrets` library.
</Note>

## Overview

Secret detection runs through a single helper in `prowler/lib/utils/utils.py`:

- **`detect_secrets_scan_batch(payloads, excluded_secrets=..., validate=...)`** scans many payloads in chunked subprocess invocations and returns a `{key: [findings]}` dictionary. To scan a single payload, pass a one-entry mapping (for example, `{0: data}`).

Every Kingfisher invocation carries a fixed process-startup cost (around 100 ms). Scanning once per resource would spawn thousands of subprocesses on large accounts (for example, thousands of CloudWatch log groups). `detect_secrets_scan_batch` amortizes that cost: it writes each payload to a temporary file as it is consumed, runs one subprocess per chunk (500 payloads by default), and maps the findings back to each payload by key.
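The amortization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Prowler's implementation: `chunked`, `scan_in_chunks`, and `fake_scan_chunk` are hypothetical names, and the stand-in scanner only looks for the literal text `password=` instead of writing temporary files and running Kingfisher.

```python
from itertools import islice
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

Payload = Tuple[Hashable, str]
Findings = Dict[Hashable, List[dict]]


def chunked(items: Iterable[Payload], size: int) -> Iterator[List[Payload]]:
    """Yield lists of at most `size` items, consuming the iterable lazily."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def scan_in_chunks(
    payloads: Iterable[Payload],
    scan_chunk: Callable[[List[Payload]], Findings],
    chunk_size: int = 500,
) -> Findings:
    """Run one scanner invocation per chunk and merge the keyed findings."""
    results: Findings = {}
    for chunk in chunked(payloads, chunk_size):
        for key, findings in scan_chunk(chunk).items():
            if findings:
                results[key] = findings
    return results


def fake_scan_chunk(chunk: List[Payload]) -> Findings:
    """Hypothetical scanner: flags every line containing 'password='."""
    found: Findings = {}
    for key, text in chunk:
        hits = [
            {"line_number": number}
            for number, line in enumerate(text.splitlines(), start=1)
            if "password=" in line
        ]
        if hits:
            found[key] = hits
    return found


invocations: List[int] = []


def counting_scan(chunk: List[Payload]) -> Findings:
    """Record the chunk size so the number of invocations is visible."""
    invocations.append(len(chunk))
    return fake_scan_chunk(chunk)


# 1200 payloads produced lazily; only three scanner invocations are needed.
payloads = ((i, "password=hunter2" if i % 400 == 0 else "ok") for i in range(1200))
results = scan_in_chunks(payloads, counting_scan, chunk_size=500)
```

Because the generator is consumed one chunk at a time, at most `chunk_size` payloads are held in memory at once, and the per-invocation startup cost is paid three times here instead of 1200 times.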

## The Batched Structure

Every secret-scanning check follows three phases.

### Phase 1: Collect

Define a generator that yields `(key, payload)` for each scannable unit. The generator builds payload strings only; it does not call Kingfisher. Lazy yielding keeps memory and temporary-disk usage bounded to a single chunk, which matters when an account holds thousands of resources.

### Phase 2: Batch

Call `detect_secrets_scan_batch` once with the generator. The helper consumes it in chunks, runs Kingfisher per chunk, and returns the keys that produced findings mapped to their finding lists.

### Phase 3: Report

Iterate the resources, look up the findings by key, and build one report per resource. Emit a finding for **every** iterated resource; never drop one silently. When a resource's payload cannot be prepared for scanning (for example, user data that fails to base64-decode or decompress), report it as `MANUAL` with a status explaining that the scan could not inspect it, rather than omitting it or claiming `PASS`.

```python
from prowler.lib.check.models import Check, Check_Report_AWS
from prowler.lib.utils.utils import (
    annotate_verified_secrets,
    detect_secrets_scan_batch,
)
from prowler.providers.aws.services.example.example_client import example_client


class example_resource_no_secrets(Check):
    def execute(self):
        findings = []
        excluded = example_client.audit_config.get("secrets_ignore_patterns", [])
        validate = example_client.audit_config.get("secrets_validate", False)
        # Materialize once so Phase 1 and Phase 3 share the same indices.
        resources = list(example_client.resources)

        # Phase 1: collect. Builds strings only, no scan.
        def payloads():
            for index, resource in enumerate(resources):
                if resource.scannable_data:
                    # serialize() stands in for the check's own payload builder.
                    yield index, serialize(resource)

        # Phase 2: batch. One call, chunked subprocesses.
        batch_results = detect_secrets_scan_batch(
            payloads(), excluded_secrets=excluded, validate=validate
        )

        # Phase 3: report. Look up findings by key.
        for index, resource in enumerate(resources):
            report = Check_Report_AWS(metadata=self.metadata(), resource=resource)
            report.status = "PASS"
            report.status_extended = f"No secrets found in {resource.name}."
            detect_secrets_output = batch_results.get(index)
            if detect_secrets_output:
                report.status = "FAIL"
                report.status_extended = (
                    f"Potential secret found in {resource.name} -> ..."
                )
                annotate_verified_secrets(report, detect_secrets_output)
            findings.append(report)

        return findings
```
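The `MANUAL` path mentioned in Phase 3 could look roughly like the sketch below. It assumes a payload that arrives base64-encoded and optionally gzip-compressed; `decode_user_data` and `resolve_status` are hypothetical helpers written for this illustration, not part of Prowler.

```python
import base64
import binascii
import gzip
import zlib
from typing import List, Optional, Tuple


def decode_user_data(raw: str) -> Optional[str]:
    """Return the decoded text, or None when the payload cannot be prepared."""
    try:
        data = base64.b64decode(raw, validate=True)
        if data[:2] == b"\x1f\x8b":  # gzip magic bytes
            data = gzip.decompress(data)
        return data.decode("utf-8")
    except (binascii.Error, ValueError, OSError, EOFError, zlib.error):
        return None


def resolve_status(payload: Optional[str], findings: List[dict]) -> Tuple[str, str]:
    """Pick a status for one resource; never silently skip it."""
    if payload is None:
        return "MANUAL", "Could not decode the payload, so it was not scanned."
    if findings:
        return "FAIL", "Potential secret found."
    return "PASS", "No secrets found."


plain = base64.b64encode(b"echo hello").decode()
compressed = base64.b64encode(gzip.compress(b"echo hello")).decode()
broken = "not base64!!"

statuses = [
    resolve_status(decode_user_data(plain), [])[0],
    resolve_status(decode_user_data(compressed), [{"line_number": 1}])[0],
    resolve_status(decode_user_data(broken), [])[0],
]
```

In a real check, Phase 1 would skip yielding a payload for the undecodable resource and remember its index, so that Phase 3 can emit the `MANUAL` report for it.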

## Choosing the Key

The key maps each finding back to its source. Two shapes cover every check:

- **One payload per resource:** use the resource index. This fits checks that serialize a single payload per resource, such as launch configurations, CloudFormation outputs, SSM documents, Step Functions definitions, and OpenStack metadata.
- **Several payloads per resource:** use a `(resource_index, fragment)` tuple, where the fragment identifies the variable, log stream, container, file, or version. Phase 3 groups the per-fragment findings to build the resource report. This fits CloudWatch log streams, ECS containers, CodeBuild variables, Glue arguments, and Lambda code files.

Derive the indices from the same `list(...)` of resources in both Phase 1 and Phase 3 so the order stays stable and the keys align.
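As a rough illustration of the tuple-key shape, the sketch below yields one payload per container and groups the findings by resource index for Phase 3. The resource dictionaries, the `group_by_resource` helper, and the hard-coded `batch_results` are assumptions made for the example; a real check would obtain `batch_results` from `detect_secrets_scan_batch`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical resources: each task definition holds several containers.
resources = [
    {"name": "task-a", "containers": {"web": "PORT=80", "worker": "TOKEN=secret"}},
    {"name": "task-b", "containers": {"db": "USER=admin"}},
]


def payloads():
    """Phase 1: one payload per (resource_index, container) pair."""
    for resource_index, resource in enumerate(resources):
        for container, environment in resource["containers"].items():
            yield (resource_index, container), environment


# Stand-in for the Phase 2 result: only keys with findings are present.
batch_results: Dict[Tuple[int, str], List[dict]] = {
    (0, "worker"): [{"type": "...", "line_number": 1}],
}


def group_by_resource(results):
    """Regroup per-fragment findings under their resource index."""
    grouped = defaultdict(dict)
    for (resource_index, fragment), findings in results.items():
        grouped[resource_index][fragment] = findings
    return grouped


# Phase 3: one report per resource, built from the grouped fragments.
grouped = group_by_resource(batch_results)
reports = []
for resource_index, resource in enumerate(resources):
    fragments = grouped.get(resource_index, {})
    status = "FAIL" if fragments else "PASS"
    reports.append((resource["name"], status, sorted(fragments)))
```

Both loops enumerate the same `resources` list, which is what keeps the indices in the keys aligned with the reports.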

## Preserving Per-Payload Results

`detect_secrets_scan_batch` runs Kingfisher with `--no-dedup`, so a secret that appears in more than one payload is reported for each one. This reproduces the result of scanning each payload individually. Build payload strings exactly as a single scan would: serialize the same data and keep line ordering, because status messages often map a finding's `line_number` back to a variable name or metadata key.
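The dependence on line ordering can be seen in a small sketch. It assumes one `NAME=value` pair per payload line and a 1-based `line_number`; both are assumptions for illustration, and the finding shown is hand-written rather than real scanner output.

```python
# Hypothetical environment variables of a single resource.
variables = {"DB_HOST": "db.internal", "API_TOKEN": "abc123", "MODE": "prod"}

# Build the payload with one variable per line, in a stable order.
names = list(variables)
payload = "\n".join(f"{name}={value}" for name, value in variables.items())

# Hand-written finding; line_number is assumed to be 1-based.
findings = [{"type": "...", "line_number": 2}]

# Map each finding back to the variable that occupied that line.
flagged = [names[finding["line_number"] - 1] for finding in findings]
message = f"Potential secret found in variables: {', '.join(flagged)}."
```

If the payload were serialized in a different order than `names`, the lookup would point at the wrong variable, which is why the payload and the lookup table are derived from the same sequence.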

## Validation and Severity

`detect_secrets_scan_batch` accepts `validate`, read from `secrets_validate` in the provider configuration or the `--scan-secrets-validate` flag. When enabled, Kingfisher confirms whether each secret is live, and confirmed secrets carry `is_verified: True`.

After marking a report as `FAIL`, pass the findings to `annotate_verified_secrets(report, findings)`. When any secret is verified, the helper escalates the finding to critical severity and appends a note that the secret was confirmed live. Validation stays off by default because it sends the discovered secret to the provider API.
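The escalation behavior described above can be approximated with the following sketch. `Report` and `escalate_if_verified` are hypothetical stand-ins written for this page; the real `annotate_verified_secrets` helper operates on Prowler report objects, and its exact fields and wording may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Report:
    """Hypothetical, simplified stand-in for a check report."""

    status: str = "FAIL"
    status_extended: str = "Potential secret found."
    severity: str = "high"


def escalate_if_verified(report: Report, findings: List[Dict[str, Any]]) -> None:
    """Raise severity and add a note when any finding was confirmed live."""
    if any(finding.get("is_verified") for finding in findings):
        report.severity = "critical"
        report.status_extended += " The secret was confirmed live."


unverified = Report()
escalate_if_verified(unverified, [{"line_number": 1, "is_verified": False}])

verified = Report()
escalate_if_verified(verified, [{"line_number": 1}, {"line_number": 4, "is_verified": True}])
```

A single verified finding is enough to escalate the whole report, which matches the "any secret is verified" wording above.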

## Excluded Secrets

`detect_secrets_scan_batch` applies `secrets_ignore_patterns` (regular expressions from the provider configuration) against each finding's source line and drops the matches, mirroring single-scan behavior.
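A minimal sketch of that filtering step, assuming each finding carries a 1-based `line_number` into its payload. `filter_excluded` is a hypothetical helper for illustration and does not reflect how the real helper is written.

```python
import re
from typing import Dict, List


def filter_excluded(payload: str, findings: List[Dict], patterns: List[str]) -> List[Dict]:
    """Drop findings whose source line matches any ignore pattern."""
    lines = payload.splitlines()
    compiled = [re.compile(pattern) for pattern in patterns]
    kept = []
    for finding in findings:
        index = finding["line_number"] - 1  # assumed 1-based
        source_line = lines[index] if 0 <= index < len(lines) else ""
        if not any(regex.search(source_line) for regex in compiled):
            kept.append(finding)
    return kept


payload = "API_TOKEN=abc123\nEXAMPLE_KEY=changeme"
findings = [{"line_number": 1}, {"line_number": 2}]

# The second finding sits on a line that matches the ignore pattern.
kept = filter_excluded(payload, findings, [r"^EXAMPLE_"])
```

Matching against the source line rather than the secret value lets a pattern target variable names or surrounding context, not only the secret itself.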

## Testing

To assert on the verified-secret path, mock `detect_secrets_scan_batch` in the check module and return the keyed dictionary. For a single resource scanned at index `0`:

```python
from unittest import mock

# Patch the name as imported by the check module, not the helper's source module.
mock.patch(
    "prowler.providers.aws.services.example.example_resource_no_secrets.example_resource_no_secrets.detect_secrets_scan_batch",
    return_value={
        0: [{"type": "...", "line_number": 1, "is_verified": True}]
    },
)
```

Most tests need no mock at all: they seed resources that contain example secrets and assert on the `FAIL` status and message, which exercises the real batched path. Refer to the [Testing](/developer-guide/unit-testing) documentation for the general structure.