feat(sdk): replace detect-secrets library with kingfisher (#11694)

2026-07-04 19:21:51 +00:00 · 2026-06-30 15:36:23 +02:00
parent ed1fec8866
commit 5dac8a0a53
60 changed files with 2969 additions and 881 deletions
@@ -445,3 +445,5 @@ The metadata structure is enforced in code using a Pydantic model. For reference
 ## Specific Check Patterns

 Details for specific providers can be found in documentation pages named using the pattern `<provider_name>-details`.
+
+Checks that scan resources for plaintext secrets follow a dedicated batched structure. Refer to [Secret-Scanning Checks](/developer-guide/secret-scanning-checks) before creating or updating one.
@@ -153,7 +153,6 @@ Only fields with a numeric range, a fixed value set, or a length cap are listed.
 | `max_days_secret_unused` | `7..365` days | |
 | `max_days_secret_unrotated` | `1..180` days | NIST IA-5: rotate quarterly; CIS ≤90 |
 | `min_kinesis_stream_retention_hours` | `24..8760` h | 1 day .. 1 year |
-| `detect_secrets_plugins[].limit` | `0.0..10.0` | Shannon entropy threshold |
 | `shodan_api_key` | ≤512 chars | |

 ### Azure
@@ -0,0 +1,119 @@
+---
+title: 'Secret-Scanning Checks'
+---
+
+import { VersionBadge } from "/snippets/version-badge.mdx"
+
+<VersionBadge version="5.32.0" />
+
+Prowler scans audited resources for plaintext secrets using [Kingfisher](https://github.com/mongodb/kingfisher), an open-source secret-scanning engine that Prowler invokes as a subprocess. This guide explains the structure every secret-scanning check must follow to keep scanning correct and efficient on large accounts.
+
+<Note>
+Since Prowler 5.32.0, the secret-scanning checks scan with Kingfisher. Earlier versions used the `detect-secrets` library.
+</Note>
+
+## Overview
+
+Secret detection runs through a single helper in `prowler/lib/utils/utils.py`:
+
+- **`detect_secrets_scan_batch(payloads, excluded_secrets=..., validate=...)`** scans many payloads in chunked subprocess invocations and returns a `{key: [findings]}` dictionary. To scan a single payload, pass a one-entry mapping (for example, `{0: data}`).
+
+Every Kingfisher invocation carries a fixed process-startup cost (around 100 ms). Scanning once per resource would spawn thousands of subprocesses on large accounts (for example, one per CloudWatch log group in an account with thousands of them). `detect_secrets_scan_batch` amortizes that cost: as it consumes the payloads, it writes each one to a temporary file, runs one subprocess per chunk (500 payloads by default), and maps the findings back to each payload by key.
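To make the amortization concrete: at roughly 100 ms of startup per invocation, 5,000 payloads scanned one at a time spend about 500 s on process startup alone, while 10 chunks of 500 spend about 1 s. The following is a minimal sketch of the chunking pattern only, not Prowler's implementation. `chunked`, `scan_in_chunks`, and the `scan_chunk` callback are hypothetical names, and the callback stands in for one Kingfisher subprocess per chunk.

```python
from itertools import islice
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

# Hypothetical aliases for this sketch: payloads are (key, text) pairs,
# and results map each key that produced findings to its finding list.
Payload = Tuple[Hashable, str]
Results = Dict[Hashable, List[dict]]


def chunked(items: Iterable[Payload], size: int) -> Iterator[List[Payload]]:
    """Yield lists of at most `size` items, pulling from `items` lazily."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def scan_in_chunks(
    payloads: Iterable[Payload],
    scan_chunk: Callable[[List[Payload]], Results],
    chunk_size: int = 500,
) -> Results:
    """Call `scan_chunk` once per chunk and merge the keyed findings."""
    results: Results = {}
    for chunk in chunked(payloads, chunk_size):
        # Stand-in for one scanner invocation (one subprocess per chunk).
        for key, findings in scan_chunk(chunk).items():
            if findings:  # keep only keys that produced findings
                results[key] = findings
    return results


# Usage: a fake scanner that records chunk sizes and flags "SECRET".
calls: List[int] = []


def fake_scan(chunk: List[Payload]) -> Results:
    calls.append(len(chunk))
    return {key: [{"type": "fake"}] for key, text in chunk if "SECRET" in text}


payloads = ((i, "SECRET" if i % 1000 == 0 else "clean") for i in range(5000))
results = scan_in_chunks(payloads, fake_scan)
print(len(calls), sorted(results))  # 10 [0, 1000, 2000, 3000, 4000]
```

Because `chunked` pulls from the generator with `islice`, only one chunk of payloads is held at a time, which matches the bounded-memory goal of Phase 1.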
+
+## The Batched Structure
+
+Every secret-scanning check follows three phases.
+
+### Phase 1: Collect
+
+Define a generator that yields `(key, payload)` for each scannable unit. The generator builds payload strings only — it does not call Kingfisher. Lazy yielding keeps memory and temporary-disk usage bounded to a single chunk, which matters when an account holds thousands of resources.
+
+### Phase 2: Batch
+
+Call `detect_secrets_scan_batch` once with the generator. The helper consumes it in chunks, runs Kingfisher per chunk, and returns the keys that produced findings mapped to their finding lists.
+
+### Phase 3: Report
+
+Iterate the resources, look up the findings by key, and build one report per resource. Emit a finding for **every** iterated resource; never drop one silently. When a resource's payload cannot be prepared for scanning (for example, user data that fails to base64-decode or decompress), report it as `MANUAL` with a `status_extended` message explaining that the scan could not inspect it, rather than omitting it or claiming `PASS`.
+
+```python
+from prowler.lib.check.models import Check, Check_Report_AWS
+from prowler.lib.utils.utils import (
+    annotate_verified_secrets,
+    detect_secrets_scan_batch,
+)
+from prowler.providers.aws.services.example.example_client import example_client
+
+
+class example_resource_no_secrets(Check):
+    def execute(self):
+        findings = []
+        excluded = example_client.audit_config.get("secrets_ignore_patterns", [])
+        validate = example_client.audit_config.get("secrets_validate", False)
+        resources = list(example_client.resources)
+
+        # Phase 1: collect — builds strings only, no scan.
+        def payloads():
+            for index, resource in enumerate(resources):
+                if resource.scannable_data:
+                    # `serialize` is a placeholder for the check-specific payload builder.
+                    yield index, serialize(resource)
+
+        # Phase 2: batch — one call, chunked subprocesses.
+        batch_results = detect_secrets_scan_batch(
+            payloads(), excluded_secrets=excluded, validate=validate
+        )
+
+        # Phase 3: report — look up findings by key.
+        for index, resource in enumerate(resources):
+            report = Check_Report_AWS(metadata=self.metadata(), resource=resource)
+            report.status = "PASS"
+            report.status_extended = f"No secrets found in {resource.name}."
+            detect_secrets_output = batch_results.get(index)
+            if detect_secrets_output:
+                report.status = "FAIL"
+                report.status_extended = (
+                    f"Potential secret found in {resource.name} -> ..."
+                )
+                annotate_verified_secrets(report, detect_secrets_output)
+            findings.append(report)
+
+        return findings
+```
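The example above only reports `PASS` or `FAIL`. For the `MANUAL` case described in Phase 3, the payload builder needs a way to signal that a resource could not be prepared. The following sketch assumes user data arrives base64-encoded and is sometimes gzip-compressed. `prepare_user_data` and `status_for` are hypothetical helpers, not part of Prowler, and a `None` payload is the assumed signal for `MANUAL`.

```python
import base64
import gzip
import zlib
from typing import List, Optional

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def prepare_user_data(encoded: str) -> Optional[str]:
    """Return scannable text, or None when the payload cannot be prepared.

    Hypothetical helper: assumes base64 input that may wrap gzip data.
    """
    try:
        raw = base64.b64decode(encoded, validate=True)
        if raw.startswith(GZIP_MAGIC):
            raw = gzip.decompress(raw)
        return raw.decode("utf-8")
    except (ValueError, OSError, EOFError, zlib.error):
        # ValueError covers binascii.Error and UnicodeDecodeError;
        # OSError covers gzip.BadGzipFile; EOFError covers truncated gzip.
        return None


def status_for(payload: Optional[str], findings: List[dict]) -> str:
    """Pick the report status for one resource."""
    if payload is None:
        return "MANUAL"  # could not inspect: never claim PASS
    return "FAIL" if findings else "PASS"


# Usage
gzipped = base64.b64encode(gzip.compress(b"#!/bin/bash\necho hi\n")).decode()
print(repr(prepare_user_data(gzipped)))  # '#!/bin/bash\necho hi\n'
print(prepare_user_data("not base64!"))  # None
print(status_for(None, []))  # MANUAL
```

Resources whose payload is `None` would be skipped in Phase 1 but still reported in Phase 3, so no resource disappears from the output.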
+
+## Choosing the Key
+
+The key maps each finding back to its source. Two shapes cover every check:
+
+- **One payload per resource:** use the resource index. This fits checks that serialize a single payload per resource, such as launch configurations, CloudFormation outputs, SSM documents, Step Functions definitions, and OpenStack metadata.
+- **Several payloads per resource:** use a `(resource_index, fragment)` tuple, where the fragment identifies the variable, log stream, container, file, or version. Phase 3 groups the per-fragment findings to build the resource report. This fits CloudWatch log streams, ECS containers, CodeBuild variables, Glue arguments, and Lambda code files.
+
+Derive the indices from the same `list(...)` of resources in both Phase 1 and Phase 3 so the order stays stable and the keys align.
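For the tuple-key shape, Phase 3 has to regroup the flat `{(resource_index, fragment): findings}` dictionary per resource. A minimal sketch, assuming keys of exactly that shape. `group_by_resource` and the sample container names are hypothetical, and the finding dictionaries are reduced to placeholder fields.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

FragmentFindings = Dict[Hashable, List[dict]]


def group_by_resource(
    batch_results: Dict[Tuple[int, Hashable], List[dict]],
) -> Dict[int, FragmentFindings]:
    """Regroup {(resource_index, fragment): findings} into per-resource maps."""
    grouped: Dict[int, FragmentFindings] = defaultdict(dict)
    for (resource_index, fragment), findings in batch_results.items():
        grouped[resource_index][fragment] = findings
    return dict(grouped)


# Usage: three resources; index 1 produced no findings, so it has no keys.
batch_results = {
    (0, "web"): [{"type": "example", "line_number": 2}],
    (0, "sidecar"): [{"type": "example", "line_number": 1}],
    (2, "worker"): [{"type": "example", "line_number": 3}],
}
grouped = group_by_resource(batch_results)
for index in range(3):  # same indices as Phase 1's enumerate()
    fragments = grouped.get(index, {})
    print(index, "FAIL" if fragments else "PASS", sorted(fragments))
# 0 FAIL ['sidecar', 'web']
# 1 PASS []
# 2 FAIL ['worker']
```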
+
+## Preserving Per-Payload Results
+
+`detect_secrets_scan_batch` runs Kingfisher with `--no-dedup`, so a secret that appears in more than one payload is reported for each one. This reproduces the result of scanning each payload individually. Build payload strings exactly as a single scan would: serialize the same data and keep line ordering, because messages often map a finding's `line_number` back to a variable name or metadata key.
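As an illustration of why line ordering matters, the sketch below serializes one variable per line in the mapping's insertion order and maps a finding's `line_number` back to the variable name. It assumes `line_number` is 1-based and that values contain no newlines. `serialize_variables` and `variable_for_line` are hypothetical helpers, and a real check must reproduce whatever format its single-payload version used.

```python
from typing import Dict, List, Tuple


def serialize_variables(variables: Dict[str, str]) -> Tuple[str, List[str]]:
    """Write one NAME=value per line, keeping the mapping's insertion order.

    Returns the payload and the names in line order so findings can be
    mapped back. Assumes values contain no newlines.
    """
    names = list(variables)
    payload = "\n".join(f"{name}={variables[name]}" for name in names)
    return payload, names


def variable_for_line(names: List[str], line_number: int) -> str:
    """Map a 1-based line_number back to the variable on that line."""
    return names[line_number - 1]


# Usage
payload, names = serialize_variables({"APP_ENV": "prod", "DB_PASSWORD": "hunter2"})
finding = {"type": "example", "line_number": 2}  # hypothetical finding
print(variable_for_line(names, finding["line_number"]))  # DB_PASSWORD
```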
+
+## Validation and Severity
+
+`detect_secrets_scan_batch` accepts a `validate` argument. Checks take its value from `secrets_validate` in the provider configuration or from the `--scan-secrets-validate` flag. When enabled, Kingfisher confirms whether each secret is live, and confirmed secrets carry `is_verified: True`.
+
+After marking a report as `FAIL`, pass the findings to `annotate_verified_secrets(report, findings)`. When any secret is verified, the helper escalates the finding to critical severity and appends a note that the secret was confirmed live. Validation stays off by default because it sends the discovered secret to the provider API.
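The escalation rule can be sketched with a stand-in report type. This is not Prowler's `annotate_verified_secrets` or `Check_Report_AWS`: `SketchReport`, its `severity` field, and the appended note text are assumptions that only model the described behavior (escalate to critical when any finding has `is_verified: True`).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SketchReport:
    """Hypothetical minimal report; not Prowler's report class."""

    status: str
    status_extended: str
    severity: str


def annotate_verified_sketch(report: SketchReport, findings: List[dict]) -> None:
    """Escalate to critical and append a note if any secret was verified."""
    if any(finding.get("is_verified") for finding in findings):
        report.severity = "critical"
        # Hypothetical wording; Prowler's actual note may differ.
        report.status_extended += " The secret was confirmed live."


# Usage
report = SketchReport("FAIL", "Potential secret found in my-resource.", "high")
annotate_verified_sketch(report, [{"type": "example", "is_verified": True}])
print(report.severity)  # critical
```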
+
+## Excluded Secrets
+
+`detect_secrets_scan_batch` applies `secrets_ignore_patterns` — regular expressions from the provider configuration — against each finding's source line and drops the matches, mirroring single-scan behavior.
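The exclusion step can be sketched as follows, assuming each ignore pattern is matched with `re.search` anywhere in the source line and that `line_number` is 1-based. `drop_ignored` is a hypothetical name, and the real helper's matching details may differ.

```python
import re
from typing import Iterable, List


def drop_ignored(
    payload: str, findings: List[dict], ignore_patterns: Iterable[str]
) -> List[dict]:
    """Drop findings whose source line matches any ignore pattern."""
    compiled = [re.compile(pattern) for pattern in ignore_patterns]
    lines = payload.splitlines()
    kept = []
    for finding in findings:
        source_line = lines[finding["line_number"] - 1]  # 1-based line numbers
        if not any(pattern.search(source_line) for pattern in compiled):
            kept.append(finding)
    return kept


# Usage: the placeholder on line 2 is ignored; line 3 survives.
payload = "USER=admin\nPASSWORD=example-placeholder\nTOKEN=abc123"
findings = [{"line_number": 2}, {"line_number": 3}]
print(drop_ignored(payload, findings, [r"example-placeholder"]))
# [{'line_number': 3}]
```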
+
+## Testing
+
+To assert on the verified-secret path, mock `detect_secrets_scan_batch` in the check module and return the keyed dictionary. For a single resource scanned at index `0`:
+
+```python
+from unittest import mock
+
+with mock.patch(
+    "prowler.providers.aws.services.example.example_resource_no_secrets.example_resource_no_secrets.detect_secrets_scan_batch",
+    return_value={
+        0: [{"type": "...", "line_number": 1, "is_verified": True}]
+    },
+):
+    ...  # Run the check here and assert on the FAIL report and its critical severity.
+```
+
+Most tests need no mock at all: they seed resources that contain example secrets and assert on the `FAIL` status and message, which exercises the real batched path. Refer to the [Testing](/developer-guide/unit-testing) documentation for the general structure.
@@ -398,6 +398,7 @@
              "developer-guide/provider",
              "developer-guide/services",
              "developer-guide/checks",
+              "developer-guide/secret-scanning-checks",
              "developer-guide/outputs",
              "developer-guide/integrations",
              "developer-guide/security-compliance-framework",
@@ -2,6 +2,8 @@
 title: "Configuration File"
 ---

+import { VersionBadge } from "/snippets/version-badge.mdx"
+
 Several of Prowler's checks have user-configurable variables that can be modified in a common **configuration file**. This file can be found in the following [path](https://github.com/prowler-cloud/prowler/blob/master/prowler/config/config.yaml):

 ```
@@ -87,6 +89,32 @@ The following list includes all the AWS checks with configurable variables that
 | `opensearch_service_domains_not_publicly_accessible`          | `trusted_ips`                                    | List of Strings |


+### Validating Discovered Secrets
+
+<VersionBadge version="5.32.0" />
+
+By default, the secret-scanning checks run fully offline: secrets are detected but never sent anywhere. Setting `secrets_validate` to `True` additionally confirms whether each discovered secret is live by authenticating with it against the corresponding provider API. The discovered secret itself serves as the credential, so Prowler requires no additional permissions to validate it.
+
+`secrets_validate` applies to every AWS secret-scanning check listed above (those that accept `secrets_ignore_patterns`). The `--scan-secrets-validate` CLI flag is provider-wide: it also enables validation for the secret-scanning checks of other providers, such as the OpenStack metadata checks.
+
+To enable validation through the configuration file, set the value under the `aws` section:
+
+```yaml
+aws:
+  secrets_validate: True
+```
+
+To enable validation for a single scan (any provider), pass the flag to the Prowler CLI:
+
+```console
+prowler aws --scan-secrets-validate
+```
+
+<Warning>
+Secret validation makes outbound network calls that authenticate with each discovered secret. The credential is exercised against the provider, so the call appears in the audited account's logs and can trigger its monitoring (for example, AWS CloudTrail records the validation request). Validation stays disabled by default so that scans remain fully offline.
+</Warning>
+
+
 ## Azure

 ### Configurable Checks
@@ -6,20 +6,33 @@ Prowler has some checks that analyse pentesting risks (Secrets, Internet Exposed

 ## Detect Secrets

-Prowler uses `detect-secrets` library to search for any secrets that are stores in plaintext within your environment.
+Prowler scans for secrets stored in plaintext within the audited environment using [Kingfisher](https://github.com/mongodb/kingfisher), an open-source secret-scanning engine. By default these scans run fully offline, so no data leaves the audited environment. Discovered secrets can optionally be validated against the provider APIs to confirm whether they are live — see [Validating Discovered Secrets](/user-guide/cli/tutorials/configuration_file#validating-discovered-secrets).

-The actual checks that have this functionality are the following:
+The checks with this functionality are the following.
+
+AWS:

 - autoscaling\_find\_secrets\_ec2\_launch\_configuration
 - awslambda\_function\_no\_secrets\_in\_code
 - awslambda\_function\_no\_secrets\_in\_variables
 - cloudformation\_stack\_outputs\_find\_secrets
+- cloudwatch\_log\_group\_no\_secrets\_in\_logs
+- codebuild\_project\_no\_secrets\_in\_variables
 - ec2\_instance\_secrets\_user\_data
 - ec2\_launch\_template\_no\_secrets
 - ecs\_task\_definitions\_no\_environment\_secrets
+- glue\_etl\_jobs\_no\_secrets\_in\_arguments
 - ssm\_document\_secrets
+- stepfunctions\_statemachine\_no\_secrets\_in\_definition

-To execute detect-secrets related checks, you can run the following command:
+OpenStack:
+
+- compute\_instance\_metadata\_sensitive\_data
+- blockstorage\_volume\_metadata\_sensitive\_data
+- blockstorage\_snapshot\_metadata\_sensitive\_data
+- objectstorage\_container\_metadata\_sensitive\_data
+
+To execute the secret-scanning checks, run the following command:

 ```console
 prowler <provider> --categories secrets