Compare commits

...

125 Commits

Author SHA1 Message Date
StylusFrost a88a05d3c1 Merge remote-tracking branch 'origin/PROWLER-1777-dynamic-compliance-framework-discovery' into PROWLER-1778-dynamic-output-report-generation 2026-06-07 14:50:16 +02:00
StylusFrost 382f419fdb Merge remote-tracking branch 'origin/PROWLER-1775-dynamic-credential-validation' into PROWLER-1777-dynamic-compliance-framework-discovery 2026-06-07 14:50:05 +02:00
StylusFrost 0ffb0f6f45 Merge remote-tracking branch 'origin/PROWLER-1774-dynamic-provider-kwargs-connection' into PROWLER-1775-dynamic-credential-validation 2026-06-07 14:50:02 +02:00
StylusFrost 2b1c9c3381 Merge remote-tracking branch 'origin/PROWLER-1773-dynamic-provider-resolution' into PROWLER-1774-dynamic-provider-kwargs-connection 2026-06-07 14:49:59 +02:00
StylusFrost 983c2141ff Merge remote-tracking branch 'origin/PROWLER-1772-provider-type-storage-varchar' into PROWLER-1773-dynamic-provider-resolution 2026-06-07 14:49:56 +02:00
StylusFrost cdac1ce915 Merge remote-tracking branch 'origin/PROWLER-1771-public-dynamic-provider-class-resolver' into PROWLER-1772-provider-type-storage-varchar 2026-06-07 14:49:53 +02:00
StylusFrost 789dbfb620 Merge remote-tracking branch 'origin/PROWLER-1444-multi-provider-compliance-entry-points' into PROWLER-1771-public-dynamic-provider-class-resolver 2026-06-07 14:49:31 +02:00
StylusFrost d7346a6e63 docs(changelog): add entry for external universal compliance via entry points 2026-06-07 14:46:05 +02:00
StylusFrost 9e40f4aa52 Merge remote-tracking branch 'origin/PROWLER-1777-dynamic-compliance-framework-discovery' into PROWLER-1778-dynamic-output-report-generation 2026-06-07 14:27:52 +02:00
StylusFrost d7f8b5b51e Merge remote-tracking branch 'origin/PROWLER-1775-dynamic-credential-validation' into PROWLER-1777-dynamic-compliance-framework-discovery 2026-06-07 14:27:37 +02:00
StylusFrost 8efff5ccf8 Merge remote-tracking branch 'origin/PROWLER-1774-dynamic-provider-kwargs-connection' into PROWLER-1775-dynamic-credential-validation 2026-06-07 14:27:34 +02:00
StylusFrost 900a668ddc Merge remote-tracking branch 'origin/PROWLER-1773-dynamic-provider-resolution' into PROWLER-1774-dynamic-provider-kwargs-connection 2026-06-07 14:27:13 +02:00
StylusFrost f34daf1e69 Merge remote-tracking branch 'origin/PROWLER-1772-provider-type-storage-varchar' into PROWLER-1773-dynamic-provider-resolution 2026-06-07 14:25:41 +02:00
StylusFrost 3c72e9d25e Merge remote-tracking branch 'origin/PROWLER-1771-public-dynamic-provider-class-resolver' into PROWLER-1772-provider-type-storage-varchar 2026-06-07 14:22:41 +02:00
StylusFrost 5392a87a30 Merge remote-tracking branch 'origin/PROWLER-1444-multi-provider-compliance-entry-points' into PROWLER-1771-public-dynamic-provider-class-resolver
# Conflicts:
#	prowler/CHANGELOG.md
2026-06-07 14:04:36 +02:00
StylusFrost 40da359804 feat(compliance): discover external universal frameworks via entry points
External plug-ins ship multi-provider (universal-schema) frameworks through a
dedicated prowler.compliance.universal entry point group, separate from the
per-provider prowler.compliance group. Both get_bulk_compliance_frameworks_universal
(loading) and get_available_compliance_frameworks (listing / --compliance
choices) scan the new group. Built-ins load first and win on a name collision;
multiple packages under the same provider are merged. load_compliance_framework
gains fatal=False so the legacy external path skips a non-legacy JSON with a
warning instead of aborting the run.
2026-06-07 13:56:59 +02:00
StylusFrost 8a0d56786d fix(changelog): resolve leftover merge conflict marker
Remove a stray '=======' conflict marker left in prowler/CHANGELOG.md after
the master merge, and move the elbv2_alb_drop_invalid_header_fields_enabled
entry from Fixed to Added where it belongs.
2026-06-05 15:01:11 +02:00
StylusFrost f729c5a9f0 Merge branch 'master' into PROWLER-1391-provider-contract-dynamic-discovery 2026-06-05 14:44:50 +02:00
StylusFrost f9682c1354 fix(compliance): make GenericCompliance tolerant of provider-specific schemas
GenericCompliance is the documented last-resort renderer, but it read the
universal attribute fields (Section, SubSection, SubGroup, Service, Type,
Comment) directly and raised AttributeError on frameworks whose schema does
not declare them (CIS, ENS, ISO27001), dropping the whole compliance CSV.
Read all six fields with getattr defaulting to None, and dedupe the finding
and manual rows into a single helper.
2026-06-05 14:38:42 +02:00
potato-20 6f172a5c19 feat(elbv2): add elbv2_alb_drop_invalid_header_fields_enabled check (FSBP ELB.4) (#11471)
Co-authored-by: Hugo P.Brito <hugopbrit@gmail.com>
2026-06-05 14:26:07 +02:00
StylusFrost 29825f9a2f fix(provider): move get_output_options default to call site
Resolve the provider<->models import cycle CodeQL flagged (py/cyclic-import
#7267, #7268). provider.py no longer imports models: get_output_options
stays override-only and __main__ falls back to a new default_output_options
helper in models.py when a provider does not implement it. models.py keeps
its original module-level Provider import (one-way, no cycle).
2026-06-05 14:21:49 +02:00
StylusFrost 356e6e2bb4 fix(provider): default get_summary_entity instead of raising
External providers that do not override get_summary_entity no longer cause
the summary table to be silently dropped. The base contract returns
(self.type, account_id), mirroring the get_output_options default.
2026-06-05 14:05:56 +02:00
StylusFrost 8bc8b16a77 fix(provider): avoid import cycle in get_output_options default
Move the models.py Provider import (used only in the shodan path) to a
local import so models no longer depends on provider at module level. This
breaks the provider <-> models import cycle CodeQL flagged after the
generic OutputOptions default was added, and lets provider.py import
ProviderOutputOptions without a cycle. The output_file_timestamp import is
consolidated into the existing top-level config import.
2026-06-05 13:57:36 +02:00
StylusFrost efa3283a25 fix(provider): return generic OutputOptions default instead of raising
External providers that do not override get_output_options no longer abort
the run with NotImplementedError. The base contract returns a generic
ProviderOutputOptions, honoring arguments.output_filename and otherwise
falling back to a provider-typed filename. Built-ins are unaffected.
2026-06-05 13:42:29 +02:00
Pedro Martín a7d180ea5b feat(dashboard): add AWS AI Security Framework compliance view (#11475) 2026-06-05 13:28:31 +02:00
Pedro Martín d4bbc8b5ad fix(jira): avoid 400 INVALID_INPUT on findings with empty field (#11474) 2026-06-05 13:26:28 +02:00
Aline Almeida a5bc226f11 fix(gcp): pass iam_service_account_unused for disabled service accounts (#11467) 2026-06-05 12:07:30 +02:00
Pablo Fernandez Guerra (PFE) 3a3d9d6146 chore(ui): type process.env via ambient NodeJS.ProcessEnv (#11328)
Co-authored-by: Pablo F.G <pablo.fernandez@prowler.com>
2026-06-05 08:31:16 +02:00
Oleksandr_Sanin bcd282d3d0 fix(gcp): honour org-level aggregated sinks in logging_sink_created check (#11355)
Signed-off-by: Oleksandr Sanin <alexaaander.sanin@gmail.com>
Co-authored-by: Hugo P.Brito <hugopbrit@gmail.com>
2026-06-04 12:07:01 +02:00
Pedro Martín eb7949c884 fix(ui): show delete user action only for the current user (#11447)
Co-authored-by: Pepe Fagoaga <pepe@prowler.com>
2026-06-03 17:03:12 +02:00
Alejandro Bailo e60a4462e5 fix(ui): refine add-provider wizard flow between scans and providers (#11424) 2026-06-03 16:08:06 +02:00
StylusFrost 5452acd610 Merge remote-tracking branch 'origin/PROWLER-1777-dynamic-compliance-framework-discovery' into PROWLER-1778-dynamic-output-report-generation 2026-06-03 13:14:12 +02:00
StylusFrost 8fb6e85887 Merge remote-tracking branch 'origin/PROWLER-1775-dynamic-credential-validation' into PROWLER-1777-dynamic-compliance-framework-discovery
# Conflicts:
#	api/src/backend/api/tests/test_compliance.py
2026-06-03 13:14:00 +02:00
StylusFrost 4775f11dbf Merge remote-tracking branch 'origin/PROWLER-1774-dynamic-provider-kwargs-connection' into PROWLER-1775-dynamic-credential-validation 2026-06-03 13:12:28 +02:00
StylusFrost a471c82a7e Merge remote-tracking branch 'origin/PROWLER-1773-dynamic-provider-resolution' into PROWLER-1774-dynamic-provider-kwargs-connection 2026-06-03 13:12:13 +02:00
StylusFrost 849c399c93 Merge remote-tracking branch 'origin/PROWLER-1772-provider-type-storage-varchar' into PROWLER-1773-dynamic-provider-resolution 2026-06-03 13:12:02 +02:00
StylusFrost e25758ba8e Merge remote-tracking branch 'origin/PROWLER-1771-public-dynamic-provider-class-resolver' into PROWLER-1772-provider-type-storage-varchar
# Conflicts:
#	api/CHANGELOG.md
2026-06-03 13:11:38 +02:00
StylusFrost c4effd7a60 Merge remote-tracking branch 'origin/PROWLER-1391-provider-contract-dynamic-discovery' into PROWLER-1771-public-dynamic-provider-class-resolver 2026-06-03 13:07:55 +02:00
StylusFrost 38788b7922 Merge remote-tracking branch 'origin/master' into PROWLER-1391-provider-contract-dynamic-discovery
# Conflicts:
#	prowler/CHANGELOG.md
2026-06-03 12:15:24 +02:00
Pedro Martín f7f8747512 feat(compliance): add DORA framework for AWS (#11131) 2026-06-03 11:43:55 +02:00
RishiWig3 d573af911d feat(aws): add sagemaker_models_monitor_enabled check (#11278)
Co-authored-by: RishiWig3 <rishi.wig@gmail.com>
Co-authored-by: Hugo P.Brito <hugopbrit@gmail.com>
Co-authored-by: Hugo Pereira Brito <101209179+HugoPBrito@users.noreply.github.com>
2026-06-02 16:10:13 +01:00
Adrián Peña cf9beb8234 feat(api): recover orphaned background tasks and make task re-runs idempotent (#11416) 2026-06-02 14:00:17 +02:00
Davidm4r 7f67eac1bf perf(api): avoid N+1 query loading finding resource tags (#11420)
Co-authored-by: Pepe Fagoaga <pepe@prowler.com>
2026-06-02 13:19:21 +02:00
Pedro Martín a652e28b4a fix(api): clean up scan tmp output failure to avoid disk fill (#11421)
Co-authored-by: Pepe Fagoaga <pepe@prowler.com>
2026-06-02 11:37:05 +02:00
Hugo Pereira Brito 1b17304c4a docs(installation): add PowerShell commands for Prowler App install (#11413) 2026-06-02 09:17:40 +01:00
StylusFrost 9a574b32e4 Merge remote-tracking branch 'origin/PROWLER-1777-dynamic-compliance-framework-discovery' into PROWLER-1778-dynamic-output-report-generation 2026-06-01 22:17:59 +02:00
StylusFrost d12d920119 Merge remote-tracking branch 'origin/PROWLER-1775-dynamic-credential-validation' into PROWLER-1777-dynamic-compliance-framework-discovery 2026-06-01 22:17:36 +02:00
StylusFrost 5daa39c4cb feat: bind external provider secret validation to its secret type
- get_credentials_schema returns schemas keyed by secret type
- validate an external provider's secret against the schema for its
  secret_type instead of accepting any declared schema
- reject a secret_type the provider does not declare
2026-06-01 22:09:21 +02:00
StylusFrost 116fb7083d fix(api): reject non-object external provider secret
- Validate the secret is a JSON object before the no-schema path accepts it,
  so a list/string/null cannot be persisted and fail later at {**secret}
- Add parametrized coverage for list/string/null/int payloads
2026-06-01 21:35:34 +02:00
StylusFrost 77b2ffeb54 Merge remote-tracking branch 'origin/PROWLER-1774-dynamic-provider-kwargs-connection' into PROWLER-1775-dynamic-credential-validation 2026-06-01 21:35:10 +02:00
StylusFrost 21e63ebc7e Merge remote-tracking branch 'origin/PROWLER-1773-dynamic-provider-resolution' into PROWLER-1774-dynamic-provider-kwargs-connection
# Conflicts:
#	prowler/providers/common/provider.py
2026-06-01 21:30:39 +02:00
StylusFrost bcc697f42a Merge remote-tracking branch 'origin/PROWLER-1772-provider-type-storage-varchar' into PROWLER-1773-dynamic-provider-resolution 2026-06-01 21:29:02 +02:00
StylusFrost c94456c131 perf(api): share cached provider-type choices across filters and serializer
- Move get_provider_type_choices into a leaf module so the provider
  serializer field reuses the filters' cached list instead of recomputing
  SDK provider discovery on every request
2026-06-01 21:18:39 +02:00
StylusFrost 28433362c5 fix(api): make provider enum-to-varchar migration deploy-safe
- Backfill provider_str synchronously in 0095 (single UPDATE) so the column
  is populated before 0096 sets it NOT NULL, removing the Celery race
- Recreate the unique index inside 0096's transaction with a lock_timeout,
  closing the duplicate-provider window left by the concurrent rebuild
- Drop migration 0097 and the now-unused backfill_provider_str task
2026-06-01 21:18:31 +02:00
StylusFrost 9c7b33157f Merge remote-tracking branch 'origin/PROWLER-1771-public-dynamic-provider-class-resolver' into PROWLER-1772-provider-type-storage-varchar 2026-06-01 20:45:01 +02:00
StylusFrost a111ae763c chore(sdk): tighten get_class docstring and unknown-provider test
- Reword get_class docstring: it may populate the _ep_providers cache,
  rather than claiming "no global state"
- Assert ImportError specifically in the unknown-provider test to enforce
  the public API contract
- Drop the unused provider_class_name left over after get_class delegation
2026-06-01 20:28:19 +02:00
StylusFrost ece6af5dd3 Merge remote-tracking branch 'origin/PROWLER-1391-provider-contract-dynamic-discovery' into PROWLER-1771-public-dynamic-provider-class-resolver
# Conflicts:
#	prowler/providers/common/provider.py
2026-06-01 20:20:50 +02:00
StylusFrost b7b5565aeb Merge remote-tracking branch 'origin/master' into PROWLER-1391-provider-contract-dynamic-discovery 2026-06-01 20:09:48 +02:00
StylusFrost 9c7afd64c5 fix(sdk): match compliance provider segment exactly in get_bulk
- Compare the module's last dotted segment instead of substring
- Prevent a provider name from capturing overlapping built-ins
- Add parametrized regression test (cloud, git, work, open cases)
- Update get_bulk test mock to the real dotted module name
2026-06-01 19:51:27 +02:00
StylusFrost 64e82682bd fix(sdk): detect shadowed provider plug-ins without loading them
- Match shadowing entry point by name instead of calling ep.load()
- Prevent plug-in code from executing during a built-in run
- Update regression test to assert ep.load is never called
2026-06-01 19:44:17 +02:00
StylusFrost 5070ce39c2 fix(sdk): guard built-in providers in is_tool_wrapper_provider
- Short-circuit on is_builtin_provider before loading entry points
- Prevent same-name plug-ins from flipping a built-in onto the tool-wrapper path
- Avoid executing plug-in code via ep.load() for built-in names
- Add regression test asserting ep.load is never called
2026-06-01 19:43:36 +02:00
Prowler Bot c2cef99b33 chore(release): Bump versions to v5.30.0 (#11418)
Co-authored-by: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
2026-06-01 18:37:51 +02:00
StylusFrost 9fc5a93f54 docs(api): add changelog entry for dynamic report availability 2026-06-01 15:07:04 +02:00
StylusFrost 765a6cec28 feat(api): derive report availability from provider frameworks
- Gate threatscore/ens/nis2/csa reports on the provider's available
  compliance frameworks instead of hard-coded provider whitelists
- Reference the ASFF/SecurityHub gate via the provider enum symbol
- Cover external-provider report availability and generic compliance
  fallback with tests
2026-06-01 15:05:46 +02:00
StylusFrost 82cd317390 docs(api): add changelog entry for dynamic compliance discovery 2026-06-01 13:48:42 +02:00
StylusFrost 58fb424ed7 feat(api): discover compliance frameworks from available providers
- Source compliance template, checks mapping, and overview filter
  validation from the SDK's available providers
- Replace static provider enum loops with get_available_providers()
- Cover dynamic external-provider compliance discovery with tests
2026-06-01 13:44:56 +02:00
StylusFrost b8d3312577 feat: validate external provider secrets via SDK credential schema
- Add get_credentials_schema to the provider contract
- Validate non-built-in secrets against the declared schema; built-ins unchanged
- Accept secrets as-is when no schema is declared (validated at connection)
2026-06-01 01:08:41 +02:00
StylusFrost 51581c35ec feat: add SDK contract for dynamic provider construction args
- Add get_scan_arguments/get_connection_arguments to the provider contract
- Route non-built-in providers through the contract; built-ins unchanged
- Cover the external dynamic path in tests
2026-06-01 00:19:17 +02:00
StylusFrost cd15ed07eb feat(api): resolve provider class dynamically via the SDK resolver
- Replace the hardcoded provider match with Provider.get_class
- Drop the closed provider-type union and TYPE_CHECKING imports
- Cover built-in and external entry-point resolution in tests
2026-05-31 23:41:10 +02:00
StylusFrost 30f8244ec1 docs(api): add changelog entry for provider varchar migration 2026-05-31 23:33:23 +02:00
StylusFrost 37323e691a feat(api): drive provider-type filters from SDK-available providers
- Replace the static provider enum in filters with the SDK provider list
- Cache the provider-type choices for hot list endpoints
- Drop the dead provider enum filter override
2026-05-31 23:33:23 +02:00
StylusFrost f14778438e feat(api): validate provider against SDK-available providers
- Drive provider validity from the SDK instead of a static enum
- Tolerate providers without a uid validator at model clean()
- Accept any SDK-exposed provider in the provider serializer field
2026-05-31 23:33:23 +02:00
StylusFrost 64fdea2954 feat(api): store provider as varchar and drop the enum type
- Promote the synced shadow column into provider, dropping the enum
- Rebuild the partial unique index concurrently after the swap
- Keep provider input validation at the serializer layer
2026-05-31 23:33:23 +02:00
StylusFrost 7dc0895581 feat(api): backfill provider_str shadow column per-tenant
- Add batched per-tenant backfill job for the provider_str column
- Register backfill task on the backfill queue
- Dispatch the backfill from a data-only migration
2026-05-31 23:33:23 +02:00
StylusFrost 383e9c6bd8 feat(api): add provider_str shadow column synced by trigger
- Add nullable provider_str CharField mirroring the provider enum column
- DB trigger keeps provider_str in sync on INSERT and UPDATE
- First step of the zero-downtime migration of the provider enum to varchar
2026-05-31 23:33:23 +02:00
StylusFrost 459f986abe Merge branch 'PROWLER-1391-provider-contract-dynamic-discovery' into PROWLER-1771-public-dynamic-provider-class-resolver
# Conflicts:
#	prowler/CHANGELOG.md
2026-05-31 20:18:56 +02:00
StylusFrost 468234577c docs(sdk): move #10700 changelog entries to 5.30.0 unreleased 2026-05-31 20:17:50 +02:00
StylusFrost fe821a41ea Merge branch 'PROWLER-1391-provider-contract-dynamic-discovery' into PROWLER-1771-public-dynamic-provider-class-resolver
# Conflicts:
#	prowler/CHANGELOG.md
2026-05-31 20:05:35 +02:00
StylusFrost c1e131766d fix(sdk): sync CLI parser provider list with available built-ins
- Add okta, scaleway, stackit to known_providers so they no longer
  appear as dynamically-discovered "extra" providers
- Fix missing comma in usage string (stackitvercel -> stackit,vercel)
2026-05-31 19:53:32 +02:00
StylusFrost e1ade761b5 Merge branch 'master' into PROWLER-1391-provider-contract-dynamic-discovery 2026-05-31 19:30:23 +02:00
StylusFrost 64907898f7 Merge branch 'PROWLER-1391-provider-contract-dynamic-discovery' into PROWLER-1771-public-dynamic-provider-class-resolver
# Conflicts:
#	prowler/CHANGELOG.md
2026-05-31 19:10:18 +02:00
StylusFrost a6ae4903b8 docs(sdk): move #10700 changelog entries to 5.29.0 unreleased
- Move "external/custom providers" (Added) and namespaced-config unwrap
  (Fixed) out of the released 5.26.0/5.25.0 blocks
- Remove the duplicated Fixed section left in 5.25.0
2026-05-31 19:09:19 +02:00
StylusFrost dde265731c docs(sdk): add changelog entry for Provider.get_class 2026-05-31 18:57:19 +02:00
StylusFrost 073dbb74f6 feat(sdk): add Provider.get_class dynamic provider resolver
- Add public get_class() resolving built-in and entry-point providers
- Refactor init_global_provider to use it; collision warning stays there
- Refactor get_providers_help_text to use it
2026-05-31 18:52:03 +02:00
StylusFrost 03cacb83d1 fix(sdk): gate external mutelist delegate to non-builtin providers
- Azure/Cloudflare set per-finding mutelist args inside the loop;
  the external else-branch was overwriting them
- Use is_builtin_provider to scope get_mutelist_finding_args to
  truly external providers
2026-05-27 17:59:24 +02:00
StylusFrost b25a8e5b6e ci(sdk): switch external provider tests from poetry to uv
- Replace poetry.lock filter with uv.lock
- Run pytest via uv to match the migrated workflow
2026-05-27 17:33:46 +02:00
StylusFrost b3c0f78801 style(sdk): remove trailing whitespace on blank lines
- prowler/lib/check/check.py
- prowler/providers/common/provider.py
2026-05-27 17:20:27 +02:00
StylusFrost ca72922dca Merge branch 'master' into PROWLER-1391-provider-contract-dynamic-discovery 2026-05-27 17:12:54 +02:00
StylusFrost b13baa9076 refactor(sdk): scope ImageBaseException catch to image provider in __main__
After this PR, the tool-wrapper else branch in __main__.py covers iac,
image, and any external tool-wrapper provider registered via entry
points (anything Provider.is_tool_wrapper_provider returns True for and
that is not llm). The existing `except ImageBaseException` was specific
to the Image provider but lived in a branch that was no longer
Image-only — its name no longer matched its scope, misleading plug-in
authors reading the file.

Move the catch inside an explicit `if provider == "image":` branch so
the name matches what it catches. iac and external tool-wrapper
providers fall through to the outer `except Exception` backstop for
unexpected failures, which was already the effective behavior in
practice (they do not raise ImageBaseException).

No behavior change — pure refactor for legibility, flagged by reviewer
on the PR #10700 review of 2026-05-06.
2026-05-07 10:11:37 +02:00
StylusFrost 4fb14bbb21 perf(sdk): cache misses in Provider._load_ep_provider
Repeated lookups for unknown provider names (built-ins, typos, names
with no registered entry point) re-iterated entry_points() on every
call because only hits were cached. importlib.metadata.entry_points()
walks the metadata of every installed Python package, so the cost is
proportional to the size of the venv, not just to Prowler.

Caches None on miss so subsequent lookups hit the existing
`if name in Provider._ep_providers` short-circuit and return
immediately. Aligns Provider._ep_providers with the symmetric cache
in tool_wrapper._ep_class_cache, which already had this behavior.

Includes a regression test that mocks importlib.metadata.entry_points,
calls _load_ep_provider("nonexistent") twice, and asserts entry_points
is invoked exactly once.

Also underscore-prefix the remaining unused parameters on the abstract
Provider stubs (get_output_options, get_stdout_detail,
generate_compliance_output, display_compliance_table) so vulture stops
flagging them now that the file is in the diff. Same pattern applied
in bbe3a7dbf for the previous batch.
2026-05-07 10:00:07 +02:00
StylusFrost e5b9fee942 fix(sdk): dedupe entry-point compliance frameworks against built-ins
The entry-point loop in get_available_compliance_frameworks appended
framework names without checking whether the same name was already
provided by a built-in. The loader (Compliance.get_bulk) already
respects "built-in wins" via an explicit guard, but the listing
function did not, causing --list-compliance and
--list-compliance-requirements to print the same name twice on
collision with a built-in.

Adds the same dedup guard already used by the universal frameworks
loop a few lines above, restoring symmetry across the three
compliance discovery layers.

Includes a regression test that registers a fake entry point with
a name colliding with a known built-in (cis_2.0_aws) and asserts
the framework appears exactly once in the resulting list.
2026-05-07 09:32:48 +02:00
StylusFrost 020388824e fix(sdk): silence CodeQL py/not-named-self on CheckMetadata validators
Stack @classmethod under @validator for the five validators that take
cls as the first parameter. Pydantic v1 was already treating them as
classmethods via signature introspection, so runtime behavior is
unchanged. The explicit decorator resolves CodeQL alerts #6240-#6244.

Adds the existing # noqa: F841 marker (already used for cls in the
older validators of this file) so vulture does not flag cls as unused.

Affected validators in prowler/lib/check/models.py:
- validate_check_title
- validate_related_url
- validate_recommendation_url
- validate_description
- validate_risk
2026-05-06 18:26:22 +02:00
StylusFrost cf99e02ceb Merge branch 'master' into PROWLER-1391-provider-contract-dynamic-discovery 2026-05-05 08:57:11 +02:00
StylusFrost 9681901174 style(sdk): satisfy black and vulture in test_dynamic_provider_loading
Two unrelated lint blockers that CI surfaced once the file was in scope
of `black --check .` and `vulture` on the changed-file path:

- Reformat one if/and conditional that black wanted on a single line.
- Underscore-prefix unused parameter names on FakeProvider stub methods
  (from_cli_args, get_output_options, display_compliance_table,
  generate_compliance_output, FakePureContractProvider.from_cli_args)
  and the throwaway **kw in a MagicMock side_effect lambda. These are
  test fixtures whose method bodies don't reference those args; the
  signatures must match the abstract contract on Provider, so we keep
  positions/types and just rename to indicate intentional non-use.

No logic change.
2026-05-03 22:55:24 +02:00
StylusFrost bbe3a7dbf8 refactor(sdk): extract is_builtin_provider to leaf module to break import cycle
CodeQL flagged a cyclic import after `prowler/lib/check/utils.py` and
`prowler/lib/check/check.py` started importing `Provider` from
`prowler.providers.common.provider`. That module transitively imports
`prowler.config.config`, which imports back into `prowler.lib.check.*`
(`compliance_models`, `external_tool_providers`) — closing the cycle.

Apply the same pattern already used for `is_tool_wrapper_provider`:
extract the predicate to a leaf module, `prowler.providers.common.builtin`,
that depends only on `importlib.util`. `Provider.is_builtin` delegates to
the leaf, and call sites in `prowler.lib.check.*` now import directly
from the leaf — no more cycle.

Also underscore-prefix unused parameters on the abstract stubs in
`Provider` (get_finding_output_data, generate_compliance_output,
display_compliance_table) so vulture stops flagging them now that the
file is in the diff.
2026-05-03 22:46:00 +02:00
StylusFrost 0672c80563 fix(sdk): guard find_spec with is_builtin for external provider discovery
Calling importlib.util.find_spec on prowler.providers.{provider}.services
for an external provider propagates ModuleNotFoundError when the parent
package prowler.providers.{provider} does not exist, instead of returning
None. This caused recover_checks_from_provider, _resolve_check_module and
Scan.scan to fail with "No module named 'prowler.providers.{external}'"
even though the plug-in registered its checks via entry points correctly.

Gate the built-in branch on Provider.is_builtin (which already wraps the
find_spec in try/except) and reuse _resolve_check_module from Scan.scan
so external providers fall through to the entry-point lookup.
2026-05-03 22:31:31 +02:00
StylusFrost 92d7ea2170 Merge remote-tracking branch 'origin/master' into PROWLER-1391-provider-contract-dynamic-discovery
# Conflicts:
#	prowler/config/config.py
2026-05-03 19:41:25 +02:00
StylusFrost c7aa536896 fix(sdk): built-in wins on plug-in collision for providers and checks 2026-04-30 19:50:19 +02:00
StylusFrost e7f23bb13f fix(sdk): propagate provider argument from report to stdout_report 2026-04-30 14:06:17 +02:00
StylusFrost 82132a9341 fix(sdk): use find_spec to distinguish missing vs broken built-ins 2026-04-30 13:53:06 +02:00
StylusFrost 5e876579f8 fix(sdk): use is_tool_wrapper_provider for compliance framework gate 2026-04-30 10:12:24 +02:00
StylusFrost be49fd8c4e Merge branch 'master' into PROWLER-1391-provider-contract-dynamic-discovery 2026-04-28 14:48:21 +02:00
StylusFrost 15d8f1642e test(sdk): unit tests for tool_wrapper leaf module 2026-04-28 14:45:07 +02:00
StylusFrost 79f12f3617 refactor(sdk): extract is_tool_wrapper_provider to leaf module to break import cycle 2026-04-28 13:57:27 +02:00
StylusFrost 6715361246 fix(sdk): restore dynamic external providers help in CLI epilog 2026-04-28 12:47:50 +02:00
StylusFrost 45e946cd87 Merge branch 'master' into PROWLER-1391-provider-contract-dynamic-discovery 2026-04-28 12:42:18 +02:00
StylusFrost 7836905b82 fix(sdk): consult Provider.is_tool_wrapper_provider in check discovery 2026-04-28 12:20:42 +02:00
StylusFrost 52f6653ccf fix(sdk): use equality not substring in provider dispatch chain 2026-04-28 11:54:23 +02:00
StylusFrost a5de6608ae fix(sdk): restore llm in parser usage line to match epilog 2026-04-28 10:50:43 +02:00
StylusFrost 1cdce02397 fix(sdk): use startswith("-") to detect CLI flags so external provider names with hyphens are not misparsed 2026-04-24 20:59:07 +02:00
StylusFrost a31fe9b618 Merge branch 'master' into PROWLER-1391-provider-contract-dynamic-discovery
Conflict in prowler/config/config.py resolved by combining both branches:
- HEAD: external compliance discovery via entry points (PROWLER-1391)
- master: multi-provider framework JSONs scanned at top-level compliance/ (#10300)

Order: built-in per-provider -> built-in multi-provider -> external entry points.
Built-ins first so they win on name collisions against external registrations.

Supporting external plug-ins to register multi-provider frameworks is tracked
in PROWLER-1444.
2026-04-24 20:54:22 +02:00
StylusFrost 907166d88a fix(sdk): discriminate builtin vs external providers via find_spec for clearer import errors 2026-04-24 20:33:38 +02:00
StylusFrost 0883baad78 fix(sdk): external providers with --service and external checks for new services 2026-04-24 20:18:20 +02:00
StylusFrost cf70d1f9f8 fix(sdk): honor from_cli_args return value in init_global_provider fallback 2026-04-24 18:51:57 +02:00
StylusFrost 60e7657081 feat(sdk): wire is_external_tool_provider property to execution and metadata validators 2026-04-24 18:23:42 +02:00
StylusFrost e8487d0686 fix(sdk): unwrap namespaced config for all built-in and external providers
load_and_validate_config_file only detected the namespaced format for 5
hardcoded providers (aws, gcp, azure, kubernetes, m365). For every other
built-in (github, nhn, vercel, cloudflare, iac, llm, image, mongodbatlas,
oraclecloud, openstack, alibabacloud, googleworkspace) and for any
external plug-in, the full YAML was returned wrapped instead of the
provider's own block.

Replace the hardcoded list with a dynamic check: if the file has a
top-level key matching the provider and its value is a dict, unwrap it.
Keep the legacy flat format for AWS only (historical, pre-multicloud)
and identify it by the absence of nested-dict top-level values, which
prevents cross-provider config leakage when a namespaced file has no
section for the requested provider.
2026-04-24 18:01:47 +02:00
StylusFrost 9c056beed1 Merge branch 'master' into PROWLER-1391-provider-contract-dynamic-discovery 2026-04-22 09:39:10 +02:00
StylusFrost f60f7c61c7 feat(provider): add display_compliance_table method for provider-specific compliance rendering 2026-04-21 19:40:38 +02:00
StylusFrost 3deb1359a5 Merge branch 'master' into PROWLER-1391-provider-contract-dynamic-discovery 2026-04-21 18:52:13 +02:00
StylusFrost e2295bd086 feat(provider): implement get_mutelist_finding_args for external providers and add tests 2026-04-21 18:50:18 +02:00
StylusFrost e27317437d feat(external-provider): add dynamic loading tests and coverage for external provider 2026-04-21 14:37:50 +02:00
StylusFrost 6f6016d822 chore: update CHANGELOG for Prowler v5.25.0 with new features 2026-04-21 14:22:24 +02:00
StylusFrost 5f10e1c1b6 Merge branch 'master' into PROWLER-1391-provider-contract-dynamic-discovery 2026-04-21 14:19:52 +02:00
StylusFrost 484211b465 fix(sdk): align exception handlers to SDK convention and improve test coverage 2026-04-21 14:14:17 +02:00
StylusFrost f8333baf24 feat: Enhance dynamic provider loading and compliance framework discovery
- Implemented dynamic loading of external providers via entry points, allowing for greater flexibility in provider integration.
  - Added functionality to discover compliance directories from entry points, enabling external compliance frameworks to be loaded seamlessly.
  - Refactored check module resolution to prioritize built-in checks while falling back to entry points if necessary.
  - Improved compliance framework loading to include both built-in and external sources, ensuring comprehensive compliance coverage.
  - Enhanced CLI argument parsing to support external providers, improving user experience and configurability.
  - Introduced extensive unit tests to validate dynamic loading, compliance discovery, and overall integration of external providers.
2026-04-15 13:22:57 +02:00
168 changed files with 11906 additions and 39491 deletions
+1 -1
View File
@@ -145,7 +145,7 @@ SENTRY_RELEASE=local
NEXT_PUBLIC_SENTRY_ENVIRONMENT=${SENTRY_ENVIRONMENT}
#### Prowler release version ####
NEXT_PUBLIC_PROWLER_RELEASE_VERSION=v5.29.0
NEXT_PUBLIC_PROWLER_RELEASE_VERSION=v5.30.0
# Social login credentials
SOCIAL_GOOGLE_OAUTH_CALLBACK_URL="${AUTH_URL}/api/auth/callback/google"
+28 -1
View File
@@ -540,7 +540,7 @@ jobs:
with:
flags: prowler-py${{ matrix.python-version }}-vercel
files: ./vercel_coverage.xml
# Scaleway Provider
- name: Check if Scaleway files changed
if: steps.check-changes.outputs.any_changed == 'true'
@@ -588,7 +588,34 @@ jobs:
with:
flags: prowler-py${{ matrix.python-version }}-stackit
files: ./stackit_coverage.xml
# External Provider (dynamic loading)
- name: Check if External Provider files changed
if: steps.check-changes.outputs.any_changed == 'true'
id: changed-external
uses: tj-actions/changed-files@22103cc46bda19c2b464ffe86db46df6922fd323 # v47.0.5
with:
files: |
./prowler/providers/common/**
./prowler/config/**
./prowler/lib/**
./tests/providers/external/**
./uv.lock
- name: Run External Provider tests
if: steps.changed-external.outputs.any_changed == 'true'
run: uv run pytest -n auto --cov=./prowler/providers/common --cov=./prowler/config --cov=./prowler/lib --cov-report=xml:external_coverage.xml tests/providers/external
- name: Upload External Provider coverage to Codecov
if: steps.changed-external.outputs.any_changed == 'true'
uses: codecov/codecov-action@671740ac38dd9b0130fbe1cec585b89eea48d3de # v5.5.2
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
with:
flags: prowler-py${{ matrix.python-version }}-external
files: ./external_coverage.xml
# Lib
- name: Check if Lib files changed
if: steps.check-changes.outputs.any_changed == 'true'
+12
View File
@@ -153,6 +153,8 @@ Prowler App offers flexible installation methods tailored to various environment
#### Commands
_macOS/Linux:_
``` console
VERSION=$(curl -s https://api.github.com/repos/prowler-cloud/prowler/releases/latest | jq -r .tag_name)
curl -sLO "https://raw.githubusercontent.com/prowler-cloud/prowler/refs/tags/${VERSION}/docker-compose.yml"
@@ -161,6 +163,16 @@ curl -sLO "https://raw.githubusercontent.com/prowler-cloud/prowler/refs/tags/${V
docker compose up -d
```
_Windows PowerShell:_
``` powershell
$VERSION = (Invoke-RestMethod -Uri "https://api.github.com/repos/prowler-cloud/prowler/releases/latest").tag_name
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/prowler-cloud/prowler/refs/tags/$VERSION/docker-compose.yml" -OutFile "docker-compose.yml"
# Environment variables can be customized in the .env file. Using default values in production environments is not recommended.
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/prowler-cloud/prowler/refs/tags/$VERSION/.env" -OutFile ".env"
docker compose up -d
```
> [!WARNING]
> 🔒 For a secure setup, the API auto-generates a unique key pair, `DJANGO_TOKEN_SIGNING_KEY` and `DJANGO_TOKEN_VERIFYING_KEY`, and stores it in `~/.config/prowler-api` (non-container) or the bound Docker volume in `_data/api` (container). Never commit or reuse static/default keys. To rotate keys, delete the stored key files and restart the API.
+31
View File
@@ -2,6 +2,37 @@
All notable changes to the **Prowler API** are documented in this file.
## [1.31.0] (Prowler UNRELEASED)
### 🚀 Added
- Automatic recovery of allowlisted idempotent background tasks whose worker died during a deploy or crash: stuck scan and summary tasks are detected and re-run instead of staying pending forever, with a `reconcile_orphan_tasks` management command for on-demand recovery [(#11416)](https://github.com/prowler-cloud/prowler/pull/11416)
- Jira integration no longer creates duplicate issues on a retried send; findings already ticketed are skipped [(#11416)](https://github.com/prowler-cloud/prowler/pull/11416)
- DORA compliance framework support [(#11131)](https://github.com/prowler-cloud/prowler/pull/11131)
### 🔄 Changed
- Allowlisted idempotent background tasks are no longer lost when a worker is stopped or crashes mid-task; tasks with external side effects are marked terminal instead of blindly re-running [(#11416)](https://github.com/prowler-cloud/prowler/pull/11416)
- A recovered scan rewrites its findings, summaries, attack surface, and compliance data instead of appending to the previous run, so recovery never leaves stale or duplicate materialized rows [(#11416)](https://github.com/prowler-cloud/prowler/pull/11416)
- Provider type is validated against the SDK's available providers instead of a static enum, so the API accepts any installed provider (built-in or external); `Provider.provider` is stored as `varchar` and the native PostgreSQL enum is removed [(#11399)](https://github.com/prowler-cloud/prowler/pull/11399)
- Compliance framework discovery is sourced from the SDK's available providers instead of a static enum, so an externally-registered provider's compliance frameworks are discoverable without API changes [(#11410)](https://github.com/prowler-cloud/prowler/pull/11410)
- Compliance report availability (ThreatScore, ENS, NIS2, CSA) is derived from the provider's available compliance frameworks instead of hard-coded provider whitelists, so a provider that ships one of these frameworks gets the report without API changes [(#11412)](https://github.com/prowler-cloud/prowler/pull/11412)
### 🐞 Fixed
- Workers now shut down gracefully on deploy or restart, finishing or re-queueing in-flight tasks instead of being force-killed and leaving them stuck [(#11416)](https://github.com/prowler-cloud/prowler/pull/11416)
---
## [1.30.1] (Prowler v5.29.1)
### 🐞 Fixed
- `GET /api/v1/findings` N+1 query loading `resources__tags` when listing findings [(#11420)](https://github.com/prowler-cloud/prowler/pull/11420)
- Clean up the scan tmp output directory when `scan-report` fails so partial files do not accumulate and fill the worker disk (`No space left on device`) [(#11421)](https://github.com/prowler-cloud/prowler/pull/11421)
---
## [1.30.0] (Prowler v5.29.0)
### 🔄 Changed
+4 -4
View File
@@ -22,12 +22,12 @@ apply_fixtures() {
start_dev_server() {
echo "Starting the development server..."
uv run python manage.py runserver 0.0.0.0:"${DJANGO_PORT:-8080}"
exec uv run python manage.py runserver 0.0.0.0:"${DJANGO_PORT:-8080}"
}
start_prod_server() {
echo "Starting the Gunicorn server..."
uv run gunicorn -c config/guniconf.py config.wsgi:application
exec uv run gunicorn -c config/guniconf.py config.wsgi:application
}
resolve_worker_hostname() {
@@ -47,7 +47,7 @@ resolve_worker_hostname() {
start_worker() {
echo "Starting the worker..."
uv run python -m celery -A config.celery worker \
exec uv run python -m celery -A config.celery worker \
-n "$(resolve_worker_hostname)" \
-l "${DJANGO_LOGGING_LEVEL:-info}" \
-Q celery,scans,scan-reports,deletion,backfill,overview,integrations,compliance,attack-paths-scans \
@@ -56,7 +56,7 @@ start_worker() {
start_worker_beat() {
echo "Starting the worker-beat..."
uv run python -m celery -A config.celery beat -l "${DJANGO_LOGGING_LEVEL:-info}" --scheduler django_celery_beat.schedulers:DatabaseScheduler
exec uv run python -m celery -A config.celery beat -l "${DJANGO_LOGGING_LEVEL:-info}" --scheduler django_celery_beat.schedulers:DatabaseScheduler
}
manage_db_partitions() {
+86
View File
@@ -0,0 +1,86 @@
# Orphan Celery task recovery
When a worker is terminated mid-task (a deploy, an OOM kill, a node eviction), the
task it was running can be left non-terminal forever: the `Scan` stays `EXECUTING`,
the `TaskResult` stays `STARTED`, and nothing re-runs it. This page describes the
mechanisms that detect and recover allowlisted idempotent orphans so users never
see a stuck scan and pending-task alerts do not fire.
## How recovery works
1. **Durable delivery.** The broker is configured so a task message is acknowledged
only after the task finishes (`task_acks_late`), one task is reserved at a time
(`worker_prefetch_multiplier = 1`), and an abruptly-lost worker re-queues its task
(`task_reject_on_worker_lost`). On `SIGTERM` the worker is given a soft-shutdown
window (`worker_soft_shutdown_timeout`) to finish or re-queue in-flight work
before it is force-killed.
2. **Periodic watchdog.** A Beat task, `reconcile-orphan-tasks`, runs every couple of
minutes (a `django_celery_beat` periodic task created by migration). For each
in-flight task result with an allowlisted idempotent task name, it pings the
worker recorded on the task's `TaskResult`:
- worker responds -> the task is still running, leave it alone;
- worker is gone (and the scan started before a short grace window) -> it is a
real orphan: the stale task is revoked and marked terminal (clearing the
pending/started alert), and the scan is re-enqueued from scratch.
The re-run is safe because only tasks with proven idempotency are allowlisted.
Scan persistence, for example, clears the scan's prior findings and materialized
summary/compliance rows before re-writing them. Jira sends are allowlisted too:
each finding is reserved in a dispatch table before the external call, so a re-run
skips already-ticketed findings (the worst case is one finding missed if a worker
is hard-killed mid-send, never a duplicate issue). Other external side effects stay
terminal: the S3 upload rebuilds from worker-local files that do not survive a
crash, and report/Security Hub recovery is out of scope.
3. **Recovery cap.** Each automatic re-enqueue increments `Scan.recovery_count`.
After `--max-attempts` recoveries (default 3) the scan is marked `FAILED` instead
of re-enqueued, so a task that repeatedly kills its worker cannot loop forever.
A Postgres advisory lock ensures that, even with multiple API/worker replicas, only
one reconciliation runs at a time; the others no-op.
## On-demand command
The same logic is available as a management command, useful right after a deploy or
for manual intervention:
```bash
python manage.py reconcile_orphan_tasks # recover now
python manage.py reconcile_orphan_tasks --dry-run # report orphans, change nothing
python manage.py reconcile_orphan_tasks --grace-minutes 5 --max-attempts 3
```
## Configuration
All settings have safe defaults; override via environment variables.
| Env var | Default | Purpose |
| --- | --- | --- |
| `DJANGO_CELERY_WORKER_PREFETCH_MULTIPLIER` | `1` | Tasks reserved per worker process. |
| `DJANGO_CELERY_WORKER_SOFT_SHUTDOWN_TIMEOUT` | `60` | Seconds the worker drains/re-queues on `SIGTERM` before force-kill. |
| `DJANGO_CELERY_TASK_TIME_LIMIT` | `21600` (6h) | Hard limit for most tasks; connection checks are capped at 120s. |
| `DJANGO_CELERY_TASK_SOFT_TIME_LIMIT` | hard - 600 | Soft limit; raises `SoftTimeLimitExceeded` for cleanup. |
| `DJANGO_CELERY_LONG_TASK_TIME_LIMIT` | `172800` (48h) | Hard limit for scans and provider/tenant deletions, which can legitimately run for more than a day. |
| `DJANGO_CELERY_LONG_TASK_SOFT_TIME_LIMIT` | long hard - 600 | Soft limit for the long-running tasks above. |
`task_acks_late` and `task_reject_on_worker_lost` are enabled in `config/celery.py`.
## Deployment requirement
Two conditions must both hold for the soft shutdown to actually drain work:
1. **The worker must receive `SIGTERM`.** The container entrypoint `exec`s the
Celery process so it runs as PID 1; otherwise `SIGTERM` from `docker stop`/ECS
hits the entrypoint shell, never reaches Celery, and the worker is hard-killed
(SIGKILL) at the grace deadline without draining. Custom entrypoints must
preserve the `exec`.
2. **The orchestrator must give the worker enough time** before force-killing it.
Set the stop grace period to exceed `DJANGO_CELERY_WORKER_SOFT_SHUTDOWN_TIMEOUT`
plus a margin:
- **docker-compose:** `stop_grace_period` on the worker services (set to `120s`).
- **AWS ECS:** the worker container `stopTimeout` (configured in the deployment
repository).
If either condition is missing, long tasks are still recovered by the watchdog,
but they are cut mid-run on every deploy instead of draining.
+1 -1
View File
@@ -68,7 +68,7 @@ name = "prowler-api"
package-mode = false
# Needed for the SDK compatibility
requires-python = ">=3.11,<3.13"
version = "1.30.0"
version = "1.31.0"
[tool.uv]
# Transitive pins matching master to avoid silent drift; bump deliberately.
+52 -40
View File
@@ -1,8 +1,11 @@
from collections.abc import Iterable, Mapping
from api.models import Provider
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.check.compliance_models import (
get_bulk_compliance_frameworks_universal,
)
from prowler.lib.check.models import CheckMetadata
from prowler.providers.common.provider import Provider as SDKProvider
AVAILABLE_COMPLIANCE_FRAMEWORKS = {}
@@ -12,7 +15,7 @@ class LazyComplianceTemplate(Mapping):
def __init__(self, provider_types: Iterable[str] | None = None) -> None:
if provider_types is None:
provider_types = Provider.ProviderChoices.values
provider_types = SDKProvider.get_available_providers()
self._provider_types = tuple(provider_types)
self._provider_types_set = set(self._provider_types)
self._cache: dict[str, dict] = {}
@@ -53,7 +56,7 @@ class LazyChecksMapping(Mapping):
def __init__(self, provider_types: Iterable[str] | None = None) -> None:
if provider_types is None:
provider_types = Provider.ProviderChoices.values
provider_types = SDKProvider.get_available_providers()
self._provider_types = tuple(provider_types)
self._provider_types_set = set(self._provider_types)
self._cache: dict[str, dict] = {}
@@ -94,25 +97,22 @@ PROWLER_CHECKS = LazyChecksMapping()
def get_compliance_frameworks(provider_type: Provider.ProviderChoices) -> list[str]:
"""List compliance frameworks the API can load for `provider_type`.
"""List compliance framework identifiers available for `provider_type`.
The list is sourced from `Compliance.get_bulk` so that the names
returned here are guaranteed to be loadable by the bulk loader. This
prevents downstream key mismatches (e.g. CSV report generation iterating
framework names and looking them up in the bulk dict).
Includes both per-provider frameworks and universal top-level frameworks
(e.g. ``dora``, ``csa_ccm_4.0``).
Args:
provider_type (Provider.ProviderChoices): The cloud provider type for which to retrieve
available compliance frameworks (e.g., "aws", "azure", "gcp", "m365").
provider_type (Provider.ProviderChoices): The cloud provider type
(e.g., "aws", "azure", "gcp", "m365").
Returns:
list[str]: A list of framework identifiers (e.g., "cis_1.4_aws", "mitre_attack_azure") available
for the given provider.
list[str]: Framework identifiers (e.g., "cis_1.4_aws", "dora").
"""
global AVAILABLE_COMPLIANCE_FRAMEWORKS
if provider_type not in AVAILABLE_COMPLIANCE_FRAMEWORKS:
AVAILABLE_COMPLIANCE_FRAMEWORKS[provider_type] = list(
Compliance.get_bulk(provider_type).keys()
get_bulk_compliance_frameworks_universal(provider_type).keys()
)
return AVAILABLE_COMPLIANCE_FRAMEWORKS[provider_type]
@@ -139,18 +139,14 @@ def get_prowler_provider_compliance(provider_type: Provider.ProviderChoices) ->
"""
Retrieve the Prowler compliance data for a specified provider type.
This function fetches the compliance frameworks and their associated
requirements for the given cloud provider.
Args:
provider_type (Provider.ProviderChoices): The provider type
(e.g., 'aws', 'azure') for which to retrieve compliance data.
Returns:
dict: A dictionary mapping compliance framework names to their respective
Compliance objects for the specified provider.
dict: Mapping of framework name to `ComplianceFramework` for the provider.
"""
return Compliance.get_bulk(provider_type)
return get_bulk_compliance_frameworks_universal(provider_type)
def _load_provider_assets(provider_type: Provider.ProviderChoices) -> tuple[dict, dict]:
@@ -201,7 +197,7 @@ def load_prowler_checks(
"""
checks = {}
if provider_types is None:
provider_types = Provider.ProviderChoices.values
provider_types = SDKProvider.get_available_providers()
for provider_type in provider_types:
checks[provider_type] = {
check_id: set() for check_id in get_prowler_provider_checks(provider_type)
@@ -209,8 +205,8 @@ def load_prowler_checks(
for compliance_name, compliance_data in prowler_compliance.get(
provider_type, {}
).items():
for requirement in compliance_data.Requirements:
for check in requirement.Checks:
for requirement in compliance_data.requirements:
for check in requirement.checks.get(provider_type, []):
try:
checks[provider_type][check].add(compliance_name)
except KeyError:
@@ -280,7 +276,7 @@ def generate_compliance_overview_template(
"""
template = {}
if provider_types is None:
provider_types = Provider.ProviderChoices.values
provider_types = SDKProvider.get_available_providers()
for provider_type in provider_types:
provider_compliance = template.setdefault(provider_type, {})
compliance_data_dict = prowler_compliance.get(provider_type, {})
@@ -290,24 +286,40 @@ def generate_compliance_overview_template(
requirements_status = {"passed": 0, "failed": 0, "manual": 0}
total_requirements = 0
for requirement in compliance_data.Requirements:
for requirement in compliance_data.requirements:
total_requirements += 1
total_checks = len(requirement.Checks)
checks_dict = {check: None for check in requirement.Checks}
provider_check_list = list(requirement.checks.get(provider_type, []))
total_checks = len(provider_check_list)
checks_dict = {check: None for check in provider_check_list}
req_status_val = "MANUAL" if total_checks == 0 else "PASS"
# MITRE attrs are wrapped under `_raw_attributes` by the
# universal adapter — unwrap so consumers see the flat list.
requirement_attributes = requirement.attributes
if (
isinstance(requirement_attributes, dict)
and "_raw_attributes" in requirement_attributes
):
attributes_payload = list(requirement_attributes["_raw_attributes"])
elif isinstance(requirement_attributes, dict):
attributes_payload = (
[dict(requirement_attributes)] if requirement_attributes else []
)
else:
attributes_payload = [
dict(attribute) for attribute in requirement_attributes
]
# Build requirement dictionary
requirement_dict = {
"name": requirement.Name or requirement.Id,
"description": requirement.Description,
"tactics": getattr(requirement, "Tactics", []),
"subtechniques": getattr(requirement, "SubTechniques", []),
"platforms": getattr(requirement, "Platforms", []),
"technique_url": getattr(requirement, "TechniqueURL", ""),
"attributes": [
dict(attribute) for attribute in requirement.Attributes
],
"name": requirement.name or requirement.id,
"description": requirement.description,
"tactics": requirement.tactics or [],
"subtechniques": requirement.sub_techniques or [],
"platforms": requirement.platforms or [],
"technique_url": requirement.technique_url or "",
"attributes": attributes_payload,
"checks": checks_dict,
"checks_status": {
"pass": 0,
@@ -325,15 +337,15 @@ def generate_compliance_overview_template(
requirements_status["passed"] += 1
# Add requirement to compliance requirements
compliance_requirements[requirement.Id] = requirement_dict
compliance_requirements[requirement.id] = requirement_dict
# Build compliance dictionary
compliance_dict = {
"framework": compliance_data.Framework,
"name": compliance_data.Name,
"version": compliance_data.Version,
"framework": compliance_data.framework,
"name": compliance_data.name,
"version": compliance_data.version,
"provider": provider_type,
"description": compliance_data.Description,
"description": compliance_data.description,
"requirements": compliance_requirements,
"requirements_status": requirements_status,
"total_requirements": total_requirements,
+23 -28
View File
@@ -19,7 +19,6 @@ from api.constants import SEVERITY_ORDER
from api.db_utils import (
FindingDeltaEnumField,
InvitationStateEnumField,
ProviderEnumField,
SeverityEnumField,
StatusEnumField,
)
@@ -67,6 +66,7 @@ from api.uuid_utils import (
uuid7_range,
uuid7_start,
)
from api.provider_types import get_provider_type_choices
from api.v1.serializers import TaskBase
@@ -109,11 +109,11 @@ class BaseProviderFilter(FilterSet):
provider_id = UUIDFilter(field_name="provider__id", lookup_expr="exact")
provider_id__in = UUIDInFilter(field_name="provider__id", lookup_expr="in")
provider_type = ChoiceFilter(
field_name="provider__provider", choices=Provider.ProviderChoices.choices
field_name="provider__provider", choices=get_provider_type_choices()
)
provider_type__in = ChoiceInFilter(
field_name="provider__provider",
choices=Provider.ProviderChoices.choices,
choices=get_provider_type_choices(),
lookup_expr="in",
)
@@ -133,11 +133,11 @@ class BaseScanProviderFilter(FilterSet):
provider_id = UUIDFilter(field_name="scan__provider__id", lookup_expr="exact")
provider_id__in = UUIDInFilter(field_name="scan__provider__id", lookup_expr="in")
provider_type = ChoiceFilter(
field_name="scan__provider__provider", choices=Provider.ProviderChoices.choices
field_name="scan__provider__provider", choices=get_provider_type_choices()
)
provider_type__in = ChoiceInFilter(
field_name="scan__provider__provider",
choices=Provider.ProviderChoices.choices,
choices=get_provider_type_choices(),
lookup_expr="in",
)
@@ -155,10 +155,10 @@ class CommonFindingFilters(FilterSet):
provider_id = UUIDFilter(field_name="scan__provider__id", lookup_expr="exact")
provider_id__in = UUIDInFilter(field_name="scan__provider__id", lookup_expr="in")
provider_type = ChoiceFilter(
choices=Provider.ProviderChoices.choices, field_name="scan__provider__provider"
choices=get_provider_type_choices(), field_name="scan__provider__provider"
)
provider_type__in = ChoiceInFilter(
choices=Provider.ProviderChoices.choices, field_name="scan__provider__provider"
choices=get_provider_type_choices(), field_name="scan__provider__provider"
)
provider_uid = CharFilter(field_name="scan__provider__uid", lookup_expr="exact")
provider_uid__in = CharInFilter(field_name="scan__provider__uid", lookup_expr="in")
@@ -356,18 +356,18 @@ class ProviderFilter(FilterSet):
included. Providers with no connection attempt (status is null) are
excluded from this filter."""
)
provider = ChoiceFilter(choices=Provider.ProviderChoices.choices)
provider = ChoiceFilter(choices=get_provider_type_choices())
provider__in = ChoiceInFilter(
field_name="provider",
choices=Provider.ProviderChoices.choices,
choices=get_provider_type_choices(),
lookup_expr="in",
)
provider_type = ChoiceFilter(
choices=Provider.ProviderChoices.choices, field_name="provider"
choices=get_provider_type_choices(), field_name="provider"
)
provider_type__in = ChoiceInFilter(
field_name="provider",
choices=Provider.ProviderChoices.choices,
choices=get_provider_type_choices(),
lookup_expr="in",
)
@@ -381,19 +381,14 @@ class ProviderFilter(FilterSet):
"inserted_at": ["gte", "lte"],
"updated_at": ["gte", "lte"],
}
filter_overrides = {
ProviderEnumField: {
"filter_class": CharFilter,
},
}
class ProviderRelationshipFilterSet(FilterSet):
provider_type = ChoiceFilter(
choices=Provider.ProviderChoices.choices, field_name="provider__provider"
choices=get_provider_type_choices(), field_name="provider__provider"
)
provider_type__in = ChoiceInFilter(
choices=Provider.ProviderChoices.choices, field_name="provider__provider"
choices=get_provider_type_choices(), field_name="provider__provider"
)
provider_uid = CharFilter(field_name="provider__uid", lookup_expr="exact")
provider_uid__in = CharInFilter(field_name="provider__uid", lookup_expr="in")
@@ -998,7 +993,7 @@ class FindingGroupSummaryFilter(_CheckTitleToCheckIdMixin, FilterSet):
provider_id = UUIDFilter(field_name="provider_id", lookup_expr="exact")
provider_id__in = UUIDInFilter(field_name="provider_id", lookup_expr="in")
provider_type = ChoiceFilter(
field_name="provider__provider", choices=Provider.ProviderChoices.choices
field_name="provider__provider", choices=get_provider_type_choices()
)
provider_type__in = CharInFilter(field_name="provider__provider", lookup_expr="in")
@@ -1098,7 +1093,7 @@ class LatestFindingGroupSummaryFilter(_CheckTitleToCheckIdMixin, FilterSet):
provider_id = UUIDFilter(field_name="provider_id", lookup_expr="exact")
provider_id__in = UUIDInFilter(field_name="provider_id", lookup_expr="in")
provider_type = ChoiceFilter(
field_name="provider__provider", choices=Provider.ProviderChoices.choices
field_name="provider__provider", choices=get_provider_type_choices()
)
provider_type__in = CharInFilter(field_name="provider__provider", lookup_expr="in")
@@ -1301,10 +1296,10 @@ class ScanSummaryFilter(FilterSet):
provider_id = UUIDFilter(field_name="scan__provider__id", lookup_expr="exact")
provider_id__in = UUIDInFilter(field_name="scan__provider__id", lookup_expr="in")
provider_type = ChoiceFilter(
field_name="scan__provider__provider", choices=Provider.ProviderChoices.choices
field_name="scan__provider__provider", choices=get_provider_type_choices()
)
provider_type__in = ChoiceInFilter(
field_name="scan__provider__provider", choices=Provider.ProviderChoices.choices
field_name="scan__provider__provider", choices=get_provider_type_choices()
)
region = CharFilter(field_name="region")
@@ -1324,10 +1319,10 @@ class DailySeveritySummaryFilter(FilterSet):
provider_id = UUIDFilter(field_name="provider_id", lookup_expr="exact")
provider_id__in = UUIDInFilter(field_name="provider_id", lookup_expr="in")
provider_type = ChoiceFilter(
field_name="provider__provider", choices=Provider.ProviderChoices.choices
field_name="provider__provider", choices=get_provider_type_choices()
)
provider_type__in = ChoiceInFilter(
field_name="provider__provider", choices=Provider.ProviderChoices.choices
field_name="provider__provider", choices=get_provider_type_choices()
)
date_from = DateFilter(method="filter_noop")
date_to = DateFilter(method="filter_noop")
@@ -1578,11 +1573,11 @@ class ThreatScoreSnapshotFilter(FilterSet):
provider_id = UUIDFilter(field_name="provider__id", lookup_expr="exact")
provider_id__in = UUIDInFilter(field_name="provider__id", lookup_expr="in")
provider_type = ChoiceFilter(
field_name="provider__provider", choices=Provider.ProviderChoices.choices
field_name="provider__provider", choices=get_provider_type_choices()
)
provider_type__in = ChoiceInFilter(
field_name="provider__provider",
choices=Provider.ProviderChoices.choices,
choices=get_provider_type_choices(),
lookup_expr="in",
)
compliance_id = CharFilter(field_name="compliance_id", lookup_expr="exact")
@@ -1621,11 +1616,11 @@ class ResourceGroupOverviewFilter(FilterSet):
provider_id = UUIDFilter(field_name="scan__provider__id", lookup_expr="exact")
provider_id__in = UUIDInFilter(field_name="scan__provider__id", lookup_expr="in")
provider_type = ChoiceFilter(
field_name="scan__provider__provider", choices=Provider.ProviderChoices.choices
field_name="scan__provider__provider", choices=get_provider_type_choices()
)
provider_type__in = ChoiceInFilter(
field_name="scan__provider__provider",
choices=Provider.ProviderChoices.choices,
choices=get_provider_type_choices(),
lookup_expr="in",
)
resource_group = CharFilter(field_name="resource_group", lookup_expr="exact")
@@ -0,0 +1,49 @@
from django.core.management.base import BaseCommand
from tasks.jobs.orphan_recovery import reconcile_orphans
class Command(BaseCommand):
help = (
"Recover orphaned allowlisted Celery tasks whose worker is gone and mark "
"other stale task results terminal. Single-flight via a Postgres advisory lock."
)
def add_arguments(self, parser):
parser.add_argument(
"--grace-minutes",
type=int,
default=2,
help="Skip tasks started within this window (worker may still register).",
)
parser.add_argument(
"--max-attempts",
type=int,
default=3,
help="Give up re-running a task after this many recovery attempts (scans are marked FAILED).",
)
parser.add_argument(
"--dry-run",
action="store_true",
help="Detect and report orphans without revoking or re-enqueuing.",
)
def handle(self, *args, **options):
result = reconcile_orphans(
grace_minutes=options["grace_minutes"],
max_attempts=options["max_attempts"],
dry_run=options["dry_run"],
)
if not result.get("acquired"):
self.stdout.write("Reconcile skipped: another run holds the lock.")
return
self.stdout.write(
self.style.SUCCESS(
"Orphan reconcile complete: "
f"recovered={len(result.get('recovered', []))} "
f"failed={len(result.get('failed', []))} "
f"skipped(in-flight)={len(result.get('skipped', []))}"
)
)
@@ -0,0 +1,17 @@
# Generated by Django 5.1.15 on 2026-05-30 17:38
from django.db import migrations, models
class Migration(migrations.Migration):
dependencies = [
("api", "0093_okta_provider"),
]
operations = [
migrations.AddField(
model_name="scan",
name="recovery_count",
field=models.IntegerField(default=0),
),
]
@@ -0,0 +1,49 @@
from django.db import migrations
TASK_NAME = "reconcile-orphan-tasks"
INTERVAL_MINUTES = 2
def create_periodic_task(apps, schema_editor):
IntervalSchedule = apps.get_model("django_celery_beat", "IntervalSchedule")
PeriodicTask = apps.get_model("django_celery_beat", "PeriodicTask")
schedule, _ = IntervalSchedule.objects.get_or_create(
every=INTERVAL_MINUTES,
period="minutes",
)
PeriodicTask.objects.update_or_create(
name=TASK_NAME,
defaults={
"task": TASK_NAME,
"interval": schedule,
"enabled": True,
},
)
def delete_periodic_task(apps, schema_editor):
IntervalSchedule = apps.get_model("django_celery_beat", "IntervalSchedule")
PeriodicTask = apps.get_model("django_celery_beat", "PeriodicTask")
PeriodicTask.objects.filter(name=TASK_NAME).delete()
# Clean up the schedule if no other task references it
IntervalSchedule.objects.filter(
every=INTERVAL_MINUTES,
period="minutes",
periodictask__isnull=True,
).delete()
class Migration(migrations.Migration):
dependencies = [
("api", "0094_scan_recovery_count"),
("django_celery_beat", "0019_alter_periodictasks_options"),
]
operations = [
migrations.RunPython(create_periodic_task, delete_periodic_task),
]
@@ -0,0 +1,64 @@
import uuid
import django.db.models.deletion
from django.db import migrations, models
import api.rls
class Migration(migrations.Migration):
dependencies = [
("api", "0095_reconcile_orphan_tasks_periodic_task"),
]
operations = [
migrations.CreateModel(
name="JiraIssueDispatch",
fields=[
(
"id",
models.UUIDField(
default=uuid.uuid4,
editable=False,
primary_key=True,
serialize=False,
),
),
("inserted_at", models.DateTimeField(auto_now_add=True)),
("finding_id", models.UUIDField()),
(
"integration",
models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE,
related_name="jira_dispatches",
to="api.integration",
),
),
(
"tenant",
models.ForeignKey(
on_delete=django.db.models.deletion.CASCADE, to="api.tenant"
),
),
],
options={
"db_table": "jira_issue_dispatches",
"abstract": False,
},
),
migrations.AddConstraint(
model_name="jiraissuedispatch",
constraint=models.UniqueConstraint(
fields=("tenant_id", "integration_id", "finding_id"),
name="unique_jira_issue_dispatch",
),
),
migrations.AddConstraint(
model_name="jiraissuedispatch",
constraint=api.rls.RowLevelSecurityConstraint(
"tenant_id",
name="rls_on_jiraissuedispatch",
statements=["SELECT", "INSERT", "UPDATE", "DELETE"],
),
),
]
@@ -0,0 +1,43 @@
from django.db import migrations, models
class Migration(migrations.Migration):
"""Expand step of the zero-downtime migration of Provider.provider from a
native PostgreSQL enum to varchar.
Adds a transitional varchar shadow column `provider_str` and a trigger that
keeps it in sync with the `provider` enum column on every INSERT/UPDATE.
Adding a nullable column is metadata-only (no table rewrite, no long lock).
The trigger covers writes from now on; existing rows are populated by a
later backfill. A subsequent migration drops the enum column and renames
`provider_str` to take its place.
"""
dependencies = [
("api", "0096_jiraissuedispatch"),
]
operations = [
migrations.AddField(
model_name="provider",
name="provider_str",
field=models.CharField(blank=True, max_length=50, null=True),
),
migrations.RunSQL(
sql=(
"CREATE OR REPLACE FUNCTION sync_provider_str() RETURNS trigger AS $$\n"
"BEGIN\n"
" NEW.provider_str := NEW.provider::text;\n"
" RETURN NEW;\n"
"END;\n"
"$$ LANGUAGE plpgsql;\n"
"CREATE TRIGGER providers_sync_provider_str\n"
" BEFORE INSERT OR UPDATE ON providers\n"
" FOR EACH ROW EXECUTE FUNCTION sync_provider_str();"
),
reverse_sql=(
"DROP TRIGGER IF EXISTS providers_sync_provider_str ON providers;\n"
"DROP FUNCTION IF EXISTS sync_provider_str();"
),
),
]
@@ -0,0 +1,25 @@
from django.db import migrations
class Migration(migrations.Migration):
"""Synchronous backfill of the `provider_str` shadow column.
A single UPDATE fills rows that predate the 0097 trigger. The providers
table is small, so this is safe inline and guarantees the column is fully
populated before 0099 sets it NOT NULL (no race with an async backfill).
Runs on the migration connection, which is exempt from RLS.
"""
dependencies = [
("api", "0097_provider_str_shadow_column"),
]
operations = [
migrations.RunSQL(
sql=(
"UPDATE providers SET provider_str = provider::text "
"WHERE provider_str IS NULL;"
),
reverse_sql=migrations.RunSQL.noop,
),
]
@@ -0,0 +1,51 @@
from django.db import migrations, models
class Migration(migrations.Migration):
"""Contract step: promote `provider_str` into `provider`.
Drops the trigger and enum column, renames the shadow column, sets it NOT
NULL, and drops the enum type. The unique index is dropped and recreated in
the same transaction, so there is no window for duplicate active providers;
recreated non-concurrently since the table is small, with a short
lock_timeout so the migration fails fast instead of queueing behind a
long-running transaction.
"""
dependencies = [
("api", "0098_backfill_provider_str"),
]
operations = [
migrations.SeparateDatabaseAndState(
state_operations=[
migrations.RemoveField(
model_name="provider",
name="provider_str",
),
migrations.AlterField(
model_name="provider",
name="provider",
field=models.CharField(default="aws", max_length=50),
),
],
database_operations=[
migrations.RunSQL(
sql=(
"SET LOCAL lock_timeout = '10s';\n"
"DROP TRIGGER IF EXISTS providers_sync_provider_str ON providers;\n"
"DROP FUNCTION IF EXISTS sync_provider_str();\n"
"DROP INDEX IF EXISTS unique_provider_uids;\n"
"ALTER TABLE providers DROP COLUMN provider;\n"
"ALTER TABLE providers RENAME COLUMN provider_str TO provider;\n"
"ALTER TABLE providers ALTER COLUMN provider SET DEFAULT 'aws';\n"
"ALTER TABLE providers ALTER COLUMN provider SET NOT NULL;\n"
"DROP TYPE provider;\n"
"CREATE UNIQUE INDEX unique_provider_uids ON providers "
"(tenant_id, provider, uid) WHERE NOT is_deleted;"
),
reverse_sql=migrations.RunSQL.noop,
),
],
),
]
+49 -5
View File
@@ -40,7 +40,6 @@ from api.db_utils import (
InvitationStateEnumField,
MemberRoleEnumField,
ProcessorTypeEnumField,
ProviderEnumField,
ProviderSecretTypeEnumField,
ScanTriggerEnumField,
SeverityEnumField,
@@ -59,6 +58,7 @@ from api.rls import (
Tenant,
)
from prowler.lib.check.models import Severity
from prowler.providers.common.provider import Provider as SDKProvider
fernet = Fernet(settings.SECRETS_ENCRYPTION_KEY.encode())
@@ -482,9 +482,10 @@ class Provider(RowLevelSecurityProtectedModel):
inserted_at = models.DateTimeField(auto_now_add=True, editable=False)
updated_at = models.DateTimeField(auto_now=True, editable=False)
is_deleted = models.BooleanField(default=False)
provider = ProviderEnumField(
choices=ProviderChoices.choices, default=ProviderChoices.AWS
)
# Stored as a plain varchar: the SDK is the source of truth for which
# providers are valid, so the column accepts any provider name without a
# database-level enum to keep in sync.
provider = models.CharField(max_length=50, default=ProviderChoices.AWS)
uid = models.CharField(
"Unique identifier for the provider, set by the provider",
max_length=250,
@@ -501,13 +502,24 @@ class Provider(RowLevelSecurityProtectedModel):
def clean(self):
super().clean()
if self.provider not in SDKProvider.get_available_providers():
raise ModelValidationError(
detail=f"{self.provider} is not a supported provider.",
code="invalid",
pointer="/data/attributes/provider",
)
if self.provider == self.ProviderChoices.OKTA and self.uid:
# Mirror the SDK, which lowercases the org domain before connecting.
# Without this the API would reject Acme.okta.com even though the
# SDK would accept it, and stored uids could disagree with the
# authenticated org domain.
self.uid = self.uid.strip().lower()
getattr(self, f"validate_{self.provider}_uid")(self.uid)
# Providers the SDK exposes but the API has no specific uid rule for
# (e.g. external providers) fall back to the field-level min-length
# check only, instead of failing on a missing validator.
uid_validator = getattr(self, f"validate_{self.provider}_uid", None)
if uid_validator is not None:
uid_validator(self.uid)
def save(self, *args, **kwargs):
self.full_clean()
@@ -666,6 +678,9 @@ class Scan(RowLevelSecurityProtectedModel):
state = StateEnumField(choices=StateChoices.choices, default=StateChoices.AVAILABLE)
unique_resource_count = models.IntegerField(default=0)
progress = models.IntegerField(default=0)
# Incremented by the scan-specific orphan-recovery path each time this scan is
# re-pointed to a fresh task; for observability (the retry cap is a Valkey counter).
recovery_count = models.IntegerField(default=0)
scanner_args = models.JSONField(default=dict)
duration = models.IntegerField(null=True, blank=True)
scheduled_at = models.DateTimeField(null=True, blank=True)
@@ -1998,6 +2013,35 @@ class IntegrationProviderRelationship(RowLevelSecurityProtectedModel):
]
class JiraIssueDispatch(RowLevelSecurityProtectedModel):
"""Tracks findings already sent to a Jira integration.
Lets the Jira task be re-run safely (e.g. by orphan recovery): findings with
an existing dispatch row are skipped, so no duplicate issues are created.
"""
id = models.UUIDField(primary_key=True, default=uuid4, editable=False)
inserted_at = models.DateTimeField(auto_now_add=True, editable=False)
integration = models.ForeignKey(
Integration, on_delete=models.CASCADE, related_name="jira_dispatches"
)
finding_id = models.UUIDField()
class Meta(RowLevelSecurityProtectedModel.Meta):
db_table = "jira_issue_dispatches"
constraints = [
models.UniqueConstraint(
fields=["tenant_id", "integration_id", "finding_id"],
name="unique_jira_issue_dispatch",
),
RowLevelSecurityConstraint(
field="tenant_id",
name="rls_on_%(class)s",
statements=["SELECT", "INSERT", "UPDATE", "DELETE"],
),
]
class SAMLToken(models.Model):
id = models.UUIDField(primary_key=True, default=uuid4, editable=False)
inserted_at = models.DateTimeField(auto_now_add=True, editable=False)
+15
View File
@@ -0,0 +1,15 @@
from functools import lru_cache
from prowler.providers.common.provider import Provider as SDKProvider
@lru_cache(maxsize=1)
def get_provider_type_choices():
"""Provider-type choices from the SDK's available providers, so they cover
external providers and not just a static enum.
Cached for the process lifetime; hot-installing a provider needs
coordinated cache invalidation (tracked separately) to show up here without
a restart. Shared by the filters and the provider serializer.
"""
return [(name, name) for name in SDKProvider.get_available_providers()]
+53 -2
View File
@@ -1,7 +1,7 @@
openapi: 3.0.3
info:
title: Prowler API
version: 1.30.0
version: 1.31.0
description: |-
Prowler API specification.
@@ -13137,8 +13137,59 @@ paths:
responses:
'200':
description: CSV file containing the compliance report
'202':
description: The task is in progress
'403':
description: There is a problem with credentials
'404':
description: Compliance report not found
description: Compliance report not found, or the scan has no reports yet
/api/v1/scans/{id}/compliance/{name}/ocsf:
get:
operationId: scans_compliance_ocsf_retrieve
description: Download a specific compliance report as an OCSF JSON file. Only
universal frameworks that declare an output configuration produce this artifact
(currently 'dora' and 'csa_ccm_4.0'); any other framework returns 404.
summary: Retrieve compliance report as OCSF JSON
parameters:
- in: query
name: fields[scan-reports]
schema:
type: array
items:
type: string
enum:
- id
- name
description: endpoint return only specific fields in the response on a per-type
basis by including a fields[TYPE] query parameter.
explode: false
- in: path
name: id
schema:
type: string
format: uuid
description: A UUID string identifying this scan.
required: true
- in: path
name: name
schema:
type: string
description: The compliance report name, like 'dora'
required: true
tags:
- Scan
security:
- JWT or API Key: []
responses:
'200':
description: OCSF JSON file containing the compliance report
'202':
description: The task is in progress
'403':
description: There is a problem with credentials
'404':
description: Compliance report not found, the framework does not provide
an OCSF export, or the scan has no reports yet
/api/v1/scans/{id}/csa:
get:
operationId: scans_csa_retrieve
+99 -48
View File
@@ -4,6 +4,8 @@ import pytest
from api import compliance as compliance_module
from api.compliance import (
LazyChecksMapping,
LazyComplianceTemplate,
generate_compliance_overview_template,
generate_scan_compliance,
get_compliance_frameworks,
@@ -12,7 +14,9 @@ from api.compliance import (
load_prowler_checks,
)
from api.models import Provider
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.check.compliance_models import (
get_bulk_compliance_frameworks_universal,
)
class TestCompliance:
@@ -28,32 +32,30 @@ class TestCompliance:
assert set(checks) == {"check1", "check2", "check3"}
mock_check_metadata.get_bulk.assert_called_once_with(provider_type)
@patch("api.compliance.Compliance")
def test_get_prowler_provider_compliance(self, mock_compliance):
@patch("api.compliance.get_bulk_compliance_frameworks_universal")
def test_get_prowler_provider_compliance(self, mock_get_bulk):
provider_type = Provider.ProviderChoices.AWS
mock_compliance.get_bulk.return_value = {
mock_get_bulk.return_value = {
"compliance1": MagicMock(),
"compliance2": MagicMock(),
}
compliance_data = get_prowler_provider_compliance(provider_type)
assert compliance_data == mock_compliance.get_bulk.return_value
mock_compliance.get_bulk.assert_called_once_with(provider_type)
assert compliance_data == mock_get_bulk.return_value
mock_get_bulk.assert_called_once_with(provider_type)
@patch("api.compliance.get_prowler_provider_checks")
@patch("api.models.Provider.ProviderChoices")
@patch("api.compliance.SDKProvider.get_available_providers", return_value=["aws"])
def test_load_prowler_checks(
self, mock_provider_choices, mock_get_prowler_provider_checks
self, _mock_get_available_providers, mock_get_prowler_provider_checks
):
mock_provider_choices.values = ["aws"]
mock_get_prowler_provider_checks.return_value = ["check1", "check2", "check3"]
prowler_compliance = {
"aws": {
"compliance1": MagicMock(
Requirements=[
requirements=[
MagicMock(
Checks=["check1", "check2"],
checks={"aws": ["check1", "check2"]},
),
],
),
@@ -163,39 +165,37 @@ class TestCompliance:
is None
)
@patch("api.models.Provider.ProviderChoices")
def test_generate_compliance_overview_template(self, mock_provider_choices):
mock_provider_choices.values = ["aws"]
@patch("api.compliance.SDKProvider.get_available_providers", return_value=["aws"])
def test_generate_compliance_overview_template(self, _mock_get_available_providers):
requirement1 = MagicMock(
Id="requirement1",
Name="Requirement 1",
Description="Description of requirement 1",
Attributes=[],
Checks=["check1", "check2"],
Tactics=["tactic1"],
SubTechniques=["subtechnique1"],
Platforms=["platform1"],
TechniqueURL="https://example.com",
id="requirement1",
description="Description of requirement 1",
attributes=[],
checks={"aws": ["check1", "check2"]},
tactics=["tactic1"],
sub_techniques=["subtechnique1"],
platforms=["platform1"],
technique_url="https://example.com",
)
requirement1.name = "Requirement 1"
requirement2 = MagicMock(
Id="requirement2",
Name="Requirement 2",
Description="Description of requirement 2",
Attributes=[],
Checks=[],
Tactics=[],
SubTechniques=[],
Platforms=[],
TechniqueURL="",
id="requirement2",
description="Description of requirement 2",
attributes=[],
checks={"aws": []},
tactics=[],
sub_techniques=[],
platforms=[],
technique_url="",
)
requirement2.name = "Requirement 2"
compliance1 = MagicMock(
Requirements=[requirement1, requirement2],
Framework="Framework 1",
Version="1.0",
Description="Description of compliance1",
Name="Compliance 1",
requirements=[requirement1, requirement2],
framework="Framework 1",
version="1.0",
description="Description of compliance1",
)
compliance1.name = "Compliance 1"
prowler_compliance = {"aws": {"compliance1": compliance1}}
template = generate_compliance_overview_template(prowler_compliance)
@@ -271,24 +271,28 @@ def reset_compliance_cache():
class TestGetComplianceFrameworks:
def test_returns_keys_from_compliance_get_bulk(self, reset_compliance_cache):
with patch("api.compliance.Compliance") as mock_compliance:
mock_compliance.get_bulk.return_value = {
with patch(
"api.compliance.get_bulk_compliance_frameworks_universal"
) as mock_get_bulk:
mock_get_bulk.return_value = {
"cis_1.4_aws": MagicMock(),
"mitre_attack_aws": MagicMock(),
}
result = get_compliance_frameworks(Provider.ProviderChoices.AWS)
assert sorted(result) == ["cis_1.4_aws", "mitre_attack_aws"]
mock_compliance.get_bulk.assert_called_once_with(Provider.ProviderChoices.AWS)
mock_get_bulk.assert_called_once_with(Provider.ProviderChoices.AWS)
def test_caches_result_per_provider(self, reset_compliance_cache):
with patch("api.compliance.Compliance") as mock_compliance:
mock_compliance.get_bulk.return_value = {"cis_1.4_aws": MagicMock()}
with patch(
"api.compliance.get_bulk_compliance_frameworks_universal"
) as mock_get_bulk:
mock_get_bulk.return_value = {"cis_1.4_aws": MagicMock()}
get_compliance_frameworks(Provider.ProviderChoices.AWS)
get_compliance_frameworks(Provider.ProviderChoices.AWS)
# Cached after first call.
assert mock_compliance.get_bulk.call_count == 1
assert mock_get_bulk.call_count == 1
@pytest.mark.parametrize(
"provider_type",
@@ -296,17 +300,64 @@ class TestGetComplianceFrameworks:
)
def test_listing_is_subset_of_bulk(self, reset_compliance_cache, provider_type):
"""Regression for CLOUD-API-40S: every name returned by
``get_compliance_frameworks`` must be loadable via ``Compliance.get_bulk``.
``get_compliance_frameworks`` must be loadable via
``get_bulk_compliance_frameworks_universal``.
A divergence here is what produced ``KeyError: 'csa_ccm_4.0'`` in
``generate_outputs_task`` after universal/multi-provider compliance
JSONs were introduced at the top-level ``prowler/compliance/`` path.
"""
bulk_keys = set(Compliance.get_bulk(provider_type).keys())
bulk_keys = set(get_bulk_compliance_frameworks_universal(provider_type).keys())
listed = set(get_compliance_frameworks(provider_type))
missing = listed - bulk_keys
assert not missing, (
f"get_compliance_frameworks({provider_type!r}) returned names not "
f"loadable by Compliance.get_bulk: {sorted(missing)}"
f"loadable by get_bulk_compliance_frameworks_universal: "
f"{sorted(missing)}"
)
class TestDynamicProviderDiscovery:
"""Compliance discovery is sourced from the SDK's available providers, so an
externally-registered provider absent from the legacy static enum is
discovered without any change to the API."""
EXTERNAL = "acme-external"
@patch(
"api.compliance.SDKProvider.get_available_providers",
return_value=["aws", EXTERNAL],
)
def test_lazy_template_includes_external_provider(self, _mock_available):
template = LazyComplianceTemplate()
assert self.EXTERNAL in template
assert set(template) == {"aws", self.EXTERNAL}
@patch(
"api.compliance.SDKProvider.get_available_providers",
return_value=["aws", EXTERNAL],
)
def test_lazy_checks_mapping_includes_external_provider(self, _mock_available):
checks_mapping = LazyChecksMapping()
assert self.EXTERNAL in checks_mapping
assert set(checks_mapping) == {"aws", self.EXTERNAL}
@patch(
"api.compliance.SDKProvider.get_available_providers",
return_value=[EXTERNAL],
)
def test_overview_template_iterates_dynamic_providers(self, _mock_available):
template = generate_compliance_overview_template({self.EXTERNAL: {}})
assert self.EXTERNAL in template
@patch("api.compliance.get_prowler_provider_checks", return_value=["check1"])
@patch(
"api.compliance.SDKProvider.get_available_providers",
return_value=[EXTERNAL],
)
def test_load_checks_iterates_dynamic_providers(
self, _mock_available, _mock_checks
):
checks = load_prowler_checks({self.EXTERNAL: {}})
assert self.EXTERNAL in checks
+23
View File
@@ -0,0 +1,23 @@
from api.filters import get_provider_type_choices
from prowler.providers.common.provider import Provider as SDKProvider
class TestProviderTypeChoices:
"""Provider-type filter choices are driven by the SDK's available providers
so filtering covers external providers, not just a static enum."""
def test_choices_track_sdk_available_providers(self):
available = set(SDKProvider.get_available_providers())
choices = get_provider_type_choices()
assert {value for value, _ in choices} == available
def test_choices_include_provider_absent_from_legacy_enum(self):
from api.models import Provider
legacy = {value for value, _ in Provider.ProviderChoices.choices}
choice_values = {value for value, _ in get_provider_type_choices()}
# `llm` is exposed by the SDK but is not part of the legacy static enum.
assert "llm" in choice_values
assert "llm" not in legacy
+28
View File
@@ -6,7 +6,9 @@ from django.core.exceptions import ValidationError
from django.db import IntegrityError
from api.db_router import MainRouter
from api.exceptions import ModelValidationError
from api.models import (
Provider,
ProviderComplianceScore,
Resource,
ResourceTag,
@@ -525,3 +527,29 @@ class TestTenantComplianceSummaryModel:
assert summary1.id != summary2.id
assert summary1.requirements_passed != summary2.requirements_passed
@pytest.mark.django_db
class TestProviderDynamicValidation:
"""Provider validity is driven by the SDK's available providers, not a
static enum. Providers the SDK exposes are accepted; for those without a
`validate_<provider>_uid` method only the uid min-length floor applies."""
def test_accepts_provider_without_uid_validator(self, tenants_fixture):
tenant = tenants_fixture[0]
provider = Provider.objects.create(
tenant_id=tenant.id, provider="llm", uid="my-llm-account"
)
assert provider.provider == "llm"
def test_rejects_provider_not_available_in_sdk(self, tenants_fixture):
tenant = tenants_fixture[0]
with pytest.raises(ModelValidationError):
Provider.objects.create(
tenant_id=tenant.id, provider="does-not-exist", uid="whatever"
)
def test_uid_floor_still_enforced_for_external_provider(self, tenants_fixture):
tenant = tenants_fixture[0]
with pytest.raises(ValidationError):
Provider.objects.create(tenant_id=tenant.id, provider="llm", uid="ab")
+96 -1
View File
@@ -1,8 +1,103 @@
from unittest.mock import MagicMock, patch
import pytest
from pydantic import BaseModel
from rest_framework.exceptions import ValidationError
from api.v1.serializer_utils.integrations import S3ConfigSerializer
from api.v1.serializers import ImageProviderSecret
from api.v1.serializers import (
BaseWriteProviderSecretSerializer,
ImageProviderSecret,
ProviderEnumSerializerField,
)
class TestExternalProviderSecretValidation:
"""A non-built-in provider's secret is validated against the schema it
declares for the chosen secret type through the SDK contract, or accepted as
an object when it declares none (then validated by test_connection)."""
class _StaticCredentials(BaseModel):
api_url: str
api_key: str
class _RoleCredentials(BaseModel):
role_arn: str
def _patch(self, schemas):
provider_class = MagicMock()
provider_class.get_credentials_schema.return_value = schemas
return patch(
"api.v1.serializers.SDKProvider.get_class", return_value=provider_class
)
def test_secret_validated_against_its_type_schema(self):
with self._patch({"static": self._StaticCredentials}):
BaseWriteProviderSecretSerializer._validate_external_provider_secret(
"external-template", "static", {"api_url": "u", "api_key": "k"}
)
def test_secret_rejected_when_schema_violated(self):
with self._patch({"static": self._StaticCredentials}):
with pytest.raises(ValidationError):
BaseWriteProviderSecretSerializer._validate_external_provider_secret(
"external-template", "static", {"api_url": "u"}
)
def test_secret_must_match_its_type_not_another(self):
"""A secret is validated against the schema for its declared secret_type,
not "any declared schema": a role-shaped secret under secret_type=static
is rejected. See PR #11402 review (josema-xyz / Alan-TheGentleman)."""
schemas = {"static": self._StaticCredentials, "role": self._RoleCredentials}
with self._patch(schemas):
with pytest.raises(ValidationError):
BaseWriteProviderSecretSerializer._validate_external_provider_secret(
"external-template", "static", {"role_arn": "arn:aws:iam::x"}
)
def test_rejects_secret_type_not_declared_by_provider(self):
with self._patch({"static": self._StaticCredentials}):
with pytest.raises(ValidationError):
BaseWriteProviderSecretSerializer._validate_external_provider_secret(
"external-template", "role", {"role_arn": "arn"}
)
def test_secret_accepted_when_no_schema_declared(self):
with self._patch({}):
BaseWriteProviderSecretSerializer._validate_external_provider_secret(
"external-template", "static", {"anything": "goes"}
)
@pytest.mark.parametrize("bad_secret", [["a", "b"], "a-string", None, 42])
def test_secret_rejected_when_not_a_json_object(self, bad_secret):
"""Even with no declared schema, a non-object secret must be rejected so
a list/string/null cannot be persisted and blow up later at
``{**secret}``. See PR #11402 review (Alan-TheGentleman)."""
with self._patch({}):
with pytest.raises(ValidationError):
BaseWriteProviderSecretSerializer._validate_external_provider_secret(
"external-template", "static", bad_secret
)
class TestProviderEnumSerializerField:
"""The provider field accepts whatever the SDK exposes (built-in or
external) and rejects anything else with `invalid_choice`."""
def test_accepts_sdk_available_provider(self):
field = ProviderEnumSerializerField()
assert field.run_validation("aws") == "aws"
def test_accepts_external_provider_absent_from_static_enum(self):
field = ProviderEnumSerializerField()
# `llm` is exposed by the SDK but is not part of the legacy static enum.
assert field.run_validation("llm") == "llm"
def test_rejects_unknown_provider(self):
field = ProviderEnumSerializerField()
with pytest.raises(ValidationError) as exc:
field.run_validation("does-not-exist")
assert exc.value.detail[0].code == "invalid_choice"
class TestS3ConfigSerializer:
+63 -3
View File
@@ -146,11 +146,25 @@ class TestReturnProwlerProvider:
with pytest.raises(ValueError):
return return_prowler_provider(provider)
def test_return_prowler_provider_external_resolves_via_get_class(self):
"""An external provider name resolves through the SDK resolver, so any
entry-point provider works without a hardcoded branch in the API."""
external_class = type("ExternalProvider", (), {})
provider = MagicMock()
provider.provider = "external-template"
with patch(
"api.utils.SDKProvider.get_class", return_value=external_class
) as mock_get_class:
resolved = return_prowler_provider(provider)
mock_get_class.assert_called_once_with("external-template")
assert resolved is external_class
class TestInitializeProwlerProvider:
@patch("api.utils.return_prowler_provider")
def test_initialize_prowler_provider(self, mock_return_prowler_provider):
provider = MagicMock()
provider.provider = "aws"
provider.secret.secret = {"key": "value"}
mock_return_prowler_provider.return_value = MagicMock()
@@ -162,6 +176,7 @@ class TestInitializeProwlerProvider:
self, mock_return_prowler_provider
):
provider = MagicMock()
provider.provider = "aws"
provider.secret.secret = {"key": "value"}
mutelist_processor = MagicMock()
mutelist_processor.configuration = {"Mutelist": {"key": "value"}}
@@ -177,6 +192,7 @@ class TestProwlerProviderConnectionTest:
@patch("api.utils.return_prowler_provider")
def test_prowler_provider_connection_test(self, mock_return_prowler_provider):
provider = MagicMock()
provider.provider = "aws"
provider.uid = "1234567890"
provider.secret.secret = {"key": "value"}
mock_return_prowler_provider.return_value = MagicMock()
@@ -186,6 +202,29 @@ class TestProwlerProviderConnectionTest:
key="value", provider_id="1234567890", raise_on_exception=False
)
@patch("api.utils.return_prowler_provider")
def test_prowler_provider_connection_test_external_uses_contract(
self, mock_return_prowler_provider
):
"""External providers build connection kwargs through the SDK contract
(get_connection_arguments), with no provider_id forced by the API."""
provider = MagicMock()
provider.provider = "external-template"
provider.uid = "acme-1"
provider.secret.secret = {"api_key": "k"}
mock_return_prowler_provider.return_value.get_connection_arguments.return_value = {
"api_key": "k"
}
prowler_provider_connection_test(provider)
mock_return_prowler_provider.return_value.get_connection_arguments.assert_called_once_with(
"acme-1", {"api_key": "k"}
)
mock_return_prowler_provider.return_value.test_connection.assert_called_once_with(
api_key="k", raise_on_exception=False
)
@pytest.mark.django_db
@patch("api.utils.return_prowler_provider")
def test_prowler_provider_connection_test_without_secret(
@@ -560,7 +599,8 @@ class TestGetProwlerProviderKwargs:
assert result == expected_result
def test_get_prowler_provider_kwargs_unsupported_provider(self):
# Setup
# A non-built-in provider is resolved dynamically; one that is neither a
# built-in nor an installed entry-point provider cannot be resolved.
provider_uid = "provider_uid"
secret_dict = {"key": "value"}
secret_mock = MagicMock()
@@ -571,10 +611,30 @@ class TestGetProwlerProviderKwargs:
provider.secret = secret_mock
provider.uid = provider_uid
with pytest.raises(ValueError):
get_prowler_provider_kwargs(provider)
@patch("api.utils.return_prowler_provider")
def test_get_prowler_provider_kwargs_external_uses_contract(
self, mock_return_prowler_provider
):
"""External providers build constructor kwargs through the SDK contract
(get_scan_arguments), not a hardcoded branch in the API."""
provider = MagicMock()
provider.provider = "external-template"
provider.uid = "acme-1"
provider.secret.secret = {"api_key": "k"}
mock_return_prowler_provider.return_value.get_scan_arguments.return_value = {
"api_key": "k",
"custom": "acme-1",
}
result = get_prowler_provider_kwargs(provider)
expected_result = secret_dict.copy()
assert result == expected_result
mock_return_prowler_provider.return_value.get_scan_arguments.assert_called_once_with(
"acme-1", {"api_key": "k"}, None
)
assert result == {"api_key": "k", "custom": "acme-1"}
def test_get_prowler_provider_kwargs_no_secret(self):
# Setup
+238 -13
View File
@@ -24,9 +24,11 @@ from conftest import (
today_after_n_days,
)
from django.conf import settings
from django.db import connection
from django.db.models import Count
from django.http import JsonResponse
from django.test import RequestFactory
from django.test.utils import CaptureQueriesContext
from django.urls import reverse
from django_celery_results.models import TaskResult
from rest_framework import status
@@ -64,6 +66,7 @@ from api.models import (
ProviderSecret,
Resource,
ResourceFindingMapping,
ResourceTag,
Role,
RoleProviderGroupRelationship,
SAMLConfiguration,
@@ -3856,16 +3859,20 @@ class TestScanViewSet:
scan.output_location = "dummy"
scan.save()
dummy_task = Task.objects.create(tenant_id=scan.tenant_id)
dummy_task.id = "dummy-task-id"
dummy_task_data = {"id": dummy_task.id, "state": StateChoices.EXECUTING}
task_result = TaskResult.objects.create(
task_id=str(uuid4()),
task_name="scan-report",
task_kwargs={"scan_id": str(scan.id)},
)
task = Task.objects.create(
tenant_id=scan.tenant_id,
task_runner_task=task_result,
)
dummy_task_data = {"id": str(task.id), "state": StateChoices.EXECUTING}
with (
patch("api.v1.views.Task.objects.get", return_value=dummy_task),
patch(
"api.v1.views.TaskSerializer",
return_value=type("DummySerializer", (), {"data": dummy_task_data}),
),
with patch(
"api.v1.views.TaskSerializer",
return_value=type("DummySerializer", (), {"data": dummy_task_data}),
):
url = reverse("scan-report", kwargs={"pk": scan.id})
response = authenticated_client.get(url)
@@ -4186,6 +4193,88 @@ class TestScanViewSet:
assert resp.status_code == status.HTTP_302_FOUND
assert resp["Location"] == presigned_url
def test_compliance_s3_returns_latest_match(
self, authenticated_client, scans_fixture, monkeypatch
):
"""When several files match, the most recently modified one is served."""
scan = scans_fixture[0]
bucket = "bucket"
scan.output_location = f"s3://{bucket}/path/scan.zip"
scan.state = StateChoices.COMPLETED
scan.save()
monkeypatch.setattr(
"api.v1.views.env",
type("env", (), {"str": lambda self, *args, **kwargs: "test-bucket"})(),
)
old_key = "path/compliance/prowler-output-aws-20240101000000_cis_1.4_aws.csv"
latest_key = "path/compliance/prowler-output-aws-20240202000000_cis_1.4_aws.csv"
class FakeS3Client:
def list_objects_v2(self, Bucket, Prefix):
return {
"Contents": [
{
"Key": old_key,
"LastModified": datetime(2024, 1, 1, tzinfo=timezone.utc),
},
{
"Key": latest_key,
"LastModified": datetime(2024, 2, 2, tzinfo=timezone.utc),
},
]
}
def generate_presigned_url(self, ClientMethod, Params, ExpiresIn):
assert Params["Key"] == latest_key
return "https://test-bucket.s3.amazonaws.com/latest"
monkeypatch.setattr("api.v1.views.get_s3_client", lambda: FakeS3Client())
url = reverse("scan-compliance", kwargs={"pk": scan.id, "name": "cis_1.4_aws"})
resp = authenticated_client.get(url)
assert resp.status_code == status.HTTP_302_FOUND
assert resp["Location"].endswith("/latest")
def test_compliance_local_returns_latest_match(
self, authenticated_client, scans_fixture, monkeypatch
):
"""The local branch serves the most recently modified matching file."""
scan = scans_fixture[0]
scan.state = StateChoices.COMPLETED
with tempfile.TemporaryDirectory() as tmp:
comp_dir = Path(tmp) / "reports" / "compliance"
comp_dir.mkdir(parents=True, exist_ok=True)
old_file = comp_dir / "prowler-output-aws-20240101000000_cis_1.4_aws.csv"
old_file.write_bytes(b"old")
latest_file = comp_dir / "prowler-output-aws-20240202000000_cis_1.4_aws.csv"
latest_file.write_bytes(b"latest")
# Make `latest_file` newer regardless of creation order.
os.utime(old_file, (1_700_000_000, 1_700_000_000))
os.utime(latest_file, (1_700_000_100, 1_700_000_100))
scan.output_location = str(Path(tmp) / "reports" / "scan.zip")
scan.save()
monkeypatch.setattr(
glob,
"glob",
lambda p: [str(old_file), str(latest_file)],
)
url = reverse(
"scan-compliance", kwargs={"pk": scan.id, "name": "cis_1.4_aws"}
)
resp = authenticated_client.get(url)
assert resp.status_code == status.HTTP_200_OK
assert resp.content == b"latest"
assert resp["Content-Disposition"].endswith(
f'filename="{latest_file.name}"'
)
def test_compliance_s3_not_found(
self, authenticated_client, scans_fixture, monkeypatch
):
@@ -4294,18 +4383,24 @@ class TestScanViewSet:
assert cd.startswith('attachment; filename="')
assert cd.endswith(f'filename="{fname.name}"')
@patch("api.v1.views.Task.objects.get")
@patch("api.v1.views.TaskSerializer")
def test__get_task_status_returns_none_if_task_not_executing(
self, mock_task_serializer, mock_task_get, authenticated_client, scans_fixture
self, mock_task_serializer, authenticated_client, scans_fixture
):
scan = scans_fixture[0]
scan.state = StateChoices.COMPLETED
scan.output_location = "dummy"
scan.save()
task = Task.objects.create(tenant_id=scan.tenant_id)
mock_task_get.return_value = task
task_result = TaskResult.objects.create(
task_id=str(uuid4()),
task_name="scan-report",
task_kwargs={"scan_id": str(scan.id)},
)
task = Task.objects.create(
tenant_id=scan.tenant_id,
task_runner_task=task_result,
)
mock_task_serializer.return_value.data = {
"id": str(task.id),
"state": StateChoices.COMPLETED,
@@ -4326,6 +4421,7 @@ class TestScanViewSet:
scan.save()
task_result = TaskResult.objects.create(
task_id=str(uuid4()),
task_name="scan-report",
task_kwargs={"scan_id": str(scan.id)},
)
@@ -4346,6 +4442,51 @@ class TestScanViewSet:
assert response.status_code == status.HTTP_202_ACCEPTED
assert response.data["id"] == str(task.id)
@patch("api.v1.views.TaskSerializer")
def test__get_task_status_returns_latest_task(
self, mock_task_serializer, authenticated_client, scans_fixture
):
"""With several scan-report tasks for the scan, the most recent is used."""
scan = scans_fixture[0]
scan.state = StateChoices.COMPLETED
scan.output_location = "dummy"
scan.save()
old_task = Task.objects.create(
tenant_id=scan.tenant_id,
task_runner_task=TaskResult.objects.create(
task_id=str(uuid4()),
task_name="scan-report",
task_kwargs={"scan_id": str(scan.id)},
),
)
new_task = Task.objects.create(
tenant_id=scan.tenant_id,
task_runner_task=TaskResult.objects.create(
task_id=str(uuid4()),
task_name="scan-report",
task_kwargs={"scan_id": str(scan.id)},
),
)
# `inserted_at` is `auto_now_add`, and within the test transaction the DB
# `now()` is constant, so force distinct timestamps to make order_by stable.
base = datetime(2024, 1, 1, tzinfo=timezone.utc)
Task.objects.filter(pk=old_task.pk).update(inserted_at=base)
Task.objects.filter(pk=new_task.pk).update(
inserted_at=base + timedelta(hours=1)
)
mock_task_serializer.side_effect = lambda instance, *a, **k: SimpleNamespace(
data={"id": str(instance.id), "state": StateChoices.EXECUTING}
)
url = reverse("scan-report", kwargs={"pk": scan.id})
response = authenticated_client.get(url)
assert response.status_code == status.HTTP_202_ACCEPTED
assert str(new_task.id) in response["Content-Location"]
assert str(old_task.id) not in response["Content-Location"]
@patch("api.v1.views.get_s3_client")
@patch("api.v1.views.sentry_sdk.capture_exception")
def test_compliance_list_objects_client_error(
@@ -6916,6 +7057,80 @@ class TestFindingViewSet:
== findings_fixture[0].status
)
def test_findings_list_resource_tags_no_n_plus_one(
self, authenticated_client, findings_fixture
):
"""Listing findings must load every resource's tags in a constant
number of queries, no matter how many findings/resources are returned.
This guards ``FindingViewSet._optimize_tags_loading`` against
regressions that would reintroduce one extra query per resource (the
N+1 the prefetch was added to remove).
"""
scan = findings_fixture[0].scan
tenant_id = findings_fixture[0].tenant_id
provider = scan.provider
def _create_finding_with_tagged_resource(index):
resource = Resource.objects.create(
tenant_id=tenant_id,
provider=provider,
uid=f"arn:aws:ec2:us-east-1:123456789012:instance/n-plus-one-{index}",
name=f"N+1 Instance {index}",
region="us-east-1",
service="ec2",
type="prowler-test",
)
resource.upsert_or_delete_tags(
[
ResourceTag.objects.create(
tenant_id=tenant_id,
key=f"key-{index}",
value=f"value-{index}",
)
]
)
finding = Finding.objects.create(
tenant_id=tenant_id,
uid=f"n_plus_one_finding_{index}",
scan=scan,
status=Status.FAIL,
status_extended="n+1 status",
impact=Severity.medium,
severity=Severity.medium,
check_id="test_check_id",
check_metadata={"CheckId": "test_check_id", "servicename": "ec2"},
first_seen_at="2024-01-02T00:00:00Z",
)
finding.add_resources([resource])
return finding
params = {"filter[inserted_at]": TODAY, "include": "resources"}
# Baseline: the two findings provided by the fixture.
with CaptureQueriesContext(connection) as baseline:
response = authenticated_client.get(reverse("finding-list"), params)
assert response.status_code == status.HTTP_200_OK
# Add more findings, each with its own resource carrying tags.
extra_findings = 5
for index in range(extra_findings):
_create_finding_with_tagged_resource(index)
with CaptureQueriesContext(connection) as scaled:
response = authenticated_client.get(reverse("finding-list"), params)
assert response.status_code == status.HTTP_200_OK
assert len(response.json()["data"]) == len(findings_fixture) + extra_findings
# The query count must not grow with the number of findings/resources.
assert len(scaled.captured_queries) == len(baseline.captured_queries), (
"Resource tags are not being prefetched: "
f"{len(baseline.captured_queries)} queries for {len(findings_fixture)} "
f"findings vs {len(scaled.captured_queries)} for "
f"{len(findings_fixture) + extra_findings}. Likely an N+1 regression "
"in FindingViewSet._optimize_tags_loading."
)
@pytest.mark.parametrize(
"include_values, expected_resources",
[
@@ -9345,6 +9560,16 @@ class TestComplianceOverviewViewSet:
assert "platforms" in attributes["attributes"]["technique_details"]
assert "technique_url" in attributes["attributes"]["technique_details"]
# Guard against the `_raw_attributes` wrapper leaking through —
# the UI reads metadata[i].Category / .AWSService directly.
metadata = attributes["attributes"]["metadata"]
assert isinstance(metadata, list) and len(metadata) > 0
first_attr = metadata[0]
assert isinstance(first_attr, dict)
assert "_raw_attributes" not in first_attr
assert "Category" in first_attr
assert "AWSService" in first_attr
def test_compliance_overview_attributes_missing_compliance_id(
self, authenticated_client
):
+39 -149
View File
@@ -1,7 +1,6 @@
from __future__ import annotations
from datetime import datetime, timezone
from typing import TYPE_CHECKING
from allauth.socialaccount.providers.oauth2.client import OAuth2Client
from django.contrib.postgres.aggregates import ArrayAgg
@@ -17,30 +16,7 @@ from prowler.lib.outputs.jira.jira import Jira, JiraBasicAuthError
from prowler.providers.aws.lib.s3.s3 import S3
from prowler.providers.aws.lib.security_hub.security_hub import SecurityHub
from prowler.providers.common.models import Connection
if TYPE_CHECKING:
from prowler.providers.alibabacloud.alibabacloud_provider import (
AlibabacloudProvider,
)
from prowler.providers.aws.aws_provider import AwsProvider
from prowler.providers.azure.azure_provider import AzureProvider
from prowler.providers.cloudflare.cloudflare_provider import CloudflareProvider
from prowler.providers.gcp.gcp_provider import GcpProvider
from prowler.providers.github.github_provider import GithubProvider
from prowler.providers.googleworkspace.googleworkspace_provider import (
GoogleworkspaceProvider,
)
from prowler.providers.iac.iac_provider import IacProvider
from prowler.providers.image.image_provider import ImageProvider
from prowler.providers.kubernetes.kubernetes_provider import KubernetesProvider
from prowler.providers.m365.m365_provider import M365Provider
from prowler.providers.mongodbatlas.mongodbatlas_provider import (
MongodbatlasProvider,
)
from prowler.providers.okta.okta_provider import OktaProvider
from prowler.providers.openstack.openstack_provider import OpenstackProvider
from prowler.providers.oraclecloud.oraclecloud_provider import OraclecloudProvider
from prowler.providers.vercel.vercel_provider import VercelProvider
from prowler.providers.common.provider import Provider as SDKProvider
class CustomOAuth2Client(OAuth2Client):
@@ -79,117 +55,26 @@ def merge_dicts(default_dict: dict, replacement_dict: dict) -> dict:
return result
def return_prowler_provider(
provider: Provider,
) -> (
AlibabacloudProvider
| AwsProvider
| AzureProvider
| CloudflareProvider
| GcpProvider
| GithubProvider
| GoogleworkspaceProvider
| IacProvider
| ImageProvider
| KubernetesProvider
| M365Provider
| MongodbatlasProvider
| OktaProvider
| OpenstackProvider
| OraclecloudProvider
| VercelProvider
):
"""Return the Prowler provider class based on the given provider type.
def return_prowler_provider(provider: Provider) -> type:
"""Resolve the Prowler provider class for the given provider.
The class is resolved dynamically from the SDK, so any provider the SDK
discovers built-in or external entry-point works without a per-provider
branch in the API.
Args:
provider (Provider): The provider object containing the provider type and associated secrets.
provider (Provider): The provider whose `provider` type to resolve.
Returns:
AlibabacloudProvider | AwsProvider | AzureProvider | CloudflareProvider | GcpProvider | GithubProvider | GoogleworkspaceProvider | IacProvider | ImageProvider | KubernetesProvider | M365Provider | MongodbatlasProvider | OpenstackProvider | OraclecloudProvider: The corresponding provider class.
type: The Prowler provider class.
Raises:
ValueError: If the provider type specified in `provider.provider` is not supported.
ValueError: If the provider type is not available in the SDK.
"""
match provider.provider:
case Provider.ProviderChoices.AWS.value:
from prowler.providers.aws.aws_provider import AwsProvider
prowler_provider = AwsProvider
case Provider.ProviderChoices.GCP.value:
from prowler.providers.gcp.gcp_provider import GcpProvider
prowler_provider = GcpProvider
case Provider.ProviderChoices.GOOGLEWORKSPACE.value:
from prowler.providers.googleworkspace.googleworkspace_provider import (
GoogleworkspaceProvider,
)
prowler_provider = GoogleworkspaceProvider
case Provider.ProviderChoices.AZURE.value:
from prowler.providers.azure.azure_provider import AzureProvider
prowler_provider = AzureProvider
case Provider.ProviderChoices.KUBERNETES.value:
from prowler.providers.kubernetes.kubernetes_provider import (
KubernetesProvider,
)
prowler_provider = KubernetesProvider
case Provider.ProviderChoices.M365.value:
from prowler.providers.m365.m365_provider import M365Provider
prowler_provider = M365Provider
case Provider.ProviderChoices.GITHUB.value:
from prowler.providers.github.github_provider import GithubProvider
prowler_provider = GithubProvider
case Provider.ProviderChoices.MONGODBATLAS.value:
from prowler.providers.mongodbatlas.mongodbatlas_provider import (
MongodbatlasProvider,
)
prowler_provider = MongodbatlasProvider
case Provider.ProviderChoices.IAC.value:
from prowler.providers.iac.iac_provider import IacProvider
prowler_provider = IacProvider
case Provider.ProviderChoices.ORACLECLOUD.value:
from prowler.providers.oraclecloud.oraclecloud_provider import (
OraclecloudProvider,
)
prowler_provider = OraclecloudProvider
case Provider.ProviderChoices.ALIBABACLOUD.value:
from prowler.providers.alibabacloud.alibabacloud_provider import (
AlibabacloudProvider,
)
prowler_provider = AlibabacloudProvider
case Provider.ProviderChoices.CLOUDFLARE.value:
from prowler.providers.cloudflare.cloudflare_provider import (
CloudflareProvider,
)
prowler_provider = CloudflareProvider
case Provider.ProviderChoices.OPENSTACK.value:
from prowler.providers.openstack.openstack_provider import OpenstackProvider
prowler_provider = OpenstackProvider
case Provider.ProviderChoices.IMAGE.value:
from prowler.providers.image.image_provider import ImageProvider
prowler_provider = ImageProvider
case Provider.ProviderChoices.VERCEL.value:
from prowler.providers.vercel.vercel_provider import VercelProvider
prowler_provider = VercelProvider
case Provider.ProviderChoices.OKTA.value:
from prowler.providers.okta.okta_provider import OktaProvider
prowler_provider = OktaProvider
case _:
raise ValueError(f"Provider type {provider.provider} not supported")
return prowler_provider
try:
return SDKProvider.get_class(provider.provider)
except ImportError as error:
raise ValueError(f"Provider type {provider.provider} not supported") from error
def get_prowler_provider_kwargs(
@@ -204,6 +89,18 @@ def get_prowler_provider_kwargs(
Returns:
dict: The provider kwargs for the corresponding provider class.
"""
# External providers declare their own uid/secret -> kwargs mapping through
# the SDK contract; built-in providers keep the explicit mapping below.
if not SDKProvider.is_builtin(provider.provider):
mutelist_content = (
mutelist_processor.configuration.get("Mutelist", {})
if mutelist_processor
else None
)
return return_prowler_provider(provider).get_scan_arguments(
provider.uid, provider.secret.secret, mutelist_content
)
prowler_provider_kwargs = provider.secret.secret
if provider.provider == Provider.ProviderChoices.AZURE.value:
prowler_provider_kwargs = {
@@ -288,24 +185,7 @@ def get_prowler_provider_kwargs(
def initialize_prowler_provider(
provider: Provider,
mutelist_processor: Processor | None = None,
) -> (
AlibabacloudProvider
| AwsProvider
| AzureProvider
| CloudflareProvider
| GcpProvider
| GithubProvider
| GoogleworkspaceProvider
| IacProvider
| ImageProvider
| KubernetesProvider
| M365Provider
| MongodbatlasProvider
| OktaProvider
| OpenstackProvider
| OraclecloudProvider
| VercelProvider
):
) -> SDKProvider:
"""Initialize a Prowler provider instance based on the given provider type.
Args:
@@ -313,8 +193,8 @@ def initialize_prowler_provider(
mutelist_processor (Processor): The mutelist processor object containing the mutelist configuration.
Returns:
AlibabacloudProvider | AwsProvider | AzureProvider | CloudflareProvider | GcpProvider | GithubProvider | GoogleworkspaceProvider | IacProvider | ImageProvider | KubernetesProvider | M365Provider | MongodbatlasProvider | OpenstackProvider | OraclecloudProvider: An instance of the corresponding provider class
initialized with the provider's secrets.
SDKProvider: An instance of the corresponding provider class initialized
with the provider's secrets.
"""
prowler_provider = return_prowler_provider(provider)
prowler_provider_kwargs = get_prowler_provider_kwargs(provider, mutelist_processor)
@@ -337,6 +217,16 @@ def prowler_provider_connection_test(provider: Provider) -> Connection:
except Provider.secret.RelatedObjectDoesNotExist as secret_error:
return Connection(is_connected=False, error=secret_error)
# External providers declare their own connection kwargs through the SDK
# contract; built-in providers keep the explicit mapping below.
if not SDKProvider.is_builtin(provider.provider):
return prowler_provider.test_connection(
**prowler_provider.get_connection_arguments(
provider.uid, prowler_provider_kwargs
),
raise_on_exception=False,
)
# For IaC provider, construct the kwargs properly for test_connection
if provider.provider == Provider.ProviderChoices.IAC.value:
# Don't pass repository_url from secret, use scan_repository_url with the UID
+61 -1
View File
@@ -10,6 +10,7 @@ from django.core.exceptions import ValidationError as DjangoValidationError
from django.db import IntegrityError
from drf_spectacular.utils import extend_schema_field
from jwt.exceptions import InvalidKeyError
from pydantic import ValidationError as PydanticValidationError
from rest_framework.reverse import reverse
from rest_framework.validators import UniqueTogetherValidator
from rest_framework_json_api import serializers
@@ -71,8 +72,10 @@ from api.v1.serializer_utils.lighthouse import (
OpenAICredentialsSerializer,
)
from api.v1.serializer_utils.processors import ProcessorConfigField
from api.provider_types import get_provider_type_choices
from api.v1.serializer_utils.providers import ProviderSecretField
from prowler.lib.mutelist.mutelist import Mutelist
from prowler.providers.common.provider import Provider as SDKProvider
# Base
@@ -854,7 +857,9 @@ class ProviderGroupMembershipSerializer(RLSSerializer, BaseWriteSerializer):
# Providers
class ProviderEnumSerializerField(serializers.ChoiceField):
def __init__(self, **kwargs):
kwargs["choices"] = Provider.ProviderChoices.choices
# Accepted values track the SDK's installed providers (built-in or
# external), shared with the filters via one cached source.
kwargs["choices"] = get_provider_type_choices()
super().__init__(**kwargs)
@@ -940,6 +945,12 @@ class ProviderIncludeSerializer(RLSSerializer):
class ProviderCreateSerializer(RLSSerializer, BaseWriteSerializer):
# Declared explicitly so provider validation stays at the serializer layer:
# the model column is now a plain varchar with no choices, so without this
# an unknown provider would slip through to Provider.clean() instead of
# being rejected here with `invalid_choice`.
provider = ProviderEnumSerializerField()
class Meta:
model = Provider
fields = [
@@ -1532,10 +1543,59 @@ class FindingMetadataSerializer(BaseSerializerV1):
# Provider secrets
class BaseWriteProviderSecretSerializer(BaseWriteSerializer):
@staticmethod
def _validate_external_provider_secret(
provider_type: str, secret_type: str, secret: dict
):
"""Validate a non-built-in provider's secret against the schema it
declares for the given secret type through the SDK contract.
The provider maps each secret type to one model, so the chosen
secret_type stays bound to the shape it claims. Providers that declare
no schema have their secret accepted as an object and validated by the
provider's ``test_connection``.
"""
if not isinstance(secret, dict):
raise serializers.ValidationError({"secret": ["Must be a JSON object."]})
schemas = SDKProvider.get_class(provider_type).get_credentials_schema()
if not schemas:
return
schema = schemas.get(secret_type)
if schema is None:
raise serializers.ValidationError(
{
"secret_type": [
f"'{secret_type}' is not supported by provider "
f"'{provider_type}'. Supported types: "
f"{', '.join(sorted(schemas))}."
]
}
)
try:
schema.model_validate(secret)
except PydanticValidationError as error:
raise serializers.ValidationError(
{
"secret": [
f"{'/'.join(str(loc) for loc in item['loc']) or 'secret'}: "
f"{item['msg']}"
for item in error.errors()
]
}
)
@staticmethod
def validate_secret_based_on_provider(
provider_type: str, secret_type: ProviderSecret.TypeChoices, secret: dict
):
# External providers validate against the schemas they declare via the
# SDK contract; built-in providers keep their explicit serializers below.
if not SDKProvider.is_builtin(provider_type):
BaseWriteProviderSecretSerializer._validate_external_provider_secret(
provider_type, secret_type, secret
)
return
if secret_type == ProviderSecret.TypeChoices.STATIC:
if provider_type == Provider.ProviderChoices.AWS.value:
serializer = AwsProviderSecret(data=secret)
+141 -57
View File
@@ -116,6 +116,7 @@ from api.base_views import BaseRLSViewSet, BaseTenantViewset, BaseUserViewset
from api.compliance import (
PROWLER_COMPLIANCE_OVERVIEW_TEMPLATE,
get_compliance_frameworks,
get_prowler_provider_compliance,
)
from api.constants import SEVERITY_ORDER
from api.db_router import MainRouter
@@ -324,6 +325,7 @@ from prowler.providers.aws.exceptions.exceptions import (
from prowler.providers.aws.lib.cloudtrail_timeline.cloudtrail_timeline import (
CloudTrailTimeline,
)
from prowler.providers.common.provider import Provider as SDKProvider
logger = logging.getLogger(BackendLogger.API)
@@ -1849,7 +1851,42 @@ class ProviderViewSet(DisablePaginationMixin, BaseRLSViewSet):
200: OpenApiResponse(
description="CSV file containing the compliance report"
),
404: OpenApiResponse(description="Compliance report not found"),
202: OpenApiResponse(description="The task is in progress"),
403: OpenApiResponse(description="There is a problem with credentials"),
404: OpenApiResponse(
description="Compliance report not found, or the scan has no reports yet"
),
},
request=None,
),
compliance_ocsf=extend_schema(
tags=["Scan"],
summary="Retrieve compliance report as OCSF JSON",
description=(
"Download a specific compliance report as an OCSF JSON file. "
"Only universal frameworks that declare an output configuration "
"produce this artifact (currently 'dora' and 'csa_ccm_4.0'); any "
"other framework returns 404."
),
parameters=[
OpenApiParameter(
name="name",
type=str,
location=OpenApiParameter.PATH,
required=True,
description="The compliance report name, like 'dora'",
),
],
responses={
200: OpenApiResponse(
description="OCSF JSON file containing the compliance report"
),
202: OpenApiResponse(description="The task is in progress"),
403: OpenApiResponse(description="There is a problem with credentials"),
404: OpenApiResponse(
description="Compliance report not found, the framework does "
"not provide an OCSF export, or the scan has no reports yet"
),
},
request=None,
),
@@ -1992,35 +2029,23 @@ class ScanViewSet(BaseRLSViewSet):
return queryset.select_related("provider", "task")
def get_serializer_class(self):
if self.action == "create":
if hasattr(self, "response_serializer_class"):
return self.response_serializer_class
return ScanCreateSerializer
elif self.action == "partial_update":
if self.action == "partial_update":
return ScanUpdateSerializer
elif self.action == "report":
if hasattr(self, "response_serializer_class"):
return self.response_serializer_class
return ScanReportSerializer
elif self.action == "compliance":
if hasattr(self, "response_serializer_class"):
return self.response_serializer_class
return ScanComplianceReportSerializer
elif self.action == "threatscore":
if hasattr(self, "response_serializer_class"):
return self.response_serializer_class
elif self.action == "ens":
if hasattr(self, "response_serializer_class"):
return self.response_serializer_class
elif self.action == "nis2":
if hasattr(self, "response_serializer_class"):
return self.response_serializer_class
elif self.action == "csa":
if hasattr(self, "response_serializer_class"):
return self.response_serializer_class
elif self.action == "cis":
action_defaults = {
"create": ScanCreateSerializer,
"report": ScanReportSerializer,
"compliance": ScanComplianceReportSerializer,
"compliance_ocsf": ScanComplianceReportSerializer,
}
response_only_actions = {"threatscore", "ens", "nis2", "csa", "cis"}
if self.action in action_defaults or self.action in response_only_actions:
if hasattr(self, "response_serializer_class"):
return self.response_serializer_class
if self.action in action_defaults:
return action_defaults[self.action]
return super().get_serializer_class()
def partial_update(self, request, *args, **kwargs):
@@ -2059,12 +2084,17 @@ class ScanViewSet(BaseRLSViewSet):
if scan_instance.state == StateChoices.EXECUTING and scan_instance.task:
task = scan_instance.task
else:
try:
task = Task.objects.get(
# A scan can have several `scan-report` tasks (e.g. re-runs); take the
# most recent one. `.first()` also avoids `MultipleObjectsReturned`.
task = (
Task.objects.filter(
task_runner_task__task_name="scan-report",
task_runner_task__task_kwargs__contains=str(scan_instance.id),
)
except Task.DoesNotExist:
.order_by("-inserted_at")
.first()
)
if task is None:
return None
self.response_serializer_class = TaskSerializer
@@ -2139,27 +2169,32 @@ class ScanViewSet(BaseRLSViewSet):
status=status.HTTP_502_BAD_GATEWAY,
)
contents = resp.get("Contents", [])
keys = []
matches = []
for obj in contents:
key = obj["Key"]
key_basename = os.path.basename(key)
if any(ch in suffix for ch in ("*", "?", "[")):
if fnmatch.fnmatch(key_basename, suffix):
keys.append(key)
matches.append(obj)
elif key_basename == suffix:
keys.append(key)
matches.append(obj)
elif key.endswith(suffix):
# Backward compatibility if suffix already includes directories
keys.append(key)
if not keys:
matches.append(obj)
if not matches:
return Response(
{
"detail": f"No compliance file found for name '{os.path.splitext(suffix)[0]}'."
},
status=status.HTTP_404_NOT_FOUND,
)
# path_pattern here is prefix, but in compliance we build correct suffix check before
key = keys[0]
# Return the most recently modified match (latest report) when
# several files share the prefix/suffix. `list_objects_v2` always
# returns `LastModified`; the fallback keeps ordering deterministic
# if it is ever absent.
key = max(matches, key=lambda o: (o.get("LastModified", ""), o["Key"]))[
"Key"
]
else:
# path_pattern is exact key; HEAD before presigning to preserve the 404 contract.
key = path_pattern
@@ -2209,7 +2244,9 @@ class ScanViewSet(BaseRLSViewSet):
},
status=status.HTTP_404_NOT_FOUND,
)
filepath = files[0]
# Return the most recently modified match (latest report) when the
# pattern resolves to several files.
filepath = max(files, key=os.path.getmtime)
with open(filepath, "rb") as f:
content = f.read()
filename = os.path.basename(filepath)
@@ -2257,20 +2294,16 @@ class ScanViewSet(BaseRLSViewSet):
content, filename = loader
return self._serve_file(content, filename, "application/x-zip-compressed")
@action(
detail=True,
methods=["get"],
url_path="compliance/(?P<name>[^/]+)",
url_name="compliance",
)
def compliance(self, request, pk=None, name=None):
scan = self.get_object()
if name not in get_compliance_frameworks(scan.provider.provider):
return Response(
{"detail": f"Compliance '{name}' not found."},
status=status.HTTP_404_NOT_FOUND,
)
def _serve_compliance_artifact(self, scan, name, file_extension, content_type):
"""Resolve and serve a per-framework compliance artifact from disk/S3.
Shared by the CSV and OCSF compliance download actions. Both are
path-based (no query params) on purpose: ``get_object`` runs
``filter_queryset``, which triggers JSON:API's
``QueryParameterValidationFilter`` and 400s on any non-JSON:API
query param, so a ``?format=`` / ``?type=`` selector is not viable
here the format is encoded in the route instead.
"""
running_resp = self._get_task_status(scan)
if running_resp:
return running_resp
@@ -2287,25 +2320,66 @@ class ScanViewSet(BaseRLSViewSet):
bucket = env.str("DJANGO_OUTPUT_S3_AWS_OUTPUT_BUCKET", "")
key_prefix = scan.output_location.removeprefix(f"s3://{bucket}/")
prefix = os.path.join(
os.path.dirname(key_prefix), "compliance", f"{name}.csv"
os.path.dirname(key_prefix), "compliance", f"{name}.{file_extension}"
)
loader = self._load_file(
prefix,
s3=True,
bucket=bucket,
list_objects=True,
content_type="text/csv",
content_type=content_type,
)
else:
base = os.path.dirname(scan.output_location)
pattern = os.path.join(base, "compliance", f"*_{name}.csv")
pattern = os.path.join(base, "compliance", f"*_{name}.{file_extension}")
loader = self._load_file(pattern, s3=False)
if isinstance(loader, HttpResponseBase):
return loader
content, filename = loader
return self._serve_file(content, filename, "text/csv")
return self._serve_file(content, filename, content_type)
@action(
detail=True,
methods=["get"],
url_path="compliance/(?P<name>[^/]+)",
url_name="compliance",
)
def compliance(self, request, pk=None, name=None):
scan = self.get_object()
if name not in get_compliance_frameworks(scan.provider.provider):
return Response(
{"detail": f"Compliance '{name}' not found."},
status=status.HTTP_404_NOT_FOUND,
)
return self._serve_compliance_artifact(scan, name, "csv", "text/csv")
@action(
detail=True,
methods=["get"],
url_path="compliance/(?P<name>[^/]+)/ocsf",
url_name="compliance-ocsf",
)
def compliance_ocsf(self, request, pk=None, name=None):
scan = self.get_object()
if name not in get_compliance_frameworks(scan.provider.provider):
return Response(
{"detail": f"Compliance '{name}' not found."},
status=status.HTTP_404_NOT_FOUND,
)
universal_bulk = get_prowler_provider_compliance(scan.provider.provider)
framework_obj = universal_bulk.get(name)
if not (framework_obj and getattr(framework_obj, "outputs", None)):
return Response(
{"detail": f"Compliance '{name}' does not provide an OCSF export."},
status=status.HTTP_404_NOT_FOUND,
)
return self._serve_compliance_artifact(
scan, name, "ocsf.json", "application/json"
)
@action(
detail=True,
@@ -3749,6 +3823,16 @@ class FindingViewSet(PaginateByPkMixin, BaseRLSViewSet):
return queryset
return super().filter_queryset(queryset)
def _optimize_tags_loading(self, queryset):
"""Prefetch resource tags to avoid N+1 queries when serializing findings"""
return queryset.prefetch_related(
Prefetch(
"resources__tags",
queryset=ResourceTag.objects.filter(tenant_id=self.request.tenant_id),
to_attr="prefetched_tags",
)
)
def list(self, request, *args, **kwargs):
filtered_queryset = self.filter_queryset(self.get_queryset())
return self.paginate_by_pk(
@@ -4993,7 +5077,7 @@ class ComplianceOverviewViewSet(BaseRLSViewSet, TaskManagementMixin):
# If we couldn't determine from database, try each provider type
if not provider_type:
for pt in Provider.ProviderChoices.values:
for pt in SDKProvider.get_available_providers():
if compliance_id in get_compliance_frameworks(pt):
provider_type = pt
break
@@ -5333,7 +5417,7 @@ class OverviewViewSet(BaseRLSViewSet):
"""Extract and validate provider filters from query params."""
params = self.request.query_params
filters = {}
valid_provider_types = {c[0] for c in Provider.ProviderChoices.choices}
valid_provider_types = set(SDKProvider.get_available_providers())
provider_id = params.get("filter[provider_id]")
if provider_id:
+55
View File
@@ -26,6 +26,61 @@ celery_app.conf.result_backend_transport_options = {
}
celery_app.conf.visibility_timeout = BROKER_VISIBILITY_TIMEOUT
# Durable delivery: keep the message until the task finishes, so a worker killed
# mid-task (deploy/OOM/eviction) does not silently drop it. Reserve one task at a
# time so a crash exposes at most one extra reserved message.
celery_app.conf.task_acks_late = True
celery_app.conf.task_reject_on_worker_lost = True
celery_app.conf.worker_prefetch_multiplier = env.int(
"DJANGO_CELERY_WORKER_PREFETCH_MULTIPLIER", default=1
)
# On SIGTERM, give the worker time to finish or re-queue in-flight tasks before
# it is forcefully killed (Celery 5.5+ soft shutdown).
celery_app.conf.worker_soft_shutdown_timeout = env.int(
"DJANGO_CELERY_WORKER_SOFT_SHUTDOWN_TIMEOUT", default=60
)
# Bound execution so a blocked task cannot pin a worker forever. Connection
# checks get a tight limit; scans and provider/tenant deletions can legitimately
# run for more than a day on large tenants, so they get a much higher cap.
# The default for every other task is set as the global limit, not as a "*"
# annotation: Celery applies the "*" entry AFTER the per-task one, so a "*" in
# task_annotations would silently overwrite every specific limit defined below.
_TASK_HARD_LIMIT = env.int("DJANGO_CELERY_TASK_TIME_LIMIT", default=6 * 60 * 60)
_TASK_SOFT_LIMIT = env.int(
"DJANGO_CELERY_TASK_SOFT_TIME_LIMIT", default=_TASK_HARD_LIMIT - 600
)
_LONG_TASK_HARD_LIMIT = env.int(
"DJANGO_CELERY_LONG_TASK_TIME_LIMIT", default=48 * 60 * 60
)
_LONG_TASK_SOFT_LIMIT = env.int(
"DJANGO_CELERY_LONG_TASK_SOFT_TIME_LIMIT", default=_LONG_TASK_HARD_LIMIT - 600
)
celery_app.conf.task_time_limit = _TASK_HARD_LIMIT
celery_app.conf.task_soft_time_limit = _TASK_SOFT_LIMIT
celery_app.conf.task_annotations = {
**{
name: {"soft_time_limit": 60, "time_limit": 120}
for name in (
"provider-connection-check",
"integration-connection-check",
"lighthouse-connection-check",
"lighthouse-provider-connection-check",
)
},
**{
name: {
"soft_time_limit": _LONG_TASK_SOFT_LIMIT,
"time_limit": _LONG_TASK_HARD_LIMIT,
}
for name in (
"scan-perform",
"scan-perform-scheduled",
"provider-deletion",
"tenant-deletion",
)
},
}
celery_app.autodiscover_tasks(["api"])
@@ -1,12 +1,14 @@
from datetime import datetime, timedelta, timezone
from celery import current_app, states
from celery import states
from celery.utils.log import get_task_logger
from config.django.base import ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES
from tasks.jobs.attack_paths.db_utils import (
_mark_scan_finished,
recover_graph_data_ready,
)
from tasks.jobs.orphan_recovery import is_worker_alive as _is_worker_alive
from tasks.jobs.orphan_recovery import revoke_task as _revoke_task
from api.attack_paths import database as graph_database
from api.db_router import MainRouter
@@ -150,32 +152,6 @@ def _cleanup_stale_scheduled_scans(cutoff: datetime) -> list[str]:
return cleaned_up
def _is_worker_alive(worker: str) -> bool:
"""Ping a specific Celery worker. Returns `True` if it responds or on error."""
try:
response = current_app.control.inspect(destination=[worker], timeout=1.0).ping()
return response is not None and worker in response
except Exception:
logger.exception(f"Failed to ping worker {worker}, treating as alive")
return True
def _revoke_task(task_result, terminate: bool = True) -> None:
"""Revoke a Celery task. Non-fatal on failure.
`terminate=True` SIGTERMs the worker if the task is mid-execution; use
for EXECUTING cleanup. `terminate=False` only marks the task id revoked
across workers, so any worker pulling the queued message discards it;
use for SCHEDULED cleanup where the task hasn't run yet.
"""
try:
kwargs = {"terminate": True, "signal": "SIGTERM"} if terminate else {}
current_app.control.revoke(task_result.task_id, **kwargs)
logger.info(f"Revoked task {task_result.task_id}")
except Exception:
logger.exception(f"Failed to revoke task {task_result.task_id}")
def _cleanup_scan(scan, task_result, reason: str) -> bool:
"""
Clean up a single stale `AttackPathsScan`:
+9
View File
@@ -11,6 +11,7 @@ from api.db_utils import batch_delete, rls_transaction
from api.models import (
AttackPathsScan,
Finding,
JiraIssueDispatch,
Provider,
ProviderComplianceScore,
Resource,
@@ -80,6 +81,14 @@ def delete_provider(tenant_id: str, pk: str):
deletion_steps = [
("Scan Summaries", ScanSummary.all_objects.filter(scan__provider=instance)),
(
"Jira Issue Dispatches",
JiraIssueDispatch.objects.filter(
finding_id__in=Finding.all_objects.filter(
scan__provider=instance
).values_list("id", flat=True)
),
),
("Findings", Finding.all_objects.filter(scan__provider=instance)),
("Resources", Resource.all_objects.filter(provider=instance)),
("Scans", Scan.all_objects.filter(provider=instance)),
-10
View File
@@ -39,11 +39,6 @@ from prowler.lib.outputs.compliance.cis.cis_oraclecloud import OracleCloudCIS
from prowler.lib.outputs.compliance.cisa_scuba.cisa_scuba_googleworkspace import (
GoogleWorkspaceCISASCuBA,
)
from prowler.lib.outputs.compliance.csa.csa_alibabacloud import AlibabaCloudCSA
from prowler.lib.outputs.compliance.csa.csa_aws import AWSCSA
from prowler.lib.outputs.compliance.csa.csa_azure import AzureCSA
from prowler.lib.outputs.compliance.csa.csa_gcp import GCPCSA
from prowler.lib.outputs.compliance.csa.csa_oraclecloud import OracleCloudCSA
from prowler.lib.outputs.compliance.ens.ens_aws import AWSENS
from prowler.lib.outputs.compliance.ens.ens_azure import AzureENS
from prowler.lib.outputs.compliance.ens.ens_gcp import GCPENS
@@ -102,7 +97,6 @@ COMPLIANCE_CLASS_MAP = {
(lambda name: name == "prowler_threatscore_aws", ProwlerThreatScoreAWS),
(lambda name: name.startswith("ccc_"), CCC_AWS),
(lambda name: name.startswith("c5_"), AWSC5),
(lambda name: name.startswith("csa_"), AWSCSA),
(lambda name: name == "asd_essential_eight_aws", ASDEssentialEightAWS),
],
"azure": [
@@ -113,7 +107,6 @@ COMPLIANCE_CLASS_MAP = {
(lambda name: name.startswith("ccc_"), CCC_Azure),
(lambda name: name == "prowler_threatscore_azure", ProwlerThreatScoreAzure),
(lambda name: name == "c5_azure", AzureC5),
(lambda name: name.startswith("csa_"), AzureCSA),
],
"gcp": [
(lambda name: name.startswith("cis_"), GCPCIS),
@@ -123,7 +116,6 @@ COMPLIANCE_CLASS_MAP = {
(lambda name: name == "prowler_threatscore_gcp", ProwlerThreatScoreGCP),
(lambda name: name.startswith("ccc_"), CCC_GCP),
(lambda name: name == "c5_gcp", GCPC5),
(lambda name: name.startswith("csa_"), GCPCSA),
],
"kubernetes": [
(lambda name: name.startswith("cis_"), KubernetesCIS),
@@ -152,11 +144,9 @@ COMPLIANCE_CLASS_MAP = {
"image": [],
"oraclecloud": [
(lambda name: name.startswith("cis_"), OracleCloudCIS),
(lambda name: name.startswith("csa_"), OracleCloudCSA),
],
"alibabacloud": [
(lambda name: name.startswith("cis_"), AlibabaCloudCIS),
(lambda name: name.startswith("csa_"), AlibabaCloudCSA),
(
lambda name: name == "prowler_threatscore_alibabacloud",
ProwlerThreatScoreAlibaba,
+100 -51
View File
@@ -9,7 +9,7 @@ from tasks.utils import batched
from api.db_router import READ_REPLICA_ALIAS, MainRouter
from api.db_utils import REPLICA_MAX_ATTEMPTS, REPLICA_RETRY_BASE_DELAY, rls_transaction
from api.models import Finding, Integration, Provider
from api.models import Finding, Integration, JiraIssueDispatch, Provider
from api.utils import initialize_prowler_integration, initialize_prowler_provider
from prowler.lib.outputs.asff.asff import ASFF
from prowler.lib.outputs.compliance.generic.generic import GenericCompliance
@@ -482,66 +482,115 @@ def send_findings_to_jira(
with rls_transaction(tenant_id):
integration = Integration.objects.get(id=integration_id)
jira_integration = initialize_prowler_integration(integration)
# Idempotency: findings already ticketed for this integration must not be
# sent again on a re-run (e.g. orphan recovery), to avoid duplicate issues
already_sent = {
str(fid)
for fid in JiraIssueDispatch.objects.filter(
integration_id=integration_id, finding_id__in=finding_ids
).values_list("finding_id", flat=True)
}
num_tickets_created = 0
skipped_count = 0
for finding_id in finding_ids:
if str(finding_id) in already_sent:
skipped_count += 1
continue
# Reserve the finding BEFORE the external call. The unique constraint on
# (tenant, integration, finding) makes the dispatch row the single source of
# truth, so a concurrent run or a retry that raced past the bulk pre-check
# cannot create a duplicate issue: created=False means another run already
# claimed it. The reservation is released below if the send does not succeed.
with rls_transaction(tenant_id):
finding_instance = (
Finding.all_objects.select_related("scan__provider")
.prefetch_related("resources")
.get(id=finding_id)
_, created = JiraIssueDispatch.objects.get_or_create(
tenant_id=tenant_id,
integration_id=integration_id,
finding_id=finding_id,
)
if not created:
skipped_count += 1
continue
# Extract resource information
resource = (
finding_instance.resources.first()
if finding_instance.resources.exists()
else None
)
resource_uid = resource.uid if resource else ""
resource_name = resource.name if resource else ""
resource_tags = {}
if resource and hasattr(resource, "tags"):
resource_tags = resource.get_tags(tenant_id)
sent = False
try:
with rls_transaction(tenant_id):
finding_instance = (
Finding.all_objects.select_related("scan__provider")
.prefetch_related("resources")
.get(id=finding_id)
)
# Get region
region = resource.region if resource and resource.region else ""
# Extract resource information
resource = (
finding_instance.resources.first()
if finding_instance.resources.exists()
else None
)
resource_uid = resource.uid if resource else ""
resource_name = resource.name if resource else ""
resource_tags = {}
if resource and hasattr(resource, "tags"):
resource_tags = resource.get_tags(tenant_id)
# Extract remediation information from check_metadata
check_metadata = finding_instance.check_metadata
remediation = check_metadata.get("remediation", {})
recommendation = remediation.get("recommendation", {})
remediation_code = remediation.get("code", {})
# Get region
region = resource.region if resource and resource.region else ""
# Send the individual finding to Jira
result = jira_integration.send_finding(
check_id=finding_instance.check_id,
check_title=check_metadata.get("checktitle", ""),
severity=finding_instance.severity,
status=finding_instance.status,
status_extended=finding_instance.status_extended or "",
provider=finding_instance.scan.provider.provider,
region=region,
resource_uid=resource_uid,
resource_name=resource_name,
risk=check_metadata.get("risk", ""),
recommendation_text=recommendation.get("text", ""),
recommendation_url=recommendation.get("url", ""),
remediation_code_native_iac=remediation_code.get("nativeiac", ""),
remediation_code_terraform=remediation_code.get("terraform", ""),
remediation_code_cli=remediation_code.get("cli", ""),
remediation_code_other=remediation_code.get("other", ""),
resource_tags=resource_tags,
compliance=finding_instance.compliance or {},
project_key=project_key,
issue_type=issue_type,
)
if result:
num_tickets_created += 1
else:
logger.error(f"Failed to send finding {finding_id} to Jira")
# Extract remediation information from check_metadata
check_metadata = finding_instance.check_metadata
remediation = check_metadata.get("remediation", {})
recommendation = remediation.get("recommendation", {})
remediation_code = remediation.get("code", {})
# Send the individual finding to Jira
sent = bool(
jira_integration.send_finding(
check_id=finding_instance.check_id,
check_title=check_metadata.get("checktitle", ""),
severity=finding_instance.severity,
status=finding_instance.status,
status_extended=finding_instance.status_extended or "",
provider=finding_instance.scan.provider.provider,
region=region,
resource_uid=resource_uid,
resource_name=resource_name,
risk=check_metadata.get("risk", ""),
recommendation_text=recommendation.get("text", ""),
recommendation_url=recommendation.get("url", ""),
remediation_code_native_iac=remediation_code.get(
"nativeiac", ""
),
remediation_code_terraform=remediation_code.get(
"terraform", ""
),
remediation_code_cli=remediation_code.get("cli", ""),
remediation_code_other=remediation_code.get("other", ""),
resource_tags=resource_tags,
compliance=finding_instance.compliance or {},
project_key=project_key,
issue_type=issue_type,
)
)
finally:
if not sent:
# Release the reservation so a later run can retry this finding: it
# was not ticketed (send failed or raised), so the row must not block
# a future legitimate send.
with rls_transaction(tenant_id):
JiraIssueDispatch.objects.filter(
tenant_id=tenant_id,
integration_id=integration_id,
finding_id=finding_id,
).delete()
if sent:
num_tickets_created += 1
else:
logger.error(f"Failed to send finding {finding_id} to Jira")
return {
"created_count": num_tickets_created,
"failed_count": len(finding_ids) - num_tickets_created,
"failed_count": len(finding_ids) - num_tickets_created - skipped_count,
"skipped_count": skipped_count,
}
@@ -0,0 +1,397 @@
"""Detect and recover orphaned Celery tasks.
A task is "orphaned" when its result row is non-terminal (STARTED/RECEIVED) but the
worker that was running it is gone (deploy, OOM, eviction). We tell a real orphan
from a still-running task by pinging the worker recorded on its `TaskResult`:
- worker responds -> the task is in flight, leave it alone (never double-run);
- worker is gone -> real orphan: mark the stale result terminal (so pending/started
alerts clear), then re-enqueue the task from its stored name + kwargs.
This recovers only allowlisted tasks with local, proven idempotency. Celery's
`result_extended=True` gives us the stored `task_name`/`task_kwargs`/`worker` once
the task starts, but external side-effect tasks are failed instead of blindly
re-run. A small recovery cap stops a task that repeatedly kills its worker from
looping forever.
This is the shared engine behind both the periodic Beat watchdog and the
`reconcile_orphan_tasks` management command.
"""
import ast
import json
from contextlib import contextmanager
from datetime import datetime, timedelta, timezone
from uuid import uuid4
from celery import current_app, states
from celery.utils.log import get_task_logger
from django.db import connections
logger = get_task_logger(__name__)
# Arbitrary constant key for pg_try_advisory_lock so only one reconciliation
# runs at a time across replicas / the watchdog / the command.
ORPHAN_RECOVERY_LOCK_KEY = 0x70726F77 # "prow"
# Non-terminal states that mean "a worker had this and may have died with it".
IN_FLIGHT_STATES = (states.STARTED, states.RECEIVED)
# Scan tasks are recovered by re-running scan-perform on the EXISTING scan row,
# not by re-enqueuing the original task: re-enqueuing scan-perform-scheduled would
# hit its "a scan is already executing" guard and no-op, leaving the scan stuck.
_SCAN_TASKS = ("scan-perform", "scan-perform-scheduled")
# Tasks with proven idempotency are auto re-enqueued. Scans/summaries clear and
# rewrite their own rows. integration-jira is safe too: each finding is reserved in
# JiraIssueDispatch before the external call, so a re-run skips already-ticketed
# findings (worst case one finding missed on a mid-send crash, never a duplicate).
# Other external side effects stay terminal: integration-s3 rebuilds its upload from
# worker-local files that do not survive a crash, and report/Security Hub recovery is
# out of scope.
REENQUEUEABLE_TASKS = {
*_SCAN_TASKS,
"provider-deletion",
"tenant-deletion",
"scan-summary",
"scan-compliance-overviews",
"scan-provider-compliance-scores",
"scan-daily-severity",
"scan-finding-group-summaries",
"scan-reset-ephemeral-resources",
"integration-jira",
}
# Tasks excluded from generic recovery: attack-paths scans are handled by their own
# stale-cleanup (which also drops the temp Neo4j db), and the maintenance tasks must
# not self-recover (they run again on their own schedule).
_SKIP_RECOVERY = {
"attack-paths-scan-perform",
"attack-paths-cleanup-stale-scans",
"reconcile-orphan-tasks",
}
@contextmanager
def advisory_lock(key: int = ORPHAN_RECOVERY_LOCK_KEY, using: str = "default"):
"""Yield True if this session won a Postgres advisory lock, else False.
Non-blocking: losers get False and should no-op. The lock is released on
exit (and implicitly if the session dies).
"""
with connections[using].cursor() as cursor:
cursor.execute("SELECT pg_try_advisory_lock(%s)", [key])
acquired = bool(cursor.fetchone()[0])
try:
yield acquired
finally:
if acquired:
cursor.execute("SELECT pg_advisory_unlock(%s)", [key])
def is_worker_alive(worker: str, timeout: float = 1.0) -> bool:
"""Ping a specific Celery worker. Returns True if it responds, or on error.
Erring on the side of "alive" means an unreachable control bus never causes
a still-running task to be re-enqueued.
"""
try:
response = current_app.control.inspect(
destination=[worker], timeout=timeout
).ping()
return response is not None and worker in response
except Exception:
logger.exception(f"Failed to ping worker {worker}, treating as alive")
return True
def revoke_task(task_result, terminate: bool = True) -> None:
"""Revoke a Celery task by its TaskResult. Non-fatal on failure.
terminate=True SIGTERMs the worker if the task is mid-execution; terminate=False
only marks the id revoked so any worker pulling the queued message discards it
(use before re-enqueuing, so a later broker redelivery of the stale message is
dropped).
"""
try:
kwargs = {"terminate": True, "signal": "SIGTERM"} if terminate else {}
current_app.control.revoke(task_result.task_id, **kwargs)
logger.info(f"Revoked task {task_result.task_id}")
except Exception:
logger.exception(f"Failed to revoke task {task_result.task_id}")
def _decode_celery_field(value, default):
"""Decode django-celery-results' stored task_args/task_kwargs to a Python object.
The backend stores them as a (sometimes double-encoded) repr/JSON string. An
empty or missing field returns ``default``; a non-empty value that cannot be
decoded raises ``ValueError`` so the caller can avoid re-enqueuing a task with
the wrong arguments.
"""
obj = value
for _ in range(2): # values can be double-encoded (a string holding a repr)
if not isinstance(obj, str):
break
text = obj.strip()
if not text:
return default
parsed = None
for parser in (ast.literal_eval, json.loads):
try:
parsed = parser(text)
break
except (ValueError, SyntaxError, TypeError):
continue
if parsed is None:
raise ValueError(f"undecodable celery field: {text[:120]!r}")
obj = parsed
return default if obj is None else obj
def reconcile_orphans(
grace_minutes: int = 2,
max_attempts: int = 3,
window_hours: int = 6,
dry_run: bool = False,
) -> dict:
"""Run the full orphan sweep under a single-flight advisory lock.
Recovers any orphaned in-flight task and delegates attack-paths scans that
never reached a worker to their existing stale-cleanup. Returns a summary;
a no-op (lock not won) is reported too.
"""
with advisory_lock() as acquired:
if not acquired:
logger.info("Orphan reconcile skipped: another run holds the lock")
return {"acquired": False}
# Populate the task registry so we can re-enqueue any task by name.
import tasks.tasks # noqa: F401
result = _reconcile_task_results(
grace_minutes=grace_minutes,
max_attempts=max_attempts,
window_hours=window_hours,
dry_run=dry_run,
)
if not dry_run:
from tasks.jobs.attack_paths.cleanup import cleanup_stale_attack_paths_scans
result["attack_paths"] = cleanup_stale_attack_paths_scans()
return {"acquired": True, **result}
def _reconcile_task_results(
grace_minutes: int, max_attempts: int, window_hours: int, dry_run: bool
) -> dict:
from django_celery_results.models import TaskResult
cutoff = datetime.now(tz=timezone.utc) - timedelta(minutes=grace_minutes)
candidates = list(
TaskResult.objects.filter(status__in=IN_FLIGHT_STATES, date_created__lt=cutoff)
.exclude(worker__isnull=True)
.exclude(worker="")
.exclude(task_name__in=_SKIP_RECOVERY)
)
# Ping each distinct worker at most once.
worker_alive = {w: is_worker_alive(w) for w in {tr.worker for tr in candidates}}
recovered, failed, skipped = [], [], []
for task_result in candidates:
if worker_alive.get(task_result.worker, True):
skipped.append(task_result.task_id) # in flight, do not double-run
continue
if dry_run:
recovered.append(task_result.task_id)
continue
outcome = _recover_task(task_result, max_attempts, window_hours)
(recovered if outcome == "recovered" else failed).append(task_result.task_id)
logger.info(
"Orphan reconcile: recovered=%d failed=%d skipped(in-flight)=%d",
len(recovered),
len(failed),
len(skipped),
)
return {"recovered": recovered, "failed": failed, "skipped": skipped}
def _recovery_attempt_count(name: str, kwargs_repr, window_hours: int) -> int:
"""Increment and return the recovery count for this (task, kwargs) within the
window. Backed by Valkey so it survives result-row churn (a worker processing
the revoke can blank the TaskResult fields). Fail-open if Valkey is down (the
broker being unreachable means nothing is running anyway).
"""
import hashlib
from django.conf import settings
try:
import redis
client = redis.from_url(settings.CELERY_BROKER_URL)
signature = f"{name}|{kwargs_repr}".encode()
key = (
"orphan-recovery:"
+ hashlib.sha1(signature, usedforsecurity=False).hexdigest()
)
count = client.incr(key)
if count == 1:
client.expire(key, max(1, window_hours) * 3600)
return int(count)
except Exception:
logger.exception("Recovery-attempt counter unavailable; allowing recovery")
return 1
def _recover_task(task_result, max_attempts: int, window_hours: int) -> str:
"""Recover one orphaned task. Returns 'recovered' or 'failed'."""
# Capture name/args/kwargs now: revoking can let a worker blank the row.
name = task_result.task_name
args_repr = task_result.task_args
kwargs_repr = task_result.task_kwargs
now = datetime.now(tz=timezone.utc)
# Drop any future broker redelivery of the stale message.
revoke_task(task_result, terminate=False)
# Mark the stale result terminal so "pending/started forever" alerts clear.
task_result.status = states.REVOKED
task_result.date_done = now
task_result.save(update_fields=["status", "date_done"])
attempt = _recovery_attempt_count(name, kwargs_repr, window_hours)
if name not in REENQUEUEABLE_TASKS or attempt > max_attempts:
reason = (
f"{name} is not allowlisted for auto recovery"
if name not in REENQUEUEABLE_TASKS
else f"recovery cap reached ({attempt}/{max_attempts})"
)
_fail_domain_row(task_result.task_id, name, now)
logger.warning(
"Orphan %s (%s) not re-enqueued: %s", task_result.task_id, name, reason
)
return "failed"
# Scan tasks: re-run the EXISTING scan row directly via scan-perform, so the
# scheduled-scan "already executing" guard cannot turn recovery into a no-op.
# Falls through to the generic path only if no scan is linked yet (e.g. a
# scheduled task that died before creating one), where re-running it creates one.
if name in _SCAN_TASKS:
scan = _scan_for_task(task_result.task_id)
if scan is not None:
if not _reenqueue_scan(task_result.task_id, scan):
return "failed"
logger.info(
"Re-enqueued orphaned scan %s (was task %s)",
scan.id,
task_result.task_id,
)
return "recovered"
task_obj = current_app.tasks.get(name)
if task_obj is None:
logger.error(
"Orphan %s: task %s not registered, cannot re-enqueue",
task_result.task_id,
name,
)
return "failed"
try:
args = _decode_celery_field(args_repr, [])
kwargs = _decode_celery_field(kwargs_repr, {})
except ValueError:
logger.error(
"Orphan %s (%s): could not decode stored args/kwargs, not re-enqueuing",
task_result.task_id,
name,
)
_fail_domain_row(task_result.task_id, name, now)
return "failed"
new_task_id = str(uuid4())
task_obj.apply_async(
args=list(args) if isinstance(args, (list, tuple)) else [],
kwargs=kwargs if isinstance(kwargs, dict) else {},
task_id=new_task_id,
)
logger.info(
"Re-enqueued orphan %s (%s) as %s", task_result.task_id, name, new_task_id
)
return "recovered"
def _scan_for_task(task_id: str):
"""Return the Scan linked to a Celery task id, or None (read across tenants)."""
from api.db_router import MainRouter
from api.models import Scan
return Scan.all_objects.using(MainRouter.admin_db).filter(task_id=task_id).first()
def _reenqueue_scan(old_task_id: str, scan) -> bool:
"""Re-run an orphaned scan via scan-perform on the existing row.
Pre-provisions the new task linkage (TaskResult + api.Task) and relinks the
Scan before enqueuing, so the FK is valid and a worker can never outrun the DB.
The relink is conditional on the scan still pointing at the old task, so a stale
orphan can never clobber a newer linkage.
"""
from django_celery_results.models import TaskResult
from api.db_utils import rls_transaction
from api.models import Scan
from api.models import Task as APITask
from tasks.tasks import perform_scan_task
tenant_id = str(scan.tenant_id)
new_task_id = str(uuid4())
with rls_transaction(tenant_id):
locked_scan = Scan.all_objects.select_for_update().filter(id=scan.id).first()
if locked_scan is None or str(locked_scan.task_id) != old_task_id:
logger.info(
"Scan %s no longer points at task %s; skipping recovery re-enqueue",
scan.id,
old_task_id,
)
return False
task_result_new, _ = TaskResult.objects.get_or_create(
task_id=new_task_id,
defaults={"status": states.PENDING, "task_name": "scan-perform"},
)
APITask.objects.update_or_create(
id=new_task_id,
tenant_id=tenant_id,
defaults={"task_runner_task": task_result_new},
)
locked_scan.task_id = new_task_id
locked_scan.recovery_count = (locked_scan.recovery_count or 0) + 1
locked_scan.save(update_fields=["task_id", "recovery_count", "updated_at"])
perform_scan_task.apply_async(
kwargs={
"tenant_id": tenant_id,
"scan_id": str(scan.id),
"provider_id": str(scan.provider_id),
},
task_id=new_task_id,
)
return True
def _fail_domain_row(old_task_id: str, name: str, now: datetime) -> None:
"""Mark a scan terminal when its task is capped/denylisted instead of re-run."""
from api.db_utils import rls_transaction
from api.models import Scan, StateChoices
if name in _SCAN_TASKS:
scan = _scan_for_task(old_task_id)
if scan is not None:
with rls_transaction(str(scan.tenant_id)):
Scan.all_objects.filter(id=scan.id, task_id=old_task_id).update(
state=StateChoices.FAILED, completed_at=now
)
+41 -43
View File
@@ -29,7 +29,10 @@ from api.db_router import READ_REPLICA_ALIAS, MainRouter
from api.db_utils import rls_transaction
from api.models import Provider, Scan, ScanSummary, StateChoices, ThreatScoreSnapshot
from api.utils import initialize_prowler_provider
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.check.compliance_models import (
Compliance,
get_bulk_compliance_frameworks_universal,
)
from prowler.lib.outputs.finding import Finding as FindingOutput
logger = get_task_logger(__name__)
@@ -571,7 +574,7 @@ def generate_csa_report(
Args:
tenant_id: The tenant ID for Row-Level Security context.
scan_id: ID of the scan executed by Prowler.
compliance_id: ID of the compliance framework (e.g., "csa_ccm_4.0_aws").
compliance_id: ID of the compliance framework (e.g., "csa_ccm_4.0").
output_path: Output PDF file path.
provider_id: Provider ID for the scan.
only_failed: If True, only include failed requirements in detailed section.
@@ -751,43 +754,10 @@ def generate_compliance_reports(
provider_uid = provider_obj.uid
provider_type = provider_obj.provider
# Check provider compatibility
if generate_threatscore and provider_type not in [
"aws",
"azure",
"gcp",
"m365",
"kubernetes",
"alibabacloud",
]:
logger.info("Provider %s not supported for ThreatScore report", provider_type)
results["threatscore"] = {"upload": False, "path": ""}
generate_threatscore = False
if generate_ens and provider_type not in ["aws", "azure", "gcp"]:
logger.info("Provider %s not supported for ENS report", provider_type)
results["ens"] = {"upload": False, "path": ""}
generate_ens = False
if generate_nis2 and provider_type not in ["aws", "azure", "gcp"]:
logger.info("Provider %s not supported for NIS2 report", provider_type)
results["nis2"] = {"upload": False, "path": ""}
generate_nis2 = False
if generate_csa and provider_type not in [
"aws",
"azure",
"gcp",
"oraclecloud",
"alibabacloud",
]:
logger.info("Provider %s not supported for CSA CCM report", provider_type)
results["csa"] = {"upload": False, "path": ""}
generate_csa = False
# Load the framework definitions for this provider once. We use this map
# both to pick the latest CIS variant and to precompute the set of
# check_ids each framework consumes (for findings_cache eviction).
# Load the framework definitions for this provider once. We use this map to
# gate per-report availability, to pick the latest CIS variant, and to
# precompute the set of check_ids each framework consumes (for findings_cache
# eviction).
frameworks_bulk: dict = {}
try:
frameworks_bulk = Compliance.get_bulk(provider_type)
@@ -796,6 +766,32 @@ def generate_compliance_reports(
# Fall through; individual frameworks will still try and fail
# gracefully if their compliance_id is missing.
# Report availability is derived from the dynamically loaded framework map,
# not a hard-coded provider whitelist (which drifts the moment a new
# framework JSON ships, and would silently exclude an external provider that
# ships the framework). Same approach already used for CIS below.
if generate_threatscore and not frameworks_bulk.get(
f"prowler_threatscore_{provider_type}"
):
logger.info("Provider %s not supported for ThreatScore report", provider_type)
results["threatscore"] = {"upload": False, "path": ""}
generate_threatscore = False
if generate_ens and not frameworks_bulk.get(f"ens_rd2022_{provider_type}"):
logger.info("Provider %s not supported for ENS report", provider_type)
results["ens"] = {"upload": False, "path": ""}
generate_ens = False
if generate_nis2 and not frameworks_bulk.get(f"nis2_{provider_type}"):
logger.info("Provider %s not supported for NIS2 report", provider_type)
results["nis2"] = {"upload": False, "path": ""}
generate_nis2 = False
if generate_csa and not frameworks_bulk.get(f"csa_ccm_4.0_{provider_type}"):
logger.info("Provider %s not supported for CSA CCM report", provider_type)
results["csa"] = {"upload": False, "path": ""}
generate_csa = False
# For CIS we do NOT pre-check the provider against a hard-coded whitelist
# (that list drifts the moment a new CIS JSON ships). Instead, we inspect
# the dynamically loaded framework map and pick the latest available CIS
@@ -883,9 +879,11 @@ def generate_compliance_reports(
frameworks_bulk.get(f"nis2_{provider_type}")
)
if generate_csa:
pending_checks_by_framework["csa"] = _get_compliance_check_ids(
frameworks_bulk.get(f"csa_ccm_4.0_{provider_type}")
)
# csa_ccm_4.0 lives at the top level, not under compliance/{provider}/.
csa_framework = frameworks_bulk.get(
"csa_ccm_4.0"
) or get_bulk_compliance_frameworks_universal(provider_type).get("csa_ccm_4.0")
pending_checks_by_framework["csa"] = _get_compliance_check_ids(csa_framework)
if generate_cis and latest_cis:
pending_checks_by_framework["cis"] = _get_compliance_check_ids(
frameworks_bulk.get(latest_cis)
@@ -1183,7 +1181,7 @@ def generate_compliance_reports(
if generate_csa:
generated_report_keys.append("csa")
csa_path = output_paths["csa"]
compliance_id_csa = f"csa_ccm_4.0_{provider_type}"
compliance_id_csa = "csa_ccm_4.0"
pdf_path_csa = f"{csa_path}_csa_report.pdf"
logger.info("Generating CSA CCM report with compliance %s", compliance_id_csa)
+57 -4
View File
@@ -5,6 +5,7 @@ import time
from abc import ABC, abstractmethod
from contextlib import contextmanager
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any
from celery.utils.log import get_task_logger
@@ -26,7 +27,10 @@ from api.db_router import READ_REPLICA_ALIAS
from api.db_utils import rls_transaction
from api.models import Provider, StatusChoices
from api.utils import initialize_prowler_provider
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.check.compliance_models import (
Compliance,
get_bulk_compliance_frameworks_universal,
)
from prowler.lib.outputs.finding import Finding as FindingOutput
from .components import (
@@ -222,6 +226,46 @@ def get_requirement_metadata(
return None
def _universal_attributes_to_list(attributes) -> list:
"""Flatten a universal requirement's ``attributes`` into a list of objects
with attribute access. MITRE wraps its list under ``_raw_attributes``."""
if isinstance(attributes, dict) and "_raw_attributes" in attributes:
entries = attributes.get("_raw_attributes") or []
return [
SimpleNamespace(**entry) for entry in entries if isinstance(entry, dict)
]
if isinstance(attributes, dict):
return [SimpleNamespace(**attributes)] if attributes else []
return list(attributes or [])
def _adapt_universal_to_legacy(framework, provider_type: str) -> SimpleNamespace:
"""Expose a universal ``ComplianceFramework`` under the legacy ``Compliance``
attribute names used by the PDF pipeline."""
provider_key = (provider_type or "").lower()
requirements = []
for requirement in framework.requirements:
checks_by_provider = (
requirement.checks if isinstance(requirement.checks, dict) else {}
)
requirements.append(
SimpleNamespace(
Id=requirement.id,
Description=requirement.description or "",
Checks=list(checks_by_provider.get(provider_key, [])),
Attributes=_universal_attributes_to_list(requirement.attributes),
)
)
return SimpleNamespace(
Framework=framework.framework,
Name=framework.name,
Version=framework.version or "",
Description=framework.description or "",
Provider=framework.provider or provider_type,
Requirements=requirements,
)
# =============================================================================
# PDF Styles Cache
# =============================================================================
@@ -869,9 +913,18 @@ class BaseComplianceReportGenerator(ABC):
prowler_provider = initialize_prowler_provider(provider_obj)
provider_type = provider_obj.provider
# Load compliance framework
frameworks_bulk = Compliance.get_bulk(provider_type)
compliance_obj = frameworks_bulk.get(compliance_id)
# Load compliance framework — fall back to the universal loader
# for top-level JSONs (e.g. csa_ccm_4.0) that Compliance.get_bulk
# does not scan.
compliance_obj = Compliance.get_bulk(provider_type).get(compliance_id)
if not compliance_obj:
universal_framework = get_bulk_compliance_frameworks_universal(
provider_type
).get(compliance_id)
if universal_framework:
compliance_obj = _adapt_universal_to_legacy(
universal_framework, provider_type
)
if not compliance_obj:
raise ValueError(f"Compliance framework not found: {compliance_id}")
+25 -3
View File
@@ -118,6 +118,19 @@ ATTACK_SURFACE_PROVIDER_COMPATIBILITY = {
_ATTACK_SURFACE_MAPPING_CACHE: dict[str, dict] = {}
def _clear_scan_rerun_state(tenant_id: str, scan_id: str) -> None:
"""Remove rows derived from a previous execution of this scan."""
with rls_transaction(tenant_id):
Finding.all_objects.filter(scan_id=scan_id).delete()
ResourceScanSummary.objects.filter(scan_id=scan_id).delete()
ScanCategorySummary.objects.filter(scan_id=scan_id).delete()
ScanGroupSummary.objects.filter(scan_id=scan_id).delete()
ScanSummary.objects.filter(scan_id=scan_id).delete()
AttackSurfaceOverview.objects.filter(scan_id=scan_id).delete()
ComplianceRequirementOverview.objects.filter(scan_id=scan_id).delete()
ComplianceOverviewSummary.objects.filter(scan_id=scan_id).delete()
def aggregate_category_counts(
categories: list[str],
severity: str,
@@ -476,9 +489,13 @@ def _create_compliance_summaries(
)
)
# Bulk insert summaries
if summary_objects:
with rls_transaction(tenant_id):
# Idempotent re-run: clear this scan's prior summaries before re-inserting, so
# a recovered scan's summary always reflects its own (re-derived) requirement
# rows rather than keeping a stale row (bulk_create ignore_conflicts alone would
# keep the old one).
with rls_transaction(tenant_id):
ComplianceOverviewSummary.objects.filter(scan_id=scan_id).delete()
if summary_objects:
ComplianceOverviewSummary.objects.bulk_create(
summary_objects, batch_size=500, ignore_conflicts=True
)
@@ -1022,6 +1039,7 @@ def perform_prowler_scan(
scan_instance.state = StateChoices.EXECUTING
scan_instance.started_at = datetime.now(tz=timezone.utc)
scan_instance.save(update_fields=["state", "started_at", "updated_at"])
_clear_scan_rerun_state(tenant_id, scan_id)
# Find the mutelist processor if it exists
with rls_transaction(tenant_id, using=READ_REPLICA_ALIAS):
@@ -1651,6 +1669,10 @@ def create_compliance_requirements(tenant_id: str, scan_id: str):
elif requirement_status == "PASS":
requirement_statuses[key]["pass_count"] += 1
# Idempotent re-run: COPY can't ON CONFLICT, so clear this scan's rows first.
with rls_transaction(tenant_id):
ComplianceRequirementOverview.objects.filter(scan_id=scan_id).delete()
# Bulk create requirement records using PostgreSQL COPY
_persist_compliance_requirement_rows(tenant_id, compliance_requirement_rows)
+27 -22
View File
@@ -359,35 +359,40 @@ def _load_findings_for_requirement_checks(
def _get_compliance_check_ids(compliance_obj) -> set[str]:
"""Return the union of all check_ids referenced by a compliance framework.
Used by the master report orchestrator to know which checks each
framework consumes from the shared ``findings_cache``, so that once a
framework finishes the entries no other pending framework needs can be
evicted from the cache (PROWLER-1733).
Used by the master report orchestrator to evict entries from
``findings_cache`` once no pending framework needs them (PROWLER-1733).
Args:
compliance_obj: A loaded Compliance framework object exposing a
``Requirements`` iterable, each requirement carrying ``Checks``.
``None`` is treated as "no checks" rather than raising, so the
caller can pass ``frameworks_bulk.get(...)`` directly without
an extra existence check.
Returns:
Set of check_id strings (empty if ``compliance_obj`` is ``None``).
Accepts the legacy ``Compliance`` shape (``Requirements`` / ``Checks``
lists) and the universal ``ComplianceFramework`` shape (``requirements``
/ ``checks`` dict keyed by provider). ``None`` returns an empty set so
callers can pass ``frameworks_bulk.get(...)`` directly.
"""
if compliance_obj is None:
return set()
checks: set[str] = set()
requirements = getattr(compliance_obj, "Requirements", None) or []
requirements = getattr(compliance_obj, "Requirements", None) or getattr(
compliance_obj, "requirements", None
)
if not requirements:
return set()
check_ids: set[str] = set()
try:
# Defensive: Mock objects (used in unit tests) return another Mock
# for any attribute access, which is truthy but not iterable. Treat
# any non-iterable Requirements value as "no checks".
for req in requirements:
req_checks = getattr(req, "Checks", None) or []
# Mock objects in unit tests return another Mock for any attribute
# access — truthy but not iterable. Treat that as "no checks".
for requirement in requirements:
requirement_checks = getattr(requirement, "Checks", None)
if requirement_checks is None:
checks_by_provider = getattr(requirement, "checks", None) or {}
requirement_checks = [
check_id
for check_ids_list in checks_by_provider.values()
for check_id in check_ids_list
]
try:
checks.update(req_checks)
check_ids.update(requirement_checks)
except TypeError:
continue
except TypeError:
return set()
return checks
return check_ids
+79 -5
View File
@@ -46,6 +46,7 @@ from tasks.jobs.lighthouse_providers import (
refresh_lighthouse_provider_models,
)
from tasks.jobs.muting import mute_historical_findings
from tasks.jobs.orphan_recovery import reconcile_orphans
from tasks.jobs.report import (
STALE_TMP_OUTPUT_MAX_AGE_HOURS,
_cleanup_stale_tmp_output_directories,
@@ -67,7 +68,10 @@ from tasks.utils import (
get_next_execution_datetime,
)
from api.compliance import get_compliance_frameworks
from api.compliance import (
get_compliance_frameworks,
get_prowler_provider_compliance,
)
from api.db_router import READ_REPLICA_ALIAS
from api.db_utils import delete_related_daily_task, rls_transaction
from api.decorators import handle_provider_deletion, set_tenant
@@ -75,6 +79,9 @@ from api.models import Finding, Integration, Provider, Scan, ScanSummary, StateC
from api.utils import initialize_prowler_provider
from api.v1.serializers import ScanTaskSerializer
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.outputs.compliance.compliance import (
process_universal_compliance_frameworks,
)
from prowler.lib.outputs.compliance.generic.generic import GenericCompliance
from prowler.lib.outputs.finding import Finding as FindingOutput
@@ -462,13 +469,42 @@ def cleanup_stale_attack_paths_scans_task():
return cleanup_stale_attack_paths_scans()
@shared_task(name="reconcile-orphan-tasks", queue="celery")
def reconcile_orphan_tasks_task():
"""Periodic watchdog: recover tasks whose worker is gone (deploys, crashes)."""
return reconcile_orphans()
@shared_task(name="tenant-deletion", queue="deletion", autoretry_for=(Exception,))
def delete_tenant_task(tenant_id: str):
return delete_tenant(pk=tenant_id)
def _scan_tmp_output_directory(tenant_id: str, scan_id: str) -> Path:
"""Root tmp output directory for a scan ({tmp}/{tenant_id}/{scan_id})."""
return Path(DJANGO_TMP_OUTPUT_DIRECTORY) / str(tenant_id) / str(scan_id)
class ScanReportRLSTask(RLSTask):
"""
RLS task that removes the scan's tmp output directory when the task fails.
Covers failures both inside and outside the task body (e.g. ENOSPC mid-write,
or setup errors) so partial artifacts do not accumulate on the worker disk.
"""
def on_failure(self, exc, task_id, args, kwargs, _einfo): # noqa: ARG002
del args # Required by Celery's Task.on_failure signature; not used.
tenant_id = kwargs.get("tenant_id")
scan_id = kwargs.get("scan_id")
if tenant_id and scan_id:
logger.error(f"Scan report task {task_id} failed: {exc}")
rmtree(_scan_tmp_output_directory(tenant_id, scan_id), ignore_errors=True)
@shared_task(
base=RLSTask,
base=ScanReportRLSTask,
name="scan-report",
queue="scan-reports",
)
@@ -513,11 +549,23 @@ def generate_outputs_task(scan_id: str, provider_id: str, tenant_id: str):
provider_uid = provider_obj.uid
provider_type = provider_obj.provider
# Per-framework exporters in `COMPLIANCE_CLASS_MAP` consume the legacy bulk.
frameworks_bulk = Compliance.get_bulk(provider_type)
# Universal-only frameworks (top-level JSONs like `dora.json`) are emitted
# via `process_universal_compliance_frameworks` below.
universal_bulk = get_prowler_provider_compliance(provider_type)
universal_only_names = {
name
for name in universal_bulk
if name not in frameworks_bulk and universal_bulk[name].outputs
}
frameworks_avail = get_compliance_frameworks(provider_type)
out_dir, comp_dir = _generate_output_directory(
DJANGO_TMP_OUTPUT_DIRECTORY, provider_uid, tenant_id, scan_id
)
# Removed on success here and on failure by ScanReportRLSTask.on_failure,
# so partial artifacts do not accumulate and fill the disk (ENOSPC).
scan_tmp_dir = _scan_tmp_output_directory(tenant_id, scan_id)
def get_writer(writer_map, name, factory, is_last):
"""
@@ -535,6 +583,10 @@ def generate_outputs_task(scan_id: str, provider_id: str, tenant_id: str):
output_writers = {}
compliance_writers = {}
# Shared across batches so universal writers are created once and reused.
universal_compliance_state: dict[str, list] = {"compliance": []}
universal_base_dir = os.path.dirname(out_dir)
universal_output_filename = os.path.basename(out_dir)
scan_summary = FindingOutput._transform_findings_stats(
ScanSummary.objects.filter(scan_id=scan_id)
@@ -542,7 +594,7 @@ def generate_outputs_task(scan_id: str, provider_id: str, tenant_id: str):
# Check if we need to generate ASFF output for AWS providers with SecurityHub integration
generate_asff = False
if provider_type == "aws":
if provider_type == Provider.ProviderChoices.AWS:
security_hub_integrations = Integration.objects.filter(
integrationproviderrelationship__provider_id=provider_id,
integration_type=Integration.IntegrationChoices.AWS_SECURITY_HUB,
@@ -589,8 +641,30 @@ def generate_outputs_task(scan_id: str, provider_id: str, tenant_id: str):
writer.batch_write_data_to_file(**extra)
writer._data.clear()
# Compliance CSVs
# Universal-only frameworks (e.g. `dora.json`).
if universal_only_names:
process_universal_compliance_frameworks(
input_compliance_frameworks=universal_only_names,
universal_frameworks=universal_bulk,
finding_outputs=fos,
output_directory=universal_base_dir,
output_filename=universal_output_filename,
provider=provider_type,
generated_outputs=universal_compliance_state,
from_cli=False,
is_last=is_last,
)
# Compliance CSVs (per-framework exporters).
for name in frameworks_avail:
if name in universal_only_names:
continue
if name not in frameworks_bulk:
logger.warning(
"Compliance framework '%s' missing from bulk; skipping CSV export",
name,
)
continue
compliance_obj = frameworks_bulk[name]
klass = GenericCompliance
@@ -666,7 +740,7 @@ def generate_outputs_task(scan_id: str, provider_id: str, tenant_id: str):
# TODO: We need to create a new periodic task to delete the output files
# This task shouldn't be responsible for deleting the output files
try:
rmtree(Path(compressed).parent, ignore_errors=True)
rmtree(scan_tmp_dir, ignore_errors=True)
except Exception as e:
logger.error(f"Error deleting output files: {e}")
final_location, did_upload = upload_uri, True
+39 -1
View File
@@ -1,11 +1,12 @@
from unittest.mock import call, patch
from uuid import uuid4
import pytest
from django.core.exceptions import ObjectDoesNotExist
from tasks.jobs.deletion import delete_provider, delete_tenant
from api.attack_paths import database as graph_database
from api.models import Provider, Tenant, TenantComplianceSummary
from api.models import JiraIssueDispatch, Provider, Tenant, TenantComplianceSummary
@pytest.mark.django_db
@@ -34,6 +35,43 @@ class TestDeleteProvider:
str(instance.id),
)
def test_delete_provider_removes_jira_dispatches(
self,
providers_fixture,
findings_fixture,
integrations_fixture,
):
"""Deleting a provider removes JiraIssueDispatch rows for its findings only."""
instance = providers_fixture[0]
tenant_id = str(instance.tenant_id)
finding = findings_fixture[0]
integration = integrations_fixture[0]
# Dispatch for one of the provider's findings: must be removed with it.
JiraIssueDispatch.objects.create(
tenant_id=tenant_id,
integration=integration,
finding_id=finding.id,
)
# Dispatch for an unrelated finding: must survive the provider deletion.
unrelated = JiraIssueDispatch.objects.create(
tenant_id=tenant_id,
integration=integration,
finding_id=uuid4(),
)
with (
patch(
"tasks.jobs.deletion.graph_database.get_database_name",
return_value="tenant-db",
),
patch("tasks.jobs.deletion.graph_database.drop_subgraph"),
):
delete_provider(tenant_id, instance.id)
assert not JiraIssueDispatch.objects.filter(finding_id=finding.id).exists()
assert JiraIssueDispatch.objects.filter(pk=unrelated.pk).exists()
def test_delete_provider_does_not_exist(self, tenants_fixture):
with (
patch(
@@ -1640,14 +1640,74 @@ class TestJiraIntegration:
@patch("tasks.jobs.integrations.Finding")
@patch("tasks.jobs.integrations.Integration")
@patch("tasks.jobs.integrations.initialize_prowler_integration")
@patch("tasks.jobs.integrations.JiraIssueDispatch")
def test_send_findings_to_jira_skips_already_dispatched(
self,
mock_jira_dispatch,
mock_initialize_integration,
mock_integration_model,
mock_finding_model,
mock_rls_transaction,
):
"""A re-run skips findings already ticketed (no duplicate Jira issues)."""
mock_rls_transaction.return_value.__enter__ = MagicMock()
mock_rls_transaction.return_value.__exit__ = MagicMock()
mock_integration_model.objects.get.return_value = MagicMock()
# finding-1 was already dispatched in a prior run; finding-2 is new.
mock_jira_dispatch.objects.filter.return_value.values_list.return_value = [
"finding-1"
]
mock_jira_dispatch.objects.get_or_create.return_value = (MagicMock(), True)
mock_jira_integration = MagicMock()
mock_jira_integration.send_finding.return_value = True
mock_initialize_integration.return_value = mock_jira_integration
finding2 = MagicMock()
finding2.id = "finding-2"
finding2.check_id = "check_002"
finding2.severity = "low"
finding2.status = "FAIL"
finding2.status_extended = ""
finding2.compliance = {}
finding2.resources.exists.return_value = False
finding2.resources.first.return_value = None
finding2.scan.provider.provider = "aws"
finding2.check_metadata = {
"checktitle": "C2",
"risk": "",
"remediation": {"recommendation": {}, "code": {}},
}
mock_finding_model.all_objects.select_related.return_value.prefetch_related.return_value.get.return_value = finding2
result = send_findings_to_jira(
"tenant-123", "integration-456", "PROJ", "Task", ["finding-1", "finding-2"]
)
# finding-1 skipped (already sent); only finding-2 sent -> no duplicate.
assert result == {"created_count": 1, "failed_count": 0, "skipped_count": 1}
mock_jira_integration.send_finding.assert_called_once()
assert (
mock_jira_integration.send_finding.call_args.kwargs["check_id"]
== "check_002"
)
@patch("tasks.jobs.integrations.rls_transaction")
@patch("tasks.jobs.integrations.Finding")
@patch("tasks.jobs.integrations.Integration")
@patch("tasks.jobs.integrations.initialize_prowler_integration")
@patch("tasks.jobs.integrations.JiraIssueDispatch")
def test_send_findings_to_jira_success(
self,
mock_jira_dispatch,
mock_initialize_integration,
mock_integration_model,
mock_finding_model,
mock_rls_transaction,
):
"""Test successful sending of findings to Jira using send_finding method"""
mock_jira_dispatch.objects.filter.return_value.values_list.return_value = []
mock_jira_dispatch.objects.get_or_create.return_value = (MagicMock(), True)
tenant_id = "tenant-123"
integration_id = "integration-456"
project_key = "PROJ"
@@ -1739,7 +1799,7 @@ class TestJiraIntegration:
)
# Assertions
assert result == {"created_count": 2, "failed_count": 0}
assert result == {"created_count": 2, "failed_count": 0, "skipped_count": 0}
# Verify Jira integration was initialized
mock_initialize_integration.assert_called_once_with(integration)
@@ -1771,8 +1831,10 @@ class TestJiraIntegration:
@patch("tasks.jobs.integrations.Integration")
@patch("tasks.jobs.integrations.initialize_prowler_integration")
@patch("tasks.jobs.integrations.logger")
@patch("tasks.jobs.integrations.JiraIssueDispatch")
def test_send_findings_to_jira_partial_failure(
self,
mock_jira_dispatch,
mock_logger,
mock_initialize_integration,
mock_integration_model,
@@ -1780,6 +1842,8 @@ class TestJiraIntegration:
mock_rls_transaction,
):
"""Test partial failure when sending findings to Jira"""
mock_jira_dispatch.objects.filter.return_value.values_list.return_value = []
mock_jira_dispatch.objects.get_or_create.return_value = (MagicMock(), True)
tenant_id = "tenant-123"
integration_id = "integration-456"
project_key = "PROJ"
@@ -1833,23 +1897,35 @@ class TestJiraIntegration:
)
# Assertions
assert result == {"created_count": 2, "failed_count": 1}
assert result == {"created_count": 2, "failed_count": 1, "skipped_count": 0}
# Verify error was logged for the failed finding
mock_logger.error.assert_called_with("Failed to send finding finding-2 to Jira")
# The failed finding's reservation is released so a later run can retry it.
mock_jira_dispatch.objects.filter.assert_any_call(
tenant_id=tenant_id,
integration_id=integration_id,
finding_id="finding-2",
)
mock_jira_dispatch.objects.filter.return_value.delete.assert_called_once()
@patch("tasks.jobs.integrations.rls_transaction")
@patch("tasks.jobs.integrations.Finding")
@patch("tasks.jobs.integrations.Integration")
@patch("tasks.jobs.integrations.initialize_prowler_integration")
@patch("tasks.jobs.integrations.JiraIssueDispatch")
def test_send_findings_to_jira_no_resources(
self,
mock_jira_dispatch,
mock_initialize_integration,
mock_integration_model,
mock_finding_model,
mock_rls_transaction,
):
"""Test sending findings to Jira when finding has no resources"""
mock_jira_dispatch.objects.filter.return_value.values_list.return_value = []
mock_jira_dispatch.objects.get_or_create.return_value = (MagicMock(), True)
tenant_id = "tenant-123"
integration_id = "integration-456"
project_key = "PROJ"
@@ -1907,7 +1983,7 @@ class TestJiraIntegration:
)
# Assertions
assert result == {"created_count": 1, "failed_count": 0}
assert result == {"created_count": 1, "failed_count": 0, "skipped_count": 0}
# Verify send_finding was called with empty resource fields
call_kwargs = mock_jira_integration.send_finding.call_args.kwargs
@@ -1920,14 +1996,18 @@ class TestJiraIntegration:
@patch("tasks.jobs.integrations.Finding")
@patch("tasks.jobs.integrations.Integration")
@patch("tasks.jobs.integrations.initialize_prowler_integration")
@patch("tasks.jobs.integrations.JiraIssueDispatch")
def test_send_findings_to_jira_with_empty_check_metadata(
self,
mock_jira_dispatch,
mock_initialize_integration,
mock_integration_model,
mock_finding_model,
mock_rls_transaction,
):
"""Test sending findings to Jira when check_metadata is empty or missing fields"""
mock_jira_dispatch.objects.filter.return_value.values_list.return_value = []
mock_jira_dispatch.objects.get_or_create.return_value = (MagicMock(), True)
tenant_id = "tenant-123"
integration_id = "integration-456"
project_key = "PROJ"
@@ -1970,7 +2050,7 @@ class TestJiraIntegration:
)
# Assertions
assert result == {"created_count": 1, "failed_count": 0}
assert result == {"created_count": 1, "failed_count": 0, "skipped_count": 0}
# Verify send_finding was called with default/empty values
call_kwargs = mock_jira_integration.send_finding.call_args.kwargs
@@ -1983,3 +2063,94 @@ class TestJiraIntegration:
assert call_kwargs["remediation_code_cli"] == ""
assert call_kwargs["remediation_code_other"] == ""
assert call_kwargs["compliance"] == {}
@patch("tasks.jobs.integrations.rls_transaction")
@patch("tasks.jobs.integrations.Finding")
@patch("tasks.jobs.integrations.Integration")
@patch("tasks.jobs.integrations.initialize_prowler_integration")
@patch("tasks.jobs.integrations.JiraIssueDispatch")
def test_send_findings_to_jira_reserves_before_sending(
self,
mock_jira_dispatch,
mock_initialize_integration,
mock_integration_model,
mock_finding_model,
mock_rls_transaction,
):
"""The dispatch row is reserved before the external Jira call (reserve-then-act)."""
mock_rls_transaction.return_value.__enter__ = MagicMock()
mock_rls_transaction.return_value.__exit__ = MagicMock()
mock_integration_model.objects.get.return_value = MagicMock()
mock_jira_dispatch.objects.filter.return_value.values_list.return_value = []
order = []
mock_jira_dispatch.objects.get_or_create.side_effect = lambda **kw: (
order.append(("reserve", kw)) or (MagicMock(), True)
)
mock_jira_integration = MagicMock()
mock_jira_integration.send_finding.side_effect = lambda **kw: (
order.append(("send", kw)) or True
)
mock_initialize_integration.return_value = mock_jira_integration
finding = MagicMock()
finding.id = "finding-1"
finding.check_id = "check_001"
finding.severity = "low"
finding.status = "FAIL"
finding.status_extended = ""
finding.compliance = {}
finding.resources.exists.return_value = False
finding.resources.first.return_value = None
finding.scan.provider.provider = "aws"
finding.check_metadata = {
"checktitle": "C1",
"risk": "",
"remediation": {"recommendation": {}, "code": {}},
}
mock_finding_model.all_objects.select_related.return_value.prefetch_related.return_value.get.return_value = finding
result = send_findings_to_jira(
"tenant-123", "integration-456", "PROJ", "Task", ["finding-1"]
)
assert result == {"created_count": 1, "failed_count": 0, "skipped_count": 0}
# Reservation must precede the external send.
assert [entry[0] for entry in order] == ["reserve", "send"]
# A successful send keeps the reservation (no rollback delete).
mock_jira_dispatch.objects.filter.return_value.delete.assert_not_called()
@patch("tasks.jobs.integrations.rls_transaction")
@patch("tasks.jobs.integrations.Finding")
@patch("tasks.jobs.integrations.Integration")
@patch("tasks.jobs.integrations.initialize_prowler_integration")
@patch("tasks.jobs.integrations.JiraIssueDispatch")
def test_send_findings_to_jira_skips_when_already_reserved(
self,
mock_jira_dispatch,
mock_initialize_integration,
mock_integration_model,
mock_finding_model,
mock_rls_transaction,
):
"""A finding that races past the bulk pre-check but loses the reservation
(created=False) is skipped without a second issue, leaving the row intact."""
mock_rls_transaction.return_value.__enter__ = MagicMock()
mock_rls_transaction.return_value.__exit__ = MagicMock()
mock_integration_model.objects.get.return_value = MagicMock()
mock_jira_dispatch.objects.filter.return_value.values_list.return_value = []
# Another concurrent run already created the dispatch row.
mock_jira_dispatch.objects.get_or_create.return_value = (MagicMock(), False)
mock_jira_integration = MagicMock()
mock_initialize_integration.return_value = mock_jira_integration
result = send_findings_to_jira(
"tenant-123", "integration-456", "PROJ", "Task", ["finding-1"]
)
assert result == {"created_count": 0, "failed_count": 0, "skipped_count": 1}
mock_jira_integration.send_finding.assert_not_called()
# The reservation belongs to the run that won the race; do not delete it.
mock_jira_dispatch.objects.filter.return_value.delete.assert_not_called()
@@ -0,0 +1,372 @@
from datetime import datetime, timedelta, timezone
from unittest.mock import MagicMock, patch
from uuid import uuid4
import pytest
from celery import states
from django_celery_results.models import TaskResult
from api.models import Scan, StateChoices
from api.models import Task as APITask
from tasks.jobs.orphan_recovery import (
_decode_celery_field,
_reconcile_task_results,
_recovery_attempt_count,
_reenqueue_scan,
advisory_lock,
is_worker_alive,
)
def _orphan_result(*, name, kwargs, worker, created_minutes_ago, status=states.STARTED):
"""Create a TaskResult mimicking an in-flight task, backdated past the grace."""
tr = TaskResult.objects.create(
task_id=str(uuid4()),
status=status,
task_name=name,
worker=worker,
task_kwargs=repr(kwargs),
task_args=repr([]),
)
TaskResult.objects.filter(pk=tr.pk).update(
date_created=datetime.now(tz=timezone.utc)
- timedelta(minutes=created_minutes_ago)
)
tr.refresh_from_db()
return tr
@pytest.mark.django_db
class TestDecodeCeleryField:
def test_decodes_single_encoded_repr(self):
assert _decode_celery_field("{'tenant_id': 'abc'}", {}) == {"tenant_id": "abc"}
def test_decodes_double_encoded(self):
import json
stored = json.dumps(repr({"tenant_id": "abc", "scan_id": "s1"}))
assert _decode_celery_field(stored, {}) == {"tenant_id": "abc", "scan_id": "s1"}
def test_empty_returns_default(self):
assert _decode_celery_field(None, {}) == {}
assert _decode_celery_field("", []) == []
def test_unparseable_raises(self):
with pytest.raises(ValueError):
_decode_celery_field("<<not a literal>>", {})
@pytest.mark.django_db
class TestReconcileTaskResults:
def _patches(self, alive):
"""Patch worker liveness, revoke, and the task registry for re-enqueue."""
mock_app = MagicMock()
mock_task = MagicMock()
mock_app.tasks.get.return_value = mock_task
return (
patch("tasks.jobs.orphan_recovery.is_worker_alive", return_value=alive),
patch("tasks.jobs.orphan_recovery.revoke_task"),
patch("tasks.jobs.orphan_recovery.current_app", mock_app),
mock_task,
)
def test_recovers_non_scan_task(self, tenants_fixture):
"""A NON-scan task (tenant-deletion) left orphaned is re-enqueued too."""
tenant = tenants_fixture[0]
tr = _orphan_result(
name="tenant-deletion",
kwargs={"tenant_id": str(tenant.id)},
worker="dead@gone",
created_minutes_ago=60,
)
p_alive, p_revoke, p_app, mock_task = self._patches(alive=False)
with (
p_alive,
p_revoke,
p_app,
patch("tasks.jobs.orphan_recovery._recovery_attempt_count", return_value=1),
):
result = _reconcile_task_results(
grace_minutes=2, max_attempts=3, window_hours=6, dry_run=False
)
assert tr.task_id in result["recovered"]
tr.refresh_from_db()
assert tr.status == states.REVOKED # stale result cleared (no pending alert)
mock_task.apply_async.assert_called_once()
call = mock_task.apply_async.call_args.kwargs
assert call["kwargs"] == {"tenant_id": str(tenant.id)}
assert call["task_id"] != tr.task_id # fresh task id
def test_external_integration_task_is_not_reenqueued_by_default(
self, tenants_fixture
):
"""External side-effect tasks without proven idempotency stay terminal.
integration-s3 rebuilds its upload from worker-local files that do not
survive the crash, so re-enqueuing it would upload nothing.
"""
tr = _orphan_result(
name="integration-s3",
kwargs={
"tenant_id": str(tenants_fixture[0].id),
"provider_id": str(uuid4()),
"output_directory": "/tmp/gone",
},
worker="dead@gone",
created_minutes_ago=60,
)
p_alive, p_revoke, p_app, mock_task = self._patches(alive=False)
with (
p_alive,
p_revoke,
p_app,
patch("tasks.jobs.orphan_recovery._recovery_attempt_count", return_value=1),
):
result = _reconcile_task_results(
grace_minutes=2, max_attempts=3, window_hours=6, dry_run=False
)
assert tr.task_id in result["failed"]
mock_task.apply_async.assert_not_called()
def test_jira_integration_task_is_reenqueued(self, tenants_fixture):
"""integration-jira is re-enqueued: its JiraIssueDispatch reservation makes a
re-run skip already-ticketed findings, so recovery cannot duplicate issues."""
tenant = tenants_fixture[0]
kwargs = {
"tenant_id": str(tenant.id),
"integration_id": str(uuid4()),
"project_key": "PROWLER",
"issue_type": "Task",
"finding_ids": [str(uuid4()), str(uuid4())],
}
tr = _orphan_result(
name="integration-jira",
kwargs=kwargs,
worker="dead@gone",
created_minutes_ago=60,
)
p_alive, p_revoke, p_app, mock_task = self._patches(alive=False)
with (
p_alive,
p_revoke,
p_app,
patch("tasks.jobs.orphan_recovery._recovery_attempt_count", return_value=1),
):
result = _reconcile_task_results(
grace_minutes=2, max_attempts=3, window_hours=6, dry_run=False
)
assert tr.task_id in result["recovered"]
tr.refresh_from_db()
assert tr.status == states.REVOKED # stale result cleared (no pending alert)
mock_task.apply_async.assert_called_once()
call = mock_task.apply_async.call_args.kwargs
assert call["kwargs"] == kwargs
assert call["task_id"] != tr.task_id # fresh task id
def test_skips_live_worker(self, tenants_fixture):
tr = _orphan_result(
name="tenant-deletion",
kwargs={"tenant_id": str(tenants_fixture[0].id)},
worker="alive@host",
created_minutes_ago=60,
)
p_alive, p_revoke, p_app, mock_task = self._patches(alive=True)
with p_alive, p_revoke, p_app:
result = _reconcile_task_results(
grace_minutes=2, max_attempts=3, window_hours=6, dry_run=False
)
assert tr.task_id in result["skipped"]
mock_task.apply_async.assert_not_called()
def test_skips_recently_created(self, tenants_fixture):
tr = _orphan_result(
name="tenant-deletion",
kwargs={"tenant_id": str(tenants_fixture[0].id)},
worker="dead@gone",
created_minutes_ago=0,
)
p_alive, p_revoke, p_app, mock_task = self._patches(alive=False)
with p_alive, p_revoke, p_app:
result = _reconcile_task_results(
grace_minutes=2, max_attempts=3, window_hours=6, dry_run=False
)
# too recent: excluded by the grace window (not even a candidate)
assert tr.task_id not in result["recovered"]
mock_task.apply_async.assert_not_called()
def test_denylisted_task_failed_not_reenqueued(self, tenants_fixture):
"""A non-allowlisted task is failed, never blind re-run."""
tr = _orphan_result(
name="some-non-idempotent-task",
kwargs={"tenant_id": str(tenants_fixture[0].id)},
worker="dead@gone",
created_minutes_ago=60,
)
p_alive, p_revoke, p_app, mock_task = self._patches(alive=False)
with (
p_alive,
p_revoke,
p_app,
patch("tasks.jobs.orphan_recovery._recovery_attempt_count", return_value=1),
):
result = _reconcile_task_results(
grace_minutes=2, max_attempts=3, window_hours=6, dry_run=False
)
assert tr.task_id in result["failed"]
tr.refresh_from_db()
assert tr.status == states.REVOKED
mock_task.apply_async.assert_not_called()
def test_recovery_cap_marks_failed(self, tenants_fixture):
"""When the recovery counter exceeds the cap, the task is failed not re-run."""
tr = _orphan_result(
name="tenant-deletion",
kwargs={"tenant_id": str(tenants_fixture[0].id)},
worker="dead@gone",
created_minutes_ago=60,
)
p_alive, p_revoke, p_app, mock_task = self._patches(alive=False)
with (
p_alive,
p_revoke,
p_app,
patch("tasks.jobs.orphan_recovery._recovery_attempt_count", return_value=4),
):
result = _reconcile_task_results(
grace_minutes=2, max_attempts=3, window_hours=6, dry_run=False
)
assert tr.task_id in result["failed"]
mock_task.apply_async.assert_not_called()
@pytest.mark.django_db
class TestScanRecovery:
"""Scans are recovered by re-running scan-perform on the EXISTING scan row,
so even a scheduled-scan orphan (whose own task would no-op on its guard) is
actually re-executed."""
def _scan_orphan(self, tenant, provider, name):
old_id = str(uuid4())
tr = TaskResult.objects.create(
task_id=old_id,
status=states.STARTED,
task_name=name,
worker="dead@gone",
task_kwargs=repr(
{"tenant_id": str(tenant.id), "provider_id": str(provider.id)}
),
task_args=repr([]),
)
TaskResult.objects.filter(pk=tr.pk).update(
date_created=datetime.now(tz=timezone.utc) - timedelta(minutes=60)
)
APITask.objects.create(id=old_id, tenant_id=tenant.id, task_runner_task=tr)
scan = Scan.objects.create(
name="scan-orphan",
provider=provider,
trigger=Scan.TriggerChoices.SCHEDULED,
state=StateChoices.EXECUTING,
tenant_id=tenant.id,
task_id=old_id,
recovery_count=0,
)
return old_id, scan
@pytest.mark.parametrize("name", ["scan-perform", "scan-perform-scheduled"])
def test_scan_recovered_via_scan_perform(
self, tenants_fixture, providers_fixture, name
):
tenant, provider = tenants_fixture[0], providers_fixture[0]
old_id, scan = self._scan_orphan(tenant, provider, name)
with (
patch("tasks.jobs.orphan_recovery.is_worker_alive", return_value=False),
patch("tasks.jobs.orphan_recovery.revoke_task"),
patch("tasks.jobs.orphan_recovery._recovery_attempt_count", return_value=1),
patch("tasks.tasks.perform_scan_task") as mock_scan_task,
):
result = _reconcile_task_results(
grace_minutes=2, max_attempts=3, window_hours=6, dry_run=False
)
assert old_id in result["recovered"]
scan.refresh_from_db()
assert str(scan.task_id) != old_id # relinked to a fresh task
assert scan.recovery_count == 1
assert TaskResult.objects.get(task_id=old_id).status == states.REVOKED
# Recovered by re-running scan-perform on the existing scan row (so the
# scheduled guard cannot no-op it), regardless of the original task name.
mock_scan_task.apply_async.assert_called_once()
assert mock_scan_task.apply_async.call_args.kwargs["kwargs"]["scan_id"] == str(
scan.id
)
def test_reenqueue_skips_when_scan_already_repointed(
self, tenants_fixture, providers_fixture
):
# The scan already points at a newer task, so a stale orphan must not relink
# it or launch a second concurrent run against the same scan row.
tenant, provider = tenants_fixture[0], providers_fixture[0]
newer_id = str(uuid4())
tr = TaskResult.objects.create(
task_id=newer_id, status=states.STARTED, task_name="scan-perform"
)
APITask.objects.create(id=newer_id, tenant_id=tenant.id, task_runner_task=tr)
scan = Scan.objects.create(
name="scan-orphan",
provider=provider,
trigger=Scan.TriggerChoices.SCHEDULED,
state=StateChoices.EXECUTING,
tenant_id=tenant.id,
task_id=newer_id,
recovery_count=0,
)
with patch("tasks.tasks.perform_scan_task") as mock_scan_task:
recovered = _reenqueue_scan(str(uuid4()), scan)
assert recovered is False
mock_scan_task.apply_async.assert_not_called()
scan.refresh_from_db()
assert scan.recovery_count == 0
@pytest.mark.django_db
class TestOrphanRecoveryHelpers:
def test_advisory_lock_acquires_and_releases(self):
with advisory_lock() as acquired:
assert acquired is True
def test_is_worker_alive_true_when_responds(self):
inspect = MagicMock()
inspect.ping.return_value = {"w@h": {"ok": "pong"}}
with patch(
"tasks.jobs.orphan_recovery.current_app.control.inspect",
return_value=inspect,
):
assert is_worker_alive("w@h") is True
def test_is_worker_alive_false_when_silent(self):
inspect = MagicMock()
inspect.ping.return_value = None
with patch(
"tasks.jobs.orphan_recovery.current_app.control.inspect",
return_value=inspect,
):
assert is_worker_alive("w@h") is False
def test_recovery_attempt_count_increments(self):
# Unique signature so the Valkey counter starts fresh for this test.
kwargs_repr = repr({"probe": str(uuid4())})
redis_client = MagicMock()
redis_client.incr.side_effect = [1, 2]
with patch("redis.from_url", return_value=redis_client):
assert _recovery_attempt_count("probe-task", kwargs_repr, 6) == 1
assert _recovery_attempt_count("probe-task", kwargs_repr, 6) == 2
+138 -22
View File
@@ -539,12 +539,12 @@ class TestLoadFindingsForChecks:
total_counts_out=totals,
)
assert (
len(result[check_id]) == 5
), f"cap=5 should yield exactly 5 loaded findings, got {len(result[check_id])}"
assert (
totals[check_id] == 12
), f"total_counts_out should report the pre-cap total (12), got {totals[check_id]}"
assert len(result[check_id]) == 5, (
f"cap=5 should yield exactly 5 loaded findings, got {len(result[check_id])}"
)
assert totals[check_id] == 12, (
f"total_counts_out should report the pre-cap total (12), got {totals[check_id]}"
)
def test_only_failed_findings_pushes_down_to_sql(
self, tenants_fixture, scans_fixture
@@ -616,13 +616,13 @@ class TestLoadFindingsForChecks:
loaded = result[check_id]
assert len(loaded) == 3, f"expected 3 FAIL findings, got {len(loaded)}"
statuses = {getattr(f, "status", None) for f in loaded}
assert statuses == {
StatusChoices.FAIL
}, f"expected all loaded findings to be FAIL; got statuses {statuses}"
assert statuses == {StatusChoices.FAIL}, (
f"expected all loaded findings to be FAIL; got statuses {statuses}"
)
# total_counts must reflect the FAIL-only total, not the global total.
assert (
totals[check_id] == 3
), f"total_counts should be FAIL-only (3), got {totals[check_id]}"
assert totals[check_id] == 3, (
f"total_counts should be FAIL-only (3), got {totals[check_id]}"
)
def test_max_findings_per_check_disabled(self, tenants_fixture, scans_fixture):
"""``MAX_FINDINGS_PER_CHECK=0`` disables the cap; load all rows."""
@@ -1259,12 +1259,12 @@ class TestGenerateComplianceReportsOptimized:
# ``tsc_only`` was exclusive to ThreatScore → evicted before ENS ran.
# ``shared`` is still pending for ENS → must remain.
assert (
"tsc_only" not in observed_state["cache_keys_when_ens_runs"]
), "tsc_only should have been evicted before ENS ran"
assert (
"shared" in observed_state["cache_keys_when_ens_runs"]
), "shared must remain in cache because ENS still needs it"
assert "tsc_only" not in observed_state["cache_keys_when_ens_runs"], (
"tsc_only should have been evicted before ENS ran"
)
assert "shared" in observed_state["cache_keys_when_ens_runs"], (
"shared must remain in cache because ENS still needs it"
)
@patch("tasks.jobs.report.initialize_prowler_provider")
@patch("tasks.jobs.report.rmtree")
@@ -1301,8 +1301,15 @@ class TestGenerateComplianceReportsOptimized:
"""
mock_scan_summary_filter.return_value.exists.return_value = True
mock_provider_get.return_value = Mock(uid="provider-uid", provider="aws")
# CIS variant discovery needs at least one cis_* key.
mock_get_bulk.return_value = {"cis_6.0_aws": Mock()}
# All five report frameworks exist for AWS; report availability is now
# derived from the framework map (CIS variant discovery needs a cis_* key).
mock_get_bulk.return_value = {
"prowler_threatscore_aws": Mock(),
"ens_rd2022_aws": Mock(),
"nis2_aws": Mock(),
"csa_ccm_4.0_aws": Mock(),
"cis_6.0_aws": Mock(),
}
mock_aggregate_stats.return_value = {}
mock_generate_output_dir.return_value = "/tmp/tenant/scan/x/prowler-out"
mock_upload_to_s3.return_value = "s3://bucket/tenant/scan/x/report.pdf"
@@ -1365,7 +1372,9 @@ class TestGenerateComplianceReportsOptimized:
"""Cleanup must run when all generated (supported) reports are uploaded."""
mock_scan_summary_filter.return_value.exists.return_value = True
mock_provider_get.return_value = Mock(uid="provider-uid", provider="m365")
mock_get_bulk.return_value = {}
# Real ``Compliance.get_bulk("m365")`` exposes the ThreatScore framework
# but not ENS/NIS2/CSA; report availability is derived from this map.
mock_get_bulk.return_value = {"prowler_threatscore_m365": Mock()}
mock_aggregate_stats.return_value = {}
mock_generate_output_dir.return_value = (
"/tmp/tenant/scan/threatscore/prowler-output-provider-20240101000000"
@@ -1416,7 +1425,9 @@ class TestGenerateComplianceReportsOptimized:
"""Cleanup must not run when a generated report upload fails."""
mock_scan_summary_filter.return_value.exists.return_value = True
mock_provider_get.return_value = Mock(uid="provider-uid", provider="m365")
mock_get_bulk.return_value = {}
# Real ``Compliance.get_bulk("m365")`` exposes the ThreatScore framework
# but not ENS/NIS2/CSA; report availability is derived from this map.
mock_get_bulk.return_value = {"prowler_threatscore_m365": Mock()}
mock_aggregate_stats.return_value = {}
mock_generate_output_dir.return_value = (
"/tmp/tenant/scan/threatscore/prowler-output-provider-20240101000000"
@@ -1440,6 +1451,111 @@ class TestGenerateComplianceReportsOptimized:
mock_threatscore.assert_called_once()
mock_rmtree.assert_not_called()
@patch("tasks.jobs.report.generate_csa_report")
@patch("tasks.jobs.report.generate_nis2_report")
@patch("tasks.jobs.report.generate_ens_report")
@patch("tasks.jobs.report.generate_threatscore_report")
@patch("tasks.jobs.report._aggregate_requirement_statistics_from_database")
@patch("tasks.jobs.report.Compliance.get_bulk")
@patch("tasks.jobs.report.Provider.objects.get")
@patch("tasks.jobs.report.ScanSummary.objects.filter")
def test_external_provider_without_frameworks_disables_all_reports(
self,
mock_scan_summary_filter,
mock_provider_get,
mock_get_bulk,
mock_aggregate_stats,
mock_threatscore,
mock_ens,
mock_nis2,
mock_csa,
):
"""An externally-registered provider with no compliance frameworks gets
every report disabled (empty output set) without a code change."""
mock_scan_summary_filter.return_value.exists.return_value = True
mock_provider_get.return_value = Mock(uid="acme-001", provider="local-template")
mock_get_bulk.return_value = {}
mock_aggregate_stats.return_value = {}
result = generate_compliance_reports(
tenant_id=str(uuid.uuid4()),
scan_id=str(uuid.uuid4()),
provider_id=str(uuid.uuid4()),
generate_threatscore=True,
generate_ens=True,
generate_nis2=True,
generate_csa=True,
generate_cis=False,
)
assert result["threatscore"]["upload"] is False
assert result["ens"]["upload"] is False
assert result["nis2"]["upload"] is False
assert result["csa"]["upload"] is False
mock_threatscore.assert_not_called()
mock_ens.assert_not_called()
mock_nis2.assert_not_called()
mock_csa.assert_not_called()
@patch("tasks.jobs.report.initialize_prowler_provider")
@patch("tasks.jobs.report.rmtree")
@patch("tasks.jobs.report._upload_to_s3")
@patch("tasks.jobs.report.generate_csa_report")
@patch("tasks.jobs.report.generate_nis2_report")
@patch("tasks.jobs.report.generate_ens_report")
@patch("tasks.jobs.report.generate_threatscore_report")
@patch("tasks.jobs.report._generate_compliance_output_directory")
@patch("tasks.jobs.report._aggregate_requirement_statistics_from_database")
@patch("tasks.jobs.report.Compliance.get_bulk")
@patch("tasks.jobs.report.Provider.objects.get")
@patch("tasks.jobs.report.ScanSummary.objects.filter")
def test_external_provider_with_threatscore_framework_enables_it(
self,
mock_scan_summary_filter,
mock_provider_get,
mock_get_bulk,
mock_aggregate_stats,
mock_generate_output_dir,
mock_threatscore,
mock_ens,
mock_nis2,
mock_csa,
mock_upload_to_s3,
mock_rmtree,
mock_init_provider,
):
"""An externally-registered provider that ships its own ThreatScore
framework gets the ThreatScore report generated with no API code change
exactly what the old hard-coded provider whitelist silently blocked.
Frameworks it does not ship (ENS/NIS2/CSA) stay disabled."""
mock_scan_summary_filter.return_value.exists.return_value = True
mock_provider_get.return_value = Mock(uid="acme-001", provider="local-template")
mock_get_bulk.return_value = {"prowler_threatscore_local-template": Mock()}
mock_aggregate_stats.return_value = {}
mock_generate_output_dir.return_value = "/tmp/tenant/scan/threatscore/out"
mock_upload_to_s3.return_value = "s3://bucket/report.pdf"
mock_init_provider.return_value = Mock(name="prowler_provider")
result = generate_compliance_reports(
tenant_id=str(uuid.uuid4()),
scan_id=str(uuid.uuid4()),
provider_id=str(uuid.uuid4()),
generate_threatscore=True,
generate_ens=True,
generate_nis2=True,
generate_csa=True,
generate_cis=False,
)
# The provider ships the ThreatScore framework -> report generated.
mock_threatscore.assert_called_once()
assert result["threatscore"]["upload"] is True
# It does not ship the others -> they stay disabled.
mock_ens.assert_not_called()
mock_nis2.assert_not_called()
mock_csa.assert_not_called()
assert result["ens"]["upload"] is False
@pytest.mark.django_db
class TestGenerateComplianceReportsCIS:
@@ -80,7 +80,7 @@ def basic_csa_compliance_data():
tenant_id="tenant-123",
scan_id="scan-456",
provider_id="provider-789",
compliance_id="csa_ccm_4.0_aws",
compliance_id="csa_ccm_4.0",
framework="CSA-CCM",
name="CSA Cloud Controls Matrix v4.0",
version="4.0",
+184
View File
@@ -32,12 +32,15 @@ from tasks.utils import CustomEncoder
from api.db_router import MainRouter
from api.exceptions import ProviderConnectionError
from api.models import (
AttackSurfaceOverview,
Finding,
MuteRule,
Provider,
Resource,
ResourceScanSummary,
Scan,
ScanCategorySummary,
ScanGroupSummary,
ScanSummary,
StateChoices,
StatusChoices,
@@ -229,6 +232,131 @@ class TestPerformScan:
# Assert that failed_findings_count is 0 (finding is PASS and muted)
assert scan_resource.failed_findings_count == 0
def test_perform_prowler_scan_idempotent_on_rerun(
self,
tenants_fixture,
scans_fixture,
providers_fixture,
):
"""Re-running a scan for the same scan_id must not duplicate findings."""
with (
patch("api.db_utils.rls_transaction"),
patch(
"tasks.jobs.scan.initialize_prowler_provider"
) as mock_initialize_prowler_provider,
patch("tasks.jobs.scan.ProwlerScan") as mock_prowler_scan_class,
patch(
"tasks.jobs.scan.PROWLER_COMPLIANCE_OVERVIEW_TEMPLATE",
new_callable=dict,
),
patch("api.compliance.PROWLER_CHECKS", new_callable=dict) as mock_checks,
):
mock_checks["aws"] = {"check1": {"compliance1"}}
tenant = tenants_fixture[0]
scan = scans_fixture[0]
provider = providers_fixture[0]
provider.provider = Provider.ProviderChoices.AWS
provider.save()
tenant_id = str(tenant.id)
scan_id = str(scan.id)
provider_id = str(provider.id)
stale_resource = Resource.objects.create(
tenant_id=tenant.id,
provider=provider,
uid="stale_resource_uid",
name="stale",
region="stale-region",
service="stale-service",
type="stale-type",
)
ResourceScanSummary.objects.create(
tenant_id=tenant.id,
scan_id=scan.id,
resource_id=stale_resource.id,
service="stale-service",
region="stale-region",
resource_type="stale-type",
)
ScanCategorySummary.objects.create(
tenant_id=tenant.id,
scan=scan,
category="stale-category",
severity=Severity.medium,
total_findings=1,
)
ScanGroupSummary.objects.create(
tenant_id=tenant.id,
scan=scan,
resource_group="stale-group",
severity=Severity.medium,
total_findings=1,
)
ScanSummary.objects.create(
tenant_id=tenant.id,
scan=scan,
check_id="stale_check",
service="stale-service",
severity=Severity.medium,
region="stale-region",
total=1,
)
AttackSurfaceOverview.objects.create(
tenant_id=tenant.id,
scan=scan,
attack_surface_type=AttackSurfaceOverview.AttackSurfaceTypeChoices.SECRETS,
total_findings=1,
)
finding = MagicMock()
finding.uid = "dup_probe_finding"
finding.status = StatusChoices.PASS
finding.status_extended = "x"
finding.severity = Severity.medium
finding.check_id = "check1"
finding.get_metadata.return_value = {"key": "value"}
finding.resource_uid = "resource_uid"
finding.resource_name = "resource_name"
finding.region = "region"
finding.service_name = "service_name"
finding.resource_type = "resource_type"
finding.resource_tags = {}
finding.muted = False
finding.raw = {}
finding.resource_metadata = {}
finding.resource_details = {}
finding.partition = "partition"
finding.compliance = {}
mock_scan_instance = MagicMock()
mock_scan_instance.scan.return_value = [(100, [finding])]
mock_prowler_scan_class.return_value = mock_scan_instance
mock_provider_instance = MagicMock()
mock_provider_instance.get_regions.return_value = ["region"]
mock_initialize_prowler_provider.return_value = mock_provider_instance
# Run the same scan twice (simulating an orphan-recovery re-run).
perform_prowler_scan(tenant_id, scan_id, provider_id, ["check1"])
perform_prowler_scan(tenant_id, scan_id, provider_id, ["check1"])
# Neither findings nor resources are duplicated by the re-run: findings are
# scope-deleted before re-insert; resources are upserted by (tenant, provider, uid).
assert Finding.objects.filter(scan=scan).count() == 1
assert Resource.objects.filter(provider=provider).count() == 2
assert ResourceScanSummary.objects.filter(scan_id=scan.id).count() == 1
assert not ResourceScanSummary.objects.filter(
scan_id=scan.id, resource_id=stale_resource.id
).exists()
assert not ScanCategorySummary.objects.filter(scan=scan).exists()
assert not ScanGroupSummary.objects.filter(scan=scan).exists()
assert not ScanSummary.objects.filter(
scan=scan, check_id="stale_check"
).exists()
assert not AttackSurfaceOverview.objects.filter(scan=scan).exists()
@patch("tasks.jobs.scan.ProwlerScan")
@patch(
"tasks.jobs.scan.initialize_prowler_provider",
@@ -1880,6 +2008,62 @@ class TestCreateComplianceRequirements:
assert "requirements_created" in result
@pytest.mark.django_db(transaction=True)
def test_create_compliance_requirements_idempotent_on_rerun(
self,
tenants_fixture,
scans_fixture,
providers_fixture,
findings_fixture,
):
"""Re-running compliance materialization must not raise nor duplicate rows.
Uses transaction=True because the COPY path commits on its own connection,
so the test must use real commits (mirroring production) rather than the
default rollback wrapper.
"""
from api.models import ComplianceRequirementOverview
with patch(
"tasks.jobs.scan.PROWLER_COMPLIANCE_OVERVIEW_TEMPLATE"
) as mock_compliance_template:
tenant_id = str(tenants_fixture[0].id)
scan_id = str(scans_fixture[0].id)
mock_compliance_template.__getitem__.return_value = {
"test_compliance": {
"framework": "Test Framework",
"version": "1.0",
"requirements": {
"req_1": {
"description": "Test Requirement 1",
"checks": {"test_check_id": None},
"checks_status": {
"pass": 2,
"fail": 1,
"manual": 0,
"total": 3,
},
"status": "FAIL",
},
},
}
}
create_compliance_requirements(tenant_id, scan_id)
count_after_first = ComplianceRequirementOverview.objects.filter(
scan_id=scan_id
).count()
# Second run must not raise and must not duplicate rows.
create_compliance_requirements(tenant_id, scan_id)
count_after_second = ComplianceRequirementOverview.objects.filter(
scan_id=scan_id
).count()
assert count_after_first > 0
assert count_after_second == count_after_first
def test_create_compliance_requirements_kubernetes_provider(
self,
tenants_fixture,
+166 -3
View File
@@ -15,8 +15,10 @@ from tasks.jobs.lighthouse_providers import (
from tasks.tasks import (
DJANGO_TMP_OUTPUT_DIRECTORY,
STALE_TMP_OUTPUT_MAX_AGE_HOURS,
ScanReportRLSTask,
_cleanup_orphan_scheduled_scans,
_perform_scan_complete_tasks,
_scan_tmp_output_directory,
check_integrations_task,
check_lighthouse_provider_connection_task,
generate_outputs_task,
@@ -321,6 +323,7 @@ class TestGenerateOutputs:
mock_transformed_stats = {"some": "stats"}
with (
patch("tasks.tasks.get_prowler_provider_compliance", return_value={}),
patch(
"tasks.tasks.FindingOutput._transform_findings_stats",
return_value=mock_transformed_stats,
@@ -439,6 +442,7 @@ class TestGenerateOutputs:
mock_provider.uid = "test-provider-uid"
with (
patch("tasks.tasks.get_prowler_provider_compliance", return_value={}),
patch("tasks.tasks.ScanSummary.objects.filter") as mock_filter,
patch("tasks.tasks.Provider.objects.get", return_value=mock_provider),
patch("tasks.tasks.initialize_prowler_provider"),
@@ -594,6 +598,7 @@ class TestGenerateOutputs:
]
with (
patch("tasks.tasks.get_prowler_provider_compliance", return_value={}),
patch("tasks.tasks.ScanSummary.objects.filter") as mock_summary,
patch(
"tasks.tasks.Provider.objects.get",
@@ -643,6 +648,92 @@ class TestGenerateOutputs:
assert writer.transform_calls == [([raw2], compliance_obj, "cis")]
assert result == {"upload": True}
def test_external_provider_falls_back_to_generic_compliance(self):
"""A provider absent from COMPLIANCE_CLASS_MAP (e.g. an externally
registered one) gets GenericCompliance for every framework the
provider-specific writers are never used and nothing errors."""
raw = MagicMock()
compliance_obj = MagicMock()
generic_instances = []
specific_instances = []
class TrackingGeneric:
def __init__(self, *args, **kwargs):
self._data = []
self.close_file = False
generic_instances.append(self)
def transform(self, *args, **kwargs):
pass
def batch_write_data_to_file(self):
pass
class AwsSpecificWriter:
def __init__(self, *args, **kwargs):
specific_instances.append(self)
def transform(self, *args, **kwargs):
pass
def batch_write_data_to_file(self):
pass
one_batch = [([raw], True)]
with (
patch("tasks.tasks.ScanSummary.objects.filter") as mock_summary,
patch(
"tasks.tasks.Provider.objects.get",
return_value=MagicMock(uid="acme-001", provider="local-template"),
),
patch("tasks.tasks.initialize_prowler_provider"),
patch(
"tasks.tasks.Compliance.get_bulk",
return_value={"prowlerbaseline_1.0_local-template": compliance_obj},
),
patch(
"tasks.tasks.get_compliance_frameworks",
return_value=["prowlerbaseline_1.0_local-template"],
),
patch(
"tasks.tasks._generate_output_directory",
return_value=("/tmp/test/outdir", "/tmp/test/compdir"),
),
patch("tasks.tasks.FindingOutput._transform_findings_stats"),
patch(
"tasks.tasks.FindingOutput.transform_api_finding",
side_effect=lambda f, _prov: f,
),
patch("tasks.tasks._compress_output_files", return_value="outdir.zip"),
patch("tasks.tasks._upload_to_s3", return_value="s3://bucket/outdir.zip"),
patch(
"tasks.tasks.Scan.all_objects.filter",
return_value=MagicMock(update=lambda **_kw: None),
),
patch("tasks.tasks.batched", return_value=one_batch),
patch("tasks.tasks.OUTPUT_FORMATS_MAPPING", {}),
patch("tasks.tasks.rmtree"),
patch("tasks.tasks.GenericCompliance", TrackingGeneric),
patch(
"tasks.tasks.COMPLIANCE_CLASS_MAP",
{"aws": [(lambda name: True, AwsSpecificWriter)]},
),
):
mock_summary.return_value.exists.return_value = True
result = generate_outputs_task(
scan_id=self.scan_id,
provider_id=self.provider_id,
tenant_id=self.tenant_id,
)
assert result == {"upload": True}
# The external provider's framework used the generic writer…
assert len(generic_instances) == 1
# …and the AWS-specific writer was never instantiated.
assert specific_instances == []
# TODO: We need to add a periodic task to delete old output files
def test_generate_outputs_logs_rmtree_exception(self, caplog):
mock_finding_output = MagicMock()
@@ -668,6 +759,7 @@ class TestGenerateOutputs:
mock_provider.uid = "test-provider-uid"
with (
patch("tasks.tasks.get_prowler_provider_compliance", return_value={}),
patch("tasks.tasks.ScanSummary.objects.filter") as mock_filter,
patch("tasks.tasks.Provider.objects.get", return_value=mock_provider),
patch("tasks.tasks.initialize_prowler_provider"),
@@ -771,6 +863,38 @@ class TestGenerateOutputs:
mock_s3_task.assert_called_once()
class TestScanReportRLSTaskOnFailure:
def test_on_failure_removes_scan_tmp_directory(self):
task = ScanReportRLSTask()
with patch("tasks.tasks.rmtree") as mock_rmtree:
task.on_failure(
exc=OSError("No space left on device"),
task_id="task-abc",
args=(),
kwargs={"tenant_id": "t-1", "scan_id": "s-1"},
_einfo=None,
)
mock_rmtree.assert_called_once_with(
_scan_tmp_output_directory("t-1", "s-1"), ignore_errors=True
)
def test_on_failure_skips_when_missing_kwargs(self):
task = ScanReportRLSTask()
with patch("tasks.tasks.rmtree") as mock_rmtree:
task.on_failure(
exc=OSError("No space left on device"),
task_id="task-abc",
args=(),
kwargs={},
_einfo=None,
)
mock_rmtree.assert_not_called()
class TestScanCompleteTasks:
@patch("tasks.tasks.aggregate_attack_surface_task.apply_async")
@patch("tasks.tasks.chain")
@@ -1079,6 +1203,7 @@ class TestCheckIntegrationsTask:
enabled=True,
)
@patch("tasks.tasks.get_prowler_provider_compliance", return_value={})
@patch("tasks.tasks.s3_integration_task")
@patch("tasks.tasks.Integration.objects.filter")
@patch("tasks.tasks.ScanSummary.objects.filter")
@@ -1111,6 +1236,7 @@ class TestCheckIntegrationsTask:
mock_scan_summary,
mock_integration_filter,
mock_s3_task,
mock_get_prowler_compliance,
):
"""Test that ASFF output is generated for AWS providers with SecurityHub integration."""
# Setup
@@ -1207,6 +1333,7 @@ class TestCheckIntegrationsTask:
assert result == {"upload": True}
@patch("tasks.tasks.get_prowler_provider_compliance", return_value={})
@patch("tasks.tasks.s3_integration_task")
@patch("tasks.tasks.Integration.objects.filter")
@patch("tasks.tasks.ScanSummary.objects.filter")
@@ -1239,6 +1366,7 @@ class TestCheckIntegrationsTask:
mock_scan_summary,
mock_integration_filter,
mock_s3_task,
mock_get_prowler_compliance,
):
"""Test that ASFF output is NOT generated for AWS providers without SecurityHub integration."""
# Setup
@@ -1332,6 +1460,7 @@ class TestCheckIntegrationsTask:
assert result == {"upload": True}
@patch("tasks.tasks.get_prowler_provider_compliance", return_value={})
@patch("tasks.tasks.ScanSummary.objects.filter")
@patch("tasks.tasks.Provider.objects.get")
@patch("tasks.tasks.initialize_prowler_provider")
@@ -1360,6 +1489,7 @@ class TestCheckIntegrationsTask:
mock_initialize_provider,
mock_provider_get,
mock_scan_summary,
mock_get_prowler_compliance,
):
"""Test that ASFF output is NOT generated for non-AWS providers (e.g., Azure, GCP)."""
# Setup
@@ -1434,9 +1564,9 @@ class TestCheckIntegrationsTask:
)
# Verify ASFF was NOT created for non-AWS provider
assert (
"asff" not in created_writers
), "ASFF writer should NOT be created for non-AWS providers"
assert "asff" not in created_writers, (
"ASFF writer should NOT be created for non-AWS providers"
)
assert "csv" in created_writers, "CSV writer should be created"
assert "ocsf" in created_writers, "OCSF writer should be created"
@@ -2672,3 +2802,36 @@ class TestReaggregateAllFindingGroupSummaries:
assert result == {"scans_reaggregated": 0}
mock_group.assert_not_called()
mock_chain.assert_not_called()
class TestTaskTimeLimits:
"""The per-task limits in task_annotations must actually take effect.
Celery applies a "*" annotation after the per-task one, so a "*" entry would
silently overwrite every specific limit and cap long scans at the default. The
default is set as the global limit instead, and these per-task limits must win.
"""
def test_long_running_tasks_exceed_the_default_limit(self):
from config.celery import celery_app
default = celery_app.conf.task_time_limit
for name in (
"scan-perform",
"scan-perform-scheduled",
"provider-deletion",
"tenant-deletion",
):
assert celery_app.tasks[name].time_limit > default
def test_connection_checks_stay_below_the_default_limit(self):
from config.celery import celery_app
default = celery_app.conf.task_time_limit
for name in (
"provider-connection-check",
"integration-connection-check",
"lighthouse-connection-check",
"lighthouse-provider-connection-check",
):
assert celery_app.tasks[name].time_limit < default
Generated
+1 -1
View File
@@ -4494,7 +4494,7 @@ dependencies = [
[[package]]
name = "prowler-api"
version = "1.30.0"
version = "1.31.0"
source = { virtual = "." }
dependencies = [
{ name = "cartography" },
@@ -0,0 +1,27 @@
import warnings
from dashboard.common_methods import get_section_containers_3_levels
warnings.filterwarnings("ignore")
def get_table(data):
aux = data[
[
"REQUIREMENTS_ATTRIBUTES_SECTION",
"REQUIREMENTS_ATTRIBUTES_SUBSECTION",
"NAME",
"CHECKID",
"STATUS",
"REGION",
"ACCOUNTID",
"RESOURCEID",
]
]
return get_section_containers_3_levels(
aux,
"REQUIREMENTS_ATTRIBUTES_SECTION",
"REQUIREMENTS_ATTRIBUTES_SUBSECTION",
"NAME",
)
+2
View File
@@ -139,6 +139,8 @@ services:
worker-dev:
image: prowler-api-dev
# Give Celery soft shutdown time to drain/re-queue in-flight tasks on stop.
stop_grace_period: 120s
build:
context: ./api
dockerfile: Dockerfile
+2
View File
@@ -129,6 +129,8 @@ services:
worker:
image: prowlercloud/prowler-api:${PROWLER_API_VERSION:-stable}
# Give Celery soft shutdown time to drain/re-queue in-flight tasks on stop.
stop_grace_period: 120s
env_file:
- path: .env
required: false
@@ -2,40 +2,228 @@
title: 'Creating a New Security Compliance Framework in Prowler'
---
This guide explains how to add a new security compliance framework to Prowler, end to end. It covers directory layout, the JSON schema, check mapping conventions, the Pydantic models that validate each framework, the CSV output formatter, local validation, testing, and the pull request process.
This guide explains how to add a new security compliance framework to Prowler, end to end. It covers directory layout, the two supported JSON schemas (universal and legacy), the Pydantic models that validate each framework, check mapping conventions, output formatting, local validation, testing, and the pull request process.
## Introduction
A compliance framework in Prowler maps a public or custom control catalog (for example CIS, NIST 800-53, PCI DSS, HIPAA, ENS, CCC) to the security checks that Prowler already runs. Each requirement links to zero, one or more Prowler checks. When a scan executes, findings are aggregated per requirement to produce the compliance report rendered by Prowler CLI and Prowler Cloud.
A compliance framework in Prowler maps a public or custom control catalog (for example CIS, NIST 800-53, PCI DSS, HIPAA, ENS, CCC, DORA) to the security checks that Prowler already runs. Each requirement links to zero, one or more Prowler checks. When a scan executes, findings are aggregated per requirement to produce the compliance report rendered by Prowler CLI and Prowler Cloud.
Prowler ships with 85+ compliance frameworks across All Providers. The catalog lives under `prowler/compliance/<provider>/` (or `prowler/compliance/` for universal compliance frameworks)
Prowler ships 85+ compliance frameworks across all providers. The catalog lives under `prowler/compliance/<provider>/` (legacy, per-provider) or `prowler/compliance/` (universal, multi-provider).
<Warning>
A compliance framework must represent the **complete state** of the source catalog. Every requirement defined by the framework has to be present in the JSON file, even when none of the existing Prowler checks can automate it. In that case, leave `Checks` as an empty array, but do not omit the requirement.
A compliance framework must represent the **complete state** of the source catalog. Every requirement defined by the framework has to be present in the JSON file, even when no Prowler check can automate it. In that case, leave the requirement's check list empty, but do not omit the requirement.
Requirement coverage feeds the compliance percentage calculations and the metadata surfaces (dashboards, widgets, exports). Missing requirements skew those metrics and break the report as a faithful snapshot of the framework.
</Warning>
### Two supported schemas
| Schema | When to use | File location | Discovered as |
| --- | --- | --- | --- |
| **Universal (recommended for new frameworks)** | Multi-provider frameworks, or single-provider frameworks that benefit from declarative table/PDF rendering | `prowler/compliance/<framework>.json` (top-level) | Available for **every** provider whose key appears in any `requirement.checks` dict |
| **Legacy provider-specific** | Single-provider frameworks with framework-specific attribute classes already declared in the codebase (CIS, ENS, ISO 27001, etc.) | `prowler/compliance/<provider>/<framework>_<version>_<provider>.json` | Available only under that provider |
Auto-discovery happens in `get_bulk_compliance_frameworks_universal(provider)` (`prowler/lib/check/compliance_models.py:915`), which scans **both** the top-level `prowler/compliance/` directory and every per-provider sub-directory. Legacy frameworks are transparently converted to the universal `ComplianceFramework` model via `adapt_legacy_to_universal()` before being returned, so the rest of Prowler — CLI table rendering, CSV/OCSF outputs, PDF generation — works the same regardless of the source schema.
> The legacy entry-point `Compliance.get_bulk(provider)` (used by older code paths) only scans per-provider sub-directories. Universal top-level files are picked up exclusively via the universal loader; this matters if you are wiring a new code path against the legacy API.
For **new** frameworks, prefer the universal schema: it requires no Python code changes, supports multiple providers in a single file, and table/PDF rendering is driven entirely from declarative configuration inside the JSON.
> All Pydantic models in `compliance_models.py` are imported from `pydantic.v1`. Subclasses you add for the legacy schema must use `from pydantic.v1 import BaseModel`.
### Prerequisites
Before adding a new framework, complete the following checks:
- **Verify the framework is not already supported.** Inspect `prowler/compliance/<provider>/` for an existing JSON file matching the name and version.
- **Verify the framework is not already supported.** Inspect `prowler/compliance/` and every `prowler/compliance/<provider>/` for an existing JSON file matching the name and version.
- **Confirm the required checks exist.** Every requirement that can be automated must point to one or more existing Prowler checks. For each missing check, implement it first by following the [Prowler Checks](/developer-guide/checks) guide.
- **Review a reference framework.** Use an existing framework with a similar structure as your template. `cis_2.0_aws.json` is the canonical reference for CIS-style frameworks. `ccc_aws.json`, `ens_rd2022_aws.json`, and `nist_800_53_revision_5_aws.json` illustrate other attribute shapes.
- **Review a reference framework.** Use an existing framework with a similar structure as your template:
- Universal: `prowler/compliance/dora.json`, `prowler/compliance/csa_ccm_4.0.json`.
- Legacy: `prowler/compliance/aws/cis_2.0_aws.json` (canonical CIS shape), `prowler/compliance/aws/ccc_aws.json`, `prowler/compliance/aws/ens_rd2022_aws.json`, `prowler/compliance/aws/nist_800_53_revision_5_aws.json`.
## Four-Layer Architecture
## Universal Compliance Framework
A compliance framework spans four layers. A complete contribution must touch each layer that applies.
### Where the file lives
- **Layer 1 Schema validation:** The Pydantic models in `prowler/lib/check/compliance_models.py` define the canonical schema for each attribute shape (CIS, ENS, Mitre, CCC, C5, CSA CCM, ISO 27001, KISA ISMS-P, AWS Well-Architected, Prowler ThreatScore, and a generic fallback).
- **Layer 2 JSON catalog:** The framework JSON file in `prowler/compliance/<provider>/` lists every requirement and maps it to checks.
- **Layer 3 Output formatter:** The Python module in `prowler/lib/outputs/compliance/<framework>/` builds the CSV row model, the per-provider transformer, and the CLI summary table.
- **Layer 4 Output dispatchers:** The dispatchers in `prowler/lib/outputs/compliance/compliance.py` and `prowler/lib/outputs/compliance/compliance_output.py` route findings to the right formatter based on the framework identifier.
Place the file at the top level of the compliance directory:
The rest of this guide walks each layer in order.
```
prowler/compliance/<framework_name>.json
```
## Directory Structure and File Naming
Examples in the repository: `prowler/compliance/csa_ccm_4.0.json`, `prowler/compliance/dora.json`.
The file is auto-discovered — there is **no** need to register it in any `__init__.py`, modify `prowler/lib/outputs/`, or update any other Python module. The framework key Prowler CLI accepts via `--compliance` is the basename of the JSON file without `.json` (`dora.json` → `dora`).
### Top-level structure
```json
{
"framework": "<short identifier, e.g. \"DORA\" or \"CSA-CCM\">",
"name": "<human-readable full name>",
"version": "<framework version>",
"description": "<one-paragraph description shown in --list-compliance and PDF reports>",
"icon": "<short icon slug, optional>",
"attributes_metadata": [ /* see below */ ],
"outputs": { /* see below — optional */ },
"requirements": [ /* see below */ ]
}
```
A `provider` field at the top level is **optional**. The framework's effective provider list is derived by `ComplianceFramework.get_providers()` (`compliance_models.py:739`) from the union of all keys appearing in `requirement.checks` across all requirements; the explicit `provider` field is used **only as a fallback** when no requirement carries any `checks` key. This is what enables a single file (e.g. `dora.json`) to cover AWS today and add Azure / GCP / etc. tomorrow without restructuring.
Provider keys inside `requirement.checks` must match the directory names under `prowler/providers/`. The valid keys at present are: `aws`, `azure`, `gcp`, `m365`, `kubernetes`, `iac`, `github`, `googleworkspace`, `alibabacloud`, `cloudflare`, `mongodbatlas`, `nhn`, `openstack`, `oraclecloud`, `llm`. Comparison in `supports_provider()` is case-insensitive, but lowercase is the convention used everywhere in the repository.
### `attributes_metadata`
Declares the shape of the per-requirement `attributes` dict. When this field is present, the root validator `validate_attributes_against_metadata` (`compliance_models.py:669`) enforces the schema at load time and rejects:
- Missing keys marked `required: true`.
- Keys present in `attributes` but not declared in `attributes_metadata` (typo / drift guard).
- Values that violate a declared `enum`.
- Values whose Python type does not match a declared `int`, `float` or `bool`.
The runtime type check **only** covers `int`, `float` and `bool`. For `str`, `list_str` and `list_dict` the type is documentation-only — non-conforming values won't fail validation. If `attributes_metadata` is omitted, **no per-requirement validation runs at all**.
```json
"attributes_metadata": [
{
"key": "Pillar",
"label": "Pillar",
"type": "str",
"required": true,
"enum": [
"ICT Risk Management",
"ICT-Related Incident Reporting",
"Digital Operational Resilience Testing",
"ICT Third-Party Risk Management",
"Information Sharing"
],
"output_formats": { "csv": true, "ocsf": true }
},
{
"key": "Article",
"label": "Article",
"type": "str",
"required": true,
"output_formats": { "csv": true, "ocsf": true }
}
]
```
Per attribute:
- `key` (required): attribute name as it will appear in `requirement.attributes`.
- `label`: human-readable label used in CSV headers and PDF.
- `type`: one of `str`, `int`, `float`, `bool`, `list_str`, `list_dict`. Defaults to `str`.
- `enum`: optional list of allowed values; non-conforming values are rejected at load time.
- `required`: if `true`, every requirement must include this key with a non-null value.
- `enum_display` / `enum_order`: optional per-enum-value visual metadata (label, abbreviation, color, icon) and explicit ordering for PDF rendering.
- `output_formats`: `{ "csv": <bool>, "ocsf": <bool> }` — toggles inclusion in each output format. Both default to `true`.
### `outputs`
Optional. Controls how the framework is rendered in the console table and in the generated PDF report. Skipping it falls back to sensible defaults.
```json
"outputs": {
"table_config": {
"group_by": "Pillar"
},
"pdf_config": {
"language": "en",
"primary_color": "#003399",
"secondary_color": "#0055A5",
"bg_color": "#F0F4FA",
"group_by_field": "Pillar",
"sections": [ "ICT Risk Management", "ICT-Related Incident Reporting", "..." ],
"section_short_names": { "ICT Risk Management": "ICT Risk Mgmt" },
"charts": [
{
"id": "pillar_compliance",
"type": "horizontal_bar",
"group_by": "Pillar",
"title": "Compliance Score by Pillar",
"y_label": "Pillar",
"x_label": "Compliance %",
"value_source": "compliance_percent",
"color_mode": "by_value"
}
],
"filter": { "only_failed": true, "include_manual": false }
}
}
```
`table_config.group_by` must reference an attribute key declared in `attributes_metadata`. The same applies to `pdf_config.group_by_field` and to every `charts[].group_by`.
For frameworks with weighted scoring (e.g. ThreatScore) declare `pdf_config.scoring` with `risk_field` / `weight_field` / `risk_boost_factor`. For column splitting (e.g. CIS Level 1 vs Level 2) use `table_config.split_by`.
### `requirements`
```json
"requirements": [
{
"id": "DORA-Art5",
"name": "Governance and organisation",
"description": "Financial entities shall have a sound, comprehensive and well-documented ICT internal governance and control framework. ...",
"attributes": {
"Pillar": "ICT Risk Management",
"Article": "Article 5",
"ArticleTitle": "Governance and organisation"
},
"checks": {
"aws": [
"iam_avoid_root_usage",
"iam_no_root_access_key",
"iam_root_mfa_enabled"
],
"azure": [],
"gcp": []
}
}
]
```
Per requirement:
- `id` (required): unique identifier within the framework.
- `description` (required): the requirement text as authored by the framework.
- `name`: short title shown alongside the id.
- `attributes`: flat dict; keys must conform to `attributes_metadata`.
- `checks`: dict keyed by provider name (the same lowercase keys listed in the previous section). Each value is a list of Prowler check names that evidence this requirement for that provider. The list **may be empty** and the dict itself defaults to `{}` if omitted; either way the requirement is still loaded and listed by `--list-compliance-requirements`, it just has zero checks to execute. Note: there is **no automatic check-existence validation** at load time — referencing a non-existent check name will silently produce a requirement with no findings. Validate this yourself (see "Validating Your Framework" below).
For MITRE-style frameworks, additional optional fields are available on the requirement: `tactics`, `sub_techniques`, `platforms`, `technique_url` (these are populated automatically when adapting a legacy MITRE JSON to the universal model).
### Multi-provider frameworks
A single universal file can cover any number of providers. The framework appears under each provider's `--list-compliance` output as long as **at least one** requirement has that provider key in its `checks` dict.
When extending an existing universal framework with a new provider, the only change required is editing `requirement.checks`:
```diff
"checks": {
"aws": ["iam_avoid_root_usage", "iam_no_root_access_key"],
+ "azure": ["entra_policy_ensure_mfa_for_admin_roles"]
}
```
No code changes, no new file, no registration step.
## Legacy Provider-Specific Compliance Framework
The legacy schema is still fully supported and remains the format used by most frameworks shipped today (CIS, NIST, ISO 27001, FedRAMP, PCI DSS, GDPR, HIPAA, ENS, etc.). It binds a framework to a single provider and validates each requirement against a framework-specific Pydantic attribute class.
The legacy schema spans **four layers** — a complete contribution must touch every layer that applies:
- **Layer 1 — Schema validation:** the Pydantic models in `prowler/lib/check/compliance_models.py` define the canonical schema for each attribute shape.
- **Layer 2 — JSON catalog:** the framework JSON file in `prowler/compliance/<provider>/` lists every requirement and maps it to checks.
- **Layer 3 — Output formatter:** the Python module in `prowler/lib/outputs/compliance/<framework>/` builds the CSV row model, the per-provider transformer, and the CLI summary table.
- **Layer 4 — Output dispatchers:** the dispatchers in `prowler/lib/outputs/compliance/compliance.py` and `prowler/lib/outputs/compliance/compliance_output.py` route findings to the right formatter based on the framework identifier.
The universal schema collapses Layers 3 and 4 into declarative configuration inside the JSON — that is the main reason it is preferred for new contributions.
### Directory structure and file naming
Compliance frameworks live at:
@@ -46,8 +234,8 @@ prowler/compliance/<provider>/<framework>_<version>_<provider>.json
The filename conventions are:
- All lowercase, words separated with underscores.
- `<provider>` is a supported provider identifier: `aws`, `azure`, `gcp`, `kubernetes`, `m365`, `github`, `googleworkspace`, `alibabacloud`, `oraclecloud`, `cloudflare`, `mongodbatlas`, `nhn`, `openstack`, `iac`, `llm`.
- `<version>` is optional. Omit it when the framework has no versioning, as in `ccc_aws.json`.
- `<provider>` is a supported provider identifier (same lowercase list as the universal section above).
- `<version>` is optional but recommended. Omit only when the framework has no versioning (e.g. `ccc_aws.json`).
- The file basename (without `.json`) is the framework key that Prowler CLI accepts via `--compliance`.
Examples:
@@ -62,48 +250,50 @@ The output formatter directory mirrors the framework name:
```
prowler/lib/outputs/compliance/<framework>/
├── <framework>.py # CLI summary-table dispatcher
├── <framework>.py # CLI summary-table dispatcher
├── <framework>_<provider>.py # Per-provider transformer class
├── models.py # Pydantic CSV row model
└── __init__.py
```
## JSON Schema Reference
### JSON schema reference
Every compliance file is a JSON document with the following top-level keys.
Every legacy compliance file is a JSON document with the following top-level keys. `Framework`, `Name` and `Provider` are validated non-empty by the root validator `framework_and_provider_must_not_be_empty` (`compliance_models.py:329`).
| Field | Type | Required | Description |
|---|---|---|---|
| `Framework` | string | Yes | Canonical framework identifier, for example `CIS`, `NIST-800-53-Revision-5`, `ENS`, `CCC`. |
| `Name` | string | Yes | Human-readable framework name displayed by Prowler App. |
| `Version` | string | Yes | Framework version, for example `2.0`. Use an empty string only for frameworks without versioning. See [Version Handling](#version-handling). |
| `Version` | string | Yes (recommended) | Framework version, e.g. `2.0`. See [Version Handling](#version-handling). |
| `Provider` | string | Yes | Upper-cased provider identifier: `AWS`, `AZURE`, `GCP`, `KUBERNETES`, `M365`, `GITHUB`, `GOOGLEWORKSPACE`, and so on. |
| `Description` | string | Yes | Short description of the framework's scope and purpose. |
| `Requirements` | array | Yes | List of [requirement objects](#requirement-object). |
### Requirement Object
#### Requirement Object
Each entry in `Requirements` describes one control or requirement.
| Field | Type | Required | Description |
|---|---|---|---|
| `Id` | string | Yes | Unique identifier within the framework, for example `1.10` or `CCC.Core.CN01.AR01`. |
| `Name` | string | No | Optional human-readable name used by frameworks that distinguish control name from description, such as NIST. |
| `Name` | string | No | Optional human-readable name (frameworks like NIST distinguish control name from description). |
| `Description` | string | Yes | Verbatim description from the source framework. |
| `Attributes` | array | Yes | List of [attribute objects](#attribute-objects). The shape depends on the framework. |
| `Checks` | array of strings | Yes | Prowler check identifiers that automate the requirement. Leave the list empty when the control cannot be automated. |
### Attribute Objects
#### Attribute Objects
Attributes carry the metadata that Prowler App and the CSV output display for each requirement. The object shape is framework-specific and is validated by a dedicated Pydantic model in `prowler/lib/check/compliance_models.py`. The most common shapes are summarized below.
`Attributes` is parsed against the union declared in `Compliance_Requirement.Attributes` (`compliance_models.py:293`). Pydantic v1 tries each member of the union in declaration order and falls back to `Generic_Compliance_Requirement_Attribute` (the last entry) when nothing else matches — so a brand-new shape that doesn't match any existing class will silently be accepted as Generic, losing its specific fields.
#### CIS_Requirement_Attribute
As of today, the registered attribute classes are: `CIS_Requirement_Attribute`, `ENS_Requirement_Attribute`, `ASDEssentialEight_Requirement_Attribute`, `ISO27001_2013_Requirement_Attribute`, `AWS_Well_Architected_Requirement_Attribute`, `KISA_ISMSP_Requirement_Attribute`, `Prowler_ThreatScore_Requirement_Attribute`, `CCC_Requirement_Attribute`, `C5Germany_Requirement_Attribute`, `CSA_CCM_Requirement_Attribute`, and `Generic_Compliance_Requirement_Attribute` (fallback). MITRE-style frameworks use the separate `Mitre_Requirement` model with `Tactics` / `SubTechniques` / `Platforms` / `TechniqueURL` at the requirement top level. The most common shapes are summarized below.
##### CIS_Requirement_Attribute
Used by every CIS benchmark.
| Field | Type | Required | Notes |
|---|---|---|---|
| `Section` | string | Yes | Top-level section, for example `1 Identity and Access Management`. |
| `Section` | string | Yes | Top-level section, e.g. `1 Identity and Access Management`. |
| `SubSection` | string | No | Optional second-level grouping. |
| `Profile` | enum | Yes | One of `Level 1`, `Level 2`, `E3 Level 1`, `E3 Level 2`, `E5 Level 1`, `E5 Level 2`. |
| `AssessmentStatus` | enum | Yes | `Manual` or `Automated`. |
@@ -116,7 +306,7 @@ Used by every CIS benchmark.
| `DefaultValue` | string | No | Default configuration value, when relevant. |
| `References` | string | Yes | Colon-separated list of reference URLs. |
#### ENS_Requirement_Attribute
##### ENS_Requirement_Attribute
Used by the Spanish ENS (Esquema Nacional de Seguridad) frameworks.
@@ -132,13 +322,13 @@ Used by the Spanish ENS (Esquema Nacional de Seguridad) frameworks.
| `ModoEjecucion` | string | Yes | Execution mode (`manual`, `automático`, `híbrido`). |
| `Dependencias` | array of strings | Yes | Ids of prerequisite controls. Empty list when none. |
#### CCC_Requirement_Attribute
##### CCC_Requirement_Attribute
Used by the Common Cloud Controls Catalog.
| Field | Type | Required | Notes |
|---|---|---|---|
| `FamilyName` | string | Yes | Control family, for example `Data`. |
| `FamilyName` | string | Yes | Control family, e.g. `Data`. |
| `FamilyDescription` | string | Yes | Description of the family. |
| `Section` | string | Yes | Section title. |
| `SubSection` | string | Yes | Subsection title, or empty string. |
@@ -148,9 +338,9 @@ Used by the Common Cloud Controls Catalog.
| `SectionThreatMappings` | array of objects | Yes | Each entry has `ReferenceId` and `Identifiers`. |
| `SectionGuidelineMappings` | array of objects | Yes | Each entry has `ReferenceId` and `Identifiers`. |
#### Generic_Compliance_Requirement_Attribute
##### Generic_Compliance_Requirement_Attribute
The fallback attribute model used when no framework-specific schema applies (for example NIST 800-53, PCI DSS, GDPR, HIPAA).
The fallback attribute model used when no framework-specific schema applies (e.g. NIST 800-53, PCI DSS, GDPR, HIPAA). It is **always the last** element of the `Compliance_Requirement.Attributes` Union; that ordering is load-bearing.
| Field | Type | Required | Notes |
|---|---|---|---|
@@ -158,17 +348,17 @@ The fallback attribute model used when no framework-specific schema applies (for
| `Section` | string | No | Section name. |
| `SubSection` | string | No | Subsection name. |
| `SubGroup` | string | No | Subgroup name. |
| `Service` | string | No | Affected service, for example `aws`, `iam`. |
| `Service` | string | No | Affected service, e.g. `iam`. |
| `Type` | string | No | Control type. |
| `Comment` | string | No | Free-form comment. |
Additional per-framework attribute models exist for `AWS_Well_Architected_Requirement_Attribute`, `ISO27001_2013_Requirement_Attribute`, `Mitre_Requirement_Attribute_<Provider>`, `KISA_ISMSP_Requirement_Attribute`, `Prowler_ThreatScore_Requirement_Attribute`, `C5Germany_Requirement_Attribute`, and `CSA_CCM_Requirement_Attribute`. Consult `prowler/lib/check/compliance_models.py` for their full field sets.
For the remaining attribute classes (`AWS_Well_Architected_Requirement_Attribute`, `ISO27001_2013_Requirement_Attribute`, `Mitre_Requirement_Attribute_<Provider>`, `KISA_ISMSP_Requirement_Attribute`, `Prowler_ThreatScore_Requirement_Attribute`, `C5Germany_Requirement_Attribute`, `CSA_CCM_Requirement_Attribute`) consult `prowler/lib/check/compliance_models.py` for the full field sets.
<Note>
The `Attributes` field is a Pydantic `Union`. The generic attribute model must remain the last element of that Union, otherwise Pydantic v1 silently coerces every framework into the generic shape and your specialized fields are dropped.
The `Attributes` field is a Pydantic `Union`. The generic attribute model **must** remain the last element of that Union otherwise Pydantic v1 silently coerces every framework into the generic shape and your specialized fields are dropped. Adding a brand-new attribute shape requires inserting the Pydantic class **before** `Generic_Compliance_Requirement_Attribute`.
</Note>
## Minimal Working Example
#### Minimal working example
The following snippet is a complete, valid framework file named `my_framework_1.0_aws.json`, saved at `prowler/compliance/aws/my_framework_1.0_aws.json`. It uses the generic attribute shape for simplicity.
@@ -214,26 +404,26 @@ The following snippet is a complete, valid framework file named `my_framework_1.
}
```
## Mapping Checks to Requirements
### Mapping checks to requirements
Each requirement links to the Prowler checks that, together, produce a PASS or FAIL verdict for that control.
- **Include every requirement from the source catalog.** The framework file must mirror the full control list, one-to-one. Compliance percentages, dashboards, and exported metadata are computed against the total requirement count, so omitting an unmappable control inflates coverage and misrepresents the framework.
- List every check by its canonical identifier, the value of `CheckID` inside the check's `.metadata.json` file.
- **Include every requirement from the source catalog.** The framework file must mirror the full control list, one-to-one. Compliance percentages, dashboards, and exported metadata are computed against the total requirement count.
- List every check by its canonical identifier the value of `CheckID` inside the check's `.metadata.json` file.
- One requirement can reference multiple checks. The requirement is evaluated as FAIL when any referenced check produces a FAIL finding for a resource in scope.
- Leave `Checks` as an empty array when the requirement cannot be automated. The requirement still appears in the report, contributes to the total, and resolves to `MANUAL`. An empty mapping is valid; a missing requirement is not.
- Leave `Checks` (legacy) or `checks.<provider>` (universal) as an empty array when the requirement cannot be automated. The requirement still appears in the report and contributes to the total.
- Reuse checks across requirements when the same control applies in multiple places. Do not duplicate check logic to match framework structure.
- Avoid referencing checks from a different provider. A compliance file is bound to one provider, and cross-provider checks will never match findings in the scan.
- Avoid referencing checks from a different provider. A legacy compliance file is bound to one provider, and cross-provider checks will never match findings in the scan.
To discover available checks, run:
To discover available checks:
```bash
uv run python prowler-cli.py <provider> --list-checks
```
## Supporting Multiple Providers
### Supporting multiple providers (legacy)
Each compliance file targets a single provider. To cover several providers with the same framework (for example CIS across AWS, Azure, and GCP), ship one JSON file per provider:
The legacy schema binds each file to a single provider. To cover several providers with the same framework, ship one JSON file per provider:
```
prowler/compliance/aws/cis_2.0_aws.json
@@ -241,15 +431,15 @@ prowler/compliance/azure/cis_2.0_azure.json
prowler/compliance/gcp/cis_2.0_gcp.json
```
Keep the `Framework` and `Version` values identical across the files so the dispatcher matches them, and change only the `Provider`, `Checks`, and provider-specific metadata.
Keep the `Framework` and `Version` values identical across the files so the dispatcher matches them; change only the `Provider`, `Checks`, and provider-specific metadata. The CIS output formatter already supports every provider listed above.
The CIS output formatter already supports every provider listed above. For a brand-new framework that spans several providers, add one transformer per provider in `prowler/lib/outputs/compliance/<framework>/` and extend the summary-table dispatcher accordingly. See [Output Formatter](#output-formatter).
For a brand-new framework that spans several providers, **prefer the universal schema** — it covers every provider from a single file. If you must use the legacy schema, add one transformer per provider in `prowler/lib/outputs/compliance/<framework>/` and extend the summary-table dispatcher accordingly. See [Output Formatter](#output-formatter).
## Output Formatter
### Output formatter
Prowler renders every compliance framework in two forms: a detailed CSV report written to disk, and a summary table printed in the CLI. Both are produced by the output formatter package for the framework.
Legacy frameworks render in two forms: a detailed CSV report written to disk, and a summary table printed in the CLI. Both are produced by the output formatter package for the framework. Universal frameworks do **not** need a Python output formatter — the `outputs` config inside the JSON drives rendering — so this section applies only to the legacy schema.
For a new framework named `my_framework`, create:
For a new legacy framework named `my_framework`, create:
```
prowler/lib/outputs/compliance/my_framework/
@@ -259,19 +449,19 @@ prowler/lib/outputs/compliance/my_framework/
└── models.py # CSV row Pydantic model
```
### Step 1 Define the CSV Row Model
#### Step 1 Define the CSV row model
In `models.py`, declare a Pydantic v1 model with one field per CSV column. Use existing models such as `AWSCISModel` in `prowler/lib/outputs/compliance/cis/models.py` as the reference. Fields typically include `Provider`, `Description`, `AccountId`, `Region`, `AssessmentDate`, `Requirements_Id`, `Requirements_Description`, one `Requirements_Attributes_*` field per attribute key, plus the finding fields `Status`, `StatusExtended`, `ResourceId`, `ResourceName`, `CheckId`, `Muted`, `Framework`, `Name`.
### Step 2 Implement the Transformer Class
#### Step 2 Implement the transformer
In `my_framework_aws.py`, subclass `ComplianceOutput` from `prowler.lib.outputs.compliance.compliance_output` and implement `transform(findings, compliance, compliance_name)`. Iterate over `findings`, match each finding to the requirements it satisfies through `finding.compliance.get(compliance_name, [])`, and append one row per attribute to `self._data`.
### Step 3 Add the Summary-Table Dispatcher
#### Step 3 Add the summary-table dispatcher
In `my_framework.py`, implement `get_my_framework_table(findings, bulk_checks_metadata, compliance_framework, output_filename, output_directory, compliance_overview)` following the pattern in `prowler/lib/outputs/compliance/cis/cis.py`.
### Step 4 Register the Framework in the Dispatchers
#### Step 4 Register the framework in the dispatchers
- Add the dispatcher call in `prowler/lib/outputs/compliance/compliance.py`, inside `display_compliance_table`, with a branch such as `elif "my_framework" in compliance_framework:`.
- Register the CSV model and transformer in `prowler/lib/outputs/compliance/compliance_output.py` so the CSV file is emitted during the scan.
@@ -280,49 +470,94 @@ In `my_framework.py`, implement `get_my_framework_table(findings, bulk_checks_me
For NIST-style catalogs that use `Generic_Compliance_Requirement_Attribute`, no custom formatter is needed. The generic formatter in `prowler/lib/outputs/compliance/generic/` handles them automatically, provided the JSON validates against the generic attribute schema.
</Note>
## Version Handling
### Legacy-to-universal adapter
At load time, every legacy file is transparently adapted to a `ComplianceFramework` via `adapt_legacy_to_universal()` (`compliance_models.py:819`), which: (a) flattens the first element of `Attributes` into a flat `attributes` dict, (b) wraps `Checks` as `{provider_lower: [...]}`, (c) infers `attributes_metadata` from the matched Pydantic class via `_infer_attribute_metadata()`. The rest of Prowler (CSV/OCSF/PDF output, CLI table) then treats both formats identically.
Loader-error behaviour differs between the two entry points:
- `load_compliance_framework()` (legacy) is **fail-fast**: it calls `sys.exit(1)` on any `ValidationError` (`compliance_models.py:464`).
- `load_compliance_framework_universal()` is more lenient — it logs the error and returns `None`, so `get_bulk_compliance_frameworks_universal()` simply skips the broken file and keeps loading the rest.
## Version handling
Prowler matches frameworks by concatenating `Framework` and `Version`. A missing or empty `Version` collapses several frameworks to the same key and breaks CLI filtering with `--compliance`.
- Always set `Version` to a non-empty string, even for frameworks that rename editions rather than version them. Use the edition identifier (for example `RD2022`, `v2025.10`, `4.0`).
- Always set `Version` (or `version` for universal frameworks) to a non-empty string, even for frameworks that rename editions rather than version them. Use the edition identifier (for example `RD2022`, `v2025.10`, `4.0`, `2022/2554`).
- When the source catalog has no version, use the first year of adoption or the release date.
- Make sure the version substring embedded in the filename matches `Version`, because the CLI dispatcher reads `compliance_framework.split("_")[1]` to select the correct version.
- For **legacy** files, make sure the version substring embedded in the filename matches `Version`, because the CLI dispatcher reads `compliance_framework.split("_")[1]` to select the correct version.
## Validating the Framework Locally
## Validating Your Framework
Follow the steps below before opening a pull request.
Before opening a PR, validate the JSON loads cleanly against the model and that every referenced check actually exists.
### 1. Run the Compliance Model Validator
### 1. Schema validation
For **universal** frameworks, load the file and inspect what was parsed. The framework key inside `bulk` is the **basename of the JSON file** (without `.json`); for `prowler/compliance/dora.json` that key is `dora`, for `prowler/compliance/aws/cis_5.0_aws.json` it is `cis_5.0_aws`.
```python
from prowler.lib.check.compliance_models import (
load_compliance_framework_universal,
get_bulk_compliance_frameworks_universal,
)
fw = load_compliance_framework_universal("prowler/compliance/<your_framework>.json")
assert fw is not None, "load returned None — check the logs for the validation error"
print(fw.framework, len(fw.requirements), fw.get_providers())
bulk = get_bulk_compliance_frameworks_universal("aws")
assert "<your_framework_filename_without_json>" in bulk
```
### 2. Check existence cross-check
There is **no automatic check-existence validation** at load time. Cross-check that every check name in your framework maps to a real check directory:
```python
import os
real = set()
for svc in os.listdir("prowler/providers/aws/services"):
svc_path = f"prowler/providers/aws/services/{svc}"
if not os.path.isdir(svc_path):
continue
for entry in os.listdir(svc_path):
if os.path.isfile(f"{svc_path}/{entry}/{entry}.metadata.json"):
real.add(entry)
referenced = {c for r in fw.requirements for c in r.checks.get("aws", [])}
missing = referenced - real
assert not missing, f"checks referenced in framework but not found in repo: {sorted(missing)}"
```
### 3. CLI smoke test
```bash
uv run python prowler-cli.py <provider> --list-compliance
```
The framework must appear in the output. A validation error indicates a schema mismatch between the JSON file and the attribute model.
### 2. Run a Scan Filtered by the Framework
The framework must appear in the output. A validation error indicates a schema mismatch.
```bash
uv run python prowler-cli.py <provider> \
--compliance <framework>_<version>_<provider> \
--compliance <framework_key> \
--log-level ERROR
```
Verify that:
- Prowler produces a CSV file under `output/compliance/` with the expected name.
- The CLI summary table lists every section in the framework.
- The CLI summary table lists every section / pillar of the framework.
- Findings roll up under the expected requirements.
### 3. Inspect the CSV Output
### 4. Inspect the CSV output
Open the generated CSV and confirm:
- All columns defined in `models.py` appear.
- Every requirement has at least one row per scanned resource.
- Values such as `Requirements_Attributes_Section` reflect the JSON content.
- All columns defined in `models.py` (legacy) or in `attributes_metadata` (universal) appear.
- Every requirement has at least one row per scanned resource (when there are findings).
- Attribute values such as `Requirements_Attributes_Section` reflect the JSON content.
### 4. Verify the Framework in Prowler App
### 5. Verify the framework in Prowler App
Launch Prowler App locally (`docker compose up` from the repository root) and run a scan with the new compliance framework. Confirm the compliance page renders the requirements, sections, and status widgets correctly.
@@ -331,7 +566,7 @@ Launch Prowler App locally (`docker compose up` from the repository root) and ru
Compliance contributions require two layers of tests.
- **Schema tests** exercise the Pydantic models. Extend `tests/lib/check/universal_compliance_models_test.py` with a case that loads the new JSON file and asserts the attribute type matches the expected model.
- **Output tests** exercise the transformer. Mirror the structure under `tests/lib/outputs/compliance/<framework>/` with fixtures that feed synthetic findings through the transformer and assert the resulting CSV rows.
- **Output tests** (legacy frameworks only) exercise the transformer. Mirror the structure under `tests/lib/outputs/compliance/<framework>/` with fixtures that feed synthetic findings through the transformer and assert the resulting CSV rows.
Run the suite with:
@@ -342,7 +577,20 @@ uv run pytest -n auto tests/lib/check/universal_compliance_models_test.py \
For guidance on writing Prowler SDK tests, refer to [Unit Testing](/developer-guide/unit-testing).
## Submitting the Pull Request
## Running and listing your framework
Once the file is in place, the CLI auto-discovers it:
```sh
prowler <provider> --list-compliance # framework appears in the list
prowler <provider> --compliance <framework_key> --list-checks
prowler <provider> --compliance <framework_key> # full scan + compliance report
prowler <provider> --compliance <framework_key> --list-compliance-requirements <framework_key>
```
For end-user-facing tutorials (recommended for high-profile frameworks), add a dedicated page under `docs/user-guide/compliance/tutorials/` and register it in the `"Compliance"` group of `docs/docs.json`. See `docs/user-guide/compliance/tutorials/threatscore.mdx` as a reference.
## Submitting the pull request
Before opening the pull request:
@@ -352,28 +600,31 @@ Before opening the pull request:
uv run pytest -n auto
```
2. Add a changelog entry under the `### 🚀 Added` section of `prowler/CHANGELOG.md`, describing the new framework and the providers it covers.
3. Follow the [Pull Request Template](https://github.com/prowler-cloud/prowler/blob/master/.github/pull_request_template.md) and set the PR title using Conventional Commits, for example `feat(compliance): add My Framework 1.0 for AWS`.
3. Follow the [Pull Request Template](https://github.com/prowler-cloud/prowler/blob/master/.github/pull_request_template.md) and set the PR title using Conventional Commits, e.g. `feat(compliance): add My Framework 1.0 for AWS`.
4. Request review from the compliance codeowners listed in `.github/CODEOWNERS`.
## Troubleshooting
The following issues are the most common when contributing a compliance framework.
- **`ValidationError: field required` during scan.** The JSON is missing a required attribute field. Re-check the matching Pydantic model in `prowler/lib/check/compliance_models.py`.
- **All attributes collapse to `Generic_Compliance_Requirement_Attribute` values.** The Pydantic `Union` is ordered incorrectly, or the JSON matches only the generic shape. Move the generic model to the last Union position and ensure every required field is present in the JSON.
- **`--compliance` filter does not find the framework.** The filename does not match the expected pattern `<framework>_<version>_<provider>.json`, the version is empty, or the file lives outside `prowler/compliance/<provider>/`.
- **CLI summary table is empty but the CSV is populated.** The dispatcher branch in `prowler/lib/outputs/compliance/compliance.py` is missing or its substring match does not catch the framework key.
- **CSV file is missing after the scan.** The transformer class is not registered in `prowler/lib/outputs/compliance/compliance_output.py`, or `transform()` raises silently. Run the scan with `--log-level DEBUG`.
- **Findings do not roll up under a requirement.** A check listed in `Checks` either does not exist for that provider or is spelled incorrectly. Run `--list-checks | grep <check_name>` to confirm.
- **`ValidationError: field required` during scan (legacy).** The JSON is missing a required attribute field. Re-check the matching Pydantic model in `prowler/lib/check/compliance_models.py`.
- **All attributes collapse to `Generic_Compliance_Requirement_Attribute` values (legacy).** The Pydantic `Union` is ordered incorrectly, or the JSON matches only the generic shape. Keep the generic model in the last Union position and ensure every required field is present in the JSON.
- **`attributes_metadata validation failed` (universal).** The root validator in `compliance_models.py:669` rejected the file. The error message lists each offending requirement; common causes are unknown attribute keys (typo or missing entry in `attributes_metadata`), enum violations, or missing required keys.
- **`--compliance` filter does not find the framework.** For legacy: the filename does not match `<framework>_<version>_<provider>.json`, the version is empty, or the file lives outside `prowler/compliance/<provider>/`. For universal: the file is not at the top level of `prowler/compliance/` or it loaded as `None` (check logs for the validation error).
- **CLI summary table is empty but the CSV is populated (legacy).** The dispatcher branch in `prowler/lib/outputs/compliance/compliance.py` is missing or its substring match does not catch the framework key.
- **CSV file is missing after the scan (legacy).** The transformer class is not registered in `prowler/lib/outputs/compliance/compliance_output.py`, or `transform()` raises silently. Run the scan with `--log-level DEBUG`.
- **Findings do not roll up under a requirement.** A check listed in `Checks` either does not exist for that provider or is spelled incorrectly. Run `--list-checks | grep <check_name>` to confirm, or run the check-existence cross-check from "Validating Your Framework".
## Reference Examples
## Reference examples
Use the following files as templates when modeling a new contribution.
- `prowler/compliance/aws/cis_2.0_aws.json` CIS attribute shape.
- `prowler/compliance/aws/nist_800_53_revision_5_aws.json` Generic attribute shape.
- `prowler/compliance/aws/ccc_aws.json` CCC attribute shape.
- `prowler/compliance/azure/ens_rd2022_azure.json` ENS attribute shape.
- `prowler/lib/check/compliance_models.py` Canonical Pydantic schemas.
- `prowler/lib/outputs/compliance/cis/` Reference implementation of a multi-provider output formatter.
- `prowler/lib/outputs/compliance/generic/` Reference implementation of a generic output formatter.
- `prowler/compliance/dora.json` — universal schema, single-provider populated (AWS), ready to extend with more providers.
- `prowler/compliance/csa_ccm_4.0.json` — universal schema, multi-provider populated (AWS, Azure, GCP, AlibabaCloud, OracleCloud).
- `prowler/compliance/aws/cis_2.0_aws.json` — legacy CIS attribute shape.
- `prowler/compliance/aws/nist_800_53_revision_5_aws.json` — legacy generic attribute shape.
- `prowler/compliance/aws/ccc_aws.json` — legacy CCC attribute shape.
- `prowler/compliance/azure/ens_rd2022_azure.json` — legacy ENS attribute shape.
- `prowler/lib/check/compliance_models.py` — canonical Pydantic schemas for both formats.
- `prowler/lib/outputs/compliance/cis/` — reference implementation of a multi-provider legacy output formatter.
- `prowler/lib/outputs/compliance/generic/` — reference implementation of a legacy generic output formatter.
@@ -20,7 +20,8 @@ Refer to the [Prowler App Tutorial](/user-guide/tutorials/prowler-app) for detai
_Commands_:
```bash
<CodeGroup>
```bash macOS/Linux
VERSION=$(curl -s https://api.github.com/repos/prowler-cloud/prowler/releases/latest | jq -r .tag_name)
curl -sLO "https://raw.githubusercontent.com/prowler-cloud/prowler/refs/tags/${VERSION}/docker-compose.yml"
# Environment variables can be customized in the .env file. Using default values in production environments is not recommended.
@@ -28,6 +29,15 @@ Refer to the [Prowler App Tutorial](/user-guide/tutorials/prowler-app) for detai
docker compose up -d
```
```powershell Windows PowerShell
$VERSION = (Invoke-RestMethod -Uri "https://api.github.com/repos/prowler-cloud/prowler/releases/latest").tag_name
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/prowler-cloud/prowler/refs/tags/$VERSION/docker-compose.yml" -OutFile "docker-compose.yml"
# Environment variables can be customized in the .env file. Using default values in production environments is not recommended.
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/prowler-cloud/prowler/refs/tags/$VERSION/.env" -OutFile ".env"
docker compose up -d
```
</CodeGroup>
<Callout icon="lock" iconType="regular" color="#e74c3c">
For a secure setup, the API auto-generates a unique key pair, `DJANGO_TOKEN_SIGNING_KEY` and `DJANGO_TOKEN_VERIFYING_KEY`, and stores it in `~/.config/prowler-api` (non-container) or the bound Docker volume in `_data/api` (container). Never commit or reuse static/default keys. To rotate keys, delete the stored key files and restart the API.
</Callout>
@@ -118,8 +128,8 @@ To update the environment file:
Edit the `.env` file and change version values:
```env
PROWLER_UI_VERSION="5.28.0"
PROWLER_API_VERSION="5.28.0"
PROWLER_UI_VERSION="5.29.0"
PROWLER_API_VERSION="5.29.0"
```
<Note>
@@ -47,7 +47,11 @@ Follow these steps to remove a user of your account:
1. Navigate to **Users** from the side menu.
2. Click the delete button of your current user.
> **Note: Each user will be able to delete himself and not others, regardless of his permissions.**
> **Note: Each user can only delete their own account, regardless of their permissions. For this reason, the delete button is only shown on your own row and not on other users' rows.**
Deleting a user removes the **entire user account** from Prowler, not just its membership in your organization. Because a single account can belong to more than one tenant, allowing one administrator to delete it outright could affect organizations they don't manage and irreversibly remove another person's identity. To keep this destructive action under the control of the account owner, the API only permits a user to delete themselves (it rejects any other target with a `400` response), and the UI mirrors this by showing the delete button exclusively on your own row.
To remove **another** user from your organization, use the [_Expel from organization_](/user-guide/tutorials/prowler-app-multi-tenant#expelling-a-user-from-an-organization) action instead. Expelling removes the user's membership, role grants, and active sessions for your tenant only, and deletes the underlying account just for that user if your organization was their last remaining membership. This action is reserved for tenant **owners**.
<img src="/images/prowler-app/rbac/user_remove.png" alt="Remove User" width="700" />
+29
View File
@@ -2,6 +2,35 @@
All notable changes to the **Prowler SDK** are documented in this file.
## [5.30.0] (Prowler UNRELEASED)
### 🚀 Added
- `sagemaker_models_monitor_enabled` check for AWS provider, verifying that each SageMaker monitoring schedule is in the `Scheduled` state so data and model drift is actively detected [(#11278)](https://github.com/prowler-cloud/prowler/pull/11278)
- DORA (Digital Operational Resilience Act, Regulation (EU) 2022/2554) universal compliance framework with AWS provider coverage across the five DORA pillars [(#11131)](https://github.com/prowler-cloud/prowler/pull/11131)
- Support for external/custom providers, checks, and compliance frameworks without modifying core code [(#10700)](https://github.com/prowler-cloud/prowler/pull/10700)
- Public `Provider.get_class()` method that resolves a provider class by name for both built-in and external (entry-point) providers [(#11398)](https://github.com/prowler-cloud/prowler/pull/11398)
- `elbv2_alb_drop_invalid_header_fields_enabled` check for AWS provider, verifying Application Load Balancers have `routing.http.drop_invalid_header_fields.enabled` set to `true` to mitigate HTTP desync attacks (AWS FSBP ELB.4) [(#11471)](https://github.com/prowler-cloud/prowler/pull/11471)
- External multi-provider compliance frameworks can be registered via the `prowler.compliance.universal` entry point group [(#11490)](https://github.com/prowler-cloud/prowler/pull/11490)
### 🐞 Fixed
- `load_and_validate_config_file` now unwraps namespaced config for every built-in and external provider, and no longer leaks the full file as the provider's config when the file is namespaced [(#10700)](https://github.com/prowler-cloud/prowler/pull/10700)
- GCP `logging_sink_created` now recognizes organization-level aggregated sinks with `includeChildren=True`, avoiding false failures for covered projects [(#11355)](https://github.com/prowler-cloud/prowler/pull/11355)
- Jira integration no longer fails with `400 INVALID_INPUT` when a finding has empty fields [(#11474)](https://github.com/prowler-cloud/prowler/pull/11474)
- GCP `iam_service_account_unused` now passes disabled service accounts instead of failing them, since a disabled account cannot authenticate or be used [(#11467)](https://github.com/prowler-cloud/prowler/pull/11467)
- AWS AI Security Framework now renders in the dashboard instead of showing "No data found for this compliance", by adding the missing compliance view module [(#11470)](https://github.com/prowler-cloud/prowler/pull/11470)
---
## [5.29.1] (Prowler v5.29.1)
### 🐞 Fixed
- OCSF output writer now re-raises I/O errors (e.g. `ENOSPC`) instead of logging them per finding and leaving a truncated file [(#11421)](https://github.com/prowler-cloud/prowler/pull/11421)
---
## [5.29.0] (Prowler v5.29.0)
### 🚀 Added
+57 -77
View File
@@ -10,7 +10,6 @@ from colorama import Fore, Style
from colorama import init as colorama_init
from prowler.config.config import (
EXTERNAL_TOOL_PROVIDERS,
cloud_api_base_url,
csv_file_suffix,
get_available_compliance_frameworks,
@@ -85,11 +84,6 @@ from prowler.lib.outputs.compliance.compliance import (
display_compliance_table,
process_universal_compliance_frameworks,
)
from prowler.lib.outputs.compliance.csa.csa_alibabacloud import AlibabaCloudCSA
from prowler.lib.outputs.compliance.csa.csa_aws import AWSCSA
from prowler.lib.outputs.compliance.csa.csa_azure import AzureCSA
from prowler.lib.outputs.compliance.csa.csa_gcp import GCPCSA
from prowler.lib.outputs.compliance.csa.csa_oraclecloud import OracleCloudCSA
from prowler.lib.outputs.compliance.ens.ens_aws import AWSENS
from prowler.lib.outputs.compliance.ens.ens_azure import AzureENS
from prowler.lib.outputs.compliance.ens.ens_gcp import GCPENS
@@ -210,9 +204,10 @@ def prowler():
# We treat the compliance framework as another output format
if compliance_framework:
args.output_formats.extend(compliance_framework)
# If no input compliance framework, set all, unless a specific service or check is input
# Skip for IAC and LLM providers that don't use compliance frameworks
elif default_execution and provider not in ["iac", "llm"]:
# If no input compliance framework, set all, unless a specific service or check is input.
# Skip for tool-wrapper providers (iac, llm, image, and any external plug-in
# declaring `is_external_tool_provider = True`) — they don't use compliance frameworks.
elif default_execution and not Provider.is_tool_wrapper_provider(provider):
args.output_formats.extend(get_available_compliance_frameworks(provider))
# Set Logger configuration
@@ -250,7 +245,7 @@ def prowler():
universal_frameworks = {}
# Skip compliance frameworks for external-tool providers
if provider not in EXTERNAL_TOOL_PROVIDERS:
if not Provider.is_tool_wrapper_provider(provider):
bulk_compliance_frameworks = Compliance.get_bulk(provider)
# Complete checks metadata with the compliance framework specification
bulk_checks_metadata = update_checks_metadata_with_compliance(
@@ -318,7 +313,7 @@ def prowler():
sys.exit()
# Skip service and check loading for external-tool providers
if provider not in EXTERNAL_TOOL_PROVIDERS:
if not Provider.is_tool_wrapper_provider(provider):
# Import custom checks from folder
if checks_folder:
custom_checks = parse_checks_from_folder(global_provider, checks_folder)
@@ -441,6 +436,20 @@ def prowler():
output_options = ScalewayOutputOptions(
args, bulk_checks_metadata, global_provider.identity
)
else:
# Dynamic fallback: any external/custom provider
try:
output_options = global_provider.get_output_options(
args, bulk_checks_metadata
)
except NotImplementedError:
# No provider-specific OutputOptions: use the generic default so the
# run still produces output instead of aborting.
from prowler.providers.common.models import default_output_options
output_options = default_output_options(
global_provider, args, bulk_checks_metadata
)
# Run the quick inventory for the provider if available
if hasattr(args, "quick_inventory") and args.quick_inventory:
@@ -450,7 +459,7 @@ def prowler():
# Execute checks
findings = []
if provider in EXTERNAL_TOOL_PROVIDERS:
if Provider.is_tool_wrapper_provider(provider):
# For external-tool providers, run the scan directly
if provider == "llm":
@@ -460,12 +469,19 @@ def prowler():
findings = global_provider.run_scan(streaming_callback=streaming_callback)
else:
# Original behavior for IAC and Image
try:
if provider == "image":
try:
findings = global_provider.run()
except ImageBaseException as error:
logger.critical(f"{error}")
sys.exit(1)
else:
# IAC and external tool-wrapper providers registered via entry
# points. Unexpected failures propagate to the outer except
# Exception backstop further down in this file — keeping the
# branch free of an Image-specific catch that would otherwise
# mislead plug-in authors reading this code.
findings = global_provider.run()
except ImageBaseException as error:
logger.critical(f"{error}")
sys.exit(1)
# Note: External tool providers don't support granular progress tracking since
# they run external tools as a black box and return all findings at once.
# Progress tracking would just be 0% → 100%.
@@ -806,18 +822,6 @@ def prowler():
)
generated_outputs["compliance"].append(c5)
c5.batch_write_data_to_file()
elif compliance_name == "csa_ccm_4.0_aws":
filename = (
f"{output_options.output_directory}/compliance/"
f"{output_options.output_filename}_{compliance_name}.csv"
)
csa_ccm_4_0_aws = AWSCSA(
findings=finding_outputs,
compliance=bulk_compliance_frameworks[compliance_name],
file_path=filename,
)
generated_outputs["compliance"].append(csa_ccm_4_0_aws)
csa_ccm_4_0_aws.batch_write_data_to_file()
else:
filename = (
f"{output_options.output_directory}/compliance/"
@@ -921,18 +925,6 @@ def prowler():
)
generated_outputs["compliance"].append(c5_azure)
c5_azure.batch_write_data_to_file()
elif compliance_name == "csa_ccm_4.0_azure":
filename = (
f"{output_options.output_directory}/compliance/"
f"{output_options.output_filename}_{compliance_name}.csv"
)
csa_ccm_4_0_azure = AzureCSA(
findings=finding_outputs,
compliance=bulk_compliance_frameworks[compliance_name],
file_path=filename,
)
generated_outputs["compliance"].append(csa_ccm_4_0_azure)
csa_ccm_4_0_azure.batch_write_data_to_file()
else:
filename = (
f"{output_options.output_directory}/compliance/"
@@ -1036,18 +1028,6 @@ def prowler():
)
generated_outputs["compliance"].append(c5_gcp)
c5_gcp.batch_write_data_to_file()
elif compliance_name == "csa_ccm_4.0_gcp":
filename = (
f"{output_options.output_directory}/compliance/"
f"{output_options.output_filename}_{compliance_name}.csv"
)
csa_ccm_4_0_gcp = GCPCSA(
findings=finding_outputs,
compliance=bulk_compliance_frameworks[compliance_name],
file_path=filename,
)
generated_outputs["compliance"].append(csa_ccm_4_0_gcp)
csa_ccm_4_0_gcp.batch_write_data_to_file()
else:
filename = (
f"{output_options.output_directory}/compliance/"
@@ -1282,18 +1262,6 @@ def prowler():
)
generated_outputs["compliance"].append(cis)
cis.batch_write_data_to_file()
elif compliance_name == "csa_ccm_4.0_oraclecloud":
filename = (
f"{output_options.output_directory}/compliance/"
f"{output_options.output_filename}_{compliance_name}.csv"
)
csa_ccm_4_0_oraclecloud = OracleCloudCSA(
findings=finding_outputs,
compliance=bulk_compliance_frameworks[compliance_name],
file_path=filename,
)
generated_outputs["compliance"].append(csa_ccm_4_0_oraclecloud)
csa_ccm_4_0_oraclecloud.batch_write_data_to_file()
else:
filename = (
f"{output_options.output_directory}/compliance/"
@@ -1322,18 +1290,6 @@ def prowler():
)
generated_outputs["compliance"].append(cis)
cis.batch_write_data_to_file()
elif compliance_name == "csa_ccm_4.0_alibabacloud":
filename = (
f"{output_options.output_directory}/compliance/"
f"{output_options.output_filename}_{compliance_name}.csv"
)
csa_ccm_4_0_alibabacloud = AlibabaCloudCSA(
findings=finding_outputs,
compliance=bulk_compliance_frameworks[compliance_name],
file_path=filename,
)
generated_outputs["compliance"].append(csa_ccm_4_0_alibabacloud)
csa_ccm_4_0_alibabacloud.batch_write_data_to_file()
elif compliance_name == "prowler_threatscore_alibabacloud":
filename = (
f"{output_options.output_directory}/compliance/"
@@ -1358,6 +1314,30 @@ def prowler():
)
generated_outputs["compliance"].append(generic_compliance)
generic_compliance.batch_write_data_to_file()
else:
# Dynamic fallback: any external/custom provider
try:
global_provider.generate_compliance_output(
finding_outputs,
bulk_compliance_frameworks,
input_compliance_frameworks,
output_options,
generated_outputs,
)
except NotImplementedError:
# Last resort: generic compliance
for compliance_name in input_compliance_frameworks:
filename = (
f"{output_options.output_directory}/compliance/"
f"{output_options.output_filename}_{compliance_name}.csv"
)
generic_compliance = GenericCompliance(
findings=finding_outputs,
compliance=bulk_compliance_frameworks[compliance_name],
file_path=filename,
)
generated_outputs["compliance"].append(generic_compliance)
generic_compliance.batch_write_data_to_file()
# AWS Security Hub Integration
if provider == "aws":
File diff suppressed because it is too large Load Diff
@@ -1863,7 +1863,9 @@
"Id": "ELB.4",
"Name": "Application load balancers should be configured to drop HTTP headers",
"Description": "This control evaluates AWS Application Load Balancers (ALB) to ensure they are configured to drop invalid HTTP headers. The control fails if the value of routing.http.drop_invalid_header_fields.enabled is set to false. By default, ALBs are not configured to drop invalid HTTP header values. Removing these header values prevents HTTP desync attacks.",
"Checks": [],
"Checks": [
"elbv2_alb_drop_invalid_header_fields_enabled"
],
"Attributes": [
{
"ItemId": "ELB.4",
File diff suppressed because it is too large Load Diff
File diff suppressed because it is too large Load Diff
+597
View File
@@ -0,0 +1,597 @@
{
"framework": "DORA",
"name": "Digital Operational Resilience Act (Regulation (EU) 2022/2554)",
"version": "2022/2554",
"description": "The Digital Operational Resilience Act (DORA) is a European Union regulation (Regulation (EU) 2022/2554) that sets a uniform framework for the digital operational resilience of the EU financial sector. Mandatory since 17 January 2025, it applies to financial entities (banks, insurers, investment firms, payment institutions, etc.) and to ICT third-party service providers. DORA is structured around five pillars: ICT risk management, ICT-related incident reporting, digital operational resilience testing, ICT third-party risk management, and information sharing. This Prowler mapping covers the technical controls auditable from cloud configuration; the organisational, contractual and supervisory obligations defined in DORA must be addressed outside of Prowler.",
"icon": "dora",
"attributes_metadata": [
{
"key": "Pillar",
"label": "Pillar",
"type": "str",
"required": true,
"enum": [
"ICT Risk Management",
"ICT-Related Incident Reporting",
"Digital Operational Resilience Testing",
"ICT Third-Party Risk Management",
"Information Sharing"
],
"output_formats": {
"csv": true,
"ocsf": true
}
},
{
"key": "Article",
"label": "Article",
"type": "str",
"required": true,
"output_formats": {
"csv": true,
"ocsf": true
}
},
{
"key": "ArticleTitle",
"label": "Article Title",
"type": "str",
"required": true,
"output_formats": {
"csv": true,
"ocsf": true
}
}
],
"outputs": {
"table_config": {
"group_by": "Pillar"
},
"pdf_config": {
"language": "en",
"primary_color": "#003399",
"secondary_color": "#0055A5",
"bg_color": "#F0F4FA",
"group_by_field": "Pillar",
"sections": [
"ICT Risk Management",
"ICT-Related Incident Reporting",
"Digital Operational Resilience Testing",
"ICT Third-Party Risk Management",
"Information Sharing"
],
"section_short_names": {
"ICT Risk Management": "ICT Risk Mgmt",
"ICT-Related Incident Reporting": "Incident Reporting",
"Digital Operational Resilience Testing": "Resilience Testing",
"ICT Third-Party Risk Management": "Third-Party Risk",
"Information Sharing": "Info Sharing"
},
"charts": [
{
"id": "pillar_compliance",
"type": "horizontal_bar",
"group_by": "Pillar",
"title": "Compliance Score by DORA Pillar",
"y_label": "Pillar",
"x_label": "Compliance %",
"value_source": "compliance_percent",
"color_mode": "by_value"
}
],
"filter": {
"only_failed": true,
"include_manual": false
}
}
},
"requirements": [
{
"id": "DORA-Art5",
"name": "Governance and organisation",
"description": "Financial entities shall have a sound, comprehensive and well-documented ICT internal governance and control framework. Senior management is accountable for ICT risk and shall enforce strong identity, authentication and least-privilege policies for privileged identities, including the root account.",
"attributes": {
"Pillar": "ICT Risk Management",
"Article": "Article 5",
"ArticleTitle": "Governance and organisation"
},
"checks": {
"aws": [
"iam_avoid_root_usage",
"iam_no_root_access_key",
"iam_root_mfa_enabled",
"iam_root_hardware_mfa_enabled",
"iam_root_credentials_management_enabled",
"iam_password_policy_minimum_length_14",
"iam_password_policy_lowercase",
"iam_password_policy_uppercase",
"iam_password_policy_number",
"iam_password_policy_symbol",
"iam_password_policy_reuse_24",
"iam_password_policy_expires_passwords_within_90_days_or_less",
"iam_securityaudit_role_created",
"iam_support_role_created",
"organizations_account_part_of_organizations",
"iam_user_mfa_enabled_console_access",
"iam_user_hardware_mfa_enabled"
]
}
},
{
"id": "DORA-Art6",
"name": "ICT risk management framework",
"description": "Financial entities shall have an ICT risk management framework that is sound, comprehensive and well-documented, enabling them to address ICT risk quickly, efficiently and comprehensively and to ensure a high level of digital operational resilience. This includes continuous configuration recording, security findings aggregation and an enterprise-wide visibility plane.",
"attributes": {
"Pillar": "ICT Risk Management",
"Article": "Article 6",
"ArticleTitle": "ICT risk management framework"
},
"checks": {
"aws": [
"config_recorder_all_regions_enabled",
"config_recorder_using_aws_service_role",
"securityhub_enabled",
"accessanalyzer_enabled",
"accessanalyzer_enabled_without_findings",
"organizations_delegated_administrators",
"guardduty_centrally_managed",
"guardduty_delegated_admin_enabled_all_regions"
]
}
},
{
"id": "DORA-Art7",
"name": "ICT systems, protocols and tools",
"description": "Financial entities shall use and maintain updated ICT systems, protocols and tools that are appropriate to the magnitude of operations supporting ICT functions, technologically resilient, and adequately equipped to securely process data. Cryptographic primitives, certificate hygiene and network segmentation are core to this requirement.",
"attributes": {
"Pillar": "ICT Risk Management",
"Article": "Article 7",
"ArticleTitle": "ICT systems, protocols and tools"
},
"checks": {
"aws": [
"acm_certificates_with_secure_key_algorithms",
"acm_certificates_transparency_logs_enabled",
"acm_certificates_expiration_check",
"ec2_ebs_default_encryption",
"kms_cmk_rotation_enabled",
"s3_bucket_secure_transport_policy",
"s3_bucket_default_encryption",
"s3_bucket_kms_encryption",
"vpc_subnet_separate_private_public",
"vpc_subnet_no_public_ip_by_default",
"elb_insecure_ssl_ciphers",
"elbv2_insecure_ssl_ciphers",
"elb_ssl_listeners",
"elbv2_ssl_listeners",
"cloudfront_distributions_using_deprecated_ssl_protocols",
"cloudfront_distributions_https_enabled",
"rds_instance_transport_encrypted"
]
}
},
{
"id": "DORA-Art8",
"name": "Identification",
"description": "Financial entities shall identify, classify and adequately document all ICT supported business functions, roles and responsibilities, the information assets and ICT assets supporting them, and their interdependencies. They shall on a continuous basis identify all sources of ICT risk, in particular the risk exposure to and from other financial entities.",
"attributes": {
"Pillar": "ICT Risk Management",
"Article": "Article 8",
"ArticleTitle": "Identification"
},
"checks": {
"aws": [
"accessanalyzer_enabled",
"accessanalyzer_enabled_without_findings",
"macie_is_enabled",
"macie_automated_sensitive_data_discovery_enabled",
"ec2_securitygroup_not_used",
"ec2_elastic_ip_unassigned",
"ec2_networkacl_unused",
"secretsmanager_secret_unused"
]
}
},
{
"id": "DORA-Art9",
"name": "Protection and prevention",
"description": "Financial entities shall continuously monitor and control the security and functioning of ICT systems and tools and minimise the impact of ICT risk by deploying appropriate ICT security tools, policies and procedures. Encryption at rest and in transit, blocking of public exposure, network access controls, secret management and instance hardening are central to this article.",
"attributes": {
"Pillar": "ICT Risk Management",
"Article": "Article 9",
"ArticleTitle": "Protection and prevention"
},
"checks": {
"aws": [
"kms_key_not_publicly_accessible",
"ec2_ebs_volume_encryption",
"ec2_ebs_snapshots_encrypted",
"ec2_ebs_public_snapshot",
"ec2_ebs_snapshot_account_block_public_access",
"s3_account_level_public_access_blocks",
"s3_bucket_level_public_access_block",
"s3_bucket_public_access",
"s3_bucket_policy_public_write_access",
"s3_bucket_public_write_acl",
"s3_bucket_public_list_acl",
"s3_bucket_acl_prohibited",
"s3_access_point_public_access_block",
"ec2_securitygroup_default_restrict_traffic",
"ec2_securitygroup_allow_ingress_from_internet_to_all_ports",
"ec2_securitygroup_allow_ingress_from_internet_to_any_port",
"ec2_securitygroup_allow_ingress_from_internet_to_high_risk_tcp_ports",
"ec2_securitygroup_allow_ingress_from_internet_to_tcp_port_22",
"ec2_securitygroup_allow_ingress_from_internet_to_tcp_port_3389",
"rds_instance_storage_encrypted",
"rds_cluster_storage_encrypted",
"rds_instance_no_public_access",
"rds_snapshots_public_access",
"secretsmanager_not_publicly_accessible",
"secretsmanager_has_restrictive_resource_policy",
"secretsmanager_automatic_rotation_enabled",
"dynamodb_tables_kms_cmk_encryption_enabled",
"sns_topics_kms_encryption_at_rest_enabled",
"sns_topics_not_publicly_accessible",
"ec2_instance_imdsv2_enabled",
"ec2_instance_account_imdsv2_enabled",
"efs_encryption_at_rest_enabled",
"awslambda_function_not_publicly_accessible"
]
}
},
{
"id": "DORA-Art10",
"name": "Detection",
"description": "Financial entities shall have in place mechanisms to promptly detect anomalous activities, including ICT network performance issues and ICT-related incidents, and to identify potential single points of failure. Threat detection across compute, identity, storage and the API control plane is required for timely detection.",
"attributes": {
"Pillar": "ICT Risk Management",
"Article": "Article 10",
"ArticleTitle": "Detection"
},
"checks": {
"aws": [
"guardduty_is_enabled",
"guardduty_no_high_severity_findings",
"guardduty_ec2_malware_protection_enabled",
"guardduty_lambda_protection_enabled",
"guardduty_rds_protection_enabled",
"guardduty_s3_protection_enabled",
"guardduty_eks_audit_log_enabled",
"guardduty_eks_runtime_monitoring_enabled",
"securityhub_enabled",
"cloudtrail_threat_detection_enumeration",
"cloudtrail_threat_detection_llm_jacking",
"cloudtrail_threat_detection_privilege_escalation",
"cloudtrail_insights_exist",
"inspector2_is_enabled",
"inspector2_active_findings_exist",
"ec2_elastic_ip_shodan"
]
}
},
{
"id": "DORA-Art11",
"name": "Response and recovery",
"description": "Financial entities shall put in place a comprehensive ICT business continuity policy, including ICT response and recovery plans, that ensures the continuity of ICT-supported critical or important functions. Operational alarming, automated event routing and tested recovery actions are essential.",
"attributes": {
"Pillar": "ICT Risk Management",
"Article": "Article 11",
"ArticleTitle": "Response and recovery"
},
"checks": {
"aws": [
"cloudwatch_alarm_actions_enabled",
"cloudwatch_alarm_actions_alarm_state_configured",
"eventbridge_global_endpoint_event_replication_enabled",
"sns_subscription_not_using_http_endpoints",
"backup_plans_exist",
"backup_vaults_exist",
"rds_instance_critical_event_subscription",
"rds_cluster_critical_event_subscription"
]
}
},
{
"id": "DORA-Art12",
"name": "Backup policies and procedures, restoration and recovery procedures and methods",
"description": "Financial entities shall develop and document backup policies and procedures specifying the scope of data subject to backup and the minimum frequency of the backup, as well as restoration and recovery procedures and methods. Backups must be encrypted, retained, and resources must be designed for recoverability across availability zones and regions.",
"attributes": {
"Pillar": "ICT Risk Management",
"Article": "Article 12",
"ArticleTitle": "Backup policies and procedures, restoration and recovery procedures and methods"
},
"checks": {
"aws": [
"backup_plans_exist",
"backup_vaults_exist",
"backup_vaults_encrypted",
"backup_recovery_point_encrypted",
"backup_reportplans_exist",
"rds_instance_backup_enabled",
"rds_cluster_protected_by_backup_plan",
"rds_instance_protected_by_backup_plan",
"rds_instance_multi_az",
"rds_cluster_multi_az",
"rds_cluster_backtrack_enabled",
"rds_instance_deletion_protection",
"rds_cluster_deletion_protection",
"rds_snapshots_encrypted",
"s3_bucket_object_versioning",
"s3_bucket_object_lock",
"s3_bucket_cross_region_replication",
"s3_bucket_no_mfa_delete",
"dynamodb_tables_pitr_enabled",
"dynamodb_table_deletion_protection_enabled",
"ec2_ebs_volume_protected_by_backup_plan",
"ec2_ebs_volume_snapshots_exists",
"autoscaling_group_multiple_az",
"elb_is_in_multiple_az",
"elbv2_is_in_multiple_az",
"cloudfront_distributions_multiple_origin_failover_configured",
"dynamodb_table_protected_by_backup_plan"
]
}
},
{
"id": "DORA-Art13",
"name": "Learning and evolving",
"description": "Financial entities shall have in place capabilities and staff to gather information on vulnerabilities and cyber threats, perform post ICT-related incident reviews, and continuously feed lessons learnt back into the ICT risk assessment process. Findings aggregation and continuous insights drive this cycle.",
"attributes": {
"Pillar": "ICT Risk Management",
"Article": "Article 13",
"ArticleTitle": "Learning and evolving"
},
"checks": {
"aws": [
"securityhub_enabled",
"guardduty_no_high_severity_findings",
"inspector2_active_findings_exist",
"accessanalyzer_enabled_without_findings",
"cloudtrail_insights_exist"
]
}
},
{
"id": "DORA-Art14",
"name": "Communication",
"description": "As part of the ICT risk management framework, financial entities shall have in place crisis communication plans enabling a responsible disclosure of ICT-related incidents or major vulnerabilities to clients, counterparts and the public. Reliable, encrypted and access-controlled notification channels are required.",
"attributes": {
"Pillar": "ICT Risk Management",
"Article": "Article 14",
"ArticleTitle": "Communication"
},
"checks": {
"aws": [
"sns_topics_kms_encryption_at_rest_enabled",
"sns_topics_not_publicly_accessible",
"sns_subscription_not_using_http_endpoints",
"eventbridge_bus_exposed",
"eventbridge_bus_cross_account_access",
"eventbridge_schema_registry_cross_account_access",
"cloudwatch_alarm_actions_enabled",
"cloudwatch_alarm_actions_alarm_state_configured"
]
}
},
{
"id": "DORA-Art17",
"name": "ICT-related incident management process",
"description": "Financial entities shall define, establish and implement an ICT-related incident management process to detect, manage and notify ICT-related incidents. Comprehensive trail logging, log integrity protection, retention and centralisation of ICT events are foundational requirements.",
"attributes": {
"Pillar": "ICT-Related Incident Reporting",
"Article": "Article 17",
"ArticleTitle": "ICT-related incident management process"
},
"checks": {
"aws": [
"cloudtrail_multi_region_enabled",
"cloudtrail_multi_region_enabled_logging_management_events",
"cloudtrail_kms_encryption_enabled",
"cloudtrail_log_file_validation_enabled",
"cloudtrail_cloudwatch_logging_enabled",
"cloudtrail_logs_s3_bucket_access_logging_enabled",
"cloudtrail_logs_s3_bucket_is_not_publicly_accessible",
"cloudtrail_s3_dataevents_read_enabled",
"cloudtrail_s3_dataevents_write_enabled",
"cloudtrail_bucket_requires_mfa_delete",
"cloudtrail_bedrock_logging_enabled",
"cloudwatch_log_group_retention_policy_specific_days_enabled",
"cloudwatch_log_group_kms_encryption_enabled",
"cloudwatch_log_group_no_secrets_in_logs",
"cloudwatch_log_group_not_publicly_accessible",
"vpc_flow_logs_enabled",
"ec2_client_vpn_endpoint_connection_logging_enabled",
"route53_public_hosted_zones_cloudwatch_logging_enabled",
"elb_logging_enabled",
"elbv2_logging_enabled",
"cloudfront_distributions_logging_enabled",
"s3_bucket_server_access_logging_enabled"
]
}
},
{
"id": "DORA-Art18",
"name": "Classification of ICT-related incidents and cyber threats",
"description": "Financial entities shall classify ICT-related incidents and shall determine their impact based on criteria such as the number of clients affected, duration, geographical spread, data losses, and criticality of the services affected. Severity-aware threat detection across the estate underpins this classification.",
"attributes": {
"Pillar": "ICT-Related Incident Reporting",
"Article": "Article 18",
"ArticleTitle": "Classification of ICT-related incidents and cyber threats"
},
"checks": {
"aws": [
"guardduty_no_high_severity_findings",
"guardduty_centrally_managed",
"guardduty_delegated_admin_enabled_all_regions",
"securityhub_enabled",
"inspector2_active_findings_exist",
"accessanalyzer_enabled_without_findings",
"cloudtrail_threat_detection_enumeration",
"cloudtrail_threat_detection_llm_jacking",
"cloudtrail_threat_detection_privilege_escalation"
]
}
},
{
"id": "DORA-Art19",
"name": "Reporting of major ICT-related incidents and voluntary notification of significant cyber threats",
"description": "Financial entities shall report major ICT-related incidents to the relevant competent authority and may, on a voluntary basis, notify significant cyber threats. Detective metric filters, change-tracking alarms and reliable notification topics are needed to surface and route reportable events.",
"attributes": {
"Pillar": "ICT-Related Incident Reporting",
"Article": "Article 19",
"ArticleTitle": "Reporting of major ICT-related incidents and voluntary notification of significant cyber threats"
},
"checks": {
"aws": [
"cloudwatch_log_metric_filter_authentication_failures",
"cloudwatch_log_metric_filter_unauthorized_api_calls",
"cloudwatch_log_metric_filter_root_usage",
"cloudwatch_log_metric_filter_sign_in_without_mfa",
"cloudwatch_log_metric_filter_disable_or_scheduled_deletion_of_kms_cmk",
"cloudwatch_log_metric_filter_for_s3_bucket_policy_changes",
"cloudwatch_log_metric_filter_policy_changes",
"cloudwatch_log_metric_filter_security_group_changes",
"cloudwatch_log_metric_filter_aws_organizations_changes",
"cloudwatch_log_metric_filter_and_alarm_for_aws_config_configuration_changes_enabled",
"cloudwatch_log_metric_filter_and_alarm_for_cloudtrail_configuration_changes_enabled",
"cloudwatch_changes_to_network_acls_alarm_configured",
"cloudwatch_changes_to_network_gateways_alarm_configured",
"cloudwatch_changes_to_network_route_tables_alarm_configured",
"cloudwatch_changes_to_vpcs_alarm_configured",
"sns_subscription_not_using_http_endpoints"
]
}
},
{
"id": "DORA-Art24",
"name": "General requirements for the performance of digital operational resilience testing",
"description": "Financial entities shall establish, maintain and review a sound and comprehensive digital operational resilience testing programme, as an integral part of the ICT risk management framework. Continuous vulnerability discovery, configuration assessment and instance manageability are foundational.",
"attributes": {
"Pillar": "Digital Operational Resilience Testing",
"Article": "Article 24",
"ArticleTitle": "General requirements for the performance of digital operational resilience testing"
},
"checks": {
"aws": [
"inspector2_is_enabled",
"inspector2_active_findings_exist",
"securityhub_enabled",
"ec2_instance_managed_by_ssm",
"ec2_instance_with_outdated_ami",
"ssm_managed_compliant_patching"
]
}
},
{
"id": "DORA-Art25",
"name": "Testing of ICT tools and systems",
"description": "Financial entities shall ensure that tests are undertaken on ICT tools and systems, on critical ICT systems supporting all critical or important functions, at least yearly. Vulnerability assessments, deprecated component detection and certificate hygiene must be tracked.",
"attributes": {
"Pillar": "Digital Operational Resilience Testing",
"Article": "Article 25",
"ArticleTitle": "Testing of ICT tools and systems"
},
"checks": {
"aws": [
"inspector2_is_enabled",
"inspector2_active_findings_exist",
"guardduty_is_enabled",
"guardduty_no_high_severity_findings",
"config_recorder_all_regions_enabled",
"ec2_instance_with_outdated_ami",
"ec2_instance_managed_by_ssm",
"ec2_instance_paravirtual_type",
"rds_instance_deprecated_engine_version",
"acm_certificates_expiration_check",
"rds_instance_certificate_expiration",
"iam_no_expired_server_certificates_stored",
"ssm_managed_compliant_patching"
]
}
},
{
"id": "DORA-Art28",
"name": "General principles (ICT third-party risk)",
"description": "Financial entities shall manage ICT third-party risk as an integral component of ICT risk within their ICT risk management framework. Cross-account access, trust boundaries, organization-level controls and dependency visibility are critical to monitor third-party exposure on AWS.",
"attributes": {
"Pillar": "ICT Third-Party Risk Management",
"Article": "Article 28",
"ArticleTitle": "General principles (ICT third-party risk)"
},
"checks": {
"aws": [
"iam_role_cross_service_confused_deputy_prevention",
"iam_role_cross_account_readonlyaccess_policy",
"iam_no_custom_policy_permissive_role_assumption",
"accessanalyzer_enabled",
"accessanalyzer_enabled_without_findings",
"s3_bucket_cross_account_access",
"dynamodb_table_cross_account_access",
"eventbridge_bus_cross_account_access",
"eventbridge_schema_registry_cross_account_access",
"cloudwatch_cross_account_sharing_disabled",
"organizations_delegated_administrators",
"organizations_account_part_of_organizations",
"organizations_scp_check_deny_regions",
"vpc_endpoint_connections_trust_boundaries",
"vpc_endpoint_services_allowed_principals_trust_boundaries",
"vpc_peering_routing_tables_with_least_privilege",
"awslambda_function_using_cross_account_layers"
]
}
},
{
"id": "DORA-Art30",
"name": "Key contractual provisions",
"description": "Contractual arrangements with ICT third-party service providers shall be set out in writing and include, at minimum, agreed service levels and clear allocation of rights and obligations. Privilege boundaries, least-privilege policies and absence of administrative wildcards are the technical guardrails that enforce these contractual constraints inside AWS.",
"attributes": {
"Pillar": "ICT Third-Party Risk Management",
"Article": "Article 30",
"ArticleTitle": "Key contractual provisions"
},
"checks": {
"aws": [
"iam_aws_attached_policy_no_administrative_privileges",
"iam_customer_attached_policy_no_administrative_privileges",
"iam_customer_unattached_policy_no_administrative_privileges",
"iam_inline_policy_no_administrative_privileges",
"iam_inline_policy_allows_privilege_escalation",
"iam_policy_allows_privilege_escalation",
"iam_inline_policy_no_full_access_to_cloudtrail",
"iam_inline_policy_no_full_access_to_kms",
"iam_policy_no_full_access_to_cloudtrail",
"iam_policy_no_full_access_to_kms",
"iam_role_administratoraccess_policy",
"iam_user_administrator_access_policy",
"iam_group_administrator_access_policy",
"iam_administrator_access_with_mfa",
"iam_policy_attached_only_to_group_or_roles",
"accessanalyzer_enabled"
]
}
},
{
"id": "DORA-Art45",
"name": "Information-sharing arrangements on cyber threat information and intelligence",
"description": "Financial entities may exchange amongst themselves cyber threat information and intelligence, including indicators of compromise, tactics, techniques and procedures, cyber security alerts and configuration tools. Centralised threat detection, sensitive data discovery and trail-based intelligence enable participation in such information-sharing arrangements.",
"attributes": {
"Pillar": "Information Sharing",
"Article": "Article 45",
"ArticleTitle": "Information-sharing arrangements on cyber threat information and intelligence"
},
"checks": {
"aws": [
"guardduty_is_enabled",
"guardduty_centrally_managed",
"securityhub_enabled",
"macie_is_enabled",
"macie_automated_sensitive_data_discovery_enabled",
"cloudtrail_threat_detection_enumeration",
"cloudtrail_threat_detection_llm_jacking",
"cloudtrail_threat_detection_privilege_escalation",
"accessanalyzer_enabled_without_findings"
]
}
}
]
}
File diff suppressed because it is too large Load Diff
File diff suppressed because it is too large Load Diff
+87 -15
View File
@@ -1,3 +1,4 @@
import importlib.metadata
import os
import pathlib
from datetime import datetime, timezone
@@ -48,7 +49,7 @@ class _MutableTimestamp:
timestamp = _MutableTimestamp(datetime.today())
timestamp_utc = _MutableTimestamp(datetime.now(timezone.utc))
prowler_version = "5.29.0"
prowler_version = "5.30.0"
html_logo_url = "https://github.com/prowler-cloud/prowler/"
square_logo_img = "https://raw.githubusercontent.com/prowler-cloud/prowler/dc7d2d5aeb92fdf12e8604f42ef6472cd3e8e889/docs/img/prowler-logo-black.png"
aws_logo = "https://user-images.githubusercontent.com/38561120/235953920-3e3fba08-0795-41dc-b480-9bea57db9f2e.png"
@@ -85,13 +86,38 @@ class Provider(str, Enum):
actual_directory = pathlib.Path(os.path.dirname(os.path.realpath(__file__)))
def _get_ep_compliance_dirs() -> dict:
"""Discover compliance directories from entry points. Returns {provider: path}."""
dirs = {}
for ep in importlib.metadata.entry_points(group="prowler.compliance"):
try:
module = ep.load()
if hasattr(module, "__path__"):
dirs[ep.name] = module.__path__[0]
elif hasattr(module, "__file__"):
dirs[ep.name] = os.path.dirname(module.__file__)
except Exception as error:
logger.warning(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
)
return dirs
def get_available_compliance_frameworks(provider=None):
available_compliance_frameworks = []
providers = [p.value for p in Provider]
# Built-in compliance
compliance_base = f"{actual_directory}/../compliance"
if provider:
providers = [provider]
for current_provider in providers:
compliance_dir = f"{actual_directory}/../compliance/{current_provider}"
else:
# Scan compliance directory for all provider subdirectories
providers = []
if os.path.isdir(compliance_base):
for entry in os.scandir(compliance_base):
if entry.is_dir():
providers.append(entry.name)
for prov in providers:
compliance_dir = f"{compliance_base}/{prov}"
if not os.path.isdir(compliance_dir):
continue
with os.scandir(compliance_dir) as files:
@@ -100,7 +126,8 @@ def get_available_compliance_frameworks(provider=None):
available_compliance_frameworks.append(
file.name.removesuffix(".json")
)
# Also scan top-level compliance/ for multi-provider (universal) JSONs.
# Built-in multi-provider frameworks at top-level compliance/ directory.
# Placed before external entry points so built-ins win on name collisions.
# When a specific provider was requested, only include the framework if it
# declares support for that provider; otherwise include all universal frameworks.
compliance_root = f"{actual_directory}/../compliance"
@@ -117,6 +144,43 @@ def get_available_compliance_frameworks(provider=None):
continue
if name not in available_compliance_frameworks:
available_compliance_frameworks.append(name)
# External per-provider compliance via entry points.
ep_dirs = _get_ep_compliance_dirs()
for prov, path in ep_dirs.items():
if provider and prov != provider:
continue
if os.path.isdir(path):
for file in os.scandir(path):
if file.is_file() and file.name.endswith(".json"):
name = file.name.removesuffix(".json")
if name not in available_compliance_frameworks:
available_compliance_frameworks.append(name)
# External multi-provider frameworks via the dedicated universal group;
# filtered by supports_provider when a provider is given.
for ep in importlib.metadata.entry_points(group="prowler.compliance.universal"):
try:
module = ep.load()
path = (
module.__path__[0]
if hasattr(module, "__path__")
else os.path.dirname(module.__file__)
)
except Exception as error:
logger.warning(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
)
continue
if not os.path.isdir(path):
continue
for file in os.scandir(path):
if file.is_file() and file.name.endswith(".json"):
name = file.name.removesuffix(".json")
if provider:
framework = load_compliance_framework_universal(file.path)
if framework is None or not framework.supports_provider(provider):
continue
if name not in available_compliance_frameworks:
available_compliance_frameworks.append(name)
return available_compliance_frameworks
@@ -228,18 +292,26 @@ def load_and_validate_config_file(provider: str, config_file_path: str) -> dict:
with open(config_file_path, "r", encoding=encoding_format_utf_8) as f:
config_file = yaml.safe_load(f)
# Not to introduce a breaking change, allow the old format config file without any provider keys
# and a new format with a key for each provider to include their configuration values within.
if any(
key in config_file
for key in ["aws", "gcp", "azure", "kubernetes", "m365"]
# Namespaced format: each provider has its own top-level key.
# Works for every built-in and every external plugin without a hardcoded list.
# Flat legacy format is AWS-only (historical, pre-multicloud). We identify it
# by the absence of nested-dict top-level values (namespaced files always
# have dict values; the legacy AWS format only has primitives/lists).
if (
isinstance(config_file, dict)
and provider in config_file
and isinstance(config_file[provider], dict)
):
config = config_file.get(provider, {})
config = config_file.get(provider, {}) or {}
elif (
isinstance(config_file, dict)
and config_file
and provider == "aws"
and not any(isinstance(v, dict) for v in config_file.values())
):
config = config_file
else:
config = config_file if config_file else {}
# Not to break Azure, K8s and GCP does not support or use the old config format
if provider in ["azure", "gcp", "kubernetes", "m365"]:
config = {}
config = {}
return config
+54 -6
View File
@@ -1,4 +1,6 @@
import importlib
import importlib.metadata
import importlib.util
import json
import os
import re
@@ -19,6 +21,7 @@ from prowler.lib.check.utils import recover_checks_from_provider
from prowler.lib.logger import logger
from prowler.lib.outputs.outputs import report
from prowler.lib.utils.utils import open_file, parse_json_file, print_boxes
from prowler.providers.common.builtin import is_builtin_provider
from prowler.providers.common.models import Audit_Metadata
@@ -385,6 +388,45 @@ def import_check(check_path: str) -> ModuleType:
return lib
def _resolve_check_module(
provider_type: str, service: str, check_name: str
) -> ModuleType:
"""Resolve and import a check module.
Built-in wins on CheckID collision. Plug-ins are first-class extenders
(they can add new checks under new CheckIDs) but cannot override
existing built-ins a security tool prefers fail-loud predictability
over silent overrides. CheckMetadata.get_bulk() applies the same
precedence on the metadata side (first-write-wins) and emits a warning
when a plug-in tries to override, so the user knows their plug-in
duplicate is being ignored and can rename it.
Gates the built-in branch on `is_builtin_provider(provider_type)`
calling `find_spec` on `prowler.providers.{provider_type}.services...`
directly would propagate `ModuleNotFoundError` for external providers
(their parent package `prowler.providers.{provider_type}` does not
exist) instead of returning None. The leaf helper encapsulates the
safe lookup, so external providers go straight to entry points. For
built-ins we still use `find_spec` to distinguish "check doesn't
exist" from "check exists but failed to import" (broken transitive
dep, etc.).
"""
# Built-in first — built-in wins on CheckID collision
if is_builtin_provider(provider_type):
builtin_path = f"prowler.providers.{provider_type}.services.{service}.{check_name}.{check_name}"
if importlib.util.find_spec(builtin_path) is not None:
return import_check(builtin_path)
# Entry point lookup — only consulted when the built-in truly doesn't exist
for ep in importlib.metadata.entry_points(group=f"prowler.checks.{provider_type}"):
if ep.name == check_name:
return importlib.import_module(ep.value)
raise ModuleNotFoundError(
f"Check '{check_name}' not found for provider '{provider_type}'"
)
def run_fixer(check_findings: list) -> int:
"""
Run the fixer for the check if it exists and there are any FAIL findings
@@ -525,9 +567,10 @@ def execute_checks(
service = check_name.split("_")[0]
try:
try:
# Import check module
check_module_path = f"prowler.providers.{global_provider.type}.services.{service}.{check_name}.{check_name}"
lib = import_check(check_module_path)
# Import check module (built-in or entry point)
lib = _resolve_check_module(
global_provider.type, service, check_name
)
# Recover functions from check
check_to_execute = getattr(lib, check_name)
check = check_to_execute()
@@ -605,9 +648,10 @@ def execute_checks(
)
try:
try:
# Import check module
check_module_path = f"prowler.providers.{global_provider.type}.services.{service}.{check_name}.{check_name}"
lib = import_check(check_module_path)
# Import check module (built-in or entry point)
lib = _resolve_check_module(
global_provider.type, service, check_name
)
# Recover functions from check
check_to_execute = getattr(lib, check_name)
check = check_to_execute()
@@ -753,6 +797,10 @@ def execute(
is_finding_muted_args["org_domain"] = (
global_provider.identity.org_domain
)
elif not is_builtin_provider(global_provider.type):
# External/custom provider — delegate identity args
is_finding_muted_args = global_provider.get_mutelist_finding_args()
for finding in check_findings:
if global_provider.type == "cloudflare":
is_finding_muted_args["account_id"] = finding.account_id
+8 -3
View File
@@ -2,10 +2,10 @@ import sys
from colorama import Fore, Style
from prowler.config.config import EXTERNAL_TOOL_PROVIDERS
from prowler.lib.check.check import parse_checks_from_file
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.check.models import CheckMetadata, Severity
from prowler.lib.check.tool_wrapper import is_tool_wrapper_provider
from prowler.lib.logger import logger
@@ -26,8 +26,13 @@ def load_checks_to_execute(
) -> set:
"""Generate the list of checks to execute based on the cloud provider and the input arguments given"""
try:
# Bypass check loading for providers that use external tools directly
if provider in EXTERNAL_TOOL_PROVIDERS:
# Bypass check loading for tool-wrapper providers — they delegate
# scanning to an external tool and have no checks to recover.
# Single source of truth across __main__, the CheckMetadata validators,
# check discovery and this loader, covering both built-in tool wrappers
# (iac/llm/image) and external plug-ins that declare
# `is_external_tool_provider = True` via the contract.
if is_tool_wrapper_provider(provider):
return set()
# Local subsets
+80 -15
View File
@@ -1,3 +1,4 @@
import importlib.metadata
import json
import os
import sys
@@ -434,26 +435,63 @@ class Compliance(BaseModel):
"""Bulk load all compliance frameworks specification into a dict"""
try:
bulk_compliance_frameworks = {}
# Built-in compliance from prowler/compliance/{provider}/
available_compliance_framework_modules = list_compliance_modules()
for compliance_framework in available_compliance_framework_modules:
if provider in compliance_framework.name:
# Match the provider segment exactly, not as a substring, so
# e.g. `cloud` does not capture `cloudflare`.
if compliance_framework.name.split(".")[-1] == provider:
compliance_specification_dir_path = (
f"{compliance_framework.module_finder.path}/{provider}"
)
# for compliance_framework in available_compliance_framework_modules:
for filename in os.listdir(compliance_specification_dir_path):
file_path = os.path.join(
compliance_specification_dir_path, filename
)
# Check if it is a file and ti size is greater than 0
if os.path.isfile(file_path) and os.stat(file_path).st_size > 0:
# Open Compliance file in JSON
# cis_v1.4_aws.json --> cis_v1.4_aws
compliance_framework_name = filename.split(".json")[0]
# Store the compliance info
bulk_compliance_frameworks[compliance_framework_name] = (
load_compliance_framework(file_path)
)
# External compliance via entry points
for ep in importlib.metadata.entry_points(group="prowler.compliance"):
if ep.name == provider:
try:
module = ep.load()
compliance_dir = (
module.__path__[0]
if hasattr(module, "__path__")
else os.path.dirname(module.__file__)
)
for filename in os.listdir(compliance_dir):
if filename.endswith(".json"):
file_path = os.path.join(compliance_dir, filename)
if (
os.path.isfile(file_path)
and os.stat(file_path).st_size > 0
):
compliance_framework_name = filename.split(".json")[
0
]
if (
compliance_framework_name
not in bulk_compliance_frameworks
):
# External JSON: tolerate non-legacy
# schemas (skip + warn) instead of aborting.
framework = load_compliance_framework(
file_path, fatal=False
)
if framework is not None:
bulk_compliance_frameworks[
compliance_framework_name
] = framework
except Exception as error:
logger.warning(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
)
except Exception as e:
logger.error(f"{e.__class__.__name__}[{e.__traceback__.tb_lineno}] -- {e}")
@@ -462,18 +500,26 @@ class Compliance(BaseModel):
# Testing Pending
def load_compliance_framework(
compliance_specification_file: str,
) -> Compliance:
"""load_compliance_framework loads and parse a Compliance Framework Specification"""
compliance_specification_file: str, fatal: bool = True
) -> Optional[Compliance]:
"""load_compliance_framework loads and parse a Compliance Framework Specification.
With ``fatal=True`` (built-in JSONs) an invalid file aborts the run; with
``fatal=False`` (external JSONs) it is skipped with a warning and ``None``
is returned.
"""
try:
compliance_framework = Compliance.parse_file(compliance_specification_file)
return Compliance.parse_file(compliance_specification_file)
except ValidationError as error:
logger.critical(
f"Compliance Framework Specification from {compliance_specification_file} is not valid: {error}"
if fatal:
logger.critical(
f"Compliance Framework Specification from {compliance_specification_file} is not valid: {error}"
)
sys.exit(1)
logger.warning(
f"Skipping invalid compliance framework {compliance_specification_file}: {error}"
)
sys.exit(1)
else:
return compliance_framework
return None
# ─── Universal Compliance Schema Models (Phase 1-3) ─────────────────────────
@@ -950,6 +996,25 @@ def get_bulk_compliance_frameworks_universal(provider: str) -> dict:
if compliance_root and os.path.isdir(compliance_root):
_load_jsons_from_dir(compliance_root, provider, bulk)
# External multi-provider frameworks via the dedicated universal entry
# point group, kept separate from the per-provider `prowler.compliance`
# group so the legacy loader never parses a universal JSON. Built-ins
# (already in bulk) win on a name collision.
for ep in importlib.metadata.entry_points(group="prowler.compliance.universal"):
try:
module = ep.load()
ep_dir = (
module.__path__[0]
if hasattr(module, "__path__")
else os.path.dirname(module.__file__)
)
if os.path.isdir(ep_dir):
_load_jsons_from_dir(ep_dir, provider, bulk)
except Exception as error:
logger.warning(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
)
except Exception as e:
logger.error(f"{e.__class__.__name__}[{e.__traceback__.tb_lineno}] -- {e}")
return bulk
+37 -12
View File
@@ -11,10 +11,10 @@ from typing import Any, Dict, Optional, Set
from pydantic.v1 import BaseModel, Field, ValidationError, validator
from pydantic.v1.error_wrappers import ErrorWrapper
from prowler.config.config import EXTERNAL_TOOL_PROVIDERS, Provider
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.check.utils import recover_checks_from_provider
from prowler.lib.logger import logger
from prowler.providers.common.provider import Provider as ProviderABC
# Valid ResourceGroup values as defined in the RFC
VALID_RESOURCE_GROUPS = frozenset(
@@ -259,7 +259,7 @@ class CheckMetadata(BaseModel):
)
if (
value_lower not in VALID_CATEGORIES
and values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS
and not ProviderABC.is_tool_wrapper_provider(values.get("Provider"))
):
raise ValueError(
f"Invalid category: '{value_lower}'. Must be one of: {', '.join(sorted(VALID_CATEGORIES))}."
@@ -288,7 +288,9 @@ class CheckMetadata(BaseModel):
raise ValueError("ServiceName must be a non-empty string")
check_id = values.get("CheckID")
if check_id and values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS:
if check_id and not ProviderABC.is_tool_wrapper_provider(
values.get("Provider")
):
service_from_check_id = check_id.split("_")[0]
if service_name != service_from_check_id:
raise ValueError(
@@ -304,7 +306,9 @@ class CheckMetadata(BaseModel):
if not check_id:
raise ValueError("CheckID must be a non-empty string")
if check_id and values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS:
if check_id and not ProviderABC.is_tool_wrapper_provider(
values.get("Provider")
):
if "-" in check_id:
raise ValueError(
f"CheckID {check_id} contains a hyphen, which is not allowed"
@@ -313,8 +317,9 @@ class CheckMetadata(BaseModel):
return check_id
@validator("CheckTitle", pre=True, always=True)
@classmethod
def validate_check_title(cls, check_title, values): # noqa: F841
if values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS:
if not ProviderABC.is_tool_wrapper_provider(values.get("Provider")):
if len(check_title) > 150:
raise ValueError(
f"CheckTitle must not exceed 150 characters, got {len(check_title)} characters"
@@ -326,14 +331,18 @@ class CheckMetadata(BaseModel):
return check_title
@validator("RelatedUrl", pre=True, always=True)
@classmethod
def validate_related_url(cls, related_url, values): # noqa: F841
if related_url and values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS:
if related_url and not ProviderABC.is_tool_wrapper_provider(
values.get("Provider")
):
raise ValueError("RelatedUrl must be empty. This field is deprecated.")
return related_url
@validator("Remediation")
@classmethod
def validate_recommendation_url(cls, remediation, values): # noqa: F841
if values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS:
if not ProviderABC.is_tool_wrapper_provider(values.get("Provider")):
url = remediation.Recommendation.Url
if url and not url.startswith("https://hub.prowler.com/"):
raise ValueError(
@@ -346,7 +355,7 @@ class CheckMetadata(BaseModel):
provider = values.get("Provider", "").lower()
# Non-AWS providers must have an empty CheckType list
if provider != "aws" and provider not in EXTERNAL_TOOL_PROVIDERS:
if provider != "aws" and not ProviderABC.is_tool_wrapper_provider(provider):
if check_type:
raise ValueError(
f"CheckType must be empty for non-AWS providers. Got {check_type} for provider '{provider}'."
@@ -371,8 +380,9 @@ class CheckMetadata(BaseModel):
return check_type
@validator("Description", pre=True, always=True)
@classmethod
def validate_description(cls, description, values): # noqa: F841
if values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS:
if not ProviderABC.is_tool_wrapper_provider(values.get("Provider")):
if len(description) > 400:
raise ValueError(
f"Description must not exceed 400 characters, got {len(description)} characters"
@@ -380,8 +390,9 @@ class CheckMetadata(BaseModel):
return description
@validator("Risk", pre=True, always=True)
@classmethod
def validate_risk(cls, risk, values): # noqa: F841
if values.get("Provider") not in EXTERNAL_TOOL_PROVIDERS:
if not ProviderABC.is_tool_wrapper_provider(values.get("Provider")):
if len(risk) > 400:
raise ValueError(
f"Risk must not exceed 400 characters, got {len(risk)} characters"
@@ -433,6 +444,20 @@ class CheckMetadata(BaseModel):
metadata_file = f"{check_path}/{check_name}.metadata.json"
# Load metadata
check_metadata = load_check_metadata(metadata_file)
# Built-in wins on CheckID collision. Plug-in entry points are
# appended after built-ins by `recover_checks_from_provider`, so
# a duplicate CheckID here means an entry-point check is trying
# to override a built-in. Ignore the override (the built-in
# metadata stays) and surface it via a warning — matching the
# precedence enforced by `_resolve_check_module`.
if check_metadata.CheckID in bulk_check_metadata:
logger.warning(
f"Plug-in check metadata '{check_metadata.CheckID}' "
f"(loaded from '{metadata_file}') is being IGNORED — "
f"a built-in with the same CheckID exists. To use your "
f"plug-in, register it under a different CheckID."
)
continue
bulk_check_metadata[check_metadata.CheckID] = check_metadata
return bulk_check_metadata
@@ -470,7 +495,7 @@ class CheckMetadata(BaseModel):
# If the bulk checks metadata is not provided, get it
if not bulk_checks_metadata:
bulk_checks_metadata = {}
available_providers = [p.value for p in Provider]
available_providers = ProviderABC.get_available_providers()
for provider_name in available_providers:
bulk_checks_metadata.update(CheckMetadata.get_bulk(provider_name))
if provider:
@@ -495,7 +520,7 @@ class CheckMetadata(BaseModel):
# Loaded here, as it is not always needed
if not bulk_compliance_frameworks:
bulk_compliance_frameworks = {}
available_providers = [p.value for p in Provider]
available_providers = ProviderABC.get_available_providers()
for provider in available_providers:
bulk_compliance_frameworks = Compliance.get_bulk(provider=provider)
checks_from_compliance_framework = (
+62
View File
@@ -0,0 +1,62 @@
"""Standalone helper for tool-wrapper provider detection.
A provider is a "tool wrapper" if it delegates scanning to an external tool
(Trivy, promptfoo, etc.) instead of running checks/services through the
standard Prowler engine. This module is the single source of truth for that
classification across the codebase.
Kept as a leaf module with no Prowler imports beyond the leaf
`external_tool_providers` so it can be referenced from `prowler.lib.check.*`
and `prowler.providers.common.provider` without forming an import cycle.
"""
import importlib.metadata
from prowler.lib.check.external_tool_providers import EXTERNAL_TOOL_PROVIDERS
from prowler.providers.common.builtin import is_builtin_provider
# Module-level cache for entry-point classes consulted by this helper.
# Independent of `Provider._ep_providers` to keep this module leaf — the cost
# of a duplicate cache entry is negligible (one class object per external
# provider, loaded lazily on first lookup).
_ep_class_cache: dict = {}
def _load_ep_class(provider: str):
"""Return the entry-point provider class for `provider`, or None.
Caches the result in `_ep_class_cache`. Errors during entry-point loading
are swallowed (returning None) so a broken plug-in never crashes the
is-tool-wrapper check; it just falls through to "not a tool wrapper".
"""
if provider in _ep_class_cache:
return _ep_class_cache[provider]
for ep in importlib.metadata.entry_points(group="prowler.providers"):
if ep.name == provider:
try:
cls = ep.load()
except Exception:
cls = None
_ep_class_cache[provider] = cls
return cls
_ep_class_cache[provider] = None
return None
def is_tool_wrapper_provider(provider: str) -> bool:
"""Return True if the provider delegates scanning to an external tool.
Combines the built-in `EXTERNAL_TOOL_PROVIDERS` frozenset (fast path for
iac/llm/image) with the `is_external_tool_provider` class attribute of
external plug-ins registered via entry points. This is the single source
of truth consulted by `__main__`, the `CheckMetadata` validators, the
check-loading utilities, and the checks loader.
"""
if provider in EXTERNAL_TOOL_PROVIDERS:
return True
# Built-in wins: short-circuit before ep.load() so a same-name plug-in
# cannot flip a built-in onto the tool-wrapper path or run its code.
if is_builtin_provider(provider):
return False
cls = _load_ep_class(provider)
return bool(cls and getattr(cls, "is_external_tool_provider", False))
+84 -23
View File
@@ -1,9 +1,43 @@
import importlib
import importlib.metadata
import importlib.util
import os
import sys
from pkgutil import walk_packages
from prowler.lib.check.external_tool_providers import EXTERNAL_TOOL_PROVIDERS
from prowler.lib.check.tool_wrapper import is_tool_wrapper_provider
from prowler.lib.logger import logger
from prowler.providers.common.builtin import is_builtin_provider
def _recover_ep_checks(provider: str, service: str = None) -> list[tuple]:
"""Discover external checks registered via entry points for a provider.
External plugins follow the same layout as built-ins:
`{plugin_root}.services.{service}.{check}.{check}`
When `service` is provided, only entry points whose dotted path contains
`.services.{service}.` are included mirroring how built-in discovery
filters by the `prowler.providers.{provider}.services.{service}` package.
Uses find_spec to locate the check module without importing it,
avoiding service client initialization at discovery time.
"""
checks = []
for ep in importlib.metadata.entry_points(group=f"prowler.checks.{provider}"):
try:
if service and f".services.{service}." not in ep.value:
continue
spec = importlib.util.find_spec(ep.value)
if spec and spec.origin:
check_path = os.path.dirname(spec.origin)
checks.append((ep.name, check_path))
except Exception as error:
logger.warning(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
)
return checks
def recover_checks_from_provider(
@@ -15,29 +49,55 @@ def recover_checks_from_provider(
Returns a list of tuples with the following format (check_name, check_path)
"""
try:
# Bypass check loading for providers that use external tools directly
if provider in EXTERNAL_TOOL_PROVIDERS:
# Bypass check loading for tool-wrapper providers — they delegate
# scanning to an external tool and have no checks to recover.
# Single source of truth: combines the EXTERNAL_TOOL_PROVIDERS
# frozenset (built-ins) with the per-provider `is_external_tool_provider`
# class attribute (so external plug-ins opt in via the contract).
if is_tool_wrapper_provider(provider):
return []
checks = []
modules = list_modules(provider, service)
for module_name in modules:
# Format: "prowler.providers.{provider}.services.{service}.{check_name}.{check_name}"
check_module_name = module_name.name
# We need to exclude common shared libraries in services
if (
check_module_name.count(".") == 6
and ".lib." not in check_module_name
and (not check_module_name.endswith("_fixer") or include_fixers)
):
check_path = module_name.module_finder.path
# Check name is the last part of the check_module_name
check_name = check_module_name.split(".")[-1]
check_info = (check_name, check_path)
checks.append(check_info)
except ModuleNotFoundError:
logger.critical(f"Service {service} was not found for the {provider} provider.")
sys.exit(1)
# Built-in checks from prowler.providers.{provider}.services. Gate
# the built-in branch on `is_builtin_provider(provider)` — calling
# `find_spec` directly on `prowler.providers.{provider}.services`
# would propagate `ModuleNotFoundError` when the parent package
# `prowler.providers.{provider}` does not exist (i.e. the provider
# is external), instead of returning None. The leaf helper
# encapsulates the safe lookup, so we only run the built-in
# discovery when the provider actually ships with the SDK; for
# external providers we go straight to entry points.
if is_builtin_provider(provider):
modules = list_modules(provider, service)
for module_name in modules:
# Format: "prowler.providers.{provider}.services.{service}.{check_name}.{check_name}"
check_module_name = module_name.name
# We need to exclude common shared libraries in services
if (
check_module_name.count(".") == 6
and ".lib." not in check_module_name
and (not check_module_name.endswith("_fixer") or include_fixers)
):
check_path = module_name.module_finder.path
check_name = check_module_name.split(".")[-1]
check_info = (check_name, check_path)
checks.append(check_info)
# External checks registered via entry points — always consulted, with
# optional service filter. Previously gated by `if not service:`, which
# prevented external providers from being usable with --service.
checks.extend(_recover_ep_checks(provider, service))
# A service was requested but nothing matched in either built-ins or
# entry points — surface this as a clear error instead of silently
# returning an empty list.
if service and not checks:
logger.critical(
f"Service '{service}' was not found for the '{provider}' provider "
f"(neither as a built-in nor via external entry points)."
)
sys.exit(1)
except Exception as e:
logger.critical(f"{e.__class__.__name__}[{e.__traceback__.tb_lineno}]: {e}")
sys.exit(1)
@@ -64,8 +124,9 @@ def recover_checks_from_service(service_list: list, provider: str) -> set:
Returns a set of checks from the given services
"""
try:
# Bypass check loading for providers that use external tools directly
if provider in EXTERNAL_TOOL_PROVIDERS:
# Bypass check loading for tool-wrapper providers — symmetric with
# `recover_checks_from_provider` above, using the same source of truth.
if is_tool_wrapper_provider(provider):
return set()
checks = set()
+52 -7
View File
@@ -20,19 +20,61 @@ from prowler.providers.common.arguments import (
validate_provider_arguments,
validate_sarif_usage,
)
from prowler.providers.common.provider import Provider
class ProwlerArgumentParser:
# Set the default parser
def __init__(self):
# Discover any providers not in the hardcoded list below
# TODO - First step to support current providers and the new external provider implementation
known_providers = {
"aws",
"azure",
"gcp",
"kubernetes",
"m365",
"github",
"googleworkspace",
"cloudflare",
"oraclecloud",
"openstack",
"alibabacloud",
"iac",
"llm",
"image",
"nhn",
"mongodbatlas",
"vercel",
"okta",
"scaleway",
"stackit",
}
all_providers = set(Provider.get_available_providers())
new_providers = sorted(all_providers - known_providers)
# Build extra strings for dynamically discovered providers
extra_providers_csv = ""
extra_providers_text = ""
if new_providers:
providers_help = Provider.get_providers_help_text()
extra_providers_csv = "," + ",".join(new_providers)
extra_lines = []
for name in new_providers:
help_text = providers_help.get(name, "")
if help_text:
extra_lines.append(f" {name:<20}{help_text}")
if extra_lines:
extra_providers_text = "\n" + "\n".join(extra_lines)
# CLI Arguments
self.parser = argparse.ArgumentParser(
prog="prowler",
formatter_class=RawTextHelpFormatter,
usage="prowler [-h] [--version] {aws,azure,gcp,kubernetes,m365,github,googleworkspace,okta,nhn,mongodbatlas,oraclecloud,alibabacloud,cloudflare,openstack,scaleway,stackit,vercel,dashboard,iac,image,llm} ...",
epilog="""
usage=f"prowler [-h] [--version] {{aws,azure,gcp,kubernetes,m365,github,googleworkspace,okta,nhn,mongodbatlas,oraclecloud,alibabacloud,cloudflare,openstack,scaleway,stackit,vercel,dashboard,iac,image,llm{extra_providers_csv}}} ...",
epilog=f"""
Available Cloud Providers:
{aws,azure,gcp,kubernetes,m365,github,googleworkspace,okta,iac,llm,image,nhn,mongodbatlas,oraclecloud,alibabacloud,cloudflare,openstack,scaleway,stackit,vercel}
{{aws,azure,gcp,kubernetes,m365,github,googleworkspace,okta,iac,llm,image,nhn,mongodbatlas,oraclecloud,alibabacloud,cloudflare,openstack,scaleway,stackit,vercel{extra_providers_csv}}}
aws AWS Provider
azure Azure Provider
gcp GCP Provider
@@ -52,13 +94,14 @@ Available Cloud Providers:
nhn NHN Provider (Unofficial)
mongodbatlas MongoDB Atlas Provider
scaleway Scaleway Provider
vercel Vercel Provider
vercel Vercel Provider{extra_providers_text}
Available components:
dashboard Local dashboard
To see the different available options on a specific component, run:
prowler {provider|dashboard} -h|--help
prowler {{provider|dashboard}} -h|--help
Detailed documentation at https://docs.prowler.com
""",
@@ -117,8 +160,10 @@ Detailed documentation at https://docs.prowler.com
and (sys.argv[1] not in ("-v", "--version"))
):
# Since the provider is always the second argument, we are checking if
# a flag, starting by "-", is supplied
if "-" in sys.argv[1]:
# a flag is supplied. Use startswith("-") instead of "in" to avoid
# matching external provider names that contain hyphens
# (e.g. "local-acme-snowflake").
if sys.argv[1].startswith("-"):
sys.argv = self.__set_default_provider__(sys.argv)
# Provider aliases mapping
+67 -39
View File
@@ -10,7 +10,6 @@ from prowler.lib.outputs.compliance.cis.cis import get_cis_table
from prowler.lib.outputs.compliance.compliance_check import ( # noqa: F401 - re-export for backward compatibility
get_check_compliance,
)
from prowler.lib.outputs.compliance.csa.csa import get_csa_table
from prowler.lib.outputs.compliance.ens.ens import get_ens_table
from prowler.lib.outputs.compliance.generic.generic_table import (
get_generic_compliance_table,
@@ -33,24 +32,28 @@ def process_universal_compliance_frameworks(
output_filename: str,
provider: str,
generated_outputs: dict,
from_cli: bool = True,
is_last: bool = True,
) -> set:
"""Process universal compliance frameworks, generating CSV and OCSF outputs.
For each framework in *input_compliance_frameworks* that exists in
*universal_frameworks* and has an outputs.table_config, this function
creates both a CSV (UniversalComplianceOutput) and an OCSF JSON
(OCSFComplianceOutput) file. OCSF is always generated regardless of
*universal_frameworks* and has an ``outputs.table_config``, this function
writes both a CSV (``UniversalComplianceOutput``) and an OCSF JSON
(``OCSFComplianceOutput``) file. OCSF is always generated regardless of
the user's ``--output-formats`` flag.
The function is idempotent: it tracks already-created writers via
``generated_outputs["compliance"]`` keyed by ``file_path``. If invoked
again for the same framework (e.g. once per streaming batch), it
reuses the existing writer instead of recreating it. This guarantees
one output writer per framework for the whole execution and keeps
the OCSF JSON array valid across multiple calls.
Streaming-aware: writers are tracked via ``generated_outputs["compliance"]``
keyed by ``file_path``. On the first call per framework a new writer is
created and emits both findings and manual requirements; subsequent calls
reuse the writer, transform only the new ``finding_outputs`` (manual
requirements are not re-emitted), and append to the open file. Set
``from_cli=False`` and ``is_last=False`` for intermediate batches; pass
``is_last=True`` on the final batch to close the file (OCSF is also
finalized as a valid JSON array).
Returns the set of framework names that were processed so the caller
can remove them before entering the legacy per-provider output loop.
Returns the set of framework names processed so the caller can subtract
them from the legacy per-provider output loop.
"""
from prowler.lib.outputs.compliance.universal.ocsf_compliance import (
OCSFComplianceOutput,
@@ -65,6 +68,13 @@ def process_universal_compliance_frameworks(
if isinstance(out, (UniversalComplianceOutput, OCSFComplianceOutput))
}
def _flush(writer, framework, label, is_new):
if not is_new:
writer._transform(finding_outputs, framework, label, include_manual=False)
writer.close_file = is_last
writer.batch_write_data_to_file()
writer._data.clear()
processed = set()
for compliance_name in input_compliance_frameworks:
if not (
@@ -75,37 +85,46 @@ def process_universal_compliance_frameworks(
continue
fw = universal_frameworks[compliance_name]
compliance_label = (
fw.framework + "-" + fw.version if fw.version else fw.framework
)
# CSV output
csv_path = (
f"{output_directory}/compliance/" f"{output_filename}_{compliance_name}.csv"
)
if csv_path not in existing_writers:
output = UniversalComplianceOutput(
csv_writer = existing_writers.get(csv_path)
csv_is_new = csv_writer is None
if csv_is_new:
csv_writer = UniversalComplianceOutput(
findings=finding_outputs,
framework=fw,
file_path=csv_path,
from_cli=from_cli,
provider=provider,
)
generated_outputs["compliance"].append(output)
existing_writers[csv_path] = output
output.batch_write_data_to_file()
generated_outputs["compliance"].append(csv_writer)
existing_writers[csv_path] = csv_writer
_flush(csv_writer, fw, compliance_label, csv_is_new)
# OCSF output (always generated for universal frameworks)
ocsf_path = (
f"{output_directory}/compliance/"
f"{output_filename}_{compliance_name}.ocsf.json"
)
if ocsf_path not in existing_writers:
ocsf_output = OCSFComplianceOutput(
ocsf_writer = existing_writers.get(ocsf_path)
ocsf_is_new = ocsf_writer is None
if ocsf_is_new:
ocsf_writer = OCSFComplianceOutput(
findings=finding_outputs,
framework=fw,
file_path=ocsf_path,
from_cli=from_cli,
provider=provider,
)
generated_outputs["compliance"].append(ocsf_output)
existing_writers[ocsf_path] = ocsf_output
ocsf_output.batch_write_data_to_file()
generated_outputs["compliance"].append(ocsf_writer)
existing_writers[ocsf_path] = ocsf_writer
_flush(ocsf_writer, fw, compliance_label, ocsf_is_new)
processed.add(compliance_name)
@@ -206,15 +225,6 @@ def display_compliance_table(
output_directory,
compliance_overview,
)
elif compliance_framework.startswith("csa_ccm_"):
get_csa_table(
findings,
bulk_checks_metadata,
compliance_framework,
output_filename,
output_directory,
compliance_overview,
)
elif compliance_framework.startswith("c5_"):
get_c5_table(
findings,
@@ -243,14 +253,32 @@ def display_compliance_table(
compliance_overview,
)
else:
get_generic_compliance_table(
findings,
bulk_checks_metadata,
compliance_framework,
output_filename,
output_directory,
compliance_overview,
)
# Try provider-specific table first, fall back to generic
from prowler.providers.common.provider import Provider
provider = Provider.get_global_provider()
handled = False
if provider is not None:
try:
handled = provider.display_compliance_table(
findings,
bulk_checks_metadata,
compliance_framework,
output_filename,
output_directory,
compliance_overview,
)
except NotImplementedError:
handled = False
if not handled:
get_generic_compliance_table(
findings,
bulk_checks_metadata,
compliance_framework,
output_filename,
output_directory,
compliance_overview,
)
except Exception as error:
logger.critical(
f"{error.__class__.__name__}:{error.__traceback__.tb_lineno} -- {error}"
-101
View File
@@ -1,101 +0,0 @@
from colorama import Fore, Style
from tabulate import tabulate
from prowler.config.config import orange_color
def get_csa_table(
findings: list,
bulk_checks_metadata: dict,
compliance_framework: str,
output_filename: str,
output_directory: str,
compliance_overview: bool,
):
section_table = {
"Provider": [],
"Section": [],
"Status": [],
"Muted": [],
}
pass_count = []
fail_count = []
muted_count = []
sections = {}
for index, finding in enumerate(findings):
check = bulk_checks_metadata[finding.check_metadata.CheckID]
check_compliances = check.Compliance
for compliance in check_compliances:
if (
compliance.Framework == "CSA-CCM"
and compliance.Version in compliance_framework
):
for requirement in compliance.Requirements:
for attribute in requirement.Attributes:
section = attribute.Section
if section not in sections:
sections[section] = {"FAIL": 0, "PASS": 0, "Muted": 0}
if finding.muted:
if index not in muted_count:
muted_count.append(index)
sections[section]["Muted"] += 1
else:
if finding.status == "FAIL" and index not in fail_count:
fail_count.append(index)
sections[section]["FAIL"] += 1
elif finding.status == "PASS" and index not in pass_count:
pass_count.append(index)
sections[section]["PASS"] += 1
sections = dict(sorted(sections.items()))
for section in sections:
section_table["Provider"].append(compliance.Provider)
section_table["Section"].append(section)
if sections[section]["FAIL"] > 0:
section_table["Status"].append(
f"{Fore.RED}FAIL({sections[section]['FAIL']}){Style.RESET_ALL}"
)
else:
if sections[section]["PASS"] > 0:
section_table["Status"].append(
f"{Fore.GREEN}PASS({sections[section]['PASS']}){Style.RESET_ALL}"
)
else:
section_table["Status"].append(f"{Fore.GREEN}PASS{Style.RESET_ALL}")
section_table["Muted"].append(
f"{orange_color}{sections[section]['Muted']}{Style.RESET_ALL}"
)
if (
len(fail_count) + len(pass_count) + len(muted_count) > 1
): # If there are no resources, don't print the compliance table
print(
f"\nCompliance Status of {Fore.YELLOW}{compliance_framework.upper()}{Style.RESET_ALL} Framework:"
)
total_findings_count = len(fail_count) + len(pass_count) + len(muted_count)
overview_table = [
[
f"{Fore.RED}{round(len(fail_count) / total_findings_count * 100, 2)}% ({len(fail_count)}) FAIL{Style.RESET_ALL}",
f"{Fore.GREEN}{round(len(pass_count) / total_findings_count * 100, 2)}% ({len(pass_count)}) PASS{Style.RESET_ALL}",
f"{orange_color}{round(len(muted_count) / total_findings_count * 100, 2)}% ({len(muted_count)}) MUTED{Style.RESET_ALL}",
]
]
print(tabulate(overview_table, tablefmt="rounded_grid"))
if not compliance_overview:
if len(fail_count) > 0 and len(section_table["Section"]) > 0:
print(
f"\nFramework {Fore.YELLOW}{compliance_framework.upper()}{Style.RESET_ALL} Results:"
)
print(
tabulate(
section_table,
tablefmt="rounded_grid",
headers="keys",
)
)
print(f"\nDetailed results of {compliance_framework.upper()} are in:")
print(
f" - CSV: {output_directory}/compliance/{output_filename}_{compliance_framework}.csv\n"
)
@@ -1,95 +0,0 @@
from prowler.config.config import timestamp
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.outputs.compliance.compliance_output import ComplianceOutput
from prowler.lib.outputs.compliance.csa.models import AlibabaCloudCSAModel
from prowler.lib.outputs.finding import Finding
class AlibabaCloudCSA(ComplianceOutput):
"""
This class represents the Alibaba Cloud CSA compliance output.
Attributes:
- _data (list): A list to store transformed data from findings.
- _file_descriptor (TextIOWrapper): A file descriptor to write data to a file.
Methods:
- transform: Transforms findings into Alibaba Cloud CSA compliance format.
"""
def transform(
self,
findings: list[Finding],
compliance: Compliance,
compliance_name: str,
) -> None:
"""
Transforms a list of findings into Alibaba Cloud CSA compliance format.
Parameters:
- findings (list): A list of findings.
- compliance (Compliance): A compliance model.
- compliance_name (str): The name of the compliance model.
Returns:
- None
"""
for finding in findings:
for requirement in compliance.Requirements:
# Source of truth: framework JSON, not finding.compliance snapshot (avoids CSV/UI count drift).
if finding.check_id in requirement.Checks:
for attribute in requirement.Attributes:
compliance_row = AlibabaCloudCSAModel(
Provider=finding.provider,
Description=compliance.Description,
AccountId=finding.account_uid,
Region=finding.region,
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Name=requirement.Name,
Requirements_Attributes_Section=attribute.Section,
Requirements_Attributes_CCMLite=attribute.CCMLite,
Requirements_Attributes_IaaS=attribute.IaaS,
Requirements_Attributes_PaaS=attribute.PaaS,
Requirements_Attributes_SaaS=attribute.SaaS,
Requirements_Attributes_ScopeApplicability=attribute.ScopeApplicability,
Status=finding.status,
StatusExtended=finding.status_extended,
ResourceId=finding.resource_uid,
ResourceName=finding.resource_name,
CheckId=finding.check_id,
Muted=finding.muted,
Framework=compliance.Framework,
Name=compliance.Name,
)
self._data.append(compliance_row)
# Add manual requirements to the compliance output
for requirement in compliance.Requirements:
if not requirement.Checks:
for attribute in requirement.Attributes:
compliance_row = AlibabaCloudCSAModel(
Provider=compliance.Provider.lower(),
Description=compliance.Description,
AccountId="",
Region="",
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Name=requirement.Name,
Requirements_Attributes_Section=attribute.Section,
Requirements_Attributes_CCMLite=attribute.CCMLite,
Requirements_Attributes_IaaS=attribute.IaaS,
Requirements_Attributes_PaaS=attribute.PaaS,
Requirements_Attributes_SaaS=attribute.SaaS,
Requirements_Attributes_ScopeApplicability=attribute.ScopeApplicability,
Status="MANUAL",
StatusExtended="Manual check",
ResourceId="manual_check",
ResourceName="Manual check",
CheckId="manual",
Muted=False,
Framework=compliance.Framework,
Name=compliance.Name,
)
self._data.append(compliance_row)
@@ -1,95 +0,0 @@
from prowler.config.config import timestamp
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.outputs.compliance.compliance_output import ComplianceOutput
from prowler.lib.outputs.compliance.csa.models import AWSCSAModel
from prowler.lib.outputs.finding import Finding
class AWSCSA(ComplianceOutput):
"""
This class represents the AWS CSA compliance output.
Attributes:
- _data (list): A list to store transformed data from findings.
- _file_descriptor (TextIOWrapper): A file descriptor to write data to a file.
Methods:
- transform: Transforms findings into AWS CSA compliance format.
"""
def transform(
self,
findings: list[Finding],
compliance: Compliance,
compliance_name: str,
) -> None:
"""
Transforms a list of findings into AWS CSA compliance format.
Parameters:
- findings (list): A list of findings.
- compliance (Compliance): A compliance model.
- compliance_name (str): The name of the compliance model.
Returns:
- None
"""
for finding in findings:
for requirement in compliance.Requirements:
# Source of truth: framework JSON, not finding.compliance snapshot (avoids CSV/UI count drift).
if finding.check_id in requirement.Checks:
for attribute in requirement.Attributes:
compliance_row = AWSCSAModel(
Provider=finding.provider,
Description=compliance.Description,
AccountId=finding.account_uid,
Region=finding.region,
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Name=requirement.Name,
Requirements_Attributes_Section=attribute.Section,
Requirements_Attributes_CCMLite=attribute.CCMLite,
Requirements_Attributes_IaaS=attribute.IaaS,
Requirements_Attributes_PaaS=attribute.PaaS,
Requirements_Attributes_SaaS=attribute.SaaS,
Requirements_Attributes_ScopeApplicability=attribute.ScopeApplicability,
Status=finding.status,
StatusExtended=finding.status_extended,
ResourceId=finding.resource_uid,
ResourceName=finding.resource_name,
CheckId=finding.check_id,
Muted=finding.muted,
Framework=compliance.Framework,
Name=compliance.Name,
)
self._data.append(compliance_row)
# Add manual requirements to the compliance output
for requirement in compliance.Requirements:
if not requirement.Checks:
for attribute in requirement.Attributes:
compliance_row = AWSCSAModel(
Provider=compliance.Provider.lower(),
Description=compliance.Description,
AccountId="",
Region="",
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Name=requirement.Name,
Requirements_Attributes_Section=attribute.Section,
Requirements_Attributes_CCMLite=attribute.CCMLite,
Requirements_Attributes_IaaS=attribute.IaaS,
Requirements_Attributes_PaaS=attribute.PaaS,
Requirements_Attributes_SaaS=attribute.SaaS,
Requirements_Attributes_ScopeApplicability=attribute.ScopeApplicability,
Status="MANUAL",
StatusExtended="Manual check",
ResourceId="manual_check",
ResourceName="Manual check",
CheckId="manual",
Muted=False,
Framework=compliance.Framework,
Name=compliance.Name,
)
self._data.append(compliance_row)
@@ -1,95 +0,0 @@
from prowler.config.config import timestamp
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.outputs.compliance.compliance_output import ComplianceOutput
from prowler.lib.outputs.compliance.csa.models import AzureCSAModel
from prowler.lib.outputs.finding import Finding
class AzureCSA(ComplianceOutput):
"""
This class represents the Azure CSA compliance output.
Attributes:
- _data (list): A list to store transformed data from findings.
- _file_descriptor (TextIOWrapper): A file descriptor to write data to a file.
Methods:
- transform: Transforms findings into Azure CSA compliance format.
"""
def transform(
self,
findings: list[Finding],
compliance: Compliance,
compliance_name: str,
) -> None:
"""
Transforms a list of findings into Azure CSA compliance format.
Parameters:
- findings (list): A list of findings.
- compliance (Compliance): A compliance model.
- compliance_name (str): The name of the compliance model.
Returns:
- None
"""
for finding in findings:
for requirement in compliance.Requirements:
# Source of truth: framework JSON, not finding.compliance snapshot (avoids CSV/UI count drift).
if finding.check_id in requirement.Checks:
for attribute in requirement.Attributes:
compliance_row = AzureCSAModel(
Provider=finding.provider,
Description=compliance.Description,
SubscriptionId=finding.account_uid,
Location=finding.region,
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Name=requirement.Name,
Requirements_Attributes_Section=attribute.Section,
Requirements_Attributes_CCMLite=attribute.CCMLite,
Requirements_Attributes_IaaS=attribute.IaaS,
Requirements_Attributes_PaaS=attribute.PaaS,
Requirements_Attributes_SaaS=attribute.SaaS,
Requirements_Attributes_ScopeApplicability=attribute.ScopeApplicability,
Status=finding.status,
StatusExtended=finding.status_extended,
ResourceId=finding.resource_uid,
ResourceName=finding.resource_name,
CheckId=finding.check_id,
Muted=finding.muted,
Framework=compliance.Framework,
Name=compliance.Name,
)
self._data.append(compliance_row)
# Add manual requirements to the compliance output
for requirement in compliance.Requirements:
if not requirement.Checks:
for attribute in requirement.Attributes:
compliance_row = AzureCSAModel(
Provider=compliance.Provider.lower(),
Description=compliance.Description,
SubscriptionId="",
Location="",
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Name=requirement.Name,
Requirements_Attributes_Section=attribute.Section,
Requirements_Attributes_CCMLite=attribute.CCMLite,
Requirements_Attributes_IaaS=attribute.IaaS,
Requirements_Attributes_PaaS=attribute.PaaS,
Requirements_Attributes_SaaS=attribute.SaaS,
Requirements_Attributes_ScopeApplicability=attribute.ScopeApplicability,
Status="MANUAL",
StatusExtended="Manual check",
ResourceId="manual_check",
ResourceName="Manual check",
CheckId="manual",
Muted=False,
Framework=compliance.Framework,
Name=compliance.Name,
)
self._data.append(compliance_row)
@@ -1,95 +0,0 @@
from prowler.config.config import timestamp
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.outputs.compliance.compliance_output import ComplianceOutput
from prowler.lib.outputs.compliance.csa.models import GCPCSAModel
from prowler.lib.outputs.finding import Finding
class GCPCSA(ComplianceOutput):
"""
This class represents the GCP CSA compliance output.
Attributes:
- _data (list): A list to store transformed data from findings.
- _file_descriptor (TextIOWrapper): A file descriptor to write data to a file.
Methods:
- transform: Transforms findings into GCP CSA compliance format.
"""
def transform(
self,
findings: list[Finding],
compliance: Compliance,
compliance_name: str,
) -> None:
"""
Transforms a list of findings into GCP CSA compliance format.
Parameters:
- findings (list): A list of findings.
- compliance (Compliance): A compliance model.
- compliance_name (str): The name of the compliance model.
Returns:
- None
"""
for finding in findings:
for requirement in compliance.Requirements:
# Source of truth: framework JSON, not finding.compliance snapshot (avoids CSV/UI count drift).
if finding.check_id in requirement.Checks:
for attribute in requirement.Attributes:
compliance_row = GCPCSAModel(
Provider=finding.provider,
Description=compliance.Description,
ProjectId=finding.account_uid,
Location=finding.region,
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Name=requirement.Name,
Requirements_Attributes_Section=attribute.Section,
Requirements_Attributes_CCMLite=attribute.CCMLite,
Requirements_Attributes_IaaS=attribute.IaaS,
Requirements_Attributes_PaaS=attribute.PaaS,
Requirements_Attributes_SaaS=attribute.SaaS,
Requirements_Attributes_ScopeApplicability=attribute.ScopeApplicability,
Status=finding.status,
StatusExtended=finding.status_extended,
ResourceId=finding.resource_uid,
ResourceName=finding.resource_name,
CheckId=finding.check_id,
Muted=finding.muted,
Framework=compliance.Framework,
Name=compliance.Name,
)
self._data.append(compliance_row)
# Add manual requirements to the compliance output
for requirement in compliance.Requirements:
if not requirement.Checks:
for attribute in requirement.Attributes:
compliance_row = GCPCSAModel(
Provider=compliance.Provider.lower(),
Description=compliance.Description,
ProjectId="",
Location="",
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Name=requirement.Name,
Requirements_Attributes_Section=attribute.Section,
Requirements_Attributes_CCMLite=attribute.CCMLite,
Requirements_Attributes_IaaS=attribute.IaaS,
Requirements_Attributes_PaaS=attribute.PaaS,
Requirements_Attributes_SaaS=attribute.SaaS,
Requirements_Attributes_ScopeApplicability=attribute.ScopeApplicability,
Status="MANUAL",
StatusExtended="Manual check",
ResourceId="manual_check",
ResourceName="Manual check",
CheckId="manual",
Muted=False,
Framework=compliance.Framework,
Name=compliance.Name,
)
self._data.append(compliance_row)
@@ -1,95 +0,0 @@
from prowler.config.config import timestamp
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.outputs.compliance.compliance_output import ComplianceOutput
from prowler.lib.outputs.compliance.csa.models import OracleCloudCSAModel
from prowler.lib.outputs.finding import Finding
class OracleCloudCSA(ComplianceOutput):
"""
This class represents the OracleCloud CSA compliance output.
Attributes:
- _data (list): A list to store transformed data from findings.
- _file_descriptor (TextIOWrapper): A file descriptor to write data to a file.
Methods:
- transform: Transforms findings into OracleCloud CSA compliance format.
"""
def transform(
self,
findings: list[Finding],
compliance: Compliance,
compliance_name: str,
) -> None:
"""
Transforms a list of findings into OracleCloud CSA compliance format.
Parameters:
- findings (list): A list of findings.
- compliance (Compliance): A compliance model.
- compliance_name (str): The name of the compliance model.
Returns:
- None
"""
for finding in findings:
for requirement in compliance.Requirements:
# Source of truth: framework JSON, not finding.compliance snapshot (avoids CSV/UI count drift).
if finding.check_id in requirement.Checks:
for attribute in requirement.Attributes:
compliance_row = OracleCloudCSAModel(
Provider=finding.provider,
Description=compliance.Description,
TenancyId=finding.account_uid,
Region=finding.region,
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Name=requirement.Name,
Requirements_Attributes_Section=attribute.Section,
Requirements_Attributes_CCMLite=attribute.CCMLite,
Requirements_Attributes_IaaS=attribute.IaaS,
Requirements_Attributes_PaaS=attribute.PaaS,
Requirements_Attributes_SaaS=attribute.SaaS,
Requirements_Attributes_ScopeApplicability=attribute.ScopeApplicability,
Status=finding.status,
StatusExtended=finding.status_extended,
ResourceId=finding.resource_uid,
ResourceName=finding.resource_name,
CheckId=finding.check_id,
Muted=finding.muted,
Framework=compliance.Framework,
Name=compliance.Name,
)
self._data.append(compliance_row)
# Add manual requirements to the compliance output
for requirement in compliance.Requirements:
if not requirement.Checks:
for attribute in requirement.Attributes:
compliance_row = OracleCloudCSAModel(
Provider=compliance.Provider.lower(),
Description=compliance.Description,
TenancyId="",
Region="",
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Name=requirement.Name,
Requirements_Attributes_Section=attribute.Section,
Requirements_Attributes_CCMLite=attribute.CCMLite,
Requirements_Attributes_IaaS=attribute.IaaS,
Requirements_Attributes_PaaS=attribute.PaaS,
Requirements_Attributes_SaaS=attribute.SaaS,
Requirements_Attributes_ScopeApplicability=attribute.ScopeApplicability,
Status="MANUAL",
StatusExtended="Manual check",
ResourceId="manual_check",
ResourceName="Manual check",
CheckId="manual",
Muted=False,
Framework=compliance.Framework,
Name=compliance.Name,
)
self._data.append(compliance_row)
@@ -1,146 +0,0 @@
from pydantic.v1 import BaseModel
class AWSCSAModel(BaseModel):
"""
AWSCSAModel generates a finding's output in CSV CSA format for AWS.
"""
Provider: str
Description: str
AccountId: str
Region: str
AssessmentDate: str
Requirements_Id: str
Requirements_Description: str
Requirements_Name: str
Requirements_Attributes_Section: str
Requirements_Attributes_CCMLite: str
Requirements_Attributes_IaaS: str
Requirements_Attributes_PaaS: str
Requirements_Attributes_SaaS: str
Requirements_Attributes_ScopeApplicability: list[dict]
Status: str
StatusExtended: str
ResourceId: str
CheckId: str
Muted: bool
ResourceName: str
Framework: str
Name: str
class GCPCSAModel(BaseModel):
"""
GCPCSAModel generates a finding's output in CSV CSA format for GCP.
"""
Provider: str
Description: str
ProjectId: str
Location: str
AssessmentDate: str
Requirements_Id: str
Requirements_Description: str
Requirements_Name: str
Requirements_Attributes_Section: str
Requirements_Attributes_CCMLite: str
Requirements_Attributes_IaaS: str
Requirements_Attributes_PaaS: str
Requirements_Attributes_SaaS: str
Requirements_Attributes_ScopeApplicability: list[dict]
Status: str
StatusExtended: str
ResourceId: str
CheckId: str
Muted: bool
ResourceName: str
Framework: str
Name: str
class OracleCloudCSAModel(BaseModel):
"""
OracleCloudCSAModel generates a finding's output in CSV CSA format for OracleCloud.
"""
Provider: str
Description: str
TenancyId: str
Region: str
AssessmentDate: str
Requirements_Id: str
Requirements_Description: str
Requirements_Name: str
Requirements_Attributes_Section: str
Requirements_Attributes_CCMLite: str
Requirements_Attributes_IaaS: str
Requirements_Attributes_PaaS: str
Requirements_Attributes_SaaS: str
Requirements_Attributes_ScopeApplicability: list[dict]
Status: str
StatusExtended: str
ResourceId: str
CheckId: str
Muted: bool
ResourceName: str
Framework: str
Name: str
class AlibabaCloudCSAModel(BaseModel):
"""
AlibabaCloudCSAModel generates a finding's output in CSV CSA format for Alibaba Cloud.
"""
Provider: str
Description: str
AccountId: str
Region: str
AssessmentDate: str
Requirements_Id: str
Requirements_Description: str
Requirements_Name: str
Requirements_Attributes_Section: str
Requirements_Attributes_CCMLite: str
Requirements_Attributes_IaaS: str
Requirements_Attributes_PaaS: str
Requirements_Attributes_SaaS: str
Requirements_Attributes_ScopeApplicability: list[dict]
Status: str
StatusExtended: str
ResourceId: str
CheckId: str
Muted: bool
ResourceName: str
Framework: str
Name: str
class AzureCSAModel(BaseModel):
"""
AzureCSAModel generates a finding's output in CSV CSA format for Azure.
"""
Provider: str
Description: str
SubscriptionId: str
Location: str
AssessmentDate: str
Requirements_Id: str
Requirements_Description: str
Requirements_Name: str
Requirements_Attributes_Section: str
Requirements_Attributes_CCMLite: str
Requirements_Attributes_IaaS: str
Requirements_Attributes_PaaS: str
Requirements_Attributes_SaaS: str
Requirements_Attributes_ScopeApplicability: list[dict]
Status: str
StatusExtended: str
ResourceId: str
CheckId: str
Muted: bool
ResourceName: str
Framework: str
Name: str
@@ -34,60 +34,48 @@ class GenericCompliance(ComplianceOutput):
Returns:
- None
"""
def compliance_row(requirement, attribute, finding=None):
# Read attribute fields defensively: GenericCompliance is the
# last-resort renderer for any framework, and provider-specific
# schemas (e.g. CIS, ENS, ISO27001) do not declare the universal
# Section/SubSection/SubGroup/Service/Type/Comment fields.
return GenericComplianceModel(
Provider=(finding.provider if finding else compliance.Provider.lower()),
Description=compliance.Description,
AccountId=finding.account_uid if finding else "",
Region=finding.region if finding else "",
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Attributes_Section=getattr(attribute, "Section", None),
Requirements_Attributes_SubSection=getattr(
attribute, "SubSection", None
),
Requirements_Attributes_SubGroup=getattr(attribute, "SubGroup", None),
Requirements_Attributes_Service=getattr(attribute, "Service", None),
Requirements_Attributes_Type=getattr(attribute, "Type", None),
Requirements_Attributes_Comment=getattr(attribute, "Comment", None),
Status=finding.status if finding else "MANUAL",
StatusExtended=(finding.status_extended if finding else "Manual check"),
ResourceId=finding.resource_uid if finding else "manual_check",
ResourceName=finding.resource_name if finding else "Manual check",
CheckId=finding.check_id if finding else "manual",
Muted=finding.muted if finding else False,
Framework=compliance.Framework,
Name=compliance.Name,
)
for finding in findings:
for requirement in compliance.Requirements:
# Source of truth: framework JSON, not finding.compliance snapshot (avoids CSV/UI count drift).
if finding.check_id in requirement.Checks:
for attribute in requirement.Attributes:
compliance_row = GenericComplianceModel(
Provider=finding.provider,
Description=compliance.Description,
AccountId=finding.account_uid,
Region=finding.region,
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Attributes_Section=attribute.Section,
Requirements_Attributes_SubSection=attribute.SubSection,
Requirements_Attributes_SubGroup=attribute.SubGroup,
Requirements_Attributes_Service=attribute.Service,
Requirements_Attributes_Type=attribute.Type,
Requirements_Attributes_Comment=attribute.Comment,
Status=finding.status,
StatusExtended=finding.status_extended,
ResourceId=finding.resource_uid,
ResourceName=finding.resource_name,
CheckId=finding.check_id,
Muted=finding.muted,
Framework=compliance.Framework,
Name=compliance.Name,
self._data.append(
compliance_row(requirement, attribute, finding)
)
self._data.append(compliance_row)
# Add manual requirements to the compliance output
for requirement in compliance.Requirements:
if not requirement.Checks:
for attribute in requirement.Attributes:
compliance_row = GenericComplianceModel(
Provider=compliance.Provider.lower(),
Description=compliance.Description,
AccountId="",
Region="",
AssessmentDate=str(timestamp),
Requirements_Id=requirement.Id,
Requirements_Description=requirement.Description,
Requirements_Attributes_Section=attribute.Section,
Requirements_Attributes_SubSection=attribute.SubSection,
Requirements_Attributes_SubGroup=attribute.SubGroup,
Requirements_Attributes_Service=attribute.Service,
Requirements_Attributes_Type=attribute.Type,
Requirements_Attributes_Comment=attribute.Comment,
Status="MANUAL",
StatusExtended="Manual check",
ResourceId="manual_check",
ResourceName="Manual check",
CheckId="manual",
Muted=False,
Framework=compliance.Framework,
Name=compliance.Name,
)
self._data.append(compliance_row)
self._data.append(compliance_row(requirement, attribute))
@@ -79,30 +79,43 @@ def _to_snake_case(name: str) -> str:
return s.lower()
def _build_requirement_attrs(requirement, framework) -> dict:
"""Build a dict with requirement attributes for the unmapped section.
def _build_requirement_attrs(requirement, framework):
"""Build the requirement attributes payload for the unmapped section.
Keys are normalized to snake_case for OCSF consistency.
Only includes attributes whose AttributeMetadata has output_formats.ocsf=True.
When no metadata is declared, all attributes are included.
Keys are snake_cased and filtered by ``AttributeMetadata.output_formats.ocsf``
when declared. MITRE-style attrs (``{"_raw_attributes": [...]}``) are
unwrapped into a list of per-entry dicts.
"""
attrs = requirement.attributes
if not attrs:
requirement_attributes = requirement.attributes
if not requirement_attributes:
return {}
# Build set of keys allowed for OCSF output
metadata = framework.attributes_metadata
if metadata:
ocsf_keys = {m.key for m in metadata if m.output_formats.ocsf}
else:
ocsf_keys = None # No metadata → include all
allowed_keys = (
{entry.key for entry in metadata if entry.output_formats.ocsf}
if metadata
else None
)
result = {}
for key, value in attrs.items():
if ocsf_keys is not None and key not in ocsf_keys:
continue
result[_to_snake_case(key)] = value
return result
def _to_snake_case_dict(entry: dict) -> dict:
return {
_to_snake_case(key): value
for key, value in entry.items()
if allowed_keys is None or key in allowed_keys
}
if (
isinstance(requirement_attributes, dict)
and "_raw_attributes" in requirement_attributes
):
raw_entries = requirement_attributes.get("_raw_attributes") or []
return [
_to_snake_case_dict(entry)
for entry in raw_entries
if isinstance(entry, dict)
]
return _to_snake_case_dict(requirement_attributes)
class OCSFComplianceOutput:
@@ -147,7 +160,14 @@ class OCSFComplianceOutput:
findings: List["Finding"],
framework: ComplianceFramework,
compliance_name: str,
include_manual: bool = True,
) -> None:
"""Transform findings into OCSF ComplianceFinding events.
Manual requirements are emitted only when ``include_manual=True``. The
caller must pass ``False`` for subsequent streaming batches so manual
events are not duplicated.
"""
# Build check -> requirements map
check_req_map = {}
for req in framework.requirements:
@@ -170,6 +190,9 @@ class OCSFComplianceOutput:
if cf:
self._data.append(cf)
if not include_manual:
return
# Manual requirements (no checks or empty for current provider)
for req in framework.requirements:
checks = req.checks
@@ -198,8 +198,15 @@ class UniversalComplianceOutput:
findings: list["Finding"],
framework: ComplianceFramework,
compliance_name: str,
include_manual: bool = True,
) -> None:
"""Transform findings into universal compliance CSV rows."""
"""Transform findings into universal compliance CSV rows.
Manual requirements (no checks or empty for current provider) are
emitted only when ``include_manual=True``. When the writer is reused
across streaming batches, the caller should pass ``False`` after the
first batch so manual rows are not duplicated.
"""
# Build check -> requirements map (filtered by provider for dict checks)
check_req_map = {}
for req in framework.requirements:
@@ -228,6 +235,9 @@ class UniversalComplianceOutput:
except Exception as e:
logger.debug(f"Skipping row for {req.id}: {e}")
if not include_manual:
return
# Manual requirements (no checks or empty dict)
for req in framework.requirements:
checks = req.checks
+5
View File
@@ -517,6 +517,11 @@ class Finding(BaseModel):
check_output, "fixed_version", ""
)
else:
# Dynamic fallback: any external/custom provider
provider_data = provider.get_finding_output_data(check_output)
output_data.update(provider_data)
# check_output Unique ID
# TODO: move this to a function
# TODO: in Azure, GCP and K8s there are findings without resource_name
+7 -5
View File
@@ -1608,11 +1608,13 @@ class HTML(Output):
# Azure_provider --> azure
# Kubernetes_provider --> kubernetes
# Dynamically get the Provider quick inventory handler
provider_html_assessment_summary_function = (
f"get_{provider.type}_assessment_summary"
)
return getattr(HTML, provider_html_assessment_summary_function)(provider)
# Try static method first, fall back to provider method
method_name = f"get_{provider.type}_assessment_summary"
if hasattr(HTML, method_name):
return getattr(HTML, method_name)(provider)
else:
# Dynamic fallback: any external/custom provider
return provider.get_html_assessment_summary()
except Exception as error:
logger.error(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}] -- {error}"
+15 -1
View File
@@ -229,7 +229,9 @@ class MarkdownToADFConverter:
return node
def _paragraph_with_text(self, text: str) -> Dict:
return {"type": "paragraph", "content": [self._create_text_node(text, None)]}
# ADF forbids empty text nodes; emit an empty paragraph instead.
content = [self._create_text_node(text, None)] if text else []
return {"type": "paragraph", "content": content}
@staticmethod
def _pop_mark(marks_stack: List[Dict], mark_type: str) -> None:
@@ -1118,6 +1120,18 @@ class Jira:
tenant_info: str = "",
) -> dict:
# ADF forbids empty text nodes, so Jira rejects them with 400 INVALID_INPUT.
def _safe(value: str) -> str:
return value if (value and value.strip()) else "-"
check_id = _safe(check_id)
check_title = _safe(check_title)
status_extended = _safe(status_extended)
provider = _safe(provider)
region = _safe(region)
resource_uid = _safe(resource_uid)
resource_name = _safe(resource_name)
table_rows = [
{
"type": "tableRow",
+8
View File
@@ -227,6 +227,10 @@ class OCSF(Output):
json_output = finding.json(exclude_none=True, indent=4)
self._file_descriptor.write(json_output)
self._file_descriptor.write(",")
except OSError:
# I/O errors (e.g. ENOSPC) are not recoverable per finding:
# fail fast instead of logging once per finding.
raise
except Exception as error:
logger.error(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
@@ -239,6 +243,10 @@ class OCSF(Output):
self._file_descriptor.truncate()
self._file_descriptor.write("]")
self._file_descriptor.close()
except OSError:
# Propagate unrecoverable I/O errors (e.g. ENOSPC) so the caller can
# fail fast instead of producing a corrupt output file.
raise
except Exception as error:
logger.error(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
+36 -22
View File
@@ -7,45 +7,52 @@ from prowler.lib.outputs.common import Status
from prowler.lib.outputs.finding import Finding
def stdout_report(finding, color, verbose, status, fix):
def stdout_report(finding, color, verbose, status, fix, provider=None):
if finding.check_metadata.Provider == "aws":
details = finding.region
if finding.check_metadata.Provider == "azure":
elif finding.check_metadata.Provider == "azure":
details = finding.location
if finding.check_metadata.Provider == "gcp":
elif finding.check_metadata.Provider == "gcp":
details = finding.location.lower()
if finding.check_metadata.Provider == "kubernetes":
elif finding.check_metadata.Provider == "kubernetes":
details = finding.namespace.lower()
if finding.check_metadata.Provider == "github":
elif finding.check_metadata.Provider == "github":
details = finding.owner
if finding.check_metadata.Provider == "m365":
elif finding.check_metadata.Provider == "m365":
details = finding.location
if finding.check_metadata.Provider == "mongodbatlas":
elif finding.check_metadata.Provider == "mongodbatlas":
details = finding.location
if finding.check_metadata.Provider == "nhn":
elif finding.check_metadata.Provider == "nhn":
details = finding.location
if finding.check_metadata.Provider == "stackit":
elif finding.check_metadata.Provider == "stackit":
details = finding.location
if finding.check_metadata.Provider == "llm":
elif finding.check_metadata.Provider == "llm":
details = finding.check_metadata.CheckID
if finding.check_metadata.Provider == "iac":
elif finding.check_metadata.Provider == "iac":
details = finding.check_metadata.CheckID
if finding.check_metadata.Provider == "oraclecloud":
elif finding.check_metadata.Provider == "oraclecloud":
details = finding.region
if finding.check_metadata.Provider == "alibabacloud":
elif finding.check_metadata.Provider == "alibabacloud":
details = finding.region
if finding.check_metadata.Provider == "openstack":
elif finding.check_metadata.Provider == "openstack":
details = finding.region
if finding.check_metadata.Provider == "cloudflare":
elif finding.check_metadata.Provider == "cloudflare":
details = finding.zone_name
if finding.check_metadata.Provider == "googleworkspace":
elif finding.check_metadata.Provider == "googleworkspace":
details = finding.location
if finding.check_metadata.Provider == "vercel":
elif finding.check_metadata.Provider == "vercel":
details = finding.region
if finding.check_metadata.Provider == "okta":
elif finding.check_metadata.Provider == "okta":
details = finding.region
if finding.check_metadata.Provider == "scaleway":
elif finding.check_metadata.Provider == "scaleway":
details = finding.region
else:
# Dynamic fallback: any external/custom provider
if provider is None:
from prowler.providers.common.provider import Provider
provider = Provider.get_global_provider()
details = provider.get_stdout_detail(finding)
if (verbose or fix) and (not status or finding.status in status):
if finding.muted:
@@ -65,12 +72,15 @@ def report(check_findings, provider, output_options):
if hasattr(output_options, "verbose"):
verbose = output_options.verbose
if check_findings:
# TO-DO Generic Function
if provider.type == "aws":
check_findings.sort(key=lambda x: x.region)
if provider.type == "azure":
elif provider.type == "azure":
check_findings.sort(key=lambda x: x.subscription)
else:
# Dynamic fallback: any external/custom provider
sort_key = provider.get_finding_sort_key()
if sort_key and isinstance(sort_key, str):
check_findings.sort(key=lambda x: getattr(x, sort_key, ""))
for finding in check_findings:
# Print findings by stdout
@@ -81,12 +91,16 @@ def report(check_findings, provider, output_options):
if hasattr(output_options, "fixer"):
fixer = output_options.fixer
color = set_report_color(finding.status, finding.muted)
# Pass the local `provider` through so the dynamic else inside
# `stdout_report` does not have to consult the global singleton
# — defeating the whole purpose of the new parameter.
stdout_report(
finding,
color,
verbose,
status,
fixer,
provider=provider,
)
else: # No service resources in the whole account
+3
View File
@@ -121,6 +121,9 @@ def display_summary_table(
elif provider.type == "scaleway":
entity_type = "Organization"
audited_entities = provider.identity.organization_id
else:
# Dynamic fallback: any external/custom provider
entity_type, audited_entities = provider.get_summary_entity()
# Check if there are findings and that they are not all MANUAL
if findings and not all(finding.status == "MANUAL" for finding in findings):
+9 -4
View File
@@ -4,8 +4,8 @@ from types import SimpleNamespace
from typing import Generator
from prowler.lib.check.check import (
_resolve_check_module,
execute,
import_check,
list_services,
update_audit_metadata,
)
@@ -426,9 +426,14 @@ class Scan:
# Recover service from check name
service = get_service_name_from_check_name(check_name)
try:
# Import check module
check_module_path = f"prowler.providers.{self._provider.type}.services.{service}.{check_name}.{check_name}"
lib = import_check(check_module_path)
# Import check module (built-in or entry point) —
# delegates to `_resolve_check_module` so external
# providers registered via entry points are resolved
# correctly (their checks do not live under
# `prowler.providers.{type}.services...`).
lib = _resolve_check_module(
self._provider.type, service, check_name
)
# Recover functions from check
check_to_execute = getattr(lib, check_name)
check = check_to_execute()
@@ -0,0 +1,40 @@
{
"Provider": "aws",
"CheckID": "elbv2_alb_drop_invalid_header_fields_enabled",
"CheckTitle": "Application Load Balancer should be configured to drop invalid HTTP header fields",
"CheckType": [
"Software and Configuration Checks/AWS Security Best Practices/Network Reachability",
"Software and Configuration Checks/Industry and Regulatory Standards/AWS Foundational Security Best Practices",
"TTPs/Initial Access",
"Effects/Data Exposure"
],
"ServiceName": "elbv2",
"SubServiceName": "",
"ResourceIdTemplate": "",
"Severity": "medium",
"ResourceType": "AwsElbv2LoadBalancer",
"ResourceGroup": "network",
"Description": "Ensure that Application Load Balancers (ALB) are configured to drop invalid HTTP header fields. The check fails when `routing.http.drop_invalid_header_fields.enabled` is not set to `true`. By default, ALBs do not remove HTTP headers that do not conform to RFC 7230.",
"Risk": "Forwarding non-RFC-compliant HTTP headers to backend targets enables HTTP desync (request smuggling):\n- **Confidentiality**: session/token theft, data exfiltration\n- **Integrity**: cache poisoning, request routing bypass, unauthorized actions\n- **Availability**: backend exhaustion.\nDropping invalid header fields removes a primary smuggling vector.",
"RelatedUrl": "",
"AdditionalURLs": [
"https://docs.aws.amazon.com/elasticloadbalancing/latest/application/application-load-balancers.html#drop-invalid-header-fields",
"https://docs.aws.amazon.com/securityhub/latest/userguide/elb-controls.html#elb-4"
],
"Remediation": {
"Code": {
"CLI": "aws elbv2 modify-load-balancer-attributes --load-balancer-arn <ALB_ARN> --attributes Key=routing.http.drop_invalid_header_fields.enabled,Value=true",
"NativeIaC": "```yaml\n# CloudFormation: enable drop invalid header fields on an ALB\nResources:\n <example_resource_name>:\n Type: AWS::ElasticLoadBalancingV2::LoadBalancer\n Properties:\n Type: application\n Subnets:\n - <example_subnet_id1>\n - <example_subnet_id2>\n LoadBalancerAttributes:\n - Key: routing.http.drop_invalid_header_fields.enabled # Critical: drop non-RFC-compliant headers\n Value: true\n```",
"Other": "1. Open the Amazon EC2 console and choose Load Balancers.\n2. Select the Application Load Balancer.\n3. On the Attributes tab, choose Edit.\n4. Set 'Drop invalid header fields' to Enabled.\n5. Save changes.",
"Terraform": "```hcl\n# Terraform: enable drop invalid header fields on an ALB\nresource \"aws_lb\" \"<example_resource_name>\" {\n name = \"<example_resource_name>\"\n load_balancer_type = \"application\"\n subnets = [\"<example_subnet_id1>\", \"<example_subnet_id2>\"]\n drop_invalid_header_fields = true # Critical: drop non-RFC-compliant headers\n}\n```"
},
"Recommendation": {
"Text": "Enable 'drop invalid header fields' on Application Load Balancers so non-RFC-compliant HTTP headers are removed before requests reach backend targets, reducing exposure to HTTP desync and request smuggling. Apply defense in depth and validate requests at the application layer as well.",
"Url": "https://hub.prowler.com/check/elbv2_alb_drop_invalid_header_fields_enabled"
}
},
"Categories": [],
"DependsOn": [],
"RelatedTo": [],
"Notes": ""
}
@@ -0,0 +1,27 @@
from prowler.lib.check.models import Check, Check_Report_AWS
from prowler.providers.aws.services.elbv2.elbv2_client import elbv2_client
class elbv2_alb_drop_invalid_header_fields_enabled(Check):
def execute(self):
findings = []
for lb in elbv2_client.loadbalancersv2.values():
if lb.type == "application":
report = Check_Report_AWS(
metadata=self.metadata(),
resource=lb,
)
report.status = "PASS"
report.status_extended = (
f"ELBv2 ALB {lb.name} is configured to drop invalid "
"header fields."
)
if lb.drop_invalid_header_fields != "true":
report.status = "FAIL"
report.status_extended = (
f"ELBv2 ALB {lb.name} is not configured to drop "
"invalid header fields."
)
findings.append(report)
return findings
@@ -0,0 +1,40 @@
{
"Provider": "aws",
"CheckID": "sagemaker_models_monitor_enabled",
"CheckTitle": "Amazon SageMaker has a monitoring schedule scheduled",
"CheckType": [
"Software and Configuration Checks/AWS Security Best Practices",
"Software and Configuration Checks/Industry and Regulatory Standards/AWS Foundational Security Best Practices"
],
"ServiceName": "sagemaker",
"SubServiceName": "",
"ResourceIdTemplate": "",
"Severity": "low",
"ResourceType": "Other",
"ResourceGroup": "ai_ml",
"Description": "**SageMaker Models Monitor** detects data drift, model quality issues, and bias drift in production.",
"Risk": "Without an **active monitoring schedule**, data drift, model quality issues, and bias drift go undetected, so **model quality degrades silently** while downstream decisions such as fraud detection, access control, and pricing keep relying on a degrading model.",
"RelatedUrl": "",
"AdditionalURLs": [
"https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor.html",
"https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-scheduling.html"
],
"Remediation": {
"Code": {
"CLI": "",
"NativeIaC": "",
"Other": "",
"Terraform": ""
},
"Recommendation": {
"Text": "Enable **Amazon SageMaker Model Monitor** and keep at least one **monitoring schedule** in the `Scheduled` state so data quality, model quality, and bias drift are continuously evaluated against a baseline.",
"Url": "https://hub.prowler.com/check/sagemaker_models_monitor_enabled"
}
},
"Categories": [
"gen-ai"
],
"DependsOn": [],
"RelatedTo": [],
"Notes": ""
}
@@ -0,0 +1,22 @@
from prowler.lib.check.models import Check, Check_Report_AWS
from prowler.providers.aws.services.sagemaker.sagemaker_client import sagemaker_client
class sagemaker_models_monitor_enabled(Check):
def execute(self):
findings = []
for monitoring_schedule in sagemaker_client.sagemaker_monitoring_schedules:
report = Check_Report_AWS(
metadata=self.metadata(), resource=monitoring_schedule
)
if monitoring_schedule.is_scheduled:
report.status = "PASS"
report.status_extended = f"SageMaker monitoring schedule {monitoring_schedule.name} is enabled in region {monitoring_schedule.region}."
elif not monitoring_schedule.has_schedules:
report.status = "FAIL"
report.status_extended = f"No SageMaker monitoring schedules found in region {monitoring_schedule.region}."
else:
report.status = "FAIL"
report.status_extended = f"No active SageMaker monitoring schedule in region {monitoring_schedule.region}; existing schedules are not in Scheduled status."
findings.append(report)
return findings
@@ -18,6 +18,7 @@ class SageMaker(AWSService):
self.sagemaker_domains = []
self.endpoint_configs = {}
self.sagemaker_model_registries = []
self.sagemaker_monitoring_schedules = []
# Retrieve resources concurrently
self.__threading_call__(self._list_notebook_instances)
@@ -26,6 +27,7 @@ class SageMaker(AWSService):
self.__threading_call__(self._list_endpoint_configs)
self.__threading_call__(self._list_domains)
self.__threading_call__(self._list_model_package_groups)
self.__threading_call__(self._list_monitoring_schedules)
# Describe resources concurrently
self.__threading_call__(self._describe_model, self.sagemaker_models)
@@ -377,6 +379,46 @@ class SageMaker(AWSService):
f"{regional_client.region} -- {error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
)
def _list_monitoring_schedules(self, regional_client):
logger.info("SageMaker - listing monitoring schedules...")
name = "SageMaker Monitoring Schedules"
arn = self.get_unknown_arn(
region=regional_client.region,
resource_type="monitoring-schedule",
)
has_schedules = False
is_scheduled = False
try:
paginator = regional_client.get_paginator("list_monitoring_schedules")
for page in paginator.paginate():
for schedule in page["MonitoringScheduleSummaries"]:
if not self.audit_resources or (
is_resource_filtered(
schedule["MonitoringScheduleArn"], self.audit_resources
)
):
has_schedules = True
if schedule["MonitoringScheduleStatus"] == "Scheduled":
is_scheduled = True
name = schedule["MonitoringScheduleName"]
arn = schedule["MonitoringScheduleArn"]
break
if is_scheduled:
break
except Exception as error:
logger.error(
f"{regional_client.region} -- {error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
)
self.sagemaker_monitoring_schedules.append(
MonitoringSchedule(
name=name,
region=regional_client.region,
arn=arn,
has_schedules=has_schedules,
is_scheduled=is_scheduled,
)
)
class NotebookInstance(BaseModel):
name: str
@@ -441,3 +483,11 @@ class ModelRegistry(BaseModel):
region: str
has_groups: bool = False
has_approved_packages: bool = False
class MonitoringSchedule(BaseModel):
name: str
region: str
arn: str
has_schedules: bool = False
is_scheduled: bool = False
+35 -12
View File
@@ -16,18 +16,41 @@ def init_providers_parser(self):
# We need to call the arguments parser for each provider
providers = Provider.get_available_providers()
for provider in providers:
try:
getattr(
import_module(
f"{providers_path}.{provider}.{provider_arguments_lib_path}"
),
init_provider_arguments_function,
)(self)
except Exception as error:
logger.critical(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
)
sys.exit(1)
# Discriminate built-in vs external upfront via find_spec, so an
# ImportError from a transitive dependency missing inside a built-in
# arguments module surfaces clearly instead of being silently
# re-routed to the entry-point path (which only has external providers).
if Provider.is_builtin(provider):
try:
getattr(
import_module(
f"{providers_path}.{provider}.{provider_arguments_lib_path}"
),
init_provider_arguments_function,
)(self)
except ImportError as e:
logger.critical(
f"Failed to load arguments for built-in provider '{provider}'. "
f"Missing dependency: {e}. "
f"Ensure all required dependencies are installed."
)
logger.debug("Full traceback:", exc_info=True)
sys.exit(1)
except Exception as error:
logger.critical(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
)
sys.exit(1)
else:
# External provider — init_parser classmethod via entry point
cls = Provider._load_ep_provider(provider)
if cls and hasattr(cls, "init_parser"):
try:
cls.init_parser(self)
except Exception as error:
logger.warning(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
)
def validate_provider_arguments(arguments: Namespace) -> tuple[bool, str]:
+29
View File
@@ -0,0 +1,29 @@
"""Leaf helper for built-in provider detection.
Lives in its own module with no imports back into `prowler.lib.check` so
that callers in `prowler.lib.check.*` can ask "is this provider built-in?"
without creating an import cycle through `prowler.providers.common.provider`
(which transitively imports `prowler.config.config` and from there
`prowler.lib.check.compliance_models` / `prowler.lib.check.external_tool_providers`).
Same rationale as `prowler.lib.check.tool_wrapper`: extracting the predicate
to a leaf module is the canonical way to break the cycle in this codebase.
"""
import importlib.util
def is_builtin_provider(provider: str) -> bool:
"""Return True if the provider's own package ships with the SDK.
Wraps `importlib.util.find_spec` in `try/except (ImportError, ValueError)`
because `find_spec` propagates `ModuleNotFoundError` when a parent package
in the dotted path does not exist (instead of returning `None`). The
try/except is what makes the call safe for external providers, whose
package does not live under `prowler.providers.{provider}`.
"""
try:
spec = importlib.util.find_spec(f"prowler.providers.{provider}")
return spec is not None
except (ImportError, ValueError):
return False
+13
View File
@@ -4,6 +4,7 @@ from os.path import isdir
from pydantic.v1 import BaseModel
from prowler.config.config import output_file_timestamp
from prowler.providers.common.provider import Provider
@@ -69,3 +70,15 @@ class Connection:
is_connected: bool = False
error: Exception = None
def default_output_options(provider, arguments, bulk_checks_metadata):
"""Generic OutputOptions fallback for external providers that do not
implement get_output_options, so the run still produces output instead of
aborting. Honors arguments.output_filename and otherwise derives a name
from the provider type."""
output_options = ProviderOutputOptions(arguments, bulk_checks_metadata)
output_options.output_filename = getattr(arguments, "output_filename", None) or (
f"prowler-output-{provider.type}-{output_file_timestamp}"
)
return output_options
+345 -32
View File
@@ -1,4 +1,6 @@
import importlib
import importlib.metadata
import importlib.util
import os
import pkgutil
import sys
@@ -136,6 +138,155 @@ class Provider(ABC):
"""
return set()
# --- Dynamic provider contract methods (not @abstractmethod for incremental migration) ---
_cli_help_text: str = ""
@classmethod
def from_cli_args(cls, arguments: Namespace, fixer_config: dict) -> "Provider":
"""Instantiate the provider from CLI arguments and return the instance.
The caller wires the returned instance into the global provider slot
via Provider.set_global_provider(). Implementations that already call
set_global_provider(self) from __init__ are also supported the call
site tolerates a None return in that case.
"""
raise NotImplementedError(f"{cls.__name__} has not implemented from_cli_args()")
def get_output_options(self, arguments, _bulk_checks_metadata):
"""Create the provider-specific OutputOptions."""
raise NotImplementedError(
f"{self.__class__.__name__} has not implemented get_output_options()"
)
def get_stdout_detail(self, _finding) -> str:
"""Return the detail string for stdout reporting (region, location, etc.)."""
raise NotImplementedError(
f"{self.__class__.__name__} has not implemented get_stdout_detail()"
)
def get_finding_sort_key(self) -> Optional[str]:
"""Return the attribute name to sort findings by, or None for no sorting."""
return None
def get_summary_entity(self) -> tuple:
"""Return (entity_type, audited_entities) for the summary table."""
return (self.type, getattr(self.identity, "account_id", ""))
def get_finding_output_data(self, _check_output) -> dict:
"""Return provider-specific fields for Finding.generate_output()."""
raise NotImplementedError(
f"{self.__class__.__name__} has not implemented get_finding_output_data()"
)
def get_html_assessment_summary(self) -> str:
"""Return the HTML assessment summary card for this provider."""
raise NotImplementedError(
f"{self.__class__.__name__} has not implemented get_html_assessment_summary()"
)
def generate_compliance_output(
self,
_findings,
_bulk_compliance_frameworks,
_input_compliance_frameworks,
_output_options,
_generated_outputs,
) -> None:
"""Generate compliance CSV output for this provider's frameworks."""
raise NotImplementedError(
f"{self.__class__.__name__} has not implemented generate_compliance_output()"
)
def get_mutelist_finding_args(self) -> dict:
"""Return extra kwargs for mutelist.is_finding_muted() besides 'finding'.
External providers must return a dict with the identity key their
Mutelist subclass expects, e.g. ``{"account_id": self.identity.account_id}``.
The ``finding`` kwarg is added automatically by the caller.
"""
raise NotImplementedError(
f"{self.__class__.__name__} has not implemented get_mutelist_finding_args()"
)
@classmethod
def get_scan_arguments(
cls,
provider_uid: str,
secret: dict,
mutelist_content: Optional[dict] = None,
) -> dict:
"""Build the provider constructor kwargs from a stored uid and secret.
This is the programmatic construction interface used by callers that
persist a provider as a single ``uid`` plus a ``secret`` dict (e.g. the
API), as opposed to the CLI which passes explicit per-provider flags.
The base implementation passes the secret through and adds the mutelist;
providers whose constructor needs the uid (e.g. as a subscription id) or
that rename/filter secret keys override this.
"""
kwargs = {**secret}
if mutelist_content:
kwargs["mutelist_content"] = mutelist_content
return kwargs
@classmethod
def get_connection_arguments(cls, provider_uid: str, secret: dict) -> dict:
"""Build the ``test_connection`` kwargs from a stored uid and secret.
Companion to :meth:`get_scan_arguments` for the connection check, which
often needs a different shape than the constructor. The base passes the
secret through; providers add their identity kwarg (and ``provider_id``
where their ``test_connection`` expects it).
"""
return {**secret}
@classmethod
def get_credentials_schema(cls) -> dict:
"""Return the provider's credential schemas keyed by secret type.
Maps each secret type the provider accepts (``"static"``, ``"role"`` or
``"service_account"``) to the pydantic model that validates a secret of
that type. The provider declares which type each schema belongs to, so
the API validates a secret against the model for the secret type it is
created with and the chosen type stays bound to the shape it claims.
Each model documents each field via ``Field(description=...)`` and
whether it is required (no default) or optional. An empty dict means no
schema is declared: the secret is accepted as an object and validated by
:meth:`test_connection`.
"""
return {}
def display_compliance_table(
self,
_findings: list,
_bulk_checks_metadata: dict,
_compliance_framework: str,
_output_filename: str,
_output_directory: str,
_compliance_overview: bool,
) -> bool:
"""Render a custom compliance table in the terminal.
External providers can override this to display a detailed
compliance table (e.g., per-section breakdown). Return True
if the table was rendered, False to fall back to the generic table.
"""
raise NotImplementedError(
f"{self.__class__.__name__} has not implemented display_compliance_table()"
)
# Class-level flag: True for providers that delegate scanning to an external
# tool (e.g. Trivy, promptfoo) and bypass standard check/service loading and
# metadata validation. Subclasses override as `is_external_tool_provider = True`.
# Kept as a class attribute (not a property) so it can be read from the class
# without instantiation — the metadata validators in lib.check.models need to
# decide whether to relax validation before any provider instance exists.
is_external_tool_provider: bool = False
# --- End dynamic provider contract methods ---
@staticmethod
def get_excluded_regions_from_env() -> set:
"""Parse the PROWLER_AWS_DISALLOWED_REGIONS environment variable.
@@ -159,20 +310,57 @@ class Provider(ABC):
@staticmethod
def init_global_provider(arguments: Namespace) -> None:
try:
provider_class_path = (
f"{providers_path}.{arguments.provider}.{arguments.provider}_provider"
)
provider_class_name = f"{arguments.provider.capitalize()}Provider"
provider_class = getattr(
import_module(provider_class_path), provider_class_name
)
# Delegate class resolution to the public, side-effect-free
# resolver. init_global_provider owns the CLI-specific error
# handling: a missing transitive dep in a built-in becomes a
# logger.critical + sys.exit(1); a completely unknown provider
# re-raises so the outer try/except can sys.exit too.
try:
provider_class = Provider.get_class(arguments.provider)
except ImportError as e:
if Provider.is_builtin(arguments.provider):
# Built-in's transitive dependency is missing — loud CLI error.
logger.critical(
f"Failed to load built-in provider '{arguments.provider}'. "
f"Missing dependency: {e}. "
f"Ensure all required dependencies are installed."
)
logger.debug("Full traceback:", exc_info=True)
sys.exit(1)
# Unknown or missing external provider — propagate so the
# outer try/except can handle it (sys.exit(1) via generic
# exception handler).
raise
# Built-in wins on name collision — warn that a same-named
# plug-in is ignored. This lives here (not in get_class) so
# that `prowler --help` and API callers that resolve a class
# without initialising a global provider do not see spurious
# warnings. Match by name only — never ep.load() a shadowing
# plug-in, or its module code would run during a built-in run.
if Provider.is_builtin(arguments.provider) and any(
ep.name == arguments.provider
for ep in importlib.metadata.entry_points(group="prowler.providers")
):
logger.warning(
f"Plug-in provider '{arguments.provider}' registered "
f"via entry points is being IGNORED — a built-in with "
f"the same name exists. To use your plug-in, register "
f"it under a different name."
)
fixer_config = load_and_validate_config_file(
arguments.provider, arguments.fixer_config
)
# Dispatch by exact provider name (equality, not substring) so
# external plug-ins whose names contain a built-in substring
# (e.g. `awsx`, `azure_gov`, `iac_v2`) cannot be silently routed
# to the wrong built-in branch. Anything that doesn't match a
# built-in falls through to the dynamic else and uses the
# contract's `from_cli_args`.
if not isinstance(Provider._global, provider_class):
if "aws" in provider_class_name.lower():
if arguments.provider == "aws":
excluded_regions = (
set(arguments.excluded_region)
if getattr(arguments, "excluded_region", None)
@@ -196,7 +384,7 @@ class Provider(ABC):
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
elif "azure" in provider_class_name.lower():
elif arguments.provider == "azure":
provider_class(
az_cli_auth=arguments.az_cli_auth,
sp_env_auth=arguments.sp_env_auth,
@@ -209,7 +397,7 @@ class Provider(ABC):
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
elif "gcp" in provider_class_name.lower():
elif arguments.provider == "gcp":
provider_class(
retries_max_attempts=arguments.gcp_retries_max_attempts,
organization_id=arguments.organization_id,
@@ -223,7 +411,7 @@ class Provider(ABC):
fixer_config=fixer_config,
skip_api_check=arguments.skip_api_check,
)
elif "kubernetes" in provider_class_name.lower():
elif arguments.provider == "kubernetes":
provider_class(
kubeconfig_file=arguments.kubeconfig_file,
context=arguments.context,
@@ -233,7 +421,7 @@ class Provider(ABC):
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
elif "m365" in provider_class_name.lower():
elif arguments.provider == "m365":
provider_class(
region=arguments.region,
config_path=arguments.config_file,
@@ -247,7 +435,7 @@ class Provider(ABC):
init_modules=arguments.init_modules,
fixer_config=fixer_config,
)
elif "nhn" in provider_class_name.lower():
elif arguments.provider == "nhn":
provider_class(
username=arguments.nhn_username,
password=arguments.nhn_password,
@@ -256,7 +444,7 @@ class Provider(ABC):
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
elif "stackit" in provider_class_name.lower():
elif arguments.provider == "stackit":
provider_class(
project_id=arguments.stackit_project_id,
service_account_key_path=getattr(
@@ -275,7 +463,7 @@ class Provider(ABC):
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
elif "github" in provider_class_name.lower():
elif arguments.provider == "github":
orgs = []
repos = []
@@ -307,13 +495,13 @@ class Provider(ABC):
exclude_workflows=getattr(arguments, "exclude_workflows", []),
fixer_config=fixer_config,
)
elif "googleworkspace" in provider_class_name.lower():
elif arguments.provider == "googleworkspace":
provider_class(
config_path=arguments.config_file,
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
elif "cloudflare" in provider_class_name.lower():
elif arguments.provider == "cloudflare":
provider_class(
filter_zones=arguments.region,
filter_accounts=arguments.account_id,
@@ -321,7 +509,7 @@ class Provider(ABC):
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
elif "iac" in provider_class_name.lower():
elif arguments.provider == "iac":
provider_class(
scan_path=arguments.scan_path,
scan_repository_url=arguments.scan_repository_url,
@@ -334,13 +522,13 @@ class Provider(ABC):
oauth_app_token=arguments.oauth_app_token,
provider_uid=arguments.provider_uid,
)
elif "llm" in provider_class_name.lower():
elif arguments.provider == "llm":
provider_class(
max_concurrency=arguments.max_concurrency,
config_path=arguments.config_file,
fixer_config=fixer_config,
)
elif "image" in provider_class_name.lower():
elif arguments.provider == "image":
provider_class(
images=arguments.images,
image_list_file=arguments.image_list_file,
@@ -358,7 +546,7 @@ class Provider(ABC):
registry_insecure=arguments.registry_insecure,
registry_list_images=arguments.registry_list_images,
)
elif "mongodbatlas" in provider_class_name.lower():
elif arguments.provider == "mongodbatlas":
provider_class(
atlas_public_key=arguments.atlas_public_key,
atlas_private_key=arguments.atlas_private_key,
@@ -367,7 +555,7 @@ class Provider(ABC):
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
elif "oraclecloud" in provider_class_name.lower():
elif arguments.provider == "oraclecloud":
provider_class(
oci_config_file=arguments.oci_config_file,
profile=arguments.profile,
@@ -378,7 +566,7 @@ class Provider(ABC):
fixer_config=fixer_config,
use_instance_principal=arguments.use_instance_principal,
)
elif "openstack" in provider_class_name.lower():
elif arguments.provider == "openstack":
provider_class(
clouds_yaml_file=getattr(arguments, "clouds_yaml_file", None),
clouds_yaml_content=getattr(
@@ -403,7 +591,7 @@ class Provider(ABC):
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
elif "alibabacloud" in provider_class_name.lower():
elif arguments.provider == "alibabacloud":
provider_class(
role_arn=arguments.role_arn,
role_session_name=arguments.role_session_name,
@@ -415,14 +603,14 @@ class Provider(ABC):
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
elif "vercel" in provider_class_name.lower():
elif arguments.provider == "vercel":
provider_class(
projects=getattr(arguments, "project", None),
config_path=arguments.config_file,
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
elif "okta" in provider_class_name.lower():
elif arguments.provider == "okta":
provider_class(
okta_org_domain=getattr(arguments, "okta_org_domain", ""),
okta_client_id=getattr(arguments, "okta_client_id", ""),
@@ -435,7 +623,7 @@ class Provider(ABC):
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
elif "scaleway" in provider_class_name.lower():
elif arguments.provider == "scaleway":
# Credentials are read from the SCW_ACCESS_KEY /
# SCW_SECRET_KEY env vars by the provider itself; there
# are no credential CLI flags to avoid leaking secrets.
@@ -447,6 +635,18 @@ class Provider(ABC):
mutelist_path=arguments.mutelist_file,
fixer_config=fixer_config,
)
else:
# Dynamic fallback: any external/custom provider.
# Honor the from_cli_args type hint (-> Provider): if the
# implementation returns an instance, wire it as the global
# provider here. Implementations that call
# set_global_provider(self) from __init__ return None and
# remain supported (the condition below is a no-op for them).
provider_instance = provider_class.from_cli_args(
arguments, fixer_config
)
if provider_instance is not None:
Provider.set_global_provider(provider_instance)
except TypeError as error:
logger.critical(
@@ -459,17 +659,130 @@ class Provider(ABC):
)
sys.exit(1)
# Cache for entry-point provider classes {name: class}
_ep_providers: dict = {}
@staticmethod
def get_available_providers() -> list[str]:
"""get_available_providers returns a list of the available providers"""
providers = []
# Dynamically import the package based on its string path
providers = set()
# Built-in providers from local package
prowler_providers = importlib.import_module(providers_path)
# Iterate over all modules found in the prowler_providers package
for _, provider, ispkg in pkgutil.iter_modules(prowler_providers.__path__):
if provider != "common" and ispkg:
providers.append(provider)
return providers
providers.add(provider)
# External providers registered via entry points
for ep in importlib.metadata.entry_points(group="prowler.providers"):
providers.add(ep.name)
return sorted(providers)
@staticmethod
def is_tool_wrapper_provider(provider: str) -> bool:
"""Return True if the provider delegates scanning to an external tool.
Delegates to `prowler.lib.check.tool_wrapper.is_tool_wrapper_provider`,
the leaf module that holds the actual logic. Kept on `Provider` as a
convenience entry point for callers that already import `Provider`.
"""
from prowler.lib.check.tool_wrapper import is_tool_wrapper_provider as _impl
return _impl(provider)
@staticmethod
def is_builtin(provider: str) -> bool:
"""Return True if the provider's own package is importable as a built-in.
Delegates to `prowler.providers.common.builtin.is_builtin_provider`,
the leaf module that holds the actual check. Kept on `Provider` as a
convenience entry point for callers that already import `Provider`.
Call sites in `prowler.lib.check.*` should import from the leaf
directly to avoid the import cycle through this module.
"""
from prowler.providers.common.builtin import is_builtin_provider as _impl
return _impl(provider)
@staticmethod
def _load_ep_provider(name: str):
"""Load an external provider class from entry points, with cache.
Caches both hits and misses so repeated lookups for unknown names do
not re-iterate entry_points(). Symmetric with
tool_wrapper._ep_class_cache.
"""
if name in Provider._ep_providers:
return Provider._ep_providers[name]
for ep in importlib.metadata.entry_points(group="prowler.providers"):
if ep.name == name:
try:
cls = ep.load()
Provider._ep_providers[name] = cls
return cls
except Exception as error:
logger.warning(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
)
Provider._ep_providers[name] = None
return None
@staticmethod
def get_class(provider: str) -> type:
"""Resolve the provider class for a name (built-in or entry-point).
Does not call ``sys.exit`` and does not initialize the global
provider (it may populate the ``_ep_providers`` memoization cache).
Collision warnings are emitted by ``init_global_provider``, not here.
The caller handles errors (CLI exits; the API can return HTTP 400).
Args:
provider: Provider name, e.g. ``"aws"`` or an external plug-in.
Returns:
The provider class (a subclass of :class:`Provider`).
Raises:
ImportError: If not found as built-in or entry point, or a
built-in's transitive dependency is missing.
"""
if Provider.is_builtin(provider):
provider_class_path = f"{providers_path}.{provider}.{provider}_provider"
provider_class_name = f"{provider.capitalize()}Provider"
# Let ImportError propagate — the caller decides whether to
# sys.exit (CLI) or return HTTP 400 (API).
module = import_module(provider_class_path)
try:
return getattr(module, provider_class_name)
except AttributeError:
# Module exists but doesn't define the expected class —
# fall through to entry points.
cls = Provider._load_ep_provider(provider)
if cls is not None:
return cls
raise ImportError(
f"Provider '{provider}' not found as built-in or entry point"
)
cls = Provider._load_ep_provider(provider)
if cls is None:
raise ImportError(
f"Provider '{provider}' not found as built-in or entry point"
)
return cls
@staticmethod
def get_providers_help_text() -> dict:
"""Returns a dict of {provider_name: cli_help_text} for all available providers."""
help_text = {}
for name in Provider.get_available_providers():
try:
cls = Provider.get_class(name)
help_text[name] = getattr(cls, "_cli_help_text", "")
except Exception as error:
logger.warning(
f"{error.__class__.__name__}[{error.__traceback__.tb_lineno}]: {error}"
)
help_text[name] = ""
return help_text
@staticmethod
def update_provider_config(audit_config: dict, variable: str, value: str):
@@ -37,6 +37,7 @@ class IAM(GCPService):
display_name=account.get("displayName", ""),
project_id=project_id,
uniqueId=account.get("uniqueId", ""),
disabled=account.get("disabled", False),
)
)
@@ -102,6 +103,7 @@ class ServiceAccount(BaseModel):
keys: list[Key] = []
project_id: str
uniqueId: str
disabled: bool = False
class AccessApproval(GCPService):

Some files were not shown because too many files have changed in this diff Show More