Compare commits


144 Commits

Author SHA1 Message Date
Prowler Bot 68eb946326 chore(api): Update prowler dependency to v5.25 for release 5.25.0 (#10906)
Co-authored-by: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
2026-04-28 11:00:51 +02:00
Hugo Pereira Brito e252058af4 fix(m365): exclude guest users from entra_users_mfa_capable (#10785) 2026-04-28 08:58:16 +01:00
Pepe Fagoaga 37e6c9761f chore: changelog for v5.25.0 (#10900) 2026-04-28 08:47:20 +02:00
Pepe Fagoaga ebe666bec7 chore(boto3): configure user agent extra via env (#10904) 2026-04-28 08:01:11 +02:00
Pepe Fagoaga 7df2703db1 fix(aws): get organization's metadata with assumed role (#10894) 2026-04-27 22:15:11 +01:00
Kay Agahd 67234210ba feat(aws): add check secretsmanager_has_restrictive_resource_policy (#6985) 2026-04-27 21:49:34 +01:00
Josema Camacho 15ca69942d fix(api): align get_compliance_frameworks with Compliance.get_bulk (#10903) 2026-04-27 18:10:08 +02:00
Adrián Peña df76efc197 fix(api): skip null service/region in scan summary aggregation (#10902) 2026-04-27 17:46:46 +02:00
Hugo Pereira Brito 3441ad7f70 fix(sdk): align googleworkspace finding resources (#10901) 2026-04-27 15:17:29 +01:00
Hugo Pereira Brito 059b71d34b feat(ui): add View Resource action to findings drawer (#10847) 2026-04-27 13:19:18 +01:00
lydiavilchez 013809919c feat(googleworkspace): add Gmail service with first batch of checks (#10683) 2026-04-27 13:49:07 +02:00
Daniel Barranquero 368d9c1519 fix(admincenter): restrict admincenter group visibility check to Unified groups (#10899) 2026-04-27 13:23:03 +02:00
Adrián Peña fb6da427f8 fix(api): prevent /tmp saturation from compliance report generation (#10874) 2026-04-27 11:05:34 +02:00
Adrián Peña 65fd3335d3 fix(api): reaggregate resource inventory and attack surface after muting findings (#10843) 2026-04-27 11:03:28 +02:00
César Arroba d6288be472 chore(ci): align sdk-bump-version PR titles with other bump workflows (#10897) 2026-04-27 10:20:56 +02:00
César Arroba 0cddb71d1c fix(ci): simplify docs-bump-version to a single master-only PR (#10896) 2026-04-27 10:20:47 +02:00
Andoni Alonso af2930130c fix(check): break circular import between config and check.utils (#10895) 2026-04-27 10:11:50 +02:00
Andoni Alonso b668770480 feat(github): add zizmor GitHub Actions scanning as a service of the GitHub provider (#10607) 2026-04-27 08:55:07 +02:00
Andoni Alonso f31c5717e9 chore(devex): add worktrunk worktree bootstrap config (#10867) 2026-04-27 08:45:04 +02:00
Alejandro Bailo 4788dcade2 fix(ui): polish shared table pagination and provider spacing (#10891) 2026-04-24 15:40:40 +02:00
Alejandro Bailo 22a6cc9e73 fix(ui): align resources filters and resource drawer behavior (#10861) 2026-04-24 15:03:47 +02:00
Pablo Fernandez Guerra (PFE) 06bb382f8e chore(ui): add knip for dead code detection (#10654)
Co-authored-by: Pablo F.G <pablo.fernandez@prowler.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-24 14:45:59 +02:00
Pedro Martín d4ece2b43e feat(sdk): add multi-provider compliance framework JSONs (#10300)
Co-authored-by: Alan Buscaglia <gentlemanprogramming@gmail.com>
2026-04-24 13:27:31 +02:00
César Arroba b97d68fbd5 fix(ci): also gate cache-dependency-path on enable-cache in setup-python-poetry (#10885) 2026-04-24 12:38:13 +02:00
César Arroba ca79300440 fix(ci): poetry cache post-step failure on release workflows (#10881) 2026-04-24 11:57:30 +02:00
Pepe Fagoaga 7a0e107617 chore(api): changelog for v5.24.4 (#10882) 2026-04-24 11:57:02 +02:00
César Arroba 6d3fcec5da ci: bump docs version against master on patch releases (#10879) 2026-04-24 11:49:14 +02:00
César Arroba ce1cf51d37 fix(ci): allow github.com egress in backport workflow (#10876) 2026-04-24 10:00:55 +02:00
Pablo Fernandez Guerra (PFE) 3554859a5c fix(ui): load every Attack Paths scan before displaying the selector (#10864)
Co-authored-by: Pablo F.G <pablo.fernandez@prowler.com>
Co-authored-by: Alejandro Bailo <59607668+alejandrobailo@users.noreply.github.com>
Co-authored-by: Pepe Fagoaga <pepe@prowler.com>
2026-04-24 09:41:47 +02:00
Daniel Barranquero 80d62f355f fix(alibabacloud): fix CS service SDK compatibility and harden Alibaba provider (#10871) 2026-04-24 09:26:09 +02:00
Josema Camacho 0df24eeff6 fix(api): make Neo4j connection acquisition timeout configurable and enable Sentry tracing (#10873) 2026-04-23 17:52:14 +02:00
Alejandro Bailo d1fc482832 feat(ui): improve Mutelist UX and mute modal (#10846) 2026-04-23 17:36:32 +02:00
Andoni Alonso ffb1bb89e1 feat(ci): add official Prowler GitHub Action (#10872) 2026-04-23 16:15:13 +02:00
Alejandro Bailo d877bea0e3 chore(ui): unify filter search and batch patterns (#10859) 2026-04-23 16:03:33 +02:00
Pedro Martín 2304bf0093 feat(compliance): add CIS pdf reporting (#10650) 2026-04-23 13:28:30 +02:00
Pepe Fagoaga 2ca74102a9 chore(poetry): lock poetry with 2.3.4 and install git as required (#10868) 2026-04-23 12:30:14 +02:00
Pablo Fernandez Guerra (PFE) 6ae129fcc0 chore: remove unused submodule (#10869)
Co-authored-by: Pablo F.G <pablo.fernandez@prowler.com>
2026-04-23 12:13:35 +02:00
Alejandro Bailo e9731f53ad chore(ui): reorganize changelog and open 1.24.4 section (#10866) 2026-04-23 11:22:32 +02:00
Pablo Fernandez Guerra (PFE) db2f92e6d5 chore: add prowler-openspec-opensource as git submodule (#10680)
Co-authored-by: Pablo F.G <pablo.fernandez@prowler.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-23 10:52:17 +02:00
Alejandro Bailo f4b0f8fa22 fix(ui): prevent rescheduling scans during credential update (#10851) 2026-04-23 09:45:16 +02:00
Pedro Martín dff5541e11 fix(ci): improve compliance check action (#10850) 2026-04-22 16:31:05 +02:00
Mathisdjango 927be17fb7 feat(github): add check for dismissing stale PR approvals on default branch (CIS 1.1.4) (#10569)
Co-authored-by: Daniel Barranquero <danielbo2001@gmail.com>
2026-04-22 16:14:10 +02:00
Pepe Fagoaga c27cb28a2a chore(safety): define policy for high and critical (#10845) 2026-04-22 13:28:59 +02:00
Pepe Fagoaga 94ee24071a refactor: unify filtering and sorting for finding (#10803)
Co-authored-by: alejandrobailo <alejandrobailo94@gmail.com>
2026-04-22 13:11:50 +02:00
Josema Camacho 1093f6c99b fix(api): merge Attack Paths findings on short UIDs for AWS resources (#10839) 2026-04-22 12:19:03 +02:00
Hugo Pereira Brito 48060c47ba fix(ui): improve Resource Inventory cards light mode (#10757)
Co-authored-by: alejandrobailo <alejandrobailo94@gmail.com>
2026-04-22 12:05:09 +02:00
Pedro Martín 72acc2119d fix(aws): disallow me-south-1 & me-central-1 to avoid stuck scans (#10837) 2026-04-22 11:16:41 +02:00
Rubén De la Torre Vico b1ebea4a7e chore(pre-commit): scope hooks per monorepo component (#10815) 2026-04-22 11:04:31 +02:00
dependabot[bot] 001057644e chore(deps): bump pyasn1 from 0.6.2 to 0.6.3 (#10365)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: pedrooot <pedromarting3@gmail.com>
Co-authored-by: Hugo P.Brito <hugopbrit@gmail.com>
2026-04-22 10:53:39 +02:00
Adrián Peña 1456def7d4 fix(api): reaggregate overview summaries after muting findings (#10827) 2026-04-22 10:44:21 +02:00
dependabot[bot] 12d475e7af chore(deps-dev): bump pygments from 2.19.2 to 2.20.0 (#10521)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Pedro Martín <pedromarting3@gmail.com>
2026-04-22 10:09:06 +02:00
Andoni Alonso 43bd1083e0 feat(sdk): add SARIF output format for IaC provider (#10626)
Co-authored-by: Pepe Fagoaga <pepe@prowler.com>
2026-04-22 09:32:20 +02:00
dependabot[bot] bbd4ce7565 chore(deps): bump pygments from 2.19.2 to 2.20.0 in /mcp_server (#10523)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-22 09:31:04 +02:00
Davidm4r 97a085bf21 feat(ui): Add user expulsion from tenants with JWT authentication fix (#10787)
Co-authored-by: Pablo F.G <pablo.fernandez@prowler.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Adrián Peña <adrianjpr@gmail.com>
2026-04-22 09:28:39 +02:00
Pablo Fernandez Guerra (PFE) 29a2f8fac8 chore: remove legacy ui-checks hook from root pre-commit config (#10834)
Co-authored-by: Pablo F.G <pablo.fernandez@prowler.com>
2026-04-22 09:18:39 +02:00
Pedro Martín a24869fc26 feat(sdk): add universal compliance output modules (CSV, OCSF, table) (#10299) 2026-04-22 09:01:45 +02:00
dependabot[bot] 72c94db1cf chore(deps): bump pygments from 2.19.2 to 2.20.0 in /api (#10522)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-22 08:59:21 +02:00
Pepe Fagoaga 4ef7bbdb7c docs: how to configure AWS SDK Default for IAM Role authentication (#10807) 2026-04-21 18:46:18 +02:00
Pepe Fagoaga f2c5d2ec87 fix(aws): fallback lookup events to resource name (#10828) 2026-04-21 18:31:50 +02:00
Adrián Peña 61a62fd6e0 fix(api): treat muted findings as resolved in finding-groups status (#10825) 2026-04-21 17:31:44 +02:00
Raajhesh Kannaa Chidambaram 39911e3ab7 feat(github): add --repo-list-file flag for GitHub scanning (#10501)
Co-authored-by: Raajhesh Kannaa Chidambaram <495042+raajheshkannaa@users.noreply.github.com>
Co-authored-by: Andoni A. <14891798+andoniaf@users.noreply.github.com>
2026-04-21 15:31:34 +02:00
Alejandro Bailo bcce8d6236 fix(ui): centralize default muted findings filter on finding groups (#10818)
Co-authored-by: Pepe Fagoaga <pepe@prowler.com>
2026-04-21 14:26:51 +02:00
Pablo Fernandez Guerra (PFE) 570c86948e chore: prek workspace for UI + builtin hooks + parallel execution (#10651)
Co-authored-by: Rubén De la Torre Vico <ruben@prowler.com>
Co-authored-by: Pablo F.G <pablo.fernandez@prowler.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-21 13:26:07 +02:00
Adrián Peña 548389d79f perf(api): speed up finding-groups /resources endpoint (#10816) 2026-04-21 12:53:59 +02:00
Alejandro Bailo fc3066bc60 refactor(ui): redesign compliance page layout and components (#10767) 2026-04-21 12:48:57 +02:00
Pedro Martín ac6dd03fb8 feat(sdk): add universal compliance schema models and loaders (#10298) 2026-04-21 11:39:04 +02:00
Javier Grau d3a1df3473 chore(skills): centralize AI assistant config via symlinks (#9951)
Co-authored-by: Alan Buscaglia <gentlemanprogramming@gmail.com>
Co-authored-by: Pepe Fagoaga <pepe@prowler.com>
2026-04-21 09:29:42 +02:00
César Arroba 858dfc2a00 fix(ci): remove broken resolved_reference step from setup-python-poetry (#10687) 2026-04-21 08:58:24 +02:00
Pepe Fagoaga 6b0ba79652 fix(changelog): relocate entries for the SDK (#10812) 2026-04-21 08:17:14 +02:00
Pablo Fernandez Guerra (PFE) 390bbdd1a6 refactor(ui): remove backward-compat redirect for legacy invitation links (#10797)
Co-authored-by: Pablo F.G <pablo.fernandez@prowler.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 08:11:51 +02:00
Pepe Fagoaga 8d48c26c1e chore(secrets): don't block for trufflehog (#10806) 2026-04-20 17:57:32 +02:00
Boon 98b9449e14 feat: add nginx reverse proxy configuration (#8516) (#10780)
Co-authored-by: Boon <boon@security8.work>
2026-04-20 17:30:21 +02:00
Pedro Martín 3406c5ec64 chore(skills): improve prowler-compliance (#10627) 2026-04-20 17:22:05 +02:00
Adrián Peña 4346401a0a fix(api): align latest_resources scan selection with completed_at (#10802) 2026-04-20 17:16:01 +02:00
dependabot[bot] dcec79d259 chore(deps): bump pyasn1 from 0.6.2 to 0.6.3 in /api (#10366) 2026-04-20 16:43:19 +02:00
Pepe Fagoaga 2a9c538aff chore: review changelog for v5.24.1 (#10791) 2026-04-20 14:01:29 +02:00
Pepe Fagoaga bf1b53bbd2 fix(ui): sorting and filtering for findings (#10778)
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: alejandrobailo <alejandrobailo94@gmail.com>
2026-04-20 13:34:31 +02:00
César Arroba 94a2ea1e8f chore: update CODEOWNERS for new team hierarchy (#10706) 2026-04-20 11:39:00 +02:00
Daniel Barranquero f7194b32de docs: remove prowler ctf page (#10782) 2026-04-20 09:37:30 +02:00
Pedro Martín 6ffe4e95bf fix(api): detect silent failures in ResourceFindingMapping (#10724)
Co-authored-by: Pepe Fagoaga <pepe@prowler.com>
2026-04-20 09:00:43 +02:00
Alan Buscaglia 577aa14acc fix(ui): correct IaC findings counters (#10736)
Co-authored-by: alejandrobailo <alejandrobailo94@gmail.com>
2026-04-17 12:48:57 +02:00
Andoni Alonso 19c752c127 fix(cloudflare): guard validate_credentials against paginator infinite loops (#10771) 2026-04-17 11:23:31 +02:00
Alejandro Bailo f2d35f5885 fix(ui): exclude muted findings and polish filter selectors (#10734) 2026-04-17 11:07:22 +02:00
Josema Camacho 536e90f2a5 perf(attack-paths): cleanup task prioritization, restore default batch sizes to 1000, upgrade Cartography to 0.135.0 (#10729) 2026-04-17 10:22:30 +02:00
Daniel Barranquero 276a5d66bd feat(docs): add ctf documentation (#10761) 2026-04-16 19:35:52 +02:00
Alejandro Bailo 489c6c1073 fix: CHANGELOG minor issue (#10758) 2026-04-16 17:07:22 +02:00
Adrián Peña b08b072288 fix(api): exclude muted findings from pass_count, fail_count and manual_count (#10753) 2026-04-16 15:56:08 +02:00
Josema Camacho ca29e354b6 chore(deps): bump msgraph-sdk to 1.55.0 and azure-mgmt-resource to 24.0.0, remove marshmallow (#10733) 2026-04-16 15:34:28 +02:00
Alejandro Bailo 85a3927950 fix(ui): upgrade React 19.2.5 and Next.js 16.2.3 to mitigate CVE-2026-23869 (#10752) 2026-04-16 15:24:10 +02:00
Rubén De la Torre Vico 04fe3f65e0 chore(deps): enable Dependabot pre-commit ecosystem and bump hooks (#10732) 2026-04-16 13:38:11 +02:00
Andoni Alonso 297c9d0734 fix(sdk): move #10726 changelog entry to unreleased version (#10728) 2026-04-16 13:10:00 +02:00
Erich Blume a2a1a73749 fix(image): --registry-list crashes with AttributeError on global_provider (#10691)
Co-authored-by: Andoni A. <14891798+andoniaf@users.noreply.github.com>
2026-04-16 13:02:25 +02:00
lydiavilchez 08fbe17e29 fix(googleworkspace): treat secure Google defaults as PASS for Drive checks (#10727) 2026-04-16 13:01:55 +02:00
lydiavilchez d920f78059 fix(googleworkspace): treat secure Google defaults as PASS for Calendar checks (#10726) 2026-04-16 12:51:40 +02:00
Pepe Fagoaga 12bf3d5e70 fix(db): add missing tenant_id filter in queries (#10722) 2026-04-16 11:55:38 +02:00
Adrián Peña 4002c28b5d fix(api): add fallback handling for missing resources in findings (#10708) 2026-04-16 11:45:06 +02:00
Andoni Alonso 2439f54280 fix(sdk): allow account-scoped tokens in Cloudflare connection test (#10723) 2026-04-16 11:38:15 +02:00
Prowler Bot b0e59156e6 chore(ui): Bump version to v5.25.0 (#10711)
Co-authored-by: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
2026-04-15 20:14:46 +02:00
Prowler Bot f013bd4a53 docs: Update version to v5.24.0 (#10714)
Co-authored-by: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
2026-04-15 20:14:17 +02:00
Prowler Bot 6ad15f900f chore(release): Bump version to v5.25.0 (#10710)
Co-authored-by: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
2026-04-15 20:14:06 +02:00
Prowler Bot 1784bf38ab chore(api): Bump version to v1.26.0 (#10715)
Co-authored-by: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
2026-04-15 20:13:33 +02:00
Pepe Fagoaga ba5b23245f chore: review changelog for v5.24 (#10707) 2026-04-15 18:05:55 +02:00
Daniel Barranquero 43913b1592 feat(aws): support excluding regions from scans via CLI, env var, and config (#10688) 2026-04-15 17:59:46 +02:00
Alan Buscaglia 9e31160887 fix(ui): improve attack paths scan table UX and fix info banner variant (#10704)
Co-authored-by: Pepe Fagoaga <pepe@prowler.com>
Co-authored-by: alejandrobailo <alejandrobailo94@gmail.com>
2026-04-15 17:33:29 +02:00
Pepe Fagoaga 9a0c73256e chore: delete .opencode (#10702) 2026-04-15 15:10:40 +02:00
Alejandro Bailo 2a160a10df refactor(ui): remove legacy side drawers and clean code (#10692) 2026-04-15 13:55:57 +02:00
Alan Buscaglia 8d8bee165b feat(ui): improve attack paths scan selection UX (#10685) 2026-04-15 13:54:25 +02:00
Alan Buscaglia 606efec9f8 fix(ui): keep update credentials wizard open (#10675) 2026-04-15 13:50:20 +02:00
Alan Buscaglia d5354e8b1d feat(ui): add syntax highlighting to finding groups remediation code (#10698) 2026-04-15 12:58:35 +02:00
Rubén De la Torre Vico a96e5890dc docs: replace Excalidraw diagrams with Mermaid and fix architecture connections (#10697) 2026-04-15 12:51:29 +02:00
Pepe Fagoaga bb81c5dd2d docs: add contextual menu for copy and issue/feat (#10699) 2026-04-15 12:50:29 +02:00
Daniel Barranquero c3acb818d9 fix(vercel): handle team-scoped firewall config responses (#10695) 2026-04-15 11:59:20 +02:00
Andoni Alonso e6fc59267b docs: add Finding Groups documentation page (#10696)
Co-authored-by: Pepe Fagoaga <pepe@prowler.com>
2026-04-15 11:58:39 +02:00
Josema Camacho 62f114f5d0 refactor(api): remove dead cleanup_findings no-op from attack-paths module (#10684) 2026-04-15 09:16:38 +02:00
Pepe Fagoaga 392ffd5a60 fix(beat): make it dependent on the API service (#10603)
Co-authored-by: Josema Camacho <josema@prowler.com>
2026-04-14 18:35:26 +02:00
Alejandro Bailo 507b0882d5 fix(ui): fix findings group resource filters and mute modal migration (#10662) 2026-04-14 18:01:45 +02:00
Alejandro Bailo 89d72cf8fd feat(ui): new resources side drawer with redesigned detail panel (#10673) 2026-04-14 17:20:19 +02:00
Rubén De la Torre Vico f3a042933f chore(deps): replace pre-commit and husky with prek (#10601) 2026-04-14 16:34:54 +02:00
stepsecurity-app[bot] 96e7d6cb3a feat(security): security best practices from StepSecurity (#10682)
Signed-off-by: StepSecurity Bot <bot@stepsecurity.io>
Co-authored-by: stepsecurity-app[bot] <188008098+stepsecurity-app[bot]@users.noreply.github.com>
Co-authored-by: Pepe Fagoaga <pepe@prowler.com>
2026-04-14 15:13:12 +02:00
Hugo Pereira Brito a82eaa885d refactor(m365): normalize CA platforms at model level (#10635)
Co-authored-by: Hugo P.Brito <hugopbrito@Mac.home>
2026-04-14 15:00:23 +02:00
Hugo Pereira Brito 90a619a8b4 feat(m365): add entra_conditional_access_policy_block_unknown_device_platforms security check (#10615)
Co-authored-by: Hugo P.Brito <hugopbrito@Mac.home>
2026-04-14 14:32:37 +02:00
Hugo Pereira Brito 638bf62d76 feat(entra): directory sync account exclusion (#10620)
Co-authored-by: Hugo P.Brito <hugopbrito@Mac.home>
2026-04-14 14:16:32 +02:00
Pablo Fernandez Guerra (PFE) 962615ca1f chore(ui): bump serialize-javascript override from 7.0.3 to 7.0.5 (#10653)
Co-authored-by: Pablo F.G <pablo.fernandez@prowler.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:11:59 +02:00
Hugo Pereira Brito 5610f5ad90 feat(m365): add entra_conditional_access_policy_corporate_device_sign_in_frequency_enforced security check (#10618)
Co-authored-by: Hugo P.Brito <hugopbrito@Mac.home>
2026-04-14 14:10:00 +02:00
Pepe Fagoaga be6fe1db04 chore(security): bump pytest to 9.0.3 (#10678) 2026-04-14 13:59:30 +02:00
Hugo Pereira Brito 92b838866a feat(m365): add entra_conditional_access_policy_mfa_enforced_for_guest_users security check (#10616)
Co-authored-by: Hugo P.Brito <hugopbrito@Mac.home>
2026-04-14 13:45:12 +02:00
Josema Camacho 51591cb8cd build: bump poetry to 2.3.4 and consolidate SDK workflows (#10681) 2026-04-14 13:32:46 +02:00
Hugo Pereira Brito e24e1ab771 feat(m365): add exchange_organization_delicensing_resiliency_enabled security check (#10608) 2026-04-14 13:30:45 +02:00
Hugo Pereira Brito bc3fd79457 feat(intune): add device compliance policy marks noncompliant check (#10599) 2026-04-14 13:01:47 +02:00
Hugo Pereira Brito 4941ed5797 feat(entra): add new check entra_conditional_access_policy_all_apps_all_users (#10619)
Co-authored-by: Hugo P.Brito <hugopbrito@Mac.home>
2026-04-14 12:47:57 +02:00
Daniel Barranquero 0f4d8ff891 feat(aws): add bedrock_vpc_endpoints_configured security check (#10591) 2026-04-14 12:22:22 +02:00
Daniel Barranquero d1ab8b8ae5 feat(aws): add iam_policy_no_wildcard_marketplace_subscribe and iam_inline_policy_no_wildcard_marketplace_subscribe checks (#10525)
Co-authored-by: Hugo P.Brito <hugopbrit@gmail.com>
2026-04-14 12:08:40 +02:00
Daniel Barranquero 65e9593b41 feat(aws): add bedrock_access_not_stale security check (#10536) 2026-04-14 11:20:40 +02:00
Daniel Barranquero 131112398b feat(aws): add bedrock_full_access_policy_attached security check (#10577) 2026-04-14 11:00:40 +02:00
Pedro Martín c952ea018e fix(ui): reflect actual provider in compliance detail header (#10674)
Co-authored-by: Alan Buscaglia <gentlemanprogramming@gmail.com>
2026-04-14 10:22:42 +02:00
Pedro Martín 31b645ee53 chore(github): allow GitHub release CDN in trivy scan allowlist (#10679) 2026-04-14 10:09:54 +02:00
harshadkhetpal 0123e603d8 fix: replace bare except with except Exception in prowler-wrapper (#10499) 2026-04-14 08:11:53 +02:00
Prowler Bot b65265da4b feat(aws): Update regions for AWS services (#10659)
Co-authored-by: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
2026-04-14 08:03:14 +02:00
Prowler Bot 1335332fe9 chore(api): Bump version to v1.25.0 (#10668)
Co-authored-by: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
2026-04-13 22:18:59 +02:00
Prowler Bot f37a2a1efe chore(release): Bump version to v5.24.0 (#10666)
Co-authored-by: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
2026-04-13 22:18:54 +02:00
Prowler Bot 3e0e1398c4 docs: Update version to v5.23.0 (#10667)
Co-authored-by: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
2026-04-13 22:18:13 +02:00
Prowler Bot a4ad9ba01f chore(ui): Bump version to v5.24.0 (#10665)
Co-authored-by: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
2026-04-13 22:17:44 +02:00
Adrián Peña c6d5f44c5e chore: update pyjwt (#10661) 2026-04-13 14:09:37 +02:00
Adrián Peña 5d24a41625 feat(api): add sort support for all finding group counter fields (#10655) 2026-04-13 13:34:35 +02:00
676 changed files with 59972 additions and 10432 deletions
+23
@@ -0,0 +1,23 @@
# Prowler worktree automation for worktrunk (wt CLI).
# Runs automatically on `wt switch --create`.
# Block 1: setup + copy gitignored env files (.envrc, ui/.env.local)
# from the primary worktree — patterns selected via .worktreeinclude.
[[pre-start]]
skills = "./skills/setup.sh --claude"
python = "poetry env use python3.12"
envs = "wt step copy-ignored"
# Block 2: install Python deps (requires `poetry env use` from block 1).
[[pre-start]]
deps = "poetry install --with dev"
# Block 3: reminder — last visible output before `wt switch` returns.
# Hooks can't mutate the parent shell, so venv activation is manual.
[[pre-start]]
reminder = "echo '>> Reminder: activate the venv in this shell with: eval $(poetry env activate)'"
# Background: pnpm install runs while you start working.
# Tail logs via `wt config state logs`.
[post-start]
ui = "cd ui && pnpm install"
+1 -1
@@ -145,7 +145,7 @@ SENTRY_RELEASE=local
NEXT_PUBLIC_SENTRY_ENVIRONMENT=${SENTRY_ENVIRONMENT}
#### Prowler release version ####
NEXT_PUBLIC_PROWLER_RELEASE_VERSION=v5.23.1
NEXT_PUBLIC_PROWLER_RELEASE_VERSION=v5.25.0
# Social login credentials
SOCIAL_GOOGLE_OAUTH_CALLBACK_URL="${AUTH_URL}/api/auth/callback/google"
+12 -11
@@ -1,14 +1,15 @@
# SDK
/* @prowler-cloud/sdk
/prowler/ @prowler-cloud/sdk @prowler-cloud/detection-and-remediation
/tests/ @prowler-cloud/sdk @prowler-cloud/detection-and-remediation
/dashboard/ @prowler-cloud/sdk
/docs/ @prowler-cloud/sdk
/examples/ @prowler-cloud/sdk
/util/ @prowler-cloud/sdk
/contrib/ @prowler-cloud/sdk
/permissions/ @prowler-cloud/sdk
/codecov.yml @prowler-cloud/sdk @prowler-cloud/api
/* @prowler-cloud/detection-remediation
/prowler/ @prowler-cloud/detection-remediation
/prowler/compliance/ @prowler-cloud/compliance
/tests/ @prowler-cloud/detection-remediation
/dashboard/ @prowler-cloud/detection-remediation
/docs/ @prowler-cloud/detection-remediation
/examples/ @prowler-cloud/detection-remediation
/util/ @prowler-cloud/detection-remediation
/contrib/ @prowler-cloud/detection-remediation
/permissions/ @prowler-cloud/detection-remediation
/codecov.yml @prowler-cloud/detection-remediation @prowler-cloud/api
# API
/api/ @prowler-cloud/api
@@ -17,7 +18,7 @@
/ui/ @prowler-cloud/ui
# AI
/mcp_server/ @prowler-cloud/ai
/mcp_server/ @prowler-cloud/detection-remediation
# Platform
/.github/ @prowler-cloud/platform
+14 -17
@@ -13,11 +13,19 @@ inputs:
poetry-version:
description: 'Poetry version to install'
required: false
default: '2.1.1'
default: '2.3.4'
install-dependencies:
description: 'Install Python dependencies with Poetry'
required: false
default: 'true'
update-lock:
description: 'Run `poetry lock` during setup. Only enable when a prior step mutates pyproject.toml (e.g. API `@master` VCS rewrite). Default: false.'
required: false
default: 'false'
enable-cache:
description: 'Whether to enable Poetry dependency caching via actions/setup-python'
required: false
default: 'true'
runs:
using: 'composite'
@@ -60,21 +68,8 @@ runs:
echo "Updated resolved_reference:"
grep -A2 -B2 "resolved_reference" poetry.lock
- name: Update SDK resolved_reference to latest commit (prowler repo on push)
if: github.event_name == 'push' && github.ref == 'refs/heads/master' && github.repository == 'prowler-cloud/prowler'
shell: bash
working-directory: ${{ inputs.working-directory }}
run: |
LATEST_COMMIT=$(curl -s "https://api.github.com/repos/prowler-cloud/prowler/commits/master" | jq -r '.sha')
echo "Latest commit hash: $LATEST_COMMIT"
sed -i '/url = "https:\/\/github\.com\/prowler-cloud\/prowler\.git"/,/resolved_reference = / {
s/resolved_reference = "[a-f0-9]\{40\}"/resolved_reference = "'"$LATEST_COMMIT"'"/
}' poetry.lock
echo "Updated resolved_reference:"
grep -A2 -B2 "resolved_reference" poetry.lock
- name: Update poetry.lock (prowler repo only)
if: github.repository == 'prowler-cloud/prowler'
if: github.repository == 'prowler-cloud/prowler' && inputs.update-lock == 'true'
shell: bash
working-directory: ${{ inputs.working-directory }}
run: poetry lock
@@ -83,8 +78,10 @@ runs:
uses: actions/setup-python@e797f83bcb11b83ae66e0230d6156d7c80228e7c # v6.0.0
with:
python-version: ${{ inputs.python-version }}
cache: 'poetry'
cache-dependency-path: ${{ inputs.working-directory }}/poetry.lock
# Disable cache when callers skip dependency install: Poetry 2.3.4 creates
# the venv in a path setup-python can't hash, breaking the post-step save-cache.
cache: ${{ inputs.enable-cache == 'true' && 'poetry' || '' }}
cache-dependency-path: ${{ inputs.enable-cache == 'true' && format('{0}/poetry.lock', inputs.working-directory) || '' }}
- name: Install Python dependencies
if: inputs.install-dependencies == 'true'
+12
@@ -66,6 +66,18 @@ updates:
cooldown:
default-days: 7
- package-ecosystem: "pre-commit"
directory: "/"
schedule:
interval: "monthly"
open-pull-requests-limit: 25
target-branch: master
labels:
- "dependencies"
- "pre-commit"
cooldown:
default-days: 7
# Dependabot Updates are temporary disabled - 2025/04/15
# v4.6
# - package-ecosystem: "pip"
+2
@@ -13,6 +13,8 @@ env:
PROWLER_VERSION: ${{ github.event.release.tag_name }}
BASE_BRANCH: master
permissions: {}
jobs:
detect-release-type:
runs-on: ubuntu-latest
+3
@@ -17,6 +17,8 @@ concurrency:
env:
API_WORKING_DIR: ./api
permissions: {}
jobs:
api-code-quality:
runs-on: ubuntu-latest
@@ -67,6 +69,7 @@ jobs:
with:
python-version: ${{ matrix.python-version }}
working-directory: ./api
update-lock: 'true'
- name: Poetry check
if: steps.check-changes.outputs.any_changed == 'true'
+2
@@ -24,6 +24,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
api-analyze:
name: CodeQL Security Analysis
@@ -33,6 +33,8 @@ env:
PROWLERCLOUD_DOCKERHUB_REPOSITORY: prowlercloud
PROWLERCLOUD_DOCKERHUB_IMAGE: prowler-api
permissions: {}
jobs:
setup:
if: github.repository == 'prowler-cloud/prowler'
@@ -18,6 +18,8 @@ env:
API_WORKING_DIR: ./api
IMAGE_NAME: prowler-api
permissions: {}
jobs:
api-dockerfile-lint:
if: github.repository == 'prowler-cloud/prowler'
+6 -4
@@ -17,6 +17,8 @@ concurrency:
env:
API_WORKING_DIR: ./api
permissions: {}
jobs:
api-security-scans:
runs-on: ubuntu-latest
@@ -58,6 +60,7 @@ jobs:
files: |
api/**
.github/workflows/api-security.yml
.safety-policy.yml
files_ignore: |
api/docs/**
api/README.md
@@ -70,6 +73,7 @@ jobs:
with:
python-version: ${{ matrix.python-version }}
working-directory: ./api
update-lock: 'true'
- name: Bandit
if: steps.check-changes.outputs.any_changed == 'true'
@@ -77,10 +81,8 @@ jobs:
- name: Safety
if: steps.check-changes.outputs.any_changed == 'true'
run: poetry run safety check --ignore 79023,79027,86217,71600
# TODO: 79023 & 79027 knack ReDoS until `azure-cli-core` (via `cartography`) allows `knack` >=0.13.0
# TODO: 86217 because `alibabacloud-tea-openapi == 0.4.3` don't let us upgrade `cryptography >= 46.0.0`
# TODO: 71600 CVE-2024-1135 false positive - fixed in gunicorn 22.0.0, project uses 23.0.0
# Accepted CVEs, severity threshold, and ignore expirations live in ../.safety-policy.yml
run: poetry run safety check --policy-file ../.safety-policy.yml
- name: Vulture
if: steps.check-changes.outputs.any_changed == 'true'
+3
@@ -30,6 +30,8 @@ env:
VALKEY_DB: 0
API_WORKING_DIR: ./api
permissions: {}
jobs:
api-tests:
runs-on: ubuntu-latest
@@ -116,6 +118,7 @@ jobs:
with:
python-version: ${{ matrix.python-version }}
working-directory: ./api
update-lock: 'true'
- name: Run tests with pytest
if: steps.check-changes.outputs.any_changed == 'true'
+3
@@ -17,6 +17,8 @@ env:
BACKPORT_LABEL_PREFIX: backport-to-
BACKPORT_LABEL_IGNORE: was-backported
permissions: {}
jobs:
backport:
if: github.event.pull_request.merged == true && !(contains(github.event.pull_request.labels.*.name, 'backport')) && !(contains(github.event.pull_request.labels.*.name, 'was-backported'))
@@ -33,6 +35,7 @@ jobs:
egress-policy: block
allowed-endpoints: >
api.github.com:443
github.com:443
- name: Check labels
id: label_check
+2
@@ -21,6 +21,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
zizmor:
if: github.repository == 'prowler-cloud/prowler'
@@ -9,6 +9,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.event.issue.number }}
cancel-in-progress: false
permissions: {}
jobs:
update-labels:
if: contains(github.event.issue.labels.*.name, 'status/awaiting-response')
@@ -16,6 +16,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
cancel-in-progress: true
permissions: {}
jobs:
conventional-commit-check:
runs-on: ubuntu-latest
@@ -13,6 +13,8 @@ env:
BACKPORT_LABEL_PREFIX: backport-to-
BACKPORT_LABEL_COLOR: B60205
permissions: {}
jobs:
create-label:
runs-on: ubuntu-latest
+40 -225
@@ -12,72 +12,12 @@ concurrency:
env:
PROWLER_VERSION: ${{ github.event.release.tag_name }}
BASE_BRANCH: master
DOCS_FILE: docs/getting-started/installation/prowler-app.mdx
permissions: {}
jobs:
detect-release-type:
runs-on: ubuntu-latest
timeout-minutes: 5
permissions:
contents: read
outputs:
is_minor: ${{ steps.detect.outputs.is_minor }}
is_patch: ${{ steps.detect.outputs.is_patch }}
major_version: ${{ steps.detect.outputs.major_version }}
minor_version: ${{ steps.detect.outputs.minor_version }}
patch_version: ${{ steps.detect.outputs.patch_version }}
current_docs_version: ${{ steps.get_docs_version.outputs.current_docs_version }}
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@fa2e9d605c4eeb9fcad4c99c224cee0c6c7f3594 # v2.16.0
with:
egress-policy: audit
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Get current documentation version
id: get_docs_version
run: |
CURRENT_DOCS_VERSION=$(grep -oP 'PROWLER_UI_VERSION="\K[^"]+' docs/getting-started/installation/prowler-app.mdx)
echo "current_docs_version=${CURRENT_DOCS_VERSION}" >> "${GITHUB_OUTPUT}"
echo "Current documentation version: $CURRENT_DOCS_VERSION"
- name: Detect release type and parse version
id: detect
run: |
if [[ $PROWLER_VERSION =~ ^([0-9]+)\.([0-9]+)\.([0-9]+)$ ]]; then
MAJOR_VERSION=${BASH_REMATCH[1]}
MINOR_VERSION=${BASH_REMATCH[2]}
PATCH_VERSION=${BASH_REMATCH[3]}
echo "major_version=${MAJOR_VERSION}" >> "${GITHUB_OUTPUT}"
echo "minor_version=${MINOR_VERSION}" >> "${GITHUB_OUTPUT}"
echo "patch_version=${PATCH_VERSION}" >> "${GITHUB_OUTPUT}"
if (( MAJOR_VERSION != 5 )); then
echo "::error::Releasing another Prowler major version, aborting..."
exit 1
fi
if (( PATCH_VERSION == 0 )); then
echo "is_minor=true" >> "${GITHUB_OUTPUT}"
echo "is_patch=false" >> "${GITHUB_OUTPUT}"
echo "✓ Minor release detected: $PROWLER_VERSION"
else
echo "is_minor=false" >> "${GITHUB_OUTPUT}"
echo "is_patch=true" >> "${GITHUB_OUTPUT}"
echo "✓ Patch release detected: $PROWLER_VERSION"
fi
else
echo "::error::Invalid version syntax: '$PROWLER_VERSION' (must be X.Y.Z)"
exit 1
fi
bump-minor-version:
needs: detect-release-type
if: needs.detect-release-type.outputs.is_minor == 'true'
bump-version:
runs-on: ubuntu-latest
timeout-minutes: 15
permissions:
@@ -89,185 +29,60 @@ jobs:
with:
egress-policy: audit
- name: Checkout repository
- name: Validate release version
run: |
if [[ ! $PROWLER_VERSION =~ ^([0-9]+)\.([0-9]+)\.([0-9]+)$ ]]; then
echo "::error::Invalid version syntax: '$PROWLER_VERSION' (must be X.Y.Z)"
exit 1
fi
if (( ${BASH_REMATCH[1]} != 5 )); then
echo "::error::Releasing another Prowler major version, aborting..."
exit 1
fi
- name: Checkout master branch
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: ${{ env.BASE_BRANCH }}
persist-credentials: false
- name: Calculate next minor version
- name: Read current docs version on master
id: docs_version
run: |
MAJOR_VERSION=${NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_MAJOR_VERSION}
MINOR_VERSION=${NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_MINOR_VERSION}
CURRENT_DOCS_VERSION="${NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_CURRENT_DOCS_VERSION}"
NEXT_MINOR_VERSION=${MAJOR_VERSION}.$((MINOR_VERSION + 1)).0
CURRENT_DOCS_VERSION=$(grep -oP 'PROWLER_UI_VERSION="\K[^"]+' "${DOCS_FILE}")
echo "CURRENT_DOCS_VERSION=${CURRENT_DOCS_VERSION}" >> "${GITHUB_ENV}"
echo "NEXT_MINOR_VERSION=${NEXT_MINOR_VERSION}" >> "${GITHUB_ENV}"
echo "Current docs version on master: $CURRENT_DOCS_VERSION"
echo "Target release version: $PROWLER_VERSION"
echo "Current documentation version: $CURRENT_DOCS_VERSION"
echo "Current release version: $PROWLER_VERSION"
echo "Next minor version: $NEXT_MINOR_VERSION"
env:
NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_MAJOR_VERSION: ${{ needs.detect-release-type.outputs.major_version }}
NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_MINOR_VERSION: ${{ needs.detect-release-type.outputs.minor_version }}
NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_CURRENT_DOCS_VERSION: ${{ needs.detect-release-type.outputs.current_docs_version }}
# Skip if master is already at or ahead of the release version
# (re-run, or patch shipped against an older minor line)
HIGHEST=$(printf '%s\n%s\n' "${CURRENT_DOCS_VERSION}" "${PROWLER_VERSION}" | sort -V | tail -n1)
if [[ "${CURRENT_DOCS_VERSION}" == "${PROWLER_VERSION}" || "${HIGHEST}" != "${PROWLER_VERSION}" ]]; then
echo "skip=true" >> "${GITHUB_OUTPUT}"
echo "Skipping bump: current ($CURRENT_DOCS_VERSION) >= release ($PROWLER_VERSION)"
else
echo "skip=false" >> "${GITHUB_OUTPUT}"
fi
- name: Bump versions in documentation for master
- name: Bump versions in documentation
if: steps.docs_version.outputs.skip == 'false'
run: |
set -e
# Update prowler-app.mdx with current release version
sed -i "s|PROWLER_UI_VERSION=\"${CURRENT_DOCS_VERSION}\"|PROWLER_UI_VERSION=\"${PROWLER_VERSION}\"|" docs/getting-started/installation/prowler-app.mdx
sed -i "s|PROWLER_API_VERSION=\"${CURRENT_DOCS_VERSION}\"|PROWLER_API_VERSION=\"${PROWLER_VERSION}\"|" docs/getting-started/installation/prowler-app.mdx
sed -i "s|PROWLER_UI_VERSION=\"${CURRENT_DOCS_VERSION}\"|PROWLER_UI_VERSION=\"${PROWLER_VERSION}\"|" "${DOCS_FILE}"
sed -i "s|PROWLER_API_VERSION=\"${CURRENT_DOCS_VERSION}\"|PROWLER_API_VERSION=\"${PROWLER_VERSION}\"|" "${DOCS_FILE}"
echo "Files modified:"
git --no-pager diff
- name: Create PR for documentation update to master
if: steps.docs_version.outputs.skip == 'false'
uses: peter-evans/create-pull-request@c0f553fe549906ede9cf27b5156039d195d2ece0 # v8.1.0
with:
author: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
token: ${{ secrets.PROWLER_BOT_ACCESS_TOKEN }}
base: master
commit-message: 'docs: Update version to v${{ env.PROWLER_VERSION }}'
branch: docs-version-update-to-v${{ env.PROWLER_VERSION }}
title: 'docs: Update version to v${{ env.PROWLER_VERSION }}'
labels: no-changelog,skip-sync
body: |
### Description
Update Prowler documentation version references to v${{ env.PROWLER_VERSION }} after releasing Prowler v${{ env.PROWLER_VERSION }}.
### Files Updated
- `docs/getting-started/installation/prowler-app.mdx`: `PROWLER_UI_VERSION` and `PROWLER_API_VERSION`
- All `*.mdx` files with `<VersionBadge>` components
### License
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
- name: Checkout version branch
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
ref: v${{ needs.detect-release-type.outputs.major_version }}.${{ needs.detect-release-type.outputs.minor_version }}
persist-credentials: false
- name: Calculate first patch version
run: |
MAJOR_VERSION=${NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_MAJOR_VERSION}
MINOR_VERSION=${NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_MINOR_VERSION}
CURRENT_DOCS_VERSION="${NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_CURRENT_DOCS_VERSION}"
FIRST_PATCH_VERSION=${MAJOR_VERSION}.${MINOR_VERSION}.1
VERSION_BRANCH=v${MAJOR_VERSION}.${MINOR_VERSION}
echo "CURRENT_DOCS_VERSION=${CURRENT_DOCS_VERSION}" >> "${GITHUB_ENV}"
echo "FIRST_PATCH_VERSION=${FIRST_PATCH_VERSION}" >> "${GITHUB_ENV}"
echo "VERSION_BRANCH=${VERSION_BRANCH}" >> "${GITHUB_ENV}"
echo "First patch version: $FIRST_PATCH_VERSION"
echo "Version branch: $VERSION_BRANCH"
env:
NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_MAJOR_VERSION: ${{ needs.detect-release-type.outputs.major_version }}
NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_MINOR_VERSION: ${{ needs.detect-release-type.outputs.minor_version }}
NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_CURRENT_DOCS_VERSION: ${{ needs.detect-release-type.outputs.current_docs_version }}
- name: Bump versions in documentation for version branch
run: |
set -e
# Update prowler-app.mdx with current release version
sed -i "s|PROWLER_UI_VERSION=\"${CURRENT_DOCS_VERSION}\"|PROWLER_UI_VERSION=\"${PROWLER_VERSION}\"|" docs/getting-started/installation/prowler-app.mdx
sed -i "s|PROWLER_API_VERSION=\"${CURRENT_DOCS_VERSION}\"|PROWLER_API_VERSION=\"${PROWLER_VERSION}\"|" docs/getting-started/installation/prowler-app.mdx
echo "Files modified:"
git --no-pager diff
- name: Create PR for documentation update to version branch
uses: peter-evans/create-pull-request@c0f553fe549906ede9cf27b5156039d195d2ece0 # v8.1.0
with:
author: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
token: ${{ secrets.PROWLER_BOT_ACCESS_TOKEN }}
base: ${{ env.VERSION_BRANCH }}
commit-message: 'docs: Update version to v${{ env.PROWLER_VERSION }}'
branch: docs-version-update-to-v${{ env.PROWLER_VERSION }}-branch
title: 'docs: Update version to v${{ env.PROWLER_VERSION }}'
labels: no-changelog,skip-sync
body: |
### Description
Update Prowler documentation version references to v${{ env.PROWLER_VERSION }} in version branch after releasing Prowler v${{ env.PROWLER_VERSION }}.
### Files Updated
- `docs/getting-started/installation/prowler-app.mdx`: `PROWLER_UI_VERSION` and `PROWLER_API_VERSION`
### License
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
bump-patch-version:
needs: detect-release-type
if: needs.detect-release-type.outputs.is_patch == 'true'
runs-on: ubuntu-latest
timeout-minutes: 15
permissions:
contents: read
pull-requests: write
steps:
- name: Harden the runner (Audit all outbound calls)
uses: step-security/harden-runner@fa2e9d605c4eeb9fcad4c99c224cee0c6c7f3594 # v2.16.0
with:
egress-policy: audit
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
with:
persist-credentials: false
- name: Calculate next patch version
run: |
MAJOR_VERSION=${NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_MAJOR_VERSION}
MINOR_VERSION=${NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_MINOR_VERSION}
PATCH_VERSION=${NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_PATCH_VERSION}
CURRENT_DOCS_VERSION="${NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_CURRENT_DOCS_VERSION}"
NEXT_PATCH_VERSION=${MAJOR_VERSION}.${MINOR_VERSION}.$((PATCH_VERSION + 1))
VERSION_BRANCH=v${MAJOR_VERSION}.${MINOR_VERSION}
echo "CURRENT_DOCS_VERSION=${CURRENT_DOCS_VERSION}" >> "${GITHUB_ENV}"
echo "NEXT_PATCH_VERSION=${NEXT_PATCH_VERSION}" >> "${GITHUB_ENV}"
echo "VERSION_BRANCH=${VERSION_BRANCH}" >> "${GITHUB_ENV}"
echo "Current documentation version: $CURRENT_DOCS_VERSION"
echo "Current release version: $PROWLER_VERSION"
echo "Next patch version: $NEXT_PATCH_VERSION"
echo "Target branch: $VERSION_BRANCH"
env:
NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_MAJOR_VERSION: ${{ needs.detect-release-type.outputs.major_version }}
NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_MINOR_VERSION: ${{ needs.detect-release-type.outputs.minor_version }}
NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_PATCH_VERSION: ${{ needs.detect-release-type.outputs.patch_version }}
NEEDS_DETECT_RELEASE_TYPE_OUTPUTS_CURRENT_DOCS_VERSION: ${{ needs.detect-release-type.outputs.current_docs_version }}
- name: Bump versions in documentation for patch version
run: |
set -e
# Update prowler-app.mdx with current release version
sed -i "s|PROWLER_UI_VERSION=\"${CURRENT_DOCS_VERSION}\"|PROWLER_UI_VERSION=\"${PROWLER_VERSION}\"|" docs/getting-started/installation/prowler-app.mdx
sed -i "s|PROWLER_API_VERSION=\"${CURRENT_DOCS_VERSION}\"|PROWLER_API_VERSION=\"${PROWLER_VERSION}\"|" docs/getting-started/installation/prowler-app.mdx
echo "Files modified:"
git --no-pager diff
- name: Create PR for documentation update to version branch
uses: peter-evans/create-pull-request@c0f553fe549906ede9cf27b5156039d195d2ece0 # v8.1.0
with:
author: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
token: ${{ secrets.PROWLER_BOT_ACCESS_TOKEN }}
base: ${{ env.VERSION_BRANCH }}
commit-message: 'docs: Update version to v${{ env.PROWLER_VERSION }}'
branch: docs-version-update-to-v${{ env.PROWLER_VERSION }}
title: 'docs: Update version to v${{ env.PROWLER_VERSION }}'
base: ${{ env.BASE_BRANCH }}
commit-message: 'chore(docs): Bump version to v${{ env.PROWLER_VERSION }}'
branch: docs-version-bump-to-v${{ env.PROWLER_VERSION }}
title: 'chore(docs): Bump version to v${{ env.PROWLER_VERSION }}'
labels: no-changelog,skip-sync
body: |
### Description
+8 -5
@@ -14,6 +14,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
scan-secrets:
runs-on: ubuntu-latest
@@ -25,11 +27,12 @@ jobs:
- name: Harden Runner
uses: step-security/harden-runner@fa2e9d605c4eeb9fcad4c99c224cee0c6c7f3594 # v2.16.0
with:
egress-policy: block
allowed-endpoints: >
github.com:443
ghcr.io:443
pkg-containers.githubusercontent.com:443
# We can't block as Trufflehog needs to verify secrets against vendors
egress-policy: audit
# allowed-endpoints: >
# github.com:443
# ghcr.io:443
# pkg-containers.githubusercontent.com:443
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+2
@@ -21,6 +21,8 @@ concurrency:
env:
CHART_PATH: contrib/k8s/helm/prowler-app
permissions: {}
jobs:
helm-lint:
if: github.repository == 'prowler-cloud/prowler'
+2
@@ -13,6 +13,8 @@ concurrency:
env:
CHART_PATH: contrib/k8s/helm/prowler-app
permissions: {}
jobs:
release-helm-chart:
if: github.repository == 'prowler-cloud/prowler'
@@ -9,6 +9,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.event.issue.number }}
cancel-in-progress: false
permissions: {}
jobs:
lock:
if: |
+2
@@ -15,6 +15,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
cancel-in-progress: true
permissions: {}
jobs:
labeler:
runs-on: ubuntu-latest
@@ -32,6 +32,8 @@ env:
PROWLERCLOUD_DOCKERHUB_REPOSITORY: prowlercloud
PROWLERCLOUD_DOCKERHUB_IMAGE: prowler-mcp
permissions: {}
jobs:
setup:
if: github.repository == 'prowler-cloud/prowler'
@@ -18,6 +18,8 @@ env:
MCP_WORKING_DIR: ./mcp_server
IMAGE_NAME: prowler-mcp
permissions: {}
jobs:
mcp-dockerfile-lint:
if: github.repository == 'prowler-cloud/prowler'
@@ -87,6 +89,9 @@ jobs:
api.github.com:443
mirror.gcr.io:443
check.trivy.dev:443
get.trivy.dev:443
release-assets.githubusercontent.com:443
objects.githubusercontent.com:443
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+2
@@ -14,6 +14,8 @@ env:
PYTHON_VERSION: "3.12"
WORKING_DIRECTORY: ./mcp_server
permissions: {}
jobs:
validate-release:
if: github.repository == 'prowler-cloud/prowler'
+2
@@ -16,6 +16,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
cancel-in-progress: true
permissions: {}
jobs:
check-changelog:
if: contains(github.event.pull_request.labels.*.name, 'no-changelog') == false
@@ -16,9 +16,17 @@ concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
cancel-in-progress: true
permissions: {}
jobs:
check-compliance-mapping:
if: contains(github.event.pull_request.labels.*.name, 'no-compliance-check') == false
if: >-
github.event.pull_request.state == 'open' &&
contains(github.event.pull_request.labels.*.name, 'no-compliance-check') == false &&
(
(github.event.action != 'labeled' && github.event.action != 'unlabeled')
|| github.event.label.name == 'no-compliance-check'
)
runs-on: ubuntu-latest
timeout-minutes: 15
permissions:
@@ -15,6 +15,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
cancel-in-progress: true
permissions: {}
jobs:
check-conflicts:
runs-on: ubuntu-latest
+2
@@ -12,6 +12,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.event.pull_request.number }}
cancel-in-progress: false
permissions: {}
jobs:
trigger-cloud-pull-request:
if: |
+6 -7
@@ -17,6 +17,8 @@ concurrency:
env:
PROWLER_VERSION: ${{ inputs.prowler_version }}
permissions: {}
jobs:
prepare-release:
if: github.event_name == 'workflow_dispatch' && github.repository == 'prowler-cloud/prowler'
@@ -38,15 +40,12 @@ jobs:
token: ${{ secrets.PROWLER_BOT_ACCESS_TOKEN }}
persist-credentials: false
- name: Set up Python
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
- name: Setup Python with Poetry
uses: ./.github/actions/setup-python-poetry
with:
python-version: '3.12'
- name: Install Poetry
run: |
python3 -m pip install --user poetry==2.1.1
echo "$HOME/.local/bin" >> $GITHUB_PATH
install-dependencies: 'false'
enable-cache: 'false'
- name: Configure Git
run: |
+11 -9
@@ -13,6 +13,8 @@ env:
PROWLER_VERSION: ${{ github.event.release.tag_name }}
BASE_BRANCH: master
permissions: {}
jobs:
detect-release-type:
runs-on: ubuntu-latest
@@ -111,9 +113,9 @@ jobs:
author: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
token: ${{ secrets.PROWLER_BOT_ACCESS_TOKEN }}
base: master
commit-message: 'chore(release): Bump version to v${{ env.NEXT_MINOR_VERSION }}'
branch: version-bump-to-v${{ env.NEXT_MINOR_VERSION }}
title: 'chore(release): Bump version to v${{ env.NEXT_MINOR_VERSION }}'
commit-message: 'chore(sdk): Bump version to v${{ env.NEXT_MINOR_VERSION }}'
branch: sdk-version-bump-to-v${{ env.NEXT_MINOR_VERSION }}
title: 'chore(sdk): Bump version to v${{ env.NEXT_MINOR_VERSION }}'
labels: no-changelog,skip-sync
body: |
### Description
@@ -163,9 +165,9 @@ jobs:
author: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
token: ${{ secrets.PROWLER_BOT_ACCESS_TOKEN }}
base: ${{ env.VERSION_BRANCH }}
commit-message: 'chore(release): Bump version to v${{ env.FIRST_PATCH_VERSION }}'
branch: version-bump-to-v${{ env.FIRST_PATCH_VERSION }}
title: 'chore(release): Bump version to v${{ env.FIRST_PATCH_VERSION }}'
commit-message: 'chore(sdk): Bump version to v${{ env.FIRST_PATCH_VERSION }}'
branch: sdk-version-bump-to-v${{ env.FIRST_PATCH_VERSION }}
title: 'chore(sdk): Bump version to v${{ env.FIRST_PATCH_VERSION }}'
labels: no-changelog,skip-sync
body: |
### Description
@@ -231,9 +233,9 @@ jobs:
author: prowler-bot <179230569+prowler-bot@users.noreply.github.com>
token: ${{ secrets.PROWLER_BOT_ACCESS_TOKEN }}
base: ${{ env.VERSION_BRANCH }}
commit-message: 'chore(release): Bump version to v${{ env.NEXT_PATCH_VERSION }}'
branch: version-bump-to-v${{ env.NEXT_PATCH_VERSION }}
title: 'chore(release): Bump version to v${{ env.NEXT_PATCH_VERSION }}'
commit-message: 'chore(sdk): Bump version to v${{ env.NEXT_PATCH_VERSION }}'
branch: sdk-version-bump-to-v${{ env.NEXT_PATCH_VERSION }}
title: 'chore(sdk): Bump version to v${{ env.NEXT_PATCH_VERSION }}'
labels: no-changelog,skip-sync
body: |
### Description
@@ -10,6 +10,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
check-duplicate-test-names:
if: github.repository == 'prowler-cloud/prowler'
+4 -13
@@ -14,6 +14,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
sdk-code-quality:
if: github.repository == 'prowler-cloud/prowler'
@@ -69,22 +71,11 @@ jobs:
contrib/**
**/AGENTS.md
- name: Install Poetry
- name: Setup Python with Poetry
if: steps.check-changes.outputs.any_changed == 'true'
run: pipx install poetry==2.1.1
- name: Set up Python ${{ matrix.python-version }}
if: steps.check-changes.outputs.any_changed == 'true'
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
uses: ./.github/actions/setup-python-poetry
with:
python-version: ${{ matrix.python-version }}
cache: 'poetry'
- name: Install dependencies
if: steps.check-changes.outputs.any_changed == 'true'
run: |
poetry install --no-root
poetry run pip list
- name: Check Poetry lock file
if: steps.check-changes.outputs.any_changed == 'true'
+2
@@ -30,6 +30,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
sdk-analyze:
if: github.repository == 'prowler-cloud/prowler'
@@ -47,6 +47,8 @@ env:
# AWS configuration (for ECR)
AWS_REGION: us-east-1
permissions: {}
jobs:
setup:
if: github.repository == 'prowler-cloud/prowler'
@@ -74,15 +76,15 @@ jobs:
with:
persist-credentials: false
- name: Set up Python ${{ env.PYTHON_VERSION }}
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
- name: Setup Python with Poetry
uses: ./.github/actions/setup-python-poetry
with:
python-version: ${{ env.PYTHON_VERSION }}
install-dependencies: 'false'
enable-cache: 'false'
- name: Install Poetry
run: |
pipx install poetry==2.1.1
pipx inject poetry poetry-bumpversion
- name: Inject poetry-bumpversion plugin
run: pipx inject poetry poetry-bumpversion
- name: Get Prowler version and set tags
id: get-prowler-version
@@ -17,6 +17,8 @@ concurrency:
env:
IMAGE_NAME: prowler
permissions: {}
jobs:
sdk-dockerfile-lint:
if: github.repository == 'prowler-cloud/prowler'
@@ -85,6 +87,7 @@ jobs:
check.trivy.dev:443
debian.map.fastlydns.net:80
release-assets.githubusercontent.com:443
objects.githubusercontent.com:443
pypi.org:443
files.pythonhosted.org:443
www.powershellgallery.com:443
+10 -10
@@ -13,6 +13,8 @@ env:
RELEASE_TAG: ${{ github.event.release.tag_name }}
PYTHON_VERSION: '3.12'
permissions: {}
jobs:
validate-release:
if: github.repository == 'prowler-cloud/prowler'
@@ -73,13 +75,12 @@ jobs:
with:
persist-credentials: false
- name: Install Poetry
run: pipx install poetry==2.1.1
- name: Set up Python ${{ env.PYTHON_VERSION }}
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
- name: Setup Python with Poetry
uses: ./.github/actions/setup-python-poetry
with:
python-version: ${{ env.PYTHON_VERSION }}
install-dependencies: 'false'
enable-cache: 'false'
- name: Build Prowler package
run: poetry build
@@ -111,13 +112,12 @@ jobs:
with:
persist-credentials: false
- name: Install Poetry
run: pipx install poetry==2.1.1
- name: Set up Python ${{ env.PYTHON_VERSION }}
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
- name: Setup Python with Poetry
uses: ./.github/actions/setup-python-poetry
with:
python-version: ${{ env.PYTHON_VERSION }}
install-dependencies: 'false'
enable-cache: 'false'
- name: Install toml package
run: pip install toml
@@ -13,6 +13,8 @@ env:
PYTHON_VERSION: '3.12'
AWS_REGION: 'us-east-1'
permissions: {}
jobs:
refresh-aws-regions:
if: github.repository == 'prowler-cloud/prowler'
@@ -12,6 +12,8 @@ concurrency:
env:
PYTHON_VERSION: '3.12'
permissions: {}
jobs:
refresh-oci-regions:
if: github.repository == 'prowler-cloud/prowler'
+6 -12
@@ -14,6 +14,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
sdk-security-scans:
if: github.repository == 'prowler-cloud/prowler'
@@ -69,20 +71,11 @@ jobs:
contrib/**
**/AGENTS.md
- name: Install Poetry
- name: Setup Python with Poetry
if: steps.check-changes.outputs.any_changed == 'true'
run: pipx install poetry==2.1.1
- name: Set up Python 3.12
if: steps.check-changes.outputs.any_changed == 'true'
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
uses: ./.github/actions/setup-python-poetry
with:
python-version: '3.12'
cache: 'poetry'
- name: Install dependencies
if: steps.check-changes.outputs.any_changed == 'true'
run: poetry install --no-root
- name: Security scan with Bandit
if: steps.check-changes.outputs.any_changed == 'true'
@@ -90,7 +83,8 @@ jobs:
- name: Security scan with Safety
if: steps.check-changes.outputs.any_changed == 'true'
run: poetry run safety check -r pyproject.toml
# Accepted CVEs, severity threshold, and ignore expirations live in .safety-policy.yml
run: poetry run safety check -r pyproject.toml --policy-file .safety-policy.yml
- name: Dead code detection with Vulture
if: steps.check-changes.outputs.any_changed == 'true'
+4 -11
@@ -14,6 +14,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
sdk-tests:
if: github.repository == 'prowler-cloud/prowler'
@@ -90,20 +92,11 @@ jobs:
contrib/**
**/AGENTS.md
- name: Install Poetry
- name: Setup Python with Poetry
if: steps.check-changes.outputs.any_changed == 'true'
run: pipx install poetry==2.1.1
- name: Set up Python ${{ matrix.python-version }}
if: steps.check-changes.outputs.any_changed == 'true'
uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
uses: ./.github/actions/setup-python-poetry
with:
python-version: ${{ matrix.python-version }}
cache: 'poetry'
- name: Install dependencies
if: steps.check-changes.outputs.any_changed == 'true'
run: poetry install --no-root
# AWS Provider
- name: Check if AWS files changed
@@ -31,6 +31,8 @@ on:
description: "Whether there are UI E2E tests to run"
value: ${{ jobs.analyze.outputs.has-ui-e2e }}
permissions: {}
jobs:
analyze:
runs-on: ubuntu-latest
+2
@@ -13,6 +13,8 @@ env:
PROWLER_VERSION: ${{ github.event.release.tag_name }}
BASE_BRANCH: master
permissions: {}
jobs:
detect-release-type:
runs-on: ubuntu-latest
+2
@@ -26,6 +26,8 @@ concurrency:
group: ${{ github.workflow }}-${{ github.ref }}
cancel-in-progress: true
permissions: {}
jobs:
ui-analyze:
if: github.repository == 'prowler-cloud/prowler'
@@ -35,6 +35,8 @@ env:
# Build args
NEXT_PUBLIC_API_BASE_URL: http://prowler-api:8080/api/v1
permissions: {}
jobs:
setup:
if: github.repository == 'prowler-cloud/prowler'
@@ -18,6 +18,8 @@ env:
UI_WORKING_DIR: ./ui
IMAGE_NAME: prowler-ui
permissions: {}
jobs:
ui-dockerfile-lint:
if: github.repository == 'prowler-cloud/prowler'
@@ -89,6 +91,8 @@ jobs:
mirror.gcr.io:443
check.trivy.dev:443
get.trivy.dev:443
release-assets.githubusercontent.com:443
objects.githubusercontent.com:443
- name: Checkout repository
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+2
@@ -15,6 +15,8 @@ on:
- 'ui/**'
- 'api/**' # API changes can affect UI E2E
permissions: {}
jobs:
# First, analyze which tests need to run
impact-analysis:
+2
@@ -18,6 +18,8 @@ env:
UI_WORKING_DIR: ./ui
NODE_VERSION: '24.13.0'
permissions: {}
jobs:
ui-tests:
runs-on: ubuntu-latest
+3
@@ -84,6 +84,7 @@ continue.json
.continuerc.json
# AI Coding Assistants - OpenCode
.opencode/
opencode.json
# AI Coding Assistants - GitHub Copilot
@@ -150,6 +151,8 @@ node_modules
# Persistent data
_data/
/openspec/
/.gitmodules
# AI Instructions (generated by skills/setup.sh from AGENTS.md)
CLAUDE.md
+66 -39
View File
@@ -1,12 +1,11 @@
repos:
## GENERAL
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
## GENERAL (prek built-in — no external repo needed)
- repo: builtin
hooks:
- id: check-merge-conflict
- id: check-yaml
args: ["--unsafe"]
exclude: prowler/config/llm_config.yaml
args: ["--allow-multiple-documents"]
exclude: (prowler/config/llm_config.yaml|contrib/)
- id: check-json
- id: end-of-file-fixer
- id: trailing-whitespace
@@ -16,7 +15,7 @@ repos:
## TOML
- repo: https://github.com/macisamuele/language-formatters-pre-commit-hooks
rev: v2.13.0
rev: v2.16.0
hooks:
- id: pretty-format-toml
args: [--autofix]
@@ -24,24 +23,25 @@ repos:
## GITHUB ACTIONS
- repo: https://github.com/zizmorcore/zizmor-pre-commit
rev: v1.6.0
rev: v1.24.1
hooks:
- id: zizmor
files: ^\.github/
## BASH
- repo: https://github.com/koalaman/shellcheck-precommit
rev: v0.10.0
rev: v0.11.0
hooks:
- id: shellcheck
exclude: contrib
## PYTHON
## PYTHON — SDK (prowler/, tests/, dashboard/, util/, scripts/)
- repo: https://github.com/myint/autoflake
rev: v2.3.1
rev: v2.3.3
hooks:
- id: autoflake
exclude: ^skills/
name: "SDK - autoflake"
files: { glob: ["{prowler,tests,dashboard,util,scripts}/**/*.py"] }
args:
[
"--in-place",
@@ -50,100 +50,127 @@ repos:
]
- repo: https://github.com/pycqa/isort
rev: 5.13.2
rev: 8.0.1
hooks:
- id: isort
exclude: ^skills/
name: "SDK - isort"
files: { glob: ["{prowler,tests,dashboard,util,scripts}/**/*.py"] }
args: ["--profile", "black"]
- repo: https://github.com/psf/black
rev: 24.4.2
rev: 26.3.1
hooks:
- id: black
exclude: ^skills/
name: "SDK - black"
files: { glob: ["{prowler,tests,dashboard,util,scripts}/**/*.py"] }
- repo: https://github.com/pycqa/flake8
rev: 7.0.0
rev: 7.3.0
hooks:
- id: flake8
exclude: (contrib|^skills/)
name: "SDK - flake8"
files: { glob: ["{prowler,tests,dashboard,util,scripts}/**/*.py"] }
args: ["--ignore=E266,W503,E203,E501,W605"]
## PYTHON — API + MCP Server (ruff)
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.15.11
hooks:
- id: ruff
name: "API + MCP - ruff check"
files: { glob: ["{api,mcp_server}/**/*.py"] }
args: ["--fix"]
- id: ruff-format
name: "API + MCP - ruff format"
files: { glob: ["{api,mcp_server}/**/*.py"] }
## PYTHON — Poetry
- repo: https://github.com/python-poetry/poetry
rev: 2.1.1
rev: 2.3.4
hooks:
- id: poetry-check
name: API - poetry-check
args: ["--directory=./api"]
files: { glob: ["api/{pyproject.toml,poetry.lock}"] }
pass_filenames: false
- id: poetry-lock
name: API - poetry-lock
args: ["--directory=./api"]
files: { glob: ["api/{pyproject.toml,poetry.lock}"] }
pass_filenames: false
- id: poetry-check
name: SDK - poetry-check
args: ["--directory=./"]
files: { glob: ["{pyproject.toml,poetry.lock}"] }
pass_filenames: false
- id: poetry-lock
name: SDK - poetry-lock
args: ["--directory=./"]
files: { glob: ["{pyproject.toml,poetry.lock}"] }
pass_filenames: false
## CONTAINERS
- repo: https://github.com/hadolint/hadolint
rev: v2.13.0-beta
rev: v2.14.0
hooks:
- id: hadolint
args: ["--ignore=DL3013"]
## LOCAL HOOKS
- repo: local
hooks:
- id: pylint
name: pylint
entry: bash -c 'pylint --disable=W,C,R,E -j 0 -rn -sn prowler/'
name: "SDK - pylint"
entry: pylint --disable=W,C,R,E -j 0 -rn -sn
language: system
files: '.*\.py'
types: [python]
files: { glob: ["{prowler,tests,dashboard,util,scripts}/**/*.py"] }
- id: trufflehog
name: TruffleHog
description: Detect secrets in your data.
entry: bash -c 'trufflehog --no-update git file://. --only-verified --fail'
entry: bash -c 'trufflehog --no-update git file://. --since-commit HEAD --only-verified --fail'
# For running trufflehog in docker, use the following entry instead:
# entry: bash -c 'docker run -v "$(pwd):/workdir" -i --rm trufflesecurity/trufflehog:latest git file:///workdir --only-verified --fail'
language: system
pass_filenames: false
stages: ["pre-commit", "pre-push"]
- id: bandit
name: bandit
description: "Bandit is a tool for finding common security issues in Python code"
entry: bash -c 'bandit -q -lll -x '*_test.py,./contrib/,./.venv/,./skills/' -r .'
entry: bandit -q -lll
language: system
types: [python]
files: '.*\.py'
exclude:
{ glob: ["{contrib,skills}/**", "**/.venv/**", "**/*_test.py"] }
- id: safety
name: safety
description: "Safety is a tool that checks your installed dependencies for known security vulnerabilities"
# TODO: Botocore needs urllib3 1.X so we need to ignore these vulnerabilities 77744,77745. Remove this once we upgrade to urllib3 2.X
# TODO: 79023 & 79027 knack ReDoS until `azure-cli-core` (via `cartography`) allows `knack` >=0.13.0
# TODO: 86217 because `alibabacloud-tea-openapi == 0.4.3` don't let us upgrade `cryptography >= 46.0.0`
# TODO: 71600 CVE-2024-1135 false positive - fixed in gunicorn 22.0.0, project uses 23.0.0
entry: bash -c 'safety check --ignore 70612,66963,74429,76352,76353,77744,77745,79023,79027,86217,71600'
# Accepted CVEs, severity threshold, and ignore expirations live in .safety-policy.yml
entry: safety check --policy-file .safety-policy.yml
language: system
pass_filenames: false
files:
{
glob:
[
"**/pyproject.toml",
"**/poetry.lock",
"**/requirements*.txt",
".safety-policy.yml",
],
}
- id: vulture
name: vulture
description: "Vulture finds unused code in Python programs."
entry: bash -c 'vulture --exclude "contrib,.venv,api/src/backend/api/tests/,api/src/backend/conftest.py,api/src/backend/tasks/tests/,skills/" --min-confidence 100 .'
entry: vulture --min-confidence 100
language: system
types: [python]
files: '.*\.py'
- id: ui-checks
name: UI - Husky Pre-commit
description: "Run UI pre-commit checks (Claude Code validation + healthcheck)"
entry: bash -c 'cd ui && .husky/pre-commit'
language: system
files: '^ui/.*\.(ts|tsx|js|jsx|json|css)$'
pass_filenames: false
verbose: true
+1 -1
View File
@@ -13,7 +13,7 @@ build:
post_create_environment:
# Install poetry
# https://python-poetry.org/docs/#installing-manually
- python -m pip install poetry
- python -m pip install poetry==2.3.4
post_install:
# Install dependencies with 'docs' dependency group
# https://python-poetry.org/docs/managing-dependencies/#dependency-groups
+58
View File
@@ -0,0 +1,58 @@
# Safety policy for `safety check` (Safety CLI 3.x, v2 schema).
# Applied in: .pre-commit-config.yaml, .github/workflows/api-security.yml,
# .github/workflows/sdk-security.yml via `--policy-file`.
#
# Validate: poetry run safety validate policy_file --path .safety-policy.yml
security:
# Scan unpinned requirements too. Prowler pins via poetry.lock, so this is
# defensive against accidental unpinned entries.
ignore-unpinned-requirements: False
# CVSS severity filter. 7 = report only HIGH (7.0-8.9) and CRITICAL (9.0-10.0).
# Reference: 9=CRITICAL only, 7=CRITICAL+HIGH, 4=CRITICAL+HIGH+MEDIUM.
ignore-cvss-severity-below: 7
# Unknown severity is unrated, not safe. Keep False so unrated CVEs still fail
# the build and get a human eye. Flip to True only if noise is unmanageable.
ignore-cvss-unknown-severity: False
# Fail the build when a non-ignored vulnerability is found.
continue-on-vulnerability-error: False
# Explicit accepted vulnerabilities. Each entry MUST have a reason and an
# expiry. Expired entries fail the scan, forcing re-audit.
ignore-vulnerabilities:
77744:
reason: "Botocore requires urllib3 1.X. Remove once upgraded to urllib3 2.X."
expires: '2026-10-22'
77745:
reason: "Botocore requires urllib3 1.X. Remove once upgraded to urllib3 2.X."
expires: '2026-10-22'
79023:
reason: "knack ReDoS; blocked until azure-cli-core (via cartography) allows knack >=0.13.0."
expires: '2026-10-22'
79027:
reason: "knack ReDoS; blocked until azure-cli-core (via cartography) allows knack >=0.13.0."
expires: '2026-10-22'
86217:
reason: "alibabacloud-tea-openapi==0.4.3 blocks upgrade to cryptography >=46.0.0."
expires: '2026-10-22'
71600:
reason: "CVE-2024-1135 false positive. Fixed in gunicorn 22.0.0; project uses 23.0.0."
expires: '2026-10-22'
70612:
reason: "TBD - audit required. Reason not documented in prior --ignore list."
expires: '2026-07-22'
66963:
reason: "TBD - audit required. Reason not documented in prior --ignore list."
expires: '2026-07-22'
74429:
reason: "TBD - audit required. Reason not documented in prior --ignore list."
expires: '2026-07-22'
76352:
reason: "TBD - audit required. Reason not documented in prior --ignore list."
expires: '2026-07-22'
76353:
reason: "TBD - audit required. Reason not documented in prior --ignore list."
expires: '2026-07-22'
+2
View File
@@ -0,0 +1,2 @@
.envrc
ui/.env.local
+3 -3
View File
@@ -140,7 +140,7 @@ Prowler is an open-source cloud security assessment tool supporting AWS, Azure,
| Component | Location | Tech Stack |
|-----------|----------|------------|
| SDK | `prowler/` | Python 3.10+, Poetry |
| SDK | `prowler/` | Python 3.10+, Poetry 2.3+ |
| API | `api/` | Django 5.1, DRF, Celery |
| UI | `ui/` | Next.js 15, React 19, Tailwind 4 |
| MCP Server | `mcp_server/` | FastMCP, Python 3.12+ |
@@ -153,12 +153,12 @@ Prowler is an open-source cloud security assessment tool supporting AWS, Azure,
```bash
# Setup
poetry install --with dev
poetry run pre-commit install
poetry run prek install
# Code quality
poetry run make lint
poetry run make format
poetry run pre-commit run --all-files
poetry run prek run --all-files
```
---
+20 -1
View File
@@ -9,6 +9,9 @@ ENV POWERSHELL_VERSION=${POWERSHELL_VERSION}
ARG TRIVY_VERSION=0.69.2
ENV TRIVY_VERSION=${TRIVY_VERSION}
ARG ZIZMOR_VERSION=1.24.1
ENV ZIZMOR_VERSION=${ZIZMOR_VERSION}
# hadolint ignore=DL3008
RUN apt-get update && apt-get install -y --no-install-recommends \
wget libicu72 libunwind8 libssl3 libcurl4 ca-certificates apt-transport-https gnupg \
@@ -48,6 +51,22 @@ RUN ARCH=$(uname -m) && \
mkdir -p /tmp/.cache/trivy && \
chmod 777 /tmp/.cache/trivy
# Install zizmor for GitHub Actions workflow scanning
RUN ARCH=$(uname -m) && \
if [ "$ARCH" = "x86_64" ]; then \
ZIZMOR_ARCH="x86_64-unknown-linux-gnu" ; \
elif [ "$ARCH" = "aarch64" ]; then \
ZIZMOR_ARCH="aarch64-unknown-linux-gnu" ; \
else \
echo "Unsupported architecture for zizmor: $ARCH" && exit 1 ; \
fi && \
wget --progress=dot:giga "https://github.com/zizmorcore/zizmor/releases/download/v${ZIZMOR_VERSION}/zizmor-${ZIZMOR_ARCH}.tar.gz" -O /tmp/zizmor.tar.gz && \
mkdir -p /tmp/zizmor-extract && \
tar zxf /tmp/zizmor.tar.gz -C /tmp/zizmor-extract && \
mv /tmp/zizmor-extract/zizmor /usr/local/bin/zizmor && \
chmod +x /usr/local/bin/zizmor && \
rm -rf /tmp/zizmor.tar.gz /tmp/zizmor-extract
# Add prowler user
RUN addgroup --gid 1000 prowler && \
adduser --uid 1000 --gid 1000 --disabled-password --gecos "" prowler
@@ -68,7 +87,7 @@ ENV HOME='/home/prowler'
ENV PATH="${HOME}/.local/bin:${PATH}"
#hadolint ignore=DL3013
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir poetry
pip install --no-cache-dir poetry==2.3.4
RUN poetry install --compile && \
rm -rf ~/.cache/pip
+31 -8
View File
@@ -246,14 +246,7 @@ Some pre-commit hooks require tools installed on your system:
1. **Install [TruffleHog](https://github.com/trufflesecurity/trufflehog#install)** (secret scanning) — see the [official installation options](https://github.com/trufflesecurity/trufflehog#install).
2. **Install [Safety](https://github.com/pyupio/safety)** (dependency vulnerability checking):
```console
# Requires a Python environment (e.g. via pyenv)
pip install safety
```
3. **Install [Hadolint](https://github.com/hadolint/hadolint#install)** (Dockerfile linting) — see the [official installation options](https://github.com/hadolint/hadolint#install).
2. **Install [Hadolint](https://github.com/hadolint/hadolint#install)** (Dockerfile linting) — see the [official installation options](https://github.com/hadolint/hadolint#install).
## Prowler CLI
### Pip package
@@ -307,6 +300,36 @@ python prowler-cli.py -v
> If your Poetry version is below v2.0.0, continue using `poetry shell` to activate your environment.
> For further guidance, refer to the [Poetry Environment Activation Guide](https://python-poetry.org/docs/managing-environments/#activating-the-environment).
# 🛡️ GitHub Action
The official **Prowler GitHub Action** runs Prowler scans in your GitHub workflows using the official [`prowlercloud/prowler`](https://hub.docker.com/r/prowlercloud/prowler) Docker image. Scans run on any [supported provider](https://docs.prowler.com/user-guide/providers/), with optional [`--push-to-cloud`](https://docs.prowler.com/user-guide/tutorials/prowler-app-import-findings) to send findings to Prowler Cloud and optional SARIF upload so findings show up in the repo's **Security → Code scanning** tab and as inline PR annotations.
```yaml
name: Prowler IaC Scan
on:
pull_request:
permissions:
contents: read
security-events: write
actions: read
jobs:
prowler:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: prowler-cloud/prowler@5.25
with:
provider: iac
output-formats: sarif json-ocsf
upload-sarif: true
flags: --severity critical high
```
Full configuration, per-provider authentication, and SARIF examples: [Prowler GitHub Action tutorial](docs/user-guide/tutorials/prowler-app-github-action.mdx). Marketplace listing: [Prowler Security Scan](https://github.com/marketplace/actions/prowler-security-scan).
# ✏️ High level architecture
## Prowler App
+307
View File
@@ -0,0 +1,307 @@
name: Prowler Security Scan
description: Run Prowler cloud security scanner using the official Docker image
branding:
icon: cloud
color: green
inputs:
provider:
description: Cloud provider to scan (e.g. aws, azure, gcp, github, kubernetes, iac). See https://docs.prowler.com for supported providers.
required: true
image-tag:
description: >
Docker image tag for prowlercloud/prowler.
Default is "stable" (latest release). Available tags:
"stable" (latest release), "latest" (master branch, not stable),
"<x.y.z>" (pinned release version).
See all tags at https://hub.docker.com/r/prowlercloud/prowler/tags
required: false
default: stable
output-formats:
description: Output format(s) for scan results (e.g. "json-ocsf", "sarif json-ocsf")
required: false
default: json-ocsf
push-to-cloud:
description: Push scan findings to Prowler Cloud. Requires the PROWLER_CLOUD_API_KEY environment variable. See https://docs.prowler.com/user-guide/tutorials/prowler-app-import-findings#using-the-cli
required: false
default: "false"
flags:
description: 'Additional CLI flags passed to the Prowler scan (e.g. "--severity critical high --compliance cis_aws"). Values containing spaces can be quoted, e.g. "--resource-tag ''Environment=My Server''".'
required: false
default: ""
extra-env:
description: >
Space-, newline-, or comma-separated list of host environment variable NAMES to forward to the Prowler container
(e.g. "AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN" for AWS,
"GITHUB_PERSONAL_ACCESS_TOKEN" for GitHub, "CLOUDFLARE_API_TOKEN" for Cloudflare).
List names only; set the values via `env:` at the workflow or job level (typically from `secrets.*`).
See the README for per-provider examples.
required: false
default: ""
upload-sarif:
description: 'Upload SARIF results to GitHub Code Scanning (requires "sarif" in output-formats and both `security-events: write` and `actions: read` permissions)'
required: false
default: "false"
sarif-file:
description: Path to the SARIF file to upload (auto-detected from output/ if not set)
required: false
default: ""
sarif-category:
description: Category for the SARIF upload (used to distinguish multiple analyses)
required: false
default: prowler
fail-on-findings:
description: Fail the workflow step when Prowler detects findings (exit code 3). By default the action tolerates findings and succeeds.
required: false
default: "false"
runs:
using: composite
steps:
- name: Validate inputs
shell: bash
env:
INPUT_IMAGE_TAG: ${{ inputs.image-tag }}
INPUT_UPLOAD_SARIF: ${{ inputs.upload-sarif }}
INPUT_OUTPUT_FORMATS: ${{ inputs.output-formats }}
run: |
# Validate image tag format (alphanumeric, dots, hyphens, underscores only)
if [[ ! "$INPUT_IMAGE_TAG" =~ ^[a-zA-Z0-9._-]+$ ]]; then
echo "::error::Invalid image-tag '${INPUT_IMAGE_TAG}'. Must contain only alphanumeric characters, dots, hyphens, and underscores."
exit 1
fi
# Warn if upload-sarif is enabled but sarif not in output-formats
if [ "$INPUT_UPLOAD_SARIF" = "true" ]; then
if [[ ! "$INPUT_OUTPUT_FORMATS" =~ (^|[[:space:]])sarif($|[[:space:]]) ]]; then
echo "::warning::upload-sarif is enabled but 'sarif' is not included in output-formats ('${INPUT_OUTPUT_FORMATS}'). SARIF upload will fail unless you add 'sarif' to output-formats."
fi
fi
- name: Run Prowler scan
shell: bash
env:
INPUT_PROVIDER: ${{ inputs.provider }}
INPUT_IMAGE_TAG: ${{ inputs.image-tag }}
INPUT_OUTPUT_FORMATS: ${{ inputs.output-formats }}
INPUT_PUSH_TO_CLOUD: ${{ inputs.push-to-cloud }}
INPUT_FLAGS: ${{ inputs.flags }}
INPUT_EXTRA_ENV: ${{ inputs.extra-env }}
INPUT_FAIL_ON_FINDINGS: ${{ inputs.fail-on-findings }}
run: |
set -e
# Parse space-separated inputs with shlex so values with spaces can be quoted
# (e.g. `--resource-tag 'Environment=My Server'`).
mapfile -t OUTPUT_FORMATS < <(python3 -c 'import shlex, os; [print(t) for t in shlex.split(os.environ.get("INPUT_OUTPUT_FORMATS", ""))]')
mapfile -t EXTRA_FLAGS < <(python3 -c 'import shlex, os; [print(t) for t in shlex.split(os.environ.get("INPUT_FLAGS", ""))]')
mapfile -t EXTRA_ENV_NAMES < <(python3 -c 'import shlex, os; [print(t) for t in shlex.split(os.environ.get("INPUT_EXTRA_ENV", "").replace(",", " "))]')
env_args=()
for var in "${EXTRA_ENV_NAMES[@]}"; do
[ -z "$var" ] && continue
if [[ ! "$var" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
echo "::error::Invalid env var name '${var}' in extra-env. Names must match ^[A-Za-z_][A-Za-z0-9_]*$."
exit 1
fi
env_args+=("-e" "$var")
done
push_args=()
if [ "$INPUT_PUSH_TO_CLOUD" = "true" ]; then
push_args=("--push-to-cloud")
env_args+=("-e" "PROWLER_CLOUD_API_KEY")
fi
mkdir -p "$GITHUB_WORKSPACE/output"
chmod 777 "$GITHUB_WORKSPACE/output"
set +e
docker run --rm \
"${env_args[@]}" \
-v "$GITHUB_WORKSPACE:/home/prowler/workspace" \
-v "$GITHUB_WORKSPACE/output:/home/prowler/workspace/output" \
-w /home/prowler/workspace \
"prowlercloud/prowler:${INPUT_IMAGE_TAG}" \
"$INPUT_PROVIDER" \
--output-formats "${OUTPUT_FORMATS[@]}" \
"${push_args[@]}" \
"${EXTRA_FLAGS[@]}"
exit_code=$?
set -e
# Exit code 3 = findings detected
if [ "$exit_code" -eq 3 ] && [ "$INPUT_FAIL_ON_FINDINGS" != "true" ]; then
echo "::notice::Prowler detected findings (exit code 3). Set fail-on-findings to 'true' to fail the workflow on findings."
exit 0
fi
exit $exit_code
- name: Upload scan results
if: always()
uses: actions/upload-artifact@b7c566a772e6b6bfb58ed0dc250532a479d7789f # v6.0.0
with:
name: prowler-${{ inputs.provider }}
path: output/
retention-days: 30
if-no-files-found: warn
- name: Find SARIF file
if: always() && inputs.upload-sarif == 'true'
id: find-sarif
shell: bash
env:
INPUT_SARIF_FILE: ${{ inputs.sarif-file }}
run: |
if [ -n "$INPUT_SARIF_FILE" ]; then
echo "sarif_path=$INPUT_SARIF_FILE" >> "$GITHUB_OUTPUT"
else
sarif_file=$(find output/ -name '*.sarif' -type f | head -1)
if [ -z "$sarif_file" ]; then
echo "::warning::No .sarif file found in output/. Ensure 'sarif' is included in output-formats."
echo "sarif_path=" >> "$GITHUB_OUTPUT"
else
echo "sarif_path=$sarif_file" >> "$GITHUB_OUTPUT"
fi
fi
- name: Upload SARIF to GitHub Code Scanning
if: always() && inputs.upload-sarif == 'true' && steps.find-sarif.outputs.sarif_path != ''
uses: github/codeql-action/upload-sarif@d4b3ca9fa7f69d38bfcd667bdc45bc373d16277e # v4
with:
sarif_file: ${{ steps.find-sarif.outputs.sarif_path }}
category: ${{ inputs.sarif-category }}
- name: Write scan summary
if: always()
shell: bash
env:
INPUT_PROVIDER: ${{ inputs.provider }}
INPUT_UPLOAD_SARIF: ${{ inputs.upload-sarif }}
INPUT_PUSH_TO_CLOUD: ${{ inputs.push-to-cloud }}
RUN_URL: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}
REPO_URL: ${{ github.server_url }}/${{ github.repository }}
BRANCH: ${{ github.head_ref || github.ref_name }}
GH_TOKEN: ${{ github.token }}
run: |
set +e
# Build a link to the scan step in the workflow logs. Requires `actions: read`
# on the caller's GITHUB_TOKEN; silently skips the link if unavailable.
scan_step_url=""
if [ -n "${GH_TOKEN:-}" ] && command -v gh >/dev/null 2>&1; then
job_info=$(gh api \
"repos/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/attempts/${GITHUB_RUN_ATTEMPT:-1}/jobs" \
--jq ".jobs[] | select(.runner_name == \"${RUNNER_NAME:-}\")" 2>/dev/null)
if [ -n "$job_info" ]; then
job_id=$(jq -r '.id // empty' <<<"$job_info")
step_number=$(jq -r '[.steps[]? | select((.name // "") | test("Run Prowler scan"; "i")) | .number] | first // empty' <<<"$job_info")
if [ -z "$step_number" ]; then
step_number=$(jq -r '[.steps[]? | select(.status == "in_progress") | .number] | first // empty' <<<"$job_info")
fi
if [ -n "$job_id" ] && [ -n "$step_number" ]; then
scan_step_url="${REPO_URL}/actions/runs/${GITHUB_RUN_ID}/job/${job_id}#step:${step_number}:1"
elif [ -n "$job_id" ]; then
scan_step_url="${REPO_URL}/actions/runs/${GITHUB_RUN_ID}/job/${job_id}"
fi
fi
fi
# Map provider code to a properly-cased display name.
case "$INPUT_PROVIDER" in
alibabacloud) provider_name="Alibaba Cloud" ;;
aws) provider_name="AWS" ;;
azure) provider_name="Azure" ;;
cloudflare) provider_name="Cloudflare" ;;
gcp) provider_name="GCP" ;;
github) provider_name="GitHub" ;;
googleworkspace) provider_name="Google Workspace" ;;
iac) provider_name="IaC" ;;
image) provider_name="Container Image" ;;
kubernetes) provider_name="Kubernetes" ;;
llm) provider_name="LLM" ;;
m365) provider_name="Microsoft 365" ;;
mongodbatlas) provider_name="MongoDB Atlas" ;;
nhn) provider_name="NHN" ;;
openstack) provider_name="OpenStack" ;;
oraclecloud) provider_name="Oracle Cloud" ;;
vercel) provider_name="Vercel" ;;
*) provider_name="${INPUT_PROVIDER^}" ;;
esac
ocsf_file=$(find output/ -name '*.ocsf.json' -type f 2>/dev/null | head -1)
{
echo "## Prowler ${provider_name} Scan Summary"
echo ""
counts=""
if [ -n "$ocsf_file" ] && [ -s "$ocsf_file" ]; then
counts=$(jq -r '[
length,
([.[] | select(.status_code == "FAIL")] | length),
([.[] | select(.status_code == "PASS")] | length),
([.[] | select(.status_code == "MUTED")] | length),
([.[] | select(.status_code == "FAIL" and .severity == "Critical")] | length),
([.[] | select(.status_code == "FAIL" and .severity == "High")] | length),
([.[] | select(.status_code == "FAIL" and .severity == "Medium")] | length),
([.[] | select(.status_code == "FAIL" and .severity == "Low")] | length),
([.[] | select(.status_code == "FAIL" and .severity == "Informational")] | length)
] | @tsv' "$ocsf_file" 2>/dev/null)
fi
if [ -n "$counts" ]; then
read -r total fail pass muted critical high medium low info <<<"$counts"
line="**${fail:-0} failing** · ${pass:-0} passing"
[ "${muted:-0}" -gt 0 ] && line="${line} · ${muted} muted"
echo "${line} — ${total:-0} checks total"
echo ""
echo "| Severity | Failing |"
echo "|----------|---------|"
echo "| ‼️ Critical | ${critical:-0} |"
echo "| 🔴 High | ${high:-0} |"
echo "| 🟠 Medium | ${medium:-0} |"
echo "| 🔵 Low | ${low:-0} |"
echo "| ⚪ Informational | ${info:-0} |"
echo ""
else
echo "_No findings report was produced. Check the scan logs above._"
echo ""
fi
if [ -n "$scan_step_url" ]; then
echo "**Scan logs:** [view in workflow run](${scan_step_url})"
echo ""
fi
echo "**Get the full report:** [\`prowler-${INPUT_PROVIDER}\` artifact](${RUN_URL}#artifacts)"
if [ "$INPUT_UPLOAD_SARIF" = "true" ] && [ -n "$BRANCH" ]; then
encoded_branch=$(jq -nr --arg b "$BRANCH" '$b|@uri')
echo ""
echo "**See results in GitHub Code Security:** [open alerts on \`${BRANCH}\`](${REPO_URL}/security/code-scanning?query=is%3Aopen+branch%3A${encoded_branch})"
fi
if [ "$INPUT_PUSH_TO_CLOUD" != "true" ]; then
echo ""
echo "---"
echo ""
echo "### Scale ${provider_name} security with Prowler Cloud ☁️"
echo ""
echo "Send this scan's findings to **[Prowler Cloud](https://cloud.prowler.com)** and get:"
echo ""
echo "- **Unified findings** across every cloud, SaaS provider (M365, Google Workspace, GitHub, MongoDB Atlas), IaC repo, Kubernetes cluster, and container image"
echo "- **Posture over time** with alerts, and notifications"
echo "- **Prowler Lighthouse AI**: agentic assistant that triages findings, explains root cause and helps with remediation"
echo "- **50+ Compliance frameworks** mapped automatically"
echo "- **Enterprise-ready platform**: SOC 2 Type 2, SSO/SAML, AWS Security Hub, S3 and Jira integrations"
echo ""
echo "**Get started in 3 steps:**"
echo "1. Create an account at [cloud.prowler.com](https://cloud.prowler.com)"
echo "2. Generate a Prowler Cloud API key ([docs](https://docs.prowler.com/user-guide/tutorials/prowler-app-import-findings#using-the-cli))"
echo "3. Add \`PROWLER_CLOUD_API_KEY\` to your GitHub secrets and set \`push-to-cloud: true\` on this action"
echo ""
echo "See [prowler.com/pricing](https://prowler.com/pricing) for plan details."
fi
} >> "$GITHUB_STEP_SUMMARY"
+96
View File
@@ -2,6 +2,101 @@
All notable changes to the **Prowler API** are documented in this file.
## [1.26.0] (Prowler v5.25.0)
### 🚀 Added
- CIS Benchmark PDF report generation for scans, exposing the latest CIS version per provider via `GET /scans/{id}/cis/{name}/` [(#10650)](https://github.com/prowler-cloud/prowler/pull/10650)
- `/overviews/resource-groups` (resource inventory), `/overviews/categories` and `/overviews/attack-surfaces` now reflect newly-muted findings without waiting for the next scan. The post-mute `reaggregate-all-finding-group-summaries` task now also dispatches `aggregate_scan_resource_group_summaries_task`, `aggregate_scan_category_summaries_task` and `aggregate_attack_surface_task` per latest scan of every `(provider, day)` pair, rebuilding `ScanGroupSummary`, `ScanCategorySummary` and `AttackSurfaceOverview` alongside the tables already covered in #10827 [(#10843)](https://github.com/prowler-cloud/prowler/pull/10843)
- Install zizmor v1.24.1 in API Docker image for GitHub Actions workflow scanning [(#10607)](https://github.com/prowler-cloud/prowler/pull/10607)
### 🔄 Changed
- Allows tenant owners to expel users from their organizations [(#10787)](https://github.com/prowler-cloud/prowler/pull/10787)
- `aggregate_findings`, `aggregate_attack_surface`, `aggregate_scan_resource_group_summaries` and `aggregate_scan_category_summaries` now upsert via `bulk_create(update_conflicts=True, ...)` instead of the prior `ignore_conflicts=True` / plain INSERT / `already backfilled` short-circuit. Re-runs triggered by the post-mute reaggregation pipeline no longer trip the `unique_*_per_scan` constraints nor silently drop updates, and are race-safe under concurrent writers (e.g. scan completion overlapping with a fresh mute rule); see the sketch after this list [(#10843)](https://github.com/prowler-cloud/prowler/pull/10843)
- Rename the scan-category and scan-resource-group summary aggregators from `backfill_*` to `aggregate_*` [(#10843)](https://github.com/prowler-cloud/prowler/pull/10843)
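A minimal sketch of the `bulk_create(update_conflicts=True, ...)` upsert pattern described above, assuming Django 4.1+/5.x semantics; the model, fields, and constraint name here are illustrative stand-ins, not the actual API schema:

```python
# Illustrative only: model, fields, and constraint name are assumptions.
from django.db import models


class ScanSummary(models.Model):
    scan_id = models.UUIDField()
    severity = models.CharField(max_length=16)
    fail_count = models.PositiveIntegerField(default=0)

    class Meta:
        constraints = [
            models.UniqueConstraint(
                fields=["scan_id", "severity"], name="unique_scan_summary_per_scan"
            )
        ]


def upsert_scan_summaries(rows):
    # update_conflicts turns the INSERT into INSERT ... ON CONFLICT DO UPDATE,
    # so a re-run (e.g. post-mute reaggregation) overwrites stale counters
    # instead of tripping the unique constraint or silently dropping rows.
    ScanSummary.objects.bulk_create(
        rows,
        update_conflicts=True,
        unique_fields=["scan_id", "severity"],
        update_fields=["fail_count"],
    )
```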
### 🐞 Fixed
- `generate_outputs_task` crashing with `KeyError` for compliance frameworks listed by `get_compliance_frameworks` but not loadable by `Compliance.get_bulk` [(#10903)](https://github.com/prowler-cloud/prowler/pull/10903)
---
## [1.25.4] (Prowler v5.24.4)
### 🚀 Added
- `DJANGO_SENTRY_TRACES_SAMPLE_RATE` env var (default `0.02`) enables Sentry performance tracing for the API [(#10873)](https://github.com/prowler-cloud/prowler/pull/10873)
### 🔄 Changed
- Attack Paths: Neo4j driver `connection_acquisition_timeout` is now configurable via `NEO4J_CONN_ACQUISITION_TIMEOUT` (default lowered from 120 s to 15 s) [(#10873)](https://github.com/prowler-cloud/prowler/pull/10873)
### 🐞 Fixed
- `/tmp/prowler_api_output` saturation in compliance report workers: the final `rmtree` in `generate_compliance_reports` now only waits on frameworks actually generated for the provider (so unsupported frameworks no longer leave a placeholder `results` entry that blocks cleanup), output directories are created lazily per enabled framework, and both `generate_compliance_reports` and `generate_outputs_task` run an opportunistic stale cleanup at task start with a 48h age threshold, a per-host `fcntl` throttle, a 50-deletions-per-run cap, and guards that protect EXECUTING scans and scans whose `output_location` still points to a local path (metadata lookups routed through the admin DB so RLS does not hide those rows) [(#10874)](https://github.com/prowler-cloud/prowler/pull/10874)
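A minimal sketch of the opportunistic stale-cleanup pattern described in the entry above, using only the standard library; the paths, thresholds, and the `is_protected` guard are illustrative assumptions, not the actual task code:

```python
# Illustrative sketch only: paths, limits, and guard logic are assumptions.
import fcntl
import os
import shutil
import time

OUTPUT_ROOT = "/tmp/prowler_api_output"          # assumed local output root
MAX_AGE_SECONDS = 48 * 3600                      # 48h age threshold
MAX_DELETIONS_PER_RUN = 50                       # per-run deletion cap
LOCK_FILE = "/tmp/prowler_api_output.cleanup.lock"  # per-host throttle


def opportunistic_cleanup(is_protected) -> int:
    """Delete up to MAX_DELETIONS_PER_RUN stale scan output directories.

    `is_protected(path)` is a caller-supplied guard, e.g. skipping scans that
    are still EXECUTING or whose output_location still points to a local path.
    """
    if not os.path.isdir(OUTPUT_ROOT):
        return 0
    deleted = 0
    with open(LOCK_FILE, "a") as lock:
        try:
            # Per-host throttle: if another worker holds the lock, skip this run.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return 0
        now = time.time()
        for entry in os.scandir(OUTPUT_ROOT):
            if deleted >= MAX_DELETIONS_PER_RUN:
                break
            if not entry.is_dir():
                continue
            if now - entry.stat().st_mtime < MAX_AGE_SECONDS:
                continue
            if is_protected(entry.path):
                continue
            shutil.rmtree(entry.path, ignore_errors=True)
            deleted += 1
    return deleted
```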
---
## [1.25.3] (Prowler v5.24.3)
### 🚀 Added
- `/overviews/findings`, `/overviews/findings-severity` and `/overviews/services` now reflect newly-muted findings without waiting for the next scan. The post-mute `reaggregate-all-finding-group-summaries` task was extended to re-run the same per-scan pipeline that scan completion runs (`ScanSummary`, `DailySeveritySummary`, `FindingGroupDailySummary`) on the latest scan of every `(provider, day)` pair, keeping the pre-aggregated tables in sync with `Finding.muted` updates [(#10827)](https://github.com/prowler-cloud/prowler/pull/10827)
### 🐞 Fixed
- Finding groups aggregated `status` now treats muted findings as resolved: a group is `FAIL` only while at least one non-muted FAIL remains, otherwise it is `PASS` (including fully-muted groups). The `filter[status]` filter and the `sort=status` ordering share the same semantics, keeping `status` consistent with `fail_count` and the orthogonal `muted` flag [(#10825)](https://github.com/prowler-cloud/prowler/pull/10825)
- `aggregate_findings` is now idempotent: it deletes the scan's existing `ScanSummary` rows before `bulk_create`, so re-runs (such as the post-mute reaggregation pipeline) no longer violate the `unique_scan_summary` constraint and no longer abort the downstream `DailySeveritySummary` / `FindingGroupDailySummary` recomputation for the affected scan [(#10827)](https://github.com/prowler-cloud/prowler/pull/10827)
- Attack Paths: Findings on AWS were silently dropped during the Neo4j merge for resources whose Cartography node is keyed by a short identifier (e.g. EC2 instances) rather than the full ARN [(#10839)](https://github.com/prowler-cloud/prowler/pull/10839)
---
## [1.25.2] (Prowler v5.24.2)
### 🔄 Changed
- Finding groups `/resources` endpoints now materialize the filtered finding IDs into a Python list before filtering `ResourceFindingMapping`, so PostgreSQL switches from a Merge Semi Join that read hundreds of thousands of RFM index entries to a Nested Loop Index Scan over `finding_id`. The `has_mappings.exists()` pre-check is removed, and a request-scoped cache deduplicates the finding-id round-trip across the helpers that build different RFM querysets [(#10816)](https://github.com/prowler-cloud/prowler/pull/10816)
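A minimal sketch of the query reshaping described above, assuming Django ORM querysets; `filtered_findings` and the helper name are illustrative stand-ins:

```python
# Illustrative only: `filtered_findings` is an assumed, already-filtered queryset.
from api.models import ResourceFindingMapping


def resource_mappings_for(filtered_findings):
    # Materializing the IDs into a Python list keeps PostgreSQL from planning a
    # Merge Semi Join over the mapping index and instead drives a Nested Loop
    # Index Scan keyed on finding_id.
    finding_ids = list(filtered_findings.values_list("id", flat=True))
    return ResourceFindingMapping.objects.filter(finding_id__in=finding_ids)
```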
### 🐞 Fixed
- `/finding-groups/latest/<check_id>/resources` now selects the latest completed scan per provider by `-completed_at` (then `-inserted_at`) instead of `-inserted_at`, matching the `/finding-groups/latest` summary path and the daily-summary upsert so overlapping scans no longer produce diverging `delta`/`new_count` between the two endpoints [(#10802)](https://github.com/prowler-cloud/prowler/pull/10802)
## [1.25.1] (Prowler v5.24.1)
### 🔄 Changed
- Attack Paths: Restore `SYNC_BATCH_SIZE` and `FINDINGS_BATCH_SIZE` defaults to 1000, upgrade Cartography to 0.135.0, enable Celery queue priority for cleanup task, rewrite Finding insertion, remove AWS graph cleanup and add timing logs [(#10729)](https://github.com/prowler-cloud/prowler/pull/10729)
### 🐞 Fixed
- Finding group resources endpoints now include findings without associated resources (orphaned IaC findings) as simulated resource rows, and return one row per finding when multiple findings share a resource [(#10708)](https://github.com/prowler-cloud/prowler/pull/10708)
- Attack Paths: Missing `tenant_id` filter while getting related findings after scan completes [(#10722)](https://github.com/prowler-cloud/prowler/pull/10722)
- Finding group counters `pass_count`, `fail_count` and `manual_count` now exclude muted findings [(#10753)](https://github.com/prowler-cloud/prowler/pull/10753)
- Silent data loss in `ResourceFindingMapping` bulk insert that left findings orphaned when `INSERT ... ON CONFLICT DO NOTHING` dropped rows without raising; added explicit `unique_fields` [(#10724)](https://github.com/prowler-cloud/prowler/pull/10724)
- `DELETE /tenants/{tenant_pk}/memberships/{id}` now deletes the expelled user's account when the removed membership was their last one, and blacklists every outstanding refresh token for that user so their existing sessions can no longer mint new access tokens [(#10787)](https://github.com/prowler-cloud/prowler/pull/10787)
---
## [1.25.0] (Prowler v5.24.0)
### 🔄 Changed
- Bump Poetry to `2.3.4` in Dockerfile and pre-commit hooks. Regenerate `api/poetry.lock` [(#10681)](https://github.com/prowler-cloud/prowler/pull/10681)
- Attack Paths: Remove dead `cleanup_findings` no-op and its supporting `prowler_finding_lastupdated` index [(#10684)](https://github.com/prowler-cloud/prowler/pull/10684)
### 🐞 Fixed
- Worker-beat race condition on cold start: replaced `sleep 15` with API service healthcheck dependency (Docker Compose) and init containers (Helm), aligned Gunicorn default port to `8080` [(#10603)](https://github.com/prowler-cloud/prowler/pull/10603)
- API container startup crash on Linux due to root-owned bind-mount preventing JWT key generation [(#10646)](https://github.com/prowler-cloud/prowler/pull/10646)
### 🔐 Security
- `pytest` from 8.2.2 to 9.0.3 to fix CVE-2025-71176 [(#10678)](https://github.com/prowler-cloud/prowler/pull/10678)
---
## [1.24.0] (Prowler v5.23.0)
### 🚀 Added
@@ -12,6 +107,7 @@ All notable changes to the **Prowler API** are documented in this file.
- Finding groups list and latest endpoints support `sort=delta`, ordering by `new_count` then `changed_count` so groups with the most new findings rank highest [(#10606)](https://github.com/prowler-cloud/prowler/pull/10606)
- Finding group resources endpoints (`/finding-groups/{check_id}/resources` and `/finding-groups/latest/{check_id}/resources`) now expose `finding_id` per row, pointing to the most recent matching Finding for each resource. UUIDv7 ordering guarantees `Max(finding__id)` resolves to the latest snapshot [(#10630)](https://github.com/prowler-cloud/prowler/pull/10630)
- Handle CIS and CISA SCuBA compliance framework from google workspace [(#10629)](https://github.com/prowler-cloud/prowler/pull/10629)
- Sort support for all finding group counter fields: `pass_muted_count`, `fail_muted_count`, `manual_muted_count`, and all `new_*`/`changed_*` status-mute breakdown counters [(#10655)](https://github.com/prowler-cloud/prowler/pull/10655)
### 🔄 Changed
+21 -1
View File
@@ -8,6 +8,9 @@ ENV POWERSHELL_VERSION=${POWERSHELL_VERSION}
ARG TRIVY_VERSION=0.69.2
ENV TRIVY_VERSION=${TRIVY_VERSION}
ARG ZIZMOR_VERSION=1.24.1
ENV ZIZMOR_VERSION=${ZIZMOR_VERSION}
# hadolint ignore=DL3008
RUN apt-get update && apt-get install -y --no-install-recommends \
wget \
@@ -22,6 +25,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
libtool \
libxslt1-dev \
python3-dev \
git \
&& rm -rf /var/lib/apt/lists/*
# Install PowerShell
@@ -57,6 +61,22 @@ RUN ARCH=$(uname -m) && \
mkdir -p /tmp/.cache/trivy && \
chmod 777 /tmp/.cache/trivy
# Install zizmor for GitHub Actions workflow scanning
RUN ARCH=$(uname -m) && \
if [ "$ARCH" = "x86_64" ]; then \
ZIZMOR_ARCH="x86_64-unknown-linux-gnu" ; \
elif [ "$ARCH" = "aarch64" ]; then \
ZIZMOR_ARCH="aarch64-unknown-linux-gnu" ; \
else \
echo "Unsupported architecture for zizmor: $ARCH" && exit 1 ; \
fi && \
wget --progress=dot:giga "https://github.com/zizmorcore/zizmor/releases/download/v${ZIZMOR_VERSION}/zizmor-${ZIZMOR_ARCH}.tar.gz" -O /tmp/zizmor.tar.gz && \
mkdir -p /tmp/zizmor-extract && \
tar zxf /tmp/zizmor.tar.gz -C /tmp/zizmor-extract && \
mv /tmp/zizmor-extract/zizmor /usr/local/bin/zizmor && \
chmod +x /usr/local/bin/zizmor && \
rm -rf /tmp/zizmor.tar.gz /tmp/zizmor-extract
# Add prowler user
RUN addgroup --gid 1000 prowler && \
adduser --uid 1000 --gid 1000 --disabled-password --gecos "" prowler
@@ -71,7 +91,7 @@ RUN mkdir -p /tmp/prowler_api_output
COPY pyproject.toml ./
RUN pip install --no-cache-dir --upgrade pip && \
pip install --no-cache-dir poetry
pip install --no-cache-dir poetry==2.3.4
ENV PATH="/home/prowler/.local/bin:$PATH"
-1
View File
@@ -56,7 +56,6 @@ start_worker() {
start_worker_beat() {
echo "Starting the worker-beat..."
sleep 15
poetry run python -m celery -A config.celery beat -l "${DJANGO_LOGGING_LEVEL:-info}" --scheduler django_celery_beat.schedulers:DatabaseScheduler
}
+108 -425
View File
File diff suppressed because it is too large
+5 -5
View File
@@ -25,7 +25,7 @@ dependencies = [
"defusedxml==0.7.1",
"gunicorn==23.0.0",
"lxml==5.3.2",
"prowler @ git+https://github.com/prowler-cloud/prowler.git@v5.23",
"prowler @ git+https://github.com/prowler-cloud/prowler.git@v5.25",
"psycopg2-binary==2.9.9",
"pytest-celery[redis] (==1.3.0)",
"sentry-sdk[django] (==2.56.0)",
@@ -38,7 +38,7 @@ dependencies = [
"matplotlib (==3.10.8)",
"reportlab (==4.4.10)",
"neo4j (==6.1.0)",
"cartography (==0.132.0)",
"cartography (==0.135.0)",
"gevent (==25.9.1)",
"werkzeug (==3.1.7)",
"sqlparse (==0.5.5)",
@@ -50,7 +50,7 @@ name = "prowler-api"
package-mode = false
# Needed for the SDK compatibility
requires-python = ">=3.11,<3.13"
version = "1.24.1"
version = "1.26.0"
[project.scripts]
celery = "src.backend.config.settings.celery"
@@ -62,10 +62,9 @@ django-silk = "5.3.2"
docker = "7.1.0"
filelock = "3.20.3"
freezegun = "1.5.1"
marshmallow = "==3.26.2"
mypy = "1.10.1"
pylint = "3.2.5"
pytest = "8.2.2"
pytest = "9.0.3"
pytest-cov = "5.0.0"
pytest-django = "4.8.0"
pytest-env = "1.1.3"
@@ -75,3 +74,4 @@ ruff = "0.5.0"
safety = "3.7.0"
tqdm = "4.67.1"
vulture = "2.14"
prek = "0.3.9"
+2 -1
View File
@@ -28,6 +28,7 @@ READ_QUERY_TIMEOUT_SECONDS = env.int(
"ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30
)
MAX_CUSTOM_QUERY_NODES = env.int("ATTACK_PATHS_MAX_CUSTOM_QUERY_NODES", default=250)
CONN_ACQUISITION_TIMEOUT = env.int("NEO4J_CONN_ACQUISITION_TIMEOUT", default=15)
READ_EXCEPTION_CODES = [
"Neo.ClientError.Statement.AccessMode",
"Neo.ClientError.Procedure.ProcedureNotFound",
@@ -62,7 +63,7 @@ def init_driver() -> neo4j.Driver:
auth=(config["USER"], config["PASSWORD"]),
keep_alive=True,
max_connection_lifetime=7200,
connection_acquisition_timeout=120,
connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
max_connection_pool_size=50,
)
_driver.verify_connectivity()
+7 -8
View File
@@ -1,7 +1,6 @@
from collections.abc import Iterable, Mapping
from api.models import Provider
from prowler.config.config import get_available_compliance_frameworks
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.check.models import CheckMetadata
@@ -95,12 +94,12 @@ PROWLER_CHECKS = LazyChecksMapping()
def get_compliance_frameworks(provider_type: Provider.ProviderChoices) -> list[str]:
"""
Retrieve and cache the list of available compliance frameworks for a specific cloud provider.
"""List compliance frameworks the API can load for `provider_type`.
This function lazily loads and caches the available compliance frameworks (e.g., CIS, MITRE, ISO)
for each provider type (AWS, Azure, GCP, etc.) on first access. Subsequent calls for the same
provider will return the cached result.
The list is sourced from `Compliance.get_bulk` so that the names
returned here are guaranteed to be loadable by the bulk loader. This
prevents downstream key mismatches (e.g. CSV report generation iterating
framework names and looking them up in the bulk dict).
Args:
provider_type (Provider.ProviderChoices): The cloud provider type for which to retrieve
@@ -112,8 +111,8 @@ def get_compliance_frameworks(provider_type: Provider.ProviderChoices) -> list[s
"""
global AVAILABLE_COMPLIANCE_FRAMEWORKS
if provider_type not in AVAILABLE_COMPLIANCE_FRAMEWORKS:
AVAILABLE_COMPLIANCE_FRAMEWORKS[provider_type] = (
get_available_compliance_frameworks(provider_type)
AVAILABLE_COMPLIANCE_FRAMEWORKS[provider_type] = list(
Compliance.get_bulk(provider_type).keys()
)
return AVAILABLE_COMPLIANCE_FRAMEWORKS[provider_type]
+1
View File
@@ -330,6 +330,7 @@ class MembershipFilter(FilterSet):
model = Membership
fields = {
"tenant": ["exact"],
"user": ["exact"],
"role": ["exact"],
"date_joined": ["date", "gte", "lte"],
}
@@ -0,0 +1,23 @@
from django.db import migrations
TASK_NAME = "attack-paths-cleanup-stale-scans"
def set_cleanup_priority(apps, schema_editor):
PeriodicTask = apps.get_model("django_celery_beat", "PeriodicTask")
PeriodicTask.objects.filter(name=TASK_NAME).update(priority=0)
def unset_cleanup_priority(apps, schema_editor):
PeriodicTask = apps.get_model("django_celery_beat", "PeriodicTask")
PeriodicTask.objects.filter(name=TASK_NAME).update(priority=None)
class Migration(migrations.Migration):
dependencies = [
("api", "0089_backfill_finding_group_status_muted"),
]
operations = [
migrations.RunPython(set_cleanup_priority, unset_cleanup_priority),
]
+1 -1
View File
@@ -1,7 +1,7 @@
openapi: 3.0.3
info:
title: Prowler API
version: 1.24.1
version: 1.26.0
description: |-
Prowler API specification.
@@ -12,6 +12,8 @@ from unittest.mock import MagicMock, patch
import neo4j
import pytest
import api.attack_paths.database as db_module
class TestLazyInitialization:
"""Test that Neo4j driver is initialized lazily on first use."""
@@ -19,8 +21,6 @@ class TestLazyInitialization:
@pytest.fixture(autouse=True)
def reset_module_state(self):
"""Reset module-level singleton state before each test."""
import api.attack_paths.database as db_module
original_driver = db_module._driver
db_module._driver = None
@@ -31,8 +31,6 @@ class TestLazyInitialization:
def test_driver_not_initialized_at_import(self):
"""Driver should be None after module import (no eager connection)."""
import api.attack_paths.database as db_module
assert db_module._driver is None
@patch("api.attack_paths.database.settings")
@@ -41,8 +39,6 @@ class TestLazyInitialization:
self, mock_driver_factory, mock_settings
):
"""init_driver() should create connection only when called."""
import api.attack_paths.database as db_module
mock_driver = MagicMock()
mock_driver_factory.return_value = mock_driver
mock_settings.DATABASES = {
@@ -69,8 +65,6 @@ class TestLazyInitialization:
self, mock_driver_factory, mock_settings
):
"""Subsequent calls should return cached driver without reconnecting."""
import api.attack_paths.database as db_module
mock_driver = MagicMock()
mock_driver_factory.return_value = mock_driver
mock_settings.DATABASES = {
@@ -99,8 +93,6 @@ class TestLazyInitialization:
self, mock_driver_factory, mock_settings
):
"""get_driver() should use init_driver() for lazy initialization."""
import api.attack_paths.database as db_module
mock_driver = MagicMock()
mock_driver_factory.return_value = mock_driver
mock_settings.DATABASES = {
@@ -118,14 +110,50 @@ class TestLazyInitialization:
mock_driver_factory.assert_called_once()
class TestConnectionAcquisitionTimeout:
"""Test that the connection acquisition timeout is configurable."""
@pytest.fixture(autouse=True)
def reset_module_state(self):
original_driver = db_module._driver
original_timeout = db_module.CONN_ACQUISITION_TIMEOUT
db_module._driver = None
yield
db_module._driver = original_driver
db_module.CONN_ACQUISITION_TIMEOUT = original_timeout
@patch("api.attack_paths.database.settings")
@patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
def test_driver_receives_configured_timeout(
self, mock_driver_factory, mock_settings
):
"""init_driver() should pass CONN_ACQUISITION_TIMEOUT to the neo4j driver."""
mock_driver_factory.return_value = MagicMock()
mock_settings.DATABASES = {
"neo4j": {
"HOST": "localhost",
"PORT": 7687,
"USER": "neo4j",
"PASSWORD": "password",
}
}
db_module.CONN_ACQUISITION_TIMEOUT = 42
db_module.init_driver()
_, kwargs = mock_driver_factory.call_args
assert kwargs["connection_acquisition_timeout"] == 42
class TestAtexitRegistration:
"""Test that atexit cleanup handler is registered correctly."""
@pytest.fixture(autouse=True)
def reset_module_state(self):
"""Reset module-level singleton state before each test."""
import api.attack_paths.database as db_module
original_driver = db_module._driver
db_module._driver = None
@@ -141,8 +169,6 @@ class TestAtexitRegistration:
self, mock_driver_factory, mock_atexit_register, mock_settings
):
"""atexit.register should be called on first initialization."""
import api.attack_paths.database as db_module
mock_driver_factory.return_value = MagicMock()
mock_settings.DATABASES = {
"neo4j": {
@@ -168,8 +194,6 @@ class TestAtexitRegistration:
The double-checked locking on _driver ensures the atexit registration
block only executes once (when _driver is first created).
"""
import api.attack_paths.database as db_module
mock_driver_factory.return_value = MagicMock()
mock_settings.DATABASES = {
"neo4j": {
@@ -194,8 +218,6 @@ class TestCloseDriver:
@pytest.fixture(autouse=True)
def reset_module_state(self):
"""Reset module-level singleton state before each test."""
import api.attack_paths.database as db_module
original_driver = db_module._driver
db_module._driver = None
@@ -206,8 +228,6 @@ class TestCloseDriver:
def test_close_driver_closes_and_clears_driver(self):
"""close_driver() should close the driver and set it to None."""
import api.attack_paths.database as db_module
mock_driver = MagicMock()
db_module._driver = mock_driver
@@ -218,8 +238,6 @@ class TestCloseDriver:
def test_close_driver_handles_none_driver(self):
"""close_driver() should handle case where driver is None."""
import api.attack_paths.database as db_module
db_module._driver = None
# Should not raise
@@ -229,8 +247,6 @@ class TestCloseDriver:
def test_close_driver_clears_driver_even_on_close_error(self):
"""Driver should be cleared even if close() raises an exception."""
import api.attack_paths.database as db_module
mock_driver = MagicMock()
mock_driver.close.side_effect = Exception("Connection error")
db_module._driver = mock_driver
@@ -246,8 +262,6 @@ class TestExecuteReadQuery:
"""Test read query execution helper."""
def test_execute_read_query_calls_read_session_and_returns_result(self):
import api.attack_paths.database as db_module
tx = MagicMock()
expected_graph = MagicMock()
run_result = MagicMock()
@@ -289,8 +303,6 @@ class TestExecuteReadQuery:
assert result is expected_graph
def test_execute_read_query_defaults_parameters_to_empty_dict(self):
import api.attack_paths.database as db_module
tx = MagicMock()
run_result = MagicMock()
run_result.graph.return_value = MagicMock()
@@ -325,8 +337,6 @@ class TestGetSessionReadOnly:
@pytest.fixture(autouse=True)
def reset_module_state(self):
import api.attack_paths.database as db_module
original_driver = db_module._driver
db_module._driver = None
yield
@@ -341,8 +351,6 @@ class TestGetSessionReadOnly:
)
def test_get_session_raises_write_query_not_allowed(self, neo4j_code):
"""Read-mode Neo4j errors should raise `WriteQueryNotAllowedException`."""
import api.attack_paths.database as db_module
mock_session = MagicMock()
neo4j_error = neo4j.exceptions.Neo4jError._hydrate_neo4j(
code=neo4j_code,
@@ -362,8 +370,6 @@ class TestGetSessionReadOnly:
def test_get_session_raises_generic_exception_for_other_errors(self):
"""Non-read-mode Neo4j errors should raise GraphDatabaseQueryException."""
import api.attack_paths.database as db_module
mock_session = MagicMock()
neo4j_error = neo4j.exceptions.Neo4jError._hydrate_neo4j(
code="Neo.ClientError.Statement.SyntaxError",
@@ -388,8 +394,6 @@ class TestThreadSafety:
@pytest.fixture(autouse=True)
def reset_module_state(self):
"""Reset module-level singleton state before each test."""
import api.attack_paths.database as db_module
original_driver = db_module._driver
db_module._driver = None
@@ -404,8 +408,6 @@ class TestThreadSafety:
self, mock_driver_factory, mock_settings
):
"""Multiple threads calling init_driver() should create only one driver."""
import api.attack_paths.database as db_module
mock_driver = MagicMock()
mock_driver_factory.return_value = mock_driver
mock_settings.DATABASES = {
@@ -448,8 +450,6 @@ class TestHasProviderData:
"""Test has_provider_data helper for checking provider nodes in Neo4j."""
def test_returns_true_when_nodes_exist(self):
import api.attack_paths.database as db_module
mock_session = MagicMock()
mock_result = MagicMock()
mock_result.single.return_value = MagicMock() # non-None record
@@ -468,8 +468,6 @@ class TestHasProviderData:
mock_session.run.assert_called_once()
def test_returns_false_when_no_nodes(self):
import api.attack_paths.database as db_module
mock_session = MagicMock()
mock_result = MagicMock()
mock_result.single.return_value = None
@@ -486,8 +484,6 @@ class TestHasProviderData:
assert db_module.has_provider_data("db-tenant-abc", "provider-123") is False
def test_returns_false_when_database_not_found(self):
import api.attack_paths.database as db_module
session_ctx = MagicMock()
session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
message="Database does not exist",
@@ -503,8 +499,6 @@ class TestHasProviderData:
)
def test_raises_on_other_errors(self):
import api.attack_paths.database as db_module
session_ctx = MagicMock()
session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
message="Connection refused",
@@ -1,13 +1,18 @@
from unittest.mock import MagicMock, patch
import pytest
from api import compliance as compliance_module
from api.compliance import (
generate_compliance_overview_template,
generate_scan_compliance,
get_compliance_frameworks,
get_prowler_provider_checks,
get_prowler_provider_compliance,
load_prowler_checks,
)
from api.models import Provider
from prowler.lib.check.compliance_models import Compliance
class TestCompliance:
@@ -250,3 +255,58 @@ class TestCompliance:
}
assert template == expected_template
@pytest.fixture
def reset_compliance_cache():
"""Reset the module-level cache so each test starts cold."""
previous = dict(compliance_module.AVAILABLE_COMPLIANCE_FRAMEWORKS)
compliance_module.AVAILABLE_COMPLIANCE_FRAMEWORKS.clear()
try:
yield
finally:
compliance_module.AVAILABLE_COMPLIANCE_FRAMEWORKS.clear()
compliance_module.AVAILABLE_COMPLIANCE_FRAMEWORKS.update(previous)
class TestGetComplianceFrameworks:
def test_returns_keys_from_compliance_get_bulk(self, reset_compliance_cache):
with patch("api.compliance.Compliance") as mock_compliance:
mock_compliance.get_bulk.return_value = {
"cis_1.4_aws": MagicMock(),
"mitre_attack_aws": MagicMock(),
}
result = get_compliance_frameworks(Provider.ProviderChoices.AWS)
assert sorted(result) == ["cis_1.4_aws", "mitre_attack_aws"]
mock_compliance.get_bulk.assert_called_once_with(Provider.ProviderChoices.AWS)
def test_caches_result_per_provider(self, reset_compliance_cache):
with patch("api.compliance.Compliance") as mock_compliance:
mock_compliance.get_bulk.return_value = {"cis_1.4_aws": MagicMock()}
get_compliance_frameworks(Provider.ProviderChoices.AWS)
get_compliance_frameworks(Provider.ProviderChoices.AWS)
# Cached after first call.
assert mock_compliance.get_bulk.call_count == 1
@pytest.mark.parametrize(
"provider_type",
[choice.value for choice in Provider.ProviderChoices],
)
def test_listing_is_subset_of_bulk(self, reset_compliance_cache, provider_type):
"""Regression for CLOUD-API-40S: every name returned by
``get_compliance_frameworks`` must be loadable via ``Compliance.get_bulk``.
A divergence here is what produced ``KeyError: 'csa_ccm_4.0'`` in
``generate_outputs_task`` after universal/multi-provider compliance
JSONs were introduced at the top-level ``prowler/compliance/`` path.
"""
bulk_keys = set(Compliance.get_bulk(provider_type).keys())
listed = set(get_compliance_frameworks(provider_type))
missing = listed - bulk_keys
assert not missing, (
f"get_compliance_frameworks({provider_type!r}) returned names not "
f"loadable by Compliance.get_bulk: {sorted(missing)}"
)
+603 -8
View File
@@ -32,6 +32,11 @@ from django_celery_results.models import TaskResult
from rest_framework import status
from rest_framework.exceptions import PermissionDenied
from rest_framework.response import Response
from rest_framework_simplejwt.token_blacklist.models import (
BlacklistedToken,
OutstandingToken,
)
from rest_framework_simplejwt.tokens import RefreshToken
from api.attack_paths import (
AttackPathsQueryDefinition,
@@ -47,6 +52,7 @@ from api.models import (
Finding,
Integration,
Invitation,
InvitationRoleRelationship,
LighthouseProviderConfiguration,
LighthouseProviderModels,
LighthouseTenantConfiguration,
@@ -57,6 +63,7 @@ from api.models import (
ProviderGroupMembership,
ProviderSecret,
Resource,
ResourceFindingMapping,
Role,
RoleProviderGroupRelationship,
SAMLConfiguration,
@@ -745,6 +752,39 @@ class TestTenantViewSet:
# Test user + 2 extra users for tenant 2
assert len(response.json()["data"]) == 3
def test_tenants_list_memberships_filter_by_user(
self, authenticated_client, tenants_fixture, extra_users
):
_, tenant2, _ = tenants_fixture
_, user3_membership = extra_users
user3, membership3 = user3_membership
response = authenticated_client.get(
reverse("tenant-membership-list", kwargs={"tenant_pk": tenant2.id}),
{"filter[user]": str(user3.id)},
)
assert response.status_code == status.HTTP_200_OK
data = response.json()["data"]
assert len(data) == 1
assert data[0]["id"] == str(membership3.id)
def test_tenants_list_memberships_filter_by_user_no_match(
self, authenticated_client, tenants_fixture, extra_users
):
_, tenant2, _ = tenants_fixture
unrelated_user = User.objects.create_user(
name="unrelated",
password=TEST_PASSWORD,
email="unrelated@gmail.com",
)
response = authenticated_client.get(
reverse("tenant-membership-list", kwargs={"tenant_pk": tenant2.id}),
{"filter[user]": str(unrelated_user.id)},
)
assert response.status_code == status.HTTP_200_OK
assert response.json()["data"] == []
def test_tenants_list_memberships_as_member(
self, authenticated_client, tenants_fixture, extra_users
):
@@ -802,6 +842,7 @@ class TestTenantViewSet:
):
_, tenant2, _ = tenants_fixture
user_membership = Membership.objects.get(tenant=tenant2, user__email=TEST_USER)
user_id = user_membership.user_id
response = authenticated_client.delete(
reverse(
"tenant-membership-detail",
@@ -810,6 +851,127 @@ class TestTenantViewSet:
)
assert response.status_code == status.HTTP_403_FORBIDDEN
assert Membership.objects.filter(id=user_membership.id).exists()
assert User.objects.filter(id=user_id).exists()
def test_expel_user_deletes_account_if_last_membership(
self, authenticated_client, tenants_fixture, extra_users
):
# TEST_USER is OWNER of tenant2; user3 is MEMBER only in tenant2
_, tenant2, _ = tenants_fixture
_, user3_membership = extra_users
user3, membership3 = user3_membership
assert Membership.objects.filter(user=user3).count() == 1
response = authenticated_client.delete(
reverse(
"tenant-membership-detail",
kwargs={"tenant_pk": tenant2.id, "pk": membership3.id},
)
)
assert response.status_code == status.HTTP_204_NO_CONTENT
assert not Membership.objects.filter(id=membership3.id).exists()
assert not User.objects.filter(id=user3.id).exists()
def test_expel_user_blacklists_refresh_tokens(
self, authenticated_client, tenants_fixture, extra_users
):
_, tenant2, _ = tenants_fixture
_, user3_membership = extra_users
user3, membership3 = user3_membership
# Issue two refresh tokens to simulate active sessions
RefreshToken.for_user(user3)
RefreshToken.for_user(user3)
outstanding_ids = list(
OutstandingToken.objects.filter(user=user3).values_list("id", flat=True)
)
assert len(outstanding_ids) == 2
assert not BlacklistedToken.objects.filter(
token_id__in=outstanding_ids
).exists()
response = authenticated_client.delete(
reverse(
"tenant-membership-detail",
kwargs={"tenant_pk": tenant2.id, "pk": membership3.id},
)
)
assert response.status_code == status.HTTP_204_NO_CONTENT
assert (
BlacklistedToken.objects.filter(token_id__in=outstanding_ids).count() == 2
)
def test_expel_user_blacklists_refresh_tokens_is_idempotent(
self, authenticated_client, tenants_fixture, extra_users
):
# Regression test for the bulk blacklisting path: if one of the
# user's refresh tokens is already blacklisted when the expel
# endpoint runs, the remaining tokens must still be blacklisted
# and the already-blacklisted one must not be duplicated.
tenant1, tenant2, _ = tenants_fixture
_, user3_membership = extra_users
user3, membership3 = user3_membership
# Keep the user alive after the expel so the assertions below can
# still query OutstandingToken by user_id.
Membership.objects.create(
user=user3,
tenant=tenant1,
role=Membership.RoleChoices.MEMBER,
)
RefreshToken.for_user(user3)
RefreshToken.for_user(user3)
outstanding_ids = list(
OutstandingToken.objects.filter(user=user3).values_list("id", flat=True)
)
assert len(outstanding_ids) == 2
# Pre-blacklist one of the two tokens to simulate a prior revocation.
BlacklistedToken.objects.create(token_id=outstanding_ids[0])
assert (
BlacklistedToken.objects.filter(token_id__in=outstanding_ids).count() == 1
)
response = authenticated_client.delete(
reverse(
"tenant-membership-detail",
kwargs={"tenant_pk": tenant2.id, "pk": membership3.id},
)
)
assert response.status_code == status.HTTP_204_NO_CONTENT
blacklisted = BlacklistedToken.objects.filter(token_id__in=outstanding_ids)
assert blacklisted.count() == 2
assert set(blacklisted.values_list("token_id", flat=True)) == set(
outstanding_ids
)
def test_expel_user_keeps_account_if_has_other_memberships(
self, authenticated_client, tenants_fixture, extra_users
):
tenant1, tenant2, _ = tenants_fixture
_, user3_membership = extra_users
user3, membership3 = user3_membership
# Give user3 an additional membership in tenant1 so they are not orphaned
other_membership = Membership.objects.create(
user=user3,
tenant=tenant1,
role=Membership.RoleChoices.MEMBER,
)
response = authenticated_client.delete(
reverse(
"tenant-membership-detail",
kwargs={"tenant_pk": tenant2.id, "pk": membership3.id},
)
)
assert response.status_code == status.HTTP_204_NO_CONTENT
assert not Membership.objects.filter(id=membership3.id).exists()
assert User.objects.filter(id=user3.id).exists()
assert Membership.objects.filter(id=other_membership.id).exists()
def test_tenants_delete_another_membership_as_owner(
self, authenticated_client, tenants_fixture, extra_users
@@ -881,6 +1043,128 @@ class TestTenantViewSet:
assert response.status_code == status.HTTP_404_NOT_FOUND
assert Membership.objects.filter(id=other_membership.id).exists()
def test_delete_membership_cleans_up_orphaned_role_grants(
self, authenticated_client, tenants_fixture
):
"""Test that deleting a membership removes UserRoleRelationship records
for that tenant while preserving grants in other tenants."""
tenant1, tenant2, _ = tenants_fixture
# Create a user with memberships in both tenants
user = User.objects.create_user(
name="Multi-tenant User",
password=TEST_PASSWORD,
email="multitenant@test.com",
)
# Create memberships in both tenants
Membership.objects.create(
user=user, tenant=tenant1, role=Membership.RoleChoices.MEMBER
)
membership2 = Membership.objects.create(
user=user, tenant=tenant2, role=Membership.RoleChoices.MEMBER
)
# Create roles in both tenants
role1 = Role.objects.create(
name="Test Role 1", tenant=tenant1, manage_providers=True
)
role2 = Role.objects.create(
name="Test Role 2", tenant=tenant2, manage_scans=True
)
# Create user role relationships for both tenants
UserRoleRelationship.objects.create(user=user, role=role1, tenant=tenant1)
UserRoleRelationship.objects.create(user=user, role=role2, tenant=tenant2)
# Verify initial state
assert UserRoleRelationship.objects.filter(user=user, tenant=tenant1).exists()
assert UserRoleRelationship.objects.filter(user=user, tenant=tenant2).exists()
assert Role.objects.filter(id=role1.id).exists()
assert Role.objects.filter(id=role2.id).exists()
# Delete membership from tenant2 (authenticated user is owner of tenant2)
response = authenticated_client.delete(
reverse(
"tenant-membership-detail",
kwargs={"tenant_pk": tenant2.id, "pk": membership2.id},
)
)
assert response.status_code == status.HTTP_204_NO_CONTENT
# Verify the membership was deleted
assert not Membership.objects.filter(id=membership2.id).exists()
# Verify UserRoleRelationship for tenant2 was deleted
assert not UserRoleRelationship.objects.filter(
user=user, tenant=tenant2
).exists()
# Verify UserRoleRelationship for tenant1 is preserved
assert UserRoleRelationship.objects.filter(user=user, tenant=tenant1).exists()
# Verify orphaned role2 was deleted (no more user or invitation relationships)
assert not Role.objects.filter(id=role2.id).exists()
# Verify role1 is preserved (still has user relationship)
assert Role.objects.filter(id=role1.id).exists()
# Verify the user still exists (has other memberships)
assert User.objects.filter(id=user.id).exists()
def test_delete_membership_preserves_role_with_invitation_relationship(
self, authenticated_client, tenants_fixture
):
"""Test that roles are not deleted if they have invitation relationships."""
_, tenant2, _ = tenants_fixture
# Create a user with membership
user = User.objects.create_user(
name="Test User", password=TEST_PASSWORD, email="testuser@test.com"
)
membership = Membership.objects.create(
user=user, tenant=tenant2, role=Membership.RoleChoices.MEMBER
)
# Create a role and user relationship
role = Role.objects.create(
name="Shared Role", tenant=tenant2, manage_providers=True
)
UserRoleRelationship.objects.create(user=user, role=role, tenant=tenant2)
# Create an invitation with the same role
invitation = Invitation.objects.create(email="pending@test.com", tenant=tenant2)
InvitationRoleRelationship.objects.create(
invitation=invitation, role=role, tenant=tenant2
)
# Verify initial state
assert UserRoleRelationship.objects.filter(user=user, role=role).exists()
assert InvitationRoleRelationship.objects.filter(
invitation=invitation, role=role
).exists()
assert Role.objects.filter(id=role.id).exists()
# Delete the membership
response = authenticated_client.delete(
reverse(
"tenant-membership-detail",
kwargs={"tenant_pk": tenant2.id, "pk": membership.id},
)
)
assert response.status_code == status.HTTP_204_NO_CONTENT
# Verify UserRoleRelationship was deleted
assert not UserRoleRelationship.objects.filter(user=user, role=role).exists()
# Verify role is preserved because invitation relationship exists
assert Role.objects.filter(id=role.id).exists()
assert InvitationRoleRelationship.objects.filter(
invitation=invitation, role=role
).exists()
def test_tenants_list_no_permissions(
self, authenticated_client_no_permissions_rbac, tenants_fixture
):
@@ -3829,6 +4113,51 @@ class TestScanViewSet:
assert cd.startswith('attachment; filename="')
assert cd.endswith(f'filename="{fname.name}"')
def test_cis_no_output(self, authenticated_client, scans_fixture):
"""CIS PDF endpoint must 404 when the scan has no output_location."""
scan = scans_fixture[0]
scan.state = StateChoices.COMPLETED
scan.output_location = ""
scan.save()
url = reverse("scan-cis", kwargs={"pk": scan.id})
resp = authenticated_client.get(url)
assert resp.status_code == status.HTTP_404_NOT_FOUND
assert (
resp.json()["errors"]["detail"]
== "The scan has no reports, or the CIS report generation task has not started yet."
)
def test_cis_local_file(self, authenticated_client, scans_fixture, monkeypatch):
"""CIS PDF endpoint must serve the latest generated PDF."""
scan = scans_fixture[0]
scan.state = StateChoices.COMPLETED
with tempfile.TemporaryDirectory() as tmp:
tmp_path = Path(tmp)
base = tmp_path / "reports"
cis_dir = base / "cis"
cis_dir.mkdir(parents=True, exist_ok=True)
fname = cis_dir / "prowler-output-aws-20260101000000_cis_report.pdf"
fname.write_bytes(b"%PDF-1.4 fake pdf")
scan.output_location = str(base / "scan.zip")
scan.save()
monkeypatch.setattr(
glob,
"glob",
lambda p: [str(fname)] if p.endswith("*_cis_report.pdf") else [],
)
url = reverse("scan-cis", kwargs={"pk": scan.id})
resp = authenticated_client.get(url)
assert resp.status_code == status.HTTP_200_OK
assert resp["Content-Type"] == "application/pdf"
cd = resp["Content-Disposition"]
assert cd.startswith('attachment; filename="')
assert cd.endswith(f'filename="{fname.name}"')
@patch("api.v1.views.Task.objects.get")
@patch("api.v1.views.TaskSerializer")
def test__get_task_status_returns_none_if_task_not_executing(
@@ -15445,15 +15774,15 @@ class TestFindingGroupViewSet:
# iam_password_policy has only PASS findings
assert data[0]["attributes"]["status"] == "PASS"
def test_finding_groups_fully_muted_group_reflects_underlying_status(
def test_finding_groups_fully_muted_group_is_pass(
self, authenticated_client, finding_groups_fixture
):
"""A fully-muted group still surfaces its underlying status (no MUTED).
"""A fully-muted group reports status=PASS and muted=True.
rds_encryption has 2 muted FAIL findings, so the group must report
status=FAIL (the orthogonal `muted` boolean signals it isn't actionable).
The status×muted breakdown lets clients answer 'how many failing
findings are muted in this group'.
rds_encryption has 2 muted FAIL findings. Muted findings are treated
as resolved/accepted, so the group is no longer actionable and its
status must be PASS. The `muted` flag is True because every finding
in the group is muted.
"""
response = authenticated_client.get(
reverse("finding-group-list"),
@@ -15463,9 +15792,9 @@ class TestFindingGroupViewSet:
data = response.json()["data"]
assert len(data) == 1
attrs = data[0]["attributes"]
assert attrs["status"] == "FAIL"
assert attrs["status"] == "PASS"
assert attrs["muted"] is True
assert attrs["fail_count"] == 2
assert attrs["fail_count"] == 0
assert attrs["fail_muted_count"] == 2
assert attrs["pass_muted_count"] == 0
assert attrs["manual_muted_count"] == 0
@@ -15478,6 +15807,83 @@ class TestFindingGroupViewSet:
== attrs["muted_count"]
)
def test_finding_groups_status_ignores_muted_failures(
self,
authenticated_client,
tenants_fixture,
scans_fixture,
resources_fixture,
):
"""Muted FAIL findings must not drive the aggregated status.
When a group mixes one non-muted PASS with one muted FAIL, the
actionable outcome is PASS: there are no unmuted failures left. The
aggregated `status` must reflect that (not FAIL), while `muted`
stays False because the group still has a non-muted finding.
"""
tenant = tenants_fixture[0]
scan1, *_ = scans_fixture
resource1, *_ = resources_fixture
pass_finding = Finding.objects.create(
tenant_id=tenant.id,
uid="fg_mixed_muted_pass",
scan=scan1,
delta=None,
status=Status.PASS,
severity=Severity.low,
impact=Severity.low,
check_id="mixed_muted_check",
check_metadata={
"CheckId": "mixed_muted_check",
"checktitle": "Mixed muted check",
"Description": "Fixture for muted status aggregation.",
},
first_seen_at="2024-01-11T00:00:00Z",
muted=False,
)
pass_finding.add_resources([resource1])
fail_muted_finding = Finding.objects.create(
tenant_id=tenant.id,
uid="fg_mixed_muted_fail",
scan=scan1,
delta=None,
status=Status.FAIL,
severity=Severity.high,
impact=Severity.high,
check_id="mixed_muted_check",
check_metadata={
"CheckId": "mixed_muted_check",
"checktitle": "Mixed muted check",
"Description": "Fixture for muted status aggregation.",
},
first_seen_at="2024-01-12T00:00:00Z",
muted=True,
)
fail_muted_finding.add_resources([resource1])
# filter[region] forces finding-level aggregation so we exercise the
# raw-findings path without touching the daily summary fixture.
response = authenticated_client.get(
reverse("finding-group-list"),
{
"filter[inserted_at]": TODAY,
"filter[check_id]": "mixed_muted_check",
"filter[region]": "us-east-1",
},
)
assert response.status_code == status.HTTP_200_OK
data = response.json()["data"]
assert len(data) == 1
attrs = data[0]["attributes"]
assert attrs["status"] == "PASS"
assert attrs["muted"] is False
assert attrs["pass_count"] == 1
assert attrs["fail_count"] == 0
assert attrs["fail_muted_count"] == 1
assert attrs["muted_count"] == 1
def test_finding_groups_status_filter(
self, authenticated_client, finding_groups_fixture
):
@@ -16030,6 +16436,36 @@ class TestFindingGroupViewSet:
# s3_bucket_public_access has 2 findings with 2 different resources
assert len(data) == 2
def test_resources_id_matches_resource_id_for_mapped_findings(
self, authenticated_client, finding_groups_fixture
):
"""Findings with a resource expose the resource id as row id (hot path contract)."""
response = authenticated_client.get(
reverse(
"finding-group-resources", kwargs={"pk": "s3_bucket_public_access"}
),
{"filter[inserted_at]": TODAY},
)
assert response.status_code == status.HTTP_200_OK
data = response.json()["data"]
assert data, "expected resources in response"
resource_ids = set(
ResourceFindingMapping.objects.filter(
finding__check_id="s3_bucket_public_access",
).values_list("resource_id", flat=True)
)
finding_ids = set(
Finding.objects.filter(
check_id="s3_bucket_public_access",
).values_list("id", flat=True)
)
returned_ids = {item["id"] for item in data}
assert returned_ids <= {str(rid) for rid in resource_ids}
assert returned_ids.isdisjoint({str(fid) for fid in finding_ids})
def test_resources_fields(self, authenticated_client, finding_groups_fixture):
"""Test resource fields (uid, name, service, region, type) have valid values."""
response = authenticated_client.get(
@@ -17046,6 +17482,57 @@ class TestFindingGroupViewSet:
# Descending boolean: True (1) before False (0)
assert muted_values == sorted(muted_values, reverse=True)
@pytest.mark.parametrize(
"endpoint_name", ["finding-group-list", "finding-group-latest"]
)
@pytest.mark.parametrize(
"sort_field",
[
"pass_muted_count",
"fail_muted_count",
"manual_muted_count",
"new_fail_count",
"new_fail_muted_count",
"new_pass_count",
"new_pass_muted_count",
"new_manual_count",
"new_manual_muted_count",
"changed_fail_count",
"changed_fail_muted_count",
"changed_pass_count",
"changed_pass_muted_count",
"changed_manual_count",
"changed_manual_muted_count",
],
)
def test_finding_groups_sort_by_counter_fields(
self,
authenticated_client,
finding_groups_fixture,
endpoint_name,
sort_field,
):
"""All counter fields are accepted as sort parameters (asc and desc)."""
params = {"sort": f"-{sort_field}"}
if endpoint_name == "finding-group-list":
params["filter[inserted_at]"] = TODAY
response = authenticated_client.get(reverse(endpoint_name), params)
assert response.status_code == status.HTTP_200_OK
data = response.json()["data"]
assert len(data) > 0
desc_values = [item["attributes"][sort_field] for item in data]
assert desc_values == sorted(desc_values, reverse=True)
params["sort"] = sort_field
response = authenticated_client.get(reverse(endpoint_name), params)
assert response.status_code == status.HTTP_200_OK
asc_values = [
item["attributes"][sort_field] for item in response.json()["data"]
]
assert asc_values == sorted(asc_values)
@pytest.mark.parametrize(
"endpoint_name", ["finding-group-list", "finding-group-latest"]
)
@@ -17189,3 +17676,111 @@ class TestFindingGroupViewSet:
attrs = item["attributes"]
assert "finding_id" in attrs
assert attrs["finding_id"] in rds_finding_ids
def test_latest_resources_picks_scan_by_completed_at_when_overlap(
self,
authenticated_client,
tenants_fixture,
providers_fixture,
resources_fixture,
):
"""Overlapping scans on the same provider must resolve to the scan
with the latest completed_at, matching the /latest summary path and
the daily-summary upsert (keyed on midnight(completed_at)). Picking
by inserted_at here produced /resources and /latest reading from
different scans and reporting diverging delta/new counts.
"""
tenant = tenants_fixture[0]
provider = providers_fixture[0]
resource = resources_fixture[0]
check_id = "overlap_regression_check"
t0 = datetime.now(timezone.utc) - timedelta(hours=5)
t1 = t0 + timedelta(hours=1)
t1_end = t1 + timedelta(minutes=30)
t2 = t0 + timedelta(hours=4)
scan_long = Scan.objects.create(
name="long overlap scan",
provider=provider,
trigger=Scan.TriggerChoices.MANUAL,
state=StateChoices.COMPLETED,
tenant_id=tenant.id,
started_at=t0,
completed_at=t2,
)
scan_short = Scan.objects.create(
name="short overlap scan",
provider=provider,
trigger=Scan.TriggerChoices.MANUAL,
state=StateChoices.COMPLETED,
tenant_id=tenant.id,
started_at=t1,
completed_at=t1_end,
)
# inserted_at is auto_now_add so override with .update() to recreate
# the overlap shape: short scan inserted later but completed earlier.
Scan.all_objects.filter(pk=scan_long.pk).update(inserted_at=t0)
Scan.all_objects.filter(pk=scan_short.pk).update(inserted_at=t1)
scan_long.refresh_from_db()
scan_short.refresh_from_db()
assert scan_short.inserted_at > scan_long.inserted_at
assert scan_long.completed_at > scan_short.completed_at
long_finding = Finding.objects.create(
tenant_id=tenant.id,
uid=f"{check_id}_long",
scan=scan_long,
delta=None,
status=Status.FAIL,
status_extended="long scan finding",
impact=Severity.high,
impact_extended="high",
severity=Severity.high,
raw_result={"status": Status.FAIL, "severity": Severity.high},
check_id=check_id,
check_metadata={
"CheckId": check_id,
"checktitle": "Overlap regression",
"Description": "Overlapping scan regression.",
},
first_seen_at=t0,
muted=False,
)
long_finding.add_resources([resource])
short_finding = Finding.objects.create(
tenant_id=tenant.id,
uid=f"{check_id}_short",
scan=scan_short,
delta="new",
status=Status.FAIL,
status_extended="short scan finding",
impact=Severity.high,
impact_extended="high",
severity=Severity.high,
raw_result={"status": Status.FAIL, "severity": Severity.high},
check_id=check_id,
check_metadata={
"CheckId": check_id,
"checktitle": "Overlap regression",
"Description": "Overlapping scan regression.",
},
first_seen_at=t1,
muted=False,
)
short_finding.add_resources([resource])
response = authenticated_client.get(
reverse(
"finding-group-latest_resources",
kwargs={"check_id": check_id},
),
)
assert response.status_code == status.HTTP_200_OK
data = response.json()["data"]
assert len(data) == 1
attrs = data[0]["attributes"]
assert attrs["finding_id"] == str(long_finding.id)
assert attrs["delta"] is None
+3 -2
View File
@@ -4225,10 +4225,11 @@ class FindingGroupResourceSerializer(BaseSerializerV1):
Serializer for Finding Group Resources - resources within a finding group.
Returns individual resources with their current status, severity,
and timing information.
and timing information. Orphan findings (without any resource) expose the
finding id as `id` so the row stays identifiable in the UI.
"""
id = serializers.UUIDField(source="resource_id")
id = serializers.UUIDField(source="row_id")
resource = serializers.SerializerMethodField()
provider = serializers.SerializerMethodField()
finding_id = serializers.UUIDField()
+461 -66
View File
@@ -35,11 +35,13 @@ from django.db.models import (
CharField,
Count,
DecimalField,
Exists,
ExpressionWrapper,
F,
IntegerField,
Max,
Min,
OuterRef,
Prefetch,
Q,
QuerySet,
@@ -81,6 +83,10 @@ from rest_framework.permissions import SAFE_METHODS
from rest_framework_json_api import filters as jsonapi_filters
from rest_framework_json_api.views import RelationshipView, Response
from rest_framework_simplejwt.exceptions import InvalidToken, TokenError
from rest_framework_simplejwt.token_blacklist.models import (
BlacklistedToken,
OutstandingToken,
)
from tasks.beat import schedule_provider_scan
from tasks.jobs.attack_paths import db_utils as attack_paths_db_utils
from tasks.jobs.export import get_s3_client
@@ -167,6 +173,7 @@ from api.models import (
FindingGroupDailySummary,
Integration,
Invitation,
InvitationRoleRelationship,
LighthouseConfiguration,
LighthouseProviderConfiguration,
LighthouseProviderModels,
@@ -415,7 +422,7 @@ class SchemaView(SpectacularAPIView):
def get(self, request, *args, **kwargs):
spectacular_settings.TITLE = "Prowler API"
spectacular_settings.VERSION = "1.24.1"
spectacular_settings.VERSION = "1.26.0"
spectacular_settings.DESCRIPTION = (
"Prowler API specification.\n\nThis file is auto-generated."
)
@@ -1328,9 +1335,11 @@ class MembershipViewSet(BaseTenantViewset):
),
destroy=extend_schema(
summary="Delete tenant memberships",
description="Delete the membership details of users in a tenant. You need to be one of the owners to delete a "
"membership that is not yours. If you are the last owner of a tenant, you cannot delete your own "
"membership.",
description="Delete a user's membership from a tenant. This action: (1) removes the membership, "
"(2) revokes all refresh tokens for the expelled user, (3) removes their role grants for this tenant, "
"(4) cleans up orphaned roles, and (5) deletes the user account if this was their last membership. "
"You must be a tenant owner to delete another user's membership. The last owner of a tenant cannot "
"delete their own membership.",
tags=["Tenant"],
),
)
@@ -1339,6 +1348,7 @@ class TenantMembersViewSet(BaseTenantViewset):
http_method_names = ["get", "delete"]
serializer_class = MembershipSerializer
queryset = Membership.objects.none()
filterset_class = MembershipFilter
# Authorization is handled by get_requesting_membership (owner/member checks),
# not by RBAC, since the target tenant differs from the JWT tenant.
required_permissions = []
@@ -1396,7 +1406,84 @@ class TenantMembersViewSet(BaseTenantViewset):
"You do not have permission to delete this membership."
)
membership_to_delete.delete()
user_to_check_id = membership_to_delete.user_id
tenant_id = membership_to_delete.tenant_id
# All writes run on the admin connection so that the uncommitted
# membership delete is visible to the subsequent "other memberships"
# check. Splitting the delete and the check across the default
# (prowler_user, RLS) and admin connections caused the admin side to
# miss the just-deleted row and leave the User row orphaned.
with transaction.atomic(using=MainRouter.admin_db):
Membership.objects.using(MainRouter.admin_db).filter(
id=membership_to_delete.id
).delete()
# Remove role grants for this user in this tenant to prevent
# orphaned permissions that could allow access after expulsion
deleted_role_relationships = UserRoleRelationship.objects.using(
MainRouter.admin_db
).filter(user_id=user_to_check_id, tenant_id=tenant_id)
# Collect role IDs that might become orphaned after deletion
role_ids_to_check = list(
deleted_role_relationships.values_list("role_id", flat=True)
)
# Delete the user role relationships for this tenant
deleted_role_relationships.delete()
# Clean up orphaned roles that have no remaining user or invitation relationships
if role_ids_to_check:
for role_id in role_ids_to_check:
has_user_relationships = (
UserRoleRelationship.objects.using(MainRouter.admin_db)
.filter(role_id=role_id)
.exists()
)
has_invitation_relationships = (
InvitationRoleRelationship.objects.using(MainRouter.admin_db)
.filter(role_id=role_id)
.exists()
)
if not has_user_relationships and not has_invitation_relationships:
Role.objects.using(MainRouter.admin_db).filter(
id=role_id
).delete()
# Revoke any refresh tokens the expelled user still holds so they
# cannot mint fresh access tokens. This must happen before the
# User row is deleted, because OutstandingToken.user is
# on_delete=SET_NULL in djangorestframework-simplejwt 5.5.1
# (see rest_framework_simplejwt/token_blacklist/models.py): once
# the user row is gone, user_id becomes NULL and we can no longer
# look up that user's outstanding tokens. Access tokens already
# issued remain valid until SIMPLE_JWT["ACCESS_TOKEN_LIFETIME"]
# expires.
outstanding_token_ids = list(
OutstandingToken.objects.using(MainRouter.admin_db)
.filter(user_id=user_to_check_id)
.values_list("id", flat=True)
)
if outstanding_token_ids:
BlacklistedToken.objects.using(MainRouter.admin_db).bulk_create(
[
BlacklistedToken(token_id=token_id)
for token_id in outstanding_token_ids
],
ignore_conflicts=True,
)
has_other_memberships = (
Membership.objects.using(MainRouter.admin_db)
.filter(user_id=user_to_check_id)
.exists()
)
if not has_other_memberships:
User.objects.using(MainRouter.admin_db).filter(
id=user_to_check_id
).delete()
return Response(status=status.HTTP_204_NO_CONTENT)
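A minimal sketch of the token-revocation step in isolation, assuming djangorestframework-simplejwt's token_blacklist app is installed and the default database alias is used (the helper name is illustrative, not from the diff):
from rest_framework_simplejwt.token_blacklist.models import (
    BlacklistedToken,
    OutstandingToken,
)


def revoke_refresh_tokens(user_id) -> int:
    """Blacklist every outstanding refresh token for a user; idempotent by design."""
    token_ids = list(
        OutstandingToken.objects.filter(user_id=user_id).values_list("id", flat=True)
    )
    if token_ids:
        # ignore_conflicts skips tokens that are already blacklisted instead of
        # raising on the unique token_id constraint, which is what keeps the
        # expel endpoint idempotent when some tokens were revoked earlier.
        BlacklistedToken.objects.bulk_create(
            [BlacklistedToken(token_id=token_id) for token_id in token_ids],
            ignore_conflicts=True,
        )
    return len(token_ids)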
@@ -1839,6 +1926,27 @@ class ProviderViewSet(DisablePaginationMixin, BaseRLSViewSet):
),
},
),
cis=extend_schema(
tags=["Scan"],
summary="Retrieve CIS Benchmark compliance report",
description="Download the CIS Benchmark compliance report as a PDF file. "
"When a provider ships multiple CIS versions, the report is generated "
"for the highest available version.",
request=None,
responses={
200: OpenApiResponse(
description="PDF file containing the CIS compliance report"
),
202: OpenApiResponse(description="The task is in progress"),
401: OpenApiResponse(
description="API key missing or user not Authenticated"
),
403: OpenApiResponse(description="There is a problem with credentials"),
404: OpenApiResponse(
description="The scan has no CIS reports, or the CIS report generation task has not started yet"
),
},
),
)
@method_decorator(CACHE_DECORATOR, name="list")
@method_decorator(CACHE_DECORATOR, name="retrieve")
@@ -1907,6 +2015,9 @@ class ScanViewSet(BaseRLSViewSet):
elif self.action == "csa":
if hasattr(self, "response_serializer_class"):
return self.response_serializer_class
elif self.action == "cis":
if hasattr(self, "response_serializer_class"):
return self.response_serializer_class
return super().get_serializer_class()
def partial_update(self, request, *args, **kwargs):
@@ -2149,6 +2260,45 @@ class ScanViewSet(BaseRLSViewSet):
content, filename = loader
return self._serve_file(content, filename, "text/csv")
@action(
detail=True,
methods=["get"],
url_name="cis",
)
def cis(self, request, pk=None):
scan = self.get_object()
running_resp = self._get_task_status(scan)
if running_resp:
return running_resp
if not scan.output_location:
return Response(
{
"detail": "The scan has no reports, or the CIS report generation task has not started yet."
},
status=status.HTTP_404_NOT_FOUND,
)
if scan.output_location.startswith("s3://"):
bucket = env.str("DJANGO_OUTPUT_S3_AWS_OUTPUT_BUCKET", "")
key_prefix = scan.output_location.removeprefix(f"s3://{bucket}/")
prefix = os.path.join(
os.path.dirname(key_prefix),
"cis",
"*_cis_report.pdf",
)
loader = self._load_file(prefix, s3=True, bucket=bucket, list_objects=True)
else:
base = os.path.dirname(scan.output_location)
pattern = os.path.join(base, "cis", "*_cis_report.pdf")
loader = self._load_file(pattern, s3=False)
if isinstance(loader, Response):
return loader
content, filename = loader
return self._serve_file(content, filename, "application/pdf")
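For reference, a small sketch of the pattern the local-file branch above ends up globbing (the output path is illustrative):
import glob
import os

output_location = "/outputs/scan-123/scan.zip"  # illustrative, not a real output_location
pattern = os.path.join(os.path.dirname(output_location), "cis", "*_cis_report.pdf")
# pattern == "/outputs/scan-123/cis/*_cis_report.pdf"
matches = glob.glob(pattern)  # _load_file resolves a match and the view serves it as application/pdf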
@action(
detail=True,
methods=["get"],
@@ -7125,17 +7275,16 @@ class FindingGroupViewSet(BaseRLSViewSet):
output_field=IntegerField(),
)
# `pass_count`, `fail_count` and `manual_count` count *every* finding
# for the check (muted or not) so the aggregated `status` reflects the
# underlying check outcome regardless of mute state. Whether the group
# is actionable is signalled by the orthogonal `muted` flag below.
# `pass_count`, `fail_count` and `manual_count` only count non-muted
# findings. Muted findings are tracked separately via the
# `*_muted_count` fields.
return (
queryset.values("check_id")
.annotate(
severity_order=Max(severity_case),
pass_count=Count("id", filter=Q(status="PASS")),
fail_count=Count("id", filter=Q(status="FAIL")),
manual_count=Count("id", filter=Q(status="MANUAL")),
pass_count=Count("id", filter=Q(status="PASS", muted=False)),
fail_count=Count("id", filter=Q(status="FAIL", muted=False)),
manual_count=Count("id", filter=Q(status="MANUAL", muted=False)),
pass_muted_count=Count("id", filter=Q(status="PASS", muted=True)),
fail_muted_count=Count("id", filter=Q(status="FAIL", muted=True)),
manual_muted_count=Count("id", filter=Q(status="MANUAL", muted=True)),
@@ -7280,12 +7429,18 @@ class FindingGroupViewSet(BaseRLSViewSet):
# finding-level aggregation path.
row.pop("nonmuted_count", None)
# Compute aggregated status. Counts are inclusive of muted findings,
# so the underlying check outcome surfaces even when the group is
# fully muted.
# Muted findings are treated as resolved/accepted, so they do not
# contribute to a failing status. A group is FAIL only when there
# is at least one non-muted FAIL; otherwise any pass (muted or
# not) or any muted fail makes the group PASS. Only groups whose
# findings are exclusively MANUAL fall through to MANUAL.
if row.get("fail_count", 0) > 0:
row["status"] = "FAIL"
elif row.get("pass_count", 0) > 0:
elif (
row.get("pass_count", 0) > 0
or row.get("pass_muted_count", 0) > 0
or row.get("fail_muted_count", 0) > 0
):
row["status"] = "PASS"
else:
row["status"] = "MANUAL"
@@ -7311,8 +7466,23 @@ class FindingGroupViewSet(BaseRLSViewSet):
"pass_count": "pass_count",
"manual_count": "manual_count",
"muted_count": "muted_count",
"pass_muted_count": "pass_muted_count",
"fail_muted_count": "fail_muted_count",
"manual_muted_count": "manual_muted_count",
"new_count": "new_count",
"new_fail_count": "new_fail_count",
"new_fail_muted_count": "new_fail_muted_count",
"new_pass_count": "new_pass_count",
"new_pass_muted_count": "new_pass_muted_count",
"new_manual_count": "new_manual_count",
"new_manual_muted_count": "new_manual_muted_count",
"changed_count": "changed_count",
"changed_fail_count": "changed_fail_count",
"changed_fail_muted_count": "changed_fail_muted_count",
"changed_pass_count": "changed_pass_count",
"changed_pass_muted_count": "changed_pass_muted_count",
"changed_manual_count": "changed_manual_count",
"changed_manual_muted_count": "changed_manual_muted_count",
"resources_total": "resources_total",
"resources_fail": "resources_fail",
"first_seen_at": "agg_first_seen_at",
@@ -7373,6 +7543,8 @@ class FindingGroupViewSet(BaseRLSViewSet):
aggregated_status=Case(
When(fail_count__gt=0, then=Value("FAIL")),
When(pass_count__gt=0, then=Value("PASS")),
When(pass_muted_count__gt=0, then=Value("PASS")),
When(fail_muted_count__gt=0, then=Value("PASS")),
default=Value("MANUAL"),
output_field=CharField(),
)
@@ -7392,6 +7564,25 @@ class FindingGroupViewSet(BaseRLSViewSet):
return filterset.qs
def _resolve_finding_ids(self, filtered_queryset):
"""
Materialize and request-cache the finding_ids list used to anchor
ResourceFindingMapping (RFM) lookups.
Turning `finding_id__in=Subquery(findings_qs)` into `finding_id__in=
[uuid, ...]` nudges PostgreSQL out of a Merge Semi Join that ends up
reading hundreds of thousands of RFM index entries just to post-
filter tenant_id. Caching on the ViewSet instance (one instance per
request) avoids duplicating the findings round-trip when several
helpers build different RFM querysets from the same filtered set.
"""
cached = getattr(self, "_finding_ids_cache", None)
if cached is not None and cached[0] is filtered_queryset:
return cached[1]
finding_ids = list(filtered_queryset.order_by().values_list("id", flat=True))
self._finding_ids_cache = (filtered_queryset, finding_ids)
return finding_ids
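A short usage sketch of the request-scoped cache (view and filtered_qs are assumed stand-ins for the viewset instance and its filtered findings queryset); because the cache is keyed on the queryset object's identity, repeated calls within the same request skip the extra round trip:
ids_first = view._resolve_finding_ids(filtered_qs)   # materializes the id list once
ids_again = view._resolve_finding_ids(filtered_qs)   # same queryset object: cache hit
assert ids_again is ids_first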
def _build_resource_mapping_queryset(
self, filtered_queryset, resource_ids=None, tenant_id: str | None = None
):
@@ -7401,10 +7592,10 @@ class FindingGroupViewSet(BaseRLSViewSet):
Starting from ResourceFindingMapping avoids scanning all mappings
before applying check_id/date filters on findings.
"""
finding_ids = filtered_queryset.order_by().values("id")
finding_ids = self._resolve_finding_ids(filtered_queryset)
mapping_queryset = ResourceFindingMapping.objects.filter(
finding_id__in=Subquery(finding_ids)
finding_id__in=finding_ids
)
if tenant_id:
mapping_queryset = mapping_queryset.filter(tenant_id=tenant_id)
@@ -7563,6 +7754,53 @@ class FindingGroupViewSet(BaseRLSViewSet):
.order_by(*ordering)
)
def _orphan_findings_queryset(self, filtered_queryset, finding_ids=None):
"""Findings in the filtered set with no ResourceFindingMapping entries."""
orphan_qs = filtered_queryset.filter(
~Exists(ResourceFindingMapping.objects.filter(finding_id=OuterRef("pk")))
)
if finding_ids is not None:
orphan_qs = orphan_qs.filter(id__in=finding_ids)
return orphan_qs
def _has_orphan_findings(self, filtered_queryset) -> bool:
"""Return True if any finding in the filtered set has no resource mapping."""
return self._orphan_findings_queryset(filtered_queryset).exists()
def _orphan_aggregation_values(self, orphan_queryset):
"""Raw rows for orphan findings; resource payload synthesized from metadata.
check_metadata is stored with lowercase keys (see
`prowler.lib.outputs.finding.Finding.get_metadata`) and
`Finding.resource_groups` is already denormalized at ingest time.
"""
return orphan_queryset.annotate(
_provider_type=F("scan__provider__provider"),
_provider_uid=F("scan__provider__uid"),
_provider_alias=F("scan__provider__alias"),
_svc=KeyTextTransform("servicename", "check_metadata"),
_region=KeyTextTransform("region", "check_metadata"),
_rtype=KeyTextTransform("resourcetype", "check_metadata"),
_rgroup=F("resource_groups"),
).values(
"id",
"uid",
"status",
"severity",
"delta",
"muted",
"muted_reason",
"first_seen_at",
"inserted_at",
"_provider_type",
"_provider_uid",
"_provider_alias",
"_svc",
"_region",
"_rtype",
"_rgroup",
)
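A minimal sketch of the metadata extraction assumed above, using the same lowercase JSON keys (the check id is hypothetical, and Finding is assumed to be the findings model imported in this module):
from django.db.models.fields.json import KeyTextTransform

orphan_services = (
    Finding.objects.filter(check_id="iac_example_check")  # hypothetical check id
    .annotate(svc=KeyTextTransform("servicename", "check_metadata"))
    .values_list("svc", flat=True)
)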
def _post_process_resources(self, resource_data):
"""Convert resource aggregation rows to API output."""
results = []
@@ -7584,9 +7822,13 @@ class FindingGroupViewSet(BaseRLSViewSet):
else:
delta = None
resource_id = row["resource_id"]
finding_id = str(row["finding_id"]) if row.get("finding_id") else None
results.append(
{
"resource_id": row["resource_id"],
"row_id": resource_id,
"resource_id": resource_id,
"resource_uid": row["resource_uid"],
"resource_name": row["resource_name"],
"resource_service": row["resource_service"],
@@ -7605,9 +7847,46 @@ class FindingGroupViewSet(BaseRLSViewSet):
"muted": bool(row.get("muted", False)),
"muted_reason": row.get("muted_reason"),
"resource_group": row.get("resource_group", ""),
"finding_id": (
str(row["finding_id"]) if row.get("finding_id") else None
),
"finding_id": finding_id,
}
)
return results
def _post_process_orphans(self, orphan_rows):
"""Convert orphan finding rows into the same API shape as mapping rows."""
results = []
for row in orphan_rows:
status_val = row["status"]
status = status_val if status_val in ("FAIL", "PASS") else "MANUAL"
muted = bool(row["muted"])
delta_val = row.get("delta")
delta = delta_val if delta_val in ("new", "changed") and not muted else None
finding_id = str(row["id"])
results.append(
{
"row_id": finding_id,
"resource_id": None,
"resource_uid": row["uid"],
"resource_name": row["uid"],
"resource_service": row["_svc"] or "",
"resource_region": row["_region"] or "",
"resource_type": row["_rtype"] or "",
"provider_type": row["_provider_type"],
"provider_uid": row["_provider_uid"],
"provider_alias": row["_provider_alias"],
"status": status,
"severity": row["severity"],
"delta": delta,
"first_seen_at": row["first_seen_at"],
"last_seen_at": row["inserted_at"],
"muted": muted,
"muted_reason": row.get("muted_reason"),
"resource_group": row["_rgroup"] or "",
"finding_id": finding_id,
}
)
@@ -7668,26 +7947,18 @@ class FindingGroupViewSet(BaseRLSViewSet):
sort_param, self._FINDING_GROUP_SORT_MAP
)
if ordering:
# status_order is annotated on demand so groups can be sorted by
# their aggregated status (FAIL > PASS > MANUAL), mirroring the
# priority used in _post_process_aggregation. Counts are
# inclusive of muted findings, so the underlying check outcome
# surfaces even for fully muted groups.
if any(field.lstrip("-") == "status_order" for field in ordering):
aggregated_queryset = aggregated_queryset.annotate(
status_order=Case(
When(fail_count__gt=0, then=Value(3)),
When(pass_count__gt=0, then=Value(2)),
When(pass_muted_count__gt=0, then=Value(2)),
When(fail_muted_count__gt=0, then=Value(2)),
default=Value(1),
output_field=IntegerField(),
)
)
# delta_order is a virtual sort field: expand it to a
# lexicographic ordering by (new_count, changed_count) so groups
# with more new findings rank higher, with changed_count as the
# tie-breaker (preserves the "new > changed" priority used by
# the resources endpoint, but driven by the actual counters).
expanded_ordering = []
for field in ordering:
if field.lstrip("-") == "delta_order":
@@ -7721,41 +7992,65 @@ class FindingGroupViewSet(BaseRLSViewSet):
def _paginated_resource_response(
self, request, filtered_queryset, resource_ids, tenant_id
):
"""Paginate and return resources.
"""Paginate and return resources, appending orphan findings when present.
Without sort: paginate lightweight resource IDs first, aggregate only the page.
With sort: build a lightweight ordering subquery (resource_id + sort keys),
paginate that, then aggregate full details only for the page.
Hot path (no orphans, or resource filter applied): resources come from
ResourceFindingMapping aggregation. Untouched pre-existing behaviour.
Orphan fallback: findings without a mapping (e.g. IaC) are appended
after mapping rows as synthesised resource-like rows so they remain
visible in the UI without paying the aggregation cost on the hot path.
"""
sort_param = request.query_params.get("sort")
ordering = None
if sort_param:
ordering = self._validate_sort_fields(sort_param, self._RESOURCE_SORT_MAP)
if ordering:
if "resource_id" not in {field.lstrip("-") for field in ordering}:
ordering.append("resource_id")
validated = self._validate_sort_fields(sort_param, self._RESOURCE_SORT_MAP)
ordering = validated if validated else None
# Phase 1: lightweight aggregation with only sort keys, paginate
ordering_qs = self._build_resource_ordering_queryset(
filtered_queryset,
resource_ids=resource_ids,
tenant_id=tenant_id,
ordering=ordering,
)
page = self.paginate_queryset(ordering_qs)
if page is not None:
page_ids = [row["resource_id"] for row in page]
resource_data = self._build_resource_aggregation(
filtered_queryset, resource_ids=page_ids, tenant_id=tenant_id
)
# Re-sort to match the page ordering
id_order = {rid: idx for idx, rid in enumerate(page_ids)}
results = self._post_process_resources(resource_data)
results.sort(key=lambda r: id_order.get(r["resource_id"], 0))
serializer = FindingGroupResourceSerializer(results, many=True)
return self.get_paginated_response(serializer.data)
# Resource filters can only match findings with resources; skip orphan
# detection entirely when they are present.
if resource_ids is not None:
return self._mapping_paginated_response(
request, filtered_queryset, resource_ids, tenant_id, ordering
)
page_ids = [row["resource_id"] for row in ordering_qs]
# Serve the mapping response directly and piggyback on the paginator
# count to detect orphan-only groups, instead of paying a separate
# has_mappings.exists() semi-join over ResourceFindingMapping on
# every non-IaC request. TODO: once the ephemeral resources strategy
# is decided, mixed groups should route to _combined_paginated_response.
response = self._mapping_paginated_response(
request, filtered_queryset, resource_ids, tenant_id, ordering
)
page = getattr(self.paginator, "page", None)
mapping_total = page.paginator.count if page is not None else None
if mapping_total == 0:
# Pure orphan group (e.g. IaC): synthesize resource-like rows.
return self._combined_paginated_response(
request, filtered_queryset, tenant_id, ordering
)
return response
def _mapping_paginated_response(
self, request, filtered_queryset, resource_ids, tenant_id, ordering
):
"""Mapping-only paginated response (original fast path)."""
if ordering:
if "resource_id" not in {field.lstrip("-") for field in ordering}:
ordering.append("resource_id")
# Phase 1: lightweight aggregation with only sort keys, paginate
ordering_qs = self._build_resource_ordering_queryset(
filtered_queryset,
resource_ids=resource_ids,
tenant_id=tenant_id,
ordering=ordering,
)
page = self.paginate_queryset(ordering_qs)
if page is not None:
page_ids = [row["resource_id"] for row in page]
resource_data = self._build_resource_aggregation(
filtered_queryset, resource_ids=page_ids, tenant_id=tenant_id
)
@@ -7763,10 +8058,18 @@ class FindingGroupViewSet(BaseRLSViewSet):
results = self._post_process_resources(resource_data)
results.sort(key=lambda r: id_order.get(r["resource_id"], 0))
serializer = FindingGroupResourceSerializer(results, many=True)
return Response(serializer.data)
return self.get_paginated_response(serializer.data)
page_ids = [row["resource_id"] for row in ordering_qs]
resource_data = self._build_resource_aggregation(
filtered_queryset, resource_ids=page_ids, tenant_id=tenant_id
)
id_order = {rid: idx for idx, rid in enumerate(page_ids)}
results = self._post_process_resources(resource_data)
results.sort(key=lambda r: id_order.get(r["resource_id"], 0))
serializer = FindingGroupResourceSerializer(results, many=True)
return Response(serializer.data)
# No sort (or only empty sort fragments): paginate lightweight resource IDs
# first, aggregate only the page.
mapping_qs = self._build_resource_mapping_queryset(
filtered_queryset, resource_ids=resource_ids, tenant_id=tenant_id
)
@@ -7794,6 +8097,95 @@ class FindingGroupViewSet(BaseRLSViewSet):
serializer = FindingGroupResourceSerializer(results, many=True)
return Response(serializer.data)
def _combined_paginated_response(
self, request, filtered_queryset, tenant_id, ordering
):
"""Mapping rows + orphan findings appended at end.
Orphans sit after mapping rows regardless of sort. This keeps the
mapping-only code path intact for checks that have no orphans (the
common case) and avoids paying UNION/coalesce costs there.
"""
mapping_qs = self._build_resource_mapping_queryset(
filtered_queryset, resource_ids=None, tenant_id=tenant_id
)
mapping_count = mapping_qs.values("resource_id").distinct().count()
orphan_ids = list(
self._orphan_findings_queryset(filtered_queryset)
.order_by("id")
.values_list("id", flat=True)
)
orphan_count = len(orphan_ids)
total = mapping_count + orphan_count
# Paginate a simple [0..total) index sequence so DRF produces proper
# links/meta; then slice mapping / orphan sources accordingly.
page = self.paginate_queryset(range(total))
page_indices = list(page) if page is not None else list(range(total))
mapping_indices = [i for i in page_indices if i < mapping_count]
orphan_positions = [
i - mapping_count for i in page_indices if i >= mapping_count
]
mapping_results = []
if mapping_indices:
start = mapping_indices[0]
stop = mapping_indices[-1] + 1
if ordering:
ordering_fields = list(ordering)
if "resource_id" not in {
field.lstrip("-") for field in ordering_fields
}:
ordering_fields.append("resource_id")
ordered_qs = self._build_resource_ordering_queryset(
filtered_queryset,
resource_ids=None,
tenant_id=tenant_id,
ordering=ordering_fields,
)
slice_rids = [row["resource_id"] for row in ordered_qs[start:stop]]
else:
slice_rids = list(
mapping_qs.values_list("resource_id", flat=True)
.distinct()
.order_by("resource_id")[start:stop]
)
if slice_rids:
resource_data = self._build_resource_aggregation(
filtered_queryset,
resource_ids=slice_rids,
tenant_id=tenant_id,
)
rows_by_rid = {row["resource_id"]: row for row in resource_data}
ordered_rows = [
rows_by_rid[rid] for rid in slice_rids if rid in rows_by_rid
]
mapping_results = self._post_process_resources(ordered_rows)
orphan_results = []
if orphan_positions:
slice_fids = [orphan_ids[pos] for pos in orphan_positions]
raw_rows = list(
self._orphan_aggregation_values(
self._orphan_findings_queryset(
filtered_queryset, finding_ids=slice_fids
)
)
)
rows_by_fid = {row["id"]: row for row in raw_rows}
ordered_rows = [
rows_by_fid[fid] for fid in slice_fids if fid in rows_by_fid
]
orphan_results = self._post_process_orphans(ordered_rows)
results = mapping_results + orphan_results
serializer = FindingGroupResourceSerializer(results, many=True)
if page is not None:
return self.get_paginated_response(serializer.data)
return Response(serializer.data)
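The virtual-index pagination above can be followed with a tiny worked example (counts and page are illustrative):
mapping_count, orphan_count = 7, 3              # 7 mapped resources, 3 orphan findings
page_indices = [5, 6, 7, 8]                     # page 2 with a page size of 4 over range(10)
mapping_indices = [i for i in page_indices if i < mapping_count]                    # [5, 6]
orphan_positions = [i - mapping_count for i in page_indices if i >= mapping_count]  # [0, 1]
# Rows 5 and 6 come from the mapping aggregation slice; orphan findings 0 and 1 are
# appended after them, so orphans always render after resource rows regardless of sort.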
def list(self, request, *args, **kwargs):
"""
List finding groups with aggregation and filtering.
@@ -7923,10 +8315,13 @@ class FindingGroupViewSet(BaseRLSViewSet):
tenant_id = request.tenant_id
queryset = self._get_finding_queryset()
# Get latest completed scan for each provider
# Order by -completed_at (matching the /latest summary path and the
# daily summary upsert keyed on midnight(completed_at)) so that
# overlapping scans do not make /resources and /latest read from
# different scans and report diverging counts.
latest_scan_ids = (
Scan.objects.filter(tenant_id=tenant_id, state=StateChoices.COMPLETED)
.order_by("provider_id", "-inserted_at")
.order_by("provider_id", "-completed_at", "-inserted_at")
.distinct("provider_id")
.values_list("id", flat=True)
)
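On PostgreSQL, `.distinct("provider_id")` with that ordering is a DISTINCT ON query; an illustrative, runnable in-memory equivalent of what it selects (scan rows and provider id are made up):
from collections import namedtuple
from datetime import datetime, timedelta, timezone
from itertools import groupby

ScanRow = namedtuple("ScanRow", "name provider_id completed_at")  # stand-in for Scan rows
now = datetime.now(timezone.utc)
completed_scans = [
    ScanRow("short overlap scan", "provider-1", now - timedelta(hours=3, minutes=30)),
    ScanRow("long overlap scan", "provider-1", now - timedelta(hours=1)),
]
by_provider = sorted(completed_scans, key=lambda s: s.provider_id)
latest_per_provider = [
    max(group, key=lambda s: s.completed_at)  # ties fall back to inserted_at in the real query
    for _, group in groupby(by_provider, key=lambda s: s.provider_id)
]
assert latest_per_provider[0].name == "long overlap scan"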
+3 -1
View File
@@ -17,8 +17,10 @@ celery_app.config_from_object("django.conf:settings", namespace="CELERY")
celery_app.conf.update(result_extended=True, result_expires=None)
celery_app.conf.broker_transport_options = {
"visibility_timeout": BROKER_VISIBILITY_TIMEOUT
"visibility_timeout": BROKER_VISIBILITY_TIMEOUT,
"queue_order_strategy": "priority",
}
celery_app.conf.task_default_priority = 6
celery_app.conf.result_backend_transport_options = {
"visibility_timeout": BROKER_VISIBILITY_TIMEOUT
}
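With queue_order_strategy set to priority on the Redis broker and a default task priority of 6, individual tasks can still jump the queue at call time; a hedged example (the task name and argument are hypothetical):
# With the Redis transport, Celery emulates priorities with one list per priority
# step and drains the lower-numbered lists first, so priority=0 is served before
# the default priority-6 backlog.
generate_compliance_report.apply_async(args=[scan_id], priority=0)  # hypothetical task and arg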
+1 -1
View File
@@ -15,7 +15,7 @@ from config.django.production import LOGGING as DJANGO_LOGGERS, DEBUG # noqa: E
from config.custom_logging import BackendLogger # noqa: E402
BIND_ADDRESS = env("DJANGO_BIND_ADDRESS", default="127.0.0.1")
PORT = env("DJANGO_PORT", default=8000)
PORT = env("DJANGO_PORT", default=8080)
# Server settings
bind = f"{BIND_ADDRESS}:{PORT}"
@@ -120,6 +120,7 @@ sentry_sdk.init(
# see https://docs.sentry.io/platforms/python/data-management/data-collected/ for more info
before_send=before_send,
send_default_pii=True,
traces_sample_rate=env.float("DJANGO_SENTRY_TRACES_SAMPLE_RATE", default=0.02),
_experiments={
# Set continuous_profiling_auto_start to True
# to automatically start the profiler on when
+4 -4
View File
@@ -14,8 +14,8 @@ from rest_framework import status
from rest_framework.test import APIClient
from tasks.jobs.backfill import (
backfill_resource_scan_summaries,
backfill_scan_category_summaries,
backfill_scan_resource_group_summaries,
aggregate_scan_category_summaries,
aggregate_scan_resource_group_summaries,
)
from api.attack_paths import (
@@ -1445,8 +1445,8 @@ def latest_scan_finding_with_categories(
)
finding.add_resources([resource])
backfill_resource_scan_summaries(tenant_id, str(scan.id))
backfill_scan_category_summaries(tenant_id, str(scan.id))
backfill_scan_resource_group_summaries(tenant_id, str(scan.id))
aggregate_scan_category_summaries(tenant_id, str(scan.id))
aggregate_scan_resource_group_summaries(tenant_id, str(scan.id))
return finding
Binary file not shown (After: 131 KiB image).
+59 -10
View File
@@ -1,6 +1,8 @@
# Portions of this file are based on code from the Cartography project
# (https://github.com/cartography-cncf/cartography), which is licensed under the Apache 2.0 License.
import time
from typing import Any
import aioboto3
@@ -33,7 +35,7 @@ def start_aws_ingestion(
For the scan progress updates:
- The caller of this function (`tasks.jobs.attack_paths.scan.run`) has set it to 2.
- When the control returns to the caller, it will be set to 95.
- When the control returns to the caller, it will be set to 93.
"""
# Initialize variables common to all jobs
@@ -89,34 +91,50 @@ def start_aws_ingestion(
logger.info(
f"Syncing function permission_relationships for AWS account {prowler_api_provider.uid}"
)
t0 = time.perf_counter()
cartography_aws.RESOURCE_FUNCTIONS["permission_relationships"](**sync_args)
logger.info(
f"Synced function permission_relationships for AWS account {prowler_api_provider.uid} in {time.perf_counter() - t0:.3f}s"
)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 88)
if "resourcegroupstaggingapi" in requested_syncs:
logger.info(
f"Syncing function resourcegroupstaggingapi for AWS account {prowler_api_provider.uid}"
)
t0 = time.perf_counter()
cartography_aws.RESOURCE_FUNCTIONS["resourcegroupstaggingapi"](**sync_args)
logger.info(
f"Synced function resourcegroupstaggingapi for AWS account {prowler_api_provider.uid} in {time.perf_counter() - t0:.3f}s"
)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 89)
logger.info(
f"Syncing ec2_iaminstanceprofile scoped analysis for AWS account {prowler_api_provider.uid}"
)
t0 = time.perf_counter()
cartography_aws.run_scoped_analysis_job(
"aws_ec2_iaminstanceprofile.json",
neo4j_session,
common_job_parameters,
)
logger.info(
f"Synced ec2_iaminstanceprofile scoped analysis for AWS account {prowler_api_provider.uid} in {time.perf_counter() - t0:.3f}s"
)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 90)
logger.info(
f"Syncing lambda_ecr analysis for AWS account {prowler_api_provider.uid}"
)
t0 = time.perf_counter()
cartography_aws.run_analysis_job(
"aws_lambda_ecr.json",
neo4j_session,
common_job_parameters,
)
logger.info(
f"Synced lambda_ecr analysis for AWS account {prowler_api_provider.uid} in {time.perf_counter() - t0:.3f}s"
)
if all(
s in requested_syncs
@@ -125,25 +143,34 @@ def start_aws_ingestion(
logger.info(
f"Syncing lb_container_exposure scoped analysis for AWS account {prowler_api_provider.uid}"
)
t0 = time.perf_counter()
cartography_aws.run_scoped_analysis_job(
"aws_lb_container_exposure.json",
neo4j_session,
common_job_parameters,
)
logger.info(
f"Synced lb_container_exposure scoped analysis for AWS account {prowler_api_provider.uid} in {time.perf_counter() - t0:.3f}s"
)
if all(s in requested_syncs for s in ["ec2:network_acls", "ec2:load_balancer_v2"]):
logger.info(
f"Syncing lb_nacl_direct scoped analysis for AWS account {prowler_api_provider.uid}"
)
t0 = time.perf_counter()
cartography_aws.run_scoped_analysis_job(
"aws_lb_nacl_direct.json",
neo4j_session,
common_job_parameters,
)
logger.info(
f"Synced lb_nacl_direct scoped analysis for AWS account {prowler_api_provider.uid} in {time.perf_counter() - t0:.3f}s"
)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 91)
logger.info(f"Syncing metadata for AWS account {prowler_api_provider.uid}")
t0 = time.perf_counter()
cartography_aws.merge_module_sync_metadata(
neo4j_session,
group_type="AWSAccount",
@@ -152,24 +179,23 @@ def start_aws_ingestion(
update_tag=cartography_config.update_tag,
stat_handler=cartography_aws.stat_handler,
)
logger.info(
f"Synced metadata for AWS account {prowler_api_provider.uid} in {time.perf_counter() - t0:.3f}s"
)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 92)
# Removing the added extra field
del common_job_parameters["AWS_ID"]
logger.info(f"Syncing cleanup_job for AWS account {prowler_api_provider.uid}")
cartography_aws.run_cleanup_job(
"aws_post_ingestion_principals_cleanup.json",
neo4j_session,
common_job_parameters,
)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 93)
logger.info(f"Syncing analysis for AWS account {prowler_api_provider.uid}")
t0 = time.perf_counter()
cartography_aws._perform_aws_analysis(
requested_syncs, neo4j_session, common_job_parameters
)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 94)
logger.info(
f"Synced analysis for AWS account {prowler_api_provider.uid} in {time.perf_counter() - t0:.3f}s"
)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 93)
return failed_syncs
@@ -234,6 +260,8 @@ def sync_aws_account(
)
try:
func_t0 = time.perf_counter()
# `ecr:image_layers` uses `aioboto3_session` instead of `boto3_session`
if func_name == "ecr:image_layers":
cartography_aws.RESOURCE_FUNCTIONS[func_name](
@@ -257,7 +285,15 @@ def sync_aws_account(
else:
cartography_aws.RESOURCE_FUNCTIONS[func_name](**sync_args)
logger.info(
f"Synced function {func_name} for AWS account {prowler_api_provider.uid} in {time.perf_counter() - func_t0:.3f}s"
)
except Exception as e:
logger.info(
f"Synced function {func_name} for AWS account {prowler_api_provider.uid} in {time.perf_counter() - func_t0:.3f}s (FAILED)"
)
exception_message = utils.stringify_exception(
e, f"Exception for AWS sync function: {func_name}"
)
@@ -277,3 +313,16 @@ def sync_aws_account(
)
return failed_syncs
def extract_short_uid(uid: str) -> str:
"""Return the short identifier from an AWS ARN or resource ID.
Supported inputs end in one of:
- `<type>/<id>` (e.g. `instance/i-xxx`)
- `<type>:<id>` (e.g. `function:name`)
- `<id>` (e.g. `bucket-name` or `i-xxx`)
If `uid` is already a short resource ID, it is returned unchanged.
"""
return uid.rsplit("/", 1)[-1].rsplit(":", 1)[-1]
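A few worked inputs for the extractor above (the ARNs are illustrative):
assert extract_short_uid("arn:aws:ec2:eu-west-1:123456789012:instance/i-0abc123") == "i-0abc123"
assert extract_short_uid("arn:aws:lambda:eu-west-1:123456789012:function:my-func") == "my-func"
assert extract_short_uid("my-bucket") == "my-bucket"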
@@ -8,9 +8,9 @@ from tasks.jobs.attack_paths import aws
# Batch size for Neo4j write operations (resource labeling, cleanup)
BATCH_SIZE = env.int("ATTACK_PATHS_BATCH_SIZE", 1000)
# Batch size for Postgres findings fetch (keyset pagination page size)
FINDINGS_BATCH_SIZE = env.int("ATTACK_PATHS_FINDINGS_BATCH_SIZE", 500)
FINDINGS_BATCH_SIZE = env.int("ATTACK_PATHS_FINDINGS_BATCH_SIZE", 1000)
# Batch size for temp-to-tenant graph sync (nodes and relationships per cursor page)
SYNC_BATCH_SIZE = env.int("ATTACK_PATHS_SYNC_BATCH_SIZE", 250)
SYNC_BATCH_SIZE = env.int("ATTACK_PATHS_SYNC_BATCH_SIZE", 1000)
# Neo4j internal labels (Prowler-specific, not provider-specific)
# - `Internet`: Singleton node representing external internet access for exposed-resource queries
@@ -37,6 +37,8 @@ class ProviderConfig:
# Label for resources connected to the account node, enabling indexed finding lookups.
resource_label: str # e.g., "_AWSResource"
ingestion_function: Callable
# Maps a Postgres resource UID (e.g. full ARN) to the short-id form Cartography stores on some node types (e.g. `i-xxx` for EC2Instance).
short_uid_extractor: Callable[[str], str]
# Provider Configurations
@@ -48,6 +50,7 @@ AWS_CONFIG = ProviderConfig(
uid_field="arn",
resource_label="_AWSResource",
ingestion_function=aws.start_aws_ingestion,
short_uid_extractor=aws.extract_short_uid,
)
PROVIDER_CONFIGS: dict[str, ProviderConfig] = {
@@ -116,6 +119,21 @@ def get_provider_resource_label(provider_type: str) -> str:
return config.resource_label if config else "_UnknownProviderResource"
def _identity_short_uid(uid: str) -> str:
"""Fallback short-uid extractor for providers without a custom mapping."""
return uid
def get_short_uid_extractor(provider_type: str) -> Callable[[str], str]:
"""Get the short-uid extractor for a provider type.
Returns an identity function when the provider is unknown, so callers can
rely on a callable always being returned.
"""
config = PROVIDER_CONFIGS.get(provider_type)
return config.short_uid_extractor if config else _identity_short_uid
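Usage sketch for the registry lookup above, assuming the AWS config is registered under the provider string "aws":
aws_extract = get_short_uid_extractor("aws")
assert aws_extract("arn:aws:ec2:eu-west-1:123456789012:instance/i-0abc123") == "i-0abc123"

fallback = get_short_uid_extractor("not-a-provider")
assert fallback("opaque-uid") == "opaque-uid"  # identity fallback: always a callable, never None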
# Dynamic Isolation Label Helpers
# --------------------------------
@@ -5,14 +5,14 @@ This module handles:
- Adding resource labels to Cartography nodes for efficient lookups
- Loading Prowler findings into the graph
- Linking findings to resources
- Cleaning up stale findings
"""
from collections import defaultdict
from typing import Any, Generator
from typing import Any, Callable, Generator
from uuid import UUID
import neo4j
from cartography.config import Config as CartographyConfig
from celery.utils.log import get_task_logger
from tasks.jobs.attack_paths.config import (
@@ -21,10 +21,10 @@ from tasks.jobs.attack_paths.config import (
get_node_uid_field,
get_provider_resource_label,
get_root_node_label,
get_short_uid_extractor,
)
from tasks.jobs.attack_paths.queries import (
ADD_RESOURCE_LABEL_TEMPLATE,
CLEANUP_FINDINGS_TEMPLATE,
INSERT_FINDING_TEMPLATE,
render_cypher_template,
)
@@ -58,7 +58,9 @@ _DB_QUERY_FIELDS = [
]
def _to_neo4j_dict(record: dict[str, Any], resource_uid: str) -> dict[str, Any]:
def _to_neo4j_dict(
record: dict[str, Any], resource_uid: str, resource_short_uid: str
) -> dict[str, Any]:
"""Transform a Django `.values()` record into a `dict` ready for Neo4j ingestion."""
return {
"id": str(record["id"]),
@@ -76,6 +78,7 @@ def _to_neo4j_dict(record: dict[str, Any], resource_uid: str) -> dict[str, Any]:
"muted": record["muted"],
"muted_reason": record["muted_reason"],
"resource_uid": resource_uid,
"resource_short_uid": resource_short_uid,
}
@@ -88,18 +91,21 @@ def analysis(
prowler_api_provider: Provider,
scan_id: str,
config: CartographyConfig,
) -> None:
) -> tuple[int, int]:
"""
Main entry point for Prowler findings analysis.
Adds resource labels, loads findings, and cleans up stale data.
Adds resource labels and loads findings.
Returns (labeled_nodes, findings_loaded).
"""
add_resource_label(
total_labeled = add_resource_label(
neo4j_session, prowler_api_provider.provider, str(prowler_api_provider.uid)
)
findings_data = stream_findings_with_resources(prowler_api_provider, scan_id)
load_findings(neo4j_session, findings_data, prowler_api_provider, config)
cleanup_findings(neo4j_session, prowler_api_provider, config)
total_loaded = load_findings(
neo4j_session, findings_data, prowler_api_provider, config
)
return total_labeled, total_loaded
def add_resource_label(
@@ -149,12 +155,11 @@ def load_findings(
findings_batches: Generator[list[dict[str, Any]], None, None],
prowler_api_provider: Provider,
config: CartographyConfig,
) -> None:
) -> int:
"""Load Prowler findings into the graph, linking them to resources."""
query = render_cypher_template(
INSERT_FINDING_TEMPLATE,
{
"__ROOT_NODE_LABEL__": get_root_node_label(prowler_api_provider.provider),
"__NODE_UID_FIELD__": get_node_uid_field(prowler_api_provider.provider),
"__RESOURCE_LABEL__": get_provider_resource_label(
prowler_api_provider.provider
@@ -163,13 +168,14 @@ def load_findings(
)
parameters = {
"provider_uid": str(prowler_api_provider.uid),
"last_updated": config.update_tag,
"prowler_version": ProwlerConfig.prowler_version,
}
batch_num = 0
total_records = 0
edges_merged = 0
edges_dropped = 0
for batch in findings_batches:
batch_num += 1
batch_size = len(batch)
@@ -178,31 +184,16 @@ def load_findings(
parameters["findings_data"] = batch
logger.info(f"Loading findings batch {batch_num} ({batch_size} records)")
neo4j_session.run(query, parameters)
summary = neo4j_session.run(query, parameters).single()
if summary is not None:
edges_merged += summary.get("merged_count", 0)
edges_dropped += summary.get("dropped_count", 0)
logger.info(f"Finished loading {total_records} records in {batch_num} batches")
def cleanup_findings(
neo4j_session: neo4j.Session,
prowler_api_provider: Provider,
config: CartographyConfig,
) -> None:
"""Remove stale findings (classic Cartography behaviour)."""
parameters = {
"last_updated": config.update_tag,
"batch_size": BATCH_SIZE,
}
batch = 1
deleted_count = 1
while deleted_count > 0:
logger.info(f"Cleaning findings batch {batch}")
result = neo4j_session.run(CLEANUP_FINDINGS_TEMPLATE, parameters)
deleted_count = result.single().get("deleted_findings_count", 0)
batch += 1
logger.info(
f"Finished loading {total_records} records in {batch_num} batches "
f"(edges_merged={edges_merged}, edges_dropped={edges_dropped})"
)
return total_records
# Findings Streaming (Generator-based)
@@ -226,8 +217,9 @@ def stream_findings_with_resources(
)
tenant_id = prowler_api_provider.tenant_id
short_uid_extractor = get_short_uid_extractor(prowler_api_provider.provider)
for batch in _paginate_findings(tenant_id, scan_id):
enriched = _enrich_batch_with_resources(batch, tenant_id)
enriched = _enrich_batch_with_resources(batch, tenant_id, short_uid_extractor)
if enriched:
yield enriched
@@ -273,7 +265,9 @@ def _fetch_findings_batch(
with rls_transaction(tenant_id, using=READ_REPLICA_ALIAS):
# Use `all_objects` to get `Findings` even on soft-deleted `Providers`
# (the provider has already been validated as active in this context anyway)
qs = FindingModel.all_objects.filter(scan_id=scan_id).order_by("id")
qs = FindingModel.all_objects.filter(
tenant_id=tenant_id, scan_id=scan_id
).order_by("id")
if after_id is not None:
qs = qs.filter(id__gt=after_id)
@@ -288,6 +282,7 @@ def _fetch_findings_batch(
def _enrich_batch_with_resources(
findings_batch: list[dict[str, Any]],
tenant_id: str,
short_uid_extractor: Callable[[str], str],
) -> list[dict[str, Any]]:
"""
Enrich findings with their resource UIDs.
@@ -299,7 +294,7 @@ def _enrich_batch_with_resources(
resource_map = _build_finding_resource_map(finding_ids, tenant_id)
return [
_to_neo4j_dict(finding, resource_uid)
_to_neo4j_dict(finding, resource_uid, short_uid_extractor(resource_uid))
for finding in findings_batch
for resource_uid in resource_map.get(finding["id"], [])
]
@@ -13,14 +13,13 @@ from tasks.jobs.attack_paths.config import (
logger = get_task_logger(__name__)
# Indexes for Prowler findings and resource lookups
# Indexes for Prowler Findings and resource lookups
FINDINGS_INDEX_STATEMENTS = [
# Resource indexes for Prowler Finding lookups
"CREATE INDEX aws_resource_arn IF NOT EXISTS FOR (n:_AWSResource) ON (n.arn);",
"CREATE INDEX aws_resource_id IF NOT EXISTS FOR (n:_AWSResource) ON (n.id);",
# Prowler Finding indexes
f"CREATE INDEX prowler_finding_id IF NOT EXISTS FOR (n:{PROWLER_FINDING_LABEL}) ON (n.id);",
f"CREATE INDEX prowler_finding_lastupdated IF NOT EXISTS FOR (n:{PROWLER_FINDING_LABEL}) ON (n.lastupdated);",
f"CREATE INDEX prowler_finding_status IF NOT EXISTS FOR (n:{PROWLER_FINDING_LABEL}) ON (n.status);",
# Internet node index for MERGE lookups
f"CREATE INDEX internet_id IF NOT EXISTS FOR (n:{INTERNET_NODE_LABEL}) ON (n.id);",
@@ -32,63 +32,59 @@ ADD_RESOURCE_LABEL_TEMPLATE = """
"""
INSERT_FINDING_TEMPLATE = f"""
MATCH (account:__ROOT_NODE_LABEL__ {{id: $provider_uid}})
UNWIND $findings_data AS finding_data
OPTIONAL MATCH (account)-->(resource_by_uid:__RESOURCE_LABEL__)
WHERE resource_by_uid.__NODE_UID_FIELD__ = finding_data.resource_uid
WITH account, finding_data, resource_by_uid
OPTIONAL MATCH (account)-->(resource_by_id:__RESOURCE_LABEL__)
OPTIONAL MATCH (resource_by_uid:__RESOURCE_LABEL__ {{__NODE_UID_FIELD__: finding_data.resource_uid}})
OPTIONAL MATCH (resource_by_id:__RESOURCE_LABEL__ {{id: finding_data.resource_uid}})
WHERE resource_by_uid IS NULL
AND resource_by_id.id = finding_data.resource_uid
WITH account, finding_data, COALESCE(resource_by_uid, resource_by_id) AS resource
WHERE resource IS NOT NULL
OPTIONAL MATCH (resource_by_short:__RESOURCE_LABEL__ {{id: finding_data.resource_short_uid}})
WHERE resource_by_uid IS NULL AND resource_by_id IS NULL
WITH finding_data,
resource_by_uid,
resource_by_id,
head(collect(resource_by_short)) AS resource_by_short
WITH finding_data,
COALESCE(resource_by_uid, resource_by_id, resource_by_short) AS resource
MERGE (finding:{PROWLER_FINDING_LABEL} {{id: finding_data.id}})
ON CREATE SET
finding.id = finding_data.id,
finding.uid = finding_data.uid,
finding.inserted_at = finding_data.inserted_at,
finding.updated_at = finding_data.updated_at,
finding.first_seen_at = finding_data.first_seen_at,
finding.scan_id = finding_data.scan_id,
finding.delta = finding_data.delta,
finding.status = finding_data.status,
finding.status_extended = finding_data.status_extended,
finding.severity = finding_data.severity,
finding.check_id = finding_data.check_id,
finding.check_title = finding_data.check_title,
finding.muted = finding_data.muted,
finding.muted_reason = finding_data.muted_reason,
finding.firstseen = timestamp(),
finding.lastupdated = $last_updated,
finding._module_name = 'cartography:prowler',
finding._module_version = $prowler_version
ON MATCH SET
finding.status = finding_data.status,
finding.status_extended = finding_data.status_extended,
finding.lastupdated = $last_updated
FOREACH (_ IN CASE WHEN resource IS NOT NULL THEN [1] ELSE [] END |
MERGE (finding:{PROWLER_FINDING_LABEL} {{id: finding_data.id}})
ON CREATE SET
finding.id = finding_data.id,
finding.uid = finding_data.uid,
finding.inserted_at = finding_data.inserted_at,
finding.updated_at = finding_data.updated_at,
finding.first_seen_at = finding_data.first_seen_at,
finding.scan_id = finding_data.scan_id,
finding.delta = finding_data.delta,
finding.status = finding_data.status,
finding.status_extended = finding_data.status_extended,
finding.severity = finding_data.severity,
finding.check_id = finding_data.check_id,
finding.check_title = finding_data.check_title,
finding.muted = finding_data.muted,
finding.muted_reason = finding_data.muted_reason,
finding.firstseen = timestamp(),
finding.lastupdated = $last_updated,
finding._module_name = 'cartography:prowler',
finding._module_version = $prowler_version
ON MATCH SET
finding.status = finding_data.status,
finding.status_extended = finding_data.status_extended,
finding.lastupdated = $last_updated
MERGE (resource)-[rel:HAS_FINDING]->(finding)
ON CREATE SET
rel.firstseen = timestamp(),
rel.lastupdated = $last_updated,
rel._module_name = 'cartography:prowler',
rel._module_version = $prowler_version
ON MATCH SET
rel.lastupdated = $last_updated
)
MERGE (resource)-[rel:HAS_FINDING]->(finding)
ON CREATE SET
rel.firstseen = timestamp(),
rel.lastupdated = $last_updated,
rel._module_name = 'cartography:prowler',
rel._module_version = $prowler_version
ON MATCH SET
rel.lastupdated = $last_updated
"""
WITH sum(CASE WHEN resource IS NOT NULL THEN 1 ELSE 0 END) AS merged_count,
sum(CASE WHEN resource IS NULL THEN 1 ELSE 0 END) AS dropped_count
CLEANUP_FINDINGS_TEMPLATE = f"""
MATCH (finding:{PROWLER_FINDING_LABEL})
WHERE finding.lastupdated < $last_updated
WITH finding LIMIT $batch_size
DETACH DELETE finding
RETURN COUNT(finding) AS deleted_findings_count
RETURN merged_count, dropped_count
"""
# Internet queries (used by internet.py)
+38 -10
@@ -55,6 +55,7 @@ exception propagates to Celery.
import logging
import time
from typing import Any
from cartography.config import Config as CartographyConfig
@@ -144,6 +145,12 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
attack_paths_scan, task_id, tenant_cartography_config
)
scan_t0 = time.perf_counter()
logger.info(
f"Starting Attack Paths scan ({attack_paths_scan.id}) for "
f"{prowler_api_provider.provider.upper()} provider {prowler_api_provider.id}"
)
subgraph_dropped = False
sync_completed = False
provider_gated = False
@@ -169,6 +176,7 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 2)
# The real scan, which iterates over the cloud services
t0 = time.perf_counter()
ingestion_exceptions = utils.call_within_event_loop(
cartography_ingestion_function,
tmp_neo4j_session,
@@ -177,19 +185,23 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
prowler_sdk_provider,
attack_paths_scan,
)
logger.info(
f"Cartography ingestion completed in {time.perf_counter() - t0:.3f}s "
f"(failed_syncs={len(ingestion_exceptions)})"
)
# Post-processing: Just keeping it to be more Cartography compliant
logger.info(
f"Syncing Cartography ontology for AWS account {prowler_api_provider.uid}"
)
cartography_ontology.run(tmp_neo4j_session, tmp_cartography_config)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 95)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 94)
logger.info(
f"Syncing Cartography analysis for AWS account {prowler_api_provider.uid}"
)
cartography_analysis.run(tmp_neo4j_session, tmp_cartography_config)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 96)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 95)
# Creating Internet node and CAN_ACCESS relationships
logger.info(
@@ -198,14 +210,20 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
internet.analysis(
tmp_neo4j_session, prowler_api_provider, tmp_cartography_config
)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 96)
# Adding Prowler Finding nodes and relationships
logger.info(
f"Syncing Prowler analysis for AWS account {prowler_api_provider.uid}"
)
findings.analysis(
t0 = time.perf_counter()
labeled_nodes, findings_loaded = findings.analysis(
tmp_neo4j_session, prowler_api_provider, scan_id, tmp_cartography_config
)
logger.info(
f"Prowler analysis completed in {time.perf_counter() - t0:.3f}s "
f"(findings={findings_loaded}, labeled_nodes={labeled_nodes})"
)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 97)
logger.info(
@@ -227,22 +245,33 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
logger.info(f"Deleting existing provider graph in {tenant_database_name}")
db_utils.set_provider_graph_data_ready(attack_paths_scan, False)
provider_gated = True
graph_database.drop_subgraph(
t0 = time.perf_counter()
deleted_nodes = graph_database.drop_subgraph(
database=tenant_database_name,
provider_id=str(prowler_api_provider.id),
)
logger.info(
f"Deleted existing provider graph in {time.perf_counter() - t0:.3f}s "
f"(deleted_nodes={deleted_nodes})"
)
subgraph_dropped = True
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 98)
logger.info(
f"Syncing graph from {tmp_database_name} into {tenant_database_name}"
)
sync.sync_graph(
t0 = time.perf_counter()
sync_result = sync.sync_graph(
source_database=tmp_database_name,
target_database=tenant_database_name,
tenant_id=str(prowler_api_provider.tenant_id),
provider_id=str(prowler_api_provider.id),
)
logger.info(
f"Synced graph in {time.perf_counter() - t0:.3f}s "
f"(nodes={sync_result['nodes']}, relationships={sync_result['relationships']})"
)
sync_completed = True
db_utils.set_graph_data_ready(attack_paths_scan, True)
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 99)
@@ -250,17 +279,16 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
logger.info(f"Clearing Neo4j cache for database {tenant_database_name}")
graph_database.clear_cache(tenant_database_name)
logger.info(
f"Completed Cartography ({attack_paths_scan.id}) for "
f"{prowler_api_provider.provider.upper()} provider {prowler_api_provider.id}"
)
logger.info(f"Dropping temporary Neo4j database {tmp_database_name}")
graph_database.drop_database(tmp_database_name)
db_utils.finish_attack_paths_scan(
attack_paths_scan, StateChoices.COMPLETED, ingestion_exceptions
)
logger.info(
f"Attack Paths scan completed in {time.perf_counter() - scan_t0:.3f}s "
f"(state=completed, failed_syncs={len(ingestion_exceptions)})"
)
return ingestion_exceptions
except Exception as e:
@@ -5,6 +5,8 @@ This module handles syncing graph data from temporary scan databases
to the tenant database, adding provider isolation labels and properties.
"""
import time
from collections import defaultdict
from typing import Any
@@ -81,6 +83,7 @@ def sync_nodes(
Source and target sessions are opened sequentially per batch to avoid
holding two Bolt connections simultaneously for the entire sync duration.
"""
t0 = time.perf_counter()
last_id = -1
total_synced = 0
@@ -117,7 +120,7 @@ def sync_nodes(
total_synced += batch_count
logger.info(
f"Synced {total_synced} nodes from {source_database} to {target_database}"
f"Synced {total_synced} nodes from {source_database} to {target_database} in {time.perf_counter() - t0:.3f}s"
)
return total_synced
@@ -136,6 +139,7 @@ def sync_relationships(
Source and target sessions are opened sequentially per batch to avoid
holding two Bolt connections simultaneously for the entire sync duration.
"""
t0 = time.perf_counter()
last_id = -1
total_synced = 0
@@ -166,7 +170,7 @@ def sync_relationships(
total_synced += batch_count
logger.info(
f"Synced {total_synced} relationships from {source_database} to {target_database}"
f"Synced {total_synced} relationships from {source_database} to {target_database} in {time.perf_counter() - t0:.3f}s"
)
return total_synced
+46 -26
@@ -297,12 +297,15 @@ def backfill_daily_severity_summaries(tenant_id: str, days: int = None):
}
def backfill_scan_category_summaries(tenant_id: str, scan_id: str):
def aggregate_scan_category_summaries(tenant_id: str, scan_id: str):
"""
Backfill ScanCategorySummary for a completed scan.
Aggregates category counts from all findings in the scan and creates
one ScanCategorySummary row per (category, severity) combination.
Idempotent: re-runs replace the scan's existing rows so counts stay in
sync with `Finding.muted` updates triggered outside scan completion
(e.g. mute rules).
Args:
tenant_id: Target tenant UUID
@@ -312,11 +315,6 @@ def backfill_scan_category_summaries(tenant_id: str, scan_id: str):
dict: Status indicating whether backfill was performed
"""
with rls_transaction(tenant_id, using=READ_REPLICA_ALIAS):
if ScanCategorySummary.objects.filter(
tenant_id=tenant_id, scan_id=scan_id
).exists():
return {"status": "already backfilled"}
if not Scan.objects.filter(
tenant_id=tenant_id,
id=scan_id,
@@ -337,9 +335,6 @@ def backfill_scan_category_summaries(tenant_id: str, scan_id: str):
cache=category_counts,
)
if not category_counts:
return {"status": "no categories to backfill"}
category_summaries = [
ScanCategorySummary(
tenant_id=tenant_id,
@@ -353,20 +348,38 @@ def backfill_scan_category_summaries(tenant_id: str, scan_id: str):
for (category, severity), counts in category_counts.items()
]
with rls_transaction(tenant_id):
ScanCategorySummary.objects.bulk_create(
category_summaries, batch_size=500, ignore_conflicts=True
)
if category_summaries:
with rls_transaction(tenant_id):
# Upsert so re-runs (post-mute reaggregation) don't trip
# `unique_category_severity_per_scan`; race-safe under concurrent writers.
ScanCategorySummary.objects.bulk_create(
category_summaries,
batch_size=500,
update_conflicts=True,
unique_fields=["tenant_id", "scan_id", "category", "severity"],
update_fields=[
"total_findings",
"failed_findings",
"new_failed_findings",
],
)
if not category_counts:
return {"status": "no categories to backfill"}
return {"status": "backfilled", "categories_count": len(category_counts)}
def backfill_scan_resource_group_summaries(tenant_id: str, scan_id: str):
def aggregate_scan_resource_group_summaries(tenant_id: str, scan_id: str):
"""
Backfill ScanGroupSummary for a completed scan.
Aggregates resource group counts from all findings in the scan and creates
one ScanGroupSummary row per (resource_group, severity) combination.
Idempotent: re-runs replace the scan's existing rows so counts stay in
sync with `Finding.muted` updates triggered outside scan completion
(e.g. mute rules) and with resource-inventory views reading from this
table.
Args:
tenant_id: Target tenant UUID
@@ -376,11 +389,6 @@ def backfill_scan_resource_group_summaries(tenant_id: str, scan_id: str):
dict: Status indicating whether backfill was performed
"""
with rls_transaction(tenant_id, using=READ_REPLICA_ALIAS):
if ScanGroupSummary.objects.filter(
tenant_id=tenant_id, scan_id=scan_id
).exists():
return {"status": "already backfilled"}
if not Scan.objects.filter(
tenant_id=tenant_id,
id=scan_id,
@@ -418,9 +426,6 @@ def backfill_scan_resource_group_summaries(tenant_id: str, scan_id: str):
group_resources_cache=group_resources_cache,
)
if not resource_group_counts:
return {"status": "no resource groups to backfill"}
# Compute group-level resource counts (same value for all severity rows in a group)
group_resource_counts = {
grp: len(uids) for grp, uids in group_resources_cache.items()
@@ -439,10 +444,25 @@ def backfill_scan_resource_group_summaries(tenant_id: str, scan_id: str):
for (grp, severity), counts in resource_group_counts.items()
]
with rls_transaction(tenant_id):
ScanGroupSummary.objects.bulk_create(
resource_group_summaries, batch_size=500, ignore_conflicts=True
)
if resource_group_summaries:
with rls_transaction(tenant_id):
# Upsert so re-runs (post-mute reaggregation) don't trip
# `unique_resource_group_severity_per_scan`; race-safe under concurrent writers.
ScanGroupSummary.objects.bulk_create(
resource_group_summaries,
batch_size=500,
update_conflicts=True,
unique_fields=["tenant_id", "scan_id", "resource_group", "severity"],
update_fields=[
"total_findings",
"failed_findings",
"new_failed_findings",
"resources_count",
],
)
if not resource_group_counts:
return {"status": "no resource groups to backfill"}
return {"status": "backfilled", "resource_groups_count": len(resource_group_counts)}
+636 -42
@@ -1,11 +1,19 @@
import gc
import os
import re
import time
from collections.abc import Iterable
from pathlib import Path
from shutil import rmtree
from uuid import UUID
import fcntl
from celery.utils.log import get_task_logger
from config.django.base import DJANGO_TMP_OUTPUT_DIRECTORY
from tasks.jobs.export import _generate_compliance_output_directory, _upload_to_s3
from tasks.jobs.reports import (
FRAMEWORK_REGISTRY,
CISReportGenerator,
CSAReportGenerator,
ENSReportGenerator,
NIS2ReportGenerator,
@@ -14,12 +22,398 @@ from tasks.jobs.reports import (
from tasks.jobs.threatscore import compute_threatscore_metrics
from tasks.jobs.threatscore_utils import _aggregate_requirement_statistics_from_database
from api.db_router import READ_REPLICA_ALIAS
from api.db_router import READ_REPLICA_ALIAS, MainRouter
from api.db_utils import rls_transaction
from api.models import Provider, ScanSummary, ThreatScoreSnapshot
from api.models import Provider, Scan, ScanSummary, StateChoices, ThreatScoreSnapshot
from prowler.lib.check.compliance_models import Compliance
from prowler.lib.outputs.finding import Finding as FindingOutput
logger = get_task_logger(__name__)
STALE_TMP_OUTPUT_MAX_AGE_HOURS = 48
STALE_TMP_OUTPUT_MAX_DELETIONS_PER_RUN = 50
STALE_TMP_OUTPUT_THROTTLE_SECONDS = 60 * 60
STALE_TMP_OUTPUT_LOCK_FILE_NAME = ".stale_tmp_cleanup.lock"
# Refuse to ever run rmtree against shared system roots; the configured
# DJANGO_TMP_OUTPUT_DIRECTORY must be a dedicated subdirectory.
_FORBIDDEN_CLEANUP_ROOTS = frozenset(
Path(p).resolve()
for p in ("/", "/tmp", "/var", "/var/tmp", "/home", "/root", "/etc", "/usr")
)
def _resolve_stale_tmp_safe_root() -> Path | None:
"""Resolve the configured tmp output directory, rejecting unsafe roots."""
try:
configured_root = Path(DJANGO_TMP_OUTPUT_DIRECTORY).resolve()
except OSError:
return None
if configured_root in _FORBIDDEN_CLEANUP_ROOTS:
return None
return configured_root
STALE_TMP_OUTPUT_SAFE_ROOT = _resolve_stale_tmp_safe_root()
# Matches CIS compliance_ids like "cis_1.4_aws", "cis_5.0_azure",
# "cis_1.10_kubernetes", "cis_3.0.1_aws". Requires at least one dotted
# component so malformed inputs like "cis_._aws" or "cis_5._aws" are rejected
# at the regex stage, rather than by a later ValueError fallback.
_CIS_VARIANT_RE = re.compile(r"^cis_(?P<version>\d+(?:\.\d+)+)_(?P<provider>.+)$")
def _pick_latest_cis_variant(compliance_ids: Iterable[str]) -> str | None:
"""Return the CIS compliance_id with the highest semantic version.
CIS ships many variants per provider (e.g. cis_1.4_aws, ..., cis_6.0_aws).
A lexicographic sort is incorrect for version strings like ``1.10`` vs
``1.2``; this helper parses the version into a tuple of ints so ``1.10``
is correctly ordered after ``1.2``. Malformed names are skipped so a
broken JSON cannot crash the whole CIS pipeline.
Args:
compliance_ids: Iterable of CIS compliance identifiers. Expected to
belong to a single provider (callers should pass the already
filtered keys from ``Compliance.get_bulk(provider_type)``).
Returns:
The compliance_id with the highest parsed version, or ``None`` if no
well-formed CIS identifier was found.
"""
best_key: tuple[int, ...] | None = None
best_name: str | None = None
for name in compliance_ids:
match = _CIS_VARIANT_RE.match(name)
if not match:
continue
try:
key = tuple(int(part) for part in match.group("version").split("."))
except ValueError:
# Defensive: the regex already guarantees numeric chunks, but we
# keep the guard so a future regex change cannot crash callers.
continue
if best_key is None or key > best_key:
best_key = key
best_name = name
return best_name
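A quick illustration of the ordering this buys; the assertions below are a sketch, not part of the module, and the variant names are examples only:

assert _pick_latest_cis_variant(["cis_1.2_aws", "cis_1.10_aws"]) == "cis_1.10_aws"
assert _pick_latest_cis_variant(["cis_1.4_aws", "cis_3.0.1_aws", "cis_._aws"]) == "cis_3.0.1_aws"
assert _pick_latest_cis_variant(["not_cis", "cis_x_aws"]) is None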
def _should_run_stale_cleanup(
root_path: Path,
throttle_seconds: int = STALE_TMP_OUTPUT_THROTTLE_SECONDS,
) -> bool:
"""Throttle stale cleanup to at most once per hour per host."""
lock_file_path = root_path / STALE_TMP_OUTPUT_LOCK_FILE_NAME
now_timestamp = int(time.time())
try:
with lock_file_path.open("a+", encoding="ascii") as lock_file:
try:
fcntl.flock(lock_file.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
except BlockingIOError:
return False
lock_file.seek(0)
previous_value = lock_file.read().strip()
try:
last_run_timestamp = int(previous_value) if previous_value else 0
except ValueError:
last_run_timestamp = 0
if now_timestamp - last_run_timestamp < throttle_seconds:
return False
lock_file.seek(0)
lock_file.truncate()
lock_file.write(str(now_timestamp))
lock_file.flush()
os.fsync(lock_file.fileno())
except OSError as error:
logger.warning("Skipping stale tmp cleanup: lock file error (%s)", error)
return False
return True
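A hedged sketch of how the throttle behaves across consecutive calls on one host; the temporary directory stands in for the cleanup root, and the lock file is created inside it:

from pathlib import Path
from tempfile import mkdtemp

cleanup_root = Path(mkdtemp())
assert _should_run_stale_cleanup(cleanup_root) is True            # first caller stamps the lock file
assert _should_run_stale_cleanup(cleanup_root) is False           # second call within the hour is throttled
assert _should_run_stale_cleanup(cleanup_root, throttle_seconds=0) is True  # throttle effectively disabled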
def _is_scan_metadata_protected(
scan_path: Path,
scan_state: str | None,
output_location: str | None,
) -> bool:
"""
Return True when metadata indicates the directory must not be deleted.
Protected cases:
- Scan is still EXECUTING.
- Scan has a local output artifact path (non-S3) under this scan directory.
"""
if scan_state == StateChoices.EXECUTING.value:
return True
output_location = output_location or ""
if output_location and not output_location.startswith("s3://"):
try:
resolved_output_location = Path(output_location).resolve()
except OSError:
# Conservative fallback: if we cannot resolve a local output path,
# keep the directory to avoid deleting potentially needed artifacts.
return True
if (
resolved_output_location == scan_path
or scan_path in resolved_output_location.parents
):
return True
return False
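A small illustration of the protection rules, assuming the module's `Path` and `StateChoices` imports and purely hypothetical paths:

scan_path = Path("/prowler-tmp/tenant-a/scan-1").resolve()
# Still executing: always protected.
assert _is_scan_metadata_protected(scan_path, StateChoices.EXECUTING.value, None) is True
# Local artifact stored under the scan directory: protected.
assert _is_scan_metadata_protected(
    scan_path, StateChoices.COMPLETED.value, "/prowler-tmp/tenant-a/scan-1/report.pdf"
) is True
# S3 artifact: safe to delete once the directory is stale.
assert _is_scan_metadata_protected(
    scan_path, StateChoices.COMPLETED.value, "s3://bucket/report.pdf"
) is False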
def _is_scan_directory_protected(
tenant_id: str,
scan_id: str,
scan_path: Path,
) -> bool:
"""
DB-backed wrapper used when batch metadata is not already available.
"""
try:
scan_uuid = UUID(scan_id)
except ValueError:
return False
try:
scan = (
Scan.all_objects.using(MainRouter.admin_db)
.filter(tenant_id=tenant_id, id=scan_uuid)
.only("state", "output_location")
.first()
)
except Exception as error:
logger.warning(
"Skipping stale tmp cleanup for %s/%s due to scan lookup error: %s",
tenant_id,
scan_id,
error,
)
return True
if not scan:
return False
return _is_scan_metadata_protected(
scan_path=scan_path,
scan_state=scan.state,
output_location=scan.output_location,
)
def _cleanup_stale_tmp_output_directories(
tmp_output_root: str,
max_age_hours: int = STALE_TMP_OUTPUT_MAX_AGE_HOURS,
exclude_scan: tuple[str, str] | None = None,
max_deletions_per_run: int = STALE_TMP_OUTPUT_MAX_DELETIONS_PER_RUN,
) -> int:
"""
Opportunistically delete stale scan directories under the tmp output root.
Expected directory layout:
<tmp_output_root>/<tenant_id>/<scan_id>/...
Each run that wins the per-host throttle sweeps every tenant directory so
leftover artifacts cannot pile up for tenants whose own tasks happen to
lose the throttle race.
Args:
tmp_output_root: Base tmp output path.
max_age_hours: Directory max age before deletion.
exclude_scan: Optional (tenant_id, scan_id) that must never be deleted.
max_deletions_per_run: Max number of scan directories deleted per run.
Returns:
Number of deleted scan directories.
"""
try:
if max_age_hours <= 0:
return 0
try:
root_path = Path(tmp_output_root).resolve()
except OSError as error:
logger.warning(
"Skipping stale tmp cleanup: unable to resolve %s (%s)",
tmp_output_root,
error,
)
return 0
if (
STALE_TMP_OUTPUT_SAFE_ROOT is None
or root_path != STALE_TMP_OUTPUT_SAFE_ROOT
):
logger.warning(
"Skipping stale tmp cleanup: unsupported root %s (allowed: %s)",
root_path,
STALE_TMP_OUTPUT_SAFE_ROOT,
)
return 0
if not root_path.exists() or not root_path.is_dir():
return 0
if max_deletions_per_run <= 0:
return 0
if not _should_run_stale_cleanup(root_path):
return 0
cutoff_timestamp = time.time() - (max_age_hours * 60 * 60)
deleted_scan_dirs = 0
try:
tenant_dirs = list(root_path.iterdir())
except OSError as error:
logger.warning(
"Skipping stale tmp cleanup: unable to list %s (%s)",
root_path,
error,
)
return 0
for tenant_dir in tenant_dirs:
if deleted_scan_dirs >= max_deletions_per_run:
break
if not tenant_dir.is_dir() or tenant_dir.is_symlink():
continue
try:
scan_dirs = list(tenant_dir.iterdir())
except OSError:
continue
stale_candidates: list[tuple[str, Path, UUID | None]] = []
for scan_dir in scan_dirs:
if not scan_dir.is_dir() or scan_dir.is_symlink():
continue
if exclude_scan and (
tenant_dir.name == exclude_scan[0]
and scan_dir.name == exclude_scan[1]
):
continue
try:
if scan_dir.stat().st_mtime >= cutoff_timestamp:
continue
except OSError:
continue
try:
resolved_scan_dir = scan_dir.resolve()
except OSError:
continue
if root_path not in resolved_scan_dir.parents:
logger.warning(
"Skipping stale tmp cleanup for path outside root: %s",
resolved_scan_dir,
)
continue
try:
scan_uuid: UUID | None = UUID(scan_dir.name)
except ValueError:
scan_uuid = None
stale_candidates.append((scan_dir.name, resolved_scan_dir, scan_uuid))
if not stale_candidates:
continue
scan_metadata_by_id: dict[UUID, tuple[str | None, str | None]] = {}
metadata_preload_succeeded = False
candidate_scan_ids = [
candidate[2] for candidate in stale_candidates if candidate[2]
]
if candidate_scan_ids:
try:
scan_rows = (
Scan.all_objects.using(MainRouter.admin_db)
.filter(
tenant_id=tenant_dir.name,
id__in=candidate_scan_ids,
)
.values_list("id", "state", "output_location")
)
scan_metadata_by_id = {
scan_id: (scan_state, output_location)
for scan_id, scan_state, output_location in scan_rows
}
metadata_preload_succeeded = True
except Exception as error:
logger.warning(
"Skipping stale tmp cleanup metadata preload for tenant %s: %s",
tenant_dir.name,
error,
)
else:
metadata_preload_succeeded = True
for scan_name, resolved_scan_dir, scan_uuid in stale_candidates:
if deleted_scan_dirs >= max_deletions_per_run:
break
should_check_scan_fallback = True
if scan_uuid and metadata_preload_succeeded:
should_check_scan_fallback = False
scan_metadata = scan_metadata_by_id.get(scan_uuid)
if scan_metadata:
scan_state, output_location = scan_metadata
if _is_scan_metadata_protected(
scan_path=resolved_scan_dir,
scan_state=scan_state,
output_location=output_location,
):
continue
if should_check_scan_fallback and _is_scan_directory_protected(
tenant_id=tenant_dir.name,
scan_id=scan_name,
scan_path=resolved_scan_dir,
):
continue
try:
rmtree(resolved_scan_dir, ignore_errors=True)
deleted_scan_dirs += 1
except Exception as error:
logger.warning(
"Error cleaning stale tmp directory %s: %s",
resolved_scan_dir,
error,
)
if deleted_scan_dirs:
logger.info(
"Deleted %s stale tmp output directories older than %sh from %s",
deleted_scan_dirs,
max_age_hours,
root_path,
)
if deleted_scan_dirs >= max_deletions_per_run:
logger.info(
"Stale tmp cleanup hit deletion limit (%s) for root %s",
max_deletions_per_run,
root_path,
)
return deleted_scan_dirs
except Exception as error:
logger.warning(
"Skipping stale tmp cleanup due to unexpected error: %s",
error,
exc_info=True,
)
return 0
def generate_threatscore_report(
@@ -191,6 +585,53 @@ def generate_csa_report(
)
def generate_cis_report(
tenant_id: str,
scan_id: str,
compliance_id: str,
output_path: str,
provider_id: str,
only_failed: bool = True,
include_manual: bool = False,
provider_obj: Provider | None = None,
requirement_statistics: dict[str, dict[str, int]] | None = None,
findings_cache: dict[str, list[FindingOutput]] | None = None,
) -> None:
"""
Generate a PDF compliance report for a specific CIS Benchmark variant.
Unlike single-version frameworks (ENS, NIS2, CSA), CIS has multiple
variants per provider (e.g., cis_1.4_aws, cis_5.0_aws, cis_6.0_aws). This
wrapper is called once per variant, receiving the specific compliance_id.
Args:
tenant_id: The tenant ID for Row-Level Security context.
scan_id: ID of the scan executed by Prowler.
compliance_id: ID of the specific CIS variant (e.g., "cis_5.0_aws").
output_path: Output PDF file path.
provider_id: Provider ID for the scan.
only_failed: If True, only include failed requirements in detailed section.
include_manual: If True, include manual requirements in detailed section.
provider_obj: Pre-fetched Provider object to avoid duplicate queries.
requirement_statistics: Pre-aggregated requirement statistics.
findings_cache: Cache of already loaded findings to avoid duplicate queries.
"""
generator = CISReportGenerator(FRAMEWORK_REGISTRY["cis"])
generator.generate(
tenant_id=tenant_id,
scan_id=scan_id,
compliance_id=compliance_id,
output_path=output_path,
provider_id=provider_id,
provider_obj=provider_obj,
requirement_statistics=requirement_statistics,
findings_cache=findings_cache,
only_failed=only_failed,
include_manual=include_manual,
)
def generate_compliance_reports(
tenant_id: str,
scan_id: str,
@@ -199,6 +640,7 @@ def generate_compliance_reports(
generate_ens: bool = True,
generate_nis2: bool = True,
generate_csa: bool = True,
generate_cis: bool = True,
only_failed_threatscore: bool = True,
min_risk_level_threatscore: int = 4,
include_manual_ens: bool = True,
@@ -206,6 +648,8 @@ def generate_compliance_reports(
only_failed_nis2: bool = True,
only_failed_csa: bool = True,
include_manual_csa: bool = False,
only_failed_cis: bool = True,
include_manual_cis: bool = False,
) -> dict[str, dict[str, bool | str]]:
"""
Generate multiple compliance reports with shared database queries.
@@ -215,6 +659,13 @@ def generate_compliance_reports(
- Aggregating requirement statistics once (shared across all reports)
- Reusing compliance framework data when possible
For CIS a single PDF is produced per run: the one matching the highest
available CIS version for the scan's provider (picked dynamically from
``Compliance.get_bulk`` via :func:`_pick_latest_cis_variant`). The
returned ``results["cis"]`` entry has the same flat shape as the other
single-version frameworks; the picked variant is an internal detail,
not surfaced in the result.
Args:
tenant_id: The tenant ID for Row-Level Security context.
scan_id: The ID of the scan to generate reports for.
@@ -223,6 +674,8 @@ def generate_compliance_reports(
generate_ens: Whether to generate ENS report.
generate_nis2: Whether to generate NIS2 report.
generate_csa: Whether to generate CSA CCM report.
generate_cis: Whether to generate a CIS Benchmark report for the
latest CIS version available for the provider.
only_failed_threatscore: For ThreatScore, only include failed requirements.
min_risk_level_threatscore: Minimum risk level for ThreatScore critical requirements.
include_manual_ens: For ENS, include manual requirements.
@@ -230,22 +683,39 @@ def generate_compliance_reports(
only_failed_nis2: For NIS2, only include failed requirements.
only_failed_csa: For CSA CCM, only include failed requirements.
include_manual_csa: For CSA CCM, include manual requirements.
only_failed_cis: For CIS, only include failed requirements in detailed section.
include_manual_cis: For CIS, include manual requirements in detailed section.
Returns:
Dictionary with results for each report type.
Dictionary with results for each report type. Every value has the
same flat shape: ``{"upload": bool, "path": str, "error"?: str}``.
"""
logger.info(
"Generating compliance reports for scan %s with provider %s"
" (ThreatScore: %s, ENS: %s, NIS2: %s, CSA: %s)",
" (ThreatScore: %s, ENS: %s, NIS2: %s, CSA: %s, CIS: %s)",
scan_id,
provider_id,
generate_threatscore,
generate_ens,
generate_nis2,
generate_csa,
generate_cis,
)
results = {}
try:
_cleanup_stale_tmp_output_directories(
DJANGO_TMP_OUTPUT_DIRECTORY,
max_age_hours=STALE_TMP_OUTPUT_MAX_AGE_HOURS,
exclude_scan=(tenant_id, scan_id),
)
except Exception as error:
logger.warning(
"Skipping stale tmp cleanup before compliance reports for scan %s: %s",
scan_id,
error,
)
results: dict = {}
# Validate that the scan has findings and get provider info
with rls_transaction(tenant_id, using=READ_REPLICA_ALIAS):
@@ -259,6 +729,8 @@ def generate_compliance_reports(
results["nis2"] = {"upload": False, "path": ""}
if generate_csa:
results["csa"] = {"upload": False, "path": ""}
if generate_cis:
results["cis"] = {"upload": False, "path": ""}
return results
provider_obj = Provider.objects.get(id=provider_id)
@@ -299,11 +771,39 @@ def generate_compliance_reports(
results["csa"] = {"upload": False, "path": ""}
generate_csa = False
# For CIS we do NOT pre-check the provider against a hard-coded whitelist
# (that list drifts the moment a new CIS JSON ships). Instead, we inspect
# the dynamically loaded framework map and pick the latest available CIS
# version, if any.
latest_cis: str | None = None
if generate_cis:
try:
frameworks_bulk = Compliance.get_bulk(provider_type)
latest_cis = _pick_latest_cis_variant(
name for name in frameworks_bulk.keys() if name.startswith("cis_")
)
except Exception as e:
logger.error("Error discovering CIS variants for %s: %s", provider_type, e)
results["cis"] = {"upload": False, "path": "", "error": str(e)}
generate_cis = False
else:
if latest_cis is None:
logger.info("No CIS variants available for provider %s", provider_type)
results["cis"] = {"upload": False, "path": ""}
generate_cis = False
else:
logger.info(
"Selected latest CIS variant for provider %s: %s",
provider_type,
latest_cis,
)
if (
not generate_threatscore
and not generate_ens
and not generate_nis2
and not generate_csa
and not generate_cis
):
return results
@@ -319,38 +819,56 @@ def generate_compliance_reports(
findings_cache = {}
logger.info("Created shared findings cache for all reports")
# Generate output directories
generated_report_keys: list[str] = []
output_paths: dict[str, str] = {}
out_dir: str | None = None
# Generate output directories only for enabled and supported report types.
try:
logger.info("Generating output directories")
threatscore_path = _generate_compliance_output_directory(
DJANGO_TMP_OUTPUT_DIRECTORY,
provider_uid,
tenant_id,
scan_id,
compliance_framework="threatscore",
)
ens_path = _generate_compliance_output_directory(
DJANGO_TMP_OUTPUT_DIRECTORY,
provider_uid,
tenant_id,
scan_id,
compliance_framework="ens",
)
nis2_path = _generate_compliance_output_directory(
DJANGO_TMP_OUTPUT_DIRECTORY,
provider_uid,
tenant_id,
scan_id,
compliance_framework="nis2",
)
csa_path = _generate_compliance_output_directory(
DJANGO_TMP_OUTPUT_DIRECTORY,
provider_uid,
tenant_id,
scan_id,
compliance_framework="csa",
)
out_dir = str(Path(threatscore_path).parent.parent)
if generate_threatscore:
output_paths["threatscore"] = _generate_compliance_output_directory(
DJANGO_TMP_OUTPUT_DIRECTORY,
provider_uid,
tenant_id,
scan_id,
compliance_framework="threatscore",
)
if generate_ens:
output_paths["ens"] = _generate_compliance_output_directory(
DJANGO_TMP_OUTPUT_DIRECTORY,
provider_uid,
tenant_id,
scan_id,
compliance_framework="ens",
)
if generate_nis2:
output_paths["nis2"] = _generate_compliance_output_directory(
DJANGO_TMP_OUTPUT_DIRECTORY,
provider_uid,
tenant_id,
scan_id,
compliance_framework="nis2",
)
if generate_csa:
output_paths["csa"] = _generate_compliance_output_directory(
DJANGO_TMP_OUTPUT_DIRECTORY,
provider_uid,
tenant_id,
scan_id,
compliance_framework="csa",
)
if generate_cis and latest_cis:
output_paths["cis"] = _generate_compliance_output_directory(
DJANGO_TMP_OUTPUT_DIRECTORY,
provider_uid,
tenant_id,
scan_id,
compliance_framework="cis",
)
if output_paths:
first_output_path = next(iter(output_paths.values()))
out_dir = str(Path(first_output_path).parent.parent)
except Exception as e:
logger.error("Error generating output directory: %s", e)
error_dict = {"error": str(e), "upload": False, "path": ""}
@@ -362,10 +880,14 @@ def generate_compliance_reports(
results["nis2"] = error_dict.copy()
if generate_csa:
results["csa"] = error_dict.copy()
if generate_cis:
results["cis"] = error_dict.copy()
return results
# Generate ThreatScore report
if generate_threatscore:
generated_report_keys.append("threatscore")
threatscore_path = output_paths["threatscore"]
compliance_id_threatscore = f"prowler_threatscore_{provider_type}"
pdf_path_threatscore = f"{threatscore_path}_threatscore_report.pdf"
logger.info(
@@ -467,6 +989,8 @@ def generate_compliance_reports(
# Generate ENS report
if generate_ens:
generated_report_keys.append("ens")
ens_path = output_paths["ens"]
compliance_id_ens = f"ens_rd2022_{provider_type}"
pdf_path_ens = f"{ens_path}_ens_report.pdf"
logger.info("Generating ENS report with compliance %s", compliance_id_ens)
@@ -501,6 +1025,8 @@ def generate_compliance_reports(
# Generate NIS2 report
if generate_nis2:
generated_report_keys.append("nis2")
nis2_path = output_paths["nis2"]
compliance_id_nis2 = f"nis2_{provider_type}"
pdf_path_nis2 = f"{nis2_path}_nis2_report.pdf"
logger.info("Generating NIS2 report with compliance %s", compliance_id_nis2)
@@ -536,6 +1062,8 @@ def generate_compliance_reports(
# Generate CSA CCM report
if generate_csa:
generated_report_keys.append("csa")
csa_path = output_paths["csa"]
compliance_id_csa = f"csa_ccm_4.0_{provider_type}"
pdf_path_csa = f"{csa_path}_csa_report.pdf"
logger.info("Generating CSA CCM report with compliance %s", compliance_id_csa)
@@ -569,14 +1097,75 @@ def generate_compliance_reports(
logger.error("Error generating CSA CCM report: %s", e)
results["csa"] = {"upload": False, "path": "", "error": str(e)}
# Clean up temporary files if all reports were uploaded successfully
all_uploaded = all(
result.get("upload", False)
for result in results.values()
if result.get("upload") is not None
# Generate CIS Benchmark report for the latest available version only.
# CIS ships multiple versions per provider (e.g. cis_1.4_aws, cis_5.0_aws,
# cis_6.0_aws); we dynamically pick the highest semantic version at run
# time rather than hard-coding a per-provider mapping.
if generate_cis and latest_cis:
generated_report_keys.append("cis")
cis_path = output_paths["cis"]
if out_dir is None:
out_dir = str(Path(cis_path).parent.parent)
pdf_path_cis = f"{cis_path}_cis_report.pdf"
try:
generate_cis_report(
tenant_id=tenant_id,
scan_id=scan_id,
compliance_id=latest_cis,
output_path=pdf_path_cis,
provider_id=provider_id,
only_failed=only_failed_cis,
include_manual=include_manual_cis,
provider_obj=provider_obj,
requirement_statistics=requirement_statistics,
findings_cache=findings_cache,
)
upload_uri_cis = _upload_to_s3(
tenant_id,
scan_id,
pdf_path_cis,
f"cis/{Path(pdf_path_cis).name}",
)
if upload_uri_cis:
results["cis"] = {
"upload": True,
"path": upload_uri_cis,
}
logger.info(
"CIS report %s uploaded to %s",
latest_cis,
upload_uri_cis,
)
else:
results["cis"] = {"upload": False, "path": out_dir}
logger.warning(
"CIS report %s saved locally at %s",
latest_cis,
out_dir,
)
except Exception as e:
logger.error("Error generating CIS report %s: %s", latest_cis, e)
results["cis"] = {
"upload": False,
"path": "",
"error": str(e),
}
finally:
# Free ReportLab/matplotlib memory before moving on.
gc.collect()
# Clean up temporary files only if all generated reports were
# uploaded successfully. Reports skipped for provider incompatibility
# or missing CIS variants must not block cleanup.
all_uploaded = bool(generated_report_keys) and all(
results.get(report_key, {}).get("upload", False)
for report_key in generated_report_keys
)
if all_uploaded:
if all_uploaded and out_dir:
try:
rmtree(Path(out_dir), ignore_errors=True)
logger.info("Cleaned up temporary files at %s", out_dir)
@@ -595,6 +1184,7 @@ def generate_compliance_reports_job(
generate_ens: bool = True,
generate_nis2: bool = True,
generate_csa: bool = True,
generate_cis: bool = True,
) -> dict[str, dict[str, bool | str]]:
"""
Celery task wrapper for generate_compliance_reports.
@@ -607,9 +1197,12 @@ def generate_compliance_reports_job(
generate_ens: Whether to generate ENS report.
generate_nis2: Whether to generate NIS2 report.
generate_csa: Whether to generate CSA CCM report.
generate_cis: Whether to generate the CIS Benchmark report for the
latest CIS version available for the provider.
Returns:
Dictionary with results for each report type.
Dictionary with results for each report type. Every entry shares the
same flat ``{"upload", "path", "error"?}`` shape.
"""
return generate_compliance_reports(
tenant_id=tenant_id,
@@ -619,4 +1212,5 @@ def generate_compliance_reports_job(
generate_ens=generate_ens,
generate_nis2=generate_nis2,
generate_csa=generate_csa,
generate_cis=generate_cis,
)
@@ -17,6 +17,9 @@ from .charts import (
get_chart_color_for_percentage,
)
# Framework-specific generators
from .cis import CISReportGenerator
# Reusable components
# Reusable components: Color helpers, Badge components, Risk component,
# Table components, Section components
@@ -31,10 +34,12 @@ from .components import (
create_section_header,
create_status_badge,
create_summary_table,
escape_html,
get_color_for_compliance,
get_color_for_risk_level,
get_color_for_weight,
get_status_color,
truncate_text,
)
# Framework configuration: Main configuration, Color constants, ENS colors,
@@ -90,8 +95,6 @@ from .config import (
FrameworkConfig,
get_framework_config,
)
# Framework-specific generators
from .csa import CSAReportGenerator
from .ens import ENSReportGenerator
from .nis2 import NIS2ReportGenerator
@@ -109,6 +112,7 @@ __all__ = [
"ENSReportGenerator",
"NIS2ReportGenerator",
"CSAReportGenerator",
"CISReportGenerator",
# Configuration
"FrameworkConfig",
"FRAMEWORK_REGISTRY",
@@ -182,6 +186,9 @@ __all__ = [
# Section components
"create_section_header",
"create_summary_table",
# Text helpers
"truncate_text",
"escape_html",
# Chart functions
"get_chart_color_for_percentage",
"create_vertical_bar_chart",
+755
@@ -0,0 +1,755 @@
import os
import re
from collections import defaultdict
from typing import Any
from reportlab.lib.units import inch
from reportlab.platypus import Image, PageBreak, Paragraph, Spacer, Table, TableStyle
from api.models import StatusChoices
from .base import (
BaseComplianceReportGenerator,
ComplianceData,
RequirementData,
get_requirement_metadata,
)
from .charts import (
create_horizontal_bar_chart,
create_pie_chart,
create_stacked_bar_chart,
get_chart_color_for_percentage,
)
from .components import ColumnConfig, create_data_table, escape_html, truncate_text
from .config import (
CHART_COLOR_GREEN_1,
CHART_COLOR_RED,
CHART_COLOR_YELLOW,
COLOR_BG_BLUE,
COLOR_BLUE,
COLOR_BORDER_GRAY,
COLOR_DARK_GRAY,
COLOR_GRAY,
COLOR_GRID_GRAY,
COLOR_HIGH_RISK,
COLOR_LIGHT_BLUE,
COLOR_SAFE,
COLOR_WHITE,
)
# Ordered buckets used both in the executive summary tables and the charts
# section. Exposed as module constants so the two call sites never drift.
_PROFILE_BUCKET_ORDER: tuple[str, ...] = ("L1", "L2", "Other")
_ASSESSMENT_BUCKET_ORDER: tuple[str, ...] = ("Automated", "Manual")
# Anchored matchers for profile normalization — substring checks on "L1"/"L2"
# would happily match unrelated tokens like "CL2 Worker" or "HL2" coming from
# future CIS profile enum values.
_LEVEL_2_RE = re.compile(r"(?:\bLevel\s*2\b|\bL2\b|Level_2)")
_LEVEL_1_RE = re.compile(r"(?:\bLevel\s*1\b|\bL1\b|Level_1)")
def _normalize_profile(profile: Any) -> str:
"""Bucket a CIS Profile enum/string into one of: ``L1``, ``L2``, ``Other``.
The ``CIS_Requirement_Attribute_Profile`` enum has values like
``"Level 1"``, ``"Level 2"``, ``"E3 Level 1"``, ``"E5 Level 2"``. We
collapse them into three buckets to keep charts and badges readable
across CIS variants, using anchored regex matches so that future enum
values cannot accidentally promote e.g. ``"CL2 Worker"`` into ``L2``.
Args:
profile: The profile value (enum member, string, or ``None``).
Returns:
One of ``"L1"``, ``"L2"``, ``"Other"``.
"""
if profile is None:
return "Other"
value = getattr(profile, "value", None) or str(profile)
if _LEVEL_2_RE.search(value):
return "L2"
if _LEVEL_1_RE.search(value):
return "L1"
return "Other"
def _profile_badge_text(bucket: str) -> str:
"""Map a normalized profile bucket (L1/L2/Other) to a short badge label."""
return {"L1": "Level 1", "L2": "Level 2"}.get(bucket, "Other")
# =============================================================================
# CIS Report Generator
# =============================================================================
class CISReportGenerator(BaseComplianceReportGenerator):
"""
PDF report generator for CIS (Center for Internet Security) Benchmarks.
CIS differs from single-version frameworks (ENS, NIS2, CSA) in that:
- Each provider has multiple CIS versions (e.g. AWS: 1.4, 1.5, ..., 6.0).
- Section names differ across versions and providers and MUST be derived
at runtime from the loaded compliance data.
- Requirements carry Profile (Level 1/Level 2) and AssessmentStatus
(Automated/Manual) attributes that drive the executive summary and
charts.
This generator produces:
- Cover page with Prowler logo and dynamic CIS version/provider metadata
- Executive summary with overall compliance score, counts, and breakdowns
by Profile and AssessmentStatus
- Charts: overall status pie, pass rate by section (horizontal bar),
Level 1 vs Level 2 pass/fail distribution (stacked bar)
- Requirements index grouped by dynamic section
- Detailed findings for FAIL requirements with CIS-specific audit /
remediation / rationale details
"""
# Per-run memoization cache for ``_compute_statistics``. ``generate()``
# is the public entry point and is called once per PDF, so scoping the
# cache to the last seen ComplianceData instance is enough to avoid the
# double computation between executive summary and charts section.
_stats_cache_key: int | None = None
_stats_cache_value: dict | None = None
# Body section ordering — ensure every top-level section starts on its
# own clean page. The base class only puts a PageBreak AFTER Charts and
# Requirements Index, so Executive Summary and Charts end up sharing a
# page. This override prepends a PageBreak so Compliance Analysis always
# begins on a fresh page.
def _build_body_sections(self, data: ComplianceData) -> list:
return [PageBreak(), *super()._build_body_sections(data)]
# -------------------------------------------------------------------------
# Cover page override — shows dynamic CIS version + provider in the title
# -------------------------------------------------------------------------
def create_cover_page(self, data: ComplianceData) -> list:
"""Create the CIS report cover page with Prowler + CIS logos side by side."""
elements = []
# Create logos side by side (same pattern as NIS2 / ENS)
prowler_logo_path = os.path.join(
os.path.dirname(__file__), "../../assets/img/prowler_logo.png"
)
cis_logo_path = os.path.join(
os.path.dirname(__file__), "../../assets/img/cis_logo.png"
)
if os.path.exists(cis_logo_path):
prowler_logo = Image(prowler_logo_path, width=3.5 * inch, height=0.7 * inch)
cis_logo = Image(cis_logo_path, width=2.3 * inch, height=1.1 * inch)
logos_table = Table(
[[prowler_logo, cis_logo]], colWidths=[4 * inch, 2.5 * inch]
)
logos_table.setStyle(
TableStyle(
[
("ALIGN", (0, 0), (0, 0), "LEFT"),
("ALIGN", (1, 0), (1, 0), "RIGHT"),
("VALIGN", (0, 0), (0, 0), "MIDDLE"),
("VALIGN", (1, 0), (1, 0), "MIDDLE"),
]
)
)
elements.append(logos_table)
elif os.path.exists(prowler_logo_path):
# Fallback: only the Prowler logo if the CIS asset is missing
elements.append(Image(prowler_logo_path, width=5 * inch, height=1 * inch))
elements.append(Spacer(1, 0.5 * inch))
# Dynamic title: "CIS Benchmark v5.0 — AWS Compliance Report"
provider_label = ""
if data.provider_obj:
provider_label = f"{data.provider_obj.provider.upper()}"
title_text = (
f"CIS Benchmark v{data.version}{provider_label}<br/>Compliance Report"
)
elements.append(Paragraph(title_text, self.styles["title"]))
elements.append(Spacer(1, 0.5 * inch))
# Metadata table via base class helper
info_rows = self._build_info_rows(data, language=self.config.language)
metadata_data = []
for label, value in info_rows:
if label in ("Name:", "Description:") and value:
metadata_data.append(
[label, Paragraph(str(value), self.styles["normal_center"])]
)
else:
metadata_data.append([label, value])
metadata_table = Table(metadata_data, colWidths=[2 * inch, 4 * inch])
metadata_table.setStyle(
TableStyle(
[
("BACKGROUND", (0, 0), (0, -1), COLOR_BLUE),
("TEXTCOLOR", (0, 0), (0, -1), COLOR_WHITE),
("FONTNAME", (0, 0), (0, -1), "FiraCode"),
("BACKGROUND", (1, 0), (1, -1), COLOR_BG_BLUE),
("TEXTCOLOR", (1, 0), (1, -1), COLOR_GRAY),
("FONTNAME", (1, 0), (1, -1), "PlusJakartaSans"),
("ALIGN", (0, 0), (-1, -1), "LEFT"),
("VALIGN", (0, 0), (-1, -1), "TOP"),
("FONTSIZE", (0, 0), (-1, -1), 11),
("GRID", (0, 0), (-1, -1), 1, COLOR_BORDER_GRAY),
("LEFTPADDING", (0, 0), (-1, -1), 10),
("RIGHTPADDING", (0, 0), (-1, -1), 10),
("TOPPADDING", (0, 0), (-1, -1), 8),
("BOTTOMPADDING", (0, 0), (-1, -1), 8),
]
)
)
elements.append(metadata_table)
return elements
# -------------------------------------------------------------------------
# Executive Summary
# -------------------------------------------------------------------------
def create_executive_summary(self, data: ComplianceData) -> list:
"""Create the CIS executive summary section."""
elements = []
elements.append(Paragraph("Executive Summary", self.styles["h1"]))
elements.append(Spacer(1, 0.1 * inch))
stats = self._compute_statistics(data)
# --- Summary metrics table ---
summary_data = [
["Metric", "Value"],
["Total Requirements", str(stats["total"])],
["Passed", str(stats["passed"])],
["Failed", str(stats["failed"])],
["Manual", str(stats["manual"])],
["Overall Compliance", f"{stats['overall_compliance']:.1f}%"],
]
summary_table = Table(summary_data, colWidths=[3 * inch, 2 * inch])
summary_table.setStyle(
TableStyle(
[
("BACKGROUND", (0, 0), (-1, 0), COLOR_BLUE),
("TEXTCOLOR", (0, 0), (-1, 0), COLOR_WHITE),
("BACKGROUND", (0, 2), (0, 2), COLOR_SAFE),
("TEXTCOLOR", (0, 2), (0, 2), COLOR_WHITE),
("BACKGROUND", (0, 3), (0, 3), COLOR_HIGH_RISK),
("TEXTCOLOR", (0, 3), (0, 3), COLOR_WHITE),
("BACKGROUND", (0, 4), (0, 4), COLOR_DARK_GRAY),
("TEXTCOLOR", (0, 4), (0, 4), COLOR_WHITE),
("FONTNAME", (0, 0), (-1, 0), "PlusJakartaSans"),
("FONTSIZE", (0, 0), (-1, 0), 12),
("FONTSIZE", (0, 1), (-1, -1), 10),
("ALIGN", (0, 0), (-1, -1), "CENTER"),
("VALIGN", (0, 0), (-1, -1), "MIDDLE"),
("GRID", (0, 0), (-1, -1), 0.5, COLOR_BORDER_GRAY),
("BOTTOMPADDING", (0, 0), (-1, 0), 10),
(
"ROWBACKGROUNDS",
(1, 1),
(1, -1),
[COLOR_WHITE, COLOR_BG_BLUE],
),
]
)
)
elements.append(summary_table)
elements.append(Spacer(1, 0.25 * inch))
# --- Profile breakdown table ---
elements.append(Paragraph("Breakdown by Profile", self.styles["h2"]))
elements.append(Spacer(1, 0.1 * inch))
profile_counts = stats["profile_counts"]
profile_table_data = [["Profile", "Passed", "Failed", "Manual", "Total"]]
for bucket in _PROFILE_BUCKET_ORDER:
counts = profile_counts.get(bucket, {"passed": 0, "failed": 0, "manual": 0})
total = counts["passed"] + counts["failed"] + counts["manual"]
if total == 0:
continue
profile_table_data.append(
[
_profile_badge_text(bucket),
str(counts["passed"]),
str(counts["failed"]),
str(counts["manual"]),
str(total),
]
)
profile_table = Table(
profile_table_data,
colWidths=[1.5 * inch, 1 * inch, 1 * inch, 1 * inch, 1 * inch],
)
profile_table.setStyle(
TableStyle(
[
("BACKGROUND", (0, 0), (-1, 0), COLOR_BLUE),
("TEXTCOLOR", (0, 0), (-1, 0), COLOR_WHITE),
("FONTNAME", (0, 0), (-1, 0), "FiraCode"),
("FONTSIZE", (0, 0), (-1, 0), 10),
("ALIGN", (0, 0), (-1, -1), "CENTER"),
("VALIGN", (0, 0), (-1, -1), "MIDDLE"),
("FONTSIZE", (0, 1), (-1, -1), 9),
("GRID", (0, 0), (-1, -1), 0.5, COLOR_GRID_GRAY),
(
"ROWBACKGROUNDS",
(0, 1),
(-1, -1),
[COLOR_WHITE, COLOR_BG_BLUE],
),
]
)
)
elements.append(profile_table)
elements.append(Spacer(1, 0.25 * inch))
# --- Assessment status breakdown ---
elements.append(Paragraph("Breakdown by Assessment Status", self.styles["h2"]))
elements.append(Spacer(1, 0.1 * inch))
assessment_counts = stats["assessment_counts"]
assessment_table_data = [["Assessment", "Passed", "Failed", "Manual", "Total"]]
for bucket in _ASSESSMENT_BUCKET_ORDER:
counts = assessment_counts.get(
bucket, {"passed": 0, "failed": 0, "manual": 0}
)
total = counts["passed"] + counts["failed"] + counts["manual"]
if total == 0:
continue
assessment_table_data.append(
[
bucket,
str(counts["passed"]),
str(counts["failed"]),
str(counts["manual"]),
str(total),
]
)
assessment_table = Table(
assessment_table_data,
colWidths=[1.5 * inch, 1 * inch, 1 * inch, 1 * inch, 1 * inch],
)
assessment_table.setStyle(
TableStyle(
[
("BACKGROUND", (0, 0), (-1, 0), COLOR_LIGHT_BLUE),
("TEXTCOLOR", (0, 0), (-1, 0), COLOR_WHITE),
("FONTNAME", (0, 0), (-1, 0), "FiraCode"),
("FONTSIZE", (0, 0), (-1, 0), 10),
("ALIGN", (0, 0), (-1, -1), "CENTER"),
("VALIGN", (0, 0), (-1, -1), "MIDDLE"),
("FONTSIZE", (0, 1), (-1, -1), 9),
("GRID", (0, 0), (-1, -1), 0.5, COLOR_GRID_GRAY),
(
"ROWBACKGROUNDS",
(0, 1),
(-1, -1),
[COLOR_WHITE, COLOR_BG_BLUE],
),
]
)
)
elements.append(assessment_table)
elements.append(Spacer(1, 0.25 * inch))
# --- Top 5 failing sections ---
top_failing = stats["top_failing_sections"]
if top_failing:
elements.append(
Paragraph("Top Sections with Lowest Compliance", self.styles["h2"])
)
elements.append(Spacer(1, 0.1 * inch))
top_table_data = [["Section", "Passed", "Failed", "Compliance"]]
for section_label, section_stats in top_failing:
passed = section_stats["passed"]
failed = section_stats["failed"]
total = passed + failed
pct = (passed / total * 100) if total > 0 else 100
top_table_data.append(
[
truncate_text(section_label, 55),
str(passed),
str(failed),
f"{pct:.1f}%",
]
)
top_table = Table(
top_table_data,
colWidths=[3.5 * inch, 0.9 * inch, 0.9 * inch, 1.2 * inch],
)
top_table.setStyle(
TableStyle(
[
("BACKGROUND", (0, 0), (-1, 0), COLOR_HIGH_RISK),
("TEXTCOLOR", (0, 0), (-1, 0), COLOR_WHITE),
("FONTNAME", (0, 0), (-1, 0), "FiraCode"),
("FONTSIZE", (0, 0), (-1, 0), 10),
("ALIGN", (0, 0), (-1, -1), "CENTER"),
("VALIGN", (0, 0), (-1, -1), "MIDDLE"),
("FONTSIZE", (0, 1), (-1, -1), 9),
("GRID", (0, 0), (-1, -1), 0.5, COLOR_GRID_GRAY),
(
"ROWBACKGROUNDS",
(0, 1),
(-1, -1),
[COLOR_WHITE, COLOR_BG_BLUE],
),
]
)
)
elements.append(top_table)
return elements
# -------------------------------------------------------------------------
# Charts section
# -------------------------------------------------------------------------
def create_charts_section(self, data: ComplianceData) -> list:
"""Create the CIS charts section."""
elements = []
elements.append(Paragraph("Compliance Analysis", self.styles["h1"]))
elements.append(Spacer(1, 0.1 * inch))
# --- Pie chart: overall Pass / Fail / Manual ---
stats = self._compute_statistics(data)
pie_labels = []
pie_values = []
pie_colors = []
if stats["passed"] > 0:
pie_labels.append(f"Pass ({stats['passed']})")
pie_values.append(stats["passed"])
pie_colors.append(CHART_COLOR_GREEN_1)
if stats["failed"] > 0:
pie_labels.append(f"Fail ({stats['failed']})")
pie_values.append(stats["failed"])
pie_colors.append(CHART_COLOR_RED)
if stats["manual"] > 0:
pie_labels.append(f"Manual ({stats['manual']})")
pie_values.append(stats["manual"])
pie_colors.append(CHART_COLOR_YELLOW)
if pie_values:
elements.append(Paragraph("Overall Status Distribution", self.styles["h2"]))
elements.append(Spacer(1, 0.1 * inch))
pie_buffer = create_pie_chart(
labels=pie_labels,
values=pie_values,
colors=pie_colors,
)
pie_buffer.seek(0)
elements.append(Image(pie_buffer, width=4.5 * inch, height=4.5 * inch))
elements.append(Spacer(1, 0.2 * inch))
# --- Horizontal bar: pass rate by section ---
section_stats = stats["section_stats"]
if section_stats:
elements.append(PageBreak())
elements.append(Paragraph("Compliance by Section", self.styles["h1"]))
elements.append(Spacer(1, 0.1 * inch))
elements.append(
Paragraph(
"The following chart shows compliance percentage for each CIS "
"section based on automated checks:",
self.styles["normal_center"],
)
)
elements.append(Spacer(1, 0.1 * inch))
# Sort sections by pass rate descending for readability
sorted_sections = sorted(
section_stats.items(),
key=lambda item: (
(item[1]["passed"] / (item[1]["passed"] + item[1]["failed"]) * 100)
if (item[1]["passed"] + item[1]["failed"]) > 0
else 100
),
reverse=True,
)
bar_labels = []
bar_values = []
for section_label, section_data in sorted_sections:
total = section_data["passed"] + section_data["failed"]
if total == 0:
continue
pct = (section_data["passed"] / total) * 100
bar_labels.append(truncate_text(section_label, 60))
bar_values.append(pct)
if bar_values:
bar_buffer = create_horizontal_bar_chart(
labels=bar_labels,
values=bar_values,
xlabel="Compliance (%)",
color_func=get_chart_color_for_percentage,
label_fontsize=9,
)
bar_buffer.seek(0)
elements.append(Image(bar_buffer, width=6.5 * inch, height=5 * inch))
# --- Stacked bar: Level 1 vs Level 2 pass/fail ---
profile_counts = stats["profile_counts"]
has_profile_data = any(
(counts["passed"] + counts["failed"]) > 0
for counts in profile_counts.values()
)
if has_profile_data:
elements.append(PageBreak())
elements.append(Paragraph("Profile Breakdown", self.styles["h1"]))
elements.append(Spacer(1, 0.1 * inch))
elements.append(
Paragraph(
"Distribution of Pass / Fail / Manual across CIS profile levels.",
self.styles["normal_center"],
)
)
elements.append(Spacer(1, 0.1 * inch))
profile_labels = []
pass_series = []
fail_series = []
manual_series = []
for bucket in _PROFILE_BUCKET_ORDER:
counts = profile_counts.get(bucket)
if not counts:
continue
total = counts["passed"] + counts["failed"] + counts["manual"]
if total == 0:
continue
profile_labels.append(_profile_badge_text(bucket))
pass_series.append(counts["passed"])
fail_series.append(counts["failed"])
manual_series.append(counts["manual"])
if profile_labels:
stacked_buffer = create_stacked_bar_chart(
labels=profile_labels,
data_series={
"Pass": pass_series,
"Fail": fail_series,
"Manual": manual_series,
},
xlabel="Profile",
ylabel="Requirements",
)
stacked_buffer.seek(0)
elements.append(Image(stacked_buffer, width=6 * inch, height=4 * inch))
return elements
# -------------------------------------------------------------------------
# Requirements Index
# -------------------------------------------------------------------------
def create_requirements_index(self, data: ComplianceData) -> list:
"""Create the CIS requirements index grouped by dynamic section."""
elements = []
elements.append(Paragraph("Requirements Index", self.styles["h1"]))
elements.append(Spacer(1, 0.1 * inch))
sections = self._derive_sections(data)
by_section: dict[str, list[dict]] = defaultdict(list)
for req in data.requirements:
meta = get_requirement_metadata(req.id, data.attributes_by_requirement_id)
section = "Other"
profile_bucket = "Other"
assessment = ""
if meta:
section = getattr(meta, "Section", "Other") or "Other"
profile_bucket = _normalize_profile(getattr(meta, "Profile", None))
assessment_enum = getattr(meta, "AssessmentStatus", None)
assessment = getattr(assessment_enum, "value", None) or str(
assessment_enum or ""
)
by_section[section].append(
{
"id": req.id,
"description": truncate_text(req.description, 80),
"profile": _profile_badge_text(profile_bucket),
"assessment": assessment or "-",
"status": (req.status or "").upper(),
}
)
columns = [
ColumnConfig("ID", 0.9 * inch, "id", align="LEFT"),
ColumnConfig("Description", 3.0 * inch, "description", align="LEFT"),
ColumnConfig("Profile", 0.9 * inch, "profile"),
ColumnConfig("Assessment", 1 * inch, "assessment"),
ColumnConfig("Status", 0.9 * inch, "status"),
]
for section in sections:
rows = by_section.get(section, [])
if not rows:
continue
elements.append(Paragraph(truncate_text(section, 90), self.styles["h2"]))
elements.append(Spacer(1, 0.05 * inch))
table = create_data_table(
data=rows,
columns=columns,
header_color=self.config.primary_color,
normal_style=self.styles["normal_center"],
)
elements.append(table)
elements.append(Spacer(1, 0.15 * inch))
return elements
# -------------------------------------------------------------------------
# Detailed findings hook — inject CIS-specific rationale / audit content
# -------------------------------------------------------------------------
def _render_requirement_detail_extras(
self, req: RequirementData, data: ComplianceData
) -> list:
"""Render CIS rationale, impact, audit, remediation and references."""
extras = []
meta = get_requirement_metadata(req.id, data.attributes_by_requirement_id)
if meta is None:
return extras
field_map = [
("Rationale", "RationaleStatement"),
("Impact", "ImpactStatement"),
("Audit Procedure", "AuditProcedure"),
("Remediation", "RemediationProcedure"),
("References", "References"),
]
for label, attr_name in field_map:
value = getattr(meta, attr_name, None)
if not value:
continue
text = str(value).strip()
if not text:
continue
extras.append(Paragraph(f"<b>{label}:</b>", self.styles["h3"]))
extras.append(Paragraph(escape_html(text), self.styles["normal"]))
extras.append(Spacer(1, 0.08 * inch))
return extras
# -------------------------------------------------------------------------
# Private helpers
# -------------------------------------------------------------------------
def _derive_sections(self, data: ComplianceData) -> list[str]:
"""Extract ordered unique Section names from loaded compliance data."""
seen: dict[str, bool] = {}
for req in data.requirements:
meta = get_requirement_metadata(req.id, data.attributes_by_requirement_id)
if meta is None:
continue
section = getattr(meta, "Section", None) or "Other"
if section not in seen:
seen[section] = True
return list(seen.keys())
def _compute_statistics(self, data: ComplianceData) -> dict:
"""Aggregate all statistics needed for summary and charts.
Memoized per-``ComplianceData`` instance via ``_stats_cache_*``: the
executive summary and the charts section both need the same numbers,
        so they would otherwise iterate the requirements twice. We key on
``id(data)`` because ``ComplianceData`` is a dataclass and its
instances are not hashable.
Returns a dict with:
- total, passed, failed, manual: int
- overall_compliance: float (percentage)
- profile_counts: {"L1": {"passed", "failed", "manual"}, ...}
- assessment_counts: {"Automated": {...}, "Manual": {...}}
- section_stats: {section_name: {"passed", "failed", "manual"}, ...}
- top_failing_sections: list[(section_name, stats)] (up to 5)
"""
cache_key = id(data)
if self._stats_cache_key == cache_key and self._stats_cache_value is not None:
return self._stats_cache_value
stats = self._compute_statistics_uncached(data)
self._stats_cache_key = cache_key
self._stats_cache_value = stats
return stats
def _compute_statistics_uncached(self, data: ComplianceData) -> dict:
"""Actual aggregation kernel; call ``_compute_statistics`` instead."""
total = len(data.requirements)
passed = sum(1 for r in data.requirements if r.status == StatusChoices.PASS)
failed = sum(1 for r in data.requirements if r.status == StatusChoices.FAIL)
manual = sum(1 for r in data.requirements if r.status == StatusChoices.MANUAL)
evaluated = passed + failed
overall_compliance = (passed / evaluated * 100) if evaluated > 0 else 100.0
profile_counts: dict[str, dict[str, int]] = {
"L1": {"passed": 0, "failed": 0, "manual": 0},
"L2": {"passed": 0, "failed": 0, "manual": 0},
"Other": {"passed": 0, "failed": 0, "manual": 0},
}
assessment_counts: dict[str, dict[str, int]] = {
"Automated": {"passed": 0, "failed": 0, "manual": 0},
"Manual": {"passed": 0, "failed": 0, "manual": 0},
}
section_stats: dict[str, dict[str, int]] = defaultdict(
lambda: {"passed": 0, "failed": 0, "manual": 0}
)
for req in data.requirements:
meta = get_requirement_metadata(req.id, data.attributes_by_requirement_id)
if meta is None:
continue
profile_bucket = _normalize_profile(getattr(meta, "Profile", None))
assessment_enum = getattr(meta, "AssessmentStatus", None)
assessment_value = getattr(assessment_enum, "value", None) or str(
assessment_enum or ""
)
assessment_bucket = (
"Automated" if assessment_value == "Automated" else "Manual"
)
section = getattr(meta, "Section", None) or "Other"
status_key = {
StatusChoices.PASS: "passed",
StatusChoices.FAIL: "failed",
StatusChoices.MANUAL: "manual",
}.get(req.status)
if status_key is None:
continue
profile_counts[profile_bucket][status_key] += 1
assessment_counts[assessment_bucket][status_key] += 1
section_stats[section][status_key] += 1
# Top 5 sections with lowest pass rate (only sections with evaluated reqs)
def _section_rate(item):
_, stats_ = item
evaluated_ = stats_["passed"] + stats_["failed"]
if evaluated_ == 0:
return 101 # sort evaluated=0 to the bottom
return stats_["passed"] / evaluated_ * 100
top_failing_sections = sorted(
(
item
for item in section_stats.items()
if (item[1]["passed"] + item[1]["failed"]) > 0
),
key=_section_rate,
)[:5]
return {
"total": total,
"passed": passed,
"failed": failed,
"manual": manual,
"overall_compliance": overall_compliance,
"profile_counts": profile_counts,
"assessment_counts": assessment_counts,
"section_stats": dict(section_stats),
"top_failing_sections": top_failing_sections,
}
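For orientation, a tiny worked example of the aggregation above (hypothetical counts, not taken from any real benchmark); MANUAL requirements are excluded from the compliance percentage but still feed the tables and pie chart:

passed, failed, manual = 3, 2, 1
evaluated = passed + failed                     # MANUAL is not part of the denominator
overall_compliance = passed / evaluated * 100   # 60.0
total = passed + failed + manual                # 6 rows drive the summary tables and pie chart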
@@ -26,6 +26,52 @@ from .config import (
)
def truncate_text(text: str, max_len: int) -> str:
"""Truncate ``text`` to ``max_len`` characters, appending an ellipsis if cut.
Used by report generators that need to squeeze long descriptions, section
titles or finding titles into a fixed-width table cell.
Args:
        text: Source string. ``None`` and other falsy values are treated as
            empty; non-string values are converted with ``str``.
        max_len: Maximum output length including the ellipsis. Values < 4 skip
            the ellipsis and fall back to a plain slice so the result never
            exceeds ``max_len``.
Returns:
The original string if short enough, otherwise ``text[: max_len - 3] + "..."``.
When ``max_len < 4`` a plain substring of length ``max_len`` is returned
so callers never get a string longer than they asked for.
"""
if not text:
return ""
text = str(text)
if len(text) <= max_len:
return text
if max_len < 4:
return text[:max_len]
return text[: max_len - 3] + "..."
def escape_html(text: str) -> str:
"""Escape the minimal HTML entities required for safe ReportLab Paragraph rendering.
ReportLab's ``Paragraph`` parses a small HTML subset, so raw ``<``, ``>``
and ``&`` in user-provided content (rationale, remediation, etc.) would
    break layout or be interpreted as tags. This helper mirrors
    ``html.escape`` (without quote escaping) and keeps the output
    deterministic with no extra import.
Args:
text: Untrusted source string.
Returns:
A string safe to embed inside a ReportLab Paragraph.
"""
return (
str(text or "").replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
)
def get_color_for_risk_level(risk_level: int) -> colors.Color:
"""
Get color based on risk level.
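A few illustrative calls for the two helpers above, consistent with their docstrings (not part of the diff itself):

assert truncate_text("short", 10) == "short"
assert truncate_text("a" * 10, 8) == "aaaaa..."    # 5 characters kept + "..."
assert truncate_text("abcdef", 2) == "ab"          # max_len < 4: plain slice, no ellipsis
assert escape_html("<b>R&D</b>") == "&lt;b&gt;R&amp;D&lt;/b&gt;"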
@@ -313,6 +313,32 @@ FRAMEWORK_REGISTRY: dict[str, FrameworkConfig] = {
has_niveles=False,
has_weight=False,
),
"cis": FrameworkConfig(
name="cis",
display_name="CIS Benchmark",
logo_filename=None,
primary_color=COLOR_BLUE,
secondary_color=COLOR_LIGHT_BLUE,
bg_color=COLOR_BG_BLUE,
attribute_fields=[
"Section",
"SubSection",
"Profile",
"AssessmentStatus",
"Description",
"RationaleStatement",
"ImpactStatement",
"RemediationProcedure",
"AuditProcedure",
"References",
],
sections=None, # Derived dynamically per CIS variant (section names differ across versions/providers)
language="en",
has_risk_levels=False,
has_dimensions=False,
has_niveles=False,
has_weight=False,
),
}
@@ -336,5 +362,7 @@ def get_framework_config(compliance_id: str) -> FrameworkConfig | None:
return FRAMEWORK_REGISTRY["nis2"]
if "csa" in compliance_lower or "ccm" in compliance_lower:
return FRAMEWORK_REGISTRY["csa_ccm"]
if compliance_lower.startswith("cis_") or "cis" in compliance_lower:
return FRAMEWORK_REGISTRY["cis"]
return None
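A hypothetical lookup showing how a CIS compliance id resolves to the new registry entry; the id format used here is an assumption, only the substring matching above is guaranteed:

cfg = get_framework_config("cis_4.0_aws")   # hypothetical compliance id
assert cfg is FRAMEWORK_REGISTRY["cis"]
assert cfg.sections is None                 # CIS sections are derived per variant at render time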
+62 -14
View File
@@ -752,11 +752,19 @@ def _process_finding_micro_batch(
)
if mappings_to_create:
ResourceFindingMapping.objects.bulk_create(
created_mappings = ResourceFindingMapping.objects.bulk_create(
mappings_to_create,
batch_size=SCAN_DB_BATCH_SIZE,
ignore_conflicts=True,
unique_fields=["tenant_id", "resource_id", "finding_id"],
)
inserted = sum(1 for m in created_mappings if m.pk)
if inserted != len(mappings_to_create):
logger.error(
f"scan {scan_instance.id}: expected "
f"{len(mappings_to_create)} ResourceFindingMapping rows, "
f"inserted {inserted}. Rolling back micro-batch."
)
# Update finding denormalized arrays
findings_to_update = []
@@ -1189,8 +1197,39 @@ def aggregate_findings(tenant_id: str, scan_id: str):
muted_changed=agg["muted_changed"],
)
for agg in aggregation
if agg["resources__service"] is not None
and agg["resources__region"] is not None
}
ScanSummary.objects.bulk_create(scan_aggregations, batch_size=3000)
# Upsert so re-runs (post-mute reaggregation) don't trip
# `unique_scan_summary`; race-safe under concurrent writers.
ScanSummary.objects.bulk_create(
scan_aggregations,
batch_size=3000,
update_conflicts=True,
unique_fields=[
"tenant",
"scan",
"check_id",
"service",
"severity",
"region",
],
update_fields=[
"_pass",
"fail",
"muted",
"total",
"new",
"changed",
"unchanged",
"fail_new",
"fail_changed",
"pass_new",
"pass_changed",
"muted_new",
"muted_changed",
],
)
def _aggregate_findings_by_region(
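For readers unfamiliar with the upsert form of bulk_create used above, a minimal sketch with a hypothetical model; Django 4.1+ on PostgreSQL is assumed for update_conflicts support:

import uuid

from django.db import models


class Summary(models.Model):  # hypothetical model, for illustration only
    scan_id = models.UUIDField()
    check_id = models.CharField(max_length=255)
    total = models.IntegerField(default=0)

    class Meta:
        app_label = "sketch"
        constraints = [
            models.UniqueConstraint(fields=["scan_id", "check_id"], name="unique_sketch_summary")
        ]


scan_id = uuid.uuid4()
rows = [Summary(scan_id=scan_id, check_id="ec2_ebs_volume_encryption", total=3)]

# Re-running the same aggregation updates the existing row in place instead of
# raising IntegrityError on the unique constraint, which is what makes
# reaggregation after muting safe to repeat.
Summary.objects.bulk_create(
    rows,
    update_conflicts=True,
    unique_fields=["scan_id", "check_id"],
    update_fields=["total"],
)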
@@ -1535,13 +1574,24 @@ def aggregate_attack_surface(tenant_id: str, scan_id: str):
)
)
# Bulk create overview records
if overview_objects:
with rls_transaction(tenant_id):
AttackSurfaceOverview.objects.bulk_create(overview_objects, batch_size=500)
logger.info(
f"Created {len(overview_objects)} attack surface overview records for scan {scan_id}"
# Upsert so re-runs (post-mute reaggregation) don't trip
# `unique_attack_surface_per_scan`; race-safe under concurrent writers.
AttackSurfaceOverview.objects.bulk_create(
overview_objects,
batch_size=500,
update_conflicts=True,
unique_fields=["tenant_id", "scan_id", "attack_surface_type"],
update_fields=[
"total_findings",
"failed_findings",
"muted_failed_findings",
],
)
logger.info(
f"Upserted {len(overview_objects)} attack surface overview records for scan {scan_id}"
)
else:
logger.info(f"No attack surface overview records created for scan {scan_id}")
@@ -1804,11 +1854,9 @@ def aggregate_finding_group_summaries(tenant_id: str, scan_id: str):
)
# Aggregate findings by check_id for this scan.
# `pass_count`, `fail_count` and `manual_count` count *every* finding
# in this group, regardless of mute state, so the aggregated `status`
# always reflects the underlying check outcome (FAIL > PASS > MANUAL)
# even when the group is fully muted. The orthogonal `muted` flag is
# what tells whether the group has any actionable (non-muted) findings.
# `pass_count`, `fail_count` and `manual_count` only count non-muted
# findings. Muted findings are tracked separately via the
# `*_muted_count` fields.
aggregated = (
Finding.objects.filter(
tenant_id=tenant_id,
@@ -1817,9 +1865,9 @@ def aggregate_finding_group_summaries(tenant_id: str, scan_id: str):
.values("check_id")
.annotate(
severity_order=Max(severity_case),
pass_count=Count("id", filter=Q(status="PASS")),
fail_count=Count("id", filter=Q(status="FAIL")),
manual_count=Count("id", filter=Q(status="MANUAL")),
pass_count=Count("id", filter=Q(status="PASS", muted=False)),
fail_count=Count("id", filter=Q(status="FAIL", muted=False)),
manual_count=Count("id", filter=Q(status="MANUAL", muted=False)),
pass_muted_count=Count("id", filter=Q(status="PASS", muted=True)),
fail_muted_count=Count("id", filter=Q(status="FAIL", muted=True)),
manual_muted_count=Count("id", filter=Q(status="MANUAL", muted=True)),
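As a small illustration of the new semantics (hypothetical data): a check with one non-muted FAIL and one muted FAIL now aggregates as below, where previously fail_count would have been 2.

row = {"check_id": "s3_bucket_public_access", "fail_count": 1, "fail_muted_count": 1}
assert row["fail_count"] == 1        # muted findings no longer feed the primary counter
assert row["fail_muted_count"] == 1  # tracked separately for the muted breakdown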
+72 -19
View File
@@ -20,8 +20,8 @@ from tasks.jobs.backfill import (
backfill_finding_group_summaries,
backfill_provider_compliance_scores,
backfill_resource_scan_summaries,
backfill_scan_category_summaries,
backfill_scan_resource_group_summaries,
aggregate_scan_category_summaries,
aggregate_scan_resource_group_summaries,
)
from tasks.jobs.connection import (
check_integration_connection,
@@ -46,7 +46,11 @@ from tasks.jobs.lighthouse_providers import (
refresh_lighthouse_provider_models,
)
from tasks.jobs.muting import mute_historical_findings
from tasks.jobs.report import generate_compliance_reports_job
from tasks.jobs.report import (
STALE_TMP_OUTPUT_MAX_AGE_HOURS,
_cleanup_stale_tmp_output_directories,
generate_compliance_reports_job,
)
from tasks.jobs.scan import (
aggregate_attack_surface,
aggregate_daily_severity,
@@ -440,6 +444,19 @@ def generate_outputs_task(scan_id: str, provider_id: str, tenant_id: str):
scan_id (str): The scan identifier.
        provider_id (str): The provider id to be used in generating outputs.
"""
try:
_cleanup_stale_tmp_output_directories(
DJANGO_TMP_OUTPUT_DIRECTORY,
max_age_hours=STALE_TMP_OUTPUT_MAX_AGE_HOURS,
exclude_scan=(tenant_id, scan_id),
)
except Exception as error:
logger.warning(
"Skipping stale tmp cleanup before output generation for scan %s: %s",
scan_id,
error,
)
# Check if the scan has findings
if not ScanSummary.objects.filter(scan_id=scan_id).exists():
logger.info(f"No findings found for scan {scan_id}")
@@ -659,9 +676,9 @@ def backfill_finding_group_summaries_task(tenant_id: str, days: int = None):
return backfill_finding_group_summaries(tenant_id=tenant_id, days=days)
@shared_task(name="backfill-scan-category-summaries", queue="backfill")
@shared_task(name="scan-category-summaries", queue="overview")
@handle_provider_deletion
def backfill_scan_category_summaries_task(tenant_id: str, scan_id: str):
def aggregate_scan_category_summaries_task(tenant_id: str, scan_id: str):
"""
Backfill ScanCategorySummary for a completed scan.
@@ -671,12 +688,12 @@ def backfill_scan_category_summaries_task(tenant_id: str, scan_id: str):
tenant_id (str): The tenant identifier.
scan_id (str): The scan identifier.
"""
return backfill_scan_category_summaries(tenant_id=tenant_id, scan_id=scan_id)
return aggregate_scan_category_summaries(tenant_id=tenant_id, scan_id=scan_id)
@shared_task(name="backfill-scan-resource-group-summaries", queue="backfill")
@shared_task(name="scan-resource-group-summaries", queue="overview")
@handle_provider_deletion
def backfill_scan_resource_group_summaries_task(tenant_id: str, scan_id: str):
def aggregate_scan_resource_group_summaries_task(tenant_id: str, scan_id: str):
"""
Backfill ScanGroupSummary for a completed scan.
@@ -686,7 +703,7 @@ def backfill_scan_resource_group_summaries_task(tenant_id: str, scan_id: str):
tenant_id (str): The tenant identifier.
scan_id (str): The scan identifier.
"""
return backfill_scan_resource_group_summaries(tenant_id=tenant_id, scan_id=scan_id)
return aggregate_scan_resource_group_summaries(tenant_id=tenant_id, scan_id=scan_id)
@shared_task(name="backfill-provider-compliance-scores", queue="backfill")
@@ -771,15 +788,26 @@ def aggregate_finding_group_summaries_task(tenant_id: str, scan_id: str):
)
@set_tenant(keep_tenant=True)
def reaggregate_all_finding_group_summaries_task(tenant_id: str):
"""Reaggregate finding group summaries for every (provider, day) combination.
"""Reaggregate every pre-aggregated summary table for this tenant.
Mirrors the unbounded scope of `mute_historical_findings_task`: that task
rewrites every Finding row whose UID matches a mute rule, with no time
limit. To keep the daily summaries consistent with that update, this task
re-runs the aggregator on the latest completed scan of every (provider,
day) pair that exists in the database. Tasks are dispatched in parallel
via a Celery group so the wallclock scales with the worker pool, not with
the number of pairs.
limit. To keep the pre-aggregated tables consistent with that update,
this task re-runs the same per-scan aggregation pipeline that scan
completion runs on the latest completed scan of every (provider, day)
pair, rebuilding the tables that power the read endpoints:
- `ScanSummary` and `DailySeveritySummary` -> `/overviews/findings`,
`/overviews/findings-severity`, `/overviews/services`.
- `FindingGroupDailySummary` -> `/finding-groups` and
`/finding-groups/latest`.
- `ScanGroupSummary` -> `/overviews/resource-groups` (resource
inventory).
- `ScanCategorySummary` -> `/overviews/categories`.
- `AttackSurfaceOverview` -> `/overviews/attack-surfaces`.
Per-scan pipelines are dispatched in parallel via a Celery group so
wallclock scales with the worker pool.
"""
completed_scans = list(
Scan.objects.filter(
@@ -804,12 +832,32 @@ def reaggregate_all_finding_group_summaries_task(tenant_id: str):
scan_ids = list(latest_scans.values())
if scan_ids:
logger.info(
"Reaggregating finding group summaries for %d scans (provider x day)",
"Reaggregating overview/finding summaries for %d scans (provider x day)",
len(scan_ids),
)
# DailySeveritySummary reads from ScanSummary, so ScanSummary must be
# recomputed first; the other aggregators read Finding directly and
# can run in parallel with the severity step.
group(
aggregate_finding_group_summaries_task.si(
tenant_id=tenant_id, scan_id=scan_id
chain(
perform_scan_summary_task.si(tenant_id=tenant_id, scan_id=scan_id),
group(
aggregate_daily_severity_task.si(
tenant_id=tenant_id, scan_id=scan_id
),
aggregate_finding_group_summaries_task.si(
tenant_id=tenant_id, scan_id=scan_id
),
aggregate_scan_resource_group_summaries_task.si(
tenant_id=tenant_id, scan_id=scan_id
),
aggregate_scan_category_summaries_task.si(
tenant_id=tenant_id, scan_id=scan_id
),
aggregate_attack_surface_task.si(
tenant_id=tenant_id, scan_id=scan_id
),
),
)
for scan_id in scan_ids
).apply_async()
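A minimal Celery sketch of the shape used above, one chained prerequisite followed by a parallel fan-out; task names, broker and signatures are illustrative:

from celery import Celery, chain, group

app = Celery("sketch", broker="memory://")


@app.task
def rebuild_scan_summary(scan_id): ...


@app.task
def rebuild_daily_severity(scan_id): ...


@app.task
def rebuild_attack_surface(scan_id): ...


workflow = chain(
    rebuild_scan_summary.si("scan-1"),          # must complete first: severity reads from it
    group(
        rebuild_daily_severity.si("scan-1"),
        rebuild_attack_surface.si("scan-1"),    # reads Finding directly, safe to run in parallel
    ),
)
# workflow.apply_async()  # the real task dispatches one such chain per (provider, day) scan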
@@ -982,13 +1030,17 @@ def jira_integration_task(
@handle_provider_deletion
def generate_compliance_reports_task(tenant_id: str, scan_id: str, provider_id: str):
"""
Optimized task to generate ThreatScore, ENS, NIS2, and CSA CCM reports with shared queries.
Optimized task to generate ThreatScore, ENS, NIS2, CSA CCM and CIS reports with shared queries.
This task is more efficient than running separate report tasks because it reuses database queries:
- Provider object fetched once (instead of multiple times)
- Requirement statistics aggregated once (instead of multiple times)
- Can reduce database load by up to 50-70%
CIS emits a single PDF per run: the one matching the highest CIS version
available for the scan's provider, picked dynamically from
``Compliance.get_bulk`` (no hard-coded provider version mapping).
Args:
tenant_id (str): The tenant identifier.
scan_id (str): The scan identifier.
@@ -1005,6 +1057,7 @@ def generate_compliance_reports_task(tenant_id: str, scan_id: str, provider_id:
generate_ens=True,
generate_nis2=True,
generate_csa=True,
generate_cis=True,
)
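A hedged sketch of how the "highest CIS version" selection could be done; the real helper is _pick_latest_cis_variant (imported in the tests further down), and the cis_<version>_<provider> key format assumed here is an illustration only:

def pick_latest_cis_variant(compliance_ids: list[str], provider_type: str) -> str | None:
    """Return the CIS compliance id with the highest version for the given provider."""

    def version_key(compliance_id: str) -> tuple[int, ...]:
        # "cis_4.0.1_aws" -> (4, 0, 1); non-numeric fragments are ignored.
        version = compliance_id.split("_")[1]
        return tuple(int(part) for part in version.split(".") if part.isdigit())

    candidates = [
        compliance_id
        for compliance_id in compliance_ids
        if compliance_id.startswith("cis_") and compliance_id.endswith(f"_{provider_type}")
    ]
    return max(candidates, key=version_key, default=None)


# pick_latest_cis_variant(["cis_3.0_aws", "cis_4.0_aws", "cis_2.0_azure"], "aws") -> "cis_4.0_aws"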
@@ -38,11 +38,14 @@ class TestAttackPathsRun:
@patch("tasks.jobs.attack_paths.scan.db_utils.finish_attack_paths_scan")
@patch("tasks.jobs.attack_paths.scan.db_utils.update_attack_paths_scan_progress")
@patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
@patch("tasks.jobs.attack_paths.scan.sync.sync_graph")
@patch("tasks.jobs.attack_paths.scan.graph_database.drop_subgraph")
@patch(
"tasks.jobs.attack_paths.scan.sync.sync_graph",
return_value={"nodes": 0, "relationships": 0},
)
@patch("tasks.jobs.attack_paths.scan.graph_database.drop_subgraph", return_value=0)
@patch("tasks.jobs.attack_paths.scan.indexes.create_sync_indexes")
@patch("tasks.jobs.attack_paths.scan.internet.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis", return_value=(0, 0))
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
@patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
@@ -188,7 +191,7 @@ class TestAttackPathsRun:
@patch("tasks.jobs.attack_paths.scan.db_utils.set_provider_graph_data_ready")
@patch("tasks.jobs.attack_paths.scan.db_utils.update_attack_paths_scan_progress")
@patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
@patch("tasks.jobs.attack_paths.scan.findings.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis", return_value=(0, 0))
@patch("tasks.jobs.attack_paths.scan.internet.analysis")
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
@@ -287,7 +290,7 @@ class TestAttackPathsRun:
@patch("tasks.jobs.attack_paths.scan.db_utils.set_provider_graph_data_ready")
@patch("tasks.jobs.attack_paths.scan.db_utils.update_attack_paths_scan_progress")
@patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
@patch("tasks.jobs.attack_paths.scan.findings.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis", return_value=(0, 0))
@patch("tasks.jobs.attack_paths.scan.internet.analysis")
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
@@ -390,7 +393,7 @@ class TestAttackPathsRun:
@patch("tasks.jobs.attack_paths.scan.db_utils.set_provider_graph_data_ready")
@patch("tasks.jobs.attack_paths.scan.db_utils.update_attack_paths_scan_progress")
@patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
@patch("tasks.jobs.attack_paths.scan.findings.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis", return_value=(0, 0))
@patch("tasks.jobs.attack_paths.scan.internet.analysis")
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
@@ -489,14 +492,17 @@ class TestAttackPathsRun:
@patch("tasks.jobs.attack_paths.scan.db_utils.set_provider_graph_data_ready")
@patch("tasks.jobs.attack_paths.scan.db_utils.update_attack_paths_scan_progress")
@patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
@patch("tasks.jobs.attack_paths.scan.sync.sync_graph")
@patch(
"tasks.jobs.attack_paths.scan.sync.sync_graph",
return_value={"nodes": 0, "relationships": 0},
)
@patch(
"tasks.jobs.attack_paths.scan.graph_database.drop_subgraph",
side_effect=RuntimeError("drop failed"),
)
@patch("tasks.jobs.attack_paths.scan.indexes.create_sync_indexes")
@patch("tasks.jobs.attack_paths.scan.internet.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis", return_value=(0, 0))
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
@patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
@@ -609,7 +615,7 @@ class TestAttackPathsRun:
@patch("tasks.jobs.attack_paths.scan.graph_database.drop_subgraph")
@patch("tasks.jobs.attack_paths.scan.indexes.create_sync_indexes")
@patch("tasks.jobs.attack_paths.scan.internet.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis", return_value=(0, 0))
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
@patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
@@ -718,11 +724,14 @@ class TestAttackPathsRun:
@patch("tasks.jobs.attack_paths.scan.db_utils.set_provider_graph_data_ready")
@patch("tasks.jobs.attack_paths.scan.db_utils.update_attack_paths_scan_progress")
@patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
@patch("tasks.jobs.attack_paths.scan.sync.sync_graph")
@patch(
"tasks.jobs.attack_paths.scan.sync.sync_graph",
return_value={"nodes": 0, "relationships": 0},
)
@patch("tasks.jobs.attack_paths.scan.graph_database.drop_subgraph")
@patch("tasks.jobs.attack_paths.scan.indexes.create_sync_indexes")
@patch("tasks.jobs.attack_paths.scan.internet.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis", return_value=(0, 0))
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
@patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
@@ -833,14 +842,17 @@ class TestAttackPathsRun:
@patch("tasks.jobs.attack_paths.scan.db_utils.set_provider_graph_data_ready")
@patch("tasks.jobs.attack_paths.scan.db_utils.update_attack_paths_scan_progress")
@patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
@patch("tasks.jobs.attack_paths.scan.sync.sync_graph")
@patch(
"tasks.jobs.attack_paths.scan.sync.sync_graph",
return_value={"nodes": 0, "relationships": 0},
)
@patch(
"tasks.jobs.attack_paths.scan.graph_database.drop_subgraph",
side_effect=RuntimeError("drop failed"),
)
@patch("tasks.jobs.attack_paths.scan.indexes.create_sync_indexes")
@patch("tasks.jobs.attack_paths.scan.internet.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis")
@patch("tasks.jobs.attack_paths.scan.findings.analysis", return_value=(0, 0))
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
@patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
@@ -1273,11 +1285,13 @@ class TestAttackPathsFindingsHelpers:
config = SimpleNamespace(update_tag=12345)
mock_session = MagicMock()
first_result = MagicMock()
first_result.single.return_value = {"merged_count": 1, "dropped_count": 0}
second_result = MagicMock()
second_result.single.return_value = {"merged_count": 0, "dropped_count": 1}
mock_session.run.side_effect = [first_result, second_result]
with (
patch(
"tasks.jobs.attack_paths.findings.get_root_node_label",
return_value="AWSAccount",
),
patch(
"tasks.jobs.attack_paths.findings.get_node_uid_field",
return_value="arn",
@@ -1286,6 +1300,7 @@ class TestAttackPathsFindingsHelpers:
"tasks.jobs.attack_paths.findings.get_provider_resource_label",
return_value="_AWSResource",
),
patch("tasks.jobs.attack_paths.findings.logger") as mock_logger,
):
findings_module.load_findings(
mock_session, findings_generator(), provider, config
@@ -1294,26 +1309,16 @@ class TestAttackPathsFindingsHelpers:
assert mock_session.run.call_count == 2
for call_args in mock_session.run.call_args_list:
params = call_args.args[1]
assert params["provider_uid"] == str(provider.uid)
assert params["last_updated"] == config.update_tag
assert "findings_data" in params
def test_cleanup_findings_runs_batches(self, providers_fixture):
provider = providers_fixture[0]
config = SimpleNamespace(update_tag=1024)
mock_session = MagicMock()
first_batch = MagicMock()
first_batch.single.return_value = {"deleted_findings_count": 3}
second_batch = MagicMock()
second_batch.single.return_value = {"deleted_findings_count": 0}
mock_session.run.side_effect = [first_batch, second_batch]
findings_module.cleanup_findings(mock_session, provider, config)
assert mock_session.run.call_count == 2
params = mock_session.run.call_args.args[1]
assert params["last_updated"] == config.update_tag
summary_log = next(
call_args.args[0]
for call_args in mock_logger.info.call_args_list
if call_args.args and "Finished loading" in call_args.args[0]
)
assert "edges_merged=1" in summary_log
assert "edges_dropped=1" in summary_log
def test_stream_findings_with_resources_returns_latest_scan_data(
self,
@@ -1494,11 +1499,12 @@ class TestAttackPathsFindingsHelpers:
"default",
):
result = findings_module._enrich_batch_with_resources(
[finding_dict], str(tenant.id)
[finding_dict], str(tenant.id), lambda uid: f"short:{uid}"
)
assert len(result) == 1
assert result[0]["resource_uid"] == resource.uid
assert result[0]["resource_short_uid"] == f"short:{resource.uid}"
assert result[0]["id"] == str(finding.id)
assert result[0]["status"] == "FAIL"
@@ -1582,7 +1588,7 @@ class TestAttackPathsFindingsHelpers:
"default",
):
result = findings_module._enrich_batch_with_resources(
[finding_dict], str(tenant.id)
[finding_dict], str(tenant.id), lambda uid: uid
)
assert len(result) == 3
@@ -1656,7 +1662,7 @@ class TestAttackPathsFindingsHelpers:
patch("tasks.jobs.attack_paths.findings.logger") as mock_logger,
):
result = findings_module._enrich_batch_with_resources(
[finding_dict], str(tenant.id)
[finding_dict], str(tenant.id), lambda uid: uid
)
assert len(result) == 0
@@ -1690,10 +1696,6 @@ class TestAttackPathsFindingsHelpers:
yield # Make it a generator
with (
patch(
"tasks.jobs.attack_paths.findings.get_root_node_label",
return_value="AWSAccount",
),
patch(
"tasks.jobs.attack_paths.findings.get_node_uid_field",
return_value="arn",
@@ -1707,6 +1709,63 @@ class TestAttackPathsFindingsHelpers:
mock_session.run.assert_not_called()
@pytest.mark.parametrize(
"uid, expected",
[
(
"arn:aws:ec2:us-east-1:552455647653:instance/i-05075b63eb51baacb",
"i-05075b63eb51baacb",
),
(
"arn:aws:ec2:us-east-1:123456789012:volume/vol-0abcd1234ef567890",
"vol-0abcd1234ef567890",
),
(
"arn:aws:ec2:us-east-1:123456789012:security-group/sg-0123abcd",
"sg-0123abcd",
),
("arn:aws:s3:::my-bucket-name", "my-bucket-name"),
("arn:aws:iam::123456789012:role/MyRole", "MyRole"),
(
"arn:aws:lambda:us-east-1:123456789012:function:my-function",
"my-function",
),
("i-05075b63eb51baacb", "i-05075b63eb51baacb"),
],
)
def test_extract_short_uid_aws_variants(self, uid, expected):
from tasks.jobs.attack_paths.aws import extract_short_uid
assert extract_short_uid(uid) == expected
def test_insert_finding_template_has_short_id_fallback(self):
from tasks.jobs.attack_paths.queries import (
INSERT_FINDING_TEMPLATE,
render_cypher_template,
)
rendered = render_cypher_template(
INSERT_FINDING_TEMPLATE,
{
"__NODE_UID_FIELD__": "arn",
"__RESOURCE_LABEL__": "_AWSResource",
},
)
assert (
"resource_by_uid:_AWSResource {arn: finding_data.resource_uid}" in rendered
)
assert "resource_by_id:_AWSResource {id: finding_data.resource_uid}" in rendered
assert (
"resource_by_short:_AWSResource {id: finding_data.resource_short_uid}"
in rendered
)
assert "head(collect(resource_by_short)) AS resource_by_short" in rendered
assert (
"COALESCE(resource_by_uid, resource_by_id, resource_by_short)" in rendered
)
assert "RETURN merged_count, dropped_count" in rendered
class TestAddResourceLabel:
def test_add_resource_label_applies_private_label(self):
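One possible implementation consistent with the parametrized expectations above; the actual tasks.jobs.attack_paths.aws.extract_short_uid may differ:

def extract_short_uid(uid: str) -> str:
    """Reduce an AWS ARN to its short resource identifier; pass non-ARNs through."""
    if not uid.startswith("arn:"):
        return uid
    # The resource portion is everything after the fifth ':' (the account-id field).
    resource = uid.split(":", 5)[-1]
    if "/" in resource:                       # e.g. instance/i-0507... or role/MyRole
        resource = resource.rsplit("/", 1)[-1]
    if ":" in resource:                       # e.g. function:my-function
        resource = resource.rsplit(":", 1)[-1]
    return resource


assert extract_short_uid("arn:aws:s3:::my-bucket-name") == "my-bucket-name"
assert extract_short_uid("i-05075b63eb51baacb") == "i-05075b63eb51baacb"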
+145 -14
View File
@@ -7,8 +7,8 @@ from tasks.jobs.backfill import (
backfill_compliance_summaries,
backfill_provider_compliance_scores,
backfill_resource_scan_summaries,
backfill_scan_category_summaries,
backfill_scan_resource_group_summaries,
aggregate_scan_category_summaries,
aggregate_scan_resource_group_summaries,
)
from api.models import (
@@ -183,6 +183,10 @@ class TestBackfillComplianceSummaries:
def test_backfill_creates_compliance_summaries(
self, tenants_fixture, scans_fixture, compliance_requirements_overviews_fixture
):
# Fixture seeds compliance rows the backfill aggregates over; pytest
# injects it by parameter name, so we reference it explicitly here
# to keep static analysers from flagging it as unused.
del compliance_requirements_overviews_fixture
tenant = tenants_fixture[0]
scan = scans_fixture[0]
@@ -227,22 +231,86 @@ class TestBackfillComplianceSummaries:
@pytest.mark.django_db
class TestBackfillScanCategorySummaries:
def test_already_backfilled(self, scan_category_summary_fixture):
def test_rerun_with_no_findings_is_noop(self, scan_category_summary_fixture):
"""When the scan has no findings, the backfill is a no-op: it
reports `no categories to backfill` and leaves the table
untouched. The upsert path cannot drop rows it does not produce,
so any pre-existing row survives (matching the scan-completion
writer that used `ignore_conflicts=True`)."""
tenant_id = scan_category_summary_fixture.tenant_id
scan_id = scan_category_summary_fixture.scan_id
result = backfill_scan_category_summaries(str(tenant_id), str(scan_id))
result = aggregate_scan_category_summaries(str(tenant_id), str(scan_id))
assert result == {"status": "already backfilled"}
assert result == {"status": "no categories to backfill"}
assert ScanCategorySummary.objects.filter(
tenant_id=tenant_id, scan_id=scan_id, category="existing-category"
).exists()
def test_rerun_upserts_without_duplicating(self, findings_with_categories_fixture):
"""Calling the backfill twice upserts rather than raising on
`unique_category_severity_per_scan`; rows are updated in place
(same primary keys)."""
finding = findings_with_categories_fixture
tenant_id = str(finding.tenant_id)
scan_id = str(finding.scan_id)
aggregate_scan_category_summaries(tenant_id, scan_id)
first_ids = set(
ScanCategorySummary.objects.filter(
tenant_id=tenant_id, scan_id=scan_id
).values_list("id", flat=True)
)
aggregate_scan_category_summaries(tenant_id, scan_id)
second_ids = set(
ScanCategorySummary.objects.filter(
tenant_id=tenant_id, scan_id=scan_id
).values_list("id", flat=True)
)
assert first_ids == second_ids
assert len(first_ids) == 2 # 2 categories x 1 severity
def test_rerun_reflects_mute_between_runs(self, findings_with_categories_fixture):
"""Muting a finding between two backfill runs must move counters:
`failed_findings` and `new_failed_findings` drop to zero (muted
findings are excluded from those totals). Guards against a
regression where the upsert keeps stale counts from the first run."""
finding = findings_with_categories_fixture
tenant_id = str(finding.tenant_id)
scan_id = str(finding.scan_id)
aggregate_scan_category_summaries(tenant_id, scan_id)
before = list(
ScanCategorySummary.objects.filter(tenant_id=tenant_id, scan_id=scan_id)
)
assert all(s.failed_findings == 1 for s in before)
assert all(s.new_failed_findings == 1 for s in before)
assert all(s.total_findings == 1 for s in before)
Finding.all_objects.filter(pk=finding.pk).update(muted=True)
aggregate_scan_category_summaries(tenant_id, scan_id)
after = list(
ScanCategorySummary.objects.filter(tenant_id=tenant_id, scan_id=scan_id)
)
assert {s.id for s in after} == {s.id for s in before}
assert all(s.failed_findings == 0 for s in after)
assert all(s.new_failed_findings == 0 for s in after)
assert all(s.total_findings == 0 for s in after)
def test_not_completed_scan(self, get_not_completed_scans):
for scan in get_not_completed_scans:
result = backfill_scan_category_summaries(str(scan.tenant_id), str(scan.id))
result = aggregate_scan_category_summaries(
str(scan.tenant_id), str(scan.id)
)
assert result == {"status": "scan is not completed"}
def test_no_categories_to_backfill(self, scans_fixture):
scan = scans_fixture[1] # Failed scan with no findings
result = backfill_scan_category_summaries(str(scan.tenant_id), str(scan.id))
result = aggregate_scan_category_summaries(str(scan.tenant_id), str(scan.id))
assert result == {"status": "no categories to backfill"}
def test_successful_backfill(self, findings_with_categories_fixture):
@@ -250,7 +318,7 @@ class TestBackfillScanCategorySummaries:
tenant_id = str(finding.tenant_id)
scan_id = str(finding.scan_id)
result = backfill_scan_category_summaries(tenant_id, scan_id)
result = aggregate_scan_category_summaries(tenant_id, scan_id)
# 2 categories × 1 severity = 2 rows
assert result == {"status": "backfilled", "categories_count": 2}
@@ -311,24 +379,87 @@ def scan_resource_group_summary_fixture(scans_fixture):
@pytest.mark.django_db
class TestBackfillScanGroupSummaries:
def test_already_backfilled(self, scan_resource_group_summary_fixture):
def test_rerun_with_no_findings_is_noop(self, scan_resource_group_summary_fixture):
"""When the scan has no findings, the backfill is a no-op: it
reports `no resource groups to backfill` and leaves the table
untouched. The upsert path cannot drop rows it does not produce,
so any pre-existing row survives (matching the scan-completion
writer that used `ignore_conflicts=True`)."""
tenant_id = scan_resource_group_summary_fixture.tenant_id
scan_id = scan_resource_group_summary_fixture.scan_id
result = backfill_scan_resource_group_summaries(str(tenant_id), str(scan_id))
result = aggregate_scan_resource_group_summaries(str(tenant_id), str(scan_id))
assert result == {"status": "already backfilled"}
assert result == {"status": "no resource groups to backfill"}
assert ScanGroupSummary.objects.filter(
tenant_id=tenant_id, scan_id=scan_id, resource_group="existing-group"
).exists()
def test_rerun_upserts_without_duplicating(self, findings_with_group_fixture):
"""Calling the backfill twice upserts rather than raising on
`unique_resource_group_severity_per_scan`; rows are updated in
place (same primary keys)."""
finding = findings_with_group_fixture
tenant_id = str(finding.tenant_id)
scan_id = str(finding.scan_id)
aggregate_scan_resource_group_summaries(tenant_id, scan_id)
first_ids = set(
ScanGroupSummary.objects.filter(
tenant_id=tenant_id, scan_id=scan_id
).values_list("id", flat=True)
)
aggregate_scan_resource_group_summaries(tenant_id, scan_id)
second_ids = set(
ScanGroupSummary.objects.filter(
tenant_id=tenant_id, scan_id=scan_id
).values_list("id", flat=True)
)
assert first_ids == second_ids
assert len(first_ids) == 1 # 1 resource group x 1 severity
def test_rerun_reflects_mute_between_runs(self, findings_with_group_fixture):
"""Muting a finding between two backfill runs must move counters:
`failed_findings` and `new_failed_findings` drop to zero (muted
findings are excluded from those totals). Guards against a
regression where the upsert keeps stale counts from the first run."""
finding = findings_with_group_fixture
tenant_id = str(finding.tenant_id)
scan_id = str(finding.scan_id)
aggregate_scan_resource_group_summaries(tenant_id, scan_id)
before = list(
ScanGroupSummary.objects.filter(tenant_id=tenant_id, scan_id=scan_id)
)
assert len(before) == 1
assert before[0].failed_findings == 1
assert before[0].new_failed_findings == 1
assert before[0].total_findings == 1
Finding.all_objects.filter(pk=finding.pk).update(muted=True)
aggregate_scan_resource_group_summaries(tenant_id, scan_id)
after = list(
ScanGroupSummary.objects.filter(tenant_id=tenant_id, scan_id=scan_id)
)
assert {s.id for s in after} == {s.id for s in before}
assert after[0].failed_findings == 0
assert after[0].new_failed_findings == 0
assert after[0].total_findings == 0
def test_not_completed_scan(self, get_not_completed_scans):
for scan in get_not_completed_scans:
result = backfill_scan_resource_group_summaries(
result = aggregate_scan_resource_group_summaries(
str(scan.tenant_id), str(scan.id)
)
assert result == {"status": "scan is not completed"}
def test_no_resource_groups_to_backfill(self, scans_fixture):
scan = scans_fixture[1] # Failed scan with no findings
result = backfill_scan_resource_group_summaries(
result = aggregate_scan_resource_group_summaries(
str(scan.tenant_id), str(scan.id)
)
assert result == {"status": "no resource groups to backfill"}
@@ -338,7 +469,7 @@ class TestBackfillScanGroupSummaries:
tenant_id = str(finding.tenant_id)
scan_id = str(finding.scan_id)
result = backfill_scan_resource_group_summaries(tenant_id, scan_id)
result = aggregate_scan_resource_group_summaries(tenant_id, scan_id)
# 1 resource group × 1 severity = 1 row
assert result == {"status": "backfilled", "resource_groups_count": 1}
+798 -2
View File
@@ -1,10 +1,21 @@
import os
import time
import uuid
from unittest.mock import Mock, patch
import matplotlib
import pytest
from reportlab.lib import colors
from tasks.jobs.report import generate_compliance_reports, generate_threatscore_report
from tasks.jobs.report import (
STALE_TMP_OUTPUT_MAX_AGE_HOURS,
STALE_TMP_OUTPUT_LOCK_FILE_NAME,
_cleanup_stale_tmp_output_directories,
_is_scan_directory_protected,
_pick_latest_cis_variant,
_should_run_stale_cleanup,
generate_compliance_reports,
generate_threatscore_report,
)
from tasks.jobs.reports import (
CHART_COLOR_GREEN_1,
CHART_COLOR_GREEN_2,
@@ -29,7 +40,13 @@ from tasks.jobs.threatscore_utils import (
_load_findings_for_requirement_checks,
)
from api.models import Finding, Resource, ResourceFindingMapping, StatusChoices
from api.models import (
Finding,
Resource,
ResourceFindingMapping,
StateChoices,
StatusChoices,
)
from prowler.lib.check.models import Severity
matplotlib.use("Agg") # Use non-interactive backend for tests
@@ -351,6 +368,366 @@ class TestLoadFindingsForChecks:
assert result == {}
class TestCleanupStaleTmpOutputDirectories:
"""Unit tests for opportunistic stale cleanup under tmp output root."""
def test_removes_only_scan_dirs_older_than_ttl(self, tmp_path, monkeypatch):
"""Should remove stale scan directories and keep recent ones."""
root_dir = tmp_path / "prowler_api_output"
old_scan_dir = root_dir / "tenant-a" / "scan-old"
old_scan_dir.mkdir(parents=True)
(old_scan_dir / "artifact.txt").write_text("old")
recent_scan_dir = root_dir / "tenant-a" / "scan-recent"
recent_scan_dir.mkdir(parents=True)
(recent_scan_dir / "artifact.txt").write_text("recent")
now = time.time()
stale_ts = now - ((STALE_TMP_OUTPUT_MAX_AGE_HOURS + 1) * 60 * 60)
os.utime(old_scan_dir, (stale_ts, stale_ts))
monkeypatch.setattr(
"tasks.jobs.report.STALE_TMP_OUTPUT_SAFE_ROOT", root_dir.resolve()
)
monkeypatch.setattr(
"tasks.jobs.report._should_run_stale_cleanup", lambda *_: True
)
monkeypatch.setattr(
"tasks.jobs.report._is_scan_directory_protected", lambda **_: False
)
removed = _cleanup_stale_tmp_output_directories(
str(root_dir), max_age_hours=STALE_TMP_OUTPUT_MAX_AGE_HOURS
)
assert removed == 1
assert not old_scan_dir.exists()
assert recent_scan_dir.exists()
def test_skips_current_scan_even_when_stale(self, tmp_path, monkeypatch):
"""Should not delete stale directory for the currently processed scan."""
root_dir = tmp_path / "prowler_api_output"
current_scan_dir = root_dir / "tenant-current" / "scan-current"
current_scan_dir.mkdir(parents=True)
(current_scan_dir / "artifact.txt").write_text("current")
other_stale_scan_dir = root_dir / "tenant-other" / "scan-old"
other_stale_scan_dir.mkdir(parents=True)
(other_stale_scan_dir / "artifact.txt").write_text("other")
now = time.time()
stale_ts = now - ((STALE_TMP_OUTPUT_MAX_AGE_HOURS + 1) * 60 * 60)
os.utime(current_scan_dir, (stale_ts, stale_ts))
os.utime(other_stale_scan_dir, (stale_ts, stale_ts))
monkeypatch.setattr(
"tasks.jobs.report.STALE_TMP_OUTPUT_SAFE_ROOT", root_dir.resolve()
)
monkeypatch.setattr(
"tasks.jobs.report._should_run_stale_cleanup", lambda *_: True
)
monkeypatch.setattr(
"tasks.jobs.report._is_scan_directory_protected", lambda **_: False
)
removed = _cleanup_stale_tmp_output_directories(
str(root_dir),
max_age_hours=STALE_TMP_OUTPUT_MAX_AGE_HOURS,
exclude_scan=("tenant-current", "scan-current"),
)
assert removed == 1
assert current_scan_dir.exists()
assert not other_stale_scan_dir.exists()
def test_respects_max_deletions_per_run(self, tmp_path, monkeypatch):
"""Cleanup should stop deleting when max_deletions_per_run is reached."""
root_dir = tmp_path / "prowler_api_output"
stale_dir_1 = root_dir / "tenant-a" / "scan-old-1"
stale_dir_2 = root_dir / "tenant-a" / "scan-old-2"
stale_dir_1.mkdir(parents=True)
stale_dir_2.mkdir(parents=True)
(stale_dir_1 / "artifact.txt").write_text("old-1")
(stale_dir_2 / "artifact.txt").write_text("old-2")
now = time.time()
stale_ts = now - ((STALE_TMP_OUTPUT_MAX_AGE_HOURS + 1) * 60 * 60)
os.utime(stale_dir_1, (stale_ts, stale_ts))
os.utime(stale_dir_2, (stale_ts, stale_ts))
monkeypatch.setattr(
"tasks.jobs.report.STALE_TMP_OUTPUT_SAFE_ROOT", root_dir.resolve()
)
monkeypatch.setattr(
"tasks.jobs.report._should_run_stale_cleanup", lambda *_: True
)
monkeypatch.setattr(
"tasks.jobs.report._is_scan_directory_protected", lambda **_: False
)
removed = _cleanup_stale_tmp_output_directories(
str(root_dir),
max_age_hours=STALE_TMP_OUTPUT_MAX_AGE_HOURS,
max_deletions_per_run=1,
)
assert removed == 1
remaining = sum(
1 for scan_dir in (stale_dir_1, stale_dir_2) if scan_dir.exists()
)
assert remaining == 1
def test_rejects_non_safe_root(self, tmp_path, monkeypatch):
"""Cleanup must no-op when called with a root outside the allowed safe root."""
root_dir = tmp_path / "prowler_api_output"
root_dir.mkdir(parents=True)
monkeypatch.setattr(
"tasks.jobs.report.STALE_TMP_OUTPUT_SAFE_ROOT",
(tmp_path / "another-root").resolve(),
)
def _fail_should_run(*_args, **_kwargs):
raise AssertionError("_should_run_stale_cleanup should not be called")
monkeypatch.setattr(
"tasks.jobs.report._should_run_stale_cleanup", _fail_should_run
)
removed = _cleanup_stale_tmp_output_directories(str(root_dir), max_age_hours=48)
assert removed == 0
def test_ignores_symlink_scan_directories(self, tmp_path, monkeypatch):
"""Symlinked scan directories must never be deleted by cleanup."""
root_dir = tmp_path / "prowler_api_output"
stale_real_scan_dir = root_dir / "tenant-a" / "scan-old-real"
stale_real_scan_dir.mkdir(parents=True)
(stale_real_scan_dir / "artifact.txt").write_text("old")
symlink_target = tmp_path / "symlink-target"
symlink_target.mkdir(parents=True)
(symlink_target / "artifact.txt").write_text("target")
symlink_scan_dir = root_dir / "tenant-a" / "scan-link"
symlink_scan_dir.symlink_to(symlink_target, target_is_directory=True)
now = time.time()
stale_ts = now - ((STALE_TMP_OUTPUT_MAX_AGE_HOURS + 1) * 60 * 60)
os.utime(stale_real_scan_dir, (stale_ts, stale_ts))
monkeypatch.setattr(
"tasks.jobs.report.STALE_TMP_OUTPUT_SAFE_ROOT", root_dir.resolve()
)
monkeypatch.setattr(
"tasks.jobs.report._should_run_stale_cleanup", lambda *_: True
)
monkeypatch.setattr(
"tasks.jobs.report._is_scan_directory_protected", lambda **_: False
)
removed = _cleanup_stale_tmp_output_directories(
str(root_dir), max_age_hours=STALE_TMP_OUTPUT_MAX_AGE_HOURS
)
assert removed == 1
assert not stale_real_scan_dir.exists()
assert symlink_scan_dir.exists()
assert symlink_target.exists()
def test_handles_internal_exception_without_propagating(
self, tmp_path, monkeypatch
):
"""Cleanup errors must be swallowed so callers are not interrupted."""
root_dir = tmp_path / "prowler_api_output"
stale_scan_dir = root_dir / "tenant-a" / "scan-old"
stale_scan_dir.mkdir(parents=True)
now = time.time()
stale_ts = now - ((STALE_TMP_OUTPUT_MAX_AGE_HOURS + 1) * 60 * 60)
os.utime(stale_scan_dir, (stale_ts, stale_ts))
monkeypatch.setattr(
"tasks.jobs.report.STALE_TMP_OUTPUT_SAFE_ROOT", root_dir.resolve()
)
monkeypatch.setattr(
"tasks.jobs.report._should_run_stale_cleanup", lambda *_: True
)
def _raise(*_args, **_kwargs):
raise RuntimeError("db timeout")
monkeypatch.setattr("tasks.jobs.report._is_scan_directory_protected", _raise)
removed = _cleanup_stale_tmp_output_directories(
str(root_dir), max_age_hours=STALE_TMP_OUTPUT_MAX_AGE_HOURS
)
assert removed == 0
assert stale_scan_dir.exists()
def test_safe_root_follows_custom_tmp_output_directory(self, tmp_path, monkeypatch):
"""Custom DJANGO_TMP_OUTPUT_DIRECTORY must be honored as the safe root."""
from tasks.jobs import report as report_module
custom_root = tmp_path / "custom_tmp_output"
custom_root.mkdir(parents=True)
monkeypatch.setattr(
report_module, "DJANGO_TMP_OUTPUT_DIRECTORY", str(custom_root)
)
resolved_root = report_module._resolve_stale_tmp_safe_root()
assert resolved_root == custom_root.resolve()
stale_scan_dir = custom_root / "tenant-a" / "scan-old"
stale_scan_dir.mkdir(parents=True)
(stale_scan_dir / "artifact.txt").write_text("old")
stale_ts = time.time() - ((STALE_TMP_OUTPUT_MAX_AGE_HOURS + 1) * 60 * 60)
os.utime(stale_scan_dir, (stale_ts, stale_ts))
monkeypatch.setattr(report_module, "STALE_TMP_OUTPUT_SAFE_ROOT", resolved_root)
monkeypatch.setattr(
"tasks.jobs.report._should_run_stale_cleanup", lambda *_: True
)
monkeypatch.setattr(
"tasks.jobs.report._is_scan_directory_protected", lambda **_: False
)
removed = _cleanup_stale_tmp_output_directories(
str(custom_root), max_age_hours=STALE_TMP_OUTPUT_MAX_AGE_HOURS
)
assert removed == 1
assert not stale_scan_dir.exists()
@pytest.mark.parametrize(
"forbidden_root",
["/", "/tmp", "/var", "/var/tmp", "/home", "/root", "/etc", "/usr"],
)
def test_safe_root_rejects_forbidden_system_roots(
self, forbidden_root, monkeypatch
):
"""Cleanup must refuse to operate against shared system roots."""
from tasks.jobs import report as report_module
monkeypatch.setattr(
report_module, "DJANGO_TMP_OUTPUT_DIRECTORY", forbidden_root
)
assert report_module._resolve_stale_tmp_safe_root() is None
def test_skips_cleanup_when_safe_root_is_none(self, tmp_path, monkeypatch):
"""A None safe root (forbidden config) must short-circuit the cleanup."""
root_dir = tmp_path / "prowler_api_output"
root_dir.mkdir(parents=True)
monkeypatch.setattr("tasks.jobs.report.STALE_TMP_OUTPUT_SAFE_ROOT", None)
def _fail_should_run(*_args, **_kwargs):
raise AssertionError("_should_run_stale_cleanup should not be called")
monkeypatch.setattr(
"tasks.jobs.report._should_run_stale_cleanup", _fail_should_run
)
removed = _cleanup_stale_tmp_output_directories(
str(root_dir), max_age_hours=STALE_TMP_OUTPUT_MAX_AGE_HOURS
)
assert removed == 0
class TestStaleCleanupProtectionHelpers:
"""Unit tests for stale cleanup helper guard logic."""
def test_should_run_cleanup_is_throttled(self, tmp_path):
root_dir = tmp_path / "prowler_api_output"
root_dir.mkdir(parents=True)
assert _should_run_stale_cleanup(root_dir, throttle_seconds=3600) is True
assert _should_run_stale_cleanup(root_dir, throttle_seconds=3600) is False
lock_file = root_dir / STALE_TMP_OUTPUT_LOCK_FILE_NAME
lock_file.write_text(str(int(time.time()) - 7200), encoding="ascii")
assert _should_run_stale_cleanup(root_dir, throttle_seconds=3600) is True
@patch("tasks.jobs.report.fcntl.flock", side_effect=BlockingIOError)
def test_should_run_cleanup_returns_false_when_lock_is_busy(
self, _mock_flock, tmp_path
):
root_dir = tmp_path / "prowler_api_output"
root_dir.mkdir(parents=True)
assert _should_run_stale_cleanup(root_dir, throttle_seconds=3600) is False
@patch("tasks.jobs.report.Scan.all_objects.using")
def test_is_scan_directory_protected_for_executing_scan(
self, mock_scan_using, tmp_path
):
scan_id = str(uuid.uuid4())
scan_path = tmp_path / scan_id
scan_path.mkdir(parents=True)
mock_scan_using.return_value.filter.return_value.only.return_value.first.return_value = Mock(
state=StateChoices.EXECUTING, output_location=None
)
assert (
_is_scan_directory_protected(
tenant_id="tenant-a",
scan_id=scan_id,
scan_path=scan_path,
)
is True
)
@patch("tasks.jobs.report.Scan.all_objects.using")
def test_is_scan_directory_protected_for_local_output(
self, mock_scan_using, tmp_path
):
scan_id = str(uuid.uuid4())
scan_path = tmp_path / scan_id
scan_path.mkdir(parents=True)
local_output_path = scan_path / "outputs.zip"
mock_scan_using.return_value.filter.return_value.only.return_value.first.return_value = Mock(
state=StateChoices.COMPLETED, output_location=str(local_output_path)
)
assert (
_is_scan_directory_protected(
tenant_id="tenant-a",
scan_id=scan_id,
scan_path=scan_path.resolve(),
)
is True
)
@patch("tasks.jobs.report.Scan.all_objects.using")
def test_is_scan_directory_not_protected_for_s3_output(
self, mock_scan_using, tmp_path
):
scan_id = str(uuid.uuid4())
scan_path = tmp_path / scan_id
scan_path.mkdir(parents=True)
mock_scan_using.return_value.filter.return_value.only.return_value.first.return_value = Mock(
state=StateChoices.COMPLETED,
output_location="s3://bucket/path/report.zip",
)
assert (
_is_scan_directory_protected(
tenant_id="tenant-a",
scan_id=scan_id,
scan_path=scan_path,
)
is False
)
@pytest.mark.django_db
class TestGenerateThreatscoreReportFunction:
"""Test suite for generate_threatscore_report function."""
@@ -422,6 +799,425 @@ class TestGenerateComplianceReportsOptimized:
mock_ens.assert_not_called()
mock_nis2.assert_not_called()
@patch(
"tasks.jobs.report._cleanup_stale_tmp_output_directories",
side_effect=RuntimeError("cleanup boom"),
)
def test_cleanup_exception_does_not_break_no_findings_flow(self, _mock_cleanup):
"""Unexpected cleanup failures must not abort report generation."""
random_tenant = str(uuid.uuid4())
random_scan = str(uuid.uuid4())
random_provider = str(uuid.uuid4())
with patch("tasks.jobs.report.ScanSummary.objects.filter") as mock_filter:
mock_filter.return_value.exists.return_value = False
result = generate_compliance_reports(
tenant_id=random_tenant,
scan_id=random_scan,
provider_id=random_provider,
generate_threatscore=True,
generate_ens=False,
generate_nis2=False,
generate_csa=False,
generate_cis=False,
)
assert result["threatscore"] == {"upload": False, "path": ""}
@patch("tasks.jobs.report._upload_to_s3")
@patch("tasks.jobs.report.generate_cis_report")
def test_no_findings_returns_flat_cis_entry(
self,
mock_cis,
mock_upload,
tenants_fixture,
scans_fixture,
providers_fixture,
):
"""Scan with no findings and ``generate_cis=True`` must yield a flat
``{"upload": False, "path": ""}`` entry, consistent with the other
frameworks (no nested dict, no sentinel keys)."""
tenant = tenants_fixture[0]
scan = scans_fixture[0]
provider = providers_fixture[0]
result = generate_compliance_reports(
tenant_id=str(tenant.id),
scan_id=str(scan.id),
provider_id=str(provider.id),
generate_threatscore=False,
generate_ens=False,
generate_nis2=False,
generate_csa=False,
generate_cis=True,
)
assert result["cis"] == {"upload": False, "path": ""}
mock_cis.assert_not_called()
@patch("tasks.jobs.report.rmtree")
@patch("tasks.jobs.report._upload_to_s3")
@patch("tasks.jobs.report.generate_threatscore_report")
@patch("tasks.jobs.report._generate_compliance_output_directory")
@patch("tasks.jobs.report._aggregate_requirement_statistics_from_database")
@patch("tasks.jobs.report.Compliance.get_bulk")
@patch("tasks.jobs.report.Provider.objects.get")
@patch("tasks.jobs.report.ScanSummary.objects.filter")
def test_cleanup_runs_when_supported_reports_upload_successfully(
self,
mock_scan_summary_filter,
mock_provider_get,
mock_get_bulk,
mock_aggregate_stats,
mock_generate_output_dir,
mock_threatscore,
mock_upload_to_s3,
mock_rmtree,
):
"""Cleanup must run when all generated (supported) reports are uploaded."""
mock_scan_summary_filter.return_value.exists.return_value = True
mock_provider_get.return_value = Mock(uid="provider-uid", provider="m365")
mock_get_bulk.return_value = {}
mock_aggregate_stats.return_value = {}
mock_generate_output_dir.return_value = (
"/tmp/tenant/scan/threatscore/prowler-output-provider-20240101000000"
)
mock_upload_to_s3.return_value = (
"s3://bucket/tenant/scan/threatscore/report.pdf"
)
result = generate_compliance_reports(
tenant_id=str(uuid.uuid4()),
scan_id=str(uuid.uuid4()),
provider_id=str(uuid.uuid4()),
generate_threatscore=True,
generate_ens=True,
generate_nis2=True,
generate_csa=True,
generate_cis=True,
)
assert result["threatscore"]["upload"] is True
assert result["ens"]["upload"] is False
assert result["nis2"]["upload"] is False
assert result["csa"]["upload"] is False
assert result["cis"] == {"upload": False, "path": ""}
mock_generate_output_dir.assert_called_once()
mock_threatscore.assert_called_once()
mock_rmtree.assert_called_once()
@patch("tasks.jobs.report.rmtree")
@patch("tasks.jobs.report._upload_to_s3")
@patch("tasks.jobs.report.generate_threatscore_report")
@patch("tasks.jobs.report._generate_compliance_output_directory")
@patch("tasks.jobs.report._aggregate_requirement_statistics_from_database")
@patch("tasks.jobs.report.Compliance.get_bulk")
@patch("tasks.jobs.report.Provider.objects.get")
@patch("tasks.jobs.report.ScanSummary.objects.filter")
def test_cleanup_skipped_when_supported_upload_fails(
self,
mock_scan_summary_filter,
mock_provider_get,
mock_get_bulk,
mock_aggregate_stats,
mock_generate_output_dir,
mock_threatscore,
mock_upload_to_s3,
mock_rmtree,
):
"""Cleanup must not run when a generated report upload fails."""
mock_scan_summary_filter.return_value.exists.return_value = True
mock_provider_get.return_value = Mock(uid="provider-uid", provider="m365")
mock_get_bulk.return_value = {}
mock_aggregate_stats.return_value = {}
mock_generate_output_dir.return_value = (
"/tmp/tenant/scan/threatscore/prowler-output-provider-20240101000000"
)
mock_upload_to_s3.return_value = None
result = generate_compliance_reports(
tenant_id=str(uuid.uuid4()),
scan_id=str(uuid.uuid4()),
provider_id=str(uuid.uuid4()),
generate_threatscore=True,
generate_ens=True,
generate_nis2=True,
generate_csa=True,
generate_cis=True,
)
assert result["threatscore"]["upload"] is False
assert result["cis"] == {"upload": False, "path": ""}
mock_generate_output_dir.assert_called_once()
mock_threatscore.assert_called_once()
mock_rmtree.assert_not_called()
@pytest.mark.django_db
class TestGenerateComplianceReportsCIS:
"""Test suite covering the CIS branch of generate_compliance_reports."""
def _force_scan_has_findings(self, monkeypatch):
"""Bypass the ScanSummary.exists() early-return guard."""
class _FakeManager:
def filter(self, **kwargs):
class _Q:
def exists(self):
return True
return _Q()
monkeypatch.setattr("tasks.jobs.report.ScanSummary.objects", _FakeManager())
@patch("tasks.jobs.report._aggregate_requirement_statistics_from_database")
@patch("tasks.jobs.report._upload_to_s3")
@patch("tasks.jobs.report.generate_cis_report")
@patch("tasks.jobs.report.Compliance.get_bulk")
def test_cis_picks_latest_version(
self,
mock_get_bulk,
mock_cis,
mock_upload,
mock_stats,
monkeypatch,
tenants_fixture,
scans_fixture,
providers_fixture,
):
"""CIS branch should generate a single PDF for the highest version.
The returned ``results["cis"]`` must have the same flat shape as the
other single-version frameworks (``{"upload", "path"}``); the picked
variant is an internal detail and is not exposed in the result.
"""
tenant = tenants_fixture[0]
scan = scans_fixture[0]
provider = providers_fixture[0]
self._force_scan_has_findings(monkeypatch)
mock_stats.return_value = {}
# Multiple CIS variants + a non-CIS framework that must be ignored.
# Includes 1.10 to verify the selection is not lexicographic.
mock_get_bulk.return_value = {
"cis_1.4_aws": Mock(),
"cis_1.10_aws": Mock(),
"cis_2.0_aws": Mock(),
"cis_5.0_aws": Mock(),
"ens_rd2022_aws": Mock(),
}
mock_upload.return_value = "s3://bucket/path"
result = generate_compliance_reports(
tenant_id=str(tenant.id),
scan_id=str(scan.id),
provider_id=str(provider.id),
generate_threatscore=False,
generate_ens=False,
generate_nis2=False,
generate_csa=False,
generate_cis=True,
)
# Exactly one call for the latest version, never for older variants
# or non-CIS frameworks.
assert mock_cis.call_count == 1
assert mock_cis.call_args.kwargs["compliance_id"] == "cis_5.0_aws"
assert result["cis"]["upload"] is True
assert result["cis"]["path"] == "s3://bucket/path"
assert "compliance_id" not in result["cis"]
@patch("tasks.jobs.report._aggregate_requirement_statistics_from_database")
@patch("tasks.jobs.report._upload_to_s3")
@patch("tasks.jobs.report.generate_cis_report")
@patch("tasks.jobs.report.Compliance.get_bulk")
def test_cis_latest_variant_failure_captured_in_results(
self,
mock_get_bulk,
mock_cis,
mock_upload,
mock_stats,
monkeypatch,
tenants_fixture,
scans_fixture,
providers_fixture,
):
"""A failure in the latest CIS variant must be surfaced in the flat results entry."""
tenant = tenants_fixture[0]
scan = scans_fixture[0]
provider = providers_fixture[0]
self._force_scan_has_findings(monkeypatch)
mock_stats.return_value = {}
mock_get_bulk.return_value = {
"cis_1.4_aws": Mock(),
"cis_5.0_aws": Mock(),
}
mock_cis.side_effect = RuntimeError("boom")
result = generate_compliance_reports(
tenant_id=str(tenant.id),
scan_id=str(scan.id),
provider_id=str(provider.id),
generate_threatscore=False,
generate_ens=False,
generate_nis2=False,
generate_csa=False,
generate_cis=True,
)
# Only the latest variant is attempted; its failure lands in a flat
# entry keyed under "cis" with the same shape as sibling frameworks.
assert mock_cis.call_count == 1
assert result["cis"]["upload"] is False
assert result["cis"]["error"] == "boom"
assert "compliance_id" not in result["cis"]
@patch("tasks.jobs.report._aggregate_requirement_statistics_from_database")
@patch("tasks.jobs.report._upload_to_s3")
@patch("tasks.jobs.report.generate_cis_report")
@patch("tasks.jobs.report.Compliance.get_bulk")
def test_cis_provider_without_cis_skipped_cleanly(
self,
mock_get_bulk,
mock_cis,
mock_upload,
mock_stats,
monkeypatch,
tenants_fixture,
scans_fixture,
providers_fixture,
):
"""When ``Compliance.get_bulk`` returns no CIS entry the CIS branch
must skip cleanly and record a flat ``{"upload": False, "path": ""}``
entry no hard-coded provider whitelist is consulted."""
tenant = tenants_fixture[0]
scan = scans_fixture[0]
provider = providers_fixture[0]
self._force_scan_has_findings(monkeypatch)
mock_stats.return_value = {}
# No ``cis_*`` keys in the bulk → no variant picked.
mock_get_bulk.return_value = {"ens_rd2022_aws": Mock()}
result = generate_compliance_reports(
tenant_id=str(tenant.id),
scan_id=str(scan.id),
provider_id=str(provider.id),
generate_threatscore=False,
generate_ens=False,
generate_nis2=False,
generate_csa=False,
generate_cis=True,
)
assert result["cis"] == {"upload": False, "path": ""}
mock_cis.assert_not_called()
@patch("tasks.jobs.report._aggregate_requirement_statistics_from_database")
@patch("tasks.jobs.report._generate_compliance_output_directory")
@patch("tasks.jobs.report.Compliance.get_bulk")
def test_cis_output_directory_failure_is_captured(
self,
mock_get_bulk,
mock_generate_output_dir,
mock_stats,
monkeypatch,
tenants_fixture,
scans_fixture,
providers_fixture,
):
"""CIS output dir errors must be captured in results (not raised)."""
tenant = tenants_fixture[0]
scan = scans_fixture[0]
provider = providers_fixture[0]
self._force_scan_has_findings(monkeypatch)
mock_stats.return_value = {}
mock_get_bulk.return_value = {"cis_5.0_aws": Mock()}
mock_generate_output_dir.side_effect = RuntimeError("dir boom")
result = generate_compliance_reports(
tenant_id=str(tenant.id),
scan_id=str(scan.id),
provider_id=str(provider.id),
generate_threatscore=False,
generate_ens=False,
generate_nis2=False,
generate_csa=False,
generate_cis=True,
)
assert result["cis"]["upload"] is False
assert result["cis"]["error"] == "dir boom"
class TestPickLatestCisVariant:
"""Unit tests for `_pick_latest_cis_variant` helper."""
def test_empty_returns_none(self):
assert _pick_latest_cis_variant([]) is None
def test_single_variant(self):
assert _pick_latest_cis_variant(["cis_5.0_aws"]) == "cis_5.0_aws"
def test_numeric_not_lexicographic(self):
"""1.10 must beat 1.2 (lex sort would pick 1.2)."""
variants = ["cis_1.2_kubernetes", "cis_1.10_kubernetes"]
assert _pick_latest_cis_variant(variants) == "cis_1.10_kubernetes"
def test_major_version_wins(self):
variants = ["cis_1.4_aws", "cis_2.0_aws", "cis_5.0_aws", "cis_6.0_aws"]
assert _pick_latest_cis_variant(variants) == "cis_6.0_aws"
def test_minor_version_breaks_tie(self):
variants = ["cis_3.0_aws", "cis_3.1_aws", "cis_2.9_aws"]
assert _pick_latest_cis_variant(variants) == "cis_3.1_aws"
def test_three_part_version(self):
"""Versions like 3.0.1 must win over 3.0."""
variants = ["cis_3.0_aws", "cis_3.0.1_aws"]
assert _pick_latest_cis_variant(variants) == "cis_3.0.1_aws"
def test_malformed_names_ignored(self):
variants = ["notcis_1.0_aws", "cis_abc_aws", "cis_5.0_aws"]
assert _pick_latest_cis_variant(variants) == "cis_5.0_aws"
def test_only_malformed_returns_none(self):
variants = ["notcis_1.0_aws", "cis_abc_aws"]
assert _pick_latest_cis_variant(variants) is None
def test_multidigit_provider_name(self):
"""Provider name with underscores (e.g. googleworkspace) must parse."""
variants = ["cis_1.3_googleworkspace"]
assert _pick_latest_cis_variant(variants) == "cis_1.3_googleworkspace"
def test_accepts_iterator(self):
"""The helper must accept any iterable, not just lists."""
def _gen():
yield "cis_1.4_aws"
yield "cis_5.0_aws"
assert _pick_latest_cis_variant(_gen()) == "cis_5.0_aws"
def test_rejects_single_integer_version(self):
"""The regex requires at least one dotted component. ``cis_5_aws``
without a minor version is malformed per the backend contract."""
assert _pick_latest_cis_variant(["cis_5_aws"]) is None
def test_rejects_trailing_dot(self):
"""Inputs like ``cis_5._aws`` must be rejected at the regex stage
instead of silently normalising to ``(5, 0)``."""
assert _pick_latest_cis_variant(["cis_5._aws", "cis_1.0_aws"]) == "cis_1.0_aws"
def test_rejects_lone_dot_version(self):
"""``cis_._aws`` has no numeric component and must be skipped."""
assert _pick_latest_cis_variant(["cis_._aws", "cis_1.0_aws"]) == "cis_1.0_aws"
class TestOptimizationImprovements:
"""Test suite for optimization-related functionality."""
@@ -0,0 +1,532 @@
from unittest.mock import Mock, patch
import pytest
from reportlab.platypus import Image, LongTable, Paragraph, Table
from tasks.jobs.reports import FRAMEWORK_REGISTRY, ComplianceData, RequirementData
from tasks.jobs.reports.cis import (
CISReportGenerator,
_normalize_profile,
_profile_badge_text,
)
from api.models import StatusChoices
# =============================================================================
# Fixtures
# =============================================================================
@pytest.fixture
def cis_generator():
"""Create a CISReportGenerator instance for testing."""
config = FRAMEWORK_REGISTRY["cis"]
return CISReportGenerator(config)
def _make_attr(
section: str,
profile_value: str = "Level 1",
assessment_value: str = "Automated",
sub_section: str = "",
**extras,
) -> Mock:
"""Build a mock CIS_Requirement_Attribute with duck-typed fields."""
attr = Mock()
attr.Section = section
attr.SubSection = sub_section
# CIS enums have `.value`. Use a simple Mock that exposes `.value`.
attr.Profile = Mock(value=profile_value)
attr.AssessmentStatus = Mock(value=assessment_value)
attr.Description = extras.get("description", "desc")
attr.RationaleStatement = extras.get("rationale", "the rationale")
attr.ImpactStatement = extras.get("impact", "the impact")
attr.RemediationProcedure = extras.get("remediation", "the remediation")
attr.AuditProcedure = extras.get("audit", "the audit")
attr.AdditionalInformation = ""
attr.DefaultValue = ""
attr.References = extras.get("references", "https://example.com")
return attr
@pytest.fixture
def basic_cis_compliance_data():
"""Create basic ComplianceData for CIS testing (no requirements)."""
return ComplianceData(
tenant_id="tenant-123",
scan_id="scan-456",
provider_id="provider-789",
compliance_id="cis_5.0_aws",
framework="CIS",
name="CIS Amazon Web Services Foundations Benchmark v5.0.0",
version="5.0",
description="Center for Internet Security AWS Foundations Benchmark",
)
@pytest.fixture
def populated_cis_compliance_data(basic_cis_compliance_data):
"""CIS data with mixed requirements across 2 sections, Profile L1/L2, Pass/Fail/Manual."""
data = basic_cis_compliance_data
data.requirements = [
RequirementData(
id="1.1",
description="Maintain current contact details",
status=StatusChoices.PASS,
passed_findings=5,
failed_findings=0,
total_findings=5,
checks=["aws_check_1"],
),
RequirementData(
id="1.2",
description="Ensure root account has no access keys",
status=StatusChoices.FAIL,
passed_findings=0,
failed_findings=3,
total_findings=3,
checks=["aws_check_2"],
),
RequirementData(
id="1.3",
description="Ensure MFA is enabled for all IAM users",
status=StatusChoices.MANUAL,
checks=[],
),
RequirementData(
id="2.1",
description="Ensure S3 Buckets are logging",
status=StatusChoices.PASS,
passed_findings=2,
failed_findings=0,
total_findings=2,
checks=["aws_check_3"],
),
RequirementData(
id="2.2",
description="Ensure encryption at rest is enabled",
status=StatusChoices.FAIL,
passed_findings=0,
failed_findings=4,
total_findings=4,
checks=["aws_check_4"],
),
]
data.attributes_by_requirement_id = {
"1.1": {
"attributes": {
"req_attributes": [
_make_attr(
"1 Identity and Access Management",
profile_value="Level 1",
assessment_value="Automated",
)
],
"checks": ["aws_check_1"],
}
},
"1.2": {
"attributes": {
"req_attributes": [
_make_attr(
"1 Identity and Access Management",
profile_value="Level 1",
assessment_value="Automated",
)
],
"checks": ["aws_check_2"],
}
},
"1.3": {
"attributes": {
"req_attributes": [
_make_attr(
"1 Identity and Access Management",
profile_value="Level 2",
assessment_value="Manual",
)
],
"checks": [],
}
},
"2.1": {
"attributes": {
"req_attributes": [
_make_attr(
"2 Storage",
profile_value="Level 2",
assessment_value="Automated",
)
],
"checks": ["aws_check_3"],
}
},
"2.2": {
"attributes": {
"req_attributes": [
_make_attr(
"2 Storage",
profile_value="Level 1",
assessment_value="Automated",
)
],
"checks": ["aws_check_4"],
}
},
}
return data
# =============================================================================
# Helper function tests
# =============================================================================
class TestNormalizeProfile:
"""Test suite for _normalize_profile helper."""
def test_level_1_string(self):
assert _normalize_profile(Mock(value="Level 1")) == "L1"
def test_level_2_string(self):
assert _normalize_profile(Mock(value="Level 2")) == "L2"
def test_e3_level_1(self):
assert _normalize_profile(Mock(value="E3 Level 1")) == "L1"
def test_e5_level_2(self):
assert _normalize_profile(Mock(value="E5 Level 2")) == "L2"
def test_none_returns_other(self):
assert _normalize_profile(None) == "Other"
def test_substring_trap_rejected(self):
"""Unrelated tokens containing the literal ``L2`` must NOT map to L2."""
# A future enum value like "CL2 Kubernetes Worker" would be silently
# misclassified by a naive substring check.
assert _normalize_profile(Mock(value="CL2 Worker")) == "Other"
assert _normalize_profile(Mock(value="HL2 Legacy")) == "Other"
def test_raw_string_level_1(self):
# An object without .value falls back to str(profile); simulate that via __str__.
class NoValue:
def __str__(self):
return "Level 1"
assert _normalize_profile(NoValue()) == "L1"
def test_unknown_profile_returns_other(self):
assert _normalize_profile(Mock(value="Custom Profile")) == "Other"
class TestProfileBadgeText:
def test_l1_label(self):
assert _profile_badge_text("L1") == "Level 1"
def test_l2_label(self):
assert _profile_badge_text("L2") == "Level 2"
def test_other_label(self):
assert _profile_badge_text("Other") == "Other"
# =============================================================================
# Generator initialization
# =============================================================================
class TestCISGeneratorInitialization:
def test_generator_created(self, cis_generator):
assert cis_generator is not None
assert cis_generator.config.name == "cis"
def test_generator_language(self, cis_generator):
assert cis_generator.config.language == "en"
def test_generator_sections_dynamic(self, cis_generator):
# CIS sections differ per variant so config.sections MUST be None
assert cis_generator.config.sections is None
def test_attribute_fields_contain_cis_specific(self, cis_generator):
for field in ("Profile", "AssessmentStatus", "RationaleStatement"):
assert field in cis_generator.config.attribute_fields
# =============================================================================
# _derive_sections
# =============================================================================
class TestDeriveSections:
def test_preserves_first_seen_order(
self, cis_generator, populated_cis_compliance_data
):
sections = cis_generator._derive_sections(populated_cis_compliance_data)
assert sections == [
"1 Identity and Access Management",
"2 Storage",
]
def test_deduplicates_sections(self, cis_generator, basic_cis_compliance_data):
basic_cis_compliance_data.requirements = [
RequirementData(id="1.1", description="a", status=StatusChoices.PASS),
RequirementData(id="1.2", description="b", status=StatusChoices.PASS),
]
attr = _make_attr("1 IAM")
basic_cis_compliance_data.attributes_by_requirement_id = {
"1.1": {"attributes": {"req_attributes": [attr], "checks": []}},
"1.2": {"attributes": {"req_attributes": [attr], "checks": []}},
}
assert cis_generator._derive_sections(basic_cis_compliance_data) == ["1 IAM"]
def test_empty_data_returns_empty(self, cis_generator, basic_cis_compliance_data):
basic_cis_compliance_data.requirements = []
basic_cis_compliance_data.attributes_by_requirement_id = {}
assert cis_generator._derive_sections(basic_cis_compliance_data) == []
# =============================================================================
# _compute_statistics
# =============================================================================
class TestComputeStatistics:
def test_totals(self, cis_generator, populated_cis_compliance_data):
stats = cis_generator._compute_statistics(populated_cis_compliance_data)
assert stats["total"] == 5
assert stats["passed"] == 2
assert stats["failed"] == 2
assert stats["manual"] == 1
def test_overall_compliance_excludes_manual(
self, cis_generator, populated_cis_compliance_data
):
stats = cis_generator._compute_statistics(populated_cis_compliance_data)
# 2 passed / 4 evaluated (pass + fail) = 50%
assert stats["overall_compliance"] == pytest.approx(50.0)
def test_overall_compliance_all_manual(
self, cis_generator, basic_cis_compliance_data
):
basic_cis_compliance_data.requirements = [
RequirementData(id="x", description="d", status=StatusChoices.MANUAL),
]
attr = _make_attr("1 IAM", profile_value="Level 1", assessment_value="Manual")
basic_cis_compliance_data.attributes_by_requirement_id = {
"x": {"attributes": {"req_attributes": [attr], "checks": []}},
}
stats = cis_generator._compute_statistics(basic_cis_compliance_data)
# No evaluated requirements → overall compliance defaults to 100%
assert stats["overall_compliance"] == 100.0
def test_profile_counts(self, cis_generator, populated_cis_compliance_data):
stats = cis_generator._compute_statistics(populated_cis_compliance_data)
profile = stats["profile_counts"]
# From fixture:
# L1: 1.1 (PASS, Auto), 1.2 (FAIL, Auto), 2.2 (FAIL, Auto) → pass=1, fail=2, manual=0
# L2: 1.3 (MANUAL, Manual), 2.1 (PASS, Auto) → pass=1, fail=0, manual=1
assert profile["L1"] == {"passed": 1, "failed": 2, "manual": 0}
assert profile["L2"] == {"passed": 1, "failed": 0, "manual": 1}
def test_assessment_counts(self, cis_generator, populated_cis_compliance_data):
stats = cis_generator._compute_statistics(populated_cis_compliance_data)
assessment = stats["assessment_counts"]
# Automated: 1.1 PASS, 1.2 FAIL, 2.1 PASS, 2.2 FAIL → pass=2, fail=2, manual=0
# Manual: 1.3 MANUAL → pass=0, fail=0, manual=1
assert assessment["Automated"] == {"passed": 2, "failed": 2, "manual": 0}
assert assessment["Manual"] == {"passed": 0, "failed": 0, "manual": 1}
def test_top_failing_sections_includes_all_evaluated(
self, cis_generator, populated_cis_compliance_data
):
stats = cis_generator._compute_statistics(populated_cis_compliance_data)
top = stats["top_failing_sections"]
# Both sections have 1 PASS + 1 FAIL evaluated → tied at a 50% failure
# rate. The sort is stable, so both must appear, and the list is capped
# at 5 entries.
assert len(top) == 2
section_names = {name for name, _ in top}
assert section_names == {
"1 Identity and Access Management",
"2 Storage",
}
def test_compute_statistics_is_memoized(
self, cis_generator, populated_cis_compliance_data
):
"""Calling ``_compute_statistics`` twice with the same data must
reuse the cached value and not re-run the uncached kernel."""
with patch.object(
CISReportGenerator,
"_compute_statistics_uncached",
wraps=cis_generator._compute_statistics_uncached,
) as spy:
cis_generator._compute_statistics(populated_cis_compliance_data)
cis_generator._compute_statistics(populated_cis_compliance_data)
assert spy.call_count == 1
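# ---------------------------------------------------------------------------
# Toy illustration of the cache-then-delegate pattern the test above asserts:
# the public entry point memoizes per data object and only the uncached kernel
# does the expensive work. Hypothetical class, not CISReportGenerator itself:
class _MemoizedStatsSketch:
    def __init__(self):
        self._cache = {}
        self.kernel_calls = 0

    def compute_statistics(self, data):
        key = id(data)  # same ComplianceData object → reuse the cached result
        if key not in self._cache:
            self._cache[key] = self._compute_statistics_uncached(data)
        return self._cache[key]

    def _compute_statistics_uncached(self, data):
        self.kernel_calls += 1
        return {"total": len(getattr(data, "requirements", []))}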
# =============================================================================
# Executive summary
# =============================================================================
class TestCISExecutiveSummary:
def test_title_present(self, cis_generator, populated_cis_compliance_data):
elements = cis_generator.create_executive_summary(populated_cis_compliance_data)
paragraphs = [e for e in elements if isinstance(e, Paragraph)]
text = " ".join(str(p.text) for p in paragraphs)
assert "Executive Summary" in text
def test_tables_rendered(self, cis_generator, populated_cis_compliance_data):
elements = cis_generator.create_executive_summary(populated_cis_compliance_data)
tables = [e for e in elements if isinstance(e, Table)]
# Exact count: Summary, Profile, Assessment, Top Failing Sections = 4.
assert len(tables) == 4
def test_no_requirements(self, cis_generator, basic_cis_compliance_data):
basic_cis_compliance_data.requirements = []
basic_cis_compliance_data.attributes_by_requirement_id = {}
elements = cis_generator.create_executive_summary(basic_cis_compliance_data)
# With no requirements: Summary table always renders, and both Profile
# and Assessment breakdown tables render with a 0-filled default row,
# but Top Failing Sections is suppressed → exactly 3 tables.
tables = [e for e in elements if isinstance(e, Table)]
assert len(tables) == 3
# =============================================================================
# Charts section
# =============================================================================
class TestCISChartsSection:
def test_charts_rendered(self, cis_generator, populated_cis_compliance_data):
elements = cis_generator.create_charts_section(populated_cis_compliance_data)
# Expect at least one rendered chart image (status pie, section bar, stacked bar)
images = [e for e in elements if isinstance(e, Image)]
assert len(images) >= 1
def test_charts_no_data_no_crash(self, cis_generator, basic_cis_compliance_data):
basic_cis_compliance_data.requirements = []
basic_cis_compliance_data.attributes_by_requirement_id = {}
elements = cis_generator.create_charts_section(basic_cis_compliance_data)
# Must not raise; may or may not have any Image
assert isinstance(elements, list)
# =============================================================================
# Requirements index
# =============================================================================
class TestCISRequirementsIndex:
def test_title_present(self, cis_generator, populated_cis_compliance_data):
elements = cis_generator.create_requirements_index(
populated_cis_compliance_data
)
paragraphs = [e for e in elements if isinstance(e, Paragraph)]
text = " ".join(str(p.text) for p in paragraphs)
assert "Requirements Index" in text
def test_groups_by_section(self, cis_generator, populated_cis_compliance_data):
elements = cis_generator.create_requirements_index(
populated_cis_compliance_data
)
paragraphs = [e for e in elements if isinstance(e, Paragraph)]
text = " ".join(str(p.text) for p in paragraphs)
assert "1 Identity and Access Management" in text
assert "2 Storage" in text
def test_renders_tables_per_section(
self, cis_generator, populated_cis_compliance_data
):
elements = cis_generator.create_requirements_index(
populated_cis_compliance_data
)
# One table per section with requirements. ``create_data_table``
# returns a LongTable when the row count exceeds its threshold and a
# plain Table otherwise — both are valid.
tables = [e for e in elements if isinstance(e, (Table, LongTable))]
assert len(tables) == 2
# =============================================================================
# Detailed findings extras hook
# =============================================================================
class TestRenderRequirementDetailExtras:
def test_inserts_all_fields(self, cis_generator, populated_cis_compliance_data):
req = populated_cis_compliance_data.requirements[1] # 1.2 FAIL
extras = cis_generator._render_requirement_detail_extras(
req, populated_cis_compliance_data
)
text = " ".join(str(p.text) for p in extras if isinstance(p, Paragraph))
assert "Rationale" in text
assert "Impact" in text
assert "Audit Procedure" in text
assert "Remediation" in text
assert "References" in text
def test_missing_metadata_returns_empty(
self, cis_generator, basic_cis_compliance_data
):
basic_cis_compliance_data.attributes_by_requirement_id = {}
req = RequirementData(id="99", description="unknown", status=StatusChoices.FAIL)
extras = cis_generator._render_requirement_detail_extras(
req, basic_cis_compliance_data
)
assert extras == []
def test_escapes_html_chars(self, cis_generator, basic_cis_compliance_data):
attr = _make_attr(
"1 IAM",
rationale="<script>alert('x')</script>",
)
basic_cis_compliance_data.attributes_by_requirement_id = {
"1.1": {"attributes": {"req_attributes": [attr], "checks": []}}
}
req = RequirementData(id="1.1", description="d", status=StatusChoices.FAIL)
extras = cis_generator._render_requirement_detail_extras(
req, basic_cis_compliance_data
)
text = " ".join(str(p.text) for p in extras if isinstance(p, Paragraph))
assert "<script>" not in text
assert "&lt;script&gt;" in text
# =============================================================================
# Cover page
# =============================================================================
class TestCISCoverPage:
@patch("tasks.jobs.reports.cis.Image")
def test_cover_page_has_logo(
self, mock_image, cis_generator, basic_cis_compliance_data
):
elements = cis_generator.create_cover_page(basic_cis_compliance_data)
assert len(elements) > 0
assert mock_image.call_count >= 1
def test_cover_page_title_includes_version(
self, cis_generator, basic_cis_compliance_data
):
elements = cis_generator.create_cover_page(basic_cis_compliance_data)
paragraphs = [e for e in elements if isinstance(e, Paragraph)]
content = " ".join(str(p.text) for p in paragraphs)
assert "CIS Benchmark" in content
assert "5.0" in content
def test_cover_page_title_includes_provider_when_set(
self, cis_generator, basic_cis_compliance_data
):
provider = Mock()
provider.provider = "aws"
provider.uid = "123456789012"
provider.alias = "test-account"
basic_cis_compliance_data.provider_obj = provider
elements = cis_generator.create_cover_page(basic_cis_compliance_data)
paragraphs = [e for e in elements if isinstance(e, Paragraph)]
content = " ".join(str(p.text) for p in paragraphs)
assert "AWS" in content
@@ -36,6 +36,7 @@ from api.models import (
Provider,
Resource,
Scan,
ScanSummary,
StateChoices,
StatusChoices,
)
@@ -3358,6 +3359,175 @@ class TestAggregateFindings:
regions = {s.region for s in summaries}
assert regions == {"us-east-1", "us-west-2"}
@patch("tasks.jobs.scan.Finding.objects.filter")
@patch("tasks.jobs.scan.ScanSummary.objects.bulk_create")
@patch("tasks.jobs.scan.rls_transaction")
def test_aggregate_findings_skips_rows_with_null_service_or_region(
self, mock_rls_transaction, mock_bulk_create, mock_findings_filter
):
"""Aggregation rows with NULL service or region (orphan Findings whose
ResourceFindingMapping is missing) must be dropped before
``bulk_create`` so the NOT NULL constraints on ``scan_summaries`` are
not violated. Valid rows in the same batch must still be persisted."""
tenant_id = str(uuid.uuid4())
scan_id = str(uuid.uuid4())
base_counts = {
"fail": 1,
"_pass": 0,
"muted_count": 0,
"total": 1,
"new": 0,
"changed": 0,
"unchanged": 1,
"fail_new": 0,
"fail_changed": 0,
"pass_new": 0,
"pass_changed": 0,
"muted_new": 0,
"muted_changed": 0,
}
mock_queryset = MagicMock()
mock_queryset.values.return_value = mock_queryset
mock_queryset.annotate.return_value = [
{
"check_id": "check_valid",
"resources__service": "s3",
"severity": "high",
"resources__region": "us-east-1",
**base_counts,
},
{
"check_id": "check_null_service",
"resources__service": None,
"severity": "high",
"resources__region": "us-east-1",
**base_counts,
},
{
"check_id": "check_null_region",
"resources__service": "ec2",
"severity": "low",
"resources__region": None,
**base_counts,
},
{
"check_id": "check_null_both",
"resources__service": None,
"severity": "medium",
"resources__region": None,
**base_counts,
},
]
ctx = MagicMock()
ctx.__enter__.return_value = None
ctx.__exit__.return_value = False
mock_rls_transaction.return_value = ctx
mock_findings_filter.return_value = mock_queryset
aggregate_findings(tenant_id, scan_id)
mock_bulk_create.assert_called_once()
args, _ = mock_bulk_create.call_args
summaries = list(args[0])
assert len(summaries) == 1
assert summaries[0].check_id == "check_valid"
assert summaries[0].service == "s3"
assert summaries[0].region == "us-east-1"
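# ---------------------------------------------------------------------------
# The guard exercised above amounts to dropping annotated rows whose service
# or region resolved to NULL before they become ScanSummary instances. Sketch
# under the assumption that the aggregation yields dicts shaped like the ones
# mocked here (hypothetical helper, not the body of aggregate_findings):
def _rows_safe_for_scan_summary_sketch(rows):
    for row in rows:
        if row["resources__service"] is None or row["resources__region"] is None:
            continue  # orphan finding with no ResourceFindingMapping → skip row
        yield row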
def test_aggregate_findings_is_idempotent_on_rerun(
self,
tenants_fixture,
scans_fixture,
findings_fixture,
):
"""Re-running `aggregate_findings` for the same scan must not violate
the `unique_scan_summary` constraint. The post-mute reaggregation
pipeline re-dispatches `perform_scan_summary_task` against scans
whose summaries already exist; upsert must update existing rows in
place (same primary keys) rather than inserting duplicates."""
tenant = tenants_fixture[0]
scan = scans_fixture[0]
value_columns = (
"check_id",
"service",
"severity",
"region",
"fail",
"_pass",
"muted",
"total",
)
aggregate_findings(str(tenant.id), str(scan.id))
first_run_ids = set(
ScanSummary.all_objects.filter(
tenant_id=tenant.id, scan_id=scan.id
).values_list("id", flat=True)
)
first_run_rows = list(
ScanSummary.all_objects.filter(tenant_id=tenant.id, scan_id=scan.id).values(
*value_columns
)
)
# Second invocation must not raise and must not duplicate rows.
aggregate_findings(str(tenant.id), str(scan.id))
second_run_ids = set(
ScanSummary.all_objects.filter(
tenant_id=tenant.id, scan_id=scan.id
).values_list("id", flat=True)
)
second_run_rows = list(
ScanSummary.all_objects.filter(tenant_id=tenant.id, scan_id=scan.id).values(
*value_columns
)
)
# Upsert preserves the original row identities; values stay stable
# because the underlying Finding set is unchanged between runs.
assert second_run_rows == first_run_rows
assert first_run_ids == second_run_ids
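# ---------------------------------------------------------------------------
# The re-run semantics above are what a conflict-aware upsert provides: rows
# that hit the `unique_scan_summary` key are updated in place instead of being
# inserted again, so primary keys and values stay stable. Illustrative call
# only; the unique/update field lists below are assumptions, not copied from
# the implementation:
#
#     ScanSummary.objects.bulk_create(
#         summaries,
#         update_conflicts=True,
#         unique_fields=["tenant_id", "scan_id", "check_id", "service", "severity", "region"],
#         update_fields=["fail", "_pass", "muted", "total"],
#     )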
def test_aggregate_findings_reflects_mute_between_runs(
self,
tenants_fixture,
scans_fixture,
findings_fixture,
):
"""Re-running `aggregate_findings` after a finding is muted between
runs must move counters: the matching ScanSummary row's `fail`
decrements and `muted` increments. Guards against a regression where
upsert silently keeps stale values from the first run."""
tenant = tenants_fixture[0]
scan = scans_fixture[0]
finding1, _ = findings_fixture # finding1 is FAIL and not muted.
aggregate_findings(str(tenant.id), str(scan.id))
before = ScanSummary.all_objects.get(
tenant_id=tenant.id,
scan_id=scan.id,
check_id=finding1.check_id,
service="ec2",
severity=finding1.severity,
region="us-east-1",
)
assert before.fail == 1
assert before.muted == 0
Finding.all_objects.filter(pk=finding1.pk).update(muted=True)
aggregate_findings(str(tenant.id), str(scan.id))
after = ScanSummary.all_objects.get(pk=before.pk)
assert after.fail == 0
assert after.muted == 1
assert after.total == before.total
@pytest.mark.django_db
class TestAggregateFindingsByRegion:
@@ -13,6 +13,8 @@ from tasks.jobs.lighthouse_providers import (
_extract_bedrock_credentials,
)
from tasks.tasks import (
DJANGO_TMP_OUTPUT_DIRECTORY,
STALE_TMP_OUTPUT_MAX_AGE_HOURS,
_cleanup_orphan_scheduled_scans,
_perform_scan_complete_tasks,
check_integrations_task,
@@ -236,7 +238,8 @@ class TestGenerateOutputs:
self.provider_id = str(uuid.uuid4())
self.tenant_id = str(uuid.uuid4())
def test_no_findings_returns_early(self):
@patch("tasks.tasks._cleanup_stale_tmp_output_directories")
def test_no_findings_returns_early(self, mock_cleanup_stale_tmp_output_directories):
with patch("tasks.tasks.ScanSummary.objects.filter") as mock_filter:
mock_filter.return_value.exists.return_value = False
@@ -248,6 +251,34 @@ class TestGenerateOutputs:
assert result == {"upload": False}
mock_filter.assert_called_once_with(scan_id=self.scan_id)
mock_cleanup_stale_tmp_output_directories.assert_called_once_with(
DJANGO_TMP_OUTPUT_DIRECTORY,
max_age_hours=STALE_TMP_OUTPUT_MAX_AGE_HOURS,
exclude_scan=(self.tenant_id, self.scan_id),
)
@patch(
"tasks.tasks._cleanup_stale_tmp_output_directories",
side_effect=RuntimeError("cleanup boom"),
)
def test_cleanup_exception_does_not_break_no_findings_flow(
self, mock_cleanup_stale_tmp_output_directories
):
with patch("tasks.tasks.ScanSummary.objects.filter") as mock_filter:
mock_filter.return_value.exists.return_value = False
result = generate_outputs_task(
scan_id=self.scan_id,
provider_id=self.provider_id,
tenant_id=self.tenant_id,
)
assert result == {"upload": False}
mock_cleanup_stale_tmp_output_directories.assert_called_once_with(
DJANGO_TMP_OUTPUT_DIRECTORY,
max_age_hours=STALE_TMP_OUTPUT_MAX_AGE_HOURS,
exclude_scan=(self.tenant_id, self.scan_id),
)
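# ---------------------------------------------------------------------------
# Both tests above imply the cleanup is best-effort: it runs with the shared
# tmp directory, an age threshold, and the current scan excluded, and any
# failure is swallowed so output generation continues. Sketch of that call
# site (hypothetical surrounding code; argument names taken from the
# assertions, the logger is an assumption):
#
#     try:
#         _cleanup_stale_tmp_output_directories(
#             DJANGO_TMP_OUTPUT_DIRECTORY,
#             max_age_hours=STALE_TMP_OUTPUT_MAX_AGE_HOURS,
#             exclude_scan=(tenant_id, scan_id),
#         )
#     except Exception:
#         logger.warning("Stale tmp output cleanup failed; continuing")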
@patch("tasks.tasks._upload_to_s3")
@patch("tasks.tasks._compress_output_files")
@@ -309,7 +340,7 @@ class TestGenerateOutputs:
),
patch(
"tasks.tasks.COMPLIANCE_CLASS_MAP",
{"aws": [(lambda x: True, MagicMock(name="CSVCompliance"))]},
{"aws": [(lambda _x: True, MagicMock(name="CSVCompliance"))]},
),
patch(
"tasks.tasks._generate_output_directory",
@@ -361,7 +392,7 @@ class TestGenerateOutputs:
),
patch(
"tasks.tasks.COMPLIANCE_CLASS_MAP",
{"aws": [(lambda x: True, MagicMock())]},
{"aws": [(lambda _x: True, MagicMock())]},
),
patch("tasks.tasks._compress_output_files", return_value="/tmp/compressed"),
patch("tasks.tasks._upload_to_s3", return_value=None),
@@ -441,7 +472,7 @@ class TestGenerateOutputs:
),
patch(
"tasks.tasks.COMPLIANCE_CLASS_MAP",
{"aws": [(lambda x: True, mock_compliance_class)]},
{"aws": [(lambda _x: True, mock_compliance_class)]},
),
):
mock_filter.return_value.exists.return_value = True
@@ -470,6 +501,10 @@ class TestGenerateOutputs:
class TrackingWriter:
def __init__(self, findings, file_path, file_extension, from_cli):
self.findings = findings
self.file_path = file_path
self.file_extension = file_extension
self.from_cli = from_cli
self.transform_called = 0
self.batch_write_data_to_file = MagicMock()
self._data = []
@@ -578,13 +613,13 @@ class TestGenerateOutputs:
patch("tasks.tasks.FindingOutput._transform_findings_stats"),
patch(
"tasks.tasks.FindingOutput.transform_api_finding",
side_effect=lambda f, prov: f,
side_effect=lambda f, _prov: f,
),
patch("tasks.tasks._compress_output_files", return_value="outdir.zip"),
patch("tasks.tasks._upload_to_s3", return_value="s3://bucket/outdir.zip"),
patch(
"tasks.tasks.Scan.all_objects.filter",
return_value=MagicMock(update=lambda **kw: None),
return_value=MagicMock(update=lambda **_kw: None),
),
patch("tasks.tasks.batched", return_value=two_batches),
patch("tasks.tasks.OUTPUT_FORMATS_MAPPING", {}),
@@ -666,7 +701,7 @@ class TestGenerateOutputs:
),
patch(
"tasks.tasks.COMPLIANCE_CLASS_MAP",
{"aws": [(lambda x: True, mock_compliance_class)]},
{"aws": [(lambda _x: True, mock_compliance_class)]},
),
):
mock_filter.return_value.exists.return_value = True
@@ -748,7 +783,7 @@ class TestScanCompleteTasks:
@patch("tasks.tasks.can_provider_run_attack_paths_scan", return_value=False)
def test_scan_complete_tasks(
self,
mock_can_run_attack_paths,
_mock_can_run_attack_paths,
mock_attack_paths_task,
mock_check_integrations_task,
mock_compliance_reports_task,
@@ -994,7 +1029,7 @@ class TestCheckIntegrationsTask:
@patch("tasks.tasks.rmtree")
def test_generate_outputs_with_asff_for_aws_with_security_hub(
self,
mock_rmtree,
_mock_rmtree,
mock_scan_update,
mock_upload,
mock_compress,
@@ -1122,7 +1157,7 @@ class TestCheckIntegrationsTask:
@patch("tasks.tasks.rmtree")
def test_generate_outputs_no_asff_for_aws_without_security_hub(
self,
mock_rmtree,
_mock_rmtree,
mock_scan_update,
mock_upload,
mock_compress,
@@ -1245,7 +1280,7 @@ class TestCheckIntegrationsTask:
@patch("tasks.tasks.rmtree")
def test_generate_outputs_no_asff_for_non_aws_provider(
self,
mock_rmtree,
_mock_rmtree,
mock_scan_update,
mock_upload,
mock_compress,
@@ -2359,11 +2394,26 @@ class TestReaggregateAllFindingGroupSummaries:
def setup_method(self):
self.tenant_id = str(uuid.uuid4())
@patch("tasks.tasks.chain")
@patch("tasks.tasks.group")
@patch("tasks.tasks.aggregate_attack_surface_task")
@patch("tasks.tasks.aggregate_scan_category_summaries_task")
@patch("tasks.tasks.aggregate_scan_resource_group_summaries_task")
@patch("tasks.tasks.aggregate_finding_group_summaries_task")
@patch("tasks.tasks.aggregate_daily_severity_task")
@patch("tasks.tasks.perform_scan_summary_task")
@patch("tasks.tasks.Scan.objects.filter")
def test_dispatches_subtasks_for_each_provider_per_day(
self, mock_scan_filter, mock_agg_task, mock_group
self,
mock_scan_filter,
mock_scan_summary_task,
mock_daily_severity_task,
mock_finding_group_task,
mock_resource_group_task,
mock_category_task,
mock_attack_surface_task,
mock_group,
mock_chain,
):
provider_id_1 = uuid.uuid4()
provider_id_2 = uuid.uuid4()
@@ -2373,8 +2423,13 @@ class TestReaggregateAllFindingGroupSummaries:
today = datetime.now(tz=timezone.utc)
yesterday = today - timedelta(days=1)
mock_group_result = MagicMock()
mock_group.side_effect = lambda gen: (list(gen), mock_group_result)[1]
mock_outer_group_result = MagicMock()
# The first `group()` call wraps the inner parallel step; subsequent
# calls wrap the outer per-scan generator.
mock_group.side_effect = lambda *args, **kwargs: (
list(args[0]) if args and hasattr(args[0], "__iter__") else None,
mock_outer_group_result,
)[1]
mock_scan_filter.return_value.order_by.return_value.values.return_value = [
{
@@ -2397,23 +2452,49 @@ class TestReaggregateAllFindingGroupSummaries:
result = reaggregate_all_finding_group_summaries_task(tenant_id=self.tenant_id)
assert result == {"scans_reaggregated": 3}
assert mock_agg_task.si.call_count == 3
mock_agg_task.si.assert_any_call(
tenant_id=self.tenant_id, scan_id=str(scan_id_today_p1)
)
mock_agg_task.si.assert_any_call(
tenant_id=self.tenant_id, scan_id=str(scan_id_today_p2)
)
mock_agg_task.si.assert_any_call(
tenant_id=self.tenant_id, scan_id=str(scan_id_yesterday_p1)
)
mock_group_result.apply_async.assert_called_once()
expected_scan_ids = {
str(scan_id_today_p1),
str(scan_id_today_p2),
str(scan_id_yesterday_p1),
}
for task_mock in (
mock_scan_summary_task,
mock_daily_severity_task,
mock_finding_group_task,
mock_resource_group_task,
mock_category_task,
mock_attack_surface_task,
):
assert task_mock.si.call_count == 3
dispatched = {
call.kwargs["scan_id"] for call in task_mock.si.call_args_list
}
assert dispatched == expected_scan_ids
for call in task_mock.si.call_args_list:
assert call.kwargs["tenant_id"] == self.tenant_id
assert mock_chain.call_count == 3
mock_outer_group_result.apply_async.assert_called_once()
@patch("tasks.tasks.chain")
@patch("tasks.tasks.group")
@patch("tasks.tasks.aggregate_attack_surface_task")
@patch("tasks.tasks.aggregate_scan_category_summaries_task")
@patch("tasks.tasks.aggregate_scan_resource_group_summaries_task")
@patch("tasks.tasks.aggregate_finding_group_summaries_task")
@patch("tasks.tasks.aggregate_daily_severity_task")
@patch("tasks.tasks.perform_scan_summary_task")
@patch("tasks.tasks.Scan.objects.filter")
def test_dedupes_scans_to_latest_per_provider_per_day(
self, mock_scan_filter, mock_agg_task, mock_group
self,
mock_scan_filter,
mock_scan_summary_task,
mock_daily_severity_task,
mock_finding_group_task,
mock_resource_group_task,
mock_category_task,
mock_attack_surface_task,
mock_group,
mock_chain,
):
"""When several scans run on the same day for the same provider, only
the latest one is dispatched (matching the daily summary unique key)."""
@@ -2423,8 +2504,11 @@ class TestReaggregateAllFindingGroupSummaries:
today_late = datetime.now(tz=timezone.utc)
today_early = today_late - timedelta(hours=4)
mock_group_result = MagicMock()
mock_group.side_effect = lambda gen: (list(gen), mock_group_result)[1]
mock_outer_group_result = MagicMock()
mock_group.side_effect = lambda *args, **kwargs: (
list(args[0]) if args and hasattr(args[0], "__iter__") else None,
mock_outer_group_result,
)[1]
# Returned ordered by `-completed_at`, so the most recent comes first.
mock_scan_filter.return_value.order_by.return_value.values.return_value = [
@@ -2443,17 +2527,30 @@ class TestReaggregateAllFindingGroupSummaries:
result = reaggregate_all_finding_group_summaries_task(tenant_id=self.tenant_id)
assert result == {"scans_reaggregated": 1}
mock_agg_task.si.assert_called_once_with(
tenant_id=self.tenant_id, scan_id=str(latest_scan_today)
)
mock_group_result.apply_async.assert_called_once()
for task_mock in (
mock_scan_summary_task,
mock_daily_severity_task,
mock_finding_group_task,
mock_resource_group_task,
mock_category_task,
mock_attack_surface_task,
):
task_mock.si.assert_called_once_with(
tenant_id=self.tenant_id, scan_id=str(latest_scan_today)
)
mock_chain.assert_called_once()
mock_outer_group_result.apply_async.assert_called_once()
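# ---------------------------------------------------------------------------
# Dedup sketch matching the behaviour asserted above: the queryset is ordered
# by -completed_at, so keeping the first row seen per (provider, calendar day)
# selects the latest scan for that day. Hypothetical helper; the dict keys are
# assumptions based on the mocked .values() rows:
def _latest_scan_per_provider_per_day_sketch(rows):
    seen = set()
    for row in rows:  # rows ordered by -completed_at, newest first
        key = (row["provider_id"], row["completed_at"].date())
        if key in seen:
            continue  # an older scan for the same provider on the same day
        seen.add(key)
        yield row["id"]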
@patch("tasks.tasks.chain")
@patch("tasks.tasks.group")
@patch("tasks.tasks.Scan.objects.filter")
def test_no_completed_scans_skips_dispatch(self, mock_scan_filter, mock_group):
def test_no_completed_scans_skips_dispatch(
self, mock_scan_filter, mock_group, mock_chain
):
mock_scan_filter.return_value.order_by.return_value.values.return_value = []
result = reaggregate_all_finding_group_summaries_task(tenant_id=self.tenant_id)
assert result == {"scans_reaggregated": 0}
mock_group.assert_not_called()
mock_chain.assert_not_called()
@@ -34,6 +34,10 @@ spec:
securityContext:
{{- toYaml . | nindent 8 }}
{{- end }}
{{- with .Values.worker.initContainers }}
initContainers:
{{- toYaml . | nindent 8 }}
{{- end }}
containers:
- name: worker
{{- with .Values.worker.securityContext }}
