feat(api): make Attack Paths sink selectable between Neo4j and Neptune (#11524)

2026-07-04 19:21:51 +00:00 · 2026-06-26 10:22:29 +02:00
parent 9b8b77cec0
commit 5793cd7e38
48 changed files with 9928 additions and 3210 deletions
@@ -2,13 +2,14 @@
 name: prowler-attack-paths-query
 description: >
  Creates Prowler Attack Paths openCypher queries using the Cartography schema as the source of truth
-  for node labels, properties, and relationships. Also covers Prowler-specific additions (Internet node,
-  ProwlerFinding, internal isolation labels) and $provider_uid scoping for predefined queries.
+  for node labels, properties, and relationships. Covers Prowler-specific additions (Internet node,
+  ProwlerFinding, internal isolation labels), $provider_uid scoping, and list-property item nodes
+  with typed `HAS_*` edges that run efficiently on both Neo4j and Amazon Neptune sinks.
  Trigger: When creating or updating Attack Paths queries.
 license: Apache-2.0
 metadata:
  author: prowler-cloud
-  version: "2.0"
+  version: "3.0"
  scope: [root, api]
  auto_invoke:
    - "Creating Attack Paths queries"
@@ -19,36 +20,30 @@ allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, Task

 ## Overview

-Attack Paths queries are openCypher queries that analyze cloud infrastructure graphs (ingested via Cartography) to detect security risks like privilege escalation paths, network exposure, and misconfigurations.
-
-Queries are written in **openCypher Version 9** for compatibility with both Neo4j and Amazon Neptune.
+Attack Paths queries are read-only openCypher queries over a Cartography-ingested cloud graph that detect privilege escalation chains, network exposure, and other graph-shaped security risks. Queries are written in openCypher Version 9 so they run on both Neo4j and Amazon Neptune sinks.

 ---

 ## Two query audiences

-This skill covers two types of queries with different isolation mechanisms:
+|                    | Predefined queries                                          | Custom queries                                                        |
+| ------------------ | ----------------------------------------------------------- | --------------------------------------------------------------------- |
+| Where they live    | `api/src/backend/api/attack_paths/queries/{provider}.py`    | User-supplied via the custom query API endpoint                       |
+| Provider isolation | `AWSAccount {id: $provider_uid}` anchor + path connectivity | Automatic `_Provider_{uuid}` label injection by `cypher_sanitizer.py` |
+| What to write      | Chain every MATCH from the `aws` variable                   | Plain Cypher, no isolation boilerplate                                |
+| Internal labels    | Never use                                                   | Never use (system-injected)                                           |

-| | Predefined queries | Custom queries |
-|---|---|---|
-| **Where they live** | `api/src/backend/api/attack_paths/queries/{provider}.py` | User/LLM-supplied via the custom query API endpoint |
-| **Provider isolation** | `AWSAccount {id: $provider_uid}` anchor + path connectivity | Automatic `_Provider_{uuid}` label injection via `cypher_sanitizer.py` |
-| **What to write** | Chain every MATCH from the `aws` variable | Plain Cypher, no isolation boilerplate needed |
-| **Internal labels** | Never use (`_ProviderResource`, `_Tenant_*`, `_Provider_*`) | Never use (injected automatically by the system) |
+**Predefined queries**: every node must be reachable from the `AWSAccount` root via graph traversal. That is the isolation boundary.

-**For predefined queries**: every node must be reachable from the `AWSAccount` root via graph traversal. This is the isolation boundary.
-
-**For custom queries**: write natural Cypher without isolation concerns. The query runner injects a `_Provider_{uuid}` label into every node pattern before execution, and a post-query filter catches edge cases.
+**Custom queries**: write natural Cypher. The runner injects a `_Provider_{uuid}` label into every node pattern, and a post-query filter handles edge cases.

 ---

-## Input Sources
+## Input sources

-Queries can be created from:
+Two sources for new queries:

-1. **pathfinding.cloud ID** (e.g., `ECS-001`, `GLUE-001`)
-   - Reference: https://github.com/DataDog/pathfinding.cloud
-   - The aggregated `paths.json` is too large for WebFetch. Use Bash:
+1. **pathfinding.cloud ID** (e.g. `ECS-001`, `GLUE-001`) from the Datadog pathfinding.cloud catalogue. The aggregated `paths.json` is too large for WebFetch, so fetch it with Bash:

   ```bash
   # Fetch a single path by ID
@@ -64,28 +59,24 @@ Queries can be created from:
     | jq -r '.[] | select(.id | startswith("ecs")) | "\(.id): \(.name)"'
   ```

-   If `jq` is not available, use `python3 -c "import json,sys; ..."` as a fallback.
+   If `jq` is unavailable, use `python3 -c "import json,sys; ..."`.

-2. **Natural language description** from the user
+2. **Natural language description** from the requester.

 ---

-## Query Structure
+## Query structure

 ### Provider scoping parameter

-One parameter is injected automatically by the query runner:
+| Parameter       | Property | Used on      | Purpose                                |
+| --------------- | -------- | ------------ | -------------------------------------- |
+| `$provider_uid` | `id`     | `AWSAccount` | Scopes the query to a specific account |

-| Parameter       | Property it matches | Used on      | Purpose                          |
-| --------------- | ------------------- | ------------ | -------------------------------- |
-| `$provider_uid` | `id`                | `AWSAccount` | Scopes to a specific AWS account |
-
-All other nodes are isolated by path connectivity from the `AWSAccount` anchor.
+The runner binds `$provider_uid` automatically. Every other node is isolated by path connectivity from the `AWSAccount` anchor.

 ### Imports

-All query files start with these imports:
-
 ```python
 from api.attack_paths.queries.types import (
    AttackPathsQueryAttribution,
@@ -95,29 +86,33 @@ from api.attack_paths.queries.types import (
 from tasks.jobs.attack_paths.config import PROWLER_FINDING_LABEL
 ```

-The `PROWLER_FINDING_LABEL` constant (value: `"ProwlerFinding"`) is used via f-string interpolation in all queries. Never hardcode the label string.
+Always use `PROWLER_FINDING_LABEL` via f-string interpolation; never hardcode `"ProwlerFinding"`.

-### Privilege escalation sub-patterns
+### Definition fields

-There are four distinct privilege escalation patterns. Choose based on the attack type:
+- **id**: kebab-case `{provider}-{description}`, e.g. `aws-ec2-privesc-passrole-iam`.
+- **name**: short, human-friendly label. Sourced queries append the reference ID: `"EC2 Instance Launch with Privileged Role (EC2-001)"`.
+- **short_description**: one sentence, no technical permissions.
+- **description**: full technical explanation, plain text.
+- **provider**: `aws`, `azure`, `gcp`, `kubernetes`, or `github`.
+- **cypher**: f-string Cypher body. Literal `{` / `}` are escaped as `{{` / `}}`.
+- **parameters**: list of user-supplied parameter definitions; use `parameters=[]` if none.
+- **attribution**: optional `AttackPathsQueryAttribution(text, link)` for sourced queries. `link` uses the lowercase ID.

-| Sub-pattern | Target | `path_target` shape | Example |
-|---|---|---|---|
-| Self-escalation | Principal's own policies | `(aws)--(target_policy:AWSPolicy)--(principal)` | IAM-001 |
-| Lateral to user | Other IAM users | `(aws)--(target_user:AWSUser)` | IAM-002 |
-| Assume-role lateral | Assumable roles | `(aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)` | IAM-014 |
-| PassRole + service | Service-trusting roles | `(aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(...)` | EC2-001 |
+Append the constant to the `{PROVIDER}_QUERIES` list at the bottom of the provider file.
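
The brace-escaping and label-constant rules above can be checked with a plain f-string. This is a minimal sketch, not a registrable query: it redefines the constant locally (assuming its value is `"ProwlerFinding"`) instead of importing it from `tasks.jobs.attack_paths.config`, and `FINDING_PROBE` is a hypothetical name used only for illustration.

```python
# Stand-in for tasks.jobs.attack_paths.config.PROWLER_FINDING_LABEL so the
# snippet runs outside the API codebase (assumed value: "ProwlerFinding").
PROWLER_FINDING_LABEL = "ProwlerFinding"

# Inside an f-string, {{ and }} render as literal braces, while
# {PROWLER_FINDING_LABEL} is interpolated. $provider_uid is left untouched:
# it is a Cypher parameter bound by the query runner, not a Python placeholder.
FINDING_PROBE = f"""
    MATCH (aws:AWSAccount {{id: $provider_uid}})--(n)
    OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
"""

# Print the rendered fragment to inspect the braces that reach the database.
print(FINDING_PROBE)
```

Printing the rendered string before registering a query is a quick way to spot a forgotten `{{` / `}}`, which would otherwise surface as a Python `NameError` or a malformed Cypher map.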

-#### Self-escalation (e.g., IAM-001)
+---

-The principal modifies resources attached to itself. `path_target` loops back to `principal`:
+## Predefined query template
+
+The canonical shape combines a principal walk, an optional target walk, deduplicated nodes, and a typed finding overlay:

 ```python
 AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
    id="aws-{kebab-case-name}",
-    name="{Human-friendly label} ({REFERENCE_ID})",
-    short_description="{Brief explanation, no technical permissions.}",
-    description="{Detailed description of the attack vector and impact.}",
+    name="{Label} ({REFERENCE_ID})",
+    short_description="{One sentence.}",
+    description="{Full technical explanation.}",
    attribution=AttackPathsQueryAttribution(
        text="pathfinding.cloud - {REFERENCE_ID} - {permission}",
        link="https://pathfinding.cloud/paths/{reference_id_lowercase}",
@@ -125,29 +120,27 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
    provider="aws",
    cypher=f"""
        // Find principals with {permission}
-        MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
-        WHERE stmt.effect = 'Allow'
-            AND any(action IN stmt.action WHERE
-                toLower(action) = '{permission_lowercase}'
-                OR toLower(action) = '{service}:*'
-                OR action = '*'
-            )
+        MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)-[:POLICY]->(policy:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {{effect: 'Allow'}})
+        MATCH (stmt)-[:HAS_ACTION]->(act:AWSPolicyStatementActionItem)
+        WHERE toLower(act.value) IN ['{permission_lowercase}', '{service}:*']
+           OR act.value = '*'
+        WITH DISTINCT aws, principal, stmt, path_principal

-        // Find target resources attached to the same principal
+        // Target resources attached to the same principal (sub-patterns below)
        MATCH path_target = (aws)--(target_policy:AWSPolicy)--(principal)
        WHERE target_policy.arn CONTAINS $provider_uid
-            AND any(resource IN stmt.resource WHERE
-                resource = '*'
-                OR target_policy.arn CONTAINS resource
-            )
+        MATCH (stmt)-[:HAS_RESOURCE]->(res:AWSPolicyStatementResourceItem)
+        WHERE res.value = '*'
+           OR target_policy.arn CONTAINS res.value

+        WITH DISTINCT path_principal, path_target
        WITH collect(path_principal) + collect(path_target) AS paths
        UNWIND paths AS p
        UNWIND nodes(p) AS n

        WITH paths, collect(DISTINCT n) AS unique_nodes
        UNWIND unique_nodes AS n
-        OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
+        OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})

        RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
    """,
@@ -155,158 +148,145 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
 )
 ```

-#### Other sub-pattern `path_target` shapes
+Key points:

-The other 3 sub-patterns share the same `path_principal`, deduplication tail, and RETURN as self-escalation. Only the `path_target` MATCH differs:
+- The principal walk types the `POLICY` and `STATEMENT` hops. Both are low-fan-out (each principal has a handful of policies; each policy a handful of statements), so the typed edge lets the planner cost a cheap inline filter.
+- The `(aws)--` hub hops stay anonymous. `AWSAccount` is a high-degree node that fans out to every principal, role, policy, and resource in the account; typing those edges forces the planner to enumerate from the hub and collapses performance on multi-tenant Neptune.
+- Other relationship types appear only where the file's existing queries already use one (`TRUSTS_AWS_PRINCIPAL`, `STS_ASSUMEROLE_ALLOW`, `MEMBER_AWS_GROUP`, `HAS_EXECUTION_ROLE`).
+- The finding probe is typed `:HAS_FINDING` and left undirected. The type lets Neptune apply an inline edge filter; the lack of direction matches the convention of the rest of the file.
+- Collapse duplicate rows after each permission gate with `WITH DISTINCT`, carrying only the variables needed by later clauses.
+- Each `HAS_*` traversal is its own `MATCH` clause with a `WHERE` on the child item node. `WITH DISTINCT path_principal, path_target` precedes `collect(path...)` to dedupe the row multiplication produced by the joins.
+- The `RETURN` shape `paths, dpf, dpfr` is the contract the serializer and visualiser depend on. Do not change it.
+
+---
+
+## Privilege escalation sub-patterns
+
+Four `path_target` shapes cover the common attack types. Each shares the canonical template's `path_principal`, deduplication tail, and `RETURN`; only the `path_target` MATCH and its resource predicate differ.
+
+| Sub-pattern         | Target                   | `path_target` shape                                                                                     | Example |
+| ------------------- | ------------------------ | ------------------------------------------------------------------------------------------------------- | ------- |
+| Self-escalation     | Principal's own policies | `(aws)--(target_policy:AWSPolicy)--(principal)`                                                         | IAM-001 |
+| Lateral to user     | Other IAM users          | `(aws)--(target_user:AWSUser)`                                                                          | IAM-002 |
+| Assume-role lateral | Assumable roles          | `(aws)--(target_role:AWSRole)-[:STS_ASSUMEROLE_ALLOW]-(principal)`                                      | IAM-014 |
+| PassRole + service  | Service-trusting roles   | `(aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]-(:AWSPrincipal {arn: '{service}.amazonaws.com'})` | EC2-001 |
+
+**Multi-permission queries** (e.g. PassRole plus a service-create action) add permission gates before `path_target`. Reuse the per-query counter for new variables (`act2`, `policy2`, `stmt2`) and collapse rows after each gate:

 ```cypher
-// Lateral to user (e.g., IAM-002) - targets other IAM users
-MATCH path_target = (aws)--(target_user:AWSUser)
-WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_user.arn CONTAINS resource OR resource CONTAINS target_user.name)
-
-// Assume-role lateral (e.g., IAM-014) - targets roles the principal can assume
-MATCH path_target = (aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)
-WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
-
-// PassRole + service (e.g., EC2-001) - targets roles trusting a service
-MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: '{service}.amazonaws.com'})
-WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
+MATCH (principal)-[:POLICY]->(policy2:AWSPolicy)-[:STATEMENT]->(stmt2:AWSPolicyStatement {effect: 'Allow'})
+MATCH (stmt2)-[:HAS_ACTION]->(act2:AWSPolicyStatementActionItem)
+WHERE toLower(act2.value) IN ['service:*', 'service:createsomething']
+   OR act2.value = '*'
+WITH DISTINCT aws, principal, stmt, stmt2, path_principal
 ```

-**Multi-permission**: PassRole queries require a second permission. Add `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` with its own WHERE before `path_target`, then check BOTH `stmt.resource` AND `stmt2.resource` against the target. See IAM-015 or EC2-001 in `aws.py` for examples.
+If a permission is an existence-only gate whose statement resource is not checked later, keep the policy and statement anonymous and carry only the variables still needed:

-### Network exposure pattern
+```cypher
+MATCH (principal)-[:POLICY]->(:AWSPolicy)-[:STATEMENT]->(:AWSPolicyStatement {effect: 'Allow'})-[:HAS_ACTION]->(act3:AWSPolicyStatementActionItem)
+WHERE toLower(act3.value) IN ['service:*', 'service:othersomething']
+   OR act3.value = '*'
+WITH DISTINCT aws, principal, stmt, path_principal
+```

-The Internet node is reached via `CAN_ACCESS` through the already-scoped resource, not via a standalone lookup:
+When all matching principals can target the same independent resource set, collect principal paths before expanding targets instead of creating one row per principal-target pair:
+
+```cypher
+WITH aws, collect(DISTINCT path_principal) AS principal_paths
+MATCH path_target = (aws)--(target)
+WITH principal_paths + collect(DISTINCT path_target) AS paths
+```
+
+Statements that constrain a target are still checked via `HAS_RESOURCE` traversals (`res`, `res2`). See IAM-015 or EC2-001 in `aws.py`.
+
+---
+
+## Network exposure pattern
+
+The Internet node is reached via `CAN_ACCESS` through an already-scoped resource, never as a standalone lookup:

 ```python
-AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
-    id="aws-{kebab-case-name}",
-    name="{Human-friendly label}",
-    short_description="{Brief explanation.}",
-    description="{Detailed description.}",
-    provider="aws",
-    cypher=f"""
-        // Match exposed resources (MUST chain from `aws`)
-        MATCH path = (aws:AWSAccount {{id: $provider_uid}})--(resource:EC2Instance)
-        WHERE resource.exposed_internet = true
+cypher=f"""
+    // Resource scoped through the account anchor
+    MATCH path = (aws:AWSAccount {{id: $provider_uid}})--(resource:EC2Instance)
+    WHERE resource.exposed_internet = true

-        // Internet node reached via path connectivity through the resource
-        OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)
+    // Internet node reached via path connectivity through the resource
+    OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)

-        WITH collect(path) AS paths, head(collect(internet)) AS internet, collect(can_access) AS can_access
-        UNWIND paths AS p
-        UNWIND nodes(p) AS n
+    WITH collect(path) AS paths, head(collect(internet)) AS internet, collect(can_access) AS can_access
+    UNWIND paths AS p
+    UNWIND nodes(p) AS n

-        WITH paths, internet, can_access, collect(DISTINCT n) AS unique_nodes
-        UNWIND unique_nodes AS n
-        OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
+    WITH paths, internet, can_access, collect(DISTINCT n) AS unique_nodes
+    UNWIND unique_nodes AS n
+    OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})

-        RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
-            internet, can_access
-    """,
-    parameters=[],
-)
+    RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
+        internet, can_access
+"""
 ```

-### Register in query list
-
-Add to the `{PROVIDER}_QUERIES` list at the bottom of the file:
-
-```python
-AWS_QUERIES: list[AttackPathsQueryDefinition] = [
-    # ... existing queries ...
-    AWS_{NEW_QUERY_NAME},  # Add here
-]
-```
+The `CAN_ACCESS` edge stays typed and directed (`-[:CAN_ACCESS]->`); that is its canonical sync-time orientation.

 ---

-## Step-by-step creation process
+## List-typed properties as child nodes

-### 1. Read the queries module
+Some Cartography node properties carry a list of values: `AWSPolicyStatement.action`, `AWSPolicyStatement.resource`, `KMSKey.encryption_algorithms`, `CloudFrontDistribution.aliases`, and many others. The graph models each such property as a set of child item nodes connected to the parent by a typed edge. Queries reach the values by traversing the edge; the parent does not carry the list as a single field.

-**FIRST**, read all files in the queries module to understand the structure, type definitions, registration, and existing style:
+### Naming convention

-```text
-api/src/backend/api/attack_paths/queries/
-├── __init__.py      # Module exports
-├── types.py         # AttackPathsQueryDefinition, AttackPathsQueryParameterDefinition
-├── registry.py      # Query registry logic
-└── {provider}.py    # Provider-specific queries (e.g., aws.py)
+For a list-typed parent property the sink stores:
+
+- **Child label**: `<ParentLabel><PropertyPascal>Item`. Example: `AWSPolicyStatement.resource` → `AWSPolicyStatementResourceItem`.
+- **Edge type**: `HAS_<PROPERTY_UPPER>`. Example: `resource` → `HAS_RESOURCE`.
+- **Child property**: `value` (a single scalar string) for scalar-list properties. For list-of-dict properties (rare; for example `SecretsManagerSecretVersion.tags`) the child carries the dict keys as named fields per the catalog's `field_map`.
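
The naming rules above can be sketched as a small helper. This is an illustration only: `child_item_names` is a hypothetical function, not part of the Prowler codebase, and the handling of snake_case properties is an assumption, since the examples given here are single-word properties.

```python
def child_item_names(parent_label: str, prop: str) -> tuple[str, str]:
    """Return (child_label, edge_type) for a list-typed parent property."""
    # Assumption: snake_case properties become PascalCase by capitalising each
    # underscore-separated part; only single-word examples are documented.
    pascal = "".join(part[:1].upper() + part[1:] for part in prop.split("_"))
    child_label = f"{parent_label}{pascal}Item"
    edge_type = f"HAS_{prop.upper()}"
    return child_label, edge_type


# The two documented policy-statement properties.
print(child_item_names("AWSPolicyStatement", "resource"))
print(child_item_names("AWSPolicyStatement", "action"))
```

When in doubt about a multi-word property, check the catalog entry rather than relying on a derived name.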
+
+### Variable naming for child-item matches
+
+`aws.py` uses a per-query counter for each `HAS_*` traversal so chained matches stay unambiguous:
+
+| Edge              | First  | Second  | Third   |
+| ----------------- | ------ | ------- | ------- |
+| `HAS_ACTION`      | `act`  | `act2`  | `act3`  |
+| `HAS_RESOURCE`    | `res`  | `res2`  | `res3`  |
+| `HAS_NOTACTION`   | `nact` | `nact2` | `nact3` |
+| `HAS_NOTRESOURCE` | `nres` | `nres2` | `nres3` |
+
+The counter resets at the top of every query.
+
+### Example - action match
+
+Find statements that grant `iam:PassRole`, `iam:*`, or `*`. Traverse the `HAS_ACTION` edge in its own `MATCH` clause and apply the predicate in the attached `WHERE`:
+
+```cypher
+MATCH (stmt:AWSPolicyStatement {effect: 'Allow'})
+MATCH (stmt)-[:HAS_ACTION]->(act:AWSPolicyStatementActionItem)
+WHERE toLower(act.value) IN ['iam:passrole', 'iam:*']
+   OR act.value = '*'
 ```

-**DO NOT** use generic templates. Match the exact style of existing queries in the file.
+The action value is case-folded with `toLower(act.value)` before it is compared against the lowercase literal list, because IAM policy authors mix case (`iam:PassRole`, `iam:passrole`); the bare `*` wildcard has no case, so it is compared directly.

-### 2. Fetch and consult the Cartography schema
+### Example - resource ARN match

-**This is the most important step.** Every node label, property, and relationship in the query must exist in the Cartography schema for the pinned version. Do not guess or rely on memory.
+Find statements whose resource can target a specific role:

-Check `api/pyproject.toml` for the Cartography dependency, then fetch the schema:
-
-```bash
-grep cartography api/pyproject.toml
+```cypher
+MATCH path_target = (aws)--(target_role:AWSRole)
+MATCH (stmt)-[:HAS_RESOURCE]->(res:AWSPolicyStatementResourceItem)
+WHERE res.value = '*'
+   OR res.value CONTAINS target_role.name
+   OR target_role.arn CONTAINS res.value
 ```

-Build the schema URL (ALWAYS use the specific tag, not master/main):
+Three predicates cover the cases: a full wildcard (`*`), a resource pattern that contains the role name (`arn:aws:iam::*:role/admin*`), and a resource pattern that is a substring of the role's actual ARN.
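
To see which resource values each branch accepts, the `WHERE` clause can be mirrored in plain Python, where `in` plays the role of Cypher's `CONTAINS`. This is a sketch for reasoning about the predicate, not code from the repository; `resource_can_target` and the role ARN are hypothetical.

```python
def resource_can_target(res_value: str, role_arn: str, role_name: str) -> bool:
    """Python mirror of the three-branch WHERE clause on the resource item."""
    return (
        res_value == "*"            # res.value = '*'
        or role_name in res_value   # res.value CONTAINS target_role.name
        or res_value in role_arn    # target_role.arn CONTAINS res.value
    )


ROLE_ARN = "arn:aws:iam::123456789012:role/admin"  # hypothetical role

candidates = [
    "*",
    "arn:aws:iam::*:role/admin*",
    "arn:aws:iam::123456789012:role/",
    "arn:aws:iam::123456789012:role/readonly",
]
for value in candidates:
    print(value, resource_can_target(value, ROLE_ARN, "admin"))
```

The check is a substring test, not IAM wildcard evaluation, so it deliberately over-approximates: a pattern that merely mentions the role name is treated as a possible match.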

-```text
-# Git dependency (prowler-cloud/cartography@0.126.1):
-https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/0.126.1/docs/root/modules/{provider}/schema.md
+### Catalog of list properties

-# PyPI dependency (cartography = "^0.126.0"):
-https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/0.126.0/docs/root/modules/{provider}/schema.md
-```
-
-Read the schema to discover available node labels, properties, and relationships for the target resources. Internal labels (`_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*`) exist for isolation but should never appear in queries.
-
-### 4. Create query definition
-
-Use the appropriate pattern (privilege escalation or network exposure) with:
-
-- **id**: `{provider}-{kebab-case-description}`
-- **name**: Short, human-friendly label. For sourced queries, append the reference ID: `"EC2 Instance Launch with Privileged Role (EC2-001)"`.
-- **short_description**: Brief explanation, no technical permissions.
-- **description**: Full technical explanation. Plain text only.
-- **provider**: Provider identifier (aws, azure, gcp, kubernetes, github)
-- **cypher**: The openCypher query with proper escaping
-- **parameters**: Optional list of user-provided parameters (`parameters=[]` if none)
-- **attribution**: Optional `AttackPathsQueryAttribution(text, link)` for sourced queries. The `text` includes source, reference ID, and permissions. The `link` uses a lowercase ID. Omit for non-sourced queries.
-
-### 5. Add query to provider list
-
-Add the constant to the `{PROVIDER}_QUERIES` list.
-
----
-
-## Query naming conventions
-
-### Query ID
-
-```text
-{provider}-{category}-{description}
-```
-
-Examples: `aws-ec2-privesc-passrole-iam`, `aws-ec2-instances-internet-exposed`
-
-### Query constant name
-
-```text
-{PROVIDER}_{CATEGORY}_{DESCRIPTION}
-```
-
-Examples: `AWS_EC2_PRIVESC_PASSROLE_IAM`, `AWS_EC2_INSTANCES_INTERNET_EXPOSED`
-
----
-
-## Query categories
-
-| Category             | Description                    | Example                   |
-| -------------------- | ------------------------------ | ------------------------- |
-| Basic Resource       | List resources with properties | RDS instances, S3 buckets |
-| Network Exposure     | Internet-exposed resources     | EC2 with public IPs       |
-| Privilege Escalation | IAM privilege escalation paths | PassRole + RunInstances   |
-| Data Access          | Access to sensitive data       | EC2 with S3 access        |
+The provider catalog lives in `api/src/backend/tasks/jobs/attack_paths/provider_config.py` (`AWS_NORMALIZED_LISTS`). Beyond policy statements it includes KMS algorithms, ECS container-definition lists (`entry_point`, `command`, `links`, `dns_servers`, ...), CloudFront aliases, Inspector finding URL and vulnerability lists, RDS event-subscription categories, and others. To query a list property that is not in the catalog, add an entry there first so the sync layer materialises it.

 ---

@@ -315,53 +295,42 @@ Examples: `AWS_EC2_PRIVESC_PASSROLE_IAM`, `AWS_EC2_INSTANCES_INTERNET_EXPOSED`
 ### Match account and principal

 ```cypher
-MATCH path_principal = (aws:AWSAccount {id: $provider_uid})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
+MATCH path_principal = (aws:AWSAccount {id: $provider_uid})--(principal:AWSPrincipal)-[:POLICY]->(policy:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {effect: 'Allow'})
 ```

-### Check IAM action permissions
+The `(aws)--(principal)` hop stays anonymous; the `POLICY` and `STATEMENT` hops are typed.
+
+### Roles trusting a service

 ```cypher
-WHERE stmt.effect = 'Allow'
-    AND any(action IN stmt.action WHERE
-        toLower(action) = 'iam:passrole'
-        OR toLower(action) = 'iam:*'
-        OR action = '*'
-    )
+MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]-(:AWSPrincipal {arn: 'ec2.amazonaws.com'})
 ```

-### Find roles trusting a service
+### Roles a principal can assume

 ```cypher
-MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: 'ec2.amazonaws.com'})
+MATCH path_target = (aws)--(target_role:AWSRole)-[:STS_ASSUMEROLE_ALLOW]-(principal)
 ```

-### Find roles the principal can assume
+### JSON-encoded properties

-Note the arrow direction - `STS_ASSUMEROLE_ALLOW` points from the role to the principal:
+Object-typed Cartography properties (most notably `condition` on `AWSPolicyStatement` and `S3PolicyStatement`) are stored as JSON-encoded strings, e.g. `'{"StringEquals":{"aws:SourceAccount":"123456789012"}}'`. There is no JSON parser at query time, so use `CONTAINS` for substring checks:

 ```cypher
-MATCH path_target = (aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)
+WHERE stmt.condition CONTAINS '"aws:SourceAccount"'
 ```

-### Check resource scope
-
-```cypher
-WHERE any(resource IN stmt.resource WHERE
-    resource = '*'
-    OR target_role.arn CONTAINS resource
-    OR resource CONTAINS target_role.name
-)
-```
+For structured inspection, fetch the rows and parse in Python. Cypher cannot navigate JSON object keys.
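
A minimal sketch of the Python-side parsing, using only the standard library. The row value is the example string from above; `condition_values` is a hypothetical helper, and how rows are fetched from the sink is out of scope here.

```python
import json

# Hypothetical row value, shaped like the JSON-encoded `condition` example.
raw_condition = '{"StringEquals":{"aws:SourceAccount":"123456789012"}}'


def condition_values(raw: str, operator: str, key: str) -> list[str]:
    """Return the values a condition operator sets for a key, as a list."""
    if not raw:
        return []
    condition = json.loads(raw)
    value = condition.get(operator, {}).get(key)
    if value is None:
        return []
    # IAM allows either a single string or a list of strings per condition key.
    return value if isinstance(value, list) else [value]


print(condition_values(raw_condition, "StringEquals", "aws:SourceAccount"))
```

Use the `CONTAINS` check in Cypher to narrow the candidate rows first, then apply structured parsing like this to the smaller result set.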

 ### Internet node via path connectivity

-The Internet node is reached through `CAN_ACCESS` relationships to already-scoped resources. No standalone lookup needed:
-
 ```cypher
 OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)
 ```

-### Multi-label OR (match multiple resource types)
+`resource` must already be bound by the account-anchored pattern above.
+
+### Multi-label OR (multiple resource types)

 ```cypher
 MATCH path = (aws:AWSAccount {id: $provider_uid})-[r]-(x)-[q]-(y)
@@ -373,7 +342,7 @@ WHERE (x:EC2PrivateIp AND x.public_ip = $ip)

 ### Include Prowler findings

-Deduplicate nodes before the ProwlerFinding lookup to avoid redundant OPTIONAL MATCH calls on nodes that appear in multiple paths:
+Deduplicate nodes before the typed finding probe to avoid one `OPTIONAL MATCH` per path-occurrence of the same node:

 ```cypher
 WITH collect(path_principal) + collect(path_target) AS paths
@@ -382,12 +351,12 @@ UNWIND nodes(p) AS n

 WITH paths, collect(DISTINCT n) AS unique_nodes
 UNWIND unique_nodes AS n
-OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
+OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})

 RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
 ```

-For network exposure queries, aggregate the internet node and relationship alongside paths:
+For network-exposure queries, aggregate the Internet node and its edge alongside paths:

 ```cypher
 WITH collect(path) AS paths, head(collect(internet)) AS internet, collect(can_access) AS can_access
@@ -396,7 +365,7 @@ UNWIND nodes(p) AS n

 WITH paths, internet, can_access, collect(DISTINCT n) AS unique_nodes
 UNWIND unique_nodes AS n
-OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
+OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})

 RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
    internet, can_access
@@ -406,22 +375,22 @@ RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,

 ## Prowler-specific labels and relationships

-These are added by the sync task, not part of the Cartography schema. For all other node labels, properties, and relationships, **always consult the Cartography schema** (see step 2 below).
+Added by the sync task, not part of the Cartography schema. For everything else, consult the pinned Cartography schema (see "Creation steps").

-| Label/Relationship     | Description                                        |
-| ---------------------- | -------------------------------------------------- |
-| `ProwlerFinding`       | Finding node (`status`, `severity`, `check_id`)    |
-| `Internet`             | Internet sentinel node                             |
-| `CAN_ACCESS`           | Internet-to-resource exposure (relationship)       |
-| `HAS_FINDING`          | Resource-to-finding link (relationship)            |
-| `TRUSTS_AWS_PRINCIPAL` | Role trust relationship                            |
-| `STS_ASSUMEROLE_ALLOW` | Can assume role (direction: role -> principal)      |
+| Label / Relationship   | Description                                                 |
+| ---------------------- | ----------------------------------------------------------- |
+| `ProwlerFinding`       | Finding node (`status`, `severity`, `check_id`)             |
+| `Internet`             | Internet sentinel node                                      |
+| `CAN_ACCESS`           | `(Internet)-[:CAN_ACCESS]->(resource)` exposure edge        |
+| `HAS_FINDING`          | `(resource)-[:HAS_FINDING]->(:ProwlerFinding)` finding link |
+| `TRUSTS_AWS_PRINCIPAL` | Role trust relationship                                     |
+| `STS_ASSUMEROLE_ALLOW` | Can assume role                                             |

 ---

 ## Parameters

-For queries requiring user input:
+For queries that take user input:

 ```python
 parameters=[
@@ -438,50 +407,83 @@ parameters=[

 ---

-## Best practices
+## openCypher compatibility

-1. **Chain all MATCHes from the root account node**: Every `MATCH` clause must connect to the `aws` variable (or another variable already bound to the account's subgraph). An unanchored `MATCH` would return nodes from all providers.
+Queries must run on both Neo4j and Amazon Neptune. Avoid these constructs:

-   ```cypher
-   // WRONG: matches ALL AWSRoles across all providers
-   MATCH (role:AWSRole) WHERE role.name = 'admin'
+| Feature                                 | Use instead                                                                                                                                 |
+| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| APOC procedures (`apoc.*`)              | Real nodes and relationships in the graph                                                                                                   |
+| Neptune extensions                      | Standard openCypher                                                                                                                         |
+| `reduce()`                              | `UNWIND` + `collect()`                                                                                                                      |
+| `FOREACH`                               | `WITH` + `UNWIND` + `SET`                                                                                                                   |
+| Regex `=~`                              | `toLower()` + exact match, or `STARTS WITH` / `CONTAINS`                                                                                    |
+| `CALL () { UNION }`                     | Multi-label `OR` in `WHERE` (see pattern above)                                                                                             |
+| `any(x IN list ...)`                    | `size([x IN list WHERE pred]) > 0`                                                                                                          |
+| `all(x IN list ...)`                    | `size([x IN list WHERE pred]) = size(list)`                                                                                                 |
+| `none(x IN list ...)`                   | `size([x IN list WHERE pred]) = 0`                                                                                                          |
+| `EXISTS { MATCH (pattern) WHERE pred }` | Standalone `MATCH (pattern)` + `WHERE pred`; precede the downstream `collect(path...)` with `WITH DISTINCT <path-vars>` to dedupe the joins |

-   // CORRECT: scoped to the specific account's subgraph
-   MATCH (aws)--(role:AWSRole) WHERE role.name = 'admin'
-   ```
-
-   **Exception**: A second-permission MATCH like `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` is safe because `principal` is already bound to the account's subgraph by the first MATCH. It does not need to chain from `aws` again.
-
-2. **Include Prowler findings**: Always add `OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})` with `collect(DISTINCT pf)`.
-
-3. **Comment the query purpose**: Add inline comments explaining each MATCH clause.
-
-4. **Never use internal labels in queries**: `_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*` are for system isolation. They should never appear in predefined or custom query text.
-
-6. **Internet node uses path connectivity**: Reach it via `OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)` where `resource` is already scoped by the account anchor. No standalone lookup.
+For list-typed properties in the catalog (action, resource, and so on), traverse the `HAS_*` edges to the child item nodes via the multi-`MATCH` shape shown in "List-typed properties as child nodes". The parent node does not carry the list as a single field, so `split(...)` and comma-string predicates do not apply.
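
The avoid-list above can be checked mechanically before a query is registered. The following is a minimal sketch of such a check, not part of the repository: the `UNSUPPORTED_CONSTRUCTS` table and the `find_unsupported` helper are hypothetical names, and the regexes only approximate the constructs in the table (they scan raw text, so a match inside a comment or string literal is also reported).

```python
import re

# Hypothetical lint table: names and regexes are illustrative, not from the repository.
UNSUPPORTED_CONSTRUCTS = [
    ("APOC procedure", r"\bapoc\."),
    ("reduce()", r"\breduce\s*\("),
    ("FOREACH", r"\bFOREACH\b"),
    ("regex =~", r"=~"),
    ("CALL subquery", r"\bCALL\s*(\(\s*\))?\s*\{"),
    ("any()/all()/none()", r"\b(any|all|none)\s*\("),
    ("EXISTS subquery", r"\bEXISTS\s*\{"),
]


def find_unsupported(query: str) -> list[str]:
    """Return the names of avoided constructs that appear in the query text."""
    return [
        name
        for name, pattern in UNSUPPORTED_CONSTRUCTS
        if re.search(pattern, query, flags=re.IGNORECASE)
    ]
```

A non-empty result points at the table row to consult for the portable replacement; an empty result does not prove the query is portable, only that none of these patterns matched.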

 ---

-## openCypher compatibility
+## Best practices

-Queries must be written in **openCypher Version 9** for compatibility with both Neo4j and Amazon Neptune.
+1. **Chain every MATCH from the account anchor.** An unanchored `MATCH (role:AWSRole)` returns roles from every provider in the graph; `MATCH (aws)--(role:AWSRole)` is scoped. A second-permission MATCH like `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` is safe because `principal` is already bound to the account's subgraph.
+2. **Type the finding probe.** Always use `OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})`. The relationship type lets Neptune apply an inline edge filter; an untyped probe scans every incident edge of high-degree nodes.
+3. **Comment each MATCH.** One inline `// ...` line per clause explaining its role.
+4. **Never use internal labels.** `_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*` are system isolation labels and must not appear in query text (predefined or custom).
+5. **Reach the Internet node through path connectivity** via `(internet:Internet)-[:CAN_ACCESS]->(resource)`, never as a standalone match.
+6. **Preserve the `RETURN` contract.** `paths, dpf, dpfr` for the standard shape; add `internet, can_access` for network-exposure queries. The serializer and visualiser depend on these names.
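
To see how several of these practices combine, here is a rough sketch of query text embedded in a Python f-string. It is illustrative only: the value of `PROWLER_FINDING_LABEL`, the `AWSAccount` anchor bound through `$provider_uid`, and the `AWSRole` pattern are assumptions, and the canonical predefined template in this skill remains the authority.

```python
# Assumed value; the real constant lives in the Attack Paths code base.
PROWLER_FINDING_LABEL = "ProwlerFinding"

# Illustrative query text only: the account anchor and the AWSRole pattern are
# assumptions, not the canonical predefined template.
EXAMPLE_QUERY = f"""
    // Anchor: bind the account being scanned
    MATCH (aws:AWSAccount {{id: $provider_uid}})

    // Scoped: roles reachable from the account anchor
    MATCH path = (aws)--(role:AWSRole)

    // Typed finding probe on every node of the path
    UNWIND nodes(path) AS n
    OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})

    // RETURN contract expected by the serializer
    RETURN collect(DISTINCT path) AS paths,
           collect(DISTINCT pf) AS dpf,
           collect(DISTINCT pfr) AS dpfr
"""
```

Inside an f-string, `{{` and `}}` render as literal braces, which is why the Cypher property maps are doubled while `{PROWLER_FINDING_LABEL}` is interpolated.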

-### Avoid these (not in openCypher spec)
+---

-| Feature                    | Use instead                                            |
-| -------------------------- | ------------------------------------------------------ |
-| APOC procedures (`apoc.*`) | Real nodes and relationships in the graph              |
-| Neptune extensions         | Standard openCypher                                    |
-| `reduce()` function        | `UNWIND` + `collect()`                                 |
-| `FOREACH` clause           | `WITH` + `UNWIND` + `SET`                              |
-| Regex operator (`=~`)      | `toLower()` + exact match, or `CONTAINS`/`STARTS WITH`. One legacy query uses `=~` - do not add new usages |
-| `CALL () { UNION }`        | Multi-label OR in WHERE (see patterns section)         |
+## Naming conventions
+
+- **ID**: kebab-case `{provider}-{category}-{description}`, e.g. `aws-ec2-privesc-passrole-iam`.
+- **Constant**: SHOUTING_SNAKE_CASE `{PROVIDER}_{CATEGORY}_{DESCRIPTION}`, e.g. `AWS_EC2_PRIVESC_PASSROLE_IAM`.
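
The two conventions map onto each other mechanically, as the paired examples suggest. A minimal sketch of that mapping follows; `constant_name` is a hypothetical helper for illustration, not a function in the queries module.

```python
def constant_name(query_id: str) -> str:
    """Convert a kebab-case query ID into its upper-case snake-case constant name."""
    # Hypothetical helper: hyphens become underscores, then everything is upper-cased.
    return query_id.replace("-", "_").upper()
```

Keeping the ID and the constant derivable from each other makes it easy to grep from a registered ID to its definition.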
+
+---
+
+## Creation steps
+
+1. **Read the queries module first** to match the existing style:
+
+   ```text
+   api/src/backend/api/attack_paths/queries/
+   ├── __init__.py
+   ├── types.py         # dataclass definitions
+   ├── registry.py
+   └── {provider}.py
+   ```
+
+2. **Fetch the Cartography schema for the pinned version.** Do not guess labels, properties, or relationships. Read the dependency pin:
+
+   ```bash
+   grep cartography api/pyproject.toml
+   ```
+
+   Then fetch the schema for that exact tag:
+
+   ```text
+   # Git pin (prowler-cloud/cartography@<TAG>):
+   https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/<TAG>/docs/root/modules/{provider}/schema.md
+
+   # PyPI pin (cartography==<TAG>):
+   https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/<TAG>/docs/root/modules/{provider}/schema.md
+   ```
+
+3. **Build the query** using the canonical predefined template plus the appropriate sub-pattern (privilege escalation or network exposure). For list-typed properties (action, resource, and so on), traverse the exploded child nodes through typed edges such as `[:HAS_ACTION]->(:AWSPolicyStatementActionItem)` (see "List-typed properties as child nodes" and the `AWS_NORMALIZED_LISTS` catalog).
+
+4. **Register** the constant in the `{PROVIDER}_QUERIES` list at the bottom of the provider file.
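
The schema lookup in step 2 can be sketched as a small URL builder. This is a hedged illustration: `schema_url` and its `git_pin` flag are hypothetical, and the tag still has to be read from `api/pyproject.toml` by hand or by other tooling.

```python
# URL template taken from step 2; only the organisation differs between pin types.
SCHEMA_URL = (
    "https://raw.githubusercontent.com/{org}/cartography/refs/tags/{tag}"
    "/docs/root/modules/{provider}/schema.md"
)


def schema_url(tag: str, provider: str, git_pin: bool) -> str:
    """Build the schema URL for a pinned Cartography tag (hypothetical helper)."""
    # Git pins point at the prowler-cloud fork; PyPI pins point at upstream.
    org = "prowler-cloud" if git_pin else "cartography-cncf"
    return SCHEMA_URL.format(org=org, tag=tag, provider=provider)
```

For example, `schema_url("<TAG>", "aws", git_pin=True)` yields the fork URL with the placeholder tag left in place, ready to be substituted with the real pin.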

 ---

 ## Reference

- **pathfinding.cloud**: https://github.com/DataDog/pathfinding.cloud (use `curl | jq`, not WebFetch)
- **Cartography schema**: `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md`
- **Neptune openCypher compliance**: https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html
- **openCypher spec**: https://github.com/opencypher/openCypher
+- **pathfinding.cloud**: https://github.com/DataDog/pathfinding.cloud (use `curl | jq`; the aggregated `paths.json` is too large for WebFetch).
+- **Cartography schema** (per pinned tag): `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{tag}/docs/root/modules/{provider}/schema.md`.
+- **Neptune openCypher compliance**: https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html.
+- **openCypher spec**: https://github.com/opencypher/openCypher.
+- **Sync converter** (`tasks/jobs/attack_paths/sync.py`): list-typed node properties listed in `tasks/jobs/attack_paths/provider_config.py::AWS_NORMALIZED_LISTS` are materialised as child item nodes and `HAS_*` edges. Properties that are not in the catalog are serialised to a comma-delimited string and emit a one-time warning. Dict-typed properties become JSON strings. The resulting shape is the same on both sinks.