chore(skills): improve attack-paths-query skill accuracy and schema emphasis

2026-03-22 03:08:23 +00:00 · 2026-03-19 19:28:38 +01:00
parent 776f8122ba
commit 9ffd9ffc72
1 changed files with 180 additions and 280 deletions
--- a/skills/prowler-attack-paths-query/SKILL.md
+++ b/skills/prowler-attack-paths-query/SKILL.md
@@ -1,13 +1,14 @@
 ---
 name: prowler-attack-paths-query
 description: >
-  Creates Prowler Attack Paths openCypher queries for graph analysis (compatible with Neo4j and Neptune).
-  Trigger: When creating or updating Attack Paths queries that detect privilege escalation paths,
-  network exposure, or security misconfigurations in cloud environments.
+  Creates Prowler Attack Paths openCypher queries using the Cartography schema as the source of truth
+  for node labels, properties, and relationships. Also covers Prowler-specific additions (Internet node,
+  ProwlerFinding, internal isolation labels) and $provider_uid scoping for predefined queries.
+  Trigger: When creating or updating Attack Paths queries.
 license: Apache-2.0
 metadata:
  author: prowler-cloud
-  version: "1.1"
+  version: "2.0"
  scope: [root, api]
  auto_invoke:
    - "Creating Attack Paths queries"
@@ -20,7 +21,24 @@ allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, Task

 Attack Paths queries are openCypher queries that analyze cloud infrastructure graphs (ingested via Cartography) to detect security risks like privilege escalation paths, network exposure, and misconfigurations.

-Queries are written in **openCypher Version 9** to ensure compatibility with both Neo4j and Amazon Neptune.
+Queries are written in **openCypher Version 9** for compatibility with both Neo4j and Amazon Neptune.
+
+---
+
+## Two query audiences
+
+This skill covers two types of queries with different isolation mechanisms:
+
+| | Predefined queries | Custom queries |
+|---|---|---|
+| **Where they live** | `api/src/backend/api/attack_paths/queries/{provider}.py` | User/LLM-supplied via the custom query API endpoint |
+| **Provider isolation** | `AWSAccount {id: $provider_uid}` anchor + path connectivity | Automatic `_Provider_{uuid}` label injection via `cypher_rewriter.py` |
+| **What to write** | Chain every MATCH from the `aws` variable | Plain Cypher, no isolation boilerplate needed |
+| **Internal labels** | Never use (`_ProviderResource`, `_Tenant_*`, `_Provider_*`) | Never use (injected automatically by the system) |
+
+**For predefined queries**: every node must be reachable from the `AWSAccount` root via graph traversal. This is the isolation boundary.
+
+**For custom queries**: write natural Cypher without isolation concerns. The query runner injects a `_Provider_{uuid}` label into every node pattern before execution, and a post-query filter catches edge cases.
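To make the custom-query isolation idea concrete, here is a toy Python sketch of label injection into a single MATCH pattern. This document does not describe how `cypher_rewriter.py` actually works, so treat this as an illustration only. The helper name `inject_provider_label` and the hex-only label suffix are assumptions. A plain regex like this also ignores real-world cases such as label predicates in WHERE clauses or parentheses inside string literals.

```python
import re

# Matches the opening of a node pattern: "(" + optional variable + optional ":Label" chain.
# Relationship patterns use "[...]" and are left untouched.
NODE_PATTERN = re.compile(r"\((\s*\w*)((?::\w+)*)")


def inject_provider_label(match_pattern: str, provider_label: str) -> str:
    """Append provider_label to every node pattern in a single MATCH pattern.

    Hypothetical toy helper: expects a bare pattern such as "(a:AWSAccount)--(b)",
    not a full query with WHERE clauses or function calls.
    """
    return NODE_PATTERN.sub(
        lambda m: f"({m.group(1)}{m.group(2)}:{provider_label}", match_pattern
    )


if __name__ == "__main__":
    # Assumed label format: "_Provider_" + UUID hex (hyphens are not valid in bare labels).
    label = "_Provider_" + "0f1e2d3c4b5a69788796a5b4c3d2e1f0"
    print(inject_provider_label("(role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)", label))
```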

 ---

@@ -29,67 +47,44 @@ Queries are written in **openCypher Version 9** to ensure compatibility with bot
 Queries can be created from:

 1. **pathfinding.cloud ID** (e.g., `ECS-001`, `GLUE-001`)
-   - The JSON index contains: `id`, `name`, `description`, `services`, `permissions`, `exploitationSteps`, `prerequisites`, etc.
   - Reference: https://github.com/DataDog/pathfinding.cloud
-
-   **Fetching a single path by ID** - The aggregated `paths.json` is too large for WebFetch
-   (content gets truncated). Use Bash with `curl` and a JSON parser instead:
-
-   Prefer `jq` (concise), fall back to `python3` (guaranteed in this Python project):
+   - The aggregated `paths.json` is too large for WebFetch (the content gets truncated). Use Bash with `curl` and `jq`:

   ```bash
-   # With jq
+   # Fetch a single path by ID
   curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
     | jq '.[] | select(.id == "ecs-002")'

-   # With python3 (fallback)
-   curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
-     | python3 -c "import json,sys; print(json.dumps(next((p for p in json.load(sys.stdin) if p['id']=='ecs-002'), None), indent=2))"
-   ```
-
-2. **Listing Available Attack Paths**
-   - Use Bash to list available paths from the JSON index:
-
-   ```bash
-   # List all path IDs and names (jq)
+   # List all path IDs and names
   curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
     | jq -r '.[] | "\(.id): \(.name)"'

-   # List all path IDs and names (python3 fallback)
-   curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
-     | python3 -c "import json,sys; [print(f\"{p['id']}: {p['name']}\") for p in json.load(sys.stdin)]"
-
-   # List paths filtered by service prefix
+   # Filter by service prefix
   curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
     | jq -r '.[] | select(.id | startswith("ecs")) | "\(.id): \(.name)"'
   ```

-3. **Natural Language Description**
-   - User describes the Attack Paths in plain language
-   - Agent maps to appropriate openCypher patterns
+   If `jq` is not available, use `python3 -c "import json,sys; ..."` as a fallback.
+
+2. **Natural language description** from the user
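If `jq` is missing, the jq filters in item 1 above can also be written as a small Python script rather than a one-liner. This sketch assumes, as those filters imply, that `paths.json` is a JSON array of objects with lowercase `id` values and a `name` field. The function names are illustrative.

```python
from __future__ import annotations

import json
import urllib.request

PATHS_URL = "https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json"


def load_paths(url: str = PATHS_URL) -> list[dict]:
    """Download and parse the aggregated paths.json (a JSON array of path objects)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_path(paths: list[dict], path_id: str) -> dict | None:
    """Python version of the jq select-by-id filter; IDs in the JSON are lowercase."""
    return next((p for p in paths if p.get("id") == path_id.lower()), None)


def list_paths(paths: list[dict], prefix: str = "") -> list[str]:
    """Python version of the jq "id: name" listing, optionally filtered by ID prefix."""
    return [f"{p['id']}: {p['name']}" for p in paths if p["id"].startswith(prefix)]
```

For example, `find_path(load_paths(), "ECS-002")` returns the `ecs-002` entry. Each call to `load_paths()` downloads the full index again, so load it once and reuse the list.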

 ---

 ## Query Structure

-### File Location
+### Provider scoping parameter

-```
-api/src/backend/api/attack_paths/queries/{provider}.py
-```
+One parameter is injected automatically by the query runner:

-Example: `api/src/backend/api/attack_paths/queries/aws.py`
+| Parameter       | Property it matches | Used on      | Purpose                          |
+| --------------- | ------------------- | ------------ | -------------------------------- |
+| `$provider_uid` | `id`                | `AWSAccount` | Scopes to a specific AWS account |

-### Query parameters for provider scoping
+All other nodes are isolated by path connectivity from the `AWSAccount` anchor.

-Two parameters exist. Both are injected automatically by the query runner.
+### Imports

-| Parameter       | Property it matches | Used on        | Purpose                              |
-| --------------- | ------------------- | -------------- | ------------------------------------ |
-| `$provider_uid` | `id`                | `AWSAccount`   | Scopes to a specific AWS account     |
-| `$provider_id`  | `_provider_id`      | Any other node | Scopes nodes to the provider context |
-
-### Privilege Escalation Query Pattern
+All query files start with these imports:

 ```python
 from api.attack_paths.queries.types import (
@@ -97,47 +92,57 @@ from api.attack_paths.queries.types import (
    AttackPathsQueryDefinition,
    AttackPathsQueryParameterDefinition,
 )
+from tasks.jobs.attack_paths.config import PROWLER_FINDING_LABEL
+```

-# {REFERENCE_ID} (e.g., EC2-001, GLUE-001)
+The `PROWLER_FINDING_LABEL` constant (value: `"ProwlerFinding"`) is used via f-string interpolation in all queries. Never hardcode the label string.
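Because every `cypher=` value is a Python f-string, two escaping rules apply. `{NAME}` interpolates a Python value, and `{{`/`}}` emit literal braces for Cypher property maps. The snippet below shows both rules. The constant is redefined locally only so the snippet runs on its own; in a query file it comes from the import above.

```python
# Local stand-in for the real constant in tasks.jobs.attack_paths.config.
PROWLER_FINDING_LABEL = "ProwlerFinding"

# {PROWLER_FINDING_LABEL} is interpolated; {{ and }} become literal braces for Cypher.
findings_clause = f"OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})"
# $provider_uid is plain text to Python; the query runner supplies it as a Cypher parameter.
anchor = f"(aws:AWSAccount {{id: $provider_uid}})"

print(findings_clause)  # OPTIONAL MATCH (n)-[pfr]-(pf:ProwlerFinding {status: 'FAIL'})
print(anchor)           # (aws:AWSAccount {id: $provider_uid})
```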
+
+### Privilege escalation sub-patterns
+
+There are four distinct privilege escalation patterns. Choose based on the attack type:
+
+| Sub-pattern | Target | `path_target` shape | Example |
+|---|---|---|---|
+| Self-escalation | Principal's own policies | `(aws)--(target_policy:AWSPolicy)--(principal)` | IAM-001 |
+| Lateral to user | Other IAM users | `(aws)--(target_user:AWSUser)` | IAM-002 |
+| Assume-role lateral | Assumable roles | `(aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)` | IAM-014 |
+| PassRole + service | Service-trusting roles | `(aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(...)` | EC2-001 |
+
+#### Self-escalation (e.g., IAM-001)
+
+The principal modifies resources attached to itself. `path_target` loops back to `principal`:
+
+```python
 AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
    id="aws-{kebab-case-name}",
    name="{Human-friendly label} ({REFERENCE_ID})",
-    short_description="{Brief explanation of the attack, no technical permissions.}",
+    short_description="{Brief explanation, no technical permissions.}",
    description="{Detailed description of the attack vector and impact.}",
    attribution=AttackPathsQueryAttribution(
-        text="pathfinding.cloud - {REFERENCE_ID} - {permission1} + {permission2}",
+        text="pathfinding.cloud - {REFERENCE_ID} - {permission}",
        link="https://pathfinding.cloud/paths/{reference_id_lowercase}",
    ),
    provider="aws",
    cypher=f"""
-        // Find principals with {permission1}
+        // Find principals with {permission}
        MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
        WHERE stmt.effect = 'Allow'
            AND any(action IN stmt.action WHERE
-                toLower(action) = '{permission1_lowercase}'
+                toLower(action) = '{permission_lowercase}'
                OR toLower(action) = '{service}:*'
                OR action = '*'
            )

-        // Find {permission2}
-        MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)
-        WHERE stmt2.effect = 'Allow'
-            AND any(action IN stmt2.action WHERE
-                toLower(action) = '{permission2_lowercase}'
-                OR toLower(action) = '{service2}:*'
-                OR action = '*'
+        // Find target resources attached to the same principal
+        MATCH path_target = (aws)--(target_policy:AWSPolicy)--(principal)
+        WHERE target_policy.arn CONTAINS $provider_uid
+            AND any(resource IN stmt.resource WHERE
+                resource = '*'
+                OR target_policy.arn CONTAINS resource
            )

-        // Find target resources (MUST chain from `aws` for provider isolation)
-        MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {{arn: '{service}.amazonaws.com'}})
-        WHERE any(resource IN stmt.resource WHERE
-            resource = '*'
-            OR target_role.arn CONTAINS resource
-            OR resource CONTAINS target_role.name
-        )
-
        UNWIND nodes(path_principal) + nodes(path_target) as n
-        OPTIONAL MATCH (n)-[pfr]-(pf:ProwlerFinding {{status: 'FAIL', provider_uid: $provider_uid}})
+        OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})

        RETURN path_principal, path_target,
            collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
@@ -146,7 +151,29 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
 )
 ```

-### Network Exposure Query Pattern
+#### Other sub-pattern `path_target` shapes
+
+The other 3 sub-patterns share the same `path_principal`, UNWIND, and RETURN as self-escalation. Only the `path_target` MATCH differs:
+
+```cypher
+// Lateral to user (e.g., IAM-002) - targets other IAM users
+MATCH path_target = (aws)--(target_user:AWSUser)
+WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_user.arn CONTAINS resource OR resource CONTAINS target_user.name)
+
+// Assume-role lateral (e.g., IAM-014) - targets roles the principal can assume
+MATCH path_target = (aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)
+WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
+
+// PassRole + service (e.g., EC2-001) - targets roles trusting a service
+MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: '{service}.amazonaws.com'})
+WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
+```
+
+**Multi-permission**: PassRole queries require a second permission. Add `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` with its own WHERE before `path_target`, then check BOTH `stmt.resource` AND `stmt2.resource` against the target. See IAM-015 or EC2-001 in `aws.py` for examples.
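The multi-permission rule above can be sketched as a PassRole + `ec2:RunInstances` Cypher body. This sketch is assembled from the patterns in this section; it is not copied from `aws.py`, so compare it with EC2-001 there before reusing it. Only the `cypher=` value is shown, and `PROWLER_FINDING_LABEL` is redefined locally so the snippet runs standalone.

```python
# Standalone copy of the constant; in aws.py, import it from tasks.jobs.attack_paths.config.
PROWLER_FINDING_LABEL = "ProwlerFinding"

cypher = f"""
    // Find principals with iam:PassRole
    MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
    WHERE stmt.effect = 'Allow'
        AND any(action IN stmt.action WHERE
            toLower(action) = 'iam:passrole'
            OR toLower(action) = 'iam:*'
            OR action = '*'
        )

    // Find ec2:RunInstances; principal is already bound to the account subgraph
    MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)
    WHERE stmt2.effect = 'Allow'
        AND any(action IN stmt2.action WHERE
            toLower(action) = 'ec2:runinstances'
            OR toLower(action) = 'ec2:*'
            OR action = '*'
        )

    // Roles trusting EC2; BOTH statements must cover the target role
    MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {{arn: 'ec2.amazonaws.com'}})
    WHERE any(resource IN stmt.resource WHERE
            resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
        AND any(resource IN stmt2.resource WHERE
            resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)

    UNWIND nodes(path_principal) + nodes(path_target) as n
    OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})

    RETURN path_principal, path_target,
        collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
"""
```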
+
+### Network exposure pattern
+
+The Internet node is reached via `CAN_ACCESS` through the already-scoped resource, not via a standalone lookup:

 ```python
 AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
@@ -156,18 +183,15 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
    description="{Detailed description.}",
    provider="aws",
    cypher=f"""
-        // Match the Internet sentinel node
-        OPTIONAL MATCH (internet:Internet {{_provider_id: $provider_id}})
-
        // Match exposed resources (MUST chain from `aws`)
        MATCH path = (aws:AWSAccount {{id: $provider_uid}})--(resource:EC2Instance)
        WHERE resource.exposed_internet = true

-        // Link Internet to resource
-        OPTIONAL MATCH (internet)-[can_access:CAN_ACCESS]->(resource)
+        // Internet node reached via path connectivity through the resource
+        OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)

        UNWIND nodes(path) as n
-        OPTIONAL MATCH (n)-[pfr]-(pf:ProwlerFinding {{status: 'FAIL', provider_uid: $provider_uid}})
+        OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})

        RETURN path, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
            internet, can_access
@@ -176,7 +200,7 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
 )
 ```

-### Register in Query List
+### Register in query list

 Add to the `{PROVIDER}_QUERIES` list at the bottom of the file:

@@ -189,11 +213,11 @@ AWS_QUERIES: list[AttackPathsQueryDefinition] = [

 ---

-## Step-by-Step Creation Process
+## Step-by-step creation process

-### 1. Read the Queries Module
+### 1. Read the queries module

-**FIRST**, read all files in the queries module to understand the structure:
+**FIRST**, read all files in the queries module to understand the structure, type definitions, registration, and existing style:

 ```
 api/src/backend/api/attack_paths/queries/
@@ -203,94 +227,50 @@ api/src/backend/api/attack_paths/queries/
 └── {provider}.py    # Provider-specific queries (e.g., aws.py)
 ```

-Read these files to learn:
+**DO NOT** use generic templates. Match the exact style of existing queries in the file.

-- Type definitions and available fields
-- How queries are registered
-- Current query patterns, style, and naming conventions
+### 2. Fetch and consult the Cartography schema

-### 2. Determine Schema Source
+**This is the most important step.** Every node label, property, and relationship in the query must exist in the Cartography schema for the pinned version. Do not guess or rely on memory.

-Check the Cartography dependency in `api/pyproject.toml`:
+Check `api/pyproject.toml` for the Cartography dependency, then fetch the schema:

 ```bash
 grep cartography api/pyproject.toml
 ```

-Parse the dependency to determine the schema source:
-
-**If git-based dependency** (e.g., `cartography @ git+https://github.com/prowler-cloud/cartography@0.126.1`):
-
-- Extract the repository (e.g., `prowler-cloud/cartography`)
-- Extract the version/tag (e.g., `0.126.1`)
-- Fetch schema from that repository at that tag
-
-**If PyPI dependency** (e.g., `cartography = "^0.126.0"` or `cartography>=0.126.0`):
-
-- Extract the version (e.g., `0.126.0`)
-- Use the official `cartography-cncf` repository
-
-**Schema URL patterns** (ALWAYS use the specific version tag, not master/main):
+Build the schema URL (ALWAYS use the specific tag, not master/main):

 ```
-# Official Cartography (cartography-cncf)
-https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md
+# Git dependency (prowler-cloud/cartography@0.126.1):
+https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/0.126.1/docs/root/modules/{provider}/schema.md

-# Prowler fork (prowler-cloud)
-https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md
+# PyPI dependency (cartography = "^0.126.0"):
+https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/0.126.0/docs/root/modules/{provider}/schema.md
 ```
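The two URL shapes above can be derived mechanically from the dependency line that `grep` prints. The `schema_url` helper below is hypothetical; it sketches that mapping. Git dependencies keep their own org, repo, and tag. Any other spec is treated as a PyPI range against `cartography-cncf`, and the sketch takes the first version number found, which is the lower bound for `^` or `>=` specs. That matches the PyPI example above, but the installed release may be newer, so check a lock file if the project has one.

```python
import re

# e.g. cartography @ git+https://github.com/prowler-cloud/cartography@0.126.1
GIT_DEP = re.compile(r"git\+https://github\.com/([\w.-]+)/([\w.-]+?)(?:\.git)?@([\w.-]+)")
# e.g. the 0.126.0 in cartography = "^0.126.0" or cartography>=0.126.0
VERSION = re.compile(r"\d+(?:\.\d+)+")


def schema_url(dependency: str, provider: str) -> str:
    """Build the Cartography schema.md URL for a dependency line from api/pyproject.toml."""
    git = GIT_DEP.search(dependency)
    if git:
        org, repo, tag = git.groups()
    else:
        version = VERSION.search(dependency)
        if version is None:
            raise ValueError(f"no version found in {dependency!r}")
        # PyPI releases: official repository, version taken from the spec.
        org, repo, tag = "cartography-cncf", "cartography", version.group(0)
    return (
        f"https://raw.githubusercontent.com/{org}/{repo}/refs/tags/{tag}"
        f"/docs/root/modules/{provider}/schema.md"
    )
```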

-**Examples**:
+Read the schema to discover available node labels, properties, and relationships for the target resources. Internal labels (`_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*`) exist for isolation but should never appear in queries.

-```bash
-# For prowler-cloud/cartography@0.126.1 (git), fetch AWS schema:
-https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/0.126.1/docs/root/modules/aws/schema.md
-
-# For cartography = "^0.126.0" (PyPI), fetch AWS schema:
-https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/0.126.0/docs/root/modules/aws/schema.md
-```
-
-**IMPORTANT**: Always match the schema version to the dependency version in `pyproject.toml`. Using master/main may reference node labels or properties that don't exist in the deployed version.
-
-**Additional Prowler Labels**: The Attack Paths sync task adds labels that queries can reference:
-
-- `ProwlerFinding` - Prowler finding nodes with `status`, `provider_uid` properties
-- `Internet` - Internet sentinel node with `_provider_id` property (used in network exposure queries)
-
-Other internal labels (`_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*`) exist for isolation but should never be used in queries.
-
-These are defined in `api/src/backend/tasks/jobs/attack_paths/config.py`.
-
-### 3. Consult the Schema for Available Data
-
-Use the Cartography schema to discover:
-
-- What node labels exist for the target resources
-- What properties are available on those nodes
-- What relationships connect the nodes
-
-This informs query design by showing what data is actually available to query.
-
-### 4. Create Query Definition
+### 3. Create query definition

 Use the appropriate pattern (privilege escalation or network exposure) with:

-- **id**: Auto-generated as `{provider}-{kebab-case-description}`
-- **name**: Short, human-friendly label. No raw IAM permissions. For sourced queries (e.g., pathfinding.cloud), append the reference ID in parentheses: `"EC2 Instance Launch with Privileged Role (EC2-001)"`. If the name already has parentheses, prepend the ID inside them: `"ECS Service Creation with Privileged Role (ECS-003 - Existing Cluster)"`.
-- **short_description**: Brief explanation of the attack, no technical permissions. E.g., "Launch EC2 instances with privileged IAM roles to gain their permissions via IMDS."
-- **description**: Full technical explanation of the attack vector and impact. Plain text only, no HTML or technical permissions here.
+- **id**: `{provider}-{kebab-case-description}`
+- **name**: Short, human-friendly label. For sourced queries, append the reference ID: `"EC2 Instance Launch with Privileged Role (EC2-001)"`.
+- **short_description**: Brief explanation, no technical permissions.
+- **description**: Full technical explanation. Plain text only.
 - **provider**: Provider identifier (aws, azure, gcp, kubernetes, github)
 - **cypher**: The openCypher query with proper escaping
-- **parameters**: Optional list of user-provided parameters (use `parameters=[]` if none needed)
-- **attribution**: Optional `AttackPathsQueryAttribution(text, link)` for sourced queries. The `text` includes the source, reference ID, and technical permissions (e.g., `"pathfinding.cloud - EC2-001 - iam:PassRole + ec2:RunInstances"`). The `link` is the URL with a lowercase ID (e.g., `"https://pathfinding.cloud/paths/ec2-001"`). Omit (defaults to `None`) for non-sourced queries.
+- **parameters**: Optional list of user-provided parameters (`parameters=[]` if none)
+- **attribution**: Optional `AttackPathsQueryAttribution(text, link)` for sourced queries. The `text` includes source, reference ID, and permissions. The `link` uses a lowercase ID. Omit for non-sourced queries.
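For sourced queries, the `name`, attribution `text`, and `link` follow fixed string formats. The hypothetical helper below sketches them. It follows the name template in this section and the EC2-001 example, where multiple permissions are joined with ` + `. It is not part of `types.py`.

```python
def sourced_query_fields(reference_id: str, label: str, permissions: list[str]) -> dict[str, str]:
    """Derive name and attribution strings for a pathfinding.cloud-sourced query (hypothetical helper)."""
    return {
        # Human-friendly label with the reference ID appended in parentheses.
        "name": f"{label} ({reference_id})",
        # Source, reference ID, then the technical permissions.
        "attribution_text": f"pathfinding.cloud - {reference_id} - {' + '.join(permissions)}",
        # The link uses the lowercase ID.
        "attribution_link": f"https://pathfinding.cloud/paths/{reference_id.lower()}",
    }
```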

-### 5. Add Query to Provider List
+### 4. Add query to provider list

 Add the constant to the `{PROVIDER}_QUERIES` list.

 ---

-## Query Naming Conventions
+## Query naming conventions

 ### Query ID

@@ -298,27 +278,19 @@ Add the constant to the `{PROVIDER}_QUERIES` list.
 {provider}-{category}-{description}
 ```

-Examples:
+Examples: `aws-ec2-privesc-passrole-iam`, `aws-ec2-instances-internet-exposed`

-- `aws-ec2-privesc-passrole-iam`
-- `aws-iam-privesc-attach-role-policy-assume-role`
-- `aws-ec2-instances-internet-exposed`
-
-### Query Constant Name
+### Query constant name

 ```
 {PROVIDER}_{CATEGORY}_{DESCRIPTION}
 ```

-Examples:
-
-- `AWS_EC2_PRIVESC_PASSROLE_IAM`
-- `AWS_IAM_PRIVESC_ATTACH_ROLE_POLICY_ASSUME_ROLE`
-- `AWS_EC2_INSTANCES_INTERNET_EXPOSED`
+Examples: `AWS_EC2_PRIVESC_PASSROLE_IAM`, `AWS_EC2_INSTANCES_INTERNET_EXPOSED`
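In the examples above, the constant name is the query ID with hyphens replaced by underscores and then upper-cased. The hypothetical helper below captures that mapping and rejects IDs that are not lowercase kebab-case. If an existing constant in `aws.py` deviates from this, match the file.

```python
import re

# Lowercase kebab-case with at least two segments, e.g. "aws-ec2-instances-internet-exposed".
QUERY_ID = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)+")


def constant_name(query_id: str) -> str:
    """Map a query ID like 'aws-ec2-privesc-passrole-iam' to 'AWS_EC2_PRIVESC_PASSROLE_IAM'."""
    if not QUERY_ID.fullmatch(query_id):
        raise ValueError(f"not a lowercase kebab-case query ID: {query_id!r}")
    return query_id.replace("-", "_").upper()
```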

 ---

-## Query Categories
+## Query categories

 | Category             | Description                    | Example                   |
 | -------------------- | ------------------------------ | ------------------------- |
@@ -329,15 +301,15 @@ Examples:

 ---

-## Common openCypher Patterns
+## Common openCypher patterns

-### Match Account and Principal
+### Match account and principal

 ```cypher
 MATCH path_principal = (aws:AWSAccount {id: $provider_uid})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
 ```

-### Check IAM Action Permissions
+### Check IAM action permissions

 ```cypher
 WHERE stmt.effect = 'Allow'
@@ -348,13 +320,21 @@ WHERE stmt.effect = 'Allow'
    )
 ```

-### Find Roles Trusting a Service
+### Find roles trusting a service

 ```cypher
 MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: 'ec2.amazonaws.com'})
 ```

-### Check Resource Scope
+### Find roles the principal can assume
+
+Note the arrow direction: `STS_ASSUMEROLE_ALLOW` points from the principal to the role it can assume, so with `target_role` written first the arrow reads right to left:
+
+```cypher
+MATCH path_target = (aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)
+```
+
+### Check resource scope

 ```cypher
 WHERE any(resource IN stmt.resource WHERE
@@ -364,26 +344,16 @@ WHERE any(resource IN stmt.resource WHERE
 )
 ```
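The resource-scope check compares strings and does not evaluate IAM wildcards. To show what that implies, here is a Python mirror of the three-way condition used in the `path_target` examples: `resource = '*'`, `target.arn CONTAINS resource`, and `resource CONTAINS target.name`. `resource_in_scope` is a hypothetical helper for reasoning about the Cypher; the query runner does not use it.

```python
def resource_in_scope(resources: list[str], target_arn: str, target_name: str) -> bool:
    """Python mirror of the Cypher resource-scope check (plain substring tests, no wildcard expansion)."""
    return any(
        resource == "*"              # resource = '*'
        or resource in target_arn    # target.arn CONTAINS resource
        or target_name in resource   # resource CONTAINS target.name
        for resource in resources
    )
```

Two consequences follow from the string semantics. A partial wildcard such as `role/Adm*` does not match `role/Admin`. Conversely, a statement scoped to `role/AdminReadOnly` does match a target named `Admin`, because the name is a substring.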

-### Match Internet Sentinel Node
+### Internet node via path connectivity

-Used in network exposure queries. The Internet node is a real graph node, scoped by `_provider_id`:
+The Internet node is reached through `CAN_ACCESS` relationships to already-scoped resources. No standalone lookup needed:

 ```cypher
-OPTIONAL MATCH (internet:Internet {_provider_id: $provider_id})
-```
-
-### Link Internet to Exposed Resource
-
-The `CAN_ACCESS` relationship is a real graph relationship linking the Internet node to exposed resources:
-
-```cypher
-OPTIONAL MATCH (internet)-[can_access:CAN_ACCESS]->(resource)
+OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)
 ```

 ### Multi-label OR (match multiple resource types)

-When a query needs to match different resource types in the same position, use label checks in WHERE:
-
 ```cypher
 MATCH path = (aws:AWSAccount {id: $provider_uid})-[r]-(x)-[q]-(y)
 WHERE (x:EC2PrivateIp AND x.public_ip = $ip)
@@ -392,11 +362,11 @@ WHERE (x:EC2PrivateIp AND x.public_ip = $ip)
   OR (x:ElasticIPAddress AND x.public_ip = $ip)
 ```

-### Include Prowler Findings
+### Include Prowler findings

 ```cypher
 UNWIND nodes(path_principal) + nodes(path_target) as n
-OPTIONAL MATCH (n)-[pfr]-(pf:ProwlerFinding {status: 'FAIL', provider_uid: $provider_uid})
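+// Python f-string form: {PROWLER_FINDING_LABEL} is interpolated and {{ }} render as literal braces
+OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})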

 RETURN path_principal, path_target,
    collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
@@ -411,154 +381,84 @@ RETURN path, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,

 ---

-## Common Node Labels by Provider
+## Prowler-specific labels and relationships

-### AWS
+These are added by the sync task, not part of the Cartography schema. For all other node labels, properties, and relationships, **always consult the Cartography schema** (see step 2 above).

-| Label                 | Description                             |
-| --------------------- | --------------------------------------- |
-| `AWSAccount`          | AWS account root                        |
-| `AWSPrincipal`        | IAM principal (user, role, service)     |
-| `AWSRole`             | IAM role                                |
-| `AWSUser`             | IAM user                                |
-| `AWSPolicy`           | IAM policy                              |
-| `AWSPolicyStatement`  | Policy statement                        |
-| `AWSTag`              | Resource tag (key/value)                |
-| `EC2Instance`         | EC2 instance                            |
-| `EC2SecurityGroup`    | Security group                          |
-| `EC2PrivateIp`        | EC2 private IP (has `public_ip`)        |
-| `IpPermissionInbound` | Inbound security group rule             |
-| `IpRange`             | IP range (e.g., `0.0.0.0/0`)            |
-| `NetworkInterface`    | ENI (has `public_ip`)                   |
-| `ElasticIPAddress`    | Elastic IP (has `public_ip`)            |
-| `S3Bucket`            | S3 bucket                               |
-| `RDSInstance`         | RDS database instance                   |
-| `LoadBalancer`        | Classic ELB                             |
-| `LoadBalancerV2`      | ALB/NLB                                 |
-| `ELBListener`         | Classic ELB listener                    |
-| `ELBV2Listener`       | ALB/NLB listener                        |
-| `LaunchTemplate`      | EC2 launch template                     |
-| `Internet`            | Internet sentinel node (`_provider_id`) |
-
-### Common Relationships
-
-| Relationship           | Description                        |
-| ---------------------- | ---------------------------------- |
-| `TRUSTS_AWS_PRINCIPAL` | Role trust relationship            |
-| `STS_ASSUMEROLE_ALLOW` | Can assume role                    |
-| `CAN_ACCESS`           | Internet-to-resource exposure link |
-| `POLICY`               | Has policy attached                |
-| `STATEMENT`            | Policy has statement               |
+| Label/Relationship     | Description                                        |
+| ---------------------- | -------------------------------------------------- |
+| `ProwlerFinding`       | Finding node (`status`, `severity`, `check_id`)    |
+| `Internet`             | Internet sentinel node                             |
+| `CAN_ACCESS`           | Internet-to-resource exposure (relationship)       |
+| `HAS_FINDING`          | Resource-to-finding link (relationship)            |
+| `TRUSTS_AWS_PRINCIPAL` | Role trust relationship                            |
+| `STS_ASSUMEROLE_ALLOW` | Can assume role (direction: principal -> role)      |

 ---

 ## Parameters

-For queries requiring user input, define parameters:
+For queries requiring user input:

 ```python
 parameters=[
    AttackPathsQueryParameterDefinition(
        name="ip",
        label="IP address",
+        # data_type defaults to "string", cast defaults to str.
+        # For non-string params, set both: data_type="integer", cast=int
        description="Public IP address, e.g. 192.0.2.0.",
        placeholder="192.0.2.0",
    ),
-    AttackPathsQueryParameterDefinition(
-        name="tag_key",
-        label="Tag key",
-        description="Tag key to filter resources.",
-        placeholder="Environment",
-    ),
 ],
 ```
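The comment about `data_type` and `cast` suggests that user input arrives as strings and is cast before being bound as a Cypher parameter. The sketch below illustrates that flow with a hypothetical stand-in class `ParamDef` and helper `bind_parameters`. The real `AttackPathsQueryParameterDefinition` and the query runner may work differently.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ParamDef:
    """Hypothetical stand-in for AttackPathsQueryParameterDefinition (only the fields used here)."""

    name: str
    label: str
    data_type: str = "string"
    cast: Callable[[str], Any] = str


def bind_parameters(definitions: list[ParamDef], raw: dict[str, str]) -> dict[str, Any]:
    """Cast raw user input (always strings) into the values passed as Cypher parameters."""
    missing = [d.name for d in definitions if d.name not in raw]
    if missing:
        raise KeyError(f"missing parameters: {', '.join(missing)}")
    return {d.name: d.cast(raw[d.name]) for d in definitions}
```

This is why both fields matter for non-string parameters. With only `data_type="integer"` set and `cast` left as `str`, the bound value would stay `"443"`. That string would not equal an integer property `443` in Cypher.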

 ---

-## Best Practices
+## Best practices

-1. **Always scope by provider**: Use `{id: $provider_uid}` on `AWSAccount` nodes. Use `{_provider_id: $provider_id}` on any other node that needs provider scoping (e.g., `Internet`).
-
-2. **Use consistent naming**: Follow existing patterns in the file
-
-3. **Include Prowler findings**: Always add the OPTIONAL MATCH for ProwlerFinding nodes
-
-4. **Return distinct findings**: Use `collect(DISTINCT pf)` to avoid duplicates
-
-5. **Comment the query purpose**: Add inline comments explaining each MATCH clause
-
-6. **Validate schema first**: Ensure all node labels and properties exist in Cartography schema
-
-7. **Chain all MATCHes from the root account node**: Every `MATCH` clause must connect to the `aws` variable (or another variable already bound to the account's subgraph). The tenant database contains data from multiple providers — an unanchored `MATCH` would return nodes from all providers, breaking provider isolation.
+1. **Chain all MATCHes from the root account node**: Every `MATCH` clause must connect to the `aws` variable (or another variable already bound to the account's subgraph). An unanchored `MATCH` would return nodes from all providers.

   ```cypher
-   // WRONG: matches ALL AWSRoles across all providers in the tenant DB
+   // WRONG: matches ALL AWSRoles across all providers
   MATCH (role:AWSRole) WHERE role.name = 'admin'

   // CORRECT: scoped to the specific account's subgraph
   MATCH (aws)--(role:AWSRole) WHERE role.name = 'admin'
   ```

-   The `Internet` node is an exception: it uses `OPTIONAL MATCH` with `_provider_id` for scoping instead of chaining from `aws`.
+   **Exception**: A second-permission MATCH like `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` is safe because `principal` is already bound to the account's subgraph by the first MATCH. It does not need to chain from `aws` again.
+
+2. **Include Prowler findings**: Always add `OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})` with `collect(DISTINCT pf)`.
+
+3. **Comment the query purpose**: Add inline comments explaining each MATCH clause.
+
+4. **Never use internal labels in queries**: `_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*` are for system isolation. They should never appear in predefined or custom query text.
+
+5. **Internet node uses path connectivity**: Reach it via `OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)` where `resource` is already scoped by the account anchor. No standalone lookup.

 ---

-## openCypher Compatibility
+## openCypher compatibility

-Queries must be written in **openCypher Version 9** to ensure compatibility with both Neo4j and Amazon Neptune.
+Queries must be written in **openCypher Version 9** for compatibility with both Neo4j and Amazon Neptune.

-> **Why Version 9?** Amazon Neptune implements openCypher Version 9. By targeting this specification, queries work on both Neo4j and Neptune without modification.
+### Avoid these (not in openCypher spec)

-### Avoid These (Not in openCypher spec)
-
-| Feature                    | Reason                                          | Use instead                                            |
-| -------------------------- | ----------------------------------------------- | ------------------------------------------------------ |
-| APOC procedures (`apoc.*`) | Neo4j-specific plugin, not available in Neptune | Real nodes and relationships in the graph              |
-| Neptune extensions         | Not available in Neo4j                          | Standard openCypher                                    |
-| `reduce()` function        | Not in openCypher spec                          | `UNWIND` + `collect()`                                 |
-| `FOREACH` clause           | Not in openCypher spec                          | `WITH` + `UNWIND` + `SET`                              |
-| Regex operator (`=~`)      | Not supported in Neptune                        | `toLower()` + exact match, or `CONTAINS`/`STARTS WITH` |
-| `CALL () { UNION }`        | Complex, hard to maintain                       | Multi-label OR in WHERE (see patterns section)         |
+| Feature                    | Use instead                                            |
+| -------------------------- | ------------------------------------------------------ |
+| APOC procedures (`apoc.*`) | Real nodes and relationships in the graph              |
+| Neptune extensions         | Standard openCypher                                    |
+| `reduce()` function        | `UNWIND` + `collect()`                                 |
+| `FOREACH` clause           | `WITH` + `UNWIND` + `SET`                              |
+| Regex operator (`=~`)      | `toLower()` + exact match, or `CONTAINS`/`STARTS WITH`. One legacy query uses `=~` - do not add new usages |
+| `CALL () { UNION }`        | Multi-label OR in WHERE (see patterns section)         |
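Several of the best practices and the table above can be checked mechanically before review. The hypothetical `lint_query` below runs heuristic checks on a rendered query string, which is what a `cypher=f"""..."""` value holds once Python evaluates it. It assumes one MATCH pattern per line, as in the examples in this skill. It flags internal labels, features from the table above, a first MATCH without the `AWSAccount {id: $provider_uid}` anchor, and later MATCHes that share no variable with earlier ones. It is a rough aid and does not replace checking against the Cartography schema.

```python
import re

# Labels reserved for the isolation machinery (best practice 4).
INTERNAL_LABELS = ("_ProviderResource", "_AWSResource", "_Tenant_", "_Provider_")
# Substrings for features the compatibility table says to avoid.
UNSUPPORTED = {"apoc.": "APOC procedure", "=~": "regex operator", "reduce(": "reduce()", "FOREACH": "FOREACH clause"}
ANCHOR = re.compile(r"\(\w+:AWSAccount\s*\{id:\s*\$provider_uid\}\)")
NODE_VAR = re.compile(r"\((\w+)")
UNWIND_ALIAS = re.compile(r"\bUNWIND\b.*\bas\s+(\w+)", re.IGNORECASE)
MATCH_LINE = re.compile(r"^\s*(?:OPTIONAL\s+)?MATCH\s+(.*)$")


def lint_query(cypher: str) -> list[str]:
    """Heuristic checks on a rendered predefined query (one MATCH pattern per line)."""
    problems = [f"internal label {label}" for label in INTERNAL_LABELS if label in cypher]
    problems += [f"unsupported: {what}" for token, what in UNSUPPORTED.items() if token in cypher]

    bound: set[str] = set()  # variables already tied to the account subgraph
    for line in cypher.splitlines():
        unwind = UNWIND_ALIAS.search(line)
        if unwind:
            bound.add(unwind.group(1))
            continue
        match = MATCH_LINE.match(line)
        if not match:
            continue
        pattern = match.group(1)
        names = set(NODE_VAR.findall(pattern))
        if not bound and not ANCHOR.search(pattern):
            problems.append("first MATCH is not anchored on AWSAccount {id: $provider_uid}")
        elif bound and not names & bound:
            problems.append(f"unanchored MATCH: {pattern.strip()}")
        bound |= names
    return problems
```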

 ---

 ## Reference

-### pathfinding.cloud (Attack Path Definitions)
-
-- **Repository**: https://github.com/DataDog/pathfinding.cloud
-- **All paths JSON**: `https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json`
-- Always use Bash with `curl | jq` to fetch paths (WebFetch truncates the large JSON)
-
-### Cartography Schema
-
-- **URL pattern**: `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md`
-- Always use the version from `api/pyproject.toml`, not master/main
-
-### openCypher Specification
-
-- **Neptune openCypher compliance** (what Neptune supports): https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html
-- **openCypher project** (spec, grammar, TCK): https://github.com/opencypher/openCypher
-
----
-
-## Learning from the Queries Module
-
-**IMPORTANT**: Before creating a new query, ALWAYS read the entire queries module:
-
-```
-api/src/backend/api/attack_paths/queries/
-├── __init__.py      # Module exports
-├── types.py         # Type definitions
-├── registry.py      # Registry logic
-└── {provider}.py    # Provider queries (aws.py, etc.)
-```
-
-Use the existing queries to learn:
-
-- Query structure and formatting
-- Variable naming conventions
-- How to include Prowler findings
-- Comment style
-
-**DO NOT** use generic templates. Match the exact style of existing queries in the file.
+- **pathfinding.cloud**: https://github.com/DataDog/pathfinding.cloud (use `curl | jq`, not WebFetch)
+- **Cartography schema**: `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md`
+- **Neptune openCypher compliance**: https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html
+- **openCypher spec**: https://github.com/opencypher/openCypher