feat(skills): create new Attack Packs queries in openCypher (#9975)

2026-03-21 18:58:04 +00:00 · 2026-02-06 11:57:33 +01:00
parent 619d1ffc62
commit ecc8eaf366
5 changed files with 496 additions and 0 deletions
--- a/skills/README.md
+++ b/skills/README.md
@@ -77,6 +77,7 @@ Patterns tailored for Prowler development:
 | `prowler-provider` | Add new cloud providers |
 | `prowler-pr` | Pull request conventions |
 | `prowler-docs` | Documentation style guide |
+| `prowler-attack-paths-query` | Create Attack Paths openCypher queries |

 ### Meta Skills

--- a/skills/prowler-attack-paths-query/SKILL.md
+++ b/skills/prowler-attack-paths-query/SKILL.md
@@ -0,0 +1,479 @@
+---
+name: prowler-attack-paths-query
+description: >
+  Creates Prowler Attack Paths openCypher queries for graph analysis (compatible with Neo4j and Neptune).
+  Trigger: When creating or updating Attack Paths queries that detect privilege escalation paths,
+  network exposure, or security misconfigurations in cloud environments.
+license: Apache-2.0
+metadata:
+  author: prowler-cloud
+  version: "1.0"
+  scope: [root, api]
+  auto_invoke:
+    - "Creating Attack Paths queries"
+    - "Updating existing Attack Paths queries"
+    - "Adding privilege escalation detection queries"
+allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, Task
+---
+
+## Overview
+
+Attack Paths queries are openCypher queries that analyze cloud infrastructure graphs
+(ingested via Cartography) to detect security risks like privilege escalation paths,
+network exposure, and misconfigurations.
+
+Queries are written in **openCypher Version 9** to ensure compatibility with both Neo4j and Amazon Neptune.
+
+---
+
+## Input Sources
+
+Queries can be created from:
+
+1. **pathfinding.cloud ID** (e.g., `ECS-001`, `GLUE-001`)
+   - The JSON index contains: `id`, `name`, `description`, `services`, `permissions`, `exploitationSteps`, `prerequisites`, etc.
+   - Reference: https://github.com/DataDog/pathfinding.cloud
+
+   **Fetching a single path by ID** — The aggregated `paths.json` is too large for WebFetch
+   (content gets truncated). Use Bash with `curl` and a JSON parser instead:
+
+   Prefer `jq` (concise), fall back to `python3` (guaranteed in this Python project):
+   ```bash
+   # With jq
+   curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
+     | jq '.[] | select(.id == "ecs-002")'
+
+   # With python3 (fallback)
+   curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
+     | python3 -c "import json,sys; print(json.dumps(next((p for p in json.load(sys.stdin) if p['id']=='ecs-002'), None), indent=2))"
+   ```
+
+2. **Listing Available Attack Paths**
+   - Use Bash to list available paths from the JSON index:
+   ```bash
+   # List all path IDs and names (jq)
+   curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
+     | jq -r '.[] | "\(.id): \(.name)"'
+
+   # List all path IDs and names (python3 fallback)
+   curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
+     | python3 -c "import json,sys; [print(f\"{p['id']}: {p['name']}\") for p in json.load(sys.stdin)]"
+
+   # List paths filtered by service prefix
+   curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
+     | jq -r '.[] | select(.id | startswith("ecs")) | "\(.id): \(.name)"'
+   ```
+
+3. **Natural Language Description**
+   - User describes the Attack Paths in plain language
+   - Agent maps to appropriate openCypher patterns
+
+---
+
+## Query Structure
+
+### File Location
+
+```
+api/src/backend/api/attack_paths/queries/{provider}.py
+```
+
+Example: `api/src/backend/api/attack_paths/queries/aws.py`
+
+### Query Definition Pattern
+
+```python
+from api.attack_paths.queries.types import (
+    AttackPathsQueryDefinition,
+    AttackPathsQueryParameterDefinition,
+)
+from tasks.jobs.attack_paths.config import PROWLER_FINDING_LABEL
+
+# {REFERENCE_ID} (e.g., EC2-001, GLUE-001)
+AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
+    id="aws-{kebab-case-name}",
+    name="Privilege Escalation: {permission1} + {permission2}",
+    description="{Detailed description of the Attack Paths}.",
+    provider="aws",
+    cypher=f"""
+        // Find principals with {permission1}
+        MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
+        WHERE stmt.effect = 'Allow'
+            AND any(action IN stmt.action WHERE
+                toLower(action) = '{permission1_lowercase}'
+                OR toLower(action) = '{service}:*'
+                OR action = '*'
+            )
+
+        // Find {permission2}
+        MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)
+        WHERE stmt2.effect = 'Allow'
+            AND any(action IN stmt2.action WHERE
+                toLower(action) = '{permission2_lowercase}'
+                OR toLower(action) = '{service2}:*'
+                OR action = '*'
+            )
+
+        // Find target resources
+        MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {{arn: '{service}.amazonaws.com'}})
+        WHERE any(resource IN stmt.resource WHERE
+            resource = '*'
+            OR target_role.arn CONTAINS resource
+            OR resource CONTAINS target_role.name
+        )
+
+        UNWIND nodes(path_principal) + nodes(path_target) as n
+        OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL', provider_uid: $provider_uid}})
+
+        RETURN path_principal, path_target,
+            collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
+    """,
+    parameters=[],
+)
+```
+
+### Register in Query List
+
+Add to the `{PROVIDER}_QUERIES` list at the bottom of the file:
+
+```python
+AWS_QUERIES: list[AttackPathsQueryDefinition] = [
+    # ... existing queries ...
+    AWS_{NEW_QUERY_NAME},  # Add here
+]
+```
+
+---
+
+## Step-by-Step Creation Process
+
+### 1. Read the Queries Module
+
+**FIRST**, read all files in the queries module to understand the structure:
+
+```
+api/src/backend/api/attack_paths/queries/
+├── __init__.py      # Module exports
+├── types.py         # AttackPathsQueryDefinition, AttackPathsQueryParameterDefinition
+├── registry.py      # Query registry logic
+└── {provider}.py    # Provider-specific queries (e.g., aws.py)
+```
+
+Read these files to learn:
+
+- Type definitions and available fields
+- How queries are registered
+- Current query patterns, style, and naming conventions
+
+### 2. Determine Schema Source
+
+Check the Cartography dependency in `api/pyproject.toml`:
+
+```bash
+grep cartography api/pyproject.toml
+```
+
+Parse the dependency to determine the schema source:
+
+**If git-based dependency** (e.g., `cartography @ git+https://github.com/prowler-cloud/cartography@0.126.1`):
+
+- Extract the repository (e.g., `prowler-cloud/cartography`)
+- Extract the version/tag (e.g., `0.126.1`)
+- Fetch schema from that repository at that tag
+
+**If PyPI dependency** (e.g., `cartography = "^0.126.0"` or `cartography>=0.126.0`):
+
+- Extract the version (e.g., `0.126.0`)
+- Use the official `cartography-cncf` repository
+
+**Schema URL patterns** (ALWAYS use the specific version tag, not master/main):
+
+```
+# Official Cartography (cartography-cncf)
+https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md
+
+# Prowler fork (prowler-cloud)
+https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md
+```
+
+**Examples**:
+
+```bash
+# For prowler-cloud/cartography@0.126.1 (git), fetch AWS schema:
+https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/0.126.1/docs/root/modules/aws/schema.md
+
+# For cartography = "^0.126.0" (PyPI), fetch AWS schema:
+https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/0.126.0/docs/root/modules/aws/schema.md
+```
+
+**IMPORTANT**: Always match the schema version to the dependency version in `pyproject.toml`. Using master/main may reference node labels or properties that don't exist in the deployed version.
+
+**Additional Prowler Labels**: The Attack Paths sync task adds extra labels:
+
+- `ProwlerFinding` - Prowler finding nodes with `status`, `provider_uid` properties
+- `ProviderResource` - Generic resource marker
+- `{Provider}Resource` - Provider-specific marker (e.g., `AWSResource`)
+
+These are defined in `api/src/backend/tasks/jobs/attack_paths/config.py`.
+
+### 3. Consult the Schema for Available Data
+
+Use the Cartography schema to discover:
+
+- What node labels exist for the target resources
+- What properties are available on those nodes
+- What relationships connect the nodes
+
+This informs query design by showing what data is actually available to query.
+
+### 4. Create Query Definition
+
+Use the standard pattern (see above) with:
+
+- **id**: Auto-generated as `{provider}-{kebab-case-description}`
+- **name**: Human-readable, e.g., "Privilege Escalation: {perm1} + {perm2}"
+- **description**: Explain the attack vector and impact
+- **provider**: Provider identifier (aws, azure, gcp, kubernetes, github)
+- **cypher**: The openCypher query with proper escaping
+- **parameters**: Optional list of user-provided parameters (use `parameters=[]` if none needed)
+
+### 5. Add Query to Provider List
+
+Add the constant to the `{PROVIDER}_QUERIES` list.
+
+---
+
+## Query Naming Conventions
+
+### Query ID
+
+```
+{provider}-{category}-{description}
+```
+
+Examples:
+
+- `aws-ec2-privesc-passrole-iam`
+- `aws-iam-privesc-attach-role-policy-assume-role`
+- `aws-rds-unencrypted-storage`
+
+### Query Constant Name
+
+```
+{PROVIDER}_{CATEGORY}_{DESCRIPTION}
+```
+
+Examples:
+
+- `AWS_EC2_PRIVESC_PASSROLE_IAM`
+- `AWS_IAM_PRIVESC_ATTACH_ROLE_POLICY_ASSUME_ROLE`
+- `AWS_RDS_UNENCRYPTED_STORAGE`
+
+---
+
+## Query Categories
+
+| Category             | Description                    | Example                   |
+| -------------------- | ------------------------------ | ------------------------- |
+| Basic Resource       | List resources with properties | RDS instances, S3 buckets |
+| Network Exposure     | Internet-exposed resources     | EC2 with public IPs       |
+| Privilege Escalation | IAM privilege escalation paths | PassRole + RunInstances   |
+| Data Access          | Access to sensitive data       | EC2 with S3 access        |
+
+---
+
+## Common openCypher Patterns
+
+### Match Account and Principal
+
+```cypher
+MATCH path_principal = (aws:AWSAccount {id: $provider_uid})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
+```
+
+### Check IAM Action Permissions
+
+```cypher
+WHERE stmt.effect = 'Allow'
+    AND any(action IN stmt.action WHERE
+        toLower(action) = 'iam:passrole'
+        OR toLower(action) = 'iam:*'
+        OR action = '*'
+    )
+```
+
+### Find Roles Trusting a Service
+
+```cypher
+MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: 'ec2.amazonaws.com'})
+```
+
+### Check Resource Scope
+
+```cypher
+WHERE any(resource IN stmt.resource WHERE
+    resource = '*'
+    OR target_role.arn CONTAINS resource
+    OR resource CONTAINS target_role.name
+)
+```
+
+### Include Prowler Findings
+
+```cypher
+UNWIND nodes(path_principal) + nodes(path_target) as n
+OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {status: 'FAIL', provider_uid: $provider_uid})
+
+RETURN path_principal, path_target,
+    collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
+```
+
+---
+
+## Common Node Labels by Provider
+
+### AWS
+
+| Label                | Description                         |
+| -------------------- | ----------------------------------- |
+| `AWSAccount`         | AWS account root                    |
+| `AWSPrincipal`       | IAM principal (user, role, service) |
+| `AWSRole`            | IAM role                            |
+| `AWSUser`            | IAM user                            |
+| `AWSPolicy`          | IAM policy                          |
+| `AWSPolicyStatement` | Policy statement                    |
+| `EC2Instance`        | EC2 instance                        |
+| `EC2SecurityGroup`   | Security group                      |
+| `S3Bucket`           | S3 bucket                           |
+| `RDSInstance`        | RDS database instance               |
+| `LoadBalancer`       | Classic ELB                         |
+| `LoadBalancerV2`     | ALB/NLB                             |
+| `LaunchTemplate`     | EC2 launch template                 |
+
+### Common Relationships
+
+| Relationship           | Description             |
+| ---------------------- | ----------------------- |
+| `TRUSTS_AWS_PRINCIPAL` | Role trust relationship |
+| `STS_ASSUMEROLE_ALLOW` | Can assume role         |
+| `POLICY`               | Has policy attached     |
+| `STATEMENT`            | Policy has statement    |
+
+---
+
+## Parameters
+
+For queries requiring user input, define parameters:
+
+```python
+parameters=[
+    AttackPathsQueryParameterDefinition(
+        name="ip",
+        label="IP address",
+        description="Public IP address, e.g. 192.0.2.0.",
+        placeholder="192.0.2.0",
+    ),
+    AttackPathsQueryParameterDefinition(
+        name="tag_key",
+        label="Tag key",
+        description="Tag key to filter resources.",
+        placeholder="Environment",
+    ),
+],
+```
+
+---
+
+## Best Practices
+
+1. **Always filter by provider_uid**: Use `{id: $provider_uid}` on account nodes and `{provider_uid: $provider_uid}` on ProwlerFinding nodes
+
+2. **Use consistent naming**: Follow existing patterns in the file
+
+3. **Include Prowler findings**: Always add the OPTIONAL MATCH for ProwlerFinding nodes
+
+4. **Return distinct findings**: Use `collect(DISTINCT pf)` to avoid duplicates
+
+5. **Comment the query purpose**: Add inline comments explaining each MATCH clause
+
+6. **Validate schema first**: Ensure all node labels and properties exist in Cartography schema
+
+---
+
+## openCypher Compatibility
+
+Queries must be written in **openCypher Version 9** to ensure compatibility with both Neo4j and Amazon Neptune.
+
+> **Why Version 9?** Amazon Neptune implements openCypher Version 9. By targeting this specification, queries work on both Neo4j and Neptune without modification.
+
+### Avoid These (Not in openCypher spec)
+
+| Feature                                             | Reason                                          |
+| --------------------------------------------------- | ----------------------------------------------- |
+| APOC procedures (`apoc.*`)                          | Neo4j-specific plugin, not available in Neptune |
+| Virtual nodes (`apoc.create.vNode`)                 | APOC-specific                                   |
+| Virtual relationships (`apoc.create.vRelationship`) | APOC-specific                                   |
+| Neptune extensions                                  | Not available in Neo4j                          |
+| `reduce()` function                                 | Use `UNWIND` + aggregation instead              |
+| `FOREACH` clause                                    | Use `WITH` + `UNWIND` + `SET` instead           |
+| Regex match operator (`=~`)                         | Not supported in Neptune                        |
+
+### CALL Subqueries
+
+Supported with limitations:
+
+- Use `WITH` clause to import variables: `CALL { WITH var ... }`
+- Updates inside CALL subqueries are NOT supported
+- Emitted variables cannot overlap with variables before the CALL
+
+---
+
+## Reference
+
+### pathfinding.cloud (Attack Path Definitions)
+
+- **Repository**: https://github.com/DataDog/pathfinding.cloud
+- **All paths JSON**: `https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json`
+- Use WebFetch to query specific paths or list available services
+
+### Cartography Schema
+
+- **URL pattern**: `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md`
+- Always use the version from `api/pyproject.toml`, not master/main
+
+### openCypher Specification
+
+- **Neptune openCypher compliance** (what Neptune supports): https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html
+- **Rewriting Cypher for Neptune** (converting Neo4j-specific syntax): https://docs.aws.amazon.com/neptune/latest/userguide/migration-opencypher-rewrites.html
+- **openCypher project** (spec, grammar, TCK): https://github.com/opencypher/openCypher
+
+---
+
+## Learning from the Queries Module
+
+**IMPORTANT**: Before creating a new query, ALWAYS read the entire queries module:
+
+```
+api/src/backend/api/attack_paths/queries/
+├── __init__.py      # Module exports
+├── types.py         # Type definitions
+├── registry.py      # Registry logic
+└── {provider}.py    # Provider queries (aws.py, etc.)
+```
+
+Use the existing queries to learn:
+
+- Query structure and formatting
+- Variable naming conventions
+- How to include Prowler findings
+- Comment style
+
+> **Compatibility Warning**: Some existing queries use Neo4j-specific features
+> (e.g., `apoc.create.vNode`, `apoc.create.vRelationship`, regex `=~`) that are
+> **NOT compatible** with Amazon Neptune. Use these queries to learn general
+> patterns (structure, naming, Prowler findings integration, comment style) but
+> **DO NOT copy APOC procedures or other Neo4j-specific syntax** into new queries.
+> New queries must be pure openCypher Version 9. Refer to the
+> [openCypher Compatibility](#opencypher-compatibility) section for the full list
+> of features to avoid.
+
+**DO NOT** use generic templates. Match the exact style of existing **compatible** queries in the file.