chore(skills): improve attack-paths-query skill accuracy and schema emphasis

Josema Camacho
2026-03-19 19:28:38 +01:00
parent 776f8122ba
commit 9ffd9ffc72


@@ -1,13 +1,14 @@
---
name: prowler-attack-paths-query
description: >
Creates Prowler Attack Paths openCypher queries for graph analysis (compatible with Neo4j and Neptune).
Trigger: When creating or updating Attack Paths queries that detect privilege escalation paths,
network exposure, or security misconfigurations in cloud environments.
Creates Prowler Attack Paths openCypher queries using the Cartography schema as the source of truth
for node labels, properties, and relationships. Also covers Prowler-specific additions (Internet node,
ProwlerFinding, internal isolation labels) and $provider_uid scoping for predefined queries.
Trigger: When creating or updating Attack Paths queries.
license: Apache-2.0
metadata:
author: prowler-cloud
version: "1.1"
version: "2.0"
scope: [root, api]
auto_invoke:
- "Creating Attack Paths queries"
@@ -20,7 +21,24 @@ allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, Task
Attack Paths queries are openCypher queries that analyze cloud infrastructure graphs (ingested via Cartography) to detect security risks like privilege escalation paths, network exposure, and misconfigurations.
Queries are written in **openCypher Version 9** to ensure compatibility with both Neo4j and Amazon Neptune.
Queries are written in **openCypher Version 9** for compatibility with both Neo4j and Amazon Neptune.
---
## Two query audiences
This skill covers two types of queries with different isolation mechanisms:
| | Predefined queries | Custom queries |
|---|---|---|
| **Where they live** | `api/src/backend/api/attack_paths/queries/{provider}.py` | User/LLM-supplied via the custom query API endpoint |
| **Provider isolation** | `AWSAccount {id: $provider_uid}` anchor + path connectivity | Automatic `_Provider_{uuid}` label injection via `cypher_rewriter.py` |
| **What to write** | Chain every MATCH from the `aws` variable | Plain Cypher, no isolation boilerplate needed |
| **Internal labels** | Never use (`_ProviderResource`, `_Tenant_*`, `_Provider_*`) | Never use (injected automatically by the system) |
**For predefined queries**: every node must be reachable from the `AWSAccount` root via graph traversal. This is the isolation boundary.
**For custom queries**: write natural Cypher without isolation concerns. The query runner injects a `_Provider_{uuid}` label into every node pattern before execution, and a post-query filter catches edge cases.
---
@@ -29,67 +47,44 @@ Queries are written in **openCypher Version 9** to ensure compatibility with bot
Queries can be created from:
1. **pathfinding.cloud ID** (e.g., `ECS-001`, `GLUE-001`)
- The JSON index contains: `id`, `name`, `description`, `services`, `permissions`, `exploitationSteps`, `prerequisites`, etc.
- Reference: https://github.com/DataDog/pathfinding.cloud
**Fetching a single path by ID** - The aggregated `paths.json` is too large for WebFetch
(content gets truncated). Use Bash with `curl` and a JSON parser instead:
Prefer `jq` (concise), fall back to `python3` (guaranteed in this Python project):
- The aggregated `paths.json` is too large for WebFetch. Use Bash:
```bash
# With jq
# Fetch a single path by ID
curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
| jq '.[] | select(.id == "ecs-002")'
# With python3 (fallback)
curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
| python3 -c "import json,sys; print(json.dumps(next((p for p in json.load(sys.stdin) if p['id']=='ecs-002'), None), indent=2))"
```
2. **Listing Available Attack Paths**
- Use Bash to list available paths from the JSON index:
```bash
# List all path IDs and names (jq)
# List all path IDs and names
curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
| jq -r '.[] | "\(.id): \(.name)"'
# List all path IDs and names (python3 fallback)
curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
| python3 -c "import json,sys; [print(f\"{p['id']}: {p['name']}\") for p in json.load(sys.stdin)]"
# List paths filtered by service prefix
# Filter by service prefix
curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
| jq -r '.[] | select(.id | startswith("ecs")) | "\(.id): \(.name)"'
```
3. **Natural Language Description**
- User describes the Attack Paths in plain language
- Agent maps to appropriate openCypher patterns
If `jq` is not available, use `python3 -c "import json,sys; ..."` as a fallback.
2. **Natural language description** from the user
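For the `python3` fallback mentioned for pathfinding.cloud lookups, the following is a minimal sketch of the same selections the `jq` filters perform. It assumes `paths.json` is a JSON array of objects with `id` and `name` keys, as the `jq` filters imply. The helper names `fetch_paths`, `find_path`, and `list_paths` are illustrative and are not part of the repository.

```python
from __future__ import annotations

import json
import urllib.request

PATHS_URL = "https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json"


def fetch_paths(url: str = PATHS_URL) -> list[dict]:
    """Download and parse the aggregated index (assumed to be a JSON array)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def find_path(paths: list[dict], path_id: str) -> dict | None:
    """Return the entry whose `id` equals `path_id`, or None if absent."""
    return next((p for p in paths if p.get("id") == path_id), None)


def list_paths(paths: list[dict], prefix: str = "") -> list[str]:
    """Return "id: name" lines, optionally filtered by an ID prefix such as "ecs"."""
    return [f"{p['id']}: {p['name']}" for p in paths if p["id"].startswith(prefix)]
```

Typical use would be `find_path(fetch_paths(), "ecs-002")`. Parsing the full file in Python avoids the truncation that WebFetch runs into.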
---
## Query Structure
### File Location
### Provider scoping parameter
```
api/src/backend/api/attack_paths/queries/{provider}.py
```
One parameter is injected automatically by the query runner:
Example: `api/src/backend/api/attack_paths/queries/aws.py`
| Parameter | Property it matches | Used on | Purpose |
| --------------- | ------------------- | ------------ | -------------------------------- |
| `$provider_uid` | `id` | `AWSAccount` | Scopes to a specific AWS account |
### Query parameters for provider scoping
All other nodes are isolated by path connectivity from the `AWSAccount` anchor.
Two parameters exist. Both are injected automatically by the query runner.
### Imports
| Parameter | Property it matches | Used on | Purpose |
| --------------- | ------------------- | -------------- | ------------------------------------ |
| `$provider_uid` | `id` | `AWSAccount` | Scopes to a specific AWS account |
| `$provider_id` | `_provider_id` | Any other node | Scopes nodes to the provider context |
### Privilege Escalation Query Pattern
All query files start with these imports:
```python
from api.attack_paths.queries.types import (
@@ -97,47 +92,57 @@ from api.attack_paths.queries.types import (
AttackPathsQueryDefinition,
AttackPathsQueryParameterDefinition,
)
from tasks.jobs.attack_paths.config import PROWLER_FINDING_LABEL
```
# {REFERENCE_ID} (e.g., EC2-001, GLUE-001)
The `PROWLER_FINDING_LABEL` constant (value: `"ProwlerFinding"`) is used via f-string interpolation in all queries. Never hardcode the label string.
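Because the Cypher text lives inside a Python f-string, single braces interpolate Python values and doubled braces produce literal Cypher braces. The sketch below shows the effect. It defines a local stand-in for the imported constant so that it runs on its own; in the real query files the value comes from the import above.

```python
# Stand-in for: from tasks.jobs.attack_paths.config import PROWLER_FINDING_LABEL
PROWLER_FINDING_LABEL = "ProwlerFinding"

# {PROWLER_FINDING_LABEL} is interpolated; {{ ... }} becomes a literal { ... } map.
findings_clause = (
    f"OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})"
)

# $provider_uid is a Cypher parameter, not a Python value, so it passes through untouched.
account_anchor = f"MATCH (aws:AWSAccount {{id: $provider_uid}})"

print(findings_clause)
print(account_anchor)
```

Forgetting to double a brace fails at import time or silently interpolates the wrong value, so it is worth checking the rendered string when a query is first written.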
### Privilege escalation sub-patterns
There are four distinct privilege escalation patterns. Choose based on the attack type:
| Sub-pattern | Target | `path_target` shape | Example |
|---|---|---|---|
| Self-escalation | Principal's own policies | `(aws)--(target_policy:AWSPolicy)--(principal)` | IAM-001 |
| Lateral to user | Other IAM users | `(aws)--(target_user:AWSUser)` | IAM-002 |
| Assume-role lateral | Assumable roles | `(aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)` | IAM-014 |
| PassRole + service | Service-trusting roles | `(aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(...)` | EC2-001 |
#### Self-escalation (e.g., IAM-001)
The principal modifies resources attached to itself. `path_target` loops back to `principal`:
```python
AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
id="aws-{kebab-case-name}",
name="{Human-friendly label} ({REFERENCE_ID})",
short_description="{Brief explanation of the attack, no technical permissions.}",
short_description="{Brief explanation, no technical permissions.}",
description="{Detailed description of the attack vector and impact.}",
attribution=AttackPathsQueryAttribution(
text="pathfinding.cloud - {REFERENCE_ID} - {permission1} + {permission2}",
text="pathfinding.cloud - {REFERENCE_ID} - {permission}",
link="https://pathfinding.cloud/paths/{reference_id_lowercase}",
),
provider="aws",
cypher=f"""
// Find principals with {permission1}
// Find principals with {permission}
MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
WHERE stmt.effect = 'Allow'
AND any(action IN stmt.action WHERE
toLower(action) = '{permission1_lowercase}'
toLower(action) = '{permission_lowercase}'
OR toLower(action) = '{service}:*'
OR action = '*'
)
// Find {permission2}
MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)
WHERE stmt2.effect = 'Allow'
AND any(action IN stmt2.action WHERE
toLower(action) = '{permission2_lowercase}'
OR toLower(action) = '{service2}:*'
OR action = '*'
// Find target resources attached to the same principal
MATCH path_target = (aws)--(target_policy:AWSPolicy)--(principal)
WHERE target_policy.arn CONTAINS $provider_uid
AND any(resource IN stmt.resource WHERE
resource = '*'
OR target_policy.arn CONTAINS resource
)
// Find target resources (MUST chain from `aws` for provider isolation)
MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {{arn: '{service}.amazonaws.com'}})
WHERE any(resource IN stmt.resource WHERE
resource = '*'
OR target_role.arn CONTAINS resource
OR resource CONTAINS target_role.name
)
UNWIND nodes(path_principal) + nodes(path_target) as n
OPTIONAL MATCH (n)-[pfr]-(pf:ProwlerFinding {{status: 'FAIL', provider_uid: $provider_uid}})
OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
RETURN path_principal, path_target,
collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
@@ -146,7 +151,29 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
)
```
### Network Exposure Query Pattern
#### Other sub-pattern `path_target` shapes
The other three sub-patterns share the same `path_principal`, UNWIND, and RETURN as self-escalation. Only the `path_target` MATCH differs:
```cypher
// Lateral to user (e.g., IAM-002) - targets other IAM users
MATCH path_target = (aws)--(target_user:AWSUser)
WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_user.arn CONTAINS resource OR resource CONTAINS target_user.name)
// Assume-role lateral (e.g., IAM-014) - targets roles the principal can assume
MATCH path_target = (aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)
WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
// PassRole + service (e.g., EC2-001) - targets roles trusting a service
MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: '{service}.amazonaws.com'})
WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
```
**Multi-permission**: PassRole queries require a second permission. Add `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` with its own WHERE before `path_target`, then check BOTH `stmt.resource` AND `stmt2.resource` against the target. See IAM-015 or EC2-001 in `aws.py` for examples.
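To make the shape of the second-permission clause concrete, here is a hedged sketch that renders it as text. The helpers `action_predicate` and `second_permission_match` are hypothetical and exist only for illustration. The predefined queries in `aws.py` are written out literally, and new queries should follow that style instead of generating Cypher this way.

```python
def action_predicate(stmt_var: str, permission: str) -> str:
    """Render the effect/action check for one permission on the given statement variable."""
    service = permission.split(":", 1)[0].lower()
    return (
        f"{stmt_var}.effect = 'Allow'\n"
        f"AND any(action IN {stmt_var}.action WHERE\n"
        f"    toLower(action) = '{permission.lower()}'\n"
        f"    OR toLower(action) = '{service}:*'\n"
        f"    OR action = '*'\n"
        f")"
    )


def second_permission_match(permission: str) -> str:
    """Render the extra MATCH, anchored on the already-bound `principal` variable."""
    return (
        "MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)\n"
        f"WHERE {action_predicate('stmt2', permission)}"
    )


print(second_permission_match("ec2:RunInstances"))
```

The rendered clause reuses `principal`, so it stays inside the account's subgraph without chaining from `aws` again.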
### Network exposure pattern
The Internet node is reached via `CAN_ACCESS` through the already-scoped resource, not via a standalone lookup:
```python
AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
@@ -156,18 +183,15 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
description="{Detailed description.}",
provider="aws",
cypher=f"""
// Match the Internet sentinel node
OPTIONAL MATCH (internet:Internet {{_provider_id: $provider_id}})
// Match exposed resources (MUST chain from `aws`)
MATCH path = (aws:AWSAccount {{id: $provider_uid}})--(resource:EC2Instance)
WHERE resource.exposed_internet = true
// Link Internet to resource
OPTIONAL MATCH (internet)-[can_access:CAN_ACCESS]->(resource)
// Internet node reached via path connectivity through the resource
OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)
UNWIND nodes(path) as n
OPTIONAL MATCH (n)-[pfr]-(pf:ProwlerFinding {{status: 'FAIL', provider_uid: $provider_uid}})
OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
RETURN path, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
internet, can_access
@@ -176,7 +200,7 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
)
```
### Register in Query List
### Register in query list
Add to the `{PROVIDER}_QUERIES` list at the bottom of the file:
@@ -189,11 +213,11 @@ AWS_QUERIES: list[AttackPathsQueryDefinition] = [
---
## Step-by-Step Creation Process
## Step-by-step creation process
### 1. Read the Queries Module
### 1. Read the queries module
**FIRST**, read all files in the queries module to understand the structure:
**FIRST**, read all files in the queries module to understand the structure, type definitions, registration, and existing style:
```
api/src/backend/api/attack_paths/queries/
@@ -203,94 +227,50 @@ api/src/backend/api/attack_paths/queries/
└── {provider}.py # Provider-specific queries (e.g., aws.py)
```
Read these files to learn:
**DO NOT** use generic templates. Match the exact style of existing queries in the file.
- Type definitions and available fields
- How queries are registered
- Current query patterns, style, and naming conventions
### 2. Fetch and consult the Cartography schema
### 2. Determine Schema Source
**This is the most important step.** Every node label, property, and relationship in the query must exist in the Cartography schema for the pinned version. Do not guess or rely on memory.
Check the Cartography dependency in `api/pyproject.toml`:
Check `api/pyproject.toml` for the Cartography dependency, then fetch the schema:
```bash
grep cartography api/pyproject.toml
```
Parse the dependency to determine the schema source:
**If git-based dependency** (e.g., `cartography @ git+https://github.com/prowler-cloud/cartography@0.126.1`):
- Extract the repository (e.g., `prowler-cloud/cartography`)
- Extract the version/tag (e.g., `0.126.1`)
- Fetch schema from that repository at that tag
**If PyPI dependency** (e.g., `cartography = "^0.126.0"` or `cartography>=0.126.0`):
- Extract the version (e.g., `0.126.0`)
- Use the official `cartography-cncf` repository
**Schema URL patterns** (ALWAYS use the specific version tag, not master/main):
Build the schema URL (ALWAYS use the specific tag, not master/main):
```
# Official Cartography (cartography-cncf)
https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md
# Git dependency (prowler-cloud/cartography@0.126.1):
https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/0.126.1/docs/root/modules/{provider}/schema.md
# Prowler fork (prowler-cloud)
https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md
# PyPI dependency (cartography = "^0.126.0"):
https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/0.126.0/docs/root/modules/{provider}/schema.md
```
**Examples**:
Read the schema to discover available node labels, properties, and relationships for the target resources. Internal labels (`_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*`) exist for isolation but should never appear in queries.
```bash
# For prowler-cloud/cartography@0.126.1 (git), fetch AWS schema:
https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/0.126.1/docs/root/modules/aws/schema.md
# For cartography = "^0.126.0" (PyPI), fetch AWS schema:
https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/0.126.0/docs/root/modules/aws/schema.md
```
**IMPORTANT**: Always match the schema version to the dependency version in `pyproject.toml`. Using master/main may reference node labels or properties that don't exist in the deployed version.
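The mapping from dependency string to schema URL can be sketched as follows. This is an illustration under assumptions: the function name `schema_url` and its regular expressions are not part of the repository, and the PyPI branch simply takes the version written in the specifier, mirroring the examples in this step.

```python
import re


def schema_url(dependency: str, provider: str) -> str:
    """Build the versioned Cartography schema URL for a pyproject dependency line."""
    git = re.search(r"github\.com/([\w.-]+)/cartography@([\w.-]+)", dependency)
    if git:
        # Git dependency: use the fork's organization and the pinned tag.
        org, version = git.group(1), git.group(2)
    else:
        # PyPI dependency: use the official repository and the stated version.
        pypi = re.search(r"(\d+(?:\.\d+)+)", dependency)
        if pypi is None:
            raise ValueError(f"No version found in dependency: {dependency!r}")
        org, version = "cartography-cncf", pypi.group(1)
    return (
        f"https://raw.githubusercontent.com/{org}/cartography/"
        f"refs/tags/{version}/docs/root/modules/{provider}/schema.md"
    )


print(schema_url("cartography @ git+https://github.com/prowler-cloud/cartography@0.126.1", "aws"))
```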
**Additional Prowler Labels**: The Attack Paths sync task adds labels that queries can reference:
- `ProwlerFinding` - Prowler finding nodes with `status`, `provider_uid` properties
- `Internet` - Internet sentinel node with `_provider_id` property (used in network exposure queries)
Other internal labels (`_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*`) exist for isolation but should never be used in queries.
These are defined in `api/src/backend/tasks/jobs/attack_paths/config.py`.
### 3. Consult the Schema for Available Data
Use the Cartography schema to discover:
- What node labels exist for the target resources
- What properties are available on those nodes
- What relationships connect the nodes
This informs query design by showing what data is actually available to query.
### 4. Create Query Definition
### 4. Create query definition
Use the appropriate pattern (privilege escalation or network exposure) with:
- **id**: Auto-generated as `{provider}-{kebab-case-description}`
- **name**: Short, human-friendly label. No raw IAM permissions. For sourced queries (e.g., pathfinding.cloud), append the reference ID in parentheses: `"EC2 Instance Launch with Privileged Role (EC2-001)"`. If the name already has parentheses, prepend the ID inside them: `"ECS Service Creation with Privileged Role (ECS-003 - Existing Cluster)"`.
- **short_description**: Brief explanation of the attack, no technical permissions. E.g., "Launch EC2 instances with privileged IAM roles to gain their permissions via IMDS."
- **description**: Full technical explanation of the attack vector and impact. Plain text only, no HTML or technical permissions here.
- **id**: `{provider}-{kebab-case-description}`
- **name**: Short, human-friendly label. For sourced queries, append the reference ID: `"EC2 Instance Launch with Privileged Role (EC2-001)"`.
- **short_description**: Brief explanation, no technical permissions.
- **description**: Full technical explanation. Plain text only.
- **provider**: Provider identifier (aws, azure, gcp, kubernetes, github)
- **cypher**: The openCypher query with proper escaping
- **parameters**: Optional list of user-provided parameters (use `parameters=[]` if none needed)
- **attribution**: Optional `AttackPathsQueryAttribution(text, link)` for sourced queries. The `text` includes the source, reference ID, and technical permissions (e.g., `"pathfinding.cloud - EC2-001 - iam:PassRole + ec2:RunInstances"`). The `link` is the URL with a lowercase ID (e.g., `"https://pathfinding.cloud/paths/ec2-001"`). Omit (defaults to `None`) for non-sourced queries.
- **parameters**: Optional list of user-provided parameters (`parameters=[]` if none)
- **attribution**: Optional `AttackPathsQueryAttribution(text, link)` for sourced queries. The `text` includes source, reference ID, and permissions. The `link` uses a lowercase ID. Omit for non-sourced queries.
### 5. Add Query to Provider List
### 5. Add query to provider list
Add the constant to the `{PROVIDER}_QUERIES` list.
---
## Query Naming Conventions
## Query naming conventions
### Query ID
@@ -298,27 +278,19 @@ Add the constant to the `{PROVIDER}_QUERIES` list.
{provider}-{category}-{description}
```
Examples:
Examples: `aws-ec2-privesc-passrole-iam`, `aws-ec2-instances-internet-exposed`
- `aws-ec2-privesc-passrole-iam`
- `aws-iam-privesc-attach-role-policy-assume-role`
- `aws-ec2-instances-internet-exposed`
### Query Constant Name
### Query constant name
```
{PROVIDER}_{CATEGORY}_{DESCRIPTION}
```
Examples:
- `AWS_EC2_PRIVESC_PASSROLE_IAM`
- `AWS_IAM_PRIVESC_ATTACH_ROLE_POLICY_ASSUME_ROLE`
- `AWS_EC2_INSTANCES_INTERNET_EXPOSED`
Examples: `AWS_EC2_PRIVESC_PASSROLE_IAM`, `AWS_EC2_INSTANCES_INTERNET_EXPOSED`
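In the examples listed, the constant name is the query ID with hyphens replaced by underscores and the result upper-cased. A small sketch of that mapping, assuming it holds for every query (the helper `constant_name` is illustrative only):

```python
def constant_name(query_id: str) -> str:
    """Derive the Python constant name from a kebab-case query ID."""
    return query_id.replace("-", "_").upper()


print(constant_name("aws-ec2-privesc-passrole-iam"))
```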
---
## Query Categories
## Query categories
| Category | Description | Example |
| -------------------- | ------------------------------ | ------------------------- |
@@ -329,15 +301,15 @@ Examples:
---
## Common openCypher Patterns
## Common openCypher patterns
### Match Account and Principal
### Match account and principal
```cypher
MATCH path_principal = (aws:AWSAccount {id: $provider_uid})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
```
### Check IAM Action Permissions
### Check IAM action permissions
```cypher
WHERE stmt.effect = 'Allow'
@@ -348,13 +320,21 @@ WHERE stmt.effect = 'Allow'
)
```
### Find Roles Trusting a Service
### Find roles trusting a service
```cypher
MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: 'ec2.amazonaws.com'})
```
### Check Resource Scope
### Find roles the principal can assume
Note the arrow direction: `STS_ASSUMEROLE_ALLOW` points from the principal to the role, so the pattern below traverses it backwards starting from `target_role`:
```cypher
MATCH path_target = (aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)
```
### Check resource scope
```cypher
WHERE any(resource IN stmt.resource WHERE
@@ -364,26 +344,16 @@ WHERE any(resource IN stmt.resource WHERE
)
```
### Match Internet Sentinel Node
### Internet node via path connectivity
Used in network exposure queries. The Internet node is a real graph node, scoped by `_provider_id`:
The Internet node is reached through `CAN_ACCESS` relationships to already-scoped resources. No standalone lookup needed:
```cypher
OPTIONAL MATCH (internet:Internet {_provider_id: $provider_id})
```
### Link Internet to Exposed Resource
The `CAN_ACCESS` relationship is a real graph relationship linking the Internet node to exposed resources:
```cypher
OPTIONAL MATCH (internet)-[can_access:CAN_ACCESS]->(resource)
OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)
```
### Multi-label OR (match multiple resource types)
When a query needs to match different resource types in the same position, use label checks in WHERE:
```cypher
MATCH path = (aws:AWSAccount {id: $provider_uid})-[r]-(x)-[q]-(y)
WHERE (x:EC2PrivateIp AND x.public_ip = $ip)
@@ -392,11 +362,11 @@ WHERE (x:EC2PrivateIp AND x.public_ip = $ip)
OR (x:ElasticIPAddress AND x.public_ip = $ip)
```
### Include Prowler Findings
### Include Prowler findings
```cypher
UNWIND nodes(path_principal) + nodes(path_target) as n
OPTIONAL MATCH (n)-[pfr]-(pf:ProwlerFinding {status: 'FAIL', provider_uid: $provider_uid})
OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
RETURN path_principal, path_target,
collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
@@ -411,154 +381,84 @@ RETURN path, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
---
## Common Node Labels by Provider
## Prowler-specific labels and relationships
### AWS
These are added by the sync task, not part of the Cartography schema. For all other node labels, properties, and relationships, **always consult the Cartography schema** (see step 2 of the step-by-step creation process).
| Label | Description |
| --------------------- | --------------------------------------- |
| `AWSAccount` | AWS account root |
| `AWSPrincipal` | IAM principal (user, role, service) |
| `AWSRole` | IAM role |
| `AWSUser` | IAM user |
| `AWSPolicy` | IAM policy |
| `AWSPolicyStatement` | Policy statement |
| `AWSTag` | Resource tag (key/value) |
| `EC2Instance` | EC2 instance |
| `EC2SecurityGroup` | Security group |
| `EC2PrivateIp` | EC2 private IP (has `public_ip`) |
| `IpPermissionInbound` | Inbound security group rule |
| `IpRange` | IP range (e.g., `0.0.0.0/0`) |
| `NetworkInterface` | ENI (has `public_ip`) |
| `ElasticIPAddress` | Elastic IP (has `public_ip`) |
| `S3Bucket` | S3 bucket |
| `RDSInstance` | RDS database instance |
| `LoadBalancer` | Classic ELB |
| `LoadBalancerV2` | ALB/NLB |
| `ELBListener` | Classic ELB listener |
| `ELBV2Listener` | ALB/NLB listener |
| `LaunchTemplate` | EC2 launch template |
| `Internet` | Internet sentinel node (`_provider_id`) |
### Common Relationships
| Relationship | Description |
| ---------------------- | ---------------------------------- |
| `TRUSTS_AWS_PRINCIPAL` | Role trust relationship |
| `STS_ASSUMEROLE_ALLOW` | Can assume role |
| `CAN_ACCESS` | Internet-to-resource exposure link |
| `POLICY` | Has policy attached |
| `STATEMENT` | Policy has statement |
| Label/Relationship | Description |
| ---------------------- | -------------------------------------------------- |
| `ProwlerFinding` | Finding node (`status`, `severity`, `check_id`) |
| `Internet` | Internet sentinel node |
| `CAN_ACCESS` | Internet-to-resource exposure (relationship) |
| `HAS_FINDING` | Resource-to-finding link (relationship) |
| `TRUSTS_AWS_PRINCIPAL` | Role trust relationship |
| `STS_ASSUMEROLE_ALLOW` | Can assume role (direction: principal -> role)     |
---
## Parameters
For queries requiring user input, define parameters:
For queries requiring user input:
```python
parameters=[
AttackPathsQueryParameterDefinition(
name="ip",
label="IP address",
# data_type defaults to "string", cast defaults to str.
# For non-string params, set both: data_type="integer", cast=int
description="Public IP address, e.g. 192.0.2.0.",
placeholder="192.0.2.0",
),
AttackPathsQueryParameterDefinition(
name="tag_key",
label="Tag key",
description="Tag key to filter resources.",
placeholder="Environment",
),
],
```
---
## Best Practices
## Best practices
1. **Always scope by provider**: Use `{id: $provider_uid}` on `AWSAccount` nodes. Use `{_provider_id: $provider_id}` on any other node that needs provider scoping (e.g., `Internet`).
2. **Use consistent naming**: Follow existing patterns in the file
3. **Include Prowler findings**: Always add the OPTIONAL MATCH for ProwlerFinding nodes
4. **Return distinct findings**: Use `collect(DISTINCT pf)` to avoid duplicates
5. **Comment the query purpose**: Add inline comments explaining each MATCH clause
6. **Validate schema first**: Ensure all node labels and properties exist in Cartography schema
7. **Chain all MATCHes from the root account node**: Every `MATCH` clause must connect to the `aws` variable (or another variable already bound to the account's subgraph). The tenant database contains data from multiple providers — an unanchored `MATCH` would return nodes from all providers, breaking provider isolation.
1. **Chain all MATCHes from the root account node**: Every `MATCH` clause must connect to the `aws` variable (or another variable already bound to the account's subgraph). An unanchored `MATCH` would return nodes from all providers.
```cypher
// WRONG: matches ALL AWSRoles across all providers in the tenant DB
// WRONG: matches ALL AWSRoles across all providers
MATCH (role:AWSRole) WHERE role.name = 'admin'
// CORRECT: scoped to the specific account's subgraph
MATCH (aws)--(role:AWSRole) WHERE role.name = 'admin'
```
The `Internet` node is an exception: it uses `OPTIONAL MATCH` with `_provider_id` for scoping instead of chaining from `aws`.
**Exception**: A second-permission MATCH like `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` is safe because `principal` is already bound to the account's subgraph by the first MATCH. It does not need to chain from `aws` again.
2. **Include Prowler findings**: Always add `OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})` with `collect(DISTINCT pf)`.
3. **Comment the query purpose**: Add inline comments explaining each MATCH clause.
4. **Never use internal labels in queries**: `_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*` are for system isolation. They should never appear in predefined or custom query text.
5. **Internet node uses path connectivity**: Reach it via `OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)` where `resource` is already scoped by the account anchor. No standalone lookup.
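The rule against internal labels can be checked mechanically before a query is committed. The following is a rough sketch, not something the repository is known to ship: `find_internal_labels` and its regular expression are assumptions, and the check is purely textual, so it will also flag labels that appear inside comments or string literals.

```python
import re

# Internal isolation labels named in this skill; `_Tenant_*` and `_Provider_*` carry a suffix.
INTERNAL_LABEL_RE = re.compile(
    r":\s*(_ProviderResource|_AWSResource|_Tenant_\w+|_Provider_\w+)\b"
)


def find_internal_labels(cypher: str) -> list[str]:
    """Return the sorted, de-duplicated internal labels used as node labels in the text."""
    return sorted({match.group(1) for match in INTERNAL_LABEL_RE.finditer(cypher)})


print(find_internal_labels("MATCH (n:_ProviderResource)--(m:_Tenant_abc123) RETURN n"))
```

An empty list means no internal labels were found. Properties and parameters such as `$provider_uid` are not matched, because the pattern only looks at names that follow a colon and start with one of the internal prefixes.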
---
## openCypher Compatibility
## openCypher compatibility
Queries must be written in **openCypher Version 9** to ensure compatibility with both Neo4j and Amazon Neptune.
Queries must be written in **openCypher Version 9** for compatibility with both Neo4j and Amazon Neptune.
> **Why Version 9?** Amazon Neptune implements openCypher Version 9. By targeting this specification, queries work on both Neo4j and Neptune without modification.
### Avoid these (not in openCypher spec)
### Avoid These (Not in openCypher spec)
| Feature | Reason | Use instead |
| -------------------------- | ----------------------------------------------- | ------------------------------------------------------ |
| APOC procedures (`apoc.*`) | Neo4j-specific plugin, not available in Neptune | Real nodes and relationships in the graph |
| Neptune extensions | Not available in Neo4j | Standard openCypher |
| `reduce()` function | Not in openCypher spec | `UNWIND` + `collect()` |
| `FOREACH` clause | Not in openCypher spec | `WITH` + `UNWIND` + `SET` |
| Regex operator (`=~`) | Not supported in Neptune | `toLower()` + exact match, or `CONTAINS`/`STARTS WITH` |
| `CALL () { UNION }` | Complex, hard to maintain | Multi-label OR in WHERE (see patterns section) |
| Feature | Use instead |
| -------------------------- | ------------------------------------------------------ |
| APOC procedures (`apoc.*`) | Real nodes and relationships in the graph |
| Neptune extensions | Standard openCypher |
| `reduce()` function | `UNWIND` + `collect()` |
| `FOREACH` clause | `WITH` + `UNWIND` + `SET` |
| Regex operator (`=~`) | `toLower()` + exact match, or `CONTAINS`/`STARTS WITH`. One legacy query uses `=~` - do not add new usages |
| `CALL () { UNION }` | Multi-label OR in WHERE (see patterns section) |
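Some of the entries in the table can be caught with a simple text scan. The sketch below is a naive illustration: the name `avoided_features` and the patterns are assumptions, it covers only a subset of the table (APOC, `reduce()`, `FOREACH`, and `=~`), and it does not parse Cypher, so matches inside comments or string literals are reported too.

```python
import re

# Textual patterns for a subset of the features to avoid.
AVOIDED = {
    "APOC procedure": re.compile(r"\bapoc\.", re.IGNORECASE),
    "reduce()": re.compile(r"\breduce\s*\(", re.IGNORECASE),
    "FOREACH": re.compile(r"\bFOREACH\b", re.IGNORECASE),
    "regex operator (=~)": re.compile(r"=~"),
}


def avoided_features(cypher: str) -> list[str]:
    """Return the names of avoided features that appear in the query text."""
    return [name for name, pattern in AVOIDED.items() if pattern.search(cypher)]


print(avoided_features("MATCH (n) WHERE n.name =~ 'adm.*' RETURN n"))
```

A check like this is only a first pass. The Neptune compliance page remains the reference for what is actually supported.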
---
## Reference
### pathfinding.cloud (Attack Path Definitions)
- **Repository**: https://github.com/DataDog/pathfinding.cloud
- **All paths JSON**: `https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json`
- Always use Bash with `curl | jq` to fetch paths (WebFetch truncates the large JSON)
### Cartography Schema
- **URL pattern**: `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md`
- Always use the version from `api/pyproject.toml`, not master/main
### openCypher Specification
- **Neptune openCypher compliance** (what Neptune supports): https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html
- **openCypher project** (spec, grammar, TCK): https://github.com/opencypher/openCypher
---
## Learning from the Queries Module
**IMPORTANT**: Before creating a new query, ALWAYS read the entire queries module:
```
api/src/backend/api/attack_paths/queries/
├── __init__.py # Module exports
├── types.py # Type definitions
├── registry.py # Registry logic
└── {provider}.py # Provider queries (aws.py, etc.)
```
Use the existing queries to learn:
- Query structure and formatting
- Variable naming conventions
- How to include Prowler findings
- Comment style
**DO NOT** use generic templates. Match the exact style of existing queries in the file.
- **pathfinding.cloud**: https://github.com/DataDog/pathfinding.cloud (use `curl | jq`, not WebFetch)
- **Cartography schema**: `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md`
- **Neptune openCypher compliance**: https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html
- **openCypher spec**: https://github.com/opencypher/openCypher