feat(skills): create new Attack Packs queries in openCypher (#9975)

This commit is contained in:
Josema Camacho
2026-02-06 11:57:33 +01:00
committed by GitHub
parent 619d1ffc62
commit ecc8eaf366
5 changed files with 496 additions and 0 deletions

View File

@@ -77,6 +77,7 @@ Patterns tailored for Prowler development:
| `prowler-provider` | Add new cloud providers |
| `prowler-pr` | Pull request conventions |
| `prowler-docs` | Documentation style guide |
| `prowler-attack-paths-query` | Create Attack Paths openCypher queries |
### Meta Skills

View File

@@ -0,0 +1,479 @@
---
name: prowler-attack-paths-query
description: >
Creates Prowler Attack Paths openCypher queries for graph analysis (compatible with Neo4j and Neptune).
Trigger: When creating or updating Attack Paths queries that detect privilege escalation paths,
network exposure, or security misconfigurations in cloud environments.
license: Apache-2.0
metadata:
author: prowler-cloud
version: "1.0"
scope: [root, api]
auto_invoke:
- "Creating Attack Paths queries"
- "Updating existing Attack Paths queries"
- "Adding privilege escalation detection queries"
allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, Task
---
## Overview
Attack Paths queries are openCypher queries that analyze cloud infrastructure graphs
(ingested via Cartography) to detect security risks like privilege escalation paths,
network exposure, and misconfigurations.
Queries are written in **openCypher Version 9** to ensure compatibility with both Neo4j and Amazon Neptune.
---
## Input Sources
Queries can be created from:
1. **pathfinding.cloud ID** (e.g., `ECS-001`, `GLUE-001`)
- The JSON index contains: `id`, `name`, `description`, `services`, `permissions`, `exploitationSteps`, `prerequisites`, etc.
- Reference: https://github.com/DataDog/pathfinding.cloud
**Fetching a single path by ID** — The aggregated `paths.json` is too large for WebFetch
(content gets truncated). Use Bash with `curl` and a JSON parser instead:
Prefer `jq` (concise), fall back to `python3` (guaranteed in this Python project):
```bash
# With jq
curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
| jq '.[] | select(.id == "ecs-002")'
# With python3 (fallback)
curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
| python3 -c "import json,sys; print(json.dumps(next((p for p in json.load(sys.stdin) if p['id']=='ecs-002'), None), indent=2))"
```
2. **Listing Available Attack Paths**
- Use Bash to list available paths from the JSON index:
```bash
# List all path IDs and names (jq)
curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
| jq -r '.[] | "\(.id): \(.name)"'
# List all path IDs and names (python3 fallback)
curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
| python3 -c "import json,sys; [print(f\"{p['id']}: {p['name']}\") for p in json.load(sys.stdin)]"
# List paths filtered by service prefix
curl -s https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json \
| jq -r '.[] | select(.id | startswith("ecs")) | "\(.id): \(.name)"'
```
3. **Natural Language Description**
- User describes the Attack Paths in plain language
- Agent maps to appropriate openCypher patterns
---
## Query Structure
### File Location
```
api/src/backend/api/attack_paths/queries/{provider}.py
```
Example: `api/src/backend/api/attack_paths/queries/aws.py`
### Query Definition Pattern
```python
from api.attack_paths.queries.types import (
AttackPathsQueryDefinition,
AttackPathsQueryParameterDefinition,
)
from tasks.jobs.attack_paths.config import PROWLER_FINDING_LABEL
# {REFERENCE_ID} (e.g., EC2-001, GLUE-001)
AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
id="aws-{kebab-case-name}",
name="Privilege Escalation: {permission1} + {permission2}",
description="{Detailed description of the Attack Paths}.",
provider="aws",
cypher=f"""
// Find principals with {permission1}
MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
WHERE stmt.effect = 'Allow'
AND any(action IN stmt.action WHERE
toLower(action) = '{permission1_lowercase}'
OR toLower(action) = '{service}:*'
OR action = '*'
)
// Find {permission2}
MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)
WHERE stmt2.effect = 'Allow'
AND any(action IN stmt2.action WHERE
toLower(action) = '{permission2_lowercase}'
OR toLower(action) = '{service2}:*'
OR action = '*'
)
// Find target resources
MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {{arn: '{service}.amazonaws.com'}})
WHERE any(resource IN stmt.resource WHERE
resource = '*'
OR target_role.arn CONTAINS resource
OR resource CONTAINS target_role.name
)
UNWIND nodes(path_principal) + nodes(path_target) as n
OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL', provider_uid: $provider_uid}})
RETURN path_principal, path_target,
collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
""",
parameters=[],
)
```
### Register in Query List
Add to the `{PROVIDER}_QUERIES` list at the bottom of the file:
```python
AWS_QUERIES: list[AttackPathsQueryDefinition] = [
# ... existing queries ...
AWS_{NEW_QUERY_NAME}, # Add here
]
```
---
## Step-by-Step Creation Process
### 1. Read the Queries Module
**FIRST**, read all files in the queries module to understand the structure:
```
api/src/backend/api/attack_paths/queries/
├── __init__.py # Module exports
├── types.py # AttackPathsQueryDefinition, AttackPathsQueryParameterDefinition
├── registry.py # Query registry logic
└── {provider}.py # Provider-specific queries (e.g., aws.py)
```
Read these files to learn:
- Type definitions and available fields
- How queries are registered
- Current query patterns, style, and naming conventions
### 2. Determine Schema Source
Check the Cartography dependency in `api/pyproject.toml`:
```bash
grep cartography api/pyproject.toml
```
Parse the dependency to determine the schema source:
**If git-based dependency** (e.g., `cartography @ git+https://github.com/prowler-cloud/cartography@0.126.1`):
- Extract the repository (e.g., `prowler-cloud/cartography`)
- Extract the version/tag (e.g., `0.126.1`)
- Fetch schema from that repository at that tag
**If PyPI dependency** (e.g., `cartography = "^0.126.0"` or `cartography>=0.126.0`):
- Extract the version (e.g., `0.126.0`)
- Use the official `cartography-cncf` repository
**Schema URL patterns** (ALWAYS use the specific version tag, not master/main):
```
# Official Cartography (cartography-cncf)
https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md
# Prowler fork (prowler-cloud)
https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md
```
**Examples**:
```bash
# For prowler-cloud/cartography@0.126.1 (git), fetch AWS schema:
https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/0.126.1/docs/root/modules/aws/schema.md
# For cartography = "^0.126.0" (PyPI), fetch AWS schema:
https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/0.126.0/docs/root/modules/aws/schema.md
```
**IMPORTANT**: Always match the schema version to the dependency version in `pyproject.toml`. Using master/main may reference node labels or properties that don't exist in the deployed version.
**Additional Prowler Labels**: The Attack Paths sync task adds extra labels:
- `ProwlerFinding` - Prowler finding nodes with `status`, `provider_uid` properties
- `ProviderResource` - Generic resource marker
- `{Provider}Resource` - Provider-specific marker (e.g., `AWSResource`)
These are defined in `api/src/backend/tasks/jobs/attack_paths/config.py`.
### 3. Consult the Schema for Available Data
Use the Cartography schema to discover:
- What node labels exist for the target resources
- What properties are available on those nodes
- What relationships connect the nodes
This informs query design by showing what data is actually available to query.
### 4. Create Query Definition
Use the standard pattern (see above) with:
- **id**: Auto-generated as `{provider}-{kebab-case-description}`
- **name**: Human-readable, e.g., "Privilege Escalation: {perm1} + {perm2}"
- **description**: Explain the attack vector and impact
- **provider**: Provider identifier (aws, azure, gcp, kubernetes, github)
- **cypher**: The openCypher query with proper escaping
- **parameters**: Optional list of user-provided parameters (use `parameters=[]` if none needed)
### 5. Add Query to Provider List
Add the constant to the `{PROVIDER}_QUERIES` list.
---
## Query Naming Conventions
### Query ID
```
{provider}-{category}-{description}
```
Examples:
- `aws-ec2-privesc-passrole-iam`
- `aws-iam-privesc-attach-role-policy-assume-role`
- `aws-rds-unencrypted-storage`
### Query Constant Name
```
{PROVIDER}_{CATEGORY}_{DESCRIPTION}
```
Examples:
- `AWS_EC2_PRIVESC_PASSROLE_IAM`
- `AWS_IAM_PRIVESC_ATTACH_ROLE_POLICY_ASSUME_ROLE`
- `AWS_RDS_UNENCRYPTED_STORAGE`
---
## Query Categories
| Category | Description | Example |
| -------------------- | ------------------------------ | ------------------------- |
| Basic Resource | List resources with properties | RDS instances, S3 buckets |
| Network Exposure | Internet-exposed resources | EC2 with public IPs |
| Privilege Escalation | IAM privilege escalation paths | PassRole + RunInstances |
| Data Access | Access to sensitive data | EC2 with S3 access |
---
## Common openCypher Patterns
### Match Account and Principal
```cypher
MATCH path_principal = (aws:AWSAccount {id: $provider_uid})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
```
### Check IAM Action Permissions
```cypher
WHERE stmt.effect = 'Allow'
AND any(action IN stmt.action WHERE
toLower(action) = 'iam:passrole'
OR toLower(action) = 'iam:*'
OR action = '*'
)
```
### Find Roles Trusting a Service
```cypher
MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: 'ec2.amazonaws.com'})
```
### Check Resource Scope
```cypher
WHERE any(resource IN stmt.resource WHERE
resource = '*'
OR target_role.arn CONTAINS resource
OR resource CONTAINS target_role.name
)
```
### Include Prowler Findings
```cypher
UNWIND nodes(path_principal) + nodes(path_target) as n
OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {status: 'FAIL', provider_uid: $provider_uid})
RETURN path_principal, path_target,
collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
```
---
## Common Node Labels by Provider
### AWS
| Label | Description |
| -------------------- | ----------------------------------- |
| `AWSAccount` | AWS account root |
| `AWSPrincipal` | IAM principal (user, role, service) |
| `AWSRole` | IAM role |
| `AWSUser` | IAM user |
| `AWSPolicy` | IAM policy |
| `AWSPolicyStatement` | Policy statement |
| `EC2Instance` | EC2 instance |
| `EC2SecurityGroup` | Security group |
| `S3Bucket` | S3 bucket |
| `RDSInstance` | RDS database instance |
| `LoadBalancer` | Classic ELB |
| `LoadBalancerV2` | ALB/NLB |
| `LaunchTemplate` | EC2 launch template |
### Common Relationships
| Relationship | Description |
| ---------------------- | ----------------------- |
| `TRUSTS_AWS_PRINCIPAL` | Role trust relationship |
| `STS_ASSUMEROLE_ALLOW` | Can assume role |
| `POLICY` | Has policy attached |
| `STATEMENT` | Policy has statement |
---
## Parameters
For queries requiring user input, define parameters:
```python
parameters=[
AttackPathsQueryParameterDefinition(
name="ip",
label="IP address",
description="Public IP address, e.g. 192.0.2.0.",
placeholder="192.0.2.0",
),
AttackPathsQueryParameterDefinition(
name="tag_key",
label="Tag key",
description="Tag key to filter resources.",
placeholder="Environment",
),
],
```
---
## Best Practices
1. **Always filter by provider_uid**: Use `{id: $provider_uid}` on account nodes and `{provider_uid: $provider_uid}` on ProwlerFinding nodes
2. **Use consistent naming**: Follow existing patterns in the file
3. **Include Prowler findings**: Always add the OPTIONAL MATCH for ProwlerFinding nodes
4. **Return distinct findings**: Use `collect(DISTINCT pf)` to avoid duplicates
5. **Comment the query purpose**: Add inline comments explaining each MATCH clause
6. **Validate schema first**: Ensure all node labels and properties exist in Cartography schema
---
## openCypher Compatibility
Queries must be written in **openCypher Version 9** to ensure compatibility with both Neo4j and Amazon Neptune.
> **Why Version 9?** Amazon Neptune implements openCypher Version 9. By targeting this specification, queries work on both Neo4j and Neptune without modification.
### Avoid These (Not in openCypher spec)
| Feature | Reason |
| --------------------------------------------------- | ----------------------------------------------- |
| APOC procedures (`apoc.*`) | Neo4j-specific plugin, not available in Neptune |
| Virtual nodes (`apoc.create.vNode`) | APOC-specific |
| Virtual relationships (`apoc.create.vRelationship`) | APOC-specific |
| Neptune extensions | Not available in Neo4j |
| `reduce()` function | Use `UNWIND` + aggregation instead |
| `FOREACH` clause | Use `WITH` + `UNWIND` + `SET` instead |
| Regex match operator (`=~`) | Not supported in Neptune |
### CALL Subqueries
Supported with limitations:
- Use `WITH` clause to import variables: `CALL { WITH var ... }`
- Updates inside CALL subqueries are NOT supported
- Emitted variables cannot overlap with variables before the CALL
---
## Reference
### pathfinding.cloud (Attack Path Definitions)
- **Repository**: https://github.com/DataDog/pathfinding.cloud
- **All paths JSON**: `https://raw.githubusercontent.com/DataDog/pathfinding.cloud/main/docs/paths.json`
- Use WebFetch to query specific paths or list available services
### Cartography Schema
- **URL pattern**: `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md`
- Always use the version from `api/pyproject.toml`, not master/main
### openCypher Specification
- **Neptune openCypher compliance** (what Neptune supports): https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html
- **Rewriting Cypher for Neptune** (converting Neo4j-specific syntax): https://docs.aws.amazon.com/neptune/latest/userguide/migration-opencypher-rewrites.html
- **openCypher project** (spec, grammar, TCK): https://github.com/opencypher/openCypher
---
## Learning from the Queries Module
**IMPORTANT**: Before creating a new query, ALWAYS read the entire queries module:
```
api/src/backend/api/attack_paths/queries/
├── __init__.py # Module exports
├── types.py # Type definitions
├── registry.py # Registry logic
└── {provider}.py # Provider queries (aws.py, etc.)
```
Use the existing queries to learn:
- Query structure and formatting
- Variable naming conventions
- How to include Prowler findings
- Comment style
> **Compatibility Warning**: Some existing queries use Neo4j-specific features
> (e.g., `apoc.create.vNode`, `apoc.create.vRelationship`, regex `=~`) that are
> **NOT compatible** with Amazon Neptune. Use these queries to learn general
> patterns (structure, naming, Prowler findings integration, comment style) but
> **DO NOT copy APOC procedures or other Neo4j-specific syntax** into new queries.
> New queries must be pure openCypher Version 9. Refer to the
> [openCypher Compatibility](#opencypher-compatibility) section for the full list
> of features to avoid.
**DO NOT** use generic templates. Match the exact style of existing **compatible** queries in the file.