mirror of
https://github.com/prowler-cloud/prowler.git
synced 2026-07-04 19:21:51 +00:00
feat(api): make Attack Paths sink selectable between Neo4j and Neptune (#11524)
@@ -169,3 +169,7 @@ GEMINI.md
|
||||
|
||||
# Claude Code
|
||||
.claude/*
|
||||
|
||||
# Docker
|
||||
docker-compose.override.yml
|
||||
docker-compose-dev.override.yml
|
||||
|
||||
@@ -83,16 +83,35 @@ prowler dashboard
|
||||
|
||||
## Attack Paths
|
||||
|
||||
Attack Paths automatically extends every completed AWS scan with a Neo4j graph that combines Cartography's cloud inventory with Prowler findings. The feature runs in the API worker after each scan and therefore requires:
|
||||
Attack Paths automatically extends every completed AWS scan with a graph that combines Cartography's cloud inventory with Prowler findings. The feature runs in the API worker after each scan.
|
||||
|
||||
- An accessible Neo4j instance (the Docker Compose files already ships a `neo4j` service).
|
||||
- The following environment variables so Django and Celery can connect:
|
||||
Two graph backends are supported as the long-lived sink:
|
||||
|
||||
| Variable | Description | Default |
|
||||
| --- | --- | --- |
|
||||
| `NEO4J_HOST` | Hostname used by the API containers. | `neo4j` |
|
||||
| `NEO4J_PORT` | Bolt port exposed by Neo4j. | `7687` |
|
||||
| `NEO4J_USER` / `NEO4J_PASSWORD` | Credentials with rights to create per-tenant databases. | `neo4j` / `neo4j_password` |
|
||||
- **Neo4j** (default; the Docker Compose files already ship a `neo4j` service).
|
||||
- **Amazon Neptune** (cloud-managed; opt-in).
|
||||
|
||||
Select the sink with `ATTACK_PATHS_SINK_DATABASE` (`neo4j` or `neptune`; default `neo4j`).
|
||||
|
||||
> Note: Cartography ingestion always uses a temporary Neo4j database, regardless of the configured sink. The `NEO4J_*` variables below must remain set even when `ATTACK_PATHS_SINK_DATABASE=neptune`.
|
||||
|
||||
### Neo4j sink
|
||||
|
||||
| Variable | Description | Default |
|
||||
| --- | --- | --- |
|
||||
| `NEO4J_HOST` | Hostname used by the API containers. | `neo4j` |
|
||||
| `NEO4J_PORT` | Bolt port exposed by Neo4j. | `7687` |
|
||||
| `NEO4J_USER` / `NEO4J_PASSWORD` | Credentials with rights to create per-tenant databases. | `neo4j` / `neo4j_password` |
|
||||
|
||||
### Neptune sink
|
||||
|
||||
| Variable | Description | Default |
|
||||
| --- | --- | --- |
|
||||
| `NEPTUNE_WRITER_ENDPOINT` | Bolt host for the Neptune writer instance. Required when sink is `neptune`. | _empty_ |
|
||||
| `NEPTUNE_READER_ENDPOINT` | Optional reader endpoint for read-only queries. Falls back to the writer when unset. | _empty_ |
|
||||
| `NEPTUNE_PORT` | Bolt port exposed by Neptune. | `8182` |
|
||||
| `AWS_REGION` | Region the Neptune cluster lives in. Required when sink is `neptune`. | _empty_ |
|
||||
|
||||
Neptune authenticates with SigV4 using the standard boto3 credential chain. The worker's IAM role (or `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`) supplies the credentials. There is no Neptune password variable.
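
The variable rules above can be checked before the worker starts. The following is a minimal sketch, not part of this change: `missing_sink_variables` and the two tuples are hypothetical names, and the sketch treats every `NEO4J_*` variable as required in both modes (ignoring the defaults that Docker Compose supplies), because Cartography ingestion always uses Neo4j.

```python
VALID_SINKS = {"neo4j", "neptune"}

# Variable names come from the tables above; grouping them as "required" is
# an assumption of this sketch.
NEO4J_REQUIRED = ("NEO4J_HOST", "NEO4J_PORT", "NEO4J_USER", "NEO4J_PASSWORD")
NEPTUNE_REQUIRED = ("NEPTUNE_WRITER_ENDPOINT", "AWS_REGION")


def missing_sink_variables(environ):
    """Return the sorted names of required variables that are unset or empty."""
    sink = environ.get("ATTACK_PATHS_SINK_DATABASE", "neo4j")
    if sink not in VALID_SINKS:
        raise ValueError(f"Unsupported sink: {sink!r}")

    # Neo4j settings are needed in both modes (temporary ingest database).
    required = list(NEO4J_REQUIRED)
    if sink == "neptune":
        required.extend(NEPTUNE_REQUIRED)

    return sorted(name for name in required if not environ.get(name))
```

Passing `dict(os.environ)` at startup would list whatever is still missing for the selected sink. `NEPTUNE_READER_ENDPOINT` and `NEPTUNE_PORT` are left out because the table marks them as optional or defaulted.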
|
||||
|
||||
Every AWS provider scan will enqueue an Attack Paths ingestion job automatically. Other cloud providers will be added in future iterations.
|
||||
|
||||
|
||||
@@ -2,6 +2,14 @@
|
||||
|
||||
All notable changes to the **Prowler API** are documented in this file.
|
||||
|
||||
## [1.33.0] (Prowler UNRELEASED)
|
||||
|
||||
### 🔄 Changed
|
||||
|
||||
- Attack Paths: Amazon Neptune is now supported as a persistent sink database, selectable via `ATTACK_PATHS_SINK_DATABASE=neptune` (default `neo4j`). Cartography is bumped to 0.138.1, and its per-scan ingest database stays on Neo4j [(#11524)](https://github.com/prowler-cloud/prowler/pull/11524)
|
||||
|
||||
---
|
||||
|
||||
## [1.32.2] (Prowler UNRELEASED)
|
||||
|
||||
### 🐞 Fixed
|
||||
|
||||
@@ -58,7 +58,7 @@ dependencies = [
|
||||
"matplotlib (==3.10.8)",
|
||||
"reportlab (==4.4.10)",
|
||||
"neo4j (==6.1.0)",
|
||||
"cartography (==0.135.0)",
|
||||
"cartography (==0.138.1)",
|
||||
"gevent (==25.9.1)",
|
||||
"werkzeug (==3.1.7)",
|
||||
"sqlparse (==0.5.5)",
|
||||
@@ -193,7 +193,7 @@ constraint-dependencies = [
|
||||
"blinker==1.9.0",
|
||||
"boto3==1.40.61",
|
||||
"botocore==1.40.61",
|
||||
"cartography==0.135.0",
|
||||
"cartography==0.138.1",
|
||||
"celery==5.6.2",
|
||||
"certifi==2026.1.4",
|
||||
"cffi==2.0.0",
|
||||
@@ -447,7 +447,7 @@ constraint-dependencies = [
|
||||
"wcwidth==0.5.3",
|
||||
"websocket-client==1.9.0",
|
||||
"werkzeug==3.1.7",
|
||||
"workos==6.0.4",
|
||||
"workos==6.0.8",
|
||||
"wrapt==1.17.3",
|
||||
"xlsxwriter==3.2.9",
|
||||
"xmlsec==1.3.17",
|
||||
@@ -458,8 +458,13 @@ constraint-dependencies = [
|
||||
"zope-interface==8.2",
|
||||
"zstd==1.5.7.3"
|
||||
]
|
||||
# prowler@master needs okta==3.4.2; cartography 0.135.0 declares okta<1.0.0 for an
|
||||
# integration prowler does not import.
|
||||
# prowler@master needs okta==3.4.2, but cartography 0.138.1 requires okta<1.0.0.
|
||||
# Attack Paths does not ingest Okta today, so override the Cartography
|
||||
# dependency to the Prowler pin.
|
||||
#
|
||||
# prowler@master needs azure-mgmt-containerservice==34.1.0, but cartography
|
||||
# 0.138.1 requires azure-mgmt-containerservice>=41.0.0. Attack Paths does not
|
||||
# ingest Azure today, so override the Cartography dependency to the Prowler pin.
|
||||
#
|
||||
# prowler@master hard-pins microsoft-kiota-abstractions==1.9.2 in [project.dependencies].
|
||||
# The microsoft-kiota-http security bump to 1.9.9 (GHSA-7j59-v9qr-6fq9) requires
|
||||
@@ -475,6 +480,7 @@ constraint-dependencies = [
|
||||
# that request pyjwt[crypto] and leave cryptography (needed for RS256) only transitive.
|
||||
override-dependencies = [
|
||||
"okta==3.4.2",
|
||||
"azure-mgmt-containerservice==34.1.0",
|
||||
"microsoft-kiota-abstractions==1.9.9",
|
||||
"dulwich==1.2.5",
|
||||
"pyjwt[crypto]==2.13.0"
|
||||
|
||||
@@ -42,9 +42,6 @@ class ApiConfig(AppConfig):
|
||||
):
|
||||
self._ensure_crypto_keys()
|
||||
|
||||
# Neo4j driver is created lazily on first use (see api.attack_paths.database).
|
||||
# App init never contacts Neo4j, so a Neo4j outage cannot block API startup.
|
||||
|
||||
def _ensure_crypto_keys(self):
|
||||
"""
|
||||
Orchestrator method that ensures all required cryptographic keys are present.
|
||||
|
||||
@@ -4,10 +4,10 @@ Cypher sanitizer for custom (user-supplied) Attack Paths queries.
|
||||
Two responsibilities:
|
||||
|
||||
1. **Validation** - reject queries containing SSRF or dangerous procedure
|
||||
patterns (defense-in-depth; the primary control is ``neo4j.READ_ACCESS``).
|
||||
patterns (defense-in-depth; the primary control is `neo4j.READ_ACCESS`).
|
||||
|
||||
2. **Provider-scoped label injection** - inject a dynamic
|
||||
``_Provider_{uuid}`` label into every node pattern so the database can
|
||||
`_Provider_{uuid}` label into every node pattern so the database can
|
||||
use its native label index for provider isolation.
|
||||
|
||||
Label-injection pipeline:
|
||||
@@ -25,13 +25,13 @@ from rest_framework.exceptions import ValidationError
|
||||
from tasks.jobs.attack_paths.config import get_provider_label
|
||||
|
||||
# Step 1 - String / comment protection
|
||||
# Single combined regex: strings first, then line comments.
|
||||
# Single combined regex: strings first, then line comments
|
||||
# The regex engine finds the leftmost match, so a string like 'https://prowler.com'
|
||||
# is consumed as a string before the // inside it can match as a comment.
|
||||
# is consumed as a string before the // inside it can match as a comment
|
||||
_PROTECTED_RE = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"|//[^\n]*")
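
The leftmost-match behaviour described in the comment can be seen in isolation. This sketch copies the pattern from the line above; `protected_spans` and the sample query are illustrative names, not part of the sanitizer module.

```python
import re

# Same pattern as _PROTECTED_RE: single-quoted string, double-quoted string,
# or a // line comment, tried at each position from left to right.
PROTECTED_RE = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"|//[^\n]*")


def protected_spans(cypher):
    """Return every string literal or line comment found in the query text."""
    return [match.group(0) for match in PROTECTED_RE.finditer(cypher)]


query = "MATCH (n) WHERE n.url = 'https://prowler.com' RETURN n // trailing note"
spans = protected_spans(query)
# The URL stays inside its string literal; only the real comment is reported
# as a comment span.
```

Because the opening quote appears before the `//` in the URL, the whole literal is consumed first, so the slashes inside it never start a comment match.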
|
||||
|
||||
# Step 2 - Clause splitting
|
||||
# OPTIONAL MATCH must come before MATCH to avoid partial matching.
|
||||
# `OPTIONAL MATCH` must come before `MATCH` to avoid partial matching
|
||||
_CLAUSE_RE = re.compile(
|
||||
r"\b(OPTIONAL\s+MATCH|MATCH|WHERE|RETURN|WITH|ORDER\s+BY"
|
||||
r"|SKIP|LIMIT|UNION|UNWIND|CALL)\b",
|
||||
@@ -39,10 +39,10 @@ _CLAUSE_RE = re.compile(
|
||||
)
|
||||
|
||||
# Pass A - Labeled node patterns (all segments)
|
||||
# Matches node patterns that have at least one :Label.
|
||||
# (?<!\w)\( - open paren NOT preceded by a word char (excludes function calls).
|
||||
# Group 1: optional variable + one or more :Label
|
||||
# Group 2: optional {properties} + closing paren
|
||||
# Matches node patterns that have at least one `:Label`
|
||||
# `(?<!\w)\(` - open paren NOT preceded by a word char, excludes function calls
|
||||
# Group 1: optional variable + one or more `:Label`
|
||||
# Group 2: optional `{`properties`}` + closing paren
|
||||
_LABELED_NODE_RE = re.compile(
|
||||
r"(?<!\w)\("
|
||||
r"("
|
||||
@@ -55,9 +55,9 @@ _LABELED_NODE_RE = re.compile(
|
||||
r")"
|
||||
)
|
||||
|
||||
# Pass B - Bare node patterns (MATCH segments only)
|
||||
# Matches (identifier) or (identifier {properties}) without any :Label.
|
||||
# Only applied in MATCH/OPTIONAL MATCH segments.
|
||||
# Pass B - Bare node patterns (`MATCH` segments only)
|
||||
# Matches (identifier) or (identifier {properties}) without any `:Label`
|
||||
# Only applied in `MATCH` / `OPTIONAL MATCH` segments
|
||||
_BARE_NODE_RE = re.compile(
|
||||
r"(?<!\w)\(" r"(\s*[a-zA-Z_]\w*)" r"(\s*(?:\{[^}]*\})?)" r"\s*\)"
|
||||
)
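
To show which node patterns Pass B is meant to pick up, here is a small sketch that reuses the `_BARE_NODE_RE` pattern from the lines above. `bare_nodes` and the sample query are hypothetical, and the sketch only demonstrates the regex, not the segment filtering done by the real pipeline.

```python
import re

# Same pattern as _BARE_NODE_RE: "(" not preceded by a word character, an
# identifier, optional {properties}, then ")". No ":Label" is allowed.
BARE_NODE_RE = re.compile(
    r"(?<!\w)\(" r"(\s*[a-zA-Z_]\w*)" r"(\s*(?:\{[^}]*\})?)" r"\s*\)"
)


def bare_nodes(segment):
    """Return the unlabeled node patterns found in a MATCH segment."""
    return [match.group(0) for match in BARE_NODE_RE.finditer(segment)]


found = bare_nodes("MATCH (a)-[:R]->(b:Role), (c {id: 1}) RETURN count(a)")
```

Here `(b:Role)` is skipped because it carries a label, and `count(a)` is skipped because the lookbehind rejects a paren that directly follows a word character.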
|
||||
@@ -134,9 +134,7 @@ def inject_provider_label(cypher: str, provider_id: str) -> str:
|
||||
return work
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Validation
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
# Patterns that indicate SSRF or dangerous procedure calls
|
||||
# Defense-in-depth layer - the primary control is `neo4j.READ_ACCESS`
|
||||
|
||||
@@ -1,261 +1,32 @@
|
||||
import atexit
|
||||
import logging
|
||||
import threading
|
||||
from collections.abc import Iterator
|
||||
from contextlib import contextmanager
|
||||
"""Backwards-compatible facade over the ingest and sink modules.
|
||||
|
||||
Historically this module owned a single Neo4j driver used for both the
|
||||
cartography temp database and the per-tenant sink database. The port to AWS
|
||||
Neptune split those roles: the cartography ingest (temp) database is always
|
||||
Neo4j and lives in `api.attack_paths.ingest`; the sink is configurable
|
||||
(Neo4j or Neptune) and lives in `api.attack_paths.sink`. This shim preserves
|
||||
the public API that `tasks/` and `api/v1/views.py` already depend on, and
|
||||
dispatches to the right module by database-name prefix.
|
||||
|
||||
A database name starting with `db-tmp-scan-` is a cartography temp DB and
|
||||
routes to ingest. Everything else routes to the configured sink.
|
||||
"""
|
||||
|
||||
from contextlib import AbstractContextManager
|
||||
from typing import Any
|
||||
from uuid import UUID
|
||||
|
||||
import neo4j
|
||||
import neo4j.exceptions
|
||||
from api.attack_paths.retryable_session import RetryableSession
|
||||
import neo4j # noqa: F401 - kept for tests that patch api.attack_paths.database.neo4j
|
||||
from api.attack_paths import ingest
|
||||
from api.attack_paths import sink as sink_module
|
||||
from config.env import env
|
||||
from django.conf import settings
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
BATCH_SIZE,
|
||||
PROVIDER_RESOURCE_LABEL,
|
||||
get_provider_label,
|
||||
from django.conf import (
|
||||
settings, # noqa: F401 - kept for tests that patch ...database.settings
|
||||
)
|
||||
|
||||
# Without this Celery goes crazy with Neo4j logging
|
||||
logging.getLogger("neo4j").setLevel(logging.ERROR)
|
||||
logging.getLogger("neo4j").propagate = False
|
||||
|
||||
SERVICE_UNAVAILABLE_MAX_RETRIES = env.int(
|
||||
"ATTACK_PATHS_SERVICE_UNAVAILABLE_MAX_RETRIES", default=3
|
||||
)
|
||||
READ_QUERY_TIMEOUT_SECONDS = env.int(
|
||||
"ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30
|
||||
)
|
||||
MAX_CUSTOM_QUERY_NODES = env.int("ATTACK_PATHS_MAX_CUSTOM_QUERY_NODES", default=250)
|
||||
# Shorter than CONN_ACQUISITION_TIMEOUT — the driver requires acquisition to be
|
||||
# the longer of the two (it may include opening a new connection).
|
||||
CONNECTION_TIMEOUT = env.int("NEO4J_CONNECTION_TIMEOUT", default=5)
|
||||
CONN_ACQUISITION_TIMEOUT = env.int("NEO4J_CONN_ACQUISITION_TIMEOUT", default=15)
|
||||
READ_EXCEPTION_CODES = [
|
||||
"Neo.ClientError.Statement.AccessMode",
|
||||
"Neo.ClientError.Procedure.ProcedureNotFound",
|
||||
]
|
||||
CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."
|
||||
|
||||
# Module-level process-wide driver singleton
|
||||
_driver: neo4j.Driver | None = None
|
||||
_lock = threading.Lock()
|
||||
|
||||
# Base Neo4j functions
|
||||
|
||||
|
||||
def get_uri() -> str:
|
||||
host = settings.DATABASES["neo4j"]["HOST"]
|
||||
port = settings.DATABASES["neo4j"]["PORT"]
|
||||
return f"bolt://{host}:{port}"
|
||||
|
||||
|
||||
def init_driver() -> neo4j.Driver:
|
||||
global _driver
|
||||
if _driver is not None:
|
||||
return _driver
|
||||
|
||||
with _lock:
|
||||
if _driver is None:
|
||||
uri = get_uri()
|
||||
config = settings.DATABASES["neo4j"]
|
||||
|
||||
driver = neo4j.GraphDatabase.driver(
|
||||
uri,
|
||||
auth=(config["USER"], config["PASSWORD"]),
|
||||
keep_alive=True,
|
||||
max_connection_lifetime=7200,
|
||||
connection_timeout=CONNECTION_TIMEOUT,
|
||||
connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
|
||||
max_connection_pool_size=50,
|
||||
)
|
||||
# Publish the singleton only after connectivity is verified so a
|
||||
# failed probe does not leave an unverified driver behind. Close the
|
||||
# driver on failure so a repeatedly-probed outage cannot leak pools.
|
||||
try:
|
||||
driver.verify_connectivity()
|
||||
except Exception:
|
||||
driver.close()
|
||||
raise
|
||||
_driver = driver
|
||||
|
||||
# Register cleanup handler (only runs once since we're inside the _driver is None block)
|
||||
atexit.register(close_driver)
|
||||
|
||||
return _driver
|
||||
|
||||
|
||||
def get_driver() -> neo4j.Driver:
|
||||
return init_driver()
|
||||
|
||||
|
||||
def close_driver() -> None: # TODO: Use it
|
||||
global _driver
|
||||
with _lock:
|
||||
if _driver is not None:
|
||||
try:
|
||||
_driver.close()
|
||||
|
||||
finally:
|
||||
_driver = None
|
||||
|
||||
|
||||
@contextmanager
|
||||
def get_session(
|
||||
database: str | None = None, default_access_mode: str | None = None
|
||||
) -> Iterator[RetryableSession]:
|
||||
session_wrapper: RetryableSession | None = None
|
||||
|
||||
try:
|
||||
session_wrapper = RetryableSession(
|
||||
session_factory=lambda: get_driver().session(
|
||||
database=database, default_access_mode=default_access_mode
|
||||
),
|
||||
max_retries=SERVICE_UNAVAILABLE_MAX_RETRIES,
|
||||
)
|
||||
yield session_wrapper
|
||||
|
||||
except neo4j.exceptions.Neo4jError as exc:
|
||||
if (
|
||||
default_access_mode == neo4j.READ_ACCESS
|
||||
and exc.code
|
||||
and exc.code in READ_EXCEPTION_CODES
|
||||
):
|
||||
message = "Read query not allowed"
|
||||
code = READ_EXCEPTION_CODES[0]
|
||||
raise WriteQueryNotAllowedException(message=message, code=code)
|
||||
|
||||
message = exc.message if exc.message is not None else str(exc)
|
||||
|
||||
if exc.code and exc.code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
|
||||
raise ClientStatementException(message=message, code=exc.code)
|
||||
|
||||
raise GraphDatabaseQueryException(message=message, code=exc.code)
|
||||
|
||||
finally:
|
||||
if session_wrapper is not None:
|
||||
session_wrapper.close()
|
||||
|
||||
|
||||
def execute_read_query(
|
||||
database: str,
|
||||
cypher: str,
|
||||
parameters: dict[str, Any] | None = None,
|
||||
) -> neo4j.graph.Graph:
|
||||
with get_session(database, default_access_mode=neo4j.READ_ACCESS) as session:
|
||||
|
||||
def _run(tx: neo4j.ManagedTransaction) -> neo4j.graph.Graph:
|
||||
result = tx.run(
|
||||
cypher, parameters or {}, timeout=READ_QUERY_TIMEOUT_SECONDS
|
||||
)
|
||||
return result.graph()
|
||||
|
||||
return session.execute_read(_run)
|
||||
|
||||
|
||||
def create_database(database: str) -> None:
|
||||
query = "CREATE DATABASE $database IF NOT EXISTS"
|
||||
parameters = {"database": database}
|
||||
|
||||
with get_session() as session:
|
||||
session.run(query, parameters)
|
||||
|
||||
|
||||
def drop_database(database: str) -> None:
|
||||
query = f"DROP DATABASE `{database}` IF EXISTS DESTROY DATA"
|
||||
|
||||
with get_session() as session:
|
||||
session.run(query)
|
||||
|
||||
|
||||
def drop_subgraph(database: str, provider_id: str) -> int:
|
||||
"""
|
||||
Delete all nodes for a provider from the tenant database.
|
||||
|
||||
Deletes relationships then nodes in batches (not `DETACH DELETE`) so a dense
|
||||
provider's graph cannot exceed Neo4j's transaction memory limit.
|
||||
Silently returns 0 if the database doesn't exist.
|
||||
"""
|
||||
provider_label = get_provider_label(provider_id)
|
||||
deleted_nodes = 0
|
||||
|
||||
try:
|
||||
with get_session(database) as session:
|
||||
# Phase 1: delete relationships incident to provider nodes in batches.
|
||||
deleted_count = 1
|
||||
while deleted_count > 0:
|
||||
result = session.run(
|
||||
f"""
|
||||
MATCH (:`{provider_label}`)-[r]-()
|
||||
WITH DISTINCT r LIMIT $batch_size
|
||||
DELETE r
|
||||
RETURN COUNT(r) AS deleted_rels_count
|
||||
""",
|
||||
{"batch_size": BATCH_SIZE},
|
||||
)
|
||||
deleted_count = result.single().get("deleted_rels_count", 0)
|
||||
|
||||
# Phase 2: delete the now relationship-free nodes in batches.
|
||||
deleted_count = 1
|
||||
while deleted_count > 0:
|
||||
result = session.run(
|
||||
f"""
|
||||
MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`)
|
||||
WITH n LIMIT $batch_size
|
||||
DELETE n
|
||||
RETURN COUNT(n) AS deleted_nodes_count
|
||||
""",
|
||||
{"batch_size": BATCH_SIZE},
|
||||
)
|
||||
deleted_count = result.single().get("deleted_nodes_count", 0)
|
||||
deleted_nodes += deleted_count
|
||||
|
||||
except GraphDatabaseQueryException as exc:
|
||||
if exc.code == "Neo.ClientError.Database.DatabaseNotFound":
|
||||
return 0
|
||||
raise
|
||||
|
||||
return deleted_nodes
|
||||
|
||||
|
||||
def has_provider_data(database: str, provider_id: str) -> bool:
|
||||
"""
|
||||
Check if any ProviderResource node exists for this provider.
|
||||
|
||||
Returns `False` if the database doesn't exist.
|
||||
"""
|
||||
provider_label = get_provider_label(provider_id)
|
||||
query = f"MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`) RETURN 1 LIMIT 1"
|
||||
|
||||
try:
|
||||
with get_session(database, default_access_mode=neo4j.READ_ACCESS) as session:
|
||||
result = session.run(query)
|
||||
return result.single() is not None
|
||||
|
||||
except GraphDatabaseQueryException as exc:
|
||||
if exc.code == "Neo.ClientError.Database.DatabaseNotFound":
|
||||
return False
|
||||
raise
|
||||
|
||||
|
||||
def clear_cache(database: str) -> None:
|
||||
query = "CALL db.clearQueryCaches()"
|
||||
|
||||
try:
|
||||
with get_session(database) as session:
|
||||
session.run(query)
|
||||
|
||||
except GraphDatabaseQueryException as exc:
|
||||
logging.warning(f"Failed to clear query cache for database `{database}`: {exc}")
|
||||
|
||||
|
||||
# Neo4j functions related to Prowler + Cartography
|
||||
|
||||
|
||||
def get_database_name(entity_id: str | UUID, temporary: bool = False) -> str:
|
||||
prefix = "tmp-scan" if temporary else "tenant"
|
||||
return f"db-{prefix}-{str(entity_id).lower()}"
|
||||
TEMP_DB_PREFIX = "db-tmp-scan-"
|
||||
|
||||
|
||||
# Exceptions
|
||||
@@ -270,7 +41,6 @@ class GraphDatabaseQueryException(Exception):
|
||||
def __str__(self) -> str:
|
||||
if self.code:
|
||||
return f"{self.code}: {self.message}"
|
||||
|
||||
return self.message
|
||||
|
||||
|
||||
@@ -280,3 +50,152 @@ class WriteQueryNotAllowedException(GraphDatabaseQueryException):
|
||||
|
||||
class ClientStatementException(GraphDatabaseQueryException):
|
||||
pass
|
||||
|
||||
|
||||
# Routing
|
||||
|
||||
|
||||
def _is_ingest_database(database: str | None) -> bool:
|
||||
return bool(database) and database.startswith(TEMP_DB_PREFIX)
|
||||
|
||||
|
||||
# Driver lifecycle
|
||||
|
||||
|
||||
def init_driver() -> Any:
|
||||
"""Initialize the configured sink backend.
|
||||
|
||||
The ingest driver (Neo4j for cartography temp DBs) stays lazy: it is
|
||||
only initialized when a temp-DB operation actually runs, which never
|
||||
happens on API pods.
|
||||
"""
|
||||
return sink_module.init()
|
||||
|
||||
|
||||
def close_driver() -> None:
|
||||
"""Close every driver held by this process."""
|
||||
sink_module.close()
|
||||
ingest.close_driver()
|
||||
|
||||
|
||||
def get_driver() -> neo4j.Driver:
|
||||
"""Return the sink backend's underlying driver.
|
||||
|
||||
Only meaningful for the Neo4j sink (where the backend has a single Neo4j
|
||||
driver). On Neptune this returns the writer driver. Kept for tests and
|
||||
legacy call-sites; prefer `get_session` for new code.
|
||||
"""
|
||||
backend = sink_module.get_backend()
|
||||
|
||||
# Neo4jSink exposes get_driver(); NeptuneSink exposes get_writer()
|
||||
if hasattr(backend, "get_driver"):
|
||||
return backend.get_driver()
|
||||
|
||||
if hasattr(backend, "get_writer"):
|
||||
return backend.get_writer()
|
||||
|
||||
raise RuntimeError("Active sink backend does not expose a driver handle")
|
||||
|
||||
|
||||
def verify_connectivity() -> None:
|
||||
"""Raise if the configured graph database is unreachable on the API read path.
|
||||
|
||||
Backend-agnostic entry point for the readiness probe: Neo4j verifies its
|
||||
driver, Neptune verifies the reader endpoint.
|
||||
"""
|
||||
sink_module.get_backend().verify_connectivity()
|
||||
|
||||
|
||||
def get_uri() -> str:
|
||||
"""Return the sink URI. Retained for backwards compatibility."""
|
||||
if settings.ATTACK_PATHS_SINK_DATABASE == "neptune":
|
||||
cfg = settings.DATABASES["neptune"]
|
||||
return f"bolt+s://{cfg['WRITER_ENDPOINT']}:{cfg['PORT']}"
|
||||
|
||||
cfg = settings.DATABASES["neo4j"]
|
||||
return f"bolt://{cfg['HOST']}:{cfg['PORT']}"
|
||||
|
||||
|
||||
def get_ingest_uri() -> str:
|
||||
"""Neo4j URI for the cartography temp (ingest) database, which is always
|
||||
Neo4j regardless of the configured sink."""
|
||||
return ingest.get_uri()
|
||||
|
||||
|
||||
# Session API
|
||||
|
||||
|
||||
def get_session(
|
||||
database: str | None = None,
|
||||
default_access_mode: str | None = None,
|
||||
) -> AbstractContextManager:
|
||||
"""Return a session against the right backend.
|
||||
|
||||
- `database` names starting with `db-tmp-scan-` always go to ingest.
|
||||
- No database name → ingest (used for CREATE / DROP DATABASE admin ops).
|
||||
- Any other name → sink.
|
||||
"""
|
||||
if _is_ingest_database(database) or database is None:
|
||||
return ingest.get_session(
|
||||
database=database, default_access_mode=default_access_mode
|
||||
)
|
||||
|
||||
return sink_module.get_backend().get_session(
|
||||
database=database, default_access_mode=default_access_mode
|
||||
)
|
||||
|
||||
|
||||
def execute_read_query(
|
||||
database: str,
|
||||
cypher: str,
|
||||
parameters: dict[str, Any] | None = None,
|
||||
) -> neo4j.graph.Graph:
|
||||
"""Read-only query against the sink."""
|
||||
return sink_module.get_backend().execute_read_query(database, cypher, parameters)
|
||||
|
||||
|
||||
def create_database(database: str) -> None:
|
||||
"""Create a database. Temp DBs always land on ingest (Neo4j).
|
||||
|
||||
On the Neo4j sink, tenant DBs also route to ingest because both drivers
|
||||
connect to the same Neo4j cluster. On the Neptune sink, tenant DB creates
|
||||
are no-ops.
|
||||
"""
|
||||
if _is_ingest_database(database):
|
||||
ingest.create_database(database)
|
||||
return
|
||||
|
||||
sink_module.get_backend().create_database(database)
|
||||
|
||||
|
||||
def drop_database(database: str) -> None:
|
||||
"""Drop a database. Mirrors `create_database` routing."""
|
||||
if _is_ingest_database(database):
|
||||
ingest.drop_database(database)
|
||||
return
|
||||
|
||||
sink_module.get_backend().drop_database(database)
|
||||
|
||||
|
||||
def drop_subgraph(database: str, provider_id: str) -> int:
|
||||
return sink_module.get_backend().drop_subgraph(database, provider_id)
|
||||
|
||||
|
||||
def has_provider_data(database: str, provider_id: str) -> bool:
|
||||
return sink_module.get_backend().has_provider_data(database, provider_id)
|
||||
|
||||
|
||||
def clear_cache(database: str) -> None:
|
||||
if _is_ingest_database(database):
|
||||
ingest.clear_cache(database)
|
||||
return
|
||||
|
||||
sink_module.get_backend().clear_cache(database)
|
||||
|
||||
|
||||
# Name helper
|
||||
|
||||
|
||||
def get_database_name(entity_id: str | UUID, temporary: bool = False) -> str:
|
||||
prefix = "tmp-scan" if temporary else "tenant"
|
||||
return f"db-{prefix}-{str(entity_id).lower()}"
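
The prefix-based dispatch used by `get_session` can be summarised in a few lines. This is a sketch under stated assumptions: `route_for` is a hypothetical helper that returns a label instead of a session, while `TEMP_DB_PREFIX` and `get_database_name` are copied from the diff.

```python
from uuid import UUID

TEMP_DB_PREFIX = "db-tmp-scan-"


def get_database_name(entity_id, temporary=False):
    prefix = "tmp-scan" if temporary else "tenant"
    return f"db-{prefix}-{str(entity_id).lower()}"


def route_for(database):
    """Return "ingest" or "sink", following the rules in get_session."""
    # No database name means an admin operation, which goes to ingest (Neo4j).
    if database is None or database.startswith(TEMP_DB_PREFIX):
        return "ingest"
    return "sink"


scan_id = UUID("12345678-1234-5678-1234-567812345678")
temp_db = get_database_name(scan_id, temporary=True)
tenant_db = get_database_name(scan_id)
```

With these names, `temp_db` routes to ingest and `tenant_db` routes to whichever sink is configured, which is why the naming helper and the prefix constant have to stay in sync.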
|
||||
|
||||
@@ -0,0 +1,29 @@
|
||||
"""Cartography ingest layer.
|
||||
|
||||
Public surface for the per-scan Neo4j temp database driver. Implementation
|
||||
lives in `api.attack_paths.ingest.driver`.
|
||||
"""
|
||||
|
||||
from api.attack_paths.ingest.driver import (
|
||||
clear_cache,
|
||||
close_driver,
|
||||
create_database,
|
||||
drop_database,
|
||||
get_driver,
|
||||
get_session,
|
||||
get_uri,
|
||||
init_driver,
|
||||
run_cypher,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"clear_cache",
|
||||
"close_driver",
|
||||
"create_database",
|
||||
"drop_database",
|
||||
"get_driver",
|
||||
"get_session",
|
||||
"get_uri",
|
||||
"init_driver",
|
||||
"run_cypher",
|
||||
]
|
||||
@@ -0,0 +1,187 @@
|
||||
"""Cartography ingest driver: per-scan throw-away Neo4j database.
|
||||
|
||||
Cartography writes each scan's graph into a throw-away Neo4j database named
|
||||
`db-tmp-scan-{scan_uuid}`. This is always Neo4j, regardless of the configured
|
||||
sink: Neptune is single-database and cannot host per-scan throw-away
|
||||
databases. This module owns the Neo4j driver used for those temp DBs and the
|
||||
admin ops they need (CREATE / DROP DATABASE).
|
||||
"""
|
||||
|
||||
import atexit
|
||||
import logging
|
||||
import threading
|
||||
from collections.abc import Iterator
|
||||
from contextlib import contextmanager
|
||||
from typing import Any
|
||||
|
||||
import neo4j
|
||||
import neo4j.exceptions
|
||||
from api.attack_paths.retryable_session import RetryableSession
|
||||
from config.env import env
|
||||
from django.conf import settings
|
||||
|
||||
logging.getLogger("neo4j").setLevel(logging.ERROR)
|
||||
logging.getLogger("neo4j").propagate = False
|
||||
|
||||
SERVICE_UNAVAILABLE_MAX_RETRIES = env.int(
|
||||
"ATTACK_PATHS_SERVICE_UNAVAILABLE_MAX_RETRIES", default=3
|
||||
)
|
||||
CONN_ACQUISITION_TIMEOUT = env.int("NEO4J_CONN_ACQUISITION_TIMEOUT", default=15)
|
||||
# TCP connect timeout, ordered below the acquisition timeout so an unreachable
|
||||
# host can't pin a worker on a temp-DB op longer than this.
|
||||
CONNECTION_TIMEOUT = env.int("NEO4J_CONNECTION_TIMEOUT", default=5)
|
||||
MAX_CONNECTION_LIFETIME = env.int("NEO4J_MAX_CONNECTION_LIFETIME", default=7200)
|
||||
MAX_CONNECTION_POOL_SIZE = env.int("NEO4J_MAX_CONNECTION_POOL_SIZE", default=50)
|
||||
|
||||
_driver: neo4j.Driver | None = None
|
||||
_lock = threading.Lock()
|
||||
|
||||
|
||||
def _neo4j_config() -> dict:
|
||||
return settings.DATABASES["neo4j"]
|
||||
|
||||
|
||||
def get_uri() -> str:
|
||||
"""Bolt URI for the Neo4j temp (ingest) database. Always Neo4j."""
|
||||
config = _neo4j_config()
|
||||
host = config["HOST"]
|
||||
port = config["PORT"]
|
||||
if not host or not port:
|
||||
raise RuntimeError(
|
||||
"NEO4J_HOST / NEO4J_PORT must be set to use the attack-paths "
|
||||
"temp database. Workers require Neo4j env even when the sink is Neptune."
|
||||
)
|
||||
|
||||
return f"bolt://{host}:{port}"
|
||||
|
||||
|
||||
def init_driver() -> neo4j.Driver:
|
||||
"""Initialize the temp-database Neo4j driver. Idempotent."""
|
||||
global _driver
|
||||
if _driver is not None:
|
||||
return _driver
|
||||
|
||||
with _lock:
|
||||
if _driver is None:
|
||||
config = _neo4j_config()
|
||||
_driver = neo4j.GraphDatabase.driver(
|
||||
get_uri(),
|
||||
auth=(config["USER"], config["PASSWORD"]),
|
||||
keep_alive=True,
|
||||
max_connection_lifetime=MAX_CONNECTION_LIFETIME,
|
||||
connection_timeout=CONNECTION_TIMEOUT,
|
||||
connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
|
||||
max_connection_pool_size=MAX_CONNECTION_POOL_SIZE,
|
||||
)
|
||||
# Best-effort connectivity check: a Neo4j that is down at boot must
|
||||
# not crash the worker. The driver reconnects lazily on first use.
|
||||
try:
|
||||
_driver.verify_connectivity()
|
||||
|
||||
except Exception:
|
||||
logging.warning(
|
||||
"Neo4j temp-database unreachable at init; continuing with a "
|
||||
"lazily-reconnecting driver",
|
||||
exc_info=True,
|
||||
)
|
||||
|
||||
atexit.register(close_driver)
|
||||
|
||||
return _driver
|
||||
|
||||
|
||||
def get_driver() -> neo4j.Driver:
|
||||
return init_driver()
|
||||
|
||||
|
||||
def close_driver() -> None:
|
||||
global _driver
|
||||
with _lock:
|
||||
if _driver is not None:
|
||||
try:
|
||||
_driver.close()
|
||||
finally:
|
||||
_driver = None
|
||||
|
||||
|
||||
@contextmanager
|
||||
def get_session(
|
||||
database: str | None = None,
|
||||
default_access_mode: str | None = None,
|
||||
) -> Iterator[RetryableSession]:
|
||||
"""Session against the Neo4j temp-database cluster. Used for temp DB sessions
|
||||
and for admin operations (CREATE / DROP DATABASE) when `database` is None."""
|
||||
from api.attack_paths.database import (
|
||||
ClientStatementException,
|
||||
GraphDatabaseQueryException,
|
||||
WriteQueryNotAllowedException,
|
||||
)
|
||||
|
||||
READ_EXCEPTION_CODES = [
|
||||
"Neo.ClientError.Statement.AccessMode",
|
||||
"Neo.ClientError.Procedure.ProcedureNotFound",
|
||||
]
|
||||
CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."
|
||||
|
||||
session_wrapper: RetryableSession | None = None
|
||||
try:
|
||||
session_wrapper = RetryableSession(
|
||||
session_factory=lambda: get_driver().session(
|
||||
database=database, default_access_mode=default_access_mode
|
||||
),
|
||||
max_retries=SERVICE_UNAVAILABLE_MAX_RETRIES,
|
||||
)
|
||||
yield session_wrapper
|
||||
|
||||
except neo4j.exceptions.Neo4jError as exc:
|
||||
if (
|
||||
default_access_mode == neo4j.READ_ACCESS
|
||||
and exc.code
|
||||
and exc.code in READ_EXCEPTION_CODES
|
||||
):
|
||||
raise WriteQueryNotAllowedException(
|
||||
message="Read query not allowed", code=READ_EXCEPTION_CODES[0]
|
||||
)
|
||||
|
||||
message = exc.message if exc.message is not None else str(exc)
|
||||
if exc.code and exc.code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
|
||||
raise ClientStatementException(message=message, code=exc.code)
|
||||
raise GraphDatabaseQueryException(message=message, code=exc.code)
|
||||
|
||||
finally:
|
||||
if session_wrapper is not None:
|
||||
session_wrapper.close()
|
||||
|
||||
|
||||
def create_database(database: str) -> None:
|
||||
"""Create a database on the Neo4j cluster. Used for temp scan DBs."""
|
||||
with get_session() as session:
|
||||
session.run("CREATE DATABASE $database IF NOT EXISTS", {"database": database})
|
||||
|
||||
|
||||
def drop_database(database: str) -> None:
|
||||
"""Drop a database on the Neo4j cluster. Used for temp scan DBs."""
|
||||
with get_session() as session:
|
||||
session.run(f"DROP DATABASE `{database}` IF EXISTS DESTROY DATA")
|
||||
|
||||
|
||||
def clear_cache(database: str) -> None:
|
||||
"""Best-effort cache clear for a Neo4j database."""
|
||||
from api.attack_paths.database import GraphDatabaseQueryException
|
||||
|
||||
try:
|
||||
with get_session(database) as session:
|
||||
session.run("CALL db.clearQueryCaches()")
|
||||
|
||||
except GraphDatabaseQueryException as exc:
|
||||
logging.warning(f"Failed to clear query cache for database `{database}`: {exc}")
|
||||
|
||||
|
||||
def run_cypher(
|
||||
database: str | None,
|
||||
cypher: str,
|
||||
parameters: dict[str, Any] | None = None,
|
||||
) -> Any:
|
||||
"""Execute Cypher directly without the context manager. Thin helper."""
|
||||
with get_session(database) as session:
|
||||
return session.run(cypher, parameters or {})
|
||||
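The `get_session` helper above wraps a driver session in a context manager that translates driver errors into the API's own exception types and always closes the session. A minimal, driver-free sketch of that pattern follows; `DriverError`, `QueryError`, `StatementError`, `FakeSession` and `translated_session` are hypothetical stand-ins, not names from this codebase.

```python
from contextlib import contextmanager

STATEMENT_PREFIX = "Neo.ClientError.Statement."


class DriverError(Exception):
    """Hypothetical stand-in for a driver error that carries a status code."""

    def __init__(self, message: str, code: str) -> None:
        super().__init__(message)
        self.message = message
        self.code = code


class QueryError(Exception):
    """Hypothetical generic query failure exposed to callers."""

    def __init__(self, message: str, code: str) -> None:
        super().__init__(message)
        self.code = code


class StatementError(QueryError):
    """Hypothetical failure caused by the statement itself (client error)."""


class FakeSession:
    """Hypothetical session that only records whether it was closed."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


@contextmanager
def translated_session(session: FakeSession):
    try:
        yield session
    except DriverError as exc:
        # Statement-level client errors get a dedicated exception type.
        if exc.code.startswith(STATEMENT_PREFIX):
            raise StatementError(exc.message, exc.code) from exc
        raise QueryError(exc.message, exc.code) from exc
    finally:
        # Runs on success and on failure, so the session is never leaked.
        session.close()
```

Callers then catch only the translated types, and the driver's exception hierarchy stays out of view code.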
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -1,12 +1,14 @@
|
||||
from api.attack_paths.queries.aws import AWS_QUERIES
|
||||
|
||||
# TODO: drop after Neptune cutover
|
||||
from api.attack_paths.queries.aws_deprecated import AWS_DEPRECATED_QUERIES
|
||||
from api.attack_paths.queries.types import AttackPathsQueryDefinition
|
||||
|
||||
# Query definitions organized by provider
|
||||
# Query definitions for scans synced with the current schema.
|
||||
_QUERY_DEFINITIONS: dict[str, list[AttackPathsQueryDefinition]] = {
|
||||
"aws": AWS_QUERIES,
|
||||
}
|
||||
|
||||
# Flat lookup by query ID for O(1) access
|
||||
_QUERIES_BY_ID: dict[str, AttackPathsQueryDefinition] = {
|
||||
definition.id: definition
|
||||
for definitions in _QUERY_DEFINITIONS.values()
|
||||
@@ -14,11 +16,45 @@ _QUERIES_BY_ID: dict[str, AttackPathsQueryDefinition] = {
|
||||
}
|
||||
|
||||
|
||||
def get_queries_for_provider(provider: str) -> list[AttackPathsQueryDefinition]:
|
||||
"""Get all attack path queries for a specific provider."""
|
||||
return _QUERY_DEFINITIONS.get(provider, [])
|
||||
# TODO: drop after Neptune cutover
|
||||
#
|
||||
# Query definitions for pre-cutover scans (`AttackPathsScan.is_migrated=False`)
|
||||
# whose graph data was written under the previous schema. Both maps expose the
|
||||
# same query IDs so the API contract is identical regardless of which set is
|
||||
# routed to.
|
||||
_DEPRECATED_QUERY_DEFINITIONS: dict[str, list[AttackPathsQueryDefinition]] = {
|
||||
"aws": AWS_DEPRECATED_QUERIES,
|
||||
}
|
||||
|
||||
_DEPRECATED_QUERIES_BY_ID: dict[str, AttackPathsQueryDefinition] = {
|
||||
definition.id: definition
|
||||
for definitions in _DEPRECATED_QUERY_DEFINITIONS.values()
|
||||
for definition in definitions
|
||||
}
|
||||
|
||||
|
||||
def get_query_by_id(query_id: str) -> AttackPathsQueryDefinition | None:
|
||||
"""Get a specific attack path query by its ID."""
|
||||
return _QUERIES_BY_ID.get(query_id)
|
||||
def get_queries_for_provider(
|
||||
provider: str,
|
||||
is_migrated: bool = True,
|
||||
) -> list[AttackPathsQueryDefinition]:
|
||||
"""Get all attack path queries for a provider.
|
||||
|
||||
`is_migrated` selects the catalog: True for scans synced with the current
|
||||
schema, False for pre-cutover scans still using the legacy graph shape.
|
||||
# TODO: drop the `is_migrated` parameter after Neptune cutover
|
||||
"""
|
||||
catalog = _QUERY_DEFINITIONS if is_migrated else _DEPRECATED_QUERY_DEFINITIONS
|
||||
return catalog.get(provider, [])
|
||||
|
||||
|
||||
def get_query_by_id(
|
||||
query_id: str,
|
||||
is_migrated: bool = True,
|
||||
) -> AttackPathsQueryDefinition | None:
|
||||
"""Get a specific attack path query by ID.
|
||||
|
||||
`is_migrated` selects the catalog (see `get_queries_for_provider`).
|
||||
# TODO: drop the `is_migrated` parameter after Neptune cutover
|
||||
"""
|
||||
by_id = _QUERIES_BY_ID if is_migrated else _DEPRECATED_QUERIES_BY_ID
|
||||
return by_id.get(query_id)
|
||||
|
||||
@@ -0,0 +1,28 @@
|
||||
"""Attack-paths sink database layer.
|
||||
|
||||
The sink is the persistent store where attack-paths graphs live after a scan
|
||||
finishes. Currently selectable between Neo4j (OSS / local dev default) and
|
||||
AWS Neptune (hosted dev/staging/prod). Backend is picked by the
|
||||
`ATTACK_PATHS_SINK_DATABASE` setting at process init.
|
||||
|
||||
This package exposes the public factory API; the implementation lives in
|
||||
`api.attack_paths.sink.factory`.
|
||||
"""
|
||||
|
||||
from api.attack_paths.sink.factory import (
|
||||
SinkBackend,
|
||||
close,
|
||||
get_backend,
|
||||
get_backend_for_name,
|
||||
get_backend_for_scan,
|
||||
init,
|
||||
)
|
||||
|
||||
__all__ = [
|
||||
"SinkBackend",
|
||||
"close",
|
||||
"get_backend",
|
||||
"get_backend_for_name",
|
||||
"get_backend_for_scan",
|
||||
"init",
|
||||
]
|
||||
@@ -0,0 +1,92 @@
|
||||
"""Protocol every sink backend must implement."""
|
||||
|
||||
from contextlib import AbstractContextManager
|
||||
from typing import Any, Protocol
|
||||
|
||||
import neo4j
|
||||
|
||||
|
||||
class SinkDatabase(Protocol):
|
||||
"""Contract for the persistent attack-paths graph store.
|
||||
|
||||
The `database` argument is an opaque identifier passed through from the
|
||||
legacy `database.py` API surface. On Neo4j it is the per-tenant database
|
||||
name (e.g. `db-tenant-{uuid}`). On Neptune it is ignored (the cluster
|
||||
has a single graph, and isolation is label-based).
|
||||
"""
|
||||
|
||||
def init(self) -> None: ...
|
||||
|
||||
def close(self) -> None: ...
|
||||
|
||||
def verify_connectivity(self) -> None:
|
||||
"""Raise if the backend the API read path uses is unreachable.
|
||||
|
||||
Neo4j verifies its single driver. Neptune verifies the reader
|
||||
driver (the endpoint the API serves reads from); on single-endpoint
|
||||
clusters the reader aliases the writer, so that path is covered too.
|
||||
Used by the readiness probe; must not block longer than the caller's
|
||||
probe budget.
|
||||
"""
|
||||
...
|
||||
|
||||
def get_session(
|
||||
self,
|
||||
database: str | None = None,
|
||||
default_access_mode: str | None = None,
|
||||
) -> AbstractContextManager: ...
|
||||
|
||||
def execute_read_query(
|
||||
self,
|
||||
database: str,
|
||||
cypher: str,
|
||||
parameters: dict[str, Any] | None = None,
|
||||
) -> neo4j.graph.Graph: ...
|
||||
|
||||
def create_database(self, database: str) -> None: ...
|
||||
|
||||
def drop_database(self, database: str) -> None: ...
|
||||
|
||||
def drop_subgraph(self, database: str, provider_id: str) -> int: ...
|
||||
|
||||
def has_provider_data(self, database: str, provider_id: str) -> bool: ...
|
||||
|
||||
def clear_cache(self, database: str) -> None: ...
|
||||
|
||||
def ensure_sync_indexes(self, database: str) -> None:
|
||||
"""Create any index needed for the sync write path.
|
||||
|
||||
Called once at the start of each provider sync; must be idempotent.
|
||||
Neo4j creates a `_provider_element_id` index on `_ProviderResource`;
|
||||
Neptune is a no-op (its `~id` lookup needs no index).
|
||||
"""
|
||||
...
|
||||
|
||||
def write_nodes(
|
||||
self,
|
||||
database: str,
|
||||
labels: str,
|
||||
rows: list[dict[str, Any]],
|
||||
) -> None:
|
||||
"""Upsert a batch of nodes into the sink.
|
||||
|
||||
`labels` is a pre-rendered Cypher label string ready to drop after
|
||||
the node variable (e.g. `` `AWSUser`:`_ProviderResource`:`_Tenant_x` ``).
|
||||
Each row carries `provider_element_id` and `props`.
|
||||
"""
|
||||
...
|
||||
|
||||
def write_relationships(
|
||||
self,
|
||||
database: str,
|
||||
rel_type: str,
|
||||
provider_id: str,
|
||||
rows: list[dict[str, Any]],
|
||||
) -> None:
|
||||
"""Upsert a batch of relationships into the sink.
|
||||
|
||||
Each row carries `start_element_id`, `end_element_id`,
|
||||
`provider_element_id` and `props`. `rel_type` is the relationship
|
||||
type (already a valid Cypher identifier).
|
||||
"""
|
||||
...
|
||||
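Since `SinkDatabase` is a `typing.Protocol`, a backend can satisfy it structurally, which is handy for tests. The sketch below assumes a reduced two-method protocol (`MiniSink`) and an in-memory implementation (`InMemorySink`); both are hypothetical, as is storing `provider_id` inside `props`.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class MiniSink(Protocol):
    """Hypothetical two-method subset of the sink contract."""

    def write_nodes(
        self, database: str, labels: str, rows: list[dict[str, Any]]
    ) -> None: ...

    def has_provider_data(self, database: str, provider_id: str) -> bool: ...


class InMemorySink:
    """Hypothetical backend; note that it does not inherit from MiniSink."""

    def __init__(self) -> None:
        self.nodes: dict[str, dict[str, Any]] = {}

    def write_nodes(
        self, database: str, labels: str, rows: list[dict[str, Any]]
    ) -> None:
        for row in rows:
            # Upsert keyed by provider_element_id; props are merged into
            # the existing node, similar in spirit to `SET n += row.props`.
            node = self.nodes.setdefault(row["provider_element_id"], {})
            node.update(row["props"])

    def has_provider_data(self, database: str, provider_id: str) -> bool:
        # Assumption: each node records its provider in a `provider_id` prop.
        return any(n.get("provider_id") == provider_id for n in self.nodes.values())
```

With `runtime_checkable`, `isinstance` only checks that the methods exist, not their signatures, so a static type checker remains the real guard.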
@@ -0,0 +1,134 @@
|
||||
"""Sink backend factory and process-wide handle cache.
|
||||
|
||||
Picks the active backend from `settings.ATTACK_PATHS_SINK_DATABASE` at first
|
||||
use, holds the active backend plus any secondary backends needed to serve
|
||||
scans written under the previous configuration, and tears them all down on
|
||||
process shutdown. Imported via `from api.attack_paths import sink as
|
||||
sink_module`.
|
||||
"""
|
||||
|
||||
import threading
|
||||
from enum import StrEnum, auto
|
||||
|
||||
from api.attack_paths.sink.base import SinkDatabase
|
||||
from api.models import AttackPathsScan
|
||||
from django.conf import settings
|
||||
|
||||
# Backend names
|
||||
|
||||
|
||||
class SinkBackend(StrEnum):
|
||||
NEO4J = auto()
|
||||
NEPTUNE = auto()
|
||||
|
||||
|
||||
# Backend cache
|
||||
|
||||
_backend: SinkDatabase | None = None
|
||||
_secondary_backends: dict[SinkBackend, SinkDatabase] = {}
|
||||
_lock = threading.Lock()
|
||||
|
||||
|
||||
def _resolve_setting() -> SinkBackend:
|
||||
raw = settings.ATTACK_PATHS_SINK_DATABASE.lower()
|
||||
try:
|
||||
return SinkBackend(raw)
|
||||
|
||||
except ValueError:
|
||||
valid = sorted(b.value for b in SinkBackend)
|
||||
raise RuntimeError(
|
||||
f"ATTACK_PATHS_SINK_DATABASE must be one of {valid}; got {raw!r}"
|
||||
)
|
||||
|
||||
|
||||
def _build_backend(name: SinkBackend) -> SinkDatabase:
|
||||
if name is SinkBackend.NEO4J:
|
||||
from api.attack_paths.sink.neo4j import Neo4jSink
|
||||
|
||||
return Neo4jSink()
|
||||
|
||||
if name is SinkBackend.NEPTUNE:
|
||||
from api.attack_paths.sink.neptune import NeptuneSink
|
||||
|
||||
return NeptuneSink()
|
||||
|
||||
raise RuntimeError(f"Unknown sink backend {name!r}")
|
||||
|
||||
|
||||
# Lifecycle
|
||||
|
||||
|
||||
def init(name: SinkBackend | str | None = None) -> SinkDatabase:
|
||||
"""Initialize the configured sink backend. Idempotent."""
|
||||
global _backend
|
||||
if _backend is not None:
|
||||
return _backend
|
||||
|
||||
with _lock:
|
||||
if _backend is None:
|
||||
resolved = SinkBackend(name) if name else _resolve_setting()
|
||||
backend = _build_backend(resolved)
|
||||
backend.init()
|
||||
_backend = backend
|
||||
|
||||
return _backend
|
||||
|
||||
|
||||
def close() -> None:
|
||||
"""Close the active backend and every cached secondary backend."""
|
||||
global _backend
|
||||
with _lock:
|
||||
backends = [
|
||||
b for b in (_backend, *_secondary_backends.values()) if b is not None
|
||||
]
|
||||
_backend = None
|
||||
_secondary_backends.clear()
|
||||
|
||||
for backend in backends:
|
||||
try:
|
||||
backend.close()
|
||||
|
||||
except Exception: # pragma: no cover - best-effort
|
||||
pass
|
||||
|
||||
|
||||
def get_backend() -> SinkDatabase:
|
||||
"""Return the active sink. Initializes on first call."""
|
||||
return init()
|
||||
|
||||
|
||||
# Per-scan routing
|
||||
|
||||
|
||||
def get_backend_for_scan(scan: AttackPathsScan) -> SinkDatabase:
|
||||
"""Route reads by the sink that stores this scan's graph."""
|
||||
raw_backend = getattr(scan, "sink_backend", SinkBackend.NEO4J.value)
|
||||
if not isinstance(raw_backend, str):
|
||||
raw_backend = SinkBackend.NEO4J.value
|
||||
return get_backend_for_name(raw_backend)
|
||||
|
||||
|
||||
def get_backend_for_name(name: SinkBackend | str) -> SinkDatabase:
|
||||
"""Return the backend named by persisted scan metadata."""
|
||||
resolved = SinkBackend(name)
|
||||
if resolved is _resolve_setting():
|
||||
return get_backend()
|
||||
|
||||
return _build_backend_cached(resolved)
|
||||
|
||||
|
||||
def _build_backend_cached(name: SinkBackend) -> SinkDatabase:
|
||||
# TODO: drop after Neptune cutover
|
||||
# Needed only during cutover to serve Neo4j-written scans from a Neptune-
|
||||
# configured API pod (and vice versa). Once every scan is on Neptune,
|
||||
# `get_backend_for_scan` becomes a one-liner returning `get_backend()`.
|
||||
if name in _secondary_backends:
|
||||
return _secondary_backends[name]
|
||||
|
||||
with _lock:
|
||||
if name not in _secondary_backends:
|
||||
backend = _build_backend(name)
|
||||
backend.init()
|
||||
_secondary_backends[name] = backend
|
||||
|
||||
return _secondary_backends[name]
|
||||
@@ -0,0 +1,454 @@
|
||||
"""Neo4j sink implementation.
|
||||
|
||||
Owns a Neo4j driver independent from the staging driver. On OSS and local dev
|
||||
this is the only sink; on hosted deployments it runs only as a legacy read
|
||||
path while phase-1 drains tenant DBs.
|
||||
"""
|
||||
|
||||
import atexit
|
||||
import logging
|
||||
import threading
|
||||
import time
|
||||
from collections.abc import Iterator
|
||||
from contextlib import AbstractContextManager, contextmanager
|
||||
from typing import Any
|
||||
|
||||
import neo4j
|
||||
import neo4j.exceptions
|
||||
from api.attack_paths.retryable_session import RetryableSession
|
||||
from api.attack_paths.sink.base import SinkDatabase
|
||||
from config.env import env
|
||||
from django.conf import settings
|
||||
|
||||
logging.getLogger("neo4j").setLevel(logging.ERROR)
|
||||
logging.getLogger("neo4j").propagate = False
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
SERVICE_UNAVAILABLE_MAX_RETRIES = env.int(
|
||||
"ATTACK_PATHS_SERVICE_UNAVAILABLE_MAX_RETRIES", default=3
|
||||
)
|
||||
READ_QUERY_TIMEOUT_SECONDS = env.int(
|
||||
"ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30
|
||||
)
|
||||
CONN_ACQUISITION_TIMEOUT = env.int("NEO4J_CONN_ACQUISITION_TIMEOUT", default=15)
|
||||
# TCP connect timeout, ordered below the acquisition timeout so an unreachable
|
||||
# host can't pin a request or the readiness probe longer than this.
|
||||
CONNECTION_TIMEOUT = env.int("NEO4J_CONNECTION_TIMEOUT", default=5)
|
||||
MAX_CONNECTION_LIFETIME = env.int("NEO4J_MAX_CONNECTION_LIFETIME", default=7200)
|
||||
MAX_CONNECTION_POOL_SIZE = env.int("NEO4J_MAX_CONNECTION_POOL_SIZE", default=50)
|
||||
|
||||
READ_EXCEPTION_CODES = [
|
||||
"Neo.ClientError.Statement.AccessMode",
|
||||
"Neo.ClientError.Procedure.ProcedureNotFound",
|
||||
]
|
||||
CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."
|
||||
DATABASE_NOT_FOUND_CODE = "Neo.ClientError.Database.DatabaseNotFound"
|
||||
|
||||
|
||||
class Neo4jSink(SinkDatabase):
|
||||
"""Neo4j-backed sink. Multi-database cluster; tenant isolation is physical."""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._driver: neo4j.Driver | None = None
|
||||
self._lock = threading.Lock()
|
||||
self._atexit_registered = False
|
||||
|
||||
# Driver
|
||||
|
||||
def _config(self) -> dict:
|
||||
return settings.DATABASES["neo4j"]
|
||||
|
||||
def _uri(self) -> str:
|
||||
cfg = self._config()
|
||||
host = cfg["HOST"]
|
||||
port = cfg["PORT"]
|
||||
if not host or not port:
|
||||
raise RuntimeError(
|
||||
"NEO4J_HOST / NEO4J_PORT must be set when ATTACK_PATHS_SINK_DATABASE=neo4j"
|
||||
)
|
||||
return f"bolt://{host}:{port}"
|
||||
|
||||
def init(self) -> neo4j.Driver:
|
||||
if self._driver is not None:
|
||||
return self._driver
|
||||
with self._lock:
|
||||
if self._driver is None:
|
||||
cfg = self._config()
|
||||
self._driver = neo4j.GraphDatabase.driver(
|
||||
self._uri(),
|
||||
auth=(cfg["USER"], cfg["PASSWORD"]),
|
||||
keep_alive=True,
|
||||
max_connection_lifetime=MAX_CONNECTION_LIFETIME,
|
||||
connection_timeout=CONNECTION_TIMEOUT,
|
||||
connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
|
||||
max_connection_pool_size=MAX_CONNECTION_POOL_SIZE,
|
||||
)
|
||||
# Eager connectivity check is best-effort:
|
||||
# A Neo4j that is down at boot must not crash the process, same degradation model as Postgres
|
||||
# The driver reconnects lazily on first use
|
||||
# /health/ready surfaces the outage until it recovers
|
||||
try:
|
||||
self._driver.verify_connectivity()
|
||||
|
||||
except Exception:
|
||||
logger.warning(
|
||||
"Neo4j sink unreachable at init; continuing with a lazily-reconnecting driver",
|
||||
exc_info=True,
|
||||
)
|
||||
|
||||
if not self._atexit_registered:
|
||||
atexit.register(self.close)
|
||||
self._atexit_registered = True
|
||||
return self._driver
|
||||
|
||||
def _get_driver(self) -> neo4j.Driver:
|
||||
return self.init()
|
||||
|
||||
def verify_connectivity(self) -> None:
|
||||
self._get_driver().verify_connectivity()
|
||||
|
||||
def close(self) -> None:
|
||||
with self._lock:
|
||||
if self._driver is not None:
|
||||
try:
|
||||
self._driver.close()
|
||||
finally:
|
||||
self._driver = None
|
||||
|
||||
# Sessions
|
||||
|
||||
@contextmanager
|
||||
def get_session(
|
||||
self,
|
||||
database: str | None = None,
|
||||
default_access_mode: str | None = None,
|
||||
) -> Iterator[RetryableSession]:
|
||||
from api.attack_paths.database import (
|
||||
ClientStatementException,
|
||||
GraphDatabaseQueryException,
|
||||
WriteQueryNotAllowedException,
|
||||
)
|
||||
|
||||
session_wrapper: RetryableSession | None = None
|
||||
try:
|
||||
session_wrapper = RetryableSession(
|
||||
session_factory=lambda: self._get_driver().session(
|
||||
database=database, default_access_mode=default_access_mode
|
||||
),
|
||||
max_retries=SERVICE_UNAVAILABLE_MAX_RETRIES,
|
||||
)
|
||||
yield session_wrapper
|
||||
|
||||
except neo4j.exceptions.Neo4jError as exc:
|
||||
if (
|
||||
default_access_mode == neo4j.READ_ACCESS
|
||||
and exc.code
|
||||
and exc.code in READ_EXCEPTION_CODES
|
||||
):
|
||||
raise WriteQueryNotAllowedException(
|
||||
message="Read query not allowed", code=READ_EXCEPTION_CODES[0]
|
||||
)
|
||||
|
||||
message = exc.message if exc.message is not None else str(exc)
|
||||
if exc.code and exc.code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
|
||||
raise ClientStatementException(message=message, code=exc.code)
|
||||
raise GraphDatabaseQueryException(message=message, code=exc.code)
|
||||
|
||||
finally:
|
||||
if session_wrapper is not None:
|
||||
session_wrapper.close()
|
||||
|
||||
# Operations
|
||||
|
||||
def execute_read_query(
|
||||
self,
|
||||
database: str,
|
||||
cypher: str,
|
||||
parameters: dict[str, Any] | None = None,
|
||||
) -> neo4j.graph.Graph:
|
||||
with self.get_session(
|
||||
database, default_access_mode=neo4j.READ_ACCESS
|
||||
) as session:
|
||||
|
||||
def _run(tx: neo4j.ManagedTransaction) -> neo4j.graph.Graph:
|
||||
result = tx.run(
|
||||
cypher, parameters or {}, timeout=READ_QUERY_TIMEOUT_SECONDS
|
||||
)
|
||||
return result.graph()
|
||||
|
||||
return session.execute_read(_run)
|
||||
|
||||
def create_database(self, database: str) -> None:
|
||||
with self.get_session() as session:
|
||||
session.run(
|
||||
"CREATE DATABASE $database IF NOT EXISTS", {"database": database}
|
||||
)
|
||||
|
||||
def drop_database(self, database: str) -> None:
|
||||
with self.get_session() as session:
|
||||
session.run(f"DROP DATABASE `{database}` IF EXISTS DESTROY DATA")
|
||||
|
||||
def drop_subgraph(self, database: str, provider_id: str) -> int:
|
||||
"""Delete all nodes for a provider from a tenant database, batched.
|
||||
|
||||
Deletes relationships then nodes in batches (not `DETACH DELETE`) so a
|
||||
dense provider's graph cannot exceed Neo4j's transaction memory limit.
|
||||
Silently returns 0 if the database doesn't exist.
|
||||
"""
|
||||
from api.attack_paths.database import GraphDatabaseQueryException
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
BATCH_SIZE,
|
||||
PROVIDER_RESOURCE_LABEL,
|
||||
get_provider_label,
|
||||
)
|
||||
|
||||
provider_label = get_provider_label(provider_id)
|
||||
deleted_nodes = 0
|
||||
deleted_relationships = 0
|
||||
relationship_batches = 0
|
||||
node_batches = 0
|
||||
drop_t0 = time.perf_counter()
|
||||
|
||||
logger.info(
|
||||
"Dropping provider graph from Neo4j sink database %s "
|
||||
"(provider=%s, provider_label=%s)",
|
||||
database,
|
||||
provider_id,
|
||||
provider_label,
|
||||
)
|
||||
|
||||
try:
|
||||
logger.info(
|
||||
"Opening Neo4j sink session for provider graph drop "
|
||||
"(database=%s, provider=%s)",
|
||||
database,
|
||||
provider_id,
|
||||
)
|
||||
with self.get_session(database) as session:
|
||||
logger.info(
|
||||
"Opened Neo4j sink session for provider graph drop "
|
||||
"(database=%s, provider=%s)",
|
||||
database,
|
||||
provider_id,
|
||||
)
|
||||
# Phase 1: delete relationships incident to provider nodes in
|
||||
# batches. The undirected pattern matches an edge between two
|
||||
# provider nodes from both ends, so `DISTINCT r` dedupes it to
|
||||
# delete a full batch of unique relationships each round.
|
||||
deleted_count = 1
|
||||
while deleted_count > 0:
|
||||
next_batch = relationship_batches + 1
|
||||
logger.info(
|
||||
"Deleting relationship batch from Neo4j sink database %s "
|
||||
"(provider=%s, batch=%s, total_rels=%s, elapsed=%.3fs)",
|
||||
database,
|
||||
provider_id,
|
||||
next_batch,
|
||||
deleted_relationships,
|
||||
time.perf_counter() - drop_t0,
|
||||
)
|
||||
result = session.run(
|
||||
f"""
|
||||
MATCH (:`{provider_label}`)-[r]-()
|
||||
WITH DISTINCT r LIMIT $batch_size
|
||||
DELETE r
|
||||
RETURN COUNT(r) AS deleted_rels_count
|
||||
""",
|
||||
{"batch_size": BATCH_SIZE},
|
||||
)
|
||||
deleted_count = result.single().get("deleted_rels_count", 0)
|
||||
if deleted_count > 0:
|
||||
relationship_batches += 1
|
||||
deleted_relationships += deleted_count
|
||||
logger.info(
|
||||
"Deleted relationship batch from Neo4j sink database %s "
|
||||
"(provider=%s, batch=%s, deleted_rels=%s, "
|
||||
"total_rels=%s, elapsed=%.3fs)",
|
||||
database,
|
||||
provider_id,
|
||||
relationship_batches,
|
||||
deleted_count,
|
||||
deleted_relationships,
|
||||
time.perf_counter() - drop_t0,
|
||||
)
|
||||
|
||||
# Phase 2: delete the now relationship-free nodes in batches.
|
||||
deleted_count = 1
|
||||
while deleted_count > 0:
|
||||
next_batch = node_batches + 1
|
||||
logger.info(
|
||||
"Deleting node batch from Neo4j sink database %s "
|
||||
"(provider=%s, batch=%s, total_nodes=%s, elapsed=%.3fs)",
|
||||
database,
|
||||
provider_id,
|
||||
next_batch,
|
||||
deleted_nodes,
|
||||
time.perf_counter() - drop_t0,
|
||||
)
|
||||
result = session.run(
|
||||
f"""
|
||||
MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`)
|
||||
WITH n LIMIT $batch_size
|
||||
DELETE n
|
||||
RETURN COUNT(n) AS deleted_nodes_count
|
||||
""",
|
||||
{"batch_size": BATCH_SIZE},
|
||||
)
|
||||
deleted_count = result.single().get("deleted_nodes_count", 0)
|
||||
if deleted_count > 0:
|
||||
node_batches += 1
|
||||
deleted_nodes += deleted_count
|
||||
logger.info(
|
||||
"Deleted node batch from Neo4j sink database %s "
|
||||
"(provider=%s, batch=%s, deleted_nodes=%s, "
|
||||
"total_nodes=%s, elapsed=%.3fs)",
|
||||
database,
|
||||
provider_id,
|
||||
node_batches,
|
||||
deleted_count,
|
||||
deleted_nodes,
|
||||
time.perf_counter() - drop_t0,
|
||||
)
|
||||
|
||||
except GraphDatabaseQueryException as exc:
|
||||
if exc.code == DATABASE_NOT_FOUND_CODE:
|
||||
logger.info(
|
||||
"Skipped provider graph drop from Neo4j sink database %s "
|
||||
"(provider=%s, reason=database_not_found, elapsed=%.3fs)",
|
||||
database,
|
||||
provider_id,
|
||||
time.perf_counter() - drop_t0,
|
||||
)
|
||||
return 0
|
||||
raise
|
||||
|
||||
logger.info(
|
||||
"Finished dropping provider graph from Neo4j sink database %s "
|
||||
"(provider=%s, relationship_batches=%s, deleted_rels=%s, "
|
||||
"node_batches=%s, deleted_nodes=%s, elapsed=%.3fs)",
|
||||
database,
|
||||
provider_id,
|
||||
relationship_batches,
|
||||
deleted_relationships,
|
||||
node_batches,
|
||||
deleted_nodes,
|
||||
time.perf_counter() - drop_t0,
|
||||
)
|
||||
return deleted_nodes
|
||||
|
||||
def has_provider_data(self, database: str, provider_id: str) -> bool:
|
||||
from api.attack_paths.database import GraphDatabaseQueryException
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
PROVIDER_RESOURCE_LABEL,
|
||||
get_provider_label,
|
||||
)
|
||||
|
||||
provider_label = get_provider_label(provider_id)
|
||||
query = (
|
||||
f"MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`) RETURN 1 LIMIT 1"
|
||||
)
|
||||
try:
|
||||
with self.get_session(
|
||||
database, default_access_mode=neo4j.READ_ACCESS
|
||||
) as session:
|
||||
result = session.run(query)
|
||||
return result.single() is not None
|
||||
|
||||
except GraphDatabaseQueryException as exc:
|
||||
if exc.code == DATABASE_NOT_FOUND_CODE:
|
||||
return False
|
||||
raise
|
||||
|
||||
def clear_cache(self, database: str) -> None:
|
||||
from api.attack_paths.database import GraphDatabaseQueryException
|
||||
|
||||
try:
|
||||
with self.get_session(database) as session:
|
||||
session.run("CALL db.clearQueryCaches()")
|
||||
except GraphDatabaseQueryException as exc:
|
||||
logger.warning(
|
||||
f"Failed to clear query cache for database `{database}`: {exc}"
|
||||
)
|
||||
|
||||
# Sync write path
|
||||
|
||||
def ensure_sync_indexes(self, database: str) -> None:
|
||||
"""Create the `_provider_element_id` lookup index on `_ProviderResource`.
|
||||
|
||||
Every synced node carries the `_ProviderResource` label, so a single
|
||||
index covers both node-upserts and relationship endpoint MATCHes.
|
||||
Without this index the rel sync degrades to a label scan per row and
|
||||
large provider syncs become unworkable.
|
||||
"""
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
PROVIDER_ELEMENT_ID_PROPERTY,
|
||||
PROVIDER_RESOURCE_LABEL,
|
||||
)
|
||||
|
||||
query = (
|
||||
f"CREATE INDEX provider_element_id_idx IF NOT EXISTS "
|
||||
f"FOR (n:`{PROVIDER_RESOURCE_LABEL}`) "
|
||||
f"ON (n.`{PROVIDER_ELEMENT_ID_PROPERTY}`)"
|
||||
)
|
||||
with self.get_session(database) as session:
|
||||
session.run(query).consume()
|
||||
|
||||
def write_nodes(
|
||||
self,
|
||||
database: str,
|
||||
labels: str,
|
||||
rows: list[dict[str, Any]],
|
||||
) -> None:
|
||||
if not rows:
|
||||
return
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
PROVIDER_ELEMENT_ID_PROPERTY,
|
||||
PROVIDER_RESOURCE_LABEL,
|
||||
)
|
||||
|
||||
query = f"""
|
||||
UNWIND $rows AS row
|
||||
MERGE (n:`{PROVIDER_RESOURCE_LABEL}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.provider_element_id}})
|
||||
SET n:{labels}
|
||||
SET n += row.props
|
||||
"""
|
||||
with self.get_session(database) as session:
|
||||
session.run(query, {"rows": rows}).consume()
|
||||
|
||||
def write_relationships(
|
||||
self,
|
||||
database: str,
|
||||
rel_type: str,
|
||||
provider_id: str,
|
||||
rows: list[dict[str, Any]],
|
||||
) -> None:
|
||||
if not rows:
|
||||
return
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
PROVIDER_ELEMENT_ID_PROPERTY,
|
||||
PROVIDER_RESOURCE_LABEL,
|
||||
get_provider_label,
|
||||
)
|
||||
|
||||
provider_label = get_provider_label(provider_id)
|
||||
query = f"""
|
||||
UNWIND $rows AS row
|
||||
MATCH (s:`{PROVIDER_RESOURCE_LABEL}`:`{provider_label}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.start_element_id}})
|
||||
MATCH (t:`{PROVIDER_RESOURCE_LABEL}`:`{provider_label}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.end_element_id}})
|
||||
MERGE (s)-[r:`{rel_type}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.provider_element_id}}]->(t)
|
||||
SET r += row.props
|
||||
"""
|
||||
with self.get_session(database) as session:
|
||||
session.run(query, {"rows": rows}).consume()
|
||||
|
||||
# For compatibility with test harnesses that patch the concrete driver
|
||||
def get_driver(self) -> neo4j.Driver:
|
||||
return self._get_driver()
|
||||
|
||||
|
||||
# Helper for tests / external callers that want a read session specifically
|
||||
def get_read_session(
|
||||
sink: Neo4jSink, database: str
|
||||
) -> AbstractContextManager[RetryableSession]:
|
||||
return sink.get_session(database, default_access_mode=neo4j.READ_ACCESS)
|
||||
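`drop_subgraph` in the file above deletes relationships first and nodes second, each in bounded batches, looping until a batch deletes nothing. The loop can be sketched without a database; here plain sets stand in for the graph, every node is assumed to belong to the provider being dropped, and `drop_in_batches` is a hypothetical name.

```python
def drop_in_batches(
    nodes: set[str],
    edges: set[tuple[str, str]],
    batch_size: int,
) -> tuple[int, int]:
    """Empty `edges` then `nodes` in place; return (deleted_nodes, deleted_edges)."""
    deleted_edges = 0
    deleted = 1
    # Phase 1: remove relationships, at most `batch_size` per round.
    while deleted > 0:
        batch = sorted(edges)[:batch_size]  # plays the role of `LIMIT $batch_size`
        edges.difference_update(batch)
        deleted = len(batch)
        deleted_edges += deleted

    deleted_nodes = 0
    deleted = 1
    # Phase 2: the nodes no longer have relationships and can be removed.
    while deleted > 0:
        batch = sorted(nodes)[:batch_size]
        nodes.difference_update(batch)
        deleted = len(batch)
        deleted_nodes += deleted

    return deleted_nodes, deleted_edges
```

Each round touches at most `batch_size` items, which is the property that keeps a single transaction small; the final round always deletes zero items and ends the loop.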
@@ -0,0 +1,524 @@
|
||||
"""AWS Neptune sink implementation.
|
||||
|
||||
Dual Bolt drivers: one against the writer endpoint for workers, one against
|
||||
the reader endpoint for the API read path. If `NEPTUNE_READER_ENDPOINT` is
|
||||
unset the reader falls back to the writer driver so single-node clusters work.
|
||||
|
||||
Neptune is single-database. The `database` argument on the SinkDatabase
|
||||
protocol is ignored; tenant / provider isolation is enforced by labels that
|
||||
the sync step already writes on every node (see tasks/jobs/attack_paths/sync.py).
|
||||
|
||||
SigV4 auth lives at the bottom of this file as `neptune_auth_provider`. The
|
||||
neo4j driver invokes the returned callable on each token refresh.
|
||||
"""
|
||||
|
||||
import atexit
|
||||
import datetime
|
||||
import json
|
||||
import logging
|
||||
import threading
|
||||
import time
|
||||
from collections.abc import Callable, Iterator
|
||||
from contextlib import contextmanager
|
||||
from typing import Any
|
||||
from urllib.parse import urlsplit
|
||||
|
||||
import neo4j
|
||||
import neo4j.exceptions
|
||||
from api.attack_paths.retryable_session import RetryableSession
|
||||
from api.attack_paths.sink.base import SinkDatabase
|
||||
from botocore.auth import SigV4Auth
|
||||
from botocore.awsrequest import AWSRequest
|
||||
from botocore.session import Session as BotoSession
|
||||
from config.env import env
|
||||
from django.conf import settings
|
||||
from neo4j.auth_management import AuthManagers, ExpiringAuth
|
||||
|
||||
logging.getLogger("neo4j").setLevel(logging.ERROR)
|
||||
logging.getLogger("neo4j").propagate = False
|
||||
|
||||
logger = logging.getLogger(__name__)
|
||||
|
||||
SERVICE_UNAVAILABLE_MAX_RETRIES = env.int(
|
||||
"ATTACK_PATHS_SERVICE_UNAVAILABLE_MAX_RETRIES", default=3
|
||||
)
|
||||
READ_QUERY_TIMEOUT_SECONDS = env.int(
|
||||
"ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30
|
||||
)
|
||||
# Neptune serverless cold-start can be >30s; give the driver room
|
||||
CONN_ACQUISITION_TIMEOUT = env.int("NEPTUNE_CONN_ACQUISITION_TIMEOUT", default=60)
|
||||
# TCP connect timeout, ordered below the acquisition timeout so an unreachable
|
||||
# endpoint can't pin a request or the readiness probe longer than this. Kept
|
||||
# generous: cold-start delays query execution, not the socket connect.
|
||||
CONNECTION_TIMEOUT = env.int("NEPTUNE_CONNECTION_TIMEOUT", default=10)
|
||||
# Roll connections hourly so SigV4 rotations and cert refreshes don't strand long-lived pool entries
|
||||
MAX_CONNECTION_LIFETIME = env.int("NEPTUNE_MAX_CONNECTION_LIFETIME", default=3600)
|
||||
MAX_CONNECTION_POOL_SIZE = env.int("NEPTUNE_MAX_CONNECTION_POOL_SIZE", default=50)
|
||||
|
||||
READ_EXCEPTION_CODES = [
|
||||
"Neo.ClientError.Statement.AccessMode",
|
||||
"Neo.ClientError.Procedure.ProcedureNotFound",
|
||||
]
|
||||
CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."
|
||||
|
||||
# Refresh 60s before the 5-minute SigV4 window closes
|
||||
SIGV4_TOKEN_LIFETIME_MINUTES = 4
|
||||
|
||||
|
||||
class NeptuneSink(SinkDatabase):
|
||||
"""Neptune-backed sink. Single database; isolation is label-based."""
|
||||
|
||||
def __init__(self) -> None:
|
||||
self._writer: neo4j.Driver | None = None
|
||||
self._reader: neo4j.Driver | None = None
|
||||
self._lock = threading.Lock()
|
||||
self._atexit_registered = False
|
||||
|
||||
# Config
|
||||
|
||||
def _config(self) -> dict:
|
||||
return settings.DATABASES["neptune"]
|
||||
|
||||
def _bolt_uri(self, endpoint: str, port: str) -> str:
|
||||
return f"bolt+s://{endpoint}:{port}"
|
||||
|
||||
def _https_url(self, endpoint: str, port: str) -> str:
|
||||
return f"https://{endpoint}:{port}"
|
||||
|
||||
def _build_driver(self, endpoint: str) -> neo4j.Driver:
|
||||
cfg = self._config()
|
||||
port = cfg["PORT"]
|
||||
region = cfg["REGION"]
|
||||
if not endpoint or not region:
|
||||
raise RuntimeError(
|
||||
"NEPTUNE_WRITER_ENDPOINT and AWS_REGION must be set when "
|
||||
"ATTACK_PATHS_SINK_DATABASE=neptune"
|
||||
)
|
||||
return neo4j.GraphDatabase.driver(
|
||||
self._bolt_uri(endpoint, port),
|
||||
auth=AuthManagers.bearer(
|
||||
neptune_auth_provider(region, self._https_url(endpoint, port))
|
||||
),
|
||||
keep_alive=True,
|
||||
max_connection_lifetime=MAX_CONNECTION_LIFETIME,
|
||||
connection_timeout=CONNECTION_TIMEOUT,
|
||||
connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
|
||||
max_connection_pool_size=MAX_CONNECTION_POOL_SIZE,
|
||||
max_transaction_retry_time=0,
|
||||
)
|
||||
|
||||
# Lifecycle
|
||||
|
||||
def init(self) -> None:
|
||||
if self._writer is not None:
|
||||
return
|
||||
with self._lock:
|
||||
if self._writer is None:
|
||||
cfg = self._config()
|
||||
writer_endpoint = cfg["WRITER_ENDPOINT"]
|
||||
reader_endpoint = cfg["READER_ENDPOINT"] or writer_endpoint
|
||||
|
||||
# Eager connectivity checks are best-effort: a Neptune cluster that is
# down at boot must not crash the process (the same degradation model
# as Postgres). Drivers reconnect lazily on first use, and
# /health/ready surfaces the outage until the cluster recovers.
|
||||
self._writer = self._build_driver(writer_endpoint)
|
||||
self._verify_best_effort(self._writer, "writer")
|
||||
|
||||
if reader_endpoint == writer_endpoint:
|
||||
self._reader = self._writer
|
||||
|
||||
else:
|
||||
self._reader = self._build_driver(reader_endpoint)
|
||||
self._verify_best_effort(self._reader, "reader")
|
||||
|
||||
if not self._atexit_registered:
|
||||
atexit.register(self.close)
|
||||
self._atexit_registered = True
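The `init()` method above follows a check, lock, re-check pattern so that concurrent callers build the drivers only once. A minimal sketch of that pattern, assuming a hypothetical `LazyResource` class and a counting `build_driver` factory in place of the real neo4j drivers (both names are illustrative and not part of the commit):

```python
import threading


class LazyResource:
    """Hypothetical stand-in for a sink that builds its driver on first use."""

    def __init__(self, factory):
        self._factory = factory
        self._value = None
        self._lock = threading.Lock()

    def get(self):
        # Fast path: skip the lock once the value exists.
        if self._value is not None:
            return self._value
        with self._lock:
            # Re-check under the lock: another thread may have won the race.
            if self._value is None:
                self._value = self._factory()
        return self._value


calls = []


def build_driver():
    # Illustrative factory: records each invocation instead of opening a connection.
    calls.append(1)
    return object()


resource = LazyResource(build_driver)
threads = [threading.Thread(target=resource.get) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because the factory runs only while the lock is held and only when the value is still unset, the eight threads share a single instance.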
|
||||
|
||||
def close(self) -> None:
|
||||
with self._lock:
|
||||
# `Driver.close()` is idempotent, so closing the same driver twice
|
||||
# (when reader aliases writer on single-endpoint configs) is safe
|
||||
for driver in (self._reader, self._writer):
|
||||
if driver is None:
|
||||
continue
|
||||
try:
|
||||
driver.close()
|
||||
except Exception: # pragma: no cover - best-effort
|
||||
pass
|
||||
self._writer = None
|
||||
self._reader = None
|
||||
|
||||
# Sessions
|
||||
|
||||
def _get_writer(self) -> neo4j.Driver:
|
||||
self.init()
|
||||
assert self._writer is not None
|
||||
return self._writer
|
||||
|
||||
def _get_reader(self) -> neo4j.Driver:
|
||||
self.init()
|
||||
assert self._reader is not None
|
||||
return self._reader
|
||||
|
||||
@staticmethod
|
||||
def _verify_best_effort(driver: neo4j.Driver, role: str) -> None:
|
||||
try:
|
||||
driver.verify_connectivity()
|
||||
|
||||
except Exception:
|
||||
logger.warning(
|
||||
"Neptune %s endpoint unreachable at init; continuing with a lazily-reconnecting driver",
|
||||
role,
|
||||
exc_info=True,
|
||||
)
|
||||
|
||||
def verify_connectivity(self) -> None:
|
||||
# The API read path uses the reader driver. On single-endpoint clusters
# the reader aliases the writer, so this check also covers the writer.
# A writer-only outage concerns the workers (which expose no HTTP probe)
# and deliberately does not fail API readiness.
|
||||
self._get_reader().verify_connectivity()
|
||||
|
||||
@contextmanager
|
||||
def get_session(
|
||||
self,
|
||||
database: str | None = None, # noqa: ARG002 - ignored on Neptune
|
||||
default_access_mode: str | None = None,
|
||||
) -> Iterator[RetryableSession]:
|
||||
from api.attack_paths.database import (
|
||||
ClientStatementException,
|
||||
GraphDatabaseQueryException,
|
||||
WriteQueryNotAllowedException,
|
||||
)
|
||||
|
||||
driver = (
|
||||
self._get_reader()
|
||||
if default_access_mode == neo4j.READ_ACCESS
|
||||
else self._get_writer()
|
||||
)
|
||||
|
||||
session_wrapper: RetryableSession | None = None
|
||||
try:
|
||||
session_wrapper = RetryableSession(
|
||||
session_factory=lambda: driver.session(
|
||||
default_access_mode=default_access_mode
|
||||
),
|
||||
max_retries=SERVICE_UNAVAILABLE_MAX_RETRIES,
|
||||
)
|
||||
yield session_wrapper
|
||||
|
||||
except neo4j.exceptions.Neo4jError as exc:
|
||||
if (
|
||||
default_access_mode == neo4j.READ_ACCESS
|
||||
and exc.code
|
||||
and exc.code in READ_EXCEPTION_CODES
|
||||
):
|
||||
raise WriteQueryNotAllowedException(
|
||||
message="Read query not allowed", code=READ_EXCEPTION_CODES[0]
|
||||
)
|
||||
|
||||
message = exc.message if exc.message is not None else str(exc)
|
||||
if exc.code and exc.code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
|
||||
raise ClientStatementException(message=message, code=exc.code)
|
||||
raise GraphDatabaseQueryException(message=message, code=exc.code)
|
||||
|
||||
finally:
|
||||
if session_wrapper is not None:
|
||||
session_wrapper.close()
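The `except` branch of `get_session` maps driver error codes onto three application exceptions. The decision order can be sketched in isolation as follows; the `classify` helper is hypothetical and returns exception names as strings so the example runs without the neo4j driver or the Prowler exception classes:

```python
from typing import Optional

READ_EXCEPTION_CODES = [
    "Neo.ClientError.Statement.AccessMode",
    "Neo.ClientError.Procedure.ProcedureNotFound",
]
CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."


def classify(code: Optional[str], read_access: bool) -> str:
    """Return the name of the exception the session wrapper would raise."""
    # Read-mode violations are checked first and only for read sessions.
    if read_access and code and code in READ_EXCEPTION_CODES:
        return "WriteQueryNotAllowedException"
    # Any other statement-level client error is reported as such.
    if code and code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
        return "ClientStatementException"
    # Everything else, including errors without a code, is a generic failure.
    return "GraphDatabaseQueryException"
```

Note that the same `AccessMode` code yields a different result depending on the session's access mode, because the first check applies only to read sessions.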
|
||||
|
||||
# Operations
|
||||
|
||||
def execute_read_query(
|
||||
self,
|
||||
database: str, # noqa: ARG002 - ignored on Neptune
|
||||
cypher: str,
|
||||
parameters: dict[str, Any] | None = None,
|
||||
) -> neo4j.graph.Graph:
|
||||
with self.get_session(default_access_mode=neo4j.READ_ACCESS) as session:
|
||||
|
||||
def _run(tx: neo4j.ManagedTransaction) -> neo4j.graph.Graph:
|
||||
result = tx.run(
|
||||
cypher, parameters or {}, timeout=READ_QUERY_TIMEOUT_SECONDS
|
||||
)
|
||||
return result.graph()
|
||||
|
||||
return session.execute_read(_run)
|
||||
|
||||
def create_database(self, database: str) -> None: # noqa: ARG002
|
||||
# Neptune clusters are single-database; there is nothing to create.
|
||||
return None
|
||||
|
||||
def drop_database(self, database: str) -> None: # noqa: ARG002
|
||||
# Neptune clusters are single-database; there is nothing to drop.
|
||||
return None
|
||||
|
||||
def drop_subgraph(self, database: str, provider_id: str) -> int: # noqa: ARG002
|
||||
"""Delete a provider's subgraph in two bounded phases.

Neptune write transactions are capped at ~2 minutes. A naive
`DETACH DELETE` on a label-scanned batch grows unbounded with graph
density (one node can drag thousands of relationships into the same
transaction). Instead:

1. Delete relationships incident to provider nodes, one fixed-size
   batch per transaction.
2. Delete the now-orphaned nodes, one fixed-size batch per transaction.

Each transaction does work proportional to `batch_size`, never to the
graph's branching factor.
"""
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
BATCH_SIZE,
|
||||
PROVIDER_RESOURCE_LABEL,
|
||||
get_provider_label,
|
||||
)
|
||||
|
||||
provider_label = get_provider_label(provider_id)
|
||||
deleted_relationships = 0
|
||||
relationship_batches = 0
|
||||
node_batches = 0
|
||||
drop_t0 = time.perf_counter()
|
||||
|
||||
logger.info(
|
||||
"Dropping provider graph from Neptune sink "
|
||||
"(provider=%s, provider_label=%s)",
|
||||
provider_id,
|
||||
provider_label,
|
||||
)
|
||||
|
||||
logger.info(
|
||||
"Opening Neptune writer session for provider graph drop (provider=%s)",
|
||||
provider_id,
|
||||
)
|
||||
with self.get_session() as session:
|
||||
logger.info(
|
||||
"Opened Neptune writer session for provider graph drop (provider=%s)",
|
||||
provider_id,
|
||||
)
|
||||
while True:
|
||||
next_batch = relationship_batches + 1
|
||||
logger.info(
|
||||
"Deleting relationship batch from Neptune sink "
|
||||
"(provider=%s, batch=%s, total_rels=%s, elapsed=%.3fs)",
|
||||
provider_id,
|
||||
next_batch,
|
||||
deleted_relationships,
|
||||
time.perf_counter() - drop_t0,
|
||||
)
|
||||
result = session.run(
|
||||
f"""
|
||||
MATCH (:`{provider_label}`)-[r]-()
|
||||
WITH DISTINCT r LIMIT $batch_size
|
||||
DELETE r
|
||||
RETURN COUNT(r) AS deleted_rels_count
|
||||
""",
|
||||
{"batch_size": BATCH_SIZE},
|
||||
)
|
||||
record = result.single()
|
||||
deleted_rels = (record["deleted_rels_count"] if record else 0) or 0
|
||||
if deleted_rels == 0:
|
||||
break
|
||||
relationship_batches += 1
|
||||
deleted_relationships += deleted_rels
|
||||
logger.info(
|
||||
"Deleted relationship batch from Neptune sink "
|
||||
"(provider=%s, batch=%s, deleted_rels=%s, total_rels=%s, "
|
||||
"elapsed=%.3fs)",
|
||||
provider_id,
|
||||
relationship_batches,
|
||||
deleted_rels,
|
||||
deleted_relationships,
|
||||
time.perf_counter() - drop_t0,
|
||||
)
|
||||
|
||||
deleted_nodes = 0
|
||||
while True:
|
||||
next_batch = node_batches + 1
|
||||
logger.info(
|
||||
"Deleting node batch from Neptune sink "
|
||||
"(provider=%s, batch=%s, total_nodes=%s, elapsed=%.3fs)",
|
||||
provider_id,
|
||||
next_batch,
|
||||
deleted_nodes,
|
||||
time.perf_counter() - drop_t0,
|
||||
)
|
||||
result = session.run(
|
||||
f"""
|
||||
MATCH (n:`{PROVIDER_RESOURCE_LABEL}`:`{provider_label}`)
|
||||
WITH n LIMIT $batch_size
|
||||
DELETE n
|
||||
RETURN COUNT(n) AS deleted_nodes_count
|
||||
""",
|
||||
{"batch_size": BATCH_SIZE},
|
||||
)
|
||||
record = result.single()
|
||||
deleted = (record["deleted_nodes_count"] if record else 0) or 0
|
||||
if deleted == 0:
|
||||
break
|
||||
node_batches += 1
|
||||
deleted_nodes += deleted
|
||||
logger.info(
|
||||
"Deleted node batch from Neptune sink "
|
||||
"(provider=%s, batch=%s, deleted_nodes=%s, total_nodes=%s, "
|
||||
"elapsed=%.3fs)",
|
||||
provider_id,
|
||||
node_batches,
|
||||
deleted,
|
||||
deleted_nodes,
|
||||
time.perf_counter() - drop_t0,
|
||||
)
|
||||
|
||||
logger.info(
|
||||
"Finished dropping provider graph from Neptune sink "
|
||||
"(provider=%s, relationship_batches=%s, deleted_rels=%s, "
|
||||
"node_batches=%s, deleted_nodes=%s, elapsed=%.3fs)",
|
||||
provider_id,
|
||||
relationship_batches,
|
||||
deleted_relationships,
|
||||
node_batches,
|
||||
deleted_nodes,
|
||||
time.perf_counter() - drop_t0,
|
||||
)
|
||||
return deleted_nodes
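The two-phase drop above can be illustrated without a database. The sketch below assumes a hypothetical in-memory `FakeGraph` and a generic `drain` loop; it is not Neptune code, only a model of why relationships go first and why each call stays bounded by the batch size:

```python
def drain(delete_batch, batch_size):
    """Call delete_batch(batch_size) until it reports zero deletions."""
    total = 0
    batches = 0
    while True:
        deleted = delete_batch(batch_size)
        if deleted == 0:
            break
        batches += 1
        total += deleted
    return total, batches


class FakeGraph:
    """Hypothetical in-memory graph; relationships are (start, end) tuples."""

    def __init__(self, nodes, rels):
        self.nodes = set(nodes)
        self.rels = set(rels)

    def delete_rels(self, limit):
        batch = sorted(self.rels)[:limit]
        self.rels.difference_update(batch)
        return len(batch)

    def delete_nodes(self, limit):
        # Mirrors a plain DELETE: a node is removable only once nothing touches it.
        orphans = sorted(
            n for n in self.nodes if not any(n in rel for rel in self.rels)
        )
        batch = orphans[:limit]
        self.nodes.difference_update(batch)
        return len(batch)


# One hub node connected to four others, the dense case the docstring warns about.
graph = FakeGraph(nodes=range(5), rels=[(0, i) for i in range(1, 5)])
rels_deleted, rel_batches = drain(graph.delete_rels, batch_size=2)
nodes_deleted, node_batches = drain(graph.delete_nodes, batch_size=2)
```

Each call to `delete_rels` or `delete_nodes` removes at most `batch_size` items, so the cost of one step does not depend on how many relationships the hub node carries.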
|
||||
|
||||
def has_provider_data(self, database: str, provider_id: str) -> bool: # noqa: ARG002
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
PROVIDER_RESOURCE_LABEL,
|
||||
get_provider_label,
|
||||
)
|
||||
|
||||
provider_label = get_provider_label(provider_id)
|
||||
query = (
|
||||
f"MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`) RETURN 1 LIMIT 1"
|
||||
)
|
||||
with self.get_session(default_access_mode=neo4j.READ_ACCESS) as session:
|
||||
result = session.run(query)
|
||||
return result.single() is not None
|
||||
|
||||
def clear_cache(self, database: str) -> None: # noqa: ARG002
|
||||
# Neptune has no user-facing cache-clear procedure; no-op.
|
||||
return None
|
||||
|
||||
# Sync write path
|
||||
|
||||
def ensure_sync_indexes(self, database: str) -> None: # noqa: ARG002
|
||||
# Neptune routes node and relationship lookups through `~id`, which is the cluster's primary key
|
||||
# No additional index is needed or supported
|
||||
return None
|
||||
|
||||
def write_nodes(
|
||||
self,
|
||||
database: str, # noqa: ARG002
|
||||
labels: str,
|
||||
rows: list[dict[str, Any]],
|
||||
) -> None:
|
||||
if not rows:
|
||||
return
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
PROVIDER_ELEMENT_ID_PROPERTY,
|
||||
PROVIDER_RESOURCE_LABEL,
|
||||
)
|
||||
|
||||
# MERGE on `~id` is the documented, engine-optimized idempotent upsert
# pattern for Neptune openCypher. The label inside the MERGE matters:
# Neptune assigns a default `vertex` label to any node created without
# an explicit one, so we pin `_ProviderResource` (which every synced
# node carries anyway) at MERGE time. Additional labels are added
# afterwards by the `SET n:...` clause.
#
# We also write `_provider_element_id` as a regular property so that
# non-sync code (drop_subgraph, query helpers) keeps a stable contract
# that does not depend on `~id`.
|
||||
query = f"""
|
||||
UNWIND $rows AS row
|
||||
MERGE (n:`{PROVIDER_RESOURCE_LABEL}` {{`~id`: row.provider_element_id}})
|
||||
SET n:{labels}
|
||||
SET n += row.props
|
||||
SET n.`{PROVIDER_ELEMENT_ID_PROPERTY}` = row.provider_element_id
|
||||
"""
|
||||
with self.get_session() as session:
|
||||
session.run(query, {"rows": rows}).consume()
|
||||
|
||||
def write_relationships(
|
||||
self,
|
||||
database: str, # noqa: ARG002
|
||||
rel_type: str,
|
||||
provider_id: str, # noqa: ARG002 - encoded in start/end `~id` already
|
||||
rows: list[dict[str, Any]],
|
||||
) -> None:
|
||||
if not rows:
|
||||
return
|
||||
from tasks.jobs.attack_paths.config import PROVIDER_ELEMENT_ID_PROPERTY
|
||||
|
||||
# `id(n) = $value` is Neptune's parameterized fast path; both endpoint
|
||||
# MATCHes resolve in O(1) via the system `~id`, so per-row work stays
|
||||
# bounded regardless of batch size
|
||||
query = f"""
|
||||
UNWIND $rows AS row
|
||||
MATCH (s) WHERE id(s) = row.start_element_id
|
||||
MATCH (e) WHERE id(e) = row.end_element_id
|
||||
MERGE (s)-[r:`{rel_type}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.provider_element_id}}]->(e)
|
||||
SET r += row.props
|
||||
"""
|
||||
with self.get_session() as session:
|
||||
session.run(query, {"rows": rows}).consume()
|
||||
|
||||
# Test helpers
|
||||
|
||||
def get_writer(self) -> neo4j.Driver:
|
||||
return self._get_writer()
|
||||
|
||||
def get_reader(self) -> neo4j.Driver:
|
||||
return self._get_reader()
|
||||
|
||||
|
||||
# SigV4 auth provider
|
||||
|
||||
|
||||
class _NeptuneAuthToken(neo4j.Auth):
|
||||
"""Neo4j Auth backed by a SigV4-signed GET to `/opencypher`."""
|
||||
|
||||
def __init__(self, region: str, url: str) -> None:
|
||||
session = BotoSession()
|
||||
credentials = session.get_credentials()
|
||||
if credentials is None:
|
||||
raise RuntimeError(
|
||||
"No AWS credentials available for Neptune SigV4 signing. "
|
||||
"Ensure the boto3 credential chain can resolve."
|
||||
)
|
||||
credentials = credentials.get_frozen_credentials()
|
||||
|
||||
request = AWSRequest(method="GET", url=url + "/opencypher")
|
||||
# SigV4 canonical Host must carry the real `host:port`
|
||||
# Neptune runs on a non-default port (8182), so `.hostname` would drop it and break signing
|
||||
request.headers.add_header("Host", urlsplit(url).netloc)
|
||||
SigV4Auth(credentials, "neptune-db", region).add_auth(request)
|
||||
|
||||
auth_obj = {
|
||||
header: request.headers[header]
|
||||
for header in (
|
||||
"Authorization",
|
||||
"X-Amz-Date",
|
||||
"X-Amz-Security-Token",
|
||||
"Host",
|
||||
)
|
||||
if header in request.headers
|
||||
}
|
||||
auth_obj["HttpMethod"] = "GET"
|
||||
|
||||
super().__init__("basic", "username", json.dumps(auth_obj))
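Two details of the token construction above are easy to check in isolation: `urlsplit(...).netloc` keeps the port while `.hostname` drops it, and the header filter omits `X-Amz-Security-Token` when the credentials carry no session token. A minimal sketch, assuming a hypothetical endpoint and placeholder header values instead of a real SigV4 signature:

```python
import json
from urllib.parse import urlsplit

SIGNED_HEADERS = ("Authorization", "X-Amz-Date", "X-Amz-Security-Token", "Host")


def build_auth_payload(headers: dict) -> str:
    """Serialize only the headers that are present, plus the HTTP method."""
    auth_obj = {name: headers[name] for name in SIGNED_HEADERS if name in headers}
    auth_obj["HttpMethod"] = "GET"
    return json.dumps(auth_obj)


url = "https://db.example.internal:8182"  # hypothetical endpoint
parts = urlsplit(url)

# Placeholder values; a real request would be signed by botocore's SigV4Auth.
headers = {
    "Authorization": "AWS4-HMAC-SHA256 placeholder",
    "X-Amz-Date": "20260101T000000Z",
    "Host": parts.netloc,
}
payload = json.loads(build_auth_payload(headers))
```

With long-lived credentials (no session token) the payload contains four keys; with temporary credentials the security token header would be carried along as a fifth.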
|
||||
|
||||
|
||||
def neptune_auth_provider(region: str, https_url: str) -> Callable[[], ExpiringAuth]:
|
||||
"""Return a callable the neo4j driver can invoke to refresh credentials."""
|
||||
|
||||
def _provider() -> ExpiringAuth:
|
||||
token = _NeptuneAuthToken(region, https_url)
|
||||
expires_at = (
|
||||
datetime.datetime.now(datetime.UTC)
|
||||
+ datetime.timedelta(minutes=SIGV4_TOKEN_LIFETIME_MINUTES)
|
||||
).timestamp()
|
||||
return ExpiringAuth(auth=token, expires_at=expires_at)
|
||||
|
||||
return _provider
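The expiry arithmetic in `_provider` can be checked on its own. The sketch below restates the 5-minute window from the source comment as a constant and uses a hypothetical `needs_refresh` check as a stand-in for whatever the driver does internally when a token nears expiry (that internal behaviour is an assumption, not something the commit shows):

```python
import datetime

SIGV4_WINDOW_SECONDS = 5 * 60  # validity window stated in the source comment
SIGV4_TOKEN_LIFETIME_MINUTES = 4


def expiry_timestamp(signed_at: datetime.datetime) -> float:
    """Expiry reported to the driver: signing time plus the shortened lifetime."""
    lifetime = datetime.timedelta(minutes=SIGV4_TOKEN_LIFETIME_MINUTES)
    return (signed_at + lifetime).timestamp()


def needs_refresh(now: datetime.datetime, expires_at: float) -> bool:
    # Hypothetical check: refresh once the reported expiry has been reached.
    return now.timestamp() >= expires_at


signed_at = datetime.datetime(2026, 1, 1, tzinfo=datetime.timezone.utc)
expires_at = expiry_timestamp(signed_at)
# Seconds left in the real signature window when the reported expiry hits.
safety_margin = SIGV4_WINDOW_SECONDS - (expires_at - signed_at.timestamp())
```

Reporting a 4-minute lifetime leaves a 60-second margin inside the 5-minute window, which matches the comment next to `SIGV4_TOKEN_LIFETIME_MINUTES`.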
|
||||
@@ -5,6 +5,7 @@ from typing import Any
|
||||
import neo4j
|
||||
from api.attack_paths import AttackPathsQueryDefinition
|
||||
from api.attack_paths import database as graph_database
|
||||
from api.attack_paths import sink as sink_module
|
||||
from api.attack_paths.cypher_sanitizer import (
|
||||
inject_provider_label,
|
||||
validate_custom_query,
|
||||
@@ -14,7 +15,9 @@ from api.attack_paths.queries.schema import (
|
||||
RAW_SCHEMA_URL,
|
||||
get_cartography_schema_query,
|
||||
)
|
||||
from api.models import AttackPathsScan
|
||||
from config.custom_logging import BackendLogger
|
||||
from config.env import env
|
||||
from rest_framework.exceptions import APIException, PermissionDenied, ValidationError
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
INTERNAL_LABELS,
|
||||
@@ -26,6 +29,10 @@ from tasks.jobs.attack_paths.config import (
|
||||
logger = logging.getLogger(BackendLogger.API)
|
||||
|
||||
|
||||
def _custom_query_timeout_ms() -> int:
|
||||
return env.int("ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30) * 1000
|
||||
|
||||
|
||||
# Predefined query helpers
|
||||
|
||||
|
||||
@@ -102,13 +109,13 @@ def execute_query(
|
||||
definition: AttackPathsQueryDefinition,
|
||||
parameters: dict[str, Any],
|
||||
provider_id: str,
|
||||
scan: AttackPathsScan,
|
||||
) -> dict[str, Any]:
|
||||
try:
|
||||
graph = graph_database.execute_read_query(
|
||||
database=database_name,
|
||||
cypher=definition.cypher,
|
||||
parameters=parameters,
|
||||
)
|
||||
# TODO: drop after Neptune cutover
|
||||
# Route reads by the scan row's recorded sink, not by current settings.
|
||||
backend = sink_module.get_backend_for_scan(scan)
|
||||
graph = backend.execute_read_query(database_name, definition.cypher, parameters)
|
||||
return _serialize_graph(graph, provider_id)
|
||||
|
||||
except graph_database.WriteQueryNotAllowedException:
|
||||
@@ -142,22 +149,31 @@ def execute_custom_query(
|
||||
database_name: str,
|
||||
cypher: str,
|
||||
provider_id: str,
|
||||
scan: AttackPathsScan,
|
||||
) -> dict[str, Any]:
|
||||
# Defense-in-depth for custom queries:
|
||||
# 1. neo4j.READ_ACCESS — prevents mutations at the driver level
|
||||
# 2. inject_provider_label() — regex-based label injection scopes node patterns
|
||||
# 3. _serialize_graph() — post-query filter drops nodes without the provider label
|
||||
# 1. `neo4j.READ_ACCESS` — prevents mutations at the driver level
|
||||
# 2. `inject_provider_label()` — regex-based label injection scopes node patterns
|
||||
# 3. `_serialize_graph()` — post-query filter drops nodes without the provider label
|
||||
# 4. `USING QUERY:TIMEOUTMILLISECONDS` on Neptune — server-side runaway cutoff
|
||||
#
|
||||
# Layer 2 is best-effort (regex can't fully parse Cypher);
|
||||
# layer 3 is the safety net that guarantees provider isolation.
|
||||
validate_custom_query(cypher)
|
||||
cypher = inject_provider_label(cypher, provider_id)
|
||||
|
||||
# TODO: drop after Neptune cutover
|
||||
backend = sink_module.get_backend_for_scan(scan)
|
||||
|
||||
# Neptune enforces a cluster-level query timeout; prepending the hint
|
||||
# makes the limit explicit and matches the client-side read timeout.
|
||||
# Applies only when the scan's graph lives in Neptune.
|
||||
if getattr(scan, "sink_backend", None) == "neptune":
|
||||
timeout_ms = _custom_query_timeout_ms()
|
||||
cypher = f"USING QUERY:TIMEOUTMILLISECONDS {timeout_ms}\n{cypher}"
|
||||
|
||||
try:
|
||||
graph = graph_database.execute_read_query(
|
||||
database=database_name,
|
||||
cypher=cypher,
|
||||
)
|
||||
graph = backend.execute_read_query(database_name, cypher, None)
|
||||
serialized = _serialize_graph(graph, provider_id)
|
||||
return _truncate_graph(serialized)
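The timeout hint added in this hunk is a plain string prefix applied only for Neptune-backed scans. A small sketch of that rule, using a hypothetical `with_neptune_timeout` helper that takes the timeout as an argument instead of reading `ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS` from the environment:

```python
def with_neptune_timeout(cypher: str, sink_backend: str, timeout_seconds: int = 30) -> str:
    """Prepend Neptune's query-timeout hint when the scan lives in Neptune."""
    if sink_backend != "neptune":
        # Neo4j-backed scans run the query unchanged.
        return cypher
    timeout_ms = timeout_seconds * 1000
    return f"USING QUERY:TIMEOUTMILLISECONDS {timeout_ms}\n{cypher}"


neo4j_query = with_neptune_timeout("MATCH (n) RETURN n", "neo4j")
neptune_query = with_neptune_timeout("MATCH (n) RETURN n", "neptune")
```

Keeping the hint on its own first line leaves the user's query text intact, which is what the `startswith` assertion in the accompanying test relies on.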
|
||||
|
||||
@@ -180,10 +196,11 @@ def execute_custom_query(
|
||||
|
||||
|
||||
def get_cartography_schema(
|
||||
database_name: str, provider_id: str
|
||||
database_name: str, provider_id: str, scan: AttackPathsScan
|
||||
) -> dict[str, str] | None:
|
||||
try:
|
||||
with graph_database.get_session(
|
||||
backend = sink_module.get_backend_for_scan(scan)
|
||||
with backend.get_session(
|
||||
database_name, default_access_mode=neo4j.READ_ACCESS
|
||||
) as session:
|
||||
result = session.run(get_cartography_schema_query(provider_id))
|
||||
|
||||
@@ -2,8 +2,9 @@
|
||||
Format (draft-inadarei-api-health-check-06).
|
||||
|
||||
Liveness reports only process status. Readiness verifies that PostgreSQL,
|
||||
Valkey and Neo4j are reachable and returns per-dependency detail when any
|
||||
of them is unreachable.
|
||||
Valkey and the attack-paths graph store (Neo4j or Neptune, per
|
||||
``ATTACK_PATHS_SINK_DATABASE``) are reachable and returns per-dependency
|
||||
detail when any of them is unreachable.
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
@@ -11,6 +12,8 @@ from __future__ import annotations
|
||||
import logging
|
||||
import threading
|
||||
import time
|
||||
from concurrent.futures import ThreadPoolExecutor
|
||||
from concurrent.futures import TimeoutError as FuturesTimeoutError
|
||||
from contextlib import suppress
|
||||
from datetime import UTC, datetime
|
||||
from typing import Any
|
||||
@@ -37,9 +40,28 @@ STATUS_FAIL = "fail"
|
||||
STATUS_WARN = "warn"
|
||||
|
||||
# Short socket timeout so a stuck Valkey cannot stall the probe.
|
||||
# Neo4j inherits its driver-level ``connection_acquisition_timeout``.
|
||||
VALKEY_PROBE_TIMEOUT_SECONDS = 2
|
||||
|
||||
# Probe-scoped budget for the graph database.
|
||||
# ``Driver.verify_connectivity()`` takes no timeout; its only bound is the
|
||||
# driver-level ``connection_acquisition_timeout`` (60s on Neptune). The
|
||||
# probe needs its own budget, independent of the workload driver, so a
|
||||
# graph-database outage cannot pin a worker thread (and the readiness lock)
|
||||
# for a minute.
|
||||
GRAPH_DB_PROBE_TIMEOUT_SECONDS = 5
|
||||
|
||||
# Bounded pool that enforces ``GRAPH_DB_PROBE_TIMEOUT_SECONDS``. If the
|
||||
# graph database is unreachable the probe call blocks until the driver's
|
||||
# own acquisition timeout fires; we abandon the future after the budget and
|
||||
# report ``fail``. Orphaned tasks are capped by ``max_workers`` plus the 3s
|
||||
# readiness cache plus the per-IP throttle, so they cannot pile up: worst
|
||||
# case during a graph-database outage is every readiness call failing fast
|
||||
# in ``GRAPH_DB_PROBE_TIMEOUT_SECONDS`` with at most 2 background threads
|
||||
# stuck for <= the driver acquisition timeout.
|
||||
_graph_db_probe_executor = ThreadPoolExecutor(
|
||||
max_workers=2, thread_name_prefix="health-graph-db-probe"
|
||||
)
|
||||
|
||||
# Brief cache window so high-frequency probes (ALB target groups, scrapers)
|
||||
# do not stampede the actual dependency checks.
|
||||
CACHE_CONTROL_HEADER = "max-age=3, must-revalidate"
|
||||
@@ -109,11 +131,24 @@ def _probe_valkey() -> None:
|
||||
client.close()
|
||||
|
||||
|
||||
def _probe_neo4j() -> None:
|
||||
# Lazy import: avoids pulling attack_paths into the boot import graph.
|
||||
from api.attack_paths.database import get_driver
|
||||
def _graph_db_component_id() -> str:
|
||||
"""Return the active graph database name for the ``componentId`` field."""
|
||||
return settings.ATTACK_PATHS_SINK_DATABASE.strip().lower()
|
||||
|
||||
get_driver().verify_connectivity()
|
||||
|
||||
def _probe_graph_db() -> None:
|
||||
# Lazy import: avoids pulling attack_paths into the boot import graph
|
||||
from api.attack_paths.database import verify_connectivity
|
||||
|
||||
future = _graph_db_probe_executor.submit(verify_connectivity)
|
||||
try:
|
||||
future.result(timeout=GRAPH_DB_PROBE_TIMEOUT_SECONDS)
|
||||
except FuturesTimeoutError as exc:
|
||||
# Do not wait for the abandoned task; it ends when the driver's own acquisition timeout fires
|
||||
future.cancel()
|
||||
raise TimeoutError(
|
||||
f"graph-db probe exceeded {GRAPH_DB_PROBE_TIMEOUT_SECONDS}s"
|
||||
) from exc
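The probe budget works by waiting on a future rather than on the blocking call itself. A self-contained sketch of the same idea, with a hypothetical `probe` helper and `time.sleep` standing in for an unreachable graph database:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeoutError

# Small bounded pool, as in the module above; the size is illustrative.
executor = ThreadPoolExecutor(max_workers=2)


def probe(check, budget_seconds):
    """Run check() in the pool and give up waiting after budget_seconds."""
    future = executor.submit(check)
    try:
        future.result(timeout=budget_seconds)
    except FuturesTimeoutError as exc:
        # cancel() cannot stop a task that is already running; the worker
        # thread finishes on its own while the caller reports failure now.
        future.cancel()
        raise TimeoutError(f"probe exceeded {budget_seconds}s") from exc


# A fast check returns within budget.
probe(lambda: None, budget_seconds=1.0)

# A slow check is abandoned once the budget is spent.
try:
    probe(lambda: time.sleep(0.3), budget_seconds=0.05)
    timed_out = False
except TimeoutError:
    timed_out = True
```

The caller gets its answer after the budget, while the abandoned task keeps occupying one pool thread until it ends, which is why the pool size bounds how many such tasks can exist at once.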
|
||||
|
||||
|
||||
def _build_check_entry(
|
||||
@@ -176,14 +211,18 @@ def _readiness_payload() -> tuple[dict[str, Any], int]:
|
||||
):
|
||||
return snapshot[1], snapshot[2]
|
||||
|
||||
graph_db_component_id = _graph_db_component_id()
|
||||
|
||||
postgres_result, postgres_ms = _measure("postgres", _probe_postgres)
|
||||
valkey_result, valkey_ms = _measure("valkey", _probe_valkey)
|
||||
neo4j_result, neo4j_ms = _measure("neo4j", _probe_neo4j)
|
||||
graph_db_result, graph_db_ms = _measure(graph_db_component_id, _probe_graph_db)
|
||||
|
||||
entries = [
|
||||
_build_check_entry("postgres", "datastore", postgres_result, postgres_ms),
|
||||
_build_check_entry("valkey", "datastore", valkey_result, valkey_ms),
|
||||
_build_check_entry("neo4j", "datastore", neo4j_result, neo4j_ms),
|
||||
_build_check_entry(
|
||||
graph_db_component_id, "datastore", graph_db_result, graph_db_ms
|
||||
),
|
||||
]
|
||||
overall = _aggregate_status(entries)
|
||||
|
||||
@@ -191,7 +230,7 @@ def _readiness_payload() -> tuple[dict[str, Any], int]:
|
||||
payload["checks"] = {
|
||||
"postgres:responseTime": [entries[0]],
|
||||
"valkey:responseTime": [entries[1]],
|
||||
"neo4j:responseTime": [entries[2]],
|
||||
"graphdb:responseTime": [entries[2]],
|
||||
}
|
||||
|
||||
http_status = (
|
||||
@@ -233,10 +272,10 @@ class LivenessView(APIView):
|
||||
class ReadinessView(APIView):
|
||||
"""Readiness probe.
|
||||
|
||||
Returns 200 when PostgreSQL, Valkey and Neo4j all respond, or 503 with
|
||||
per-dependency detail when any of them is unreachable. Per-IP throttle
|
||||
plus the short in-process result cache cap the real dependency hits
|
||||
regardless of inbound traffic shape.
|
||||
Returns 200 when PostgreSQL, Valkey and the attack-paths graph store
|
||||
all respond, or 503 with per-dependency detail when any of them is
|
||||
unreachable. Per-IP throttle plus the short in-process result cache cap
|
||||
the real dependency hits regardless of inbound traffic shape.
|
||||
"""
|
||||
|
||||
authentication_classes: list = []
|
||||
|
||||
@@ -0,0 +1,24 @@
|
||||
from django.db import migrations, models
|
||||
|
||||
|
||||
class Migration(migrations.Migration):
|
||||
dependencies = [
|
||||
("api", "0095_reconcile_orphan_tasks_periodic_task"),
|
||||
]
|
||||
|
||||
operations = [
|
||||
migrations.AddField(
|
||||
model_name="attackpathsscan",
|
||||
name="is_migrated",
|
||||
field=models.BooleanField(default=False),
|
||||
),
|
||||
migrations.AddField(
|
||||
model_name="attackpathsscan",
|
||||
name="sink_backend",
|
||||
field=models.CharField(
|
||||
choices=[("neo4j", "Neo4j"), ("neptune", "Neptune")],
|
||||
default="neo4j",
|
||||
max_length=16,
|
||||
),
|
||||
),
|
||||
]
|
||||
@@ -757,6 +757,10 @@ class Scan(RowLevelSecurityProtectedModel):
|
||||
|
||||
|
||||
class AttackPathsScan(RowLevelSecurityProtectedModel):
|
||||
class SinkBackendChoices(models.TextChoices):
|
||||
NEO4J = "neo4j", "Neo4j"
|
||||
NEPTUNE = "neptune", "Neptune"
|
||||
|
||||
objects = ActiveProviderManager()
|
||||
all_objects = models.Manager()
|
||||
|
||||
@@ -805,6 +809,18 @@ class AttackPathsScan(RowLevelSecurityProtectedModel):
|
||||
)
|
||||
ingestion_exceptions = models.JSONField(default=dict, null=True, blank=True)
|
||||
|
||||
# True when the scan was synced with the current schema (list-typed
|
||||
# properties materialised as child item nodes). False for pre-cutover scans
|
||||
# still using the previous graph shape. Query catalog selection uses this
|
||||
# flag; physical read routing uses sink_backend below.
|
||||
# TODO: drop after Neptune cutover
|
||||
is_migrated = models.BooleanField(default=False)
|
||||
sink_backend = models.CharField(
|
||||
choices=SinkBackendChoices.choices,
|
||||
default=SinkBackendChoices.NEO4J,
|
||||
max_length=16,
|
||||
)
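The `sink_backend` column lets reads follow the backend a scan was written to, independent of the currently configured sink. The body of `get_backend_for_scan` is not shown in this part of the diff, so the sketch below is only an assumed shape: `ScanRow`, the two backend classes and `backend_for_scan` are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScanRow:
    """Hypothetical stand-in for an AttackPathsScan row."""

    sink_backend: str = "neo4j"  # same default as the migration


class Neo4jBackend:
    name = "neo4j"


class NeptuneBackend:
    name = "neptune"


BACKENDS = {"neo4j": Neo4jBackend(), "neptune": NeptuneBackend()}


def backend_for_scan(scan: ScanRow, configured_sink: str):
    # Route by what the row recorded; the configured sink only matters
    # for new writes, so it is deliberately ignored here.
    return BACKENDS[scan.sink_backend]


old_scan = ScanRow()  # written before the sink was switched
new_scan = ScanRow(sink_backend="neptune")
```

Under this assumption, a scan stored in Neo4j stays readable after `ATTACK_PATHS_SINK_DATABASE` is switched to `neptune`, because routing never consults the current setting.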
|
||||
|
||||
class Meta(RowLevelSecurityProtectedModel.Meta):
|
||||
db_table = "attack_paths_scans"
|
||||
|
||||
|
||||
@@ -92,7 +92,9 @@ def test_prepare_parameters_validates_cast(
|
||||
|
||||
|
||||
def test_execute_query_serializes_graph(
|
||||
attack_paths_query_definition_factory, attack_paths_graph_stub_classes
|
||||
attack_paths_query_definition_factory,
|
||||
attack_paths_graph_stub_classes,
|
||||
sink_backend_stub,
|
||||
):
|
||||
definition = attack_paths_query_definition_factory(
|
||||
id="aws-rds",
|
||||
@@ -135,18 +137,17 @@ def test_execute_query_serializes_graph(
|
||||
|
||||
database_name = "db-tenant-test-tenant-id"
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.views_helpers.graph_database.execute_read_query",
|
||||
return_value=graph_result,
|
||||
) as mock_execute_read_query:
|
||||
result = views_helpers.execute_query(
|
||||
database_name, definition, parameters, provider_id=provider_id
|
||||
)
|
||||
sink_backend_stub.execute_read_query.return_value = graph_result
|
||||
result = views_helpers.execute_query(
|
||||
database_name,
|
||||
definition,
|
||||
parameters,
|
||||
provider_id=provider_id,
|
||||
scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
|
||||
)
|
||||
|
||||
mock_execute_read_query.assert_called_once_with(
|
||||
database=database_name,
|
||||
cypher=definition.cypher,
|
||||
parameters=parameters,
|
||||
sink_backend_stub.execute_read_query.assert_called_once_with(
|
||||
database_name, definition.cypher, parameters
|
||||
)
|
||||
assert result["nodes"][0]["id"] == "node-1"
|
||||
assert result["nodes"][0]["properties"]["complex"]["items"][0] == "value"
|
||||
@@ -155,6 +156,7 @@ def test_execute_query_serializes_graph(
|
||||
|
||||
def test_execute_query_wraps_graph_errors(
|
||||
attack_paths_query_definition_factory,
|
||||
sink_backend_stub,
|
||||
):
|
||||
definition = attack_paths_query_definition_factory(
|
||||
id="aws-rds",
|
||||
@@ -167,16 +169,17 @@ def test_execute_query_wraps_graph_errors(
|
||||
database_name = "db-tenant-test-tenant-id"
|
||||
parameters = {"provider_uid": "123"}
|
||||
|
||||
with (
|
||||
patch(
|
||||
"api.attack_paths.views_helpers.graph_database.execute_read_query",
|
||||
side_effect=graph_database.GraphDatabaseQueryException("boom"),
|
||||
),
|
||||
patch("api.attack_paths.views_helpers.logger") as mock_logger,
|
||||
):
|
||||
sink_backend_stub.execute_read_query.side_effect = (
|
||||
graph_database.GraphDatabaseQueryException("boom")
|
||||
)
|
||||
with patch("api.attack_paths.views_helpers.logger") as mock_logger:
|
||||
with pytest.raises(APIException):
|
||||
views_helpers.execute_query(
|
||||
database_name, definition, parameters, provider_id="test-provider-123"
|
||||
database_name,
|
||||
definition,
|
||||
parameters,
|
||||
provider_id="test-provider-123",
|
||||
scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
|
||||
)
|
||||
|
||||
mock_logger.error.assert_called_once()
|
||||
@@ -184,6 +187,7 @@ def test_execute_query_wraps_graph_errors(
|
||||
|
||||
def test_execute_query_raises_permission_denied_on_read_only(
|
||||
attack_paths_query_definition_factory,
|
||||
sink_backend_stub,
|
||||
):
|
||||
definition = attack_paths_query_definition_factory(
|
||||
id="aws-rds",
|
||||
@@ -196,17 +200,20 @@ def test_execute_query_raises_permission_denied_on_read_only(
|
||||
database_name = "db-tenant-test-tenant-id"
|
||||
parameters = {"provider_uid": "123"}
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.views_helpers.graph_database.execute_read_query",
|
||||
side_effect=graph_database.WriteQueryNotAllowedException(
|
||||
sink_backend_stub.execute_read_query.side_effect = (
|
||||
graph_database.WriteQueryNotAllowedException(
|
||||
message="Read query not allowed",
|
||||
code="Neo.ClientError.Statement.AccessMode",
|
||||
),
|
||||
):
|
||||
with pytest.raises(PermissionDenied):
|
||||
views_helpers.execute_query(
|
||||
database_name, definition, parameters, provider_id="test-provider-123"
|
||||
)
|
||||
)
|
||||
)
|
||||
with pytest.raises(PermissionDenied):
|
||||
views_helpers.execute_query(
|
||||
database_name,
|
||||
definition,
|
||||
parameters,
|
||||
provider_id="test-provider-123",
|
||||
scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
|
||||
)
|
||||
|
||||
|
||||
def test_serialize_graph_filters_by_provider_label(attack_paths_graph_stub_classes):
|
||||
@@ -440,6 +447,7 @@ def test_normalize_custom_query_payload_passthrough_for_flat_dict():
|
||||
|
||||
def test_execute_custom_query_serializes_graph(
|
||||
attack_paths_graph_stub_classes,
|
||||
sink_backend_stub,
|
||||
):
|
||||
provider_id = "test-provider-123"
|
||||
plabel = get_provider_label(provider_id)
|
||||
@@ -453,50 +461,73 @@ def test_execute_custom_query_serializes_graph(
|
||||
graph_result.nodes = [node_1, node_2]
|
||||
graph_result.relationships = [relationship]
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.views_helpers.graph_database.execute_read_query",
|
||||
return_value=graph_result,
|
||||
) as mock_execute:
|
||||
result = views_helpers.execute_custom_query(
|
||||
"db-tenant-test", "MATCH (n) RETURN n", provider_id
|
||||
)
|
||||
sink_backend_stub.execute_read_query.return_value = graph_result
|
||||
result = views_helpers.execute_custom_query(
|
||||
"db-tenant-test",
|
||||
"MATCH (n) RETURN n",
|
||||
provider_id,
|
||||
scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
|
||||
)
|
||||
|
||||
mock_execute.assert_called_once()
|
||||
call_kwargs = mock_execute.call_args[1]
|
||||
assert call_kwargs["database"] == "db-tenant-test"
|
||||
sink_backend_stub.execute_read_query.assert_called_once()
|
||||
call_args = sink_backend_stub.execute_read_query.call_args[0]
|
||||
assert call_args[0] == "db-tenant-test"
|
||||
# The cypher is rewritten with the provider label injection
|
||||
assert plabel in call_kwargs["cypher"]
|
||||
assert plabel in call_args[1]
|
||||
assert len(result["nodes"]) == 2
|
||||
assert result["relationships"][0]["label"] == "OWNS"
|
||||
assert result["truncated"] is False
|
||||
assert result["total_nodes"] == 2
|
||||
|
||||
|
||||
def test_execute_custom_query_raises_permission_denied_on_write():
|
||||
def test_execute_custom_query_adds_timeout_for_neptune_scan(sink_backend_stub):
|
||||
graph_result = MagicMock()
|
||||
graph_result.nodes = []
|
||||
graph_result.relationships = []
|
||||
sink_backend_stub.execute_read_query.return_value = graph_result
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.views_helpers.graph_database.execute_read_query",
|
||||
side_effect=graph_database.WriteQueryNotAllowedException(
|
||||
"api.attack_paths.views_helpers.sink_module.get_backend_for_scan",
|
||||
return_value=sink_backend_stub,
|
||||
):
|
||||
views_helpers.execute_custom_query(
|
||||
"db-tenant-test",
|
||||
"MATCH (n) RETURN n",
|
||||
"provider-1",
|
||||
scan=MagicMock(is_migrated=True, sink_backend="neptune"),
|
||||
)
|
||||
|
||||
cypher = sink_backend_stub.execute_read_query.call_args[0][1]
|
||||
assert cypher.startswith("USING QUERY:TIMEOUTMILLISECONDS")
|
||||
|
||||
|
||||
def test_execute_custom_query_raises_permission_denied_on_write(sink_backend_stub):
|
||||
sink_backend_stub.execute_read_query.side_effect = (
|
||||
graph_database.WriteQueryNotAllowedException(
|
||||
message="Read query not allowed",
|
||||
code="Neo.ClientError.Statement.AccessMode",
|
||||
),
|
||||
):
|
||||
with pytest.raises(PermissionDenied):
|
||||
views_helpers.execute_custom_query(
|
||||
"db-tenant-test", "CREATE (n) RETURN n", "provider-1"
|
||||
)
|
||||
)
|
||||
)
|
||||
with pytest.raises(PermissionDenied):
|
||||
views_helpers.execute_custom_query(
|
||||
"db-tenant-test",
|
||||
"CREATE (n) RETURN n",
|
||||
"provider-1",
|
||||
scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
|
||||
)
|
||||
|
||||
|
||||
def test_execute_custom_query_wraps_graph_errors():
|
||||
with (
|
||||
patch(
|
||||
"api.attack_paths.views_helpers.graph_database.execute_read_query",
|
||||
side_effect=graph_database.GraphDatabaseQueryException("boom"),
|
||||
),
|
||||
patch("api.attack_paths.views_helpers.logger") as mock_logger,
|
||||
):
|
||||
def test_execute_custom_query_wraps_graph_errors(sink_backend_stub):
|
||||
sink_backend_stub.execute_read_query.side_effect = (
|
||||
graph_database.GraphDatabaseQueryException("boom")
|
||||
)
|
||||
with patch("api.attack_paths.views_helpers.logger") as mock_logger:
|
||||
with pytest.raises(APIException):
|
||||
views_helpers.execute_custom_query(
|
||||
"db-tenant-test", "MATCH (n) RETURN n", "provider-1"
|
||||
"db-tenant-test",
|
||||
"MATCH (n) RETURN n",
|
||||
"provider-1",
|
||||
scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
|
||||
)
|
||||
|
||||
mock_logger.error.assert_called_once()
|
||||
@@ -561,13 +592,33 @@ def test_truncate_graph_empty_graph():
|
||||
|
||||
@pytest.fixture
|
||||
def mock_neo4j_session():
|
||||
"""Mock the Neo4j driver so execute_read_query uses a fake session."""
|
||||
"""Install a Neo4jSink with a mocked Bolt driver into the sink factory.
|
||||
|
||||
The yielded mock is the `neo4j.Session` that the Neo4jSink will obtain via
|
||||
`driver.session(...)`. Tests configure `mock_neo4j_session.execute_read`
|
||||
return values / side effects to exercise the read-mode error translation
|
||||
path on the real `Neo4jSink.execute_read_query` and `get_session` code.
|
||||
"""
|
||||
from api.attack_paths.sink import factory
|
||||
from api.attack_paths.sink.neo4j import Neo4jSink
|
||||
|
||||
mock_session = MagicMock(spec=neo4j.Session)
|
||||
mock_driver = MagicMock(spec=neo4j.Driver)
|
||||
mock_driver.session.return_value = mock_session
|
||||
|
||||
with patch("api.attack_paths.database.get_driver", return_value=mock_driver):
|
||||
sink = Neo4jSink()
|
||||
sink._driver = mock_driver
|
||||
|
||||
previous_backend = factory._backend
|
||||
previous_secondary = dict(factory._secondary_backends)
|
||||
factory._backend = sink
|
||||
factory._secondary_backends.clear()
|
||||
try:
|
||||
yield mock_session
|
||||
finally:
|
||||
factory._backend = previous_backend
|
||||
factory._secondary_backends.clear()
|
||||
factory._secondary_backends.update(previous_secondary)
|
||||
|
||||
|
||||
def test_execute_read_query_succeeds_with_select(mock_neo4j_session):
|
||||
@@ -663,16 +714,20 @@ def test_execute_read_query_rejects_apoc_real_create(mock_neo4j_session, cypher)
|
||||
|
||||
@pytest.fixture
|
||||
def mock_schema_session():
|
||||
"""Mock get_session for cartography schema tests."""
|
||||
"""Mock the routed sink backend session for cartography schema tests."""
|
||||
mock_result = MagicMock()
|
||||
mock_session = MagicMock()
|
||||
mock_session.run.return_value = mock_result
|
||||
mock_backend = MagicMock()
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.views_helpers.graph_database.get_session"
|
||||
) as mock_get_session:
|
||||
mock_get_session.return_value.__enter__ = MagicMock(return_value=mock_session)
|
||||
mock_get_session.return_value.__exit__ = MagicMock(return_value=False)
|
||||
"api.attack_paths.views_helpers.sink_module.get_backend_for_scan",
|
||||
return_value=mock_backend,
|
||||
):
|
||||
mock_backend.get_session.return_value.__enter__ = MagicMock(
|
||||
return_value=mock_session
|
||||
)
|
||||
mock_backend.get_session.return_value.__exit__ = MagicMock(return_value=False)
|
||||
yield mock_session, mock_result
|
||||
|
||||
|
||||
@@ -683,7 +738,9 @@ def test_get_cartography_schema_returns_urls(mock_schema_session):
|
||||
"module_version": "0.129.0",
|
||||
}
|
||||
|
||||
result = views_helpers.get_cartography_schema("db-tenant-test", "provider-123")
|
||||
result = views_helpers.get_cartography_schema(
|
||||
"db-tenant-test", "provider-123", MagicMock(sink_backend="neo4j")
|
||||
)
|
||||
|
||||
mock_session.run.assert_called_once()
|
||||
assert result["id"] == "aws-0.129.0"
|
||||
@@ -699,7 +756,9 @@ def test_get_cartography_schema_returns_none_when_no_data(mock_schema_session):
|
||||
_, mock_result = mock_schema_session
|
||||
mock_result.single.return_value = None
|
||||
|
||||
result = views_helpers.get_cartography_schema("db-tenant-test", "provider-123")
|
||||
result = views_helpers.get_cartography_schema(
|
||||
"db-tenant-test", "provider-123", MagicMock(sink_backend="neo4j")
|
||||
)
|
||||
|
||||
assert result is None
|
||||
|
||||
@@ -721,21 +780,29 @@ def test_get_cartography_schema_extracts_provider(
|
||||
"module_version": "1.0.0",
|
||||
}
|
||||
|
||||
result = views_helpers.get_cartography_schema("db-tenant-test", "provider-123")
|
||||
result = views_helpers.get_cartography_schema(
|
||||
"db-tenant-test", "provider-123", MagicMock(sink_backend="neo4j")
|
||||
)
|
||||
|
||||
assert result["id"] == f"{expected_provider}-1.0.0"
|
||||
assert result["provider"] == expected_provider
|
||||
|
||||
|
||||
def test_get_cartography_schema_wraps_database_error():
|
||||
mock_backend = MagicMock()
|
||||
mock_backend.get_session.side_effect = graph_database.GraphDatabaseQueryException(
|
||||
"boom"
|
||||
)
|
||||
with (
|
||||
patch(
|
||||
"api.attack_paths.views_helpers.graph_database.get_session",
|
||||
side_effect=graph_database.GraphDatabaseQueryException("boom"),
|
||||
"api.attack_paths.views_helpers.sink_module.get_backend_for_scan",
|
||||
return_value=mock_backend,
|
||||
),
|
||||
patch("api.attack_paths.views_helpers.logger") as mock_logger,
|
||||
):
|
||||
with pytest.raises(APIException):
|
||||
views_helpers.get_cartography_schema("db-tenant-test", "provider-123")
|
||||
views_helpers.get_cartography_schema(
|
||||
"db-tenant-test", "provider-123", MagicMock(sink_backend="neo4j")
|
||||
)
|
||||
|
||||
mock_logger.error.assert_called_once()
|
||||
|
||||
@@ -1,623 +1,174 @@
|
||||
"""
|
||||
Tests for Neo4j database lazy initialization.
|
||||
"""Tests for the attack-paths database facade.
|
||||
|
||||
The Neo4j driver is created on first use for every process type; app startup
|
||||
never contacts Neo4j. These tests validate the database module behavior itself.
|
||||
After the Neptune port, `api.attack_paths.database` is a thin routing shim
|
||||
over `api.attack_paths.ingest` (cartography temp DB, always Neo4j) and
|
||||
`api.attack_paths.sink` (configurable Neo4j or Neptune). The facade's
|
||||
contract is routing by database-name prefix and the public exception
|
||||
hierarchy; sink-internal behavior is exercised in `test_sink.py`.
|
||||
"""
|
||||
|
||||
import threading
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
import api.attack_paths.database as db_module
|
||||
import neo4j
|
||||
import neo4j.exceptions
|
||||
import pytest
|
||||
|
||||
|
||||
class TestLazyInitialization:
|
||||
"""Test that Neo4j driver is initialized lazily on first use."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def reset_module_state(self):
|
||||
"""Reset module-level singleton state before each test."""
|
||||
original_driver = db_module._driver
|
||||
|
||||
db_module._driver = None
|
||||
|
||||
yield
|
||||
|
||||
db_module._driver = original_driver
|
||||
|
||||
def test_driver_not_initialized_at_import(self):
|
||||
"""Driver should be None after module import (no eager connection)."""
|
||||
assert db_module._driver is None
|
||||
|
||||
@patch("api.attack_paths.database.settings")
|
||||
@patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
|
||||
def test_init_driver_creates_connection_on_first_call(
|
||||
self, mock_driver_factory, mock_settings
|
||||
):
|
||||
"""init_driver() should create connection only when called."""
|
||||
mock_driver = MagicMock()
|
||||
mock_driver_factory.return_value = mock_driver
|
||||
mock_settings.DATABASES = {
|
||||
"neo4j": {
|
||||
"HOST": "localhost",
|
||||
"PORT": 7687,
|
||||
"USER": "neo4j",
|
||||
"PASSWORD": "password",
|
||||
}
|
||||
}
|
||||
|
||||
assert db_module._driver is None
|
||||
|
||||
result = db_module.init_driver()
|
||||
|
||||
mock_driver_factory.assert_called_once()
|
||||
mock_driver.verify_connectivity.assert_called_once()
|
||||
assert result is mock_driver
|
||||
assert db_module._driver is mock_driver
|
||||
|
||||
@patch("api.attack_paths.database.settings")
|
||||
@patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
|
||||
def test_init_driver_leaves_driver_none_when_verify_fails(
|
||||
self, mock_driver_factory, mock_settings
|
||||
):
|
||||
"""A failed verify_connectivity() must not publish or leak the driver."""
|
||||
mock_driver = MagicMock()
|
||||
mock_driver.verify_connectivity.side_effect = (
|
||||
neo4j.exceptions.ServiceUnavailable("down")
|
||||
)
|
||||
mock_driver_factory.return_value = mock_driver
|
||||
mock_settings.DATABASES = {
|
||||
"neo4j": {
|
||||
"HOST": "localhost",
|
||||
"PORT": 7687,
|
||||
"USER": "neo4j",
|
||||
"PASSWORD": "password",
|
||||
}
|
||||
}
|
||||
|
||||
with pytest.raises(neo4j.exceptions.ServiceUnavailable):
|
||||
db_module.init_driver()
|
||||
|
||||
assert db_module._driver is None
|
||||
mock_driver.close.assert_called_once()
|
||||
|
||||
@patch("api.attack_paths.database.settings")
|
||||
@patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
|
||||
def test_init_driver_returns_cached_driver_on_subsequent_calls(
|
||||
self, mock_driver_factory, mock_settings
|
||||
):
|
||||
"""Subsequent calls should return cached driver without reconnecting."""
|
||||
mock_driver = MagicMock()
|
||||
mock_driver_factory.return_value = mock_driver
|
||||
mock_settings.DATABASES = {
|
||||
"neo4j": {
|
||||
"HOST": "localhost",
|
||||
"PORT": 7687,
|
||||
"USER": "neo4j",
|
||||
"PASSWORD": "password",
|
||||
}
|
||||
}
|
||||
|
||||
first_result = db_module.init_driver()
|
||||
second_result = db_module.init_driver()
|
||||
third_result = db_module.init_driver()
|
||||
|
||||
# Only one connection attempt
|
||||
assert mock_driver_factory.call_count == 1
|
||||
assert mock_driver.verify_connectivity.call_count == 1
|
||||
|
||||
# All calls return same instance
|
||||
assert first_result is second_result is third_result
|
||||
|
||||
@patch("api.attack_paths.database.settings")
|
||||
@patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
|
||||
def test_get_driver_delegates_to_init_driver(
|
||||
self, mock_driver_factory, mock_settings
|
||||
):
|
||||
"""get_driver() should use init_driver() for lazy initialization."""
|
||||
mock_driver = MagicMock()
|
||||
mock_driver_factory.return_value = mock_driver
|
||||
mock_settings.DATABASES = {
|
||||
"neo4j": {
|
||||
"HOST": "localhost",
|
||||
"PORT": 7687,
|
||||
"USER": "neo4j",
|
||||
"PASSWORD": "password",
|
||||
}
|
||||
}
|
||||
|
||||
result = db_module.get_driver()
|
||||
|
||||
assert result is mock_driver
|
||||
mock_driver_factory.assert_called_once()
|
||||
|
||||
|
||||
class TestConnectionAcquisitionTimeout:
|
||||
"""Test that the connection acquisition timeout is configurable."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def reset_module_state(self):
|
||||
original_driver = db_module._driver
|
||||
original_acq_timeout = db_module.CONN_ACQUISITION_TIMEOUT
|
||||
original_conn_timeout = db_module.CONNECTION_TIMEOUT
|
||||
|
||||
db_module._driver = None
|
||||
|
||||
yield
|
||||
|
||||
db_module._driver = original_driver
|
||||
db_module.CONN_ACQUISITION_TIMEOUT = original_acq_timeout
|
||||
db_module.CONNECTION_TIMEOUT = original_conn_timeout
|
||||
|
||||
@patch("api.attack_paths.database.settings")
|
||||
@patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
|
||||
def test_driver_receives_configured_timeout(
|
||||
self, mock_driver_factory, mock_settings
|
||||
):
|
||||
"""init_driver() should pass the configured timeouts to the neo4j driver."""
|
||||
mock_driver_factory.return_value = MagicMock()
|
||||
mock_settings.DATABASES = {
|
||||
"neo4j": {
|
||||
"HOST": "localhost",
|
||||
"PORT": 7687,
|
||||
"USER": "neo4j",
|
||||
"PASSWORD": "password",
|
||||
}
|
||||
}
|
||||
db_module.CONN_ACQUISITION_TIMEOUT = 42
|
||||
db_module.CONNECTION_TIMEOUT = 7
|
||||
|
||||
db_module.init_driver()
|
||||
|
||||
_, kwargs = mock_driver_factory.call_args
|
||||
assert kwargs["connection_acquisition_timeout"] == 42
|
||||
assert kwargs["connection_timeout"] == 7
|
||||
|
||||
|
||||
class TestAtexitRegistration:
|
||||
"""Test that atexit cleanup handler is registered correctly."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def reset_module_state(self):
|
||||
"""Reset module-level singleton state before each test."""
|
||||
original_driver = db_module._driver
|
||||
|
||||
db_module._driver = None
|
||||
|
||||
yield
|
||||
|
||||
db_module._driver = original_driver
|
||||
|
||||
@patch("api.attack_paths.database.settings")
|
||||
@patch("api.attack_paths.database.atexit.register")
|
||||
@patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
|
||||
def test_atexit_registered_on_first_init(
|
||||
self, mock_driver_factory, mock_atexit_register, mock_settings
|
||||
):
|
||||
"""atexit.register should be called on first initialization."""
|
||||
mock_driver_factory.return_value = MagicMock()
|
||||
mock_settings.DATABASES = {
|
||||
"neo4j": {
|
||||
"HOST": "localhost",
|
||||
"PORT": 7687,
|
||||
"USER": "neo4j",
|
||||
"PASSWORD": "password",
|
||||
}
|
||||
}
|
||||
|
||||
db_module.init_driver()
|
||||
|
||||
mock_atexit_register.assert_called_once_with(db_module.close_driver)
|
||||
|
||||
@patch("api.attack_paths.database.settings")
|
||||
@patch("api.attack_paths.database.atexit.register")
|
||||
@patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
|
||||
def test_atexit_registered_only_once(
|
||||
self, mock_driver_factory, mock_atexit_register, mock_settings
|
||||
):
|
||||
"""atexit.register should only be called once across multiple inits.
|
||||
|
||||
The double-checked locking on _driver ensures the atexit registration
|
||||
block only executes once (when _driver is first created).
|
||||
"""
|
||||
mock_driver_factory.return_value = MagicMock()
|
||||
mock_settings.DATABASES = {
|
||||
"neo4j": {
|
||||
"HOST": "localhost",
|
||||
"PORT": 7687,
|
||||
"USER": "neo4j",
|
||||
"PASSWORD": "password",
|
||||
}
|
||||
}
|
||||
|
||||
db_module.init_driver()
|
||||
db_module.init_driver()
|
||||
db_module.init_driver()
|
||||
|
||||
# Only registered once because subsequent calls hit the fast path
|
||||
assert mock_atexit_register.call_count == 1
|
||||
|
||||
|
||||
class TestCloseDriver:
|
||||
"""Test driver cleanup functionality."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def reset_module_state(self):
|
||||
"""Reset module-level singleton state before each test."""
|
||||
original_driver = db_module._driver
|
||||
|
||||
db_module._driver = None
|
||||
|
||||
yield
|
||||
|
||||
db_module._driver = original_driver
|
||||
|
||||
def test_close_driver_closes_and_clears_driver(self):
|
||||
"""close_driver() should close the driver and set it to None."""
|
||||
mock_driver = MagicMock()
|
||||
db_module._driver = mock_driver
|
||||
|
||||
db_module.close_driver()
|
||||
|
||||
mock_driver.close.assert_called_once()
|
||||
assert db_module._driver is None
|
||||
|
||||
def test_close_driver_handles_none_driver(self):
|
||||
"""close_driver() should handle case where driver is None."""
|
||||
db_module._driver = None
|
||||
|
||||
# Should not raise
|
||||
db_module.close_driver()
|
||||
|
||||
assert db_module._driver is None
|
||||
|
||||
def test_close_driver_clears_driver_even_on_close_error(self):
|
||||
"""Driver should be cleared even if close() raises an exception."""
|
||||
mock_driver = MagicMock()
|
||||
mock_driver.close.side_effect = Exception("Connection error")
|
||||
db_module._driver = mock_driver
|
||||
|
||||
with pytest.raises(Exception, match="Connection error"):
|
||||
db_module.close_driver()
|
||||
|
||||
# Driver should still be cleared
|
||||
assert db_module._driver is None
|
||||
|
||||
|
||||
class TestExecuteReadQuery:
|
||||
"""Test read query execution helper."""
|
||||
|
||||
def test_execute_read_query_calls_read_session_and_returns_result(self):
|
||||
tx = MagicMock()
|
||||
expected_graph = MagicMock()
|
||||
run_result = MagicMock()
|
||||
run_result.graph.return_value = expected_graph
|
||||
tx.run.return_value = run_result
|
||||
|
||||
session = MagicMock()
|
||||
|
||||
def execute_read_side_effect(fn):
|
||||
return fn(tx)
|
||||
|
||||
session.execute_read.side_effect = execute_read_side_effect
|
||||
|
||||
session_ctx = MagicMock()
|
||||
session_ctx.__enter__.return_value = session
|
||||
session_ctx.__exit__.return_value = False
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.database.get_session",
|
||||
return_value=session_ctx,
|
||||
) as mock_get_session:
|
||||
result = db_module.execute_read_query(
|
||||
"db-tenant-test-tenant-id",
|
||||
"MATCH (n) RETURN n",
|
||||
{"provider_uid": "123"},
|
||||
)
|
||||
|
||||
mock_get_session.assert_called_once_with(
|
||||
"db-tenant-test-tenant-id",
|
||||
default_access_mode=neo4j.READ_ACCESS,
|
||||
)
|
||||
session.execute_read.assert_called_once()
|
||||
tx.run.assert_called_once_with(
|
||||
"MATCH (n) RETURN n",
|
||||
{"provider_uid": "123"},
|
||||
timeout=db_module.READ_QUERY_TIMEOUT_SECONDS,
|
||||
)
|
||||
run_result.graph.assert_called_once_with()
|
||||
assert result is expected_graph
|
||||
|
||||
def test_execute_read_query_defaults_parameters_to_empty_dict(self):
|
||||
tx = MagicMock()
|
||||
run_result = MagicMock()
|
||||
run_result.graph.return_value = MagicMock()
|
||||
tx.run.return_value = run_result
|
||||
|
||||
session = MagicMock()
|
||||
session.execute_read.side_effect = lambda fn: fn(tx)
|
||||
|
||||
session_ctx = MagicMock()
|
||||
session_ctx.__enter__.return_value = session
|
||||
session_ctx.__exit__.return_value = False
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.database.get_session",
|
||||
return_value=session_ctx,
|
||||
):
|
||||
db_module.execute_read_query(
|
||||
"db-tenant-test-tenant-id",
|
||||
"MATCH (n) RETURN n",
|
||||
)
|
||||
|
||||
tx.run.assert_called_once_with(
|
||||
"MATCH (n) RETURN n",
|
||||
{},
|
||||
timeout=db_module.READ_QUERY_TIMEOUT_SECONDS,
|
||||
)
|
||||
run_result.graph.assert_called_once_with()
|
||||
|
||||
|
||||
class TestGetSessionReadOnly:
|
||||
"""Test that get_session translates Neo4j read-mode errors."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def reset_module_state(self):
|
||||
original_driver = db_module._driver
|
||||
db_module._driver = None
|
||||
yield
|
||||
db_module._driver = original_driver
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"neo4j_code",
|
||||
[
|
||||
"Neo.ClientError.Statement.AccessMode",
|
||||
"Neo.ClientError.Procedure.ProcedureNotFound",
|
||||
],
|
||||
)
|
||||
def test_get_session_raises_write_query_not_allowed(self, neo4j_code):
|
||||
"""Read-mode Neo4j errors should raise `WriteQueryNotAllowedException`."""
|
||||
mock_session = MagicMock()
|
||||
neo4j_error = neo4j.exceptions.Neo4jError._hydrate_neo4j(
|
||||
code=neo4j_code,
|
||||
message="Write operations are not allowed",
|
||||
)
|
||||
mock_session.run.side_effect = neo4j_error
|
||||
|
||||
mock_driver = MagicMock()
|
||||
mock_driver.session.return_value = mock_session
|
||||
db_module._driver = mock_driver
|
||||
|
||||
with pytest.raises(db_module.WriteQueryNotAllowedException):
|
||||
with db_module.get_session(
|
||||
default_access_mode=neo4j.READ_ACCESS
|
||||
) as session:
|
||||
session.run("CREATE (n) RETURN n")
|
||||
|
||||
def test_get_session_raises_generic_exception_for_other_errors(self):
|
||||
"""Non-read-mode Neo4j errors should raise GraphDatabaseQueryException."""
|
||||
mock_session = MagicMock()
|
||||
neo4j_error = neo4j.exceptions.Neo4jError._hydrate_neo4j(
|
||||
code="Neo.ClientError.Statement.SyntaxError",
|
||||
message="Invalid syntax",
|
||||
)
|
||||
mock_session.run.side_effect = neo4j_error
|
||||
|
||||
mock_driver = MagicMock()
|
||||
mock_driver.session.return_value = mock_session
|
||||
db_module._driver = mock_driver
|
||||
|
||||
with pytest.raises(db_module.GraphDatabaseQueryException):
|
||||
with db_module.get_session(
|
||||
default_access_mode=neo4j.READ_ACCESS
|
||||
) as session:
|
||||
session.run("INVALID CYPHER")
|
||||
|
||||
|
||||
class TestThreadSafety:
|
||||
"""Test thread-safe initialization."""
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def reset_module_state(self):
|
||||
"""Reset module-level singleton state before each test."""
|
||||
original_driver = db_module._driver
|
||||
|
||||
db_module._driver = None
|
||||
|
||||
yield
|
||||
|
||||
db_module._driver = original_driver
|
||||
|
||||
@patch("api.attack_paths.database.settings")
|
||||
@patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
|
||||
def test_concurrent_init_creates_single_driver(
|
||||
self, mock_driver_factory, mock_settings
|
||||
):
|
||||
"""Multiple threads calling init_driver() should create only one driver."""
|
||||
mock_driver = MagicMock()
|
||||
mock_driver_factory.return_value = mock_driver
|
||||
mock_settings.DATABASES = {
|
||||
"neo4j": {
|
||||
"HOST": "localhost",
|
||||
"PORT": 7687,
|
||||
"USER": "neo4j",
|
||||
"PASSWORD": "password",
|
||||
}
|
||||
}
|
||||
|
||||
results = []
|
||||
errors = []
|
||||
|
||||
def call_init():
|
||||
try:
|
||||
result = db_module.init_driver()
|
||||
results.append(result)
|
||||
except Exception as e:
|
||||
errors.append(e)
|
||||
|
||||
threads = [threading.Thread(target=call_init) for _ in range(10)]
|
||||
|
||||
for t in threads:
|
||||
t.start()
|
||||
for t in threads:
|
||||
t.join()
|
||||
|
||||
assert not errors, f"Threads raised errors: {errors}"
|
||||
|
||||
# Only one driver created
|
||||
assert mock_driver_factory.call_count == 1
|
||||
|
||||
# All threads got the same driver instance
|
||||
assert all(r is mock_driver for r in results)
|
||||
assert len(results) == 10
|
||||
|
||||
|
||||
class TestHasProviderData:
|
||||
"""Test has_provider_data helper for checking provider nodes in Neo4j."""
|
||||
|
||||
def test_returns_true_when_nodes_exist(self):
|
||||
mock_session = MagicMock()
|
||||
mock_result = MagicMock()
|
||||
mock_result.single.return_value = MagicMock() # non-None record
|
||||
mock_session.run.return_value = mock_result
|
||||
|
||||
session_ctx = MagicMock()
|
||||
session_ctx.__enter__.return_value = mock_session
|
||||
session_ctx.__exit__.return_value = False
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.database.get_session",
|
||||
return_value=session_ctx,
|
||||
):
|
||||
assert db_module.has_provider_data("db-tenant-abc", "provider-123") is True
|
||||
|
||||
mock_session.run.assert_called_once()
|
||||
|
||||
def test_returns_false_when_no_nodes(self):
|
||||
mock_session = MagicMock()
|
||||
mock_result = MagicMock()
|
||||
mock_result.single.return_value = None
|
||||
mock_session.run.return_value = mock_result
|
||||
|
||||
session_ctx = MagicMock()
|
||||
session_ctx.__enter__.return_value = mock_session
|
||||
session_ctx.__exit__.return_value = False
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.database.get_session",
|
||||
return_value=session_ctx,
|
||||
):
|
||||
assert db_module.has_provider_data("db-tenant-abc", "provider-123") is False
|
||||
|
||||
def test_returns_false_when_database_not_found(self):
|
||||
session_ctx = MagicMock()
|
||||
session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
|
||||
message="Database does not exist",
|
||||
code="Neo.ClientError.Database.DatabaseNotFound",
|
||||
class TestDatabaseNameHelper:
|
||||
def test_tenant_name_lowercases_uuid(self):
|
||||
assert (
|
||||
db_module.get_database_name("ABC-123", temporary=False)
|
||||
== "db-tenant-abc-123"
|
||||
)
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.database.get_session",
|
||||
return_value=session_ctx,
|
||||
):
|
||||
assert (
|
||||
db_module.has_provider_data("db-tenant-gone", "provider-123") is False
|
||||
)
|
||||
|
||||
def test_raises_on_other_errors(self):
|
||||
session_ctx = MagicMock()
|
||||
session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
|
||||
message="Connection refused",
|
||||
code="Neo.TransientError.General.UnknownError",
|
||||
def test_temporary_name_uses_tmp_scan_prefix(self):
|
||||
assert (
|
||||
db_module.get_database_name("XYZ-789", temporary=True)
|
||||
== "db-tmp-scan-xyz-789"
|
||||
)
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.database.get_session",
|
||||
return_value=session_ctx,
|
||||
):
|
||||
with pytest.raises(db_module.GraphDatabaseQueryException):
|
||||
db_module.has_provider_data("db-tenant-abc", "provider-123")
|
||||
|
||||
class TestExceptionHierarchy:
|
||||
"""`tasks/` and `api/v1/views.py` import these from the facade."""
|
||||
|
||||
class TestDropSubgraph:
|
||||
"""Test drop_subgraph two-phase batched deletion of a provider's graph."""
|
||||
|
||||
@staticmethod
|
||||
def _result(count):
|
||||
result = MagicMock()
|
||||
result.single.return_value.get.return_value = count
|
||||
return result
|
||||
|
||||
@staticmethod
|
||||
def _session_ctx(session):
|
||||
ctx = MagicMock()
|
||||
ctx.__enter__.return_value = session
|
||||
ctx.__exit__.return_value = False
|
||||
return ctx
|
||||
|
||||
def test_deletes_relationships_then_nodes_in_batches(self):
|
||||
session = MagicMock()
|
||||
# Phase 1 (relationships): one full batch then empty.
|
||||
# Phase 2 (nodes): one full batch then empty.
|
||||
session.run.side_effect = [
|
||||
self._result(1000),
|
||||
self._result(0),
|
||||
self._result(1000),
|
||||
self._result(0),
|
||||
]
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.database.get_session",
|
||||
return_value=self._session_ctx(session),
|
||||
):
|
||||
deleted = db_module.drop_subgraph("db-tenant-abc", "provider-123")
|
||||
|
||||
# Only phase-2 node counts contribute to the return value.
|
||||
assert deleted == 1000
|
||||
assert session.run.call_count == 4
|
||||
|
||||
queries = [call.args[0] for call in session.run.call_args_list]
|
||||
|
||||
# Regression guard: the memory blow-up was caused by DETACH DELETE.
|
||||
assert all("DETACH DELETE" not in query for query in queries)
|
||||
|
||||
rel_queries = [query for query in queries if "DELETE r" in query]
|
||||
node_queries = [query for query in queries if "DELETE n" in query]
|
||||
assert rel_queries and node_queries
|
||||
# DISTINCT avoids double-counting relationships matched from both ends.
|
||||
assert all("DISTINCT r" in query for query in rel_queries)
|
||||
|
||||
# Relationships must be fully drained before nodes are deleted.
|
||||
first_node = next(i for i, q in enumerate(queries) if "DELETE n" in q)
|
||||
last_rel = max(i for i, q in enumerate(queries) if "DELETE r" in q)
|
||||
assert last_rel < first_node
|
||||
|
||||
def test_returns_zero_when_database_not_found(self):
|
||||
session_ctx = MagicMock()
|
||||
session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
|
||||
message="Database does not exist",
|
||||
code="Neo.ClientError.Database.DatabaseNotFound",
|
||||
def test_write_query_is_graph_database_exception(self):
|
||||
assert issubclass(
|
||||
db_module.WriteQueryNotAllowedException,
|
||||
db_module.GraphDatabaseQueryException,
|
||||
)
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.database.get_session",
|
||||
return_value=session_ctx,
|
||||
):
|
||||
assert db_module.drop_subgraph("db-tenant-gone", "provider-123") == 0
|
||||
|
||||
def test_raises_on_other_errors(self):
|
||||
session_ctx = MagicMock()
|
||||
session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
|
||||
message="Connection refused",
|
||||
code="Neo.TransientError.General.UnknownError",
|
||||
def test_client_statement_is_graph_database_exception(self):
|
||||
assert issubclass(
|
||||
db_module.ClientStatementException, db_module.GraphDatabaseQueryException
|
||||
)
|
||||
|
||||
with patch(
|
||||
"api.attack_paths.database.get_session",
|
||||
return_value=session_ctx,
|
||||
):
|
||||
with pytest.raises(db_module.GraphDatabaseQueryException):
|
||||
db_module.drop_subgraph("db-tenant-abc", "provider-123")
|
||||
def test_exception_str_includes_code_when_set(self):
|
||||
exc = db_module.GraphDatabaseQueryException(
|
||||
message="boom", code="Neo.ClientError.X.Y"
|
||||
)
|
||||
assert str(exc) == "Neo.ClientError.X.Y: boom"
|
||||
|
||||
def test_exception_str_falls_back_to_message_without_code(self):
|
||||
exc = db_module.GraphDatabaseQueryException(message="boom")
|
||||
assert str(exc) == "boom"
|
||||
|
||||
|
||||
class TestExecuteReadQueryRoutes:
|
||||
def test_execute_read_query_delegates_to_sink(self, sink_backend_stub):
|
||||
sink_backend_stub.execute_read_query.return_value = "graph"
|
||||
|
||||
result = db_module.execute_read_query(
|
||||
"db-tenant-abc", "MATCH (n) RETURN n", {"provider_uid": "123"}
|
||||
)
|
||||
|
||||
sink_backend_stub.execute_read_query.assert_called_once_with(
|
||||
"db-tenant-abc", "MATCH (n) RETURN n", {"provider_uid": "123"}
|
||||
)
|
||||
assert result == "graph"
|
||||
|
||||
def test_execute_read_query_defaults_parameters_to_none(self, sink_backend_stub):
|
||||
db_module.execute_read_query("db-tenant-abc", "MATCH (n) RETURN n")
|
||||
|
||||
sink_backend_stub.execute_read_query.assert_called_once_with(
|
||||
"db-tenant-abc", "MATCH (n) RETURN n", None
|
||||
)
|
||||
|
||||
|
||||
class TestSinkOperationsDelegation:
|
||||
def test_has_provider_data_delegates_to_sink(self, sink_backend_stub):
|
||||
sink_backend_stub.has_provider_data.return_value = True
|
||||
|
||||
assert db_module.has_provider_data("db-tenant-abc", "provider-123") is True
|
||||
sink_backend_stub.has_provider_data.assert_called_once_with(
|
||||
"db-tenant-abc", "provider-123"
|
||||
)
|
||||
|
||||
def test_drop_subgraph_delegates_to_sink(self, sink_backend_stub):
|
||||
sink_backend_stub.drop_subgraph.return_value = 42
|
||||
|
||||
assert db_module.drop_subgraph("db-tenant-abc", "provider-123") == 42
|
||||
sink_backend_stub.drop_subgraph.assert_called_once_with(
|
||||
"db-tenant-abc", "provider-123"
|
||||
)
|
||||
|
||||
|
||||
class TestRoutingByDatabasePrefix:
|
||||
"""`db-tmp-scan-*` and `None` route to ingest; everything else to sink."""
|
||||
|
||||
def test_create_database_routes_temp_to_ingest(self, sink_backend_stub):
|
||||
with patch("api.attack_paths.database.ingest") as mock_ingest:
|
||||
db_module.create_database("db-tmp-scan-uuid-1")
|
||||
|
||||
mock_ingest.create_database.assert_called_once_with("db-tmp-scan-uuid-1")
|
||||
sink_backend_stub.create_database.assert_not_called()
|
||||
|
||||
def test_create_database_routes_tenant_to_sink(self, sink_backend_stub):
|
||||
with patch("api.attack_paths.database.ingest") as mock_ingest:
|
||||
db_module.create_database("db-tenant-abc")
|
||||
|
||||
sink_backend_stub.create_database.assert_called_once_with("db-tenant-abc")
|
||||
mock_ingest.create_database.assert_not_called()
|
||||
|
||||
def test_drop_database_routes_temp_to_ingest(self, sink_backend_stub):
|
||||
with patch("api.attack_paths.database.ingest") as mock_ingest:
|
||||
db_module.drop_database("db-tmp-scan-uuid-1")
|
||||
|
||||
mock_ingest.drop_database.assert_called_once_with("db-tmp-scan-uuid-1")
|
||||
sink_backend_stub.drop_database.assert_not_called()
|
||||
|
||||
def test_drop_database_routes_tenant_to_sink(self, sink_backend_stub):
|
||||
with patch("api.attack_paths.database.ingest") as mock_ingest:
|
||||
db_module.drop_database("db-tenant-abc")
|
||||
|
||||
sink_backend_stub.drop_database.assert_called_once_with("db-tenant-abc")
|
||||
mock_ingest.drop_database.assert_not_called()
|
||||
|
||||
def test_clear_cache_routes_temp_to_ingest(self, sink_backend_stub):
|
||||
with patch("api.attack_paths.database.ingest") as mock_ingest:
|
||||
db_module.clear_cache("db-tmp-scan-uuid-1")
|
||||
|
||||
mock_ingest.clear_cache.assert_called_once_with("db-tmp-scan-uuid-1")
|
||||
sink_backend_stub.clear_cache.assert_not_called()
|
||||
|
||||
def test_clear_cache_routes_tenant_to_sink(self, sink_backend_stub):
|
||||
with patch("api.attack_paths.database.ingest") as mock_ingest:
|
||||
db_module.clear_cache("db-tenant-abc")
|
||||
|
||||
sink_backend_stub.clear_cache.assert_called_once_with("db-tenant-abc")
|
||||
mock_ingest.clear_cache.assert_not_called()
|
||||
|
||||
def test_get_session_routes_temp_to_ingest(self, sink_backend_stub):
|
||||
sentinel = MagicMock()
|
||||
with patch("api.attack_paths.database.ingest") as mock_ingest:
|
||||
mock_ingest.get_session.return_value = sentinel
|
||||
|
||||
result = db_module.get_session("db-tmp-scan-uuid-1")
|
||||
|
||||
assert result is sentinel
|
||||
mock_ingest.get_session.assert_called_once()
|
||||
sink_backend_stub.get_session.assert_not_called()
|
||||
|
||||
def test_get_session_routes_none_to_ingest(self, sink_backend_stub):
|
||||
sentinel = MagicMock()
|
||||
with patch("api.attack_paths.database.ingest") as mock_ingest:
|
||||
mock_ingest.get_session.return_value = sentinel
|
||||
|
||||
result = db_module.get_session(None)
|
||||
|
||||
assert result is sentinel
|
||||
sink_backend_stub.get_session.assert_not_called()
|
||||
|
||||
def test_get_ingest_uri_delegates_to_ingest(self, sink_backend_stub):
|
||||
with patch("api.attack_paths.database.ingest") as mock_ingest:
|
||||
mock_ingest.get_uri.return_value = "bolt://neo4j:7687"
|
||||
|
||||
assert db_module.get_ingest_uri() == "bolt://neo4j:7687"
|
||||
|
||||
mock_ingest.get_uri.assert_called_once_with()
|
||||
|
||||
def test_get_session_routes_tenant_to_sink(self, sink_backend_stub):
|
||||
sentinel = MagicMock()
|
||||
sink_backend_stub.get_session.return_value = sentinel
|
||||
with patch("api.attack_paths.database.ingest") as mock_ingest:
|
||||
result = db_module.get_session("db-tenant-abc")
|
||||
|
||||
assert result is sentinel
|
||||
mock_ingest.get_session.assert_not_called()
|
||||
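The routing rule these tests pin down can be sketched in isolation. This is a minimal illustration, not the code from `api.attack_paths.database`: the `TEMP_DATABASE_PREFIX` constant and both helper names are hypothetical, and the prefix value is inferred from the `db-tmp-scan-*` names used in the tests.

```python
from typing import Any, Optional

# Assumption: temporary ingest databases share this prefix (inferred from the tests).
TEMP_DATABASE_PREFIX = "db-tmp-scan-"


def routes_to_ingest(database: Optional[str]) -> bool:
    """Return True when the database name belongs to the ingest backend.

    `None` and temporary scan databases go to ingest; anything else,
    such as a per-tenant database, goes to the configured sink.
    """
    return database is None or database.startswith(TEMP_DATABASE_PREFIX)


def pick_backend(database: Optional[str], ingest: Any, sink: Any) -> Any:
    """Select the backend object that should handle `database`."""
    return ingest if routes_to_ingest(database) else sink
```

Keeping the decision in one predicate means every operation (`create_database`, `drop_database`, `clear_cache`, `get_session`) can share the same rule, which is what the paired temp/tenant tests check.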
|
||||
@@ -67,7 +67,7 @@ class TestLivenessEndpoint:
         with (
             patch("api.health._probe_postgres") as mock_pg,
             patch("api.health._probe_valkey") as mock_vk,
-            patch("api.health._probe_neo4j") as mock_neo,
+            patch("api.health._probe_graph_db") as mock_neo,
         ):
             response = api_client.get(reverse("health-live"))

@@ -83,14 +83,14 @@ class TestReadinessEndpoint:
         return (
             patch("api.health._probe_postgres", return_value=None),
             patch("api.health._probe_valkey", return_value=None),
-            patch("api.health._probe_neo4j", return_value=None),
+            patch("api.health._probe_graph_db", return_value=None),
         )

     def test_returns_200_and_pass_when_all_dependencies_healthy(self, api_client):
         with (
             patch("api.health._probe_postgres"),
             patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
         ):
             response = api_client.get(reverse("health-ready"))

@@ -107,7 +107,7 @@ class TestReadinessEndpoint:
         assert set(body["checks"].keys()) == {
             "postgres:responseTime",
             "valkey:responseTime",
-            "neo4j:responseTime",
+            "graphdb:responseTime",
         }
         for key in body["checks"]:
             entries = body["checks"][key]
@@ -122,6 +122,23 @@ class TestReadinessEndpoint:
                 # `output` must not leak when the check passed.
                 assert "output" not in entry

+    @pytest.mark.parametrize("sink", ["neo4j", "neptune"])
+    def test_graphdb_component_id_reflects_active_sink(self, api_client, sink):
+        from django.test import override_settings
+
+        with (
+            override_settings(ATTACK_PATHS_SINK_DATABASE=sink),
+            patch("api.health._probe_postgres"),
+            patch("api.health._probe_valkey"),
+            patch("api.health._probe_graph_db"),
+        ):
+            response = api_client.get(reverse("health-ready"))
+
+        assert response.status_code == status.HTTP_200_OK
+        entry = response.json()["checks"]["graphdb:responseTime"][0]
+        # Stable key, but the concrete store is named in componentId.
+        assert entry["componentId"] == sink
+
     def test_returns_503_and_fail_when_postgres_is_down(self, api_client):
         with (
             patch(
@@ -129,7 +146,7 @@ class TestReadinessEndpoint:
                 side_effect=RuntimeError("connection refused"),
             ),
             patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
         ):
             response = api_client.get(reverse("health-ready"))

@@ -141,13 +158,13 @@ class TestReadinessEndpoint:
         # Exception detail is never echoed in the response, only logged.
         assert "output" not in pg_entry
         assert body["checks"]["valkey:responseTime"][0]["status"] == "pass"
-        assert body["checks"]["neo4j:responseTime"][0]["status"] == "pass"
+        assert body["checks"]["graphdb:responseTime"][0]["status"] == "pass"

     def test_returns_503_and_fail_when_valkey_is_down(self, api_client):
         with (
             patch("api.health._probe_postgres"),
             patch("api.health._probe_valkey", side_effect=ConnectionError("timeout")),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
         ):
             response = api_client.get(reverse("health-ready"))

@@ -158,12 +175,12 @@ class TestReadinessEndpoint:
         assert vk_entry["status"] == "fail"
         assert "output" not in vk_entry

-    def test_returns_503_and_fail_when_neo4j_is_down(self, api_client):
+    def test_returns_503_and_fail_when_graph_db_is_down(self, api_client):
         with (
             patch("api.health._probe_postgres"),
             patch("api.health._probe_valkey"),
             patch(
-                "api.health._probe_neo4j",
+                "api.health._probe_graph_db",
                 side_effect=RuntimeError("ServiceUnavailable"),
             ),
         ):
@@ -172,15 +189,15 @@ class TestReadinessEndpoint:
         assert response.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
         body = response.json()
         assert body["status"] == "fail"
-        neo_entry = body["checks"]["neo4j:responseTime"][0]
-        assert neo_entry["status"] == "fail"
-        assert "output" not in neo_entry
+        graph_db_entry = body["checks"]["graphdb:responseTime"][0]
+        assert graph_db_entry["status"] == "fail"
+        assert "output" not in graph_db_entry

     def test_reports_all_failures_simultaneously(self, api_client):
         with (
             patch("api.health._probe_postgres", side_effect=RuntimeError("pg down")),
             patch("api.health._probe_valkey", side_effect=RuntimeError("vk down")),
-            patch("api.health._probe_neo4j", side_effect=RuntimeError("neo down")),
+            patch("api.health._probe_graph_db", side_effect=RuntimeError("neo down")),
         ):
             response = api_client.get(reverse("health-ready"))

@@ -190,7 +207,7 @@ class TestReadinessEndpoint:
         for key in (
             "postgres:responseTime",
             "valkey:responseTime",
-            "neo4j:responseTime",
+            "graphdb:responseTime",
         ):
             entry = body["checks"][key][0]
             assert entry["status"] == "fail"
@@ -209,7 +226,7 @@ class TestReadinessEndpoint:
         with (
             patch("api.health._probe_postgres", side_effect=RuntimeError(sensitive)),
             patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
         ):
             response = api_client.get(reverse("health-ready"))

@@ -229,7 +246,7 @@ class TestReadinessEndpoint:
         with (
             patch("api.health._probe_postgres"),
             patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
         ):
             api_client.credentials()
             response = api_client.get(reverse("health-ready"))
@@ -244,7 +261,7 @@ class TestReadinessCache:
         with (
             patch("api.health._probe_postgres") as pg,
             patch("api.health._probe_valkey") as vk,
-            patch("api.health._probe_neo4j") as neo,
+            patch("api.health._probe_graph_db") as neo,
         ):
             r1 = api_client.get(reverse("health-ready"))
             r2 = api_client.get(reverse("health-ready"))
@@ -262,7 +279,7 @@ class TestReadinessCache:
         with (
             patch("api.health._probe_postgres") as pg,
             patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
         ):
             api_client.get(reverse("health-ready"))
             assert pg.call_count == 1
@@ -286,7 +303,7 @@ class TestReadinessCache:
         with (
             patch("api.health._probe_postgres", side_effect=RuntimeError("down")) as pg,
             patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
         ):
             r1 = api_client.get(reverse("health-ready"))
             r2 = api_client.get(reverse("health-ready"))
@@ -320,7 +337,7 @@ class TestRateLimiting:
         with (
             patch("api.health._probe_postgres"),
             patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
             patch.object(ScopedRateThrottle, "parse_rate", return_value=(2, 60)),
         ):
             statuses = [
@@ -414,19 +431,42 @@ class TestProbeImplementations:
             with pytest.raises(RuntimeError, match="bug"):
                 health._probe_valkey()

-    def test_neo4j_probe_calls_verify_connectivity(self):
-        with patch("api.attack_paths.database.get_driver") as mock_get_driver:
-            mock_get_driver.return_value.verify_connectivity.return_value = None
-            assert health._probe_neo4j() is None
-            mock_get_driver.return_value.verify_connectivity.assert_called_once_with()
+    def test_graph_db_probe_calls_verify_connectivity(self):
+        with patch("api.attack_paths.database.verify_connectivity") as mock_verify:
+            mock_verify.return_value = None
+            assert health._probe_graph_db() is None
+            mock_verify.assert_called_once_with()

-    def test_neo4j_probe_propagates_driver_errors(self):
-        with patch("api.attack_paths.database.get_driver") as mock_get_driver:
-            mock_get_driver.return_value.verify_connectivity.side_effect = RuntimeError(
-                "unreachable"
-            )
+    def test_graph_db_probe_propagates_errors(self):
+        with patch(
+            "api.attack_paths.database.verify_connectivity",
+            side_effect=RuntimeError("unreachable"),
+        ):
             with pytest.raises(RuntimeError, match="unreachable"):
-                health._probe_neo4j()
+                health._probe_graph_db()
+
+    def test_graph_db_probe_times_out_when_check_exceeds_budget(self):
+        # A sink whose connectivity check blocks past the probe budget must
+        # surface as a failure fast, not pin the request thread for the
+        # driver's full acquisition timeout.
+        import time as _time
+
+        def _hang() -> None:
+            _time.sleep(2)
+
+        with (
+            patch("api.health.GRAPH_DB_PROBE_TIMEOUT_SECONDS", 0.2),
+            patch(
+                "api.attack_paths.database.verify_connectivity",
+                side_effect=_hang,
+            ),
+        ):
+            started = _time.perf_counter()
+            with pytest.raises(TimeoutError):
+                health._probe_graph_db()
+            elapsed = _time.perf_counter() - started
+
+        assert elapsed < health.GRAPH_DB_PROBE_TIMEOUT_SECONDS + 1


 class TestStatusAggregation:
|
||||
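The timeout test in the hunk above expects a blocking connectivity check to fail fast with `TimeoutError`. One way to get that behavior is to run the check in a worker thread and wait with a deadline. The sketch below is an assumption about the general technique, not the implementation in `api.health`; `run_with_budget` is a hypothetical name.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_budget(check: Callable[[], T], timeout_seconds: float) -> T:
    """Run `check` in a worker thread and give up after `timeout_seconds`.

    Hypothetical helper: the caller gets a builtin TimeoutError quickly,
    while the worker thread is left to finish in the background.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(check)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        # Before Python 3.11 this is not the builtin TimeoutError, so re-raise.
        raise TimeoutError(f"check exceeded {timeout_seconds}s budget") from None
    finally:
        # Do not wait for a hung check; that would defeat the deadline.
        executor.shutdown(wait=False)


if __name__ == "__main__":
    try:
        run_with_budget(lambda: time.sleep(0.5), 0.05)
    except TimeoutError as exc:
        print(f"probe failed fast: {exc}")
```

The trade-off is that a hung check keeps its worker thread alive until it returns, so this bounds the caller's latency rather than the total work done.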
@@ -0,0 +1,626 @@
|
||||
"""Tests for the attack-paths sink factory and Neo4j sink.
|
||||
|
||||
The sink module picks a backend per ``settings.ATTACK_PATHS_SINK_DATABASE``.
|
||||
Neo4j is the default and preserves today's behavior; Neptune is opt-in and
|
||||
builds dual writer/reader Bolt drivers.
|
||||
"""
|
||||
|
||||
import json
|
||||
from importlib import import_module
|
||||
from unittest.mock import MagicMock, patch
|
||||
|
||||
import pytest
|
||||
|
||||
# Prime patch-target resolution. `api.attack_paths.sink/__init__.py` doesn't
|
||||
# eagerly import these submodules (they're loaded on demand inside the
|
||||
# factory), so `mock.patch("api.attack_paths.sink.<sub>.…")` would fail with
|
||||
# AttributeError on first call. Importing here registers them as attributes
|
||||
# of the package before any decorator runs.
|
||||
import_module("api.attack_paths.sink.neo4j")
|
||||
import_module("api.attack_paths.sink.neptune")
|
||||
|
||||
|
||||
@pytest.fixture(autouse=True)
|
||||
def reset_sink_state():
|
||||
"""Reset the module-level backend singletons around each test.
|
||||
|
||||
The cache lives in `api.attack_paths.sink.factory`, not on the package.
|
||||
"""
|
||||
from api.attack_paths.sink import factory
|
||||
|
||||
original_backend = factory._backend
|
||||
original_secondary = dict(factory._secondary_backends)
|
||||
factory._backend = None
|
||||
factory._secondary_backends.clear()
|
||||
yield
|
||||
factory._backend = original_backend
|
||||
factory._secondary_backends.clear()
|
||||
factory._secondary_backends.update(original_secondary)
|
||||
|
||||
|
||||
class TestSinkFactory:
|
||||
def test_default_resolves_to_neo4j(self, settings):
|
||||
from api.attack_paths.sink import factory
|
||||
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
|
||||
assert factory._resolve_setting() == "neo4j"
|
||||
|
||||
def test_neptune_resolves_correctly(self, settings):
|
||||
from api.attack_paths.sink import factory
|
||||
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
|
||||
assert factory._resolve_setting() == "neptune"
|
||||
|
||||
def test_invalid_value_raises(self, settings):
|
||||
from api.attack_paths.sink import factory
|
||||
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "foo"
|
||||
with pytest.raises(RuntimeError, match="ATTACK_PATHS_SINK_DATABASE"):
|
||||
factory._resolve_setting()
|
||||
|
||||
@patch("api.attack_paths.sink.neo4j.neo4j.GraphDatabase.driver")
|
||||
def test_init_builds_neo4j_backend_by_default(self, mock_driver, settings):
|
||||
from api.attack_paths import sink as sink_module
|
||||
from api.attack_paths.sink.neo4j import Neo4jSink
|
||||
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
|
||||
settings.DATABASES = {
|
||||
**settings.DATABASES,
|
||||
"neo4j": {
|
||||
"HOST": "localhost",
|
||||
"PORT": "7687",
|
||||
"USER": "neo4j",
|
||||
"PASSWORD": "pw",
|
||||
},
|
||||
}
|
||||
mock_driver.return_value = MagicMock()
|
||||
|
||||
backend = sink_module.init()
|
||||
|
||||
assert isinstance(backend, Neo4jSink)
|
||||
mock_driver.assert_called_once()
|
||||
|
||||
@patch("api.attack_paths.sink.neptune.neptune_auth_provider")
|
||||
@patch("api.attack_paths.sink.neptune.neo4j.GraphDatabase.driver")
|
||||
def test_init_builds_neptune_backend(
|
||||
self, mock_driver, mock_auth_provider, settings
|
||||
):
|
||||
from api.attack_paths import sink as sink_module
|
||||
from api.attack_paths.sink.neptune import NeptuneSink
|
||||
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
|
||||
settings.DATABASES = {
|
||||
**settings.DATABASES,
|
||||
"neptune": {
|
||||
"WRITER_ENDPOINT": "writer.example",
|
||||
"READER_ENDPOINT": "reader.example",
|
||||
"PORT": "8182",
|
||||
"REGION": "eu-west-1",
|
||||
},
|
||||
}
|
||||
mock_driver.return_value = MagicMock()
|
||||
mock_auth_provider.return_value = lambda: None
|
||||
|
||||
backend = sink_module.init()
|
||||
|
||||
assert isinstance(backend, NeptuneSink)
|
||||
# Writer + reader endpoints both trigger driver construction
|
||||
assert mock_driver.call_count == 2
|
||||
writer_uri = mock_driver.call_args_list[0][0][0]
|
||||
reader_uri = mock_driver.call_args_list[1][0][0]
|
||||
assert writer_uri == "bolt+s://writer.example:8182"
|
||||
assert reader_uri == "bolt+s://reader.example:8182"
|
||||
|
||||
@patch("api.attack_paths.sink.neptune.neptune_auth_provider")
|
||||
@patch("api.attack_paths.sink.neptune.neo4j.GraphDatabase.driver")
|
||||
def test_neptune_reader_falls_back_to_writer(
|
||||
self, mock_driver, mock_auth_provider, settings
|
||||
):
|
||||
from api.attack_paths import sink as sink_module
|
||||
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
|
||||
settings.DATABASES = {
|
||||
**settings.DATABASES,
|
||||
"neptune": {
|
||||
"WRITER_ENDPOINT": "writer.example",
|
||||
"READER_ENDPOINT": "",
|
||||
"PORT": "8182",
|
||||
"REGION": "eu-west-1",
|
||||
},
|
||||
}
|
||||
mock_driver.return_value = MagicMock()
|
||||
mock_auth_provider.return_value = lambda: None
|
||||
|
||||
sink_module.init()
|
||||
|
||||
# Only one driver call — reader aliases writer
|
||||
assert mock_driver.call_count == 1
|
||||
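The factory tests above fix three observable behaviors: only `neo4j` and `neptune` are accepted, an invalid value raises a `RuntimeError` that names the setting, and Neptune URIs use the `bolt+s://` scheme with the reader falling back to the writer when no reader endpoint is set. A minimal sketch of those rules follows. The function names `resolve_sink` and `neptune_uris` are hypothetical and the error wording is invented; only the asserted behavior comes from the tests.

```python
from typing import Tuple

SETTING_NAME = "ATTACK_PATHS_SINK_DATABASE"
VALID_SINKS = ("neo4j", "neptune")


def resolve_sink(value: str) -> str:
    """Return the validated sink name, or raise naming the setting."""
    if value not in VALID_SINKS:
        raise RuntimeError(
            f"{SETTING_NAME} must be one of {VALID_SINKS}, got {value!r}"
        )
    return value


def neptune_uris(writer: str, reader: str, port: str) -> Tuple[str, str]:
    """Build (writer_uri, reader_uri); an empty reader reuses the writer URI."""
    writer_uri = f"bolt+s://{writer}:{port}"
    reader_uri = f"bolt+s://{reader}:{port}" if reader else writer_uri
    return writer_uri, reader_uri
```

When the two URIs are equal, a single driver can serve both roles, which would explain why the fallback test expects only one driver construction.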
|
||||
|
||||
class TestGetBackendForScan:
|
||||
"""``get_backend_for_scan`` routes by the row's recorded sink backend."""
|
||||
|
||||
@patch("api.attack_paths.sink.neo4j.neo4j.GraphDatabase.driver")
|
||||
def test_legacy_scan_in_neo4j_process_uses_active_backend(
|
||||
self, mock_driver, settings
|
||||
):
|
||||
from api.attack_paths import sink as sink_module
|
||||
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
|
||||
settings.DATABASES = {
|
||||
**settings.DATABASES,
|
||||
"neo4j": {
|
||||
"HOST": "localhost",
|
||||
"PORT": "7687",
|
||||
"USER": "neo4j",
|
||||
"PASSWORD": "pw",
|
||||
},
|
||||
}
|
||||
mock_driver.return_value = MagicMock()
|
||||
|
||||
scan = MagicMock(sink_backend="neo4j")
|
||||
backend = sink_module.get_backend_for_scan(scan)
|
||||
|
||||
assert backend is sink_module.get_backend()
|
||||
|
||||
def test_neptune_scan_on_neo4j_process_uses_neptune_secondary(self, settings):
|
||||
from api.attack_paths.sink import factory
|
||||
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
|
||||
active_neo4j = MagicMock(name="neo4j-active")
|
||||
factory._backend = active_neo4j
|
||||
|
||||
secondary_neptune = MagicMock(name="neptune-secondary")
|
||||
with patch.object(factory, "_build_backend", return_value=secondary_neptune):
|
||||
scan = MagicMock(sink_backend="neptune")
|
||||
backend = factory.get_backend_for_scan(scan)
|
||||
|
||||
assert backend is secondary_neptune
|
||||
assert backend is not active_neo4j
|
||||
|
||||
|
||||
def _session_ctx(session: MagicMock) -> MagicMock:
|
||||
ctx = MagicMock()
|
||||
ctx.__enter__ = MagicMock(return_value=session)
|
||||
ctx.__exit__ = MagicMock(return_value=False)
|
||||
return ctx
|
||||
|
||||
|
||||
class TestNeo4jSinkSyncWrites:
|
||||
def test_ensure_sync_indexes_runs_create_index_idempotent(self):
|
||||
from api.attack_paths.sink.neo4j import Neo4jSink
|
||||
|
||||
sink = Neo4jSink()
|
||||
session = MagicMock()
|
||||
session.run.return_value = MagicMock()
|
||||
with patch.object(sink, "get_session", return_value=_session_ctx(session)):
|
||||
sink.ensure_sync_indexes("db-tenant-x")
|
||||
|
||||
query = session.run.call_args.args[0]
|
||||
assert "CREATE INDEX" in query
|
||||
assert "IF NOT EXISTS" in query
|
||||
assert "`_ProviderResource`" in query
|
||||
assert "`_provider_element_id`" in query
|
||||
|
||||
def test_write_nodes_skips_empty_batch(self):
|
||||
from api.attack_paths.sink.neo4j import Neo4jSink
|
||||
|
||||
sink = Neo4jSink()
|
||||
with patch.object(sink, "get_session") as get_session:
|
||||
sink.write_nodes("db-tenant-x", "`AWSUser`", [])
|
||||
get_session.assert_not_called()
|
||||
|
||||
def test_write_nodes_merges_on_provider_resource_label(self):
|
||||
from api.attack_paths.sink.neo4j import Neo4jSink
|
||||
|
||||
sink = Neo4jSink()
|
||||
session = MagicMock()
|
||||
with patch.object(sink, "get_session", return_value=_session_ctx(session)):
|
||||
sink.write_nodes(
|
||||
"db-tenant-x",
|
||||
"`AWSUser`:`_ProviderResource`",
|
||||
[{"provider_element_id": "p:e", "props": {"k": "v"}}],
|
||||
)
|
||||
|
||||
query, params = session.run.call_args.args
|
||||
assert "MERGE (n:`_ProviderResource`" in query
|
||||
assert "`_provider_element_id`: row.provider_element_id" in query
|
||||
assert "SET n:`AWSUser`:`_ProviderResource`" in query
|
||||
assert params == {"rows": [{"provider_element_id": "p:e", "props": {"k": "v"}}]}
|
||||
|
||||
def test_write_relationships_scopes_endpoints_by_provider_label(self):
|
||||
from api.attack_paths.sink.neo4j import Neo4jSink
|
||||
|
||||
sink = Neo4jSink()
|
||||
session = MagicMock()
|
||||
provider_id = "00000000-0000-0000-0000-000000000abc"
|
||||
with patch.object(sink, "get_session", return_value=_session_ctx(session)):
|
||||
sink.write_relationships(
|
||||
"db-tenant-x",
|
||||
"RESOURCE",
|
||||
provider_id,
|
||||
[
|
||||
{
|
||||
"start_element_id": "s",
|
||||
"end_element_id": "e",
|
||||
"provider_element_id": "pe",
|
||||
"props": {},
|
||||
}
|
||||
],
|
||||
)
|
||||
|
||||
query = session.run.call_args.args[0]
|
||||
assert ":`_Provider_00000000000000000000000000000abc`" in query
|
||||
assert ":RESOURCE" in query.replace("`", "")
|
||||
assert "MERGE (s)-[r:`RESOURCE`" in query
|
||||
|
||||
|
||||
class TestNeptuneSinkSyncWrites:
|
||||
def test_ensure_sync_indexes_is_noop(self):
|
||||
from api.attack_paths.sink.neptune import NeptuneSink
|
||||
|
||||
sink = NeptuneSink()
|
||||
with patch.object(sink, "get_session") as get_session:
|
||||
sink.ensure_sync_indexes("ignored")
|
||||
get_session.assert_not_called()
|
||||
|
||||
def test_write_nodes_merges_on_neptune_id_with_provider_resource_label(self):
|
||||
from api.attack_paths.sink.neptune import NeptuneSink
|
||||
|
||||
sink = NeptuneSink()
|
||||
session = MagicMock()
|
||||
with patch.object(sink, "get_session", return_value=_session_ctx(session)):
|
||||
sink.write_nodes(
|
||||
"ignored",
|
||||
"`AWSUser`",
|
||||
[{"provider_element_id": "p:e", "props": {"k": "v"}}],
|
||||
)
|
||||
|
||||
query = session.run.call_args.args[0]
|
||||
# Neptune assigns a default `vertex` label to any unlabeled node,
|
||||
# so the MERGE must pin a real label at creation time.
|
||||
assert "MERGE (n:`_ProviderResource` {`~id`: row.provider_element_id})" in query
|
||||
assert "SET n:`AWSUser`" in query
|
||||
assert "SET n.`_provider_element_id` = row.provider_element_id" in query
|
||||
|
||||
def test_write_relationships_matches_endpoints_by_id(self):
|
||||
from api.attack_paths.sink.neptune import NeptuneSink
|
||||
|
||||
sink = NeptuneSink()
|
||||
session = MagicMock()
|
||||
with patch.object(sink, "get_session", return_value=_session_ctx(session)):
|
||||
sink.write_relationships(
|
||||
"ignored",
|
||||
"RESOURCE",
|
||||
"provider-1",
|
||||
[
|
||||
{
|
||||
"start_element_id": "s",
|
||||
"end_element_id": "e",
|
||||
"provider_element_id": "pe",
|
||||
"props": {},
|
||||
}
|
||||
],
|
||||
)
|
||||
|
||||
query = session.run.call_args.args[0]
|
||||
assert "MATCH (s) WHERE id(s) = row.start_element_id" in query
|
||||
assert "MATCH (e) WHERE id(e) = row.end_element_id" in query
|
||||
assert "MERGE (s)-[r:`RESOURCE`" in query
|
||||
|
||||
|
||||
class TestNeptuneSinkDropSubgraph:
|
||||
def test_drop_subgraph_deletes_rels_before_nodes_in_bounded_batches(self):
|
||||
from api.attack_paths.sink.neptune import NeptuneSink
|
||||
|
||||
sink = NeptuneSink()
|
||||
session = MagicMock()
|
||||
|
||||
rel_record_first = MagicMock()
|
||||
rel_record_first.__getitem__ = lambda _self, key: 50
|
||||
rel_record_drain = MagicMock()
|
||||
rel_record_drain.__getitem__ = lambda _self, key: 0
|
||||
node_record_first = MagicMock()
|
||||
node_record_first.__getitem__ = lambda _self, key: 10
|
||||
node_record_drain = MagicMock()
|
||||
node_record_drain.__getitem__ = lambda _self, key: 0
|
||||
|
||||
run_results = [
|
||||
MagicMock(single=MagicMock(return_value=rel_record_first)),
|
||||
MagicMock(single=MagicMock(return_value=rel_record_drain)),
|
||||
MagicMock(single=MagicMock(return_value=node_record_first)),
|
||||
MagicMock(single=MagicMock(return_value=node_record_drain)),
|
||||
]
|
||||
session.run.side_effect = run_results
|
||||
|
||||
with patch.object(sink, "get_session", return_value=_session_ctx(session)):
|
||||
deleted = sink.drop_subgraph("ignored", "provider-1")
|
||||
|
||||
assert deleted == 10
|
||||
first_query = session.run.call_args_list[0].args[0]
|
||||
assert "DELETE r" in first_query
|
||||
assert "DETACH DELETE" not in first_query
|
||||
# DISTINCT avoids double-counting relationships matched from both ends.
|
||||
assert "DISTINCT r" in first_query
|
||||
third_query = session.run.call_args_list[2].args[0]
|
||||
assert "DELETE n" in third_query
|
||||
|
||||
|
||||
class TestNeo4jSinkDropSubgraph:
|
||||
"""Neo4j drop deletes relationships then nodes in batches (no ``DETACH DELETE``)."""
|
||||
|
||||
def test_drop_subgraph_deletes_rels_before_nodes_in_bounded_batches(self):
|
||||
from api.attack_paths.sink.neo4j import Neo4jSink
|
||||
|
||||
sink = Neo4jSink()
|
||||
session = MagicMock()
|
||||
|
||||
rel_first = MagicMock()
|
||||
rel_first.get = lambda key, default=0: 50
|
||||
rel_drain = MagicMock()
|
||||
rel_drain.get = lambda key, default=0: 0
|
||||
node_first = MagicMock()
|
||||
node_first.get = lambda key, default=0: 10
|
||||
node_drain = MagicMock()
|
||||
node_drain.get = lambda key, default=0: 0
|
||||
session.run.side_effect = [
|
||||
MagicMock(single=MagicMock(return_value=rel_first)),
|
||||
MagicMock(single=MagicMock(return_value=rel_drain)),
|
||||
MagicMock(single=MagicMock(return_value=node_first)),
|
||||
MagicMock(single=MagicMock(return_value=node_drain)),
|
||||
]
|
||||
|
||||
provider_id = "00000000-0000-0000-0000-000000000abc"
|
||||
with patch.object(sink, "get_session", return_value=_session_ctx(session)):
|
||||
deleted = sink.drop_subgraph("db-tenant-x", provider_id)
|
||||
|
||||
# Only phase-2 node counts contribute to the return value.
|
||||
assert deleted == 10
|
||||
assert session.run.call_count == 4
|
||||
|
||||
queries = [call.args[0] for call in session.run.call_args_list]
|
||||
# Regression guard: the memory blow-up was caused by DETACH DELETE.
|
||||
assert all("DETACH DELETE" not in query for query in queries)
|
||||
|
||||
first_query = queries[0]
|
||||
assert "DELETE r" in first_query
|
||||
# DISTINCT avoids double-counting relationships matched from both ends.
|
||||
assert "DISTINCT r" in first_query
|
||||
assert ":`_Provider_00000000000000000000000000000abc`" in first_query
|
||||
|
||||
assert "DELETE n" in queries[2]
|
||||
|
||||
# Relationships must be fully drained before nodes are deleted.
|
||||
first_node = next(i for i, q in enumerate(queries) if "DELETE n" in q)
|
||||
last_rel = max(i for i, q in enumerate(queries) if "DELETE r" in q)
|
||||
assert last_rel < first_node
|
||||
|
||||
def test_drop_subgraph_returns_zero_when_database_does_not_exist(self):
|
||||
from api.attack_paths.database import GraphDatabaseQueryException
|
||||
from api.attack_paths.sink.neo4j import DATABASE_NOT_FOUND_CODE, Neo4jSink
|
||||
|
||||
sink = Neo4jSink()
|
||||
session = MagicMock()
|
||||
session.run.side_effect = GraphDatabaseQueryException(
|
||||
message="db missing", code=DATABASE_NOT_FOUND_CODE
|
||||
)
|
||||
|
||||
with patch.object(sink, "get_session", return_value=_session_ctx(session)):
|
||||
deleted = sink.drop_subgraph("db-tenant-missing", "provider-1")
|
||||
|
||||
assert deleted == 0
|
||||
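Both drop-subgraph tests describe the same two-phase shape: delete relationships in bounded batches until a batch reports zero, then do the same for nodes, and return only the node count. The control flow can be sketched without any Cypher. The helper names below are hypothetical, and each batch callable stands in for one `session.run(...)` call that returns how many items it deleted.

```python
from typing import Callable


def drain(run_batch: Callable[[], int]) -> int:
    """Call `run_batch` until it reports zero deletions; return the total."""
    total = 0
    while True:
        deleted = run_batch()
        if deleted == 0:
            return total
        total += deleted


def drop_subgraph_in_phases(
    delete_relationship_batch: Callable[[], int],
    delete_node_batch: Callable[[], int],
) -> int:
    """Drain relationships first, then nodes; only nodes count in the result."""
    drain(delete_relationship_batch)  # phase 1: relationships
    return drain(delete_node_batch)  # phase 2: nodes
```

With relationship batches reporting 50 then 0 and node batches reporting 10 then 0, this makes four batch calls and returns 10, matching the counts asserted in the tests.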
|
||||
|
||||
class TestSinkHasProviderData:
|
||||
"""``has_provider_data`` is the read-path probe used by API views."""
|
||||
|
||||
def test_neo4j_returns_true_when_provider_node_exists(self):
|
||||
from api.attack_paths.sink.neo4j import Neo4jSink
|
||||
|
||||
sink = Neo4jSink()
|
||||
session = MagicMock()
|
||||
session.run.return_value.single.return_value = MagicMock()
|
||||
with patch.object(sink, "get_session", return_value=_session_ctx(session)):
|
||||
present = sink.has_provider_data(
|
||||
"db-tenant-x", "00000000-0000-0000-0000-000000000abc"
|
||||
)
|
||||
|
||||
assert present is True
|
||||
query = session.run.call_args.args[0]
|
||||
assert ":`_Provider_00000000000000000000000000000abc`" in query
|
||||
|
||||
def test_neo4j_returns_false_when_database_does_not_exist(self):
|
||||
from api.attack_paths.database import GraphDatabaseQueryException
|
||||
from api.attack_paths.sink.neo4j import DATABASE_NOT_FOUND_CODE, Neo4jSink
|
||||
|
||||
sink = Neo4jSink()
|
||||
session = MagicMock()
|
||||
session.run.side_effect = GraphDatabaseQueryException(
|
||||
message="db missing", code=DATABASE_NOT_FOUND_CODE
|
||||
)
|
||||
|
||||
with patch.object(sink, "get_session", return_value=_session_ctx(session)):
|
||||
present = sink.has_provider_data("db-tenant-missing", "provider-1")
|
||||
|
||||
assert present is False
|
||||
|
||||
def test_neptune_returns_true_when_provider_node_exists(self):
|
||||
from api.attack_paths.sink.neptune import NeptuneSink
|
||||
|
||||
sink = NeptuneSink()
|
||||
session = MagicMock()
|
||||
session.run.return_value.single.return_value = MagicMock()
|
||||
with patch.object(sink, "get_session", return_value=_session_ctx(session)):
|
||||
present = sink.has_provider_data("ignored", "provider-1")
|
||||
|
||||
assert present is True
|
||||
|
||||
|
||||
class TestGetBackendForScanCutover:
|
||||
"""``get_backend_for_scan`` keeps old-sink scans queryable after cutover."""
|
||||
|
||||
def test_legacy_scan_on_neptune_process_uses_neo4j_secondary(self, settings):
|
||||
from api.attack_paths.sink import factory
|
||||
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
|
||||
active_neptune = MagicMock(name="neptune-active")
|
||||
factory._backend = active_neptune
|
||||
|
||||
secondary_neo4j = MagicMock(name="neo4j-secondary")
|
||||
with patch.object(factory, "_build_backend", return_value=secondary_neo4j):
|
||||
scan = MagicMock(sink_backend="neo4j")
|
||||
backend = factory.get_backend_for_scan(scan)
|
||||
|
||||
assert backend is secondary_neo4j
|
||||
assert backend is not active_neptune
|
||||
|
||||
|
||||
class TestSinkVerifyConnectivity:
|
||||
"""The readiness probe calls ``verify_connectivity`` through the shim.
|
||||
|
||||
Neo4j checks its single driver; Neptune checks the reader (the API read
|
||||
path), which on single-endpoint clusters aliases the writer.
|
||||
"""
|
||||
|
||||
@patch("api.attack_paths.sink.neo4j.neo4j.GraphDatabase.driver")
|
||||
def test_neo4j_verifies_its_driver(self, mock_driver, settings):
|
||||
from api.attack_paths.sink.neo4j import Neo4jSink
|
||||
|
||||
settings.DATABASES = {
|
||||
**settings.DATABASES,
|
||||
"neo4j": {
|
||||
"HOST": "localhost",
|
||||
"PORT": "7687",
|
||||
"USER": "neo4j",
|
||||
"PASSWORD": "pw",
|
||||
},
|
||||
}
|
||||
driver = MagicMock()
|
||||
mock_driver.return_value = driver
|
||||
|
||||
sink = Neo4jSink()
|
||||
sink.init()
|
||||
driver.verify_connectivity.reset_mock() # ignore the eager init check
|
||||
sink.verify_connectivity()
|
||||
|
||||
driver.verify_connectivity.assert_called_once_with()
|
||||
|
||||
@patch("api.attack_paths.sink.neptune.neptune_auth_provider")
|
||||
@patch("api.attack_paths.sink.neptune.neo4j.GraphDatabase.driver")
|
||||
def test_neptune_verifies_reader_not_writer(
|
||||
self, mock_driver, mock_auth_provider, settings
|
||||
):
|
||||
from api.attack_paths.sink.neptune import NeptuneSink
|
||||
|
||||
settings.DATABASES = {
|
||||
**settings.DATABASES,
|
||||
"neptune": {
|
||||
"WRITER_ENDPOINT": "writer.example",
|
||||
"READER_ENDPOINT": "reader.example",
|
||||
"PORT": "8182",
|
||||
"REGION": "eu-west-1",
|
||||
},
|
||||
}
|
||||
writer, reader = MagicMock(name="writer"), MagicMock(name="reader")
|
||||
mock_driver.side_effect = [writer, reader]
|
||||
mock_auth_provider.return_value = lambda: None
|
||||
|
||||
sink = NeptuneSink()
|
||||
sink.init()
|
||||
writer.verify_connectivity.reset_mock()
|
||||
reader.verify_connectivity.reset_mock()
|
||||
|
||||
sink.verify_connectivity()
|
||||
|
||||
reader.verify_connectivity.assert_called_once_with()
|
||||
writer.verify_connectivity.assert_not_called()
|
||||
|
||||
|
||||
class TestSinkInitToleratesUnreachableSink:
|
||||
"""Init must not crash the process when the sink is down at boot.
|
||||
|
||||
Same degradation model as Postgres: the driver is retained and
|
||||
reconnects lazily; /health/ready surfaces the outage until it recovers.
|
||||
"""
|
||||
|
||||
@patch("api.attack_paths.sink.neo4j.neo4j.GraphDatabase.driver")
|
||||
def test_neo4j_init_continues_when_verify_fails(self, mock_driver, settings):
|
||||
from api.attack_paths.sink.neo4j import Neo4jSink
|
||||
|
||||
settings.DATABASES = {
|
||||
**settings.DATABASES,
|
||||
"neo4j": {
|
||||
"HOST": "localhost",
|
||||
"PORT": "7687",
|
||||
"USER": "neo4j",
|
||||
"PASSWORD": "pw",
|
||||
},
|
||||
}
|
||||
driver = MagicMock()
|
||||
driver.verify_connectivity.side_effect = RuntimeError("unreachable")
|
||||
mock_driver.return_value = driver
|
||||
|
||||
sink = Neo4jSink()
|
||||
# Must not raise.
|
||||
assert sink.init() is driver
|
||||
assert sink._driver is driver
|
||||
|
||||
@patch("api.attack_paths.sink.neptune.neptune_auth_provider")
|
||||
@patch("api.attack_paths.sink.neptune.neo4j.GraphDatabase.driver")
|
||||
def test_neptune_init_continues_when_verify_fails(
|
||||
self, mock_driver, mock_auth_provider, settings
|
||||
):
|
||||
from api.attack_paths.sink.neptune import NeptuneSink
|
||||
|
||||
settings.DATABASES = {
|
||||
**settings.DATABASES,
|
||||
"neptune": {
|
||||
"WRITER_ENDPOINT": "writer.example",
|
||||
"READER_ENDPOINT": "reader.example",
|
||||
"PORT": "8182",
|
||||
"REGION": "eu-west-1",
|
||||
},
|
||||
}
|
||||
driver = MagicMock()
|
||||
driver.verify_connectivity.side_effect = RuntimeError("unreachable")
|
||||
mock_driver.return_value = driver
|
||||
mock_auth_provider.return_value = lambda: None
|
||||
|
||||
sink = NeptuneSink()
|
||||
# Must not raise; both drivers retained.
|
||||
sink.init()
|
||||
assert sink._writer is not None
|
||||
assert sink._reader is not None
|
||||
|
||||
|
||||
class TestNeptuneAdminNoOps:
|
||||
"""Neptune is single-database; admin DDL has no work to do."""
|
||||
|
||||
@pytest.mark.parametrize("method", ["create_database", "drop_database"])
|
||||
def test_admin_ops_return_none_without_touching_a_session(self, method):
|
||||
from api.attack_paths.sink.neptune import NeptuneSink
|
||||
|
||||
sink = NeptuneSink()
|
||||
with patch.object(sink, "get_session") as get_session:
|
||||
assert getattr(sink, method)("ignored") is None
|
||||
get_session.assert_not_called()
|
||||
|
||||
|
||||
class TestNeptuneAuthToken:
|
||||
"""SigV4 signing for the Neptune Bolt endpoint."""
|
||||
|
||||
@patch("api.attack_paths.sink.neptune.SigV4Auth")
|
||||
@patch("api.attack_paths.sink.neptune.BotoSession")
|
||||
def test_host_header_includes_non_default_port(self, mock_boto, mock_sigv4):
|
||||
# Neptune runs on 8182; the SigV4 canonical Host must keep the port or
|
||||
# the signature is rejected.
|
||||
from api.attack_paths.sink.neptune import _NeptuneAuthToken
|
||||
|
||||
credentials = MagicMock()
|
||||
credentials.get_frozen_credentials.return_value = MagicMock()
|
||||
mock_boto.return_value.get_credentials.return_value = credentials
|
||||
|
||||
token = _NeptuneAuthToken("eu-west-1", "https://writer.example:8182")
|
||||
|
||||
auth_obj = json.loads(token.credentials)
|
||||
assert auth_obj["Host"] == "writer.example:8182"
|
||||
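The test above pins the signed `Host` value to `writer.example:8182`. As a minimal sketch of how such a value can be derived from an endpoint URL with the standard library only: `canonical_host` is a hypothetical helper introduced for illustration, and it is not claimed to be what `api.attack_paths.sink.neptune` does internally.

```python
from urllib.parse import urlsplit


def canonical_host(endpoint_url: str) -> str:
    """Return host[:port] for signing, keeping any explicit port.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(endpoint_url)
    if parts.hostname is None:
        raise ValueError(f"endpoint has no host: {endpoint_url!r}")
    # Keep the port whenever the URL spells one out (Neptune uses 8182).
    if parts.port is not None:
        return f"{parts.hostname}:{parts.port}"
    return parts.hostname
```

Dropping the port here would produce a `Host` value that differs from the one the server sees, which is the failure mode the test comment describes.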
@@ -4754,6 +4754,64 @@ class TestAttackPathsScanViewSet:
|
||||
assert first_attributes["provider_type"] == provider.provider
|
||||
assert first_attributes["provider_uid"] == provider.uid
|
||||
|
||||
def test_attack_paths_scans_list_prefers_active_sink_scan_on_rollback(
|
||||
self,
|
||||
authenticated_client,
|
||||
providers_fixture,
|
||||
scans_fixture,
|
||||
create_attack_paths_scan,
|
||||
settings,
|
||||
):
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
|
||||
provider = providers_fixture[0]
|
||||
|
||||
neo4j_scan = create_attack_paths_scan(
|
||||
provider,
|
||||
scan=scans_fixture[0],
|
||||
state=StateChoices.COMPLETED,
|
||||
graph_data_ready=True,
|
||||
sink_backend="neo4j",
|
||||
)
|
||||
neptune_scan = create_attack_paths_scan(
|
||||
provider,
|
||||
scan=scans_fixture[0],
|
||||
state=StateChoices.COMPLETED,
|
||||
graph_data_ready=True,
|
||||
sink_backend="neptune",
|
||||
)
|
||||
|
||||
response = authenticated_client.get(reverse("attack-paths-scans-list"))
|
||||
|
||||
assert response.status_code == status.HTTP_200_OK
|
||||
ids = {item["id"] for item in response.json()["data"]}
|
||||
assert str(neo4j_scan.id) in ids
|
||||
assert str(neptune_scan.id) not in ids
|
||||
|
||||
def test_attack_paths_scans_list_falls_back_when_active_sink_has_no_scan(
|
||||
self,
|
||||
authenticated_client,
|
||||
providers_fixture,
|
||||
scans_fixture,
|
||||
create_attack_paths_scan,
|
||||
settings,
|
||||
):
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
|
||||
provider = providers_fixture[0]
|
||||
|
||||
legacy_scan = create_attack_paths_scan(
|
||||
provider,
|
||||
scan=scans_fixture[0],
|
||||
state=StateChoices.COMPLETED,
|
||||
graph_data_ready=True,
|
||||
sink_backend="neo4j",
|
||||
)
|
||||
|
||||
response = authenticated_client.get(reverse("attack-paths-scans-list"))
|
||||
|
||||
assert response.status_code == status.HTTP_200_OK
|
||||
ids = {item["id"] for item in response.json()["data"]}
|
||||
assert str(legacy_scan.id) in ids
|
||||
|
||||
def test_attack_paths_scans_list_respects_provider_group_visibility(
|
||||
self,
|
||||
authenticated_client_no_permissions_rbac,
|
||||
@@ -4874,7 +4932,8 @@ class TestAttackPathsScanViewSet:
         )

         assert response.status_code == status.HTTP_200_OK
-        mock_get_queries.assert_called_once_with(provider.provider)
+        # TODO: drop the is_migrated argument after Neptune cutover
+        mock_get_queries.assert_called_once_with(provider.provider, is_migrated=False)
         payload = response.json()["data"]
         assert len(payload) == 1
         assert payload[0]["id"] == "aws-rds"
@@ -4974,7 +5033,8 @@ class TestAttackPathsScanViewSet:
         )

         assert response.status_code == status.HTTP_200_OK
-        mock_get_query.assert_called_once_with("aws-rds")
+        # TODO: drop the is_migrated argument after Neptune cutover
+        mock_get_query.assert_called_once_with("aws-rds", is_migrated=False)
         mock_get_db_name.assert_called_once_with(attack_paths_scan.provider.tenant_id)
         provider_id = str(attack_paths_scan.provider_id)
         mock_prepare.assert_called_once_with(
@@ -4988,6 +5048,7 @@ class TestAttackPathsScanViewSet:
|
||||
query_definition,
|
||||
prepared_parameters,
|
||||
provider_id,
|
||||
scan=attack_paths_scan,
|
||||
)
|
||||
result = response.json()["data"]
|
||||
attributes = result["attributes"]
|
||||
@@ -5339,6 +5400,7 @@ class TestAttackPathsScanViewSet:
|
||||
"db-test",
|
||||
"MATCH (n) RETURN n",
|
||||
str(attack_paths_scan.provider_id),
|
||||
scan=attack_paths_scan,
|
||||
)
|
||||
attributes = response.json()["data"]["attributes"]
|
||||
assert len(attributes["nodes"]) == 1
|
||||
@@ -5875,9 +5937,10 @@ class TestAttackPathsScanViewSet:
         )

         assert response.status_code == status.HTTP_200_OK
-        mock_get_schema.assert_called_once_with(
-            "db-test", str(attack_paths_scan.provider_id)
-        )
+        mock_get_schema.assert_called_once()
+        schema_args = mock_get_schema.call_args[0]
+        assert schema_args[:2] == ("db-test", str(attack_paths_scan.provider_id))
+        assert schema_args[2].id == attack_paths_scan.id
         attributes = response.json()["data"]["attributes"]
         assert attributes["provider"] == "aws"
         assert attributes["cartography_version"] == "0.129.0"
|
||||
@@ -2876,13 +2876,22 @@ class AttackPathsScanViewSet(BaseRLSViewSet):

     def list(self, request, *args, **kwargs):
         queryset = self.filter_queryset(self.get_queryset())
+        active_sink_backend = django_settings.ATTACK_PATHS_SINK_DATABASE

         latest_per_provider = queryset.annotate(
+            active_sink_rank=Case(
+                When(sink_backend=active_sink_backend, then=Value(0)),
+                default=Value(1),
+                output_field=IntegerField(),
+            ),
             latest_scan_rank=Window(
                 expression=RowNumber(),
                 partition_by=[F("provider_id")],
-                order_by=[F("inserted_at").desc()],
-            )
+                order_by=[
+                    F("active_sink_rank").asc(),
+                    F("inserted_at").desc(),
+                ],
+            ),
         ).filter(latest_scan_rank=1)

         page = self.paginate_queryset(latest_per_provider)
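The `list` change ranks each provider's scans by "written to the active sink first, then newest" and keeps the top row per provider. A plain-Python sketch of that selection rule, assuming in-memory rows rather than a Django queryset; `ScanRow`, `latest_per_provider` and `_sort_key` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ScanRow:
    # Hypothetical stand-in for an AttackPathsScan row.
    id: str
    provider_id: str
    sink_backend: str
    inserted_at: datetime


def _sort_key(row: ScanRow, active_sink: str) -> tuple[int, float]:
    # Rank 0 for the active sink, 1 otherwise; negate the timestamp so that
    # newer rows sort first within the same rank.
    rank = 0 if row.sink_backend == active_sink else 1
    return (rank, -row.inserted_at.timestamp())


def latest_per_provider(rows: list[ScanRow], active_sink: str) -> dict[str, ScanRow]:
    """Pick one scan per provider: active-sink scans first, then newest."""
    best: dict[str, ScanRow] = {}
    for row in rows:
        current = best.get(row.provider_id)
        if current is None or _sort_key(row, active_sink) < _sort_key(current, active_sink):
            best[row.provider_id] = row
    return best
```

With one Neo4j scan and one newer Neptune scan for the same provider, an active sink of `neo4j` selects the Neo4j scan. When only a legacy Neo4j scan exists and the active sink is `neptune`, the legacy scan is still returned, which matches the two list tests in this commit.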
@@ -2909,7 +2918,11 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
     )
     def attack_paths_queries(self, request, pk=None):
         attack_paths_scan = self.get_object()
-        queries = get_queries_for_provider(attack_paths_scan.provider.provider)
+        # TODO: drop the is_migrated argument after Neptune cutover
+        queries = get_queries_for_provider(
+            attack_paths_scan.provider.provider,
+            is_migrated=attack_paths_scan.is_migrated,
+        )

         if not queries:
             return Response(
@@ -2942,7 +2955,11 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
         serializer = AttackPathsQueryRunRequestSerializer(data=payload)
         serializer.is_valid(raise_exception=True)

-        query_definition = get_query_by_id(serializer.validated_data["id"])
+        # TODO: drop the is_migrated argument after Neptune cutover
+        query_definition = get_query_by_id(
+            serializer.validated_data["id"],
+            is_migrated=attack_paths_scan.is_migrated,
+        )
         if (
             query_definition is None
             or query_definition.provider != attack_paths_scan.provider.provider
@@ -2968,6 +2985,7 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
|
||||
query_definition,
|
||||
parameters,
|
||||
provider_id,
|
||||
scan=attack_paths_scan,
|
||||
)
|
||||
query_duration = time.monotonic() - start
|
||||
|
||||
@@ -3035,6 +3053,7 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
|
||||
database_name,
|
||||
serializer.validated_data["query"],
|
||||
provider_id,
|
||||
scan=attack_paths_scan,
|
||||
)
|
||||
query_duration = time.monotonic() - start
|
||||
|
||||
@@ -3091,7 +3110,7 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
         provider_id = str(attack_paths_scan.provider_id)

         schema = attack_paths_views_helpers.get_cartography_schema(
-            database_name, provider_id
+            database_name, provider_id, attack_paths_scan
         )
         if not schema:
             return Response(
|
||||
@@ -311,6 +311,11 @@ ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES = env.int(
|
||||
"ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES", 2880
|
||||
) # 48h
|
||||
|
||||
# Selects where the persistent attack-paths graph is stored. The scan
|
||||
# temporary database is always Neo4j; only the sink is configurable.
|
||||
# Valid values: "neo4j" (default, OSS and local dev), "neptune" (hosted).
|
||||
ATTACK_PATHS_SINK_DATABASE = env.str("ATTACK_PATHS_SINK_DATABASE", default="neo4j")
|
||||
|
||||
# Orphan task recovery feature flags. The master switch is OFF by default, so task
|
||||
# recovery is opt-in; enable it with DJANGO_TASK_RECOVERY_ENABLED=true. The per-group
|
||||
# toggles default to enabled, so once the master is on every group recovers unless a
|
||||
|
||||
@@ -50,6 +50,12 @@ DATABASES = {
|
||||
"USER": env.str("NEO4J_USER", "neo4j"),
|
||||
"PASSWORD": env.str("NEO4J_PASSWORD", "neo4j_password"),
|
||||
},
|
||||
"neptune": {
|
||||
"WRITER_ENDPOINT": env.str("NEPTUNE_WRITER_ENDPOINT", ""),
|
||||
"READER_ENDPOINT": env.str("NEPTUNE_READER_ENDPOINT", ""),
|
||||
"PORT": env.str("NEPTUNE_PORT", "8182"),
|
||||
"REGION": env.str("AWS_REGION", ""),
|
||||
},
|
||||
}
|
||||
|
||||
DATABASES["default"] = DATABASES["prowler_user"]
|
||||
|
||||
@@ -49,12 +49,19 @@ DATABASES = {
|
||||
"HOST": env("POSTGRES_REPLICA_HOST", default=default_db_host),
|
||||
"PORT": env("POSTGRES_REPLICA_PORT", default=default_db_port),
|
||||
},
|
||||
# TODO: drop after Neptune cutover just loosen defaults to `""`
|
||||
"neo4j": {
|
||||
"HOST": env.str("NEO4J_HOST"),
|
||||
"PORT": env.str("NEO4J_PORT"),
|
||||
"USER": env.str("NEO4J_USER"),
|
||||
"PASSWORD": env.str("NEO4J_PASSWORD"),
|
||||
},
|
||||
"neptune": {
|
||||
"WRITER_ENDPOINT": env.str("NEPTUNE_WRITER_ENDPOINT", default=""),
|
||||
"READER_ENDPOINT": env.str("NEPTUNE_READER_ENDPOINT", default=""),
|
||||
"PORT": env.str("NEPTUNE_PORT", default="8182"),
|
||||
"REGION": env.str("AWS_REGION", default=""),
|
||||
},
|
||||
}
|
||||
|
||||
DATABASES["default"] = DATABASES["prowler_user"]
|
||||
|
||||
@@ -83,12 +83,28 @@ def _warm_compliance_caches_in_background():


 def post_fork(_server, worker):
-    """Warm compliance caches after each worker fork.
+    """Re-initialize attack-paths drivers and warm compliance caches per worker.

-    Warm compliance caches in a background thread so the worker becomes ready
-    immediately. A request for a not-yet-warmed provider lazily loads just that
-    provider, which stays well under the worker timeout.
+    Neo4j / Neptune drivers spawn background IO threads that do not survive
+    ``fork()``. When the gunicorn master runs with ``preload_app=True``, the
+    child inherits driver objects whose pool references dead threads and
+    hangs on the first ``pool.acquire`` call until the watchdog kills the
+    worker. Re-initializing per worker guarantees each child owns its own
+    live threads. See GUNICORN_WORKER_TIMEOUTS_ANALYSIS.md for detail.
+
+    Compliance caches are then warmed in a background thread so the worker
+    becomes ready immediately. A request for a not-yet-warmed provider lazily
+    loads just that provider, which stays well under the worker timeout.
     """
+    from api.attack_paths import database as graph_database
+
+    try:
+        graph_database.close_driver()
+    except Exception:  # pragma: no cover - best-effort cleanup
+        pass
+    graph_database.init_driver()
+    gunicorn_logger.info(f"Attack-paths drivers initialized for worker {worker.pid}")
+
     threading.Thread(
         target=_warm_compliance_caches_in_background,
         name="warm-compliance-caches",
|
||||
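The `post_fork` docstring describes a close-then-reinitialize pattern for driver objects inherited across `fork()`. The sketch below isolates that pattern under stated assumptions: `DriverRegistry` and `reinit_after_fork` are hypothetical names, the factory stands in for whatever builds the real driver, and nothing here is the module's actual `close_driver` / `init_driver` code.

```python
import logging

logger = logging.getLogger("gunicorn.error")


class DriverRegistry:
    """Hypothetical holder for a process-wide driver object."""

    def __init__(self, factory):
        self._factory = factory
        self._driver = None

    def close(self) -> None:
        if self._driver is not None:
            self._driver.close()
            self._driver = None

    def init(self):
        self._driver = self._factory()
        return self._driver


def reinit_after_fork(registry: DriverRegistry, worker_pid: int) -> None:
    """Drop any driver inherited from the parent, then build a fresh one."""
    try:
        registry.close()
    except Exception:
        # Best-effort: the inherited driver may already be unusable.
        # Forget it so the stale handle is never reused.
        registry._driver = None
    registry.init()
    logger.info("drivers initialized for worker %s", worker_pid)
```

Swallowing the error from `close()` is deliberate in this sketch: a failure to tear down the inherited object should not prevent the worker from building its own.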
@@ -1821,6 +1821,36 @@ def attack_paths_query_definition_factory():
|
||||
return _create
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def sink_backend_stub():
|
||||
"""Install a stub `SinkDatabase` into the sink factory for the test's duration.
|
||||
|
||||
The sink factory caches a process-wide backend and lazily initializes it
|
||||
against `settings.DATABASES["neo4j"]` / `["neptune"]`. Tests that don't
|
||||
want to stand up a real Bolt driver can yield this fixture's mock and
|
||||
configure its return values directly:
|
||||
|
||||
sink_backend_stub.execute_read_query.return_value = some_graph
|
||||
|
||||
Both the active backend and the secondary-backend cache are restored on
|
||||
teardown so tests stay isolated.
|
||||
"""
|
||||
from api.attack_paths.sink import factory
|
||||
from api.attack_paths.sink.base import SinkDatabase
|
||||
|
||||
stub = MagicMock(spec=SinkDatabase)
|
||||
previous_backend = factory._backend
|
||||
previous_secondary = dict(factory._secondary_backends)
|
||||
factory._backend = stub
|
||||
factory._secondary_backends.clear()
|
||||
try:
|
||||
yield stub
|
||||
finally:
|
||||
factory._backend = previous_backend
|
||||
factory._secondary_backends.clear()
|
||||
factory._secondary_backends.update(previous_secondary)
|
||||
|
||||
|
||||
@pytest.fixture
|
||||
def attack_paths_graph_stub_classes():
|
||||
"""Provide lightweight graph element stubs for Attack Paths serialization tests."""
|
||||
|
||||
@@ -6,6 +6,7 @@ from typing import Any
|
||||
|
||||
import aioboto3
|
||||
import boto3
|
||||
import botocore
|
||||
import neo4j
|
||||
from api.models import (
|
||||
AttackPathsScan as ProwlerAPIAttackPathsScan,
|
||||
@@ -73,13 +74,28 @@ def start_aws_ingestion(
     # Adding an extra field
     common_job_parameters["AWS_ID"] = prowler_api_provider.uid

-    cartography_aws._autodiscover_accounts(
-        neo4j_session,
-        boto3_session,
-        prowler_api_provider.uid,
-        cartography_config.update_tag,
-        common_job_parameters,
-    )
+    # AWS Organizations account autodiscovery. Inlined from Cartography's removed
+    # `_autodiscover_accounts` (deleted in `0.137.0`), as `load_aws_accounts` is still public.
+    try:
+        org_client = boto3_session.client("organizations")
+        paginator = org_client.get_paginator("list_accounts")
+        discovered = []
+        for page in paginator.paginate():
+            discovered.extend(page["Accounts"])
+        active_accounts = {
+            a["Name"]: a["Id"] for a in discovered if a["Status"] == "ACTIVE"
+        }
+        cartography_aws.organizations.load_aws_accounts(
+            neo4j_session,
+            active_accounts,
+            cartography_config.update_tag,
+            common_job_parameters,
+        )
+    except botocore.exceptions.ClientError:
+        logger.warning(
+            f"Account {prowler_api_provider.uid} lacks permissions for AWS "
+            "Organizations autodiscovery."
+        )
     db_utils.update_attack_paths_scan_progress(attack_paths_scan, 4)

     failed_syncs = sync_aws_account(
@@ -277,7 +293,7 @@ def sync_aws_account(
     sync_args: dict[str, Any],
     attack_paths_scan: ProwlerAPIAttackPathsScan,
 ) -> dict[str, str]:
-    current_progress = 4  # `cartography_aws._autodiscover_accounts`
+    current_progress = 4  # AWS Organizations account autodiscovery
     max_progress = (
         87  # `cartography_aws.RESOURCE_FUNCTIONS["permission_relationships"]` - 1
     )
|
||||
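The inlined AWS Organizations autodiscovery in `start_aws_ingestion` collects every page of `list_accounts` and keeps only `ACTIVE` accounts as a name-to-id mapping. A small sketch of just that filtering step, assuming the pages are plain dicts shaped like the paginator output; `active_accounts_by_name` is a hypothetical helper, not a function from the repository or from Cartography:

```python
from collections.abc import Iterable


def active_accounts_by_name(pages: Iterable[dict]) -> dict[str, str]:
    """Map account name to account id for ACTIVE accounts only.

    `pages` mimics the page dicts yielded by a `list_accounts` paginator.
    """
    accounts: dict[str, str] = {}
    for page in pages:
        for account in page.get("Accounts", []):
            if account.get("Status") == "ACTIVE":
                accounts[account["Name"]] = account["Id"]
    return accounts
```

Because the mapping is keyed by name, two active accounts that share a name collapse into one entry (the later one wins). The dict comprehension in the commit has the same property.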
@@ -8,7 +8,7 @@ from celery import states
 from celery.utils.log import get_task_logger
 from config.django.base import ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES
 from tasks.jobs.attack_paths.db_utils import (
-    _mark_scan_finished,
+    mark_scan_finished,
     recover_graph_data_ready,
 )
 from tasks.jobs.orphan_recovery import is_worker_alive as _is_worker_alive
@@ -87,7 +87,7 @@ def _cleanup_stale_executing_scans(cutoff: datetime) -> list[str]:
|
||||
else:
|
||||
reason = "Worker dead — cleaned up by periodic task"
|
||||
else:
|
||||
# No worker recorded — time-based heuristic only
|
||||
# No worker recorded, time-based heuristic only
|
||||
if scan.started_at and scan.started_at >= cutoff:
|
||||
continue
|
||||
reason = (
|
||||
@@ -160,7 +160,7 @@ def _cleanup_scan(scan, task_result, reason: str) -> bool:
     """
     scan_id_str = str(scan.id)

-    # 1. Drop temp Neo4j database
+    # Drop temp Neo4j database
     tmp_db_name = graph_database.get_database_name(scan.id, temporary=True)
     try:
         graph_database.drop_database(tmp_db_name)
@@ -225,6 +225,6 @@ def _finalize_failed_scan(scan, expected_state: str, reason: str):
|
||||
logger.info(f"Scan {scan_id_str} is now {fresh_scan.state}, skipping")
|
||||
return None
|
||||
|
||||
_mark_scan_finished(fresh_scan, StateChoices.FAILED, {"global_error": reason})
|
||||
mark_scan_finished(fresh_scan, StateChoices.FAILED, {"global_error": reason})
|
||||
|
||||
return fresh_scan
|
||||
|
||||
@@ -1,9 +1,14 @@
|
||||
from collections.abc import Callable
|
||||
from dataclasses import dataclass
|
||||
from uuid import UUID
|
||||
|
||||
from config.env import env
|
||||
from tasks.jobs.attack_paths import aws
|
||||
from tasks.jobs.attack_paths import provider_config as _provider_config
|
||||
|
||||
# Re-export provider config objects so existing imports keep working.
|
||||
AWS_CONFIG = _provider_config.AWS_CONFIG
|
||||
NormalizedList = _provider_config.NormalizedList
|
||||
PROVIDER_CONFIGS = _provider_config.PROVIDER_CONFIGS
|
||||
ProviderConfig = _provider_config.ProviderConfig
|
||||
|
||||
# Batch size for Neo4j write operations (resource labeling, cleanup)
|
||||
BATCH_SIZE = env.int("ATTACK_PATHS_BATCH_SIZE", 1000)
|
||||
@@ -21,42 +26,12 @@ PROWLER_FINDING_LABEL = "ProwlerFinding"
|
||||
PROVIDER_RESOURCE_LABEL = "_ProviderResource"
|
||||
|
||||
# Dynamic isolation labels that contain entity UUIDs and are added to every synced node during sync
|
||||
# Format: _Tenant_{uuid_no_hyphens}, _Provider_{uuid_no_hyphens}
|
||||
# Format: `_Tenant_{uuid_no_hyphens}`, `_Provider_{uuid_no_hyphens}`
|
||||
TENANT_LABEL_PREFIX = "_Tenant_"
|
||||
PROVIDER_LABEL_PREFIX = "_Provider_"
|
||||
DYNAMIC_ISOLATION_PREFIXES = [TENANT_LABEL_PREFIX, PROVIDER_LABEL_PREFIX]
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class ProviderConfig:
|
||||
"""Configuration for a cloud provider's Attack Paths integration."""
|
||||
|
||||
name: str
|
||||
root_node_label: str # e.g., "AWSAccount"
|
||||
uid_field: str # e.g., "arn"
|
||||
# Label for resources connected to the account node, enabling indexed finding lookups.
|
||||
resource_label: str # e.g., "_AWSResource"
|
||||
ingestion_function: Callable
|
||||
# Maps a Postgres resource UID (e.g. full ARN) to the short-id form Cartography stores on some node types (e.g. `i-xxx` for EC2Instance).
|
||||
short_uid_extractor: Callable[[str], str]
|
||||
|
||||
|
||||
# Provider Configurations
|
||||
# -----------------------
|
||||
|
||||
AWS_CONFIG = ProviderConfig(
|
||||
name="aws",
|
||||
root_node_label="AWSAccount",
|
||||
uid_field="arn",
|
||||
resource_label="_AWSResource",
|
||||
ingestion_function=aws.start_aws_ingestion,
|
||||
short_uid_extractor=aws.extract_short_uid,
|
||||
)
|
||||
|
||||
PROVIDER_CONFIGS: dict[str, ProviderConfig] = {
|
||||
"aws": AWS_CONFIG,
|
||||
}
|
||||
|
||||
# Labels added by Prowler that should be filtered from API responses
|
||||
# Derived from provider configs + common internal labels
|
||||
INTERNAL_LABELS: list[str] = [
|
||||
@@ -87,7 +62,6 @@ INTERNAL_PROPERTIES: list[str] = [
|
||||
|
||||
|
||||
# Provider Config Accessors
|
||||
# -------------------------
|
||||
|
||||
|
||||
def is_provider_available(provider_type: str) -> bool:
|
||||
@@ -135,7 +109,6 @@ def get_short_uid_extractor(provider_type: str) -> Callable[[str], str]:
|
||||
|
||||
|
||||
# Dynamic Isolation Label Helpers
|
||||
# --------------------------------
|
||||
|
||||
|
||||
def _normalize_uuid(value: str | UUID) -> str:
|
||||
|
||||
@@ -8,6 +8,8 @@ from api.models import Provider as ProwlerAPIProvider
|
||||
from api.models import StateChoices
|
||||
from cartography.config import Config as CartographyConfig
|
||||
from celery.utils.log import get_task_logger
|
||||
from django.conf import settings
|
||||
from django.db.models import Case, IntegerField, Value, When
|
||||
from tasks.jobs.attack_paths.config import is_provider_available
|
||||
|
||||
logger = get_task_logger(__name__)
|
||||
@@ -29,13 +31,33 @@ def create_attack_paths_scan(
         return None

     with rls_transaction(tenant_id):
-        # Inherit graph_data_ready from the previous scan for this provider,
-        # so queries remain available while the new scan runs.
-        previous_data_ready = ProwlerAPIAttackPathsScan.objects.filter(
-            tenant_id=tenant_id,
-            provider_id=provider_id,
-            graph_data_ready=True,
-        ).exists()
+        # Inherit metadata from the previous ready scan for this provider so
+        # queries remain available while the new scan runs. The new row only
+        # flips to the target sink after its own graph sync succeeds.
+        active_sink_backend = settings.ATTACK_PATHS_SINK_DATABASE
+        previous_ready = (
+            ProwlerAPIAttackPathsScan.objects.filter(
+                tenant_id=tenant_id,
+                provider_id=provider_id,
+                graph_data_ready=True,
+            )
+            .annotate(
+                active_sink_rank=Case(
+                    When(sink_backend=active_sink_backend, then=Value(0)),
+                    default=Value(1),
+                    output_field=IntegerField(),
+                )
+            )
+            .order_by("active_sink_rank", "-inserted_at")
+            .first()
+        )
+        previous_data_ready = previous_ready is not None
+        inherited_is_migrated = previous_ready.is_migrated if previous_ready else False
+        inherited_sink_backend = (
+            previous_ready.sink_backend
+            if previous_ready
+            else ProwlerAPIAttackPathsScan.SinkBackendChoices.NEO4J
+        )

         attack_paths_scan = ProwlerAPIAttackPathsScan.objects.create(
             tenant_id=tenant_id,
@@ -44,6 +66,8 @@ def create_attack_paths_scan(
|
||||
state=StateChoices.SCHEDULED,
|
||||
started_at=datetime.now(tz=UTC),
|
||||
graph_data_ready=previous_data_ready,
|
||||
is_migrated=inherited_is_migrated,
|
||||
sink_backend=inherited_sink_backend,
|
||||
)
|
||||
attack_paths_scan.save()
|
||||
|
||||
@@ -114,7 +138,7 @@ def starting_attack_paths_scan(
|
||||
return True
|
||||
|
||||
|
||||
def _mark_scan_finished(
|
||||
def mark_scan_finished(
|
||||
attack_paths_scan: ProwlerAPIAttackPathsScan,
|
||||
state: StateChoices,
|
||||
ingestion_exceptions: dict[str, Any],
|
||||
@@ -148,7 +172,7 @@ def finish_attack_paths_scan(
|
||||
ingestion_exceptions: dict[str, Any],
|
||||
) -> None:
|
||||
with rls_transaction(attack_paths_scan.tenant_id):
|
||||
_mark_scan_finished(attack_paths_scan, state, ingestion_exceptions)
|
||||
mark_scan_finished(attack_paths_scan, state, ingestion_exceptions)
|
||||
|
||||
|
||||
def update_attack_paths_scan_progress(
|
||||
@@ -169,19 +193,45 @@ def set_graph_data_ready(
     attack_paths_scan.save(update_fields=["graph_data_ready"])


+def set_scan_migrated(
+    attack_paths_scan: ProwlerAPIAttackPathsScan,
+    migrated: bool,
+    sink_backend: str | None = None,
+) -> None:
+    """Mark the scan as written with the current (migrated) schema.
+
+    Called after a successful sync so the read catalog and sink backend only
+    switch once the new graph is actually live.
+
+    # TODO: drop after Neptune cutover
+    """
+    with rls_transaction(attack_paths_scan.tenant_id):
+        attack_paths_scan.is_migrated = migrated
+        update_fields = ["is_migrated"]
+        if sink_backend is not None:
+            attack_paths_scan.sink_backend = sink_backend
+            update_fields.append("sink_backend")
+        attack_paths_scan.save(update_fields=update_fields)
+
+
 def set_provider_graph_data_ready(
     attack_paths_scan: ProwlerAPIAttackPathsScan,
     ready: bool,
+    sink_backend: str | None = None,
 ) -> None:
     """
-    Set `graph_data_ready` for ALL scans of the same provider.
+    Set `graph_data_ready` for scans of the same provider in one sink.

-    Used before drop/sync so that older scan IDs cannot bypass the query gate while the graph is being replaced.
+    Used before drop/sync so that older scan IDs in the target sink cannot
+    bypass the query gate while that sink's graph is being replaced. Scans
+    preserved in another sink stay queryable for rollback.
     """
+    target_sink_backend = sink_backend or attack_paths_scan.sink_backend
     with rls_transaction(attack_paths_scan.tenant_id):
         ProwlerAPIAttackPathsScan.objects.filter(
             tenant_id=attack_paths_scan.tenant_id,
             provider_id=attack_paths_scan.provider_id,
+            sink_backend=target_sink_backend,
         ).update(graph_data_ready=ready)
         attack_paths_scan.refresh_from_db(fields=["graph_data_ready"])

@@ -202,10 +252,15 @@ def recover_graph_data_ready(
     next successful scan) is a worse outcome for the user.
     """
     try:
+        from api.attack_paths import sink as sink_module
+
         tenant_db = graph_database.get_database_name(attack_paths_scan.tenant_id)
-        if graph_database.has_provider_data(
-            tenant_db, str(attack_paths_scan.provider_id)
-        ):
+        # TODO: drop after Neptune cutover
+        # Check the backend that actually holds this scan's data, not the
+        # currently configured sink, a stale `EXECUTING` scan from before a
+        # backend switch must still be recoverable
+        backend = sink_module.get_backend_for_scan(attack_paths_scan)
+        if backend.has_provider_data(tenant_db, str(attack_paths_scan.provider_id)):
             set_provider_graph_data_ready(attack_paths_scan, True)
             logger.info(
                 f"Recovered `graph_data_ready` for provider {attack_paths_scan.provider_id}"
@@ -247,6 +302,6 @@ def fail_attack_paths_scan(
|
||||
return
|
||||
if fresh.state in (StateChoices.COMPLETED, StateChoices.FAILED):
|
||||
return
|
||||
_mark_scan_finished(fresh, StateChoices.FAILED, {"global_error": error})
|
||||
mark_scan_finished(fresh, StateChoices.FAILED, {"global_error": error})
|
||||
|
||||
recover_graph_data_ready(fresh)
|
||||
|
||||
@@ -82,7 +82,6 @@ def _to_neo4j_dict(
|
||||
|
||||
|
||||
# Public API
|
||||
# ----------
|
||||
|
||||
|
||||
def analysis(
|
||||
@@ -196,7 +195,6 @@ def load_findings(
|
||||
|
||||
|
||||
# Findings Streaming (Generator-based)
|
||||
# -------------------------------------
|
||||
|
||||
|
||||
def stream_findings_with_resources(
|
||||
@@ -275,7 +273,6 @@ def _fetch_findings_batch(
|
||||
|
||||
|
||||
# Batch Enrichment
|
||||
# -----------------
|
||||
|
||||
|
||||
def _enrich_batch_with_resources(
|
||||
|
||||
@@ -1,5 +1,6 @@
|
||||
import neo4j
|
||||
from cartography.client.core.tx import run_write_query
|
||||
from cartography.intel import create_indexes as cartography_create_indexes
|
||||
from celery.utils.log import get_task_logger
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
INTERNET_NODE_LABEL,
|
||||
@@ -30,14 +31,34 @@ SYNC_INDEX_STATEMENTS = [


 def create_findings_indexes(neo4j_session: neo4j.Session) -> None:
-    """Create indexes for Prowler findings and resource lookups."""
+    """Create indexes for Prowler findings and resource lookups.
+
+    Runs `CREATE INDEX`, so the caller must only invoke this against a Neo4j
+    session (the temp ingest DB or a Neo4j sink). Neptune auto-manages indexes
+    and rejects `CREATE INDEX`, so callers skip it for the Neptune sink.
+    """
     logger.info("Creating indexes for Prowler Findings node types")
     for statement in FINDINGS_INDEX_STATEMENTS:
         run_write_query(neo4j_session, statement)


+def create_cartography_indexes(neo4j_session: neo4j.Session, config) -> None:
+    """Create Cartography's standard indexes for the session's database.
+
+    Runs `CREATE INDEX`, so the caller must only invoke this against a Neo4j
+    session (the temp ingest DB or a Neo4j sink). Neptune auto-manages indexes
+    and rejects `CREATE INDEX`, so callers skip it for the Neptune sink.
+    """
+    cartography_create_indexes.run(neo4j_session, config)
+
+
 def create_sync_indexes(neo4j_session: neo4j.Session) -> None:
-    """Create indexes for provider resource sync operations."""
+    """Create indexes for provider resource sync operations.
+
+    Runs `CREATE INDEX`, so the caller must only invoke this against a Neo4j
+    session (the temp ingest DB or a Neo4j sink). Neptune auto-manages indexes
+    and rejects `CREATE INDEX`, so callers skip it for the Neptune sink.
+    """
     logger.info("Ensuring ProviderResource indexes exist")
     for statement in SYNC_INDEX_STATEMENTS:
         neo4j_session.run(statement)
|
||||
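The three docstrings above all state the same rule: `CREATE INDEX` is only valid against Neo4j, and callers skip it for the Neptune sink. A minimal sketch of that caller-side guard follows. The helper name `run_index_creators` and its callable-based signature are hypothetical and do not appear in this commit; only the `"neptune"` backend value comes from the diff.

```python
from collections.abc import Callable, Iterable


def run_index_creators(
    sink_backend: str,
    creators: Iterable[Callable[[], None]],
) -> bool:
    """Run each index creator unless the sink is Neptune.

    Returns True when the creators ran and False when they were skipped.
    """
    if sink_backend == "neptune":
        # Neptune manages indexes itself and rejects `CREATE INDEX`.
        return False
    for create in creators:
        create()
    return True
```

In the real task the creators would be bound to a tenant session (for example with `functools.partial`); the staging database is always Neo4j, so it would not need this guard.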
@@ -0,0 +1,413 @@
|
||||
"""
|
||||
Provider-level Attack Paths configuration.
|
||||
|
||||
Each `ProviderConfig` carries the cloud provider's ingestion entry point and
|
||||
the catalog of list-typed node properties (`normalized_lists`). The sync
|
||||
layer reads this catalog and materialises each list element as a child node
|
||||
connected to the parent by a typed edge, so queries traverse the graph
|
||||
instead of working on serialised list values. Both Neo4j and Neptune sinks
|
||||
write the same shape and queries are portable across them.
|
||||
"""
|
||||
|
||||
from collections.abc import Callable
|
||||
from dataclasses import dataclass, field
|
||||
|
||||
from tasks.jobs.attack_paths import aws
|
||||
|
||||
|
||||
@dataclass(frozen=True)
|
||||
class NormalizedList:
|
||||
"""Catalog entry for a list-typed node property.
|
||||
|
||||
Describes how the sync layer materialises a parent node's list-typed
|
||||
property as a set of child item nodes connected by a typed edge.
|
||||
|
||||
Conventions (mechanical, do not invent):
|
||||
- `child_label`: `<SourceLabel><PropertyPascal>Item`
|
||||
e.g. AWSPolicyStatement.resource -> AWSPolicyStatementResourceItem
|
||||
- `rel_type`: `HAS_<PROPERTY_UPPER>`
|
||||
e.g. resource -> HAS_RESOURCE
|
||||
- child node property:
|
||||
* `field_map = []` (scalar list, ~95% case) -> child stores `value: str`
|
||||
* `field_map = [(src_key, child_field), ...]` (list of dicts, rare)
|
||||
-> child stores those fields
|
||||
"""
|
||||
|
||||
source_label: str
|
||||
source_property: str
|
||||
child_label: str
|
||||
rel_type: str
|
||||
field_map: list[tuple[str, str]] = field(default_factory=list)
|
||||
|
||||
def __post_init__(self) -> None:
|
||||
if self.field_map:
|
||||
child_fields = [dst for _, dst in self.field_map]
|
||||
if "value" in child_fields:
|
||||
raise ValueError(
|
||||
f"NormalizedList {self.source_label}.{self.source_property}: "
|
||||
"`value` is reserved for scalar mode; do not map a source key to it"
|
||||
)
|
||||
src_keys = [src for src, _ in self.field_map]
|
||||
if len(set(src_keys)) != len(src_keys):
|
||||
raise ValueError(
|
||||
f"NormalizedList {self.source_label}.{self.source_property}: "
|
||||
"duplicate source key in field_map"
|
||||
)
|
||||
if len(set(child_fields)) != len(child_fields):
|
||||
raise ValueError(
|
||||
f"NormalizedList {self.source_label}.{self.source_property}: "
|
||||
"duplicate child field in field_map"
|
||||
)
|
||||
|
||||
|
||||
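The `NormalizedList` docstring describes two child shapes: scalar lists store each element as `value`, and lists of dicts store the fields named in `field_map`. The sketch below illustrates that mapping for a single list element. The function `child_properties` is hypothetical, and using `dict.get` for missing source keys is an assumption; the sink code that actually builds child nodes is not shown in this part of the diff.

```python
from typing import Any


def child_properties(
    item: Any,
    field_map: list[tuple[str, str]],
) -> dict[str, Any]:
    """Build the property dict for one child item node."""
    if not field_map:
        # Scalar mode: the element itself is stored as a string under `value`.
        return {"value": str(item)}
    # Dict mode: copy each mapped source key to its child field name.
    # Missing source keys become None (an assumption of this sketch).
    return {child_field: item.get(src_key) for src_key, child_field in field_map}


scalar_child = child_properties("s3:GetObject", [])
tag_child = child_properties(
    {"Key": "env", "Value": "prod"},
    [("Key", "key"), ("Value", "value_")],
)
```

Because scalar mode owns the `value` key, a dict-mode mapping has to pick a different child field name, which is why `__post_init__` rejects `value` as a child field.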
@dataclass(frozen=True)
|
||||
class ProviderConfig:
|
||||
"""Configuration for a cloud provider's Attack Paths integration."""
|
||||
|
||||
name: str
|
||||
root_node_label: str # e.g., "AWSAccount"
|
||||
uid_field: str # e.g., "arn"
|
||||
# Label for resources connected to the account node, enabling indexed finding lookups
|
||||
resource_label: str # e.g., "_AWSResource"
|
||||
ingestion_function: Callable
|
||||
# Maps a Postgres resource UID (e.g. full ARN) to the short-id form Cartography stores on some node types (e.g. `i-xxx` for EC2Instance)
|
||||
short_uid_extractor: Callable[[str], str]
|
||||
# List-typed properties to materialise as child nodes + edges at sync time.
|
||||
# Mandatory (may be []). Without an entry here, a list-typed property falls
|
||||
# back to comma-string flatten and emits a one-time warning.
|
||||
normalized_lists: list[NormalizedList]
|
||||
|
||||
|
||||
# AWS list-typed property catalog.
|
||||
# One entry per Cartography node property whose runtime value is a list. The
|
||||
# sync layer materialises each element as a `<child_label>` node and links it
|
||||
# to the parent with a `<rel_type>` edge; see the `NormalizedList` docstring
|
||||
# above for the naming conventions.
|
||||
AWS_NORMALIZED_LISTS: list[NormalizedList] = [
|
||||
# AWSPolicyStatement - the hot path driving the 53-query perf fix.
|
||||
NormalizedList(
|
||||
"AWSPolicyStatement", "action", "AWSPolicyStatementActionItem", "HAS_ACTION"
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSPolicyStatement",
|
||||
"notaction",
|
||||
"AWSPolicyStatementNotactionItem",
|
||||
"HAS_NOTACTION",
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSPolicyStatement",
|
||||
"resource",
|
||||
"AWSPolicyStatementResourceItem",
|
||||
"HAS_RESOURCE",
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSPolicyStatement",
|
||||
"notresource",
|
||||
"AWSPolicyStatementNotresourceItem",
|
||||
"HAS_NOTRESOURCE",
|
||||
),
|
||||
# S3PolicyStatement - same shape as IAM policies; AWS allows list or string.
|
||||
NormalizedList(
|
||||
"S3PolicyStatement", "action", "S3PolicyStatementActionItem", "HAS_ACTION"
|
||||
),
|
||||
NormalizedList(
|
||||
"S3PolicyStatement", "resource", "S3PolicyStatementResourceItem", "HAS_RESOURCE"
|
||||
),
|
||||
# IAM / Cognito / KMS / Secrets
|
||||
NormalizedList(
|
||||
"CognitoIdentityPool", "roles", "CognitoIdentityPoolRolesItem", "HAS_ROLES"
|
||||
),
|
||||
NormalizedList(
|
||||
"KMSKey",
|
||||
"encryption_algorithms",
|
||||
"KMSKeyEncryptionAlgorithmsItem",
|
||||
"HAS_ENCRYPTION_ALGORITHMS",
|
||||
),
|
||||
NormalizedList(
|
||||
"KMSKey",
|
||||
"signing_algorithms",
|
||||
"KMSKeySigningAlgorithmsItem",
|
||||
"HAS_SIGNING_ALGORITHMS",
|
||||
),
|
||||
NormalizedList(
|
||||
"KMSKey",
|
||||
"anonymous_actions",
|
||||
"KMSKeyAnonymousActionsItem",
|
||||
"HAS_ANONYMOUS_ACTIONS",
|
||||
),
|
||||
NormalizedList(
|
||||
"KMSGrant", "operations", "KMSGrantOperationsItem", "HAS_OPERATIONS"
|
||||
),
|
||||
NormalizedList(
|
||||
"SecretsManagerSecretVersion",
|
||||
"version_stages",
|
||||
"SecretsManagerSecretVersionVersionStagesItem",
|
||||
"HAS_VERSION_STAGES",
|
||||
),
|
||||
NormalizedList(
|
||||
"SecretsManagerSecretVersion",
|
||||
"kms_key_ids",
|
||||
"SecretsManagerSecretVersionKmsKeyIdsItem",
|
||||
"HAS_KMS_KEY_IDS",
|
||||
),
|
||||
NormalizedList(
|
||||
"SecretsManagerSecretVersion",
|
||||
"tags",
|
||||
"SecretsManagerSecretVersionTagsItem",
|
||||
"HAS_TAGS",
|
||||
field_map=[("Key", "key"), ("Value", "value_")],
|
||||
# `value` is reserved for scalar mode; map `Value` to `value_` to keep dict shape.
|
||||
),
|
||||
# Lambda / Compute
|
||||
NormalizedList(
|
||||
"AWSLambda", "architectures", "AWSLambdaArchitecturesItem", "HAS_ARCHITECTURES"
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSLambda",
|
||||
"anonymous_actions",
|
||||
"AWSLambdaAnonymousActionsItem",
|
||||
"HAS_ANONYMOUS_ACTIONS",
|
||||
),
|
||||
NormalizedList(
|
||||
"CodeBuildProject",
|
||||
"environment_variables",
|
||||
"CodeBuildProjectEnvironmentVariablesItem",
|
||||
"HAS_ENVIRONMENT_VARIABLES",
|
||||
),
|
||||
# ECS family
|
||||
NormalizedList(
|
||||
"ECSCluster",
|
||||
"capacity_providers",
|
||||
"ECSClusterCapacityProvidersItem",
|
||||
"HAS_CAPACITY_PROVIDERS",
|
||||
),
|
||||
NormalizedList(
|
||||
"ECSTaskDefinition",
|
||||
"compatibilities",
|
||||
"ECSTaskDefinitionCompatibilitiesItem",
|
||||
"HAS_COMPATIBILITIES",
|
||||
),
|
||||
NormalizedList(
|
||||
"ECSTaskDefinition",
|
||||
"requires_compatibilities",
|
||||
"ECSTaskDefinitionRequiresCompatibilitiesItem",
|
||||
"HAS_REQUIRES_COMPATIBILITIES",
|
||||
),
|
||||
NormalizedList(
|
||||
"ECSContainerDefinition",
|
||||
"links",
|
||||
"ECSContainerDefinitionLinksItem",
|
||||
"HAS_LINKS",
|
||||
),
|
||||
NormalizedList(
|
||||
"ECSContainerDefinition",
|
||||
"entry_point",
|
||||
"ECSContainerDefinitionEntryPointItem",
|
||||
"HAS_ENTRY_POINT",
|
||||
),
|
||||
NormalizedList(
|
||||
"ECSContainerDefinition",
|
||||
"command",
|
||||
"ECSContainerDefinitionCommandItem",
|
||||
"HAS_COMMAND",
|
||||
),
|
||||
NormalizedList(
|
||||
"ECSContainerDefinition",
|
||||
"dns_servers",
|
||||
"ECSContainerDefinitionDnsServersItem",
|
||||
"HAS_DNS_SERVERS",
|
||||
),
|
||||
NormalizedList(
|
||||
"ECSContainerDefinition",
|
||||
"dns_search_domains",
|
||||
"ECSContainerDefinitionDnsSearchDomainsItem",
|
||||
"HAS_DNS_SEARCH_DOMAINS",
|
||||
),
|
||||
NormalizedList(
|
||||
"ECSContainerDefinition",
|
||||
"docker_security_options",
|
||||
"ECSContainerDefinitionDockerSecurityOptionsItem",
|
||||
"HAS_DOCKER_SECURITY_OPTIONS",
|
||||
),
|
||||
NormalizedList("ECSContainer", "gpu_ids", "ECSContainerGpuIdsItem", "HAS_GPU_IDS"),
|
||||
# ECR
|
||||
NormalizedList(
|
||||
"ECRImage", "layer_diff_ids", "ECRImageLayerDiffIdsItem", "HAS_LAYER_DIFF_IDS"
|
||||
),
|
||||
NormalizedList(
|
||||
"ECRImage",
|
||||
"child_image_digests",
|
||||
"ECRImageChildImageDigestsItem",
|
||||
"HAS_CHILD_IMAGE_DIGESTS",
|
||||
),
|
||||
# EC2 / Networking
|
||||
NormalizedList(
|
||||
"EC2Instance",
|
||||
"exposed_internet_type",
|
||||
"EC2InstanceExposedInternetTypeItem",
|
||||
"HAS_EXPOSED_INTERNET_TYPE",
|
||||
),
|
||||
NormalizedList(
|
||||
"AutoScalingGroup",
|
||||
"exposed_internet_type",
|
||||
"AutoScalingGroupExposedInternetTypeItem",
|
||||
"HAS_EXPOSED_INTERNET_TYPE",
|
||||
),
|
||||
NormalizedList(
|
||||
"LaunchConfiguration",
|
||||
"security_groups",
|
||||
"LaunchConfigurationSecurityGroupsItem",
|
||||
"HAS_SECURITY_GROUPS",
|
||||
),
|
||||
NormalizedList(
|
||||
"LaunchTemplateVersion",
|
||||
"security_group_ids",
|
||||
"LaunchTemplateVersionSecurityGroupIdsItem",
|
||||
"HAS_SECURITY_GROUP_IDS",
|
||||
),
|
||||
NormalizedList(
|
||||
"LaunchTemplateVersion",
|
||||
"security_groups",
|
||||
"LaunchTemplateVersionSecurityGroupsItem",
|
||||
"HAS_SECURITY_GROUPS",
|
||||
),
|
||||
NormalizedList(
|
||||
"ELBListener", "policy_names", "ELBListenerPolicyNamesItem", "HAS_POLICY_NAMES"
|
||||
),
|
||||
# CloudFront / Route53 / CloudWatch / CloudTrail
|
||||
NormalizedList(
|
||||
"CloudFrontDistribution",
|
||||
"aliases",
|
||||
"CloudFrontDistributionAliasesItem",
|
||||
"HAS_ALIASES",
|
||||
),
|
||||
NormalizedList(
|
||||
"CloudFrontDistribution",
|
||||
"geo_restriction_locations",
|
||||
"CloudFrontDistributionGeoRestrictionLocationsItem",
|
||||
"HAS_GEO_RESTRICTION_LOCATIONS",
|
||||
),
|
||||
NormalizedList(
|
||||
"CloudWatchLogGroup",
|
||||
"inherited_properties",
|
||||
"CloudWatchLogGroupInheritedPropertiesItem",
|
||||
"HAS_INHERITED_PROPERTIES",
|
||||
),
|
||||
# RDS / Storage
|
||||
NormalizedList(
|
||||
"RDSCluster",
|
||||
"availability_zones",
|
||||
"RDSClusterAvailabilityZonesItem",
|
||||
"HAS_AVAILABILITY_ZONES",
|
||||
),
|
||||
NormalizedList(
|
||||
"RDSEventSubscription",
|
||||
"event_categories",
|
||||
"RDSEventSubscriptionEventCategoriesItem",
|
||||
"HAS_EVENT_CATEGORIES",
|
||||
),
|
||||
NormalizedList(
|
||||
"RDSEventSubscription",
|
||||
"source_ids",
|
||||
"RDSEventSubscriptionSourceIdsItem",
|
||||
"HAS_SOURCE_IDS",
|
||||
),
|
||||
NormalizedList(
|
||||
"S3Bucket",
|
||||
"anonymous_actions",
|
||||
"S3BucketAnonymousActionsItem",
|
||||
"HAS_ANONYMOUS_ACTIONS",
|
||||
),
|
||||
# Inspector / Config / SSM / ACM / APIGateway / Glue / SageMaker / Bedrock
|
||||
NormalizedList(
|
||||
"AWSInspectorFinding",
|
||||
"referenceurls",
|
||||
"AWSInspectorFindingReferenceurlsItem",
|
||||
"HAS_REFERENCEURLS",
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSInspectorFinding",
|
||||
"relatedvulnerabilities",
|
||||
"AWSInspectorFindingRelatedvulnerabilitiesItem",
|
||||
"HAS_RELATEDVULNERABILITIES",
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSInspectorFinding",
|
||||
"vulnerablepackageids",
|
||||
"AWSInspectorFindingVulnerablepackageidsItem",
|
||||
"HAS_VULNERABLEPACKAGEIDS",
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSConfigurationRecorder",
|
||||
"recording_group_resource_types",
|
||||
"AWSConfigurationRecorderRecordingGroupResourceTypesItem",
|
||||
"HAS_RECORDING_GROUP_RESOURCE_TYPES",
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSConfigRule",
|
||||
"scope_compliance_resource_types",
|
||||
"AWSConfigRuleScopeComplianceResourceTypesItem",
|
||||
"HAS_SCOPE_COMPLIANCE_RESOURCE_TYPES",
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSConfigRule",
|
||||
"source_details",
|
||||
"AWSConfigRuleSourceDetailsItem",
|
||||
"HAS_SOURCE_DETAILS",
|
||||
),
|
||||
NormalizedList(
|
||||
"SSMInstancePatch", "cve_ids", "SSMInstancePatchCveIdsItem", "HAS_CVE_IDS"
|
||||
),
|
||||
NormalizedList(
|
||||
"ACMCertificate", "in_use_by", "ACMCertificateInUseByItem", "HAS_IN_USE_BY"
|
||||
),
|
||||
NormalizedList(
|
||||
"APIGatewayRestAPI",
|
||||
"anonymous_actions",
|
||||
"APIGatewayRestAPIAnonymousActionsItem",
|
||||
"HAS_ANONYMOUS_ACTIONS",
|
||||
),
|
||||
NormalizedList(
|
||||
"GlueJob", "connections", "GlueJobConnectionsItem", "HAS_CONNECTIONS"
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSBedrockFoundationModel",
|
||||
"input_modalities",
|
||||
"AWSBedrockFoundationModelInputModalitiesItem",
|
||||
"HAS_INPUT_MODALITIES",
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSBedrockFoundationModel",
|
||||
"output_modalities",
|
||||
"AWSBedrockFoundationModelOutputModalitiesItem",
|
||||
"HAS_OUTPUT_MODALITIES",
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSBedrockFoundationModel",
|
||||
"customizations_supported",
|
||||
"AWSBedrockFoundationModelCustomizationsSupportedItem",
|
||||
"HAS_CUSTOMIZATIONS_SUPPORTED",
|
||||
),
|
||||
NormalizedList(
|
||||
"AWSBedrockFoundationModel",
|
||||
"inference_types_supported",
|
||||
"AWSBedrockFoundationModelInferenceTypesSupportedItem",
|
||||
"HAS_INFERENCE_TYPES_SUPPORTED",
|
||||
),
|
||||
]
|
||||
|
||||
|
||||
AWS_CONFIG = ProviderConfig(
|
||||
name="aws",
|
||||
root_node_label="AWSAccount",
|
||||
uid_field="arn",
|
||||
resource_label="_AWSResource",
|
||||
ingestion_function=aws.start_aws_ingestion,
|
||||
short_uid_extractor=aws.extract_short_uid,
|
||||
normalized_lists=AWS_NORMALIZED_LISTS,
|
||||
)
|
||||
|
||||
|
||||
PROVIDER_CONFIGS: dict[str, ProviderConfig] = {
|
||||
"aws": AWS_CONFIG,
|
||||
}
|
||||
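The catalog entries follow the mechanical naming rules from the `NormalizedList` docstring (`<SourceLabel><PropertyPascal>Item` and `HAS_<PROPERTY_UPPER>`). A small sketch of how those names could be derived, and therefore checked in a test, is shown here. The helper `expected_names` is hypothetical and not part of this commit.

```python
def expected_names(source_label: str, source_property: str) -> tuple[str, str]:
    """Derive (child_label, rel_type) from the documented conventions."""
    # snake_case -> PascalCase, e.g. "kms_key_ids" -> "KmsKeyIds".
    pascal = "".join(part.capitalize() for part in source_property.split("_"))
    child_label = f"{source_label}{pascal}Item"
    rel_type = f"HAS_{source_property.upper()}"
    return child_label, rel_type


policy_names = expected_names("AWSPolicyStatement", "resource")
secret_names = expected_names("SecretsManagerSecretVersion", "kms_key_ids")
```

A unit test could loop over `AWS_NORMALIZED_LISTS` and compare each entry's `child_label` and `rel_type` with the derived pair, which would catch hand-typed deviations from the convention.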
@@ -1,8 +1,6 @@
|
||||
# Cypher query templates for Attack Paths operations
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
INTERNET_NODE_LABEL,
|
||||
PROVIDER_ELEMENT_ID_PROPERTY,
|
||||
PROVIDER_RESOURCE_LABEL,
|
||||
PROWLER_FINDING_LABEL,
|
||||
)
|
||||
|
||||
@@ -21,7 +19,6 @@ def render_cypher_template(template: str, replacements: dict[str, str]) -> str:
|
||||
|
||||
|
||||
# Findings queries (used by findings.py)
|
||||
# ---------------------------------------
|
||||
|
||||
ADD_RESOURCE_LABEL_TEMPLATE = """
|
||||
MATCH (account:__ROOT_LABEL__ {id: $provider_uid})-->(r)
|
||||
@@ -88,7 +85,6 @@ INSERT_FINDING_TEMPLATE = f"""
|
||||
"""
|
||||
|
||||
# Internet queries (used by internet.py)
|
||||
# ---------------------------------------
|
||||
|
||||
CREATE_INTERNET_NODE = f"""
|
||||
MERGE (internet:{INTERNET_NODE_LABEL} {{id: 'Internet'}})
|
||||
@@ -118,8 +114,8 @@ CREATE_CAN_ACCESS_RELATIONSHIPS_TEMPLATE = f"""
|
||||
RETURN COUNT(r) AS relationships_merged
|
||||
"""
|
||||
|
||||
# Sync queries (used by sync.py)
|
||||
# -------------------------------
|
||||
# Sync queries (used by sync.py to fetch from the cartography temp Neo4j DB)
|
||||
# The write side of sync lives in each sink (`api/attack_paths/sink/`).
|
||||
|
||||
NODE_FETCH_QUERY = """
|
||||
MATCH (n)
|
||||
@@ -143,17 +139,3 @@ RELATIONSHIPS_FETCH_QUERY = """
|
||||
ORDER BY internal_id
|
||||
LIMIT $batch_size
|
||||
"""
|
||||
|
||||
NODE_SYNC_TEMPLATE = f"""
|
||||
UNWIND $rows AS row
|
||||
MERGE (n:__NODE_LABELS__ {{{PROVIDER_ELEMENT_ID_PROPERTY}: row.provider_element_id}})
|
||||
SET n += row.props
|
||||
"""
|
||||
|
||||
RELATIONSHIP_SYNC_TEMPLATE = f"""
|
||||
UNWIND $rows AS row
|
||||
MATCH (s:{PROVIDER_RESOURCE_LABEL} {{{PROVIDER_ELEMENT_ID_PROPERTY}: row.start_element_id}})
|
||||
MATCH (t:{PROVIDER_RESOURCE_LABEL} {{{PROVIDER_ELEMENT_ID_PROPERTY}: row.end_element_id}})
|
||||
MERGE (s)-[r:__REL_TYPE__ {{{PROVIDER_ELEMENT_ID_PROPERTY}: row.provider_element_id}}]->(t)
|
||||
SET r += row.props
|
||||
"""
|
||||
|
||||
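The removed sync templates combine two substitution steps: f-string interpolation at import time for the element-ID property, and a later placeholder swap for `__NODE_LABELS__` or `__REL_TYPE__`. The sketch below illustrates that two-step rendering. It assumes plain string replacement, since the body of `render_cypher_template` is not shown in this diff; the function name `render_template_sketch`, the property value `_provider_element_id`, and the label string are illustrative assumptions.

```python
def render_template_sketch(template: str, replacements: dict[str, str]) -> str:
    """Replace each placeholder with its value (assumed behaviour)."""
    rendered = template
    for placeholder, value in replacements.items():
        rendered = rendered.replace(placeholder, value)
    return rendered


# Assumed value; the real constant lives in tasks.jobs.attack_paths.config.
ELEMENT_ID = "_provider_element_id"

# Step 1: the f-string fixes the MERGE key property once, at import time.
NODE_SYNC = f"""
UNWIND $rows AS row
MERGE (n:__NODE_LABELS__ {{{ELEMENT_ID}: row.provider_element_id}})
SET n += row.props
"""

# Step 2: labels are filled in per batch, because Cypher cannot parameterise them.
query = render_template_sketch(
    NODE_SYNC,
    {"__NODE_LABELS__": "`EC2Instance`:`_ProviderResource`"},
)
```

Row data still travels as the `$rows` query parameter; only the labels and relationship types, which cannot be bound as parameters, go through text substitution.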
@@ -39,8 +39,8 @@ Pipeline steps:
|
||||
|
||||
7. Sync the temp database into the tenant database:
|
||||
- Drop the old provider subgraph (matched by dynamic _Provider_{uuid} label).
|
||||
graph_data_ready is set to False for all scans of this provider while
|
||||
the swap happens so the API doesn't serve partial data.
|
||||
graph_data_ready is set to False for scans of this provider in the
|
||||
target sink while the swap happens so the API doesn't serve partial data.
|
||||
- Copy nodes and relationships in batches. Every synced node gets a
|
||||
_ProviderResource label and dynamic _Tenant_{uuid} / _Provider_{uuid}
|
||||
isolation labels, plus a _provider_element_id property for MERGE keys.
|
||||
@@ -64,10 +64,17 @@ from api.models import StateChoices
|
||||
from api.utils import initialize_prowler_provider
|
||||
from cartography.config import Config as CartographyConfig
|
||||
from cartography.intel import analysis as cartography_analysis
|
||||
from cartography.intel import create_indexes as cartography_create_indexes
|
||||
from cartography.intel import ontology as cartography_ontology
|
||||
from celery.utils.log import get_task_logger
|
||||
from tasks.jobs.attack_paths import db_utils, findings, indexes, internet, sync, utils
|
||||
from django.conf import settings
|
||||
from tasks.jobs.attack_paths import (
|
||||
db_utils,
|
||||
findings,
|
||||
indexes,
|
||||
internet,
|
||||
sync,
|
||||
utils,
|
||||
)
|
||||
from tasks.jobs.attack_paths.config import get_cartography_ingestion_function
|
||||
|
||||
# Without this Celery goes crazy with Cartography logging
|
||||
@@ -96,7 +103,7 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
|
||||
attack_paths_scan = db_utils.retrieve_attack_paths_scan(tenant_id, scan_id)
|
||||
|
||||
# Idempotency guard: cleanup may have flipped this row to a terminal state
|
||||
# while the message was still in flight. Bail out before touching state.
|
||||
# while the message was still in flight. Bail out before touching state
|
||||
if attack_paths_scan and attack_paths_scan.state in (
|
||||
StateChoices.FAILED,
|
||||
StateChoices.COMPLETED,
|
||||
@@ -125,7 +132,7 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
|
||||
|
||||
else:
|
||||
if not attack_paths_scan:
|
||||
# Safety net for in-flight messages or direct task invocations; dispatcher normally pre-creates the row.
|
||||
# Safety net for in-flight messages or direct task invocations; dispatcher normally pre-creates the row
|
||||
logger.warning(
|
||||
f"No Attack Paths Scan found for scan {scan_id} and tenant {tenant_id}, let's create it then"
|
||||
)
|
||||
@@ -143,10 +150,18 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
|
||||
tenant_database_name = graph_database.get_database_name(
|
||||
prowler_api_provider.tenant_id
|
||||
)
|
||||
target_sink_backend = settings.ATTACK_PATHS_SINK_DATABASE
|
||||
target_description = (
|
||||
f"tenant Neo4j database {tenant_database_name}"
|
||||
if target_sink_backend == "neo4j"
|
||||
else f"{target_sink_backend} sink"
|
||||
)
|
||||
|
||||
# While creating the Cartography configuration, attributes `neo4j_user` and `neo4j_password` are not really needed in this config object
|
||||
tmp_cartography_config = CartographyConfig(
|
||||
neo4j_uri=graph_database.get_uri(),
|
||||
# The temp ingest database is always Neo4j, so use the ingest URI here
|
||||
# rather than the sink URI (which points at Neptune when configured).
|
||||
neo4j_uri=graph_database.get_ingest_uri(),
|
||||
neo4j_database=tmp_database_name,
|
||||
update_tag=int(time.time()),
|
||||
)
|
||||
@@ -168,7 +183,8 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
|
||||
scan_t0 = time.perf_counter()
|
||||
logger.info(
|
||||
f"Starting Attack Paths scan ({attack_paths_scan.id}) for "
|
||||
f"{prowler_api_provider.provider.upper()} provider {prowler_api_provider.id}"
|
||||
f"{prowler_api_provider.provider.upper()} provider {prowler_api_provider.id} "
|
||||
f"(staging=Neo4j database {tmp_database_name}, target={target_description})"
|
||||
)
|
||||
|
||||
subgraph_dropped = False
|
||||
@@ -177,7 +193,8 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
|
||||
|
||||
try:
|
||||
logger.info(
|
||||
f"Creating Neo4j database {tmp_cartography_config.neo4j_database} for tenant {prowler_api_provider.tenant_id}"
|
||||
f"Creating staging Neo4j database {tmp_cartography_config.neo4j_database} "
|
||||
f"for tenant {prowler_api_provider.tenant_id}"
|
||||
)
|
||||
|
||||
graph_database.create_database(tmp_cartography_config.neo4j_database)
|
||||
@@ -191,7 +208,9 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
|
||||
tmp_cartography_config.neo4j_database
|
||||
) as tmp_neo4j_session:
|
||||
# Indexes creation
|
||||
cartography_create_indexes.run(tmp_neo4j_session, tmp_cartography_config)
|
||||
indexes.create_cartography_indexes(
|
||||
tmp_neo4j_session, tmp_cartography_config
|
||||
)
|
||||
indexes.create_findings_indexes(tmp_neo4j_session)
|
||||
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 2)
|
||||
|
||||
@@ -223,7 +242,7 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
|
||||
cartography_analysis.run(tmp_neo4j_session, tmp_cartography_config)
|
||||
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 95)
|
||||
|
||||
# Creating Internet node and CAN_ACCESS relationships
|
||||
# Creating Internet node and `CAN_ACCESS` relationships
|
||||
logger.info(
|
||||
f"Creating Internet graph for AWS account {prowler_api_provider.uid}"
|
||||
)
|
||||
@@ -247,23 +266,41 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
|
||||
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 97)
|
||||
|
||||
logger.info(
|
||||
f"Clearing Neo4j cache for database {tmp_cartography_config.neo4j_database}"
|
||||
f"Clearing Neo4j cache for staging database {tmp_cartography_config.neo4j_database}"
|
||||
)
|
||||
graph_database.clear_cache(tmp_cartography_config.neo4j_database)
|
||||
|
||||
t0 = time.perf_counter()
|
||||
logger.info(
|
||||
f"Ensuring tenant database {tenant_database_name}, and its indexes, exists for tenant {prowler_api_provider.tenant_id}"
|
||||
f"Preparing target {target_description} for tenant {prowler_api_provider.tenant_id}"
|
||||
)
|
||||
graph_database.create_database(tenant_database_name)
|
||||
with graph_database.get_session(tenant_database_name) as tenant_neo4j_session:
|
||||
cartography_create_indexes.run(
|
||||
tenant_neo4j_session, tenant_cartography_config
|
||||
)
|
||||
indexes.create_findings_indexes(tenant_neo4j_session)
|
||||
indexes.create_sync_indexes(tenant_neo4j_session)
|
||||
# Sink-side index creation: Neptune auto-manages indexes and rejects
|
||||
# `CREATE INDEX`, so only run it when the sink is Neo4j
|
||||
# The temp ingest DB is always Neo4j and is always indexed above
|
||||
if target_sink_backend != "neptune":
|
||||
logger.info(f"Ensuring indexes exist for {target_description}")
|
||||
with graph_database.get_session(
|
||||
tenant_database_name
|
||||
) as tenant_neo4j_session:
|
||||
indexes.create_cartography_indexes(
|
||||
tenant_neo4j_session, tenant_cartography_config
|
||||
)
|
||||
indexes.create_findings_indexes(tenant_neo4j_session)
|
||||
indexes.create_sync_indexes(tenant_neo4j_session)
|
||||
else:
|
||||
logger.info("Skipping tenant database indexes for neptune sink")
|
||||
logger.info(
|
||||
f"Prepared target {target_description} in {time.perf_counter() - t0:.3f}s"
|
||||
)
|
||||
|
||||
logger.info(f"Deleting existing provider graph in {tenant_database_name}")
|
||||
db_utils.set_provider_graph_data_ready(attack_paths_scan, False)
|
||||
logger.info(
|
||||
f"Deleting existing provider graph from {target_description} "
|
||||
f"(tenant={prowler_api_provider.tenant_id}, provider={prowler_api_provider.id})"
|
||||
)
|
||||
db_utils.set_provider_graph_data_ready(
|
||||
attack_paths_scan, False, target_sink_backend
|
||||
)
|
||||
provider_gated = True
|
||||
|
||||
t0 = time.perf_counter()
|
||||
@@ -272,14 +309,17 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
|
||||
provider_id=str(prowler_api_provider.id),
|
||||
)
|
||||
logger.info(
|
||||
f"Deleted existing provider graph in {time.perf_counter() - t0:.3f}s "
|
||||
f"(deleted_nodes={deleted_nodes})"
|
||||
f"Deleted existing provider graph from {target_description} "
|
||||
f"in {time.perf_counter() - t0:.3f}s (deleted_nodes={deleted_nodes})"
|
||||
)
|
||||
subgraph_dropped = True
|
||||
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 98)
|
||||
|
||||
logger.info(
|
||||
f"Syncing graph from {tmp_database_name} into {tenant_database_name}"
|
||||
f"Syncing staging graph {tmp_database_name} into {target_description} "
|
||||
f"for provider {prowler_api_provider.id} "
|
||||
f"(tenant {prowler_api_provider.tenant_id}, "
|
||||
f"type {prowler_api_provider.provider})"
|
||||
)
|
||||
t0 = time.perf_counter()
|
||||
sync_result = sync.sync_graph(
|
||||
@@ -287,17 +327,34 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
|
||||
target_database=tenant_database_name,
|
||||
tenant_id=str(prowler_api_provider.tenant_id),
|
||||
provider_id=str(prowler_api_provider.id),
|
||||
provider_type=prowler_api_provider.provider,
|
||||
)
|
||||
elapsed = time.perf_counter() - t0
|
||||
total_nodes = sync_result["nodes"] + sync_result["child_nodes"]
|
||||
elements = total_nodes + sync_result["relationships"]
|
||||
rate = elements / elapsed if elapsed else 0
|
||||
logger.info(
|
||||
f"Synced graph in {time.perf_counter() - t0:.3f}s "
|
||||
f"(nodes={sync_result['nodes']}, relationships={sync_result['relationships']})"
|
||||
f"Synced staging graph into {target_description} in {elapsed:.3f}s - "
|
||||
f"nodes={total_nodes} (source={sync_result['nodes']}, "
|
||||
f"items={sync_result['child_nodes']}), "
|
||||
f"relationships={sync_result['relationships']} "
|
||||
f"(structural={sync_result['structural_relationships']}, "
|
||||
f"items={sync_result['item_relationships']}), "
|
||||
f"~{rate:.0f} elem/s"
|
||||
)
|
||||
sync_completed = True
|
||||
# Flip metadata only now: the new schema is live in the target sink, so
|
||||
# reads can switch to the current catalog/backend. The target-sink gate
|
||||
# is already closed, so the switch is atomic from the API's view.
|
||||
db_utils.set_scan_migrated(attack_paths_scan, True, target_sink_backend)
|
||||
db_utils.set_graph_data_ready(attack_paths_scan, True)
|
||||
db_utils.update_attack_paths_scan_progress(attack_paths_scan, 99)
|
||||
|
||||
logger.info(f"Clearing Neo4j cache for database {tenant_database_name}")
|
||||
graph_database.clear_cache(tenant_database_name)
|
||||
if target_sink_backend == "neptune":
|
||||
logger.info("Skipping cache clear for neptune sink")
|
||||
else:
|
||||
logger.info(f"Clearing Neo4j cache for target {target_description}")
|
||||
graph_database.clear_cache(tenant_database_name)
|
||||
|
||||
logger.info(f"Dropping temporary Neo4j database {tmp_database_name}")
|
||||
graph_database.drop_database(tmp_database_name)
|
||||
@@ -316,14 +373,16 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
|
||||
logger.exception(exception_message)
|
||||
ingestion_exceptions["global_error"] = exception_message
|
||||
|
||||
# Recover graph_data_ready based on how far the swap got.
|
||||
# Partial drop (mid-batch failure) may leave `subgraph_dropped=False`
|
||||
# with data partially deleted, so we prefer that over permanently blocked queries.
|
||||
# Recover `graph_data_ready` based on how far the swap got
|
||||
# Partial drop (mid-batch failure) may leave `subgraph_dropped=False` with data partially deleted,
|
||||
# so we prefer that over permanently blocked queries
|
||||
try:
|
||||
if sync_completed:
|
||||
db_utils.set_graph_data_ready(attack_paths_scan, True)
|
||||
elif provider_gated and not subgraph_dropped:
|
||||
db_utils.set_provider_graph_data_ready(attack_paths_scan, True)
|
||||
db_utils.set_provider_graph_data_ready(
|
||||
attack_paths_scan, True, target_sink_backend
|
||||
)
|
||||
|
||||
except Exception:
|
||||
logger.error(
|
||||
|
||||
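The error handler above recovers `graph_data_ready` from three flags set during the swap. The decision can be sketched as a pure function, which makes the branches easy to test. The function name and the returned action strings are hypothetical; the branch conditions mirror the `if sync_completed` / `elif provider_gated and not subgraph_dropped` logic in the diff.

```python
def recovery_action(
    sync_completed: bool,
    provider_gated: bool,
    subgraph_dropped: bool,
) -> str:
    """Decide how to recover `graph_data_ready` after a failed scan."""
    if sync_completed:
        # New graph is fully in the sink: this scan's data can be served.
        return "mark_scan_ready"
    if provider_gated and not subgraph_dropped:
        # Gate was closed but the drop did not complete: reopen the gate so
        # the provider's queries are not blocked permanently.
        return "reopen_provider_gate"
    # Either nothing was gated yet, or the old subgraph is gone and no
    # replacement was synced: leave the flags untouched.
    return "leave_unchanged"
```

As the source comment notes, reopening the gate after a partial drop can expose partially deleted data; the code accepts that in preference to leaving queries permanently blocked.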
@@ -1,40 +1,57 @@
|
||||
"""
|
||||
Graph sync operations for Attack Paths.
|
||||
|
||||
This module handles syncing graph data from temporary scan databases
|
||||
to the tenant database, adding provider isolation labels and properties.
|
||||
Reads nodes and relationships out of the cartography temp database (always
|
||||
Neo4j) and hands them to the configured sink (Neo4j or Neptune) in batches.
|
||||
Backend-specific Cypher (MERGE shape, ID strategy, indexes) lives in each
|
||||
sink; this module owns the source read loop, per-batch grouping, and the
|
||||
list-property materialisation policy (see `NormalizedList`).
|
||||
|
||||
Each list-typed node property that appears in the provider's
|
||||
`normalized_lists` catalog becomes a set of child item nodes connected to
|
||||
the parent by a typed edge. A list-typed property that is not in the
|
||||
catalog is serialised to a comma-delimited string and emits a one-time
|
||||
warning per (label, property), surfacing Cartography fields that should be
|
||||
added to the catalog.
|
||||
"""
|
||||
|
||||
import json
|
||||
import time
|
||||
from collections import defaultdict
|
||||
from typing import Any
|
||||
|
||||
import neo4j
|
||||
from api.attack_paths import database as graph_database
|
||||
from api.attack_paths import sink as sink_module
|
||||
from celery.utils.log import get_task_logger
|
||||
from tasks.jobs.attack_paths.config import (
|
||||
PROVIDER_CONFIGS,
|
||||
PROVIDER_ISOLATION_PROPERTIES,
|
||||
PROVIDER_RESOURCE_LABEL,
|
||||
SYNC_BATCH_SIZE,
|
||||
NormalizedList,
|
||||
get_provider_label,
|
||||
get_tenant_label,
|
||||
)
|
||||
from tasks.jobs.attack_paths.queries import (
|
||||
NODE_FETCH_QUERY,
|
||||
NODE_SYNC_TEMPLATE,
|
||||
RELATIONSHIP_SYNC_TEMPLATE,
|
||||
RELATIONSHIPS_FETCH_QUERY,
|
||||
render_cypher_template,
|
||||
)
|
||||
|
||||
logger = get_task_logger(__name__)
|
||||
|
||||
# (label, property) tuples for which we've already emitted the
|
||||
# "unnormalised list" warning. Module-level so the warning fires once per
|
||||
# process, not once per node.
|
||||
_WARNED_UNNORMALIZED: set[tuple[str, str]] = set()
|
||||
|
||||
|
||||
def sync_graph(
|
||||
source_database: str,
|
||||
target_database: str,
|
||||
tenant_id: str,
|
||||
provider_id: str,
|
||||
provider_type: str,
|
||||
) -> dict[str, int]:
|
||||
"""
|
||||
Sync all nodes and relationships from source to target database.
|
||||
@@ -44,25 +61,38 @@ def sync_graph(
|
||||
`target_database`: The tenant database
|
||||
`tenant_id`: The tenant ID for isolation
|
||||
`provider_id`: The provider ID for isolation
|
||||
`provider_type`: Provider type key (e.g. "aws"), used to resolve the
|
||||
`NormalizedList` catalog from `PROVIDER_CONFIGS`.
|
||||
|
||||
Returns:
|
||||
Dict with counts of synced nodes and relationships
|
||||
Dict with counts of synced nodes, child item nodes, and relationships.
|
||||
"""
|
||||
nodes_synced = sync_nodes(
|
||||
sink = sink_module.get_backend()
|
||||
sink.ensure_sync_indexes(target_database)
|
||||
|
||||
normalized_lists = _resolve_normalized_lists(provider_type)
|
||||
|
||||
node_result = sync_nodes(
|
||||
source_database,
|
||||
target_database,
|
||||
tenant_id,
|
||||
provider_id,
|
||||
sink,
|
||||
normalized_lists,
|
||||
)
|
||||
relationships_synced = sync_relationships(
|
||||
source_database,
|
||||
target_database,
|
||||
provider_id,
|
||||
sink,
|
||||
)
|
||||
|
||||
return {
|
||||
"nodes": nodes_synced,
|
||||
"relationships": relationships_synced,
|
||||
"nodes": node_result["parents"],
|
||||
"child_nodes": node_result["children"],
|
||||
"relationships": relationships_synced + node_result["parent_child_rels"],
|
||||
"structural_relationships": relationships_synced,
|
||||
"item_relationships": node_result["parent_child_rels"],
|
||||
}
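The returned dict keeps `relationships` as a combined total and also reports its two components. A minimal sketch of that bookkeeping follows; the helper name `combine_counts` and the counts are hypothetical, used only to illustrate the shape.

```python
def combine_counts(node_result: dict, structural_rels: int) -> dict:
    """Hypothetical helper mirroring the shape of the dict returned by sync_graph."""
    item_rels = node_result["parent_child_rels"]
    return {
        "nodes": node_result["parents"],
        "child_nodes": node_result["children"],
        # Total = relationships copied from the source + parent->item edges.
        "relationships": structural_rels + item_rels,
        "structural_relationships": structural_rels,
        "item_relationships": item_rels,
    }


# Made-up counts, for illustration only.
result = combine_counts({"parents": 10, "children": 4, "parent_child_rels": 6}, 7)
```

Callers that only read `nodes` and `relationships` keep working, while the extra keys let logs and tests distinguish source edges from item edges.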
|
||||
|
||||
|
||||
@@ -71,22 +101,35 @@ def sync_nodes(
|
||||
target_database: str,
|
||||
tenant_id: str,
|
||||
provider_id: str,
|
||||
) -> int:
|
||||
sink: Any,
|
||||
normalized_lists: list[NormalizedList],
|
||||
) -> dict[str, int]:
|
||||
"""
|
||||
Sync nodes from source to target database.
|
||||
Sync nodes from source to target database, exploding catalogued list
|
||||
properties into child nodes + parent->child edges.
|
||||
|
||||
Adds `_ProviderResource` label and dynamic `_Tenant_{id}` and `_Provider_{id}`
|
||||
isolation labels to all nodes.
|
||||
isolation labels to all nodes (parents and children alike).
|
||||
|
||||
Source and target sessions are opened sequentially per batch to avoid
|
||||
holding two Bolt connections simultaneously for the entire sync duration.
|
||||
"""
|
||||
t0 = time.perf_counter()
|
||||
last_id = -1
|
||||
total_synced = 0
|
||||
parents_synced = 0
|
||||
children_synced = 0
|
||||
parent_child_rels = 0
|
||||
|
||||
catalog = _build_catalog_index(normalized_lists)
|
||||
extra_labels = _build_extra_labels(tenant_id, provider_id)
|
||||
|
||||
while True:
|
||||
grouped: dict[tuple[str, ...], list[dict[str, Any]]] = defaultdict(list)
|
||||
tb = time.perf_counter()
|
||||
prev_children = children_synced
|
||||
prev_rels = parent_child_rels
|
||||
parent_groups: dict[tuple[str, ...], list[dict[str, Any]]] = defaultdict(list)
|
||||
child_groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
|
||||
rel_groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
|
||||
batch_count = 0
|
||||
|
||||
with graph_database.get_session(source_database) as source_session:
|
||||
@@ -97,43 +140,65 @@ def sync_nodes(
|
||||
for record in result:
|
||||
batch_count += 1
|
||||
last_id = record["internal_id"]
|
||||
key, value = _node_to_sync_dict(record, provider_id)
|
||||
grouped[key].append(value)
|
||||
key, parent_dict, children, rels = _node_to_sync_dict(
|
||||
record, provider_id, catalog
|
||||
)
|
||||
parent_groups[key].append(parent_dict)
|
||||
for child in children:
|
||||
child_groups[child["_child_label"]].append(child["row"])
|
||||
for rel in rels:
|
||||
rel_groups[rel["rel_type"]].append(rel["row"])
|
||||
|
||||
if batch_count == 0:
|
||||
break
|
||||
|
||||
with graph_database.get_session(target_database) as target_session:
|
||||
for labels, batch in grouped.items():
|
||||
label_set = set(labels)
|
||||
label_set.add(PROVIDER_RESOURCE_LABEL)
|
||||
label_set.add(get_tenant_label(tenant_id))
|
||||
label_set.add(get_provider_label(provider_id))
|
||||
node_labels = ":".join(f"`{label}`" for label in sorted(label_set))
|
||||
for labels, batch in parent_groups.items():
|
||||
sink.write_nodes(
|
||||
target_database, _render_labels(labels, extra_labels), batch
|
||||
)
|
||||
|
||||
query = render_cypher_template(
|
||||
NODE_SYNC_TEMPLATE, {"__NODE_LABELS__": node_labels}
|
||||
)
|
||||
target_session.run(query, {"rows": batch})
|
||||
for child_label, batch in child_groups.items():
|
||||
sink.write_nodes(
|
||||
target_database,
|
||||
_render_labels((child_label,), extra_labels),
|
||||
batch,
|
||||
)
|
||||
children_synced += len(batch)
|
||||
|
||||
total_synced += batch_count
|
||||
for rel_type, batch in rel_groups.items():
|
||||
sink.write_relationships(target_database, rel_type, provider_id, batch)
|
||||
parent_child_rels += len(batch)
|
||||
|
||||
parents_synced += batch_count
|
||||
batch_dt = time.perf_counter() - tb
|
||||
batch_elements = (
|
||||
batch_count
|
||||
+ (children_synced - prev_children)
|
||||
+ (parent_child_rels - prev_rels)
|
||||
)
|
||||
rate = batch_elements / batch_dt if batch_dt else 0
|
||||
logger.info(
|
||||
f"Synced {total_synced} nodes from {source_database} to {target_database} in {time.perf_counter() - t0:.3f}s"
|
||||
f"[sync nodes] {parents_synced} source (+{children_synced} items, "
|
||||
f"+{parent_child_rels} item rels) · batch {batch_dt:.1f}s · "
|
||||
f"elapsed {time.perf_counter() - t0:.1f}s · ~{rate:.0f} elem/s"
|
||||
)
|
||||
|
||||
return total_synced
|
||||
return {
|
||||
"parents": parents_synced,
|
||||
"children": children_synced,
|
||||
"parent_child_rels": parent_child_rels,
|
||||
}
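The loop above pages through the source by remembering the last seen internal ID and stopping on an empty batch. The sketch below reproduces that control flow against an in-memory list. It assumes the fetch query returns rows with an ID greater than `last_id` in ascending order (the query text is not shown in this diff), and `fetch_batch` and `SOURCE_ROWS` are hypothetical stand-ins.

```python
from collections import defaultdict

# Stand-in for the source database: (internal_id, label) rows, sorted by id.
SOURCE_ROWS = [(i, "A" if i % 2 == 0 else "B") for i in range(7)]


def fetch_batch(last_id: int, batch_size: int) -> list:
    """Hypothetical fetch: rows with id > last_id, ascending, limited."""
    rows = [row for row in SOURCE_ROWS if row[0] > last_id]
    return rows[:batch_size]


def sync_all(batch_size: int = 3) -> dict:
    last_id = -1
    written = defaultdict(int)
    while True:
        grouped = defaultdict(list)
        batch_count = 0
        # Read phase: group rows by label and advance the cursor.
        for internal_id, label in fetch_batch(last_id, batch_size):
            batch_count += 1
            last_id = internal_id
            grouped[label].append(internal_id)
        if batch_count == 0:
            break  # nothing left in the source
        # Write phase: one write per label group.
        for label, batch in grouped.items():
            written[label] += len(batch)
    return dict(written)


totals = sync_all()
```

Because each iteration finishes reading before it starts writing, the two phases never overlap within a batch.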
|
||||
|
||||
|
||||
def sync_relationships(
|
||||
source_database: str,
|
||||
target_database: str,
|
||||
provider_id: str,
|
||||
sink: Any,
|
||||
) -> int:
|
||||
"""
|
||||
Sync relationships from source to target database.
|
||||
|
||||
Matches source and target nodes by `_provider_element_id` in the tenant database.
|
||||
|
||||
Source and target sessions are opened sequentially per batch to avoid
|
||||
holding two Bolt connections simultaneously for the entire sync duration.
|
||||
"""
|
||||
@@ -142,6 +207,7 @@ def sync_relationships(
|
||||
total_synced = 0
|
||||
|
||||
while True:
|
||||
tb = time.perf_counter()
|
||||
grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
|
||||
batch_count = 0
|
||||
|
||||
@@ -159,32 +225,197 @@ def sync_relationships(
|
||||
if batch_count == 0:
|
||||
break
|
||||
|
||||
with graph_database.get_session(target_database) as target_session:
|
||||
for rel_type, batch in grouped.items():
|
||||
query = render_cypher_template(
|
||||
RELATIONSHIP_SYNC_TEMPLATE, {"__REL_TYPE__": rel_type}
|
||||
)
|
||||
target_session.run(query, {"rows": batch})
|
||||
for rel_type, batch in grouped.items():
|
||||
sink.write_relationships(target_database, rel_type, provider_id, batch)
|
||||
|
||||
total_synced += batch_count
|
||||
batch_dt = time.perf_counter() - tb
|
||||
rate = batch_count / batch_dt if batch_dt else 0
|
||||
logger.info(
|
||||
f"Synced {total_synced} relationships from {source_database} to {target_database} in {time.perf_counter() - t0:.3f}s"
|
||||
f"[sync rels] {total_synced} structural · batch {batch_dt:.1f}s · "
|
||||
f"elapsed {time.perf_counter() - t0:.1f}s · ~{rate:.0f}/s"
|
||||
)
|
||||
|
||||
return total_synced
|
||||
|
||||
|
||||
def _node_to_sync_dict(
|
||||
record: neo4j.Record, provider_id: str
|
||||
) -> tuple[tuple[str, ...], dict[str, Any]]:
|
||||
"""Transform a source node record into a (grouping_key, sync_dict) pair."""
|
||||
record: neo4j.Record,
|
||||
provider_id: str,
|
||||
catalog: dict[tuple[str, str], NormalizedList],
|
||||
) -> tuple[
|
||||
tuple[str, ...],
|
||||
dict[str, Any],
|
||||
list[dict[str, Any]],
|
||||
list[dict[str, Any]],
|
||||
]:
|
||||
"""Transform a source node record into a (grouping_key, sync_dict, children, rels) tuple.
|
||||
|
||||
Catalogued list properties are popped from `props` and emitted as child
|
||||
nodes + parent->child relationships.
|
||||
"""
|
||||
props = dict(record["props"] or {})
|
||||
_strip_internal_properties(props)
|
||||
labels = tuple(sorted(set(record["labels"] or [])))
|
||||
return labels, {
|
||||
"provider_element_id": f"{provider_id}:{record['element_id']}",
|
||||
parent_element_id = f"{provider_id}:{record['element_id']}"
|
||||
|
||||
children, rels = _explode_catalogued_lists(
|
||||
labels, props, catalog, provider_id, parent_element_id
|
||||
)
|
||||
|
||||
_normalize_sink_properties(props, labels)
|
||||
|
||||
parent = {
|
||||
"provider_element_id": parent_element_id,
|
||||
"props": props,
|
||||
}
|
||||
return labels, parent, children, rels
|
||||
|
||||
|
||||
def _explode_catalogued_lists(
|
||||
labels: tuple[str, ...],
|
||||
props: dict[str, Any],
|
||||
catalog: dict[tuple[str, str], NormalizedList],
|
||||
provider_id: str,
|
||||
parent_element_id: str,
|
||||
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
|
||||
"""Pop catalogued list properties from `props` and produce child + rel emits.
|
||||
|
||||
A node may carry multiple labels (e.g. `AWSPolicyStatement` plus
|
||||
`_AWSResource`); we check each label for catalog matches independently.
|
||||
Returns:
|
||||
- children: list of {"_child_label": str, "row": <node row>} dicts.
|
||||
- rels: list of {"rel_type": str, "row": <rel row>} dicts.
|
||||
"""
|
||||
children: list[dict[str, Any]] = []
|
||||
rels: list[dict[str, Any]] = []
|
||||
|
||||
for label in labels:
|
||||
for key in list(props.keys()):
|
||||
spec = catalog.get((label, key))
|
||||
if spec is None:
|
||||
continue
|
||||
value = props.pop(key)
|
||||
if value is None:
|
||||
continue
|
||||
if not isinstance(value, list):
|
||||
# Catalogued but not actually a list this scan - fall back to
|
||||
# the generic normaliser so we don't lose the value.
|
||||
props[key] = value
|
||||
continue
|
||||
for item in value:
|
||||
child_value_key, child_props = _build_child_props(spec, item)
|
||||
if child_value_key is None:
|
||||
continue
|
||||
child_element_id = _build_child_id(
|
||||
provider_id, spec.child_label, child_value_key
|
||||
)
|
||||
children.append(
|
||||
{
|
||||
"_child_label": spec.child_label,
|
||||
"row": {
|
||||
"provider_element_id": child_element_id,
|
||||
"props": child_props,
|
||||
},
|
||||
}
|
||||
)
|
||||
rels.append(
|
||||
{
|
||||
"rel_type": spec.rel_type,
|
||||
"row": {
|
||||
"start_element_id": parent_element_id,
|
||||
"end_element_id": child_element_id,
|
||||
"provider_element_id": (
|
||||
f"{parent_element_id}::{spec.rel_type}::"
|
||||
f"{child_element_id}"
|
||||
),
|
||||
"props": {},
|
||||
},
|
||||
}
|
||||
)
|
||||
|
||||
return children, rels
|
||||
|
||||
|
||||
def _build_child_props(
|
||||
spec: NormalizedList, item: Any
|
||||
) -> tuple[str | None, dict[str, Any]]:
|
||||
"""Translate one list element into a child node's prop dict.
|
||||
|
||||
Returns (dedup_key, props). The dedup_key is what makes two child nodes
|
||||
equal within (tenant, provider) - used to build `_provider_element_id`.
|
||||
For scalar mode, the dedup key is the value itself. For dict mode it is
|
||||
a stable concatenation of the mapped fields in `field_map` order.
|
||||
"""
|
||||
if not spec.field_map:
|
||||
if isinstance(item, (dict, list)):
|
||||
# Defensive: caller marked this list as scalar but elements are
|
||||
# structured. Convert to a stable string so the value survives.
|
||||
value_str = json.dumps(item, sort_keys=True, default=str)
|
||||
else:
|
||||
value_str = str(item)
|
||||
return value_str, {"value": value_str}
|
||||
|
||||
if not isinstance(item, dict):
|
||||
# Catalogued as dict-shape but got a scalar. Skip - caller will see
|
||||
# the value go missing and can fix the field_map.
|
||||
return None, {}
|
||||
|
||||
props: dict[str, Any] = {}
|
||||
dedup_parts: list[str] = []
|
||||
for src_key, child_field in spec.field_map:
|
||||
raw = item.get(src_key)
|
||||
value_str = _to_sink_property_value(raw) if raw is not None else ""
|
||||
props[child_field] = value_str
|
||||
dedup_parts.append(f"{child_field}={value_str}")
|
||||
return "::".join(dedup_parts), props
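A simplified sketch of the two modes described in the docstring: scalar mode uses the value itself as the dedup key, and dict mode joins the mapped fields in `field_map` order. `SpecSketch` is a hypothetical stand-in for `NormalizedList`, and the sketch omits the `_to_sink_property_value` coercion that the real function applies to each field.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical stand-in for NormalizedList; only field_map is modelled."""
    field_map: tuple = ()


def build_child_props(spec: SpecSketch, item: Any) -> tuple[Optional[str], dict]:
    if not spec.field_map:
        # Scalar mode: structured values are serialised so they survive.
        if isinstance(item, (dict, list)):
            value_str = json.dumps(item, sort_keys=True, default=str)
        else:
            value_str = str(item)
        return value_str, {"value": value_str}
    if not isinstance(item, dict):
        # Dict mode but scalar element: nothing to map, so skip it.
        return None, {}
    props, parts = {}, []
    for src_key, child_field in spec.field_map:
        raw = item.get(src_key)
        value = raw if raw is not None else ""
        props[child_field] = value
        parts.append(f"{child_field}={value}")
    return "::".join(parts), props


scalar_key, scalar_props = build_child_props(SpecSketch(), "s3:GetObject")
tag_spec = SpecSketch(field_map=(("Key", "key"), ("Value", "value")))
dict_key, dict_props = build_child_props(tag_spec, {"Key": "env", "Value": "prod"})
```

Missing source keys map to an empty string, so two items that differ only by an absent field still produce distinct, stable keys.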
|
||||
|
||||
|
||||
def _build_child_id(provider_id: str, child_label: str, value_key: str) -> str:
|
||||
"""Deterministic `_provider_element_id` for a list-item child node.
|
||||
|
||||
Dedupes within (tenant, provider): multiple parents referencing the same
|
||||
value share one child node via the existing MERGE-on-_provider_element_id
|
||||
index in both sinks.
|
||||
"""
|
||||
return f"{provider_id}::{child_label}::{value_key}"
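To see why a deterministic ID deduplicates children, the sketch below builds IDs for two parents that share one list value. The parent IDs, the `ActionItem` label, and the `HAS_ACTION` relationship type are hypothetical, and a plain dict keyed by ID stands in for the sink's MERGE behaviour.

```python
def build_child_id(provider_id: str, child_label: str, value_key: str) -> str:
    return f"{provider_id}::{child_label}::{value_key}"


# Hypothetical parents, each carrying a list of action strings.
parents = {
    "prov-1:node-a": ["s3:GetObject", "s3:PutObject"],
    "prov-1:node-b": ["s3:GetObject"],
}

child_nodes = {}  # keyed by child ID; the dict plays the role of MERGE
edges = []
for parent_id, actions in parents.items():
    for action in actions:
        child_id = build_child_id("prov-1", "ActionItem", action)
        child_nodes[child_id] = {"value": action}  # same ID -> same node
        edges.append((parent_id, "HAS_ACTION", child_id))
```

Three list elements produce three edges but only two child nodes, because both parents resolve `s3:GetObject` to the same ID.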
|
||||
|
||||
|
||||
def _build_catalog_index(
|
||||
normalized_lists: list[NormalizedList],
|
||||
) -> dict[tuple[str, str], NormalizedList]:
|
||||
"""Index the catalog by (source_label, source_property) for O(1) lookup."""
|
||||
return {
|
||||
(spec.source_label, spec.source_property): spec for spec in normalized_lists
|
||||
}
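A small sketch of the index this comprehension builds. `SpecSketch` is a hypothetical stand-in that reuses field names seen on `NormalizedList` in this module; the labels and relationship types are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical stand-in for NormalizedList."""
    source_label: str
    source_property: str
    child_label: str
    rel_type: str


specs = [
    SpecSketch("PolicyStatement", "action", "ActionItem", "HAS_ACTION"),
    SpecSketch("PolicyStatement", "resource", "ResourceItem", "HAS_RESOURCE"),
]

# Same comprehension shape as above: one dict lookup per (label, property).
index = {(s.source_label, s.source_property): s for s in specs}

hit = index.get(("PolicyStatement", "action"))
miss = index.get(("PolicyStatement", "effect"))  # not catalogued -> None
```

Note that a dict comprehension keeps the last entry when two specs share the same `(source_label, source_property)` pair, so duplicate catalog entries would silently override each other.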
|
||||
|
||||
|
||||
def _build_extra_labels(tenant_id: str, provider_id: str) -> tuple[str, ...]:
|
||||
return (
|
||||
PROVIDER_RESOURCE_LABEL,
|
||||
get_tenant_label(tenant_id),
|
||||
get_provider_label(provider_id),
|
||||
)
|
||||
|
||||
|
||||
def _render_labels(base_labels: tuple[str, ...], extra_labels: tuple[str, ...]) -> str:
|
||||
"""Render the Cypher label string for a node-write batch."""
|
||||
label_set = set(base_labels) | set(extra_labels)
|
||||
return ":".join(f"`{label}`" for label in sorted(label_set))
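A quick sketch of the rendering, using made-up label values (the tenant and provider labels below are placeholders, not the real format produced by `get_tenant_label` or `get_provider_label`):

```python
def render_labels(base_labels: tuple, extra_labels: tuple) -> str:
    """Union the labels, sort them, and backtick-quote each one."""
    label_set = set(base_labels) | set(extra_labels)
    return ":".join(f"`{label}`" for label in sorted(label_set))


# Placeholder isolation labels, for illustration only.
extra = ("_ProviderResource", "_Tenant_abc", "_Provider_xyz")

rendered = render_labels(("Instance", "EC2Instance"), extra)
reordered = render_labels(("EC2Instance", "Instance"), extra)
```

Sorting makes the output independent of input order, so the same label set always renders to the same string, and backtick quoting keeps labels with unusual characters valid in Cypher.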
|
||||
|
||||
|
||||
def _resolve_normalized_lists(provider_type: str) -> list[NormalizedList]:
|
||||
config = PROVIDER_CONFIGS.get(provider_type)
|
||||
if config is None:
|
||||
# Unknown provider: empty catalog. Any list-typed property will be
|
||||
# serialised to a comma-delimited string with one warning per
|
||||
# (label, property).
|
||||
logger.warning(
|
||||
"Provider type %s not in PROVIDER_CONFIGS; no normalized_lists active",
|
||||
provider_type,
|
||||
)
|
||||
return []
|
||||
return config.normalized_lists
|
||||
|
||||
|
||||
def _rel_to_sync_dict(
|
||||
@@ -193,7 +424,11 @@ def _rel_to_sync_dict(
|
||||
"""Transform a source relationship record into a (grouping_key, sync_dict) pair."""
|
||||
props = dict(record["props"] or {})
|
||||
_strip_internal_properties(props)
|
||||
# Relationship properties go through the same primitive coercion as
|
||||
# nodes; catalog-driven materialisation applies to node properties only.
|
||||
_normalize_sink_properties(props, labels=None)
|
||||
rel_type = record["rel_type"]
|
||||
|
||||
return rel_type, {
|
||||
"start_element_id": f"{provider_id}:{record['start_element_id']}",
|
||||
"end_element_id": f"{provider_id}:{record['end_element_id']}",
|
||||
@@ -206,3 +441,80 @@ def _strip_internal_properties(props: dict[str, Any]) -> None:
|
||||
"""Remove provider isolation properties before the += spread in sync templates."""
|
||||
for key in PROVIDER_ISOLATION_PROPERTIES:
|
||||
props.pop(key, None)
|
||||
|
||||
|
||||
def _normalize_sink_properties(
|
||||
props: dict[str, Any], labels: tuple[str, ...] | None
|
||||
) -> None:
|
||||
"""Normalize property values to primitive Cypher literals for either sink.
|
||||
|
||||
Attack-paths node and relationship properties are written as primitive
|
||||
scalars regardless of the active sink (Neo4j or Neptune). The convention
|
||||
is driven by Neptune's openCypher type restrictions, which reject list,
|
||||
map, temporal and spatial property values, but it is applied uniformly
|
||||
so that custom and predefined queries are portable across sinks without
|
||||
runtime rewriting.
|
||||
|
||||
Concretely:
|
||||
- Temporal values (neo4j.time.{DateTime,Date,Time,Duration}) become
|
||||
their ISO-8601 string representation.
|
||||
- Spatial values (neo4j.spatial.Point and subclasses) become their
|
||||
WKT-style string representation.
|
||||
- Maps / dicts become a JSON-encoded string, read back with `CONTAINS`
|
||||
substring checks inside queries.
|
||||
- Lists become a comma-delimited string. Catalogued list properties
|
||||
are materialised as child item nodes upstream in
|
||||
`_explode_catalogued_lists` and never reach this point; any list
|
||||
seen here is uncatalogued, so we log a one-time warning per
|
||||
(label, property) to surface Cartography fields that should be
|
||||
added to the catalog.
|
||||
|
||||
`labels` is only used for the warning message; pass `None` for
|
||||
relationship props (no label context).
|
||||
"""
|
||||
for key, value in list(props.items()):
|
||||
if isinstance(value, list) and labels is not None:
|
||||
_warn_unnormalized_list(labels, key)
|
||||
props[key] = _to_sink_property_value(value)
|
||||
|
||||
|
||||
def _warn_unnormalized_list(labels: tuple[str, ...], key: str) -> None:
|
||||
"""Warn once per (label, property), on the real label(s) only.
|
||||
|
||||
Every synced node also carries internal isolation labels (`_AWSResource`,
|
||||
`_ProviderResource`, `_Tenant_*`, `_Provider_*`); warning on those just
|
||||
doubles the noise, so skip them and point at the actionable Cartography
|
||||
label. Falls back to all labels if only internal ones are present.
|
||||
"""
|
||||
real_labels = [label for label in labels if not label.startswith("_")]
|
||||
for label in real_labels or labels:
|
||||
token = (label, key)
|
||||
if token in _WARNED_UNNORMALIZED:
|
||||
continue
|
||||
_WARNED_UNNORMALIZED.add(token)
|
||||
logger.warning(
|
||||
"Unnormalized list property %s.%s reached sink as comma-string; "
|
||||
"add a NormalizedList entry to the provider catalog to explode it",
|
||||
label,
|
||||
key,
|
||||
)
|
||||
|
||||
|
||||
def _to_sink_property_value(value: Any) -> Any:
|
||||
if hasattr(value, "iso_format") and callable(value.iso_format):
|
||||
return value.iso_format()
|
||||
|
||||
if type(value).__module__.startswith("neo4j.spatial"):
|
||||
return str(value)
|
||||
|
||||
if isinstance(value, dict):
|
||||
# openCypher `SET` rejects map property values: encode as JSON so the structured payload
|
||||
# survives the round-trip and is queryable with `CONTAINS` substring checks
|
||||
return json.dumps(value, sort_keys=True, default=str)
|
||||
|
||||
if isinstance(value, list):
|
||||
# openCypher `SET` rejects list/array property values: encode as a
|
||||
# delimited string read back with split() inside queries
|
||||
return ",".join(str(_to_sink_property_value(v)) for v in value)
|
||||
|
||||
return value
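The coercion rules can be exercised without a Neo4j driver. The sketch below copies the dict and list branches and replaces the temporal types with a hypothetical `FakeTemporal` class that exposes `iso_format()`; the spatial branch is omitted.

```python
import json
from typing import Any


class FakeTemporal:
    """Hypothetical stand-in for a neo4j.time value exposing iso_format()."""

    def iso_format(self) -> str:
        return "2024-01-01T00:00:00"


def to_sink_value(value: Any) -> Any:
    if hasattr(value, "iso_format") and callable(value.iso_format):
        return value.iso_format()
    if isinstance(value, dict):
        # Maps become a JSON string with sorted keys.
        return json.dumps(value, sort_keys=True, default=str)
    if isinstance(value, list):
        # Lists become a comma-delimited string; nested values are coerced first.
        return ",".join(str(to_sink_value(v)) for v in value)
    return value


as_json = to_sink_value({"b": 1, "a": [1, 2]})
as_csv = to_sink_value(["x", 1, [2, 3]])
as_iso = to_sink_value(FakeTemporal())
```

One consequence of the comma-delimited encoding is that it is lossy for elements that themselves contain commas: `["a,b", "c"]` and `["a", "b", "c"]` both encode to `a,b,c`, which is one reason to catalogue such properties so they are exploded into child nodes instead.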
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
from api.attack_paths import database as graph_database
|
||||
from api.attack_paths import sink as sink_module
|
||||
from api.db_router import MainRouter
|
||||
from api.db_utils import batch_delete, rls_transaction
|
||||
from api.models import (
|
||||
@@ -76,6 +77,12 @@ def delete_provider(tenant_id: str, pk: str):
|
||||
"id", flat=True
|
||||
)
|
||||
)
|
||||
attack_paths_sink_backends = list(
|
||||
AttackPathsScan.all_objects.filter(provider=instance)
|
||||
.values_list("sink_backend", flat=True)
|
||||
.distinct()
|
||||
.order_by("sink_backend")
|
||||
)
|
||||
|
||||
deletion_steps = [
|
||||
("Scan Summaries", ScanSummary.all_objects.filter(scan__provider=instance)),
|
||||
@@ -97,7 +104,13 @@ def delete_provider(tenant_id: str, pk: str):
|
||||
# Delete the Attack Paths' graph data related to the provider from the tenant database
|
||||
tenant_database_name = graph_database.get_database_name(tenant_id)
|
||||
try:
|
||||
graph_database.drop_subgraph(tenant_database_name, str(pk))
|
||||
if attack_paths_sink_backends:
|
||||
for sink_backend in attack_paths_sink_backends:
|
||||
sink_module.get_backend_for_name(sink_backend).drop_subgraph(
|
||||
tenant_database_name, str(pk)
|
||||
)
|
||||
else:
|
||||
graph_database.drop_subgraph(tenant_database_name, str(pk))
|
||||
|
||||
except graph_database.GraphDatabaseQueryException as gdb_error:
|
||||
logger.error(f"Error deleting Provider graph data: {gdb_error}")
|
||||
|
||||
@@ -23,6 +23,14 @@ from tasks.jobs.attack_paths import internet as internet_module
|
||||
from tasks.jobs.attack_paths import sync as sync_module
|
||||
from tasks.jobs.attack_paths.scan import run as attack_paths_run
|
||||
|
||||
SYNC_RESULT_EMPTY = {
|
||||
"nodes": 0,
|
||||
"child_nodes": 0,
|
||||
"relationships": 0,
|
||||
"structural_relationships": 0,
|
||||
"item_relationships": 0,
|
||||
}
|
||||
|
||||
|
||||
@pytest.mark.django_db
|
||||
class TestAttackPathsRun:
|
||||
@@ -32,6 +40,7 @@ class TestAttackPathsRun:
|
||||
"tasks.jobs.attack_paths.scan.utils.call_within_event_loop",
|
||||
side_effect=lambda fn, *a, **kw: fn(*a, **kw),
|
||||
)
|
||||
@patch("tasks.jobs.attack_paths.scan.db_utils.set_scan_migrated")
|
||||
@patch("tasks.jobs.attack_paths.scan.db_utils.set_graph_data_ready")
|
||||
@patch("tasks.jobs.attack_paths.scan.db_utils.set_provider_graph_data_ready")
|
||||
@patch("tasks.jobs.attack_paths.scan.db_utils.finish_attack_paths_scan")
|
||||
@@ -39,7 +48,7 @@ class TestAttackPathsRun:
|
||||
@patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.sync.sync_graph",
|
||||
return_value={"nodes": 0, "relationships": 0},
|
||||
return_value=SYNC_RESULT_EMPTY,
|
||||
)
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.drop_subgraph", return_value=0)
|
||||
@patch("tasks.jobs.attack_paths.scan.indexes.create_sync_indexes")
|
||||
@@ -48,11 +57,11 @@ class TestAttackPathsRun:
|
||||
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_uri",
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
|
||||
return_value="bolt://neo4j",
|
||||
)
|
||||
@patch(
|
||||
@@ -66,7 +75,7 @@ class TestAttackPathsRun:
|
||||
def test_run_success_flow(
|
||||
self,
|
||||
mock_init_provider,
|
||||
mock_get_uri,
|
||||
mock_get_ingest_uri,
|
||||
mock_create_db,
|
||||
mock_clear_cache,
|
||||
mock_cartography_indexes,
|
||||
@@ -83,6 +92,7 @@ class TestAttackPathsRun:
|
||||
mock_finish,
|
||||
mock_set_provider_graph_data_ready,
|
||||
mock_set_graph_data_ready,
|
||||
mock_set_scan_migrated,
|
||||
mock_event_loop,
|
||||
mock_drop_db,
|
||||
tenants_fixture,
|
||||
@@ -159,6 +169,7 @@ class TestAttackPathsRun:
|
||||
target_database="tenant-db",
|
||||
tenant_id=str(provider.tenant_id),
|
||||
provider_id=str(provider.id),
|
||||
provider_type="aws",
|
||||
)
|
||||
mock_get_ingestion.assert_called_once_with(provider.provider)
|
||||
mock_event_loop.assert_called_once()
|
||||
@@ -172,9 +183,12 @@ class TestAttackPathsRun:
|
||||
attack_paths_scan, StateChoices.COMPLETED, ingestion_result
|
||||
)
|
||||
mock_set_provider_graph_data_ready.assert_called_once_with(
|
||||
attack_paths_scan, False
|
||||
attack_paths_scan, False, "neo4j"
|
||||
)
|
||||
mock_set_graph_data_ready.assert_called_once_with(attack_paths_scan, True)
|
||||
# is_migrated is flipped to True only after the sync succeeds, so reads
|
||||
# don't switch to the new catalog/sink before the graph is live.
|
||||
mock_set_scan_migrated.assert_called_once_with(attack_paths_scan, True, "neo4j")
|
||||
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.utils.stringify_exception",
|
||||
@@ -194,13 +208,13 @@ class TestAttackPathsRun:
|
||||
@patch("tasks.jobs.attack_paths.scan.internet.analysis")
|
||||
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_database_name",
|
||||
return_value="db-scan-id",
|
||||
)
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.get_uri")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.initialize_prowler_provider",
|
||||
return_value=MagicMock(_enabled_regions=["us-east-1"]),
|
||||
@@ -212,7 +226,7 @@ class TestAttackPathsRun:
|
||||
def test_run_failure_marks_scan_failed(
|
||||
self,
|
||||
mock_init_provider,
|
||||
mock_get_uri,
|
||||
mock_get_ingest_uri,
|
||||
mock_get_db_name,
|
||||
mock_create_db,
|
||||
mock_cartography_indexes,
|
||||
@@ -293,13 +307,13 @@ class TestAttackPathsRun:
|
||||
@patch("tasks.jobs.attack_paths.scan.internet.analysis")
|
||||
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_database_name",
|
||||
return_value="db-scan-id",
|
||||
)
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.get_uri")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.initialize_prowler_provider",
|
||||
return_value=MagicMock(_enabled_regions=["us-east-1"]),
|
||||
@@ -311,7 +325,7 @@ class TestAttackPathsRun:
|
||||
def test_failure_before_gate_does_not_flip_graph_data_ready_true(
|
||||
self,
|
||||
mock_init_provider,
|
||||
mock_get_uri,
|
||||
mock_get_ingest_uri,
|
||||
mock_get_db_name,
|
||||
mock_create_db,
|
||||
mock_cartography_indexes,
|
||||
@@ -396,13 +410,13 @@ class TestAttackPathsRun:
|
||||
@patch("tasks.jobs.attack_paths.scan.internet.analysis")
|
||||
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_database_name",
|
||||
return_value="db-scan-id",
|
||||
)
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.get_uri")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.initialize_prowler_provider",
|
||||
return_value=MagicMock(_enabled_regions=["us-east-1"]),
|
||||
@@ -414,7 +428,7 @@ class TestAttackPathsRun:
|
||||
def test_run_failure_marks_scan_failed_even_when_drop_database_fails(
|
||||
self,
|
||||
mock_init_provider,
|
||||
mock_get_uri,
|
||||
mock_get_ingest_uri,
|
||||
mock_get_db_name,
|
||||
mock_create_db,
|
||||
mock_cartography_indexes,
|
||||
@@ -493,7 +507,7 @@ class TestAttackPathsRun:
|
||||
@patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.sync.sync_graph",
|
||||
return_value={"nodes": 0, "relationships": 0},
|
||||
return_value=SYNC_RESULT_EMPTY,
|
||||
)
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.graph_database.drop_subgraph",
|
||||
@@ -505,11 +519,11 @@ class TestAttackPathsRun:
|
||||
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_uri",
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
|
||||
return_value="bolt://neo4j",
|
||||
)
|
||||
@patch(
|
||||
@@ -523,7 +537,7 @@ class TestAttackPathsRun:
|
||||
def test_failure_after_gate_before_drop_restores_graph_data_ready(
|
||||
self,
|
||||
mock_init_provider,
|
||||
mock_get_uri,
|
||||
mock_get_ingest_uri,
|
||||
mock_create_db,
|
||||
mock_clear_cache,
|
||||
mock_cartography_indexes,
|
||||
@@ -589,8 +603,8 @@ class TestAttackPathsRun:
|
||||
attack_paths_run(str(tenant.id), str(scan.id), "task-456")
|
||||
|
||||
assert mock_set_provider_graph_data_ready.call_args_list == [
|
||||
call(attack_paths_scan, False),
|
||||
call(attack_paths_scan, True),
|
||||
call(attack_paths_scan, False, "neo4j"),
|
||||
call(attack_paths_scan, True, "neo4j"),
|
||||
]
|
||||
|
||||
@patch(
|
||||
@@ -618,11 +632,11 @@ class TestAttackPathsRun:
|
||||
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_uri",
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
|
||||
return_value="bolt://neo4j",
|
||||
)
|
||||
@patch(
|
||||
@@ -636,7 +650,7 @@ class TestAttackPathsRun:
|
||||
def test_failure_after_drop_before_sync_leaves_graph_data_ready_false(
|
||||
self,
|
||||
mock_init_provider,
|
||||
mock_get_uri,
|
||||
mock_get_ingest_uri,
|
||||
mock_create_db,
|
||||
mock_clear_cache,
|
||||
mock_cartography_indexes,
|
||||
@@ -703,7 +717,7 @@ class TestAttackPathsRun:
|
||||
|
||||
# Only called with False (gate), never with True (no recovery for partial data)
|
||||
mock_set_provider_graph_data_ready.assert_called_once_with(
|
||||
attack_paths_scan, False
|
||||
attack_paths_scan, False, "neo4j"
|
||||
)
|
||||
|
||||
@patch(
|
||||
@@ -716,6 +730,7 @@ class TestAttackPathsRun:
|
||||
)
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.drop_database")
|
||||
@patch("tasks.jobs.attack_paths.scan.db_utils.finish_attack_paths_scan")
|
||||
@patch("tasks.jobs.attack_paths.scan.db_utils.set_scan_migrated")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.db_utils.set_graph_data_ready",
|
||||
side_effect=[RuntimeError("flag failed"), None],
|
||||
@@ -725,7 +740,7 @@ class TestAttackPathsRun:
|
||||
@patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.sync.sync_graph",
|
||||
return_value={"nodes": 0, "relationships": 0},
|
||||
return_value=SYNC_RESULT_EMPTY,
|
||||
)
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.drop_subgraph")
|
||||
@patch("tasks.jobs.attack_paths.scan.indexes.create_sync_indexes")
|
||||
@@ -734,11 +749,11 @@ class TestAttackPathsRun:
|
||||
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_uri",
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
|
||||
return_value="bolt://neo4j",
|
||||
)
|
||||
@patch(
|
||||
@@ -752,7 +767,7 @@ class TestAttackPathsRun:
|
||||
def test_failure_after_sync_restores_graph_data_ready(
|
||||
self,
|
||||
mock_init_provider,
|
||||
mock_get_uri,
|
||||
mock_get_ingest_uri,
|
||||
mock_create_db,
|
||||
mock_clear_cache,
|
||||
mock_cartography_indexes,
|
||||
@@ -768,6 +783,7 @@ class TestAttackPathsRun:
|
||||
mock_update_progress,
|
||||
mock_set_provider_graph_data_ready,
|
||||
mock_set_graph_data_ready,
|
||||
mock_set_scan_migrated,
|
||||
mock_finish,
|
||||
mock_drop_db,
|
||||
mock_event_loop,
|
||||
@@ -824,8 +840,11 @@ class TestAttackPathsRun:
|
||||
]
|
||||
# set_provider_graph_data_ready only called once with False (the gate)
|
||||
mock_set_provider_graph_data_ready.assert_called_once_with(
|
||||
attack_paths_scan, False
|
||||
attack_paths_scan, False, "neo4j"
|
||||
)
|
||||
# is_migrated is flipped once after the sync and is not touched again by
|
||||
# the failure-recovery branch
|
||||
mock_set_scan_migrated.assert_called_once_with(attack_paths_scan, True, "neo4j")
|
||||
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.utils.stringify_exception",
|
||||
@@ -843,7 +862,7 @@ class TestAttackPathsRun:
|
||||
@patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.sync.sync_graph",
|
||||
return_value={"nodes": 0, "relationships": 0},
|
||||
return_value=SYNC_RESULT_EMPTY,
|
||||
)
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.graph_database.drop_subgraph",
|
||||
@@ -855,11 +874,11 @@ class TestAttackPathsRun:
|
||||
@patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
|
||||
@patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
|
||||
@patch(
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_uri",
|
||||
"tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
|
||||
return_value="bolt://neo4j",
|
||||
)
|
||||
@patch(
|
||||
@@ -873,7 +892,7 @@ class TestAttackPathsRun:
|
||||
def test_recovery_failure_does_not_suppress_original_exception(
|
||||
self,
|
||||
mock_init_provider,
|
||||
mock_get_uri,
|
||||
mock_get_ingest_uri,
|
||||
mock_create_db,
|
||||
mock_clear_cache,
|
||||
mock_cartography_indexes,
|
||||
@@ -1116,7 +1135,7 @@ class TestFailAttackPathsScan:
|
||||
fail_attack_paths_scan(str(tenant.id), "nonexistent", "setup exploded")
|
||||
|
||||
def test_fail_recovers_graph_data_ready_when_data_exists(
|
||||
self, tenants_fixture, providers_fixture, scans_fixture
|
||||
self, tenants_fixture, providers_fixture, scans_fixture, sink_backend_stub
|
||||
):
|
||||
from tasks.jobs.attack_paths.db_utils import fail_attack_paths_scan
|
||||
|
||||
@@ -1135,16 +1154,18 @@ class TestFailAttackPathsScan:
|
||||
state=StateChoices.EXECUTING,
|
||||
)
|
||||
|
||||
# `recover_graph_data_ready` routes `has_provider_data` through
|
||||
# `sink_module.get_backend_for_scan(scan)`. With `is_migrated=False`
|
||||
# and the default `ATTACK_PATHS_SINK_DATABASE=neo4j`, the factory
|
||||
# returns the active backend, which `sink_backend_stub` replaces.
|
||||
sink_backend_stub.has_provider_data.return_value = True
|
||||
|
||||
with (
|
||||
patch(
|
||||
"tasks.jobs.attack_paths.db_utils.retrieve_attack_paths_scan",
|
||||
return_value=attack_paths_scan,
|
||||
),
|
||||
patch("tasks.jobs.attack_paths.db_utils.graph_database.drop_database"),
|
||||
patch(
|
||||
"tasks.jobs.attack_paths.db_utils.graph_database.has_provider_data",
|
||||
return_value=True,
|
||||
),
|
||||
patch(
|
||||
"tasks.jobs.attack_paths.db_utils.set_provider_graph_data_ready"
|
||||
) as mock_set_ready,
|
||||
@@ -1154,7 +1175,7 @@ class TestFailAttackPathsScan:
|
||||
mock_set_ready.assert_called_once_with(attack_paths_scan, True)
|
||||
|
||||
def test_fail_leaves_graph_data_ready_false_when_no_data(
|
||||
self, tenants_fixture, providers_fixture, scans_fixture
|
||||
self, tenants_fixture, providers_fixture, scans_fixture, sink_backend_stub
|
||||
):
|
||||
from tasks.jobs.attack_paths.db_utils import fail_attack_paths_scan
|
||||
|
||||
@@ -1173,16 +1194,14 @@ class TestFailAttackPathsScan:
|
||||
state=StateChoices.EXECUTING,
|
||||
)
|
||||
|
||||
sink_backend_stub.has_provider_data.return_value = False
|
||||
|
||||
with (
|
||||
patch(
|
||||
"tasks.jobs.attack_paths.db_utils.retrieve_attack_paths_scan",
|
||||
return_value=attack_paths_scan,
|
||||
),
|
||||
patch("tasks.jobs.attack_paths.db_utils.graph_database.drop_database"),
|
||||
patch(
|
||||
"tasks.jobs.attack_paths.db_utils.graph_database.has_provider_data",
|
||||
return_value=False,
|
||||
),
|
||||
patch(
|
||||
"tasks.jobs.attack_paths.db_utils.set_provider_graph_data_ready"
|
||||
) as mock_set_ready,
|
||||
@@ -1271,6 +1290,20 @@ class TestAttackPathsFindingsHelpers:
|
||||
[call(mock_session, stmt) for stmt in FINDINGS_INDEX_STATEMENTS]
|
||||
)
|
||||
|
||||
def test_create_findings_indexes_runs_even_when_sink_is_neptune(self, settings):
|
||||
# The index helpers run against the temp ingest DB, which is always
|
||||
# Neo4j regardless of the configured sink. A Neptune sink must not
|
||||
# suppress index creation on that DB (regression for the dropped
|
||||
# in-helper sink gate).
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
|
||||
mock_session = MagicMock()
|
||||
with patch("tasks.jobs.attack_paths.indexes.run_write_query") as mock_run_write:
|
||||
indexes_module.create_findings_indexes(mock_session)
|
||||
|
||||
from tasks.jobs.attack_paths.indexes import FINDINGS_INDEX_STATEMENTS
|
||||
|
||||
assert mock_run_write.call_count == len(FINDINGS_INDEX_STATEMENTS)
|
||||
|
||||
def test_load_findings_batches_requests(self, providers_fixture):
|
||||
provider = providers_fixture[0]
|
||||
provider.provider = Provider.ProviderChoices.AWS
|
||||
@@ -1802,7 +1835,7 @@ def _make_session_ctx(session, call_order=None, name=None):
|
||||
|
||||
|
||||
class TestSyncNodes:
|
||||
def test_sync_nodes_adds_private_label(self):
|
||||
def test_sync_nodes_passes_isolation_labels_to_sink(self):
|
||||
row = {
|
||||
"internal_id": 1,
|
||||
"element_id": "elem-1",
|
||||
@@ -1812,29 +1845,32 @@ class TestSyncNodes:
|
||||
|
||||
mock_source_1 = MagicMock()
|
||||
mock_source_1.run.return_value = [row]
|
||||
mock_target = MagicMock()
|
||||
mock_source_2 = MagicMock()
|
||||
mock_source_2.run.return_value = []
|
||||
sink = MagicMock()
|
||||
|
||||
with patch(
|
||||
"tasks.jobs.attack_paths.sync.graph_database.get_session",
|
||||
side_effect=[
|
||||
_make_session_ctx(mock_source_1),
|
||||
_make_session_ctx(mock_target),
|
||||
_make_session_ctx(mock_source_2),
|
||||
],
|
||||
):
|
||||
total = sync_module.sync_nodes(
|
||||
"source-db", "target-db", "tenant-1", "prov-1"
|
||||
result = sync_module.sync_nodes(
|
||||
"source-db", "target-db", "tenant-1", "prov-1", sink, []
|
||||
)
|
||||
|
||||
assert total == 1
|
||||
query = mock_target.run.call_args.args[0]
|
||||
assert "_ProviderResource" in query
|
||||
assert "_Tenant_tenant1" in query
|
||||
assert "_Provider_prov1" in query
|
||||
assert result["parents"] == 1
|
||||
sink.write_nodes.assert_called_once()
|
||||
target_db, labels, batch = sink.write_nodes.call_args.args
|
||||
assert target_db == "target-db"
|
||||
assert "_ProviderResource" in labels
|
||||
assert "_Tenant_tenant1" in labels
|
||||
assert "_Provider_prov1" in labels
|
||||
assert batch[0]["provider_element_id"] == "prov-1:elem-1"
|
||||
assert batch[0]["props"] == {"key": "value"}
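Reviewer note: the assertions above imply how the isolation labels and the provider-scoped element id are derived. The following is a minimal sketch inferred only from the values this test checks (`tenant-1` becomes `_Tenant_tenant1`, `prov-1` becomes `_Provider_prov1`). The helper names `isolation_labels_sketch` and `provider_element_id_sketch` are hypothetical and are not taken from the commit.

```python
def isolation_labels_sketch(tenant_id: str, provider_id: str) -> list[str]:
    """Build the labels the test expects on every synced node.

    Assumption: hyphens are stripped from the ids, as the expected
    values in the test suggest. The real helper may differ.
    """
    return [
        "_ProviderResource",
        f"_Tenant_{tenant_id.replace('-', '')}",
        f"_Provider_{provider_id.replace('-', '')}",
    ]


def provider_element_id_sketch(provider_id: str, element_id: str) -> str:
    """Prefix the source element id with the provider id (assumed format)."""
    return f"{provider_id}:{element_id}"
```

With this shape, the labels travel as data to `sink.write_nodes` instead of being interpolated into a Cypher string, which is what the renamed test checks.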
|
||||
|
||||
def test_sync_nodes_source_closes_before_target_opens(self):
|
||||
def test_sync_nodes_writes_after_source_session_closes(self):
|
||||
row = {
|
||||
"internal_id": 1,
|
||||
"element_id": "elem-1",
|
||||
@@ -1846,21 +1882,23 @@ class TestSyncNodes:
|
||||
|
||||
src_1 = MagicMock()
|
||||
src_1.run.return_value = [row]
|
||||
tgt = MagicMock()
|
||||
src_2 = MagicMock()
|
||||
src_2.run.return_value = []
|
||||
sink = MagicMock()
|
||||
sink.write_nodes.side_effect = lambda *_a, **_kw: call_order.append(
|
||||
"sink:write"
|
||||
)
|
||||
|
||||
with patch(
|
||||
"tasks.jobs.attack_paths.sync.graph_database.get_session",
|
||||
side_effect=[
|
||||
_make_session_ctx(src_1, call_order, "source1"),
|
||||
_make_session_ctx(tgt, call_order, "target"),
|
||||
_make_session_ctx(src_2, call_order, "source2"),
|
||||
],
|
||||
):
|
||||
sync_module.sync_nodes("src-db", "tgt-db", "t-1", "p-1")
|
||||
sync_module.sync_nodes("src-db", "tgt-db", "t-1", "p-1", sink, [])
|
||||
|
||||
assert call_order.index("source1:exit") < call_order.index("target:enter")
|
||||
assert call_order.index("source1:exit") < call_order.index("sink:write")
|
||||
|
||||
def test_sync_nodes_pagination_with_batch_size_1(self):
|
||||
row_a = {
|
||||
@@ -1882,44 +1920,44 @@ class TestSyncNodes:
|
||||
src_2.run.return_value = [row_b]
|
||||
src_3 = MagicMock()
|
||||
src_3.run.return_value = []
|
||||
tgt_1 = MagicMock()
|
||||
tgt_2 = MagicMock()
|
||||
sink = MagicMock()
|
||||
|
||||
with (
|
||||
patch(
|
||||
"tasks.jobs.attack_paths.sync.graph_database.get_session",
|
||||
side_effect=[
|
||||
_make_session_ctx(src_1),
|
||||
_make_session_ctx(tgt_1),
|
||||
_make_session_ctx(src_2),
|
||||
_make_session_ctx(tgt_2),
|
||||
_make_session_ctx(src_3),
|
||||
],
|
||||
),
|
||||
patch("tasks.jobs.attack_paths.sync.SYNC_BATCH_SIZE", 1),
|
||||
):
|
||||
total = sync_module.sync_nodes("src", "tgt", "t-1", "p-1")
|
||||
result = sync_module.sync_nodes("src", "tgt", "t-1", "p-1", sink, [])
|
||||
|
||||
assert total == 2
|
||||
assert result["parents"] == 2
|
||||
assert sink.write_nodes.call_count == 2
|
||||
assert src_1.run.call_args.args[1]["last_id"] == -1
|
||||
assert src_2.run.call_args.args[1]["last_id"] == 1
|
||||
|
||||
def test_sync_nodes_empty_source_returns_zero(self):
|
||||
src = MagicMock()
|
||||
src.run.return_value = []
|
||||
sink = MagicMock()
|
||||
|
||||
with patch(
|
||||
"tasks.jobs.attack_paths.sync.graph_database.get_session",
|
||||
side_effect=[_make_session_ctx(src)],
|
||||
) as mock_get_session:
|
||||
total = sync_module.sync_nodes("src", "tgt", "t-1", "p-1")
|
||||
result = sync_module.sync_nodes("src", "tgt", "t-1", "p-1", sink, [])
|
||||
|
||||
assert total == 0
|
||||
assert result["parents"] == 0
|
||||
assert mock_get_session.call_count == 1
|
||||
sink.write_nodes.assert_not_called()
|
||||
|
||||
|
||||
class TestSyncRelationships:
|
||||
def test_sync_relationships_source_closes_before_target_opens(self):
|
||||
def test_sync_relationships_writes_after_source_session_closes(self):
|
||||
row = {
|
||||
"internal_id": 1,
|
||||
"rel_type": "HAS",
|
||||
@@ -1932,21 +1970,23 @@ class TestSyncRelationships:
|
||||
|
||||
src_1 = MagicMock()
|
||||
src_1.run.return_value = [row]
|
||||
tgt = MagicMock()
|
||||
src_2 = MagicMock()
|
||||
src_2.run.return_value = []
|
||||
sink = MagicMock()
|
||||
sink.write_relationships.side_effect = lambda *_a, **_kw: call_order.append(
|
||||
"sink:write"
|
||||
)
|
||||
|
||||
with patch(
|
||||
"tasks.jobs.attack_paths.sync.graph_database.get_session",
|
||||
side_effect=[
|
||||
_make_session_ctx(src_1, call_order, "source1"),
|
||||
_make_session_ctx(tgt, call_order, "target"),
|
||||
_make_session_ctx(src_2, call_order, "source2"),
|
||||
],
|
||||
):
|
||||
sync_module.sync_relationships("src", "tgt", "p-1")
|
||||
sync_module.sync_relationships("src", "tgt", "p-1", sink)
|
||||
|
||||
assert call_order.index("source1:exit") < call_order.index("target:enter")
|
||||
assert call_order.index("source1:exit") < call_order.index("sink:write")
|
||||
|
||||
def test_sync_relationships_pagination_with_batch_size_1(self):
|
||||
row_a = {
|
||||
@@ -1970,40 +2010,40 @@ class TestSyncRelationships:
|
||||
src_2.run.return_value = [row_b]
|
||||
src_3 = MagicMock()
|
||||
src_3.run.return_value = []
|
||||
tgt_1 = MagicMock()
|
||||
tgt_2 = MagicMock()
|
||||
sink = MagicMock()
|
||||
|
||||
with (
|
||||
patch(
|
||||
"tasks.jobs.attack_paths.sync.graph_database.get_session",
|
||||
side_effect=[
|
||||
_make_session_ctx(src_1),
|
||||
_make_session_ctx(tgt_1),
|
||||
_make_session_ctx(src_2),
|
||||
_make_session_ctx(tgt_2),
|
||||
_make_session_ctx(src_3),
|
||||
],
|
||||
),
|
||||
patch("tasks.jobs.attack_paths.sync.SYNC_BATCH_SIZE", 1),
|
||||
):
|
||||
total = sync_module.sync_relationships("src", "tgt", "p-1")
|
||||
total = sync_module.sync_relationships("src", "tgt", "p-1", sink)
|
||||
|
||||
assert total == 2
|
||||
assert sink.write_relationships.call_count == 2
|
||||
assert src_1.run.call_args.args[1]["last_id"] == -1
|
||||
assert src_2.run.call_args.args[1]["last_id"] == 1
|
||||
|
||||
def test_sync_relationships_empty_source_returns_zero(self):
|
||||
src = MagicMock()
|
||||
src.run.return_value = []
|
||||
sink = MagicMock()
|
||||
|
||||
with patch(
|
||||
"tasks.jobs.attack_paths.sync.graph_database.get_session",
|
||||
side_effect=[_make_session_ctx(src)],
|
||||
) as mock_get_session:
|
||||
total = sync_module.sync_relationships("src", "tgt", "p-1")
|
||||
total = sync_module.sync_relationships("src", "tgt", "p-1", sink)
|
||||
|
||||
assert total == 0
|
||||
assert mock_get_session.call_count == 1
|
||||
sink.write_relationships.assert_not_called()
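Reviewer note: the pagination tests above (first `last_id` is `-1`, the next is the last `internal_id` seen, and the loop stops on an empty page without writing) describe a keyset-pagination loop. The sketch below illustrates that loop under stated assumptions: `fetch_batch` and `write_batch` are hypothetical callables standing in for the source session query and the sink write, and none of these names come from the commit.

```python
from typing import Any, Callable

Row = dict[str, Any]


def sync_in_batches_sketch(
    fetch_batch: Callable[[int, int], list[Row]],
    write_batch: Callable[[list[Row]], None],
    batch_size: int,
) -> int:
    """Copy rows page by page, keyed on the last internal id seen."""
    last_id = -1  # sentinel below any real internal id
    total = 0
    while True:
        # The source read finishes before the sink write starts.
        rows = fetch_batch(last_id, batch_size)
        if not rows:
            break  # an empty page ends the loop; nothing is written
        write_batch(rows)
        total += len(rows)
        last_id = rows[-1]["internal_id"]
    return total
```

With a batch size of 1 and two source rows, this performs three reads (the last one empty) and two writes, matching the call counts asserted above.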
|
||||
|
||||
|
||||
class TestInternetAnalysis:
|
||||
@@ -2075,6 +2115,8 @@ class TestAttackPathsDbUtilsGraphDataReady:
|
||||
|
||||
assert attack_paths_scan is not None
|
||||
assert attack_paths_scan.graph_data_ready is False
|
||||
assert attack_paths_scan.is_migrated is False
|
||||
assert attack_paths_scan.sink_backend == "neo4j"
|
||||
|
||||
def test_create_attack_paths_scan_inherits_true_from_previous(
|
||||
self, tenants_fixture, providers_fixture, scans_fixture
|
||||
@@ -2095,6 +2137,8 @@ class TestAttackPathsDbUtilsGraphDataReady:
|
||||
scan=scan,
|
||||
state=StateChoices.COMPLETED,
|
||||
graph_data_ready=True,
|
||||
is_migrated=True,
|
||||
sink_backend="neptune",
|
||||
)
|
||||
|
||||
new_scan = Scan.objects.create(
|
||||
@@ -2115,6 +2159,109 @@ class TestAttackPathsDbUtilsGraphDataReady:
|
||||
|
||||
assert attack_paths_scan is not None
|
||||
assert attack_paths_scan.graph_data_ready is True
|
||||
# is_migrated tracks the data being served: inherited from the ready scan
|
||||
assert attack_paths_scan.is_migrated is True
|
||||
assert attack_paths_scan.sink_backend == "neptune"
|
||||
|
||||
def test_create_attack_paths_scan_prefers_active_sink_ready_scan(
|
||||
self, tenants_fixture, providers_fixture, scans_fixture, settings
|
||||
):
|
||||
from tasks.jobs.attack_paths.db_utils import create_attack_paths_scan
|
||||
|
||||
settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
|
||||
tenant = tenants_fixture[0]
|
||||
provider = providers_fixture[0]
|
||||
provider.provider = Provider.ProviderChoices.AWS
|
||||
provider.save()
|
||||
scan = scans_fixture[0]
|
||||
scan.provider = provider
|
||||
scan.save()
|
||||
|
||||
AttackPathsScan.objects.create(
|
||||
tenant_id=tenant.id,
|
||||
provider=provider,
|
||||
scan=scan,
|
||||
state=StateChoices.COMPLETED,
|
||||
graph_data_ready=True,
|
||||
is_migrated=False,
|
||||
sink_backend="neo4j",
|
||||
)
|
||||
AttackPathsScan.objects.create(
|
||||
tenant_id=tenant.id,
|
||||
provider=provider,
|
||||
scan=scan,
|
||||
state=StateChoices.COMPLETED,
|
||||
graph_data_ready=True,
|
||||
is_migrated=True,
|
||||
sink_backend="neptune",
|
||||
)
|
||||
|
||||
new_scan = Scan.objects.create(
|
||||
name="New Scan",
|
||||
provider=provider,
|
||||
trigger=Scan.TriggerChoices.MANUAL,
|
||||
state=StateChoices.AVAILABLE,
|
||||
tenant_id=tenant.id,
|
||||
)
|
||||
|
||||
with patch(
|
||||
"tasks.jobs.attack_paths.db_utils.rls_transaction",
|
||||
new=lambda *args, **kwargs: nullcontext(),
|
||||
):
|
||||
attack_paths_scan = create_attack_paths_scan(
|
||||
str(tenant.id), str(new_scan.id), provider.id
|
||||
)
|
||||
|
||||
assert attack_paths_scan is not None
|
||||
assert attack_paths_scan.graph_data_ready is True
|
||||
assert attack_paths_scan.is_migrated is False
|
||||
assert attack_paths_scan.sink_backend == "neo4j"
|
||||
|
||||
def test_create_attack_paths_scan_inherits_is_migrated_false_from_legacy_ready(
|
||||
self, tenants_fixture, providers_fixture, scans_fixture
|
||||
):
|
||||
from tasks.jobs.attack_paths.db_utils import create_attack_paths_scan
|
||||
|
||||
tenant = tenants_fixture[0]
|
||||
provider = providers_fixture[0]
|
||||
provider.provider = Provider.ProviderChoices.AWS
|
||||
provider.save()
|
||||
scan = scans_fixture[0]
|
||||
scan.provider = provider
|
||||
scan.save()
|
||||
|
||||
# Previous scan is ready but pre-cutover (legacy Neo4j graph shape)
|
||||
AttackPathsScan.objects.create(
|
||||
tenant_id=tenant.id,
|
||||
provider=provider,
|
||||
scan=scan,
|
||||
state=StateChoices.COMPLETED,
|
||||
graph_data_ready=True,
|
||||
is_migrated=False,
|
||||
sink_backend="neo4j",
|
||||
)
|
||||
|
||||
new_scan = Scan.objects.create(
|
||||
name="New Scan",
|
||||
provider=provider,
|
||||
trigger=Scan.TriggerChoices.MANUAL,
|
||||
state=StateChoices.AVAILABLE,
|
||||
tenant_id=tenant.id,
|
||||
)
|
||||
|
||||
with patch(
|
||||
"tasks.jobs.attack_paths.db_utils.rls_transaction",
|
||||
new=lambda *args, **kwargs: nullcontext(),
|
||||
):
|
||||
attack_paths_scan = create_attack_paths_scan(
|
||||
str(tenant.id), str(new_scan.id), provider.id
|
||||
)
|
||||
|
||||
assert attack_paths_scan is not None
|
||||
assert attack_paths_scan.graph_data_ready is True
|
||||
# Reads stay on the legacy catalog/backend until this scan's own sync
|
||||
assert attack_paths_scan.is_migrated is False
|
||||
assert attack_paths_scan.sink_backend == "neo4j"
|
||||
|
||||
def test_create_attack_paths_scan_inherits_false_when_no_previous_ready(
|
||||
self, tenants_fixture, providers_fixture, scans_fixture
|
||||
@@ -2135,6 +2282,7 @@ class TestAttackPathsDbUtilsGraphDataReady:
|
||||
scan=scan,
|
||||
state=StateChoices.FAILED,
|
||||
graph_data_ready=False,
|
||||
sink_backend="neptune",
|
||||
)
|
||||
|
||||
new_scan = Scan.objects.create(
|
||||
@@ -2155,6 +2303,8 @@ class TestAttackPathsDbUtilsGraphDataReady:
|
||||
|
||||
assert attack_paths_scan is not None
|
||||
assert attack_paths_scan.graph_data_ready is False
|
||||
assert attack_paths_scan.is_migrated is False
|
||||
assert attack_paths_scan.sink_backend == "neo4j"
|
||||
|
||||
def test_set_graph_data_ready_updates_field(
|
||||
self, tenants_fixture, providers_fixture, scans_fixture
|
||||
@@ -2261,7 +2411,7 @@ class TestAttackPathsDbUtilsGraphDataReady:
|
||||
assert attack_paths_scan.state == StateChoices.FAILED
|
||||
assert attack_paths_scan.graph_data_ready is True
|
||||
|
||||
def test_set_provider_graph_data_ready_updates_all_scans_for_provider(
|
||||
def test_set_provider_graph_data_ready_updates_all_scans_for_provider_sink(
|
||||
self, tenants_fixture, providers_fixture, scans_fixture
|
||||
):
|
||||
from tasks.jobs.attack_paths.db_utils import set_provider_graph_data_ready
|
||||
@@ -2289,6 +2439,7 @@ class TestAttackPathsDbUtilsGraphDataReady:
|
||||
scan=scan_a,
|
||||
state=StateChoices.COMPLETED,
|
||||
graph_data_ready=True,
|
||||
sink_backend="neptune",
|
||||
)
|
||||
new_ap_scan = AttackPathsScan.objects.create(
|
||||
tenant_id=tenant.id,
|
||||
@@ -2296,6 +2447,7 @@ class TestAttackPathsDbUtilsGraphDataReady:
|
||||
scan=scan_b,
|
||||
state=StateChoices.EXECUTING,
|
||||
graph_data_ready=True,
|
||||
sink_backend="neptune",
|
||||
)
|
||||
|
||||
with patch(
|
||||
@@ -2309,6 +2461,48 @@ class TestAttackPathsDbUtilsGraphDataReady:
|
||||
assert old_ap_scan.graph_data_ready is False
|
||||
assert new_ap_scan.graph_data_ready is False
|
||||
|
||||
def test_set_provider_graph_data_ready_preserves_other_sink_scans(
|
||||
self, tenants_fixture, providers_fixture, scans_fixture
|
||||
):
|
||||
from tasks.jobs.attack_paths.db_utils import set_provider_graph_data_ready
|
||||
|
||||
tenant = tenants_fixture[0]
|
||||
provider = providers_fixture[0]
|
||||
provider.provider = Provider.ProviderChoices.AWS
|
||||
provider.save()
|
||||
|
||||
scan = scans_fixture[0]
|
||||
scan.provider = provider
|
||||
scan.save()
|
||||
|
||||
legacy_scan = AttackPathsScan.objects.create(
|
||||
tenant_id=tenant.id,
|
||||
provider=provider,
|
||||
scan=scan,
|
||||
state=StateChoices.COMPLETED,
|
||||
graph_data_ready=True,
|
||||
sink_backend="neo4j",
|
||||
)
|
||||
neptune_scan = AttackPathsScan.objects.create(
|
||||
tenant_id=tenant.id,
|
||||
provider=provider,
|
||||
scan=scan,
|
||||
state=StateChoices.EXECUTING,
|
||||
graph_data_ready=True,
|
||||
sink_backend="neptune",
|
||||
)
|
||||
|
||||
with patch(
|
||||
"tasks.jobs.attack_paths.db_utils.rls_transaction",
|
||||
new=lambda *args, **kwargs: nullcontext(),
|
||||
):
|
||||
set_provider_graph_data_ready(neptune_scan, False)
|
||||
|
||||
legacy_scan.refresh_from_db()
|
||||
neptune_scan.refresh_from_db()
|
||||
assert legacy_scan.graph_data_ready is True
|
||||
assert neptune_scan.graph_data_ready is False
|
||||
|
||||
def test_set_provider_graph_data_ready_does_not_affect_other_providers(
|
||||
self, tenants_fixture, providers_fixture, scans_fixture
|
||||
):
|
||||
@@ -2871,3 +3065,57 @@ class TestCleanupStaleAttackPathsScans:
|
||||
ap_scan.refresh_from_db()
|
||||
assert ap_scan.state == StateChoices.SCHEDULED
|
||||
mock_revoke.assert_not_called()
|
||||
|
||||
|
||||
class TestNormalizeSinkProperties:
|
||||
"""Coerce Cartography-emitted property values into sink-portable primitives.
|
||||
|
||||
Lists become comma-separated strings, dicts become JSON strings, temporals become
|
||||
ISO strings, spatials become their stringified form. The same coercion
|
||||
runs regardless of the active sink so queries are portable.
|
||||
"""
|
||||
|
||||
@pytest.mark.parametrize(
|
||||
"raw, expected",
|
||||
[
|
||||
(
|
||||
{"a": "x", "b": 1, "c": 1.5, "d": True, "e": None},
|
||||
{"a": "x", "b": 1, "c": 1.5, "d": True, "e": None},
|
||||
),
|
||||
(
|
||||
{"actions": ["s3:GetObject", "s3:PutObject"], "tags": []},
|
||||
{"actions": "s3:GetObject,s3:PutObject", "tags": ""},
|
||||
),
|
||||
(
|
||||
{"condition": {"StringEquals": {"aws:SourceAccount": "123456789012"}}},
|
||||
{
|
||||
"condition": '{"StringEquals": {"aws:SourceAccount": "123456789012"}}'
|
||||
},
|
||||
),
|
||||
],
|
||||
)
|
||||
def test_primitive_list_and_dict_branches(self, raw, expected):
|
||||
sync_module._normalize_sink_properties(raw, labels=None)
|
||||
assert raw == expected
|
||||
|
||||
def test_temporal_and_spatial_become_strings(self):
|
||||
class FakeDateTime:
|
||||
def iso_format(self) -> str:
|
||||
return "2026-05-13T10:00:00+00:00"
|
||||
|
||||
class FakeSpatialPoint:
|
||||
def __str__(self) -> str:
|
||||
return "POINT(1.0 2.0)"
|
||||
|
||||
# The spatial branch is detected by module prefix, not by base class.
|
||||
FakeSpatialPoint.__module__ = "neo4j.spatial.fake"
|
||||
|
||||
props = {
|
||||
"created_at": FakeDateTime(),
|
||||
"location": FakeSpatialPoint(),
|
||||
}
|
||||
sync_module._normalize_sink_properties(props, labels=None)
|
||||
assert props == {
|
||||
"created_at": "2026-05-13T10:00:00+00:00",
|
||||
"location": "POINT(1.0 2.0)",
|
||||
}
|
||||
|
||||
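Reviewer note: the `TestNormalizeSinkProperties` cases above pin down the coercion rules well enough to sketch them. The following is an illustrative version only, not the commit's `_normalize_sink_properties`: the function name is hypothetical, the `labels` argument is omitted, and the spatial branch is simplified to a plain `str()` fallback rather than the module-prefix check the test comment mentions.

```python
import json
from typing import Any


def normalize_properties_sketch(props: dict[str, Any]) -> None:
    """Coerce property values in place into sink-portable primitives."""
    for key, value in props.items():
        if value is None or isinstance(value, (str, int, float, bool)):
            continue  # primitives pass through unchanged
        if isinstance(value, (list, tuple)):
            # Lists become comma-separated strings; an empty list becomes "".
            props[key] = ",".join(str(item) for item in value)
        elif isinstance(value, dict):
            props[key] = json.dumps(value)
        elif hasattr(value, "iso_format"):
            # Temporal values expose iso_format(), as the fake in the test does.
            props[key] = value.iso_format()
        else:
            # Simplified fallback covering the spatial case.
            props[key] = str(value)
```

Mutating in place mirrors how the tests call the helper and then compare the original dict against the expected value.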
@@ -1,4 +1,4 @@
|
||||
from unittest.mock import call, patch
|
||||
from unittest.mock import MagicMock, call, patch
|
||||
|
||||
import pytest
|
||||
from api.attack_paths import database as graph_database
|
||||
@@ -60,10 +60,12 @@ class TestDeleteProvider:
|
||||
|
||||
aps1 = create_attack_paths_scan(instance)
|
||||
aps2 = create_attack_paths_scan(instance)
|
||||
backend = MagicMock()
|
||||
|
||||
with (
|
||||
patch(
|
||||
"tasks.jobs.deletion.graph_database.drop_subgraph",
|
||||
"tasks.jobs.deletion.sink_module.get_backend_for_name",
|
||||
return_value=backend,
|
||||
),
|
||||
patch(
|
||||
"tasks.jobs.deletion.graph_database.drop_database",
|
||||
@@ -72,12 +74,55 @@ class TestDeleteProvider:
|
||||
result = delete_provider(tenant_id, instance.id)
|
||||
|
||||
assert result
|
||||
backend.drop_subgraph.assert_called_once_with(
|
||||
graph_database.get_database_name(tenant_id), str(instance.id)
|
||||
)
|
||||
expected_tmp_calls = [
|
||||
call(f"db-tmp-scan-{str(aps1.id).lower()}"),
|
||||
call(f"db-tmp-scan-{str(aps2.id).lower()}"),
|
||||
]
|
||||
mock_drop_database.assert_has_calls(expected_tmp_calls, any_order=True)
|
||||
|
||||
def test_delete_provider_drops_graph_data_from_all_recorded_sinks(
|
||||
self, providers_fixture, create_attack_paths_scan
|
||||
):
|
||||
instance = providers_fixture[0]
|
||||
tenant_id = str(instance.tenant_id)
|
||||
create_attack_paths_scan(instance, sink_backend="neo4j")
|
||||
create_attack_paths_scan(instance, sink_backend="neptune")
|
||||
neo4j_backend = MagicMock()
|
||||
neptune_backend = MagicMock()
|
||||
|
||||
def get_backend_for_name(name):
|
||||
return {
|
||||
"neo4j": neo4j_backend,
|
||||
"neptune": neptune_backend,
|
||||
}[name]
|
||||
|
||||
with (
|
||||
patch(
|
||||
"tasks.jobs.deletion.graph_database.get_database_name",
|
||||
return_value="tenant-db",
|
||||
),
|
||||
patch(
|
||||
"tasks.jobs.deletion.sink_module.get_backend_for_name",
|
||||
side_effect=get_backend_for_name,
|
||||
) as mock_get_backend_for_name,
|
||||
patch("tasks.jobs.deletion.graph_database.drop_database"),
|
||||
):
|
||||
result = delete_provider(tenant_id, instance.id)
|
||||
|
||||
assert result
|
||||
mock_get_backend_for_name.assert_has_calls(
|
||||
[call("neo4j"), call("neptune")], any_order=True
|
||||
)
|
||||
neo4j_backend.drop_subgraph.assert_called_once_with(
|
||||
"tenant-db", str(instance.id)
|
||||
)
|
||||
neptune_backend.drop_subgraph.assert_called_once_with(
|
||||
"tenant-db", str(instance.id)
|
||||
)
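Reviewer note: the test above expects provider deletion to drop the provider's subgraph from every sink that any of its Attack Paths scans recorded, not only the currently configured one. Below is a minimal sketch of that fan-out, assuming a factory like the patched `get_backend_for_name`. The function name `drop_from_recorded_sinks_sketch` and the `SinkBackend` protocol are hypothetical and not taken from the commit.

```python
from typing import Callable, Iterable, Protocol


class SinkBackend(Protocol):
    def drop_subgraph(self, database: str, provider_id: str) -> None: ...


def drop_from_recorded_sinks_sketch(
    sink_names: Iterable[str],
    get_backend: Callable[[str], SinkBackend],
    database: str,
    provider_id: str,
) -> list[str]:
    """Drop the provider subgraph once per distinct recorded sink."""
    dropped = []
    for name in sorted(set(sink_names)):  # de-duplicate repeated sink names
        get_backend(name).drop_subgraph(database, provider_id)
        dropped.append(name)
    return dropped
```

De-duplicating first keeps each backend's `drop_subgraph` to a single call, which is what `assert_called_once_with` checks for both the Neo4j and Neptune mocks.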
|
||||
|
||||
def test_delete_provider_continues_when_temp_db_drop_fails(
|
||||
self, providers_fixture, create_attack_paths_scan
|
||||
):
|
||||
@@ -85,10 +130,12 @@ class TestDeleteProvider:
|
||||
tenant_id = str(instance.tenant_id)
|
||||
|
||||
create_attack_paths_scan(instance)
|
||||
backend = MagicMock()
|
||||
|
||||
with (
|
||||
patch(
|
||||
"tasks.jobs.deletion.graph_database.drop_subgraph",
|
||||
"tasks.jobs.deletion.sink_module.get_backend_for_name",
|
||||
return_value=backend,
|
||||
),
|
||||
patch(
|
||||
"tasks.jobs.deletion.graph_database.drop_database",
|
||||
|
||||
Generated
+234
-9
@@ -110,7 +110,7 @@ constraints = [
|
||||
{ name = "blinker", specifier = "==1.9.0" },
|
||||
{ name = "boto3", specifier = "==1.40.61" },
|
||||
{ name = "botocore", specifier = "==1.40.61" },
|
||||
{ name = "cartography", specifier = "==0.135.0" },
|
||||
{ name = "cartography", specifier = "==0.138.1" },
|
||||
{ name = "celery", specifier = "==5.6.2" },
|
||||
{ name = "certifi", specifier = "==2026.1.4" },
|
||||
{ name = "cffi", specifier = "==2.0.0" },
|
||||
@@ -364,7 +364,7 @@ constraints = [
|
||||
{ name = "wcwidth", specifier = "==0.5.3" },
|
||||
{ name = "websocket-client", specifier = "==1.9.0" },
|
||||
{ name = "werkzeug", specifier = "==3.1.7" },
|
||||
{ name = "workos", specifier = "==6.0.4" },
|
||||
{ name = "workos", specifier = "==6.0.8" },
|
||||
{ name = "wrapt", specifier = "==1.17.3" },
|
||||
{ name = "xlsxwriter", specifier = "==3.2.9" },
|
||||
{ name = "xmlsec", specifier = "==1.3.17" },
|
||||
@@ -376,6 +376,7 @@ constraints = [
|
||||
{ name = "zstd", specifier = "==1.5.7.3" },
|
||||
]
|
||||
overrides = [
|
||||
{ name = "azure-mgmt-containerservice", specifier = "==34.1.0" },
|
||||
{ name = "dulwich", specifier = "==1.2.5" },
|
||||
{ name = "microsoft-kiota-abstractions", specifier = "==1.9.9" },
|
||||
{ name = "okta", specifier = "==3.4.2" },
|
||||
@@ -1407,6 +1408,20 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/3d/66/0d8ae9ca4d75e57746026a1f9a10a7e25029511c128cf20166fce516bda9/azure_mgmt_logic-10.0.0-py3-none-any.whl", hash = "sha256:525c78afedf3edb35eb0a16152c8beba89769ee1bc6af01bcdc42842a551e443", size = 235433, upload-time = "2022-06-13T01:38:27.333Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "azure-mgmt-managementgroups"
|
||||
version = "1.1.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "azure-mgmt-core" },
|
||||
{ name = "isodate" },
|
||||
{ name = "typing-extensions" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/fd/73/ac5e064ed7343e1b3172f32f09be3efca906087218d3046b5038f2f394ed/azure_mgmt_managementgroups-1.1.0.tar.gz", hash = "sha256:e6199baf118890ba2bda35dda83a88861c0b1bbef126311b20ec12eed9681951", size = 60101, upload-time = "2026-02-13T03:45:45.439Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/92/bc/993158de03cc0a49f2cf8192615ffedbc508c417cb3522e88f6652b714cc/azure_mgmt_managementgroups-1.1.0-py3-none-any.whl", hash = "sha256:140934589559ef6afcac6f1d24f995588a1965aaa89d47851c1cc639fafb1942", size = 83586, upload-time = "2026-02-13T03:45:46.836Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "azure-mgmt-monitor"
|
||||
version = "6.0.2"
|
||||
@@ -1726,7 +1741,7 @@ wheels = [
|
||||
|
||||
[[package]]
|
||||
name = "cartography"
|
||||
version = "0.135.0"
|
||||
version = "0.138.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "adal" },
|
||||
@@ -1746,6 +1761,7 @@ dependencies = [
|
||||
{ name = "azure-mgmt-eventhub" },
|
||||
{ name = "azure-mgmt-keyvault" },
|
||||
{ name = "azure-mgmt-logic" },
|
||||
{ name = "azure-mgmt-managementgroups" },
|
||||
{ name = "azure-mgmt-monitor" },
|
||||
{ name = "azure-mgmt-network" },
|
||||
{ name = "azure-mgmt-resource" },
|
||||
@@ -1754,6 +1770,7 @@ dependencies = [
|
||||
{ name = "azure-mgmt-storage" },
|
||||
{ name = "azure-mgmt-synapse" },
|
||||
{ name = "azure-mgmt-web" },
|
||||
{ name = "azure-storage-blob" },
|
||||
{ name = "azure-synapse-artifacts" },
|
||||
{ name = "backoff" },
|
||||
{ name = "boto3" },
|
||||
@@ -1765,8 +1782,12 @@ dependencies = [
|
||||
{ name = "duo-client" },
|
||||
{ name = "google-api-python-client" },
|
||||
{ name = "google-auth" },
|
||||
{ name = "google-cloud-aiplatform" },
|
||||
{ name = "google-cloud-artifact-registry" },
|
||||
{ name = "google-cloud-asset" },
|
||||
{ name = "google-cloud-resource-manager" },
|
||||
{ name = "google-cloud-run" },
|
||||
{ name = "google-cloud-storage" },
|
||||
{ name = "httpx" },
|
||||
{ name = "kubernetes" },
|
||||
{ name = "marshmallow" },
|
||||
@@ -1792,9 +1813,9 @@ dependencies = [
|
||||
{ name = "workos" },
|
||||
{ name = "xmltodict" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/39/47/606851d2403a983b63813b9e95427a5dd896e49bc5a501868c041262e9a5/cartography-0.135.0.tar.gz", hash = "sha256:3f500cd22c3b392d00e8b49f62acc95fd4dcd559ce514aafe2eb8101133c7a49", size = 9106458, upload-time = "2026-04-10T16:25:34.898Z" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/51/cd/0eb6a5a3c89cc179801d902ade9719af1a583c516c00f50d72b8207db1eb/cartography-0.138.1.tar.gz", hash = "sha256:356e946a0bcac899cba293d57803c71bd35fdeabe623f5f67d9405d7a643af9f", size = 9756966, upload-time = "2026-06-19T22:11:32.411Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/b1/e1/99a26b3e662202be77961aba73338e1448623490710b81783e53a4bbef15/cartography-0.135.0-py3-none-any.whl", hash = "sha256:c62c32a6917b8f23a8b98fe2b6c7c4a918b50f55918482966c4dae1cf5f538e1", size = 1590545, upload-time = "2026-04-10T16:25:37.669Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/a8/15/4447ec968825b2a19cba26ecb74964208aa3f941d9181a7782572e30b43d/cartography-0.138.1-py3-none-any.whl", hash = "sha256:88ec0898ea1a1b3f4653be9a3e7e61144f5cee20384b9040e92039617d39f029", size = 2014725, upload-time = "2026-06-19T22:11:29.886Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
@@ -2511,6 +2532,15 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/e3/26/57c6fb270950d476074c087527a558ccb6f4436657314bfb6cdf484114c4/docker-7.1.0-py3-none-any.whl", hash = "sha256:c96b93b7f0a746f9e77d325bcfb87422a3d8bd4f03136ae8a85b37f1898d5fc0", size = 147774, upload-time = "2024-05-23T11:13:55.01Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "docstring-parser"
|
||||
version = "0.18.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/e0/4d/f332313098c1de1b2d2ff91cf2674415cc7cddab2ca1b01ae29774bd5fdf/docstring_parser-0.18.0.tar.gz", hash = "sha256:292510982205c12b1248696f44959db3cdd1740237a968ea1e2e7a900eeb2015", size = 29341, upload-time = "2026-04-14T04:09:19.867Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/a7/5f/ed01f9a3cdffbd5a008556fc7b2a08ddb1cc6ace7effa7340604b1d16699/docstring_parser-0.18.0-py3-none-any.whl", hash = "sha256:b3fcbed555c47d8479be0796ef7e19c2670d428d72e96da63f3a40122860374b", size = 22484, upload-time = "2026-04-14T04:09:18.638Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "dogpile-cache"
|
||||
version = "1.5.0"
|
||||
@@ -2851,6 +2881,11 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/83/1d/d6466de3a5249d35e832a52834115ca9d1d0de6abc22065f049707516d47/google_auth-2.48.0-py3-none-any.whl", hash = "sha256:2e2a537873d449434252a9632c28bfc268b0adb1e53f9fb62afc5333a975903f", size = 236499, upload-time = "2026-01-26T19:22:45.099Z" },
|
||||
]
|
||||
|
||||
[package.optional-dependencies]
|
||||
requests = [
|
||||
{ name = "requests" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "google-auth-httplib2"
|
||||
version = "0.2.0"
|
||||
@@ -2877,6 +2912,46 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/ca/94/24b010493660dd55e2d9769ae7ef44164aebd7e1f4a9266cf9459affd687/google_cloud_access_context_manager-0.3.0-py3-none-any.whl", hash = "sha256:5d15ad51547f06c281e35f16b4ffcb3e98bb2d898b01470f88b94edfb2eeb0a3", size = 58852, upload-time = "2025-10-17T02:30:33.768Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "google-cloud-aiplatform"
|
||||
version = "1.153.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "docstring-parser" },
|
||||
{ name = "google-api-core", extra = ["grpc"] },
|
||||
{ name = "google-auth" },
|
||||
{ name = "google-cloud-bigquery" },
|
||||
{ name = "google-cloud-resource-manager" },
|
||||
{ name = "google-cloud-storage" },
|
||||
{ name = "google-genai" },
|
||||
{ name = "packaging" },
|
||||
{ name = "proto-plus" },
|
||||
{ name = "protobuf" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "typing-extensions" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/d5/97/1779e66ab845550bc602364311ea093ba156cb805a1c31b7c4d6f25b5863/google_cloud_aiplatform-1.153.1.tar.gz", hash = "sha256:445b6c683d5c630f174d81ae1f69f7da9e27e4d4ec5b70c5fe96de5c1247cfbc", size = 11011349, upload-time = "2026-05-15T06:34:14.851Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/16/01/8a1900e7a742ed480e6037ac4f6541466cb981d81bd4cbd34a9d46204ea1/google_cloud_aiplatform-1.153.1-py2.py3-none-any.whl", hash = "sha256:033fa1595a7e8ed1d97066e261e630f38fbc60e10c98c6487cf228fe9c7ec151", size = 9170782, upload-time = "2026-05-15T06:34:10.887Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "google-cloud-artifact-registry"
|
||||
version = "1.21.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "google-api-core", extra = ["grpc"] },
|
||||
{ name = "google-auth" },
|
||||
{ name = "grpc-google-iam-v1" },
|
||||
{ name = "grpcio" },
|
||||
{ name = "proto-plus" },
|
||||
{ name = "protobuf" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/13/2b/24e6956789bc1244efb18143aa4f124e03d870228e5bfd065c04d38a4d6b/google_cloud_artifact_registry-1.21.0.tar.gz", hash = "sha256:546e51eb5d463a6e5c668be6727d14f8ec82bc798031398006b2213d703e184c", size = 315219, upload-time = "2026-03-30T22:50:38.875Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/e1/8c/a5c68031728f38d3306bad5ac10c0ca670cbdf414db308ddefa2c47f2b34/google_cloud_artifact_registry-1.21.0-py3-none-any.whl", hash = "sha256:a07079035438fd0f2e7264d4318b388650495f011db575405c18c9881449025c", size = 250544, upload-time = "2026-03-30T22:48:49.345Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "google-cloud-asset"
|
||||
version = "4.2.0"
|
||||
@@ -2897,6 +2972,37 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/05/88/9a43fae1d2fed94d7f5f46b6f4c44bd15e5ea0e8657632108b5ec5f53d9d/google_cloud_asset-4.2.0-py3-none-any.whl", hash = "sha256:fd7ea04c64948a4779790343204cd5b41d4772d6ab1d05a9125e28a637ac0862", size = 282707, upload-time = "2026-01-09T14:53:03.081Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "google-cloud-bigquery"
|
||||
version = "3.41.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "google-api-core", extra = ["grpc"] },
|
||||
{ name = "google-auth" },
|
||||
{ name = "google-cloud-core" },
|
||||
{ name = "google-resumable-media" },
|
||||
{ name = "packaging" },
|
||||
{ name = "python-dateutil" },
|
||||
{ name = "requests" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/ce/13/6515c7aab55a4a0cf708ffd309fb9af5bab54c13e32dc22c5acd6497193c/google_cloud_bigquery-3.41.0.tar.gz", hash = "sha256:2217e488b47ed576360c9b2cc07d59d883a54b83167c0ef37f915c26b01a06fe", size = 513434, upload-time = "2026-03-30T22:50:55.347Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/40/33/1d3902efadef9194566d499d61507e1f038454e0b55499d2d7f8ab2a4fee/google_cloud_bigquery-3.41.0-py3-none-any.whl", hash = "sha256:2a5b5a737b401cbd824a6e5eac7554100b878668d908e6548836b5d8aaa4dcaa", size = 262343, upload-time = "2026-03-30T22:48:45.444Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "google-cloud-core"
|
||||
version = "2.6.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "google-api-core" },
|
||||
{ name = "google-auth" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/a8/dd/1eef226e470369b26824a505c34482c0b493bc35fe8e0c6b003b5feca21a/google_cloud_core-2.6.0.tar.gz", hash = "sha256:e76149739f90fac1fc6757c09f47eaccb3145b54adbd7759b0f7c4b235f46c83", size = 36001, upload-time = "2026-05-07T08:04:04.124Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/84/4a/98da8930ab109c73d9a5d13782a9ebb81ea8c111f6d534a567b71d23e52b/google_cloud_core-2.6.0-py3-none-any.whl", hash = "sha256:6d63ac8e5eca6d9e4319d0a1e2265fadcd7f1049904378caecfa01cf52dd869e", size = 29390, upload-time = "2026-05-07T08:02:34.672Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "google-cloud-org-policy"
|
||||
version = "1.16.0"
|
||||
@@ -2946,6 +3052,93 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/94/ff/4b28bcc791d9d7e4ac8fea00fbd90ccb236afda56746a3b4564d2ae45df3/google_cloud_resource_manager-1.16.0-py3-none-any.whl", hash = "sha256:fb9a2ad2b5053c508e1c407ac31abfd1a22e91c32876c1892830724195819a28", size = 400218, upload-time = "2026-01-15T13:02:47.378Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "google-cloud-run"
|
||||
version = "0.16.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "google-api-core", extra = ["grpc"] },
|
||||
{ name = "google-auth" },
|
||||
{ name = "grpc-google-iam-v1" },
|
||||
{ name = "grpcio" },
|
||||
{ name = "proto-plus" },
|
||||
{ name = "protobuf" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/b7/89/dcaf0dc97e39b41e446456ceb60657ab025de79cfccd39cbd739d1a9849e/google_cloud_run-0.16.0.tar.gz", hash = "sha256:d52cf4e6ad3702ae48caccf6abcab543afee6f61c2a6ec753cc62a31e5b629f1", size = 514452, upload-time = "2026-03-26T22:17:05.589Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/fa/c7/46153dc13713b5e4276d86f28ff4563332f9e4bae5ebc83abc5bfd994801/google_cloud_run-0.16.0-py3-none-any.whl", hash = "sha256:d7d2dd7307130fde2a0ce27e96d580dd23b7b2d973b6484b94d902e6b2618860", size = 459112, upload-time = "2026-03-26T22:16:00.018Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "google-cloud-storage"
|
||||
version = "3.10.1"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "google-api-core" },
|
||||
{ name = "google-auth" },
|
||||
{ name = "google-cloud-core" },
|
||||
{ name = "google-crc32c" },
|
||||
{ name = "google-resumable-media" },
|
||||
{ name = "requests" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/4c/47/205eb8e9a1739b5345843e5a425775cbdc472cc38e7eda082ba5b8d02450/google_cloud_storage-3.10.1.tar.gz", hash = "sha256:97db9aa4460727982040edd2bd13ff3d5e2260b5331ad22895802da1fc2a5286", size = 17309950, upload-time = "2026-03-23T09:35:23.409Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/ad/ff/ca9ab2417fa913d75aae38bf40bf856bb2749a604b2e0f701b37cfcd23cc/google_cloud_storage-3.10.1-py3-none-any.whl", hash = "sha256:a72f656759b7b99bda700f901adcb3425a828d4a29f911bc26b3ea79c5b1217f", size = 324453, upload-time = "2026-03-23T09:35:21.368Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "google-crc32c"
|
||||
version = "1.8.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/03/41/4b9c02f99e4c5fb477122cd5437403b552873f014616ac1d19ac8221a58d/google_crc32c-1.8.0.tar.gz", hash = "sha256:a428e25fb7691024de47fecfbff7ff957214da51eddded0da0ae0e0f03a2cf79", size = 14192, upload-time = "2025-12-16T00:35:25.142Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/5d/ef/21ccfaab3d5078d41efe8612e0ed0bfc9ce22475de074162a91a25f7980d/google_crc32c-1.8.0-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:014a7e68d623e9a4222d663931febc3033c5c7c9730785727de2a81f87d5bab8", size = 31298, upload-time = "2025-12-16T00:20:32.241Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/c5/b8/f8413d3f4b676136e965e764ceedec904fe38ae8de0cdc52a12d8eb1096e/google_crc32c-1.8.0-cp311-cp311-macosx_12_0_x86_64.whl", hash = "sha256:86cfc00fe45a0ac7359e5214a1704e51a99e757d0272554874f419f79838c5f7", size = 30872, upload-time = "2025-12-16T00:33:58.785Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f6/fd/33aa4ec62b290477181c55bb1c9302c9698c58c0ce9a6ab4874abc8b0d60/google_crc32c-1.8.0-cp311-cp311-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:19b40d637a54cb71e0829179f6cb41835f0fbd9e8eb60552152a8b52c36cbe15", size = 33243, upload-time = "2025-12-16T00:40:21.46Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/71/03/4820b3bd99c9653d1a5210cb32f9ba4da9681619b4d35b6a052432df4773/google_crc32c-1.8.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:17446feb05abddc187e5441a45971b8394ea4c1b6efd88ab0af393fd9e0a156a", size = 33608, upload-time = "2025-12-16T00:40:22.204Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/7c/43/acf61476a11437bf9733fb2f70599b1ced11ec7ed9ea760fdd9a77d0c619/google_crc32c-1.8.0-cp311-cp311-win_amd64.whl", hash = "sha256:71734788a88f551fbd6a97be9668a0020698e07b2bf5b3aa26a36c10cdfb27b2", size = 34439, upload-time = "2025-12-16T00:35:20.458Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/e9/5f/7307325b1198b59324c0fa9807cafb551afb65e831699f2ce211ad5c8240/google_crc32c-1.8.0-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:4b8286b659c1335172e39563ab0a768b8015e88e08329fa5321f774275fc3113", size = 31300, upload-time = "2025-12-16T00:21:56.723Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/21/8e/58c0d5d86e2220e6a37befe7e6a94dd2f6006044b1a33edf1ff6d9f7e319/google_crc32c-1.8.0-cp312-cp312-macosx_12_0_x86_64.whl", hash = "sha256:2a3dc3318507de089c5384cc74d54318401410f82aa65b2d9cdde9d297aca7cb", size = 30867, upload-time = "2025-12-16T00:38:31.302Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/ce/a9/a780cc66f86335a6019f557a8aaca8fbb970728f0efd2430d15ff1beae0e/google_crc32c-1.8.0-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:14f87e04d613dfa218d6135e81b78272c3b904e2a7053b841481b38a7d901411", size = 33364, upload-time = "2025-12-16T00:40:22.96Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/21/3f/3457ea803db0198c9aaca2dd373750972ce28a26f00544b6b85088811939/google_crc32c-1.8.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:cb5c869c2923d56cb0c8e6bcdd73c009c36ae39b652dbe46a05eb4ef0ad01454", size = 33740, upload-time = "2025-12-16T00:40:23.96Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/df/c0/87c2073e0c72515bb8733d4eef7b21548e8d189f094b5dad20b0ecaf64f6/google_crc32c-1.8.0-cp312-cp312-win_amd64.whl", hash = "sha256:3cc0c8912038065eafa603b238abf252e204accab2a704c63b9e14837a854962", size = 34437, upload-time = "2025-12-16T00:35:21.395Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/52/c5/c171e4d8c44fec1422d801a6d2e5d7ddabd733eeda505c79730ee9607f07/google_crc32c-1.8.0-pp311-pypy311_pp73-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:87fa445064e7db928226b2e6f0d5304ab4cd0339e664a4e9a25029f384d9bb93", size = 28615, upload-time = "2025-12-16T00:40:29.298Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/9c/97/7d75fe37a7a6ed171a2cf17117177e7aab7e6e0d115858741b41e9dd4254/google_crc32c-1.8.0-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:f639065ea2042d5c034bf258a9f085eaa7af0cd250667c0635a3118e8f92c69c", size = 28800, upload-time = "2025-12-16T00:40:30.322Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "google-genai"
|
||||
version = "1.68.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "anyio" },
|
||||
{ name = "distro" },
|
||||
{ name = "google-auth", extra = ["requests"] },
|
||||
{ name = "httpx" },
|
||||
{ name = "pydantic" },
|
||||
{ name = "requests" },
|
||||
{ name = "sniffio" },
|
||||
{ name = "tenacity" },
|
||||
{ name = "typing-extensions" },
|
||||
{ name = "websockets" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/9c/2c/f059982dbcb658cc535c81bbcbe7e2c040d675f4b563b03cdb01018a4bc3/google_genai-1.68.0.tar.gz", hash = "sha256:ac30c0b8bc630f9372993a97e4a11dae0e36f2e10d7c55eacdca95a9fa14ca96", size = 511285, upload-time = "2026-03-18T01:03:18.243Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/84/de/7d3ee9c94b74c3578ea4f88d45e8de9405902f857932334d81e89bce3dfa/google_genai-1.68.0-py3-none-any.whl", hash = "sha256:a1bc9919c0e2ea2907d1e319b65471d3d6d58c54822039a249fe1323e4178d15", size = 750912, upload-time = "2026-03-18T01:03:15.983Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "google-resumable-media"
|
||||
version = "2.9.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "google-crc32c" },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/00/4b/0b235beccc310d0a48adbc7246b719d173cca6c88c572dfa4b090e39143c/google_resumable_media-2.9.0.tar.gz", hash = "sha256:f7cfb224846a9dd444d125115dfbe8ef02a2b893e78f087762fe716a255a734b", size = 2164534, upload-time = "2026-05-07T08:04:44.236Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/07/73/3518e63deb1667c5409a4579e28daf5e84479a87a72c547e0487f7883dcd/google_resumable_media-2.9.0-py3-none-any.whl", hash = "sha256:c8901e88e389af8bed64d9696c74d8bad961865eb2236e13e0bfca9bb0a65ca3", size = 81507, upload-time = "2026-05-07T08:03:23.809Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "googleapis-common-protos"
|
||||
version = "1.72.0"
|
||||
@@ -4606,7 +4799,7 @@ dev = [
|
||||
|
||||
[package.metadata]
|
||||
requires-dist = [
|
||||
{ name = "cartography", specifier = "==0.135.0" },
|
||||
{ name = "cartography", specifier = "==0.138.1" },
|
||||
{ name = "celery", specifier = "==5.6.2" },
|
||||
{ name = "defusedxml", specifier = "==0.7.1" },
|
||||
{ name = "dj-rest-auth", extras = ["with-social", "jwt"], specifier = "==7.0.1" },
|
||||
@@ -5931,6 +6124,38 @@ wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/34/db/b10e48aa8fff7407e67470363eac595018441cf32d5e1001567a7aeba5d2/websocket_client-1.9.0-py3-none-any.whl", hash = "sha256:af248a825037ef591efbf6ed20cc5faa03d3b47b9e5a2230a529eeee1c1fc3ef", size = 82616, upload-time = "2025-10-07T21:16:34.951Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "websockets"
|
||||
version = "16.0"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/04/24/4b2031d72e840ce4c1ccb255f693b15c334757fc50023e4db9537080b8c4/websockets-16.0.tar.gz", hash = "sha256:5f6261a5e56e8d5c42a4497b364ea24d94d9563e8fbd44e78ac40879c60179b5", size = 179346, upload-time = "2026-01-10T09:23:47.181Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/f2/db/de907251b4ff46ae804ad0409809504153b3f30984daf82a1d84a9875830/websockets-16.0-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:31a52addea25187bde0797a97d6fc3d2f92b6f72a9370792d65a6e84615ac8a8", size = 177340, upload-time = "2026-01-10T09:22:34.539Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/f3/fa/abe89019d8d8815c8781e90d697dec52523fb8ebe308bf11664e8de1877e/websockets-16.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:417b28978cdccab24f46400586d128366313e8a96312e4b9362a4af504f3bbad", size = 175022, upload-time = "2026-01-10T09:22:36.332Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/58/5d/88ea17ed1ded2079358b40d31d48abe90a73c9e5819dbcde1606e991e2ad/websockets-16.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:af80d74d4edfa3cb9ed973a0a5ba2b2a549371f8a741e0800cb07becdd20f23d", size = 175319, upload-time = "2026-01-10T09:22:37.602Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/d2/ae/0ee92b33087a33632f37a635e11e1d99d429d3d323329675a6022312aac2/websockets-16.0-cp311-cp311-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:08d7af67b64d29823fed316505a89b86705f2b7981c07848fb5e3ea3020c1abe", size = 184631, upload-time = "2026-01-10T09:22:38.789Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/c8/c5/27178df583b6c5b31b29f526ba2da5e2f864ecc79c99dae630a85d68c304/websockets-16.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7be95cfb0a4dae143eaed2bcba8ac23f4892d8971311f1b06f3c6b78952ee70b", size = 185870, upload-time = "2026-01-10T09:22:39.893Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/87/05/536652aa84ddc1c018dbb7e2c4cbcd0db884580bf8e95aece7593fde526f/websockets-16.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d6297ce39ce5c2e6feb13c1a996a2ded3b6832155fcfc920265c76f24c7cceb5", size = 185361, upload-time = "2026-01-10T09:22:41.016Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/6d/e2/d5332c90da12b1e01f06fb1b85c50cfc489783076547415bf9f0a659ec19/websockets-16.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:1c1b30e4f497b0b354057f3467f56244c603a79c0d1dafce1d16c283c25f6e64", size = 184615, upload-time = "2026-01-10T09:22:42.442Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/77/fb/d3f9576691cae9253b51555f841bc6600bf0a983a461c79500ace5a5b364/websockets-16.0-cp311-cp311-win32.whl", hash = "sha256:5f451484aeb5cafee1ccf789b1b66f535409d038c56966d6101740c1614b86c6", size = 178246, upload-time = "2026-01-10T09:22:43.654Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/54/67/eaff76b3dbaf18dcddabc3b8c1dba50b483761cccff67793897945b37408/websockets-16.0-cp311-cp311-win_amd64.whl", hash = "sha256:8d7f0659570eefb578dacde98e24fb60af35350193e4f56e11190787bee77dac", size = 178684, upload-time = "2026-01-10T09:22:44.941Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/84/7b/bac442e6b96c9d25092695578dda82403c77936104b5682307bd4deb1ad4/websockets-16.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:71c989cbf3254fbd5e84d3bff31e4da39c43f884e64f2551d14bb3c186230f00", size = 177365, upload-time = "2026-01-10T09:22:46.787Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/b0/fe/136ccece61bd690d9c1f715baaeefd953bb2360134de73519d5df19d29ca/websockets-16.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:8b6e209ffee39ff1b6d0fa7bfef6de950c60dfb91b8fcead17da4ee539121a79", size = 175038, upload-time = "2026-01-10T09:22:47.999Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/40/1e/9771421ac2286eaab95b8575b0cb701ae3663abf8b5e1f64f1fd90d0a673/websockets-16.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:86890e837d61574c92a97496d590968b23c2ef0aeb8a9bc9421d174cd378ae39", size = 175328, upload-time = "2026-01-10T09:22:49.809Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/18/29/71729b4671f21e1eaa5d6573031ab810ad2936c8175f03f97f3ff164c802/websockets-16.0-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:9b5aca38b67492ef518a8ab76851862488a478602229112c4b0d58d63a7a4d5c", size = 184915, upload-time = "2026-01-10T09:22:51.071Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/97/bb/21c36b7dbbafc85d2d480cd65df02a1dc93bf76d97147605a8e27ff9409d/websockets-16.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e0334872c0a37b606418ac52f6ab9cfd17317ac26365f7f65e203e2d0d0d359f", size = 186152, upload-time = "2026-01-10T09:22:52.224Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/4a/34/9bf8df0c0cf88fa7bfe36678dc7b02970c9a7d5e065a3099292db87b1be2/websockets-16.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:a0b31e0b424cc6b5a04b8838bbaec1688834b2383256688cf47eb97412531da1", size = 185583, upload-time = "2026-01-10T09:22:53.443Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/47/88/4dd516068e1a3d6ab3c7c183288404cd424a9a02d585efbac226cb61ff2d/websockets-16.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:485c49116d0af10ac698623c513c1cc01c9446c058a4e61e3bf6c19dff7335a2", size = 184880, upload-time = "2026-01-10T09:22:55.033Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/91/d6/7d4553ad4bf1c0421e1ebd4b18de5d9098383b5caa1d937b63df8d04b565/websockets-16.0-cp312-cp312-win32.whl", hash = "sha256:eaded469f5e5b7294e2bdca0ab06becb6756ea86894a47806456089298813c89", size = 178261, upload-time = "2026-01-10T09:22:56.251Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/c3/f0/f3a17365441ed1c27f850a80b2bc680a0fa9505d733fe152fdf5e98c1c0b/websockets-16.0-cp312-cp312-win_amd64.whl", hash = "sha256:5569417dc80977fc8c2d43a86f78e0a5a22fee17565d78621b6bb264a115d4ea", size = 178693, upload-time = "2026-01-10T09:22:57.478Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/72/07/c98a68571dcf256e74f1f816b8cc5eae6eb2d3d5cfa44d37f801619d9166/websockets-16.0-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:349f83cd6c9a415428ee1005cadb5c2c56f4389bc06a9af16103c3bc3dcc8b7d", size = 174947, upload-time = "2026-01-10T09:23:36.166Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/7e/52/93e166a81e0305b33fe416338be92ae863563fe7bce446b0f687b9df5aea/websockets-16.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:4a1aba3340a8dca8db6eb5a7986157f52eb9e436b74813764241981ca4888f03", size = 175260, upload-time = "2026-01-10T09:23:37.409Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/56/0c/2dbf513bafd24889d33de2ff0368190a0e69f37bcfa19009ef819fe4d507/websockets-16.0-pp311-pypy311_pp73-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:f4a32d1bd841d4bcbffdcb3d2ce50c09c3909fbead375ab28d0181af89fd04da", size = 176071, upload-time = "2026-01-10T09:23:39.158Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/a5/8f/aea9c71cc92bf9b6cc0f7f70df8f0b420636b6c96ef4feee1e16f80f75dd/websockets-16.0-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0298d07ee155e2e9fda5be8a9042200dd2e3bb0b8a38482156576f863a9d457c", size = 176968, upload-time = "2026-01-10T09:23:41.031Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/9a/3f/f70e03f40ffc9a30d817eef7da1be72ee4956ba8d7255c399a01b135902a/websockets-16.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:a653aea902e0324b52f1613332ddf50b00c06fdaf7e92624fbf8c77c78fa5767", size = 178735, upload-time = "2026-01-10T09:23:42.259Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/6f/28/258ebab549c2bf3e64d2b0217b973467394a9cea8c42f70418ca2c5d0d2e/websockets-16.0-py3-none-any.whl", hash = "sha256:1637db62fad1dc833276dded54215f2c7fa46912301a24bd94d45d46a011ceec", size = 171598, upload-time = "2026-01-10T09:23:45.395Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
name = "werkzeug"
|
||||
version = "3.1.7"
|
||||
@@ -5945,16 +6170,16 @@ wheels = [
|
||||
|
||||
[[package]]
|
||||
name = "workos"
|
||||
version = "6.0.4"
|
||||
version = "6.0.8"
|
||||
source = { registry = "https://pypi.org/simple" }
|
||||
dependencies = [
|
||||
{ name = "cryptography" },
|
||||
{ name = "httpx" },
|
||||
{ name = "pyjwt", extra = ["crypto"] },
|
||||
]
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/3c/2f/99fb8718274116c5c146c745755620fd5c5943f78ca52ca9b17e94348286/workos-6.0.4.tar.gz", hash = "sha256:b0bfe8fd212b8567422c4ea3732eb33608794033eb3a69900c6b04db183c32d6", size = 172217, upload-time = "2026-04-16T03:09:28.583Z" }
|
||||
sdist = { url = "https://files.pythonhosted.org/packages/ca/0d/0a7f78912657f99412c788932ea1f3f4089916e77bdef7d2463842febe08/workos-6.0.8.tar.gz", hash = "sha256:43aa3f1992a0a4ca8933d9b6e5ada846dd3b1fe0ee10e64c876ee2000fc6090d", size = 178137, upload-time = "2026-04-24T18:48:03.203Z" }
|
||||
wheels = [
|
||||
{ url = "https://files.pythonhosted.org/packages/fa/f1/d2ab661e6dc2828a4c73e38f12630c3b109cfe2bc664ab70631c04f0db4b/workos-6.0.4-py3-none-any.whl", hash = "sha256:548668b3702673536f853ba72a7b5bbbc269e467aaf9ac4f477b6e0177df5e21", size = 511418, upload-time = "2026-04-16T03:09:27.098Z" },
|
||||
{ url = "https://files.pythonhosted.org/packages/b2/3f/3d96da80d650b2f97d58af626053354584f619dbb769051e118bd9cd1ca5/workos-6.0.8-py3-none-any.whl", hash = "sha256:a00dd4930333aded2babbba824f8032eea05c5ca8c44d04a3fa068cf6be6e21a", size = 524505, upload-time = "2026-04-24T18:48:01.389Z" },
|
||||
]
|
||||
|
||||
[[package]]
|
||||
|
||||
@@ -3,13 +3,13 @@ title: "Attack Paths"
|
||||
description: "Identify privilege escalation chains and security misconfigurations across cloud environments using graph-based analysis."
|
||||
---
|
||||
|
||||
import { VersionBadge } from "/snippets/version-badge.mdx"
|
||||
import { VersionBadge } from "/snippets/version-badge.mdx";
|
||||
|
||||
<VersionBadge version="5.17.0" />
|
||||
|
||||
Attack Paths analyzes relationships between cloud resources, permissions, and security findings to detect how privileges can be escalated and how misconfigurations can be exploited by threat actors.
|
||||
|
||||
By mapping these relationships as a graph, Attack Paths reveals risks that individual security checks cannot detect on their own — such as an IAM role that can escalate its own permissions, or a chain of policies that grants unintended access to sensitive resources.
|
||||
By mapping these relationships as a graph, Attack Paths reveals risks that individual security checks cannot detect on their own, such as an IAM role that can escalate its own permissions, or a chain of policies that grants unintended access to sensitive resources.
|
||||
|
||||
<Note>
|
||||
Attack Paths is currently available for **AWS** providers. Support for
|
||||
@@ -21,7 +21,7 @@ By mapping these relationships as a graph, Attack Paths reveals risks that indiv
|
||||
The following prerequisites are required for Attack Paths:
|
||||
|
||||
- **An AWS provider is configured** with valid credentials in Prowler App. For setup instructions, see [Getting Started with AWS](/user-guide/providers/aws/getting-started-aws).
|
||||
- **At least one scan has completed** on the configured AWS provider. Attack Paths scans run automatically alongside regular security scans — no separate configuration is required.
|
||||
- **At least one scan has completed** on the configured AWS provider. Attack Paths scans run automatically alongside regular security scans; no separate configuration is required.
|
||||
|
||||
## How Attack Paths Scans Work
|
||||
|
||||
@@ -145,11 +145,10 @@ LIMIT 25
|
||||
**IAM principals with wildcard Allow statements:**
|
||||
|
||||
```cypher
|
||||
MATCH (principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
|
||||
WHERE stmt.effect = 'Allow'
|
||||
AND ANY(action IN stmt.action WHERE action = '*')
|
||||
RETURN principal.arn AS principal, policy.arn AS policy,
|
||||
stmt.action AS actions, stmt.resource AS resources
|
||||
MATCH (principal:AWSPrincipal)-[:POLICY]->(policy:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {effect: 'Allow'})
|
||||
MATCH (stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem)
|
||||
WHERE a.value = '*'
|
||||
RETURN DISTINCT principal.arn AS principal, policy.arn AS policy
|
||||
LIMIT 25
|
||||
```
|
||||
|
||||
@@ -173,218 +172,89 @@ RETURN r.name AS role_name, r.arn AS role_arn, p.arn AS trusted_service
|
||||
LIMIT 25
|
||||
```
|
||||
|
||||
### Advanced Attack Path Scenarios
|
||||
### Working with List-Typed Properties
|
||||
|
||||
The following scenarios show how to compose graph traversals into real attack-path stories. Each query can be pasted directly into the custom query box: the API auto-scopes them to the selected provider and injects tenant/provider isolation, so there is no need to include account identifiers or `$provider_uid` in the text. All queries are openCypher v9 (Neo4j and Neptune compatible).
|
||||
Some Cartography node properties carry a list of values, such as `action`, `resource`, `notaction`, and `notresource` on `AWSPolicyStatement` nodes, the algorithms on `KMSKey`, the container-definition lists on `ECSContainerDefinition`, and many others. The Attack Paths graph models each such property as a set of child item nodes connected to the parent by a typed edge. To read the values, traverse the edge; the parent does not carry the list as a single field.
|
||||
|
||||
#### 1. Live attacker on the box that owns the keys
|
||||
The naming convention for any list-typed property on a parent label is:
|
||||
|
||||
**Query story:** Finds an internet-exposed EC2 under an active GuardDuty SSH brute-force whose instance role can assume a higher-privileged role that can read a sensitive S3 bucket.
|
||||
- **Child label:** `<ParentLabel><PropertyPascal>Item`. Example: `AWSPolicyStatement.resource` resolves to `AWSPolicyStatementResourceItem`.
|
||||
- **Edge type:** `HAS_<PROPERTY_UPPER>`. Example: `resource` resolves to `HAS_RESOURCE`.
|
||||
- **Child property:** `value` for scalar lists (one string per list element). List-of-dict properties (rare; for example `SecretsManagerSecretVersion.tags`) carry the original dict keys as named fields on the child node.
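
The naming rule above can be sketched in a few lines of Python. This is an illustration only: `list_item_names` is a hypothetical helper, not part of Prowler or Cartography, and splitting property names on underscores to build the PascalCase segment is an assumption, since the examples in this section only cover single-word properties such as `resource` and `action`.

```python
def list_item_names(parent_label: str, prop: str) -> tuple[str, str]:
    """Return (child_label, edge_type) for a list-typed property.

    Hypothetical helper that mirrors the documented convention:
    child label = <ParentLabel><PropertyPascal>Item, edge = HAS_<PROPERTY_UPPER>.
    """
    # Assumption: multi-word properties are underscore-separated.
    pascal = "".join(part.capitalize() for part in prop.split("_"))
    child_label = f"{parent_label}{pascal}Item"
    edge_type = f"HAS_{prop.upper()}"
    return child_label, edge_type


# Single-word examples taken from the convention described above.
print(list_item_names("AWSPolicyStatement", "resource"))
print(list_item_names("AWSPolicyStatement", "action"))
```

For `action`, the derived names line up with the `HAS_ACTION` edge and the `AWSPolicyStatementActionItem` label used in the wildcard query earlier on this page.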
|
||||
|
||||
To express "at least one item in the list satisfies a predicate", traverse the `HAS_*` edge in its own `MATCH` clause and apply the predicate in the attached `WHERE`. `RETURN DISTINCT` collapses duplicate parent rows produced when multiple child items satisfy the filter:
|
||||
|
||||
```cypher
|
||||
MATCH path_ec2 = (acct:AWSAccount)--(ec2:EC2Instance)
|
||||
WHERE ec2.exposed_internet = true
|
||||
MATCH p0 = (gd:GuardDutyFinding)-[:AFFECTS]->(ec2)
|
||||
MATCH p1 = (ec2)-[:INSTANCE_PROFILE]->(prof:AWSInstanceProfile)-[:ASSOCIATED_WITH]->(low:AWSRole)
|
||||
MATCH p2 = (low)-[:STS_ASSUMEROLE_ALLOW]-(high:AWSRole)
|
||||
MATCH p3 = (high)--(pol:AWSPolicy)--(stmt:AWSPolicyStatement)
|
||||
OPTIONAL MATCH path_net = (internet:Internet)-[:CAN_ACCESS]->(ec2)
|
||||
MATCH path_s3 = (acct)--(s3:S3Bucket)
|
||||
WHERE high <> low
|
||||
AND stmt.effect = 'Allow'
|
||||
AND size([a IN stmt.action WHERE
|
||||
toLower(a) STARTS WITH 's3:getobject'
|
||||
OR toLower(a) STARTS WITH 's3:listbucket'
|
||||
OR toLower(a) IN ['s3:*']
|
||||
]) > 0
|
||||
AND size([r IN stmt.resource WHERE
|
||||
r CONTAINS s3.name
|
||||
]) > 0
|
||||
RETURN path_net, path_ec2, p0, p1, p2, p3, path_s3
|
||||
```
|
||||
|
||||
**How it's built:**
|
||||
|
||||
- `path_ec2` anchors the graph on the account node and its internet-exposed EC2 instance, via a real account-to-resource edge. This is the visible spine that keeps everything connected.
|
||||
- `p0` ties a `GuardDutyFinding` to that instance through the `AFFECTS` edge (the live SSH brute-force alert).
|
||||
- `p1` walks the real graph edges from the instance to its instance profile to the role it runs as.
|
||||
- `p2` follows the `STS_ASSUMEROLE_ALLOW` edge to the higher-privileged role the low role can assume. It is undirected so it works regardless of how the assume edge was ingested. `high <> low` stops a role matching itself.
|
||||
- `p3` walks that role into its policy and policy statement.
|
||||
- `path_net` is the optional `Internet -[:CAN_ACCESS]-> instance` edge. It makes "from the internet" literal on screen. It is matched with `OPTIONAL MATCH` so that a missing `Internet` node does not empty the result.
|
||||
- `path_s3` connects the sensitive bucket to the same account node, so it draws connected instead of floating. There is no physical edge from a role to a bucket; the grant is logical and is enforced in the `WHERE`: the statement must allow an S3 read action (a list comprehension over the `action` array) and its resource must cover the bucket (`CONTAINS s3.name`). The account is the shared hub. The bucket hanging off it next to the role chain is the teaching moment: the access exists only in IAM.
|
||||
|
||||
#### 2. Who can read the crown jewels
|
||||
|
||||
**Query story:** The sensitive bucket from the previous scenario seen from the data side: every role whose IAM policy can read it, regardless of how the role is reached.
|
||||
|
||||
```cypher
|
||||
MATCH (s3:S3Bucket)
|
||||
WHERE toLower(s3.name) CONTAINS 'sensitive'
|
||||
MATCH (role:AWSRole)--(pol:AWSPolicy)--(stmt:AWSPolicyStatement)
|
||||
WHERE stmt.effect = 'Allow'
|
||||
AND size([a IN stmt.action WHERE
|
||||
toLower(a) STARTS WITH 's3:get'
|
||||
OR toLower(a) STARTS WITH 's3:list'
|
||||
OR toLower(a) IN ['s3:*']
|
||||
]) > 0
|
||||
AND size([r IN stmt.resource WHERE
|
||||
r CONTAINS s3.name
|
||||
]) > 0
|
||||
WITH DISTINCT s3, role
|
||||
MATCH (stmt:AWSPolicyStatement {effect: 'Allow'})
|
||||
MATCH (stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem)
|
||||
WHERE toLower(a.value) STARTS WITH 's3:get'
|
||||
OR toLower(a.value) STARTS WITH 's3:list'
|
||||
RETURN DISTINCT stmt
|
||||
LIMIT 25
|
||||
MATCH path_s3 = (acct:AWSAccount)--(s3)
|
||||
MATCH path_role = (acct)--(role)
|
||||
RETURN path_s3, path_role
|
||||
```
|
||||
|
||||
**How it's built:** data-centric, not attacker-centric — the same bucket the previous kill chain exfiltrates, approached from the other direction.
|
||||
|
||||
- The `S3Bucket` is bound first by name (one node), so everything else filters against it.
|
||||
- `(role:AWSRole)--(pol:AWSPolicy)--(stmt:AWSPolicyStatement)` reaches statements only *through a role*, never via a global statement scan. A blanket `AWSPolicyStatement` scan also hits resource-policy statements whose shape differs, which makes the list comprehension fail outright.
|
||||
- The `WHERE` filters in place: an S3 read action plus a resource that names that bucket.
|
||||
- `WITH DISTINCT s3, role LIMIT 25` collapses undirected-traversal duplicates and hard-caps the result.
|
||||
- `path_s3` and `path_role` attach the account hubs only after the cap, against at most 25 rows, so the bucket and role(s) draw connected through the account instead of floating.
|
||||
- No internet or EC2 here; this answers "who has the keys" instead of "how would an attacker get in."
|
||||
|
||||
#### 3. Lateral reach from an internet-exposed instance
|
||||
|
||||
**Query story:** The wide-angle view of the live-attacker scenario: every internet-exposed EC2, the role it runs as, and every role that role can assume. The first scenario is one specific exfiltration path inside this reach, under live attack.
|
||||
To check whether every item in the list satisfies a predicate, count the counter-examples and require zero, together with a guard that ensures at least one item is attached. This is the one case where the pattern-comprehension form is the right tool:
|
||||
|
||||
```cypher
|
||||
MATCH path_ec2 = (acct:AWSAccount)--(ec2:EC2Instance)
|
||||
WHERE ec2.exposed_internet = true
|
||||
MATCH p1 = (ec2)-[:INSTANCE_PROFILE]->(prof:AWSInstanceProfile)-[:ASSOCIATED_WITH]->(low:AWSRole)
|
||||
MATCH p2 = (low)-[:STS_ASSUMEROLE_ALLOW]-(high:AWSRole)
|
||||
OPTIONAL MATCH path_net = (internet:Internet)-[:CAN_ACCESS]->(ec2)
|
||||
WHERE high <> low
|
||||
RETURN path_net, path_ec2, p1, p2
|
||||
MATCH (stmt:AWSPolicyStatement)
|
||||
WHERE size([
|
||||
(stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem)
|
||||
WHERE NOT toLower(a.value) STARTS WITH 's3:'
|
||||
| a
|
||||
]) = 0
|
||||
AND size([(stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem) | a]) > 0
|
||||
RETURN stmt
|
||||
LIMIT 25
|
||||
```
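
The "every item satisfies a predicate" check above can be illustrated outside the graph. The following is a minimal Python sketch of the same logic, assuming the child item values have already been fetched as plain strings; the helper names `every_item_matches` and `is_s3_action` are hypothetical. It shows why the extra guard is needed: with no items attached, there are zero counter-examples, so the check would pass vacuously.

```python
from typing import Callable, Iterable


def every_item_matches(items: Iterable[str], predicate: Callable[[str], bool]) -> bool:
    """True only if at least one item exists and none fails the predicate."""
    values = list(items)
    # Count the counter-examples, mirroring the first size([...]) = 0 check.
    counter_examples = [v for v in values if not predicate(v)]
    # The second condition is the guard against an empty list.
    return len(counter_examples) == 0 and len(values) > 0


def is_s3_action(action: str) -> bool:
    # Same test as: toLower(a.value) STARTS WITH 's3:'
    return action.lower().startswith("s3:")


print(every_item_matches(["s3:GetObject", "s3:ListBucket"], is_s3_action))  # True
print(every_item_matches(["s3:GetObject", "iam:PassRole"], is_s3_action))   # False
print(every_item_matches([], is_s3_action))                                  # False
```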
|
||||
|
||||
**How it's built:** this query widens the lens. It stops at the assume-role hop and shows every role reachable from any internet-exposed instance, without narrowing to a specific S3 leg.
|
||||
|
||||
- `path_ec2` is the account-to-instance spine.
|
||||
- `p1` walks to the instance role.
|
||||
- `p2` fans out to every role that role can assume.
|
||||
- `path_net` adds the optional `Internet -[:CAN_ACCESS]->` edge.
|
||||
- The first scenario is the specific exfiltration path under live attack; this is the broader privilege reach an attacker inherits the moment they land on the box.
|
||||
|
||||
#### 4. Role-chain privilege escalation
|
||||
|
||||
**Query story:** A pure-IAM escalation, no compromised instance: a role that can assume a second role whose policy lets it assume a third, admin-level role.
|
||||
For the "is any item of this list a substring of a dynamic value" case, such as "does any resource pattern in this policy match a target role ARN", add the `HAS_*` traversal as its own `MATCH` and check the substring relationship between the item value and the dynamic node in `WHERE`:
|
||||
|
||||
```cypher
|
||||
MATCH path_root = (acct:AWSAccount)--(r1:AWSRole)
|
||||
MATCH p1 = (r1)-[:STS_ASSUMEROLE_ALLOW]-(r2:AWSRole)
|
||||
MATCH p2 = (r2)--(pol:AWSPolicy)--(stmt:AWSPolicyStatement)
|
||||
MATCH path_admin = (acct)--(admin:AWSRole)
|
||||
WHERE r1 <> r2 AND r1 <> admin AND r2 <> admin
|
||||
AND stmt.effect = 'Allow'
|
||||
AND size([a IN stmt.action WHERE
|
||||
toLower(a) IN ['sts:*', 'sts:assumerole']
|
||||
]) > 0
|
||||
AND size([res IN stmt.resource WHERE
|
||||
res CONTAINS admin.name
|
||||
]) > 0
|
||||
RETURN path_root, p1, p2, path_admin
|
||||
MATCH (role:AWSRole)
|
||||
WHERE role.name = 'Admin'
|
||||
MATCH (principal:AWSPrincipal)-[:POLICY]->(:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {effect: 'Allow'})
|
||||
MATCH (stmt)-[:HAS_RESOURCE]->(r:AWSPolicyStatementResourceItem)
|
||||
WHERE r.value = '*'
|
||||
OR r.value CONTAINS role.name
|
||||
OR role.arn CONTAINS r.value
|
||||
RETURN DISTINCT principal.arn AS principal, stmt, role
|
||||
LIMIT 25
|
||||
```
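
The three-way `WHERE` in the substring query above (`r.value = '*'`, `r.value CONTAINS role.name`, `role.arn CONTAINS r.value`) is plain substring matching, which can be sketched in Python as follows. The function name and the sample ARN are hypothetical and serve only as illustration.

```python
def resource_item_targets_role(item_value: str, role_name: str, role_arn: str) -> bool:
    """Mirror of the WHERE clause: wildcard, name inside item, or item inside ARN."""
    return (
        item_value == "*"            # r.value = '*'
        or role_name in item_value   # r.value CONTAINS role.name
        or item_value in role_arn    # role.arn CONTAINS r.value
    )


# Illustrative values only; the account ID is a placeholder.
role_name = "Admin"
role_arn = "arn:aws:iam::123456789012:role/Admin"

print(resource_item_targets_role("*", role_name, role_arn))                          # True
print(resource_item_targets_role("arn:aws:iam::*:role/Admin", role_name, role_arn))  # True
print(resource_item_targets_role("arn:aws:iam::*:role/Adm*", role_name, role_arn))   # False
```

The last call shows a limitation of this approach: IAM glob patterns are not expanded, so a pattern such as `Adm*` matches only if one of the literal substring tests happens to hold.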
|
||||
|
||||
**How it's built:**
|
||||
|
||||
- `path_root` anchors role 1 to the account node, the spine that keeps the picture connected.
|
||||
- `p1` is the one real assume edge in the chain (role 1 to role 2).
|
||||
- `p2` walks role 2 into its policy and statement.
|
||||
- `path_admin` connects the target admin role to the same account node so it draws connected. The third hop is not a graph edge: it exists only as `sts:AssumeRole` on that role's ARN inside the statement. The query proves it the same way the first scenario proves S3 access — the statement action must include an assume-role action and its resource list must reference the admin role's name.
|
||||
- The three `<>` guards stop a role matching itself at any position.
|
||||
|
||||
#### 5. External identity trust map
|
||||
|
||||
**Query story:** Finds external identity providers (SSO, GitHub, GitLab, Terraform Cloud) and the AWS roles they are trusted to assume.
|
||||
To return the list of values directly, collect them from the child items:
|
||||
|
||||
```cypher
|
||||
MATCH p = (role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]-(idp:AWSPrincipal)
|
||||
WHERE idp.arn CONTAINS 'saml-provider'
|
||||
OR idp.arn CONTAINS 'oidc-provider'
|
||||
MATCH path_role = (acct:AWSAccount)--(role)
|
||||
RETURN p, path_role
|
||||
MATCH (stmt:AWSPolicyStatement {effect: 'Allow'})
|
||||
OPTIONAL MATCH (stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem)
|
||||
RETURN stmt, collect(a.value) AS actions
|
||||
LIMIT 25
|
||||
```
|
||||
|
||||
**How it's built:** federated principals are stored as `AWSPrincipal` nodes whose ARN contains `saml-provider` (SSO) or `oidc-provider` (GitHub, GitLab, Terraform Cloud).
|
||||
### Working with JSON-Encoded Properties
|
||||
|
||||
- `p` matches the trust edge undirected. The edge is written as `(AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(AWSPrincipal)`, role to principal, so a directed `principal -> role` match returns nothing, while an undirected match works regardless of ingest direction.
|
||||
- The `WHERE` keeps only SAML or OIDC providers, drawing a fan-out from each external identity provider to every role it can assume (including reserved SSO admin roles).
|
||||
- `path_role` ties every trusted role to the account node so the provider stars share one spine instead of drawing as separate islands.
|
||||
Some Cartography properties represent nested objects, most notably `condition` on `AWSPolicyStatement` and `S3PolicyStatement` nodes. In the Attack Paths graph, object-typed properties are stored as JSON-encoded strings to keep the schema portable across graph backends. The value looks like:
|
||||
|
||||
#### 6. Federated SSO roles flagged as admin or privesc
|
||||
```
|
||||
'{"StringEquals":{"aws:SourceAccount":"123456789012"}}'
|
||||
```
|
||||
|
||||
**Query story:** The dangerous subset of the trust map above — externally-federated SSO roles that Prowler also flags for AdministratorAccess or privilege escalation.
|
||||
There is no JSON parser available at query time, so use `CONTAINS` for substring checks against keys or known values:
|
||||
|
||||
```cypher
|
||||
MATCH (idp:AWSPrincipal)-[:TRUSTS_AWS_PRINCIPAL]-(role:AWSRole)
|
||||
WHERE idp.arn CONTAINS 'saml-provider'
|
||||
OR idp.arn CONTAINS 'oidc-provider'
|
||||
MATCH (role)-[:HAS_FINDING]-(pf:ProwlerFinding)
|
||||
WHERE pf.status = 'FAIL'
|
||||
AND pf.check_id IN [
|
||||
'iam_inline_policy_allows_privilege_escalation',
|
||||
'iam_role_administratoraccess_policy',
|
||||
'iam_inline_policy_no_administrative_privileges',
|
||||
'iam_user_administrator_access_policy'
|
||||
]
|
||||
WITH DISTINCT idp, role, pf
|
||||
LIMIT 60
|
||||
MATCH path_root = (acct:AWSAccount)--(role)
|
||||
MATCH p_trust = (idp)-[:TRUSTS_AWS_PRINCIPAL]-(role)
|
||||
MATCH p_find = (role)-[:HAS_FINDING]-(pf)
|
||||
RETURN path_root, p_trust, p_find
|
||||
MATCH (stmt:AWSPolicyStatement)
|
||||
WHERE stmt.effect = 'Allow'
|
||||
AND stmt.condition CONTAINS '"aws:SourceAccount"'
|
||||
RETURN stmt
|
||||
LIMIT 25
|
||||
```
|
||||
|
||||
**How it's built:** a plain "list every flagged identity" query is a wide fan that draws as a column, and `ProwlerFinding` nodes accumulate across scans with no scan filter available in custom queries, so this query narrows to federated roles and caps the result before drawing the paths.
|
||||
|
||||
- The first MATCH plus `WHERE` keeps only roles trusted by a SAML or OIDC provider (trust edge undirected, so direction does not matter).
|
||||
- The second MATCH plus `check_id IN [...]` keeps only those carrying one of the four privilege-escalation or admin checks.
|
||||
- `WITH DISTINCT ... LIMIT 60` collapses duplicate finding nodes and hard-caps the result.
|
||||
- `p_trust`, `p_find`, and `path_root` draw it connected three ways: provider to role through the trust edge, role to its finding, and role to the account.
|
||||
- The previous scenario shows who can walk in; this shows which of those roles Prowler already flags as over-privileged.
|
||||
|
||||
#### 7. World-readable S3 buckets
|
||||
|
||||
**Query story:** Unlike the IAM-gated sensitive bucket in scenarios 1 and 2, these buckets are open to anyone on the internet with no credentials at all.
|
||||
|
||||
```cypher
|
||||
MATCH path_s3 = (acct:AWSAccount)--(s3:S3Bucket)
|
||||
WHERE s3.anonymous_access = true
|
||||
OPTIONAL MATCH p = (s3)--(stmt:S3PolicyStatement)
|
||||
RETURN path_s3, p
|
||||
```
|
||||
|
||||
**How it's built:** the counterpoint to scenarios 1 and 2 — there the sensitive bucket is reachable only through an IAM role chain; here the bucket needs no role at all.
|
||||
|
||||
- `path_s3` connects each public bucket to its account node so they draw connected. Cartography sets `anonymous_access = true` when a bucket's policy or ACL allows public access.
|
||||
- `p` is an optional match that pulls in the `S3PolicyStatement` granting the access where one exists, so the public grant is visible next to the bucket. Buckets that are public via ACL only still show, connected to the account.
|
||||
|
||||
#### 8. Internet exposure surface
|
||||
|
||||
**Query story:** The raw external attack surface behind scenarios 1 and 3: every internet-exposed EC2 instance with its security groups and the exact inbound ports left open.
|
||||
|
||||
```cypher
|
||||
MATCH path_ec2 = (acct:AWSAccount)--(ec2:EC2Instance)
|
||||
WHERE ec2.exposed_internet = true
|
||||
MATCH p1 = (ec2)--(sg:EC2SecurityGroup)--(rule:IpPermissionInbound)
|
||||
OPTIONAL MATCH path_net = (internet:Internet)-[:CAN_ACCESS]->(ec2)
|
||||
OPTIONAL MATCH p2 = (ec2)-[:INSTANCE_PROFILE]->(:AWSInstanceProfile)-[:ASSOCIATED_WITH]->(:AWSRole)
|
||||
RETURN path_net, path_ec2, p1, p2
|
||||
```
|
||||
|
||||
**How it's built:** `exposed_internet = true` is Cartography's computed reachability flag.
|
||||
|
||||
- `path_ec2` hubs all exposed instances on the account node so they draw as one picture.
|
||||
- `p1` joins each instance to its security groups and inbound rules so the open ports are on screen.
|
||||
- `path_net` adds the optional `Internet -[:CAN_ACCESS]->` edge so the external reachability is explicit.
|
||||
- `p2` optionally adds the instance role, which connects this surface view back to the kill chains in scenarios 1 and 3.
|
||||
When a query needs to inspect the structured members of a condition (for example, evaluate every operator and key), fetch the rows first and parse the JSON in application code. Cypher cannot navigate JSON object keys or values.
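
Parsing a condition in application code could look like the following minimal Python sketch. It assumes the `condition` string has already been returned by a query and follows the operator-to-key-to-value shape of the example value shown earlier; the function name `iter_condition_members` is hypothetical.

```python
import json
from typing import Iterator, Tuple


def iter_condition_members(condition: str) -> Iterator[Tuple[str, str, object]]:
    """Yield (operator, key, value) triples from a JSON-encoded condition."""
    if not condition:
        return  # Statements without a condition yield nothing.
    parsed = json.loads(condition)
    for operator, keys in parsed.items():
        for key, value in keys.items():
            yield operator, key, value


raw = '{"StringEquals":{"aws:SourceAccount":"123456789012"}}'
for operator, key, value in iter_condition_members(raw):
    print(operator, key, value)  # StringEquals aws:SourceAccount 123456789012
```

A `CONTAINS` filter in the query can still be used first to cut down the rows, leaving the exact evaluation to the parsed structure.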
|
||||
|
||||
### Tips for Writing Queries
|
||||
|
||||
- Start small with `LIMIT` to inspect the shape of the data before broadening the pattern.
|
||||
- Traverse `HAS_*` edges to reach list-typed property values (for example `action`, `resource`). The parent node does not carry the list as a single field; see [Working with List-Typed Properties](#working-with-list-typed-properties) for the patterns.
|
||||
- On large scans, avoid broad disconnected patterns such as `MATCH (a:Label), (b:OtherLabel)`. Bind one side with a selective predicate first, and use `WITH DISTINCT` between expanding traversals when duplicates are possible.
|
||||
- Use `RETURN` projections (`RETURN n.name, n.region`) instead of returning whole nodes to keep responses compact.
|
||||
- Combine resource nodes with `ProwlerFinding` nodes via `HAS_FINDING` to correlate misconfigurations with the affected resources.
|
||||
- When a query times out or returns no rows, simplify the pattern step by step until the first variant runs successfully, then add constraints back.
|
||||
@@ -401,6 +271,8 @@ In addition to the upstream schema, Prowler enriches the graph with:
|
||||
|
||||
- **`ProwlerFinding`** nodes representing Prowler check results, linked to affected resources via `HAS_FINDING` relationships.
|
||||
- **`Internet`** nodes used to model exposure paths from the public internet to internal resources.
|
||||
- **List-typed properties** such as `action` or `resource` on `AWSPolicyStatement`, the algorithm lists on `KMSKey`, and similar lists on other node types are modeled as child item nodes linked by typed `HAS_*` edges. See [Working with List-Typed Properties](#working-with-list-typed-properties) for the read pattern.
|
||||
- **Object-typed properties** such as `condition` on `AWSPolicyStatement` are stored as JSON-encoded strings. See [Working with JSON-Encoded Properties](#working-with-json-encoded-properties) for the read pattern.
|
||||
|
||||
<Note>
|
||||
AI assistants connected through Prowler MCP Server can fetch the exact
|
||||
@@ -539,105 +411,106 @@ Attack Paths currently supports the following built-in queries for AWS:
|
||||
|
||||
#### Custom Attack Path Queries
|
||||
|
||||
| Query | Description |
|
||||
|---|---|
|
||||
| Query | Description |
|
||||
| ------------------------------------------------- | ---------------------------------------------------------------------------------------- |
|
||||
| **Internet-Exposed EC2 with Sensitive S3 Access** | Find SSH-exposed EC2 instances that can assume roles to read tagged sensitive S3 buckets |
|
||||
|
||||
#### Basic Resource Queries
|
||||
|
||||
| Query | Description |
|
||||
|---|---|
|
||||
| **RDS Instances Inventory** | List all provisioned RDS database instances in the account |
|
||||
| **Unencrypted RDS Instances** | Find RDS instances with storage encryption disabled |
|
||||
| **S3 Buckets with Anonymous Access** | Find S3 buckets that allow anonymous access |
|
||||
| **IAM Statements Allowing All Actions** | Find IAM policy statements that allow all actions via wildcard (\*) |
|
||||
| **IAM Statements Allowing Policy Deletion** | Find IAM policy statements that allow iam:DeletePolicy |
|
||||
| **IAM Statements Allowing Create Actions** | Find IAM policy statements that allow any create action |
|
||||
| Query | Description |
|
||||
| ------------------------------------------- | ------------------------------------------------------------------- |
|
||||
| **RDS Instances Inventory** | List all provisioned RDS database instances in the account |
|
||||
| **Unencrypted RDS Instances** | Find RDS instances with storage encryption disabled |
|
||||
| **S3 Buckets with Anonymous Access** | Find S3 buckets that allow anonymous access |
|
||||
| **IAM Statements Allowing All Actions** | Find IAM policy statements that allow all actions via wildcard (\*) |
|
||||
| **IAM Statements Allowing Policy Deletion** | Find IAM policy statements that allow iam:DeletePolicy |
|
||||
| **IAM Statements Allowing Create Actions** | Find IAM policy statements that allow any create action |
|
||||
|
||||
#### Network Exposure Queries
|
||||
|
||||
| Query | Description |
|
||||
|---|---|
|
||||
| **Internet-Exposed EC2 Instances** | Find EC2 instances flagged as exposed to the internet |
|
||||
| Query | Description |
|
||||
| ----------------------------------------------------- | ----------------------------------------------------------------------------------- |
|
||||
| **Internet-Exposed EC2 Instances** | Find EC2 instances flagged as exposed to the internet |
|
||||
| **Open Security Groups on Internet-Facing Resources** | Find internet-facing resources with security groups allowing inbound from 0.0.0.0/0 |
|
||||
| **Internet-Exposed Classic Load Balancers** | Find Classic Load Balancers exposed to the internet with their listeners |
|
||||
| **Internet-Exposed ALB/NLB Load Balancers** | Find ELBv2 (ALB/NLB) load balancers exposed to the internet with their listeners |
|
||||
| **Resource Lookup by Public IP** | Find the AWS resource associated with a given public IP address |
|
||||
| **Internet-Exposed Classic Load Balancers** | Find Classic Load Balancers exposed to the internet with their listeners |
|
||||
| **Internet-Exposed ALB/NLB Load Balancers** | Find ELBv2 (ALB/NLB) load balancers exposed to the internet with their listeners |
|
||||
| **Resource Lookup by Public IP** | Find the AWS resource associated with a given public IP address |
|
||||
|
||||
#### Privilege Escalation Queries
|
||||
|
||||
These queries are based on research from [pathfinding.cloud](https://pathfinding.cloud) by Datadog.
|
||||
|
||||
| Query | Description |
|
||||
|---|---|
|
||||
| **App Runner Service Creation with Privileged Role (APPRUNNER-001)** | Create an App Runner service with a privileged IAM role to gain its permissions |
|
||||
| **App Runner Service Update for Role Access (APPRUNNER-002)** | Update an existing App Runner service to leverage its already-attached privileged role |
|
||||
| **Bedrock Code Interpreter with Privileged Role (BEDROCK-001)** | Create a Bedrock AgentCore Code Interpreter with a privileged role attached |
|
||||
| **Bedrock Code Interpreter Session Hijacking (BEDROCK-002)** | Start a session on an existing Bedrock code interpreter to exfiltrate its privileged role credentials |
|
||||
| **CloudFormation Stack Creation with Privileged Role (CLOUDFORMATION-001)** | Create a CloudFormation stack with a privileged role to provision arbitrary AWS resources |
|
||||
| **CloudFormation Stack Update for Role Access (CLOUDFORMATION-002)** | Update an existing CloudFormation stack to leverage its already-attached privileged service role |
|
||||
| **CloudFormation StackSet Creation with Privileged Role (CLOUDFORMATION-003)** | Create a CloudFormation StackSet with a privileged execution role to provision arbitrary resources across accounts |
|
||||
| **CloudFormation StackSet Update with Privileged Role (CLOUDFORMATION-004)** | Update an existing CloudFormation StackSet to inject malicious resources using a privileged execution role |
|
||||
| **CloudFormation Change Set Privilege Escalation (CLOUDFORMATION-005)** | Create and execute a change set on an existing stack to leverage its privileged service role |
|
||||
| **CodeBuild Project Creation with Privileged Role (CODEBUILD-001)** | Create a CodeBuild project with a privileged role to execute arbitrary code via a malicious buildspec |
|
||||
| **CodeBuild Buildspec Override for Role Access (CODEBUILD-002)** | Start a build on an existing CodeBuild project with a buildspec override to execute code with its privileged role |
|
||||
| **CodeBuild Batch Buildspec Override for Role Access (CODEBUILD-003)** | Start a batch build on an existing CodeBuild project with a buildspec override to execute code with its privileged role |
|
||||
| **CodeBuild Batch Project Creation with Privileged Role (CODEBUILD-004)** | Create a CodeBuild project configured for batch builds with a privileged role to execute arbitrary code via a malicious buildspec |
|
||||
| **Data Pipeline Creation with Privileged Role (DATAPIPELINE-001)** | Create a Data Pipeline with a privileged role to execute arbitrary commands on provisioned infrastructure |
|
||||
| **EC2 Instance Launch with Privileged Role (EC2-001)** | Launch EC2 instances with privileged IAM roles to gain their permissions via IMDS |
|
||||
| **EC2 Role Hijacking via UserData Injection (EC2-002)** | Inject malicious scripts into EC2 instance userData to gain the attached role's permissions |
|
||||
| **Spot Instance Launch with Privileged Role (EC2-003)** | Launch EC2 Spot Instances with privileged IAM roles to gain their permissions via IMDS |
|
||||
| **Launch Template Poisoning for Role Access (EC2-004)** | Inject malicious userData into launch templates that reference privileged roles, no PassRole needed |
|
||||
| **EC2 Instance Connect SSH Access for Role Credentials (EC2INSTANCECONNECT-003)** | Push a temporary SSH key to an EC2 instance via Instance Connect to access its attached role credentials through IMDS |
|
||||
| **ECS Service Creation with Privileged Role (ECS-001 - New Cluster)** | Create an ECS cluster and service with a privileged Fargate task role to execute arbitrary code |
|
||||
| **ECS Task Execution with Privileged Role (ECS-002 - New Cluster)** | Create an ECS cluster and run a one-off Fargate task with a privileged role to execute arbitrary code |
|
||||
| **ECS Service Creation with Privileged Role (ECS-003 - Existing Cluster)** | Deploy a Fargate service with a privileged role on an existing ECS cluster |
|
||||
| **ECS Task Execution with Privileged Role (ECS-004 - Existing Cluster)** | Run a one-off Fargate task with a privileged role on an existing ECS cluster |
|
||||
| **ECS Task Start with Privileged Role on EC2 (ECS-005 - Existing Cluster)** | Register a task definition with a privileged role and start it on an EC2 container instance to execute arbitrary code |
|
||||
| **ECS Exec Container Hijacking for Role Credentials (ECS-006)** | Shell into a running ECS container via ECS Exec to steal the attached task role's credentials |
|
||||
| **Glue Dev Endpoint with Privileged Role (GLUE-001)** | Create a Glue development endpoint with a privileged role attached to gain its permissions |
|
||||
| **Glue Dev Endpoint SSH Hijacking via Update (GLUE-002)** | Update an existing Glue development endpoint to inject an SSH public key and access its attached role credentials |
|
||||
| **Glue Job Creation with Privileged Role (GLUE-003)** | Create a Glue job with a privileged role and start it to execute arbitrary code with that role's permissions |
|
||||
| **Glue Job Creation with Scheduled Trigger and Privileged Role (GLUE-004)** | Create a Glue job with a privileged role and a scheduled trigger to persistently execute arbitrary code |
|
||||
| **Glue Job Hijacking via Update with Privileged Role (GLUE-005)** | Update an existing Glue job to attach a privileged role and inject malicious code, then start it to gain that role's permissions |
|
||||
| **Glue Job Hijacking with Scheduled Trigger and Privileged Role (GLUE-006)** | Update an existing Glue job to attach a privileged role and inject malicious code, then create a scheduled trigger for persistent automated execution |
|
||||
| **Policy Version Override for Self-Escalation (IAM-001)** | Create a new version of an attached policy with administrative permissions, instantly escalating the principal's own privileges |
|
||||
| **Access Key Creation for Lateral Movement (IAM-002)** | Create access keys for other IAM users to gain their permissions and move laterally across the account |
|
||||
| **Access Key Rotation Attack for Lateral Movement (IAM-003)** | Delete and recreate access keys for other IAM users to bypass the two-key limit and gain their permissions |
|
||||
| **Console Login Profile Creation for Lateral Movement (IAM-004)** | Create console login profiles for other IAM users to access the AWS Console with their permissions |
|
||||
| **Inline Policy Injection for Self-Escalation (IAM-005)** | Attach an inline policy with administrative permissions to your own role, instantly escalating privileges |
|
||||
| **Console Password Override for Lateral Movement (IAM-006)** | Change the console password of other IAM users to log in as them and gain their permissions |
|
||||
| **Inline Policy Injection on User for Self-Escalation (IAM-007)** | Attach an inline policy with administrative permissions to your own IAM user, instantly escalating privileges |
|
||||
| **Managed Policy Attachment on User for Self-Escalation (IAM-008)** | Attach existing managed policies with administrative permissions to your own IAM user, instantly escalating privileges |
|
||||
| **Managed Policy Attachment on Role for Self-Escalation (IAM-009)** | Attach existing managed policies with administrative permissions to your own IAM role, instantly escalating privileges |
|
||||
| **Managed Policy Attachment on Group for Self-Escalation (IAM-010)** | Attach existing managed policies with administrative permissions to a group you belong to, escalating privileges for all group members |
|
||||
| **Inline Policy Injection on Group for Self-Escalation (IAM-011)** | Attach an inline policy with administrative permissions to a group you belong to, escalating privileges for all group members |
|
||||
| **Trust Policy Hijacking for Role Assumption (IAM-012)** | Modify a role's trust policy to allow yourself to assume it, gaining the role's permissions |
|
||||
| **Group Membership Hijacking for Privilege Escalation (IAM-013)** | Add yourself to a privileged IAM group to inherit its permissions, gaining access to all policies attached to the group |
|
||||
| **Managed Policy Attachment with Role Assumption for Lateral Movement (IAM-014)** | Attach administrative managed policies to another role you can assume, then assume it to gain elevated privileges |
|
||||
| **Managed Policy Attachment with Access Key Creation for Lateral Movement (IAM-015)** | Attach administrative managed policies to another IAM user and create access keys for them to gain programmatic access with elevated privileges |
|
||||
| **Policy Version Override with Role Assumption for Lateral Movement (IAM-016)** | Create a new version of a customer-managed policy attached to another role with administrative permissions, then assume that role to gain elevated access |
|
||||
| **Inline Policy Injection with Role Assumption for Lateral Movement (IAM-017)** | Attach an inline policy with administrative permissions to another role you can assume, then assume it to gain elevated privileges |
|
||||
| **Inline Policy Injection with Access Key Creation for Lateral Movement (IAM-018)** | Attach an inline policy with administrative permissions to another IAM user and create access keys for them to gain programmatic access with elevated privileges |
|
||||
| **Managed Policy Attachment with Trust Policy Hijacking for Privilege Escalation (IAM-019)** | Attach administrative managed policies to a role and modify its trust policy to allow yourself to assume it, gaining elevated privileges without prior assume-role access |
|
||||
| **Policy Version Override with Trust Policy Hijacking for Privilege Escalation (IAM-020)** | Create a new version of a customer-managed policy attached to a role with administrative permissions and modify its trust policy to assume it, without prior assume-role access |
|
||||
| **Inline Policy Injection with Trust Policy Hijacking for Privilege Escalation (IAM-021)** | Add an inline policy with administrative permissions to a role and modify its trust policy to allow yourself to assume it, gaining elevated privileges without prior assume-role access |
|
||||
| **Lambda Function Creation with Privileged Role (LAMBDA-001)** | Create a Lambda function with a privileged IAM role and invoke it to execute code with that role's permissions |
|
||||
| **Lambda Function Creation with Event Source Trigger (LAMBDA-002)** | Create a Lambda function with a privileged IAM role and an event source mapping to trigger it automatically, executing code with the role's permissions |
|
||||
| **Lambda Function Code Injection (LAMBDA-003)** | Modify the code of an existing Lambda function to execute arbitrary commands with the function's execution role permissions |
|
||||
| **Lambda Function Code Injection with Direct Invocation (LAMBDA-004)** | Modify the code of an existing Lambda function and invoke it directly to execute arbitrary commands with the function's execution role permissions |
|
||||
| **Lambda Function Code Injection with Resource Policy Grant (LAMBDA-005)** | Modify the code of an existing Lambda function and grant yourself invocation permission via its resource-based policy to execute code with the function's execution role |
|
||||
| **Lambda Function Creation with Resource Policy Invocation (LAMBDA-006)** | Create a Lambda function with a privileged IAM role and grant yourself invocation permission via its resource-based policy to execute code with the role's permissions |
|
||||
| **SageMaker Notebook Creation with Privileged Role (SAGEMAKER-001)** | Create a SageMaker notebook instance with a privileged IAM role to execute arbitrary code with the role's permissions via the Jupyter environment |
|
||||
| **SageMaker Training Job Creation with Privileged Role (SAGEMAKER-002)** | Create a SageMaker training job with a privileged IAM role to execute arbitrary container code with the role's permissions |
|
||||
| **SageMaker Processing Job Creation with Privileged Role (SAGEMAKER-003)** | Create a SageMaker processing job with a privileged IAM role to execute arbitrary container code with the role's permissions |
|
||||
| **SageMaker Presigned Notebook URL for Privilege Escalation (SAGEMAKER-004)** | Generate a presigned URL to access an existing SageMaker notebook instance and execute code with its execution role's permissions |
|
||||
| **SageMaker Notebook Lifecycle Config Injection (SAGEMAKER-005)** | Inject a malicious lifecycle configuration into an existing SageMaker notebook to execute code with the notebook's execution role during startup |
|
||||
| **SSM Session Access for EC2 Role Credentials (SSM-001)** | Start an SSM session on an EC2 instance to access its attached role credentials through IMDS |
|
||||
| **SSM Send Command for EC2 Role Credentials (SSM-002)** | Execute commands on an EC2 instance via SSM Run Command to access its attached role credentials through IMDS |
|
||||
| **Role Assumption for Privilege Escalation (STS-001)** | Assume IAM roles with elevated permissions by exploiting bidirectional trust between the starting principal and the target role |
|
||||
| Query | Description |
|
||||
| -------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| **App Runner Service Creation with Privileged Role (APPRUNNER-001)** | Create an App Runner service with a privileged IAM role to gain its permissions |
|
||||
| **App Runner Service Update for Role Access (APPRUNNER-002)** | Update an existing App Runner service to leverage its already-attached privileged role |
|
||||
| **Bedrock Code Interpreter with Privileged Role (BEDROCK-001)** | Create a Bedrock AgentCore Code Interpreter with a privileged role attached |
|
||||
| **Bedrock Code Interpreter Session Hijacking (BEDROCK-002)** | Start a session on an existing Bedrock code interpreter to exfiltrate its privileged role credentials |
|
||||
| **CloudFormation Stack Creation with Privileged Role (CLOUDFORMATION-001)** | Create a CloudFormation stack with a privileged role to provision arbitrary AWS resources |
|
||||
| **CloudFormation Stack Update for Role Access (CLOUDFORMATION-002)** | Update an existing CloudFormation stack to leverage its already-attached privileged service role |
|
||||
| **CloudFormation StackSet Creation with Privileged Role (CLOUDFORMATION-003)** | Create a CloudFormation StackSet with a privileged execution role to provision arbitrary resources across accounts |
|
||||
| **CloudFormation StackSet Update with Privileged Role (CLOUDFORMATION-004)** | Update an existing CloudFormation StackSet to inject malicious resources using a privileged execution role |
|
||||
| **CloudFormation Change Set Privilege Escalation (CLOUDFORMATION-005)** | Create and execute a change set on an existing stack to leverage its privileged service role |
|
||||
| **CodeBuild Project Creation with Privileged Role (CODEBUILD-001)** | Create a CodeBuild project with a privileged role to execute arbitrary code via a malicious buildspec |
|
||||
| **CodeBuild Buildspec Override for Role Access (CODEBUILD-002)** | Start a build on an existing CodeBuild project with a buildspec override to execute code with its privileged role |
|
||||
| **CodeBuild Batch Buildspec Override for Role Access (CODEBUILD-003)** | Start a batch build on an existing CodeBuild project with a buildspec override to execute code with its privileged role |
|
||||
| **CodeBuild Batch Project Creation with Privileged Role (CODEBUILD-004)** | Create a CodeBuild project configured for batch builds with a privileged role to execute arbitrary code via a malicious buildspec |
|
||||
| **Data Pipeline Creation with Privileged Role (DATAPIPELINE-001)** | Create a Data Pipeline with a privileged role to execute arbitrary commands on provisioned infrastructure |
|
||||
| **EC2 Instance Launch with Privileged Role (EC2-001)** | Launch EC2 instances with privileged IAM roles to gain their permissions via IMDS |
|
||||
| **EC2 Role Hijacking via UserData Injection (EC2-002)** | Inject malicious scripts into EC2 instance userData to gain the attached role's permissions |
|
||||
| **Spot Instance Launch with Privileged Role (EC2-003)** | Launch EC2 Spot Instances with privileged IAM roles to gain their permissions via IMDS |
|
||||
| **Launch Template Poisoning for Role Access (EC2-004)** | Inject malicious userData into launch templates that reference privileged roles, no PassRole needed |
|
||||
| **EC2 Instance Connect SSH Access for Role Credentials (EC2INSTANCECONNECT-003)** | Push a temporary SSH key to an EC2 instance via Instance Connect to access its attached role credentials through IMDS |
|
||||
| **ECS Service Creation with Privileged Role (ECS-001 - New Cluster)** | Create an ECS cluster and service with a privileged Fargate task role to execute arbitrary code |
|
||||
| **ECS Task Execution with Privileged Role (ECS-002 - New Cluster)** | Create an ECS cluster and run a one-off Fargate task with a privileged role to execute arbitrary code |
|
||||
| **ECS Service Creation with Privileged Role (ECS-003 - Existing Cluster)** | Deploy a Fargate service with a privileged role on an existing ECS cluster |
|
||||
| **ECS Task Execution with Privileged Role (ECS-004 - Existing Cluster)** | Run a one-off Fargate task with a privileged role on an existing ECS cluster |
|
||||
| **ECS Task Start with Privileged Role on EC2 (ECS-005 - Existing Cluster)** | Register a task definition with a privileged role and start it on an EC2 container instance to execute arbitrary code |
|
||||
| **ECS Exec Container Hijacking for Role Credentials (ECS-006)** | Shell into a running ECS container via ECS Exec to steal the attached task role's credentials |
|
||||
| **Glue Dev Endpoint with Privileged Role (GLUE-001)** | Create a Glue development endpoint with a privileged role attached to gain its permissions |
|
||||
| **Glue Dev Endpoint SSH Hijacking via Update (GLUE-002)** | Update an existing Glue development endpoint to inject an SSH public key and access its attached role credentials |
|
||||
| **Glue Job Creation with Privileged Role (GLUE-003)** | Create a Glue job with a privileged role and start it to execute arbitrary code with that role's permissions |
|
||||
| **Glue Job Creation with Scheduled Trigger and Privileged Role (GLUE-004)** | Create a Glue job with a privileged role and a scheduled trigger to persistently execute arbitrary code |
|
||||
| **Glue Job Hijacking via Update with Privileged Role (GLUE-005)** | Update an existing Glue job to attach a privileged role and inject malicious code, then start it to gain that role's permissions |
|
||||
| **Glue Job Hijacking with Scheduled Trigger and Privileged Role (GLUE-006)** | Update an existing Glue job to attach a privileged role and inject malicious code, then create a scheduled trigger for persistent automated execution |
|
||||
| **Policy Version Override for Self-Escalation (IAM-001)** | Create a new version of an attached policy with administrative permissions, instantly escalating the principal's own privileges |
|
||||
| **Access Key Creation for Lateral Movement (IAM-002)** | Create access keys for other IAM users to gain their permissions and move laterally across the account |
|
||||
| **Access Key Rotation Attack for Lateral Movement (IAM-003)** | Delete and recreate access keys for other IAM users to bypass the two-key limit and gain their permissions |
|
||||
| **Console Login Profile Creation for Lateral Movement (IAM-004)** | Create console login profiles for other IAM users to access the AWS Console with their permissions |
|
||||
| **Inline Policy Injection for Self-Escalation (IAM-005)** | Attach an inline policy with administrative permissions to your own role, instantly escalating privileges |
|
||||
| **Console Password Override for Lateral Movement (IAM-006)** | Change the console password of other IAM users to log in as them and gain their permissions |
|
||||
| **Inline Policy Injection on User for Self-Escalation (IAM-007)** | Attach an inline policy with administrative permissions to your own IAM user, instantly escalating privileges |
|
||||
| **Managed Policy Attachment on User for Self-Escalation (IAM-008)** | Attach existing managed policies with administrative permissions to your own IAM user, instantly escalating privileges |
|
||||
| **Managed Policy Attachment on Role for Self-Escalation (IAM-009)** | Attach existing managed policies with administrative permissions to your own IAM role, instantly escalating privileges |
|
||||
| **Managed Policy Attachment on Group for Self-Escalation (IAM-010)** | Attach existing managed policies with administrative permissions to a group you belong to, escalating privileges for all group members |
|
||||
| **Inline Policy Injection on Group for Self-Escalation (IAM-011)** | Attach an inline policy with administrative permissions to a group you belong to, escalating privileges for all group members |
|
||||
| **Trust Policy Hijacking for Role Assumption (IAM-012)** | Modify a role's trust policy to allow yourself to assume it, gaining the role's permissions |
|
||||
| **Group Membership Hijacking for Privilege Escalation (IAM-013)** | Add yourself to a privileged IAM group to inherit its permissions, gaining access to all policies attached to the group |
|
||||
| **Managed Policy Attachment with Role Assumption for Lateral Movement (IAM-014)** | Attach administrative managed policies to another role you can assume, then assume it to gain elevated privileges |
|
||||
| **Managed Policy Attachment with Access Key Creation for Lateral Movement (IAM-015)** | Attach administrative managed policies to another IAM user and create access keys for them to gain programmatic access with elevated privileges |
|
||||
| **Policy Version Override with Role Assumption for Lateral Movement (IAM-016)** | Create a new version of a customer-managed policy attached to another role with administrative permissions, then assume that role to gain elevated access |
|
||||
| **Inline Policy Injection with Role Assumption for Lateral Movement (IAM-017)** | Attach an inline policy with administrative permissions to another role you can assume, then assume it to gain elevated privileges |
|
||||
| **Inline Policy Injection with Access Key Creation for Lateral Movement (IAM-018)** | Attach an inline policy with administrative permissions to another IAM user and create access keys for them to gain programmatic access with elevated privileges |
|
||||
| **Managed Policy Attachment with Trust Policy Hijacking for Privilege Escalation (IAM-019)** | Attach administrative managed policies to a role and modify its trust policy to allow yourself to assume it, gaining elevated privileges without prior assume-role access |
|
||||
| **Policy Version Override with Trust Policy Hijacking for Privilege Escalation (IAM-020)** | Create a new version of a customer-managed policy attached to a role with administrative permissions and modify its trust policy to assume it, without prior assume-role access |
|
||||
| **Inline Policy Injection with Trust Policy Hijacking for Privilege Escalation (IAM-021)** | Add an inline policy with administrative permissions to a role and modify its trust policy to allow yourself to assume it, gaining elevated privileges without prior assume-role access |
|
||||
| **Lambda Function Creation with Privileged Role (LAMBDA-001)** | Create a Lambda function with a privileged IAM role and invoke it to execute code with that role's permissions |
|
||||
| **Lambda Function Creation with Event Source Trigger (LAMBDA-002)** | Create a Lambda function with a privileged IAM role and an event source mapping to trigger it automatically, executing code with the role's permissions |
|
||||
| **Lambda Function Code Injection (LAMBDA-003)** | Modify the code of an existing Lambda function to execute arbitrary commands with the function's execution role permissions |
|
||||
| **Lambda Function Code Injection with Direct Invocation (LAMBDA-004)** | Modify the code of an existing Lambda function and invoke it directly to execute arbitrary commands with the function's execution role permissions |
|
||||
| **Lambda Function Code Injection with Resource Policy Grant (LAMBDA-005)** | Modify the code of an existing Lambda function and grant yourself invocation permission via its resource-based policy to execute code with the function's execution role |
|
||||
| **Lambda Function Creation with Resource Policy Invocation (LAMBDA-006)** | Create a Lambda function with a privileged IAM role and grant yourself invocation permission via its resource-based policy to execute code with the role's permissions |
|
||||
| **SageMaker Notebook Creation with Privileged Role (SAGEMAKER-001)** | Create a SageMaker notebook instance with a privileged IAM role to execute arbitrary code with the role's permissions via the Jupyter environment |
|
||||
| **SageMaker Training Job Creation with Privileged Role (SAGEMAKER-002)** | Create a SageMaker training job with a privileged IAM role to execute arbitrary container code with the role's permissions |
|
||||
| **SageMaker Processing Job Creation with Privileged Role (SAGEMAKER-003)** | Create a SageMaker processing job with a privileged IAM role to execute arbitrary container code with the role's permissions |
|
||||
| **SageMaker Presigned Notebook URL for Privilege Escalation (SAGEMAKER-004)** | Generate a presigned URL to access an existing SageMaker notebook instance and execute code with its execution role's permissions |
|
||||
| **SageMaker Notebook Lifecycle Config Injection (SAGEMAKER-005)** | Inject a malicious lifecycle configuration into an existing SageMaker notebook to execute code with the notebook's execution role during startup |
|
||||
| **SSM Session Access for EC2 Role Credentials (SSM-001)** | Start an SSM session on an EC2 instance to access its attached role credentials through IMDS |
|
||||
| **SSM Send Command for EC2 Role Credentials (SSM-002)** | Execute commands on an EC2 instance via SSM Run Command to access its attached role credentials through IMDS |
|
||||
| **Role Assumption for Privilege Escalation (STS-001)** | Assume IAM roles with elevated permissions by exploiting bidirectional trust between the starting principal and the target role |
|
||||
|
||||
These tools enable workflows such as:
|
||||
|
||||
- Asking an AI assistant to identify privilege escalation paths in a specific AWS account
|
||||
- Automating attack path analysis across multiple scans
|
||||
- Combining attack path data with findings and compliance information for comprehensive security reports
|
||||
|
||||
@@ -2,13 +2,14 @@
|
||||
name: prowler-attack-paths-query
|
||||
description: >
|
||||
Creates Prowler Attack Paths openCypher queries using the Cartography schema as the source of truth
|
||||
for node labels, properties, and relationships. Also covers Prowler-specific additions (Internet node,
|
||||
ProwlerFinding, internal isolation labels) and $provider_uid scoping for predefined queries.
|
||||
for node labels, properties, and relationships. Covers Prowler-specific additions (Internet node,
|
||||
ProwlerFinding, internal isolation labels), $provider_uid scoping, and list-property item nodes
|
||||
with typed `HAS_*` edges that run efficiently on both Neo4j and Amazon Neptune sinks.
|
||||
Trigger: When creating or updating Attack Paths queries.
|
||||
license: Apache-2.0
|
||||
metadata:
|
||||
author: prowler-cloud
|
||||
version: "2.0"
|
||||
version: "3.0"
|
||||
scope: [root, api]
|
||||
auto_invoke:
|
||||
- "Creating Attack Paths queries"
|
||||
@@ -19,36 +20,30 @@ allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, Task
|
||||
|
||||
## Overview
|
||||
|
||||
Attack Paths queries are openCypher queries that analyze cloud infrastructure graphs (ingested via Cartography) to detect security risks like privilege escalation paths, network exposure, and misconfigurations.
|
||||
|
||||
Queries are written in **openCypher Version 9** for compatibility with both Neo4j and Amazon Neptune.
|
||||
Attack Paths queries are read-only openCypher queries over a Cartography-ingested cloud graph that detect privilege escalation chains, network exposure, and other graph-shaped security risks. Queries are written in openCypher Version 9 so they run on both Neo4j and Amazon Neptune sinks.
|
||||
|
||||
---
|
||||
|
||||
## Two query audiences
|
||||
|
||||
This skill covers two types of queries with different isolation mechanisms:
|
||||
| | Predefined queries | Custom queries |
|
||||
| ------------------ | ----------------------------------------------------------- | --------------------------------------------------------------------- |
|
||||
| Where they live | `api/src/backend/api/attack_paths/queries/{provider}.py` | User-supplied via the custom query API endpoint |
|
||||
| Provider isolation | `AWSAccount {id: $provider_uid}` anchor + path connectivity | Automatic `_Provider_{uuid}` label injection by `cypher_sanitizer.py` |
|
||||
| What to write | Chain every MATCH from the `aws` variable | Plain Cypher, no isolation boilerplate |
|
||||
| Internal labels | Never use | Never use (system-injected) |
|
||||
|
||||
| | Predefined queries | Custom queries |
|
||||
|---|---|---|
|
||||
| **Where they live** | `api/src/backend/api/attack_paths/queries/{provider}.py` | User/LLM-supplied via the custom query API endpoint |
|
||||
| **Provider isolation** | `AWSAccount {id: $provider_uid}` anchor + path connectivity | Automatic `_Provider_{uuid}` label injection via `cypher_sanitizer.py` |
|
||||
| **What to write** | Chain every MATCH from the `aws` variable | Plain Cypher, no isolation boilerplate needed |
|
||||
| **Internal labels** | Never use (`_ProviderResource`, `_Tenant_*`, `_Provider_*`) | Never use (injected automatically by the system) |
|
||||
**Predefined queries**: every node must be reachable from the `AWSAccount` root via graph traversal. That is the isolation boundary.
|
||||
|
||||
**For predefined queries**: every node must be reachable from the `AWSAccount` root via graph traversal. This is the isolation boundary.
|
||||
|
||||
**For custom queries**: write natural Cypher without isolation concerns. The query runner injects a `_Provider_{uuid}` label into every node pattern before execution, and a post-query filter catches edge cases.
|
||||
**Custom queries**: write natural Cypher. The runner injects a `_Provider_{uuid}` label into every node pattern, and a post-query filter handles edge cases.
|
||||
|
||||
---
|
||||
|
||||
## Input Sources
|
||||
## Input sources
|
||||
|
||||
Queries can be created from:
|
||||
Two sources for new queries:
|
||||
|
||||
1. **pathfinding.cloud ID** (e.g., `ECS-001`, `GLUE-001`)
|
||||
- Reference: https://github.com/DataDog/pathfinding.cloud
|
||||
- The aggregated `paths.json` is too large for WebFetch. Use Bash:
|
||||
1. **pathfinding.cloud ID** (e.g. `ECS-001`, `GLUE-001`), the Datadog research catalogue. The aggregated `paths.json` is too large for WebFetch:
|
||||
|
||||
```bash
|
||||
# Fetch a single path by ID
|
||||
@@ -64,28 +59,24 @@ Queries can be created from:
|
||||
| jq -r '.[] | select(.id | startswith("ecs")) | "\(.id): \(.name)"'
|
||||
```
|
||||
|
||||
If `jq` is not available, use `python3 -c "import json,sys; ..."` as a fallback.
|
||||
If `jq` is unavailable, use `python3 -c "import json,sys; ..."`.
|
||||
|
||||
2. **Natural language description** from the user
|
||||
2. **Natural language description** from the requester.
|
||||
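The `python3` fallback from item 1 might look like the sketch below. It assumes `paths.json` is a JSON array of objects with `id` and `name` fields, which is what the `jq` filter above implies. The function name `list_paths` and the sample records are hypothetical.

```python
import json


def list_paths(raw_json: str, prefix: str) -> list[str]:
    """Return "id: name" lines for every path whose id starts with prefix."""
    paths = json.loads(raw_json)
    return [
        f"{p['id']}: {p['name']}"
        for p in paths
        if p.get("id", "").startswith(prefix)
    ]


# Hypothetical sample shaped like the records the jq filter expects.
SAMPLE = json.dumps(
    [
        {"id": "ecs-001", "name": "Example ECS path"},
        {"id": "glue-001", "name": "Example Glue path"},
    ]
)

print("\n".join(list_paths(SAMPLE, "ecs")))
```

In practice the JSON text would come from the downloaded file rather than from `SAMPLE`.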
|
||||
---
|
||||
|
||||
## Query Structure
|
||||
## Query structure
|
||||
|
||||
### Provider scoping parameter
|
||||
|
||||
One parameter is injected automatically by the query runner:
|
||||
| Parameter | Property | Used on | Purpose |
|
||||
| --------------- | -------- | ------------ | -------------------------------------- |
|
||||
| `$provider_uid` | `id` | `AWSAccount` | Scopes the query to a specific account |
|
||||
|
||||
| Parameter | Property it matches | Used on | Purpose |
|
||||
| --------------- | ------------------- | ------------ | -------------------------------- |
|
||||
| `$provider_uid` | `id` | `AWSAccount` | Scopes to a specific AWS account |
|
||||
|
||||
All other nodes are isolated by path connectivity from the `AWSAccount` anchor.
|
||||
The runner binds `$provider_uid` automatically. Every other node is isolated by path connectivity from the `AWSAccount` anchor.
|
||||
|
||||
### Imports
|
||||
|
||||
All query files start with these imports:
|
||||
|
||||
```python
|
||||
from api.attack_paths.queries.types import (
|
||||
AttackPathsQueryAttribution,
|
||||
@@ -95,29 +86,33 @@ from api.attack_paths.queries.types import (
|
||||
from tasks.jobs.attack_paths.config import PROWLER_FINDING_LABEL
|
||||
```
|
||||
|
||||
The `PROWLER_FINDING_LABEL` constant (value: `"ProwlerFinding"`) is used via f-string interpolation in all queries. Never hardcode the label string.
|
||||
Always use `PROWLER_FINDING_LABEL` via f-string interpolation, never hardcode `"ProwlerFinding"`.
|
||||
|
||||
### Privilege escalation sub-patterns
|
||||
### Definition fields
|
||||
|
||||
There are four distinct privilege escalation patterns. Choose based on the attack type:
|
||||
- **id**: kebab-case `{provider}-{description}`, e.g. `aws-ec2-privesc-passrole-iam`.
|
||||
- **name**: short, human-friendly label. Sourced queries append the reference ID: `"EC2 Instance Launch with Privileged Role (EC2-001)"`.
|
||||
- **short_description**: one sentence, no technical permissions.
|
||||
- **description**: full technical explanation, plain text.
|
||||
- **provider**: `aws`, `azure`, `gcp`, `kubernetes`, or `github`.
|
||||
- **cypher**: f-string Cypher body. Literal `{` / `}` are escaped as `{{` / `}}`.
|
||||
- **parameters**: `parameters=[]` if none.
|
||||
- **attribution**: optional `AttackPathsQueryAttribution(text, link)` for sourced queries. `link` uses the lowercase ID.
|
||||
|
||||
| Sub-pattern | Target | `path_target` shape | Example |
|
||||
|---|---|---|---|
|
||||
| Self-escalation | Principal's own policies | `(aws)--(target_policy:AWSPolicy)--(principal)` | IAM-001 |
|
||||
| Lateral to user | Other IAM users | `(aws)--(target_user:AWSUser)` | IAM-002 |
|
||||
| Assume-role lateral | Assumable roles | `(aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)` | IAM-014 |
|
||||
| PassRole + service | Service-trusting roles | `(aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(...)` | EC2-001 |
|
||||
Append the constant to the `{PROVIDER}_QUERIES` list at the bottom of the provider file.
|
||||
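The brace-escaping rule for the `cypher` field can be checked in isolation. The sketch below uses a local stand-in for `PROWLER_FINDING_LABEL` (the real constant is imported from `tasks.jobs.attack_paths.config`) and a shortened, illustrative query body rather than a full definition.

```python
# Stand-in for the constant normally imported from tasks.jobs.attack_paths.config.
PROWLER_FINDING_LABEL = "ProwlerFinding"

# Single braces interpolate Python values; doubled braces emit literal Cypher braces.
cypher = f"""
MATCH (aws:AWSAccount {{id: $provider_uid}})--(n)
OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
RETURN n, pf, pfr
"""

print(cypher)
```

The rendered string contains `{id: $provider_uid}` and `{status: 'FAIL'}` with single braces, and the label appears wherever the constant was interpolated.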
|
||||
#### Self-escalation (e.g., IAM-001)
|
||||
---
|
||||
|
||||
The principal modifies resources attached to itself. `path_target` loops back to `principal`:
|
||||
## Predefined query template
|
||||
|
||||
The canonical shape combines a principal walk, an optional target walk, deduplicated nodes, and a typed finding overlay:
|
||||
|
||||
```python
|
||||
AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
|
||||
id="aws-{kebab-case-name}",
|
||||
name="{Human-friendly label} ({REFERENCE_ID})",
|
||||
short_description="{Brief explanation, no technical permissions.}",
|
||||
description="{Detailed description of the attack vector and impact.}",
|
||||
name="{Label} ({REFERENCE_ID})",
|
||||
short_description="{One sentence.}",
|
||||
description="{Full technical explanation.}",
|
||||
attribution=AttackPathsQueryAttribution(
|
||||
text="pathfinding.cloud - {REFERENCE_ID} - {permission}",
|
||||
link="https://pathfinding.cloud/paths/{reference_id_lowercase}",
|
||||
@@ -125,29 +120,27 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
|
||||
provider="aws",
|
||||
cypher=f"""
|
||||
// Find principals with {permission}
|
||||
MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
|
||||
WHERE stmt.effect = 'Allow'
|
||||
AND any(action IN stmt.action WHERE
|
||||
toLower(action) = '{permission_lowercase}'
|
||||
OR toLower(action) = '{service}:*'
|
||||
OR action = '*'
|
||||
)
|
||||
MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)-[:POLICY]->(policy:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {{effect: 'Allow'}})
|
||||
MATCH (stmt)-[:HAS_ACTION]->(act:AWSPolicyStatementActionItem)
|
||||
WHERE toLower(act.value) IN ['{permission_lowercase}', '{service}:*']
|
||||
OR act.value = '*'
|
||||
WITH DISTINCT aws, principal, stmt, path_principal
|
||||
|
||||
// Find target resources attached to the same principal
|
||||
// Target resources attached to the same principal (sub-patterns below)
|
||||
MATCH path_target = (aws)--(target_policy:AWSPolicy)--(principal)
|
||||
WHERE target_policy.arn CONTAINS $provider_uid
|
||||
AND any(resource IN stmt.resource WHERE
|
||||
resource = '*'
|
||||
OR target_policy.arn CONTAINS resource
|
||||
)
|
||||
MATCH (stmt)-[:HAS_RESOURCE]->(res:AWSPolicyStatementResourceItem)
|
||||
WHERE res.value = '*'
|
||||
OR target_policy.arn CONTAINS res.value
|
||||
|
||||
WITH DISTINCT path_principal, path_target
|
||||
WITH collect(path_principal) + collect(path_target) AS paths
|
||||
UNWIND paths AS p
|
||||
UNWIND nodes(p) AS n
|
||||
|
||||
WITH paths, collect(DISTINCT n) AS unique_nodes
|
||||
UNWIND unique_nodes AS n
|
||||
OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
|
||||
OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
|
||||
|
||||
RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
|
||||
""",
|
||||
@@ -155,158 +148,145 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
|
||||
)
|
||||
```
|
||||
|
||||
#### Other sub-pattern `path_target` shapes
|
||||
Key points:
|
||||
|
||||
The other 3 sub-patterns share the same `path_principal`, deduplication tail, and RETURN as self-escalation. Only the `path_target` MATCH differs:
|
||||
- The principal walk types the `POLICY` and `STATEMENT` hops. Both are low-fan-out (each principal has a handful of policies; each policy a handful of statements), so the typed edge lets the planner cost a cheap inline filter.
|
||||
- The `(aws)--` hub hops stay anonymous. `AWSAccount` is a high-degree node that fans out to every principal, role, policy, and resource in the account; typing those edges forces the planner to enumerate from the hub and collapses performance on multi-tenant Neptune.
|
||||
- Other relationship types appear only where the file's existing queries already use one (`TRUSTS_AWS_PRINCIPAL`, `STS_ASSUMEROLE_ALLOW`, `MEMBER_AWS_GROUP`, `HAS_EXECUTION_ROLE`).
|
||||
- The finding probe is typed `:HAS_FINDING` and left undirected. The type lets Neptune apply an inline edge filter; the lack of direction matches the convention of the rest of the file.
|
||||
- Collapse duplicate rows after each permission gate with `WITH DISTINCT`, carrying only the variables needed by later clauses.
|
||||
- Each `HAS_*` traversal is its own `MATCH` clause with a `WHERE` on the child item node. `WITH DISTINCT path_principal, path_target` precedes `collect(path...)` to dedupe the row multiplication produced by the joins.
|
||||
- The `RETURN` shape `paths, dpf, dpfr` is the contract the serializer and visualiser depend on. Do not change it.
|
||||
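The row multiplication that the two `WITH DISTINCT` steps undo can be simulated outside the database. This is a plain-Python sketch with hypothetical values (one principal, one statement, two matching action items, two matching resource items). It models only the row counts, not the graph engine.

```python
# Hypothetical matches for a single principal and a single statement.
matching_actions = ["iam:createpolicyversion", "iam:*"]
matching_resources = ["*", "arn:aws:iam::123456789012:policy/example"]

# After the HAS_ACTION MATCH: one row per matching action item.
rows = [("path_principal", "stmt", act) for act in matching_actions]

# WITH DISTINCT aws, principal, stmt, path_principal drops the action column
# and collapses the duplicates it leaves behind.
after_gate = sorted({(path, stmt) for path, stmt, _ in rows})

# After the HAS_RESOURCE MATCH: one row per matching resource item.
joined = [
    (path, "path_target", res)
    for path, _ in after_gate
    for res in matching_resources
]

# WITH DISTINCT path_principal, path_target collapses them again before collect().
deduped = sorted({(path, target) for path, target, _ in joined})

print(len(rows), len(after_gate), len(joined), len(deduped))
```

Without the final `DISTINCT`, `collect(path_principal)` would receive one copy of the same path per matching resource item.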
|
||||
---
|
||||
|
||||
## Privilege escalation sub-patterns
|
||||
|
||||
Four `path_target` shapes cover the common attack types. Each shares the canonical template's `path_principal`, deduplication tail, and `RETURN`; only the `path_target` MATCH and its resource predicate differ.
|
||||
|
||||
| Sub-pattern | Target | `path_target` shape | Example |
|
||||
| ------------------- | ------------------------ | ------------------------------------------------------------------------------------------------------- | ------- |
|
||||
| Self-escalation | Principal's own policies | `(aws)--(target_policy:AWSPolicy)--(principal)` | IAM-001 |
|
||||
| Lateral to user | Other IAM users | `(aws)--(target_user:AWSUser)` | IAM-002 |
|
||||
| Assume-role lateral | Assumable roles | `(aws)--(target_role:AWSRole)-[:STS_ASSUMEROLE_ALLOW]-(principal)` | IAM-014 |
|
||||
| PassRole + service | Service-trusting roles | `(aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]-(:AWSPrincipal {arn: '{service}.amazonaws.com'})` | EC2-001 |
|
||||
|
||||
**Multi-permission queries** (e.g. PassRole plus a service-create action) add permission gates before `path_target`. Reuse the per-query counter for new variables (`act2`, `policy2`, `stmt2`) and collapse rows after each gate:
|
||||
|
||||
```cypher
|
||||
// Lateral to user (e.g., IAM-002) - targets other IAM users
|
||||
MATCH path_target = (aws)--(target_user:AWSUser)
|
||||
WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_user.arn CONTAINS resource OR resource CONTAINS target_user.name)
|
||||
|
||||
// Assume-role lateral (e.g., IAM-014) - targets roles the principal can assume
|
||||
MATCH path_target = (aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)
|
||||
WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
|
||||
|
||||
// PassRole + service (e.g., EC2-001) - targets roles trusting a service
|
||||
MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: '{service}.amazonaws.com'})
|
||||
WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
|
||||
MATCH (principal)-[:POLICY]->(policy2:AWSPolicy)-[:STATEMENT]->(stmt2:AWSPolicyStatement {effect: 'Allow'})
|
||||
MATCH (stmt2)-[:HAS_ACTION]->(act2:AWSPolicyStatementActionItem)
|
||||
WHERE toLower(act2.value) IN ['service:*', 'service:createsomething']
|
||||
OR act2.value = '*'
|
||||
WITH DISTINCT aws, principal, stmt, stmt2, path_principal
|
||||
```
|
||||
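The action predicate used in each permission gate can be mirrored in Python to sanity-check which statement actions pass. The helper name `action_matches` is hypothetical, and the permission strings are the same placeholders used in the snippet above.

```python
def action_matches(value: str, allowed_lowercase: list[str]) -> bool:
    """Mirror `toLower(act.value) IN [...] OR act.value = '*'`."""
    return value.lower() in allowed_lowercase or value == "*"


ALLOWED = ["service:*", "service:createsomething"]

for candidate in ["Service:CreateSomething", "service:*", "*", "service:other"]:
    print(candidate, action_matches(candidate, ALLOWED))
```

The allowed list is written in lowercase because only the statement side is lowercased at match time.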
|
||||
**Multi-permission**: PassRole queries require a second permission. Add `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` with its own WHERE before `path_target`, then check BOTH `stmt.resource` AND `stmt2.resource` against the target. See IAM-015 or EC2-001 in `aws.py` for examples.
|
||||
If a permission is an existence-only gate whose statement resource is not checked later, keep the policy and statement anonymous and carry only the variables still needed:
|
||||
|
||||
### Network exposure pattern
|
||||
```cypher
|
||||
MATCH (principal)-[:POLICY]->(:AWSPolicy)-[:STATEMENT]->(:AWSPolicyStatement {effect: 'Allow'})-[:HAS_ACTION]->(act3:AWSPolicyStatementActionItem)
|
||||
WHERE toLower(act3.value) IN ['service:*', 'service:othersomething']
|
||||
OR act3.value = '*'
|
||||
WITH DISTINCT aws, principal, stmt, path_principal
|
||||
```
|
||||
|
||||
The Internet node is reached via `CAN_ACCESS` through the already-scoped resource, not via a standalone lookup:
|
||||
When all matching principals can target the same independent resource set, collect principal paths before expanding targets instead of creating one row per principal-target pair:
|
||||
|
||||
```cypher
|
||||
WITH aws, collect(DISTINCT path_principal) AS principal_paths
|
||||
MATCH path_target = (aws)--(target)
|
||||
WITH principal_paths + collect(DISTINCT path_target) AS paths
|
||||
```
|
||||
|
||||
Statements that constrain a target are still checked via `HAS_RESOURCE` traversals (`res`, `res2`). See IAM-015 or EC2-001 in `aws.py`.
|
||||
|
||||
---
|
||||
|
||||
## Network exposure pattern
|
||||
|
||||
The Internet node is reached via `CAN_ACCESS` through an already-scoped resource, never as a standalone lookup:
|
||||
|
||||
```python
|
||||
AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
|
||||
id="aws-{kebab-case-name}",
|
||||
name="{Human-friendly label}",
|
||||
short_description="{Brief explanation.}",
|
||||
description="{Detailed description.}",
|
||||
provider="aws",
|
||||
cypher=f"""
|
||||
// Match exposed resources (MUST chain from `aws`)
|
||||
MATCH path = (aws:AWSAccount {{id: $provider_uid}})--(resource:EC2Instance)
|
||||
WHERE resource.exposed_internet = true
|
||||
cypher=f"""
|
||||
// Resource scoped through the account anchor
|
||||
MATCH path = (aws:AWSAccount {{id: $provider_uid}})--(resource:EC2Instance)
|
||||
WHERE resource.exposed_internet = true
|
||||
|
||||
// Internet node reached via path connectivity through the resource
|
||||
OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)
|
||||
// Internet node reached via path connectivity through the resource
|
||||
OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)
|
||||
|
||||
WITH collect(path) AS paths, head(collect(internet)) AS internet, collect(can_access) AS can_access
|
||||
UNWIND paths AS p
|
||||
UNWIND nodes(p) AS n
|
||||
WITH collect(path) AS paths, head(collect(internet)) AS internet, collect(can_access) AS can_access
|
||||
UNWIND paths AS p
|
||||
UNWIND nodes(p) AS n
|
||||
|
||||
WITH paths, internet, can_access, collect(DISTINCT n) AS unique_nodes
|
||||
UNWIND unique_nodes AS n
|
||||
OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
|
||||
WITH paths, internet, can_access, collect(DISTINCT n) AS unique_nodes
|
||||
UNWIND unique_nodes AS n
|
||||
OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
|
||||
|
||||
RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
|
||||
internet, can_access
|
||||
""",
|
||||
parameters=[],
|
||||
)
|
||||
RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
|
||||
internet, can_access
|
||||
"""
|
||||
```
|
||||
|
||||
### Register in query list
|
||||
|
||||
Add to the `{PROVIDER}_QUERIES` list at the bottom of the file:
|
||||
|
||||
```python
|
||||
AWS_QUERIES: list[AttackPathsQueryDefinition] = [
|
||||
# ... existing queries ...
|
||||
AWS_{NEW_QUERY_NAME}, # Add here
|
||||
]
|
||||
```
|
||||
The `CAN_ACCESS` edge stays typed and directed (`-[:CAN_ACCESS]->`); that is its canonical sync-time orientation.
|
||||
|
||||
---
|
||||
|
||||
## Step-by-step creation process
|
||||
## List-typed properties as child nodes
|
||||
|
||||
### 1. Read the queries module
|
||||
Some Cartography node properties carry a list of values: `AWSPolicyStatement.action`, `AWSPolicyStatement.resource`, `KMSKey.encryption_algorithms`, `CloudFrontDistribution.aliases`, and many others. The graph models each such property as a set of child item nodes connected to the parent by a typed edge. Queries reach the values by traversing the edge; the parent does not carry the list as a single field.
|
||||
|
||||
**FIRST**, read all files in the queries module to understand the structure, type definitions, registration, and existing style:
|
||||
### Naming convention
|
||||
|
||||
```text
|
||||
api/src/backend/api/attack_paths/queries/
|
||||
├── __init__.py # Module exports
|
||||
├── types.py # AttackPathsQueryDefinition, AttackPathsQueryParameterDefinition
|
||||
├── registry.py # Query registry logic
|
||||
└── {provider}.py # Provider-specific queries (e.g., aws.py)
|
||||
For a list-typed parent property the sink stores:
|
||||
|
||||
- **Child label**: `<ParentLabel><PropertyPascal>Item`. Example: `AWSPolicyStatement.resource` → `AWSPolicyStatementResourceItem`.
|
||||
- **Edge type**: `HAS_<PROPERTY_UPPER>`. Example: `resource` → `HAS_RESOURCE`.
|
||||
- **Child property**: `value` (a single scalar string) for scalar-list properties. For list-of-dict properties (rare; for example `SecretsManagerSecretVersion.tags`) the child carries the dict keys as named fields per the catalog's `field_map`.
|
||||
|
||||
### Variable naming for child-item matches
|
||||
|
||||
`aws.py` uses a per-query counter for each `HAS_*` traversal so chained matches stay unambiguous:
|
||||
|
||||
| Edge | First | Second | Third |
|
||||
| ----------------- | ------ | ------- | ------- |
|
||||
| `HAS_ACTION` | `act` | `act2` | `act3` |
|
||||
| `HAS_RESOURCE` | `res` | `res2` | `res3` |
|
||||
| `HAS_NOTACTION` | `nact` | `nact2` | `nact3` |
|
||||
| `HAS_NOTRESOURCE` | `nres` | `nres2` | `nres3` |
|
||||
|
||||
The counter resets at the top of every query.
|
||||
|
||||
### Example - action match
|
||||
|
||||
Find statements that grant `iam:PassRole`, `iam:*`, or `*`. Traverse the `HAS_ACTION` edge in its own `MATCH` clause and apply the predicate in the attached `WHERE`:
|
||||
|
||||
```cypher
|
||||
MATCH (stmt:AWSPolicyStatement {effect: 'Allow'})
|
||||
MATCH (stmt)-[:HAS_ACTION]->(act:AWSPolicyStatementActionItem)
|
||||
WHERE toLower(act.value) IN ['iam:passrole', 'iam:*']
|
||||
OR act.value = '*'
|
||||
```
|
||||
|
||||
**DO NOT** use generic templates. Match the exact style of existing queries in the file.
|
||||
The literal-action list is case-folded with `toLower(act.value)` because IAM authors mix case (`iam:PassRole`, `iam:passrole`); the `*` wildcard never lower-cases.
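The case-folding rule can be checked outside the graph with a small Python mirror of the predicate. This is only a sketch: `action_matches` and `PASSROLE_ACTIONS` are hypothetical names for illustration and are not part of `aws.py`.

```python
# Hypothetical helper mirroring the Cypher predicate:
#   toLower(act.value) IN [...] OR act.value = '*'
PASSROLE_ACTIONS = {"iam:passrole", "iam:*"}


def action_matches(value: str, literals: set) -> bool:
    """Return True when an action item value grants one of the literal actions."""
    # Literal actions are compared case-insensitively; the bare wildcard is compared as-is.
    return value.lower() in literals or value == "*"


if __name__ == "__main__":
    for candidate in ("iam:PassRole", "IAM:*", "*", "iam:GetRole"):
        print(candidate, action_matches(candidate, PASSROLE_ACTIONS))
```

The literal list must itself be lower-case, exactly as in the Cypher `IN [...]` list, otherwise the folded comparison never matches.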
|
||||
|
||||
### 2. Fetch and consult the Cartography schema
|
||||
### Example - resource ARN match
|
||||
|
||||
**This is the most important step.** Every node label, property, and relationship in the query must exist in the Cartography schema for the pinned version. Do not guess or rely on memory.
|
||||
Find statements whose resource can target a specific role:
|
||||
|
||||
Check `api/pyproject.toml` for the Cartography dependency, then fetch the schema:
|
||||
|
||||
```bash
|
||||
grep cartography api/pyproject.toml
|
||||
```cypher
|
||||
MATCH path_target = (aws)--(target_role:AWSRole)
|
||||
MATCH (stmt)-[:HAS_RESOURCE]->(res:AWSPolicyStatementResourceItem)
|
||||
WHERE res.value = '*'
|
||||
OR res.value CONTAINS target_role.name
|
||||
OR target_role.arn CONTAINS res.value
|
||||
```
|
||||
|
||||
Build the schema URL (ALWAYS use the specific tag, not master/main):
|
||||
Three predicates cover the cases: a full wildcard (`*`), a pattern that contains the role name (`arn:aws:iam::*:role/admin*`), and a pattern that is a substring of the role's actual ARN, such as a prefix.
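To make the three resource predicates concrete, here is a minimal Python mirror of the `WHERE` clause. `resource_can_target` is a hypothetical helper, and the account ID in the usage lines is a placeholder. Like the Cypher `CONTAINS` checks, it is a substring heuristic rather than full IAM wildcard matching.

```python
def resource_can_target(res_value: str, role_arn: str, role_name: str) -> bool:
    """Mirror of the three Cypher predicates on a resource item value."""
    return (
        res_value == "*"            # res.value = '*'
        or role_name in res_value   # res.value CONTAINS target_role.name
        or res_value in role_arn    # target_role.arn CONTAINS res.value
    )


if __name__ == "__main__":
    arn = "arn:aws:iam::123456789012:role/admin"  # placeholder ARN
    print(resource_can_target("*", arn, "admin"))
    print(resource_can_target("arn:aws:iam::*:role/admin*", arn, "admin"))
    print(resource_can_target("arn:aws:s3:::bucket", arn, "admin"))
```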
|
||||
|
||||
```text
|
||||
# Git dependency (prowler-cloud/cartography@0.126.1):
|
||||
https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/0.126.1/docs/root/modules/{provider}/schema.md
|
||||
### Catalog of list properties
|
||||
|
||||
# PyPI dependency (cartography = "^0.126.0"):
|
||||
https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/0.126.0/docs/root/modules/{provider}/schema.md
|
||||
```
|
||||
|
||||
Read the schema to discover available node labels, properties, and relationships for the target resources. Internal labels (`_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*`) exist for isolation but should never appear in queries.
|
||||
|
||||
### 4. Create query definition
|
||||
|
||||
Use the appropriate pattern (privilege escalation or network exposure) with:
|
||||
|
||||
- **id**: `{provider}-{kebab-case-description}`
|
||||
- **name**: Short, human-friendly label. For sourced queries, append the reference ID: `"EC2 Instance Launch with Privileged Role (EC2-001)"`.
|
||||
- **short_description**: Brief explanation, no technical permissions.
|
||||
- **description**: Full technical explanation. Plain text only.
|
||||
- **provider**: Provider identifier (aws, azure, gcp, kubernetes, github)
|
||||
- **cypher**: The openCypher query with proper escaping
|
||||
- **parameters**: Optional list of user-provided parameters (`parameters=[]` if none)
|
||||
- **attribution**: Optional `AttackPathsQueryAttribution(text, link)` for sourced queries. The `text` includes source, reference ID, and permissions. The `link` uses a lowercase ID. Omit for non-sourced queries.
|
||||
|
||||
### 5. Add query to provider list
|
||||
|
||||
Add the constant to the `{PROVIDER}_QUERIES` list.
|
||||
|
||||
---
|
||||
|
||||
## Query naming conventions
|
||||
|
||||
### Query ID
|
||||
|
||||
```text
|
||||
{provider}-{category}-{description}
|
||||
```
|
||||
|
||||
Examples: `aws-ec2-privesc-passrole-iam`, `aws-ec2-instances-internet-exposed`
|
||||
|
||||
### Query constant name
|
||||
|
||||
```text
|
||||
{PROVIDER}_{CATEGORY}_{DESCRIPTION}
|
||||
```
|
||||
|
||||
Examples: `AWS_EC2_PRIVESC_PASSROLE_IAM`, `AWS_EC2_INSTANCES_INTERNET_EXPOSED`
|
||||
|
||||
---
|
||||
|
||||
## Query categories
|
||||
|
||||
| Category | Description | Example |
|
||||
| -------------------- | ------------------------------ | ------------------------- |
|
||||
| Basic Resource | List resources with properties | RDS instances, S3 buckets |
|
||||
| Network Exposure | Internet-exposed resources | EC2 with public IPs |
|
||||
| Privilege Escalation | IAM privilege escalation paths | PassRole + RunInstances |
|
||||
| Data Access | Access to sensitive data | EC2 with S3 access |
|
||||
The provider catalog lives in `api/src/backend/tasks/jobs/attack_paths/provider_config.py` (`AWS_NORMALIZED_LISTS`). Beyond policy statements it includes KMS algorithms, ECS container-definition lists (`entry_point`, `command`, `links`, `dns_servers`, ...), CloudFront aliases, Inspector finding URL and vulnerability lists, RDS event-subscription categories, and others. To query a list property that is not in the catalog, add an entry there first so the sync layer materialises it.
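The naming convention for child item nodes can be sketched in Python as follows. `child_item_names` is a hypothetical helper, not the sync layer's implementation, and the PascalCase handling of multi-word properties such as `encryption_algorithms` is an assumption inferred from the single-word examples in this section.

```python
def child_item_names(parent_label: str, prop: str) -> tuple:
    """Return (child_label, edge_type) for a list-typed parent property."""
    # Assumption: snake_case properties are PascalCased word by word.
    pascal = "".join(part.capitalize() for part in prop.split("_") if part)
    child_label = f"{parent_label}{pascal}Item"
    edge_type = f"HAS_{prop.upper()}"
    return child_label, edge_type


if __name__ == "__main__":
    print(child_item_names("AWSPolicyStatement", "resource"))
    print(child_item_names("AWSPolicyStatement", "action"))
```

Before relying on a derived name, confirm that the property actually has an entry in `AWS_NORMALIZED_LISTS`.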
|
||||
|
||||
---
|
||||
|
||||
@@ -315,53 +295,42 @@ Examples: `AWS_EC2_PRIVESC_PASSROLE_IAM`, `AWS_EC2_INSTANCES_INTERNET_EXPOSED`
|
||||
### Match account and principal
|
||||
|
||||
```cypher
|
||||
MATCH path_principal = (aws:AWSAccount {id: $provider_uid})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
|
||||
MATCH path_principal = (aws:AWSAccount {id: $provider_uid})--(principal:AWSPrincipal)-[:POLICY]->(policy:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {effect: 'Allow'})
|
||||
```
|
||||
|
||||
### Check IAM action permissions
|
||||
The `(aws)--(principal)` hop stays anonymous; the `POLICY` and `STATEMENT` hops are typed.
|
||||
|
||||
### Roles trusting a service
|
||||
|
||||
```cypher
|
||||
WHERE stmt.effect = 'Allow'
|
||||
AND any(action IN stmt.action WHERE
|
||||
toLower(action) = 'iam:passrole'
|
||||
OR toLower(action) = 'iam:*'
|
||||
OR action = '*'
|
||||
)
|
||||
MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]-(:AWSPrincipal {arn: 'ec2.amazonaws.com'})
|
||||
```
|
||||
|
||||
### Find roles trusting a service
|
||||
### Roles a principal can assume
|
||||
|
||||
```cypher
|
||||
MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: 'ec2.amazonaws.com'})
|
||||
MATCH path_target = (aws)--(target_role:AWSRole)-[:STS_ASSUMEROLE_ALLOW]-(principal)
|
||||
```
|
||||
|
||||
### Find roles the principal can assume
|
||||
### JSON-encoded properties
|
||||
|
||||
Note the arrow direction - `STS_ASSUMEROLE_ALLOW` points from the role to the principal:
|
||||
Object-typed Cartography properties (most notably `condition` on `AWSPolicyStatement` and `S3PolicyStatement`) are stored as JSON-encoded strings, e.g. `'{"StringEquals":{"aws:SourceAccount":"123456789012"}}'`. There is no JSON parser at query time, so use `CONTAINS` for substring checks:
|
||||
|
||||
```cypher
|
||||
MATCH path_target = (aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)
|
||||
WHERE stmt.condition CONTAINS '"aws:SourceAccount"'
|
||||
```
|
||||
|
||||
### Check resource scope
|
||||
|
||||
```cypher
|
||||
WHERE any(resource IN stmt.resource WHERE
|
||||
resource = '*'
|
||||
OR target_role.arn CONTAINS resource
|
||||
OR resource CONTAINS target_role.name
|
||||
)
|
||||
```
|
||||
For structured inspection, fetch the rows and parse in Python. Cypher cannot navigate JSON object keys.
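A minimal sketch of that client-side parsing, assuming the `condition` value has already been fetched as a string. `condition_has_key` is a hypothetical helper name; only the standard-library `json` module is used.

```python
import json
from typing import Optional


def condition_has_key(condition: Optional[str], operator: str, key: str) -> bool:
    """Check whether a JSON-encoded condition contains `key` under `operator`."""
    if not condition:
        return False
    try:
        parsed = json.loads(condition)
    except json.JSONDecodeError:
        # Treat malformed values as "no match" rather than failing the whole run.
        return False
    if not isinstance(parsed, dict):
        return False
    block = parsed.get(operator)
    return isinstance(block, dict) and key in block


if __name__ == "__main__":
    raw = '{"StringEquals":{"aws:SourceAccount":"123456789012"}}'
    print(condition_has_key(raw, "StringEquals", "aws:SourceAccount"))
```

Unlike the `CONTAINS` substring check, this distinguishes a key that appears under `StringEquals` from the same text appearing under another operator.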
|
||||
|
||||
### Internet node via path connectivity
|
||||
|
||||
The Internet node is reached through `CAN_ACCESS` relationships to already-scoped resources. No standalone lookup needed:
|
||||
|
||||
```cypher
|
||||
OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)
|
||||
```
|
||||
|
||||
### Multi-label OR (match multiple resource types)
|
||||
`resource` must already be bound by the account-anchored pattern above.
|
||||
|
||||
### Multi-label OR (multiple resource types)
|
||||
|
||||
```cypher
|
||||
MATCH path = (aws:AWSAccount {id: $provider_uid})-[r]-(x)-[q]-(y)
|
||||
@@ -373,7 +342,7 @@ WHERE (x:EC2PrivateIp AND x.public_ip = $ip)
|
||||
|
||||
### Include Prowler findings
|
||||
|
||||
Deduplicate nodes before the ProwlerFinding lookup to avoid redundant OPTIONAL MATCH calls on nodes that appear in multiple paths:
|
||||
Deduplicate nodes before the typed finding probe to avoid one `OPTIONAL MATCH` per path-occurrence of the same node:
|
||||
|
||||
```cypher
|
||||
WITH collect(path_principal) + collect(path_target) AS paths
|
||||
@@ -382,12 +351,12 @@ UNWIND nodes(p) AS n
|
||||
|
||||
WITH paths, collect(DISTINCT n) AS unique_nodes
|
||||
UNWIND unique_nodes AS n
|
||||
OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
|
||||
OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
|
||||
|
||||
RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
|
||||
```
|
||||
|
||||
For network exposure queries, aggregate the internet node and relationship alongside paths:
|
||||
For network-exposure queries, aggregate the Internet node and its edge alongside paths:
|
||||
|
||||
```cypher
|
||||
WITH collect(path) AS paths, head(collect(internet)) AS internet, collect(can_access) AS can_access
|
||||
@@ -396,7 +365,7 @@ UNWIND nodes(p) AS n
|
||||
|
||||
WITH paths, internet, can_access, collect(DISTINCT n) AS unique_nodes
|
||||
UNWIND unique_nodes AS n
|
||||
OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
|
||||
OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
|
||||
|
||||
RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
|
||||
internet, can_access
|
||||
@@ -406,22 +375,22 @@ RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
|
||||
|
||||
## Prowler-specific labels and relationships
|
||||
|
||||
These are added by the sync task, not part of the Cartography schema. For all other node labels, properties, and relationships, **always consult the Cartography schema** (see step 2 below).
|
||||
Added by the sync task, not part of the Cartography schema. For everything else, consult the pinned Cartography schema (see "Creation steps").
|
||||
|
||||
| Label/Relationship | Description |
|
||||
| ---------------------- | -------------------------------------------------- |
|
||||
| `ProwlerFinding` | Finding node (`status`, `severity`, `check_id`) |
|
||||
| `Internet` | Internet sentinel node |
|
||||
| `CAN_ACCESS` | Internet-to-resource exposure (relationship) |
|
||||
| `HAS_FINDING` | Resource-to-finding link (relationship) |
|
||||
| `TRUSTS_AWS_PRINCIPAL` | Role trust relationship |
|
||||
| `STS_ASSUMEROLE_ALLOW` | Can assume role (direction: role -> principal) |
|
||||
| Label / Relationship | Description |
|
||||
| ---------------------- | ----------------------------------------------------------- |
|
||||
| `ProwlerFinding` | Finding node (`status`, `severity`, `check_id`) |
|
||||
| `Internet` | Internet sentinel node |
|
||||
| `CAN_ACCESS` | `(Internet)-[:CAN_ACCESS]->(resource)` exposure edge |
|
||||
| `HAS_FINDING` | `(resource)-[:HAS_FINDING]->(:ProwlerFinding)` finding link |
|
||||
| `TRUSTS_AWS_PRINCIPAL` | Role trust relationship |
|
||||
| `STS_ASSUMEROLE_ALLOW` | Can assume role |
|
||||
|
||||
---
|
||||
|
||||
## Parameters
|
||||
|
||||
For queries requiring user input:
|
||||
For queries that take user input:
|
||||
|
||||
```python
|
||||
parameters=[
|
||||
@@ -438,50 +407,83 @@ parameters=[
|
||||
|
||||
---
|
||||
|
||||
## Best practices
|
||||
## openCypher compatibility
|
||||
|
||||
1. **Chain all MATCHes from the root account node**: Every `MATCH` clause must connect to the `aws` variable (or another variable already bound to the account's subgraph). An unanchored `MATCH` would return nodes from all providers.
|
||||
Queries must run on both Neo4j and Amazon Neptune. Avoid these constructs:
|
||||
|
||||
```cypher
|
||||
// WRONG: matches ALL AWSRoles across all providers
|
||||
MATCH (role:AWSRole) WHERE role.name = 'admin'
|
||||
| Feature | Use instead |
|
||||
| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| APOC procedures (`apoc.*`) | Real nodes and relationships in the graph |
|
||||
| Neptune extensions | Standard openCypher |
|
||||
| `reduce()` | `UNWIND` + `collect()` |
|
||||
| `FOREACH` | `WITH` + `UNWIND` + `SET` |
|
||||
| Regex `=~` | `toLower()` + exact match, or `STARTS WITH` / `CONTAINS` |
|
||||
| `CALL () { UNION }` | Multi-label `OR` in `WHERE` (see pattern above) |
|
||||
| `any(x IN list ...)` | `size([x IN list WHERE pred]) > 0` |
|
||||
| `all(x IN list ...)` | `size([x IN list WHERE pred]) = size(list)` |
|
||||
| `none(x IN list ...)` | `size([x IN list WHERE pred]) = 0` |
|
||||
| `EXISTS { MATCH (pattern) WHERE pred }` | Standalone `MATCH (pattern)` + `WHERE pred`; precede the downstream `collect(path...)` with `WITH DISTINCT <path-vars>` to dedupe the joins |
|
||||
|
||||
// CORRECT: scoped to the specific account's subgraph
|
||||
MATCH (aws)--(role:AWSRole) WHERE role.name = 'admin'
|
||||
```
|
||||
|
||||
**Exception**: A second-permission MATCH like `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` is safe because `principal` is already bound to the account's subgraph by the first MATCH. It does not need to chain from `aws` again.
|
||||
|
||||
2. **Include Prowler findings**: Always add `OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})` with `collect(DISTINCT pf)`.
|
||||
|
||||
3. **Comment the query purpose**: Add inline comments explaining each MATCH clause.
|
||||
|
||||
4. **Never use internal labels in queries**: `_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*` are for system isolation. They should never appear in predefined or custom query text.
|
||||
|
||||
6. **Internet node uses path connectivity**: Reach it via `OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)` where `resource` is already scoped by the account anchor. No standalone lookup.
|
||||
For list-typed properties in the catalog (action, resource, and so on), traverse the `HAS_*` edges to the child item nodes via the multi-`MATCH` shape shown in "List-typed properties as child nodes". The parent node does not carry the list as a single field, so `split(...)` and comma-string predicates do not apply.
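One way to catch the constructs listed in the table before a query ships is a rough text scan. The sketch below is a hypothetical lint helper, not something the repository is known to provide. It works on raw query text, so occurrences inside comments or string literals are flagged as well.

```python
import re

# Hypothetical patterns for the constructs the compatibility table says to avoid.
DISCOURAGED = {
    "APOC procedure": re.compile(r"\bapoc\.", re.IGNORECASE),
    "reduce()": re.compile(r"\breduce\s*\(", re.IGNORECASE),
    "FOREACH": re.compile(r"\bFOREACH\b", re.IGNORECASE),
    "regex =~": re.compile(r"=~"),
    "any()/all()/none()": re.compile(r"\b(any|all|none)\s*\(", re.IGNORECASE),
    "EXISTS { ... }": re.compile(r"\bEXISTS\s*\{", re.IGNORECASE),
}


def find_discouraged(cypher: str) -> list:
    """Return the names of discouraged constructs found in the query text."""
    return [name for name, pattern in DISCOURAGED.items() if pattern.search(cypher)]


if __name__ == "__main__":
    print(find_discouraged("MATCH (n) WHERE n.name =~ 'a.*' RETURN n"))
    print(find_discouraged("MATCH (s) WHERE any(a IN s.action WHERE a = '*') RETURN s"))
```

A scan like this is only a first filter; validating the query against both sinks remains the real check.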
|
||||
|
||||
---
|
||||
|
||||
## openCypher compatibility
|
||||
## Best practices
|
||||
|
||||
Queries must be written in **openCypher Version 9** for compatibility with both Neo4j and Amazon Neptune.
|
||||
1. **Chain every MATCH from the account anchor.** An unanchored `MATCH (role:AWSRole)` returns roles from every provider in the graph; `MATCH (aws)--(role:AWSRole)` is scoped. A second-permission MATCH like `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` is safe because `principal` is already bound to the account's subgraph.
|
||||
2. **Type the finding probe.** Always `OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})`. The type lets Neptune apply an inline edge filter; an untyped probe scans every incident edge of high-degree nodes.
|
||||
3. **Comment each MATCH.** One inline `// ...` line per clause explaining its role.
|
||||
4. **Never use internal labels.** `_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*` are system isolation labels and must not appear in query text (predefined or custom).
|
||||
5. **Reach the Internet node through path connectivity** via `(internet:Internet)-[:CAN_ACCESS]->(resource)`, never as a standalone match.
|
||||
6. **Preserve the `RETURN` contract.** `paths, dpf, dpfr` for the standard shape; add `internet, can_access` for network-exposure queries. The serializer and visualiser depend on these names.
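The `RETURN` contract from the list above lends itself to a quick sanity check. The helper below is a hypothetical sketch that only looks for the expected names as tokens after the last `RETURN` keyword; it does not parse Cypher.

```python
import re

STANDARD_NAMES = ("paths", "dpf", "dpfr")
NETWORK_NAMES = STANDARD_NAMES + ("internet", "can_access")


def missing_return_names(cypher: str, network_exposure: bool = False) -> list:
    """List the contract names that do not appear after the final RETURN."""
    required = NETWORK_NAMES if network_exposure else STANDARD_NAMES
    parts = re.split(r"\bRETURN\b", cypher, flags=re.IGNORECASE)
    if len(parts) < 2:
        # No RETURN clause at all: every required name is missing.
        return list(required)
    tokens = set(re.findall(r"[A-Za-z_][A-Za-z0-9_]*", parts[-1]))
    return [name for name in required if name not in tokens]


if __name__ == "__main__":
    query = "RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr"
    print(missing_return_names(query))
    print(missing_return_names(query, network_exposure=True))
```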
|
||||
|
||||
### Avoid these (not in openCypher spec)
|
||||
---
|
||||
|
||||
| Feature | Use instead |
|
||||
| -------------------------- | ------------------------------------------------------ |
|
||||
| APOC procedures (`apoc.*`) | Real nodes and relationships in the graph |
|
||||
| Neptune extensions | Standard openCypher |
|
||||
| `reduce()` function | `UNWIND` + `collect()` |
|
||||
| `FOREACH` clause | `WITH` + `UNWIND` + `SET` |
|
||||
| Regex operator (`=~`) | `toLower()` + exact match, or `CONTAINS`/`STARTS WITH`. One legacy query uses `=~` - do not add new usages |
|
||||
| `CALL () { UNION }` | Multi-label OR in WHERE (see patterns section) |
|
||||
## Naming conventions
|
||||
|
||||
- **ID**: kebab-case `{provider}-{category}-{description}`, e.g. `aws-ec2-privesc-passrole-iam`.
|
||||
- **Constant**: SHOUTING_SNAKE_CASE `{PROVIDER}_{CATEGORY}_{DESCRIPTION}`, e.g. `AWS_EC2_PRIVESC_PASSROLE_IAM`.
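Because the constant name mirrors the query ID, one can be derived from the other. A tiny sketch, with `constant_name` as a hypothetical helper:

```python
def constant_name(query_id: str) -> str:
    """Convert a kebab-case query ID into the matching constant name."""
    return query_id.replace("-", "_").upper()


if __name__ == "__main__":
    print(constant_name("aws-ec2-privesc-passrole-iam"))
    print(constant_name("aws-ec2-instances-internet-exposed"))
```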
|
||||
|
||||
---
|
||||
|
||||
## Creation steps
|
||||
|
||||
1. **Read the queries module first** to match the existing style:
|
||||
|
||||
```text
|
||||
api/src/backend/api/attack_paths/queries/
|
||||
├── __init__.py
|
||||
├── types.py # dataclass definitions
|
||||
├── registry.py
|
||||
└── {provider}.py
|
||||
```
|
||||
|
||||
2. **Fetch the Cartography schema for the pinned version.** Do not guess labels, properties, or relationships. Read the dependency pin:
|
||||
|
||||
```bash
|
||||
grep cartography api/pyproject.toml
|
||||
```
|
||||
|
||||
Then fetch the schema for that exact tag:
|
||||
|
||||
```text
|
||||
# Git pin (prowler-cloud/cartography@<TAG>):
|
||||
https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/<TAG>/docs/root/modules/{provider}/schema.md
|
||||
|
||||
# PyPI pin (cartography==<TAG>):
|
||||
https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/<TAG>/docs/root/modules/{provider}/schema.md
|
||||
```
|
||||
|
||||
3. **Build the query** using the canonical predefined template plus the appropriate sub-pattern (privilege escalation or network exposure). For list-typed properties (action/resource/etc.), traverse the exploded child nodes via `[:HAS_ACTION]->(:AWSPolicyStatementActionItem)` etc. (see "List-typed properties as child nodes" and the `AWS_NORMALIZED_LISTS` catalog).
|
||||
|
||||
4. **Register** the constant in the `{PROVIDER}_QUERIES` list at the bottom of the provider file.
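The schema URL from step 2 can be assembled programmatically once the pinned tag is known. In this sketch `schema_url` is a hypothetical helper; the two organisation names come from the URL templates above, and the tag is supplied by the caller.

```python
def schema_url(tag: str, provider: str, git_pin: bool) -> str:
    """Build the raw schema.md URL for a pinned Cartography tag."""
    # Git pins point at the prowler-cloud fork, PyPI pins at the upstream repository.
    org = "prowler-cloud" if git_pin else "cartography-cncf"
    return (
        f"https://raw.githubusercontent.com/{org}/cartography/"
        f"refs/tags/{tag}/docs/root/modules/{provider}/schema.md"
    )


if __name__ == "__main__":
    # "<TAG>" stands for whatever `grep cartography api/pyproject.toml` reports.
    print(schema_url("<TAG>", "aws", git_pin=True))
```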
|
||||
|
||||
---
|
||||
|
||||
## Reference
|
||||
|
||||
- **pathfinding.cloud**: https://github.com/DataDog/pathfinding.cloud (use `curl | jq`, not WebFetch)
|
||||
- **Cartography schema**: `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md`
|
||||
- **Neptune openCypher compliance**: https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html
|
||||
- **openCypher spec**: https://github.com/opencypher/openCypher
|
||||
- **pathfinding.cloud**: https://github.com/DataDog/pathfinding.cloud (use `curl | jq`; the aggregated `paths.json` is too large for WebFetch).
|
||||
- **Cartography schema** (per pinned tag): `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{tag}/docs/root/modules/{provider}/schema.md`.
|
||||
- **Neptune openCypher compliance**: https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html.
|
||||
- **openCypher spec**: https://github.com/opencypher/openCypher.
|
||||
- **Sync converter** (`tasks/jobs/attack_paths/sync.py`): list-typed node properties listed in `tasks/jobs/attack_paths/provider_config.py::AWS_NORMALIZED_LISTS` are materialised as child item nodes + `HAS_*` edges. Properties that are not in the catalog are serialised to a comma-delimited string and emit a one-time warning. Dict-typed properties become JSON strings. Same shape on both sinks.
|
||||
|
||||