feat(api): make Attack Paths sink selectable between Neo4j and Neptune (#11524)

2026-07-04 19:21:51 +00:00 · 2026-06-26 10:22:29 +02:00
parent 9b8b77cec0
commit 5793cd7e38
48 changed files with 9928 additions and 3210 deletions
@@ -169,3 +169,7 @@ GEMINI.md
 # Claude Code
 .claude/*
 # Docker
 docker-compose.override.yml
 docker-compose-dev.override.yml
@@ -83,10 +83,18 @@ prowler dashboard
 ## Attack Paths
-Attack Paths automatically extends every completed AWS scan with a Neo4j graph that combines Cartography's cloud inventory with Prowler findings. The feature runs in the API worker after each scan and therefore requires:
+Attack Paths automatically extends every completed AWS scan with a graph that combines Cartography's cloud inventory with Prowler findings. The feature runs in the API worker after each scan.
- An accessible Neo4j instance (the Docker Compose files already ships a `neo4j` service).
+Two graph backends are supported as the long-lived sink:
- The following environment variables so Django and Celery can connect:
+
 - **Neo4j** (default; the Docker Compose files already ship a `neo4j` service).
 - **Amazon Neptune** (cloud-managed; opt-in).
 Select the sink with `ATTACK_PATHS_SINK_DATABASE` (`neo4j` or `neptune`; default `neo4j`).
 > Note: Cartography ingestion always uses a temporary Neo4j database, regardless of the configured sink. The `NEO4J_*` variables below must remain set even when `ATTACK_PATHS_SINK_DATABASE=neptune`.
 ### Neo4j sink
 | Variable | Description | Default |
 | --- | --- | --- |
@@ -94,6 +102,17 @@ Attack Paths automatically extends every completed AWS scan with a Neo4j graph t
 | `NEO4J_PORT` | Bolt port exposed by Neo4j. | `7687` |
 | `NEO4J_USER` / `NEO4J_PASSWORD` | Credentials with rights to create per-tenant databases. | `neo4j` / `neo4j_password` |
 ### Neptune sink
 | Variable | Description | Default |
 | --- | --- | --- |
 | `NEPTUNE_WRITER_ENDPOINT` | Bolt host for the Neptune writer instance. Required when sink is `neptune`. | _empty_ |
 | `NEPTUNE_READER_ENDPOINT` | Optional reader endpoint for read-only queries. Falls back to the writer when unset. | _empty_ |
 | `NEPTUNE_PORT` | Bolt port exposed by Neptune. | `8182` |
 | `AWS_REGION` | Region the Neptune cluster lives in. Required when sink is `neptune`. | _empty_ |
 Neptune authenticates with SigV4 using the standard boto3 credential chain. The worker's IAM role (or `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`) supplies the credentials. There is no Neptune password variable.
 Every AWS provider scan will enqueue an Attack Paths ingestion job automatically. Other cloud providers will be added in future iterations.
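The variable tables above can be read as a small validation routine. The sketch below is not Prowler's settings code: `resolve_sink_settings` and the keys of the dictionary it returns are hypothetical names, and treating `NEO4J_HOST` as always required is an assumption drawn from the note that the `NEO4J_*` variables must stay set for either sink.

```python
from collections.abc import Mapping

VALID_SINKS = ("neo4j", "neptune")


def resolve_sink_settings(env: Mapping[str, str]) -> dict[str, str]:
    """Hypothetical helper: validate the sink variables documented above."""
    sink = env.get("ATTACK_PATHS_SINK_DATABASE", "neo4j")
    if sink not in VALID_SINKS:
        raise ValueError(f"ATTACK_PATHS_SINK_DATABASE must be one of {VALID_SINKS}")

    # Cartography's temporary database is always Neo4j, so the Neo4j
    # connection details are needed for both sinks (assumed required here).
    missing = [] if env.get("NEO4J_HOST") else ["NEO4J_HOST"]
    resolved = {
        "sink": sink,
        "neo4j_host": env.get("NEO4J_HOST", ""),
        "neo4j_port": env.get("NEO4J_PORT", "7687"),
    }

    if sink == "neptune":
        writer = env.get("NEPTUNE_WRITER_ENDPOINT", "")
        region = env.get("AWS_REGION", "")
        if not writer:
            missing.append("NEPTUNE_WRITER_ENDPOINT")
        if not region:
            missing.append("AWS_REGION")
        resolved.update(
            {
                "neptune_writer_endpoint": writer,
                # The reader endpoint is optional and falls back to the writer.
                "neptune_reader_endpoint": env.get("NEPTUNE_READER_ENDPOINT") or writer,
                "neptune_port": env.get("NEPTUNE_PORT", "8182"),
                "aws_region": region,
            }
        )

    if missing:
        raise ValueError(f"Missing required variables: {', '.join(missing)}")
    return resolved
```

Passing a plain dictionary (or `os.environ`) keeps the check easy to exercise without touching the real process environment.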
@@ -2,6 +2,14 @@
 All notable changes to the **Prowler API** are documented in this file.
 ## [1.33.0] (Prowler UNRELEASED)
 ### 🔄 Changed
 - Attack Paths: AWS Neptune is now supported as a persistent sink database, selectable via `ATTACK_PATHS_SINK_DATABASE=neptune` (default `neo4j`). Cartography (bumped to 0.138.1) keeps its per-scan ingest database on Neo4j [(#11524)](https://github.com/prowler-cloud/prowler/pull/11524)
 ---
 ## [1.32.2] (Prowler UNRELEASED)
 ### 🐞 Fixed
@@ -58,7 +58,7 @@ dependencies = [
  "matplotlib (==3.10.8)",
  "reportlab (==4.4.10)",
  "neo4j (==6.1.0)",
-  "cartography (==0.135.0)",
+  "cartography (==0.138.1)",
  "gevent (==25.9.1)",
  "werkzeug (==3.1.7)",
  "sqlparse (==0.5.5)",
@@ -193,7 +193,7 @@ constraint-dependencies = [
  "blinker==1.9.0",
  "boto3==1.40.61",
  "botocore==1.40.61",
-  "cartography==0.135.0",
+  "cartography==0.138.1",
  "celery==5.6.2",
  "certifi==2026.1.4",
  "cffi==2.0.0",
@@ -447,7 +447,7 @@ constraint-dependencies = [
  "wcwidth==0.5.3",
  "websocket-client==1.9.0",
  "werkzeug==3.1.7",
-  "workos==6.0.4",
+  "workos==6.0.8",
  "wrapt==1.17.3",
  "xlsxwriter==3.2.9",
  "xmlsec==1.3.17",
@@ -458,8 +458,13 @@ constraint-dependencies = [
  "zope-interface==8.2",
  "zstd==1.5.7.3"
 ]
-# prowler@master needs okta==3.4.2; cartography 0.135.0 declares okta<1.0.0 for an
+# prowler@master needs okta==3.4.2, but cartography 0.138.1 requires okta<1.0.0.
-# integration prowler does not import.
+# Attack Paths does not ingest Okta today, so override the Cartography
 # dependency to the Prowler pin.
 #
 # prowler@master needs azure-mgmt-containerservice==34.1.0, but cartography
 # 0.138.1 requires azure-mgmt-containerservice>=41.0.0. Attack Paths does not
 # ingest Azure today, so override the Cartography dependency to the Prowler pin.
 #
 # prowler@master hard-pins microsoft-kiota-abstractions==1.9.2 in [project.dependencies].
 # The microsoft-kiota-http security bump to 1.9.9 (GHSA-7j59-v9qr-6fq9) requires
@@ -475,6 +480,7 @@ constraint-dependencies = [
 # that request pyjwt[crypto] and leave cryptography (needed for RS256) only transitive.
 override-dependencies = [
  "okta==3.4.2",
  "azure-mgmt-containerservice==34.1.0",
  "microsoft-kiota-abstractions==1.9.9",
  "dulwich==1.2.5",
  "pyjwt[crypto]==2.13.0"
@@ -42,9 +42,6 @@ class ApiConfig(AppConfig):
        ):
            self._ensure_crypto_keys()
        # Neo4j driver is created lazily on first use (see api.attack_paths.database).
        # App init never contacts Neo4j, so a Neo4j outage cannot block API startup.
    def _ensure_crypto_keys(self):
        """
        Orchestrator method that ensures all required cryptographic keys are present.
@@ -4,10 +4,10 @@ Cypher sanitizer for custom (user-supplied) Attack Paths queries.
 Two responsibilities:
 1. **Validation** - reject queries containing SSRF or dangerous procedure
-   patterns (defense-in-depth; the primary control is ``neo4j.READ_ACCESS``).
+   patterns (defense-in-depth; the primary control is `neo4j.READ_ACCESS`).
 2. **Provider-scoped label injection** - inject a dynamic
-   ``_Provider_{uuid}`` label into every node pattern so the database can
+   `_Provider_{uuid}` label into every node pattern so the database can
   use its native label index for provider isolation.
 Label-injection pipeline:
@@ -25,13 +25,13 @@ from rest_framework.exceptions import ValidationError
 from tasks.jobs.attack_paths.config import get_provider_label
 # Step 1 - String / comment protection
-# Single combined regex: strings first, then line comments.
+# Single combined regex: strings first, then line comments
 # The regex engine finds the leftmost match, so a string like 'https://prowler.com'
-# is consumed as a string before the // inside it can match as a comment.
+# is consumed as a string before the // inside it can match as a comment
 _PROTECTED_RE = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"|//[^\n]*")
 # Step 2 - Clause splitting
-# OPTIONAL MATCH must come before MATCH to avoid partial matching.
+# `OPTIONAL MATCH` must come before `MATCH` to avoid partial matching
 _CLAUSE_RE = re.compile(
    r"\b(OPTIONAL\s+MATCH|MATCH|WHERE|RETURN|WITH|ORDER\s+BY"
    r"|SKIP|LIMIT|UNION|UNWIND|CALL)\b",
@@ -39,10 +39,10 @@ _CLAUSE_RE = re.compile(
 )
 # Pass A - Labeled node patterns (all segments)
-# Matches node patterns that have at least one :Label.
+# Matches node patterns that have at least one `:Label`
-# (?<!\w)\(  - open paren NOT preceded by a word char (excludes function calls).
+# `(?<!\w)\(`  - open paren NOT preceded by a word char, excludes function calls
-# Group 1:  optional variable + one or more :Label
+# Group 1:  optional variable + one or more `:Label`
-# Group 2:  optional {properties} + closing paren
+# Group 2:  optional `{properties}` + closing paren
 _LABELED_NODE_RE = re.compile(
    r"(?<!\w)\("
    r"("
@@ -55,9 +55,9 @@ _LABELED_NODE_RE = re.compile(
    r")"
 )
-# Pass B - Bare node patterns (MATCH segments only)
+# Pass B - Bare node patterns (`MATCH` segments only)
-# Matches (identifier) or (identifier {properties}) without any :Label.
+# Matches (identifier) or (identifier {properties}) without any `:Label`
-# Only applied in MATCH/OPTIONAL MATCH segments.
+# Only applied in `MATCH` / `OPTIONAL MATCH` segments
 _BARE_NODE_RE = re.compile(
    r"(?<!\w)\(" r"(\s*[a-zA-Z_]\w*)" r"(\s*(?:\{[^}]*\})?)" r"\s*\)"
 )
@@ -134,9 +134,7 @@ def inject_provider_label(cypher: str, provider_id: str) -> str:
    return work
 # ---------------------------------------------------------------------------
 # Validation
 # ---------------------------------------------------------------------------
 # Patterns that indicate SSRF or dangerous procedure calls
 # Defense-in-depth layer - the primary control is `neo4j.READ_ACCESS`
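The comments on `_PROTECTED_RE` and `_BARE_NODE_RE` describe behavior that is easy to check in isolation. In the sketch below the two patterns are copied from the hunks above, while `protected_spans`, `label_bare_nodes` and the label value are hypothetical illustrations. The real `inject_provider_label` pipeline is only partly visible in this diff and may work differently.

```python
import re

# Patterns copied verbatim from the sanitizer hunks above.
_PROTECTED_RE = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"|//[^\n]*")
_BARE_NODE_RE = re.compile(
    r"(?<!\w)\(" r"(\s*[a-zA-Z_]\w*)" r"(\s*(?:\{[^}]*\})?)" r"\s*\)"
)


def protected_spans(cypher: str) -> list[str]:
    """Return the string literals and line comments the regex would shield."""
    return _PROTECTED_RE.findall(cypher)


def label_bare_nodes(match_segment: str, label: str) -> str:
    """Illustrative only: add a backtick-quoted label to each bare node pattern."""
    return _BARE_NODE_RE.sub(
        lambda m: f"({m.group(1)}:`{label}`{m.group(2)})", match_segment
    )


# The URL is consumed as a string, so its // never starts a comment.
print(protected_spans("RETURN 'https://prowler.com' AS url // trailing note"))

# Bare nodes gain the label; count(n) is skipped by the (?<!\w) lookbehind.
print(label_bare_nodes("MATCH (a)-[:R]->(b {id: 1})", "_Provider_example"))
print(label_bare_nodes("WITH count(n) AS c", "_Provider_example"))
```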
@@ -1,261 +1,32 @@
-import atexit
+"""Backwards-compatible facade over the ingest and sink modules.
-import logging
+
-import threading
+Historically this module owned a single Neo4j driver used for both the
-from collections.abc import Iterator
+cartography temp database and the per-tenant sink database. The port to AWS
-from contextlib import contextmanager
+Neptune split those roles: the cartography ingest (temp) database is always
 Neo4j and lives in `api.attack_paths.ingest`; the sink is configurable
 (Neo4j or Neptune) and lives in `api.attack_paths.sink`. This shim preserves
 the public API that `tasks/` and `api/v1/views.py` already depend on, and
 dispatches to the right module by database-name prefix.
 A database name starting with `db-tmp-scan-` is a cartography temp DB and
 routes to ingest. Everything else routes to the configured sink.
 """
 from contextlib import AbstractContextManager
 from typing import Any
 from uuid import UUID
-import neo4j
+import neo4j  # noqa: F401 - kept for tests that patch api.attack_paths.database.neo4j
-import neo4j.exceptions
+from api.attack_paths import ingest
-from api.attack_paths.retryable_session import RetryableSession
+from api.attack_paths import sink as sink_module
 from config.env import env
-from django.conf import settings
+from django.conf import (
-from tasks.jobs.attack_paths.config import (
+    settings,  # noqa: F401 - kept for tests that patch ...database.settings
    BATCH_SIZE,
    PROVIDER_RESOURCE_LABEL,
    get_provider_label,
 )
 # Without this Celery goes crazy with Neo4j logging
 logging.getLogger("neo4j").setLevel(logging.ERROR)
 logging.getLogger("neo4j").propagate = False
 SERVICE_UNAVAILABLE_MAX_RETRIES = env.int(
    "ATTACK_PATHS_SERVICE_UNAVAILABLE_MAX_RETRIES", default=3
 )
 READ_QUERY_TIMEOUT_SECONDS = env.int(
    "ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30
 )
 MAX_CUSTOM_QUERY_NODES = env.int("ATTACK_PATHS_MAX_CUSTOM_QUERY_NODES", default=250)
 # Shorter than CONN_ACQUISITION_TIMEOUT — the driver requires acquisition to be
 # the longer of the two (it may include opening a new connection).
 CONNECTION_TIMEOUT = env.int("NEO4J_CONNECTION_TIMEOUT", default=5)
 CONN_ACQUISITION_TIMEOUT = env.int("NEO4J_CONN_ACQUISITION_TIMEOUT", default=15)
 READ_EXCEPTION_CODES = [
    "Neo.ClientError.Statement.AccessMode",
    "Neo.ClientError.Procedure.ProcedureNotFound",
 ]
 CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."
-# Module-level process-wide driver singleton
+TEMP_DB_PREFIX = "db-tmp-scan-"
 _driver: neo4j.Driver | None = None
 _lock = threading.Lock()
 # Base Neo4j functions
 def get_uri() -> str:
    host = settings.DATABASES["neo4j"]["HOST"]
    port = settings.DATABASES["neo4j"]["PORT"]
    return f"bolt://{host}:{port}"
 def init_driver() -> neo4j.Driver:
    global _driver
    if _driver is not None:
        return _driver
    with _lock:
        if _driver is None:
            uri = get_uri()
            config = settings.DATABASES["neo4j"]
            driver = neo4j.GraphDatabase.driver(
                uri,
                auth=(config["USER"], config["PASSWORD"]),
                keep_alive=True,
                max_connection_lifetime=7200,
                connection_timeout=CONNECTION_TIMEOUT,
                connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
                max_connection_pool_size=50,
            )
            # Publish the singleton only after connectivity is verified so a
            # failed probe does not leave an unverified driver behind. Close the
            # driver on failure so a repeatedly-probed outage cannot leak pools.
            try:
                driver.verify_connectivity()
            except Exception:
                driver.close()
                raise
            _driver = driver
            # Register cleanup handler (only runs once since we're inside the _driver is None block)
            atexit.register(close_driver)
    return _driver
 def get_driver() -> neo4j.Driver:
    return init_driver()
 def close_driver() -> None:  # TODO: Use it
    global _driver
    with _lock:
        if _driver is not None:
            try:
                _driver.close()
            finally:
                _driver = None
@contextmanager
 def get_session(
    database: str | None = None, default_access_mode: str | None = None
 ) -> Iterator[RetryableSession]:
    session_wrapper: RetryableSession | None = None
    try:
        session_wrapper = RetryableSession(
            session_factory=lambda: get_driver().session(
                database=database, default_access_mode=default_access_mode
            ),
            max_retries=SERVICE_UNAVAILABLE_MAX_RETRIES,
        )
        yield session_wrapper
    except neo4j.exceptions.Neo4jError as exc:
        if (
            default_access_mode == neo4j.READ_ACCESS
            and exc.code
            and exc.code in READ_EXCEPTION_CODES
        ):
            message = "Read query not allowed"
            code = READ_EXCEPTION_CODES[0]
            raise WriteQueryNotAllowedException(message=message, code=code)
        message = exc.message if exc.message is not None else str(exc)
        if exc.code and exc.code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
            raise ClientStatementException(message=message, code=exc.code)
        raise GraphDatabaseQueryException(message=message, code=exc.code)
    finally:
        if session_wrapper is not None:
            session_wrapper.close()
 def execute_read_query(
    database: str,
    cypher: str,
    parameters: dict[str, Any] | None = None,
 ) -> neo4j.graph.Graph:
    with get_session(database, default_access_mode=neo4j.READ_ACCESS) as session:
        def _run(tx: neo4j.ManagedTransaction) -> neo4j.graph.Graph:
            result = tx.run(
                cypher, parameters or {}, timeout=READ_QUERY_TIMEOUT_SECONDS
            )
            return result.graph()
        return session.execute_read(_run)
 def create_database(database: str) -> None:
    query = "CREATE DATABASE $database IF NOT EXISTS"
    parameters = {"database": database}
    with get_session() as session:
        session.run(query, parameters)
 def drop_database(database: str) -> None:
    query = f"DROP DATABASE `{database}` IF EXISTS DESTROY DATA"
    with get_session() as session:
        session.run(query)
 def drop_subgraph(database: str, provider_id: str) -> int:
    """
    Delete all nodes for a provider from the tenant database.
    Deletes relationships then nodes in batches (not `DETACH DELETE`) so a dense
    provider's graph cannot exceed Neo4j's transaction memory limit.
    Silently returns 0 if the database doesn't exist.
    """
    provider_label = get_provider_label(provider_id)
    deleted_nodes = 0
    try:
        with get_session(database) as session:
            # Phase 1: delete relationships incident to provider nodes in batches.
            deleted_count = 1
            while deleted_count > 0:
                result = session.run(
                    f"""
                    MATCH (:`{provider_label}`)-[r]-()
                    WITH DISTINCT r LIMIT $batch_size
                    DELETE r
                    RETURN COUNT(r) AS deleted_rels_count
                    """,
                    {"batch_size": BATCH_SIZE},
                )
                deleted_count = result.single().get("deleted_rels_count", 0)
            # Phase 2: delete the now relationship-free nodes in batches.
            deleted_count = 1
            while deleted_count > 0:
                result = session.run(
                    f"""
                    MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`)
                    WITH n LIMIT $batch_size
                    DELETE n
                    RETURN COUNT(n) AS deleted_nodes_count
                    """,
                    {"batch_size": BATCH_SIZE},
                )
                deleted_count = result.single().get("deleted_nodes_count", 0)
                deleted_nodes += deleted_count
    except GraphDatabaseQueryException as exc:
        if exc.code == "Neo.ClientError.Database.DatabaseNotFound":
            return 0
        raise
    return deleted_nodes
 def has_provider_data(database: str, provider_id: str) -> bool:
    """
    Check if any ProviderResource node exists for this provider.
    Returns `False` if the database doesn't exist.
    """
    provider_label = get_provider_label(provider_id)
    query = f"MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`) RETURN 1 LIMIT 1"
    try:
        with get_session(database, default_access_mode=neo4j.READ_ACCESS) as session:
            result = session.run(query)
            return result.single() is not None
    except GraphDatabaseQueryException as exc:
        if exc.code == "Neo.ClientError.Database.DatabaseNotFound":
            return False
        raise
 def clear_cache(database: str) -> None:
    query = "CALL db.clearQueryCaches()"
    try:
        with get_session(database) as session:
            session.run(query)
    except GraphDatabaseQueryException as exc:
        logging.warning(f"Failed to clear query cache for database `{database}`: {exc}")
 # Neo4j functions related to Prowler + Cartography
 def get_database_name(entity_id: str | UUID, temporary: bool = False) -> str:
    prefix = "tmp-scan" if temporary else "tenant"
    return f"db-{prefix}-{str(entity_id).lower()}"
 # Exceptions
@@ -270,7 +41,6 @@ class GraphDatabaseQueryException(Exception):
    def __str__(self) -> str:
        if self.code:
            return f"{self.code}: {self.message}"
        return self.message
@@ -280,3 +50,152 @@ class WriteQueryNotAllowedException(GraphDatabaseQueryException):
 class ClientStatementException(GraphDatabaseQueryException):
    pass
 # Routing
 def _is_ingest_database(database: str | None) -> bool:
    return bool(database) and database.startswith(TEMP_DB_PREFIX)
 # Driver lifecycle
 def init_driver() -> Any:
    """Initialize the configured sink backend.
    The ingest driver (Neo4j for cartography temp DBs) stays lazy: it is
    only initialized when a temp-DB operation actually runs, which never
    happens on API pods.
    """
    return sink_module.init()
 def close_driver() -> None:
    """Close every driver held by this process."""
    sink_module.close()
    ingest.close_driver()
 def get_driver() -> neo4j.Driver:
    """Return the sink backend's underlying driver.
    Only meaningful for the Neo4j sink (where the backend has a single Neo4j
    driver). On Neptune this returns the writer driver. Kept for tests and
    legacy call-sites; prefer `get_session` for new code.
    """
    backend = sink_module.get_backend()
    # Neo4jSink exposes get_driver(); NeptuneSink exposes get_writer()
    if hasattr(backend, "get_driver"):
        return backend.get_driver()
    if hasattr(backend, "get_writer"):
        return backend.get_writer()
    raise RuntimeError("Active sink backend does not expose a driver handle")
 def verify_connectivity() -> None:
    """Raise if the configured graph database is unreachable on the API read path.
    Backend-agnostic entry point for the readiness probe: Neo4j verifies its
    driver, Neptune verifies the reader endpoint.
    """
    sink_module.get_backend().verify_connectivity()
 def get_uri() -> str:
    """Return the sink URI. Retained for backwards compatibility."""
    if settings.ATTACK_PATHS_SINK_DATABASE == "neptune":
        cfg = settings.DATABASES["neptune"]
        return f"bolt+s://{cfg['WRITER_ENDPOINT']}:{cfg['PORT']}"
    cfg = settings.DATABASES["neo4j"]
    return f"bolt://{cfg['HOST']}:{cfg['PORT']}"
 def get_ingest_uri() -> str:
    """Neo4j URI for the cartography temp (ingest) database, which is always
    Neo4j regardless of the configured sink."""
    return ingest.get_uri()
 # Session API
 def get_session(
    database: str | None = None,
    default_access_mode: str | None = None,
 ) -> AbstractContextManager:
    """Return a session against the right backend.
    - `database` names starting with `db-tmp-scan-` always go to ingest.
    - No database name → ingest (used for CREATE / DROP DATABASE admin ops).
    - Any other name → sink.
    """
    if _is_ingest_database(database) or database is None:
        return ingest.get_session(
            database=database, default_access_mode=default_access_mode
        )
    return sink_module.get_backend().get_session(
        database=database, default_access_mode=default_access_mode
    )
 def execute_read_query(
    database: str,
    cypher: str,
    parameters: dict[str, Any] | None = None,
 ) -> neo4j.graph.Graph:
    """Read-only query against the sink."""
    return sink_module.get_backend().execute_read_query(database, cypher, parameters)
 def create_database(database: str) -> None:
    """Create a database. Temp DBs always land on ingest (Neo4j).
    On the Neo4j sink, tenant DBs also route to ingest because both drivers
    connect to the same Neo4j cluster. On the Neptune sink, tenant DB creates
    are no-ops.
    """
    if _is_ingest_database(database):
        ingest.create_database(database)
        return
    sink_module.get_backend().create_database(database)
 def drop_database(database: str) -> None:
    """Drop a database. Mirrors `create_database` routing."""
    if _is_ingest_database(database):
        ingest.drop_database(database)
        return
    sink_module.get_backend().drop_database(database)
 def drop_subgraph(database: str, provider_id: str) -> int:
    return sink_module.get_backend().drop_subgraph(database, provider_id)
 def has_provider_data(database: str, provider_id: str) -> bool:
    return sink_module.get_backend().has_provider_data(database, provider_id)
 def clear_cache(database: str) -> None:
    if _is_ingest_database(database):
        ingest.clear_cache(database)
        return
    sink_module.get_backend().clear_cache(database)
 # Name helper
 def get_database_name(entity_id: str | UUID, temporary: bool = False) -> str:
    prefix = "tmp-scan" if temporary else "tenant"
    return f"db-{prefix}-{str(entity_id).lower()}"
@@ -0,0 +1,29 @@
 """Cartography ingest layer.
 Public surface for the per-scan Neo4j temp database driver. Implementation
 lives in `api.attack_paths.ingest.driver`.
 """
 from api.attack_paths.ingest.driver import (
    clear_cache,
    close_driver,
    create_database,
    drop_database,
    get_driver,
    get_session,
    get_uri,
    init_driver,
    run_cypher,
 )
 __all__ = [
    "clear_cache",
    "close_driver",
    "create_database",
    "drop_database",
    "get_driver",
    "get_session",
    "get_uri",
    "init_driver",
    "run_cypher",
 ]
@@ -0,0 +1,187 @@
 """Cartography ingest driver: per-scan throw-away Neo4j database.
 Cartography writes each scan's graph into a throw-away Neo4j database named
 `db-tmp-scan-{scan_uuid}`. This is always Neo4j, regardless of the configured
 sink: Neptune is single-database and cannot host per-scan throw-away
 databases. This module owns the Neo4j driver used for those temp DBs and the
 admin ops they need (CREATE / DROP DATABASE).
 """
 import atexit
 import logging
 import threading
 from collections.abc import Iterator
 from contextlib import contextmanager
 from typing import Any
 import neo4j
 import neo4j.exceptions
 from api.attack_paths.retryable_session import RetryableSession
 from config.env import env
 from django.conf import settings
 logging.getLogger("neo4j").setLevel(logging.ERROR)
 logging.getLogger("neo4j").propagate = False
 SERVICE_UNAVAILABLE_MAX_RETRIES = env.int(
    "ATTACK_PATHS_SERVICE_UNAVAILABLE_MAX_RETRIES", default=3
 )
 CONN_ACQUISITION_TIMEOUT = env.int("NEO4J_CONN_ACQUISITION_TIMEOUT", default=15)
 # TCP connect timeout, ordered below the acquisition timeout so an unreachable
 # host can't pin a worker on a temp-DB op longer than this.
 CONNECTION_TIMEOUT = env.int("NEO4J_CONNECTION_TIMEOUT", default=5)
 MAX_CONNECTION_LIFETIME = env.int("NEO4J_MAX_CONNECTION_LIFETIME", default=7200)
 MAX_CONNECTION_POOL_SIZE = env.int("NEO4J_MAX_CONNECTION_POOL_SIZE", default=50)
 _driver: neo4j.Driver | None = None
 _lock = threading.Lock()
 def _neo4j_config() -> dict:
    return settings.DATABASES["neo4j"]
 def get_uri() -> str:
    """Bolt URI for the Neo4j temp (ingest) database. Always Neo4j."""
    config = _neo4j_config()
    host = config["HOST"]
    port = config["PORT"]
    if not host or not port:
        raise RuntimeError(
            "NEO4J_HOST / NEO4J_PORT must be set to use the attack-paths "
            "temp database. Workers require Neo4j env even when the sink is Neptune."
        )
    return f"bolt://{host}:{port}"
 def init_driver() -> neo4j.Driver:
    """Initialize the temp-database Neo4j driver. Idempotent."""
    global _driver
    if _driver is not None:
        return _driver
    with _lock:
        if _driver is None:
            config = _neo4j_config()
            _driver = neo4j.GraphDatabase.driver(
                get_uri(),
                auth=(config["USER"], config["PASSWORD"]),
                keep_alive=True,
                max_connection_lifetime=MAX_CONNECTION_LIFETIME,
                connection_timeout=CONNECTION_TIMEOUT,
                connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
                max_connection_pool_size=MAX_CONNECTION_POOL_SIZE,
            )
            # Best-effort connectivity check: a Neo4j that is down at boot must
            # not crash the worker. The driver reconnects lazily on first use.
            try:
                _driver.verify_connectivity()
            except Exception:
                logging.warning(
                    "Neo4j temp-database unreachable at init; continuing with a "
                    "lazily-reconnecting driver",
                    exc_info=True,
                )
            atexit.register(close_driver)
    return _driver
 def get_driver() -> neo4j.Driver:
    return init_driver()
 def close_driver() -> None:
    global _driver
    with _lock:
        if _driver is not None:
            try:
                _driver.close()
            finally:
                _driver = None
@contextmanager
 def get_session(
    database: str | None = None,
    default_access_mode: str | None = None,
 ) -> Iterator[RetryableSession]:
    """Session against the Neo4j temp-database cluster. Used for temp DB sessions
    and for admin operations (CREATE / DROP DATABASE) when `database` is None."""
    from api.attack_paths.database import (
        ClientStatementException,
        GraphDatabaseQueryException,
        WriteQueryNotAllowedException,
    )
    READ_EXCEPTION_CODES = [
        "Neo.ClientError.Statement.AccessMode",
        "Neo.ClientError.Procedure.ProcedureNotFound",
    ]
    CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."
    session_wrapper: RetryableSession | None = None
    try:
        session_wrapper = RetryableSession(
            session_factory=lambda: get_driver().session(
                database=database, default_access_mode=default_access_mode
            ),
            max_retries=SERVICE_UNAVAILABLE_MAX_RETRIES,
        )
        yield session_wrapper
    except neo4j.exceptions.Neo4jError as exc:
        if (
            default_access_mode == neo4j.READ_ACCESS
            and exc.code
            and exc.code in READ_EXCEPTION_CODES
        ):
            raise WriteQueryNotAllowedException(
                message="Read query not allowed", code=READ_EXCEPTION_CODES[0]
            )
        message = exc.message if exc.message is not None else str(exc)
        if exc.code and exc.code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
            raise ClientStatementException(message=message, code=exc.code)
        raise GraphDatabaseQueryException(message=message, code=exc.code)
    finally:
        if session_wrapper is not None:
            session_wrapper.close()
 def create_database(database: str) -> None:
    """Create a database on the Neo4j cluster. Used for temp scan DBs."""
    with get_session() as session:
        session.run("CREATE DATABASE $database IF NOT EXISTS", {"database": database})
 def drop_database(database: str) -> None:
    """Drop a database on the Neo4j cluster. Used for temp scan DBs."""
    with get_session() as session:
        session.run(f"DROP DATABASE `{database}` IF EXISTS DESTROY DATA")
 def clear_cache(database: str) -> None:
    """Best-effort cache clear for a Neo4j database."""
    from api.attack_paths.database import GraphDatabaseQueryException
    try:
        with get_session(database) as session:
            session.run("CALL db.clearQueryCaches()")
    except GraphDatabaseQueryException as exc:
        logging.warning(f"Failed to clear query cache for database `{database}`: {exc}")
 def run_cypher(
    database: str | None,
    cypher: str,
    parameters: dict[str, Any] | None = None,
 ) -> Any:
     """Run Cypher in a short-lived session so callers need not open `get_session` themselves. Thin helper."""
    with get_session(database) as session:
        return session.run(cypher, parameters or {})
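`init_driver` above uses double-checked locking to build the process-wide driver once. The sketch below isolates that pattern with a hypothetical `LazySingleton` class and a stand-in factory in place of `neo4j.GraphDatabase.driver`. It illustrates the locking only and leaves out the connectivity probe and the `atexit` hook.

```python
import threading
from collections.abc import Callable
from typing import Any


class LazySingleton:
    """Hypothetical holder that creates one instance on first use."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._instance: Any | None = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        # Fast path: skip the lock once the instance exists.
        if self._instance is not None:
            return self._instance
        with self._lock:
            # Re-check under the lock: another thread may have won the race.
            if self._instance is None:
                self._instance = self._factory()
            return self._instance

    def close(self) -> None:
        with self._lock:
            if self._instance is not None:
                try:
                    self._instance.close()
                finally:
                    # Reset even if close() raises, so get() can rebuild.
                    self._instance = None
```

The second `is None` check matters when several threads pass the first check at the same time. Without it, each of them would call the factory in turn and all but the last connection pool would leak.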
@@ -1,12 +1,14 @@
 from api.attack_paths.queries.aws import AWS_QUERIES
 # TODO: drop after Neptune cutover
 from api.attack_paths.queries.aws_deprecated import AWS_DEPRECATED_QUERIES
 from api.attack_paths.queries.types import AttackPathsQueryDefinition
-# Query definitions organized by provider
+# Query definitions for scans synced with the current schema.
 _QUERY_DEFINITIONS: dict[str, list[AttackPathsQueryDefinition]] = {
    "aws": AWS_QUERIES,
 }
 # Flat lookup by query ID for O(1) access
 _QUERIES_BY_ID: dict[str, AttackPathsQueryDefinition] = {
    definition.id: definition
    for definitions in _QUERY_DEFINITIONS.values()
@@ -14,11 +16,45 @@ _QUERIES_BY_ID: dict[str, AttackPathsQueryDefinition] = {
 }
-def get_queries_for_provider(provider: str) -> list[AttackPathsQueryDefinition]:
+# TODO: drop after Neptune cutover
-    """Get all attack path queries for a specific provider."""
+#
-    return _QUERY_DEFINITIONS.get(provider, [])
+# Query definitions for pre-cutover scans (`AttackPathsScan.is_migrated=False`)
 # whose graph data was written under the previous schema. Both maps expose the
 # same query IDs so the API contract is identical regardless of which set is
 # routed to.
 _DEPRECATED_QUERY_DEFINITIONS: dict[str, list[AttackPathsQueryDefinition]] = {
    "aws": AWS_DEPRECATED_QUERIES,
 }
 _DEPRECATED_QUERIES_BY_ID: dict[str, AttackPathsQueryDefinition] = {
    definition.id: definition
    for definitions in _DEPRECATED_QUERY_DEFINITIONS.values()
    for definition in definitions
 }
-def get_query_by_id(query_id: str) -> AttackPathsQueryDefinition | None:
+def get_queries_for_provider(
-    """Get a specific attack path query by its ID."""
+    provider: str,
-    return _QUERIES_BY_ID.get(query_id)
+    is_migrated: bool = True,
 ) -> list[AttackPathsQueryDefinition]:
    """Get all attack path queries for a provider.
    `is_migrated` selects the catalog: True for scans synced with the current
    schema, False for pre-cutover scans still using the legacy graph shape.
    # TODO: drop the `is_migrated` parameter after Neptune cutover
    """
    catalog = _QUERY_DEFINITIONS if is_migrated else _DEPRECATED_QUERY_DEFINITIONS
    return catalog.get(provider, [])
 def get_query_by_id(
    query_id: str,
    is_migrated: bool = True,
 ) -> AttackPathsQueryDefinition | None:
    """Get a specific attack path query by ID.
    `is_migrated` selects the catalog (see `get_queries_for_provider`).
    # TODO: drop the `is_migrated` parameter after Neptune cutover
    """
    by_id = _QUERIES_BY_ID if is_migrated else _DEPRECATED_QUERIES_BY_ID
    return by_id.get(query_id)
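
The catalog routing above can be sketched in isolation. This is a minimal illustration, not the project's code: `QueryDefinition`, the query ID, and both Cypher strings are hypothetical stand-ins for `AttackPathsQueryDefinition` and the real AWS catalogs. It shows how two maps that expose the same IDs let callers switch schema by flag without changing the API contract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryDefinition:
    """Hypothetical stand-in for AttackPathsQueryDefinition."""

    id: str
    cypher: str


# Hypothetical catalogs: same query IDs, different graph shapes.
CURRENT = {"aws": [QueryDefinition("aws-public-buckets", "MATCH (b:S3Bucket) RETURN b")]}
DEPRECATED = {"aws": [QueryDefinition("aws-public-buckets", "MATCH (b:LegacyBucket) RETURN b")]}


def index_by_id(catalog: dict) -> dict:
    # Flatten {provider: [definitions]} into {query_id: definition}.
    return {d.id: d for definitions in catalog.values() for d in definitions}


CURRENT_BY_ID = index_by_id(CURRENT)
DEPRECATED_BY_ID = index_by_id(DEPRECATED)


def get_query_by_id(query_id: str, is_migrated: bool = True):
    # The flag picks the catalog; the lookup itself is identical.
    by_id = CURRENT_BY_ID if is_migrated else DEPRECATED_BY_ID
    return by_id.get(query_id)
```

A caller would presumably pass the scan's `is_migrated` value straight through, so pre-cutover scans resolve the same ID to the legacy Cypher.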
@@ -0,0 +1,28 @@
 """Attack-paths sink database layer.
 The sink is the persistent store where attack-paths graphs live after a scan
 finishes. Currently selectable between Neo4j (OSS / local dev default) and
 Amazon Neptune (hosted dev/staging/prod). The backend is picked from the
 `ATTACK_PATHS_SINK_DATABASE` setting when the sink is first initialized.
 This package exposes the public factory API; the implementation lives in
 `api.attack_paths.sink.factory`.
 """
 from api.attack_paths.sink.factory import (
    SinkBackend,
    close,
    get_backend,
    get_backend_for_name,
    get_backend_for_scan,
    init,
 )
 __all__ = [
    "SinkBackend",
    "close",
    "get_backend",
    "get_backend_for_name",
    "get_backend_for_scan",
    "init",
 ]
@@ -0,0 +1,92 @@
 """Protocol every sink backend must implement."""
 from contextlib import AbstractContextManager
 from typing import Any, Protocol
 import neo4j
 class SinkDatabase(Protocol):
    """Contract for the persistent attack-paths graph store.
    The `database` argument is an opaque identifier passed through from the
    legacy `database.py` API surface. On Neo4j it is the per-tenant database
    name (e.g. `db-tenant-{uuid}`). On Neptune it is ignored (the cluster
    has a single graph, and isolation is label-based).
    """
    def init(self) -> None: ...
    def close(self) -> None: ...
    def verify_connectivity(self) -> None:
        """Raise if the backend the API read path uses is unreachable.
        Neo4j verifies its single driver. Neptune verifies the reader
        driver (the endpoint the API serves reads from); on single-endpoint
        clusters the reader aliases the writer, so that path is covered too.
        Used by the readiness probe; must not block longer than the caller's
        probe budget.
        """
        ...
    def get_session(
        self,
        database: str | None = None,
        default_access_mode: str | None = None,
    ) -> AbstractContextManager: ...
    def execute_read_query(
        self,
        database: str,
        cypher: str,
        parameters: dict[str, Any] | None = None,
    ) -> neo4j.graph.Graph: ...
    def create_database(self, database: str) -> None: ...
    def drop_database(self, database: str) -> None: ...
    def drop_subgraph(self, database: str, provider_id: str) -> int: ...
    def has_provider_data(self, database: str, provider_id: str) -> bool: ...
    def clear_cache(self, database: str) -> None: ...
    def ensure_sync_indexes(self, database: str) -> None:
        """Create any index needed for the sync write path.
        Called once at the start of each provider sync; must be idempotent.
        Neo4j creates a `_provider_element_id` index on `_ProviderResource`;
        Neptune is a no-op (its `~id` lookup needs no index).
        """
        ...
    def write_nodes(
        self,
        database: str,
        labels: str,
        rows: list[dict[str, Any]],
    ) -> None:
        """Upsert a batch of nodes into the sink.
        `labels` is a pre-rendered Cypher label string ready to drop after
        the node variable (e.g. `` `AWSUser`:`_ProviderResource`:`_Tenant_x` ``).
        Each row carries `provider_element_id` and `props`.
        """
        ...
    def write_relationships(
        self,
        database: str,
        rel_type: str,
        provider_id: str,
        rows: list[dict[str, Any]],
    ) -> None:
        """Upsert a batch of relationships into the sink.
        Each row carries `start_element_id`, `end_element_id`,
        `provider_element_id` and `props`. `rel_type` is the relationship
        type (already a valid Cypher identifier).
        """
        ...
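
Because the contract is a `Protocol`, any class with matching methods satisfies it structurally. The sketch below illustrates that with a dict-backed fake that could stand in for a real sink in unit tests. `MiniSink`, `InMemorySink`, `sync`, and the `provider_id` prop are assumptions made for this example; the real backends isolate providers by label, not by a property.

```python
from typing import Any, Protocol


class MiniSink(Protocol):
    """Reduced, hypothetical slice of the sink contract."""

    def write_nodes(self, database: str, labels: str, rows: list[dict[str, Any]]) -> None: ...
    def has_provider_data(self, database: str, provider_id: str) -> bool: ...
    def drop_subgraph(self, database: str, provider_id: str) -> int: ...


class InMemorySink:
    """Dict-backed fake; matches MiniSink without inheriting from it."""

    def __init__(self) -> None:
        # (database, provider_element_id) -> node properties
        self._nodes: dict[tuple[str, str], dict[str, Any]] = {}

    def write_nodes(self, database, labels, rows):
        for row in rows:
            key = (database, row["provider_element_id"])
            # Upsert: merge props into an existing node instead of duplicating it.
            self._nodes.setdefault(key, {}).update(row["props"])

    def has_provider_data(self, database, provider_id):
        return any(
            db == database and props.get("provider_id") == provider_id
            for (db, _), props in self._nodes.items()
        )

    def drop_subgraph(self, database, provider_id):
        doomed = [
            key
            for key, props in self._nodes.items()
            if key[0] == database and props.get("provider_id") == provider_id
        ]
        for key in doomed:
            del self._nodes[key]
        return len(doomed)


def sync(sink: MiniSink, database: str, rows: list[dict[str, Any]]) -> None:
    # Callers depend only on the protocol, never on a concrete backend.
    sink.write_nodes(database, "`AWSUser`:`_ProviderResource`", rows)
```

Writing the same batch twice leaves a single node, which is the upsert behavior the `write_nodes` docstring asks of every backend.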
@@ -0,0 +1,134 @@
 """Sink backend factory and process-wide handle cache.
 Picks the active backend from `settings.ATTACK_PATHS_SINK_DATABASE` at first
 use, holds the active backend plus any secondary backends needed to serve
 scans written under the previous configuration, and tears them all down on
 process shutdown. Imported via `from api.attack_paths import sink as
 sink_module`.
 """
 import threading
 from enum import StrEnum, auto
 from api.attack_paths.sink.base import SinkDatabase
 from api.models import AttackPathsScan
 from django.conf import settings
 # Backend names
 class SinkBackend(StrEnum):
    NEO4J = auto()
    NEPTUNE = auto()
 # Backend cache
 _backend: SinkDatabase | None = None
 _secondary_backends: dict[SinkBackend, SinkDatabase] = {}
 _lock = threading.Lock()
 def _resolve_setting() -> SinkBackend:
    raw = settings.ATTACK_PATHS_SINK_DATABASE.lower()
    try:
        return SinkBackend(raw)
    except ValueError as exc:
        valid = sorted(b.value for b in SinkBackend)
        raise RuntimeError(
            f"ATTACK_PATHS_SINK_DATABASE must be one of {valid}; got {raw!r}"
        ) from exc
 def _build_backend(name: SinkBackend) -> SinkDatabase:
    if name is SinkBackend.NEO4J:
        from api.attack_paths.sink.neo4j import Neo4jSink
        return Neo4jSink()
    if name is SinkBackend.NEPTUNE:
        from api.attack_paths.sink.neptune import NeptuneSink
        return NeptuneSink()
    raise RuntimeError(f"Unknown sink backend {name!r}")
 # Lifecycle
 def init(name: SinkBackend | str | None = None) -> SinkDatabase:
    """Initialize the configured sink backend. Idempotent."""
    global _backend
    if _backend is not None:
        return _backend
    with _lock:
        if _backend is None:
            resolved = SinkBackend(name) if name else _resolve_setting()
            backend = _build_backend(resolved)
            backend.init()
            _backend = backend
    return _backend
 def close() -> None:
    """Close the active backend and every cached secondary backend."""
    global _backend
    with _lock:
        backends = [
            b for b in (_backend, *_secondary_backends.values()) if b is not None
        ]
        _backend = None
        _secondary_backends.clear()
    for backend in backends:
        try:
            backend.close()
        except Exception:  # pragma: no cover - best-effort
            pass
 def get_backend() -> SinkDatabase:
    """Return the active sink. Initializes on first call."""
    return init()
 # Per-scan routing
 def get_backend_for_scan(scan: AttackPathsScan) -> SinkDatabase:
    """Route reads by the sink that stores this scan's graph."""
    raw_backend = getattr(scan, "sink_backend", SinkBackend.NEO4J.value)
    if not isinstance(raw_backend, str):
        raw_backend = SinkBackend.NEO4J.value
    return get_backend_for_name(raw_backend)
 def get_backend_for_name(name: SinkBackend | str) -> SinkDatabase:
    """Return the backend named by persisted scan metadata."""
    resolved = SinkBackend(name)
    if resolved is _resolve_setting():
        return get_backend()
    return _build_backend_cached(resolved)
 def _build_backend_cached(name: SinkBackend) -> SinkDatabase:
    # TODO: drop after Neptune cutover
    # Needed only during cutover to serve Neo4j-written scans from a Neptune-
    # configured API pod (and vice versa). Once every scan is on Neptune,
    # `get_backend_for_scan` becomes a one-liner returning `get_backend()`.
    if name in _secondary_backends:
        return _secondary_backends[name]
    with _lock:
        if name not in _secondary_backends:
            backend = _build_backend(name)
            backend.init()
            _secondary_backends[name] = backend
    return _secondary_backends[name]
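
The factory combines two ideas: validating the setting against an enum, and double-checked locking so the backend is built once even under concurrent first use. The following is a simplified sketch of both. `Backend`, `resolve`, and `LazyHandle` are hypothetical names, and the enum uses explicit values with a `str` mixin rather than `StrEnum` so the example also runs on Python versions before 3.11.

```python
import threading
from enum import Enum


class Backend(str, Enum):
    NEO4J = "neo4j"
    NEPTUNE = "neptune"


def resolve(raw: str) -> Backend:
    """Map a raw setting to a Backend, failing loudly on unknown values."""
    try:
        return Backend(raw.lower())
    except ValueError as exc:
        valid = sorted(b.value for b in Backend)
        raise RuntimeError(f"sink must be one of {valid}; got {raw!r}") from exc


class LazyHandle:
    """Build a value once on first use; later calls skip the lock."""

    def __init__(self, factory):
        self._factory = factory
        self._value = None
        self._lock = threading.Lock()
        self.builds = 0  # exposed only so the example can be checked

    def get(self):
        # Fast path: already built, no lock needed.
        if self._value is not None:
            return self._value
        with self._lock:
            # Re-check under the lock: another thread may have won the race.
            if self._value is None:
                self.builds += 1
                self._value = self._factory()
        return self._value
```

The second check inside the lock is what prevents two threads that both passed the fast path from each constructing a driver.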
@@ -0,0 +1,454 @@
 """Neo4j sink implementation.
 Owns a Neo4j driver independent from the staging driver. On OSS and local dev
 this is the only sink; on hosted deployments it runs only as a legacy read
 path while phase-1 drains tenant DBs.
 """
 import atexit
 import logging
 import threading
 import time
 from collections.abc import Iterator
 from contextlib import AbstractContextManager, contextmanager
 from typing import Any
 import neo4j
 import neo4j.exceptions
 from api.attack_paths.retryable_session import RetryableSession
 from api.attack_paths.sink.base import SinkDatabase
 from config.env import env
 from django.conf import settings
 logging.getLogger("neo4j").setLevel(logging.ERROR)
 logging.getLogger("neo4j").propagate = False
 logger = logging.getLogger(__name__)
 SERVICE_UNAVAILABLE_MAX_RETRIES = env.int(
    "ATTACK_PATHS_SERVICE_UNAVAILABLE_MAX_RETRIES", default=3
 )
 READ_QUERY_TIMEOUT_SECONDS = env.int(
    "ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30
 )
 CONN_ACQUISITION_TIMEOUT = env.int("NEO4J_CONN_ACQUISITION_TIMEOUT", default=15)
 # TCP connect timeout, ordered below the acquisition timeout so an unreachable
 # host can't pin a request or the readiness probe longer than this.
 CONNECTION_TIMEOUT = env.int("NEO4J_CONNECTION_TIMEOUT", default=5)
 MAX_CONNECTION_LIFETIME = env.int("NEO4J_MAX_CONNECTION_LIFETIME", default=7200)
 MAX_CONNECTION_POOL_SIZE = env.int("NEO4J_MAX_CONNECTION_POOL_SIZE", default=50)
 READ_EXCEPTION_CODES = [
    "Neo.ClientError.Statement.AccessMode",
    "Neo.ClientError.Procedure.ProcedureNotFound",
 ]
 CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."
 DATABASE_NOT_FOUND_CODE = "Neo.ClientError.Database.DatabaseNotFound"
 class Neo4jSink(SinkDatabase):
    """Neo4j-backed sink. Multi-database cluster; tenant isolation is physical."""
    def __init__(self) -> None:
        self._driver: neo4j.Driver | None = None
        self._lock = threading.Lock()
        self._atexit_registered = False
    # Driver
    def _config(self) -> dict:
        return settings.DATABASES["neo4j"]
    def _uri(self) -> str:
        cfg = self._config()
        host = cfg["HOST"]
        port = cfg["PORT"]
        if not host or not port:
            raise RuntimeError(
                "NEO4J_HOST / NEO4J_PORT must be set when ATTACK_PATHS_SINK_DATABASE=neo4j"
            )
        return f"bolt://{host}:{port}"
    def init(self) -> neo4j.Driver:
        if self._driver is not None:
            return self._driver
        with self._lock:
            if self._driver is None:
                cfg = self._config()
                self._driver = neo4j.GraphDatabase.driver(
                    self._uri(),
                    auth=(cfg["USER"], cfg["PASSWORD"]),
                    keep_alive=True,
                    max_connection_lifetime=MAX_CONNECTION_LIFETIME,
                    connection_timeout=CONNECTION_TIMEOUT,
                    connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
                    max_connection_pool_size=MAX_CONNECTION_POOL_SIZE,
                )
                # Eager connectivity check is best-effort. A Neo4j that is down
                # at boot must not crash the process (same degradation model as
                # Postgres): the driver reconnects lazily on first use, and
                # /health/ready surfaces the outage until it recovers.
                try:
                    self._driver.verify_connectivity()
                except Exception:
                    logger.warning(
                        "Neo4j sink unreachable at init; continuing with a lazily-reconnecting driver",
                        exc_info=True,
                    )
                if not self._atexit_registered:
                    atexit.register(self.close)
                    self._atexit_registered = True
        return self._driver
    def _get_driver(self) -> neo4j.Driver:
        return self.init()
    def verify_connectivity(self) -> None:
        self._get_driver().verify_connectivity()
    def close(self) -> None:
        with self._lock:
            if self._driver is not None:
                try:
                    self._driver.close()
                finally:
                    self._driver = None
    # Sessions
    @contextmanager
    def get_session(
        self,
        database: str | None = None,
        default_access_mode: str | None = None,
    ) -> Iterator[RetryableSession]:
        from api.attack_paths.database import (
            ClientStatementException,
            GraphDatabaseQueryException,
            WriteQueryNotAllowedException,
        )
        session_wrapper: RetryableSession | None = None
        try:
            session_wrapper = RetryableSession(
                session_factory=lambda: self._get_driver().session(
                    database=database, default_access_mode=default_access_mode
                ),
                max_retries=SERVICE_UNAVAILABLE_MAX_RETRIES,
            )
            yield session_wrapper
        except neo4j.exceptions.Neo4jError as exc:
            if (
                default_access_mode == neo4j.READ_ACCESS
                and exc.code
                and exc.code in READ_EXCEPTION_CODES
            ):
                raise WriteQueryNotAllowedException(
                    message="Read query not allowed", code=READ_EXCEPTION_CODES[0]
                )
            message = exc.message if exc.message is not None else str(exc)
            if exc.code and exc.code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
                raise ClientStatementException(message=message, code=exc.code)
            raise GraphDatabaseQueryException(message=message, code=exc.code)
        finally:
            if session_wrapper is not None:
                session_wrapper.close()
    # Operations
    def execute_read_query(
        self,
        database: str,
        cypher: str,
        parameters: dict[str, Any] | None = None,
    ) -> neo4j.graph.Graph:
        with self.get_session(
            database, default_access_mode=neo4j.READ_ACCESS
        ) as session:
            def _run(tx: neo4j.ManagedTransaction) -> neo4j.graph.Graph:
                result = tx.run(
                    cypher, parameters or {}, timeout=READ_QUERY_TIMEOUT_SECONDS
                )
                return result.graph()
            return session.execute_read(_run)
    def create_database(self, database: str) -> None:
        with self.get_session() as session:
            session.run(
                "CREATE DATABASE $database IF NOT EXISTS", {"database": database}
            )
    def drop_database(self, database: str) -> None:
        with self.get_session() as session:
            session.run(f"DROP DATABASE `{database}` IF EXISTS DESTROY DATA")
    def drop_subgraph(self, database: str, provider_id: str) -> int:
        """Delete all nodes for a provider from a tenant database, batched.
        Deletes relationships then nodes in batches (not `DETACH DELETE`) so a
        dense provider's graph cannot exceed Neo4j's transaction memory limit.
        Silently returns 0 if the database doesn't exist.
        """
        from api.attack_paths.database import GraphDatabaseQueryException
        from tasks.jobs.attack_paths.config import (
            BATCH_SIZE,
            PROVIDER_RESOURCE_LABEL,
            get_provider_label,
        )
        provider_label = get_provider_label(provider_id)
        deleted_nodes = 0
        deleted_relationships = 0
        relationship_batches = 0
        node_batches = 0
        drop_t0 = time.perf_counter()
        logger.info(
            "Dropping provider graph from Neo4j sink database %s "
            "(provider=%s, provider_label=%s)",
            database,
            provider_id,
            provider_label,
        )
        try:
            logger.info(
                "Opening Neo4j sink session for provider graph drop "
                "(database=%s, provider=%s)",
                database,
                provider_id,
            )
            with self.get_session(database) as session:
                logger.info(
                    "Opened Neo4j sink session for provider graph drop "
                    "(database=%s, provider=%s)",
                    database,
                    provider_id,
                )
                # Phase 1: delete relationships incident to provider nodes in
                # batches. The undirected pattern matches an edge between two
                # provider nodes from both ends, so `DISTINCT r` dedupes it to
                # delete a full batch of unique relationships each round.
                deleted_count = 1
                while deleted_count > 0:
                    next_batch = relationship_batches + 1
                    logger.info(
                        "Deleting relationship batch from Neo4j sink database %s "
                        "(provider=%s, batch=%s, total_rels=%s, elapsed=%.3fs)",
                        database,
                        provider_id,
                        next_batch,
                        deleted_relationships,
                        time.perf_counter() - drop_t0,
                    )
                    result = session.run(
                        f"""
                        MATCH (:`{provider_label}`)-[r]-()
                        WITH DISTINCT r LIMIT $batch_size
                        DELETE r
                        RETURN COUNT(r) AS deleted_rels_count
                        """,
                        {"batch_size": BATCH_SIZE},
                    )
                    deleted_count = result.single().get("deleted_rels_count", 0)
                    if deleted_count > 0:
                        relationship_batches += 1
                        deleted_relationships += deleted_count
                        logger.info(
                            "Deleted relationship batch from Neo4j sink database %s "
                            "(provider=%s, batch=%s, deleted_rels=%s, "
                            "total_rels=%s, elapsed=%.3fs)",
                            database,
                            provider_id,
                            relationship_batches,
                            deleted_count,
                            deleted_relationships,
                            time.perf_counter() - drop_t0,
                        )
                # Phase 2: delete the now relationship-free nodes in batches.
                deleted_count = 1
                while deleted_count > 0:
                    next_batch = node_batches + 1
                    logger.info(
                        "Deleting node batch from Neo4j sink database %s "
                        "(provider=%s, batch=%s, total_nodes=%s, elapsed=%.3fs)",
                        database,
                        provider_id,
                        next_batch,
                        deleted_nodes,
                        time.perf_counter() - drop_t0,
                    )
                    result = session.run(
                        f"""
                        MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`)
                        WITH n LIMIT $batch_size
                        DELETE n
                        RETURN COUNT(n) AS deleted_nodes_count
                        """,
                        {"batch_size": BATCH_SIZE},
                    )
                    deleted_count = result.single().get("deleted_nodes_count", 0)
                    if deleted_count > 0:
                        node_batches += 1
                        deleted_nodes += deleted_count
                        logger.info(
                            "Deleted node batch from Neo4j sink database %s "
                            "(provider=%s, batch=%s, deleted_nodes=%s, "
                            "total_nodes=%s, elapsed=%.3fs)",
                            database,
                            provider_id,
                            node_batches,
                            deleted_count,
                            deleted_nodes,
                            time.perf_counter() - drop_t0,
                        )
        except GraphDatabaseQueryException as exc:
            if exc.code == DATABASE_NOT_FOUND_CODE:
                logger.info(
                    "Skipped provider graph drop from Neo4j sink database %s "
                    "(provider=%s, reason=database_not_found, elapsed=%.3fs)",
                    database,
                    provider_id,
                    time.perf_counter() - drop_t0,
                )
                return 0
            raise
        logger.info(
            "Finished dropping provider graph from Neo4j sink database %s "
            "(provider=%s, relationship_batches=%s, deleted_rels=%s, "
            "node_batches=%s, deleted_nodes=%s, elapsed=%.3fs)",
            database,
            provider_id,
            relationship_batches,
            deleted_relationships,
            node_batches,
            deleted_nodes,
            time.perf_counter() - drop_t0,
        )
        return deleted_nodes
    def has_provider_data(self, database: str, provider_id: str) -> bool:
        from api.attack_paths.database import GraphDatabaseQueryException
        from tasks.jobs.attack_paths.config import (
            PROVIDER_RESOURCE_LABEL,
            get_provider_label,
        )
        provider_label = get_provider_label(provider_id)
        query = (
            f"MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`) RETURN 1 LIMIT 1"
        )
        try:
            with self.get_session(
                database, default_access_mode=neo4j.READ_ACCESS
            ) as session:
                result = session.run(query)
                return result.single() is not None
        except GraphDatabaseQueryException as exc:
            if exc.code == DATABASE_NOT_FOUND_CODE:
                return False
            raise
    def clear_cache(self, database: str) -> None:
        from api.attack_paths.database import GraphDatabaseQueryException
        try:
            with self.get_session(database) as session:
                session.run("CALL db.clearQueryCaches()")
        except GraphDatabaseQueryException as exc:
            logger.warning(
                "Failed to clear query cache for database `%s`: %s", database, exc
            )
    # Sync write path
    def ensure_sync_indexes(self, database: str) -> None:
        """Create the `_provider_element_id` lookup index on `_ProviderResource`.
        Every synced node carries the `_ProviderResource` label, so a single
        index covers both node-upserts and relationship endpoint MATCHes.
        Without this index the rel sync degrades to a label scan per row and
        large provider syncs become unworkable.
        """
        from tasks.jobs.attack_paths.config import (
            PROVIDER_ELEMENT_ID_PROPERTY,
            PROVIDER_RESOURCE_LABEL,
        )
        query = (
            f"CREATE INDEX provider_element_id_idx IF NOT EXISTS "
            f"FOR (n:`{PROVIDER_RESOURCE_LABEL}`) "
            f"ON (n.`{PROVIDER_ELEMENT_ID_PROPERTY}`)"
        )
        with self.get_session(database) as session:
            session.run(query).consume()
    def write_nodes(
        self,
        database: str,
        labels: str,
        rows: list[dict[str, Any]],
    ) -> None:
        if not rows:
            return
        from tasks.jobs.attack_paths.config import (
            PROVIDER_ELEMENT_ID_PROPERTY,
            PROVIDER_RESOURCE_LABEL,
        )
        query = f"""
            UNWIND $rows AS row
            MERGE (n:`{PROVIDER_RESOURCE_LABEL}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.provider_element_id}})
            SET n:{labels}
            SET n += row.props
        """
        with self.get_session(database) as session:
            session.run(query, {"rows": rows}).consume()
    def write_relationships(
        self,
        database: str,
        rel_type: str,
        provider_id: str,
        rows: list[dict[str, Any]],
    ) -> None:
        if not rows:
            return
        from tasks.jobs.attack_paths.config import (
            PROVIDER_ELEMENT_ID_PROPERTY,
            PROVIDER_RESOURCE_LABEL,
            get_provider_label,
        )
        provider_label = get_provider_label(provider_id)
        query = f"""
            UNWIND $rows AS row
            MATCH (s:`{PROVIDER_RESOURCE_LABEL}`:`{provider_label}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.start_element_id}})
            MATCH (t:`{PROVIDER_RESOURCE_LABEL}`:`{provider_label}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.end_element_id}})
            MERGE (s)-[r:`{rel_type}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.provider_element_id}}]->(t)
            SET r += row.props
        """
        with self.get_session(database) as session:
            session.run(query, {"rows": rows}).consume()
    # For compatibility with test harnesses that patch the concrete driver
    def get_driver(self) -> neo4j.Driver:
        return self._get_driver()
 # Helper for tests / external callers that want a read-only session specifically
 def get_read_session(
    sink: Neo4jSink, database: str
 ) -> AbstractContextManager[RetryableSession]:
    return sink.get_session(database, default_access_mode=neo4j.READ_ACCESS)
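
The batched deletion in `drop_subgraph` follows a simple loop: delete up to a fixed number of items, stop when a round deletes nothing. Here is a database-free sketch of that control flow, under the assumption that each round reports how many items it removed. `delete_in_batches` and `make_fake_store` are hypothetical helpers, and the fake store stands in for the `session.run(...)` call.

```python
from collections.abc import Callable


def delete_in_batches(
    delete_batch: Callable[[int], int], batch_size: int
) -> tuple[int, int]:
    """Call delete_batch until it reports 0; return (total_deleted, batches)."""
    total = 0
    batches = 0
    while True:
        deleted = delete_batch(batch_size)
        if deleted == 0:
            break
        total += deleted
        batches += 1
    return total, batches


def make_fake_store(size: int) -> Callable[[int], int]:
    """Fake backend holding `size` items; each call deletes up to `limit`."""
    remaining = [size]

    def delete_batch(limit: int) -> int:
        deleted = min(limit, remaining[0])
        remaining[0] -= deleted
        return deleted

    return delete_batch
```

Bounding each round keeps every transaction small, which is the reason the sink avoids a single `DETACH DELETE` over a dense provider graph.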
@@ -0,0 +1,524 @@
 """Amazon Neptune sink implementation.
 Dual Bolt drivers: one against the writer endpoint for workers, one against
 the reader endpoint for the API read path. If `NEPTUNE_READER_ENDPOINT` is
 unset the reader falls back to the writer driver so single-node clusters work.
 Neptune is single-database. The `database` argument on the SinkDatabase
 protocol is ignored; tenant / provider isolation is enforced by labels that
 the sync step already writes on every node (see tasks/jobs/attack_paths/sync.py).
 SigV4 auth lives at the bottom of this file as `neptune_auth_provider`. The
 neo4j driver invokes the returned callable on each token refresh.
 """
 import atexit
 import datetime
 import json
 import logging
 import threading
 import time
 from collections.abc import Callable, Iterator
 from contextlib import contextmanager
 from typing import Any
 from urllib.parse import urlsplit
 import neo4j
 import neo4j.exceptions
 from api.attack_paths.retryable_session import RetryableSession
 from api.attack_paths.sink.base import SinkDatabase
 from botocore.auth import SigV4Auth
 from botocore.awsrequest import AWSRequest
 from botocore.session import Session as BotoSession
 from config.env import env
 from django.conf import settings
 from neo4j.auth_management import AuthManagers, ExpiringAuth
 logging.getLogger("neo4j").setLevel(logging.ERROR)
 logging.getLogger("neo4j").propagate = False
 logger = logging.getLogger(__name__)
 SERVICE_UNAVAILABLE_MAX_RETRIES = env.int(
    "ATTACK_PATHS_SERVICE_UNAVAILABLE_MAX_RETRIES", default=3
 )
 READ_QUERY_TIMEOUT_SECONDS = env.int(
    "ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30
 )
 # Neptune serverless cold-start can be >30s; give the driver room
 CONN_ACQUISITION_TIMEOUT = env.int("NEPTUNE_CONN_ACQUISITION_TIMEOUT", default=60)
 # TCP connect timeout, ordered below the acquisition timeout so an unreachable
 # endpoint can't pin a request or the readiness probe longer than this. Kept
 # generous: cold-start delays query execution, not the socket connect.
 CONNECTION_TIMEOUT = env.int("NEPTUNE_CONNECTION_TIMEOUT", default=10)
 # Roll connections hourly so SigV4 rotations and cert refreshes don't strand long-lived pool entries
 MAX_CONNECTION_LIFETIME = env.int("NEPTUNE_MAX_CONNECTION_LIFETIME", default=3600)
 MAX_CONNECTION_POOL_SIZE = env.int("NEPTUNE_MAX_CONNECTION_POOL_SIZE", default=50)
 READ_EXCEPTION_CODES = [
    "Neo.ClientError.Statement.AccessMode",
    "Neo.ClientError.Procedure.ProcedureNotFound",
 ]
 CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."
 # Refresh 60s before the 5-minute SigV4 window closes
 SIGV4_TOKEN_LIFETIME_MINUTES = 4
 class NeptuneSink(SinkDatabase):
    """Neptune-backed sink. Single database; isolation is label-based."""
    def __init__(self) -> None:
        self._writer: neo4j.Driver | None = None
        self._reader: neo4j.Driver | None = None
        self._lock = threading.Lock()
        self._atexit_registered = False
    # Config
    def _config(self) -> dict:
        return settings.DATABASES["neptune"]
    def _bolt_uri(self, endpoint: str, port: str) -> str:
        return f"bolt+s://{endpoint}:{port}"
    def _https_url(self, endpoint: str, port: str) -> str:
        return f"https://{endpoint}:{port}"
    def _build_driver(self, endpoint: str) -> neo4j.Driver:
        cfg = self._config()
        port = cfg["PORT"]
        region = cfg["REGION"]
        if not endpoint or not region:
            raise RuntimeError(
                "NEPTUNE_WRITER_ENDPOINT and AWS_REGION must be set when "
                "ATTACK_PATHS_SINK_DATABASE=neptune"
            )
        return neo4j.GraphDatabase.driver(
            self._bolt_uri(endpoint, port),
            auth=AuthManagers.bearer(
                neptune_auth_provider(region, self._https_url(endpoint, port))
            ),
            keep_alive=True,
            max_connection_lifetime=MAX_CONNECTION_LIFETIME,
            connection_timeout=CONNECTION_TIMEOUT,
            connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
            max_connection_pool_size=MAX_CONNECTION_POOL_SIZE,
            max_transaction_retry_time=0,
        )
    # Lifecycle
    def init(self) -> None:
        if self._writer is not None:
            return
        with self._lock:
            if self._writer is None:
                cfg = self._config()
                writer_endpoint = cfg["WRITER_ENDPOINT"]
                reader_endpoint = cfg["READER_ENDPOINT"] or writer_endpoint
                # Eager connectivity checks are best-effort. A Neptune that is
                # down at boot must not crash the process (same degradation
                # model as Postgres): the drivers reconnect lazily on first use,
                # and /health/ready surfaces the outage until it recovers.
                self._writer = self._build_driver(writer_endpoint)
                self._verify_best_effort(self._writer, "writer")
                if reader_endpoint == writer_endpoint:
                    self._reader = self._writer
                else:
                    self._reader = self._build_driver(reader_endpoint)
                    self._verify_best_effort(self._reader, "reader")
                if not self._atexit_registered:
                    atexit.register(self.close)
                    self._atexit_registered = True
    def close(self) -> None:
        with self._lock:
            # `Driver.close()` is idempotent, so closing the same driver twice
            # (when reader aliases writer on single-endpoint configs) is safe
            for driver in (self._reader, self._writer):
                if driver is None:
                    continue
                try:
                    driver.close()
                except Exception:  # pragma: no cover - best-effort
                    pass
            self._writer = None
            self._reader = None
    # Sessions
    def _get_writer(self) -> neo4j.Driver:
        self.init()
        assert self._writer is not None
        return self._writer
    def _get_reader(self) -> neo4j.Driver:
        self.init()
        assert self._reader is not None
        return self._reader
    @staticmethod
    def _verify_best_effort(driver: neo4j.Driver, role: str) -> None:
        try:
            driver.verify_connectivity()
        except Exception:
            logger.warning(
                "Neptune %s endpoint unreachable at init; continuing with a lazily-reconnecting driver",
                role,
                exc_info=True,
            )
    def verify_connectivity(self) -> None:
        # The API read path uses the reader driver. On single-endpoint clusters
        # it aliases the writer, so this also covers the writer. A writer-only
        # outage is a workers' concern (no HTTP probe there) and deliberately
        # does not fail API readiness.
        self._get_reader().verify_connectivity()
    @contextmanager
    def get_session(
        self,
        database: str | None = None,  # noqa: ARG002 - ignored on Neptune
        default_access_mode: str | None = None,
    ) -> Iterator[RetryableSession]:
        from api.attack_paths.database import (
            ClientStatementException,
            GraphDatabaseQueryException,
            WriteQueryNotAllowedException,
        )
        driver = (
            self._get_reader()
            if default_access_mode == neo4j.READ_ACCESS
            else self._get_writer()
        )
        session_wrapper: RetryableSession | None = None
        try:
            session_wrapper = RetryableSession(
                session_factory=lambda: driver.session(
                    default_access_mode=default_access_mode
                ),
                max_retries=SERVICE_UNAVAILABLE_MAX_RETRIES,
            )
            yield session_wrapper
        except neo4j.exceptions.Neo4jError as exc:
            if (
                default_access_mode == neo4j.READ_ACCESS
                and exc.code
                and exc.code in READ_EXCEPTION_CODES
            ):
                raise WriteQueryNotAllowedException(
                    message="Read query not allowed", code=READ_EXCEPTION_CODES[0]
                )
            message = exc.message if exc.message is not None else str(exc)
            if exc.code and exc.code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
                raise ClientStatementException(message=message, code=exc.code)
            raise GraphDatabaseQueryException(message=message, code=exc.code)
        finally:
            if session_wrapper is not None:
                session_wrapper.close()
    # Operations
    def execute_read_query(
        self,
        database: str,  # noqa: ARG002 - ignored on Neptune
        cypher: str,
        parameters: dict[str, Any] | None = None,
    ) -> neo4j.graph.Graph:
        with self.get_session(default_access_mode=neo4j.READ_ACCESS) as session:
            def _run(tx: neo4j.ManagedTransaction) -> neo4j.graph.Graph:
                result = tx.run(
                    cypher, parameters or {}, timeout=READ_QUERY_TIMEOUT_SECONDS
                )
                return result.graph()
            return session.execute_read(_run)
    def create_database(self, database: str) -> None:  # noqa: ARG002
        # Neptune clusters are single-database; there is nothing to create.
        return None
    def drop_database(self, database: str) -> None:  # noqa: ARG002
        # Neptune clusters are single-database; there is nothing to drop.
        return None
    def drop_subgraph(self, database: str, provider_id: str) -> int:  # noqa: ARG002
        """Delete a provider's subgraph in two bounded phases.
        Neptune write transactions are capped at ~2 minutes. A naive
        `DETACH DELETE` on a label-scanned batch grows unbounded with graph
        density (one node can drag thousands of relationships into the same
        transaction). Instead:
        1. Delete relationships incident to provider nodes, one fixed-size
           batch per transaction.
        2. Delete the now-orphaned nodes, one fixed-size batch per transaction.
        Each transaction does work proportional to `batch_size`, never to the
        graph's branching factor.
        """
        from tasks.jobs.attack_paths.config import (
            BATCH_SIZE,
            PROVIDER_RESOURCE_LABEL,
            get_provider_label,
        )
        provider_label = get_provider_label(provider_id)
        deleted_relationships = 0
        relationship_batches = 0
        node_batches = 0
        drop_t0 = time.perf_counter()
        logger.info(
            "Dropping provider graph from Neptune sink "
            "(provider=%s, provider_label=%s)",
            provider_id,
            provider_label,
        )
        logger.info(
            "Opening Neptune writer session for provider graph drop (provider=%s)",
            provider_id,
        )
        with self.get_session() as session:
            logger.info(
                "Opened Neptune writer session for provider graph drop (provider=%s)",
                provider_id,
            )
            while True:
                next_batch = relationship_batches + 1
                logger.info(
                    "Deleting relationship batch from Neptune sink "
                    "(provider=%s, batch=%s, total_rels=%s, elapsed=%.3fs)",
                    provider_id,
                    next_batch,
                    deleted_relationships,
                    time.perf_counter() - drop_t0,
                )
                result = session.run(
                    f"""
                    MATCH (:`{provider_label}`)-[r]-()
                    WITH DISTINCT r LIMIT $batch_size
                    DELETE r
                    RETURN COUNT(r) AS deleted_rels_count
                    """,
                    {"batch_size": BATCH_SIZE},
                )
                record = result.single()
                deleted_rels = (record["deleted_rels_count"] if record else 0) or 0
                if deleted_rels == 0:
                    break
                relationship_batches += 1
                deleted_relationships += deleted_rels
                logger.info(
                    "Deleted relationship batch from Neptune sink "
                    "(provider=%s, batch=%s, deleted_rels=%s, total_rels=%s, "
                    "elapsed=%.3fs)",
                    provider_id,
                    relationship_batches,
                    deleted_rels,
                    deleted_relationships,
                    time.perf_counter() - drop_t0,
                )
            deleted_nodes = 0
            while True:
                next_batch = node_batches + 1
                logger.info(
                    "Deleting node batch from Neptune sink "
                    "(provider=%s, batch=%s, total_nodes=%s, elapsed=%.3fs)",
                    provider_id,
                    next_batch,
                    deleted_nodes,
                    time.perf_counter() - drop_t0,
                )
                result = session.run(
                    f"""
                    MATCH (n:`{PROVIDER_RESOURCE_LABEL}`:`{provider_label}`)
                    WITH n LIMIT $batch_size
                    DELETE n
                    RETURN COUNT(n) AS deleted_nodes_count
                    """,
                    {"batch_size": BATCH_SIZE},
                )
                record = result.single()
                deleted = (record["deleted_nodes_count"] if record else 0) or 0
                if deleted == 0:
                    break
                node_batches += 1
                deleted_nodes += deleted
                logger.info(
                    "Deleted node batch from Neptune sink "
                    "(provider=%s, batch=%s, deleted_nodes=%s, total_nodes=%s, "
                    "elapsed=%.3fs)",
                    provider_id,
                    node_batches,
                    deleted,
                    deleted_nodes,
                    time.perf_counter() - drop_t0,
                )
        logger.info(
            "Finished dropping provider graph from Neptune sink "
            "(provider=%s, relationship_batches=%s, deleted_rels=%s, "
            "node_batches=%s, deleted_nodes=%s, elapsed=%.3fs)",
            provider_id,
            relationship_batches,
            deleted_relationships,
            node_batches,
            deleted_nodes,
            time.perf_counter() - drop_t0,
        )
        return deleted_nodes
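The two-phase deletion described in the `drop_subgraph` docstring can be illustrated without a database. The sketch below is an in-memory stand-in, not the sink's code: `drop_in_batches`, the `nodes`/`edges` sets and the tuple it returns are hypothetical names chosen for illustration, and each loop iteration plays the role of one bounded write transaction.

```python
from __future__ import annotations


def drop_in_batches(
    nodes: set[str],
    edges: set[tuple[str, str]],
    batch_size: int,
) -> tuple[int, int, int]:
    """Delete incident edges, then nodes, at most `batch_size` items per batch.

    Returns (deleted_edges, deleted_nodes, batches).
    """
    deleted_edges = deleted_nodes = batches = 0

    # Phase 1: relationships touching any of the provider's nodes.
    while True:
        batch = [e for e in edges if e[0] in nodes or e[1] in nodes][:batch_size]
        if not batch:
            break
        edges.difference_update(batch)
        deleted_edges += len(batch)
        batches += 1

    # Phase 2: the nodes are now orphaned, so a plain delete is enough.
    while True:
        batch = list(nodes)[:batch_size]
        if not batch:
            break
        nodes.difference_update(batch)
        deleted_nodes += len(batch)
        batches += 1

    return deleted_edges, deleted_nodes, batches


# A hub with five spokes: no single batch ever touches more than two items.
hub_nodes = {"hub", "a", "b", "c", "d", "e"}
hub_edges = {("hub", spoke) for spoke in "abcde"}
print(drop_in_batches(hub_nodes, hub_edges, batch_size=2))  # (5, 6, 6)
```

A single `DETACH DELETE` on the hub would have pulled all five relationships into one transaction; here the work per batch depends only on `batch_size`.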
    def has_provider_data(self, database: str, provider_id: str) -> bool:  # noqa: ARG002
        from tasks.jobs.attack_paths.config import (
            PROVIDER_RESOURCE_LABEL,
            get_provider_label,
        )
        provider_label = get_provider_label(provider_id)
        # Backtick-quote both labels, matching the queries in `drop_subgraph`
        query = (
            f"MATCH (n:`{PROVIDER_RESOURCE_LABEL}`:`{provider_label}`) RETURN 1 LIMIT 1"
        )
        with self.get_session(default_access_mode=neo4j.READ_ACCESS) as session:
            result = session.run(query)
            return result.single() is not None
    def clear_cache(self, database: str) -> None:  # noqa: ARG002
        # Neptune has no user-facing cache-clear procedure; no-op.
        return None
    # Sync write path
    def ensure_sync_indexes(self, database: str) -> None:  # noqa: ARG002
        # Neptune routes node and relationship lookups through `~id`, which is the cluster's primary key
        # No additional index is needed or supported
        return None
    def write_nodes(
        self,
        database: str,  # noqa: ARG002
        labels: str,
        rows: list[dict[str, Any]],
    ) -> None:
        if not rows:
            return
        from tasks.jobs.attack_paths.config import (
            PROVIDER_ELEMENT_ID_PROPERTY,
            PROVIDER_RESOURCE_LABEL,
        )
        # MERGE on `~id` is the documented and engine-optimized idempotent
        # upsert pattern for Neptune openCypher. The label inside the MERGE
        # matters: Neptune assigns a default `vertex` label to any node
        # created without an explicit one, so we pin `_ProviderResource`
        # (which every synced node carries anyway) at MERGE time. Additional
        # labels are added afterwards by the `SET n:{labels}` clause.
        #
        # We also write `_provider_element_id` as a regular property so that
        # non-sync code (drop_subgraph, query helpers) keeps a stable contract
        # that does not depend on `~id`.
        query = f"""
            UNWIND $rows AS row
            MERGE (n:`{PROVIDER_RESOURCE_LABEL}` {{`~id`: row.provider_element_id}})
            SET n:{labels}
            SET n += row.props
            SET n.`{PROVIDER_ELEMENT_ID_PROPERTY}` = row.provider_element_id
        """
        with self.get_session() as session:
            session.run(query, {"rows": rows}).consume()
    def write_relationships(
        self,
        database: str,  # noqa: ARG002
        rel_type: str,
        provider_id: str,  # noqa: ARG002 - encoded in start/end `~id` already
        rows: list[dict[str, Any]],
    ) -> None:
        if not rows:
            return
        from tasks.jobs.attack_paths.config import PROVIDER_ELEMENT_ID_PROPERTY
        # `id(n) = $value` is Neptune's parameterized fast path; both endpoint
        # MATCHes resolve in O(1) via the system `~id`, so per-row work stays
        # bounded regardless of batch size
        query = f"""
            UNWIND $rows AS row
            MATCH (s) WHERE id(s) = row.start_element_id
            MATCH (e) WHERE id(e) = row.end_element_id
            MERGE (s)-[r:`{rel_type}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.provider_element_id}}]->(e)
            SET r += row.props
        """
        with self.get_session() as session:
            session.run(query, {"rows": rows}).consume()
    # Test helpers
    def get_writer(self) -> neo4j.Driver:
        return self._get_writer()
    def get_reader(self) -> neo4j.Driver:
        return self._get_reader()
 # SigV4 auth provider
 class _NeptuneAuthToken(neo4j.Auth):
    """Neo4j Auth backed by a SigV4-signed GET to `/opencypher`."""
    def __init__(self, region: str, url: str) -> None:
        session = BotoSession()
        credentials = session.get_credentials()
        if credentials is None:
            raise RuntimeError(
                "No AWS credentials available for Neptune SigV4 signing. "
                "Ensure the boto3 credential chain can resolve."
            )
        credentials = credentials.get_frozen_credentials()
        request = AWSRequest(method="GET", url=url + "/opencypher")
        # SigV4 canonical Host must carry the real `host:port`
        # Neptune runs on a non-default port (8182), so `.hostname` would drop it and break signing
        request.headers.add_header("Host", urlsplit(url).netloc)
        SigV4Auth(credentials, "neptune-db", region).add_auth(request)
        auth_obj = {
            header: request.headers[header]
            for header in (
                "Authorization",
                "X-Amz-Date",
                "X-Amz-Security-Token",
                "Host",
            )
            if header in request.headers
        }
        auth_obj["HttpMethod"] = "GET"
        super().__init__("basic", "username", json.dumps(auth_obj))
 def neptune_auth_provider(region: str, https_url: str) -> Callable[[], ExpiringAuth]:
    """Return a callable the neo4j driver can invoke to refresh credentials."""
    def _provider() -> ExpiringAuth:
        token = _NeptuneAuthToken(region, https_url)
        expires_at = (
            datetime.datetime.now(datetime.UTC)
            + datetime.timedelta(minutes=SIGV4_TOKEN_LIFETIME_MINUTES)
        ).timestamp()
        return ExpiringAuth(auth=token, expires_at=expires_at)
    return _provider
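Two details of the SigV4 provider above are easy to check in isolation: why the signed `Host` header uses `netloc` rather than `hostname`, and how the expiry timestamp is derived. The sketch below uses only the standard library; the endpoint URL, the helper names and the 4-minute lifetime are hypothetical values for illustration, not the ones the commit configures.

```python
import datetime
from urllib.parse import urlsplit


def signing_host(url: str) -> str:
    """Return the `host:port` value to sign, keeping any explicit port."""
    return urlsplit(url).netloc


def expiry_timestamp(now: datetime.datetime, lifetime_minutes: int) -> float:
    """Return the POSIX timestamp at which a token minted at `now` expires."""
    return (now + datetime.timedelta(minutes=lifetime_minutes)).timestamp()


# Hypothetical endpoint, used only to show the difference between the two attributes.
url = "https://example-cluster.us-east-1.neptune.amazonaws.com:8182"
print(signing_host(url))       # example-cluster.us-east-1.neptune.amazonaws.com:8182
print(urlsplit(url).hostname)  # example-cluster.us-east-1.neptune.amazonaws.com

minted = datetime.datetime(2026, 1, 1, tzinfo=datetime.timezone.utc)
print(expiry_timestamp(minted, 4) - minted.timestamp())  # 240.0
```

Signing with the port-less value would produce a canonical request that differs from the one the server reconstructs, which is the failure the comment in `_NeptuneAuthToken` warns about.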
@@ -5,6 +5,7 @@ from typing import Any
 import neo4j
 from api.attack_paths import AttackPathsQueryDefinition
 from api.attack_paths import database as graph_database
 from api.attack_paths import sink as sink_module
 from api.attack_paths.cypher_sanitizer import (
    inject_provider_label,
    validate_custom_query,
@@ -14,7 +15,9 @@ from api.attack_paths.queries.schema import (
    RAW_SCHEMA_URL,
    get_cartography_schema_query,
 )
 from api.models import AttackPathsScan
 from config.custom_logging import BackendLogger
 from config.env import env
 from rest_framework.exceptions import APIException, PermissionDenied, ValidationError
 from tasks.jobs.attack_paths.config import (
    INTERNAL_LABELS,
@@ -26,6 +29,10 @@ from tasks.jobs.attack_paths.config import (
 logger = logging.getLogger(BackendLogger.API)
 def _custom_query_timeout_ms() -> int:
    return env.int("ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30) * 1000
 # Predefined query helpers
@@ -102,13 +109,13 @@ def execute_query(
    definition: AttackPathsQueryDefinition,
    parameters: dict[str, Any],
    provider_id: str,
    scan: AttackPathsScan,
 ) -> dict[str, Any]:
    try:
-        graph = graph_database.execute_read_query(
-            database=database_name,
-            cypher=definition.cypher,
-            parameters=parameters,
-        )
+        # TODO: drop after Neptune cutover
+        # Route reads by the scan row's recorded sink, not by current settings.
+        backend = sink_module.get_backend_for_scan(scan)
+        graph = backend.execute_read_query(database_name, definition.cypher, parameters)
        return _serialize_graph(graph, provider_id)
    except graph_database.WriteQueryNotAllowedException:
@@ -142,22 +149,31 @@ def execute_custom_query(
    database_name: str,
    cypher: str,
    provider_id: str,
    scan: AttackPathsScan,
 ) -> dict[str, Any]:
    # Defense-in-depth for custom queries:
-    # 1. neo4j.READ_ACCESS — prevents mutations at the driver level
+    # 1. `neo4j.READ_ACCESS` — prevents mutations at the driver level
-    # 2. inject_provider_label() — regex-based label injection scopes node patterns
+    # 2. `inject_provider_label()` — regex-based label injection scopes node patterns
-    # 3. _serialize_graph() — post-query filter drops nodes without the provider label
+    # 3. `_serialize_graph()` — post-query filter drops nodes without the provider label
    # 4. `USING QUERY:TIMEOUTMILLISECONDS` on Neptune — server-side runaway cutoff
    #
    # Layer 2 is best-effort (regex can't fully parse Cypher);
    # layer 3 is the safety net that guarantees provider isolation.
    validate_custom_query(cypher)
    cypher = inject_provider_label(cypher, provider_id)
    # TODO: drop after Neptune cutover
    backend = sink_module.get_backend_for_scan(scan)
    # Neptune enforces a cluster-level query timeout; prepending the hint
    # makes the limit explicit and matches the client-side read timeout.
    # Applies only when the scan's graph lives in Neptune.
    if getattr(scan, "sink_backend", None) == "neptune":
        timeout_ms = _custom_query_timeout_ms()
        cypher = f"USING QUERY:TIMEOUTMILLISECONDS {timeout_ms}\n{cypher}"
    try:
-        graph = graph_database.execute_read_query(
-            database=database_name,
-            cypher=cypher,
-        )
+        graph = backend.execute_read_query(database_name, cypher, None)
        serialized = _serialize_graph(graph, provider_id)
        return _truncate_graph(serialized)
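The Neptune-only timeout hint added in `execute_custom_query` amounts to a small string transformation. The sketch below restates it as a standalone function so the behaviour is easy to test; `with_timeout_hint` and its parameters are hypothetical names, and the 30-second default mirrors the default shown in `_custom_query_timeout_ms`.

```python
def with_timeout_hint(cypher: str, sink_backend: str, timeout_seconds: int = 30) -> str:
    """Prefix a Neptune query-timeout hint; leave queries for other sinks unchanged."""
    if sink_backend != "neptune":
        return cypher
    return f"USING QUERY:TIMEOUTMILLISECONDS {timeout_seconds * 1000}\n{cypher}"


print(with_timeout_hint("MATCH (n) RETURN n", "neptune"))
print(with_timeout_hint("MATCH (n) RETURN n", "neo4j"))
```

Keying the decision on the scan's recorded sink rather than the current setting means a scan stored in Neo4j is never sent a hint it does not understand, even after the deployment switches to Neptune.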
@@ -180,10 +196,11 @@ def execute_custom_query(
 def get_cartography_schema(
-    database_name: str, provider_id: str
+    database_name: str, provider_id: str, scan: AttackPathsScan
 ) -> dict[str, str] | None:
    try:
-        with graph_database.get_session(
+        backend = sink_module.get_backend_for_scan(scan)
+        with backend.get_session(
            database_name, default_access_mode=neo4j.READ_ACCESS
        ) as session:
            result = session.run(get_cartography_schema_query(provider_id))
@@ -2,8 +2,9 @@
 Format (draft-inadarei-api-health-check-06).
 Liveness reports only process status. Readiness verifies that PostgreSQL,
-Valkey and Neo4j are reachable and returns per-dependency detail when any
-of them is unreachable.
+Valkey and the attack-paths graph store (Neo4j or Neptune, per
+``ATTACK_PATHS_SINK_DATABASE``) are reachable and returns per-dependency
+detail when any of them is unreachable.
 """
 from __future__ import annotations
@@ -11,6 +12,8 @@ from __future__ import annotations
 import logging
 import threading
 import time
 from concurrent.futures import ThreadPoolExecutor
 from concurrent.futures import TimeoutError as FuturesTimeoutError
 from contextlib import suppress
 from datetime import UTC, datetime
 from typing import Any
@@ -37,9 +40,28 @@ STATUS_FAIL = "fail"
 STATUS_WARN = "warn"
 # Short socket timeout so a stuck Valkey cannot stall the probe.
 # Neo4j inherits its driver-level ``connection_acquisition_timeout``.
 VALKEY_PROBE_TIMEOUT_SECONDS = 2
 # Probe-scoped budget for the graph database.
 # ``Driver.verify_connectivity()`` takes no timeout; its only bound is the
 # driver-level ``connection_acquisition_timeout`` (60s on Neptune). The
 # probe needs its own budget, independent of the workload driver, so a
 # graph-database outage cannot pin a worker thread (and the readiness lock)
 # for a minute.
 GRAPH_DB_PROBE_TIMEOUT_SECONDS = 5
 # Bounded pool that enforces ``GRAPH_DB_PROBE_TIMEOUT_SECONDS``. If the
 # graph database is unreachable the probe call blocks until the driver's
 # own acquisition timeout fires; we abandon the future after the budget and
 # report ``fail``. Orphaned tasks are capped by ``max_workers`` plus the 3s
 # readiness cache plus the per-IP throttle, so they cannot pile up: worst
 # case during a graph-database outage is every readiness call failing fast
 # in ``GRAPH_DB_PROBE_TIMEOUT_SECONDS`` with at most 2 background threads
 # stuck for <= the driver acquisition timeout.
 _graph_db_probe_executor = ThreadPoolExecutor(
    max_workers=2, thread_name_prefix="health-graph-db-probe"
 )
 # Brief cache window so high-frequency probes (ALB target groups, scrapers)
 # do not stampede the actual dependency checks.
 CACHE_CONTROL_HEADER = "max-age=3, must-revalidate"
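The probe budget described in the comments above (a small pool, a `result(timeout=...)` wait, and an abandoned future on overrun) can be sketched with the standard library alone. `probe_with_budget` and `_executor` are hypothetical names for illustration; the real probe wraps the graph driver's connectivity check rather than an arbitrary callable.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeoutError
from typing import Callable

# Bounded pool: at most two checks can be stuck in the background at once.
_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="probe-sketch")


def probe_with_budget(check: Callable[[], None], budget_seconds: float) -> None:
    """Run `check` in the pool and raise TimeoutError once the budget elapses."""
    future = _executor.submit(check)
    try:
        future.result(timeout=budget_seconds)
    except FuturesTimeoutError as exc:
        # cancel() only stops a task that has not started yet; a running one
        # keeps its worker thread until `check` returns on its own.
        future.cancel()
        raise TimeoutError(f"probe exceeded {budget_seconds}s") from exc


def slow_check() -> None:
    time.sleep(0.3)  # stands in for a driver call blocked on an unreachable host


try:
    probe_with_budget(slow_check, budget_seconds=0.05)
except TimeoutError as err:
    print(err)
```

The caller gets its answer within the budget, while the worker thread stays occupied until the underlying call gives up, which is why the pool size bounds how many such threads can exist.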
@@ -109,11 +131,24 @@ def _probe_valkey() -> None:
            client.close()
-def _probe_neo4j() -> None:
-    # Lazy import: avoids pulling attack_paths into the boot import graph.
-    from api.attack_paths.database import get_driver
-    get_driver().verify_connectivity()
+def _graph_db_component_id() -> str:
+    """Return the active graph database name for the ``componentId`` field."""
+    return settings.ATTACK_PATHS_SINK_DATABASE.strip().lower()
+
 def _probe_graph_db() -> None:
    # Lazy import: avoids pulling attack_paths into the boot import graph
    from api.attack_paths.database import verify_connectivity
    future = _graph_db_probe_executor.submit(verify_connectivity)
    try:
        future.result(timeout=GRAPH_DB_PROBE_TIMEOUT_SECONDS)
    except FuturesTimeoutError as exc:
        # Do not wait for the abandoned task; it ends when the driver's own acquisition timeout fires
        future.cancel()
        raise TimeoutError(
            f"graph-db probe exceeded {GRAPH_DB_PROBE_TIMEOUT_SECONDS}s"
        ) from exc
 def _build_check_entry(
@@ -176,14 +211,18 @@ def _readiness_payload() -> tuple[dict[str, Any], int]:
        ):
            return snapshot[1], snapshot[2]
        graph_db_component_id = _graph_db_component_id()
        postgres_result, postgres_ms = _measure("postgres", _probe_postgres)
        valkey_result, valkey_ms = _measure("valkey", _probe_valkey)
-        neo4j_result, neo4j_ms = _measure("neo4j", _probe_neo4j)
+        graph_db_result, graph_db_ms = _measure(graph_db_component_id, _probe_graph_db)
        entries = [
            _build_check_entry("postgres", "datastore", postgres_result, postgres_ms),
            _build_check_entry("valkey", "datastore", valkey_result, valkey_ms),
-            _build_check_entry("neo4j", "datastore", neo4j_result, neo4j_ms),
+            _build_check_entry(
+                graph_db_component_id, "datastore", graph_db_result, graph_db_ms
+            ),
        ]
        overall = _aggregate_status(entries)
@@ -191,7 +230,7 @@ def _readiness_payload() -> tuple[dict[str, Any], int]:
        payload["checks"] = {
            "postgres:responseTime": [entries[0]],
            "valkey:responseTime": [entries[1]],
-            "neo4j:responseTime": [entries[2]],
+            "graphdb:responseTime": [entries[2]],
        }
        http_status = (
@@ -233,10 +272,10 @@ class LivenessView(APIView):
 class ReadinessView(APIView):
    """Readiness probe.
-    Returns 200 when PostgreSQL, Valkey and Neo4j all respond, or 503 with
-    per-dependency detail when any of them is unreachable. Per-IP throttle
-    plus the short in-process result cache cap the real dependency hits
-    regardless of inbound traffic shape.
+    Returns 200 when PostgreSQL, Valkey and the attack-paths graph store
+    all respond, or 503 with per-dependency detail when any of them is
+    unreachable. Per-IP throttle plus the short in-process result cache cap
+    the real dependency hits regardless of inbound traffic shape.
    """
    authentication_classes: list = []
@@ -0,0 +1,24 @@
 from django.db import migrations, models
 class Migration(migrations.Migration):
    dependencies = [
        ("api", "0095_reconcile_orphan_tasks_periodic_task"),
    ]
    operations = [
        migrations.AddField(
            model_name="attackpathsscan",
            name="is_migrated",
            field=models.BooleanField(default=False),
        ),
        migrations.AddField(
            model_name="attackpathsscan",
            name="sink_backend",
            field=models.CharField(
                choices=[("neo4j", "Neo4j"), ("neptune", "Neptune")],
                default="neo4j",
                max_length=16,
            ),
        ),
    ]
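The `sink_backend` column added by this migration is what lets reads follow the scan row instead of the current setting. The sketch below shows that routing idea only; `ScanRow`, `BACKENDS` and `backend_for_scan` are hypothetical stand-ins, and the actual `get_backend_for_scan` implementation is not shown in this part of the diff.

```python
from dataclasses import dataclass


@dataclass
class ScanRow:
    """Hypothetical stand-in for an AttackPathsScan row."""

    sink_backend: str = "neo4j"  # same default as the migration


# Placeholder values; a real registry would hold backend objects.
BACKENDS = {"neo4j": "neo4j-backend", "neptune": "neptune-backend"}


def backend_for_scan(scan: ScanRow) -> str:
    """Pick the backend recorded on the scan, regardless of current settings."""
    try:
        return BACKENDS[scan.sink_backend]
    except KeyError:
        raise ValueError(f"unknown sink backend: {scan.sink_backend!r}") from None


print(backend_for_scan(ScanRow()))                        # neo4j-backend
print(backend_for_scan(ScanRow(sink_backend="neptune")))  # neptune-backend
```

Because the column defaults to `neo4j`, rows written before the migration keep resolving to the store that actually holds their graph.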
@@ -757,6 +757,10 @@ class Scan(RowLevelSecurityProtectedModel):
 class AttackPathsScan(RowLevelSecurityProtectedModel):
    class SinkBackendChoices(models.TextChoices):
        NEO4J = "neo4j", "Neo4j"
        NEPTUNE = "neptune", "Neptune"
    objects = ActiveProviderManager()
    all_objects = models.Manager()
@@ -805,6 +809,18 @@ class AttackPathsScan(RowLevelSecurityProtectedModel):
    )
    ingestion_exceptions = models.JSONField(default=dict, null=True, blank=True)
    # True when the scan was synced with the current schema (list-typed
    # properties materialised as child item nodes). False for pre-cutover scans
    # still using the previous graph shape. Query catalog selection uses this
    # flag; physical read routing uses sink_backend below.
    # TODO: drop after Neptune cutover
    is_migrated = models.BooleanField(default=False)
    sink_backend = models.CharField(
        choices=SinkBackendChoices.choices,
        default=SinkBackendChoices.NEO4J,
        max_length=16,
    )
    class Meta(RowLevelSecurityProtectedModel.Meta):
        db_table = "attack_paths_scans"
@@ -92,7 +92,9 @@ def test_prepare_parameters_validates_cast(
 def test_execute_query_serializes_graph(
-    attack_paths_query_definition_factory, attack_paths_graph_stub_classes
+    attack_paths_query_definition_factory,
+    attack_paths_graph_stub_classes,
+    sink_backend_stub,
 ):
    definition = attack_paths_query_definition_factory(
        id="aws-rds",
@@ -135,18 +137,17 @@ def test_execute_query_serializes_graph(
    database_name = "db-tenant-test-tenant-id"
-    with patch(
-        "api.attack_paths.views_helpers.graph_database.execute_read_query",
-        return_value=graph_result,
-    ) as mock_execute_read_query:
-        result = views_helpers.execute_query(
-            database_name, definition, parameters, provider_id=provider_id
-        )
-    mock_execute_read_query.assert_called_once_with(
-        database=database_name,
-        cypher=definition.cypher,
-        parameters=parameters,
+    sink_backend_stub.execute_read_query.return_value = graph_result
+    result = views_helpers.execute_query(
+        database_name,
+        definition,
+        parameters,
+        provider_id=provider_id,
+        scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
+    )
+    sink_backend_stub.execute_read_query.assert_called_once_with(
+        database_name, definition.cypher, parameters
    )
    assert result["nodes"][0]["id"] == "node-1"
    assert result["nodes"][0]["properties"]["complex"]["items"][0] == "value"
@@ -155,6 +156,7 @@ def test_execute_query_serializes_graph(
 def test_execute_query_wraps_graph_errors(
    attack_paths_query_definition_factory,
    sink_backend_stub,
 ):
    definition = attack_paths_query_definition_factory(
        id="aws-rds",
@@ -167,16 +169,17 @@ def test_execute_query_wraps_graph_errors(
    database_name = "db-tenant-test-tenant-id"
    parameters = {"provider_uid": "123"}
-    with (
-        patch(
-            "api.attack_paths.views_helpers.graph_database.execute_read_query",
-            side_effect=graph_database.GraphDatabaseQueryException("boom"),
-        ),
-        patch("api.attack_paths.views_helpers.logger") as mock_logger,
-    ):
+    sink_backend_stub.execute_read_query.side_effect = (
+        graph_database.GraphDatabaseQueryException("boom")
+    )
+    with patch("api.attack_paths.views_helpers.logger") as mock_logger:
         with pytest.raises(APIException):
             views_helpers.execute_query(
-                database_name, definition, parameters, provider_id="test-provider-123"
+                database_name,
+                definition,
+                parameters,
+                provider_id="test-provider-123",
+                scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
             )
    mock_logger.error.assert_called_once()
@@ -184,6 +187,7 @@ def test_execute_query_wraps_graph_errors(
 def test_execute_query_raises_permission_denied_on_read_only(
    attack_paths_query_definition_factory,
    sink_backend_stub,
 ):
    definition = attack_paths_query_definition_factory(
        id="aws-rds",
@@ -196,16 +200,19 @@ def test_execute_query_raises_permission_denied_on_read_only(
    database_name = "db-tenant-test-tenant-id"
    parameters = {"provider_uid": "123"}
-    with patch(
-        "api.attack_paths.views_helpers.graph_database.execute_read_query",
-        side_effect=graph_database.WriteQueryNotAllowedException(
+    sink_backend_stub.execute_read_query.side_effect = (
+        graph_database.WriteQueryNotAllowedException(
             message="Read query not allowed",
             code="Neo.ClientError.Statement.AccessMode",
-        ),
-    ):
-        with pytest.raises(PermissionDenied):
-            views_helpers.execute_query(
-                database_name, definition, parameters, provider_id="test-provider-123"
-            )
+        )
+    )
+    with pytest.raises(PermissionDenied):
+        views_helpers.execute_query(
+            database_name,
+            definition,
+            parameters,
+            provider_id="test-provider-123",
+            scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
+        )
@@ -440,6 +447,7 @@ def test_normalize_custom_query_payload_passthrough_for_flat_dict():
 def test_execute_custom_query_serializes_graph(
    attack_paths_graph_stub_classes,
    sink_backend_stub,
 ):
    provider_id = "test-provider-123"
    plabel = get_provider_label(provider_id)
@@ -453,50 +461,73 @@ def test_execute_custom_query_serializes_graph(
    graph_result.nodes = [node_1, node_2]
    graph_result.relationships = [relationship]
-    with patch(
-        "api.attack_paths.views_helpers.graph_database.execute_read_query",
-        return_value=graph_result,
-    ) as mock_execute:
-        result = views_helpers.execute_custom_query(
-            "db-tenant-test", "MATCH (n) RETURN n", provider_id
-        )
-    mock_execute.assert_called_once()
-    call_kwargs = mock_execute.call_args[1]
-    assert call_kwargs["database"] == "db-tenant-test"
+    sink_backend_stub.execute_read_query.return_value = graph_result
+    result = views_helpers.execute_custom_query(
+        "db-tenant-test",
+        "MATCH (n) RETURN n",
+        provider_id,
+        scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
+    )
+    sink_backend_stub.execute_read_query.assert_called_once()
+    call_args = sink_backend_stub.execute_read_query.call_args[0]
+    assert call_args[0] == "db-tenant-test"
    # The cypher is rewritten with the provider label injection
-    assert plabel in call_kwargs["cypher"]
+    assert plabel in call_args[1]
    assert len(result["nodes"]) == 2
    assert result["relationships"][0]["label"] == "OWNS"
    assert result["truncated"] is False
    assert result["total_nodes"] == 2
+def test_execute_custom_query_adds_timeout_for_neptune_scan(sink_backend_stub):
+    graph_result = MagicMock()
+    graph_result.nodes = []
+    graph_result.relationships = []
+    sink_backend_stub.execute_read_query.return_value = graph_result
+    with patch(
+        "api.attack_paths.views_helpers.sink_module.get_backend_for_scan",
+        return_value=sink_backend_stub,
+    ):
+        views_helpers.execute_custom_query(
+            "db-tenant-test",
+            "MATCH (n) RETURN n",
+            "provider-1",
+            scan=MagicMock(is_migrated=True, sink_backend="neptune"),
+        )
+    cypher = sink_backend_stub.execute_read_query.call_args[0][1]
+    assert cypher.startswith("USING QUERY:TIMEOUTMILLISECONDS")
-def test_execute_custom_query_raises_permission_denied_on_write():
-    with patch(
-        "api.attack_paths.views_helpers.graph_database.execute_read_query",
-        side_effect=graph_database.WriteQueryNotAllowedException(
+def test_execute_custom_query_raises_permission_denied_on_write(sink_backend_stub):
+    sink_backend_stub.execute_read_query.side_effect = (
+        graph_database.WriteQueryNotAllowedException(
             message="Read query not allowed",
             code="Neo.ClientError.Statement.AccessMode",
-        ),
-    ):
-        with pytest.raises(PermissionDenied):
-            views_helpers.execute_custom_query(
-                "db-tenant-test", "CREATE (n) RETURN n", "provider-1"
-            )
+        )
+    )
+    with pytest.raises(PermissionDenied):
+        views_helpers.execute_custom_query(
+            "db-tenant-test",
+            "CREATE (n) RETURN n",
+            "provider-1",
+            scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
+        )
-def test_execute_custom_query_wraps_graph_errors():
-    with (
-        patch(
-            "api.attack_paths.views_helpers.graph_database.execute_read_query",
-            side_effect=graph_database.GraphDatabaseQueryException("boom"),
-        ),
-        patch("api.attack_paths.views_helpers.logger") as mock_logger,
-    ):
+def test_execute_custom_query_wraps_graph_errors(sink_backend_stub):
+    sink_backend_stub.execute_read_query.side_effect = (
+        graph_database.GraphDatabaseQueryException("boom")
+    )
+    with patch("api.attack_paths.views_helpers.logger") as mock_logger:
         with pytest.raises(APIException):
             views_helpers.execute_custom_query(
-                "db-tenant-test", "MATCH (n) RETURN n", "provider-1"
+                "db-tenant-test",
+                "MATCH (n) RETURN n",
+                "provider-1",
+                scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
             )
    mock_logger.error.assert_called_once()
@@ -561,13 +592,33 @@ def test_truncate_graph_empty_graph():
@pytest.fixture
 def mock_neo4j_session():
-    """Mock the Neo4j driver so execute_read_query uses a fake session."""
+    """Install a Neo4jSink with a mocked Bolt driver into the sink factory.
+    The yielded mock is the `neo4j.Session` that the Neo4jSink will obtain via
+    `driver.session(...)`. Tests configure `mock_neo4j_session.execute_read`
+    return values / side effects to exercise the read-mode error translation
+    path on the real `Neo4jSink.execute_read_query` and `get_session` code.
+    """
+    from api.attack_paths.sink import factory
+    from api.attack_paths.sink.neo4j import Neo4jSink
    mock_session = MagicMock(spec=neo4j.Session)
    mock_driver = MagicMock(spec=neo4j.Driver)
    mock_driver.session.return_value = mock_session
-    with patch("api.attack_paths.database.get_driver", return_value=mock_driver):
+    sink = Neo4jSink()
+    sink._driver = mock_driver
+    previous_backend = factory._backend
+    previous_secondary = dict(factory._secondary_backends)
+    factory._backend = sink
+    factory._secondary_backends.clear()
+    try:
        yield mock_session
+    finally:
+        factory._backend = previous_backend
+        factory._secondary_backends.clear()
+        factory._secondary_backends.update(previous_secondary)
 def test_execute_read_query_succeeds_with_select(mock_neo4j_session):
@@ -663,16 +714,20 @@ def test_execute_read_query_rejects_apoc_real_create(mock_neo4j_session, cypher)
@pytest.fixture
 def mock_schema_session():
-    """Mock get_session for cartography schema tests."""
+    """Mock the routed sink backend session for cartography schema tests."""
    mock_result = MagicMock()
    mock_session = MagicMock()
    mock_session.run.return_value = mock_result
+    mock_backend = MagicMock()
    with patch(
-        "api.attack_paths.views_helpers.graph_database.get_session"
-    ) as mock_get_session:
-        mock_get_session.return_value.__enter__ = MagicMock(return_value=mock_session)
-        mock_get_session.return_value.__exit__ = MagicMock(return_value=False)
+        "api.attack_paths.views_helpers.sink_module.get_backend_for_scan",
+        return_value=mock_backend,
+    ):
+        mock_backend.get_session.return_value.__enter__ = MagicMock(
+            return_value=mock_session
+        )
+        mock_backend.get_session.return_value.__exit__ = MagicMock(return_value=False)
        yield mock_session, mock_result
@@ -683,7 +738,9 @@ def test_get_cartography_schema_returns_urls(mock_schema_session):
        "module_version": "0.129.0",
    }
-    result = views_helpers.get_cartography_schema("db-tenant-test", "provider-123")
+    result = views_helpers.get_cartography_schema(
+        "db-tenant-test", "provider-123", MagicMock(sink_backend="neo4j")
+    )
    mock_session.run.assert_called_once()
    assert result["id"] == "aws-0.129.0"
@@ -699,7 +756,9 @@ def test_get_cartography_schema_returns_none_when_no_data(mock_schema_session):
    _, mock_result = mock_schema_session
    mock_result.single.return_value = None
-    result = views_helpers.get_cartography_schema("db-tenant-test", "provider-123")
+    result = views_helpers.get_cartography_schema(
+        "db-tenant-test", "provider-123", MagicMock(sink_backend="neo4j")
+    )
    assert result is None
@@ -721,21 +780,29 @@ def test_get_cartography_schema_extracts_provider(
        "module_version": "1.0.0",
    }
-    result = views_helpers.get_cartography_schema("db-tenant-test", "provider-123")
+    result = views_helpers.get_cartography_schema(
+        "db-tenant-test", "provider-123", MagicMock(sink_backend="neo4j")
+    )
    assert result["id"] == f"{expected_provider}-1.0.0"
    assert result["provider"] == expected_provider
 def test_get_cartography_schema_wraps_database_error():
+    mock_backend = MagicMock()
+    mock_backend.get_session.side_effect = graph_database.GraphDatabaseQueryException(
+        "boom"
+    )
    with (
        patch(
-            "api.attack_paths.views_helpers.graph_database.get_session",
-            side_effect=graph_database.GraphDatabaseQueryException("boom"),
+            "api.attack_paths.views_helpers.sink_module.get_backend_for_scan",
+            return_value=mock_backend,
        ),
        patch("api.attack_paths.views_helpers.logger") as mock_logger,
    ):
        with pytest.raises(APIException):
-            views_helpers.get_cartography_schema("db-tenant-test", "provider-123")
+            views_helpers.get_cartography_schema(
+                "db-tenant-test", "provider-123", MagicMock(sink_backend="neo4j")
+            )
    mock_logger.error.assert_called_once()
@@ -1,623 +1,174 @@
-"""
-Tests for Neo4j database lazy initialization.
-The Neo4j driver is created on first use for every process type; app startup
-never contacts Neo4j. These tests validate the database module behavior itself.
+"""Tests for the attack-paths database facade.
+After the Neptune port, `api.attack_paths.database` is a thin routing shim
+over `api.attack_paths.ingest` (cartography temp DB, always Neo4j) and
+`api.attack_paths.sink` (configurable Neo4j or Neptune). The facade's
+contract is routing by database-name prefix and the public exception
+hierarchy; sink-internal behavior is exercised in `test_sink.py`.
 """
 import threading
 from unittest.mock import MagicMock, patch
 import api.attack_paths.database as db_module
 import neo4j
 import neo4j.exceptions
 import pytest
-class TestLazyInitialization:
+class TestDatabaseNameHelper:
-    """Test that Neo4j driver is initialized lazily on first use."""
+    def test_tenant_name_lowercases_uuid(self):
    @pytest.fixture(autouse=True)
    def reset_module_state(self):
        """Reset module-level singleton state before each test."""
        original_driver = db_module._driver
        db_module._driver = None
        yield
        db_module._driver = original_driver
    def test_driver_not_initialized_at_import(self):
        """Driver should be None after module import (no eager connection)."""
        assert db_module._driver is None
    @patch("api.attack_paths.database.settings")
    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
    def test_init_driver_creates_connection_on_first_call(
        self, mock_driver_factory, mock_settings
    ):
        """init_driver() should create connection only when called."""
        mock_driver = MagicMock()
        mock_driver_factory.return_value = mock_driver
        mock_settings.DATABASES = {
            "neo4j": {
                "HOST": "localhost",
                "PORT": 7687,
                "USER": "neo4j",
                "PASSWORD": "password",
            }
        }
        assert db_module._driver is None
        result = db_module.init_driver()
        mock_driver_factory.assert_called_once()
        mock_driver.verify_connectivity.assert_called_once()
        assert result is mock_driver
        assert db_module._driver is mock_driver
    @patch("api.attack_paths.database.settings")
    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
    def test_init_driver_leaves_driver_none_when_verify_fails(
        self, mock_driver_factory, mock_settings
    ):
        """A failed verify_connectivity() must not publish or leak the driver."""
        mock_driver = MagicMock()
        mock_driver.verify_connectivity.side_effect = (
            neo4j.exceptions.ServiceUnavailable("down")
        )
        mock_driver_factory.return_value = mock_driver
        mock_settings.DATABASES = {
            "neo4j": {
                "HOST": "localhost",
                "PORT": 7687,
                "USER": "neo4j",
                "PASSWORD": "password",
            }
        }
        with pytest.raises(neo4j.exceptions.ServiceUnavailable):
            db_module.init_driver()
        assert db_module._driver is None
        mock_driver.close.assert_called_once()
    @patch("api.attack_paths.database.settings")
    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
    def test_init_driver_returns_cached_driver_on_subsequent_calls(
        self, mock_driver_factory, mock_settings
    ):
        """Subsequent calls should return cached driver without reconnecting."""
        mock_driver = MagicMock()
        mock_driver_factory.return_value = mock_driver
        mock_settings.DATABASES = {
            "neo4j": {
                "HOST": "localhost",
                "PORT": 7687,
                "USER": "neo4j",
                "PASSWORD": "password",
            }
        }
        first_result = db_module.init_driver()
        second_result = db_module.init_driver()
        third_result = db_module.init_driver()
        # Only one connection attempt
        assert mock_driver_factory.call_count == 1
        assert mock_driver.verify_connectivity.call_count == 1
        # All calls return same instance
        assert first_result is second_result is third_result
    @patch("api.attack_paths.database.settings")
    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
    def test_get_driver_delegates_to_init_driver(
        self, mock_driver_factory, mock_settings
    ):
        """get_driver() should use init_driver() for lazy initialization."""
        mock_driver = MagicMock()
        mock_driver_factory.return_value = mock_driver
        mock_settings.DATABASES = {
            "neo4j": {
                "HOST": "localhost",
                "PORT": 7687,
                "USER": "neo4j",
                "PASSWORD": "password",
            }
        }
        result = db_module.get_driver()
        assert result is mock_driver
        mock_driver_factory.assert_called_once()
 class TestConnectionAcquisitionTimeout:
    """Test that the connection acquisition timeout is configurable."""
    @pytest.fixture(autouse=True)
    def reset_module_state(self):
        original_driver = db_module._driver
        original_acq_timeout = db_module.CONN_ACQUISITION_TIMEOUT
        original_conn_timeout = db_module.CONNECTION_TIMEOUT
        db_module._driver = None
        yield
        db_module._driver = original_driver
        db_module.CONN_ACQUISITION_TIMEOUT = original_acq_timeout
        db_module.CONNECTION_TIMEOUT = original_conn_timeout
    @patch("api.attack_paths.database.settings")
    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
    def test_driver_receives_configured_timeout(
        self, mock_driver_factory, mock_settings
    ):
        """init_driver() should pass the configured timeouts to the neo4j driver."""
        mock_driver_factory.return_value = MagicMock()
        mock_settings.DATABASES = {
            "neo4j": {
                "HOST": "localhost",
                "PORT": 7687,
                "USER": "neo4j",
                "PASSWORD": "password",
            }
        }
        db_module.CONN_ACQUISITION_TIMEOUT = 42
        db_module.CONNECTION_TIMEOUT = 7
        db_module.init_driver()
        _, kwargs = mock_driver_factory.call_args
        assert kwargs["connection_acquisition_timeout"] == 42
        assert kwargs["connection_timeout"] == 7
 class TestAtexitRegistration:
    """Test that atexit cleanup handler is registered correctly."""
    @pytest.fixture(autouse=True)
    def reset_module_state(self):
        """Reset module-level singleton state before each test."""
        original_driver = db_module._driver
        db_module._driver = None
        yield
        db_module._driver = original_driver
    @patch("api.attack_paths.database.settings")
    @patch("api.attack_paths.database.atexit.register")
    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
    def test_atexit_registered_on_first_init(
        self, mock_driver_factory, mock_atexit_register, mock_settings
    ):
        """atexit.register should be called on first initialization."""
        mock_driver_factory.return_value = MagicMock()
        mock_settings.DATABASES = {
            "neo4j": {
                "HOST": "localhost",
                "PORT": 7687,
                "USER": "neo4j",
                "PASSWORD": "password",
            }
        }
        db_module.init_driver()
        mock_atexit_register.assert_called_once_with(db_module.close_driver)
    @patch("api.attack_paths.database.settings")
    @patch("api.attack_paths.database.atexit.register")
    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
    def test_atexit_registered_only_once(
        self, mock_driver_factory, mock_atexit_register, mock_settings
    ):
        """atexit.register should only be called once across multiple inits.
        The double-checked locking on _driver ensures the atexit registration
        block only executes once (when _driver is first created).
        """
        mock_driver_factory.return_value = MagicMock()
        mock_settings.DATABASES = {
            "neo4j": {
                "HOST": "localhost",
                "PORT": 7687,
                "USER": "neo4j",
                "PASSWORD": "password",
            }
        }
        db_module.init_driver()
        db_module.init_driver()
        db_module.init_driver()
        # Only registered once because subsequent calls hit the fast path
        assert mock_atexit_register.call_count == 1
 class TestCloseDriver:
    """Test driver cleanup functionality."""
    @pytest.fixture(autouse=True)
    def reset_module_state(self):
        """Reset module-level singleton state before each test."""
        original_driver = db_module._driver
        db_module._driver = None
        yield
        db_module._driver = original_driver
    def test_close_driver_closes_and_clears_driver(self):
        """close_driver() should close the driver and set it to None."""
        mock_driver = MagicMock()
        db_module._driver = mock_driver
        db_module.close_driver()
        mock_driver.close.assert_called_once()
        assert db_module._driver is None
    def test_close_driver_handles_none_driver(self):
        """close_driver() should handle case where driver is None."""
        db_module._driver = None
        # Should not raise
        db_module.close_driver()
        assert db_module._driver is None
    def test_close_driver_clears_driver_even_on_close_error(self):
        """Driver should be cleared even if close() raises an exception."""
        mock_driver = MagicMock()
        mock_driver.close.side_effect = Exception("Connection error")
        db_module._driver = mock_driver
        with pytest.raises(Exception, match="Connection error"):
            db_module.close_driver()
        # Driver should still be cleared
        assert db_module._driver is None
 class TestExecuteReadQuery:
    """Test read query execution helper."""
    def test_execute_read_query_calls_read_session_and_returns_result(self):
        tx = MagicMock()
        expected_graph = MagicMock()
        run_result = MagicMock()
        run_result.graph.return_value = expected_graph
        tx.run.return_value = run_result
        session = MagicMock()
        def execute_read_side_effect(fn):
            return fn(tx)
        session.execute_read.side_effect = execute_read_side_effect
        session_ctx = MagicMock()
        session_ctx.__enter__.return_value = session
        session_ctx.__exit__.return_value = False
        with patch(
            "api.attack_paths.database.get_session",
            return_value=session_ctx,
        ) as mock_get_session:
            result = db_module.execute_read_query(
                "db-tenant-test-tenant-id",
                "MATCH (n) RETURN n",
                {"provider_uid": "123"},
            )
        mock_get_session.assert_called_once_with(
            "db-tenant-test-tenant-id",
            default_access_mode=neo4j.READ_ACCESS,
        )
        session.execute_read.assert_called_once()
        tx.run.assert_called_once_with(
            "MATCH (n) RETURN n",
            {"provider_uid": "123"},
            timeout=db_module.READ_QUERY_TIMEOUT_SECONDS,
        )
        run_result.graph.assert_called_once_with()
        assert result is expected_graph
    def test_execute_read_query_defaults_parameters_to_empty_dict(self):
        tx = MagicMock()
        run_result = MagicMock()
        run_result.graph.return_value = MagicMock()
        tx.run.return_value = run_result
        session = MagicMock()
        session.execute_read.side_effect = lambda fn: fn(tx)
        session_ctx = MagicMock()
        session_ctx.__enter__.return_value = session
        session_ctx.__exit__.return_value = False
        with patch(
            "api.attack_paths.database.get_session",
            return_value=session_ctx,
        ):
            db_module.execute_read_query(
                "db-tenant-test-tenant-id",
                "MATCH (n) RETURN n",
            )
        tx.run.assert_called_once_with(
            "MATCH (n) RETURN n",
            {},
            timeout=db_module.READ_QUERY_TIMEOUT_SECONDS,
        )
        run_result.graph.assert_called_once_with()
 class TestGetSessionReadOnly:
    """Test that get_session translates Neo4j read-mode errors."""
    @pytest.fixture(autouse=True)
    def reset_module_state(self):
        original_driver = db_module._driver
        db_module._driver = None
        yield
        db_module._driver = original_driver
    @pytest.mark.parametrize(
        "neo4j_code",
        [
            "Neo.ClientError.Statement.AccessMode",
            "Neo.ClientError.Procedure.ProcedureNotFound",
        ],
    )
    def test_get_session_raises_write_query_not_allowed(self, neo4j_code):
        """Read-mode Neo4j errors should raise `WriteQueryNotAllowedException`."""
        mock_session = MagicMock()
        neo4j_error = neo4j.exceptions.Neo4jError._hydrate_neo4j(
            code=neo4j_code,
            message="Write operations are not allowed",
        )
        mock_session.run.side_effect = neo4j_error
        mock_driver = MagicMock()
        mock_driver.session.return_value = mock_session
        db_module._driver = mock_driver
        with pytest.raises(db_module.WriteQueryNotAllowedException):
            with db_module.get_session(
                default_access_mode=neo4j.READ_ACCESS
            ) as session:
                session.run("CREATE (n) RETURN n")
    def test_get_session_raises_generic_exception_for_other_errors(self):
        """Non-read-mode Neo4j errors should raise GraphDatabaseQueryException."""
        mock_session = MagicMock()
        neo4j_error = neo4j.exceptions.Neo4jError._hydrate_neo4j(
            code="Neo.ClientError.Statement.SyntaxError",
            message="Invalid syntax",
        )
        mock_session.run.side_effect = neo4j_error
        mock_driver = MagicMock()
        mock_driver.session.return_value = mock_session
        db_module._driver = mock_driver
        with pytest.raises(db_module.GraphDatabaseQueryException):
            with db_module.get_session(
                default_access_mode=neo4j.READ_ACCESS
            ) as session:
                session.run("INVALID CYPHER")
 class TestThreadSafety:
    """Test thread-safe initialization."""
    @pytest.fixture(autouse=True)
    def reset_module_state(self):
        """Reset module-level singleton state before each test."""
        original_driver = db_module._driver
        db_module._driver = None
        yield
        db_module._driver = original_driver
    @patch("api.attack_paths.database.settings")
    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
    def test_concurrent_init_creates_single_driver(
        self, mock_driver_factory, mock_settings
    ):
        """Multiple threads calling init_driver() should create only one driver."""
        mock_driver = MagicMock()
        mock_driver_factory.return_value = mock_driver
        mock_settings.DATABASES = {
            "neo4j": {
                "HOST": "localhost",
                "PORT": 7687,
                "USER": "neo4j",
                "PASSWORD": "password",
            }
        }
        results = []
        errors = []
        def call_init():
            try:
                result = db_module.init_driver()
                results.append(result)
            except Exception as e:
                errors.append(e)
        threads = [threading.Thread(target=call_init) for _ in range(10)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        assert not errors, f"Threads raised errors: {errors}"
        # Only one driver created
        assert mock_driver_factory.call_count == 1
        # All threads got the same driver instance
        assert all(r is mock_driver for r in results)
        assert len(results) == 10
 class TestHasProviderData:
    """Test has_provider_data helper for checking provider nodes in Neo4j."""
    def test_returns_true_when_nodes_exist(self):
        mock_session = MagicMock()
        mock_result = MagicMock()
        mock_result.single.return_value = MagicMock()  # non-None record
        mock_session.run.return_value = mock_result
        session_ctx = MagicMock()
        session_ctx.__enter__.return_value = mock_session
        session_ctx.__exit__.return_value = False
        with patch(
            "api.attack_paths.database.get_session",
            return_value=session_ctx,
        ):
            assert db_module.has_provider_data("db-tenant-abc", "provider-123") is True
        mock_session.run.assert_called_once()
    def test_returns_false_when_no_nodes(self):
        mock_session = MagicMock()
        mock_result = MagicMock()
        mock_result.single.return_value = None
        mock_session.run.return_value = mock_result
        session_ctx = MagicMock()
        session_ctx.__enter__.return_value = mock_session
        session_ctx.__exit__.return_value = False
        with patch(
            "api.attack_paths.database.get_session",
            return_value=session_ctx,
        ):
            assert db_module.has_provider_data("db-tenant-abc", "provider-123") is False
    def test_returns_false_when_database_not_found(self):
        session_ctx = MagicMock()
        session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
            message="Database does not exist",
            code="Neo.ClientError.Database.DatabaseNotFound",
        )
        with patch(
            "api.attack_paths.database.get_session",
            return_value=session_ctx,
        ):
        assert (
-                db_module.has_provider_data("db-tenant-gone", "provider-123") is False
+            db_module.get_database_name("ABC-123", temporary=False)
            == "db-tenant-abc-123"
        )
-    def test_raises_on_other_errors(self):
+    def test_temporary_name_uses_tmp_scan_prefix(self):
-        session_ctx = MagicMock()
+        assert (
-        session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
+            db_module.get_database_name("XYZ-789", temporary=True)
-            message="Connection refused",
+            == "db-tmp-scan-xyz-789"
            code="Neo.TransientError.General.UnknownError",
        )
        with patch(
            "api.attack_paths.database.get_session",
            return_value=session_ctx,
        ):
            with pytest.raises(db_module.GraphDatabaseQueryException):
                db_module.has_provider_data("db-tenant-abc", "provider-123")
 class TestExceptionHierarchy:
    """`tasks/` and `api/v1/views.py` import these from the facade."""
-class TestDropSubgraph:
+    def test_write_query_is_graph_database_exception(self):
-    """Test drop_subgraph two-phase batched deletion of a provider's graph."""
+        assert issubclass(
-
+            db_module.WriteQueryNotAllowedException,
-    @staticmethod
+            db_module.GraphDatabaseQueryException,
    def _result(count):
        result = MagicMock()
        result.single.return_value.get.return_value = count
        return result
    @staticmethod
    def _session_ctx(session):
        ctx = MagicMock()
        ctx.__enter__.return_value = session
        ctx.__exit__.return_value = False
        return ctx
    def test_deletes_relationships_then_nodes_in_batches(self):
        session = MagicMock()
        # Phase 1 (relationships): one full batch then empty.
        # Phase 2 (nodes): one full batch then empty.
        session.run.side_effect = [
            self._result(1000),
            self._result(0),
            self._result(1000),
            self._result(0),
        ]
        with patch(
            "api.attack_paths.database.get_session",
            return_value=self._session_ctx(session),
        ):
            deleted = db_module.drop_subgraph("db-tenant-abc", "provider-123")
        # Only phase-2 node counts contribute to the return value.
        assert deleted == 1000
        assert session.run.call_count == 4
        queries = [call.args[0] for call in session.run.call_args_list]
        # Regression guard: the memory blow-up was caused by DETACH DELETE.
        assert all("DETACH DELETE" not in query for query in queries)
        rel_queries = [query for query in queries if "DELETE r" in query]
        node_queries = [query for query in queries if "DELETE n" in query]
        assert rel_queries and node_queries
        # DISTINCT avoids double-counting relationships matched from both ends.
        assert all("DISTINCT r" in query for query in rel_queries)
        # Relationships must be fully drained before nodes are deleted.
        first_node = next(i for i, q in enumerate(queries) if "DELETE n" in q)
        last_rel = max(i for i, q in enumerate(queries) if "DELETE r" in q)
        assert last_rel < first_node
    def test_returns_zero_when_database_not_found(self):
        session_ctx = MagicMock()
        session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
            message="Database does not exist",
            code="Neo.ClientError.Database.DatabaseNotFound",
        )
-        with patch(
+    def test_client_statement_is_graph_database_exception(self):
-            "api.attack_paths.database.get_session",
+        assert issubclass(
-            return_value=session_ctx,
+            db_module.ClientStatementException, db_module.GraphDatabaseQueryException
        ):
            assert db_module.drop_subgraph("db-tenant-gone", "provider-123") == 0
    def test_raises_on_other_errors(self):
        session_ctx = MagicMock()
        session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
            message="Connection refused",
            code="Neo.TransientError.General.UnknownError",
        )
-        with patch(
+    def test_exception_str_includes_code_when_set(self):
-            "api.attack_paths.database.get_session",
+        exc = db_module.GraphDatabaseQueryException(
-            return_value=session_ctx,
+            message="boom", code="Neo.ClientError.X.Y"
-        ):
+        )
-            with pytest.raises(db_module.GraphDatabaseQueryException):
+        assert str(exc) == "Neo.ClientError.X.Y: boom"
-                db_module.drop_subgraph("db-tenant-abc", "provider-123")
+
    def test_exception_str_falls_back_to_message_without_code(self):
        exc = db_module.GraphDatabaseQueryException(message="boom")
        assert str(exc) == "boom"
 class TestExecuteReadQueryRoutes:
    def test_execute_read_query_delegates_to_sink(self, sink_backend_stub):
        sink_backend_stub.execute_read_query.return_value = "graph"
        result = db_module.execute_read_query(
            "db-tenant-abc", "MATCH (n) RETURN n", {"provider_uid": "123"}
        )
        sink_backend_stub.execute_read_query.assert_called_once_with(
            "db-tenant-abc", "MATCH (n) RETURN n", {"provider_uid": "123"}
        )
        assert result == "graph"
    def test_execute_read_query_defaults_parameters_to_none(self, sink_backend_stub):
        db_module.execute_read_query("db-tenant-abc", "MATCH (n) RETURN n")
        sink_backend_stub.execute_read_query.assert_called_once_with(
            "db-tenant-abc", "MATCH (n) RETURN n", None
        )
 class TestSinkOperationsDelegation:
    def test_has_provider_data_delegates_to_sink(self, sink_backend_stub):
        sink_backend_stub.has_provider_data.return_value = True
        assert db_module.has_provider_data("db-tenant-abc", "provider-123") is True
        sink_backend_stub.has_provider_data.assert_called_once_with(
            "db-tenant-abc", "provider-123"
        )
    def test_drop_subgraph_delegates_to_sink(self, sink_backend_stub):
        sink_backend_stub.drop_subgraph.return_value = 42
        assert db_module.drop_subgraph("db-tenant-abc", "provider-123") == 42
        sink_backend_stub.drop_subgraph.assert_called_once_with(
            "db-tenant-abc", "provider-123"
        )
 class TestRoutingByDatabasePrefix:
    """`db-tmp-scan-*` and `None` route to ingest; everything else to sink."""
    def test_create_database_routes_temp_to_ingest(self, sink_backend_stub):
        with patch("api.attack_paths.database.ingest") as mock_ingest:
            db_module.create_database("db-tmp-scan-uuid-1")
        mock_ingest.create_database.assert_called_once_with("db-tmp-scan-uuid-1")
        sink_backend_stub.create_database.assert_not_called()
    def test_create_database_routes_tenant_to_sink(self, sink_backend_stub):
        with patch("api.attack_paths.database.ingest") as mock_ingest:
            db_module.create_database("db-tenant-abc")
        sink_backend_stub.create_database.assert_called_once_with("db-tenant-abc")
        mock_ingest.create_database.assert_not_called()
    def test_drop_database_routes_temp_to_ingest(self, sink_backend_stub):
        with patch("api.attack_paths.database.ingest") as mock_ingest:
            db_module.drop_database("db-tmp-scan-uuid-1")
        mock_ingest.drop_database.assert_called_once_with("db-tmp-scan-uuid-1")
        sink_backend_stub.drop_database.assert_not_called()
    def test_drop_database_routes_tenant_to_sink(self, sink_backend_stub):
        with patch("api.attack_paths.database.ingest") as mock_ingest:
            db_module.drop_database("db-tenant-abc")
        sink_backend_stub.drop_database.assert_called_once_with("db-tenant-abc")
        mock_ingest.drop_database.assert_not_called()
    def test_clear_cache_routes_temp_to_ingest(self, sink_backend_stub):
        with patch("api.attack_paths.database.ingest") as mock_ingest:
            db_module.clear_cache("db-tmp-scan-uuid-1")
        mock_ingest.clear_cache.assert_called_once_with("db-tmp-scan-uuid-1")
        sink_backend_stub.clear_cache.assert_not_called()
    def test_clear_cache_routes_tenant_to_sink(self, sink_backend_stub):
        with patch("api.attack_paths.database.ingest") as mock_ingest:
            db_module.clear_cache("db-tenant-abc")
        sink_backend_stub.clear_cache.assert_called_once_with("db-tenant-abc")
        mock_ingest.clear_cache.assert_not_called()
    def test_get_session_routes_temp_to_ingest(self, sink_backend_stub):
        sentinel = MagicMock()
        with patch("api.attack_paths.database.ingest") as mock_ingest:
            mock_ingest.get_session.return_value = sentinel
            result = db_module.get_session("db-tmp-scan-uuid-1")
        assert result is sentinel
        mock_ingest.get_session.assert_called_once()
        sink_backend_stub.get_session.assert_not_called()
    def test_get_session_routes_none_to_ingest(self, sink_backend_stub):
        sentinel = MagicMock()
        with patch("api.attack_paths.database.ingest") as mock_ingest:
            mock_ingest.get_session.return_value = sentinel
            result = db_module.get_session(None)
        assert result is sentinel
        sink_backend_stub.get_session.assert_not_called()
    def test_get_ingest_uri_delegates_to_ingest(self, sink_backend_stub):
        with patch("api.attack_paths.database.ingest") as mock_ingest:
            mock_ingest.get_uri.return_value = "bolt://neo4j:7687"
            assert db_module.get_ingest_uri() == "bolt://neo4j:7687"
            mock_ingest.get_uri.assert_called_once_with()
    def test_get_session_routes_tenant_to_sink(self, sink_backend_stub):
        sentinel = MagicMock()
        sink_backend_stub.get_session.return_value = sentinel
        with patch("api.attack_paths.database.ingest") as mock_ingest:
            result = db_module.get_session("db-tenant-abc")
        assert result is sentinel
        mock_ingest.get_session.assert_not_called()
@@ -67,7 +67,7 @@ class TestLivenessEndpoint:
        with (
            patch("api.health._probe_postgres") as mock_pg,
            patch("api.health._probe_valkey") as mock_vk,
-            patch("api.health._probe_neo4j") as mock_neo,
+            patch("api.health._probe_graph_db") as mock_neo,
        ):
            response = api_client.get(reverse("health-live"))
@@ -83,14 +83,14 @@ class TestReadinessEndpoint:
        return (
            patch("api.health._probe_postgres", return_value=None),
            patch("api.health._probe_valkey", return_value=None),
-            patch("api.health._probe_neo4j", return_value=None),
+            patch("api.health._probe_graph_db", return_value=None),
        )
    def test_returns_200_and_pass_when_all_dependencies_healthy(self, api_client):
        with (
            patch("api.health._probe_postgres"),
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            response = api_client.get(reverse("health-ready"))
@@ -107,7 +107,7 @@ class TestReadinessEndpoint:
        assert set(body["checks"].keys()) == {
            "postgres:responseTime",
            "valkey:responseTime",
-            "neo4j:responseTime",
+            "graphdb:responseTime",
        }
        for key in body["checks"]:
            entries = body["checks"][key]
@@ -122,6 +122,23 @@ class TestReadinessEndpoint:
            # `output` must not leak when the check passed.
            assert "output" not in entry
    @pytest.mark.parametrize("sink", ["neo4j", "neptune"])
    def test_graphdb_component_id_reflects_active_sink(self, api_client, sink):
        from django.test import override_settings
        with (
            override_settings(ATTACK_PATHS_SINK_DATABASE=sink),
            patch("api.health._probe_postgres"),
            patch("api.health._probe_valkey"),
            patch("api.health._probe_graph_db"),
        ):
            response = api_client.get(reverse("health-ready"))
        assert response.status_code == status.HTTP_200_OK
        entry = response.json()["checks"]["graphdb:responseTime"][0]
        # Stable key, but the concrete store is named in componentId.
        assert entry["componentId"] == sink
    def test_returns_503_and_fail_when_postgres_is_down(self, api_client):
        with (
            patch(
@@ -129,7 +146,7 @@ class TestReadinessEndpoint:
                side_effect=RuntimeError("connection refused"),
            ),
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            response = api_client.get(reverse("health-ready"))
@@ -141,13 +158,13 @@ class TestReadinessEndpoint:
        # Exception detail is never echoed in the response, only logged.
        assert "output" not in pg_entry
        assert body["checks"]["valkey:responseTime"][0]["status"] == "pass"
-        assert body["checks"]["neo4j:responseTime"][0]["status"] == "pass"
+        assert body["checks"]["graphdb:responseTime"][0]["status"] == "pass"
    def test_returns_503_and_fail_when_valkey_is_down(self, api_client):
        with (
            patch("api.health._probe_postgres"),
            patch("api.health._probe_valkey", side_effect=ConnectionError("timeout")),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            response = api_client.get(reverse("health-ready"))
@@ -158,12 +175,12 @@ class TestReadinessEndpoint:
        assert vk_entry["status"] == "fail"
        assert "output" not in vk_entry
-    def test_returns_503_and_fail_when_neo4j_is_down(self, api_client):
+    def test_returns_503_and_fail_when_graph_db_is_down(self, api_client):
        with (
            patch("api.health._probe_postgres"),
            patch("api.health._probe_valkey"),
            patch(
-                "api.health._probe_neo4j",
+                "api.health._probe_graph_db",
                side_effect=RuntimeError("ServiceUnavailable"),
            ),
        ):
@@ -172,15 +189,15 @@ class TestReadinessEndpoint:
        assert response.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
        body = response.json()
        assert body["status"] == "fail"
-        neo_entry = body["checks"]["neo4j:responseTime"][0]
+        graph_db_entry = body["checks"]["graphdb:responseTime"][0]
-        assert neo_entry["status"] == "fail"
+        assert graph_db_entry["status"] == "fail"
-        assert "output" not in neo_entry
+        assert "output" not in graph_db_entry
    def test_reports_all_failures_simultaneously(self, api_client):
        with (
            patch("api.health._probe_postgres", side_effect=RuntimeError("pg down")),
            patch("api.health._probe_valkey", side_effect=RuntimeError("vk down")),
-            patch("api.health._probe_neo4j", side_effect=RuntimeError("neo down")),
+            patch("api.health._probe_graph_db", side_effect=RuntimeError("neo down")),
        ):
            response = api_client.get(reverse("health-ready"))
@@ -190,7 +207,7 @@ class TestReadinessEndpoint:
        for key in (
            "postgres:responseTime",
            "valkey:responseTime",
-            "neo4j:responseTime",
+            "graphdb:responseTime",
        ):
            entry = body["checks"][key][0]
            assert entry["status"] == "fail"
@@ -209,7 +226,7 @@ class TestReadinessEndpoint:
        with (
            patch("api.health._probe_postgres", side_effect=RuntimeError(sensitive)),
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            response = api_client.get(reverse("health-ready"))
@@ -229,7 +246,7 @@ class TestReadinessEndpoint:
        with (
            patch("api.health._probe_postgres"),
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            api_client.credentials()
            response = api_client.get(reverse("health-ready"))
@@ -244,7 +261,7 @@ class TestReadinessCache:
        with (
            patch("api.health._probe_postgres") as pg,
            patch("api.health._probe_valkey") as vk,
-            patch("api.health._probe_neo4j") as neo,
+            patch("api.health._probe_graph_db") as neo,
        ):
            r1 = api_client.get(reverse("health-ready"))
            r2 = api_client.get(reverse("health-ready"))
@@ -262,7 +279,7 @@ class TestReadinessCache:
        with (
            patch("api.health._probe_postgres") as pg,
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            api_client.get(reverse("health-ready"))
            assert pg.call_count == 1
@@ -286,7 +303,7 @@ class TestReadinessCache:
        with (
            patch("api.health._probe_postgres", side_effect=RuntimeError("down")) as pg,
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            r1 = api_client.get(reverse("health-ready"))
            r2 = api_client.get(reverse("health-ready"))
@@ -320,7 +337,7 @@ class TestRateLimiting:
        with (
            patch("api.health._probe_postgres"),
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
            patch.object(ScopedRateThrottle, "parse_rate", return_value=(2, 60)),
        ):
            statuses = [
@@ -414,19 +431,42 @@ class TestProbeImplementations:
            with pytest.raises(RuntimeError, match="bug"):
                health._probe_valkey()
-    def test_neo4j_probe_calls_verify_connectivity(self):
+    def test_graph_db_probe_calls_verify_connectivity(self):
-        with patch("api.attack_paths.database.get_driver") as mock_get_driver:
+        with patch("api.attack_paths.database.verify_connectivity") as mock_verify:
-            mock_get_driver.return_value.verify_connectivity.return_value = None
+            mock_verify.return_value = None
-            assert health._probe_neo4j() is None
+            assert health._probe_graph_db() is None
-            mock_get_driver.return_value.verify_connectivity.assert_called_once_with()
+            mock_verify.assert_called_once_with()
-    def test_neo4j_probe_propagates_driver_errors(self):
+    def test_graph_db_probe_propagates_errors(self):
-        with patch("api.attack_paths.database.get_driver") as mock_get_driver:
+        with patch(
-            mock_get_driver.return_value.verify_connectivity.side_effect = RuntimeError(
+            "api.attack_paths.database.verify_connectivity",
-                "unreachable"
+            side_effect=RuntimeError("unreachable"),
-            )
+        ):
            with pytest.raises(RuntimeError, match="unreachable"):
-                health._probe_neo4j()
+                health._probe_graph_db()
    def test_graph_db_probe_times_out_when_check_exceeds_budget(self):
        # A sink whose connectivity check blocks past the probe budget must
        # surface as a failure fast, not pin the request thread for the
        # driver's full acquisition timeout.
        import time as _time
        def _hang() -> None:
            _time.sleep(2)
        with (
            patch("api.health.GRAPH_DB_PROBE_TIMEOUT_SECONDS", 0.2),
            patch(
                "api.attack_paths.database.verify_connectivity",
                side_effect=_hang,
            ),
        ):
            started = _time.perf_counter()
            with pytest.raises(TimeoutError):
                health._probe_graph_db()
            elapsed = _time.perf_counter() - started
        assert elapsed < health.GRAPH_DB_PROBE_TIMEOUT_SECONDS + 1
 class TestStatusAggregation:
@@ -0,0 +1,626 @@
 """Tests for the attack-paths sink factory and Neo4j sink.
 The sink module picks a backend per ``settings.ATTACK_PATHS_SINK_DATABASE``.
 Neo4j is the default and preserves today's behavior; Neptune is opt-in and
 builds dual writer/reader Bolt drivers.
 """
 import json
 from importlib import import_module
 from unittest.mock import MagicMock, patch
 import pytest
 # Prime patch-target resolution. `api/attack_paths/sink/__init__.py` doesn't
 # eagerly import these submodules (they're loaded on demand inside the
 # factory), so `mock.patch("api.attack_paths.sink.<sub>.…")` would fail with
 # AttributeError on first call. Importing here registers them as attributes
 # of the package before any decorator runs.
 import_module("api.attack_paths.sink.neo4j")
 import_module("api.attack_paths.sink.neptune")
 @pytest.fixture(autouse=True)
 def reset_sink_state():
    """Reset the module-level backend singletons around each test.
    The cache lives in `api.attack_paths.sink.factory`, not on the package.
    """
    from api.attack_paths.sink import factory
    original_backend = factory._backend
    original_secondary = dict(factory._secondary_backends)
    factory._backend = None
    factory._secondary_backends.clear()
    yield
    factory._backend = original_backend
    factory._secondary_backends.clear()
    factory._secondary_backends.update(original_secondary)
 class TestSinkFactory:
    def test_default_resolves_to_neo4j(self, settings):
        from api.attack_paths.sink import factory
        settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
        assert factory._resolve_setting() == "neo4j"
    def test_neptune_resolves_correctly(self, settings):
        from api.attack_paths.sink import factory
        settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
        assert factory._resolve_setting() == "neptune"
    def test_invalid_value_raises(self, settings):
        from api.attack_paths.sink import factory
        settings.ATTACK_PATHS_SINK_DATABASE = "foo"
        with pytest.raises(RuntimeError, match="ATTACK_PATHS_SINK_DATABASE"):
            factory._resolve_setting()
    @patch("api.attack_paths.sink.neo4j.neo4j.GraphDatabase.driver")
    def test_init_builds_neo4j_backend_by_default(self, mock_driver, settings):
        from api.attack_paths import sink as sink_module
        from api.attack_paths.sink.neo4j import Neo4jSink
        settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
        settings.DATABASES = {
            **settings.DATABASES,
            "neo4j": {
                "HOST": "localhost",
                "PORT": "7687",
                "USER": "neo4j",
                "PASSWORD": "pw",
            },
        }
        mock_driver.return_value = MagicMock()
        backend = sink_module.init()
        assert isinstance(backend, Neo4jSink)
        mock_driver.assert_called_once()
    @patch("api.attack_paths.sink.neptune.neptune_auth_provider")
    @patch("api.attack_paths.sink.neptune.neo4j.GraphDatabase.driver")
    def test_init_builds_neptune_backend(
        self, mock_driver, mock_auth_provider, settings
    ):
        from api.attack_paths import sink as sink_module
        from api.attack_paths.sink.neptune import NeptuneSink
        settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
        settings.DATABASES = {
            **settings.DATABASES,
            "neptune": {
                "WRITER_ENDPOINT": "writer.example",
                "READER_ENDPOINT": "reader.example",
                "PORT": "8182",
                "REGION": "eu-west-1",
            },
        }
        mock_driver.return_value = MagicMock()
        mock_auth_provider.return_value = lambda: None
        backend = sink_module.init()
        assert isinstance(backend, NeptuneSink)
        # Writer + reader endpoints both trigger driver construction
        assert mock_driver.call_count == 2
        writer_uri = mock_driver.call_args_list[0][0][0]
        reader_uri = mock_driver.call_args_list[1][0][0]
        assert writer_uri == "bolt+s://writer.example:8182"
        assert reader_uri == "bolt+s://reader.example:8182"
    @patch("api.attack_paths.sink.neptune.neptune_auth_provider")
    @patch("api.attack_paths.sink.neptune.neo4j.GraphDatabase.driver")
    def test_neptune_reader_falls_back_to_writer(
        self, mock_driver, mock_auth_provider, settings
    ):
        from api.attack_paths import sink as sink_module
        settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
        settings.DATABASES = {
            **settings.DATABASES,
            "neptune": {
                "WRITER_ENDPOINT": "writer.example",
                "READER_ENDPOINT": "",
                "PORT": "8182",
                "REGION": "eu-west-1",
            },
        }
        mock_driver.return_value = MagicMock()
        mock_auth_provider.return_value = lambda: None
        sink_module.init()
        # Only one driver call — reader aliases writer
        assert mock_driver.call_count == 1
 class TestGetBackendForScan:
    """``get_backend_for_scan`` routes by the row's recorded sink backend."""
    @patch("api.attack_paths.sink.neo4j.neo4j.GraphDatabase.driver")
    def test_legacy_scan_on_neo4j_process_uses_active_backend(
        self, mock_driver, settings
    ):
        from api.attack_paths import sink as sink_module
        settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
        settings.DATABASES = {
            **settings.DATABASES,
            "neo4j": {
                "HOST": "localhost",
                "PORT": "7687",
                "USER": "neo4j",
                "PASSWORD": "pw",
            },
        }
        mock_driver.return_value = MagicMock()
        scan = MagicMock(sink_backend="neo4j")
        backend = sink_module.get_backend_for_scan(scan)
        assert backend is sink_module.get_backend()
    def test_neptune_scan_on_neo4j_process_uses_neptune_secondary(self, settings):
        from api.attack_paths.sink import factory
        settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
        active_neo4j = MagicMock(name="neo4j-active")
        factory._backend = active_neo4j
        secondary_neptune = MagicMock(name="neptune-secondary")
        with patch.object(factory, "_build_backend", return_value=secondary_neptune):
            scan = MagicMock(sink_backend="neptune")
            backend = factory.get_backend_for_scan(scan)
        assert backend is secondary_neptune
        assert backend is not active_neo4j
 def _session_ctx(session: MagicMock) -> MagicMock:
    ctx = MagicMock()
    ctx.__enter__ = MagicMock(return_value=session)
    ctx.__exit__ = MagicMock(return_value=False)
    return ctx
 class TestNeo4jSinkSyncWrites:
    def test_ensure_sync_indexes_runs_create_index_idempotent(self):
        from api.attack_paths.sink.neo4j import Neo4jSink
        sink = Neo4jSink()
        session = MagicMock()
        session.run.return_value = MagicMock()
        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
            sink.ensure_sync_indexes("db-tenant-x")
        query = session.run.call_args.args[0]
        assert "CREATE INDEX" in query
        assert "IF NOT EXISTS" in query
        assert "`_ProviderResource`" in query
        assert "`_provider_element_id`" in query
    def test_write_nodes_skips_empty_batch(self):
        from api.attack_paths.sink.neo4j import Neo4jSink
        sink = Neo4jSink()
        with patch.object(sink, "get_session") as get_session:
            sink.write_nodes("db-tenant-x", "`AWSUser`", [])
            get_session.assert_not_called()
    def test_write_nodes_merges_on_provider_resource_label(self):
        from api.attack_paths.sink.neo4j import Neo4jSink
        sink = Neo4jSink()
        session = MagicMock()
        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
            sink.write_nodes(
                "db-tenant-x",
                "`AWSUser`:`_ProviderResource`",
                [{"provider_element_id": "p:e", "props": {"k": "v"}}],
            )
        query, params = session.run.call_args.args
        assert "MERGE (n:`_ProviderResource`" in query
        assert "`_provider_element_id`: row.provider_element_id" in query
        assert "SET n:`AWSUser`:`_ProviderResource`" in query
        assert params == {"rows": [{"provider_element_id": "p:e", "props": {"k": "v"}}]}
    def test_write_relationships_scopes_endpoints_by_provider_label(self):
        from api.attack_paths.sink.neo4j import Neo4jSink
        sink = Neo4jSink()
        session = MagicMock()
        provider_id = "00000000-0000-0000-0000-000000000abc"
        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
            sink.write_relationships(
                "db-tenant-x",
                "RESOURCE",
                provider_id,
                [
                    {
                        "start_element_id": "s",
                        "end_element_id": "e",
                        "provider_element_id": "pe",
                        "props": {},
                    }
                ],
            )
        query = session.run.call_args.args[0]
        assert ":`_Provider_00000000000000000000000000000abc`" in query
        assert ":RESOURCE" in query.replace("`", "")
        assert "MERGE (s)-[r:`RESOURCE`" in query
 class TestNeptuneSinkSyncWrites:
    def test_ensure_sync_indexes_is_noop(self):
        from api.attack_paths.sink.neptune import NeptuneSink
        sink = NeptuneSink()
        with patch.object(sink, "get_session") as get_session:
            sink.ensure_sync_indexes("ignored")
            get_session.assert_not_called()
    def test_write_nodes_merges_on_neptune_id_with_provider_resource_label(self):
        from api.attack_paths.sink.neptune import NeptuneSink
        sink = NeptuneSink()
        session = MagicMock()
        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
            sink.write_nodes(
                "ignored",
                "`AWSUser`",
                [{"provider_element_id": "p:e", "props": {"k": "v"}}],
            )
        query = session.run.call_args.args[0]
        # Neptune assigns a default `vertex` label to any unlabeled node,
        # so the MERGE must pin a real label at creation time.
        assert "MERGE (n:`_ProviderResource` {`~id`: row.provider_element_id})" in query
        assert "SET n:`AWSUser`" in query
        assert "SET n.`_provider_element_id` = row.provider_element_id" in query
    def test_write_relationships_matches_endpoints_by_id(self):
        from api.attack_paths.sink.neptune import NeptuneSink
        sink = NeptuneSink()
        session = MagicMock()
        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
            sink.write_relationships(
                "ignored",
                "RESOURCE",
                "provider-1",
                [
                    {
                        "start_element_id": "s",
                        "end_element_id": "e",
                        "provider_element_id": "pe",
                        "props": {},
                    }
                ],
            )
        query = session.run.call_args.args[0]
        assert "MATCH (s) WHERE id(s) = row.start_element_id" in query
        assert "MATCH (e) WHERE id(e) = row.end_element_id" in query
        assert "MERGE (s)-[r:`RESOURCE`" in query
 class TestNeptuneSinkDropSubgraph:
    def test_drop_subgraph_deletes_rels_before_nodes_in_bounded_batches(self):
        from api.attack_paths.sink.neptune import NeptuneSink
        sink = NeptuneSink()
        session = MagicMock()
        rel_record_first = MagicMock()
        rel_record_first.__getitem__ = lambda _self, key: 50
        rel_record_drain = MagicMock()
        rel_record_drain.__getitem__ = lambda _self, key: 0
        node_record_first = MagicMock()
        node_record_first.__getitem__ = lambda _self, key: 10
        node_record_drain = MagicMock()
        node_record_drain.__getitem__ = lambda _self, key: 0
        run_results = [
            MagicMock(single=MagicMock(return_value=rel_record_first)),
            MagicMock(single=MagicMock(return_value=rel_record_drain)),
            MagicMock(single=MagicMock(return_value=node_record_first)),
            MagicMock(single=MagicMock(return_value=node_record_drain)),
        ]
        session.run.side_effect = run_results
        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
            deleted = sink.drop_subgraph("ignored", "provider-1")
        assert deleted == 10
        first_query = session.run.call_args_list[0].args[0]
        assert "DELETE r" in first_query
        assert "DETACH DELETE" not in first_query
        # DISTINCT avoids double-counting relationships matched from both ends.
        assert "DISTINCT r" in first_query
        third_query = session.run.call_args_list[2].args[0]
        assert "DELETE n" in third_query
 class TestNeo4jSinkDropSubgraph:
    """Neo4j drop deletes relationships then nodes in batches (no ``DETACH DELETE``)."""
    def test_drop_subgraph_deletes_rels_before_nodes_in_bounded_batches(self):
        from api.attack_paths.sink.neo4j import Neo4jSink
        sink = Neo4jSink()
        session = MagicMock()
        rel_first = MagicMock()
        rel_first.get = lambda key, default=0: 50
        rel_drain = MagicMock()
        rel_drain.get = lambda key, default=0: 0
        node_first = MagicMock()
        node_first.get = lambda key, default=0: 10
        node_drain = MagicMock()
        node_drain.get = lambda key, default=0: 0
        session.run.side_effect = [
            MagicMock(single=MagicMock(return_value=rel_first)),
            MagicMock(single=MagicMock(return_value=rel_drain)),
            MagicMock(single=MagicMock(return_value=node_first)),
            MagicMock(single=MagicMock(return_value=node_drain)),
        ]
        provider_id = "00000000-0000-0000-0000-000000000abc"
        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
            deleted = sink.drop_subgraph("db-tenant-x", provider_id)
        # Only phase-2 node counts contribute to the return value.
        assert deleted == 10
        assert session.run.call_count == 4
        queries = [call.args[0] for call in session.run.call_args_list]
        # Regression guard: the memory blow-up was caused by DETACH DELETE.
        assert all("DETACH DELETE" not in query for query in queries)
        first_query = queries[0]
        assert "DELETE r" in first_query
        # DISTINCT avoids double-counting relationships matched from both ends.
        assert "DISTINCT r" in first_query
        assert ":`_Provider_00000000000000000000000000000abc`" in first_query
        assert "DELETE n" in queries[2]
        # Relationships must be fully drained before nodes are deleted.
        first_node = next(i for i, q in enumerate(queries) if "DELETE n" in q)
        last_rel = max(i for i, q in enumerate(queries) if "DELETE r" in q)
        assert last_rel < first_node
    def test_drop_subgraph_returns_zero_when_database_does_not_exist(self):
        from api.attack_paths.database import GraphDatabaseQueryException
        from api.attack_paths.sink.neo4j import DATABASE_NOT_FOUND_CODE, Neo4jSink
        sink = Neo4jSink()
        session = MagicMock()
        session.run.side_effect = GraphDatabaseQueryException(
            message="db missing", code=DATABASE_NOT_FOUND_CODE
        )
        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
            deleted = sink.drop_subgraph("db-tenant-missing", "provider-1")
        assert deleted == 0
 class TestSinkHasProviderData:
    """``has_provider_data`` is the read-path probe used by API views."""
    def test_neo4j_returns_true_when_provider_node_exists(self):
        from api.attack_paths.sink.neo4j import Neo4jSink
        sink = Neo4jSink()
        session = MagicMock()
        session.run.return_value.single.return_value = MagicMock()
        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
            present = sink.has_provider_data(
                "db-tenant-x", "00000000-0000-0000-0000-000000000abc"
            )
        assert present is True
        query = session.run.call_args.args[0]
        assert ":`_Provider_00000000000000000000000000000abc`" in query
    def test_neo4j_returns_false_when_database_does_not_exist(self):
        from api.attack_paths.database import GraphDatabaseQueryException
        from api.attack_paths.sink.neo4j import DATABASE_NOT_FOUND_CODE, Neo4jSink
        sink = Neo4jSink()
        session = MagicMock()
        session.run.side_effect = GraphDatabaseQueryException(
            message="db missing", code=DATABASE_NOT_FOUND_CODE
        )
        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
            present = sink.has_provider_data("db-tenant-missing", "provider-1")
        assert present is False
    def test_neptune_returns_true_when_provider_node_exists(self):
        from api.attack_paths.sink.neptune import NeptuneSink
        sink = NeptuneSink()
        session = MagicMock()
        session.run.return_value.single.return_value = MagicMock()
        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
            present = sink.has_provider_data("ignored", "provider-1")
        assert present is True
 class TestGetBackendForScanCutover:
    """``get_backend_for_scan`` keeps old-sink scans queryable after cutover."""
    def test_legacy_scan_on_neptune_process_uses_neo4j_secondary(self, settings):
        from api.attack_paths.sink import factory
        settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
        active_neptune = MagicMock(name="neptune-active")
        factory._backend = active_neptune
        secondary_neo4j = MagicMock(name="neo4j-secondary")
        with patch.object(factory, "_build_backend", return_value=secondary_neo4j):
            scan = MagicMock(sink_backend="neo4j")
            backend = factory.get_backend_for_scan(scan)
        assert backend is secondary_neo4j
        assert backend is not active_neptune
 class TestSinkVerifyConnectivity:
    """The readiness probe calls ``verify_connectivity`` through the shim.
    Neo4j checks its single driver; Neptune checks the reader (the API read
    path), which on single-endpoint clusters aliases the writer.
    """
    @patch("api.attack_paths.sink.neo4j.neo4j.GraphDatabase.driver")
    def test_neo4j_verifies_its_driver(self, mock_driver, settings):
        from api.attack_paths.sink.neo4j import Neo4jSink
        settings.DATABASES = {
            **settings.DATABASES,
            "neo4j": {
                "HOST": "localhost",
                "PORT": "7687",
                "USER": "neo4j",
                "PASSWORD": "pw",
            },
        }
        driver = MagicMock()
        mock_driver.return_value = driver
        sink = Neo4jSink()
        sink.init()
        driver.verify_connectivity.reset_mock()  # ignore the eager init check
        sink.verify_connectivity()
        driver.verify_connectivity.assert_called_once_with()
    @patch("api.attack_paths.sink.neptune.neptune_auth_provider")
    @patch("api.attack_paths.sink.neptune.neo4j.GraphDatabase.driver")
    def test_neptune_verifies_reader_not_writer(
        self, mock_driver, mock_auth_provider, settings
    ):
        from api.attack_paths.sink.neptune import NeptuneSink
        settings.DATABASES = {
            **settings.DATABASES,
            "neptune": {
                "WRITER_ENDPOINT": "writer.example",
                "READER_ENDPOINT": "reader.example",
                "PORT": "8182",
                "REGION": "eu-west-1",
            },
        }
        writer, reader = MagicMock(name="writer"), MagicMock(name="reader")
        mock_driver.side_effect = [writer, reader]
        mock_auth_provider.return_value = lambda: None
        sink = NeptuneSink()
        sink.init()
        writer.verify_connectivity.reset_mock()
        reader.verify_connectivity.reset_mock()
        sink.verify_connectivity()
        reader.verify_connectivity.assert_called_once_with()
        writer.verify_connectivity.assert_not_called()
 class TestSinkInitToleratesUnreachableSink:
    """Init must not crash the process when the sink is down at boot.
    Same degradation model as Postgres: the driver is retained and
    reconnects lazily; /health/ready surfaces the outage until it recovers.
    """
    @patch("api.attack_paths.sink.neo4j.neo4j.GraphDatabase.driver")
    def test_neo4j_init_continues_when_verify_fails(self, mock_driver, settings):
        from api.attack_paths.sink.neo4j import Neo4jSink
        settings.DATABASES = {
            **settings.DATABASES,
            "neo4j": {
                "HOST": "localhost",
                "PORT": "7687",
                "USER": "neo4j",
                "PASSWORD": "pw",
            },
        }
        driver = MagicMock()
        driver.verify_connectivity.side_effect = RuntimeError("unreachable")
        mock_driver.return_value = driver
        sink = Neo4jSink()
        # Must not raise.
        assert sink.init() is driver
        assert sink._driver is driver
    @patch("api.attack_paths.sink.neptune.neptune_auth_provider")
    @patch("api.attack_paths.sink.neptune.neo4j.GraphDatabase.driver")
    def test_neptune_init_continues_when_verify_fails(
        self, mock_driver, mock_auth_provider, settings
    ):
        from api.attack_paths.sink.neptune import NeptuneSink
        settings.DATABASES = {
            **settings.DATABASES,
            "neptune": {
                "WRITER_ENDPOINT": "writer.example",
                "READER_ENDPOINT": "reader.example",
                "PORT": "8182",
                "REGION": "eu-west-1",
            },
        }
        driver = MagicMock()
        driver.verify_connectivity.side_effect = RuntimeError("unreachable")
        mock_driver.return_value = driver
        mock_auth_provider.return_value = lambda: None
        sink = NeptuneSink()
        # Must not raise; both drivers retained.
        sink.init()
        assert sink._writer is not None
        assert sink._reader is not None
 class TestNeptuneAdminNoOps:
    """Neptune is single-database; admin DDL has no work to do."""
    @pytest.mark.parametrize("method", ["create_database", "drop_database"])
    def test_admin_ops_return_none_without_touching_a_session(self, method):
        from api.attack_paths.sink.neptune import NeptuneSink
        sink = NeptuneSink()
        with patch.object(sink, "get_session") as get_session:
            assert getattr(sink, method)("ignored") is None
            get_session.assert_not_called()
 class TestNeptuneAuthToken:
    """SigV4 signing for the Neptune Bolt endpoint."""
    @patch("api.attack_paths.sink.neptune.SigV4Auth")
    @patch("api.attack_paths.sink.neptune.BotoSession")
    def test_host_header_includes_non_default_port(self, mock_boto, mock_sigv4):
        # Neptune runs on 8182; the SigV4 canonical Host must keep the port or
        # the signature is rejected.
        from api.attack_paths.sink.neptune import _NeptuneAuthToken
        credentials = MagicMock()
        credentials.get_frozen_credentials.return_value = MagicMock()
        mock_boto.return_value.get_credentials.return_value = credentials
        token = _NeptuneAuthToken("eu-west-1", "https://writer.example:8182")
        auth_obj = json.loads(token.credentials)
        assert auth_obj["Host"] == "writer.example:8182"
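The test above pins one property: the signed `Host` value keeps a non-default port. A minimal sketch of that rule, using only the standard library, is shown below. `canonical_host` and its `default_port` parameter are hypothetical names for illustration; this is not the `_NeptuneAuthToken` implementation, which this diff does not show.

```python
from urllib.parse import urlsplit


def canonical_host(endpoint_url: str, default_port: int = 443) -> str:
    """Return a Host header value, keeping the port only when it is non-default.

    Hypothetical helper: the real signing code may derive the header differently.
    """
    parts = urlsplit(endpoint_url)
    host = parts.hostname or ""
    port = parts.port
    # A missing or default port is omitted; anything else (e.g. 8182) is kept.
    if port is None or port == default_port:
        return host
    return f"{host}:{port}"


if __name__ == "__main__":
    print(canonical_host("https://writer.example:8182"))
    print(canonical_host("https://writer.example"))
```

If the port were dropped for an endpoint on 8182, the signed header would no longer match the one the server sees, which is the failure the test guards against.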
@@ -4754,6 +4754,64 @@ class TestAttackPathsScanViewSet:
        assert first_attributes["provider_type"] == provider.provider
        assert first_attributes["provider_uid"] == provider.uid
    def test_attack_paths_scans_list_prefers_active_sink_scan_on_rollback(
        self,
        authenticated_client,
        providers_fixture,
        scans_fixture,
        create_attack_paths_scan,
        settings,
    ):
        settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
        provider = providers_fixture[0]
        neo4j_scan = create_attack_paths_scan(
            provider,
            scan=scans_fixture[0],
            state=StateChoices.COMPLETED,
            graph_data_ready=True,
            sink_backend="neo4j",
        )
        neptune_scan = create_attack_paths_scan(
            provider,
            scan=scans_fixture[0],
            state=StateChoices.COMPLETED,
            graph_data_ready=True,
            sink_backend="neptune",
        )
        response = authenticated_client.get(reverse("attack-paths-scans-list"))
        assert response.status_code == status.HTTP_200_OK
        ids = {item["id"] for item in response.json()["data"]}
        assert str(neo4j_scan.id) in ids
        assert str(neptune_scan.id) not in ids
    def test_attack_paths_scans_list_falls_back_when_active_sink_has_no_scan(
        self,
        authenticated_client,
        providers_fixture,
        scans_fixture,
        create_attack_paths_scan,
        settings,
    ):
        settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
        provider = providers_fixture[0]
        legacy_scan = create_attack_paths_scan(
            provider,
            scan=scans_fixture[0],
            state=StateChoices.COMPLETED,
            graph_data_ready=True,
            sink_backend="neo4j",
        )
        response = authenticated_client.get(reverse("attack-paths-scans-list"))
        assert response.status_code == status.HTTP_200_OK
        ids = {item["id"] for item in response.json()["data"]}
        assert str(legacy_scan.id) in ids
    def test_attack_paths_scans_list_respects_provider_group_visibility(
        self,
        authenticated_client_no_permissions_rbac,
@@ -4874,7 +4932,8 @@ class TestAttackPathsScanViewSet:
            )
        assert response.status_code == status.HTTP_200_OK
-        mock_get_queries.assert_called_once_with(provider.provider)
+        # TODO: drop the is_migrated argument after Neptune cutover
        mock_get_queries.assert_called_once_with(provider.provider, is_migrated=False)
        payload = response.json()["data"]
        assert len(payload) == 1
        assert payload[0]["id"] == "aws-rds"
@@ -4974,7 +5033,8 @@ class TestAttackPathsScanViewSet:
            )
        assert response.status_code == status.HTTP_200_OK
-        mock_get_query.assert_called_once_with("aws-rds")
+        # TODO: drop the is_migrated argument after Neptune cutover
        mock_get_query.assert_called_once_with("aws-rds", is_migrated=False)
        mock_get_db_name.assert_called_once_with(attack_paths_scan.provider.tenant_id)
        provider_id = str(attack_paths_scan.provider_id)
        mock_prepare.assert_called_once_with(
@@ -4988,6 +5048,7 @@ class TestAttackPathsScanViewSet:
            query_definition,
            prepared_parameters,
            provider_id,
            scan=attack_paths_scan,
        )
        result = response.json()["data"]
        attributes = result["attributes"]
@@ -5339,6 +5400,7 @@ class TestAttackPathsScanViewSet:
            "db-test",
            "MATCH (n) RETURN n",
            str(attack_paths_scan.provider_id),
            scan=attack_paths_scan,
        )
        attributes = response.json()["data"]["attributes"]
        assert len(attributes["nodes"]) == 1
@@ -5875,9 +5937,10 @@ class TestAttackPathsScanViewSet:
            )
        assert response.status_code == status.HTTP_200_OK
-        mock_get_schema.assert_called_once_with(
+        mock_get_schema.assert_called_once()
-            "db-test", str(attack_paths_scan.provider_id)
+        schema_args = mock_get_schema.call_args[0]
-        )
+        assert schema_args[:2] == ("db-test", str(attack_paths_scan.provider_id))
        assert schema_args[2].id == attack_paths_scan.id
        attributes = response.json()["data"]["attributes"]
        assert attributes["provider"] == "aws"
        assert attributes["cartography_version"] == "0.129.0"
@@ -2876,13 +2876,22 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
    def list(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())
        active_sink_backend = django_settings.ATTACK_PATHS_SINK_DATABASE
        latest_per_provider = queryset.annotate(
            active_sink_rank=Case(
                When(sink_backend=active_sink_backend, then=Value(0)),
                default=Value(1),
                output_field=IntegerField(),
            ),
            latest_scan_rank=Window(
                expression=RowNumber(),
                partition_by=[F("provider_id")],
-                order_by=[F("inserted_at").desc()],
+                order_by=[
-            )
+                    F("active_sink_rank").asc(),
                    F("inserted_at").desc(),
                ],
            ),
        ).filter(latest_scan_rank=1)
        page = self.paginate_queryset(latest_per_provider)
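The `Case`/`Window` annotation above orders each provider's scans by "active sink first, then newest" and keeps row number 1. The same selection can be sketched in plain Python, assuming an in-memory list of rows. `ScanRow` and `latest_per_provider` are hypothetical stand-ins for illustration, not the ORM model or the view code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ScanRow:
    """Hypothetical stand-in for an attack-paths scan row."""

    id: str
    provider_id: str
    sink_backend: str
    inserted_at: datetime


def latest_per_provider(rows: list[ScanRow], active_sink: str) -> list[ScanRow]:
    """Pick one row per provider: active-sink rows first, newest within a rank."""
    best: dict[str, tuple[tuple[int, float], ScanRow]] = {}
    for row in rows:
        # Rank 0 for the active sink, 1 otherwise; negate the timestamp so
        # that a smaller key means a newer row.
        key = (0 if row.sink_backend == active_sink else 1, -row.inserted_at.timestamp())
        current = best.get(row.provider_id)
        if current is None or key < current[0]:
            best[row.provider_id] = (key, row)
    return [row for _, row in best.values()]


if __name__ == "__main__":
    old = datetime(2026, 1, 1, tzinfo=timezone.utc)
    new = datetime(2026, 2, 1, tzinfo=timezone.utc)
    rows = [
        ScanRow("a", "p1", "neo4j", old),
        ScanRow("b", "p1", "neptune", new),
    ]
    print([r.id for r in latest_per_provider(rows, "neo4j")])
```

With `active_sink="neo4j"` the older Neo4j row wins over the newer Neptune row (the rollback case). If no row matches the active sink, the newest row from any sink is returned, which mirrors the fallback test.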
@@ -2909,7 +2918,11 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
    )
    def attack_paths_queries(self, request, pk=None):
        attack_paths_scan = self.get_object()
-        queries = get_queries_for_provider(attack_paths_scan.provider.provider)
+        # TODO: drop the is_migrated argument after Neptune cutover
        queries = get_queries_for_provider(
            attack_paths_scan.provider.provider,
            is_migrated=attack_paths_scan.is_migrated,
        )
        if not queries:
            return Response(
@@ -2942,7 +2955,11 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
        serializer = AttackPathsQueryRunRequestSerializer(data=payload)
        serializer.is_valid(raise_exception=True)
-        query_definition = get_query_by_id(serializer.validated_data["id"])
+        # TODO: drop the is_migrated argument after Neptune cutover
        query_definition = get_query_by_id(
            serializer.validated_data["id"],
            is_migrated=attack_paths_scan.is_migrated,
        )
        if (
            query_definition is None
            or query_definition.provider != attack_paths_scan.provider.provider
@@ -2968,6 +2985,7 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
            query_definition,
            parameters,
            provider_id,
            scan=attack_paths_scan,
        )
        query_duration = time.monotonic() - start
@@ -3035,6 +3053,7 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
            database_name,
            serializer.validated_data["query"],
            provider_id,
            scan=attack_paths_scan,
        )
        query_duration = time.monotonic() - start
@@ -3091,7 +3110,7 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
        provider_id = str(attack_paths_scan.provider_id)
        schema = attack_paths_views_helpers.get_cartography_schema(
-            database_name, provider_id
+            database_name, provider_id, attack_paths_scan
        )
        if not schema:
            return Response(
@@ -311,6 +311,11 @@ ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES = env.int(
    "ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES", 2880
 )  # 48h
 # Selects where the persistent attack-paths graph is stored. The scan
 # temporary database is always Neo4j; only the sink is configurable.
 # Valid values: "neo4j" (default, OSS and local dev), "neptune" (hosted).
 ATTACK_PATHS_SINK_DATABASE = env.str("ATTACK_PATHS_SINK_DATABASE", default="neo4j")
 # Orphan task recovery feature flags. The master switch is OFF by default, so task
 # recovery is opt-in; enable it with DJANGO_TASK_RECOVERY_ENABLED=true. The per-group
 # toggles default to enabled, so once the master is on every group recovers unless a
@@ -50,6 +50,12 @@ DATABASES = {
        "USER": env.str("NEO4J_USER", "neo4j"),
        "PASSWORD": env.str("NEO4J_PASSWORD", "neo4j_password"),
    },
    "neptune": {
        "WRITER_ENDPOINT": env.str("NEPTUNE_WRITER_ENDPOINT", ""),
        "READER_ENDPOINT": env.str("NEPTUNE_READER_ENDPOINT", ""),
        "PORT": env.str("NEPTUNE_PORT", "8182"),
        "REGION": env.str("AWS_REGION", ""),
    },
 }
 DATABASES["default"] = DATABASES["prowler_user"]
@@ -49,12 +49,19 @@ DATABASES = {
        "HOST": env("POSTGRES_REPLICA_HOST", default=default_db_host),
        "PORT": env("POSTGRES_REPLICA_PORT", default=default_db_port),
    },
    # TODO: drop after Neptune cutover, or just loosen defaults to `""`
    "neo4j": {
        "HOST": env.str("NEO4J_HOST"),
        "PORT": env.str("NEO4J_PORT"),
        "USER": env.str("NEO4J_USER"),
        "PASSWORD": env.str("NEO4J_PASSWORD"),
    },
    "neptune": {
        "WRITER_ENDPOINT": env.str("NEPTUNE_WRITER_ENDPOINT", default=""),
        "READER_ENDPOINT": env.str("NEPTUNE_READER_ENDPOINT", default=""),
        "PORT": env.str("NEPTUNE_PORT", default="8182"),
        "REGION": env.str("AWS_REGION", default=""),
    },
 }
 DATABASES["default"] = DATABASES["prowler_user"]
@@ -83,12 +83,28 @@ def _warm_compliance_caches_in_background():
 def post_fork(_server, worker):
-    """Warm compliance caches after each worker fork.
+    """Re-initialize attack-paths drivers and warm compliance caches per worker.
-    Warm compliance caches in a background thread so the worker becomes ready
+    Neo4j / Neptune drivers spawn background IO threads that do not survive
-    immediately. A request for a not-yet-warmed provider lazily loads just that
+    ``fork()``. When the gunicorn master runs with ``preload_app=True``, the
-    provider, which stays well under the worker timeout.
+    child inherits driver objects whose pool references dead threads and
    hangs on the first ``pool.acquire`` call until the watchdog kills the
    worker. Re-initializing per worker guarantees each child owns its own
    live threads. See GUNICORN_WORKER_TIMEOUTS_ANALYSIS.md for detail.
    Compliance caches are then warmed in a background thread so the worker
    becomes ready immediately. A request for a not-yet-warmed provider lazily
    loads just that provider, which stays well under the worker timeout.
    """
    from api.attack_paths import database as graph_database
    try:
        graph_database.close_driver()
    except Exception:  # pragma: no cover - best-effort cleanup
        pass
    graph_database.init_driver()
    gunicorn_logger.info(f"Attack-paths drivers initialized for worker {worker.pid}")
    threading.Thread(
        target=_warm_compliance_caches_in_background,
        name="warm-compliance-caches",
@@ -1821,6 +1821,36 @@ def attack_paths_query_definition_factory():
    return _create
@pytest.fixture
 def sink_backend_stub():
    """Install a stub `SinkDatabase` into the sink factory for the test's duration.
    The sink factory caches a process-wide backend and lazily initializes it
    against `settings.DATABASES["neo4j"]` / `["neptune"]`. Tests that don't
    want to stand up a real Bolt driver can yield this fixture's mock and
    configure its return values directly:
        sink_backend_stub.execute_read_query.return_value = some_graph
    Both the active backend and the secondary-backend cache are restored on
    teardown so tests stay isolated.
    """
    from api.attack_paths.sink import factory
    from api.attack_paths.sink.base import SinkDatabase
    stub = MagicMock(spec=SinkDatabase)
    previous_backend = factory._backend
    previous_secondary = dict(factory._secondary_backends)
    factory._backend = stub
    factory._secondary_backends.clear()
    try:
        yield stub
    finally:
        factory._backend = previous_backend
        factory._secondary_backends.clear()
        factory._secondary_backends.update(previous_secondary)
@pytest.fixture
 def attack_paths_graph_stub_classes():
    """Provide lightweight graph element stubs for Attack Paths serialization tests."""
@@ -6,6 +6,7 @@ from typing import Any
 import aioboto3
 import boto3
 import botocore
 import neo4j
 from api.models import (
    AttackPathsScan as ProwlerAPIAttackPathsScan,
@@ -73,13 +74,28 @@ def start_aws_ingestion(
    # Adding an extra field
    common_job_parameters["AWS_ID"] = prowler_api_provider.uid
-    cartography_aws._autodiscover_accounts(
+    # AWS Organizations account autodiscovery. Inlined from Cartography's removed
    # `_autodiscover_accounts` (deleted in `0.137.0`), as `load_aws_accounts` is still public.
    try:
        org_client = boto3_session.client("organizations")
        paginator = org_client.get_paginator("list_accounts")
        discovered = []
        for page in paginator.paginate():
            discovered.extend(page["Accounts"])
        active_accounts = {
            a["Name"]: a["Id"] for a in discovered if a["Status"] == "ACTIVE"
        }
        cartography_aws.organizations.load_aws_accounts(
            neo4j_session,
-        boto3_session,
+            active_accounts,
        prowler_api_provider.uid,
            cartography_config.update_tag,
            common_job_parameters,
        )
    except botocore.exceptions.ClientError:
        logger.warning(
            f"Account {prowler_api_provider.uid} lacks permissions for AWS "
            "Organizations autodiscovery."
        )
    db_utils.update_attack_paths_scan_progress(attack_paths_scan, 4)
    failed_syncs = sync_aws_account(
@@ -277,7 +293,7 @@ def sync_aws_account(
    sync_args: dict[str, Any],
    attack_paths_scan: ProwlerAPIAttackPathsScan,
 ) -> dict[str, str]:
-    current_progress = 4  # `cartography_aws._autodiscover_accounts`
+    current_progress = 4  # AWS Organizations account autodiscovery
    max_progress = (
        87  # `cartography_aws.RESOURCE_FUNCTIONS["permission_relationships"]` - 1
    )
@@ -8,7 +8,7 @@ from celery import states
 from celery.utils.log import get_task_logger
 from config.django.base import ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES
 from tasks.jobs.attack_paths.db_utils import (
-    _mark_scan_finished,
+    mark_scan_finished,
    recover_graph_data_ready,
 )
 from tasks.jobs.orphan_recovery import is_worker_alive as _is_worker_alive
@@ -87,7 +87,7 @@ def _cleanup_stale_executing_scans(cutoff: datetime) -> list[str]:
            else:
                reason = "Worker dead — cleaned up by periodic task"
        else:
-            # No worker recorded — time-based heuristic only
+            # No worker recorded, time-based heuristic only
            if scan.started_at and scan.started_at >= cutoff:
                continue
            reason = (
@@ -160,7 +160,7 @@ def _cleanup_scan(scan, task_result, reason: str) -> bool:
    """
    scan_id_str = str(scan.id)
-    # 1. Drop temp Neo4j database
+    # Drop temp Neo4j database
    tmp_db_name = graph_database.get_database_name(scan.id, temporary=True)
    try:
        graph_database.drop_database(tmp_db_name)
@@ -225,6 +225,6 @@ def _finalize_failed_scan(scan, expected_state: str, reason: str):
            logger.info(f"Scan {scan_id_str} is now {fresh_scan.state}, skipping")
            return None
-        _mark_scan_finished(fresh_scan, StateChoices.FAILED, {"global_error": reason})
+        mark_scan_finished(fresh_scan, StateChoices.FAILED, {"global_error": reason})
    return fresh_scan
@@ -1,9 +1,14 @@
 from collections.abc import Callable
 from dataclasses import dataclass
 from uuid import UUID
 from config.env import env
-from tasks.jobs.attack_paths import aws
+from tasks.jobs.attack_paths import provider_config as _provider_config
 # Re-export provider config objects so existing imports keep working.
 AWS_CONFIG = _provider_config.AWS_CONFIG
 NormalizedList = _provider_config.NormalizedList
 PROVIDER_CONFIGS = _provider_config.PROVIDER_CONFIGS
 ProviderConfig = _provider_config.ProviderConfig
 # Batch size for Neo4j write operations (resource labeling, cleanup)
 BATCH_SIZE = env.int("ATTACK_PATHS_BATCH_SIZE", 1000)
@@ -21,42 +26,12 @@ PROWLER_FINDING_LABEL = "ProwlerFinding"
 PROVIDER_RESOURCE_LABEL = "_ProviderResource"
 # Dynamic isolation labels that contain entity UUIDs and are added to every synced node during sync
-# Format: _Tenant_{uuid_no_hyphens}, _Provider_{uuid_no_hyphens}
+# Format: `_Tenant_{uuid_no_hyphens}`, `_Provider_{uuid_no_hyphens}`
 TENANT_LABEL_PREFIX = "_Tenant_"
 PROVIDER_LABEL_PREFIX = "_Provider_"
 DYNAMIC_ISOLATION_PREFIXES = [TENANT_LABEL_PREFIX, PROVIDER_LABEL_PREFIX]
@dataclass(frozen=True)
 class ProviderConfig:
    """Configuration for a cloud provider's Attack Paths integration."""
    name: str
    root_node_label: str  # e.g., "AWSAccount"
    uid_field: str  # e.g., "arn"
    # Label for resources connected to the account node, enabling indexed finding lookups.
    resource_label: str  # e.g., "_AWSResource"
    ingestion_function: Callable
    # Maps a Postgres resource UID (e.g. full ARN) to the short-id form Cartography stores on some node types (e.g. `i-xxx` for EC2Instance).
    short_uid_extractor: Callable[[str], str]
 # Provider Configurations
 # -----------------------
 AWS_CONFIG = ProviderConfig(
    name="aws",
    root_node_label="AWSAccount",
    uid_field="arn",
    resource_label="_AWSResource",
    ingestion_function=aws.start_aws_ingestion,
    short_uid_extractor=aws.extract_short_uid,
 )
 PROVIDER_CONFIGS: dict[str, ProviderConfig] = {
    "aws": AWS_CONFIG,
 }
 # Labels added by Prowler that should be filtered from API responses
 # Derived from provider configs + common internal labels
 INTERNAL_LABELS: list[str] = [
@@ -87,7 +62,6 @@ INTERNAL_PROPERTIES: list[str] = [
 # Provider Config Accessors
 # -------------------------
 def is_provider_available(provider_type: str) -> bool:
@@ -135,7 +109,6 @@ def get_short_uid_extractor(provider_type: str) -> Callable[[str], str]:
 # Dynamic Isolation Label Helpers
 # --------------------------------
 def _normalize_uuid(value: str | UUID) -> str:
@@ -8,6 +8,8 @@ from api.models import Provider as ProwlerAPIProvider
 from api.models import StateChoices
 from cartography.config import Config as CartographyConfig
 from celery.utils.log import get_task_logger
 from django.conf import settings
 from django.db.models import Case, IntegerField, Value, When
 from tasks.jobs.attack_paths.config import is_provider_available
 logger = get_task_logger(__name__)
@@ -29,13 +31,33 @@ def create_attack_paths_scan(
        return None
    with rls_transaction(tenant_id):
-        # Inherit graph_data_ready from the previous scan for this provider,
+        # Inherit metadata from the previous ready scan for this provider so
-        # so queries remain available while the new scan runs.
+        # queries remain available while the new scan runs. The new row only
-        previous_data_ready = ProwlerAPIAttackPathsScan.objects.filter(
+        # flips to the target sink after its own graph sync succeeds.
        active_sink_backend = settings.ATTACK_PATHS_SINK_DATABASE
        previous_ready = (
            ProwlerAPIAttackPathsScan.objects.filter(
                tenant_id=tenant_id,
                provider_id=provider_id,
                graph_data_ready=True,
-        ).exists()
+            )
            .annotate(
                active_sink_rank=Case(
                    When(sink_backend=active_sink_backend, then=Value(0)),
                    default=Value(1),
                    output_field=IntegerField(),
                )
            )
            .order_by("active_sink_rank", "-inserted_at")
            .first()
        )
        previous_data_ready = previous_ready is not None
        inherited_is_migrated = previous_ready.is_migrated if previous_ready else False
        inherited_sink_backend = (
            previous_ready.sink_backend
            if previous_ready
            else ProwlerAPIAttackPathsScan.SinkBackendChoices.NEO4J
        )
        attack_paths_scan = ProwlerAPIAttackPathsScan.objects.create(
            tenant_id=tenant_id,
@@ -44,6 +66,8 @@ def create_attack_paths_scan(
            state=StateChoices.SCHEDULED,
            started_at=datetime.now(tz=UTC),
            graph_data_ready=previous_data_ready,
            is_migrated=inherited_is_migrated,
            sink_backend=inherited_sink_backend,
        )
        attack_paths_scan.save()
@@ -114,7 +138,7 @@ def starting_attack_paths_scan(
    return True
-def _mark_scan_finished(
+def mark_scan_finished(
    attack_paths_scan: ProwlerAPIAttackPathsScan,
    state: StateChoices,
    ingestion_exceptions: dict[str, Any],
@@ -148,7 +172,7 @@ def finish_attack_paths_scan(
    ingestion_exceptions: dict[str, Any],
 ) -> None:
    with rls_transaction(attack_paths_scan.tenant_id):
-        _mark_scan_finished(attack_paths_scan, state, ingestion_exceptions)
+        mark_scan_finished(attack_paths_scan, state, ingestion_exceptions)
 def update_attack_paths_scan_progress(
@@ -169,19 +193,45 @@ def set_graph_data_ready(
        attack_paths_scan.save(update_fields=["graph_data_ready"])
 def set_scan_migrated(
    attack_paths_scan: ProwlerAPIAttackPathsScan,
    migrated: bool,
    sink_backend: str | None = None,
 ) -> None:
    """Mark the scan as written with the current (migrated) schema.
    Called after a successful sync so the read catalog and sink backend only
    switch once the new graph is actually live.
    # TODO: drop after Neptune cutover
    """
    with rls_transaction(attack_paths_scan.tenant_id):
        attack_paths_scan.is_migrated = migrated
        update_fields = ["is_migrated"]
        if sink_backend is not None:
            attack_paths_scan.sink_backend = sink_backend
            update_fields.append("sink_backend")
        attack_paths_scan.save(update_fields=update_fields)
 def set_provider_graph_data_ready(
    attack_paths_scan: ProwlerAPIAttackPathsScan,
    ready: bool,
    sink_backend: str | None = None,
 ) -> None:
    """
-    Set `graph_data_ready` for ALL scans of the same provider.
+    Set `graph_data_ready` for scans of the same provider in one sink.
-    Used before drop/sync so that older scan IDs cannot bypass the query gate while the graph is being replaced.
+    Used before drop/sync so that older scan IDs in the target sink cannot
    bypass the query gate while that sink's graph is being replaced. Scans
    preserved in another sink stay queryable for rollback.
    """
    target_sink_backend = sink_backend or attack_paths_scan.sink_backend
    with rls_transaction(attack_paths_scan.tenant_id):
        ProwlerAPIAttackPathsScan.objects.filter(
            tenant_id=attack_paths_scan.tenant_id,
            provider_id=attack_paths_scan.provider_id,
            sink_backend=target_sink_backend,
        ).update(graph_data_ready=ready)
        attack_paths_scan.refresh_from_db(fields=["graph_data_ready"])
@@ -202,10 +252,15 @@ def recover_graph_data_ready(
    next successful scan) is a worse outcome for the user.
    """
    try:
        from api.attack_paths import sink as sink_module
        tenant_db = graph_database.get_database_name(attack_paths_scan.tenant_id)
-        if graph_database.has_provider_data(
+        # TODO: drop after Neptune cutover
-            tenant_db, str(attack_paths_scan.provider_id)
+        # Check the backend that actually holds this scan's data, not the
-        ):
+        # currently configured sink; a stale `EXECUTING` scan from before a
        # backend switch must still be recoverable
        backend = sink_module.get_backend_for_scan(attack_paths_scan)
        if backend.has_provider_data(tenant_db, str(attack_paths_scan.provider_id)):
            set_provider_graph_data_ready(attack_paths_scan, True)
            logger.info(
                f"Recovered `graph_data_ready` for provider {attack_paths_scan.provider_id}"
@@ -247,6 +302,6 @@ def fail_attack_paths_scan(
            return
        if fresh.state in (StateChoices.COMPLETED, StateChoices.FAILED):
            return
-        _mark_scan_finished(fresh, StateChoices.FAILED, {"global_error": error})
+        mark_scan_finished(fresh, StateChoices.FAILED, {"global_error": error})
    recover_graph_data_ready(fresh)
@@ -82,7 +82,6 @@ def _to_neo4j_dict(
 # Public API
 # ----------
 def analysis(
@@ -196,7 +195,6 @@ def load_findings(
 # Findings Streaming (Generator-based)
 # -------------------------------------
 def stream_findings_with_resources(
@@ -275,7 +273,6 @@ def _fetch_findings_batch(
 # Batch Enrichment
 # -----------------
 def _enrich_batch_with_resources(
@@ -1,5 +1,6 @@
 import neo4j
 from cartography.client.core.tx import run_write_query
 from cartography.intel import create_indexes as cartography_create_indexes
 from celery.utils.log import get_task_logger
 from tasks.jobs.attack_paths.config import (
    INTERNET_NODE_LABEL,
@@ -30,14 +31,34 @@ SYNC_INDEX_STATEMENTS = [
 def create_findings_indexes(neo4j_session: neo4j.Session) -> None:
-    """Create indexes for Prowler findings and resource lookups."""
+    """Create indexes for Prowler findings and resource lookups.
    Runs `CREATE INDEX`, so the caller must only invoke this against a Neo4j
    session (the temp ingest DB or a Neo4j sink). Neptune auto-manages indexes
    and rejects `CREATE INDEX`, so callers skip it for the Neptune sink.
    """
    logger.info("Creating indexes for Prowler Findings node types")
    for statement in FINDINGS_INDEX_STATEMENTS:
        run_write_query(neo4j_session, statement)
 def create_cartography_indexes(neo4j_session: neo4j.Session, config) -> None:
    """Create Cartography's standard indexes for the session's database.
    Runs `CREATE INDEX`, so the caller must only invoke this against a Neo4j
    session (the temp ingest DB or a Neo4j sink). Neptune auto-manages indexes
    and rejects `CREATE INDEX`, so callers skip it for the Neptune sink.
    """
    cartography_create_indexes.run(neo4j_session, config)
 def create_sync_indexes(neo4j_session: neo4j.Session) -> None:
-    """Create indexes for provider resource sync operations."""
+    """Create indexes for provider resource sync operations.
    Runs `CREATE INDEX`, so the caller must only invoke this against a Neo4j
    session (the temp ingest DB or a Neo4j sink). Neptune auto-manages indexes
    and rejects `CREATE INDEX`, so callers skip it for the Neptune sink.
    """
    logger.info("Ensuring ProviderResource indexes exist")
    for statement in SYNC_INDEX_STATEMENTS:
        neo4j_session.run(statement)
@@ -0,0 +1,413 @@
 """
 Provider-level Attack Paths configuration.
 Each `ProviderConfig` carries the cloud provider's ingestion entry point and
 the catalog of list-typed node properties (`normalized_lists`). The sync
 layer reads this catalog and materialises each list element as a child node
 connected to the parent by a typed edge, so queries traverse the graph
 instead of working on serialised list values. Both Neo4j and Neptune sinks
 write the same shape and queries are portable across them.
 """
 from collections.abc import Callable
 from dataclasses import dataclass, field
 from tasks.jobs.attack_paths import aws
@dataclass(frozen=True)
 class NormalizedList:
    """Catalog entry for a list-typed node property.
    Describes how the sync layer materialises a parent node's list-typed
    property as a set of child item nodes connected by a typed edge.
    Conventions (mechanical, do not invent):
      - `child_label`: `<SourceLabel><PropertyPascal>Item`
          e.g. AWSPolicyStatement.resource -> AWSPolicyStatementResourceItem
      - `rel_type`:    `HAS_<PROPERTY_UPPER>`
          e.g. resource -> HAS_RESOURCE
      - child node property:
          * `field_map = []` (scalar list, ~95% case)  -> child stores `value: str`
          * `field_map = [(src_key, child_field), ...]` (list of dicts, rare)
              -> child stores those fields
    """
    source_label: str
    source_property: str
    child_label: str
    rel_type: str
    field_map: list[tuple[str, str]] = field(default_factory=list)
    def __post_init__(self) -> None:
        if self.field_map:
            child_fields = [dst for _, dst in self.field_map]
            if "value" in child_fields:
                raise ValueError(
                    f"NormalizedList {self.source_label}.{self.source_property}: "
                    "`value` is reserved for scalar mode; do not map a source key to it"
                )
            src_keys = [src for src, _ in self.field_map]
            if len(set(src_keys)) != len(src_keys):
                raise ValueError(
                    f"NormalizedList {self.source_label}.{self.source_property}: "
                    "duplicate source key in field_map"
                )
            if len(set(child_fields)) != len(child_fields):
                raise ValueError(
                    f"NormalizedList {self.source_label}.{self.source_property}: "
                    "duplicate child field in field_map"
                )
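The naming conventions in the `NormalizedList` docstring are mechanical, so they can be derived rather than typed by hand. The sketch below shows one way to do that. `to_pascal` and `conventional_names` are hypothetical helpers that are not part of this module; they assume snake_case property names, as in the catalog entries.

```python
def to_pascal(prop: str) -> str:
    """Turn a snake_case property name into PascalCase (notaction -> Notaction)."""
    return "".join(part.capitalize() for part in prop.split("_"))


def conventional_names(source_label: str, source_property: str) -> tuple[str, str]:
    """Derive (child_label, rel_type) following the documented conventions.

    Hypothetical helper for illustration; the catalog spells both names out.
    """
    child_label = f"{source_label}{to_pascal(source_property)}Item"
    rel_type = f"HAS_{source_property.upper()}"
    return child_label, rel_type


if __name__ == "__main__":
    print(conventional_names("AWSPolicyStatement", "resource"))
    print(conventional_names("KMSKey", "encryption_algorithms"))
```

A helper like this could also be used in a test to check that each catalog entry follows the convention.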
@dataclass(frozen=True)
 class ProviderConfig:
    """Configuration for a cloud provider's Attack Paths integration."""
    name: str
    root_node_label: str  # e.g., "AWSAccount"
    uid_field: str  # e.g., "arn"
    # Label for resources connected to the account node, enabling indexed finding lookups
    resource_label: str  # e.g., "_AWSResource"
    ingestion_function: Callable
    # Maps a Postgres resource UID (e.g. full ARN) to the short-id form Cartography stores on some node types (e.g. `i-xxx` for EC2Instance)
    short_uid_extractor: Callable[[str], str]
    # List-typed properties to materialise as child nodes + edges at sync time.
    # Mandatory (may be []). A list-typed property without an entry here is
    # flattened into a comma-separated string and emits a one-time warning.
    normalized_lists: list[NormalizedList]
 # AWS list-typed property catalog.
 # One entry per Cartography node property whose runtime value is a list. The
 # sync layer materialises each element as a `<child_label>` node and links it
 # to the parent with a `<rel_type>` edge; see the `NormalizedList` docstring
 # above for the naming conventions.
 AWS_NORMALIZED_LISTS: list[NormalizedList] = [
    # AWSPolicyStatement - the hot path driving the 53-query perf fix.
    NormalizedList(
        "AWSPolicyStatement", "action", "AWSPolicyStatementActionItem", "HAS_ACTION"
    ),
    NormalizedList(
        "AWSPolicyStatement",
        "notaction",
        "AWSPolicyStatementNotactionItem",
        "HAS_NOTACTION",
    ),
    NormalizedList(
        "AWSPolicyStatement",
        "resource",
        "AWSPolicyStatementResourceItem",
        "HAS_RESOURCE",
    ),
    NormalizedList(
        "AWSPolicyStatement",
        "notresource",
        "AWSPolicyStatementNotresourceItem",
        "HAS_NOTRESOURCE",
    ),
    # S3PolicyStatement - same shape as IAM policies; AWS allows list or string.
    NormalizedList(
        "S3PolicyStatement", "action", "S3PolicyStatementActionItem", "HAS_ACTION"
    ),
    NormalizedList(
        "S3PolicyStatement", "resource", "S3PolicyStatementResourceItem", "HAS_RESOURCE"
    ),
    # IAM / Cognito / KMS / Secrets
    NormalizedList(
        "CognitoIdentityPool", "roles", "CognitoIdentityPoolRolesItem", "HAS_ROLES"
    ),
    NormalizedList(
        "KMSKey",
        "encryption_algorithms",
        "KMSKeyEncryptionAlgorithmsItem",
        "HAS_ENCRYPTION_ALGORITHMS",
    ),
    NormalizedList(
        "KMSKey",
        "signing_algorithms",
        "KMSKeySigningAlgorithmsItem",
        "HAS_SIGNING_ALGORITHMS",
    ),
    NormalizedList(
        "KMSKey",
        "anonymous_actions",
        "KMSKeyAnonymousActionsItem",
        "HAS_ANONYMOUS_ACTIONS",
    ),
    NormalizedList(
        "KMSGrant", "operations", "KMSGrantOperationsItem", "HAS_OPERATIONS"
    ),
    NormalizedList(
        "SecretsManagerSecretVersion",
        "version_stages",
        "SecretsManagerSecretVersionVersionStagesItem",
        "HAS_VERSION_STAGES",
    ),
    NormalizedList(
        "SecretsManagerSecretVersion",
        "kms_key_ids",
        "SecretsManagerSecretVersionKmsKeyIdsItem",
        "HAS_KMS_KEY_IDS",
    ),
    NormalizedList(
        "SecretsManagerSecretVersion",
        "tags",
        "SecretsManagerSecretVersionTagsItem",
        "HAS_TAGS",
        field_map=[("Key", "key"), ("Value", "value_")],
        # `value` is reserved for scalar mode; map `Value` to `value_` to keep dict shape.
    ),
    # Lambda / Compute
    NormalizedList(
        "AWSLambda", "architectures", "AWSLambdaArchitecturesItem", "HAS_ARCHITECTURES"
    ),
    NormalizedList(
        "AWSLambda",
        "anonymous_actions",
        "AWSLambdaAnonymousActionsItem",
        "HAS_ANONYMOUS_ACTIONS",
    ),
    NormalizedList(
        "CodeBuildProject",
        "environment_variables",
        "CodeBuildProjectEnvironmentVariablesItem",
        "HAS_ENVIRONMENT_VARIABLES",
    ),
    # ECS family
    NormalizedList(
        "ECSCluster",
        "capacity_providers",
        "ECSClusterCapacityProvidersItem",
        "HAS_CAPACITY_PROVIDERS",
    ),
    NormalizedList(
        "ECSTaskDefinition",
        "compatibilities",
        "ECSTaskDefinitionCompatibilitiesItem",
        "HAS_COMPATIBILITIES",
    ),
    NormalizedList(
        "ECSTaskDefinition",
        "requires_compatibilities",
        "ECSTaskDefinitionRequiresCompatibilitiesItem",
        "HAS_REQUIRES_COMPATIBILITIES",
    ),
    NormalizedList(
        "ECSContainerDefinition",
        "links",
        "ECSContainerDefinitionLinksItem",
        "HAS_LINKS",
    ),
    NormalizedList(
        "ECSContainerDefinition",
        "entry_point",
        "ECSContainerDefinitionEntryPointItem",
        "HAS_ENTRY_POINT",
    ),
    NormalizedList(
        "ECSContainerDefinition",
        "command",
        "ECSContainerDefinitionCommandItem",
        "HAS_COMMAND",
    ),
    NormalizedList(
        "ECSContainerDefinition",
        "dns_servers",
        "ECSContainerDefinitionDnsServersItem",
        "HAS_DNS_SERVERS",
    ),
    NormalizedList(
        "ECSContainerDefinition",
        "dns_search_domains",
        "ECSContainerDefinitionDnsSearchDomainsItem",
        "HAS_DNS_SEARCH_DOMAINS",
    ),
    NormalizedList(
        "ECSContainerDefinition",
        "docker_security_options",
        "ECSContainerDefinitionDockerSecurityOptionsItem",
        "HAS_DOCKER_SECURITY_OPTIONS",
    ),
    NormalizedList("ECSContainer", "gpu_ids", "ECSContainerGpuIdsItem", "HAS_GPU_IDS"),
    # ECR
    NormalizedList(
        "ECRImage", "layer_diff_ids", "ECRImageLayerDiffIdsItem", "HAS_LAYER_DIFF_IDS"
    ),
    NormalizedList(
        "ECRImage",
        "child_image_digests",
        "ECRImageChildImageDigestsItem",
        "HAS_CHILD_IMAGE_DIGESTS",
    ),
    # EC2 / Networking
    NormalizedList(
        "EC2Instance",
        "exposed_internet_type",
        "EC2InstanceExposedInternetTypeItem",
        "HAS_EXPOSED_INTERNET_TYPE",
    ),
    NormalizedList(
        "AutoScalingGroup",
        "exposed_internet_type",
        "AutoScalingGroupExposedInternetTypeItem",
        "HAS_EXPOSED_INTERNET_TYPE",
    ),
    NormalizedList(
        "LaunchConfiguration",
        "security_groups",
        "LaunchConfigurationSecurityGroupsItem",
        "HAS_SECURITY_GROUPS",
    ),
    NormalizedList(
        "LaunchTemplateVersion",
        "security_group_ids",
        "LaunchTemplateVersionSecurityGroupIdsItem",
        "HAS_SECURITY_GROUP_IDS",
    ),
    NormalizedList(
        "LaunchTemplateVersion",
        "security_groups",
        "LaunchTemplateVersionSecurityGroupsItem",
        "HAS_SECURITY_GROUPS",
    ),
    NormalizedList(
        "ELBListener", "policy_names", "ELBListenerPolicyNamesItem", "HAS_POLICY_NAMES"
    ),
    # CloudFront / CloudWatch
    NormalizedList(
        "CloudFrontDistribution",
        "aliases",
        "CloudFrontDistributionAliasesItem",
        "HAS_ALIASES",
    ),
    NormalizedList(
        "CloudFrontDistribution",
        "geo_restriction_locations",
        "CloudFrontDistributionGeoRestrictionLocationsItem",
        "HAS_GEO_RESTRICTION_LOCATIONS",
    ),
    NormalizedList(
        "CloudWatchLogGroup",
        "inherited_properties",
        "CloudWatchLogGroupInheritedPropertiesItem",
        "HAS_INHERITED_PROPERTIES",
    ),
    # RDS / Storage
    NormalizedList(
        "RDSCluster",
        "availability_zones",
        "RDSClusterAvailabilityZonesItem",
        "HAS_AVAILABILITY_ZONES",
    ),
    NormalizedList(
        "RDSEventSubscription",
        "event_categories",
        "RDSEventSubscriptionEventCategoriesItem",
        "HAS_EVENT_CATEGORIES",
    ),
    NormalizedList(
        "RDSEventSubscription",
        "source_ids",
        "RDSEventSubscriptionSourceIdsItem",
        "HAS_SOURCE_IDS",
    ),
    NormalizedList(
        "S3Bucket",
        "anonymous_actions",
        "S3BucketAnonymousActionsItem",
        "HAS_ANONYMOUS_ACTIONS",
    ),
    # Inspector / Config / SSM / ACM / APIGateway / Glue / Bedrock
    NormalizedList(
        "AWSInspectorFinding",
        "referenceurls",
        "AWSInspectorFindingReferenceurlsItem",
        "HAS_REFERENCEURLS",
    ),
    NormalizedList(
        "AWSInspectorFinding",
        "relatedvulnerabilities",
        "AWSInspectorFindingRelatedvulnerabilitiesItem",
        "HAS_RELATEDVULNERABILITIES",
    ),
    NormalizedList(
        "AWSInspectorFinding",
        "vulnerablepackageids",
        "AWSInspectorFindingVulnerablepackageidsItem",
        "HAS_VULNERABLEPACKAGEIDS",
    ),
    NormalizedList(
        "AWSConfigurationRecorder",
        "recording_group_resource_types",
        "AWSConfigurationRecorderRecordingGroupResourceTypesItem",
        "HAS_RECORDING_GROUP_RESOURCE_TYPES",
    ),
    NormalizedList(
        "AWSConfigRule",
        "scope_compliance_resource_types",
        "AWSConfigRuleScopeComplianceResourceTypesItem",
        "HAS_SCOPE_COMPLIANCE_RESOURCE_TYPES",
    ),
    NormalizedList(
        "AWSConfigRule",
        "source_details",
        "AWSConfigRuleSourceDetailsItem",
        "HAS_SOURCE_DETAILS",
    ),
    NormalizedList(
        "SSMInstancePatch", "cve_ids", "SSMInstancePatchCveIdsItem", "HAS_CVE_IDS"
    ),
    NormalizedList(
        "ACMCertificate", "in_use_by", "ACMCertificateInUseByItem", "HAS_IN_USE_BY"
    ),
    NormalizedList(
        "APIGatewayRestAPI",
        "anonymous_actions",
        "APIGatewayRestAPIAnonymousActionsItem",
        "HAS_ANONYMOUS_ACTIONS",
    ),
    NormalizedList(
        "GlueJob", "connections", "GlueJobConnectionsItem", "HAS_CONNECTIONS"
    ),
    NormalizedList(
        "AWSBedrockFoundationModel",
        "input_modalities",
        "AWSBedrockFoundationModelInputModalitiesItem",
        "HAS_INPUT_MODALITIES",
    ),
    NormalizedList(
        "AWSBedrockFoundationModel",
        "output_modalities",
        "AWSBedrockFoundationModelOutputModalitiesItem",
        "HAS_OUTPUT_MODALITIES",
    ),
    NormalizedList(
        "AWSBedrockFoundationModel",
        "customizations_supported",
        "AWSBedrockFoundationModelCustomizationsSupportedItem",
        "HAS_CUSTOMIZATIONS_SUPPORTED",
    ),
    NormalizedList(
        "AWSBedrockFoundationModel",
        "inference_types_supported",
        "AWSBedrockFoundationModelInferenceTypesSupportedItem",
        "HAS_INFERENCE_TYPES_SUPPORTED",
    ),
 ]
 AWS_CONFIG = ProviderConfig(
    name="aws",
    root_node_label="AWSAccount",
    uid_field="arn",
    resource_label="_AWSResource",
    ingestion_function=aws.start_aws_ingestion,
    short_uid_extractor=aws.extract_short_uid,
    normalized_lists=AWS_NORMALIZED_LISTS,
 )
 PROVIDER_CONFIGS: dict[str, ProviderConfig] = {
    "aws": AWS_CONFIG,
 }
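The catalog entries above appear to follow a mechanical naming pattern: the child label is the source label plus the CamelCased property name plus `Item`, and the edge type is `HAS_` plus the upper-cased property name. The sketch below is inferred from the entries themselves, not from the `NormalizedList` docstring (which is outside this hunk); `derive_names` is a hypothetical helper, not part of the commit.

```python
def derive_names(source_label: str, source_property: str) -> tuple[str, str]:
    """Derive (child_label, rel_type) using the pattern the catalog entries follow.

    Assumption: snake_case segments are capitalised individually, so
    `kms_key_ids` becomes `KmsKeyIds` and `referenceurls` becomes `Referenceurls`.
    """
    camel = "".join(part.capitalize() for part in source_property.split("_"))
    child_label = f"{source_label}{camel}Item"
    rel_type = f"HAS_{source_property.upper()}"
    return child_label, rel_type
```

A helper like this could back a unit test that flags catalog entries drifting from the convention, while still allowing explicit names where an exception is intended.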
@@ -1,8 +1,6 @@
 # Cypher query templates for Attack Paths operations
 from tasks.jobs.attack_paths.config import (
    INTERNET_NODE_LABEL,
-    PROVIDER_ELEMENT_ID_PROPERTY,
-    PROVIDER_RESOURCE_LABEL,
    PROWLER_FINDING_LABEL,
 )
@@ -21,7 +19,6 @@ def render_cypher_template(template: str, replacements: dict[str, str]) -> str:
 # Findings queries (used by findings.py)
 # ---------------------------------------
 ADD_RESOURCE_LABEL_TEMPLATE = """
    MATCH (account:__ROOT_LABEL__ {id: $provider_uid})-->(r)
@@ -88,7 +85,6 @@ INSERT_FINDING_TEMPLATE = f"""
 """
 # Internet queries (used by internet.py)
 # ---------------------------------------
 CREATE_INTERNET_NODE = f"""
    MERGE (internet:{INTERNET_NODE_LABEL} {{id: 'Internet'}})
@@ -118,8 +114,8 @@ CREATE_CAN_ACCESS_RELATIONSHIPS_TEMPLATE = f"""
    RETURN COUNT(r) AS relationships_merged
 """
-# Sync queries (used by sync.py)
+# Sync queries (used by sync.py to fetch from the cartography temp Neo4j DB)
-# -------------------------------
+# The write side of sync lives in each sink (`api/attack_paths/sink/`).
 NODE_FETCH_QUERY = """
    MATCH (n)
@@ -143,17 +139,3 @@ RELATIONSHIPS_FETCH_QUERY = """
    ORDER BY internal_id
    LIMIT $batch_size
 """
-NODE_SYNC_TEMPLATE = f"""
-    UNWIND $rows AS row
-    MERGE (n:__NODE_LABELS__ {{{PROVIDER_ELEMENT_ID_PROPERTY}: row.provider_element_id}})
-    SET n += row.props
-"""
-RELATIONSHIP_SYNC_TEMPLATE = f"""
-    UNWIND $rows AS row
-    MATCH (s:{PROVIDER_RESOURCE_LABEL} {{{PROVIDER_ELEMENT_ID_PROPERTY}: row.start_element_id}})
-    MATCH (t:{PROVIDER_RESOURCE_LABEL} {{{PROVIDER_ELEMENT_ID_PROPERTY}: row.end_element_id}})
-    MERGE (s)-[r:__REL_TYPE__ {{{PROVIDER_ELEMENT_ID_PROPERTY}: row.provider_element_id}}]->(t)
-    SET r += row.props
-"""
@@ -39,8 +39,8 @@ Pipeline steps:
 7. Sync the temp database into the tenant database:
   - Drop the old provider subgraph (matched by dynamic _Provider_{uuid} label).
-     graph_data_ready is set to False for all scans of this provider while
+     graph_data_ready is set to False for scans of this provider in the
-     the swap happens so the API doesn't serve partial data.
+     target sink while the swap happens so the API doesn't serve partial data.
   - Copy nodes and relationships in batches. Every synced node gets a
     _ProviderResource label and dynamic _Tenant_{uuid} / _Provider_{uuid}
     isolation labels, plus a _provider_element_id property for MERGE keys.
@@ -64,10 +64,17 @@ from api.models import StateChoices
 from api.utils import initialize_prowler_provider
 from cartography.config import Config as CartographyConfig
 from cartography.intel import analysis as cartography_analysis
 from cartography.intel import create_indexes as cartography_create_indexes
 from cartography.intel import ontology as cartography_ontology
 from celery.utils.log import get_task_logger
-from tasks.jobs.attack_paths import db_utils, findings, indexes, internet, sync, utils
+from django.conf import settings
 from tasks.jobs.attack_paths import (
    db_utils,
    findings,
    indexes,
    internet,
    sync,
    utils,
 )
 from tasks.jobs.attack_paths.config import get_cartography_ingestion_function
 # Without this Celery goes crazy with Cartography logging
@@ -96,7 +103,7 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
    attack_paths_scan = db_utils.retrieve_attack_paths_scan(tenant_id, scan_id)
    # Idempotency guard: cleanup may have flipped this row to a terminal state
-    # while the message was still in flight. Bail out before touching state.
+    # while the message was still in flight. Bail out before touching state
    if attack_paths_scan and attack_paths_scan.state in (
        StateChoices.FAILED,
        StateChoices.COMPLETED,
@@ -125,7 +132,7 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
    else:
        if not attack_paths_scan:
-            # Safety net for in-flight messages or direct task invocations; dispatcher normally pre-creates the row.
+            # Safety net for in-flight messages or direct task invocations; dispatcher normally pre-creates the row
            logger.warning(
                f"No Attack Paths Scan found for scan {scan_id} and tenant {tenant_id}, let's create it then"
            )
@@ -143,10 +150,18 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
    tenant_database_name = graph_database.get_database_name(
        prowler_api_provider.tenant_id
    )
    target_sink_backend = settings.ATTACK_PATHS_SINK_DATABASE
    target_description = (
        f"tenant Neo4j database {tenant_database_name}"
        if target_sink_backend == "neo4j"
        else f"{target_sink_backend} sink"
    )
    # The `neo4j_user` and `neo4j_password` attributes are not needed in this Cartography config object
    tmp_cartography_config = CartographyConfig(
-        neo4j_uri=graph_database.get_uri(),
+        # The temp ingest database is always Neo4j, so use the ingest URI here
        # rather than the sink URI (which points at Neptune when configured).
        neo4j_uri=graph_database.get_ingest_uri(),
        neo4j_database=tmp_database_name,
        update_tag=int(time.time()),
    )
@@ -169,6 +184,7 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
    logger.info(
        f"Starting Attack Paths scan ({attack_paths_scan.id}) for "
        f"{prowler_api_provider.provider.upper()} provider {prowler_api_provider.id} "
        f"(staging=Neo4j database {tmp_database_name}, target={target_description})"
    )
    subgraph_dropped = False
@@ -177,7 +193,8 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
    try:
        logger.info(
-            f"Creating Neo4j database {tmp_cartography_config.neo4j_database} for tenant {prowler_api_provider.tenant_id}"
+            f"Creating staging Neo4j database {tmp_cartography_config.neo4j_database} "
            f"for tenant {prowler_api_provider.tenant_id}"
        )
        graph_database.create_database(tmp_cartography_config.neo4j_database)
@@ -191,7 +208,9 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
            tmp_cartography_config.neo4j_database
        ) as tmp_neo4j_session:
            # Indexes creation
-            cartography_create_indexes.run(tmp_neo4j_session, tmp_cartography_config)
+            indexes.create_cartography_indexes(
                tmp_neo4j_session, tmp_cartography_config
            )
            indexes.create_findings_indexes(tmp_neo4j_session)
            db_utils.update_attack_paths_scan_progress(attack_paths_scan, 2)
@@ -223,7 +242,7 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
            cartography_analysis.run(tmp_neo4j_session, tmp_cartography_config)
            db_utils.update_attack_paths_scan_progress(attack_paths_scan, 95)
-            # Creating Internet node and CAN_ACCESS relationships
+            # Creating Internet node and `CAN_ACCESS` relationships
            logger.info(
                f"Creating Internet graph for AWS account {prowler_api_provider.uid}"
            )
@@ -247,23 +266,41 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
            db_utils.update_attack_paths_scan_progress(attack_paths_scan, 97)
        logger.info(
-            f"Clearing Neo4j cache for database {tmp_cartography_config.neo4j_database}"
+            f"Clearing Neo4j cache for staging database {tmp_cartography_config.neo4j_database}"
        )
        graph_database.clear_cache(tmp_cartography_config.neo4j_database)
        t0 = time.perf_counter()
        logger.info(
-            f"Ensuring tenant database {tenant_database_name}, and its indexes, exists for tenant {prowler_api_provider.tenant_id}"
+            f"Preparing target {target_description} for tenant {prowler_api_provider.tenant_id}"
        )
        graph_database.create_database(tenant_database_name)
-        with graph_database.get_session(tenant_database_name) as tenant_neo4j_session:
+        # Sink-side index creation: Neptune auto-manages indexes and rejects
-            cartography_create_indexes.run(
+        # `CREATE INDEX`, so only run it when the sink is Neo4j
        # The temp ingest DB is always Neo4j and is always indexed above
        if target_sink_backend != "neptune":
            logger.info(f"Ensuring indexes exist for {target_description}")
            with graph_database.get_session(
                tenant_database_name
            ) as tenant_neo4j_session:
                indexes.create_cartography_indexes(
                    tenant_neo4j_session, tenant_cartography_config
                )
                indexes.create_findings_indexes(tenant_neo4j_session)
                indexes.create_sync_indexes(tenant_neo4j_session)
        else:
            logger.info("Skipping tenant database indexes for neptune sink")
        logger.info(
            f"Prepared target {target_description} in {time.perf_counter() - t0:.3f}s"
        )
-        logger.info(f"Deleting existing provider graph in {tenant_database_name}")
+        logger.info(
-        db_utils.set_provider_graph_data_ready(attack_paths_scan, False)
+            f"Deleting existing provider graph from {target_description} "
            f"(tenant={prowler_api_provider.tenant_id}, provider={prowler_api_provider.id})"
        )
        db_utils.set_provider_graph_data_ready(
            attack_paths_scan, False, target_sink_backend
        )
        provider_gated = True
        t0 = time.perf_counter()
@@ -272,14 +309,17 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
            provider_id=str(prowler_api_provider.id),
        )
        logger.info(
-            f"Deleted existing provider graph in {time.perf_counter() - t0:.3f}s "
+            f"Deleted existing provider graph from {target_description} "
-            f"(deleted_nodes={deleted_nodes})"
+            f"in {time.perf_counter() - t0:.3f}s (deleted_nodes={deleted_nodes})"
        )
        subgraph_dropped = True
        db_utils.update_attack_paths_scan_progress(attack_paths_scan, 98)
        logger.info(
-            f"Syncing graph from {tmp_database_name} into {tenant_database_name}"
+            f"Syncing staging graph {tmp_database_name} into {target_description} "
            f"for provider {prowler_api_provider.id} "
            f"(tenant {prowler_api_provider.tenant_id}, "
            f"type {prowler_api_provider.provider})"
        )
        t0 = time.perf_counter()
        sync_result = sync.sync_graph(
@@ -287,16 +327,33 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
            target_database=tenant_database_name,
            tenant_id=str(prowler_api_provider.tenant_id),
            provider_id=str(prowler_api_provider.id),
            provider_type=prowler_api_provider.provider,
        )
        elapsed = time.perf_counter() - t0
        total_nodes = sync_result["nodes"] + sync_result["child_nodes"]
        elements = total_nodes + sync_result["relationships"]
        rate = elements / elapsed if elapsed else 0
        logger.info(
-            f"Synced graph in {time.perf_counter() - t0:.3f}s "
+            f"Synced staging graph into {target_description} in {elapsed:.3f}s - "
-            f"(nodes={sync_result['nodes']}, relationships={sync_result['relationships']})"
+            f"nodes={total_nodes} (source={sync_result['nodes']}, "
            f"items={sync_result['child_nodes']}), "
            f"relationships={sync_result['relationships']} "
            f"(structural={sync_result['structural_relationships']}, "
            f"items={sync_result['item_relationships']}), "
            f"~{rate:.0f} elem/s"
        )
        sync_completed = True
        # Flip metadata only now: the new schema is live in the target sink, so
        # reads can switch to the current catalog/backend. The target-sink gate
        # is already closed, so the switch is atomic from the API's view.
        db_utils.set_scan_migrated(attack_paths_scan, True, target_sink_backend)
        db_utils.set_graph_data_ready(attack_paths_scan, True)
        db_utils.update_attack_paths_scan_progress(attack_paths_scan, 99)
-        logger.info(f"Clearing Neo4j cache for database {tenant_database_name}")
+        if target_sink_backend == "neptune":
            logger.info("Skipping cache clear for neptune sink")
        else:
            logger.info(f"Clearing Neo4j cache for target {target_description}")
            graph_database.clear_cache(tenant_database_name)
        logger.info(f"Dropping temporary Neo4j database {tmp_database_name}")
@@ -316,14 +373,16 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
        logger.exception(exception_message)
        ingestion_exceptions["global_error"] = exception_message
-        # Recover graph_data_ready based on how far the swap got.
+        # Recover `graph_data_ready` based on how far the swap got
-        # Partial drop (mid-batch failure) may leave `subgraph_dropped=False`
+        # A partial drop (mid-batch failure) may leave `subgraph_dropped=False` with data partially deleted;
-        # with data partially deleted, so we prefer that over permanently blocked queries.
+        # reopening the gate on partial data is preferable to permanently blocked queries
        try:
            if sync_completed:
                db_utils.set_graph_data_ready(attack_paths_scan, True)
            elif provider_gated and not subgraph_dropped:
-                db_utils.set_provider_graph_data_ready(attack_paths_scan, True)
+                db_utils.set_provider_graph_data_ready(
                    attack_paths_scan, True, target_sink_backend
                )
        except Exception:
            logger.error(
@@ -1,40 +1,57 @@
 """
 Graph sync operations for Attack Paths.
-This module handles syncing graph data from temporary scan databases
+Reads nodes and relationships out of the cartography temp database (always
-to the tenant database, adding provider isolation labels and properties.
+Neo4j) and hands them to the configured sink (Neo4j or Neptune) in batches.
 Backend-specific Cypher (MERGE shape, ID strategy, indexes) lives in each
 sink; this module owns the source read loop, per-batch grouping, and the
 list-property materialisation policy (see `NormalizedList`).
 Each list-typed node property that appears in the provider's
 `normalized_lists` catalog becomes a set of child item nodes connected to
 the parent by a typed edge. A list-typed property that is not in the
 catalog is serialised to a comma-delimited string and emits a one-time
 warning per (label, property), surfacing Cartography fields that should be
 added to the catalog.
 """
 import json
 import time
 from collections import defaultdict
 from typing import Any
 import neo4j
 from api.attack_paths import database as graph_database
 from api.attack_paths import sink as sink_module
 from celery.utils.log import get_task_logger
 from tasks.jobs.attack_paths.config import (
    PROVIDER_CONFIGS,
    PROVIDER_ISOLATION_PROPERTIES,
    PROVIDER_RESOURCE_LABEL,
    SYNC_BATCH_SIZE,
    NormalizedList,
    get_provider_label,
    get_tenant_label,
 )
 from tasks.jobs.attack_paths.queries import (
    NODE_FETCH_QUERY,
-    NODE_SYNC_TEMPLATE,
-    RELATIONSHIP_SYNC_TEMPLATE,
    RELATIONSHIPS_FETCH_QUERY,
    render_cypher_template,
 )
 logger = get_task_logger(__name__)
 # (label, property) tuples for which we've already emitted the
 # "unnormalised list" warning. Module-level so the warning fires once per
 # process, not once per node.
 _WARNED_UNNORMALIZED: set[tuple[str, str]] = set()
 def sync_graph(
    source_database: str,
    target_database: str,
    tenant_id: str,
    provider_id: str,
    provider_type: str,
 ) -> dict[str, int]:
    """
    Sync all nodes and relationships from source to target database.
@@ -44,25 +61,38 @@ def sync_graph(
        `target_database`: The tenant database
        `tenant_id`: The tenant ID for isolation
        `provider_id`: The provider ID for isolation
        `provider_type`: Provider type key (e.g. "aws"), used to resolve the
            `NormalizedList` catalog from `PROVIDER_CONFIGS`.
    Returns:
-        Dict with counts of synced nodes and relationships
+        Dict with counts of synced nodes, child item nodes, and relationships.
    """
-    nodes_synced = sync_nodes(
+    sink = sink_module.get_backend()
    sink.ensure_sync_indexes(target_database)
    normalized_lists = _resolve_normalized_lists(provider_type)
    node_result = sync_nodes(
        source_database,
        target_database,
        tenant_id,
        provider_id,
        sink,
        normalized_lists,
    )
    relationships_synced = sync_relationships(
        source_database,
        target_database,
        provider_id,
        sink,
    )
    return {
-        "nodes": nodes_synced,
+        "nodes": node_result["parents"],
-        "relationships": relationships_synced,
+        "child_nodes": node_result["children"],
        "relationships": relationships_synced + node_result["parent_child_rels"],
        "structural_relationships": relationships_synced,
        "item_relationships": node_result["parent_child_rels"],
    }
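`sync_graph` treats the sink as an opaque object (`sink: Any`), so the interface it relies on is only visible through the call sites. The sketch below writes that interface down as a `typing.Protocol`, with an in-memory sink that records calls. The method names and argument order come from the calls in this diff; the parameter names, the `labels: str` type, and the `RecordingSink` class are assumptions made for illustration.

```python
from typing import Any, Protocol


class SinkSketch(Protocol):
    """Hypothetical description of what the sync module calls on a sink."""

    def ensure_sync_indexes(self, database: str) -> None: ...

    def write_nodes(
        self, database: str, labels: str, rows: list[dict[str, Any]]
    ) -> None: ...

    def write_relationships(
        self,
        database: str,
        rel_type: str,
        provider_id: str,
        rows: list[dict[str, Any]],
    ) -> None: ...


class RecordingSink:
    """In-memory sink that records each call instead of writing to a graph."""

    def __init__(self) -> None:
        self.calls: list[tuple[Any, ...]] = []

    def ensure_sync_indexes(self, database: str) -> None:
        self.calls.append(("ensure_sync_indexes", database))

    def write_nodes(
        self, database: str, labels: str, rows: list[dict[str, Any]]
    ) -> None:
        self.calls.append(("write_nodes", database, labels, len(rows)))

    def write_relationships(
        self,
        database: str,
        rel_type: str,
        provider_id: str,
        rows: list[dict[str, Any]],
    ) -> None:
        self.calls.append(("write_relationships", database, rel_type, len(rows)))
```

A recording sink of this kind is one way to unit-test the read loop and batching without a running Neo4j or Neptune instance.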
@@ -71,22 +101,35 @@ def sync_nodes(
    target_database: str,
    tenant_id: str,
    provider_id: str,
-) -> int:
+    sink: Any,
    normalized_lists: list[NormalizedList],
 ) -> dict[str, int]:
    """
-    Sync nodes from source to target database.
+    Sync nodes from source to target database, exploding catalogued list
    properties into child nodes + parent->child edges.
    Adds `_ProviderResource` label and dynamic `_Tenant_{id}` and `_Provider_{id}`
-    isolation labels to all nodes.
+    isolation labels to all nodes (parents and children alike).
    Source and target sessions are opened sequentially per batch to avoid
    holding two Bolt connections simultaneously for the entire sync duration.
    """
    t0 = time.perf_counter()
    last_id = -1
-    total_synced = 0
+    parents_synced = 0
    children_synced = 0
    parent_child_rels = 0
    catalog = _build_catalog_index(normalized_lists)
    extra_labels = _build_extra_labels(tenant_id, provider_id)
    while True:
-        grouped: dict[tuple[str, ...], list[dict[str, Any]]] = defaultdict(list)
+        tb = time.perf_counter()
        prev_children = children_synced
        prev_rels = parent_child_rels
        parent_groups: dict[tuple[str, ...], list[dict[str, Any]]] = defaultdict(list)
        child_groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
        rel_groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
        batch_count = 0
        with graph_database.get_session(source_database) as source_session:
@@ -97,43 +140,65 @@ def sync_nodes(
            for record in result:
                batch_count += 1
                last_id = record["internal_id"]
-                key, value = _node_to_sync_dict(record, provider_id)
+                key, parent_dict, children, rels = _node_to_sync_dict(
-                grouped[key].append(value)
+                    record, provider_id, catalog
                )
                parent_groups[key].append(parent_dict)
                for child in children:
                    child_groups[child["_child_label"]].append(child["row"])
                for rel in rels:
                    rel_groups[rel["rel_type"]].append(rel["row"])
        if batch_count == 0:
            break
-        with graph_database.get_session(target_database) as target_session:
+        for labels, batch in parent_groups.items():
-            for labels, batch in grouped.items():
+            sink.write_nodes(
-                label_set = set(labels)
+                target_database, _render_labels(labels, extra_labels), batch
-                label_set.add(PROVIDER_RESOURCE_LABEL)
-                label_set.add(get_tenant_label(tenant_id))
-                label_set.add(get_provider_label(provider_id))
-                node_labels = ":".join(f"`{label}`" for label in sorted(label_set))
-                query = render_cypher_template(
-                    NODE_SYNC_TEMPLATE, {"__NODE_LABELS__": node_labels}
            )
-                target_session.run(query, {"rows": batch})
-        total_synced += batch_count
+        for child_label, batch in child_groups.items():
            sink.write_nodes(
                target_database,
                _render_labels((child_label,), extra_labels),
                batch,
            )
            children_synced += len(batch)
        for rel_type, batch in rel_groups.items():
            sink.write_relationships(target_database, rel_type, provider_id, batch)
            parent_child_rels += len(batch)
        parents_synced += batch_count
        batch_dt = time.perf_counter() - tb
        batch_elements = (
            batch_count
            + (children_synced - prev_children)
            + (parent_child_rels - prev_rels)
        )
        rate = batch_elements / batch_dt if batch_dt else 0
        logger.info(
-            f"Synced {total_synced} nodes from {source_database} to {target_database} in {time.perf_counter() - t0:.3f}s"
+            f"[sync nodes] {parents_synced} source (+{children_synced} items, "
            f"+{parent_child_rels} item rels) · batch {batch_dt:.1f}s · "
            f"elapsed {time.perf_counter() - t0:.1f}s · ~{rate:.0f} elem/s"
        )
-    return total_synced
+    return {
        "parents": parents_synced,
        "children": children_synced,
        "parent_child_rels": parent_child_rels,
    }
 def sync_relationships(
    source_database: str,
    target_database: str,
    provider_id: str,
    sink: Any,
 ) -> int:
    """
    Sync relationships from source to target database.
    Matches source and target nodes by `_provider_element_id` in the tenant database.
    Source and target sessions are opened sequentially per batch to avoid
    holding two Bolt connections simultaneously for the entire sync duration.
    """
@@ -142,6 +207,7 @@ def sync_relationships(
    total_synced = 0
    while True:
        tb = time.perf_counter()
        grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
        batch_count = 0
@@ -159,32 +225,197 @@ def sync_relationships(
        if batch_count == 0:
            break
        with graph_database.get_session(target_database) as target_session:
        for rel_type, batch in grouped.items():
-                query = render_cypher_template(
+            sink.write_relationships(target_database, rel_type, provider_id, batch)
                    RELATIONSHIP_SYNC_TEMPLATE, {"__REL_TYPE__": rel_type}
                )
                target_session.run(query, {"rows": batch})
        total_synced += batch_count
        batch_dt = time.perf_counter() - tb
        rate = batch_count / batch_dt if batch_dt else 0
        logger.info(
-            f"Synced {total_synced} relationships from {source_database} to {target_database} in {time.perf_counter() - t0:.3f}s"
+            f"[sync rels] {total_synced} structural · batch {batch_dt:.1f}s · "
            f"elapsed {time.perf_counter() - t0:.1f}s · ~{rate:.0f}/s"
        )
    return total_synced
 def _node_to_sync_dict(
-    record: neo4j.Record, provider_id: str
+    record: neo4j.Record,
-) -> tuple[tuple[str, ...], dict[str, Any]]:
+    provider_id: str,
-    """Transform a source node record into a (grouping_key, sync_dict) pair."""
+    catalog: dict[tuple[str, str], NormalizedList],
 ) -> tuple[
    tuple[str, ...],
    dict[str, Any],
    list[dict[str, Any]],
    list[dict[str, Any]],
 ]:
    """Transform a source node record into a (grouping_key, sync_dict, children, rels) tuple.
    Catalogued list properties are popped from `props` and emitted as child
    nodes + parent->child relationships.
    """
    props = dict(record["props"] or {})
    _strip_internal_properties(props)
    labels = tuple(sorted(set(record["labels"] or [])))
-    return labels, {
+    parent_element_id = f"{provider_id}:{record['element_id']}"
-        "provider_element_id": f"{provider_id}:{record['element_id']}",
+
    children, rels = _explode_catalogued_lists(
        labels, props, catalog, provider_id, parent_element_id
    )
    _normalize_sink_properties(props, labels)
    parent = {
        "provider_element_id": parent_element_id,
        "props": props,
    }
    return labels, parent, children, rels
 def _explode_catalogued_lists(
    labels: tuple[str, ...],
    props: dict[str, Any],
    catalog: dict[tuple[str, str], NormalizedList],
    provider_id: str,
    parent_element_id: str,
 ) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Pop catalogued list properties from `props`; emit child and rel rows.
    A node may carry multiple labels (e.g. `AWSPolicyStatement` plus
    `_AWSResource`); we check each label for catalog matches independently.
    Returns:
        - children: list of {"_child_label": str, "row": <node row>} dicts.
        - rels:     list of {"rel_type": str, "row": <rel row>} dicts.
    """
    children: list[dict[str, Any]] = []
    rels: list[dict[str, Any]] = []
    for label in labels:
        for key in list(props.keys()):
            spec = catalog.get((label, key))
            if spec is None:
                continue
            value = props.pop(key)
            if value is None:
                continue
            if not isinstance(value, list):
                # Catalogued but not actually a list this scan - fall back to
                # the generic normaliser so we don't lose the value.
                props[key] = value
                continue
            for item in value:
                child_value_key, child_props = _build_child_props(spec, item)
                if child_value_key is None:
                    continue
                child_element_id = _build_child_id(
                    provider_id, spec.child_label, child_value_key
                )
                children.append(
                    {
                        "_child_label": spec.child_label,
                        "row": {
                            "provider_element_id": child_element_id,
                            "props": child_props,
                        },
                    }
                )
                rels.append(
                    {
                        "rel_type": spec.rel_type,
                        "row": {
                            "start_element_id": parent_element_id,
                            "end_element_id": child_element_id,
                            "provider_element_id": (
                                f"{parent_element_id}::{spec.rel_type}::"
                                f"{child_element_id}"
                            ),
                            "props": {},
                        },
                    }
                )
    return children, rels
 def _build_child_props(
    spec: NormalizedList, item: Any
 ) -> tuple[str | None, dict[str, Any]]:
    """Translate one list element into a child node's prop dict.
    Returns (dedup_key, props). The dedup_key is what makes two child nodes
    equal within (tenant, provider) - used to build `_provider_element_id`.
    For scalar mode, the dedup key is the value itself. For dict mode it is
    a stable concatenation of the mapped fields in `field_map` order.
    """
    if not spec.field_map:
        if isinstance(item, (dict, list)):
            # Defensive: caller marked this list as scalar but elements are
            # structured. Convert to a stable string so the value survives.
            value_str = json.dumps(item, sort_keys=True, default=str)
        else:
            value_str = str(item)
        return value_str, {"value": value_str}
    if not isinstance(item, dict):
        # Catalogued as dict-shape but got a scalar. Skip - caller will see
        # the value go missing and can fix the field_map.
        return None, {}
    props: dict[str, Any] = {}
    dedup_parts: list[str] = []
    for src_key, child_field in spec.field_map:
        raw = item.get(src_key)
        value_str = _to_sink_property_value(raw) if raw is not None else ""
        props[child_field] = value_str
        dedup_parts.append(f"{child_field}={value_str}")
    return "::".join(dedup_parts), props
 def _build_child_id(provider_id: str, child_label: str, value_key: str) -> str:
    """Deterministic `_provider_element_id` for a list-item child node.
    Dedupes within (tenant, provider): multiple parents referencing the same
    value share one child node via the existing MERGE-on-_provider_element_id
    index in both sinks.
    """
    return f"{provider_id}::{child_label}::{value_key}"
 def _build_catalog_index(
    normalized_lists: list[NormalizedList],
 ) -> dict[tuple[str, str], NormalizedList]:
    """Index the catalog by (source_label, source_property) for O(1) lookup."""
    return {
        (spec.source_label, spec.source_property): spec for spec in normalized_lists
    }
 def _build_extra_labels(tenant_id: str, provider_id: str) -> tuple[str, ...]:
    return (
        PROVIDER_RESOURCE_LABEL,
        get_tenant_label(tenant_id),
        get_provider_label(provider_id),
    )
 def _render_labels(base_labels: tuple[str, ...], extra_labels: tuple[str, ...]) -> str:
    """Render the Cypher label string for a node-write batch."""
    label_set = set(base_labels) | set(extra_labels)
    return ":".join(f"`{label}`" for label in sorted(label_set))
 def _resolve_normalized_lists(provider_type: str) -> list[NormalizedList]:
    config = PROVIDER_CONFIGS.get(provider_type)
    if config is None:
        # Unknown provider: empty catalog. Any list-typed property will be
        # serialised to a comma-delimited string with one warning per
        # (label, property).
        logger.warning(
            "Provider type %s not in PROVIDER_CONFIGS; no normalized_lists active",
            provider_type,
        )
        return []
    return config.normalized_lists
 def _rel_to_sync_dict(
@@ -193,7 +424,11 @@ def _rel_to_sync_dict(
    """Transform a source relationship record into a (grouping_key, sync_dict) pair."""
    props = dict(record["props"] or {})
    _strip_internal_properties(props)
    # Relationship properties go through the same primitive coercion as
    # nodes; catalog-driven materialisation applies to node properties only.
    _normalize_sink_properties(props, labels=None)
    rel_type = record["rel_type"]
    return rel_type, {
        "start_element_id": f"{provider_id}:{record['start_element_id']}",
        "end_element_id": f"{provider_id}:{record['end_element_id']}",
@@ -206,3 +441,80 @@ def _strip_internal_properties(props: dict[str, Any]) -> None:
    """Remove provider isolation properties before the += spread in sync templates."""
    for key in PROVIDER_ISOLATION_PROPERTIES:
        props.pop(key, None)
 def _normalize_sink_properties(
    props: dict[str, Any], labels: tuple[str, ...] | None
 ) -> None:
    """Normalize property values to primitive Cypher literals for either sink.
    Attack-paths node and relationship properties are written as primitive
    scalars regardless of the active sink (Neo4j or Neptune). The convention
    is driven by Neptune's openCypher type restrictions, which reject list,
    map, temporal and spatial property values, but it is applied uniformly
    so that custom and predefined queries are portable across sinks without
    runtime rewriting.
    Concretely:
      - Temporal values (neo4j.time.{DateTime,Date,Time,Duration}) become
        their ISO-8601 string representation.
      - Spatial values (neo4j.spatial.Point and subclasses) become their
        WKT-style string representation.
      - Maps / dicts become a JSON-encoded string, read back with `CONTAINS`
        substring checks inside queries.
      - Lists become a comma-delimited string. Catalogued list properties
        are materialised as child item nodes upstream in
        `_explode_catalogued_lists` and never reach this point; any list
        seen here is uncatalogued, so we log a one-time warning per
        (label, property) to surface Cartography fields that should be
        added to the catalog.
    `labels` is only used for the warning message; pass `None` for
    relationship props (no label context).
    """
    for key, value in list(props.items()):
        if isinstance(value, list) and labels is not None:
            _warn_unnormalized_list(labels, key)
        props[key] = _to_sink_property_value(value)
 def _warn_unnormalized_list(labels: tuple[str, ...], key: str) -> None:
    """Warn once per (label, property), on the real label(s) only.
    Every synced node also carries internal isolation labels (`_AWSResource`,
    `_ProviderResource`, `_Tenant_*`, `_Provider_*`); warning on those just
    doubles the noise, so skip them and point at the actionable Cartography
    label. Falls back to all labels if only internal ones are present.
    """
    real_labels = [label for label in labels if not label.startswith("_")]
    for label in real_labels or labels:
        token = (label, key)
        if token in _WARNED_UNNORMALIZED:
            continue
        _WARNED_UNNORMALIZED.add(token)
        logger.warning(
            "Unnormalized list property %s.%s reached sink as comma-string; "
            "add a NormalizedList entry to the provider catalog to explode it",
            label,
            key,
        )
 def _to_sink_property_value(value: Any) -> Any:
    if hasattr(value, "iso_format") and callable(value.iso_format):
        return value.iso_format()
    if type(value).__module__.startswith("neo4j.spatial"):
        return str(value)
    if isinstance(value, dict):
        # openCypher `SET` rejects map property values: encode as JSON so the
        # structured payload survives the round-trip and is queryable with
        # `CONTAINS` substring checks
        return json.dumps(value, sort_keys=True, default=str)
    if isinstance(value, list):
        # openCypher `SET` rejects list/array property values: encode as a
        # comma-delimited string, read back with split() inside queries
        return ",".join(str(_to_sink_property_value(v)) for v in value)
    return value
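
The conventions used by the sync helpers above can be sketched in isolation. This is a minimal standalone illustration, not the project's code: it assumes scalar-mode list items only, uses the stdlib `datetime.isoformat()` as a stand-in for the `iso_format()` method the diff checks for on temporal values, and the `AWSAction` label and `HAS_ACTION` relationship type are hypothetical catalog values chosen for the example.

```python
import json
from datetime import datetime, timezone
from typing import Any


def to_sink_value(value: Any) -> Any:
    """Coerce a property value to a primitive, following the diff's rules."""
    if isinstance(value, datetime):
        # Stand-in for temporal values; the diff calls iso_format() instead.
        return value.isoformat()
    if isinstance(value, dict):
        # Maps become a JSON string with sorted keys for stable output.
        return json.dumps(value, sort_keys=True, default=str)
    if isinstance(value, list):
        # Uncatalogued lists become a comma-delimited string.
        return ",".join(str(to_sink_value(v)) for v in value)
    return value


def child_id(provider_id: str, child_label: str, value_key: str) -> str:
    """Deterministic ID, so equal values map to one child node per provider."""
    return f"{provider_id}::{child_label}::{value_key}"


def explode_scalar_list(
    provider_id: str,
    parent_id: str,
    child_label: str,  # hypothetical catalog value
    rel_type: str,  # hypothetical catalog value
    values: list[Any],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Turn each list item into a child-node row plus a parent->child rel row."""
    children: list[dict[str, Any]] = []
    rels: list[dict[str, Any]] = []
    for item in values:
        key = str(item)
        cid = child_id(provider_id, child_label, key)
        children.append({"provider_element_id": cid, "props": {"value": key}})
        rels.append(
            {
                "start_element_id": parent_id,
                "end_element_id": cid,
                "provider_element_id": f"{parent_id}::{rel_type}::{cid}",
                "props": {},
            }
        )
    return children, rels


children, rels = explode_scalar_list(
    "prov-1", "prov-1:elem-1", "AWSAction", "HAS_ACTION",
    ["s3:GetObject", "s3:PutObject"],
)
print(children[0]["provider_element_id"])
print(to_sink_value(["x", {"k": 1}, 3]))
```

One consequence of the comma-delimited fallback is that a list item which itself contains a comma cannot be separated from its neighbours after a `split()`, which is one reason to prefer a catalog entry for list properties that queries need to match on.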
@@ -1,4 +1,5 @@
 from api.attack_paths import database as graph_database
 from api.attack_paths import sink as sink_module
 from api.db_router import MainRouter
 from api.db_utils import batch_delete, rls_transaction
 from api.models import (
@@ -76,6 +77,12 @@ def delete_provider(tenant_id: str, pk: str):
                "id", flat=True
            )
        )
        attack_paths_sink_backends = list(
            AttackPathsScan.all_objects.filter(provider=instance)
            .values_list("sink_backend", flat=True)
            .distinct()
            .order_by("sink_backend")
        )
        deletion_steps = [
            ("Scan Summaries", ScanSummary.all_objects.filter(scan__provider=instance)),
@@ -97,6 +104,12 @@ def delete_provider(tenant_id: str, pk: str):
    # Delete the Attack Paths' graph data related to the provider from the tenant database
    tenant_database_name = graph_database.get_database_name(tenant_id)
    try:
        if attack_paths_sink_backends:
            for sink_backend in attack_paths_sink_backends:
                sink_module.get_backend_for_name(sink_backend).drop_subgraph(
                    tenant_database_name, str(pk)
                )
        else:
            graph_database.drop_subgraph(tenant_database_name, str(pk))
    except graph_database.GraphDatabaseQueryException as gdb_error:
@@ -23,6 +23,14 @@ from tasks.jobs.attack_paths import internet as internet_module
 from tasks.jobs.attack_paths import sync as sync_module
 from tasks.jobs.attack_paths.scan import run as attack_paths_run
 SYNC_RESULT_EMPTY = {
    "nodes": 0,
    "child_nodes": 0,
    "relationships": 0,
    "structural_relationships": 0,
    "item_relationships": 0,
 }
@pytest.mark.django_db
 class TestAttackPathsRun:
@@ -32,6 +40,7 @@ class TestAttackPathsRun:
        "tasks.jobs.attack_paths.scan.utils.call_within_event_loop",
        side_effect=lambda fn, *a, **kw: fn(*a, **kw),
    )
    @patch("tasks.jobs.attack_paths.scan.db_utils.set_scan_migrated")
    @patch("tasks.jobs.attack_paths.scan.db_utils.set_graph_data_ready")
    @patch("tasks.jobs.attack_paths.scan.db_utils.set_provider_graph_data_ready")
    @patch("tasks.jobs.attack_paths.scan.db_utils.finish_attack_paths_scan")
@@ -39,7 +48,7 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
    @patch(
        "tasks.jobs.attack_paths.scan.sync.sync_graph",
-        return_value={"nodes": 0, "relationships": 0},
+        return_value=SYNC_RESULT_EMPTY,
    )
    @patch("tasks.jobs.attack_paths.scan.graph_database.drop_subgraph", return_value=0)
    @patch("tasks.jobs.attack_paths.scan.indexes.create_sync_indexes")
@@ -48,11 +57,11 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
-        "tasks.jobs.attack_paths.scan.graph_database.get_uri",
+        "tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
        return_value="bolt://neo4j",
    )
    @patch(
@@ -66,7 +75,7 @@ class TestAttackPathsRun:
    def test_run_success_flow(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_create_db,
        mock_clear_cache,
        mock_cartography_indexes,
@@ -83,6 +92,7 @@ class TestAttackPathsRun:
        mock_finish,
        mock_set_provider_graph_data_ready,
        mock_set_graph_data_ready,
        mock_set_scan_migrated,
        mock_event_loop,
        mock_drop_db,
        tenants_fixture,
@@ -159,6 +169,7 @@ class TestAttackPathsRun:
            target_database="tenant-db",
            tenant_id=str(provider.tenant_id),
            provider_id=str(provider.id),
            provider_type="aws",
        )
        mock_get_ingestion.assert_called_once_with(provider.provider)
        mock_event_loop.assert_called_once()
@@ -172,9 +183,12 @@ class TestAttackPathsRun:
            attack_paths_scan, StateChoices.COMPLETED, ingestion_result
        )
        mock_set_provider_graph_data_ready.assert_called_once_with(
-            attack_paths_scan, False
+            attack_paths_scan, False, "neo4j"
        )
        mock_set_graph_data_ready.assert_called_once_with(attack_paths_scan, True)
        # is_migrated is flipped to True only after the sync succeeds, so reads
        # don't switch to the new catalog/sink before the graph is live.
        mock_set_scan_migrated.assert_called_once_with(attack_paths_scan, True, "neo4j")
    @patch(
        "tasks.jobs.attack_paths.scan.utils.stringify_exception",
@@ -194,13 +208,13 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.internet.analysis")
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
        "tasks.jobs.attack_paths.scan.graph_database.get_database_name",
        return_value="db-scan-id",
    )
-    @patch("tasks.jobs.attack_paths.scan.graph_database.get_uri")
+    @patch("tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri")
    @patch(
        "tasks.jobs.attack_paths.scan.initialize_prowler_provider",
        return_value=MagicMock(_enabled_regions=["us-east-1"]),
@@ -212,7 +226,7 @@ class TestAttackPathsRun:
    def test_run_failure_marks_scan_failed(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_get_db_name,
        mock_create_db,
        mock_cartography_indexes,
@@ -293,13 +307,13 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.internet.analysis")
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
        "tasks.jobs.attack_paths.scan.graph_database.get_database_name",
        return_value="db-scan-id",
    )
-    @patch("tasks.jobs.attack_paths.scan.graph_database.get_uri")
+    @patch("tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri")
    @patch(
        "tasks.jobs.attack_paths.scan.initialize_prowler_provider",
        return_value=MagicMock(_enabled_regions=["us-east-1"]),
@@ -311,7 +325,7 @@ class TestAttackPathsRun:
    def test_failure_before_gate_does_not_flip_graph_data_ready_true(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_get_db_name,
        mock_create_db,
        mock_cartography_indexes,
@@ -396,13 +410,13 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.internet.analysis")
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
        "tasks.jobs.attack_paths.scan.graph_database.get_database_name",
        return_value="db-scan-id",
    )
-    @patch("tasks.jobs.attack_paths.scan.graph_database.get_uri")
+    @patch("tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri")
    @patch(
        "tasks.jobs.attack_paths.scan.initialize_prowler_provider",
        return_value=MagicMock(_enabled_regions=["us-east-1"]),
@@ -414,7 +428,7 @@ class TestAttackPathsRun:
    def test_run_failure_marks_scan_failed_even_when_drop_database_fails(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_get_db_name,
        mock_create_db,
        mock_cartography_indexes,
@@ -493,7 +507,7 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
    @patch(
        "tasks.jobs.attack_paths.scan.sync.sync_graph",
-        return_value={"nodes": 0, "relationships": 0},
+        return_value=SYNC_RESULT_EMPTY,
    )
    @patch(
        "tasks.jobs.attack_paths.scan.graph_database.drop_subgraph",
@@ -505,11 +519,11 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
-        "tasks.jobs.attack_paths.scan.graph_database.get_uri",
+        "tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
        return_value="bolt://neo4j",
    )
    @patch(
@@ -523,7 +537,7 @@ class TestAttackPathsRun:
    def test_failure_after_gate_before_drop_restores_graph_data_ready(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_create_db,
        mock_clear_cache,
        mock_cartography_indexes,
@@ -589,8 +603,8 @@ class TestAttackPathsRun:
                attack_paths_run(str(tenant.id), str(scan.id), "task-456")
        assert mock_set_provider_graph_data_ready.call_args_list == [
-            call(attack_paths_scan, False),
+            call(attack_paths_scan, False, "neo4j"),
-            call(attack_paths_scan, True),
+            call(attack_paths_scan, True, "neo4j"),
        ]
    @patch(
@@ -618,11 +632,11 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
-        "tasks.jobs.attack_paths.scan.graph_database.get_uri",
+        "tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
        return_value="bolt://neo4j",
    )
    @patch(
@@ -636,7 +650,7 @@ class TestAttackPathsRun:
    def test_failure_after_drop_before_sync_leaves_graph_data_ready_false(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_create_db,
        mock_clear_cache,
        mock_cartography_indexes,
@@ -703,7 +717,7 @@ class TestAttackPathsRun:
        # Only called with False (gate), never with True (no recovery for partial data)
        mock_set_provider_graph_data_ready.assert_called_once_with(
-            attack_paths_scan, False
+            attack_paths_scan, False, "neo4j"
        )
    @patch(
@@ -716,6 +730,7 @@ class TestAttackPathsRun:
    )
    @patch("tasks.jobs.attack_paths.scan.graph_database.drop_database")
    @patch("tasks.jobs.attack_paths.scan.db_utils.finish_attack_paths_scan")
    @patch("tasks.jobs.attack_paths.scan.db_utils.set_scan_migrated")
    @patch(
        "tasks.jobs.attack_paths.scan.db_utils.set_graph_data_ready",
        side_effect=[RuntimeError("flag failed"), None],
@@ -725,7 +740,7 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
    @patch(
        "tasks.jobs.attack_paths.scan.sync.sync_graph",
-        return_value={"nodes": 0, "relationships": 0},
+        return_value=SYNC_RESULT_EMPTY,
    )
    @patch("tasks.jobs.attack_paths.scan.graph_database.drop_subgraph")
    @patch("tasks.jobs.attack_paths.scan.indexes.create_sync_indexes")
@@ -734,11 +749,11 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
-        "tasks.jobs.attack_paths.scan.graph_database.get_uri",
+        "tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
        return_value="bolt://neo4j",
    )
    @patch(
@@ -752,7 +767,7 @@ class TestAttackPathsRun:
    def test_failure_after_sync_restores_graph_data_ready(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_create_db,
        mock_clear_cache,
        mock_cartography_indexes,
@@ -768,6 +783,7 @@ class TestAttackPathsRun:
        mock_update_progress,
        mock_set_provider_graph_data_ready,
        mock_set_graph_data_ready,
        mock_set_scan_migrated,
        mock_finish,
        mock_drop_db,
        mock_event_loop,
@@ -824,8 +840,11 @@ class TestAttackPathsRun:
        ]
        # set_provider_graph_data_ready only called once with False (the gate)
        mock_set_provider_graph_data_ready.assert_called_once_with(
-            attack_paths_scan, False
+            attack_paths_scan, False, "neo4j"
        )
        # is_migrated is flipped once after the sync and is not touched again by
        # the failure-recovery branch
        mock_set_scan_migrated.assert_called_once_with(attack_paths_scan, True, "neo4j")
    @patch(
        "tasks.jobs.attack_paths.scan.utils.stringify_exception",
@@ -843,7 +862,7 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
    @patch(
        "tasks.jobs.attack_paths.scan.sync.sync_graph",
-        return_value={"nodes": 0, "relationships": 0},
+        return_value=SYNC_RESULT_EMPTY,
    )
    @patch(
        "tasks.jobs.attack_paths.scan.graph_database.drop_subgraph",
@@ -855,11 +874,11 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
-        "tasks.jobs.attack_paths.scan.graph_database.get_uri",
+        "tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
        return_value="bolt://neo4j",
    )
    @patch(
@@ -873,7 +892,7 @@ class TestAttackPathsRun:
    def test_recovery_failure_does_not_suppress_original_exception(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_create_db,
        mock_clear_cache,
        mock_cartography_indexes,
@@ -1116,7 +1135,7 @@ class TestFailAttackPathsScan:
            fail_attack_paths_scan(str(tenant.id), "nonexistent", "setup exploded")
    def test_fail_recovers_graph_data_ready_when_data_exists(
-        self, tenants_fixture, providers_fixture, scans_fixture
+        self, tenants_fixture, providers_fixture, scans_fixture, sink_backend_stub
    ):
        from tasks.jobs.attack_paths.db_utils import fail_attack_paths_scan
@@ -1135,16 +1154,18 @@ class TestFailAttackPathsScan:
            state=StateChoices.EXECUTING,
        )
        # `recover_graph_data_ready` routes `has_provider_data` through
        # `sink_module.get_backend_for_scan(scan)`. With `is_migrated=False`
        # and the default `ATTACK_PATHS_SINK_DATABASE=neo4j`, the factory
        # returns the active backend, which `sink_backend_stub` replaces.
        sink_backend_stub.has_provider_data.return_value = True
        with (
            patch(
                "tasks.jobs.attack_paths.db_utils.retrieve_attack_paths_scan",
                return_value=attack_paths_scan,
            ),
            patch("tasks.jobs.attack_paths.db_utils.graph_database.drop_database"),
            patch(
                "tasks.jobs.attack_paths.db_utils.graph_database.has_provider_data",
                return_value=True,
            ),
            patch(
                "tasks.jobs.attack_paths.db_utils.set_provider_graph_data_ready"
            ) as mock_set_ready,
@@ -1154,7 +1175,7 @@ class TestFailAttackPathsScan:
        mock_set_ready.assert_called_once_with(attack_paths_scan, True)
    def test_fail_leaves_graph_data_ready_false_when_no_data(
-        self, tenants_fixture, providers_fixture, scans_fixture
+        self, tenants_fixture, providers_fixture, scans_fixture, sink_backend_stub
    ):
        from tasks.jobs.attack_paths.db_utils import fail_attack_paths_scan
@@ -1173,16 +1194,14 @@ class TestFailAttackPathsScan:
            state=StateChoices.EXECUTING,
        )
        sink_backend_stub.has_provider_data.return_value = False
        with (
            patch(
                "tasks.jobs.attack_paths.db_utils.retrieve_attack_paths_scan",
                return_value=attack_paths_scan,
            ),
            patch("tasks.jobs.attack_paths.db_utils.graph_database.drop_database"),
            patch(
                "tasks.jobs.attack_paths.db_utils.graph_database.has_provider_data",
                return_value=False,
            ),
            patch(
                "tasks.jobs.attack_paths.db_utils.set_provider_graph_data_ready"
            ) as mock_set_ready,
@@ -1271,6 +1290,20 @@ class TestAttackPathsFindingsHelpers:
            [call(mock_session, stmt) for stmt in FINDINGS_INDEX_STATEMENTS]
        )
    def test_create_findings_indexes_runs_even_when_sink_is_neptune(self, settings):
        # The index helpers run against the temp ingest DB, which is always
        # Neo4j regardless of the configured sink. A Neptune sink must not
        # suppress index creation on that DB (regression for the dropped
        # in-helper sink gate).
        settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
        mock_session = MagicMock()
        with patch("tasks.jobs.attack_paths.indexes.run_write_query") as mock_run_write:
            indexes_module.create_findings_indexes(mock_session)
        from tasks.jobs.attack_paths.indexes import FINDINGS_INDEX_STATEMENTS
        assert mock_run_write.call_count == len(FINDINGS_INDEX_STATEMENTS)
    def test_load_findings_batches_requests(self, providers_fixture):
        provider = providers_fixture[0]
        provider.provider = Provider.ProviderChoices.AWS
@@ -1802,7 +1835,7 @@ def _make_session_ctx(session, call_order=None, name=None):
 class TestSyncNodes:
-    def test_sync_nodes_adds_private_label(self):
+    def test_sync_nodes_passes_isolation_labels_to_sink(self):
        row = {
            "internal_id": 1,
            "element_id": "elem-1",
@@ -1812,29 +1845,32 @@ class TestSyncNodes:
        mock_source_1 = MagicMock()
        mock_source_1.run.return_value = [row]
        mock_target = MagicMock()
        mock_source_2 = MagicMock()
        mock_source_2.run.return_value = []
        sink = MagicMock()
        with patch(
            "tasks.jobs.attack_paths.sync.graph_database.get_session",
            side_effect=[
                _make_session_ctx(mock_source_1),
                _make_session_ctx(mock_target),
                _make_session_ctx(mock_source_2),
            ],
        ):
-            total = sync_module.sync_nodes(
+            result = sync_module.sync_nodes(
-                "source-db", "target-db", "tenant-1", "prov-1"
+                "source-db", "target-db", "tenant-1", "prov-1", sink, []
            )
-        assert total == 1
+        assert result["parents"] == 1
-        query = mock_target.run.call_args.args[0]
+        sink.write_nodes.assert_called_once()
-        assert "_ProviderResource" in query
+        target_db, labels, batch = sink.write_nodes.call_args.args
-        assert "_Tenant_tenant1" in query
+        assert target_db == "target-db"
-        assert "_Provider_prov1" in query
+        assert "_ProviderResource" in labels
        assert "_Tenant_tenant1" in labels
        assert "_Provider_prov1" in labels
        assert batch[0]["provider_element_id"] == "prov-1:elem-1"
        assert batch[0]["props"] == {"key": "value"}
-    def test_sync_nodes_source_closes_before_target_opens(self):
+    def test_sync_nodes_writes_after_source_session_closes(self):
        row = {
            "internal_id": 1,
            "element_id": "elem-1",
@@ -1846,21 +1882,23 @@ class TestSyncNodes:
        src_1 = MagicMock()
        src_1.run.return_value = [row]
        tgt = MagicMock()
        src_2 = MagicMock()
        src_2.run.return_value = []
        sink = MagicMock()
        sink.write_nodes.side_effect = lambda *_a, **_kw: call_order.append(
            "sink:write"
        )
        with patch(
            "tasks.jobs.attack_paths.sync.graph_database.get_session",
            side_effect=[
                _make_session_ctx(src_1, call_order, "source1"),
                _make_session_ctx(tgt, call_order, "target"),
                _make_session_ctx(src_2, call_order, "source2"),
            ],
        ):
-            sync_module.sync_nodes("src-db", "tgt-db", "t-1", "p-1")
+            sync_module.sync_nodes("src-db", "tgt-db", "t-1", "p-1", sink, [])
-        assert call_order.index("source1:exit") < call_order.index("target:enter")
+        assert call_order.index("source1:exit") < call_order.index("sink:write")
    def test_sync_nodes_pagination_with_batch_size_1(self):
        row_a = {
@@ -1882,44 +1920,44 @@ class TestSyncNodes:
        src_2.run.return_value = [row_b]
        src_3 = MagicMock()
        src_3.run.return_value = []
-        tgt_1 = MagicMock()
+        sink = MagicMock()
        tgt_2 = MagicMock()
        with (
            patch(
                "tasks.jobs.attack_paths.sync.graph_database.get_session",
                side_effect=[
                    _make_session_ctx(src_1),
                    _make_session_ctx(tgt_1),
                    _make_session_ctx(src_2),
                    _make_session_ctx(tgt_2),
                    _make_session_ctx(src_3),
                ],
            ),
            patch("tasks.jobs.attack_paths.sync.SYNC_BATCH_SIZE", 1),
        ):
-            total = sync_module.sync_nodes("src", "tgt", "t-1", "p-1")
+            result = sync_module.sync_nodes("src", "tgt", "t-1", "p-1", sink, [])
-        assert total == 2
+        assert result["parents"] == 2
        assert sink.write_nodes.call_count == 2
        assert src_1.run.call_args.args[1]["last_id"] == -1
        assert src_2.run.call_args.args[1]["last_id"] == 1
    def test_sync_nodes_empty_source_returns_zero(self):
        src = MagicMock()
        src.run.return_value = []
        sink = MagicMock()
        with patch(
            "tasks.jobs.attack_paths.sync.graph_database.get_session",
            side_effect=[_make_session_ctx(src)],
        ) as mock_get_session:
-            total = sync_module.sync_nodes("src", "tgt", "t-1", "p-1")
+            result = sync_module.sync_nodes("src", "tgt", "t-1", "p-1", sink, [])
-        assert total == 0
+        assert result["parents"] == 0
        assert mock_get_session.call_count == 1
        sink.write_nodes.assert_not_called()
 class TestSyncRelationships:
-    def test_sync_relationships_source_closes_before_target_opens(self):
+    def test_sync_relationships_writes_after_source_session_closes(self):
        row = {
            "internal_id": 1,
            "rel_type": "HAS",
@@ -1932,21 +1970,23 @@ class TestSyncRelationships:
        src_1 = MagicMock()
        src_1.run.return_value = [row]
        tgt = MagicMock()
        src_2 = MagicMock()
        src_2.run.return_value = []
        sink = MagicMock()
        sink.write_relationships.side_effect = lambda *_a, **_kw: call_order.append(
            "sink:write"
        )
        with patch(
            "tasks.jobs.attack_paths.sync.graph_database.get_session",
            side_effect=[
                _make_session_ctx(src_1, call_order, "source1"),
                _make_session_ctx(tgt, call_order, "target"),
                _make_session_ctx(src_2, call_order, "source2"),
            ],
        ):
-            sync_module.sync_relationships("src", "tgt", "p-1")
+            sync_module.sync_relationships("src", "tgt", "p-1", sink)
-        assert call_order.index("source1:exit") < call_order.index("target:enter")
+        assert call_order.index("source1:exit") < call_order.index("sink:write")
    def test_sync_relationships_pagination_with_batch_size_1(self):
        row_a = {
@@ -1970,40 +2010,40 @@ class TestSyncRelationships:
        src_2.run.return_value = [row_b]
        src_3 = MagicMock()
        src_3.run.return_value = []
-        tgt_1 = MagicMock()
+        sink = MagicMock()
        tgt_2 = MagicMock()
        with (
            patch(
                "tasks.jobs.attack_paths.sync.graph_database.get_session",
                side_effect=[
                    _make_session_ctx(src_1),
                    _make_session_ctx(tgt_1),
                    _make_session_ctx(src_2),
                    _make_session_ctx(tgt_2),
                    _make_session_ctx(src_3),
                ],
            ),
            patch("tasks.jobs.attack_paths.sync.SYNC_BATCH_SIZE", 1),
        ):
-            total = sync_module.sync_relationships("src", "tgt", "p-1")
+            total = sync_module.sync_relationships("src", "tgt", "p-1", sink)
        assert total == 2
        assert sink.write_relationships.call_count == 2
        assert src_1.run.call_args.args[1]["last_id"] == -1
        assert src_2.run.call_args.args[1]["last_id"] == 1
    def test_sync_relationships_empty_source_returns_zero(self):
        src = MagicMock()
        src.run.return_value = []
        sink = MagicMock()
        with patch(
            "tasks.jobs.attack_paths.sync.graph_database.get_session",
            side_effect=[_make_session_ctx(src)],
        ) as mock_get_session:
-            total = sync_module.sync_relationships("src", "tgt", "p-1")
+            total = sync_module.sync_relationships("src", "tgt", "p-1", sink)
        assert total == 0
        assert mock_get_session.call_count == 1
        sink.write_relationships.assert_not_called()
 class TestInternetAnalysis:
@@ -2075,6 +2115,8 @@ class TestAttackPathsDbUtilsGraphDataReady:
        assert attack_paths_scan is not None
        assert attack_paths_scan.graph_data_ready is False
        assert attack_paths_scan.is_migrated is False
        assert attack_paths_scan.sink_backend == "neo4j"
    def test_create_attack_paths_scan_inherits_true_from_previous(
        self, tenants_fixture, providers_fixture, scans_fixture
@@ -2095,6 +2137,8 @@ class TestAttackPathsDbUtilsGraphDataReady:
            scan=scan,
            state=StateChoices.COMPLETED,
            graph_data_ready=True,
            is_migrated=True,
            sink_backend="neptune",
        )
        new_scan = Scan.objects.create(
@@ -2115,6 +2159,109 @@ class TestAttackPathsDbUtilsGraphDataReady:
        assert attack_paths_scan is not None
        assert attack_paths_scan.graph_data_ready is True
        # is_migrated tracks the data being served: inherited from the ready scan
        assert attack_paths_scan.is_migrated is True
        assert attack_paths_scan.sink_backend == "neptune"
    def test_create_attack_paths_scan_prefers_active_sink_ready_scan(
        self, tenants_fixture, providers_fixture, scans_fixture, settings
    ):
        from tasks.jobs.attack_paths.db_utils import create_attack_paths_scan
        settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
        tenant = tenants_fixture[0]
        provider = providers_fixture[0]
        provider.provider = Provider.ProviderChoices.AWS
        provider.save()
        scan = scans_fixture[0]
        scan.provider = provider
        scan.save()
        AttackPathsScan.objects.create(
            tenant_id=tenant.id,
            provider=provider,
            scan=scan,
            state=StateChoices.COMPLETED,
            graph_data_ready=True,
            is_migrated=False,
            sink_backend="neo4j",
        )
        AttackPathsScan.objects.create(
            tenant_id=tenant.id,
            provider=provider,
            scan=scan,
            state=StateChoices.COMPLETED,
            graph_data_ready=True,
            is_migrated=True,
            sink_backend="neptune",
        )
        new_scan = Scan.objects.create(
            name="New Scan",
            provider=provider,
            trigger=Scan.TriggerChoices.MANUAL,
            state=StateChoices.AVAILABLE,
            tenant_id=tenant.id,
        )
        with patch(
            "tasks.jobs.attack_paths.db_utils.rls_transaction",
            new=lambda *args, **kwargs: nullcontext(),
        ):
            attack_paths_scan = create_attack_paths_scan(
                str(tenant.id), str(new_scan.id), provider.id
            )
        assert attack_paths_scan is not None
        assert attack_paths_scan.graph_data_ready is True
        assert attack_paths_scan.is_migrated is False
        assert attack_paths_scan.sink_backend == "neo4j"
    def test_create_attack_paths_scan_inherits_is_migrated_false_from_legacy_ready(
        self, tenants_fixture, providers_fixture, scans_fixture
    ):
        from tasks.jobs.attack_paths.db_utils import create_attack_paths_scan
        tenant = tenants_fixture[0]
        provider = providers_fixture[0]
        provider.provider = Provider.ProviderChoices.AWS
        provider.save()
        scan = scans_fixture[0]
        scan.provider = provider
        scan.save()
        # Previous scan is ready but pre-cutover (legacy Neo4j graph shape)
        AttackPathsScan.objects.create(
            tenant_id=tenant.id,
            provider=provider,
            scan=scan,
            state=StateChoices.COMPLETED,
            graph_data_ready=True,
            is_migrated=False,
            sink_backend="neo4j",
        )
        new_scan = Scan.objects.create(
            name="New Scan",
            provider=provider,
            trigger=Scan.TriggerChoices.MANUAL,
            state=StateChoices.AVAILABLE,
            tenant_id=tenant.id,
        )
        with patch(
            "tasks.jobs.attack_paths.db_utils.rls_transaction",
            new=lambda *args, **kwargs: nullcontext(),
        ):
            attack_paths_scan = create_attack_paths_scan(
                str(tenant.id), str(new_scan.id), provider.id
            )
        assert attack_paths_scan is not None
        assert attack_paths_scan.graph_data_ready is True
        # Reads stay on the legacy catalog/backend until this scan's own sync completes
        assert attack_paths_scan.is_migrated is False
        assert attack_paths_scan.sink_backend == "neo4j"
    def test_create_attack_paths_scan_inherits_false_when_no_previous_ready(
        self, tenants_fixture, providers_fixture, scans_fixture
@@ -2135,6 +2282,7 @@ class TestAttackPathsDbUtilsGraphDataReady:
            scan=scan,
            state=StateChoices.FAILED,
            graph_data_ready=False,
            sink_backend="neptune",
        )
        new_scan = Scan.objects.create(
@@ -2155,6 +2303,8 @@ class TestAttackPathsDbUtilsGraphDataReady:
        assert attack_paths_scan is not None
        assert attack_paths_scan.graph_data_ready is False
        assert attack_paths_scan.is_migrated is False
        assert attack_paths_scan.sink_backend == "neo4j"
    def test_set_graph_data_ready_updates_field(
        self, tenants_fixture, providers_fixture, scans_fixture
@@ -2261,7 +2411,7 @@ class TestAttackPathsDbUtilsGraphDataReady:
        assert attack_paths_scan.state == StateChoices.FAILED
        assert attack_paths_scan.graph_data_ready is True
-    def test_set_provider_graph_data_ready_updates_all_scans_for_provider(
+    def test_set_provider_graph_data_ready_updates_all_scans_for_provider_sink(
        self, tenants_fixture, providers_fixture, scans_fixture
    ):
        from tasks.jobs.attack_paths.db_utils import set_provider_graph_data_ready
@@ -2289,6 +2439,7 @@ class TestAttackPathsDbUtilsGraphDataReady:
            scan=scan_a,
            state=StateChoices.COMPLETED,
            graph_data_ready=True,
            sink_backend="neptune",
        )
        new_ap_scan = AttackPathsScan.objects.create(
            tenant_id=tenant.id,
@@ -2296,6 +2447,7 @@ class TestAttackPathsDbUtilsGraphDataReady:
            scan=scan_b,
            state=StateChoices.EXECUTING,
            graph_data_ready=True,
            sink_backend="neptune",
        )
        with patch(
@@ -2309,6 +2461,48 @@ class TestAttackPathsDbUtilsGraphDataReady:
        assert old_ap_scan.graph_data_ready is False
        assert new_ap_scan.graph_data_ready is False
    def test_set_provider_graph_data_ready_preserves_other_sink_scans(
        self, tenants_fixture, providers_fixture, scans_fixture
    ):
        from tasks.jobs.attack_paths.db_utils import set_provider_graph_data_ready
        tenant = tenants_fixture[0]
        provider = providers_fixture[0]
        provider.provider = Provider.ProviderChoices.AWS
        provider.save()
        scan = scans_fixture[0]
        scan.provider = provider
        scan.save()
        legacy_scan = AttackPathsScan.objects.create(
            tenant_id=tenant.id,
            provider=provider,
            scan=scan,
            state=StateChoices.COMPLETED,
            graph_data_ready=True,
            sink_backend="neo4j",
        )
        neptune_scan = AttackPathsScan.objects.create(
            tenant_id=tenant.id,
            provider=provider,
            scan=scan,
            state=StateChoices.EXECUTING,
            graph_data_ready=True,
            sink_backend="neptune",
        )
        with patch(
            "tasks.jobs.attack_paths.db_utils.rls_transaction",
            new=lambda *args, **kwargs: nullcontext(),
        ):
            set_provider_graph_data_ready(neptune_scan, False)
        legacy_scan.refresh_from_db()
        neptune_scan.refresh_from_db()
        assert legacy_scan.graph_data_ready is True
        assert neptune_scan.graph_data_ready is False
    def test_set_provider_graph_data_ready_does_not_affect_other_providers(
        self, tenants_fixture, providers_fixture, scans_fixture
    ):
@@ -2871,3 +3065,57 @@ class TestCleanupStaleAttackPathsScans:
        ap_scan.refresh_from_db()
        assert ap_scan.state == StateChoices.SCHEDULED
        mock_revoke.assert_not_called()
 class TestNormalizeSinkProperties:
    """Coerce Cartography-emitted property values into sink-portable primitives.
    Lists become comma-separated strings, dicts become JSON strings, temporals
    become ISO 8601 strings, and spatials become their `str()` form. The same
    coercion runs regardless of the active sink, so queries stay portable
    across backends.
    """
    @pytest.mark.parametrize(
        "raw, expected",
        [
            (
                {"a": "x", "b": 1, "c": 1.5, "d": True, "e": None},
                {"a": "x", "b": 1, "c": 1.5, "d": True, "e": None},
            ),
            (
                {"actions": ["s3:GetObject", "s3:PutObject"], "tags": []},
                {"actions": "s3:GetObject,s3:PutObject", "tags": ""},
            ),
            (
                {"condition": {"StringEquals": {"aws:SourceAccount": "123456789012"}}},
                {
                    "condition": '{"StringEquals": {"aws:SourceAccount": "123456789012"}}'
                },
            ),
        ],
    )
    def test_primitive_list_and_dict_branches(self, raw, expected):
        sync_module._normalize_sink_properties(raw, labels=None)
        assert raw == expected
    def test_temporal_and_spatial_become_strings(self):
        class FakeDateTime:
            def iso_format(self) -> str:
                return "2026-05-13T10:00:00+00:00"
        class FakeSpatialPoint:
            def __str__(self) -> str:
                return "POINT(1.0 2.0)"
        # The spatial branch is detected by module prefix, not by base class.
        FakeSpatialPoint.__module__ = "neo4j.spatial.fake"
        props = {
            "created_at": FakeDateTime(),
            "location": FakeSpatialPoint(),
        }
        sync_module._normalize_sink_properties(props, labels=None)
        assert props == {
            "created_at": "2026-05-13T10:00:00+00:00",
            "location": "POINT(1.0 2.0)",
        }
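The coercion rules exercised by `TestNormalizeSinkProperties` can be sketched as follows. This is a minimal illustration inferred only from the test expectations above, not the commit's actual `_normalize_sink_properties`; the name `normalize_props` is hypothetical, and the real helper also accepts a `labels` argument that this sketch ignores.

```python
import json
from typing import Any


def normalize_props(props: dict[str, Any]) -> None:
    """Coerce property values in place into portable primitives (hypothetical sketch)."""
    for key, value in props.items():
        if value is None or isinstance(value, (str, int, float, bool)):
            continue  # primitives pass through unchanged
        if isinstance(value, list):
            # Lists are flattened into one comma-separated string.
            props[key] = ",".join(str(item) for item in value)
        elif isinstance(value, dict):
            # Dicts are serialized to a JSON string.
            props[key] = json.dumps(value)
        elif callable(getattr(value, "iso_format", None)):
            # Temporal-like objects expose iso_format(), as the fake in the test does.
            props[key] = value.iso_format()
        elif type(value).__module__.startswith("neo4j.spatial"):
            # Spatial-like objects are detected by module prefix and stringified.
            props[key] = str(value)


example = {"actions": ["s3:GetObject", "s3:PutObject"], "tags": [], "count": 3}
normalize_props(example)
```

Only values are reassigned while iterating, so the dictionary's size never changes during the loop. Any value type not covered by a branch is left untouched in this sketch; how the real helper treats such values is not shown in the diff.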
@@ -1,4 +1,4 @@
-from unittest.mock import call, patch
+from unittest.mock import MagicMock, call, patch
 import pytest
 from api.attack_paths import database as graph_database
@@ -60,10 +60,12 @@ class TestDeleteProvider:
        aps1 = create_attack_paths_scan(instance)
        aps2 = create_attack_paths_scan(instance)
        backend = MagicMock()
        with (
            patch(
-                "tasks.jobs.deletion.graph_database.drop_subgraph",
+                "tasks.jobs.deletion.sink_module.get_backend_for_name",
                return_value=backend,
            ),
            patch(
                "tasks.jobs.deletion.graph_database.drop_database",
@@ -72,12 +74,55 @@ class TestDeleteProvider:
            result = delete_provider(tenant_id, instance.id)
        assert result
        backend.drop_subgraph.assert_called_once_with(
            graph_database.get_database_name(tenant_id), str(instance.id)
        )
        expected_tmp_calls = [
            call(f"db-tmp-scan-{str(aps1.id).lower()}"),
            call(f"db-tmp-scan-{str(aps2.id).lower()}"),
        ]
        mock_drop_database.assert_has_calls(expected_tmp_calls, any_order=True)
    def test_delete_provider_drops_graph_data_from_all_recorded_sinks(
        self, providers_fixture, create_attack_paths_scan
    ):
        instance = providers_fixture[0]
        tenant_id = str(instance.tenant_id)
        create_attack_paths_scan(instance, sink_backend="neo4j")
        create_attack_paths_scan(instance, sink_backend="neptune")
        neo4j_backend = MagicMock()
        neptune_backend = MagicMock()
        def get_backend_for_name(name):
            return {
                "neo4j": neo4j_backend,
                "neptune": neptune_backend,
            }[name]
        with (
            patch(
                "tasks.jobs.deletion.graph_database.get_database_name",
                return_value="tenant-db",
            ),
            patch(
                "tasks.jobs.deletion.sink_module.get_backend_for_name",
                side_effect=get_backend_for_name,
            ) as mock_get_backend_for_name,
            patch("tasks.jobs.deletion.graph_database.drop_database"),
        ):
            result = delete_provider(tenant_id, instance.id)
        assert result
        mock_get_backend_for_name.assert_has_calls(
            [call("neo4j"), call("neptune")], any_order=True
        )
        neo4j_backend.drop_subgraph.assert_called_once_with(
            "tenant-db", str(instance.id)
        )
        neptune_backend.drop_subgraph.assert_called_once_with(
            "tenant-db", str(instance.id)
        )
    def test_delete_provider_continues_when_temp_db_drop_fails(
        self, providers_fixture, create_attack_paths_scan
    ):
@@ -85,10 +130,12 @@ class TestDeleteProvider:
        tenant_id = str(instance.tenant_id)
        create_attack_paths_scan(instance)
        backend = MagicMock()
        with (
            patch(
-                "tasks.jobs.deletion.graph_database.drop_subgraph",
+                "tasks.jobs.deletion.sink_module.get_backend_for_name",
                return_value=backend,
            ),
            patch(
                "tasks.jobs.deletion.graph_database.drop_database",
@@ -110,7 +110,7 @@ constraints = [
    { name = "blinker", specifier = "==1.9.0" },
    { name = "boto3", specifier = "==1.40.61" },
    { name = "botocore", specifier = "==1.40.61" },
-    { name = "cartography", specifier = "==0.135.0" },
+    { name = "cartography", specifier = "==0.138.1" },
    { name = "celery", specifier = "==5.6.2" },
    { name = "certifi", specifier = "==2026.1.4" },
    { name = "cffi", specifier = "==2.0.0" },
@@ -364,7 +364,7 @@ constraints = [
    { name = "wcwidth", specifier = "==0.5.3" },
    { name = "websocket-client", specifier = "==1.9.0" },
    { name = "werkzeug", specifier = "==3.1.7" },
-    { name = "workos", specifier = "==6.0.4" },
+    { name = "workos", specifier = "==6.0.8" },
    { name = "wrapt", specifier = "==1.17.3" },
    { name = "xlsxwriter", specifier = "==3.2.9" },
    { name = "xmlsec", specifier = "==1.3.17" },
@@ -376,6 +376,7 @@ constraints = [
    { name = "zstd", specifier = "==1.5.7.3" },
 ]
 overrides = [
    { name = "azure-mgmt-containerservice", specifier = "==34.1.0" },
    { name = "dulwich", specifier = "==1.2.5" },
    { name = "microsoft-kiota-abstractions", specifier = "==1.9.9" },
    { name = "okta", specifier = "==3.4.2" },
@@ -1407,6 +1408,20 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/3d/66/0d8ae9ca4d75e57746026a1f9a10a7e25029511c128cf20166fce516bda9/azure_mgmt_logic-10.0.0-py3-none-any.whl", hash = "sha256:525c78afedf3edb35eb0a16152c8beba89769ee1bc6af01bcdc42842a551e443", size = 235433, upload-time = "2022-06-13T01:38:27.333Z" },
 ]
 [[package]]
 name = "azure-mgmt-managementgroups"
 version = "1.1.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "azure-mgmt-core" },
    { name = "isodate" },
    { name = "typing-extensions" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/fd/73/ac5e064ed7343e1b3172f32f09be3efca906087218d3046b5038f2f394ed/azure_mgmt_managementgroups-1.1.0.tar.gz", hash = "sha256:e6199baf118890ba2bda35dda83a88861c0b1bbef126311b20ec12eed9681951", size = 60101, upload-time = "2026-02-13T03:45:45.439Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/92/bc/993158de03cc0a49f2cf8192615ffedbc508c417cb3522e88f6652b714cc/azure_mgmt_managementgroups-1.1.0-py3-none-any.whl", hash = "sha256:140934589559ef6afcac6f1d24f995588a1965aaa89d47851c1cc639fafb1942", size = 83586, upload-time = "2026-02-13T03:45:46.836Z" },
 ]
 [[package]]
 name = "azure-mgmt-monitor"
 version = "6.0.2"
@@ -1726,7 +1741,7 @@ wheels = [
 [[package]]
 name = "cartography"
-version = "0.135.0"
+version = "0.138.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "adal" },
@@ -1746,6 +1761,7 @@ dependencies = [
    { name = "azure-mgmt-eventhub" },
    { name = "azure-mgmt-keyvault" },
    { name = "azure-mgmt-logic" },
    { name = "azure-mgmt-managementgroups" },
    { name = "azure-mgmt-monitor" },
    { name = "azure-mgmt-network" },
    { name = "azure-mgmt-resource" },
@@ -1754,6 +1770,7 @@ dependencies = [
    { name = "azure-mgmt-storage" },
    { name = "azure-mgmt-synapse" },
    { name = "azure-mgmt-web" },
    { name = "azure-storage-blob" },
    { name = "azure-synapse-artifacts" },
    { name = "backoff" },
    { name = "boto3" },
@@ -1765,8 +1782,12 @@ dependencies = [
    { name = "duo-client" },
    { name = "google-api-python-client" },
    { name = "google-auth" },
    { name = "google-cloud-aiplatform" },
    { name = "google-cloud-artifact-registry" },
    { name = "google-cloud-asset" },
    { name = "google-cloud-resource-manager" },
    { name = "google-cloud-run" },
    { name = "google-cloud-storage" },
    { name = "httpx" },
    { name = "kubernetes" },
    { name = "marshmallow" },
@@ -1792,9 +1813,9 @@ dependencies = [
    { name = "workos" },
    { name = "xmltodict" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/39/47/606851d2403a983b63813b9e95427a5dd896e49bc5a501868c041262e9a5/cartography-0.135.0.tar.gz", hash = "sha256:3f500cd22c3b392d00e8b49f62acc95fd4dcd559ce514aafe2eb8101133c7a49", size = 9106458, upload-time = "2026-04-10T16:25:34.898Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/51/cd/0eb6a5a3c89cc179801d902ade9719af1a583c516c00f50d72b8207db1eb/cartography-0.138.1.tar.gz", hash = "sha256:356e946a0bcac899cba293d57803c71bd35fdeabe623f5f67d9405d7a643af9f", size = 9756966, upload-time = "2026-06-19T22:11:32.411Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/b1/e1/99a26b3e662202be77961aba73338e1448623490710b81783e53a4bbef15/cartography-0.135.0-py3-none-any.whl", hash = "sha256:c62c32a6917b8f23a8b98fe2b6c7c4a918b50f55918482966c4dae1cf5f538e1", size = 1590545, upload-time = "2026-04-10T16:25:37.669Z" },
+    { url = "https://files.pythonhosted.org/packages/a8/15/4447ec968825b2a19cba26ecb74964208aa3f941d9181a7782572e30b43d/cartography-0.138.1-py3-none-any.whl", hash = "sha256:88ec0898ea1a1b3f4653be9a3e7e61144f5cee20384b9040e92039617d39f029", size = 2014725, upload-time = "2026-06-19T22:11:29.886Z" },
 ]
 [[package]]
@@ -2511,6 +2532,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/e3/26/57c6fb270950d476074c087527a558ccb6f4436657314bfb6cdf484114c4/docker-7.1.0-py3-none-any.whl", hash = "sha256:c96b93b7f0a746f9e77d325bcfb87422a3d8bd4f03136ae8a85b37f1898d5fc0", size = 147774, upload-time = "2024-05-23T11:13:55.01Z" },
 ]
 [[package]]
 name = "docstring-parser"
 version = "0.18.0"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/e0/4d/f332313098c1de1b2d2ff91cf2674415cc7cddab2ca1b01ae29774bd5fdf/docstring_parser-0.18.0.tar.gz", hash = "sha256:292510982205c12b1248696f44959db3cdd1740237a968ea1e2e7a900eeb2015", size = 29341, upload-time = "2026-04-14T04:09:19.867Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/a7/5f/ed01f9a3cdffbd5a008556fc7b2a08ddb1cc6ace7effa7340604b1d16699/docstring_parser-0.18.0-py3-none-any.whl", hash = "sha256:b3fcbed555c47d8479be0796ef7e19c2670d428d72e96da63f3a40122860374b", size = 22484, upload-time = "2026-04-14T04:09:18.638Z" },
 ]
 [[package]]
 name = "dogpile-cache"
 version = "1.5.0"
@@ -2851,6 +2881,11 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/83/1d/d6466de3a5249d35e832a52834115ca9d1d0de6abc22065f049707516d47/google_auth-2.48.0-py3-none-any.whl", hash = "sha256:2e2a537873d449434252a9632c28bfc268b0adb1e53f9fb62afc5333a975903f", size = 236499, upload-time = "2026-01-26T19:22:45.099Z" },
 ]
 [package.optional-dependencies]
 requests = [
    { name = "requests" },
 ]
 [[package]]
 name = "google-auth-httplib2"
 version = "0.2.0"
@@ -2877,6 +2912,46 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/ca/94/24b010493660dd55e2d9769ae7ef44164aebd7e1f4a9266cf9459affd687/google_cloud_access_context_manager-0.3.0-py3-none-any.whl", hash = "sha256:5d15ad51547f06c281e35f16b4ffcb3e98bb2d898b01470f88b94edfb2eeb0a3", size = 58852, upload-time = "2025-10-17T02:30:33.768Z" },
 ]
 [[package]]
 name = "google-cloud-aiplatform"
 version = "1.153.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "docstring-parser" },
    { name = "google-api-core", extra = ["grpc"] },
    { name = "google-auth" },
    { name = "google-cloud-bigquery" },
    { name = "google-cloud-resource-manager" },
    { name = "google-cloud-storage" },
    { name = "google-genai" },
    { name = "packaging" },
    { name = "proto-plus" },
    { name = "protobuf" },
    { name = "pydantic" },
    { name = "typing-extensions" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/d5/97/1779e66ab845550bc602364311ea093ba156cb805a1c31b7c4d6f25b5863/google_cloud_aiplatform-1.153.1.tar.gz", hash = "sha256:445b6c683d5c630f174d81ae1f69f7da9e27e4d4ec5b70c5fe96de5c1247cfbc", size = 11011349, upload-time = "2026-05-15T06:34:14.851Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/16/01/8a1900e7a742ed480e6037ac4f6541466cb981d81bd4cbd34a9d46204ea1/google_cloud_aiplatform-1.153.1-py2.py3-none-any.whl", hash = "sha256:033fa1595a7e8ed1d97066e261e630f38fbc60e10c98c6487cf228fe9c7ec151", size = 9170782, upload-time = "2026-05-15T06:34:10.887Z" },
 ]
 [[package]]
 name = "google-cloud-artifact-registry"
 version = "1.21.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "google-api-core", extra = ["grpc"] },
    { name = "google-auth" },
    { name = "grpc-google-iam-v1" },
    { name = "grpcio" },
    { name = "proto-plus" },
    { name = "protobuf" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/13/2b/24e6956789bc1244efb18143aa4f124e03d870228e5bfd065c04d38a4d6b/google_cloud_artifact_registry-1.21.0.tar.gz", hash = "sha256:546e51eb5d463a6e5c668be6727d14f8ec82bc798031398006b2213d703e184c", size = 315219, upload-time = "2026-03-30T22:50:38.875Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/e1/8c/a5c68031728f38d3306bad5ac10c0ca670cbdf414db308ddefa2c47f2b34/google_cloud_artifact_registry-1.21.0-py3-none-any.whl", hash = "sha256:a07079035438fd0f2e7264d4318b388650495f011db575405c18c9881449025c", size = 250544, upload-time = "2026-03-30T22:48:49.345Z" },
 ]
 [[package]]
 name = "google-cloud-asset"
 version = "4.2.0"
@@ -2897,6 +2972,37 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/05/88/9a43fae1d2fed94d7f5f46b6f4c44bd15e5ea0e8657632108b5ec5f53d9d/google_cloud_asset-4.2.0-py3-none-any.whl", hash = "sha256:fd7ea04c64948a4779790343204cd5b41d4772d6ab1d05a9125e28a637ac0862", size = 282707, upload-time = "2026-01-09T14:53:03.081Z" },
 ]
 [[package]]
 name = "google-cloud-bigquery"
 version = "3.41.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "google-api-core", extra = ["grpc"] },
    { name = "google-auth" },
    { name = "google-cloud-core" },
    { name = "google-resumable-media" },
    { name = "packaging" },
    { name = "python-dateutil" },
    { name = "requests" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/ce/13/6515c7aab55a4a0cf708ffd309fb9af5bab54c13e32dc22c5acd6497193c/google_cloud_bigquery-3.41.0.tar.gz", hash = "sha256:2217e488b47ed576360c9b2cc07d59d883a54b83167c0ef37f915c26b01a06fe", size = 513434, upload-time = "2026-03-30T22:50:55.347Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/40/33/1d3902efadef9194566d499d61507e1f038454e0b55499d2d7f8ab2a4fee/google_cloud_bigquery-3.41.0-py3-none-any.whl", hash = "sha256:2a5b5a737b401cbd824a6e5eac7554100b878668d908e6548836b5d8aaa4dcaa", size = 262343, upload-time = "2026-03-30T22:48:45.444Z" },
 ]
 [[package]]
 name = "google-cloud-core"
 version = "2.6.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "google-api-core" },
    { name = "google-auth" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/a8/dd/1eef226e470369b26824a505c34482c0b493bc35fe8e0c6b003b5feca21a/google_cloud_core-2.6.0.tar.gz", hash = "sha256:e76149739f90fac1fc6757c09f47eaccb3145b54adbd7759b0f7c4b235f46c83", size = 36001, upload-time = "2026-05-07T08:04:04.124Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/84/4a/98da8930ab109c73d9a5d13782a9ebb81ea8c111f6d534a567b71d23e52b/google_cloud_core-2.6.0-py3-none-any.whl", hash = "sha256:6d63ac8e5eca6d9e4319d0a1e2265fadcd7f1049904378caecfa01cf52dd869e", size = 29390, upload-time = "2026-05-07T08:02:34.672Z" },
 ]
 [[package]]
 name = "google-cloud-org-policy"
 version = "1.16.0"
@@ -2946,6 +3052,93 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/94/ff/4b28bcc791d9d7e4ac8fea00fbd90ccb236afda56746a3b4564d2ae45df3/google_cloud_resource_manager-1.16.0-py3-none-any.whl", hash = "sha256:fb9a2ad2b5053c508e1c407ac31abfd1a22e91c32876c1892830724195819a28", size = 400218, upload-time = "2026-01-15T13:02:47.378Z" },
 ]
 [[package]]
 name = "google-cloud-run"
 version = "0.16.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "google-api-core", extra = ["grpc"] },
    { name = "google-auth" },
    { name = "grpc-google-iam-v1" },
    { name = "grpcio" },
    { name = "proto-plus" },
    { name = "protobuf" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/b7/89/dcaf0dc97e39b41e446456ceb60657ab025de79cfccd39cbd739d1a9849e/google_cloud_run-0.16.0.tar.gz", hash = "sha256:d52cf4e6ad3702ae48caccf6abcab543afee6f61c2a6ec753cc62a31e5b629f1", size = 514452, upload-time = "2026-03-26T22:17:05.589Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/fa/c7/46153dc13713b5e4276d86f28ff4563332f9e4bae5ebc83abc5bfd994801/google_cloud_run-0.16.0-py3-none-any.whl", hash = "sha256:d7d2dd7307130fde2a0ce27e96d580dd23b7b2d973b6484b94d902e6b2618860", size = 459112, upload-time = "2026-03-26T22:16:00.018Z" },
 ]
 [[package]]
 name = "google-cloud-storage"
 version = "3.10.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "google-api-core" },
    { name = "google-auth" },
    { name = "google-cloud-core" },
    { name = "google-crc32c" },
    { name = "google-resumable-media" },
    { name = "requests" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/4c/47/205eb8e9a1739b5345843e5a425775cbdc472cc38e7eda082ba5b8d02450/google_cloud_storage-3.10.1.tar.gz", hash = "sha256:97db9aa4460727982040edd2bd13ff3d5e2260b5331ad22895802da1fc2a5286", size = 17309950, upload-time = "2026-03-23T09:35:23.409Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/ad/ff/ca9ab2417fa913d75aae38bf40bf856bb2749a604b2e0f701b37cfcd23cc/google_cloud_storage-3.10.1-py3-none-any.whl", hash = "sha256:a72f656759b7b99bda700f901adcb3425a828d4a29f911bc26b3ea79c5b1217f", size = 324453, upload-time = "2026-03-23T09:35:21.368Z" },
 ]
 [[package]]
 name = "google-crc32c"
 version = "1.8.0"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/03/41/4b9c02f99e4c5fb477122cd5437403b552873f014616ac1d19ac8221a58d/google_crc32c-1.8.0.tar.gz", hash = "sha256:a428e25fb7691024de47fecfbff7ff957214da51eddded0da0ae0e0f03a2cf79", size = 14192, upload-time = "2025-12-16T00:35:25.142Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/5d/ef/21ccfaab3d5078d41efe8612e0ed0bfc9ce22475de074162a91a25f7980d/google_crc32c-1.8.0-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:014a7e68d623e9a4222d663931febc3033c5c7c9730785727de2a81f87d5bab8", size = 31298, upload-time = "2025-12-16T00:20:32.241Z" },
    { url = "https://files.pythonhosted.org/packages/c5/b8/f8413d3f4b676136e965e764ceedec904fe38ae8de0cdc52a12d8eb1096e/google_crc32c-1.8.0-cp311-cp311-macosx_12_0_x86_64.whl", hash = "sha256:86cfc00fe45a0ac7359e5214a1704e51a99e757d0272554874f419f79838c5f7", size = 30872, upload-time = "2025-12-16T00:33:58.785Z" },
    { url = "https://files.pythonhosted.org/packages/f6/fd/33aa4ec62b290477181c55bb1c9302c9698c58c0ce9a6ab4874abc8b0d60/google_crc32c-1.8.0-cp311-cp311-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:19b40d637a54cb71e0829179f6cb41835f0fbd9e8eb60552152a8b52c36cbe15", size = 33243, upload-time = "2025-12-16T00:40:21.46Z" },
    { url = "https://files.pythonhosted.org/packages/71/03/4820b3bd99c9653d1a5210cb32f9ba4da9681619b4d35b6a052432df4773/google_crc32c-1.8.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:17446feb05abddc187e5441a45971b8394ea4c1b6efd88ab0af393fd9e0a156a", size = 33608, upload-time = "2025-12-16T00:40:22.204Z" },
    { url = "https://files.pythonhosted.org/packages/7c/43/acf61476a11437bf9733fb2f70599b1ced11ec7ed9ea760fdd9a77d0c619/google_crc32c-1.8.0-cp311-cp311-win_amd64.whl", hash = "sha256:71734788a88f551fbd6a97be9668a0020698e07b2bf5b3aa26a36c10cdfb27b2", size = 34439, upload-time = "2025-12-16T00:35:20.458Z" },
    { url = "https://files.pythonhosted.org/packages/e9/5f/7307325b1198b59324c0fa9807cafb551afb65e831699f2ce211ad5c8240/google_crc32c-1.8.0-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:4b8286b659c1335172e39563ab0a768b8015e88e08329fa5321f774275fc3113", size = 31300, upload-time = "2025-12-16T00:21:56.723Z" },
    { url = "https://files.pythonhosted.org/packages/21/8e/58c0d5d86e2220e6a37befe7e6a94dd2f6006044b1a33edf1ff6d9f7e319/google_crc32c-1.8.0-cp312-cp312-macosx_12_0_x86_64.whl", hash = "sha256:2a3dc3318507de089c5384cc74d54318401410f82aa65b2d9cdde9d297aca7cb", size = 30867, upload-time = "2025-12-16T00:38:31.302Z" },
    { url = "https://files.pythonhosted.org/packages/ce/a9/a780cc66f86335a6019f557a8aaca8fbb970728f0efd2430d15ff1beae0e/google_crc32c-1.8.0-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:14f87e04d613dfa218d6135e81b78272c3b904e2a7053b841481b38a7d901411", size = 33364, upload-time = "2025-12-16T00:40:22.96Z" },
    { url = "https://files.pythonhosted.org/packages/21/3f/3457ea803db0198c9aaca2dd373750972ce28a26f00544b6b85088811939/google_crc32c-1.8.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:cb5c869c2923d56cb0c8e6bcdd73c009c36ae39b652dbe46a05eb4ef0ad01454", size = 33740, upload-time = "2025-12-16T00:40:23.96Z" },
    { url = "https://files.pythonhosted.org/packages/df/c0/87c2073e0c72515bb8733d4eef7b21548e8d189f094b5dad20b0ecaf64f6/google_crc32c-1.8.0-cp312-cp312-win_amd64.whl", hash = "sha256:3cc0c8912038065eafa603b238abf252e204accab2a704c63b9e14837a854962", size = 34437, upload-time = "2025-12-16T00:35:21.395Z" },
    { url = "https://files.pythonhosted.org/packages/52/c5/c171e4d8c44fec1422d801a6d2e5d7ddabd733eeda505c79730ee9607f07/google_crc32c-1.8.0-pp311-pypy311_pp73-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:87fa445064e7db928226b2e6f0d5304ab4cd0339e664a4e9a25029f384d9bb93", size = 28615, upload-time = "2025-12-16T00:40:29.298Z" },
    { url = "https://files.pythonhosted.org/packages/9c/97/7d75fe37a7a6ed171a2cf17117177e7aab7e6e0d115858741b41e9dd4254/google_crc32c-1.8.0-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:f639065ea2042d5c034bf258a9f085eaa7af0cd250667c0635a3118e8f92c69c", size = 28800, upload-time = "2025-12-16T00:40:30.322Z" },
 ]
 [[package]]
 name = "google-genai"
 version = "1.68.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "anyio" },
    { name = "distro" },
    { name = "google-auth", extra = ["requests"] },
    { name = "httpx" },
    { name = "pydantic" },
    { name = "requests" },
    { name = "sniffio" },
    { name = "tenacity" },
    { name = "typing-extensions" },
    { name = "websockets" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/9c/2c/f059982dbcb658cc535c81bbcbe7e2c040d675f4b563b03cdb01018a4bc3/google_genai-1.68.0.tar.gz", hash = "sha256:ac30c0b8bc630f9372993a97e4a11dae0e36f2e10d7c55eacdca95a9fa14ca96", size = 511285, upload-time = "2026-03-18T01:03:18.243Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/84/de/7d3ee9c94b74c3578ea4f88d45e8de9405902f857932334d81e89bce3dfa/google_genai-1.68.0-py3-none-any.whl", hash = "sha256:a1bc9919c0e2ea2907d1e319b65471d3d6d58c54822039a249fe1323e4178d15", size = 750912, upload-time = "2026-03-18T01:03:15.983Z" },
 ]
 [[package]]
 name = "google-resumable-media"
 version = "2.9.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "google-crc32c" },
 ]
 sdist = { url = "https://files.pythonhosted.org/packages/00/4b/0b235beccc310d0a48adbc7246b719d173cca6c88c572dfa4b090e39143c/google_resumable_media-2.9.0.tar.gz", hash = "sha256:f7cfb224846a9dd444d125115dfbe8ef02a2b893e78f087762fe716a255a734b", size = 2164534, upload-time = "2026-05-07T08:04:44.236Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/07/73/3518e63deb1667c5409a4579e28daf5e84479a87a72c547e0487f7883dcd/google_resumable_media-2.9.0-py3-none-any.whl", hash = "sha256:c8901e88e389af8bed64d9696c74d8bad961865eb2236e13e0bfca9bb0a65ca3", size = 81507, upload-time = "2026-05-07T08:03:23.809Z" },
 ]
 [[package]]
 name = "googleapis-common-protos"
 version = "1.72.0"
@@ -4606,7 +4799,7 @@ dev = [
 [package.metadata]
 requires-dist = [
-    { name = "cartography", specifier = "==0.135.0" },
+    { name = "cartography", specifier = "==0.138.1" },
    { name = "celery", specifier = "==5.6.2" },
    { name = "defusedxml", specifier = "==0.7.1" },
    { name = "dj-rest-auth", extras = ["with-social", "jwt"], specifier = "==7.0.1" },
@@ -5931,6 +6124,38 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/34/db/b10e48aa8fff7407e67470363eac595018441cf32d5e1001567a7aeba5d2/websocket_client-1.9.0-py3-none-any.whl", hash = "sha256:af248a825037ef591efbf6ed20cc5faa03d3b47b9e5a2230a529eeee1c1fc3ef", size = 82616, upload-time = "2025-10-07T21:16:34.951Z" },
 ]
 [[package]]
 name = "websockets"
 version = "16.0"
 source = { registry = "https://pypi.org/simple" }
 sdist = { url = "https://files.pythonhosted.org/packages/04/24/4b2031d72e840ce4c1ccb255f693b15c334757fc50023e4db9537080b8c4/websockets-16.0.tar.gz", hash = "sha256:5f6261a5e56e8d5c42a4497b364ea24d94d9563e8fbd44e78ac40879c60179b5", size = 179346, upload-time = "2026-01-10T09:23:47.181Z" }
 wheels = [
    { url = "https://files.pythonhosted.org/packages/f2/db/de907251b4ff46ae804ad0409809504153b3f30984daf82a1d84a9875830/websockets-16.0-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:31a52addea25187bde0797a97d6fc3d2f92b6f72a9370792d65a6e84615ac8a8", size = 177340, upload-time = "2026-01-10T09:22:34.539Z" },
    { url = "https://files.pythonhosted.org/packages/f3/fa/abe89019d8d8815c8781e90d697dec52523fb8ebe308bf11664e8de1877e/websockets-16.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:417b28978cdccab24f46400586d128366313e8a96312e4b9362a4af504f3bbad", size = 175022, upload-time = "2026-01-10T09:22:36.332Z" },
    { url = "https://files.pythonhosted.org/packages/58/5d/88ea17ed1ded2079358b40d31d48abe90a73c9e5819dbcde1606e991e2ad/websockets-16.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:af80d74d4edfa3cb9ed973a0a5ba2b2a549371f8a741e0800cb07becdd20f23d", size = 175319, upload-time = "2026-01-10T09:22:37.602Z" },
    { url = "https://files.pythonhosted.org/packages/d2/ae/0ee92b33087a33632f37a635e11e1d99d429d3d323329675a6022312aac2/websockets-16.0-cp311-cp311-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:08d7af67b64d29823fed316505a89b86705f2b7981c07848fb5e3ea3020c1abe", size = 184631, upload-time = "2026-01-10T09:22:38.789Z" },
    { url = "https://files.pythonhosted.org/packages/c8/c5/27178df583b6c5b31b29f526ba2da5e2f864ecc79c99dae630a85d68c304/websockets-16.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7be95cfb0a4dae143eaed2bcba8ac23f4892d8971311f1b06f3c6b78952ee70b", size = 185870, upload-time = "2026-01-10T09:22:39.893Z" },
    { url = "https://files.pythonhosted.org/packages/87/05/536652aa84ddc1c018dbb7e2c4cbcd0db884580bf8e95aece7593fde526f/websockets-16.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d6297ce39ce5c2e6feb13c1a996a2ded3b6832155fcfc920265c76f24c7cceb5", size = 185361, upload-time = "2026-01-10T09:22:41.016Z" },
    { url = "https://files.pythonhosted.org/packages/6d/e2/d5332c90da12b1e01f06fb1b85c50cfc489783076547415bf9f0a659ec19/websockets-16.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:1c1b30e4f497b0b354057f3467f56244c603a79c0d1dafce1d16c283c25f6e64", size = 184615, upload-time = "2026-01-10T09:22:42.442Z" },
    { url = "https://files.pythonhosted.org/packages/77/fb/d3f9576691cae9253b51555f841bc6600bf0a983a461c79500ace5a5b364/websockets-16.0-cp311-cp311-win32.whl", hash = "sha256:5f451484aeb5cafee1ccf789b1b66f535409d038c56966d6101740c1614b86c6", size = 178246, upload-time = "2026-01-10T09:22:43.654Z" },
    { url = "https://files.pythonhosted.org/packages/54/67/eaff76b3dbaf18dcddabc3b8c1dba50b483761cccff67793897945b37408/websockets-16.0-cp311-cp311-win_amd64.whl", hash = "sha256:8d7f0659570eefb578dacde98e24fb60af35350193e4f56e11190787bee77dac", size = 178684, upload-time = "2026-01-10T09:22:44.941Z" },
    { url = "https://files.pythonhosted.org/packages/84/7b/bac442e6b96c9d25092695578dda82403c77936104b5682307bd4deb1ad4/websockets-16.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:71c989cbf3254fbd5e84d3bff31e4da39c43f884e64f2551d14bb3c186230f00", size = 177365, upload-time = "2026-01-10T09:22:46.787Z" },
    { url = "https://files.pythonhosted.org/packages/b0/fe/136ccece61bd690d9c1f715baaeefd953bb2360134de73519d5df19d29ca/websockets-16.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:8b6e209ffee39ff1b6d0fa7bfef6de950c60dfb91b8fcead17da4ee539121a79", size = 175038, upload-time = "2026-01-10T09:22:47.999Z" },
    { url = "https://files.pythonhosted.org/packages/40/1e/9771421ac2286eaab95b8575b0cb701ae3663abf8b5e1f64f1fd90d0a673/websockets-16.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:86890e837d61574c92a97496d590968b23c2ef0aeb8a9bc9421d174cd378ae39", size = 175328, upload-time = "2026-01-10T09:22:49.809Z" },
    { url = "https://files.pythonhosted.org/packages/18/29/71729b4671f21e1eaa5d6573031ab810ad2936c8175f03f97f3ff164c802/websockets-16.0-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:9b5aca38b67492ef518a8ab76851862488a478602229112c4b0d58d63a7a4d5c", size = 184915, upload-time = "2026-01-10T09:22:51.071Z" },
    { url = "https://files.pythonhosted.org/packages/97/bb/21c36b7dbbafc85d2d480cd65df02a1dc93bf76d97147605a8e27ff9409d/websockets-16.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e0334872c0a37b606418ac52f6ab9cfd17317ac26365f7f65e203e2d0d0d359f", size = 186152, upload-time = "2026-01-10T09:22:52.224Z" },
    { url = "https://files.pythonhosted.org/packages/4a/34/9bf8df0c0cf88fa7bfe36678dc7b02970c9a7d5e065a3099292db87b1be2/websockets-16.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:a0b31e0b424cc6b5a04b8838bbaec1688834b2383256688cf47eb97412531da1", size = 185583, upload-time = "2026-01-10T09:22:53.443Z" },
    { url = "https://files.pythonhosted.org/packages/47/88/4dd516068e1a3d6ab3c7c183288404cd424a9a02d585efbac226cb61ff2d/websockets-16.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:485c49116d0af10ac698623c513c1cc01c9446c058a4e61e3bf6c19dff7335a2", size = 184880, upload-time = "2026-01-10T09:22:55.033Z" },
    { url = "https://files.pythonhosted.org/packages/91/d6/7d4553ad4bf1c0421e1ebd4b18de5d9098383b5caa1d937b63df8d04b565/websockets-16.0-cp312-cp312-win32.whl", hash = "sha256:eaded469f5e5b7294e2bdca0ab06becb6756ea86894a47806456089298813c89", size = 178261, upload-time = "2026-01-10T09:22:56.251Z" },
    { url = "https://files.pythonhosted.org/packages/c3/f0/f3a17365441ed1c27f850a80b2bc680a0fa9505d733fe152fdf5e98c1c0b/websockets-16.0-cp312-cp312-win_amd64.whl", hash = "sha256:5569417dc80977fc8c2d43a86f78e0a5a22fee17565d78621b6bb264a115d4ea", size = 178693, upload-time = "2026-01-10T09:22:57.478Z" },
    { url = "https://files.pythonhosted.org/packages/72/07/c98a68571dcf256e74f1f816b8cc5eae6eb2d3d5cfa44d37f801619d9166/websockets-16.0-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:349f83cd6c9a415428ee1005cadb5c2c56f4389bc06a9af16103c3bc3dcc8b7d", size = 174947, upload-time = "2026-01-10T09:23:36.166Z" },
    { url = "https://files.pythonhosted.org/packages/7e/52/93e166a81e0305b33fe416338be92ae863563fe7bce446b0f687b9df5aea/websockets-16.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:4a1aba3340a8dca8db6eb5a7986157f52eb9e436b74813764241981ca4888f03", size = 175260, upload-time = "2026-01-10T09:23:37.409Z" },
    { url = "https://files.pythonhosted.org/packages/56/0c/2dbf513bafd24889d33de2ff0368190a0e69f37bcfa19009ef819fe4d507/websockets-16.0-pp311-pypy311_pp73-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:f4a32d1bd841d4bcbffdcb3d2ce50c09c3909fbead375ab28d0181af89fd04da", size = 176071, upload-time = "2026-01-10T09:23:39.158Z" },
    { url = "https://files.pythonhosted.org/packages/a5/8f/aea9c71cc92bf9b6cc0f7f70df8f0b420636b6c96ef4feee1e16f80f75dd/websockets-16.0-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0298d07ee155e2e9fda5be8a9042200dd2e3bb0b8a38482156576f863a9d457c", size = 176968, upload-time = "2026-01-10T09:23:41.031Z" },
    { url = "https://files.pythonhosted.org/packages/9a/3f/f70e03f40ffc9a30d817eef7da1be72ee4956ba8d7255c399a01b135902a/websockets-16.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:a653aea902e0324b52f1613332ddf50b00c06fdaf7e92624fbf8c77c78fa5767", size = 178735, upload-time = "2026-01-10T09:23:42.259Z" },
    { url = "https://files.pythonhosted.org/packages/6f/28/258ebab549c2bf3e64d2b0217b973467394a9cea8c42f70418ca2c5d0d2e/websockets-16.0-py3-none-any.whl", hash = "sha256:1637db62fad1dc833276dded54215f2c7fa46912301a24bd94d45d46a011ceec", size = 171598, upload-time = "2026-01-10T09:23:45.395Z" },
 ]
 [[package]]
 name = "werkzeug"
 version = "3.1.7"
@@ -5945,16 +6170,16 @@ wheels = [
 [[package]]
 name = "workos"
-version = "6.0.4"
+version = "6.0.8"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "cryptography" },
    { name = "httpx" },
    { name = "pyjwt", extra = ["crypto"] },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/3c/2f/99fb8718274116c5c146c745755620fd5c5943f78ca52ca9b17e94348286/workos-6.0.4.tar.gz", hash = "sha256:b0bfe8fd212b8567422c4ea3732eb33608794033eb3a69900c6b04db183c32d6", size = 172217, upload-time = "2026-04-16T03:09:28.583Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/ca/0d/0a7f78912657f99412c788932ea1f3f4089916e77bdef7d2463842febe08/workos-6.0.8.tar.gz", hash = "sha256:43aa3f1992a0a4ca8933d9b6e5ada846dd3b1fe0ee10e64c876ee2000fc6090d", size = 178137, upload-time = "2026-04-24T18:48:03.203Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/fa/f1/d2ab661e6dc2828a4c73e38f12630c3b109cfe2bc664ab70631c04f0db4b/workos-6.0.4-py3-none-any.whl", hash = "sha256:548668b3702673536f853ba72a7b5bbbc269e467aaf9ac4f477b6e0177df5e21", size = 511418, upload-time = "2026-04-16T03:09:27.098Z" },
+    { url = "https://files.pythonhosted.org/packages/b2/3f/3d96da80d650b2f97d58af626053354584f619dbb769051e118bd9cd1ca5/workos-6.0.8-py3-none-any.whl", hash = "sha256:a00dd4930333aded2babbba824f8032eea05c5ca8c44d04a3fa068cf6be6e21a", size = 524505, upload-time = "2026-04-24T18:48:01.389Z" },
 ]
 [[package]]
@@ -3,13 +3,13 @@ title: "Attack Paths"
 description: "Identify privilege escalation chains and security misconfigurations across cloud environments using graph-based analysis."
 ---
-import { VersionBadge } from "/snippets/version-badge.mdx"
+import { VersionBadge } from "/snippets/version-badge.mdx";
 <VersionBadge version="5.17.0" />
 Attack Paths analyzes relationships between cloud resources, permissions, and security findings to detect how privileges can be escalated and how misconfigurations can be exploited by threat actors.
-By mapping these relationships as a graph, Attack Paths reveals risks that individual security checks cannot detect on their own — such as an IAM role that can escalate its own permissions, or a chain of policies that grants unintended access to sensitive resources.
+By mapping these relationships as a graph, Attack Paths reveals risks that individual security checks cannot detect on their own, such as an IAM role that can escalate its own permissions, or a chain of policies that grants unintended access to sensitive resources.
 <Note>
  Attack Paths is currently available for **AWS** providers. Support for
@@ -21,7 +21,7 @@ By mapping these relationships as a graph, Attack Paths reveals risks that indiv
 The following prerequisites are required for Attack Paths:
 - **An AWS provider is configured** with valid credentials in Prowler App. For setup instructions, see [Getting Started with AWS](/user-guide/providers/aws/getting-started-aws).
- **At least one scan has completed** on the configured AWS provider. Attack Paths scans run automatically alongside regular security scans — no separate configuration is required.
+- **At least one scan has completed** on the configured AWS provider. Attack Paths scans run automatically alongside regular security scans; no separate configuration is required.
 ## How Attack Paths Scans Work
@@ -145,11 +145,10 @@ LIMIT 25
 **IAM principals with wildcard Allow statements:**
 ```cypher
-MATCH (principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
+MATCH (principal:AWSPrincipal)-[:POLICY]->(policy:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {effect: 'Allow'})
-WHERE stmt.effect = 'Allow'
+MATCH (stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem)
-  AND ANY(action IN stmt.action WHERE action = '*')
+WHERE a.value = '*'
-RETURN principal.arn AS principal, policy.arn AS policy,
+RETURN DISTINCT principal.arn AS principal, policy.arn AS policy
-      stmt.action AS actions, stmt.resource AS resources
 LIMIT 25
 ```
@@ -173,218 +172,89 @@ RETURN r.name AS role_name, r.arn AS role_arn, p.arn AS trusted_service
 LIMIT 25
 ```
-### Advanced Attack Path Scenarios
+### Working with List-Typed Properties
-The following scenarios show how to compose graph traversals into real attack-path stories. Each query can be pasted directly into the custom query box: the API auto-scopes them to the selected provider and injects tenant/provider isolation, so there is no need to include account identifiers or `$provider_uid` in the text. All queries are openCypher v9 (Neo4j and Neptune compatible).
+Some Cartography node properties carry a list of values, such as `action`, `resource`, `notaction`, and `notresource` on `AWSPolicyStatement` nodes, the algorithms on `KMSKey`, the container-definition lists on `ECSContainerDefinition`, and many others. The Attack Paths graph models each such property as a set of child item nodes connected to the parent by a typed edge. To read the values, traverse the edge; the parent does not carry the list as a single field.
-#### 1. Live attacker on the box that owns the keys
+The naming convention for any list-typed property on a parent label is:
-**Query story:** Finds an internet-exposed EC2 under an active GuardDuty SSH brute-force whose instance role can assume a higher-privileged role that can read a sensitive S3 bucket.
+- **Child label:** `<ParentLabel><PropertyPascal>Item`. Example: `AWSPolicyStatement.resource` resolves to `AWSPolicyStatementResourceItem`.
 - **Edge type:** `HAS_<PROPERTY_UPPER>`. Example: `resource` resolves to `HAS_RESOURCE`.
 - **Child property:** `value` for scalar lists (one string per list element). List-of-dict properties (rare; for example `SecretsManagerSecretVersion.tags`) carry the original dict keys as named fields on the child node.
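
The naming convention above is mechanical, so the child label and edge type can be derived from the parent label and the property name. The following Python sketch is illustrative only: the helper names `list_item_names` and `wildcard_item_query` are hypothetical and not part of Prowler, and the part-by-part Pascal-casing of snake_case property names is an assumption, since the documented examples only cover single-word properties such as `action` and `resource`.

```python
def list_item_names(parent_label: str, prop: str) -> tuple[str, str]:
    """Return (child_label, edge_type) for a list-typed property.

    Assumption: snake_case property names are Pascal-cased part by part.
    Only single-word properties are confirmed by the documented examples.
    """
    pascal = "".join(part[:1].upper() + part[1:] for part in prop.split("_"))
    return f"{parent_label}{pascal}Item", f"HAS_{prop.upper()}"


def wildcard_item_query(parent_label: str, prop: str) -> str:
    """Build a query for parents with at least one list item equal to '*'."""
    child_label, edge_type = list_item_names(parent_label, prop)
    return "\n".join(
        [
            f"MATCH (parent:{parent_label})",
            # Traverse the HAS_* edge in its own MATCH, then filter on the item.
            f"MATCH (parent)-[:{edge_type}]->(item:{child_label})",
            "WHERE item.value = '*'",
            # DISTINCT collapses duplicate parents when several items match.
            "RETURN DISTINCT parent",
            "LIMIT 25",
        ]
    )


if __name__ == "__main__":
    print(list_item_names("AWSPolicyStatement", "resource"))
    print(wildcard_item_query("AWSPolicyStatement", "action"))
```

For `AWSPolicyStatement` and `resource`, the helper yields the same `AWSPolicyStatementResourceItem` and `HAS_RESOURCE` names given in the convention.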
 To express "at least one item in the list satisfies a predicate", traverse the `HAS_*` edge in its own `MATCH` clause and apply the predicate in the attached `WHERE`. `RETURN DISTINCT` collapses duplicate parent rows produced when multiple child items satisfy the filter:
 ```cypher
-MATCH path_ec2 = (acct:AWSAccount)--(ec2:EC2Instance)
+MATCH (stmt:AWSPolicyStatement {effect: 'Allow'})
-WHERE ec2.exposed_internet = true
+MATCH (stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem)
-MATCH p0 = (gd:GuardDutyFinding)-[:AFFECTS]->(ec2)
+WHERE toLower(a.value) STARTS WITH 's3:get'
-MATCH p1 = (ec2)-[:INSTANCE_PROFILE]->(prof:AWSInstanceProfile)-[:ASSOCIATED_WITH]->(low:AWSRole)
+   OR toLower(a.value) STARTS WITH 's3:list'
-MATCH p2 = (low)-[:STS_ASSUMEROLE_ALLOW]-(high:AWSRole)
+RETURN DISTINCT stmt
 MATCH p3 = (high)--(pol:AWSPolicy)--(stmt:AWSPolicyStatement)
 OPTIONAL MATCH path_net = (internet:Internet)-[:CAN_ACCESS]->(ec2)
 MATCH path_s3 = (acct)--(s3:S3Bucket)
 WHERE high <> low
  AND stmt.effect = 'Allow'
  AND size([a IN stmt.action WHERE
        toLower(a) STARTS WITH 's3:getobject'
        OR toLower(a) STARTS WITH 's3:listbucket'
        OR toLower(a) IN ['s3:*']
      ]) > 0
  AND size([r IN stmt.resource WHERE
        r CONTAINS s3.name
      ]) > 0
 RETURN path_net, path_ec2, p0, p1, p2, p3, path_s3
 ```
 **How it's built:**
 - `path_ec2` anchors the graph on the account node and its internet-exposed EC2 instance, via a real account-to-resource edge. This is the visible spine that keeps everything connected.
 - `p0` ties a `GuardDutyFinding` to that instance through the `AFFECTS` edge (the live SSH brute-force alert).
 - `p1` walks the real graph edges from the instance to its instance profile to the role it runs as.
 - `p2` follows the `STS_ASSUMEROLE_ALLOW` edge to the higher-privileged role the low role can assume. It is undirected so it works regardless of how the assume edge was ingested. `high <> low` stops a role matching itself.
 - `p3` walks that role into its policy and policy statement.
 - `path_net` is the optional `Internet -[:CAN_ACCESS]-> instance` edge. It makes "from the internet" literal on screen. Optional so a missing `Internet` node never breaks the query live.
 - `path_s3` connects the sensitive bucket to the same account node, so it draws connected instead of floating. There is no physical edge from a role to a bucket; the grant is logical, enforced in the `WHERE`: the statement must allow an S3 read action (list comprehension over the `action` array) and its resource must cover the bucket (`CONTAINS s3.name`). The account is the shared hub; the bucket hanging off it next to the role chain is the teaching moment — the access exists only in IAM.
 #### 2. Who can read the crown jewels
 **Query story:** The sensitive bucket from the previous scenario seen from the data side: every role whose IAM policy can read it, regardless of how the role is reached.
 ```cypher
 MATCH (s3:S3Bucket)
 WHERE toLower(s3.name) CONTAINS 'sensitive'
 MATCH (role:AWSRole)--(pol:AWSPolicy)--(stmt:AWSPolicyStatement)
 WHERE stmt.effect = 'Allow'
  AND size([a IN stmt.action WHERE
        toLower(a) STARTS WITH 's3:get'
        OR toLower(a) STARTS WITH 's3:list'
        OR toLower(a) IN ['s3:*']
      ]) > 0
  AND size([r IN stmt.resource WHERE
        r CONTAINS s3.name
      ]) > 0
 WITH DISTINCT s3, role
 LIMIT 25
 MATCH path_s3 = (acct:AWSAccount)--(s3)
 MATCH path_role = (acct)--(role)
 RETURN path_s3, path_role
 ```
-**How it's built:** data-centric, not attacker-centric — the same bucket the previous kill chain exfiltrates, approached from the other direction.
+To check whether every item in the list satisfies a predicate, count the counter-examples and require zero, together with a guard that ensures at least one item is attached. This is the one case where the pattern-comprehension form is the right tool:
 - The `S3Bucket` is bound first by name (one node), so everything else filters against it.
 - `(role:AWSRole)--(pol:AWSPolicy)--(stmt:AWSPolicyStatement)` reaches statements only *through a role*, never via a global statement scan. A blanket `AWSPolicyStatement` scan also hits resource-policy statements whose shape differs and makes the list comprehension fail outright.
 - The `WHERE` filters in place: an S3 read action plus a resource that names that bucket.
 - `WITH DISTINCT s3, role LIMIT 25` collapses undirected-traversal duplicates and hard-caps the result.
 - `path_s3` and `path_role` attach the account hubs only after the cap, against at most 25 rows, so the bucket and role(s) draw connected through the account instead of floating.
 - No internet or EC2 here; this answers "who has the keys" instead of "how would an attacker get in."
 #### 3. Lateral reach from an internet-exposed instance
 **Query story:** The wide-angle view of the live-attacker scenario: every internet-exposed EC2, the role it runs as, and every role that role can assume. The first scenario is one specific exfiltration path inside this reach, under live attack.
 ```cypher
-MATCH path_ec2 = (acct:AWSAccount)--(ec2:EC2Instance)
+MATCH (stmt:AWSPolicyStatement)
-WHERE ec2.exposed_internet = true
+WHERE size([
-MATCH p1 = (ec2)-[:INSTANCE_PROFILE]->(prof:AWSInstanceProfile)-[:ASSOCIATED_WITH]->(low:AWSRole)
+    (stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem)
-MATCH p2 = (low)-[:STS_ASSUMEROLE_ALLOW]-(high:AWSRole)
+    WHERE NOT toLower(a.value) STARTS WITH 's3:'
-OPTIONAL MATCH path_net = (internet:Internet)-[:CAN_ACCESS]->(ec2)
+    | a
-WHERE high <> low
+  ]) = 0
-RETURN path_net, path_ec2, p1, p2
+  AND size([(stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem) | a]) > 0
 RETURN stmt
 LIMIT 25
 ```
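
The non-empty guard matters because a "zero counter-examples" check is vacuously true for a parent with no items attached. The same logic over a plain Python list, as an illustrative sketch (the function name `all_items_match` is hypothetical):

```python
def all_items_match(values: list[str], prefix: str) -> bool:
    """True only if the list is non-empty and every value starts with prefix."""
    # Counter-examples: values that do NOT satisfy the predicate.
    counter_examples = [v for v in values if not v.lower().startswith(prefix)]
    # Require zero counter-examples AND at least one value; without the second
    # condition, an empty list would pass.
    return len(counter_examples) == 0 and len(values) > 0


if __name__ == "__main__":
    print(all_items_match(["s3:GetObject", "S3:ListBucket"], "s3:"))
    print(all_items_match(["s3:GetObject", "ec2:RunInstances"], "s3:"))
    print(all_items_match([], "s3:"))
```

The three calls cover the matching case, a list with one counter-example, and the empty list that the guard exists to reject.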
-**How it's built:** widens the lens instead of filtering down. It stops at the assume-role hop and shows every role reachable from any internet-exposed instance, without filtering down to a specific S3 leg.
+For the "is any item of this list a substring of a dynamic value" case, such as "does any resource pattern in this policy match a target role ARN", add the `HAS_*` traversal as its own `MATCH` and check the substring relationship between the item value and the dynamic node in `WHERE`:
 - `path_ec2` is the account-to-instance spine.
 - `p1` walks to the instance role.
 - `p2` fans out to every role that role can assume.
 - `path_net` adds the optional `Internet -[:CAN_ACCESS]->` edge.
 - The first scenario is the specific exfiltration path under live attack; this is the broader privilege reach an attacker inherits the moment they land on the box.
 #### 4. Role-chain privilege escalation
 **Query story:** A pure-IAM escalation, no compromised instance: a role that can assume a second role whose policy lets it assume a third, admin-level role.
 ```cypher
-MATCH path_root = (acct:AWSAccount)--(r1:AWSRole)
+MATCH (role:AWSRole)
-MATCH p1 = (r1)-[:STS_ASSUMEROLE_ALLOW]-(r2:AWSRole)
+WHERE role.name = 'Admin'
-MATCH p2 = (r2)--(pol:AWSPolicy)--(stmt:AWSPolicyStatement)
+MATCH (principal:AWSPrincipal)-[:POLICY]->(:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {effect: 'Allow'})
-MATCH path_admin = (acct)--(admin:AWSRole)
+MATCH (stmt)-[:HAS_RESOURCE]->(r:AWSPolicyStatementResourceItem)
-WHERE r1 <> r2 AND r1 <> admin AND r2 <> admin
+WHERE r.value = '*'
-  AND stmt.effect = 'Allow'
+   OR r.value CONTAINS role.name
-  AND size([a IN stmt.action WHERE
+   OR role.arn CONTAINS r.value
-        toLower(a) IN ['sts:*', 'sts:assumerole']
+RETURN DISTINCT principal.arn AS principal, stmt, role
-      ]) > 0
+LIMIT 25
  AND size([res IN stmt.resource WHERE
        res CONTAINS admin.name
      ]) > 0
 RETURN path_root, p1, p2, path_admin
 ```
-**How it's built:**
+To return the list of values directly, collect them from the child items:
 - `path_root` anchors role 1 to the account node, the spine that keeps the picture connected.
 - `p1` is the one real assume edge in the chain (role 1 to role 2).
 - `p2` walks role 2 into its policy and statement.
 - `path_admin` connects the target admin role to the same account node so it draws connected. The third hop is not a graph edge: it exists only as `sts:AssumeRole` on that role's ARN inside the statement. The query proves it the same way the first scenario proves S3 access — the statement action must include an assume-role action and its resource list must reference the admin role's name.
 - The three `<>` guards stop a role matching itself at any position.
 #### 5. External identity trust map
 **Query story:** Finds external identity providers (SSO, GitHub, GitLab, Terraform Cloud) and the AWS roles they are trusted to assume.
 ```cypher
-MATCH p = (role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]-(idp:AWSPrincipal)
+MATCH (stmt:AWSPolicyStatement {effect: 'Allow'})
-WHERE idp.arn CONTAINS 'saml-provider'
+OPTIONAL MATCH (stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem)
-   OR idp.arn CONTAINS 'oidc-provider'
+RETURN stmt, collect(a.value) AS actions
-MATCH path_role = (acct:AWSAccount)--(role)
+LIMIT 25
 RETURN p, path_role
 ```
-**How it's built:** federated principals are stored as `AWSPrincipal` nodes whose ARN contains `saml-provider` (SSO) or `oidc-provider` (GitHub, GitLab, Terraform Cloud).
+### Working with JSON-Encoded Properties
- `p` matches the trust edge undirected. It is written `(AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(AWSPrincipal)`, role to principal, so a directed `principal -> role` match returns nothing; undirected matches regardless of ingest direction.
+Some Cartography properties represent nested objects, most notably `condition` on `AWSPolicyStatement` and `S3PolicyStatement` nodes. In the Attack Paths graph, object-typed properties are stored as JSON-encoded strings to keep the schema portable across graph backends. The value looks like:
 - The `WHERE` keeps only SAML or OIDC providers, drawing a fan-out from each external identity provider to every role it can assume (including reserved SSO admin roles).
 - `path_role` ties every trusted role to the account node so the provider stars share one spine instead of drawing as separate islands.
-#### 6. Federated SSO roles flagged as admin or privesc
+```
 '{"StringEquals":{"aws:SourceAccount":"123456789012"}}'
 ```
-**Query story:** The dangerous subset of the trust map above — externally-federated SSO roles that Prowler also flags for AdministratorAccess or privilege escalation.
+There is no JSON parser available at query time, so use `CONTAINS` for substring checks against keys or known values:
 ```cypher
-MATCH (idp:AWSPrincipal)-[:TRUSTS_AWS_PRINCIPAL]-(role:AWSRole)
+MATCH (stmt:AWSPolicyStatement)
-WHERE idp.arn CONTAINS 'saml-provider'
+WHERE stmt.effect = 'Allow'
-   OR idp.arn CONTAINS 'oidc-provider'
+  AND stmt.condition CONTAINS '"aws:SourceAccount"'
-MATCH (role)-[:HAS_FINDING]-(pf:ProwlerFinding)
+RETURN stmt
-WHERE pf.status = 'FAIL'
+LIMIT 25
  AND pf.check_id IN [
    'iam_inline_policy_allows_privilege_escalation',
    'iam_role_administratoraccess_policy',
    'iam_inline_policy_no_administrative_privileges',
    'iam_user_administrator_access_policy'
  ]
 WITH DISTINCT idp, role, pf
 LIMIT 60
 MATCH path_root = (acct:AWSAccount)--(role)
 MATCH p_trust = (idp)-[:TRUSTS_AWS_PRINCIPAL]-(role)
 MATCH p_find = (role)-[:HAS_FINDING]-(pf)
 RETURN path_root, p_trust, p_find
 ```
-**How it's built:** a plain "list every flagged identity" query is a wide fan that draws as a column, and `ProwlerFinding` nodes accumulate across scans with no scan filter available in custom queries.
+When a query needs to inspect the structured members of a condition (for example, evaluate every operator and key), fetch the rows first and parse the JSON in application code. Cypher cannot navigate JSON object keys or values.
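The application-side parsing step can be sketched as follows. This is a minimal illustration, assuming each returned row exposes the `condition` string under a `condition` key; the `condition_triples` helper and the row shape are hypothetical and not part of the Prowler API.

```python
import json


def condition_triples(condition):
    """Flatten a JSON-encoded IAM condition into (operator, key, value) triples.

    Hypothetical helper: returns an empty list when the property is missing
    or empty, since not every statement carries a condition.
    """
    if not condition:
        return []
    parsed = json.loads(condition)
    triples = []
    for operator, clauses in parsed.items():
        for key, value in clauses.items():
            triples.append((operator, key, value))
    return triples


# Assumed row shape: a dict holding the raw property value from the query result.
row = {"condition": '{"StringEquals":{"aws:SourceAccount":"123456789012"}}'}
triples = condition_triples(row["condition"])
print(triples)
```

Keeping the `CONTAINS` filter in the query and the structured evaluation in application code limits the rows that have to be parsed.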
 - The first MATCH plus `WHERE` keeps only roles trusted by a SAML or OIDC provider (trust edge undirected, so direction does not matter).
 - The second MATCH plus `check_id IN [...]` keeps only those carrying one of the four privilege-escalation or admin checks.
 - `WITH DISTINCT ... LIMIT 60` collapses duplicate finding nodes and hard-caps the result.
 - `p_trust`, `p_find`, and `path_root` draw it connected three ways: provider to role through the trust edge, role to its finding, and role to the account.
 - The previous scenario shows who can walk in; this shows which of those roles Prowler already flags as over-privileged.
 #### 7. World-readable S3 buckets
 **Query story:** Unlike the IAM-gated sensitive bucket in scenarios 1 and 2, these buckets are open to anyone on the internet with no credentials at all.
 ```cypher
 MATCH path_s3 = (acct:AWSAccount)--(s3:S3Bucket)
 WHERE s3.anonymous_access = true
 OPTIONAL MATCH p = (s3)--(stmt:S3PolicyStatement)
 RETURN path_s3, p
 ```
 **How it's built:** the counterpoint to scenarios 1 and 2 — there the sensitive bucket is reachable only through an IAM role chain; here the bucket needs no role at all.
 - `path_s3` connects each public bucket to its account node so they draw connected. Cartography sets `anonymous_access = true` when a bucket's policy or ACL allows public access.
 - `p` is an optional match that pulls in the `S3PolicyStatement` granting the access where one exists, so the public grant is visible next to the bucket. Buckets that are public via ACL only still show, connected to the account.
 #### 8. Internet exposure surface
 **Query story:** The raw external attack surface behind scenarios 1 and 3: every internet-exposed EC2 instance with its security groups and the exact inbound ports left open.
 ```cypher
 MATCH path_ec2 = (acct:AWSAccount)--(ec2:EC2Instance)
 WHERE ec2.exposed_internet = true
 MATCH p1 = (ec2)--(sg:EC2SecurityGroup)--(rule:IpPermissionInbound)
 OPTIONAL MATCH path_net = (internet:Internet)-[:CAN_ACCESS]->(ec2)
 OPTIONAL MATCH p2 = (ec2)-[:INSTANCE_PROFILE]->(:AWSInstanceProfile)-[:ASSOCIATED_WITH]->(:AWSRole)
 RETURN path_net, path_ec2, p1, p2
 ```
 **How it's built:** `exposed_internet = true` is Cartography's computed reachability flag.
 - `path_ec2` hubs all exposed instances on the account node so they draw as one picture.
 - `p1` joins each instance to its security groups and inbound rules so the open ports are on screen.
 - `path_net` adds the optional `Internet -[:CAN_ACCESS]->` edge so the external reachability is explicit.
 - `p2` optionally adds the instance role, which connects this surface view back to the kill chains in scenarios 1 and 3.
 ### Tips for Writing Queries
 - Start small with `LIMIT` to inspect the shape of the data before broadening the pattern.
 - Traverse `HAS_*` edges to reach list-typed property values (for example `action`, `resource`). The parent node does not carry the list as a single field; see [Working with List-Typed Properties](#working-with-list-typed-properties) for the patterns.
 - On large scans, avoid broad disconnected patterns such as `MATCH (a:Label), (b:OtherLabel)`. Bind one side with a selective predicate first, and use `WITH DISTINCT` between expanding traversals when duplicates are possible.
 - Use `RETURN` projections (`RETURN n.name, n.region`) instead of returning whole nodes to keep responses compact.
 - Combine resource nodes with `ProwlerFinding` nodes via `HAS_FINDING` to correlate misconfigurations with the affected resources.
 - When a query times out or returns no rows, simplify the pattern step by step until the first variant runs successfully, then add constraints back.
@@ -401,6 +271,8 @@ In addition to the upstream schema, Prowler enriches the graph with:
 - **`ProwlerFinding`** nodes representing Prowler check results, linked to affected resources via `HAS_FINDING` relationships.
 - **`Internet`** nodes used to model exposure paths from the public internet to internal resources.
 - **List-typed properties** such as `action` or `resource` on `AWSPolicyStatement`, the algorithm lists on `KMSKey`, and similar lists on other node types are modeled as child item nodes linked by typed `HAS_*` edges. See [Working with List-Typed Properties](#working-with-list-typed-properties) for the read pattern.
 - **Object-typed properties** such as `condition` on `AWSPolicyStatement` are stored as JSON-encoded strings. See [Working with JSON-Encoded Properties](#working-with-json-encoded-properties) for the read pattern.
 <Note>
  AI assistants connected through Prowler MCP Server can fetch the exact
@@ -540,13 +412,13 @@ Attack Paths currently supports the following built-in queries for AWS:
 #### Custom Attack Path Queries
 | Query                                             | Description                                                                              |
-|---|---|
+| ------------------------------------------------- | ---------------------------------------------------------------------------------------- |
 | **Internet-Exposed EC2 with Sensitive S3 Access** | Find SSH-exposed EC2 instances that can assume roles to read tagged sensitive S3 buckets |
 #### Basic Resource Queries
 | Query                                       | Description                                                         |
-|---|---|
+| ------------------------------------------- | ------------------------------------------------------------------- |
 | **RDS Instances Inventory**                 | List all provisioned RDS database instances in the account          |
 | **Unencrypted RDS Instances**               | Find RDS instances with storage encryption disabled                 |
 | **S3 Buckets with Anonymous Access**        | Find S3 buckets that allow anonymous access                         |
@@ -557,7 +429,7 @@ Attack Paths currently supports the following built-in queries for AWS:
 #### Network Exposure Queries
 | Query                                                 | Description                                                                         |
-|---|---|
+| ----------------------------------------------------- | ----------------------------------------------------------------------------------- |
 | **Internet-Exposed EC2 Instances**                    | Find EC2 instances flagged as exposed to the internet                               |
 | **Open Security Groups on Internet-Facing Resources** | Find internet-facing resources with security groups allowing inbound from 0.0.0.0/0 |
 | **Internet-Exposed Classic Load Balancers**           | Find Classic Load Balancers exposed to the internet with their listeners            |
@@ -569,7 +441,7 @@ Attack Paths currently supports the following built-in queries for AWS:
 These queries are based on research from [pathfinding.cloud](https://pathfinding.cloud) by Datadog.
 | Query                                                                                        | Description                                                                                                                                                                             |
-|---|---|
+| -------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
 | **App Runner Service Creation with Privileged Role (APPRUNNER-001)**                         | Create an App Runner service with a privileged IAM role to gain its permissions                                                                                                         |
 | **App Runner Service Update for Role Access (APPRUNNER-002)**                                | Update an existing App Runner service to leverage its already-attached privileged role                                                                                                  |
 | **Bedrock Code Interpreter with Privileged Role (BEDROCK-001)**                              | Create a Bedrock AgentCore Code Interpreter with a privileged role attached                                                                                                             |
@@ -638,6 +510,7 @@ These queries are based on research from [pathfinding.cloud](https://pathfinding
 | **Role Assumption for Privilege Escalation (STS-001)**                                       | Assume IAM roles with elevated permissions by exploiting bidirectional trust between the starting principal and the target role                                                         |
 These tools enable workflows such as:
 - Asking an AI assistant to identify privilege escalation paths in a specific AWS account
 - Automating attack path analysis across multiple scans
 - Combining attack path data with findings and compliance information for comprehensive security reports
@@ -2,13 +2,14 @@
 name: prowler-attack-paths-query
 description: >
  Creates Prowler Attack Paths openCypher queries using the Cartography schema as the source of truth
-  for node labels, properties, and relationships. Also covers Prowler-specific additions (Internet node,
+  for node labels, properties, and relationships. Covers Prowler-specific additions (Internet node,
-  ProwlerFinding, internal isolation labels) and $provider_uid scoping for predefined queries.
+  ProwlerFinding, internal isolation labels), $provider_uid scoping, and list-property item nodes
  with typed `HAS_*` edges that run efficiently on both Neo4j and Amazon Neptune sinks.
  Trigger: When creating or updating Attack Paths queries.
 license: Apache-2.0
 metadata:
  author: prowler-cloud
-  version: "2.0"
+  version: "3.0"
  scope: [root, api]
  auto_invoke:
    - "Creating Attack Paths queries"
@@ -19,36 +20,30 @@ allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, Task
 ## Overview
-Attack Paths queries are openCypher queries that analyze cloud infrastructure graphs (ingested via Cartography) to detect security risks like privilege escalation paths, network exposure, and misconfigurations.
+Attack Paths queries are read-only openCypher queries over a Cartography-ingested cloud graph that detect privilege escalation chains, network exposure, and other graph-shaped security risks. Queries are written in openCypher Version 9 so they run on both Neo4j and Amazon Neptune sinks.
 Queries are written in **openCypher Version 9** for compatibility with both Neo4j and Amazon Neptune.
 ---
 ## Two query audiences
 This skill covers two types of queries with different isolation mechanisms:
 |                    | Predefined queries                                          | Custom queries                                                        |
-|---|---|---|
+| ------------------ | ----------------------------------------------------------- | --------------------------------------------------------------------- |
-| **Where they live** | `api/src/backend/api/attack_paths/queries/{provider}.py` | User/LLM-supplied via the custom query API endpoint |
+| Where they live    | `api/src/backend/api/attack_paths/queries/{provider}.py`    | User-supplied via the custom query API endpoint                       |
-| **Provider isolation** | `AWSAccount {id: $provider_uid}` anchor + path connectivity | Automatic `_Provider_{uuid}` label injection via `cypher_sanitizer.py` |
+| Provider isolation | `AWSAccount {id: $provider_uid}` anchor + path connectivity | Automatic `_Provider_{uuid}` label injection by `cypher_sanitizer.py` |
-| **What to write** | Chain every MATCH from the `aws` variable | Plain Cypher, no isolation boilerplate needed |
+| What to write      | Chain every MATCH from the `aws` variable                   | Plain Cypher, no isolation boilerplate                                |
-| **Internal labels** | Never use (`_ProviderResource`, `_Tenant_*`, `_Provider_*`) | Never use (injected automatically by the system) |
+| Internal labels    | Never use                                                   | Never use (system-injected)                                           |
-**For predefined queries**: every node must be reachable from the `AWSAccount` root via graph traversal. This is the isolation boundary.
+**Predefined queries**: every node must be reachable from the `AWSAccount` root via graph traversal. That is the isolation boundary.
-**For custom queries**: write natural Cypher without isolation concerns. The query runner injects a `_Provider_{uuid}` label into every node pattern before execution, and a post-query filter catches edge cases.
+**Custom queries**: write natural Cypher. The runner injects a `_Provider_{uuid}` label into every node pattern, and a post-query filter handles edge cases.
 ---
-## Input Sources
+## Input sources
-Queries can be created from:
+Two sources for new queries:
-1. **pathfinding.cloud ID** (e.g., `ECS-001`, `GLUE-001`)
+1. **pathfinding.cloud ID** (e.g. `ECS-001`, `GLUE-001`), the Datadog research catalogue. The aggregated `paths.json` is too large for WebFetch:
   - Reference: https://github.com/DataDog/pathfinding.cloud
   - The aggregated `paths.json` is too large for WebFetch. Use Bash:
   ```bash
   # Fetch a single path by ID
@@ -64,28 +59,24 @@ Queries can be created from:
     | jq -r '.[] | select(.id | startswith("ecs")) | "\(.id): \(.name)"'
   ```
-   If `jq` is not available, use `python3 -c "import json,sys; ..."` as a fallback.
+   If `jq` is unavailable, use `python3 -c "import json,sys; ..."`.
-2. **Natural language description** from the user
+2. **Natural language description** from the requester.
 ---
-## Query Structure
+## Query structure
 ### Provider scoping parameter
-One parameter is injected automatically by the query runner:
+| Parameter       | Property | Used on      | Purpose                                |
 | --------------- | -------- | ------------ | -------------------------------------- |
 | `$provider_uid` | `id`     | `AWSAccount` | Scopes the query to a specific account |
-| Parameter       | Property it matches | Used on      | Purpose                          |
+The runner binds `$provider_uid` automatically. Every other node is isolated by path connectivity from the `AWSAccount` anchor.
 | --------------- | ------------------- | ------------ | -------------------------------- |
 | `$provider_uid` | `id`                | `AWSAccount` | Scopes to a specific AWS account |
 All other nodes are isolated by path connectivity from the `AWSAccount` anchor.
 ### Imports
 All query files start with these imports:
 ```python
 from api.attack_paths.queries.types import (
    AttackPathsQueryAttribution,
@@ -95,29 +86,33 @@ from api.attack_paths.queries.types import (
 from tasks.jobs.attack_paths.config import PROWLER_FINDING_LABEL
 ```
-The `PROWLER_FINDING_LABEL` constant (value: `"ProwlerFinding"`) is used via f-string interpolation in all queries. Never hardcode the label string.
+Always reference `PROWLER_FINDING_LABEL` through f-string interpolation; never hardcode `"ProwlerFinding"`.
-### Privilege escalation sub-patterns
+### Definition fields
-There are four distinct privilege escalation patterns. Choose based on the attack type:
+- **id**: kebab-case `{provider}-{description}`, e.g. `aws-ec2-privesc-passrole-iam`.
 - **name**: short, human-friendly label. Sourced queries append the reference ID: `"EC2 Instance Launch with Privileged Role (EC2-001)"`.
 - **short_description**: one sentence, no technical permissions.
 - **description**: full technical explanation, plain text.
 - **provider**: `aws`, `azure`, `gcp`, `kubernetes`, or `github`.
 - **cypher**: f-string Cypher body. Literal `{` / `}` are escaped as `{{` / `}}`.
 - **parameters**: `parameters=[]` if none.
 - **attribution**: optional `AttackPathsQueryAttribution(text, link)` for sourced queries. `link` uses the lowercase ID.
-| Sub-pattern | Target | `path_target` shape | Example |
+Append the constant to the `{PROVIDER}_QUERIES` list at the bottom of the provider file.
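The brace-escaping rule for the `cypher` field can be checked in isolation. The sketch below defines a local stand-in for `PROWLER_FINDING_LABEL` (the real constant is imported from `tasks.jobs.attack_paths.config`) and uses a shortened, illustrative query body rather than a complete predefined query.

```python
# Local stand-in for the imported constant; assumed to hold the same value.
PROWLER_FINDING_LABEL = "ProwlerFinding"

# Inside an f-string, `{{` and `}}` render as literal braces, while
# `{PROWLER_FINDING_LABEL}` is interpolated. `$provider_uid` is left
# untouched because `$` has no special meaning in f-strings.
cypher = f"""
    MATCH (aws:AWSAccount {{id: $provider_uid}})--(n)
    OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
    RETURN n, pf
"""

print(cypher)
```

Printing the rendered string before registering a query is a quick way to confirm that every property map still has single braces and that the label was substituted.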
 |---|---|---|---|
 | Self-escalation | Principal's own policies | `(aws)--(target_policy:AWSPolicy)--(principal)` | IAM-001 |
 | Lateral to user | Other IAM users | `(aws)--(target_user:AWSUser)` | IAM-002 |
 | Assume-role lateral | Assumable roles | `(aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)` | IAM-014 |
 | PassRole + service | Service-trusting roles | `(aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(...)` | EC2-001 |
-#### Self-escalation (e.g., IAM-001)
+---
-The principal modifies resources attached to itself. `path_target` loops back to `principal`:
+## Predefined query template
 The canonical shape combines a principal walk, an optional target walk, deduplicated nodes, and a typed finding overlay:
 ```python
 AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
    id="aws-{kebab-case-name}",
-    name="{Human-friendly label} ({REFERENCE_ID})",
+    name="{Label} ({REFERENCE_ID})",
-    short_description="{Brief explanation, no technical permissions.}",
+    short_description="{One sentence.}",
-    description="{Detailed description of the attack vector and impact.}",
+    description="{Full technical explanation.}",
    attribution=AttackPathsQueryAttribution(
        text="pathfinding.cloud - {REFERENCE_ID} - {permission}",
        link="https://pathfinding.cloud/paths/{reference_id_lowercase}",
@@ -125,29 +120,27 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
    provider="aws",
    cypher=f"""
        // Find principals with {permission}
-        MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
+        MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)-[:POLICY]->(policy:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {{effect: 'Allow'}})
-        WHERE stmt.effect = 'Allow'
+        MATCH (stmt)-[:HAS_ACTION]->(act:AWSPolicyStatementActionItem)
-            AND any(action IN stmt.action WHERE
+        WHERE toLower(act.value) IN ['{permission_lowercase}', '{service}:*']
-                toLower(action) = '{permission_lowercase}'
+           OR act.value = '*'
-                OR toLower(action) = '{service}:*'
+        WITH DISTINCT aws, principal, stmt, path_principal
                OR action = '*'
            )
-        // Find target resources attached to the same principal
+        // Target resources attached to the same principal (sub-patterns below)
        MATCH path_target = (aws)--(target_policy:AWSPolicy)--(principal)
        WHERE target_policy.arn CONTAINS $provider_uid
-            AND any(resource IN stmt.resource WHERE
+        MATCH (stmt)-[:HAS_RESOURCE]->(res:AWSPolicyStatementResourceItem)
-                resource = '*'
+        WHERE res.value = '*'
-                OR target_policy.arn CONTAINS resource
+           OR target_policy.arn CONTAINS res.value
            )
        WITH DISTINCT path_principal, path_target
        WITH collect(path_principal) + collect(path_target) AS paths
        UNWIND paths AS p
        UNWIND nodes(p) AS n
        WITH paths, collect(DISTINCT n) AS unique_nodes
        UNWIND unique_nodes AS n
-        OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
+        OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
        RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
    """,
@@ -155,39 +148,67 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
 )
 ```
-#### Other sub-pattern `path_target` shapes
+Key points:
-The other 3 sub-patterns share the same `path_principal`, deduplication tail, and RETURN as self-escalation. Only the `path_target` MATCH differs:
+- The principal walk types the `POLICY` and `STATEMENT` hops. Both are low-fan-out (each principal has a handful of policies; each policy a handful of statements), so the typed edge lets the planner cost a cheap inline filter.
 - The `(aws)--` hub hops stay anonymous. `AWSAccount` is a high-degree node that fans out to every principal, role, policy, and resource in the account; typing those edges forces the planner to enumerate from the hub and collapses performance on multi-tenant Neptune.
 - Other relationship types appear only where the file's existing queries already use one (`TRUSTS_AWS_PRINCIPAL`, `STS_ASSUMEROLE_ALLOW`, `MEMBER_AWS_GROUP`, `HAS_EXECUTION_ROLE`).
 - The finding probe is typed `:HAS_FINDING` and left undirected. The type lets Neptune apply an inline edge filter; the lack of direction matches the convention of the rest of the file.
 - Collapse duplicate rows after each permission gate with `WITH DISTINCT`, carrying only the variables needed by later clauses.
 - Each `HAS_*` traversal is its own `MATCH` clause with a `WHERE` on the child item node. `WITH DISTINCT path_principal, path_target` precedes `collect(path...)` to dedupe the row multiplication produced by the joins.
 - The `RETURN` shape `paths, dpf, dpfr` is the contract the serializer and visualiser depend on. Do not change it.
 ---
 ## Privilege escalation sub-patterns
 Four `path_target` shapes cover the common attack types. Each shares the canonical template's `path_principal`, deduplication tail, and `RETURN`; only the `path_target` MATCH and its resource predicate differ.
 | Sub-pattern         | Target                   | `path_target` shape                                                                                     | Example |
 | ------------------- | ------------------------ | ------------------------------------------------------------------------------------------------------- | ------- |
 | Self-escalation     | Principal's own policies | `(aws)--(target_policy:AWSPolicy)--(principal)`                                                         | IAM-001 |
 | Lateral to user     | Other IAM users          | `(aws)--(target_user:AWSUser)`                                                                          | IAM-002 |
 | Assume-role lateral | Assumable roles          | `(aws)--(target_role:AWSRole)-[:STS_ASSUMEROLE_ALLOW]-(principal)`                                      | IAM-014 |
 | PassRole + service  | Service-trusting roles   | `(aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]-(:AWSPrincipal {arn: '{service}.amazonaws.com'})` | EC2-001 |
 **Multi-permission queries** (e.g. PassRole plus a service-create action) add permission gates before `path_target`. Reuse the per-query counter for new variables (`act2`, `policy2`, `stmt2`) and collapse rows after each gate:
 ```cypher
-// Lateral to user (e.g., IAM-002) - targets other IAM users
+MATCH (principal)-[:POLICY]->(policy2:AWSPolicy)-[:STATEMENT]->(stmt2:AWSPolicyStatement {effect: 'Allow'})
-MATCH path_target = (aws)--(target_user:AWSUser)
+MATCH (stmt2)-[:HAS_ACTION]->(act2:AWSPolicyStatementActionItem)
-WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_user.arn CONTAINS resource OR resource CONTAINS target_user.name)
+WHERE toLower(act2.value) IN ['service:*', 'service:createsomething']
-
+   OR act2.value = '*'
-// Assume-role lateral (e.g., IAM-014) - targets roles the principal can assume
+WITH DISTINCT aws, principal, stmt, stmt2, path_principal
 MATCH path_target = (aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)
 WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
 // PassRole + service (e.g., EC2-001) - targets roles trusting a service
 MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: '{service}.amazonaws.com'})
 WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
 ```
-**Multi-permission**: PassRole queries require a second permission. Add `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` with its own WHERE before `path_target`, then check BOTH `stmt.resource` AND `stmt2.resource` against the target. See IAM-015 or EC2-001 in `aws.py` for examples.
+If a permission is an existence-only gate whose statement resource is not checked later, keep the policy and statement anonymous and carry only the variables still needed:
-### Network exposure pattern
+```cypher
 MATCH (principal)-[:POLICY]->(:AWSPolicy)-[:STATEMENT]->(:AWSPolicyStatement {effect: 'Allow'})-[:HAS_ACTION]->(act3:AWSPolicyStatementActionItem)
 WHERE toLower(act3.value) IN ['service:*', 'service:othersomething']
   OR act3.value = '*'
 WITH DISTINCT aws, principal, stmt, path_principal
 ```
-The Internet node is reached via `CAN_ACCESS` through the already-scoped resource, not via a standalone lookup:
+When all matching principals can target the same independent resource set, collect principal paths before expanding targets instead of creating one row per principal-target pair:
 ```cypher
 WITH aws, collect(DISTINCT path_principal) AS principal_paths
 MATCH path_target = (aws)--(target)
 WITH principal_paths + collect(DISTINCT path_target) AS paths
 ```
 Statements that constrain a target are still checked via `HAS_RESOURCE` traversals (`res`, `res2`). See IAM-015 or EC2-001 in `aws.py`.
 ---
 ## Network exposure pattern
 The Internet node is reached via `CAN_ACCESS` through an already-scoped resource, never as a standalone lookup:
 ```python
 AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
    id="aws-{kebab-case-name}",
    name="{Human-friendly label}",
    short_description="{Brief explanation.}",
    description="{Detailed description.}",
    provider="aws",
    cypher=f"""
-        // Match exposed resources (MUST chain from `aws`)
+    // Resource scoped through the account anchor
    MATCH path = (aws:AWSAccount {{id: $provider_uid}})--(resource:EC2Instance)
    WHERE resource.exposed_internet = true
@@ -200,113 +221,72 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
    WITH paths, internet, can_access, collect(DISTINCT n) AS unique_nodes
    UNWIND unique_nodes AS n
-        OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
+    OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
    RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
        internet, can_access
-    """,
+    """,
    parameters=[],
 )
 ```
-### Register in query list
+The `CAN_ACCESS` edge stays typed and directed (`-[:CAN_ACCESS]->`); that is its canonical sync-time orientation.
 Add to the `{PROVIDER}_QUERIES` list at the bottom of the file:
 ```python
 AWS_QUERIES: list[AttackPathsQueryDefinition] = [
    # ... existing queries ...
    AWS_{NEW_QUERY_NAME},  # Add here
 ]
 ```
 ---
-## Step-by-step creation process
+## List-typed properties as child nodes
-### 1. Read the queries module
+Some Cartography node properties carry a list of values: `AWSPolicyStatement.action`, `AWSPolicyStatement.resource`, `KMSKey.encryption_algorithms`, `CloudFrontDistribution.aliases`, and many others. The graph models each such property as a set of child item nodes connected to the parent by a typed edge. Queries reach the values by traversing the edge; the parent does not carry the list as a single field.
-**FIRST**, read all files in the queries module to understand the structure, type definitions, registration, and existing style:
+### Naming convention
-```text
+For a list-typed parent property the sink stores:
-api/src/backend/api/attack_paths/queries/
+
-├── __init__.py      # Module exports
+- **Child label**: `<ParentLabel><PropertyPascal>Item`. Example: `AWSPolicyStatement.resource` → `AWSPolicyStatementResourceItem`.
-├── types.py         # AttackPathsQueryDefinition, AttackPathsQueryParameterDefinition
+- **Edge type**: `HAS_<PROPERTY_UPPER>`. Example: `resource` → `HAS_RESOURCE`.
-├── registry.py      # Query registry logic
+- **Child property**: `value` (a single scalar string) for scalar-list properties. For list-of-dict properties (rare; for example `SecretsManagerSecretVersion.tags`) the child carries the dict keys as named fields per the catalog's `field_map`.
-└── {provider}.py    # Provider-specific queries (e.g., aws.py)
+
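The naming convention above can be expressed as two small helpers. This is a sketch for reasoning about names, not the sync layer's actual code: `child_label` and `edge_type` are hypothetical functions, and the handling of multi-word snake_case properties (splitting on `_`) is an assumption that should be confirmed against the catalog.

```python
def child_label(parent_label, prop):
    """Derive the child item label: <ParentLabel><PropertyPascal>Item."""
    # Assumption: snake_case properties are PascalCased word by word.
    pascal = "".join(part.capitalize() for part in prop.split("_"))
    return f"{parent_label}{pascal}Item"


def edge_type(prop):
    """Derive the typed edge name: HAS_<PROPERTY_UPPER>."""
    return f"HAS_{prop.upper()}"


print(child_label("AWSPolicyStatement", "resource"), edge_type("resource"))
print(child_label("AWSPolicyStatement", "action"), edge_type("action"))
```

The single-word cases match the names used throughout this document (`AWSPolicyStatementResourceItem`, `HAS_RESOURCE`, `HAS_NOTACTION`).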
 ### Variable naming for child-item matches
 `aws.py` uses a per-query counter for each `HAS_*` traversal so chained matches stay unambiguous:
 | Edge              | First  | Second  | Third   |
 | ----------------- | ------ | ------- | ------- |
 | `HAS_ACTION`      | `act`  | `act2`  | `act3`  |
 | `HAS_RESOURCE`    | `res`  | `res2`  | `res3`  |
 | `HAS_NOTACTION`   | `nact` | `nact2` | `nact3` |
 | `HAS_NOTRESOURCE` | `nres` | `nres2` | `nres3` |
 The counter resets at the top of every query.
 ### Example - action match
 Find statements that grant `iam:PassRole`, `iam:*`, or `*`. Traverse the `HAS_ACTION` edge in its own `MATCH` clause and apply the predicate in the attached `WHERE`:
 ```cypher
 MATCH (stmt:AWSPolicyStatement {effect: 'Allow'})
 MATCH (stmt)-[:HAS_ACTION]->(act:AWSPolicyStatementActionItem)
 WHERE toLower(act.value) IN ['iam:passrole', 'iam:*']
   OR act.value = '*'
 ```
-**DO NOT** use generic templates. Match the exact style of existing queries in the file.
+`act.value` is case-folded with `toLower()` and compared against lowercase literals because IAM authors mix case (`iam:PassRole`, `iam:passrole`); the `*` wildcard has no case, so it is compared directly.
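To make the predicate's behaviour concrete, here is the same check expressed in Python. The function name `grants_passrole` is hypothetical; it only mirrors the `WHERE` clause above and is not part of the query module.

```python
def grants_passrole(action_value: str) -> bool:
    """Mirror of the Cypher predicate: case-folded literals, or the bare wildcard."""
    return action_value.lower() in {"iam:passrole", "iam:*"} or action_value == "*"


for value in ["iam:PassRole", "IAM:*", "*", "s3:GetObject"]:
    print(value, grants_passrole(value))
```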
-### 2. Fetch and consult the Cartography schema
+### Example - resource ARN match
-**This is the most important step.** Every node label, property, and relationship in the query must exist in the Cartography schema for the pinned version. Do not guess or rely on memory.
+Find statements whose resource can target a specific role:
-Check `api/pyproject.toml` for the Cartography dependency, then fetch the schema:
+```cypher
-
+MATCH path_target = (aws)--(target_role:AWSRole)
-```bash
+MATCH (stmt)-[:HAS_RESOURCE]->(res:AWSPolicyStatementResourceItem)
-grep cartography api/pyproject.toml
+WHERE res.value = '*'
   OR res.value CONTAINS target_role.name
   OR target_role.arn CONTAINS res.value
 ```
-Build the schema URL (ALWAYS use the specific tag, not master/main):
+Three predicates cover the cases: a full wildcard (`*`), a pattern that contains the role name (`arn:aws:iam::*:role/admin*`), and a pattern that is a substring (prefix or component) of the actual ARN.
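The three resource predicates are plain substring checks, which the following Python sketch reproduces. `resource_targets_role` is a hypothetical name, and the account ID and role in the usage lines are made-up sample values.

```python
def resource_targets_role(resource_value: str, role_name: str, role_arn: str) -> bool:
    """Mirror of the Cypher WHERE: wildcard, name-in-pattern, or pattern-in-ARN."""
    return (
        resource_value == "*"
        or role_name in resource_value   # res.value CONTAINS target_role.name
        or resource_value in role_arn    # target_role.arn CONTAINS res.value
    )


arn = "arn:aws:iam::123456789012:role/admin"
print(resource_targets_role("arn:aws:iam::*:role/admin*", "admin", arn))
print(resource_targets_role("arn:aws:s3:::bucket", "admin", arn))
```

Because `CONTAINS` is a substring test rather than a glob match, the check deliberately over-approximates: it can flag a statement whose pattern would not match the role under real IAM evaluation.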
-```text
+### Catalog of list properties
 # Git dependency (prowler-cloud/cartography@0.126.1):
 https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/0.126.1/docs/root/modules/{provider}/schema.md
-# PyPI dependency (cartography = "^0.126.0"):
+The provider catalog lives in `api/src/backend/tasks/jobs/attack_paths/provider_config.py` (`AWS_NORMALIZED_LISTS`). Beyond policy statements it includes KMS algorithms, ECS container-definition lists (`entry_point`, `command`, `links`, `dns_servers`, ...), CloudFront aliases, Inspector finding URL and vulnerability lists, RDS event-subscription categories, and others. To query a list property that is not in the catalog, add an entry there first so the sync layer materialises it.
 https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/0.126.0/docs/root/modules/{provider}/schema.md
 ```
 Read the schema to discover available node labels, properties, and relationships for the target resources. Internal labels (`_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*`) exist for isolation but should never appear in queries.
 ### 4. Create query definition
 Use the appropriate pattern (privilege escalation or network exposure) with:
 - **id**: `{provider}-{kebab-case-description}`
 - **name**: Short, human-friendly label. For sourced queries, append the reference ID: `"EC2 Instance Launch with Privileged Role (EC2-001)"`.
 - **short_description**: Brief explanation, no technical permissions.
 - **description**: Full technical explanation. Plain text only.
 - **provider**: Provider identifier (aws, azure, gcp, kubernetes, github)
 - **cypher**: The openCypher query with proper escaping
 - **parameters**: Optional list of user-provided parameters (`parameters=[]` if none)
 - **attribution**: Optional `AttackPathsQueryAttribution(text, link)` for sourced queries. The `text` includes source, reference ID, and permissions. The `link` uses a lowercase ID. Omit for non-sourced queries.
 ### 5. Add query to provider list
 Add the constant to the `{PROVIDER}_QUERIES` list.
 ---
 ## Query naming conventions
 ### Query ID
 ```text
 {provider}-{category}-{description}
 ```
 Examples: `aws-ec2-privesc-passrole-iam`, `aws-ec2-instances-internet-exposed`
 ### Query constant name
 ```text
 {PROVIDER}_{CATEGORY}_{DESCRIPTION}
 ```
 Examples: `AWS_EC2_PRIVESC_PASSROLE_IAM`, `AWS_EC2_INSTANCES_INTERNET_EXPOSED`
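The constant name follows directly from the query ID, as this short sketch shows. `constant_name` is a hypothetical helper used only to illustrate the mapping.

```python
def constant_name(query_id: str) -> str:
    """Turn a kebab-case query ID into its upper-snake-case constant name."""
    return query_id.replace("-", "_").upper()


print(constant_name("aws-ec2-privesc-passrole-iam"))
```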
 ---
 ## Query categories
 | Category             | Description                    | Example                   |
 | -------------------- | ------------------------------ | ------------------------- |
 | Basic Resource       | List resources with properties | RDS instances, S3 buckets |
 | Network Exposure     | Internet-exposed resources     | EC2 with public IPs       |
 | Privilege Escalation | IAM privilege escalation paths | PassRole + RunInstances   |
 | Data Access          | Access to sensitive data       | EC2 with S3 access        |
 ---
@@ -315,53 +295,42 @@ Examples: `AWS_EC2_PRIVESC_PASSROLE_IAM`, `AWS_EC2_INSTANCES_INTERNET_EXPOSED`
 ### Match account and principal
 ```cypher
-MATCH path_principal = (aws:AWSAccount {id: $provider_uid})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
+MATCH path_principal = (aws:AWSAccount {id: $provider_uid})--(principal:AWSPrincipal)-[:POLICY]->(policy:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {effect: 'Allow'})
 ```
-### Check IAM action permissions
+The `(aws)--(principal)` hop stays anonymous; the `POLICY` and `STATEMENT` hops are typed.
 ### Roles trusting a service
 ```cypher
-WHERE stmt.effect = 'Allow'
+MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]-(:AWSPrincipal {arn: 'ec2.amazonaws.com'})
    AND any(action IN stmt.action WHERE
        toLower(action) = 'iam:passrole'
        OR toLower(action) = 'iam:*'
        OR action = '*'
    )
 ```
-### Find roles trusting a service
+### Roles a principal can assume
 ```cypher
-MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: 'ec2.amazonaws.com'})
+MATCH path_target = (aws)--(target_role:AWSRole)-[:STS_ASSUMEROLE_ALLOW]-(principal)
 ```
-### Find roles the principal can assume
+### JSON-encoded properties
-Note the arrow direction - `STS_ASSUMEROLE_ALLOW` points from the role to the principal:
+Object-typed Cartography properties (most notably `condition` on `AWSPolicyStatement` and `S3PolicyStatement`) are stored as JSON-encoded strings, e.g. `'{"StringEquals":{"aws:SourceAccount":"123456789012"}}'`. There is no JSON parser at query time, so use `CONTAINS` for substring checks:
 ```cypher
-MATCH path_target = (aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)
+WHERE stmt.condition CONTAINS '"aws:SourceAccount"'
 ```
-### Check resource scope
+For structured inspection, fetch the rows and parse in Python. Cypher cannot navigate JSON object keys.
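A minimal sketch of that Python-side parsing, assuming the `condition` string has already been fetched from a result row. `source_accounts` is a hypothetical helper; it relies only on the standard `json` module.

```python
import json


def source_accounts(condition_json: str) -> list[str]:
    """Collect aws:SourceAccount values from a JSON-encoded IAM condition."""
    condition = json.loads(condition_json)
    accounts: list[str] = []
    # The top level maps operators (StringEquals, ...) to {condition key: value(s)}.
    for operator_block in condition.values():
        if not isinstance(operator_block, dict):
            continue
        for key, value in operator_block.items():
            # IAM condition key names are case-insensitive.
            if key.lower() == "aws:sourceaccount":
                accounts.extend(value if isinstance(value, list) else [value])
    return accounts


raw = '{"StringEquals":{"aws:SourceAccount":"123456789012"}}'
print(source_accounts(raw))
```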
 ```cypher
 WHERE any(resource IN stmt.resource WHERE
    resource = '*'
    OR target_role.arn CONTAINS resource
    OR resource CONTAINS target_role.name
 )
 ```
 ### Internet node via path connectivity
 The Internet node is reached through `CAN_ACCESS` relationships to already-scoped resources. No standalone lookup needed:
 ```cypher
 OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)
 ```
-### Multi-label OR (match multiple resource types)
+`resource` must already be bound by the account-anchored pattern above.
 ### Multi-label OR (multiple resource types)
 ```cypher
 MATCH path = (aws:AWSAccount {id: $provider_uid})-[r]-(x)-[q]-(y)
@@ -373,7 +342,7 @@ WHERE (x:EC2PrivateIp AND x.public_ip = $ip)
 ### Include Prowler findings
-Deduplicate nodes before the ProwlerFinding lookup to avoid redundant OPTIONAL MATCH calls on nodes that appear in multiple paths:
+Deduplicate nodes before the typed finding probe to avoid one `OPTIONAL MATCH` per path-occurrence of the same node:
 ```cypher
 WITH collect(path_principal) + collect(path_target) AS paths
@@ -382,12 +351,12 @@ UNWIND nodes(p) AS n
 WITH paths, collect(DISTINCT n) AS unique_nodes
 UNWIND unique_nodes AS n
-OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
+OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
 RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
 ```
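The doubled braces in `{{status: 'FAIL'}}` suggest that the query text goes through Python string formatting, where `{{` and `}}` render as literal braces. The sketch below illustrates this with an f-string; the value assigned to `PROWLER_FINDING_LABEL` is an assumption based on the `ProwlerFinding` label, and the real constant may be defined differently.

```python
# Assumed value; the real constant lives in the queries module.
PROWLER_FINDING_LABEL = "ProwlerFinding"

# Single braces interpolate; doubled braces become literal Cypher map braces.
FINDING_PROBE = (
    f"OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-"
    f"(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})"
)

print(FINDING_PROBE)
```

Query parameters such as `$provider_uid` need no escaping, since `$` has no special meaning in Python format strings.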
-For network exposure queries, aggregate the internet node and relationship alongside paths:
+For network-exposure queries, aggregate the Internet node and its edge alongside paths:
 ```cypher
 WITH collect(path) AS paths, head(collect(internet)) AS internet, collect(can_access) AS can_access
@@ -396,7 +365,7 @@ UNWIND nodes(p) AS n
 WITH paths, internet, can_access, collect(DISTINCT n) AS unique_nodes
 UNWIND unique_nodes AS n
-OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
+OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
 RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
    internet, can_access
@@ -406,22 +375,22 @@ RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
 ## Prowler-specific labels and relationships
-These are added by the sync task, not part of the Cartography schema. For all other node labels, properties, and relationships, **always consult the Cartography schema** (see step 2 below).
+Added by the sync task, not part of the Cartography schema. For everything else, consult the pinned Cartography schema (see "Creation steps").
 | Label / Relationship   | Description                                                 |
-| ---------------------- | -------------------------------------------------- |
+| ---------------------- | ----------------------------------------------------------- |
 | `ProwlerFinding`       | Finding node (`status`, `severity`, `check_id`)             |
 | `Internet`             | Internet sentinel node                                      |
-| `CAN_ACCESS`           | Internet-to-resource exposure (relationship)       |
+| `CAN_ACCESS`           | `(Internet)-[:CAN_ACCESS]->(resource)` exposure edge        |
-| `HAS_FINDING`          | Resource-to-finding link (relationship)            |
+| `HAS_FINDING`          | `(resource)-[:HAS_FINDING]->(:ProwlerFinding)` finding link |
 | `TRUSTS_AWS_PRINCIPAL` | Role trust relationship                                     |
-| `STS_ASSUMEROLE_ALLOW` | Can assume role (direction: role -> principal)      |
+| `STS_ASSUMEROLE_ALLOW` | Can assume role                                             |
 ---
 ## Parameters
-For queries requiring user input:
+For queries that take user input:
 ```python
 parameters=[
@@ -438,50 +407,83 @@ parameters=[
 ---
-## Best practices
+## openCypher compatibility
-1. **Chain all MATCHes from the root account node**: Every `MATCH` clause must connect to the `aws` variable (or another variable already bound to the account's subgraph). An unanchored `MATCH` would return nodes from all providers.
+Queries must run on both Neo4j and Amazon Neptune. Avoid these constructs:
-   ```cypher
+| Feature                                 | Use instead                                                                                                                                 |
-   // WRONG: matches ALL AWSRoles across all providers
+| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
-   MATCH (role:AWSRole) WHERE role.name = 'admin'
+| APOC procedures (`apoc.*`)              | Real nodes and relationships in the graph                                                                                                   |
 | Neptune extensions                      | Standard openCypher                                                                                                                         |
 | `reduce()`                              | `UNWIND` + `collect()`                                                                                                                      |
 | `FOREACH`                               | `WITH` + `UNWIND` + `SET`                                                                                                                   |
 | Regex `=~`                              | `toLower()` + exact match, or `STARTS WITH` / `CONTAINS`                                                                                    |
 | `CALL () { UNION }`                     | Multi-label `OR` in `WHERE` (see pattern above)                                                                                             |
 | `any(x IN list ...)`                    | `size([x IN list WHERE pred]) > 0`                                                                                                          |
 | `all(x IN list ...)`                    | `size([x IN list WHERE pred]) = size(list)`                                                                                                 |
 | `none(x IN list ...)`                   | `size([x IN list WHERE pred]) = 0`                                                                                                          |
 | `EXISTS { MATCH (pattern) WHERE pred }` | Standalone `MATCH (pattern)` + `WHERE pred`; precede the downstream `collect(path...)` with `WITH DISTINCT <path-vars>` to dedupe the joins |
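
The three list-predicate rewrites in the table can be sanity-checked outside the graph. In this Python sketch the hypothetical helper `size_filtered` stands in for `size([x IN list WHERE pred])`, and the built-ins `any` and `all` stand in for the Cypher predicates being replaced.

```python
def size_filtered(items, pred):
    """Python stand-in for size([x IN list WHERE pred])."""
    return len([x for x in items if pred(x)])


def is_big(x):
    return x > 1


for items in ([], [1], [2, 3], [1, 2, 3]):
    # any(...)  <=>  size(filtered) > 0
    assert any(is_big(x) for x in items) == (size_filtered(items, is_big) > 0)
    # all(...)  <=>  size(filtered) = size(list)
    assert all(is_big(x) for x in items) == (size_filtered(items, is_big) == len(items))
    # none(...) <=>  size(filtered) = 0
    assert (not any(is_big(x) for x in items)) == (size_filtered(items, is_big) == 0)

print("rewrites agree on all sample lists")
```

The empty list is the case worth checking by hand: `all` is true and `none` is true on an empty list, and both rewrites preserve that.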
-   // CORRECT: scoped to the specific account's subgraph
+For list-typed properties in the catalog (action, resource, and so on), traverse the `HAS_*` edges to the child item nodes via the multi-`MATCH` shape shown in "List-typed properties as child nodes". The parent node does not carry the list as a single field, so `split(...)` and comma-string predicates do not apply.
   MATCH (aws)--(role:AWSRole) WHERE role.name = 'admin'
   ```
   **Exception**: A second-permission MATCH like `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` is safe because `principal` is already bound to the account's subgraph by the first MATCH. It does not need to chain from `aws` again.
 2. **Include Prowler findings**: Always add `OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})` with `collect(DISTINCT pf)`.
 3. **Comment the query purpose**: Add inline comments explaining each MATCH clause.
 4. **Never use internal labels in queries**: `_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*` are for system isolation. They should never appear in predefined or custom query text.
 6. **Internet node uses path connectivity**: Reach it via `OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)` where `resource` is already scoped by the account anchor. No standalone lookup.
 ---
-## openCypher compatibility
+## Best practices
-Queries must be written in **openCypher Version 9** for compatibility with both Neo4j and Amazon Neptune.
+1. **Chain every MATCH from the account anchor.** An unanchored `MATCH (role:AWSRole)` returns roles from every provider in the graph; `MATCH (aws)--(role:AWSRole)` is scoped. A second-permission MATCH like `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` is safe because `principal` is already bound to the account's subgraph.
 2. **Type the finding probe.** Always use `OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})`. The type lets Neptune apply an inline edge filter; an untyped probe scans every incident edge of high-degree nodes.
 3. **Comment each MATCH.** One inline `// ...` line per clause explaining its role.
 4. **Never use internal labels.** `_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*` are system isolation labels and must not appear in query text (predefined or custom).
 5. **Reach the Internet node through path connectivity** via `(internet:Internet)-[:CAN_ACCESS]->(resource)`, never as a standalone match.
 6. **Preserve the `RETURN` contract.** `paths, dpf, dpfr` for the standard shape; add `internet, can_access` for network-exposure queries. The serializer and visualiser depend on these names.
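
Some of these practices can be checked mechanically before a query is registered. The following is a rough sketch of such a check, not an existing tool: `lint_query` and its regular expressions are hypothetical, and it covers only the internal-label rule, the typed finding probe, and the standard `RETURN` names.

```python
import re

# Internal isolation labels named in the best practices above.
INTERNAL_LABEL = re.compile(r":\s*_(?:ProviderResource|AWSResource|Tenant_|Provider_)")
# A finding probe whose relationship has a variable but no type.
UNTYPED_PROBE = re.compile(r"\[\s*pfr\s*\]")
REQUIRED_RETURN_NAMES = ("paths", "dpf", "dpfr")


def lint_query(cypher: str) -> list[str]:
    """Return a list of human-readable problems; empty means none were found."""
    problems = []
    if INTERNAL_LABEL.search(cypher):
        problems.append("internal label in query text")
    if UNTYPED_PROBE.search(cypher):
        problems.append("finding probe is not typed with HAS_FINDING")
    # Inspect only the text after the last RETURN keyword.
    _, keyword, return_clause = cypher.rpartition("RETURN")
    returned = set(re.findall(r"\w+", return_clause)) if keyword else set()
    missing = [name for name in REQUIRED_RETURN_NAMES if name not in returned]
    if missing:
        problems.append("RETURN is missing: " + ", ".join(missing))
    return problems


print(lint_query("MATCH (n:_AWSResource) OPTIONAL MATCH (n)-[pfr]-(pf) RETURN paths"))
```

A text-based check like this cannot verify account anchoring or comment quality; those still need review by a person.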
-### Avoid these (not in openCypher spec)
+---
-| Feature                    | Use instead                                            |
+## Naming conventions
-| -------------------------- | ------------------------------------------------------ |
+
-| APOC procedures (`apoc.*`) | Real nodes and relationships in the graph              |
+- **ID**: kebab-case `{provider}-{category}-{description}`, e.g. `aws-ec2-privesc-passrole-iam`.
-| Neptune extensions         | Standard openCypher                                    |
+- **Constant**: SHOUTING_SNAKE_CASE `{PROVIDER}_{CATEGORY}_{DESCRIPTION}`, e.g. `AWS_EC2_PRIVESC_PASSROLE_IAM`.
-| `reduce()` function        | `UNWIND` + `collect()`                                 |
+
-| `FOREACH` clause           | `WITH` + `UNWIND` + `SET`                              |
+---
-| Regex operator (`=~`)      | `toLower()` + exact match, or `CONTAINS`/`STARTS WITH`. One legacy query uses `=~` - do not add new usages |
+
-| `CALL () { UNION }`        | Multi-label OR in WHERE (see patterns section)         |
+## Creation steps
 1. **Read the queries module first** to match the existing style:
   ```text
   api/src/backend/api/attack_paths/queries/
   ├── __init__.py
   ├── types.py         # dataclass definitions
   ├── registry.py
   └── {provider}.py
   ```
 2. **Fetch the Cartography schema for the pinned version.** Do not guess labels, properties, or relationships. Read the dependency pin:
   ```bash
   grep cartography api/pyproject.toml
   ```
   Then fetch the schema for that exact tag:
   ```text
   # Git pin (prowler-cloud/cartography@<TAG>):
   https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/<TAG>/docs/root/modules/{provider}/schema.md
   # PyPI pin (cartography==<TAG>):
   https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/<TAG>/docs/root/modules/{provider}/schema.md
   ```
 3. **Build the query** using the canonical predefined template plus the appropriate sub-pattern (privilege escalation or network exposure). For list-typed properties (action, resource, and so on), traverse the exploded child nodes, for example via `[:HAS_ACTION]->(:AWSPolicyStatementActionItem)` (see "List-typed properties as child nodes" and the `AWS_NORMALIZED_LISTS` catalog).
 4. **Register** the constant in the `{PROVIDER}_QUERIES` list at the bottom of the provider file.
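
The schema URL from step 2 can be assembled from the pinned tag instead of being typed by hand. A minimal sketch, assuming the tag has already been read from `api/pyproject.toml`; `schema_url` and its `git_pin` flag are hypothetical names.

```python
SCHEMA_URL_TEMPLATE = (
    "https://raw.githubusercontent.com/{org}/cartography/refs/tags/{tag}"
    "/docs/root/modules/{provider}/schema.md"
)


def schema_url(tag: str, provider: str, git_pin: bool) -> str:
    """Build the schema URL for an exact tag (never master/main)."""
    # Git pins point at the prowler-cloud fork; PyPI pins at the upstream project.
    org = "prowler-cloud" if git_pin else "cartography-cncf"
    return SCHEMA_URL_TEMPLATE.format(org=org, tag=tag, provider=provider)


print(schema_url("0.126.1", "aws", git_pin=True))
```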
 ---
 ## Reference
- **pathfinding.cloud**: https://github.com/DataDog/pathfinding.cloud (use `curl | jq`, not WebFetch)
+- **pathfinding.cloud**: https://github.com/DataDog/pathfinding.cloud (use `curl | jq`; the aggregated `paths.json` is too large for WebFetch).
- **Cartography schema**: `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md`
+- **Cartography schema** (per pinned tag): `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{tag}/docs/root/modules/{provider}/schema.md`.
- **Neptune openCypher compliance**: https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html
+- **Neptune openCypher compliance**: https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html.
- **openCypher spec**: https://github.com/opencypher/openCypher
+- **openCypher spec**: https://github.com/opencypher/openCypher.
 - **Sync converter** (`tasks/jobs/attack_paths/sync.py`): list-typed node properties listed in `tasks/jobs/attack_paths/provider_config.py::AWS_NORMALIZED_LISTS` are materialised as child item nodes + `HAS_*` edges. Properties that are not in the catalog are serialised to a comma-delimited string and emit a one-time warning. Dict-typed properties become JSON strings. Same shape on both sinks.