feat(api): make Attack Paths sink selectable between Neo4j and Neptune (#11524)

2026-07-04 19:21:51 +00:00 · 2026-06-26 10:22:29 +02:00
parent 9b8b77cec0
commit 5793cd7e38
48 changed files with 9928 additions and 3210 deletions
@@ -169,3 +169,7 @@ GEMINI.md

 # Claude Code
 .claude/*
+
+# Docker
+docker-compose.override.yml
+docker-compose-dev.override.yml
@@ -83,16 +83,35 @@ prowler dashboard

 ## Attack Paths

-Attack Paths automatically extends every completed AWS scan with a Neo4j graph that combines Cartography's cloud inventory with Prowler findings. The feature runs in the API worker after each scan and therefore requires:
+Attack Paths automatically extends every completed AWS scan with a graph that combines Cartography's cloud inventory with Prowler findings. The feature runs in the API worker after each scan.

- An accessible Neo4j instance (the Docker Compose files already ships a `neo4j` service).
- The following environment variables so Django and Celery can connect:
+Two graph backends are supported as the long-lived sink:

-  | Variable | Description | Default |
-  | --- | --- | --- |
-  | `NEO4J_HOST` | Hostname used by the API containers. | `neo4j` |
-  | `NEO4J_PORT` | Bolt port exposed by Neo4j. | `7687` |
-  | `NEO4J_USER` / `NEO4J_PASSWORD` | Credentials with rights to create per-tenant databases. | `neo4j` / `neo4j_password` |
+- **Neo4j** (default; the Docker Compose files already ship a `neo4j` service).
+- **Amazon Neptune** (cloud-managed; opt-in).
+
+Select the sink with `ATTACK_PATHS_SINK_DATABASE` (`neo4j` or `neptune`; default `neo4j`).
+
+> Note: Cartography ingestion always uses a temporary Neo4j database, regardless of the configured sink. The `NEO4J_*` variables below must remain set even when `ATTACK_PATHS_SINK_DATABASE=neptune`.
+
+### Neo4j sink
+
+| Variable | Description | Default |
+| --- | --- | --- |
+| `NEO4J_HOST` | Hostname used by the API containers. | `neo4j` |
+| `NEO4J_PORT` | Bolt port exposed by Neo4j. | `7687` |
+| `NEO4J_USER` / `NEO4J_PASSWORD` | Credentials with rights to create per-tenant databases. | `neo4j` / `neo4j_password` |
+
+### Neptune sink
+
+| Variable | Description | Default |
+| --- | --- | --- |
+| `NEPTUNE_WRITER_ENDPOINT` | Bolt host for the Neptune writer instance. Required when sink is `neptune`. | _empty_ |
+| `NEPTUNE_READER_ENDPOINT` | Optional reader endpoint for read-only queries. Falls back to the writer when unset. | _empty_ |
+| `NEPTUNE_PORT` | Bolt port exposed by Neptune. | `8182` |
+| `AWS_REGION` | Region the Neptune cluster lives in. Required when sink is `neptune`. | _empty_ |
+
+Neptune authenticates with SigV4 using the standard boto3 credential chain. The worker's IAM role (or `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY`) supplies the credentials. There is no Neptune password variable.

 Every AWS provider scan will enqueue an Attack Paths ingestion job automatically. Other cloud providers will be added in future iterations.
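
The sink variables documented above can be read and validated in one place. The following is a minimal sketch, not the project's settings code: `NeptuneSinkConfig` and `load_sink_config` are hypothetical names introduced for illustration. Only the variable names, the defaults, the reader-falls-back-to-writer rule, and the `bolt+s://` URI shape are taken from this change.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple


@dataclass(frozen=True)
class NeptuneSinkConfig:
    """Hypothetical container for the Neptune sink settings (illustration only)."""

    writer_endpoint: str
    reader_endpoint: str
    port: int
    region: str

    @property
    def writer_uri(self) -> str:
        # Same URI shape that get_uri() builds for the Neptune sink.
        return f"bolt+s://{self.writer_endpoint}:{self.port}"


def load_sink_config(
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[str, Optional[NeptuneSinkConfig]]:
    """Return (sink_name, neptune_config); neptune_config is None for Neo4j."""
    environ = os.environ if environ is None else environ

    sink = environ.get("ATTACK_PATHS_SINK_DATABASE", "neo4j")
    if sink not in ("neo4j", "neptune"):
        raise ValueError(f"Unsupported ATTACK_PATHS_SINK_DATABASE: {sink!r}")
    if sink == "neo4j":
        return sink, None

    writer = environ.get("NEPTUNE_WRITER_ENDPOINT", "")
    region = environ.get("AWS_REGION", "")
    missing = [
        name
        for name, value in (
            ("NEPTUNE_WRITER_ENDPOINT", writer),
            ("AWS_REGION", region),
        )
        if not value
    ]
    if missing:
        raise ValueError(f"Missing required Neptune settings: {', '.join(missing)}")

    # The reader endpoint is optional and falls back to the writer when unset.
    reader = environ.get("NEPTUNE_READER_ENDPOINT") or writer
    port = int(environ.get("NEPTUNE_PORT", "8182"))
    return sink, NeptuneSinkConfig(writer, reader, port, region)
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in tests. The `NEO4J_*` variables are deliberately not checked here because they stay required for Cartography ingestion with either sink.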

@@ -2,6 +2,14 @@

 All notable changes to the **Prowler API** are documented in this file.

+## [1.33.0] (Prowler UNRELEASED)
+
+### 🔄 Changed
+
+- Attack Paths: AWS Neptune is now supported as a persistent sink database, selectable via `ATTACK_PATHS_SINK_DATABASE=neptune` (default `neo4j`). Cartography (bumped to 0.138.1) keeps its per-scan ingest database on Neo4j [(#11524)](https://github.com/prowler-cloud/prowler/pull/11524)
+
+---
+
 ## [1.32.2] (Prowler UNRELEASED)

 ### 🐞 Fixed
@@ -58,7 +58,7 @@ dependencies = [
  "matplotlib (==3.10.8)",
  "reportlab (==4.4.10)",
  "neo4j (==6.1.0)",
-  "cartography (==0.135.0)",
+  "cartography (==0.138.1)",
  "gevent (==25.9.1)",
  "werkzeug (==3.1.7)",
  "sqlparse (==0.5.5)",
@@ -193,7 +193,7 @@ constraint-dependencies = [
  "blinker==1.9.0",
  "boto3==1.40.61",
  "botocore==1.40.61",
-  "cartography==0.135.0",
+  "cartography==0.138.1",
  "celery==5.6.2",
  "certifi==2026.1.4",
  "cffi==2.0.0",
@@ -447,7 +447,7 @@ constraint-dependencies = [
  "wcwidth==0.5.3",
  "websocket-client==1.9.0",
  "werkzeug==3.1.7",
-  "workos==6.0.4",
+  "workos==6.0.8",
  "wrapt==1.17.3",
  "xlsxwriter==3.2.9",
  "xmlsec==1.3.17",
@@ -458,8 +458,13 @@ constraint-dependencies = [
  "zope-interface==8.2",
  "zstd==1.5.7.3"
 ]
-# prowler@master needs okta==3.4.2; cartography 0.135.0 declares okta<1.0.0 for an
-# integration prowler does not import.
+# prowler@master needs okta==3.4.2, but cartography 0.138.1 requires okta<1.0.0.
+# Attack Paths does not ingest Okta today, so override the Cartography
+# dependency to the Prowler pin.
+#
+# prowler@master needs azure-mgmt-containerservice==34.1.0, but cartography
+# 0.138.1 requires azure-mgmt-containerservice>=41.0.0. Attack Paths does not
+# ingest Azure today, so override the Cartography dependency to the Prowler pin.
 #
 # prowler@master hard-pins microsoft-kiota-abstractions==1.9.2 in [project.dependencies].
 # The microsoft-kiota-http security bump to 1.9.9 (GHSA-7j59-v9qr-6fq9) requires
@@ -475,6 +480,7 @@ constraint-dependencies = [
 # that request pyjwt[crypto] and leave cryptography (needed for RS256) only transitive.
 override-dependencies = [
  "okta==3.4.2",
+  "azure-mgmt-containerservice==34.1.0",
  "microsoft-kiota-abstractions==1.9.9",
  "dulwich==1.2.5",
  "pyjwt[crypto]==2.13.0"
@@ -42,9 +42,6 @@ class ApiConfig(AppConfig):
        ):
            self._ensure_crypto_keys()

-        # Neo4j driver is created lazily on first use (see api.attack_paths.database).
-        # App init never contacts Neo4j, so a Neo4j outage cannot block API startup.
-
    def _ensure_crypto_keys(self):
        """
        Orchestrator method that ensures all required cryptographic keys are present.
@@ -4,10 +4,10 @@ Cypher sanitizer for custom (user-supplied) Attack Paths queries.
 Two responsibilities:

 1. **Validation** - reject queries containing SSRF or dangerous procedure
-   patterns (defense-in-depth; the primary control is ``neo4j.READ_ACCESS``).
+   patterns (defense-in-depth; the primary control is `neo4j.READ_ACCESS`).

 2. **Provider-scoped label injection** - inject a dynamic
-   ``_Provider_{uuid}`` label into every node pattern so the database can
+   `_Provider_{uuid}` label into every node pattern so the database can
   use its native label index for provider isolation.

 Label-injection pipeline:
@@ -25,13 +25,13 @@ from rest_framework.exceptions import ValidationError
 from tasks.jobs.attack_paths.config import get_provider_label

 # Step 1 - String / comment protection
-# Single combined regex: strings first, then line comments.
+# Single combined regex: strings first, then line comments
 # The regex engine finds the leftmost match, so a string like 'https://prowler.com'
-# is consumed as a string before the // inside it can match as a comment.
+# is consumed as a string before the // inside it can match as a comment
 _PROTECTED_RE = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"|//[^\n]*")

 # Step 2 - Clause splitting
-# OPTIONAL MATCH must come before MATCH to avoid partial matching.
+# `OPTIONAL MATCH` must come before `MATCH` to avoid partial matching
 _CLAUSE_RE = re.compile(
    r"\b(OPTIONAL\s+MATCH|MATCH|WHERE|RETURN|WITH|ORDER\s+BY"
    r"|SKIP|LIMIT|UNION|UNWIND|CALL)\b",
@@ -39,10 +39,10 @@ _CLAUSE_RE = re.compile(
 )

 # Pass A - Labeled node patterns (all segments)
-# Matches node patterns that have at least one :Label.
-# (?<!\w)\(  - open paren NOT preceded by a word char (excludes function calls).
-# Group 1:  optional variable + one or more :Label
-# Group 2:  optional {properties} + closing paren
+# Matches node patterns that have at least one `:Label`
+# `(?<!\w)\(`  - open paren NOT preceded by a word char, excludes function calls
+# Group 1:  optional variable + one or more `:Label`
+# Group 2:  optional `{`properties`}` + closing paren
 _LABELED_NODE_RE = re.compile(
    r"(?<!\w)\("
    r"("
@@ -55,9 +55,9 @@ _LABELED_NODE_RE = re.compile(
    r")"
 )

-# Pass B - Bare node patterns (MATCH segments only)
-# Matches (identifier) or (identifier {properties}) without any :Label.
-# Only applied in MATCH/OPTIONAL MATCH segments.
+# Pass B - Bare node patterns (`MATCH` segments only)
+# Matches (identifier) or (identifier {properties}) without any `:Label`
+# Only applied in `MATCH` / `OPTIONAL MATCH` segments
 _BARE_NODE_RE = re.compile(
    r"(?<!\w)\(" r"(\s*[a-zA-Z_]\w*)" r"(\s*(?:\{[^}]*\})?)" r"\s*\)"
 )
@@ -134,9 +134,7 @@ def inject_provider_label(cypher: str, provider_id: str) -> str:
    return work


-# ---------------------------------------------------------------------------
 # Validation
-# ---------------------------------------------------------------------------

 # Patterns that indicate SSRF or dangerous procedure calls
 # Defense-in-depth layer - the primary control is `neo4j.READ_ACCESS`
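
The comments above describe two regex behaviours that are easy to check in isolation: strings are consumed before `//` can start a comment, and the `(?<!\w)` lookbehind keeps function calls from being treated as node patterns. The sketch below copies the two patterns from this file and runs them on made-up Cypher snippets; the sample queries and the `protected` / `bare` variable names are illustrative assumptions.

```python
import re

# Patterns copied from the sanitizer module shown in this diff.
_PROTECTED_RE = re.compile(r"'(?:[^'\\]|\\.)*'|\"(?:[^\"\\]|\\.)*\"|//[^\n]*")
_BARE_NODE_RE = re.compile(
    r"(?<!\w)\(" r"(\s*[a-zA-Z_]\w*)" r"(\s*(?:\{[^}]*\})?)" r"\s*\)"
)

# The URL contains `//`, but the string alternative matches first because the
# regex engine takes the leftmost match, so only the real comment is a comment.
sample = "RETURN 'https://prowler.com' AS url // trailing comment"
protected = _PROTECTED_RE.findall(sample)

# `(n)` after MATCH is a bare node pattern; `count(n)` is not, because its
# opening paren is preceded by a word character.
bare = _BARE_NODE_RE.findall("MATCH (n) RETURN count(n)")

print(protected)
print(bare)
```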
@@ -1,261 +1,32 @@
-import atexit
-import logging
-import threading
-from collections.abc import Iterator
-from contextlib import contextmanager
+"""Backwards-compatible facade over the ingest and sink modules.
+
+Historically this module owned a single Neo4j driver used for both the
+cartography temp database and the per-tenant sink database. The port to AWS
+Neptune split those roles: the cartography ingest (temp) database is always
+Neo4j and lives in `api.attack_paths.ingest`; the sink is configurable
+(Neo4j or Neptune) and lives in `api.attack_paths.sink`. This shim preserves
+the public API that `tasks/` and `api/v1/views.py` already depend on, and
+dispatches to the right module by database-name prefix.
+
+A database name starting with `db-tmp-scan-` is a cartography temp DB and
+routes to ingest. Everything else routes to the configured sink.
+"""
+
+from contextlib import AbstractContextManager
 from typing import Any
 from uuid import UUID

-import neo4j
-import neo4j.exceptions
-from api.attack_paths.retryable_session import RetryableSession
+import neo4j  # noqa: F401 - kept for tests that patch api.attack_paths.database.neo4j
+from api.attack_paths import ingest
+from api.attack_paths import sink as sink_module
 from config.env import env
-from django.conf import settings
-from tasks.jobs.attack_paths.config import (
-    BATCH_SIZE,
-    PROVIDER_RESOURCE_LABEL,
-    get_provider_label,
+from django.conf import (
+    settings,  # noqa: F401 - kept for tests that patch ...database.settings
 )

-# Without this Celery goes crazy with Neo4j logging
-logging.getLogger("neo4j").setLevel(logging.ERROR)
-logging.getLogger("neo4j").propagate = False
-
-SERVICE_UNAVAILABLE_MAX_RETRIES = env.int(
-    "ATTACK_PATHS_SERVICE_UNAVAILABLE_MAX_RETRIES", default=3
-)
-READ_QUERY_TIMEOUT_SECONDS = env.int(
-    "ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30
-)
 MAX_CUSTOM_QUERY_NODES = env.int("ATTACK_PATHS_MAX_CUSTOM_QUERY_NODES", default=250)
-# Shorter than CONN_ACQUISITION_TIMEOUT — the driver requires acquisition to be
-# the longer of the two (it may include opening a new connection).
-CONNECTION_TIMEOUT = env.int("NEO4J_CONNECTION_TIMEOUT", default=5)
-CONN_ACQUISITION_TIMEOUT = env.int("NEO4J_CONN_ACQUISITION_TIMEOUT", default=15)
-READ_EXCEPTION_CODES = [
-    "Neo.ClientError.Statement.AccessMode",
-    "Neo.ClientError.Procedure.ProcedureNotFound",
-]
-CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."

-# Module-level process-wide driver singleton
-_driver: neo4j.Driver | None = None
-_lock = threading.Lock()
-
-# Base Neo4j functions
-
-
-def get_uri() -> str:
-    host = settings.DATABASES["neo4j"]["HOST"]
-    port = settings.DATABASES["neo4j"]["PORT"]
-    return f"bolt://{host}:{port}"
-
-
-def init_driver() -> neo4j.Driver:
-    global _driver
-    if _driver is not None:
-        return _driver
-
-    with _lock:
-        if _driver is None:
-            uri = get_uri()
-            config = settings.DATABASES["neo4j"]
-
-            driver = neo4j.GraphDatabase.driver(
-                uri,
-                auth=(config["USER"], config["PASSWORD"]),
-                keep_alive=True,
-                max_connection_lifetime=7200,
-                connection_timeout=CONNECTION_TIMEOUT,
-                connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
-                max_connection_pool_size=50,
-            )
-            # Publish the singleton only after connectivity is verified so a
-            # failed probe does not leave an unverified driver behind. Close the
-            # driver on failure so a repeatedly-probed outage cannot leak pools.
-            try:
-                driver.verify_connectivity()
-            except Exception:
-                driver.close()
-                raise
-            _driver = driver
-
-            # Register cleanup handler (only runs once since we're inside the _driver is None block)
-            atexit.register(close_driver)
-
-    return _driver
-
-
-def get_driver() -> neo4j.Driver:
-    return init_driver()
-
-
-def close_driver() -> None:  # TODO: Use it
-    global _driver
-    with _lock:
-        if _driver is not None:
-            try:
-                _driver.close()
-
-            finally:
-                _driver = None
-
-
-@contextmanager
-def get_session(
-    database: str | None = None, default_access_mode: str | None = None
-) -> Iterator[RetryableSession]:
-    session_wrapper: RetryableSession | None = None
-
-    try:
-        session_wrapper = RetryableSession(
-            session_factory=lambda: get_driver().session(
-                database=database, default_access_mode=default_access_mode
-            ),
-            max_retries=SERVICE_UNAVAILABLE_MAX_RETRIES,
-        )
-        yield session_wrapper
-
-    except neo4j.exceptions.Neo4jError as exc:
-        if (
-            default_access_mode == neo4j.READ_ACCESS
-            and exc.code
-            and exc.code in READ_EXCEPTION_CODES
-        ):
-            message = "Read query not allowed"
-            code = READ_EXCEPTION_CODES[0]
-            raise WriteQueryNotAllowedException(message=message, code=code)
-
-        message = exc.message if exc.message is not None else str(exc)
-
-        if exc.code and exc.code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
-            raise ClientStatementException(message=message, code=exc.code)
-
-        raise GraphDatabaseQueryException(message=message, code=exc.code)
-
-    finally:
-        if session_wrapper is not None:
-            session_wrapper.close()
-
-
-def execute_read_query(
-    database: str,
-    cypher: str,
-    parameters: dict[str, Any] | None = None,
-) -> neo4j.graph.Graph:
-    with get_session(database, default_access_mode=neo4j.READ_ACCESS) as session:
-
-        def _run(tx: neo4j.ManagedTransaction) -> neo4j.graph.Graph:
-            result = tx.run(
-                cypher, parameters or {}, timeout=READ_QUERY_TIMEOUT_SECONDS
-            )
-            return result.graph()
-
-        return session.execute_read(_run)
-
-
-def create_database(database: str) -> None:
-    query = "CREATE DATABASE $database IF NOT EXISTS"
-    parameters = {"database": database}
-
-    with get_session() as session:
-        session.run(query, parameters)
-
-
-def drop_database(database: str) -> None:
-    query = f"DROP DATABASE `{database}` IF EXISTS DESTROY DATA"
-
-    with get_session() as session:
-        session.run(query)
-
-
-def drop_subgraph(database: str, provider_id: str) -> int:
-    """
-    Delete all nodes for a provider from the tenant database.
-
-    Deletes relationships then nodes in batches (not `DETACH DELETE`) so a dense
-    provider's graph cannot exceed Neo4j's transaction memory limit.
-    Silently returns 0 if the database doesn't exist.
-    """
-    provider_label = get_provider_label(provider_id)
-    deleted_nodes = 0
-
-    try:
-        with get_session(database) as session:
-            # Phase 1: delete relationships incident to provider nodes in batches.
-            deleted_count = 1
-            while deleted_count > 0:
-                result = session.run(
-                    f"""
-                    MATCH (:`{provider_label}`)-[r]-()
-                    WITH DISTINCT r LIMIT $batch_size
-                    DELETE r
-                    RETURN COUNT(r) AS deleted_rels_count
-                    """,
-                    {"batch_size": BATCH_SIZE},
-                )
-                deleted_count = result.single().get("deleted_rels_count", 0)
-
-            # Phase 2: delete the now relationship-free nodes in batches.
-            deleted_count = 1
-            while deleted_count > 0:
-                result = session.run(
-                    f"""
-                    MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`)
-                    WITH n LIMIT $batch_size
-                    DELETE n
-                    RETURN COUNT(n) AS deleted_nodes_count
-                    """,
-                    {"batch_size": BATCH_SIZE},
-                )
-                deleted_count = result.single().get("deleted_nodes_count", 0)
-                deleted_nodes += deleted_count
-
-    except GraphDatabaseQueryException as exc:
-        if exc.code == "Neo.ClientError.Database.DatabaseNotFound":
-            return 0
-        raise
-
-    return deleted_nodes
-
-
-def has_provider_data(database: str, provider_id: str) -> bool:
-    """
-    Check if any ProviderResource node exists for this provider.
-
-    Returns `False` if the database doesn't exist.
-    """
-    provider_label = get_provider_label(provider_id)
-    query = f"MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`) RETURN 1 LIMIT 1"
-
-    try:
-        with get_session(database, default_access_mode=neo4j.READ_ACCESS) as session:
-            result = session.run(query)
-            return result.single() is not None
-
-    except GraphDatabaseQueryException as exc:
-        if exc.code == "Neo.ClientError.Database.DatabaseNotFound":
-            return False
-        raise
-
-
-def clear_cache(database: str) -> None:
-    query = "CALL db.clearQueryCaches()"
-
-    try:
-        with get_session(database) as session:
-            session.run(query)
-
-    except GraphDatabaseQueryException as exc:
-        logging.warning(f"Failed to clear query cache for database `{database}`: {exc}")
-
-
-# Neo4j functions related to Prowler + Cartography
-
-
-def get_database_name(entity_id: str | UUID, temporary: bool = False) -> str:
-    prefix = "tmp-scan" if temporary else "tenant"
-    return f"db-{prefix}-{str(entity_id).lower()}"
+TEMP_DB_PREFIX = "db-tmp-scan-"


 # Exceptions
@@ -270,7 +41,6 @@ class GraphDatabaseQueryException(Exception):
    def __str__(self) -> str:
        if self.code:
            return f"{self.code}: {self.message}"
-
        return self.message


@@ -280,3 +50,152 @@ class WriteQueryNotAllowedException(GraphDatabaseQueryException):

 class ClientStatementException(GraphDatabaseQueryException):
    pass
+
+
+# Routing
+
+
+def _is_ingest_database(database: str | None) -> bool:
+    return bool(database) and database.startswith(TEMP_DB_PREFIX)
+
+
+# Driver lifecycle
+
+
+def init_driver() -> Any:
+    """Initialize the configured sink backend.
+
+    The ingest driver (Neo4j for cartography temp DBs) stays lazy: it is
+    only initialized when a temp-DB operation actually runs, which never
+    happens on API pods.
+    """
+    return sink_module.init()
+
+
+def close_driver() -> None:
+    """Close every driver held by this process."""
+    sink_module.close()
+    ingest.close_driver()
+
+
+def get_driver() -> neo4j.Driver:
+    """Return the sink backend's underlying driver.
+
+    Only meaningful for the Neo4j sink (where the backend has a single Neo4j
+    driver). On Neptune this returns the writer driver. Kept for tests and
+    legacy call-sites; prefer `get_session` for new code.
+    """
+    backend = sink_module.get_backend()
+
+    # Neo4jSink exposes get_driver(); NeptuneSink exposes get_writer()
+    if hasattr(backend, "get_driver"):
+        return backend.get_driver()
+
+    if hasattr(backend, "get_writer"):
+        return backend.get_writer()
+
+    raise RuntimeError("Active sink backend does not expose a driver handle")
+
+
+def verify_connectivity() -> None:
+    """Raise if the configured graph database is unreachable on the API read path.
+
+    Backend-agnostic entry point for the readiness probe: Neo4j verifies its
+    driver, Neptune verifies the reader endpoint.
+    """
+    sink_module.get_backend().verify_connectivity()
+
+
+def get_uri() -> str:
+    """Return the sink URI. Retained for backwards compatibility."""
+    if settings.ATTACK_PATHS_SINK_DATABASE == "neptune":
+        cfg = settings.DATABASES["neptune"]
+        return f"bolt+s://{cfg['WRITER_ENDPOINT']}:{cfg['PORT']}"
+
+    cfg = settings.DATABASES["neo4j"]
+    return f"bolt://{cfg['HOST']}:{cfg['PORT']}"
+
+
+def get_ingest_uri() -> str:
+    """Neo4j URI for the cartography temp (ingest) database, which is always
+    Neo4j regardless of the configured sink."""
+    return ingest.get_uri()
+
+
+# Session API
+
+
+def get_session(
+    database: str | None = None,
+    default_access_mode: str | None = None,
+) -> AbstractContextManager:
+    """Return a session against the right backend.
+
+    - `database` names starting with `db-tmp-scan-` always go to ingest.
+    - No database name → ingest (used for CREATE / DROP DATABASE admin ops).
+    - Any other name → sink.
+    """
+    if _is_ingest_database(database) or database is None:
+        return ingest.get_session(
+            database=database, default_access_mode=default_access_mode
+        )
+
+    return sink_module.get_backend().get_session(
+        database=database, default_access_mode=default_access_mode
+    )
+
+
+def execute_read_query(
+    database: str,
+    cypher: str,
+    parameters: dict[str, Any] | None = None,
+) -> neo4j.graph.Graph:
+    """Read-only query against the sink."""
+    return sink_module.get_backend().execute_read_query(database, cypher, parameters)
+
+
+def create_database(database: str) -> None:
+    """Create a database. Temp DBs always land on ingest (Neo4j).
+
+    On the Neo4j sink, tenant DBs also route to ingest because both drivers
+    connect to the same Neo4j cluster. On the Neptune sink, tenant DB creates
+    are no-ops.
+    """
+    if _is_ingest_database(database):
+        ingest.create_database(database)
+        return
+
+    sink_module.get_backend().create_database(database)
+
+
+def drop_database(database: str) -> None:
+    """Drop a database. Mirrors `create_database` routing."""
+    if _is_ingest_database(database):
+        ingest.drop_database(database)
+        return
+
+    sink_module.get_backend().drop_database(database)
+
+
+def drop_subgraph(database: str, provider_id: str) -> int:
+    return sink_module.get_backend().drop_subgraph(database, provider_id)
+
+
+def has_provider_data(database: str, provider_id: str) -> bool:
+    return sink_module.get_backend().has_provider_data(database, provider_id)
+
+
+def clear_cache(database: str) -> None:
+    if _is_ingest_database(database):
+        ingest.clear_cache(database)
+        return
+
+    sink_module.get_backend().clear_cache(database)
+
+
+# Name helper
+
+
+def get_database_name(entity_id: str | UUID, temporary: bool = False) -> str:
+    prefix = "tmp-scan" if temporary else "tenant"
+    return f"db-{prefix}-{str(entity_id).lower()}"
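
The routing rules in this facade differ slightly between sessions and database creation: `get_session` sends both temp databases and a missing database name to ingest, while `create_database` and `drop_database` send only temp databases there. A minimal sketch of that decision logic follows. `session_target` and `create_target` are hypothetical helpers that return a label instead of opening a connection; `TEMP_DB_PREFIX` and `get_database_name` mirror the code above.

```python
from typing import Optional, Union
from uuid import UUID

TEMP_DB_PREFIX = "db-tmp-scan-"


def get_database_name(entity_id: Union[str, UUID], temporary: bool = False) -> str:
    prefix = "tmp-scan" if temporary else "tenant"
    return f"db-{prefix}-{str(entity_id).lower()}"


def _is_ingest_database(database: Optional[str]) -> bool:
    return bool(database) and database.startswith(TEMP_DB_PREFIX)


def session_target(database: Optional[str]) -> str:
    """Hypothetical helper: which backend get_session() would pick."""
    # No database name means an admin session (CREATE / DROP DATABASE) on ingest.
    if _is_ingest_database(database) or database is None:
        return "ingest"
    return "sink"


def create_target(database: str) -> str:
    """Hypothetical helper: which backend create_database() / drop_database() would pick."""
    return "ingest" if _is_ingest_database(database) else "sink"
```

Because routing depends only on the name prefix, any caller that builds names through `get_database_name` gets consistent dispatch without knowing which sink is configured.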
@@ -0,0 +1,29 @@
+"""Cartography ingest layer.
+
+Public surface for the per-scan Neo4j temp database driver. Implementation
+lives in `api.attack_paths.ingest.driver`.
+"""
+
+from api.attack_paths.ingest.driver import (
+    clear_cache,
+    close_driver,
+    create_database,
+    drop_database,
+    get_driver,
+    get_session,
+    get_uri,
+    init_driver,
+    run_cypher,
+)
+
+__all__ = [
+    "clear_cache",
+    "close_driver",
+    "create_database",
+    "drop_database",
+    "get_driver",
+    "get_session",
+    "get_uri",
+    "init_driver",
+    "run_cypher",
+]
@@ -0,0 +1,187 @@
+"""Cartography ingest driver: per-scan throw-away Neo4j database.
+
+Cartography writes each scan's graph into a throw-away Neo4j database named
+`db-tmp-scan-{scan_uuid}`. This is always Neo4j, regardless of the configured
+sink: Neptune is single-database and cannot host per-scan throw-away
+databases. This module owns the Neo4j driver used for those temp DBs and the
+admin ops they need (CREATE / DROP DATABASE).
+"""
+
+import atexit
+import logging
+import threading
+from collections.abc import Iterator
+from contextlib import contextmanager
+from typing import Any
+
+import neo4j
+import neo4j.exceptions
+from api.attack_paths.retryable_session import RetryableSession
+from config.env import env
+from django.conf import settings
+
+logging.getLogger("neo4j").setLevel(logging.ERROR)
+logging.getLogger("neo4j").propagate = False
+
+SERVICE_UNAVAILABLE_MAX_RETRIES = env.int(
+    "ATTACK_PATHS_SERVICE_UNAVAILABLE_MAX_RETRIES", default=3
+)
+CONN_ACQUISITION_TIMEOUT = env.int("NEO4J_CONN_ACQUISITION_TIMEOUT", default=15)
+# TCP connect timeout, ordered below the acquisition timeout so an unreachable
+# host can't pin a worker on a temp-DB op longer than this.
+CONNECTION_TIMEOUT = env.int("NEO4J_CONNECTION_TIMEOUT", default=5)
+MAX_CONNECTION_LIFETIME = env.int("NEO4J_MAX_CONNECTION_LIFETIME", default=7200)
+MAX_CONNECTION_POOL_SIZE = env.int("NEO4J_MAX_CONNECTION_POOL_SIZE", default=50)
+
+_driver: neo4j.Driver | None = None
+_lock = threading.Lock()
+
+
+def _neo4j_config() -> dict:
+    return settings.DATABASES["neo4j"]
+
+
+def get_uri() -> str:
+    """Bolt URI for the Neo4j temp (ingest) database. Always Neo4j."""
+    config = _neo4j_config()
+    host = config["HOST"]
+    port = config["PORT"]
+    if not host or not port:
+        raise RuntimeError(
+            "NEO4J_HOST / NEO4J_PORT must be set to use the attack-paths "
+            "temp database. Workers require Neo4j env even when the sink is Neptune."
+        )
+
+    return f"bolt://{host}:{port}"
+
+
+def init_driver() -> neo4j.Driver:
+    """Initialize the temp-database Neo4j driver. Idempotent."""
+    global _driver
+    if _driver is not None:
+        return _driver
+
+    with _lock:
+        if _driver is None:
+            config = _neo4j_config()
+            _driver = neo4j.GraphDatabase.driver(
+                get_uri(),
+                auth=(config["USER"], config["PASSWORD"]),
+                keep_alive=True,
+                max_connection_lifetime=MAX_CONNECTION_LIFETIME,
+                connection_timeout=CONNECTION_TIMEOUT,
+                connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
+                max_connection_pool_size=MAX_CONNECTION_POOL_SIZE,
+            )
+            # Best-effort connectivity check: a Neo4j that is down at boot must
+            # not crash the worker. The driver reconnects lazily on first use.
+            try:
+                _driver.verify_connectivity()
+
+            except Exception:
+                logging.warning(
+                    "Neo4j temp-database unreachable at init; continuing with a "
+                    "lazily-reconnecting driver",
+                    exc_info=True,
+                )
+
+            atexit.register(close_driver)
+
+    return _driver
+
+
+def get_driver() -> neo4j.Driver:
+    return init_driver()
+
+
+def close_driver() -> None:
+    global _driver
+    with _lock:
+        if _driver is not None:
+            try:
+                _driver.close()
+            finally:
+                _driver = None
+
+
+@contextmanager
+def get_session(
+    database: str | None = None,
+    default_access_mode: str | None = None,
+) -> Iterator[RetryableSession]:
+    """Session against the Neo4j temp-database cluster. Used for temp DB sessions
+    and for admin operations (CREATE / DROP DATABASE) when `database` is None."""
+    from api.attack_paths.database import (
+        ClientStatementException,
+        GraphDatabaseQueryException,
+        WriteQueryNotAllowedException,
+    )
+
+    READ_EXCEPTION_CODES = [
+        "Neo.ClientError.Statement.AccessMode",
+        "Neo.ClientError.Procedure.ProcedureNotFound",
+    ]
+    CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."
+
+    session_wrapper: RetryableSession | None = None
+    try:
+        session_wrapper = RetryableSession(
+            session_factory=lambda: get_driver().session(
+                database=database, default_access_mode=default_access_mode
+            ),
+            max_retries=SERVICE_UNAVAILABLE_MAX_RETRIES,
+        )
+        yield session_wrapper
+
+    except neo4j.exceptions.Neo4jError as exc:
+        if (
+            default_access_mode == neo4j.READ_ACCESS
+            and exc.code
+            and exc.code in READ_EXCEPTION_CODES
+        ):
+            raise WriteQueryNotAllowedException(
+                message="Read query not allowed", code=READ_EXCEPTION_CODES[0]
+            )
+
+        message = exc.message if exc.message is not None else str(exc)
+        if exc.code and exc.code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
+            raise ClientStatementException(message=message, code=exc.code)
+        raise GraphDatabaseQueryException(message=message, code=exc.code)
+
+    finally:
+        if session_wrapper is not None:
+            session_wrapper.close()
+
+
+def create_database(database: str) -> None:
+    """Create a database on the Neo4j cluster. Used for temp scan DBs."""
+    with get_session() as session:
+        session.run("CREATE DATABASE $database IF NOT EXISTS", {"database": database})
+
+
+def drop_database(database: str) -> None:
+    """Drop a database on the Neo4j cluster. Used for temp scan DBs."""
+    with get_session() as session:
+        session.run(f"DROP DATABASE `{database}` IF EXISTS DESTROY DATA")
+
+
+def clear_cache(database: str) -> None:
+    """Best-effort cache clear for a Neo4j database."""
+    from api.attack_paths.database import GraphDatabaseQueryException
+
+    try:
+        with get_session(database) as session:
+            session.run("CALL db.clearQueryCaches()")
+
+    except GraphDatabaseQueryException as exc:
+        logging.warning(f"Failed to clear query cache for database `{database}`: {exc}")
+
+
+def run_cypher(
+    database: str | None,
+    cypher: str,
+    parameters: dict[str, Any] | None = None,
+) -> Any:
+    """Execute Cypher directly without the context manager. Thin helper."""
+    with get_session(database) as session:
+        return session.run(cypher, parameters or {})
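
`init_driver` and `close_driver` above rely on double-checked locking around a module-level singleton. The sketch below shows the same technique with a stand-in driver so it runs without Neo4j. It is an illustration under assumptions: `FakeDriver` and `DriverHolder` are hypothetical, and the state lives on an instance rather than in module globals as the real module does.

```python
import threading


class FakeDriver:
    """Stand-in for neo4j.Driver (assumption: only close() matters here)."""

    created = 0

    def __init__(self) -> None:
        FakeDriver.created += 1
        self.closed = False

    def close(self) -> None:
        self.closed = True


class DriverHolder:
    """Lazily creates one driver per holder, safe under concurrent first use."""

    def __init__(self, factory) -> None:
        self._factory = factory
        self._driver = None
        self._lock = threading.Lock()

    def get(self):
        # Fast path: no lock once the driver exists.
        if self._driver is not None:
            return self._driver
        with self._lock:
            # Second check: another thread may have created it while we waited.
            if self._driver is None:
                self._driver = self._factory()
        return self._driver

    def close(self) -> None:
        with self._lock:
            if self._driver is not None:
                try:
                    self._driver.close()
                finally:
                    # Reset even if close() raises, so the next get() rebuilds.
                    self._driver = None
```

The `try` / `finally` in `close` matches the source: the reference is cleared even when closing fails, so a later call creates a fresh driver instead of reusing a broken one.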
@@ -1,12 +1,14 @@
 from api.attack_paths.queries.aws import AWS_QUERIES
+
+# TODO: drop after Neptune cutover
+from api.attack_paths.queries.aws_deprecated import AWS_DEPRECATED_QUERIES
 from api.attack_paths.queries.types import AttackPathsQueryDefinition

-# Query definitions organized by provider
+# Query definitions for scans synced with the current schema.
 _QUERY_DEFINITIONS: dict[str, list[AttackPathsQueryDefinition]] = {
    "aws": AWS_QUERIES,
 }

-# Flat lookup by query ID for O(1) access
 _QUERIES_BY_ID: dict[str, AttackPathsQueryDefinition] = {
    definition.id: definition
    for definitions in _QUERY_DEFINITIONS.values()
@@ -14,11 +16,45 @@ _QUERIES_BY_ID: dict[str, AttackPathsQueryDefinition] = {
 }


-def get_queries_for_provider(provider: str) -> list[AttackPathsQueryDefinition]:
-    """Get all attack path queries for a specific provider."""
-    return _QUERY_DEFINITIONS.get(provider, [])
+# TODO: drop after Neptune cutover
+#
+# Query definitions for pre-cutover scans (`AttackPathsScan.is_migrated=False`)
+# whose graph data was written under the previous schema. Both maps expose the
+# same query IDs so the API contract is identical regardless of which set is
+# routed to.
+_DEPRECATED_QUERY_DEFINITIONS: dict[str, list[AttackPathsQueryDefinition]] = {
+    "aws": AWS_DEPRECATED_QUERIES,
+}
+
+_DEPRECATED_QUERIES_BY_ID: dict[str, AttackPathsQueryDefinition] = {
+    definition.id: definition
+    for definitions in _DEPRECATED_QUERY_DEFINITIONS.values()
+    for definition in definitions
+}


-def get_query_by_id(query_id: str) -> AttackPathsQueryDefinition | None:
-    """Get a specific attack path query by its ID."""
-    return _QUERIES_BY_ID.get(query_id)
+def get_queries_for_provider(
+    provider: str,
+    is_migrated: bool = True,
+) -> list[AttackPathsQueryDefinition]:
+    """Get all attack path queries for a provider.
+
+    `is_migrated` selects the catalog: True for scans synced with the current
+    schema, False for pre-cutover scans still using the legacy graph shape.
+    TODO: drop the `is_migrated` parameter after Neptune cutover.
+    """
+    catalog = _QUERY_DEFINITIONS if is_migrated else _DEPRECATED_QUERY_DEFINITIONS
+    return catalog.get(provider, [])
+
+
+def get_query_by_id(
+    query_id: str,
+    is_migrated: bool = True,
+) -> AttackPathsQueryDefinition | None:
+    """Get a specific attack path query by ID.
+
+    `is_migrated` selects the catalog (see `get_queries_for_provider`).
+    TODO: drop the `is_migrated` parameter after Neptune cutover.
+    """
+    by_id = _QUERIES_BY_ID if is_migrated else _DEPRECATED_QUERIES_BY_ID
+    return by_id.get(query_id)
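
The two-catalog lookup above can be illustrated in isolation. This is a minimal sketch rather than the module's code: `QueryDef`, the sample query ID, and the Cypher strings are hypothetical stand-ins for `AttackPathsQueryDefinition` and the real AWS catalogs. It only shows that both catalogs share IDs while the migration flag picks which definition is returned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryDef:
    """Hypothetical stand-in for AttackPathsQueryDefinition."""

    id: str
    cypher: str


# Both catalogs expose the same IDs; only the Cypher differs per schema.
CURRENT = {"aws": [QueryDef("aws-example", "MATCH (n:NewShape) RETURN n")]}
DEPRECATED = {"aws": [QueryDef("aws-example", "MATCH (n:OldShape) RETURN n")]}


def index_by_id(catalog):
    """Flatten a provider -> definitions map into an id -> definition map."""
    return {d.id: d for defs in catalog.values() for d in defs}


CURRENT_BY_ID = index_by_id(CURRENT)
DEPRECATED_BY_ID = index_by_id(DEPRECATED)


def get_query_by_id(query_id, is_migrated=True):
    """Pick the catalog from the scan's migration flag, then look up the ID."""
    by_id = CURRENT_BY_ID if is_migrated else DEPRECATED_BY_ID
    return by_id.get(query_id)
```

Because the ID sets match, a caller can pass the same `query_id` for any scan and only needs to forward the scan's migration flag.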
@@ -0,0 +1,28 @@
+"""Attack-paths sink database layer.
+
+The sink is the persistent store where attack-paths graphs live after a scan
+finishes. The backend is selectable between Neo4j (OSS / local dev default)
+and Amazon Neptune (hosted dev/staging/prod), and is picked by the
+`ATTACK_PATHS_SINK_DATABASE` setting when the sink is first initialized.
+
+This package exposes the public factory API; the implementation lives in
+`api.attack_paths.sink.factory`.
+"""
+
+from api.attack_paths.sink.factory import (
+    SinkBackend,
+    close,
+    get_backend,
+    get_backend_for_name,
+    get_backend_for_scan,
+    init,
+)
+
+__all__ = [
+    "SinkBackend",
+    "close",
+    "get_backend",
+    "get_backend_for_name",
+    "get_backend_for_scan",
+    "init",
+]
@@ -0,0 +1,92 @@
+"""Protocol every sink backend must implement."""
+
+from contextlib import AbstractContextManager
+from typing import Any, Protocol
+
+import neo4j
+
+
+class SinkDatabase(Protocol):
+    """Contract for the persistent attack-paths graph store.
+
+    The `database` argument is an opaque identifier passed through from the
+    legacy `database.py` API surface. On Neo4j it is the per-tenant database
+    name (e.g. `db-tenant-{uuid}`). On Neptune it is ignored (the cluster
+    has a single graph, and isolation is label-based).
+    """
+
+    def init(self) -> None: ...
+
+    def close(self) -> None: ...
+
+    def verify_connectivity(self) -> None:
+        """Raise if the backend the API read path uses is unreachable.
+
+        Neo4j verifies its single driver. Neptune verifies the reader
+        driver (the endpoint the API serves reads from); on single-endpoint
+        clusters the reader aliases the writer, so that path is covered too.
+        Used by the readiness probe; must not block longer than the caller's
+        probe budget.
+        """
+        ...
+
+    def get_session(
+        self,
+        database: str | None = None,
+        default_access_mode: str | None = None,
+    ) -> AbstractContextManager: ...
+
+    def execute_read_query(
+        self,
+        database: str,
+        cypher: str,
+        parameters: dict[str, Any] | None = None,
+    ) -> neo4j.graph.Graph: ...
+
+    def create_database(self, database: str) -> None: ...
+
+    def drop_database(self, database: str) -> None: ...
+
+    def drop_subgraph(self, database: str, provider_id: str) -> int: ...
+
+    def has_provider_data(self, database: str, provider_id: str) -> bool: ...
+
+    def clear_cache(self, database: str) -> None: ...
+
+    def ensure_sync_indexes(self, database: str) -> None:
+        """Create any index needed for the sync write path.
+
+        Called once at the start of each provider sync; must be idempotent.
+        Neo4j creates a `_provider_element_id` index on `_ProviderResource`;
+        Neptune is a no-op (its `~id` lookup needs no index).
+        """
+        ...
+
+    def write_nodes(
+        self,
+        database: str,
+        labels: str,
+        rows: list[dict[str, Any]],
+    ) -> None:
+        """Upsert a batch of nodes into the sink.
+
+        `labels` is a pre-rendered Cypher label string ready to drop after
+        the node variable (e.g. `` `AWSUser`:`_ProviderResource`:`_Tenant_x` ``).
+        Each row carries `provider_element_id` and `props`.
+        """
+        ...
+
+    def write_relationships(
+        self,
+        database: str,
+        rel_type: str,
+        provider_id: str,
+        rows: list[dict[str, Any]],
+    ) -> None:
+        """Upsert a batch of relationships into the sink.
+
+        Each row carries `start_element_id`, `end_element_id`,
+        `provider_element_id` and `props`. `rel_type` is the relationship
+        type (already a valid Cypher identifier).
+        """
+        ...
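
The `write_nodes` contract above takes a pre-rendered label string and rows carrying `provider_element_id` and `props`. The sketch below shows one way a caller might build those inputs. `render_labels` and `make_node_row` are hypothetical helpers, not functions from this change, and the ARN is made-up sample data.

```python
from typing import Any


def render_labels(labels: list[str]) -> str:
    """Backtick-quote each label and join with ':' (hypothetical helper)."""
    # Doubling a backtick escapes it inside a quoted Cypher identifier.
    return ":".join("`" + label.replace("`", "``") + "`" for label in labels)


def make_node_row(element_id: str, props: dict[str, Any]) -> dict[str, Any]:
    """Build one row in the shape the `write_nodes` docstring describes."""
    return {"provider_element_id": element_id, "props": props}


labels = render_labels(["AWSUser", "_ProviderResource", "_Tenant_x"])
rows = [make_node_row("arn:aws:iam::123456789012:user/alice", {"name": "alice"})]
```

Quoting every label before interpolation matters because the label string is spliced into the Cypher text rather than passed as a query parameter.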
@@ -0,0 +1,134 @@
+"""Sink backend factory and process-wide handle cache.
+
+Picks the active backend from `settings.ATTACK_PATHS_SINK_DATABASE` at first
+use, holds the active backend plus any secondary backends needed to serve
+scans written under the previous configuration, and tears them all down on
+process shutdown. Imported via `from api.attack_paths import sink as
+sink_module`.
+"""
+
+import threading
+from enum import StrEnum, auto
+
+from api.attack_paths.sink.base import SinkDatabase
+from api.models import AttackPathsScan
+from django.conf import settings
+
+# Backend names
+
+
+class SinkBackend(StrEnum):
+    NEO4J = auto()
+    NEPTUNE = auto()
+
+
+# Backend cache
+
+_backend: SinkDatabase | None = None
+_secondary_backends: dict[SinkBackend, SinkDatabase] = {}
+_lock = threading.Lock()
+
+
+def _resolve_setting() -> SinkBackend:
+    raw = settings.ATTACK_PATHS_SINK_DATABASE.lower()
+    try:
+        return SinkBackend(raw)
+
+    except ValueError as exc:
+        valid = sorted(b.value for b in SinkBackend)
+        raise RuntimeError(
+            f"ATTACK_PATHS_SINK_DATABASE must be one of {valid}; got {raw!r}"
+        ) from exc
+
+
+def _build_backend(name: SinkBackend) -> SinkDatabase:
+    if name is SinkBackend.NEO4J:
+        from api.attack_paths.sink.neo4j import Neo4jSink
+
+        return Neo4jSink()
+
+    if name is SinkBackend.NEPTUNE:
+        from api.attack_paths.sink.neptune import NeptuneSink
+
+        return NeptuneSink()
+
+    raise RuntimeError(f"Unknown sink backend {name!r}")
+
+
+# Lifecycle
+
+
+def init(name: SinkBackend | str | None = None) -> SinkDatabase:
+    """Initialize the configured sink backend. Idempotent."""
+    global _backend
+    if _backend is not None:
+        return _backend
+
+    with _lock:
+        if _backend is None:
+            resolved = SinkBackend(name) if name else _resolve_setting()
+            backend = _build_backend(resolved)
+            backend.init()
+            _backend = backend
+
+    return _backend
+
+
+def close() -> None:
+    """Close the active backend and every cached secondary backend."""
+    global _backend
+    with _lock:
+        backends = [
+            b for b in (_backend, *_secondary_backends.values()) if b is not None
+        ]
+        _backend = None
+        _secondary_backends.clear()
+
+    for backend in backends:
+        try:
+            backend.close()
+
+        except Exception:  # pragma: no cover - best-effort
+            pass
+
+
+def get_backend() -> SinkDatabase:
+    """Return the active sink. Initializes on first call."""
+    return init()
+
+
+# Per-scan routing
+
+
+def get_backend_for_scan(scan: AttackPathsScan) -> SinkDatabase:
+    """Route reads to the sink that stores this scan's graph."""
+    raw_backend = getattr(scan, "sink_backend", SinkBackend.NEO4J.value)
+    if not isinstance(raw_backend, str):
+        raw_backend = SinkBackend.NEO4J.value
+    return get_backend_for_name(raw_backend)
+
+
+def get_backend_for_name(name: SinkBackend | str) -> SinkDatabase:
+    """Return the backend named by persisted scan metadata."""
+    resolved = SinkBackend(name)
+    if resolved is _resolve_setting():
+        return get_backend()
+
+    return _build_backend_cached(resolved)
+
+
+def _build_backend_cached(name: SinkBackend) -> SinkDatabase:
+    # TODO: drop after Neptune cutover
+    # Needed only during cutover to serve Neo4j-written scans from a Neptune-
+    # configured API pod (and vice versa). Once every scan is on Neptune,
+    # `get_backend_for_scan` becomes a one-liner returning `get_backend()`.
+    if name in _secondary_backends:
+        return _secondary_backends[name]
+
+    with _lock:
+        if name not in _secondary_backends:
+            backend = _build_backend(name)
+            backend.init()
+            _secondary_backends[name] = backend
+
+    return _secondary_backends[name]
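
Both `init` and `_build_backend_cached` above follow a check, lock, re-check pattern so that concurrent first callers build a backend only once. The sketch below reproduces that pattern in isolation. `FakeBackend` and `get_cached` are hypothetical names used only for illustration.

```python
import threading


class FakeBackend:
    """Hypothetical backend that counts how often it is initialized."""

    init_calls = 0

    def init(self) -> None:
        type(self).init_calls += 1


_cache: dict[str, FakeBackend] = {}
_lock = threading.Lock()


def get_cached(name: str) -> FakeBackend:
    # Fast path: no lock is taken once the backend exists.
    if name in _cache:
        return _cache[name]
    with _lock:
        # Re-check under the lock: another thread may have won the race.
        if name not in _cache:
            backend = FakeBackend()
            backend.init()
            _cache[name] = backend
    return _cache[name]


# Eight threads race for the same backend; only one should initialize it.
threads = [threading.Thread(target=get_cached, args=("neo4j",)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The backend is stored in the cache only after `init()` returns, so a thread on the fast path never observes a half-initialized instance.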
@@ -0,0 +1,454 @@
+"""Neo4j sink implementation.
+
+Owns a Neo4j driver independent from the staging driver. On OSS and local dev
+this is the only sink; on hosted deployments it runs only as a legacy read
+path while phase-1 drains tenant DBs.
+"""
+
+import atexit
+import logging
+import threading
+import time
+from collections.abc import Iterator
+from contextlib import AbstractContextManager, contextmanager
+from typing import Any
+
+import neo4j
+import neo4j.exceptions
+from api.attack_paths.retryable_session import RetryableSession
+from api.attack_paths.sink.base import SinkDatabase
+from config.env import env
+from django.conf import settings
+
+logging.getLogger("neo4j").setLevel(logging.ERROR)
+logging.getLogger("neo4j").propagate = False
+
+logger = logging.getLogger(__name__)
+
+SERVICE_UNAVAILABLE_MAX_RETRIES = env.int(
+    "ATTACK_PATHS_SERVICE_UNAVAILABLE_MAX_RETRIES", default=3
+)
+READ_QUERY_TIMEOUT_SECONDS = env.int(
+    "ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30
+)
+CONN_ACQUISITION_TIMEOUT = env.int("NEO4J_CONN_ACQUISITION_TIMEOUT", default=15)
+# TCP connect timeout, ordered below the acquisition timeout so an unreachable
+# host can't pin a request or the readiness probe longer than this.
+CONNECTION_TIMEOUT = env.int("NEO4J_CONNECTION_TIMEOUT", default=5)
+MAX_CONNECTION_LIFETIME = env.int("NEO4J_MAX_CONNECTION_LIFETIME", default=7200)
+MAX_CONNECTION_POOL_SIZE = env.int("NEO4J_MAX_CONNECTION_POOL_SIZE", default=50)
+
+READ_EXCEPTION_CODES = [
+    "Neo.ClientError.Statement.AccessMode",
+    "Neo.ClientError.Procedure.ProcedureNotFound",
+]
+CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."
+DATABASE_NOT_FOUND_CODE = "Neo.ClientError.Database.DatabaseNotFound"
+
+
+class Neo4jSink(SinkDatabase):
+    """Neo4j-backed sink. Multi-database cluster; tenant isolation is physical."""
+
+    def __init__(self) -> None:
+        self._driver: neo4j.Driver | None = None
+        self._lock = threading.Lock()
+        self._atexit_registered = False
+
+    # Driver
+
+    def _config(self) -> dict:
+        return settings.DATABASES["neo4j"]
+
+    def _uri(self) -> str:
+        cfg = self._config()
+        host = cfg["HOST"]
+        port = cfg["PORT"]
+        if not host or not port:
+            raise RuntimeError(
+                "NEO4J_HOST / NEO4J_PORT must be set when ATTACK_PATHS_SINK_DATABASE=neo4j"
+            )
+        return f"bolt://{host}:{port}"
+
+    def init(self) -> neo4j.Driver:
+        if self._driver is not None:
+            return self._driver
+        with self._lock:
+            if self._driver is None:
+                cfg = self._config()
+                self._driver = neo4j.GraphDatabase.driver(
+                    self._uri(),
+                    auth=(cfg["USER"], cfg["PASSWORD"]),
+                    keep_alive=True,
+                    max_connection_lifetime=MAX_CONNECTION_LIFETIME,
+                    connection_timeout=CONNECTION_TIMEOUT,
+                    connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
+                    max_connection_pool_size=MAX_CONNECTION_POOL_SIZE,
+                )
+                # The eager connectivity check is best-effort. A Neo4j that is
+                # down at boot must not crash the process (same degradation
+                # model as Postgres): the driver reconnects lazily on first
+                # use, and /health/ready surfaces the outage until it recovers.
+                try:
+                    self._driver.verify_connectivity()
+
+                except Exception:
+                    logger.warning(
+                        "Neo4j sink unreachable at init; continuing with a lazily-reconnecting driver",
+                        exc_info=True,
+                    )
+
+                if not self._atexit_registered:
+                    atexit.register(self.close)
+                    self._atexit_registered = True
+        return self._driver
+
+    def _get_driver(self) -> neo4j.Driver:
+        return self.init()
+
+    def verify_connectivity(self) -> None:
+        self._get_driver().verify_connectivity()
+
+    def close(self) -> None:
+        with self._lock:
+            if self._driver is not None:
+                try:
+                    self._driver.close()
+                finally:
+                    self._driver = None
+
+    # Sessions
+
+    @contextmanager
+    def get_session(
+        self,
+        database: str | None = None,
+        default_access_mode: str | None = None,
+    ) -> Iterator[RetryableSession]:
+        from api.attack_paths.database import (
+            ClientStatementException,
+            GraphDatabaseQueryException,
+            WriteQueryNotAllowedException,
+        )
+
+        session_wrapper: RetryableSession | None = None
+        try:
+            session_wrapper = RetryableSession(
+                session_factory=lambda: self._get_driver().session(
+                    database=database, default_access_mode=default_access_mode
+                ),
+                max_retries=SERVICE_UNAVAILABLE_MAX_RETRIES,
+            )
+            yield session_wrapper
+
+        except neo4j.exceptions.Neo4jError as exc:
+            if (
+                default_access_mode == neo4j.READ_ACCESS
+                and exc.code
+                and exc.code in READ_EXCEPTION_CODES
+            ):
+                raise WriteQueryNotAllowedException(
+                    message="Read query not allowed", code=READ_EXCEPTION_CODES[0]
+                )
+
+            message = exc.message if exc.message is not None else str(exc)
+            if exc.code and exc.code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
+                raise ClientStatementException(message=message, code=exc.code)
+            raise GraphDatabaseQueryException(message=message, code=exc.code)
+
+        finally:
+            if session_wrapper is not None:
+                session_wrapper.close()
+
+    # Operations
+
+    def execute_read_query(
+        self,
+        database: str,
+        cypher: str,
+        parameters: dict[str, Any] | None = None,
+    ) -> neo4j.graph.Graph:
+        with self.get_session(
+            database, default_access_mode=neo4j.READ_ACCESS
+        ) as session:
+
+            def _run(tx: neo4j.ManagedTransaction) -> neo4j.graph.Graph:
+                result = tx.run(
+                    cypher, parameters or {}, timeout=READ_QUERY_TIMEOUT_SECONDS
+                )
+                return result.graph()
+
+            return session.execute_read(_run)
+
+    def create_database(self, database: str) -> None:
+        with self.get_session() as session:
+            session.run(
+                "CREATE DATABASE $database IF NOT EXISTS", {"database": database}
+            )
+
+    def drop_database(self, database: str) -> None:
+        with self.get_session() as session:
+            session.run(f"DROP DATABASE `{database}` IF EXISTS DESTROY DATA")
+
+    def drop_subgraph(self, database: str, provider_id: str) -> int:
+        """Delete all nodes for a provider from a tenant database, batched.
+
+        Deletes relationships then nodes in batches (not `DETACH DELETE`) so a
+        dense provider's graph cannot exceed Neo4j's transaction memory limit.
+        Silently returns 0 if the database doesn't exist.
+        """
+        from api.attack_paths.database import GraphDatabaseQueryException
+        from tasks.jobs.attack_paths.config import (
+            BATCH_SIZE,
+            PROVIDER_RESOURCE_LABEL,
+            get_provider_label,
+        )
+
+        provider_label = get_provider_label(provider_id)
+        deleted_nodes = 0
+        deleted_relationships = 0
+        relationship_batches = 0
+        node_batches = 0
+        drop_t0 = time.perf_counter()
+
+        logger.info(
+            "Dropping provider graph from Neo4j sink database %s "
+            "(provider=%s, provider_label=%s)",
+            database,
+            provider_id,
+            provider_label,
+        )
+
+        try:
+            logger.info(
+                "Opening Neo4j sink session for provider graph drop "
+                "(database=%s, provider=%s)",
+                database,
+                provider_id,
+            )
+            with self.get_session(database) as session:
+                logger.info(
+                    "Opened Neo4j sink session for provider graph drop "
+                    "(database=%s, provider=%s)",
+                    database,
+                    provider_id,
+                )
+                # Phase 1: delete relationships incident to provider nodes in
+                # batches. The undirected pattern matches an edge between two
+                # provider nodes from both ends, so `DISTINCT r` dedupes it to
+                # delete a full batch of unique relationships each round.
+                deleted_count = 1
+                while deleted_count > 0:
+                    next_batch = relationship_batches + 1
+                    logger.info(
+                        "Deleting relationship batch from Neo4j sink database %s "
+                        "(provider=%s, batch=%s, total_rels=%s, elapsed=%.3fs)",
+                        database,
+                        provider_id,
+                        next_batch,
+                        deleted_relationships,
+                        time.perf_counter() - drop_t0,
+                    )
+                    result = session.run(
+                        f"""
+                        MATCH (:`{provider_label}`)-[r]-()
+                        WITH DISTINCT r LIMIT $batch_size
+                        DELETE r
+                        RETURN COUNT(r) AS deleted_rels_count
+                        """,
+                        {"batch_size": BATCH_SIZE},
+                    )
+                    deleted_count = result.single().get("deleted_rels_count", 0)
+                    if deleted_count > 0:
+                        relationship_batches += 1
+                        deleted_relationships += deleted_count
+                        logger.info(
+                            "Deleted relationship batch from Neo4j sink database %s "
+                            "(provider=%s, batch=%s, deleted_rels=%s, "
+                            "total_rels=%s, elapsed=%.3fs)",
+                            database,
+                            provider_id,
+                            relationship_batches,
+                            deleted_count,
+                            deleted_relationships,
+                            time.perf_counter() - drop_t0,
+                        )
+
+                # Phase 2: delete the now relationship-free nodes in batches.
+                deleted_count = 1
+                while deleted_count > 0:
+                    next_batch = node_batches + 1
+                    logger.info(
+                        "Deleting node batch from Neo4j sink database %s "
+                        "(provider=%s, batch=%s, total_nodes=%s, elapsed=%.3fs)",
+                        database,
+                        provider_id,
+                        next_batch,
+                        deleted_nodes,
+                        time.perf_counter() - drop_t0,
+                    )
+                    result = session.run(
+                        f"""
+                        MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`)
+                        WITH n LIMIT $batch_size
+                        DELETE n
+                        RETURN COUNT(n) AS deleted_nodes_count
+                        """,
+                        {"batch_size": BATCH_SIZE},
+                    )
+                    deleted_count = result.single().get("deleted_nodes_count", 0)
+                    if deleted_count > 0:
+                        node_batches += 1
+                        deleted_nodes += deleted_count
+                        logger.info(
+                            "Deleted node batch from Neo4j sink database %s "
+                            "(provider=%s, batch=%s, deleted_nodes=%s, "
+                            "total_nodes=%s, elapsed=%.3fs)",
+                            database,
+                            provider_id,
+                            node_batches,
+                            deleted_count,
+                            deleted_nodes,
+                            time.perf_counter() - drop_t0,
+                        )
+
+        except GraphDatabaseQueryException as exc:
+            if exc.code == DATABASE_NOT_FOUND_CODE:
+                logger.info(
+                    "Skipped provider graph drop from Neo4j sink database %s "
+                    "(provider=%s, reason=database_not_found, elapsed=%.3fs)",
+                    database,
+                    provider_id,
+                    time.perf_counter() - drop_t0,
+                )
+                return 0
+            raise
+
+        logger.info(
+            "Finished dropping provider graph from Neo4j sink database %s "
+            "(provider=%s, relationship_batches=%s, deleted_rels=%s, "
+            "node_batches=%s, deleted_nodes=%s, elapsed=%.3fs)",
+            database,
+            provider_id,
+            relationship_batches,
+            deleted_relationships,
+            node_batches,
+            deleted_nodes,
+            time.perf_counter() - drop_t0,
+        )
+        return deleted_nodes
+
+    def has_provider_data(self, database: str, provider_id: str) -> bool:
+        from api.attack_paths.database import GraphDatabaseQueryException
+        from tasks.jobs.attack_paths.config import (
+            PROVIDER_RESOURCE_LABEL,
+            get_provider_label,
+        )
+
+        provider_label = get_provider_label(provider_id)
+        query = (
+            f"MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`) RETURN 1 LIMIT 1"
+        )
+        try:
+            with self.get_session(
+                database, default_access_mode=neo4j.READ_ACCESS
+            ) as session:
+                result = session.run(query)
+                return result.single() is not None
+
+        except GraphDatabaseQueryException as exc:
+            if exc.code == DATABASE_NOT_FOUND_CODE:
+                return False
+            raise
+
+    def clear_cache(self, database: str) -> None:
+        from api.attack_paths.database import GraphDatabaseQueryException
+
+        try:
+            with self.get_session(database) as session:
+                session.run("CALL db.clearQueryCaches()")
+        except GraphDatabaseQueryException as exc:
+            logger.warning(
+                "Failed to clear query cache for database `%s`: %s", database, exc
+            )
+
+    # Sync write path
+
+    def ensure_sync_indexes(self, database: str) -> None:
+        """Create the `_provider_element_id` lookup index on `_ProviderResource`.
+
+        Every synced node carries the `_ProviderResource` label, so a single
+        index covers both node-upserts and relationship endpoint MATCHes.
+        Without this index the rel sync degrades to a label scan per row and
+        large provider syncs become unworkable.
+        """
+        from tasks.jobs.attack_paths.config import (
+            PROVIDER_ELEMENT_ID_PROPERTY,
+            PROVIDER_RESOURCE_LABEL,
+        )
+
+        query = (
+            f"CREATE INDEX provider_element_id_idx IF NOT EXISTS "
+            f"FOR (n:`{PROVIDER_RESOURCE_LABEL}`) "
+            f"ON (n.`{PROVIDER_ELEMENT_ID_PROPERTY}`)"
+        )
+        with self.get_session(database) as session:
+            session.run(query).consume()
+
+    def write_nodes(
+        self,
+        database: str,
+        labels: str,
+        rows: list[dict[str, Any]],
+    ) -> None:
+        if not rows:
+            return
+        from tasks.jobs.attack_paths.config import (
+            PROVIDER_ELEMENT_ID_PROPERTY,
+            PROVIDER_RESOURCE_LABEL,
+        )
+
+        query = f"""
+            UNWIND $rows AS row
+            MERGE (n:`{PROVIDER_RESOURCE_LABEL}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.provider_element_id}})
+            SET n:{labels}
+            SET n += row.props
+        """
+        with self.get_session(database) as session:
+            session.run(query, {"rows": rows}).consume()
+
+    def write_relationships(
+        self,
+        database: str,
+        rel_type: str,
+        provider_id: str,
+        rows: list[dict[str, Any]],
+    ) -> None:
+        if not rows:
+            return
+        from tasks.jobs.attack_paths.config import (
+            PROVIDER_ELEMENT_ID_PROPERTY,
+            PROVIDER_RESOURCE_LABEL,
+            get_provider_label,
+        )
+
+        provider_label = get_provider_label(provider_id)
+        query = f"""
+            UNWIND $rows AS row
+            MATCH (s:`{PROVIDER_RESOURCE_LABEL}`:`{provider_label}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.start_element_id}})
+            MATCH (t:`{PROVIDER_RESOURCE_LABEL}`:`{provider_label}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.end_element_id}})
+            MERGE (s)-[r:`{rel_type}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.provider_element_id}}]->(t)
+            SET r += row.props
+        """
+        with self.get_session(database) as session:
+            session.run(query, {"rows": rows}).consume()
+
+    # For compatibility with test harnesses that patch the concrete driver
+    def get_driver(self) -> neo4j.Driver:
+        return self._get_driver()
+
+
+# Helper for tests / external callers that want a read-only session specifically
+def get_read_session(
+    sink: Neo4jSink, database: str
+) -> AbstractContextManager[RetryableSession]:
+    return sink.get_session(database, default_access_mode=neo4j.READ_ACCESS)
@@ -0,0 +1,524 @@
+"""AWS Neptune sink implementation.
+
+Dual Bolt drivers: one against the writer endpoint for workers, one against
+the reader endpoint for the API read path. If `NEPTUNE_READER_ENDPOINT` is
+unset the reader falls back to the writer driver so single-node clusters work.
+
+Neptune is single-database. The `database` argument on the SinkDatabase
+protocol is ignored; tenant / provider isolation is enforced by labels that
+the sync step already writes on every node (see tasks/jobs/attack_paths/sync.py).
+
+SigV4 auth lives at the bottom of this file as `neptune_auth_provider`. The
+neo4j driver invokes the returned callable on each token refresh.
+"""
+
+import atexit
+import datetime
+import json
+import logging
+import threading
+import time
+from collections.abc import Callable, Iterator
+from contextlib import contextmanager
+from typing import Any
+from urllib.parse import urlsplit
+
+import neo4j
+import neo4j.exceptions
+from api.attack_paths.retryable_session import RetryableSession
+from api.attack_paths.sink.base import SinkDatabase
+from botocore.auth import SigV4Auth
+from botocore.awsrequest import AWSRequest
+from botocore.session import Session as BotoSession
+from config.env import env
+from django.conf import settings
+from neo4j.auth_management import AuthManagers, ExpiringAuth
+
+logging.getLogger("neo4j").setLevel(logging.ERROR)
+logging.getLogger("neo4j").propagate = False
+
+logger = logging.getLogger(__name__)
+
+SERVICE_UNAVAILABLE_MAX_RETRIES = env.int(
+    "ATTACK_PATHS_SERVICE_UNAVAILABLE_MAX_RETRIES", default=3
+)
+READ_QUERY_TIMEOUT_SECONDS = env.int(
+    "ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30
+)
+# Neptune serverless cold-start can be >30s; give the driver room
+CONN_ACQUISITION_TIMEOUT = env.int("NEPTUNE_CONN_ACQUISITION_TIMEOUT", default=60)
+# TCP connect timeout, ordered below the acquisition timeout so an unreachable
+# endpoint can't pin a request or the readiness probe longer than this. Kept
+# generous: cold-start delays query execution, not the socket connect.
+CONNECTION_TIMEOUT = env.int("NEPTUNE_CONNECTION_TIMEOUT", default=10)
+# Roll connections hourly so SigV4 rotations and cert refreshes don't strand long-lived pool entries
+MAX_CONNECTION_LIFETIME = env.int("NEPTUNE_MAX_CONNECTION_LIFETIME", default=3600)
+MAX_CONNECTION_POOL_SIZE = env.int("NEPTUNE_MAX_CONNECTION_POOL_SIZE", default=50)
+
+READ_EXCEPTION_CODES = [
+    "Neo.ClientError.Statement.AccessMode",
+    "Neo.ClientError.Procedure.ProcedureNotFound",
+]
+CLIENT_STATEMENT_EXCEPTION_PREFIX = "Neo.ClientError.Statement."
+
+# Refresh 60s before the 5-minute SigV4 window closes
+SIGV4_TOKEN_LIFETIME_MINUTES = 4
+
+
+class NeptuneSink(SinkDatabase):
+    """Neptune-backed sink. Single database; isolation is label-based."""
+
+    def __init__(self) -> None:
+        self._writer: neo4j.Driver | None = None
+        self._reader: neo4j.Driver | None = None
+        self._lock = threading.Lock()
+        self._atexit_registered = False
+
+    # Config
+
+    def _config(self) -> dict:
+        return settings.DATABASES["neptune"]
+
+    def _bolt_uri(self, endpoint: str, port: str) -> str:
+        return f"bolt+s://{endpoint}:{port}"
+
+    def _https_url(self, endpoint: str, port: str) -> str:
+        return f"https://{endpoint}:{port}"
+
+    def _build_driver(self, endpoint: str) -> neo4j.Driver:
+        cfg = self._config()
+        port = cfg["PORT"]
+        region = cfg["REGION"]
+        if not endpoint or not region:
+            raise RuntimeError(
+                "NEPTUNE_WRITER_ENDPOINT and AWS_REGION must be set when "
+                "ATTACK_PATHS_SINK_DATABASE=neptune"
+            )
+        return neo4j.GraphDatabase.driver(
+            self._bolt_uri(endpoint, port),
+            auth=AuthManagers.bearer(
+                neptune_auth_provider(region, self._https_url(endpoint, port))
+            ),
+            keep_alive=True,
+            max_connection_lifetime=MAX_CONNECTION_LIFETIME,
+            connection_timeout=CONNECTION_TIMEOUT,
+            connection_acquisition_timeout=CONN_ACQUISITION_TIMEOUT,
+            max_connection_pool_size=MAX_CONNECTION_POOL_SIZE,
+            max_transaction_retry_time=0,
+        )
+
+    # Lifecycle
+
+    def init(self) -> None:
+        if self._writer is not None:
+            return
+        with self._lock:
+            if self._writer is None:
+                cfg = self._config()
+                writer_endpoint = cfg["WRITER_ENDPOINT"]
+                reader_endpoint = cfg["READER_ENDPOINT"] or writer_endpoint
+
+                # Eager connectivity checks are best-effort. A Neptune that is
+                # down at boot must not crash the process (same degradation
+                # model as Postgres): drivers reconnect lazily on first use,
+                # and /health/ready surfaces the outage until it recovers.
+                self._writer = self._build_driver(writer_endpoint)
+                self._verify_best_effort(self._writer, "writer")
+
+                if reader_endpoint == writer_endpoint:
+                    self._reader = self._writer
+
+                else:
+                    self._reader = self._build_driver(reader_endpoint)
+                    self._verify_best_effort(self._reader, "reader")
+
+                if not self._atexit_registered:
+                    atexit.register(self.close)
+                    self._atexit_registered = True
+
+    def close(self) -> None:
+        with self._lock:
+            # `Driver.close()` is idempotent, so closing the same driver twice
+            # (when reader aliases writer on single-endpoint configs) is safe
+            for driver in (self._reader, self._writer):
+                if driver is None:
+                    continue
+                try:
+                    driver.close()
+                except Exception:  # pragma: no cover - best-effort
+                    pass
+            self._writer = None
+            self._reader = None
+
+    # Sessions
+
+    def _get_writer(self) -> neo4j.Driver:
+        self.init()
+        assert self._writer is not None
+        return self._writer
+
+    def _get_reader(self) -> neo4j.Driver:
+        self.init()
+        assert self._reader is not None
+        return self._reader
+
+    @staticmethod
+    def _verify_best_effort(driver: neo4j.Driver, role: str) -> None:
+        try:
+            driver.verify_connectivity()
+
+        except Exception:
+            logger.warning(
+                "Neptune %s endpoint unreachable at init; continuing with a lazily-reconnecting driver",
+                role,
+                exc_info=True,
+            )
+
+    def verify_connectivity(self) -> None:
+        # The API read path uses the reader driver.
+        # On single-endpoint clusters it aliases the writer, so this also covers the writer.
+        # A writer-only outage concerns only the workers (which have no HTTP probe) and deliberately does not fail API readiness.
+        self._get_reader().verify_connectivity()
+
+    @contextmanager
+    def get_session(
+        self,
+        database: str | None = None,  # noqa: ARG002 - ignored on Neptune
+        default_access_mode: str | None = None,
+    ) -> Iterator[RetryableSession]:
+        from api.attack_paths.database import (
+            ClientStatementException,
+            GraphDatabaseQueryException,
+            WriteQueryNotAllowedException,
+        )
+
+        driver = (
+            self._get_reader()
+            if default_access_mode == neo4j.READ_ACCESS
+            else self._get_writer()
+        )
+
+        session_wrapper: RetryableSession | None = None
+        try:
+            session_wrapper = RetryableSession(
+                session_factory=lambda: driver.session(
+                    default_access_mode=default_access_mode
+                ),
+                max_retries=SERVICE_UNAVAILABLE_MAX_RETRIES,
+            )
+            yield session_wrapper
+
+        except neo4j.exceptions.Neo4jError as exc:
+            if (
+                default_access_mode == neo4j.READ_ACCESS
+                and exc.code
+                and exc.code in READ_EXCEPTION_CODES
+            ):
+                raise WriteQueryNotAllowedException(
+                    message="Read query not allowed", code=READ_EXCEPTION_CODES[0]
+                )
+
+            message = exc.message if exc.message is not None else str(exc)
+            if exc.code and exc.code.startswith(CLIENT_STATEMENT_EXCEPTION_PREFIX):
+                raise ClientStatementException(message=message, code=exc.code)
+            raise GraphDatabaseQueryException(message=message, code=exc.code)
+
+        finally:
+            if session_wrapper is not None:
+                session_wrapper.close()
+
+    # Operations
+
+    def execute_read_query(
+        self,
+        database: str,  # noqa: ARG002 - ignored on Neptune
+        cypher: str,
+        parameters: dict[str, Any] | None = None,
+    ) -> neo4j.graph.Graph:
+        with self.get_session(default_access_mode=neo4j.READ_ACCESS) as session:
+
+            def _run(tx: neo4j.ManagedTransaction) -> neo4j.graph.Graph:
+                result = tx.run(
+                    cypher, parameters or {}, timeout=READ_QUERY_TIMEOUT_SECONDS
+                )
+                return result.graph()
+
+            return session.execute_read(_run)
+
+    def create_database(self, database: str) -> None:  # noqa: ARG002
+        # Neptune clusters are single-database; there is nothing to create.
+        return None
+
+    def drop_database(self, database: str) -> None:  # noqa: ARG002
+        # Neptune clusters are single-database; there is nothing to drop.
+        return None
+
+    def drop_subgraph(self, database: str, provider_id: str) -> int:  # noqa: ARG002
+        """Delete a provider's subgraph in two bounded phases.
+
+        Neptune write transactions are capped at ~2 minutes. A naive
+        `DETACH DELETE` on a label-scanned batch grows unbounded with graph
+        density (one node can drag thousands of relationships into the same
+        transaction). Instead:
+
+        1. Delete relationships incident to provider nodes, one fixed-size
+           batch per transaction.
+        2. Delete the now-orphaned nodes, one fixed-size batch per transaction.
+
+        Each transaction does work proportional to `batch_size`, never to the
+        graph's branching factor.
+        """
+        from tasks.jobs.attack_paths.config import (
+            BATCH_SIZE,
+            PROVIDER_RESOURCE_LABEL,
+            get_provider_label,
+        )
+
+        provider_label = get_provider_label(provider_id)
+        deleted_relationships = 0
+        relationship_batches = 0
+        node_batches = 0
+        drop_t0 = time.perf_counter()
+
+        logger.info(
+            "Dropping provider graph from Neptune sink "
+            "(provider=%s, provider_label=%s)",
+            provider_id,
+            provider_label,
+        )
+
+        logger.info(
+            "Opening Neptune writer session for provider graph drop (provider=%s)",
+            provider_id,
+        )
+        with self.get_session() as session:
+            logger.info(
+                "Opened Neptune writer session for provider graph drop (provider=%s)",
+                provider_id,
+            )
+            while True:
+                next_batch = relationship_batches + 1
+                logger.info(
+                    "Deleting relationship batch from Neptune sink "
+                    "(provider=%s, batch=%s, total_rels=%s, elapsed=%.3fs)",
+                    provider_id,
+                    next_batch,
+                    deleted_relationships,
+                    time.perf_counter() - drop_t0,
+                )
+                result = session.run(
+                    f"""
+                    MATCH (:`{provider_label}`)-[r]-()
+                    WITH DISTINCT r LIMIT $batch_size
+                    DELETE r
+                    RETURN COUNT(r) AS deleted_rels_count
+                    """,
+                    {"batch_size": BATCH_SIZE},
+                )
+                record = result.single()
+                deleted_rels = (record["deleted_rels_count"] if record else 0) or 0
+                if deleted_rels == 0:
+                    break
+                relationship_batches += 1
+                deleted_relationships += deleted_rels
+                logger.info(
+                    "Deleted relationship batch from Neptune sink "
+                    "(provider=%s, batch=%s, deleted_rels=%s, total_rels=%s, "
+                    "elapsed=%.3fs)",
+                    provider_id,
+                    relationship_batches,
+                    deleted_rels,
+                    deleted_relationships,
+                    time.perf_counter() - drop_t0,
+                )
+
+            deleted_nodes = 0
+            while True:
+                next_batch = node_batches + 1
+                logger.info(
+                    "Deleting node batch from Neptune sink "
+                    "(provider=%s, batch=%s, total_nodes=%s, elapsed=%.3fs)",
+                    provider_id,
+                    next_batch,
+                    deleted_nodes,
+                    time.perf_counter() - drop_t0,
+                )
+                result = session.run(
+                    f"""
+                    MATCH (n:`{PROVIDER_RESOURCE_LABEL}`:`{provider_label}`)
+                    WITH n LIMIT $batch_size
+                    DELETE n
+                    RETURN COUNT(n) AS deleted_nodes_count
+                    """,
+                    {"batch_size": BATCH_SIZE},
+                )
+                record = result.single()
+                deleted = (record["deleted_nodes_count"] if record else 0) or 0
+                if deleted == 0:
+                    break
+                node_batches += 1
+                deleted_nodes += deleted
+                logger.info(
+                    "Deleted node batch from Neptune sink "
+                    "(provider=%s, batch=%s, deleted_nodes=%s, total_nodes=%s, "
+                    "elapsed=%.3fs)",
+                    provider_id,
+                    node_batches,
+                    deleted,
+                    deleted_nodes,
+                    time.perf_counter() - drop_t0,
+                )
+
+        logger.info(
+            "Finished dropping provider graph from Neptune sink "
+            "(provider=%s, relationship_batches=%s, deleted_rels=%s, "
+            "node_batches=%s, deleted_nodes=%s, elapsed=%.3fs)",
+            provider_id,
+            relationship_batches,
+            deleted_relationships,
+            node_batches,
+            deleted_nodes,
+            time.perf_counter() - drop_t0,
+        )
+        return deleted_nodes
+
+    def has_provider_data(self, database: str, provider_id: str) -> bool:  # noqa: ARG002
+        from tasks.jobs.attack_paths.config import (
+            PROVIDER_RESOURCE_LABEL,
+            get_provider_label,
+        )
+
+        provider_label = get_provider_label(provider_id)
+        query = (
+            f"MATCH (n:{PROVIDER_RESOURCE_LABEL}:`{provider_label}`) RETURN 1 LIMIT 1"
+        )
+        with self.get_session(default_access_mode=neo4j.READ_ACCESS) as session:
+            result = session.run(query)
+            return result.single() is not None
+
+    def clear_cache(self, database: str) -> None:  # noqa: ARG002
+        # Neptune has no user-facing cache-clear procedure; no-op.
+        return None
+
+    # Sync write path
+
+    def ensure_sync_indexes(self, database: str) -> None:  # noqa: ARG002
+        # Neptune resolves node and relationship lookups through `~id`, each element's unique identifier.
+        # No additional index is needed or supported.
+        return None
+
+    def write_nodes(
+        self,
+        database: str,  # noqa: ARG002
+        labels: str,
+        rows: list[dict[str, Any]],
+    ) -> None:
+        if not rows:
+            return
+        from tasks.jobs.attack_paths.config import (
+            PROVIDER_ELEMENT_ID_PROPERTY,
+            PROVIDER_RESOURCE_LABEL,
+        )
+
+        # MERGE on `~id` is the documented and engine-optimized idempotent
+        # upsert pattern for Neptune openCypher. The label inside the MERGE
+        # matters: Neptune assigns a default `vertex` label to any node
+        # created without an explicit one, so we pin `_ProviderResource`
+        # (which every synced node carries anyway) at MERGE-time. Additional
+        # labels are added afterwards by the `SET n:...` clause.
+        #
+        # We also write `_provider_element_id` as a regular property so
+        # non-sync code (drop_subgraph, query helpers) keeps a stable contract
+        # that doesn't know about `~id`.
+        query = f"""
+            UNWIND $rows AS row
+            MERGE (n:`{PROVIDER_RESOURCE_LABEL}` {{`~id`: row.provider_element_id}})
+            SET n:{labels}
+            SET n += row.props
+            SET n.`{PROVIDER_ELEMENT_ID_PROPERTY}` = row.provider_element_id
+        """
+        with self.get_session() as session:
+            session.run(query, {"rows": rows}).consume()
+
+    def write_relationships(
+        self,
+        database: str,  # noqa: ARG002
+        rel_type: str,
+        provider_id: str,  # noqa: ARG002 - encoded in start/end `~id` already
+        rows: list[dict[str, Any]],
+    ) -> None:
+        if not rows:
+            return
+        from tasks.jobs.attack_paths.config import PROVIDER_ELEMENT_ID_PROPERTY
+
+        # `id(n) = $value` is Neptune's parameterized fast path; both endpoint
+        # MATCHes resolve in O(1) via the system `~id`, so per-row work stays
+        # bounded regardless of batch size
+        query = f"""
+            UNWIND $rows AS row
+            MATCH (s) WHERE id(s) = row.start_element_id
+            MATCH (e) WHERE id(e) = row.end_element_id
+            MERGE (s)-[r:`{rel_type}` {{`{PROVIDER_ELEMENT_ID_PROPERTY}`: row.provider_element_id}}]->(e)
+            SET r += row.props
+        """
+        with self.get_session() as session:
+            session.run(query, {"rows": rows}).consume()
+
+    # Test helpers
+
+    def get_writer(self) -> neo4j.Driver:
+        return self._get_writer()
+
+    def get_reader(self) -> neo4j.Driver:
+        return self._get_reader()
+
+
+# SigV4 auth provider
+
+
+class _NeptuneAuthToken(neo4j.Auth):
+    """Neo4j Auth backed by a SigV4-signed GET to `/opencypher`."""
+
+    def __init__(self, region: str, url: str) -> None:
+        session = BotoSession()
+        credentials = session.get_credentials()
+        if credentials is None:
+            raise RuntimeError(
+                "No AWS credentials available for Neptune SigV4 signing. "
+                "Ensure the boto3 credential chain can resolve."
+            )
+        credentials = credentials.get_frozen_credentials()
+
+        request = AWSRequest(method="GET", url=url + "/opencypher")
+        # The SigV4 canonical Host header must carry the real `host:port`.
+        # Neptune runs on a non-default port (8182), so `.hostname` would drop the port and break signing.
+        request.headers.add_header("Host", urlsplit(url).netloc)
+        SigV4Auth(credentials, "neptune-db", region).add_auth(request)
+
+        auth_obj = {
+            header: request.headers[header]
+            for header in (
+                "Authorization",
+                "X-Amz-Date",
+                "X-Amz-Security-Token",
+                "Host",
+            )
+            if header in request.headers
+        }
+        auth_obj["HttpMethod"] = "GET"
+
+        super().__init__("basic", "username", json.dumps(auth_obj))
+
+
+def neptune_auth_provider(region: str, https_url: str) -> Callable[[], ExpiringAuth]:
+    """Return a callable the neo4j driver can invoke to refresh credentials."""
+
+    def _provider() -> ExpiringAuth:
+        token = _NeptuneAuthToken(region, https_url)
+        expires_at = (
+            datetime.datetime.now(datetime.UTC)
+            + datetime.timedelta(minutes=SIGV4_TOKEN_LIFETIME_MINUTES)
+        ).timestamp()
+        return ExpiringAuth(auth=token, expires_at=expires_at)
+
+    return _provider
@@ -5,6 +5,7 @@ from typing import Any
 import neo4j
 from api.attack_paths import AttackPathsQueryDefinition
 from api.attack_paths import database as graph_database
+from api.attack_paths import sink as sink_module
 from api.attack_paths.cypher_sanitizer import (
    inject_provider_label,
    validate_custom_query,
@@ -14,7 +15,9 @@ from api.attack_paths.queries.schema import (
    RAW_SCHEMA_URL,
    get_cartography_schema_query,
 )
+from api.models import AttackPathsScan
 from config.custom_logging import BackendLogger
+from config.env import env
 from rest_framework.exceptions import APIException, PermissionDenied, ValidationError
 from tasks.jobs.attack_paths.config import (
    INTERNAL_LABELS,
@@ -26,6 +29,10 @@ from tasks.jobs.attack_paths.config import (
 logger = logging.getLogger(BackendLogger.API)


+def _custom_query_timeout_ms() -> int:
+    return env.int("ATTACK_PATHS_READ_QUERY_TIMEOUT_SECONDS", default=30) * 1000
+
+
 # Predefined query helpers


@@ -102,13 +109,13 @@ def execute_query(
    definition: AttackPathsQueryDefinition,
    parameters: dict[str, Any],
    provider_id: str,
+    scan: AttackPathsScan,
 ) -> dict[str, Any]:
    try:
-        graph = graph_database.execute_read_query(
-            database=database_name,
-            cypher=definition.cypher,
-            parameters=parameters,
-        )
+        # TODO: drop after Neptune cutover
+        # Route reads by the scan row's recorded sink, not by current settings.
+        backend = sink_module.get_backend_for_scan(scan)
+        graph = backend.execute_read_query(database_name, definition.cypher, parameters)
        return _serialize_graph(graph, provider_id)

    except graph_database.WriteQueryNotAllowedException:
@@ -142,22 +149,31 @@ def execute_custom_query(
    database_name: str,
    cypher: str,
    provider_id: str,
+    scan: AttackPathsScan,
 ) -> dict[str, Any]:
    # Defense-in-depth for custom queries:
-    # 1. neo4j.READ_ACCESS — prevents mutations at the driver level
-    # 2. inject_provider_label() — regex-based label injection scopes node patterns
-    # 3. _serialize_graph() — post-query filter drops nodes without the provider label
+    # 1. `neo4j.READ_ACCESS` — prevents mutations at the driver level
+    # 2. `inject_provider_label()` — regex-based label injection scopes node patterns
+    # 3. `_serialize_graph()` — post-query filter drops nodes without the provider label
+    # 4. `USING QUERY:TIMEOUTMILLISECONDS` on Neptune — server-side runaway cutoff
    #
    # Layer 2 is best-effort (regex can't fully parse Cypher);
    # layer 3 is the safety net that guarantees provider isolation.
    validate_custom_query(cypher)
    cypher = inject_provider_label(cypher, provider_id)

+    # TODO: drop after Neptune cutover
+    backend = sink_module.get_backend_for_scan(scan)
+
+    # Neptune enforces a cluster-level query timeout; prepending the hint
+    # makes the limit explicit and matches the client-side read timeout.
+    # Applies only when the scan's graph lives in Neptune.
+    if getattr(scan, "sink_backend", None) == "neptune":
+        timeout_ms = _custom_query_timeout_ms()
+        cypher = f"USING QUERY:TIMEOUTMILLISECONDS {timeout_ms}\n{cypher}"
+
    try:
-        graph = graph_database.execute_read_query(
-            database=database_name,
-            cypher=cypher,
-        )
+        graph = backend.execute_read_query(database_name, cypher, None)
        serialized = _serialize_graph(graph, provider_id)
        return _truncate_graph(serialized)

@@ -180,10 +196,11 @@ def execute_custom_query(


 def get_cartography_schema(
-    database_name: str, provider_id: str
+    database_name: str, provider_id: str, scan: AttackPathsScan
 ) -> dict[str, str] | None:
    try:
-        with graph_database.get_session(
+        backend = sink_module.get_backend_for_scan(scan)
+        with backend.get_session(
            database_name, default_access_mode=neo4j.READ_ACCESS
        ) as session:
            result = session.run(get_cartography_schema_query(provider_id))
@@ -2,8 +2,9 @@
 Format (draft-inadarei-api-health-check-06).

 Liveness reports only process status. Readiness verifies that PostgreSQL,
-Valkey and Neo4j are reachable and returns per-dependency detail when any
-of them is unreachable.
+Valkey and the attack-paths graph store (Neo4j or Neptune, per
+``ATTACK_PATHS_SINK_DATABASE``) are reachable and returns per-dependency
+detail when any of them is unreachable.
 """

 from __future__ import annotations
@@ -11,6 +12,8 @@ from __future__ import annotations
 import logging
 import threading
 import time
+from concurrent.futures import ThreadPoolExecutor
+from concurrent.futures import TimeoutError as FuturesTimeoutError
 from contextlib import suppress
 from datetime import UTC, datetime
 from typing import Any
@@ -37,9 +40,28 @@ STATUS_FAIL = "fail"
 STATUS_WARN = "warn"

 # Short socket timeout so a stuck Valkey cannot stall the probe.
-# Neo4j inherits its driver-level ``connection_acquisition_timeout``.
 VALKEY_PROBE_TIMEOUT_SECONDS = 2

+# Probe-scoped budget for the graph database.
+# ``Driver.verify_connectivity()`` takes no timeout; its only bound is the
+# driver-level ``connection_acquisition_timeout`` (60s on Neptune). The
+# probe needs its own budget, independent of the workload driver, so a
+# graph-database outage cannot pin a worker thread (and the readiness lock)
+# for a minute.
+GRAPH_DB_PROBE_TIMEOUT_SECONDS = 5
+
+# Bounded pool that enforces ``GRAPH_DB_PROBE_TIMEOUT_SECONDS``. If the
+# graph database is unreachable the probe call blocks until the driver's
+# own acquisition timeout fires; we abandon the future after the budget and
+# report ``fail``. Orphaned tasks cannot pile up: they are capped by
+# ``max_workers``, the 3s readiness cache and the per-IP throttle. Worst
+# case during a graph-database outage, every readiness call fails fast
+# in ``GRAPH_DB_PROBE_TIMEOUT_SECONDS`` with at most 2 background threads
+# stuck for <= the driver acquisition timeout.
+_graph_db_probe_executor = ThreadPoolExecutor(
+    max_workers=2, thread_name_prefix="health-graph-db-probe"
+)
+
 # Brief cache window so high-frequency probes (ALB target groups, scrapers)
 # do not stampede the actual dependency checks.
 CACHE_CONTROL_HEADER = "max-age=3, must-revalidate"
@@ -109,11 +131,24 @@ def _probe_valkey() -> None:
            client.close()


-def _probe_neo4j() -> None:
-    # Lazy import: avoids pulling attack_paths into the boot import graph.
-    from api.attack_paths.database import get_driver
+def _graph_db_component_id() -> str:
+    """Return the active graph database name for the ``componentId`` field."""
+    return settings.ATTACK_PATHS_SINK_DATABASE.strip().lower()

-    get_driver().verify_connectivity()
+
+def _probe_graph_db() -> None:
+    # Lazy import: avoids pulling attack_paths into the boot import graph
+    from api.attack_paths.database import verify_connectivity
+
+    future = _graph_db_probe_executor.submit(verify_connectivity)
+    try:
+        future.result(timeout=GRAPH_DB_PROBE_TIMEOUT_SECONDS)
+    except FuturesTimeoutError as exc:
+        # Do not wait for the abandoned task; it ends when the driver's own acquisition timeout fires
+        future.cancel()
+        raise TimeoutError(
+            f"graph-db probe exceeded {GRAPH_DB_PROBE_TIMEOUT_SECONDS}s"
+        ) from exc


 def _build_check_entry(
@@ -176,14 +211,18 @@ def _readiness_payload() -> tuple[dict[str, Any], int]:
        ):
            return snapshot[1], snapshot[2]

+        graph_db_component_id = _graph_db_component_id()
+
        postgres_result, postgres_ms = _measure("postgres", _probe_postgres)
        valkey_result, valkey_ms = _measure("valkey", _probe_valkey)
-        neo4j_result, neo4j_ms = _measure("neo4j", _probe_neo4j)
+        graph_db_result, graph_db_ms = _measure(graph_db_component_id, _probe_graph_db)

        entries = [
            _build_check_entry("postgres", "datastore", postgres_result, postgres_ms),
            _build_check_entry("valkey", "datastore", valkey_result, valkey_ms),
-            _build_check_entry("neo4j", "datastore", neo4j_result, neo4j_ms),
+            _build_check_entry(
+                graph_db_component_id, "datastore", graph_db_result, graph_db_ms
+            ),
        ]
        overall = _aggregate_status(entries)

@@ -191,7 +230,7 @@ def _readiness_payload() -> tuple[dict[str, Any], int]:
        payload["checks"] = {
            "postgres:responseTime": [entries[0]],
            "valkey:responseTime": [entries[1]],
-            "neo4j:responseTime": [entries[2]],
+            "graphdb:responseTime": [entries[2]],
        }

        http_status = (
@@ -233,10 +272,10 @@ class LivenessView(APIView):
 class ReadinessView(APIView):
    """Readiness probe.

-    Returns 200 when PostgreSQL, Valkey and Neo4j all respond, or 503 with
-    per-dependency detail when any of them is unreachable. Per-IP throttle
-    plus the short in-process result cache cap the real dependency hits
-    regardless of inbound traffic shape.
+    Returns 200 when PostgreSQL, Valkey and the attack-paths graph store
+    all respond, or 503 with per-dependency detail when any of them is
+    unreachable. Per-IP throttle plus the short in-process result cache cap
+    the real dependency hits regardless of inbound traffic shape.
    """

    authentication_classes: list = []
@@ -0,0 +1,24 @@
+from django.db import migrations, models
+
+
+class Migration(migrations.Migration):
+    dependencies = [
+        ("api", "0095_reconcile_orphan_tasks_periodic_task"),
+    ]
+
+    operations = [
+        migrations.AddField(
+            model_name="attackpathsscan",
+            name="is_migrated",
+            field=models.BooleanField(default=False),
+        ),
+        migrations.AddField(
+            model_name="attackpathsscan",
+            name="sink_backend",
+            field=models.CharField(
+                choices=[("neo4j", "Neo4j"), ("neptune", "Neptune")],
+                default="neo4j",
+                max_length=16,
+            ),
+        ),
+    ]
@@ -757,6 +757,10 @@ class Scan(RowLevelSecurityProtectedModel):


 class AttackPathsScan(RowLevelSecurityProtectedModel):
+    class SinkBackendChoices(models.TextChoices):
+        NEO4J = "neo4j", "Neo4j"
+        NEPTUNE = "neptune", "Neptune"
+
    objects = ActiveProviderManager()
    all_objects = models.Manager()

@@ -805,6 +809,18 @@ class AttackPathsScan(RowLevelSecurityProtectedModel):
    )
    ingestion_exceptions = models.JSONField(default=dict, null=True, blank=True)

+    # True when the scan was synced with the current schema (list-typed
+    # properties materialised as child item nodes). False for pre-cutover scans
+    # still using the previous graph shape. Query catalog selection uses this
+    # flag; physical read routing uses sink_backend below.
+    # TODO: drop after Neptune cutover
+    is_migrated = models.BooleanField(default=False)
+    sink_backend = models.CharField(
+        choices=SinkBackendChoices.choices,
+        default=SinkBackendChoices.NEO4J,
+        max_length=16,
+    )
+
    class Meta(RowLevelSecurityProtectedModel.Meta):
        db_table = "attack_paths_scans"

@@ -92,7 +92,9 @@ def test_prepare_parameters_validates_cast(


 def test_execute_query_serializes_graph(
-    attack_paths_query_definition_factory, attack_paths_graph_stub_classes
+    attack_paths_query_definition_factory,
+    attack_paths_graph_stub_classes,
+    sink_backend_stub,
 ):
    definition = attack_paths_query_definition_factory(
        id="aws-rds",
@@ -135,18 +137,17 @@ def test_execute_query_serializes_graph(

    database_name = "db-tenant-test-tenant-id"

-    with patch(
-        "api.attack_paths.views_helpers.graph_database.execute_read_query",
-        return_value=graph_result,
-    ) as mock_execute_read_query:
-        result = views_helpers.execute_query(
-            database_name, definition, parameters, provider_id=provider_id
-        )
+    sink_backend_stub.execute_read_query.return_value = graph_result
+    result = views_helpers.execute_query(
+        database_name,
+        definition,
+        parameters,
+        provider_id=provider_id,
+        scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
+    )

-    mock_execute_read_query.assert_called_once_with(
-        database=database_name,
-        cypher=definition.cypher,
-        parameters=parameters,
+    sink_backend_stub.execute_read_query.assert_called_once_with(
+        database_name, definition.cypher, parameters
    )
    assert result["nodes"][0]["id"] == "node-1"
    assert result["nodes"][0]["properties"]["complex"]["items"][0] == "value"
@@ -155,6 +156,7 @@ def test_execute_query_serializes_graph(

 def test_execute_query_wraps_graph_errors(
    attack_paths_query_definition_factory,
+    sink_backend_stub,
 ):
    definition = attack_paths_query_definition_factory(
        id="aws-rds",
@@ -167,16 +169,17 @@ def test_execute_query_wraps_graph_errors(
    database_name = "db-tenant-test-tenant-id"
    parameters = {"provider_uid": "123"}

-    with (
-        patch(
-            "api.attack_paths.views_helpers.graph_database.execute_read_query",
-            side_effect=graph_database.GraphDatabaseQueryException("boom"),
-        ),
-        patch("api.attack_paths.views_helpers.logger") as mock_logger,
-    ):
+    sink_backend_stub.execute_read_query.side_effect = (
+        graph_database.GraphDatabaseQueryException("boom")
+    )
+    with patch("api.attack_paths.views_helpers.logger") as mock_logger:
        with pytest.raises(APIException):
            views_helpers.execute_query(
-                database_name, definition, parameters, provider_id="test-provider-123"
+                database_name,
+                definition,
+                parameters,
+                provider_id="test-provider-123",
+                scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
            )

    mock_logger.error.assert_called_once()
@@ -184,6 +187,7 @@ def test_execute_query_wraps_graph_errors(

 def test_execute_query_raises_permission_denied_on_read_only(
    attack_paths_query_definition_factory,
+    sink_backend_stub,
 ):
    definition = attack_paths_query_definition_factory(
        id="aws-rds",
@@ -196,17 +200,20 @@ def test_execute_query_raises_permission_denied_on_read_only(
    database_name = "db-tenant-test-tenant-id"
    parameters = {"provider_uid": "123"}

-    with patch(
-        "api.attack_paths.views_helpers.graph_database.execute_read_query",
-        side_effect=graph_database.WriteQueryNotAllowedException(
+    sink_backend_stub.execute_read_query.side_effect = (
+        graph_database.WriteQueryNotAllowedException(
            message="Read query not allowed",
            code="Neo.ClientError.Statement.AccessMode",
-        ),
-    ):
-        with pytest.raises(PermissionDenied):
-            views_helpers.execute_query(
-                database_name, definition, parameters, provider_id="test-provider-123"
-            )
+        )
+    )
+    with pytest.raises(PermissionDenied):
+        views_helpers.execute_query(
+            database_name,
+            definition,
+            parameters,
+            provider_id="test-provider-123",
+            scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
+        )


 def test_serialize_graph_filters_by_provider_label(attack_paths_graph_stub_classes):
@@ -440,6 +447,7 @@ def test_normalize_custom_query_payload_passthrough_for_flat_dict():

 def test_execute_custom_query_serializes_graph(
    attack_paths_graph_stub_classes,
+    sink_backend_stub,
 ):
    provider_id = "test-provider-123"
    plabel = get_provider_label(provider_id)
@@ -453,50 +461,73 @@ def test_execute_custom_query_serializes_graph(
    graph_result.nodes = [node_1, node_2]
    graph_result.relationships = [relationship]

-    with patch(
-        "api.attack_paths.views_helpers.graph_database.execute_read_query",
-        return_value=graph_result,
-    ) as mock_execute:
-        result = views_helpers.execute_custom_query(
-            "db-tenant-test", "MATCH (n) RETURN n", provider_id
-        )
+    sink_backend_stub.execute_read_query.return_value = graph_result
+    result = views_helpers.execute_custom_query(
+        "db-tenant-test",
+        "MATCH (n) RETURN n",
+        provider_id,
+        scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
+    )

-    mock_execute.assert_called_once()
-    call_kwargs = mock_execute.call_args[1]
-    assert call_kwargs["database"] == "db-tenant-test"
+    sink_backend_stub.execute_read_query.assert_called_once()
+    call_args = sink_backend_stub.execute_read_query.call_args[0]
+    assert call_args[0] == "db-tenant-test"
    # The cypher is rewritten with the provider label injection
-    assert plabel in call_kwargs["cypher"]
+    assert plabel in call_args[1]
    assert len(result["nodes"]) == 2
    assert result["relationships"][0]["label"] == "OWNS"
    assert result["truncated"] is False
    assert result["total_nodes"] == 2


-def test_execute_custom_query_raises_permission_denied_on_write():
+def test_execute_custom_query_adds_timeout_for_neptune_scan(sink_backend_stub):
+    graph_result = MagicMock()
+    graph_result.nodes = []
+    graph_result.relationships = []
+    sink_backend_stub.execute_read_query.return_value = graph_result
+
    with patch(
-        "api.attack_paths.views_helpers.graph_database.execute_read_query",
-        side_effect=graph_database.WriteQueryNotAllowedException(
+        "api.attack_paths.views_helpers.sink_module.get_backend_for_scan",
+        return_value=sink_backend_stub,
+    ):
+        views_helpers.execute_custom_query(
+            "db-tenant-test",
+            "MATCH (n) RETURN n",
+            "provider-1",
+            scan=MagicMock(is_migrated=True, sink_backend="neptune"),
+        )
+
+    cypher = sink_backend_stub.execute_read_query.call_args[0][1]
+    assert cypher.startswith("USING QUERY:TIMEOUTMILLISECONDS")
+
+
+def test_execute_custom_query_raises_permission_denied_on_write(sink_backend_stub):
+    sink_backend_stub.execute_read_query.side_effect = (
+        graph_database.WriteQueryNotAllowedException(
            message="Read query not allowed",
            code="Neo.ClientError.Statement.AccessMode",
-        ),
-    ):
-        with pytest.raises(PermissionDenied):
-            views_helpers.execute_custom_query(
-                "db-tenant-test", "CREATE (n) RETURN n", "provider-1"
-            )
+        )
+    )
+    with pytest.raises(PermissionDenied):
+        views_helpers.execute_custom_query(
+            "db-tenant-test",
+            "CREATE (n) RETURN n",
+            "provider-1",
+            scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
+        )


-def test_execute_custom_query_wraps_graph_errors():
-    with (
-        patch(
-            "api.attack_paths.views_helpers.graph_database.execute_read_query",
-            side_effect=graph_database.GraphDatabaseQueryException("boom"),
-        ),
-        patch("api.attack_paths.views_helpers.logger") as mock_logger,
-    ):
+def test_execute_custom_query_wraps_graph_errors(sink_backend_stub):
+    sink_backend_stub.execute_read_query.side_effect = (
+        graph_database.GraphDatabaseQueryException("boom")
+    )
+    with patch("api.attack_paths.views_helpers.logger") as mock_logger:
        with pytest.raises(APIException):
            views_helpers.execute_custom_query(
-                "db-tenant-test", "MATCH (n) RETURN n", "provider-1"
+                "db-tenant-test",
+                "MATCH (n) RETURN n",
+                "provider-1",
+                scan=MagicMock(is_migrated=False, sink_backend="neo4j"),
            )

    mock_logger.error.assert_called_once()
@@ -561,13 +592,33 @@ def test_truncate_graph_empty_graph():

@pytest.fixture
 def mock_neo4j_session():
-    """Mock the Neo4j driver so execute_read_query uses a fake session."""
+    """Install a Neo4jSink with a mocked Bolt driver into the sink factory.
+
+    The yielded mock is the `neo4j.Session` that the Neo4jSink will obtain via
+    `driver.session(...)`. Tests configure `mock_neo4j_session.execute_read`
+    return values / side effects to exercise the read-mode error translation
+    path on the real `Neo4jSink.execute_read_query` and `get_session` code.
+    """
+    from api.attack_paths.sink import factory
+    from api.attack_paths.sink.neo4j import Neo4jSink
+
    mock_session = MagicMock(spec=neo4j.Session)
    mock_driver = MagicMock(spec=neo4j.Driver)
    mock_driver.session.return_value = mock_session

-    with patch("api.attack_paths.database.get_driver", return_value=mock_driver):
+    sink = Neo4jSink()
+    sink._driver = mock_driver
+
+    previous_backend = factory._backend
+    previous_secondary = dict(factory._secondary_backends)
+    factory._backend = sink
+    factory._secondary_backends.clear()
+    try:
        yield mock_session
+    finally:
+        factory._backend = previous_backend
+        factory._secondary_backends.clear()
+        factory._secondary_backends.update(previous_secondary)


 def test_execute_read_query_succeeds_with_select(mock_neo4j_session):
@@ -663,16 +714,20 @@ def test_execute_read_query_rejects_apoc_real_create(mock_neo4j_session, cypher)

@pytest.fixture
 def mock_schema_session():
-    """Mock get_session for cartography schema tests."""
+    """Mock the routed sink backend session for cartography schema tests."""
    mock_result = MagicMock()
    mock_session = MagicMock()
    mock_session.run.return_value = mock_result
+    mock_backend = MagicMock()

    with patch(
-        "api.attack_paths.views_helpers.graph_database.get_session"
-    ) as mock_get_session:
-        mock_get_session.return_value.__enter__ = MagicMock(return_value=mock_session)
-        mock_get_session.return_value.__exit__ = MagicMock(return_value=False)
+        "api.attack_paths.views_helpers.sink_module.get_backend_for_scan",
+        return_value=mock_backend,
+    ):
+        mock_backend.get_session.return_value.__enter__ = MagicMock(
+            return_value=mock_session
+        )
+        mock_backend.get_session.return_value.__exit__ = MagicMock(return_value=False)
        yield mock_session, mock_result


@@ -683,7 +738,9 @@ def test_get_cartography_schema_returns_urls(mock_schema_session):
        "module_version": "0.129.0",
    }

-    result = views_helpers.get_cartography_schema("db-tenant-test", "provider-123")
+    result = views_helpers.get_cartography_schema(
+        "db-tenant-test", "provider-123", MagicMock(sink_backend="neo4j")
+    )

    mock_session.run.assert_called_once()
    assert result["id"] == "aws-0.129.0"
@@ -699,7 +756,9 @@ def test_get_cartography_schema_returns_none_when_no_data(mock_schema_session):
    _, mock_result = mock_schema_session
    mock_result.single.return_value = None

-    result = views_helpers.get_cartography_schema("db-tenant-test", "provider-123")
+    result = views_helpers.get_cartography_schema(
+        "db-tenant-test", "provider-123", MagicMock(sink_backend="neo4j")
+    )

    assert result is None

@@ -721,21 +780,29 @@ def test_get_cartography_schema_extracts_provider(
        "module_version": "1.0.0",
    }

-    result = views_helpers.get_cartography_schema("db-tenant-test", "provider-123")
+    result = views_helpers.get_cartography_schema(
+        "db-tenant-test", "provider-123", MagicMock(sink_backend="neo4j")
+    )

    assert result["id"] == f"{expected_provider}-1.0.0"
    assert result["provider"] == expected_provider


 def test_get_cartography_schema_wraps_database_error():
+    mock_backend = MagicMock()
+    mock_backend.get_session.side_effect = graph_database.GraphDatabaseQueryException(
+        "boom"
+    )
    with (
        patch(
-            "api.attack_paths.views_helpers.graph_database.get_session",
-            side_effect=graph_database.GraphDatabaseQueryException("boom"),
+            "api.attack_paths.views_helpers.sink_module.get_backend_for_scan",
+            return_value=mock_backend,
        ),
        patch("api.attack_paths.views_helpers.logger") as mock_logger,
    ):
        with pytest.raises(APIException):
-            views_helpers.get_cartography_schema("db-tenant-test", "provider-123")
+            views_helpers.get_cartography_schema(
+                "db-tenant-test", "provider-123", MagicMock(sink_backend="neo4j")
+            )

    mock_logger.error.assert_called_once()
@@ -1,623 +1,174 @@
-"""
-Tests for Neo4j database lazy initialization.
+"""Tests for the attack-paths database facade.

-The Neo4j driver is created on first use for every process type; app startup
-never contacts Neo4j. These tests validate the database module behavior itself.
+After the Neptune port, `api.attack_paths.database` is a thin routing shim
+over `api.attack_paths.ingest` (cartography temp DB, always Neo4j) and
+`api.attack_paths.sink` (configurable Neo4j or Neptune). The facade's
+contract is routing by database-name prefix and the public exception
+hierarchy; sink-internal behavior is exercised in `test_sink.py`.
 """

-import threading
 from unittest.mock import MagicMock, patch

 import api.attack_paths.database as db_module
-import neo4j
-import neo4j.exceptions
-import pytest


-class TestLazyInitialization:
-    """Test that Neo4j driver is initialized lazily on first use."""
-
-    @pytest.fixture(autouse=True)
-    def reset_module_state(self):
-        """Reset module-level singleton state before each test."""
-        original_driver = db_module._driver
-
-        db_module._driver = None
-
-        yield
-
-        db_module._driver = original_driver
-
-    def test_driver_not_initialized_at_import(self):
-        """Driver should be None after module import (no eager connection)."""
-        assert db_module._driver is None
-
-    @patch("api.attack_paths.database.settings")
-    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
-    def test_init_driver_creates_connection_on_first_call(
-        self, mock_driver_factory, mock_settings
-    ):
-        """init_driver() should create connection only when called."""
-        mock_driver = MagicMock()
-        mock_driver_factory.return_value = mock_driver
-        mock_settings.DATABASES = {
-            "neo4j": {
-                "HOST": "localhost",
-                "PORT": 7687,
-                "USER": "neo4j",
-                "PASSWORD": "password",
-            }
-        }
-
-        assert db_module._driver is None
-
-        result = db_module.init_driver()
-
-        mock_driver_factory.assert_called_once()
-        mock_driver.verify_connectivity.assert_called_once()
-        assert result is mock_driver
-        assert db_module._driver is mock_driver
-
-    @patch("api.attack_paths.database.settings")
-    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
-    def test_init_driver_leaves_driver_none_when_verify_fails(
-        self, mock_driver_factory, mock_settings
-    ):
-        """A failed verify_connectivity() must not publish or leak the driver."""
-        mock_driver = MagicMock()
-        mock_driver.verify_connectivity.side_effect = (
-            neo4j.exceptions.ServiceUnavailable("down")
-        )
-        mock_driver_factory.return_value = mock_driver
-        mock_settings.DATABASES = {
-            "neo4j": {
-                "HOST": "localhost",
-                "PORT": 7687,
-                "USER": "neo4j",
-                "PASSWORD": "password",
-            }
-        }
-
-        with pytest.raises(neo4j.exceptions.ServiceUnavailable):
-            db_module.init_driver()
-
-        assert db_module._driver is None
-        mock_driver.close.assert_called_once()
-
-    @patch("api.attack_paths.database.settings")
-    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
-    def test_init_driver_returns_cached_driver_on_subsequent_calls(
-        self, mock_driver_factory, mock_settings
-    ):
-        """Subsequent calls should return cached driver without reconnecting."""
-        mock_driver = MagicMock()
-        mock_driver_factory.return_value = mock_driver
-        mock_settings.DATABASES = {
-            "neo4j": {
-                "HOST": "localhost",
-                "PORT": 7687,
-                "USER": "neo4j",
-                "PASSWORD": "password",
-            }
-        }
-
-        first_result = db_module.init_driver()
-        second_result = db_module.init_driver()
-        third_result = db_module.init_driver()
-
-        # Only one connection attempt
-        assert mock_driver_factory.call_count == 1
-        assert mock_driver.verify_connectivity.call_count == 1
-
-        # All calls return same instance
-        assert first_result is second_result is third_result
-
-    @patch("api.attack_paths.database.settings")
-    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
-    def test_get_driver_delegates_to_init_driver(
-        self, mock_driver_factory, mock_settings
-    ):
-        """get_driver() should use init_driver() for lazy initialization."""
-        mock_driver = MagicMock()
-        mock_driver_factory.return_value = mock_driver
-        mock_settings.DATABASES = {
-            "neo4j": {
-                "HOST": "localhost",
-                "PORT": 7687,
-                "USER": "neo4j",
-                "PASSWORD": "password",
-            }
-        }
-
-        result = db_module.get_driver()
-
-        assert result is mock_driver
-        mock_driver_factory.assert_called_once()
-
-
-class TestConnectionAcquisitionTimeout:
-    """Test that the connection acquisition timeout is configurable."""
-
-    @pytest.fixture(autouse=True)
-    def reset_module_state(self):
-        original_driver = db_module._driver
-        original_acq_timeout = db_module.CONN_ACQUISITION_TIMEOUT
-        original_conn_timeout = db_module.CONNECTION_TIMEOUT
-
-        db_module._driver = None
-
-        yield
-
-        db_module._driver = original_driver
-        db_module.CONN_ACQUISITION_TIMEOUT = original_acq_timeout
-        db_module.CONNECTION_TIMEOUT = original_conn_timeout
-
-    @patch("api.attack_paths.database.settings")
-    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
-    def test_driver_receives_configured_timeout(
-        self, mock_driver_factory, mock_settings
-    ):
-        """init_driver() should pass the configured timeouts to the neo4j driver."""
-        mock_driver_factory.return_value = MagicMock()
-        mock_settings.DATABASES = {
-            "neo4j": {
-                "HOST": "localhost",
-                "PORT": 7687,
-                "USER": "neo4j",
-                "PASSWORD": "password",
-            }
-        }
-        db_module.CONN_ACQUISITION_TIMEOUT = 42
-        db_module.CONNECTION_TIMEOUT = 7
-
-        db_module.init_driver()
-
-        _, kwargs = mock_driver_factory.call_args
-        assert kwargs["connection_acquisition_timeout"] == 42
-        assert kwargs["connection_timeout"] == 7
-
-
-class TestAtexitRegistration:
-    """Test that atexit cleanup handler is registered correctly."""
-
-    @pytest.fixture(autouse=True)
-    def reset_module_state(self):
-        """Reset module-level singleton state before each test."""
-        original_driver = db_module._driver
-
-        db_module._driver = None
-
-        yield
-
-        db_module._driver = original_driver
-
-    @patch("api.attack_paths.database.settings")
-    @patch("api.attack_paths.database.atexit.register")
-    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
-    def test_atexit_registered_on_first_init(
-        self, mock_driver_factory, mock_atexit_register, mock_settings
-    ):
-        """atexit.register should be called on first initialization."""
-        mock_driver_factory.return_value = MagicMock()
-        mock_settings.DATABASES = {
-            "neo4j": {
-                "HOST": "localhost",
-                "PORT": 7687,
-                "USER": "neo4j",
-                "PASSWORD": "password",
-            }
-        }
-
-        db_module.init_driver()
-
-        mock_atexit_register.assert_called_once_with(db_module.close_driver)
-
-    @patch("api.attack_paths.database.settings")
-    @patch("api.attack_paths.database.atexit.register")
-    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
-    def test_atexit_registered_only_once(
-        self, mock_driver_factory, mock_atexit_register, mock_settings
-    ):
-        """atexit.register should only be called once across multiple inits.
-
-        The double-checked locking on _driver ensures the atexit registration
-        block only executes once (when _driver is first created).
-        """
-        mock_driver_factory.return_value = MagicMock()
-        mock_settings.DATABASES = {
-            "neo4j": {
-                "HOST": "localhost",
-                "PORT": 7687,
-                "USER": "neo4j",
-                "PASSWORD": "password",
-            }
-        }
-
-        db_module.init_driver()
-        db_module.init_driver()
-        db_module.init_driver()
-
-        # Only registered once because subsequent calls hit the fast path
-        assert mock_atexit_register.call_count == 1
-
-
-class TestCloseDriver:
-    """Test driver cleanup functionality."""
-
-    @pytest.fixture(autouse=True)
-    def reset_module_state(self):
-        """Reset module-level singleton state before each test."""
-        original_driver = db_module._driver
-
-        db_module._driver = None
-
-        yield
-
-        db_module._driver = original_driver
-
-    def test_close_driver_closes_and_clears_driver(self):
-        """close_driver() should close the driver and set it to None."""
-        mock_driver = MagicMock()
-        db_module._driver = mock_driver
-
-        db_module.close_driver()
-
-        mock_driver.close.assert_called_once()
-        assert db_module._driver is None
-
-    def test_close_driver_handles_none_driver(self):
-        """close_driver() should handle case where driver is None."""
-        db_module._driver = None
-
-        # Should not raise
-        db_module.close_driver()
-
-        assert db_module._driver is None
-
-    def test_close_driver_clears_driver_even_on_close_error(self):
-        """Driver should be cleared even if close() raises an exception."""
-        mock_driver = MagicMock()
-        mock_driver.close.side_effect = Exception("Connection error")
-        db_module._driver = mock_driver
-
-        with pytest.raises(Exception, match="Connection error"):
-            db_module.close_driver()
-
-        # Driver should still be cleared
-        assert db_module._driver is None
-
-
-class TestExecuteReadQuery:
-    """Test read query execution helper."""
-
-    def test_execute_read_query_calls_read_session_and_returns_result(self):
-        tx = MagicMock()
-        expected_graph = MagicMock()
-        run_result = MagicMock()
-        run_result.graph.return_value = expected_graph
-        tx.run.return_value = run_result
-
-        session = MagicMock()
-
-        def execute_read_side_effect(fn):
-            return fn(tx)
-
-        session.execute_read.side_effect = execute_read_side_effect
-
-        session_ctx = MagicMock()
-        session_ctx.__enter__.return_value = session
-        session_ctx.__exit__.return_value = False
-
-        with patch(
-            "api.attack_paths.database.get_session",
-            return_value=session_ctx,
-        ) as mock_get_session:
-            result = db_module.execute_read_query(
-                "db-tenant-test-tenant-id",
-                "MATCH (n) RETURN n",
-                {"provider_uid": "123"},
-            )
-
-        mock_get_session.assert_called_once_with(
-            "db-tenant-test-tenant-id",
-            default_access_mode=neo4j.READ_ACCESS,
-        )
-        session.execute_read.assert_called_once()
-        tx.run.assert_called_once_with(
-            "MATCH (n) RETURN n",
-            {"provider_uid": "123"},
-            timeout=db_module.READ_QUERY_TIMEOUT_SECONDS,
-        )
-        run_result.graph.assert_called_once_with()
-        assert result is expected_graph
-
-    def test_execute_read_query_defaults_parameters_to_empty_dict(self):
-        tx = MagicMock()
-        run_result = MagicMock()
-        run_result.graph.return_value = MagicMock()
-        tx.run.return_value = run_result
-
-        session = MagicMock()
-        session.execute_read.side_effect = lambda fn: fn(tx)
-
-        session_ctx = MagicMock()
-        session_ctx.__enter__.return_value = session
-        session_ctx.__exit__.return_value = False
-
-        with patch(
-            "api.attack_paths.database.get_session",
-            return_value=session_ctx,
-        ):
-            db_module.execute_read_query(
-                "db-tenant-test-tenant-id",
-                "MATCH (n) RETURN n",
-            )
-
-        tx.run.assert_called_once_with(
-            "MATCH (n) RETURN n",
-            {},
-            timeout=db_module.READ_QUERY_TIMEOUT_SECONDS,
-        )
-        run_result.graph.assert_called_once_with()
-
-
-class TestGetSessionReadOnly:
-    """Test that get_session translates Neo4j read-mode errors."""
-
-    @pytest.fixture(autouse=True)
-    def reset_module_state(self):
-        original_driver = db_module._driver
-        db_module._driver = None
-        yield
-        db_module._driver = original_driver
-
-    @pytest.mark.parametrize(
-        "neo4j_code",
-        [
-            "Neo.ClientError.Statement.AccessMode",
-            "Neo.ClientError.Procedure.ProcedureNotFound",
-        ],
-    )
-    def test_get_session_raises_write_query_not_allowed(self, neo4j_code):
-        """Read-mode Neo4j errors should raise `WriteQueryNotAllowedException`."""
-        mock_session = MagicMock()
-        neo4j_error = neo4j.exceptions.Neo4jError._hydrate_neo4j(
-            code=neo4j_code,
-            message="Write operations are not allowed",
-        )
-        mock_session.run.side_effect = neo4j_error
-
-        mock_driver = MagicMock()
-        mock_driver.session.return_value = mock_session
-        db_module._driver = mock_driver
-
-        with pytest.raises(db_module.WriteQueryNotAllowedException):
-            with db_module.get_session(
-                default_access_mode=neo4j.READ_ACCESS
-            ) as session:
-                session.run("CREATE (n) RETURN n")
-
-    def test_get_session_raises_generic_exception_for_other_errors(self):
-        """Non-read-mode Neo4j errors should raise GraphDatabaseQueryException."""
-        mock_session = MagicMock()
-        neo4j_error = neo4j.exceptions.Neo4jError._hydrate_neo4j(
-            code="Neo.ClientError.Statement.SyntaxError",
-            message="Invalid syntax",
-        )
-        mock_session.run.side_effect = neo4j_error
-
-        mock_driver = MagicMock()
-        mock_driver.session.return_value = mock_session
-        db_module._driver = mock_driver
-
-        with pytest.raises(db_module.GraphDatabaseQueryException):
-            with db_module.get_session(
-                default_access_mode=neo4j.READ_ACCESS
-            ) as session:
-                session.run("INVALID CYPHER")
-
-
-class TestThreadSafety:
-    """Test thread-safe initialization."""
-
-    @pytest.fixture(autouse=True)
-    def reset_module_state(self):
-        """Reset module-level singleton state before each test."""
-        original_driver = db_module._driver
-
-        db_module._driver = None
-
-        yield
-
-        db_module._driver = original_driver
-
-    @patch("api.attack_paths.database.settings")
-    @patch("api.attack_paths.database.neo4j.GraphDatabase.driver")
-    def test_concurrent_init_creates_single_driver(
-        self, mock_driver_factory, mock_settings
-    ):
-        """Multiple threads calling init_driver() should create only one driver."""
-        mock_driver = MagicMock()
-        mock_driver_factory.return_value = mock_driver
-        mock_settings.DATABASES = {
-            "neo4j": {
-                "HOST": "localhost",
-                "PORT": 7687,
-                "USER": "neo4j",
-                "PASSWORD": "password",
-            }
-        }
-
-        results = []
-        errors = []
-
-        def call_init():
-            try:
-                result = db_module.init_driver()
-                results.append(result)
-            except Exception as e:
-                errors.append(e)
-
-        threads = [threading.Thread(target=call_init) for _ in range(10)]
-
-        for t in threads:
-            t.start()
-        for t in threads:
-            t.join()
-
-        assert not errors, f"Threads raised errors: {errors}"
-
-        # Only one driver created
-        assert mock_driver_factory.call_count == 1
-
-        # All threads got the same driver instance
-        assert all(r is mock_driver for r in results)
-        assert len(results) == 10
-
-
-class TestHasProviderData:
-    """Test has_provider_data helper for checking provider nodes in Neo4j."""
-
-    def test_returns_true_when_nodes_exist(self):
-        mock_session = MagicMock()
-        mock_result = MagicMock()
-        mock_result.single.return_value = MagicMock()  # non-None record
-        mock_session.run.return_value = mock_result
-
-        session_ctx = MagicMock()
-        session_ctx.__enter__.return_value = mock_session
-        session_ctx.__exit__.return_value = False
-
-        with patch(
-            "api.attack_paths.database.get_session",
-            return_value=session_ctx,
-        ):
-            assert db_module.has_provider_data("db-tenant-abc", "provider-123") is True
-
-        mock_session.run.assert_called_once()
-
-    def test_returns_false_when_no_nodes(self):
-        mock_session = MagicMock()
-        mock_result = MagicMock()
-        mock_result.single.return_value = None
-        mock_session.run.return_value = mock_result
-
-        session_ctx = MagicMock()
-        session_ctx.__enter__.return_value = mock_session
-        session_ctx.__exit__.return_value = False
-
-        with patch(
-            "api.attack_paths.database.get_session",
-            return_value=session_ctx,
-        ):
-            assert db_module.has_provider_data("db-tenant-abc", "provider-123") is False
-
-    def test_returns_false_when_database_not_found(self):
-        session_ctx = MagicMock()
-        session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
-            message="Database does not exist",
-            code="Neo.ClientError.Database.DatabaseNotFound",
+class TestDatabaseNameHelper:
+    def test_tenant_name_lowercases_uuid(self):
+        assert (
+            db_module.get_database_name("ABC-123", temporary=False)
+            == "db-tenant-abc-123"
        )

-        with patch(
-            "api.attack_paths.database.get_session",
-            return_value=session_ctx,
-        ):
-            assert (
-                db_module.has_provider_data("db-tenant-gone", "provider-123") is False
-            )
-
-    def test_raises_on_other_errors(self):
-        session_ctx = MagicMock()
-        session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
-            message="Connection refused",
-            code="Neo.TransientError.General.UnknownError",
+    def test_temporary_name_uses_tmp_scan_prefix(self):
+        assert (
+            db_module.get_database_name("XYZ-789", temporary=True)
+            == "db-tmp-scan-xyz-789"
        )

-        with patch(
-            "api.attack_paths.database.get_session",
-            return_value=session_ctx,
-        ):
-            with pytest.raises(db_module.GraphDatabaseQueryException):
-                db_module.has_provider_data("db-tenant-abc", "provider-123")

+class TestExceptionHierarchy:
+    """`tasks/` and `api/v1/views.py` import these from the facade."""

-class TestDropSubgraph:
-    """Test drop_subgraph two-phase batched deletion of a provider's graph."""
-
-    @staticmethod
-    def _result(count):
-        result = MagicMock()
-        result.single.return_value.get.return_value = count
-        return result
-
-    @staticmethod
-    def _session_ctx(session):
-        ctx = MagicMock()
-        ctx.__enter__.return_value = session
-        ctx.__exit__.return_value = False
-        return ctx
-
-    def test_deletes_relationships_then_nodes_in_batches(self):
-        session = MagicMock()
-        # Phase 1 (relationships): one full batch then empty.
-        # Phase 2 (nodes): one full batch then empty.
-        session.run.side_effect = [
-            self._result(1000),
-            self._result(0),
-            self._result(1000),
-            self._result(0),
-        ]
-
-        with patch(
-            "api.attack_paths.database.get_session",
-            return_value=self._session_ctx(session),
-        ):
-            deleted = db_module.drop_subgraph("db-tenant-abc", "provider-123")
-
-        # Only phase-2 node counts contribute to the return value.
-        assert deleted == 1000
-        assert session.run.call_count == 4
-
-        queries = [call.args[0] for call in session.run.call_args_list]
-
-        # Regression guard: the memory blow-up was caused by DETACH DELETE.
-        assert all("DETACH DELETE" not in query for query in queries)
-
-        rel_queries = [query for query in queries if "DELETE r" in query]
-        node_queries = [query for query in queries if "DELETE n" in query]
-        assert rel_queries and node_queries
-        # DISTINCT avoids double-counting relationships matched from both ends.
-        assert all("DISTINCT r" in query for query in rel_queries)
-
-        # Relationships must be fully drained before nodes are deleted.
-        first_node = next(i for i, q in enumerate(queries) if "DELETE n" in q)
-        last_rel = max(i for i, q in enumerate(queries) if "DELETE r" in q)
-        assert last_rel < first_node
-
-    def test_returns_zero_when_database_not_found(self):
-        session_ctx = MagicMock()
-        session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
-            message="Database does not exist",
-            code="Neo.ClientError.Database.DatabaseNotFound",
+    def test_write_query_is_graph_database_exception(self):
+        assert issubclass(
+            db_module.WriteQueryNotAllowedException,
+            db_module.GraphDatabaseQueryException,
        )

-        with patch(
-            "api.attack_paths.database.get_session",
-            return_value=session_ctx,
-        ):
-            assert db_module.drop_subgraph("db-tenant-gone", "provider-123") == 0
-
-    def test_raises_on_other_errors(self):
-        session_ctx = MagicMock()
-        session_ctx.__enter__.side_effect = db_module.GraphDatabaseQueryException(
-            message="Connection refused",
-            code="Neo.TransientError.General.UnknownError",
+    def test_client_statement_is_graph_database_exception(self):
+        assert issubclass(
+            db_module.ClientStatementException, db_module.GraphDatabaseQueryException
        )

-        with patch(
-            "api.attack_paths.database.get_session",
-            return_value=session_ctx,
-        ):
-            with pytest.raises(db_module.GraphDatabaseQueryException):
-                db_module.drop_subgraph("db-tenant-abc", "provider-123")
+    def test_exception_str_includes_code_when_set(self):
+        exc = db_module.GraphDatabaseQueryException(
+            message="boom", code="Neo.ClientError.X.Y"
+        )
+        assert str(exc) == "Neo.ClientError.X.Y: boom"
+
+    def test_exception_str_falls_back_to_message_without_code(self):
+        exc = db_module.GraphDatabaseQueryException(message="boom")
+        assert str(exc) == "boom"
+
+
+class TestExecuteReadQueryRoutes:
+    def test_execute_read_query_delegates_to_sink(self, sink_backend_stub):
+        sink_backend_stub.execute_read_query.return_value = "graph"
+
+        result = db_module.execute_read_query(
+            "db-tenant-abc", "MATCH (n) RETURN n", {"provider_uid": "123"}
+        )
+
+        sink_backend_stub.execute_read_query.assert_called_once_with(
+            "db-tenant-abc", "MATCH (n) RETURN n", {"provider_uid": "123"}
+        )
+        assert result == "graph"
+
+    def test_execute_read_query_defaults_parameters_to_none(self, sink_backend_stub):
+        db_module.execute_read_query("db-tenant-abc", "MATCH (n) RETURN n")
+
+        sink_backend_stub.execute_read_query.assert_called_once_with(
+            "db-tenant-abc", "MATCH (n) RETURN n", None
+        )
+
+
+class TestSinkOperationsDelegation:
+    def test_has_provider_data_delegates_to_sink(self, sink_backend_stub):
+        sink_backend_stub.has_provider_data.return_value = True
+
+        assert db_module.has_provider_data("db-tenant-abc", "provider-123") is True
+        sink_backend_stub.has_provider_data.assert_called_once_with(
+            "db-tenant-abc", "provider-123"
+        )
+
+    def test_drop_subgraph_delegates_to_sink(self, sink_backend_stub):
+        sink_backend_stub.drop_subgraph.return_value = 42
+
+        assert db_module.drop_subgraph("db-tenant-abc", "provider-123") == 42
+        sink_backend_stub.drop_subgraph.assert_called_once_with(
+            "db-tenant-abc", "provider-123"
+        )
+
+
+class TestRoutingByDatabasePrefix:
+    """`db-tmp-scan-*` and `None` route to ingest; everything else to sink."""
+
+    def test_create_database_routes_temp_to_ingest(self, sink_backend_stub):
+        with patch("api.attack_paths.database.ingest") as mock_ingest:
+            db_module.create_database("db-tmp-scan-uuid-1")
+
+        mock_ingest.create_database.assert_called_once_with("db-tmp-scan-uuid-1")
+        sink_backend_stub.create_database.assert_not_called()
+
+    def test_create_database_routes_tenant_to_sink(self, sink_backend_stub):
+        with patch("api.attack_paths.database.ingest") as mock_ingest:
+            db_module.create_database("db-tenant-abc")
+
+        sink_backend_stub.create_database.assert_called_once_with("db-tenant-abc")
+        mock_ingest.create_database.assert_not_called()
+
+    def test_drop_database_routes_temp_to_ingest(self, sink_backend_stub):
+        with patch("api.attack_paths.database.ingest") as mock_ingest:
+            db_module.drop_database("db-tmp-scan-uuid-1")
+
+        mock_ingest.drop_database.assert_called_once_with("db-tmp-scan-uuid-1")
+        sink_backend_stub.drop_database.assert_not_called()
+
+    def test_drop_database_routes_tenant_to_sink(self, sink_backend_stub):
+        with patch("api.attack_paths.database.ingest") as mock_ingest:
+            db_module.drop_database("db-tenant-abc")
+
+        sink_backend_stub.drop_database.assert_called_once_with("db-tenant-abc")
+        mock_ingest.drop_database.assert_not_called()
+
+    def test_clear_cache_routes_temp_to_ingest(self, sink_backend_stub):
+        with patch("api.attack_paths.database.ingest") as mock_ingest:
+            db_module.clear_cache("db-tmp-scan-uuid-1")
+
+        mock_ingest.clear_cache.assert_called_once_with("db-tmp-scan-uuid-1")
+        sink_backend_stub.clear_cache.assert_not_called()
+
+    def test_clear_cache_routes_tenant_to_sink(self, sink_backend_stub):
+        with patch("api.attack_paths.database.ingest") as mock_ingest:
+            db_module.clear_cache("db-tenant-abc")
+
+        sink_backend_stub.clear_cache.assert_called_once_with("db-tenant-abc")
+        mock_ingest.clear_cache.assert_not_called()
+
+    def test_get_session_routes_temp_to_ingest(self, sink_backend_stub):
+        sentinel = MagicMock()
+        with patch("api.attack_paths.database.ingest") as mock_ingest:
+            mock_ingest.get_session.return_value = sentinel
+
+            result = db_module.get_session("db-tmp-scan-uuid-1")
+
+        assert result is sentinel
+        mock_ingest.get_session.assert_called_once()
+        sink_backend_stub.get_session.assert_not_called()
+
+    def test_get_session_routes_none_to_ingest(self, sink_backend_stub):
+        sentinel = MagicMock()
+        with patch("api.attack_paths.database.ingest") as mock_ingest:
+            mock_ingest.get_session.return_value = sentinel
+
+            result = db_module.get_session(None)
+
+        assert result is sentinel
+        sink_backend_stub.get_session.assert_not_called()
+
+    def test_get_ingest_uri_delegates_to_ingest(self, sink_backend_stub):
+        with patch("api.attack_paths.database.ingest") as mock_ingest:
+            mock_ingest.get_uri.return_value = "bolt://neo4j:7687"
+
+            assert db_module.get_ingest_uri() == "bolt://neo4j:7687"
+
+            mock_ingest.get_uri.assert_called_once_with()
+
+    def test_get_session_routes_tenant_to_sink(self, sink_backend_stub):
+        sentinel = MagicMock()
+        sink_backend_stub.get_session.return_value = sentinel
+        with patch("api.attack_paths.database.ingest") as mock_ingest:
+            result = db_module.get_session("db-tenant-abc")
+
+        assert result is sentinel
+        mock_ingest.get_session.assert_not_called()
@@ -67,7 +67,7 @@ class TestLivenessEndpoint:
        with (
            patch("api.health._probe_postgres") as mock_pg,
            patch("api.health._probe_valkey") as mock_vk,
-            patch("api.health._probe_neo4j") as mock_neo,
+            patch("api.health._probe_graph_db") as mock_neo,
        ):
            response = api_client.get(reverse("health-live"))

@@ -83,14 +83,14 @@ class TestReadinessEndpoint:
        return (
            patch("api.health._probe_postgres", return_value=None),
            patch("api.health._probe_valkey", return_value=None),
-            patch("api.health._probe_neo4j", return_value=None),
+            patch("api.health._probe_graph_db", return_value=None),
        )

    def test_returns_200_and_pass_when_all_dependencies_healthy(self, api_client):
        with (
            patch("api.health._probe_postgres"),
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            response = api_client.get(reverse("health-ready"))

@@ -107,7 +107,7 @@ class TestReadinessEndpoint:
        assert set(body["checks"].keys()) == {
            "postgres:responseTime",
            "valkey:responseTime",
-            "neo4j:responseTime",
+            "graphdb:responseTime",
        }
        for key in body["checks"]:
            entries = body["checks"][key]
@@ -122,6 +122,23 @@ class TestReadinessEndpoint:
            # `output` must not leak when the check passed.
            assert "output" not in entry

+    @pytest.mark.parametrize("sink", ["neo4j", "neptune"])
+    def test_graphdb_component_id_reflects_active_sink(self, api_client, sink):
+        from django.test import override_settings
+
+        with (
+            override_settings(ATTACK_PATHS_SINK_DATABASE=sink),
+            patch("api.health._probe_postgres"),
+            patch("api.health._probe_valkey"),
+            patch("api.health._probe_graph_db"),
+        ):
+            response = api_client.get(reverse("health-ready"))
+
+        assert response.status_code == status.HTTP_200_OK
+        entry = response.json()["checks"]["graphdb:responseTime"][0]
+        # Stable key, but the concrete store is named in componentId.
+        assert entry["componentId"] == sink
+
    def test_returns_503_and_fail_when_postgres_is_down(self, api_client):
        with (
            patch(
@@ -129,7 +146,7 @@ class TestReadinessEndpoint:
                side_effect=RuntimeError("connection refused"),
            ),
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            response = api_client.get(reverse("health-ready"))

@@ -141,13 +158,13 @@ class TestReadinessEndpoint:
        # Exception detail is never echoed in the response, only logged.
        assert "output" not in pg_entry
        assert body["checks"]["valkey:responseTime"][0]["status"] == "pass"
-        assert body["checks"]["neo4j:responseTime"][0]["status"] == "pass"
+        assert body["checks"]["graphdb:responseTime"][0]["status"] == "pass"

    def test_returns_503_and_fail_when_valkey_is_down(self, api_client):
        with (
            patch("api.health._probe_postgres"),
            patch("api.health._probe_valkey", side_effect=ConnectionError("timeout")),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            response = api_client.get(reverse("health-ready"))

@@ -158,12 +175,12 @@ class TestReadinessEndpoint:
        assert vk_entry["status"] == "fail"
        assert "output" not in vk_entry

-    def test_returns_503_and_fail_when_neo4j_is_down(self, api_client):
+    def test_returns_503_and_fail_when_graph_db_is_down(self, api_client):
        with (
            patch("api.health._probe_postgres"),
            patch("api.health._probe_valkey"),
            patch(
-                "api.health._probe_neo4j",
+                "api.health._probe_graph_db",
                side_effect=RuntimeError("ServiceUnavailable"),
            ),
        ):
@@ -172,15 +189,15 @@ class TestReadinessEndpoint:
        assert response.status_code == status.HTTP_503_SERVICE_UNAVAILABLE
        body = response.json()
        assert body["status"] == "fail"
-        neo_entry = body["checks"]["neo4j:responseTime"][0]
-        assert neo_entry["status"] == "fail"
-        assert "output" not in neo_entry
+        graph_db_entry = body["checks"]["graphdb:responseTime"][0]
+        assert graph_db_entry["status"] == "fail"
+        assert "output" not in graph_db_entry

    def test_reports_all_failures_simultaneously(self, api_client):
        with (
            patch("api.health._probe_postgres", side_effect=RuntimeError("pg down")),
            patch("api.health._probe_valkey", side_effect=RuntimeError("vk down")),
-            patch("api.health._probe_neo4j", side_effect=RuntimeError("neo down")),
+            patch("api.health._probe_graph_db", side_effect=RuntimeError("neo down")),
        ):
            response = api_client.get(reverse("health-ready"))

@@ -190,7 +207,7 @@ class TestReadinessEndpoint:
        for key in (
            "postgres:responseTime",
            "valkey:responseTime",
-            "neo4j:responseTime",
+            "graphdb:responseTime",
        ):
            entry = body["checks"][key][0]
            assert entry["status"] == "fail"
@@ -209,7 +226,7 @@ class TestReadinessEndpoint:
        with (
            patch("api.health._probe_postgres", side_effect=RuntimeError(sensitive)),
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            response = api_client.get(reverse("health-ready"))

@@ -229,7 +246,7 @@ class TestReadinessEndpoint:
        with (
            patch("api.health._probe_postgres"),
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            api_client.credentials()
            response = api_client.get(reverse("health-ready"))
@@ -244,7 +261,7 @@ class TestReadinessCache:
        with (
            patch("api.health._probe_postgres") as pg,
            patch("api.health._probe_valkey") as vk,
-            patch("api.health._probe_neo4j") as neo,
+            patch("api.health._probe_graph_db") as neo,
        ):
            r1 = api_client.get(reverse("health-ready"))
            r2 = api_client.get(reverse("health-ready"))
@@ -262,7 +279,7 @@ class TestReadinessCache:
        with (
            patch("api.health._probe_postgres") as pg,
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            api_client.get(reverse("health-ready"))
            assert pg.call_count == 1
@@ -286,7 +303,7 @@ class TestReadinessCache:
        with (
            patch("api.health._probe_postgres", side_effect=RuntimeError("down")) as pg,
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
        ):
            r1 = api_client.get(reverse("health-ready"))
            r2 = api_client.get(reverse("health-ready"))
@@ -320,7 +337,7 @@ class TestRateLimiting:
        with (
            patch("api.health._probe_postgres"),
            patch("api.health._probe_valkey"),
-            patch("api.health._probe_neo4j"),
+            patch("api.health._probe_graph_db"),
            patch.object(ScopedRateThrottle, "parse_rate", return_value=(2, 60)),
        ):
            statuses = [
@@ -414,19 +431,42 @@ class TestProbeImplementations:
            with pytest.raises(RuntimeError, match="bug"):
                health._probe_valkey()

-    def test_neo4j_probe_calls_verify_connectivity(self):
-        with patch("api.attack_paths.database.get_driver") as mock_get_driver:
-            mock_get_driver.return_value.verify_connectivity.return_value = None
-            assert health._probe_neo4j() is None
-            mock_get_driver.return_value.verify_connectivity.assert_called_once_with()
+    def test_graph_db_probe_calls_verify_connectivity(self):
+        with patch("api.attack_paths.database.verify_connectivity") as mock_verify:
+            mock_verify.return_value = None
+            assert health._probe_graph_db() is None
+            mock_verify.assert_called_once_with()

-    def test_neo4j_probe_propagates_driver_errors(self):
-        with patch("api.attack_paths.database.get_driver") as mock_get_driver:
-            mock_get_driver.return_value.verify_connectivity.side_effect = RuntimeError(
-                "unreachable"
-            )
+    def test_graph_db_probe_propagates_errors(self):
+        with patch(
+            "api.attack_paths.database.verify_connectivity",
+            side_effect=RuntimeError("unreachable"),
+        ):
            with pytest.raises(RuntimeError, match="unreachable"):
-                health._probe_neo4j()
+                health._probe_graph_db()
+
+    def test_graph_db_probe_times_out_when_check_exceeds_budget(self):
+        # A sink whose connectivity check blocks past the probe budget must
+        # surface as a failure fast, not pin the request thread for the
+        # driver's full acquisition timeout.
+        import time as _time
+
+        def _hang() -> None:
+            _time.sleep(2)
+
+        with (
+            patch("api.health.GRAPH_DB_PROBE_TIMEOUT_SECONDS", 0.2),
+            patch(
+                "api.attack_paths.database.verify_connectivity",
+                side_effect=_hang,
+            ),
+        ):
+            started = _time.perf_counter()
+            with pytest.raises(TimeoutError):
+                health._probe_graph_db()
+            elapsed = _time.perf_counter() - started
+
+        assert elapsed < health.GRAPH_DB_PROBE_TIMEOUT_SECONDS + 1  # unpatched budget: the patch is undone here


 class TestStatusAggregation:
@@ -0,0 +1,626 @@
+"""Tests for the attack-paths sink factory and the Neo4j and Neptune sinks.
+
+The sink module picks a backend per ``settings.ATTACK_PATHS_SINK_DATABASE``.
+Neo4j is the default and preserves today's behavior; Neptune is opt-in and
+builds dual writer/reader Bolt drivers.
+"""
+
+import json
+from importlib import import_module
+from unittest.mock import MagicMock, patch
+
+import pytest
+
+# Prime patch-target resolution. `api.attack_paths.sink/__init__.py` doesn't
+# eagerly import these submodules (they're loaded on demand inside the
+# factory), so `mock.patch("api.attack_paths.sink.<sub>.…")` would fail with
+# AttributeError on first call. Importing here registers them as attributes
+# of the package before any decorator runs.
+import_module("api.attack_paths.sink.neo4j")
+import_module("api.attack_paths.sink.neptune")
+
+
+@pytest.fixture(autouse=True)
+def reset_sink_state():
+    """Reset the module-level backend singletons around each test.
+
+    The cache lives in `api.attack_paths.sink.factory`, not on the package.
+    """
+    from api.attack_paths.sink import factory
+
+    original_backend = factory._backend
+    original_secondary = dict(factory._secondary_backends)
+    factory._backend = None
+    factory._secondary_backends.clear()
+    yield
+    factory._backend = original_backend
+    factory._secondary_backends.clear()
+    factory._secondary_backends.update(original_secondary)
+
+
+class TestSinkFactory:
+    def test_default_resolves_to_neo4j(self, settings):
+        from api.attack_paths.sink import factory
+
+        settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
+        assert factory._resolve_setting() == "neo4j"
+
+    def test_neptune_resolves_correctly(self, settings):
+        from api.attack_paths.sink import factory
+
+        settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
+        assert factory._resolve_setting() == "neptune"
+
+    def test_invalid_value_raises(self, settings):
+        from api.attack_paths.sink import factory
+
+        settings.ATTACK_PATHS_SINK_DATABASE = "foo"
+        with pytest.raises(RuntimeError, match="ATTACK_PATHS_SINK_DATABASE"):
+            factory._resolve_setting()
+
+    @patch("api.attack_paths.sink.neo4j.neo4j.GraphDatabase.driver")
+    def test_init_builds_neo4j_backend_by_default(self, mock_driver, settings):
+        from api.attack_paths import sink as sink_module
+        from api.attack_paths.sink.neo4j import Neo4jSink
+
+        settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
+        settings.DATABASES = {
+            **settings.DATABASES,
+            "neo4j": {
+                "HOST": "localhost",
+                "PORT": "7687",
+                "USER": "neo4j",
+                "PASSWORD": "pw",
+            },
+        }
+        mock_driver.return_value = MagicMock()
+
+        backend = sink_module.init()
+
+        assert isinstance(backend, Neo4jSink)
+        mock_driver.assert_called_once()
+
+    @patch("api.attack_paths.sink.neptune.neptune_auth_provider")
+    @patch("api.attack_paths.sink.neptune.neo4j.GraphDatabase.driver")
+    def test_init_builds_neptune_backend(
+        self, mock_driver, mock_auth_provider, settings
+    ):
+        from api.attack_paths import sink as sink_module
+        from api.attack_paths.sink.neptune import NeptuneSink
+
+        settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
+        settings.DATABASES = {
+            **settings.DATABASES,
+            "neptune": {
+                "WRITER_ENDPOINT": "writer.example",
+                "READER_ENDPOINT": "reader.example",
+                "PORT": "8182",
+                "REGION": "eu-west-1",
+            },
+        }
+        mock_driver.return_value = MagicMock()
+        mock_auth_provider.return_value = lambda: None
+
+        backend = sink_module.init()
+
+        assert isinstance(backend, NeptuneSink)
+        # Writer + reader endpoints both trigger driver construction
+        assert mock_driver.call_count == 2
+        writer_uri = mock_driver.call_args_list[0][0][0]
+        reader_uri = mock_driver.call_args_list[1][0][0]
+        assert writer_uri == "bolt+s://writer.example:8182"
+        assert reader_uri == "bolt+s://reader.example:8182"
+
+    @patch("api.attack_paths.sink.neptune.neptune_auth_provider")
+    @patch("api.attack_paths.sink.neptune.neo4j.GraphDatabase.driver")
+    def test_neptune_reader_falls_back_to_writer(
+        self, mock_driver, mock_auth_provider, settings
+    ):
+        from api.attack_paths import sink as sink_module
+
+        settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
+        settings.DATABASES = {
+            **settings.DATABASES,
+            "neptune": {
+                "WRITER_ENDPOINT": "writer.example",
+                "READER_ENDPOINT": "",
+                "PORT": "8182",
+                "REGION": "eu-west-1",
+            },
+        }
+        mock_driver.return_value = MagicMock()
+        mock_auth_provider.return_value = lambda: None
+
+        sink_module.init()
+
+        # Only one driver call — reader aliases writer
+        assert mock_driver.call_count == 1
+
+
+class TestGetBackendForScan:
+    """``get_backend_for_scan`` routes by the row's recorded sink backend."""
+
+    @patch("api.attack_paths.sink.neo4j.neo4j.GraphDatabase.driver")
+    def test_legacy_scan_in_neo4j_process_uses_active_backend(
+        self, mock_driver, settings
+    ):
+        from api.attack_paths import sink as sink_module
+
+        settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
+        settings.DATABASES = {
+            **settings.DATABASES,
+            "neo4j": {
+                "HOST": "localhost",
+                "PORT": "7687",
+                "USER": "neo4j",
+                "PASSWORD": "pw",
+            },
+        }
+        mock_driver.return_value = MagicMock()
+
+        scan = MagicMock(sink_backend="neo4j")
+        backend = sink_module.get_backend_for_scan(scan)
+
+        assert backend is sink_module.get_backend()
+
+    def test_neptune_scan_on_neo4j_process_uses_neptune_secondary(self, settings):
+        from api.attack_paths.sink import factory
+
+        settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
+        active_neo4j = MagicMock(name="neo4j-active")
+        factory._backend = active_neo4j
+
+        secondary_neptune = MagicMock(name="neptune-secondary")
+        with patch.object(factory, "_build_backend", return_value=secondary_neptune):
+            scan = MagicMock(sink_backend="neptune")
+            backend = factory.get_backend_for_scan(scan)
+
+        assert backend is secondary_neptune
+        assert backend is not active_neo4j
+
+
+def _session_ctx(session: MagicMock) -> MagicMock:
+    ctx = MagicMock()
+    ctx.__enter__ = MagicMock(return_value=session)
+    ctx.__exit__ = MagicMock(return_value=False)
+    return ctx
+
+
+class TestNeo4jSinkSyncWrites:
+    def test_ensure_sync_indexes_runs_create_index_idempotent(self):
+        from api.attack_paths.sink.neo4j import Neo4jSink
+
+        sink = Neo4jSink()
+        session = MagicMock()
+        session.run.return_value = MagicMock()
+        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
+            sink.ensure_sync_indexes("db-tenant-x")
+
+        query = session.run.call_args.args[0]
+        assert "CREATE INDEX" in query
+        assert "IF NOT EXISTS" in query
+        assert "`_ProviderResource`" in query
+        assert "`_provider_element_id`" in query
+
+    def test_write_nodes_skips_empty_batch(self):
+        from api.attack_paths.sink.neo4j import Neo4jSink
+
+        sink = Neo4jSink()
+        with patch.object(sink, "get_session") as get_session:
+            sink.write_nodes("db-tenant-x", "`AWSUser`", [])
+            get_session.assert_not_called()
+
+    def test_write_nodes_merges_on_provider_resource_label(self):
+        from api.attack_paths.sink.neo4j import Neo4jSink
+
+        sink = Neo4jSink()
+        session = MagicMock()
+        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
+            sink.write_nodes(
+                "db-tenant-x",
+                "`AWSUser`:`_ProviderResource`",
+                [{"provider_element_id": "p:e", "props": {"k": "v"}}],
+            )
+
+        query, params = session.run.call_args.args
+        assert "MERGE (n:`_ProviderResource`" in query
+        assert "`_provider_element_id`: row.provider_element_id" in query
+        assert "SET n:`AWSUser`:`_ProviderResource`" in query
+        assert params == {"rows": [{"provider_element_id": "p:e", "props": {"k": "v"}}]}
+
+    def test_write_relationships_scopes_endpoints_by_provider_label(self):
+        from api.attack_paths.sink.neo4j import Neo4jSink
+
+        sink = Neo4jSink()
+        session = MagicMock()
+        provider_id = "00000000-0000-0000-0000-000000000abc"
+        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
+            sink.write_relationships(
+                "db-tenant-x",
+                "RESOURCE",
+                provider_id,
+                [
+                    {
+                        "start_element_id": "s",
+                        "end_element_id": "e",
+                        "provider_element_id": "pe",
+                        "props": {},
+                    }
+                ],
+            )
+
+        query = session.run.call_args.args[0]
+        assert ":`_Provider_00000000000000000000000000000abc`" in query
+        assert ":RESOURCE" in query.replace("`", "")
+        assert "MERGE (s)-[r:`RESOURCE`" in query
+
+
+class TestNeptuneSinkSyncWrites:
+    def test_ensure_sync_indexes_is_noop(self):
+        from api.attack_paths.sink.neptune import NeptuneSink
+
+        sink = NeptuneSink()
+        with patch.object(sink, "get_session") as get_session:
+            sink.ensure_sync_indexes("ignored")
+            get_session.assert_not_called()
+
+    def test_write_nodes_merges_on_neptune_id_with_provider_resource_label(self):
+        from api.attack_paths.sink.neptune import NeptuneSink
+
+        sink = NeptuneSink()
+        session = MagicMock()
+        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
+            sink.write_nodes(
+                "ignored",
+                "`AWSUser`",
+                [{"provider_element_id": "p:e", "props": {"k": "v"}}],
+            )
+
+        query = session.run.call_args.args[0]
+        # Neptune assigns a default `vertex` label to any unlabeled node,
+        # so the MERGE must pin a real label at creation time.
+        assert "MERGE (n:`_ProviderResource` {`~id`: row.provider_element_id})" in query
+        assert "SET n:`AWSUser`" in query
+        assert "SET n.`_provider_element_id` = row.provider_element_id" in query
+
+    def test_write_relationships_matches_endpoints_by_id(self):
+        from api.attack_paths.sink.neptune import NeptuneSink
+
+        sink = NeptuneSink()
+        session = MagicMock()
+        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
+            sink.write_relationships(
+                "ignored",
+                "RESOURCE",
+                "provider-1",
+                [
+                    {
+                        "start_element_id": "s",
+                        "end_element_id": "e",
+                        "provider_element_id": "pe",
+                        "props": {},
+                    }
+                ],
+            )
+
+        query = session.run.call_args.args[0]
+        assert "MATCH (s) WHERE id(s) = row.start_element_id" in query
+        assert "MATCH (e) WHERE id(e) = row.end_element_id" in query
+        assert "MERGE (s)-[r:`RESOURCE`" in query
+
+
+class TestNeptuneSinkDropSubgraph:
+    def test_drop_subgraph_deletes_rels_before_nodes_in_bounded_batches(self):
+        from api.attack_paths.sink.neptune import NeptuneSink
+
+        sink = NeptuneSink()
+        session = MagicMock()
+
+        rel_record_first = MagicMock()
+        rel_record_first.__getitem__ = lambda _self, key: 50
+        rel_record_drain = MagicMock()
+        rel_record_drain.__getitem__ = lambda _self, key: 0
+        node_record_first = MagicMock()
+        node_record_first.__getitem__ = lambda _self, key: 10
+        node_record_drain = MagicMock()
+        node_record_drain.__getitem__ = lambda _self, key: 0
+
+        run_results = [
+            MagicMock(single=MagicMock(return_value=rel_record_first)),
+            MagicMock(single=MagicMock(return_value=rel_record_drain)),
+            MagicMock(single=MagicMock(return_value=node_record_first)),
+            MagicMock(single=MagicMock(return_value=node_record_drain)),
+        ]
+        session.run.side_effect = run_results
+
+        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
+            deleted = sink.drop_subgraph("ignored", "provider-1")
+
+        assert deleted == 10  # relationship deletions (50) are not counted
+        first_query = session.run.call_args_list[0].args[0]
+        assert "DELETE r" in first_query
+        assert "DETACH DELETE" not in first_query
+        # DISTINCT avoids double-counting relationships matched from both ends.
+        assert "DISTINCT r" in first_query
+        third_query = session.run.call_args_list[2].args[0]
+        assert "DELETE n" in third_query
+
+
+class TestNeo4jSinkDropSubgraph:
+    """Neo4j drop deletes relationships then nodes in batches (no ``DETACH DELETE``)."""
+
+    def test_drop_subgraph_deletes_rels_before_nodes_in_bounded_batches(self):
+        from api.attack_paths.sink.neo4j import Neo4jSink
+
+        sink = Neo4jSink()
+        session = MagicMock()
+
+        rel_first = MagicMock()
+        rel_first.get = lambda key, default=0: 50
+        rel_drain = MagicMock()
+        rel_drain.get = lambda key, default=0: 0
+        node_first = MagicMock()
+        node_first.get = lambda key, default=0: 10
+        node_drain = MagicMock()
+        node_drain.get = lambda key, default=0: 0
+        session.run.side_effect = [
+            MagicMock(single=MagicMock(return_value=rel_first)),
+            MagicMock(single=MagicMock(return_value=rel_drain)),
+            MagicMock(single=MagicMock(return_value=node_first)),
+            MagicMock(single=MagicMock(return_value=node_drain)),
+        ]
+
+        provider_id = "00000000-0000-0000-0000-000000000abc"
+        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
+            deleted = sink.drop_subgraph("db-tenant-x", provider_id)
+
+        # Only phase-2 node counts contribute to the return value.
+        assert deleted == 10
+        assert session.run.call_count == 4
+
+        queries = [call.args[0] for call in session.run.call_args_list]
+        # Regression guard: the memory blow-up was caused by DETACH DELETE.
+        assert all("DETACH DELETE" not in query for query in queries)
+
+        first_query = queries[0]
+        assert "DELETE r" in first_query
+        # DISTINCT avoids double-counting relationships matched from both ends.
+        assert "DISTINCT r" in first_query
+        assert ":`_Provider_00000000000000000000000000000abc`" in first_query
+
+        assert "DELETE n" in queries[2]
+
+        # Relationships must be fully drained before nodes are deleted.
+        first_node = next(i for i, q in enumerate(queries) if "DELETE n" in q)
+        last_rel = max(i for i, q in enumerate(queries) if "DELETE r" in q)
+        assert last_rel < first_node
+
+    def test_drop_subgraph_returns_zero_when_database_does_not_exist(self):
+        from api.attack_paths.database import GraphDatabaseQueryException
+        from api.attack_paths.sink.neo4j import DATABASE_NOT_FOUND_CODE, Neo4jSink
+
+        sink = Neo4jSink()
+        session = MagicMock()
+        session.run.side_effect = GraphDatabaseQueryException(
+            message="db missing", code=DATABASE_NOT_FOUND_CODE
+        )
+
+        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
+            deleted = sink.drop_subgraph("db-tenant-missing", "provider-1")
+
+        assert deleted == 0
+
+
+class TestSinkHasProviderData:
+    """``has_provider_data`` is the read-path probe used by API views."""
+
+    def test_neo4j_returns_true_when_provider_node_exists(self):
+        from api.attack_paths.sink.neo4j import Neo4jSink
+
+        sink = Neo4jSink()
+        session = MagicMock()
+        session.run.return_value.single.return_value = MagicMock()
+        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
+            present = sink.has_provider_data(
+                "db-tenant-x", "00000000-0000-0000-0000-000000000abc"
+            )
+
+        assert present is True
+        query = session.run.call_args.args[0]
+        assert ":`_Provider_00000000000000000000000000000abc`" in query
+
+    def test_neo4j_returns_false_when_database_does_not_exist(self):
+        from api.attack_paths.database import GraphDatabaseQueryException
+        from api.attack_paths.sink.neo4j import DATABASE_NOT_FOUND_CODE, Neo4jSink
+
+        sink = Neo4jSink()
+        session = MagicMock()
+        session.run.side_effect = GraphDatabaseQueryException(
+            message="db missing", code=DATABASE_NOT_FOUND_CODE
+        )
+
+        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
+            present = sink.has_provider_data("db-tenant-missing", "provider-1")
+
+        assert present is False
+
+    def test_neptune_returns_true_when_provider_node_exists(self):
+        from api.attack_paths.sink.neptune import NeptuneSink
+
+        sink = NeptuneSink()
+        session = MagicMock()
+        session.run.return_value.single.return_value = MagicMock()
+        with patch.object(sink, "get_session", return_value=_session_ctx(session)):
+            present = sink.has_provider_data("ignored", "provider-1")
+
+        assert present is True
+
+
+class TestGetBackendForScanCutover:
+    """``get_backend_for_scan`` keeps old-sink scans queryable after cutover."""
+
+    def test_legacy_scan_on_neptune_process_uses_neo4j_secondary(self, settings):
+        from api.attack_paths.sink import factory
+
+        settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
+        active_neptune = MagicMock(name="neptune-active")
+        factory._backend = active_neptune
+
+        secondary_neo4j = MagicMock(name="neo4j-secondary")
+        with patch.object(factory, "_build_backend", return_value=secondary_neo4j):
+            scan = MagicMock(sink_backend="neo4j")
+            backend = factory.get_backend_for_scan(scan)
+
+        assert backend is secondary_neo4j
+        assert backend is not active_neptune
+
+
+class TestSinkVerifyConnectivity:
+    """The readiness probe calls ``verify_connectivity`` through the shim.
+
+    Neo4j checks its single driver; Neptune checks the reader (the API read
+    path), which on single-endpoint clusters aliases the writer.
+    """
+
+    @patch("api.attack_paths.sink.neo4j.neo4j.GraphDatabase.driver")
+    def test_neo4j_verifies_its_driver(self, mock_driver, settings):
+        from api.attack_paths.sink.neo4j import Neo4jSink
+
+        settings.DATABASES = {
+            **settings.DATABASES,
+            "neo4j": {
+                "HOST": "localhost",
+                "PORT": "7687",
+                "USER": "neo4j",
+                "PASSWORD": "pw",
+            },
+        }
+        driver = MagicMock()
+        mock_driver.return_value = driver
+
+        sink = Neo4jSink()
+        sink.init()
+        driver.verify_connectivity.reset_mock()  # ignore the eager init check
+        sink.verify_connectivity()
+
+        driver.verify_connectivity.assert_called_once_with()
+
+    @patch("api.attack_paths.sink.neptune.neptune_auth_provider")
+    @patch("api.attack_paths.sink.neptune.neo4j.GraphDatabase.driver")
+    def test_neptune_verifies_reader_not_writer(
+        self, mock_driver, mock_auth_provider, settings
+    ):
+        from api.attack_paths.sink.neptune import NeptuneSink
+
+        settings.DATABASES = {
+            **settings.DATABASES,
+            "neptune": {
+                "WRITER_ENDPOINT": "writer.example",
+                "READER_ENDPOINT": "reader.example",
+                "PORT": "8182",
+                "REGION": "eu-west-1",
+            },
+        }
+        writer, reader = MagicMock(name="writer"), MagicMock(name="reader")
+        mock_driver.side_effect = [writer, reader]
+        mock_auth_provider.return_value = lambda: None
+
+        sink = NeptuneSink()
+        sink.init()
+        writer.verify_connectivity.reset_mock()
+        reader.verify_connectivity.reset_mock()
+
+        sink.verify_connectivity()
+
+        reader.verify_connectivity.assert_called_once_with()
+        writer.verify_connectivity.assert_not_called()
+
+
+class TestSinkInitToleratesUnreachableSink:
+    """Init must not crash the process when the sink is down at boot.
+
+    Same degradation model as Postgres: the driver is retained and
+    reconnects lazily; /health/ready surfaces the outage until it recovers.
+    """
+
+    @patch("api.attack_paths.sink.neo4j.neo4j.GraphDatabase.driver")
+    def test_neo4j_init_continues_when_verify_fails(self, mock_driver, settings):
+        from api.attack_paths.sink.neo4j import Neo4jSink
+
+        settings.DATABASES = {
+            **settings.DATABASES,
+            "neo4j": {
+                "HOST": "localhost",
+                "PORT": "7687",
+                "USER": "neo4j",
+                "PASSWORD": "pw",
+            },
+        }
+        driver = MagicMock()
+        driver.verify_connectivity.side_effect = RuntimeError("unreachable")
+        mock_driver.return_value = driver
+
+        sink = Neo4jSink()
+        # Must not raise.
+        assert sink.init() is driver
+        assert sink._driver is driver
+
+    @patch("api.attack_paths.sink.neptune.neptune_auth_provider")
+    @patch("api.attack_paths.sink.neptune.neo4j.GraphDatabase.driver")
+    def test_neptune_init_continues_when_verify_fails(
+        self, mock_driver, mock_auth_provider, settings
+    ):
+        from api.attack_paths.sink.neptune import NeptuneSink
+
+        settings.DATABASES = {
+            **settings.DATABASES,
+            "neptune": {
+                "WRITER_ENDPOINT": "writer.example",
+                "READER_ENDPOINT": "reader.example",
+                "PORT": "8182",
+                "REGION": "eu-west-1",
+            },
+        }
+        driver = MagicMock()
+        driver.verify_connectivity.side_effect = RuntimeError("unreachable")
+        mock_driver.return_value = driver
+        mock_auth_provider.return_value = lambda: None
+
+        sink = NeptuneSink()
+        # Must not raise; both drivers retained.
+        sink.init()
+        assert sink._writer is not None
+        assert sink._reader is not None
+
+
+class TestNeptuneAdminNoOps:
+    """Neptune is single-database; admin DDL has no work to do."""
+
+    @pytest.mark.parametrize("method", ["create_database", "drop_database"])
+    def test_admin_ops_return_none_without_touching_a_session(self, method):
+        from api.attack_paths.sink.neptune import NeptuneSink
+
+        sink = NeptuneSink()
+        with patch.object(sink, "get_session") as get_session:
+            assert getattr(sink, method)("ignored") is None
+            get_session.assert_not_called()
+
+
+class TestNeptuneAuthToken:
+    """SigV4 signing for the Neptune Bolt endpoint."""
+
+    @patch("api.attack_paths.sink.neptune.SigV4Auth")
+    @patch("api.attack_paths.sink.neptune.BotoSession")
+    def test_host_header_includes_non_default_port(self, mock_boto, mock_sigv4):
+        # Neptune runs on 8182; the SigV4 canonical Host must keep the port or
+        # the signature is rejected.
+        from api.attack_paths.sink.neptune import _NeptuneAuthToken
+
+        credentials = MagicMock()
+        credentials.get_frozen_credentials.return_value = MagicMock()
+        mock_boto.return_value.get_credentials.return_value = credentials
+
+        token = _NeptuneAuthToken("eu-west-1", "https://writer.example:8182")
+
+        auth_obj = json.loads(token.credentials)
+        assert auth_obj["Host"] == "writer.example:8182"
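The port requirement checked by `test_host_header_includes_non_default_port` can be sketched in isolation. This is a minimal illustration, not the `_NeptuneAuthToken` implementation: `sigv4_host_header` is a hypothetical helper, and it only shows how a Host value that keeps an explicit port could be derived from an endpoint URL with the standard library.

```python
from urllib.parse import urlsplit


def sigv4_host_header(endpoint_url: str) -> str:
    """Return the Host value to sign, keeping any explicit port.

    Hypothetical helper for illustration; the real signing code may differ.
    """
    parts = urlsplit(endpoint_url)
    if not parts.hostname:
        raise ValueError(f"endpoint has no host: {endpoint_url!r}")
    # Rebuild from hostname and port so userinfo never leaks into the header.
    if parts.port is not None:
        return f"{parts.hostname}:{parts.port}"
    return parts.hostname
```

Under this sketch, `sigv4_host_header("https://writer.example:8182")` yields the same `writer.example:8182` value the test expects in the signed `Host` field.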
@@ -4754,6 +4754,64 @@ class TestAttackPathsScanViewSet:
        assert first_attributes["provider_type"] == provider.provider
        assert first_attributes["provider_uid"] == provider.uid

+    def test_attack_paths_scans_list_prefers_active_sink_scan_on_rollback(
+        self,
+        authenticated_client,
+        providers_fixture,
+        scans_fixture,
+        create_attack_paths_scan,
+        settings,
+    ):
+        settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
+        provider = providers_fixture[0]
+
+        neo4j_scan = create_attack_paths_scan(
+            provider,
+            scan=scans_fixture[0],
+            state=StateChoices.COMPLETED,
+            graph_data_ready=True,
+            sink_backend="neo4j",
+        )
+        neptune_scan = create_attack_paths_scan(
+            provider,
+            scan=scans_fixture[0],
+            state=StateChoices.COMPLETED,
+            graph_data_ready=True,
+            sink_backend="neptune",
+        )
+
+        response = authenticated_client.get(reverse("attack-paths-scans-list"))
+
+        assert response.status_code == status.HTTP_200_OK
+        ids = {item["id"] for item in response.json()["data"]}
+        assert str(neo4j_scan.id) in ids
+        assert str(neptune_scan.id) not in ids
+
+    def test_attack_paths_scans_list_falls_back_when_active_sink_has_no_scan(
+        self,
+        authenticated_client,
+        providers_fixture,
+        scans_fixture,
+        create_attack_paths_scan,
+        settings,
+    ):
+        settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
+        provider = providers_fixture[0]
+
+        legacy_scan = create_attack_paths_scan(
+            provider,
+            scan=scans_fixture[0],
+            state=StateChoices.COMPLETED,
+            graph_data_ready=True,
+            sink_backend="neo4j",
+        )
+
+        response = authenticated_client.get(reverse("attack-paths-scans-list"))
+
+        assert response.status_code == status.HTTP_200_OK
+        ids = {item["id"] for item in response.json()["data"]}
+        assert str(legacy_scan.id) in ids
+
    def test_attack_paths_scans_list_respects_provider_group_visibility(
        self,
        authenticated_client_no_permissions_rbac,
@@ -4874,7 +4932,8 @@ class TestAttackPathsScanViewSet:
            )

        assert response.status_code == status.HTTP_200_OK
-        mock_get_queries.assert_called_once_with(provider.provider)
+        # TODO: drop the is_migrated argument after Neptune cutover
+        mock_get_queries.assert_called_once_with(provider.provider, is_migrated=False)
        payload = response.json()["data"]
        assert len(payload) == 1
        assert payload[0]["id"] == "aws-rds"
@@ -4974,7 +5033,8 @@ class TestAttackPathsScanViewSet:
            )

        assert response.status_code == status.HTTP_200_OK
-        mock_get_query.assert_called_once_with("aws-rds")
+        # TODO: drop the is_migrated argument after Neptune cutover
+        mock_get_query.assert_called_once_with("aws-rds", is_migrated=False)
        mock_get_db_name.assert_called_once_with(attack_paths_scan.provider.tenant_id)
        provider_id = str(attack_paths_scan.provider_id)
        mock_prepare.assert_called_once_with(
@@ -4988,6 +5048,7 @@ class TestAttackPathsScanViewSet:
            query_definition,
            prepared_parameters,
            provider_id,
+            scan=attack_paths_scan,
        )
        result = response.json()["data"]
        attributes = result["attributes"]
@@ -5339,6 +5400,7 @@ class TestAttackPathsScanViewSet:
            "db-test",
            "MATCH (n) RETURN n",
            str(attack_paths_scan.provider_id),
+            scan=attack_paths_scan,
        )
        attributes = response.json()["data"]["attributes"]
        assert len(attributes["nodes"]) == 1
@@ -5875,9 +5937,10 @@ class TestAttackPathsScanViewSet:
            )

        assert response.status_code == status.HTTP_200_OK
-        mock_get_schema.assert_called_once_with(
-            "db-test", str(attack_paths_scan.provider_id)
-        )
+        mock_get_schema.assert_called_once()
+        schema_args = mock_get_schema.call_args[0]
+        assert schema_args[:2] == ("db-test", str(attack_paths_scan.provider_id))
+        assert schema_args[2].id == attack_paths_scan.id
        attributes = response.json()["data"]["attributes"]
        assert attributes["provider"] == "aws"
        assert attributes["cartography_version"] == "0.129.0"
@@ -2876,13 +2876,22 @@ class AttackPathsScanViewSet(BaseRLSViewSet):

    def list(self, request, *args, **kwargs):
        queryset = self.filter_queryset(self.get_queryset())
+        active_sink_backend = django_settings.ATTACK_PATHS_SINK_DATABASE

        latest_per_provider = queryset.annotate(
+            active_sink_rank=Case(
+                When(sink_backend=active_sink_backend, then=Value(0)),
+                default=Value(1),
+                output_field=IntegerField(),
+            ),
            latest_scan_rank=Window(
                expression=RowNumber(),
                partition_by=[F("provider_id")],
-                order_by=[F("inserted_at").desc()],
-            )
+                order_by=[
+                    F("active_sink_rank").asc(),
+                    F("inserted_at").desc(),
+                ],
+            ),
        ).filter(latest_scan_rank=1)

        page = self.paginate_queryset(latest_per_provider)
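The ordering used by `list` (active-sink rows first, then newest `inserted_at`, one row per provider) can be sketched in plain Python. This is an in-memory illustration of the ranking only, under the assumption that rows carry `provider_id`, `sink_backend`, and `inserted_at`; `ScanRow` and `pick_scan_per_provider` are hypothetical names, not part of the API.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ScanRow:
    """Hypothetical stand-in for an AttackPathsScan row."""

    id: str
    provider_id: str
    sink_backend: str
    inserted_at: datetime


def pick_scan_per_provider(rows, active_sink):
    """Return {provider_id: row}, preferring the active sink, then recency."""
    # Newest first, then a stable sort by sink rank: this mirrors
    # ORDER BY active_sink_rank ASC, inserted_at DESC.
    ordered = sorted(rows, key=lambda r: r.inserted_at, reverse=True)
    ordered.sort(key=lambda r: 0 if r.sink_backend == active_sink else 1)
    picked = {}
    for row in ordered:
        # The first row seen for a provider corresponds to row number 1.
        picked.setdefault(row.provider_id, row)
    return picked
```

With one `neo4j` row and a newer `neptune` row for the same provider, an active sink of `neo4j` selects the older `neo4j` row, which is the rollback behavior the new view tests assert.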
@@ -2909,7 +2918,11 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
    )
    def attack_paths_queries(self, request, pk=None):
        attack_paths_scan = self.get_object()
-        queries = get_queries_for_provider(attack_paths_scan.provider.provider)
+        # TODO: drop the is_migrated argument after Neptune cutover
+        queries = get_queries_for_provider(
+            attack_paths_scan.provider.provider,
+            is_migrated=attack_paths_scan.is_migrated,
+        )

        if not queries:
            return Response(
@@ -2942,7 +2955,11 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
        serializer = AttackPathsQueryRunRequestSerializer(data=payload)
        serializer.is_valid(raise_exception=True)

-        query_definition = get_query_by_id(serializer.validated_data["id"])
+        # TODO: drop the is_migrated argument after Neptune cutover
+        query_definition = get_query_by_id(
+            serializer.validated_data["id"],
+            is_migrated=attack_paths_scan.is_migrated,
+        )
        if (
            query_definition is None
            or query_definition.provider != attack_paths_scan.provider.provider
@@ -2968,6 +2985,7 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
            query_definition,
            parameters,
            provider_id,
+            scan=attack_paths_scan,
        )
        query_duration = time.monotonic() - start

@@ -3035,6 +3053,7 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
            database_name,
            serializer.validated_data["query"],
            provider_id,
+            scan=attack_paths_scan,
        )
        query_duration = time.monotonic() - start

@@ -3091,7 +3110,7 @@ class AttackPathsScanViewSet(BaseRLSViewSet):
        provider_id = str(attack_paths_scan.provider_id)

        schema = attack_paths_views_helpers.get_cartography_schema(
-            database_name, provider_id
+            database_name, provider_id, attack_paths_scan
        )
        if not schema:
            return Response(
@@ -311,6 +311,11 @@ ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES = env.int(
    "ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES", 2880
 )  # 48h

+# Selects where the persistent attack-paths graph is stored. Each scan's
+# temporary database is always Neo4j; only the sink is configurable.
+# Valid values: "neo4j" (default, OSS and local dev), "neptune" (hosted).
+ATTACK_PATHS_SINK_DATABASE = env.str("ATTACK_PATHS_SINK_DATABASE", default="neo4j")
+
 # Orphan task recovery feature flags. The master switch is OFF by default, so task
 # recovery is opt-in; enable it with DJANGO_TASK_RECOVERY_ENABLED=true. The per-group
 # toggles default to enabled, so once the master is on every group recovers unless a
@@ -50,6 +50,12 @@ DATABASES = {
        "USER": env.str("NEO4J_USER", "neo4j"),
        "PASSWORD": env.str("NEO4J_PASSWORD", "neo4j_password"),
    },
+    "neptune": {
+        "WRITER_ENDPOINT": env.str("NEPTUNE_WRITER_ENDPOINT", ""),
+        "READER_ENDPOINT": env.str("NEPTUNE_READER_ENDPOINT", ""),
+        "PORT": env.str("NEPTUNE_PORT", "8182"),
+        "REGION": env.str("AWS_REGION", ""),
+    },
 }

 DATABASES["default"] = DATABASES["prowler_user"]
@@ -49,12 +49,19 @@ DATABASES = {
        "HOST": env("POSTGRES_REPLICA_HOST", default=default_db_host),
        "PORT": env("POSTGRES_REPLICA_PORT", default=default_db_port),
    },
+    # TODO: after Neptune cutover, just loosen these defaults to `""`
    "neo4j": {
        "HOST": env.str("NEO4J_HOST"),
        "PORT": env.str("NEO4J_PORT"),
        "USER": env.str("NEO4J_USER"),
        "PASSWORD": env.str("NEO4J_PASSWORD"),
    },
+    "neptune": {
+        "WRITER_ENDPOINT": env.str("NEPTUNE_WRITER_ENDPOINT", default=""),
+        "READER_ENDPOINT": env.str("NEPTUNE_READER_ENDPOINT", default=""),
+        "PORT": env.str("NEPTUNE_PORT", default="8182"),
+        "REGION": env.str("AWS_REGION", default=""),
+    },
 }

 DATABASES["default"] = DATABASES["prowler_user"]
@@ -83,12 +83,28 @@ def _warm_compliance_caches_in_background():


 def post_fork(_server, worker):
-    """Warm compliance caches after each worker fork.
+    """Re-initialize attack-paths drivers and warm compliance caches per worker.

-    Warm compliance caches in a background thread so the worker becomes ready
-    immediately. A request for a not-yet-warmed provider lazily loads just that
-    provider, which stays well under the worker timeout.
+    Neo4j / Neptune drivers spawn background IO threads that do not survive
+    ``fork()``. When the gunicorn master runs with ``preload_app=True``, the
+    child inherits driver objects whose pool references dead threads and
+    hangs on the first ``pool.acquire`` call until the watchdog kills the
+    worker. Re-initializing per worker guarantees each child owns its own
+    live threads. See GUNICORN_WORKER_TIMEOUTS_ANALYSIS.md for detail.
+
+    Compliance caches are then warmed in a background thread so the worker
+    becomes ready immediately. A request for a not-yet-warmed provider lazily
+    loads just that provider, which stays well under the worker timeout.
    """
+    from api.attack_paths import database as graph_database
+
+    try:
+        graph_database.close_driver()
+    except Exception:  # pragma: no cover - best-effort cleanup
+        pass
+    graph_database.init_driver()
+    gunicorn_logger.info(f"Attack-paths drivers initialized for worker {worker.pid}")
+
    threading.Thread(
        target=_warm_compliance_caches_in_background,
        name="warm-compliance-caches",
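The close-then-reinitialize step in `post_fork` can be illustrated with a small stand-in. This is a sketch under stated assumptions: `GraphDrivers` and `reinit_after_fork` are hypothetical names, and the holder only records which process owns it instead of opening real Bolt connections.

```python
import os


class GraphDrivers:
    """Hypothetical holder for per-process driver state."""

    def __init__(self):
        self.owner_pid = None

    def init(self):
        # A real implementation would build drivers here; we record the owner.
        self.owner_pid = os.getpid()

    def close(self):
        if self.owner_pid is None:
            raise RuntimeError("drivers were never initialized")
        self.owner_pid = None


def reinit_after_fork(drivers):
    """Discard inherited state (best effort), then initialize for this process."""
    try:
        drivers.close()
    except Exception:
        # Inherited state may be unusable after fork; ignore cleanup errors.
        pass
    drivers.init()
```

The cleanup failure is swallowed on purpose in this sketch, matching the best-effort `close_driver()` call above: what matters is that `init` always runs in the child.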
@@ -1821,6 +1821,36 @@ def attack_paths_query_definition_factory():
    return _create


+@pytest.fixture
+def sink_backend_stub():
+    """Install a stub `SinkDatabase` into the sink factory for the test's duration.
+
+    The sink factory caches a process-wide backend and lazily initializes it
+    against `settings.DATABASES["neo4j"]` / `["neptune"]`. Tests that don't
+    want to stand up a real Bolt driver can yield this fixture's mock and
+    configure its return values directly:
+
+        sink_backend_stub.execute_read_query.return_value = some_graph
+
+    Both the active backend and the secondary-backend cache are restored on
+    teardown so tests stay isolated.
+    """
+    from api.attack_paths.sink import factory
+    from api.attack_paths.sink.base import SinkDatabase
+
+    stub = MagicMock(spec=SinkDatabase)
+    previous_backend = factory._backend
+    previous_secondary = dict(factory._secondary_backends)
+    factory._backend = stub
+    factory._secondary_backends.clear()
+    try:
+        yield stub
+    finally:
+        factory._backend = previous_backend
+        factory._secondary_backends.clear()
+        factory._secondary_backends.update(previous_secondary)
+
+
@pytest.fixture
 def attack_paths_graph_stub_classes():
    """Provide lightweight graph element stubs for Attack Paths serialization tests."""
@@ -6,6 +6,7 @@ from typing import Any

 import aioboto3
 import boto3
+import botocore
 import neo4j
 from api.models import (
    AttackPathsScan as ProwlerAPIAttackPathsScan,
@@ -73,13 +74,28 @@ def start_aws_ingestion(
    # Adding an extra field
    common_job_parameters["AWS_ID"] = prowler_api_provider.uid

-    cartography_aws._autodiscover_accounts(
-        neo4j_session,
-        boto3_session,
-        prowler_api_provider.uid,
-        cartography_config.update_tag,
-        common_job_parameters,
-    )
+    # AWS Organizations account autodiscovery. Inlined from Cartography's removed
+    # `_autodiscover_accounts` (deleted in `0.137.0`), as `load_aws_accounts` is still public.
+    try:
+        org_client = boto3_session.client("organizations")
+        paginator = org_client.get_paginator("list_accounts")
+        discovered = []
+        for page in paginator.paginate():
+            discovered.extend(page["Accounts"])
+        active_accounts = {
+            a["Name"]: a["Id"] for a in discovered if a["Status"] == "ACTIVE"
+        }
+        cartography_aws.organizations.load_aws_accounts(
+            neo4j_session,
+            active_accounts,
+            cartography_config.update_tag,
+            common_job_parameters,
+        )
+    except botocore.exceptions.ClientError:
+        logger.warning(
+            f"Account {prowler_api_provider.uid} lacks permissions for AWS "
+            "Organizations autodiscovery."
+        )
    db_utils.update_attack_paths_scan_progress(attack_paths_scan, 4)

    failed_syncs = sync_aws_account(
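The account filtering inlined above can be exercised without AWS access. The sketch below assumes pages shaped like the `list_accounts` paginator output used in the hunk (an `Accounts` list whose items have `Name`, `Id`, and `Status`); `active_accounts_by_name` is a hypothetical helper, not a Cartography or Prowler function.

```python
def active_accounts_by_name(pages):
    """Collapse `list_accounts` pages into a {name: id} map of ACTIVE accounts."""
    return {
        account["Name"]: account["Id"]
        for page in pages
        for account in page.get("Accounts", [])
        if account.get("Status") == "ACTIVE"
    }


# Example pages, invented for illustration only.
example_pages = [
    {"Accounts": [{"Name": "prod", "Id": "111111111111", "Status": "ACTIVE"}]},
    {"Accounts": [{"Name": "old", "Id": "222222222222", "Status": "SUSPENDED"}]},
]
```

Because the map is keyed by account name, two active accounts that share a name would collapse into one entry; that is a property of this shape, so it is worth keeping in mind when reading the hunk.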
@@ -277,7 +293,7 @@ def sync_aws_account(
    sync_args: dict[str, Any],
    attack_paths_scan: ProwlerAPIAttackPathsScan,
 ) -> dict[str, str]:
-    current_progress = 4  # `cartography_aws._autodiscover_accounts`
+    current_progress = 4  # AWS Organizations account autodiscovery
    max_progress = (
        87  # `cartography_aws.RESOURCE_FUNCTIONS["permission_relationships"]` - 1
    )
@@ -8,7 +8,7 @@ from celery import states
 from celery.utils.log import get_task_logger
 from config.django.base import ATTACK_PATHS_SCAN_STALE_THRESHOLD_MINUTES
 from tasks.jobs.attack_paths.db_utils import (
-    _mark_scan_finished,
+    mark_scan_finished,
    recover_graph_data_ready,
 )
 from tasks.jobs.orphan_recovery import is_worker_alive as _is_worker_alive
@@ -87,7 +87,7 @@ def _cleanup_stale_executing_scans(cutoff: datetime) -> list[str]:
            else:
                reason = "Worker dead — cleaned up by periodic task"
        else:
-            # No worker recorded — time-based heuristic only
+            # No worker recorded, time-based heuristic only
            if scan.started_at and scan.started_at >= cutoff:
                continue
            reason = (
@@ -160,7 +160,7 @@ def _cleanup_scan(scan, task_result, reason: str) -> bool:
    """
    scan_id_str = str(scan.id)

-    # 1. Drop temp Neo4j database
+    # Drop temp Neo4j database
    tmp_db_name = graph_database.get_database_name(scan.id, temporary=True)
    try:
        graph_database.drop_database(tmp_db_name)
@@ -225,6 +225,6 @@ def _finalize_failed_scan(scan, expected_state: str, reason: str):
            logger.info(f"Scan {scan_id_str} is now {fresh_scan.state}, skipping")
            return None

-        _mark_scan_finished(fresh_scan, StateChoices.FAILED, {"global_error": reason})
+        mark_scan_finished(fresh_scan, StateChoices.FAILED, {"global_error": reason})

    return fresh_scan
@@ -1,9 +1,14 @@
 from collections.abc import Callable
-from dataclasses import dataclass
 from uuid import UUID

 from config.env import env
-from tasks.jobs.attack_paths import aws
+from tasks.jobs.attack_paths import provider_config as _provider_config
+
+# Re-export provider config objects so existing imports keep working.
+AWS_CONFIG = _provider_config.AWS_CONFIG
+NormalizedList = _provider_config.NormalizedList
+PROVIDER_CONFIGS = _provider_config.PROVIDER_CONFIGS
+ProviderConfig = _provider_config.ProviderConfig

 # Batch size for Neo4j write operations (resource labeling, cleanup)
 BATCH_SIZE = env.int("ATTACK_PATHS_BATCH_SIZE", 1000)
@@ -21,42 +26,12 @@ PROWLER_FINDING_LABEL = "ProwlerFinding"
 PROVIDER_RESOURCE_LABEL = "_ProviderResource"

 # Dynamic isolation labels that contain entity UUIDs and are added to every synced node during sync
-# Format: _Tenant_{uuid_no_hyphens}, _Provider_{uuid_no_hyphens}
+# Format: `_Tenant_{uuid_no_hyphens}`, `_Provider_{uuid_no_hyphens}`
 TENANT_LABEL_PREFIX = "_Tenant_"
 PROVIDER_LABEL_PREFIX = "_Provider_"
 DYNAMIC_ISOLATION_PREFIXES = [TENANT_LABEL_PREFIX, PROVIDER_LABEL_PREFIX]


-@dataclass(frozen=True)
-class ProviderConfig:
-    """Configuration for a cloud provider's Attack Paths integration."""
-
-    name: str
-    root_node_label: str  # e.g., "AWSAccount"
-    uid_field: str  # e.g., "arn"
-    # Label for resources connected to the account node, enabling indexed finding lookups.
-    resource_label: str  # e.g., "_AWSResource"
-    ingestion_function: Callable
-    # Maps a Postgres resource UID (e.g. full ARN) to the short-id form Cartography stores on some node types (e.g. `i-xxx` for EC2Instance).
-    short_uid_extractor: Callable[[str], str]
-
-
-# Provider Configurations
-# -----------------------
-
-AWS_CONFIG = ProviderConfig(
-    name="aws",
-    root_node_label="AWSAccount",
-    uid_field="arn",
-    resource_label="_AWSResource",
-    ingestion_function=aws.start_aws_ingestion,
-    short_uid_extractor=aws.extract_short_uid,
-)
-
-PROVIDER_CONFIGS: dict[str, ProviderConfig] = {
-    "aws": AWS_CONFIG,
-}
-
 # Labels added by Prowler that should be filtered from API responses
 # Derived from provider configs + common internal labels
 INTERNAL_LABELS: list[str] = [
@@ -87,7 +62,6 @@ INTERNAL_PROPERTIES: list[str] = [


 # Provider Config Accessors
-# -------------------------


 def is_provider_available(provider_type: str) -> bool:
@@ -135,7 +109,6 @@ def get_short_uid_extractor(provider_type: str) -> Callable[[str], str]:


 # Dynamic Isolation Label Helpers
-# --------------------------------


 def _normalize_uuid(value: str | UUID) -> str:
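The isolation label format documented above (`_Provider_{uuid_no_hyphens}`) can be sketched as follows. This is an assumption about the shape of the result, consistent with the label asserted in `test_neo4j_returns_true_when_provider_node_exists`; `provider_label` is a hypothetical helper and does not claim to reproduce `_normalize_uuid`.

```python
from __future__ import annotations

from uuid import UUID

PROVIDER_LABEL_PREFIX = "_Provider_"


def provider_label(provider_id: str | UUID) -> str:
    """Build a provider isolation label by stripping hyphens from the ID."""
    # str(UUID) is the hyphenated lowercase form, so both input types agree.
    return f"{PROVIDER_LABEL_PREFIX}{str(provider_id).replace('-', '')}"
```

Hyphens are stripped here so the label forms a single token, although the query in the test still quotes it with backticks.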
@@ -8,6 +8,8 @@ from api.models import Provider as ProwlerAPIProvider
 from api.models import StateChoices
 from cartography.config import Config as CartographyConfig
 from celery.utils.log import get_task_logger
+from django.conf import settings
+from django.db.models import Case, IntegerField, Value, When
 from tasks.jobs.attack_paths.config import is_provider_available

 logger = get_task_logger(__name__)
@@ -29,13 +31,33 @@ def create_attack_paths_scan(
        return None

    with rls_transaction(tenant_id):
-        # Inherit graph_data_ready from the previous scan for this provider,
-        # so queries remain available while the new scan runs.
-        previous_data_ready = ProwlerAPIAttackPathsScan.objects.filter(
-            tenant_id=tenant_id,
-            provider_id=provider_id,
-            graph_data_ready=True,
-        ).exists()
+        # Inherit metadata from the previous ready scan for this provider so
+        # queries remain available while the new scan runs. The new row only
+        # flips to the target sink after its own graph sync succeeds.
+        active_sink_backend = settings.ATTACK_PATHS_SINK_DATABASE
+        previous_ready = (
+            ProwlerAPIAttackPathsScan.objects.filter(
+                tenant_id=tenant_id,
+                provider_id=provider_id,
+                graph_data_ready=True,
+            )
+            .annotate(
+                active_sink_rank=Case(
+                    When(sink_backend=active_sink_backend, then=Value(0)),
+                    default=Value(1),
+                    output_field=IntegerField(),
+                )
+            )
+            .order_by("active_sink_rank", "-inserted_at")
+            .first()
+        )
+        previous_data_ready = previous_ready is not None
+        inherited_is_migrated = previous_ready.is_migrated if previous_ready else False
+        inherited_sink_backend = (
+            previous_ready.sink_backend
+            if previous_ready
+            else ProwlerAPIAttackPathsScan.SinkBackendChoices.NEO4J
+        )

        attack_paths_scan = ProwlerAPIAttackPathsScan.objects.create(
            tenant_id=tenant_id,
@@ -44,6 +66,8 @@ def create_attack_paths_scan(
            state=StateChoices.SCHEDULED,
            started_at=datetime.now(tz=UTC),
            graph_data_ready=previous_data_ready,
+            is_migrated=inherited_is_migrated,
+            sink_backend=inherited_sink_backend,
        )
        attack_paths_scan.save()

@@ -114,7 +138,7 @@ def starting_attack_paths_scan(
    return True


-def _mark_scan_finished(
+def mark_scan_finished(
    attack_paths_scan: ProwlerAPIAttackPathsScan,
    state: StateChoices,
    ingestion_exceptions: dict[str, Any],
@@ -148,7 +172,7 @@ def finish_attack_paths_scan(
    ingestion_exceptions: dict[str, Any],
 ) -> None:
    with rls_transaction(attack_paths_scan.tenant_id):
-        _mark_scan_finished(attack_paths_scan, state, ingestion_exceptions)
+        mark_scan_finished(attack_paths_scan, state, ingestion_exceptions)


 def update_attack_paths_scan_progress(
@@ -169,19 +193,45 @@ def set_graph_data_ready(
        attack_paths_scan.save(update_fields=["graph_data_ready"])


+def set_scan_migrated(
+    attack_paths_scan: ProwlerAPIAttackPathsScan,
+    migrated: bool,
+    sink_backend: str | None = None,
+) -> None:
+    """Record whether the scan was written with the current (migrated) schema.
+
+    Called after a successful sync so the read catalog and sink backend only
+    switch once the new graph is actually live. When `sink_backend` is given,
+    it is saved in the same update.
+
+    # TODO: drop after Neptune cutover
+    """
+    with rls_transaction(attack_paths_scan.tenant_id):
+        attack_paths_scan.is_migrated = migrated
+        update_fields = ["is_migrated"]
+        if sink_backend is not None:
+            attack_paths_scan.sink_backend = sink_backend
+            update_fields.append("sink_backend")
+        attack_paths_scan.save(update_fields=update_fields)
+
+
 def set_provider_graph_data_ready(
    attack_paths_scan: ProwlerAPIAttackPathsScan,
    ready: bool,
+    sink_backend: str | None = None,
 ) -> None:
    """
-    Set `graph_data_ready` for ALL scans of the same provider.
+    Set `graph_data_ready` for every scan of the same provider in one sink
+    (`sink_backend`, or the scan's own sink when omitted).

-    Used before drop/sync so that older scan IDs cannot bypass the query gate while the graph is being replaced.
+    Used before drop/sync so that older scan IDs in the target sink cannot
+    bypass the query gate while that sink's graph is being replaced. Scans
+    preserved in another sink stay queryable for rollback.
    """
+    target_sink_backend = sink_backend or attack_paths_scan.sink_backend
    with rls_transaction(attack_paths_scan.tenant_id):
        ProwlerAPIAttackPathsScan.objects.filter(
            tenant_id=attack_paths_scan.tenant_id,
            provider_id=attack_paths_scan.provider_id,
+            sink_backend=target_sink_backend,
        ).update(graph_data_ready=ready)
        attack_paths_scan.refresh_from_db(fields=["graph_data_ready"])

@@ -202,10 +252,15 @@ def recover_graph_data_ready(
    next successful scan) is a worse outcome for the user.
    """
    try:
+        from api.attack_paths import sink as sink_module
+
        tenant_db = graph_database.get_database_name(attack_paths_scan.tenant_id)
-        if graph_database.has_provider_data(
-            tenant_db, str(attack_paths_scan.provider_id)
-        ):
+        # TODO: drop after Neptune cutover
+        # Check the backend that actually holds this scan's data, not the
+        # currently configured sink: a stale `EXECUTING` scan from before a
+        # backend switch must still be recoverable.
+        backend = sink_module.get_backend_for_scan(attack_paths_scan)
+        if backend.has_provider_data(tenant_db, str(attack_paths_scan.provider_id)):
            set_provider_graph_data_ready(attack_paths_scan, True)
            logger.info(
                f"Recovered `graph_data_ready` for provider {attack_paths_scan.provider_id}"
@@ -247,6 +302,6 @@ def fail_attack_paths_scan(
            return
        if fresh.state in (StateChoices.COMPLETED, StateChoices.FAILED):
            return
-        _mark_scan_finished(fresh, StateChoices.FAILED, {"global_error": error})
+        mark_scan_finished(fresh, StateChoices.FAILED, {"global_error": error})

    recover_graph_data_ready(fresh)
@@ -82,7 +82,6 @@ def _to_neo4j_dict(


 # Public API
-# ----------


 def analysis(
@@ -196,7 +195,6 @@ def load_findings(


 # Findings Streaming (Generator-based)
-# -------------------------------------


 def stream_findings_with_resources(
@@ -275,7 +273,6 @@ def _fetch_findings_batch(


 # Batch Enrichment
-# -----------------


 def _enrich_batch_with_resources(
@@ -1,5 +1,6 @@
 import neo4j
 from cartography.client.core.tx import run_write_query
+from cartography.intel import create_indexes as cartography_create_indexes
 from celery.utils.log import get_task_logger
 from tasks.jobs.attack_paths.config import (
    INTERNET_NODE_LABEL,
@@ -30,14 +31,34 @@ SYNC_INDEX_STATEMENTS = [


 def create_findings_indexes(neo4j_session: neo4j.Session) -> None:
-    """Create indexes for Prowler findings and resource lookups."""
+    """Create indexes for Prowler findings and resource lookups.
+
+    Runs `CREATE INDEX`, so the caller must only invoke this against a Neo4j
+    session (the temp ingest DB or a Neo4j sink). Neptune auto-manages indexes
+    and rejects `CREATE INDEX`, so callers skip it for the Neptune sink.
+    """
    logger.info("Creating indexes for Prowler Findings node types")
    for statement in FINDINGS_INDEX_STATEMENTS:
        run_write_query(neo4j_session, statement)


+def create_cartography_indexes(neo4j_session: neo4j.Session, config) -> None:
+    """Create Cartography's standard indexes for the session's database.
+
+    Runs `CREATE INDEX`, so the caller must only invoke this against a Neo4j
+    session (the temp ingest DB or a Neo4j sink). Neptune auto-manages indexes
+    and rejects `CREATE INDEX`, so callers skip it for the Neptune sink.
+    """
+    cartography_create_indexes.run(neo4j_session, config)
+
+
 def create_sync_indexes(neo4j_session: neo4j.Session) -> None:
-    """Create indexes for provider resource sync operations."""
+    """Create indexes for provider resource sync operations.
+
+    Runs `CREATE INDEX`, so the caller must only invoke this against a Neo4j
+    session (the temp ingest DB or a Neo4j sink). Neptune auto-manages indexes
+    and rejects `CREATE INDEX`, so callers skip it for the Neptune sink.
+    """
    logger.info("Ensuring ProviderResource indexes exist")
    for statement in SYNC_INDEX_STATEMENTS:
        neo4j_session.run(statement)
@@ -0,0 +1,413 @@
+"""
+Provider-level Attack Paths configuration.
+
+Each `ProviderConfig` carries the cloud provider's ingestion entry point and
+the catalog of list-typed node properties (`normalized_lists`). The sync
+layer reads this catalog and materialises each list element as a child node
+connected to the parent by a typed edge, so queries traverse the graph
+instead of working on serialised list values. Both Neo4j and Neptune sinks
+write the same shape and queries are portable across them.
+"""
+
+from collections.abc import Callable
+from dataclasses import dataclass, field
+
+from tasks.jobs.attack_paths import aws
+
+
+@dataclass(frozen=True)
+class NormalizedList:
+    """Catalog entry for a list-typed node property.
+
+    Describes how the sync layer materialises a parent node's list-typed
+    property as a set of child item nodes connected by a typed edge.
+
+    Conventions (mechanical, do not invent):
+      - `child_label`: `<SourceLabel><PropertyPascal>Item`
+          e.g. AWSPolicyStatement.resource -> AWSPolicyStatementResourceItem
+      - `rel_type`:    `HAS_<PROPERTY_UPPER>`
+          e.g. resource -> HAS_RESOURCE
+      - child node property:
+          * `field_map = []` (scalar list, ~95% case)  -> child stores `value: str`
+          * `field_map = [(src_key, child_field), ...]` (list of dicts, rare)
+              -> child stores those fields
+    """
+
+    source_label: str
+    source_property: str
+    child_label: str
+    rel_type: str
+    field_map: list[tuple[str, str]] = field(default_factory=list)
+
+    def __post_init__(self) -> None:
+        if self.field_map:
+            child_fields = [dst for _, dst in self.field_map]
+            if "value" in child_fields:
+                raise ValueError(
+                    f"NormalizedList {self.source_label}.{self.source_property}: "
+                    "`value` is reserved for scalar mode; do not map a source key to it"
+                )
+            src_keys = [src for src, _ in self.field_map]
+            if len(set(src_keys)) != len(src_keys):
+                raise ValueError(
+                    f"NormalizedList {self.source_label}.{self.source_property}: "
+                    "duplicate source key in field_map"
+                )
+            if len(set(child_fields)) != len(child_fields):
+                raise ValueError(
+                    f"NormalizedList {self.source_label}.{self.source_property}: "
+                    "duplicate child field in field_map"
+                )
+
+
+@dataclass(frozen=True)
+class ProviderConfig:
+    """Configuration for a cloud provider's Attack Paths integration."""
+
+    name: str
+    root_node_label: str  # e.g., "AWSAccount"
+    uid_field: str  # e.g., "arn"
+    # Label for resources connected to the account node, enabling indexed finding lookups
+    resource_label: str  # e.g., "_AWSResource"
+    ingestion_function: Callable
+    # Maps a Postgres resource UID (e.g. full ARN) to the short-id form Cartography stores on some node types (e.g. `i-xxx` for EC2Instance)
+    short_uid_extractor: Callable[[str], str]
+    # List-typed properties to materialise as child nodes + edges at sync time.
+    # Mandatory (may be []). A list-typed property without an entry here is
+    # flattened to a comma-delimited string and emits a one-time warning.
+    normalized_lists: list[NormalizedList]
+
+
+# AWS list-typed property catalog.
+# One entry per Cartography node property whose runtime value is a list. The
+# sync layer materialises each element as a `<child_label>` node and links it
+# to the parent with a `<rel_type>` edge; see the `NormalizedList` docstring
+# above for the naming conventions.
+AWS_NORMALIZED_LISTS: list[NormalizedList] = [
+    # AWSPolicyStatement - the hot path driving the 53-query perf fix.
+    NormalizedList(
+        "AWSPolicyStatement", "action", "AWSPolicyStatementActionItem", "HAS_ACTION"
+    ),
+    NormalizedList(
+        "AWSPolicyStatement",
+        "notaction",
+        "AWSPolicyStatementNotactionItem",
+        "HAS_NOTACTION",
+    ),
+    NormalizedList(
+        "AWSPolicyStatement",
+        "resource",
+        "AWSPolicyStatementResourceItem",
+        "HAS_RESOURCE",
+    ),
+    NormalizedList(
+        "AWSPolicyStatement",
+        "notresource",
+        "AWSPolicyStatementNotresourceItem",
+        "HAS_NOTRESOURCE",
+    ),
+    # S3PolicyStatement - same shape as IAM policies; AWS allows list or string.
+    NormalizedList(
+        "S3PolicyStatement", "action", "S3PolicyStatementActionItem", "HAS_ACTION"
+    ),
+    NormalizedList(
+        "S3PolicyStatement", "resource", "S3PolicyStatementResourceItem", "HAS_RESOURCE"
+    ),
+    # IAM / Cognito / KMS / Secrets
+    NormalizedList(
+        "CognitoIdentityPool", "roles", "CognitoIdentityPoolRolesItem", "HAS_ROLES"
+    ),
+    NormalizedList(
+        "KMSKey",
+        "encryption_algorithms",
+        "KMSKeyEncryptionAlgorithmsItem",
+        "HAS_ENCRYPTION_ALGORITHMS",
+    ),
+    NormalizedList(
+        "KMSKey",
+        "signing_algorithms",
+        "KMSKeySigningAlgorithmsItem",
+        "HAS_SIGNING_ALGORITHMS",
+    ),
+    NormalizedList(
+        "KMSKey",
+        "anonymous_actions",
+        "KMSKeyAnonymousActionsItem",
+        "HAS_ANONYMOUS_ACTIONS",
+    ),
+    NormalizedList(
+        "KMSGrant", "operations", "KMSGrantOperationsItem", "HAS_OPERATIONS"
+    ),
+    NormalizedList(
+        "SecretsManagerSecretVersion",
+        "version_stages",
+        "SecretsManagerSecretVersionVersionStagesItem",
+        "HAS_VERSION_STAGES",
+    ),
+    NormalizedList(
+        "SecretsManagerSecretVersion",
+        "kms_key_ids",
+        "SecretsManagerSecretVersionKmsKeyIdsItem",
+        "HAS_KMS_KEY_IDS",
+    ),
+    NormalizedList(
+        "SecretsManagerSecretVersion",
+        "tags",
+        "SecretsManagerSecretVersionTagsItem",
+        "HAS_TAGS",
+        field_map=[("Key", "key"), ("Value", "value_")],
+        # `value` is reserved for scalar mode; map `Value` to `value_` to keep dict shape.
+    ),
+    # Lambda / Compute
+    NormalizedList(
+        "AWSLambda", "architectures", "AWSLambdaArchitecturesItem", "HAS_ARCHITECTURES"
+    ),
+    NormalizedList(
+        "AWSLambda",
+        "anonymous_actions",
+        "AWSLambdaAnonymousActionsItem",
+        "HAS_ANONYMOUS_ACTIONS",
+    ),
+    NormalizedList(
+        "CodeBuildProject",
+        "environment_variables",
+        "CodeBuildProjectEnvironmentVariablesItem",
+        "HAS_ENVIRONMENT_VARIABLES",
+    ),
+    # ECS family
+    NormalizedList(
+        "ECSCluster",
+        "capacity_providers",
+        "ECSClusterCapacityProvidersItem",
+        "HAS_CAPACITY_PROVIDERS",
+    ),
+    NormalizedList(
+        "ECSTaskDefinition",
+        "compatibilities",
+        "ECSTaskDefinitionCompatibilitiesItem",
+        "HAS_COMPATIBILITIES",
+    ),
+    NormalizedList(
+        "ECSTaskDefinition",
+        "requires_compatibilities",
+        "ECSTaskDefinitionRequiresCompatibilitiesItem",
+        "HAS_REQUIRES_COMPATIBILITIES",
+    ),
+    NormalizedList(
+        "ECSContainerDefinition",
+        "links",
+        "ECSContainerDefinitionLinksItem",
+        "HAS_LINKS",
+    ),
+    NormalizedList(
+        "ECSContainerDefinition",
+        "entry_point",
+        "ECSContainerDefinitionEntryPointItem",
+        "HAS_ENTRY_POINT",
+    ),
+    NormalizedList(
+        "ECSContainerDefinition",
+        "command",
+        "ECSContainerDefinitionCommandItem",
+        "HAS_COMMAND",
+    ),
+    NormalizedList(
+        "ECSContainerDefinition",
+        "dns_servers",
+        "ECSContainerDefinitionDnsServersItem",
+        "HAS_DNS_SERVERS",
+    ),
+    NormalizedList(
+        "ECSContainerDefinition",
+        "dns_search_domains",
+        "ECSContainerDefinitionDnsSearchDomainsItem",
+        "HAS_DNS_SEARCH_DOMAINS",
+    ),
+    NormalizedList(
+        "ECSContainerDefinition",
+        "docker_security_options",
+        "ECSContainerDefinitionDockerSecurityOptionsItem",
+        "HAS_DOCKER_SECURITY_OPTIONS",
+    ),
+    NormalizedList("ECSContainer", "gpu_ids", "ECSContainerGpuIdsItem", "HAS_GPU_IDS"),
+    # ECR
+    NormalizedList(
+        "ECRImage", "layer_diff_ids", "ECRImageLayerDiffIdsItem", "HAS_LAYER_DIFF_IDS"
+    ),
+    NormalizedList(
+        "ECRImage",
+        "child_image_digests",
+        "ECRImageChildImageDigestsItem",
+        "HAS_CHILD_IMAGE_DIGESTS",
+    ),
+    # EC2 / Networking
+    NormalizedList(
+        "EC2Instance",
+        "exposed_internet_type",
+        "EC2InstanceExposedInternetTypeItem",
+        "HAS_EXPOSED_INTERNET_TYPE",
+    ),
+    NormalizedList(
+        "AutoScalingGroup",
+        "exposed_internet_type",
+        "AutoScalingGroupExposedInternetTypeItem",
+        "HAS_EXPOSED_INTERNET_TYPE",
+    ),
+    NormalizedList(
+        "LaunchConfiguration",
+        "security_groups",
+        "LaunchConfigurationSecurityGroupsItem",
+        "HAS_SECURITY_GROUPS",
+    ),
+    NormalizedList(
+        "LaunchTemplateVersion",
+        "security_group_ids",
+        "LaunchTemplateVersionSecurityGroupIdsItem",
+        "HAS_SECURITY_GROUP_IDS",
+    ),
+    NormalizedList(
+        "LaunchTemplateVersion",
+        "security_groups",
+        "LaunchTemplateVersionSecurityGroupsItem",
+        "HAS_SECURITY_GROUPS",
+    ),
+    NormalizedList(
+        "ELBListener", "policy_names", "ELBListenerPolicyNamesItem", "HAS_POLICY_NAMES"
+    ),
+    # CloudFront / Route53 / CloudWatch / CloudTrail
+    NormalizedList(
+        "CloudFrontDistribution",
+        "aliases",
+        "CloudFrontDistributionAliasesItem",
+        "HAS_ALIASES",
+    ),
+    NormalizedList(
+        "CloudFrontDistribution",
+        "geo_restriction_locations",
+        "CloudFrontDistributionGeoRestrictionLocationsItem",
+        "HAS_GEO_RESTRICTION_LOCATIONS",
+    ),
+    NormalizedList(
+        "CloudWatchLogGroup",
+        "inherited_properties",
+        "CloudWatchLogGroupInheritedPropertiesItem",
+        "HAS_INHERITED_PROPERTIES",
+    ),
+    # RDS / Storage
+    NormalizedList(
+        "RDSCluster",
+        "availability_zones",
+        "RDSClusterAvailabilityZonesItem",
+        "HAS_AVAILABILITY_ZONES",
+    ),
+    NormalizedList(
+        "RDSEventSubscription",
+        "event_categories",
+        "RDSEventSubscriptionEventCategoriesItem",
+        "HAS_EVENT_CATEGORIES",
+    ),
+    NormalizedList(
+        "RDSEventSubscription",
+        "source_ids",
+        "RDSEventSubscriptionSourceIdsItem",
+        "HAS_SOURCE_IDS",
+    ),
+    NormalizedList(
+        "S3Bucket",
+        "anonymous_actions",
+        "S3BucketAnonymousActionsItem",
+        "HAS_ANONYMOUS_ACTIONS",
+    ),
+    # Inspector / Config / SSM / ACM / APIGateway / Glue / SageMaker / Bedrock
+    NormalizedList(
+        "AWSInspectorFinding",
+        "referenceurls",
+        "AWSInspectorFindingReferenceurlsItem",
+        "HAS_REFERENCEURLS",
+    ),
+    NormalizedList(
+        "AWSInspectorFinding",
+        "relatedvulnerabilities",
+        "AWSInspectorFindingRelatedvulnerabilitiesItem",
+        "HAS_RELATEDVULNERABILITIES",
+    ),
+    NormalizedList(
+        "AWSInspectorFinding",
+        "vulnerablepackageids",
+        "AWSInspectorFindingVulnerablepackageidsItem",
+        "HAS_VULNERABLEPACKAGEIDS",
+    ),
+    NormalizedList(
+        "AWSConfigurationRecorder",
+        "recording_group_resource_types",
+        "AWSConfigurationRecorderRecordingGroupResourceTypesItem",
+        "HAS_RECORDING_GROUP_RESOURCE_TYPES",
+    ),
+    NormalizedList(
+        "AWSConfigRule",
+        "scope_compliance_resource_types",
+        "AWSConfigRuleScopeComplianceResourceTypesItem",
+        "HAS_SCOPE_COMPLIANCE_RESOURCE_TYPES",
+    ),
+    NormalizedList(
+        "AWSConfigRule",
+        "source_details",
+        "AWSConfigRuleSourceDetailsItem",
+        "HAS_SOURCE_DETAILS",
+    ),
+    NormalizedList(
+        "SSMInstancePatch", "cve_ids", "SSMInstancePatchCveIdsItem", "HAS_CVE_IDS"
+    ),
+    NormalizedList(
+        "ACMCertificate", "in_use_by", "ACMCertificateInUseByItem", "HAS_IN_USE_BY"
+    ),
+    NormalizedList(
+        "APIGatewayRestAPI",
+        "anonymous_actions",
+        "APIGatewayRestAPIAnonymousActionsItem",
+        "HAS_ANONYMOUS_ACTIONS",
+    ),
+    NormalizedList(
+        "GlueJob", "connections", "GlueJobConnectionsItem", "HAS_CONNECTIONS"
+    ),
+    NormalizedList(
+        "AWSBedrockFoundationModel",
+        "input_modalities",
+        "AWSBedrockFoundationModelInputModalitiesItem",
+        "HAS_INPUT_MODALITIES",
+    ),
+    NormalizedList(
+        "AWSBedrockFoundationModel",
+        "output_modalities",
+        "AWSBedrockFoundationModelOutputModalitiesItem",
+        "HAS_OUTPUT_MODALITIES",
+    ),
+    NormalizedList(
+        "AWSBedrockFoundationModel",
+        "customizations_supported",
+        "AWSBedrockFoundationModelCustomizationsSupportedItem",
+        "HAS_CUSTOMIZATIONS_SUPPORTED",
+    ),
+    NormalizedList(
+        "AWSBedrockFoundationModel",
+        "inference_types_supported",
+        "AWSBedrockFoundationModelInferenceTypesSupportedItem",
+        "HAS_INFERENCE_TYPES_SUPPORTED",
+    ),
+]
+
+
+AWS_CONFIG = ProviderConfig(
+    name="aws",
+    root_node_label="AWSAccount",
+    uid_field="arn",
+    resource_label="_AWSResource",
+    ingestion_function=aws.start_aws_ingestion,
+    short_uid_extractor=aws.extract_short_uid,
+    normalized_lists=AWS_NORMALIZED_LISTS,
+)
+
+
+PROVIDER_CONFIGS: dict[str, ProviderConfig] = {
+    "aws": AWS_CONFIG,
+}
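
Note on the catalog above: the `NormalizedList` docstring describes the `child_label` and `rel_type` naming as mechanical. A minimal sketch of that derivation follows, useful for checking a new entry before adding it. The helper names `derive_child_label` and `derive_rel_type` are hypothetical and are not part of this commit; the rule itself is inferred from the docstring and the existing entries.

```python
def derive_child_label(source_label: str, source_property: str) -> str:
    """Build `<SourceLabel><PropertyPascal>Item` from a snake_case property name."""
    # Hypothetical helper: each underscore-separated part is capitalised,
    # so `kms_key_ids` becomes `KmsKeyIds` and `notaction` becomes `Notaction`.
    pascal = "".join(part.capitalize() for part in source_property.split("_"))
    return f"{source_label}{pascal}Item"


def derive_rel_type(source_property: str) -> str:
    """Build `HAS_<PROPERTY_UPPER>` from a snake_case property name."""
    # Hypothetical helper: the property name is upper-cased as-is.
    return f"HAS_{source_property.upper()}"


if __name__ == "__main__":
    print(derive_child_label("AWSPolicyStatement", "resource"))
    print(derive_rel_type("resource"))
```

A catalog test could compare these derived names against every entry in `AWS_NORMALIZED_LISTS` to catch hand-typed mismatches, assuming all entries are meant to follow the convention without exception.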
@@ -1,8 +1,6 @@
 # Cypher query templates for Attack Paths operations
 from tasks.jobs.attack_paths.config import (
    INTERNET_NODE_LABEL,
-    PROVIDER_ELEMENT_ID_PROPERTY,
-    PROVIDER_RESOURCE_LABEL,
    PROWLER_FINDING_LABEL,
 )

@@ -21,7 +19,6 @@ def render_cypher_template(template: str, replacements: dict[str, str]) -> str:


 # Findings queries (used by findings.py)
-# ---------------------------------------

 ADD_RESOURCE_LABEL_TEMPLATE = """
    MATCH (account:__ROOT_LABEL__ {id: $provider_uid})-->(r)
@@ -88,7 +85,6 @@ INSERT_FINDING_TEMPLATE = f"""
 """

 # Internet queries (used by internet.py)
-# ---------------------------------------

 CREATE_INTERNET_NODE = f"""
    MERGE (internet:{INTERNET_NODE_LABEL} {{id: 'Internet'}})
@@ -118,8 +114,8 @@ CREATE_CAN_ACCESS_RELATIONSHIPS_TEMPLATE = f"""
    RETURN COUNT(r) AS relationships_merged
 """

-# Sync queries (used by sync.py)
-# -------------------------------
+# Sync queries (used by sync.py to fetch from the cartography temp Neo4j DB)
+# The write side of sync lives in each sink (`api/attack_paths/sink/`).

 NODE_FETCH_QUERY = """
    MATCH (n)
@@ -143,17 +139,3 @@ RELATIONSHIPS_FETCH_QUERY = """
    ORDER BY internal_id
    LIMIT $batch_size
 """
-
-NODE_SYNC_TEMPLATE = f"""
-    UNWIND $rows AS row
-    MERGE (n:__NODE_LABELS__ {{{PROVIDER_ELEMENT_ID_PROPERTY}: row.provider_element_id}})
-    SET n += row.props
-"""
-
-RELATIONSHIP_SYNC_TEMPLATE = f"""
-    UNWIND $rows AS row
-    MATCH (s:{PROVIDER_RESOURCE_LABEL} {{{PROVIDER_ELEMENT_ID_PROPERTY}: row.start_element_id}})
-    MATCH (t:{PROVIDER_RESOURCE_LABEL} {{{PROVIDER_ELEMENT_ID_PROPERTY}: row.end_element_id}})
-    MERGE (s)-[r:__REL_TYPE__ {{{PROVIDER_ELEMENT_ID_PROPERTY}: row.provider_element_id}}]->(t)
-    SET r += row.props
-"""
@@ -39,8 +39,8 @@ Pipeline steps:

 7. Sync the temp database into the tenant database:
   - Drop the old provider subgraph (matched by dynamic _Provider_{uuid} label).
-     graph_data_ready is set to False for all scans of this provider while
-     the swap happens so the API doesn't serve partial data.
+     graph_data_ready is set to False for scans of this provider in the
+     target sink while the swap happens so the API doesn't serve partial data.
   - Copy nodes and relationships in batches. Every synced node gets a
     _ProviderResource label and dynamic _Tenant_{uuid} / _Provider_{uuid}
     isolation labels, plus a _provider_element_id property for MERGE keys.
@@ -64,10 +64,17 @@ from api.models import StateChoices
 from api.utils import initialize_prowler_provider
 from cartography.config import Config as CartographyConfig
 from cartography.intel import analysis as cartography_analysis
-from cartography.intel import create_indexes as cartography_create_indexes
 from cartography.intel import ontology as cartography_ontology
 from celery.utils.log import get_task_logger
-from tasks.jobs.attack_paths import db_utils, findings, indexes, internet, sync, utils
+from django.conf import settings
+from tasks.jobs.attack_paths import (
+    db_utils,
+    findings,
+    indexes,
+    internet,
+    sync,
+    utils,
+)
 from tasks.jobs.attack_paths.config import get_cartography_ingestion_function

 # Without this Celery goes crazy with Cartography logging
@@ -96,7 +103,7 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
    attack_paths_scan = db_utils.retrieve_attack_paths_scan(tenant_id, scan_id)

    # Idempotency guard: cleanup may have flipped this row to a terminal state
-    # while the message was still in flight. Bail out before touching state.
+    # while the message was still in flight. Bail out before touching state
    if attack_paths_scan and attack_paths_scan.state in (
        StateChoices.FAILED,
        StateChoices.COMPLETED,
@@ -125,7 +132,7 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:

    else:
        if not attack_paths_scan:
-            # Safety net for in-flight messages or direct task invocations; dispatcher normally pre-creates the row.
+            # Safety net for in-flight messages or direct task invocations; dispatcher normally pre-creates the row
            logger.warning(
                f"No Attack Paths Scan found for scan {scan_id} and tenant {tenant_id}, let's create it then"
            )
@@ -143,10 +150,18 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
    tenant_database_name = graph_database.get_database_name(
        prowler_api_provider.tenant_id
    )
+    target_sink_backend = settings.ATTACK_PATHS_SINK_DATABASE
+    target_description = (
+        f"tenant Neo4j database {tenant_database_name}"
+        if target_sink_backend == "neo4j"
+        else f"{target_sink_backend} sink"
+    )

    # While creating the Cartography configuration, attributes `neo4j_user` and `neo4j_password` are not really needed in this config object
    tmp_cartography_config = CartographyConfig(
-        neo4j_uri=graph_database.get_uri(),
+        # The temp ingest database is always Neo4j, so use the ingest URI here
+        # rather than the sink URI (which points at Neptune when configured).
+        neo4j_uri=graph_database.get_ingest_uri(),
        neo4j_database=tmp_database_name,
        update_tag=int(time.time()),
    )
@@ -168,7 +183,8 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
    scan_t0 = time.perf_counter()
    logger.info(
        f"Starting Attack Paths scan ({attack_paths_scan.id}) for "
-        f"{prowler_api_provider.provider.upper()} provider {prowler_api_provider.id}"
+        f"{prowler_api_provider.provider.upper()} provider {prowler_api_provider.id} "
+        f"(staging=Neo4j database {tmp_database_name}, target={target_description})"
    )

    subgraph_dropped = False
@@ -177,7 +193,8 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:

    try:
        logger.info(
-            f"Creating Neo4j database {tmp_cartography_config.neo4j_database} for tenant {prowler_api_provider.tenant_id}"
+            f"Creating staging Neo4j database {tmp_cartography_config.neo4j_database} "
+            f"for tenant {prowler_api_provider.tenant_id}"
        )

        graph_database.create_database(tmp_cartography_config.neo4j_database)
@@ -191,7 +208,9 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
            tmp_cartography_config.neo4j_database
        ) as tmp_neo4j_session:
            # Indexes creation
-            cartography_create_indexes.run(tmp_neo4j_session, tmp_cartography_config)
+            indexes.create_cartography_indexes(
+                tmp_neo4j_session, tmp_cartography_config
+            )
            indexes.create_findings_indexes(tmp_neo4j_session)
            db_utils.update_attack_paths_scan_progress(attack_paths_scan, 2)

@@ -223,7 +242,7 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
            cartography_analysis.run(tmp_neo4j_session, tmp_cartography_config)
            db_utils.update_attack_paths_scan_progress(attack_paths_scan, 95)

-            # Creating Internet node and CAN_ACCESS relationships
+            # Creating Internet node and `CAN_ACCESS` relationships
            logger.info(
                f"Creating Internet graph for AWS account {prowler_api_provider.uid}"
            )
@@ -247,23 +266,41 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
            db_utils.update_attack_paths_scan_progress(attack_paths_scan, 97)

        logger.info(
-            f"Clearing Neo4j cache for database {tmp_cartography_config.neo4j_database}"
+            f"Clearing Neo4j cache for staging database {tmp_cartography_config.neo4j_database}"
        )
        graph_database.clear_cache(tmp_cartography_config.neo4j_database)

+        t0 = time.perf_counter()
        logger.info(
-            f"Ensuring tenant database {tenant_database_name}, and its indexes, exists for tenant {prowler_api_provider.tenant_id}"
+            f"Preparing target {target_description} for tenant {prowler_api_provider.tenant_id}"
        )
        graph_database.create_database(tenant_database_name)
-        with graph_database.get_session(tenant_database_name) as tenant_neo4j_session:
-            cartography_create_indexes.run(
-                tenant_neo4j_session, tenant_cartography_config
-            )
-            indexes.create_findings_indexes(tenant_neo4j_session)
-            indexes.create_sync_indexes(tenant_neo4j_session)
+        # Sink-side index creation: Neptune auto-manages indexes and rejects
+        # `CREATE INDEX`, so only run it when the sink is Neo4j.
+        # The temp ingest DB is always Neo4j and is always indexed above
+        if target_sink_backend != "neptune":
+            logger.info(f"Ensuring indexes exist for {target_description}")
+            with graph_database.get_session(
+                tenant_database_name
+            ) as tenant_neo4j_session:
+                indexes.create_cartography_indexes(
+                    tenant_neo4j_session, tenant_cartography_config
+                )
+                indexes.create_findings_indexes(tenant_neo4j_session)
+                indexes.create_sync_indexes(tenant_neo4j_session)
+        else:
+            logger.info("Skipping tenant database indexes for neptune sink")
+        logger.info(
+            f"Prepared target {target_description} in {time.perf_counter() - t0:.3f}s"
+        )

-        logger.info(f"Deleting existing provider graph in {tenant_database_name}")
-        db_utils.set_provider_graph_data_ready(attack_paths_scan, False)
+        logger.info(
+            f"Deleting existing provider graph from {target_description} "
+            f"(tenant={prowler_api_provider.tenant_id}, provider={prowler_api_provider.id})"
+        )
+        db_utils.set_provider_graph_data_ready(
+            attack_paths_scan, False, target_sink_backend
+        )
        provider_gated = True

        t0 = time.perf_counter()
@@ -272,14 +309,17 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
            provider_id=str(prowler_api_provider.id),
        )
        logger.info(
-            f"Deleted existing provider graph in {time.perf_counter() - t0:.3f}s "
-            f"(deleted_nodes={deleted_nodes})"
+            f"Deleted existing provider graph from {target_description} "
+            f"in {time.perf_counter() - t0:.3f}s (deleted_nodes={deleted_nodes})"
        )
        subgraph_dropped = True
        db_utils.update_attack_paths_scan_progress(attack_paths_scan, 98)

        logger.info(
-            f"Syncing graph from {tmp_database_name} into {tenant_database_name}"
+            f"Syncing staging graph {tmp_database_name} into {target_description} "
+            f"for provider {prowler_api_provider.id} "
+            f"(tenant {prowler_api_provider.tenant_id}, "
+            f"type {prowler_api_provider.provider})"
        )
        t0 = time.perf_counter()
        sync_result = sync.sync_graph(
@@ -287,17 +327,34 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
            target_database=tenant_database_name,
            tenant_id=str(prowler_api_provider.tenant_id),
            provider_id=str(prowler_api_provider.id),
+            provider_type=prowler_api_provider.provider,
        )
+        elapsed = time.perf_counter() - t0
+        total_nodes = sync_result["nodes"] + sync_result["child_nodes"]
+        elements = total_nodes + sync_result["relationships"]
+        rate = elements / elapsed if elapsed else 0
        logger.info(
-            f"Synced graph in {time.perf_counter() - t0:.3f}s "
-            f"(nodes={sync_result['nodes']}, relationships={sync_result['relationships']})"
+            f"Synced staging graph into {target_description} in {elapsed:.3f}s - "
+            f"nodes={total_nodes} (source={sync_result['nodes']}, "
+            f"items={sync_result['child_nodes']}), "
+            f"relationships={sync_result['relationships']} "
+            f"(structural={sync_result['structural_relationships']}, "
+            f"items={sync_result['item_relationships']}), "
+            f"~{rate:.0f} elem/s"
        )
        sync_completed = True
+        # Flip metadata only now: the new schema is live in the target sink, so
+        # reads can switch to the current catalog/backend. The target-sink gate
+        # is already closed, so the switch is atomic from the API's view.
+        db_utils.set_scan_migrated(attack_paths_scan, True, target_sink_backend)
        db_utils.set_graph_data_ready(attack_paths_scan, True)
        db_utils.update_attack_paths_scan_progress(attack_paths_scan, 99)

-        logger.info(f"Clearing Neo4j cache for database {tenant_database_name}")
-        graph_database.clear_cache(tenant_database_name)
+        if target_sink_backend == "neptune":
+            logger.info("Skipping cache clear for neptune sink")
+        else:
+            logger.info(f"Clearing Neo4j cache for target {target_description}")
+            graph_database.clear_cache(tenant_database_name)

        logger.info(f"Dropping temporary Neo4j database {tmp_database_name}")
        graph_database.drop_database(tmp_database_name)
@@ -316,14 +373,16 @@ def run(tenant_id: str, scan_id: str, task_id: str) -> dict[str, Any]:
        logger.exception(exception_message)
        ingestion_exceptions["global_error"] = exception_message

-        # Recover graph_data_ready based on how far the swap got.
-        # Partial drop (mid-batch failure) may leave `subgraph_dropped=False`
-        # with data partially deleted, so we prefer that over permanently blocked queries.
+        # Recover `graph_data_ready` based on how far the swap got.
+        # A partial drop (mid-batch failure) may leave `subgraph_dropped=False` with data partially deleted;
+        # we prefer re-opening the gate on partial data over permanently blocked queries
        try:
            if sync_completed:
                db_utils.set_graph_data_ready(attack_paths_scan, True)
            elif provider_gated and not subgraph_dropped:
-                db_utils.set_provider_graph_data_ready(attack_paths_scan, True)
+                db_utils.set_provider_graph_data_ready(
+                    attack_paths_scan, True, target_sink_backend
+                )

        except Exception:
            logger.error(
@@ -1,40 +1,57 @@
 """
 Graph sync operations for Attack Paths.

-This module handles syncing graph data from temporary scan databases
-to the tenant database, adding provider isolation labels and properties.
+Reads nodes and relationships out of the cartography temp database (always
+Neo4j) and hands them to the configured sink (Neo4j or Neptune) in batches.
+Backend-specific Cypher (MERGE shape, ID strategy, indexes) lives in each
+sink; this module owns the source read loop, per-batch grouping, and the
+list-property materialisation policy (see `NormalizedList`).
+
+Each list-typed node property that appears in the provider's
+`normalized_lists` catalog becomes a set of child item nodes connected to
+the parent by a typed edge. A list-typed property that is not in the
+catalog is serialised to a comma-delimited string and emits a one-time
+warning per (label, property), surfacing Cartography fields that should be
+added to the catalog.
 """

+import json
 import time
 from collections import defaultdict
 from typing import Any

 import neo4j
 from api.attack_paths import database as graph_database
+from api.attack_paths import sink as sink_module
 from celery.utils.log import get_task_logger
 from tasks.jobs.attack_paths.config import (
+    PROVIDER_CONFIGS,
    PROVIDER_ISOLATION_PROPERTIES,
    PROVIDER_RESOURCE_LABEL,
    SYNC_BATCH_SIZE,
+    NormalizedList,
    get_provider_label,
    get_tenant_label,
 )
 from tasks.jobs.attack_paths.queries import (
    NODE_FETCH_QUERY,
-    NODE_SYNC_TEMPLATE,
-    RELATIONSHIP_SYNC_TEMPLATE,
    RELATIONSHIPS_FETCH_QUERY,
-    render_cypher_template,
 )

 logger = get_task_logger(__name__)

+# (label, property) tuples for which we've already emitted the
+# "unnormalised list" warning. Module-level so the warning fires once per
+# process, not once per node.
+_WARNED_UNNORMALIZED: set[tuple[str, str]] = set()
+

 def sync_graph(
    source_database: str,
    target_database: str,
    tenant_id: str,
    provider_id: str,
+    provider_type: str,
 ) -> dict[str, int]:
    """
    Sync all nodes and relationships from source to target database.
@@ -44,25 +61,38 @@ def sync_graph(
        `target_database`: The tenant database
        `tenant_id`: The tenant ID for isolation
        `provider_id`: The provider ID for isolation
+        `provider_type`: Provider type key (e.g. "aws"), used to resolve the
+            `NormalizedList` catalog from `PROVIDER_CONFIGS`.

    Returns:
-        Dict with counts of synced nodes and relationships
+        Dict of counts: nodes, child item nodes, and total/structural/item relationships.
    """
-    nodes_synced = sync_nodes(
+    sink = sink_module.get_backend()
+    sink.ensure_sync_indexes(target_database)
+
+    normalized_lists = _resolve_normalized_lists(provider_type)
+
+    node_result = sync_nodes(
        source_database,
        target_database,
        tenant_id,
        provider_id,
+        sink,
+        normalized_lists,
    )
    relationships_synced = sync_relationships(
        source_database,
        target_database,
        provider_id,
+        sink,
    )

    return {
-        "nodes": nodes_synced,
-        "relationships": relationships_synced,
+        "nodes": node_result["parents"],
+        "child_nodes": node_result["children"],
+        "relationships": relationships_synced + node_result["parent_child_rels"],
+        "structural_relationships": relationships_synced,
+        "item_relationships": node_result["parent_child_rels"],
    }


@@ -71,22 +101,35 @@ def sync_nodes(
    target_database: str,
    tenant_id: str,
    provider_id: str,
-) -> int:
+    sink: Any,
+    normalized_lists: list[NormalizedList],
+) -> dict[str, int]:
    """
-    Sync nodes from source to target database.
+    Sync nodes from source to target database, exploding catalogued list
+    properties into child nodes + parent->child edges.

    Adds `_ProviderResource` label and dynamic `_Tenant_{id}` and `_Provider_{id}`
-    isolation labels to all nodes.
+    isolation labels to all nodes (parents and children alike).

    Source and target sessions are opened sequentially per batch to avoid
    holding two Bolt connections simultaneously for the entire sync duration.
    """
    t0 = time.perf_counter()
    last_id = -1
-    total_synced = 0
+    parents_synced = 0
+    children_synced = 0
+    parent_child_rels = 0
+
+    catalog = _build_catalog_index(normalized_lists)
+    extra_labels = _build_extra_labels(tenant_id, provider_id)

    while True:
-        grouped: dict[tuple[str, ...], list[dict[str, Any]]] = defaultdict(list)
+        tb = time.perf_counter()
+        prev_children = children_synced
+        prev_rels = parent_child_rels
+        parent_groups: dict[tuple[str, ...], list[dict[str, Any]]] = defaultdict(list)
+        child_groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
+        rel_groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
        batch_count = 0

        with graph_database.get_session(source_database) as source_session:
@@ -97,43 +140,65 @@ def sync_nodes(
            for record in result:
                batch_count += 1
                last_id = record["internal_id"]
-                key, value = _node_to_sync_dict(record, provider_id)
-                grouped[key].append(value)
+                key, parent_dict, children, rels = _node_to_sync_dict(
+                    record, provider_id, catalog
+                )
+                parent_groups[key].append(parent_dict)
+                for child in children:
+                    child_groups[child["_child_label"]].append(child["row"])
+                for rel in rels:
+                    rel_groups[rel["rel_type"]].append(rel["row"])

        if batch_count == 0:
            break

-        with graph_database.get_session(target_database) as target_session:
-            for labels, batch in grouped.items():
-                label_set = set(labels)
-                label_set.add(PROVIDER_RESOURCE_LABEL)
-                label_set.add(get_tenant_label(tenant_id))
-                label_set.add(get_provider_label(provider_id))
-                node_labels = ":".join(f"`{label}`" for label in sorted(label_set))
+        for labels, batch in parent_groups.items():
+            sink.write_nodes(
+                target_database, _render_labels(labels, extra_labels), batch
+            )

-                query = render_cypher_template(
-                    NODE_SYNC_TEMPLATE, {"__NODE_LABELS__": node_labels}
-                )
-                target_session.run(query, {"rows": batch})
+        for child_label, batch in child_groups.items():
+            sink.write_nodes(
+                target_database,
+                _render_labels((child_label,), extra_labels),
+                batch,
+            )
+            children_synced += len(batch)

-        total_synced += batch_count
+        for rel_type, batch in rel_groups.items():
+            sink.write_relationships(target_database, rel_type, provider_id, batch)
+            parent_child_rels += len(batch)
+
+        parents_synced += batch_count
+        batch_dt = time.perf_counter() - tb
+        batch_elements = (
+            batch_count
+            + (children_synced - prev_children)
+            + (parent_child_rels - prev_rels)
+        )
+        rate = batch_elements / batch_dt if batch_dt else 0
        logger.info(
-            f"Synced {total_synced} nodes from {source_database} to {target_database} in {time.perf_counter() - t0:.3f}s"
+            f"[sync nodes] {parents_synced} source (+{children_synced} items, "
+            f"+{parent_child_rels} item rels) · batch {batch_dt:.1f}s · "
+            f"elapsed {time.perf_counter() - t0:.1f}s · ~{rate:.0f} elem/s"
        )

-    return total_synced
+    return {
+        "parents": parents_synced,
+        "children": children_synced,
+        "parent_child_rels": parent_child_rels,
+    }


 def sync_relationships(
    source_database: str,
    target_database: str,
    provider_id: str,
+    sink: Any,
 ) -> int:
    """
    Sync relationships from source to target database.

-    Matches source and target nodes by `_provider_element_id` in the tenant database.
-
    Source and target sessions are opened sequentially per batch to avoid
    holding two Bolt connections simultaneously for the entire sync duration.
    """
@@ -142,6 +207,7 @@ def sync_relationships(
    total_synced = 0

    while True:
+        tb = time.perf_counter()
        grouped: dict[str, list[dict[str, Any]]] = defaultdict(list)
        batch_count = 0

@@ -159,32 +225,197 @@ def sync_relationships(
        if batch_count == 0:
            break

-        with graph_database.get_session(target_database) as target_session:
-            for rel_type, batch in grouped.items():
-                query = render_cypher_template(
-                    RELATIONSHIP_SYNC_TEMPLATE, {"__REL_TYPE__": rel_type}
-                )
-                target_session.run(query, {"rows": batch})
+        for rel_type, batch in grouped.items():
+            sink.write_relationships(target_database, rel_type, provider_id, batch)

        total_synced += batch_count
+        batch_dt = time.perf_counter() - tb
+        rate = batch_count / batch_dt if batch_dt else 0
        logger.info(
-            f"Synced {total_synced} relationships from {source_database} to {target_database} in {time.perf_counter() - t0:.3f}s"
+            f"[sync rels] {total_synced} structural · batch {batch_dt:.1f}s · "
+            f"elapsed {time.perf_counter() - t0:.1f}s · ~{rate:.0f}/s"
        )

    return total_synced


 def _node_to_sync_dict(
-    record: neo4j.Record, provider_id: str
-) -> tuple[tuple[str, ...], dict[str, Any]]:
-    """Transform a source node record into a (grouping_key, sync_dict) pair."""
+    record: neo4j.Record,
+    provider_id: str,
+    catalog: dict[tuple[str, str], NormalizedList],
+) -> tuple[
+    tuple[str, ...],
+    dict[str, Any],
+    list[dict[str, Any]],
+    list[dict[str, Any]],
+]:
+    """Transform a source node record into a (grouping_key, sync_dict, children, rels) tuple.
+
+    Catalogued list properties are popped from `props` and emitted as child
+    nodes + parent->child relationships.
+    """
    props = dict(record["props"] or {})
    _strip_internal_properties(props)
    labels = tuple(sorted(set(record["labels"] or [])))
-    return labels, {
-        "provider_element_id": f"{provider_id}:{record['element_id']}",
+    parent_element_id = f"{provider_id}:{record['element_id']}"
+
+    children, rels = _explode_catalogued_lists(
+        labels, props, catalog, provider_id, parent_element_id
+    )
+
+    _normalize_sink_properties(props, labels)
+
+    parent = {
+        "provider_element_id": parent_element_id,
        "props": props,
    }
+    return labels, parent, children, rels
+
+
+def _explode_catalogued_lists(
+    labels: tuple[str, ...],
+    props: dict[str, Any],
+    catalog: dict[tuple[str, str], NormalizedList],
+    provider_id: str,
+    parent_element_id: str,
+) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
+    """Pop catalogued list properties from `props`; return child and rel rows.
+
+    A node may carry multiple labels (e.g. `AWSPolicyStatement` plus
+    `_AWSResource`); we check each label for catalog matches independently.
+    Returns:
+        - children: list of {"_child_label": str, "row": <node row>} dicts.
+        - rels:     list of {"rel_type": str, "row": <rel row>} dicts.
+    """
+    children: list[dict[str, Any]] = []
+    rels: list[dict[str, Any]] = []
+
+    for label in labels:
+        for key in list(props.keys()):
+            spec = catalog.get((label, key))
+            if spec is None:
+                continue
+            value = props.pop(key)
+            if value is None:
+                continue
+            if not isinstance(value, list):
+                # Catalogued but not actually a list in this scan - fall back to
+                # the generic normaliser so we don't lose the value.
+                props[key] = value
+                continue
+            for item in value:
+                child_value_key, child_props = _build_child_props(spec, item)
+                if child_value_key is None:
+                    continue
+                child_element_id = _build_child_id(
+                    provider_id, spec.child_label, child_value_key
+                )
+                children.append(
+                    {
+                        "_child_label": spec.child_label,
+                        "row": {
+                            "provider_element_id": child_element_id,
+                            "props": child_props,
+                        },
+                    }
+                )
+                rels.append(
+                    {
+                        "rel_type": spec.rel_type,
+                        "row": {
+                            "start_element_id": parent_element_id,
+                            "end_element_id": child_element_id,
+                            "provider_element_id": (
+                                f"{parent_element_id}::{spec.rel_type}::"
+                                f"{child_element_id}"
+                            ),
+                            "props": {},
+                        },
+                    }
+                )
+
+    return children, rels
+
+
+def _build_child_props(
+    spec: NormalizedList, item: Any
+) -> tuple[str | None, dict[str, Any]]:
+    """Translate one list element into a child node's prop dict.
+
+    Returns (dedup_key, props). The dedup_key is what makes two child nodes
+    equal within (tenant, provider) - used to build `_provider_element_id`.
+    For scalar mode, the dedup key is the value itself. For dict mode it is
+    a stable concatenation of the mapped fields in `field_map` order.
+    """
+    if not spec.field_map:
+        if isinstance(item, (dict, list)):
+            # Defensive: caller marked this list as scalar but elements are
+            # structured. Convert to a stable string so the value survives.
+            value_str = json.dumps(item, sort_keys=True, default=str)
+        else:
+            value_str = str(item)
+        return value_str, {"value": value_str}
+
+    if not isinstance(item, dict):
+        # Catalogued as dict-shape but got a scalar. Skip - caller will see
+        # the value go missing and can fix the field_map.
+        return None, {}
+
+    props: dict[str, Any] = {}
+    dedup_parts: list[str] = []
+    for src_key, child_field in spec.field_map:
+        raw = item.get(src_key)
+        value_str = _to_sink_property_value(raw) if raw is not None else ""
+        props[child_field] = value_str
+        dedup_parts.append(f"{child_field}={value_str}")
+    return "::".join(dedup_parts), props
+
+
+def _build_child_id(provider_id: str, child_label: str, value_key: str) -> str:
+    """Deterministic `_provider_element_id` for a list-item child node.
+
+    Dedupes within (tenant, provider): multiple parents referencing the same
+    value share one child node via the existing MERGE-on-_provider_element_id
+    index in both sinks.
+    """
+    return f"{provider_id}::{child_label}::{value_key}"
+
+
+def _build_catalog_index(
+    normalized_lists: list[NormalizedList],
+) -> dict[tuple[str, str], NormalizedList]:
+    """Index the catalog by (source_label, source_property) for O(1) lookup."""
+    return {
+        (spec.source_label, spec.source_property): spec for spec in normalized_lists
+    }
+
+
+def _build_extra_labels(tenant_id: str, provider_id: str) -> tuple[str, ...]:
+    return (
+        PROVIDER_RESOURCE_LABEL,
+        get_tenant_label(tenant_id),
+        get_provider_label(provider_id),
+    )
+
+
+def _render_labels(base_labels: tuple[str, ...], extra_labels: tuple[str, ...]) -> str:
+    """Render the Cypher label string for a node-write batch."""
+    label_set = set(base_labels) | set(extra_labels)
+    return ":".join(f"`{label}`" for label in sorted(label_set))
+
+
+def _resolve_normalized_lists(provider_type: str) -> list[NormalizedList]:
+    config = PROVIDER_CONFIGS.get(provider_type)
+    if config is None:
+        # Unknown provider: empty catalog. Any list-typed property will be
+        # serialised to a comma-delimited string with one warning per
+        # (label, property).
+        logger.warning(
+            "Provider type %s not in PROVIDER_CONFIGS; no normalized_lists active",
+            provider_type,
+        )
+        return []
+    return config.normalized_lists


 def _rel_to_sync_dict(
@@ -193,7 +424,11 @@ def _rel_to_sync_dict(
    """Transform a source relationship record into a (grouping_key, sync_dict) pair."""
    props = dict(record["props"] or {})
    _strip_internal_properties(props)
+    # Relationship properties go through the same primitive coercion as
+    # nodes; catalog-driven materialisation applies to node properties only.
+    _normalize_sink_properties(props, labels=None)
    rel_type = record["rel_type"]
+
    return rel_type, {
        "start_element_id": f"{provider_id}:{record['start_element_id']}",
        "end_element_id": f"{provider_id}:{record['end_element_id']}",
@@ -206,3 +441,80 @@ def _strip_internal_properties(props: dict[str, Any]) -> None:
    """Remove provider isolation properties before the += spread in sync templates."""
    for key in PROVIDER_ISOLATION_PROPERTIES:
        props.pop(key, None)
+
+
+def _normalize_sink_properties(
+    props: dict[str, Any], labels: tuple[str, ...] | None
+) -> None:
+    """Normalize property values to primitive Cypher literals for either sink.
+
+    Attack-paths node and relationship properties are written as primitive
+    scalars regardless of the active sink (Neo4j or Neptune). The convention
+    is driven by Neptune's openCypher type restrictions, which reject list,
+    map, temporal and spatial property values, but it is applied uniformly
+    so that custom and predefined queries are portable across sinks without
+    runtime rewriting.
+
+    Concretely:
+      - Temporal values (neo4j.time.{DateTime,Date,Time,Duration}) become
+        their ISO-8601 string representation.
+      - Spatial values (neo4j.spatial.Point and subclasses) become their
+        WKT-style string representation.
+      - Maps / dicts become a JSON-encoded string, read back with `CONTAINS`
+        substring checks inside queries.
+      - Lists become a comma-delimited string. Catalogued list properties
+        are materialised as child item nodes upstream in
+        `_explode_catalogued_lists` and never reach this point; any list
+        seen here is uncatalogued, so we log a one-time warning per
+        (label, property) to surface Cartography fields that should be
+        added to the catalog.
+
+    `labels` is only used for the warning message; pass `None` for
+    relationship props (no label context).
+    """
+    for key, value in list(props.items()):
+        if isinstance(value, list) and labels is not None:
+            _warn_unnormalized_list(labels, key)
+        props[key] = _to_sink_property_value(value)
+
+
+def _warn_unnormalized_list(labels: tuple[str, ...], key: str) -> None:
+    """Warn once per (label, property), on the real label(s) only.
+
+    Every synced node also carries internal isolation labels (`_AWSResource`,
+    `_ProviderResource`, `_Tenant_*`, `_Provider_*`); warning on those just
+    doubles the noise, so skip them and point at the actionable Cartography
+    label. Falls back to all labels if only internal ones are present.
+    """
+    real_labels = [label for label in labels if not label.startswith("_")]
+    for label in real_labels or labels:
+        token = (label, key)
+        if token in _WARNED_UNNORMALIZED:
+            continue
+        _WARNED_UNNORMALIZED.add(token)
+        logger.warning(
+            "Unnormalized list property %s.%s reached sink as comma-string; "
+            "add a NormalizedList entry to the provider catalog to explode it",
+            label,
+            key,
+        )
+
+
+def _to_sink_property_value(value: Any) -> Any:
+    if hasattr(value, "iso_format") and callable(value.iso_format):
+        return value.iso_format()
+
+    if type(value).__module__.startswith("neo4j.spatial"):
+        return str(value)
+
+    if isinstance(value, dict):
+        # openCypher `SET` rejects map property values: encode as JSON so the
+        # structured payload survives and stays queryable with `CONTAINS` checks
+        return json.dumps(value, sort_keys=True, default=str)
+
+    if isinstance(value, list):
+        # Neptune's openCypher rejects list/array property values: encode as a
+        # comma-delimited string, read back with split() inside queries
+        return ",".join(str(_to_sink_property_value(v)) for v in value)
+
+    return value
@@ -1,4 +1,5 @@
 from api.attack_paths import database as graph_database
+from api.attack_paths import sink as sink_module
 from api.db_router import MainRouter
 from api.db_utils import batch_delete, rls_transaction
 from api.models import (
@@ -76,6 +77,12 @@ def delete_provider(tenant_id: str, pk: str):
                "id", flat=True
            )
        )
+        attack_paths_sink_backends = list(
+            AttackPathsScan.all_objects.filter(provider=instance)
+            .values_list("sink_backend", flat=True)
+            .distinct()
+            .order_by("sink_backend")
+        )

        deletion_steps = [
            ("Scan Summaries", ScanSummary.all_objects.filter(scan__provider=instance)),
@@ -97,7 +104,13 @@ def delete_provider(tenant_id: str, pk: str):
    # Delete the Attack Paths' graph data related to the provider from the tenant database
    tenant_database_name = graph_database.get_database_name(tenant_id)
    try:
-        graph_database.drop_subgraph(tenant_database_name, str(pk))
+        if attack_paths_sink_backends:
+            for sink_backend in attack_paths_sink_backends:
+                sink_module.get_backend_for_name(sink_backend).drop_subgraph(
+                    tenant_database_name, str(pk)
+                )
+        else:
+            graph_database.drop_subgraph(tenant_database_name, str(pk))

    except graph_database.GraphDatabaseQueryException as gdb_error:
        logger.error(f"Error deleting Provider graph data: {gdb_error}")
@@ -23,6 +23,14 @@ from tasks.jobs.attack_paths import internet as internet_module
 from tasks.jobs.attack_paths import sync as sync_module
 from tasks.jobs.attack_paths.scan import run as attack_paths_run

+SYNC_RESULT_EMPTY = {
+    "nodes": 0,
+    "child_nodes": 0,
+    "relationships": 0,
+    "structural_relationships": 0,
+    "item_relationships": 0,
+}
+

@pytest.mark.django_db
 class TestAttackPathsRun:
@@ -32,6 +40,7 @@ class TestAttackPathsRun:
        "tasks.jobs.attack_paths.scan.utils.call_within_event_loop",
        side_effect=lambda fn, *a, **kw: fn(*a, **kw),
    )
+    @patch("tasks.jobs.attack_paths.scan.db_utils.set_scan_migrated")
    @patch("tasks.jobs.attack_paths.scan.db_utils.set_graph_data_ready")
    @patch("tasks.jobs.attack_paths.scan.db_utils.set_provider_graph_data_ready")
    @patch("tasks.jobs.attack_paths.scan.db_utils.finish_attack_paths_scan")
@@ -39,7 +48,7 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
    @patch(
        "tasks.jobs.attack_paths.scan.sync.sync_graph",
-        return_value={"nodes": 0, "relationships": 0},
+        return_value=SYNC_RESULT_EMPTY,
    )
    @patch("tasks.jobs.attack_paths.scan.graph_database.drop_subgraph", return_value=0)
    @patch("tasks.jobs.attack_paths.scan.indexes.create_sync_indexes")
@@ -48,11 +57,11 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
-        "tasks.jobs.attack_paths.scan.graph_database.get_uri",
+        "tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
        return_value="bolt://neo4j",
    )
    @patch(
@@ -66,7 +75,7 @@ class TestAttackPathsRun:
    def test_run_success_flow(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_create_db,
        mock_clear_cache,
        mock_cartography_indexes,
@@ -83,6 +92,7 @@ class TestAttackPathsRun:
        mock_finish,
        mock_set_provider_graph_data_ready,
        mock_set_graph_data_ready,
+        mock_set_scan_migrated,
        mock_event_loop,
        mock_drop_db,
        tenants_fixture,
@@ -159,6 +169,7 @@ class TestAttackPathsRun:
            target_database="tenant-db",
            tenant_id=str(provider.tenant_id),
            provider_id=str(provider.id),
+            provider_type="aws",
        )
        mock_get_ingestion.assert_called_once_with(provider.provider)
        mock_event_loop.assert_called_once()
@@ -172,9 +183,12 @@ class TestAttackPathsRun:
            attack_paths_scan, StateChoices.COMPLETED, ingestion_result
        )
        mock_set_provider_graph_data_ready.assert_called_once_with(
-            attack_paths_scan, False
+            attack_paths_scan, False, "neo4j"
        )
        mock_set_graph_data_ready.assert_called_once_with(attack_paths_scan, True)
+        # is_migrated is flipped to True only after the sync succeeds, so reads
+        # don't switch to the new catalog/sink before the graph is live.
+        mock_set_scan_migrated.assert_called_once_with(attack_paths_scan, True, "neo4j")

    @patch(
        "tasks.jobs.attack_paths.scan.utils.stringify_exception",
@@ -194,13 +208,13 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.internet.analysis")
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
        "tasks.jobs.attack_paths.scan.graph_database.get_database_name",
        return_value="db-scan-id",
    )
-    @patch("tasks.jobs.attack_paths.scan.graph_database.get_uri")
+    @patch("tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri")
    @patch(
        "tasks.jobs.attack_paths.scan.initialize_prowler_provider",
        return_value=MagicMock(_enabled_regions=["us-east-1"]),
@@ -212,7 +226,7 @@ class TestAttackPathsRun:
    def test_run_failure_marks_scan_failed(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_get_db_name,
        mock_create_db,
        mock_cartography_indexes,
@@ -293,13 +307,13 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.internet.analysis")
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
        "tasks.jobs.attack_paths.scan.graph_database.get_database_name",
        return_value="db-scan-id",
    )
-    @patch("tasks.jobs.attack_paths.scan.graph_database.get_uri")
+    @patch("tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri")
    @patch(
        "tasks.jobs.attack_paths.scan.initialize_prowler_provider",
        return_value=MagicMock(_enabled_regions=["us-east-1"]),
@@ -311,7 +325,7 @@ class TestAttackPathsRun:
    def test_failure_before_gate_does_not_flip_graph_data_ready_true(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_get_db_name,
        mock_create_db,
        mock_cartography_indexes,
@@ -396,13 +410,13 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.internet.analysis")
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
        "tasks.jobs.attack_paths.scan.graph_database.get_database_name",
        return_value="db-scan-id",
    )
-    @patch("tasks.jobs.attack_paths.scan.graph_database.get_uri")
+    @patch("tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri")
    @patch(
        "tasks.jobs.attack_paths.scan.initialize_prowler_provider",
        return_value=MagicMock(_enabled_regions=["us-east-1"]),
@@ -414,7 +428,7 @@ class TestAttackPathsRun:
    def test_run_failure_marks_scan_failed_even_when_drop_database_fails(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_get_db_name,
        mock_create_db,
        mock_cartography_indexes,
@@ -493,7 +507,7 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
    @patch(
        "tasks.jobs.attack_paths.scan.sync.sync_graph",
-        return_value={"nodes": 0, "relationships": 0},
+        return_value=SYNC_RESULT_EMPTY,
    )
    @patch(
        "tasks.jobs.attack_paths.scan.graph_database.drop_subgraph",
@@ -505,11 +519,11 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
-        "tasks.jobs.attack_paths.scan.graph_database.get_uri",
+        "tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
        return_value="bolt://neo4j",
    )
    @patch(
@@ -523,7 +537,7 @@ class TestAttackPathsRun:
    def test_failure_after_gate_before_drop_restores_graph_data_ready(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_create_db,
        mock_clear_cache,
        mock_cartography_indexes,
@@ -589,8 +603,8 @@ class TestAttackPathsRun:
                attack_paths_run(str(tenant.id), str(scan.id), "task-456")

        assert mock_set_provider_graph_data_ready.call_args_list == [
-            call(attack_paths_scan, False),
-            call(attack_paths_scan, True),
+            call(attack_paths_scan, False, "neo4j"),
+            call(attack_paths_scan, True, "neo4j"),
        ]

    @patch(
@@ -618,11 +632,11 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
-        "tasks.jobs.attack_paths.scan.graph_database.get_uri",
+        "tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
        return_value="bolt://neo4j",
    )
    @patch(
@@ -636,7 +650,7 @@ class TestAttackPathsRun:
    def test_failure_after_drop_before_sync_leaves_graph_data_ready_false(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_create_db,
        mock_clear_cache,
        mock_cartography_indexes,
@@ -703,7 +717,7 @@ class TestAttackPathsRun:

        # Only called with False (gate), never with True (no recovery for partial data)
        mock_set_provider_graph_data_ready.assert_called_once_with(
-            attack_paths_scan, False
+            attack_paths_scan, False, "neo4j"
        )

    @patch(
@@ -716,6 +730,7 @@ class TestAttackPathsRun:
    )
    @patch("tasks.jobs.attack_paths.scan.graph_database.drop_database")
    @patch("tasks.jobs.attack_paths.scan.db_utils.finish_attack_paths_scan")
+    @patch("tasks.jobs.attack_paths.scan.db_utils.set_scan_migrated")
    @patch(
        "tasks.jobs.attack_paths.scan.db_utils.set_graph_data_ready",
        side_effect=[RuntimeError("flag failed"), None],
@@ -725,7 +740,7 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
    @patch(
        "tasks.jobs.attack_paths.scan.sync.sync_graph",
-        return_value={"nodes": 0, "relationships": 0},
+        return_value=SYNC_RESULT_EMPTY,
    )
    @patch("tasks.jobs.attack_paths.scan.graph_database.drop_subgraph")
    @patch("tasks.jobs.attack_paths.scan.indexes.create_sync_indexes")
@@ -734,11 +749,11 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
-        "tasks.jobs.attack_paths.scan.graph_database.get_uri",
+        "tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
        return_value="bolt://neo4j",
    )
    @patch(
@@ -752,7 +767,7 @@ class TestAttackPathsRun:
    def test_failure_after_sync_restores_graph_data_ready(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_create_db,
        mock_clear_cache,
        mock_cartography_indexes,
@@ -768,6 +783,7 @@ class TestAttackPathsRun:
        mock_update_progress,
        mock_set_provider_graph_data_ready,
        mock_set_graph_data_ready,
+        mock_set_scan_migrated,
        mock_finish,
        mock_drop_db,
        mock_event_loop,
@@ -824,8 +840,11 @@ class TestAttackPathsRun:
        ]
        # set_provider_graph_data_ready only called once with False (the gate)
        mock_set_provider_graph_data_ready.assert_called_once_with(
-            attack_paths_scan, False
+            attack_paths_scan, False, "neo4j"
        )
+        # is_migrated is flipped once after the sync and is not touched again by
+        # the failure-recovery branch
+        mock_set_scan_migrated.assert_called_once_with(attack_paths_scan, True, "neo4j")

    @patch(
        "tasks.jobs.attack_paths.scan.utils.stringify_exception",
@@ -843,7 +862,7 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.db_utils.starting_attack_paths_scan")
    @patch(
        "tasks.jobs.attack_paths.scan.sync.sync_graph",
-        return_value={"nodes": 0, "relationships": 0},
+        return_value=SYNC_RESULT_EMPTY,
    )
    @patch(
        "tasks.jobs.attack_paths.scan.graph_database.drop_subgraph",
@@ -855,11 +874,11 @@ class TestAttackPathsRun:
    @patch("tasks.jobs.attack_paths.scan.indexes.create_findings_indexes")
    @patch("tasks.jobs.attack_paths.scan.cartography_ontology.run")
    @patch("tasks.jobs.attack_paths.scan.cartography_analysis.run")
-    @patch("tasks.jobs.attack_paths.scan.cartography_create_indexes.run")
+    @patch("tasks.jobs.attack_paths.indexes.cartography_create_indexes.run")
    @patch("tasks.jobs.attack_paths.scan.graph_database.clear_cache")
    @patch("tasks.jobs.attack_paths.scan.graph_database.create_database")
    @patch(
-        "tasks.jobs.attack_paths.scan.graph_database.get_uri",
+        "tasks.jobs.attack_paths.scan.graph_database.get_ingest_uri",
        return_value="bolt://neo4j",
    )
    @patch(
@@ -873,7 +892,7 @@ class TestAttackPathsRun:
    def test_recovery_failure_does_not_suppress_original_exception(
        self,
        mock_init_provider,
-        mock_get_uri,
+        mock_get_ingest_uri,
        mock_create_db,
        mock_clear_cache,
        mock_cartography_indexes,
@@ -1116,7 +1135,7 @@ class TestFailAttackPathsScan:
            fail_attack_paths_scan(str(tenant.id), "nonexistent", "setup exploded")

    def test_fail_recovers_graph_data_ready_when_data_exists(
-        self, tenants_fixture, providers_fixture, scans_fixture
+        self, tenants_fixture, providers_fixture, scans_fixture, sink_backend_stub
    ):
        from tasks.jobs.attack_paths.db_utils import fail_attack_paths_scan

@@ -1135,16 +1154,18 @@ class TestFailAttackPathsScan:
            state=StateChoices.EXECUTING,
        )

+        # `recover_graph_data_ready` routes `has_provider_data` through
+        # `sink_module.get_backend_for_scan(scan)`. With `is_migrated=False`
+        # and the default `ATTACK_PATHS_SINK_DATABASE=neo4j`, the factory
+        # returns the active backend, which `sink_backend_stub` replaces.
+        sink_backend_stub.has_provider_data.return_value = True
+
        with (
            patch(
                "tasks.jobs.attack_paths.db_utils.retrieve_attack_paths_scan",
                return_value=attack_paths_scan,
            ),
            patch("tasks.jobs.attack_paths.db_utils.graph_database.drop_database"),
-            patch(
-                "tasks.jobs.attack_paths.db_utils.graph_database.has_provider_data",
-                return_value=True,
-            ),
            patch(
                "tasks.jobs.attack_paths.db_utils.set_provider_graph_data_ready"
            ) as mock_set_ready,
@@ -1154,7 +1175,7 @@ class TestFailAttackPathsScan:
        mock_set_ready.assert_called_once_with(attack_paths_scan, True)

    def test_fail_leaves_graph_data_ready_false_when_no_data(
-        self, tenants_fixture, providers_fixture, scans_fixture
+        self, tenants_fixture, providers_fixture, scans_fixture, sink_backend_stub
    ):
        from tasks.jobs.attack_paths.db_utils import fail_attack_paths_scan

@@ -1173,16 +1194,14 @@ class TestFailAttackPathsScan:
            state=StateChoices.EXECUTING,
        )

+        sink_backend_stub.has_provider_data.return_value = False
+
        with (
            patch(
                "tasks.jobs.attack_paths.db_utils.retrieve_attack_paths_scan",
                return_value=attack_paths_scan,
            ),
            patch("tasks.jobs.attack_paths.db_utils.graph_database.drop_database"),
-            patch(
-                "tasks.jobs.attack_paths.db_utils.graph_database.has_provider_data",
-                return_value=False,
-            ),
            patch(
                "tasks.jobs.attack_paths.db_utils.set_provider_graph_data_ready"
            ) as mock_set_ready,
@@ -1271,6 +1290,20 @@ class TestAttackPathsFindingsHelpers:
            [call(mock_session, stmt) for stmt in FINDINGS_INDEX_STATEMENTS]
        )

+    def test_create_findings_indexes_runs_even_when_sink_is_neptune(self, settings):
+        # The index helpers run against the temp ingest DB, which is always
+        # Neo4j regardless of the configured sink. A Neptune sink must not
+        # suppress index creation on that DB (regression test for the removed
+        # in-helper sink gate).
+        settings.ATTACK_PATHS_SINK_DATABASE = "neptune"
+        mock_session = MagicMock()
+        with patch("tasks.jobs.attack_paths.indexes.run_write_query") as mock_run_write:
+            indexes_module.create_findings_indexes(mock_session)
+
+        from tasks.jobs.attack_paths.indexes import FINDINGS_INDEX_STATEMENTS
+
+        assert mock_run_write.call_count == len(FINDINGS_INDEX_STATEMENTS)
+
    def test_load_findings_batches_requests(self, providers_fixture):
        provider = providers_fixture[0]
        provider.provider = Provider.ProviderChoices.AWS
@@ -1802,7 +1835,7 @@ def _make_session_ctx(session, call_order=None, name=None):


 class TestSyncNodes:
-    def test_sync_nodes_adds_private_label(self):
+    def test_sync_nodes_passes_isolation_labels_to_sink(self):
        row = {
            "internal_id": 1,
            "element_id": "elem-1",
@@ -1812,29 +1845,32 @@ class TestSyncNodes:

        mock_source_1 = MagicMock()
        mock_source_1.run.return_value = [row]
-        mock_target = MagicMock()
        mock_source_2 = MagicMock()
        mock_source_2.run.return_value = []
+        sink = MagicMock()

        with patch(
            "tasks.jobs.attack_paths.sync.graph_database.get_session",
            side_effect=[
                _make_session_ctx(mock_source_1),
-                _make_session_ctx(mock_target),
                _make_session_ctx(mock_source_2),
            ],
        ):
-            total = sync_module.sync_nodes(
-                "source-db", "target-db", "tenant-1", "prov-1"
+            result = sync_module.sync_nodes(
+                "source-db", "target-db", "tenant-1", "prov-1", sink, []
            )

-        assert total == 1
-        query = mock_target.run.call_args.args[0]
-        assert "_ProviderResource" in query
-        assert "_Tenant_tenant1" in query
-        assert "_Provider_prov1" in query
+        assert result["parents"] == 1
+        sink.write_nodes.assert_called_once()
+        target_db, labels, batch = sink.write_nodes.call_args.args
+        assert target_db == "target-db"
+        assert "_ProviderResource" in labels
+        assert "_Tenant_tenant1" in labels
+        assert "_Provider_prov1" in labels
+        assert batch[0]["provider_element_id"] == "prov-1:elem-1"
+        assert batch[0]["props"] == {"key": "value"}

-    def test_sync_nodes_source_closes_before_target_opens(self):
+    def test_sync_nodes_writes_after_source_session_closes(self):
        row = {
            "internal_id": 1,
            "element_id": "elem-1",
@@ -1846,21 +1882,23 @@ class TestSyncNodes:

        src_1 = MagicMock()
        src_1.run.return_value = [row]
-        tgt = MagicMock()
        src_2 = MagicMock()
        src_2.run.return_value = []
+        sink = MagicMock()
+        sink.write_nodes.side_effect = lambda *_a, **_kw: call_order.append(
+            "sink:write"
+        )

        with patch(
            "tasks.jobs.attack_paths.sync.graph_database.get_session",
            side_effect=[
                _make_session_ctx(src_1, call_order, "source1"),
-                _make_session_ctx(tgt, call_order, "target"),
                _make_session_ctx(src_2, call_order, "source2"),
            ],
        ):
-            sync_module.sync_nodes("src-db", "tgt-db", "t-1", "p-1")
+            sync_module.sync_nodes("src-db", "tgt-db", "t-1", "p-1", sink, [])

-        assert call_order.index("source1:exit") < call_order.index("target:enter")
+        assert call_order.index("source1:exit") < call_order.index("sink:write")

    def test_sync_nodes_pagination_with_batch_size_1(self):
        row_a = {
@@ -1882,44 +1920,44 @@ class TestSyncNodes:
        src_2.run.return_value = [row_b]
        src_3 = MagicMock()
        src_3.run.return_value = []
-        tgt_1 = MagicMock()
-        tgt_2 = MagicMock()
+        sink = MagicMock()

        with (
            patch(
                "tasks.jobs.attack_paths.sync.graph_database.get_session",
                side_effect=[
                    _make_session_ctx(src_1),
-                    _make_session_ctx(tgt_1),
                    _make_session_ctx(src_2),
-                    _make_session_ctx(tgt_2),
                    _make_session_ctx(src_3),
                ],
            ),
            patch("tasks.jobs.attack_paths.sync.SYNC_BATCH_SIZE", 1),
        ):
-            total = sync_module.sync_nodes("src", "tgt", "t-1", "p-1")
+            result = sync_module.sync_nodes("src", "tgt", "t-1", "p-1", sink, [])

-        assert total == 2
+        assert result["parents"] == 2
+        assert sink.write_nodes.call_count == 2
        assert src_1.run.call_args.args[1]["last_id"] == -1
        assert src_2.run.call_args.args[1]["last_id"] == 1

    def test_sync_nodes_empty_source_returns_zero(self):
        src = MagicMock()
        src.run.return_value = []
+        sink = MagicMock()

        with patch(
            "tasks.jobs.attack_paths.sync.graph_database.get_session",
            side_effect=[_make_session_ctx(src)],
        ) as mock_get_session:
-            total = sync_module.sync_nodes("src", "tgt", "t-1", "p-1")
+            result = sync_module.sync_nodes("src", "tgt", "t-1", "p-1", sink, [])

-        assert total == 0
+        assert result["parents"] == 0
        assert mock_get_session.call_count == 1
+        sink.write_nodes.assert_not_called()


 class TestSyncRelationships:
-    def test_sync_relationships_source_closes_before_target_opens(self):
+    def test_sync_relationships_writes_after_source_session_closes(self):
        row = {
            "internal_id": 1,
            "rel_type": "HAS",
@@ -1932,21 +1970,23 @@ class TestSyncRelationships:

        src_1 = MagicMock()
        src_1.run.return_value = [row]
-        tgt = MagicMock()
        src_2 = MagicMock()
        src_2.run.return_value = []
+        sink = MagicMock()
+        sink.write_relationships.side_effect = lambda *_a, **_kw: call_order.append(
+            "sink:write"
+        )

        with patch(
            "tasks.jobs.attack_paths.sync.graph_database.get_session",
            side_effect=[
                _make_session_ctx(src_1, call_order, "source1"),
-                _make_session_ctx(tgt, call_order, "target"),
                _make_session_ctx(src_2, call_order, "source2"),
            ],
        ):
-            sync_module.sync_relationships("src", "tgt", "p-1")
+            sync_module.sync_relationships("src", "tgt", "p-1", sink)

-        assert call_order.index("source1:exit") < call_order.index("target:enter")
+        assert call_order.index("source1:exit") < call_order.index("sink:write")

    def test_sync_relationships_pagination_with_batch_size_1(self):
        row_a = {
@@ -1970,40 +2010,40 @@ class TestSyncRelationships:
        src_2.run.return_value = [row_b]
        src_3 = MagicMock()
        src_3.run.return_value = []
-        tgt_1 = MagicMock()
-        tgt_2 = MagicMock()
+        sink = MagicMock()

        with (
            patch(
                "tasks.jobs.attack_paths.sync.graph_database.get_session",
                side_effect=[
                    _make_session_ctx(src_1),
-                    _make_session_ctx(tgt_1),
                    _make_session_ctx(src_2),
-                    _make_session_ctx(tgt_2),
                    _make_session_ctx(src_3),
                ],
            ),
            patch("tasks.jobs.attack_paths.sync.SYNC_BATCH_SIZE", 1),
        ):
-            total = sync_module.sync_relationships("src", "tgt", "p-1")
+            total = sync_module.sync_relationships("src", "tgt", "p-1", sink)

        assert total == 2
+        assert sink.write_relationships.call_count == 2
        assert src_1.run.call_args.args[1]["last_id"] == -1
        assert src_2.run.call_args.args[1]["last_id"] == 1

    def test_sync_relationships_empty_source_returns_zero(self):
        src = MagicMock()
        src.run.return_value = []
+        sink = MagicMock()

        with patch(
            "tasks.jobs.attack_paths.sync.graph_database.get_session",
            side_effect=[_make_session_ctx(src)],
        ) as mock_get_session:
-            total = sync_module.sync_relationships("src", "tgt", "p-1")
+            total = sync_module.sync_relationships("src", "tgt", "p-1", sink)

        assert total == 0
        assert mock_get_session.call_count == 1
+        sink.write_relationships.assert_not_called()


 class TestInternetAnalysis:
@@ -2075,6 +2115,8 @@ class TestAttackPathsDbUtilsGraphDataReady:

        assert attack_paths_scan is not None
        assert attack_paths_scan.graph_data_ready is False
+        assert attack_paths_scan.is_migrated is False
+        assert attack_paths_scan.sink_backend == "neo4j"

    def test_create_attack_paths_scan_inherits_true_from_previous(
        self, tenants_fixture, providers_fixture, scans_fixture
@@ -2095,6 +2137,8 @@ class TestAttackPathsDbUtilsGraphDataReady:
            scan=scan,
            state=StateChoices.COMPLETED,
            graph_data_ready=True,
+            is_migrated=True,
+            sink_backend="neptune",
        )

        new_scan = Scan.objects.create(
@@ -2115,6 +2159,109 @@ class TestAttackPathsDbUtilsGraphDataReady:

        assert attack_paths_scan is not None
        assert attack_paths_scan.graph_data_ready is True
+        # is_migrated tracks the data being served: inherited from the ready scan
+        assert attack_paths_scan.is_migrated is True
+        assert attack_paths_scan.sink_backend == "neptune"
+
+    def test_create_attack_paths_scan_prefers_active_sink_ready_scan(
+        self, tenants_fixture, providers_fixture, scans_fixture, settings
+    ):
+        from tasks.jobs.attack_paths.db_utils import create_attack_paths_scan
+
+        settings.ATTACK_PATHS_SINK_DATABASE = "neo4j"
+        tenant = tenants_fixture[0]
+        provider = providers_fixture[0]
+        provider.provider = Provider.ProviderChoices.AWS
+        provider.save()
+        scan = scans_fixture[0]
+        scan.provider = provider
+        scan.save()
+
+        AttackPathsScan.objects.create(
+            tenant_id=tenant.id,
+            provider=provider,
+            scan=scan,
+            state=StateChoices.COMPLETED,
+            graph_data_ready=True,
+            is_migrated=False,
+            sink_backend="neo4j",
+        )
+        AttackPathsScan.objects.create(
+            tenant_id=tenant.id,
+            provider=provider,
+            scan=scan,
+            state=StateChoices.COMPLETED,
+            graph_data_ready=True,
+            is_migrated=True,
+            sink_backend="neptune",
+        )
+
+        new_scan = Scan.objects.create(
+            name="New Scan",
+            provider=provider,
+            trigger=Scan.TriggerChoices.MANUAL,
+            state=StateChoices.AVAILABLE,
+            tenant_id=tenant.id,
+        )
+
+        with patch(
+            "tasks.jobs.attack_paths.db_utils.rls_transaction",
+            new=lambda *args, **kwargs: nullcontext(),
+        ):
+            attack_paths_scan = create_attack_paths_scan(
+                str(tenant.id), str(new_scan.id), provider.id
+            )
+
+        assert attack_paths_scan is not None
+        assert attack_paths_scan.graph_data_ready is True
+        assert attack_paths_scan.is_migrated is False
+        assert attack_paths_scan.sink_backend == "neo4j"
+
+    def test_create_attack_paths_scan_inherits_is_migrated_false_from_legacy_ready(
+        self, tenants_fixture, providers_fixture, scans_fixture
+    ):
+        from tasks.jobs.attack_paths.db_utils import create_attack_paths_scan
+
+        tenant = tenants_fixture[0]
+        provider = providers_fixture[0]
+        provider.provider = Provider.ProviderChoices.AWS
+        provider.save()
+        scan = scans_fixture[0]
+        scan.provider = provider
+        scan.save()
+
+        # Previous scan is ready but pre-cutover (legacy Neo4j graph shape)
+        AttackPathsScan.objects.create(
+            tenant_id=tenant.id,
+            provider=provider,
+            scan=scan,
+            state=StateChoices.COMPLETED,
+            graph_data_ready=True,
+            is_migrated=False,
+            sink_backend="neo4j",
+        )
+
+        new_scan = Scan.objects.create(
+            name="New Scan",
+            provider=provider,
+            trigger=Scan.TriggerChoices.MANUAL,
+            state=StateChoices.AVAILABLE,
+            tenant_id=tenant.id,
+        )
+
+        with patch(
+            "tasks.jobs.attack_paths.db_utils.rls_transaction",
+            new=lambda *args, **kwargs: nullcontext(),
+        ):
+            attack_paths_scan = create_attack_paths_scan(
+                str(tenant.id), str(new_scan.id), provider.id
+            )
+
+        assert attack_paths_scan is not None
+        assert attack_paths_scan.graph_data_ready is True
+        # Reads stay on the legacy catalog/backend until this scan's own sync runs
+        assert attack_paths_scan.is_migrated is False
+        assert attack_paths_scan.sink_backend == "neo4j"

    def test_create_attack_paths_scan_inherits_false_when_no_previous_ready(
        self, tenants_fixture, providers_fixture, scans_fixture
@@ -2135,6 +2282,7 @@ class TestAttackPathsDbUtilsGraphDataReady:
            scan=scan,
            state=StateChoices.FAILED,
            graph_data_ready=False,
+            sink_backend="neptune",
        )

        new_scan = Scan.objects.create(
@@ -2155,6 +2303,8 @@ class TestAttackPathsDbUtilsGraphDataReady:

        assert attack_paths_scan is not None
        assert attack_paths_scan.graph_data_ready is False
+        assert attack_paths_scan.is_migrated is False
+        assert attack_paths_scan.sink_backend == "neo4j"

    def test_set_graph_data_ready_updates_field(
        self, tenants_fixture, providers_fixture, scans_fixture
@@ -2261,7 +2411,7 @@ class TestAttackPathsDbUtilsGraphDataReady:
        assert attack_paths_scan.state == StateChoices.FAILED
        assert attack_paths_scan.graph_data_ready is True

-    def test_set_provider_graph_data_ready_updates_all_scans_for_provider(
+    def test_set_provider_graph_data_ready_updates_all_scans_for_provider_sink(
        self, tenants_fixture, providers_fixture, scans_fixture
    ):
        from tasks.jobs.attack_paths.db_utils import set_provider_graph_data_ready
@@ -2289,6 +2439,7 @@ class TestAttackPathsDbUtilsGraphDataReady:
            scan=scan_a,
            state=StateChoices.COMPLETED,
            graph_data_ready=True,
+            sink_backend="neptune",
        )
        new_ap_scan = AttackPathsScan.objects.create(
            tenant_id=tenant.id,
@@ -2296,6 +2447,7 @@ class TestAttackPathsDbUtilsGraphDataReady:
            scan=scan_b,
            state=StateChoices.EXECUTING,
            graph_data_ready=True,
+            sink_backend="neptune",
        )

        with patch(
@@ -2309,6 +2461,48 @@ class TestAttackPathsDbUtilsGraphDataReady:
        assert old_ap_scan.graph_data_ready is False
        assert new_ap_scan.graph_data_ready is False

+    def test_set_provider_graph_data_ready_preserves_other_sink_scans(
+        self, tenants_fixture, providers_fixture, scans_fixture
+    ):
+        from tasks.jobs.attack_paths.db_utils import set_provider_graph_data_ready
+
+        tenant = tenants_fixture[0]
+        provider = providers_fixture[0]
+        provider.provider = Provider.ProviderChoices.AWS
+        provider.save()
+
+        scan = scans_fixture[0]
+        scan.provider = provider
+        scan.save()
+
+        legacy_scan = AttackPathsScan.objects.create(
+            tenant_id=tenant.id,
+            provider=provider,
+            scan=scan,
+            state=StateChoices.COMPLETED,
+            graph_data_ready=True,
+            sink_backend="neo4j",
+        )
+        neptune_scan = AttackPathsScan.objects.create(
+            tenant_id=tenant.id,
+            provider=provider,
+            scan=scan,
+            state=StateChoices.EXECUTING,
+            graph_data_ready=True,
+            sink_backend="neptune",
+        )
+
+        with patch(
+            "tasks.jobs.attack_paths.db_utils.rls_transaction",
+            new=lambda *args, **kwargs: nullcontext(),
+        ):
+            set_provider_graph_data_ready(neptune_scan, False)
+
+        legacy_scan.refresh_from_db()
+        neptune_scan.refresh_from_db()
+        assert legacy_scan.graph_data_ready is True
+        assert neptune_scan.graph_data_ready is False
+
    def test_set_provider_graph_data_ready_does_not_affect_other_providers(
        self, tenants_fixture, providers_fixture, scans_fixture
    ):
@@ -2871,3 +3065,57 @@ class TestCleanupStaleAttackPathsScans:
        ap_scan.refresh_from_db()
        assert ap_scan.state == StateChoices.SCHEDULED
        mock_revoke.assert_not_called()
+
+
+class TestNormalizeSinkProperties:
+    """Coerce Cartography-emitted property values into sink-portable primitives.
+
+    Lists become comma-joined strings, dicts become JSON strings, temporals
+    become ISO-format strings, and spatials become their `str()` form. The same
+    coercion runs regardless of the active sink, so queries stay portable.
+    """
+
+    @pytest.mark.parametrize(
+        "raw, expected",
+        [
+            (
+                {"a": "x", "b": 1, "c": 1.5, "d": True, "e": None},
+                {"a": "x", "b": 1, "c": 1.5, "d": True, "e": None},
+            ),
+            (
+                {"actions": ["s3:GetObject", "s3:PutObject"], "tags": []},
+                {"actions": "s3:GetObject,s3:PutObject", "tags": ""},
+            ),
+            (
+                {"condition": {"StringEquals": {"aws:SourceAccount": "123456789012"}}},
+                {
+                    "condition": '{"StringEquals": {"aws:SourceAccount": "123456789012"}}'
+                },
+            ),
+        ],
+    )
+    def test_primitive_list_and_dict_branches(self, raw, expected):
+        sync_module._normalize_sink_properties(raw, labels=None)
+        assert raw == expected
+
+    def test_temporal_and_spatial_become_strings(self):
+        class FakeDateTime:
+            def iso_format(self) -> str:
+                return "2026-05-13T10:00:00+00:00"
+
+        class FakeSpatialPoint:
+            def __str__(self) -> str:
+                return "POINT(1.0 2.0)"
+
+        # The spatial branch is detected by module prefix, not by base class.
+        FakeSpatialPoint.__module__ = "neo4j.spatial.fake"
+
+        props = {
+            "created_at": FakeDateTime(),
+            "location": FakeSpatialPoint(),
+        }
+        sync_module._normalize_sink_properties(props, labels=None)
+        assert props == {
+            "created_at": "2026-05-13T10:00:00+00:00",
+            "location": "POINT(1.0 2.0)",
+        }
@@ -1,4 +1,4 @@
-from unittest.mock import call, patch
+from unittest.mock import MagicMock, call, patch

 import pytest
 from api.attack_paths import database as graph_database
@@ -60,10 +60,12 @@ class TestDeleteProvider:

        aps1 = create_attack_paths_scan(instance)
        aps2 = create_attack_paths_scan(instance)
+        backend = MagicMock()

        with (
            patch(
-                "tasks.jobs.deletion.graph_database.drop_subgraph",
+                "tasks.jobs.deletion.sink_module.get_backend_for_name",
+                return_value=backend,
            ),
            patch(
                "tasks.jobs.deletion.graph_database.drop_database",
@@ -72,12 +74,55 @@ class TestDeleteProvider:
            result = delete_provider(tenant_id, instance.id)

        assert result
+        backend.drop_subgraph.assert_called_once_with(
+            graph_database.get_database_name(tenant_id), str(instance.id)
+        )
        expected_tmp_calls = [
            call(f"db-tmp-scan-{str(aps1.id).lower()}"),
            call(f"db-tmp-scan-{str(aps2.id).lower()}"),
        ]
        mock_drop_database.assert_has_calls(expected_tmp_calls, any_order=True)

+    def test_delete_provider_drops_graph_data_from_all_recorded_sinks(
+        self, providers_fixture, create_attack_paths_scan
+    ):
+        instance = providers_fixture[0]
+        tenant_id = str(instance.tenant_id)
+        create_attack_paths_scan(instance, sink_backend="neo4j")
+        create_attack_paths_scan(instance, sink_backend="neptune")
+        neo4j_backend = MagicMock()
+        neptune_backend = MagicMock()
+
+        def get_backend_for_name(name):
+            return {
+                "neo4j": neo4j_backend,
+                "neptune": neptune_backend,
+            }[name]
+
+        with (
+            patch(
+                "tasks.jobs.deletion.graph_database.get_database_name",
+                return_value="tenant-db",
+            ),
+            patch(
+                "tasks.jobs.deletion.sink_module.get_backend_for_name",
+                side_effect=get_backend_for_name,
+            ) as mock_get_backend_for_name,
+            patch("tasks.jobs.deletion.graph_database.drop_database"),
+        ):
+            result = delete_provider(tenant_id, instance.id)
+
+        assert result
+        mock_get_backend_for_name.assert_has_calls(
+            [call("neo4j"), call("neptune")], any_order=True
+        )
+        neo4j_backend.drop_subgraph.assert_called_once_with(
+            "tenant-db", str(instance.id)
+        )
+        neptune_backend.drop_subgraph.assert_called_once_with(
+            "tenant-db", str(instance.id)
+        )
+
    def test_delete_provider_continues_when_temp_db_drop_fails(
        self, providers_fixture, create_attack_paths_scan
    ):
@@ -85,10 +130,12 @@ class TestDeleteProvider:
        tenant_id = str(instance.tenant_id)

        create_attack_paths_scan(instance)
+        backend = MagicMock()

        with (
            patch(
-                "tasks.jobs.deletion.graph_database.drop_subgraph",
+                "tasks.jobs.deletion.sink_module.get_backend_for_name",
+                return_value=backend,
            ),
            patch(
                "tasks.jobs.deletion.graph_database.drop_database",
@@ -110,7 +110,7 @@ constraints = [
    { name = "blinker", specifier = "==1.9.0" },
    { name = "boto3", specifier = "==1.40.61" },
    { name = "botocore", specifier = "==1.40.61" },
-    { name = "cartography", specifier = "==0.135.0" },
+    { name = "cartography", specifier = "==0.138.1" },
    { name = "celery", specifier = "==5.6.2" },
    { name = "certifi", specifier = "==2026.1.4" },
    { name = "cffi", specifier = "==2.0.0" },
@@ -364,7 +364,7 @@ constraints = [
    { name = "wcwidth", specifier = "==0.5.3" },
    { name = "websocket-client", specifier = "==1.9.0" },
    { name = "werkzeug", specifier = "==3.1.7" },
-    { name = "workos", specifier = "==6.0.4" },
+    { name = "workos", specifier = "==6.0.8" },
    { name = "wrapt", specifier = "==1.17.3" },
    { name = "xlsxwriter", specifier = "==3.2.9" },
    { name = "xmlsec", specifier = "==1.3.17" },
@@ -376,6 +376,7 @@ constraints = [
    { name = "zstd", specifier = "==1.5.7.3" },
 ]
 overrides = [
+    { name = "azure-mgmt-containerservice", specifier = "==34.1.0" },
    { name = "dulwich", specifier = "==1.2.5" },
    { name = "microsoft-kiota-abstractions", specifier = "==1.9.9" },
    { name = "okta", specifier = "==3.4.2" },
@@ -1407,6 +1408,20 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/3d/66/0d8ae9ca4d75e57746026a1f9a10a7e25029511c128cf20166fce516bda9/azure_mgmt_logic-10.0.0-py3-none-any.whl", hash = "sha256:525c78afedf3edb35eb0a16152c8beba89769ee1bc6af01bcdc42842a551e443", size = 235433, upload-time = "2022-06-13T01:38:27.333Z" },
 ]

+[[package]]
+name = "azure-mgmt-managementgroups"
+version = "1.1.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "azure-mgmt-core" },
+    { name = "isodate" },
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/fd/73/ac5e064ed7343e1b3172f32f09be3efca906087218d3046b5038f2f394ed/azure_mgmt_managementgroups-1.1.0.tar.gz", hash = "sha256:e6199baf118890ba2bda35dda83a88861c0b1bbef126311b20ec12eed9681951", size = 60101, upload-time = "2026-02-13T03:45:45.439Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/92/bc/993158de03cc0a49f2cf8192615ffedbc508c417cb3522e88f6652b714cc/azure_mgmt_managementgroups-1.1.0-py3-none-any.whl", hash = "sha256:140934589559ef6afcac6f1d24f995588a1965aaa89d47851c1cc639fafb1942", size = 83586, upload-time = "2026-02-13T03:45:46.836Z" },
+]
+
 [[package]]
 name = "azure-mgmt-monitor"
 version = "6.0.2"
@@ -1726,7 +1741,7 @@ wheels = [

 [[package]]
 name = "cartography"
-version = "0.135.0"
+version = "0.138.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "adal" },
@@ -1746,6 +1761,7 @@ dependencies = [
    { name = "azure-mgmt-eventhub" },
    { name = "azure-mgmt-keyvault" },
    { name = "azure-mgmt-logic" },
+    { name = "azure-mgmt-managementgroups" },
    { name = "azure-mgmt-monitor" },
    { name = "azure-mgmt-network" },
    { name = "azure-mgmt-resource" },
@@ -1754,6 +1770,7 @@ dependencies = [
    { name = "azure-mgmt-storage" },
    { name = "azure-mgmt-synapse" },
    { name = "azure-mgmt-web" },
+    { name = "azure-storage-blob" },
    { name = "azure-synapse-artifacts" },
    { name = "backoff" },
    { name = "boto3" },
@@ -1765,8 +1782,12 @@ dependencies = [
    { name = "duo-client" },
    { name = "google-api-python-client" },
    { name = "google-auth" },
+    { name = "google-cloud-aiplatform" },
+    { name = "google-cloud-artifact-registry" },
    { name = "google-cloud-asset" },
    { name = "google-cloud-resource-manager" },
+    { name = "google-cloud-run" },
+    { name = "google-cloud-storage" },
    { name = "httpx" },
    { name = "kubernetes" },
    { name = "marshmallow" },
@@ -1792,9 +1813,9 @@ dependencies = [
    { name = "workos" },
    { name = "xmltodict" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/39/47/606851d2403a983b63813b9e95427a5dd896e49bc5a501868c041262e9a5/cartography-0.135.0.tar.gz", hash = "sha256:3f500cd22c3b392d00e8b49f62acc95fd4dcd559ce514aafe2eb8101133c7a49", size = 9106458, upload-time = "2026-04-10T16:25:34.898Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/51/cd/0eb6a5a3c89cc179801d902ade9719af1a583c516c00f50d72b8207db1eb/cartography-0.138.1.tar.gz", hash = "sha256:356e946a0bcac899cba293d57803c71bd35fdeabe623f5f67d9405d7a643af9f", size = 9756966, upload-time = "2026-06-19T22:11:32.411Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/b1/e1/99a26b3e662202be77961aba73338e1448623490710b81783e53a4bbef15/cartography-0.135.0-py3-none-any.whl", hash = "sha256:c62c32a6917b8f23a8b98fe2b6c7c4a918b50f55918482966c4dae1cf5f538e1", size = 1590545, upload-time = "2026-04-10T16:25:37.669Z" },
+    { url = "https://files.pythonhosted.org/packages/a8/15/4447ec968825b2a19cba26ecb74964208aa3f941d9181a7782572e30b43d/cartography-0.138.1-py3-none-any.whl", hash = "sha256:88ec0898ea1a1b3f4653be9a3e7e61144f5cee20384b9040e92039617d39f029", size = 2014725, upload-time = "2026-06-19T22:11:29.886Z" },
 ]

 [[package]]
@@ -2511,6 +2532,15 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/e3/26/57c6fb270950d476074c087527a558ccb6f4436657314bfb6cdf484114c4/docker-7.1.0-py3-none-any.whl", hash = "sha256:c96b93b7f0a746f9e77d325bcfb87422a3d8bd4f03136ae8a85b37f1898d5fc0", size = 147774, upload-time = "2024-05-23T11:13:55.01Z" },
 ]

+[[package]]
+name = "docstring-parser"
+version = "0.18.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/e0/4d/f332313098c1de1b2d2ff91cf2674415cc7cddab2ca1b01ae29774bd5fdf/docstring_parser-0.18.0.tar.gz", hash = "sha256:292510982205c12b1248696f44959db3cdd1740237a968ea1e2e7a900eeb2015", size = 29341, upload-time = "2026-04-14T04:09:19.867Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/a7/5f/ed01f9a3cdffbd5a008556fc7b2a08ddb1cc6ace7effa7340604b1d16699/docstring_parser-0.18.0-py3-none-any.whl", hash = "sha256:b3fcbed555c47d8479be0796ef7e19c2670d428d72e96da63f3a40122860374b", size = 22484, upload-time = "2026-04-14T04:09:18.638Z" },
+]
+
 [[package]]
 name = "dogpile-cache"
 version = "1.5.0"
@@ -2851,6 +2881,11 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/83/1d/d6466de3a5249d35e832a52834115ca9d1d0de6abc22065f049707516d47/google_auth-2.48.0-py3-none-any.whl", hash = "sha256:2e2a537873d449434252a9632c28bfc268b0adb1e53f9fb62afc5333a975903f", size = 236499, upload-time = "2026-01-26T19:22:45.099Z" },
 ]

+[package.optional-dependencies]
+requests = [
+    { name = "requests" },
+]
+
 [[package]]
 name = "google-auth-httplib2"
 version = "0.2.0"
@@ -2877,6 +2912,46 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/ca/94/24b010493660dd55e2d9769ae7ef44164aebd7e1f4a9266cf9459affd687/google_cloud_access_context_manager-0.3.0-py3-none-any.whl", hash = "sha256:5d15ad51547f06c281e35f16b4ffcb3e98bb2d898b01470f88b94edfb2eeb0a3", size = 58852, upload-time = "2025-10-17T02:30:33.768Z" },
 ]

+[[package]]
+name = "google-cloud-aiplatform"
+version = "1.153.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "docstring-parser" },
+    { name = "google-api-core", extra = ["grpc"] },
+    { name = "google-auth" },
+    { name = "google-cloud-bigquery" },
+    { name = "google-cloud-resource-manager" },
+    { name = "google-cloud-storage" },
+    { name = "google-genai" },
+    { name = "packaging" },
+    { name = "proto-plus" },
+    { name = "protobuf" },
+    { name = "pydantic" },
+    { name = "typing-extensions" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/d5/97/1779e66ab845550bc602364311ea093ba156cb805a1c31b7c4d6f25b5863/google_cloud_aiplatform-1.153.1.tar.gz", hash = "sha256:445b6c683d5c630f174d81ae1f69f7da9e27e4d4ec5b70c5fe96de5c1247cfbc", size = 11011349, upload-time = "2026-05-15T06:34:14.851Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/16/01/8a1900e7a742ed480e6037ac4f6541466cb981d81bd4cbd34a9d46204ea1/google_cloud_aiplatform-1.153.1-py2.py3-none-any.whl", hash = "sha256:033fa1595a7e8ed1d97066e261e630f38fbc60e10c98c6487cf228fe9c7ec151", size = 9170782, upload-time = "2026-05-15T06:34:10.887Z" },
+]
+
+[[package]]
+name = "google-cloud-artifact-registry"
+version = "1.21.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "google-api-core", extra = ["grpc"] },
+    { name = "google-auth" },
+    { name = "grpc-google-iam-v1" },
+    { name = "grpcio" },
+    { name = "proto-plus" },
+    { name = "protobuf" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/13/2b/24e6956789bc1244efb18143aa4f124e03d870228e5bfd065c04d38a4d6b/google_cloud_artifact_registry-1.21.0.tar.gz", hash = "sha256:546e51eb5d463a6e5c668be6727d14f8ec82bc798031398006b2213d703e184c", size = 315219, upload-time = "2026-03-30T22:50:38.875Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/e1/8c/a5c68031728f38d3306bad5ac10c0ca670cbdf414db308ddefa2c47f2b34/google_cloud_artifact_registry-1.21.0-py3-none-any.whl", hash = "sha256:a07079035438fd0f2e7264d4318b388650495f011db575405c18c9881449025c", size = 250544, upload-time = "2026-03-30T22:48:49.345Z" },
+]
+
 [[package]]
 name = "google-cloud-asset"
 version = "4.2.0"
@@ -2897,6 +2972,37 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/05/88/9a43fae1d2fed94d7f5f46b6f4c44bd15e5ea0e8657632108b5ec5f53d9d/google_cloud_asset-4.2.0-py3-none-any.whl", hash = "sha256:fd7ea04c64948a4779790343204cd5b41d4772d6ab1d05a9125e28a637ac0862", size = 282707, upload-time = "2026-01-09T14:53:03.081Z" },
 ]

+[[package]]
+name = "google-cloud-bigquery"
+version = "3.41.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "google-api-core", extra = ["grpc"] },
+    { name = "google-auth" },
+    { name = "google-cloud-core" },
+    { name = "google-resumable-media" },
+    { name = "packaging" },
+    { name = "python-dateutil" },
+    { name = "requests" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/ce/13/6515c7aab55a4a0cf708ffd309fb9af5bab54c13e32dc22c5acd6497193c/google_cloud_bigquery-3.41.0.tar.gz", hash = "sha256:2217e488b47ed576360c9b2cc07d59d883a54b83167c0ef37f915c26b01a06fe", size = 513434, upload-time = "2026-03-30T22:50:55.347Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/40/33/1d3902efadef9194566d499d61507e1f038454e0b55499d2d7f8ab2a4fee/google_cloud_bigquery-3.41.0-py3-none-any.whl", hash = "sha256:2a5b5a737b401cbd824a6e5eac7554100b878668d908e6548836b5d8aaa4dcaa", size = 262343, upload-time = "2026-03-30T22:48:45.444Z" },
+]
+
+[[package]]
+name = "google-cloud-core"
+version = "2.6.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "google-api-core" },
+    { name = "google-auth" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/a8/dd/1eef226e470369b26824a505c34482c0b493bc35fe8e0c6b003b5feca21a/google_cloud_core-2.6.0.tar.gz", hash = "sha256:e76149739f90fac1fc6757c09f47eaccb3145b54adbd7759b0f7c4b235f46c83", size = 36001, upload-time = "2026-05-07T08:04:04.124Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/84/4a/98da8930ab109c73d9a5d13782a9ebb81ea8c111f6d534a567b71d23e52b/google_cloud_core-2.6.0-py3-none-any.whl", hash = "sha256:6d63ac8e5eca6d9e4319d0a1e2265fadcd7f1049904378caecfa01cf52dd869e", size = 29390, upload-time = "2026-05-07T08:02:34.672Z" },
+]
+
 [[package]]
 name = "google-cloud-org-policy"
 version = "1.16.0"
@@ -2946,6 +3052,93 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/94/ff/4b28bcc791d9d7e4ac8fea00fbd90ccb236afda56746a3b4564d2ae45df3/google_cloud_resource_manager-1.16.0-py3-none-any.whl", hash = "sha256:fb9a2ad2b5053c508e1c407ac31abfd1a22e91c32876c1892830724195819a28", size = 400218, upload-time = "2026-01-15T13:02:47.378Z" },
 ]

+[[package]]
+name = "google-cloud-run"
+version = "0.16.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "google-api-core", extra = ["grpc"] },
+    { name = "google-auth" },
+    { name = "grpc-google-iam-v1" },
+    { name = "grpcio" },
+    { name = "proto-plus" },
+    { name = "protobuf" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/b7/89/dcaf0dc97e39b41e446456ceb60657ab025de79cfccd39cbd739d1a9849e/google_cloud_run-0.16.0.tar.gz", hash = "sha256:d52cf4e6ad3702ae48caccf6abcab543afee6f61c2a6ec753cc62a31e5b629f1", size = 514452, upload-time = "2026-03-26T22:17:05.589Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/fa/c7/46153dc13713b5e4276d86f28ff4563332f9e4bae5ebc83abc5bfd994801/google_cloud_run-0.16.0-py3-none-any.whl", hash = "sha256:d7d2dd7307130fde2a0ce27e96d580dd23b7b2d973b6484b94d902e6b2618860", size = 459112, upload-time = "2026-03-26T22:16:00.018Z" },
+]
+
+[[package]]
+name = "google-cloud-storage"
+version = "3.10.1"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "google-api-core" },
+    { name = "google-auth" },
+    { name = "google-cloud-core" },
+    { name = "google-crc32c" },
+    { name = "google-resumable-media" },
+    { name = "requests" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/4c/47/205eb8e9a1739b5345843e5a425775cbdc472cc38e7eda082ba5b8d02450/google_cloud_storage-3.10.1.tar.gz", hash = "sha256:97db9aa4460727982040edd2bd13ff3d5e2260b5331ad22895802da1fc2a5286", size = 17309950, upload-time = "2026-03-23T09:35:23.409Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/ad/ff/ca9ab2417fa913d75aae38bf40bf856bb2749a604b2e0f701b37cfcd23cc/google_cloud_storage-3.10.1-py3-none-any.whl", hash = "sha256:a72f656759b7b99bda700f901adcb3425a828d4a29f911bc26b3ea79c5b1217f", size = 324453, upload-time = "2026-03-23T09:35:21.368Z" },
+]
+
+[[package]]
+name = "google-crc32c"
+version = "1.8.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/03/41/4b9c02f99e4c5fb477122cd5437403b552873f014616ac1d19ac8221a58d/google_crc32c-1.8.0.tar.gz", hash = "sha256:a428e25fb7691024de47fecfbff7ff957214da51eddded0da0ae0e0f03a2cf79", size = 14192, upload-time = "2025-12-16T00:35:25.142Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/5d/ef/21ccfaab3d5078d41efe8612e0ed0bfc9ce22475de074162a91a25f7980d/google_crc32c-1.8.0-cp311-cp311-macosx_12_0_arm64.whl", hash = "sha256:014a7e68d623e9a4222d663931febc3033c5c7c9730785727de2a81f87d5bab8", size = 31298, upload-time = "2025-12-16T00:20:32.241Z" },
+    { url = "https://files.pythonhosted.org/packages/c5/b8/f8413d3f4b676136e965e764ceedec904fe38ae8de0cdc52a12d8eb1096e/google_crc32c-1.8.0-cp311-cp311-macosx_12_0_x86_64.whl", hash = "sha256:86cfc00fe45a0ac7359e5214a1704e51a99e757d0272554874f419f79838c5f7", size = 30872, upload-time = "2025-12-16T00:33:58.785Z" },
+    { url = "https://files.pythonhosted.org/packages/f6/fd/33aa4ec62b290477181c55bb1c9302c9698c58c0ce9a6ab4874abc8b0d60/google_crc32c-1.8.0-cp311-cp311-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:19b40d637a54cb71e0829179f6cb41835f0fbd9e8eb60552152a8b52c36cbe15", size = 33243, upload-time = "2025-12-16T00:40:21.46Z" },
+    { url = "https://files.pythonhosted.org/packages/71/03/4820b3bd99c9653d1a5210cb32f9ba4da9681619b4d35b6a052432df4773/google_crc32c-1.8.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:17446feb05abddc187e5441a45971b8394ea4c1b6efd88ab0af393fd9e0a156a", size = 33608, upload-time = "2025-12-16T00:40:22.204Z" },
+    { url = "https://files.pythonhosted.org/packages/7c/43/acf61476a11437bf9733fb2f70599b1ced11ec7ed9ea760fdd9a77d0c619/google_crc32c-1.8.0-cp311-cp311-win_amd64.whl", hash = "sha256:71734788a88f551fbd6a97be9668a0020698e07b2bf5b3aa26a36c10cdfb27b2", size = 34439, upload-time = "2025-12-16T00:35:20.458Z" },
+    { url = "https://files.pythonhosted.org/packages/e9/5f/7307325b1198b59324c0fa9807cafb551afb65e831699f2ce211ad5c8240/google_crc32c-1.8.0-cp312-cp312-macosx_12_0_arm64.whl", hash = "sha256:4b8286b659c1335172e39563ab0a768b8015e88e08329fa5321f774275fc3113", size = 31300, upload-time = "2025-12-16T00:21:56.723Z" },
+    { url = "https://files.pythonhosted.org/packages/21/8e/58c0d5d86e2220e6a37befe7e6a94dd2f6006044b1a33edf1ff6d9f7e319/google_crc32c-1.8.0-cp312-cp312-macosx_12_0_x86_64.whl", hash = "sha256:2a3dc3318507de089c5384cc74d54318401410f82aa65b2d9cdde9d297aca7cb", size = 30867, upload-time = "2025-12-16T00:38:31.302Z" },
+    { url = "https://files.pythonhosted.org/packages/ce/a9/a780cc66f86335a6019f557a8aaca8fbb970728f0efd2430d15ff1beae0e/google_crc32c-1.8.0-cp312-cp312-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:14f87e04d613dfa218d6135e81b78272c3b904e2a7053b841481b38a7d901411", size = 33364, upload-time = "2025-12-16T00:40:22.96Z" },
+    { url = "https://files.pythonhosted.org/packages/21/3f/3457ea803db0198c9aaca2dd373750972ce28a26f00544b6b85088811939/google_crc32c-1.8.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:cb5c869c2923d56cb0c8e6bcdd73c009c36ae39b652dbe46a05eb4ef0ad01454", size = 33740, upload-time = "2025-12-16T00:40:23.96Z" },
+    { url = "https://files.pythonhosted.org/packages/df/c0/87c2073e0c72515bb8733d4eef7b21548e8d189f094b5dad20b0ecaf64f6/google_crc32c-1.8.0-cp312-cp312-win_amd64.whl", hash = "sha256:3cc0c8912038065eafa603b238abf252e204accab2a704c63b9e14837a854962", size = 34437, upload-time = "2025-12-16T00:35:21.395Z" },
+    { url = "https://files.pythonhosted.org/packages/52/c5/c171e4d8c44fec1422d801a6d2e5d7ddabd733eeda505c79730ee9607f07/google_crc32c-1.8.0-pp311-pypy311_pp73-manylinux1_x86_64.manylinux2014_x86_64.manylinux_2_17_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:87fa445064e7db928226b2e6f0d5304ab4cd0339e664a4e9a25029f384d9bb93", size = 28615, upload-time = "2025-12-16T00:40:29.298Z" },
+    { url = "https://files.pythonhosted.org/packages/9c/97/7d75fe37a7a6ed171a2cf17117177e7aab7e6e0d115858741b41e9dd4254/google_crc32c-1.8.0-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.whl", hash = "sha256:f639065ea2042d5c034bf258a9f085eaa7af0cd250667c0635a3118e8f92c69c", size = 28800, upload-time = "2025-12-16T00:40:30.322Z" },
+]
+
+[[package]]
+name = "google-genai"
+version = "1.68.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "anyio" },
+    { name = "distro" },
+    { name = "google-auth", extra = ["requests"] },
+    { name = "httpx" },
+    { name = "pydantic" },
+    { name = "requests" },
+    { name = "sniffio" },
+    { name = "tenacity" },
+    { name = "typing-extensions" },
+    { name = "websockets" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/9c/2c/f059982dbcb658cc535c81bbcbe7e2c040d675f4b563b03cdb01018a4bc3/google_genai-1.68.0.tar.gz", hash = "sha256:ac30c0b8bc630f9372993a97e4a11dae0e36f2e10d7c55eacdca95a9fa14ca96", size = 511285, upload-time = "2026-03-18T01:03:18.243Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/84/de/7d3ee9c94b74c3578ea4f88d45e8de9405902f857932334d81e89bce3dfa/google_genai-1.68.0-py3-none-any.whl", hash = "sha256:a1bc9919c0e2ea2907d1e319b65471d3d6d58c54822039a249fe1323e4178d15", size = 750912, upload-time = "2026-03-18T01:03:15.983Z" },
+]
+
+[[package]]
+name = "google-resumable-media"
+version = "2.9.0"
+source = { registry = "https://pypi.org/simple" }
+dependencies = [
+    { name = "google-crc32c" },
+]
+sdist = { url = "https://files.pythonhosted.org/packages/00/4b/0b235beccc310d0a48adbc7246b719d173cca6c88c572dfa4b090e39143c/google_resumable_media-2.9.0.tar.gz", hash = "sha256:f7cfb224846a9dd444d125115dfbe8ef02a2b893e78f087762fe716a255a734b", size = 2164534, upload-time = "2026-05-07T08:04:44.236Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/07/73/3518e63deb1667c5409a4579e28daf5e84479a87a72c547e0487f7883dcd/google_resumable_media-2.9.0-py3-none-any.whl", hash = "sha256:c8901e88e389af8bed64d9696c74d8bad961865eb2236e13e0bfca9bb0a65ca3", size = 81507, upload-time = "2026-05-07T08:03:23.809Z" },
+]
+
 [[package]]
 name = "googleapis-common-protos"
 version = "1.72.0"
@@ -4606,7 +4799,7 @@ dev = [

 [package.metadata]
 requires-dist = [
-    { name = "cartography", specifier = "==0.135.0" },
+    { name = "cartography", specifier = "==0.138.1" },
    { name = "celery", specifier = "==5.6.2" },
    { name = "defusedxml", specifier = "==0.7.1" },
    { name = "dj-rest-auth", extras = ["with-social", "jwt"], specifier = "==7.0.1" },
@@ -5931,6 +6124,38 @@ wheels = [
    { url = "https://files.pythonhosted.org/packages/34/db/b10e48aa8fff7407e67470363eac595018441cf32d5e1001567a7aeba5d2/websocket_client-1.9.0-py3-none-any.whl", hash = "sha256:af248a825037ef591efbf6ed20cc5faa03d3b47b9e5a2230a529eeee1c1fc3ef", size = 82616, upload-time = "2025-10-07T21:16:34.951Z" },
 ]

+[[package]]
+name = "websockets"
+version = "16.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/04/24/4b2031d72e840ce4c1ccb255f693b15c334757fc50023e4db9537080b8c4/websockets-16.0.tar.gz", hash = "sha256:5f6261a5e56e8d5c42a4497b364ea24d94d9563e8fbd44e78ac40879c60179b5", size = 179346, upload-time = "2026-01-10T09:23:47.181Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/f2/db/de907251b4ff46ae804ad0409809504153b3f30984daf82a1d84a9875830/websockets-16.0-cp311-cp311-macosx_10_9_universal2.whl", hash = "sha256:31a52addea25187bde0797a97d6fc3d2f92b6f72a9370792d65a6e84615ac8a8", size = 177340, upload-time = "2026-01-10T09:22:34.539Z" },
+    { url = "https://files.pythonhosted.org/packages/f3/fa/abe89019d8d8815c8781e90d697dec52523fb8ebe308bf11664e8de1877e/websockets-16.0-cp311-cp311-macosx_10_9_x86_64.whl", hash = "sha256:417b28978cdccab24f46400586d128366313e8a96312e4b9362a4af504f3bbad", size = 175022, upload-time = "2026-01-10T09:22:36.332Z" },
+    { url = "https://files.pythonhosted.org/packages/58/5d/88ea17ed1ded2079358b40d31d48abe90a73c9e5819dbcde1606e991e2ad/websockets-16.0-cp311-cp311-macosx_11_0_arm64.whl", hash = "sha256:af80d74d4edfa3cb9ed973a0a5ba2b2a549371f8a741e0800cb07becdd20f23d", size = 175319, upload-time = "2026-01-10T09:22:37.602Z" },
+    { url = "https://files.pythonhosted.org/packages/d2/ae/0ee92b33087a33632f37a635e11e1d99d429d3d323329675a6022312aac2/websockets-16.0-cp311-cp311-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:08d7af67b64d29823fed316505a89b86705f2b7981c07848fb5e3ea3020c1abe", size = 184631, upload-time = "2026-01-10T09:22:38.789Z" },
+    { url = "https://files.pythonhosted.org/packages/c8/c5/27178df583b6c5b31b29f526ba2da5e2f864ecc79c99dae630a85d68c304/websockets-16.0-cp311-cp311-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:7be95cfb0a4dae143eaed2bcba8ac23f4892d8971311f1b06f3c6b78952ee70b", size = 185870, upload-time = "2026-01-10T09:22:39.893Z" },
+    { url = "https://files.pythonhosted.org/packages/87/05/536652aa84ddc1c018dbb7e2c4cbcd0db884580bf8e95aece7593fde526f/websockets-16.0-cp311-cp311-musllinux_1_2_aarch64.whl", hash = "sha256:d6297ce39ce5c2e6feb13c1a996a2ded3b6832155fcfc920265c76f24c7cceb5", size = 185361, upload-time = "2026-01-10T09:22:41.016Z" },
+    { url = "https://files.pythonhosted.org/packages/6d/e2/d5332c90da12b1e01f06fb1b85c50cfc489783076547415bf9f0a659ec19/websockets-16.0-cp311-cp311-musllinux_1_2_x86_64.whl", hash = "sha256:1c1b30e4f497b0b354057f3467f56244c603a79c0d1dafce1d16c283c25f6e64", size = 184615, upload-time = "2026-01-10T09:22:42.442Z" },
+    { url = "https://files.pythonhosted.org/packages/77/fb/d3f9576691cae9253b51555f841bc6600bf0a983a461c79500ace5a5b364/websockets-16.0-cp311-cp311-win32.whl", hash = "sha256:5f451484aeb5cafee1ccf789b1b66f535409d038c56966d6101740c1614b86c6", size = 178246, upload-time = "2026-01-10T09:22:43.654Z" },
+    { url = "https://files.pythonhosted.org/packages/54/67/eaff76b3dbaf18dcddabc3b8c1dba50b483761cccff67793897945b37408/websockets-16.0-cp311-cp311-win_amd64.whl", hash = "sha256:8d7f0659570eefb578dacde98e24fb60af35350193e4f56e11190787bee77dac", size = 178684, upload-time = "2026-01-10T09:22:44.941Z" },
+    { url = "https://files.pythonhosted.org/packages/84/7b/bac442e6b96c9d25092695578dda82403c77936104b5682307bd4deb1ad4/websockets-16.0-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:71c989cbf3254fbd5e84d3bff31e4da39c43f884e64f2551d14bb3c186230f00", size = 177365, upload-time = "2026-01-10T09:22:46.787Z" },
+    { url = "https://files.pythonhosted.org/packages/b0/fe/136ccece61bd690d9c1f715baaeefd953bb2360134de73519d5df19d29ca/websockets-16.0-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:8b6e209ffee39ff1b6d0fa7bfef6de950c60dfb91b8fcead17da4ee539121a79", size = 175038, upload-time = "2026-01-10T09:22:47.999Z" },
+    { url = "https://files.pythonhosted.org/packages/40/1e/9771421ac2286eaab95b8575b0cb701ae3663abf8b5e1f64f1fd90d0a673/websockets-16.0-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:86890e837d61574c92a97496d590968b23c2ef0aeb8a9bc9421d174cd378ae39", size = 175328, upload-time = "2026-01-10T09:22:49.809Z" },
+    { url = "https://files.pythonhosted.org/packages/18/29/71729b4671f21e1eaa5d6573031ab810ad2936c8175f03f97f3ff164c802/websockets-16.0-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:9b5aca38b67492ef518a8ab76851862488a478602229112c4b0d58d63a7a4d5c", size = 184915, upload-time = "2026-01-10T09:22:51.071Z" },
+    { url = "https://files.pythonhosted.org/packages/97/bb/21c36b7dbbafc85d2d480cd65df02a1dc93bf76d97147605a8e27ff9409d/websockets-16.0-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:e0334872c0a37b606418ac52f6ab9cfd17317ac26365f7f65e203e2d0d0d359f", size = 186152, upload-time = "2026-01-10T09:22:52.224Z" },
+    { url = "https://files.pythonhosted.org/packages/4a/34/9bf8df0c0cf88fa7bfe36678dc7b02970c9a7d5e065a3099292db87b1be2/websockets-16.0-cp312-cp312-musllinux_1_2_aarch64.whl", hash = "sha256:a0b31e0b424cc6b5a04b8838bbaec1688834b2383256688cf47eb97412531da1", size = 185583, upload-time = "2026-01-10T09:22:53.443Z" },
+    { url = "https://files.pythonhosted.org/packages/47/88/4dd516068e1a3d6ab3c7c183288404cd424a9a02d585efbac226cb61ff2d/websockets-16.0-cp312-cp312-musllinux_1_2_x86_64.whl", hash = "sha256:485c49116d0af10ac698623c513c1cc01c9446c058a4e61e3bf6c19dff7335a2", size = 184880, upload-time = "2026-01-10T09:22:55.033Z" },
+    { url = "https://files.pythonhosted.org/packages/91/d6/7d4553ad4bf1c0421e1ebd4b18de5d9098383b5caa1d937b63df8d04b565/websockets-16.0-cp312-cp312-win32.whl", hash = "sha256:eaded469f5e5b7294e2bdca0ab06becb6756ea86894a47806456089298813c89", size = 178261, upload-time = "2026-01-10T09:22:56.251Z" },
+    { url = "https://files.pythonhosted.org/packages/c3/f0/f3a17365441ed1c27f850a80b2bc680a0fa9505d733fe152fdf5e98c1c0b/websockets-16.0-cp312-cp312-win_amd64.whl", hash = "sha256:5569417dc80977fc8c2d43a86f78e0a5a22fee17565d78621b6bb264a115d4ea", size = 178693, upload-time = "2026-01-10T09:22:57.478Z" },
+    { url = "https://files.pythonhosted.org/packages/72/07/c98a68571dcf256e74f1f816b8cc5eae6eb2d3d5cfa44d37f801619d9166/websockets-16.0-pp311-pypy311_pp73-macosx_10_15_x86_64.whl", hash = "sha256:349f83cd6c9a415428ee1005cadb5c2c56f4389bc06a9af16103c3bc3dcc8b7d", size = 174947, upload-time = "2026-01-10T09:23:36.166Z" },
+    { url = "https://files.pythonhosted.org/packages/7e/52/93e166a81e0305b33fe416338be92ae863563fe7bce446b0f687b9df5aea/websockets-16.0-pp311-pypy311_pp73-macosx_11_0_arm64.whl", hash = "sha256:4a1aba3340a8dca8db6eb5a7986157f52eb9e436b74813764241981ca4888f03", size = 175260, upload-time = "2026-01-10T09:23:37.409Z" },
+    { url = "https://files.pythonhosted.org/packages/56/0c/2dbf513bafd24889d33de2ff0368190a0e69f37bcfa19009ef819fe4d507/websockets-16.0-pp311-pypy311_pp73-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:f4a32d1bd841d4bcbffdcb3d2ce50c09c3909fbead375ab28d0181af89fd04da", size = 176071, upload-time = "2026-01-10T09:23:39.158Z" },
+    { url = "https://files.pythonhosted.org/packages/a5/8f/aea9c71cc92bf9b6cc0f7f70df8f0b420636b6c96ef4feee1e16f80f75dd/websockets-16.0-pp311-pypy311_pp73-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:0298d07ee155e2e9fda5be8a9042200dd2e3bb0b8a38482156576f863a9d457c", size = 176968, upload-time = "2026-01-10T09:23:41.031Z" },
+    { url = "https://files.pythonhosted.org/packages/9a/3f/f70e03f40ffc9a30d817eef7da1be72ee4956ba8d7255c399a01b135902a/websockets-16.0-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:a653aea902e0324b52f1613332ddf50b00c06fdaf7e92624fbf8c77c78fa5767", size = 178735, upload-time = "2026-01-10T09:23:42.259Z" },
+    { url = "https://files.pythonhosted.org/packages/6f/28/258ebab549c2bf3e64d2b0217b973467394a9cea8c42f70418ca2c5d0d2e/websockets-16.0-py3-none-any.whl", hash = "sha256:1637db62fad1dc833276dded54215f2c7fa46912301a24bd94d45d46a011ceec", size = 171598, upload-time = "2026-01-10T09:23:45.395Z" },
+]
+
 [[package]]
 name = "werkzeug"
 version = "3.1.7"
@@ -5945,16 +6170,16 @@ wheels = [

 [[package]]
 name = "workos"
-version = "6.0.4"
+version = "6.0.8"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
    { name = "cryptography" },
    { name = "httpx" },
    { name = "pyjwt", extra = ["crypto"] },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/3c/2f/99fb8718274116c5c146c745755620fd5c5943f78ca52ca9b17e94348286/workos-6.0.4.tar.gz", hash = "sha256:b0bfe8fd212b8567422c4ea3732eb33608794033eb3a69900c6b04db183c32d6", size = 172217, upload-time = "2026-04-16T03:09:28.583Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/ca/0d/0a7f78912657f99412c788932ea1f3f4089916e77bdef7d2463842febe08/workos-6.0.8.tar.gz", hash = "sha256:43aa3f1992a0a4ca8933d9b6e5ada846dd3b1fe0ee10e64c876ee2000fc6090d", size = 178137, upload-time = "2026-04-24T18:48:03.203Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/fa/f1/d2ab661e6dc2828a4c73e38f12630c3b109cfe2bc664ab70631c04f0db4b/workos-6.0.4-py3-none-any.whl", hash = "sha256:548668b3702673536f853ba72a7b5bbbc269e467aaf9ac4f477b6e0177df5e21", size = 511418, upload-time = "2026-04-16T03:09:27.098Z" },
+    { url = "https://files.pythonhosted.org/packages/b2/3f/3d96da80d650b2f97d58af626053354584f619dbb769051e118bd9cd1ca5/workos-6.0.8-py3-none-any.whl", hash = "sha256:a00dd4930333aded2babbba824f8032eea05c5ca8c44d04a3fa068cf6be6e21a", size = 524505, upload-time = "2026-04-24T18:48:01.389Z" },
 ]

 [[package]]
@@ -3,13 +3,13 @@ title: "Attack Paths"
 description: "Identify privilege escalation chains and security misconfigurations across cloud environments using graph-based analysis."
 ---

-import { VersionBadge } from "/snippets/version-badge.mdx"
+import { VersionBadge } from "/snippets/version-badge.mdx";

 <VersionBadge version="5.17.0" />

 Attack Paths analyzes relationships between cloud resources, permissions, and security findings to detect how privileges can be escalated and how misconfigurations can be exploited by threat actors.

-By mapping these relationships as a graph, Attack Paths reveals risks that individual security checks cannot detect on their own — such as an IAM role that can escalate its own permissions, or a chain of policies that grants unintended access to sensitive resources.
+By mapping these relationships as a graph, Attack Paths reveals risks that individual security checks cannot detect on their own, such as an IAM role that can escalate its own permissions, or a chain of policies that grants unintended access to sensitive resources.

 <Note>
  Attack Paths is currently available for **AWS** providers. Support for
@@ -21,7 +21,7 @@ By mapping these relationships as a graph, Attack Paths reveals risks that indiv
 The following prerequisites are required for Attack Paths:

 - **An AWS provider is configured** with valid credentials in Prowler App. For setup instructions, see [Getting Started with AWS](/user-guide/providers/aws/getting-started-aws).
- **At least one scan has completed** on the configured AWS provider. Attack Paths scans run automatically alongside regular security scans — no separate configuration is required.
+- **At least one scan has completed** on the configured AWS provider. Attack Paths scans run automatically alongside regular security scans; no separate configuration is required.

 ## How Attack Paths Scans Work

@@ -145,11 +145,10 @@ LIMIT 25
 **IAM principals with wildcard Allow statements:**

 ```cypher
-MATCH (principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
-WHERE stmt.effect = 'Allow'
-  AND ANY(action IN stmt.action WHERE action = '*')
-RETURN principal.arn AS principal, policy.arn AS policy,
-       stmt.action AS actions, stmt.resource AS resources
+MATCH (principal:AWSPrincipal)-[:POLICY]->(policy:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {effect: 'Allow'})
+MATCH (stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem)
+WHERE a.value = '*'
+RETURN DISTINCT principal.arn AS principal, policy.arn AS policy
 LIMIT 25
 ```

@@ -173,218 +172,89 @@ RETURN r.name AS role_name, r.arn AS role_arn, p.arn AS trusted_service
 LIMIT 25
 ```

-### Advanced Attack Path Scenarios
+### Working with List-Typed Properties

-The following scenarios show how to compose graph traversals into real attack-path stories. Each query can be pasted directly into the custom query box: the API auto-scopes them to the selected provider and injects tenant/provider isolation, so there is no need to include account identifiers or `$provider_uid` in the text. All queries are openCypher v9 (Neo4j and Neptune compatible).
+Some Cartography node properties carry a list of values, such as `action`, `resource`, `notaction`, and `notresource` on `AWSPolicyStatement` nodes, the algorithms on `KMSKey`, the container-definition lists on `ECSContainerDefinition`, and many others. The Attack Paths graph models each such property as a set of child item nodes connected to the parent by a typed edge. To read the values, traverse the edge; the parent does not carry the list as a single field.

-#### 1. Live attacker on the box that owns the keys
+The naming convention for any list-typed property on a parent label is:

-**Query story:** Finds an internet-exposed EC2 under an active GuardDuty SSH brute-force whose instance role can assume a higher-privileged role that can read a sensitive S3 bucket.
+- **Child label:** `<ParentLabel><PropertyPascal>Item`. Example: `AWSPolicyStatement.resource` resolves to `AWSPolicyStatementResourceItem`.
+- **Edge type:** `HAS_<PROPERTY_UPPER>`. Example: `resource` resolves to `HAS_RESOURCE`.
+- **Child property:** `value` for scalar lists (one string per list element). List-of-dict properties (rare; for example `SecretsManagerSecretVersion.tags`) carry the original dict keys as named fields on the child node.
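
The naming convention above can be sketched as a small helper. This is an illustrative sketch only, not part of Prowler or Cartography: the function name `list_property_names` is hypothetical, and it assumes property names are lowercase or snake_case, with underscores kept in the edge type.

```python
def list_property_names(parent_label: str, prop: str) -> tuple[str, str]:
    """Derive the child label and edge type for a list-typed property.

    Hypothetical helper that mirrors the documented convention:
      child label: <ParentLabel><PropertyPascal>Item
      edge type:   HAS_<PROPERTY_UPPER>
    Assumes `prop` is lowercase or snake_case.
    """
    # PascalCase the property: capitalize each underscore-separated part.
    pascal = "".join(part.capitalize() for part in prop.split("_") if part)
    child_label = f"{parent_label}{pascal}Item"
    edge_type = f"HAS_{prop.upper()}"
    return child_label, edge_type


if __name__ == "__main__":
    # Examples taken from the convention described above.
    print(list_property_names("AWSPolicyStatement", "resource"))
    print(list_property_names("AWSPolicyStatement", "action"))
```

A helper like this may be handy when writing queries by hand, since the label and edge names for a given list property can be derived rather than looked up.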
+
+To express "at least one item in the list satisfies a predicate", traverse the `HAS_*` edge in its own `MATCH` clause and apply the predicate in the attached `WHERE`. `RETURN DISTINCT` collapses duplicate parent rows produced when multiple child items satisfy the filter:

 ```cypher
-MATCH path_ec2 = (acct:AWSAccount)--(ec2:EC2Instance)
-WHERE ec2.exposed_internet = true
-MATCH p0 = (gd:GuardDutyFinding)-[:AFFECTS]->(ec2)
-MATCH p1 = (ec2)-[:INSTANCE_PROFILE]->(prof:AWSInstanceProfile)-[:ASSOCIATED_WITH]->(low:AWSRole)
-MATCH p2 = (low)-[:STS_ASSUMEROLE_ALLOW]-(high:AWSRole)
-MATCH p3 = (high)--(pol:AWSPolicy)--(stmt:AWSPolicyStatement)
-OPTIONAL MATCH path_net = (internet:Internet)-[:CAN_ACCESS]->(ec2)
-MATCH path_s3 = (acct)--(s3:S3Bucket)
-WHERE high <> low
-  AND stmt.effect = 'Allow'
-  AND size([a IN stmt.action WHERE
-        toLower(a) STARTS WITH 's3:getobject'
-        OR toLower(a) STARTS WITH 's3:listbucket'
-        OR toLower(a) IN ['s3:*']
-      ]) > 0
-  AND size([r IN stmt.resource WHERE
-        r CONTAINS s3.name
-      ]) > 0
-RETURN path_net, path_ec2, p0, p1, p2, p3, path_s3
-```
-
-**How it's built:**
-
- `path_ec2` anchors the graph on the account node and its internet-exposed EC2 instance, via a real account-to-resource edge. This is the visible spine that keeps everything connected.
- `p0` ties a `GuardDutyFinding` to that instance through the `AFFECTS` edge (the live SSH brute-force alert).
- `p1` walks the real graph edges from the instance to its instance profile to the role it runs as.
- `p2` follows the `STS_ASSUMEROLE_ALLOW` edge to the higher-privileged role the low role can assume. It is undirected so it works regardless of how the assume edge was ingested. `high <> low` stops a role matching itself.
- `p3` walks that role into its policy and policy statement.
- `path_net` is the optional `Internet -[:CAN_ACCESS]-> instance` edge. It makes "from the internet" literal on screen. Optional so a missing `Internet` node never breaks the query live.
- `path_s3` connects the sensitive bucket to the same account node, so it draws connected instead of floating. There is no physical edge from a role to a bucket; the grant is logical, enforced in the `WHERE`: the statement must allow an S3 read action (list comprehension over the `action` array) and its resource must cover the bucket (`CONTAINS s3.name`). The account is the shared hub; the bucket hanging off it next to the role chain is the teaching moment — the access exists only in IAM.
-
-#### 2. Who can read the crown jewels
-
-**Query story:** The sensitive bucket from the previous scenario seen from the data side: every role whose IAM policy can read it, regardless of how the role is reached.
-
-```cypher
-MATCH (s3:S3Bucket)
-WHERE toLower(s3.name) CONTAINS 'sensitive'
-MATCH (role:AWSRole)--(pol:AWSPolicy)--(stmt:AWSPolicyStatement)
-WHERE stmt.effect = 'Allow'
-  AND size([a IN stmt.action WHERE
-        toLower(a) STARTS WITH 's3:get'
-        OR toLower(a) STARTS WITH 's3:list'
-        OR toLower(a) IN ['s3:*']
-      ]) > 0
-  AND size([r IN stmt.resource WHERE
-        r CONTAINS s3.name
-      ]) > 0
-WITH DISTINCT s3, role
+MATCH (stmt:AWSPolicyStatement {effect: 'Allow'})
+MATCH (stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem)
+WHERE toLower(a.value) STARTS WITH 's3:get'
+   OR toLower(a.value) STARTS WITH 's3:list'
+RETURN DISTINCT stmt
 LIMIT 25
-MATCH path_s3 = (acct:AWSAccount)--(s3)
-MATCH path_role = (acct)--(role)
-RETURN path_s3, path_role
 ```

-**How it's built:** data-centric, not attacker-centric — the same bucket the previous kill chain exfiltrates, approached from the other direction.
-
- The `S3Bucket` is bound first by name (one node), so everything else filters against it.
- `(role:AWSRole)--(pol:AWSPolicy)--(stmt:AWSPolicyStatement)` reaches statements only *through a role*, never via a global statement scan. A blanket `AWSPolicyStatement` scan also hits resource-policy statements whose shape differs and makes the list comprehension fail outright.
- The `WHERE` filters in place: an S3 read action plus a resource that names that bucket.
- `WITH DISTINCT s3, role LIMIT 25` collapses undirected-traversal duplicates and hard-caps the result.
- `path_s3` and `path_role` attach the account hubs only after the cap, against at most 25 rows, so the bucket and role(s) draw connected through the account instead of floating.
- No internet or EC2 here; this answers "who has the keys" instead of "how would an attacker get in."
-
-#### 3. Lateral reach from an internet-exposed instance
-
-**Query story:** The wide-angle view of the live-attacker scenario: every internet-exposed EC2, the role it runs as, and every role that role can assume. The first scenario is one specific exfiltration path inside this reach, under live attack.
+To check whether every item in the list satisfies a predicate, count the counter-examples and require zero, and add a guard that requires at least one attached item so that statements with no items do not match vacuously. This is the one case where the pattern-comprehension form is the right tool:

 ```cypher
-MATCH path_ec2 = (acct:AWSAccount)--(ec2:EC2Instance)
-WHERE ec2.exposed_internet = true
-MATCH p1 = (ec2)-[:INSTANCE_PROFILE]->(prof:AWSInstanceProfile)-[:ASSOCIATED_WITH]->(low:AWSRole)
-MATCH p2 = (low)-[:STS_ASSUMEROLE_ALLOW]-(high:AWSRole)
-OPTIONAL MATCH path_net = (internet:Internet)-[:CAN_ACCESS]->(ec2)
-WHERE high <> low
-RETURN path_net, path_ec2, p1, p2
+MATCH (stmt:AWSPolicyStatement)
+WHERE size([
+    (stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem)
+    WHERE NOT toLower(a.value) STARTS WITH 's3:'
+    | a
+  ]) = 0
+  AND size([(stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem) | a]) > 0
+RETURN stmt
+LIMIT 25
 ```

-**How it's built:** widens the lens instead of filtering down. It stops at the assume-role hop and shows every role reachable from any internet-exposed instance, without filtering down to a specific S3 leg.
-
- `path_ec2` is the account-to-instance spine.
- `p1` walks to the instance role.
- `p2` fans out to every role that role can assume.
- `path_net` adds the optional `Internet -[:CAN_ACCESS]->` edge.
- The first scenario is the specific exfiltration path under live attack; this is the broader privilege reach an attacker inherits the moment they land on the box.
-
-#### 4. Role-chain privilege escalation
-
-**Query story:** A pure-IAM escalation, no compromised instance: a role that can assume a second role whose policy lets it assume a third, admin-level role.
+To test list items against a value taken from another node, such as "does any resource pattern in this policy match a target role ARN", add the `HAS_*` traversal as its own `MATCH` and compare each item value with the other node's properties in `WHERE`. The example checks the substring relationship in both directions (`r.value CONTAINS role.name` and `role.arn CONTAINS r.value`) and also accepts the bare `*` wildcard:

 ```cypher
-MATCH path_root = (acct:AWSAccount)--(r1:AWSRole)
-MATCH p1 = (r1)-[:STS_ASSUMEROLE_ALLOW]-(r2:AWSRole)
-MATCH p2 = (r2)--(pol:AWSPolicy)--(stmt:AWSPolicyStatement)
-MATCH path_admin = (acct)--(admin:AWSRole)
-WHERE r1 <> r2 AND r1 <> admin AND r2 <> admin
-  AND stmt.effect = 'Allow'
-  AND size([a IN stmt.action WHERE
-        toLower(a) IN ['sts:*', 'sts:assumerole']
-      ]) > 0
-  AND size([res IN stmt.resource WHERE
-        res CONTAINS admin.name
-      ]) > 0
-RETURN path_root, p1, p2, path_admin
+MATCH (role:AWSRole)
+WHERE role.name = 'Admin'
+MATCH (principal:AWSPrincipal)-[:POLICY]->(:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {effect: 'Allow'})
+MATCH (stmt)-[:HAS_RESOURCE]->(r:AWSPolicyStatementResourceItem)
+WHERE r.value = '*'
+   OR r.value CONTAINS role.name
+   OR role.arn CONTAINS r.value
+RETURN DISTINCT principal.arn AS principal, stmt, role
+LIMIT 25
 ```

-**How it's built:**
-
- `path_root` anchors role 1 to the account node, the spine that keeps the picture connected.
- `p1` is the one real assume edge in the chain (role 1 to role 2).
- `p2` walks role 2 into its policy and statement.
- `path_admin` connects the target admin role to the same account node so it draws connected. The third hop is not a graph edge: it exists only as `sts:AssumeRole` on that role's ARN inside the statement. The query proves it the same way the first scenario proves S3 access — the statement action must include an assume-role action and its resource list must reference the admin role's name.
- The three `<>` guards stop a role matching itself at any position.
-
-#### 5. External identity trust map
-
-**Query story:** Finds external identity providers (SSO, GitHub, GitLab, Terraform Cloud) and the AWS roles they are trusted to assume.
+To return the list of values directly, collect them from the child items:

 ```cypher
-MATCH p = (role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]-(idp:AWSPrincipal)
-WHERE idp.arn CONTAINS 'saml-provider'
-   OR idp.arn CONTAINS 'oidc-provider'
-MATCH path_role = (acct:AWSAccount)--(role)
-RETURN p, path_role
+MATCH (stmt:AWSPolicyStatement {effect: 'Allow'})
+OPTIONAL MATCH (stmt)-[:HAS_ACTION]->(a:AWSPolicyStatementActionItem)
+RETURN stmt, collect(a.value) AS actions
+LIMIT 25
 ```

-**How it's built:** federated principals are stored as `AWSPrincipal` nodes whose ARN contains `saml-provider` (SSO) or `oidc-provider` (GitHub, GitLab, Terraform Cloud).
+### Working with JSON-Encoded Properties

- `p` matches the trust edge undirected. It is written `(AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(AWSPrincipal)`, role to principal, so a directed `principal -> role` match returns nothing; undirected matches regardless of ingest direction.
- The `WHERE` keeps only SAML or OIDC providers, drawing a fan-out from each external identity provider to every role it can assume (including reserved SSO admin roles).
- `path_role` ties every trusted role to the account node so the provider stars share one spine instead of drawing as separate islands.
+Some Cartography properties represent nested objects, most notably `condition` on `AWSPolicyStatement` and `S3PolicyStatement` nodes. In the Attack Paths graph, object-typed properties are stored as JSON-encoded strings to keep the schema portable across graph backends. The value looks like:

-#### 6. Federated SSO roles flagged as admin or privesc
+```text
+'{"StringEquals":{"aws:SourceAccount":"123456789012"}}'
+```

-**Query story:** The dangerous subset of the trust map above — externally-federated SSO roles that Prowler also flags for AdministratorAccess or privilege escalation.
+There is no JSON parser available at query time, so use `CONTAINS` for substring checks against keys or known values:

 ```cypher
-MATCH (idp:AWSPrincipal)-[:TRUSTS_AWS_PRINCIPAL]-(role:AWSRole)
-WHERE idp.arn CONTAINS 'saml-provider'
-   OR idp.arn CONTAINS 'oidc-provider'
-MATCH (role)-[:HAS_FINDING]-(pf:ProwlerFinding)
-WHERE pf.status = 'FAIL'
-  AND pf.check_id IN [
-    'iam_inline_policy_allows_privilege_escalation',
-    'iam_role_administratoraccess_policy',
-    'iam_inline_policy_no_administrative_privileges',
-    'iam_user_administrator_access_policy'
-  ]
-WITH DISTINCT idp, role, pf
-LIMIT 60
-MATCH path_root = (acct:AWSAccount)--(role)
-MATCH p_trust = (idp)-[:TRUSTS_AWS_PRINCIPAL]-(role)
-MATCH p_find = (role)-[:HAS_FINDING]-(pf)
-RETURN path_root, p_trust, p_find
+MATCH (stmt:AWSPolicyStatement)
+WHERE stmt.effect = 'Allow'
+  AND stmt.condition CONTAINS '"aws:SourceAccount"'
+RETURN stmt
+LIMIT 25
 ```

-**How it's built:** a plain "list every flagged identity" query is a wide fan that draws as a column, and `ProwlerFinding` nodes accumulate across scans with no scan filter available in custom queries.
-
- The first MATCH plus `WHERE` keeps only roles trusted by a SAML or OIDC provider (trust edge undirected, so direction does not matter).
- The second MATCH plus `check_id IN [...]` keeps only those carrying one of the four privilege-escalation or admin checks.
- `WITH DISTINCT ... LIMIT 60` collapses duplicate finding nodes and hard-caps the result.
- `p_trust`, `p_find`, and `path_root` draw it connected three ways: provider to role through the trust edge, role to its finding, and role to the account.
- The previous scenario shows who can walk in; this shows which of those roles Prowler already flags as over-privileged.
-
-#### 7. World-readable S3 buckets
-
-**Query story:** Unlike the IAM-gated sensitive bucket in scenarios 1 and 2, these buckets are open to anyone on the internet with no credentials at all.
-
-```cypher
-MATCH path_s3 = (acct:AWSAccount)--(s3:S3Bucket)
-WHERE s3.anonymous_access = true
-OPTIONAL MATCH p = (s3)--(stmt:S3PolicyStatement)
-RETURN path_s3, p
-```
-
-**How it's built:** the counterpoint to scenarios 1 and 2 — there the sensitive bucket is reachable only through an IAM role chain; here the bucket needs no role at all.
-
- `path_s3` connects each public bucket to its account node so they draw connected. Cartography sets `anonymous_access = true` when a bucket's policy or ACL allows public access.
- `p` is an optional match that pulls in the `S3PolicyStatement` granting the access where one exists, so the public grant is visible next to the bucket. Buckets that are public via ACL only still show, connected to the account.
-
-#### 8. Internet exposure surface
-
-**Query story:** The raw external attack surface behind scenarios 1 and 3: every internet-exposed EC2 instance with its security groups and the exact inbound ports left open.
-
-```cypher
-MATCH path_ec2 = (acct:AWSAccount)--(ec2:EC2Instance)
-WHERE ec2.exposed_internet = true
-MATCH p1 = (ec2)--(sg:EC2SecurityGroup)--(rule:IpPermissionInbound)
-OPTIONAL MATCH path_net = (internet:Internet)-[:CAN_ACCESS]->(ec2)
-OPTIONAL MATCH p2 = (ec2)-[:INSTANCE_PROFILE]->(:AWSInstanceProfile)-[:ASSOCIATED_WITH]->(:AWSRole)
-RETURN path_net, path_ec2, p1, p2
-```
-
-**How it's built:** `exposed_internet = true` is Cartography's computed reachability flag.
-
- `path_ec2` hubs all exposed instances on the account node so they draw as one picture.
- `p1` joins each instance to its security groups and inbound rules so the open ports are on screen.
- `path_net` adds the optional `Internet -[:CAN_ACCESS]->` edge so the external reachability is explicit.
- `p2` optionally adds the instance role, which connects this surface view back to the kill chains in scenarios 1 and 3.
+When a query needs to inspect the structured members of a condition (for example, evaluate every operator and key), fetch the rows first and parse the JSON in application code. At query time the value is an opaque string, so Cypher cannot navigate its object keys or values.

 ### Tips for Writing Queries

 - Start small with `LIMIT` to inspect the shape of the data before broadening the pattern.
+- Traverse `HAS_*` edges to reach list-typed property values (for example `action`, `resource`). The parent node does not carry the list as a single field; see [Working with List-Typed Properties](#working-with-list-typed-properties) for the patterns.
+- On large scans, avoid broad disconnected patterns such as `MATCH (a:Label), (b:OtherLabel)`. Bind one side with a selective predicate first, and use `WITH DISTINCT` between expanding traversals when duplicates are possible.
 - Use `RETURN` projections (`RETURN n.name, n.region`) instead of returning whole nodes to keep responses compact.
 - Combine resource nodes with `ProwlerFinding` nodes via `HAS_FINDING` to correlate misconfigurations with the affected resources.
 - When a query times out or returns no rows, simplify the pattern step by step until the first variant runs successfully, then add constraints back.
@@ -401,6 +271,8 @@ In addition to the upstream schema, Prowler enriches the graph with:

 - **`ProwlerFinding`** nodes representing Prowler check results, linked to affected resources via `HAS_FINDING` relationships.
 - **`Internet`** nodes used to model exposure paths from the public internet to internal resources.
+- **List-typed properties** such as `action` or `resource` on `AWSPolicyStatement`, the algorithm lists on `KMSKey`, and similar lists on other node types are modeled as child item nodes linked by typed `HAS_*` edges. See [Working with List-Typed Properties](#working-with-list-typed-properties) for the read pattern.
+- **Object-typed properties** such as `condition` on `AWSPolicyStatement` are stored as JSON-encoded strings. See [Working with JSON-Encoded Properties](#working-with-json-encoded-properties) for the read pattern.

 <Note>
  AI assistants connected through Prowler MCP Server can fetch the exact
@@ -539,105 +411,106 @@ Attack Paths currently supports the following built-in queries for AWS:

 #### Custom Attack Path Queries

-| Query | Description |
-|---|---|
+| Query                                             | Description                                                                              |
+| ------------------------------------------------- | ---------------------------------------------------------------------------------------- |
 | **Internet-Exposed EC2 with Sensitive S3 Access** | Find SSH-exposed EC2 instances that can assume roles to read tagged sensitive S3 buckets |

 #### Basic Resource Queries

-| Query | Description |
-|---|---|
-| **RDS Instances Inventory** | List all provisioned RDS database instances in the account |
-| **Unencrypted RDS Instances** | Find RDS instances with storage encryption disabled |
-| **S3 Buckets with Anonymous Access** | Find S3 buckets that allow anonymous access |
-| **IAM Statements Allowing All Actions** | Find IAM policy statements that allow all actions via wildcard (\*) |
-| **IAM Statements Allowing Policy Deletion** | Find IAM policy statements that allow iam:DeletePolicy |
-| **IAM Statements Allowing Create Actions** | Find IAM policy statements that allow any create action |
+| Query                                       | Description                                                         |
+| ------------------------------------------- | ------------------------------------------------------------------- |
+| **RDS Instances Inventory**                 | List all provisioned RDS database instances in the account          |
+| **Unencrypted RDS Instances**               | Find RDS instances with storage encryption disabled                 |
+| **S3 Buckets with Anonymous Access**        | Find S3 buckets that allow anonymous access                         |
+| **IAM Statements Allowing All Actions**     | Find IAM policy statements that allow all actions via wildcard (\*) |
+| **IAM Statements Allowing Policy Deletion** | Find IAM policy statements that allow iam:DeletePolicy              |
+| **IAM Statements Allowing Create Actions**  | Find IAM policy statements that allow any create action             |

 #### Network Exposure Queries

-| Query | Description |
-|---|---|
-| **Internet-Exposed EC2 Instances** | Find EC2 instances flagged as exposed to the internet |
+| Query                                                 | Description                                                                         |
+| ----------------------------------------------------- | ----------------------------------------------------------------------------------- |
+| **Internet-Exposed EC2 Instances**                    | Find EC2 instances flagged as exposed to the internet                               |
 | **Open Security Groups on Internet-Facing Resources** | Find internet-facing resources with security groups allowing inbound from 0.0.0.0/0 |
-| **Internet-Exposed Classic Load Balancers** | Find Classic Load Balancers exposed to the internet with their listeners |
-| **Internet-Exposed ALB/NLB Load Balancers** | Find ELBv2 (ALB/NLB) load balancers exposed to the internet with their listeners |
-| **Resource Lookup by Public IP** | Find the AWS resource associated with a given public IP address |
+| **Internet-Exposed Classic Load Balancers**           | Find Classic Load Balancers exposed to the internet with their listeners            |
+| **Internet-Exposed ALB/NLB Load Balancers**           | Find ELBv2 (ALB/NLB) load balancers exposed to the internet with their listeners    |
+| **Resource Lookup by Public IP**                      | Find the AWS resource associated with a given public IP address                     |

 #### Privilege Escalation Queries

 These queries are based on research from [pathfinding.cloud](https://pathfinding.cloud) by Datadog.

-| Query | Description |
-|---|---|
-| **App Runner Service Creation with Privileged Role (APPRUNNER-001)** | Create an App Runner service with a privileged IAM role to gain its permissions |
-| **App Runner Service Update for Role Access (APPRUNNER-002)** | Update an existing App Runner service to leverage its already-attached privileged role |
-| **Bedrock Code Interpreter with Privileged Role (BEDROCK-001)** | Create a Bedrock AgentCore Code Interpreter with a privileged role attached |
-| **Bedrock Code Interpreter Session Hijacking (BEDROCK-002)** | Start a session on an existing Bedrock code interpreter to exfiltrate its privileged role credentials |
-| **CloudFormation Stack Creation with Privileged Role (CLOUDFORMATION-001)** | Create a CloudFormation stack with a privileged role to provision arbitrary AWS resources |
-| **CloudFormation Stack Update for Role Access (CLOUDFORMATION-002)** | Update an existing CloudFormation stack to leverage its already-attached privileged service role |
-| **CloudFormation StackSet Creation with Privileged Role (CLOUDFORMATION-003)** | Create a CloudFormation StackSet with a privileged execution role to provision arbitrary resources across accounts |
-| **CloudFormation StackSet Update with Privileged Role (CLOUDFORMATION-004)** | Update an existing CloudFormation StackSet to inject malicious resources using a privileged execution role |
-| **CloudFormation Change Set Privilege Escalation (CLOUDFORMATION-005)** | Create and execute a change set on an existing stack to leverage its privileged service role |
-| **CodeBuild Project Creation with Privileged Role (CODEBUILD-001)** | Create a CodeBuild project with a privileged role to execute arbitrary code via a malicious buildspec |
-| **CodeBuild Buildspec Override for Role Access (CODEBUILD-002)** | Start a build on an existing CodeBuild project with a buildspec override to execute code with its privileged role |
-| **CodeBuild Batch Buildspec Override for Role Access (CODEBUILD-003)** | Start a batch build on an existing CodeBuild project with a buildspec override to execute code with its privileged role |
-| **CodeBuild Batch Project Creation with Privileged Role (CODEBUILD-004)** | Create a CodeBuild project configured for batch builds with a privileged role to execute arbitrary code via a malicious buildspec |
-| **Data Pipeline Creation with Privileged Role (DATAPIPELINE-001)** | Create a Data Pipeline with a privileged role to execute arbitrary commands on provisioned infrastructure |
-| **EC2 Instance Launch with Privileged Role (EC2-001)** | Launch EC2 instances with privileged IAM roles to gain their permissions via IMDS |
-| **EC2 Role Hijacking via UserData Injection (EC2-002)** | Inject malicious scripts into EC2 instance userData to gain the attached role's permissions |
-| **Spot Instance Launch with Privileged Role (EC2-003)** | Launch EC2 Spot Instances with privileged IAM roles to gain their permissions via IMDS |
-| **Launch Template Poisoning for Role Access (EC2-004)** | Inject malicious userData into launch templates that reference privileged roles, no PassRole needed |
-| **EC2 Instance Connect SSH Access for Role Credentials (EC2INSTANCECONNECT-003)** | Push a temporary SSH key to an EC2 instance via Instance Connect to access its attached role credentials through IMDS |
-| **ECS Service Creation with Privileged Role (ECS-001 - New Cluster)** | Create an ECS cluster and service with a privileged Fargate task role to execute arbitrary code |
-| **ECS Task Execution with Privileged Role (ECS-002 - New Cluster)** | Create an ECS cluster and run a one-off Fargate task with a privileged role to execute arbitrary code |
-| **ECS Service Creation with Privileged Role (ECS-003 - Existing Cluster)** | Deploy a Fargate service with a privileged role on an existing ECS cluster |
-| **ECS Task Execution with Privileged Role (ECS-004 - Existing Cluster)** | Run a one-off Fargate task with a privileged role on an existing ECS cluster |
-| **ECS Task Start with Privileged Role on EC2 (ECS-005 - Existing Cluster)** | Register a task definition with a privileged role and start it on an EC2 container instance to execute arbitrary code |
-| **ECS Exec Container Hijacking for Role Credentials (ECS-006)** | Shell into a running ECS container via ECS Exec to steal the attached task role's credentials |
-| **Glue Dev Endpoint with Privileged Role (GLUE-001)** | Create a Glue development endpoint with a privileged role attached to gain its permissions |
-| **Glue Dev Endpoint SSH Hijacking via Update (GLUE-002)** | Update an existing Glue development endpoint to inject an SSH public key and access its attached role credentials |
-| **Glue Job Creation with Privileged Role (GLUE-003)** | Create a Glue job with a privileged role and start it to execute arbitrary code with that role's permissions |
-| **Glue Job Creation with Scheduled Trigger and Privileged Role (GLUE-004)** | Create a Glue job with a privileged role and a scheduled trigger to persistently execute arbitrary code |
-| **Glue Job Hijacking via Update with Privileged Role (GLUE-005)** | Update an existing Glue job to attach a privileged role and inject malicious code, then start it to gain that role's permissions |
-| **Glue Job Hijacking with Scheduled Trigger and Privileged Role (GLUE-006)** | Update an existing Glue job to attach a privileged role and inject malicious code, then create a scheduled trigger for persistent automated execution |
-| **Policy Version Override for Self-Escalation (IAM-001)** | Create a new version of an attached policy with administrative permissions, instantly escalating the principal's own privileges |
-| **Access Key Creation for Lateral Movement (IAM-002)** | Create access keys for other IAM users to gain their permissions and move laterally across the account |
-| **Access Key Rotation Attack for Lateral Movement (IAM-003)** | Delete and recreate access keys for other IAM users to bypass the two-key limit and gain their permissions |
-| **Console Login Profile Creation for Lateral Movement (IAM-004)** | Create console login profiles for other IAM users to access the AWS Console with their permissions |
-| **Inline Policy Injection for Self-Escalation (IAM-005)** | Attach an inline policy with administrative permissions to your own role, instantly escalating privileges |
-| **Console Password Override for Lateral Movement (IAM-006)** | Change the console password of other IAM users to log in as them and gain their permissions |
-| **Inline Policy Injection on User for Self-Escalation (IAM-007)** | Attach an inline policy with administrative permissions to your own IAM user, instantly escalating privileges |
-| **Managed Policy Attachment on User for Self-Escalation (IAM-008)** | Attach existing managed policies with administrative permissions to your own IAM user, instantly escalating privileges |
-| **Managed Policy Attachment on Role for Self-Escalation (IAM-009)** | Attach existing managed policies with administrative permissions to your own IAM role, instantly escalating privileges |
-| **Managed Policy Attachment on Group for Self-Escalation (IAM-010)** | Attach existing managed policies with administrative permissions to a group you belong to, escalating privileges for all group members |
-| **Inline Policy Injection on Group for Self-Escalation (IAM-011)** | Attach an inline policy with administrative permissions to a group you belong to, escalating privileges for all group members |
-| **Trust Policy Hijacking for Role Assumption (IAM-012)** | Modify a role's trust policy to allow yourself to assume it, gaining the role's permissions |
-| **Group Membership Hijacking for Privilege Escalation (IAM-013)** | Add yourself to a privileged IAM group to inherit its permissions, gaining access to all policies attached to the group |
-| **Managed Policy Attachment with Role Assumption for Lateral Movement (IAM-014)** | Attach administrative managed policies to another role you can assume, then assume it to gain elevated privileges |
-| **Managed Policy Attachment with Access Key Creation for Lateral Movement (IAM-015)** | Attach administrative managed policies to another IAM user and create access keys for them to gain programmatic access with elevated privileges |
-| **Policy Version Override with Role Assumption for Lateral Movement (IAM-016)** | Create a new version of a customer-managed policy attached to another role with administrative permissions, then assume that role to gain elevated access |
-| **Inline Policy Injection with Role Assumption for Lateral Movement (IAM-017)** | Attach an inline policy with administrative permissions to another role you can assume, then assume it to gain elevated privileges |
-| **Inline Policy Injection with Access Key Creation for Lateral Movement (IAM-018)** | Attach an inline policy with administrative permissions to another IAM user and create access keys for them to gain programmatic access with elevated privileges |
-| **Managed Policy Attachment with Trust Policy Hijacking for Privilege Escalation (IAM-019)** | Attach administrative managed policies to a role and modify its trust policy to allow yourself to assume it, gaining elevated privileges without prior assume-role access |
-| **Policy Version Override with Trust Policy Hijacking for Privilege Escalation (IAM-020)** | Create a new version of a customer-managed policy attached to a role with administrative permissions and modify its trust policy to assume it, without prior assume-role access |
-| **Inline Policy Injection with Trust Policy Hijacking for Privilege Escalation (IAM-021)** | Add an inline policy with administrative permissions to a role and modify its trust policy to allow yourself to assume it, gaining elevated privileges without prior assume-role access |
-| **Lambda Function Creation with Privileged Role (LAMBDA-001)** | Create a Lambda function with a privileged IAM role and invoke it to execute code with that role's permissions |
-| **Lambda Function Creation with Event Source Trigger (LAMBDA-002)** | Create a Lambda function with a privileged IAM role and an event source mapping to trigger it automatically, executing code with the role's permissions |
-| **Lambda Function Code Injection (LAMBDA-003)** | Modify the code of an existing Lambda function to execute arbitrary commands with the function's execution role permissions |
-| **Lambda Function Code Injection with Direct Invocation (LAMBDA-004)** | Modify the code of an existing Lambda function and invoke it directly to execute arbitrary commands with the function's execution role permissions |
-| **Lambda Function Code Injection with Resource Policy Grant (LAMBDA-005)** | Modify the code of an existing Lambda function and grant yourself invocation permission via its resource-based policy to execute code with the function's execution role |
-| **Lambda Function Creation with Resource Policy Invocation (LAMBDA-006)** | Create a Lambda function with a privileged IAM role and grant yourself invocation permission via its resource-based policy to execute code with the role's permissions |
-| **SageMaker Notebook Creation with Privileged Role (SAGEMAKER-001)** | Create a SageMaker notebook instance with a privileged IAM role to execute arbitrary code with the role's permissions via the Jupyter environment |
-| **SageMaker Training Job Creation with Privileged Role (SAGEMAKER-002)** | Create a SageMaker training job with a privileged IAM role to execute arbitrary container code with the role's permissions |
-| **SageMaker Processing Job Creation with Privileged Role (SAGEMAKER-003)** | Create a SageMaker processing job with a privileged IAM role to execute arbitrary container code with the role's permissions |
-| **SageMaker Presigned Notebook URL for Privilege Escalation (SAGEMAKER-004)** | Generate a presigned URL to access an existing SageMaker notebook instance and execute code with its execution role's permissions |
-| **SageMaker Notebook Lifecycle Config Injection (SAGEMAKER-005)** | Inject a malicious lifecycle configuration into an existing SageMaker notebook to execute code with the notebook's execution role during startup |
-| **SSM Session Access for EC2 Role Credentials (SSM-001)** | Start an SSM session on an EC2 instance to access its attached role credentials through IMDS |
-| **SSM Send Command for EC2 Role Credentials (SSM-002)** | Execute commands on an EC2 instance via SSM Run Command to access its attached role credentials through IMDS |
-| **Role Assumption for Privilege Escalation (STS-001)** | Assume IAM roles with elevated permissions by exploiting bidirectional trust between the starting principal and the target role |
+| Query                                                                                        | Description                                                                                                                                                                             |
+| -------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **App Runner Service Creation with Privileged Role (APPRUNNER-001)**                         | Create an App Runner service with a privileged IAM role to gain its permissions                                                                                                         |
+| **App Runner Service Update for Role Access (APPRUNNER-002)**                                | Update an existing App Runner service to leverage its already-attached privileged role                                                                                                  |
+| **Bedrock Code Interpreter with Privileged Role (BEDROCK-001)**                              | Create a Bedrock AgentCore Code Interpreter with a privileged role attached                                                                                                             |
+| **Bedrock Code Interpreter Session Hijacking (BEDROCK-002)**                                 | Start a session on an existing Bedrock code interpreter to exfiltrate its privileged role credentials                                                                                   |
+| **CloudFormation Stack Creation with Privileged Role (CLOUDFORMATION-001)**                  | Create a CloudFormation stack with a privileged role to provision arbitrary AWS resources                                                                                               |
+| **CloudFormation Stack Update for Role Access (CLOUDFORMATION-002)**                         | Update an existing CloudFormation stack to leverage its already-attached privileged service role                                                                                        |
+| **CloudFormation StackSet Creation with Privileged Role (CLOUDFORMATION-003)**               | Create a CloudFormation StackSet with a privileged execution role to provision arbitrary resources across accounts                                                                      |
+| **CloudFormation StackSet Update with Privileged Role (CLOUDFORMATION-004)**                 | Update an existing CloudFormation StackSet to inject malicious resources using a privileged execution role                                                                              |
+| **CloudFormation Change Set Privilege Escalation (CLOUDFORMATION-005)**                      | Create and execute a change set on an existing stack to leverage its privileged service role                                                                                            |
+| **CodeBuild Project Creation with Privileged Role (CODEBUILD-001)**                          | Create a CodeBuild project with a privileged role to execute arbitrary code via a malicious buildspec                                                                                   |
+| **CodeBuild Buildspec Override for Role Access (CODEBUILD-002)**                             | Start a build on an existing CodeBuild project with a buildspec override to execute code with its privileged role                                                                       |
+| **CodeBuild Batch Buildspec Override for Role Access (CODEBUILD-003)**                       | Start a batch build on an existing CodeBuild project with a buildspec override to execute code with its privileged role                                                                 |
+| **CodeBuild Batch Project Creation with Privileged Role (CODEBUILD-004)**                    | Create a CodeBuild project configured for batch builds with a privileged role to execute arbitrary code via a malicious buildspec                                                       |
+| **Data Pipeline Creation with Privileged Role (DATAPIPELINE-001)**                           | Create a Data Pipeline with a privileged role to execute arbitrary commands on provisioned infrastructure                                                                               |
+| **EC2 Instance Launch with Privileged Role (EC2-001)**                                       | Launch EC2 instances with privileged IAM roles to gain their permissions via IMDS                                                                                                       |
+| **EC2 Role Hijacking via UserData Injection (EC2-002)**                                      | Inject malicious scripts into EC2 instance userData to gain the attached role's permissions                                                                                             |
+| **Spot Instance Launch with Privileged Role (EC2-003)**                                      | Launch EC2 Spot Instances with privileged IAM roles to gain their permissions via IMDS                                                                                                  |
+| **Launch Template Poisoning for Role Access (EC2-004)**                                      | Inject malicious userData into launch templates that reference privileged roles, no PassRole needed                                                                                     |
+| **EC2 Instance Connect SSH Access for Role Credentials (EC2INSTANCECONNECT-003)**            | Push a temporary SSH key to an EC2 instance via Instance Connect to access its attached role credentials through IMDS                                                                   |
+| **ECS Service Creation with Privileged Role (ECS-001 - New Cluster)**                        | Create an ECS cluster and service with a privileged Fargate task role to execute arbitrary code                                                                                         |
+| **ECS Task Execution with Privileged Role (ECS-002 - New Cluster)**                          | Create an ECS cluster and run a one-off Fargate task with a privileged role to execute arbitrary code                                                                                   |
+| **ECS Service Creation with Privileged Role (ECS-003 - Existing Cluster)**                   | Deploy a Fargate service with a privileged role on an existing ECS cluster                                                                                                              |
+| **ECS Task Execution with Privileged Role (ECS-004 - Existing Cluster)**                     | Run a one-off Fargate task with a privileged role on an existing ECS cluster                                                                                                            |
+| **ECS Task Start with Privileged Role on EC2 (ECS-005 - Existing Cluster)**                  | Register a task definition with a privileged role and start it on an EC2 container instance to execute arbitrary code                                                                   |
+| **ECS Exec Container Hijacking for Role Credentials (ECS-006)**                              | Shell into a running ECS container via ECS Exec to steal the attached task role's credentials                                                                                           |
+| **Glue Dev Endpoint with Privileged Role (GLUE-001)**                                        | Create a Glue development endpoint with a privileged role attached to gain its permissions                                                                                              |
+| **Glue Dev Endpoint SSH Hijacking via Update (GLUE-002)**                                    | Update an existing Glue development endpoint to inject an SSH public key and access its attached role credentials                                                                       |
+| **Glue Job Creation with Privileged Role (GLUE-003)**                                        | Create a Glue job with a privileged role and start it to execute arbitrary code with that role's permissions                                                                            |
+| **Glue Job Creation with Scheduled Trigger and Privileged Role (GLUE-004)**                  | Create a Glue job with a privileged role and a scheduled trigger to persistently execute arbitrary code                                                                                 |
+| **Glue Job Hijacking via Update with Privileged Role (GLUE-005)**                            | Update an existing Glue job to attach a privileged role and inject malicious code, then start it to gain that role's permissions                                                        |
+| **Glue Job Hijacking with Scheduled Trigger and Privileged Role (GLUE-006)**                 | Update an existing Glue job to attach a privileged role and inject malicious code, then create a scheduled trigger for persistent automated execution                                   |
+| **Policy Version Override for Self-Escalation (IAM-001)**                                    | Create a new version of an attached policy with administrative permissions, instantly escalating the principal's own privileges                                                         |
+| **Access Key Creation for Lateral Movement (IAM-002)**                                       | Create access keys for other IAM users to gain their permissions and move laterally across the account                                                                                  |
+| **Access Key Rotation Attack for Lateral Movement (IAM-003)**                                | Delete and recreate access keys for other IAM users to bypass the two-key limit and gain their permissions                                                                              |
+| **Console Login Profile Creation for Lateral Movement (IAM-004)**                            | Create console login profiles for other IAM users to access the AWS Console with their permissions                                                                                      |
+| **Inline Policy Injection for Self-Escalation (IAM-005)**                                    | Attach an inline policy with administrative permissions to your own role, instantly escalating privileges                                                                               |
+| **Console Password Override for Lateral Movement (IAM-006)**                                 | Change the console password of other IAM users to log in as them and gain their permissions                                                                                             |
+| **Inline Policy Injection on User for Self-Escalation (IAM-007)**                            | Attach an inline policy with administrative permissions to your own IAM user, instantly escalating privileges                                                                           |
+| **Managed Policy Attachment on User for Self-Escalation (IAM-008)**                          | Attach existing managed policies with administrative permissions to your own IAM user, instantly escalating privileges                                                                  |
+| **Managed Policy Attachment on Role for Self-Escalation (IAM-009)**                          | Attach existing managed policies with administrative permissions to your own IAM role, instantly escalating privileges                                                                  |
+| **Managed Policy Attachment on Group for Self-Escalation (IAM-010)**                         | Attach existing managed policies with administrative permissions to a group you belong to, escalating privileges for all group members                                                  |
+| **Inline Policy Injection on Group for Self-Escalation (IAM-011)**                           | Attach an inline policy with administrative permissions to a group you belong to, escalating privileges for all group members                                                           |
+| **Trust Policy Hijacking for Role Assumption (IAM-012)**                                     | Modify a role's trust policy to allow yourself to assume it, gaining the role's permissions                                                                                             |
+| **Group Membership Hijacking for Privilege Escalation (IAM-013)**                            | Add yourself to a privileged IAM group to inherit its permissions, gaining access to all policies attached to the group                                                                 |
+| **Managed Policy Attachment with Role Assumption for Lateral Movement (IAM-014)**            | Attach administrative managed policies to another role you can assume, then assume it to gain elevated privileges                                                                       |
+| **Managed Policy Attachment with Access Key Creation for Lateral Movement (IAM-015)**        | Attach administrative managed policies to another IAM user and create access keys for them to gain programmatic access with elevated privileges                                         |
+| **Policy Version Override with Role Assumption for Lateral Movement (IAM-016)**              | Create a new version of a customer-managed policy attached to another role with administrative permissions, then assume that role to gain elevated access                               |
+| **Inline Policy Injection with Role Assumption for Lateral Movement (IAM-017)**              | Attach an inline policy with administrative permissions to another role you can assume, then assume it to gain elevated privileges                                                      |
+| **Inline Policy Injection with Access Key Creation for Lateral Movement (IAM-018)**          | Attach an inline policy with administrative permissions to another IAM user and create access keys for them to gain programmatic access with elevated privileges                        |
+| **Managed Policy Attachment with Trust Policy Hijacking for Privilege Escalation (IAM-019)** | Attach administrative managed policies to a role and modify its trust policy to allow yourself to assume it, gaining elevated privileges without prior assume-role access               |
+| **Policy Version Override with Trust Policy Hijacking for Privilege Escalation (IAM-020)**   | Create a new version of a customer-managed policy attached to a role with administrative permissions and modify its trust policy to assume it, without prior assume-role access         |
+| **Inline Policy Injection with Trust Policy Hijacking for Privilege Escalation (IAM-021)**   | Add an inline policy with administrative permissions to a role and modify its trust policy to allow yourself to assume it, gaining elevated privileges without prior assume-role access |
+| **Lambda Function Creation with Privileged Role (LAMBDA-001)**                               | Create a Lambda function with a privileged IAM role and invoke it to execute code with that role's permissions                                                                          |
+| **Lambda Function Creation with Event Source Trigger (LAMBDA-002)**                          | Create a Lambda function with a privileged IAM role and an event source mapping to trigger it automatically, executing code with the role's permissions                                 |
+| **Lambda Function Code Injection (LAMBDA-003)**                                              | Modify the code of an existing Lambda function to execute arbitrary commands with the function's execution role permissions                                                             |
+| **Lambda Function Code Injection with Direct Invocation (LAMBDA-004)**                       | Modify the code of an existing Lambda function and invoke it directly to execute arbitrary commands with the function's execution role permissions                                      |
+| **Lambda Function Code Injection with Resource Policy Grant (LAMBDA-005)**                   | Modify the code of an existing Lambda function and grant yourself invocation permission via its resource-based policy to execute code with the function's execution role                |
+| **Lambda Function Creation with Resource Policy Invocation (LAMBDA-006)**                    | Create a Lambda function with a privileged IAM role and grant yourself invocation permission via its resource-based policy to execute code with the role's permissions                  |
+| **SageMaker Notebook Creation with Privileged Role (SAGEMAKER-001)**                         | Create a SageMaker notebook instance with a privileged IAM role to execute arbitrary code with the role's permissions via the Jupyter environment                                       |
+| **SageMaker Training Job Creation with Privileged Role (SAGEMAKER-002)**                     | Create a SageMaker training job with a privileged IAM role to execute arbitrary container code with the role's permissions                                                              |
+| **SageMaker Processing Job Creation with Privileged Role (SAGEMAKER-003)**                   | Create a SageMaker processing job with a privileged IAM role to execute arbitrary container code with the role's permissions                                                            |
+| **SageMaker Presigned Notebook URL for Privilege Escalation (SAGEMAKER-004)**                | Generate a presigned URL to access an existing SageMaker notebook instance and execute code with its execution role's permissions                                                       |
+| **SageMaker Notebook Lifecycle Config Injection (SAGEMAKER-005)**                            | Inject a malicious lifecycle configuration into an existing SageMaker notebook to execute code with the notebook's execution role during startup                                        |
+| **SSM Session Access for EC2 Role Credentials (SSM-001)**                                    | Start an SSM session on an EC2 instance to access its attached role credentials through IMDS                                                                                            |
+| **SSM Send Command for EC2 Role Credentials (SSM-002)**                                      | Execute commands on an EC2 instance via SSM Run Command to access its attached role credentials through IMDS                                                                            |
+| **Role Assumption for Privilege Escalation (STS-001)**                                       | Assume IAM roles with elevated permissions by exploiting bidirectional trust between the starting principal and the target role                                                         |

 These tools enable workflows such as:
+
 - Asking an AI assistant to identify privilege escalation paths in a specific AWS account
 - Automating attack path analysis across multiple scans
 - Combining attack path data with findings and compliance information for comprehensive security reports
@@ -2,13 +2,14 @@
 name: prowler-attack-paths-query
 description: >
  Creates Prowler Attack Paths openCypher queries using the Cartography schema as the source of truth
-  for node labels, properties, and relationships. Also covers Prowler-specific additions (Internet node,
-  ProwlerFinding, internal isolation labels) and $provider_uid scoping for predefined queries.
+  for node labels, properties, and relationships. Covers Prowler-specific additions (Internet node,
+  ProwlerFinding, internal isolation labels), $provider_uid scoping, and list-property item nodes
+  with typed `HAS_*` edges that run efficiently on both Neo4j and Amazon Neptune sinks.
  Trigger: When creating or updating Attack Paths queries.
 license: Apache-2.0
 metadata:
  author: prowler-cloud
-  version: "2.0"
+  version: "3.0"
  scope: [root, api]
  auto_invoke:
    - "Creating Attack Paths queries"
@@ -19,36 +20,30 @@ allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, Task

 ## Overview

-Attack Paths queries are openCypher queries that analyze cloud infrastructure graphs (ingested via Cartography) to detect security risks like privilege escalation paths, network exposure, and misconfigurations.
-
-Queries are written in **openCypher Version 9** for compatibility with both Neo4j and Amazon Neptune.
+Attack Paths queries are read-only openCypher queries over a Cartography-ingested cloud graph that detect privilege escalation chains, network exposure, and other graph-shaped security risks. Queries are written in openCypher Version 9 so they run on both Neo4j and Amazon Neptune sinks.

 ---

 ## Two query audiences

-This skill covers two types of queries with different isolation mechanisms:
+|                    | Predefined queries                                          | Custom queries                                                        |
+| ------------------ | ----------------------------------------------------------- | --------------------------------------------------------------------- |
+| Where they live    | `api/src/backend/api/attack_paths/queries/{provider}.py`    | User-supplied via the custom query API endpoint                       |
+| Provider isolation | `AWSAccount {id: $provider_uid}` anchor + path connectivity | Automatic `_Provider_{uuid}` label injection by `cypher_sanitizer.py` |
+| What to write      | Chain every MATCH from the `aws` variable                   | Plain Cypher, no isolation boilerplate                                |
+| Internal labels    | Never use                                                   | Never use (system-injected)                                           |

-| | Predefined queries | Custom queries |
-|---|---|---|
-| **Where they live** | `api/src/backend/api/attack_paths/queries/{provider}.py` | User/LLM-supplied via the custom query API endpoint |
-| **Provider isolation** | `AWSAccount {id: $provider_uid}` anchor + path connectivity | Automatic `_Provider_{uuid}` label injection via `cypher_sanitizer.py` |
-| **What to write** | Chain every MATCH from the `aws` variable | Plain Cypher, no isolation boilerplate needed |
-| **Internal labels** | Never use (`_ProviderResource`, `_Tenant_*`, `_Provider_*`) | Never use (injected automatically by the system) |
+**Predefined queries**: every node must be reachable from the `AWSAccount` root via graph traversal. That is the isolation boundary.

-**For predefined queries**: every node must be reachable from the `AWSAccount` root via graph traversal. This is the isolation boundary.
-
-**For custom queries**: write natural Cypher without isolation concerns. The query runner injects a `_Provider_{uuid}` label into every node pattern before execution, and a post-query filter catches edge cases.
+**Custom queries**: write natural Cypher. The runner injects a `_Provider_{uuid}` label into every node pattern, and a post-query filter handles edge cases.

 ---

-## Input Sources
+## Input sources

-Queries can be created from:
+New queries come from one of two sources:

-1. **pathfinding.cloud ID** (e.g., `ECS-001`, `GLUE-001`)
-   - Reference: https://github.com/DataDog/pathfinding.cloud
-   - The aggregated `paths.json` is too large for WebFetch. Use Bash:
+1. **pathfinding.cloud ID** (e.g. `ECS-001`, `GLUE-001`) from the Datadog research catalogue. The aggregated `paths.json` is too large for WebFetch, so fetch it with Bash:

   ```bash
   # Fetch a single path by ID
@@ -64,28 +59,24 @@ Queries can be created from:
     | jq -r '.[] | select(.id | startswith("ecs")) | "\(.id): \(.name)"'
   ```

-   If `jq` is not available, use `python3 -c "import json,sys; ..."` as a fallback.
+   If `jq` is unavailable, use `python3 -c "import json,sys; ..."`.

-2. **Natural language description** from the user
+2. **Natural language description** from the requester.

 ---

-## Query Structure
+## Query structure

 ### Provider scoping parameter

-One parameter is injected automatically by the query runner:
+| Parameter       | Property | Used on      | Purpose                                |
+| --------------- | -------- | ------------ | -------------------------------------- |
+| `$provider_uid` | `id`     | `AWSAccount` | Scopes the query to a specific account |

-| Parameter       | Property it matches | Used on      | Purpose                          |
-| --------------- | ------------------- | ------------ | -------------------------------- |
-| `$provider_uid` | `id`                | `AWSAccount` | Scopes to a specific AWS account |
-
-All other nodes are isolated by path connectivity from the `AWSAccount` anchor.
+The runner binds `$provider_uid` automatically. Every other node is isolated by path connectivity from the `AWSAccount` anchor.

 ### Imports

-All query files start with these imports:
-
 ```python
 from api.attack_paths.queries.types import (
    AttackPathsQueryAttribution,
@@ -95,29 +86,33 @@ from api.attack_paths.queries.types import (
 from tasks.jobs.attack_paths.config import PROWLER_FINDING_LABEL
 ```

-The `PROWLER_FINDING_LABEL` constant (value: `"ProwlerFinding"`) is used via f-string interpolation in all queries. Never hardcode the label string.
+Always use `PROWLER_FINDING_LABEL` via f-string interpolation; never hardcode `"ProwlerFinding"`.

-### Privilege escalation sub-patterns
+### Definition fields

-There are four distinct privilege escalation patterns. Choose based on the attack type:
+- **id**: kebab-case `{provider}-{description}`, e.g. `aws-ec2-privesc-passrole-iam`.
+- **name**: short, human-friendly label. Sourced queries append the reference ID: `"EC2 Instance Launch with Privileged Role (EC2-001)"`.
+- **short_description**: one sentence, no technical permissions.
+- **description**: full technical explanation, plain text.
+- **provider**: `aws`, `azure`, `gcp`, `kubernetes`, or `github`.
+- **cypher**: f-string Cypher body. Literal `{` / `}` are escaped as `{{` / `}}`.
+- **parameters**: list of query parameters; pass `parameters=[]` if there are none.
+- **attribution**: optional `AttackPathsQueryAttribution(text, link)` for sourced queries. `link` uses the lowercase ID.

-| Sub-pattern | Target | `path_target` shape | Example |
-|---|---|---|---|
-| Self-escalation | Principal's own policies | `(aws)--(target_policy:AWSPolicy)--(principal)` | IAM-001 |
-| Lateral to user | Other IAM users | `(aws)--(target_user:AWSUser)` | IAM-002 |
-| Assume-role lateral | Assumable roles | `(aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)` | IAM-014 |
-| PassRole + service | Service-trusting roles | `(aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(...)` | EC2-001 |
+Append the constant to the `{PROVIDER}_QUERIES` list at the bottom of the provider file.

-#### Self-escalation (e.g., IAM-001)
+---

-The principal modifies resources attached to itself. `path_target` loops back to `principal`:
+## Predefined query template
+
+The canonical shape combines a principal walk, an optional target walk, deduplicated nodes, and a typed finding overlay:

 ```python
 AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
    id="aws-{kebab-case-name}",
-    name="{Human-friendly label} ({REFERENCE_ID})",
-    short_description="{Brief explanation, no technical permissions.}",
-    description="{Detailed description of the attack vector and impact.}",
+    name="{Label} ({REFERENCE_ID})",
+    short_description="{One sentence.}",
+    description="{Full technical explanation.}",
    attribution=AttackPathsQueryAttribution(
        text="pathfinding.cloud - {REFERENCE_ID} - {permission}",
        link="https://pathfinding.cloud/paths/{reference_id_lowercase}",
@@ -125,29 +120,27 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
    provider="aws",
    cypher=f"""
        // Find principals with {permission}
-        MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
-        WHERE stmt.effect = 'Allow'
-            AND any(action IN stmt.action WHERE
-                toLower(action) = '{permission_lowercase}'
-                OR toLower(action) = '{service}:*'
-                OR action = '*'
-            )
+        MATCH path_principal = (aws:AWSAccount {{id: $provider_uid}})--(principal:AWSPrincipal)-[:POLICY]->(policy:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {{effect: 'Allow'}})
+        MATCH (stmt)-[:HAS_ACTION]->(act:AWSPolicyStatementActionItem)
+        WHERE toLower(act.value) IN ['{permission_lowercase}', '{service}:*']
+           OR act.value = '*'
+        WITH DISTINCT aws, principal, stmt, path_principal

-        // Find target resources attached to the same principal
+        // Target resources attached to the same principal (sub-patterns below)
        MATCH path_target = (aws)--(target_policy:AWSPolicy)--(principal)
        WHERE target_policy.arn CONTAINS $provider_uid
-            AND any(resource IN stmt.resource WHERE
-                resource = '*'
-                OR target_policy.arn CONTAINS resource
-            )
+        MATCH (stmt)-[:HAS_RESOURCE]->(res:AWSPolicyStatementResourceItem)
+        WHERE res.value = '*'
+           OR target_policy.arn CONTAINS res.value

+        WITH DISTINCT path_principal, path_target
        WITH collect(path_principal) + collect(path_target) AS paths
        UNWIND paths AS p
        UNWIND nodes(p) AS n

        WITH paths, collect(DISTINCT n) AS unique_nodes
        UNWIND unique_nodes AS n
-        OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
+        OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})

        RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
    """,
@@ -155,158 +148,145 @@ AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
 )
 ```

-#### Other sub-pattern `path_target` shapes
+Key points:

-The other 3 sub-patterns share the same `path_principal`, deduplication tail, and RETURN as self-escalation. Only the `path_target` MATCH differs:
+- The principal walk types the `POLICY` and `STATEMENT` hops. Both are low-fan-out (each principal has a handful of policies; each policy a handful of statements), so the typed edge lets the planner cost a cheap inline filter.
+- The `(aws)--` hub hops stay anonymous. `AWSAccount` is a high-degree node that fans out to every principal, role, policy, and resource in the account; typing those edges forces the planner to enumerate from the hub and collapses performance on multi-tenant Neptune.
+- Other relationship types appear only where the file's existing queries already use one (`TRUSTS_AWS_PRINCIPAL`, `STS_ASSUMEROLE_ALLOW`, `MEMBER_AWS_GROUP`, `HAS_EXECUTION_ROLE`).
+- The finding probe is typed `:HAS_FINDING` and left undirected. The type lets Neptune apply an inline edge filter; the lack of direction matches the convention of the rest of the file.
+- Collapse duplicate rows after each permission gate with `WITH DISTINCT`, carrying only the variables needed by later clauses.
+- Each `HAS_*` traversal is its own `MATCH` clause with a `WHERE` on the child item node. `WITH DISTINCT path_principal, path_target` precedes `collect(path...)` to dedupe the row multiplication produced by the joins.
+- The `RETURN` shape `paths, dpf, dpfr` is the contract the serializer and visualiser depend on. Do not change it.
+
+---
+
+## Privilege escalation sub-patterns
+
+Four `path_target` shapes cover the common attack types. Each shares the canonical template's `path_principal`, deduplication tail, and `RETURN`; only the `path_target` MATCH and its resource predicate differ.
+
+| Sub-pattern         | Target                   | `path_target` shape                                                                                     | Example |
+| ------------------- | ------------------------ | ------------------------------------------------------------------------------------------------------- | ------- |
+| Self-escalation     | Principal's own policies | `(aws)--(target_policy:AWSPolicy)--(principal)`                                                         | IAM-001 |
+| Lateral to user     | Other IAM users          | `(aws)--(target_user:AWSUser)`                                                                          | IAM-002 |
+| Assume-role lateral | Assumable roles          | `(aws)--(target_role:AWSRole)-[:STS_ASSUMEROLE_ALLOW]-(principal)`                                      | IAM-014 |
+| PassRole + service  | Service-trusting roles   | `(aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]-(:AWSPrincipal {arn: '{service}.amazonaws.com'})` | EC2-001 |
+
+**Multi-permission queries** (e.g. PassRole plus a service-create action) add permission gates before `path_target`. Reuse the per-query counter for new variables (`act2`, `policy2`, `stmt2`) and collapse rows after each gate:

 ```cypher
-// Lateral to user (e.g., IAM-002) - targets other IAM users
-MATCH path_target = (aws)--(target_user:AWSUser)
-WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_user.arn CONTAINS resource OR resource CONTAINS target_user.name)
-
-// Assume-role lateral (e.g., IAM-014) - targets roles the principal can assume
-MATCH path_target = (aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)
-WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
-
-// PassRole + service (e.g., EC2-001) - targets roles trusting a service
-MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: '{service}.amazonaws.com'})
-WHERE any(resource IN stmt.resource WHERE resource = '*' OR target_role.arn CONTAINS resource OR resource CONTAINS target_role.name)
+MATCH (principal)-[:POLICY]->(policy2:AWSPolicy)-[:STATEMENT]->(stmt2:AWSPolicyStatement {effect: 'Allow'})
+MATCH (stmt2)-[:HAS_ACTION]->(act2:AWSPolicyStatementActionItem)
+WHERE toLower(act2.value) IN ['service:*', 'service:createsomething']
+   OR act2.value = '*'
+WITH DISTINCT aws, principal, stmt, stmt2, path_principal
 ```

-**Multi-permission**: PassRole queries require a second permission. Add `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` with its own WHERE before `path_target`, then check BOTH `stmt.resource` AND `stmt2.resource` against the target. See IAM-015 or EC2-001 in `aws.py` for examples.
+If a permission is an existence-only gate whose statement resource is not checked later, keep the policy and statement anonymous and carry only the variables still needed:

-### Network exposure pattern
+```cypher
+MATCH (principal)-[:POLICY]->(:AWSPolicy)-[:STATEMENT]->(:AWSPolicyStatement {effect: 'Allow'})-[:HAS_ACTION]->(act3:AWSPolicyStatementActionItem)
+WHERE toLower(act3.value) IN ['service:*', 'service:othersomething']
+   OR act3.value = '*'
+WITH DISTINCT aws, principal, stmt, path_principal
+```

-The Internet node is reached via `CAN_ACCESS` through the already-scoped resource, not via a standalone lookup:
+When all matching principals can target the same independent resource set, collect principal paths before expanding targets instead of creating one row per principal-target pair:
+
+```cypher
+WITH aws, collect(DISTINCT path_principal) AS principal_paths
+MATCH path_target = (aws)--(target)
+WITH principal_paths + collect(DISTINCT path_target) AS paths
+```
+
+Statements that constrain a target are still checked via `HAS_RESOURCE` traversals (`res`, `res2`). See IAM-015 or EC2-001 in `aws.py`.
+
+---
+
+## Network exposure pattern
+
+The Internet node is reached via `CAN_ACCESS` through an already-scoped resource, never as a standalone lookup:

 ```python
-AWS_{QUERY_NAME} = AttackPathsQueryDefinition(
-    id="aws-{kebab-case-name}",
-    name="{Human-friendly label}",
-    short_description="{Brief explanation.}",
-    description="{Detailed description.}",
-    provider="aws",
-    cypher=f"""
-        // Match exposed resources (MUST chain from `aws`)
-        MATCH path = (aws:AWSAccount {{id: $provider_uid}})--(resource:EC2Instance)
-        WHERE resource.exposed_internet = true
+cypher=f"""
+    // Resource scoped through the account anchor
+    MATCH path = (aws:AWSAccount {{id: $provider_uid}})--(resource:EC2Instance)
+    WHERE resource.exposed_internet = true

-        // Internet node reached via path connectivity through the resource
-        OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)
+    // Internet node reached via path connectivity through the resource
+    OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)

-        WITH collect(path) AS paths, head(collect(internet)) AS internet, collect(can_access) AS can_access
-        UNWIND paths AS p
-        UNWIND nodes(p) AS n
+    WITH collect(path) AS paths, head(collect(internet)) AS internet, collect(can_access) AS can_access
+    UNWIND paths AS p
+    UNWIND nodes(p) AS n

-        WITH paths, internet, can_access, collect(DISTINCT n) AS unique_nodes
-        UNWIND unique_nodes AS n
-        OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
+    WITH paths, internet, can_access, collect(DISTINCT n) AS unique_nodes
+    UNWIND unique_nodes AS n
+    OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})

-        RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
-            internet, can_access
-    """,
-    parameters=[],
-)
+    RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
+        internet, can_access
+"""
 ```

-### Register in query list
-
-Add to the `{PROVIDER}_QUERIES` list at the bottom of the file:
-
-```python
-AWS_QUERIES: list[AttackPathsQueryDefinition] = [
-    # ... existing queries ...
-    AWS_{NEW_QUERY_NAME},  # Add here
-]
-```
+The `CAN_ACCESS` edge stays typed and directed (`-[:CAN_ACCESS]->`); that is its canonical sync-time orientation.

 ---

-## Step-by-step creation process
+## List-typed properties as child nodes

-### 1. Read the queries module
+Some Cartography node properties carry a list of values: `AWSPolicyStatement.action`, `AWSPolicyStatement.resource`, `KMSKey.encryption_algorithms`, `CloudFrontDistribution.aliases`, and many others. The graph models each such property as a set of child item nodes connected to the parent by a typed edge. Queries reach the values by traversing the edge; the parent does not carry the list as a single field.

-**FIRST**, read all files in the queries module to understand the structure, type definitions, registration, and existing style:
+### Naming convention

-```text
-api/src/backend/api/attack_paths/queries/
-├── __init__.py      # Module exports
-├── types.py         # AttackPathsQueryDefinition, AttackPathsQueryParameterDefinition
-├── registry.py      # Query registry logic
-└── {provider}.py    # Provider-specific queries (e.g., aws.py)
+For a list-typed parent property the sink stores:
+
+- **Child label**: `<ParentLabel><PropertyPascal>Item`. Example: `AWSPolicyStatement.resource` → `AWSPolicyStatementResourceItem`.
+- **Edge type**: `HAS_<PROPERTY_UPPER>`. Example: `resource` → `HAS_RESOURCE`.
+- **Child property**: `value` (a single scalar string) for scalar-list properties. For list-of-dict properties (rare; for example `SecretsManagerSecretVersion.tags`) the child carries the dict keys as named fields per the catalog's `field_map`.
+
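The naming rule above can be sketched as a small helper. This is an illustration only: `child_item_names` is a hypothetical function, not part of the sync layer, and the word-by-word PascalCasing of snake_case property names is an assumption that should be checked against the catalog before relying on it.

```python
def child_item_names(parent_label: str, prop: str) -> tuple[str, str]:
    """Return (child_label, edge_type) for a list-typed parent property."""
    # Assumption: snake_case property names are PascalCased word by word.
    pascal = "".join(word.capitalize() for word in prop.split("_"))
    child_label = f"{parent_label}{pascal}Item"
    edge_type = f"HAS_{prop.upper()}"
    return child_label, edge_type


if __name__ == "__main__":
    # Both examples come from the convention described above.
    print(child_item_names("AWSPolicyStatement", "resource"))
    print(child_item_names("AWSPolicyStatement", "action"))
```

For the two policy-statement properties this reproduces the names used throughout the guide (`AWSPolicyStatementResourceItem` / `HAS_RESOURCE` and `AWSPolicyStatementActionItem` / `HAS_ACTION`).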
+### Variable naming for child-item matches
+
+`aws.py` uses a per-query counter for each `HAS_*` traversal so chained matches stay unambiguous:
+
+| Edge              | First  | Second  | Third   |
+| ----------------- | ------ | ------- | ------- |
+| `HAS_ACTION`      | `act`  | `act2`  | `act3`  |
+| `HAS_RESOURCE`    | `res`  | `res2`  | `res3`  |
+| `HAS_NOTACTION`   | `nact` | `nact2` | `nact3` |
+| `HAS_NOTRESOURCE` | `nres` | `nres2` | `nres3` |
+
+The counter resets at the top of every query.
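One way to picture the counter is a tiny allocator that is created once per query. `ItemVarCounter` and `BASE_NAMES` are hypothetical names used for illustration; `aws.py` writes the variables by hand, so this is only a sketch of the rule in the table.

```python
# Base variable name per edge type, taken from the table above.
BASE_NAMES = {
    "HAS_ACTION": "act",
    "HAS_RESOURCE": "res",
    "HAS_NOTACTION": "nact",
    "HAS_NOTRESOURCE": "nres",
}


class ItemVarCounter:
    """Hands out act, act2, act3, ... independently for each edge type."""

    def __init__(self) -> None:
        self._seen: dict[str, int] = {}

    def next(self, edge_type: str) -> str:
        count = self._seen.get(edge_type, 0) + 1
        self._seen[edge_type] = count
        base = BASE_NAMES[edge_type]
        # The first use has no numeric suffix; later uses append the count.
        return base if count == 1 else f"{base}{count}"


if __name__ == "__main__":
    counter = ItemVarCounter()  # a fresh counter per query
    print([counter.next("HAS_ACTION") for _ in range(3)])
    print(counter.next("HAS_RESOURCE"))
```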
+
+### Example - action match
+
+Find statements that grant `iam:PassRole`, `iam:*`, or `*`. Traverse the `HAS_ACTION` edge in its own `MATCH` clause and apply the predicate in the attached `WHERE`:
+
+```cypher
+MATCH (stmt:AWSPolicyStatement {effect: 'Allow'})
+MATCH (stmt)-[:HAS_ACTION]->(act:AWSPolicyStatementActionItem)
+WHERE toLower(act.value) IN ['iam:passrole', 'iam:*']
+   OR act.value = '*'
 ```

-**DO NOT** use generic templates. Match the exact style of existing queries in the file.
+The literal actions are compared against `toLower(act.value)` because IAM authors mix case (`iam:PassRole`, `iam:passrole`); the `*` wildcard has no letter case, so it is compared verbatim.
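To make the predicate's behaviour concrete, here is a plain-Python mirror of the `WHERE` clause. `action_matches` is a hypothetical helper for reasoning and unit tests, not code from the repository; it assumes the `wanted` list is already lower-case, as in the Cypher above.

```python
def action_matches(value: str, wanted: list[str]) -> bool:
    """Mirror of: toLower(act.value) IN wanted OR act.value = '*'."""
    return value.lower() in wanted or value == "*"


if __name__ == "__main__":
    wanted = ["iam:passrole", "iam:*"]
    for candidate in ["iam:PassRole", "IAM:*", "*", "s3:GetObject"]:
        print(candidate, action_matches(candidate, wanted))
```

Note that the check is an exact comparison after case-folding; a partial wildcard such as `iam:Pass*` is not expanded by this predicate.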

-### 2. Fetch and consult the Cartography schema
+### Example - resource ARN match

-**This is the most important step.** Every node label, property, and relationship in the query must exist in the Cartography schema for the pinned version. Do not guess or rely on memory.
+Find statements whose resource can target a specific role:

-Check `api/pyproject.toml` for the Cartography dependency, then fetch the schema:
-
-```bash
-grep cartography api/pyproject.toml
+```cypher
+MATCH path_target = (aws)--(target_role:AWSRole)
+MATCH (stmt)-[:HAS_RESOURCE]->(res:AWSPolicyStatementResourceItem)
+WHERE res.value = '*'
+   OR res.value CONTAINS target_role.name
+   OR target_role.arn CONTAINS res.value
 ```

-Build the schema URL (ALWAYS use the specific tag, not master/main):
+Three predicates cover the cases: a full wildcard (`*`), a pattern that contains the role name (`arn:aws:iam::*:role/admin*`), and a pattern that is a substring (for example a prefix) of the role's actual ARN.
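The same three checks can be mirrored in Python to see which resource patterns they accept. `resource_targets_role` is a hypothetical helper and the ARNs are made-up examples; the logic is a direct transcription of the `CONTAINS` predicates, which are substring tests and not full IAM wildcard evaluation.

```python
def resource_targets_role(res_value: str, role_arn: str, role_name: str) -> bool:
    """Mirror of the three predicates applied to one resource item value."""
    return (
        res_value == "*"            # res.value = '*'
        or role_name in res_value   # res.value CONTAINS target_role.name
        or res_value in role_arn    # target_role.arn CONTAINS res.value
    )


if __name__ == "__main__":
    arn = "arn:aws:iam::123456789012:role/admin"  # illustrative ARN
    for pattern in [
        "*",
        "arn:aws:iam::*:role/admin*",
        "arn:aws:iam::123456789012:role/",
        "arn:aws:iam::123456789012:role/other",
    ]:
        print(pattern, resource_targets_role(pattern, arn, "admin"))
```

Because these are substring tests, the match is deliberately permissive: a pattern ending in `role/administrators` also contains the name `admin` and would be accepted.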

-```text
-# Git dependency (prowler-cloud/cartography@0.126.1):
-https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/0.126.1/docs/root/modules/{provider}/schema.md
+### Catalog of list properties

-# PyPI dependency (cartography = "^0.126.0"):
-https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/0.126.0/docs/root/modules/{provider}/schema.md
-```
-
-Read the schema to discover available node labels, properties, and relationships for the target resources. Internal labels (`_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*`) exist for isolation but should never appear in queries.
-
-### 4. Create query definition
-
-Use the appropriate pattern (privilege escalation or network exposure) with:
-
- **id**: `{provider}-{kebab-case-description}`
- **name**: Short, human-friendly label. For sourced queries, append the reference ID: `"EC2 Instance Launch with Privileged Role (EC2-001)"`.
- **short_description**: Brief explanation, no technical permissions.
- **description**: Full technical explanation. Plain text only.
- **provider**: Provider identifier (aws, azure, gcp, kubernetes, github)
- **cypher**: The openCypher query with proper escaping
- **parameters**: Optional list of user-provided parameters (`parameters=[]` if none)
- **attribution**: Optional `AttackPathsQueryAttribution(text, link)` for sourced queries. The `text` includes source, reference ID, and permissions. The `link` uses a lowercase ID. Omit for non-sourced queries.
-
-### 5. Add query to provider list
-
-Add the constant to the `{PROVIDER}_QUERIES` list.
-
---
-
-## Query naming conventions
-
-### Query ID
-
-```text
-{provider}-{category}-{description}
-```
-
-Examples: `aws-ec2-privesc-passrole-iam`, `aws-ec2-instances-internet-exposed`
-
-### Query constant name
-
-```text
-{PROVIDER}_{CATEGORY}_{DESCRIPTION}
-```
-
-Examples: `AWS_EC2_PRIVESC_PASSROLE_IAM`, `AWS_EC2_INSTANCES_INTERNET_EXPOSED`
-
---
-
-## Query categories
-
-| Category             | Description                    | Example                   |
-| -------------------- | ------------------------------ | ------------------------- |
-| Basic Resource       | List resources with properties | RDS instances, S3 buckets |
-| Network Exposure     | Internet-exposed resources     | EC2 with public IPs       |
-| Privilege Escalation | IAM privilege escalation paths | PassRole + RunInstances   |
-| Data Access          | Access to sensitive data       | EC2 with S3 access        |
+The provider catalog lives in `api/src/backend/tasks/jobs/attack_paths/provider_config.py` (`AWS_NORMALIZED_LISTS`). Beyond policy statements it includes KMS algorithms, ECS container-definition lists (`entry_point`, `command`, `links`, `dns_servers`, ...), CloudFront aliases, Inspector finding URL and vulnerability lists, RDS event-subscription categories, and others. To query a list property that is not in the catalog, add an entry there first so the sync layer materialises it.

 ---

@@ -315,53 +295,42 @@ Examples: `AWS_EC2_PRIVESC_PASSROLE_IAM`, `AWS_EC2_INSTANCES_INTERNET_EXPOSED`
 ### Match account and principal

 ```cypher
-MATCH path_principal = (aws:AWSAccount {id: $provider_uid})--(principal:AWSPrincipal)--(policy:AWSPolicy)--(stmt:AWSPolicyStatement)
+MATCH path_principal = (aws:AWSAccount {id: $provider_uid})--(principal:AWSPrincipal)-[:POLICY]->(policy:AWSPolicy)-[:STATEMENT]->(stmt:AWSPolicyStatement {effect: 'Allow'})
 ```

-### Check IAM action permissions
+The `(aws)--(principal)` hop stays anonymous; the `POLICY` and `STATEMENT` hops are typed.
+
+### Roles trusting a service

 ```cypher
-WHERE stmt.effect = 'Allow'
-    AND any(action IN stmt.action WHERE
-        toLower(action) = 'iam:passrole'
-        OR toLower(action) = 'iam:*'
-        OR action = '*'
-    )
+MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]-(:AWSPrincipal {arn: 'ec2.amazonaws.com'})
 ```

-### Find roles trusting a service
+### Roles a principal can assume

 ```cypher
-MATCH path_target = (aws)--(target_role:AWSRole)-[:TRUSTS_AWS_PRINCIPAL]->(:AWSPrincipal {arn: 'ec2.amazonaws.com'})
+MATCH path_target = (aws)--(target_role:AWSRole)-[:STS_ASSUMEROLE_ALLOW]-(principal)
 ```

-### Find roles the principal can assume
+### JSON-encoded properties

-Note the arrow direction - `STS_ASSUMEROLE_ALLOW` points from the role to the principal:
+Object-typed Cartography properties (most notably `condition` on `AWSPolicyStatement` and `S3PolicyStatement`) are stored as JSON-encoded strings, e.g. `'{"StringEquals":{"aws:SourceAccount":"123456789012"}}'`. There is no JSON parser at query time, so use `CONTAINS` for substring checks:

 ```cypher
-MATCH path_target = (aws)--(target_role:AWSRole)<-[:STS_ASSUMEROLE_ALLOW]-(principal)
+WHERE stmt.condition CONTAINS '"aws:SourceAccount"'
 ```

-### Check resource scope
-
-```cypher
-WHERE any(resource IN stmt.resource WHERE
-    resource = '*'
-    OR target_role.arn CONTAINS resource
-    OR resource CONTAINS target_role.name
-)
-```
+For structured inspection, fetch the rows and parse in Python. Cypher cannot navigate JSON object keys.

 ### Internet node via path connectivity

-The Internet node is reached through `CAN_ACCESS` relationships to already-scoped resources. No standalone lookup needed:
-
 ```cypher
 OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)
 ```

-### Multi-label OR (match multiple resource types)
+`resource` must already be bound by the account-anchored pattern above.
+
+### Multi-label OR (multiple resource types)

 ```cypher
 MATCH path = (aws:AWSAccount {id: $provider_uid})-[r]-(x)-[q]-(y)
@@ -373,7 +342,7 @@ WHERE (x:EC2PrivateIp AND x.public_ip = $ip)

 ### Include Prowler findings

-Deduplicate nodes before the ProwlerFinding lookup to avoid redundant OPTIONAL MATCH calls on nodes that appear in multiple paths:
+Deduplicate nodes before the typed finding probe to avoid one `OPTIONAL MATCH` per path-occurrence of the same node:

 ```cypher
 WITH collect(path_principal) + collect(path_target) AS paths
@@ -382,12 +351,12 @@ UNWIND nodes(p) AS n

 WITH paths, collect(DISTINCT n) AS unique_nodes
 UNWIND unique_nodes AS n
-OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
+OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})

 RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr
 ```

-For network exposure queries, aggregate the internet node and relationship alongside paths:
+For network-exposure queries, aggregate the Internet node and its edge alongside paths:

 ```cypher
 WITH collect(path) AS paths, head(collect(internet)) AS internet, collect(can_access) AS can_access
@@ -396,7 +365,7 @@ UNWIND nodes(p) AS n

 WITH paths, internet, can_access, collect(DISTINCT n) AS unique_nodes
 UNWIND unique_nodes AS n
-OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})
+OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})

 RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,
    internet, can_access
@@ -406,22 +375,22 @@ RETURN paths, collect(DISTINCT pf) as dpf, collect(DISTINCT pfr) as dpfr,

 ## Prowler-specific labels and relationships

-These are added by the sync task, not part of the Cartography schema. For all other node labels, properties, and relationships, **always consult the Cartography schema** (see step 2 below).
+Added by the sync task, not part of the Cartography schema. For everything else, consult the pinned Cartography schema (see "Creation steps").

-| Label/Relationship     | Description                                        |
-| ---------------------- | -------------------------------------------------- |
-| `ProwlerFinding`       | Finding node (`status`, `severity`, `check_id`)    |
-| `Internet`             | Internet sentinel node                             |
-| `CAN_ACCESS`           | Internet-to-resource exposure (relationship)       |
-| `HAS_FINDING`          | Resource-to-finding link (relationship)            |
-| `TRUSTS_AWS_PRINCIPAL` | Role trust relationship                            |
-| `STS_ASSUMEROLE_ALLOW` | Can assume role (direction: role -> principal)      |
+| Label / Relationship   | Description                                                 |
+| ---------------------- | ----------------------------------------------------------- |
+| `ProwlerFinding`       | Finding node (`status`, `severity`, `check_id`)             |
+| `Internet`             | Internet sentinel node                                      |
+| `CAN_ACCESS`           | `(Internet)-[:CAN_ACCESS]->(resource)` exposure edge        |
+| `HAS_FINDING`          | `(resource)-[:HAS_FINDING]->(:ProwlerFinding)` finding link |
+| `TRUSTS_AWS_PRINCIPAL` | Role trust relationship                                     |
+| `STS_ASSUMEROLE_ALLOW` | Can assume role                                             |

 ---

 ## Parameters

-For queries requiring user input:
+For queries that take user input:

 ```python
 parameters=[
@@ -438,50 +407,83 @@ parameters=[

 ---

-## Best practices
+## openCypher compatibility

-1. **Chain all MATCHes from the root account node**: Every `MATCH` clause must connect to the `aws` variable (or another variable already bound to the account's subgraph). An unanchored `MATCH` would return nodes from all providers.
+Queries must run on both Neo4j and Amazon Neptune. Avoid these constructs:

-   ```cypher
-   // WRONG: matches ALL AWSRoles across all providers
-   MATCH (role:AWSRole) WHERE role.name = 'admin'
+| Feature                                 | Use instead                                                                                                                                 |
+| --------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------- |
+| APOC procedures (`apoc.*`)              | Real nodes and relationships in the graph                                                                                                   |
+| Neptune extensions                      | Standard openCypher                                                                                                                         |
+| `reduce()`                              | `UNWIND` + `collect()`                                                                                                                      |
+| `FOREACH`                               | `WITH` + `UNWIND` + `SET`                                                                                                                   |
+| Regex `=~`                              | `toLower()` + exact match, or `STARTS WITH` / `CONTAINS`                                                                                    |
+| `CALL () { UNION }`                     | Multi-label `OR` in `WHERE` (see pattern above)                                                                                             |
+| `any(x IN list ...)`                    | `size([x IN list WHERE pred]) > 0`                                                                                                          |
+| `all(x IN list ...)`                    | `size([x IN list WHERE pred]) = size(list)`                                                                                                 |
+| `none(x IN list ...)`                   | `size([x IN list WHERE pred]) = 0`                                                                                                          |
+| `EXISTS { MATCH (pattern) WHERE pred }` | Standalone `MATCH (pattern)` + `WHERE pred`; precede the downstream `collect(path...)` with `WITH DISTINCT <path-vars>` to dedupe the joins |

-   // CORRECT: scoped to the specific account's subgraph
-   MATCH (aws)--(role:AWSRole) WHERE role.name = 'admin'
-   ```
-
-   **Exception**: A second-permission MATCH like `MATCH (principal)--(policy2:AWSPolicy)--(stmt2:AWSPolicyStatement)` is safe because `principal` is already bound to the account's subgraph by the first MATCH. It does not need to chain from `aws` again.
-
-2. **Include Prowler findings**: Always add `OPTIONAL MATCH (n)-[pfr]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})` with `collect(DISTINCT pf)`.
-
-3. **Comment the query purpose**: Add inline comments explaining each MATCH clause.
-
-4. **Never use internal labels in queries**: `_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*` are for system isolation. They should never appear in predefined or custom query text.
-
-6. **Internet node uses path connectivity**: Reach it via `OPTIONAL MATCH (internet:Internet)-[can_access:CAN_ACCESS]->(resource)` where `resource` is already scoped by the account anchor. No standalone lookup.
+For list-typed properties in the catalog (action, resource, and so on), traverse the `HAS_*` edges to the child item nodes via the multi-`MATCH` shape shown in "List-typed properties as child nodes". The parent node does not carry the list as a single field, so `split(...)` and comma-string predicates do not apply.
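The constructs in the table can be caught early with a rough text scan. The sketch below is hypothetical (the repository is not known to ship such a check): `find_disallowed` and the regular expressions are illustrative, and because it is a plain text match it will also flag occurrences inside comments or string literals.

```python
import re

# One pattern per row of the "avoid" table; names are for reporting only.
DISALLOWED = {
    "APOC procedure": re.compile(r"\bapoc\.", re.IGNORECASE),
    "reduce()": re.compile(r"\breduce\s*\(", re.IGNORECASE),
    "FOREACH": re.compile(r"\bFOREACH\b", re.IGNORECASE),
    "regex =~": re.compile(r"=~"),
    "any()/all()/none()": re.compile(r"\b(any|all|none)\s*\(", re.IGNORECASE),
    "EXISTS { ... }": re.compile(r"\bEXISTS\s*\{", re.IGNORECASE),
    "CALL subquery": re.compile(r"\bCALL\s*(\([^)]*\))?\s*\{", re.IGNORECASE),
}


def find_disallowed(cypher: str) -> list[str]:
    """Return the names of disallowed constructs found in the query text."""
    return [name for name, pattern in DISALLOWED.items() if pattern.search(cypher)]


if __name__ == "__main__":
    legacy = "MATCH (n) WHERE any(a IN n.list WHERE a = '*') RETURN n"
    print(find_disallowed(legacy))
```

A scan like this only flags candidates for review; whether a query actually runs on both sinks still has to be confirmed against Neo4j and Neptune.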

 ---

-## openCypher compatibility
+## Best practices

-Queries must be written in **openCypher Version 9** for compatibility with both Neo4j and Amazon Neptune.
+1. **Chain every MATCH from the account anchor.** An unanchored `MATCH (role:AWSRole)` returns roles from every provider in the graph; `MATCH (aws)--(role:AWSRole)` is scoped. A second-permission MATCH such as `MATCH (principal)-[:POLICY]->(policy2:AWSPolicy)-[:STATEMENT]->(stmt2:AWSPolicyStatement {effect: 'Allow'})` is safe without re-chaining from `aws` because `principal` is already bound to the account's subgraph.
+2. **Type the finding probe.** Always `OPTIONAL MATCH (n)-[pfr:HAS_FINDING]-(pf:{PROWLER_FINDING_LABEL} {{status: 'FAIL'}})`. The type lets Neptune apply an inline edge filter; an untyped probe scans every incident edge of high-degree nodes.
+3. **Comment each MATCH.** One inline `// ...` line per clause explaining its role.
+4. **Never use internal labels.** `_ProviderResource`, `_AWSResource`, `_Tenant_*`, `_Provider_*` are system isolation labels and must not appear in query text (predefined or custom).
+5. **Reach the Internet node through path connectivity** via `(internet:Internet)-[:CAN_ACCESS]->(resource)`, never as a standalone match.
+6. **Preserve the `RETURN` contract.** `paths, dpf, dpfr` for the standard shape; add `internet, can_access` for network-exposure queries. The serializer and visualiser depend on these names.

-### Avoid these (not in openCypher spec)
+---

-| Feature                    | Use instead                                            |
-| -------------------------- | ------------------------------------------------------ |
-| APOC procedures (`apoc.*`) | Real nodes and relationships in the graph              |
-| Neptune extensions         | Standard openCypher                                    |
-| `reduce()` function        | `UNWIND` + `collect()`                                 |
-| `FOREACH` clause           | `WITH` + `UNWIND` + `SET`                              |
-| Regex operator (`=~`)      | `toLower()` + exact match, or `CONTAINS`/`STARTS WITH`. One legacy query uses `=~` - do not add new usages |
-| `CALL () { UNION }`        | Multi-label OR in WHERE (see patterns section)         |
+## Naming conventions
+
+- **ID**: kebab-case `{provider}-{category}-{description}`, e.g. `aws-ec2-privesc-passrole-iam`.
+- **Constant**: SHOUTING_SNAKE_CASE `{PROVIDER}_{CATEGORY}_{DESCRIPTION}`, e.g. `AWS_EC2_PRIVESC_PASSROLE_IAM`.
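The two conventions map onto each other mechanically, which a short sketch makes explicit. `constant_name` is a hypothetical helper, and the kebab-case validation pattern is an assumption about what counts as a well-formed ID.

```python
import re

# Assumed shape of a query ID: lower-case alphanumeric words joined by hyphens.
KEBAB_ID = re.compile(r"[a-z0-9]+(-[a-z0-9]+)+")


def constant_name(query_id: str) -> str:
    """Derive the module-level constant name from a kebab-case query ID."""
    if not KEBAB_ID.fullmatch(query_id):
        raise ValueError(f"not a kebab-case query id: {query_id!r}")
    return query_id.replace("-", "_").upper()


if __name__ == "__main__":
    print(constant_name("aws-ec2-privesc-passrole-iam"))
```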
+
+---
+
+## Creation steps
+
+1. **Read the queries module first** to match the existing style:
+
+   ```text
+   api/src/backend/api/attack_paths/queries/
+   ├── __init__.py
+   ├── types.py         # dataclass definitions
+   ├── registry.py
+   └── {provider}.py
+   ```
+
+2. **Fetch the Cartography schema for the pinned version.** Do not guess labels, properties, or relationships. Read the dependency pin:
+
+   ```bash
+   grep cartography api/pyproject.toml
+   ```
+
+   Then fetch the schema for that exact tag:
+
+   ```text
+   # Git pin (prowler-cloud/cartography@<TAG>):
+   https://raw.githubusercontent.com/prowler-cloud/cartography/refs/tags/<TAG>/docs/root/modules/{provider}/schema.md
+
+   # PyPI pin (cartography==<TAG>):
+   https://raw.githubusercontent.com/cartography-cncf/cartography/refs/tags/<TAG>/docs/root/modules/{provider}/schema.md
+   ```
+
+3. **Build the query** using the canonical predefined template plus the appropriate sub-pattern (privilege escalation or network exposure). For list-typed properties (action/resource/etc.), traverse the exploded child nodes via `[:HAS_ACTION]->(:AWSPolicyStatementActionItem)` etc. (see "List-typed properties as child nodes" and the `AWS_NORMALIZED_LISTS` catalog).
+
+4. **Register** the constant in the `{PROVIDER}_QUERIES` list at the bottom of the provider file.
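
For step 2, the schema URL can be assembled from the pinned tag instead of being typed by hand. This is a sketch: `schema_url` is a hypothetical helper, and the tag shown is only an illustrative value; use whatever `grep cartography api/pyproject.toml` reports.

```python
SCHEMA_URL = (
    "https://raw.githubusercontent.com/{org}/cartography/"
    "refs/tags/{tag}/docs/root/modules/{provider}/schema.md"
)


def schema_url(org: str, tag: str, provider: str) -> str:
    """Build the raw schema URL for a pinned Cartography tag."""
    return SCHEMA_URL.format(org=org, tag=tag, provider=provider)


if __name__ == "__main__":
    # Git pin: org is prowler-cloud; PyPI pin: org is cartography-cncf.
    print(schema_url("prowler-cloud", "0.126.1", "aws"))  # tag is illustrative
```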

 ---

 ## Reference

- **pathfinding.cloud**: https://github.com/DataDog/pathfinding.cloud (use `curl | jq`, not WebFetch)
- **Cartography schema**: `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{version}/docs/root/modules/{provider}/schema.md`
- **Neptune openCypher compliance**: https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html
- **openCypher spec**: https://github.com/opencypher/openCypher
+- **pathfinding.cloud**: https://github.com/DataDog/pathfinding.cloud (use `curl | jq`; the aggregated `paths.json` is too large for WebFetch).
+- **Cartography schema** (per pinned tag): `https://raw.githubusercontent.com/{org}/cartography/refs/tags/{tag}/docs/root/modules/{provider}/schema.md`.
+- **Neptune openCypher compliance**: https://docs.aws.amazon.com/neptune/latest/userguide/feature-opencypher-compliance.html.
+- **openCypher spec**: https://github.com/opencypher/openCypher.
+- **Sync converter** (`tasks/jobs/attack_paths/sync.py`): list-typed node properties listed in `tasks/jobs/attack_paths/provider_config.py::AWS_NORMALIZED_LISTS` are materialised as child item nodes + `HAS_*` edges. Properties that are not in the catalog are serialised to a comma-delimited string and emit a one-time warning. Dict-typed properties become JSON strings. Same shape on both sinks.