mirror of https://github.com/prowler-cloud/prowler.git
synced 2025-12-19 05:17:47 +00:00

Compare commits (1 commit): d15dd53708 ... lighthouse

| Author | SHA1 | Date |
|---|---|---|
|  | fc661f11f1 |  |
@@ -1,140 +0,0 @@
---
title: 'Extending Prowler Lighthouse AI'
---

This guide helps developers customize and extend Prowler Lighthouse AI by adding or modifying AI agents.

## Understanding AI Agents

AI agents combine Large Language Models (LLMs) with specialized tools that provide environmental context. These tools can include API calls, system command execution, or any function-wrapped capability.

### Types of AI Agents

AI agents fall into two main categories:

- **Autonomous Agents**: Freely choose from available tools to complete tasks, adapting their approach based on context. They decide which tools to use and when.
- **Workflow Agents**: Follow structured paths with predefined logic. They execute specific tool sequences and can include conditional logic.

Prowler Lighthouse AI is an autonomous agent, selecting the right tool(s) based on the user's query.

<Note>
To learn more about AI agents, read [Anthropic's blog post on building effective agents](https://www.anthropic.com/engineering/building-effective-agents).
</Note>

### LLM Dependency

The autonomous nature of agents depends on the underlying LLM. Autonomous agents using identical system prompts and tools but powered by different LLM providers might approach user queries differently. An agent with one LLM might solve a problem efficiently, while with another it might take a different route or fail entirely.

After evaluating multiple LLM providers (OpenAI, Gemini, Claude, Llama) based on tool-calling features and response accuracy, we recommend using the `gpt-4o` model.

## Prowler Lighthouse AI Architecture

Prowler Lighthouse AI uses a multi-agent architecture orchestrated by the [Langgraph-Supervisor](https://www.npmjs.com/package/@langchain/langgraph-supervisor) library.

### Architecture Components

<img src="/images/prowler-app/lighthouse-architecture.png" alt="Prowler Lighthouse architecture" />

Prowler Lighthouse AI integrates with the NextJS application:

- The [Langgraph-Supervisor](https://www.npmjs.com/package/@langchain/langgraph-supervisor) library integrates directly with NextJS
- The system uses the authenticated user session to interact with the Prowler API server
- Agents only access data the current user is authorized to view
- Session management operates automatically, ensuring Role-Based Access Control (RBAC) is maintained

## Available Prowler AI Agents

The following specialized AI agents are available in Prowler:

### Agent Overview

- **provider_agent**: Fetches information about cloud providers connected to Prowler
- **user_info_agent**: Retrieves information about Prowler users
- **scans_agent**: Fetches information about Prowler scans
- **compliance_agent**: Retrieves compliance overviews across scans
- **findings_agent**: Fetches information about individual findings across scans
- **overview_agent**: Retrieves overview information (providers, findings by status and severity, etc.)

## How to Add New Capabilities

### Updating the Supervisor Prompt

The supervisor agent controls system behavior, tone, and capabilities. You can find the supervisor prompt at: [https://github.com/prowler-cloud/prowler/blob/master/ui/lib/lighthouse/prompts.ts](https://github.com/prowler-cloud/prowler/blob/master/ui/lib/lighthouse/prompts.ts)

#### Supervisor Prompt Modifications

Modifying the supervisor prompt allows you to:

- Change personality or response style
- Add new high-level capabilities
- Modify task delegation to specialized agents
- Set up guardrails (query types to answer or decline), as in the sketch below
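
As an illustration only, a guardrail addition to the supervisor prompt might look like the following. The wording is an assumption for this sketch; the actual prompt lives in `ui/lib/lighthouse/prompts.ts` and differs:

```js
// Hypothetical excerpt of a supervisor prompt with guardrails (illustrative wording only).
export const supervisorPrompt = `
You are Prowler Lighthouse AI, an assistant for cloud security questions.

Guardrails:
- Only answer questions about the user's Prowler data (providers, scans, findings, compliance).
- Politely decline requests to modify cloud resources or to reveal credentials and secrets.
- Delegate findings questions to findings_agent and compliance questions to compliance_agent.
`;
```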

<Note>
The supervisor agent should not have its own tools. This design keeps the system modular and maintainable.
</Note>

### How to Create New Specialized Agents

The supervisor agent and all specialized agents are defined in the `route.ts` file. The supervisor agent uses [langgraph-supervisor](https://www.npmjs.com/package/@langchain/langgraph-supervisor), while other agents use the prebuilt [create-react-agent](https://langchain-ai.github.io/langgraphjs/how-tos/create-react-agent/).

To add new capabilities or allow Lighthouse AI to interact with other APIs, create additional specialized agents:

1. First, determine what the new agent should do. Create a detailed prompt defining the agent's purpose and capabilities. You can see an example [here](https://github.com/prowler-cloud/prowler/blob/master/ui/lib/lighthouse/prompts.ts#L359-L385).

<Note>
Ensure that the new agent's capabilities don't collide with those of existing agents. For example, if there's already a *findings_agent* that talks to the findings APIs, don't create a new agent to do the same.
</Note>

2. Create the necessary tools for the agent to access specific data or perform actions. A tool is a specialized function that extends the capabilities of the LLM by allowing it to access external data or APIs. A tool is triggered by the LLM based on the tool's description and the user's query.

For example, the description of `getScanTool` is "Fetches detailed information about a specific scan by its ID." If the description doesn't convey what the tool is capable of doing, the LLM will not invoke the function. If the description of `getScanTool` were set to something random or not set at all, the LLM would not answer queries like "Give me the critical issues from the scan ID xxxxxxxxxxxxxxx".
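
A minimal sketch of such a tool is shown below, using the LangChain `tool` helper and a `zod` schema. The endpoint path matches the scan endpoint listed elsewhere in these docs; the fetch call, session handling, and error handling are simplified assumptions rather than the actual Prowler implementation:

```js
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Sketch of a tool the LLM can invoke when it needs details about one scan.
const getScanTool = tool(
  async ({ scanId }) => {
    // Assumption: the authenticated user's session is attached by the NextJS server.
    const response = await fetch(`/api/v1/scans/${scanId}`);
    return JSON.stringify(await response.json());
  },
  {
    name: "getScanTool",
    // The description is what the LLM uses to decide whether to call this tool.
    description: "Fetches detailed information about a specific scan by its ID.",
    schema: z.object({
      scanId: z.string().describe("The ID of the Prowler scan to fetch"),
    }),
  }
);
```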

<Note>
Ensure that each tool is added to only one agent. Adding tools is optional; there can be agents with no tools at all.
</Note>

3. Use the `createReactAgent` function to define a new agent. For example, the following `rolesAgent` is named "roles_agent" and can call the *getRolesTool* and *getRoleTool* tools:

```js
const rolesAgent = createReactAgent({
  llm: llm,
  tools: [getRolesTool, getRoleTool],
  name: "roles_agent",
  prompt: rolesAgentPrompt,
});
```

4. Create a detailed prompt defining the agent's purpose and capabilities.

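For illustration, the `rolesAgentPrompt` referenced above might look like the following. The wording is an assumption for this sketch, not an actual Prowler prompt:

```js
// Hypothetical prompt for the roles_agent (illustrative wording only).
const rolesAgentPrompt = `You are roles_agent. You answer questions about Prowler roles and permissions.

- Use getRolesTool to list all roles and getRoleTool to fetch a single role by its ID.
- Return only the fields needed to answer the supervisor's request.
- If a role cannot be found, say so instead of guessing.`;
```
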
5. Add the new agent to the available agents list:

```js
const agents = [
  userInfoAgent,
  providerAgent,
  overviewAgent,
  scansAgent,
  complianceAgent,
  findingsAgent,
  rolesAgent, // New agent added here
];

// Create supervisor workflow
const workflow = createSupervisor({
  agents: agents,
  llm: supervisorllm,
  prompt: supervisorPrompt,
  outputMode: "last_message",
});
```

6. Update the supervisor's system prompt to summarize the new agent's capabilities.

### Best Practices for Agent Development

When developing new agents or capabilities:

- **Clear Responsibility Boundaries**: Each agent should have a defined purpose with minimal overlap. No two agents should access the same tools, and no two tools should call the same Prowler APIs.
- **Minimal Data Access**: Agents should only request the data they need, keeping requests specific to minimize context window usage, cost, and response time.
- **Thorough Prompting**: Ensure agent prompts include clear instructions about:
  - The agent's purpose and limitations
  - How to use its tools
  - How to format responses for the supervisor
  - Error handling procedures (optional)
- **Security Considerations**: Agents should never modify data or access sensitive information like secrets or credentials.
- **Testing**: Thoroughly test new agents with various queries before deploying to production.

@@ -267,7 +267,6 @@

"developer-guide/outputs",
"developer-guide/integrations",
"developer-guide/security-compliance-framework",
"developer-guide/lighthouse",
"developer-guide/mcp-server"
]
},

@@ -59,6 +59,12 @@ Prowler Lighthouse AI is powerful, but there are limitations:

- **NextJS session dependence**: If your Prowler application session expires or you log out, Lighthouse AI will error out. Refresh and log back in to continue.
- **Response quality**: The response quality depends on the selected LLM provider and model. Choose models with strong tool-calling capabilities for best results. We recommend the `gpt-5` model from OpenAI.

## Extending Lighthouse AI

Lighthouse AI retrieves data through Prowler MCP. To add new capabilities, extend the Prowler MCP Server with additional tools and let Lighthouse AI discover them automatically.

For development details, see [Extending the MCP Server](https://docs.prowler.com/developer-guide/mcp-server#extending-the-mcp-server).

### Getting Help

If you encounter issues with Prowler Lighthouse AI or have suggestions for improvements, please [reach out through our Slack channel](https://goto.prowler.com/slack).

@@ -67,94 +73,6 @@ If you encounter issues with Prowler Lighthouse AI or have suggestions for impro

The following API endpoints are accessible to Prowler Lighthouse AI. Data from these endpoints could be shared with the LLM provider depending on the scope of the user's query:

#### Accessible API Endpoints

**User Management:**

- List all users - `/api/v1/users`
- Retrieve the current user's information - `/api/v1/users/me`

**Provider Management:**

- List all providers - `/api/v1/providers`
- Retrieve data from a provider - `/api/v1/providers/{id}`

**Scan Management:**

- List all scans - `/api/v1/scans`
- Retrieve data from a specific scan - `/api/v1/scans/{id}`

**Resource Management:**

- List all resources - `/api/v1/resources`
- Retrieve data for a resource - `/api/v1/resources/{id}`

**Findings Management:**

- List all findings - `/api/v1/findings`
- Retrieve data from a specific finding - `/api/v1/findings/{id}`
- Retrieve metadata values from findings - `/api/v1/findings/metadata`

**Overview Data:**

- Get aggregated findings data - `/api/v1/overviews/findings`
- Get findings data by severity - `/api/v1/overviews/findings_severity`
- Get aggregated provider data - `/api/v1/overviews/providers`
- Get findings data by service - `/api/v1/overviews/services`

**Compliance Management:**

- List compliance overviews (optionally filter by scan) - `/api/v1/compliance-overviews`
- Retrieve data from a specific compliance overview - `/api/v1/compliance-overviews/{id}`

#### Excluded API Endpoints

Not all Prowler API endpoints are integrated with Lighthouse AI. Some are intentionally excluded for the following reasons:

- OpenAI/other LLM providers shouldn't have access to sensitive data (like provider secrets and other sensitive configuration)
- User queries don't need responses from those API endpoints (for example: tasks, tenant details, downloading zip files, etc.)

**Excluded Endpoints:**

**User Management:**

- Retrieve a specific user's information - `/api/v1/users/{id}`
- List user memberships - `/api/v1/users/{user_pk}/memberships`
- Retrieve membership data from the user - `/api/v1/users/{user_pk}/memberships/{id}`

**Tenant Management:**

- List all tenants - `/api/v1/tenants`
- Retrieve data from a tenant - `/api/v1/tenants/{id}`
- List tenant memberships - `/api/v1/tenants/{tenant_pk}/memberships`
- List all invitations - `/api/v1/tenants/invitations`
- Retrieve data from a tenant invitation - `/api/v1/tenants/invitations/{id}`

**Security and Configuration:**

- List all secrets - `/api/v1/providers/secrets`
- Retrieve data from a secret - `/api/v1/providers/secrets/{id}`
- List all provider groups - `/api/v1/provider-groups`
- Retrieve data from a provider group - `/api/v1/provider-groups/{id}`

**Reports and Tasks:**

- Download zip report - `/api/v1/scans/{id}/report`
- List all tasks - `/api/v1/tasks`
- Retrieve data from a specific task - `/api/v1/tasks/{id}`

**Lighthouse AI Configuration:**

- List LLM providers - `/api/v1/lighthouse/providers`
- Retrieve LLM provider - `/api/v1/lighthouse/providers/{id}`
- List available models - `/api/v1/lighthouse/models`
- Retrieve tenant configuration - `/api/v1/lighthouse/configuration`

<Note>
Agents can only call GET endpoints. They don't have access to other HTTP methods.
</Note>

## FAQs

**1. Which LLM providers are supported?**

@@ -167,13 +85,21 @@ Lighthouse AI supports three providers:

For detailed configuration instructions, see [Using Multiple LLM Providers with Lighthouse](/user-guide/tutorials/prowler-app-lighthouse-multi-llm).

**2. Why a multi-agent supervisor model?**

**2. Why don't some models appear in Lighthouse AI?**

Context windows are limited. While demo data fits inside the context window, querying real-world data often exceeds it. A multi-agent architecture is used so different agents fetch different sizes of data and respond with the minimum required data to the supervisor. This spreads the context window usage across agents.

LLM providers offer different types of models. Not every model can be integrated with Lighthouse AI (for example, text-to-speech, vision, embedding, computer use, etc.).

Lighthouse AI requires models that support:

- Text input
- Text output
- Tool calling

Lighthouse AI [automatically filters](https://github.com/prowler-cloud/prowler/blob/master/api/src/backend/tasks/jobs/lighthouse_providers.py#L341-L353) out models that do not support these capabilities, so some provider models may not appear in the Lighthouse AI model list.

**3. Is my security data shared with LLM providers?**

Minimal data is shared to generate useful responses. Agents can access security findings and remediation details when needed. Provider secrets are protected by design and cannot be read. The LLM provider credentials configured with Lighthouse AI are only accessible to our NextJS server and are never sent to the LLM providers. Resource metadata (names, tags, account/project IDs, etc) may be shared with the configured LLM provider based on query requirements.

Minimal data is shared to generate useful responses. The agent can access security findings and remediation details when needed. Provider secrets are protected by design and cannot be read. The LLM provider credentials configured with Lighthouse AI are only accessible to the Next.js server and are never sent to the LLM providers. Resource metadata (names, tags, account/project IDs, etc.) may be shared with the configured LLM provider based on query requirements.

**4. Can the Lighthouse AI change my cloud environment?**

BIN docs/user-guide/img/lighthouse-architecture-dark.png (new file, binary file not shown; after: 267 KiB)
BIN docs/user-guide/img/lighthouse-architecture-light.png (new file, binary file not shown; after: 265 KiB)
Binary file not shown (before: 178 KiB)
@@ -22,7 +22,7 @@ For Lighthouse AI to work properly, models **must** support all of the following

- **Text input**: Ability to receive text prompts.
- **Text output**: Ability to generate text responses.
- **Tool calling**: Ability to invoke tools and functions.
- **Tool calling**: Ability to invoke tools and functions to retrieve data from Prowler.

If any of these capabilities are missing, the model will not be compatible with Lighthouse AI.

@@ -4,28 +4,37 @@ title: 'How It Works'

import { VersionBadge } from "/snippets/version-badge.mdx"

<VersionBadge version="5.8.0" />
<VersionBadge version="5.16.0" />

Prowler Lighthouse AI integrates Large Language Models (LLMs) with Prowler security findings data.

Here's what's happening behind the scenes:
Behind the scenes, Lighthouse AI works as follows:

- Lighthouse AI runs as a [Langchain agent](https://docs.langchain.com/oss/javascript/langchain/agents) in NextJS
- The agent connects to the configured LLM provider to understand the prompt and decide what data is needed
- The agent accesses Prowler data through [Prowler MCP](https://docs.prowler.com/getting-started/products/prowler-mcp), which exposes tools from multiple sources, including:
  - Prowler Hub
  - Prowler Docs
  - Prowler App
- Instead of calling every tool directly, the agent uses two meta-tools:
  - `describe_tool` to retrieve a tool schema and parameter requirements.
  - `execute_tool` to run the selected tool with the required input.
- Based on the user's query and the data necessary to answer it, the Lighthouse agent invokes the necessary Prowler MCP tools using `describe_tool` and `execute_tool`.

- The system uses a multi-agent architecture built with [LanggraphJS](https://github.com/langchain-ai/langgraphjs) for LLM logic and [Vercel AI SDK UI](https://sdk.vercel.ai/docs/ai-sdk-ui/overview) for the frontend chatbot.
- It uses a ["supervisor" architecture](https://langchain-ai.lang.chat/langgraphjs/tutorials/multi_agent/agent_supervisor/) that interacts with different agents for specialized tasks. For example, `findings_agent` can analyze detected security findings, while `overview_agent` provides a summary of connected cloud accounts.
- The system connects to the configured LLM provider to understand the user's query, fetches the right data, and responds to the query.
<Note>
Lighthouse AI supports multiple LLM providers including OpenAI, Amazon Bedrock, and OpenAI-compatible services. For configuration details, see [Using Multiple LLM Providers with Lighthouse](/user-guide/tutorials/prowler-app-lighthouse-multi-llm).
</Note>
- The supervisor agent is the main contact point. It is what users interact with directly from the chat interface. It coordinates with other agents to answer users' questions comprehensively.

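As a rough illustration of the meta-tool pattern described above, the agent's two calls for a query like "How many critical findings do I have?" might look as follows. The `list_findings` tool name and the argument field names are assumptions for this sketch, not the actual Prowler MCP schema:

```js
// Sketch only: the agent first asks for a tool's schema, then executes it.
const discovery = {
  tool: "describe_tool",
  arguments: { tool_name: "list_findings" }, // hypothetical Prowler MCP tool
};

const execution = {
  tool: "execute_tool",
  arguments: {
    tool_name: "list_findings",
    tool_input: { severity: "critical" }, // assumed input shape
  },
};
```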

<img src="/images/prowler-app/lighthouse-architecture.png" alt="Lighthouse AI Architecture" />
<img className="block dark:hidden" src="/images/prowler-app/lighthouse-architecture-light.png" alt="Prowler Lighthouse Architecture" />
<img className="hidden dark:block" src="/images/prowler-app/lighthouse-architecture-dark.png" alt="Prowler Lighthouse Architecture" />

<Note>
All agents can only read relevant security data. They cannot modify your data or access sensitive information like configured secrets or tenant details.
Lighthouse AI can only read relevant security data. It cannot modify data or access sensitive information such as configured secrets or tenant details.
</Note>

## Set up
## Set Up

Getting started with Prowler Lighthouse AI is easy:
@@ -43,11 +52,11 @@ For detailed configuration instructions for each provider, see [Using Multiple L

### Adding Business Context

The optional business context field lets you provide additional information to help Lighthouse AI understand your environment and priorities, including:
The optional business context field lets teams provide additional information to help Lighthouse AI understand environment priorities, including:

- Your organization's cloud security goals
- Organization cloud security goals
- Information about account owners or responsible teams
- Compliance requirements for your organization
- Compliance requirements
- Current security initiatives or focus areas

Better context leads to more relevant responses and prioritization that aligns with your needs.
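
For illustration, a business context entry covering the points above might read like the following. The account IDs and team names are placeholders, not real data:

```text
Our primary goal this quarter is reducing critical findings in production AWS accounts.
Account 111122223333 (production) is owned by the Platform team; 444455556666 (sandbox) by QA.
We must maintain CIS AWS Foundations and SOC 2 compliance.
Current focus areas: S3 public access and IAM least privilege.
```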