docs(lighthouse): update lighthouse architecture docs (#9576)
Co-authored-by: Chandrapal Badshah <12944530+Chan9390@users.noreply.github.com>
Co-authored-by: Rubén De la Torre Vico <ruben@prowler.com>
Co-authored-by: Andoni Alonso <14891798+andoniaf@users.noreply.github.com>
Committed by GitHub (parent 05466cff22, commit 6c01151d78)
407  docs/developer-guide/lighthouse-architecture.mdx  (new file)
@@ -0,0 +1,407 @@
---
title: 'Lighthouse AI Architecture'
---

This document describes the internal architecture of Prowler Lighthouse AI, enabling developers to understand how components interact and where to add new functionality.

<Info>
**Looking for user documentation?** See:

- [Lighthouse AI Overview](/getting-started/products/prowler-lighthouse-ai) - Capabilities and FAQs
- [How Lighthouse AI Works](/user-guide/tutorials/prowler-app-lighthouse) - Configuration and usage
- [Multi-LLM Provider Setup](/user-guide/tutorials/prowler-app-lighthouse-multi-llm) - Provider configuration
</Info>
## Architecture Overview

Lighthouse AI operates as a Langchain-based agent that connects Large Language Models (LLMs) with Prowler security data through the Model Context Protocol (MCP).

<img className="block dark:hidden" src="/images/lighthouse-architecture-light.png" alt="Prowler Lighthouse Architecture" />
<img className="hidden dark:block" src="/images/lighthouse-architecture-dark.png" alt="Prowler Lighthouse Architecture" />

### Three-Tier Architecture

The system follows a three-tier architecture:

1. **Frontend (Next.js)**: Chat interface, message rendering, model selection
2. **API Route**: Request handling, authentication, stream transformation
3. **Langchain Agent**: LLM orchestration, tool calling through MCP

### Request Flow

When a user sends a message through the Lighthouse chat interface, the system processes it through several stages (a simplified client-side sketch follows these steps):

1. **User Submits a Message**.

   The chat component (`ui/components/lighthouse/chat.tsx`) captures the user's question (e.g., "What are my critical findings in AWS?") and sends it as an HTTP POST request to the backend API route.

2. **Authentication and Context Assembly**.

   The API route (`ui/app/api/lighthouse/analyst/route.ts`) validates the user's session, extracts the JWT token (stored via `auth-context.ts`), and gathers context including the tenant's business context and current security posture data (assembled in `data.ts`).

3. **Agent Initialization**.

   The workflow orchestrator (`ui/lib/lighthouse/workflow.ts`) creates a Langchain agent configured with:

   - The selected LLM, instantiated through the factory (`llm-factory.ts`)
   - A system prompt containing available tools and instructions (`system-prompt.ts`)
   - Two meta-tools (`describe_tool` and `execute_tool`) for accessing Prowler data

4. **LLM Reasoning and Tool Calling**.

   The agent sends the conversation to the LLM, which decides whether to respond directly or call tools to fetch data. When tools are needed, the meta-tools in `ui/lib/lighthouse/tools/meta-tool.ts` interact with the MCP client (`mcp-client.ts`) to:

   - First call `describe_tool` to understand the tool's parameters
   - Then call `execute_tool` to retrieve data from the MCP Server
   - Continue reasoning with the returned data

5. **Streaming Response**.

   As the LLM generates its response, the stream handler (`ui/lib/lighthouse/analyst-stream.ts`) transforms Langchain events into UI-compatible messages and streams tokens back to the browser in real time using Server-Sent Events. The stream includes both text tokens and tool execution events (displayed as "chain of thought").

6. **Message Rendering**.

   The frontend receives the stream and renders it through `message-item.tsx` with markdown formatting. Any tool calls that occurred during reasoning are displayed via `chain-of-thought-display.tsx`.
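To make the frontend side of this flow concrete, the sketch below shows the general shape of the round trip: the browser posts the chat history to the analyst route and reads the streamed response incrementally. It is a simplified illustration rather than the actual `chat.tsx` code; the request body fields (`messages`, `model`, `provider`) mirror what the API route parses, while the plain-text stream handling is an assumption.

```typescript
// Simplified sketch of the chat round trip (not the actual chat.tsx implementation).
// Assumes the analyst route accepts { messages, model, provider } and streams text back.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

export async function askLighthouse(
  messages: ChatMessage[],
  model: string,
  provider: string,
): Promise<string> {
  const res = await fetch("/api/lighthouse/analyst", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages, model, provider }),
  });
  if (!res.ok || !res.body) {
    throw new Error(`Lighthouse request failed with status ${res.status}`);
  }

  // Read the streamed response incrementally as tokens arrive.
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let answer = "";
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    answer += decoder.decode(value, { stream: true });
  }
  return answer;
}
```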
## Frontend Components

Frontend components reside in `ui/components/lighthouse/` and handle the chat interface and configuration workflows.

### Core Components

| Component | Location | Purpose |
|-----------|----------|---------|
| `chat.tsx` | `ui/components/lighthouse/` | Main chat interface managing message history and input handling |
| `message-item.tsx` | `ui/components/lighthouse/` | Individual message rendering with markdown support |
| `select-model.tsx` | `ui/components/lighthouse/` | Model and provider selection dropdown |
| `chain-of-thought-display.tsx` | `ui/components/lighthouse/` | Displays tool calls and reasoning steps during execution |

### Configuration Components

| Component | Location | Purpose |
|-----------|----------|---------|
| `lighthouse-settings.tsx` | `ui/components/lighthouse/` | Settings panel for business context and preferences |
| `connect-llm-provider.tsx` | `ui/components/lighthouse/` | Provider connection workflow |
| `llm-providers-table.tsx` | `ui/components/lighthouse/` | Provider management table |
| `forms/delete-llm-provider-form.tsx` | `ui/components/lighthouse/forms/` | Provider deletion confirmation dialog |

### Supporting Components

| Component | Location | Purpose |
|-----------|----------|---------|
| `banner.tsx` / `banner-client.tsx` | `ui/components/lighthouse/` | Status banners and notifications |
| `workflow/` | `ui/components/lighthouse/workflow/` | Multi-step configuration workflows |
| `ai-elements/` | `ui/components/lighthouse/ai-elements/` | Custom UI primitives for the chat interface (input, select, dropdown, tooltip) |
## Library Code

Core library code resides in `ui/lib/lighthouse/` and handles agent orchestration, MCP communication, and stream processing.

### Workflow Orchestrator

**Location:** `ui/lib/lighthouse/workflow.ts`

The workflow module serves as the core orchestrator, responsible for:

- Initializing the Langchain agent with system prompt and tools
- Loading tenant configuration (default provider, model, business context)
- Creating the LLM instance through the factory
- Generating dynamic tool listings from available MCP tools

```typescript
// Simplified workflow initialization
export async function initLighthouseWorkflow(runtimeConfig?: RuntimeConfig) {
  await initializeMCPClient();

  const toolListing = generateToolListing();
  const systemPrompt = LIGHTHOUSE_SYSTEM_PROMPT_TEMPLATE.replace(
    "{{TOOL_LISTING}}",
    toolListing,
  );

  const llm = createLLM({
    provider: providerType,
    model: modelId,
    credentials,
    // ...
  });

  return createAgent({
    model: llm,
    tools: [describeTool, executeTool],
    systemPrompt,
  });
}
```
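The dynamic tool listing referenced above is, in essence, a text rendering of the tools discovered from the MCP server, substituted into the `{{TOOL_LISTING}}` placeholder. The sketch below illustrates the idea only; the real `generateToolListing` in `workflow.ts` takes no arguments and reads the discovered tools from the MCP client, and the summary shape and output format shown here are assumptions.

```typescript
// Illustrative sketch only: render discovered MCP tools as a listing for the system prompt.
// The summary shape and output format are assumptions, not the actual implementation.
interface MCPToolSummary {
  name: string;        // e.g. "prowler_app_search_findings"
  description: string; // one-line description surfaced to the LLM
}

export function renderToolListing(tools: MCPToolSummary[]): string {
  return tools.map((tool) => `- ${tool.name}: ${tool.description}`).join("\n");
}
```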
### MCP Client Manager

**Location:** `ui/lib/lighthouse/mcp-client.ts`

The MCP client manages connections to the Prowler MCP Server using a singleton pattern:

- **Connection Management**: Retry logic with configurable attempts and delays (sketched after the code excerpt below)
- **Tool Discovery**: Fetches available tools from the MCP server on initialization
- **Authentication Injection**: Automatically adds JWT tokens to `prowler_app_*` tool calls
- **Reconnection**: Supports forced reconnection after server restarts

Key constants:

- `MAX_RETRY_ATTEMPTS`: 3 connection attempts
- `RETRY_DELAY_MS`: 2000ms between retries
- `RECONNECT_INTERVAL_MS`: 5 minutes before retrying after a failure

```typescript
// Authentication injection for Prowler App tools
private handleBeforeToolCall = ({ name, args }) => {
  // Only inject auth for prowler_app_* tools (user-specific data)
  if (!name.startsWith("prowler_app_")) {
    return { args };
  }

  const accessToken = getAuthContext();
  return {
    args,
    headers: { Authorization: `Bearer ${accessToken}` },
  };
};
```
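As a rough illustration of the retry behavior described above, a connection helper built around those constants could look like the following. This is not the actual `mcp-client.ts` code; `connectOnce` stands in for whatever the client does to open a single connection to the MCP server.

```typescript
// Illustrative sketch only: retry loop shaped by the constants documented above.
const MAX_RETRY_ATTEMPTS = 3;
const RETRY_DELAY_MS = 2000;

async function connectWithRetry(connectOnce: () => Promise<void>): Promise<void> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= MAX_RETRY_ATTEMPTS; attempt++) {
    try {
      // Attempt a single connection to the MCP server.
      await connectOnce();
      return;
    } catch (error) {
      lastError = error;
      if (attempt < MAX_RETRY_ATTEMPTS) {
        // Wait before the next attempt.
        await new Promise((resolve) => setTimeout(resolve, RETRY_DELAY_MS));
      }
    }
  }
  throw lastError;
}
```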
### Meta-Tools

**Location:** `ui/lib/lighthouse/tools/meta-tool.ts`

Instead of registering all MCP tools directly with the agent, Lighthouse uses two meta-tools for dynamic tool discovery and execution:

| Tool | Purpose |
|------|---------|
| `describe_tool` | Retrieves full schema and parameter details for a specific tool |
| `execute_tool` | Executes a tool with provided parameters |

This pattern reduces the number of tools the LLM must track while maintaining access to all MCP capabilities.
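Conceptually, the two meta-tools are thin wrappers that forward to the MCP client. The sketch below shows one way such tools can be defined with Langchain's `tool` helper and `zod`; the argument names (`tool_name`, `parameters`) and the `mcpClient` methods are assumptions for illustration, not the actual `meta-tool.ts` definitions.

```typescript
import { tool } from "@langchain/core/tools";
import { z } from "zod";

// Illustrative sketch only; argument names and mcpClient methods are assumed.
declare const mcpClient: {
  describeTool(name: string): Promise<unknown>;
  callTool(name: string, parameters: Record<string, unknown>): Promise<unknown>;
};

export const describeTool = tool(
  async ({ tool_name }) => JSON.stringify(await mcpClient.describeTool(tool_name)),
  {
    name: "describe_tool",
    description: "Retrieve the full schema and parameter details for a specific MCP tool.",
    schema: z.object({ tool_name: z.string() }),
  },
);

export const executeTool = tool(
  async ({ tool_name, parameters }) =>
    JSON.stringify(await mcpClient.callTool(tool_name, parameters ?? {})),
  {
    name: "execute_tool",
    description: "Execute an MCP tool with the provided parameters.",
    schema: z.object({
      tool_name: z.string(),
      parameters: z.record(z.unknown()).optional(),
    }),
  },
);
```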
### Additional Library Modules

| Module | Location | Purpose |
|--------|----------|---------|
| `analyst-stream.ts` | `ui/lib/lighthouse/` | Transforms Langchain stream events to UI message format |
| `llm-factory.ts` | `ui/lib/lighthouse/` | Creates LLM instances for OpenAI, Bedrock, and OpenAI-compatible providers |
| `system-prompt.ts` | `ui/lib/lighthouse/` | System prompt template with dynamic tool listing injection |
| `auth-context.ts` | `ui/lib/lighthouse/` | AsyncLocalStorage for JWT token propagation across async boundaries |
| `types.ts` | `ui/lib/lighthouse/` | TypeScript type definitions |
| `constants.ts` | `ui/lib/lighthouse/` | Configuration constants and error messages |
| `utils.ts` | `ui/lib/lighthouse/` | Message conversion and model parameter extraction |
| `validation.ts` | `ui/lib/lighthouse/` | Input validation utilities |
| `data.ts` | `ui/lib/lighthouse/` | Current data section generation for context enrichment |
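The `auth-context.ts` module is small but important: it carries the user's JWT across async boundaries so the MCP client can read it when injecting auth headers. A minimal sketch of this pattern with Node's `AsyncLocalStorage` is shown below; the exact exports in the real module may differ.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

// Minimal sketch of JWT propagation across async boundaries.
export const authContextStorage = new AsyncLocalStorage<string>();

// Read the access token for the current request, wherever we are in the call chain.
export function getAuthContext(): string | undefined {
  return authContextStorage.getStore();
}

// Usage (as in the API route): run the whole request handler inside the store.
// await authContextStorage.run(session.accessToken, async () => { ... });
```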
## API Route

**Location:** `ui/app/api/lighthouse/analyst/route.ts`

The API route handles chat requests and manages the streaming response pipeline:

1. **Request Parsing**: Extracts messages, model, and provider from the request body
2. **Authentication**: Validates the session and extracts the access token
3. **Context Assembly**: Gathers business context and current data
4. **Agent Initialization**: Creates the Langchain agent with runtime configuration
5. **Stream Processing**: Transforms agent events to a UI-compatible format
6. **Error Handling**: Captures errors with Sentry integration

```typescript
export async function POST(req: Request) {
  const { messages, model, provider } = await req.json();

  const session = await auth();
  if (!session?.accessToken) {
    return Response.json({ error: "Unauthorized" }, { status: 401 });
  }

  return await authContextStorage.run(session.accessToken, async () => {
    const app = await initLighthouseWorkflow(runtimeConfig);
    const agentStream = app.streamEvents({ messages }, { version: "v2" });

    // Transform stream events to UI format
    const stream = new ReadableStream({
      async start(controller) {
        for await (const streamEvent of agentStream) {
          // Handle on_chat_model_stream, on_tool_start, on_tool_end, etc.
        }
      },
    });

    return createUIMessageStreamResponse({ stream });
  });
}
```
## Backend Components

Backend components handle LLM provider configuration, model management, and credential storage.

### Database Models

**Location:** `api/src/backend/api/models.py`

| Model | Purpose |
|-------|---------|
| `LighthouseProviderConfiguration` | Per-tenant LLM provider credentials (encrypted with Fernet) |
| `LighthouseTenantConfiguration` | Tenant-level settings including business context and default provider/model |
| `LighthouseProviderModels` | Available models per provider configuration |

All models implement Row-Level Security (RLS) for tenant isolation.

#### LighthouseProviderConfiguration

Stores provider-specific credentials for each tenant:

- **provider_type**: `openai`, `bedrock`, or `openai_compatible`
- **credentials**: Encrypted JSON containing API keys or AWS credentials
- **base_url**: Custom endpoint for OpenAI-compatible providers
- **is_active**: Connection validation status

#### LighthouseTenantConfiguration

Stores tenant-wide Lighthouse settings:

- **business_context**: Optional context for personalized responses
- **default_provider**: Default LLM provider type
- **default_models**: JSON mapping provider types to default model IDs

#### LighthouseProviderModels

Catalogs available models for each provider:

- **model_id**: Provider-specific model identifier
- **model_name**: Human-readable display name
- **default_parameters**: Optional model-specific parameters
### Background Jobs

**Location:** `api/src/backend/tasks/jobs/lighthouse_providers.py`

#### check_lighthouse_provider_connection

Validates provider credentials by making a test API call:

- OpenAI: Lists models via `client.models.list()`
- Bedrock: Lists foundation models via `bedrock_client.list_foundation_models()`
- OpenAI-compatible: Lists models via custom base URL

Updates `is_active` status based on the connection result.

#### refresh_lighthouse_provider_models

Synchronizes available models from provider APIs:

- Fetches the current model catalog from the provider
- Filters out non-chat models (DALL-E, Whisper, TTS, embeddings)
- Upserts model records in `LighthouseProviderModels`
- Removes stale models no longer available

**Excluded OpenAI model prefixes:**

```python
EXCLUDED_OPENAI_MODEL_PREFIXES = (
    "dall-e", "whisper", "tts-", "sora",
    "text-embedding", "text-moderation",
    # Legacy models
    "text-davinci", "davinci", "curie", "babbage", "ada",
)
```
## MCP Server Integration

Lighthouse AI communicates with the Prowler MCP Server to access security data. For detailed MCP Server architecture, see [Extending the MCP Server](/developer-guide/mcp-server).

### Tool Namespacing

MCP tools are organized into three namespaces based on authentication requirements:

| Namespace | Auth Required | Description |
|-----------|---------------|-------------|
| `prowler_app_*` | Yes (JWT) | Prowler Cloud/App tools for findings, providers, scans, resources |
| `prowler_hub_*` | No | Security checks catalog, compliance frameworks |
| `prowler_docs_*` | No | Documentation search and retrieval |

### Authentication Flow

1. User authenticates with Prowler App, receiving a JWT token
2. Token is stored in the session and propagated via `authContextStorage`
3. MCP client injects an `Authorization: Bearer <token>` header for `prowler_app_*` calls
4. MCP Server validates the token and applies RLS filtering

### Tool Execution Pattern

The agent uses meta-tools rather than direct tool registration:

```
Agent needs data → describe_tool("prowler_app_search_findings")
  → Returns parameter schema → execute_tool with parameters
  → MCP client adds auth header → MCP Server executes
  → Results returned to agent → Agent continues reasoning
```
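To make the pattern concrete, the hypothetical exchange below shows what the two meta-tool calls might look like for the `prowler_app_search_findings` tool named above, reusing the meta-tool sketch from the Library Code section. The filter fields are illustrative assumptions; the authoritative parameter schema is whatever `describe_tool` returns.

```typescript
// Hypothetical exchange via the meta-tools; parameter names and filters are illustrative.
const schema = await describeTool.invoke({
  tool_name: "prowler_app_search_findings",
});

// The agent fills in parameters based on the returned schema and the user's question.
const findings = await executeTool.invoke({
  tool_name: "prowler_app_search_findings",
  parameters: { severity: "critical", provider_type: "aws" },
});
```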
## Extension Points

### Adding New LLM Providers

To add a new LLM provider:

1. **Frontend**: Update `ui/lib/lighthouse/llm-factory.ts` with provider-specific initialization (see the sketch after this list)
2. **Backend**: Add the provider type to `LighthouseProviderConfiguration.LLMProviderChoices`
3. **Jobs**: Add credential extraction and model fetching in `lighthouse_providers.py`
4. **UI**: Add a connection workflow in `ui/components/lighthouse/workflow/`
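For step 1, extending the factory usually means adding one more branch that maps the new provider type to a Langchain chat model. The sketch below is a hedged illustration using `ChatOpenAI` from `@langchain/openai` for an OpenAI-compatible style provider; the `CreateLLMOptions` shape mirrors the `createLLM({ provider, model, credentials })` call shown earlier, but the exact types and branches in `llm-factory.ts` may differ.

```typescript
import { ChatOpenAI } from "@langchain/openai";

// Illustrative sketch of a factory branch; field names are assumptions, not the real types.
interface CreateLLMOptions {
  provider: string;
  model: string;
  credentials: { api_key: string; base_url?: string };
}

export function createLLM({ provider, model, credentials }: CreateLLMOptions) {
  switch (provider) {
    case "openai":
      return new ChatOpenAI({ model, apiKey: credentials.api_key });
    case "my_new_provider": // hypothetical OpenAI-compatible provider
      return new ChatOpenAI({
        model,
        apiKey: credentials.api_key,
        configuration: { baseURL: credentials.base_url },
      });
    default:
      throw new Error(`Unsupported LLM provider: ${provider}`);
  }
}
```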
### Modifying System Prompt

The system prompt template lives in `ui/lib/lighthouse/system-prompt.ts`. The `{{TOOL_LISTING}}` placeholder is dynamically replaced with available MCP tools during agent initialization.

### Adding Stream Events

To handle new Langchain stream events, modify `ui/lib/lighthouse/analyst-stream.ts`. Current handlers include (a minimal dispatch sketch follows this list):

- `on_chat_model_stream`: Token-by-token text streaming
- `on_chat_model_end`: Model completion with tool call detection
- `on_tool_start`: Tool execution started
- `on_tool_end`: Tool execution completed
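A new handler typically means one more case in the event dispatch. The sketch below shows the general shape, iterating over the `streamEvents(..., { version: "v2" })` output used in the API route above; the `emit*` callbacks are placeholders for however `analyst-stream.ts` actually forwards data to the UI stream.

```typescript
import type { StreamEvent } from "@langchain/core/tracers/log_stream";

// Illustrative dispatch over Langchain v2 stream events; the emit* callbacks are placeholders.
export async function handleAgentStream(
  agentStream: AsyncIterable<StreamEvent>,
  emitText: (token: string) => void,
  emitToolEvent: (name: string, phase: "start" | "end") => void,
) {
  for await (const event of agentStream) {
    switch (event.event) {
      case "on_chat_model_stream":
        // The chunk content may be a string or an array of content parts.
        emitText(String(event.data?.chunk?.content ?? ""));
        break;
      case "on_tool_start":
        emitToolEvent(event.name, "start");
        break;
      case "on_tool_end":
        emitToolEvent(event.name, "end");
        break;
      // Add new event types here, e.g. "on_chat_model_end".
    }
  }
}
```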
### Adding MCP Tools

See [Extending the MCP Server](/developer-guide/mcp-server) for detailed instructions on adding new tools to the Prowler MCP Server.
## Configuration

### Environment Variables

| Variable | Description |
|----------|-------------|
| `PROWLER_MCP_SERVER_URL` | MCP server endpoint (e.g., `https://mcp.prowler.com/mcp`) |

### Database Configuration

Provider credentials are stored encrypted in `LighthouseProviderConfiguration`:

- **OpenAI**: `{"api_key": "sk-..."}`
- **Bedrock**: `{"access_key_id": "...", "secret_access_key": "...", "region": "us-east-1"}` or `{"api_key": "...", "region": "us-east-1"}`
- **OpenAI-compatible**: `{"api_key": "..."}` with `base_url` field

### Tenant Configuration

Business context and default settings are stored in `LighthouseTenantConfiguration`:

```python
{
    "business_context": "Optional organization context for personalized responses",
    "default_provider": "openai",
    "default_models": {
        "openai": "gpt-4o",
        "bedrock": "anthropic.claude-3-5-sonnet-20240620-v1:0"
    }
}
```
## Related Documentation

<CardGroup cols={2}>
  <Card title="MCP Server Extension" icon="wrench" href="/developer-guide/mcp-server">
    Adding new tools to the Prowler MCP Server
  </Card>
  <Card title="Lighthouse AI Overview" icon="robot" href="/getting-started/products/prowler-lighthouse-ai">
    Capabilities, FAQs, and limitations
  </Card>
  <Card title="Multi-LLM Setup" icon="sliders" href="/user-guide/tutorials/prowler-app-lighthouse-multi-llm">
    Configuring multiple LLM providers
  </Card>
  <Card title="How Lighthouse Works" icon="gear" href="/user-guide/tutorials/prowler-app-lighthouse">
    User-facing architecture and setup guide
  </Card>
</CardGroup>
@@ -1,140 +0,0 @@
---
title: 'Extending Prowler Lighthouse AI'
---

This guide helps developers customize and extend Prowler Lighthouse AI by adding or modifying AI agents.
## Understanding AI Agents

AI agents combine Large Language Models (LLMs) with specialized tools that provide environmental context. These tools can include API calls, system command execution, or any function-wrapped capability.

### Types of AI Agents

AI agents fall into two main categories:

- **Autonomous Agents**: Freely choose from available tools to complete tasks, adapting their approach based on context. They decide which tools to use and when.
- **Workflow Agents**: Follow structured paths with predefined logic. They execute specific tool sequences and can include conditional logic.

Prowler Lighthouse AI is an autonomous agent, selecting the right tool(s) based on the user's query.

<Note>
To learn more about AI agents, read [Anthropic's blog post on building effective agents](https://www.anthropic.com/engineering/building-effective-agents).
</Note>

### LLM Dependency

The autonomous nature of agents depends on the underlying LLM. Autonomous agents using identical system prompts and tools but powered by different LLM providers might approach user queries differently. An agent with one LLM might solve a problem efficiently, while with another it might take a different route or fail entirely.

After evaluating multiple LLM providers (OpenAI, Gemini, Claude, Llama) based on tool-calling features and response accuracy, we recommend using the `gpt-4o` model.
## Prowler Lighthouse AI Architecture

Prowler Lighthouse AI uses a multi-agent architecture orchestrated by the [Langgraph-Supervisor](https://www.npmjs.com/package/@langchain/langgraph-supervisor) library.

### Architecture Components

<img src="/images/prowler-app/lighthouse-architecture.png" alt="Prowler Lighthouse architecture" />

Prowler Lighthouse AI integrates with the NextJS application:

- The [Langgraph-Supervisor](https://www.npmjs.com/package/@langchain/langgraph-supervisor) library integrates directly with NextJS
- The system uses the authenticated user session to interact with the Prowler API server
- Agents only access data the current user is authorized to view
- Session management operates automatically, ensuring Role-Based Access Control (RBAC) is maintained
## Available Prowler AI Agents

The following specialized AI agents are available in Prowler:

### Agent Overview

- **provider_agent**: Fetches information about cloud providers connected to Prowler
- **user_info_agent**: Retrieves information about Prowler users
- **scans_agent**: Fetches information about Prowler scans
- **compliance_agent**: Retrieves compliance overviews across scans
- **findings_agent**: Fetches information about individual findings across scans
- **overview_agent**: Retrieves overview information (providers, findings by status and severity, etc.)
## How to Add New Capabilities

### Updating the Supervisor Prompt

The supervisor agent controls system behavior, tone, and capabilities. You can find the supervisor prompt at: [https://github.com/prowler-cloud/prowler/blob/master/ui/lib/lighthouse/prompts.ts](https://github.com/prowler-cloud/prowler/blob/master/ui/lib/lighthouse/prompts.ts)

#### Supervisor Prompt Modifications

Modifying the supervisor prompt allows you to:

- Change personality or response style
- Add new high-level capabilities
- Modify task delegation to specialized agents
- Set up guardrails (query types to answer or decline)

<Note>
The supervisor agent should not have its own tools. This design keeps the system modular and maintainable.
</Note>
### How to Create New Specialized Agents

The supervisor agent and all specialized agents are defined in the `route.ts` file. The supervisor agent uses [langgraph-supervisor](https://www.npmjs.com/package/@langchain/langgraph-supervisor), while other agents use the prebuilt [create-react-agent](https://langchain-ai.github.io/langgraphjs/how-tos/create-react-agent/).

To add new capabilities or allow Lighthouse AI to interact with other APIs, create additional specialized agents:

1. First, determine what the new agent should do. Create a detailed prompt defining the agent's purpose and capabilities. You can see an example [here](https://github.com/prowler-cloud/prowler/blob/master/ui/lib/lighthouse/prompts.ts#L359-L385).

   <Note>
   Ensure that the new agent's capabilities don't collide with existing agents. For example, if there's already a *findings_agent* that talks to the findings APIs, don't create a new agent to do the same.
   </Note>

2. Create the necessary tools for the agent to access specific data or perform actions. A tool is a specialized function that extends the capabilities of the LLM by allowing it to access external data or APIs. A tool is triggered by the LLM based on the tool's description and the user's query.

   For example, the description of `getScanTool` is "Fetches detailed information about a specific scan by its ID." If the description doesn't convey what the tool is capable of doing, the LLM will not invoke the function. If the description of `getScanTool` were set to something random, or not set at all, the LLM would not answer queries like "Give me the critical issues from the scan ID xxxxxxxxxxxxxxx".

   <Note>
   Ensure that each tool is added to one agent only. Adding tools is optional; there can be agents with no tools at all.
   </Note>
3. Use the `createReactAgent` function to define a new agent. For example, the `rolesAgent` below is named "roles_agent" and has access to the "*getRolesTool*" and "*getRoleTool*" tools:

   ```js
   const rolesAgent = createReactAgent({
     llm: llm,
     tools: [getRolesTool, getRoleTool],
     name: "roles_agent",
     prompt: rolesAgentPrompt,
   });
   ```

4. Create a detailed prompt defining the agent's purpose and capabilities.

5. Add the new agent to the available agents list:

   ```js
   const agents = [
     userInfoAgent,
     providerAgent,
     overviewAgent,
     scansAgent,
     complianceAgent,
     findingsAgent,
     rolesAgent, // New agent added here
   ];

   // Create supervisor workflow
   const workflow = createSupervisor({
     agents: agents,
     llm: supervisorllm,
     prompt: supervisorPrompt,
     outputMode: "last_message",
   });
   ```

6. Update the supervisor's system prompt to summarize the new agent's capabilities.
### Best Practices for Agent Development

When developing new agents or capabilities:

- **Clear Responsibility Boundaries**: Each agent should have a defined purpose with minimal overlap. No two agents should share the same tools, and no two tools should access the same Prowler APIs.
- **Minimal Data Access**: Agents should only request the data they need, keeping requests specific to minimize context window usage, cost, and response time.
- **Thorough Prompting**: Ensure agent prompts include clear instructions about:
  - The agent's purpose and limitations
  - How to use its tools
  - How to format responses for the supervisor
  - Error handling procedures (optional)
- **Security Considerations**: Agents should never modify data or access sensitive information like secrets or credentials.
- **Testing**: Thoroughly test new agents with various queries before deploying to production.
@@ -284,7 +284,7 @@
 "developer-guide/outputs",
 "developer-guide/integrations",
 "developer-guide/security-compliance-framework",
-"developer-guide/lighthouse",
+"developer-guide/lighthouse-architecture",
 "developer-guide/mcp-server",
 "developer-guide/ai-skills"
 ]
@@ -59,6 +59,14 @@ Prowler Lighthouse AI is powerful, but there are limitations:

 - **NextJS session dependence**: If your Prowler application session expires or logs out, Lighthouse AI will error out. Refresh and log back in to continue.
 - **Response quality**: The response quality depends on the selected LLM provider and model. Choose models with strong tool-calling capabilities for best results. We recommend the `gpt-5` model from OpenAI.
+
+## Extending Lighthouse AI
+
+Lighthouse AI retrieves data through Prowler MCP. To add new capabilities, extend the Prowler MCP Server with additional tools, and Lighthouse AI will discover them automatically.
+
+For development details, see:
+- [Lighthouse AI Architecture](/developer-guide/lighthouse-architecture) - Internal architecture and extension points
+- [Extending the MCP Server](/developer-guide/mcp-server) - Adding new tools to Prowler MCP
+
 ### Getting Help

 If you encounter issues with Prowler Lighthouse AI or have suggestions for improvements, please [reach out through our Slack channel](https://goto.prowler.com/slack).
@@ -67,94 +75,6 @@ If you encounter issues with Prowler Lighthouse AI or have suggestions for improvements

 The following API endpoints are accessible to Prowler Lighthouse AI. Data from the following API endpoints could be shared with the LLM provider depending on the scope of the user's query:

-#### Accessible API Endpoints
-
-**User Management:**
-
-- List all users - /api/v1/users
-- Retrieve the current user's information - /api/v1/users/me
-
-**Provider Management:**
-
-- List all providers - /api/v1/providers
-- Retrieve data from a provider - /api/v1/providers/{id}
-
-**Scan Management:**
-
-- List all scans - /api/v1/scans
-- Retrieve data from a specific scan - /api/v1/scans/{id}
-
-**Resource Management:**
-
-- List all resources - /api/v1/resources
-- Retrieve data for a resource - /api/v1/resources/{id}
-
-**Findings Management:**
-
-- List all findings - /api/v1/findings
-- Retrieve data from a specific finding - /api/v1/findings/{id}
-- Retrieve metadata values from findings - /api/v1/findings/metadata
-
-**Overview Data:**
-
-- Get aggregated findings data - /api/v1/overviews/findings
-- Get findings data by severity - /api/v1/overviews/findings_severity
-- Get aggregated provider data - /api/v1/overviews/providers
-- Get findings data by service - /api/v1/overviews/services
-
-**Compliance Management:**
-
-- List compliance overviews (optionally filter by scan) - /api/v1/compliance-overviews
-- Retrieve data from a specific compliance overview - /api/v1/compliance-overviews/{id}
-
-#### Excluded API Endpoints
-
-Not all Prowler API endpoints are integrated with Lighthouse AI. They are intentionally excluded for the following reasons:
-
-- OpenAI/other LLM providers shouldn't have access to sensitive data (like fetching provider secrets and other sensitive config)
-- User queries don't need responses from those API endpoints (e.g., tasks, tenant details, downloading zip files, etc.)
-
-**Excluded Endpoints:**
-
-**User Management:**
-
-- List a specific user's information - /api/v1/users/{id}
-- List user memberships - /api/v1/users/{user_pk}/memberships
-- Retrieve membership data from the user - /api/v1/users/{user_pk}/memberships/{id}
-
-**Tenant Management:**
-
-- List all tenants - /api/v1/tenants
-- Retrieve data from a tenant - /api/v1/tenants/{id}
-- List tenant memberships - /api/v1/tenants/{tenant_pk}/memberships
-- List all invitations - /api/v1/tenants/invitations
-- Retrieve data from a tenant invitation - /api/v1/tenants/invitations/{id}
-
-**Security and Configuration:**
-
-- List all secrets - /api/v1/providers/secrets
-- Retrieve data from a secret - /api/v1/providers/secrets/{id}
-- List all provider groups - /api/v1/provider-groups
-- Retrieve data from a provider group - /api/v1/provider-groups/{id}
-
-**Reports and Tasks:**
-
-- Download zip report - /api/v1/scans/{v1}/report
-- List all tasks - /api/v1/tasks
-- Retrieve data from a specific task - /api/v1/tasks/{id}
-
-**Lighthouse AI Configuration:**
-
-- List LLM providers - /api/v1/lighthouse/providers
-- Retrieve LLM provider - /api/v1/lighthouse/providers/{id}
-- List available models - /api/v1/lighthouse/models
-- Retrieve tenant configuration - /api/v1/lighthouse/configuration
-
-<Note>
-Agents can only call GET endpoints. They don't have access to other HTTP methods.
-</Note>

 ## FAQs

 **1. Which LLM providers are supported?**
@@ -167,13 +87,21 @@ Lighthouse AI supports three providers:

 For detailed configuration instructions, see [Using Multiple LLM Providers with Lighthouse](/user-guide/tutorials/prowler-app-lighthouse-multi-llm).

-**2. Why a multi-agent supervisor model?**
+**2. Why don't some models appear in Lighthouse AI?**

-Context windows are limited. While demo data fits inside the context window, querying real-world data often exceeds it. A multi-agent architecture is used so different agents fetch different sizes of data and respond with the minimum required data to the supervisor. This spreads the context window usage across agents.
+LLM providers offer different types of models. Not every model can be integrated with Lighthouse AI (for example, text-to-speech, vision, embedding, and computer-use models).
+
+Lighthouse AI requires models that support:
+
+- Text input
+- Text output
+- Tool calling
+
+Lighthouse AI [automatically filters](https://github.com/prowler-cloud/prowler/blob/master/api/src/backend/tasks/jobs/lighthouse_providers.py#L341-L353) out models that do not support these capabilities, so some provider models may not appear in the Lighthouse AI model list.

 **3. Is my security data shared with LLM providers?**

-Minimal data is shared to generate useful responses. Agents can access security findings and remediation details when needed. Provider secrets are protected by design and cannot be read. The LLM provider credentials configured with Lighthouse AI are only accessible to our NextJS server and are never sent to the LLM providers. Resource metadata (names, tags, account/project IDs, etc.) may be shared with the configured LLM provider based on query requirements.
+Minimal data is shared to generate useful responses. The agent can access security findings and remediation details when needed. Provider secrets are protected by design and cannot be read. The LLM provider credentials configured with Lighthouse AI are only accessible to the Next.js server and are never sent to the LLM providers. Resource metadata (names, tags, account/project IDs, etc.) may be shared with the configured LLM provider based on query requirements.

 **4. Can the Lighthouse AI change my cloud environment?**
Binary file not shown. (Before: 178 KiB)

BIN  docs/images/lighthouse-architecture-dark.png  (new file)
Binary file not shown. (After: 267 KiB)

BIN  docs/images/lighthouse-architecture-light.png  (new file)
Binary file not shown. (After: 265 KiB)

Binary file not shown. (Before: 178 KiB)
Binary file not shown. (Before: 178 KiB)
Binary file not shown. (Before: 178 KiB)
@@ -22,7 +22,7 @@ For Lighthouse AI to work properly, models **must** support all of the following

 - **Text input**: Ability to receive text prompts.
 - **Text output**: Ability to generate text responses.
-- **Tool calling**: Ability to invoke tools and functions.
+- **Tool calling**: Ability to invoke tools and functions to retrieve data from Prowler.

 If any of these capabilities are missing, the model will not be compatible with Lighthouse AI.
@@ -8,24 +8,33 @@ import { VersionBadge } from "/snippets/version-badge.mdx"

 Prowler Lighthouse AI integrates Large Language Models (LLMs) with Prowler security findings data.

-Here's what's happening behind the scenes:
+Behind the scenes, Lighthouse AI works as follows:

+- Lighthouse AI runs as a [Langchain agent](https://docs.langchain.com/oss/javascript/langchain/agents) in NextJS
+- The agent connects to the configured LLM provider to understand the prompt and decide what data is needed
+- The agent accesses Prowler data through [Prowler MCP](https://docs.prowler.com/getting-started/products/prowler-mcp), which exposes tools from multiple sources, including:
+  - Prowler Hub
+  - Prowler Docs
+  - Prowler App
+- Instead of calling every tool directly, the agent uses two meta-tools:
+  - `describe_tool` to retrieve a tool schema and parameter requirements.
+  - `execute_tool` to run the selected tool with the required input.
+- Based on the user's query and the data necessary to answer it, the Lighthouse agent invokes the required Prowler MCP tools using `describe_tool` and `execute_tool`.
-- The system uses a multi-agent architecture built with [LanggraphJS](https://github.com/langchain-ai/langgraphjs) for LLM logic and [Vercel AI SDK UI](https://sdk.vercel.ai/docs/ai-sdk-ui/overview) for the frontend chatbot.
-- It uses a ["supervisor" architecture](https://langchain-ai.lang.chat/langgraphjs/tutorials/multi_agent/agent_supervisor/) that interacts with different agents for specialized tasks. For example, `findings_agent` can analyze detected security findings, while `overview_agent` provides a summary of connected cloud accounts.
-- The system connects to the configured LLM provider to understand the user's query, fetches the right data, and responds to the query.

 <Note>
 Lighthouse AI supports multiple LLM providers including OpenAI, Amazon Bedrock, and OpenAI-compatible services. For configuration details, see [Using Multiple LLM Providers with Lighthouse](/user-guide/tutorials/prowler-app-lighthouse-multi-llm).
 </Note>

-- The supervisor agent is the main contact point. It is what users interact with directly from the chat interface. It coordinates with other agents to answer users' questions comprehensively.

-<img src="/images/prowler-app/lighthouse-architecture.png" alt="Lighthouse AI Architecture" />
+<img className="block dark:hidden" src="/images/lighthouse-architecture-light.png" alt="Prowler Lighthouse Architecture" />
+<img className="hidden dark:block" src="/images/lighthouse-architecture-dark.png" alt="Prowler Lighthouse Architecture" />

 <Note>
-All agents can only read relevant security data. They cannot modify your data or access sensitive information like configured secrets or tenant details.
+Lighthouse AI can only read relevant security data. It cannot modify data or access sensitive information such as configured secrets or tenant details.
 </Note>

-## Set up
+## Set Up

 Getting started with Prowler Lighthouse AI is easy:
@@ -43,11 +52,11 @@ For detailed configuration instructions for each provider, see [Using Multiple LLM Providers with Lighthouse]

 ### Adding Business Context

-The optional business context field lets you provide additional information to help Lighthouse AI understand your environment and priorities, including:
+The optional business context field lets teams provide additional information to help Lighthouse AI understand environment priorities, including:

-- Your organization's cloud security goals
+- Organization cloud security goals
 - Information about account owners or responsible teams
-- Compliance requirements for your organization
+- Compliance requirements
 - Current security initiatives or focus areas

 Better context leads to more relevant responses and prioritization that aligns with your needs.