docs(lighthouse): Add Lighthouse Docs (#8196)

Co-authored-by: Chandrapal Badshah <12944530+Chan9390@users.noreply.github.com>
Co-authored-by: Pepe Fagoaga <pepe@prowler.com>
This commit is contained in:
Chandrapal Badshah
2025-07-08 11:41:23 +05:30
committed by GitHub
parent b4927c3ad1
commit f29c2ac9f0
9 changed files with 340 additions and 0 deletions


@@ -0,0 +1,134 @@
# Extending Prowler Lighthouse
This guide helps developers customize and extend Prowler Lighthouse by adding or modifying AI agents.
## Understanding AI Agents
AI agents combine Large Language Models (LLMs) with specialized tools that provide environmental context. These tools can include API calls, system command execution, or any function-wrapped capability.
### Types of AI Agents
AI agents fall into two main categories:
- **Autonomous Agents**: Freely choose from available tools to complete tasks, adapting their approach based on context. They decide which tools to use and when.
- **Workflow Agents**: Follow structured paths with predefined logic. They execute specific tool sequences and can include conditional logic.
Prowler Lighthouse is an autonomous agent: it selects the right tool(s) based on the user's query.
???+ note
To learn more about AI agents, read [Anthropic's blog post on building effective agents](https://www.anthropic.com/engineering/building-effective-agents).
### LLM Dependency
The autonomous nature of agents depends on the underlying LLM. Autonomous agents using identical system prompts and tools but powered by different LLM providers might approach user queries differently. An agent with one LLM might solve a problem efficiently, while with another it might take a different route or fail entirely.
After evaluating multiple LLM providers (OpenAI, Gemini, Claude, Llama) based on tool calling features and response accuracy, we recommend using the `gpt-4o` model.
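For illustration, a model along these lines could be instantiated with LangChain's OpenAI integration. This is a minimal sketch; the variable name and options are assumptions, not Lighthouse's exact configuration.
```js
// Minimal sketch: instantiating the recommended model with LangChain's
// OpenAI integration. The options shown are illustrative assumptions.
import { ChatOpenAI } from "@langchain/openai";

const llm = new ChatOpenAI({
  model: "gpt-4o",  // recommended for reliable tool calling
  temperature: 0,   // deterministic output suits security analysis
});
```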
## Prowler Lighthouse Architecture
Prowler Lighthouse uses a multi-agent architecture orchestrated by the [Langgraph-Supervisor](https://www.npmjs.com/package/@langchain/langgraph-supervisor) library.
### Architecture Components
<img src="../../tutorials/img/lighthouse-architecture.png" alt="Prowler Lighthouse architecture">
Prowler Lighthouse integrates with the NextJS application:
- The [Langgraph-Supervisor](https://www.npmjs.com/package/@langchain/langgraph-supervisor) library integrates directly with NextJS
- The system uses the authenticated user session to interact with the Prowler API server
- Agents only access data the current user is authorized to view
- Session management operates automatically, ensuring Role-Based Access Control (RBAC) is maintained
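To see how these pieces fit together, here is a minimal sketch of compiling a supervisor workflow and invoking it from a request handler. The agent variables, message shape, and handler context are assumptions for illustration; the actual wiring lives in the Lighthouse `route.ts` file.
```js
// Minimal sketch: compiling a supervisor workflow and invoking it per
// request. Agent variables and the user message are illustrative.
import { createSupervisor } from "@langchain/langgraph-supervisor";

const workflow = createSupervisor({
  agents: [findingsAgent, scansAgent], // specialized agents defined elsewhere
  llm: supervisorllm,
  prompt: supervisorPrompt,
});

// Compile once, then invoke per chat request so that tool calls run with
// the caller's authenticated Prowler API session (preserving RBAC).
const app = workflow.compile();
const result = await app.invoke({
  messages: [{ role: "user", content: "Summarize my latest scan" }],
});
```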
## Available Prowler AI Agents
The following specialized AI agents are available in Prowler:
### Agent Overview
- **provider_agent**: Fetches information about cloud providers connected to Prowler
- **user_info_agent**: Retrieves information about Prowler users
- **scans_agent**: Fetches information about Prowler scans
- **compliance_agent**: Retrieves compliance overviews across scans
- **findings_agent**: Fetches information about individual findings across scans
- **overview_agent**: Retrieves overview information (providers, findings by status and severity, etc.)
## How to Add New Capabilities
### Updating the Supervisor Prompt
The supervisor agent controls system behavior, tone, and capabilities. You can find the supervisor prompt at: [https://github.com/prowler-cloud/prowler/blob/master/ui/lib/lighthouse/prompts.ts](https://github.com/prowler-cloud/prowler/blob/master/ui/lib/lighthouse/prompts.ts)
#### Supervisor Prompt Modifications
Modifying the supervisor prompt allows you to:
- Change personality or response style
- Add new high-level capabilities
- Modify task delegation to specialized agents
- Set up guardrails (query types to answer or decline)
???+ note
The supervisor agent should not have its own tools. This design keeps the system modular and maintainable.
### How to Create New Specialized Agents
The supervisor agent and all specialized agents are defined in the `route.ts` file. The supervisor agent uses [langgraph-supervisor](https://www.npmjs.com/package/@langchain/langgraph-supervisor), while other agents use the prebuilt [create-react-agent](https://langchain-ai.github.io/langgraphjs/how-tos/create-react-agent/).
To add new capabilities or allow Lighthouse to interact with other APIs, create additional specialized agents:
1. First determine what the new agent should do. Create a detailed prompt defining the agent's purpose and capabilities. You can see an example [here](https://github.com/prowler-cloud/prowler/blob/master/ui/lib/lighthouse/prompts.ts#L359-L385).
???+ note
Ensure that the new agent's capabilities don't overlap with existing agents. For example, if there's already a *findings_agent* that talks to the findings APIs, don't create a new agent that does the same.
2. Create the necessary tools for the agent to access specific data or perform actions. A tool is a specialized function that extends the capabilities of the LLM by allowing it to access external data or APIs. The LLM triggers a tool based on the tool's description and the user's query; a minimal sketch of a tool definition follows the note below.
For example, the description of `getScanTool` is "Fetches detailed information about a specific scan by its ID." If the description doesn't convey what the tool can do, the LLM will not invoke it. If the description of `getScanTool` were set to something unrelated, or not set at all, the LLM would not answer queries like "Give me the critical issues from the scan ID xxxxxxxxxxxxxxx".
???+ note
Ensure that each tool is added to only one agent. Adding tools is optional; an agent can have no tools at all.
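As a sketch of what such a tool could look like, the snippet below defines a scan-fetching tool with LangChain's `tool` helper. The `fetch` call and response handling are simplified assumptions; the actual tools call the Prowler API through the authenticated session.
```js
// Minimal sketch of a tool definition. The fetch wrapper is a
// simplification; real tools reuse the authenticated Prowler API session.
import { tool } from "@langchain/core/tools";
import { z } from "zod";

const getScanTool = tool(
  async ({ id }) => {
    const response = await fetch(`/api/v1/scans/${id}`);
    return await response.text();
  },
  {
    name: "getScan",
    // The LLM relies on this description to decide when to call the tool.
    description: "Fetches detailed information about a specific scan by its ID.",
    schema: z.object({
      id: z.string().describe("The ID of the scan to fetch"),
    }),
  }
);
```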
3. Use the `createReactAgent` function to define a new agent. For example, the `rolesAgent` below is named "roles_agent" and has access to the *getRolesTool* and *getRoleTool* tools:
```js
const rolesAgent = createReactAgent({
llm: llm,
tools: [getRolesTool, getRoleTool],
name: "roles_agent",
prompt: rolesAgentPrompt,
});
```
4. Assign the detailed prompt from step 1 (for example, `rolesAgentPrompt`) to the agent's `prompt` field.
5. Add the new agent to the available agents list:
```js
const agents = [
userInfoAgent,
providerAgent,
overviewAgent,
scansAgent,
complianceAgent,
findingsAgent,
rolesAgent, // New agent added here
];
// Create supervisor workflow
const workflow = createSupervisor({
agents: agents,
llm: supervisorllm,
prompt: supervisorPrompt,
outputMode: "last_message",
});
```
6. Update the supervisor's system prompt to summarize the new agent's capabilities.
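For example, the change to the supervisor prompt in `prompts.ts` can be as small as one line describing when to delegate to the new agent. The excerpt below is hypothetical, not the actual supervisor prompt.
```js
// Hypothetical excerpt of the supervisor prompt, with the new roles_agent
// added to the delegation list.
export const supervisorPrompt = `You are Prowler Lighthouse, a cloud security analyst.
You delegate work to specialized agents:
- findings_agent: questions about individual security findings
- roles_agent: questions about Prowler roles and their permissions
Only answer cloud security questions; politely decline anything else.`;
```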
### Best Practices for Agent Development
When developing new agents or capabilities:
- **Clear Responsibility Boundaries**: Each agent should have a defined purpose with minimal overlap. No two agents should share the same tools, and no two tools should call the same Prowler APIs.
- **Minimal Data Access**: Agents should only request the data they need, keeping requests specific to minimize context window usage, cost, and response time.
- **Thorough Prompting:** Ensure agent prompts include clear instructions about:
- The agent's purpose and limitations
- How to use its tools
- How to format responses for the supervisor
- Error handling procedures (Optional)
- **Security Considerations:** Agents should never modify data or access sensitive information like secrets or credentials.
- **Testing:** Thoroughly test new agents with various queries before deploying to production.

(Six binary image files added: Lighthouse documentation screenshots, 178 KiB to 404 KiB.)


@@ -0,0 +1,204 @@
# Prowler Lighthouse
Prowler Lighthouse is an AI Cloud Security Analyst chatbot that helps you understand, prioritize, and remediate security findings in your cloud environments. It's designed to provide security expertise for teams without dedicated resources, acting as your 24/7 virtual cloud security analyst.
<img src="../img/lighthouse-intro.png" alt="Prowler Lighthouse">
## How It Works
Prowler Lighthouse uses OpenAI's language models and integrates with your Prowler security findings data.
Here's what's happening behind the scenes:
- The system uses a multi-agent architecture built with [LanggraphJS](https://github.com/langchain-ai/langgraphjs) for the LLM logic and [Vercel AI SDK UI](https://sdk.vercel.ai/docs/ai-sdk-ui/overview) for the frontend chatbot.
- It uses a ["supervisor" architecture](https://langchain-ai.lang.chat/langgraphjs/tutorials/multi_agent/agent_supervisor/) that interacts with different agents for specialized tasks. For example, `findings_agent` can analyze detected security findings, while `overview_agent` provides a summary of connected cloud accounts.
- The system connects to OpenAI models to understand, fetch the right data, and respond to the user's query.
???+ note
Lighthouse is tested against `gpt-4o` and `gpt-4o-mini` OpenAI models.
- The supervisor agent is the main contact point. It is what users interact with directly from the chat interface. It coordinates with other agents to answer users' questions comprehensively.
<img src="../img/lighthouse-architecture.png" alt="Lighthouse Architecture">
???+ note
All agents can only read relevant security data. They cannot modify your data or access sensitive information like configured secrets or tenant details.
## Setup
Getting started with Prowler Lighthouse is easy:
1. Go to the configuration page in your Prowler dashboard.
2. Enter your OpenAI API key.
3. Select your preferred model. The recommended one for best results is `gpt-4o`.
4. (Optional) Add business context to improve response quality and prioritization.
<img src="../img/lighthouse-config.png" alt="Lighthouse Configuration">
### Adding Business Context
The optional business context field lets you provide additional information to help Lighthouse understand your environment and priorities, including:
- Your organization's cloud security goals
- Information about account owners or responsible teams
- Compliance requirements for your organization
- Current security initiatives or focus areas
Better context leads to more relevant responses and prioritization that aligns with your needs.
## Capabilities
Prowler Lighthouse is designed to be your AI security team member, with capabilities including:
### Natural Language Querying
Ask questions in plain English about your security findings. Examples:
- "What are my highest risk findings?"
- "Show me all S3 buckets with public access."
- "What security issues were found in my production accounts?"
<img src="../img/lighthouse-feature1.png" alt="Natural language querying">
### Detailed Remediation Guidance
Get tailored step-by-step instructions for fixing security issues:
- Clear explanations of the problem and its impact
- Commands or console steps to implement fixes
- Alternative approaches with different solutions
<img src="../img/lighthouse-feature2.png" alt="Detailed Remediation">
### Enhanced Context and Analysis
Lighthouse can provide additional context to help you understand the findings:
- Explain security concepts related to findings in simple terms
- Provide risk assessments based on your environment and context
- Connect related findings to show broader security patterns
<img src="../img/lighthouse-config.png" alt="Business Context">
<img src="../img/lighthouse-feature3.png" alt="Contextual Responses">
## Important Notes
Prowler Lighthouse is powerful, but there are limitations:
- **Continuous improvement**: Please report any issues, as the feature may make mistakes or encounter errors, despite extensive testing.
- **Access limitations**: Lighthouse can only access data the logged-in user can view. If you can't see certain information, Lighthouse can't see it either.
- **NextJS session dependence**: If your Prowler application session expires or you log out, Lighthouse will error out. Refresh and log back in to continue.
- **Response quality**: The response quality depends on the selected OpenAI model. For best results, use `gpt-4o`.
### Getting Help
If you encounter issues with Prowler Lighthouse or have suggestions for improvements, please [reach out through our Slack channel](https://goto.prowler.com/slack).
### What Data Is Shared to OpenAI?
The following API endpoints are accessible to Prowler Lighthouse. Data from these endpoints may be shared with OpenAI, depending on the scope of the user's query:
#### Accessible API Endpoints
**User Management:**
- List all users - `/api/v1/users`
- Retrieve the current user's information - `/api/v1/users/me`
**Provider Management:**
- List all providers - `/api/v1/providers`
- Retrieve data from a provider - `/api/v1/providers/{id}`
**Scan Management:**
- List all scans - `/api/v1/scans`
- Retrieve data from a specific scan - `/api/v1/scans/{id}`
**Resource Management:**
- List all resources - `/api/v1/resources`
- Retrieve data for a resource - `/api/v1/resources/{id}`
**Findings Management:**
- List all findings - `/api/v1/findings`
- Retrieve data from a specific finding - `/api/v1/findings/{id}`
- Retrieve metadata values from findings - `/api/v1/findings/metadata`
**Overview Data:**
- Get aggregated findings data - `/api/v1/overviews/findings`
- Get findings data by severity - `/api/v1/overviews/findings_severity`
- Get aggregated provider data - `/api/v1/overviews/providers`
- Get findings data by service - `/api/v1/overviews/services`
**Compliance Management:**
- List compliance overviews for a scan - `/api/v1/compliance-overviews`
- Retrieve data from a specific compliance overview - `/api/v1/compliance-overviews/{id}`
#### Excluded API Endpoints
Not all Prowler API endpoints are integrated with Lighthouse. The following endpoints are intentionally excluded for these reasons:
- OpenAI and other LLM providers shouldn't have access to sensitive data (such as provider secrets and other sensitive configuration)
- User queries don't need responses from those API endpoints (e.g., tasks, tenant details, downloading zip reports)
**Excluded Endpoints:**
**User Management:**
- Retrieve a specific user's information - `/api/v1/users/{id}`
- List user memberships - `/api/v1/users/{user_pk}/memberships`
- Retrieve membership data from the user - `/api/v1/users/{user_pk}/memberships/{id}`
**Tenant Management:**
- List all tenants - `/api/v1/tenants`
- Retrieve data from a tenant - `/api/v1/tenants/{id}`
- List tenant memberships - `/api/v1/tenants/{tenant_pk}/memberships`
- List all invitations - `/api/v1/tenants/invitations`
- Retrieve data from tenant invitation - `/api/v1/tenants/invitations/{id}`
**Security and Configuration:**
- List all secrets - `/api/v1/providers/secrets`
- Retrieve data from a secret - `/api/v1/providers/secrets/{id}`
- List all provider groups - `/api/v1/provider-groups`
- Retrieve data from a provider group - `/api/v1/provider-groups/{id}`
**Reports and Tasks:**
- Download zip report - `/api/v1/scans/{id}/report`
- List all tasks - `/api/v1/tasks`
- Retrieve data from a specific task - `/api/v1/tasks/{id}`
**Lighthouse Configuration:**
- List OpenAI configuration - `/api/v1/lighthouse-config`
- Retrieve OpenAI key and configuration - `/api/v1/lighthouse-config/{id}`
???+ note
Agents can only call GET endpoints. They don't have access to other HTTP methods.
## FAQs
**1. Why only OpenAI models?**
During feature development, we evaluated other LLM providers.
- **Claude AI** - Claude models have [tier-based rate limits](https://docs.anthropic.com/en/api/rate-limits#requirements-to-advance-tier). Answering even slightly complex questions requires a handful of API calls to the LLM provider within a few seconds. With Claude's tiering system, users must purchase $400 in credits or convert their subscription to monthly invoicing after talking to the sales team. This pricing may not suit all Prowler users.
- **Gemini Models** - Gemini's tool calling is not as robust as OpenAI's; it calls functions recursively until it exceeds limits. Gemini-2.5-Pro-Experimental handles tool calling and responses better than previous models, but it's still experimental.
- **Deepseek V3** - Doesn't support system prompt messages.
**2. Why a multi-agent supervisor model?**
Context windows are limited. While demo data fits inside the context window, querying real-world data often exceeds it. A multi-agent architecture lets each agent fetch only the data it needs and return the minimum required information to the supervisor, spreading context window usage across agents.
**3. Is my security data shared with OpenAI?**
Minimal data is shared to generate useful responses. Agents can access security findings and remediation details when needed. Provider secrets are protected by design and cannot be read. The Lighthouse key is only accessible to our NextJS server and is never sent to LLMs. Resource metadata (names, tags, account/project IDs, etc.) may be shared with OpenAI based on your query requirements.
**4. Can the Lighthouse change my cloud environment?**
No. The agents don't have tools to make changes, even if the configured cloud provider credentials have permissions to modify resources.


@@ -54,6 +54,7 @@ nav:
- Role-Based Access Control: tutorials/prowler-app-rbac.md
- Social Login: tutorials/prowler-app-social-login.md
- SSO with SAML: tutorials/prowler-app-sso.md
- Lighthouse: tutorials/prowler-app-lighthouse.md
- CLI:
- Miscellaneous: tutorials/misc.md
- Reporting: tutorials/reporting.md
@@ -117,6 +118,7 @@ nav:
- Outputs: developer-guide/outputs.md
- Integrations: developer-guide/integrations.md
- Compliance: developer-guide/security-compliance-framework.md
- Lighthouse: developer-guide/lighthouse.md
- Provider Specific Details:
- AWS: developer-guide/aws-details.md
- Azure: developer-guide/azure-details.md