chore(skills): Improve Django and DRF skills (#9831)

Co-authored-by: Adrián Jesús Peña Rodríguez <adrianjpr@gmail.com>
This commit is contained in:
Pepe Fagoaga
2026-01-22 13:54:06 +01:00
committed by GitHub
parent 03d4c19ed5
commit dce05295ef
22 changed files with 3887 additions and 340 deletions

View File

@@ -20,6 +20,7 @@ Use these skills for detailed patterns on-demand:
| `playwright` | Page Object Model, MCP workflow, selectors | [SKILL.md](skills/playwright/SKILL.md) |
| `pytest` | Fixtures, mocking, markers, parametrize | [SKILL.md](skills/pytest/SKILL.md) |
| `django-drf` | ViewSets, Serializers, Filters | [SKILL.md](skills/django-drf/SKILL.md) |
| `jsonapi` | Strict JSON:API v1.1 spec compliance | [SKILL.md](skills/jsonapi/SKILL.md) |
| `zod-4` | New API (z.email(), z.uuid()) | [SKILL.md](skills/zod-4/SKILL.md) |
| `zustand-5` | Persist, selectors, slices | [SKILL.md](skills/zustand-5/SKILL.md) |
| `ai-sdk-5` | UIMessage, streaming, LangChain | [SKILL.md](skills/ai-sdk-5/SKILL.md) |
@@ -40,6 +41,7 @@ Use these skills for detailed patterns on-demand:
| `prowler-provider` | Add new cloud providers | [SKILL.md](skills/prowler-provider/SKILL.md) |
| `prowler-changelog` | Changelog entries (keepachangelog.com) | [SKILL.md](skills/prowler-changelog/SKILL.md) |
| `prowler-ci` | CI checks and PR gates (GitHub Actions) | [SKILL.md](skills/prowler-ci/SKILL.md) |
| `prowler-commit` | Professional commits (conventional-commits) | [SKILL.md](skills/prowler-commit/SKILL.md) |
| `prowler-pr` | Pull request conventions | [SKILL.md](skills/prowler-pr/SKILL.md) |
| `prowler-docs` | Documentation style guide | [SKILL.md](skills/prowler-docs/SKILL.md) |
| `skill-creator` | Create new AI agent skills | [SKILL.md](skills/skill-creator/SKILL.md) |
@@ -51,14 +53,19 @@ When performing these actions, ALWAYS invoke the corresponding skill FIRST:
| Action | Skill |
|--------|-------|
| Add changelog entry for a PR or feature | `prowler-changelog` |
| Adding DRF pagination or permissions | `django-drf` |
| Adding new providers | `prowler-provider` |
| Adding services to existing providers | `prowler-provider` |
| After creating/modifying a skill | `skill-sync` |
| App Router / Server Actions | `nextjs-15` |
| Building AI chat features | `ai-sdk-5` |
| Committing changes | `prowler-commit` |
| Create PR that requires changelog entry | `prowler-changelog` |
| Create a PR with gh pr create | `prowler-pr` |
| Creating API endpoints | `jsonapi` |
| Creating ViewSets, serializers, or filters in api/ | `django-drf` |
| Creating Zod schemas | `zod-4` |
| Creating a git commit | `prowler-commit` |
| Creating new checks | `prowler-sdk-check` |
| Creating new skills | `skill-creator` |
| Creating/modifying Prowler UI components | `prowler-ui` |
@@ -67,14 +74,16 @@ When performing these actions, ALWAYS invoke the corresponding skill FIRST:
| Debug why a GitHub Actions job is failing | `prowler-ci` |
| Fill .github/pull_request_template.md (Context/Description/Steps to review/Checklist) | `prowler-pr` |
| General Prowler development questions | `prowler` |
| Generic DRF patterns | `django-drf` |
| Implementing JSON:API endpoints | `django-drf` |
| Inspect PR CI checks and gates (.github/workflows/*) | `prowler-ci` |
| Inspect PR CI workflows (.github/workflows/*): conventional-commit, pr-check-changelog, pr-conflict-checker, labeler | `prowler-pr` |
| Mapping checks to compliance controls | `prowler-compliance` |
| Mocking AWS with moto in tests | `prowler-test-sdk` |
| Modifying API responses | `jsonapi` |
| Regenerate AGENTS.md Auto-invoke tables (sync.sh) | `skill-sync` |
| Review PR requirements: template, title conventions, changelog gate | `prowler-pr` |
| Review changelog format and conventions | `prowler-changelog` |
| Reviewing JSON:API compliance | `jsonapi` |
| Reviewing compliance framework PRs | `prowler-compliance-review` |
| Testing RLS tenant isolation | `prowler-test-api` |
| Troubleshoot why a skill is missing from AGENTS.md auto-invoke | `skill-sync` |

View File

@@ -4,6 +4,7 @@
> - [`prowler-api`](../skills/prowler-api/SKILL.md) - Models, Serializers, Views, RLS patterns
> - [`prowler-test-api`](../skills/prowler-test-api/SKILL.md) - Testing patterns (pytest-django)
> - [`django-drf`](../skills/django-drf/SKILL.md) - Generic DRF patterns
> - [`jsonapi`](../skills/jsonapi/SKILL.md) - Strict JSON:API v1.1 spec compliance
> - [`pytest`](../skills/pytest/SKILL.md) - Generic pytest patterns
### Auto-invoke Skills
@@ -13,10 +14,17 @@ When performing these actions, ALWAYS invoke the corresponding skill FIRST:
| Action | Skill |
|--------|-------|
| Add changelog entry for a PR or feature | `prowler-changelog` |
| Adding DRF pagination or permissions | `django-drf` |
| Committing changes | `prowler-commit` |
| Create PR that requires changelog entry | `prowler-changelog` |
| Creating API endpoints | `jsonapi` |
| Creating ViewSets, serializers, or filters in api/ | `django-drf` |
| Creating a git commit | `prowler-commit` |
| Creating/modifying models, views, serializers | `prowler-api` |
| Generic DRF patterns | `django-drf` |
| Implementing JSON:API endpoints | `django-drf` |
| Modifying API responses | `jsonapi` |
| Review changelog format and conventions | `prowler-changelog` |
| Reviewing JSON:API compliance | `jsonapi` |
| Testing RLS tenant isolation | `prowler-test-api` |
| Update CHANGELOG.md in any component | `prowler-changelog` |
| Writing Prowler API tests | `prowler-test-api` |

View File

@@ -9,7 +9,9 @@ When performing these actions, ALWAYS invoke the corresponding skill FIRST:
| Action | Skill |
|--------|-------|
| Add changelog entry for a PR or feature | `prowler-changelog` |
| Committing changes | `prowler-commit` |
| Create PR that requires changelog entry | `prowler-changelog` |
| Creating a git commit | `prowler-commit` |
| Review changelog format and conventions | `prowler-changelog` |
| Update CHANGELOG.md in any component | `prowler-changelog` |
| Working on MCP server tools | `prowler-mcp` |

View File

@@ -2,185 +2,504 @@
name: django-drf
description: >
Django REST Framework patterns.
Trigger: When implementing generic DRF APIs (ViewSets, serializers, routers, permissions, filtersets). For Prowler API specifics (RLS/RBAC/Providers), also use prowler-api.
license: Apache-2.0
metadata:
author: prowler-cloud
version: "1.0"
version: "1.2.0"
scope: [root, api]
auto_invoke: "Generic DRF patterns"
auto_invoke:
- "Creating ViewSets, serializers, or filters in api/"
- "Implementing JSON:API endpoints"
- "Adding DRF pagination or permissions"
allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, WebSearch, Task
---
## Critical Patterns

- ALWAYS separate serializers by operation: Read / Create / Update / Include
- ALWAYS use `filterset_class` for complex filtering (not `filterset_fields`)
- ALWAYS validate unknown fields in write serializers (inherit `BaseWriteSerializer`)
- ALWAYS use `select_related`/`prefetch_related` in `get_queryset()` to avoid N+1
- ALWAYS handle `swagger_fake_view` in `get_queryset()` for schema generation
- ALWAYS use `@extend_schema_field` for OpenAPI docs on `SerializerMethodField`
- NEVER put business logic in serializers - use services/utils
- NEVER use auto-increment PKs - use UUIDv4 or UUIDv7
- NEVER use trailing slashes in URLs (`trailing_slash=False`)

> **Note:** `swagger_fake_view` is specific to **drf-spectacular** for OpenAPI schema generation.

---
## Implementation Checklist
When implementing a new endpoint, review these patterns in order:
| # | Pattern | Reference | Key Points |
|---|---------|-----------|------------|
| 1 | **Models** | `api/models.py` | UUID PK, `inserted_at`/`updated_at`, `JSONAPIMeta.resource_name` |
| 2 | **ViewSets** | `api/base_views.py`, `api/v1/views.py` | Inherit `BaseRLSViewSet`, `get_queryset()` with N+1 prevention |
| 3 | **Serializers** | `api/v1/serializers.py` | Separate Read/Create/Update/Include, inherit `BaseWriteSerializer` |
| 4 | **Filters** | `api/filters.py` | Use `filterset_class`, inherit base filter classes |
| 5 | **Permissions** | `api/base_views.py` | `required_permissions`, `set_required_permissions()` |
| 6 | **Pagination** | `api/pagination.py` | Custom pagination class if needed |
| 7 | **URL Routing** | `api/v1/urls.py` | `trailing_slash=False`, kebab-case paths |
| 8 | **OpenAPI Schema** | `api/v1/views.py` | `@extend_schema_view` with drf-spectacular |
| 9 | **Tests** | `api/tests/test_views.py` | JSON:API content type, fixture patterns |
> **Full file paths**: See [references/file-locations.md](references/file-locations.md)
---
## Decision Trees
### Which Serializer?
```
GET list/retrieve → <Model>Serializer
POST create → <Model>CreateSerializer
PATCH update → <Model>UpdateSerializer
?include=... → <Model>IncludeSerializer
```
### Which Base Serializer?
```
Read-only serializer → BaseModelSerializerV1
Create with tenant_id → RLSSerializer + BaseWriteSerializer (auto-injects tenant_id on create)
Update with validation → BaseWriteSerializer (tenant_id already exists on object)
Non-model data → BaseSerializerV1
```
### Which Filter Base?
```
Direct FK to Provider → BaseProviderFilter
FK via Scan → BaseScanProviderFilter
No provider relation → FilterSet
```
### Which Base ViewSet?
```
RLS-protected model → BaseRLSViewSet (most common)
Tenant operations → BaseTenantViewset
User operations → BaseUserViewset
No RLS required → BaseViewSet (rare)
```
### Resource Name Format?
```
Single word model → plural lowercase (Provider → providers)
Multi-word model → plural lowercase kebab (ProviderGroup → provider-groups)
Through/join model → parent-child pattern (UserRoleRelationship → user-roles)
Aggregation/overview → descriptive kebab plural (ComplianceOverview → compliance-overviews)
```
---
## Serializer Patterns
### Base Class Hierarchy
```python
# Read serializer (most common)
class ProviderSerializer(RLSSerializer):
    class Meta:
        model = Provider
        fields = ["id", "provider", "uid", "alias", "connected", "inserted_at"]

# Write serializer (validates unknown fields)
class ProviderCreateSerializer(RLSSerializer, BaseWriteSerializer):
    class Meta:
        model = Provider
        fields = ["provider", "uid", "alias"]

# Include serializer (sparse fields for ?include=)
class ProviderIncludeSerializer(RLSSerializer):
    class Meta:
        model = Provider
        fields = ["id", "alias"]  # Minimal fields
```
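`BaseWriteSerializer` is what rejects unknown fields. A minimal sketch of that behavior (illustrative only; the real class lives in `api/v1/serializers.py`):
```python
from rest_framework import serializers

class BaseWriteSerializer(serializers.ModelSerializer):
    """Illustrative sketch: reject any submitted field not declared in Meta.fields."""

    def validate(self, attrs):
        unknown = set(self.initial_data) - set(self.fields)
        if unknown:
            raise serializers.ValidationError(
                {field: "Unknown field" for field in sorted(unknown)}
            )
        return super().validate(attrs)
```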
### SerializerMethodField with OpenAPI
```python
from drf_spectacular.utils import extend_schema_field

class ProviderSerializer(RLSSerializer):
    connection = serializers.SerializerMethodField(read_only=True)

    @extend_schema_field({
        "type": "object",
        "properties": {
            "connected": {"type": "boolean"},
            "last_checked_at": {"type": "string", "format": "date-time"},
        },
    })
    def get_connection(self, obj):
        return {
            "connected": obj.connected,
            "last_checked_at": obj.connection_last_checked_at,
        }
```
### Included Serializers (JSON:API)
```python
class ScanSerializer(RLSSerializer):
    included_serializers = {
        "provider": "api.v1.serializers.ProviderIncludeSerializer",
    }
```
### Sensitive Data Masking
```python
def to_representation(self, instance):
    data = super().to_representation(instance)
    # Mask by default, expose only on explicit request
    fields_param = self.context.get("request").query_params.get("fields[my-model]", "")
    if "api_key" in fields_param:
        data["api_key"] = instance.api_key_decoded
    else:
        data["api_key"] = "****" if instance.api_key else None
    return data
```
---
## ViewSet Patterns
### get_queryset() with N+1 Prevention
**Always combine** `swagger_fake_view` check with `select_related`/`prefetch_related`:
```python
def get_queryset(self):
# REQUIRED: Return empty queryset for OpenAPI schema generation
if getattr(self, "swagger_fake_view", False):
return Provider.objects.none()
# N+1 prevention: eager load relationships
return Provider.objects.select_related(
"tenant",
).prefetch_related(
"provider_groups",
Prefetch("tags", queryset=ProviderTag.objects.filter(tenant_id=self.request.tenant_id)),
)
```
> **Why swagger_fake_view?** drf-spectacular introspects ViewSets to generate OpenAPI schemas. Without this check, it executes real queries and can fail without request context.
### Action-Specific Serializers
```python
def get_serializer_class(self):
if self.action == "create":
return ProviderCreateSerializer
elif self.action == "partial_update":
return ProviderUpdateSerializer
elif self.action in ["connection", "destroy"]:
return TaskSerializer
return ProviderSerializer
```
### Dynamic Permissions per Action
```python
class ProviderViewSet(BaseRLSViewSet):
required_permissions = [Permissions.MANAGE_PROVIDERS]
def set_required_permissions(self):
if self.action in ["list", "retrieve"]:
self.required_permissions = [] # Read-only = no permission
else:
self.required_permissions = [Permissions.MANAGE_PROVIDERS]
```
### Cache Decorator
```python
from django.utils.decorators import method_decorator
from django.views.decorators.cache import cache_control
CACHE_DECORATOR = cache_control(
max_age=django_settings.CACHE_MAX_AGE,
stale_while_revalidate=django_settings.CACHE_STALE_WHILE_REVALIDATE,
)
@method_decorator(CACHE_DECORATOR, name="list")
@method_decorator(CACHE_DECORATOR, name="retrieve")
class ProviderViewSet(BaseRLSViewSet):
pass
```
### Custom Actions
```python
# Detail action (operates on single object)
@action(detail=True, methods=["post"], url_name="connection")
def connection(self, request, pk=None):
instance = self.get_object()
# Process instance...
# List action (operates on collection)
@action(detail=False, methods=["get"], url_name="metadata")
def metadata(self, request):
queryset = self.filter_queryset(self.get_queryset())
# Aggregate over queryset...
```
---
## Filter Patterns
### Base Filter Classes
```python
class BaseProviderFilter(FilterSet):
"""For models with direct FK to Provider"""
provider_id = UUIDFilter(field_name="provider__id", lookup_expr="exact")
provider_id__in = UUIDInFilter(field_name="provider__id", lookup_expr="in")
provider_type = ChoiceFilter(field_name="provider__provider", choices=Provider.ProviderChoices.choices)
class BaseScanProviderFilter(FilterSet):
"""For models with FK to Scan (Scan has FK to Provider)"""
provider_id = UUIDFilter(field_name="scan__provider__id", lookup_expr="exact")
```
### Custom Multi-Value Filters
```python
class UUIDInFilter(BaseInFilter, UUIDFilter):
pass
class CharInFilter(BaseInFilter, CharFilter):
pass
class ChoiceInFilter(BaseInFilter, ChoiceFilter):
pass
```
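Usage sketch (filter and field names are illustrative): one declaration accepts a comma-separated list and maps it to a single `in` lookup.
```python
class FindingFilter(FilterSet):
    # Accepts ?filter[provider_id__in]=uuid1,uuid2,uuid3
    provider_id__in = UUIDInFilter(field_name="scan__provider__id", lookup_expr="in")
```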
### ArrayField Filtering
```python
# Single value contains
region = CharFilter(method="filter_region")
def filter_region(self, queryset, name, value):
return queryset.filter(resource_regions__contains=[value])
# Multi-value overlap
region__in = CharInFilter(field_name="resource_regions", lookup_expr="overlap")
```
### Date Range Validation
```python
def filter_queryset(self, queryset):
# Require date filter for performance
if not (date_filters_provided):
raise ValidationError([{
"detail": "At least one date filter is required",
"status": 400,
"source": {"pointer": "/data/attributes/inserted_at"},
"code": "required",
}])
# Validate max range
if date_range > settings.FINDINGS_MAX_DAYS_IN_RANGE:
raise ValidationError(...)
return super().filter_queryset(queryset)
```
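A more concrete sketch of the placeholders above, assuming `inserted_at__gte`/`inserted_at__lte` filters and DRF's `ValidationError`:
```python
from datetime import timedelta

from django.conf import settings
from rest_framework.exceptions import ValidationError

def filter_queryset(self, queryset):
    date_from = self.form.cleaned_data.get("inserted_at__gte")
    date_to = self.form.cleaned_data.get("inserted_at__lte")
    if not (date_from or date_to):
        raise ValidationError([{
            "detail": "At least one date filter is required",
            "status": 400,
            "source": {"pointer": "/data/attributes/inserted_at"},
            "code": "required",
        }])
    if date_from and date_to and (date_to - date_from) > timedelta(
        days=settings.FINDINGS_MAX_DAYS_IN_RANGE
    ):
        raise ValidationError([{"detail": "Date range too large", "code": "invalid"}])
    return super().filter_queryset(queryset)
```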
### Dynamic FilterSet Selection
```python
def get_filterset_class(self):
if self.action in ["latest", "metadata_latest"]:
return LatestFindingFilter
return FindingFilter
```
### Enum Field Override
```python
class Meta:
model = Finding
filter_overrides = {
FindingDeltaEnumField: {"filter_class": CharFilter},
StatusEnumField: {"filter_class": CharFilter},
SeverityEnumField: {"filter_class": CharFilter},
}
```
---
## Performance Patterns
### PaginateByPkMixin
For large querysets with expensive joins:
```python
class PaginateByPkMixin:
def paginate_by_pk(self, request, base_queryset, manager,
select_related=None, prefetch_related=None):
# 1. Get PKs only (cheap)
pk_list = base_queryset.values_list("id", flat=True)
page = self.paginate_queryset(pk_list)
# 2. Fetch full objects for just the page
queryset = manager.filter(id__in=page)
if select_related:
queryset = queryset.select_related(*select_related)
if prefetch_related:
queryset = queryset.prefetch_related(*prefetch_related)
# 3. Re-sort to preserve DB ordering
queryset = sorted(queryset, key=lambda obj: page.index(obj.id))
return self.get_paginated_response(self.get_serializer(queryset, many=True).data)
```
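A usage sketch in a ViewSet `list()` (manager and relation names are illustrative):
```python
class FindingViewSet(PaginateByPkMixin, BaseRLSViewSet):
    def list(self, request, *args, **kwargs):
        base_queryset = self.filter_queryset(self.get_queryset())
        return self.paginate_by_pk(
            request,
            base_queryset,
            manager=Finding.all_objects,
            select_related=["scan"],
            prefetch_related=["resources"],
        )
```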
### Prefetch in Serializers
```python
def get_tags(self, obj):
# Use prefetched tags if available
if hasattr(obj, "prefetched_tags"):
return {tag.key: tag.value for tag in obj.prefetched_tags}
# Fallback (causes N+1 if not prefetched)
return obj.get_tags(self.context.get("tenant_id"))
```
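The `prefetched_tags` attribute only exists if the ViewSet's `get_queryset()` created it via `Prefetch(..., to_attr=...)`; a sketch of the pairing:
```python
from django.db.models import Prefetch

def get_queryset(self):
    return Resource.objects.prefetch_related(
        Prefetch(
            "tags",
            queryset=ResourceTag.objects.filter(tenant_id=self.request.tenant_id),
            to_attr="prefetched_tags",  # populates obj.prefetched_tags
        )
    )
```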
---
## Naming Conventions
| Entity | Pattern | Example |
|--------|---------|---------|
| Serializer (read) | `<Model>Serializer` | `ProviderSerializer` |
| Serializer (create) | `<Model>CreateSerializer` | `ProviderCreateSerializer` |
| Serializer (update) | `<Model>UpdateSerializer` | `ProviderUpdateSerializer` |
| Serializer (include) | `<Model>IncludeSerializer` | `ProviderIncludeSerializer` |
| Filter | `<Model>Filter` | `ProviderFilter` |
| ViewSet | `<Model>ViewSet` | `ProviderViewSet` |
---
## OpenAPI Documentation
```python
from drf_spectacular.utils import extend_schema, extend_schema_view
@extend_schema_view(
list=extend_schema(tags=["Provider"], summary="List all providers"),
retrieve=extend_schema(tags=["Provider"], summary="Retrieve provider"),
create=extend_schema(tags=["Provider"], summary="Create provider"),
)
@extend_schema(tags=["Provider"])
class ProviderViewSet(BaseRLSViewSet):
pass
```
---
## API Security Patterns
> **Full examples**: See [assets/security_patterns.py](assets/security_patterns.py)
| Pattern | Key Points |
|---------|------------|
| **Input Validation** | Use `validate_<field>()` for sanitization, `validate()` for cross-field |
| **Prevent Mass Assignment** | ALWAYS use explicit `fields` list, NEVER `__all__` or `exclude` |
| **Object-Level Permissions** | Implement `has_object_permission()` for ownership checks |
| **Rate Limiting** | Configure `DEFAULT_THROTTLE_RATES`, use per-view throttles for sensitive endpoints |
| **Prevent Info Disclosure** | Generic error messages, return 404 not 403 for unauthorized (prevents enumeration) |
| **SQL Injection** | ALWAYS use ORM parameterization, NEVER string interpolation in raw SQL |
### Quick Reference
```python
# Input validation in serializer
def validate_uid(self, value):
value = value.strip().lower()
if not re.match(r'^[a-z0-9-]+$', value):
raise serializers.ValidationError("Invalid format")
return value
# Explicit fields (prevent mass assignment)
class Meta:
fields = ["name", "email"] # GOOD: whitelist
read_only_fields = ["id", "inserted_at"] # System fields
# Object permission
class IsOwnerOrReadOnly(BasePermission):
def has_object_permission(self, request, view, obj):
if request.method in SAFE_METHODS:
return True
return obj.owner == request.user
# Throttling for sensitive endpoints
class BurstRateThrottle(UserRateThrottle):
rate = "10/minute"
# Safe error messages (prevent enumeration)
def get_object(self):
try:
return super().get_object()
except Http404:
raise NotFound("Resource not found") # Generic, no internal IDs
```
---
## Commands
```bash
# Development
cd api && poetry run python src/backend/manage.py runserver
cd api && poetry run python src/backend/manage.py shell
# Database
cd api && poetry run python src/backend/manage.py makemigrations
cd api && poetry run python src/backend/manage.py migrate
# Testing
cd api && poetry run pytest -x --tb=short
cd api && poetry run make lint
```
---
## Resources
### Local References
- **File Locations**: See [references/file-locations.md](references/file-locations.md)
- **JSON:API Conventions**: See [references/json-api-conventions.md](references/json-api-conventions.md)
- **Security Patterns**: See [assets/security_patterns.py](assets/security_patterns.py)
### Context7 MCP (Recommended)
**Prerequisite:** Install Context7 MCP server for up-to-date documentation lookup.
When implementing or debugging, query these libraries via `mcp_context7_query-docs`:
| Library | Context7 ID | Use For |
|---------|-------------|---------|
| **Django** | `/websites/djangoproject_en_5_2` | Models, ORM, migrations |
| **DRF** | `/websites/django-rest-framework` | ViewSets, serializers, permissions |
| **drf-spectacular** | `/tfranzel/drf-spectacular` | OpenAPI schema, `@extend_schema` |
**Example queries:**
```
mcp_context7_query-docs(libraryId="/websites/django-rest-framework", query="ViewSet get_queryset best practices")
mcp_context7_query-docs(libraryId="/tfranzel/drf-spectacular", query="extend_schema examples for custom actions")
mcp_context7_query-docs(libraryId="/websites/djangoproject_en_5_2", query="model constraints and indexes")
```
> **Note:** Use `mcp_context7_resolve-library-id` first if you need to find the correct library ID.
### External Docs
- **DRF Docs**: https://www.django-rest-framework.org/
- **DRF JSON:API**: https://django-rest-framework-json-api.readthedocs.io/
- **drf-spectacular**: https://drf-spectacular.readthedocs.io/
- **django-filter**: https://django-filter.readthedocs.io/

View File

@@ -0,0 +1,159 @@
# Example: DRF API Security Patterns
# Reference for django-drf skill
import re
from rest_framework import serializers, status, viewsets
from rest_framework.exceptions import NotFound
from rest_framework.permissions import SAFE_METHODS, BasePermission, IsAuthenticated
from rest_framework.throttling import UserRateThrottle
# =============================================================================
# INPUT VALIDATION
# =============================================================================
class ProviderCreateSerializer(serializers.Serializer):
"""Example: Input validation in serializers."""
uid = serializers.CharField(max_length=255)
provider = serializers.CharField()
def validate_uid(self, value):
"""Field-level validation with sanitization."""
# Sanitize: strip whitespace, normalize
value = value.strip().lower()
# Validate format
if not re.match(r"^[a-z0-9-]+$", value):
raise serializers.ValidationError(
"UID must be alphanumeric with hyphens only"
)
return value
def validate(self, attrs):
"""Cross-field validation."""
if attrs.get("provider") == "aws" and len(attrs.get("uid", "")) != 12:
raise serializers.ValidationError(
{"uid": "AWS account ID must be 12 digits"}
)
return attrs
# =============================================================================
# PREVENT MASS ASSIGNMENT
# =============================================================================
class UserUpdateSerializer(serializers.ModelSerializer):
"""Example: Explicit field whitelist prevents mass assignment."""
class Meta:
# GOOD: Explicit whitelist
fields = ["name", "email"]
# BAD: fields = "__all__" # Exposes is_staff, is_superuser
# BAD: exclude = ["password"] # New fields auto-exposed
class ProviderSerializer(serializers.ModelSerializer):
"""Example: Read-only fields for computed/system values."""
class Meta:
fields = ["id", "uid", "alias", "connected", "inserted_at"]
# Cannot be set via API - only read
read_only_fields = ["id", "connected", "inserted_at"]
# =============================================================================
# OBJECT-LEVEL PERMISSIONS
# =============================================================================
class IsOwnerOrReadOnly(BasePermission):
"""Example: Object-level permission check."""
def has_object_permission(self, request, view, obj):
# Read permissions for any authenticated request
if request.method in SAFE_METHODS:
return True
# Write permissions only for owner
return obj.owner == request.user
class DocumentViewSet(viewsets.ModelViewSet):
"""Example: ViewSet with object-level permissions."""
permission_classes = [IsAuthenticated, IsOwnerOrReadOnly]
# =============================================================================
# RATE LIMITING (THROTTLING)
# =============================================================================
# In settings.py:
# REST_FRAMEWORK = {
# "DEFAULT_THROTTLE_CLASSES": [
# "rest_framework.throttling.AnonRateThrottle",
# "rest_framework.throttling.UserRateThrottle",
# ],
# "DEFAULT_THROTTLE_RATES": {
# "anon": "100/hour",
# "user": "1000/hour",
# },
# }
class BurstRateThrottle(UserRateThrottle):
"""Example: Custom throttle for sensitive endpoints."""
rate = "10/minute"
class PasswordResetViewSet(viewsets.ViewSet):
"""Example: Per-view throttling for sensitive endpoints."""
throttle_classes = [BurstRateThrottle]
# =============================================================================
# PREVENT INFORMATION DISCLOSURE
# =============================================================================
class SecureViewSet(viewsets.ModelViewSet):
"""Example: Prevent information disclosure patterns."""
def get_object(self):
try:
return super().get_object()
except Exception:
# GOOD: Generic message - doesn't leak internal IDs or tenant info
raise NotFound("Resource not found")
# BAD: raise NotFound(f"Provider {pk} not found in tenant {tenant_id}")
def get_queryset(self):
# Use 404 not 403 for unauthorized access (prevents enumeration)
# Filter by tenant - unauthorized users get 404, not 403
return self.queryset.filter(tenant_id=self.request.tenant_id)
# =============================================================================
# SQL INJECTION PREVENTION
# =============================================================================
def safe_query_examples(user_input):
"""Example: SQL injection prevention patterns."""
from django.db import connection
# GOOD: Parameterized via ORM
# Provider.objects.filter(uid=user_input)
# Provider.objects.extra(where=["uid = %s"], params=[user_input])
# GOOD: If raw SQL unavoidable, use parameterized queries
with connection.cursor() as cursor:
cursor.execute("SELECT * FROM providers WHERE uid = %s", [user_input])
# BAD: String interpolation = SQL injection vulnerability
# Provider.objects.raw(f"SELECT * FROM providers WHERE uid = '{user_input}'")
# cursor.execute(f"SELECT * FROM providers WHERE uid = '{user_input}'")

View File

@@ -0,0 +1,154 @@
# Django-DRF File Locations
## Core API Files
| Pattern | File Path | Key Classes |
|---------|-----------|-------------|
| **Models** | `api/src/backend/api/models.py` | `Provider`, `Scan`, `Finding`, `Resource`, `StateChoices`, `StatusChoices` |
| **ViewSets** | `api/src/backend/api/v1/views.py` | `BaseViewSet`, `BaseRLSViewSet`, `BaseTenantViewset`, `BaseUserViewset` |
| **Serializers** | `api/src/backend/api/v1/serializers.py` | `BaseModelSerializerV1`, `BaseWriteSerializer`, `RLSSerializer` |
| **Filters** | `api/src/backend/api/filters.py` | `BaseProviderFilter`, `BaseScanProviderFilter`, `CommonFindingFilters` |
| **URL Routing** | `api/src/backend/api/v1/urls.py` | Router setup, nested routes |
| **Pagination** | `api/src/backend/api/pagination.py` | `LimitedJsonApiPageNumberPagination` |
| **Permissions** | `api/src/backend/api/decorators.py` | `HasPermissions`, `@check_permissions` |
| **RBAC** | `api/src/backend/api/rbac/permissions.py` | `Permissions` enum, `get_role()`, `get_providers()` |
| **Settings** | `api/src/backend/config/settings.py` | `REST_FRAMEWORK` config |
## ViewSet Hierarchy
```
BaseViewSet (minimal - no RLS/auth)
├── BaseRLSViewSet (+ tenant filtering, RLS-protected models)
│ └── Most ViewSets inherit this
├── BaseTenantViewset (+ Tenant-specific logic)
│ └── TenantViewSet
└── BaseUserViewset (+ User-specific logic)
└── UserViewSet
```
## Serializer Hierarchy
```
BaseModelSerializerV1 (JSON:API defaults, read_only_fields)
├── RLSSerializer (auto-injects tenant_id from request)
│ └── Most model serializers inherit this
└── BaseWriteSerializer (rejects unknown fields)
└── Create/Update serializers
+ Mixins:
- IncludedResourcesValidationMixin (validates ?include= param)
- JSONAPIRelatedLinksSerializerMixin (adds related links)
```
## Filter Hierarchy
```
FilterSet (django-filter)
├── CommonFindingFilters (mixin for date ranges, delta, status)
├── BaseProviderFilter (provider_type, provider_uid, provider_alias)
│ │
│ └── BaseScanProviderFilter (+ scan_id, scan filters)
└── Resource-specific filters (ProviderFilter, ScanFilter, etc.)
Custom Filter Types:
- UUIDInFilter: Comma-separated UUIDs
- CharInFilter: Comma-separated strings
- DateFilter: ISO date parsing
- DateTimeFilter: ISO datetime parsing
```
## Testing Files
| Pattern | File Path | Key Classes |
|---------|-----------|-------------|
| **ViewSet Tests** | `api/src/backend/api/tests/test_views.py` | Test patterns, fixtures |
| **RBAC Tests** | `api/src/backend/api/tests/test_rbac.py` | Permission tests |
| **Serializer Tests** | `api/src/backend/api/tests/test_serializers.py` | Validation tests |
| **Conftest** | `api/src/backend/conftest.py` | Shared fixtures |
## Key Patterns
### Filter Usage
```python
# In filters.py
class ProviderFilter(BaseProviderFilter):
class Meta:
model = Provider
fields = {
"provider": ["exact", "in"],
"connected": ["exact"],
}
# Custom filter method
def filter_severity(self, queryset, name, value):
if not value:
return queryset
return queryset.filter(severity__in=value)
```
### Serializer Usage
```python
# Read serializer
class ProviderSerializer(RLSSerializer):
class Meta:
model = Provider
fields = ["id", "provider", "uid", "alias", "connected"]
# Write serializer
class ProviderCreateSerializer(BaseWriteSerializer, RLSSerializer):
class Meta:
model = Provider
fields = ["provider", "uid", "alias"]
```
### ViewSet Action Pattern
```python
@action(detail=True, methods=["post"], url_path="scan")
def trigger_scan(self, request, pk=None):
provider = self.get_object()
task = perform_scan_task.delay(...)
return Response(status=status.HTTP_202_ACCEPTED)
```
## REST_FRAMEWORK Settings
Located in `api/src/backend/config/settings.py`:
```python
REST_FRAMEWORK = {
"PAGE_SIZE": 10,
"DEFAULT_PAGINATION_CLASS": "api.pagination.LimitedJsonApiPageNumberPagination",
"DEFAULT_PARSER_CLASSES": [
"rest_framework_json_api.parsers.JSONParser",
"rest_framework.parsers.JSONParser",
],
"DEFAULT_FILTER_BACKENDS": [
"rest_framework_json_api.filters.QueryParameterValidationFilter",
"rest_framework_json_api.filters.OrderingFilter",
"rest_framework_json_api.django_filters.DjangoFilterBackend",
"rest_framework.filters.SearchFilter",
],
"EXCEPTION_HANDLER": "rest_framework_json_api.exceptions.exception_handler",
# ... more settings
}
```
## JSON:API Resource Names
Find all `JSONAPIMeta` declarations:
```bash
rg "resource_name" api/src/backend/api/models.py
```
Convention: kebab-case, plural (e.g., `provider-groups`, `mute-rules`)

View File

@@ -0,0 +1,116 @@
# JSON:API Conventions
## Content Type
```
Content-Type: application/vnd.api+json
Accept: application/vnd.api+json
```
## Query Parameters
| Feature | Format | Example |
|---------|--------|---------|
| **Pagination** | `page[number]`, `page[size]` | `?page[number]=2&page[size]=20` |
| **Filtering** | `filter[field]`, `filter[field__lookup]` | `?filter[status]=FAIL&filter[inserted_at__gte]=2024-01-01` |
| **Sorting** | `sort` (prefix `-` for desc) | `?sort=-inserted_at,name` |
| **Sparse fields** | `fields[type]` | `?fields[providers]=id,alias,uid` |
| **Includes** | `include` | `?include=provider,scan` |
| **Search** | `filter[search]` | `?filter[search]=production` |
## Filter Naming
| Lookup | Django Filter | JSON:API Query |
|--------|--------------|----------------|
| Exact | `field` | `filter[field]=value` |
| Contains | `field__icontains` | `filter[field__icontains]=val` |
| In list | `field__in` | `filter[field__in]=a,b,c` |
| Greater/equal | `field__gte` | `filter[field__gte]=2024-01-01` |
| Less/equal | `field__lte` | `filter[field__lte]=2024-12-31` |
| Related field | `relation__field` | `filter[provider_id]=uuid` |
## Request Format
```json
{
"data": {
"type": "providers",
"attributes": {
"provider": "aws",
"uid": "123456789012",
"alias": "Production"
}
}
}
```
## Response Format
```json
{
"data": {
"type": "providers",
"id": "550e8400-e29b-41d4-a716-446655440000",
"attributes": {
"provider": "aws",
"uid": "123456789012",
"alias": "Production",
"inserted_at": "2024-01-15T10:30:00Z"
},
"relationships": {
"provider_groups": {
"data": [{"type": "provider-groups", "id": "..."}]
}
},
"links": {
"self": "/api/v1/providers/550e8400-e29b-41d4-a716-446655440000"
}
},
"meta": {
"version": "v1"
}
}
```
## Error Response Format
```json
{
"errors": [
{
"detail": "Error message here",
"status": "400",
"source": {"pointer": "/data/attributes/field_name"},
"code": "error_code"
}
]
}
```
## Resource Naming Rules
- Use **lowercase kebab-case** (hyphens, not underscores)
- Use **plural nouns** for collections
- Resource name in `JSONAPIMeta` MUST match URL path segment
| Model | resource_name | URL Path |
|-------|---------------|----------|
| `Provider` | `providers` | `/api/v1/providers` |
| `ProviderGroup` | `provider-groups` | `/api/v1/provider-groups` |
| `ProviderSecret` | `provider-secrets` | `/api/v1/providers/secrets` |
| `ComplianceOverview` | `compliance-overviews` | `/api/v1/compliance-overviews` |
| `AttackPathsScan` | `attack-paths-scans` | `/api/v1/attack-paths-scans` |
| `TenantAPIKey` | `api-keys` | `/api/v1/api-keys` |
| `MuteRule` | `mute-rules` | `/api/v1/mute-rules` |
## URL Endpoints
| Operation | Method | URL Pattern |
|-----------|--------|-------------|
| List | GET | `/{resources}` |
| Create | POST | `/{resources}` |
| Retrieve | GET | `/{resources}/{id}` |
| Update | PATCH | `/{resources}/{id}` |
| Delete | DELETE | `/{resources}/{id}` |
| Relationship | * | `/{resources}/{id}/relationships/{relation}` |
| Nested list | GET | `/{parent}/{parent_id}/{resources}` |

skills/jsonapi/SKILL.md Normal file
View File

@@ -0,0 +1,271 @@
---
name: jsonapi
description: >
Strict JSON:API v1.1 specification compliance.
Trigger: When creating or modifying API endpoints, reviewing API responses, or validating JSON:API compliance.
license: Apache-2.0
metadata:
author: prowler-cloud
version: "1.0.0"
scope: [root, api]
auto_invoke:
- "Creating API endpoints"
- "Modifying API responses"
- "Reviewing JSON:API compliance"
---
## Use With django-drf
This skill focuses on **spec compliance**. For **implementation patterns** (ViewSets, Serializers, Filters), use `django-drf` skill together with this one.
| Skill | Focus |
|-------|-------|
| `jsonapi` | What the spec requires (MUST/MUST NOT rules) |
| `django-drf` | How to implement it in DRF (code patterns) |
**When creating/modifying endpoints, invoke BOTH skills.**
---
## Before Implementing/Reviewing
**ALWAYS validate against the latest spec** before creating or modifying endpoints:
### Option 1: Context7 MCP (Preferred)
If Context7 MCP is available, query the JSON:API spec directly:
```
mcp_context7_resolve-library-id(query="jsonapi specification")
mcp_context7_query-docs(libraryId="<resolved-id>", query="[specific topic: relationships, errors, etc.]")
```
### Option 2: WebFetch (Fallback)
If Context7 is not available, fetch from the official spec:
```
WebFetch(url="https://jsonapi.org/format/", prompt="Extract rules for [specific topic]")
```
This ensures compliance with the latest JSON:API version, even after spec updates.
---
## Critical Rules (NEVER Break)
### Document Structure
- NEVER include both `data` and `errors` in the same response
- ALWAYS include at least one of: `data`, `errors`, `meta`
- ALWAYS use `type` and `id` (string) in resource objects
- NEVER include `id` when creating resources (server generates it)
### Content-Type
- ALWAYS use `Content-Type: application/vnd.api+json`
- ALWAYS use `Accept: application/vnd.api+json`
- NEVER add parameters to media type without `ext`/`profile`
### Resource Objects
- ALWAYS use **string** for `id` (even if UUID)
- ALWAYS use **lowercase kebab-case** for `type`
- NEVER put `id` or `type` inside `attributes`
- NEVER include foreign keys in `attributes` - use `relationships`
### Relationships
- ALWAYS include at least one of: `links`, `data`, or `meta`
- ALWAYS use resource linkage format: `{"type": "...", "id": "..."}`
- NEVER use raw IDs in relationships - always use linkage objects
### Error Objects
- ALWAYS return errors as array: `{"errors": [...]}`
- ALWAYS include `status` as **string** (e.g., `"400"`, not `400`)
- ALWAYS include `source.pointer` for field-specific errors
---
## HTTP Status Codes (Mandatory)
| Operation | Success | Async | Conflict | Not Found | Forbidden | Bad Request |
|-----------|---------|-------|----------|-----------|-----------|-------------|
| **GET** | `200` | - | - | `404` | `403` | `400` |
| **POST** | `201` | `202` | `409` | `404` | `403` | `400` |
| **PATCH** | `200` | `202` | `409` | `404` | `403` | `400` |
| **DELETE** | `200`/`204` | `202` | - | `404` | `403` | - |
### When to Use Each
| Code | Use When |
|------|----------|
| `200 OK` | Successful GET, PATCH with response body, DELETE with response |
| `201 Created` | POST created resource (MUST include `Location` header) |
| `202 Accepted` | Async operation started (return task reference) |
| `204 No Content` | Successful DELETE, PATCH with no response body |
| `400 Bad Request` | Invalid query params, malformed request, unknown fields |
| `403 Forbidden` | Authentication ok but no permission, client-generated ID rejected |
| `404 Not Found` | Resource doesn't exist OR RLS hides it (never reveal which) |
| `409 Conflict` | Duplicate ID, type mismatch, relationship conflict |
| `415 Unsupported Media Type` | Wrong `Content-Type` header |
---
## Document Structure
### Success Response (Single)
```json
{
"data": {
"type": "providers",
"id": "550e8400-e29b-41d4-a716-446655440000",
"attributes": {
"alias": "Production",
"connected": true
},
"relationships": {
"tenant": {
"data": {"type": "tenants", "id": "..."}
}
},
"links": {
"self": "/api/v1/providers/550e8400-..."
}
},
"links": {
"self": "/api/v1/providers/550e8400-..."
}
}
```
### Success Response (List)
```json
{
"data": [
{"type": "providers", "id": "...", "attributes": {...}},
{"type": "providers", "id": "...", "attributes": {...}}
],
"links": {
"self": "/api/v1/providers?page[number]=1",
"first": "/api/v1/providers?page[number]=1",
"last": "/api/v1/providers?page[number]=5",
"prev": null,
"next": "/api/v1/providers?page[number]=2"
},
"meta": {
"pagination": {"count": 100, "pages": 5}
}
}
```
### Error Response
```json
{
"errors": [
{
"status": "400",
"code": "invalid",
"title": "Invalid attribute",
"detail": "UID must be 12 digits for AWS accounts",
"source": {"pointer": "/data/attributes/uid"}
}
]
}
```
---
## Query Parameters
| Family | Format | Example |
|--------|--------|---------|
| `page` | `page[number]`, `page[size]` | `?page[number]=2&page[size]=25` |
| `filter` | `filter[field]`, `filter[field__op]` | `?filter[status]=FAIL` |
| `sort` | Comma-separated, `-` for desc | `?sort=-inserted_at,name` |
| `fields` | `fields[type]` | `?fields[providers]=id,alias` |
| `include` | Comma-separated paths | `?include=provider,scan.task` |
### Rules
- MUST return `400` for unsupported query parameters
- MUST return `400` for unsupported `include` paths
- MUST return `400` for unsupported `sort` fields
- MUST NOT include extra fields when `fields[type]` is specified
---
## Common Violations (AVOID)
| Violation | Wrong | Correct |
|-----------|-------|---------|
| ID as integer | `"id": 123` | `"id": "123"` |
| Type as camelCase | `"type": "providerGroup"` | `"type": "provider-groups"` |
| FK in attributes | `"tenant_id": "..."` | `"relationships": {"tenant": {...}}` |
| Errors not array | `{"error": "..."}` | `{"errors": [{"detail": "..."}]}` |
| Status as number | `"status": 400` | `"status": "400"` |
| Data + errors | `{"data": ..., "errors": ...}` | Only one or the other |
| Missing pointer | `{"detail": "Invalid"}` | `{"detail": "...", "source": {"pointer": "..."}}` |
---
## Relationship Updates
### To-One Relationship
```http
PATCH /api/v1/providers/123/relationships/tenant
Content-Type: application/vnd.api+json
{"data": {"type": "tenants", "id": "456"}}
```
To clear: `{"data": null}`
### To-Many Relationship
| Operation | Method | Body |
|-----------|--------|------|
| Replace all | PATCH | `{"data": [{...}, {...}]}` |
| Add members | POST | `{"data": [{...}]}` |
| Remove members | DELETE | `{"data": [{...}]}` |
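For example, adding a member to a to-many relationship (a sketch using `requests`; host and IDs are illustrative):
```python
import requests

response = requests.post(
    "https://api.example.com/api/v1/provider-groups/123/relationships/providers",
    headers={
        "Content-Type": "application/vnd.api+json",
        "Accept": "application/vnd.api+json",
    },
    json={"data": [{"type": "providers", "id": "456"}]},
)
```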
---
## Compound Documents (`include`)
When using `?include=provider`:
```json
{
"data": {
"type": "scans",
"id": "...",
"relationships": {
"provider": {
"data": {"type": "providers", "id": "prov-123"}
}
}
},
"included": [
{
"type": "providers",
"id": "prov-123",
"attributes": {"alias": "Production"}
}
]
}
```
### Rules
- Every included resource MUST be reachable via relationship chain from primary data
- MUST NOT include orphan resources
- MUST NOT duplicate resources (same type+id)
---
## Spec Reference
- **Full Specification**: https://jsonapi.org/format/
- **Implementation**: Use `django-drf` skill for DRF-specific patterns
- **Testing**: Use `prowler-test-api` skill for test patterns

View File

@@ -1,29 +1,205 @@
---
name: prowler-api
description: >
Prowler API patterns: RLS, RBAC, providers, Celery tasks.
Trigger: When working in api/ on models/serializers/viewsets/filters/tasks involving tenant isolation (RLS), RBAC, or provider lifecycle.
license: Apache-2.0
metadata:
author: prowler-cloud
version: "1.0"
version: "1.2.0"
scope: [root, api]
auto_invoke: "Creating/modifying models, views, serializers"
allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, WebSearch, Task
---
## When to Use
Use this skill for **Prowler-specific** patterns:
- Row-Level Security (RLS) / tenant isolation
- RBAC permissions and role checks
- Provider lifecycle and validation
- Celery tasks with tenant context
- Multi-database architecture (4-database setup)
For **generic DRF patterns** (ViewSets, Serializers, Filters, JSON:API), use `django-drf` skill.
---
## Critical Rules
- ALWAYS use `rls_transaction(tenant_id)` when querying outside ViewSet context
- ALWAYS use `get_role()` before checking permissions (returns FIRST role only)
- ALWAYS use `@set_tenant` then `@handle_provider_deletion` decorator order
- ALWAYS use explicit through models for M2M relationships (required for RLS)
- NEVER access `Provider.objects` without RLS context in Celery tasks
- NEVER bypass RLS by using raw SQL or `connection.cursor()`
- NEVER use Django's default M2M - RLS requires through models with `tenant_id`
> **Note**: `rls_transaction()` accepts both UUID objects and strings - it converts internally via `str(value)`.
---
## Architecture Overview
### 4-Database Architecture
| Database | Alias | Purpose | RLS |
|----------|-------|---------|-----|
| `default` | `prowler_user` | Standard API queries | **Yes** |
| `admin` | `admin` | Migrations, auth bypass | No |
| `replica` | `prowler_user` | Read-only queries | **Yes** |
| `admin_replica` | `admin` | Admin read replica | No |
```python
# When to use admin (bypasses RLS)
from api.db_router import MainRouter
User.objects.using(MainRouter.admin_db).get(id=user_id) # Auth lookups
# Standard queries use default (RLS enforced)
Provider.objects.filter(connected=True) # Requires rls_transaction context
```
### RLS Transaction Flow
```
Request → Authentication → BaseRLSViewSet.initial()
├─ Extract tenant_id from JWT
├─ SET api.tenant_id = 'uuid' (PostgreSQL)
└─ All queries now tenant-scoped
```
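For code outside the request cycle (management commands, scripts), set the tenant context explicitly with `rls_transaction`:
```python
from api.db_utils import rls_transaction

with rls_transaction(tenant_id):
    providers = Provider.objects.filter(connected=True)
    # PostgreSQL enforces tenant_id scoping on every query in this block
```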
---
## Implementation Checklist
When implementing Prowler-specific API features:
| # | Pattern | Reference | Key Points |
|---|---------|-----------|------------|
| 1 | **RLS Models** | `api/rls.py` | Inherit `RowLevelSecurityProtectedModel`, add constraint |
| 2 | **RLS Transactions** | `api/db_utils.py` | Use `rls_transaction(tenant_id)` context manager |
| 3 | **RBAC Permissions** | `api/rbac/permissions.py` | `get_role()`, `get_providers()`, `Permissions` enum |
| 4 | **Provider Validation** | `api/models.py` | `validate_<provider>_uid()` methods on `Provider` model |
| 5 | **Celery Tasks** | `tasks/tasks.py`, `api/decorators.py`, `config/celery.py` | Task definitions, decorators (`@set_tenant`, `@handle_provider_deletion`), `RLSTask` base |
| 6 | **RLS Serializers** | `api/v1/serializers.py` | Inherit `RLSSerializer` to auto-inject `tenant_id` |
| 7 | **Through Models** | `api/models.py` | ALL M2M must use explicit through with `tenant_id` |
> **Full file paths**: See [references/file-locations.md](references/file-locations.md)
---
## Decision Trees
### Which Base Model?
```
Tenant-scoped data → RowLevelSecurityProtectedModel
Global/shared data → models.Model + BaseSecurityConstraint (rare)
Partitioned time-series → PostgresPartitionedModel + RowLevelSecurityProtectedModel
Soft-deletable → Add is_deleted + ActiveProviderManager
```
### Which Manager?
```
Normal queries → Model.objects (excludes deleted)
Include deleted records → Model.all_objects
Celery task context → Must use rls_transaction() first
```
### Which Database?
```
Standard API queries → default (automatic via ViewSet)
Read-only operations → replica (automatic for GET in BaseRLSViewSet)
Auth/admin operations → MainRouter.admin_db
Cross-tenant lookups → MainRouter.admin_db (use sparingly!)
```
### Celery Task Decorator Order?
```
@shared_task(base=RLSTask, name="...", queue="...")
@set_tenant # First: sets tenant context
@handle_provider_deletion # Second: handles deleted providers
def my_task(tenant_id, provider_id):
pass
```
---
## RLS Model Pattern
```python
from uuid import uuid4
from django.db import models
from api.rls import RowLevelSecurityProtectedModel, RowLevelSecurityConstraint
class MyModel(RowLevelSecurityProtectedModel):
# tenant FK inherited from parent
id = models.UUIDField(primary_key=True, default=uuid4, editable=False)
name = models.CharField(max_length=255)
inserted_at = models.DateTimeField(auto_now_add=True, editable=False)
updated_at = models.DateTimeField(auto_now=True, editable=False)
class Meta(RowLevelSecurityProtectedModel.Meta):
db_table = "my_models"
constraints = [
RowLevelSecurityConstraint(
field="tenant_id",
name="rls_on_%(class)s",
statements=["SELECT", "INSERT", "UPDATE", "DELETE"],
),
]
class JSONAPIMeta:
resource_name = "my-models"
```
### M2M Relationships (MUST use through models)
```python
class Resource(RowLevelSecurityProtectedModel):
tags = models.ManyToManyField(
ResourceTag,
through="ResourceTagMapping", # REQUIRED for RLS
)
class ResourceTagMapping(RowLevelSecurityProtectedModel):
# Through model MUST have tenant_id for RLS
resource = models.ForeignKey(Resource, on_delete=models.CASCADE)
tag = models.ForeignKey(ResourceTag, on_delete=models.CASCADE)
class Meta:
constraints = [
RowLevelSecurityConstraint(
field="tenant_id",
name="rls_on_%(class)s",
statements=["SELECT", "INSERT", "UPDATE", "DELETE"],
),
]
```
---
## Async Task Response Pattern (202 Accepted)
For long-running operations, return 202 with task reference:
```python
@action(detail=True, methods=["post"], url_name="connection")
def connection(self, request, pk=None):
with transaction.atomic():
task = check_provider_connection_task.delay(
provider_id=pk, tenant_id=self.request.tenant_id
)
prowler_task = Task.objects.get(id=task.id)
serializer = TaskSerializer(prowler_task)
return Response(
data=serializer.data,
status=status.HTTP_202_ACCEPTED,
headers={"Content-Location": reverse("task-detail", kwargs={"pk": prowler_task.id})}
)
```
---
## Providers (11 Supported)
| Provider | UID Format | Example |
|----------|-----------|---------|
@@ -42,98 +218,288 @@ UID validation is dynamic: `getattr(self, f"validate_{self.provider}_uid")(self.
---
## RBAC Permissions
| Permission | Controls |
|------------|----------|
| `MANAGE_USERS` | User CRUD, role assignments |
| `MANAGE_ACCOUNT` | Tenant settings |
| `MANAGE_BILLING` | Billing/subscription |
| `MANAGE_PROVIDERS` | Provider CRUD |
| `MANAGE_INTEGRATIONS` | Integration config |
| `MANAGE_SCANS` | Scan execution |
| `UNLIMITED_VISIBILITY` | See all providers (bypasses provider_groups) |
### RBAC Visibility Pattern
```python
from api.rbac.permissions import get_role, get_providers, Permissions

def get_queryset(self):
    user_role = get_role(self.request.user)  # Returns FIRST role only
    if user_role.unlimited_visibility:
        return Model.objects.filter(tenant_id=self.request.tenant_id)
    else:
        # Filter by provider_groups assigned to role
        return Model.objects.filter(provider__in=get_providers(user_role))
```
---
## Celery Queues
| Queue | Purpose |
|-------|---------|
| `scans` | Prowler scan execution |
| `overview` | Dashboard aggregations (severity, attack surface) |
| `compliance` | Compliance report generation |
| `integrations` | External integrations (Jira, S3, Security Hub) |
| `deletion` | Provider/tenant deletion (async) |
| `backfill` | Historical data backfill operations |
| `scan-reports` | Output generation (CSV, JSON, HTML, PDF) |
---
## Task Composition (Canvas)
Use Celery's Canvas primitives for complex workflows:
| Primitive | Use For |
|-----------|---------|
| `chain()` | Sequential execution: A → B → C |
| `group()` | Parallel execution: A, B, C simultaneously |
| Combined | Chain with nested groups for complex workflows |
> **Note:** Use `.si()` (signature immutable) to prevent result passing. Use `.s()` if you need to pass results.
> **Examples:** See [assets/celery_patterns.py](assets/celery_patterns.py) for chain, group, and combined patterns.
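A minimal sketch of a combined workflow (task names are illustrative, not actual Prowler tasks):
```python
from celery import chain, group

# tenant_id / provider_id / scan_id assumed to be in scope
workflow = chain(
    perform_scan_task.si(tenant_id=tenant_id, provider_id=provider_id),
    group(
        generate_scan_report_task.si(tenant_id=tenant_id, scan_id=scan_id),
        update_overview_task.si(tenant_id=tenant_id),
    ),
)
workflow.apply_async()
```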
---
## Beat Scheduling (Periodic Tasks)
| Operation | Key Points |
|-----------|------------|
| **Create schedule** | `IntervalSchedule.objects.get_or_create(every=24, period=HOURS)` |
| **Create periodic task** | Use task name (not function), `kwargs=json.dumps(...)` |
| **Delete scheduled task** | `PeriodicTask.objects.filter(name=...).delete()` |
| **Avoid race conditions** | Use `countdown=5` to wait for DB commit |
> **Examples:** See [assets/celery_patterns.py](assets/celery_patterns.py) for schedule_provider_scan pattern.
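A hedged sketch of the schedule_provider_scan pattern, assuming django-celery-beat models (task name and naming scheme are illustrative):
```python
import json

from django_celery_beat.models import IntervalSchedule, PeriodicTask

schedule, _ = IntervalSchedule.objects.get_or_create(
    every=24, period=IntervalSchedule.HOURS
)
PeriodicTask.objects.get_or_create(
    name=f"scan-perform-{provider_id}",  # provider_id assumed in scope
    defaults={
        "interval": schedule,
        "task": "scan-perform-scheduled",  # registered task NAME, not the function
        "kwargs": json.dumps(
            {"tenant_id": str(tenant_id), "provider_id": str(provider_id)}
        ),
    },
)
```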
---
## Advanced Task Patterns
### `@set_tenant` Behavior
| Mode | `tenant_id` in kwargs | `tenant_id` passed to function |
|------|----------------------|-------------------------------|
| `@set_tenant` (default) | Popped (removed) | NO - function doesn't receive it |
| `@set_tenant(keep_tenant=True)` | Read but kept | YES - function receives it |
### Key Patterns
| Pattern | Description |
|---------|-------------|
| `bind=True` | Access `self.request.id`, `self.request.retries` |
| `get_task_logger(__name__)` | Proper logging in Celery tasks |
| `SoftTimeLimitExceeded` | Catch to save progress before hard kill |
| `countdown=30` | Defer execution by N seconds |
| `eta=datetime(...)` | Execute at specific time |
> **Examples:** See [assets/celery_patterns.py](assets/celery_patterns.py) for all advanced patterns.
---
## Celery Configuration
| Setting | Value | Purpose |
|---------|-------|---------|
| `BROKER_VISIBILITY_TIMEOUT` | `86400` (24h) | Prevent re-queue for long tasks |
| `CELERY_RESULT_BACKEND` | `django-db` | Store results in PostgreSQL |
| `CELERY_TASK_TRACK_STARTED` | `True` | Track when tasks start |
| `soft_time_limit` | Task-specific | Raises `SoftTimeLimitExceeded` |
| `time_limit` | Task-specific | Hard kill (SIGKILL) |
> **Full config:** See [assets/celery_patterns.py](assets/celery_patterns.py) and actual files at `config/celery.py`, `config/settings/celery.py`.
---
## UUIDv7 for Partitioned Tables
`Finding` and `ResourceFindingMapping` use UUIDv7 for time-based partitioning:
```python
from uuid6 import uuid7
from api.uuid_utils import uuid7_start, uuid7_end, datetime_to_uuid7
# Partition-aware filtering
start = uuid7_start(datetime_to_uuid7(date_from))
end = uuid7_end(datetime_to_uuid7(date_to), settings.FINDINGS_TABLE_PARTITION_MONTHS)
queryset.filter(id__gte=start, id__lt=end)
```
**Why UUIDv7?** Time-ordered UUIDs enable PostgreSQL to prune partitions during range queries.
---
## Batch Operations with RLS
```python
from api.db_utils import batch_delete, create_objects_in_batches, update_objects_in_batches
# Delete in batches (RLS-aware)
batch_delete(tenant_id, queryset, batch_size=1000)
# Bulk create with RLS
create_objects_in_batches(tenant_id, Finding, objects, batch_size=500)
# Bulk update with RLS
update_objects_in_batches(tenant_id, Finding, objects, fields=["status"], batch_size=500)
```
---
## Security Patterns
> **Full examples**: See [assets/security_patterns.py](assets/security_patterns.py)
### Tenant Isolation Summary
| Pattern | Rule |
|---------|------|
| **RLS in ViewSets** | Automatic via `BaseRLSViewSet` - tenant_id from JWT |
| **RLS in Celery** | MUST use `@set_tenant` + `rls_transaction(tenant_id)` |
| **Cross-tenant validation** | Defense-in-depth: verify `obj.tenant_id == request.tenant_id` |
| **Never trust user input** | Use `request.tenant_id` from JWT, never `request.data.get("tenant_id")` |
| **Admin DB bypass** | Only for cross-tenant admin ops - exposes ALL tenants' data |
### Celery Task Security Summary
| Pattern | Rule |
|---------|------|
| **Named tasks only** | NEVER use dynamic task names from user input |
| **Validate arguments** | Check UUID format before database queries |
| **Safe queuing** | Use `transaction.on_commit()` to enqueue AFTER commit |
| **Modern retries** | Use `autoretry_for`, `retry_backoff`, `retry_jitter` |
| **Time limits** | Set `soft_time_limit` and `time_limit` to prevent hung tasks |
| **Idempotency** | Use `update_or_create` or idempotency keys |
### Quick Reference
```python
# Safe task queuing - task only enqueued after transaction commits
with transaction.atomic():
provider = Provider.objects.create(**data)
transaction.on_commit(
lambda: verify_provider_connection.delay(
tenant_id=str(request.tenant_id),
provider_id=str(provider.id)
)
)
# Modern retry pattern
@shared_task(
base=RLSTask,
bind=True,
autoretry_for=(ConnectionError, TimeoutError, OperationalError),
retry_backoff=True,
retry_backoff_max=600,
retry_jitter=True,
max_retries=5,
soft_time_limit=300,
time_limit=360,
)
@set_tenant(keep_tenant=True)
@handle_provider_deletion
def sync_provider_data(self, tenant_id: str, provider_id: str):
with rls_transaction(tenant_id):
# ... task logic
pass
```
**Queues**: Check `tasks/tasks.py`. Common: `scans`, `overview`, `compliance`, `integrations`.
**Orchestration**: Use `chain()` for sequential, `group()` for parallel.
---
## 6. JSON:API Format
```python
content_type = "application/vnd.api+json"
# Request
{"data": {"type": "providers", "attributes": {"provider": "aws", "uid": "123456789012"}}}
# Response access
response.json()["data"]["attributes"]["alias"]
```
---
## 7. Serializers
| Pattern | Usage |
|---------|-------|
| `ProviderSerializer` | Read (list/retrieve) |
| `ProviderCreateSerializer` | POST |
| `ProviderUpdateSerializer` | PATCH |
| `RLSSerializer` | Auto-injects tenant_id |
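The read/write split is typically wired up per action in the ViewSet; a minimal sketch assuming the serializer classes above and a `BaseRLSViewSet` base:
```python
class ProviderViewSet(BaseRLSViewSet):
    def get_serializer_class(self):
        # Write serializers for mutations, read serializer everywhere else
        if self.action == "create":
            return ProviderCreateSerializer
        if self.action in ("update", "partial_update"):
            return ProviderUpdateSerializer
        return ProviderSerializer
```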
---
## Production Deployment Checklist
> **Full settings**: See [references/production-settings.md](references/production-settings.md)
Run before every production deployment:
```bash
cd api && poetry run python src/backend/manage.py check --deploy
```
### Critical Settings
| Setting | Production Value | Risk if Wrong |
|---------|-----------------|---------------|
| `DEBUG` | `False` | Exposes stack traces, settings, SQL queries |
| `SECRET_KEY` | Env var, rotated | Session hijacking, CSRF bypass |
| `ALLOWED_HOSTS` | Explicit list | Host header attacks |
| `SECURE_SSL_REDIRECT` | `True` | Credentials sent over HTTP |
| `SESSION_COOKIE_SECURE` | `True` | Session cookies over HTTP |
| `CSRF_COOKIE_SECURE` | `True` | CSRF tokens over HTTP |
| `SECURE_HSTS_SECONDS` | `31536000` (1 year) | Downgrade attacks |
| `CONN_MAX_AGE` | `60` or higher | Connection pool exhaustion |
---
## Commands
```bash
# Development
cd api && poetry run python src/backend/manage.py migrate
cd api && poetry run python src/backend/manage.py runserver
cd api && poetry run python src/backend/manage.py shell
# Celery
cd api && poetry run celery -A config.celery worker -l info -Q scans,overview
cd api && poetry run celery -A config.celery beat -l info
# Testing
cd api && poetry run pytest -x --tb=short
# Production checks
cd api && poetry run python src/backend/manage.py check --deploy
```
---
## Resources
### Local References
- **File Locations**: See [references/file-locations.md](references/file-locations.md)
- **Modeling Decisions**: See [references/modeling-decisions.md](references/modeling-decisions.md)
- **Configuration**: See [references/configuration.md](references/configuration.md)
- **Production Settings**: See [references/production-settings.md](references/production-settings.md)
- **Security Patterns**: See [assets/security_patterns.py](assets/security_patterns.py)
### Related Skills
- **Generic DRF Patterns**: Use `django-drf` skill
- **API Testing**: Use `prowler-test-api` skill
### Context7 MCP (Recommended)
**Prerequisite:** Install Context7 MCP server for up-to-date documentation lookup.
When implementing or debugging Prowler-specific patterns, query these libraries via `mcp_context7_query-docs`:
| Library | Context7 ID | Use For |
|---------|-------------|---------|
| **Celery** | `/websites/celeryq_dev_en_stable` | Task patterns, queues, error handling |
| **django-celery-beat** | `/celery/django-celery-beat` | Periodic task scheduling |
| **Django** | `/websites/djangoproject_en_5_2` | Models, ORM, constraints, indexes |
**Example queries:**
```
mcp_context7_query-docs(libraryId="/websites/celeryq_dev_en_stable", query="shared_task decorator retry patterns")
mcp_context7_query-docs(libraryId="/celery/django-celery-beat", query="periodic task database scheduler")
mcp_context7_query-docs(libraryId="/websites/djangoproject_en_5_2", query="model constraints CheckConstraint UniqueConstraint")
```
> **Note:** Use `mcp_context7_resolve-library-id` first if you need to find the correct library ID.

View File

@@ -0,0 +1,319 @@
# Prowler API - Celery Patterns Reference
# Reference for prowler-api skill
from datetime import datetime, timedelta, timezone
import json
from celery import chain, group, shared_task
from celery.exceptions import SoftTimeLimitExceeded
from celery.utils.log import get_task_logger
from django.db import OperationalError, transaction
from django_celery_beat.models import IntervalSchedule, PeriodicTask
from api.db_utils import rls_transaction
from api.decorators import handle_provider_deletion, set_tenant
from api.models import Provider, Scan
from config.celery import RLSTask
logger = get_task_logger(__name__)
# =============================================================================
# DECORATOR ORDER - CRITICAL
# =============================================================================
# @shared_task() must be first
# @set_tenant must be second (sets RLS context)
# @handle_provider_deletion must be third (handles deleted providers)
# =============================================================================
# @set_tenant BEHAVIOR
# =============================================================================
# Example: @set_tenant (default) - tenant_id NOT in function signature
# The decorator pops tenant_id from kwargs after setting RLS context
@shared_task(base=RLSTask, name="provider-connection-check")
@set_tenant
def check_provider_connection_task(provider_id: str):
"""Task receives NO tenant_id param - decorator pops it from kwargs."""
# RLS context already set by decorator
provider = Provider.objects.get(pk=provider_id)
return {"connected": provider.connected}
# Example: @set_tenant(keep_tenant=True) - tenant_id IN function signature
@shared_task(base=RLSTask, name="scan-report", queue="scan-reports")
@set_tenant(keep_tenant=True)
def generate_outputs_task(scan_id: str, provider_id: str, tenant_id: str):
"""Task receives tenant_id param - use when function needs it."""
# Can use tenant_id in function body
with rls_transaction(tenant_id):
scan = Scan.objects.get(pk=scan_id)
# ... generate outputs
return {"scan_id": scan_id, "tenant_id": tenant_id}
# =============================================================================
# TASK COMPOSITION (CANVAS)
# =============================================================================
# Chain: Sequential execution - A → B → C
def example_chain(tenant_id: str):
"""Tasks run one after another."""
chain(
task_a.si(tenant_id=tenant_id),
task_b.si(tenant_id=tenant_id),
task_c.si(tenant_id=tenant_id),
).apply_async()
# Group: Parallel execution - A, B, C simultaneously
def example_group(tenant_id: str):
"""Tasks run at the same time."""
group(
task_a.si(tenant_id=tenant_id),
task_b.si(tenant_id=tenant_id),
task_c.si(tenant_id=tenant_id),
).apply_async()
# Combined: Real pattern from Prowler (post-scan workflow)
def post_scan_workflow(tenant_id: str, scan_id: str, provider_id: str):
"""Chain with nested groups for complex workflows."""
chain(
# First: Summary
perform_scan_summary_task.si(tenant_id=tenant_id, scan_id=scan_id),
# Then: Parallel aggregation + outputs
group(
aggregate_daily_severity_task.si(tenant_id=tenant_id, scan_id=scan_id),
generate_outputs_task.si(
scan_id=scan_id, provider_id=provider_id, tenant_id=tenant_id
),
),
# Finally: Parallel compliance + integrations
group(
generate_compliance_reports_task.si(
tenant_id=tenant_id, scan_id=scan_id, provider_id=provider_id
),
check_integrations_task.si(
tenant_id=tenant_id, provider_id=provider_id, scan_id=scan_id
),
),
).apply_async()
# Note: Use .si() (signature immutable) to prevent result passing.
# Use .s() if you need to pass results between tasks.
# =============================================================================
# BEAT SCHEDULING (PERIODIC TASKS)
# =============================================================================
def schedule_provider_scan(provider_id: str, tenant_id: str):
"""Create a periodic task that runs every 24 hours."""
# 1. Create or get the schedule
schedule, _ = IntervalSchedule.objects.get_or_create(
every=24,
period=IntervalSchedule.HOURS,
)
# 2. Create the periodic task
PeriodicTask.objects.create(
interval=schedule,
name=f"scan-perform-scheduled-{provider_id}", # Unique name
task="scan-perform-scheduled", # Task name (not function name)
kwargs=json.dumps(
{
"tenant_id": str(tenant_id),
"provider_id": str(provider_id),
}
),
one_off=False,
start_time=datetime.now(timezone.utc) + timedelta(hours=24),
)
def delete_scheduled_scan(provider_id: str):
"""Remove a periodic task."""
PeriodicTask.objects.filter(name=f"scan-perform-scheduled-{provider_id}").delete()
# Avoiding race conditions with countdown
def schedule_with_countdown(provider_id: str, tenant_id: str):
"""Use countdown to ensure DB transaction commits before task runs."""
perform_scheduled_scan_task.apply_async(
kwargs={"tenant_id": tenant_id, "provider_id": provider_id},
countdown=5, # Wait 5 seconds
)
# =============================================================================
# ADVANCED TASK PATTERNS
# =============================================================================
# bind=True - Access task metadata
@shared_task(base=RLSTask, bind=True, name="scan-perform-scheduled", queue="scans")
@set_tenant(keep_tenant=True)
def perform_scheduled_scan_task(self, tenant_id: str, provider_id: str):
"""bind=True provides access to self.request for task metadata."""
task_id = self.request.id # Current task ID
retries = self.request.retries # Number of retries so far
with rls_transaction(tenant_id):
scan = Scan.objects.create(
provider_id=provider_id,
task_id=task_id, # Track which task started this scan
)
return {"scan_id": str(scan.id), "task_id": task_id}
# get_task_logger - Proper logging in Celery tasks
@shared_task(base=RLSTask, name="my-task")
@set_tenant
def my_task_with_logging(provider_id: str):
"""Always use get_task_logger for Celery task logging."""
logger.info(f"Processing provider {provider_id}")
logger.warning("Potential issue detected")
logger.error("Failed to process")
# Called with tenant_id in kwargs (decorator handles it)
# my_task_with_logging.delay(provider_id="...", tenant_id="...")
# SoftTimeLimitExceeded - Graceful timeout handling
@shared_task(
base=RLSTask,
soft_time_limit=300, # 5 minutes - raises SoftTimeLimitExceeded
time_limit=360, # 6 minutes - hard kill (SIGKILL)
)
@set_tenant(keep_tenant=True)
def long_running_task(tenant_id: str, scan_id: str):
"""Handle soft time limits gracefully to save progress."""
try:
with rls_transaction(tenant_id):
for batch in get_large_dataset():
process_batch(batch)
except SoftTimeLimitExceeded:
logger.warning(f"Task soft limit exceeded for scan {scan_id}, saving progress...")
save_partial_progress(scan_id)
raise # Re-raise to mark task as failed
# Deferred execution - countdown and eta
def deferred_examples():
"""Execute tasks at specific times."""
# Execute after 30 seconds
my_task.apply_async(kwargs={"provider_id": "..."}, countdown=30)
# Execute at specific time
my_task.apply_async(
kwargs={"provider_id": "..."},
eta=datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc),
)
# =============================================================================
# CELERY CONFIGURATION (config/celery.py)
# =============================================================================
# Example configuration - see actual file for full config
"""
from celery import Celery
celery_app = Celery("tasks")
celery_app.config_from_object("django.conf:settings", namespace="CELERY")
# Visibility timeout - CRITICAL for long-running tasks
# If task takes longer than this, broker assumes worker died and re-queues
BROKER_VISIBILITY_TIMEOUT = 86400 # 24 hours for scan tasks
celery_app.conf.broker_transport_options = {
"visibility_timeout": BROKER_VISIBILITY_TIMEOUT
}
celery_app.conf.result_backend_transport_options = {
"visibility_timeout": BROKER_VISIBILITY_TIMEOUT
}
# Result settings
celery_app.conf.update(
result_extended=True, # Store additional task metadata
result_expires=None, # Never expire results (we manage cleanup)
)
"""
# Django settings (config/settings/celery.py)
"""
CELERY_BROKER_URL = f"redis://{VALKEY_HOST}:{VALKEY_PORT}/{VALKEY_DB}"
CELERY_RESULT_BACKEND = "django-db" # Store results in PostgreSQL
CELERY_TASK_TRACK_STARTED = True # Track when tasks start
CELERY_BROKER_CONNECTION_RETRY_ON_STARTUP = True
# Global time limits (optional)
CELERY_TASK_SOFT_TIME_LIMIT = 3600 # 1 hour soft limit
CELERY_TASK_TIME_LIMIT = 3660 # 1 hour + 1 minute hard limit
"""
# =============================================================================
# ASYNC TASK RESPONSE PATTERN (202 Accepted)
# =============================================================================
class ProviderViewSetExample:
"""Example: Return 202 for long-running operations."""
def connection(self, request, pk=None):
"""Trigger async connection check, return 202 with task location."""
from django.urls import reverse
from rest_framework import status
from rest_framework.response import Response
from api.models import Task
from api.v1.serializers import TaskSerializer
with transaction.atomic():
task = check_provider_connection_task.delay(
provider_id=pk, tenant_id=self.request.tenant_id
)
prowler_task = Task.objects.get(id=task.id)
serializer = TaskSerializer(prowler_task)
return Response(
data=serializer.data,
status=status.HTTP_202_ACCEPTED,
headers={
"Content-Location": reverse("task-detail", kwargs={"pk": prowler_task.id})
},
)
# =============================================================================
# PLACEHOLDERS (would exist in real codebase)
# =============================================================================
task_a = None
task_b = None
task_c = None
perform_scan_summary_task = None
aggregate_daily_severity_task = None
generate_compliance_reports_task = None
check_integrations_task = None
perform_scheduled_scan_task = None
my_task = None
def get_large_dataset():
return []
def process_batch(batch):
pass
def save_partial_progress(scan_id):
pass

View File

@@ -0,0 +1,207 @@
# Example: Prowler API Security Patterns
# Reference for prowler-api skill
import uuid
from celery import shared_task
from celery.exceptions import SoftTimeLimitExceeded
from django.db import OperationalError, transaction
from rest_framework.exceptions import PermissionDenied
from api.db_utils import rls_transaction
from api.decorators import handle_provider_deletion, set_tenant
from api.models import Finding, Provider
from api.rls import Tenant
from tasks.base import RLSTask
# =============================================================================
# TENANT ISOLATION (RLS)
# =============================================================================
class ProviderViewSet:
"""Example: RLS context set automatically by BaseRLSViewSet."""
def get_queryset(self):
# RLS already filters by tenant_id from JWT
# All queries are automatically tenant-scoped
return Provider.objects.all()
@shared_task(base=RLSTask)
@set_tenant(keep_tenant=True)
def process_scan_good(tenant_id, scan_id):
"""GOOD: Explicit RLS context in Celery tasks."""
with rls_transaction(tenant_id):
# RLS enforced - only sees tenant's data
scan = Scan.objects.get(id=scan_id)
return str(scan.id)  # Return a serializable value, not the model instance
def dangerous_function(provider_id):
"""BAD: Bypassing RLS with admin database - exposes ALL tenants' data!"""
# NEVER do this unless absolutely necessary for cross-tenant admin ops
provider = Provider.objects.using("admin").get(id=provider_id)
return provider
# =============================================================================
# CROSS-TENANT DATA LEAKAGE PREVENTION
# =============================================================================
class SecureViewSet:
"""Example: Defense-in-depth tenant validation."""
def get_object(self):
obj = super().get_object()
# Defense-in-depth: verify tenant even though RLS should filter
if obj.tenant_id != self.request.tenant_id:
raise PermissionDenied("Access denied")
return obj
def create_good(self, request):
"""GOOD: Use tenant from authenticated JWT."""
serializer = self.get_serializer(data=request.data)
serializer.is_valid(raise_exception=True)
serializer.save(tenant_id=request.tenant_id)
def create_bad(self, request):
"""BAD: Trust user input for tenant_id."""
serializer = self.get_serializer(data=request.data)
serializer.is_valid(raise_exception=True)
# NEVER trust user-provided tenant_id!
serializer.save(tenant_id=request.data.get("tenant_id"))
# =============================================================================
# CELERY TASK SECURITY
# =============================================================================
@shared_task(base=RLSTask)
@set_tenant(keep_tenant=True)
def process_provider(tenant_id, provider_id):
"""Example: Validate task arguments before processing."""
# Validate UUID format before database query
try:
uuid.UUID(provider_id)
except ValueError:
# Log and return - don't expose error details
return {"error": "Invalid provider_id format"}
with rls_transaction(tenant_id):
# Now safe to query
provider = Provider.objects.get(id=provider_id)
return {"provider": str(provider.id)}
def send_task_bad(user_provided_task_name, args):
"""BAD: Dynamic task names from user input = arbitrary code execution."""
from celery import current_app
# NEVER do this!
current_app.send_task(user_provided_task_name, args=args)
# =============================================================================
# SAFE TASK QUEUING WITH TRANSACTIONS
# =============================================================================
def create_provider_good(request, data):
"""GOOD: Task only enqueued AFTER transaction commits."""
with transaction.atomic():
provider = Provider.objects.create(**data)
# Task enqueued only if transaction succeeds
transaction.on_commit(
lambda: verify_provider_connection.delay(
tenant_id=str(request.tenant_id), provider_id=str(provider.id)
)
)
return provider
def create_provider_bad(request, data):
"""BAD: Task enqueued before transaction commits - race condition!"""
with transaction.atomic():
provider = Provider.objects.create(**data)
# Task might run before transaction commits!
# If transaction rolls back, task processes non-existent data
verify_provider_connection.delay(provider_id=str(provider.id))
return provider
# =============================================================================
# MODERN CELERY RETRY PATTERNS
# =============================================================================
@shared_task(
base=RLSTask,
bind=True,
# Automatic retry for transient errors
autoretry_for=(ConnectionError, TimeoutError, OperationalError),
retry_backoff=True, # Exponential: 1s, 2s, 4s, 8s...
retry_backoff_max=600, # Cap at 10 minutes
retry_jitter=True, # Randomize to prevent thundering herd
max_retries=5,
# Time limits prevent hung tasks
soft_time_limit=300, # 5 min: raises SoftTimeLimitExceeded
time_limit=360, # 6 min: hard kill
)
@set_tenant(keep_tenant=True)
def sync_provider_data(self, tenant_id, provider_id):
"""Example: Modern retry pattern with time limits."""
try:
with rls_transaction(tenant_id):
provider = Provider.objects.get(id=provider_id)
# ... sync logic
return {"status": "synced", "provider": str(provider.id)}
except SoftTimeLimitExceeded:
# Cleanup and exit gracefully
return {"status": "timeout", "provider": provider_id}
# =============================================================================
# IDEMPOTENT TASK DESIGN
# =============================================================================
@shared_task(base=RLSTask, acks_late=True)
@set_tenant(keep_tenant=True)
def process_finding_good(tenant_id, finding_uid, data):
"""GOOD: Idempotent - safe to retry, uses upsert pattern."""
with rls_transaction(tenant_id):
# update_or_create is idempotent - retry won't create duplicates
Finding.objects.update_or_create(uid=finding_uid, defaults=data)
@shared_task(base=RLSTask)
@set_tenant(keep_tenant=True)
def create_notification_bad(tenant_id, message):
"""BAD: Non-idempotent - retry creates duplicates."""
with rls_transaction(tenant_id):
# No dedup key - every retry creates a new notification!
Notification.objects.create(message=message)
@shared_task(base=RLSTask, acks_late=True)
@set_tenant(keep_tenant=True)
def send_notification_good(tenant_id, idempotency_key, message):
"""GOOD: Idempotency key for non-upsertable operations."""
with rls_transaction(tenant_id):
# Check if already processed
if ProcessedTask.objects.filter(key=idempotency_key).exists():
return {"status": "already_processed"}
Notification.objects.create(message=message)
ProcessedTask.objects.create(key=idempotency_key)
return {"status": "sent"}
# Placeholder for imports that would exist in real codebase
verify_provider_connection = None
Scan = None
Notification = None
ProcessedTask = None

View File

@@ -1,21 +0,0 @@
# API Documentation
## Local Documentation
For API-related patterns, see:
- `api/src/backend/api/models.py` - Models, Providers, UID validation
- `api/src/backend/api/v1/views.py` - ViewSets, RBAC patterns
- `api/src/backend/api/v1/serializers.py` - Serializers
- `api/src/backend/api/rbac/permissions.py` - RBAC functions
- `api/src/backend/tasks/tasks.py` - Celery tasks
- `api/src/backend/api/db_utils.py` - rls_transaction
## Contents
The documentation covers:
- Row-Level Security (RLS) implementation
- RBAC permission system
- Provider validation patterns
- Celery task orchestration
- JSON:API serialization format

View File

@@ -0,0 +1,282 @@
# Prowler API Configuration Reference
## Settings File Structure
```
api/src/backend/config/
├── django/
│ ├── base.py # Base settings (all environments)
│ ├── devel.py # Development overrides
│ ├── production.py # Production settings
│ └── testing.py # Test settings
├── settings/
│ ├── celery.py # Celery broker/backend config
│ ├── partitions.py # Table partitioning settings
│ ├── sentry.py # Error tracking + exception filtering
│ └── social_login.py # OAuth/SAML providers
├── celery.py # Celery app instance + RLSTask
├── custom_logging.py # NDJSON/Human-readable formatters
├── env.py # django-environ setup
└── urls.py # Root URL config
```
---
## REST Framework Configuration
### Complete `REST_FRAMEWORK` Settings
```python
REST_FRAMEWORK = {
# Schema Generation (JSON:API compatible)
"DEFAULT_SCHEMA_CLASS": "drf_spectacular_jsonapi.schemas.openapi.JsonApiAutoSchema",
# Authentication (JWT + API Key)
"DEFAULT_AUTHENTICATION_CLASSES": (
"api.authentication.CombinedJWTOrAPIKeyAuthentication",
),
# Pagination
"PAGE_SIZE": 10,
"DEFAULT_PAGINATION_CLASS": "drf_spectacular_jsonapi.schemas.pagination.JsonApiPageNumberPagination",
# Custom exception handler (JSON:API format)
"EXCEPTION_HANDLER": "api.exceptions.custom_exception_handler",
# Parsers (JSON:API compatible)
"DEFAULT_PARSER_CLASSES": (
"rest_framework_json_api.parsers.JSONParser",
"rest_framework.parsers.FormParser",
"rest_framework.parsers.MultiPartParser",
),
# Custom renderer with RLS context support
"DEFAULT_RENDERER_CLASSES": ("api.renderers.APIJSONRenderer",),
# Metadata
"DEFAULT_METADATA_CLASS": "rest_framework_json_api.metadata.JSONAPIMetadata",
# Filter Backends
"DEFAULT_FILTER_BACKENDS": (
"rest_framework_json_api.filters.QueryParameterValidationFilter",
"rest_framework_json_api.filters.OrderingFilter",
"rest_framework_json_api.django_filters.backends.DjangoFilterBackend",
"rest_framework.filters.SearchFilter",
),
# JSON:API search parameter
"SEARCH_PARAM": "filter[search]",
# Test settings
"TEST_REQUEST_RENDERER_CLASSES": ("rest_framework_json_api.renderers.JSONRenderer",),
"TEST_REQUEST_DEFAULT_FORMAT": "vnd.api+json",
# Uniform exception format
"JSON_API_UNIFORM_EXCEPTIONS": True,
# Throttling
"DEFAULT_THROTTLE_CLASSES": ["rest_framework.throttling.ScopedRateThrottle"],
"DEFAULT_THROTTLE_RATES": {
"token-obtain": env("DJANGO_THROTTLE_TOKEN_OBTAIN", default=None),
"dj_rest_auth": None,
},
}
```
### Throttling Configuration
| Scope | Environment Variable | Default | Format |
|-------|---------------------|---------|--------|
| `token-obtain` | `DJANGO_THROTTLE_TOKEN_OBTAIN` | `None` (disabled) | `"X/minute"`, `"X/hour"`, `"X/day"` |
| `dj_rest_auth` | N/A | `None` (disabled) | Same |
**To enable throttling:**
```bash
DJANGO_THROTTLE_TOKEN_OBTAIN="10/minute" # Limit token endpoint to 10 requests/minute
```
---
## JWT Configuration (SIMPLE_JWT)
```python
SIMPLE_JWT = {
# Token Lifetimes
"ACCESS_TOKEN_LIFETIME": timedelta(minutes=30), # DJANGO_ACCESS_TOKEN_LIFETIME
"REFRESH_TOKEN_LIFETIME": timedelta(minutes=1440), # DJANGO_REFRESH_TOKEN_LIFETIME (24h)
# Token Rotation
"ROTATE_REFRESH_TOKENS": True,
"BLACKLIST_AFTER_ROTATION": True,
# Cryptographic Settings
"ALGORITHM": "RS256", # Asymmetric (requires key pair)
"SIGNING_KEY": env.str("DJANGO_TOKEN_SIGNING_KEY", ""),
"VERIFYING_KEY": env.str("DJANGO_TOKEN_VERIFYING_KEY", ""),
# JWT Claims
"TOKEN_TYPE_CLAIM": "typ",
"JTI_CLAIM": "jti",
"USER_ID_FIELD": "id",
"USER_ID_CLAIM": "sub",
# Issuer/Audience
"AUDIENCE": env.str("DJANGO_JWT_AUDIENCE", "https://api.prowler.com"),
"ISSUER": env.str("DJANGO_JWT_ISSUER", "https://api.prowler.com"),
# Custom Serializers
"TOKEN_OBTAIN_SERIALIZER": "api.serializers.TokenSerializer",
"TOKEN_REFRESH_SERIALIZER": "api.serializers.TokenRefreshSerializer",
}
```
---
## Database Configuration
### 4-Database Architecture
```python
DATABASES = {
"default": {...}, # Alias to prowler_user (RLS enabled)
"prowler_user": {...}, # RLS-enabled connection
"admin": {...}, # Admin connection (bypasses RLS)
"replica": {...}, # Read replica (RLS enabled)
"admin_replica": {...}, # Admin on replica
"neo4j": {...}, # Graph database (attack paths)
}
```
### Environment Variables
| Variable | Default | Description |
|----------|---------|-------------|
| `POSTGRES_DB` | `prowler_db` | Database name |
| `POSTGRES_USER` | `prowler_user` | API user (RLS-constrained) |
| `POSTGRES_PASSWORD` | - | API user password |
| `POSTGRES_HOST` | `postgres-db` | Database host |
| `POSTGRES_PORT` | `5432` | Database port |
| `POSTGRES_ADMIN_USER` | `prowler` | Admin user (migrations) |
| `POSTGRES_ADMIN_PASSWORD` | - | Admin password |
| `POSTGRES_REPLICA_HOST` | - | Replica host (optional) |
| `POSTGRES_REPLICA_MAX_ATTEMPTS` | `3` | Retry attempts before fallback |
| `POSTGRES_REPLICA_RETRY_BASE_DELAY` | `0.5` | Base delay for exponential backoff |
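Assuming a standard exponential-backoff scheme (not verified against the implementation), the two retry settings would combine like this:
```python
base_delay = 0.5  # POSTGRES_REPLICA_RETRY_BASE_DELAY
max_attempts = 3  # POSTGRES_REPLICA_MAX_ATTEMPTS

# attempt 0 -> 0.5s, attempt 1 -> 1.0s, attempt 2 -> 2.0s, then fall back to primary
delays = [base_delay * (2**attempt) for attempt in range(max_attempts)]
```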
---
## Celery Configuration
### Broker/Backend
```python
VALKEY_HOST = env("VALKEY_HOST", default="valkey")
VALKEY_PORT = env("VALKEY_PORT", default="6379")
VALKEY_DB = env("VALKEY_DB", default="0")
CELERY_BROKER_URL = f"redis://{VALKEY_HOST}:{VALKEY_PORT}/{VALKEY_DB}"
CELERY_RESULT_BACKEND = "django-db" # Store results in PostgreSQL
CELERY_TASK_TRACK_STARTED = True
CELERY_BROKER_CONNECTION_RETRY_ON_STARTUP = True
```
### Task Visibility
| Variable | Default | Description |
|----------|---------|-------------|
| `DJANGO_BROKER_VISIBILITY_TIMEOUT` | `86400` (24h) | Task visibility timeout |
| `DJANGO_CELERY_DEADLOCK_ATTEMPTS` | `5` | Deadlock retry attempts |
---
## Partitioning Configuration
```python
PSQLEXTRA_PARTITIONING_MANAGER = "api.partitions.manager"
FINDINGS_TABLE_PARTITION_MONTHS = env.int("FINDINGS_TABLE_PARTITION_MONTHS", 1)
FINDINGS_TABLE_PARTITION_COUNT = env.int("FINDINGS_TABLE_PARTITION_COUNT", 7)
FINDINGS_TABLE_PARTITION_MAX_AGE_MONTHS = env.int("...", None) # Optional cleanup
```
---
## Application Settings
| Variable | Default | Description |
|----------|---------|-------------|
| `DJANGO_DEBUG` | `False` | Debug mode |
| `DJANGO_ALLOWED_HOSTS` | `["localhost"]` | Allowed hosts |
| `DJANGO_CACHE_MAX_AGE` | `3600` | HTTP cache max-age |
| `DJANGO_STALE_WHILE_REVALIDATE` | `60` | Stale-while-revalidate time |
| `DJANGO_FINDINGS_MAX_DAYS_IN_RANGE` | `7` | Max days for findings date filter |
| `DJANGO_TMP_OUTPUT_DIRECTORY` | `/tmp/prowler_api_output` | Temp output directory |
| `DJANGO_FINDINGS_BATCH_SIZE` | `1000` | Batch size for findings export |
| `DJANGO_DELETION_BATCH_SIZE` | `5000` | Batch size for deletions |
| `DJANGO_LOGGING_LEVEL` | `INFO` | Log level |
| `DJANGO_LOGGING_FORMATTER` | `ndjson` | Log format (`ndjson` or `human_readable`) |
---
## Social Login (OAuth/SAML)
| Variable | Description |
|----------|-------------|
| `SOCIAL_GOOGLE_OAUTH_CLIENT_ID` | Google OAuth client ID |
| `SOCIAL_GOOGLE_OAUTH_CLIENT_SECRET` | Google OAuth secret |
| `SOCIAL_GITHUB_OAUTH_CLIENT_ID` | GitHub OAuth client ID |
| `SOCIAL_GITHUB_OAUTH_CLIENT_SECRET` | GitHub OAuth secret |
---
## Monitoring
| Variable | Description |
|----------|-------------|
| `DJANGO_SENTRY_DSN` | Sentry DSN for error tracking |
---
## Middleware Stack (Order Matters)
```python
MIDDLEWARE = [
"django_guid.middleware.guid_middleware", # 1. Transaction ID
"django.middleware.security.SecurityMiddleware", # 2. Security headers
"django.contrib.sessions.middleware.SessionMiddleware",
"corsheaders.middleware.CorsMiddleware", # 4. CORS (before Common)
"django.middleware.common.CommonMiddleware",
"django.middleware.csrf.CsrfViewMiddleware",
"django.contrib.auth.middleware.AuthenticationMiddleware",
"django.contrib.messages.middleware.MessageMiddleware",
"django.middleware.clickjacking.XFrameOptionsMiddleware",
"api.middleware.APILoggingMiddleware", # 10. Custom API logging
"allauth.account.middleware.AccountMiddleware",
]
```
---
## Security Headers
| Setting | Value | Description |
|---------|-------|-------------|
| `SECURE_PROXY_SSL_HEADER` | `("HTTP_X_FORWARDED_PROTO", "https")` | Trust X-Forwarded-Proto |
| `SECURE_CONTENT_TYPE_NOSNIFF` | `True` | X-Content-Type-Options: nosniff |
| `X_FRAME_OPTIONS` | `"DENY"` | Prevent framing |
| `CSRF_COOKIE_SECURE` | `True` | HTTPS-only CSRF cookie |
| `SESSION_COOKIE_SECURE` | `True` | HTTPS-only session cookie |
---
## Password Validators
| Validator | Options |
|-----------|---------|
| `UserAttributeSimilarityValidator` | Default |
| `MinimumLengthValidator` | `min_length=12` |
| `MaximumLengthValidator` | `max_length=72` (bcrypt limit) |
| `CommonPasswordValidator` | Default |
| `NumericPasswordValidator` | Default |
| `SpecialCharactersValidator` | `min_special_characters=1` |
| `UppercaseValidator` | `min_uppercase=1` |
| `LowercaseValidator` | `min_lowercase=1` |
| `NumericValidator` | `min_numeric=1` |
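A sketch of how these map into `AUTH_PASSWORD_VALIDATORS` (the module path for the custom validators is an assumption):
```python
AUTH_PASSWORD_VALIDATORS = [
    {"NAME": "django.contrib.auth.password_validation.UserAttributeSimilarityValidator"},
    {
        "NAME": "django.contrib.auth.password_validation.MinimumLengthValidator",
        "OPTIONS": {"min_length": 12},
    },
    {"NAME": "django.contrib.auth.password_validation.CommonPasswordValidator"},
    {"NAME": "django.contrib.auth.password_validation.NumericPasswordValidator"},
    {
        # Hypothetical module path for Prowler's custom validators
        "NAME": "api.validators.SpecialCharactersValidator",
        "OPTIONS": {"min_special_characters": 1},
    },
]
```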

View File

@@ -0,0 +1,128 @@
# Prowler API File Locations
## Configuration
| Purpose | File Path | Key Items |
|---------|-----------|-----------|
| **Django Settings** | `api/src/backend/config/django/base.py` | REST_FRAMEWORK, SIMPLE_JWT, DATABASES |
| **Celery Config** | `api/src/backend/config/celery.py` | Celery app, queues, task routing |
| **URL Routing** | `api/src/backend/config/urls.py` | Main URL patterns |
| **Database Router** | `api/src/backend/api/db_router.py` | `MainRouter` (4-database architecture) |
## RLS (Row-Level Security)
| Pattern | File Path | Key Classes/Functions |
|---------|-----------|----------------------|
| **RLS Base Model** | `api/src/backend/api/rls.py` | `RowLevelSecurityProtectedModel`, `RowLevelSecurityConstraint` |
| **RLS Transaction** | `api/src/backend/api/db_utils.py` | `rls_transaction()` context manager |
| **RLS Serializer** | `api/src/backend/api/v1/serializers.py` | `RLSSerializer` - auto-injects tenant_id |
| **Tenant Model** | `api/src/backend/api/rls.py` | `Tenant` model |
| **Partitioning** | `api/src/backend/api/partitions.py` | `PartitionManager`, UUIDv7 partitioning |
## RBAC (Role-Based Access Control)
| Pattern | File Path | Key Classes/Functions |
|---------|-----------|----------------------|
| **Permissions** | `api/src/backend/api/rbac/permissions.py` | `Permissions` enum, `get_role()`, `get_providers()` |
| **Role Model** | `api/src/backend/api/models.py` | `Role`, `UserRoleRelationship`, `RoleProviderGroupRelationship` |
| **Permission Decorator** | `api/src/backend/api/decorators.py` | `@check_permissions`, `HasPermissions` |
| **Visibility Filter** | `api/src/backend/api/rbac/` | Provider group visibility filtering |
## Providers
| Pattern | File Path | Key Classes/Functions |
|---------|-----------|----------------------|
| **Provider Model** | `api/src/backend/api/models.py` | `Provider`, `ProviderChoices` |
| **UID Validation** | `api/src/backend/api/models.py` | `validate_<provider>_uid()` staticmethods |
| **Provider Secret** | `api/src/backend/api/models.py` | `ProviderSecret` model |
| **Provider Groups** | `api/src/backend/api/models.py` | `ProviderGroup`, `ProviderGroupMembership` |
## Serializers
| Pattern | File Path | Key Classes/Functions |
|---------|-----------|----------------------|
| **Base Serializers** | `api/src/backend/api/v1/serializers.py` | `BaseModelSerializerV1`, `RLSSerializer`, `BaseWriteSerializer` |
| **ViewSet Helpers** | `api/src/backend/api/v1/serializers.py` | `get_serializer_class_for_view()` |
## ViewSets
| Pattern | File Path | Key Classes/Functions |
|---------|-----------|----------------------|
| **Base ViewSets** | `api/src/backend/api/v1/views.py` | `BaseViewSet`, `BaseRLSViewSet`, `BaseTenantViewset`, `BaseUserViewset` |
| **Custom Actions** | `api/src/backend/api/v1/views.py` | `@action(detail=True)` patterns |
| **Filters** | `api/src/backend/api/filters.py` | `BaseProviderFilter`, `BaseScanProviderFilter`, `CommonFindingFilters` |
## Celery Tasks
| Pattern | File Path | Key Classes/Functions |
|---------|-----------|----------------------|
| **Task Definitions** | `api/src/backend/tasks/tasks.py` | All `@shared_task` definitions |
| **RLS Task Base** | `api/src/backend/config/celery.py` | `RLSTask` base class (creates APITask on dispatch) |
| **Task Decorators** | `api/src/backend/api/decorators.py` | `@set_tenant`, `@handle_provider_deletion` |
| **Celery Config** | `api/src/backend/config/celery.py` | Celery app, broker settings, visibility timeout |
| **Django Settings** | `api/src/backend/config/settings/celery.py` | `CELERY_BROKER_URL`, `CELERY_RESULT_BACKEND` |
| **Beat Schedule** | `api/src/backend/tasks/beat.py` | `schedule_provider_scan()`, `PeriodicTask` creation |
| **Task Utilities** | `api/src/backend/tasks/utils.py` | `batched()`, `get_next_execution_datetime()` |
### Task Jobs (Business Logic)
| Job File | Purpose |
|----------|---------|
| `tasks/jobs/scan.py` | `perform_prowler_scan()`, `aggregate_findings()`, `aggregate_attack_surface()` |
| `tasks/jobs/deletion.py` | `delete_provider()`, `delete_tenant()` |
| `tasks/jobs/backfill.py` | Historical data backfill operations |
| `tasks/jobs/export.py` | Output file generation (CSV, JSON, HTML) |
| `tasks/jobs/report.py` | PDF report generation (ThreatScore, ENS, NIS2) |
| `tasks/jobs/connection.py` | Provider/integration connection checks |
| `tasks/jobs/integrations.py` | S3, Security Hub, Jira uploads |
| `tasks/jobs/muting.py` | Historical findings muting |
| `tasks/jobs/attack_paths/` | Attack paths scan (Neo4j/Cartography) |
## Key Line References
### RLS Transaction (api/src/backend/api/db_utils.py)
```python
# Usage pattern
from api.db_utils import rls_transaction
with rls_transaction(tenant_id):
# All queries here are tenant-scoped
providers = Provider.objects.filter(connected=True)
```
### RBAC Check (api/src/backend/api/rbac/permissions.py)
```python
# Usage pattern
from api.rbac.permissions import get_role, get_providers, Permissions
user_role = get_role(request.user) # Returns FIRST role only
if user_role.unlimited_visibility:
queryset = Provider.objects.all()
else:
queryset = get_providers(user_role)
```
### Celery Task (api/src/backend/tasks/tasks.py)
```python
# Usage pattern
@shared_task(base=RLSTask, name="task-name", queue="scans")
@set_tenant
@handle_provider_deletion
def my_task(tenant_id: str, provider_id: str):
with rls_transaction(tenant_id):
provider = Provider.objects.get(pk=provider_id)
```
## Tests
| Type | Path |
|------|------|
| **Central Fixtures** | `api/src/backend/conftest.py` |
| **API Tests** | `api/src/backend/api/tests/` |
| **Integration Tests** | `api/src/backend/api/tests/integration/` |
| **Task Tests** | `api/src/backend/tasks/tests/` |
## Related Skills
- **Generic DRF patterns**: Use `django-drf` skill for ViewSets, Serializers, Filters, JSON:API
- **API Testing**: Use `prowler-test-api` skill for testing patterns

View File

@@ -0,0 +1,274 @@
# Django Model Design Decisions
## When to Use What
### Primary Keys
| Pattern | When to Use | Example |
|---------|-------------|---------|
| `uuid4` | Default for most models | `id = models.UUIDField(primary_key=True, default=uuid4)` |
| `uuid7` | Time-ordered data (findings, scans) | `id = models.UUIDField(primary_key=True, default=uuid7)` |
**Why uuid7 for time-series?** UUIDv7 includes timestamp, enabling efficient range queries and partitioning.
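A minimal sketch of the ordering property (assumes the `uuid6` package already used for `default=uuid7`):
```python
from time import sleep

from uuid6 import uuid7

first = uuid7()
sleep(0.001)  # guarantee a later millisecond timestamp
second = uuid7()

# The most significant bits encode the timestamp, so ID order follows creation order
assert first < second
```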
### Timestamps
| Field | Pattern | Purpose |
|-------|---------|---------|
| `inserted_at` | `auto_now_add=True, editable=False` | Creation time, never changes |
| `updated_at` | `auto_now=True, editable=False` | Last modification time |
### Soft Delete
```python
# Model
is_deleted = models.BooleanField(default=False)
# Custom manager (excludes deleted by default)
class ActiveProviderManager(models.Manager):
def get_queryset(self):
return super().get_queryset().filter(is_deleted=False)
# Usage
objects = ActiveProviderManager() # Normal queries
all_objects = models.Manager() # Include deleted
```
### TextChoices Enums
```python
class StateChoices(models.TextChoices):
AVAILABLE = "available", _("Available")
SCHEDULED = "scheduled", _("Scheduled")
EXECUTING = "executing", _("Executing")
COMPLETED = "completed", _("Completed")
FAILED = "failed", _("Failed")
```
### Constraints
| Constraint | When to Use |
|------------|-------------|
| `UniqueConstraint` | Prevent duplicates within tenant scope |
| `UniqueConstraint + condition` | Unique only for non-deleted records |
| `RowLevelSecurityConstraint` | ALL RLS-protected models (mandatory) |
```python
constraints = [
# Unique provider UID per tenant (only for active providers)
models.UniqueConstraint(
fields=("tenant_id", "provider", "uid"),
condition=Q(is_deleted=False),
name="unique_provider_uids",
),
# RLS constraint (REQUIRED for all tenant-scoped models)
RowLevelSecurityConstraint(
field="tenant_id",
name="rls_on_%(class)s",
statements=["SELECT", "INSERT", "UPDATE", "DELETE"],
),
]
```
### Indexes
| Index Type | When to Use | Example |
|------------|-------------|---------|
| `models.Index` | Frequent queries | `fields=["tenant_id", "provider_id"]` |
| `GinIndex` | Full-text search, ArrayField | `fields=["text_search"]` |
| Conditional Index | Specific query patterns | `condition=Q(state="completed")` |
| Covering Index | Avoid table lookups | `include=["id", "name"]` |
```python
indexes = [
# Common query pattern
models.Index(
fields=["tenant_id", "provider_id", "-inserted_at"],
name="scans_prov_ins_desc_idx",
),
# Conditional: only completed scans
models.Index(
fields=["tenant_id", "provider_id", "-inserted_at"],
condition=Q(state=StateChoices.COMPLETED),
name="scans_completed_idx",
),
# Covering: include extra columns to avoid table lookup
models.Index(
fields=["tenant_id", "provider_id"],
include=["id", "graph_database"],
name="aps_active_graph_idx",
),
# Full-text search
GinIndex(fields=["text_search"], name="gin_resources_search_idx"),
]
```
### Full-Text Search
```python
from django.contrib.postgres.search import SearchVector, SearchVectorField
text_search = models.GeneratedField(
expression=SearchVector("uid", weight="A", config="simple")
+ SearchVector("name", weight="B", config="simple"),
output_field=SearchVectorField(),
db_persist=True,
null=True,
editable=False,
)
```
### ArrayField
```python
from django.contrib.postgres.fields import ArrayField
groups = ArrayField(
models.CharField(max_length=100),
blank=True,
null=True,
help_text="Groups for categorization",
)
```
### JSONField
```python
# Structured data with defaults
metadata = models.JSONField(default=dict, blank=True)
scanner_args = models.JSONField(default=dict, blank=True)
```
### Encrypted Fields
```python
# Binary field for encrypted data
_secret = models.BinaryField(db_column="secret")
@property
def secret(self):
# Decrypt on read
decrypted_data = fernet.decrypt(self._secret)
return json.loads(decrypted_data.decode())
@secret.setter
def secret(self, value):
# Encrypt on write
self._secret = fernet.encrypt(json.dumps(value).encode())
```
### Foreign Keys
| on_delete | When to Use |
|-----------|-------------|
| `CASCADE` | Child cannot exist without parent (Finding → Scan) |
| `SET_NULL` | Optional relationship, keep child (Task → PeriodicTask) |
| `PROTECT` | Prevent deletion if children exist |
```python
# Required relationship
provider = models.ForeignKey(
Provider,
on_delete=models.CASCADE,
related_name="scans",
related_query_name="scan",
)
# Optional relationship
scheduler_task = models.ForeignKey(
PeriodicTask,
on_delete=models.SET_NULL,
null=True,
blank=True,
)
```
### Many-to-Many with Through Table
```python
# On the model
tags = models.ManyToManyField(
ResourceTag,
through="ResourceTagMapping",
related_name="resources",
)
# Through table (for RLS + extra fields)
class ResourceTagMapping(RowLevelSecurityProtectedModel):
id = models.UUIDField(primary_key=True, default=uuid4)
resource = models.ForeignKey(Resource, on_delete=models.CASCADE)
tag = models.ForeignKey(ResourceTag, on_delete=models.CASCADE)
class Meta:
constraints = [
models.UniqueConstraint(
fields=("tenant_id", "resource_id", "tag_id"),
name="unique_resource_tag_mappings",
),
RowLevelSecurityConstraint(...),
]
```
### Partitioned Tables
```python
from psqlextra.models import PostgresPartitionedModel
from psqlextra.types import PostgresPartitioningMethod
class Finding(PostgresPartitionedModel, RowLevelSecurityProtectedModel):
class PartitioningMeta:
method = PostgresPartitioningMethod.RANGE
key = ["id"] # UUIDv7 for time-based partitioning
```
**Use for:** High-volume, time-series data (findings, resource mappings)
### Model Validation
```python
def clean(self):
super().clean()
# Dynamic validation based on field value
getattr(self, f"validate_{self.provider}_uid")(self.uid)
def save(self, *args, **kwargs):
self.full_clean() # Always validate before save
super().save(*args, **kwargs)
```
### JSONAPIMeta
```python
class JSONAPIMeta:
resource_name = "provider-groups" # kebab-case, plural
```
---
## Decision Tree: New Model
```
Is it tenant-scoped data?
├── Yes → Inherit RowLevelSecurityProtectedModel
│ Add RowLevelSecurityConstraint
│ Consider: soft-delete? partitioning?
└── No → Regular models.Model (rare in Prowler)
Does it need time-ordering for queries?
├── Yes → Use uuid7 for primary key
└── No → Use uuid4 (default)
Is it high-volume time-series data?
├── Yes → Use PostgresPartitionedModel
│ Partition by id (uuid7)
└── No → Regular model
Does it reference Provider?
├── Yes → Add ActiveProviderManager
│ Use CASCADE or filter is_deleted
└── No → Standard manager
Needs full-text search?
├── Yes → Add SearchVectorField + GinIndex
└── No → Skip
```
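Putting the tree together, a hypothetical tenant-scoped, time-ordered, partitioned model might look like:
```python
from uuid6 import uuid7

from django.db import models
from psqlextra.models import PostgresPartitionedModel
from psqlextra.types import PostgresPartitioningMethod

from api.rls import RowLevelSecurityConstraint, RowLevelSecurityProtectedModel


class ExampleEvent(PostgresPartitionedModel, RowLevelSecurityProtectedModel):
    """Illustrative only - combines a uuid7 PK, RLS, and range partitioning."""

    id = models.UUIDField(primary_key=True, default=uuid7, editable=False)
    inserted_at = models.DateTimeField(auto_now_add=True, editable=False)

    class PartitioningMeta:
        method = PostgresPartitioningMethod.RANGE
        key = ["id"]  # UUIDv7 PK doubles as the time-based partition key

    class Meta:
        constraints = [
            RowLevelSecurityConstraint(
                field="tenant_id",
                name="rls_on_%(class)s",
                statements=["SELECT", "INSERT", "UPDATE", "DELETE"],
            ),
        ]
```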

View File

@@ -0,0 +1,180 @@
# Production Settings Reference
## Django Deployment Checklist Command
```bash
cd api && poetry run python src/backend/manage.py check --deploy
```
This command checks for common deployment issues and missing security settings.
---
## Critical Settings Table
| Setting | Production Value | Risk if Wrong |
|---------|-----------------|---------------|
| `DEBUG` | `False` | Exposes stack traces, settings, SQL queries |
| `SECRET_KEY` | Env var, rotated | Session hijacking, CSRF bypass |
| `ALLOWED_HOSTS` | Explicit list | Host header attacks |
| `SECURE_SSL_REDIRECT` | `True` | Credentials sent over HTTP |
| `SESSION_COOKIE_SECURE` | `True` | Session cookies over HTTP |
| `CSRF_COOKIE_SECURE` | `True` | CSRF tokens over HTTP |
| `SECURE_HSTS_SECONDS` | `31536000` (1 year) | Downgrade attacks |
| `CONN_MAX_AGE` | `60` or higher | Connection pool exhaustion |
---
## Full Production Settings Example
```python
# settings/production.py
import environ
env = environ.Env()
# =============================================================================
# CORE SECURITY
# =============================================================================
DEBUG = False # NEVER True in production
# Load from environment - NEVER hardcode
SECRET_KEY = env("SECRET_KEY")
# Explicit list - no wildcards
ALLOWED_HOSTS = env.list("ALLOWED_HOSTS")
# Example: ALLOWED_HOSTS=api.prowler.com,prowler.com
# =============================================================================
# HTTPS ENFORCEMENT
# =============================================================================
# Redirect all HTTP to HTTPS
SECURE_SSL_REDIRECT = True
# Trust X-Forwarded-Proto header from reverse proxy (nginx, ALB, etc.)
SECURE_PROXY_SSL_HEADER = ("HTTP_X_FORWARDED_PROTO", "https")
# =============================================================================
# SECURE COOKIES
# =============================================================================
# Only send session cookie over HTTPS
SESSION_COOKIE_SECURE = True
# Only send CSRF cookie over HTTPS
CSRF_COOKIE_SECURE = True
# Prevent JavaScript access to session cookie (XSS protection)
SESSION_COOKIE_HTTPONLY = True
# SameSite attribute for CSRF protection
CSRF_COOKIE_SAMESITE = "Strict"
SESSION_COOKIE_SAMESITE = "Strict"
# =============================================================================
# HTTP STRICT TRANSPORT SECURITY (HSTS)
# =============================================================================
# Tell browsers to always use HTTPS for this domain
SECURE_HSTS_SECONDS = 31536000 # 1 year
# Apply HSTS to all subdomains
SECURE_HSTS_INCLUDE_SUBDOMAINS = True
# Allow browser preload lists (requires domain submission)
SECURE_HSTS_PRELOAD = True
# =============================================================================
# CONTENT SECURITY
# =============================================================================
# Prevent clickjacking - deny all framing
X_FRAME_OPTIONS = "DENY"
# Prevent MIME type sniffing
SECURE_CONTENT_TYPE_NOSNIFF = True
# Enable XSS filter in older browsers
SECURE_BROWSER_XSS_FILTER = True
# =============================================================================
# DATABASE
# =============================================================================
# Connection pooling - reuse connections for 60 seconds
# Reduces connection overhead for frequent requests
CONN_MAX_AGE = 60
# For high-traffic: consider connection pooler like PgBouncer
# CONN_MAX_AGE = None # Let PgBouncer manage connections
# =============================================================================
# LOGGING
# =============================================================================
LOGGING = {
"version": 1,
"disable_existing_loggers": False,
"formatters": {
"verbose": {
"format": "{levelname} {asctime} {module} {process:d} {thread:d} {message}",
"style": "{",
},
},
"handlers": {
"console": {
"class": "logging.StreamHandler",
"formatter": "verbose",
},
},
"root": {
"handlers": ["console"],
"level": "INFO", # WARNING in production to reduce noise
},
"loggers": {
"django.security": {
"handlers": ["console"],
"level": "WARNING",
"propagate": False,
},
},
}
```
---
## Environment Variables Checklist
Required environment variables for production:
```bash
# Core
SECRET_KEY=<random-50+-chars>
ALLOWED_HOSTS=api.example.com,example.com
DEBUG=False
# Database
DATABASE_URL=<your-postgres-url>
# Or individual vars:
POSTGRES_HOST=...
POSTGRES_PORT=5432
POSTGRES_DB=...
POSTGRES_USER=...
POSTGRES_PASSWORD=...
# Redis (for Celery)
REDIS_URL=redis://host:6379/0
# Optional
SENTRY_DSN=https://...@sentry.io/...
```
---
## References
- [Django Deployment Checklist](https://docs.djangoproject.com/en/5.2/howto/deployment/checklist/)
- [Django Security Settings](https://docs.djangoproject.com/en/5.2/topics/security/)
- [OWASP Secure Headers](https://owasp.org/www-project-secure-headers/)

View File

@@ -0,0 +1,180 @@
---
name: prowler-commit
description: >
Creates professional git commits following conventional-commits format.
Trigger: When creating commits, after completing code changes, when user asks to commit.
license: Apache-2.0
metadata:
author: prowler-cloud
version: "1.1.0"
scope: [root, api, ui, prowler, mcp_server]
auto_invoke:
- "Creating a git commit"
- "Committing changes"
---
## Critical Rules
- ALWAYS use conventional-commits format: `type(scope): description`
- ALWAYS keep the first line under 72 characters
- ALWAYS ask for user confirmation before committing
- NEVER be overly specific (avoid counts like "6 subsections", "3 files")
- NEVER include implementation details in the title
- NEVER use `-n` flag unless user explicitly requests it
- NEVER use `git push --force` or `git push -f` (destructive, rewrites history)
- NEVER proactively offer to commit - wait for user to explicitly request it
---
## Commit Format
```
type(scope): concise description
- Key change 1
- Key change 2
- Key change 3
```
### Types
| Type | Use When |
|------|----------|
| `feat` | New feature or functionality |
| `fix` | Bug fix |
| `docs` | Documentation only |
| `chore` | Maintenance, dependencies, configs |
| `refactor` | Code change without feature/fix |
| `test` | Adding or updating tests |
| `perf` | Performance improvement |
| `style` | Formatting, no code change |
### Scopes
| Scope | When |
|-------|------|
| `api` | Changes in `api/` |
| `ui` | Changes in `ui/` |
| `sdk` | Changes in `prowler/` |
| `mcp` | Changes in `mcp_server/` |
| `skills` | Changes in `skills/` |
| `ci` | Changes in `.github/` |
| `docs` | Changes in `docs/` |
| *omit* | Multiple scopes or root-level |
---
## Good vs Bad Examples
### Title Line
```
# GOOD - Concise and clear
feat(api): add provider connection retry logic
fix(ui): resolve dashboard loading state
chore(skills): add Celery documentation
docs: update installation guide
# BAD - Too specific or verbose
feat(api): add provider connection retry logic with exponential backoff and jitter (3 retries max)
chore(skills): add comprehensive Celery documentation covering 8 topics
fix(ui): fix the bug in dashboard component on line 45
```
### Body (Bullet Points)
```
# GOOD - High-level changes
- Add retry mechanism for failed connections
- Document task composition patterns
- Expand configuration reference
# BAD - Too detailed
- Add retry with max_retries=3, backoff=True, jitter=True
- Add 6 subsections covering chain, group, chord
- Update lines 45-67 in dashboard.tsx
```
---
## Workflow
1. **Analyze changes**
```bash
git status
git diff --stat HEAD
git log -3 --oneline # Check recent commit style
```
2. **Draft commit message**
- Choose appropriate type and scope
- Write concise title (< 72 chars)
- Add 2-5 bullet points for significant changes
3. **Present to user for confirmation**
- Show files to be committed
- Show proposed message
- Wait for explicit confirmation
4. **Execute commit**
```bash
git add <files>
git commit -m "$(cat <<'EOF'
type(scope): description
- Change 1
- Change 2
EOF
)"
```
---
## Decision Tree
```
Single file changed?
├─ Yes → May omit body, title only
└─ No → Include body with key changes
Multiple scopes affected?
├─ Yes → Omit scope: `feat: description`
└─ No → Include scope: `feat(api): description`
Fixing a bug?
├─ User-facing → fix(scope): description
└─ Internal/dev → chore(scope): fix description
Adding documentation?
├─ Code docs (docstrings) → Part of feat/fix
└─ Standalone docs → docs: or docs(scope):
```
---
## Commands
```bash
# Check current state
git status
git diff --stat HEAD
# Standard commit
git add <files>
git commit -m "type(scope): description"
# Multi-line commit
git commit -m "$(cat <<'EOF'
type(scope): description
- Change 1
- Change 2
EOF
)"
# Amend last commit (same message)
git commit --amend --no-edit
# Amend with new message
git commit --amend -m "new message"
```

View File

@@ -6,7 +6,7 @@ description: >
license: Apache-2.0
metadata:
author: prowler-cloud
version: "1.0"
version: "1.1.0"
scope: [root, api]
auto_invoke:
- "Writing Prowler API tests"
@@ -17,115 +17,136 @@ allowed-tools: Read, Edit, Write, Glob, Grep, Bash, WebFetch, WebSearch, Task
## Critical Rules
- ALWAYS use `response.json()["data"]` not `response.data`
- ALWAYS use `content_type = "application/vnd.api+json"` for PATCH/PUT requests
- ALWAYS use `format="vnd.api+json"` for POST requests
- ALWAYS test cross-tenant isolation - RLS returns 404, NOT 403
- NEVER skip RLS isolation tests when adding new endpoints
- NEVER use realistic-looking API keys in tests (TruffleHog will flag them)
- ALWAYS mock BOTH `.delay()` AND `Task.objects.get` for async task tests
---
## 1. Fixture Dependency Chain
```
create_test_user (session) ─► tenants_fixture (function) ─► authenticated_client
                                  └─► providers_fixture ─► scans_fixture ─► findings_fixture
```
---
### Key Fixtures
| Fixture | Description |
|---------|-------------|
| `create_test_user` | Session user (`dev@prowler.com`) |
| `tenants_fixture` | 3 tenants: [0],[1] have membership, [2] isolated |
| `authenticated_client` | JWT client for tenant[0] |
| `providers_fixture` | 9 providers in tenant[0] |
| `tasks_fixture` | 2 Celery tasks with TaskResult |
### RBAC Fixtures
| Fixture | Permissions |
|---------|-------------|
| `authenticated_client_rbac` | All permissions (admin) |
| `authenticated_client_rbac_noroles` | Membership but NO roles |
| `authenticated_client_no_permissions_rbac` | All permissions = False |
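For example, a minimal permission-denied sketch using one of these fixtures (mirrors [assets/api_test.py](assets/api_test.py)):
```python
def test_create_denied_without_permission(authenticated_client_no_permissions_rbac):
    # RBAC denies with 403; contrast with RLS, which hides resources via 404
    response = authenticated_client_no_permissions_rbac.post(
        reverse("provider-list"),
        data={"data": {"type": "providers", "attributes": {"provider": "aws", "uid": "123456789012"}}},
        format="vnd.api+json",
    )
    assert response.status_code == status.HTTP_403_FORBIDDEN
```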
---
## 2. JSON:API Requests
### POST (Create)
```python
response = client.post(
    reverse("provider-list"),
    data={"data": {"type": "providers", "attributes": {...}}},
    format="vnd.api+json",  # NOT content_type!
)
```
### PATCH (Update)
```python
response = client.patch(
reverse("provider-detail", kwargs={"pk": provider.id}),
data={"data": {"type": "providers", "id": str(provider.id), "attributes": {...}}},
content_type="application/vnd.api+json", # NOT format!
)
```
### Reading Responses
```python
data = response.json()["data"]
attrs = data["attributes"]
errors = response.json()["errors"] # For 400 responses
```
---
## 3. RLS Isolation (Cross-Tenant)
**RLS returns 404, NOT 403** - the resource is invisible, not forbidden.
```python
def test_cross_tenant_access_denied(self, authenticated_client, tenants_fixture):
    other_tenant = tenants_fixture[2]  # Isolated tenant
    foreign_provider = Provider.objects.create(tenant_id=other_tenant.id, ...)
    response = authenticated_client.get(reverse("provider-detail", args=[foreign_provider.id]))
    assert response.status_code == status.HTTP_404_NOT_FOUND  # NOT 403!
```
---
## 4. Celery Task Testing
### Testing Strategies
| Strategy | Use For |
|----------|---------|
| Mock `.delay()` + `Task.objects.get` | Testing views that trigger tasks |
| `task.apply()` | Synchronous task logic testing |
| Mock `chain`/`group` | Testing Canvas orchestration |
| Mock `connection` | Testing `@set_tenant` decorator |
| Mock `apply_async` | Testing Beat scheduled tasks |
### Why NOT `task_always_eager`
| Problem | Impact |
|---------|--------|
| No task serialization | Misses argument type errors |
| No broker interaction | Hides connection issues |
| Different execution context | `self.request` behaves differently |
**Instead, use:** `task.apply()` for sync execution, mocking for isolation.
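A minimal `apply()` sketch (using the task from [assets/api_test.py](assets/api_test.py)):
```python
# Executes the task body in-process; no broker or worker required.
result = check_provider_connection_task.apply(
    kwargs={"tenant_id": str(tenant.id), "provider_id": str(provider.id)}
)
assert result.successful()
```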
> **Full examples:** See [assets/api_test.py](assets/api_test.py) for `TestCeleryTaskLogic`, `TestCeleryCanvas`, `TestSetTenantDecorator`, `TestBeatScheduling`.
---
## 5. Fake Secrets (TruffleHog)
```python
# BAD - TruffleHog flags these:
api_key = "sk-test1234567890T3BlbkFJtest1234567890"
# GOOD - obviously fake:
api_key = "sk-fake-test-key-for-unit-testing-only"
```
---
## 6. Response Status Codes
| Scenario | Code |
|----------|------|
| Successful GET | 200 |
| Successful POST | 201 |
| Async operation (DELETE/scan trigger) | 202 |
| Sync DELETE | 204 |
| Validation error | 400 |
| Missing permission (RBAC) | 403 |
| RLS isolation / not found | 404 |
---
@@ -134,11 +155,13 @@ api_key = "fake-aws-key-for-testing"
```bash
cd api && poetry run pytest -x --tb=short
cd api && poetry run pytest -k "test_provider"
cd api && poetry run pytest -k "TestRBAC"
cd api && poetry run pytest src/backend/api/tests/test_rbac.py
```
---
## Resources
- **Fixture Reference**: See [references/test-api-docs.md](references/test-api-docs.md)
- **Fixture Source**: `api/src/backend/conftest.py`

View File

@@ -0,0 +1,371 @@
# Example: Prowler API Test Patterns
# Source: api/src/backend/api/tests/test_views.py

from unittest.mock import Mock, patch

import pytest
from conftest import (
    API_JSON_CONTENT_TYPE,
    TEST_PASSWORD,
    TEST_USER,
    get_api_tokens,
    get_authorization_header,
)
from django.urls import reverse
from rest_framework import status

from api.models import Provider, Scan, StateChoices
from api.rls import Tenant


@pytest.mark.django_db
class TestProviderViewSet:
    """Example API tests for Provider endpoints."""

    def test_list_providers(self, authenticated_client, providers_fixture):
        """GET list returns all providers for authenticated tenant."""
        response = authenticated_client.get(reverse("provider-list"))
        assert response.status_code == status.HTTP_200_OK
        assert len(response.json()["data"]) == len(providers_fixture)

    def test_create_provider(self, authenticated_client):
        """POST with JSON:API format creates provider."""
        response = authenticated_client.post(
            reverse("provider-list"),
            data={
                "data": {
                    "type": "providers",
                    "attributes": {
                        "provider": "aws",
                        "uid": "123456789012",
                        "alias": "my-aws-account",
                    },
                }
            },
            format="vnd.api+json",  # Use format= for POST
        )
        assert response.status_code == status.HTTP_201_CREATED
        assert response.json()["data"]["attributes"]["uid"] == "123456789012"

    def test_update_provider(self, authenticated_client, providers_fixture):
        """PATCH with JSON:API format updates provider."""
        provider = providers_fixture[0]
        payload = {
            "data": {
                "type": "providers",
                "id": str(provider.id),  # ID required for PATCH
                "attributes": {"alias": "updated-alias"},
            }
        }
        response = authenticated_client.patch(
            reverse("provider-detail", kwargs={"pk": provider.id}),
            data=payload,
            content_type="application/vnd.api+json",  # Use content_type= for PATCH
        )
        assert response.status_code == status.HTTP_200_OK
        assert response.json()["data"]["attributes"]["alias"] == "updated-alias"


@pytest.mark.django_db
class TestRLSIsolation:
    """Example RLS cross-tenant isolation tests."""

    def test_cross_tenant_access_returns_404(
        self, authenticated_client, tenants_fixture
    ):
        """User cannot see resources from other tenants - returns 404 NOT 403."""
        # Create resource in tenant user has NO access to (tenant[2] is isolated)
        other_tenant = tenants_fixture[2]
        foreign_provider = Provider.objects.create(
            provider="aws",
            uid="999888777666",
            alias="foreign_provider",
            tenant_id=other_tenant.id,
        )
        # Try to access - should get 404 (not 403!)
        response = authenticated_client.get(
            reverse("provider-detail", args=[foreign_provider.id])
        )
        assert response.status_code == status.HTTP_404_NOT_FOUND

    def test_list_excludes_other_tenants(
        self, authenticated_client, providers_fixture, tenants_fixture
    ):
        """List endpoints only return resources from user's tenants."""
        # Create provider in isolated tenant
        other_tenant = tenants_fixture[2]
        Provider.objects.create(
            provider="aws",
            uid="foreign123",
            tenant_id=other_tenant.id,
        )
        response = authenticated_client.get(reverse("provider-list"))
        assert response.status_code == status.HTTP_200_OK
        # Should only see providers_fixture (9 providers in tenant[0])
        assert len(response.json()["data"]) == len(providers_fixture)


@pytest.mark.django_db
class TestRBACPermissions:
    """Example RBAC permission tests."""

    def test_requires_permission(self, authenticated_client_no_permissions_rbac):
        """Users without manage_providers cannot create providers."""
        response = authenticated_client_no_permissions_rbac.post(
            reverse("provider-list"),
            data={
                "data": {
                    "type": "providers",
                    "attributes": {"provider": "aws", "uid": "123456789012"},
                }
            },
            format="vnd.api+json",
        )
        assert response.status_code == status.HTTP_403_FORBIDDEN

    def test_user_with_no_roles_denied(self, authenticated_client_rbac_noroles):
        """User with membership but no roles gets 403."""
        response = authenticated_client_rbac_noroles.get(reverse("user-list"))
        assert response.status_code == status.HTTP_403_FORBIDDEN

    def test_admin_sees_all(self, authenticated_client_rbac, providers_fixture):
        """Admin with unlimited_visibility=True sees all providers."""
        response = authenticated_client_rbac.get(reverse("provider-list"))
        assert response.status_code == status.HTTP_200_OK


@pytest.mark.django_db
class TestAsyncOperations:
    """Example async task tests - mock BOTH .delay() AND Task.objects.get."""

    @patch("api.v1.views.Task.objects.get")
    @patch("api.v1.views.delete_provider_task.delay")
    def test_delete_provider_returns_202(
        self,
        mock_delete_task,
        mock_task_get,
        authenticated_client,
        providers_fixture,
        tasks_fixture,
    ):
        """DELETE returns 202 Accepted with Content-Location header."""
        provider = providers_fixture[0]
        prowler_task = tasks_fixture[0]

        # Mock the Celery task
        task_mock = Mock()
        task_mock.id = prowler_task.id
        mock_delete_task.return_value = task_mock
        mock_task_get.return_value = prowler_task

        response = authenticated_client.delete(
            reverse("provider-detail", kwargs={"pk": provider.id})
        )
        assert response.status_code == status.HTTP_202_ACCEPTED
        assert "Content-Location" in response.headers
        assert f"/api/v1/tasks/{prowler_task.id}" in response.headers["Content-Location"]
        # Verify task was called
        mock_delete_task.assert_called_once()

    @patch("api.v1.views.Task.objects.get")
    @patch("api.v1.views.perform_scan_task.delay")
    def test_trigger_scan_returns_202(
        self,
        mock_scan_task,
        mock_task_get,
        authenticated_client,
        providers_fixture,
        tasks_fixture,
    ):
        """POST to scan trigger returns 202 with task location."""
        provider = providers_fixture[0]
        prowler_task = tasks_fixture[0]

        task_mock = Mock()
        task_mock.id = prowler_task.id
        mock_scan_task.return_value = task_mock
        mock_task_get.return_value = prowler_task

        response = authenticated_client.post(
            reverse("provider-scan", kwargs={"pk": provider.id}),
            format="vnd.api+json",
        )
        assert response.status_code == status.HTTP_202_ACCEPTED


@pytest.mark.django_db
class TestJSONAPIResponses:
    """Example JSON:API response handling."""

    def test_read_single_resource(self, authenticated_client, providers_fixture):
        """Read data from single resource response."""
        provider = providers_fixture[0]
        response = authenticated_client.get(
            reverse("provider-detail", kwargs={"pk": provider.id})
        )
        data = response.json()["data"]
        attrs = data["attributes"]
        resource_id = data["id"]
        assert resource_id == str(provider.id)
        assert attrs["provider"] == provider.provider

    def test_read_list_response(self, authenticated_client, providers_fixture):
        """Read data from list response."""
        response = authenticated_client.get(reverse("provider-list"))
        items = response.json()["data"]
        assert len(items) == len(providers_fixture)

    def test_read_relationships(self, authenticated_client, scans_fixture):
        """Read relationship data."""
        scan = scans_fixture[0]
        response = authenticated_client.get(
            reverse("scan-detail", kwargs={"pk": scan.id})
        )
        data = response.json()["data"]
        relationships = data["relationships"]
        provider_rel = relationships["provider"]["data"]
        assert provider_rel["type"] == "providers"
        assert provider_rel["id"] == str(scan.provider_id)

    def test_error_response(self, authenticated_client):
        """Read error response structure."""
        response = authenticated_client.post(
            reverse("user-list"),
            data={"email": "invalid"},  # Missing required fields
            format="json",
        )
        assert response.status_code == status.HTTP_400_BAD_REQUEST
        errors = response.json()["errors"]
        # Error has source.pointer and detail
        assert "source" in errors[0]
        assert "detail" in errors[0]


@pytest.mark.django_db
class TestSoftDelete:
    """Example soft-delete manager tests."""

    def test_objects_excludes_soft_deleted(self, providers_fixture):
        """Default manager excludes soft-deleted records."""
        provider = providers_fixture[0]
        provider.is_deleted = True
        provider.save()
        # objects manager excludes deleted
        assert provider not in Provider.objects.all()
        # all_objects includes deleted
        assert provider in Provider.all_objects.all()


# =============================================================================
# CELERY TASK TESTING
# =============================================================================


@pytest.mark.django_db
class TestCeleryTaskLogic:
    """Example: Testing Celery task logic directly with apply()."""

    def test_task_logic_directly(self, tenants_fixture, providers_fixture):
        """Use apply() for synchronous execution without Celery worker."""
        from tasks.tasks import check_provider_connection_task

        tenant = tenants_fixture[0]
        provider = providers_fixture[0]

        # Execute task synchronously (no broker needed)
        result = check_provider_connection_task.apply(
            kwargs={"tenant_id": str(tenant.id), "provider_id": str(provider.id)}
        )
        assert result.successful()
        assert result.result["connected"] is True


@pytest.mark.django_db
class TestCeleryCanvas:
    """Example: Testing Canvas (chain/group) task orchestration."""

    @patch("tasks.tasks.chain")
    @patch("tasks.tasks.group")
    def test_post_scan_workflow(self, mock_group, mock_chain, tenants_fixture):
        """Mock chain/group to verify task orchestration."""
        from tasks.tasks import _perform_scan_complete_tasks

        tenant = tenants_fixture[0]

        # Mock chain.apply_async
        mock_chain_instance = Mock()
        mock_chain.return_value = mock_chain_instance

        _perform_scan_complete_tasks(str(tenant.id), "scan-123", "provider-456")

        # Verify chain was called
        assert mock_chain.called
        mock_chain_instance.apply_async.assert_called()


@pytest.mark.django_db
class TestSetTenantDecorator:
    """Example: Testing @set_tenant decorator behavior."""

    @patch("api.decorators.connection")
    def test_sets_rls_context(self, mock_conn, tenants_fixture, providers_fixture):
        """Verify @set_tenant sets RLS context via SET_CONFIG_QUERY."""
        from tasks.tasks import check_provider_connection_task

        tenant = tenants_fixture[0]
        provider = providers_fixture[0]

        # Call task with tenant_id - decorator sets RLS and pops it
        check_provider_connection_task.apply(
            kwargs={"tenant_id": str(tenant.id), "provider_id": str(provider.id)}
        )

        # Verify SET_CONFIG_QUERY was executed
        mock_conn.cursor.return_value.__enter__.return_value.execute.assert_called()


@pytest.mark.django_db
class TestBeatScheduling:
    """Example: Testing Beat scheduled task creation."""

    @patch("tasks.beat.perform_scheduled_scan_task.apply_async")
    def test_schedule_provider_scan(self, mock_apply, providers_fixture):
        """Verify periodic task is created with correct settings."""
        from django_celery_beat.models import PeriodicTask
        from tasks.beat import schedule_provider_scan

        provider = providers_fixture[0]
        mock_apply.return_value = Mock(id="task-123")

        schedule_provider_scan(provider)

        # Verify periodic task created
        assert PeriodicTask.objects.filter(
            name=f"scan-perform-scheduled-{provider.id}"
        ).exists()
        # Verify immediate execution with countdown
        mock_apply.assert_called_once()
        call_kwargs = mock_apply.call_args
        assert call_kwargs.kwargs.get("countdown") == 5

View File

@@ -1,18 +1,214 @@
# API Test Documentation Reference
## File Locations
| Type | Path |
|------|------|
| Central fixtures | `api/src/backend/conftest.py` |
| API unit tests | `api/src/backend/api/tests/` |
| Integration tests | `api/src/backend/api/tests/integration/` |
| Task tests | `api/src/backend/tasks/tests/` |
| Dev fixtures (JSON) | `api/src/backend/api/fixtures/dev/` |
---
## Fixture Dependency Graph
```
create_test_user (session)
    └─► tenants_fixture (function)
            ├─► set_user_admin_roles_fixture
            │       │
            │       └─► authenticated_client
            │               └─► (most API tests use this)
            ├─► providers_fixture
            │       └─► scans_fixture
            │               └─► findings_fixture
            └─► RBAC fixtures (create their own tenants/users):
                    ├─► create_test_user_rbac
                    │       └─► authenticated_client_rbac
                    ├─► create_test_user_rbac_no_roles
                    │       └─► authenticated_client_rbac_noroles
                    ├─► create_test_user_rbac_limited
                    │       └─► authenticated_client_no_permissions_rbac
                    ├─► create_test_user_rbac_manage_account
                    │       └─► authenticated_client_rbac_manage_account
                    └─► create_test_user_rbac_manage_users_only
                            └─► authenticated_client_rbac_manage_users_only
```
---
## Test File Contents
### `api/src/backend/api/tests/test_views.py`
Main ViewSet tests covering:
- `TestUserViewSet` - User CRUD, password validation, deletion cascades
- `TestTenantViewSet` - Tenant operations
- `TestProviderViewSet` - Provider CRUD, async deletion, connection testing
- `TestScanViewSet` - Scan trigger, list, filter
- `TestFindingViewSet` - Finding queries, filters
- `TestResourceViewSet` - Resource listing with tags
- `TestTaskViewSet` - Celery task status
- `TestIntegrationViewSet` - S3/Security Hub integrations
- `TestComplianceOverviewViewSet` - Compliance data
- And many more...
### `api/src/backend/api/tests/test_rbac.py`
RBAC permission tests covering:
- Permission checks for each ViewSet
- Role-based access patterns
- `unlimited_visibility` behavior
- Provider group visibility filtering
- Self-access patterns (`/me` endpoint, sketched below)
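A hedged self-access sketch (the `user-me` route name is an assumption for illustration):
```python
def test_me_returns_own_user(authenticated_client):
    response = authenticated_client.get(reverse("user-me"))  # route name assumed
    assert response.status_code == status.HTTP_200_OK
    assert response.json()["data"]["attributes"]["email"] == TEST_USER
```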
### `api/src/backend/api/tests/integration/test_rls_transaction.py`
RLS enforcement tests:
- `rls_transaction` context manager (sketched after this list)
- Invalid UUID validation
- Custom parameter names
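A hedged sketch of the context manager in use (the `api.db_utils` import path is an assumption based on how `conftest.py` scopes fixtures):
```python
from api.db_utils import rls_transaction  # import path assumed

def test_rls_scopes_queries(tenants_fixture, providers_fixture):
    tenant = tenants_fixture[0]
    with rls_transaction(str(tenant.id)):
        # Only rows visible under this tenant's RLS policy are reachable here
        assert Provider.objects.filter(tenant_id=tenant.id).exists()
```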
### `api/src/backend/api/tests/integration/test_providers.py`
Provider integration tests:
- Delete + recreate flow with async tasks
- End-to-end provider lifecycle
### `api/src/backend/api/tests/integration/test_authentication.py`
Authentication tests:
- JWT token flow
- API key authentication
- Social login (SAML, OAuth)
- Cross-tenant token isolation
---
## Key Test Classes and Their Fixtures
### Standard API Tests
```python
@pytest.mark.django_db
class TestProviderViewSet:
    def test_list(self, authenticated_client, providers_fixture):
        # authenticated_client has JWT for tenant[0]
        # providers_fixture has 9 providers in tenant[0]
        ...
```
### RBAC Tests
```python
@pytest.mark.django_db
class TestProviderRBAC:
    def test_with_permission(self, authenticated_client_rbac, ...):
        # Has all permissions
        ...

    def test_without_permission(self, authenticated_client_no_permissions_rbac, ...):
        # Has no permissions (all False)
        ...
```
### Cross-Tenant Tests
```python
@pytest.mark.django_db
class TestCrossTenantIsolation:
    def test_cannot_access_other_tenant(self, authenticated_client, tenants_fixture):
        other_tenant = tenants_fixture[2]  # Isolated tenant
        foreign = Provider.objects.create(
            provider="aws", uid="999888777666", tenant_id=other_tenant.id
        )
        response = authenticated_client.get(
            reverse("provider-detail", args=[foreign.id])
        )
        assert response.status_code == status.HTTP_404_NOT_FOUND  # RLS hides it
```
### Async Task Tests
```python
@pytest.mark.django_db
class TestAsyncOperations:
    @patch("api.v1.views.Task.objects.get")
    @patch("api.v1.views.some_task.delay")
    def test_async_operation(self, mock_task, mock_task_get, tasks_fixture, ...):
        prowler_task = tasks_fixture[0]
        mock_task.return_value = Mock(id=prowler_task.id)
        mock_task_get.return_value = prowler_task
        # Execute and verify 202 response
```
---
## Constants Available from conftest
```python
from conftest import (
    API_JSON_CONTENT_TYPE,     # "application/vnd.api+json"
    NO_TENANT_HTTP_STATUS,     # status.HTTP_401_UNAUTHORIZED
    TEST_USER,                 # "dev@prowler.com"
    TEST_PASSWORD,             # "testing_psswd"
    TODAY,                     # str(datetime.today().date())
    today_after_n_days,        # Function: (n: int) -> str
    get_api_tokens,            # Function: (client, email, password, tenant_id?) -> (access, refresh)
    get_authorization_header,  # Function: (token) -> {"Authorization": f"Bearer {token}"}
)
```
---
## Running Tests
```bash
# Full test suite
cd api && poetry run pytest
# Fast fail on first error
cd api && poetry run pytest -x
# Short traceback
cd api && poetry run pytest --tb=short
# Specific file
cd api && poetry run pytest src/backend/api/tests/test_views.py
# Pattern match
cd api && poetry run pytest -k "Provider"
# Verbose with print output
cd api && poetry run pytest -v -s
# With coverage
cd api && poetry run pytest --cov=api --cov-report=html
# Parallel execution
cd api && poetry run pytest -n auto
```
---
## pytest Configuration
From `api/pyproject.toml`:
```toml
[tool.pytest.ini_options]
DJANGO_SETTINGS_MODULE = "config.settings"
python_files = "test_*.py"
addopts = "--reuse-db"
```
Key points:
- Uses `--reuse-db` for faster test runs
- Settings from `config.settings`
- Test files must match `test_*.py`

View File

@@ -137,6 +137,8 @@ extract_metadata() {
# On multi-line list, only accept "- item" lines. Anything else ends the list.
line = $0
# Stop at frontmatter delimiter (getline bypasses pattern matching)
if (line ~ /^---$/) break
if (line ~ /^[[:space:]]*-[[:space:]]*/) {
sub(/^[[:space:]]*-[[:space:]]*/, "", line)
line = trim(line)

View File

@@ -21,8 +21,10 @@ When performing these actions, ALWAYS invoke the corresponding skill FIRST:
| Add changelog entry for a PR or feature | `prowler-changelog` |
| App Router / Server Actions | `nextjs-15` |
| Building AI chat features | `ai-sdk-5` |
| Committing changes | `prowler-commit` |
| Create PR that requires changelog entry | `prowler-changelog` |
| Creating Zod schemas | `zod-4` |
| Creating a git commit | `prowler-commit` |
| Creating/modifying Prowler UI components | `prowler-ui` |
| Review changelog format and conventions | `prowler-changelog` |
| Update CHANGELOG.md in any component | `prowler-changelog` |