AutoC4 Project Documentation#
Project Overview##
AutoC4 is an AI-driven platform for codebase analysis and C4 architecture generation. It combines code parsing, AI-powered summarization, vector embeddings, and hybrid search to generate C4 architecture models automatically from source code repositories.
Architecture##
Frontend Application###
Technology Stack:
- React 18.2.0 with TypeScript
- Material-UI (MUI) v5.15.15 for UI components
- Vite as the build tool and development server
- React Router DOM v6.22.3 for navigation
- TanStack React Query v5.28.9 for data fetching and state management
- Axios as the HTTP client
- React Markdown for rendering markdown content
Key Features:
- Responsive dashboard with system status monitoring
- Repository analysis interface supporting GitHub, GitLab, Bitbucket, and ZIP uploads
- Real-time pipeline progress tracking
- Interactive C4 model viewer with four levels (System Context, Containers, Components, Code)
- Chat interface for natural language codebase queries
- Modern dark/light theme with Material Design principles
Application Structure:
txt
Backend Application###
Technology Stack:
- NestJS framework with TypeScript
- Tree-sitter for AST parsing (JavaScript, TypeScript, Python, Java)
- Azure OpenAI for AI completions and embeddings
- Azure AI Search for vector and hybrid search
- Simple-git for Git repository operations
- JSZip for archive handling
- UUID for unique identifier generation
Core Services:
- IngestionService: Handles repository cloning and ZIP file extraction
  - Supports GitHub, GitLab, Bitbucket (public/private)
  - ZIP file upload and extraction
  - Repository validation and branch detection
- ParserService: AST-based code analysis
  - Multi-language support (JS, TS, Python, Java)
  - Extracts classes, functions, methods, imports, exports
  - Generates structured project representations
- SummarizationService: AI-powered code summarization
  - Azure OpenAI integration for code analysis
  - Generates summaries with complexity analysis
  - Extracts key features and dependencies
- EmbeddingService: Vector embedding generation
  - text-embedding-ada-002 model integration
  - Batch processing with rate limiting
  - 1536-dimensional vectors for semantic search
- SearchService: Azure AI Search integration
  - Hybrid search (keyword + vector)
  - Index management and document operations
  - Faceted search with filtering capabilities
- PipelineService: Orchestrates the analysis workflow
  - Multi-stage processing pipeline
  - Real-time progress tracking
  - Error handling and recovery
- AgentService: AI agent for C4 model generation
  - Tool-based architecture for codebase exploration
  - Structured C4 model output
  - Multi-level architecture analysis
- ChatService: RAG-based chat interface
  - Context-aware responses using retrieved code
  - Conversation history management
  - Source attribution and relevance scoring
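As an illustration of the ChatService responsibilities listed above (context-aware responses, conversation history, source attribution), the following is a minimal sketch of how a RAG prompt could be assembled. The `RetrievedChunk` and `ChatTurn` shapes, the `buildChatPrompt` function, and the `minScore` cut-off are assumptions made for this example, not AutoC4's actual implementation.

```typescript
// Hypothetical shape of a search hit returned by the retrieval step.
interface RetrievedChunk {
  filePath: string;
  content: string;
  score: number; // relevance score reported by the search backend
}

// Hypothetical shape of one message in the conversation history.
interface ChatTurn {
  role: "user" | "assistant";
  content: string;
}

interface PromptResult {
  prompt: string;
  sources: { filePath: string; score: number }[];
}

// Keep the most relevant chunks, number them so the answer can cite them,
// and place the conversation history before the new question.
function buildChatPrompt(
  question: string,
  chunks: RetrievedChunk[],
  history: ChatTurn[],
  maxChunks = 3,
  minScore = 0.5, // assumed cut-off; depends on the real scoring scale
): PromptResult {
  const selected = chunks
    .filter((c) => c.score >= minScore) // filter() copies, so sort() is safe
    .sort((a, b) => b.score - a.score)
    .slice(0, maxChunks);

  const context = selected
    .map((c, i) => `[${i + 1}] ${c.filePath}\n${c.content}`)
    .join("\n\n");

  const transcript = history.map((t) => `${t.role}: ${t.content}`).join("\n");

  const prompt = [
    "Answer using only the numbered code excerpts below and cite them as [n].",
    context,
    transcript,
    `user: ${question}`,
  ]
    .filter((part) => part.length > 0)
    .join("\n\n");

  return {
    prompt,
    sources: selected.map(({ filePath, score }) => ({ filePath, score })),
  };
}
```

Returning `sources` alongside the prompt lets the caller show which files informed the answer, independent of what the model writes.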
Data Flow and Processing Pipeline##
Stage 1: Repository Ingestion###
- Repository URL validation and authentication
- Git cloning or ZIP extraction to temporary directory
- Metadata collection (size, file count, repository type)
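The URL validation step can be sketched as follows. This is a simplified illustration that assumes HTTPS URLs on the public hosts of the three supported providers; the `parseRepositoryUrl` function and the `RepositoryRef` type are hypothetical names, and self-hosted instances or nested GitLab groups are not handled.

```typescript
type RepositorySource = "github" | "gitlab" | "bitbucket";

interface RepositoryRef {
  source: RepositorySource;
  owner: string;
  name: string;
}

// Assumed host-to-provider mapping; self-hosted instances are not covered.
const HOSTS: Record<string, RepositorySource> = {
  "github.com": "github",
  "gitlab.com": "gitlab",
  "bitbucket.org": "bitbucket",
};

// Returns a parsed reference, or null when the input is not a recognizable
// HTTPS repository URL on one of the supported hosts.
function parseRepositoryUrl(input: string): RepositoryRef | null {
  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // not a syntactically valid URL
  }
  if (url.protocol !== "https:") return null;

  const source = HOSTS[url.hostname];
  if (source === undefined) return null;

  const segments = url.pathname.split("/").filter((s) => s.length > 0);
  const owner = segments[0];
  const rawName = segments[1];
  if (owner === undefined || rawName === undefined) return null;

  // Clone URLs often end in ".git"; strip it to get the repository name.
  const name = rawName.replace(/\.git$/, "");
  if (name.length === 0) return null;

  return { source, owner, name };
}
```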
Stage 2: Code Parsing###
- Tree-sitter AST parsing for supported languages
- Extraction of code structures (classes, functions, methods)
- Import/export relationship analysis
- Generation of structured project representation
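To make the import/export relationship analysis concrete, here is a minimal sketch that turns per-file import lists into a dependency graph. It deliberately skips the Tree-sitter step and starts from a hypothetical `ParsedFile` record; the resolution rules (relative specifiers only, a fixed extension lookup order) are assumptions for this example.

```typescript
// Hypothetical, simplified output of the parsing stage for one file.
interface ParsedFile {
  path: string; // e.g. "src/app.ts"
  imports: string[]; // raw import specifiers, e.g. "./util" or "react"
}

// Resolve a relative specifier against the importing file's directory.
// Package imports (no leading ".") return null.
function resolveRelative(fromPath: string, specifier: string): string | null {
  if (!specifier.startsWith(".")) return null;
  const parts = fromPath.split("/").slice(0, -1); // directory of the importer
  for (const segment of specifier.split("/")) {
    if (segment === "." || segment === "") continue;
    if (segment === "..") parts.pop();
    else parts.push(segment);
  }
  return parts.join("/");
}

// Build an adjacency list: file path -> project files it imports.
function buildImportGraph(files: ParsedFile[]): Map<string, string[]> {
  const extensions = ["", ".ts", ".tsx", ".js", ".jsx"]; // assumed lookup order
  const known = new Set(files.map((f) => f.path));
  const graph = new Map<string, string[]>();

  for (const file of files) {
    const targets: string[] = [];
    for (const specifier of file.imports) {
      const base = resolveRelative(file.path, specifier);
      if (base === null) continue; // external package, not part of the graph
      const match = extensions
        .map((ext) => base + ext)
        .find((candidate) => known.has(candidate));
      if (match !== undefined && !targets.includes(match)) targets.push(match);
    }
    graph.set(file.path, targets);
  }
  return graph;
}
```

A graph of this kind is one possible input for later stages, for example to group files into components.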
Stage 3: Code Summarization###
- AI-powered analysis of code components
- Generation of natural language summaries
- Complexity assessment and feature extraction
- Dependency identification
Stage 4: Vector Embedding Generation###
- Conversion of summaries to vector embeddings
- Batch processing for efficiency
- 1536-dimensional vectors using Azure OpenAI
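The batching described above can be sketched as follows. The embedding call itself is injected as a function so the example stays independent of any SDK; `embedInBatches`, the batch size, and the delay between requests are assumptions for illustration rather than the values AutoC4 uses.

```typescript
// Turns a batch of texts into one vector per text. In a real deployment this
// would wrap the embeddings API; here it is injected by the caller.
type EmbedBatch = (texts: string[]) => Promise<number[][]>;

// Split a list into consecutive batches of at most batchSize items.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Embed summaries batch by batch, pausing between requests as a crude form
// of rate limiting, and check that every vector has the expected size.
async function embedInBatches(
  summaries: string[],
  embedBatch: EmbedBatch,
  batchSize = 16, // assumed value
  delayMs = 200, // assumed pause between requests
  expectedDims = 1536,
): Promise<number[][]> {
  const vectors: number[][] = [];
  const batches = toBatches(summaries, batchSize);
  for (let i = 0; i < batches.length; i++) {
    const batch = batches[i] ?? [];
    const result = await embedBatch(batch);
    if (result.length !== batch.length) {
      throw new Error("embedding count does not match batch size");
    }
    for (const vector of result) {
      if (vector.length !== expectedDims) {
        throw new Error(`expected ${expectedDims} dimensions, got ${vector.length}`);
      }
      vectors.push(vector);
    }
    if (i < batches.length - 1) await sleep(delayMs);
  }
  return vectors;
}
```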
Stage 5: Search Indexing###
- Azure AI Search index creation/update
- Document upload with metadata
- Vector search configuration
- Full-text and hybrid search setup
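A hybrid query combines a keyword search with a vector query in one request. The sketch below builds a request body in the shape used by the Azure AI Search REST API (`search`, `vectorQueries`, `top`, `filter`); the field names `embedding` and `language` and the `buildHybridSearchRequest` helper are assumptions for this example and would need to match the actual index schema.

```typescript
interface VectorQuery {
  kind: "vector";
  vector: number[];
  fields: string; // name of the vector field to search
  k: number; // number of nearest neighbors to retrieve
}

interface HybridSearchRequest {
  search: string; // keyword part of the hybrid query
  vectorQueries: VectorQuery[];
  top: number;
  filter?: string; // OData filter expression
}

function buildHybridSearchRequest(
  queryText: string,
  queryVector: number[],
  options: { top?: number; language?: string } = {},
): HybridSearchRequest {
  const top = options.top ?? 5;
  const request: HybridSearchRequest = {
    search: queryText,
    // "embedding" is a hypothetical field name for the stored vectors.
    vectorQueries: [{ kind: "vector", vector: queryVector, fields: "embedding", k: top }],
    top,
  };
  if (options.language !== undefined) {
    // OData string literals escape a single quote by doubling it.
    const escaped = options.language.replace(/'/g, "''");
    request.filter = `language eq '${escaped}'`; // "language" is a hypothetical field
  }
  return request;
}
```

The resulting object would be sent as the JSON body of a search request against the analysis index, with the query vector produced by the same embedding model used at indexing time.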
Stage 6: C4 Model Generation###
- AI agent-based architecture analysis
- Multi-level C4 model creation:
  - C1: System Context (actors, external systems)
  - C2: Containers (applications, databases)
  - C3: Components (modules, services)
  - C4: Code (classes, functions)
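The four levels form a hierarchy in which each element belongs to an element one level above it. The following sketch shows one way to represent and check that hierarchy; the `C4Element` shape and both functions are hypothetical and are not taken from AutoC4's data models.

```typescript
// The four C4 levels, ordered from most abstract to most detailed.
const LEVELS = ["context", "container", "component", "code"] as const;
type C4Level = (typeof LEVELS)[number];

// Hypothetical element shape for this example.
interface C4Element {
  id: string;
  name: string;
  level: C4Level;
  parentId?: string; // element one level above; absent at the top level
}

// Return a list of problems: unknown parents, or parents that are not
// exactly one level above their child.
function validateHierarchy(elements: C4Element[]): string[] {
  const byId = new Map(elements.map((e) => [e.id, e] as const));
  const errors: string[] = [];
  for (const el of elements) {
    if (el.parentId === undefined) continue;
    const parent = byId.get(el.parentId);
    if (parent === undefined) {
      errors.push(`${el.id}: unknown parent ${el.parentId}`);
      continue;
    }
    if (LEVELS.indexOf(parent.level) !== LEVELS.indexOf(el.level) - 1) {
      errors.push(`${el.id}: parent ${parent.id} is not one level above`);
    }
  }
  return errors;
}

// Drill-down helper: direct children of an element.
function childrenOf(elements: C4Element[], parentId: string): C4Element[] {
  return elements.filter((e) => e.parentId === parentId);
}
```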
API Endpoints##
Analysis API (/api/v1/analysis)###
- POST / - Create repository analysis
- POST /upload - Upload ZIP file for analysis
- POST /branches - Get repository branches
- GET /:id/status - Get analysis status
- POST /:id/c4-model - Generate C4 model
- GET /:id/c4-model - Retrieve C4 model
- GET /:id/files - Get file structure
Chat API (/api/v1/chat)###
- POST / - Send chat message
- GET /:id/info - Get analysis info
- GET /:id/suggestions - Get suggested questions
System API (/api)###
- GET /health - System health check
- GET / - Application information
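As a usage illustration for the endpoints above, here is a minimal client-side sketch that builds analysis paths and polls the status endpoint until the pipeline finishes. The response shape (`status` with the four values shown) is an assumption, since the actual payload is not specified here, and the HTTP call is injected as a hypothetical `GetJson` function rather than tied to a specific client.

```typescript
const ANALYSIS_BASE = "/api/v1/analysis";

type AnalysisResource = "status" | "c4-model" | "files";

// Build the path for a per-analysis resource, e.g. /api/v1/analysis/<id>/status.
function analysisPath(id: string, resource: AnalysisResource): string {
  return `${ANALYSIS_BASE}/${encodeURIComponent(id)}/${resource}`;
}

// Assumed response shape of GET /:id/status; the real payload may differ.
type AnalysisStatus = "pending" | "running" | "completed" | "failed";
interface StatusResponse {
  status: AnalysisStatus;
}

// Injected HTTP GET returning parsed JSON (could wrap Axios or fetch).
type GetJson = (path: string) => Promise<unknown>;

// Poll the status endpoint until the analysis completes or fails.
async function waitForCompletion(
  id: string,
  getJson: GetJson,
  intervalMs = 2000, // assumed polling interval
  maxAttempts = 30, // assumed upper bound
): Promise<AnalysisStatus> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const body = (await getJson(analysisPath(id, "status"))) as StatusResponse;
    if (body.status === "completed" || body.status === "failed") {
      return body.status;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`analysis ${id} did not finish after ${maxAttempts} attempts`);
}
```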
Data Models##
Analysis Request###
typescript
Project Structure###
typescript
C4 Model###
typescript
Configuration##
Environment Variables###
bash
Frontend Configuration###
bash
File System Structure##
Temporary Analysis Storage###
txt
Azure AI Search Indexes###
- Index naming: c4mcp-codebase-{analysis-id}
- Vector dimensions: 1536 (text-embedding-ada-002)
- Search algorithms: HNSW for vector search
- Document schema includes code chunks, summaries, embeddings, metadata
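The index settings above can be sketched as an index definition in the shape accepted by the Azure AI Search REST API. Only the naming pattern, the 1536 dimensions, and HNSW come from this document; the field names, the profile and algorithm names, and the `buildIndexDefinition` helper are assumptions for illustration.

```typescript
interface IndexField {
  name: string;
  type: string;
  key?: boolean;
  searchable?: boolean;
  filterable?: boolean;
  facetable?: boolean;
  dimensions?: number;
  vectorSearchProfile?: string;
}

interface IndexDefinition {
  name: string;
  fields: IndexField[];
  vectorSearch: {
    algorithms: { name: string; kind: "hnsw" }[];
    profiles: { name: string; algorithm: string }[];
  };
}

// Index names follow c4mcp-codebase-{analysis-id}; Azure AI Search requires
// lowercase index names, so the ID is lowercased defensively.
function indexName(analysisId: string): string {
  return `c4mcp-codebase-${analysisId.toLowerCase()}`;
}

function buildIndexDefinition(analysisId: string): IndexDefinition {
  return {
    name: indexName(analysisId),
    fields: [
      // All field names below are hypothetical.
      { name: "id", type: "Edm.String", key: true, filterable: true },
      { name: "filePath", type: "Edm.String", searchable: true, filterable: true },
      { name: "language", type: "Edm.String", filterable: true, facetable: true },
      { name: "content", type: "Edm.String", searchable: true },
      { name: "summary", type: "Edm.String", searchable: true },
      {
        name: "embedding",
        type: "Collection(Edm.Single)",
        searchable: true,
        dimensions: 1536, // matches text-embedding-ada-002
        vectorSearchProfile: "default-profile",
      },
    ],
    vectorSearch: {
      algorithms: [{ name: "hnsw-default", kind: "hnsw" }],
      profiles: [{ name: "default-profile", algorithm: "hnsw-default" }],
    },
  };
}
```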
Supported Languages and File Types##
Fully Supported (AST Parsing):
- JavaScript (.js, .jsx)
- TypeScript (.ts, .tsx)
- Python (.py)
- Java (.java)
Repository Sources:
- GitHub (public/private)
- GitLab (public/private)
- Bitbucket (public/private)
- ZIP file upload (max 100MB)
Key Features##
- Multi-Source Repository Ingestion: Support for major Git providers with authentication
- Advanced Code Parsing: AST-based analysis with Tree-sitter for accurate code structure extraction
- AI-Powered Analysis: Azure OpenAI integration for intelligent code summarization and C4 generation
- Semantic Search: Vector embeddings with hybrid search capabilities
- Interactive C4 Models: Four-level architecture visualization with drill-down capabilities
- Natural Language Chat: RAG-based conversational interface for codebase exploration
- Real-time Processing: Live pipeline progress tracking with detailed status updates
- Comprehensive API: RESTful API with OpenAPI documentation
Technical Specifications##
- Frontend Build: Vite-based development and production builds
- Backend Framework: NestJS with dependency injection and modular architecture
- Database: No dedicated database; the backend is stateless and relies on Azure AI Search for persistence
- Authentication: Personal Access Token support for private repositories
- File Processing: Streaming file operations with configurable batch sizes
- Error Handling: Comprehensive error boundaries and graceful degradation
- Performance: Optimized for large codebases with efficient chunking and batching