AutoC4 Project Documentation

Project Overview
AutoC4 is a codebase analysis platform that automatically generates C4 architecture models from source code repositories. It combines AST-based code parsing, AI-powered summarization, vector embeddings, and hybrid search.
Architecture
Frontend Application
Technology Stack:
- React 18.2.0 with TypeScript
- Material-UI (MUI) v5.15.15 for UI components
- Vite as the build tool and development server
- React Router DOM v6.22.3 for navigation
- TanStack React Query v5.28.9 for data fetching and state management
- Axios as the HTTP client
- React Markdown for rendering markdown content
Key Features:
- Responsive dashboard with system status monitoring
- Repository analysis interface supporting GitHub, GitLab, Bitbucket, and ZIP uploads
- Real-time pipeline progress tracking
- Interactive C4 model viewer with four levels (System Context, Containers, Components, Code)
- Chat interface for natural language codebase queries
- Modern dark/light theme with Material Design principles
Application Structure:
Backend Application
Technology Stack:
- NestJS framework with TypeScript
- Tree-sitter for AST parsing (JavaScript, TypeScript, Python, Java)
- Azure OpenAI for AI completions and embeddings
- Azure AI Search for vector and hybrid search
- Simple-git for Git repository operations
- JSZip for archive handling
- UUID for unique identifier generation
Core Services:
- IngestionService: Handles repository cloning and ZIP file extraction
  - Supports GitHub, GitLab, Bitbucket (public/private)
  - ZIP file upload and extraction
  - Repository validation and branch detection
- ParserService: AST-based code analysis
  - Multi-language support (JS, TS, Python, Java)
  - Extracts classes, functions, methods, imports, exports
  - Generates structured project representations
- SummarizationService: AI-powered code summarization
  - Azure OpenAI integration for code analysis
  - Generates summaries with complexity analysis
  - Extracts key features and dependencies
- EmbeddingService: Vector embedding generation
  - text-embedding-ada-002 model integration
  - Batch processing with rate limiting
  - 1536-dimensional vectors for semantic search
- SearchService: Azure AI Search integration
  - Hybrid search (keyword + vector)
  - Index management and document operations
  - Faceted search with filtering capabilities
- PipelineService: Orchestrates the analysis workflow
  - Multi-stage processing pipeline
  - Real-time progress tracking
  - Error handling and recovery
- AgentService: AI agent for C4 model generation
  - Tool-based architecture for codebase exploration
  - Structured C4 model output
  - Multi-level architecture analysis
- ChatService: RAG-based chat interface
  - Context-aware responses using retrieved code
  - Conversation history management
  - Source attribution and relevance scoring
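The ChatService entry describes retrieval-augmented generation with source attribution and relevance scoring. The following is a minimal sketch of how retrieved chunks might be turned into a prompt, not the project's actual implementation. The `RetrievedChunk` shape, the `buildRagPrompt` function, and the prompt wording are all assumptions made for illustration.

```typescript
// Hypothetical shape of a search hit; the field names are assumptions.
interface RetrievedChunk {
  filePath: string;
  summary: string;
  score: number; // relevance score reported by the search backend
}

interface RagPrompt {
  prompt: string;
  sources: { filePath: string; score: number }[];
}

// Keep the top-k most relevant chunks and number them so the model can cite them.
function buildRagPrompt(question: string, chunks: RetrievedChunk[], topK = 3): RagPrompt {
  const selected = [...chunks].sort((a, b) => b.score - a.score).slice(0, topK);
  const context = selected
    .map((c, i) => `[${i + 1}] ${c.filePath}\n${c.summary}`)
    .join("\n\n");
  const prompt =
    "Answer using only the numbered context below and cite sources as [n].\n\n" +
    `${context}\n\nQuestion: ${question}`;
  return {
    prompt,
    sources: selected.map(({ filePath, score }) => ({ filePath, score })),
  };
}
```

Returning the selected sources alongside the prompt lets the caller show attribution even if the model omits citations.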
Data Flow and Processing Pipeline
Stage 1: Repository Ingestion
- Repository URL validation and authentication
- Git cloning or ZIP extraction to temporary directory
- Metadata collection (size, file count, repository type)
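As an illustration of the URL validation step, the sketch below recognizes the three hosted providers named in this document. `parseRepoUrl`, `RepoRef`, and the HTTPS-only rule are assumptions, and the sketch deliberately ignores self-hosted instances and GitLab subgroups.

```typescript
type RepoProvider = "github" | "gitlab" | "bitbucket";

interface RepoRef {
  provider: RepoProvider;
  owner: string;
  name: string;
}

// Host names of the public hosted offerings; self-hosted instances are not covered.
const PROVIDER_HOSTS: Record<string, RepoProvider> = {
  "github.com": "github",
  "gitlab.com": "gitlab",
  "bitbucket.org": "bitbucket",
};

// Returns null when the URL is malformed, not HTTPS, or not from a known provider.
function parseRepoUrl(raw: string): RepoRef | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null;
  }
  if (url.protocol !== "https:") return null;
  const provider = PROVIDER_HOSTS[url.hostname];
  if (!provider) return null;
  const segments = url.pathname.split("/").filter(Boolean);
  if (segments.length < 2) return null;
  const [owner, repo] = segments;
  // Strip the optional ".git" suffix used in clone URLs.
  return { provider, owner, name: repo.replace(/\.git$/, "") };
}
```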
Stage 2: Code Parsing
- Tree-sitter AST parsing for supported languages
- Extraction of code structures (classes, functions, methods)
- Import/export relationship analysis
- Generation of structured project representation
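To make the import/export relationship analysis concrete, here is a small sketch that turns per-file import lists into a dependency graph. It assumes the AST parsing has already happened; `ParsedFile`, `resolveRelative`, and `buildDependencyGraph` are hypothetical names, and the resolution rules are simplified (relative specifiers only, no index files or path aliases).

```typescript
// Hypothetical parser output for one file; the real project representation may differ.
interface ParsedFile {
  path: string;
  imports: string[]; // module specifiers exactly as written in the source
}

// Resolve a relative specifier such as "./b" or "../lib/c" against the importing file's path.
function resolveRelative(fromPath: string, specifier: string): string {
  const parts = fromPath.split("/").slice(0, -1); // directory of the importing file
  for (const seg of specifier.split("/")) {
    if (seg === "." || seg === "") continue;
    if (seg === "..") parts.pop();
    else parts.push(seg);
  }
  return parts.join("/");
}

// Map each file to the project files it imports; package imports are ignored.
function buildDependencyGraph(
  files: ParsedFile[],
  extensions: string[] = [".ts", ".tsx", ".js", ".jsx"],
): Map<string, string[]> {
  const known = new Set(files.map((f) => f.path));
  const graph = new Map<string, string[]>();
  for (const file of files) {
    const deps: string[] = [];
    for (const spec of file.imports) {
      if (!spec.startsWith(".")) continue; // external package such as "react"
      const base = resolveRelative(file.path, spec);
      const candidates = [base, ...extensions.map((ext) => base + ext)];
      const target = candidates.find((p) => known.has(p));
      if (target) deps.push(target);
    }
    graph.set(file.path, deps);
  }
  return graph;
}
```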
Stage 3: Code Summarization
- AI-powered analysis of code components
- Generation of natural language summaries
- Complexity assessment and feature extraction
- Dependency identification
Stage 4: Vector Embedding Generation
- Conversion of summaries to vector embeddings
- Batch processing for efficiency
- 1536-dimensional vectors using Azure OpenAI
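Batch processing with a pause between requests could look roughly like the sketch below. The embedding call itself is injected as a function, so no Azure OpenAI client API is assumed here; `EmbedBatch`, `chunk`, `embedInBatches`, and the default batch size and delay are illustrative choices, not values taken from the project.

```typescript
// Function that embeds one batch of texts; in practice it would wrap the embeddings API call.
type EmbedBatch = (texts: string[]) => Promise<number[][]>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Split items into consecutive batches of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Embed texts batch by batch, pausing between requests as a crude rate limit.
async function embedInBatches(
  texts: string[],
  embed: EmbedBatch,
  batchSize = 16,
  delayMs = 200,
): Promise<number[][]> {
  const vectors: number[][] = [];
  const batches = chunk(texts, batchSize);
  for (let i = 0; i < batches.length; i++) {
    if (i > 0) await sleep(delayMs);
    vectors.push(...(await embed(batches[i])));
  }
  return vectors;
}
```

A fixed delay is the simplest form of rate limiting; a production service would more likely react to the API's throttling responses with backoff.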
Stage 5: Search Indexing
- Azure AI Search index creation/update
- Document upload with metadata
- Vector search configuration
- Full-text and hybrid search setup
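A hybrid query sends the keyword text and the query embedding in the same request. The sketch below only builds the request body; it assumes the body shape of the Azure AI Search REST API (version 2023-11-01), and the `embedding` field name, `HybridOptions`, and `buildHybridSearchBody` are hypothetical.

```typescript
// Request body for a hybrid (keyword + vector) query.
// Assumption: the shape follows the Azure AI Search REST API, version 2023-11-01.
interface HybridSearchBody {
  search: string;
  vectorQueries: { kind: "vector"; vector: number[]; fields: string; k: number }[];
  filter?: string;
  top: number;
}

interface HybridOptions {
  top?: number;
  vectorField?: string; // hypothetical name of the embedding field in the index
  filter?: string; // OData filter expression, e.g. "language eq 'typescript'"
}

function buildHybridSearchBody(
  text: string,
  embedding: number[],
  options: HybridOptions = {},
): HybridSearchBody {
  const { top = 5, vectorField = "embedding", filter } = options;
  const body: HybridSearchBody = {
    search: text, // keyword part of the query
    vectorQueries: [{ kind: "vector", vector: embedding, fields: vectorField, k: top }],
    top,
  };
  if (filter) body.filter = filter;
  return body;
}
```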
Stage 6: C4 Model Generation
- AI agent-based architecture analysis
- Multi-level C4 model creation:
  - C1: System Context (actors, external systems)
  - C2: Containers (applications, databases)
  - C3: Components (modules, services)
  - C4: Code (classes, functions)
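One way to represent the four levels in a single structure is a flat element list with parent links, which makes drill-down a simple filter. This is a sketch under assumptions: the types and helpers below (`C4Element`, `C4Model`, `childrenOf`, `danglingRelationships`) are hypothetical and do not reproduce the project's actual data model.

```typescript
// Hypothetical element and model shapes, for illustration only.
type C4Level = "context" | "container" | "component" | "code";

interface C4Element {
  id: string;
  name: string;
  level: C4Level;
  parentId?: string; // e.g. the container that owns a component
}

interface C4Relationship {
  sourceId: string;
  targetId: string;
  description: string;
}

interface C4Model {
  elements: C4Element[];
  relationships: C4Relationship[];
}

// Elements directly below the given element: one drill-down step.
function childrenOf(model: C4Model, parentId: string): C4Element[] {
  return model.elements.filter((e) => e.parentId === parentId);
}

// Relationships that point at ids missing from the model, useful for validating generated output.
function danglingRelationships(model: C4Model): C4Relationship[] {
  const ids = new Set(model.elements.map((e) => e.id));
  return model.relationships.filter((r) => !ids.has(r.sourceId) || !ids.has(r.targetId));
}
```

Validating relationships this way is a cheap guard when the model is produced by an AI agent, since generated output can reference elements that were never declared.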
API Endpoints
Analysis API (/api/v1/analysis)
- POST / - Create repository analysis
- POST /upload - Upload ZIP file analysis
- POST /branches - Get repository branches
- GET /:id/status - Get analysis status
- POST /:id/c4-model - Generate C4 model
- GET /:id/c4-model - Retrieve C4 model
- GET /:id/files - Get file structure
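A client would typically create an analysis and then poll its status endpoint. The sketch below builds the paths listed above and adds a generic polling helper. Because the status response shape is not specified here, the "done" check is left to the caller; `analysisRoutes`, `pollUntil`, and the default interval and attempt count are hypothetical.

```typescript
const ANALYSIS_BASE = "/api/v1/analysis";

// encodeURIComponent guards against ids that contain "/" or other reserved characters.
const analysisRoutes = {
  create: () => `${ANALYSIS_BASE}/`,
  upload: () => `${ANALYSIS_BASE}/upload`,
  branches: () => `${ANALYSIS_BASE}/branches`,
  status: (id: string) => `${ANALYSIS_BASE}/${encodeURIComponent(id)}/status`,
  c4Model: (id: string) => `${ANALYSIS_BASE}/${encodeURIComponent(id)}/c4-model`,
  files: (id: string) => `${ANALYSIS_BASE}/${encodeURIComponent(id)}/files`,
};

// Call `read` repeatedly until `isDone` accepts the result or attempts run out.
async function pollUntil<T>(
  read: () => Promise<T>,
  isDone: (value: T) => boolean,
  intervalMs = 2000,
  maxAttempts = 30,
): Promise<T> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const value = await read();
    if (isDone(value)) return value;
    if (attempt < maxAttempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`gave up after ${maxAttempts} attempts`);
}
```

In use, `read` would wrap an HTTP GET against `analysisRoutes.status(id)`, and `isDone` would inspect whatever completion field the status response actually carries.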
Chat API (/api/v1/chat)
- POST / - Send chat message
- GET /:id/info - Get analysis info
- GET /:id/suggestions - Get suggested questions
System API (/api)
- GET /health - System health check
- GET / - Application information
Data Models
Analysis Request
Project Structure
C4 Model
Configuration
Environment Variables
Frontend Configuration
File System Structure
Temporary Analysis Storage
Azure AI Search Indexes
- Index naming: c4mcp-codebase-{analysis-id}
- Vector dimensions: 1536 (text-embedding-ada-002)
- Search algorithms: HNSW for vector search
- Document schema includes code chunks, summaries, embeddings, metadata
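The naming scheme and vector settings above can be sketched as a per-analysis index definition. Only the name pattern, the 1536 dimensions, and HNSW come from this document; the field names, profile names, and `buildIndexDefinition` are hypothetical, and the JSON layout assumes the Azure AI Search REST API version 2023-11-01.

```typescript
const EMBEDDING_DIMENSIONS = 1536; // text-embedding-ada-002

// Azure AI Search index names must be lowercase; analysis ids are assumed to be UUIDs.
function indexNameFor(analysisId: string): string {
  return `c4mcp-codebase-${analysisId.toLowerCase()}`;
}

// Index definition body. Field and profile names are illustrative assumptions.
function buildIndexDefinition(analysisId: string) {
  return {
    name: indexNameFor(analysisId),
    fields: [
      { name: "id", type: "Edm.String", key: true },
      { name: "content", type: "Edm.String", searchable: true },
      { name: "summary", type: "Edm.String", searchable: true },
      { name: "filePath", type: "Edm.String", filterable: true, facetable: true },
      {
        name: "embedding",
        type: "Collection(Edm.Single)",
        searchable: true,
        dimensions: EMBEDDING_DIMENSIONS,
        vectorSearchProfile: "default-profile",
      },
    ],
    vectorSearch: {
      // HNSW is the approximate nearest-neighbor algorithm named in this document.
      algorithms: [{ name: "default-hnsw", kind: "hnsw" }],
      profiles: [{ name: "default-profile", algorithm: "default-hnsw" }],
    },
  };
}
```

One index per analysis keeps analyses isolated and makes cleanup a single index deletion, at the cost of counting against the search service's index quota.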
Supported Languages and File Types
Fully Supported (AST Parsing):
- JavaScript (.js, .jsx)
- TypeScript (.ts, .tsx)
- Python (.py)
- Java (.java)
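The extension list above maps directly onto a lookup table that a parser could use to pick a grammar or skip a file. This is a minimal sketch; `SupportedLanguage`, `LANGUAGE_BY_EXTENSION`, and `detectLanguage` are hypothetical names, and treating extensions case-insensitively is an assumption.

```typescript
type SupportedLanguage = "javascript" | "typescript" | "python" | "java";

const LANGUAGE_BY_EXTENSION: Record<string, SupportedLanguage> = {
  ".js": "javascript",
  ".jsx": "javascript",
  ".ts": "typescript",
  ".tsx": "typescript",
  ".py": "python",
  ".java": "java",
};

// Returns undefined for files the parser should skip.
function detectLanguage(filePath: string): SupportedLanguage | undefined {
  const fileName = filePath.split("/").pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  if (dot <= 0) return undefined; // no extension, or a dotfile such as ".gitignore"
  return LANGUAGE_BY_EXTENSION[fileName.slice(dot).toLowerCase()];
}
```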
Repository Sources:
- GitHub (public/private)
- GitLab (public/private)
- Bitbucket (public/private)
- ZIP file upload (max 100MB)
Key Features
- Multi-Source Repository Ingestion: Support for major Git providers with authentication
- Advanced Code Parsing: AST-based analysis with Tree-sitter for accurate code structure extraction
- AI-Powered Analysis: Azure OpenAI integration for code summarization and C4 model generation
- Semantic Search: Vector embeddings with hybrid search capabilities
- Interactive C4 Models: Four-level architecture visualization with drill-down capabilities
- Natural Language Chat: RAG-based conversational interface for codebase exploration
- Real-time Processing: Live pipeline progress tracking with detailed status updates
- Comprehensive API: RESTful API with OpenAPI documentation
Technical Specifications
- Frontend Build: Vite-based development and production builds
- Backend Framework: NestJS with dependency injection and modular architecture
- Database: None; the backend is stateless and relies on Azure AI Search for persistence
- Authentication: Personal Access Token support for private repositories
- File Processing: Streaming file operations with configurable batch sizes
- Error Handling: Comprehensive error boundaries and graceful degradation
- Performance: Optimized for large codebases with efficient chunking and batching