Architecture Overview
Welcome to the Open Estate AI architecture documentation. This page provides a high-level view of how our system components work together to deliver intelligent property search and insights.
System Overview
Open Estate AI is built on two architectural pillars:
- Data Ingestion Architecture - Collects, processes, and indexes real estate data
- Multi-Agent Architecture - Delivers intelligent search and insights through AI agents
How They Connect
flowchart TB
subgraph Sources["🌐 Data Sources"]
RERA["UP RERA Website"]
Other["Other Property Portals"]
end
subgraph Ingestion["📥 Data Ingestion Pipeline"]
direction TB
Scrapers["🤖 Scrapers<br/><i>Collect Data</i>"]
S3Raw["📦 S3 Raw<br/><i>Store JSON</i>"]
Lambda["⚙️ Lambda<br/><i>Process</i>"]
SageMaker["🧠 SageMaker<br/><i>Embeddings</i>"]
S3Vectors["🔍 S3 Vectors<br/><i>Indexed</i>"]
end
subgraph MultiAgent["🎯 Multi-Agent System"]
direction TB
UI["💻 User Interface<br/><i>Web/Mobile</i>"]
Orchestrator["🎭 Orchestrator<br/><i>Coordinates</i>"]
Agents["🤖 AI Agents<br/><i>Search, Legal, etc.</i>"]
end
subgraph Users["👥 End Users"]
User["Property Seekers<br/>Investors<br/>Researchers"]
end
RERA -->|"Scrape"| Scrapers
Other -->|"Scrape"| Scrapers
Scrapers -->|"Upload"| S3Raw
S3Raw -->|"Trigger"| Lambda
Lambda -->|"Generate"| SageMaker
SageMaker -->|"Store"| S3Vectors
User -->|"Query"| UI
UI -->|"Request"| Orchestrator
Orchestrator -->|"Coordinate"| Agents
Agents -->|"Search"| S3Vectors
S3Vectors -->|"Results"| Agents
Agents -->|"Response"| Orchestrator
Orchestrator -->|"Display"| UI
UI -->|"Results"| User
classDef sourceStyle fill:#14B8A6,stroke:#0F766E,stroke-width:3px,color:#fff
classDef ingestionStyle fill:#F59E0B,stroke:#D97706,stroke-width:3px,color:#fff
classDef agentStyle fill:#8B5CF6,stroke:#6D28D9,stroke-width:3px,color:#fff
classDef userStyle fill:#4A90E2,stroke:#2E5C8A,stroke-width:3px,color:#fff
classDef ingestionGroupStyle fill:#FFF7E6,stroke:#FA8C16,stroke-width:3px,color:#AD4E00
classDef agentGroupStyle fill:#F3E8FF,stroke:#8B5CF6,stroke-width:3px,color:#5B21B6
classDef sourceGroupStyle fill:#CCFBF1,stroke:#14B8A6,stroke-width:3px,color:#134E4A
classDef userGroupStyle fill:#DBEAFE,stroke:#4A90E2,stroke-width:3px,color:#1E3A8A
class RERA,Other sourceStyle
class Scrapers,S3Raw,Lambda,SageMaker,S3Vectors ingestionStyle
class UI,Orchestrator,Agents agentStyle
class User userStyle
class Sources sourceGroupStyle
class Ingestion ingestionGroupStyle
class MultiAgent agentGroupStyle
class Users userGroupStyle
Architecture Components
1. Data Ingestion Architecture
Purpose: Collect, process, and make real estate data searchable
Key Components:
- Scrapers - Automated data collection from websites (UP RERA, property portals)
- S3 Raw Storage - Stores scraped data in JSON format with date partitioning
- Lambda Processing - Transforms raw data and requests vector embeddings from SageMaker
- SageMaker - ML model for creating 384-dimensional embeddings
- S3 Vectors - Vector database for semantic search
Learn More: 📥 Data Ingestion Architecture
Key Features:
- Automatic data collection and updates
- Real-time processing pipeline
- Semantic search capabilities
- Scalable to millions of properties
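To make "semantic search" concrete, here is a minimal sketch (not the project's implementation) that ranks stored vectors by cosine similarity to a query vector. The toy 3-dimensional vectors, the sample keys, and the function names `cosine_similarity` and `top_k` are assumptions for illustration; the real embeddings are 384-dimensional and the lookup is delegated to S3 Vectors.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3):
    """Return the k (key, score) pairs most similar to the query."""
    scored = [(key, cosine_similarity(query, vec)) for key, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy index: key -> embedding (3 dimensions instead of 384).
index = {
    "noida-3bhk": [0.9, 0.1, 0.0],
    "lucknow-villa": [0.1, 0.9, 0.2],
    "noida-2bhk": [0.8, 0.2, 0.1],
}
results = top_k([1.0, 0.0, 0.0], index, k=2)
```

A brute-force scan like this is only practical for small collections; a vector store exists precisely so that the ranking does not have to touch every record.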
2. Multi-Agent Architecture
Purpose: Deliver intelligent property search and insights through AI agents
Key Components:
- User Interface - Web and mobile applications
- Orchestrator - Coordinates multiple AI agents to handle complex queries
- Intent Classifier - Understands user queries and extracts meaning
- Planner Agent - Creates execution plans for complex workflows
- Specialized Agents - Search, Legal Check, Valuation, Verification
- Data Layer - Vector DB, Relational DB, Cache
Learn More: 🤖 Multi-Agent Architecture
Key Features:
- Natural language query understanding
- Multi-agent collaboration
- Parallel task execution
- Comprehensive property analysis
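As a rough illustration of parallel task execution, the sketch below runs three independent checks concurrently with `asyncio.gather`. Which agents can actually run in parallel is an assumption here, and `run_agent`, `analyze_property`, and the simulated delays are hypothetical stand-ins rather than the project's API.

```python
import asyncio


async def run_agent(name: str, property_id: str, delay: float) -> dict:
    """Hypothetical agent: sleeps to simulate I/O, then returns a finding."""
    await asyncio.sleep(delay)
    return {"agent": name, "property_id": property_id, "status": "ok"}


async def analyze_property(property_id: str) -> list[dict]:
    """Run independent checks concurrently; results keep the call order."""
    return await asyncio.gather(
        run_agent("legal", property_id, 0.02),
        run_agent("valuation", property_id, 0.01),
        run_agent("verification", property_id, 0.03),
    )


findings = asyncio.run(analyze_property("demo-123"))
```

Because the agents wait on I/O at the same time, the total latency is close to that of the slowest agent rather than the sum of all three.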
Data Flow: End-to-End
Here's how data flows from source websites to end users:
Phase 1: Data Collection & Processing
- Scrapers collect property data from UP RERA and other sources
- Raw data is uploaded to S3 as newline-delimited JSON (NDJSON)
- The upload automatically triggers a Lambda function
- SageMaker generates vector embeddings from property descriptions
- Vectors are stored in S3 Vectors with metadata
- Data is indexed for fast semantic search
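The processing steps above can be sketched in plain Python. This is an illustrative outline under stated assumptions, not the project's Lambda code: the key layout in `raw_key`, the record fields (`id`, `description`, `city`), and the `fake_embed` stand-in for the SageMaker call are all hypothetical.

```python
import json
from datetime import date
from typing import Callable


def raw_key(source: str, day: date, filename: str) -> str:
    """Build a date-partitioned object key (this layout is an assumption)."""
    return f"raw/{source}/{day:%Y/%m/%d}/{filename}"


def parse_ndjson(body: str) -> list[dict]:
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def build_vector_records(
    rows: list[dict], embed: Callable[[str], list[float]]
) -> list[dict]:
    """Pair each property's embedding with the rest of its fields as metadata."""
    records = []
    for row in rows:
        text = row.get("description", "")
        if not text:
            continue  # nothing to embed for this row
        metadata = {k: v for k, v in row.items() if k != "description"}
        records.append({"key": row["id"], "vector": embed(text), "metadata": metadata})
    return records


def fake_embed(text: str) -> list[float]:
    """Stand-in for the embedding model: returns a 2-number dummy vector."""
    return [float(len(text)), float(text.count(" "))]


body = (
    '{"id": "p1", "description": "3 BHK in Noida", "city": "Noida"}\n'
    "\n"
    '{"id": "p2", "city": "Agra"}\n'
)
rows = parse_ndjson(body)
records = build_vector_records(rows, fake_embed)
key = raw_key("up-rera", date(2025, 1, 15), "projects.ndjson")
```

Passing the embedding function in as a parameter keeps the transformation testable without AWS access; in a deployed function, `embed` would wrap the call to the SageMaker endpoint.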
Phase 2: User Query & Response
- User submits a query: "Find 3 bedroom apartments in Noida under construction"
- UI sends request to Orchestrator
- Intent Classifier extracts: bedrooms=3, location=Noida, status=under_construction
- Planner Agent creates workflow: Search → Legal Check → Valuation → Verify
- Search Agent queries S3 Vectors using semantic similarity
- Legal Agent validates RERA registration
- Valuation Agent provides price analysis
- Verification Agent checks data quality
- Orchestrator aggregates results and applies confidence scoring
- UI displays ranked properties with insights
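The query flow above can be sketched end to end as follows. Everything in this example is hypothetical: the rule-based `classify_intent` stands in for what is presumably an LLM-backed classifier, `CATALOG` replaces the S3 Vectors lookup, and the confidence score (the fraction of checks passed) is an assumed formula, not the project's scoring method.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Intent:
    bedrooms: Optional[int] = None
    location: Optional[str] = None
    status: Optional[str] = None


KNOWN_LOCATIONS = ("Noida", "Lucknow", "Ghaziabad")  # hypothetical list

# Hypothetical in-memory data standing in for the vector search results.
CATALOG = [
    {"id": "p1", "bedrooms": 3, "location": "Noida",
     "status": "under_construction", "rera_id": "demo-001", "price": 7500000},
    {"id": "p2", "bedrooms": 3, "location": "Noida",
     "status": "under_construction", "rera_id": None, "price": 6800000},
    {"id": "p3", "bedrooms": 2, "location": "Lucknow",
     "status": "completed", "rera_id": "demo-002", "price": 5200000},
]


def classify_intent(query: str) -> Intent:
    """Toy rule-based extractor for bedrooms, location, and status."""
    intent = Intent()
    match = re.search(r"(\d+)\s*(?:bedroom|bhk)", query, re.IGNORECASE)
    if match:
        intent.bedrooms = int(match.group(1))
    for place in KNOWN_LOCATIONS:
        if place.lower() in query.lower():
            intent.location = place
            break
    if "under construction" in query.lower():
        intent.status = "under_construction"
    return intent


def plan(intent: Intent) -> list[str]:
    """Return the fixed workflow from the steps above."""
    return ["search", "legal_check", "valuation", "verify"]


def search(intent, _candidates):
    wanted = {k: v for k, v in vars(intent).items() if v is not None}
    return [dict(c) for c in CATALOG if all(c.get(k) == v for k, v in wanted.items())]


def legal_check(_intent, candidates):
    return [{**c, "rera_ok": c["rera_id"] is not None} for c in candidates]


def valuation(_intent, candidates):
    return [{**c, "priced": c["price"] is not None} for c in candidates]


def verify(_intent, candidates):
    return [{**c, "complete": all(v is not None for v in c.values())} for c in candidates]


AGENTS = {"search": search, "legal_check": legal_check,
          "valuation": valuation, "verify": verify}


def handle_query(query: str) -> list[dict]:
    """Orchestrate: classify, plan, run each agent, then score and rank."""
    intent = classify_intent(query)
    candidates: list[dict] = []
    for step in plan(intent):
        candidates = AGENTS[step](intent, candidates)
    for c in candidates:
        checks = [c["rera_ok"], c["priced"], c["complete"]]
        c["confidence"] = sum(checks) / len(checks)
    return sorted(candidates, key=lambda c: c["confidence"], reverse=True)


ranked = handle_query("Find 3 bedroom apartments in Noida under construction")
```

Each agent takes the current candidate list and returns an annotated copy, so adding a new agent only requires registering another function in `AGENTS` and naming it in the plan.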
Technology Stack
Data Ingestion
- Compute: AWS App Runner, AWS Lambda
- Storage: AWS S3 (Raw + Vectors)
- ML: AWS SageMaker (sentence-transformers/all-MiniLM-L6-v2)
- Processing: Python 3.12, Playwright, FastMCP
- AI: AWS Bedrock (Claude Haiku)
Multi-Agent System
- Frontend: React/Next.js (Web), React Native (Mobile)
- Backend: FastAPI, Python
- Orchestration: Message Queue (AWS SQS/RabbitMQ)
- Database: PostgreSQL (Relational), Redis (Cache), S3 Vectors
- AI: AWS Bedrock, OpenAI API (tracing only)
- Infrastructure: AWS (ECS, Lambda, RDS, ElastiCache)
Key Benefits
For Property Seekers
- Natural Language Search - Ask questions as you would ask a person
- Comprehensive Analysis - Legal status, price analysis, verification
- Intelligent Recommendations - Personalized based on preferences
- Real-time Updates - Always up-to-date property information
For Developers
- Modular Architecture - Easy to add new data sources or agents
- Scalable Design - Handles growing data and user base
- Observable System - Full tracing and monitoring
- Cost Effective - Optimized for cloud economics
For Contributors
- Clear Separation - Ingestion and multi-agent systems are independent
- Well Documented - Comprehensive architecture docs
- Open Source - Community-driven development
- Modern Stack - Latest AI/ML technologies
Getting Started
For Data Contributors
- Review Data Ingestion Architecture
- Set up scrapers following the scraper guide
- Deploy infrastructure using Terraform
- Monitor data quality and processing
For Application Developers
- Review Multi-Agent Architecture
- Understand agent communication protocols
- Use the API to query property data
- Build UI components for your use case
For System Administrators
- Review both architecture documents
- Set up monitoring and alerting
- Configure cost controls
- Implement backup strategies
Support & Community
Documentation
Get Help
- GitHub Issues: open-estate-ai/real-estate-docs
- Slack Community: Join our Slack
- Documentation: docs.open-estate-ai.org
Next Steps
- Understand the Architecture: Read the detailed docs for Ingestion and Multi-Agent systems
- Set Up Your Environment: Follow the setup guides in each repository
- Explore the Code: Check out the GitHub repositories
- Join the Community: Connect with other contributors on Slack
- Start Contributing: Pick an issue and submit your first PR!
License
See repository root for license information.