MLaaS Platform
Enterprise Machine Learning as a Service platform on AWS (Lambda, API Gateway, SNS, SQS, S3, Athena, RDS, and Apigee) delivering scalable ML model deployment and serving for multiple engineering teams.
- Scalable ML serving
- Event-driven architecture
- Apigee + Logz integration
- Multi-team adoption
Overview
Designed and built a Machine Learning as a Service (MLaaS) platform on AWS that standardizes how ML models are deployed, served, and monitored across Dish Network. The platform provides a unified interface for ML teams to ship models to production without managing infrastructure.
Architecture
The platform is built on an event-driven, serverless architecture:
Sync:   Client → Apigee API Gateway → AWS API Gateway → Lambda (sync inference) → response
Async:  Client → Apigee API Gateway → AWS API Gateway → SQS → Lambda workers
        Lambda workers → S3 (results) → Athena (analytics)
        Lambda workers → SNS (completion notifications)
Synchronous inference: Low-latency requests go through API Gateway → Lambda, with results returned in real time.
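A minimal sketch of what a synchronous inference handler behind API Gateway (Lambda proxy integration) can look like. The `predict` stub and the field names (`features`, `score`) are illustrative, not the platform's actual model interface:

```python
import json

def predict(features):
    # Stand-in for a real model; assume it returns a scored result.
    return {"score": round(sum(features) / (len(features) or 1), 4)}

def handler(event, context=None):
    """Synchronous inference entry point.

    With Lambda proxy integration, API Gateway delivers the request
    body as a JSON string under event["body"] and expects a dict with
    "statusCode" and "body" back.
    """
    try:
        payload = json.loads(event.get("body") or "{}")
        features = payload["features"]
    except (json.JSONDecodeError, KeyError):
        return {
            "statusCode": 400,
            "body": json.dumps({"error": "expected JSON body with 'features'"}),
        }
    return {"statusCode": 200, "body": json.dumps(predict(features))}
```

Keeping the handler a pure function of the event makes it unit-testable without any AWS infrastructure, which supports the test-coverage point under Results.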
Asynchronous inference: High-volume or long-running jobs are queued via SQS, processed by worker Lambdas, and results stored in S3 with SNS notifications on completion.
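The async path can be sketched as a worker that drains an SQS batch, writes each result to S3, and publishes an SNS notification. Here the S3 put and SNS publish are injected as callables (`put_object`, `publish`) so the logic stays testable; in production these would be boto3 client calls. Job fields and the key layout are assumptions:

```python
import json

def process_records(event, put_object, publish):
    """Worker-Lambda sketch for the async inference path.

    `event` is the SQS batch Lambda receives ({"Records": [...]}); each
    record body is a JSON job. `put_object(key, body)` stands in for an
    S3 put, `publish(message)` for an SNS publish.
    """
    completed = []
    for record in event.get("Records", []):
        job = json.loads(record["body"])
        result = {"job_id": job["job_id"], "score": sum(job["features"])}
        key = f"results/{job['job_id']}.json"
        put_object(key, json.dumps(result))  # persist result to S3
        publish(json.dumps({"job_id": job["job_id"], "key": key}))  # notify via SNS
        completed.append(job["job_id"])
    return completed
```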
Data layer: RDS (PostgreSQL) stores model metadata, deployment configs, and job history. Athena provides SQL analytics over inference logs in S3.
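Analytics over the S3 inference logs can be driven through the boto3 Athena client. The table name, partition column, and metric below are illustrative, not the platform's actual schema:

```python
def latency_query(table, day):
    """Build an example Athena query over day-partitioned inference logs.

    Column and table names here are hypothetical.
    """
    return (
        f"SELECT model_name, approx_percentile(latency_ms, 0.95) AS p95_ms "
        f"FROM {table} WHERE dt = '{day}' GROUP BY model_name"
    )

def run_query(athena, database, output_s3, sql):
    """Submit a query via a boto3 Athena client; returns the execution id.

    Athena writes result files to `output_s3`; completion is polled
    separately with get_query_execution (omitted here).
    """
    resp = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )
    return resp["QueryExecutionId"]
```

Passing the client in as a parameter keeps the submission logic testable with a stub in place of a live Athena connection.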
API Management: Integrated Apigee for external API product management, rate limiting, and developer portal. Added Logz for centralized log monitoring.
Results
- Standardized ML deployment across multiple engineering teams
- Reduced model deployment time from days to hours
- Apigee + Logz integration improved API observability and reliability
- Platform handles variable inference workloads with auto-scaling Lambda
- Comprehensive unit test coverage for all ETL and serving components
Tech Stack
Compute: AWS Lambda, EC2
API: AWS API Gateway, Apigee
Messaging: AWS SNS, SQS
Storage: AWS S3, RDS (PostgreSQL)
Analytics: AWS Athena
Monitoring: AWS CloudWatch, Logz
Language: Python, SQL