Kranthi Chaithanya Thota
Data Scientist & Engineer building intelligent, scalable solutions with Python, ML, and cloud infrastructure (GCP/AWS). Master's candidate (4.0 GPA) with 3+ years of experience, certified in GCP and Terraform.
// 01. About Me
I'm a passionate problem-solver operating at the intersection of Data Science, Machine Learning, and Cloud Engineering. My focus is on translating complex data into actionable insights and robust, production-ready systems.
With hands-on experience spanning the full data lifecycle – from designing ETL/ELT pipelines (Airflow, Spark) and deploying cloud infrastructure (Terraform, GCP, AWS) to developing and operationalizing ML/DL models (TensorFlow, PyTorch, MLOps) – I thrive on technical challenges.
I prioritize clean code, automation, and delivering quantifiable results, whether it's improving model accuracy (>90% F1), boosting pipeline efficiency (30%+), or reducing manual effort (40%+).
Key Metrics & Achievements:
- 3+ Years Experience
- 90%+ F1 Scores (NLP/NER)
- 95% Accuracy (CV Models)
- 30% Proc Efficiency Boost
- 70% Cut in Manual Setup Time (IaC)
- 22% Congestion Reduction
- 50+ Students Mentored
- Cloud/IaC Certifications (GCP, Terraform)
// 02. Technical Skills
Languages & Core
- Python
- SQL
- GoLang
- Bash
- R
- JavaScript
ML / DL / AI
- TensorFlow
- PyTorch
- Scikit-learn
- Hugging Face
- spaCy
- LLMs/RAG
- BERT
- YOLO
- OpenCV
- RL
- Stats
Cloud Platforms
- GCP (Expert)
- AWS
- Azure (Basic)
- Vertex AI
- BigQuery
- Dataflow
- Cloud Run
- GKE
- S3/GCS
- Redshift/SQL
DevOps / MLOps / IaC
- Terraform
- Docker
- Kubernetes
- CI/CD (Git, Cloud Build)
- Airflow
- MLflow
- Monitoring
- Logging
Data Eng & Databases
- ETL/ELT Design
- Spark
- Hadoop
- Data Modeling
- PostgreSQL
- MySQL
- SQL Server
- MongoDB
- Data Quality
Viz & Other Tools
- Tableau
- PowerBI
- Looker
- Grafana
- D3.js
- Document AI
- Web Scraping
- ArcGIS Pro
// 03. Experience
- Analyzed institutional data (PowerBI/Python), reducing faculty data retrieval time 30%.
- Mentored 50+ students (Python/SQL/NLP); built NER pipeline (>90% F1).
- Engineered traffic pipelines (drone/MetroCount), cutting congestion 22%; built Tableau dashboards.
- Deployed YOLOv8 model (95% parking detection accuracy).
- Engineered ETL workflows (Airflow/GCP), boosting efficiency 30%.
- Automated CI/CD pipelines (GCP), reducing manual errors 40%.
- Automated infra (Terraform IaC, 50+ resources), cutting setup time 70%.
- Led patent analysis (Google Doc AI, 2k+ docs); built Looker dashboards, reducing reporting by 40%.
- Developed automated ETL (Apps Script, Cloud Functions, BigQuery) for dynamic data integration and dashboard development.
- Developed event- and trigger-based ETL scripts that automate replication from the operational database (PostgreSQL) to the data warehouse (BigQuery) for analysis.
// 04. Projects
AI Therapeutic Chatbot
Full-stack chatbot (Ollama/Gemma, Flask, Voice I/O) enhancing senior mental well-being.
- Python
- LLM
- Flask
- API
Healthcare RAG QA System
RAG pipeline (Transformers, Vector DBs) for accurate, context-aware healthcare QA.
- Hugging Face
- RAG
- Vector DB
- NLP
NER Model Fine-tuning (>92% F1)
Fine-tuned BERT for domain-specific NER, achieving high performance.
- BERT
- Hugging Face
- TF
- NLP
DeepFake Detection (85%+ Acc)
EfficientNet model; accuracy improved 15% via advanced preprocessing (Adaptive ELA).
- EfficientNet
- TF
- PyTorch
- CV
Automated Instagram Bot (45% Eng Incr)
GPT-3.5 pipeline automating posts via Instagram API & NLP trend analysis.
- Python
- GPT-3.5
- API
- NLP
- Automation
RL for Autonomous Control
Applied Reinforcement Learning in simulation to train an agent for quadcopter stabilization.
- RL
- Python
- Simulation
YOLOv8 Parking Detection (95% Acc)
Deployed CV model for accurate parking space detection from drone imagery.
- YOLOv8
- CV
- Python
- Deployment
BRFSS Health Risk Analysis (80% Acc)
Analyzed 400k+ health records (ANOVA, Chi-Sq) to identify key risk factors using TF.
- Python
- TensorFlow
- Statistics
- Healthcare
Social Distancing Alert Device
Wearable prototype using RSSI signals to alert users in crowded spaces.
- Hardware
- Bluetooth
- RSSI
- IoT
Wine Quality Prediction
EDA and predictive modeling to determine factors influencing wine quality using ML algorithms.
- Python
- Scikit-learn
- Pandas
- ML
IPEDS ETL Pipeline
Pipeline loading IPEDS data from Access DBs into SQL Server, with transformations designed for long-term maintainability.
- Python
- SQL Server
- ETL
- Automation
Real-Time Streaming Pipeline (1k+ evt/s)
Pipeline (Pub/Sub, Dataflow/Spark) processing events to BigQuery with low latency.
- GCP
- Pub/Sub
- Dataflow
- Streaming
- BigQuery
Data Warehouse Schema Design
Designed & implemented Star Schema model (SQL, BigQuery/Redshift) for efficient querying.
- Data Modeling
- SQL
- BigQuery
- Redshift
- DW
Patent Data Pipeline (DocAI)
Automated pipeline using Google Document AI for patent data extraction (2k+ PDFs).
- GCP
- Document AI
- BigQuery
- ETL
- NLP
Automated Reporting Pipeline
E2E pipeline (Python, SQL, Airflow) extracting, loading, & visualizing in Tableau/PowerBI.
- Python
- SQL
- Airflow
- Tableau
- PowerBI
NYSERDA Grant Pipeline & NER
Automated scraping (Beautiful Soup) & NER (Hugging Face, 90% F1) for 10k+ grants.
- Web Scraping
- Beautiful Soup
- PostgreSQL
- NER
ML Deployment Pipeline (70% Time Red)
Automated CI/CD (Docker, K8s, TF Serving/Cloud Run) reducing ML deployment time.
- MLOps
- CI/CD
- Docker
- Kubernetes
- GCP
Multi-Cloud Infra (IaC, >95% Auto)
Provisioned prod infrastructure (VPC, compute, DBs) across GCP/AWS via Terraform.
- Terraform
- IaC
- GCP
- AWS
- Multi-Cloud
Serverless Event Pipeline (60% Cost Red)
Pipeline (Cloud Functions/Lambda, Pub/Sub) processing 1k+ events/sec.
- Serverless
- GCP
- AWS
- Pub/Sub
Automated Container App CI/CD (50% Freq Incr)
CI/CD (Cloud Build) to containerize (Docker) & deploy to K8s (GKE/EKS).
- CI/CD
- Docker
- Kubernetes
- GCP
- AWS
Automated Cloud DB Mgmt (60% Task Red)
Terraform modules automating Cloud SQL/RDS deployment, config, backup.
- Terraform
- GCP
- AWS
- Database
- Automation
Automated Cloud Secrets Mgmt
Implemented secrets management (GCP Secret Manager, Terraform), eliminating hardcoded credentials.
- Terraform
- Secrets
- GCP
- Security
E-commerce Customer Segmentation
Performed K-Means clustering (Python, Scikit-learn) for targeted marketing insights.
- Python
- Scikit-learn
- Clustering
- EDA
Clarkson Traffic/Parking Dashboard
Created interactive Tableau dashboards visualizing campus traffic and parking data.
- Tableau
- GIS
- Data Analysis
- Dashboard
A2A Collaborative Analysis
Statistical analysis on roadkill data using surveys/drone imagery for hotspot determination.
- R
- ArcGIS Pro
- Statistics
- GIS
Town of Colton Complete Streets
Developed comprehensive plan via survey analysis and GIS visualization.
- ArcGIS Pro
- QGIS
- Survey Analysis
- GIS
Misc Data Visualization Projects
Collection including NYC Bike Lanes (GIS), Election Mapping (R), Spotify Trends (Python).
- Tableau
- R
- Python
- GIS
- D3.js
Twitter Sentiment Dashboard (40% Report Red)
Automated sentiment pipeline (Python, NLP) visualized in Looker/Tableau.
- Python
- NLP
- Looker
- Tableau
- Dashboard
OTT Web Platform (Netflix Clone)
Responsive streaming web application using React, Firebase, Node.js.
- React
- Node.js
- Firebase
- Web Dev
Voice-Based Email Client
Python application enabling email management through voice commands using gTTS.
- Python
- SpeechRecognition
- gTTS
// 05. Contact
Let's build something great together.
I'm actively seeking full-time roles where I can contribute my expertise in Data Science, ML, and Cloud Engineering. Whether you have a specific project in mind or just want to connect, feel free to reach out.
// Say Hello