// Hello, world! My name is

Kranthi Chaithanya Thota

Data Scientist & Engineer building intelligent, scalable solutions with Python, ML, and cloud infrastructure (GCP/AWS). Master's candidate (4.0 GPA) with 3+ years of experience, certified in GCP and Terraform.

// 01. About Me

I'm a passionate problem-solver operating at the intersection of Data Science, Machine Learning, and Cloud Engineering. My focus is on translating complex data into actionable insights and robust, production-ready systems.

With hands-on experience spanning the full data lifecycle – from designing ETL/ELT pipelines (Airflow, Spark) and deploying cloud infrastructure (Terraform, GCP, AWS) to developing and operationalizing ML/DL models (TensorFlow, PyTorch, MLOps) – I thrive on technical challenges.

I prioritize clean code, automation, and delivering quantifiable results, whether it's improving model accuracy (>90% F1), boosting pipeline efficiency (30%+), or reducing manual effort (40%+).

Key Metrics & Achievements:

  • 3+ Years Experience
  • 90%+ F1 Scores (NLP/NER)
  • Up to 95% Accuracy (CV Models)
  • 30% Processing Efficiency Boost
  • 70% Reduction in Manual Setup (IaC)
  • 22% Congestion Reduction
  • 50+ Students Mentored
  • Cloud/IaC Certifications (GCP, Terraform)

// 02. Technical Skills

Languages & Core

  • Python
  • SQL
  • GoLang
  • Bash
  • R
  • JavaScript

ML / DL / AI

  • TensorFlow
  • PyTorch
  • Scikit-learn
  • Hugging Face
  • spaCy
  • LLMs/RAG
  • BERT
  • YOLO
  • OpenCV
  • RL
  • Stats

Cloud Platforms

  • GCP (Expert)
  • AWS
  • Azure (Basic)
  • Vertex AI
  • BigQuery
  • Dataflow
  • Cloud Run
  • GKE
  • S3/GCS
  • Redshift/SQL

DevOps / MLOps / IaC

  • Terraform
  • Docker
  • Kubernetes
  • CI/CD (Git, Cloud Build)
  • Airflow
  • MLflow
  • Monitoring
  • Logging

Data Eng & Databases

  • ETL/ELT Design
  • Spark
  • Hadoop
  • Data Modeling
  • PostgreSQL
  • MySQL
  • SQL Server
  • MongoDB
  • Data Quality

Viz & Other Tools

  • Tableau
  • Power BI
  • Looker
  • Grafana
  • D3.js
  • Document AI
  • Web Scraping
  • ArcGIS Pro

// 03. Experience

Student Data Analyst / GTA / C3G Intern @ Clarkson University
May 2024 – Present
  • Analyzed institutional data (Power BI/Python), reducing faculty data retrieval time by 30%.
  • Mentored 50+ students (Python/SQL/NLP); built an NER pipeline (>90% F1).
  • Engineered traffic pipelines (drone/MetroCount data), cutting congestion by 22%; built Tableau dashboards.
  • Deployed a YOLOv8 model with 95% parking detection accuracy.
Associate Data Engineer @ Egen
Jul 2022 – Dec 2023
  • Engineered ETL workflows (Airflow/GCP), boosting efficiency by 30%.
  • Automated CI/CD pipelines (GCP), reducing manual errors by 40%.
  • Automated infrastructure (Terraform IaC, 50+ resources), cutting setup time by 70%.
  • Led patent analysis (Google Document AI, 2k+ docs); built Looker dashboards, reducing reporting by 40%.
Consultant Data Engineer @ Egen
Jan 2022 – Jun 2022
  • Developed automated ETL (Apps Script, Cloud Functions, BigQuery) for dynamic data integration and dashboard development.
  • Developed event- and trigger-based ETL scripts that automate replication from the operational database (PostgreSQL) to the data warehouse (BigQuery) for analysis.

// 04. Projects

AI Therapeutic Chatbot

Full-stack chatbot (Ollama/Gemma, Flask, Voice I/O) enhancing senior mental well-being.

  • Python
  • LLM
  • Flask
  • API

Healthcare RAG QA System

RAG pipeline (Transformers, Vector DBs) for accurate, context-aware healthcare QA.

  • Hugging Face
  • RAG
  • Vector DB
  • NLP

NER Model Fine-tuning (>92% F1)

Fine-tuned BERT for domain-specific NER, achieving high performance.

  • BERT
  • Hugging Face
  • TF
  • NLP

DeepFake Detection (85%+ Acc)

EfficientNet model; accuracy improved 15% via advanced preprocessing (Adaptive ELA).

  • EfficientNet
  • TF
  • PyTorch
  • CV

Automated Instagram Bot (45% Eng Incr)

GPT-3.5 pipeline automating posts via Instagram API & NLP trend analysis.

  • Python
  • GPT-3.5
  • API
  • NLP
  • Automation

RL for Autonomous Control

Applied reinforcement learning in simulation to train an agent for quadcopter stabilization.

  • RL
  • Python
  • Simulation

YOLOv8 Parking Detection (95% Acc)

Deployed CV model for accurate parking space detection from drone imagery.

  • YOLOv8
  • CV
  • Python
  • Deployment

BRFSS Health Risk Analysis (80% Acc)

Analyzed 400k+ health records (ANOVA, chi-square tests) to identify key risk factors using TensorFlow.

  • Python
  • TensorFlow
  • Statistics
  • Healthcare

Social Distancing Alert Device

Wearable prototype using RSSI signals to alert users in crowded spaces.

  • Hardware
  • Bluetooth
  • RSSI
  • IoT

Wine Quality Prediction

EDA and predictive modeling to determine factors influencing wine quality using ML algorithms.

  • Python
  • Scikit-learn
  • Pandas
  • ML

IPEDS ETL Pipeline

Pipeline loading IPEDS data from Access databases into SQL Server, with transformations designed for long-term maintainability.

  • Python
  • SQL Server
  • ETL
  • Automation

Real-Time Streaming Pipeline (1k+ evt/s)

Pipeline (Pub/Sub, Dataflow/Spark) processing events to BigQuery with low latency.

  • GCP
  • Pub/Sub
  • Dataflow
  • Streaming
  • BigQuery

Data Warehouse Schema Design

Designed & implemented Star Schema model (SQL, BigQuery/Redshift) for efficient querying.

  • Data Modeling
  • SQL
  • BigQuery
  • Redshift
  • DW

Patent Data Pipeline (DocAI)

Automated pipeline using Google Document AI for patent data extraction (2k+ PDFs).

  • GCP
  • Document AI
  • BigQuery
  • ETL
  • NLP

Automated Reporting Pipeline

End-to-end pipeline (Python, SQL, Airflow) extracting, loading, and visualizing data in Tableau/Power BI.

  • Python
  • SQL
  • Airflow
  • Tableau
  • Power BI

NYSERDA Grant Pipeline & NER

Automated scraping (Beautiful Soup) & NER (Hugging Face, 90% F1) for 10k+ grants.

  • Web Scraping
  • Beautiful Soup
  • PostgreSQL
  • NER

ML Deployment Pipeline (70% Time Red)

Automated CI/CD (Docker, K8s, TF Serving/Cloud Run) reducing ML deployment time.

  • MLOps
  • CI/CD
  • Docker
  • Kubernetes
  • GCP

Multi-Cloud Infra (IaC, >95% Auto)

Provisioned prod infrastructure (VPC, compute, DBs) across GCP/AWS via Terraform.

  • Terraform
  • IaC
  • GCP
  • AWS
  • Multi-Cloud

Serverless Event Pipeline (60% Cost Red)

Pipeline (Cloud Functions/Lambda, Pub/Sub) processing 1k+ events/sec.

  • Serverless
  • GCP
  • AWS
  • Pub/Sub

Automated Container App CI/CD (50% Freq Incr)

CI/CD (Cloud Build) to containerize (Docker) & deploy to K8s (GKE/EKS).

  • CI/CD
  • Docker
  • Kubernetes
  • GCP
  • AWS

Automated Cloud DB Mgmt (60% Task Red)

Terraform modules automating Cloud SQL/RDS deployment, config, backup.

  • Terraform
  • GCP
  • AWS
  • Database
  • Automation

Automated Cloud Secrets Mgmt

Implemented secrets management (GCP Secret Manager, Terraform), eliminating hardcoded credentials.

  • Terraform
  • Secrets
  • GCP
  • Security

E-commerce Customer Segmentation

Performed K-Means clustering (Python, Scikit-learn) for targeted marketing insights.

  • Python
  • Scikit-learn
  • Clustering
  • EDA

Clarkson Traffic/Parking Dashboard

Created interactive Tableau dashboards visualizing campus traffic and parking data.

  • Tableau
  • GIS
  • Data Analysis
  • Dashboard

A2A Collaborative Analysis

Statistical analysis of roadkill data from surveys and drone imagery to determine hotspots.

  • R
  • ArcGIS Pro
  • Statistics
  • GIS

Town of Colton Complete Streets

Developed comprehensive plan via survey analysis and GIS visualization.

  • ArcGIS Pro
  • QGIS
  • Survey Analysis
  • GIS

Misc Data Visualization Projects

Collection including NYC Bike Lanes (GIS), Election Mapping (R), Spotify Trends (Python).

  • Tableau
  • R
  • Python
  • GIS
  • D3.js

Twitter Sentiment Dashboard (40% Report Red)

Automated sentiment pipeline (Python, NLP) visualized in Looker/Tableau.

  • Python
  • NLP
  • Looker
  • Tableau
  • Dashboard

OTT Web Platform (Netflix Clone)

Responsive streaming web application using React, Firebase, Node.js.

  • React
  • Node.js
  • Firebase
  • Web Dev

Voice-Based Email Client

Python application enabling email management through voice commands using gTTS.

  • Python
  • SpeechRecognition
  • gTTS

Rowing School App

Full-stack application to manage memberships, records, and revenue.

  • Full Stack
  • Database
  • Web App

// 05. Contact

Let's build something great together.

I'm actively seeking full-time roles where I can contribute my expertise in Data Science, ML, and Cloud Engineering. Whether you have a specific project in mind or just want to connect, feel free to reach out.

// Say Hello