// available for work

Abhishek Batra
Data & AI/ML Dev

pipelines · models · design · plan

Apache Spark dbt Airflow Google Cloud Data Warehousing Transformers Big Data

Tech Stack

Python SQL Spark Kafka dbt Airflow Google Cloud Agentic AI LLM Docker AWS

Latest Blogs

all posts →
mlops

Orchestrating ML Workflows with Airflow + MLflow

End-to-end ML pipeline orchestration combining Airflow DAGs with MLflow tracking.

May 2026 · 6 min
airflow mlflow
data engineering

Orchestrating Complex Microservices Environments

Connect many services and agents so they can communicate asynchronously and reliably, even when their event formats and schemas differ.

May 2026 · 11 min
llm fine-tuning
data engineering

Building Real-Time Pipelines with Managed Kafka & Spark

How I built a 1M events/day pipeline using Kafka and Spark Streaming.

May 2026 · 8 min
kafka spark

Projects

StreamIQ — Real-Time Analytics

End-to-end pipeline ingesting 1M+ events/day using Kafka, Spark Streaming, and Delta Lake.

Kafka Spark Delta Lake Grafana
🧠

DocMind — RAG Q&A System

LLM-powered document assistant using vector embeddings and retrieval-augmented generation.

LangChain FAISS FastAPI HuggingFace
🔁

DataVault — dbt Warehouse

Modular data warehouse transformation layer with CI/CD, lineage, and automated testing.

dbt Snowflake GitHub Actions
📈

PriceSignal — Forecasting Engine

Time-series ML model for commodity price prediction, deployed on AWS SageMaker.

Prophet XGBoost SageMaker

YouTube Videos

channel →

dbt Tutorial for Data Engineers

31K views · 35 min

Build a Data Pipeline from Scratch

42K views · 28 min

Intro to Vector Databases for ML

18K views · 19 min