About me

I'm Catalin, an ML and GPU kernel engineer. I started at 19 and I'm 21 now. Right now most of my time goes into improving inference on NVIDIA and AMD accelerators — writing and tuning kernels down at the ISA level so models run faster on the hardware they actually ship on.

Before that, I built production ML systems: computer vision that runs in the real world, the data pipelines feeding it, and the MLOps around it. I like moving between those two worlds: the very low-level, where you're reading PTX and counting cycles, and the very high-level, where you're keeping a whole system alive in production.

I tend to go deep and iterate fast. Most of what I learn ends up in a real project or a blog post rather than staying theoretical. If you want the current state of things, my now page is the most honest snapshot.

Currently: deep learning compilers + mega kernels + MoE kernels

Technologies I work with

Tools, frameworks and systems I've shipped or built with.

GPU / Kernels

CUDA, HIP, Triton, Gluon dialect, CUTLASS, CuTe (C++), PTX / ISA analysis

MLOps / Data

Kafka, MinIO / S3, RabbitMQ, TimescaleDB (vectors), Redis, MLflow, Grafana, Prometheus

Infra

Kubernetes, Helm, Rancher Fleet (GitOps), PostgreSQL

Backend

Python (FastAPI), Java (Maven), C# (.NET)

Frontend

React, Svelte, TypeScript, Flutter

AI domains I've worked in

Computer Vision, GPU Kernels, Generative AI, Large Language Models, Natural Language Processing, Recurrent Networks, Autoencoders, Machine Learning Algorithms, Energy-Based Models, Reinforcement Learning, Graph Neural Networks.

Skills

Technical

Low-level GPU optimization, ISA-level kernel analysis, Attention mechanisms, MLOps pipelines, Production ML systems, Mathematics (analysis, DSA), Deep learning research → practice, Data pipelines at scale

How I work

Iterate fast, Go deep, Open to feedback, Strong communication, Problem-solving, Self-directed