Training a foundation model is one problem. Shipping it with 2–10× better throughput, sub-second latency, and unit economics that actually work is another. We've spent the last seven years exclusively recruiting the engineers who can do both.
ML infrastructure is a narrow, high-leverage discipline. The people who do it well are rare: they don't apply to jobs, and they're consistently the hardest hire on any frontier AI org chart. We source across the full stack:
Training infrastructure — distributed training (FSDP, ZeRO, tensor and pipeline parallelism), NCCL tuning, multi-node GPU orchestration, high-throughput dataloaders, operator fusion, activation rematerialization, checkpoint hygiene, determinism under failure.
Kernels and compilers — Triton, CUDA, CUTLASS, FlashAttention-family speedups, sequence packing, KV-cache optimization, MLIR, LLVM IR, JIT codegen, torch.compile / Inductor, XLA, JAX, TensorRT, IREE.
Inference and serving — vLLM, TensorRT-LLM, SGLang, Triton Inference Server, KServe. Continuous batching, speculative decoding, prefix caching, prefill/decode disaggregation, quantization (GPTQ, AWQ, FP8), distillation.
Platform and orchestration — Kubernetes-based GPU orchestration, Ray, Dagster, SLURM, Terraform, cloud-scale observability (Prometheus, Grafana, OpenTelemetry), autoscaling on custom metrics, zero-downtime model deploys.
Real-time media inference — WebRTC at scale, long-lived connections, latency-critical video and audio pipelines, token-level metering and SLA enforcement.
Our ML infrastructure candidates come from a tight network we've built over seven years: engineers at frontier labs, inference-platform startups, and hyperscaler GPU-infra teams, along with ex-FAANG engineers who've recently moved to scale-up AI companies.
