
ML Infrastructure

We place the engineers who make frontier AI run at scale.

Training a foundation model is one problem. Shipping it with 2–10× better throughput, sub-second latency, and unit economics that actually work is another. We’ve spent the last seven years recruiting only the engineers who can do both.

What we mean by ML Infrastructure

ML Infrastructure is a narrow, high-leverage discipline. The people who do it well are rare; they don’t apply to jobs, and they’re consistently the hardest hire on any frontier AI org chart. We source across the full stack:

Training infrastructure — distributed training (FSDP, ZeRO, tensor + pipeline parallelism), NCCL tuning, multi-node GPU orchestration, high-throughput dataloaders and operator fusion, activation rematerialization, checkpoint hygiene, determinism under failure.

Kernels and compilers — Triton, CUDA, CUTLASS, FlashAttention-family speedups, sequence packing, KV-cache optimization, MLIR, LLVM IR, JIT codegen, torch.compile / Inductor, XLA, JAX, TensorRT, IREE.

Inference and serving — vLLM, TensorRT-LLM, SGLang, Triton Inference Server, KServe. Continuous batching, speculative decoding, prefix caching, prefill/decode disaggregation, quantization (GPTQ, AWQ, FP8), distillation.

Platform and orchestration — Kubernetes-based GPU orchestration, Ray, Dagster, SLURM, Terraform, cloud-scale observability (Prometheus, Grafana, OpenTelemetry), autoscaling on custom metrics, zero-downtime model deploys.

Real-time media inference — WebRTC at scale, long-lived connections, latency-critical video and audio pipelines, token-level metering and SLA enforcement.

Who we place, and where

Our ML infrastructure candidates come from a tight network we’ve built over seven years: frontier labs, inference-platform startups, hyperscaler GPU-infra teams, and ex-FAANG engineers who’ve recently moved to scale-up AI companies.

Typical candidate backgrounds:

  • Anthropic, OpenAI, Google DeepMind, xAI, Meta GenAI / FAIR, Mistral
  • NVIDIA (CUDA, TensorRT, Triton, cuDNN, CUTLASS teams)
  • PyTorch core, HuggingFace infrastructure, JAX / XLA contributors
  • Modal, Baseten, Together AI, Anyscale, Fireworks, Replicate, CoreWeave, Crusoe
  • Wayve, Waymo, Tesla Dojo, NVIDIA GEAR
  • Top-tier PhDs from Stanford, CMU, MIT, Berkeley, ETH Zurich, Cambridge

Typical client profile:

  • Seed through Series C frontier AI labs
  • Inference-platform companies scaling beyond $50M ARR
  • Physical AI and robotics foundation model companies
  • Real-time multimodal consumer AI (avatars, video generation, voice)
  • Quant / ML-native hedge funds with bespoke inference needs

Why we win ML infrastructure searches

Got a role to fill?