I'm a Software Engineer on the Infrastructure team at Chime. My work primarily involves building and maintaining highly available and scalable systems using AWS, Kubernetes, ArgoCD, Datadog, and Terraform.
I mainly code in Python and Bash, with some experience in Ruby and Go.
I write about infrastructure and engineering on my blog.
Most recent blog series:
- Streaming LLM inference on EKS: the build: VPC, EKS, vLLM Production Stack, and the streaming gateway.
- How much can two L4s serve? It depends on the prompt: capacity, prefix caching, and the methodology trap.
- Per-tenant concurrency caps: protecting well-behaved tenants from a bursty neighbor.
- Adaptive concurrency on a multi-tenant vLLM gateway: WFQ + AIMD against a TTFT SLO: the self-tuning gateway.
- Autoscaling a GPU Fleet on Inference-Aware Signals
I've also shared insights on Chime's engineering blog:
- How We Preview Kubernetes Changes at Chime: [2023] https://medium.com/life-at-chime/how-we-preview-kubernetes-changes-at-chime-5b4871847c5e | mirror
- How We Upgraded Our Core Database with Just 5 Minutes of Downtime: [2025] https://careers.chime.com/en/life-at-chime/engineering-at-chime/how-we-upgraded-our-core-database-with-just-5-minutes-of-downtime/ | mirror