Principal Engineer @ VMware/Broadcom

Hi, I'm Chao Li

AI Engineer & Distributed Systems Architect

Building the infrastructure that powers AI. Specializing in LLM training pipelines, distributed systems, and high-performance computing that handles millions of operations per second.

9+ Years Experience
10x Customer Impact
1M+ Events/sec
About Me

From Distributed Systems to AI Infrastructure

I'm a Principal Engineer with 9+ years of experience building high-performance distributed systems. My career has taken me from startups to unicorns to global enterprises, where I've consistently delivered systems that push the boundaries of scale and efficiency.

Currently at VMware/Broadcom, I lead LLM infrastructure projects that have achieved 10x improvements in customer onboarding speed. I specialize in the intersection of distributed systems and machine learning—building the infrastructure that makes AI work at scale.

When I'm not at work, I'm building my own ML infrastructure in my homelab—from C++ backtesting engines with microsecond latency to Ray-based distributed training platforms running on Kubernetes.

My Journey

2008-2012: Bachelor's in CS
2013-2015: Master's (PhD track)
2015-2021: Staff Engineer
2021-2022: Software Engineer
2022-Now: Principal Engineer

Current Focus

LLM Infrastructure

Building training and inference pipelines for large language models with distributed systems on Ray and Kubernetes.

Post-Training & RLHF

Implementing reinforcement learning from human feedback, PPO, DPO, and alignment techniques for production models.

High-Performance Systems

Designing systems handling 1M+ events/sec with microsecond latency using C++, shared memory, and async patterns.

Agentic AI Systems

Creating self-reflective multi-agent workflows with LangGraph for complex automation and code generation tasks.

Experience

Building Systems at Scale

From startups to global enterprises, I've built distributed systems and ML infrastructure that power millions of operations per second.

VMware (Broadcom)

Current

Principal Engineer

Palo Alto, California

December 2022 - Present
AI/ML Integration, LLM Agents (LangGraph, CrewAI), Distributed Systems, Configuration Automation, Web Security (CSRF/CORS), C/C++/Golang, Load Balancer Infrastructure, Technical Leadership

LLM Migration Automation

Self-reflective agentic AI system for automating F5 to Avi load balancer configuration conversion.

Multi-Agent F5 iRule Conversion System: Built self-reflective multi-agent workflow using LangGraph to automatically convert F5 iRules to Avi DataScript, achieving 10x migration speed improvement
LoRA Fine-tuned Domain-Specific LLM: Fine-tuned 7B parameter model (Qwen-Coder) with LoRA for load balancer configuration domain, collaborating with L7 engineers to generate labeled training data
AST-Based Rule Parser for Hybrid Conversion: Implemented AST-based rule parser for deterministic syntax transformation, combined with LLM for semantic understanding, iteratively improving conversion quality
Cross-Functional Technical Leadership: Led project delivery with VP-level reporting, coordinated with Support Engineers for customer requirements and L7 Engineers for domain expertise
LangGraph, Qwen-Coder, Gemini Pro, LoRA, Python
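The self-reflective workflow described above can be pictured as a generate, critique, revise loop. The following is a minimal sketch in plain Python, not the production LangGraph graph. `ConversionState`, `reflect_loop`, the two toy agents, and the `F5_CMD_*` / `AVI_CALL_*` tokens are all hypothetical placeholders standing in for LLM-backed agents and real rule syntax.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ConversionState:
    source: str                                         # original rule text
    draft: str = ""                                     # current converted output
    feedback: List[str] = field(default_factory=list)   # critic notes so far
    rounds: int = 0


def reflect_loop(
    source: str,
    generate: Callable[[ConversionState], str],
    critique: Callable[[ConversionState], Optional[str]],
    max_rounds: int = 3,
) -> ConversionState:
    """Alternate generation and critique until the critic accepts or the budget runs out."""
    state = ConversionState(source=source)
    while state.rounds < max_rounds:
        state.draft = generate(state)
        state.rounds += 1
        problem = critique(state)
        if problem is None:            # critic found nothing to fix
            break
        state.feedback.append(problem)
    return state


# Hypothetical stand-ins for the LLM-backed agents (placeholder tokens, not real syntax).
def toy_generate(state: ConversionState) -> str:
    draft = state.source.replace("F5_CMD_A", "AVI_CALL_A")
    if state.feedback:                 # later passes act on the critic's notes
        draft = draft.replace("F5_CMD_B", "AVI_CALL_B")
    return draft


def toy_critique(state: ConversionState) -> Optional[str]:
    if "F5_" in state.draft:
        return "source-dialect token left untranslated"
    return None


result = reflect_loop("F5_CMD_A F5_CMD_B", toy_generate, toy_critique)
```

The `max_rounds` budget is the important design choice: it bounds cost when the critic never accepts, at the price of sometimes returning a draft that still carries open feedback.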

Distributed File Object System

Unified file management platform migrated from Python to Golang, enabling centralized file lifecycle control across multi-cluster control plane

Python to Golang Migration: Migrated legacy Python-based file object system to Golang for improved performance and type safety
Unified File Management Platform: Consolidated file handling across different apps and functionalities into centralized file object system with standardized lifecycle management
Full-Stack Pipeline Implementation: Implemented complete pipeline from control plane through to application servicing layer
Cross-Team Onboarding & Golden Path: Collaborated with multiple functionality teams for onboarding and created golden path documentation for seamless new file object integration
Golang, Python, Control Plane, Distributed Systems

Web Security CSRF Module

CSRF protection module integrated into nginx module chain with full-stack implementation from control plane to runtime

CSRF Module Integration: Introduced CSRF protection module into existing nginx module chain, ensuring compatibility with existing infrastructure and security requirements
Full-Stack Security Implementation: Implemented complete stack from control plane configuration through to nginx module runtime execution
C, nginx, Control Plane, Web Security

Argo AI

Software Engineer

Palo Alto, California

June 2021 - Dec 2022
High-Performance Computing, C++ Systems Programming, Zero-Copy Logging, Event-Based Tiered Logging, Onboard Vehicle Systems, Autonomous Vehicles, Real-time Systems

Onboard Logging Infrastructure

High-performance logging system for autonomous vehicles with zero-copy data paths and tiered event handling

Zero-Copy High-Performance Logger: Implemented zero-copy logging infrastructure minimizing memory overhead and maximizing throughput for real-time vehicle sensor data
Slab-Based Journal Logger: Designed slab-based memory allocation for journal logging, reducing fragmentation and allocation overhead in high-frequency logging scenarios
Event-Based Tiered Logging System: Built tiered logging architecture with event-based routing to different channel sets, enabling flexible log level management and channel binding
C++, Zero-Copy, Slab Allocator, Ring Buffer
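To illustrate why slab allocation helps in high-frequency logging, here is a conceptual sketch in Python rather than the original C++. It assumes fixed-size slots carved from one preallocated buffer and recycled through a free list. `SlabPool` and its method names are hypothetical, and the slot size is arbitrary.

```python
class SlabPool:
    """Fixed-size slots carved from one preallocated buffer, reused via a free list."""

    def __init__(self, slot_size: int, slot_count: int) -> None:
        self.slot_size = slot_size
        self._buf = bytearray(slot_size * slot_count)      # single up-front allocation
        self._view = memoryview(self._buf)
        self._free = list(range(slot_count - 1, -1, -1))   # stack of free slot indexes

    def alloc(self) -> int:
        """Hand out a free slot index in O(1); no new memory is requested."""
        if not self._free:
            raise MemoryError("slab exhausted")
        return self._free.pop()

    def free(self, slot: int) -> None:
        """Return a slot to the pool (this sketch does not detect double frees)."""
        self._free.append(slot)

    def slot(self, index: int) -> memoryview:
        """Expose the slot as a writable view into the buffer, not a copy."""
        start = index * self.slot_size
        return self._view[start:start + self.slot_size]


pool = SlabPool(slot_size=64, slot_count=4)
a = pool.alloc()
record = b"frame-0001"
pool.slot(a)[:len(record)] = record    # write in place through the view
pool.free(a)
b = pool.alloc()                       # the most recently freed slot is reused
```

Because every slot has the same size, freeing and reallocating never splits or merges blocks, which is where the reduction in fragmentation comes from.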

Avi Networks → VMware

Staff Engineer

Palo Alto, California

Nov 2015 - May 2021
Boost Coroutine/Fiber, C++ Threading Model, Memory Pool Optimization, Ring Buffer Logging, Elasticsearch, Query DSL Parser (pyparsing), High-Throughput Systems, Distributed Systems

Distributed Analytics Platform

High-throughput log analytics system with async indexing pipeline and custom query DSL, processing 1M+ events/sec

Query DSL to Elasticsearch Mapper: Built URL-based query DSL parser using pyparsing that transforms user queries into optimized Elasticsearch query DSL, enabling intuitive search interface
Async Indexing Pipeline with Twisted: Implemented async request→indexing→response pipeline using Twisted Python with message tracing, auditing, and graceful error handling
High-Performance Core Logging Engine: Introduced Boost coroutine/fiber in core logging to persist logs from DPDK ring buffer, with memory pooling and thread pool separating logic from system resources
End-to-End Logging Pipeline Leadership: Led complete logging pipeline from data plane ingestion through indexing to query serving, achieving 1M+ events/second throughput
C++, Boost Coroutine/Fiber, Python, Twisted, Elasticsearch
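As a rough illustration of mapping a URL-style query string onto Elasticsearch query DSL, the sketch below parses comma-separated clauses into a `bool` query with `term` and `range` filters. The clause grammar (`field:value`, `field>=value`, and so on) is an assumption made for this example, since the real DSL syntax is not described here. It also swaps pyparsing for the standard-library `re` module so that it runs without dependencies.

```python
import re
from typing import Any, Dict, List

# Hypothetical grammar: <field><op><value>, where op is one of : >= <= > <
_CLAUSE = re.compile(r"^(?P<field>[A-Za-z_][\w.]*)(?P<op>>=|<=|>|<|:)(?P<value>.+)$")
_RANGE_OPS = {">=": "gte", "<=": "lte", ">": "gt", "<": "lt"}


def _coerce(raw: str) -> Any:
    """Turn numeric-looking values into numbers; leave everything else as text."""
    try:
        return int(raw)
    except ValueError:
        try:
            return float(raw)
        except ValueError:
            return raw


def to_es_query(expr: str) -> Dict[str, Any]:
    """Translate 'a:1,b>=2' into an Elasticsearch bool/filter query body."""
    filters: List[Dict[str, Any]] = []
    for clause in filter(None, expr.split(",")):
        match = _CLAUSE.match(clause.strip())
        if match is None:
            raise ValueError(f"cannot parse clause: {clause!r}")
        name, op, value = match["field"], match["op"], _coerce(match["value"])
        if op == ":":
            filters.append({"term": {name: value}})                     # exact match
        else:
            filters.append({"range": {name: {_RANGE_OPS[op]: value}}})  # bounded match
    return {"query": {"bool": {"filter": filters}}}


body = to_es_query("status:500,latency>=200")
```

Putting every clause under `filter` rather than `must` is a deliberate choice in this sketch: filter clauses skip relevance scoring and can be cached, which suits log search where results are sorted by time rather than score.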
Projects

What I've Built

From high-frequency trading systems to LLM infrastructure—engineering solutions that push performance boundaries.

LLM Migration Automation

Featured @ VMware (Broadcom)

Self-reflective agentic AI system for automating F5 to Avi load balancer configuration conversion.

Multi-Agent F5 iRule Conversion System: Built self-reflective multi-agent workflow using LangGraph to automatically convert F5 iRules to Avi DataScript, achieving 10x migration speed improvement
LoRA Fine-tuned Domain-Specific LLM: Fine-tuned 7B parameter model (Qwen-Coder) with LoRA for load balancer configuration domain, collaborating with L7 engineers to generate labeled training data
Multi-Agent System: LangGraph orchestration
LoRA Fine-tuning: Domain-specific adapters
Code Generation: iRule to DataScript
Self-Reflection: Iterative improvement
10x Speed Up
7B Parameters
CoT Reasoning
LangGraph, Qwen-Coder, Gemini Pro, LoRA, Python

Stock Strategy Backtesting Platform

Featured

End-to-end ML infrastructure product for quantitative trading strategy development and analysis.

Generic Message Passing Pipeline: Implemented shared-memory ring buffer based message passing architecture connecting data provider, algorithm, broker, risk manager, and logger with unified high-performance data processing pipeline
C++ to Python Algorithm Binding: Created Python bindings for C++ algorithm interface, enabling rapid strategy prototyping while maintaining production performance
C++ Engine: Shared-memory with μs latency
Ray on K8s: Distributed ML training
MCP Server: LLM-driven strategy generation
React Frontend: Real-time visualization
μs Latency
5x ETL Speed
TB Scale
C++17, Python, Ray, Kubernetes, React, FastAPI
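The shared-memory ring buffer idea behind the message passing pipeline can be sketched in Python with `multiprocessing.shared_memory`. This is a conceptual illustration only, assuming a single producer, a single consumer, and fixed 64-byte slots. `ShmRing` and its layout are hypothetical, and Python offers none of the atomic or memory-ordering guarantees a C++ engine would rely on.

```python
import struct
from multiprocessing import shared_memory
from typing import Optional

HEADER = struct.Struct("<QQ")    # write counter, read counter (both only grow)
SLOT = struct.Struct("<H62s")    # payload length + fixed 62-byte payload area


class ShmRing:
    """Single-producer, single-consumer ring of fixed-size slots in shared memory."""

    def __init__(self, name: Optional[str] = None, slots: int = 8, create: bool = False) -> None:
        self.slots = slots
        size = HEADER.size + slots * SLOT.size
        self.shm = shared_memory.SharedMemory(name=name, create=create, size=size)
        if create:
            HEADER.pack_into(self.shm.buf, 0, 0, 0)   # start empty

    def _offset(self, counter: int) -> int:
        return HEADER.size + (counter % self.slots) * SLOT.size

    def push(self, payload: bytes) -> bool:
        if len(payload) > 62:
            raise ValueError("payload exceeds slot size")
        head, tail = HEADER.unpack_from(self.shm.buf, 0)
        if head - tail == self.slots:     # full: consumer has not caught up
            return False
        SLOT.pack_into(self.shm.buf, self._offset(head), len(payload), payload)
        # Publish only after the slot is written; C++ would use a release store here.
        struct.pack_into("<Q", self.shm.buf, 0, head + 1)
        return True

    def pop(self) -> Optional[bytes]:
        head, tail = HEADER.unpack_from(self.shm.buf, 0)
        if head == tail:                  # empty
            return None
        length, data = SLOT.unpack_from(self.shm.buf, self._offset(tail))
        struct.pack_into("<Q", self.shm.buf, 8, tail + 1)
        return data[:length]

    def close(self, unlink: bool = False) -> None:
        self.shm.close()
        if unlink:
            self.shm.unlink()             # only the creator should unlink
```

A producer would create the ring with `ShmRing(create=True)` and pass `ring.shm.name` to the consumer, which attaches with `ShmRing(name=...)` using the same `slots` value. `push` returns `False` instead of blocking when the ring is full, so the caller decides whether to drop, retry, or back off.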

Distributed Analytics Platform @ Avi Networks → VMware

High-throughput log analytics system with async indexing pipeline and custom query DSL, processing 1M+ events/sec

Query DSL to Elasticsearch Mapper
C++, Boost Coroutine/Fiber, Python, Twisted, Elasticsearch

Distributed File Object System @ VMware (Broadcom)

Unified file management platform migrated from Python to Golang, enabling centralized file lifecycle control across multi-cluster control plane

Python to Golang Migration
Golang, Python, Control Plane, Distributed Systems

Distributed Training Platform

Ray-based ML training infrastructure for stock prediction models, running on homelab Kubernetes cluster with dynamic resource allocation and fault tolerance.

Ray Cluster on Homelab Kubernetes
Ray Cluster: Auto-scaling workers
Multi-GPU: Distributed training
Job Scheduler: Fault-tolerant execution
Ray, Kubernetes, PyTorch, Docker, Helm

High-Performance ETL Pipeline

Ray-based data processing pipeline for stock market data ingestion and feature engineering, handling terabyte-scale historical quotes with columnar storage optimization for ML training.

Kubernetes Infrastructure Setup
Market Data Ingestion: Distributed transforms
Arrow/Parquet: Columnar storage
Feature Engineering: Automated pipelines
Ray, Apache Arrow, Parquet, PySpark, Pandas

Onboard Logging Infrastructure @ Argo AI

High-performance logging system for autonomous vehicles with zero-copy data paths and tiered event handling

Zero-Copy High-Performance Logger
C++, Zero-Copy, Slab Allocator, Ring Buffer

Web Security CSRF Module @ VMware (Broadcom)

CSRF protection module integrated into nginx module chain with full-stack implementation from control plane to runtime

CSRF Module Integration
C, nginx, Control Plane, Web Security
Skills

Technical Expertise

A decade of building high-performance systems across the full stack—from low-level C++ to distributed ML infrastructure.


Languages

C++17
Python
Go
TypeScript
Shell

ML & LLM

PyTorch
LangGraph
LoRA/PEFT
RLHF/PPO/DPO
HuggingFace
Distributed Training

Infrastructure

Kubernetes
Ray
Docker
Terraform
Helm

Data & Storage

Elasticsearch
PostgreSQL
Apache Arrow
Redis
Kafka

High Performance

Async C++/Boost
Shared Memory
CUDA
nginx Modules
Zero-Copy

Web & APIs

FastAPI
React
gRPC
REST APIs
1M+ Events/sec handled
μs Latency achieved
TB+ Data processed
10x Performance gains
Contact

Let's Connect

Interested in discussing AI infrastructure, distributed systems, or potential collaborations? I'm always happy to chat about technology and engineering challenges.

Have a project in mind?

Whether it's building ML infrastructure, optimizing distributed systems, or exploring new AI applications—let's talk.

Send me an email