Principal Engineer @ VMware/Broadcom

Hi, I'm Chao Li

AI Engineer & Distributed Systems Architect

Building the infrastructure that powers AI. Specializing in LLM training pipelines, distributed systems, and high-performance computing that handles millions of operations per second.


Years Experience

10x Customer Impact

1M+ Events/sec

cluster.py

import ray
from ray.train import ScalingConfig
from transformers import AutoModelForCausalLM

@ray.remote(num_gpus=1)
class TrainingWorker:
    def __init__(self, model_id: str):
        self.model = AutoModelForCausalLM.from_pretrained(
            model_id,
            device_map="auto",
            torch_dtype="bfloat16"
        )

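    def train_step(self, batch) -> dict:
        """Execute one training step and report the loss."""
        # Assumes `batch` is a dict of tensors that includes `labels`,
        # so the model returns the causal-LM loss directly.
        outputs = self.model(**batch)
        loss = outputs.loss
        loss.backward()
        return {"loss": loss.item()}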


About Me

From Distributed Systems to AI Infrastructure

I'm a Principal Engineer with 9+ years of experience building high-performance distributed systems. My career has taken me from startups to unicorns to global enterprises, where I've consistently delivered systems that push the boundaries of scale and efficiency.

Currently at VMware/Broadcom, I lead LLM infrastructure projects that have achieved 10x improvements in customer onboarding speed. I work at the intersection of distributed systems and machine learning, building the infrastructure that makes AI work at scale.

When I'm not at work, I'm building my own ML infrastructure in my homelab—from C++ backtesting engines with microsecond latency to Ray-based distributed training platforms running on Kubernetes.

My Journey

2008-2012: Bachelor's in CS

2013-2015: Master's (PhD track)

2015-2021: Staff Engineer

2021-2022: Software Engineer

2022-Now: Principal Engineer

Current Focus

LLM Infrastructure

Building training and inference pipelines for large language models with distributed systems on Ray and Kubernetes.

Post-Training & RLHF

Implementing reinforcement learning from human feedback, PPO, DPO, and alignment techniques for production models.

High-Performance Systems

Designing systems handling 1M+ events/sec with microsecond latency using C++, shared memory, and async patterns.

Agentic AI Systems

Creating self-reflective multi-agent workflows with LangGraph for complex automation and code generation tasks.

Experience

Building Systems at Scale

From startups to global enterprises, I've built distributed systems and ML infrastructure that power millions of operations per second.

VMware (Broadcom)

Current

Principal Engineer

Palo Alto, California

December 2022 - Present

AI/ML Integration, LLM Agents (LangGraph, CrewAI), Distributed Systems, Configuration Automation, Web Security (CSRF/CORS), C/C++/Golang, Load Balancer Infrastructure, Technical Leadership

LLM Migration Automation

Self-reflective agentic AI system for automating F5 to Avi load balancer configuration conversion.

Multi-Agent F5 iRule Conversion System: Built self-reflective multi-agent workflow using LangGraph to automatically convert F5 iRules to Avi DataScript, achieving 10x migration speed improvement

LoRA Fine-tuned Domain-Specific LLM: Fine-tuned 7B parameter model (Qwen-Coder) with LoRA for load balancer configuration domain, collaborating with L7 engineers to generate labeled training data

AST-Based Rule Parser for Hybrid Conversion: Implemented AST-based rule parser for deterministic syntax transformation, combined with LLM for semantic understanding, iteratively improving conversion quality

Cross-Functional Technical Leadership: Led project delivery with VP-level reporting, coordinated with Support Engineers for customer requirements and L7 Engineers for domain expertise

LangGraph, Qwen-Coder, Gemini Pro, LoRA, Python
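The self-reflective workflow described above can be pictured as a generate, critique, revise loop. What follows is a minimal sketch in plain Python rather than LangGraph. `ConversionState`, `reflect_loop`, `toy_generate`, `toy_critique`, and the placeholder tokens `SRC_CALL` and `DST_CALL` are all hypothetical stand-ins for the LLM-backed agents and are not taken from the production system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ConversionState:
    """Carries the source rule, the current draft, and reviewer feedback."""
    source: str
    draft: str = ""
    feedback: List[str] = field(default_factory=list)
    rounds: int = 0


def reflect_loop(
    state: ConversionState,
    generate: Callable[[ConversionState], str],
    critique: Callable[[ConversionState], List[str]],
    max_rounds: int = 3,
) -> ConversionState:
    """Regenerate the draft until the critic has no complaints or rounds run out."""
    for _ in range(max_rounds):
        state.draft = generate(state)      # the generator sees earlier feedback
        state.rounds += 1
        state.feedback = critique(state)   # an empty list means the draft passed
        if not state.feedback:
            break
    return state


# Hypothetical, deterministic stand-ins for the LLM agents.
def toy_generate(state: ConversionState) -> str:
    draft = state.source.replace("SRC_CALL", "DST_CALL")
    if "missing header" in state.feedback:
        draft = "-- header\n" + draft
    return draft


def toy_critique(state: ConversionState) -> List[str]:
    return [] if state.draft.startswith("-- header") else ["missing header"]


result = reflect_loop(ConversionState(source="log SRC_CALL"), toy_generate, toy_critique)
```

In this toy run the first draft fails review, the feedback is fed back to the generator, and the second draft passes. Capping the loop with `max_rounds` keeps a stubborn critic from running it forever.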

Distributed File Object System

Unified file management platform migrated from Python to Golang, enabling centralized file lifecycle control across multi-cluster control plane

Python to Golang Migration: Migrated legacy Python-based file object system to Golang for improved performance and type safety

Unified File Management Platform: Consolidated file handling across different apps and functionalities into centralized file object system with standardized lifecycle management

Full-Stack Pipeline Implementation: Implemented complete pipeline from control plane through to application servicing layer

Cross-Team Onboarding & Golden Path: Collaborated with multiple functionality teams for onboarding and created golden path documentation for seamless new file object integration

Golang, Python, Control Plane, Distributed Systems

Web Security CSRF Module

CSRF protection module integrated into nginx module chain with full-stack implementation from control plane to runtime

CSRF Module Integration: Introduced CSRF protection module into existing nginx module chain, ensuring compatibility with existing infrastructure and security requirements

Full-Stack Security Implementation: Implemented complete stack from control plane configuration through to nginx module runtime execution

C, nginx, Control Plane, Web Security
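The page does not say which token scheme the module uses. As a rough illustration of what CSRF protection checks at request time, here is a minimal Python sketch of one common approach, an HMAC token bound to the session. `issue_token`, `verify_token`, and the session IDs are assumptions made for the example and do not describe the nginx module itself.

```python
import hashlib
import hmac
import secrets


def issue_token(secret: bytes, session_id: str) -> str:
    """Derive a CSRF token by signing the session ID with a server-side secret."""
    return hmac.new(secret, session_id.encode(), hashlib.sha256).hexdigest()


def verify_token(secret: bytes, session_id: str, token: str) -> bool:
    """Recompute the expected token and compare it in constant time."""
    expected = issue_token(secret, session_id)
    return hmac.compare_digest(expected.encode(), token.encode())


secret = secrets.token_bytes(32)           # hypothetical per-deployment secret
token = issue_token(secret, "session-a")   # embedded in the form or a header

accepted = verify_token(secret, "session-a", token)   # same session: passes
rejected = verify_token(secret, "session-b", token)   # other session: fails
```

Because the token is derived from the session, a forged cross-site request that cannot read the victim's token has nothing valid to submit.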

Argo AI

Software Engineer

Palo Alto, California

June 2021 - Dec 2022

High-Performance Computing, C++ Systems Programming, Zero-Copy Logging, Event-Based Tiered Logging, Onboard Vehicle Systems, Autonomous Vehicles, Real-time Systems

Onboard Logging Infrastructure

High-performance logging system for autonomous vehicles with zero-copy data paths and tiered event handling

Zero-Copy High-Performance Logger: Implemented zero-copy logging infrastructure minimizing memory overhead and maximizing throughput for real-time vehicle sensor data

Slab-Based Journal Logger: Designed slab-based memory allocation for journal logging, reducing fragmentation and allocation overhead in high-frequency logging scenarios

Event-Based Tiered Logging System: Built tiered logging architecture with event-based routing to different channel sets, enabling flexible log level management and channel binding

C++, Zero-Copy, Slab Allocator, Ring Buffer
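To make the slab idea concrete, the sketch below carves fixed-size slots out of one preallocated buffer and recycles them through a free list. It is written in Python for brevity, whereas the logger described above is C++. `SlabPool`, the slot sizes, and the sample record are hypothetical.

```python
from typing import List


class SlabPool:
    """Fixed-size slots carved out of one preallocated buffer."""

    def __init__(self, slot_size: int, slot_count: int) -> None:
        self.slot_size = slot_size
        self._buf = bytearray(slot_size * slot_count)  # one allocation up front
        self._view = memoryview(self._buf)
        # LIFO free list of slot indices, so alloc and free are both O(1).
        self._free: List[int] = list(range(slot_count - 1, -1, -1))

    def alloc(self) -> int:
        """Return a free slot index, or raise if the slab is exhausted."""
        if not self._free:
            raise MemoryError("slab exhausted")
        return self._free.pop()

    def free(self, slot: int) -> None:
        """Return a slot to the pool so a later alloc can reuse it."""
        self._free.append(slot)

    def slot(self, slot: int) -> memoryview:
        """Writable view of a slot; slicing a memoryview does not copy."""
        start = slot * self.slot_size
        return self._view[start:start + self.slot_size]


pool = SlabPool(slot_size=64, slot_count=4)
a = pool.alloc()
record = b"sensor frame 17"
pool.slot(a)[:len(record)] = record   # written in place, no intermediate copy
b = pool.alloc()
pool.free(a)
c = pool.alloc()                      # reuses the slot that was just freed
```

Since every slot has the same size, freeing and reallocating never splits or merges blocks, which is why this layout avoids fragmentation. The sketch does not guard against double frees.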

Avi Networks → VMware

Staff Engineer

Palo Alto, California

Nov 2015 - May 2021

Boost Coroutine/Fiber, C++ Threading Model, Memory Pool Optimization, Ring Buffer Logging, Elasticsearch, Query DSL Parser (pyparsing), High-Throughput Systems, Distributed Systems

Distributed Analytics Platform

High-throughput log analytics system with async indexing pipeline and custom query DSL, processing 1M+ events/sec

Query DSL to Elasticsearch Mapper: Built URL-based query DSL parser using pyparsing that transforms user queries into optimized Elasticsearch query DSL, enabling intuitive search interface

Async Indexing Pipeline with Twisted: Implemented async request→indexing→response pipeline using Twisted Python with message tracing, auditing, and graceful error handling

High-Performance Core Logging Engine: Introduced Boost coroutine/fiber in core logging to persist logs from DPDK ring buffer, with memory pooling and thread pool separating logic from system resources

End-to-End Logging Pipeline Leadership: Led complete logging pipeline from data plane ingestion through indexing to query serving, achieving 1M+ events/second throughput

C++, Boost Coroutine/Fiber, Python, Twisted, Elasticsearch
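As an illustration of mapping a URL-based query onto Elasticsearch query DSL, here is a minimal sketch. It uses the standard library's `urllib.parse` instead of pyparsing, and the `field__op=value` URL syntax, the function name `url_query_to_es`, and the field names are assumptions made for the example, not the platform's actual grammar. The `bool`, `filter`, `term`, and `range` clauses are standard Elasticsearch constructs.

```python
from typing import Any, Dict, List
from urllib.parse import parse_qsl

RANGE_OPS = {"gt", "gte", "lt", "lte"}  # bounds accepted by a range clause


def url_query_to_es(query_string: str) -> Dict[str, Any]:
    """Translate `field=value` and `field__op=value` pairs into a bool/filter query."""
    terms: List[Dict[str, Any]] = []
    ranges: Dict[str, Dict[str, str]] = {}
    for key, value in parse_qsl(query_string):
        field, _, op = key.partition("__")
        if op in RANGE_OPS:
            # Merge several bounds on one field into a single range clause.
            ranges.setdefault(field, {})[op] = value
        elif not op:
            terms.append({"term": {field: value}})
        else:
            raise ValueError(f"unsupported operator: {op}")
    filters = terms + [{"range": {f: bounds}} for f, bounds in ranges.items()]
    return {"query": {"bool": {"filter": filters}}}


# Values stay as strings here; a real mapper would coerce them per field type.
es_query = url_query_to_es("status=500&latency__gte=100&latency__lt=250")
```

Placing every clause under `filter` rather than `must` skips relevance scoring, which usually suits log search where matches are either in or out.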

Projects

What I've Built

From high-frequency trading systems to LLM infrastructure—engineering solutions that push performance boundaries.

LLM Migration Automation

Featured @ vmware-broadcom

Self-reflective agentic AI system for automating F5 to Avi load balancer configuration conversion.

Multi-Agent F5 iRule Conversion System: Built self-reflective multi-agent workflow using LangGraph to automatically convert F5 iRules to Avi DataScript, achieving 10x migration speed improvement

LoRA Fine-tuned Domain-Specific LLM: Fine-tuned 7B parameter model (Qwen-Coder) with LoRA for load balancer configuration domain, collaborating with L7 engineers to generate labeled training data

Multi-Agent System: LangGraph orchestration

LoRA Fine-tuning: Domain-specific adapters

Code Generation: iRule to DataScript

Self-Reflection: Iterative improvement

10x Speed Up

Parameters

CoT Reasoning

LangGraph, Qwen-Coder, Gemini Pro, LoRA, Python

Stock Strategy Backtesting Platform

Featured

End-to-end ML infrastructure product for quantitative trading strategy development and analysis.

Generic Message Passing Pipeline: Implemented shared-memory ring buffer based message passing architecture connecting data provider, algorithm, broker, risk manager, and logger with unified high-performance data processing pipeline

C++ to Python Algorithm Binding: Created Python bindings for C++ algorithm interface, enabling rapid strategy prototyping while maintaining production performance

C++ Engine: Shared-memory with μs latency

Ray on K8s: Distributed ML training

MCP Server: LLM-driven strategy generation

React Frontend: Real-time visualization

μs Latency

ETL Speed

Scale

C++17, Python, Ray, Kubernetes, React, FastAPI
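The ring-buffer message passing described above can be sketched as a single-producer, single-consumer queue of fixed-size records laid out in one flat buffer. The Python below only illustrates the layout: `RingBuffer`, the record format, and the sample prices are hypothetical, and a `bytearray` stands in for the shared-memory segment. A real cross-process version would also need proper memory ordering on the two indices, which Python does not provide.

```python
import struct
from typing import Optional, Tuple

INDEX = struct.Struct("<Q")     # one 64-bit monotonic counter
RECORD = struct.Struct("<Qd")   # hypothetical message: sequence number, price
HEAD_OFF, TAIL_OFF, DATA_OFF = 0, 8, 16


class RingBuffer:
    """Single-producer, single-consumer ring of fixed-size records."""

    def __init__(self, buf, capacity: int) -> None:
        self._buf = memoryview(buf)
        self._capacity = capacity

    @staticmethod
    def size_for(capacity: int) -> int:
        """Bytes needed for the two indices plus `capacity` records."""
        return DATA_OFF + capacity * RECORD.size

    def _offset(self, index: int) -> int:
        return DATA_OFF + (index % self._capacity) * RECORD.size

    def push(self, seq: int, price: float) -> bool:
        """Append a record; return False when the ring is full."""
        head = INDEX.unpack_from(self._buf, HEAD_OFF)[0]
        tail = INDEX.unpack_from(self._buf, TAIL_OFF)[0]
        if head - tail == self._capacity:
            return False
        RECORD.pack_into(self._buf, self._offset(head), seq, price)
        INDEX.pack_into(self._buf, HEAD_OFF, head + 1)  # only the producer writes head
        return True

    def pop(self) -> Optional[Tuple[int, float]]:
        """Remove the oldest record, or return None when the ring is empty."""
        head = INDEX.unpack_from(self._buf, HEAD_OFF)[0]
        tail = INDEX.unpack_from(self._buf, TAIL_OFF)[0]
        if head == tail:
            return None
        record = RECORD.unpack_from(self._buf, self._offset(tail))
        INDEX.pack_into(self._buf, TAIL_OFF, tail + 1)  # only the consumer writes tail
        return record


ring = RingBuffer(bytearray(RingBuffer.size_for(2)), capacity=2)
pushed = [ring.push(1, 101.5), ring.push(2, 101.75), ring.push(3, 102.0)]
first = ring.pop()
```

Keeping the write index and read index in separate words, each written by exactly one side, is what lets such a queue work without locks. The third `push` is refused because the two-slot ring is already full.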

Distributed Analytics Platform @ avi-networks-vmware

High-throughput log analytics system with async indexing pipeline and custom query DSL, processing 1M+ events/sec

Query DSL to Elasticsearch Mapper

C++, Boost Coroutine/Fiber, Python, Twisted, Elasticsearch

Distributed File Object System @ vmware-broadcom

Unified file management platform migrated from Python to Golang, enabling centralized file lifecycle control across multi-cluster control plane

Python to Golang Migration

Golang, Python, Control Plane, Distributed Systems

Distributed Training Platform

Ray-based ML training infrastructure for stock prediction models, running on homelab Kubernetes cluster with dynamic resource allocation and fault tolerance.

Ray Cluster on Homelab Kubernetes

Ray Cluster — Auto-scaling workers

Multi-GPU — Distributed training

Job Scheduler — Fault-tolerant execution

Ray, Kubernetes, PyTorch, Docker, Helm

High-Performance ETL Pipeline

Ray-based data processing pipeline for stock market data ingestion and feature engineering, handling terabyte-scale historical quotes with columnar storage optimization for ML training.

Kubernetes Infrastructure Setup

Market Data Ingestion — Distributed transforms

Arrow/Parquet — Columnar storage

Feature Engineering — Automated pipelines

Ray, Apache Arrow, Parquet, PySpark, Pandas

Onboard Logging Infrastructure @ argo-ai

High-performance logging system for autonomous vehicles with zero-copy data paths and tiered event handling

Zero-Copy High-Performance Logger

C++, Zero-Copy, Slab Allocator, Ring Buffer

Web Security CSRF Module @ vmware-broadcom

CSRF protection module integrated into nginx module chain with full-stack implementation from control plane to runtime

CSRF Module Integration

C, nginx, Control Plane, Web Security

Skills

Technical Expertise

A decade of building high-performance systems across the full stack—from low-level C++ to distributed ML infrastructure.


Languages: C++17, Python, TypeScript, Shell

ML & LLM: PyTorch, LangGraph, LoRA/PEFT, RLHF/PPO/DPO, HuggingFace, Distributed Training

Infrastructure: Kubernetes, Ray, Docker, Terraform, Helm

Data & Storage: Elasticsearch, PostgreSQL, Apache Arrow, Redis, Kafka

High Performance: Async C++/Boost, Shared Memory, CUDA, nginx Modules, Zero-Copy

Web & APIs: FastAPI, React, gRPC, REST APIs

⚡ 1M+ Events/sec handled

🚀 μs Latency achieved

📊 TB+ Data processed

📈 10x Performance gains

Contact

Let's Connect

Interested in discussing AI infrastructure, distributed systems, or potential collaborations? I'm always happy to chat about technology and engineering challenges.

[email protected]

GitHub: github.com/chaopli

LinkedIn: linkedin.com/in/lichao90

Location: Sunnyvale, CA

Have a project in mind?

Whether it's building ML infrastructure, optimizing distributed systems, or exploring new AI applications—let's talk.

Send me an email