Projects
nsa-from-scratch 2025
Triton kernels for all three branches of DeepSeek's Native Sparse Attention (arXiv:2502.11089), plus a CUDA forward kernel for the selected-attention branch built on Hopper WGMMA. 7.4× faster than FlashAttention-3 at 64k context. Five-model training fleet, perplexity sweep out to 256k context, LongBench v2 evaluation, and a MoBA cross-comparison. (sparse attention, Triton, Hopper WGMMA)
whitenoise 2026
Numerics for white noise and the stochastic heat equation on the torus: a Python API, C++/pybind11 hot kernels, and Monte Carlo tests checked against analytic results. (SPDE, stochastic heat equation, Monte Carlo)
NanoExchange 2026
Matching engine, UDP multicast feed, TCP order gateway, and a React dashboard with order book, depth, heatmap, OHLC chart, and live simulator. (matching engine, market data, low-latency)
Meridian 2026
Microservices TSDB in Go (gateway / ingestor / storage / querier / compactor) with Gorilla compression, PromQL, consistent-hash sharding, and a real-time React dashboard. (TSDB, PromQL, Gorilla compression)
tinycompress 2026
Implementations and measured benchmarks of LLM inference compression: int4/int8 quantization, GPTQ-like calibration, int8 KV cache, pruning, distillation, speculative decoding, torch.compile, ONNX. (LLM, quantization, speculative decoding)
windvane 2025
End-to-end SCADA fault detection on Penmanshiel wind-farm telemetry. TimescaleDB hypertables, 14-model dbt project (staging / intermediate / marts), EWMA baseline plus LSTM autoencoder, SHAP attribution, walk-forward CV, Streamlit demo. (fault detection, TimescaleDB, dbt)
stem-agent 2026
A self-specializing AI agent that differentiates into a code-quality reviewer, validated against a 20-sample benchmark with guard-gated rollback. (AI agent, code review, LLM)
parallel-downloader 2026
Parallel HTTP downloader in Kotlin: N ranged GETs in flight, positional FileChannel writes, retry/resume/single-GET fallback, sha256 verification, and rate limiting. Multi-platform CI (ubuntu/macos/windows × JDK 17/21). Runtime: kotlinx-coroutines-core only. (HTTP, range requests, Kotlin coroutines)
llm-router 2025
Prefix-cache-aware reverse proxy for OpenAI-compatible LLM servers. Per-worker radix trees pin each request to the engine that already holds its KV-cache prefix. Reaches 94.9% upstream cache hits and the flattest TTFT-under-load slope versus random, round-robin, and least-loaded baselines on 4× A100 SXM with vLLM and Qwen2.5-7B/14B. (KV cache, prefix routing, vLLM)
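The llm-router routing idea (pin a request to the worker whose cached prefixes best match it) can be illustrated with a minimal sketch. This is not the project's code. All class and method names here are hypothetical. An uncompressed token trie stands in for a true path-compressed radix tree. Indexes are never evicted, so they can drift from a real engine's KV cache. The tie-break (least in-flight requests among the longest matches) is an assumed policy.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Children keyed by token id; an uncompressed trie stands in for a radix tree.
    children: dict = field(default_factory=dict)


class WorkerPrefixIndex:
    """Hypothetical per-worker index of prompt prefixes assumed to be cached on that worker."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens):
        # Record the full prompt as a cached prefix (no eviction in this sketch).
        node = self.root
        for t in tokens:
            node = node.children.setdefault(t, TrieNode())

    def match_len(self, tokens):
        # Length of the longest stored prefix shared with `tokens`.
        node, n = self.root, 0
        for t in tokens:
            nxt = node.children.get(t)
            if nxt is None:
                break
            node, n = nxt, n + 1
        return n


class PrefixRouter:
    def __init__(self, num_workers):
        self.indexes = [WorkerPrefixIndex() for _ in range(num_workers)]
        self.inflight = [0] * num_workers  # assumed load signal: requests in flight

    def route(self, tokens):
        matches = [idx.match_len(tokens) for idx in self.indexes]
        best = max(matches)
        # Among workers with the longest match, pick the least loaded.
        # With no match anywhere, this degrades to least-loaded routing.
        candidates = [w for w, m in enumerate(matches) if m == best]
        worker = min(candidates, key=lambda w: self.inflight[w])
        self.indexes[worker].insert(tokens)
        self.inflight[worker] += 1
        return worker, best

    def done(self, worker):
        # Call when the upstream response for a routed request completes.
        self.inflight[worker] -= 1
```

A second request that shares a system prompt with an earlier one lands on the same worker and reports the shared prefix length. An unrelated request falls through to the least-loaded worker.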