Projects

  • nsa-from-scratch 2025

    Triton kernels for all three branches of DeepSeek's Native Sparse Attention (arXiv:2502.11089), plus a Hopper WGMMA CUDA forward kernel for the selected branch. 7.4x faster than FlashAttention-3 at 64k context. Five-model training fleet, perplexity sweep to 256k, LongBench v2, MoBA cross-comparison. (sparse attention, Triton, Hopper WGMMA)

  • whitenoise 2026

    Numerics for white noise and the stochastic heat equation on the torus. Python API, C++/pybind11 hot kernels, analytic Monte Carlo tests. (SPDE, stochastic heat equation, Monte Carlo)

  • NanoExchange 2026

    Matching engine, UDP multicast feed, TCP order gateway, and a React dashboard with order book, depth, heatmap, OHLC chart, and live simulator. (matching engine, market data, low-latency)

  • Meridian 2026

    Microservice-based TSDB in Go (gateway / ingestor / storage / querier / compactor) with Gorilla compression, PromQL, consistent-hash sharding, and a real-time React dashboard. (TSDB, PromQL, Gorilla compression)

  • tinycompress 2026

    Implementations and measured benchmarks of LLM inference compression: int4/int8 quantization, GPTQ-like calibration, int8 KV cache, pruning, distillation, speculative decoding, torch.compile, ONNX. (LLM, quantization, speculative decoding)

  • windvane 2025

    End-to-end SCADA fault detection on Penmanshiel wind-farm telemetry. TimescaleDB hypertables, 14-model dbt project (staging / intermediate / marts), EWMA baseline plus LSTM autoencoder, SHAP attribution, walk-forward CV, Streamlit demo. (fault detection, TimescaleDB, dbt)

  • stem-agent 2026

    A self-specializing AI agent that differentiates into a code-quality reviewer, validated against a 20-sample benchmark with guard-gated rollback. (AI agent, code review, LLM)

  • parallel-downloader 2026

    Parallel HTTP downloader in Kotlin: N ranged GETs in flight, positional FileChannel writes, retry/resume/single-GET fallback, SHA-256 verification, and rate limiting. Multi-platform CI (ubuntu/macos/windows × JDK 17/21). Only runtime dependency: kotlinx-coroutines-core. (HTTP, range requests, Kotlin coroutines)

  • llm-router 2025

    Prefix-cache-aware reverse proxy for OpenAI-compatible LLM servers. Per-worker radix trees pin each request to the engine that already holds its KV-cache prefix. 94.9% upstream cache hit rate and the flattest TTFT-under-load slope versus random, round-robin, and least-loaded baselines on 4x A100 SXM with vLLM + Qwen2.5-7B/14B. (KV cache, prefix routing, vLLM)
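
    The prefix routing idea in llm-router can be illustrated with a minimal sketch. This is not the project's code: PrefixIndex and PrefixRouter are hypothetical names, a plain token-level trie stands in for a compressed radix tree, and cache eviction on the engine side is ignored. Each worker gets its own index, and a request goes to the worker whose index shares the longest prefix with it.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # Maps a token id to the child node for that token.
    children: dict = field(default_factory=dict)


class PrefixIndex:
    """Token-level trie of prefixes one worker is assumed to have cached.

    Simplification: an uncompressed trie, not a radix tree, and no eviction.
    """

    def __init__(self):
        self.root = TrieNode()

    def insert(self, tokens):
        node = self.root
        for tok in tokens:
            node = node.children.setdefault(tok, TrieNode())

    def match_len(self, tokens):
        """Length of the longest stored prefix that matches `tokens`."""
        node, depth = self.root, 0
        for tok in tokens:
            node = node.children.get(tok)
            if node is None:
                break
            depth += 1
        return depth


class PrefixRouter:
    """Hypothetical router: one PrefixIndex per worker."""

    def __init__(self, workers):
        self.index = {w: PrefixIndex() for w in workers}
        self.routed = {w: 0 for w in workers}

    def route(self, tokens):
        # Longest matching prefix wins; ties go to the worker that has
        # received the fewest requests so far.
        best = max(
            self.index,
            key=lambda w: (self.index[w].match_len(tokens), -self.routed[w]),
        )
        # Record the request so later requests sharing this prefix follow it.
        self.index[best].insert(tokens)
        self.routed[best] += 1
        return best


router = PrefixRouter(["w0", "w1"])
first = router.route([1, 2, 3])       # no prefix anywhere, goes to w0
second = router.route([9, 9])         # no prefix anywhere, w1 is less loaded
third = router.route([1, 2, 3, 4])    # shares [1, 2, 3] with w0
```

    The tie-break on request count is an assumption made for the sketch; it keeps cold requests from piling onto one worker before any prefixes are known.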