vincentweisser 19 hours ago

A big step towards truly decentralized inference — unlocking consumer GPUs and already outperforming traditional approaches that stall in high-latency settings.

Unlike other p2p inference engines (e.g., Petals, Exo), our stack leverages vLLM’s scheduler for efficient batched decoding, achieving 10–50× higher throughput.
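
For anyone unfamiliar with what that buys you, here's a rough single-node sketch (the model name and prompt count are just illustrative, not from our setup): you hand vLLM a whole pile of requests and its continuous-batching scheduler packs them into shared decode steps instead of serving them one at a time, which is where the throughput gap against request-at-a-time engines comes from.

    # Illustrative sketch only: model and prompt count are placeholders.
    from vllm import LLM, SamplingParams

    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
    params = SamplingParams(max_tokens=256, temperature=0.8)

    # Submitting all prompts at once lets vLLM's scheduler batch their
    # decode steps together rather than running them sequentially.
    prompts = [f"Write a short fact about the number {i}." for i in range(512)]
    outputs = llm.generate(prompts, params)

    for out in outputs[:3]:
        print(out.outputs[0].text)
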

Crucial for scaling decentralized RL rollouts and synthetic data generation.