Nix Distributed Builder Setup
Version v1.1.0 of https://github.com/Misaka13514/setup-distributed-nix-builds.
Action Summary
This GitHub Action automates the setup of a distributed Nix build cluster using ephemeral GitHub-hosted runners securely connected via Tailscale. It enables seamless horizontal scaling for parallel builds across multiple platforms and architectures (e.g., Linux and macOS), optimizing performance and simplifying remote builder configuration without requiring external infrastructure. Key features include automatic node provisioning, built-in caching, maximum disk space utilization, and graceful teardown of build resources.
Release notes
This release focuses on resolving severe CPU bottlenecks on the Coordinator runner, significantly improving the network transfer speeds between the Coordinator and Remote Builders, and introducing a command for better runner lifecycle management.
⚡ Transport Layer Optimizations (Resolving Coordinator Bottlenecks)
In previous versions, we observed that while remote builders finished compiling quickly, the final step of copying the Nix closures back to the Coordinator was abnormally slow. Our profiling (dstat and pidstat) revealed that the standard 4-core GitHub Hosted Runner acting as the Coordinator was hitting a severe CPU bottleneck, which restricted the entire pipeline.
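The kind of profiling described above can be reproduced with a diagnostic workflow step along these lines (the exact flags and sampling intervals are illustrative, not the ones used for the release benchmarks):

```yaml
- name: Profile coordinator CPU and network (illustrative)
  run: |
    # Sample CPU and network totals once per second during the copy phase
    dstat -cn 1 30 &
    # Per-process CPU usage for ssh and nix-daemon, 1s interval, 30 samples
    pidstat -u -C 'ssh|nix-daemon' 1 30
```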
This was caused by two overlapping mechanisms:
- Double Encryption: Tailscale inherently encrypts all mesh traffic using WireGuard (ChaCha20-Poly1305). Running standard SSH on top of Tailscale meant the Coordinator’s CPU was encrypting and decrypting data twice in software.
  - The Fix: We now explicitly force SSH to use `Ciphers aes128-gcm@openssh.com`. This offloads the SSH encryption layer to the CPU’s hardware-accelerated AES-NI instructions, freeing up the general-purpose ALUs to handle Tailscale’s WireGuard encryption without choking.
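Concretely, the generated client configuration pins the hardware-friendly cipher. A sketch of the relevant `/etc/ssh/ssh_config` fragment (the host pattern is a placeholder):

```
Host builder-*
  # Prefer AES-GCM so SSH encryption runs on AES-NI, not the general ALUs
  Ciphers aes128-gcm@openssh.com
```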
- Double Compression: Both SSH (via `-C`) and Nix attempt to compress data over the wire. In a cloud environment where intra-region bandwidth is extremely high (often >3 Gbps on Azure), compressing already-dense binary Nix store paths wastes CPU cycles and actually slows down the transfer.
  - The Fix: We completely disabled compression in the generated `/etc/ssh/ssh_config` (`Compression no`) and explicitly passed `?compress=false` to the `ssh-ng://` URLs in `/etc/nix/machines`.
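The two settings look roughly like this in the generated files; the builder host name and the `/etc/nix/machines` fields after the URL (platform, SSH key, max-jobs, speed factor, features) are placeholders for illustration:

```
# /etc/ssh/ssh_config — never compress over the Tailscale link
Compression no

# /etc/nix/machines — disable Nix-level compression per builder
ssh-ng://builder-1?compress=false x86_64-linux - 4 1 kvm,big-parallel
```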
Measured Impact: In our benchmark, transferring roughly 1 GB of generated payloads across 4 multi-arch builders:
- Throughput: The Tailscale mesh network easily handles massive bandwidth. With the CPU bottleneck removed, the Coordinator’s peak receive speed (`net/total:recv`) rose from 485 MB/s (~3.8 Gbps) to 585 MB/s (~4.6 Gbps).
- Time Saved: The total elapsed time for the distributed build and artifact retrieval phase dropped from 126.3 s to just 40.6 s. The `ssh` process CPU overhead dropped significantly, allowing `nix-daemon` to utilize disk I/O much more effectively.
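As a sanity check on the units above (dstat reports decimal megabytes, so Gbps = MB/s × 8 / 1000), a quick sketch:

```python
def mbps_to_gbps(mb_per_s: float) -> float:
    """Convert decimal MB/s to Gbps (1 MB = 10^6 bytes, 1 Gb = 10^9 bits)."""
    return mb_per_s * 8 / 1000

before = mbps_to_gbps(485)   # coordinator receive speed before the fix
after = mbps_to_gbps(585)    # after forcing AES-GCM and disabling compression
speedup = 126.3 / 40.6       # wall-clock improvement for the whole phase

print(f"{before:.2f} Gbps -> {after:.2f} Gbps, {speedup:.2f}x faster overall")
```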
🛑 New Feature: Early Builder Teardown
To help you minimize GitHub Actions billing minutes, we are introducing the `stop-nix-builders` command.
Previously, remote builders would wait idly until their hard timeout or the post-job cleanup phase. You can now invoke this command on the Coordinator immediately after your heavy builds complete to gracefully terminate all remote builders early.
Example Usage:
```yaml
- name: Build massive closure
  run: nix build -L --max-jobs 0 .#my-heavy-package

- name: Teardown Builders explicitly to save runner minutes!
  run: stop-nix-builders

- name: Push to cache (Builders are already shut down)
  run: cachix push mycache --all
```
🛠️ Minor Fixes
- Magic Nix Cache: Explicitly set `diagnostic-endpoint: ""` in the `magic-nix-cache-action` configuration to disable unnecessary telemetry.
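For users configuring the cache step themselves, the setting corresponds to the following workflow fragment (the action version tag is illustrative):

```yaml
- uses: DeterminateSystems/magic-nix-cache-action@v8
  with:
    # An empty endpoint opts out of diagnostics reporting
    diagnostic-endpoint: ""
```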