Install flash-attn from source (compile)
This is the fallback path when you can’t find a compatible prebuilt wheel. It’s slower and more error-prone, but sometimes necessary for uncommon version combinations.
Try wheels first (recommended)
If your goal is “install flash-attn”, the fastest path is usually a prebuilt wheel:
- Use the wheel finder and install from the wheel URL (takes seconds); see the example below.
- Check compatibility if no wheels show up (this is often a version mismatch).
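For comparison with a source build, installing from a wheel is a single pip command. The URL below is a placeholder pattern, not a real file name; copy the exact URL reported by the wheel finder for your Python / PyTorch / CUDA combination:

```
# Install directly from a prebuilt wheel URL.
# The URL is a placeholder -- substitute the exact one from the wheel finder.
pip install "https://example.com/path/to/flash_attn-<version>-cp311-cp311-linux_x86_64.whl"
```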
Prerequisites (source build)
Source builds typically require a compatible build toolchain. Exact requirements vary by OS and GPU stack, but commonly include:
- A supported Python version and virtual environment
- A matching PyTorch build (CUDA-enabled if you need GPU)
- CUDA toolkit + nvcc (often required for compilation)
- A C++ compiler toolchain
For canonical, always-up-to-date instructions, refer to the official repository: Dao-AILab/flash-attention.
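Before starting a long compile, it can help to confirm these pieces are actually present. A minimal sketch of version checks, assuming a Linux shell with nvcc and g++ on the PATH:

```
# Python interpreter in the active environment
python --version

# PyTorch version, the CUDA version it was built against, and GPU availability
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"

# CUDA toolkit compiler
nvcc --version

# C++ compiler toolchain
g++ --version
```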
Minimal build example (may vary)
The simplest approach is often to let pip build from source. This can take a while and may fail if versions don’t line up:
```
pip install flash-attn --no-build-isolation
python -c "import flash_attn; print('flash_attn ok')"
```

If you hit errors, jump to troubleshooting.
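If the compile runs out of memory or launches too many parallel jobs, the upstream README documents a MAX_JOBS environment variable to cap build parallelism. A sketch (the value 4 is just an example; tune it to your machine):

```
# Limit parallel compilation jobs to reduce peak memory use during the build.
MAX_JOBS=4 pip install flash-attn --no-build-isolation
```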