Install flash-attn from source (compile)

This is the fallback path for when you can’t find a compatible prebuilt wheel. It’s slower and more error-prone, but sometimes necessary for uncommon combinations.

Try wheels first (recommended)

If your goal is simply to install flash-attn, the fastest path is usually a prebuilt wheel:
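
One common route, assuming a Linux x86_64 machine with a CUDA-enabled PyTorch, is to pick a wheel from the project’s GitHub releases page (https://github.com/Dao-AILab/flash-attention/releases) whose filename matches your Python, PyTorch, and CUDA versions, then install it directly. The filename below is only a placeholder; copy the real link from the releases page:

# Placeholder filename -- substitute the wheel that matches your Python/PyTorch/CUDA combination
pip install flash_attn-<version>+cu<cuda>torch<torch>-cp<py>-cp<py>-linux_x86_64.whl

If no wheel matches your combination, continue with the source build below.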

Prerequisites (source build)

Source builds typically require a compatible build toolchain. Exact requirements vary by OS and GPU stack, but commonly include:

- A supported Python version and virtual environment
- A matching PyTorch build (CUDA-enabled if you need GPU support)
- CUDA toolkit + nvcc (often required for compilation)
- A C++ compiler toolchain
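
A minimal sketch for sanity-checking these prerequisites, assuming a Linux setup with a CUDA-enabled PyTorch and g++ as the C++ compiler:

python -c "import torch; print(torch.__version__, torch.version.cuda)"   # PyTorch version and the CUDA version it was built against
nvcc --version   # CUDA toolkit compiler
g++ --version    # C++ toolchain

The CUDA version reported by PyTorch should be compatible with the nvcc toolkit you compile against; a mismatch is a common source of build failures.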

For canonical, always-up-to-date instructions, refer to the official repository: Dao-AILab/flash-attention (https://github.com/Dao-AILab/flash-attention).

Minimal build example (may vary)

The simplest approach is often to let pip build from source. This can take a while and may fail if versions don’t line up:

pip install flash-attn --no-build-isolation
python -c "import flash_attn; print('flash_attn ok')"
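
If the compile runs out of memory or saturates every CPU core, the upstream README suggests installing ninja and capping parallel compile jobs via the MAX_JOBS environment variable; the value below is just an example, tune it to your machine’s RAM:

pip install ninja
MAX_JOBS=4 pip install flash-attn --no-build-isolation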

If you hit errors, jump to troubleshooting.