Install flash-attn with prebuilt wheels (no compile)

This is the fastest and most reliable path for most users. Instead of building from source, install directly from a wheel URL that matches your Python, PyTorch, CUDA, and platform.
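
To see exactly what a wheel must match, print your interpreter, PyTorch, and CUDA versions first. A minimal sketch, assuming PyTorch is already installed; `torch.version.cuda` reports the CUDA version PyTorch was built against, and the C++ ABI flag appears in some wheel filenames:

    # Print the versions a flash-attn wheel must match.
    import sys
    import torch

    print("python  :", sys.version.split()[0])
    print("torch   :", torch.__version__)
    print("cuda    :", torch.version.cuda)                 # CUDA version PyTorch was built with
    print("cxx11abi:", torch.compiled_with_cxx11_abi())    # ABI flag encoded in some wheel names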

Steps

  1) Find the right wheel: open the wheel finder and select your platform and versions.
  2) Install with pip (the command installs directly from the wheel URL):
     pip install https://example.com/flash_attn-...whl
  3) Or install with uv (often faster):
     uv pip install https://example.com/flash_attn-...whl
  4) Verify the install (a short GPU smoke test follows this list):
     python -c "import flash_attn; print('flash_attn ok')"
     python -c "import torch; print(torch.__version__)"
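
If you want more than an import check, a tiny forward pass confirms the CUDA kernels actually run. A minimal sketch, assuming a CUDA-capable GPU with fp16 support; `flash_attn_func` expects tensors shaped (batch, seqlen, nheads, headdim):

    # Smoke test: one small forward pass through flash-attn's fused kernel.
    import torch
    from flash_attn import flash_attn_func

    # Assumed shapes: batch=1, seqlen=128, nheads=4, headdim=64; adjust freely.
    q = torch.randn(1, 128, 4, 64, device="cuda", dtype=torch.float16)
    k = torch.randn(1, 128, 4, 64, device="cuda", dtype=torch.float16)
    v = torch.randn(1, 128, 4, 64, device="cuda", dtype=torch.float16)

    out = flash_attn_func(q, k, v, causal=True)
    print("flash_attn forward ok, output shape:", tuple(out.shape))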

What you don’t need

  - The CUDA toolkit (for wheel installs)
  - A C++ compiler toolchain
  - 30+ minutes of build time

If no wheel matches

If you can’t find a compatible wheel for your exact versions, you have two options: