Install flash-attn (Flash Attention)

This guide is optimized for the most common goal: installing flash-attn quickly, without compiling it yourself. Use the wheel finder to get an install command that points directly at a matching wheel URL.

Recommended

Install from a prebuilt wheel (no compile).

Install with prebuilt wheels →

If you prefer uv

Same wheel install, faster resolver.

Install with uv →

Fallback

Build from source when no wheel matches.

Install from source →
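
If you do end up building from source, the route the flash-attn project itself documents is roughly the following; treat it as a sketch, since exact flags vary by release and compilation can take a long time (MAX_JOBS limits parallel compile jobs on machines with little RAM):

    pip install ninja
    MAX_JOBS=4 pip install flash-attn --no-build-isolation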

Quick start (prebuilt wheel)

  1. Open the wheel finder and choose your platform, flash-attn version, Python, PyTorch, and CUDA.
  2. Copy the install command. You’ll get commands like:
    pip install https://example.com/flash_attn-...whl
    uv pip install https://example.com/flash_attn-...whl
  3. Verify the install:
    python -c "import flash_attn; print('flash_attn ok')"
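
If the import succeeds and you want a functional check, the sketch below runs one attention forward pass. It assumes a CUDA GPU and the flash-attn 2.x flash_attn_func entry point, which takes fp16/bf16 tensors shaped (batch, seqlen, nheads, headdim):

    # Hedged smoke test: requires a CUDA GPU; fp16/bf16 inputs only.
    import torch
    from flash_attn import flash_attn_func

    q = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
    k = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)
    v = torch.randn(1, 128, 8, 64, device="cuda", dtype=torch.float16)

    out = flash_attn_func(q, k, v, causal=True)  # output has the same shape as q
    print("flash_attn forward ok:", tuple(out.shape))  # expected: (1, 128, 8, 64)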

If something fails

Most install failures come from a version mismatch between Python, PyTorch, CUDA, and the wheel you picked.
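
One way to see what your environment actually provides, so you can compare it against the fields in the wheel filename, is a short check like this sketch (the torch attributes used are standard PyTorch and do not require flash-attn to be installed):

    # Hedged diagnostic sketch: print the versions a flash-attn wheel must match.
    import sys
    import torch

    print("python   :", sys.version.split()[0])
    print("torch    :", torch.__version__)
    print("cuda     :", torch.version.cuda)   # CUDA version PyTorch was built against
    print("cxx11abi :", torch._C._GLIBCXX_USE_CXX11_ABI)  # ABI flag encoded in some wheel names
    print("gpu      :", torch.cuda.is_available())

Use these pages to recover quickly: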