Install flash-attn on Windows

Complete guide to installing Flash Attention on Windows. Use prebuilt wheels to skip compilation issues and get flash-attn running in seconds.

Quick Start: Windows Installation

  1. Check your environment:
    python --version
    python -c "import torch; print(torch.__version__, torch.version.cuda)"
  2. Find your Windows wheel: go to the wheel finder and select Windows AMD64 as your platform.
  3. Install with pip:
    pip install https://github.com/.../flash_attn-...-win_amd64.whl
  4. Verify:
    python -c "import flash_attn; print('flash_attn installed successfully!')"
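The checks in step 1 can be bundled into one small script that prints every field the wheel finder asks for. This is a minimal sketch using only the standard library; `torch` is imported optionally, since you may run it before PyTorch is installed.

```python
# Environment report for picking a flash-attn wheel.
# Standard library only; torch is optional.
import platform
import sys

def environment_report() -> dict:
    """Collect the fields a wheel is matched against: Python version,
    OS/architecture, and (if installed) the PyTorch version and the
    CUDA version PyTorch was built with."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        "platform": f"{platform.system()} {platform.machine()}",
    }
    try:
        import torch  # only present once PyTorch is installed
        info["torch"] = torch.__version__
        info["cuda"] = torch.version.cuda
    except ImportError:
        info["torch"] = info["cuda"] = "not installed"
    return info

if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

On a typical setup this prints four lines (python, platform, torch, cuda) that map directly onto the wheel finder's dropdowns.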

Windows Wheel Availability

Windows prebuilt wheels are available for these configurations:

Python Versions

  • Python 3.10
  • Python 3.11
  • Python 3.12

PyTorch Versions

  • PyTorch 2.0.x
  • PyTorch 2.1.x
  • PyTorch 2.2.x
  • PyTorch 2.3.x
  • PyTorch 2.4.x

CUDA Versions

  • CUDA 11.8
  • CUDA 12.1
  • CUDA 12.2
  • CUDA 12.4

Note: Not all combinations are available. Use the wheel finder to check your specific configuration.
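As a first pass, you can test your configuration against the matrix above in a few lines of Python. Note this only checks that each component is in the published lists; as the note says, not every combination exists, so the wheel finder remains the source of truth.

```python
# Sanity check against the version matrix listed above.
# Passing does NOT guarantee a wheel exists for the exact combination.
SUPPORTED_PYTHON = {"3.10", "3.11", "3.12"}
SUPPORTED_TORCH = {"2.0", "2.1", "2.2", "2.3", "2.4"}
SUPPORTED_CUDA = {"11.8", "12.1", "12.2", "12.4"}

def in_matrix(python: str, torch: str, cuda: str) -> bool:
    """Return True if each component appears in the published matrix.
    torch is matched on its major.minor prefix ('2.3.1' -> '2.3')."""
    torch_mm = ".".join(torch.split(".")[:2])
    return (python in SUPPORTED_PYTHON
            and torch_mm in SUPPORTED_TORCH
            and cuda in SUPPORTED_CUDA)

print(in_matrix("3.11", "2.3.1", "12.1"))  # True
print(in_matrix("3.9", "2.3.1", "12.1"))   # False: no Python 3.9 wheels
```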

Alternative: Use WSL2

WSL2 (Windows Subsystem for Linux) provides full Linux compatibility with better wheel availability:

  • More wheel combinations available
  • Full CUDA support via WSL2
  • Same workflow as Linux servers

In the wheel finder, select "Linux x86_64" when using WSL2.

No Windows wheel found?

If no wheel matches your Windows configuration, you have two fallbacks: switch to WSL2 (see above) to use the broader selection of Linux wheels, or build from source with Visual Studio Build Tools and the CUDA Toolkit installed.

Common Windows Installation Issues

DLL load failed / ImportError

Usually caused by a CUDA version mismatch between PyTorch and the wheel. Ensure your PyTorch CUDA version matches the wheel's CUDA tag. Check with: python -c "import torch; print(torch.version.cuda)"
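Community wheel filenames commonly embed a CUDA tag such as `+cu121` (the exact naming varies by builder), so you can compare that tag against `torch.version.cuda` before installing. A sketch under that filename-convention assumption; the example filenames are illustrative:

```python
import re

def cuda_tag_matches(wheel_filename: str, torch_cuda: str) -> bool:
    """Compare the cuXYZ tag embedded in a wheel filename (a common
    community convention, e.g. '+cu121') against torch.version.cuda
    (e.g. '12.1'). Returns False if no tag is found."""
    match = re.search(r"cu(\d+)", wheel_filename)
    if match is None or torch_cuda is None:
        return False
    tag = match.group(1)              # e.g. '121'
    major, minor = tag[:-1], tag[-1]  # '12', '1'
    return torch_cuda == f"{major}.{minor}"

# A torch built against CUDA 12.1 matches a cu121 wheel, not a cu118 one:
print(cuda_tag_matches("flash_attn-2.5.8+cu121torch2.3-cp311-cp311-win_amd64.whl", "12.1"))  # True
print(cuda_tag_matches("flash_attn-2.5.8+cu118torch2.3-cp311-cp311-win_amd64.whl", "12.1"))  # False
```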

CUDA driver version insufficient

Update your NVIDIA GPU drivers from nvidia.com/Download. CUDA 12.x wheels require driver version 525+.

wheel not supported on this platform

Make sure you're downloading a Windows wheel (filename ends with win_amd64.whl). Linux wheels won't work on native Windows.
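A one-line check catches this before pip does. This assumes the standard wheel platform-tag convention (64-bit Windows wheels end in `win_amd64.whl`); the filenames below are illustrative:

```python
def is_windows_wheel(wheel_filename: str) -> bool:
    """True only for wheels tagged for 64-bit Windows. Linux wheels
    (linux_x86_64 / manylinux tags) trigger 'not supported on this
    platform' when installed on native Windows."""
    return wheel_filename.endswith("win_amd64.whl")

print(is_windows_wheel("flash_attn-2.5.8-cp311-cp311-win_amd64.whl"))     # True
print(is_windows_wheel("flash_attn-2.5.8-cp311-cp311-linux_x86_64.whl"))  # False
```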

Windows Installation FAQ

Does flash-attn officially support Windows?

Flash Attention has limited official Windows support, but community-built wheels are available for many Python/PyTorch/CUDA combinations. Use our wheel finder to check availability.

Why does flash-attn fail to build on Windows?

Building flash-attn from source on Windows requires Visual Studio Build Tools, CUDA Toolkit, and specific compiler configurations. Prebuilt wheels avoid all these issues.

Which Windows versions are supported?

Prebuilt wheels work on Windows 10 and Windows 11 (64-bit). Windows Server 2019+ is also supported for deployment scenarios.

Can I use WSL instead?

Yes! WSL2 with Ubuntu is a great alternative. Linux wheels have better availability and you get full CUDA support. Select "Linux x86_64" in the wheel finder.