[PyTorch/Common] Remove legacy FP8DS implementation #2959
cyanguwa merged 11 commits into NVIDIA:main from remove_fp8_v0
Conversation
Signed-off-by: Charlene Yang <8636796+cyanguwa@users.noreply.github.com>
Greptile Summary

This PR removes the legacy cuDNN v0 FP8 Delayed Scaling attention implementation, which supported only the T3HD layout with max_seq_len<=512 and head_dim=64.
Confidence Score: 5/5. Safe to merge — the removal is complete and internally consistent across all layers (CUDA kernel, C++ dispatch, Python bindings, context-parallel helpers, and tests). All ZInv references have been eliminated from every layer of the stack (verified via grep). The aux-tensor index arithmetic in fwd and bwd is consistent with the new two-element layout [S, rng_state]. The T3HD dispatch branch is cleanly removed from the backend selector and both fwd/bwd call sites. The renamed functions (dropping the _v1 suffix) match their new declarations and callers. No orphaned declarations or dangling includes remain. No files require special attention. The one stale docstring in test_attention.py is cosmetic only.
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["nvte_fused_attn_fwd / bwd"] --> B{dtype}
    B -->|FP8| C{qkv_format}
    B -->|F16/BF16| D[F16 arbitrary-seqlen backend]
    C -->|BSHD / SBHD / BHSD| E["fused_attn_fp8_fwd_impl\n(cuDNN FE v1.0+)"]
    C -->|T3HD removed| F["REMOVED: fused_attn_fp8_fwd_impl v0\n(cuDNN 8.9, seqlen<=512, d=64)"]
    C -->|other| G[NVTE_ERROR]
    E --> H["Aux tensors: S, rng_state"]
    F -.->|"was: S, ZInv, rng_state"| H
```
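The flowchart above notes that the forward pass now produces only two auxiliary tensors, [S, rng_state], where the legacy v0 path also returned a ZInv tensor. As a hedged illustration (not TE's actual API — the helper name and list-based layout below are hypothetical), consumer code unpacking the auxiliary context tensors would now look like this:

```python
# Hypothetical sketch of unpacking the auxiliary context tensors returned by
# the FP8 fused-attention forward. Per this PR, the layout is the two-element
# [S, rng_state]; the legacy three-element [S, ZInv, rng_state] layout is gone.

def unpack_aux_tensors(aux_ctx_tensors):
    """Split the aux tensor list under the new two-element layout."""
    if len(aux_ctx_tensors) != 2:
        raise ValueError(
            f"expected [S, rng_state], got {len(aux_ctx_tensors)} tensors; "
            "the legacy [S, ZInv, rng_state] layout was removed"
        )
    softmax_stats, rng_state = aux_ctx_tensors
    return softmax_stats, rng_state
```

Keeping the index arithmetic consistent with this two-element layout in both fwd and bwd is what the review summary above verified.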
Reviews (7): Last reviewed commit: "Merge branch 'main' into remove_fp8_v0"
/te-ci L0
/te-ci pytorch L0
Description
This PR removes a legacy path of the FP8 Delayed Scaling implementation from TE 1.6.0. This legacy path supports only T3HD with max_seq_len<=512, head_dim=64, and the padding mask. cudnn-frontend will remove its pre-FORT hand-written FMHA kernels (MR 2829); hence the removal of this FP8 implementation here. General THD support for FP8 will be added in future PRs.
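With the legacy path gone, an FP8 request using the T3HD layout no longer dispatches to the cuDNN v0 kernel and instead errors out. The following is a minimal sketch of that dispatch consequence; the function and constant names are illustrative, not TE's real backend selector:

```python
# Hypothetical sketch of FP8 attention backend selection after this PR.
# BSHD/SBHD/BHSD formats dispatch to the cuDNN frontend v1.0+ implementation;
# T3HD, which previously mapped to the v0 kernel (cuDNN 8.9, seqlen<=512,
# head_dim=64), now has no backend and raises.

SUPPORTED_FP8_QKV_FORMATS = {"bshd", "sbhd", "bhsd"}

def select_fp8_attention_backend(qkv_format: str) -> str:
    if qkv_format in SUPPORTED_FP8_QKV_FORMATS:
        return "fused_attn_fp8_impl"  # cuDNN FE v1.0+ path
    raise NotImplementedError(
        f"FP8 attention does not support qkv_format={qkv_format!r}"
    )
```

Callers that relied on the T3HD special case would need to repack to BSHD/SBHD until the general THD support mentioned above lands.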
Type of change
Changes
See Description.
Checklist: