Pull requests: NVIDIA/TransformerEngine
[PyTorch] Use data shape for MXFP8 pointer swizzle
org-contribution
#3070
opened Jun 1, 2026 by
jepio
Member
7 of 13 tasks
[Pytorch] Add variable-K Cutlass GroupGEMM for fine-grained MoE wgrad
community-contribution
PRs from external contributor outside the core maintainers, representing community-driven work.
#3069
opened Jun 1, 2026 by
cassiewilliam
Contributor
6 of 8 tasks
Add FP16 error modes for NVFP4 4over6
community-contribution
#3068
opened Jun 1, 2026 by
zianglih
Contributor
9 of 13 tasks
[Fix] Fix CUTLASS grouped GEMM segfault for empty groups
community-contribution
#3067
opened Jun 1, 2026 by
Baibaifan
[PyTorch] Propagate skip_fp8_weight_update in GroupedLinear during FP8 CUDA graph capture
community-contribution
#3065
opened May 31, 2026 by
LeSingh1
Contributor
Add public NVFP4 quantize API
community-contribution
#3064
opened May 31, 2026 by
zianglih
Contributor
8 of 13 tasks
fix unfused padding causal sdpa
community-contribution
#3063
opened May 31, 2026 by
hungryGeek16
increasing precision tolerance
community-contribution
#3060
opened May 29, 2026 by
francesco-bertolotti
Contributor
4 of 13 tasks
[JAX] Grouped quant+GEMM custom partitioning rules
#3058
opened May 28, 2026 by
jberchtold-nvidia
Collaborator
Draft
13 tasks
[Common/PyTorch] bugfix: Token-linear fused RoPE impl. for THD tensors.
community-contribution
#3057
opened May 28, 2026 by
plugyawn
7 of 13 tasks
[JAX] [PyT] [Common] Enable D=256 BWD cuDNN fused attn for Blackwell CC 10.x
#3056
opened May 28, 2026 by
KshitijLakhani
Collaborator
7 of 13 tasks
[PyTorch] Propagate FP8 graph weight update flag in GroupedLinear
community-contribution
#3052
opened May 28, 2026 by
allenphilipj
[PyTorch] Integrate the cuBLAS single GEMM MXFP8 NN, NT support for sm120
#3050
opened May 28, 2026 by
KshitijLakhani
Collaborator
Draft
7 of 13 tasks
Enable NVFP4 fused grouped MLP
community-contribution
org-contribution
#3048
opened May 27, 2026 by
sraman-rgb
Contributor
1 of 13 tasks
Feat/selective offload on srelu fuser
community-contribution
#3047
opened May 27, 2026 by
lhb8125
Contributor
13 tasks
Add NVFP4 per-token quantization recipe
community-contribution
[PyTorch Debug] Add scale_inv_std stat and skip NVFP4 layers in LogFp8TensorStats
#3044
opened May 26, 2026 by
pggPL
Collaborator
docs: expand comm gemm overlap guidance
community-contribution
#3043
opened May 26, 2026 by
omribz156
5 of 13 tasks
Use cuDNN for row-scaled NVFP4 grouped GEMM
community-contribution
[PyTorch debug] FakeQuant: support Float8BlockScaling and fix MoE / w…
community-contribution
#3040
opened May 25, 2026 by
shangxiaokang
Draft
13 tasks
[JAX] Expert Parallelism: JAX primitives + VJPs
#3036
opened May 22, 2026 by
phu0ngng
Collaborator
8 of 13 tasks
Expert Parallelism: common C API + NCCL EP backend
#3034
opened May 22, 2026 by
phu0ngng
Collaborator
8 of 13 tasks