Description
TensorRT appears to handle ONNX GroupNormalization with num_groups=1 incorrectly.
For GroupNormalization(num_groups=1), the normalization should be computed across all channels and spatial dimensions in each sample, matching PyTorch torch.nn.functional.group_norm(..., num_groups=1) and ONNX Runtime.
However, TensorRT produces an output that matches per-channel instance normalization instead. In other words, TensorRT seems to normalize each channel independently, as if num_groups=C, rather than treating all channels as one group.
The same ONNX model runs correctly in ONNX Runtime and matches PyTorch.
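For reference, the expected and observed behaviors can be written out directly in NumPy. This is a minimal sketch based on my reading of the ONNX opset-21 GroupNormalization spec; `group_norm_ref` is a hypothetical helper for illustration, not part of any library:

```python
import numpy as np

def group_norm_ref(x, scale, bias, num_groups, eps=1e-5):
    # Reference GroupNormalization: statistics are computed per
    # (sample, group) over the group's channels and all spatial positions.
    n, c, h, w = x.shape
    xg = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    y = ((xg - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)
    # Per-channel affine transform applied after normalization.
    return y * scale.reshape(1, c, 1, 1) + bias.reshape(1, c, 1, 1)

x = np.random.default_rng(0).standard_normal((1, 8, 4, 4)).astype(np.float32)
ones, zeros = np.ones(8, np.float32), np.zeros(8, np.float32)

# num_groups=1: one group spanning all channels (expected behavior).
expected = group_norm_ref(x, ones, zeros, num_groups=1)
# num_groups=C: per-channel statistics, i.e. instance normalization
# (what TensorRT appears to compute here).
observed = group_norm_ref(x, ones, zeros, num_groups=8)
print(np.max(np.abs(expected - observed)))  # clearly nonzero
```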
Environment
TensorRT Version: 10.16.1.11
NVIDIA GPU: N/A / not detected by nvidia-smi
NVIDIA Driver Version: N/A / nvidia-smi failed
CUDA Version: N/A / nvcc not found
CUDNN Version: N/A / torch.backends.cudnn.version() returned None
Operating System: Linux 6.17.0-20-generic x86_64, glibc 2.39
Python Version (if applicable): Python 3.11.15
Tensorflow Version (if applicable): N/A
PyTorch Version (if applicable): 2.11.0+cpu
Baremetal or Container (if so, version): Baremetal / non-Docker environment (/proc/1/cgroup: 0::/init.scope)
Additional package versions:
ONNX Version: 1.21.0
ONNX Runtime Version: 1.25.1
Relevant Files
Model link: N/A
The ONNX model is generated inline by the minimal reproducible script below.
Steps To Reproduce
```python
import numpy as np
import onnx
from onnx import helper, TensorProto
import torch
import onnxruntime as ort

from _trt_helper import build_engine_from_onnx, run_engine

C, H, W = 8, 4, 4

# Build a minimal ONNX model with a single GroupNormalization node,
# num_groups=1, identity scale, and zero bias.
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, C, H, W])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, C, H, W])
scale = helper.make_tensor("scale", TensorProto.FLOAT, [C], np.ones(C, np.float32))
bias = helper.make_tensor("bias", TensorProto.FLOAT, [C], np.zeros(C, np.float32))
node = helper.make_node(
    "GroupNormalization",
    ["X", "scale", "bias"],
    ["Y"],
    num_groups=1,
    epsilon=1e-5,
)
g = helper.make_graph([node], "g", [X], [Y], initializer=[scale, bias])
m = helper.make_model(g, opset_imports=[helper.make_opsetid("", 21)])
m.ir_version = 10
onnx.checker.check_model(m)
onnx_bytes = m.SerializeToString()

x = np.random.default_rng(0).standard_normal((1, C, H, W)).astype(np.float32)

# Run the model through TensorRT (FP32).
eng, _ = build_engine_from_onnx(onnx_bytes, fp16=False)
trt_out = run_engine(
    eng,
    {"X": x},
    ["Y"],
    [(1, C, H, W)],
    [np.float32],
)["Y"]

# Reference 1: PyTorch group_norm with num_groups=1.
torch_out = torch.nn.functional.group_norm(
    torch.from_numpy(x),
    1,
    weight=torch.ones(C),
    bias=torch.zeros(C),
    eps=1e-5,
).numpy()

# Reference 2: ONNX Runtime on the same model bytes.
ort_out = ort.InferenceSession(
    onnx_bytes,
    providers=["CPUExecutionProvider"],
).run(["Y"], {"X": x})[0]

# Reference 3: per-channel instance normalization, for comparison.
instance_out = np.zeros_like(x)
for c in range(C):
    instance_out[0, c] = (
        x[0, c] - x[0, c].mean()
    ) / np.sqrt(x[0, c].var() + 1e-5)

print("TRT[0,0,0,:4]:     ", trt_out[0, 0, 0, :4])
print("torch[0,0,0,:4]:   ", torch_out[0, 0, 0, :4])
print("ORT[0,0,0,:4]:     ", ort_out[0, 0, 0, :4])
print("instance[0,0,0,:4]:", instance_out[0, 0, 0, :4])
print("max|TRT - torch|:   ", float(np.max(np.abs(trt_out - torch_out))))
print("max|TRT - ORT|:     ", float(np.max(np.abs(trt_out - ort_out))))
print("max|TRT - instance|:", float(np.max(np.abs(trt_out - instance_out))))

# Both assertions hold with TensorRT 10.16.1.11, demonstrating the bug:
# TRT disagrees with the group-norm references but matches instance norm.
assert np.max(np.abs(trt_out - torch_out)) > 1e-2
assert np.max(np.abs(trt_out - instance_out)) < 1e-4
```
Commands or scripts: save the script above (e.g. as `repro.py`) and run `python repro.py`.
Have you tried the latest release?: Yes, reproduced with TensorRT 10.16.1.11.
Attach the captured .json and .bin files from TensorRT's API Capture tool if you're on an x86_64 Unix system: Not attached; the issue is reproducible with the self-contained Python script above.
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Yes. ONNX Runtime and PyTorch agree with each other. TensorRT differs from both and instead matches per-channel instance normalization.
Actual output:

```
TRT[0,0,0,:4]:      [0.4456725  0.15238671 1.031132   0.4219784 ]
torch[0,0,0,:4]:    [ 0.06682756 -0.2042495   0.6079537   0.04492765]
ORT[0,0,0,:4]:      [ 0.06682757 -0.20424952  0.6079538   0.04492767]
instance[0,0,0,:4]: [0.4456726  0.15238675 1.0311322  0.4219785 ]
max|TRT - torch|:    0.6626157760620117
max|TRT - ORT|:      0.6626157760620117
max|TRT - instance|: 2.384185791015625e-07
```
This suggests TensorRT is treating GroupNormalization(num_groups=1) like instance normalization instead of normalizing over the single group containing all channels.