
Incorrect GroupNormalization result of TensorRT 10.16.1.11 when running ONNX GroupNormalization(num_groups=1) on GPU #4756

@ALinrunrun

Description


TensorRT appears to handle ONNX GroupNormalization with num_groups=1 incorrectly.

For GroupNormalization(num_groups=1), the normalization should be computed across all channels and spatial dimensions in each sample, matching PyTorch torch.nn.functional.group_norm(..., num_groups=1) and ONNX Runtime.

However, TensorRT produces an output that matches per-channel instance normalization instead. In other words, TensorRT seems to normalize each channel independently, as if num_groups=C, rather than treating all channels as one group.
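The difference between the two interpretations is easy to see in plain NumPy. This is a minimal sketch (independent of the repro script below) contrasting the expected `num_groups=1` behavior with per-channel instance normalization; the shapes and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 2, 2)).astype(np.float32)  # N=1, C=4, H=W=2
eps = 1e-5

# num_groups=1: one mean/var over all channels and spatial positions per sample
group_out = (x - x.mean(axis=(1, 2, 3), keepdims=True)) / np.sqrt(
    x.var(axis=(1, 2, 3), keepdims=True) + eps
)

# num_groups=C (instance norm): one mean/var per channel per sample
inst_out = (x - x.mean(axis=(2, 3), keepdims=True)) / np.sqrt(
    x.var(axis=(2, 3), keepdims=True) + eps
)

print(np.max(np.abs(group_out - inst_out)))  # nonzero for generic inputs
```

TensorRT's output matches `inst_out`, while PyTorch and ONNX Runtime match `group_out`.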

The same ONNX model runs correctly in ONNX Runtime and matches PyTorch.

Environment

TensorRT Version: 10.16.1.11

NVIDIA GPU: N/A / not detected by nvidia-smi

NVIDIA Driver Version: N/A / nvidia-smi failed

CUDA Version: N/A / nvcc not found

CUDNN Version: N/A / torch.backends.cudnn.version() returned None

Operating System: Linux 6.17.0-20-generic x86_64, glibc 2.39

Python Version (if applicable): Python 3.11.15

Tensorflow Version (if applicable): N/A

PyTorch Version (if applicable): 2.11.0+cpu

Baremetal or Container (if so, version): Baremetal / non-Docker environment (/proc/1/cgroup: 0::/init.scope)

Additional package versions:

ONNX Version: 1.21.0
ONNX Runtime Version: 1.25.1

Relevant Files

Model link: N/A

The ONNX model is generated inline by the minimal reproducible script below.

Steps To Reproduce

import numpy as np
import onnx
from onnx import helper, TensorProto
import torch
import onnxruntime as ort
# _trt_helper is a local helper module (not shown) wrapping TensorRT engine build/inference
from _trt_helper import build_engine_from_onnx, run_engine

C, H, W = 8, 4, 4
X = helper.make_tensor_value_info("X", TensorProto.FLOAT, [1, C, H, W])
Y = helper.make_tensor_value_info("Y", TensorProto.FLOAT, [1, C, H, W])

scale = helper.make_tensor("scale", TensorProto.FLOAT, [C], np.ones(C, np.float32))
bias = helper.make_tensor("bias", TensorProto.FLOAT, [C], np.zeros(C, np.float32))

node = helper.make_node(
    "GroupNormalization",
    ["X", "scale", "bias"],
    ["Y"],
    num_groups=1,
    epsilon=1e-5,
)

g = helper.make_graph([node], "g", [X], [Y], initializer=[scale, bias])
m = helper.make_model(g, opset_imports=[helper.make_opsetid("", 21)])
m.ir_version = 10
onnx.checker.check_model(m)
onnx_bytes = m.SerializeToString()

x = np.random.default_rng(0).standard_normal((1, C, H, W)).astype(np.float32)

eng, _ = build_engine_from_onnx(onnx_bytes, fp16=False)
trt_out = run_engine(
    eng,
    {"X": x},
    ["Y"],
    [(1, C, H, W)],
    [np.float32],
)["Y"]

torch_out = torch.nn.functional.group_norm(
    torch.from_numpy(x),
    1,
    weight=torch.ones(C),
    bias=torch.zeros(C),
    eps=1e-5,
).numpy()

ort_out = ort.InferenceSession(
    onnx_bytes,
    providers=["CPUExecutionProvider"],
).run(["Y"], {"X": x})[0]

instance_out = np.zeros_like(x)
for c in range(C):
    instance_out[0, c] = (
        x[0, c] - x[0, c].mean()
    ) / np.sqrt(x[0, c].var() + 1e-5)

print("TRT[0,0,0,:4]:     ", trt_out[0, 0, 0, :4])
print("torch[0,0,0,:4]:   ", torch_out[0, 0, 0, :4])
print("ORT[0,0,0,:4]:     ", ort_out[0, 0, 0, :4])
print("instance[0,0,0,:4]:", instance_out[0, 0, 0, :4])
print("max|TRT - torch|:", float(np.max(np.abs(trt_out - torch_out))))
print("max|TRT - ORT|:  ", float(np.max(np.abs(trt_out - ort_out))))
print("max|TRT - instance|:", float(np.max(np.abs(trt_out - instance_out))))

assert np.max(np.abs(trt_out - torch_out)) > 1e-2
assert np.max(np.abs(trt_out - instance_out)) < 1e-4

Commands or scripts: run the Python script above.

Have you tried the latest release?: Yes, reproduced with TensorRT 10.16.1.11.

Attach the captured .json and .bin files from TensorRT's API Capture tool if you're on an x86_64 Unix system: Not attached. The issue is reproducible from the self-contained Python script above.

Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt): Yes. ONNX Runtime and PyTorch agree with each other. TensorRT differs from both and instead matches per-channel instance normalization.

Actual output:

TRT[0,0,0,:4]:      [0.4456725  0.15238671 1.031132   0.4219784 ]
torch[0,0,0,:4]:    [ 0.06682756 -0.2042495   0.6079537   0.04492765]
ORT[0,0,0,:4]:      [ 0.06682757 -0.20424952  0.6079538   0.04492767]
instance[0,0,0,:4]: [0.4456726  0.15238675 1.0311322  0.4219785 ]
max|TRT - torch|: 0.6626157760620117
max|TRT - ORT|:   0.6626157760620117
max|TRT - instance|: 2.384185791015625e-07

This suggests TensorRT is treating GroupNormalization(num_groups=1) like instance normalization instead of normalizing over the single group containing all channels.

Metadata

Labels: Module:ONNX (Issues relating to ONNX usage and import)