Graph kernel nodes don't keep kernel argument objects alive

## Summary

`GraphDefinition.launch()` does not extend the lifetime of Python kernel-argument objects to the lifetime of the graph. Inside `GN_launch`, a `ParamHolder` is built from the call's `args` tuple and holds the only strong references to those objects; when `GN_launch` returns, the `ParamHolder` is destroyed and its references drop.

The CUDA driver records the raw pointer values into the kernel node, but it does not (and cannot) retain the Python objects those pointers refer to. So if a kernel argument is reachable only through the `.launch()` call - for example, a `Buffer` passed inline - it can be garbage-collected before the graph runs. The graph then holds a stale device pointer, and instantiating/launching it produces undefined behavior: typically memory corruption or a crash.

`KernelHandle` and `EventHandle` are already retained via `_attach_user_object` for exactly this reason. Kernel arguments need the same treatment.

## Reproducer

```python
def test_kernel_args_buffer_not_kept_alive():
    """Buffer passed as a kernel arg is not kept alive by the graph.

    The ParamHolder built in GN_launch holds the only references to the
    kernel args and is destroyed when GN_launch returns. With no other
    reference to the Buffer, its device memory is freed and the graph
    is left with a stale pointer.
    """
    import ctypes
    import gc
    import weakref

    from cuda.bindings import driver as drv
    from cuda.core import Device, DeviceMemoryResource, LaunchConfig, Program, ProgramOptions
    from cuda.core.graph import GraphDefinition

    dev = Device()
    dev.set_current()
    stream = dev.create_stream()
    mr = DeviceMemoryResource(dev)

    arch = "".join(f"{i}" for i in dev.compute_capability)
    prog = Program(
        "__global__ void add_one(int *a) { *a += 1; }",
        code_type="c++",
        options=ProgramOptions(std="c++17", arch=f"sm_{arch}"),
    )
    add_one = prog.compile("cubin", name_expressions=("add_one",)).get_kernel("add_one")
    config = LaunchConfig(grid=1, block=1)

    buf = mr.allocate(ctypes.sizeof(ctypes.c_int), stream=stream)
    stream.sync()
    buf_weak = weakref.ref(buf)
    drv.cuMemcpyHtoD(int(buf.handle), (ctypes.c_int * 1)(0), ctypes.sizeof(ctypes.c_int))

    g = GraphDefinition()
    g.launch(config, add_one, buf)

    del buf
    gc.collect()
    assert buf_weak() is None  # nothing in the graph kept the Buffer alive

    g.instantiate().launch(stream)
    stream.sync()  # crash or memory corruption
```

## Proposed fix

In `GN_launch`, attach the `ParamHolder` (or just its `kernel_args` tuple) to the graph as a CUDA user object, mirroring the existing handling of `KernelHandle`. That ties the Python-side argument lifetimes to the graph itself.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Graph kernel nodes don't keep kernel argument objects alive #2039

Summary

Reproducer

Proposed fix

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Graph kernel nodes don't keep kernel argument objects alive #2039

Description

Summary

Reproducer

Proposed fix

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions