Improve performance in python masking #719

Merged
xylar merged 1 commit into MPAS-Dev:master from xylar:improve-masking-performance
May 9, 2026
Collaborator

xylar commented May 8, 2026

1. Vectorized geometry creation (~17× speedup on polygon building)

_get_polygons: replaced the Python for loop building shapely.geometry.Polygon objects one-by-one with a single shapely.polygons(coords_3d) call. For a 1 M-cell mesh this drops from ~11.5 s to ~0.68 s.

Three region/projection functions: same pattern for points — shapely.points(numpy.stack([lon, lat], axis=-1)) replaces the old list comprehension.
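The vectorized pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the array `coords_3d` and the sample `lon`/`lat` values are made up, and it assumes every cell has the same number of vertices so the coordinates fit in a single `(n_cells, n_vertices, 2)` array.

```python
import numpy as np
import shapely

# Hypothetical input: two cells with four vertices each, stored as an
# array of shape (n_cells, n_vertices, 2) holding (lon, lat) pairs.
coords_3d = np.array([
    [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]],
    [[1.0, 0.0], [2.0, 0.0], [2.0, 1.0], [1.0, 1.0]],
])

# One vectorized call builds all polygons; unclosed rings are closed
# automatically by repeating the first vertex.
polygons = shapely.polygons(coords_3d)

# Same idea for points: stack lon and lat into an (n_points, 2) array.
lon = np.array([0.5, 1.5, 3.0])
lat = np.array([0.5, 0.5, 0.5])
points = shapely.points(np.stack([lon, lat], axis=-1))

# Which points fall inside the first cell
inside_first = shapely.contains(polygons[0], points)
```

Both calls return NumPy arrays of geometry objects, so the per-object Python overhead of a loop or list comprehension is avoided.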

2. Build STRtree once, query per feature (~20× speedup for 30 transects)

_compute_transect_masks and _compute_region_masks: the old code (through _compute_mask_from_shapes) rebuilt an STRtree from a small chunk of polygons for every chunk × every feature. A 500 K-cell mesh with 30 transects now builds one STRtree (0.12 s) and queries it 30 times (0.011 s total) instead of rebuilding ~15 000 small trees.
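A rough sketch of the "build once, query per feature" idea, under assumed inputs: the 4 × 4 grid of unit-square cells and the two transects below are invented for illustration and are not taken from the PR.

```python
import numpy as np
import shapely
from shapely import STRtree
from shapely.geometry import LineString

# Hypothetical mesh: a 4 x 4 grid of unit squares, cell index = row * 4 + col
x, y = np.meshgrid(np.arange(4.0), np.arange(4.0))
x, y = x.ravel(), y.ravel()
corners = np.stack([
    np.stack([x, y], axis=-1),
    np.stack([x + 1, y], axis=-1),
    np.stack([x + 1, y + 1], axis=-1),
    np.stack([x, y + 1], axis=-1),
], axis=1)  # shape (16, 4, 2)
cells = shapely.polygons(corners)

# Build the spatial index a single time for the whole mesh
tree = STRtree(cells)

# Hypothetical transects (features)
transects = [
    LineString([(0.5, 0.5), (3.5, 0.5)]),  # along the bottom row
    LineString([(0.5, 0.5), (0.5, 2.5)]),  # up the first column
]

# Query the same tree once per feature
masks = np.zeros((len(transects), len(cells)), dtype=bool)
for index, transect in enumerate(transects):
    hits = tree.query(transect, predicate="intersects")
    masks[index, hits] = True
```

The cost of building the tree is paid once, and each feature only pays for a query.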

3. Thread-parallel feature queries (no serialization overhead)

When pool is provided, a ThreadPoolExecutor with the same worker count is used to run feature queries in parallel. Shapely 2.x releases the GIL during geometric operations, so threads genuinely run concurrently without the pickle/IPC overhead that made the old multiprocessing chunk approach expensive.
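The threaded variant might look something like the sketch below. The helper `compute_feature_masks`, its `n_workers` argument, and the sample geometry are hypothetical; how the PR actually derives the worker count from `pool` is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import shapely
from shapely import STRtree
from shapely.geometry import LineString


def compute_feature_masks(cells, features, n_workers=2):
    """Return an (n_features, n_cells) boolean array (hypothetical helper)."""
    tree = STRtree(cells)  # built once and shared by all threads

    def query(feature):
        # Indices of the cells that intersect this feature
        return tree.query(feature, predicate="intersects")

    # Threads share the tree in memory, so nothing is pickled or copied
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        hits_per_feature = list(executor.map(query, features))

    masks = np.zeros((len(features), len(cells)), dtype=bool)
    for index, hits in enumerate(hits_per_feature):
        masks[index, hits] = True
    return masks


# Three unit squares side by side and two made-up transects
x0 = np.arange(3.0)
corners = np.stack([
    np.stack([x0, np.zeros(3)], axis=-1),
    np.stack([x0 + 1, np.zeros(3)], axis=-1),
    np.stack([x0 + 1, np.ones(3)], axis=-1),
    np.stack([x0, np.ones(3)], axis=-1),
], axis=1)
cells = shapely.polygons(corners)
features = [
    LineString([(0.2, 0.5), (0.8, 0.5)]),
    LineString([(1.5, 0.5), (2.5, 0.5)]),
]
masks = compute_feature_masks(cells, features)
```

`executor.map` preserves the order of `features`, so row `i` of the result corresponds to feature `i` regardless of which thread finished first.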

4. Flood-fill via scipy.sparse.csgraph.connected_components (~16× speedup)

_flood_fill_mask: the Python while True BFS loop (O(diameter × cells) Python iterations) is replaced by a pure-C scipy sparse graph path: build the adjacency in numpy → connected_components → keep components that contain a seed cell. Tested at 0.007 s vs 0.113 s on a 40 000-cell grid.
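As an illustration of the connected-components approach, here is a minimal sketch. The function name `flood_fill`, its arguments, and the six-cell chain are assumptions for the example; the real `_flood_fill_mask` signature and mesh connectivity arrays may differ.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components


def flood_fill(fillable, seeds, edges):
    """Fill outward from seed cells through fillable cells (hypothetical helper).

    fillable : bool array (n_cells,), cells the fill may pass through
    seeds    : bool array (n_cells,), cells where the fill starts
    edges    : int array (n_edges, 2), pairs of neighboring cell indices
    """
    n_cells = fillable.size
    # Keep only edges whose two cells are both fillable
    keep = fillable[edges[:, 0]] & fillable[edges[:, 1]]
    rows, cols = edges[keep, 0], edges[keep, 1]
    graph = coo_matrix((np.ones(rows.size), (rows, cols)),
                       shape=(n_cells, n_cells))
    # Label connected components, treating edges as undirected
    _, labels = connected_components(graph, directed=False)
    # Keep every fillable cell whose component contains a fillable seed
    seed_labels = np.unique(labels[seeds & fillable])
    return fillable & np.isin(labels, seed_labels)


# Hypothetical chain of six cells: 0-1-2-3-4-5, with cell 2 blocking the fill
edges = np.array([[0, 1], [1, 2], [2, 3], [3, 4], [4, 5]])
fillable = np.array([True, True, False, True, True, True])
seeds = np.array([True, False, False, False, False, False])
filled = flood_fill(fillable, seeds, edges)
```

In this example the fill starting at cell 0 reaches cell 1 but cannot cross the non-fillable cell 2, so cells 3 to 5 stay unfilled.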

5. Removed dead code

_compute_mask_from_shapes, _contains, and _intersects are deleted; functools.partial and progressbar imports removed with them.

Collaborator Author

xylar commented May 8, 2026

Mask creation (particularly for transects) has been a bottleneck in culling MPAS meshes in Compass and Polaris. My hope with this work is that we can reduce that bottleneck, perhaps significantly for large meshes.

Collaborator Author

xylar commented May 9, 2026

I ran the e3sm/init/icos30km/topo/cull test from Polaris on Chrysalis. Without these changes, I see:

  * step: cull_mask
          execution:        SUCCESS
          runtime:          0:14:33

With them, I see:

  * step: cull_mask
          execution:        SUCCESS
          runtime:          0:00:48

That's astounding! Results are bit-for-bit identical.

xylar merged commit 1c5a2c3 into MPAS-Dev:master May 9, 2026
5 checks passed
xylar deleted the improve-masking-performance branch May 9, 2026 08:28