Improve performance in python masking #719
Merged
xylar merged 1 commit into MPAS-Dev:master on May 9, 2026
Conversation
Collaborator
Author
Mask creation (particularly for transects) has been a bottleneck in culling MPAS meshes in Compass and Polaris. My hope with this work is that we can reduce that bottleneck, perhaps significantly for large meshes.
Collaborator
Author
I ran the tests; with these changes, I see: That's astounding! Results are bit-for-bit.
1. Vectorized geometry creation (~17× speedup on polygon building)
`_get_polygons`: replaced the Python `for` loop building `shapely.geometry.Polygon` objects one-by-one with a single `shapely.polygons(coords_3d)` call. For a 1 M-cell mesh this drops from ~11.5 s to ~0.68 s. **Three region/projection functions**: same pattern for points: `shapely.points(numpy.stack([lon, lat], axis=-1))` replaces the old list comprehension.
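The vectorized construction can be sketched as follows; the mesh data below is synthetic, and the `(n_cells, n_vertices, 2)` coordinate layout is an assumption for illustration, not the PR's exact code:

```python
import numpy as np
import shapely

# Synthetic stand-in for a mesh: 4 triangular cells, 3 vertices each.
# coords_3d has shape (n_cells, n_vertices, 2): one ring of corners per cell.
rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 10.0, size=(4, 1, 2))
offsets = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
coords_3d = centers + offsets  # (4, 3, 2)

# One vectorized call replaces the per-cell Python loop
# (shapely closes each ring automatically):
polygons = shapely.polygons(coords_3d)  # array of 4 Polygon objects

# Same pattern for points:
lon = rng.uniform(-180.0, 180.0, size=100)
lat = rng.uniform(-90.0, 90.0, size=100)
points = shapely.points(np.stack([lon, lat], axis=-1))  # array of 100 Points
```

The speedup comes from moving the per-geometry construction loop from Python into shapely's C layer, which is what the single `shapely.polygons` call does.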
2. Build STRtree once, query per feature (~20× speedup for 30 transects)
`_compute_transect_masks` and `_compute_region_masks`: the old code (through `_compute_mask_from_shapes`) rebuilt an `STRtree` from a *small chunk* of polygons for every chunk × every feature. A 500 K-cell mesh with 30 transects now builds one `STRtree` (0.12 s) and queries it 30 times (0.011 s total) instead of rebuilding ~15 000 small trees.
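A minimal sketch of the build-once/query-per-feature pattern, using a toy grid of unit squares in place of mesh cell polygons (all names here are illustrative, not the PR's code):

```python
import numpy as np
import shapely
from shapely import STRtree

# 100 unit-square "cells" on a 10x10 grid, built vectorized.
ox, oy = np.meshgrid(np.arange(10.0), np.arange(10.0))
origins = np.stack([ox.ravel(), oy.ravel()], axis=-1)  # (100, 2)
ring = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
cells = shapely.polygons(origins[:, None, :] + ring)   # 100 Polygons

# Build the spatial index once for all cells ...
tree = STRtree(cells)

# ... then query it once per feature instead of rebuilding per chunk.
features = [shapely.box(0.5, 0.5, 2.5, 2.5), shapely.box(7.2, 7.2, 8.8, 8.8)]
for feature in features:
    hit_idx = tree.query(feature, predicate='intersects')  # cell indices
    mask = np.zeros(len(cells), dtype=bool)
    mask[hit_idx] = True
```

Each `query` call is a cheap index lookup, so the expensive tree construction is amortized over all features.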
3. Thread-parallel feature queries (no serialization overhead)
When `pool` is provided, a `ThreadPoolExecutor` with the same worker count is used to run feature queries in parallel. Shapely 2.x releases the GIL during geometric operations, so threads genuinely run concurrently without the pickle/IPC overhead that made the old multiprocessing chunk approach expensive.
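A sketch of the thread-parallel pattern, under the assumption that each feature query is an independent, read-only operation against the shared tree (names are illustrative):

```python
import numpy as np
import shapely
from shapely import STRtree
from concurrent.futures import ThreadPoolExecutor

# Shared, read-only spatial index over a toy 20x20 grid of unit squares.
ox, oy = np.meshgrid(np.arange(20.0), np.arange(20.0))
origins = np.stack([ox.ravel(), oy.ravel()], axis=-1)
ring = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
tree = STRtree(shapely.polygons(origins[:, None, :] + ring))

# One query region per "feature".
features = [shapely.box(i + 0.25, i + 0.25, i + 3.25, i + 3.25)
            for i in range(8)]

def query_one(feature):
    # Read-only query; shapely 2.x releases the GIL inside the C call,
    # so several of these can run concurrently on different threads.
    return tree.query(feature, predicate='intersects')

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(query_one, features))
```

Because no polygons or trees cross a process boundary, there is no pickling cost; the threads share the one `STRtree` in memory.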
4. Flood-fill via scipy.connected_components (~16× speedup)
`_flood_fill_mask`: the Python `while True` BFS loop (O(diameter × cells) Python iterations) is replaced by a pure-C scipy sparse-graph path: build the adjacency in numpy, run `connected_components`, and keep the components that contain a seed cell. Tested at 0.007 s vs 0.113 s on a 40 000-cell grid.
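The graph-based flood fill can be sketched on a toy mesh; `edges` and `seeds` below stand in for the real cell adjacency and seed cells:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Toy adjacency: 8 cells in two disjoint chains, 0-1-2-3 and 4-5-6-7.
edges = np.array([[0, 1], [1, 2], [2, 3], [4, 5], [5, 6], [6, 7]])
n_cells = 8
graph = csr_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])),
                   shape=(n_cells, n_cells))

# Pure-C path: label every connected component in one call ...
n_comp, labels = connected_components(graph, directed=False)

# ... then keep only the components that contain a seed cell.
seeds = np.array([0])                      # flood-fill starting cell(s)
mask = np.isin(labels, labels[seeds])      # cells reachable from the seeds
```

This replaces the per-iteration Python frontier expansion with one library call whose work is all done in compiled code.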
5. Removed dead code
`_compute_mask_from_shapes`, `_contains`, and `_intersects` are deleted; the `functools.partial` and `progressbar` imports are removed with them.