Improve performance in python masking #719
Merged
xylar merged 1 commit into MPAS-Dev:master on May 9, 2026
Conversation
Collaborator
Author
Mask creation (particularly for transects) has been a bottleneck in culling MPAS meshes in Compass and Polaris. My hope with this work is that we can reduce that bottleneck, perhaps significantly for large meshes.
Collaborator
Author
I ran the tests; with these changes, I see: That's astounding! Results are bit-for-bit.
1. Vectorized geometry creation (~17× speedup on polygon building)
`_get_polygons`: replaced the Python `for` loop building `shapely.geometry.Polygon` objects one-by-one with a single `shapely.polygons(coords_3d)` call. For a 1 M-cell mesh this drops from ~11.5 s to ~0.68 s. **Three region/projection functions**: same pattern for points: `shapely.points(numpy.stack([lon, lat], axis=-1))` replaces the old list comprehension.
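The vectorized construction can be sketched as follows; the mesh data below is synthetic, and the `(n_cells, n_vertices, 2)` coordinate layout is an assumption for illustration, not the PR's exact code:

```python
import numpy as np
import shapely

# Synthetic stand-in for a mesh: 4 triangular cells, 3 vertices each.
# coords_3d has shape (n_cells, n_vertices, 2): one ring of corners per cell.
rng = np.random.default_rng(0)
centers = rng.uniform(0.0, 10.0, size=(4, 1, 2))
offsets = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]])
coords_3d = centers + offsets  # (4, 3, 2)

# One vectorized call replaces the per-cell Python loop
# (shapely closes each ring automatically):
polygons = shapely.polygons(coords_3d)  # array of 4 Polygon objects

# Same pattern for points:
lon = rng.uniform(-180.0, 180.0, size=100)
lat = rng.uniform(-90.0, 90.0, size=100)
points = shapely.points(np.stack([lon, lat], axis=-1))  # array of 100 Points
```

The speedup comes from moving the per-geometry construction loop from Python into shapely's C layer, which is what the single `shapely.polygons` call does.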
2. Build STRtree once, query per feature (~20× speedup for 30 transects)
`_compute_transect_masks` and `_compute_region_masks`: the old code (through `_compute_mask_from_shapes`) rebuilt an `STRtree` from a *small chunk* of polygons for every chunk × every feature. A 500 K-cell mesh with 30 transects now builds one `STRtree` (0.12 s) and queries it 30 times (0.011 s total) instead of rebuilding ~15 000 small trees.
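A minimal sketch of the build-once/query-per-feature pattern, using a toy grid of unit squares in place of mesh cell polygons (all names here are illustrative, not the PR's code):

```python
import numpy as np
import shapely
from shapely import STRtree

# 100 unit-square "cells" on a 10x10 grid, built vectorized.
ox, oy = np.meshgrid(np.arange(10.0), np.arange(10.0))
origins = np.stack([ox.ravel(), oy.ravel()], axis=-1)  # (100, 2)
ring = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
cells = shapely.polygons(origins[:, None, :] + ring)   # 100 Polygons

# Build the spatial index once for all cells ...
tree = STRtree(cells)

# ... then query it once per feature instead of rebuilding per chunk.
features = [shapely.box(0.5, 0.5, 2.5, 2.5), shapely.box(7.2, 7.2, 8.8, 8.8)]
for feature in features:
    hit_idx = tree.query(feature, predicate='intersects')  # cell indices
    mask = np.zeros(len(cells), dtype=bool)
    mask[hit_idx] = True
```

Each `query` call is a cheap index lookup, so the expensive tree construction is amortized over all features.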
3. Thread-parallel feature queries (no serialization overhead)
When `pool` is provided, a `ThreadPoolExecutor` with the same worker count is used to run feature queries in parallel. Shapely 2.x releases the GIL during geometric operations, so threads genuinely run concurrently without the pickle/IPC overhead that made the old multiprocessing chunk approach expensive.
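A sketch of the thread-parallel pattern, under the assumption that each feature query is an independent, read-only operation against the shared tree (names are illustrative):

```python
import numpy as np
import shapely
from shapely import STRtree
from concurrent.futures import ThreadPoolExecutor

# Shared, read-only spatial index over a toy 20x20 grid of unit squares.
ox, oy = np.meshgrid(np.arange(20.0), np.arange(20.0))
origins = np.stack([ox.ravel(), oy.ravel()], axis=-1)
ring = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
tree = STRtree(shapely.polygons(origins[:, None, :] + ring))

# One query region per "feature".
features = [shapely.box(i + 0.25, i + 0.25, i + 3.25, i + 3.25)
            for i in range(8)]

def query_one(feature):
    # Read-only query; shapely 2.x releases the GIL inside the C call,
    # so several of these can run concurrently on different threads.
    return tree.query(feature, predicate='intersects')

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(query_one, features))
```

Because no polygons or trees cross a process boundary, there is no pickling cost; the threads share the one `STRtree` in memory.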
4. Flood-fill via scipy.connected_components (~16× speedup)
`_flood_fill_mask`: the Python `while True` BFS loop (O(diameter × cells) Python iterations) is replaced by a pure-C scipy sparse-graph path: build the adjacency in numpy, run `connected_components`, and keep the components that contain a seed cell. Tested at 0.007 s vs 0.113 s on a 40 000-cell grid.
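The graph-based flood fill can be sketched on a toy mesh; `edges` and `seeds` below stand in for the real cell adjacency and seed cells:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Toy adjacency: 8 cells in two disjoint chains, 0-1-2-3 and 4-5-6-7.
edges = np.array([[0, 1], [1, 2], [2, 3], [4, 5], [5, 6], [6, 7]])
n_cells = 8
graph = csr_matrix((np.ones(len(edges)), (edges[:, 0], edges[:, 1])),
                   shape=(n_cells, n_cells))

# Pure-C path: label every connected component in one call ...
n_comp, labels = connected_components(graph, directed=False)

# ... then keep only the components that contain a seed cell.
seeds = np.array([0])                      # flood-fill starting cell(s)
mask = np.isin(labels, labels[seeds])      # cells reachable from the seeds
```

This replaces the per-iteration Python frontier expansion with one library call whose work is all done in compiled code.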
5. Removed dead code
`_compute_mask_from_shapes`, `_contains`, and `_intersects` are deleted; the `functools.partial` and `progressbar` imports are removed with them.