OCSMesh Improvement #58

Open
SorooshMani-NOAA opened this issue Mar 14, 2024 · 3 comments
Labels
GSoC24, project idea (Designates a proposed project idea)

Comments

@SorooshMani-NOAA
Contributor

SorooshMani-NOAA commented Mar 14, 2024

Project Description

OCSMesh is an open-source tool for preparing unstructured meshes for coastal ocean models.

Getting started with OCSMesh:

The code was developed while trying to abide by object-oriented design principles. We'd like to make the API more intuitive and generally improve the performance of the application. The improvements we have in mind include, but are not limited to:

  • Better multiprocessing (distributed/MPI/Refactor with Dask/etc.)
    • Parallel size function calculation
  • Embed vertices or edges into a mesh
  • Mesh cleanup tools
  • Distance based size specification
  • Detect quad-likes in rivers and merge
  • Snap mesh to line/boundary
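As an illustration of the "distance based size specification" item, here is a minimal sketch assuming a simple linear-growth rule: the target element size is `hmin` on a feature (e.g. vertices sampled along a shoreline) and grows at `rate` with distance from it, capped at `hmax`. The function `distance_based_size` and its parameters are hypothetical and not part of the OCSMesh API, and the brute-force nearest-point search stands in for the spatial index a real implementation would use.

```python
import math


def distance_based_size(x, y, features, hmin, hmax, rate):
    """Hypothetical helper (not OCSMesh API): target element size at (x, y).

    The size is `hmin` at the nearest feature point and grows linearly with
    distance at `rate` (size units per distance unit), capped at `hmax`.
    """
    # Brute-force nearest distance; a real implementation would use a spatial index.
    d = min(math.hypot(x - fx, y - fy) for fx, fy in features)
    return min(hmax, hmin + rate * d)


# Feature points, e.g. vertices sampled along a shoreline (coordinates in metres).
shoreline = [(0.0, 0.0), (100.0, 0.0), (200.0, 0.0)]

# Sizes at increasing distance offshore from the middle shoreline vertex.
for y in (0.0, 500.0, 5000.0):
    print(y, distance_based_size(100.0, y, shoreline, hmin=50.0, hmax=2000.0, rate=0.5))
```

A linear ramp with a cap is only one choice; exponential or contour-dependent growth would plug into the same function signature.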

Expected Outcomes

An improved OCSMesh on a dev branch that is ready to be merged into main

Skills required

  • Python
    • Dask
    • Xarray
    • Multiprocessing
    • GIS libraries
  • Familiarity with mesh generation
  • Familiarity with numerical modeling

Mentor(s)

Soroosh Mani (@SorooshMani-NOAA), Atieh Alipour (@AtiehAlipour-NOAA)

Expected Project Size

350 hours

What is the difficulty of the project?

Intermediate

@SorooshMani-NOAA added the GSoC24 and project idea labels on Mar 14, 2024
@Jaeun-Shin

Hello @SorooshMani-NOAA and @AtiehAlipour-NOAA,

My name is Jaeun Shin, an incoming engineering student at the University of Cambridge. I have a strong interest in programming and environmental science. I've explored OCSMesh and am captivated by its mission to facilitate the preparation of unstructured meshes for coastal ocean modeling. I'm keen on contributing, especially in performance optimization and mesh generation enhancements, as part of Google Summer of Code.

Regarding distributed computing in OCSMesh, could you share insights on specific bottlenecks with the current multiprocessing approach? How do you see Dask improving these aspects, and what are the priorities for ensuring compatibility and performance across different environments?

Looking forward to potentially contributing and learning with your team!

@SorooshMani-NOAA
Contributor Author

Hi @Jaeun-Shin, thank you for showing interest in this project.

From a high level, in a simple workflow OCSMesh processes input raster files to:

  1. Extract the meshing domain (based on elevation, etc.)
  2. Calculate a field of element sizes within the domain.

Of course there are other possible workflows, but let's ignore them for now.
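To make the two steps concrete, a toy version on a small in-memory elevation grid might look like the following. The elevation band used for the domain (`zmin`/`zmax`) and the depth-proportional size rule are illustrative assumptions, not OCSMesh defaults, and `extract_domain`/`size_field` are hypothetical helpers; OCSMesh itself works on raster files rather than Python lists.

```python
# Toy elevation grid in metres (positive up); values are made up for illustration.
elevation = [
    [12.0, 3.0, -2.0, -15.0],
    [8.0, -1.0, -20.0, -60.0],
    [5.0, -5.0, -40.0, -120.0],
]


def extract_domain(dem, zmin, zmax):
    """Step 1 (hypothetical): mark cells whose elevation lies in [zmin, zmax]."""
    return [[zmin <= z <= zmax for z in row] for row in dem]


def size_field(dem, mask, hmin, hmax, scale):
    """Step 2 (hypothetical): size grows with depth inside the domain, capped at hmax."""
    sizes = []
    for row, mrow in zip(dem, mask):
        out = []
        for z, inside in zip(row, mrow):
            if not inside:
                out.append(None)  # outside the domain: no size needed
            else:
                depth = max(0.0, -z)  # land cells inside the band get depth 0
                out.append(min(hmax, hmin + scale * depth))
        sizes.append(out)
    return sizes


mask = extract_domain(elevation, zmin=-100.0, zmax=10.0)
sizes = size_field(elevation, mask, hmin=100.0, hmax=1000.0, scale=10.0)
for row in sizes:
    print(row)
```

Keeping the two steps as separate functions mirrors the workflow above: the domain mask can be reused with different size rules.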

The main bottleneck is in processing these rasters efficiently:

  • Partly because out-of-core and chunked computation are implemented "manually" (i.e., by explicitly writing to disk and holding onto file objects and paths). Because file objects are not picklable, this prevents OCSMesh from parallelizing parts of the workflow. Using Dask and better class structures, for example, can help with that.
  • There is also the limitation of using multiprocessing for parallelization: many modelers have access to HPC environments, where distributing the work properly can yield large performance gains, but multiprocessing can only use the resources available on a single machine. Dask is again a good example of a library that helps with this, although other libraries might provide the same benefits.
  • Both problems are especially visible when dealing with many high-quality DEMs, e.g., the 650 NCEI CUDEM tiles for the entire US East Coast.
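The pickling constraint in the first bullet can be reproduced with the standard library alone. In this sketch the chunk files and the `chunk_stats` helper are hypothetical stand-ins for raster tiles and a per-tile computation: an open file handle fails to pickle, while a plain path string round-trips, so a task that reopens its input by path is the shape a process pool or a Dask task graph can distribute.

```python
import os
import pickle
import tempfile


def chunk_stats(path):
    """Hypothetical per-chunk task: reopen the chunk by path and return (min, max)."""
    with open(path) as f:
        values = [float(line) for line in f if line.strip()]
    return min(values), max(values)


# Write two small "chunks" to disk as stand-ins for raster tiles.
tmpdir = tempfile.mkdtemp()
paths = []
for i, chunk in enumerate([[1.0, -3.5, 2.0], [10.0, 4.0, -0.5]]):
    path = os.path.join(tmpdir, f"chunk_{i}.txt")
    with open(path, "w") as f:
        f.write("\n".join(str(v) for v in chunk))
    paths.append(path)

# An open file handle cannot be pickled, so it cannot be shipped to a worker process.
with open(paths[0]) as handle:
    try:
        pickle.dumps(handle)
        handle_picklable = True
    except TypeError:
        handle_picklable = False

# A path string pickles fine, so the worker can reopen the file itself.
path_roundtrip = pickle.loads(pickle.dumps(paths[0])) == paths[0]

results = [chunk_stats(p) for p in paths]
print(handle_picklable, path_roundtrip, results)
```

The chunks are processed sequentially here for simplicity; because `chunk_stats` only takes a path, it could in principle be passed unchanged to `concurrent.futures.ProcessPoolExecutor.map` or wrapped with `dask.delayed`.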

Beyond that, there are other areas for improvement:

  • A raster object can be modified as a side effect of some OCSMesh processes (such as Collector-type size function or geometry objects), which makes the meshing process confusing.
  • OCSMesh projects everything to a local UTM zone and then starts meshing; this can be problematic when meshing large regions.
  • There are issues when meshing regions that cross the date line, e.g., Pacific domains.
  • OCSMesh needs better mesh cleanup tools (for mesh post-processing).
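The UTM and date-line items can be illustrated with a little arithmetic. Standard UTM zones are 6° wide, so a single local UTM projection is designed for a narrow band of longitude, and a naive min/max over longitudes breaks for domains that straddle ±180°. The helpers below are hypothetical, and the example extents (roughly 97°W to 67°W for a Gulf-to-Maine domain, 170°E to 170°W for a Pacific domain) are assumptions chosen for illustration. Shifting longitudes to [0, 360) only moves the seam to the prime meridian, so it is not a general fix.

```python
import math


def utm_zone(lon):
    """Standard 6-degree UTM zone number (1-60) for a longitude in degrees."""
    return int(math.floor((lon + 180.0) / 6.0)) % 60 + 1


def lon_span(lons):
    """Naive longitudinal extent in degrees; wrong for domains crossing the date line."""
    return max(lons) - min(lons)


def lon_span_wrapped(lons):
    """Extent after mapping longitudes to [0, 360), keeping Pacific domains contiguous."""
    shifted = [lon % 360.0 for lon in lons]
    return max(shifted) - min(shifted)


# Assumed west/east edges of a Gulf of Mexico to Maine domain.
west, east = -97.0, -67.0
zones = list(range(utm_zone(west), utm_zone(east) + 1))
print("UTM zones spanned:", zones)

# Assumed Pacific domain from 170E to 170W, straddling the date line.
pacific = [170.0, -170.0]
print("naive span:", lon_span(pacific), "wrapped span:", lon_span_wrapped(pacific))
```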

I hope that makes things a bit clearer.

@Jaeun-Shin

Thank you for the clarification; it highlights precisely where this project can be improved. I will dig into Dask and the codebase to propose a more effective solution in my proposal.
