Skip to content

Conversation

@bobpaw
Copy link
Collaborator

@bobpaw bobpaw commented Jun 3, 2025

Add METIS splitter/balancer

Add a direct METIS splitter and a balancer which only localizes graph connectivity. Adds msplit_2 and msplit_3 test cases.

Waiting on #486 to avoid rebasing.

@bobpaw
Copy link
Collaborator Author

bobpaw commented Jun 3, 2025

@cwsmith I want to move apf::tagOpposites from zoltan/apfInterElement.cc to some file in the apf/ folder (maybe "apfOpposites.cc". Is that ok? I use the function in my METIS balancer.

@cwsmith
Copy link
Contributor

cwsmith commented Jun 3, 2025

@bobpaw How about making it a free function in apfMesh.h ? IIRC, it tags the face (element) on the other side of a given edge (face).

@bobpaw
Copy link
Collaborator Author

bobpaw commented Jun 3, 2025

@cwsmith the only thing is there are some static helper functions. But if you prefer to group functions in apfMesh.cc then I can rename any conflicts.

I can also put the declaration in apfMesh.h and only keep the definitions separate which might be the best option.

@cwsmith
Copy link
Contributor

cwsmith commented Jun 4, 2025

/runtests

@github-actions
Copy link

github-actions bot commented Jun 4, 2025

Build Log
Simmetrix Test Result: success
Simmetrix + CGNS Test Result: success

@bobpaw bobpaw force-pushed the apw/lgmetis branch 2 times, most recently from 7ee49f5 to 4edc9ca Compare June 5, 2025 00:12
@bobpaw bobpaw marked this pull request as ready for review June 6, 2025 20:06
@bobpaw
Copy link
Collaborator Author

bobpaw commented Jun 7, 2025

Checks for the user example project are failing because the CI builds make static libraries. There is a line target_link_libraries(apf_mestis PRIVATE METIS::METIS) which adds the imported target from cmake/FindMETIS.cmake. This is great on paper because it saves lines linking libraries, includes, dependencies, but works best with shared libraries where the link path for METIS is written to apf_metis.so. Otherwise, CMake needs to know where to find METIS, and unfortunately I can't find a good way to export the IMPORTED library from FindMETIS. For now I just use the METIS_LIBRARIES/METIS_INCLUDE_DIRS as is done in other libraries (apf_zoltan). That adds the absolute path of the library to the exported apf_metis target.

@bobpaw bobpaw requested a review from cwsmith June 8, 2025 01:42
Copy link
Contributor

@cwsmith cwsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thank you.

A handful of relatively small comments are below.

In general, it would be nice to see more asserts on function inputs, but if other places in the code are already checking then it is fine.

Is there still an issue with shared vs static linking?

@cwsmith
Copy link
Contributor

cwsmith commented Jun 8, 2025

/runtests

@github-actions
Copy link

github-actions bot commented Jun 8, 2025

Build Log
Simmetrix Test Result: success
Simmetrix + CGNS Test Result: success

@bobpaw
Copy link
Collaborator Author

bobpaw commented Jun 9, 2025

In general, it would be nice to see more asserts on function inputs, but if other places in the code are already checking then it is fine.

Should I be adding PCU_ALWAYS_ASSERT or PCU_DEBUG_ASSERT in general?

@cwsmith
Copy link
Contributor

cwsmith commented Jun 9, 2025

I'd prefer ALWAYS... but if we documented a non-trivial performance hit then DEBUG would be fine. Alternatively, functions that are called infrequently can be ALWAYS while functions that are called many times (i.e., once per mesh entity) could use DEBUG.

@bobpaw
Copy link
Collaborator Author

bobpaw commented Jun 9, 2025

Maybe add a check that tolerance is within a reasonable range?

I checked for > 1.0 which is the only requirement that METIS documents -- not sure if you wanted an upper bound.

@bobpaw
Copy link
Collaborator Author

bobpaw commented Jun 10, 2025

(Rebased onto develop)

@bobpaw bobpaw requested a review from cwsmith June 10, 2025 19:58
@cwsmith
Copy link
Contributor

cwsmith commented Jun 10, 2025

/runtests

Copy link
Contributor

@cwsmith cwsmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thank you. I only have one minor suggestion below.

@github-actions
Copy link

Build Log
Simmetrix Test Result: success
Simmetrix + CGNS Test Result: success

@bobpaw bobpaw requested a review from cwsmith June 11, 2025 03:45
@cwsmith
Copy link
Contributor

cwsmith commented Jun 11, 2025

/runtests

@github-actions
Copy link

Build Log
Simmetrix Test Result: success
Simmetrix + CGNS Test Result: success

bobpaw added 3 commits June 11, 2025 11:08
- for balancing, transfer graph info back to rank 0 and call metis
  partitioner.

Signed-off-by: Aiden Woodruff <[email protected]>
- ma/CMakeLists.txt: add METIS compile/link flags.
- ma/maBalance.cc: move <numeric> and <mpi.h> to global scope.
- (runLocalizedGraphMetis): build adj_cts first and then adjncy to put
  edges in the correct locations.
- get number when assigning adjncy to put edges in the right spot.
- adjust iteration variable names.
- add #ifdef APW_LGMETIS_SER to partition without migrating.
- add #ifdef APW_LGMETIS_VIZ to make VTK renders of migration plan.
- calculate edge cut.
- test/CMakeLists.txt: add lgmsplit test.
- test/lgmsplit.cc: add test of localizedGraphMetis copied from
  split.cc.

Signed-off-by: Aiden Woodruff <[email protected]>
- ma/maBalance.cc: add fstream for writing debug info.
- (runLocalizedGraphMetis): only allow 3d mesh.
- remove early Allgather of global element counts.
    - rename nelm to n_owned_elm.
- update makeGlobal comment.
- use Exscan for global numbering.
- rename `opposites` to `opp_tag` to clarify meaning.
- remove local_xadj fixme since it's no longer recalculated.
- add fixmes.
- refactor xadj and adjncy localization.
- fix adjncy Gatherv displs.
- add APW_LGMETIS_DUMP vtk section compiled conditionally.
  - takes time to localize centroids.
- add APW_LGMETIS_SER to force splitting on serial runs.
- update scatterv call.

Signed-off-by: Aiden Woodruff <[email protected]>
bobpaw added 23 commits June 11, 2025 11:08
- metis/CMakeLists.txt: use different sources depending on ENABLE_METIS.
- add apfMETISempty.cc without ENABLE_METIS.
- metis/apfMETIS.h: add apf::hasMETIS.
- metis/apfMETIS.cc: define hasMETIS.
- metis/apfMETISempty.cc: add empty defintions like apf_zoltan.
- define hasMETIS.
- test/CMakeLists.txt: compile msplit conditionally.
- test/testing.cmake: add msplit tests conditionally.

Signed-off-by: Aiden Woodruff <[email protected]>
- metis/CMakeLists.txt: remove PRIVATE c++14 feature.
- metis/apfMETISbalancer.cc: replace lambda auto parameters with
  using-decltype.

Signed-off-by: Aiden Woodruff <[email protected]>
- zoltan/apfZoltan.h: remove tagOpposites declaration.
- zoltan/apfInterElement.cc: remove tagOpposites source.
- apf/apfMesh.h: add tagOpposites declaration.
- reformat documentation comment.
- apf/apfOpposites: add new file with tagOpposites source and helpers.
- apf/CMakeLists.txt: add apfOpposites.cc to SOURCES.

Signed-off-by: Aiden Woodruff <[email protected]>
- ma/maInput.h: add pre, mid, and post METIS flags and documentation.
- ma/maInput.cc (setDefaultValues): initialize new balancer flags to
  false.
- (moreThanOneOptionIsTrue): replace arguments with
  std::initializer_list to support arbitrarily many options.
- (validateInput): use new moreThanOneOptionIsTrue.
- add METIS flags to moreThanOneOptionIsTrue calls.
- reformat to fit 80 characters.
- reject metis flags when compiled without METIS.
- ma/maBalance.cc: run metis in preBalance, midBalance, postBalance if
  requested.

Signed-off-by: Aiden Woodruff <[email protected]>
- test/mbalance.cc: add metis balancer test based on balance.cc.
- test/CMakeLists.txt: add mbalance.
- test/testing.cmake: add mbalance.

Signed-off-by: Aiden Woodruff <[email protected]>
- test/mbalance.cc: add try-catch block to avoid calling pcu::Finalize
  before pcu_obj destructor.

Signed-off-by: Aiden Woodruff <[email protected]>
- metis/apfMETISbalancer.cc: move graph gather/part scatter into helper
  functions.
- this is the first step of removing direct MPI calls.

Signed-off-by: Aiden Woodruff <[email protected]>
- metis/apfMETISbalancer.cc (gatherGraph): replace Gatherv with PCU
  Pack/Unpack.
- remove localized vtx_displs in favor of Exscan and send.
- (scatterPart): remove vtx_displs argument.
- replace Scatterv with PCU Pack/Unpack.
- (MetisBalancer::balance): remove MPI_Comm.

Signed-off-by: Aiden Woodruff <[email protected]>
- test/mbalance.cc: fix imbalance argument being an int.

Signed-off-by: Aiden Woodruff <[email protected]>
- .github/workflows/cmake.yml: revert cmake update. the image already
  comes with a new version.
- cmake/FindMETIS.cmake: add METIS_LIBRARIES and METIS_INCLUDE_DIRS as
  output variables.
- make METIS_PREFIX a HINT not a PATH because HINTS are searched first.
- metis/CMakeLists.txt: remove MODULE from find_package to allow using a
  Config script.
- use METIS_LIBRARIES instead of METIS_INCLUDE_DIRS to fix static
  builds.

Signed-off-by: Aiden Woodruff <[email protected]>
- metis/CMakeLists.txt: remove apf_zoltan from linking since
  tagOpposites is in apf now.
- metis/apfMETIScommon.cc: remove apfZoltan.h.

Signed-off-by: Aiden Woodruff <[email protected]>
- metis/apfMETIS.h: move \page metis to metis.dox.
- add function documentation.
- add Partitioning Doxygen group.
- metis/metis.dox: specify localization to rank 0 and change "only
  relevant" to "necessary".
- change "minimize migration" to "minimize data sent" because migration
  may or may not be more than ParMETIS.
- scorec.dox: add metis subpage.
- Doxyfile.in: add metis input files.
- Doxyfile_internal.in: add metis input files.
- apf/apfMesh.h (tagOpposites): update detailed paragraph.
- metis/apfMETISbalancer.cc: update ignoring weights message.
- metis/apfMETISsplitter.cc: update ignoring weights message.

Signed-off-by: Aiden Woodruff <[email protected]>
- metis/apfMETISbalancer.cc (gatherGraph): check MPI world size, confirm
  no empty parts, and check that the last element of xadj is at least
  correct.
- (scatterPart): check mpi size, vtx_cts is rightly sized, no empty
  part, sum of vtx_counts is right (on non-0 rank vtx_cts and part are
  both empty).
- (MetisBalancer::balance): confirm tolerance is at least more than 1
  (required by METIS).
- remove assert about idx_t.
- metis/apfMETISsplitter.cc: check that tolerance is good and splitting
  to more than 1 part.
- remove idx_t size assert.

Signed-off-by: Aiden Woodruff <[email protected]>
- metis/apfMETISbalancer.cc (MetisBalancer::balance): replace numbering
  code with Exscan and makeNumbering. Used Exscan in this function
  because gn_offset is required later on.
- metis/apfMETISsplitter.cc (MetisSplitter::split): replace numbering
  with makeNumbering.
- metis/apfMETIScommon.cc: add makeNumbering function which combines the
  behavior of the splitter and balancer. skipped intermediate Numbering*
  to ensure only elements are numbered and let the balancer enforce
  rank contiguity.
- metis/apfMETIScommon.h: add makeNumbering with default start of 0 for
  splitter.

Signed-off-by: Aiden Woodruff <[email protected]>
- metis/apfMETIScommon.h: add runMETIS.
- metis/aofMETIScommon.cc: add runMETIS which creates necessary input
  arguments, prints errors, and returns bool.
- metis/apfMETISbalancer.cc: use runMETIS.
- add remap timer.
- print that adapt continues without balancing.
- metis/apfMETISsplitter.cc: remove old comment.
- use runMETIS.

Signed-off-by: Aiden Woodruff <[email protected]>
- metis/apfMETIS.h: add APF_METIS_MAXRANKS macro.
- metis/apfMETISbalancer.cc: fail with error message when more than 256
  peers.
- metis/apfMETISsplitter.cc: fail when more than 256 peers.
- metis/apfMETIScommon.h: add STRINGIFY macro for error messages.
- ma/maBalance.cc: fallback to Parma when PUMI_HAS_METIS but more than
  256 peers to avoid failure.

Signed-off-by: Aiden Woodruff <[email protected]>
- metis/apfMETIScommon.c (makePlan): send to all ranks to avoid empty
  ranks on split.
- metis/apfMETISsplitter.cc (MetisSplitter::split): spread parts during
  multiproc split.
- test/msplit.cc: update msplit documentation.
- (Args): add back factor.
- (getConfig): add factor parameter.
- (main): split into modulo factor parts.
- test/testing.cmake: add 2 to 6 metis split test.

Signed-off-by: Aiden Woodruff <[email protected]>
@cwsmith
Copy link
Contributor

cwsmith commented Jun 11, 2025

/runtests

@github-actions
Copy link

Build Log
Simmetrix Test Result: success
Simmetrix + CGNS Test Result: success

@cwsmith cwsmith merged commit e117d14 into develop Jun 11, 2025
32 checks passed
@cwsmith cwsmith deleted the apw/lgmetis branch June 11, 2025 15:25
@cwsmith cwsmith added the v4.1.0 changes included in the 4.1.0 release label Aug 26, 2025
@cwsmith cwsmith mentioned this pull request Aug 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

v4.1.0 changes included in the 4.1.0 release

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants