Running with MPI

The MPI integration in ChemFit parallelizes the evaluation of a CombinedObjectiveFunction across MPI ranks. Each rank evaluates a slice of the combined objective’s terms, and rank 0 reduces their partial sums to a single scalar loss.

Core idea

  • Build a multi-term objective with CombinedObjectiveFunction.

  • Wrap it in MPIWrapperCOB.

  • Rank 0 calls the optimizer on the wrapper; worker ranks run a loop and wait for broadcast work items.

Environment and dependencies

  • A working MPI installation (MPICH, Open MPI, etc.)

  • mpi4py, which can be pulled in through the optional extra: pip install chemfit[mpi]

Launch your script with:

mpirun -n 4 python script.py

High-level workflow

  • All ranks construct the same CombinedObjectiveFunction and enter the MPIWrapperCOB context.

  • Rank 0 runs the fit on the MPI wrapper object and can call gather_meta_data() to collect per-term metadata from all ranks.

  • Worker ranks (rank > 0) enter worker_loop() and wait for signals and parameter broadcasts.

Thanks to lazy loading patterns in quantity computers, building the combined objective on every rank is typically cheap; heavy resources are only needed on ranks that actually evaluate those terms.
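
The lazy loading idea can be illustrated with a minimal sketch. LazyTerm and its attributes are hypothetical names used only for illustration, not ChemFit classes, and the sketch assumes a term is a callable that takes the parameter dictionary and returns a float. Construction is cheap, and the expensive resource is built on first use only.

```python
from functools import cached_property


class LazyTerm:
    """Hypothetical objective term that defers its expensive setup."""

    def __init__(self, reference_energy: float):
        # Construction only stores cheap settings, so every rank can afford it.
        self.reference_energy = reference_energy
        self.load_count = 0

    @cached_property
    def _resource(self) -> dict:
        # Stand-in for an expensive step (reading files, building a calculator).
        # It runs once, on first access, and only where the term is evaluated.
        self.load_count += 1
        return {"scale": 2.0}

    def __call__(self, params: dict) -> float:
        predicted = self._resource["scale"] * params["epsilon"]
        return (predicted - self.reference_energy) ** 2


# Every process builds all four terms, which costs almost nothing.
terms = [LazyTerm(reference_energy=e) for e in (1.0, 2.0, 3.0, 4.0)]

# Pretend this process only owns the first two terms.
local_loss = sum(term({"epsilon": 2.0, "sigma": 1.5}) for term in terms[:2])

print(local_loss)
print([term.load_count for term in terms])
```

Only the two evaluated terms trigger their setup; the remaining terms stay unloaded on this process.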

Minimal example

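This example shows the overall structure of an MPI-enabled fitting script: every rank builds the objective, rank 0 drives the optimizer, and all other ranks serve evaluations.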

from chemfit.fitter import Fitter
from chemfit.combined_objective_function import CombinedObjectiveFunction
from chemfit.mpi_wrapper_cob import MPIWrapperCOB

# All ranks construct the same list of terms.
# magic_from_elsewhere() is a placeholder for your own term-building code.
terms = magic_from_elsewhere()

cob = CombinedObjectiveFunction(objective_functions=terms)  # weights default to 1.0

# wrap with MPI and run
with MPIWrapperCOB(cob) as mpi_cob:
    if mpi_cob.rank == 0:
        initial_params = {"epsilon": 2.0, "sigma": 1.5}
        fitter = Fitter(mpi_cob, initial_params=initial_params)
        opt_params = fitter.fit_scipy()

        # Optionally collect per-term metadata from all ranks
        meta = mpi_cob.gather_meta_data()
        print(opt_params)
        print(meta)
    else:
        mpi_cob.worker_loop()

How it partitions work

Within each evaluation:

  • Rank 0 broadcasts the parameter dictionary to all ranks.

  • Each rank evaluates its local slice of work.

  • All ranks participate in a reduction (sum) to rank 0.

  • Rank 0 receives the global loss and returns it to the optimizer.
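
The four steps above can be mimicked in a single process, without MPI, to see why the reduced value equals the serial sum. This is a sketch under stated assumptions: slice_for_rank and evaluate_simulated are hypothetical helpers, the contiguous near-equal split is an assumed scheme rather than the documented behaviour of MPIWrapperCOB, and the terms are toy callables.

```python
def slice_for_rank(n_terms: int, rank: int, size: int) -> range:
    """Contiguous slice of term indices for one rank (an assumed scheme)."""
    base, extra = divmod(n_terms, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def evaluate_simulated(terms, params: dict, size: int) -> float:
    """Mimic one evaluation across `size` ranks inside a single process."""
    partial_sums = []
    for rank in range(size):
        # Step 1: every rank receives the same broadcast parameters.
        local_params = dict(params)
        # Step 2: each rank evaluates only its own slice of terms.
        indices = slice_for_rank(len(terms), rank, size)
        partial_sums.append(sum(terms[i](local_params) for i in indices))
    # Steps 3 and 4: sum-reduce the partial sums; rank 0 would receive the total.
    return sum(partial_sums)


# Hypothetical terms: squared deviation of epsilon from a target value.
targets = [1.0, 2.0, 3.0, 4.0, 5.0]
terms = [lambda p, t=t: (p["epsilon"] - t) ** 2 for t in targets]
params = {"epsilon": 2.0, "sigma": 1.5}

slices = [list(slice_for_rank(len(terms), r, 4)) for r in range(4)]
serial = sum(term(params) for term in terms)
parallel = evaluate_simulated(terms, params, size=4)

print(slices)
print(serial, parallel)
```

Because the loss is a plain sum over terms, any partition that covers each term exactly once gives the same total as a serial evaluation.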

Common pitfalls

  • Forgetting to call worker_loop() on ranks > 0 results in rank 0 blocking forever at the first broadcast.

  • Creating different numbers of terms on different ranks will mis-partition work. Always construct the same CombinedObjectiveFunction on all ranks.

  • Modifying the set of terms after constructing the MPI wrapper is not supported. Build the final combined objective first, then wrap.
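
One way to catch the second pitfall early is to compare term counts across ranks before wrapping. The helper below, check_term_counts, is a hypothetical name and not part of ChemFit. It takes the list of counts as an argument so that it runs without MPI; in a real run that list could be obtained with mpi4py's comm.allgather(len(terms)).

```python
from typing import Sequence


def check_term_counts(counts_by_rank: Sequence[int]) -> None:
    """Raise if the ranks disagree about the number of objective terms."""
    if len(set(counts_by_rank)) > 1:
        details = ", ".join(
            f"rank {rank}: {count}" for rank, count in enumerate(counts_by_rank)
        )
        raise ValueError(f"Inconsistent term counts across ranks ({details})")


# Consistent construction passes silently.
check_term_counts([8, 8, 8, 8])

# One rank built a different number of terms: fail before any evaluation.
message = ""
try:
    check_term_counts([8, 8, 7, 8])
except ValueError as err:
    message = str(err)

print(message)
```

Failing with an explicit error at startup is usually easier to diagnose than a mis-partitioned loss or a hang during the first evaluation.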

Troubleshooting

  • Hang or deadlock at first evaluation:

      • Ensure every non-zero rank entered worker_loop().

      • Ensure all ranks are using the same communicator and number of terms.

  • Immediate exception on worker ranks:

      • Check per-term code paths for assumptions about files, GPUs, or environment settings that may be unavailable on worker nodes.

  • Unexpectedly high wall-clock time:

      • Slices can be imbalanced when terms differ vastly in cost. Consider grouping similar-cost terms, or split the combined objective into multiple parts and use add_flat to rebalance.
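
To judge whether imbalance explains a slow run, it can help to estimate the load per rank from rough per-term costs. The sketch below assumes a contiguous, near-equal split of the term list, which may differ from how MPIWrapperCOB actually assigns slices. rank_loads and the cost numbers are hypothetical.

```python
def rank_loads(costs, size):
    """Total cost per rank under an assumed contiguous, near-equal split."""
    base, extra = divmod(len(costs), size)
    loads, start = [], 0
    for rank in range(size):
        stop = start + base + (1 if rank < extra else 0)
        loads.append(sum(costs[start:stop]))
        start = stop
    return loads


# Rough per-term costs in arbitrary units; two terms dominate.
costs_original = [10, 10, 1, 1, 1, 1, 1, 1]
# Same terms, reordered so the expensive ones land in different slices.
costs_reordered = [10, 1, 10, 1, 1, 1, 1, 1]

loads_original = rank_loads(costs_original, 4)
loads_reordered = rank_loads(costs_reordered, 4)

# Each evaluation takes at least as long as the slowest rank.
print(loads_original, max(loads_original))
print(loads_reordered, max(loads_reordered))
```

Under this assumed split, the original ordering puts both expensive terms on one rank, while the reordered list roughly halves the load of the slowest rank.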

Summary

  • Parallelization is at the objective-term level via CombinedObjectiveFunction.

  • MPIWrapperCOB broadcasts params, slices work, reduces losses, and provides metadata gathering.

  • Rank 0 runs the optimizer; all other ranks run a worker loop.

  • Keep objectives sliceable, deterministic, and consistently constructed across ranks.