.. _mpi:

==================
Running with MPI
==================

The MPI integration in ChemFit parallelizes the evaluation of a
:py:class:`~chemfit.combined_objective_function.CombinedObjectiveFunction`
across MPI ranks. Each rank evaluates a slice of the combined objective's
terms, and rank 0 reduces their partial sums to a single scalar loss.

Core idea
---------

- Build a multi-term objective with
  :py:class:`~chemfit.combined_objective_function.CombinedObjectiveFunction`.
- Wrap it in :py:class:`~chemfit.mpi_wrapper_cob.MPIWrapperCOB`.
- Rank 0 calls the optimizer on the wrapper; worker ranks run a loop and
  wait for broadcast work items.


Environment and dependencies
----------------------------

- A working MPI installation (mpich, Open MPI, etc.)
- ``mpi4py`` installed (optional extra: ``pip install chemfit[mpi]``)

Launch your script with:

::

   mpirun -n 4 python script.py


High-level workflow
-------------------

- **All ranks** construct the same :py:class:`~chemfit.combined_objective_function.CombinedObjectiveFunction` and enter the :py:class:`~chemfit.mpi_wrapper_cob.MPIWrapperCOB` context.
- **Rank 0** runs fitting on the MPI wrapper object and may call ``gather_meta_data()`` when desired.
- **Worker ranks (rank > 0)** enter ``worker_loop()`` and wait for signals and parameter broadcasts.

Thanks to lazy loading patterns in quantity computers, building the combined
objective on every rank is typically cheap; heavy resources are only needed
on ranks that actually evaluate those terms.

Minimal example
---------------------------------------

This example shows the structure.

.. code-block:: python

   import numpy as np
   from chemfit.fitter import Fitter
   from chemfit.combined_objective_function import CombinedObjectiveFunction
   from chemfit.mpi_wrapper_cob import MPIWrapperCOB

   # all ranks construct the list of terms
   terms = magic_from_elsewhere()

   cob = CombinedObjectiveFunction(objective_functions=terms)  # weights default to 1.0

   # wrap with MPI and run
   with MPIWrapperCOB(cob) as mpi_cob:
       if mpi_cob.rank == 0:
           initial_params = {"epsilon": 2.0, "sigma": 1.5}
           fitter = Fitter(mpi_cob, initial_params=initial_params)
           opt_params = fitter.fit_scipy()

           # Optionally collect per-term metadata from all ranks
           meta = mpi_cob.gather_meta_data()
           print(opt_params)
           print(meta)
       else:
           mpi_cob.worker_loop()

How it partitions work
----------------------

Within each evaluation:

- Rank 0 broadcasts the parameter dictionary to all ranks.
- Each rank evaluates its local slice of work.
- All ranks participate in a reduction (sum) to rank 0.
- Rank 0 receives the global loss and returns it to the optimizer.

Common pitfalls
---------------

- Forgetting to call ``worker_loop()`` on ranks > 0 results in rank 0 blocking
  forever at the first broadcast.
- Creating different numbers of terms on different ranks will mis-partition work.
  Always construct the same ``CombinedObjectiveFunction`` on all ranks.
- Modifying the set of terms after constructing the MPI wrapper is not supported.
  Build the final combined objective first, then wrap.

Troubleshooting
---------------

- Hang or deadlock at first evaluation:
- Ensure every non-zero rank entered ``worker_loop()``.
- Ensure all ranks are using the same communicator and number of terms.
- Immediate exception on worker ranks:
- Check per-term code paths for assumptions about unavailable files, GPUs,
    or environment on worker nodes.
- Unexpectedly high wall-clock time:
- Imbalanced slices if terms differ vastly in cost. Consider grouping similar-cost
  terms, or split the combined objective into multiple parts and use
  ``add_flat`` to rebalance.

Summary
-------

- Parallelization is at the **objective-term** level via
  :py:class:`~chemfit.combined_objective_function.CombinedObjectiveFunction`.
- :py:class:`~chemfit.mpi_wrapper_cob.MPIWrapperCOB` broadcasts params, slices work,
  reduces losses, and provides metadata gathering.
- Rank 0 runs the optimizer; all other ranks run a worker loop.
- Keep objectives sliceable, deterministic, and consistently constructed across ranks.