.. _parallel_execution:

Parallel Execution
====================

Parallel execution enters the picture in two ways when working with a ChemFit objective function:

1. Evaluating the same objective function for different parameters in parallel.
2. Evaluating the terms of a :py:class:`~chemfit.combined_objective_function.CombinedObjectiveFunction` in parallel.

This page showcases example code for both forms of parallelism.

1. Evaluate parameter sets in parallel
-----------------------------------------

The main complication in this form of parallelism is that each evaluation carries state (e.g. metadata, intermediate results). ChemFit's context system ensures that this state is preserved and propagated correctly across parallel execution (see :ref:`concepts_parallel_eval`).

In practice, if you use the :py:class:`~chemfit.fitter.Fitter` class, you do not have to interact with these details explicitly: simply supply ``num_workers`` to :py:meth:`~chemfit.fitter.Fitter.fit_nevergrad`.

If you nonetheless want to evaluate an objective function for multiple parameter sets in parallel yourself, this will work:

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor

    from chemfit.executor_utils import map_with_context
    from chemfit.abstract_objective_function import EvaluateContext

    executor = ThreadPoolExecutor(max_workers=4)

    # `objective` is the objective function to evaluate;
    # `params_list` holds one parameter set per evaluation.
    params_list = [...]

    # One fresh context per parameter set.
    ctxs = [EvaluateContext() for _ in params_list]

    results = map_with_context(
        executor,
        objective,
        params_list,
        ctxs=ctxs,
    )

.. note::

    Why do we need :py:func:`~chemfit.executor_utils.map_with_context`?

    The built-in ``map`` method of the ``executor`` would return the same results. The difference is that :py:func:`~chemfit.executor_utils.map_with_context` correctly propagates the side effects of the function evaluation on the context.

    With a :py:class:`~concurrent.futures.ThreadPoolExecutor` this is usually not an issue, since execution happens in the same process. A :py:class:`~concurrent.futures.ProcessPoolExecutor`, on the other hand, only pickles the **result** of the function and sends it back to the main process, so changes made to the context in the worker process are lost. The :py:func:`~chemfit.executor_utils.map_with_context` function ensures that context updates are propagated correctly by including the context in the returned results.

.. note::

    **Compute-bound** pure Python code (in non-free-threading builds) will not be sped up by using ``ThreadPoolExecutor``. The reason is the global interpreter lock (GIL).

    Generally, it is recommended to avoid compute-heavy workloads in pure Python. If you really have to, you can speed up compute-bound Python code by using a process pool, for example :py:class:`concurrent.futures.ProcessPoolExecutor` from the standard library.

    Be warned, though, that the required serialization can add significant overhead (always measure!). Furthermore, pickling certain functions can be tricky.

.. tip::

    The ``loky`` package provides a drop-in replacement for :py:class:`concurrent.futures.ProcessPoolExecutor`, which is able to pickle many more functions than the standard library version.

2. Evaluate objective terms in parallel
------------------------------------------

A :py:class:`~chemfit.combined_objective_function.CombinedObjectiveFunction` evaluates multiple terms for the same set of parameters. If these terms are independent and expensive, it can make sense to evaluate them in parallel.

ChemFit provides two mechanisms for this:

- executor-based parallelism
- MPI-based parallelism

2.1 Executor-based parallelism
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use :py:class:`~chemfit.parallel_execution.ExecutorWrapperCOB` to evaluate terms in parallel using an executor.

.. code-block:: python

    from concurrent.futures import ThreadPoolExecutor

    from chemfit.parallel_execution import ExecutorWrapperCOB
    from chemfit.abstract_objective_function import EvaluateContext

    executor = ThreadPoolExecutor(max_workers=4)

    # `objective` is the combined objective function to parallelize.
    wrapped = ExecutorWrapperCOB(objective, executor=executor)

    # The wrapped objective is called like the original one.
    value = wrapped(parameters, EvaluateContext())

This is the simplest way to parallelize a combined objective.

Use this when:

- each term performs a non-trivial amount of work
- the overhead of the executor is small compared to the cost of each term

.. note::

    As in the previous section, the choice of executor matters.

    - :py:class:`~concurrent.futures.ThreadPoolExecutor` has low overhead, but is limited by the GIL
    - :py:class:`~concurrent.futures.ProcessPoolExecutor` allows true parallelism, but introduces serialization overhead

2.2 MPI-based parallelism
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Use :py:class:`~chemfit.parallel_execution.MPIWrapperCOB` to distribute terms across MPI processes.

.. code-block:: python

    from chemfit.parallel_execution import MPIWrapperCOB
    from chemfit.abstract_objective_function import EvaluateContext

    with MPIWrapperCOB(objective) as mpi:
        if mpi.rank == 0:
            # Rank 0 drives the evaluation.
            value = mpi(parameters, EvaluateContext())
        else:
            # All other ranks wait for work.
            mpi.worker_loop()

MPI does not behave like an executor. One process drives the evaluation, while the others wait for work in a loop.

Use this when:

- you already run your code under MPI
- you have many small terms
- executor overhead becomes a bottleneck

Remarks
^^^^^^^^^

Parallelizing a combined objective only helps if the individual terms are sufficiently expensive. If the terms are cheap, the overhead of parallel execution will dominate and performance may degrade.
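
Whether parallel evaluation pays off is best checked by measuring. The following is a minimal sketch of such a measurement using only the standard library. The functions ``cheap_term``, ``slow_term``, ``time_serial`` and ``time_parallel`` are hypothetical stand-ins introduced for illustration and are not part of ChemFit; ``slow_term`` simulates an expensive term with ``time.sleep``, which releases the GIL.

.. code-block:: python

    import time
    from concurrent.futures import ThreadPoolExecutor


    def cheap_term(x):
        # Hypothetical stand-in for a term that does almost no work.
        return x * x


    def slow_term(x):
        # Hypothetical stand-in for an expensive term.
        # time.sleep releases the GIL, so threads can overlap here.
        time.sleep(0.05)
        return x * x


    def time_serial(term, inputs):
        # Evaluate all inputs one after another and measure the wall time.
        start = time.perf_counter()
        values = [term(x) for x in inputs]
        return values, time.perf_counter() - start


    def time_parallel(term, inputs, executor):
        # Evaluate all inputs through the executor and measure the wall time.
        start = time.perf_counter()
        values = list(executor.map(term, inputs))
        return values, time.perf_counter() - start


    inputs = list(range(8))

    with ThreadPoolExecutor(max_workers=4) as executor:
        for term in (cheap_term, slow_term):
            serial_values, t_serial = time_serial(term, inputs)
            parallel_values, t_parallel = time_parallel(term, inputs, executor)
            print(
                f"{term.__name__}: "
                f"serial {t_serial:.4f} s, parallel {t_parallel:.4f} s"
            )

For the slow term the thread pool should finish noticeably faster than the serial loop, while for the cheap term the executor overhead typically makes the parallel version no faster, and often slower. The same kind of comparison can be run with the real objective terms before settling on a parallelization strategy.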