ChemFit

ChemFit is a framework for fitting parameters of models used in computational chemistry, molecular dynamics, and materials science.

It provides composable building blocks for constructing objective functions from many independent terms, computing intermediate quantities using simulation workflows, and optimizing model parameters. A small set of core abstractions makes it straightforward to implement custom objective functions and to parallelize their evaluation across objective terms and/or trial parameters.

Out of the box, ChemFit includes integrations for the calculators defined in the Atomic Simulation Environment (ASE) as well as file-based simulation pipelines.

Highlights:

  • Designed for atomistic simulations: Build objective functions from quantities computed by ASE calculators or external simulation codes.

  • Composable objectives: Assemble complex fitting targets from many independent objective terms.

  • Parallel by construction: Evaluate objective terms concurrently across processes or MPI ranks without changing the objective definition.

  • Extensible architecture: Implement custom objective functions and simulation interfaces using ChemFit’s core abstractions.


Quickstart

In ChemFit, an objective function is typically built from two layers:

  1. A QuantityComputer computes intermediate quantities from a parameter dictionary.

  2. A loss function maps those quantities to a scalar loss.

Multiple such objective terms can then be combined using CombinedObjectiveFunction and optimized with Fitter.

In practical workflows, the quantity computation may be performed by an ASE calculator, a file-based simulation pipeline, or custom Python code.

The following minimal example defines the loss function

\[L(\text{params}) = (x^2 + y^2 - 2)^2,\]

where \(x^2\) and \(y^2\) are intermediate quantities:

from chemfit.wrap_funcs import to_quantity_computer

@to_quantity_computer()
def computer(params):
    return {"x2": params["x"] ** 2, "y2": params["y"] ** 2}

def square_deviation(q, target):
    return (q["x2"] + q["y2"] - target)**2

ob = computer.with_loss(square_deviation, target=2)

PARAMS = {"x": 1.0, "y": 2.0}
print(ob(PARAMS)) # <-- 9.0

The quantity computer is defined via a simple Python function, mapping a dict of parameters to a dict of quantities. The to_quantity_computer() decorator turns this function into a WrappedQuantityComputer instance.

The with_loss() method combines a quantity computer with a loss function to form a complete objective.
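Conceptually, such an objective is the composition of the two layers: the loss is applied to the output of the quantity computer. The plain-Python sketch below mimics that composition without importing ChemFit. The helper name `compose_objective` is hypothetical, and the sketch is an illustration of the idea, not ChemFit's actual implementation.

```python
from functools import partial

def compute_quantities(params):
    # Layer 1: map a parameter dict to a dict of intermediate quantities
    return {"x2": params["x"] ** 2, "y2": params["y"] ** 2}

def square_deviation(q, target):
    # Layer 2: map the quantities to a scalar loss
    return (q["x2"] + q["y2"] - target) ** 2

def compose_objective(computer, loss, **loss_kwargs):
    # Hypothetical helper: returns params -> loss(computer(params), **loss_kwargs)
    bound_loss = partial(loss, **loss_kwargs)
    return lambda params: bound_loss(computer(params))

objective = compose_objective(compute_quantities, square_deviation, target=2)
print(objective({"x": 1.0, "y": 2.0}))  # (1 + 4 - 2)**2 = 9.0
```

Keeping the two layers separate means the same quantities can be reused with different loss functions or targets.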

Note

The resulting WrappedQuantityComputer differs from the plain function in that it can optionally accept an evaluation context (EvaluateContext).

This context is used internally by ChemFit to enable parallel evaluation and to collect metadata during execution. In most cases, you do not need to interact with it directly.

For more details, see Concepts.

Combining functions

Often an objective consists of many independent contributions. ChemFit provides CombinedObjectiveFunction to combine multiple objective terms into a single loss.

In the next example, we first define a parametrized loss term

\[T(\text{params},f,\text{target}) = (f x^2 + f y^2 - \text{target})^2,\]

where \(f\) is an external parameter, and then combine two such terms into an overall loss

\[L(\text{params}) = T(\text{params},1,1) + T(\text{params},2,2).\]

from chemfit.wrap_funcs import to_quantity_computer
from chemfit.combined_objective_function import CombinedObjectiveFunction

@to_quantity_computer()
def computer(params, f):
    return {"fx2": f * params["x"] ** 2, "fy2": f * params["y"] ** 2}

def loss(q, target):
    return (q["fx2"] + q["fy2"] - target) ** 2

terms = [
    computer.bind(f=1).with_loss(loss, target=1),
    computer.bind(f=2).with_loss(loss, target=2)
]

PARAMS = {"x": 1, "y": 2}
combined = CombinedObjectiveFunction(terms)
print(combined(PARAMS)) # <-- 80.0

The CombinedObjectiveFunction can be customized in many ways. Details can be found in the dedicated Combined Objective Functions page.
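The value 80.0 in the example can be verified by hand: with x = 1 and y = 2, the first term is (1 + 4 - 1)^2 = 16 and the second is (2 + 8 - 2)^2 = 64. The sketch below reproduces this sum in plain Python, using `functools.partial` as a stand-in for binding `f` and `target`. It assumes the combined loss is the unweighted sum of its terms, as in the formula for L, and it does not use ChemFit itself.

```python
from functools import partial

def term(params, f, target):
    # T(params, f, target) = (f*x^2 + f*y^2 - target)^2
    return (f * params["x"] ** 2 + f * params["y"] ** 2 - target) ** 2

# Bind the external parameters of each term, leaving only `params` free
terms = [partial(term, f=1, target=1), partial(term, f=2, target=2)]

params = {"x": 1, "y": 2}
values = [t(params) for t in terms]
print(values)       # [16, 64]
print(sum(values))  # 80
```

Because the inputs here are integers, the sketch prints integer results rather than 80.0.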

The evaluation of the terms of a CombinedObjectiveFunction can be parallelized in two alternative ways:

  1. With an executor that implements the ExecutorLike interface, for example one from concurrent.futures.

  2. With multiple processes that communicate via the Message Passing Interface (MPI). See Running with MPI for details.

The following example demonstrates the ExecutorLike approach:

from chemfit.combined_objective_function import CombinedObjectiveFunction
from chemfit.executor_wrapper_cob import ExecutorWrapperCOB
from concurrent.futures import ThreadPoolExecutor

# `terms` and `PARAMS` are defined in the previous example
cob = CombinedObjectiveFunction(terms)  # define the combined objective function
cob_parallel = ExecutorWrapperCOB(cob, executor=ThreadPoolExecutor(max_workers=4))
print(cob_parallel(PARAMS))  # <-- evaluating the wrapper parallelizes over the terms
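
What parallelizing over the terms amounts to can be sketched with the standard library alone: each term is submitted to the executor as an independent task, and the results are summed. This is only an illustration of the idea, not the implementation of ExecutorWrapperCOB. The function `evaluate_parallel` and the plain-Python `term` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def term(params, f, target):
    # Stand-in for one objective term: (f*x^2 + f*y^2 - target)^2
    return (f * params["x"] ** 2 + f * params["y"] ** 2 - target) ** 2

def evaluate_parallel(terms, params, executor):
    # Submit every term as its own task, then sum the results
    futures = [executor.submit(t, params) for t in terms]
    return sum(fut.result() for fut in futures)

terms = [partial(term, f=1, target=1), partial(term, f=2, target=2)]
params = {"x": 1.0, "y": 2.0}

# The context manager shuts the pool down once the evaluation is done
with ThreadPoolExecutor(max_workers=4) as pool:
    total = evaluate_parallel(terms, params, pool)

print(total)  # 80.0
```

In CPython, a thread pool mainly pays off when the terms spend their time waiting on external programs or in code that releases the global interpreter lock. For CPU-bound pure-Python terms, a process pool is the usual alternative.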

Optimizing

In principle, objective functions defined with ChemFit can be optimized in any way you like. For convenience, ChemFit provides a Fitter class, which can be used as follows:

from chemfit.fitter import Fitter
import math

fitter = Fitter(objective_function=cob_parallel, initial_params=PARAMS)

# fit with nevergrad
optimal_params = fitter.fit_nevergrad(budget=10)  # <-- search for solutions with a budget of 10

# fit with scipy (this overwrites the nevergrad result above)
optimal_params = fitter.fit_scipy()

# loose tolerance: the trailing digits depend on the optimizer
assert math.isclose(optimal_params["x"], 0.44721359813354555, rel_tol=1e-6)
assert math.isclose(optimal_params["y"], 0.894427188104411, rel_tol=1e-6)

The Fitter can be configured in many ways; for details, refer to the Fitter page.
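
Because an objective is a callable from a parameter dict to a float, it can also be handed to any other optimizer. As an illustration, the sketch below minimizes a plain-Python stand-in for the combined loss with a hand-written gradient descent based on finite differences. The functions `objective`, `numerical_gradient`, and `gradient_descent` are hypothetical and not part of ChemFit, and the step size and iteration count are ad hoc choices for this example. Note that this loss is minimal at every point with x^2 + y^2 = 1, so the result depends on the starting point and the method. Here the gradient always points along (x, y), so descent from (1, 2) moves radially and ends near (1/sqrt(5), 2/sqrt(5)), which is consistent with the values asserted in the Fitter example.

```python
import math

def objective(params):
    # Stand-in for the combined loss: T(params, 1, 1) + T(params, 2, 2)
    s = params["x"] ** 2 + params["y"] ** 2
    return (s - 1) ** 2 + (2 * s - 2) ** 2

def numerical_gradient(func, params, h=1e-6):
    # Central finite differences, one parameter at a time
    grad = {}
    for name, value in params.items():
        up = {**params, name: value + h}
        down = {**params, name: value - h}
        grad[name] = (func(up) - func(down)) / (2 * h)
    return grad

def gradient_descent(func, initial_params, learning_rate=0.002, steps=2000):
    # Plain fixed-step gradient descent on a dict of parameters
    params = dict(initial_params)
    for _ in range(steps):
        grad = numerical_gradient(func, params)
        params = {k: v - learning_rate * grad[k] for k, v in params.items()}
    return params

best = gradient_descent(objective, {"x": 1.0, "y": 2.0})
print(best)  # approximately {'x': 0.4472, 'y': 0.8944}
assert math.isclose(best["x"] ** 2 + best["y"] ** 2, 1.0, rel_tol=1e-6)
```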
