ASE-Based Quantity Computers

The chemfit.ase_objective_function module provides quantity computers built around ASE.

Minimal example

from chemfit.ase_objective_function import SinglePointASEComputer, PathAtomsFactory

def construct_calc(atoms):
    atoms.calc = MyCalculator()

def apply_params(atoms, params):
    atoms.calc.parameters.update(params)

computer = SinglePointASEComputer(
    calc_factory=construct_calc,
    param_applier=apply_params,
    atoms_factory=PathAtomsFactory("geometry.xyz"),
)

quants = computer({"param": 1.0})  # returns dict with energy, forces, ...
print(quants["energy"])

The example above follows the standard pattern. You provide four pieces:

  • something that creates an ase.Atoms object

  • something that attaches a calculator to it

  • something that applies fitting parameters to that calculator

  • one or more quantity processors that turn the finished ASE calculation into a quantity dictionary

ChemFit takes care of the rest.

This is useful when you want to fit ASE calculators directly, either on fixed structures or on relaxed ones.

The main classes are:

Implementing the customization points

The arguments passed to SinglePointASEComputer are just small callables with well-defined responsibilities.

In most cases, you do not need to subclass anything. You only need to provide functions (or callable objects) with the expected behavior.

Each component does one thing.

Calculator factory

A calculator factory receives an ase.Atoms object and attaches a calculator to it.

from ase.calculators.lj import LennardJones

def construct_lj(atoms):
    atoms.calc = LennardJones()

The factory should not run the calculation. It should only attach the calculator.

Parameter applier

A parameter applier receives the atoms object and the current parameter dictionary. Its job is to transfer fitting parameters into the attached calculator.

def apply_params_lj(atoms, params):
    atoms.calc.parameters.epsilon = params["epsilon"]
    atoms.calc.parameters.sigma = params["sigma"]

This function is called for every evaluation. It should not rely on cached state.

Atoms factory

An atoms factory takes no arguments and returns a single ase.Atoms object.

from ase import Atoms

def make_dimer():
    return Atoms("Ar2", positions=[[0, 0, 0], [3.5, 0, 0]])

You can also implement factories as callable objects with internal state (this applies to all factories not just the atoms factory):

from ase.io import read

class MyAtomsFactory:
    def __init__(self, path, index=0):
        self.path = path
        self.index = index

    def __call__(self):
        return read(self.path, index=self.index)

PathAtomsFactory is a built-in implementation of this pattern.

Note

The simplest atoms factory shipped with ChemFit is PathAtomsFactory.

It reads a single structure from disk using ASE (ase.io.read()).

from chemfit.ase_objective_function import PathAtomsFactory

atoms_factory = PathAtomsFactory("geometry.traj")
atoms = atoms_factory()

You can also pass an ASE index:

atoms_factory = PathAtomsFactory("trajectory.xyz", index=0)

The important detail is that the factory must return exactly one ase.Atoms object. If the index selects multiple images, PathAtomsFactory raises an exception.

Atoms post-processor

An atoms post-processor receives the atoms object once, before it is cached by the computer instance.

Use it for structure-level setup that should be shared across evaluations.

from ase.constraints import FixAtoms

def freeze_first_atom(atoms):
    atoms.set_constraint(FixAtoms(indices=[0]))

This function runs once during setup, not once per evaluation.

Remarks

Each customization point has a single responsibility:

  • the atoms factory creates atoms

  • the calculator factory attaches a calculator

  • the parameter applier updates calculator parameters

  • the quantity processor extracts quantities

All of these can be implemented either as plain functions or as callable objects (classes implementing __call__).

Using callable objects allows you to store configuration or reference data in __init__ while keeping evaluation itself stateless.

Keeping these responsibilities separate makes the resulting computer easier to test and reuse.

How a SinglePointASEComputer works

SinglePointASEComputer is the basic ASE quantity computer.

A single evaluation does the following:

  1. create the base atoms object once, using atoms_factory

  2. optionally modify it once, using atoms_post_processor

  3. copy the cached atoms object into ctx.temp.atoms

  4. attach a fresh calculator

  5. apply the current parameter dictionary

  6. call atoms.calc.calculate(atoms)

  7. run all quantity processors and merge their outputs

The important consequence is that the reference structure is cached on the computer instance, while each evaluation still uses a fresh copy of the atoms and a fresh calculator.

That means the calculator can mutate internal state without corrupting later evaluations.

Turning an ASE computer into an objective term

In practice, ASE computers are almost always turned into objective terms using with_loss().

from chemfit.ase_objective_function import SinglePointASEComputer, PathAtomsFactory

def construct_lj(atoms):
    atoms.calc = LennardJones()

def apply_params_lj(atoms, params):
    atoms.calc.parameters.epsilon = params["epsilon"]
    atoms.calc.parameters.sigma = params["sigma"]

def loss_function(quants, e_ref):
    return (quants["energy"] - e_ref) ** 2

computer = SinglePointASEComputer(
    calc_factory=construct_lj,
    param_applier=apply_params_lj,
    atoms_factory=PathAtomsFactory("dimer.xyz"),
)

objective_term = computer.with_loss(
    loss_function,
    e_ref=-0.1,
)

loss = objective_term({"epsilon": 1.0, "sigma": 1.0})

Note

with_loss() automatically binds additional keyword arguments to the loss function.

This means you can pass parameters such as e_ref directly, without using functools.partial.

A real Lennard-Jones example

The following example fits a Lennard–Jones potential.

For several interatomic distances, one objective term is created per configuration. Each term evaluates the energy with ASE and compares it to a reference value.

All terms are combined into a single objective and optimized to recover the correct parameters.

import numpy as np

from chemfit.ase_objective_function import SinglePointASEComputer
from chemfit.combined_objective_function import CombinedObjectiveFunction
from chemfit.fitter import Fitter

from conftest import LJAtomsFactory, apply_params_lj, construct_lj

def e_lj(r, eps, sigma):
    return 4 * eps * ((sigma / r)**12 - (sigma / r)**6)

def loss_function(quants, e_ref):
    return (quants["energy"] - e_ref) ** 2

def lj_ob_term(r, eps, sigma):
    computer = SinglePointASEComputer(
        calc_factory=construct_lj,
        param_applier=apply_params_lj,
        atoms_factory=LJAtomsFactory(r),
        tag=f"lj_{r}",
    )

    return computer.with_loss(
        loss_function,
        e_ref=e_lj(r, eps, sigma),
    )

def get_objective(eps, sigma):
    r_min = 2.0 ** (1 / 6) * sigma
    r_list = np.linspace(0.925 * r_min, 3.0 * sigma)

    return CombinedObjectiveFunction(
        objective_functions=[lj_ob_term(r, eps, sigma) for r in r_list]
    )

objective = get_objective(1.0, 1.0)

fitter = Fitter(
    objective,
    initial_params={"epsilon": 2.0, "sigma": 1.5},
)

optimal_params = fitter.fit_scipy()

Quantity processors

A quantity processor is a callable with signature

def processor(calc, atoms) -> dict[str, object]:
    ...

It receives the evaluated calculator and atoms object and returns a dictionary. The outputs of all quantity processors are merged into the final quantity dictionary.

The default quantity processor is DefaultQuantityProcessor.

It returns:

  • everything in calc.results

  • "n_atoms"

A small example:

from chemfit.ase_objective_function import (
    DefaultQuantityProcessor,
    SinglePointASEComputer,
)

computer = SinglePointASEComputer(
    calc_factory=construct_lj,
    param_applier=apply_params_lj,
    atoms_factory=PathAtomsFactory("dimer.xyz"),
    quantity_processors=[DefaultQuantityProcessor()],
)

quants = computer({"epsilon": 1.0, "sigma": 1.0})
print(quants.keys())

One detail matters here: if you pass quantity_processors=None, ChemFit uses [DefaultQuantityProcessor()].

If you pass your own list, ChemFit uses exactly that list. The default processor is not prepended automatically.

So if you want both the raw calculator results and your own derived quantities, include DefaultQuantityProcessor explicitly.

For example:

import numpy as np

from chemfit.ase_objective_function import (
    DefaultQuantityProcessor,
    SinglePointASEComputer,
)

def rms_force_processor(calc, atoms):
    forces = calc.results.get("forces")
    if forces is None:
        return {}
    return {"rms_force": np.sqrt((forces**2).mean())}

computer = SinglePointASEComputer(
    calc_factory=construct_lj,
    param_applier=apply_params_lj,
    atoms_factory=PathAtomsFactory("dimer.xyz"),
    quantity_processors=[
        DefaultQuantityProcessor(),
        rms_force_processor,
    ],
)

quants = computer({"epsilon": 1.0, "sigma": 1.0})
print(quants["energy"])
print(quants["rms_force"])

Warning

Because quantity processor outputs are merged with dict.update(), later processors can overwrite keys returned by earlier ones.

Note

Quantity processors should follow the same basic rule as quantity computers: avoid mutating global state or instance state during evaluation.

In practice, a quantity processor should behave like a pure function of calc and atoms, returning a dictionary of derived quantities.

This matters especially when evaluations may run in parallel.

Atoms post-processing

Sometimes the structure should be modified once before any evaluations happen.

That is what atoms_post_processor is for.

It is applied to the base atoms object before that object is cached.

def freeze_first_atom(atoms):
    from ase.constraints import FixAtoms
    atoms.set_constraint(FixAtoms(indices=[0]))

computer = SinglePointASEComputer(
    calc_factory=construct_lj,
    param_applier=apply_params_lj,
    atoms_factory=PathAtomsFactory("geometry.xyz"),
    atoms_post_processor=freeze_first_atom,
)

Since the processed atoms object is cached, this is the right place for structure-level setup that should be shared across all evaluations of the same computer instance.

MinimizationASEComputer

MinimizationASEComputer is a subclass of SinglePointASEComputer that performs a local geometry optimization before extracting quantities.

Internally it uses ASE’s ase.optimize.BFGS.

The workflow is almost the same as for the single-point computer, except that after prepare_ctx(...) it runs

optimizer = BFGS(ctx.temp.atoms, logfile=None)
optimizer.run(fmax=self.fmax, steps=self.max_steps)

and only then calls the quantity processors.

A minimal example:

from chemfit.ase_objective_function import MinimizationASEComputer

computer = MinimizationASEComputer(
    calc_factory=construct_lj,
    param_applier=apply_params_lj,
    atoms_factory=PathAtomsFactory("geometry.xyz"),
    fmax=1e-4,
    max_steps=500,
)

quants = computer({"epsilon": 1.0, "sigma": 1.0})

The constructor adds three parameters:

MinimizationASEComputer(
    ...,
    dt=1e-2,
    fmax=1e-5,
    max_steps=2000,
)

In the current implementation, fmax and max_steps are used by BFGS. dt is stored for compatibility with older code but is currently unused.

A distance-based example after relaxation

Here is a realistic example from the test suite. The relaxation is handled by MinimizationASEComputer; the custom quantity processor adds a derived geometric quantity.

from chemfit.abstract_objective_function import QuantityComputerObjectiveFunction
from chemfit.ase_objective_function import MinimizationASEComputer, PathAtomsFactory

REF_DISTANCE = 3.2

def compute_dimer_distance(calc, atoms):
    quants = calc.results
    quants["dimer_distance"] = atoms.get_distance(0, 3)
    return quants

objective = QuantityComputerObjectiveFunction(
    quantity_computer=MinimizationASEComputer(
        calc_factory=construct_calc,
        param_applier=apply_params,
        atoms_factory=PathAtomsFactory("ref.traj"),
        quantity_processors=[compute_dimer_distance],
        tag="dimer_distance",
    ),
    loss_function=lambda quants: (quants["dimer_distance"] - REF_DISTANCE) ** 2,
)

The important part here is that the quantity processor runs on the relaxed structure, not on the original one.

A more advanced processor: Kabsch RMSD

Quantity processors do not have to be plain functions. They can also be objects.

If a processor needs fixed reference data, the right place to acquire that state is __init__. It should not be acquired lazily during __call__.

This follows the same rule as in Writing Quantity Computers: evaluation should not mutate hidden instance state.

An example using the Kabsch RMSD against the reference configuration looks like this:

from typing import Any

import numpy as np
from ase import Atoms
from ase.calculators.calculator import Calculator

import chemfit.kabsch as kb
from chemfit.ase_objective_function import AtomsFactory

class KabschDistance:
    def __init__(self, atoms_factory: AtomsFactory):
        self._positions_ref = np.array(atoms_factory().positions, copy=True)

    def __call__(self, calc: Calculator, atoms: Atoms) -> dict[str, Any]:
        kabsch_r, kabsch_t = kb.kabsch(atoms.positions, self._positions_ref)
        positions_aligned = kb.apply_transform(atoms.positions, kabsch_r, kabsch_t)
        kabsch_rmsd = kb.rmsd(positions_aligned, self._positions_ref)

        return {
            "kabsch_t": kabsch_t,
            "kabsch_r": kabsch_r,
            "kabsch_rmsd": kabsch_rmsd,
        }

Used in a minimization-based objective:

objective = MinimizationASEComputer(
    calc_factory=construct_calc,
    param_applier=apply_params,
    atoms_factory=PathAtomsFactory("ref.traj"),
    quantity_processors=[
        KabschDistance(PathAtomsFactory("ref.traj"))
    ],
    tag="kabsch",
).with_loss(lambda quants: quants["kabsch_rmsd"])

Note

If a quantity processor needs persistent reference data, acquire it in __init__ and treat it as fixed thereafter.

Do not lazily initialize or mutate processor state inside __call__. That makes evaluation harder to reason about and can become problematic under parallel execution.

What gets stored in the context

Like any quantity computer, ASE-based computers write their results into the evaluation context.

That means you can inspect quantities after evaluation:

from chemfit.abstract_objective_function import EvaluateContext

ctx = EvaluateContext()
quants = computer({"epsilon": 1.0, "sigma": 1.0}, ctx)

print(ctx.quantities)
print(ctx.to_meta_data())

The ASE computer also uses ctx.temp.atoms internally during evaluation. That is an implementation detail, but it is useful to know when debugging.

Composition vs subclassing

For these ASE computers, composition is usually enough.

Most customization should happen through:

  • calc_factory

  • param_applier

  • atoms_factory

  • atoms_post_processor

  • quantity_processors

That already covers most use cases.

Subclassing only becomes necessary when the evaluation flow itself changes. The main built-in example is MinimizationASEComputer, which keeps the single-point setup but inserts an optimization step before the quantity processors run.

Remarks

The most important practical points are these.

If you provide your own quantity_processors, include DefaultQuantityProcessor yourself if you still want calc.results in the output.

If you need a fixed structure plus a custom derived quantity, SinglePointASEComputer is usually the right tool.

If the quantity should be measured after local relaxation, use MinimizationASEComputer instead.

Once wrapped in a QuantityComputerObjectiveFunction, ASE-based computers integrate with Fitter, Combined Objective Functions, and Parallel Execution.