ASE-Based Quantity Computers¶
The chemfit.ase_objective_function module provides quantity computers built
around ASE.
Minimal example¶
from chemfit.ase_objective_function import SinglePointASEComputer, PathAtomsFactory
def construct_calc(atoms):
atoms.calc = MyCalculator()
def apply_params(atoms, params):
atoms.calc.parameters.update(params)
computer = SinglePointASEComputer(
calc_factory=construct_calc,
param_applier=apply_params,
atoms_factory=PathAtomsFactory("geometry.xyz"),
)
quants = computer({"param": 1.0}) # returns dict with energy, forces, ...
print(quants["energy"])
The example above follows the standard pattern. You provide four pieces:
something that creates an
ase.Atomsobjectsomething that attaches a calculator to it
something that applies fitting parameters to that calculator
one or more quantity processors that turn the finished ASE calculation into a quantity dictionary
ChemFit takes care of the rest.
This is useful when you want to fit ASE calculators directly, either on fixed structures or on relaxed ones.
The main classes are:
Implementing the customization points¶
The arguments passed to SinglePointASEComputer
are just small callables with well-defined responsibilities.
In most cases, you do not need to subclass anything. You only need to provide functions (or callable objects) with the expected behavior.
Each component does one thing.
Calculator factory¶
A calculator factory receives an ase.Atoms object and attaches a
calculator to it.
from ase.calculators.lj import LennardJones
def construct_lj(atoms):
atoms.calc = LennardJones()
The factory should not run the calculation. It should only attach the calculator.
Parameter applier¶
A parameter applier receives the atoms object and the current parameter dictionary. Its job is to transfer fitting parameters into the attached calculator.
def apply_params_lj(atoms, params):
atoms.calc.parameters.epsilon = params["epsilon"]
atoms.calc.parameters.sigma = params["sigma"]
This function is called for every evaluation. It should not rely on cached state.
Atoms factory¶
An atoms factory takes no arguments and returns a single
ase.Atoms object.
from ase import Atoms
def make_dimer():
return Atoms("Ar2", positions=[[0, 0, 0], [3.5, 0, 0]])
You can also implement factories as callable objects with internal state (this applies to all factories not just the atoms factory):
from ase.io import read
class MyAtomsFactory:
def __init__(self, path, index=0):
self.path = path
self.index = index
def __call__(self):
return read(self.path, index=self.index)
PathAtomsFactory is a built-in
implementation of this pattern.
Note
The simplest atoms factory shipped with ChemFit is
PathAtomsFactory.
It reads a single structure from disk using ASE (ase.io.read()).
from chemfit.ase_objective_function import PathAtomsFactory
atoms_factory = PathAtomsFactory("geometry.traj")
atoms = atoms_factory()
You can also pass an ASE index:
atoms_factory = PathAtomsFactory("trajectory.xyz", index=0)
The important detail is that the factory must return exactly one
ase.Atoms object. If the index selects multiple images,
PathAtomsFactory raises an exception.
Atoms post-processor¶
An atoms post-processor receives the atoms object once, before it is cached by the computer instance.
Use it for structure-level setup that should be shared across evaluations.
from ase.constraints import FixAtoms
def freeze_first_atom(atoms):
atoms.set_constraint(FixAtoms(indices=[0]))
This function runs once during setup, not once per evaluation.
Remarks¶
Each customization point has a single responsibility:
the atoms factory creates atoms
the calculator factory attaches a calculator
the parameter applier updates calculator parameters
the quantity processor extracts quantities
All of these can be implemented either as plain functions or as callable
objects (classes implementing __call__).
Using callable objects allows you to store configuration or reference data in
__init__ while keeping evaluation itself stateless.
Keeping these responsibilities separate makes the resulting computer easier to test and reuse.
How a SinglePointASEComputer works¶
SinglePointASEComputer is the basic
ASE quantity computer.
A single evaluation does the following:
create the base atoms object once, using
atoms_factoryoptionally modify it once, using
atoms_post_processorcopy the cached atoms object into
ctx.temp.atomsattach a fresh calculator
apply the current parameter dictionary
call
atoms.calc.calculate(atoms)run all quantity processors and merge their outputs
The important consequence is that the reference structure is cached on the computer instance, while each evaluation still uses a fresh copy of the atoms and a fresh calculator.
That means the calculator can mutate internal state without corrupting later evaluations.
Turning an ASE computer into an objective term¶
In practice, ASE computers are almost always turned into objective terms
using with_loss().
from chemfit.ase_objective_function import SinglePointASEComputer, PathAtomsFactory
def construct_lj(atoms):
atoms.calc = LennardJones()
def apply_params_lj(atoms, params):
atoms.calc.parameters.epsilon = params["epsilon"]
atoms.calc.parameters.sigma = params["sigma"]
def loss_function(quants, e_ref):
return (quants["energy"] - e_ref) ** 2
computer = SinglePointASEComputer(
calc_factory=construct_lj,
param_applier=apply_params_lj,
atoms_factory=PathAtomsFactory("dimer.xyz"),
)
objective_term = computer.with_loss(
loss_function,
e_ref=-0.1,
)
loss = objective_term({"epsilon": 1.0, "sigma": 1.0})
Note
with_loss()
automatically binds additional keyword arguments to the loss function.
This means you can pass parameters such as e_ref directly, without
using functools.partial.
A real Lennard-Jones example¶
The following example fits a Lennard–Jones potential.
For several interatomic distances, one objective term is created per configuration. Each term evaluates the energy with ASE and compares it to a reference value.
All terms are combined into a single objective and optimized to recover the correct parameters.
import numpy as np
from chemfit.ase_objective_function import SinglePointASEComputer
from chemfit.combined_objective_function import CombinedObjectiveFunction
from chemfit.fitter import Fitter
from conftest import LJAtomsFactory, apply_params_lj, construct_lj
def e_lj(r, eps, sigma):
return 4 * eps * ((sigma / r)**12 - (sigma / r)**6)
def loss_function(quants, e_ref):
return (quants["energy"] - e_ref) ** 2
def lj_ob_term(r, eps, sigma):
computer = SinglePointASEComputer(
calc_factory=construct_lj,
param_applier=apply_params_lj,
atoms_factory=LJAtomsFactory(r),
tag=f"lj_{r}",
)
return computer.with_loss(
loss_function,
e_ref=e_lj(r, eps, sigma),
)
def get_objective(eps, sigma):
r_min = 2.0 ** (1 / 6) * sigma
r_list = np.linspace(0.925 * r_min, 3.0 * sigma)
return CombinedObjectiveFunction(
objective_functions=[lj_ob_term(r, eps, sigma) for r in r_list]
)
objective = get_objective(1.0, 1.0)
fitter = Fitter(
objective,
initial_params={"epsilon": 2.0, "sigma": 1.5},
)
optimal_params = fitter.fit_scipy()
Quantity processors¶
A quantity processor is a callable with signature
def processor(calc, atoms) -> dict[str, object]:
...
It receives the evaluated calculator and atoms object and returns a dictionary. The outputs of all quantity processors are merged into the final quantity dictionary.
The default quantity processor is
DefaultQuantityProcessor.
It returns:
everything in
calc.results"n_atoms"
A small example:
from chemfit.ase_objective_function import (
DefaultQuantityProcessor,
SinglePointASEComputer,
)
computer = SinglePointASEComputer(
calc_factory=construct_lj,
param_applier=apply_params_lj,
atoms_factory=PathAtomsFactory("dimer.xyz"),
quantity_processors=[DefaultQuantityProcessor()],
)
quants = computer({"epsilon": 1.0, "sigma": 1.0})
print(quants.keys())
One detail matters here: if you pass quantity_processors=None, ChemFit uses
[DefaultQuantityProcessor()].
If you pass your own list, ChemFit uses exactly that list. The default processor is not prepended automatically.
So if you want both the raw calculator results and your own derived quantities,
include DefaultQuantityProcessor
explicitly.
For example:
import numpy as np
from chemfit.ase_objective_function import (
DefaultQuantityProcessor,
SinglePointASEComputer,
)
def rms_force_processor(calc, atoms):
forces = calc.results.get("forces")
if forces is None:
return {}
return {"rms_force": np.sqrt((forces**2).mean())}
computer = SinglePointASEComputer(
calc_factory=construct_lj,
param_applier=apply_params_lj,
atoms_factory=PathAtomsFactory("dimer.xyz"),
quantity_processors=[
DefaultQuantityProcessor(),
rms_force_processor,
],
)
quants = computer({"epsilon": 1.0, "sigma": 1.0})
print(quants["energy"])
print(quants["rms_force"])
Warning
Because quantity processor outputs are merged with dict.update(), later
processors can overwrite keys returned by earlier ones.
Note
Quantity processors should follow the same basic rule as quantity computers: avoid mutating global state or instance state during evaluation.
In practice, a quantity processor should behave like a pure function of
calc and atoms, returning a dictionary of derived quantities.
This matters especially when evaluations may run in parallel.
Atoms post-processing¶
Sometimes the structure should be modified once before any evaluations happen.
That is what atoms_post_processor is for.
It is applied to the base atoms object before that object is cached.
def freeze_first_atom(atoms):
from ase.constraints import FixAtoms
atoms.set_constraint(FixAtoms(indices=[0]))
computer = SinglePointASEComputer(
calc_factory=construct_lj,
param_applier=apply_params_lj,
atoms_factory=PathAtomsFactory("geometry.xyz"),
atoms_post_processor=freeze_first_atom,
)
Since the processed atoms object is cached, this is the right place for structure-level setup that should be shared across all evaluations of the same computer instance.
MinimizationASEComputer¶
MinimizationASEComputer is a
subclass of SinglePointASEComputer
that performs a local geometry optimization before extracting quantities.
Internally it uses ASE’s ase.optimize.BFGS.
The workflow is almost the same as for the single-point computer, except that
after prepare_ctx(...) it runs
optimizer = BFGS(ctx.temp.atoms, logfile=None)
optimizer.run(fmax=self.fmax, steps=self.max_steps)
and only then calls the quantity processors.
A minimal example:
from chemfit.ase_objective_function import MinimizationASEComputer
computer = MinimizationASEComputer(
calc_factory=construct_lj,
param_applier=apply_params_lj,
atoms_factory=PathAtomsFactory("geometry.xyz"),
fmax=1e-4,
max_steps=500,
)
quants = computer({"epsilon": 1.0, "sigma": 1.0})
The constructor adds three parameters:
MinimizationASEComputer(
...,
dt=1e-2,
fmax=1e-5,
max_steps=2000,
)
In the current implementation, fmax and max_steps are used by BFGS.
dt is stored for compatibility with older code but is currently unused.
A distance-based example after relaxation¶
Here is a realistic example from the test suite. The relaxation is handled by
MinimizationASEComputer; the custom
quantity processor adds a derived geometric quantity.
from chemfit.abstract_objective_function import QuantityComputerObjectiveFunction
from chemfit.ase_objective_function import MinimizationASEComputer, PathAtomsFactory
REF_DISTANCE = 3.2
def compute_dimer_distance(calc, atoms):
quants = calc.results
quants["dimer_distance"] = atoms.get_distance(0, 3)
return quants
objective = QuantityComputerObjectiveFunction(
quantity_computer=MinimizationASEComputer(
calc_factory=construct_calc,
param_applier=apply_params,
atoms_factory=PathAtomsFactory("ref.traj"),
quantity_processors=[compute_dimer_distance],
tag="dimer_distance",
),
loss_function=lambda quants: (quants["dimer_distance"] - REF_DISTANCE) ** 2,
)
The important part here is that the quantity processor runs on the relaxed structure, not on the original one.
A more advanced processor: Kabsch RMSD¶
Quantity processors do not have to be plain functions. They can also be objects.
If a processor needs fixed reference data, the right place to acquire that state
is __init__. It should not be acquired lazily during __call__.
This follows the same rule as in Writing Quantity Computers: evaluation should not mutate hidden instance state.
An example using the Kabsch RMSD against the reference configuration looks like this:
from typing import Any
import numpy as np
from ase import Atoms
from ase.calculators.calculator import Calculator
import chemfit.kabsch as kb
from chemfit.ase_objective_function import AtomsFactory
class KabschDistance:
def __init__(self, atoms_factory: AtomsFactory):
self._positions_ref = np.array(atoms_factory().positions, copy=True)
def __call__(self, calc: Calculator, atoms: Atoms) -> dict[str, Any]:
kabsch_r, kabsch_t = kb.kabsch(atoms.positions, self._positions_ref)
positions_aligned = kb.apply_transform(atoms.positions, kabsch_r, kabsch_t)
kabsch_rmsd = kb.rmsd(positions_aligned, self._positions_ref)
return {
"kabsch_t": kabsch_t,
"kabsch_r": kabsch_r,
"kabsch_rmsd": kabsch_rmsd,
}
Used in a minimization-based objective:
objective = MinimizationASEComputer(
calc_factory=construct_calc,
param_applier=apply_params,
atoms_factory=PathAtomsFactory("ref.traj"),
quantity_processors=[
KabschDistance(PathAtomsFactory("ref.traj"))
],
tag="kabsch",
).with_loss(lambda quants: quants["kabsch_rmsd"])
Note
If a quantity processor needs persistent reference data, acquire it in
__init__ and treat it as fixed thereafter.
Do not lazily initialize or mutate processor state inside __call__.
That makes evaluation harder to reason about and can become problematic
under parallel execution.
What gets stored in the context¶
Like any quantity computer, ASE-based computers write their results into the evaluation context.
That means you can inspect quantities after evaluation:
from chemfit.abstract_objective_function import EvaluateContext
ctx = EvaluateContext()
quants = computer({"epsilon": 1.0, "sigma": 1.0}, ctx)
print(ctx.quantities)
print(ctx.to_meta_data())
The ASE computer also uses ctx.temp.atoms internally during evaluation.
That is an implementation detail, but it is useful to know when debugging.
Composition vs subclassing¶
For these ASE computers, composition is usually enough.
Most customization should happen through:
calc_factoryparam_applieratoms_factoryatoms_post_processorquantity_processors
That already covers most use cases.
Subclassing only becomes necessary when the evaluation flow itself changes. The
main built-in example is
MinimizationASEComputer, which keeps
the single-point setup but inserts an optimization step before the quantity
processors run.
Remarks¶
The most important practical points are these.
If you provide your own quantity_processors, include
DefaultQuantityProcessor
yourself if you still want calc.results in the output.
If you need a fixed structure plus a custom derived quantity,
SinglePointASEComputer is usually the
right tool.
If the quantity should be measured after local relaxation, use
MinimizationASEComputer instead.
Once wrapped in a
QuantityComputerObjectiveFunction,
ASE-based computers integrate with
Fitter,
Combined Objective Functions, and Parallel Execution.