chemfit package

Submodules

chemfit.abstract_objective_function module

class ObjectiveFunctor[source]

Bases: ABC

abstractmethod __call__(parameters: dict[str, Any]) float[source]

Compute the objective value given a set of parameters.

Parameters:

parameters – Dictionary of parameter names to float values.

Returns:

Computed objective value (e.g., error metric).

Return type:

float

abstractmethod get_meta_data() dict[str, Any][source]

Get meta data.
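As an illustration of the interface above, here is a minimal sketch of a concrete functor. It defines a local stand-in for ObjectiveFunctor rather than importing it from chemfit, and the QuadraticObjective class, its target argument, and its meta data keys are hypothetical names chosen for this example.

```python
from abc import ABC, abstractmethod
from typing import Any, Optional


class ObjectiveFunctor(ABC):
    """Local stand-in mirroring the documented interface (not imported from chemfit)."""

    @abstractmethod
    def __call__(self, parameters: dict[str, Any]) -> float: ...

    @abstractmethod
    def get_meta_data(self) -> dict[str, Any]: ...


class QuadraticObjective(ObjectiveFunctor):
    """Hypothetical objective: squared distance of parameter 'x' from a target."""

    def __init__(self, target: float) -> None:
        self.target = target
        self.last_value: Optional[float] = None

    def __call__(self, parameters: dict[str, Any]) -> float:
        # Remember the most recent value so it can be reported as meta data.
        self.last_value = (parameters["x"] - self.target) ** 2
        return self.last_value

    def get_meta_data(self) -> dict[str, Any]:
        return {"target": self.target, "last_value": self.last_value}


objective = QuadraticObjective(target=2.0)
value = objective({"x": 0.5})
meta = objective.get_meta_data()
```

Because both methods are abstract, a subclass that omits either one cannot be instantiated.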

class QuantityComputer[source]

Bases: ABC

__call__(parameters: dict[str, Any]) dict[str, Any][source]

Call self as a function.

__init__()[source]

Initialize the QuantityComputer.

get_meta_data() dict[str, Any][source]

Get meta data.

class QuantityComputerObjectiveFunction(loss_function: Callable[[dict[str, Any]], float] | ObjectiveFunctor, quantity_computer: QuantityComputer)[source]

Bases: ObjectiveFunctor

__call__(parameters: dict[str, Any]) float[source]

Compute the objective value given a set of parameters.

Parameters:

parameters – Dictionary of parameter names to float values.

Returns:

Computed objective value (e.g., error metric).

Return type:

float

__init__(loss_function: Callable[[dict[str, Any]], float] | ObjectiveFunctor, quantity_computer: QuantityComputer) None[source]

Initialize the objective function with a quantity computer.

get_meta_data() dict[str, Any][source]

Get meta data.
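The composition of a quantity computer with a loss function might look like the sketch below. It assumes that the loss function receives the dictionary returned by the quantity computer and that meta data is delegated to the computer; neither detail is confirmed by the reference above. ToyComputer and ComposedObjective are hypothetical stand-ins, not chemfit classes.

```python
from typing import Any, Callable


class ToyComputer:
    """Hypothetical quantity computer: maps parameters to a dict of quantities."""

    def __call__(self, parameters: dict[str, Any]) -> dict[str, Any]:
        return {"energy": 2.0 * parameters["a"] + 1.0}

    def get_meta_data(self) -> dict[str, Any]:
        return {"tag": "toy"}


class ComposedObjective:
    """Stand-in for the documented class; assumes loss_function(quantities) -> float."""

    def __init__(
        self,
        loss_function: Callable[[dict[str, Any]], float],
        quantity_computer: ToyComputer,
    ) -> None:
        self.loss_function = loss_function
        self.quantity_computer = quantity_computer

    def __call__(self, parameters: dict[str, Any]) -> float:
        quantities = self.quantity_computer(parameters)  # compute the quantities first
        return self.loss_function(quantities)            # then reduce them to a scalar

    def get_meta_data(self) -> dict[str, Any]:
        return self.quantity_computer.get_meta_data()


def squared_error(quantities: dict[str, Any]) -> float:
    # Hypothetical loss: squared deviation from a reference energy of 5.0.
    return (quantities["energy"] - 5.0) ** 2


objective = ComposedObjective(squared_error, ToyComputer())
loss = objective({"a": 1.0})
```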

class SupportsGetMetaData(*args, **kwargs)[source]

Bases: Protocol

__init__(*args, **kwargs)
get_meta_data() dict[str, Any][source]

chemfit.ase_objective_function module

class AtomsFactory(*args, **kwargs)[source]

Bases: Protocol

Protocol for a function that creates an ASE Atoms object.

__call__() Atoms[source]

Create an atoms object.

__init__(*args, **kwargs)
class AtomsPostProcessor(*args, **kwargs)[source]

Bases: Protocol

Protocol for a function that post-processes an ASE Atoms object.

__call__(atoms: Atoms) None[source]

Modify the atoms in-place.

__init__(*args, **kwargs)
class CalculatorFactory(*args, **kwargs)[source]

Bases: Protocol

Protocol for a factory that constructs an ASE calculator in-place and attaches it to atoms.

__call__(atoms: Atoms) None[source]

Construct a calculator and overwrite atoms.calc.

__init__(*args, **kwargs)
class MinimizationASEComputer(dt: float = 0.01, fmax: float = 1e-05, max_steps: int = 2000, **kwargs)[source]

Bases: SinglePointASEComputer

Computer based on the closest local minimum.

__init__(dt: float = 0.01, fmax: float = 1e-05, max_steps: int = 2000, **kwargs) None[source]

Initialize a MinimizationASEComputer.

All kwargs are passed to SinglePointASEComputer.__init__.

Parameters:
  • dt – Time step for relaxation.

  • fmax – Force convergence criterion.

  • max_steps – Maximum optimizer steps.

relax_structure(parameters: dict[str, Any]) None[source]
class ParameterApplier(*args, **kwargs)[source]

Bases: Protocol

Protocol for a function that applies parameters to an ASE calculator.

__call__(atoms: Atoms, params: dict[str, Any]) None[source]

Applies a parameter dictionary to atoms.calc in-place.

__init__(*args, **kwargs)
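Since the protocols in this module are structural, any callable with a matching signature can be used. The sketch below shows the idea with a plain function acting as a parameter applier. FakeAtoms, FakeCalculator, and ParameterApplierLike are hypothetical stand-ins so the example runs without ASE or chemfit, and marking the protocol as runtime_checkable is an assumption made here for the isinstance check.

```python
from typing import Any, Protocol, runtime_checkable


class FakeCalculator:
    """Hypothetical stand-in for an ASE calculator holding a parameter dict."""

    def __init__(self) -> None:
        self.parameters = {}


class FakeAtoms:
    """Hypothetical stand-in for ase.Atoms with an attached calculator."""

    def __init__(self) -> None:
        self.calc = FakeCalculator()


@runtime_checkable
class ParameterApplierLike(Protocol):
    """Local protocol with the same call signature as the documented one."""

    def __call__(self, atoms: Any, params: dict[str, Any]) -> None: ...


def apply_to_calculator(atoms: Any, params: dict[str, Any]) -> None:
    # Modify atoms.calc in place; nothing is returned.
    atoms.calc.parameters.update(params)


atoms = FakeAtoms()
apply_to_calculator(atoms, {"epsilon": 0.5})
matches_protocol = isinstance(apply_to_calculator, ParameterApplierLike)
```

Note that a runtime isinstance check only verifies that the object is callable; it does not validate the argument types.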
class PathAtomsFactory(path: Path, index: int | None = None)[source]

Bases: AtomsFactory

Implementation of AtomsFactory which reads the atoms from a path.

__call__() Atoms[source]

Create an atoms object.

__init__(path: Path, index: int | None = None) None[source]

Initialize a path atoms factory.

class QuantityProcessor(*args, **kwargs)[source]

Bases: Protocol

Protocol for a function that returns the computed quantities after the calculator's calculate function has run.

__call__(calc: Calculator, atoms: Atoms) dict[str, Any][source]

Call self as a function.

__init__(*args, **kwargs)
class SinglePointASEComputer(calc_factory: CalculatorFactory, param_applier: ParameterApplier, atoms_factory: AtomsFactory, atoms_post_processor: AtomsPostProcessor | None = None, quantity_processors: list[QuantityProcessor] | None = None, tag: str | None = None)[source]

Bases: QuantityComputer

Base class for a single point ASE-based computer.

This class loads a reference configuration, optionally post-processes the structure, attaches a calculator, and provides an interface for evaluating parameters.

__init__(calc_factory: CalculatorFactory, param_applier: ParameterApplier, atoms_factory: AtomsFactory, atoms_post_processor: AtomsPostProcessor | None = None, quantity_processors: list[QuantityProcessor] | None = None, tag: str | None = None) None[source]

Initialize a SinglePointASEComputer.

Parameters:
  • calc_factory – Factory to create an ASE calculator given an Atoms object.

  • param_applier – Function that applies a dict of parameters to atoms.calc.

  • atoms_factory – Function to create the Atoms object.

  • atoms_post_processor – Optional function to modify or validate the Atoms object immediately after loading and before attaching the calculator.

  • quantity_processors – List of functions called after the calculate function to update the quantities dictionary.

  • tag – Optional label for this computer. Defaults to “tag_None” if None.

create_atoms_object() Atoms[source]

Create the atoms object, check it, optionally post-process it, and attach the calculator.

Returns:

ASE Atoms object with calculator attached.

Return type:

Atoms

get_meta_data() dict[str, Any][source]

Retrieve metadata for this objective function.

Returns:

Dictionary containing:

  • tag: User-defined label.

  • n_atoms: Number of atoms in the configuration.

  • weight: Final weight after any scaling.

  • last_energy: The last computed energy.

Return type:

dict[str, Union[str, int, float]]

property atoms

The atoms object. Accessing this property for the first time will create the atoms object.

property n_atoms

The number of atoms in the atoms object. May trigger creation of the atoms object.
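The lazy creation described for the atoms and n_atoms properties can be sketched as follows. This is an illustration of the pattern only, not the chemfit implementation: LazyAtomsHolder is a hypothetical class, and a plain list of element symbols stands in for an ASE Atoms object.

```python
class LazyAtomsHolder:
    """Hypothetical holder that builds its atoms object on first access."""

    def __init__(self, atoms_factory):
        self._atoms_factory = atoms_factory
        self._atoms = None  # nothing is created at construction time

    @property
    def atoms(self):
        if self._atoms is None:
            # First access: call the factory once and cache the result.
            self._atoms = self._atoms_factory()
        return self._atoms

    @property
    def n_atoms(self):
        # Goes through the atoms property, so it may trigger creation.
        return len(self.atoms)


factory_calls = []


def water_factory():
    factory_calls.append("called")
    return ["O", "H", "H"]  # stand-in for an Atoms object


holder = LazyAtomsHolder(water_factory)
calls_before_access = len(factory_calls)
n_atoms = holder.n_atoms
_ = holder.atoms  # second access reuses the cached object
calls_after_access = len(factory_calls)
```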

check_protocol(obj: Any | None, prot: Any)[source]
default_quantity_processor(calc: Calculator, atoms: Atoms) dict[str, Any][source]

chemfit.combined_objective_function module

class CombinedObjectiveFunction(objective_functions: Sequence[Callable[[dict[str, Any]], float]], weights: Sequence[float] | None = None)[source]

Bases: ObjectiveFunctor

Represents a weighted sum of multiple objective functions.

Each objective function accepts a dictionary of parameters (str -> float) and returns a float. Internally, each function is paired with a non-negative weight. Calling the instance returns the weighted sum of all objective-function evaluations.

classmethod add_flat(combined_objective_functions_list: Sequence[Self], weights: Sequence[float] | None = None) Self[source]

Create a new, “flat” CombinedObjectiveFunction by merging multiple existing instances.

Each input instance is scaled by its corresponding weight, and all internal objective functions are concatenated into a single-level structure.

Parameters:
  • combined_objective_functions_list (Sequence[CombinedObjectiveFunction]) – A sequence of CombinedObjectiveFunction instances to combine.

  • weights (Sequence[float]) – A sequence of non-negative floats, one per CombinedObjectiveFunction. Each sub-instance’s internal weights are multiplied by its associated weight.

Returns:

A new instance whose objective_functions list is the concatenation of all sub-instances’ objective functions, and whose weights list is the scaled and concatenated weights.

Return type:

CombinedObjectiveFunction

Raises:

AssertionError – If the lengths of combined_objective_functions_list and weights differ, or if any weight is negative.
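The flattening rule can be sketched on plain lists. In this illustration each combined objective is represented as a hypothetical (functions, weights) pair instead of a CombinedObjectiveFunction instance, and flatten_weighted_groups is a name invented for the example.

```python
def flatten_weighted_groups(groups, group_weights):
    """Merge (functions, weights) groups into one flat pair of lists."""
    assert len(groups) == len(group_weights), "one weight per group is required"
    assert all(w >= 0 for w in group_weights), "weights must be non-negative"
    flat_functions, flat_weights = [], []
    for (functions, weights), group_weight in zip(groups, group_weights):
        flat_functions.extend(functions)
        # Each internal weight is multiplied by the weight of its group.
        flat_weights.extend(group_weight * w for w in weights)
    return flat_functions, flat_weights


def linear(p):
    return p["x"]


def quadratic(p):
    return p["x"] ** 2


def constant(p):
    return 1.0


group_a = ([linear, quadratic], [1.0, 2.0])
group_b = ([constant], [0.5])
functions, weights = flatten_weighted_groups([group_a, group_b], [2.0, 4.0])

# The flat weighted sum equals the nested one: 2*(1*3 + 2*9) + 4*(0.5*1).
total = sum(w * func({"x": 3.0}) for func, w in zip(functions, weights))
```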

__call__(params: dict[str, Any], idx_slice: slice = slice(None, None, None)) float[source]

Evaluate the combined objective at a given parameter dictionary.

Each individual objective function is called (with a shallow copy of params), multiplied by its weight, and summed into a single scalar result.

Parameters:

params (dict) – A dictionary mapping parameter names (str) to values (float). A copy is made for each objective function call to guard against in-place modifications.

Returns:

The weighted sum of all objective-function evaluations.

Return type:

float
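A minimal sketch of the evaluation rule, written as a free function so it runs without chemfit. The function and term names are hypothetical; the point is that each term receives its own shallow copy, so a term that mutates its argument cannot affect the caller's dictionary or the other terms.

```python
def weighted_sum(objective_functions, weights, params):
    """Evaluate every term on a copy of params and sum the weighted results."""
    total = 0.0
    for func, weight in zip(objective_functions, weights):
        total += weight * func(dict(params))  # dict(...) makes a shallow copy
    return total


def mutating_term(p):
    p["x"] = 0.0  # in-place modification of the copy
    return 1.0


def quadratic_term(p):
    return p["x"] ** 2


params = {"x": 3.0}
result = weighted_sum([mutating_term, quadratic_term], [2.0, 0.5], params)
```

Here quadratic_term still sees x = 3.0 even though mutating_term ran first, so the result is 2.0 * 1.0 + 0.5 * 9.0.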

__init__(objective_functions: Sequence[Callable[[dict[str, Any]], float]], weights: Sequence[float] | None = None) None[source]

Initialize a CombinedObjectiveFunction.

Parameters:
  • objective_functions (Sequence[Callable[[dict], float]]) – A sequence of callables. Each callable must accept a dictionary mapping parameter names (str) to values (float) and return a float.

  • weights (Sequence[float], optional) – A sequence of non-negative floats specifying the weight for each objective function. If None, all weights default to 1.0.

Raises:

AssertionError – If weights is provided but its length differs from the number of objective functions, or if any weight is negative.

add(obj_funcs: Sequence[Callable[[dict[str, Any]], float]] | Callable[[dict[str, Any]], float], weights: Sequence[float] | float = 1.0) Self[source]

Add one or more objective functions (and corresponding weights) to this instance.

If obj_funcs is a single callable, it is appended; if it is a sequence of callables, each is appended in order. The weights argument must align:
  • If weights is a single float, that same weight is used for each newly added function.

  • If weights is a sequence, its length must match the number of functions being added.

Parameters:
  • obj_funcs (Callable[[dict], float] or Sequence[Callable[[dict], float]]) – Either a single objective-function callable or a sequence of such callables. Each callable must accept a dict and return a float.

  • weights (float or Sequence[float], optional) – Either a float (used for every new function) or a sequence of non-negative floats. If a sequence, its length must equal the number of functions in obj_funcs. Defaults to 1.0.

Returns:

The current instance (allows chaining).

Return type:

Self

Raises:

AssertionError – If weights is a sequence but its length does not match the number of functions in obj_funcs, or if any provided weight is negative.

gather_meta_data(idx_slice: slice = slice(None, None, None)) list[dict[str, Any] | None][source]

Gather the meta data of each term and append it to a list.

If a slice is specified via the idx_slice argument, the list only contains the results for that slice.

get_meta_data() dict[str, Any][source]

Get meta data.

n_terms() int[source]

Return the number of objective terms.

Returns:

The number of (function, weight) pairs stored internally.

Return type:

int

chemfit.data_utils module

process_csv(paths_to_csv: Path | Sequence[Path], index: slice | Sequence[slice] = slice(None, None, None)) tuple[list[Path], list[str], list[float]][source]

Load a dataset CSV and extract file paths, tags, and reference energies.

If a list of paths is passed it forwards them one by one to process_single_csv and collects the results.

Parameters:
  • paths_to_csv (Union[Path, Sequence[Path]]) – Either a single path to a CSV or a list of paths.

  • index (Union[slice, Sequence[slice]]) – Either a single slice or a list of slices which is applied to the data read from the CSVs

Returns:

  • paths: List of resolved Path objects to each data file.

  • tags: List of dataset tag strings.

  • energies: List of reference energies as floats.

Return type:

tuple[list[Path], list[str], list[float]]

process_single_csv(path_to_csv: Path, index: slice = slice(None, None, None)) tuple[list[Path], list[str], list[float]][source]

Load a dataset CSV and extract file paths, tags, and reference energies.

The CSV must include the following columns:
  • Either path or file:
    • If path is present, each entry may be absolute or relative to the current working directory.

    • Otherwise, file entries are taken as relative to the CSV’s parent directory.

    • If both are present, path takes precedence.

  • tag: A short string label for each dataset.

  • reference_energy: A numeric reference energy for each dataset.

Additional columns are permitted and ignored.

Parameters:
  • path_to_csv (Path) – Path to the CSV file describing the datasets.

  • index (slice, default slice(None, None, None)) – A slice which is applied to the data read from the CSV.

Returns:

  • paths: List of resolved Path objects to each data file.

  • tags: List of dataset tag strings.

  • energies: List of reference energies as floats.

Return type:

tuple[list[Path], list[str], list[float]]

Raises:
  • FileNotFoundError – If the CSV file does not exist.

  • KeyError – If neither a path nor a file column is present, or if the tag or reference_energy column is missing.

  • ValueError – If any reference_energy value cannot be converted to float.
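The column rules above can be sketched with the standard csv module. This is an illustration under the documented rules, not the chemfit implementation; read_dataset_csv and the sample file names are hypothetical, and the index slicing is left out for brevity.

```python
import csv
import tempfile
from pathlib import Path


def read_dataset_csv(path_to_csv: Path):
    """Return (paths, tags, energies) following the documented column rules."""
    paths, tags, energies = [], [], []
    # open() raises FileNotFoundError if the CSV does not exist.
    with open(path_to_csv, newline="") as handle:
        reader = csv.DictReader(handle)
        fields = reader.fieldnames or []
        if "path" not in fields and "file" not in fields:
            raise KeyError("CSV needs a 'path' or a 'file' column")
        for column in ("tag", "reference_energy"):
            if column not in fields:
                raise KeyError(f"CSV is missing the '{column}' column")
        for row in reader:
            if "path" in fields:
                data_path = Path(row["path"])  # 'path' takes precedence
            else:
                # 'file' entries are relative to the CSV's parent directory.
                data_path = path_to_csv.parent / row["file"]
            paths.append(data_path.resolve())
            tags.append(row["tag"])
            # float() raises ValueError for non-numeric entries.
            energies.append(float(row["reference_energy"]))
    return paths, tags, energies


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    csv_path.write_text("file,tag,reference_energy,comment\ndimer.xyz,dimer,-0.25,ignored\n")
    paths, tags, energies = read_dataset_csv(csv_path)
    expected_path = (Path(tmp) / "dimer.xyz").resolve()
```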

chemfit.debug_utils module

log_all_methods(obj: LoggedObject, log_func: Callable[[str], None], *args, **kwargs) LoggedObject[source]

Return a proxy object that logs method calls and delegates everything to obj.
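One common way to build such a proxy is to intercept attribute lookup with __getattr__. The sketch below shows that pattern; LoggingProxy and its message format are hypothetical and are not taken from chemfit.

```python
class LoggingProxy:
    """Hypothetical proxy that logs method calls and delegates to the wrapped object."""

    def __init__(self, obj, log_func):
        self._obj = obj
        self._log_func = log_func

    def __getattr__(self, name):
        # Only called for attributes not found on the proxy itself.
        attr = getattr(self._obj, name)
        if not callable(attr):
            return attr  # plain attributes are passed through unchanged

        def logged(*args, **kwargs):
            self._log_func(f"calling {name} args={args} kwargs={kwargs}")
            result = attr(*args, **kwargs)
            self._log_func(f"{name} returned {result!r}")
            return result

        return logged


messages = []
proxy = LoggingProxy([3, 1, 2], messages.append)
count = proxy.count(1)
```

A proxy of this kind does not intercept special methods such as __len__, because Python looks those up on the type rather than on the instance.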

log_invocation(func: Callable[[Any], T], log_func: Callable[[str], None], log_args: bool = True, log_res: bool = True) Callable[[Any], T][source]

chemfit.fitter module

class CallbackInfo(opt_params: 'dict[str, Any]', opt_loss: 'float', cur_params: 'dict[str, Any]', cur_loss: 'float', step: 'int', info: 'FitInfo')[source]

Bases: object

__init__(opt_params: dict[str, Any], opt_loss: float, cur_params: dict[str, Any], cur_loss: float, step: int, info: FitInfo) None
cur_loss: float
cur_params: dict[str, Any]
info: FitInfo
opt_loss: float
opt_params: dict[str, Any]
step: int
class FitInfo(initial_value: 'float | None' = None, final_value: 'float | None' = None, time_taken: 'float | None' = None, n_evals: 'int' = 0)[source]

Bases: object

__init__(initial_value: float | None = None, final_value: float | None = None, time_taken: float | None = None, n_evals: int = 0) None
final_value: float | None = None
initial_value: float | None = None
n_evals: int = 0
time_taken: float | None = None
class Fitter(objective_function: Callable[[dict[str, Any]], float], initial_params: dict[str, Any], bounds: dict[str, Any] | None = None, near_bound_tol: float | None = None, value_bad_params: float = 100000.0)[source]

Bases: object

__init__(objective_function: Callable[[dict[str, Any]], float], initial_params: dict[str, Any], bounds: dict[str, Any] | None = None, near_bound_tol: float | None = None, value_bad_params: float = 100000.0) None[source]

Initialize a Fitter.

Parameters:
  • objective_function (Callable[[dict], float]) – The objective function to be minimized.

  • initial_params (dict) – Initial values of the parameters.

  • bounds (Optional[dict]) – Dictionary specifying bounds for each parameter.

  • near_bound_tol (Optional[float]) – If specified, checks whether any parameters are too close to their bounds and logs a warning if so.

  • value_bad_params (float) – Threshold value beyond which the objective function is considered to be in a poor or invalid region.

fit_nevergrad(budget: int, optimizer_str: str = 'NgIohTuned', **kwargs) dict[str, Any][source]
fit_scipy(method: str = 'L-BFGS-B', **kwargs) dict[str, Any][source]

Optimize parameters using SciPy’s minimize function.

Parameters:
  • method (str) – Name of the optimization method passed to scipy.optimize.minimize. Defaults to 'L-BFGS-B'.

  • **kwargs – Additional keyword arguments passed directly to scipy.optimize.minimize.

Returns:

Dictionary of optimized parameter values.

Return type:

dict

Warning

If the optimizer does not converge, a warning is logged.

Example

>>> from chemfit.fitter import Fitter
>>> def objective_function(params: dict):
...     return 2.0 * (params["x"] - 2) ** 2 + 3.0 * (params["y"] + 1) ** 2
>>> initial_params = dict(x=0.0, y=0.0)
>>> fitter = Fitter(objective_function=objective_function, initial_params=initial_params)
>>> optimal_params = fitter.fit_scipy()
>>> print(optimal_params)
{'x': 2.0, 'y': -1.0}
hook_post_fit(opt_params: dict[str, Any])[source]

A hook, which is invoked after optimizing.

hook_pre_fit()[source]

A hook, which is invoked before optimizing.

ob_func_wrapper(ob_func: Any) Callable[[dict[str, Any]], float][source]

Wraps the objective function and applies some checks plus logging.

register_callback(func: Callable[[CallbackInfo], None], n_steps: int)[source]

Register a callback which is executed after every n_steps of the optimization.

Multiple callbacks may be registered. They are executed in the order of registration. The callback must be a callable with the following signature:

func(arg: CallbackInfo)

The CallbackInfo is a dataclass with the following attributes:
  • opt_params: The optimal parameters at the time the callback is invoked.

  • opt_loss: The loss value corresponding to the optimal parameters.

  • cur_params: The parameters tested most recently when the callback is invoked.

  • cur_loss: The loss value associated with the most recently tested parameters.

  • step: The number of optimization steps performed so far (generally not equal to the number of loss function evaluations).

  • info: The current FitInfo instance of the fitter at the time the callback is invoked.
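The registration and dispatch behaviour can be sketched without an optimizer. In this illustration StepInfo is a reduced, hypothetical stand-in for CallbackInfo (it omits the info field), CallbackRegistry is not a chemfit class, and the loop over fixed trial values only simulates optimization steps.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StepInfo:
    """Reduced, hypothetical stand-in for CallbackInfo."""

    opt_params: dict[str, Any]
    opt_loss: float
    cur_params: dict[str, Any]
    cur_loss: float
    step: int


class CallbackRegistry:
    """Hypothetical registry that fires callbacks every n_steps steps."""

    def __init__(self) -> None:
        self._callbacks = []

    def register_callback(self, func, n_steps: int) -> None:
        self._callbacks.append((func, n_steps))

    def notify(self, info: StepInfo) -> None:
        # Callbacks run in the order in which they were registered.
        for func, n_steps in self._callbacks:
            if info.step % n_steps == 0:
                func(info)


registry = CallbackRegistry()
every_two, every_three = [], []
registry.register_callback(lambda info: every_two.append((info.step, info.opt_loss)), n_steps=2)
registry.register_callback(lambda info: every_three.append(info.step), n_steps=3)

# Simulated optimization: try fixed values of x and track the best loss so far.
best_params, best_loss = None, float("inf")
for step, x in enumerate([3.0, 1.0, 2.0, 0.5, 1.5, 0.0], start=1):
    cur_params, cur_loss = {"x": x}, x**2
    if cur_loss < best_loss:
        best_params, best_loss = cur_params, cur_loss
    registry.notify(StepInfo(best_params, best_loss, cur_params, cur_loss, step))
```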

chemfit.kabsch module

apply_transform(P: ndarray[tuple[Any, ...], dtype[float64]], R: ndarray[tuple[Any, ...], dtype[float64]], t: ndarray[tuple[Any, ...], dtype[float64]]) ndarray[tuple[Any, ...], dtype[float64]][source]

Apply affine transform defined by rotation R and translation t to points P.

kabsch(P: ndarray[tuple[Any, ...], dtype[float64]], Q: ndarray[tuple[Any, ...], dtype[float64]], weights: ndarray[tuple[Any, ...], dtype[float64]] | None = None, allow_reflection: bool = False) tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]][source]

Compute the optimal rigid transformation that aligns P onto Q using the Kabsch algorithm.

This implementation assumes row-vector points of shape (N, D) and solves for rotation R and translation t in the mapping:

Q ≈ P @ R + t

The solution minimizes the root-mean-square deviation (RMSD) between P transformed and Q, optionally with per-point weights.

The algorithm:

  1. Compute centroids of P and Q (weighted if weights provided).

  2. Subtract centroids to get centered coordinates P0, Q0.

  3. Compute the cross-covariance matrix:

    C = P0.T @ Q0          # (D, D) for row-vector convention
    
  4. Perform singular value decomposition:

    U, S, Vt = np.linalg.svd(C)
    
  5. Compute rotation:

    R = U @ Vt
    

    If allow_reflection is False and det(R) < 0, flip the sign of the last row of Vt before recomputing R to ensure a proper rotation (det(R) = +1).

  6. Compute translation:

    t = cQ - cP @ R
    
Parameters:
  • P (ndarray of shape (N, D)) – Source point coordinates.

  • Q (ndarray of shape (N, D)) – Target point coordinates, corresponding 1-to-1 with P.

  • weights (ndarray of shape (N,), optional) – Nonnegative weights for each correspondence. If provided, centroids and covariance are computed with these weights.

  • allow_reflection (bool, default=False) – If False, the solution is constrained to det(R) = +1 (proper rotation). If True, improper rotations (reflections) are allowed.

Returns:

  • R (ndarray of shape (D, D)): Optimal rotation matrix.

  • t (ndarray of shape (D,)): Translation vector.

Return type:

Tuple[ndarray, ndarray]

Raises:

ValueError – If P and Q have mismatched shapes, fewer than D points are provided, or if weights are invalid (negative, wrong shape, or zero sum).

Notes

  • Works for any dimensionality D >= 2.

  • For column-vector convention (R @ P + t), the covariance and multiplication order must be adjusted.

  • The returned transform is optimal in the least-squares sense and preserves distances (no scaling or shearing).
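The algorithm can be sketched in a few lines of NumPy. This is a simplified illustration, not the chemfit implementation: it assumes NumPy is installed, ignores weights and input validation, and always enforces a proper rotation. It forms the rotation as U @ Vt, which is the product that satisfies the row-vector mapping Q ≈ P @ R + t for C = P0.T @ Q0. The name kabsch_sketch and the test points are hypothetical.

```python
import numpy as np


def kabsch_sketch(P, Q):
    """Unweighted row-vector Kabsch: return (R, t) such that Q ≈ P @ R + t."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)  # centroids
    P0, Q0 = P - cP, Q - cQ                  # centered coordinates
    C = P0.T @ Q0                            # cross-covariance, shape (D, D)
    U, _, Vt = np.linalg.svd(C)              # C = U @ diag(S) @ Vt
    R = U @ Vt
    if np.linalg.det(R) < 0:
        # Improper rotation: flip the last row of Vt and recompute R.
        Vt[-1, :] *= -1.0
        R = U @ Vt
    t = cQ - cP @ R                          # translation
    return R, t


# Build a target set by rotating and translating known points.
theta = 0.3
R_true = np.array([[np.cos(theta), np.sin(theta)],
                   [-np.sin(theta), np.cos(theta)]])
t_true = np.array([1.5, -0.5])
P = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0]])
Q = P @ R_true + t_true

R, t = kabsch_sketch(P, Q)
max_error = float(np.abs(P @ R + t - Q).max())
```

For noise-free input such as this, the recovered transform reproduces Q up to floating-point rounding.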

rmsd(A: ndarray[tuple[Any, ...], dtype[float64]], B: ndarray[tuple[Any, ...], dtype[float64]], weights: ndarray[tuple[Any, ...], dtype[float64]] | None = None) float[source]

Root mean square deviation between two point sets A and B.

chemfit.mpi_wrapper_cob module

class MPIWrapperCOB(cob: CombinedObjectiveFunction, comm: Any | None = None, mpi_debug_log: bool = False)[source]

Bases: ObjectiveFunctor

__call__(params: dict[str, Any]) float[source]

Compute the objective value given a set of parameters.

Parameters:

params – Dictionary of parameter names to float values.

Returns:

Computed objective value (e.g., error metric).

Return type:

float

__init__(cob: CombinedObjectiveFunction, comm: Any | None = None, mpi_debug_log: bool = False) None[source]

Initialize wrapper for combined objective function.

gather_meta_data() list[dict[str, Any] | None][source]
get_meta_data() dict[str, Any][source]

Get meta data.

worker_gather_meta_data()[source]
worker_loop()[source]
worker_process_params(params: dict[str, Any])[source]
class Signal(*values)[source]

Bases: Enum

ABORT = -1
GATHER_META_DATA = 0
slice_up_range(n: int, n_ranks: int)[source]

chemfit.plot_utils module

plot_energies(energy_ref: Sequence[float], energy_fit: Sequence[float], n_atoms: Sequence[int], tags: Sequence[str], output_folder: Path) None[source]
plot_progress_curve(progress: list[float], outpath: Path) None[source]

Save a semi-log plot of the objective values (progress) versus iteration index.

tags_as_ticks(ax: Axes, tags: Sequence[str], **kwargs)[source]

chemfit.scme_factories module

class SCMECalculatorFactory(default_scme_params: dict[str, Any], path_to_scme_expansions: Path | None, parametrization_key: str | None)[source]

Bases: object

__call__(atoms: Atoms) Any[source]

Call self as a function.

__init__(default_scme_params: dict[str, Any], path_to_scme_expansions: Path | None, parametrization_key: str | None) None[source]

Create an SCME calculator.

class SCMEParameterApplier[source]

Bases: object

__call__(atoms: Atoms, params: dict[str, Any]) None[source]

Assign SCME parameter values to the attached calculator.

chemfit.scme_setup module

arrange_water_in_ohh_order(atoms: Atoms) Atoms[source]

Reorder atoms so each water molecule appears as O, H, H.

Parameters:

atoms (Atoms) – Original Atoms object containing water molecules.

Returns:

New Atoms object with OHH ordering and no constraints.

Return type:

Atoms

Raises:

ValueError – If atom counts or ratios are inconsistent with water.

check_water_is_in_ohh_order(atoms: Atoms, oh_distance_tol: float = 2.0) bool[source]

Validate that each water molecule is ordered O, H, H and within tolerance.

Parameters:
  • atoms (Atoms) – Atoms object to validate.

  • oh_distance_tol (float, optional) – Maximum allowed O-H distance (default is 2.0 Å).

Raises:

ValueError – If ordering or distances violate water OHH assumptions.
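The kind of check described here can be sketched on plain lists, without ASE. The function below is a hypothetical illustration that returns a bool instead of raising; it takes element symbols and Cartesian positions as separate lists, which is an assumption made to keep the example self-contained.

```python
import math


def is_ohh_ordered(symbols, positions, oh_distance_tol=2.0):
    """Return True if atoms come in O, H, H triples with short O-H distances."""
    if len(symbols) % 3 != 0:
        return False
    for start in range(0, len(symbols), 3):
        if list(symbols[start:start + 3]) != ["O", "H", "H"]:
            return False
        oxygen = positions[start]
        for hydrogen in positions[start + 1:start + 3]:
            # Each hydrogen must lie within the tolerance of its oxygen.
            if math.dist(oxygen, hydrogen) > oh_distance_tol:
                return False
    return True


water_positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
ordered = is_ohh_ordered(["O", "H", "H"], water_positions)
misordered = is_ohh_ordered(["H", "O", "H"], water_positions)
stretched = is_ohh_ordered(["O", "H", "H"], [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (-0.24, 0.93, 0.0)])
```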

setup_calculator(atoms: Atoms, params: dict[str, Any], path_to_scme_expansions: Path | None, parametrization_key: str | None) pyscme.scme_calculator.SCMECalculator[source]
setup_expansions(calc: pyscme.scme_calculator.SCMECalculator, parametrization_key: str, path_to_scme_expansions: Path)[source]

chemfit.utils module

class ExtendedJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]

Bases: JSONEncoder

default(o: Any)[source]

Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).

For example, to support arbitrary iterators, you could implement default like this:

def default(self, o):
    try:
        iterable = iter(o)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Let the base class default method raise the TypeError
    return super().default(o)
check_params_near_bounds(params: dict[str, Any], bounds: dict[str, Any], relative_tol: float) list[tuple[str, float, float, float]][source]

Check if any of the parameters are near or beyond the bounds.

The criteria checked are

  1. param_value < lower + relative_tol * (upper - lower)

  2. param_value > upper - relative_tol * (upper - lower)

Parameters:
  • params (dict) – the dict of params to check

  • bounds (dict) – the dict of bounds to check

  • relative_tol (float) – The tolerance, relative to the span of the bounds. Positive numbers mean the values must fulfill a stricter bound, zero means the values must fulfill the exact bound, and negative numbers mean the values must fulfill a looser bound.

Returns:

A list of tuples with information about the parameters that violate the constraint. Each tuple contains:
  • A string identifying the parameter in a flattened dict

  • The value of the parameter

  • The lower bound

  • The upper bound
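The two criteria can be sketched for a flat parameter dictionary. This illustration assumes that bounds maps each name to a (lower, upper) pair and skips the handling of nested dictionaries; near_bounds is a hypothetical name, not the chemfit function.

```python
def near_bounds(params, bounds, relative_tol):
    """Return (name, value, lower, upper) for parameters near or beyond a bound."""
    flagged = []
    for name, value in params.items():
        if name not in bounds:
            continue  # unbounded parameters are never flagged
        lower, upper = bounds[name]
        margin = relative_tol * (upper - lower)
        # Criterion 1: too close to (or below) the lower bound.
        # Criterion 2: too close to (or above) the upper bound.
        if value < lower + margin or value > upper - margin:
            flagged.append((name, value, lower, upper))
    return flagged


params = {"a": 0.05, "b": 0.5, "c": 9.9}
bounds = {"a": (0.0, 1.0), "b": (0.0, 1.0), "c": (0.0, 10.0)}
flagged = near_bounds(params, bounds, relative_tol=0.1)
```

With a tolerance of 0.1, parameter a lies within 10 % of its lower bound and c within 10 % of its upper bound, so both are reported while b is not.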

dump_dict_to_file(file: Path, dictionary: dict) None[source]

Write dictionary as JSON to file (with indent=4).

next_free_folder(base: Path) Path[source]

If ‘path/to/base’ does not exist, return ‘path/to/base’. Otherwise attempt ‘path/to/base_0’, ‘path/to/base_1’, etc. until finding a non-existent Path, then return that.
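The naming rule can be sketched with pathlib. This is an illustration of the documented behaviour, not the chemfit implementation; next_free_path is a hypothetical name and the example runs inside a temporary directory.

```python
import tempfile
from pathlib import Path


def next_free_path(base: Path) -> Path:
    """Return base if it is unused, otherwise base_0, base_1, ... (first free one)."""
    if not base.exists():
        return base
    counter = 0
    while True:
        candidate = base.with_name(f"{base.name}_{counter}")
        if not candidate.exists():
            return candidate
        counter += 1


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "run"
    first = next_free_path(base)   # nothing exists yet
    base.mkdir()
    (Path(tmp) / "run_0").mkdir()
    second = next_free_path(base)  # 'run' and 'run_0' are taken
    names = (first.name, second.name)
```

The function only finds a free name; creating the folder is left to the caller, so two processes calling it at the same time could receive the same path.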

Module contents