chemfit package¶
Submodules¶
chemfit.abstract_objective_function module¶
- class ObjectiveFunctor[source]¶
Bases:
ABC
- class QuantityComputerObjectiveFunction(loss_function: Callable[[dict[str, Any]], float] | ObjectiveFunctor, quantity_computer: QuantityComputer)[source]¶
Bases:
ObjectiveFunctor
- __call__(parameters: dict[str, Any]) float[source]¶
Compute the objective value given a set of parameters.
- Parameters:
parameters – Dictionary of parameter names to float values.
- Returns:
Computed objective value (e.g., error metric).
- Return type:
float
- __init__(loss_function: Callable[[dict[str, Any]], float] | ObjectiveFunctor, quantity_computer: QuantityComputer) None[source]¶
Initialize the objective function with a quantity computer.
chemfit.ase_objective_function module¶
- class AtomsFactory(*args, **kwargs)[source]¶
Bases:
Protocol
Protocol for a function that creates an ASE Atoms object.
- __init__(*args, **kwargs)¶
- class AtomsPostProcessor(*args, **kwargs)[source]¶
Bases:
Protocol
Protocol for a function that post-processes an ASE Atoms object.
- __init__(*args, **kwargs)¶
- class CalculatorFactory(*args, **kwargs)[source]¶
Bases:
Protocol
Protocol for a factory that constructs an ASE calculator in-place and attaches it to atoms.
- __init__(*args, **kwargs)¶
- class MinimizationASEComputer(dt: float = 0.01, fmax: float = 1e-05, max_steps: int = 2000, **kwargs)[source]¶
Bases:
SinglePointASEComputer
Computer based on the closest local minimum.
- __init__(dt: float = 0.01, fmax: float = 1e-05, max_steps: int = 2000, **kwargs) None[source]¶
Initialize a MinimizationASEComputer.
All kwargs are passed to SinglePointASEComputer.__init__.
- Parameters:
dt – Time step for relaxation.
fmax – Force convergence criterion.
max_steps – Maximum optimizer steps.
- class ParameterApplier(*args, **kwargs)[source]¶
Bases:
Protocol
Protocol for a function that applies parameters to an ASE calculator.
- __call__(atoms: Atoms, params: dict[str, Any]) None[source]¶
Applies a parameter dictionary to atoms.calc in-place.
- __init__(*args, **kwargs)¶
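As a rough illustration of the in-place contract, the sketch below defines a structural protocol with the same call shape and a function that satisfies it. `FakeCalc`, `FakeAtoms`, `ApplierSketch` and `set_attributes` are hypothetical stand-ins. A real applier would receive an ASE Atoms object, and how parameters map onto a calculator depends on that calculator.

```python
from typing import Any, Protocol


class FakeCalc:
    """Hypothetical stand-in for a calculator with tunable attributes."""

    def __init__(self) -> None:
        self.epsilon = 1.0
        self.sigma = 1.0


class FakeAtoms:
    """Hypothetical stand-in for an object exposing a `calc` attribute."""

    def __init__(self) -> None:
        self.calc = FakeCalc()


class ApplierSketch(Protocol):
    """Same call shape as documented: (atoms, params) -> None."""

    def __call__(self, atoms: Any, params: dict[str, Any]) -> None: ...


def set_attributes(atoms: Any, params: dict[str, Any]) -> None:
    # Mutate atoms.calc in place; nothing is returned.
    for name, value in params.items():
        setattr(atoms.calc, name, value)


# A plain function satisfies the protocol structurally.
applier: ApplierSketch = set_attributes
atoms = FakeAtoms()
applier(atoms, {"epsilon": 0.25})
```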
- class PathAtomsFactory(path: Path, index: int | None = None)[source]¶
Bases:
AtomsFactory
Implementation of AtomsFactory which reads the atoms from a path.
- class QuantityProcessor(*args, **kwargs)[source]¶
Bases:
Protocol
Protocol for a function that returns the quantities after the calculate function.
- __init__(*args, **kwargs)¶
- class SinglePointASEComputer(calc_factory: CalculatorFactory, param_applier: ParameterApplier, atoms_factory: AtomsFactory, atoms_post_processor: AtomsPostProcessor | None = None, quantity_processors: list[QuantityProcessor] | None = None, tag: str | None = None)[source]¶
Bases:
QuantityComputer
Base class for a single point ASE-based computer.
This class loads a reference configuration, optionally post-processes the structure, attaches a calculator, and provides an interface for evaluating parameters.
- __init__(calc_factory: CalculatorFactory, param_applier: ParameterApplier, atoms_factory: AtomsFactory, atoms_post_processor: AtomsPostProcessor | None = None, quantity_processors: list[QuantityProcessor] | None = None, tag: str | None = None) None[source]¶
Initialize a SinglePointASEComputer.
- Parameters:
calc_factory – Factory to create an ASE calculator given an Atoms object.
param_applier – Function that applies a dict of parameters to atoms.calc.
atoms_factory – Function to create the Atoms object.
atoms_post_processor – Optional function to modify or validate the Atoms object immediately after loading and before attaching the calculator.
quantity_processors – Optional list of functions called after the calculate function to update the quantities dictionary.
tag – Optional label for this computer. Defaults to “tag_None” if None.
- create_atoms_object() Atoms[source]¶
Create the atoms object, check it, optionally post-process it, and attach the calculator.
- Returns:
ASE Atoms object with calculator attached.
- Return type:
Atoms
- get_meta_data() dict[str, Any][source]¶
Retrieve metadata for this objective function.
- Returns:
- Dictionary containing:
tag: User-defined label.
n_atoms: Number of atoms in the configuration.
weight: Final weight after any scaling.
last_energy: The last computed energy.
- Return type:
dict[str, Union[str, int, float]]
- property atoms¶
The atoms object. Accessing this property for the first time will create the atoms object.
- property n_atoms¶
The number of atoms in the atoms object. May trigger creation of the atoms object.
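The lazy creation described for atoms and n_atoms can be pictured with the following sketch. It is not the library's code: `LazyHolder` and its `factory` argument are hypothetical, and a plain list stands in for the Atoms object.

```python
from typing import Callable, Optional


class LazyHolder:
    """Hypothetical holder that builds its object on first access."""

    def __init__(self, factory: Callable[[], list]) -> None:
        self._factory = factory
        self._atoms: Optional[list] = None
        self.n_created = 0  # counts how often the factory ran

    @property
    def atoms(self) -> list:
        # Create the object only once, on first access.
        if self._atoms is None:
            self._atoms = self._factory()
            self.n_created += 1
        return self._atoms

    @property
    def n_atoms(self) -> int:
        # Goes through `atoms`, so it may trigger creation.
        return len(self.atoms)


holder = LazyHolder(lambda: ["O", "H", "H"])
count = holder.n_atoms                 # first access triggers creation
same = holder.atoms is holder.atoms    # later accesses reuse the object
```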
chemfit.combined_objective_function module¶
- class CombinedObjectiveFunction(objective_functions: Sequence[Callable[[dict[str, Any]], float]], weights: Sequence[float] | None = None)[source]¶
Bases:
ObjectiveFunctor
Represents a weighted sum of multiple objective functions.
Each objective function accepts a dictionary of parameters (str -> float) and returns a float. Internally, each function is paired with a non-negative weight. Calling the instance returns the weighted sum of all objective-function evaluations.
- classmethod add_flat(combined_objective_functions_list: Sequence[Self], weights: Sequence[float] | None = None) Self[source]¶
Create a new, “flat” CombinedObjectiveFunction by merging multiple existing instances.
Each input instance is scaled by its corresponding weight, and all internal objective functions are concatenated into a single-level structure.
- Parameters:
combined_objective_functions_list (Sequence[CombinedObjectiveFunction]) – A sequence of CombinedObjectiveFunction instances to combine.
weights (Sequence[float]) – A sequence of non-negative floats, one per CombinedObjectiveFunction. Each sub-instance’s internal weights are multiplied by its associated weight.
- Returns:
A new instance whose objective_functions list is the concatenation of all sub-instances’ objective functions, and whose weights list is the scaled and concatenated weights.
- Return type:
Self
- Raises:
AssertionError – If the lengths of combined_objective_functions_list and weights differ, or if any weight is negative.
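The weight scaling performed when flattening can be sketched with plain lists. This assumes each combined objective is represented as a pair of (functions, weights). `flatten` is a hypothetical helper, not part of chemfit, and treating a missing weights argument as all ones is an assumption.

```python
from typing import Any, Callable, Optional, Sequence

Objective = Callable[[dict[str, Any]], float]
Group = tuple[list[Objective], list[float]]


def flatten(groups: Sequence[Group],
            weights: Optional[Sequence[float]] = None) -> Group:
    """Merge groups into one level, scaling inner weights by the outer weight."""
    if weights is None:
        # Assumption: no outer weights means every group keeps its own weights.
        weights = [1.0] * len(groups)
    assert len(weights) == len(groups), "one outer weight per group"
    assert all(w >= 0 for w in weights), "weights must be non-negative"

    funcs: list[Objective] = []
    flat_weights: list[float] = []
    for (group_funcs, group_weights), outer in zip(groups, weights):
        funcs.extend(group_funcs)
        flat_weights.extend(outer * w for w in group_weights)
    return funcs, flat_weights


f = lambda p: p["x"]
g = lambda p: p["x"] ** 2
funcs, flat_weights = flatten([([f], [2.0]), ([f, g], [1.0, 3.0])], [0.5, 2.0])
```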
- __call__(params: dict[str, Any], idx_slice: slice = slice(None, None, None)) float[source]¶
Evaluate the combined objective at a given parameter dictionary.
Each individual objective function is called (with a shallow copy of params), multiplied by its weight, and summed into a single scalar result.
- Parameters:
params (dict) – A dictionary mapping parameter names (str) to values (float). A copy is made for each objective function call to guard against in-place modifications.
- Returns:
The weighted sum of all objective-function evaluations.
- Return type:
float
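A minimal sketch of the evaluation described above, assuming plain lists of callables and weights. `weighted_sum` is a hypothetical helper. The idx_slice argument of the real method is not documented here and is left out.

```python
from typing import Any, Callable, Sequence


def weighted_sum(objective_functions: Sequence[Callable[[dict[str, Any]], float]],
                 weights: Sequence[float],
                 params: dict[str, Any]) -> float:
    """Return the sum over i of weights[i] * objective_functions[i](copy of params)."""
    total = 0.0
    for func, weight in zip(objective_functions, weights):
        # Shallow copy so a function that mutates its input cannot
        # affect the caller or the next function.
        total += weight * func(dict(params))
    return total


def mutating(p: dict[str, Any]) -> float:
    p["x"] = 100.0  # in-place change stays local to the copy
    return 1.0


params = {"x": 2.0}
value = weighted_sum([mutating, lambda p: p["x"] ** 2], [0.5, 2.0], params)
```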
- __init__(objective_functions: Sequence[Callable[[dict[str, Any]], float]], weights: Sequence[float] | None = None) None[source]¶
Initialize a CombinedObjectiveFunction.
- Parameters:
objective_functions (Sequence[Callable[[dict], float]]) – A sequence of callables. Each callable must accept a dictionary mapping parameter names (str) to values (float) and return a float.
weights (Sequence[float], optional) – A sequence of non-negative floats specifying the weight for each objective function. If None, all weights default to 1.0.
- Raises:
AssertionError – If weights is provided but its length differs from the number of objective functions, or if any weight is negative.
- add(obj_funcs: Sequence[Callable[[dict[str, Any]], float]] | Callable[[dict[str, Any]], float], weights: Sequence[float] | float = 1.0) Self[source]¶
Add one or more objective functions (and corresponding weights) to this instance.
If obj_funcs is a single callable, it is appended; if it is a sequence of callables, each is appended in order. The weights argument must align:
- If weights is a single float, that same weight is used for each newly added function.
- If weights is a sequence, its length must match the number of functions being added.
- Parameters:
obj_funcs (Callable[[dict], float] or Sequence[Callable[[dict], float]]) – Either a single objective-function callable or a sequence of such callables. Each callable must accept a dict and return a float.
weights (float or Sequence[float], optional) – Either a float (used for every new function) or a sequence of non-negative floats. If a sequence, its length must equal the number of functions in obj_funcs. Defaults to 1.0.
- Returns:
The current instance (allows chaining).
- Return type:
Self
- Raises:
AssertionError – If weights is a sequence but its length does not match the number of functions in obj_funcs, or if any provided weight is negative.
chemfit.data_utils module¶
- process_csv(paths_to_csv: Path | Sequence[Path], index: slice | Sequence[slice] = slice(None, None, None)) tuple[list[Path], list[str], list[float]][source]¶
Load a dataset CSV and extract file paths, tags, and reference energies.
If a list of paths is passed it forwards them one by one to process_single_csv and collects the results.
- Parameters:
paths_to_csv (Union[Path, Sequence[Path]]) – Either a single path to a CSV or a list of paths.
index (Union[slice, Sequence[slice]]) – Either a single slice or a list of slices which is applied to the data read from the CSVs
- Returns:
paths: List of resolved Path objects to each data file.
tags: List of dataset tag strings.
energies: List of reference energies as floats.
- Return type:
tuple[list[Path], list[str], list[float]]
- process_single_csv(path_to_csv: Path, index: slice = slice(None, None, None)) tuple[list[Path], list[str], list[float]][source]¶
Load a dataset CSV and extract file paths, tags, and reference energies.
- The CSV must include the following columns:
- Either path or file:
If path is present, each entry may be absolute or relative to the current working directory.
Otherwise, file entries are taken as relative to the CSV’s parent directory.
If both are present, path takes precedence.
tag: A short string label for each dataset.
reference_energy: A numeric reference energy for each dataset.
Additional columns are permitted and ignored.
- Parameters:
path_to_csv (Path) – Path to the CSV file describing the datasets.
index (slice, default slice(None, None, None)) – A slice which is applied to the data read from the CSV.
- Returns:
paths: List of resolved Path objects to each data file.
tags: List of dataset tag strings.
energies: List of reference energies as floats.
- Return type:
tuple[list[Path], list[str], list[float]]
- Raises:
FileNotFoundError – If the CSV file does not exist.
KeyError – If neither a path nor a file column is present, or if the tag or reference_energy column is missing.
ValueError – If any reference_energy value cannot be converted to float.
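The column rules above can be sketched with the standard library. This is an illustration under stated assumptions, not the library's implementation: `read_dataset_csv` is a hypothetical name, and the precedence and path resolution simply follow the description above.

```python
import csv
import tempfile
from pathlib import Path


def read_dataset_csv(path_to_csv: Path, index: slice = slice(None)):
    """Return (paths, tags, energies) following the documented column rules."""
    # open() raises FileNotFoundError if the CSV does not exist.
    with open(path_to_csv, newline="") as handle:
        rows = list(csv.DictReader(handle))[index]

    paths, tags, energies = [], [], []
    for row in rows:
        if "path" in row:
            # `path` wins over `file`; relative entries resolve against the cwd.
            paths.append(Path(row["path"]).resolve())
        else:
            # `file` entries are relative to the CSV's parent directory.
            # A missing column raises KeyError here.
            paths.append((path_to_csv.parent / row["file"]).resolve())
        tags.append(row["tag"])
        energies.append(float(row["reference_energy"]))  # ValueError if not numeric
    return paths, tags, energies


with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "data.csv"
    csv_path.write_text(
        "file,tag,reference_energy\na.xyz,dimer,-1.5\nb.xyz,trimer,-3.0\n"
    )
    # Skip the first data row with a slice.
    paths, tags, energies = read_dataset_csv(csv_path, slice(1, None))
    expected_path = (csv_path.parent / "b.xyz").resolve()
```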
chemfit.debug_utils module¶
chemfit.fitter module¶
- class CallbackInfo(opt_params: 'dict[str, Any]', opt_loss: 'float', cur_params: 'dict[str, Any]', cur_loss: 'float', step: 'int', info: 'FitInfo')[source]¶
Bases:
object
- __init__(opt_params: dict[str, Any], opt_loss: float, cur_params: dict[str, Any], cur_loss: float, step: int, info: FitInfo) None¶
- cur_loss: float¶
- cur_params: dict[str, Any]¶
- opt_loss: float¶
- opt_params: dict[str, Any]¶
- step: int¶
- class FitInfo(initial_value: 'float | None' = None, final_value: 'float | None' = None, time_taken: 'float | None' = None, n_evals: 'int' = 0)[source]¶
Bases:
object
- __init__(initial_value: float | None = None, final_value: float | None = None, time_taken: float | None = None, n_evals: int = 0) None¶
- final_value: float | None = None¶
- initial_value: float | None = None¶
- n_evals: int = 0¶
- time_taken: float | None = None¶
- class Fitter(objective_function: Callable[[dict[str, Any]], float], initial_params: dict[str, Any], bounds: dict[str, Any] | None = None, near_bound_tol: float | None = None, value_bad_params: float = 100000.0)[source]¶
Bases:
object
- __init__(objective_function: Callable[[dict[str, Any]], float], initial_params: dict[str, Any], bounds: dict[str, Any] | None = None, near_bound_tol: float | None = None, value_bad_params: float = 100000.0) None[source]¶
Initialize a Fitter.
- Parameters:
objective_function (Callable[[dict], float]) – The objective function to be minimized.
initial_params (dict) – Initial values of the parameters.
bounds (Optional[dict]) – Dictionary specifying bounds for each parameter.
near_bound_tol (Optional[float]) – If specified, checks whether any parameters are too close to their bounds and logs a warning if so.
value_bad_params (float) – Threshold value beyond which the objective function is considered to be in a poor or invalid region.
- fit_scipy(method: str = 'L-BFGS-B', **kwargs) dict[str, Any][source]¶
Optimize parameters using SciPy’s minimize function.
- Parameters:
method (str) – Name of the optimization method passed to scipy.optimize.minimize. Defaults to 'L-BFGS-B'.
**kwargs – Additional keyword arguments passed directly to scipy.optimize.minimize.
- Returns:
Dictionary of optimized parameter values.
- Return type:
dict
Warning
If the optimizer does not converge, a warning is logged.
Example
>>> def objective_function(params: dict):
...     return 2.0 * (params["x"] - 2) ** 2 + 3.0 * (params["y"] + 1) ** 2
>>> initial_params = dict(x=0.0, y=0.0)
>>> fitter = Fitter(objective_function=objective_function, initial_params=initial_params)
>>> optimal_params = fitter.fit_scipy()
>>> print(optimal_params)
{'x': 2.0, 'y': -1.0}
- ob_func_wrapper(ob_func: Any) Callable[[dict[str, Any]], float][source]¶
Wraps the objective function and applies some checks plus logging.
- register_callback(func: Callable[[CallbackInfo], None], n_steps: int)[source]¶
Register a callback which is executed after every n_steps of the optimization.
Multiple callbacks may be registered. They are executed in the order of registration. The callback must be a callable with the following signature:
func(arg: CallbackInfo)
- The CallbackInfo is a dataclass with the following attributes:
opt_params: The optimal parameters at the time the callback is invoked.
opt_loss: The loss value corresponding to the optimal parameters.
cur_params: The parameters tested most recently when the callback is invoked.
cur_loss: The loss value associated with the most recently tested parameters.
step: The number of optimization steps performed so far (generally not equal to the number of loss function evaluations).
info: The current FitInfo instance of the fitter at the time the callback is invoked.
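A callback might look like the sketch below. To stay self-contained it uses a local stand-in dataclass with the documented fields instead of importing chemfit. `InfoSketch`, `history` and `record_best` are hypothetical names, the `info` field is omitted, and the loop only simulates an optimizer.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class InfoSketch:
    """Stand-in mirroring the documented CallbackInfo fields (without `info`)."""
    opt_params: dict[str, Any]
    opt_loss: float
    cur_params: dict[str, Any]
    cur_loss: float
    step: int


history: list[tuple[int, float]] = []


def record_best(arg: InfoSketch) -> None:
    # Keep a trace of the best loss seen at each invocation.
    history.append((arg.step, arg.opt_loss))


# Simulate an optimizer invoking the callback every 2 steps.
losses = [5.0, 3.0, 4.0, 1.0]
best = float("inf")
for step, loss in enumerate(losses, start=1):
    best = min(best, loss)
    if step % 2 == 0:
        record_best(InfoSketch({"x": 0.0}, best, {"x": 0.0}, loss, step))
```

Going by the signature above, such a function would be registered with `fitter.register_callback(record_best, n_steps=2)`.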
chemfit.kabsch module¶
- apply_transform(P: ndarray[tuple[Any, ...], dtype[float64]], R: ndarray[tuple[Any, ...], dtype[float64]], t: ndarray[tuple[Any, ...], dtype[float64]]) ndarray[tuple[Any, ...], dtype[float64]][source]¶
Apply affine transform defined by rotation R and translation t to points P.
- kabsch(P: ndarray[tuple[Any, ...], dtype[float64]], Q: ndarray[tuple[Any, ...], dtype[float64]], weights: ndarray[tuple[Any, ...], dtype[float64]] | None = None, allow_reflection: bool = False) tuple[ndarray[tuple[Any, ...], dtype[float64]], ndarray[tuple[Any, ...], dtype[float64]]][source]¶
Compute the optimal rigid transformation that aligns P onto Q using the Kabsch algorithm.
This implementation assumes row-vector points of shape (N, D) and solves for rotation R and translation t in the mapping:
Q ≈ P @ R + t
The solution minimizes the root-mean-square deviation (RMSD) between P transformed and Q, optionally with per-point weights.
The algorithm:
Compute centroids of P and Q (weighted if weights provided).
Subtract centroids to get centered coordinates P0, Q0.
Compute the cross-covariance matrix:
C = P0.T @ Q0 # (D, D) for row-vector convention
Perform singular value decomposition:
U, S, Vt = np.linalg.svd(C)
Compute rotation:
R = U @ Vt # row-vector convention, consistent with Q ≈ P @ R + t and C = P0.T @ Q0
If allow_reflection is False and det(R) < 0, flip the sign of the last row of Vt before recomputing R to ensure a proper rotation (det(R) = +1).
Compute translation:
t = cQ - cP @ R
- Parameters:
P (ndarray of shape (N, D)) – Source point coordinates.
Q (ndarray of shape (N, D)) – Target point coordinates, corresponding 1-to-1 with P.
weights (ndarray of shape (N,), optional) – Nonnegative weights for each correspondence. If provided, centroids and covariance are computed with these weights.
allow_reflection (bool, default=False) – If False, the solution is constrained to det(R) = +1 (proper rotation). If True, improper rotations (reflections) are allowed.
- Returns:
R (ndarray of shape (D, D)): Optimal rotation matrix.
t (ndarray of shape (D,)): Translation vector.
- Return type:
Tuple[ndarray, ndarray]
- Raises:
ValueError – If P and Q have mismatched shapes, fewer than D points are provided, or if weights are invalid (negative, wrong shape, or zero sum).
Notes
Works for any dimensionality D >= 2.
For column-vector convention (R @ P + t), the covariance and multiplication order must be adjusted.
The returned transform is optimal in the least-squares sense and preserves distances (no scaling or shearing).
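The steps listed above can be sketched in NumPy as follows. This is an illustrative reimplementation, not the library's source. `kabsch_sketch` is a hypothetical name and the input validation is reduced to a shape check. For the mapping Q ≈ P @ R + t with C = P0.T @ Q0, the sketch forms the rotation as U @ Vt, which is the transpose of the column-vector expression Vt.T @ U.T.

```python
import numpy as np


def kabsch_sketch(P, Q, weights=None, allow_reflection=False):
    """Return (R, t) such that Q ≈ P @ R + t for row-vector points."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape:
        raise ValueError("P and Q must have the same shape")
    w = np.ones(len(P)) if weights is None else np.asarray(weights, dtype=float)
    w = w / w.sum()                 # normalized weights

    cP = w @ P                      # weighted centroids
    cQ = w @ Q
    P0, Q0 = P - cP, Q - cQ         # centered coordinates
    C = (P0 * w[:, None]).T @ Q0    # weighted cross-covariance, shape (D, D)

    U, S, Vt = np.linalg.svd(C)
    R = U @ Vt                      # rotation for the row-vector convention
    if not allow_reflection and np.linalg.det(R) < 0:
        Vt[-1, :] *= -1.0           # flip last row to restore det(R) = +1
        R = U @ Vt
    t = cQ - cP @ R
    return R, t


# Recover a known rotation about z plus a translation.
rng = np.random.default_rng(0)
P = rng.normal(size=(6, 3))
angle = 0.7
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle), np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
Q = P @ R_true + t_true
R, t = kabsch_sketch(P, Q)
```

Applying the result is then a single expression, `P @ R + t`, which matches the mapping that apply_transform is described as performing.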
chemfit.mpi_wrapper_cob module¶
- class MPIWrapperCOB(cob: CombinedObjectiveFunction, comm: Any | None = None, mpi_debug_log: bool = False)[source]¶
Bases:
ObjectiveFunctor
- __call__(params: dict[str, Any]) float[source]¶
Compute the objective value given a set of parameters.
- Parameters:
params – Dictionary of parameter names to float values.
- Returns:
Computed objective value (e.g., error metric).
- Return type:
float
- __init__(cob: CombinedObjectiveFunction, comm: Any | None = None, mpi_debug_log: bool = False) None[source]¶
Initialize wrapper for combined objective function.
chemfit.plot_utils module¶
- plot_energies(energy_ref: Sequence[float], energy_fit: Sequence[float], n_atoms: Sequence[int], tags: Sequence[str], output_folder: Path) None[source]¶
chemfit.scme_factories module¶
chemfit.scme_setup module¶
- arrange_water_in_ohh_order(atoms: Atoms) Atoms[source]¶
Reorder atoms so each water molecule appears as O, H, H.
- Parameters:
atoms (Atoms) – Original Atoms object containing water molecules.
- Returns:
New Atoms object with OHH ordering and no constraints.
- Return type:
Atoms
- Raises:
ValueError – If atom counts or ratios are inconsistent with water.
- check_water_is_in_ohh_order(atoms: Atoms, oh_distance_tol: float = 2.0) bool[source]¶
Validate that each water molecule is ordered O, H, H and within tolerance.
- Parameters:
atoms (Atoms) – Atoms object to validate.
oh_distance_tol (float, optional) – Maximum allowed O-H distance (default is 2.0 Å).
- Raises:
ValueError – If ordering or distances violate water OHH assumptions.
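The ordering and distance check can be illustrated without ASE. The sketch below works on plain lists of symbols and Cartesian positions, ignores periodic boundaries, and returns False where the documented function raises. `is_ohh_ordered` is a hypothetical name.

```python
import math
from typing import Sequence


def is_ohh_ordered(symbols: Sequence[str],
                   positions: Sequence[Sequence[float]],
                   oh_distance_tol: float = 2.0) -> bool:
    """Check for O, H, H triples with both O-H distances within tolerance."""
    if len(symbols) % 3 != 0:
        return False
    for i in range(0, len(symbols), 3):
        if list(symbols[i:i + 3]) != ["O", "H", "H"]:
            return False
        oxygen = positions[i]
        for hydrogen in (positions[i + 1], positions[i + 2]):
            # Plain Euclidean distance; no minimum-image convention.
            if math.dist(oxygen, hydrogen) > oh_distance_tol:
                return False
    return True


symbols = ["O", "H", "H"]
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
ok = is_ohh_ordered(symbols, positions)
wrong_order = is_ohh_ordered(["H", "O", "H"], positions)
too_far = is_ohh_ordered(
    symbols, [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (-0.24, 0.93, 0.0)]
)
```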
chemfit.utils module¶
- class ExtendedJSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)[source]¶
Bases:
JSONEncoder
- default(o: Any)[source]¶
Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
For example, to support arbitrary iterators, you could implement default like this:
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return super().default(o)
- check_params_near_bounds(params: dict[str, Any], bounds: dict[str, Any], relative_tol: float) list[tuple[str, float, float, float]][source]¶
Check if any of the parameters are near or beyond the bounds.
The criteria checked are
param_value < lower + relative_tol * (upper - lower)
param_value > upper - relative_tol * (upper - lower)
- Parameters:
params (dict) – the dict of params to check
bounds (dict) – the dict of bounds to check
relative_tol (float) – The tolerance, relative to the span of the bounds. Positive numbers mean the values must fulfill a stricter bound, zero means the values must fulfill the exact bound, and negative numbers mean the values must fulfill a looser bound.
- Returns:
A list of tuples with information about the parameters that violate the constraint. Each tuple contains:
- A string identifying the parameter in a flattened dict
- The value of the parameter
- The lower bound
- The upper bound
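The two criteria can be sketched for the simplest case. This assumes a flat params dict and bounds given as (lower, upper) tuples, and it skips parameters without bounds. `near_bounds` is a hypothetical helper, and the real function may accept nested dictionaries.

```python
def near_bounds(params: dict[str, float],
                bounds: dict[str, tuple[float, float]],
                relative_tol: float) -> list[tuple[str, float, float, float]]:
    """Return (name, value, lower, upper) for each parameter violating the criteria."""
    flagged = []
    for name, value in params.items():
        if name not in bounds:
            continue  # assumption: unbounded parameters are skipped
        lower, upper = bounds[name]
        margin = relative_tol * (upper - lower)
        # Flag values below lower + margin or above upper - margin.
        if value < lower + margin or value > upper - margin:
            flagged.append((name, value, lower, upper))
    return flagged


params = {"a": 0.05, "b": 0.5, "c": 9.95}
bounds = {"a": (0.0, 1.0), "b": (0.0, 1.0), "c": (0.0, 10.0)}
flagged = near_bounds(params, bounds, relative_tol=0.1)
```

With a relative tolerance of 0.1, `a` and `c` fall within 10% of the span from a bound and are reported, while `b` is not.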