Writing Quantity Computers

This page contains practical guidance for writing your own quantity computers from scratch.

Before we start…

Before we start, let us clearly state that “quantity computers” are merely a useful convention. Nothing prevents you from using ChemFit without them. That being said, you should probably use them.

A recommended first step is to check whether one of the built-in options already covers your use case:

  1. If you already have a pure python function implementing your computation, have a look at the to_quantity_computer() decorator, which also features in the Quickstart examples.

  2. If you are using an external simulation tool, like LAMMPS for example, have a look at the FileBasedQuantityComputer and its corresponding doc page: File-Based Quantity Computers.

  3. If you are using ASE, try the SinglePointASEComputer or MinimizationASEComputer described in ASE-Based Quantity Computers.

If none of the built-in computers are to your taste, think about sub-classing them.

Let’s cook

For a completely fresh QuantityComputer, derive from the QuantityComputer base class and implement the _compute() method. That’s it.

The _compute method should accept exactly two arguments: a dictionary of parameters of type dict[str, Any] and an EvaluateContext. It should return the dictionary of quantities.
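
As a minimal sketch of this contract, the following block uses small stand-in classes instead of importing ChemFit, so that it runs on its own. The stand-in base class, the stand-in context and the HarmonicEnergy computer are hypothetical and only mimic the shape described above; the real QuantityComputer and EvaluateContext may do more.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class StandInContext:
    """Hypothetical stand-in for EvaluateContext (only the meta field)."""
    meta: dict = field(default_factory=dict)


class StandInQuantityComputer:
    """Hypothetical stand-in for the QuantityComputer base class."""

    def __call__(self, params: dict[str, Any], ctx: StandInContext) -> dict[str, Any]:
        # Calling the computer forwards to the user-defined _compute method.
        return self._compute(params, ctx)

    def _compute(self, params: dict[str, Any], ctx: StandInContext) -> dict[str, Any]:
        raise NotImplementedError


class HarmonicEnergy(StandInQuantityComputer):
    """Illustrative computer: energy of a harmonic spring."""

    def _compute(self, params: dict[str, Any], ctx: StandInContext) -> dict[str, Any]:
        # Only local variables are used, nothing is stored on self.
        energy = 0.5 * params["k"] * params["x"] ** 2
        return {"energy": energy}


quantities = HarmonicEnergy()({"k": 2.0, "x": 3.0}, StandInContext())
```

With a real ChemFit computer, the only change should be deriving from the actual base class instead of the stand-in.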

This is probably a point at which we should familiarize ourselves with the…

Golden Rule:

DO NOT MODIFY GLOBAL STATE FROM WITHIN THE COMPUTE METHOD. If you violate this rule, the behavior of your quantity computer under parallel evaluation may be undefined. It does not have to be, but for everyone’s sake let’s assume it will be.

Importantly, the golden rule applies to instance variables of the computer itself as well.

Let’s illustrate what not to do:

class GoldenRuleViolator(QuantityComputer):
    # ...
    def _compute(self, params, ctx):
        self.bad = params["x"] # <-- bad mojo
        # ...
        return {"mojo" : self.bad}

Now what happens if you call the same instance of GoldenRuleViolator in parallel? That’s right! Bad things. The reason is that the value of self.bad could be overwritten by another thread in the middle of the compute function, which would make your params and the returned quantities mismatched.
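
To make the failure concrete, here is a self-contained sketch that forces the unlucky interleaving with a threading.Barrier. The Violator class is a hypothetical stand-in that does not derive from the real base class; the barrier exists only to make the race reproducible and has no counterpart in ChemFit.

```python
import threading


class Violator:
    """Stand-in computer that stores an intermediate value on self."""

    def __init__(self):
        # Used only to force the problematic interleaving deterministically.
        self.barrier = threading.Barrier(2)

    def compute(self, params):
        self.bad = params["x"]   # both threads write to the same attribute
        self.barrier.wait()      # wait until the other thread has written too
        return {"mojo": self.bad}


violator = Violator()
results = {}


def worker(x):
    results[x] = violator.compute({"x": x})["mojo"]


threads = [threading.Thread(target=worker, args=(x,)) for x in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Inputs whose returned quantity does not match what was passed in.
mismatched = [x for x, mojo in results.items() if mojo != x]
```

Both threads read whatever value was written last, so exactly one of the two evaluations returns a quantity that belongs to the other thread's parameters.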

You might say: “Why would I ever do something so stupid?”. Let me just say that you’d be surprised how easy it is to violate the Golden Rule by accident. Even seemingly harmless patterns can break it, especially when storing intermediate results on self.

Therefore, if you have anything to communicate with the outside world, there are two options:

  1. Put it in the quantities dict and return it

  2. Write to ctx.meta

Let’s fix the GoldenRuleViolator:

class GoodCitizen(QuantityComputer):
    # ...
    def _compute(self, params, ctx):
        bad = params["x"]
        ctx.meta["bad"] = bad # <-- no problemo
        # ...
        return {"mojo" : bad}

Now there is no problem. All we ever write to is bad, which is local to the current function evaluation, or ctx.meta["bad"], which belongs to the current evaluation only and is therefore free of race conditions.

Configuring a computer

Besides the parameter dictionary passed on evaluation, a quantity computer may also need to be configured by external parameters.

Symbolically we can imagine an external parameter \(f\), which influences the computation in some way:

\[\text{Quantities}(\text{params},f,\text{ctx}) = \{ ... \}\]

The external parameter \(f\) may for example be

  • A path to file with atomic coordinates

  • A constant numeric prefactor

  • The charge of certain atoms

  • You name it! Anything that influences the quantities and is otherwise fixed.

If the quantity computer is a wrapped python function, it’s easy to bind external parameters. Check this out:

from chemfit.wrap_funcs import to_quantity_computer

@to_quantity_computer(pass_ctx=True)
def computer(params, ctx, f):
    ...

# Configure f=1
f1_computer = computer.bind(f=1)
# Configure f=2
f2_computer = computer.bind(f=2)

Note

If pass_ctx=True, all arguments except params and ctx must be bound.

If pass_ctx=False, all arguments except params must be bound.

If we forgo the to_quantity_computer() approach and we need external parameters, they should be accepted in the constructor.

from chemfit.abstract_objective_function import QuantityComputer

class Computer(QuantityComputer):

    def __init__(self, f):
        self.f = f

    def _compute(self, params, ctx):
        # We can make use of self.f in here
        ...

Important

A quantity computer becomes fully specified once it depends only on (params, ctx). At that point, all external parameters have been fixed, either via bind() (for wrapped functions) or via the constructor (for class-based implementations).
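
As a rough analogy only, fixing an external parameter resembles functools.partial from the standard library: once f is fixed, what remains is a function of (params, ctx). The sketch below does not use ChemFit and makes no claim about how bind() is implemented; scaled_square and the SimpleNamespace context are hypothetical stand-ins.

```python
from functools import partial
from types import SimpleNamespace


def scaled_square(params, ctx, f):
    """Illustrative computation with one external parameter f."""
    ctx.meta["f"] = f  # record which prefactor was used
    return {"y": f * params["x"] ** 2}


# Fixing f leaves a callable that depends only on (params, ctx).
f2_computer = partial(scaled_square, f=2.0)

ctx = SimpleNamespace(meta={})  # stand-in for an evaluation context
quantities = f2_computer({"x": 3.0}, ctx)
```

Each distinct value of f gives a distinct, fully specified callable, which mirrors the f1_computer and f2_computer example above.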

Using the evaluation context

The EvaluateContext provides a structured way to exchange information during evaluation without relying on shared state.

It exposes three main fields:

  • ctx.meta — for results and diagnostics

  • ctx.config — for read-only configuration

  • ctx.shared — for controlled shared state

ctx.meta

ctx.meta is a dictionary used to record auxiliary information about the current evaluation. It is safe to write to and is typically used for:

  • debugging information

  • intermediate values that are not part of the returned quantities

  • provenance (e.g. which configuration was used)

  • performance metrics

ctx.meta["n_iterations"] = n_iter
ctx.meta["converged"] = converged
ctx.meta["structure_id"] = structure_id

Each evaluation has its own meta dictionary, so there are no race conditions.

In addition to values written during evaluation, quantity computers may also define static meta data. This is meta data attached to the computer itself (e.g. a tag or identifier), which is automatically merged into ctx.meta when the computer is evaluated.

This is useful for recording information that is constant across all evaluations of a given computer, such as:

  • a label or tag identifying the term

  • the origin of the data

  • a fixed configuration identifier

Important:

Quantities that are part of the computation should be returned from _compute, not written to ctx.meta.

Rule of thumb:

If something is needed for the loss or further computation, return it. If it is only useful for inspection, debugging, or bookkeeping, store it in ctx.meta.
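
The split between returned quantities and meta data can be sketched with a small iterative computation. This is an illustrative example that does not import ChemFit: compute_sqrt is a hypothetical plain function, and a SimpleNamespace with a meta dictionary stands in for the evaluation context.

```python
from types import SimpleNamespace


def compute_sqrt(params, ctx, tol=1e-12, max_iter=50):
    """Newton iteration for sqrt(a); diagnostics go to ctx.meta."""
    target = params["a"]
    x = max(target, 1.0)  # simple starting guess
    converged = False
    n_iter = 0
    for n_iter in range(1, max_iter + 1):
        x_new = 0.5 * (x + target / x)
        step = abs(x_new - x)
        x = x_new
        if step < tol:
            converged = True
            break

    # Bookkeeping only: useful for inspection, not needed for the loss.
    ctx.meta["n_iterations"] = n_iter
    ctx.meta["converged"] = converged

    # The actual result is returned.
    return {"sqrt_a": x}


ctx = SimpleNamespace(meta={})
quantities = compute_sqrt({"a": 9.0}, ctx)
```

The iteration count and convergence flag are available for inspection afterwards, while the returned dictionary contains only what further computation needs.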

ctx.config

ctx.config provides configuration information to the computation. It should be treated as read-only.

Typical use cases include:

  • passing runtime options

  • controlling execution modes

  • toggling optional behavior

if ctx.config.get("compute_forces", False):
    ...

The main purpose of ctx.config is to allow behavior to vary per evaluation, without requiring reconstruction of the quantity computer or objective function.

In particular, ctx.config is useful when different evaluations may run in different execution environments. For example, in distributed or parallel settings, different calls may:

  • run on different cluster nodes

  • use different numbers of cores or GPUs

  • access different scratch directories

  • use different execution backends

scratch_dir = ctx.config.get("scratch_dir", "/tmp")
n_cores = ctx.config.get("n_cores", 1)

Rule of thumb:

Use ctx.config to influence how the computation is carried out, but never modify it inside _compute.
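
One way to catch accidental writes during testing is to hand your computation a read-only view of the configuration. The sketch below uses types.MappingProxyType from the standard library for this. Whether ChemFit itself enforces read-only access is not stated here, so treat this purely as a testing aid; compute_energy and the SimpleNamespace context are hypothetical stand-ins.

```python
from types import MappingProxyType, SimpleNamespace


def compute_energy(params, ctx):
    """Illustrative computation with optional extra work toggled via config."""
    quantities = {"energy": params["k"] * params["x"] ** 2}
    if ctx.config.get("compute_forces", False):
        quantities["force"] = -2.0 * params["k"] * params["x"]
    return quantities


params = {"k": 0.5, "x": 2.0}

# Same computation, two different per-evaluation configurations.
cheap_ctx = SimpleNamespace(config=MappingProxyType({}))
full_ctx = SimpleNamespace(config=MappingProxyType({"compute_forces": True}))

cheap = compute_energy(params, cheap_ctx)
full = compute_energy(params, full_ctx)

# A read-only view rejects writes with a TypeError.
try:
    cheap_ctx.config["compute_forces"] = True
    write_rejected = False
except TypeError:
    write_rejected = True
```

The computation is never reconstructed between the two calls; only the configuration passed with each evaluation differs.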

ctx.shared

ctx.shared allows controlled sharing of state across multiple evaluations.

This is useful for:

  • caching expensive results

  • sharing resources between evaluations

  • coordinating work across parallel tasks

However, this is also the most dangerous field.

Important:

Any data stored in ctx.shared may be accessed concurrently from multiple threads or processes. You must ensure that all access is thread-safe and does not violate the Golden Rule.

Example (simple cache):

cache = ctx.shared.setdefault("cache", {})

key = tuple(sorted(params.items()))
if key in cache:
    return cache[key]

result = expensive_computation(...)
cache[key] = result
return result

Rule of thumb:

Only use ctx.shared if you know exactly what you are doing. Prefer ctx.meta or returning values whenever possible.
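
If you do need a cache, one possible way to make it safe across threads is to guard every access with a lock. The following sketch uses a plain dictionary as a stand-in for ctx.shared and assumes the lock is created once before parallel evaluation starts. It covers threads only: a threading.Lock does not synchronize separate processes. The names cached_square, "lock" and "n_computed" are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for ctx.shared, prepared once before any evaluation runs.
shared = {"cache": {}, "lock": threading.Lock(), "n_computed": 0}


def cached_square(params, shared):
    """Return {"y": x**2}, computing each distinct input only once."""
    key = tuple(sorted(params.items()))
    with shared["lock"]:  # lookup, computation and store happen atomically
        cache = shared["cache"]
        if key not in cache:
            shared["n_computed"] += 1
            cache[key] = {"y": params["x"] ** 2}
        # Return a copy so callers cannot mutate the cached entry.
        return dict(cache[key])


inputs = [{"x": 2.0}, {"x": 3.0}] * 4
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda p: cached_square(p, shared), inputs))
```

Holding the lock during the computation keeps the sketch simple but serializes all evaluations; a production cache would typically compute outside the lock and accept occasional duplicate work, or use per-key locks.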

External parameters vs ctx.config

Both external parameters (passed via bind or the constructor) and ctx.config can influence the behavior of a quantity computer, but they serve different purposes.

External parameters define what is being computed. They are part of the identity of the quantity computer or objective term.

Typical examples include:

  • the system or structure being evaluated

  • a file path or dataset

  • physical constants or fixed model settings

These values should usually be fixed when constructing or specializing the quantity computer.

computer.bind(atoms_factory=my_structure)
Computer(f=2.0)

In contrast, ctx.config defines how a particular evaluation is carried out.

Typical examples include:

  • enabling or disabling optional work

  • selecting approximate vs. exact evaluation modes

  • turning diagnostics on or off

  • passing execution-specific information (e.g. resources, paths)

The main reason to use ctx.config is that it can vary per evaluation without requiring you to reconstruct the quantity computer or objective term.

Rule of thumb:

  • Use external parameters if changing the value creates a different objective term.

  • Use ctx.config if changing the value only affects how the same term is evaluated.

For example, changing the atomic structure of a system should be an external parameter, while enabling additional diagnostics or selecting a cheap evaluation mode should be handled through ctx.config.

Summary

  • ctx.meta: write freely, per-evaluation diagnostic data (plus static meta data from the computer)

  • ctx.config: read-only, per-evaluation control of execution

  • ctx.shared: shared state, use with care

Calling computers from within computers

Note

This section covers fairly advanced usage and is probably most relevant if you want to implement your own execution wrapper for the CombinedObjectiveFunction, in addition to the built-in MPI and executor wrappers.

If we want to make calls to other computers from our custom computer, the recommended approach is to make use of the child context system to supply fresh contexts to the inner computers.

Here is a simple demonstration of the idea: we have an outer computer, which accepts a parent EvaluateContext and later splits off two child contexts using the child_contexts() context manager.

class OuterComputer(QuantityComputer):

    def _compute(self, params, ctx):
        # ...
        with ctx.child_contexts(2) as child_contexts:
            q1 = inner_computer1(params, child_contexts[0])
            q2 = inner_computer2(params, child_contexts[1])
        # ...

The benefit of this approach is two-fold:

  1. We get full meta-data provenance. All of the child meta data can be found in ctx.meta["children"].

  2. Since the inner computers have their own context they can also be evaluated in parallel … although the example above does not make use of this.

Note

For parallel evaluation with an executor, use the map_with_context() function. Unlike the regular executor map function, it correctly handles the ctx fields even if execution happens in different processes. See also Parallel Execution.
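
To see why a plain executor map is not enough across processes, recall that objects sent to another process are pickled, so the worker operates on a copy. The sketch below imitates that round trip with the pickle module and a plain dictionary as a stand-in context; it does not use ChemFit or map_with_context().

```python
import pickle

# Stand-in for a context that lives in the parent process.
parent_side_ctx = {"meta": {}}

# Crossing a process boundary amounts to a pickle round trip:
# the worker receives an independent copy of the object.
worker_side_ctx = pickle.loads(pickle.dumps(parent_side_ctx))
worker_side_ctx["meta"]["converged"] = True

# The write never reaches the original object.
write_was_lost = "converged" not in parent_side_ctx["meta"]
```

Anything a worker writes to its copy has to be transported back explicitly, which is the kind of bookkeeping the note above attributes to map_with_context().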

Child-parent relationships for the different context fields

When creating child contexts via child_contexts(), the different fields of the context behave differently.

Understanding this behavior is important when composing quantity computers.

ctx.meta

Each child context receives its own independent meta dictionary.

During evaluation, child computers write to their own ctx.meta. After the child_contexts block exits, the parent context collects all child meta data under:

ctx.meta["children"]

This is a list containing the meta data of each child evaluation, in order.

This ensures full provenance: all information produced by child computations is preserved and accessible from the parent.

ctx.config

The config dictionary is passed from parent to child contexts as-is.

All child contexts see the same configuration, allowing them to adapt their behavior consistently.

value = ctx.config.get("mode")

Child contexts should treat config as read-only.

ctx.shared

The shared dictionary is shared between parent and child contexts.

This allows child computations to communicate and reuse data, for example through caching.

cache = ctx.shared.setdefault("cache", {})

Because ctx.shared may be accessed concurrently, all access must be thread-safe.
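
The three behaviors described above can be summarized in a toy model. ToyContext below is a hypothetical class written for illustration only; it mimics the documented semantics (independent meta, config passed as-is, shared passed by reference) and is not ChemFit's implementation.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class ToyContext:
    """Toy model of a context with meta, config and shared fields."""
    meta: dict = field(default_factory=dict)
    config: dict = field(default_factory=dict)
    shared: dict = field(default_factory=dict)

    @contextmanager
    def child_contexts(self, n):
        # Children get a fresh meta dict but the parent's config and shared.
        children = [
            ToyContext(config=self.config, shared=self.shared) for _ in range(n)
        ]
        try:
            yield children
        finally:
            # On exit, the parent collects the child meta data in order.
            self.meta["children"] = [child.meta for child in children]


parent = ToyContext(config={"mode": "fast"})
with parent.child_contexts(2) as kids:
    for i, kid in enumerate(kids):
        kid.meta["mode_seen"] = kid.config.get("mode")
        kid.shared.setdefault("visits", []).append(i)
```

After the block exits, the parent holds one meta dictionary per child, every child has seen the same configuration, and both children have appended to the same shared list.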

Configuring child contexts

Besides the number of children, child_contexts() accepts an optional argument of type ChildContextConfigurator.

A child context configurator allows you to customize how child contexts are created.

This can be useful when:

  • distributing work across resources

  • assigning identifiers or indices to child evaluations

  • modifying configuration for individual children

  • implementing custom execution strategies

The configurator is called once per child context and can modify the child context before it is used.

Conceptually, it is a function with the following signature:

def configurator(idx_child_ctx, child_ctx, num_children, parent_ctx):
    ...

For example, you may want to assign each child a unique identifier:

def configurator(idx_child_ctx, child_ctx, num_children, parent_ctx):
    child_ctx.meta["child_index"] = idx_child_ctx

Or adjust configuration per child:

def configurator(idx_child_ctx, child_ctx, num_children, parent_ctx):
    child_ctx.config["worker_id"] = idx_child_ctx

This mechanism is particularly useful when writing execution wrappers (e.g. MPI or executor-based parallelization), where different children may correspond to different processes or resources.
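
As a sketch of distributing resources, the configurator below splits a core budget evenly among the children. It follows the signature shown above, but everything else is hypothetical: the "n_cores" key, the small driver loop, and the SimpleNamespace contexts are stand-ins, and the sketch assumes the child's config attribute can be reassigned. Each child receives its own dictionary so that per-child values do not leak into the parent or into siblings.

```python
from types import SimpleNamespace


def split_cores(idx_child_ctx, child_ctx, num_children, parent_ctx):
    """Give every child an equal share of the parent's core budget."""
    total = parent_ctx.config.get("n_cores", 1)
    # Build a new dict per child instead of mutating the parent's config.
    child_ctx.config = {
        **parent_ctx.config,
        "worker_id": idx_child_ctx,
        "n_cores": max(1, total // num_children),
    }
    child_ctx.meta["child_index"] = idx_child_ctx


# Toy driver standing in for what child_contexts() would do internally.
parent = SimpleNamespace(meta={}, config={"n_cores": 8})
children = [SimpleNamespace(meta={}, config=parent.config) for _ in range(4)]
for idx, child in enumerate(children):
    split_cores(idx, child, len(children), parent)
```

Copying the configuration per child is a defensive choice for this sketch; check how child configs are stored in ChemFit before relying on in-place modification.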

Rule of thumb:

Use a child context configurator when child evaluations need systematic differences in their context. Otherwise, the default behavior is sufficient.