File-Based Quantity Computers

The FileBasedQuantityComputer runs an external command in a temporary working directory and parses the resulting output files into a quantity dictionary.

This is the standard way to integrate external simulation codes into ChemFit.

A file-based computer is constructed from three pieces:

a function that builds the command
a list of expected output files
a function that parses those files

Minimal example

Consider an external script with a command-line interface, which does the following:

Accept an input \(A\)
Compute \(y_i = A (x_i-2)^2\) for a predefined range of \(x_i \in \left[ x_\text{min}, x_\text{max} \right]\)
Write the resulting arrays \(y_i\) and the corresponding \(x_i\) to a file

Note

The full script can be found in the unit tests at https://github.com/MSallermann/chemfit/tests/input/square_function.py.
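To make the walkthrough concrete, here is a hypothetical stand-in for that script. It is not the script from the repository. The sampling range \(x \in [0, 4]\), the 21 sample points, the function name write_square_function and the two-column `y x` text format are all assumptions. They were chosen so that the output matches the parser defined below (column 0 is \(y\), column 1 is \(x\)) and the command line matches callable_cmd.

```python
import sys
from pathlib import Path

# Hypothetical sampling range; the real test script may use different values
X_MIN, X_MAX, N_POINTS = 0.0, 4.0, 21


def write_square_function(prefactor: float, output_file: Path) -> None:
    """Write rows of 'y x' with y = prefactor * (x - 2)**2."""
    step = (X_MAX - X_MIN) / (N_POINTS - 1)
    lines = []
    for i in range(N_POINTS):
        x = X_MIN + i * step
        y = prefactor * (x - 2.0) ** 2
        # Column order (y first, then x) matches my_output_parser below
        lines.append(f"{y!r} {x!r}")
    Path(output_file).write_text("\n".join(lines) + "\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python square_function.py <prefactor> <output_file>
    write_square_function(float(sys.argv[1]), Path(sys.argv[2]))
```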

In this example we will use the FileBasedQuantityComputer to determine the pre-factor \(A\).

First, we define how the external command is called. For maximum flexibility, the command is provided as a function that accepts the parameter dictionary and the temporary working directory. Each evaluation runs in its own isolated working directory.

All files created by the external command should be written relative to this working directory. The paths specified in output_files are interpreted relative to it as well.

Note

The extra arguments script_file and output_file need to be bound, because the computer only accepts a function whose free arguments are the parameter dictionary and the working directory. In this example we use the with_cmd() utility method for this.

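from pathlib import Path

# Define the command that will be called to create the output file with given parameters
def callable_cmd(
    parameters: dict[str, float], workdir: Path, script_file: Path, output_file: Path
) -> list[str]:
    # Return the arguments as a list (instead of splitting a string),
    # so that paths containing spaces are passed through intact
    return ["python", str(script_file), str(parameters["prefactor"]), str(output_file)]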

Next, we need to define a parser that converts the generated output file(s) into quantities.

For our example, we could define such a parser like so:

from pathlib import Path
from typing import Any

import numpy as np

def my_output_parser(output_files: list[Path]) -> dict[str, Any]:
    """Parse the output files and retrieve the quantities."""
    f = output_files[0]
    data = np.loadtxt(f)
    # Column 0 holds y, column 1 holds x
    return {"y": data[:, 0], "x": data[:, 1]}

Note

As you can see, my_output_parser() has to accept a list of output files. In this simple example we do not have to worry about this, since we know there will only ever be one output file.

The reason for the list is that a FileBasedQuantityComputer can be configured with multiple output files and, in fact, multiple parsers. All output files are passed to all parsers, and their outputs are merged.
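Here is a minimal sketch of that merging step. It assumes the outputs are combined with dict.update in parser order, so a later parser wins on a key collision. The actual conflict policy in ChemFit is not specified here. The helper run_parsers, the two parsers and the file contents are hypothetical.

```python
from pathlib import Path
from typing import Any, Callable

Parser = Callable[[list[Path]], dict[str, Any]]


def run_parsers(parsers: list[Parser], output_files: list[Path]) -> dict[str, Any]:
    """Pass every output file to every parser and merge the returned dictionaries."""
    quantities: dict[str, Any] = {}
    for parser in parsers:
        # Assumption: later parsers overwrite keys produced by earlier ones
        quantities.update(parser(output_files))
    return quantities


def parse_energy(paths: list[Path]) -> dict[str, Any]:
    return {"energy": float(paths[0].read_text())}


def parse_n_files(paths: list[Path]) -> dict[str, Any]:
    # Every parser sees all files, even ones it does not read
    return {"n_files": len(paths)}
```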

We will also need the following loss function:

from collections.abc import Iterable
from typing import Any
import numpy as np

def loss_function(quantities: dict[str, Any], ref_y: Iterable[float]) -> float:
    y_values = quantities["y"]
    errors = [(y - y_r) ** 2 for y, y_r in zip(y_values, ref_y)]
    return float(np.sum(errors))

Now we’re ready to wire everything up:

# `script_file` (path to the square-function script) and `ref_quantities`
# (reference data with a "y" entry) are assumed to be defined, as in the test
ob = (
    FileBasedQuantityComputer(
        output_files=["output.txt"],
        output_parsers=[my_output_parser],
        base_working_directory=".",
        delete_temp_workdirs=True,
    )
    .with_cmd(callable_cmd, script_file=script_file, output_file="output.txt")
    .with_loss(loss_function, ref_y=ref_quantities["y"])
)

initial_guess = {"prefactor": 0.01}
fitter = Fitter(ob, initial_params=initial_guess)
opt_params = fitter.fit_scipy()

The entire example can be found in the tests.
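The model \(y_i = A (x_i-2)^2\) is linear in \(A\), so the least-squares problem solved by the fit also has a closed-form answer. This makes a handy sanity check for opt_params. Write \(g_i = (x_i-2)^2\). Setting \(\frac{d}{dA} \sum_i (A g_i - y_i^\text{ref})^2 = 2 \sum_i g_i (A g_i - y_i^\text{ref}) = 0\) gives \(A^* = \sum_i g_i y_i^\text{ref} / \sum_i g_i^2\). The snippet below computes that value directly. It uses synthetic noise-free data generated with \(A = 3\), not the test's reference data.

```python
def closed_form_prefactor(x: list[float], y_ref: list[float]) -> float:
    """Least-squares A for the model y = A * (x - 2)**2."""
    g = [(xi - 2.0) ** 2 for xi in x]
    numerator = sum(gi * yi for gi, yi in zip(g, y_ref))
    denominator = sum(gi * gi for gi in g)
    return numerator / denominator


# Synthetic, noise-free reference data generated with A = 3 (illustrative only)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y_ref = [3.0 * (xi - 2.0) ** 2 for xi in x]
a_star = closed_form_prefactor(x, y_ref)
```

With the test's reference data, opt_params["prefactor"] from fit_scipy() should agree with this value up to the optimizer's tolerance.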

What happens during evaluation

Each evaluation runs in an isolated working directory.

A single call performs the following steps:

create a temporary working directory
run presubmit_hook (if provided)
build the command via executable_cmd
execute it using subprocess.run()
wait until all expected output files exist
parse them using output_parsers
return the resulting quantity dictionary

After a successful evaluation, the working directory is removed unless delete_temp_workdirs is set to False.
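The steps above can be approximated in plain Python. This is a simplified, hypothetical stand-in, not ChemFit's implementation. The function evaluate and its defaults are illustrative, and any failure simply raises an exception. The working directory is removed only after success, mirroring the behavior described in this section.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path
from typing import Any, Callable, Optional


def evaluate(
    parameters: dict[str, Any],
    executable_cmd: Callable[[dict[str, Any], Path], list[str]],
    output_files: list[str],
    output_parsers: list[Callable[[list[Path]], dict[str, Any]]],
    presubmit_hook: Optional[Callable[[dict[str, Any], Path], None]] = None,
    base_working_directory: str = ".",
    wait_timeout: float = 60.0,
    poll_interval: float = 0.1,
    delete_temp_workdirs: bool = True,
) -> dict[str, Any]:
    # 1. Create an isolated temporary working directory
    workdir = Path(tempfile.mkdtemp(dir=base_working_directory))
    # 2. Let the hook prepare input files
    if presubmit_hook is not None:
        presubmit_hook(parameters, workdir)
    # 3 + 4. Build the command and run it inside the working directory
    cmd = executable_cmd(parameters, workdir)
    subprocess.run(cmd, cwd=workdir, check=True)
    # 5. Poll until every expected file exists (existence only, not completeness)
    paths = [workdir / name for name in output_files]
    deadline = time.monotonic() + wait_timeout
    while not all(p.exists() for p in paths):
        if time.monotonic() > deadline:
            raise TimeoutError(f"Output files did not appear in {workdir}")
        time.sleep(poll_interval)
    # 6. Pass all files to all parsers and merge the results
    quantities: dict[str, Any] = {}
    for parser in output_parsers:
        quantities.update(parser(paths))
    # 7. Clean up only on success; if anything above raised, workdir is left in place
    if delete_temp_workdirs:
        shutil.rmtree(workdir)
    return quantities
```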

Customization points

The behavior is controlled entirely through callables.

Command construction

executable_cmd receives the parameter dictionary and the current workdir and must return a command (list of strings):

def executable_cmd(parameters: dict[str, Any], workdir: Path) -> list[str]:
    return ["my_program", "--x", str(parameters["x"])]

This function is called for every evaluation.

Output files

output_files defines which files must exist before parsing begins.

output_files = [Path("energy.txt"), Path("forces.txt")]

All paths must be relative to the working directory.

Output parsing

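output_parsers is a list of functions. Each one receives the list of output file paths and returns a dictionary of quantities:

from pathlib import Path
from typing import Any

def parse_energy(paths: list[Path]) -> dict[str, Any]:
    energy = float(paths[0].read_text())
    return {"energy": energy}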
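Every parser receives all output files, so indexing by position ties a parser to the order of output_files. A hypothetical alternative looks files up by name instead. The file names and the whitespace-separated forces format below are assumptions for illustration.

```python
from pathlib import Path
from typing import Any


def parse_energy_and_forces(paths: list[Path]) -> dict[str, Any]:
    """Look up output files by name so the order of output_files does not matter."""
    by_name = {p.name: p for p in paths}
    energy = float(by_name["energy.txt"].read_text())
    # Assumed format: whitespace-separated floats, one row per atom
    forces = [
        [float(v) for v in line.split()]
        for line in by_name["forces.txt"].read_text().splitlines()
        if line.strip()
    ]
    return {"energy": energy, "forces": forces}
```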

Presubmit hook

If input files need to be written before execution, use presubmit_hook:

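from pathlib import Path
from typing import Any

def presubmit_hook(parameters: dict[str, Any], workdir: Path) -> None:
    # Write the input file into the evaluation's working directory
    (workdir / "input.txt").write_text(str(parameters["x"]))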

This runs inside the working directory before the command is executed.

Example: generating an input file

A common use of presubmit_hook is to generate input files from a template.

from pathlib import Path
from typing import Any

def write_input(
    parameters: dict[str, Any],
    workdir: Path,
    *,
    template_path: Path,
    output_name: str,
) -> None:
    template = template_path.read_text()

    # Substitute the {{A}} placeholder with the current prefactor
    content = template.replace("{{A}}", str(parameters["prefactor"]))

    output_path = workdir / output_name
    output_path.write_text(content)

This can then be attached to the computer:

computer = (
    FileBasedQuantityComputer(
        output_files=[Path("output.txt")],
        output_parsers=[my_output_parser],
    )
    .with_presubmit(
        write_input,
        template_path=Path("template.in"),
        output_name="input.in",
    )
    .with_cmd(callable_cmd, script_file="square.py", output_file="output.txt")
)

Because the presubmit hook runs before the command is executed, it is the right place to prepare all input files needed by the external program.

Hint

A template engine such as Jinja can be a powerful tool inside presubmit_hook, especially when many input files need to be configured or share a common structure.
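Python's standard-library string.Template offers a simpler, dependency-free form of the same idea, using named $placeholders instead of Jinja syntax. The hook name write_from_template, the template text and the placeholder names are hypothetical. substitute() raises KeyError if a placeholder has no matching parameter, which catches typos before the external program runs.

```python
from pathlib import Path
from string import Template
from typing import Any


def write_from_template(
    parameters: dict[str, Any],
    workdir: Path,
    *,
    template_text: str,
    output_name: str,
) -> None:
    """Fill $-placeholders in template_text from the parameter dictionary."""
    # substitute() raises KeyError for any placeholder missing from parameters
    content = Template(template_text).substitute(parameters)
    (workdir / output_name).write_text(content)
```

It would be attached with with_presubmit() in the same way as write_input above, binding template_text and output_name as keyword arguments.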

Important rules

Output files must be relative

All paths in output_files must be relative to the working directory.

Using absolute paths breaks isolation and can lead to incorrect results when running in parallel.
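A small guard like the hypothetical check_output_files below can catch this mistake before any evaluation runs. It rejects absolute paths. As an extra assumption beyond the rule stated here, it also rejects paths that escape the working directory via `..`.

```python
from pathlib import PurePath
from typing import Union


def check_output_files(output_files: list[Union[str, PurePath]]) -> None:
    """Raise ValueError if an output path would leave the working directory."""
    for entry in output_files:
        path = PurePath(entry)
        if path.is_absolute():
            raise ValueError(f"Output file must be relative, got absolute path: {entry}")
        if ".." in path.parts:
            raise ValueError(f"Output file must stay inside the working directory: {entry}")
```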

Existence vs completeness

An output file is considered ready as soon as it exists.

The framework does not check whether the file is fully written.

If your program writes files incrementally, ensure that files only appear once complete, or use a separate completion flag file.
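On the program side, one way to make a file appear only once it is complete is to write to a temporary name in the same directory and then rename it. os.replace is atomic on POSIX when source and destination are on the same filesystem, so a poller sees either no file or the finished file. The helper name write_atomically is hypothetical. This only applies to programs you control, such as a Python driver script.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(path: Path, text: str) -> None:
    """Write text so that `path` only ever exists in its complete form."""
    path = Path(path)
    # Create the temporary file next to the target so the rename stays on one filesystem
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # Atomic rename: the target name appears only after the content is written
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise
```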

Scheduler caveat

Some commands (e.g. srun or sbatch) return before the computation has finished.

In that case, output files may appear before the job is done.

A common solution is to write a done file and include it in output_files.
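For example, the command can be wrapped in a shell so that a marker file is created only after the real program exits successfully. Listing that marker in output_files then makes the computer wait for it. The function name done_file_cmd, the marker name done and the use of `sh -c` are illustrative assumptions. When submitting with sbatch, the job would run the same wrapped command.

```python
import shlex
from pathlib import Path
from typing import Any


def done_file_cmd(parameters: dict[str, Any], workdir: Path) -> list[str]:
    """Run the real program, then create `done` only if it succeeded."""
    program = ["python", "square_function.py", str(parameters["prefactor"]), "output.txt"]
    # shlex.join quotes each argument safely; `&&` skips `touch` if the program fails
    return ["sh", "-c", f"{shlex.join(program)} && touch done"]


# Wait for the marker as well as the data file
output_files = ["output.txt", "done"]
```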

Debugging and failure handling

Temporary working directories are deleted after successful execution.

For debugging, you can keep them:

computer = FileBasedQuantityComputer(
    ...,
    keep_temp_workdir_after_crash=True,
)

To inspect failures, you can also enable dump files:

computer = FileBasedQuantityComputer(
    ...,
    write_dump_file_after_crash=True,
)

During execution, useful information is stored in the evaluation context (the ctx object that is also passed to methods such as build_cmd()), including:

the working directory
the executed command
the output files

Execution options

The constructor exposes additional options:

base_working_directory - where temporary directories are created
wait_timeout - maximum time to wait for output files
poll_interval - how often file existence is checked
subprocess_run_args - arguments passed to subprocess.run
delete_temp_workdirs - whether to remove directories after success

Subclassing

In most cases, constructing a FileBasedQuantityComputer with callables is sufficient.

Subclassing is useful when the execution flow itself needs to change.

A typical example is adding a scheduler wrapper such as srun:

class SrunComputer(FileBasedQuantityComputer):
    def build_cmd(self, parameters, ctx):
        base_cmd = super().build_cmd(parameters, ctx)
        return ["srun", *base_cmd]

This pattern is used when command construction depends on runtime context.

Summary

FileBasedQuantityComputer provides a structured way to:

run external programs
isolate executions in temporary directories
collect results as dictionaries

It is the main integration point for external simulation workflows in ChemFit.