Dataset

This module defines the tools to perform and manage several calculations. It aims to simplify running an ensemble of calculations with both QuantumESPRESSO and Yambo, and to handle the parallel execution of multiple instances of the code.

class mppi.Datasets.Dataset.Dataset(label='Dataset', run_dir='runs', **kwargs)[source]

Bases: mppi.Calculators.Runner.Runner

Class to perform a set of calculations and to manage the associated results.

Parameters
  • label (str) – the label of the dataset. It can be useful, for instance, if more than one instance of the class is present

  • run_dir (str) – path of the directory where the runs will be performed

  • **kwargs – all the parameters passed to the dataset and stored in its _global_options. Can be useful, for instance, in performing a post-processing of the results.

The class members are:

runs

list of the runs which have to be treated by the dataset. The runs contain the input parameters to be passed to the various runners.

Type

list

calculators

calculators which will be used by the run method

Type

list

results

set of the results of each of the runs. The set is not ordered as the runs may be executed asynchronously.

Type

dict

ids

list of run ids, to be used in order to identify and fetch the results

Type

list

Example

>>> code = QeCalculator()
>>> study = Dataset(label=..., run_dir=..., **kwargs)
>>> study.append_run(id={'ecut': 30, 'kpoints' : 4},input=...,runner=code,variable1=1)
>>> study.append_run(id={'ecut': 40, 'kpoints' : 4},input=...,runner=code,variable2='periodic')
>>> study.run()

append_run(id, runner, input, **kwargs)[source]

Add a run into the dataset.

Append a run to the list of runs to be performed and associate to each appended item the corresponding runner instance. The method updates the class member

self.runs[irun] = {‘names’ : name_from_id(input), ‘input’:input, kwargs} self.calc[icalc] = {‘calc’ : runner, iruns : […,irun]}

where irun is the cardinal index of the calculator.

The name of the input file is not directly passed. Instead it is computed from the id of the run using the function name_from_id. If, for instance, a jobname has to be provided it can be passed as kwargs.

Parameters
  • id – the id of the run, useful to identify the run in the dataset. It can be a dictionary or a string, as it may contain different keywords. For example, a run can be classified as id = {'energy_cutoff': 60, 'kpoints': 6}

  • input (InputFile) – the instance of an InputFile class

  • runner (Runner) – the instance of the runner class that performs the run. The remaining keyword arguments are passed to it together with the input

  • kwargs – these parameters describe further possible variables. All these quantities are stored as an element of the runs list and are passed to the calculator, together with the global options of the Dataset, when the run method is called

Raises

ValueError – if the provided id is identical to another previously appended run.
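
The bookkeeping described above can be illustrated with a minimal sketch. The class RunRegistry below is a hypothetical stand-in, not the mppi implementation: it only assumes that ids are kept in insertion order, that the extra keyword arguments are stored with each run, and that a repeated id raises ValueError.

```python
class RunRegistry:
    """Hypothetical stand-in for the bookkeeping done by append_run."""

    def __init__(self):
        self.ids = []   # run ids, in the order in which they were appended
        self.runs = []  # one dictionary of stored options per run

    def append_run(self, id, input, **kwargs):
        # Reject an id that is identical to a previously appended one.
        if id in self.ids:
            raise ValueError("The run id %r is already present" % (id,))
        self.ids.append(id)
        # Store the input together with the extra keyword arguments.
        self.runs.append(dict(input=input, **kwargs))
        # Return the index of the new run (called irun in the text above).
        return len(self.runs) - 1


registry = RunRegistry()
first = registry.append_run(id={"ecut": 30, "kpoints": 4}, input="inp-30", jobname="a")
second = registry.append_run(id={"ecut": 40, "kpoints": 4}, input="inp-40")

try:
    registry.append_run(id={"ecut": 30, "kpoints": 4}, input="inp-30")
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

The strings used for input are placeholders; in the real class the argument is an InputFile instance.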

fetch_results(id=None, attribute=None, run_if_not_present=True)[source]

Retrieve the results that match some conditions.

Selects out of the results the objects which have in their id at least the dictionary specified as input. May return an attribute of each result if needed.

Parameters
  • id – string or dictionary used to select the runs. The method returns a list with the results of the runs whose id contains the id argument, in the order provided by append_run().

  • attribute (str) – if present, provide the attribute of each of the results instead of the result object

  • run_if_not_present (bool) – If the run has not yet been performed in the dataset then perform it.
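
The selection rule for the id argument can be sketched as follows. The helpers id_matches and select_runs are hypothetical names introduced for illustration; the sketch assumes that a dictionary query matches every run whose id contains all of its key/value pairs, and that a string query must match exactly.

```python
def id_matches(run_id, query):
    """Return True if run_id contains at least the content of query."""
    if isinstance(run_id, dict) and isinstance(query, dict):
        # Every key/value pair of the query must appear in the run id.
        return all(key in run_id and run_id[key] == value
                   for key, value in query.items())
    # Strings (or mixed types) are compared for equality.
    return run_id == query


def select_runs(ids, query):
    """Return the indices of the matching runs, in append order."""
    return [irun for irun, run_id in enumerate(ids) if id_matches(run_id, query)]


ids = [{"ecut": 40, "k": 4}, {"ecut": 40, "k": 6}, {"ecut": 50, "k": 6}]
selected = select_runs(ids, {"ecut": 40})
```

With these ids the query {"ecut": 40} selects the first two runs, while {"k": 6} would select the last two.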

Example

>>> study=Dataset()
>>> study.append_run(id={'ecut': 40, 'k': 4}, input=..., runner=...)
>>> study.append_run(id={'ecut': 40, 'k': 6}, input=..., runner=...)
>>> study.append_run(id={'ecut': 50, 'k': 6}, input=..., runner=...)
>>> # append other runs if needed
>>> # set a post-processing function that parses the results
>>> # and exposes 'energy' as an attribute of the results object
>>> # run the calculations (optional if run_if_not_present=True)
>>> study.run()
>>> # returns a list of the energies of the first and the second result
>>> # in this example
>>> data = study.fetch_results(id={'ecut': 40}, attribute='energy')

post_processing(**kwargs)[source]

Calls the Dataset function with the results of the runs as arguments

process_run()[source]

Run the dataset by performing an explicit run of each item of the runs list.

run_the_calculations(selection=None)[source]

Method that manages the execution of the runs of the Dataset.

Parameters

selection (list) – if not None only the iruns in the list are computed. This parameter is used only when the method is called by the fetch_results() method.

seek_convergence(rtol=1e-05, atol=1e-08, selection=None, **kwargs)[source]

Search for the first result of the dataset which matches the provided tolerance parameters. The results are in dataset order (provided by the append_run() method) if selection is not specified. Employs the numpy allclose() function for the comparison.

Parameters
  • rtol (float) – relative tolerance parameter

  • atol (float) – absolute tolerance parameter

  • selection (list) – list of the ids of the runs on which to perform the convergence search. Each id should be unique in the dataset.

  • **kwargs – arguments to be passed to the fetch_results() method.

Returns

the id of the last run which matches the convergence, together with the result, if convergence is reached.

Return type

id,result (tuple)

Raises

LookupError – if no result matching the convergence parameters is found. The dataset has to be enriched or the convergence parameters loosened.
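
A convergence search of this kind can be sketched in plain Python. This is one plausible reading, not the mppi code: it assumes that consecutive results are compared pairwise, that the search stops at the first pair that agrees within the tolerances, and that the earlier run of that pair is returned. The helper is_close reproduces, for scalars, the criterion documented for numpy allclose(a, b), that is |a - b| <= atol + rtol * |b|.

```python
def is_close(a, b, rtol=1e-05, atol=1e-08):
    # Scalar version of the criterion documented for numpy.allclose(a, b).
    return abs(a - b) <= atol + rtol * abs(b)


def seek_convergence_sketch(ids, values, rtol=1e-05, atol=1e-08):
    """Compare consecutive values and return (id, value) at convergence."""
    ref_id, ref = ids[0], values[0]
    for run_id, value in zip(ids[1:], values[1:]):
        if is_close(ref, value, rtol=rtol, atol=atol):
            # Assumption: the earlier run of the converged pair is returned.
            return ref_id, ref
        ref_id, ref = run_id, value
    raise LookupError("Convergence not reached: add runs or loosen the tolerances")


# Illustrative numbers only, not the output of a real calculation.
ids = [{"ecut": 30}, {"ecut": 40}, {"ecut": 50}, {"ecut": 60}]
energies = [-10.0, -10.5, -10.52, -10.5200001]
converged_id, converged_value = seek_convergence_sketch(ids, energies)
```

Here the first three values differ by more than the tolerance, while the last two agree, so the search stops at the run with ecut equal to 50. Restricting the search to the first two runs would raise LookupError.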

set_postprocessing_function(func)[source]

Set the callback of run. Calls the function func after having performed the appended runs.

Parameters

func (func) – function that processes the results and returns the value of the run method of the dataset. The function is called as func(self).
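
The callback mechanism can be illustrated with a small sketch. MiniDataset is a hypothetical stand-in that models only what is stated above: the function is stored, it is called as func(self) after the runs, and its return value becomes the return value of run. The results are filled by hand, since no calculation is performed here.

```python
class MiniDataset:
    """Hypothetical stand-in: only the callback mechanism is modelled."""

    def __init__(self):
        self.results = {}
        self._post_processing = None

    def set_postprocessing_function(self, func):
        # Store the callback to be invoked at the end of run().
        self._post_processing = func

    def run(self):
        # A real dataset would execute the appended runs at this point.
        self.results = {0: {"energy": -10.5}, 1: {"energy": -10.52}}
        if self._post_processing is None:
            return self.results
        # The callback receives the dataset itself, i.e. func(self).
        return self._post_processing(self)


def collect_energies(dataset):
    """Example callback: extract the energies in run order."""
    return [dataset.results[irun]["energy"] for irun in sorted(dataset.results)]


study = MiniDataset()
study.set_postprocessing_function(collect_energies)
energies = study.run()
```

Because the callback receives the whole dataset, it can read both the results and any stored option when building the returned value.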

mppi.Datasets.Dataset.name_from_id(id)[source]

Convert the id into a run name. If id is a string, the name is set equal to id. If it is a dictionary, the name string of the run is built from the id dictionary.

Parameters

id – id associated to the run

Returns

name of the run associated to the dictionary id

Return type

name (str)
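
A conversion of this kind can be sketched as follows. The function name_from_id_sketch is hypothetical, and the exact format used by mppi is not stated in this page: the choice of "_" between key and value, of "-" between pairs, and of sorting the keys are all assumptions made for illustration.

```python
def name_from_id_sketch(id):
    """Build a run name from a string or dictionary id (assumed format)."""
    if isinstance(id, str):
        # A string id is used as the name without changes.
        return id
    # Assumed format: key_value pairs joined by '-', with sorted keys so that
    # the same dictionary always yields the same name.
    return "-".join("%s_%s" % (key, id[key]) for key in sorted(id))


name_a = name_from_id_sketch({"kpoints": 4, "ecut": 30})
name_b = name_from_id_sketch("scf_reference")
```

Sorting the keys makes the name independent of the order in which the dictionary was written, so two equal ids map to the same input file name.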