Dataset¶

This module defines the tools to perform and manage several calculations. It aims to simplify running an ensemble of calculations with both QuantumESPRESSO and Yambo, and to handle the parallel execution of multiple instances of the code.

class mppi.Datasets.Dataset.Dataset(label='Dataset', run_dir='runs', num_tasks=2, verbose=True, **kwargs)[source]¶

Bases: Runner

Class to perform a set of calculations and to manage the associated results.

Parameters:

label (str) – the label of the dataset. It is useful, for instance, when more than one instance of the class is present
run_dir (str) – path of the directory where the runs will be performed. This argument can be overridden by passing a run_dir keyword to the append_run() method of the class, so that the various elements of the dataset can be run in different folders
num_tasks (int) – maximum number of computations performed in parallel by the run() method of the class
verbose (bool) – sets the amount of information printed on the terminal
**kwargs – all the other parameters passed to the dataset, which are stored in its _global_options. They can be useful, for instance, when post-processing the results

The class members are:

ids¶

list of run ids, to be used in order to identify and fetch the results

Type: list

runs¶

list of the runs which have to be treated by the dataset. The runs contain all the input parameters to be passed to the various runners.

Type: list

calculators¶

calculators which will be used by the run method

Type: list

results¶

set of the results of each of the runs. The set is not ordered as the runs may be executed asynchronously.

Type: dict

post_processing_function¶

function that specifies the post-processing applied to the results provided by the runs of the dataset

Type: function

Example

>>> code = QeCalculator()
>>> study=Dataset(label = .., run_dir = ..., **kwargs)
>>> study.append_run(id={'ecut': 30, 'kpoints' : 4},input=...,runner=code,variable1=1)
>>> study.append_run(id={'ecut': 40, 'kpoints' : 4},input=...,runner=code,variable2='periodic')
>>> study.run()

append_run(id, runner, **kwargs)[source]¶

Add a run into the dataset.

Append a run to the list of runs to be performed and associate the corresponding runner instance with each appended item. If the name of the input file is not provided, the method derives it from the id of the run using the function name_from_id. Additional quantities, for instance a jobname, can be passed as keyword arguments.

Parameters:

id – the id of the run, used to identify the run in the dataset. It can be a string or a dictionary; a dictionary allows the run to be classified by several keywords, for example id = {'energy_cutoff': 60, 'kpoints': 6}
runner (Runner) – the instance of the runner class to which the keyword arguments will be passed as input
kwargs – these arguments contain the instance of the input and any other variable needed for the appended run. All these quantities are stored as an element of the runs list and are passed to the calculator, together with the global options of the Dataset, when the run method is called

Raises:

ValueError – if the provided id is identical to another previously appended run.
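
The bookkeeping described above can be sketched as follows. This is a minimal illustration and not the mppi implementation: the class name `MiniDataset`, the string used as runner, and the `input` value are hypothetical, and only the behaviour stated in the text is reproduced (parallel `ids`, `runs`, and `calculators` lists, and a `ValueError` on a duplicate id).

```python
class MiniDataset:
    """Toy stand-in for Dataset (hypothetical, for illustration only)."""

    def __init__(self):
        self.ids = []          # run identifiers, in append order
        self.runs = []         # keyword arguments stored for each run
        self.calculators = []  # runner associated with each run

    def append_run(self, id, runner, **kwargs):
        # Reject an id that is identical to a previously appended one.
        if id in self.ids:
            raise ValueError(f"the id {id!r} is already present in the dataset")
        self.ids.append(id)
        self.runs.append(dict(kwargs))
        self.calculators.append(runner)


study = MiniDataset()
study.append_run(id={"ecut": 30, "kpoints": 4}, runner="code", input="scf.in")
try:
    # Appending the same id a second time is refused.
    study.append_run(id={"ecut": 30, "kpoints": 4}, runner="code", input="scf.in")
except ValueError as err:
    print("rejected:", err)
```

How the real class stores the runner and the keyword arguments internally may differ from this sketch.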

build_taskgroups(selection)[source]¶

Identify the elements, among the runs provided as input, that can be executed in parallel. The number of parallel computations is specified by the self.num_tasks attribute of the class

Parameters: selection (list) – list with a selection of the runs of the dataset.
Returns: a list of lists with the groups of parallel computations. The order of the runs respects that of the selection list
Return type: list
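
One simple way to obtain such groups is to cut the selection into consecutive chunks of at most num_tasks elements. The sketch below assumes exactly that; the function name `build_taskgroups_sketch` is hypothetical, and the real method may use additional criteria to decide which runs can share a group.

```python
def build_taskgroups_sketch(selection, num_tasks):
    """Split `selection` into consecutive groups of at most `num_tasks` items."""
    if num_tasks < 1:
        raise ValueError("num_tasks must be a positive integer")
    # Slicing preserves the order of the runs given in `selection`.
    return [selection[i:i + num_tasks] for i in range(0, len(selection), num_tasks)]


groups = build_taskgroups_sketch([0, 1, 2, 3, 4], num_tasks=2)
print(groups)  # [[0, 1], [2, 3], [4]]
```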

fetch_results(id=None, attribute=None, run_if_not_present=True)[source]¶

Retrieve the results that match a condition specified through an id, given as a string or a dictionary. The method selects the results of the runs whose id contains at least the id provided as input.

Parameters:

id – string or dictionary with the id to be retrieved.
attribute (str) – if present, return this attribute of each of the results instead of the result object
run_if_not_present (bool) – if the run has not yet been performed in the dataset, perform it.

Returns:

A list of the results of the runs that match the condition, in the order in which the runs were appended with the append_run() method.

Example

>>> study=Dataset()
>>> study.append_run(id={'ecut': 40, 'k' : 4}, input = ..., runner = ...)
>>> study.append_run(id={'ecut': 40, 'k' : 6}, input = ..., runner = ...)
>>> study.append_run(id={'ecut': 50, 'k' : 6}, input = ..., runner = ...)
>>> #append other runs if needed
>>> #set a post processing function that performs a parsing of the results
>>> #and contains 'energy' as an attribute of the results object
>>> #run the calculations (optional if run_if_not_present=True)
>>> study.run()
>>> # returns a list of the energies of the first and the second result
>>> # in this example
>>> data=study.fetch_results(id={'ecut': 40},attribute='energy')
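
The matching rule for dictionary ids ("at least the id provided as input") can be illustrated without the library. The sketch below assumes that a run matches when every key/value pair of the query appears in the run id; the helper `matches` is hypothetical and the energies are made-up placeholder numbers, not computed results.

```python
def matches(run_id, query):
    """Return True if every key/value pair of `query` appears in `run_id`."""
    return all(key in run_id and run_id[key] == value for key, value in query.items())


ids = [{"ecut": 40, "k": 4}, {"ecut": 40, "k": 6}, {"ecut": 50, "k": 6}]
energies = [-10.1, -10.2, -10.3]  # placeholder values, one per run

# Keep the results in append order, as fetch_results is documented to do.
selected = [e for run_id, e in zip(ids, energies) if matches(run_id, {"ecut": 40})]
print(selected)  # [-10.1, -10.2]
```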

post_processing(**kwargs)[source]¶

Calls the post-processing function of the Dataset with the results of the runs as arguments.

process_run()[source]¶

Run the dataset by explicitly running each item of the runs list. If a selection list is provided in the call to run(), the calculation is restricted to the elements of that list.

run_the_calculations(selection)[source]¶

Method that manages the execution of the runs of the Dataset. The elements of the Dataset in the selection list are computed in parallel, up to the limit set by the num_tasks attribute. The method uses multiprocessing.Process to manage the parallel runs.

Parameters: selection (list) – if not None, only the runs in the list are computed

seek_convergence(rtol=1e-05, atol=1e-08, convergence_level=1, selection=None, **kwargs)[source]¶

Search for the first result of the dataset that matches the provided tolerance parameter. Convergence is reached if all the subsequent calculations, specified by the convergence_level parameter, match the convergence condition. Results are checked in dataset order (provided by the append_run() method) if selection is not specified. Employs the numpy allclose() method for comparison.

Parameters:

rtol (float) – relative tolerance parameter
atol (float) – absolute tolerance parameter
convergence_level (int) – number of subsequent results that have to satisfy the convergence criterion to assess that convergence is reached
selection (list) – list of the ids of the runs used to perform the convergence search
**kwargs – arguments to be passed to the fetch_results() method. If a generic post_processing_function is used, the user can specify the control quantity for seeking convergence, for instance attribute='energy'

Returns:

if convergence is found return a dictionary with the converged id and the corresponding converged value of the control quantity

Return type:

dict

Raises:

IndexError – if convergence is not found. The dataset has to be enriched or the convergence parameters loosened.
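
The search can be sketched for scalar results as follows. This is an illustration under stated assumptions, not the library code: `is_close` reproduces the numpy allclose criterion |a - b| <= atol + rtol * |b| in plain Python, each candidate is compared with the `convergence_level` results that follow it, and the names `seek_convergence_sketch`, `"id"`, and `"value"` are hypothetical. The numbers are made up.

```python
def is_close(a, b, rtol=1e-05, atol=1e-08):
    """Scalar form of the numpy.allclose test: |a - b| <= atol + rtol * |b|."""
    return abs(a - b) <= atol + rtol * abs(b)


def seek_convergence_sketch(ids, values, rtol=1e-05, atol=1e-08, convergence_level=1):
    """Return the first id whose value agrees with the next `convergence_level` values."""
    for i in range(len(values) - convergence_level):
        following = values[i + 1:i + 1 + convergence_level]
        if all(is_close(values[i], later, rtol, atol) for later in following):
            return {"id": ids[i], "value": values[i]}
    # Mirrors the documented behaviour when no result satisfies the criterion.
    raise IndexError("convergence not reached: add runs or loosen the tolerances")


cutoffs = [30, 40, 50, 60]
energies = [-10.0, -10.4, -10.5, -10.5]  # placeholder values
converged = seek_convergence_sketch(cutoffs, energies, rtol=1e-3)
print(converged)  # {'id': 50, 'value': -10.5}
```

Raising convergence_level makes the test stricter, since more of the following results must agree with the candidate before it is accepted.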

set_postprocessing_function(func)[source]¶

Set the callback of run. The function func is called after the appended runs have been performed.

Parameters: func (function) – function that processes the results and provides the return value of the run method of the dataset. The function is called as func(self).

mppi.Datasets.Dataset.convergence_plot(**kwargs)[source]¶

Produce the convergence plot associated with the seek_convergence method. The plot shows a dashed vertical line at the converged value.

mppi.Datasets.Dataset.name_from_id(id)[source]¶

Convert the id into a run name. If id is a string, set name = id. If id is a dictionary, set name = key+'_'+str(id[key])+'-' for all the keys of the id. If id is a tuple, set name = str(element)+'-' for all the elements of the id.

Parameters: id – id associated with the run
Returns: name of the run associated with the id
Return type: str
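
The naming rule can be sketched as below. Two points are assumptions that the description does not settle: the dictionary keys are taken in sorted order, and the trailing '-' produced by the stated rule is dropped. The function name `name_from_id_sketch` is hypothetical.

```python
def name_from_id_sketch(id):
    """Build a run name from a string, dictionary, or tuple id."""
    if isinstance(id, str):
        return id
    if isinstance(id, dict):
        # Assumption: sorted keys, so that the name is reproducible.
        parts = [key + "_" + str(id[key]) for key in sorted(id)]
    elif isinstance(id, tuple):
        parts = [str(element) for element in id]
    else:
        raise TypeError("id must be a string, a dictionary, or a tuple")
    # Assumption: pieces are joined by '-' with no trailing separator.
    return "-".join(parts)


print(name_from_id_sketch({"ecut": 30, "kpoints": 4}))  # ecut_30-kpoints_4
print(name_from_id_sketch(("scf", 30)))                 # scf-30
```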
