[1]:

# useful to autoreload the module without restarting the kernel
%load_ext autoreload
%autoreload 2

[2]:

from mppi import InputFiles as I, Calculators as C, Datasets as D

Tutorial for the Dataset module

Dataset is the class used to build, run and post-process a set of calculations performed with QuantumESPRESSO and Yambo.

Here we discuss some explicit examples to describe the usage and the main features of the package.

Perform a convergence analysis for the gs energy of Silicon

We use this class to find the value of the energy cutoff that guarantees a converged result for the ground state energy of Silicon.

We start from a given input file for Silicon

[4]:

inp = I.PwInput(file='IO_files/si_scf.in')
#inp

And we define a Calculator that will be used by the Dataset class to run the computation

[5]:

code1 = C.QeCalculator(mpi=2, skip = False)
code2 = C.QeCalculator(mpi=4, skip = False)
code1.global_options()

Initialize a parallel QuantumESPRESSO calculator with scheduler direct
Initialize a parallel QuantumESPRESSO calculator with scheduler direct

[5]:

{'omp': 1,
 'mpi': 2,
 'mpi_run': 'mpirun -np',
 'executable': 'pw.x',
 'scheduler': 'direct',
 'multiTask': True,
 'skip': False,
 'verbose': True,
 'IO_time': 5}

Now we can define the Dataset instance that performs the convergence procedure.

[6]:

gs_convergence = D.Dataset(label='Si_gs_convergence',run_dir='Si_gs_convergence', spin_orbit = False)

Dataset inherits from Runner, so it has the same structure, and its global options can be accessed with the same methods used for QeCalculator and YamboCalculator.

Note that in this case we have defined a spin_orbit variable that can be used later. This variable is stored in the global options of the dataset

[7]:

gs_convergence.global_options()

[7]:

{'label': 'Si_gs_convergence',
 'run_dir': 'Si_gs_convergence',
 'spin_orbit': False}
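
The way keyword arguments such as spin_orbit end up in the global options can be pictured with a minimal sketch. The classes MiniRunner and MiniDataset below are hypothetical and are not the mppi implementation; the sketch only assumes that the constructor stores its keyword arguments in a dictionary that global_options returns and update_global_options modifies.

```python
class MiniRunner:
    """Hypothetical stand-in for a Runner-like class (not mppi code)."""

    def __init__(self, **kwargs):
        # every keyword argument becomes a global option
        self._global_options = dict(kwargs)

    def global_options(self):
        # return a copy so that callers cannot modify the stored options by accident
        return dict(self._global_options)

    def update_global_options(self, **kwargs):
        # add new options or overwrite existing ones
        self._global_options.update(kwargs)


class MiniDataset(MiniRunner):
    """Inherits the option handling from the runner."""


ds = MiniDataset(label='Si_gs_convergence', run_dir='Si_gs_convergence', spin_orbit=False)
print(ds.global_options())
ds.update_global_options(spin_orbit=True)
print(ds.global_options()['spin_orbit'])
```

In this sketch any user-defined keyword is accepted, which is the behaviour suggested by the spin_orbit entry in the output above.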

The next step is to append to the Dataset all the calculations that we want to perform later.

For instance, we can append some calculations as a function of the cutoff energy. To show the design of the class we make use of two different calculators

[8]:

eng_cut = 20
idd = {'eng_cut' : eng_cut} #id that identifies the run in the Dataset
inp.set_prefix(D.name_from_id(idd)) #attribute the id as the prefix of the input
inp.set_energy_cutoff(eng_cut)
gs_convergence.append_run(id=idd,runner=code1,input=inp, variable1 = 'first_run')

The append_run method sets the attributes of the object, for instance

[9]:

print(gs_convergence.ids) # identify each element of the dataset
print(gs_convergence.calculators) # list with the calculators and the associated runs
gs_convergence.runs

[{'eng_cut': 20}]
[{'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7f2b6c0af978>, 'iruns': [0]}]

[9]:

[{'input': {'control': {'verbosity': "'high'",
    'pseudo_dir': "'../pseudos'",
    'calculation': "'scf'",
    'prefix': "'eng_cut_20'"},
   'system': {'force_symmorphic': '.true.',
    'occupations': "'fixed'",
    'ibrav': '2',
    'celldm(1)': '10.3',
    'ntyp': '1',
    'nat': '2',
    'ecutwfc': 20},
   'electrons': {'conv_thr': '1e-08'},
   'ions': {},
   'cell': {},
   'atomic_species': {'Si': ['28.086', 'Si.pbe-mt_fhi.UPF']},
   'atomic_positions': {'type': 'crystal',
    'values': [['Si', [0.125, 0.125, 0.125]],
     ['Si', [-0.125, -0.125, -0.125]]]},
   'kpoints': {'type': 'automatic',
    'values': ([4.0, 4.0, 4.0], [0.0, 0.0, 0.0])},
   'cell_parameters': {},
   'file': 'IO_files/si_scf.in'},
  'name': 'eng_cut_20',
  'jobname': 'eng_cut_20',
  'label': 'Si_gs_convergence',
  'run_dir': 'Si_gs_convergence',
  'spin_orbit': False,
  'variable1': 'first_run'}]

The name of the input files is evaluated from the ids using the function name_from_id.
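
The mapping from an id to a name can be illustrated with a small sketch. The function name_from_id_sketch is hypothetical and only reproduces the names that appear in this tutorial (for example {'eng_cut': 20} gives eng_cut_20, and a string id is used unchanged). How the real name_from_id treats ids with several keys is not shown here, so the '-' separator below is an assumption.

```python
def name_from_id_sketch(run_id):
    """Hypothetical re-implementation, for illustration only."""
    if isinstance(run_id, str):
        # a string id is used as it is
        return run_id
    # join each key with its value; the '-' separator between several
    # key/value pairs is an assumption, not taken from mppi
    return '-'.join('%s_%s' % (key, value) for key, value in sorted(run_id.items()))


print(name_from_id_sketch({'eng_cut': 20}))   # eng_cut_20
print(name_from_id_sketch('eng_cut_40'))      # eng_cut_40
print(name_from_id_sketch({'exxrl': 4.0}))    # exxrl_4.0
```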

We add further calculations

[10]:

eng_cut = 30
idd = {'eng_cut' : eng_cut} #id that identifies the run in the Dataset
inp.set_prefix(D.name_from_id(idd)) #attribute the id as the prefix of the input
inp.set_energy_cutoff(eng_cut)
gs_convergence.append_run(id=idd,runner=code1,input=inp, variable2 = 'second_run')

[11]:

print(gs_convergence.ids)
print(gs_convergence.calculators)
gs_convergence.runs

[{'eng_cut': 20}, {'eng_cut': 30}]
[{'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7f2b6c0af978>, 'iruns': [0, 1]}]

[11]:

[{'input': {'control': {'verbosity': "'high'",
    'pseudo_dir': "'../pseudos'",
    'calculation': "'scf'",
    'prefix': "'eng_cut_20'"},
   'system': {'force_symmorphic': '.true.',
    'occupations': "'fixed'",
    'ibrav': '2',
    'celldm(1)': '10.3',
    'ntyp': '1',
    'nat': '2',
    'ecutwfc': 20},
   'electrons': {'conv_thr': '1e-08'},
   'ions': {},
   'cell': {},
   'atomic_species': {'Si': ['28.086', 'Si.pbe-mt_fhi.UPF']},
   'atomic_positions': {'type': 'crystal',
    'values': [['Si', [0.125, 0.125, 0.125]],
     ['Si', [-0.125, -0.125, -0.125]]]},
   'kpoints': {'type': 'automatic',
    'values': ([4.0, 4.0, 4.0], [0.0, 0.0, 0.0])},
   'cell_parameters': {},
   'file': 'IO_files/si_scf.in'},
  'name': 'eng_cut_20',
  'jobname': 'eng_cut_20',
  'label': 'Si_gs_convergence',
  'run_dir': 'Si_gs_convergence',
  'spin_orbit': False,
  'variable1': 'first_run'},
 {'input': {'control': {'verbosity': "'high'",
    'pseudo_dir': "'../pseudos'",
    'calculation': "'scf'",
    'prefix': "'eng_cut_30'"},
   'system': {'force_symmorphic': '.true.',
    'occupations': "'fixed'",
    'ibrav': '2',
    'celldm(1)': '10.3',
    'ntyp': '1',
    'nat': '2',
    'ecutwfc': 30},
   'electrons': {'conv_thr': '1e-08'},
   'ions': {},
   'cell': {},
   'atomic_species': {'Si': ['28.086', 'Si.pbe-mt_fhi.UPF']},
   'atomic_positions': {'type': 'crystal',
    'values': [['Si', [0.125, 0.125, 0.125]],
     ['Si', [-0.125, -0.125, -0.125]]]},
   'kpoints': {'type': 'automatic',
    'values': ([4.0, 4.0, 4.0], [0.0, 0.0, 0.0])},
   'cell_parameters': {},
   'file': 'IO_files/si_scf.in'},
  'name': 'eng_cut_30',
  'jobname': 'eng_cut_30',
  'label': 'Si_gs_convergence',
  'run_dir': 'Si_gs_convergence',
  'spin_orbit': False,
  'variable2': 'second_run'}]

Note that the variables passed as kwargs to append_run are added to the corresponding element of runs.

We add further computations, also using the second calculator

[12]:

eng_cut = 40
idd = 'eng_cut_%s'%eng_cut # the id can be also a string
inp.set_prefix(D.name_from_id(idd))
inp.set_energy_cutoff(eng_cut)
gs_convergence.append_run(id=idd,runner=code2,input=inp,variable3 = 'second_calculator')

eng_cut = 50
idd = {'eng_cut' : eng_cut}
inp.set_prefix(D.name_from_id(idd))
inp.set_energy_cutoff(eng_cut)
gs_convergence.append_run(id=idd,runner=code1,input=inp)

[13]:

print(gs_convergence.ids)
print(gs_convergence.calculators)

[{'eng_cut': 20}, {'eng_cut': 30}, 'eng_cut_40', {'eng_cut': 50}]
[{'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7f2b6c0af978>, 'iruns': [0, 1, 3]}, {'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7f2b6c0af828>, 'iruns': [2]}]

gs_convergence.runs is a list that contains, for each appended run, the merge of the input object and the global options. In this way one can check which inputs are associated with each calculator.
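
The bookkeeping described here can be sketched in a few lines. The class AppendSketch is hypothetical and is not the mppi code. It assumes that each appended input is copied (the outputs above show that the first run keeps ecutwfc 20 after the same inp object is modified), that global options and extra kwargs are merged into the run, and that run indices are grouped by calculator.

```python
import copy


class AppendSketch:
    """Hypothetical sketch of the bookkeeping done when a run is appended."""

    def __init__(self, **global_options):
        self.global_options = dict(global_options)
        self.ids, self.runs, self.calculators = [], [], []

    def append_run(self, id, runner, input, **kwargs):
        index = len(self.runs)
        self.ids.append(copy.deepcopy(id))
        # deep copy: later changes to the caller's input do not affect this run
        run = {'input': copy.deepcopy(input)}
        run.update(self.global_options)
        run.update(kwargs)
        self.runs.append(run)
        # group the run index under its calculator
        for entry in self.calculators:
            if entry['calc'] is runner:
                entry['iruns'].append(index)
                break
        else:
            self.calculators.append({'calc': runner, 'iruns': [index]})


ds = AppendSketch(label='demo')
inp = {'ecutwfc': 20}
code1, code2 = object(), object()  # placeholders for two calculators
ds.append_run(id={'eng_cut': 20}, runner=code1, input=inp)
inp['ecutwfc'] = 30
ds.append_run(id={'eng_cut': 30}, runner=code2, input=inp, note='second')
inp['ecutwfc'] = 50
ds.append_run(id={'eng_cut': 50}, runner=code1, input=inp)

print([run['input']['ecutwfc'] for run in ds.runs])   # [20, 30, 50]
print([entry['iruns'] for entry in ds.calculators])   # [[0, 2], [1]]
```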

[14]:

#gs_convergence.runs[1] # gives the parameters of the second appended run (index 1)

The attribute .results is a dictionary that is empty before the run

[15]:

gs_convergence.results

[15]:

{}

Once all the computations have been added we can run the Dataset

[16]:

results = gs_convergence.run()
results

delete log file: Si_gs_convergence/eng_cut_20.log
delete xml file: Si_gs_convergence/eng_cut_20.xml
delete folder: Si_gs_convergence/eng_cut_20.save
delete log file: Si_gs_convergence/eng_cut_30.log
delete xml file: Si_gs_convergence/eng_cut_30.xml
delete folder: Si_gs_convergence/eng_cut_30.save
delete log file: Si_gs_convergence/eng_cut_50.log
delete xml file: Si_gs_convergence/eng_cut_50.xml
delete folder: Si_gs_convergence/eng_cut_50.save
run 0 command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_20.in > eng_cut_20.log
run 1 command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_30.in > eng_cut_30.log
run 2 command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_50.in > eng_cut_50.log
run0_is_running: True run1_is_running: True run2_is_running: True
Job completed
delete log file: Si_gs_convergence/eng_cut_40.log
delete xml file: Si_gs_convergence/eng_cut_40.xml
delete folder: Si_gs_convergence/eng_cut_40.save
run 0 command: cd Si_gs_convergence; mpirun -np 4 pw.x -inp eng_cut_40.in > eng_cut_40.log
run0_is_running: True
Job completed

[16]:

{0: {'output': 'Si_gs_convergence/eng_cut_20.save/data-file-schema.xml'},
 1: {'output': 'Si_gs_convergence/eng_cut_30.save/data-file-schema.xml'},
 3: {'output': 'Si_gs_convergence/eng_cut_50.save/data-file-schema.xml'},
 2: {'output': 'Si_gs_convergence/eng_cut_40.save/data-file-schema.xml'}}

The run method returns the attribute .results of the Dataset.

[17]:

gs_convergence.results

[17]:

{0: {'output': 'Si_gs_convergence/eng_cut_20.save/data-file-schema.xml'},
 1: {'output': 'Si_gs_convergence/eng_cut_30.save/data-file-schema.xml'},
 3: {'output': 'Si_gs_convergence/eng_cut_50.save/data-file-schema.xml'},
 2: {'output': 'Si_gs_convergence/eng_cut_40.save/data-file-schema.xml'}}

This implementation allows us to parse the data after the execution of the dataset and/or to choose a parser among several choices.

Usage of the multiTask feature

By default, Dataset runs in parallel all the computations associated with the same calculator. However, if the multiTask = False option is passed to the calculator, all the computations are performed in sequence.

[18]:

code1.update_global_options(multiTask=False)
code2.update_global_options(multiTask=False)

[19]:

results = gs_convergence.run()
results

delete log file: Si_gs_convergence/eng_cut_20.log
delete xml file: Si_gs_convergence/eng_cut_20.xml
delete folder: Si_gs_convergence/eng_cut_20.save
delete log file: Si_gs_convergence/eng_cut_30.log
delete xml file: Si_gs_convergence/eng_cut_30.xml
delete folder: Si_gs_convergence/eng_cut_30.save
delete log file: Si_gs_convergence/eng_cut_50.log
delete xml file: Si_gs_convergence/eng_cut_50.xml
delete folder: Si_gs_convergence/eng_cut_50.save
Executing command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_20.in > eng_cut_20.log
run0_is_running:True
Job completed
Executing command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_30.in > eng_cut_30.log
run0_is_running:True
Job completed
Executing command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_50.in > eng_cut_50.log
run0_is_running:True
Job completed
delete log file: Si_gs_convergence/eng_cut_40.log
delete xml file: Si_gs_convergence/eng_cut_40.xml
delete folder: Si_gs_convergence/eng_cut_40.save
Executing command: cd Si_gs_convergence; mpirun -np 4 pw.x -inp eng_cut_40.in > eng_cut_40.log
run0_is_running:True
Job completed

[19]:

{0: {'output': 'Si_gs_convergence/eng_cut_20.save/data-file-schema.xml'},
 1: {'output': 'Si_gs_convergence/eng_cut_30.save/data-file-schema.xml'},
 3: {'output': 'Si_gs_convergence/eng_cut_50.save/data-file-schema.xml'},
 2: {'output': 'Si_gs_convergence/eng_cut_40.save/data-file-schema.xml'}}
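
The difference between the two modes can be illustrated with a generic sketch based on the standard library. This is not how mppi launches its jobs: fake_job and run_jobs are hypothetical names, and a short sleep stands in for the external code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_job(name):
    # placeholder for launching an external code and waiting for it
    time.sleep(0.05)
    return name + '.xml'


def run_jobs(names, multi_task=True):
    if multi_task:
        # submit all the jobs at once and wait for all of them
        with ThreadPoolExecutor(max_workers=len(names)) as pool:
            return list(pool.map(fake_job, names))
    # one job at a time
    return [fake_job(name) for name in names]


names = ['eng_cut_20', 'eng_cut_30', 'eng_cut_50']
parallel = run_jobs(names, multi_task=True)
sequential = run_jobs(names, multi_task=False)
print(parallel == sequential)
```

Both modes return the same results in the same order; only the wall time differs, since the sequential mode waits for each job before starting the next one.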

Parsing of the results

One way to parse the results is to do it a posteriori, after the dataset has been run.

For instance, we can parse the results with the PwParser class of this package

[18]:

from mppi import Parsers as P
results = {}
for run,data in gs_convergence.results.items():
    results[run] = P.PwParser(data['output'])

Parse file : Si_gs_convergence/eng_cut_20.save/data-file-schema.xml
Parse file : Si_gs_convergence/eng_cut_30.save/data-file-schema.xml
Parse file : Si_gs_convergence/eng_cut_50.save/data-file-schema.xml
Parse file : Si_gs_convergence/eng_cut_40.save/data-file-schema.xml

[19]:

results

[19]:

{0: <mppi.Parsers.PwParser.PwParser at 0x7f2b6c0c6e48>,
 1: <mppi.Parsers.PwParser.PwParser at 0x7f2b6ff999e8>,
 3: <mppi.Parsers.PwParser.PwParser at 0x7f2b6ff99ac8>,
 2: <mppi.Parsers.PwParser.PwParser at 0x7f2b6ff4e710>}

The results associated with the key “i” correspond to the i-th element appended to the dataset.

The input parameters associated with each key of results are stored in gs_convergence.runs[key].

For instance the total energy is extracted as

[20]:

for run,res in results.items():
    print('run',run,'energy',res.get_energy(convert_eV=False))

run 0 energy -7.870821313306621
run 1 energy -7.872953197735628
run 3 energy -7.874492376332521
run 2 energy -7.874327291715858
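
Because the keys of results are the indices of the appended runs, the energies printed above can be paired with the corresponding cutoffs and sorted. The sketch below copies the printed values by hand; the cutoffs list follows the order in which the runs were appended (20, 30, 40, 50).

```python
# total energies printed above, keyed by the index of the appended run
energies = {0: -7.870821313306621, 1: -7.872953197735628,
            3: -7.874492376332521, 2: -7.874327291715858}
# cutoff of each run, in the order in which the runs were appended
cutoffs = [20, 30, 40, 50]

# sort by run index, which here coincides with increasing cutoff
ordered = [(cutoffs[run], energies[run]) for run in sorted(energies)]
changes = [e_b - e_a for (_, e_a), (_, e_b) in zip(ordered, ordered[1:])]
for (cut_a, _), (cut_b, _), change in zip(ordered, ordered[1:], changes):
    print('cutoff %s -> %s: energy change %.6f' % (cut_a, cut_b, change))
```

The magnitude of the change shrinks at each step, which is the behaviour expected of a converging quantity.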

Usage of the post processing function

Parsing, or other more specific procedures, can be performed directly when the run method is called.

To do so, we define a post processing function and pass it to the Dataset, which applies it whenever its run method is called. For instance, in this way we can directly extract the total energy

[21]:

def extract_energy(dataset):
    from mppi import Parsers as P
    energy = {}
    for run,data in dataset.results.items():
        results = P.PwParser(data['output'],verbose=False)
        energy[run] = results.get_energy(convert_eV = False)
    return energy

[22]:

gs_convergence.set_postprocessing_function(extract_energy)

Once the post processing function is passed to the dataset, it is applied directly when the run is executed

[23]:

code1.update_global_options(verbose=False,skip=True,multiTask=True)
code2.update_global_options(verbose=False,skip=True,multiTask=True)
gs_convergence.run()

[23]:

{0: -7.870821313306621,
 1: -7.872953197735628,
 3: -7.874492376332521,
 2: -7.874327291715858}

[24]:

gs_convergence.post_processing()

[24]:

{0: -7.870821313306621,
 1: -7.872953197735628,
 3: -7.874492376332521,
 2: -7.874327291715858}

Note that the attribute results always contains the names of the xml data files, while the post processed results can be accessed through the post_processing() method of the dataset.
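
The relation between results and the post processed output can be pictured with a minimal sketch. The class PostProcessSketch is hypothetical and is not the mppi implementation. It assumes that run stores the raw file names in results and returns whatever the post processing function computes from them.

```python
class PostProcessSketch:
    """Hypothetical sketch of a run method with an optional post processing step."""

    def __init__(self):
        self.results = {}
        self._post_function = None

    def set_postprocessing_function(self, func):
        self._post_function = func

    def post_processing(self):
        # without a function the raw results are returned unchanged
        if self._post_function is None:
            return self.results
        return self._post_function(self)

    def run(self):
        # placeholder for the real computation: only file names are stored
        self.results = {0: {'output': 'a.xml'}, 1: {'output': 'b.xml'}}
        return self.post_processing()


def output_names(dataset):
    # toy post processing function: upper-case the file names
    return {run: data['output'].upper() for run, data in dataset.results.items()}


ds = PostProcessSketch()
raw = ds.run()
ds.set_postprocessing_function(output_names)
processed = ds.run()
print(raw)
print(processed)
print(ds.results == raw)
```

The last line shows that results still holds the raw file names after the post processing function has been set.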

Usage of the fetch_results method

Another possible approach is to define a post processing function that performs a simple parsing of the data.

Then we can use fetch_results to look up the attribute energy in the computation(s) that match the id passed to fetch_results

[81]:

def parse_data(dataset):
    from mppi import Parsers as P
    results = {}
    for run,data in dataset.results.items():
        results[run] = P.PwParser(data['output'],verbose=False)
    return results

[82]:

gs_convergence.set_postprocessing_function(parse_data)

[83]:

gs_convergence.fetch_results(id={'eng_cut': 50},attribute='energy')

[83]:

[-7.874492376312396]

Note that it is not necessary to run the dataset beforehand, since the fetch_results method performs the runs that match the id (if the option run_if_not_present=True is used)
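
The selection step can be sketched as follows. The helper fetch_results_sketch is hypothetical: it assumes that a run matches when its id is equal to the requested one, and it does not launch missing computations as the real method can. The energies in the example are placeholders, not computed values.

```python
from types import SimpleNamespace


def fetch_results_sketch(ids, parsed, id, attribute=None):
    """Hypothetical helper: collect the results of the runs whose id equals `id`."""
    selected = [parsed[index] for index, run_id in enumerate(ids) if run_id == id]
    if attribute is None:
        return selected
    # read the requested attribute from each selected result
    return [getattr(result, attribute) for result in selected]


ids = [{'eng_cut': 20}, {'eng_cut': 30}, 'eng_cut_40', {'eng_cut': 50}]
# placeholder parsed results, each exposing an `energy` attribute
parsed = {index: SimpleNamespace(energy=-1.0 * index) for index in range(len(ids))}

print(fetch_results_sketch(ids, parsed, id={'eng_cut': 50}, attribute='energy'))  # [-3.0]
```

The result is a list because more than one run may match the requested id.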

Usage of the seek_convergence method

We present the functionality of this method by performing a second convergence test on the number of kpoints.

In this example we set the energy cutoff to 60 Ry and build a new dataset, appending runs with an increasing number of kpoints.

[135]:

inp = I.PwInput('IO_files/si_scf.in')
inp.set_energy_cutoff(60)

[136]:

code = C.QeCalculator(skip=True,verbose=False, mpi_run='mpirun -np 4')

Initialize a parallel QuantumESPRESSO calculator with scheduler direct

[137]:

gs_kpoint = D.Dataset(label='Si_kpoints_convergence',run_dir='Si_gs_convergence')

[138]:

kpoints = [2,3,4,5,6,7,8]

[139]:

for k in kpoints:
    id = {'kp':k}
    inp.set_kpoints(points = [k,k,k])
    inp.set_prefix(D.name_from_id(id))
    gs_kpoint.append_run(id=id,runner=code,input=inp)

The runs have been appended but not yet performed. Next we call seek_convergence.

We want to perform a convergence procedure based on the value of the total energy of the system. So we can use the post processing function that directly provides this quantity

[140]:

gs_kpoint.set_postprocessing_function(extract_energy)

[141]:

gs_kpoint.seek_convergence(rtol=0.001)

Fetching results for id " {'kp': 2} "
Fetching results for id " {'kp': 3} "
Fetching results for id " {'kp': 4} "
Fetching results for id " {'kp': 5} "
Convergence reached in Dataset "Si_kpoints_convergence" for id " {'kp': 4} "

[141]:

({'kp': 4}, -7.874513952262473)

seek_convergence runs the computations (in the order provided by append_run) until convergence is reached. Alternatively, a list of ids can be passed as an argument of the method; in this case the calculations are restricted to the simulations associated with the provided ids.
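
A convergence loop of this kind can be sketched as follows. The function seek_convergence_sketch is hypothetical, and the criterion it uses (relative change between two consecutive runs, returning the earlier of the two) is an assumption chosen to be consistent with the output above, where the run with kp 5 is fetched and kp 4 is reported as converged. The model energy is synthetic, not a real calculation.

```python
def seek_convergence_sketch(ids, compute, rtol=1e-3):
    """Hypothetical convergence loop; `compute(id)` returns the quantity for one run."""
    previous_id, previous_value = ids[0], compute(ids[0])
    for current_id in ids[1:]:
        current_value = compute(current_id)
        # assumed criterion: relative change between two consecutive runs
        if abs(current_value - previous_value) <= rtol * abs(current_value):
            return previous_id, previous_value
        previous_id, previous_value = current_id, current_value
    return None  # no pair of consecutive runs satisfied the tolerance


evaluated = []


def model_energy(k):
    # synthetic model, not a real calculation; records which ids are evaluated
    evaluated.append(k)
    return -7.0 - 1.0 / k**4


converged = seek_convergence_sketch([2, 3, 4, 5, 6, 7, 8], model_energy, rtol=0.001)
print(converged)   # (4, -7.00390625)
print(evaluated)   # [2, 3, 4, 5]
```

Only the runs needed to satisfy the tolerance are evaluated; the remaining ids (6, 7, 8) are never computed.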

It is also possible to use a more generic post processing function that simply parses the data. In this case we choose which quantity is used to check convergence by specifying the attribute = … option in the call to seek_convergence. For instance

[142]:

gs_kpoint.set_postprocessing_function(parse_data)

[143]:

gs_kpoint.seek_convergence(rtol=0.001,attribute='energy')

Fetching results for id " {'kp': 2} "
Fetching results for id " {'kp': 3} "
Fetching results for id " {'kp': 4} "
Fetching results for id " {'kp': 5} "
Convergence reached in Dataset "Si_kpoints_convergence" for id " {'kp': 4} "

[143]:

({'kp': 4}, -7.874513952262473)

Perform a convergence test for Hartree-Fock computations with Yambo

We consider a set of Hartree-Fock computations for silicon and look for the value of EXXRLvcs that ensures a converged value of the direct gap.

First of all we need an nscf computation. We start from the scf result with an energy cutoff of 60 Ry and kpoints = [4,4,4]

[27]:

inp = I.PwInput('Si_gs_convergence/kp_4.in')
inp.set_nscf(8,force_symmorphic=True)
inp.set_kpoints(points = [6,6,6]) #nscf kpoints can be different from the scf
name = 'nscf_kp6_ecut60'
inp.set_prefix(name)
#inp

[28]:

code = C.QeCalculator(mpi=4)
code.global_options()

Initialize a parallel QuantumESPRESSO calculator with scheduler direct

[28]:

{'omp': 1,
 'mpi': 4,
 'mpi_run': 'mpirun -np',
 'executable': 'pw.x',
 'scheduler': 'direct',
 'multiTask': True,
 'skip': True,
 'verbose': True,
 'IO_time': 5}

[29]:

code.run(run_dir='Si_gs_convergence',inputs=[inp],names=[name],source_dir='Si_gs_convergence/kp_4.save')

The folder Si_gs_convergence/nscf_kp6_ecut60.save already exsists. Source folder Si_gs_convergence/kp_4.save not copied
Skip the run of nscf_kp6_ecut60
Job completed

[29]:

{'output': ['Si_gs_convergence/nscf_kp6_ecut60.save/data-file-schema.xml']}

The next step is the generation of the run_dir and SAVE folder

[30]:

from mppi import Utilities as U

[31]:

run_dir = 'Si_hf_convergence'
source_dir = 'Si_gs_convergence/nscf_kp6_ecut60.save'

[32]:

U.build_SAVE(source_dir,run_dir)

SAVE folder already present in Si_hf_convergence

Now we are ready to build the Yambo dataset

[33]:

code = C.YamboCalculator()
code.global_options()

Initialize a parallel Yambo calculator with scheduler direct

[33]:

{'omp': 1,
 'mpi': 2,
 'mpi_run': 'mpirun -np',
 'executable': 'yambo',
 'scheduler': 'direct',
 'multiTask': True,
 'skip': True,
 'verbose': True,
 'IO_time': 5,
 'clean_restart': True}

[34]:

inp = I.YamboInput(args='yambo -x -V rl',folder=run_dir)
inp.set_kRange(1,1) # we are interested in the direct gap at Gamma, so we include only the first kpoint
inp

[34]:

{'args': 'yambo -x -V rl',
 'folder': 'Si_hf_convergence',
 'filename': 'yambo.in',
 'arguments': ['HF_and_locXC'],
 'variables': {'FFTGvecs': [2733.0, 'RL'],
  'SE_Threads': [0.0, ''],
  'EXXRLvcs': [1.0, 'RL'],
  'VXCRLvcs': [17153.0, 'RL'],
  'QPkrange': [[1, 1, 1, 8], '']}}

[167]:

hf_convergence = D.Dataset(label='Si_hf',run_dir=run_dir)

Let us start by adding some computations to see how to manage the data

[168]:

exx_values = [1.,2.,3.] #in Hartree

[169]:

for e in exx_values:
    id = {'exxrl' : e}
    inp['variables']['EXXRLvcs'] = [1e3*e, 'mHa']
    hf_convergence.append_run(id=id,input=inp,runner=code)

If needed, we can also pass the jobname attribute by adding, for instance,

jobname=D.name_from_id(id)+'-job'

to the append_run call. For example

[170]:

exx = 4.
id = {'exxrl' : exx}
inp['variables']['EXXRLvcs'] = [1e3*exx, 'mHa']
hf_convergence.append_run(id=id,input=inp,jobname=D.name_from_id(id)+'-job',runner=code)

[171]:

hf_convergence.runs[3]

[171]:

{'input': {'args': 'yambo -x -V rl',
  'folder': 'Si_hf_convergence',
  'filename': 'yambo.in',
  'arguments': ['HF_and_locXC'],
  'variables': {'FFTGvecs': [2733.0, 'RL'],
   'SE_Threads': [0.0, ''],
   'EXXRLvcs': [4000.0, 'mHa'],
   'VXCRLvcs': [17153.0, 'RL'],
   'QPkrange': [[1, 1, 1, 8], '']}},
 'name': 'exxrl_4.0',
 'jobname': 'exxrl_4.0-job',
 'label': 'Si_hf',
 'run_dir': 'Si_hf_convergence'}

Then we can run the dataset

[172]:

hf_convergence.run()

Skip the computation for input exxrl_1.0
Skip the computation for input exxrl_2.0
Skip the computation for input exxrl_3.0
Skip the computation for input exxrl_4.0
Job completed

[172]:

{0: {'output': ['Si_hf_convergence/exxrl_1.0/o-exxrl_1.0.hf'],
  'dbs': 'Si_hf_convergence/exxrl_1.0'},
 1: {'output': ['Si_hf_convergence/exxrl_2.0/o-exxrl_2.0.hf'],
  'dbs': 'Si_hf_convergence/exxrl_2.0'},
 2: {'output': ['Si_hf_convergence/exxrl_3.0/o-exxrl_3.0.hf'],
  'dbs': 'Si_hf_convergence/exxrl_3.0'},
 3: {'output': ['Si_hf_convergence/exxrl_4.0/o-exxrl_4.0-job.hf'],
  'dbs': 'Si_hf_convergence/exxrl_4.0-job'}}

[173]:

hf_convergence.results

[173]:

{0: {'output': ['Si_hf_convergence/exxrl_1.0/o-exxrl_1.0.hf'],
  'dbs': 'Si_hf_convergence/exxrl_1.0'},
 1: {'output': ['Si_hf_convergence/exxrl_2.0/o-exxrl_2.0.hf'],
  'dbs': 'Si_hf_convergence/exxrl_2.0'},
 2: {'output': ['Si_hf_convergence/exxrl_3.0/o-exxrl_3.0.hf'],
  'dbs': 'Si_hf_convergence/exxrl_3.0'},
 3: {'output': ['Si_hf_convergence/exxrl_4.0/o-exxrl_4.0-job.hf'],
  'dbs': 'Si_hf_convergence/exxrl_4.0-job'}}

Parsing the results with a post processing function

We can define a general post processing function to extract all the results from the o- files of the dataset.

We can use the YamboParser class of this package

[174]:

def parse_data(dataset):
    from mppi import Parsers as P
    results = {}
    for run,data in dataset.results.items():
        results[run] = P.YamboParser(data['output'],verbose=True)
    return results

[175]:

hf_convergence.set_postprocessing_function(parse_data)

[176]:

code.update_global_options(verbose=False,skip=True)
results = hf_convergence.run()

Parse file Si_hf_convergence/exxrl_1.0/o-exxrl_1.0.hf
Parse file Si_hf_convergence/exxrl_2.0/o-exxrl_2.0.hf
Parse file Si_hf_convergence/exxrl_3.0/o-exxrl_3.0.hf
Parse file Si_hf_convergence/exxrl_4.0/o-exxrl_4.0-job.hf

Results can be extracted as

[177]:

for irun in results:
    print(results[irun]['hf']['ehf'])

[-18.51499   -0.5254    -0.5254    -0.52829    7.248512   7.259116
   7.259286   8.57444 ]
[-18.59211   -0.9768    -0.9767    -0.98311    6.951296   6.961545
   6.961472   8.0761  ]
[-18.6265    -1.125     -1.125     -1.12512    6.797003   6.797605
   6.797518   7.91638 ]
[-18.63289   -1.142     -1.142     -1.14206    6.77871    6.778919
   6.778865   7.89733 ]

Computing the direct gap with a post processing function

We describe the usage of a post processing function to perform a more specific operation like computing the direct band gap. We define the post processing function

[178]:

def get_direct_gap(dataset):
    """
    Compute the direct band gap assuming that there is only one kpoint.
    The arguments energy_col, val_band and cond_band are read from the global_options
    of the dataset.
    """
    from mppi import Parsers as P
    import numpy as np
    glob_opt = dataset.global_options()
    val_band = glob_opt.get('val_band')
    cond_band = glob_opt.get('cond_band')
    # the name of the column used to compute the gap
    energy_col = glob_opt.get('energy_col','hf')
    gap = {}
    for run,data in dataset.results.items():
        results = P.YamboParser(data['output'])
        key = list(results.keys())[0] # select the key (can be hf or qp)
        bands = results[key]['band']
        index_val = np.where(bands == val_band)
        index_cond = np.where(bands == cond_band)
        energy = results[key][energy_col]
        delta = energy[index_cond]-energy[index_val]
        gap[run] = float(delta[0]) # delta is a one-element array when there is a single kpoint
    return gap

This function assumes that some inputs, such as the valence and conduction band indices, are given in the global options of the dataset. So we can set them

[179]:

hf_convergence.update_global_options(val_band = 4, cond_band = 5, energy_col = 'hf')

Then we set the new post processing function and run the dataset

[180]:

hf_convergence.set_postprocessing_function(get_direct_gap)

[181]:

hf_convergence.run()

[181]:

{0: 6.5168870000000005,
 1: 6.674491,
 2: 6.662208000000001,
 3: 6.660856000000001}
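
The band selection performed by get_direct_gap can be checked by hand on a single set of values. The sketch below uses the 'ehf' values printed earlier for the first run and assumes that the band indices run from 1 to 8, as in QPkrange. Since the dataset call above reads the column named by energy_col ('hf') rather than 'ehf', the number obtained here is not expected to coincide with the values returned by the dataset.

```python
# 'ehf' values printed earlier for the first run
ehf = [-18.51499, -0.5254, -0.5254, -0.52829, 7.248512, 7.259116, 7.259286, 8.57444]
# band indices, assumed to run from 1 to 8 as in QPkrange
bands = list(range(1, 9))

val_band, cond_band = 4, 5
# difference between the conduction and the valence band values
gap = ehf[bands.index(cond_band)] - ehf[bands.index(val_band)]
print(round(gap, 6))
```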

Usage of seek convergence

The post processing function defined above can be used together with the seek_convergence method to perform a convergence study.

In this case we define a new dataset and append many possible runs. Only those needed to reach the given tolerance will be executed

[182]:

hf_convergence2 = D.Dataset(label='Si_hf',run_dir=run_dir,val_band = 4, cond_band = 5, energy_col = 'hf')

[183]:

exx_values = [float(i) for i in range(1,10)] #in Hartree
exx_values

[183]:

[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

[184]:

for e in exx_values:
    id = {'exxrl' : e}
    inp['variables']['EXXRLvcs'] = [1e3*e, 'mHa']
    hf_convergence2.append_run(id=id,input=inp,runner=code)

[185]:

hf_convergence2.set_postprocessing_function(get_direct_gap)

[186]:

hf_convergence2.seek_convergence(rtol=0.0001)

Fetching results for id " {'exxrl': 1.0} "
Fetching results for id " {'exxrl': 2.0} "
Fetching results for id " {'exxrl': 3.0} "
Fetching results for id " {'exxrl': 4.0} "
Fetching results for id " {'exxrl': 5.0} "
Fetching results for id " {'exxrl': 6.0} "
Convergence reached in Dataset "Si_hf" for id " {'exxrl': 5.0} "

[186]:

({'exxrl': 5.0}, 6.661784)
