{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# useful to autoreload the module without restarting the kernel\n",
    "%load_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mppi import InputFiles as I, Calculators as C, Datasets as D"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Tutorial for the Dataset module"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dataset is the class used to build, run and post-process a set of calculations performed with QuantumESPRESSO and/or Yambo.\n",
    "\n",
    "Here we discuss some explicit examples that illustrate the usage and the main features of the class."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform a convergence analysis for the ground-state energy of silicon"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use this class to find the value of the energy cutoff that guarantees a converged ground-state energy for silicon.\n",
    "\n",
    "We start from a given input file for silicon:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "inp = I.PwInput(file='IO_files/si_scf.in')\n",
    "#inp"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we define the calculators that will be used by the Dataset class to run the computations:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialize a parallel QuantumESPRESSO calculator with scheduler direct\n",
      "Initialize a parallel QuantumESPRESSO calculator with scheduler direct\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'omp': 1,\n",
       " 'mpi': 2,\n",
       " 'mpi_run': 'mpirun -np',\n",
       " 'executable': 'pw.x',\n",
       " 'scheduler': 'direct',\n",
       " 'multiTask': True,\n",
       " 'skip': False,\n",
       " 'verbose': True,\n",
       " 'IO_time': 5}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "code1 = C.QeCalculator(mpi=2, skip=False) # calculator that uses 2 MPI processes\n",
    "code2 = C.QeCalculator(mpi=4, skip=False) # calculator that uses 4 MPI processes\n",
    "code1.global_options()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can define the instance of Dataset that performs the convergence procedure:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "gs_convergence = D.Dataset(label='Si_gs_convergence',run_dir='Si_gs_convergence', spin_orbit = False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Dataset inherits from Runner, so it has the same structure as QeCalculator and YamboCalculator and we can use the same methods\n",
    "to access its global options.\n",
    "\n",
    "Note that in this case we have also defined a spin_orbit variable that can be used later. This variable is\n",
    "stored in the global options of the dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'label': 'Si_gs_convergence',\n",
       " 'run_dir': 'Si_gs_convergence',\n",
       " 'spin_orbit': False}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gs_convergence.global_options()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next step is to append to the Dataset all the calculations that we want to perform later.\n",
    "\n",
    "For instance, we can append a set of calculations as a function of the energy cutoff. To show the design of the class\n",
    "we use two different calculators."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "eng_cut = 20 \n",
    "idd = {'eng_cut' : eng_cut} #id that identifies the run in the Dataset\n",
    "inp.set_prefix(D.name_from_id(idd)) #attribute the id as the prefix of the input\n",
    "inp.set_energy_cutoff(eng_cut)\n",
    "gs_convergence.append_run(id=idd,runner=code1,input=inp, variable1 = 'first_run')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The append_run method sets the attributes of the object, for instance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'eng_cut': 20}]\n",
      "[{'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7f2b6c0af978>, 'iruns': [0]}]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[{'input': {'control': {'verbosity': \"'high'\",\n",
       "    'pseudo_dir': \"'../pseudos'\",\n",
       "    'calculation': \"'scf'\",\n",
       "    'prefix': \"'eng_cut_20'\"},\n",
       "   'system': {'force_symmorphic': '.true.',\n",
       "    'occupations': \"'fixed'\",\n",
       "    'ibrav': '2',\n",
       "    'celldm(1)': '10.3',\n",
       "    'ntyp': '1',\n",
       "    'nat': '2',\n",
       "    'ecutwfc': 20},\n",
       "   'electrons': {'conv_thr': '1e-08'},\n",
       "   'ions': {},\n",
       "   'cell': {},\n",
       "   'atomic_species': {'Si': ['28.086', 'Si.pbe-mt_fhi.UPF']},\n",
       "   'atomic_positions': {'type': 'crystal',\n",
       "    'values': [['Si', [0.125, 0.125, 0.125]],\n",
       "     ['Si', [-0.125, -0.125, -0.125]]]},\n",
       "   'kpoints': {'type': 'automatic',\n",
       "    'values': ([4.0, 4.0, 4.0], [0.0, 0.0, 0.0])},\n",
       "   'cell_parameters': {},\n",
       "   'file': 'IO_files/si_scf.in'},\n",
       "  'name': 'eng_cut_20',\n",
       "  'jobname': 'eng_cut_20',\n",
       "  'label': 'Si_gs_convergence',\n",
       "  'run_dir': 'Si_gs_convergence',\n",
       "  'spin_orbit': False,\n",
       "  'variable1': 'first_run'}]"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(gs_convergence.ids) # identify each element of the dataset\n",
    "print(gs_convergence.calculators) # list with the calculators and the associated runs\n",
    "gs_convergence.runs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The names of the input files are built from the ids using the function name_from_id."
   ]
  },
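  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of this naming convention is given below. It only reproduces the names printed above\n",
    "(a dict id such as {'eng_cut': 20} gives eng_cut_20, while a string id is used as it is). The function\n",
    "name_from_id_sketch is hypothetical and is not the mppi implementation. The handling of ids with\n",
    "several keys is an assumption.\n",
    "\n",
    "```python\n",
    "def name_from_id_sketch(idd):\n",
    "    # Hypothetical re-implementation, not the mppi source code.\n",
    "    # A string id is used directly as the name of the run.\n",
    "    if isinstance(idd, str):\n",
    "        return idd\n",
    "    # A dict id is flattened into key_value pieces joined by underscores\n",
    "    # (assumption: the keys are taken in insertion order).\n",
    "    return '_'.join('%s_%s' % (key, value) for key, value in idd.items())\n",
    "\n",
    "print(name_from_id_sketch({'eng_cut': 20}))  # eng_cut_20\n",
    "print(name_from_id_sketch('eng_cut_40'))     # eng_cut_40\n",
    "```"
   ]
  },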
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We add a further calculation:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "eng_cut = 30 \n",
    "idd = {'eng_cut' : eng_cut} #id that identifies the run in the Dataset\n",
    "inp.set_prefix(D.name_from_id(idd)) #attribute the id as the prefix of the input\n",
    "inp.set_energy_cutoff(eng_cut)\n",
    "gs_convergence.append_run(id=idd,runner=code1,input=inp, variable2 = 'second_run')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'eng_cut': 20}, {'eng_cut': 30}]\n",
      "[{'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7f2b6c0af978>, 'iruns': [0, 1]}]\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[{'input': {'control': {'verbosity': \"'high'\",\n",
       "    'pseudo_dir': \"'../pseudos'\",\n",
       "    'calculation': \"'scf'\",\n",
       "    'prefix': \"'eng_cut_20'\"},\n",
       "   'system': {'force_symmorphic': '.true.',\n",
       "    'occupations': \"'fixed'\",\n",
       "    'ibrav': '2',\n",
       "    'celldm(1)': '10.3',\n",
       "    'ntyp': '1',\n",
       "    'nat': '2',\n",
       "    'ecutwfc': 20},\n",
       "   'electrons': {'conv_thr': '1e-08'},\n",
       "   'ions': {},\n",
       "   'cell': {},\n",
       "   'atomic_species': {'Si': ['28.086', 'Si.pbe-mt_fhi.UPF']},\n",
       "   'atomic_positions': {'type': 'crystal',\n",
       "    'values': [['Si', [0.125, 0.125, 0.125]],\n",
       "     ['Si', [-0.125, -0.125, -0.125]]]},\n",
       "   'kpoints': {'type': 'automatic',\n",
       "    'values': ([4.0, 4.0, 4.0], [0.0, 0.0, 0.0])},\n",
       "   'cell_parameters': {},\n",
       "   'file': 'IO_files/si_scf.in'},\n",
       "  'name': 'eng_cut_20',\n",
       "  'jobname': 'eng_cut_20',\n",
       "  'label': 'Si_gs_convergence',\n",
       "  'run_dir': 'Si_gs_convergence',\n",
       "  'spin_orbit': False,\n",
       "  'variable1': 'first_run'},\n",
       " {'input': {'control': {'verbosity': \"'high'\",\n",
       "    'pseudo_dir': \"'../pseudos'\",\n",
       "    'calculation': \"'scf'\",\n",
       "    'prefix': \"'eng_cut_30'\"},\n",
       "   'system': {'force_symmorphic': '.true.',\n",
       "    'occupations': \"'fixed'\",\n",
       "    'ibrav': '2',\n",
       "    'celldm(1)': '10.3',\n",
       "    'ntyp': '1',\n",
       "    'nat': '2',\n",
       "    'ecutwfc': 30},\n",
       "   'electrons': {'conv_thr': '1e-08'},\n",
       "   'ions': {},\n",
       "   'cell': {},\n",
       "   'atomic_species': {'Si': ['28.086', 'Si.pbe-mt_fhi.UPF']},\n",
       "   'atomic_positions': {'type': 'crystal',\n",
       "    'values': [['Si', [0.125, 0.125, 0.125]],\n",
       "     ['Si', [-0.125, -0.125, -0.125]]]},\n",
       "   'kpoints': {'type': 'automatic',\n",
       "    'values': ([4.0, 4.0, 4.0], [0.0, 0.0, 0.0])},\n",
       "   'cell_parameters': {},\n",
       "   'file': 'IO_files/si_scf.in'},\n",
       "  'name': 'eng_cut_30',\n",
       "  'jobname': 'eng_cut_30',\n",
       "  'label': 'Si_gs_convergence',\n",
       "  'run_dir': 'Si_gs_convergence',\n",
       "  'spin_orbit': False,\n",
       "  'variable2': 'second_run'}]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "print(gs_convergence.ids) \n",
    "print(gs_convergence.calculators) \n",
    "gs_convergence.runs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the variables passed as kwargs to append_run are added to the corresponding element of runs.\n",
    "\n",
    "We add further computations, this time also using the second calculator:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "eng_cut = 40 \n",
    "idd = 'eng_cut_%s'%eng_cut # the id can also be a string\n",
    "inp.set_prefix(D.name_from_id(idd)) \n",
    "inp.set_energy_cutoff(eng_cut)\n",
    "gs_convergence.append_run(id=idd,runner=code2,input=inp,variable3 = 'second_calculator')\n",
    "\n",
    "eng_cut = 50 \n",
    "idd = {'eng_cut' : eng_cut} \n",
    "inp.set_prefix(D.name_from_id(idd))\n",
    "inp.set_energy_cutoff(eng_cut)\n",
    "gs_convergence.append_run(id=idd,runner=code1,input=inp)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[{'eng_cut': 20}, {'eng_cut': 30}, 'eng_cut_40', {'eng_cut': 50}]\n",
      "[{'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7f2b6c0af978>, 'iruns': [0, 1, 3]}, {'calc': <mppi.Calculators.QeCalculator.QeCalculator object at 0x7f2b6c0af828>, 'iruns': [2]}]\n"
     ]
    }
   ],
   "source": [
    "print(gs_convergence.ids) \n",
    "print(gs_convergence.calculators) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "gs_convergence.runs is a list that contains, for each appended run, the merge of the input object and the global options.\n",
    "In this way one can check which inputs are associated to each calculator."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "#gs_convergence.runs[2] # parameters of the run associated to the second calculator (iruns = [2])"
   ]
  },
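  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The structure of each element of runs can be mimicked with plain dictionaries. The sketch below is modelled on the\n",
    "printed output above and is not the mppi implementation (build_run_sketch is a hypothetical helper). It stores a deep\n",
    "copy of the input. The notebook reuses the same inp object for all the runs, and the first run above still reports\n",
    "ecutwfc = 20 after the cutoff has been changed. This suggests that Dataset also keeps an independent copy.\n",
    "\n",
    "```python\n",
    "import copy\n",
    "\n",
    "def build_run_sketch(global_options, input_dict, name, **kwargs):\n",
    "    # Hypothetical sketch of one element of Dataset.runs.\n",
    "    run = {'input': copy.deepcopy(input_dict), 'name': name, 'jobname': name}\n",
    "    run.update(global_options)  # e.g. label, run_dir, spin_orbit\n",
    "    run.update(kwargs)          # extra variables passed to append_run\n",
    "    return run\n",
    "\n",
    "options = {'label': 'Si_gs_convergence', 'run_dir': 'Si_gs_convergence', 'spin_orbit': False}\n",
    "inp_dict = {'system': {'ecutwfc': 20}}\n",
    "run = build_run_sketch(options, inp_dict, 'eng_cut_20', variable1='first_run')\n",
    "inp_dict['system']['ecutwfc'] = 30  # later changes do not affect the stored run\n",
    "print(run['input']['system']['ecutwfc'])  # 20\n",
    "print(sorted(run))\n",
    "```"
   ]
  },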
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The attribute .results is a dictionary that is empty before the run:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{}"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gs_convergence.results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once all the computations have been added, we can run the Dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "delete log file: Si_gs_convergence/eng_cut_20.log\n",
      "delete xml file: Si_gs_convergence/eng_cut_20.xml\n",
      "delete folder: Si_gs_convergence/eng_cut_20.save\n",
      "delete log file: Si_gs_convergence/eng_cut_30.log\n",
      "delete xml file: Si_gs_convergence/eng_cut_30.xml\n",
      "delete folder: Si_gs_convergence/eng_cut_30.save\n",
      "delete log file: Si_gs_convergence/eng_cut_50.log\n",
      "delete xml file: Si_gs_convergence/eng_cut_50.xml\n",
      "delete folder: Si_gs_convergence/eng_cut_50.save\n",
      "run 0 command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_20.in > eng_cut_20.log\n",
      "run 1 command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_30.in > eng_cut_30.log\n",
      "run 2 command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_50.in > eng_cut_50.log\n",
      "run0_is_running: True run1_is_running: True run2_is_running: True \n",
      "Job completed\n",
      "delete log file: Si_gs_convergence/eng_cut_40.log\n",
      "delete xml file: Si_gs_convergence/eng_cut_40.xml\n",
      "delete folder: Si_gs_convergence/eng_cut_40.save\n",
      "run 0 command: cd Si_gs_convergence; mpirun -np 4 pw.x -inp eng_cut_40.in > eng_cut_40.log\n",
      "run0_is_running: True \n",
      "Job completed\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{0: {'output': 'Si_gs_convergence/eng_cut_20.save/data-file-schema.xml'},\n",
       " 1: {'output': 'Si_gs_convergence/eng_cut_30.save/data-file-schema.xml'},\n",
       " 3: {'output': 'Si_gs_convergence/eng_cut_50.save/data-file-schema.xml'},\n",
       " 2: {'output': 'Si_gs_convergence/eng_cut_40.save/data-file-schema.xml'}}"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results = gs_convergence.run()\n",
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The run method returns the .results attribute of the Dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0: {'output': 'Si_gs_convergence/eng_cut_20.save/data-file-schema.xml'},\n",
       " 1: {'output': 'Si_gs_convergence/eng_cut_30.save/data-file-schema.xml'},\n",
       " 3: {'output': 'Si_gs_convergence/eng_cut_50.save/data-file-schema.xml'},\n",
       " 2: {'output': 'Si_gs_convergence/eng_cut_40.save/data-file-schema.xml'}}"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gs_convergence.results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since .results only stores the paths of the output files, the data can be parsed after the execution of the dataset,\n",
    "and the parser can be chosen among several options."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Usage of the multiTask feature"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
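    "By default, Dataset runs in parallel all the computations associated to the same calculator, while the groups of runs\n",
    "associated to different calculators are executed one after the other (see the output above). However, if the multiTask=False\n",
    "option is passed to a calculator, its computations are performed in sequence."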
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "code1.update_global_options(multiTask=False)\n",
    "code2.update_global_options(multiTask=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "delete log file: Si_gs_convergence/eng_cut_20.log\n",
      "delete xml file: Si_gs_convergence/eng_cut_20.xml\n",
      "delete folder: Si_gs_convergence/eng_cut_20.save\n",
      "delete log file: Si_gs_convergence/eng_cut_30.log\n",
      "delete xml file: Si_gs_convergence/eng_cut_30.xml\n",
      "delete folder: Si_gs_convergence/eng_cut_30.save\n",
      "delete log file: Si_gs_convergence/eng_cut_50.log\n",
      "delete xml file: Si_gs_convergence/eng_cut_50.xml\n",
      "delete folder: Si_gs_convergence/eng_cut_50.save\n",
      "Executing command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_20.in > eng_cut_20.log\n",
      "run0_is_running:True  \n",
      "Job completed\n",
      "Executing command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_30.in > eng_cut_30.log\n",
      "run0_is_running:True  \n",
      "Job completed\n",
      "Executing command: cd Si_gs_convergence; mpirun -np 2 pw.x -inp eng_cut_50.in > eng_cut_50.log\n",
      "run0_is_running:True  \n",
      "Job completed\n",
      "delete log file: Si_gs_convergence/eng_cut_40.log\n",
      "delete xml file: Si_gs_convergence/eng_cut_40.xml\n",
      "delete folder: Si_gs_convergence/eng_cut_40.save\n",
      "Executing command: cd Si_gs_convergence; mpirun -np 4 pw.x -inp eng_cut_40.in > eng_cut_40.log\n",
      "run0_is_running:True  \n",
      "Job completed\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{0: {'output': 'Si_gs_convergence/eng_cut_20.save/data-file-schema.xml'},\n",
       " 1: {'output': 'Si_gs_convergence/eng_cut_30.save/data-file-schema.xml'},\n",
       " 3: {'output': 'Si_gs_convergence/eng_cut_50.save/data-file-schema.xml'},\n",
       " 2: {'output': 'Si_gs_convergence/eng_cut_40.save/data-file-schema.xml'}}"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results = gs_convergence.run()\n",
    "results"
   ]
  },
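  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The bookkeeping behind the multiTask option can be sketched in plain Python. The sketch assumes the following, based on the printed\n",
    "calculators attribute and the logs above. First, the runs are grouped by calculator in order of first use. Second, the\n",
    "groups are executed one after the other. Third, multiTask only decides whether the runs of a group are launched\n",
    "together or one at a time. The helper names are hypothetical and no process is actually launched.\n",
    "\n",
    "```python\n",
    "def group_by_calculator(runners):\n",
    "    # runners[i] is the calculator used for the i-th appended run.\n",
    "    groups = []\n",
    "    for irun, calc in enumerate(runners):\n",
    "        for group in groups:\n",
    "            if group['calc'] is calc:\n",
    "                group['iruns'].append(irun)\n",
    "                break\n",
    "        else:  # first run that uses this calculator\n",
    "            groups.append({'calc': calc, 'iruns': [irun]})\n",
    "    return groups\n",
    "\n",
    "def execution_batches(groups, multi_task):\n",
    "    # Each batch is a list of runs launched together.\n",
    "    batches = []\n",
    "    for group in groups:\n",
    "        if multi_task:\n",
    "            batches.append(list(group['iruns']))\n",
    "        else:\n",
    "            batches.extend([irun] for irun in group['iruns'])\n",
    "    return batches\n",
    "\n",
    "code1, code2 = object(), object()  # stand-ins for the two calculators\n",
    "groups = group_by_calculator([code1, code1, code2, code1])\n",
    "print(execution_batches(groups, multi_task=True))   # [[0, 1, 3], [2]]\n",
    "print(execution_batches(groups, multi_task=False))  # [[0], [1], [3], [2]]\n",
    "```\n",
    "\n",
    "The resulting execution order is consistent with the order 0, 1, 3, 2 of the keys of results."
   ]
  },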
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parsing of the results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to parse the results is _a posteriori_, after the run of the dataset.\n",
    "\n",
    "For instance, we can parse the results with the PwParser class of this package:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Parse file : Si_gs_convergence/eng_cut_20.save/data-file-schema.xml\n",
      "Parse file : Si_gs_convergence/eng_cut_30.save/data-file-schema.xml\n",
      "Parse file : Si_gs_convergence/eng_cut_50.save/data-file-schema.xml\n",
      "Parse file : Si_gs_convergence/eng_cut_40.save/data-file-schema.xml\n"
     ]
    }
   ],
   "source": [
    "from mppi import Parsers as P\n",
    "results = {}\n",
    "for run,data in gs_convergence.results.items():\n",
    "    results[run] = P.PwParser(data['output'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0: <mppi.Parsers.PwParser.PwParser at 0x7f2b6c0c6e48>,\n",
       " 1: <mppi.Parsers.PwParser.PwParser at 0x7f2b6ff999e8>,\n",
       " 3: <mppi.Parsers.PwParser.PwParser at 0x7f2b6ff99ac8>,\n",
       " 2: <mppi.Parsers.PwParser.PwParser at 0x7f2b6ff4e710>}"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results associated to the key \"i\" correspond to the i-th element appended to the dataset (counting from 0).\n",
    "\n",
    "The input parameters associated to each key of results are stored in gs_convergence.runs[key].\n",
    "\n",
    "For instance, the total energy is extracted as"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "run 0 energy -7.870821313306621\n",
      "run 1 energy -7.872953197735628\n",
      "run 3 energy -7.874492376332521\n",
      "run 2 energy -7.874327291715858\n"
     ]
    }
   ],
   "source": [
    "for run,res in results.items():\n",
    "    print('run',run,'energy',res.get_energy(convert_eV=False))"
   ]
  },
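  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Since the keys of results follow the execution order, it can be convenient to pair each value with its id in the order\n",
    "of append_run. Below is a small sketch in plain Python that uses the ids and the energies printed above (the helper\n",
    "values_by_id is hypothetical and not part of mppi):\n",
    "\n",
    "```python\n",
    "def values_by_id(ids, values):\n",
    "    # ids: list ordered as the appended runs (as gs_convergence.ids)\n",
    "    # values: dict {run index: value}, possibly in execution order\n",
    "    return [(ids[irun], values[irun]) for irun in sorted(values)]\n",
    "\n",
    "ids = [{'eng_cut': 20}, {'eng_cut': 30}, 'eng_cut_40', {'eng_cut': 50}]\n",
    "energies = {0: -7.870821313306621, 1: -7.872953197735628,\n",
    "            3: -7.874492376332521, 2: -7.874327291715858}\n",
    "for idd, energy in values_by_id(ids, energies):\n",
    "    print(idd, energy)\n",
    "```"
   ]
  },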
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Usage of the post-processing function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parsing, or other more specific procedures, can also be performed directly when the run method is called.\n",
    "\n",
    "To do so, we define a post-processing function, which takes the dataset as its only argument, and pass it to the Dataset\n",
    "with set_postprocessing_function.\n",
    "\n",
    "The class applies this function when its run method is called. For instance, in this way we can directly\n",
    "extract the total energy:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "def extract_energy(dataset): \n",
    "    from mppi import Parsers as P\n",
    "    energy = {}\n",
    "    for run,data in dataset.results.items():\n",
    "        results = P.PwParser(data['output'],verbose=False)\n",
    "        energy[run] = results.get_energy(convert_eV = False)\n",
    "    return energy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [],
   "source": [
    "gs_convergence.set_postprocessing_function(extract_energy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
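    "Once the post-processing function has been passed to the dataset, it is applied directly when the run is executed.\n",
    "Here the calculators are updated with skip=True, so the runs whose outputs are already present are not repeated:"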
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0: -7.870821313306621,\n",
       " 1: -7.872953197735628,\n",
       " 3: -7.874492376332521,\n",
       " 2: -7.874327291715858}"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "code1.update_global_options(verbose=False,skip=True,multiTask=True)\n",
    "code2.update_global_options(verbose=False,skip=True,multiTask=True)\n",
    "gs_convergence.run()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0: -7.870821313306621,\n",
       " 1: -7.872953197735628,\n",
       " 3: -7.874492376332521,\n",
       " 2: -7.874327291715858}"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gs_convergence.post_processing()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the attribute results always contains the names of the xml data files, while the post-processed results\n",
    "are returned by the post_processing() method."
   ]
  },
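  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The design can be summarised with a toy model. It assumes only what the cells above show: run fills results\n",
    "with the output paths and then returns what the post-processing function computes from the whole dataset.\n",
    "The class MiniDataset is hypothetical and is not the mppi Dataset. It also fakes the calculations.\n",
    "\n",
    "```python\n",
    "class MiniDataset:\n",
    "    # Hypothetical toy model of the post-processing pattern.\n",
    "    def __init__(self):\n",
    "        self.results = {}  # run index -> {'output': path of the data file}\n",
    "        self.post_processing_function = None\n",
    "\n",
    "    def set_postprocessing_function(self, func):\n",
    "        self.post_processing_function = func\n",
    "\n",
    "    def post_processing(self):\n",
    "        # Without a function the raw results are returned unchanged.\n",
    "        if self.post_processing_function is None:\n",
    "            return self.results\n",
    "        return self.post_processing_function(self)\n",
    "\n",
    "    def run(self):\n",
    "        # A real dataset would launch the calculations here.\n",
    "        self.results = {irun: {'output': 'run_%d.xml' % irun} for irun in range(3)}\n",
    "        return self.post_processing()\n",
    "\n",
    "def count_outputs(dataset):\n",
    "    # Post-processing functions receive the whole dataset.\n",
    "    return len(dataset.results)\n",
    "\n",
    "ds = MiniDataset()\n",
    "ds.set_postprocessing_function(count_outputs)\n",
    "print(ds.run())                  # 3\n",
    "print(ds.results[0]['output'])  # run_0.xml\n",
    "```"
   ]
  },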
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Usage of the fetch_results method"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another possible approach is to define a post-processing function that performs a simple parsing of the data.\n",
    "\n",
    "Then we can use fetch_results to look for the attribute energy in the computation(s) that match the id\n",
    "passed to fetch_results:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_data(dataset):\n",
    "    from mppi import Parsers as P\n",
    "    results = {}\n",
    "    for run,data in dataset.results.items():\n",
    "        results[run] = P.PwParser(data['output'],verbose=False)\n",
    "    return results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [],
   "source": [
    "gs_convergence.set_postprocessing_function(parse_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[-7.874492376312396]"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gs_convergence.fetch_results(id={'eng_cut': 50},attribute='energy')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that it is not necessary to run the dataset beforehand, since the fetch_results method performs the runs that match\n",
    "the id (if the option run_if_not_present=True is used)."
   ]
  },
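  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A possible model of the lookup performed by fetch_results is sketched below. It assumes an exact match between\n",
    "the requested id and the stored ids, and it reads the attribute with getattr from the post-processed objects. It does not\n",
    "implement the run_if_not_present option. fetch_results_sketch is a hypothetical helper, and SimpleNamespace objects\n",
    "stand in for the PwParser instances.\n",
    "\n",
    "```python\n",
    "from types import SimpleNamespace\n",
    "\n",
    "def fetch_results_sketch(ids, processed, target_id, attribute):\n",
    "    # Collect the attribute of every run whose id equals target_id.\n",
    "    return [getattr(processed[irun], attribute)\n",
    "            for irun, run_id in enumerate(ids) if run_id == target_id]\n",
    "\n",
    "ids = [{'eng_cut': 20}, {'eng_cut': 50}]\n",
    "processed = {0: SimpleNamespace(energy=-7.870821313306621),\n",
    "             1: SimpleNamespace(energy=-7.874492376332521)}\n",
    "print(fetch_results_sketch(ids, processed, {'eng_cut': 50}, 'energy'))  # [-7.874492376332521]\n",
    "```"
   ]
  },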
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Usage of the seek_convergence method"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We present the functionality of this method by performing a second convergence test, on the number of k-points.\n",
    "\n",
    "In this example we set the energy cutoff to 60 Ry and build a new dataset by appending runs with an increasing number of\n",
    "k-points."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 135,
   "metadata": {},
   "outputs": [],
   "source": [
    "inp = I.PwInput('IO_files/si_scf.in')\n",
    "inp.set_energy_cutoff(60)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 136,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialize a parallel QuantumESPRESSO calculator with scheduler direct\n"
     ]
    }
   ],
   "source": [
    "code = C.QeCalculator(mpi=4, skip=True, verbose=False) # the number of MPI processes is set with mpi, as in the other calculators"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 137,
   "metadata": {},
   "outputs": [],
   "source": [
    "gs_kpoint = D.Dataset(label='Si_kpoints_convergence',run_dir='Si_gs_convergence')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 138,
   "metadata": {},
   "outputs": [],
   "source": [
    "kpoints = [2,3,4,5,6,7,8]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 139,
   "metadata": {},
   "outputs": [],
   "source": [
    "for k in kpoints:\n",
    "    idd = {'kp':k} # id of the run (the name idd avoids shadowing the built-in id)\n",
    "    inp.set_kpoints(points = [k,k,k])\n",
    "    inp.set_prefix(D.name_from_id(idd))\n",
    "    gs_kpoint.append_run(id=idd,runner=code,input=inp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The runs have been appended but not yet performed; now we call seek_convergence.\n",
    "\n",
    "We want to perform a convergence procedure based on the value of the total energy of the system,\n",
    "so we can use the post-processing function that directly provides this quantity:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 140,
   "metadata": {},
   "outputs": [],
   "source": [
    "gs_kpoint.set_postprocessing_function(extract_energy)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 141,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fetching results for id \" {'kp': 2} \"\n",
      "Fetching results for id \" {'kp': 3} \"\n",
      "Fetching results for id \" {'kp': 4} \"\n",
      "Fetching results for id \" {'kp': 5} \"\n",
      "Convergence reached in Dataset \"Si_kpoints_convergence\" for id \" {'kp': 4} \"\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "({'kp': 4}, -7.874513952262473)"
      ]
     },
     "execution_count": 141,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gs_kpoint.seek_convergence(rtol=0.001)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "seek_convergence runs the computations (in the order provided by append_run) until convergence is reached.\n",
    "Alternatively, a list of ids can be passed as an argument of the method; in this case the calculations are restricted\n",
    "to the simulations associated with the provided ids."
   ]
  },
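  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One convergence criterion consistent with the log above (the runs up to {'kp': 5} are fetched and {'kp': 4} is\n",
    "returned) is to stop at the first run whose value agrees with the following one within rtol. The sketch below\n",
    "implements this assumed criterion in plain Python. seek_convergence_sketch is hypothetical, and the actual rule used by\n",
    "mppi may differ. The energies are toy numbers, not computed results.\n",
    "\n",
    "```python\n",
    "import math\n",
    "\n",
    "def seek_convergence_sketch(ids, fetch, rtol):\n",
    "    # fetch(run_id) returns the value for a run (performing it if needed).\n",
    "    previous_id, previous = None, None\n",
    "    for run_id in ids:\n",
    "        value = fetch(run_id)\n",
    "        if previous is not None and math.isclose(value, previous, rel_tol=rtol):\n",
    "            return previous_id, previous  # agrees with the next run\n",
    "        previous_id, previous = run_id, value\n",
    "    return None  # no convergence within the given ids\n",
    "\n",
    "toy_energies = {2: -7.80, 3: -7.86, 4: -7.8745, 5: -7.8746, 6: -7.8746}  # toy values\n",
    "fetched = []\n",
    "def toy_fetch(run_id):\n",
    "    fetched.append(run_id['kp'])\n",
    "    return toy_energies[run_id['kp']]\n",
    "\n",
    "ids = [{'kp': k} for k in [2, 3, 4, 5, 6]]\n",
    "print(seek_convergence_sketch(ids, toy_fetch, rtol=0.001))  # ({'kp': 4}, -7.8745)\n",
    "print(fetched)  # [2, 3, 4, 5]\n",
    "```"
   ]
  },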
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "It is also possible to use a more generic post-processing function that simply parses the data.\n",
    "In this case we choose which quantity is used to check convergence by passing the attribute=... option\n",
    "to seek_convergence. For instance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 142,
   "metadata": {},
   "outputs": [],
   "source": [
    "gs_kpoint.set_postprocessing_function(parse_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 143,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fetching results for id \" {'kp': 2} \"\n",
      "Fetching results for id \" {'kp': 3} \"\n",
      "Fetching results for id \" {'kp': 4} \"\n",
      "Fetching results for id \" {'kp': 5} \"\n",
      "Convergence reached in Dataset \"Si_kpoints_convergence\" for id \" {'kp': 4} \"\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "({'kp': 4}, -7.874513952262473)"
      ]
     },
     "execution_count": 143,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "gs_kpoint.seek_convergence(rtol=0.001,attribute='energy')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Perform a convergence test for Hartree-Fock computations with Yambo"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We consider a set of Hartree-Fock computations for silicon and look for the value of EXXRLvcs that ensures\n",
    "a converged value of the direct gap."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First of all we need an nscf computation. We start from the scf result with an energy cutoff of 60 Ry and kpoints = [4,4,4]\n",
    "(the run with id {'kp': 4} of the previous dataset)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "inp = I.PwInput('Si_gs_convergence/kp_4.in')\n",
    "inp.set_nscf(8,force_symmorphic=True)\n",
    "inp.set_kpoints(points = [6,6,6]) #nscf kpoints can be different from the scf\n",
    "name = 'nscf_kp6_ecut60'\n",
    "inp.set_prefix(name)\n",
    "#inp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialize a parallel QuantumESPRESSO calculator with scheduler direct\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'omp': 1,\n",
       " 'mpi': 4,\n",
       " 'mpi_run': 'mpirun -np',\n",
       " 'executable': 'pw.x',\n",
       " 'scheduler': 'direct',\n",
       " 'multiTask': True,\n",
       " 'skip': True,\n",
       " 'verbose': True,\n",
       " 'IO_time': 5}"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "code = C.QeCalculator(mpi=4)\n",
    "code.global_options()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The folder Si_gs_convergence/nscf_kp6_ecut60.save already exsists. Source folder Si_gs_convergence/kp_4.save not copied\n",
      "Skip the run of nscf_kp6_ecut60\n",
      "Job completed\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'output': ['Si_gs_convergence/nscf_kp6_ecut60.save/data-file-schema.xml']}"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "code.run(run_dir='Si_gs_convergence',inputs=[inp],names=[name],source_dir='Si_gs_convergence/kp_4.save')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next step is the generation of the run_dir and of the Yambo SAVE folder."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [],
   "source": [
    "from mppi import Utilities as U"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "run_dir = 'Si_hf_convergence'\n",
    "source_dir = 'Si_gs_convergence/nscf_kp6_ecut60.save'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "SAVE folder already present in Si_hf_convergence\n"
     ]
    }
   ],
   "source": [
    "U.build_SAVE(source_dir,run_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we are ready to build the Yambo dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Initialize a parallel Yambo calculator with scheduler direct\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{'omp': 1,\n",
       " 'mpi': 2,\n",
       " 'mpi_run': 'mpirun -np',\n",
       " 'executable': 'yambo',\n",
       " 'scheduler': 'direct',\n",
       " 'multiTask': True,\n",
       " 'skip': True,\n",
       " 'verbose': True,\n",
       " 'IO_time': 5,\n",
       " 'clean_restart': True}"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "code = C.YamboCalculator()\n",
    "code.global_options()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'args': 'yambo -x -V rl',\n",
       " 'folder': 'Si_hf_convergence',\n",
       " 'filename': 'yambo.in',\n",
       " 'arguments': ['HF_and_locXC'],\n",
       " 'variables': {'FFTGvecs': [2733.0, 'RL'],\n",
       "  'SE_Threads': [0.0, ''],\n",
       "  'EXXRLvcs': [1.0, 'RL'],\n",
       "  'VXCRLvcs': [17153.0, 'RL'],\n",
       "  'QPkrange': [[1, 1, 1, 8], '']}}"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "inp = I.YamboInput(args='yambo -x -V rl',folder=run_dir)\n",
    "inp.set_kRange(1,1) # we are interested in the direct gap at Gamma, so we include only the first k-point\n",
    "inp"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 167,
   "metadata": {},
   "outputs": [],
   "source": [
    "hf_convergence = D.Dataset(label='Si_hf',run_dir=run_dir)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us start by adding some computations to see how the data are managed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 168,
   "metadata": {},
   "outputs": [],
   "source": [
    "exx_values = [1.,2.,3.] #in Hartree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 169,
   "metadata": {},
   "outputs": [],
   "source": [
    "for e in exx_values:\n",
    "    id = {'exxrl' : e}\n",
    "    inp['variables']['EXXRLvcs'] = [1e3*e, 'mHa']\n",
    "    hf_convergence.append_run(id=id,input=inp,runner=code)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If needed, we can also pass a jobname to append_run, for instance by adding\n",
    "\n",
    "`jobname=D.name_from_id(id)+'-job'`\n",
    "\n",
    "to its arguments. For instance:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 170,
   "metadata": {},
   "outputs": [],
   "source": [
    "exx = 4.\n",
    "id = {'exxrl' : exx}\n",
    "inp['variables']['EXXRLvcs'] = [1e3*exx, 'mHa']\n",
    "hf_convergence.append_run(id=id,input=inp,jobname=D.name_from_id(id)+'-job',runner=code)"
   ]
  },
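  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The name of each run is derived from its id: as the output of `hf_convergence.runs[3]` below shows, the id `{'exxrl': 4.0}` becomes the name `'exxrl_4.0'`.\n",
    "The following is a minimal sketch of such a naming rule, assuming that keys and values are simply joined with underscores.\n",
    "`name_from_id_sketch` is a hypothetical helper, not the implementation of `D.name_from_id`, which may behave differently, for instance for ids with several keys."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical sketch of an id -> name rule (D.name_from_id may differ)\n",
    "def name_from_id_sketch(run_id):\n",
    "    # join every key and value of the id dictionary with underscores\n",
    "    return '_'.join('{}_{}'.format(key, value) for key, value in run_id.items())\n",
    "\n",
    "print(name_from_id_sketch({'exxrl': 4.0}))"
   ]
  },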
  {
   "cell_type": "code",
   "execution_count": 171,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'input': {'args': 'yambo -x -V rl',\n",
       "  'folder': 'Si_hf_convergence',\n",
       "  'filename': 'yambo.in',\n",
       "  'arguments': ['HF_and_locXC'],\n",
       "  'variables': {'FFTGvecs': [2733.0, 'RL'],\n",
       "   'SE_Threads': [0.0, ''],\n",
       "   'EXXRLvcs': [4000.0, 'mHa'],\n",
       "   'VXCRLvcs': [17153.0, 'RL'],\n",
       "   'QPkrange': [[1, 1, 1, 8], '']}},\n",
       " 'name': 'exxrl_4.0',\n",
       " 'jobname': 'exxrl_4.0-job',\n",
       " 'label': 'Si_hf',\n",
       " 'run_dir': 'Si_hf_convergence'}"
      ]
     },
     "execution_count": 171,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hf_convergence.runs[3]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
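    "Then we can run the dataset. Since the calculator was initialized with `skip=True`, the runs whose outputs are already present are not executed again."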
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 172,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Skip the computation for input exxrl_1.0\n",
      "Skip the computation for input exxrl_2.0\n",
      "Skip the computation for input exxrl_3.0\n",
      "Skip the computation for input exxrl_4.0\n",
      "Job completed\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "{0: {'output': ['Si_hf_convergence/exxrl_1.0/o-exxrl_1.0.hf'],\n",
       "  'dbs': 'Si_hf_convergence/exxrl_1.0'},\n",
       " 1: {'output': ['Si_hf_convergence/exxrl_2.0/o-exxrl_2.0.hf'],\n",
       "  'dbs': 'Si_hf_convergence/exxrl_2.0'},\n",
       " 2: {'output': ['Si_hf_convergence/exxrl_3.0/o-exxrl_3.0.hf'],\n",
       "  'dbs': 'Si_hf_convergence/exxrl_3.0'},\n",
       " 3: {'output': ['Si_hf_convergence/exxrl_4.0/o-exxrl_4.0-job.hf'],\n",
       "  'dbs': 'Si_hf_convergence/exxrl_4.0-job'}}"
      ]
     },
     "execution_count": 172,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hf_convergence.run()"
   ]
  },
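  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The results above show where the files of each run are written: the o- file sits in a folder named after the run and carries the jobname in its filename, while the databases go in a folder named after the jobname (which coincides with the name when no jobname is given).\n",
    "A small helper reproducing this layout can be used to check which outputs are already on disk. `expected_paths` is hypothetical and not part of mppi; the layout is inferred only from the output above, produced by Hartree-Fock runs that write `.hf` files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "# Hypothetical helper: the folder layout is inferred from the results printed above\n",
    "def expected_paths(run_dir, name, jobname=None, suffix='hf'):\n",
    "    jobname = name if jobname is None else jobname\n",
    "    output = os.path.join(run_dir, name, 'o-{}.{}'.format(jobname, suffix))\n",
    "    dbs = os.path.join(run_dir, jobname)\n",
    "    return {'output': [output], 'dbs': dbs}\n",
    "\n",
    "paths = expected_paths('Si_hf_convergence', 'exxrl_4.0', jobname='exxrl_4.0-job')\n",
    "print(paths)\n",
    "print('output present:', os.path.isfile(paths['output'][0]))"
   ]
  },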
  {
   "cell_type": "code",
   "execution_count": 173,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0: {'output': ['Si_hf_convergence/exxrl_1.0/o-exxrl_1.0.hf'],\n",
       "  'dbs': 'Si_hf_convergence/exxrl_1.0'},\n",
       " 1: {'output': ['Si_hf_convergence/exxrl_2.0/o-exxrl_2.0.hf'],\n",
       "  'dbs': 'Si_hf_convergence/exxrl_2.0'},\n",
       " 2: {'output': ['Si_hf_convergence/exxrl_3.0/o-exxrl_3.0.hf'],\n",
       "  'dbs': 'Si_hf_convergence/exxrl_3.0'},\n",
       " 3: {'output': ['Si_hf_convergence/exxrl_4.0/o-exxrl_4.0-job.hf'],\n",
       "  'dbs': 'Si_hf_convergence/exxrl_4.0-job'}}"
      ]
     },
     "execution_count": 173,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hf_convergence.results"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Parsing the results with a post processing function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can define a general post-processing function that extracts all the results from the o- files of the dataset,\n",
    "using the YamboParser class of this package."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 174,
   "metadata": {},
   "outputs": [],
   "source": [
    "def parse_data(dataset):\n",
    "    from mppi import Parsers as P\n",
    "    results = {}\n",
    "    for run,data in dataset.results.items():\n",
    "        results[run] = P.YamboParser(data['output'],verbose=True)\n",
    "    return results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 175,
   "metadata": {},
   "outputs": [],
   "source": [
    "hf_convergence.set_postprocessing_function(parse_data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 176,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Parse file Si_hf_convergence/exxrl_1.0/o-exxrl_1.0.hf\n",
      "Parse file Si_hf_convergence/exxrl_2.0/o-exxrl_2.0.hf\n",
      "Parse file Si_hf_convergence/exxrl_3.0/o-exxrl_3.0.hf\n",
      "Parse file Si_hf_convergence/exxrl_4.0/o-exxrl_4.0-job.hf\n"
     ]
    }
   ],
   "source": [
    "code.update_global_options(verbose=False,skip=True)\n",
    "results = hf_convergence.run()"
   ]
  },
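  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once a post-processing function is set, `run` returns the output of that function applied to the dataset (here the dictionary of parsed files) instead of the raw `results` dictionary.\n",
    "The pattern can be sketched with a toy class. `ToyDataset` and `count_outputs` are hypothetical and only mimic the behavior seen in this tutorial; they are not the mppi implementation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Toy stand-in for a dataset, only meant to illustrate the post-processing hook\n",
    "class ToyDataset:\n",
    "    def __init__(self, results):\n",
    "        self.results = results\n",
    "        self.post_processing = None\n",
    "\n",
    "    def set_postprocessing_function(self, func):\n",
    "        self.post_processing = func\n",
    "\n",
    "    def run(self):\n",
    "        # return the raw results unless a post-processing function is set\n",
    "        if self.post_processing is None:\n",
    "            return self.results\n",
    "        return self.post_processing(self)\n",
    "\n",
    "def count_outputs(dataset):\n",
    "    # example post-processing: number of output files of each run\n",
    "    return {run: len(data['output']) for run, data in dataset.results.items()}\n",
    "\n",
    "toy = ToyDataset({0: {'output': ['o-a.hf']}, 1: {'output': ['o-b.hf']}})\n",
    "print(toy.run())\n",
    "toy.set_postprocessing_function(count_outputs)\n",
    "print(toy.run())"
   ]
  },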
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The Hartree-Fock energies (`ehf`) of each run can then be extracted as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 177,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[-18.51499   -0.5254    -0.5254    -0.52829    7.248512   7.259116\n",
      "   7.259286   8.57444 ]\n",
      "[-18.59211   -0.9768    -0.9767    -0.98311    6.951296   6.961545\n",
      "   6.961472   8.0761  ]\n",
      "[-18.6265    -1.125     -1.125     -1.12512    6.797003   6.797605\n",
      "   6.797518   7.91638 ]\n",
      "[-18.63289   -1.142     -1.142     -1.14206    6.77871    6.778919\n",
      "   6.778865   7.89733 ]\n"
     ]
    }
   ],
   "source": [
    "for irun in results:\n",
    "    print(results[irun]['hf']['ehf'])"
   ]
  },
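  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how the Hartree-Fock energies converge with EXXRLvcs, the arrays printed above can be stacked and the largest change between consecutive runs computed.\n",
    "The sketch below copies the printed values by hand so that it runs without the dataset; in the notebook the same array could be built with `np.array([results[irun]['hf']['ehf'] for irun in results])`.\n",
    "The largest change drops from about 0.50 to 0.16 and then 0.02, the trend expected when approaching convergence."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# ehf values printed above, one row per run (EXXRLvcs = 1, 2, 3, 4 Ha)\n",
    "ehf = np.array([\n",
    "    [-18.51499, -0.5254, -0.5254, -0.52829, 7.248512, 7.259116, 7.259286, 8.57444],\n",
    "    [-18.59211, -0.9768, -0.9767, -0.98311, 6.951296, 6.961545, 6.961472, 8.0761],\n",
    "    [-18.6265, -1.125, -1.125, -1.12512, 6.797003, 6.797605, 6.797518, 7.91638],\n",
    "    [-18.63289, -1.142, -1.142, -1.14206, 6.77871, 6.778919, 6.778865, 7.89733],\n",
    "])\n",
    "# largest absolute change of any band between two consecutive runs\n",
    "max_change = np.abs(np.diff(ehf, axis=0)).max(axis=1)\n",
    "print(max_change)"
   ]
  },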
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Computing the direct gap with a post processing function"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A post-processing function can also perform a more specific operation, such as computing\n",
    "the direct band gap. We define the following post-processing function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 178,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_direct_gap(dataset):\n",
    "    \"\"\"\n",
    "    Compute the direct band gap assuming that there is only one kpoint.\n",
    "    The arguments energy_col, val_band and cond_band are read from the global_options\n",
    "    of the dataset.\n",
    "    \"\"\"\n",
    "    from mppi import Parsers as P\n",
    "    import numpy as np\n",
    "    glob_opt = dataset.global_options()\n",
    "    val_band = glob_opt.get('val_band')\n",
    "    cond_band = glob_opt.get('cond_band')\n",
    "    # the name of the column used to compute the gap\n",
    "    energy_col = glob_opt.get('energy_col','hf') \n",
    "    gap = {}\n",
    "    for run,data in dataset.results.items():\n",
    "        results = P.YamboParser(data['output'])\n",
    "        key = list(results.keys())[0] # select the key (can be hf or qp)\n",
    "        bands = results[key]['band']\n",
    "        index_val = np.where(bands == val_band)\n",
    "        index_cond = np.where(bands == cond_band)\n",
    "        energy = results[key][energy_col]\n",
    "        delta = energy[index_cond]-energy[index_val]\n",
    "        gap[run] = delta.item() # delta holds a single element since there is only one k-point\n",
    "    return gap"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This function assumes that some inputs, such as the indices of the valence and conduction bands, are given in the global options\n",
    "of the dataset. So we can set them as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 179,
   "metadata": {},
   "outputs": [],
   "source": [
    "hf_convergence.update_global_options(val_band = 4, cond_band = 5, energy_col = 'hf')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we set the new post-processing function and run the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 180,
   "metadata": {},
   "outputs": [],
   "source": [
    "hf_convergence.set_postprocessing_function(get_direct_gap)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 181,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{0: 6.5168870000000005,\n",
       " 1: 6.674491,\n",
       " 2: 6.662208000000001,\n",
       " 3: 6.660856000000001}"
      ]
     },
     "execution_count": 181,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hf_convergence.run()"
   ]
  },
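  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The central step of `get_direct_gap` is selecting, by band index, two entries of an energy column. The sketch below isolates that step with plain numpy, using the `ehf` energies of the first run printed earlier.\n",
    "Since it uses `ehf`, while the dataset above is configured with `energy_col = 'hf'`, the number it prints is not one of the gaps returned by `run`.\n",
    "`direct_gap` is a hypothetical helper, and the band list 1 to 8 is taken from the QPkrange of the input. Using `.item()` makes it raise a ValueError if a band index matches more than one entry (for instance with several k-points), which turns the single k-point assumption into an explicit check."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# Hypothetical helper isolating the band selection done in get_direct_gap\n",
    "def direct_gap(bands, energy, val_band, cond_band):\n",
    "    bands = np.asarray(bands)\n",
    "    energy = np.asarray(energy)\n",
    "    # boolean masks pick the entries of the two bands; .item() requires exactly one match\n",
    "    e_val = energy[bands == val_band].item()\n",
    "    e_cond = energy[bands == cond_band].item()\n",
    "    return e_cond - e_val\n",
    "\n",
    "bands = np.arange(1, 9)  # bands 1 to 8 of the first k-point, as in QPkrange\n",
    "ehf_run0 = [-18.51499, -0.5254, -0.5254, -0.52829, 7.248512, 7.259116, 7.259286, 8.57444]\n",
    "print(direct_gap(bands, ehf_run0, val_band=4, cond_band=5))"
   ]
  },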
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Usage of seek_convergence"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The post-processing function defined above can be used together with the seek_convergence method to perform a convergence study.\n",
    "\n",
    "In this case we define a new dataset and append many possible runs. Only the runs needed to reach the given tolerance are executed."
   ]
  },
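  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The logic of seek_convergence can be illustrated with a simplified sketch: results are computed one run at a time, in the order in which the runs were appended, and the search stops as soon as two consecutive results agree within the relative tolerance `rtol`, so the remaining runs are never computed.\n",
    "`seek_convergence_sketch` is a hypothetical re-implementation working on synthetic numbers; the exact criterion used by mppi may differ."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Hypothetical sketch of a lazy convergence search (not the mppi implementation)\n",
    "def seek_convergence_sketch(ids, compute, rtol):\n",
    "    previous = compute(ids[0])\n",
    "    for i in range(1, len(ids)):\n",
    "        current = compute(ids[i])  # evaluated only when needed\n",
    "        if abs(current - previous) <= rtol * abs(previous):\n",
    "            return ids[i - 1], previous\n",
    "        previous = current\n",
    "    return None  # tolerance never reached\n",
    "\n",
    "# synthetic values, not Yambo results\n",
    "table = {1: 1.0, 2: 1.5, 3: 1.52, 4: 1.5201, 5: 1.5201}\n",
    "calls = []\n",
    "\n",
    "def compute(run_id):\n",
    "    calls.append(run_id)\n",
    "    return table[run_id]\n",
    "\n",
    "print(seek_convergence_sketch([1, 2, 3, 4, 5], compute, rtol=0.001))\n",
    "print(calls)  # run 5 is never computed"
   ]
  },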
  {
   "cell_type": "code",
   "execution_count": 182,
   "metadata": {},
   "outputs": [],
   "source": [
    "hf_convergence2 = D.Dataset(label='Si_hf',run_dir=run_dir,val_band = 4, cond_band = 5, energy_col = 'hf')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 183,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]"
      ]
     },
     "execution_count": 183,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "exx_values = [float(i) for i in range(1,10)] #in Hartree\n",
    "exx_values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 184,
   "metadata": {},
   "outputs": [],
   "source": [
    "for e in exx_values:\n",
    "    id = {'exxrl' : e}\n",
    "    inp['variables']['EXXRLvcs'] = [1e3*e, 'mHa']\n",
    "    hf_convergence2.append_run(id=id,input=inp,runner=code)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 185,
   "metadata": {},
   "outputs": [],
   "source": [
    "hf_convergence2.set_postprocessing_function(get_direct_gap)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 186,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fetching results for id \" {'exxrl': 1.0} \"\n",
      "Fetching results for id \" {'exxrl': 2.0} \"\n",
      "Fetching results for id \" {'exxrl': 3.0} \"\n",
      "Fetching results for id \" {'exxrl': 4.0} \"\n",
      "Fetching results for id \" {'exxrl': 5.0} \"\n",
      "Fetching results for id \" {'exxrl': 6.0} \"\n",
      "Convergence reached in Dataset \"Si_hf\" for id \" {'exxrl': 5.0} \"\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "({'exxrl': 5.0}, 6.661784)"
      ]
     },
     "execution_count": 186,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "hf_convergence2.seek_convergence(rtol=0.0001)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}