Tutorial
Getting Started with EMOD
Introduction
Welcome to EMOD!
This repository and the documents contained herein are intended to serve as a basic walkthrough of how to install and run EMOD.
We will walk through the steps required to use EMOD and DTK-Tools as a black box software model. The full scope of how to build and customize an EMOD model from scratch will not be covered here. The next recommended steps for understanding the different features and nuances of the epidemiological model are covered elsewhere.
A guide to the files in this directory
The present directory contains the following files and directories:
simtools.ini
Data/calibration_ingest_form_Malawi_3_node_AB2_no_inc.xlsm
- This is the "ingest file." It contains a great deal of information which the simulation uses as inputs. This includes:- the list of dynamic parameters which will be fit during the calibration stage,
- the list of calibration target variables and how the calibration stage should weight them
- all calibration target data (ie, empirical data used to calibrate the model)
InputFiles
Static/Demographics_Malawi.json
- Malawi demographics file; includes all mortality and fertility data
InputFiles/Templates
PFA_Overlay_Malawi.json
- Malawi demographics file which adjusts the pair formation algorithm according to likelihood of pair formation between different demographic groupsRisk_Assortivity_Overlay_Malawi.json
- Malawi demographics file which adjusts assortativity between agents according to risk group (ie, risk-prone agents may be more likely to form pairs with other risk-prone agents)Accessibility_and_Risk_IP_Overlay_Malawi.json
- Malawi demographics file which adjusts accessibility and riskcampaign_Malawi_1Dec2021_recalib.json
- "Campaign file," which defines the full structure of the network of events which occur in the simulation. Together, this network of events describes the events which lead to transmission; the course of infection within infected individuals; and all the different ways individual agents interact with the health care system --- testing, treatment, care, etc.config.json
- a (mostly) alphabetical list of all parameters which are defined in the simulation. Not all of these parameters are necessary to build a simulation of HIV, but
optim_script.py
- Script for configuring the calibration stage of the simulation
run_scenarios.py
- Script for configuring and executing the simulation runs
scenarios.csv
- List of scenarios, which are run as part of
run_scenarios
- List of scenarios, which are run as part of
All together, these files can be used to build and run a simulation of HIV transmission in Malawi using EMOD and DTK-Tools.
It is recommended that you clone this repo in your personal folder in the Bershteyn lab group directory:
/gpfs/data/bershteynlab/EMOD/[YOUR FOLDER]
Steps for Installing EMOD and DTK-Tools
Follow these steps to install EMOD and DTK-Tools on BigPurple (NYU Langone's cluster):
Configure bash and python environment
ssh
into BigPurple- Create a virtual python environment for the purpose of holding all the python packages that DTK-Tools depends on. Run the following in the command line:
python -m venv ~/environments/dtk-tools-p36
source ~/environments/dtk-tools-p36/bin/activate
- Modify the bash environment to automatically launch the python environment just created. Paste the following into the
.bashrc
file:
# .bashrc
# Source global definitions
if [ -f /etc/bashrc ]; then
. /etc/bashrc
fi
# User specific aliases and functions
module load singularity/3.1
module load python/cpu/3.6.5
module load git/2.17.0
source ~/environments/dtk-tools-p36/bin/activate
export PYTHONPATH=~/environments/dtk-tools-p36/lib/python3.6/site-packages
Run in command line: source ~/.bashrc
Install DTK-Tools and Disease-Specific Packages
Use git to install DTK-Tools and packages specific to HIV. It is recommended that you install these in the home
directory on BigPurple: `gpfs/home/[YOUR NAME]
-
Make sure that you have access to the Institute for Disease Modeling's github repos. Both dtk-tools and HIV-Analyzers are private, so your github account must be given permission to view and clone
-
Set up a public key on BigPurple and add the key to github. This will allow you to clone and directly download EMOD and DTK-Tools from github onto BigPurple.
-
Clone the DTK-Tools repo. In your
home
directory on BigPurple, run the following in the command line:``` cd ~ git clone git@github.com:InstituteforDiseaseModeling/dtk-tools.git dtk-tools-p36 cd dtk-tools-p36 python setup_manual.py ```
In the command line run
dtk -h
to confirm installation. -
Clone the HIV-analyzers repo In your
home
directory on BigPurple, run the following in the command line:``` cd ~ git clone git clone git@github.com:InstituteforDiseaseModeling/HIV-Analyzers.git cd HIV-Analyzers python setup.py develop ```
-
Install Docker image to run simulations. The Docker image creates a virtual environment where all of the packages and extensions required to run EMOD smoothly already exist. This saves the trouble of needing to re-install and re-configure them. Run the following in the command line:
``` cd ~ mkdir images cd images singularity pull docker://idm-docker-public.packages.idmod.org/nyu/dtk:20200306 ```
Guide to Simulation Setup
The simtools.ini
file defines the paths where the simulation looks for inputs and writes outputs.
It is recommended practice to store simulation outputs in the scratch
directory on BigPurple. This is a private directory which can be used for temporary storage. In simtools.ini
, set the simulation root path to this location:
```
# Path where the experiment/simulation outputs will be stored
sim_root = /gpfs/scratch/[YOUR NAME]/experiments`
```
Set the input_root
as the path to the current directory
```
input_root = /gpfs/data/bershteynlab/EMOD/[YOUR FOLDER]/malawi2022_tutorial/InputFiles/Static
```
How to calibrate
The first stage of the simulation will be to calibrate the model. This process will determine a probability distribution for each of the model parameters designated as "dynamic" which will then be used to run the simulation and produce output.
- Open a virtual environment with
screen
- Request a compute node on BigPurple:
srun -p cpu_short -n 2 --mem-per-cpu=8G --time=11:00:00 --pty bash
- More documentation on how to configure the different parts of this command here
- Run in command line:
python optim_script.py
After the simulation runs, a number of new files will appear:
- A directory called
Malawi--0.001--rep3--test0
- Contains all of the outputs of each of 10 iterations performed by the simulation (
iter0
,iter1
, etc) - Inside each
iter
folder is a set of PDF outputs which record the log-likelihood distribution of each dynamic parameter, and shows how the maximum likelihood parameter value evolves over the course of each iteration during calibration.
- Contains all of the outputs of each of 10 iterations performed by the simulation (
- Can be ignored:
post_channel_config.json
- This defines how the simulation population is subdivided demographically in order to match each demographic subgroup to its corresponding calibration target. A schema for the simulation output.
output
- A directory which contains a number of
HIVCalibSite
.csv files.
- A directory which contains a number of
How to run a scenario
The second stage of the simulation will be to run an ensemble of simulations. The simulation will draw from the calibrated parameter values from the previous stage and produce model output which may then be analyzed and interpreted.
Run the following python command inside the compute node on BigPurple:
```
python run_scenarios.py -c optim_script.py --resample-method roulette --nsamples 25 --output-dir MWI_baseline --suite-name MWI_baseline --table scenarios.csv --calib-dir Malawi--0.001--rep3--test0
```
-c
and--calib-dir
: The command references the script used to perform the calibration (optim_script.py
and the directory where the calibration results were stored.)--resample-method
and--nsamples
: The command allows the user to define how the parameter values will be resampled, and the number of times to resample each parameter--table
: The user provides a table of scenarios to run. In this case, only the baseline will run.- --output-dir`: Where simulation outputs will go
After the simulation runs, a number of new files and directories will appear. The output directory will contain a set of reports. These reports will also appear in the sim_root
directory defined in simtools.ini
Backlinks