# A distributed Split-Gibbs Sampler with Hypergraph Structure for High-Dimensional Inverse Problems
## Table of contents

- [Description](#description)
- [Installation](#installation)
- [Experiments](#experiments)
- [Development](#development)
- [License](#license)
## Description

Python code implementing the method described in the following paper.

> [1] P.-A. Thouvenin, A. Repetti, P. Chainais - A distributed Gibbs Sampler with Hypergraph Structure for High-Dimensional Inverse Problems, arXiv preprint 2210.02341, October 2022, under review.

**Authors:** P.-A. Thouvenin, A. Repetti, P. Chainais
## Installation

### Environment setup
- The library requires a functional installation of `conda` or `mamba` (a fast, drop-in replacement for `conda`). The instructions below assume `mamba` for a fast setup; to use `conda` instead, simply replace the word `mamba` by `conda` in the command-line instructions below.
- To install the library, issue the following commands in a terminal.
```bash
# Clone the repo. or unzip the dsgs-main.zip code archive
# git clone --recurse-submodules https://gitlab.com/pthouvenin/...git
unzip dsgs-main.zip
cd dsgs-main

# Create a conda environment using one of the lock files provided in the archive
# (use jcgs_review_environment_osx.lock.yml for macOS)
mamba env create --name jcgs-review --file jcgs_review_environment_linux.lock.yml
# mamba env create --name jcgs-review --file jcgs_review_environment.yml

# Activate the environment
mamba activate jcgs-review

# Install the library in editable mode
mamba develop src/

# Deleting the environment (if needed)
# mamba env remove --name jcgs-review

# Generating a lock file from an existing environment (if needed)
# mamba env export --name jcgs-review --file jcgs_review_environment_linux.lock.yml
# or
# mamba list --explicit --md5 > explicit_jcgs_env_linux-64.txt
# mamba create --name jcgs-test -c conda-forge --file explicit_jcgs_env_linux-64.txt
# pip install docstr-coverage genbadge wily sphinxcontrib-apa sphinx_copybutton

# Manual install (if absolutely needed)
# mamba create --name jcgs-review numpy numba mpi4py "h5py>=2.9=mpi*" scipy scikit-image matplotlib imageio tqdm jupyterlab pytest black flake8 isort coverage pre-commit sphinx sphinx_rtd_theme sphinxcontrib-bibtex sphinx-autoapi sphinxcontrib furo conda-lock conda-build
# mamba activate jcgs-review
# pip install sphinxcontrib-apa sphinx_copybutton docstr-coverage genbadge wily
# mamba develop src
```
- To avoid file lock issues in `h5py`, you may need to add the following line to your `~/.zshrc` file (or `~/.bashrc`):

```bash
export HDF5_USE_FILE_LOCKING='FALSE'
```
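If editing your shell startup files is not an option, the same switch can be set from Python, provided this happens before `h5py` first loads the HDF5 library (a minimal sketch, not part of the package):

```python
import os

# the environment variable must be set BEFORE h5py is first imported,
# since the HDF5 library reads it only once at load time
os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"

# any subsequent "import h5py" now opens files without HDF5 file locking
print(os.environ["HDF5_USE_FILE_LOCKING"])
```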
- To check that the installation went well, you can run the unit tests provided in the package using the commands below.

```bash
mamba activate jcgs-review
pytest --collect-only
export NUMBA_DISABLE_JIT=1 # need to disable jit compilation to check test coverage
coverage run -m pytest # run all the unit tests (see the Documentation section for more details)
```
## Experiments
All the experiments reported in the paper can be reproduced from the `.sh` scripts provided in `./examples/jcgs`, as detailed in the following paragraphs.
### ⚠️ Warning: memory requirements to save all the results

Running all the experiments produces a large volume of results / checkpoint data saved to the hard drive in HDF5 (`.h5`) format, mostly for the experiments involving serial samplers (206 GB in total).
All the samples generated with the serial samplers are saved to disk to better assess the sampling quality, which explains such a large requirement. Each single run of one of the serial samplers on one of the datasets considered requires up to 20 GB of free disk space.
Users are strongly advised to run only a subset of the experiments at once, carefully monitoring the space remaining on the hard drive where the results are saved. The checkpoint files corresponding to burn-in samples can be safely discarded once the experiment is finalized.
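Before launching a batch of runs, the available disk space can also be checked programmatically; the sketch below uses only the Python standard library (the path and the 20 GB threshold are illustrative, not values read by the package):

```python
import shutil

def enough_space(path=".", required_gib=20.0):
    """Return True if at least `required_gib` GiB are free on the drive holding `path`."""
    free_gib = shutil.disk_usage(path).free / 1024**3
    return free_gib >= required_gib

# e.g., refuse to start a serial-sampler run if fewer than 20 GiB are available
print(enough_space("."))
```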
Each run of the distributed sampler on a dataset requires about 300 MB of disk space, as fewer elements are saved to disk (the last state of the chain, used to restart it, and the average over the current batch, used to compute the MMSE estimator).
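For intuition, the batch average mentioned above approximates the MMSE estimator (the posterior mean) without storing the whole chain; below is a hypothetical sketch of such an online average, not the package's actual code:

```python
import numpy as np

rng = np.random.default_rng(0)

dim = 4
running_mean = np.zeros(dim)
n_samples = 0

# stand-in for the sampling loop: each x plays the role of one post-burn-in sample
for _ in range(1000):
    x = rng.normal(loc=2.0, size=dim)
    n_samples += 1
    # online mean update: only the current state and the average are kept in memory
    running_mean += (x - running_mean) / n_samples

print(running_mean)  # close to the true mean (2.0 in each coordinate)
```

The update is algebraically identical to averaging all stored samples, which is why the distributed sampler only needs to checkpoint the last state and the running average.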
### Running experiments in a detached tmux session (requires `tmux`)

The experiments can be run from a detached tmux session running in the background. See the `./examples/jcgs/run_from_tmux.sh` script for further details.
A few basic instructions to interact with a `tmux` session are given below.

```bash
# check the name of the tmux session, called session_name in the following
tmux list-sessions
tmux a -t session_name # attach to the session
# to detach from a session (leaving it running in the background), press ctrl+b, then d
# once the work is done, kill the session from within it (press ctrl+b, then type :kill-session)
# or, from a normal terminal:
tmux kill-session -t session_name
```
### Running the experiments
- The folder `./examples/jcgs/configs` contains `.json` files summarizing the list of parameters used for the different experiments/datasets. All the experiments can be reproduced using the following commands. Details about the location of the HDF5 files / data produced are included in each script.
```bash
# from a terminal at the root of the archive
mamba activate jcgs-review
cd examples/jcgs

# generate all the synthetic datasets used in the experiments (to be run only once)
bash generate_data.sh

# run all the experiments based on serial samplers (MYULA and proposed sampler)
bash sampling_quality_experiment.sh

# run the strong scaling experiment
bash strong_scaling_experiment.sh

# run the weak scaling experiment
bash weak_scaling_experiment.sh

# deactivate the conda environment (when no longer needed)
mamba deactivate
```
- The content of an `.h5` file can be quickly checked from the terminal (see the `h5py` documentation for further details). Some examples are provided below.
```bash
mamba activate jcgs-review
# replace <filename> by the name of your file in the instructions below
h5dump --header <filename>.h5 # display the name and size of all the variables contained in the file
h5dump <filename>.h5 # display the value of all the variables saved in the file
h5dump -d "/GroupFoo/databar[1,1;2,3;3,19;1,1]" <filename>.h5 # display part of a variable from a dataset within a given h5 file
h5dump -d dset <filename>.h5 # display the content of a dataset dset
h5ls -d <filename>.h5/dset # display the content of a dataset dset
```
## Development

### Building the documentation
- Most functionalities are fully documented using the `numpy` docstring style.
- The documentation can be generated in `.html` format using the following commands, issued from a terminal.
```bash
mamba activate jcgs-review
cd docs
make html # the documentation is generated in docs/build/html
```
### Assessing code and docstring coverage

To check the code/docstring coverage, run the following commands from a terminal.
```bash
mamba activate jcgs-review
pytest --collect-only
export NUMBA_DISABLE_JIT=1 # need to disable jit compilation to check test coverage
coverage run -m pytest # run all the tests
coverage report # generate a coverage report in the terminal
coverage html # HTML report showing which lines of code were not tested
coverage xml -o reports/coverage/coverage.xml # produce an xml file to generate the badge
genbadge coverage -o docs/coverage.svg
docstr-coverage . # check docstring coverage and generate the associated coverage badge
```
To launch a single test, run a command of the form:

```bash
mamba activate jcgs-review
python -m pytest tests/models/test_crop.py
pytest --markers # check the full list of available markers
pytest -m "not mpi" --ignore-glob=**/archive_unittest/* # run all the tests not marked as mpi, ignoring files in any "archive_unittest" directory
mpiexec -n 2 python -m mpi4py -m pytest -m mpi # run all the tests marked as mpi with 2 cores
```
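For reference, a marker-based test has the following shape (a hypothetical test, not one from the package's suite): `pytest -m mpi` selects it, while `pytest -m "not mpi"` skips it.

```python
import pytest

@pytest.mark.mpi  # selected by "pytest -m mpi", skipped by "pytest -m 'not mpi'"
def test_local_reduction():
    # stand-in for a check that would normally reduce values across MPI ranks
    local_values = [1, 2, 3]
    assert sum(local_values) == 6
```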
## License

The project is licensed under the GPL-3.0 license.