dsgs

A distributed Split-Gibbs Sampler with Hypergraph Structure for High-Dimensional Inverse Problems


Table of contents

  • Description
  • Installation
  • Experiments
  • Development
  • License

Description

Python code associated with the method described in the following paper.

[1] P.-A. Thouvenin, A. Repetti, P. Chainais - A distributed Gibbs Sampler with Hypergraph Structure for High-Dimensional Inverse Problems, arXiv preprint arXiv:2210.02341, October 2022, under review.

Authors: P.-A. Thouvenin, A. Repetti, P. Chainais


Installation

Environment setup

  • The library requires a working installation of conda or mamba (a fast, drop-in replacement for conda). The instructions below use mamba for a fast setup; to use conda instead, simply replace mamba with conda in each command.

  • To install the library, issue the following commands in a terminal.

# Clone the repository, or unzip the dsgs-main.zip code archive
# git clone --recurse-submodules https://gitlab.com/pthouvenin/...git
unzip dsgs-main.zip
cd dsgs-main

# Create a conda environment using one of the lock files provided in the archive
# (use jcgs_review_environment_osx.lock.yml for MAC OS)
mamba env create --name jcgs-review --file jcgs_review_environment_linux.lock.yml
# mamba env create --name jcgs-review --file jcgs_review_environment.yml

# Activate the environment
mamba activate jcgs-review

# Install the library in editable mode
mamba develop src/

# Deleting the environment (if needed)
# mamba env remove --name jcgs-review

# Generating lock file from existing environment (if needed)
# mamba env export --name jcgs-review --file jcgs_review_environment_linux.lock.yml
# or
# mamba list --explicit --md5 > explicit_jcgs_env_linux-64.txt
# mamba create --name jcgs-test -c conda-forge --file explicit_jcgs_env_linux-64.txt
# pip install docstr-coverage genbadge wily sphinxcontrib-apa sphinx_copybutton

# Manual install (if absolutely needed)
# mamba create --name jcgs-review numpy numba mpi4py "h5py>=2.9=mpi*" scipy scikit-image matplotlib imageio tqdm jupyterlab pytest black flake8 isort coverage pre-commit sphinx sphinx_rtd_theme sphinxcontrib-bibtex sphinx-autoapi sphinxcontrib furo conda-lock conda-build
# mamba activate jcgs-review
# pip install sphinxcontrib-apa sphinx_copybutton docstr-coverage genbadge wily
# mamba develop src
export HDF5_USE_FILE_LOCKING='FALSE'
  • To check that the installation completed successfully, run the unit tests provided in the package using the commands below.
mamba activate jcgs-review
pytest --collect-only
export NUMBA_DISABLE_JIT=1  # need to disable jit compilation to check test coverage
coverage run -m pytest      # run all the unit-tests (see Documentation section for more details)

Experiments

All the experiments reported in the paper can be reproduced from the .sh scripts provided in ./examples/jcgs, as detailed in the following paragraphs.

⚠️ WARNING: Memory requirements to save all the results

Running all the experiments produces a large volume of results / checkpoint data saved to the hard drive in HDF5 (.h5) format, mostly for the experiments involving serial samplers (206 GB in total).

All the samples generated with the serial samplers are saved to disk to better assess the sampling quality, which explains such a large requirement. Ensure up to 20 GB of disk space is available for each single run of a serial sampler on one of the datasets considered.

Users are strongly advised to run the experiments progressively, a subset at a time, carefully monitoring the space remaining on the drive where the results are saved. The checkpoint files corresponding to burn-in samples can be safely discarded once an experiment is finalized.
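As a quick sanity check before launching a run, the free space on the target drive can be queried from Python using only the standard library (a minimal sketch; the `"."` path is a placeholder for the actual results directory used by your scripts):

```python
import shutil

# Query total/used/free space (in bytes) on the drive hosting the results
# directory; replace "." with the output path configured in your scripts.
total, used, free = shutil.disk_usage(".")
free_gib = free / 2**30
print(f"free space: {free_gib:.1f} GiB")

# A single serial-sampler run may need up to 20 GB of disk space.
if free_gib < 20:
    print("warning: less than 20 GiB available, consider freeing space first")
```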

Each run of the distributed sampler on a dataset requires about 300 MB of disk space, as fewer elements are saved (the last state, to restart the chain, and the average over the current batch, to compute the MMSE estimator).
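The batch average mentioned above can be maintained with a simple running mean, so the MMSE estimate (the posterior mean) is approximated without keeping every sample in memory. A minimal sketch, not the package API (all names below are illustrative):

```python
# Running mean over samples: after n updates, mean holds (1/n) * sum of samples,
# which approximates the MMSE (posterior mean) estimator.
class RunningMean:
    def __init__(self):
        self.count = 0
        self.mean = None

    def update(self, sample):
        self.count += 1
        if self.mean is None:
            self.mean = list(sample)
        else:
            # incremental update: m <- m + (x - m) / n
            self.mean = [m + (x - m) / self.count for m, x in zip(self.mean, sample)]

# toy usage with two 2-dimensional "samples"
rm = RunningMean()
for x in ([1.0, 2.0], [3.0, 4.0]):
    rm.update(x)
print(rm.mean)  # [2.0, 3.0]
```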

Running experiments in a detached tmux session (requires tmux)

The experiments can be run from a detached tmux session running in the background. See the ./examples/jcgs/run_from_tmux.sh script for further details.

A few basic instructions to interact with a tmux session are given below.

# check the name of the tmux session, called session_name in the following
tmux list-sessions
tmux a -t session_name  # attach to a running session

# to detach from a session (leaving it running in the background), press ctrl+b, then d
# once the work is done, kill the session from a normal terminal with
tmux kill-session -t session_name

Running the experiments

  • The folder ./examples/jcgs/configs contains .json files summarizing the list of parameters used for the different experiments/datasets. All the experiments can be reproduced using the following commands. Details about the location of the HDF5 files / data produced are included in each script.
# from a terminal at the root of the archive

mamba activate jcgs-review
cd examples/jcgs

# generate all the synthetic datasets used in the experiments (to be run only once)
bash generate_data.sh

# run all the experiments based on serial samplers (MYULA and proposed sampler)
bash sampling_quality_experiment.sh

# run strong scaling experiment
bash strong_scaling_experiment.sh

# run weak scaling experiment
bash weak_scaling_experiment.sh

# deactivating the conda environment (when no longer needed)
mamba deactivate
  • The content of an .h5 file can be quickly checked from the terminal (see the h5py documentation for further details). Some examples are provided below.
mamba activate jcgs-review

# replace <filename> by the name of your file in the instructions below
h5dump --header <filename>.h5  # displays the name and size of all variables contained in the file
h5dump <filename>.h5  # displays the value of all the variables saved in the file
h5dump -d "/GroupFoo/databar[1,1;2,3;3,19;1,1]" <filename>.h5  # displays part of a dataset (hyperslab selection)
h5dump -d dset <filename>.h5  # displays the content of a dataset dset
h5ls -d <filename>.h5/dset    # displays the content of a dataset dset

Development

Building the documentation

  • Most functionalities are fully documented using the numpy docstring style.
  • The documentation can be generated in .html format using the following commands issued from a terminal.
mamba activate jcgs-review
cd docs
make html  # the generated pages are written to docs/build/html
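Docstrings follow the numpy style; as an illustration, a new helper would be documented along these lines (the function below is hypothetical, not part of the package):

```python
import math

def gaussian_logpdf(x, mean=0.0, std=1.0):
    """Evaluate the log-density of a univariate Gaussian.

    Parameters
    ----------
    x : float
        Point at which the log-density is evaluated.
    mean : float, optional
        Mean of the distribution (default 0.0).
    std : float, optional
        Standard deviation, must be positive (default 1.0).

    Returns
    -------
    float
        Value of the Gaussian log-density at ``x``.
    """
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std) - 0.5 * math.log(2.0 * math.pi)

print(gaussian_logpdf(0.0))  # -0.5 * log(2*pi), about -0.9189
```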

Assessing code and docstring coverage

To test the code/docstring coverage, run the following commands from a terminal.

mamba activate jcgs-review
pytest --collect-only
export NUMBA_DISABLE_JIT=1  # need to disable jit compilation to check test coverage
coverage run -m pytest  # check all tests
coverage report  # generate a coverage report in the terminal
coverage html  # generate an HTML report showing which lines of code were not tested
coverage xml -o reports/coverage/coverage.xml  # produce xml file to generate the badge
genbadge coverage -o docs/coverage.svg
docstr-coverage .  # check docstring coverage and generate the associated coverage badge

To launch a single test or a subset of the tests, run commands of the form

mamba activate jcgs-review
python -m pytest tests/models/test_crop.py
pytest --markers  # list all available markers
pytest -m "not mpi" --ignore-glob=**/archive_unittest/*  # run all tests not marked as mpi, ignoring any archive_unittest directory
mpiexec -n 2 python -m mpi4py -m pytest -m mpi  # run all tests marked mpi with 2 cores

License

The project is licensed under the GPL-3.0 license.