Mary Morstan

A multi-objective, modular framework for automatically configuring machine learning algorithms.

This AutoML framework is based on evolutionary algorithms (inspired by TPOT and DEAP).

📝 Documentation 🌐 ORKAD team web site

Required Elements

  • Python (version 3.10 or higher)
  • Git
  • make
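
The requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library; the helper name `missing_prerequisites` is hypothetical and is not part of mary-morstan.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements above


def missing_prerequisites(version=None, which=shutil.which):
    """Return a list of human-readable problems; empty if all requirements are met.

    Hypothetical helper: `version` and `which` can be overridden for testing.
    """
    version = sys.version_info[:2] if version is None else tuple(version[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version[0]}.{version[1]}"
        )
    # git and make must be reachable through PATH
    for tool in ("git", "make"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

An empty result means the three required elements were found.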

Additional (optional) elements

  • Jupyter Notebook, to run the examples provided in the examples/ directory

Quick installation

Default installation can be summarized as follows:

git clone https://gitlab.cristal.univ-lille.fr/orkad-public/mary-morstan.git
cd mary-morstan
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.dev.txt
pip install .

Folder organisation

  • bin/: contains the mary-morstan executable
  • marymorstan/: contains the source code of mary-morstan
  • datasets/: Python classes to download various (remote) datasets
  • misc/: scripts and files for unit tests and integration tests
  • examples/: Jupyter notebooks with various examples
  • tests/: unit tests
  • benchmarks/: Python scripts for benchmark tests

Information

Authors

See Authors

Licenses

Mary-Morstan is dual-licensed under the following licenses. You can use the software according to the terms of your chosen license.

  1. GNU General Public License version 3 (GPLv3). GPL refers to the GNU General Public License as published by the Free Software Foundation, either version 3 of the License or (at your option) any later version.
  2. Proprietary licence: if, for any reason, you are interested in a non-copyleft licence, please contact the ORKAD team.

Documentation

More detailed documentation (work in progress) will be available here

Examples

Classic usage

Two methods are possible: the first is to program in Python, the second is to use the mary-morstan executable.

First method: programming in Python

from sklearn.model_selection import train_test_split
import importlib

from marymorstan.marymorstan import MaryMorstan

dataset_preprocessing_module = importlib.import_module("datasets.iris_dataset_preprocessing")
dataset = dataset_preprocessing_module.MyDataSetPreprocessing("iris")

X_train, X_test, y_train, y_test = train_test_split(dataset.get_X(), dataset.get_y(), test_size=.25, random_state=42)

mm = MaryMorstan(generations=4, population_size=5)
# optimize on the training split only, so the test split stays unseen
pipelines = mm.optimize(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, random_state=42)
best_pipeline = MaryMorstan.best(pipelines)

# then you can save it as a string and easily reimport later
best_pipeline_str = str(best_pipeline)

from marymorstan.compiler import MLPipelineCompiler
compiler = MLPipelineCompiler()
pipeline = compiler.compile(best_pipeline_str)
# pipeline.fit(....)
# pipeline.score(...)

Second method: the mary-morstan executable

The following command is equivalent to the previous method:

mary-morstan --dataset 'iris' --dataset-preprocessing "datasets.iris_dataset_preprocessing" \
  --generations 4 --population-size 5 --seed 42 --test-size-ratio .25 \
  --log-level=ERROR \
  --print-best-pipeline-only --test-best-pipeline
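
When the executable is driven from a script, for instance to repeat a run with several seeds, the same command can be assembled programmatically. The following is a minimal sketch, assuming `mary-morstan` is on the PATH after installation; the helpers `build_command` and `run` are hypothetical and use only the flags shown above.

```python
import subprocess


def build_command(seed, generations=4, population_size=5, test_size_ratio=0.25):
    """Assemble the argument list for the command shown above (hypothetical helper)."""
    return [
        "mary-morstan",
        "--dataset", "iris",
        "--dataset-preprocessing", "datasets.iris_dataset_preprocessing",
        "--generations", str(generations),
        "--population-size", str(population_size),
        "--seed", str(seed),
        "--test-size-ratio", str(test_size_ratio),
        "--log-level=ERROR",
        "--print-best-pipeline-only",
        "--test-best-pipeline",
    ]


def run(seed):
    """Run one optimization and return whatever the executable wrote to stdout."""
    result = subprocess.run(
        build_command(seed), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting issues; `check=True` raises an exception if the executable exits with a non-zero status.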

Dev

To make sure you do not break the code, run the following:

  • check code
  • unit-tests
  • integration tests
  • benchmark tests

Check code

make check-code

Run unit-tests

make unit-tests

Integration tests

make integration-tests

Run benchmark

This ensures the expected performance. Be careful: it requires a machine with a lot of memory (16 GB) and multiple CPUs (at least 8).

make benchmark-tests