Mary Morstan
A multi-objective, modular framework for automatically configuring machine learning algorithms.
This AutoML framework is based on evolutionary algorithms (inspired by TPOT and DEAP).
📝 Documentation 🌐 ORKAD team web site
Required Elements
- Python (version 3.10 or higher)
- Git
- make
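The requirements above can be checked before installing. The following is a minimal sketch using only the Python standard library; the function name `check_prerequisites` is hypothetical and is not part of mary-morstan.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)              # minimum version stated in the requirements
REQUIRED_TOOLS = ("git", "make")  # command-line tools listed in the requirements


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means all is fine."""
    problems = []
    # compare only (major, minor) against the required minimum
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {version_info[0]}.{version_info[1]}"
        )
    # each tool must be resolvable on the PATH
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("missing:", problem)
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the local machine.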
Additional (optional) elements
- Jupyter Notebook, to run the examples provided in the examples/ directory
Quick installation
Default installation can be summarized as follows:
git clone https://gitlab.cristal.univ-lille.fr/orkad-public/mary-morstan.git
cd mary-morstan
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.dev.txt
pip install .
Folder organisation
- bin/ contains the mary-morstan executable
- marymorstan/ contains the source code of mary-morstan
- datasets/ contains Python classes to download various (remote) datasets
- misc/ contains scripts and files for the unit tests and integration tests
- examples/ contains Jupyter notebooks with various examples
- tests/ contains the unit tests
- benchmarks/ contains Python scripts for the benchmark tests
Information
Authors
See Authors
Licenses
Mary-Morstan is dual licensed under the following licenses. You can use the software according to the terms of your chosen license.
- GNU General Public License version 3 (GPLv3). "GPL" refers to the GNU General Public License as published by the Free Software Foundation, either version 3 of the License or (at your option) any later version.
- Proprietary license: if, for any particular reason, you are interested in a non-copyleft license, please contact the ORKAD team.
Documentation
More detailed documentation (work in progress) will be available here
Examples
Classic usage
Two methods are possible: the first is to program in Python, the second is to use the mary-morstan executable.
First method: programming in Python
import importlib

from sklearn.model_selection import train_test_split

from marymorstan.compiler import MLPipelineCompiler
from marymorstan.marymorstan import MaryMorstan

# load the iris dataset through its preprocessing module
dataset_preprocessing_module = importlib.import_module("datasets.iris_dataset_preprocessing")
dataset = dataset_preprocessing_module.MyDataSetPreprocessing("iris")
# hold out 25% of the samples for testing
X_train, X_test, y_train, y_test = train_test_split(dataset.get_X(), dataset.get_y(), test_size=.25, random_state=42)

# run 4 generations with a population of 5 pipelines, training on the training split only
mm = MaryMorstan(generations=4, population_size=5)
pipelines = mm.optimize(X_train=X_train, y_train=y_train, X_test=X_test, y_test=y_test, random_state=42)
best_pipeline = MaryMorstan.best(pipelines)

# then you can save it as a string and easily reimport later
best_pipeline_str = str(best_pipeline)
compiler = MLPipelineCompiler()
pipeline = compiler.compile(best_pipeline_str)
# pipeline.fit(....)
# pipeline.score(...)
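The example above notes that the best pipeline can be saved as a string and reimported later. One way to persist such a string between sessions is sketched below with the standard library only. The helper names and the file name are hypothetical, and the placeholder text stands in for a real `str(best_pipeline)` value, whose exact format is not shown here.

```python
import tempfile
from pathlib import Path


def save_pipeline_string(pipeline_str: str, path: Path) -> None:
    """Write the pipeline description to a UTF-8 text file."""
    path.write_text(pipeline_str, encoding="utf-8")


def load_pipeline_string(path: Path) -> str:
    """Read back a pipeline description written by save_pipeline_string."""
    return path.read_text(encoding="utf-8")


if __name__ == "__main__":
    # placeholder only: in practice this would be str(best_pipeline)
    placeholder = "placeholder pipeline string"
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "best_pipeline.txt"  # hypothetical file name
        save_pipeline_string(placeholder, target)
        restored = load_pipeline_string(target)
        print(restored == placeholder)
```

The restored string would then be passed to `compiler.compile(...)` as in the example above.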
Second method: the mary-morstan executable
The following command is equivalent to the previous method:
mary-morstan --dataset 'iris' --dataset-preprocessing "datasets.iris_dataset_preprocessing" \
--generations 4 --population-size 5 --seed 42 --test-size-ratio .25 \
--log-level=ERROR \
--print-best-pipeline-only --test-best-pipeline
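When the executable has to be launched from a Python script (for example to sweep over several seeds), the same command line can be assembled programmatically. The sketch below only builds the argument list from the flags shown above; `build_command` is a hypothetical helper, not part of mary-morstan.

```python
import shlex


def build_command(dataset, preprocessing_module, generations,
                  population_size, seed, test_size_ratio, log_level="ERROR"):
    """Assemble the mary-morstan argument list using the flags from the example."""
    return [
        "mary-morstan",
        "--dataset", dataset,
        "--dataset-preprocessing", preprocessing_module,
        "--generations", str(generations),
        "--population-size", str(population_size),
        "--seed", str(seed),
        "--test-size-ratio", str(test_size_ratio),
        f"--log-level={log_level}",
        "--print-best-pipeline-only",
        "--test-best-pipeline",
    ]


if __name__ == "__main__":
    command = build_command("iris", "datasets.iris_dataset_preprocessing",
                            generations=4, population_size=5,
                            seed=42, test_size_ratio=0.25)
    # print a shell-safe rendering of the command instead of executing it
    print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it, assuming the mary-morstan executable is on the PATH of the active environment.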
Dev
To make sure you do not break the code, run the following before submitting changes:
- check code
- unit-tests
- integration tests
- benchmark tests
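The four checks can also be chained so that the first failure stops the sequence. This is a minimal sketch that calls the make targets documented in this section; `run_checks` is a hypothetical helper and the repository may already offer its own way to do this.

```python
import subprocess

# make targets documented in this section, in the order listed above
MAKE_TARGETS = ("check-code", "unit-tests", "integration-tests", "benchmark-tests")


def run_checks(targets=MAKE_TARGETS, runner=subprocess.run):
    """Run each make target in order.

    Returns the name of the first failing target, or None if all succeed.
    """
    for target in targets:
        result = runner(["make", target])
        if result.returncode != 0:
            return target  # stop at the first failure
    return None
```

Calling `run_checks()` from the repository root runs everything; passing a shorter tuple, such as `("check-code", "unit-tests")`, skips the resource-hungry benchmark step.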
Check code
make check-code
Run unit-tests
make unit-tests
Integration tests
make integration-tests
Run benchmark
The benchmark checks that performance stays at the expected level. Be careful: it requires a machine with a lot of memory (16 GB) and multiple CPUs (at least 8).
make benchmark-tests
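Before launching the benchmark, it can be useful to confirm that the machine meets the stated resources. The sketch below is an assumption-laden helper, not part of mary-morstan: it reads the CPU count with `os.cpu_count()` and, where `os.sysconf` is available (mostly POSIX systems), the physical memory. Machines sold as "16 GB" often report slightly less than 16 GiB, so the result is best treated as a warning rather than a hard failure.

```python
import os

MIN_CPUS = 8          # "at least 8" CPUs, as stated above
MIN_MEMORY_GIB = 16   # 16 GB of memory, as stated above


def total_memory_gib():
    """Physical memory in GiB, or None where os.sysconf cannot report it."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 2**30
    except (AttributeError, ValueError, OSError):
        return None


def benchmark_warnings(cpus, memory_gib):
    """Return warnings for resources that are unknown or below the stated minimum."""
    warnings = []
    if cpus is None:
        warnings.append("CPU count unknown")
    elif cpus < MIN_CPUS:
        warnings.append(f"only {cpus} CPUs, {MIN_CPUS} recommended")
    if memory_gib is None:
        warnings.append("memory size unknown")
    elif memory_gib < MIN_MEMORY_GIB:
        warnings.append(
            f"only {memory_gib:.1f} GiB of memory, {MIN_MEMORY_GIB} recommended"
        )
    return warnings


if __name__ == "__main__":
    for warning in benchmark_warnings(os.cpu_count(), total_memory_gib()):
        print("warning:", warning)
```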