PEACEWORD
Prototype for Extracting And Considering the Explainability of WORD embeddings.
This simple Git project contains two classic heuristics, implemented as proofs of concept, to assess their suitability for explaining word embeddings.
This project is the work of the ORKAD research team at the CRIStAL laboratory of the University of Lille.
🌐 ORKAD team web site
Required Elements
- Python compiler (version 3.12 or higher)
- Git
Quick installation
Default installation can be summarized as follows:
git clone https://gitlab.cristal.univ-lille.fr/orkad-public/peaceword.git
cd peaceword
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt
Folder organisation
- models: the folder containing the downloaded datasets. Note that the 'text8' dataset (based on Wikipedia text) is constantly evolving, so the resulting cosine similarity may vary depending on when the dataset was downloaded.
- methods: Python package containing the two approaches (hillclimbing and greedy).
- project root (.): contains the main programs (described below).
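As a reminder of what the cosine similarity mentioned in the models entry measures, here is a minimal sketch in plain Python. The function name cosine_similarity and the toy vectors are illustrative assumptions; they are not taken from the project's code.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy 3-dimensional "embeddings" (made-up values, for illustration only).
queen = [0.8, 0.1, 0.3]
king = [0.7, 0.2, 0.4]
print(round(cosine_similarity(queen, king), 4))
```

Two models built from different snapshots of the same dataset assign different vectors to the same words, so the value printed for a given word pair can differ from one download to the next.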
Programs
This section describes the Python programs included in this Git project.
Downloading datasets
There are two programs: the first (load_model.py) downloads a model from the gensim library; the second (load_glove_model.py) is specific to glove-XXX datasets.
Both are used the same way: run the script with the name of the dataset as its argument, and the loaded model is stored in the models directory.
Here's an example for the 'text8' dataset.
python3 load_model.py text8
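For readers unfamiliar with gensim's downloader, the following is a rough sketch of what such a download step could look like. It is not the content of load_model.py: the function names model_path and download_and_store, the output file naming, and the choice to train Word2Vec with default settings on raw corpora are all assumptions made for illustration. It requires gensim and network access.

```python
from pathlib import Path


def model_path(name, models_dir="models"):
    """Build the (assumed) output path for a dataset name."""
    return Path(models_dir) / name


def download_and_store(name, models_dir="models"):
    """Download `name` through gensim's downloader and save it under models_dir."""
    # Imported lazily so model_path() stays usable without gensim installed.
    import gensim.downloader as api
    from gensim.models import Word2Vec

    target = model_path(name, models_dir)
    target.parent.mkdir(parents=True, exist_ok=True)

    if name in api.info()["corpora"]:
        # Corpora such as 'text8' are raw text: train embeddings first.
        corpus = api.load(name)
        vectors = Word2Vec(corpus).wv
    else:
        # Pretrained entries such as 'glove-wiki-gigaword-50' load as KeyedVectors.
        vectors = api.load(name)

    vectors.save(str(target))
    return target


# Example call (commented out because it downloads data):
# download_and_store("text8")
```

Training on a corpus is not deterministic by default, which is one more reason similarity values can differ between runs.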
The greedy method
The main program run_greedy.py requires several parameters:
- dataset: the dataset location.
- only_pos: 'yes' if the search is limited to positive words, 'no' otherwise.
- min_d: the minimum distance between two dimension values for them to be considered close (double value).
- min_p: the minimum percentage (integer value) of close dimensions for selecting a word.
- threshold: the minimum absolute double value for which a dimension value is considered relevant.
- target: target word name.
Here is an example:
python3 run_greedy.py ./models/text8_article yes 0.0279 5 0.2233 yes queen
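To make the roles of min_d, min_p and threshold more concrete, here is one possible reading of how they could interact. This is an interpretation of the parameter descriptions, not the code of run_greedy.py: the function names and the toy vectors are hypothetical, min_d is treated as the cutoff under which two values count as close, and only_pos is not modeled.

```python
def close_dimension_percentage(target_vec, candidate_vec, min_d, threshold):
    """Percentage of relevant target dimensions on which the candidate is close."""
    # A dimension is relevant when its absolute value in the target reaches threshold.
    relevant = [i for i, value in enumerate(target_vec) if abs(value) >= threshold]
    if not relevant:
        return 0.0
    # Two values are close when they differ by at most min_d (assumed reading).
    close = sum(1 for i in relevant if abs(target_vec[i] - candidate_vec[i]) <= min_d)
    return 100.0 * close / len(relevant)


def select_words(target_vec, candidates, min_d, min_p, threshold):
    """Return the candidate words whose close-dimension percentage reaches min_p."""
    return [
        word
        for word, vec in candidates.items()
        if close_dimension_percentage(target_vec, vec, min_d, threshold) >= min_p
    ]


# Made-up 4-dimensional vectors, for illustration only.
target = [0.50, -0.30, 0.01, 0.25]
candidates = {
    "near": [0.51, -0.29, 0.90, 0.26],
    "far": [0.10, 0.40, 0.02, -0.60],
}
print(select_words(target, candidates, min_d=0.0279, min_p=5, threshold=0.2233))
```

In this sketch the third dimension of the target is ignored because its absolute value is below threshold, so a large difference on it does not count against a candidate.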
Information
Authors
See Authors
License
PEACEWORD is licensed under the following license:
- GNU General Public License version 3 (GPLv3). GPL refers to the GNU General Public License as published by the Free Software Foundation, either version 3 of the License or (at your option) any later version.