PEACEWORD

Prototype for Extracting And Considering the Explainability of WORD embeddings.

This simple Git project contains proof-of-concept implementations of two classic heuristics (greedy and hill climbing), used to assess how suitable these heuristics are for word embeddings.

This project is the work of the ORKAD research team of the CRIStAL laboratory at the University of Lille (🌐 ORKAD team web site).

Required Elements

  • Python interpreter (version 3.12 or higher)
  • Git

Quick installation

Default installation can be summarized as follows:

git clone https://gitlab.cristal.univ-lille.fr/orkad-public/peaceword.git
cd peaceword
python3 -m venv venv
. venv/bin/activate
pip install -r requirements.txt

Folder organisation

  • models : the folder containing the downloaded datasets. Note that the 'text8' dataset (based on Wikipedia text) is constantly evolving, so the resulting cosine similarity may vary depending on when the dataset was downloaded.
  • methods : the Python package containing the two approaches (hill climbing and greedy).
  • project root (.) : contains the main programs (described below).

Programs

This section describes the Python programs included in this Git project.

Downloading datasets

There are two programs: the first (load_model.py) downloads a model from the gensim library; the second (load_glove_model.py) is specific to glove-XXX datasets.

Both are used the same way: run the program with the name of the dataset as its argument, and the loaded model is stored in the models directory.

Here's an example for the 'text8' dataset.

python3 load_model.py text8
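
As a rough sketch of what such a download step might look like, the snippet below uses the gensim downloader API (gensim.downloader.load) to fetch pretrained vectors and save them under models/. The helper names model_path and download_vectors are hypothetical and are not taken from load_model.py or load_glove_model.py. The sketch also assumes that the dataset name refers to pretrained vectors (such as the glove-XXX datasets); a corpus such as 'text8' is returned by gensim as raw text and would need a training step that is not shown here.

```python
from pathlib import Path

MODELS_DIR = Path("models")  # assumed location, matching the folder described above


def model_path(name: str) -> Path:
    """Return the path under models/ for a dataset name (hypothetical helper)."""
    return MODELS_DIR / name


def download_vectors(name: str) -> Path:
    """Download pretrained vectors with gensim and save them under models/."""
    # Imported here so that model_path stays usable without gensim installed.
    import gensim.downloader as api

    vectors = api.load(name)  # a KeyedVectors object for pretrained models
    MODELS_DIR.mkdir(exist_ok=True)
    target = model_path(name)
    vectors.save(str(target))
    return target
```

Calling download_vectors("glove-wiki-gigaword-100") would then produce a file that can be passed as the dataset location to the programs described below, assuming they read the gensim save format.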

The greedy method

The main program run_greedy.py requires several parameters:

  • dataset : the dataset location
  • only_pos : 'yes' if the search is limited to positive words, 'no' otherwise.
  • min_d : the minimum distance (double value) between two dimension values for them to be considered close.
  • min_p : the minimum percentage (integer value) of close dimensions required to select a word.
  • threshold : the minimum absolute value (double value) from which a dimension value is considered relevant.
  • target : target word name

Here is an example:

python3 run_greedy.py ./models/text8 yes 0.0279 5 0.2233 yes queen

Results are stored in a CSV file.
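
To make the roles of min_d, min_p and threshold more concrete, here is a minimal sketch of one possible reading of these parameters. It is an assumption, not the code of run_greedy.py: a dimension counts as "close" when it is relevant for the target (absolute value at least threshold) and the two values differ by at most min_d, and a word is selected when the percentage of close dimensions reaches min_p. The function names and the toy vectors are hypothetical.

```python
def close_dimension_percentage(target, candidate, min_d, threshold):
    """Percentage of dimensions that are relevant for the target and close in both vectors."""
    close = 0
    for t, c in zip(target, candidate):
        relevant = abs(t) >= threshold  # assumption: relevance is judged on the target value
        if relevant and abs(t - c) <= min_d:
            close += 1
    return 100.0 * close / len(target)


def select_words(vectors, target_word, min_d, min_p, threshold):
    """Return the words whose share of close dimensions is at least min_p percent."""
    target = vectors[target_word]
    selected = []
    for word, vec in vectors.items():
        if word == target_word:
            continue
        if close_dimension_percentage(target, vec, min_d, threshold) >= min_p:
            selected.append(word)
    return selected


# Toy 4-dimensional vectors, invented for illustration only.
toy = {
    "queen": [0.50, -0.30, 0.10, 0.40],
    "king": [0.51, -0.29, 0.90, 0.10],
    "apple": [-0.50, 0.30, 0.10, 0.40],
}
print(select_words(toy, "queen", min_d=0.03, min_p=50, threshold=0.2))
```

With these toy values, "king" shares two of the four dimensions with "queen" (50%) and is selected, while "apple" shares only one (25%) and is not.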

The HillClimbing method

The main program run_hillclimbing.py requires several parameters:

  • dataset : the dataset location
  • only_pos : 'yes' if the search is limited to positive words, 'no' otherwise.
  • seed : a seed number for the random number generator (the hill climbing method is not deterministic).
  • target : target word name

Here is an example:

python3 run_hillclimbing.py ./models/text8 no 16 brother

Results are stored in a CSV file.
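
The following sketch illustrates why a seed is needed. It is a generic hill climbing loop written under stated assumptions, not the code of run_hillclimbing.py: a solution is assumed to be a set of words whose summed vectors should be as similar as possible (cosine similarity) to the target vector, a neighbour is obtained by adding or removing one randomly chosen word, and only positive words are considered (the only_pos = 'yes' case). All names and the toy vectors are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if one of them is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score(subset, vectors, target):
    """Similarity between the sum of the subset's vectors and the target."""
    if not subset:
        return -1.0  # worst possible score for an empty solution
    total = [sum(vectors[w][i] for w in sorted(subset)) for i in range(len(target))]
    return cosine(total, target)


def hill_climb(vectors, target_word, seed, iterations=200):
    """Seeded hill climbing: toggle one random word, keep the move only if it improves."""
    rng = random.Random(seed)
    target = vectors[target_word]
    candidates = sorted(w for w in vectors if w != target_word)
    current = {rng.choice(candidates)}
    best = score(current, vectors, target)
    for _ in range(iterations):
        neighbour = current ^ {rng.choice(candidates)}  # add or remove one word
        s = score(neighbour, vectors, target)
        if s > best:
            current, best = neighbour, s
    return sorted(current), best


# Toy 3-dimensional vectors, invented for illustration only.
toy = {
    "brother": [0.9, 0.1, 0.4],
    "sister": [0.8, 0.2, -0.3],
    "man": [0.1, 0.0, 0.7],
    "car": [-0.5, 0.9, 0.0],
}
print(hill_climb(toy, "brother", seed=16))
```

Running the function twice with the same seed gives the same result, whereas different seeds may follow different paths and end in different local optima.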

The exampleAnalogy.py program

This simple program computes and displays the cosine similarity between fixed solutions and a target word.

The only parameter is the dataset location. Here is an example:

python3 exampleAnalogy.py ./models/glove-wiki-gigaword-100 
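
As an illustration of the kind of computation involved, the sketch below builds a candidate solution by adding and subtracting word vectors and measures its cosine similarity with a target word. The classic "king - man + woman" analogy and the toy vectors are assumptions chosen for illustration; the fixed solutions actually used by exampleAnalogy.py are not listed here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two vectors of the same length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def combine(vectors, positive, negative=()):
    """Sum the vectors of the positive words and subtract those of the negative words."""
    dims = len(next(iter(vectors.values())))
    result = [0.0] * dims
    for word in positive:
        result = [r + x for r, x in zip(result, vectors[word])]
    for word in negative:
        result = [r - x for r, x in zip(result, vectors[word])]
    return result


# Toy 3-dimensional vectors, invented so that king - man + woman lands on queen.
toy = {
    "king": [0.8, 0.6, 0.1],
    "man": [0.7, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.7],
    "queen": [0.2, 0.6, 0.8],
}
solution = combine(toy, positive=["king", "woman"], negative=["man"])
print(cosine_similarity(solution, toy["queen"]))
```

With real embeddings the similarity is lower than with these hand-made vectors, and, as noted for the models folder, it may vary from one download of the dataset to another.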

Information

Authors

See Authors

License

PEACEWORD is licensed under the following license :