msa-limit

msa-limit is an analysis pipeline for evaluating how well different multiple sequence alignment (MSA) software performs on long reads. Given nanopore reads and a reference sequence, it builds a consensus sequence from the alignment produced by each MSA software and compares it to the reference to check whether the alignment is correct. (See the schematic in the doc file for more details.)
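
The idea of turning an alignment into a consensus and comparing it to the reference can be sketched as follows. This is only an illustration, not the pipeline's own code: the function names `threshold_consensus` and `identity` are hypothetical, and the rule used here (keep the most frequent symbol of a column if it reaches the threshold percentage, otherwise write N) is an assumption about how a consensus threshold might be applied.

```python
from collections import Counter

def threshold_consensus(aligned_reads, threshold=50):
    """Consensus of equal-length aligned rows ('-' marks a gap).

    Assumed rule (not taken from msa-limit): a column yields its most
    frequent symbol when that symbol reaches `threshold` percent.
    """
    if len({len(row) for row in aligned_reads}) != 1:
        raise ValueError("all aligned rows must have the same length")
    consensus = []
    for column in zip(*aligned_reads):
        symbol, count = Counter(column).most_common(1)[0]
        if 100 * count / len(column) < threshold:
            consensus.append("N")      # no symbol reaches the threshold
        elif symbol != "-":
            consensus.append(symbol)   # columns dominated by gaps are dropped
    return "".join(consensus)

def identity(consensus, reference):
    """Fraction of matching positions, without realigning the sequences."""
    longest = max(len(consensus), len(reference))
    matches = sum(a == b for a, b in zip(consensus, reference))
    return matches / longest if longest else 1.0

rows = ["ACG-T", "ACGAT", "ACG-T", "TCG-T"]
print(threshold_consensus(rows, threshold=50))
print(identity(threshold_consensus(rows, threshold=50), "ACGT"))
```

A stricter threshold makes more columns ambiguous, which is why testing several threshold values on the same alignment can be informative.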

Supported MSA software: muscle, mafft, poa, kalign, spoa, kalign3, clustalo, abpoa, tcoffee

Usage

Conda (>4.10) must be installed (see https://docs.conda.io/projects/conda/en/latest/user-guide/install/linux.html)

To install msa-limit:

git clone https://gitlab.cristal.univ-lille.fr/crohmer/msa-limit.git
cd msa-limit

Run a test to verify proper operation:

./msa-limit.py test

To start the analysis pipeline:

Usage:
msa-limit.py -i <file_reads> -r <file_ref> [-options]

Arguments: 
  required:
    -i <string>
       nanopore long reads file (fasta or fastq)
    -r <string>
       reference sequence file (fasta, a single sequence)
       IUPAC consensus sequence in the diploid case

  optional:
    -n <string>
       default: date and time of execution
       name of the experiment
    -o <int> 
       default: 10
       number of regions to be tested
    -b <int>,<int>,...
       beginning(s) position of region(s) (replacing -o)
    -d <int>,<int>,...
       default: 10,20,50
       sequencing depth(s) (number of reads)
    -s <int>,<int>,...
       default: 100,200
       size(s) of region(s)
    -t <int>,<int>,... 
       default: 50
       threshold(s) for sequences consensus
    -m <string>,<string>,...
       default: muscle,mafft,poa,kalign,spoa,kalign3,clustalo,abpoa,tcoffee (all)
       MSA software(s) to run
    -h
       help

Ex: ./msa-limit.py -i reads.fastq -r ref.fasta -b 1,150 -n exp -d 10,100 -s 100,200 -t 50,75 -m mafft,poa
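
To get a feel for how quickly the comma-separated options multiply, the sketch below expands the example command into individual parameter combinations. It assumes that the pipeline evaluates every combination of region start, depth, region size, threshold and MSA software, which the usage text suggests but does not state. The function name `parameter_grid` is hypothetical.

```python
from itertools import product

def parameter_grid(starts, depths, sizes, thresholds, tools):
    """Cross product of the values given to -b, -d, -s, -t and -m."""
    return [
        {"start": b, "depth": d, "size": s, "threshold": t, "msa": m}
        for b, d, s, t, m in product(starts, depths, sizes, thresholds, tools)
    ]

# Values taken from the example command above.
grid = parameter_grid(
    starts=[1, 150],
    depths=[10, 100],
    sizes=[100, 200],
    thresholds=[50, 75],
    tools=["mafft", "poa"],
)
print(len(grid))  # 32 combinations under the cross-product assumption
```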

Other modes

Besides the basic mode, msa-limit provides the following modes:

Usage:
msa-limit.py -i <file_reads> -r <file_ref> [-options]

Other modes: 
  test 
      Launches a pipeline test
  list 
      List of existing experiments
  summary
      More readable summary of experiments for a human
        optional: 
         -n <string> <string> <string> ...
            default: all the names of the experiments
            names of the experiments you want to display in the summary.
  run_config <string> <string> ...
       Launches the pipeline from configuration file(s)
         required: path to the configuration file(s).
  rulegraph
       Displays a graph of the snakemake rules

Configuration file

In basic mode, msa-limit creates a configuration file that the pipeline then uses. With the run_config mode (./msa-limit.py run_config <config_file>), you can launch the pipeline directly from your own configuration file, which must follow this format:

I: <reads_file>    #REQUIRED, absolute path preferred
R: <ref_file>      #REQUIRED, absolute path preferred
n: test            #OPTIONAL, -n
D: [10,20,50]      #OPTIONAL, -d
S: [100,200]       #OPTIONAL, -s
T: [50]            #OPTIONAL, -t
M: [muscle,mafft]  #OPTIONAL, -m
O: 10              #OPTIONAL, -o, can be replaced by -b (B: [1,150])
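
A configuration file in this format can also be generated from a script, for example when preparing many experiments. The sketch below is a minimal helper under stated assumptions: `render_config` is a hypothetical name, the paths are placeholders, and the key `R` for the reference file is assumed by analogy with the -r option.

```python
def render_config(reads, ref, name=None, depths=None, sizes=None,
                  thresholds=None, tools=None, regions=None, starts=None):
    """Return the text of a configuration file for the run_config mode."""
    def as_list(values):
        return "[" + ",".join(str(v) for v in values) + "]"

    # "R" for the reference is an assumption (mirrors the -r option).
    lines = [f"I: {reads}", f"R: {ref}"]
    if name is not None:
        lines.append(f"n: {name}")
    for key, values in (("D", depths), ("S", sizes), ("T", thresholds), ("M", tools)):
        if values is not None:
            lines.append(f"{key}: {as_list(values)}")
    if starts is not None:
        lines.append(f"B: {as_list(starts)}")  # explicit starts replace O
    elif regions is not None:
        lines.append(f"O: {regions}")
    return "\n".join(lines) + "\n"

# Placeholder paths, for illustration only.
text = render_config("/data/reads.fastq", "/data/ref.fasta", name="test",
                     depths=[10, 20, 50], tools=["muscle", "mafft"], regions=10)
print(text)
```

The resulting text can be written to a file and passed to ./msa-limit.py run_config.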

Snakemake only

This pipeline is built with Snakemake. If you are familiar with this tool, you can launch the pipeline directly with Snakemake and a configuration file. You will need to install Snakemake (6.10+) and enable conda with the --use-conda option:

snakemake --configfile <config_file> -c24 --use-conda

Dependencies

  • conda 4.10.1+
  • python 3.7.4+

Add a new MSA software

To add a new MSA software to the pipeline, add a rule for it to the Snakefile. You must either install the software locally or create a conda environment file for it. The output must be in fasta format. In the following commands, replace <new_msa> with the name of the software.

Create a conda environment file:

conda create -n <new_msa>
conda activate <new_msa>
conda install <new_msa>
conda env export > env_conda/<new_msa>.yaml

Add the rule below to the Snakefile. Replace <new_msa> with the name of the software and <command_to_launch_the_software> with the command that runs it. In your command, the input and output files must be written as {input} and {output.out}. (Ex: muscle -in {input} -out {output.out})

rule <new_msa> :
    input :
        os.path.join('{data_set}','selected_read','reads_r{region_size}_d{depth}.fasta')
    output :
        time = os.path.join('{data_set}','time','MSA_<new_msa>_r{region_size}_d{depth}'),
        out = os.path.join('{data_set}','msa','MSA_<new_msa>_r{region_size}_d{depth}.fasta')
    message:
        "<new_msa> for {wildcards.data_set} (Region size={wildcards.region_size} & Depth={wildcards.depth})"
    log:
        os.path.join('{data_set}','logs','6_<new_msa>_r{region_size}_d{depth}.log')
    conda:   #Only if you use conda
        "env_conda/<new_msa>.yaml"
    shell :
        './src/run_MSA.sh "<command_to_launch_the_software>" {input} {output.out} {output.time} {log} 1'

Warning: If the software writes its result to the terminal (standard output) instead of an output file, give only the command with the input and change the 6th parameter of the run_MSA.sh script from 1 to 0 (see the rule for spoa for an example).
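
When adding several tools, filling in the rule by hand is error-prone. The sketch below fills the rule template shown above from a tool name and a command, and sets the last parameter of run_MSA.sh to 1 or 0 depending on whether the tool writes an output file. The helper `render_rule` and the tool name `mytool` are hypothetical and not part of msa-limit.

```python
RULE_TEMPLATE = """rule <new_msa> :
    input :
        os.path.join('{data_set}','selected_read','reads_r{region_size}_d{depth}.fasta')
    output :
        time = os.path.join('{data_set}','time','MSA_<new_msa>_r{region_size}_d{depth}'),
        out = os.path.join('{data_set}','msa','MSA_<new_msa>_r{region_size}_d{depth}.fasta')
    message:
        "<new_msa> for {wildcards.data_set} (Region size={wildcards.region_size} & Depth={wildcards.depth})"
    log:
        os.path.join('{data_set}','logs','6_<new_msa>_r{region_size}_d{depth}.log')
    conda:   #Only if you use conda
        "env_conda/<new_msa>.yaml"
    shell :
        './src/run_MSA.sh "<command>" {input} {output.out} {output.time} {log} <flag>'
"""

def render_rule(name, command, writes_output_file=True):
    """Fill the template; plain replace keeps the Snakemake {wildcards} intact."""
    flag = "1" if writes_output_file else "0"  # 0: result goes to standard output
    return (RULE_TEMPLATE
            .replace("<new_msa>", name)
            .replace("<command>", command)
            .replace("<flag>", flag))

# File-writing tool, using the muscle example from the text.
file_rule = render_rule("muscle", "muscle -in {input} -out {output.out}")
# Hypothetical tool that prints its alignment to standard output.
stdout_rule = render_rule("mytool", "mytool {input}", writes_output_file=False)
print(stdout_rule)
```

The printed text can then be pasted into the Snakefile and adjusted if the tool needs extra options.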

Potential issue

Abpoa doesn't run

abpoa may not launch from conda on some machines. To solve this problem, you will have to install it locally (see https://github.com/yangao07/abPOA) and modify the abpoa rule.