Commit e06efb74 authored by Lihouck Flavien

Update README.md

parent a2c01459
usage: mc_msa.py [-h] -input INPUT -output OUTPUT [-reference REFERENCE] [-size
Creates the config file, then runs the meta-consensus pipeline.
Arguments:
- `-h, --help`: show this help message and exit
- `-input INPUT`: path to the reads file (required)
- `-output OUTPUT`: path to the directory for the pipeline results (required)
- `-reference REFERENCE`: path to the reference file, used for alignment and statistics
- `-list LIST`: a list of regions to work on, in the format `[rStart1, rStart2, ...]` or `[rStart1_End1, rStart2_End2, ...]` (default: no region)
- `-size SIZE`: the region size used when cutting the sequence (default: 2000)
- `-tools TOOLS`: the list of tools to use in the meta-consensus (default: `['abpoa', 'spoa', 'kalign2', 'kalign3', 'mafft', 'muscle']`)
- `-consensus_threshold CONSENSUS_THRESHOLD`: threshold(s) used for the MSA consensus step (default: `[70]`)
- `-metaconsensus_threshold METACONSENSUS_THRESHOLD`: threshold(s) used for the meta-consensus result (default: `[60]`)
- `-depth DEPTH`: the read depth used in the process (default: `[60]`)
- `-plot PLOT`: analyse the quality of the meta-consensus and of the MSA consensus (requires `-reference`)
- `-region_overlap REGION_OVERLAP`: the size of the overlap between regions
- `-cores CORES`: the number of cores to use in the pipeline run (default: 1)
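If you drive the pipeline from a Python script, a small helper can assemble the command line from the arguments above. This is a minimal sketch: `build_command` is a hypothetical helper, not part of mc_msa.py; it assumes the script is launched as `python mc_msa.py`, and it only covers flags whose value format is documented here (list-valued options such as `-tools` or the thresholds are left out).

```python
import shlex
from typing import List, Optional, Sequence


def build_command(
    reads: str,
    output: str,
    reference: Optional[str] = None,
    regions: Optional[Sequence[str]] = None,
    size: int = 2000,
    region_overlap: Optional[int] = None,
    cores: int = 1,
) -> List[str]:
    """Hypothetical helper: build an argv list for mc_msa.py from documented flags."""
    # Assumption: the script is run through the `python` interpreter.
    cmd = ["python", "mc_msa.py", "-input", reads, "-output", output]
    if reference is not None:
        cmd += ["-reference", reference]
    if regions:
        # -list takes one bracketed string, e.g. "[r0_2000, r5000_7000]"
        cmd += ["-list", "[" + ", ".join(regions) + "]"]
    cmd += ["-size", str(size)]
    if region_overlap is not None:
        cmd += ["-region_overlap", str(region_overlap)]
    cmd += ["-cores", str(cores)]
    return cmd


if __name__ == "__main__":
    cmd = build_command("reads.fasta", "results", reference="ref.fasta",
                        region_overlap=50, cores=4)
    # Print a shell-quoted version, e.g. to paste into a terminal
    print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids quoting issues with the bracketed `-list` value.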
### Input
The input reads file, in FASTA format.
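Before starting a long run, it can help to check that the reads file really is FASTA. The sketch below is a generic, dependency-free FASTA reader; `read_fasta` is a hypothetical helper, not part of the pipeline. It joins sequences that are wrapped over several lines and rejects files where sequence data appears before the first `>` header.

```python
from typing import Iterator, List, Optional, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Hypothetical helper: yield (header, sequence) pairs from a FASTA file."""
    header: Optional[str] = None
    chunks: List[str] = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new record starts: emit the previous one, if any
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is None:
                raise ValueError(f"{path}: sequence data before the first '>' header")
            else:
                chunks.append(line)  # wrapped sequence line
    if header is not None:
        yield header, "".join(chunks)
```

For example, `for name, seq in read_fasta("reads.fasta"): print(name, len(seq))` lists every read with its length.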
### Output
At the end of a pipeline run, the output folder contains 4 folders:
- meta-consensus: the resulting meta-consensus for each region, for every combination of the specified thresholds and depths.
- consensus: the intermediate consensus of every MSA, stored in a folder tree organized as *region*/*depth*/*consensus_threshold*/*metaconsensus_threshold*, together with the consensus alignments.
- data: the cut reads, the computed MSAs and, when a reference is given, the cut reference.
You can run the pipeline on pre-computed MSAs by placing them in `output/data/msa` and naming them MSA_*TOOL*_r*START_END*_d*DEPTH*.fasta, where *TOOL* is the tool used, *START* and *END* are the limits of the region, and *DEPTH* is the read depth of the MSA.
- logs: all the pipeline logs **will** be stored here in the final version (for now, some logs still end up in the consensus folder).
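To reuse MSAs computed elsewhere, each file has to follow the naming scheme above. The following sketch builds and parses such names. `msa_path`, `parse_msa_name` and `MsaInfo` are hypothetical helpers; the sketch assumes that `output/data/msa` is relative to the directory passed with `-output`, and that *START*, *END* and *DEPTH* are plain integers.

```python
import re
from pathlib import Path
from typing import NamedTuple

# Assumed pattern for MSA_<TOOL>_r<START>_<END>_d<DEPTH>.fasta
MSA_NAME = re.compile(
    r"^MSA_(?P<tool>.+)_r(?P<start>\d+)_(?P<end>\d+)_d(?P<depth>\d+)\.fasta$"
)


class MsaInfo(NamedTuple):
    tool: str
    start: int
    end: int
    depth: int


def msa_path(output_dir: str, tool: str, start: int, end: int, depth: int) -> Path:
    """Hypothetical helper: where a pre-computed MSA would be placed for reuse."""
    return Path(output_dir) / "data" / "msa" / f"MSA_{tool}_r{start}_{end}_d{depth}.fasta"


def parse_msa_name(filename: str) -> MsaInfo:
    """Hypothetical helper: recover tool, region and depth from an MSA file name."""
    match = MSA_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a pipeline MSA file name: {filename}")
    return MsaInfo(match["tool"], int(match["start"]), int(match["end"]), int(match["depth"]))
```

Checking every file in the folder with `parse_msa_name` before a run catches misnamed MSAs that would otherwise be ignored or recomputed.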
### Region selection
There are two main ways to define the regions.
You can set the regions manually with `-list`, which accepts 2 formats:
- `-list "[rStart1, rStart2, rStart3, ...]"`: the corresponding regions will be from Start1 to Start2, then from Start2 to Start3, and so on.
- `-list "[rStart1_End1, rStart2_End2, ...]"`: the corresponding regions will be from Start1 to End1, then from Start2 to End2, and so on.
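The two `-list` formats map to regions as follows. This sketch turns either format into `(start, end)` pairs; `parse_region_list` is hypothetical and is not the pipeline's own parser, and rejecting a list that mixes both formats is an assumption.

```python
import re
from typing import List, Tuple

# One entry: "r<start>" or "r<start>_<end>"
_TOKEN = re.compile(r"^r(\d+)(?:_(\d+))?$")


def parse_region_list(text: str) -> List[Tuple[int, int]]:
    """Hypothetical helper: convert a -list string into (start, end) pairs."""
    body = text.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError('expected a bracketed list, e.g. "[r0, r2000]"')
    tokens = [t.strip() for t in body[1:-1].split(",") if t.strip()]
    matches = [_TOKEN.match(t) for t in tokens]
    if any(m is None for m in matches):
        raise ValueError(f"malformed region in {text!r}")
    if all(m.group(2) is not None for m in matches):
        # rStart_End format: each entry is a self-contained region
        return [(int(m.group(1)), int(m.group(2))) for m in matches]
    if all(m.group(2) is None for m in matches):
        # rStart format: consecutive starts delimit the regions
        starts = [int(m.group(1)) for m in matches]
        return list(zip(starts, starts[1:]))
    raise ValueError("do not mix rStart and rStart_End entries")
```

Note that with the `rStart` format, N entries produce N - 1 regions, since the last start only closes the previous region.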
Alternatively, you can set a region size and an overlap, and the sequence is cut into consecutive regions of that size.
`-size 2000 -region_overlap 50` will create regions from the 2nd position to the 2002nd, then from the 1952nd to the 3952nd, and so on. This way, consecutive regions share OVERLAP bases, which can be used to join them.
Setting the region size to 0 will try to process the whole sequence in one file. This is very slow, and causes some tools to either struggle or fail to produce a result.
This comes from limitations of the MSA tools themselves: for example, abPOA and SPOA require a lot of available RAM, and MUSCLE slows down considerably on larger regions.
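The size/overlap cutting can be sketched as a sliding window whose step is SIZE minus OVERLAP. `split_regions` is a hypothetical helper; truncating the last region at the end of the sequence and treating size 0 as a single whole-sequence region are assumptions based on the description above. With `start=2`, `size=2000` and `overlap=50`, the first two regions are `(2, 2002)` and `(1952, 3952)`, as in the example.

```python
from typing import List, Tuple


def split_regions(seq_length: int, size: int, overlap: int,
                  start: int = 0) -> List[Tuple[int, int]]:
    """Hypothetical helper: cut [start, seq_length] into overlapping windows."""
    if size <= 0:
        # Size 0: the whole sequence is processed as one region
        return [(start, seq_length)]
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap  # each window starts `overlap` bases before the previous end
    regions = []
    pos = start
    while pos < seq_length:
        end = min(pos + size, seq_length)  # assumption: last window is truncated
        regions.append((pos, end))
        if end == seq_length:
            break
        pos += step
    return regions
```

The step of SIZE - OVERLAP is what makes consecutive regions share exactly OVERLAP bases; an overlap equal to or larger than the size would never advance, hence the check.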
### Depth
## Authors and acknowledgment
Flavien Lihouck
Special thanks to Coralie Rohmer's work on the MSA-limit tool, which inspired this project and was used in many parts of it.
## License
Probably CC_BY ?
Probably CC_SA ?