Commit 896233a7 authored by Touzet Hélène

Edit help.php

parent 80cf541c
Branches patch-2
......@@ -18,12 +18,20 @@
<!-- Edit the <h2> and the page content -->
<h2>Input form (basic)</h2>
<h2>PAMPA help</h2>
This page is the manual for the PAMPA web server for species identification. PAMPA takes a set of mass spectra as input and determines the best taxonomic assignment for each spectrum using a database of peptide markers.
The full documentation is available on the software's <a href="https://github.com/touzet/pampa/wiki">GitHub wiki</a>.
<h3>Name of the job</h3>
You can enter any name.
<h3>Mass spectra</h3>
<p>
<B>Upload the MS spectra files :</B> PAMPA can process MALDI-TOF and MALDI-FTICR spectra. In all cases, we recommend deisotoping the mass spectra before processing them. <br>It can recognize the following formats, by the extension of the file name:
<B>Upload MS spectral files :</B> PAMPA can process MALDI-TOF and MALDI-FTICR spectra and recognizes the following formats, by the extension of the file name:
</p>
<ul>
......@@ -36,186 +44,78 @@
<p>
Users can upload several files. It is also possible to provide a ZIP archive containing all files.
In all cases, MS spectra should be preprocessed first (calibration, replicates, deisotoping).
</p>
<br>
<p>
<b>Mass error :</b> The error margin is related to the resolution of the mass spectrometer, that is its ability to distinguish closely spaced peaks. We employ it to set an upper bound on the deviation between a peak and the theoretical mass of the associated peptide.
<b>Error margin tolerance:</b>
The error margin is related to the resolution of the mass spectrometer and its ability to distinguish closely spaced peaks. This is the maximal deviation between a peak and the theoretical mass of the peptide marker.
It can be expressed in Dalton or in ppm (parts per million).
</p>
<ul>
<li>
Optimize for MALDI-TOF spectra: This option corresponds to a value of 50 ppm.
Optimized for MALDI-TOF spectra: This option corresponds to a value of 50 ppm.
</li>
<li>
Optimize for MALDI-FTICR spectra: This option corresponds to a value of 5 ppm.
Optimized for MALDI-FTICR spectra: This option corresponds to a value of 5 ppm.
</li>
<li>
Custom value in ppm: Enter any value between 1 and 1000
</li>
<li>
Custom value in Daltons : Enter any value between 0.002 and 0.998
Custom value in Dalton: Enter any value between 0.002 and 0.998
</li>
</ul>
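<p>
The difference between a tolerance in ppm and a tolerance in Dalton can be illustrated with a minimal sketch. The function name <code>within_tolerance</code> and its parameters are hypothetical and are not part of PAMPA; the sketch only assumes that the ppm tolerance is taken relative to the theoretical mass.
</p>

```python
def within_tolerance(observed_peak, theoretical_mass, tolerance, unit="ppm"):
    """Return True if the peak lies within the error margin of the mass.

    Hypothetical helper for illustration only (not PAMPA's code).
    unit is "ppm" (relative tolerance) or "Da" (absolute tolerance).
    """
    deviation = abs(observed_peak - theoretical_mass)
    if unit == "ppm":
        # ppm is relative: the allowed deviation grows with the mass.
        return deviation <= theoretical_mass * tolerance / 1_000_000
    if unit == "Da":
        # Dalton is absolute: the same window applies to every mass.
        return deviation <= tolerance
    raise ValueError("unit must be 'ppm' or 'Da'")


# A 0.02 Da deviation around mass 1105.58 is about 18 ppm.
print(within_tolerance(1105.60, 1105.58, 50))        # 50 ppm window
print(within_tolerance(1105.60, 1105.58, 5))         # 5 ppm window
print(within_tolerance(1105.60, 1105.58, 0.1, "Da")) # 0.1 Da window
```

<p>
In this sketch the same peak is accepted with the 50 ppm setting and rejected with the 5 ppm setting, which is why the choice should follow the resolution of the instrument.
</p>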
<br>
<h3>Results</h3>
<ul>
<li>
<b>Only optimal results :</b> with this option, PAMPA identifies the species with the smallest P-value for each mass spectrum.
</li>
<li>
<b>Near-optimal results within a suboptimality percentage :</b> also returns near-optimal solutions. You can set the suboptimality range as a percentage from 0 to 100, with the default being 100 (corresponding to solutions with the highest number of marker peptides). <br>For example, if the optimal solution has 11 marker peptides, a value of 80 will provide solutions with 9 markers or more.
</li>
<li>
<b>All results within a suboptimality percentage :</b> this option is linked to the previous option and modifies its behavior. When the previous option is used alone, it generates only near-optimal solutions that are not included in any other solution. With this option, the program computes all solutions, even those that are included in other solutions.
</li>
</ul>
<h3> Basic mode / Advanced mode</h3>
<br>
The web form has two modes. The basic mode provides a database of curated peptide markers, sequences, and taxonomies.
This is the simplest mode to start with PAMPA. The advanced mode allows users to provide their own peptide markers, sequences and taxonomies.
<h2>Advanced analysis</h2>
<h3> Organisms and peptide markers in basic mode</h3>
<h3>Peptide tables</h3>
When selecting "mammals", PAMPA utilizes a predefined database of peptide markers in conjunction with the NCBI taxonomy. The list is accessible at https://docs.google.com/spreadsheets/d/1nwELNshZxF0h6DkIFNAYXDJqmq4NOSUOLWQlTZIzUDQ/
You can also restrict the set of species by selecting them one at a time.
<p>
Peptide markers are organized within peptide tables, which are TSV files where each column corresponds to a field. Twelve fields are recognized by the program.
</p>
<ul>
<li>
Rank : Taxonomic rank
</li>
<li>
Taxid : Taxonomic identifier
</li>
<li>
Taxon name : Scientific name
</li>
<li>
Sequence : Marker peptide sequence
</li>
<li>
PTM : Description of post-translational modifications applied to the marker peptide (see <a href="/pampa/help.php#PTM_description">PTM description</a> section)
</li>
<li>
Name : Marker name
</li>
<li>
Mass : Peptide mass
</li>
<li>
Gene : Gene name, e.g., COL1A1
</li>
<li>
SeqId : Sequence identifier(s) of the protein sequence from which the marker peptide is derived
</li>
<li>
Begin : Start position of the peptide marker within the protein sequence
</li>
<li>
End : End position of the peptide marker within the protein sequence
</li>
<li>
Comment : Additional comments about the marker
</li>
</ul>
<b>PTMs, include deamidations:</b> This option allows up to one deamidation on every peptide marker containing at least one asparagine or one glutamine.
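<p>
A minimal sketch of what this option implies for the masses being searched. The function <code>candidate_masses</code> is hypothetical and is not PAMPA's code. It assumes the standard monoisotopic mass shift of a deamidation (about +0.984 Da) and that the unmodified mass is still considered alongside the deamidated one.
</p>

```python
# Monoisotopic mass shift of one deamidation (N -> D or Q -> E), in Dalton.
DEAMIDATION_SHIFT = 0.984016


def candidate_masses(sequence, mass):
    """Return the masses to search for a marker when deamidation is enabled.

    Hypothetical helper: at most one deamidation is added, and only when
    the peptide contains an asparagine (N) or a glutamine (Q).
    """
    masses = [mass]
    if "N" in sequence or "Q" in sequence:
        masses.append(mass + DEAMIDATION_SHIFT)
    return masses
```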
<p>
The first row of the file should contain column headings.
</p>
<h3> Organisms, peptide markers and taxonomy in advanced mode</h3>
<p>
Most of these fields are optional and are here for reference. The following information is mandatory:
</p>
<ul>
<li>
You must provide a <u>taxid</u> for the peptide marker. Rank and taxon names are included primarily to enhance the clarity of results.
</li>
<li>
You should furnish either a <u>sequence</u>, possibly with a <a href="/pampa/help.php#PTM_description">PTM description</a>, or a <u>mass</u> for your marker peptide. If the sequence is provided without a mass, the program will automatically compute the mass from it. To do so, it will utilize either the PTM description (when available) or infer potential PTMs from the sequence.
</li>
</ul>
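<p>
The mandatory fields can be checked before uploading a peptide table. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the column headings are spelled exactly <code>Taxid</code>, <code>Sequence</code> and <code>Mass</code>, which should be verified against the full documentation.
</p>

```python
import csv
import io


def validate_peptide_table(tsv_text):
    """Return a list of (row_number, message) for rows missing mandatory data.

    Hypothetical checker, not part of PAMPA. Assumes the first row holds
    the column headings and that they are named Taxid, Sequence and Mass.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    # Data rows start on line 2 because line 1 is the heading row.
    for row_number, row in enumerate(reader, start=2):
        taxid = (row.get("Taxid") or "").strip()
        sequence = (row.get("Sequence") or "").strip()
        mass = (row.get("Mass") or "").strip()
        if not taxid:
            problems.append((row_number, "missing Taxid"))
        if not sequence and not mass:
            problems.append((row_number, "needs a Sequence or a Mass"))
    return problems


table = (
    "Taxid\tSequence\tPTM\tMass\n"
    "9913\tGPPGPPGK\t2O\t\n"   # valid: taxid and sequence
    "\tGPPGPPGK\t\t\n"         # invalid: no taxid
    "9940\t\t\t\n"             # invalid: neither sequence nor mass
)
for row_number, message in validate_peptide_table(table):
    print(f"line {row_number}: {message}")
```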
In this mode, users can provide their own peptide table. It is also possible to supply FASTA amino acid sequences for the representative species instead.
These sequences will undergo in silico digestion to identify all tryptic peptides, allowing for up to one missed cleavage. Masses are then automatically computed.
In both cases, you can also provide your own taxonomy or select the NCBI taxonomy.
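<p>
As an illustration of in silico digestion with up to one missed cleavage, here is a minimal sketch. It is not PAMPA's implementation: the function name is hypothetical, and it assumes the simplest trypsin rule (cut after every K or R), ignoring refinements such as the proline exception that some tools apply.
</p>

```python
def tryptic_peptides(sequence, max_missed_cleavages=1):
    """List tryptic peptides of a protein sequence.

    Simplified, hypothetical rule: cleave after every lysine (K) or
    arginine (R). Peptides spanning up to max_missed_cleavages uncut
    sites are also returned.
    """
    # Step 1: cut the sequence into fully cleaved fragments.
    fragments = []
    start = 0
    for position, amino_acid in enumerate(sequence):
        if amino_acid in "KR":
            fragments.append(sequence[start:position + 1])
            start = position + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal remainder

    # Step 2: join consecutive fragments to model missed cleavages.
    peptides = []
    for first in range(len(fragments)):
        for missed in range(max_missed_cleavages + 1):
            if first + missed < len(fragments):
                peptides.append("".join(fragments[first:first + missed + 1]))
    return peptides


print(tryptic_peptides("AAKGGRPP"))
```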
<p>
Lastly, you have the option to include additional fields (i.e., extra columns) for your own purposes. These fields will be disregarded by PAMPA.
</p>
For a detailed description of the formats for peptide tables, sequences, and taxonomies, we refer the user to the <a href="https://github.com/touzet/pampa/wiki">full documentation</a>.
<p>
<b>Where can I find peptide tables, and how can I generate them?</b> An example peptide table for mammals is accessible <a href="/pampa/data_pampa/table_mammals_with_deamidation.tsv" download="table_mammals.tsv">here</a>. You can manually edit these peptide table files or create your own using any spreadsheet software and opting for the TSV export format.<br>
Alternatively, <a href="https://github.com/touzet/pampa#pampa-craft" target="_blank">PAMPA CRAFT</a> offers automated methods for generating peptide tables.
</p>
<h3>Results</h3>
<br>
For each spectrum, PAMPA will give the best species assignments, based on the peptide markers found in the spectrum.
The number of species can be customized with three options.
<h3>FASTA sequences</h3>
<p>
PAMPA processes amino-acid sequences in the standard FASTA format with a UniProtKB-like header. The first line starts with a greater-than character (>) followed by a sequence identifier (SeqID), which is provided for informational purposes and can be customized by the user. Additionally, this line must contain three mandatory fields:
</p>
<ul>
<li>
OS: scientific name of the organism
<b>Only optimal results:</b> PAMPA identifies the species with the lowest P-value for each mass spectrum. The P-value reflects the correlation between the peaks of the spectrum and the masses of the peptides from the peptide table.
</li>
<li>
OX: taxonomic identifier of the organism, as assigned by the NCBI
<b>Near-optimal results within a suboptimality percentage:</b> also returns near-optimal solutions. You can set the suboptimality range as a percentage from 0 to 100, with the default being 100 (corresponding to solutions with the highest number of marker peptides). <br>For example, if the optimal solution has 11 marker peptides, a value of 80 will provide all solutions with 9 markers or more.
</li>
<li>
GN: gene name
<b>All results within a suboptimality percentage:</b> this option is linked to the previous option and modifies its behavior. When the previous option is used alone, it generates only near-optimal solutions that are not included in any other solution. With this option, the program computes all solutions, even those that are included in other solutions.
</li>
</ul>
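<p>
The suboptimality percentage can be read as a minimum number of marker peptides. The sketch below is an interpretation, not PAMPA's code: the function name is hypothetical, and it assumes the threshold is the percentage of the optimal count rounded up, which reproduces the example of 11 markers and a value of 80 giving 9.
</p>

```python
def minimum_markers(optimal_count, suboptimality_percent):
    """Smallest number of markers a solution needs in order to be reported.

    Hypothetical helper. Assumes the threshold is
    ceil(optimal_count * percent / 100).
    """
    if not 0 <= suboptimality_percent <= 100:
        raise ValueError("percentage must be between 0 and 100")
    # Ceiling division written with integer arithmetic.
    return -(-optimal_count * suboptimality_percent // 100)


print(minimum_markers(11, 80))   # 80% of 11 is 8.8, so at least 9 markers
print(minimum_markers(11, 100))  # only solutions with all 11 markers
```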
<p>
The other lines are the sequence representation, with one letter per amino acid.
</p>
<p>
For example:
</p>
<h3>Run an example</h3>
<pre>
>P02453 OS=Bos taurus OX=9913 GN=COL1A1 <br>
MFSFVDLRLLLLLAATALLTHGQEEGQEEGQEEDIPPVTCVQNGLRYHDRDVWKPVPCQI<br>
CVCDNGNVLCDDVICDELKDCPNAKVPTDECCPVCPEGQESPTDQETTGVEGPKGDTGPR<br>
GPRGPAGPPGRDGIPGQPGLPGPPGPPGPPGPPGLGGNFAPQLSYGYDEKSTGISVPGPM<br>
GPSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRP<br>
GERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQM
</pre>
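<p>
To check that a header line carries the three mandatory fields, a sketch along the following lines can be used. The function <code>parse_fasta_header</code> is hypothetical and only assumes the <code>KEY=value</code> layout shown in the example above, with two-letter uppercase keys.
</p>

```python
import re

MANDATORY_FIELDS = ("OS", "OX", "GN")


def parse_fasta_header(line):
    """Split a UniProtKB-like header into its SeqID and KEY=value fields.

    Hypothetical helper for illustration. Raises ValueError when the line
    is not a header or when OS, OX or GN is missing.
    """
    if not line.startswith(">"):
        raise ValueError("a FASTA header must start with '>'")
    seq_id, _, rest = line[1:].strip().partition(" ")
    # A value runs until the next two-letter KEY= or the end of the line,
    # so multi-word values such as "Bos taurus" are kept whole.
    fields = dict(re.findall(r"([A-Z]{2})=(.*?)(?=\s+[A-Z]{2}=|$)", rest))
    missing = [key for key in MANDATORY_FIELDS if key not in fields]
    if missing:
        raise ValueError("missing mandatory field(s): " + ", ".join(missing))
    return seq_id, fields


seq_id, fields = parse_fasta_header(">P02453 OS=Bos taurus OX=9913 GN=COL1A1")
print(seq_id, fields)
```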
<br>
<h3>Taxonomy</h3>
<p>
The program offers the optional possibility to add taxonomic information to the species identification. In this case, you can use the file provided or submit your own.
</p>
<p>
The taxonomy must be in the form of a Tab-Separated Values (TSV) file comprising five columns: Taxid, Common name, Scientific name, Parent (taxid), and Rank (species, genus, etc.). You can obtain this type of file directly from UniProt (<a href="https://www.uniprot.org/taxonomy" target="_blank">https://www.uniprot.org/taxonomy</a>) by following these steps:
</p>
<ol>
<li>
Use the search bar to find your desired clade, entering its common name, scientific name, or taxid.
</li>
<li>
Select the clade of interest and click on 'Browse all descendants.'
</li>
<li>
Locate the 'download' link.
</li>
<li>
Choose the TSV format and customize the columns in the following order: Common name, Scientific name, Parent, and Rank.
</li>
<li>
Proceed to download the taxonomy file.
</li>
</ol>
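<p>
Once downloaded, the five-column taxonomy file can be inspected with a short script. This sketch is illustrative: the function names are hypothetical, columns are read by position (Taxid, Common name, Scientific name, Parent, Rank) rather than by heading, and the first row is assumed to be a heading row. The three rows used in the example are a hand-written excerpt, not a real download.
</p>

```python
import csv
import io


def load_taxonomy(tsv_text):
    """Read a five-column taxonomy TSV into a dict keyed by taxid.

    Hypothetical loader. Column order is assumed to be:
    Taxid, Common name, Scientific name, Parent, Rank.
    """
    rows = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    next(rows, None)  # skip the heading row
    taxonomy = {}
    for row in rows:
        if len(row) < 5:
            continue  # ignore incomplete lines
        taxid, common, scientific, parent, rank = (f.strip() for f in row[:5])
        taxonomy[taxid] = {
            "common_name": common,
            "scientific_name": scientific,
            "parent": parent,
            "rank": rank,
        }
    return taxonomy


def lineage(taxonomy, taxid):
    """Scientific names from a taxon up to the highest ancestor in the file."""
    names, seen = [], set()
    while taxid in taxonomy and taxid not in seen:
        seen.add(taxid)  # guards against accidental cycles
        names.append(taxonomy[taxid]["scientific_name"])
        taxid = taxonomy[taxid]["parent"]
    return names


example = (
    "Taxid\tCommon name\tScientific name\tParent\tRank\n"
    "9913\tBovine\tBos taurus\t9903\tspecies\n"
    "9903\t\tBos\t9895\tgenus\n"
    "9895\t\tBovidae\t\tfamily\n"
)
print(lineage(load_taxonomy(example), "9913"))
```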
<h3>Visualising and exploring results</h3>
<br>
<h2>Exploring results</h2>
<p>
For each spectrum, the output file will give the best assignment, based on the highest number of marker peptides. It contains the following information :
</p>
<ul>
<li>
Peaks from the spectrum that match the marker peptides
......@@ -247,41 +147,13 @@ The taxonomy must be in the form of a Tab-Separated Values (TSV) file comprising
<br>
<h2>Additional information</h2>
</div>
</div>
</body>
<h3 id="PTM_description">PTM description</h3>
<p>
Peptide tables include a field labeled <b>PTM</b>, which is utilized to describe the post-translational modifications (PTMs) applied to the corresponding peptide. PAMPA recognizes three types of PTMs :
</p>
<ul>
<li>
Hydroxylation of prolines (indicated by the single-letter code 'O')
</li>
<li>
Deamidation of asparagine and glutamine (indicated by the single-letter code 'D')
</li>
<li>
Phosphorylation of serine, threonine, and tyrosine (indicated by the single-letter code 'P')
</li>
</ul>
<p>
The PTM description is a concise representation of the number of proline hydroxylations (oxyprolines), deamidations and phosphorylations needed to compute the mass of a peptide sequence. For instance, '2O1D' signifies two oxyprolines and one deamidation, '1P4O' represents one phosphorylation and four oxyprolines, and '2O' corresponds to two oxyprolines without any deamidation or phosphorylation. When no PTM applies, the description should be '0O', or '0D', etc.
</p>
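<p>
The notation can be decoded mechanically. The following sketch is not PAMPA's parser: the function name is hypothetical, and it assumes a description is a sequence of "count + letter" pairs using only the letters O, D and P, in any order.
</p>

```python
import re


def parse_ptm_description(description):
    """Turn a PTM description such as '2O1D' into counts per PTM type.

    Hypothetical helper. Keys are 'O' (oxyprolines), 'D' (deamidations)
    and 'P' (phosphorylations). An empty description is rejected here,
    since it means "PTMs not specified" rather than "no PTM".
    """
    if not re.fullmatch(r"(\d+[ODP])+", description):
        raise ValueError(f"not a valid PTM description: {description!r}")
    counts = {"O": 0, "D": 0, "P": 0}
    for number, code in re.findall(r"(\d+)([ODP])", description):
        counts[code] += int(number)
    return counts


print(parse_ptm_description("2O1D"))
print(parse_ptm_description("1P4O"))
```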
<p>
When the PTM description field is left empty in the peptide table, it signifies that PTMs are not specified. In such cases, PAMPA directly infers PTMs based on two rules:
</p>
<ul>
<li>
No deamidation or phosphorylation is added.
</li>
<li>
The number of oxyprolines is determined empirically using the following rule: let 'p' be the total number of prolines in the peptide, and 'pp' the number of prolines involved in the pattern 'GxP'. If the difference 'p-pp' is less than 3, then 'pp' oxyprolines are applied. If 'p-pp' is 3 or greater, two variants are applied: one with 'pp' oxyprolines and one with 'pp+1' oxyprolines.
</li>
</ul>
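<p>
The oxyproline rule can be written out as a short sketch. The function name is hypothetical, and the sketch assumes that 'GxP' means a glycine, then any residue, then the proline being counted.
</p>

```python
def inferred_oxyproline_counts(sequence):
    """Return the oxyproline counts applied when the PTM field is empty.

    Hypothetical helper following the rule above:
    p  = total number of prolines,
    pp = number of prolines found in a 'GxP' pattern.
    """
    p = sequence.count("P")
    # A proline at position i + 2 is in a GxP pattern if position i is G.
    pp = sum(
        1
        for i in range(len(sequence) - 2)
        if sequence[i] == "G" and sequence[i + 2] == "P"
    )
    if p - pp < 3:
        return [pp]
    return [pp, pp + 1]


print(inferred_oxyproline_counts("GPPGPPGK"))  # p = 4, pp = 2
print(inferred_oxyproline_counts("PPPGAPK"))   # p = 4, pp = 1
```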
<br>
<br>
</div> <!-- center -->
......