Skip to content
Snippets Groups Projects
Commit a9091afb authored by Hottlet Valentin's avatar Hottlet Valentin
Browse files

update

parent 1ccdf00f
No related branches found
No related tags found
No related merge requests found
Showing with 1146 additions and 94 deletions
File added
>XP_027830506.1 OS=Ovis aries OX=9940 GN=COL1A1 A
MFSFVDLRLLLLLAATALLTHGQEEGQEEGQEEDIPPVTCVQNGLRYHDRDVWKPVPCQICVCDNGNVLCDDVICDELKDCPNAKVPTDECCPVCPEGQESTTDQETTGVEGPKGDTGPRGPRGPAGPPGRDGIPGQPGLPGPPGPPGPPGPPGLGGNFAPQLSYGYDEKSTGISVPGPMGPSGPRGLPGPPGAPGPQGFQGPPGEPGEPGASGPMGPRGPPGPPGKNGDDGEAGKPGRPGERGPPGPQGARGLPGTAGLPGMKGHRGFSGLDGAKGDAGPAGPKGEPGSPGENGAPGQMGPRGLPGERGRPGAPGPAGARGNDGATGAAGPPGPTGPAGPPGFPGAVGAKGEAGPQGPRGSEGPQGVRGEPGPPGPAGAAGPAGNPGADGQPGAKGANGAPGIAGAPGFPGARGPSGPQGPSGPPGPKGNSGEPGAPGSKGDTGAKGEPGPTGIQGPPGPAGEEGKRGARGEPGPAGLPGPPGERGGPGSRGFPGSDGVAGPKGPAGERGAPGPAGPKGSPGEAGRPGEAGLPGAKGLTGSPGSPGPDGKTGPPGPAGQDGRPGPPGPPGARGQAGVMGFPGPKGAAGEPGKAGERGVPGPPGAVGPAGKDGEAGAQGPPGPAGPAGERGEQGPAGSPGFQGLPGPAGPPGEAGKPGEQGVPGDLGAPGPSGARGERGFPGERGVQGPPGPAGPRGANGAPGNDGAKGDAGAPGAPGSQGAPGLQGMPGERGAAGLPGPKGDRGDAGPKGADGAPGKDGVRGLTGPIGPPGPAGAPGDKGETGPSGPAGPTGARGAPGDRGEPGPPGPAGFAGPPGADGQPGAKGEPGDAGAKGDAGPPGPAGPAGPPGPIGNVGAPGPKGARGSAGPPGATGFPGAAGRVGPPGPSGNAGPPGPPGPAGKEGSKGPRGETGPAGRAGEVGPPGPPGPAGEKGAPGADGPAGAPGTPGPQGIAGQRGVVGLPGQRGERGFPGLPGPSGEPGKQGPSGASGERGPPGPMGPPGLAGPPGESGREGAPGAEGSPGRDGAPGAKGDRGETGPAGPPGAPGAPGAPGPVGPAGKSGDRGETGPAGPAGPIGPVGARGPAGPQGPRGDKGETGEQGDRGIKGHRGFSGLQGPPGPPGSPGEQGPSGASGPAGPRGPPGSAGTPGKDGLNGLPGPIGPPGPRGRTGDAGPAGPPGPPGPPGPPGPPSGGYDLSFLPQPPQEKAHDGGRYYRADDANVVRDRDLEVDTTLKSLSQQIENIRSPEGSRKNPARTCRDLKMCHPDWKSGEYWIDPNQGCNLDAIKVFCNMETGETCVYPTQPSVPQKNWYISKNPKDKRHVWYGESMTGGFQFEYGGQGSDPADVAIQLTFLRLMSTEASQNITYHCKNSVAYMDQQTGSLKKALLLQGSNEIEIRAEGNSRFTYSVTYDGCTSHTGAWGKTVIEYKTTKTSRLPIIDVAPLDVGAPDQEFGFDIGSVCFL
>XP_004007775.1 OS=Ovis aries OX=9940 GN=COL1A2 A
MLSFVDTRTLLLLAVTSCLATCQSLQEATARKGPSGDRGPRGERGPPGPPGRDGDDGIPGPPGPPGPPGPPGLGGNFAAQFDGKGGGPGPMGLMGPRGPPGASGAPGPQGFQGPPGEPGEPGQTGPAGARGPPGPPGKAGEDGHPGKPGRPGERGVVGPQGARGFPGTPGLPGFKGIRGHNGLDGLKGQPGAPGVKGEPGAPGENGTPGQTGARGLPGERGRVGAPGPAGARGSDGSVGPVGPAGPIGSAGPPGFPGAPGPKGELGPVGNPGPAGPAGPRGEVGLPGLSGPVGPPGNPGANGLPGAKGAAGLPGVAGAPGLPGPRGIPGPVGAAGATGARGLVGEPGPAGSKGESGNKGEPGAVGQPGPPGPSGEEGKRGSTGEIGPAGPPGPPGLRGNPGSRGLPGADGRAGVMGPAGSRGATGPAGVRGPNGDSGRPGEPGLMGPRGFPGSPGNIGPAGKEGPAGLPGIDGRPGPIGPAGARGEPGNIGFPGPKGPTGDPGKAGEKGHAGLAGPRGAPGPDGNNGAQGPPGLQGVQGGKGEQGPAGPPGFQGLPGPAGTAGEAGKPGERGIPGEFGLPGPAGARGERGPPGESGAAGPTGPIGSRGPSGPPGPDGNKGEPGVVGAPGTAGPSGPSGLPGERGAAGIPGGKGEKGETGLRGDVGSPGRDGARGAPGAVGAPGPAGANGDRGEAGPAGPAGPAGPRGSPGERGEVGPAGPNGFAGPAGAAGQPGAKGERGTKGPKGENGPVGPTGPVGAAGPSGPNGPPGPAGSRGDGGPPGATGFPGAAGRTGPPGPAGISGPPGPPGPAGKEGLRGPRGDQGPVGRTGEPGAAGPPGFVGEKGPSGEPGTAGPPGTPGPQGLLGAPGFLGLPGSRGERGLPGVAGSVGEPGPLGIAGPPGARGPPGNVGNPGVNGAPGEAGRDGNPGNDGPPGRDGQPGHKGERGYPGNAGPVGAAGAPGPQGPVGPTGKHGSRGEPGPVGAVGPAGAVGPRGPSGPQGIRGDKGEPGDKGPRGLPGLKGHNGLQGLPGLAGHHGDQGAPGAVGPAGPRGPAGPTGPAGKDGRTGQPGAVGPAGIRGSQGSQGPAGPPGPPGPPGPPGPSGGGYDFGFDGDFYRADQPRSPASLRPKDYEVDATLKSLNNQIETLLTPEGSRKNPARTCRDLRLSHPEWSSGYYWIDPNQGCTMDAIKVYCDFSTGETCIRAQPEDIPVKNWYRNSKAKKHVWVGETINGGTQFEYNVEGVTTKEMATQLAFMRLLANHASQNITYHCKNSIAYMDEETGNLKKAVILQGSNDVELVAEGNSRFTYTVLVDGCSKKTNEWKKTIIEYKTNKPSRLPILDIAPLDIGGADQEIRLNIGPVCFK
debug.log 0 → 100644
NP_031768.2 OS=Mus musculus OX=10090 GN=COL1A1
NP_031769.2 OS=Mus musculus OX=10090 GN=COL1A2
XP_032768589.1 OS=Rattus rattus OX=10117 GN=COL1A1
XP_032762867.1 OS=Rattus rattus OX=10117 GN=COL1A2
NP_445756.1 OS=Rattus norvegicus OX=10116 GN=COL1A1
NP_445808.2 OS=Rattus norvegicus OX=10116 GN=COL1A2
constraints:0
hard sequences:6
soft sequences:0
soft constraints:0
No taxonomy provided. Unable to apply TAXONOMY completion.
NP_031768.2 OS=Mus musculus OX=10090 GN=COL1A1
NP_031769.2 OS=Mus musculus OX=10090 GN=COL1A2
XP_032768589.1 OS=Rattus rattus OX=10117 GN=COL1A1
XP_032762867.1 OS=Rattus rattus OX=10117 GN=COL1A2
NP_445756.1 OS=Rattus norvegicus OX=10116 GN=COL1A1
NP_445808.2 OS=Rattus norvegicus OX=10116 GN=COL1A2
constraints:0
hard sequences:6
soft sequences:0
soft constraints:0
Mus musculus: multiple peptides found for 1592.8.
No taxonomy provided. Unable to apply TAXONOMY completion.
No taxonomy provided. Unable to apply TAXONOMY completion.
No taxonomy provided. Unable to apply TAXONOMY completion.
File Taxonomy/taxonomy_mammals.tsv: taxID 40674 is not documented. Ignored.
File test/taxonomy_mammals.tsv, line 2: format error. Line is ignored
File test/taxonomy_mammals.tsv, line 2: format error. Line is ignored
File test/taxonomy_mammals.tsv: taxID 9MAMM is not documented. Ignored.
File test/taxonomy_mammals.tsv: taxID 9MAMM is not documented. Ignored.
TaxID None not found in TAXONOMY file.
File test/taxonomy_mammals.tsv: taxID 9MAMM is not documented. Ignored.
TaxID None not found in TAXONOMY file.
File test/taxonomy_mammals.tsv: taxID 32524 is not documented. Ignored.
File test/taxonomy_mammals.tsv: taxID 32524 is not documented. Ignored.
File test/taxonomy_mammals.tsv: taxID Amniota is not documented. Ignored.
NP_031768.2 OS=Mus musculus OX=10090 GN=COL1A1
NP_031769.2 OS=Mus musculus OX=10090 GN=COL1A2
XP_032768589.1 OS=Rattus rattus OX=10117 GN=COL1A1
XP_032762867.1 OS=Rattus rattus OX=10117 GN=COL1A2
NP_445756.1 OS=Rattus norvegicus OX=10116 GN=COL1A1
NP_445808.2 OS=Rattus norvegicus OX=10116 GN=COL1A2
constraints:0
hard sequences:6
soft sequences:0
soft constraints:0
File test/taxonomy_mammals.tsv: taxID Amniota is not documented. Ignored.
NP_031768.2 OS=Mus musculus OX=10090 GN=COL1A1
NP_031769.2 OS=Mus musculus OX=10090 GN=COL1A2
XP_032768589.1 OS=Rattus rattus OX=10117 GN=COL1A1
XP_032762867.1 OS=Rattus rattus OX=10117 GN=COL1A2
NP_445756.1 OS=Rattus norvegicus OX=10116 GN=COL1A1
NP_445808.2 OS=Rattus norvegicus OX=10116 GN=COL1A2
constraints:0
hard sequences:6
soft sequences:0
soft constraints:0
Mus musculus: multiple peptides found for 1592.8.
File test/taxonomy_mammals.tsv: taxID Amniota is not documented. Ignored.
NP_031768.2 OS=Mus musculus OX=10090 GN=COL1A1
NP_031769.2 OS=Mus musculus OX=10090 GN=COL1A2
XP_032768589.1 OS=Rattus rattus OX=10117 GN=COL1A1
XP_032762867.1 OS=Rattus rattus OX=10117 GN=COL1A2
NP_445756.1 OS=Rattus norvegicus OX=10116 GN=COL1A1
NP_445808.2 OS=Rattus norvegicus OX=10116 GN=COL1A2
constraints:0
hard sequences:6
soft sequences:0
soft constraints:0
Mus musculus: multiple peptides found for 1592.8.
File test/taxonomy_mammals.tsv: taxID Amniota is not documented. Ignored.
NP_031768.2 OS=Mus musculus OX=10090 GN=COL1A1
NP_031769.2 OS=Mus musculus OX=10090 GN=COL1A2
XP_032768589.1 OS=Rattus rattus OX=10117 GN=COL1A1
XP_032762867.1 OS=Rattus rattus OX=10117 GN=COL1A2
NP_445756.1 OS=Rattus norvegicus OX=10116 GN=COL1A1
NP_445808.2 OS=Rattus norvegicus OX=10116 GN=COL1A2
constraints:0
hard sequences:6
soft sequences:0
soft constraints:0
Mus musculus: multiple peptides found for 1592.8.
XP_027830506.1 OS=Ovis aries OX=9940 GN=COL1A1 A
XP_004007775.1 OS=Ovis aries OX=9940 GN=COL1A2 A
constraints:0
hard sequences:2
soft sequences:0
soft constraints:0
XP_027830506.1 OS=Ovis aries OX=9940 GN=COL1A1 A
XP_004007775.1 OS=Ovis aries OX=9940 GN=COL1A2 A
constraints:0
hard sequences:2
soft sequences:0
soft constraints:0
File test/SPECTRA_SHEEP/__MACOSX is not recognized (unknown format). Accepted formats are csv, mgf and mzML. File ignored.
File test/SPECTRA_SHEEP/SPECTRA_SHEEP is not recognized (unknown format). Accepted formats are csv, mgf and mzML. File ignored.
No valid spectra found.
Please refer to the warning.log file for more detail.
XP_027830506.1 OS=Ovis aries OX=9940 GN=COL1A1 A
XP_004007775.1 OS=Ovis aries OX=9940 GN=COL1A2 A
constraints:0
hard sequences:2
soft sequences:0
soft constraints:0
Missing parameter: output (-o).
Missing parameter: output (-o).
XP_047563680.1 OS=Lutra lutra OX=9657 GN=COL1A1
XP_047551750.1 OS=Lutra lutra OX=9657 GN=COL1A2
constraints:0
hard sequences:2
soft sequences:0
soft constraints:0
XP_047563680.1 OS=Lutra lutra OX=9657 GN=COL1A1
XP_047551750.1 OS=Lutra lutra OX=9657 GN=COL1A2
constraints:0
hard sequences:2
soft sequences:0
soft constraints:0
Missing target sequences (-f or -d).
XP_047563680.1 OS=Lutra lutra OX=9657 GN=COL1A1
XP_047551750.1 OS=Lutra lutra OX=9657 GN=COL1A2
constraints:0
hard sequences:2
soft sequences:0
soft constraints:0
XP_047563680.1 OS=Lutra lutra OX=9657 GN=COL1A1
XP_047551750.1 OS=Lutra lutra OX=9657 GN=COL1A2
constraints:0
hard sequences:2
soft sequences:0
soft constraints:0
File Taxonomy/taxonomy_mammals.tsv: taxID 40674 is not documented. Ignored.
XP_047563680.1 OS=Lutra lutra OX=9657 GN=COL1A1
XP_047551750.1 OS=Lutra lutra OX=9657 GN=COL1A2
constraints:0
hard sequences:2
soft sequences:0
soft constraints:0
XP_047563680.1 OS=Lutra lutra OX=9657 GN=COL1A1
XP_047551750.1 OS=Lutra lutra OX=9657 GN=COL1A2
constraints:0
hard sequences:2
soft sequences:0
soft constraints:0
No taxonomy provided. Unable to apply TAXONOMY completion.
No valid spectra found.
Please refer to the warning.log file for more detail.
Missing parameter: output (-o).
Missing parameter: output (-o).
Missing target sequences (-f or -d).
Rank TaxID Taxon name Sequence PTM Mass Marker Gene Hel Length SeqID Begin End Status Comment Digestion
species 9657 Lutra lutra GVQGPPGPAGPR 1H 1105.57488265473 COL1A1-508/P1 COL1A1 508 12 XP_047563680.1 682 693 Genetic Homology : R - peptide - G, exact match. Yes
species 9657 Lutra lutra TGHPGTVGPAGIR 0H 1219.65419522415 COL1A2-978/A COL1A2 978 13 XP_047551750.1 1069 1081 Genetic Homology : R - peptide - G, exact match. Yes
species 9657 Lutra lutra TGHPGTVGPAGIR 1H 1235.64911022415 COL1A2-978/A COL1A2 978 13 XP_047551750.1 1069 1081 Genetic Homology : R - peptide - G, exact match. Yes
species 9657 Lutra lutra GLPGEFGLPGPAGPR 2H 1453.74340491225 COL1A2-484/B COL1A2 484 15 XP_047551750.1 575 589 Genetic Homology : R - peptide - G, exact match. Yes
species 9657 Lutra lutra GPPGESGAAGPSGPIGSR 1H 1566.75067474022 COL1A2-502/C COL1A2 502 18 XP_047551750.1 593 610 Genetic Homology : R - peptide - G, exact match. Yes
species 9657 Lutra lutra GPNGEAGSAGPSGPPGLR 2H 1609.75648877709 COL1A2-292/P2 COL1A2 292 18 XP_047551750.1 383 400 Genetic Homology : R - peptide - G, 1 mismatch with GPNGEAGSTGPSGPPGLR. Yes
species 9657 Lutra lutra GLPGVSGSVGEPGPLGIAGPPGAR 3H 2147.10912340618 COL1A2-793/D COL1A2 793 24 XP_047551750.1 884 907 Genetic Homology : R - peptide - G, exact match. Yes
species 9657 Lutra lutra GLTGPIGPPGPAGAPGDKGEAGPSGPAGPTGAR 2H 2853.4125752358395 COL1A1-586/F COL1A1 586 33 XP_047563680.1 760 792 Genetic Homology : R - peptide - G, exact match. Yes
species 9657 Lutra lutra GLTGPIGPPGPAGAPGDKGEAGPSGPAGPTGAR 3H 2869.4074902358398 COL1A1-586/F COL1A1 586 33 XP_047563680.1 760 792 Genetic Homology : R - peptide - G, exact match. Yes
species 9657 Lutra lutra GPSGEPGTAGPPGTSGPQGLLGAPGILGLPGSR 4H 2973.4912202402397 COL1A2-757/G COL1A2 757 33 XP_047551750.1 848 880 Genetic Homology : K - peptide - G, 2 mismatches with GPSGEAGTAGPPGTPGPQGLLGAPGILGLPGSR. Yes
species 9657 Lutra lutra GPSGEPGTAGPPGTSGPQGLLGAPGILGLPGSR 5H 2989.48613524024 COL1A2-757/G COL1A2 757 33 XP_047551750.1 848 880 Genetic Homology : K - peptide - G, 2 mismatches with GPSGEAGTAGPPGTPGPQGLLGAPGILGLPGSR. Yes
Taxon name Marker Sequence Mass PTM Hel Gene SeqID Begin End Rank
Mus musculus A SGQPGPVGPAGVR 1178.62764612314 0H 978 COL1A2 NP_031769.2 1074 1086 species
Mus musculus A SGQPGPVGPAGVR 1194.62256112314 1H 978 COL1A2 NP_031769.2 1074 1086 species
Mus musculus B GLPGEFGLPGPAGPR 1453.74340491225 2H 484 COL1A2 NP_031769.2 580 594 species
Mus musculus C GEPGPAGSVGPVGAVGPR 1592.8027106935199 2H 889 COL1A2 NP_031769.2 985 1002 species
Mus musculus C GATGLPGVAGAPGLPGPR 1592.83909658268 3H 220 COL1A2 NP_031769.2 316 333 species
This diff is collapsed.
......@@ -21,11 +21,12 @@ from src import supplement
from src import compute_masses
from src import limit as lmt
def check_and_update_parameters(homology, deamidation, allpeptides, fillin, selection, peptide_table, fasta, directory, spectra, limit, taxonomy, output):
def check_and_update_parameters(homology, deamidation, allpeptides, fillin, selection, peptide_table, fasta, directory, spectra, limit, taxonomy, output, web):
"""
Parameters checking and fixing. Configuration of loggers
"""
if output is None:
message.configure("")
message.escape("Missing parameter: output (-o).")
......@@ -93,7 +94,9 @@ def check_and_update_parameters(homology, deamidation, allpeptides, fillin, sele
if not os.path.isdir(directory):
message.escape("Directory "+directory+" not found (-d).")
return (homology, deamidation, allpeptides, fillin, selection, peptide_table, fasta, directory, spectra, limit, taxonomy, output, report, output_dir, report_file)
# TO DO : add taxonomy
return (homology, deamidation, allpeptides, fillin, selection, peptide_table, fasta, directory, spectra, limit, taxonomy, output, report, output_dir, report_file, web)
def create_report_homology(set_of_markers):
print(" Mode : HOMOLOGY")
......@@ -203,17 +206,17 @@ def main():
try:
# to do: add taxonomy
(homology, deamidation, allpeptides, fillin, selection, peptide_table, fasta, directory, spectra, limit, taxonomy, output, report, output_dir, report_file)=check_and_update_parameters(args.homology, args.deamidation, args.allpeptides, args.fillin, args.selection, args.peptide_table, args.fasta, args.directory, args.spectra, args.limit, args.taxonomy, args.output)
(homology, deamidation, allpeptides, fillin, selection, peptide_table, fasta, directory, spectra, limit, taxonomy, output, report, output_dir, report_file,web)=check_and_update_parameters(args.homology, args.deamidation, args.allpeptides, args.fillin, args.selection, args.peptide_table, args.fasta, args.directory, args.spectra, args.limit, args.taxonomy, args.output, args.web)
create_report_header(report)
list_of_constraints=lmt.parse_limits(limit)
primary_taxonomy=ta.parse_taxonomy_simple_file(taxonomy)
if homology:
set_of_markers, _ = pt.parse_peptide_tables(peptide_table, None, None)
set_of_sequences = fa.build_set_of_sequences(fasta, directory, list_of_constraints, None)
list_of_markers=homo.find_markers_all_sequences(set_of_sequences, set_of_markers)
set_of_sequences = fa.build_set_of_sequences(fasta, directory, list_of_constraints, primary_taxonomy)
list_of_markers=homo.find_markers_all_sequences(set_of_sequences, set_of_markers, primary_taxonomy)
list_of_markers=ta.add_taxonomy_ranks(list_of_markers, primary_taxonomy)
pt.build_peptide_table_from_set_of_markers(list_of_markers,output)
create_report_homology(set_of_markers)
......@@ -227,15 +230,30 @@ def main():
pt.build_peptide_table_from_set_of_markers(set_of_markers.union(set_of_new_markers),output, list_of_headers)
create_report_deamidation(peptide_table, set_of_codes)
if allpeptides:
set_of_sequences = fa.build_set_of_sequences(fasta, directory, list_of_constraints, None)
if len(set_of_sequences)==0:
message.escape("File "+fasta+": no valid sequences found.\n")
message.escape("No valid sequences found.\n")
data=config.parse_config_file()
if spectra is None:
set_of_new_markers = compute_masses.add_PTM_or_masses_to_markers(seq.in_silico_digestion(set_of_sequences, data["number_of_missed_cleavages"], data["min_peptide_length"], data["max_peptide_length"]))
if len(set_of_new_markers)==0:
message.escape("No valid peptide markers found.\n")
else:
set_of_markers = compute_masses.add_PTM_or_masses_to_markers(seq.in_silico_digestion(set_of_sequences, data["number_of_missed_cleavages"], data["min_peptide_length"], data["max_peptide_length"]), True, True)
if len(set_of_markers)==0:
message.escape("No valid peptide markers found.\n")
final_list_of_spectra=[]
for f in os.listdir(args.spectra):
file_name = os.path.join(spectra, f)
list_of_spectra=mass_spectrum.parser(file_name,f)
if len(list_of_spectra)>0:
final_list_of_spectra.extend(list_of_spectra)
if len(final_list_of_spectra)==0:
message.escape("No valid spectra found.\n Please refer to the warning.log file for more detail.")
minimal_number_of_spectra=max(1, 1*len(final_list_of_spectra)/5)
set_of_new_markers=marker_filtering.filter_set_of_markers(set_of_markers, final_list_of_spectra, args.resolution, minimal_number_of_spectra)
list_of_markers=markers.sort_and_merge(set_of_new_markers)
pt.build_peptide_table_from_set_of_markers(list_of_markers,output)
create_report_allpeptides(set_of_sequences, data["number_of_missed_cleavages"], data["min_peptide_length"], data["max_peptide_length"])
......@@ -262,19 +280,15 @@ def main():
if fillin:
# to do: check that there is a single peptide table
set_of_markers, list_of_headers = pt.parse_peptide_tables(peptide_table, list_of_constraints, None, False) # check list_of_constraints here.
primary_taxonomy=ta.parse_taxonomy_simple_file(taxonomy)
set_of_incomplete_markers, set_of_complete_markers, set_of_incomplete_fields = supplement.search_for_incomplete_markers(set_of_markers, set(list_of_headers))
print ("Incomplete fields")
print(set_of_incomplete_fields)
if fasta or directory:
set_of_sequences = fa.build_set_of_sequences(fasta, directory, list_of_constraints, None)
set_of_incomplete_markers=markers.add_sequences_and_positions_to_markers(set_of_incomplete_markers, set_of_sequences)
if args.resolution:
set_of_incomplete_markers=markers.check_masses_and_sequences(set_of_incomplete_markers, args.resolution)
set_of_incomplete_markers=markers.find_sequences_from_mass(set_of_incomplete_markers, set_of_sequences, args.resolution)
if "Digestion" in set_of_incomplete_fields:
print ("\n DIGESTION\n")
set_of_incomplete_markers=supplement.add_digestion_status(set_of_incomplete_markers, set_of_sequences)
set_of_new_markers=compute_masses.add_PTM_or_masses_to_markers(set_of_incomplete_markers)
set_of_new_markers=supplement.add_length(compute_masses.add_PTM_or_masses_to_markers(set_of_incomplete_markers))
set_of_new_markers=ta.supplement_taxonomic_information(set_of_new_markers, primary_taxonomy)
#list_of_markers=list(set_of_new_markers | set_of_complete_markers)
......@@ -288,7 +302,7 @@ def main():
create_report_footer(output_dir, output, report)
if not args.web:
if not web:
print("")
print(" Job completed.")
print(" All results are available in the following files.")
......@@ -296,15 +310,13 @@ def main():
print(f" - New peptide table : {output}")
print(f" - Report on the run : {report}")
print("")
# TO DO: add the new peptide table, if necessary
if os.path.getsize(os.path.join(output_dir, "warning.log")) > 0:
if not args.web:
print("Warnings were raised during the execution.")
print("Please refer to the warning.log file for detail.")
except message.InputError:
if not args.web:
if not web:
print("\n An error occured with your input. Stopping execution.")
print(" Please refer to the warning.log file for detail.")
else:
......
============================================
PAMPA CRAFT
============================================
Mon Apr 7 11:33:37 2025
---------------------------------
PARAMETERS
---------------------------------
Helical start175 / start759
Helical start175 / start759
Helical start175 / start681
Helical start92 / start592
Helical start92 / start1068
Helical start92 / start1068
Helical start92 / start574
Helical start92 / start883
Helical start92 / start847
Helical start92 / start847
Helical start92 / start382
Mode : HOMOLOGY
Input peptide table:
---------------------------------
NEW PEPTIDE TABLE
---------------------------------
Total number of species: 1
Gene : COL1A1
COL1A1-508/P1 - 1H COL1A1-586/F - 2H COL1A1-586/F - 3H
Meles meles
Gene : COL1A2
COL1A2-292/P2 - 2H COL1A2-484/B - 2H COL1A2-502/C - 1H COL1A2-757/G - 4H COL1A2-757/G - 5H COL1A2-793/D - 3H COL1A2-978/A - 0H COL1A2-978/A - 1H
Meles meles
---------------------------------
UNDISTINGUISHABLE SPECIES
---------------------------------
None
---------------------------------
INCLUSION OF SPECIES
---------------------------------
None
Job completed.
All results are available in the following files.
- New peptide table : output_homolo.tsv
- Report on the run : report_output_homolo.txt
============================================
PAMPA CRAFT
============================================
Tue Mar 25 12:28:26 2025
---------------------------------
PARAMETERS
---------------------------------
Incomplete fields
{'Hel', 'Rank', 'PTM', 'End', 'Sequence', 'SeqID', 'Begin'}
Mode : SUPPLEMENT
---------------------------------
INPUT PEPTIDE TABLE
---------------------------------
['test/table_mouse_ABC.tsv']
---------------------------------
INPUT SEQUENCES
---------------------------------
NP_031769.2 COL1A2 Mus musculus
XP_032762867.1 COL1A2 Rattus rattus
NP_445808.2 COL1A2 Rattus norvegicus
XP_032768589.1 COL1A1 Rattus rattus
NP_445756.1 COL1A1 Rattus norvegicus
NP_031768.2 COL1A1 Mus musculus
---------------------------------
NEW PEPTIDE TABLE
---------------------------------
Total number of species: 1
Gene : COL1A2
A - 0H A - 1H B - 2H C - 2H C - 3H
Mus musculus NP_031769.2 1074[1178.6276] 1074[1194.6226] 580[1453.7434] 985[1592.8027] 316[1592.8391]
---------------------------------
UNDISTINGUISHABLE SPECIES
---------------------------------
None
---------------------------------
INCLUSION OF SPECIES
---------------------------------
None
---------------------------------
WARNINGS
---------------------------------
Warnings were raised regarding your inputs.
- File test/taxonomy_mammals.tsv: taxID Amniota is not documented. Ignored.
- Mus musculus: multiple peptides found for 1592.8.
---------------------------------
OUTPUT FILES
---------------------------------
Main result file (TSV) : output_mouse_ABC.tsv
Report (this file) : report_output_mouse_ABC.txt
============================================
PAMPA CRAFT
============================================
Fri Apr 4 15:44:38 2025
---------------------------------
PARAMETERS
---------------------------------
Mode : ALL PEPTIDES
In silico digestion:
Enzyme: trypsine
Maximal number of missed cleavages :1
Minimal peptide length :12
Maximal peptide length :33
---------------------------------
INPUT SEQUENCES
---------------------------------
XP_004007775.1 COL1A2 Ovis aries
XP_027830506.1 COL1A1 Ovis aries
---------------------------------
OUTPUT FILES
---------------------------------
Main result file (TSV) : output_sheep.tsv
Report (this file) : report_output_sheep.txt
============================================
PAMPA CRAFT
============================================
Tue Apr 8 16:14:58 2025
---------------------------------
PARAMETERS
---------------------------------
Mode : HOMOLOGY
Input peptide table:
---------------------------------
NEW PEPTIDE TABLE
---------------------------------
Total number of species: 1
Gene : COL1A1
COL1A1-508/P1 - 1H COL1A1-586/F - 2H COL1A1-586/F - 3H
Meles meles
Gene : COL1A2
COL1A2-292/P2 - 2H COL1A2-484/B - 2H COL1A2-502/C - 1H COL1A2-757/G - 4H COL1A2-757/G - 5H COL1A2-793/D - 3H COL1A2-978/A - 0H COL1A2-978/A - 1H
Meles meles
---------------------------------
UNDISTINGUISHABLE SPECIES
---------------------------------
None
---------------------------------
INCLUSION OF SPECIES
---------------------------------
None
---------------------------------
WARNINGS
---------------------------------
Warnings were raised regarding your inputs.
- No taxonomy provided. Unable to apply TAXONOMY completion.
---------------------------------
OUTPUT FILES
---------------------------------
Main result file (TSV) : test.tsv
Report (this file) : report_test.txt
Taxon Name Sequence PTM Marker Gene
Meles meles GVQGPPGPAGPR 1H COL1A1-508/P1 COL1A1
Meles meles TGHPGTVGPAGIR 0H COL1A2-978/A COL1A2
Meles meles TGHPGTVGPAGIR 1H COL1A2-978/A COL1A2
Meles meles GLPGEFGLPGPAGPR 2H COL1A2-484/B COL1A2
Meles meles GPPGESGAAGPSGPIGSR 1H COL1A2-502/C COL1A2
Meles meles GPNGEAGSTGPSGPPGLR 2H COL1A2-292/P2 COL1A2
Meles meles GLPGVSGSVGEPGPLGIAGPPGAR 3H COL1A2-793/D COL1A2
Meles meles GLTGPIGPPGPAGAPGDKGEAGPSGPAGPTGAR 2H COL1A1-586/F COL1A1
Meles meles GLTGPIGPPGPAGAPGDKGEAGPSGPAGPTGAR 3H COL1A1-586/F COL1A1
Meles meles GPSGEAGTAGPPGTPGPQGLLGAPGILGLPGSR 4H COL1A2-757/G COL1A2
Meles meles GPSGEAGTAGPPGTPGPQGLLGAPGILGLPGSR 5H COL1A2-757/G COL1A2
\ No newline at end of file
import re
def period(seq):
pattern = r'^(?:G[A-Z]{2})+|(?:G[A-Z]{2})+G|(?:g[A-Z]{2})+G[A-Z]$'
if re.fullmatch(pattern, seq):
return 0
pattern =r'^(?:[A-Z]G[A-Z]{2})+|(?:[A-Z]G[A-Z]{2})+G|(?:[A-Z]G[A-Z]{2})+G[a-z]$'
if re.fullmatch(pattern, seq):
return 1
pattern =r'^(?:[A-Z]{2}G[A-Z]{2})+|(?:[A-Z]{2}G[A-Z]{2})+g|(?:[A-Z]{2}G[A-Z]{2})+G[A-Z]$'
if re.fullmatch(pattern, seq):
return 2
return -1
def P_pattern(seq):
p=period(seq)
if p<0:
return (-1, -1)
weak_P=0
strong_P=0
for i in range((p+2)%3, len(seq), 3):
if seq[i] == 'P':
strong_P+= 1
for i in range((p+1)%3, len(seq), 3):
if seq[i] == 'P':
weak_P+= 1
return (strong_P, weak_P)
def Pmask_distance(seq1, seq2):
if len(seq1)!=len(seq2):
return -1
r=0
for c1, c2 in zip(seq1, seq2):
if c1=='P' and c2=='P':
None
elif c1=='P'or c2=='P':
r=r+1
return r
......@@ -6,8 +6,10 @@ from collections import Counter
from src import utils as ut
from src import compute_masses as mass
from src import taxonomy
from src import sequences
from src import markers
from src import collagen
from src import config
def hamming_distance(seq1, seq2):
......@@ -15,18 +17,8 @@ def hamming_distance(seq1, seq2):
return -1
return sum(c1 != c2 for c1, c2 in zip(seq1, seq2))
def Pmask_distance(seq1, seq2):
if len(seq1)!=len(seq2):
return -1
r=0
for c1, c2 in zip(seq1, seq2):
if c1=='P' and c2=='P':
None
elif c1=='P'or c2=='P':
r=r+1
return r
def find_markers_single_sequence(seq, set_of_digested_peptides, dict_of_model_markers, set_of_markers):
def find_markers_single_sequence(seq, set_of_digested_peptides, dict_of_model_markers, set_of_markers, taxo):
# seq: protein (object defined in sequences.py)
# dict_of_model_markers, dictionary, key: peptide sequence [string], value: set of 3-uplets (PTM, code, gene_name)
# dict_of_taxid, key: (sequence,PTM), value: set of taxid
......@@ -84,7 +76,7 @@ def find_markers_single_sequence(seq, set_of_digested_peptides, dict_of_model_ma
for pos in range(0, len(seq.sequence())-l):
d=hamming_distance((seq.sequence())[pos:pos+l], marker_seq)
if d<len(marker_seq)/10+1:
d2=Pmask_distance((seq.sequence())[pos:pos+l], marker_seq)
d2=collagen.Pmask_distance((seq.sequence())[pos:pos+l], marker_seq)
for code in set_of_codes:
if (code not in found_markers) :
found_markers[code]=(pos,d,d2, marker_seq)
......@@ -107,13 +99,10 @@ def find_markers_single_sequence(seq, set_of_digested_peptides, dict_of_model_ma
dict["Sequence"]=new_sequence
dict["Begin"]= pos+1
dict["Length"]=l
print("Helical start"+str(helical_start)+" / start"+str(pos) )
if helical_start is not None and pos>helical_start:
dict["Hel"]=pos-helical_start+2
dict["End"]=pos+l
dict["Rank"]="species"
dict["PTM"]=model_marker[0]
dict["Mass"]=mass.peptide_mass_with_PTM(new_sequence,model_marker[0]) # à modifier pour conserver la masse
dict["Marker"]=model_marker[1]
dict["GN"]=model_marker[2]
dict["Status"]="Genetic"
......@@ -122,59 +111,64 @@ def find_markers_single_sequence(seq, set_of_digested_peptides, dict_of_model_ma
else:
dict["Digestion"]="Yes"
if d==0:
dict["Comment"]="Homology : "+ seq.sequence()[pos-1]+" - peptide - "+ seq.sequence()[pos+l]+", exact match. "
dict["Comment"]="Homology : "+ seq.sequence()[pos-1]+" - peptide - "+ seq.sequence()[pos+l]+", exact match "
elif d==1:
dict["Comment"]="Homology : "+ seq.sequence()[pos-1]+" - peptide - "+ seq.sequence()[pos+l]+ ", 1 mismatch with " + marker_seq+". "
dict["Comment"]="Homology : "+ seq.sequence()[pos-1]+" - peptide - "+ seq.sequence()[pos+l]+ ", 1 mismatch with " + marker_seq+" "
else:
dict["Comment"]="Homology : "+ seq.sequence()[pos-1]+" - peptide - "+ seq.sequence()[pos+l]+ ", "+str(d)+ " mismatches with " + marker_seq+" "
if taxo:
set_of_potential_taxids={m.taxid() for m in set_of_markers if m.sequence()==marker_seq}
dict["Comment"]=dict["Comment"]+taxonomy.find_closest_ID(seq.taxid(),set_of_potential_taxids,taxo)
else:
dict["Comment"]="Homology : "+ seq.sequence()[pos-1]+" - peptide - "+ seq.sequence()[pos+l]+ ", "+str(d)+ " mismatches with " + marker_seq+". "
dict["Comment"]=dict["Comment"]+". "
if collagen.P_pattern(new_sequence)[0]==collagen.P_pattern(marker_seq)[0]:
dict["PTM"]=model_marker[0]
else:
dict["Comment"]=dict["Comment"]+ "PTM updated. "
new_marker=markers.Marker(field=dict)
set_of_new_markers.add(new_marker)
return set_of_new_markers
return mass.add_PTM_or_masses_to_markers(set_of_new_markers)
def find_helical_position(set_of_markers):
dict_codes={}
dict_codes={m.code():[] for m in set_of_markers}
for m in set_of_markers:
if m.helical() is None or str(m.helical())=="0":
continue
if m.code() in dict_codes:
dict_codes[m.code()].append(m.helical())
else:
dict_codes[m.code()]=[m.helical()]
dict_couting={}
for code in dict_codes:
if len(dict_codes[code])==0:
dict_codes[code]=0
continue
counts = Counter(dict_codes[code])
most_common, freq = counts.most_common(1)[0] # Get the most frequent element and its count
if freq > len(dict_codes[code]) / 2:
dict_codes[code]=most_common
else:
dict_codes[code]=0
return dict_codes
def check_helix_position(set_of_markers, set_of_new_markers):
def check_quality(set_of_markers, set_of_new_markers):
dict_codes=find_helical_position(set_of_markers)
set_of_correct_sequences=set()
set_of_dubious_markers=set()
set_of_sequences={m.sequence() for m in set_of_markers}
set_of_new_sequences={m.sequence() for m in set_of_new_markers}
dict_of_new_sequences={seq:len({m.taxon_name() for m in set_of_new_markers if m.sequence()==seq}) for seq in set_of_new_sequences}
for m in set_of_new_markers:
if m.helical() is None:
set_of_dubious_markers.add(m)
elif m.code() not in dict_codes:
continue
exist_sequence=(m.sequence() in set_of_sequences) or (dict_of_new_sequences[m.sequence()]>1)
exist_code= dict_codes[m.code()]>0 and m.helical() is not None
if not exist_code:
m.field["Quality"]=0
elif m.helical()==dict_codes[m.code()]:
m=markers.post_comment(m,"Expected position in helical region. ")
set_of_correct_sequences.add(m.sequence())
m.field["Quality"]=2
elif abs(m.helical()-dict_codes[m.code()])<4:
m=markers.post_comment(m,"Shifted position in helical region. ")
else:
set_of_dubious_markers.add(m)
for m in set_of_dubious_markers:
m.field["Hel"]=0
if m.sequence() in set_of_correct_sequences:
m=markers.post_comment(m,"No helical position. ")
m.field["Quality"]=1
else:
m=markers.post_comment(m,"Dubious sequence (no helical position, new sequence). ")
m.field["Quality"]=0
if exist_sequence:
m.field["Quality"]+=1
return set_of_new_markers
def find_markers_all_sequences(set_of_sequences, set_of_markers):
def find_markers_all_sequences(set_of_sequences, set_of_markers, taxo):
# set_of_sequences: set of target fasta sequences[object defined in sequences.py]
# construction of dict_of_model_markers and dict_of_taxid
......@@ -200,12 +194,11 @@ def find_markers_all_sequences(set_of_sequences, set_of_markers):
number_of_missed_cleavages=int(config.parse_config_file()["number_of_missed_cleavages"])
for seq in set_of_sequences:
set_of_digested_peptides=sequences.in_silico_digestion({seq}, number_of_missed_cleavages, 12, 33, False)
s=find_markers_single_sequence(seq, set_of_digested_peptides, dict_of_model_markers, set_of_markers)
s=find_markers_single_sequence(seq, set_of_digested_peptides, dict_of_model_markers, set_of_markers, taxo)
set_of_new_markers.update(s)
list_of_new_markers=markers.sort_and_merge(set_of_new_markers)
list_of_new_markers=check_helix_position(set_of_markers, list_of_new_markers)
list_of_new_markers=check_quality(set_of_markers, list_of_new_markers)
return list_of_new_markers
......
......@@ -13,8 +13,7 @@ from src import markers
# input:
# list of markers, represented by a sorted list of masses (in increasing order)
# one spectrum
# indique pour chaque marqueur s'il y a un pic correspondant
# resultat: liste avec le nom du spectre si match
# output: list of matching peaks
def compare_set_of_markers_for_one_spectrum(mass_list, spectrum, resolution):
spectrum.sort()
peak_list=[None for m in mass_list]
......
......@@ -303,17 +303,19 @@ def colinearity(set_of_markers):
def str_union(s1, s2):
if s1 is None:
return s2
if s2 is None:
return s1
s=set(s1.split()) | set(s2.split())
t=' '.join(s)
return t
return ' '.join(s)
def sort_and_merge(set_of_markers):
# merge all markers having the same taxid, sequence, PTM and mass into a single marker. The new seqId is the union of all taxid. What about the comment ?
if len(set_of_markers)<2:
return list(set_of_markers)
list_of_markers=list(set_of_markers)
# TO DO: deal with empty mass
list_of_markers.sort(key= lambda m: (ut.none_str(m.taxid()),ut.none_str(m.sequence()),ut.none_str(m.PTM()),ut.none_float(m.mass()),ut.none_int(m.begin()),ut.none_str(m.seqid())))
list_of_selected_markers=[]
......@@ -383,7 +385,7 @@ def find_positions_from_sequence(m, matching_sequences):
for seq in matching_sequences:
pos=(seq.sequence()).find(m.sequence())
if pos>-1:
if m.length() is not None and m.length() !=len(seq.sequence()):
if m.length() is not None and m.length() !=len(m.sequence()):
message.warning("Peptide length modified in marker "+str(m)+". ")
if m.begin() is not None and m.begin()!=pos+1 :
message.warning("Begin position modified in marker "+str(m)+". ")
......@@ -396,7 +398,7 @@ def find_positions_from_sequence(m, matching_sequences):
dict["OX"]=seq.taxid()
dict["OS"]=seq.taxon_name()
dict["GN"]=seq.protein()
dict["Hel"]= pos + sequences.helical_region(seq)[0]
dict["Hel"]= pos - sequences.helical_region(seq)[0] +2
dict["SeqID"]= seq.seqid()
dict["Begin"]=pos+1
dict["End"]= pos + len(m.sequence())
......@@ -441,12 +443,29 @@ def find_sequence_from_positions(m, matching_sequences):
dict["SeqID"]=seq.seqid()
dict["Begin"]=begin
dict["End"]=end
if m.status() is None:
dict["Status"]="Genetic"
m2=Marker(field=dict)
m2=update_comment(m2,"Sequence deduced from positions. ")
set_of_markers.add(m2)
return set_of_markers
def check_masses_and_sequences(set_of_markers, resolution):
for m in set_of_markers:
if m.sequence() is None or m.mass() is None:
continue
m2=Marker(field={"Sequence":m.sequence()})
candidate_markers=compute_masses.add_PTM_or_masses_to_markers({m2}, True, True)
Found=False
for cm in candidate_markers:
if ut.matching_masses(cm.mass(), m.mass(), resolution):
Found=True
m.field["mass"]=cm.mass()
m.field["PTM"]=cm.PTM()
m=post_comment(m, " Adjusted mass (was "+str(m.mass())+").")
if not Found:
m=post_comment(m, " Inconsistent mass.")
return set_of_markers
# use mass to find sequence
def find_sequences_from_mass(set_of_markers, set_of_sequences, resolution):
......
......@@ -101,9 +101,7 @@ def peak_parser_csv(peak_file_name, name):
NameError: if the file is not a csv file
"""
peak_list = []
f= open (peak_file_name)
next(f)
line=1
......@@ -164,24 +162,46 @@ def peak_parser_mzml(peak_file_name,name):
list of Spectrum
"""
list_of_spectra=[]
file_name, _= os.path.splitext(peak_file_name)
with mzml.MzML(peak_file_name) as reader:
for spectrum in reader:
peak_list = []
name= file_name+" "+(
spectrum.get('scanList', {})
.get('scan', [{}])[0] # Get first scan if available
.get('name', spectrum.get('id', 'Unknown')) # Default to 'id' if 'name' is missing
)
#name = spectrum.get('name', spectrum.get('id', 'Unknown'))
#name= spectrum.get('id', 'Unknown') # Get spectrum ID
mz_values = spectrum['m/z array'] # Extract m/z values
intensity_values = spectrum['intensity array'] # Extract intensity values
for mass, intensity in zip(mz_values, intensity_values):
peak = Peak(float(mass), float(intensity))
peak_list.append(peak)
list_of_spectra.append(Spectrum(name,peak_list,""))
'''
try:
f = mzml.MzML(peak_file_name, use_index=True)
#f = mzml.MzML(peak_file_name, use_index=True)
f = mzml.MzML(peak_file_name)
except :
message.warning("File "+peak_file_name+" is incorrect (wrong format). File ignored." )
return []
list_of_spectra=[]
for s in f:
mass_array = s["m/z array"] # m/z
intensity_array = s["intensity array"] # intensity
name=s["id"]
name=s.get('id', 'Unknown')
peak_list = []
for mass, intensity in zip(mass_array.tolist(), intensity_array.tolist()):
peak = Peak(float(mass), float(intensity))
if peak not in peak_list:
peak_list.append(peak)
list_of_spectra.append(Spectrum(name,peak_list,""))
'''
return list_of_spectra
......
#!/usr/bin/env python3
import argparse
import os
import time
import sys
# local import
from src import markers
from src import sequences as seq
from src import homology as homo
from src import fasta_parsing as fa
from src import peptide_table as pt
from src import compute_masses
from src import marker_filtering
from src import mass_spectrum
from src import message
from src import taxonomy as ta
from src import config
from src import supplement
from src import compute_masses
from src import limit as lmt
def check_and_update_parameters(homology, deamidation, allpeptides, fillin, selection, peptide_table, fasta, directory, spectra, limit, taxonomy, output):
"""
Parameters checking and fixing. Configuration of loggers
"""
if output is None:
message.configure("")
message.escape("Missing parameter: output (-o).")
output_dir, output_file = os.path.split(output)
if len(output_dir)>0 :
# Ensure the output directory exists. If not, create it.
if not os.path.exists(output_dir):
os.makedirs(output_dir)
message.configure(output_dir)
if deamidation is None and homology is None and allpeptides is None and fillin is None and spectra is None:
message.escape("Missing parameter: --homology, --allpeptides, --fillin, --deamidation, or -s.")
if homology is None and allpeptides is None and fillin is None:
if not peptide_table:
message.escape("Missing parameter: -p (peptide table).")
if fasta:
message.warning("Unused parameter: -f (fasta file).")
if directory:
message.warning("Unused parameter: -d (directory for fasta files)")
if (homology and allpeptides) or (homology and fillin) or (allpeptides and fillin):
message.escape("Parameters --homology, --allpeptides and --fillin are mutually exclusive.")
extension=output_file[-4:].lower()
if extension!=".tsv":
output_file=output_file+".tsv"
else:
output_file=output_file[:-4]+".tsv"
output=os.path.join(output_dir, output_file)
report_file="report_"+output_file.replace("tsv", "txt")
report=os.path.join(output_dir, report_file)
if fillin or homology or deamidation:
if peptide_table is None:
message.escape("Missing parameter: -p (peptide table)")
for pep in peptide_table:
if not os.path.isfile(pep):
message.escape("File "+pep+" not found (-p).")
if allpeptides and peptide_table:
message.warning("Unused parameter: -p (peptide_table)")
if limit:
if not os.path.isfile(limit):
message.escape("File "+limit+" not found (-l).")
if os.path.getsize(limit) == 0:
message.warning("File "+limit+" is empty.")
if allpeptides or homology:
q = (fasta, directory)
if not (q[0] or q[1] ):
message.escape("Missing target sequences (-f or -d).")
if (q[0] and q[1]) :
message.escape("Options -f (fasta file) and -d (directory of fasta files) are mutually exclusive.")
if q[0]:
if not os.path.isfile(fasta):
message.escape("File "+fasta+" not found (-f).")
if os.path.getsize(fasta) == 0:
message.escape("File "+fasta+" is empty.")
if q[1]:
if not os.path.isdir(directory):
message.escape("Directory "+directory+" not found (-d).")
# TO DO : add taxonomy
return (homology, deamidation, allpeptides, fillin, selection, peptide_table, fasta, directory, spectra, limit, taxonomy, output, report, output_dir, report_file)
def create_report_homology(set_of_markers):
print(" Mode : HOMOLOGY")
print(" Input peptide table:" )
print("---------------------------------")
print(" NEW PEPTIDE TABLE")
print("---------------------------------\n")
markers.colinearity(set_of_markers)
markers.check_set_of_markers(set_of_markers)
def create_report_deamidation(peptide_table, set_of_codes):
print(" Mode : DEAMIDATION")
print(" Peptide table: "+ str(peptide_table))
if len(set_of_codes)==0:
print(" Modified peptide markers: all")
else:
print(" Modified peptide markers: "+ str(set_of_codes))
def create_report_allpeptides(set_of_sequences, number_of_missed_cleavage, min_length, max_length):
print(" Mode : ALL PEPTIDES")
print(" In silico digestion:")
print(" Enzyme: trypsine")
print(" Maximal number of missed cleavages: "+str(number_of_missed_cleavage))
print(" Minimal peptide length: "+str(min_length))
print(" Maximal peptide length: "+str(max_length))
print("---------------------------------")
print(" INPUT SEQUENCES")
print("---------------------------------")
for seq in set_of_sequences:
print (" "+seq.seqid()+" "+seq.protein()+ " "+seq.taxon_name())
def create_report_supplement(peptide_table,list_of_markers=None, set_of_sequences=None):
print(" Mode : SUPPLEMENT\n")
print("---------------------------------")
print(" INPUT PEPTIDE TABLE")
print("---------------------------------")
print(str(peptide_table)+"\n")
if set_of_sequences:
print("---------------------------------")
print(" INPUT SEQUENCES")
print("---------------------------------")
for seq in set_of_sequences:
print (" "+str(seq.seqid())+" "+str(seq.protein())+ " "+str(seq.taxon_name()))
print("---------------------------------")
print(" NEW PEPTIDE TABLE")
print("---------------------------------\n")
markers.colinearity(set(list_of_markers))
markers.check_set_of_markers(set(list_of_markers))
def create_report_header(report):
sys.stdout=open(report, 'w')
print("============================================\n")
print(" PAMPA CRAFT\n")
print("============================================\n")
print (time.ctime())
print("")
print("---------------------------------")
print(" PARAMETERS")
print("---------------------------------\n")
def create_report_footer(output_dir, output, report):
if os.path.getsize(os.path.join(output_dir,'warning.log')) > 0:
print("---------------------------------")
print(" WARNINGS")
print("---------------------------------\n")
print(" Warnings were raised regarding your inputs.\n")
with open(os.path.join(output_dir,'warning.log'), 'r') as file:
for line in file:
print(" - "+line, end="")
print("")
print("")
print("---------------------------------")
print(" OUTPUT FILES")
print("---------------------------------\n")
print(" Main result file (TSV) : "+output)
print(" Report (this file) : "+report)
sys.stdout = sys.__stdout__
class CustomFormatter(argparse.HelpFormatter):
def add_argument(self, action):
pass
def format_help(self):
custom_paragraph = "\nUsage: pampa_craft [-h] \n --allpeptides| --deamidation | --fillin | --homology | --selection \n [-f FASTA | -d DIRECTORY] [-p PEPTIDE_TABLE] [-s SPECTRA] [-l LIMIT] [-t TAXONOMY] -o OUTPUT \n \nThis script is for the design of custom peptide tables.\nIt should be invoked with one of the following pamaters:\n\n --allpeptides Generation of all tryptic peptides from FASTA sequences, possibly filtered by a set of MS spectra. \n --deamidation Addition of deamidation modifications to an existing peptide table\n --fillin Supplementing a partially filled peptide table: adding missing masses, positions, sequences etc. \n --homology Construction of a new peptide table by homology\n --selection Filtration of markers of an existing peptide table with a set of MS spectra. \n\nOptions coming with --allpeptides\n -f FASTA Fasta file for new species\n -d DIRECTORY Directory containing Fasta files for new species\n -l LIMIT Limit file that applies constraints to the set of sequences (tokens GN, OS and OX). OPTIONAL\n -s SPECTRA Path to the spectra files. Authorized formats: cvs, mgd, mzML. OPTIONAL\n -o OUTPUT Path to the output file (new peptide table)\n\nOptions coming with --deamidation\n -p PEPTIDE_TABLE Peptide table for which deamidation should be added.\n -l LIMIT Limit file to apply constraints on the set of markers affected by the modification (token Marker). OPTIONAL\n -o OUTPUT Path to the output file (new peptide table)\n\nOptions coming with --fillin\n -p PEPTIDE_TABLE Peptide table for which missing information should be completed.\n -f FASTA Fasta file for supplementary sequences.OPTIONAL\n -d DIRECTORY Directory containing Fasta files for supplementary sequences. OPTIONAL\n -t TAXONOMY Path to the taxonomy file needed to add missing taxonomic information. OPTIONAL\n -o OUTPUT Path to the output file (new peptide table)\n\nOptions coming with --homology\n -p PEPTIDE_TABLE [PEPTIDE_TABLE]\n Peptide table(s) that contain model peptide markers\n -f FASTA Fasta file for new species\n -d DIRECTORY Directory containing Fasta files for new species\n -l LIMIT Limit file that applies constraints on the set of sequences (tokens GN, OS and OX). OPTIONAL\n -o OUTPUT Path to the output file (new peptide table)\n\nOptions coming with --selection\n -p PEPTIDE_TABLE Peptide table to be filtered.\n -s SPECTRA Path to the spectra files. Authorized formats: cvs, mgd, mzML. \n -o OUTPUT Path to the output file (new peptide table)\n\n"
return custom_paragraph + super(CustomFormatter, self).format_help()
def main():
parser = argparse.ArgumentParser(formatter_class=CustomFormatter, usage=argparse.SUPPRESS)
parser.add_argument("--homology", dest="homology", action='store_true', help="Generate a new table by homology.", required=False)
parser.add_argument("--deamidation", dest="deamidation", action='store_true', help="Add deamidation to marker masses. -l option can be used to specify the list of involved markers.")
parser.add_argument("--allpeptides", dest="allpeptides", action='store_true', help="Generation of all tryptic peptides a from FASTA sequences (specified with either -f or -d).", required=False)
parser.add_argument("--selection", dest="selection", action='store_true', help="Selection of peptide markers from a set of spectra.", required=False)
parser.add_argument("--fillin", dest="fillin", action='store_true', help="Fill in missing information (such as masses, sequences...) to an existing peptide table (specified with -p).", required=False)
parser.add_argument("-p", dest="peptide_table",nargs='+', help="Peptide table (TSV file). Required with --homology and --fillin.", type=str)
parser.add_argument("-o", dest="output", help="Output path (should include the output file name)", type=str)
parser.add_argument("-f", dest="fasta", help="FASTA file that contains new sequences.", type=str)
parser.add_argument("-d", dest="directory", help="Directory that contains FASTA files.", type=str)
parser.add_argument("-s", dest="spectra", help="Directory that contains spectra files (one spectrum per file) for marker filtering", type=str)
parser.add_argument("-e", dest="resolution", help="Error margin for mass spectrum peaks. Recommended values: 0.01 for maldi FT and 0.1 for maldi TOF.", type=float)
parser.add_argument("-l", dest="limit", help="Limit file (txt)", type=str)
parser.add_argument("-t", dest="taxonomy", help="Taxonomy file (TSV)", type=str)
parser.add_argument("--web", dest="web", action='store_true', help=argparse.SUPPRESS, required=False)
args = parser.parse_args()
try:
# to do: add taxonomy
(homology, deamidation, allpeptides, fillin, selection, peptide_table, fasta, directory, spectra, limit, taxonomy, output, report, output_dir, report_file)=check_and_update_parameters(args.homology, args.deamidation, args.allpeptides, args.fillin, args.selection, args.peptide_table, args.fasta, args.directory, args.spectra, args.limit, args.taxonomy, args.output)
create_report_header(report)
list_of_constraints=lmt.parse_limits(limit)
primary_taxonomy=ta.parse_taxonomy_simple_file(taxonomy)
if homology:
set_of_markers, _ = pt.parse_peptide_tables(peptide_table, None, None)
set_of_sequences = fa.build_set_of_sequences(fasta, directory, list_of_constraints, primary_taxonomy)
if taxonomy:
primary_taxonomy=ta.parse_taxonomy_simple_file(taxonomy)
list_of_markers=homo.find_markers_all_sequences(set_of_sequences, set_of_markers, primary_taxonomy)
pt.build_peptide_table_from_set_of_markers(list_of_markers,output)
create_report_homology(set_of_markers)
if deamidation:
set_of_codes=set()
for constraint in list_of_constraints:
if 'Deamidation' in constraint:
set_of_codes.update(constraint['Deamidation'])
set_of_markers, list_of_headers=pt.parse_peptide_tables(peptide_table, None, None)
set_of_new_markers=compute_masses.add_deamidation(set_of_markers, set_of_codes)
pt.build_peptide_table_from_set_of_markers(set_of_markers.union(set_of_new_markers),output, list_of_headers)
create_report_deamidation(peptide_table, set_of_codes)
if allpeptides:
set_of_sequences = fa.build_set_of_sequences(fasta, directory, list_of_constraints, None)
if len(set_of_sequences)==0:
message.escape("No valid sequences found.\n")
data=config.parse_config_file()
if spectra is None:
set_of_new_markers = compute_masses.add_PTM_or_masses_to_markers(seq.in_silico_digestion(set_of_sequences, data["number_of_missed_cleavages"], data["min_peptide_length"], data["max_peptide_length"]))
if len(set_of_new_markers)==0:
message.escape("No valid peptide markers found.\n")
else:
set_of_markers = compute_masses.add_PTM_or_masses_to_markers(seq.in_silico_digestion(set_of_sequences, data["number_of_missed_cleavages"], data["min_peptide_length"], data["max_peptide_length"]), True, True)
if len(set_of_markers)==0:
message.escape("No valid peptide markers found.\n")
final_list_of_spectra=[]
for f in os.listdir(args.spectra):
file_name = os.path.join(spectra, f)
list_of_spectra=mass_spectrum.parser(file_name,f)
if len(list_of_spectra)>0:
final_list_of_spectra.extend(list_of_spectra)
if len(final_list_of_spectra)==0:
message.escape("No valid spectra found.\n Please refer to the warning.log file for more detail.")
minimal_number_of_spectra=max(1, 1*len(final_list_of_spectra)/5)
set_of_new_markers=marker_filtering.filter_set_of_markers(set_of_markers, final_list_of_spectra, args.resolution, minimal_number_of_spectra)
list_of_markers=markers.sort_and_merge(set_of_new_markers)
pt.build_peptide_table_from_set_of_markers(list_of_markers,output)
create_report_allpeptides(set_of_sequences, data["number_of_missed_cleavages"], data["min_peptide_length"], data["max_peptide_length"])
if selection: # not compatible with -f or -d
set_of_markers, list_of_headers= pt.parse_peptide_tables(peptide_table, None, None)
if len(set_of_markers)==0:
message.escape("No valid markers found.\n")
####
final_list_of_spectra=[]
for f in os.listdir(args.spectra):
file_name = os.path.join(spectra, f)
list_of_spectra=mass_spectrum.parser(file_name,f)
if len(list_of_spectra)>0:
final_list_of_spectra.extend(list_of_spectra)
if len(final_list_of_spectra)==0:
message.escape("No valid spectra found.\n Please refer to the warning.log file for more detail.")
minimal_number_of_spectra=max(1, 2*len(final_list_of_spectra)/3)
set_of_confirmed_markers=marker_filtering.filter_set_of_markers(set_of_markers, final_list_of_spectra, args.resolution, minimal_number_of_spectra)
list_of_markers=markers.sort_and_merge(set_of_confirmed_markers)
pt.build_peptide_table_from_set_of_markers(set_of_confirmed_markers,output, list_of_headers)
if fillin:
# to do: check that there is a single peptide table
set_of_markers, list_of_headers = pt.parse_peptide_tables(peptide_table, list_of_constraints, None, False) # check list_of_constraints here.
set_of_incomplete_markers, set_of_complete_markers, set_of_incomplete_fields = supplement.search_for_incomplete_markers(set_of_markers, set(list_of_headers))
if fasta or directory:
set_of_sequences = fa.build_set_of_sequences(fasta, directory, list_of_constraints, None)
set_of_incomplete_markers=markers.add_sequences_and_positions_to_markers(set_of_incomplete_markers, set_of_sequences)
if args.resolution:
set_of_incomplete_markers=markers.check_masses_and_sequences(set_of_incomplete_markers, args.resolution)
set_of_incomplete_markers=markers.find_sequences_from_mass(set_of_incomplete_markers, set_of_sequences, args.resolution)
set_of_incomplete_markers=supplement.add_digestion_status(set_of_incomplete_markers, set_of_sequences)
set_of_new_markers=compute_masses.add_PTM_or_masses_to_markers(set_of_incomplete_markers)
set_of_new_markers=ta.supplement_taxonomic_information(set_of_new_markers, primary_taxonomy)
#list_of_markers=list(set_of_new_markers | set_of_complete_markers)
list_of_markers=markers.sort_and_merge(set_of_new_markers | set_of_complete_markers)
set_of_markers=ta.add_taxonomy_ranks(list_of_markers, primary_taxonomy)
pt.build_peptide_table_from_set_of_markers(set_of_markers,output, list_of_headers)
if fasta or directory:
create_report_supplement(peptide_table, list_of_markers, set_of_sequences)
else:
create_report_supplement(peptide_table)
create_report_footer(output_dir, output, report)
print("")
print(" Job completed.")
print(" All results are available in the following files.")
print("")
print(f" - New peptide table : {output}")
print(f" - Report on the run : {report}")
print("")
# TO DO: add the new peptide table, if necessary
if os.path.getsize(os.path.join(output_dir, "warning.log")) > 0:
print("Warnings were raised during the execution.")
print("Please refer to the warning.log file for detail.")
except message.InputError:
if not args.web:
print("\n An error occured with your input. Stopping execution.")
print(" Please refer to the warning.log file for detail.")
else:
pass
if __name__ == "__main__":
main()
......@@ -190,6 +190,7 @@ def build_peptide_table_from_set_of_markers(set_of_markers, outfile_name, list_o
tsv_file.close()
"""
"""
def json_build_peptide_table_from_set_of_markers(set_of_markers, outfile_name, list_of_headers=None, append_file=""):
JSON_file = open(outfile_name, "w")
if not list_of_headers:
......@@ -206,4 +207,4 @@ def json_build_peptide_table_from_set_of_markers(set_of_markers, outfile_name, l
dicts.append(dict(zip(list_of_headers, m.field.values())))
json.dump(dicts, JSON_file)
JSON_file.close()
"""
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment