Description
Hi,
I think there is an issue with how peptidoforms are parsed from pepXML files.
Take this peptide hit as an example:
<search_hit peptide="AHTMVHDQVSR" massdiff="-6.103515625E-4" calc_neutral_pep_mass="1295.604" peptide_next_aa="F" num_missed_cleavages="0" num_tol_term="2" protein_descr="gene=gltA;locus_tag=19A2747_02138;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P14165;product=Citrate synthase" num_tot_proteins="1" tot_num_ions="20" hit_rank="1" num_matched_ions="6" protein="19A2747_02138_gene" peptide_prev_aa="R" is_rejected="0">
<modification_info modified_peptide="AHTM[147.0354]VHDQVSR">
<mod_aminoacid_mass mass="147.0354" position="4"/>
</modification_info>
<search_score name="hyperscore" value="15.15"/>
<search_score name="nextscore" value="0.0"/>
<search_score name="expect" value="3.868121e-04"/>
</search_hit>
For this hit, psm_utils.io.read_file returns:
AHTMV[+147.0354]HDQVSR/3
The oxidation (M) at position 4 is shifted to position 5.
This might be due to the modification parsing in the function "_parse_peptidoform", specifically the line:
sequence = [(aa, modifications_dict[i] or None) for i, aa in enumerate(peptide)]
I could be wrong, but I think this should be:
sequence = [(aa, modifications_dict[i+1] or None) for i, aa in enumerate(peptide)]
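To illustrate the suspected off-by-one, here is a minimal standalone sketch. It assumes that modifications_dict is a defaultdict keyed by the 1-based "position" attribute from the pepXML; I have not checked that against the actual psm_utils source. The helper names annotate and modified_residues are made up for this example and are not part of psm_utils.

```python
from collections import defaultdict


def annotate(peptide, mods, one_based):
    """Pair each residue with its modifications.

    Hypothetical helper for illustration only, not psm_utils API.
    """
    # Assumption: keys are pepXML positions, which count residues from 1.
    modifications_dict = defaultdict(list)
    for mod in mods:
        modifications_dict[mod["position"]].append(mod["mass"])

    # enumerate() counts from 0, so a 1-based key needs i + 1.
    shift = 1 if one_based else 0
    return [
        (aa, modifications_dict[i + shift] or None)
        for i, aa in enumerate(peptide)
    ]


def modified_residues(sequence):
    """Return the residues that carry at least one modification."""
    return [aa for aa, mod in sequence if mod]


# Values taken from the search_hit above.
mods = [{"position": 4, "mass": 147.0354}]

current = annotate("AHTMVHDQVSR", mods, one_based=False)
proposed = annotate("AHTMVHDQVSR", mods, one_based=True)

print(modified_residues(current))   # ['V']
print(modified_residues(proposed))  # ['M']
```

With the 0-based lookup the mass ends up on the V at index 4, which matches the output I am seeing; with i+1 it stays on the M.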
I hope this helps.
Thanks,