pepXML modifications are offset by one #100

@nnalpas

Hi,
I think there is an issue with how peptidoforms are parsed from pepXML files.

Take this peptide hit as an example:

<search_hit peptide="AHTMVHDQVSR" massdiff="-6.103515625E-4" calc_neutral_pep_mass="1295.604" peptide_next_aa="F" num_missed_cleavages="0" num_tol_term="2" protein_descr="gene=gltA;locus_tag=19A2747_02138;inference=ab initio prediction:Prodigal:002006,similar to AA sequence:UniProtKB:P14165;product=Citrate synthase" num_tot_proteins="1" tot_num_ions="20" hit_rank="1" num_matched_ions="6" protein="19A2747_02138_gene" peptide_prev_aa="R" is_rejected="0">
<modification_info modified_peptide="AHTM[147.0354]VHDQVSR">
<mod_aminoacid_mass mass="147.0354" position="4"/>
</modification_info>
<search_score name="hyperscore" value="15.15"/>
<search_score name="nextscore" value="0.0"/>
<search_score name="expect" value="3.868121e-04"/>
</search_hit>

For this hit, psm_utils.io.read_file returns:

AHTMV[+147.0354]HDQVSR/3

The oxidation on M at position 4 is shifted to position 5 (V).

This might be due to the modification parsing in the function "_parse_peptidoform", specifically the line:
sequence = [(aa, modifications_dict[i] or None) for i, aa in enumerate(peptide)]
I could be wrong, but I think this should be:
sequence = [(aa, modifications_dict[i+1] or None) for i, aa in enumerate(peptide)]
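To show the difference outside of psm_utils, here is a minimal standalone sketch. It is not the library's code: the helper names `residue_modifications` and `modified_residues` are made up for illustration, and I am assuming that `modifications_dict` behaves like a `defaultdict(list)` keyed by the 1-based `position` attribute from pepXML.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the search hit above (no XML namespace, unlike a full pepXML file).
SEARCH_HIT = """<search_hit peptide="AHTMVHDQVSR">
  <modification_info modified_peptide="AHTM[147.0354]VHDQVSR">
    <mod_aminoacid_mass mass="147.0354" position="4"/>
  </modification_info>
</search_hit>"""


def residue_modifications(search_hit_xml, one_based=True):
    """Hypothetical helper: pair each residue with its modification masses."""
    hit = ET.fromstring(search_hit_xml)
    peptide = hit.attrib["peptide"]

    # Assumption: modifications are collected by the pepXML 'position' value,
    # which counts residues starting from 1.
    modifications_dict = defaultdict(list)
    for mod in hit.iter("mod_aminoacid_mass"):
        modifications_dict[int(mod.attrib["position"])].append(mod.attrib["mass"])

    # enumerate() counts from 0, so the lookup needs +1 to match pepXML positions.
    offset = 1 if one_based else 0
    return [(aa, modifications_dict[i + offset] or None) for i, aa in enumerate(peptide)]


def modified_residues(sequence):
    """Return (1-based position, residue, masses) for every modified residue."""
    return [(index + 1, aa, mods) for index, (aa, mods) in enumerate(sequence) if mods]


# Lookup with i: the mass lands on V at position 5, as in the reported output.
print(modified_residues(residue_modifications(SEARCH_HIT, one_based=False)))
# Lookup with i + 1: the mass lands on M at position 4, as in the pepXML file.
print(modified_residues(residue_modifications(SEARCH_HIT, one_based=True)))
```

With the `i` lookup the modification ends up on V (position 5), and with `i + 1` it ends up on M (position 4), which matches the `modified_peptide` attribute.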

I hope this helps.
Thanks,

Labels: bug
