This tool helps assess the potential of immunopeptides (e.g. derived from pathogens) to elicit T-cell cross-reactivity due to molecular mimicry, defined here as the presence of a closely related sequence with at most 2 amino acid mismatches. Essentially, it is a bash script that runs an agrep command to match a list of sequences in a text file against a user-specified proteome, then runs a program called matchfinder to curate and collate the results into a readable spreadsheet. Using agrep for this purpose was reported previously in Knierman et al.; however, no code was available to facilitate the analysis, so we made this in-house solution to assess the potential cross-reactivity of COVID immunopeptides that had been identified by mass spectrometry. While still rough around the edges, we hope it proves useful to others as well. If you use the script/program, please cite our paper:
Braun, A., Rowntree, L.C., Huang, Z. et al. Mapping the immunopeptidome of seven SARS-CoV-2 antigens across common HLA haplotypes. Nat Commun 15, 7547 (2024). https://doi.org/10.1038/s41467-024-51959-6
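To make the matching criterion concrete, here is a minimal pure-bash sketch of a substitution-only search: it reports every window of a protein sequence that differs from a peptide at no more than 2 positions. It is an illustration only and not the code used by the script, which relies on agrep (and agrep's notion of an error may be broader than a substitution). The function names `count_mismatches` and `find_mimics` and the toy protein sequence are made up for this example.

```bash
#!/usr/bin/env bash
# Illustration only: substitution-only matching, NOT the script's agrep call.

# Count the positions at which two equal-length strings differ.
count_mismatches() {
    local a=$1 b=$2 n=0 i
    for (( i = 0; i < ${#a}; i++ )); do
        if [[ ${a:i:1} != "${b:i:1}" ]]; then
            n=$(( n + 1 ))
        fi
    done
    echo "$n"
}

# Slide the peptide along a protein sequence and print every window with
# at most max_mm mismatches as: 1-based position, window, mismatch count
# (tab-separated). max_mm defaults to 2, the threshold used in this README.
find_mimics() {
    local peptide=$1 protein=$2 max_mm=${3:-2}
    local len=${#peptide} i window mm
    for (( i = 0; i + len <= ${#protein}; i++ )); do
        window=${protein:i:len}
        mm=$(count_mismatches "$peptide" "$window")
        if (( mm <= max_mm )); then
            printf '%d\t%s\t%d\n' "$(( i + 1 ))" "$window" "$mm"
        fi
    done
}

# Toy example (hypothetical protein sequence):
# prints position 4, window YLQARTFIL, 2 mismatches
find_mimics YLQPRTFLL MAAYLQARTFILGG
```

Passing a third argument changes the threshold, e.g. `find_mimics YLQPRTFLL MAAYLQARTFILGG 1` prints nothing because the only close window has 2 mismatches.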
- Compile matchfinder with a command like:
cc matchfinder.c -o matchfinder.o
- Copy agrep_for_crossreactivity.sh and matchfinder.o to somewhere convenient on your computer.
- Open agrep_for_crossreactivity.sh with a text editor and modify line 42 so that the filepath points to where you just put matchfinder.o.
- Make a txt file with the list of peptides of interest, e.g. peptides.txt. The script has only been tested with this txt file in the same folder as the agrep_for_crossreactivity.sh script (it probably works with a filepath to a txt file in a different location, but I haven't checked).
- Get a proteome fasta file of interest.
- Run with a command like the template:
./agrep_for_crossreactivity.sh peptides proteome.fasta
Note: leave the '.txt' extension off the peptides filename, but do include the '.fasta' extension on the proteome filename. Sorry this is a bit rough-n-ready.
The script should print progress messages as it runs until it says 'Finished'.
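Because the peptide list is given without its extension while the proteome is given with one, it is easy to mistype the command. The sketch below is a hypothetical pre-flight helper, not part of the repository: it strips a trailing '.txt' if one was typed, checks that both input files exist in the current folder, and prints the command to run. The function names `peptide_basename` and `check_inputs` are assumptions made for this example.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the tool.

# Drop a trailing ".txt" so the name matches what the script expects.
peptide_basename() {
    local name=$1
    echo "${name%.txt}"
}

# Check that <peptides>.txt and the proteome file exist, then print
# the command line to run. Returns non-zero if an input is missing.
check_inputs() {
    local peptides proteome=$2
    peptides=$(peptide_basename "$1")
    if [[ ! -f "${peptides}.txt" ]]; then
        echo "missing peptide list: ${peptides}.txt" >&2
        return 1
    fi
    if [[ ! -f "$proteome" ]]; then
        echo "missing proteome file: $proteome" >&2
        return 1
    fi
    echo "./agrep_for_crossreactivity.sh ${peptides} ${proteome}"
}

# Example (run from the folder containing the script and inputs):
#   check_inputs peptides.txt proteome.fasta
```

The helper only prints the command rather than running it, so it can be checked by eye first.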
The results are written to a folder named x_output, where x is the name of the file containing the peptide list. This folder contains a txt file of agrep output for each peptide, plus a summary table in a file named output.csv giving the precise match and its location for every peptide that found a match in the proteome.
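As a quick sanity check after a run, something like the following sketch can report what ended up in the output folder. It assumes the folder is named after the peptide list as given on the command line (e.g. peptides_output for peptides.txt) and makes no assumption about the columns of output.csv or whether it has a header row; `summarise_output` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarise the contents of <name>_output.
summarise_output() {
    local dir="${1}_output"
    local n_txt n_lines=0
    if [[ ! -d "$dir" ]]; then
        echo "no such folder: $dir" >&2
        return 1
    fi
    # Number of per-peptide agrep output files (top level only).
    n_txt=$(( $(find "$dir" -maxdepth 1 -type f -name '*.txt' | wc -l) ))
    # Number of lines in the summary table, if it was written.
    if [[ -f "$dir/output.csv" ]]; then
        n_lines=$(( $(wc -l < "$dir/output.csv") ))
    fi
    printf 'per-peptide agrep files: %d\noutput.csv lines: %d\n' \
        "$n_txt" "$n_lines"
}

# Example:
#   summarise_output peptides
```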