shortpepr

The shortpepr tool can be used to generate PWMs from peptides and to calculate the information content. It is intended to use with peptides that are bound to MHCI molecules, but can be applied to all gap free peptide alignments.

Installation

You can install the development version of shortpepr like so:

if(!require(devtools)){
    install.packages("devtools")
    library(devtools)
}

devtools::build()

install.packages("<path_to_repo>/shortpepr_0.1.1.tar.gz", repos = NULL, type="source")

Examples

Shortpepr is designed to work with tibbles from tidyverse. As example epitopes from the IEDB are loaded.

suppressMessages({
library(shortpepr)
library(tidyverse)
})

# dataframe containing the peptide, allele and the length of the peptide
example_epitopes <- system.file(
  "extdata",
  "test_epitopes.tsv", 
  package = "shortpepr")
example_epitopes_df <- read_tsv(example_epitopes)
#> Rows: 888 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (2): peptide, allele
#> dbl (1): len
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(example_epitopes_df)
#> # A tibble: 6 × 3
#>   peptide    allele   len
#>   <chr>      <chr>  <dbl>
#> 1 AIVNYANL   H2Kb       8
#> 2 TSYKFESV   H2Kb       8
#> 3 VSLDYINTM  H2Kb       9
#> 4 AAFEFINSL  H2Kb       9
#> 5 YAPVSPIVI  H2Db       9
#> 6 YSLPNAGDVI H2Db      10

To generate PWMs background frequencies are required.

# the background count and frequencies of amino acids (e.g. from the proteome)
example_background <- system.file(
  "extdata",
  "test_background.tsv", 
  package = "shortpepr")
example_background_df <- read_tsv(example_background) %>% select(name, freq) %>% deframe()
#> Rows: 20 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (1): name
#> dbl (2): value, freq
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
example_background_df
#>          A          C          D          E          F          G          H 
#> 0.06851810 0.02221453 0.04853844 0.07083450 0.03628429 0.06333933 0.02598077 
#>          I          K          L          M          N          P          Q 
#> 0.04353847 0.05723104 0.09947218 0.02223334 0.03581281 0.06203675 0.04845813 
#>          R          S          T          V          W          Y 
#> 0.05617254 0.08586854 0.05443012 0.06068323 0.01188343 0.02646946

A PWM can be generated from a vector of peptides.

# a PWM can be directly generated from a list of peptides and a given background frequency
test_pwm <- example_epitopes_df %>% 
  filter(allele == "H2Db", len==9) %>% 
  pull(peptide) %>% 
  seqs2pwm(background = example_background_df)

test_pwm
#> # A tibble: 9 × 20
#>        A       C       F      G       H      I      K      L       M      N
#>    <dbl>   <dbl>   <dbl>  <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl>
#> 1  0.796  0.251   1.54   -0.261 -0.238   0.280 -0.115 -1.18  -0.0135  0.884
#> 2  1.74   1.25   -2.04    0.962 -1.56   -1.30  -1.11  -1.33   1.25   -1.02 
#> 3 -1.96  -1.33   -0.235  -1.26  -0.238   1.70  -1.70   0.503  1.25    1.22 
#> 4 -0.637 -0.0123 -1.46   -0.261  0.0248  1.15  -1.11  -0.590  0.665   1.06 
#> 5 -2.37  -0.334  -0.720  -0.846 -0.560  -1.30  -2.70  -1.04  -1.34    4.23 
#> 6  0.626 -0.0123  0.417   0.154 -0.975   0.280 -0.530 -0.105  0.987  -0.701
#> 7  0.363 -0.334  -0.0420 -0.846  0.247   0.695 -1.38  -1.33  -1.34   -0.216
#> 8 -0.152  0.836   1.48   -2.85   0.440  -0.135 -0.699 -0.327 -0.335  -0.701
#> 9 -1.37  -0.0123  0.128  -3.85  -2.56    2.55  -2.70   1.77   2.57   -2.02 
#> # ℹ 10 more variables: Q <dbl>, R <dbl>, S <dbl>, T <dbl>, V <dbl>, W <dbl>,
#> #   Y <dbl>, E <dbl>, P <dbl>, D <dbl>

The funciton create_pwm_df generates a PWM for all allele length combinations for which at least min_datapoints are available.

pwm_df <- example_epitopes_df %>%
  create_pwm_df(min_datapoints=50, background=example_background_df)

pwm_df
#> # A tibble: 4 × 4
#>     len allele     n pwm              
#>   <dbl> <chr>  <int> <list>           
#> 1     8 H2Kb     217 <tibble [8 × 20]>
#> 2     9 H2Db     207 <tibble [9 × 20]>
#> 3     9 H2Kb     110 <tibble [9 × 20]>
#> 4     9 H2Kd     116 <tibble [9 × 20]>

Peptides can be scored based on a given PWM dataframe:

example_peptide = "ASNENAETM"
score_seq_allele(example_peptide, "H2Db", pwm_df)
#> [1] 0.7596006

For a given frequency matrix, the information content at the different positions can be determined by using the function calculate_information_content.

example_epitopes_df %>%
  filter(allele == "H2Db", len==9) %>%
  pull(peptide) %>% 
  seqs2pfm() %>% 
  calculate_information_content()
#> [1] 0.4061437 0.8142406 0.6516111 0.2584148 2.1069134 0.1866548 0.3894873
#> [8] 0.3018932 1.4782398

Developer information

Before commiting changes update the documentation and this README:

devtools::document()
devtools::build_readme()

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
R		R
inst/extdata		inst/extdata
man		man
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
DESCRIPTION		DESCRIPTION
NAMESPACE		NAMESPACE
README.Rmd		README.Rmd
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

shortpepr

Installation

Examples

Developer information

About

Uh oh!

Releases

Packages

Languages

LKress/shortpepr

Folders and files

Latest commit

History

Repository files navigation

shortpepr

Installation

Examples

Developer information

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages