Skip to content

LKress/shortpepr

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

shortpepr

The shortpepr tool can be used to generate PWMs from peptides and to calculate the information content. It is intended to use with peptides that are bound to MHCI molecules, but can be applied to all gap free peptide alignments.

Installation

You can install the development version of shortpepr like so:

if(!require(devtools)){
    install.packages("devtools")
    library(devtools)
}

devtools::build()

install.packages("<path_to_repo>/shortpepr_0.1.1.tar.gz", repos = NULL, type="source")

Examples

Shortpepr is designed to work with tibbles from tidyverse. As example epitopes from the IEDB are loaded.

suppressMessages({
library(shortpepr)
library(tidyverse)
})

# dataframe containing the peptide, allele and the length of the peptide
example_epitopes <- system.file(
  "extdata",
  "test_epitopes.tsv", 
  package = "shortpepr")
example_epitopes_df <- read_tsv(example_epitopes)
#> Rows: 888 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (2): peptide, allele
#> dbl (1): len
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(example_epitopes_df)
#> # A tibble: 6 × 3
#>   peptide    allele   len
#>   <chr>      <chr>  <dbl>
#> 1 AIVNYANL   H2Kb       8
#> 2 TSYKFESV   H2Kb       8
#> 3 VSLDYINTM  H2Kb       9
#> 4 AAFEFINSL  H2Kb       9
#> 5 YAPVSPIVI  H2Db       9
#> 6 YSLPNAGDVI H2Db      10

To generate PWMs background frequencies are required.

# the background count and frequencies of amino acids (e.g. from the proteome)
example_background <- system.file(
  "extdata",
  "test_background.tsv", 
  package = "shortpepr")
example_background_df <- read_tsv(example_background) %>% select(name, freq) %>% deframe()
#> Rows: 20 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (1): name
#> dbl (2): value, freq
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
example_background_df
#>          A          C          D          E          F          G          H 
#> 0.06851810 0.02221453 0.04853844 0.07083450 0.03628429 0.06333933 0.02598077 
#>          I          K          L          M          N          P          Q 
#> 0.04353847 0.05723104 0.09947218 0.02223334 0.03581281 0.06203675 0.04845813 
#>          R          S          T          V          W          Y 
#> 0.05617254 0.08586854 0.05443012 0.06068323 0.01188343 0.02646946

A PWM can be generated from a vector of peptides.

# a PWM can be directly generated from a list of peptides and a given background frequency
test_pwm <- example_epitopes_df %>% 
  filter(allele == "H2Db", len==9) %>% 
  pull(peptide) %>% 
  seqs2pwm(background = example_background_df)

test_pwm
#> # A tibble: 9 × 20
#>        A       C       F      G       H      I      K      L       M      N
#>    <dbl>   <dbl>   <dbl>  <dbl>   <dbl>  <dbl>  <dbl>  <dbl>   <dbl>  <dbl>
#> 1  0.796  0.251   1.54   -0.261 -0.238   0.280 -0.115 -1.18  -0.0135  0.884
#> 2  1.74   1.25   -2.04    0.962 -1.56   -1.30  -1.11  -1.33   1.25   -1.02 
#> 3 -1.96  -1.33   -0.235  -1.26  -0.238   1.70  -1.70   0.503  1.25    1.22 
#> 4 -0.637 -0.0123 -1.46   -0.261  0.0248  1.15  -1.11  -0.590  0.665   1.06 
#> 5 -2.37  -0.334  -0.720  -0.846 -0.560  -1.30  -2.70  -1.04  -1.34    4.23 
#> 6  0.626 -0.0123  0.417   0.154 -0.975   0.280 -0.530 -0.105  0.987  -0.701
#> 7  0.363 -0.334  -0.0420 -0.846  0.247   0.695 -1.38  -1.33  -1.34   -0.216
#> 8 -0.152  0.836   1.48   -2.85   0.440  -0.135 -0.699 -0.327 -0.335  -0.701
#> 9 -1.37  -0.0123  0.128  -3.85  -2.56    2.55  -2.70   1.77   2.57   -2.02 
#> # ℹ 10 more variables: Q <dbl>, R <dbl>, S <dbl>, T <dbl>, V <dbl>, W <dbl>,
#> #   Y <dbl>, E <dbl>, P <dbl>, D <dbl>

The funciton create_pwm_df generates a PWM for all allele length combinations for which at least min_datapoints are available.

pwm_df <- example_epitopes_df %>%
  create_pwm_df(min_datapoints=50, background=example_background_df)

pwm_df
#> # A tibble: 4 × 4
#>     len allele     n pwm              
#>   <dbl> <chr>  <int> <list>           
#> 1     8 H2Kb     217 <tibble [8 × 20]>
#> 2     9 H2Db     207 <tibble [9 × 20]>
#> 3     9 H2Kb     110 <tibble [9 × 20]>
#> 4     9 H2Kd     116 <tibble [9 × 20]>

Peptides can be scored based on a given PWM dataframe:

example_peptide = "ASNENAETM"
score_seq_allele(example_peptide, "H2Db", pwm_df)
#> [1] 0.7596006

For a given frequency matrix, the information content at the different positions can be determined by using the function calculate_information_content.

example_epitopes_df %>%
  filter(allele == "H2Db", len==9) %>%
  pull(peptide) %>% 
  seqs2pfm() %>% 
  calculate_information_content()
#> [1] 0.4061437 0.8142406 0.6516111 0.2584148 2.1069134 0.1866548 0.3894873
#> [8] 0.3018932 1.4782398

Developer information

Before commiting changes update the documentation and this README:

devtools::document()
devtools::build_readme()

About

Generate PWMs from peptides

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages