ribbit

Ribbit is a tool to identify tandem repeats (TRs) of variable motif sizes in genomes. The tool is specialised to resolve complex TR structures and to accurately define the periodicity and consensus motif of each tandem repeat. The algorithm converts DNA sequences to a 2-bit format and uses basic bit operations to identify tandem repeat sequences. Ribbit scans a DNA sequence for potential TRs at each periodicity and compares the resulting periodicity annotations based on their purity. It can therefore resolve overlapping or nested tandem repeats with higher accuracy.
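
To make the 2-bit idea concrete, here is a minimal Python sketch that packs each base into two bits and XORs the packed sequence against a copy shifted by one period, so that matching positions show up as zero bit pairs. The code table, the packing order, and the function names `encode_2bit` and `match_mask` are illustrative assumptions, not Ribbit's actual implementation.

```python
# Hypothetical 2-bit code table; Ribbit's real encoding may differ.
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode_2bit(seq):
    """Pack a DNA string into one integer, two bits per base.

    The first base ends up in the highest-order bits.
    """
    value = 0
    for base in seq.upper():
        value = (value << 2) | CODE[base]
    return value


def match_mask(seq, period):
    """Return a list of 0/1 flags, 1 where seq[i] == seq[i + period]."""
    n = len(seq)
    packed = encode_2bit(seq)
    width = 2 * (n - period)
    head = packed >> (2 * period)        # bases 0 .. n-period-1
    tail = packed & ((1 << width) - 1)   # bases period .. n-1
    diff = head ^ tail                   # zero bit pair == identical bases
    mask = []
    for i in range(n - period):
        shift = 2 * (n - period - 1 - i)
        mask.append(0 if (diff >> shift) & 0b11 else 1)
    return mask
```

For `ACACACAT` and a period of 2, `match_mask` returns `[1, 1, 1, 1, 1, 0]`. Long runs of ones mark stretches where the sequence repeats with that period.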

Table of Contents

  1. Installation
  2. Usage
  3. Inputs and Outputs
  4. Citation
  5. Contact

Installation

To install Ribbit, clone the repository and install the dependencies using the following commands:

Installing dependencies

1. Install the Boost library (Debian/Ubuntu)

sudo apt-get install libboost-all-dev

2. Install pybedtools

conda install pybedtools

OR

pip install pybedtools

Compiling ribbit

git clone https://github.com/SowpatiLab/ribbit.git
cd ribbit
make

Usage

Here’s a basic usage example:

./ribbit [options] -i sequence.fasta -o results.bed

To view detailed help information:

./ribbit -h

  -h [ --help ]                 Ribbit tool identifies short tandem repeats 
                                with allowed levels of impurity.
  -i [ --input-file ] arg       File path for the input fasta file.
  -o [ --output-file ] arg      File path for the output file.
  -m [ --min-motif-length ] arg The minimum length of the motif of the repeats 
                                to be identified. Default: 2
  -M [ --max-motif-length ] arg The maximum length of the motif of the repeats 
                                to be identified. Default: 100
  -p [ --purity ] arg           Threshold value for the continuous number of 
                                ones found in a seed. Default: 0.85
  -l [ --min-length ] arg       The minimum length of the repeat. Default: 12
  --min-units arg               The minimum number of units of the repeat. Can 
                                be an integer value for cutoff across all motif
                                sizes, or a tab-separated file with two columns: 
                                the first is the motif size and the second is 
                                the unit cutoff. Default: 2
  --perfect-units arg           The minimum number of complete units of the 
                                repeat. Can be an integer value for cutoff 
                                across all motif sizes, or a tab-separated file 
                                with two columns: the first is the motif size and 
                                the second is the unit cutoff. Default: 2
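
The per-motif-size cutoff file accepted by `--min-units` and `--perfect-units` can be generated with a few lines of Python. This is a sketch under assumptions: the cutoff values and the function names are hypothetical, and it assumes the file has no header row, which the help text does not state.

```python
def format_unit_cutoffs(cutoffs):
    """Render {motif_size: unit_cutoff} as two tab-separated columns."""
    lines = []
    for motif_size, unit_cutoff in sorted(cutoffs.items()):
        lines.append(f"{motif_size}\t{unit_cutoff}\n")
    return "".join(lines)


def write_unit_cutoffs(path, cutoffs):
    """Write the cutoff table to a file for --min-units or --perfect-units."""
    with open(path, "w") as handle:
        handle.write(format_unit_cutoffs(cutoffs))


# Hypothetical values: require more units for shorter motifs.
EXAMPLE_CUTOFFS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 2}
```

Calling `write_unit_cutoffs("min_units.tsv", EXAMPLE_CUTOFFS)` produces a file that could then be passed as `--min-units min_units.tsv`.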

Inputs and Outputs

-i or --input-file

Expects: STRING (to be used as filename)

The input file must be a valid FASTA file.

-o or --output-file

Expects: STRING (to be used as filename)

Ribbit writes its output as a BED (.bed) file.

BED file output columns

| S.No | Column | Description |
|------|--------|-------------|
| 1 | Chromosome | Chromosome or sequence name, as specified by the first word in the FASTA header |
| 2 | Repeat Start | 0-based start position of the repeat in the chromosome |
| 3 | Repeat Stop | End position of the repeat in the chromosome |
| 4 | Repeat Class | Class of the repeat, as grouped by its cyclical variations |
| 5 | Repeat Length | Total length of the identified repeat in nt |
| 6 | Motif Count | Number of complete motifs in the repeat |
| 7 | Purity | Purity of the repeat region (perfect repeat = 1) |
| 8 | Repeat Strand | Strand of the repeat, based on its cyclical variation |
| 9 | CIGAR | Representation of the type of imperfections |
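
Grouping motifs "by their cyclical variations" can be illustrated with one common convention: take every rotation of the motif and of its reverse complement, and use the lexicographically smallest string as the class. The sketch below follows that convention. Whether Ribbit uses exactly this rule is an assumption, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rotations(motif):
    """All cyclic rotations of a motif, e.g. AC -> [AC, CA]."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def reverse_complement(motif):
    """Reverse complement of a DNA motif."""
    return motif.translate(COMPLEMENT)[::-1]


def repeat_class(motif):
    """Return (class, strand) under the smallest-rotation convention.

    The strand is '+' when the class is a rotation of the motif itself
    and '-' when it is a rotation of the reverse complement.
    """
    motif = motif.upper()
    forward = rotations(motif)
    reverse = rotations(reverse_complement(motif))
    canonical = min(forward + reverse)
    strand = "+" if canonical in forward else "-"
    return canonical, strand
```

Under this convention `CA` maps to `("AC", "+")` and `GT` maps to `("AC", "-")`, so both fall into the same class.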

-m or --min-motif-length

The minimum length of the motif of the repeats to be identified.

-M or --max-motif-length

The maximum length of the motif of the repeats to be identified.

-p or --purity

Threshold value for the continuous number of ones found in a seed. Default: 0.85
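
For scripted runs, the options documented above can be assembled into a command line from Python. This is a minimal sketch: the function names are hypothetical, the defaults mirror those in the help text, and the binary is assumed to sit at `./ribbit` after running `make`.

```python
import subprocess


def build_ribbit_command(fasta, bed, min_motif=2, max_motif=100,
                         purity=0.85, min_length=12, binary="./ribbit"):
    """Build the argument list for a ribbit run using documented flags."""
    return [
        binary,
        "-i", str(fasta),
        "-o", str(bed),
        "-m", str(min_motif),
        "-M", str(max_motif),
        "-p", str(purity),
        "-l", str(min_length),
    ]


def run_ribbit(fasta, bed, **options):
    """Run ribbit and raise CalledProcessError if it exits with an error."""
    command = build_ribbit_command(fasta, bed, **options)
    subprocess.run(command, check=True)
```

For example, `run_ribbit("sequence.fasta", "results.bed", max_motif=50)` would restrict the search to motifs of at most 50 nt.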

BED file output example

| Chromosome | Start | End | Motif | Purity | Strand | CIGAR | Motif Size | Repeat Length | Motif Units |
|------------|-------|-----|-------|--------|--------|-------|------------|---------------|-------------|
| Test_Seq | 90196 | 90393 | AC | 0.9494 | + | 3=1X3=1X5=1D82=1X17=1X19=1X31=1I2=1X3=1X21=1I2= | 2 | 197 | 98 |
| Test_Seq | 137451 | 137470 | CCCGCT | 1 | + | 19= | 6 | 19 | 3 |
| Test_Seq | 136254 | 136401 | GT | 0.9127 | + | 6=1X9=1D20=1D15=1X12=1X5=1X25=1X9=1X7=1X5=1X9=1X10=1X2=1X2= | 2 | 147 | 73 |
| Test_Seq | 139286 | 139306 | AGTTGCTT | 0.95 | + | 8=1X11= | 8 | 20 | 2 |
| Test_Seq | 3538110 | 3538168 | AATAGCAAGAGCCAGAGCTAGAGCAAAG | 0.8813 | + | 4=1X1=2I30=1X9=1X5=1X1=1D2= | 8 | 58 | 7 |
| Test_Seq | 4197438 | 4197487 | CACAGCCAGCT | 0.9591 | + | 26=1X12=1X9= | 11 | 49 | 4 |
| Test_Seq | 4858037 | 4858050 | CTCTTT | 0.9230 | + | 6=1I6= | 6 | 13 | 2 |
| Test_Seq | 5000704 | 5000745 | TATTCGTATGCGTATTC | 0.9024 | + | 4=1I22=1X4=2X7= | 17 | 41 | 2 |
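
The CIGAR column can be cross-checked against the Repeat Length and Purity columns. The sketch below assumes two relationships that are inferred from the example rows rather than documented: repeat length is the sum of the `=`, `X`, and `I` operations, and purity is the number of matched bases divided by the total of all operations. The function names are hypothetical.

```python
import re

# One CIGAR operation: a count followed by =, X, I, or D.
CIGAR_OP = re.compile(r"(\d+)([=XID])")


def cigar_counts(cigar):
    """Sum the lengths of each operation type in a CIGAR string."""
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for length, op in CIGAR_OP.findall(cigar):
        counts[op] += int(length)
    return counts


def repeat_length(cigar):
    """Bases of the sequence covered by the repeat (assumed: =, X, and I)."""
    counts = cigar_counts(cigar)
    return counts["="] + counts["X"] + counts["I"]


def purity(cigar):
    """Matched bases over all operations (assumed definition)."""
    counts = cigar_counts(cigar)
    return counts["="] / sum(counts.values())
```

For the row with CIGAR `6=1I6=`, this gives a repeat length of 13 and a purity of 12/13, which matches the reported 0.9230 when truncated to four decimal places.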

Citation

If you find Ribbit useful, please cite our manuscript: Ribbit: Accurate identification and annotation of complex tandem repeat sequences in genomes

Contact

For queries or suggestions, please contact:
Akshay Kumar Avvaru - [email protected]
Divya Tej Sowpati - [email protected]
