ribbit

Ribbit is a tool for identifying tandem repeats (TRs) of variable motif sizes in genomes. It is specialised to resolve complex TR structures and to accurately define the periodicity and consensus motif of each tandem repeat. The algorithm converts DNA sequences to a 2-bit format and uses basic bit operations to identify tandem repeat sequences. Ribbit scans a DNA sequence for potential TRs across a range of periodicities and compares the competing periodicity annotations by their purity. This lets it resolve overlapping and nested tandem repeats with higher accuracy.
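To make the 2-bit idea concrete, here is a minimal Python sketch. It is not Ribbit's implementation: the base-to-bits mapping and the helper names `encode_2bit` and `match_mask` are assumptions made for illustration. It packs a sequence into an integer at two bits per base, then XORs the sequence against a copy of itself shifted by a candidate period, so that a run of matches marks a possible repeat of that period.

```python
# Illustrative sketch only; the encoding table and function names are assumptions.
ENCODING = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode_2bit(seq: str) -> int:
    """Pack a DNA string into one integer, two bits per base.

    The first base ends up in the highest-order bits.
    """
    value = 0
    for base in seq.upper():
        value = (value << 2) | ENCODING[base]
    return value


def match_mask(seq: str, period: int) -> str:
    """Return a string of flags where flag i is '1' when seq[i] == seq[i + period]."""
    n = len(seq)
    if not 0 < period < n:
        raise ValueError("period must be between 1 and len(seq) - 1")
    bits = encode_2bit(seq)
    overlap = n - period
    left = bits >> (2 * period)                # bases 0 .. n-period-1
    right = bits & ((1 << (2 * overlap)) - 1)  # bases period .. n-1
    diff = left ^ right                        # a 00 pair means the two bases are equal
    flags = []
    for i in range(overlap):
        pair = (diff >> (2 * (overlap - 1 - i))) & 0b11
        flags.append("1" if pair == 0 else "0")
    return "".join(flags)


print(match_mask("ACACACGT", 2))
print(match_mask("ACACACGT", 3))
```

With period 2 the mask for `ACACACGT` starts with a run of ones covering the `AC` repeat, while period 3 gives no matches at all. Comparing such masks across periods is one way to picture how competing periodicities could be ranked.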

Installation

To install Ribbit, install the dependencies, then clone the repository and compile it using the following commands:

Installing dependencies

1. Install boost library

sudo apt-get install libboost-all-dev

2. Install pybedtools

conda install pybedtools

OR

pip install pybedtools

Compiling ribbit

git clone https://github.com/SowpatiLab/ribbit.git
cd ribbit
make

Usage

Here’s a basic usage example:

./ribbit [options] -i sequence.fasta -o results.bed

To view detailed help information:

./ribbit -h

  -h [ --help ]                 Ribbit tool identifies short tandem repeats 
                                with allowed levels of impurity.
  -i [ --input-file ] arg       File path for the input fasta file.
  -o [ --output-file ] arg      File path for the output file.
  -m [ --min-motif-length ] arg The minimum length of the motif of the repeats 
                                to be identified. Default: 2
  -M [ --max-motif-length ] arg The maximum length of the motif of the repeats 
                                to be identified. Default: 100
  -p [ --purity ] arg           Threshold value for the continuous number of 
                                ones found in a seed. Default: 0.85
  -l [ --min-length ] arg       The minimum length of the repeat. Default: 12
  --min-units arg               The minimum number of units of the repeat. Can 
                                be an integer value for cutoff across all motif
                                sizes, or a tab-separated file with two columns: 
                                the first is the motif size and the second is 
                                the unit cutoff. Default: 2
  --perfect-units arg           The minimum number of complete units of the 
                                repeat. Can be an integer value for cutoff 
                                across all motif sizes, or a tab-separated file 
                                with two columns: the first is the motif size and 
                                the second is the unit cutoff. Default: 2
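The options above can be assembled programmatically. The following Python sketch uses only the flags listed in the help text; the helper names, the file names, and the assumption that the cutoff file has no header row are hypothetical.

```python
# Sketch under stated assumptions: helper names and file names are hypothetical,
# and the cutoff file is assumed to have no header row.


def format_unit_cutoffs(cutoffs: dict) -> str:
    """Render motif-size / unit-cutoff pairs as two tab-separated columns."""
    rows = [f"{size}\t{units}" for size, units in sorted(cutoffs.items())]
    return "\n".join(rows) + "\n"


def build_ribbit_command(fasta, bed, min_motif=2, max_motif=100,
                         purity=0.85, min_units=None):
    """Build an argument list for ./ribbit from the documented options."""
    cmd = [
        "./ribbit",
        "-i", str(fasta),
        "-o", str(bed),
        "-m", str(min_motif),
        "-M", str(max_motif),
        "-p", str(purity),
    ]
    if min_units is not None:
        # Either an integer cutoff or the path to a two-column cutoff file.
        cmd += ["--min-units", str(min_units)]
    return cmd


# Example: require 6 units for dinucleotide motifs and 3 for hexanucleotide motifs.
print(format_unit_cutoffs({2: 6, 6: 3}), end="")
print(build_ribbit_command("sequence.fasta", "results.bed", max_motif=50))
```

The text returned by `format_unit_cutoffs` can be written to a file and that path passed as `min_units`. The argument list can then be handed to `subprocess.run(cmd, check=True)`.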

Inputs and Outputs

-i or --input-file

Expects: STRING (to be used as filename)

The input file must be a valid FASTA file.

-o or --output-file

Expects: STRING (to be used as filename)

Ribbit writes its output as a BED file.

bed file output columns

S.No	Column	Description
1	Chromosome	Chromosome or sequence name, as specified by the first word in the FASTA header
2	Repeat Start	0-based start position of the repeat in the chromosome
3	Repeat Stop	End position of the repeat in the chromosome
4	Repeat Class	Class of the repeat, as grouped by cyclical variations of the motif
5	Repeat Length	Total length of the identified repeat in nt
6	Motif count	Number of complete motifs in the repeat
7	Purity	Purity of the repeat region (perfect repeat = 1)
8	Repeat Strand	Strand of the repeat, based on the cyclical variation of the motif
9	CIGAR	CIGAR string representing the type of imperfections
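"Cyclical variations" refers to rotations of a motif: AC and CA describe the same repeat read from a different starting base. The Python sketch below shows one common convention for grouping them, which takes the lexicographically smallest rotation of the motif or of its reverse complement. This convention is an assumption, not a confirmed description of Ribbit's grouping or strand assignment, and the function names are hypothetical.

```python
# Illustration of motif rotations; the grouping rule is an assumed convention,
# not confirmed as the one Ribbit uses.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def rotations(motif: str) -> list:
    """All cyclic rotations of a motif, starting with the motif itself."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def reverse_complement(motif: str) -> str:
    """Reverse complement of an upper-case DNA motif."""
    return motif.translate(COMPLEMENT)[::-1]


def repeat_class(motif: str) -> tuple:
    """Return (class, strand) using the smallest rotation on either strand."""
    forward = min(rotations(motif))
    reverse = min(rotations(reverse_complement(motif)))
    if forward <= reverse:
        return forward, "+"
    return reverse, "-"


print(rotations("ACG"))
print(repeat_class("CA"))
print(repeat_class("GT"))
```

Under this convention CA, AC, GT and TG all fall into the class AC. Ribbit's reported motif and strand may follow a different rule, so treat the output of this sketch as an illustration of the concept only.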

-m or --min-motif-length

The minimum length of the motif of the repeats to be identified.

-M or --max-motif-length

The maximum length of the motif of the repeats to be identified.

-p or --purity

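Purity threshold, described in the help text as the threshold value for the continuous number of ones found in a seed. Default: 0.85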

Bed file output example

Chromosome	Start	End	Motif	Purity	Strand	CIGAR	Motif Size	Repeat Length	Motif Units
Test_Seq	90196	90393	AC	0.9494	+	3=1X3=1X5=1D82=1X17=1X19=1X31=1I2=1X3=1X21=1I2=	2	197	98
Test_Seq	137451	137470	CCCGCT	1	+	19=	6	19	3
Test_Seq	136254	136401	GT	0.9127	+	6=1X9=1D20=1D15=1X12=1X5=1X25=1X9=1X7=1X5=1X9=1X10=1X2=1X2=	2	147	73
Test_Seq	139286	139306	AGTTGCTT	0.95	+	8=1X11=	8	20	2
Test_Seq	3538110	3538168	AATAGCAAGAGCCAGAGCTAGAGCAAAG	0.8813	+	4=1X1=2I30=1X9=1X5=1X1=1D2=	8	58	7
Test_Seq	4197438	4197487	CACAGCCAGCT	0.9591	+	26=1X12=1X9=	11	49	4
Test_Seq	4858037	4858050	CTCTTT	0.9230	+	6=1I6=	6	13	2
Test_Seq	5000704	5000745	TATTCGTATGCGTATTC	0.9024	+	4=1I22=1X4=2X7=	17	41	2
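The CIGAR column can be unpacked with a few lines of Python. In the example rows above, the purity value matches the number of matching positions (`=`) divided by the total number of CIGAR operations, and the repeat length matches the sum of the `=`, `X` and `I` counts. Both relationships are inferred from these rows rather than documented, so the sketch below treats them as assumptions; the function names are hypothetical.

```python
import re

# One CIGAR operation: a count followed by =, X, I or D.
CIGAR_OP = re.compile(r"(\d+)([=XID])")


def cigar_counts(cigar: str) -> dict:
    """Sum the counts of each operation type in a CIGAR string."""
    if not re.fullmatch(r"(?:\d+[=XID])+", cigar):
        raise ValueError(f"unexpected CIGAR string: {cigar!r}")
    counts = {"=": 0, "X": 0, "I": 0, "D": 0}
    for number, op in CIGAR_OP.findall(cigar):
        counts[op] += int(number)
    return counts


def cigar_purity(cigar: str) -> float:
    """Matches divided by all operations (assumed definition, inferred from the examples)."""
    counts = cigar_counts(cigar)
    return counts["="] / sum(counts.values())


def cigar_span(cigar: str) -> int:
    """Bases covered in the sequence: =, X and I (assumed, inferred from the examples)."""
    counts = cigar_counts(cigar)
    return counts["="] + counts["X"] + counts["I"]


for cigar in ("8=1X11=", "6=1I6=", "19="):
    print(cigar, cigar_counts(cigar), cigar_span(cigar))
```

For the AGTTGCTT row, `cigar_purity("8=1X11=")` gives 19 matches out of 20 operations, in line with the reported purity of 0.95, and `cigar_span` returns the reported repeat length of 20.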

Citation

If you find Ribbit useful, please cite our manuscript: Ribbit: Accurate identification and annotation of complex tandem repeat sequences in genomes.

Contact

For queries or suggestions, please contact:
Akshay Kumar Avvaru - [email protected]
Divya Tej Sowpati - [email protected]
