conda config --add envs_dirs /zfs/omics/projects/bioinformatics/software/miniconda3/envs/Ribodetector
Introduction
RiboDetector is a software developed to accurately yet rapidly detect and remove rRNA sequences from metagenomic, metatranscriptomic, and ncRNA sequencing data (Deng et al. 2022). It was developed based on LSTMs and optimized for both GPU and CPU usage to achieve a 10 times on CPU and 50 times on a consumer GPU faster runtime compared to the current state-of-the-art software. Moreover, it is very accurate, with ~10 times fewer false classifications. Finally, it has a low level of bias towards any GO functional groups.
For more information, check out the manual
Installation
Installed on crunchomics: Yes,
- Ribodetector v0.3.1 is installed as part of the bioinformatics share. If you have access to crunchomics and have not yet access to the bioinformatics you can send an email with your Uva netID to Nina Dombrowski.
- Afterwards, you can add the bioinformatics share as follows (if you have already done this in the past, you don’t need to run this command):
If you want to install it yourself, you can run:
mamba create -p /zfs/omics/projects/bioinformatics/software/miniconda3/envs/ribodetector_0.3.1 -c bioconda ribodetector python=3.8Usage
In the example below, we use the CPU version of ribodetector. If you have access to GPUs, check out the manual for more information.
To run ribodetector, you need to provide it with the sequencing length, You can set the -l parameter to the mean read length if you have reads with variable length. The mean read length can be computed with seqkit stats.
conda activate ribodetector_0.3.1
ribodetector_cpu -t 20 \
-i sample_R1_trim.fastq.gz sampleR2_trim.fastq.gz \
-l 138 \
-e rrna \
--chunk_size 256 \
-o sampleR1_nonrrna.fastq.gz sample_R2_nonrrna.fastq.gz \
--log sample.log
conda deactivateoptional arguments:
-h,--helpshow this help message and exit-lLEN,--lenLEN Sequencing read length (mean length). Note: the accuracy reduces for reads shorter than 40.-i[INPUT [INPUT …]],--input[INPUT [INPUT …]] Path of input sequence files (fasta and fastq), the second file will be considered as second end if two files given.-o[OUTPUT [OUTPUT …]],--output[OUTPUT [OUTPUT …]] Path of the output sequence files after rRNAs removal (same number of files as input). (Note: 2 times slower to write gz files)-r[RRNA [RRNA …]],--rrna[RRNA [RRNA …]] Path of the output sequence file of detected rRNAs (same number of files as input)-e{rrna,norrna,both,none},--ensure{rrna,norrna,both,none} Ensure which classificaion has high confidence for paired end reads. norrna: output only high confident non-rRNAs, the rest are clasified as rRNAs; rrna: vice versa, only high confident rRNAs are classified as rRNA and the rest output as non-rRNAs; both: both non-rRNA and rRNA prediction with high confidence; none: give label based on the mean probability of read pair. (Only applicable for paired end reads, discard the read pair when their predicitons are discordant)-tTHREADS,--threadsTHREADS number of threads to use. (default: 20)--chunk_sizeCHUNK_SIZE chunk_size * 1024 reads to load each time. When chunk_size=1000 and threads=20, consumming ~20G memory, better to be multiples of the number of threads..--logLOG Log file name-v,--versionShow program’s version number and exit