Bioinformatics guidance page

This website provides documentation for software commonly used in bioinformatic data analyses, as well as tutorials on various bioinformatic subjects. On this page, the software is organized by topic, and each topic comes with a list of commonly used tools.

If you are working at the University of Amsterdam (UvA) Institute for Biodiversity and Ecosystem Dynamics (IBED) and want to know more about the computational resources that are available, please also visit the computational support team's website and our website with more computational resources.

Please be aware that this page is a work in progress and will be updated gradually over time. If you want to add information or feel that something is missing, feel free to send an email to n.dombrowski@uva.nl.

Useful tutorials

The Carpentries teaches workshops around the world on the foundational skills needed to work effectively and reproducibly with data and code. They are an excellent resource if you want to get started with bioinformatics.

Software Carpentry provides tutorials on:
- Bash
- Git
- Python
- R
Data Carpentry provides domain-specific tutorials, for example for ecology or genomics.
Library Carpentry offers useful tutorials if you want to transform data frames, map data sets to each other and work effectively with data.

In addition to the Carpentries, you will find a list of tutorials on more specific topics below.

Getting started with bash

A tutorial on using bash and an HPC
Using the local scratch on Crunchomics
Version control with git
A tutorial on using AWK, a command-line tool for filtering tables, extracting patterns, and more. If you want to follow this tutorial, you can download the required input files from here.
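
To give a flavour of the table filtering mentioned for AWK, here is a minimal sketch. The table, its column names and the threshold of 10 are made up for illustration and are not taken from the tutorial's input files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical tab-separated table with a header line (demo data only)
table=$(mktemp)
printf 'gene\tsample\tcount\n' > "${table}"
printf 'geneA\ts1\t12\ngeneB\ts1\t3\ngeneC\ts2\t40\n' >> "${table}"

# Keep the header (NR == 1) and every row whose third column is at least 10
filtered=$(awk -F'\t' 'NR == 1 || $3 >= 10' "${table}")
echo "${filtered}"
```

The `-F'\t'` option sets the field separator to a tab, and a pattern without an action prints every line that matches it.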

Using R

An R cookbook including some example files if you want to code along
Tutorial on data manipulation with dplyr
Tutorial on data visualization with ggplot2

Bioinformatic workflows

Bioinformatic tools A-Z

ATLAS: A metagenomic pipeline for QC, assembly, binning and annotation
Augustus: A program that predicts genes in eukaryotic genomic sequences
Autocycler: A tool for generating consensus long-read assemblies for microbial genomes
Bakta: A tool for the rapid & standardized annotation of bacterial genomes and plasmids from both isolates and MAGs
Barrnap: A tool to predict the location of ribosomal RNA genes in genomes
BMGE: A program to select regions in a multiple sequence alignment that are suited for phylogenetic inference
Bowtie2: A tool for aligning sequencing reads to genomes and other reference sequences
BUSCO: Quality assessment of (meta)genomes, transcriptomes and proteomes
CheckM2: A tool to assess the quality of a genome assembly
Chopper: A tool for quality filtering of long read data
CoverM: A DNA read coverage and relative abundance calculator focused on metagenomics applications
DeepLoc: A tool to predict the subcellular localization(s) of eukaryotic proteins
Diamond: A sequence aligner for protein and translated DNA searches
Dnaapler: A tool to re-orient a genome, for example at dnaA
DESeq2: A tool to analyse gene expression data in R

FAMA: A fast pipeline for functional and taxonomic analysis of metagenomic sequences
FastP: A tool for fast all-in-one preprocessing of FastQ files
fastplong: A tool for ultrafast preprocessing and quality control for long reads
FastQC: A quality control tool for read sequencing data
FeatureCounts: A read summarization program that counts mapped reads for genomic features
FeGenie: An HMM-based tool for the identification and categorization of iron genes
Filtlong: A tool for filtering long reads
Flye: A de novo assembler for single-molecule sequencing reads
GTDB-Tk: A software toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes
GToTree: A user-friendly workflow for phylogenomics intended to create phylogenomic trees
HMMER: A tool for searching sequence databases for sequence homologs, and for making sequence alignments
Homopolish: A tool for the removal of systematic errors in nanopore sequencing by homologous polishing

IQ-TREE: A tool for phylogenomic inferences
InterProScan: A tool to scan protein and nucleic acid sequences against InterPro signatures
ITSx: A tool to extract ITS1 and ITS2 subregions from ITS sequences
Kraken2: A taxonomic sequence classifier using k-mers

MAFFT: A multiple sequence alignment program
Medaka: A tool for assembly polishing
METABOLIC: A tool to predict functional trait profiles in genome datasets
MetaCerberus: A tool for functional assignment
mOTUs: A tool to estimate microbial abundances in Illumina and Nanopore sequencing data
Minimap2: A program to align DNA or mRNA sequences against a reference database
MultiQC: A program to summarize analysis reports
NanoClass2: A taxonomic meta-classifier for long-read 16S/18S rRNA gene sequencing data
NanoITS: A taxonomic meta-classifier for long-read ITS operon sequencing data
Nanophase: A pipeline to generate MAGs using Nanopore long and Illumina short reads from metagenomes
NanoPlot: Plotting tool for long read sequencing data
NanoQC: A quality control tool for long read sequencing data
NGSpeciesID: A tool for clustering and consensus forming of long-read amplicon sequencing data
Porechop: A tool for finding and removing adapters from Nanopore reads
Prokka: A tool to annotate bacterial, archaeal and viral genomes
Pseudofinder: A tool that detects pseudogene candidates from annotated genbank files of bacterial and archaeal genomes
pyCirclize: A tool for circular visualization, i.e. genome plots, in python
pyGenomeTracks: A tool to produce high-quality genome browser tracks

QUAST: A Quality Assessment Tool for Genome Assemblies
Ribodetector: Detect and remove rRNA sequences from metagenomic, metatranscriptomic, and ncRNA sequencing data
RSeQC: A tool to evaluate high throughput sequence data especially RNA-seq data
RSEM: A software package for estimating gene and isoform expression levels from RNA-Seq data
Samtools: A tool for manipulating alignments in SAM/BAM format
SeqKit: A tool for FASTA/Q file manipulation
SignalP6: A tool to predict the presence of signal peptides
SingleM: A tool for taxonomic profiling of shotgun metagenomes
SortMeRNA: A tool to filter ribosomal RNAs in metatranscriptomic data
STAR: An ultrafast universal RNA-seq aligner
StringTie: A fast and highly efficient assembler of RNA-Seq alignments into potential transcripts
TD2: A tool to identify candidate coding regions within transcript sequences
TransDecoder: A tool to identify candidate coding regions within transcript sequences
Trycycler: A tool for generating consensus long-read assemblies for bacterial genomes
Trinity: A tool to assemble transcript sequences from Illumina RNA-Seq data

Bioinformatic toolbox

Custom scripts: A list of custom scripts useful for bioinformatic analyses
For-loops-in-bash: How can I scale my research and run software not only on one file but on multiple files?
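
As a rough idea of what running the same command on multiple files looks like, the sketch below loops over a set of FASTA files and counts the sequences in each one. The file names, the temporary directory and the use of `grep` as the "software" are assumptions chosen so the example runs anywhere; in practice you would replace the loop body with the tool you want to run.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo data: three tiny FASTA files in a temporary directory
workdir=$(mktemp -d)
for sample in sampleA sampleB sampleC; do
    printf '>seq1\nACGT\n>seq2\nGGCC\n' > "${workdir}/${sample}.fasta"
done

# Loop over every FASTA file and run the same command on each one
for fasta in "${workdir}"/*.fasta; do
    sample=$(basename "${fasta}" .fasta)   # strip the directory and the extension
    n_seqs=$(grep -c '^>' "${fasta}")      # count the FASTA header lines
    echo "${sample}: ${n_seqs} sequences"
done > "${workdir}/summary.txt"

cat "${workdir}/summary.txt"
```

Deriving the sample name with `basename` inside the loop is a common way to build per-sample output names without typing each file name by hand.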

Useful databases A-Z

arCOG: Archaeal Clusters of Orthologous Genes (arCOGs) database to analyse archaea
COG: Clusters of Orthologous Genes (COGs) database to analyse bacteria
dbCAN: Database to analyse carbohydrate-active enzymes in genomes
FeGenie: Database to identify iron genes and iron operons in genomes
HydDB: Database to identify hydrogenases in genomes
KOfam: Database to assign KEGG orthologs (KOs) to protein sequences
NCBI-nr: NCBI’s non-redundant (nr) protein database to analyse protein sequences
Pfam: A collection of protein models to analyse protein sequences
PGAP database: Successor to the TIGRFAM database
TIGRFAM: A collection of manually curated protein families to analyse prokaryotic sequences