Diamond

Introduction

DIAMOND is a sequence aligner for protein and translated DNA searches, designed for high-performance analysis of large sequence datasets (Buchfink, Reuter, and Drost 2021). The key features are:

  • Pairwise alignment of proteins and translated DNA at 100x-10,000x the speed of BLAST.
  • Protein clustering of up to tens of billions of proteins.
  • Frameshift alignments for long read analysis.
  • Low resource requirements, making it suitable for running on standard desktops or laptops.
  • Various output formats, including BLAST pairwise, tabular and XML, as well as taxonomic classification.

For a full description, please visit the tool's website.

Installation

Installed on Crunchomics: Yes

  • DIAMOND v2.1.9 is installed by default on the Crunchomics HPC.

If you want to install the latest version yourself, you can run:

mamba create -n diamond -c bioconda -c conda-forge diamond
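
After activating the new environment (for example with `conda activate diamond`, where the name `diamond` comes from the `-n` flag above), it can help to confirm which DIAMOND is actually being picked up. The snippet below is a minimal sketch of such a check; the helper name `check_diamond` is made up for this illustration and is not part of DIAMOND.

```shell
# Hypothetical helper: reports the DIAMOND version found on the PATH, if any.
check_diamond() {
    if command -v diamond >/dev/null 2>&1; then
        # Print where the binary lives, then its version string.
        command -v diamond
        diamond version
    else
        echo "diamond is not on the PATH; activate the environment first"
    fi
}

check_diamond
```

If the reported version is 2.1.9 on Crunchomics, you are most likely still using the default installation rather than the one from your own environment.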

Usage

Generating your own diamond database

To generate your own database, follow the instructions found here. If you want to build a database from an NCBI BLAST database and include taxonomy information in your searches, you can follow an example that prepares a DIAMOND database from NCBI nr here.
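
As a rough illustration of the basic workflow, the sketch below builds a small database from a protein FASTA file with `diamond makedb` and searches it with `diamond blastp`. The file names (`proteins.fasta`, `query.fasta`, `reference`, `matches.tsv`) and the toy sequences are placeholders invented for this example; replace them with your own data. The DIAMOND commands only run if `diamond` is available on the PATH.

```shell
# Toy input: two placeholder protein sequences, used only for illustration.
cat > proteins.fasta <<'EOF'
>protA
MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVV
>protB
MSDNGPQNQRNAPRITFGGPSDSTGSNQNGERSGARSKQRRPQGLPNNTASWFTALTQHGKEDLKFPRGQ
EOF

# Reuse the same sequences as the query so the search has something to find.
cp proteins.fasta query.fasta

if command -v diamond >/dev/null 2>&1; then
    # Build reference.dmnd from the protein FASTA file.
    diamond makedb --in proteins.fasta -d reference
    # Align the query proteins against the database and write the hits to a file.
    diamond blastp -d reference -q query.fasta -o matches.tsv
else
    echo "diamond not found; activate the environment first" >&2
fi
```

For taxonomy-aware searches, `diamond makedb` also accepts the `--taxonmap`, `--taxonnodes` and `--taxonnames` options; the linked instructions remain the reference for which NCBI files to pass to them.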

References

Buchfink, Benjamin, Klaus Reuter, and Hajk-Georg Drost. 2021. “Sensitive Protein Alignments at Tree-of-Life Scale Using DIAMOND.” Nature Methods 18 (4): 366–68. https://doi.org/10.1038/s41592-021-01101-x.