singlem

Introduction

SingleM is a tool for profiling shotgun metagenomes (Woodcroft et al. 2024). It was originally designed to determine the relative abundance of bacterial and archaeal taxa in a sample. As of version 0.19.0, it can also be used to profile dsDNA phages (see Lyrebird).

It shows good accuracy in estimating the relative abundances of community members, and has a particular strength in dealing with novel lineages. The method it uses also makes it suitable for some related tasks, such as assessing eukaryotic contamination, finding bias in genome recovery, and lineage-targeted MAG recovery. It can also be used as the basis for choosing metagenomes which, when coassembled, maximise the recovery of novel MAGs (see Bin Chicken).

The main idea of SingleM is to profile metagenomes by targeting short 20 amino acid stretches (“windows”) within single copy marker genes. It finds reads which cover an entire window, and analyses these further. By constraining analysis to these short windows, it becomes possible to know how novel each read is compared to known genomes. Then, using the fact that each analysed gene is (almost always) found exactly once in each genome, the abundance of each lineage can be accurately estimated.

For more information, visit the SingleM documentation.

Installation

Installed on Crunchomics: Yes,

SingleM v0.19.0 is installed as part of the bioinformatics share. If you have access to Crunchomics and have not yet access to the bioinformatics share, then you can send an email with your Uva netID to Nina Dombrowski, n.dombrowski@uva.nl.
Afterwards, you can add the bioinformatics share as follows (if you have already done this in the past, you don’t need to run this command):

conda config --add envs_dirs /zfs/omics/projects/bioinformatics/software/miniconda3/envs/

If you want to install it yourself, you can run:

Show the code

mamba create --name singlem_0.19.0 -c bioconda singlem=0.19.0

# Test installation 
conda activate singlem_0.19.0
singlem -h
lyrebird -h

# Get metadatapackage for newest gtdb version
# Adjust directory path as needed
singlem data --output-directory data_dir

conda deactivate

Usage

SingleM comes with several tools (subcommands), for full usage information, please visit the tools documentation.

Single sample

The main tool to use is SingleM pipe. In its most common usage, the SingleM pipe subcommand takes as input raw metagenomic reads and outputs a taxonomic profile. It can also take as input whole genomes (or contigs), and can output a table of OTUs. When using metagenomic reads, please use raw metagenomic reads, not quality trimmed reads. Quality trimming with e.g. Trimmomatic reads often makes them too short for SingleM to use.

conda activate singlem_0.19.0

# Add path to metadata
# Adjust the path as needed when installing this on your own system
export SINGLEM_METAPACKAGE_PATH='/zfs/omics/projects/bioinformatics/databases/singlem/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb'

# Generate OTU tables with GTDB taxonomic profiles
mkdir results 

singlem pipe \
    -1 data/NIOZ114_R1.fastq.gz \
    -2 data/NIOZ114_R2.fastq.gz \
    -p results/output.profile.tsv \
    --otu-table results/output.table.tsv \
    --threads 10

Useful options (for a full list use the tools help function or visit the documentation):

SingleM pipe:

-1: nucleotide read sequence(s) (forward or unpaired) to be searched. Can be FASTA or FASTQ format, GZIP-compressed or not.
-2: reverse reads to be searched. Can be FASTA or FASTQ format, GZIP-compressed or not.
--genome-fasta-files: nucleotide genome sequence(s) to be searched
-p filename: output a ‘condensed’ taxonomic profile for each sample based on the OTU table. Taxonomic profiles output can be further converted to other formats using singlem summarise.
--otu-table filename: outputs and output OTU table
--assignment-method: Method of assigning taxonomy to OTUs and taxonomic profiles [default: smafa_naive_then_diamond]. Options: {smafa_naive_then_diamond, scann_naive_then_diamond, annoy_then_diamond, scann_then_diamond, diamond,diamond_example, annoy, pplacer}

Multiple samples

SingleM can be run on multiple samples. There are two ways. It is possible to specify multiple input files to the singlem pipe subcommand directly by space separating them. Alternatively singlem pipe can be run on each sample and OTU tables combined using singlem summarise. The results should be identical, though there are some performance trade-offs. For large numbers of metagenomes (>100) it is probably preferable to run each sample individually or in smaller groups.

Note that the performance of a single pipe when run on many genomes drastically improved in version 0.17.0, and it now sensible to run up to 10,000 genomes at a time.

Below the example of how to run SingleM individually followed by using singlem summarize. The SingleM summarise subcommand transforms taxonomic profiles and OTU tables into a variety of different formats. The summarise subcommand is useful for transforming the default output formats of pipe, visualising the results of a SingleM analysis, and for performing some downstream analyses.

conda activate singlem_0.19.0

# Add path to metadata
# Adjust the path as needed when installing this on your own system
export SINGLEM_METAPACKAGE_PATH='/zfs/omics/projects/bioinformatics/databases/singlem/S5.4.0.GTDB_r226.metapackage_20250331.smpkg.zb'

# Generate OTU tables with GTDB taxonomic profiles
# Loop through all R1 files found in the data folder (assumes that the sample name is part of the file name)
mkdir results 

for R1 in data/*_R1.fastq.gz; do
    # Derive corresponding R2 file name by replacing _R1 with _R2
    R2="${R1/_R1/_R2}"

    # Extract the sample name from the file name
    SAMPLE=$(basename "$R1" | sed 's/_R1.fastq.gz//')

    # Run singlem pipe
    singlem pipe \
        -1 "$R1" -2 "$R2" \
        -p results/output.${SAMPLE}.profile.tsv \
        --otu-table results/output.${SAMPLE}.table.tsv \
        --threads 10 
done

# Generate one single otu table in long format
singlem summarise --input-otu-tables results/output.*.table.tsv \
    --output-otu-table results/combined.otu_table.csv

# Generate Krona plots 
singlem summarise --input-taxonomic-profile results/output.*.profile.tsv \
    --output-taxonomic-profile-krona results/doco_example.profile.html

# Generate relative abundance and species by site tbles (for each taxonomic level)
singlem summarise --input-taxonomic-profile results/output.*.profile.tsv \
    --output-species-by-site-relative-abundance-prefix results/myprefix

# Store data in long format with additional columns 
singlem summarise --input-taxonomic-profile output.*.profile.tsv \
    --output-taxonomic-profile-with-extras results/doco_example.with_extras.tsv

References

Woodcroft, Ben J., Samuel T. N. Aroney, Rossen Zhao, Mitchell Cunningham, Joshua A. M. Mitchell, Linda Blackall, and Gene W. Tyson. 2024. “SingleM and Sandpiper: Robust Microbial Taxonomic Profiles from Metagenomic Data.” http://dx.doi.org/10.1101/2024.01.30.578060.