HydDB database
Introduction
The hydrogenase database (HydDB) provides information pages for different groups of hydrogenases (Søndergaard, Pedersen, and Greening 2016). The database could originally be queried using the HydDB webserver, which is no longer maintained. However, the database itself is available here, and there is active work on HydDB v2.0.
For a brief description of the different hydrogenases, have a look at Table 1 in this paper.
Installation
Available on Crunchomics: Yes
- The HydDB database is installed as part of the bioinformatics share. If you have access to Crunchomics but do not yet have access to the bioinformatics share, you can send an email with your UvA netID to Nina Dombrowski.
The database can be found here:
/zfs/omics/projects/bioinformatics/databases/hydDB/release2022.
If you want to download the database yourself, you can do:
# Download the data
wget https://raw.githubusercontent.com/GreeningLab/HydDB/refs/heads/main/fastas/HydDB_all_hydrogenases.faa
# Convert the .faa file to a diamond database
diamond makedb --in HydDB_all_hydrogenases.faa -d hyddb
Example usage
The database can be searched with diamond blastp, and we provide a small Python script to parse the output:
## Run search
diamond blastp -q data/genomes.faa \
--more-sensitive --evalue 1e-3 \
--threads 5 --max-target-seqs 20 \
--db /zfs/omics/projects/bioinformatics/databases/hyddb/release2022/hyddb \
--outfmt 6 qseqid qtitle qlen sseqid salltitles slen qstart qend sstart send evalue bitscore length pident \
--out results/results.txt
# Parse the data
python /zfs/omics/projects/bioinformatics/databases/hyddb/release2022/parse_diamond_hydDB.py \
-i results/results.txt \
-o results/hyddb_parsed.tsv \
--evalue 1e-5
The Python script returns the best hit based on the e-value and bitscore. It also discards hits that do not pass the e-value threshold or that fall below the following percent identities (as recommended here):
- [NiFe] = >50% for group 4, >30% for all other groups
- [FeFe] = >45%
- [Fe] = >50%
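To make the filtering rules above concrete, here is a minimal Python sketch of how such a filter could work. It is not the code of parse_diamond_hydDB.py, which is not shown on this page. The column order is taken from the --outfmt string of the diamond command, while the classify helper, the label format it looks for in the subject title (for example "[NiFe] Group 4a"), and the inclusive treatment of the cutoffs are assumptions made for illustration only.

```python
import csv

# Column order copied from the --outfmt string of the diamond command.
COLUMNS = (
    "qseqid qtitle qlen sseqid salltitles slen qstart qend "
    "sstart send evalue bitscore length pident"
).split()

# Minimum percent identity per hydrogenase class (values from the list above).
MIN_IDENTITY = {
    "nife_group4": 50.0,
    "nife_other": 30.0,
    "fefe": 45.0,
    "fe": 50.0,
}


def classify(subject_title):
    """Hypothetical helper: assumes titles carry labels like '[NiFe] Group 4a'."""
    title = subject_title.lower()
    if "[nife]" in title:
        return "nife_group4" if "group 4" in title else "nife_other"
    if "[fefe]" in title:
        return "fefe"
    if "[fe]" in title:
        return "fe"
    return None


def read_hits(lines):
    """Yield one dict per complete line of tab-separated diamond output."""
    for fields in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) == len(COLUMNS):
            yield dict(zip(COLUMNS, fields))


def best_hits(hits, max_evalue=1e-5):
    """Return {query id: best hit} after e-value and identity filtering."""
    best = {}
    for hit in hits:
        evalue = float(hit["evalue"])
        bitscore = float(hit["bitscore"])
        group = classify(hit["salltitles"])
        if group is None or evalue > max_evalue:
            continue
        # Assumption: a hit exactly at the cutoff is kept.
        if float(hit["pident"]) < MIN_IDENTITY[group]:
            continue
        # Lower e-value wins; a higher bitscore breaks ties.
        key = (evalue, -bitscore)
        query = hit["qseqid"]
        if query not in best or key < best[query][0]:
            best[query] = (key, hit)
    return {query: hit for query, (key, hit) in best.items()}
```

To try it on real output, open results/results.txt and pass the file handle to read_hits, then pass the result to best_hits. For actual analyses, the provided script remains the reference.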
options:
-h, --help show this help message and exit
-i INPUT, --input INPUT
Input HMMER tblout file
-o OUTPUT, --output OUTPUT
Output filtered TSV file
--evalue EVALUE E-value cutoff (default: 1e-10)
--cutoff-fefe CUTOFF_FEFE
Percent identity cutoff for FeFe hydrogenases (default: 45)
--cutoff-nife-all CUTOFF_NIFE_ALL
Percent identity cutoff for NiFe Groups 1, 2, 3 (default: 30)
--cutoff-nife-group-4 CUTOFF_NIFE_GROUP_4
Percent identity cutoff for NiFe Group 4 (default: 50)
--cutoff-feonly CUTOFF_FEONLY
Percent identity cutoff for Fe-only hydrogenases (default: 50)
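The options above could be declared with argparse roughly as follows. This is a sketch reconstructed from the help text only, not the script's actual source: the flag names and defaults come from the listing, while the float types, the required inputs, and the description string are assumptions.

```python
import argparse


def build_parser():
    """Rebuild the documented options (a sketch, not the original script)."""
    parser = argparse.ArgumentParser(
        description="Filter diamond hits against HydDB (illustrative)"
    )
    # Assumption: both file arguments are mandatory.
    parser.add_argument("-i", "--input", required=True, help="Input file")
    parser.add_argument("-o", "--output", required=True,
                        help="Output filtered TSV file")
    # Defaults below are the ones shown in the help text.
    parser.add_argument("--evalue", type=float, default=1e-10)
    parser.add_argument("--cutoff-fefe", type=float, default=45.0)
    parser.add_argument("--cutoff-nife-all", type=float, default=30.0)
    parser.add_argument("--cutoff-nife-group-4", type=float, default=50.0)
    parser.add_argument("--cutoff-feonly", type=float, default=50.0)
    return parser


# Same arguments as the example call: only --evalue overrides a default.
args = build_parser().parse_args(
    ["-i", "results/results.txt", "-o", "results/hyddb_parsed.tsv",
     "--evalue", "1e-5"]
)
print(args.evalue, args.cutoff_fefe, args.cutoff_nife_group_4)
```

Note that the example call in the previous section passes --evalue 1e-5, which is less strict than the default of 1e-10. Any cutoff that is not given on the command line keeps its default.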