How-to Add Taxa Names to Output
Info
This how-to follows on from the main tutorial and assumes that all of its files are still present, as they are just before the clean-up step.
If you wish to have actual human-readable taxon names in your standardised
output, you need to supply 'taxonomy' files. These files are typically called
nodes.dmp and names.dmp. Most profilers use the NCBI taxonomy files.
Assuming that your current working directory is the taxpasta-tutorial
directory, we can download the taxonomy files with the following.
curl -O ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz.md5
curl -O ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
md5sum --check taxdump.tar.gz.md5
mkdir taxdump
tar -C taxdump -xzf taxdump.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    49  100    49    0     0     33      0  0:00:01  0:00:01 --:--:--    33
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 58.5M  100 58.5M    0     0  13.2M      0  0:00:04  0:00:04 --:--:-- 13.9M
taxdump.tar.gz: OK
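Before passing the directory to taxpasta, it can help to confirm that the extraction actually produced the two files this how-to relies on. The following is a minimal sketch; `check_taxdump` is a hypothetical helper written for this page, not part of taxpasta or the NCBI tooling, and it only checks that nodes.dmp and names.dmp exist and are non-empty.

```shell
# Hypothetical helper (not part of taxpasta): confirm that a taxdump
# directory contains the two taxonomy files named in this how-to.
check_taxdump() {
  local dir="${1:-taxdump}"   # defaults to the directory created above
  local f
  local missing=0
  for f in nodes.dmp names.dmp; do
    # -s is true only if the file exists and is not empty
    if [ ! -s "${dir}/${f}" ]; then
      echo "missing or empty: ${dir}/${f}" >&2
      missing=1
    fi
  done
  return "${missing}"
}
```

Calling `check_taxdump taxdump` from the taxpasta-tutorial directory returns a zero exit status when both files are present, and otherwise reports each missing file on standard error.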
Once downloaded and extracted, you can supply the directory containing the
taxdump files to your respective taxpasta commands with the --taxonomy flag,
and specify which type of taxonomy information to display, e.g., just the name,
the rank, and/or taxonomic lineage. The command below adds only the name, using
--add-name.
taxpasta merge -p motus -o dbMOTUs_motus_with_names.tsv --taxonomy taxdump --add-name 2612_pe-ERR5766176-db_mOTU.out 2612_se-ERR5766180-db_mOTU.out
[WARNING] The merged profiles contained different taxa. Additional zeroes were introduced for missing taxa.
[INFO] Write result to 'dbMOTUs_motus_with_names.tsv'.
The merged taxpasta output now looks like:
taxonomy_id  name                            2612_pe-ERR5766176-db_mOTU  2612_se-ERR5766180-db_mOTU
40518        Ruminococcus bromii             20                          2
216816       Bifidobacterium longum          1                           0
1680         Bifidobacterium adolescentis    6                           1
1262820      Clostridium sp. CAG:567         1                           0
74426        Collinsella aerofaciens         2                           1
1907654      Collinsella bouchesdurhonensis  1                           0
1852370      Prevotellamassilia timonensis   3                           1
39491        [Eubacterium] rectale           3                           0
33039        [Ruminococcus] torques          2                           0
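To cross-check a name in the merged output against the taxonomy files themselves, you can look up a taxonomy ID directly in names.dmp. The sketch below assumes the usual NCBI layout for that file (fields separated by a tab, a pipe, and a tab, in the order tax_id, name, unique name, name class, with a trailing tab and pipe). `lookup_name` is a hypothetical helper for illustration, not a taxpasta command.

```shell
# Hypothetical helper (not part of taxpasta): print the scientific name
# that names.dmp records for a single taxonomy ID.
# Assumes the NCBI names.dmp layout:
#   tax_id <tab>|<tab> name_txt <tab>|<tab> unique name <tab>|<tab> name class <tab>|
lookup_name() {
  local taxid="$1"
  local names="${2:-taxdump/names.dmp}"   # default path from this how-to
  awk -F '\t\\|\t' -v id="${taxid}" '
    {
      sub(/\t\|$/, "", $NF)   # strip the trailing "<tab>|" from the last field
      if ($1 == id && $4 == "scientific name") {
        print $2
        exit                  # stop at the first scientific name found
      }
    }
  ' "${names}"
}
```

For example, `lookup_name 40518` should print the same name that the table above shows for that ID, provided your copy of the taxonomy dump still lists it under that scientific name. The helper prints nothing if the ID is not found.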
Clean Up
Don't forget to remove the tutorial directory if you don't want to keep it for later use.