Skip to content

How-to Add Taxa Names to Output

Info

We follow on from the main tutorial including all files just before the clean up step.

If you wish to have actual human-readable taxon names in your standardised output, you need to supply 'taxonomy' files. These files are typically called nodes.dmp and names.dmp. Most profilers use the NCBI taxonomy files.

Assuming that your current working directory is the taxpasta-tutorial directory, we can download the taxonomy files with the following.

curl -O ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz.md5
curl -O ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
md5sum --check taxdump.tar.gz.md5
mkdir taxdump
tar -C taxdump -xzf taxdump.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0100    49  100    49    0     0     33      0  0:00:01  0:00:01 --:--:--    33
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0  0 58.5M    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0  8 58.5M    8 4877k    0     0  2289k      0  0:00:26  0:00:02  0:00:24 2289k 49 58.5M   49 29.0M    0     0  9437k      0  0:00:06  0:00:03  0:00:03 9438k 90 58.5M   90 53.1M    0     0  12.9M      0  0:00:04  0:00:04 --:--:-- 12.9M100 58.5M  100 58.5M    0     0  13.2M      0  0:00:04  0:00:04 --:--:-- 13.9M
taxdump.tar.gz: OK

Once downloaded and extracted, you can supply the directory with the taxdump files to your respective taxpasta commands with the --taxonomy flag, and specify which type of taxonomy information to display, e.g., just the name, the rank, and/or taxonomic lineage.

taxpasta merge -p motus -o dbMOTUs_motus_with_names.tsv --taxonomy taxdump --add-name 2612_pe-ERR5766176-db_mOTU.out 2612_se-ERR5766180-db_mOTU.out
[WARNING] The merged profiles contained different taxa. Additional zeroes were introduced for missing taxa.
[INFO] Write result to 'dbMOTUs_motus_with_names.tsv'.

The merged taxpasta output now looks like:

head dbMOTUs_motus_with_names.tsv
taxonomy_id name    2612_pe-ERR5766176-db_mOTU  2612_se-ERR5766180-db_mOTU
40518   Ruminococcus bromii 20  2
216816  Bifidobacterium longum  1   0
1680    Bifidobacterium adolescentis    6   1
1262820 Clostridium sp. CAG:567 1   0
74426   Collinsella aerofaciens 2   1
1907654 Collinsella bouchesdurhonensis  1   0
1852370 Prevotellamassilia timonensis   3   1
39491   [Eubacterium] rectale   3   0
33039   [Ruminococcus] torques  2   0

Clean Up

Don't forget to remove the tutorial directory if you don't want to keep it for later use.