
How-to Customise Sample Names

Info

This how-to follows on from the main tutorial and uses all the files that exist just before its clean-up step.

With taxpasta, you can also customise the sample names that are displayed in the column headers of your merged table. To do so, create a sample sheet that lists each sample name you want alongside the path to its profile file.

We can generate such a TSV sample sheet with a bit of bash trickery or your favourite spreadsheet program.

The following commands assume that your current working directory is the taxpasta-tutorial directory.

## List the path to each profile, relative to the current directory
ls -1 *mOTU.out > motus_paths.txt

## Construct a sample name based on the filename
sed 's#-db_mOTU.out##g;s#^.*/##g' motus_paths.txt > motus_names.txt

## Create the sample sheet: write the header, then append the sample names and paths
printf 'sample\tprofile\n' > motus_samplesheet.tsv
paste motus_names.txt motus_paths.txt >> motus_samplesheet.tsv
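The three steps above can also be folded into a single shell function that avoids the intermediate text files. This is only a minimal sketch: `make_samplesheet` is a hypothetical helper name, not part of taxpasta, and it assumes that every profile filename ends in the suffix you pass as the first argument.

```shell
# Hypothetical helper (not part of taxpasta): print a sample sheet to stdout.
# Usage: make_samplesheet SUFFIX PROFILE...
make_samplesheet() {
  suffix="$1"
  shift
  # Header row with the same two columns as the sample sheet above.
  printf 'sample\tprofile\n'
  for path in "$@"; do
    # Drop any leading directories, then the literal filename suffix.
    name="${path##*/}"
    name="${name%"$suffix"}"
    printf '%s\t%s\n' "$name" "$path"
  done
}
```

For the tutorial files, a call such as `make_samplesheet '-db_mOTU.out' *mOTU.out > motus_samplesheet.tsv` should produce the same sheet as the commands above.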

Then, instead of passing the path of each profile to merge, we can provide the sample sheet itself.

taxpasta merge -p motus -o dbMOTUs_motus_cleannames.tsv -s motus_samplesheet.tsv
[INFO] Read sample sheet from 'motus_samplesheet.tsv'.
[WARNING] The merged profiles contained different taxa. Additional zeroes were introduced for missing taxa.
[INFO] Write result to 'dbMOTUs_motus_cleannames.tsv'.

The column headers now show the cleaner sample names from the sample sheet.

head dbMOTUs_motus_cleannames.tsv
taxonomy_id 2612_pe-ERR5766176  2612_se-ERR5766180
40518   20  2
216816  1   0
1680    6   1
1262820 1   0
74426   2   1
1907654 1   0
1852370 3   1
39491   3   0
33039   2   0
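If you would rather check the headers programmatically than by eye, the sketch below compares the sample names in the sample sheet with the column headers of the merged table. `check_headers` is a hypothetical helper name, not part of taxpasta. The sketch assumes that both files are tab-separated and that the first column of the merged table is the taxonomy identifier, as in the output above. Names are sorted before comparison, so column order is not checked.

```shell
# Hypothetical helper (not part of taxpasta): succeed only if the merged
# table has exactly the sample names listed in the sample sheet.
# Usage: check_headers SAMPLESHEET MERGED_TABLE
check_headers() {
  sheet="$1"
  merged="$2"
  # Sample names: first column of the sheet, skipping its header row.
  expected="$(tail -n +2 "$sheet" | cut -f 1 | sort)"
  # Column headers: first row of the merged table, one per line,
  # without the leading taxonomy identifier column.
  actual="$(head -n 1 "$merged" | tr '\t' '\n' | tail -n +2 | sort)"
  [ "$expected" = "$actual" ]
}
```

For example, `check_headers motus_samplesheet.tsv dbMOTUs_motus_cleannames.tsv && echo "headers match"` prints the message only when the two sets of names agree.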

Clean Up

Don't forget to remove the tutorial directory if you don't want to keep it for later use.