Final thoughts on short-read trimming

A summary of recommendations

  • Run FastQC before trimming; trim; look again.
  • Always trim paired sequences together.
  • Always adapter trim!
  • Impose a minimum length filter of 50 bp after trimming.
  • For quantification (RNAseq), trim lightly.
  • For RNAseq assembly, trim lightly.
  • For variant calling, trim stringently.
  • Use the same trimming parameters on all your data unless you have a VERY good reason to do otherwise!
  • Ignore the composition bias in the first 10 bp of RNAseq reads.
  • Ignore sequence duplication levels in high-coverage RNAseq.
  • Look at positional bias in your reads using mapping (or de novo assembly) as well.
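
To make the interaction between "trim pairs together", the length filter, and light versus stringent trimming concrete, here is a minimal Python sketch. It is an illustration only, not what any particular trimmer does internally: the function names, the simple 3'-end trimming rule, and the two quality cutoffs (LIGHT_Q and STRINGENT_Q) are assumptions made for this example. Only the minimum length of 50 comes from the list above.

```python
# Illustrative sketch only: names and quality cutoffs are hypothetical.
LIGHT_Q = 5        # assumed "light" Phred cutoff, for illustration
STRINGENT_Q = 30   # assumed "stringent" Phred cutoff, for illustration
MIN_LEN = 50       # minimum length filter from the recommendations


def trim_3prime(seq, quals, min_q):
    """Remove bases from the 3' end while their Phred score is below min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def trim_pair(read1, read2, min_q, min_len=MIN_LEN):
    """Trim both mates with the same settings.

    Each read is a (sequence, list_of_phred_scores) tuple. The pair is
    kept only if BOTH trimmed mates are at least min_len long, so the
    two output files can never fall out of sync.
    """
    t1 = trim_3prime(*read1, min_q)
    t2 = trim_3prime(*read2, min_q)
    if len(t1[0]) < min_len or len(t2[0]) < min_len:
        return None
    return t1, t2


# Two 60 bp mates: read1 ends in 5 very poor bases (Q3),
# read2 ends in 15 mediocre bases (Q12).
read1 = ("A" * 60, [35] * 55 + [3] * 5)
read2 = ("C" * 60, [35] * 45 + [12] * 15)

light = trim_pair(read1, read2, LIGHT_Q)
strict = trim_pair(read1, read2, STRINGENT_Q)

print(len(light[0][0]), len(light[1][0]))  # prints: 55 60
print(strict)                              # prints: None
```

With the light cutoff only the clearly bad tail of read1 is removed and the pair survives. With the stringent cutoff read2 drops to 45 bp, fails the length filter, and the whole pair is discarded. That loss of data is the cost to weigh when choosing stringent trimming.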

Some references

MacManes, 2014, http://journal.frontiersin.org/article/10.3389/fgene.2014.00013/full - recommends gentle trimming for RNAseq.

Williams et al., 2016, http://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-0956-2 - recommends imposing a length filter.

Mbandi et al., 2014, http://journal.frontiersin.org/article/10.3389/fgene.2014.00017/full - complicated, but start with gentle trimming.

Del Fabbro et al., 2013, http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0085024 - evaluation across data sets.


LICENSE: This documentation and all textual/graphic site content is licensed under the Creative Commons - 0 License (CC0). Presentations (PPT/PDF) and PDFs are the property of their respective owners and are under the terms indicated within the presentation.