cutadapt - remove adapter sequences from high-throughput sequencing data
Cutadapt finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequence from your high-throughput sequencing reads.
Cleaning your data in this way is often required: Reads from small-RNA sequencing contain the 3’ sequencing adapter because the read is longer than the molecule that is sequenced; amplicon reads start with a primer sequence; and poly-A tails are useful for pulling out RNA from your sample, but typically you don’t want them to be in your reads.
Cutadapt helps with these trimming tasks by finding the adapters or primers in an error-tolerant way. It can also filter reads by length and do quality trimming. Adapter sequences can contain IUPAC wildcard characters. Paired-end reads and even colorspace data are also supported. If you want, you can also just demultiplex your input data, without removing adapter sequences at all.
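To make "error-tolerant" and "IUPAC wildcard" more concrete, here is a minimal Python sketch of 3' adapter trimming. It is a simplified illustration and not Cutadapt's actual implementation: it counts only mismatches (no insertions or deletions), and the function names, the `max_error_rate` and `min_overlap` parameters, and their default values are assumptions made for this example.

```python
# Illustrative sketch only: mismatch-based matching, no indels.
# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def matches(adapter_char, read_char):
    """Return True if the read base is allowed by the adapter character."""
    return read_char in IUPAC.get(adapter_char, "")


def trim_3prime_adapter(read, adapter, max_error_rate=0.1, min_overlap=3):
    """Remove a 3' adapter and everything after it from a read.

    Start positions are tried from left to right. The adapter may run
    off the end of the read, so a partial adapter is also found as long
    as at least min_overlap bases overlap. A position is accepted when
    the number of mismatches is at most max_error_rate * overlap length
    (rounded down).
    """
    for start in range(len(read)):
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # remaining overlaps are too short to be trusted
        mismatches = sum(
            not matches(a, r)
            for a, r in zip(adapter[:overlap], read[start:start + overlap])
        )
        if mismatches <= int(max_error_rate * overlap):
            return read[:start]
    return read  # no adapter found, read is left unchanged


if __name__ == "__main__":
    print(trim_3prime_adapter("TTGACCATGAACCGGTT", "AACCGGTT"))
    print(trim_3prime_adapter("TTGACCATGAACCAGTT", "AACCNGTT"))
```

With an error rate of 0.1, an 8-base adapter must match exactly, while a 10-base adapter may contain one mismatch. The `N` in the second call matches any base in the read.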
Cutadapt is available under the terms of the MIT license, is well documented, and comes with an extensive suite of automated tests.
Install cutadapt by running:
pip install --user cutadapt
In the simplest case, an adapter sequence is removed by running the program like this:
cutadapt -a AACCGGTT -o output.fastq.gz input.fastq.gz
For anything more complicated, just read the documentation.
Cutadapt was initially written at the Computer Science chair 11 at TU Dortmund and is now maintained at SciLifeLab.
See the code for cutadapt here: https://github.com/marcelm/cutadapt/