Welcome to THE_Aligner
THE_Aligner is a Snakemake pipeline developed by Isaac Vock. It is designed to process fastq files from various flavors of RNA-seq experiments and align them to a reference genome.
What THE_Aligner does
The pipeline includes the following steps:
- Trim adapters with fastp
- Gzipped fastqs are also unzipped with pigz. The unzipped fastqs are temporary files that are removed once the pipeline steps using them have finished running, which saves disk space.
- Assess fastqs with fastqc
- Align fastqs
- Sort bam files with samtools
- Generate coverage files
- Bedgraph files created with bedtools
- BigWig files created with bedGraphToBigWig
In addition, if quantification of annotated transcripts is all you are interested in, THE_Aligner also implements the two most popular pseudoaligners, kallisto and salmon. In this case, bam files are not generated, so the sorting and coverage file creation steps are skipped. Indices for these pseudoaligners can also be generated automatically.
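To make the order of steps concrete, here is a rough Python sketch that lists the steps for a single sample. It is not how THE_Aligner is implemented (the real pipeline is a set of Snakemake rules); the function name `plan_steps`, the file naming scheme, the `results` directory, and the `chrom.sizes` file are all hypothetical. It assumes single-end reads, that unzipping happens before trimming, and that fastqc runs on the trimmed reads. The aligner and pseudoaligner commands are left as placeholders because they depend on your configuration.

```python
from pathlib import PurePosixPath


def plan_steps(fastq, chrom_sizes="chrom.sizes", outdir="results", pseudoalign=False):
    """Return (step name, shell command) pairs for one single-end sample.

    The commands are illustrative strings only; nothing is executed here.
    """
    fastq = PurePosixPath(fastq)
    out = PurePosixPath(outdir)
    sample = fastq.name.split(".")[0]  # "sampleA.fastq.gz" -> "sampleA"
    steps = []

    # Gzipped input is decompressed to a (temporary) plain fastq.
    reads = fastq
    if fastq.suffix == ".gz":
        reads = out / f"{sample}.fastq"
        steps.append(("unzip", f"pigz -d -c {fastq} > {reads}"))

    # Adapter trimming, then quality assessment (assumed to run on trimmed reads).
    trimmed = out / f"{sample}.trimmed.fastq"
    steps.append(("trim", f"fastp -i {reads} -o {trimmed}"))
    steps.append(("qc", f"fastqc {trimmed} -o {out}"))

    # Pseudoalignment produces no bam, so sorting and coverage are skipped.
    if pseudoalign:
        steps.append(("quantify", "<kallisto or salmon command>"))
        return steps

    bam = out / f"{sample}.bam"
    sorted_bam = out / f"{sample}.sorted.bam"
    bedgraph = out / f"{sample}.bedGraph"
    bigwig = out / f"{sample}.bw"
    steps.append(("align", f"<aligner command writing {bam}>"))
    steps.append(("sort", f"samtools sort -o {sorted_bam} {bam}"))
    # bedGraphToBigWig expects a position-sorted bedGraph, hence the sort.
    steps.append((
        "bedgraph",
        f"bedtools genomecov -ibam {sorted_bam} -bg"
        f" | LC_ALL=C sort -k1,1 -k2,2n > {bedgraph}",
    ))
    steps.append(("bigwig", f"bedGraphToBigWig {bedgraph} {chrom_sizes} {bigwig}"))
    return steps


if __name__ == "__main__":
    for name, command in plan_steps("data/sampleA.fastq.gz"):
        print(f"{name}: {command}")
```

Calling `plan_steps("data/sampleA.fastq.gz", pseudoalign=True)` returns only the unzip, trim, qc, and quantify steps, mirroring the shorter path described above.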
Requirements for THE_Aligner
THE_Aligner uses the workflow manager Snakemake. The minimal version of Snakemake is technically compatible with Windows, Mac, and Linux OS, but several of the software dependencies are only Mac and Linux compatible. If you are a Windows user like me, don't sweat it. I would suggest looking into the Windows Subsystem for Linux, which can be easily installed (assuming you are running Windows 10 version 2004 or higher).
In addition, you will need Git installed on your system so that you can clone this repository. Head to this link for installation instructions if you don't already have Git.
Getting Started
There are several ways to run THE_Aligner:
- Deploying with Snakedeploy (recommended)
- A Simon Lab/Yale-specific version of these instructions is here. While highly specific, it also includes instructions for optimized deployment on a cluster using a Slurm scheduler, and could thus be of some general use.
- Cloning the repo locally