Welcome to THE_Aligner

THE_Aligner is a Snakemake pipeline developed by Isaac Vock. It is designed to process fastq files from various flavors of RNA-seq experiments and align them to a reference genome.

What THE_Aligner does

The pipeline includes the following steps:

  1. Trim adapters with fastp
    • Gzipped fastqs are first unzipped with pigz. The unzipped fastqs are temporary files that are removed once the pipeline steps using them have finished running, which saves disk space.
  2. Assess fastqs with fastqc
  3. Align fastqs
    • Includes the strictly splice-unaware aligners bwa-mem2 and bowtie2.
    • Also includes the splice-aware aligners star and hisat2.
    • Alignment statistics are generated with bamtools.
    • Alignment indices can also be automatically built for all implemented aligners.
  4. Sort bam files with samtools
  5. Generate coverage files
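
The temporary-file behavior described in step 1 can be illustrated with a small Python sketch. This is not the pipeline's code: THE_Aligner uses pigz and Snakemake for this, whereas the sketch below substitutes Python's standard `gzip` module, and the helper name `temporary_fastq` is hypothetical. It only shows the idea of working on an unzipped copy that disappears once the step using it is done.

```python
import gzip
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_fastq(path):
    """Yield a path to an uncompressed fastq (hypothetical helper).

    If the input is gzipped, it is decompressed to a temporary file that is
    deleted when the `with` block exits. Uncompressed inputs are used as-is.
    """
    if not path.endswith(".gz"):
        # Already uncompressed: hand back the original file and leave it alone.
        yield path
        return

    # Create an empty temporary file to hold the decompressed reads.
    fd, tmp_path = tempfile.mkstemp(suffix=".fastq")
    os.close(fd)
    try:
        with gzip.open(path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        yield tmp_path
    finally:
        # Remove the temporary copy; the original gzipped file is untouched.
        os.remove(tmp_path)
```

A downstream step would wrap its work in `with temporary_fastq("reads.fastq.gz") as fq:` and read from `fq`; the file name here is only an example.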

In addition, if quantification of annotated transcripts is all you are interested in, THE_Aligner also implements the two most popular pseudoaligners, kallisto and salmon. In this case, bam files are not generated, so the sorting and coverage file creation steps are skipped. As with the aligners, indices for these pseudoaligners can be generated automatically.
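
To make the two modes concrete, here is a minimal Python sketch of which steps run for a given tool. It is an illustration of the description above, not the pipeline's actual Snakemake logic: the function name `planned_steps` and the step labels are hypothetical, and leaving bamtools statistics out of the pseudoaligner branch is an assumption based on no bam files being produced there.

```python
# Tool names as listed in this README.
ALIGNERS = {"bwa-mem2", "bowtie2", "star", "hisat2"}
PSEUDOALIGNERS = {"kallisto", "salmon"}


def planned_steps(tool):
    """Return the ordered steps for `tool` (hypothetical helper)."""
    # Trimming and quality assessment happen before any alignment choice.
    steps = ["trim adapters (fastp)", "assess fastqs (fastqc)"]

    if tool in ALIGNERS:
        # Full alignment route: bam files exist, so they are sorted and
        # used to generate coverage files.
        steps += [
            f"align ({tool})",
            "alignment statistics (bamtools)",
            "sort bam files (samtools)",
            "generate coverage files",
        ]
    elif tool in PSEUDOALIGNERS:
        # No bam files are generated, so sorting and coverage are skipped.
        # Assumption: bamtools statistics are skipped for the same reason.
        steps.append(f"quantify transcripts ({tool})")
    else:
        raise ValueError(f"unknown tool: {tool}")

    return steps


for name in ("star", "salmon"):
    print(name, "->", planned_steps(name))
```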

Requirements for THE_Aligner

THE_Aligner uses the workflow manager Snakemake. The minimal version of Snakemake is technically compatible with Windows, Mac, and Linux, but several of the software dependencies are only compatible with Mac and Linux. If you are a Windows user like me, don't sweat it: I suggest using the Windows Subsystem for Linux, which can be easily installed (assuming you are running Windows 10 version 2004 or higher).

In addition, you will need Git installed on your system so that you can clone this repository. Head to this link for installation instructions if you don't already have Git.

Getting Started

There are several ways to run THE_Aligner:

  1. Deploying with Snakedeploy (recommended)
    • A Simon Lab/Yale-specific version of these instructions is here. While highly specific, it also includes instructions for optimized deployment on a cluster using a Slurm scheduler, and could thus be of some general use.
  2. Cloning the repo locally