Cluster Flow - Automate and standardise bioinformatics analyses on cluster environments

What is Cluster Flow?

Cluster Flow is a command-line program that uses common cluster managers to run analysis pipelines. It currently supports GRIDEngine (SGE), LSF and SLURM, and can also run locally on any Unix system.
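
To illustrate the general idea of running a pipeline through a cluster manager, here is a minimal Python sketch that chains job scripts on SLURM so that each step waits for the previous one to succeed. It is not taken from Cluster Flow's source and does not show Cluster Flow's own commands. The function names (`sbatch_command`, `run_sbatch`, `submit_chain`) and the script names in the usage note are hypothetical; only the `sbatch` options `--parsable` and `--dependency=afterok:<jobid>` are standard SLURM.

```python
import subprocess
from typing import Callable, List, Optional


def sbatch_command(script: str, depends_on: Optional[str] = None) -> List[str]:
    """Build an sbatch command, optionally waiting for an earlier job."""
    cmd = ["sbatch", "--parsable"]  # --parsable makes sbatch print just the job ID
    if depends_on is not None:
        # afterok: start only if the earlier job finished successfully
        cmd.append(f"--dependency=afterok:{depends_on}")
    cmd.append(script)
    return cmd


def run_sbatch(cmd: List[str]) -> str:
    """Submit the job and return its ID (requires SLURM to be installed)."""
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    # --parsable output is "jobid" or "jobid;cluster"
    return result.stdout.strip().split(";")[0]


def submit_chain(
    scripts: List[str],
    run: Callable[[List[str]], str] = run_sbatch,
) -> List[List[str]]:
    """Submit scripts in order, each one depending on the job before it."""
    previous = None
    submitted = []
    for script in scripts:
        cmd = sbatch_command(script, previous)
        previous = run(cmd)  # job ID of the step just submitted
        submitted.append(cmd)
    return submitted
```

Passing a stand-in for `run` (for example, a function that returns made-up job IDs) lets you inspect the commands that would be submitted for scripts such as `align.sh` and `sort.sh` without access to a cluster.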

Benefits of using Cluster Flow:

Routine analyses are very quick to run
Pipelines use identical parameters, standardising analysis and making results more reproducible
Integrated parallelisation tools help prevent your cluster becoming overloaded
All commands and output are logged to files for future reference
Intuitive commands and a comprehensive manual make Cluster Flow easy to use
Very easy to get up and running (in theory at least!)

How Cluster Flow differs from other pipeline tools:

Very lightweight and flexible
Pipelines and configurations can easily be generated on a project-specific basis if required
New modules and pipelines are super easy to write (see video tutorial)

Installation

Cluster Flow can be downloaded from http://clusterflow.io. The source code for Cluster Flow is hosted on GitHub: https://github.com/ewels/clusterflow/

Full installation instructions can be found in the documentation.

Documentation

You can read the full documentation at http://clusterflow.io

There are also three introductory videos:

Introduction
Usage
Installation

Contributors

Phil Ewels

Written whilst working at the Babraham Institute, maintained at SciLifeLab

Licence

GPL v3
