Command Line Usage

Goldilocks is also packaged with a basic command line tool that demonstrates some of its capabilities and provides access to its core functionality without requiring users to write a script of their own. For more complicated queries, you’ll need to import Goldilocks as a package into your own script, but for simple use-cases the tool may be all you need.

Usage

Goldilocks is invoked as follows:

goldilocks <strategy> <sort-op> [--tracks TRACK1 [TRACK2 ...]] -l LENGTH -s STRIDE [-@ THREADS] FAIDX1 [FAIDX2 ...]

where <strategy> is one of the census strategies listed as available by the list command...

$ goldilocks list
Available Strategies
  * gc
  * ref
  * motif
  * nuc

...and <sort-op> is one of:

  • max
  • min
  • mean
  • median
  • none
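The list above names the sort operations but does not spell out what each one does. The hypothetical sort_regions function below sketches one reading of them, applied to a tab-separated table of census results with the columns chrom, start, end and value. Here none keeps the input (co-ordinate) order, max sorts by value from highest to lowest, and min sorts from lowest to highest. For mean and median it assumes, without confirmation from the Goldilocks documentation, that regions are ranked by their absolute distance from the mean or median value, closest first. The function name and the table layout are illustrative assumptions, not part of the Goldilocks CLI.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the five sort operations on a TSV of census results
# (chrom, start, end, value). ASSUMPTION: "mean" and "median" rank regions by
# |value - mean| or |value - median|, closest first; this is not confirmed
# by the Goldilocks documentation.
set -euo pipefail

sort_regions() {
  local op=$1 table=$2
  case "$op" in
    none) cat "$table" ;;                     # keep input (co-ordinate) order
    max)  sort -t $'\t' -k4,4gr "$table" ;;   # highest value first
    min)  sort -t $'\t' -k4,4g "$table" ;;    # lowest value first
    mean|median)
      local target
      if [ "$op" = mean ]; then
        # Arithmetic mean of column 4.
        target=$(awk -F'\t' '{ s += $4 } END { print s / NR }' "$table")
      else
        # Median of column 4: middle value, or mean of the two middle values.
        target=$(cut -f4 "$table" | sort -g | awk '{ v[NR] = $1 }
          END {
            if (NR % 2) m = v[(NR + 1) / 2]
            else m = (v[NR / 2] + v[NR / 2 + 1]) / 2
            print m
          }')
      fi
      # Prefix each row with its distance from the target, sort on that
      # distance, then strip the helper column again.
      awk -F'\t' -v t="$target" 'BEGIN { OFS = "\t" }
        { d = $4 - t; if (d < 0) d = -d; print d, $0 }' "$table" \
        | sort -t $'\t' -k1,1g | cut -f2- ;;
    *) echo "unknown sort-op: $op" >&2; return 1 ;;
  esac
}
```

For example, sort_regions max results.tsv | head would show the highest-scoring regions first, where results.tsv is a hypothetical table in the layout described above.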

Examples

Tabulate all regions and their counts of the nucleotides A, C, G, T and N, using a window length of 100 Kbp and a stride of 50 Kbp (so consecutive windows overlap by 50 Kbp). The census will spawn 4 processes, and the regions in the table will be sorted by co-ordinate:

goldilocks nuc none --tracks A C G T N -l 100000 -s 50000 -@ 4 /store/ref/hs37d5.fa.fai

Tabulate all regions and their GC-content, using the same parameters as the previous example but with the table sorted by GC-content in descending order (highest first):

goldilocks gc max -l 100000 -s 50000 -@ 4 /store/ref/hs37d5.fa.fai
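Both examples use -l 100000 and -s 50000, so each window starts 50,000 bases after the previous one and shares half of its bases with it. The hypothetical list_windows function below sketches that tiling: it reads sequence names and lengths from a FASTA index (.fai, where column 1 is the sequence name and column 2 its length in bases) and prints the windows a census with the given length and stride would cover. It assumes 1-based inclusive co-ordinates and that a trailing window shorter than LENGTH is dropped; Goldilocks' own handling of sequence ends may differ.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: print the windows covered by a census with the given
# LENGTH and STRIDE. ASSUMPTIONS: 1-based inclusive co-ordinates, and only
# full-length windows are kept (Goldilocks may treat sequence ends differently).
set -euo pipefail

list_windows() {
  local fai=$1 length=$2 stride=$3
  # In a .fai file, column 1 is the sequence name and column 2 its length.
  awk -v L="$length" -v S="$stride" 'BEGIN { OFS = "\t" }
    {
      # Step the window start by STRIDE while the whole window still fits.
      for (start = 1; start + L - 1 <= $2; start += S)
        print $1, start, start + L - 1
    }' "$fai"
}
```

For example, list_windows /store/ref/hs37d5.fa.fai 100000 50000 | head prints the first few windows. Under these assumptions a 250,000 bp sequence yields four windows (1-100000, 50001-150000, 100001-200000 and 150001-250000), each sharing 50,000 bases with the next.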