History

0.0.80 (2015-08-10)

Added multiprocessing capabilities during census step.
Added a simple command line interface.
Removed prepare-evaluate paradigm from strategies and now perform counts directly on input data in one step.
Skip slides (and set all counts to 0) if their end_pos falls outside of the region on that particular genome’s chromosome/contig.
Renamed KMerCounterStrategy to MotifCounterStrategy.
Fixed bug causing use_and to not work as expected for chromosomes not explicitly listed in the exceptions dict when also using use_chrom.
Support use of FASTA files which must be supplied with a samtools faidx style index.
Stopped supporting Python 3 due to incompatibility with buffer and memoryview.
Prevent query from deep copying itself on return. Note this means that a query will alter the original Goldilocks object.
Now using a 3D numpy matrix to store counters with memory shared to support multiprocessing during census.
Removed StrategyValue as these cannot be stored in shared memory. This makes ratio-based strategies a bit of a hack currently (but they still work).
TL;DR: Goldilocks is at least 2-4x faster than previously, even without multiprocessing.
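
The shared counter matrix mentioned above can be illustrated with a minimal sketch. This is not Goldilocks' own code: the function names, the float64 dtype and the (group, track, region) axis order are all assumptions made for illustration. It only shows the general idiom of backing a 3D numpy array with a multiprocessing shared buffer so that several views write to the same memory.

```python
import multiprocessing as mp

import numpy as np


def make_shared_counters(n_groups, n_tracks, n_regions):
    """Allocate a zeroed shared buffer large enough for a 3D counter matrix."""
    # Axis order (group, track, region) is an assumption for this sketch.
    shape = (n_groups, n_tracks, n_regions)
    # RawArray is an unsynchronised block of shared memory ("d" = C double).
    raw = mp.RawArray("d", int(np.prod(shape)))
    return raw, shape


def as_matrix(raw, shape):
    """View the shared buffer as a 3D numpy array without copying it."""
    return np.frombuffer(raw, dtype=np.float64).reshape(shape)


def census_worker(raw, shape, group, track, region, value):
    """Hypothetical worker body: store one region's count in the shared matrix."""
    as_matrix(raw, shape)[group, track, region] = value


raw, shape = make_shared_counters(2, 3, 4)
# Called in-process here; the same call could be the target of an mp.Process.
census_worker(raw, shape, 0, 1, 2, 5.0)
# A second, independent view sees the write because both share one buffer.
print(as_matrix(raw, shape)[0, 1, 2])
```

Passing `raw` and `shape` in the `args` of a `multiprocessing.Process` lets child processes write into the same buffer. Only plain numeric values fit in such a buffer, which is consistent with the note above that StrategyValue objects could not be stored in shared memory.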

Officially add MIT license to repository.
Deprecate _filter.
Update and tidy examples.py.
is_seq argument to initialisation removed and replaced with is_pos.
Use is_pos to indicate the expected input is positional, not sequence.
Force use of PositionCounterStrategy when is_pos is True.
Sequence data is now read into 0-indexed arrays to avoid the overhead of string re-allocation caused by prepending a padding character to very long strings.
Region metadata continues to use 1-indexed positions for user output.
VariantCounterStrategy now PositionCounterStrategy.
PositionCounterStrategy expects 1-indexed lists of positions; prepare populates the listed locations with 1 and then evaluate returns the sum as before.
test_regression2 updated to account for converting 1-index to 0-index when manually handling the sequence for expected results.
query accepts gmax and gmin arguments to filter candidate regions by the group-track value.
CandidateList removed and replaced with simply returning a new Goldilocks.
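
The 1-indexed position handling described for PositionCounterStrategy above can be sketched as follows. The function `count_positions` and its parameters are hypothetical and are not part of the Goldilocks API. The sketch only mirrors the described behaviour: mark each listed location with 1 in a 0-indexed array, then sum over the region.

```python
def count_positions(positions, region_start, region_end, chrom_length):
    """Count 1-indexed positions that fall inside a 1-indexed, inclusive region."""
    # "prepare" step: a 0-indexed array holding 1 at each listed location.
    track = [0] * chrom_length
    for pos in positions:
        track[pos - 1] = 1  # convert the 1-indexed position to a 0-indexed slot

    # "evaluate" step: sum the slice that covers the region.
    # A 1-indexed inclusive range [start, end] maps to the slice [start - 1:end].
    return sum(track[region_start - 1:region_end])


# Positions 4 and 5 lie inside the region 4..8; positions 1 and 9 do not.
print(count_positions([1, 4, 5, 9], 4, 8, 10))  # 2
```

Keeping the stored array 0-indexed while accepting and reporting 1-indexed positions matches the entries above: only the boundary between user input and internal storage needs the off-by-one conversion.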

Goldilocks.sorted_regions stores a list of region ids to represent the result of a sorting operation following a call to query.
Regions in Goldilocks.regions now always have a copy of their “id” as a key.
__check_exclusions now accepts a group and track for more complex exclusion-based operations.
region_group_lte and region_group_gte added to usable exclusion fields to remove regions where the value of the desired group/track combination is less/greater than or equal to the value of the group/track set by the current query.
query now returns a new Goldilocks instance, rather than a CandidateList.
Goldilocks.candidates property now allows access to regions; this property will maintain the order of sorted_regions if one exists.
export_meta now allows group=None.
CandidateList class deleted.
Test data that is no longer used has been deleted.
Scripts for generating test data added to test_gen/ directory.
Tests updated to reflect the fact CandidateList lists are no longer returned by query.
_filter is to be deprecated in favour of query by 0.0.7.

Massively updated! Compatibility with previous versions is very broken.
Software retrofitted to be much more flexible to support a wider range of problems.