- Added multiprocessing capabilities during census step.
- Added a simple command line interface.
- Removed the prepare-evaluate paradigm from strategies; counts are now performed directly on input data in one step.
- Skip slides (and set all counts to 0) if their end_pos falls outside of the region on that particular genome’s chromosome/contig.
- Renamed KMerCounterStrategy to MotifCounterStrategy.
- Fixed bug causing use_and to not work as expected for chromosomes not explicitly listed in the exceptions dict when also using use_chrom.
- Support use of FASTA files which must be supplied with a samtools faidx style index.
- Stopped supporting Python 3 due to incompatibility with buffer and memoryview.
- Prevent query from deep copying itself on return. Note this means that a query will alter the original Goldilocks object.
- Now using a 3D numpy matrix to store counters with memory shared to support multiprocessing during census.
- Removed StrategyValue, as its instances cannot be stored in shared memory. Ratio-based strategies are a bit of a hack for now, but they still work.
- TL;DR: Goldilocks is at least 2-4x faster than previously, even without multiprocessing.
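
The FASTA support noted above depends on a samtools faidx style index. Each line of such an index holds five tab-separated fields per sequence: name, length, byte offset of the first base, bases per line, and bytes per line. The sketch below is not Goldilocks' own code; read_fai and fetch are hypothetical helper names, and it is written for current Python purely as an illustration of how the index allows a region to be read without loading the whole file.

```python
import os
import tempfile


def read_fai(fai_path):
    """Parse a faidx index into {name: (length, offset, line_bases, line_width)}."""
    index = {}
    with open(fai_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            name = fields[0]
            length, offset, line_bases, line_width = (int(f) for f in fields[1:5])
            index[name] = (length, offset, line_bases, line_width)
    return index


def byte_position(pos0, offset, line_bases, line_width):
    """Map a 0-indexed base position to its byte position in the FASTA file."""
    # Each full line before the base contributes line_width bytes (bases + newline).
    return offset + (pos0 // line_bases) * line_width + pos0 % line_bases


def fetch(fasta_path, index, name, start, end):
    """Return bases start..end (1-indexed, inclusive) of the named sequence."""
    length, offset, line_bases, line_width = index[name]
    if start < 1 or end > length or start > end:
        raise ValueError("region falls outside of the sequence")
    byte_start = byte_position(start - 1, offset, line_bases, line_width)
    byte_end = byte_position(end - 1, offset, line_bases, line_width)
    with open(fasta_path, "rb") as handle:
        handle.seek(byte_start)
        chunk = handle.read(byte_end - byte_start + 1)
    # Drop the line breaks that fall inside the requested span.
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode("ascii")


# Hypothetical demo data: one 14-base sequence wrapped at 6 bases per line.
workdir = tempfile.mkdtemp()
fasta_path = os.path.join(workdir, "demo.fa")
with open(fasta_path, "w", newline="\n") as handle:
    handle.write(">chr1\nACGTAC\nGTACGT\nAC\n")
with open(fasta_path + ".fai", "w", newline="\n") as handle:
    handle.write("chr1\t14\t6\t6\t7\n")

index = read_fai(fasta_path + ".fai")
region = fetch(fasta_path, index, "chr1", 5, 8)
print(region)
```

The bounds check in fetch mirrors the idea of skipping a slide whose end position falls outside the sequence, although here it raises an error rather than setting counts to 0.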
- Officially add MIT license to repository.
- Update and tidy examples.py.
- is_seq argument to initialisation removed and replaced with is_pos.
- Use is_pos to indicate the expected input is positional, not sequence.
- Force use of PositionCounterStrategy when is_pos is True.
- Sequence data is now read into 0-indexed arrays, avoiding the overhead of string
re-allocation caused by prepending a padding character to very long strings.
- Region metadata continues to use 1-indexed positions for user output.
- VariantCounterStrategy renamed to PositionCounterStrategy.
- PositionCounterStrategy expects 1-indexed lists of positions;
prepare populates the listed locations with 1 and then evaluate returns the sum as before.
- test_regression2 updated to account for converting 1-index to 0-index when
manually handling the sequence for expected results.
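
The indexing convention described above can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: mark_positions and count_in_region are hypothetical names. Positions supplied as 1-indexed are written into a 0-indexed array as 1s, and a region given in 1-indexed inclusive coordinates is counted by summing the matching slice.

```python
def mark_positions(positions, length):
    """Build a 0-indexed array with a 1 at each listed 1-indexed position."""
    track = [0] * length
    for pos in positions:
        if not 1 <= pos <= length:
            raise ValueError("position %d outside 1..%d" % (pos, length))
        track[pos - 1] = 1  # shift from 1-indexed input to 0-indexed storage
    return track


def count_in_region(track, start, end):
    """Sum the marks in a region given as 1-indexed, inclusive coordinates."""
    # The slice start-1:end covers 0-indexed elements start-1 .. end-1.
    return sum(track[start - 1:end])


# Hypothetical input: three positions on a 10-base sequence.
track = mark_positions([2, 3, 10], length=10)
first_half = count_in_region(track, 1, 5)
second_half = count_in_region(track, 6, 10)
print(first_half, second_half)
```

Keeping the conversion in one place (pos - 1 on the way in, start - 1 on the way out) is what lets user-facing region metadata stay 1-indexed while storage is 0-indexed.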
- query accepts gmax and gmin arguments to filter candidate regions by the group-track value.
- CandidateList removed and replaced with simply returning a new Goldilocks.
- Goldilocks.sorted_regions stores a list of region ids to represent the result of a sorting operation following a call to query.
- Regions in Goldilocks.regions now always have a copy of their “id” as a key.
- __check_exclusions now accepts a group and track for more complex exclusion-based operations.
- region_group_lte and region_group_gte added to usable exclusion fields to remove regions where the value of the desired group/track combination is less/greater than or equal to the value of the group/track set by the current query.
- query now returns a new Goldilocks instance, rather than a CandidateList.
- Goldilocks.candidates property now allows access to regions. This property maintains the order of sorted_regions if one has been set.
- export_meta now allows group=None.
- CandidateList class deleted.
- Test data that is no longer used has been deleted.
- Scripts for generating test data added to test_gen/ directory.
- Tests updated to reflect the fact that CandidateList objects are no longer returned by query.
- _filter is to be deprecated in favour of query by 0.0.7.
- Massively updated! Compatibility with previous versions very broken.
- Software retrofitted to be much more flexible to support a wider range of problems.
- Remove incompatible use of print
- Initial package