v0.1.0

github-actions released this 13 May 07:49

d043698

0.1.0 (2024-05-10)

⚠ BREAKING CHANGES

return data as NDJSON instead of JSON
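Before this change, a client could decode the whole response body with a single JSON parse. With NDJSON, each line of the body holds one JSON value, so the body has to be read line by line. Below is a minimal sketch of that in Python. It assumes each non-empty line is one result object. The function name `parse_ndjson` and the fields `country` and `count` are hypothetical and do not come from SILO's response schema.

```python
import json
from typing import Any, Dict, List


def parse_ndjson(body: str) -> List[Dict[str, Any]]:
    """Parse an NDJSON body: one JSON object per non-empty line."""
    rows = []
    for line in body.splitlines():
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            rows.append(json.loads(line))
    return rows


# Hypothetical body; the field names are illustrative only.
body = '{"country": "Switzerland", "count": 12}\n{"country": "Germany", "count": 7}\n'
rows = parse_ndjson(body)
print(rows[1]["count"])  # 7
```

Reading line by line also lets a client process rows as they arrive instead of buffering the entire response. This is the usual reason to prefer NDJSON for large result sets.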

Features

AAMutations with multiple sequences (0def8b2)
Action for amino acid distribution (a0a4cf1)
add limit orderBy and offset to all query actions (13b7e01)
add log statements to loadDatabaseState (46a0421)
add more tests, make less flaky and viable with large dataset (7772ae3)
add unit test for findIllegalNucleotideChar, unique test case name for insertion contains invalid pattern tests (99c9c4b)
added amino acid insertion search, added many test cases and fixed various bugs (d1e4b2b)
allow default preprocessing config along with user defined preprocessing config (ee9f20e)
allow reading fasta files with missing segments and genes #220 (8ea9893)
allow reading segments and genes that are null from ndjson file #220 (d0a3a7e)
also get Runtime Config options from environment variables (33bdd65)
also log to stdout (54b8a47)
also return the mutation in destructured form, so that it does not need to be reparsed (a93abbf)
Alternative templating of symbol classes (6b61985)
automatically detect file endings for fasta files (75bd14e)
be more lenient on input data, ignore superfluous sequences and fill missing sequences with Ns (ee12186)
Better test coverage for SymbolEquals filter (42c685c)
boolean columns, resolves #384: const declaration (685db9f)
boolean columns: actions/tuple: update assignTupleField() (b020109)
boolean columns: add and use JsonValueType (2ad1268)
boolean columns: add bool to JsonValueType, update tuple (3c990e2)
boolean columns: add expression_type "BooleanEquals" (b82ec69)
boolean columns: add filter_expressions/bool_equals (ff2a138)
boolean columns: add optional_bool (a2e47f8)
boolean columns: add storage/column/bool_column (5eb3430)
boolean columns: column_group: update ColumnPartitionGroup (d7adecf)
boolean columns: column_group.h: add {ColumnPartitionGroup,ColumnGroup}::bool_columns fields (af44f1a)
boolean columns: database (c555a68)
boolean columns: database_config: add "bool" case to DatabaseConfigReader::readConfig() (12fc7b8)
boolean columns: database_config: add "boolean" case to de/serialisation (e6d5363)
boolean columns: database_config: add BOOL to ValueType (0b47b82)
boolean columns: database_config: update DatabaseMetadata::getColumnType() (30febdd)
boolean columns: database_partition (0aef8f0)
boolean columns: optional_bool: add == (f0aa3e8)
boolean columns: selection (e52f2dc)
build metadata in parallel to sequences. Do not create unaligned sequence tables in preprocessing, rather hive-partition them directly to disk. Better (debug-)logging (c1cdfeb)
bulk Tuple allocations now possible (902ec04)
clearer Operator::negate and Expression::toString, logical Equivalents for debug printing/logging for the Leaf Operators IndexScan and BitmapSelection (026b639)
consistent behavior of configs when starting SILO with both --preprocessing and --api (847ec7e)
declutter README.md by removing the linting option, which is now disabled by default and enforced by the linter in CI (9220435)
details no longer shows insertions (#354) (473cd98)
display database info after loading new database state (0249416)
display preprocessing duration in logs in human-readable format (not in microseconds) #296 (a2499af)
do not enforce building with clang-tidy by default. Linter will still be enforced (7134e45)
FastaAligned action (50776c8)
faster builds by copying @corneliusroemer image caching for our dependency images, which rarely change (#374) (7867bc7)
filter for amino acids (b52aabd)
fix sorting (1ed18ae)
flipped bitmap can now be set before insertion (f61c803)
format DatabaseConfig (4fb8f1b)
format PreprocessingConfig (ee35207)
generalize mutations action to have consistent behavior for different symbols (9834aea)
generalizing symbol and mutation filters. Clear handling of ambiguous symbols (aa9ad4d)
Generalizing the config for multiple nucleotide sequences and multiple genes (9a80204)
have structured and destructured insertion in insertions response (0a7e46a)
hide intermediate results of the preprocessing: don't put them in the output (44327b0)
implement basic request id to trace requests #303 (4defb59)
implement data updates at runtime. More resilient to superfluous or missing directory separators (dc5dfaa)
implement insertion columns and search (9167236)
improve loadDB speeds (2b7cd7d)
improve validation error message of some actions on orderByFields (a0da5b5)
insertion action targets all insertion columns by default (6b70241)
insertion columns for amino acids and multiple sequence names (3cc8fee)
insertions action (e067062)
insertions contains action now targets all columns if the column name is missing (32a6951)
introduce new storage type for Sequence Positions, where the most numerous symbol is deleted (6e15204)
introduce storage of unaligned sequences from either ndjson file or fasta file and make them queryable via the Fasta action (44df849)
load table lazily. Unaligned Sequences do not need to load the table (c2a8439)
log databaseConfig and preprocessingConfig (d2dc58c)
logging for partition (e75a925)
logging improvements (4c12a88)
make database serializable again (2523e67)
make pangoLineageDefinitionFilename in preprocessing config optional, linter errors (0f3dc53)
make partition_by field in config optional (3942418)
make SILO Docker image by default read data from /data (e83b910)
make threads and max queued http connections available through optional parameter (3ecde68)
migration to duckdb 0.10.1 (c1426ef)
mine data version at beginning of preprocessing (362fe0f)
More robust InputStreamWrapper (305dd36)
multiple performance improvements for details endpoint (28f41d0)
optimize bitmaps before finishing partition (5b06d58)
order all actions by default (a2f5c04)
preparation of insertion columns (c14a370)
put output and logs to gitignore (789e489)
reenable bitmap inversion (75ac20f)
reenable pushdown of And expressions through selections (802bec0)
refactor saving and loading database to not require preprocessing structs anymore (45bf7ed)
reintroduce randomize for all query actions (166045c)
reserve space in columns when bulk inserting rows (e3c9620)
return data as NDJSON instead of JSON (c236ba4), closes #126
return data version on each query (be5c886)
return only aliased pango lineages (abf0844)
reveal some more details when reading YAML fails (1f8d9db)
run preprocessing in github ci (f53eddb)
save database state into folder with name <data version> (41923eb)
separate preprocessing and starting silo (9808e2d)
serialization of partition descriptor to json (472c1da)
some suggestions for the insertion search (ae900da)
Specifiable nucSequence query target (7cc609f)
statically disable deleted symbol optimisations because of performance penalty (4e522f4)
stick to the default of having config value keys in camel case (a1cae40)
storing amino acids (f11a330)
support for nullable columns (6f78e3a)
support recombinant lineages (3e848a5)
template class for sequence store (cef4d48)
templatized Symbol classes (6b9d734)
test set with amino acid insertions (de2c4f8)
throw an error when no initialized database has been loaded yet #295 (b17f72a)
tidying up CLI and file configuration for runtime config. Added option for specifying the port (c3a88a0)
Unit tests for Tuple (4fc06e8)
update conan version (5540a67)
use 'pragma once' as include guards instead of 'ifndef...' (bc49aa5)
use own scope for preprocessing (2a93846)
use same default min proportions for mutations actions as the old LAPIS (f42f830)
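One feature entry introduces a storage type for sequence positions "where the most numerous symbol is deleted". This describes a common bitmap-index trick. At each position, the index keeps one set of row ids per symbol but leaves out the set for the most frequent symbol. That set can be rebuilt as the complement of all stored sets. Below is a minimal sketch of the idea under our own assumptions. It is not SILO's C++ implementation. Python sets stand in for compressed bitmaps, and `build_position_index` and `rows_with_symbol` are hypothetical names.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def build_position_index(column: List[str]) -> Tuple[str, Dict[str, Set[int]]]:
    """Index one sequence position, omitting the set of the most frequent symbol.

    `column` holds the symbol of every row at this position and must be non-empty.
    """
    deleted_symbol, _ = Counter(column).most_common(1)[0]
    bitmaps: Dict[str, Set[int]] = {}
    for row_id, symbol in enumerate(column):
        if symbol != deleted_symbol:
            bitmaps.setdefault(symbol, set()).add(row_id)
    return deleted_symbol, bitmaps


def rows_with_symbol(symbol: str, deleted_symbol: str,
                     bitmaps: Dict[str, Set[int]], row_count: int) -> Set[int]:
    """Return the row ids carrying `symbol` at this position."""
    if symbol != deleted_symbol:
        return set(bitmaps.get(symbol, set()))
    # The deleted symbol is every row not claimed by any stored symbol.
    stored = set().union(*bitmaps.values())
    return set(range(row_count)) - stored


column = ["A", "A", "T", "A", "-"]
deleted, bitmaps = build_position_index(column)
print(deleted, sorted(rows_with_symbol("A", deleted, bitmaps, len(column))))  # A [0, 1, 3]
```

Dropping the largest set saves the most space. The cost is that a query for the deleted symbol needs a union and a complement instead of a direct lookup. This is the kind of trade-off the entry about statically disabling deleted-symbol optimisations alludes to.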

Bug Fixes

adapt randomize query results to target architecture. x86 and ARM have possibly different std::hash results (#355) (600000d)
add bash dependency which is required by conan build of pkgconf and is not installed on alpine by default (1b3f51c)
add insertion to database_config test (aec40d0)
add missing file for test (262bedc)
add missing sequenceName field to mutation action "orderBy" (06c8c86)
add sleep statement before row call (8fa8efb)
add workaround so insertions are read correctly (8a2bfa8)
allow sql keywords for metadata field names #259 (6fbeee5)
also consider 'missing' symbols in the mutation action. Bugfix where Position invariant was broken because of 'missing' symbols (fab72a6)
alternative non-exhaustive three mer index (467086f)
always build dependency image for amd64 platform (4f73cb0)
bug when filtering for indexedStringColumns which are not present in some partitions (62d4e08)
bug where sequence reconstruction was wrong when the flipped bitmap differed from the reference sequence symbol (edac58c)
change random ordering for gcc hashing (78055ff)
change test to reflect new optimisations (cb63010)
compiling And: append selection_child->predicates to predicates - not vice versa (bdebca5)
deterministic order for e2e test (02427a2)
divergence between mac and linux info test results, fix memory leaks in Threshold.cpp (ffcefd0)
do not exclude zstd filter from boost installation (c974230)
do not use std::filesystem::path::relative_path() to also support absolute paths (b9ff422)
endToEndResults (c807a33)
error when the Mutations action looked for sequences but the filter was empty (91e5d52)
fix memory leaks in indexed_string_column.cpp and insertion_contains.cpp (c2eeac8)
floatEquals and floatBetween with null values (47b436e)
hide nucleotide sequence for default sequence (584715c)
insertion column, remove reference (d940c52)
insertion search e2e and insertion column tests, don't allow non-empty value for insertion search (19af04d)
linking error on linux (ad9076c)
linter (54d3c34)
linter (e6b6ab7)
linter (b31851c)
linter (34830ab)
linter errors (e9e1bbf)
Linter throws again and added clang-format option (87cb4a5)
Make C++ flags in CMake compatible for MacOS (38cae69)
metadata info test no longer accesses getMetadataFields output directly but via the address of a const (0dcdee1)
missing include (0773999)
new linter errors (ea6934c)
no longer have regression when no bitmap flipped is most efficient (b653753)
nodiscard (silence warnings) (69421c1), closes #390
non default unaligned nucleotide sequence prefix (93b4829)
nucleotide symbol equals with dot (6ad623e)
only apply order-by if the field is set, validate orderBy fields for all operations (405a7f1)
pango lineage filter with null values (b7238a8)
parse error messages for mutation filter expressions (9e4612d)
put compressors into sql function to avoid static variables (c1a11c8)
quoting {} in "x.{}" SQL struct accesses, as a string starting with a number leads to parser errors (#409) (f3ba6db)
random (but deterministic for a version) result can depend on internal state, which was changed with duckdb update (9d9351b)
recursive file reading for nodejs<20 (9ac0ffe)
remove caching for linter. Docker image too large on GitHub Actions. (8eaeee6)
response format (bca4961)
revert duckdb migration because the new version could not be built on the GitHub runner (2a62c54)
revert test numbers to pre-optimization (99b600a)
Roaring from 1.3.0 -> 1.0.0 because of broken CI (ce82c77)
serialization for insertion_index and insertion_column (ee99f8d)
single partition build fix (a8af1c9)
specify namespace fmt in calls to format_to (#353) (62ffa3b)
specifying apk versions (2c2354e)
test cases verifying that the positions index for mutation distribution are now 1-indexed (fdf972a)
test with deterministic results, remove 2 unused variables (96a424b)
unit test info number updates for new pango-lineages in test data (39d07c2)
unit tests and mock fixtures (db506a1)
update cmake version on ubuntu (227a2dd)
Upper and lower bound should be inclusive in DateBetween filter (53c6c05)
Wrong compare function used in multi-threaded case, which displayed wrong tuples in the details endpoint when a limit was used (650bf36)
zstd dependency (c145722)
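The DateBetween fix changes how rows on the edge of a range are handled: a row dated exactly on the lower or upper bound now matches. Below is a minimal sketch of the inclusive comparison in Python. Three things here are assumptions for illustration and are not taken from SILO: the function name `date_between`, treating a missing bound as unbounded, and the rule that null dates never match.

```python
from datetime import date
from typing import Optional


def date_between(value: Optional[date], lower: Optional[date],
                 upper: Optional[date]) -> bool:
    """Inclusive range check: lower <= value <= upper.

    Assumptions for this sketch: a missing bound (None) means "unbounded",
    and a row whose date is null never matches.
    """
    if value is None:
        return False
    if lower is not None and value < lower:
        return False
    if upper is not None and value > upper:
        return False
    return True


# Both bounds are inclusive, so a single-day range matches that day.
print(date_between(date(2024, 5, 10), date(2024, 5, 10), date(2024, 5, 10)))  # True
```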
