Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to Clang tidy 20 #535

Merged
merged 2 commits into from
Aug 6, 2024
Merged

Update to Clang tidy 20 #535

merged 2 commits into from
Aug 6, 2024

Conversation

Taepper
Copy link
Collaborator

@Taepper Taepper commented Aug 6, 2024

Summary

PR Checklist

  • All necessary documentation has been adapted or there is an issue to do so.
  • The implemented feature is covered by an appropriate test.

@Taepper Taepper requested a review from pflanze August 6, 2024 09:22
Copy link
Contributor

github-actions bot commented Aug 6, 2024

This is a preview of the changelog of the next release:

0.3.0 (2024-08-06)

⚠ BREAKING CHANGES

  • old database_config files might be invalid if they contained insertion columns. Also, we are more prohibitive for ndjson input files, which now MUST contain nucleotide/amino acid insertions for all respective sequences. The insertions action and filter do no longer require a column field.
  • return data as NDJSON instead of JSON

Features

  • add more tests, make less flaky and viable with large dataset (7772ae3)
  • add number of partitions to /info (1e780b5)
  • add serializationVersion to SILO output for smoother transitioning to new formats (d3badb6)
  • added amino acid insertion search, added many test cases and fixed various bugs (d1e4b2b)
  • allow default preprocessing config along with user defined preprocessing config (ee9f20e)
  • allow null for sequenceName in insertion contains queries (6dbe251)
  • allow reading fasta files with missing segments and genes #220 (8ea9893)
  • allow reading segments and genes that are null from ndjson file #220 (d0a3a7e)
  • also get Runtime Config options from environment variables (33bdd65)
  • also log to stdout (54b8a47)
  • also return mutation destructed that does not need to be reparsed (a93abbf)
  • Alternative templating of symbol classes (6b61985)
  • Better test coverage for SymbolEquals filter (42c685c)
  • boolean columns, resolves Boolean columns #384: const declaration (685db9f)
  • boolean columns: actions/tuple: update assignTupleField() (b020109)
  • boolean columns: add and use JsonValueType (2ad1268)
  • boolean columns: add bool to JsonValueType, update tuple (3c990e2)
  • boolean columns: add expression_type "BooleanEquals" (b82ec69)
  • boolean columns: add filter_expressions/bool_equals (ff2a138)
  • boolean columns: add optional_bool (a2e47f8)
  • boolean columns: add storage/column/bool_column (5eb3430)
  • boolean columns: column_group: update ColumnPartitionGroup (d7adecf)
  • boolean columns: column_group.h: add {ColumnPartitionGroup,ColumnGroup}::bool_columns fields (af44f1a)
  • boolean columns: database (c555a68)
  • boolean columns: database_config: add "bool" case to DatabaseConfigReader::readConfig() (12fc7b8)
  • boolean columns: database_config: add "boolean" case to de/serialisation (e6d5363)
  • boolean columns: database_config: add BOOL to ValueType (0b47b82)
  • boolean columns: database_config: update DatabaseMetadata::getColumnType() (30febdd)
  • boolean columns: database_partition (0aef8f0)
  • boolean columns: optional_bool: add == (f0aa3e8)
  • boolean columns: selection (e52f2dc)
  • build metadata in parallel to sequences. Do not create unaligned sequence tables in preprocessing, rather hive-partition them directly to disk. Better (debug-)logging (c1cdfeb)
  • bulk Tuple allocations now possible (902ec04)
  • bump serialization version to 2 (e897203)
  • bump serialization version to 3 (7584214)
  • change base image to ubuntu (64dee0d)
  • clearer Operator::negate and Expression::toString, logical Equivalents for debug printing/logging for the Leaf Operators IndexScan and BitmapSelection (026b639)
  • consistent behavior of configs when starting SILO with both --preprocessing and --api (847ec7e)
  • declutter README.md from linting option, which is now disabled by default and enforced in the CI for the Linter (9220435)
  • details no longer shows insertions (#354) (473cd98)
  • display database info after loading new database state (0249416)
  • display preprocessing duration in logs in human-readable format (not in microseconds) #296 (a2499af)
  • do not enforce building with clang-tidy by default. Linter will still be enforced (7134e45)
  • faster builds by copying @corneliusroemer image caching for our dependency images, which rarely change (#374) (7867bc7)
  • generalize mutations action to have consistent behavior for different symbols (9834aea)
  • generalizing symbol and mutation filters. Clear handling of ambiguous symbols (aa9ad4d)
  • have no default sequence by default, implement default amino acid sequence from config (2edf924), closes #454
  • have structured and destructured insertion in insertions response (0a7e46a)
  • hide intermediate results of the preprocessing - don't put it in the output (44327b0)
  • implement basic request id to trace requests #303 (4defb59)
  • implement data updates at runtime. More resilient to superfluous or missing directory separators (dc5dfaa)
  • improve loadDB speeds (2b7cd7d)
  • improve validation error message of some actions on orderByFields (a0da5b5)
  • insertion action targets all insertion columns by default (6b70241)
  • insertion columns for amino acids and multiple sequence names (3cc8fee)
  • insertions action (e067062)
  • insertions contains action now targets all columns if the column name is missing (32a6951)
  • insertions no longer in metadata or databaseConfig, instead expected for all aligned sequences #372 (1f3680c)
  • introduce new storage type for Sequence Positions, where the most numerous symbol is deleted (6e15204)
  • introduce storage of unaligned sequences from either ndjson file or fasta file and make them queryable via the Fasta action (44df849)
  • load table lazily. Unaligned Sequences do not need to load the table (c2a8439)
  • logging improvements (4c12a88)
  • make SILO Docker image by default read data from /data (e83b910)
  • make threads and max queued http connections available through optional parameter (3ecde68)
  • migration to duckdb 0.10.1 (c1426ef)
  • mine data version at beginning of preprocessing (362fe0f)
  • multiple performance improvements for details endpoint (28f41d0)
  • optimize bitmaps before finishing partition (5b06d58)
  • reintroduce randomize for all query actions (166045c)
  • reserve space in columns when bulk inserting rows (e3c9620)
  • return data as NDJSON instead of JSON (c236ba4), closes #126
  • return only aliased pango lineages (abf0844)
  • reveal some more details when reading YAML fails (1f8d9db)
  • save database state into folder with name <data version> (41923eb)
  • set the default sequence when there is only a single sequence in the reference genomes (0ae4022), closes #454
  • statically disable deleted symbol optimisations because of performance penalty (4e522f4)
  • stick to the default of having config value keys in camel case (a1cae40)
  • streaming: fasta_aligned: process in sub-partition batches (e87d927), closes #112
  • streaming: fasta: process as a batch per partition (66bfdff), closes #112
  • streaming: fasta: process in sub-partition batches (12f873f), closes #112
  • streaming: fasta: remove limit on total row count (df185cc), closes #112
  • streaming: implement materialization cut-off (fd1a77c), closes #112
  • support SAM files as sequence input and allow partial sequence input with an offset (5be9a9f)
  • template class for sequence store (cef4d48)
  • templatized Symbol classes (6b9d734)
  • test set with amino acid insertions (de2c4f8)
  • throw an error when there is not initialized database loaded yet #295 (b17f72a)
  • throw error in preprocessing when default sequence is not in reference genomes (287bb8b), closes #454
  • tidying up CLI and file configuration for runtime config. Added option for specifying the port (c3a88a0)
  • Unit tests for Tuple (4fc06e8)
  • use 'pragma once' as include guards instead of 'ifndef...' (bc49aa5)

Bug Fixes

  • adapt randomize query results to target architecture. x86 and ARM have possibly different std::hash results (#355) (600000d)
  • add bash dependency which is required by conan build of pkgconf and is not installed on alpine by default (1b3f51c)
  • add missing file for test (262bedc)
  • add missing sequenceName field to mutation action "orderBy" (06c8c86)
  • add workaround so insertions are read correctly (8a2bfa8)
  • allow all sequence-names by escaping them properly in all SQL statements (901fc7e)
  • allow insertions that start at 0 (#449) (f427137)
  • allow sql keywords for metadata field names #259 (6fbeee5)
  • also consider 'missing' symbols in the mutation action. Bugfix where Position invariant was broken because of 'missing' symbols (fab72a6)
  • alternative non-exhaustive three mer index (467086f)
  • always build dependency image for amd64 platform (4f73cb0)
  • be able to start without genes, without nucleotide sequences or with neither (e878ed5)
  • bug when filtering for indexedStringColumns which are not present in some partitions (62d4e08)
  • bug where sequence reconstruction is false when the flipped bitmap is different from the reference sequence symbol (edac58c)
  • change random ordering for gcc hashing (78055ff)
  • change test to reflect new optimisations (cb63010)
  • do not use std::filesystem::path::relative_path() to also support absolute paths (b9ff422)
  • don't abort when reading table in chunks (26b1558)
  • empty input without partitioning (5fa3c92)
  • empty ndjson input files more robust (#473) (9d4232b)
  • endToEndResults (c807a33)
  • erroneous file created during unit tests should not leak (1b764af)
  • error when the Mutations action looked for sequences but the filter was empty (91e5d52)
  • floatEquals and floatBetween with null values (47b436e)
  • hide nucleotide sequence for default sequence (584715c)
  • insertion column, remove reference (d940c52)
  • insertions being added at wrong index for large files (#472) (e056ed9)
  • linking error on linux (ad9076c)
  • linter (54d3c34)
  • Make C++ flags in CMake compatible for MacOS (38cae69)
  • metadata info test accessing getMetadataFields output no longer directly but over the address of a const (0dcdee1)
  • missing include (0773999)
  • more efficient ndjson emptiness check (#481) (344ec7b)
  • new linter errors (ea6934c)
  • nodiscard (silence warnings) (69421c1), closes #390
  • non default unaligned nucleotide sequence prefix (93b4829)
  • nucleotide symbol equals with dot (6ad623e)
  • pango lineage filter with null values (b7238a8)
  • put compressors into sql function to avoid static variables (c1a11c8)
  • query_handler: check for I/O errors when writing to the HTTP stream (de2b475), closes #112
  • quoting {} in "x.{}" SQL struct accesses, as a string starting with a number leads to parser errors (#409) (f3ba6db)
  • random (but deterministic for a version) result can depend on internal state, which was changed with duckdb update (9d9351b)
  • recursive file reading for nodejs<20 (9ac0ffe)
  • reduce function calls and parallelism when inserting rows into storage backend (#509) (93ba858)
  • remove caching for linter. Docker image to large on github actions. (8eaeee6)
  • remove incorrect compile flags (b92ee4e)
  • return instead of continue (03a500a)
  • revert duckdb migration due to it being unable to build the new version on the GitHub runner (2a62c54)
  • revert test numbers to pre-optimization (99b600a)
  • single partition build fix (a8af1c9)
  • specify namespace fmt in calls to format_to (#353) (62ffa3b)
  • specifying apk versions (2c2354e)
  • start with empty files without throwing an error (d407b92)
  • streaming: report an error when sorting requested while streaming (313d6bb)
  • streaming: report an error when the unimplemented offset or limit matters (3be9c06), closes #511
  • test with deterministic results, remove 2 unused variables (96a424b)
  • trigger release due to refactor commit-5e3041e5 (#494) (#496) (ecd4ba0)
  • unit tests and mock fixtures (db506a1)
  • update cmake version on ubuntu (227a2dd)
  • Upper and lower bound should be inclusive in DateBetween filter (53c6c05)
  • Wrong compare function used in multi-threaded case, which displayed wrong tuples in the details endpoint when a limit was used (650bf36)

Copy link
Contributor

@pflanze pflanze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@pflanze
Copy link
Contributor

pflanze commented Aug 6, 2024

PS. As discussed, the plan is to go with this as long as clang-format can be kept at version 17. If that doesn't work anymore, either revert the change to clang-tidy 20, or figure out how to configure clang-format ~exhaustively so that changing clang-format version does not yield formatting changes.

Discussion of the latter: https://stackoverflow.com/questions/59395507/how-to-maintain-a-clang-format-file-for-different-clang-format-versions
Problematic point re clangd: clangd/clangd#1999

Could potentially also generate the .clang-format file from our explicit options and public metadata (is https://clang.llvm.org/docs/ClangFormatStyleOptions.html generated from such metadata?) about the options accepted by the newest version of clang-format; but hopefully someone has already written such a generator, research this.

@Taepper Taepper merged commit 0099471 into main Aug 6, 2024
10 checks passed
@Taepper Taepper deleted the clang-tidy-20 branch August 6, 2024 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants