Development #117
Merged
Refactoring temporary json file concatenation
* Uncomment dask dataframe import.
* Remove json output type from list of parquet incompatible formats.
* Specify dtypes for dask dataframe read from json.
* Enable reading json files into dask dataframe and writing as parquet file.
* Enable files to be read in binary mode for all output formats.
* Change json output field for j gene from score to assigner_score.
* Revert change in line position to read file in binary mode.
* Converting IMGT positions from integers or floats to string.
* Coerce raw_position to be stringtype.
* Add schema for JSON fields datatypes to override when writing to parquet.
* Additional schema attributes.
* Convert schema to full pyarrow schema for full dataset.
* Add columns desired order and dtypes for dataframe metadata.
* Reorder dtype fields.
* Remove unneeded column and dtype information.
* Edit json reading and parquet writing code.
* Add additional schema attributes involved in BCR.
* Reorder schema fields.
* Reorder pyarrow schema.
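The commit above coerces IMGT positions from integers or floats to strings before the data is written to parquet. The following is a minimal sketch of that kind of coercion in plain Python, not code from the repository. The helper name `position_to_str`, the choice to render whole-number floats without a trailing `.0`, and the mapping of NaN to `None` are all assumptions made for illustration.

```python
import math


def position_to_str(value):
    """Return a position as a string (hypothetical helper, not from the PR).

    Assumptions: None and NaN mean "missing" and map to None; a float with
    no fractional part is written without the trailing '.0'.
    """
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None
        if value.is_integer():
            return str(int(value))
    return str(value)


# Mixed input types end up as one uniform string-typed column.
raw_positions = [27, 27.0, 111.1, '112A', None]
positions = [position_to_str(p) for p in raw_positions]
```

Coercing every value to one type up front means the column can be declared as a string field in a fixed schema instead of being inferred chunk by chunk.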
… already assigned V gene
… end of the V and/or the 3' end of the J
* Replace string dtypes with object dtypes.
* Add attribute to the `write_output` function to indicate whether parquet will be written.
* Write parquet files directly in place of temporary JSON files.
* Add flag to ignore datatype conversion errors when casting integer columns with NaNs, and change output file name.
* Edit concat_outputs to simply move files for parquet files generated from json output.
* Change file path construction from string concatenation to os.path.join.
* Minor edit to os.path.join.
* Add if statement to check that the file exists before attempting to delete the temporary file.
* Add `.snappy` file extension to parquet files.
* Simplify file naming by simply moving files to the directory instead.
* Simplify specifying columns by changing `schema.names` to `dtypes`.
* Parse strings of dictionaries into dictionaries with `json.loads` before loading into dataframe.
* Read in temporary parquet files, repartition, and write the parquet files back.
* Stop explicitly setting the parquet metadata-file option to False, as that is already the default function argument.
* Remove unused imports.
* Remove if condition that checked for temp files before deleting them.
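Two of the steps listed above, parsing dictionary strings with `json.loads` and moving finished temporary files into an output directory using `os.path.join`, can be sketched with the standard library alone. This is an illustrative sketch, not the repository's `concat_outputs` implementation. The function names `parse_records` and `move_temp_files`, the `v_gene` field, and the file name `chunk_0.parquet` are hypothetical.

```python
import json
import os
import shutil
import tempfile


def parse_records(raw_lines):
    """Turn JSON-encoded strings into dictionaries, skipping blank lines."""
    return [json.loads(line) for line in raw_lines if line.strip()]


def move_temp_files(temp_paths, output_dir):
    """Move finished temporary files into output_dir; return the new paths."""
    os.makedirs(output_dir, exist_ok=True)
    moved = []
    for path in temp_paths:
        # Build the destination with os.path.join, not string concatenation.
        destination = os.path.join(output_dir, os.path.basename(path))
        shutil.move(path, destination)
        moved.append(destination)
    return moved


# Small demonstration inside a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    records = parse_records(['{"v_gene": "IGHV1-2", "assigner_score": 10}', ''])
    temp_file = os.path.join(workdir, 'chunk_0.parquet')  # placeholder content
    with open(temp_file, 'w') as handle:
        handle.write('placeholder')
    moved_paths = move_temp_files([temp_file], os.path.join(workdir, 'output'))
    moved_ok = os.path.exists(moved_paths[0]) and not os.path.exists(temp_file)
```

Moving a file avoids reading and rewriting its contents, which fits the commit's description of replacing concatenation with a simple move for parquet output.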
* Fix chunking of fastq files
* ignore vscode
* Replace double quotation marks with single quotes for consistency with rest of codebase.
* Add empty line at EOF.
* Allow matplotlib to be installed at the latest version since scanpy has upgraded its matplotlib support.
* Add comments to better explain code edits.
…nctions (no need for a matrix)
Add support to write in parquet format
Preprocessing
No description provided.