- Fix for pysradb download - using public_url
- BREAKING change: Overhaul of how urls and associated metadata are returned (not backward compatible); all column names are lower cased by default
- Fix extra space in "organism_taxid" column
- Added support for Experiment attributes (#89 <saketkc#89 (comment)>)
- Fix ENA fastq fetching (#163 <saketkc#163>)
- Fix for fetching alternative URLs
- Added ability to fetch alternative URLs (GCP/AWS) for metadata (#161 <saketkc#161>)
- Fix for xmltodict 0.13.0 no longer defaulting to OrderedDict (#159 <saketkc#159>)
- Fix for missing experiment model and description in metadata (#160 <saketkc#160>)
- Add study_title to --detailed flag (#152)
- Fix KeyError in metadata where some new IDs do not have any metadata (#151)
- Do not exit if a query returns no hits (#149 <saketkc#149>)
- Fixed gsm-to-gse failure (#128)
- Fixed case sensitivity bug for ENA search (#144)
- Fixed publication date bug for search (#146)
- Added support for downloading data from GEO: pysradb download -g <GSE> (#129)
- Dropped Python 3.6 support since pandas 1.2 does not support it
- Retired metadb and SRAdb based search through CLI - everything defaults to SRAweb
- SRAweb now supports search
- N/A is now replaced with pd.NA
- Two new fields in --detailed: instrument_model and instrument_model_desc #75
- Updated documentation
- library_layout is now outputted in metadata #56
- --detailed unifies columns for ENA fastq links instead of appending _x/_y #59
- bugfix for parsing namespace in xml outputs #65
- XML errors from NCBI are now handled more gracefully #69
- Documentation and dependency updates
- pysradb download now supports multiple threads for parallel downloads
- pysradb download also supports ultra fast downloads of FASTQs from ENA using aspera-client
- Added test cases for SRAweb
- API limit exceeding errors are automagically handled
- Bug fixes for GSE <=> SRR
- Bug fix for metadata - supports multiple SRPs
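
The rate-limit handling mentioned above can be illustrated with a retry loop that backs off between attempts. This is a minimal sketch and not pysradb's actual code: RateLimitError, call_with_retries, and the delay values are hypothetical names and numbers chosen for illustration.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for an 'API rate limit exceeded' response."""


def call_with_retries(func, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying with exponential backoff on RateLimitError.

    All names and defaults here are assumptions made for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Wait base_delay, then twice that, then four times that, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing the sleep function as a parameter keeps the helper easy to exercise without real waiting.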
Contributors
- Dibya Gautam
- Marius van den Beek
- Bug fix: Handle API-rate limit exceeding => Retries
- Enhancement: 'Alternatives' URLs are now part of --detailed
- Bug fix: Handle Python3.6 for capture_output in subprocess.run
- All the subcommands (srx-to-srr, srx-to-srs) will now print additional columns where the first two columns represent the relevant conversion
- Fixed a bug when fetching entries with a single efetch record
- Major fix: some SRRs would go missing as the experiment dict was being created only once per SRR (See #15)
- Features: More detailed metadata by default in the SRAweb mode
- See notebook: https://colab.research.google.com/drive/1C60V-
- Feature: instrument, run size and total spots are now printed in the metadata by default (SRAweb mode only)
- Issue: Fixed an issue with srapath failing on SRP. srapath is now run on individual SRRs.
- Introduced SRAweb to perform queries over the web if the SQLite file is missing or does not contain the relevant record.
- This release completely changes the command line interface replacing click with argparse (saketkc#3)
- Removed stale Python 2 compatible code
- srr-to-gsm: convert SRR to GSM
- SRAmetadb.sqlite.gz file is deleted by default after extraction
- When SRAmetadb is not found, confirmation is sought before downloading
- Confirmation option before SRA downloads
- download() works with wget
- --out_dir is now --out-dir
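
For the capture_output entry above: the capture_output argument of subprocess.run was added in Python 3.7, so code that must also run on 3.6 can pass pipes explicitly instead. The following is a minimal sketch of that pattern, not the project's own implementation; run_captured is a hypothetical helper name.

```python
import subprocess
import sys


def run_captured(cmd):
    """Run cmd and capture its output on Python 3.6 as well as 3.7+."""
    # capture_output=True only exists from Python 3.7 onwards; passing PIPE
    # for both streams is the equivalent that also works on 3.6.
    return subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode bytes to str; also spelled text=True from 3.7
        check=False,
    )


# Use the current interpreter as a harmless command to run.
result = run_captured([sys.executable, "-c", "print('hello')"])
print(result.returncode, result.stdout.strip())
```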
Important: Python2 is no longer supported. Please consider moving to Python3.
- Included docs in the index which were missed out in the previous release
- gsm-to-srr: convert GSM to SRR
- gsm-to-srx: convert GSM to SRX
- gsm-to-gse: convert GSM to GSE
The following command line options have been renamed and the changes are not compatible with 0.6.0 release:
- sra-metadata -> metadata.
- sra-search -> search.
- srametadb -> metadb.
- Fixed bugs introduced in 0.5.0 with API changes where multiple redundant columns were output in sra-metadata
- download now allows piped inputs
- Support for filtering by SRX Id for SRA downloads.
- srr_to_srx: Convert SRR to SRX/SRP
- srp_to_srx: Convert SRP to SRX
- Stripped down sra-metadata to give minimal information
- Added --assay, --desc, --detailed flags for sra-metadata
- Improved table printing on terminal
- Fixed unicode error in tests for Python2
- Added a new BASEdb class to handle common database connections
- Initial support for GEOmetadb through GEOdb class
- Initial support for a command line interface:
  - download: Download SRA project (SRPnnnn)
  - gse-metadata: Fetch metadata for GEO ID (GSEnnnn)
  - gse-to-gsm: Get GSM(s) for GSE
  - gsm-metadata: Fetch metadata for GSM ID (GSMnnnn)
  - sra-metadata: Fetch metadata for SRA project (SRPnnnn)
- Added three separate notebooks for SRAdb, GEOdb, CLI usage
- sample_attribute and experiment_attribute are now included by default in the df returned by sra_metadata()
- expand_sample_attribute_columns: expand metadata dataframe based on attributes in the sample_attribute column
- New methods to guess cell/tissue/strain: guess_cell_type()/guess_tissue_type()/guess_strain_type()
- Improved README and usage instructions
- search_sra() allows full text search on SRA metadata.
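
The idea behind expanding the sample_attribute column can be sketched in plain Python. This is not the library's implementation, which works on a dataframe: the "key: value || key: value" layout, the helper names, and the example record are all assumptions made for illustration.

```python
def parse_sample_attribute(value, pair_sep="||", key_sep=":"):
    """Split 'key: value || key: value' into a dict (assumed layout)."""
    attributes = {}
    for pair in value.split(pair_sep):
        if key_sep not in pair:
            continue  # skip fragments that carry no key
        key, _, val = pair.partition(key_sep)
        # Normalise the key so it can serve as a column name.
        attributes[key.strip().lower().replace(" ", "_")] = val.strip()
    return attributes


def expand_rows(rows, column="sample_attribute"):
    """Return copies of rows with the parsed attributes added as new keys."""
    return [{**row, **parse_sample_attribute(row.get(column) or "")} for row in rows]


# Made-up record, for illustration only.
rows = [{"run": "SRR0000001", "sample_attribute": "source_name: liver || cell line: HepG2"}]
expanded = expand_rows(rows)
print(expanded[0]["source_name"], expanded[0]["cell_line"])
```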
The following methods have been renamed and the changes are not compatible with 0.1.0 release:
- get_query() -> query().
- sra_convert() -> sra_metadata().
- get_table_counts() -> all_row_counts().
- download_sradb_file() makes fetching SRAmetadb.sqlite file easy; wget is no longer required.
- ftp protocol is now supported besides fasp and hence aspera-client is now optional. However, we strongly recommend aspera-client for faster downloads.
- Silenced SettingWithCopyWarning by explicitly doing operations on a copy of the dataframe instead of the original.
Besides these, all methods now follow numpydoc compatible documentation.
- First release on PyPI.