taxize 0.9.101
- Add rworkflows.
- Update .Rbuildignore for rworkflows.
- Remove old workflow: R-CMD-check.yaml
- Bump taxize version.
- Add rworkflows status badge to README.
Minor release to fix failing tests on CRAN.
Zachary Foster is now the maintainer of taxize.
- tnrs() and tnrs_sources() functions are defunct. The service has been unreliable for years now, and AFAICT is down for good. Associated changes have been made throughout the package, e.g., resolve() no longer has an option for tnrs, etc. (#841) (#842)
- new article/vignette added on issues with taxonomic ranks, e.g., "NCBI is weird", and how rank information is maintained and used within taxize (#852)
- vignettes are no longer on cran - find them at the docs site linked in DESCRIPTION (#855)
- re-instate a tol_resolve() test following new version of rotl package on cran (#816)
- improve class2tree() function documentation regarding how the function works in more detail (#849) (#851)
- improvements for WORMS, applies to functions worms_downstream(), children(..., db="worms") and downstream(..., db="worms"): now paginate automatically for the user to get all results, and allow parameter marine_only to be passed through the high level functions children()/downstream() down to worrms::wm_children() where it toggles whether marine only results are returned (#848) thanks @oharac !
- fix to ncbi_downstream() (which cascades up to downstream(..., db="ncbi")): an unneeded line of code was removed that was also throwing an error in some cases (#850)
- fixes for WORMS ranks, applies to functions worms_downstream(), children(..., db="worms") and downstream(..., db="worms"): added ranks epifamily and infraphylum. In addition, when a rank is missing in data returned from WORMS, we'll change the missing rank to "no rank" (#847)
- improve worms_downstream() docs: make it clear that users can use parameters passed down to worrms::wm_children() (#831)
- improve get_pow_() docs: add section on rate limits, what are rate limits for KEW POW and a user facing resolution (#836)
- add 8 new rank names (via NCBI) to the reference rank data.frame (rank_ref) in the package: biotype, forma specialis, isolate, pathogroup, series, serogroup, serotype, and strain - queries from downstream() and other functions that rely on relative rank information should not fail anymore when they contain these 8 rank names (#830)
- new rank_ref_zoo reference data.frame specifically for zoological rank types - right now only used for WORMS. main difference is section/subsection in rank_ref_zoo are nested between the order and family, whereas in rank_ref (used for all other data sources) section/subsection are on the genus rank level (#833)
- NCBI introduced a new rank "clade", or at least are using it a lot more often - often used instead of "no rank". This was causing some problems in class2tree(). Problem sorted out now (#835) (#838) (#839) (#840)
- Many of the functions in taxize share similar types of inputs (e.g., scientific names, or common names), but many different parameter names are used to refer to the same thing. We've standardized parameter names so they mean the same thing from one function to the next. TLDR: sci will always only accept a scientific name; com accepts only a common name; id accepts a taxonomic identifier; sci_com accepts a scientific or common name; sci_id accepts a scientific name or taxonomic identifier. In most cases we have retained the old parameter name and you can still use it but you get a warning with information. In a future package version the replaced parameters will be removed completely. See #723 for tables covering the functions affected, their old and new parameter names (#723) (#829)
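The "keep the old parameter name but warn" behavior described above can be sketched roughly as follows. This is a minimal illustration in base R, not the taxize implementation: the function lookup_name() and the old parameter name query are hypothetical stand-ins.

```r
# Hypothetical helper (not a taxize function): the old parameter `query`
# still works, but triggers a warning and is forwarded to the new `sci`.
lookup_name <- function(sci = NULL, query = NULL) {
  if (!is.null(query)) {
    warning("`query` is deprecated; use `sci` instead", call. = FALSE)
    if (is.null(sci)) sci <- query
  }
  stopifnot(is.character(sci), length(sci) >= 1)
  toupper(sci)  # stand-in for the real lookup
}

new_style <- lookup_name(sci = "Poa annua")
old_style <- suppressWarnings(lookup_name(query = "Poa annua"))
```

Both calls return the same value; only the old-style call emits the warning.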
- upgraded APG datasets (apg_families and apg_orders) to v14 (from July 2017) (#827)
- fix to worms_downstream(): three rank names were not accounted for in our internal set of ranks (supertribe, subterclass, parvorder) (#824)
- classification.gbifid was returning a duplicate last taxon, i.e., the last two rows in the output data.frame were the same. fixed. (#825)
- fixed issue in lowest_common() due to problem in classification.uid() when a taxon UID was merged into another taxon (#828)
- NatureServe has a new API version; the package natserv (https://docs.ropensci.org/natserv/) has a complete overhaul for the new API - taxize interfaces to NatureServe updated. Only user facing change should be that we've moved to using just the final numeric part of the NatureServe taxonomic identifiers, as the ELEMENT_GLOBAL.2. part is redundant for every identifier (#823)
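If you have stored older NatureServe identifiers, dropping the redundant prefix might look like the sketch below. The identifier values are made up for illustration, and this is base R string handling rather than anything taxize or natserv provides.

```r
# Made-up identifiers in the old, prefixed style
old_ids <- c("ELEMENT_GLOBAL.2.100925", "ELEMENT_GLOBAL.2.154701")

# Keep only the final numeric part
new_ids <- sub("^ELEMENT_GLOBAL\\.2\\.", "", old_ids)
```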
- rankagg() and tax_agg() fixes: rankagg() examples now conditional on availability of vegan as it should be, and now real abundance data are used in the example. tax_agg() fixes species name ordering in dune data (#822) work by @jarioksa
- fixed a bug in class2tree() (#818) (#820) thx to @adriangeerre for the report & the fix by @trvinh
- fix to worms_downstream(): user encountered a rank name ("phylum (division)") we hadn't dealt with yet for worms (#821) thx @msweetlove for the report
- gains new functions: bold_children(), bold_downstream() and new S3 methods for boldid: children.boldid and downstream.boldid. Beware that these new methods are built on top of a function that scrapes BOLD's website - their API doesn't provide access to taxonomic children (only parents) - so we've taken the liberty of trying to liberate that data and make it easy to access (#817)
- fix to a failing tol_resolve() test - upstream package rotl had the bug; told maintainer about it and he'll submit a new version soon; affected test commented out for now (#814)
- synonyms() gains a method for Plants of the World Online (synonyms.pow); and new associated helper function pow_synonyms() used within synonyms.pow (#812)
- change to iucn_summary() to allow get_iucn() failures and the function to still proceed - to make a better experience when passing in more than 1 name (#810)
- fixed non-ASCII string in the species_plantarum_binomials dataset
- classification() for data source GBIF wasn't working when the queried taxon rank was below species (e.g., subspecies or variety); GBIF didn't return the same fields for ranks below species, so we tack on that information with a bit of extra code (#809)
- fix sorting of results in classification() with data source GBIF; at some point introduced bug in how results were sorted (#811)
- use_eol() is now defunct; EOL no longer requires an API key (#749) (#803) thanks @padpadpadpad
- http to https upgrades for the following functions: vascan_search(), taxize_cite(), all *_ping() functions, get_wormsid(), get_pow(), get_eolid(), get_gbifid(), get_boldid(), gbif_name_usage(); and in various places in documentation (#799)
- classification.uid() now does batch HTTP requests. NCBI Entrez web service allows requests with up to 50 identifiers; @zachary-foster did the work to make this method now use batch queries so it's much faster (#678) (#798)
- class2tree() improvement in taxonomy rank indexing (#805) work by @trvinh
- fix to description of taxon_state_messages parameter in the taxize_options help file (#806)
- taxize package datasets now loaded into a package environment (#792)
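The batching idea behind the classification.uid() change (at most 50 identifiers per request) can be sketched in base R as below. The identifiers are made up and no HTTP request is made; how taxize actually builds its Entrez requests is not shown here.

```r
ids <- as.character(seq_len(120))  # made-up identifiers
batch_size <- 50                   # limit stated in the entry above

# Split into consecutive groups of at most `batch_size` identifiers
batches <- split(ids, ceiling(seq_along(ids) / batch_size))

# One comma-separated string per batch, ready to use as a request parameter
params <- vapply(batches, paste, character(1), collapse = ",")
```

With 120 identifiers this gives three batches instead of 120 separate requests.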
- ncbi_children() now accepts numeric and character class ids (#800)
- fix classification.gbifid(), was failing because GBIF changed the order of results (#802)
- class2tree() fix: problem was due ultimately to a bug in classification.gbifid() (see line above) (#801)
- tax_rank() fix - for db="ncbi" was not giving correct ranks for queried names - was due to a change in classification.uid (#804)
- fix bug in get_eolid() when filtering by data source led to no results (#808)
- fix for ncbi_downstream (and thereby fix for downstream() with db="ncbi"): for some taxa a query to NCBI resulted in children as well as the queried name itself, and the next query would give the same results, leading to an endless while loop - now we remove the taxon itself that was queried to prevent this (#807)
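The endless-loop problem in the last entry can be reproduced with a toy example. This is a minimal sketch under stated assumptions: toy_children is a made-up lookup table standing in for NCBI responses, and collect_descendants() is a hypothetical helper, not the code used in ncbi_downstream().

```r
# Toy "API": each taxon maps to what a child query returns. "B" lists itself
# among its own children, mimicking the behavior described above.
toy_children <- list(
  A = c("B", "C"),
  B = c("B", "D"),
  C = character(0),
  D = character(0)
)

collect_descendants <- function(root, children_of) {
  found <- character(0)
  queue <- root
  while (length(queue) > 0) {
    current <- queue[1]
    queue <- queue[-1]
    kids <- children_of[[current]]
    kids <- setdiff(kids, current)  # drop the queried taxon itself
    found <- c(found, kids)
    queue <- c(queue, kids)
  }
  found
}

desc <- collect_descendants("A", toy_children)
```

Without the setdiff() line, "B" would be re-queued every time it is queried and the while loop would never finish.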
COL introduced rate limiting recently in 2019 - which has made the API essentially unusable - CoL+ is coming soon and we'll incorporate it here when it's stable. see https://github.com/ropensci/colpluz for the in development R client (#796)
- gains new function gn_parse() to access the Global Names scientific name parser. it's a super fast parser. see the section on name parsers (https://docs.ropensci.org/taxize/reference/index.html#section-name-parsers) for the 3 functions that do name parsing (#794)
- dropped packages from imports: reshape2, stringr, plyr (#795)
- get_wormsid() gains two new parameters: fuzzy and marine_only; both are passed through to worrms::wm_records_name()/worrms::wm_records_name() (#790)
- no longer running taxon state examples on check (#791)
- vignettes have names now in the pkg docs site (#772)
- update docs for new roxygen2 version that supports R6 (#793)
- gains dataset worrms_ranks to apply rank names in cases where WORMS fails to return rank names in their data
- remove a get_tpsid() example that passes in names as factors; get_* functions no longer accept factors
- fix to classification.tpsid(): change to an internal fxn changed its output; fix for that (#797)
- fix get_boldid(): when filtering (e.g., w/ rank, division, parent) returned no match, get_boldid was failing on downstream parsing; return NA now
- fix get_wormsid_(): was missing marine_only and fuzzy parameters
- fix pow_search(): an if statement was leading to length > 1 booleans
- fix synonyms(): an if statement in internal fxn process_syn_ids was leading to length > 1 booleans
- fix classification.gbifid: select columns only if they exist instead of failing on plucking non-existent columns
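The "select columns only if they exist" fix in the last entry follows a common base R pattern, sketched below. The data.frame and the column names are illustrative assumptions, not the actual fields handled in classification.gbifid.

```r
res <- data.frame(
  name = c("Poa", "Poa annua"),
  rank = c("genus", "species"),
  stringsAsFactors = FALSE
)

wanted <- c("name", "rank", "canonicalName")  # the last one is absent here

# Indexing with `wanted` directly would fail with "undefined columns selected";
# intersect() keeps only the columns that are really there.
keep <- intersect(wanted, names(res))
out <- res[, keep, drop = FALSE]
```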
- get_ids() gains a new parameter suppress (default: FALSE) to toggle package cli messages stating which database is being worked on (#719)
- the following datasets are now available when the package is not loaded, so functions that use these datasets can now be called with package namespace like taxize::downstream(): rank_ref, theplantlist, apg_families, apg_orders (#777) (#781)
- add new documentation site url (https://docs.ropensci.org/taxize/) to DESCRIPTION file (#774) (@jeroen)
- fix links in README to issues label of new potential data sources (#782) (@katrinleinweber)
- more or less all functions that take as input the output of get_* functions have S3 methods that dispatch on those get_* output classes. however, you can still pass in a db parameter, which is IGNORED when dispatching on the input class. the db parameter is used (not ignored) when passing in a taxon id as character/numeric/etc. now these functions (children, classification, comm2sci, sci2comm, downstream, id2name, synonyms, upstream) warn when the user passes a db value which will be ignored (#780)
- The NCBI Entrez API often throws errors for users of this and other packages related to HTTP version used by the client; we now hard code the http version to HTTP/1.1 via the curl option http_version=2L across all Entrez requests (#783)
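The db-is-ignored warning from the first entry above can be illustrated with a small S3 sketch. The generic children_sketch() and its methods are hypothetical and only mimic the dispatch behavior described; they do no lookups and are not the taxize functions.

```r
children_sketch <- function(x, db = NULL, ...) UseMethod("children_sketch")

# Plain character/numeric ids: db is required and used
children_sketch.default <- function(x, db = NULL, ...) {
  if (is.null(db)) stop("Must specify db!", call. = FALSE)
  paste("looked up", x, "in", db)
}

# Classed input (here "uid"): the class decides the data source, db is ignored
children_sketch.uid <- function(x, db = NULL, ...) {
  if (!is.null(db)) {
    warning("'db' is ignored when dispatching on class 'uid'", call. = FALSE)
  }
  paste("looked up", unclass(x), "in ncbi")
}

id <- structure("9606", class = "uid")
plain <- children_sketch("9606", db = "itis")
classed <- suppressWarnings(children_sketch(id, db = "itis"))
```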
- fixes to col_search(): COL now does rate limiting (if you make too many requests within a time period they will stop allowing requests from your IP address/your computer); documented rate limiting, what I know at least; changed checklist parameter behavior: years 2014 and back don't provide JSON, so we return xml_document objects now for those years that the user can parse themselves (#786)
- tax_rank somehow (my bad) had two .default methods. previous behavior is the same as current behavior (this version) (#784)
- fix ncbi_children(): fixed regex that was supposed to flag ambiguous taxa only, it was supposed to flag sp. and spp., but was including subsp., which we didn't want included (#777) (#781)
- another fix to ncbi_children(): when ID is passed rather than a name, we need to then set id=NULL after switching to the equivalent taxonomic name internally to avoid getting duplicate data back (#777) (#781)
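The regex problem in the ncbi_children() entry (matching sp. and spp. but not subsp.) can be shown with a word boundary. The two patterns below are illustrative assumptions; the actual expressions used in the package are not given in this changelog.

```r
taxa <- c("Homo sp.", "Quercus spp.", "Homo sapiens subsp. 'Denisova'", "Homo sapiens")

# Too broad: "sp." is also found at the end of "subsp."
too_broad <- grepl("spp?\\.", taxa)

# With a word boundary, "sp."/"spp." must start a word, so "subsp." is skipped
ambiguous <- grepl("\\bspp?\\.", taxa, perl = TRUE)
```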
- update all EUBON functions to use their new API version; eubon_search() gains new params limit and page; other eubon functions have no pagination (#766)
- change base url in ipni_search() from http to https, via (#773)
- change synonyms() to always return NA for name not found, and always return a zero row data.frame when name found BUT no synonyms found; updated docs to indicate better what's returned (#763) (#765)
- COL sometimes returns control characters in the XML payload; these can't be parsed by the xml2 package, so we have to remove them using regex; we throw a message when we're doing this so the user knows (#768)
- docs typos fixes (#770)
- update classification() docs with a new EOL section discussing that EOL does not have good failure behavior, and what to expect from them (#775)
- the following datasets are now available when the package is not loaded, so functions that use these datasets can now be called with package namespace like taxize::downstream(): rank_ref, theplantlist, apg_families, apg_orders (#777)
- sci2comm() and comm2sci() improvements: for db="ncbi" we no longer stop with error when there's no results for a query; instead we return character(0). In addition, all data source options for both functions now return character(0) when there's no results for a query (#778)
- id2name.uid() now actually passes on ... internally for curl options
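The control-character cleanup mentioned above for COL payloads (#768) could look roughly like this. It is a sketch only: strip_control() is a hypothetical helper, the character class is one reasonable choice (it keeps tab, newline and carriage return), and the payload is made up.

```r
strip_control <- function(x) {
  # ASCII control characters except tab (\x09), newline (\x0A) and CR (\x0D)
  pattern <- "[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F]"
  if (any(grepl(pattern, x, perl = TRUE))) {
    message("removing control characters from the payload")
    x <- gsub(pattern, "", x, perl = TRUE)
  }
  x
}

payload <- "<name>Poa\x01 annua</name>\n"  # made-up payload with a stray \x01
clean <- suppressMessages(strip_control(payload))
```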
- fix get_nbnid(): was returning non-taxon entities, have to add idxtype:TAXON to the fq query (#761)
- fixes for as.eolid() and as.colid() - don't run through helper function that was raising error on HTTP 404/etc., don't want to fail (#762)
- fix to class2tree(): set root node name to NA if it does not exist, ITIS does not set a root node (#767) (#769) work by @gpli
- fix to ipni_search(): IPNI changed parameter names, fixes for that; and now returning tibble's instead of data.frame's (#773) thanks @joelnitta !
- fix ncbi_children(): fixed regex that was supposed to flag ambiguous taxa only, it was supposed to flag sp. and spp., but was including subsp., which we didn't want included (#777)
- another fix to ncbi_children(): when ID is passed rather than a name, we need to then set id=NULL after switching to the equivalent taxonomic name internally to avoid getting duplicate data back (#777)
- all get_* functions gain some new features (associated new fxns are taxon_last and taxon_clear): a) nicer messages printed to the console when iterating through taxa, and a summary at the end of what was done; and b) state is now saved when running get_* functions. That is, in an object external to the get_* function call we keep track of what happened, so that if an error is encountered, you can easily restart where you left off; this is especially useful when dealing with a large number of inputs to a get_* function. To utilize, pass the output of taxon_last() to a get_* function call. Associated with these changes are new package imports: R6, crayon and cli (#736) (#757)
- gains a new function taxize_options() to set options when using taxize. the first reason for the function is to set two options for the above item for get_* functions: taxon_state_messages to allow taxon state tracking messages in get_* functions or not, and quiet=TRUE quiets output from the taxize_options() function itself
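The idea of keeping state in an object external to the function call, so work can resume after an error, can be sketched in base R. This is not how taxon_last() or the R6-based state in taxize is implemented; process_all(), flaky() and the state environment are hypothetical names used only for illustration.

```r
state <- new.env()
state$done <- list()

process_all <- function(inputs, worker, state) {
  for (x in inputs) {
    if (!is.null(state$done[[x]])) next  # already handled in an earlier run
    state$done[[x]] <- worker(x)         # an error here leaves `state` intact
  }
  state$done
}

# A worker that fails once, on its second call, to simulate a network error
calls <- 0
flaky <- function(x) {
  calls <<- calls + 1
  if (x == "Quercus" && calls == 2) stop("simulated network failure")
  nchar(x)
}

inputs <- c("Poa", "Quercus", "Pinus")
first_try <- try(process_all(inputs, flaky, state), silent = TRUE)
second_try <- process_all(inputs, flaky, state)  # resumes after "Poa"
```

The second call skips "Poa" because its result is already stored, so only the failed and remaining inputs are processed again.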
- in id2name() and worms_downstream() use worrms::wm_record instead of worrms::wm_record_ for newest version of worrms (#760)
- many get_* functions and col_downstream() parameter verbose changed to messages to not conflict with a verbose curl options parameter passed in to crul
- fix to http request processing for COL - sometimes errors, and gives a message in the response body, but DOES NOT give the appropriate error HTTP status code - need to always do a check for COL responses (#755) (#756) thanks @dougwyu
- fix to gbif_downstream() - GBIF in some cases returns a rank of "unranked", which we hadn't accounted for in internal rank processing code (#758) thanks @ocstringham
- class2tree() gains node labels when present (#644) (#748) thanks @gpli
- change documentation to use markdown (#658) (#746) thanks @Rekyt
- gains new functions for Kew's Plants of the World: get_pow(), get_pow_(), as.pow(), classification.pow(), pow_search(), and pow_lookup() (#598) (#739)
- we now pass a user agent string in all HTTP requests to the various data sources so they know it's coming from taxize. the string will look something like r-curl/3.3 crul/0.7.0 rOpenSci(taxize/0.9.6), including the versions of the curl R pkg, the crul package, and the taxize package (#662)
- change to get_colid functionality: we weren't paginating for the user when there were more than 50 results for a query; we now paginate for the user using async HTTP requests; this means that some requests will take longer than they did before if they have more than 50 results; this is a good change given that you get all the results for your query now (#743)
- change across most get_* functions: in some of the get_* functions we tried for a direct match (e.g., "Poa" == "Poa") and if one was found, then we were done and returned that record. however, we didn't deploy the same logic across all get_* functions. Now all get_* functions check for a direct match. Of course if there is a direct match with more than 1 result, you still get the prompt asking you which name you want. (#631) (#734)
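The direct match logic from the last entry can be sketched as below. pick_match() and the candidates table are hypothetical; in particular, the real get_* functions prompt interactively when several records match, whereas this sketch just returns NA with a reason attached.

```r
pick_match <- function(query, candidates) {
  exact <- candidates[candidates$name == query, , drop = FALSE]
  if (nrow(exact) == 1) return(exact$id)  # unambiguous direct match
  reason <- if (nrow(exact) > 1) "multiple direct matches" else "no direct match"
  structure(NA_character_, reason = reason)
}

# Made-up search results for the query "Poa"
candidates <- data.frame(
  id = c("1", "2", "3"),
  name = c("Poa", "Poa annua", "Poa alpina"),
  stringsAsFactors = FALSE
)

hit <- pick_match("Poa", candidates)
miss <- pick_match("Poaceae", candidates)
```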
- Make separate taxize-authentication manual file covering authentication information across the package (#681)
- new case study vignette added (#544) (#721) thanks @fozy81
- add note to gnr_resolve() docs about age of datasets used in the Global Names Resolver, and how to access age of datasets (#737)
- get_eolid() fixes: gains new attribute pageid; uri's given are updated to EOL's new URL format; rank and datasource parameters were not documented, now are; we no longer use short names for data sources within EOL, but instead use their full names (#702) (#742)
- col_search() now returns attributes on the output data.frame's with number of results found and returned, and other metadata about the search
- gnr_datasources() loses the todf parameter; now always returns a data.frame and the data.frame has all the columns, whereas the default call returned a limited set of columns in previous versions
- fix bug in get_wormsid(), was failing when there was a direct match found with more than 1 result (#740)
- fix across all get_* functions: linting of the input to the rows parameter was failing with a vector of values in some cases (#741)
- fix to iucn_summary(); we weren't passing on the API key internally correctly (#735) thanks @PrincessPi314 for the report
- iucn_summary_id() is defunct, use iucn_summary() instead
- col_downstream() gains parameter extant_only (logical) to optionally keep extant taxa only (#714) thanks @ArielGreiner for the inquiry
- downstream() gains another db option: Worms. You can now set db="worms" to use Worms to get taxa downstream from a target taxon. In addition, taxize gains new function worms_downstream(), which is used under the hood in downstream(..., db="worms") (#713) (#715)
- gains new function id2name() with db options for tol, itis, ncbi, worms, gbif, col, and bold. the function converts taxonomic IDs to names. It's sort of the inverse of the get_*() family of functions. (#712) (#716)
- tax_rank() gains new parameter rows so that one can pass rows down to get_*() functions
- synonyms() warning from an internal cbind() call now fixed (#704) (#705) thanks @vijaybarve
- namespace taxize function calls thrown when notifying users about API keys (e.g., taxize::use_tropicos()) to make it very clear where the functions live (to avoid confusion with usethis) (#724) (#725) thanks @maelle
- changed iucn_summary() to output the same structure when no match is found as when a match is found so that when output is passed to iucn_status() behavior is the same (#708) thanks @Rekyt
- skip tax_name() tests on CRAN (#728)
- httr replaced by crul throughout (#590)
- most unit tests that make HTTP requests now cached with vcr, making tests much faster and not prone to errors due to remote services being down (#729)
- EOL: The EOL API underwent major changes, and we've attempted to get things in working order. eol_dataobjects() gains new parameter language. eol_pages() loses iucn, images, videos, sounds, maps, and text parameters, and gains images_per_page, videos_per_page, sounds_per_page, maps_per_page, texts_per_page, and texts_page. Please do let us know if you find any problems with any EOL functions (#717) (#718)
- As part of EOL changes, the default db value for comm2sci() and sci2comm() is now ncbi instead of eol
- EUBON base URL now https instead of http
- A number of get_*() functions changed parameter verbose to messages to not conflict with verbose passed down to crul::HttpClient
- ping functions: ncbi_ping() reworked to allow use of your api key as a parameter or pulled from your environment; eol_ping() using https instead of http, and parsing JSON instead of XML.
- get_eolid() was erroring when no results found for a query due to not assigning an internal variable (#701) (#709) thanks for the fix @taddallas
- get_tolid() was erroring when values were NULL - now replacing all NULL with NA_character_ to make data.table::rbindlist() happy (#710) (#711) thanks @gpli for the fix
- add additional rows to the rank_ref data.frame of taxonomic ranks: species subgroup, forma, varietas, clade, megacohort, supercohort, cohort, subcohort, infracohort. when there's no matched rank, errors can result in many of the downstream functions. The data.frame now has 43 rows. (#720) (#727)
- fix to downstream() and ncbi_get_taxon_summary(): change in ncbi_get_taxon_summary to break up queries into smaller chunks to avoid HTTP 414 errors ("URI too long") (#727) (#730) thanks for reporting @fischhoff and @benjaminschwetz
- a number of fixes internally (not user facing) to comply with upcoming R-devel changes for checking length greater than 1 in logical statements (#731)
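The NULL to NA_character_ replacement described for get_tolid() can be sketched with base R. The records below are made up, and base rbind() stands in for data.table::rbindlist() so the example has no dependencies; fill_null() is a hypothetical helper.

```r
# Made-up API records; the second one has a missing (NULL) rank
records <- list(
  list(name = "Poa", rank = "genus"),
  list(name = "Poa annua", rank = NULL)
)

# Replace NULL fields so every record has the same set of columns
fill_null <- function(rec) {
  lapply(rec, function(v) if (is.null(v)) NA_character_ else v)
}

rows <- lapply(records, function(rec) {
  as.data.frame(fill_null(rec), stringsAsFactors = FALSE)
})
combined <- do.call(rbind, rows)
```

Without fill_null(), the second record would produce a data.frame with no rank column and the row binding would fail.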
- new contributor: Gaopeng Li
- gains new functions for helping the user get authentication keys/tokens: use_entrez(), use_eol(), use_iucn() (which uses internally rredlist::rl_use_iucn()), and use_tropicos() (#682) (#691) (#693) By @maelle
- remove commented out code
- fix tropicos_ping()
- fixed downstream() and gbif_downstream(): some of the results don't have a canonicalName, so now safely try to get that field (#673)
- fixed as.uid(), was erroring when passing in a taxon ID (#674) (#675) by @zachary-foster
- fix in get_boldid() (and by extension classification(..., db = "bold")): was failing when no parent taxon found, just fill in with NA now (#680)
- fix to synonyms(): was failing for some TSNs for db="itis" (#685)
- fix to tax_name(): rows arg wasn't being passed on internally (#686)
- fix to gnr_resolve() and gnr_datasources(): problems were caused by http scheme, switched to use https instead of http (#687)
- fix to class2tree(): organisms with unique rank lower than non-unique ranks will give extra wrong rows (#689) (#690) thanks @gpli
- fix in ncbi_get_taxon_summary(): changes in the NCBI API most likely lead to HTTP 414 (URI Too Long) errors. we now loop internally for the user. By extension this helps problems upstream in downstream()/ncbi_downstream()/ncbi_children() (#698)
- fix in class2tree(): was erroring when name strings contained pound signs (e.g., #) (#699) (#700) thanks @gpli
- package gains three new authors: Bastian Greshake Tzovaras, Philippe Marchand, and Vinh Tran
- Don't enforce rate limiting via Sys.sleep for NCBI requests if the user has an API key (#667)
- Fix to all functions that do NCBI requests to work whether or not a user has an NCBI API key (#668)
- Increased documentation on authentication, see ?taxize-authentication
- Further conversion of verbose to messages across the package so that suppressing calls to message() does not conflict with curl options passed in
- Converted genbank2uid() and ncbi_get_taxon_summary() to use crul instead of httr for HTTP requests
- Fix to get_tolid(): it was missing assignment of the att attribute internally, causing failures in some cases (#663) (#672)
- Fix to ncbi_children() (and thus children() when requesting NCBI data) to not fail when there is an empty result from the internal call to classification() (#664) thanks @arendsee
- class2tree() gets a major overhaul thanks to @gedankenstuecke and @trvinh (!!). The function now takes unnamed ranks into account when clustering, which fixes the problem where trees were unresolved for many splits as the named taxonomy levels were shared between them. Now it makes full use of the NCBI Taxonomy string, including the unnamed ranks, leading to higher resolution trees that have fewer multifurcations (#611) (#634)
- Added support throughout package for use of NCBI Entrez API keys - NCBI now strongly encourages their use and you get a higher rate limit when you use one. See ?taxize-authentication for help. Importantly, note that API key names (both R options and environment variables) have changed. They are now the same for R options and env vars: TROPICOS_KEY, EOL_KEY, PLANTMINER_KEY, ENTREZ_KEY. You no longer need an API key for Plantminer. (#640) (#646)
- New author Zebulun Arendsee (@arendsee)
- New package dependencies: crul and zoo
- In downstream() we now pass on limit and start parameters to gbif_downstream(); we weren't doing that before; the two parameters control pagination (#638)
- genbank2uid() now returns the correct ID when there are multiple possibilities and invalid IDs no longer make whole batches fail (#642) thanks @zachary-foster
- children() outputs made more consistent for certain cases when no results found for searches (#648) (#649) thanks @arendsee
- Improve downstream() by passing ... (additional parameters) down to ncbi_children() used internally. this allows, e.g., use of the ambiguous parameter in ncbi_children(), which lets you remove ambiguously named nodes (#653) (#654) thanks @arendsee
- swapped out use of httr for crul in EOL and Tropicos functions - note that this won't affect you unless you're passing curl options. see package crul for help on curl options. Along with this change, the parameter verbose has changed to messages (for toggling printing of information messages)
- Added additional text to the CONTRIBUTING.md file for how to contribute to the test suite (#635)
- genbank2uid now returns the correct ID when there are multiple possibilities and invalid IDs no longer make whole batches fail.
- Fix to downstream(): passing numeric taxon ids to the function while using db="ncbi" wasn't working (#641) thanks @arendsee
- Fix to children(): passing numeric taxon ids to the function while using db="worms" wasn't working (#650) (#651) thanks @arendsee
- synonyms_df() - that attempts to combine many outputs from the synonyms() function - now removes NA/NULL/empty outputs before attempting the combination (#636)
- Fix to gnr_resolve(): before if preferred_data_sources was used, you would get the preferred data but only a few columns of the response. We now return all fields; however, we only return the preferred data part when that parameter is used (#656)
- Fixes to children(). It was returning unexpected results for ambiguous taxonomic names (e.g., there's some insects that are returned when searching within Bacteria). It was also failing when one tried to get the children of a root taxon (e.g., the children of the NCBI id 131567). (#639) (#647) fixed via PR (#659) thanks @arendsee and @zachary-foster
- Added separate documentation file for all get* functions describing attributes and various exception behaviors
- Some get*() functions had NaN as default rows parameter value. Those all changed to NA
- Better failure behavior now when non-acceptable rows parameter value given
- Added in all type checks for parameters across get_*() functions
- Changed behavior across all get_*() functions to behave the same when ask = FALSE, rows = 1 and ask = TRUE, rows = 1 as these should result in the same outcome. (#627) thanks @zachary-foster !
- Fixed direct match behavior so that when there's multiple results from the data provider, but no direct match, the functions don't give back just NA with no indication that there were multiple matches.
- Please let me know if any of these changes cause problems for your code or package.
- Change comm2sci() to S3 setup with methods for character, uid, and tsn (#621)
- iucn_status() now has S3 setup with a single method that only handles output from the iucn_summary() function.
- Add required key parameter to fxn iucn_id() (#633)
- improve docs for sci2comm(): to indicate how to get non-simplified output (which includes what language the common name is from) vs. getting simplified output (#623) thanks @glaroc !
- Fix to sci2comm() to not be case sensitive when looking for matches (#625) thanks @glaroc !
- Two additional columns now returned with eol_search(): link and content
- Improve docs in eol_search() to describe returned data.frame
- Fix bold_ping() to use new base URL for their API
- Improved description of the dataset rank_ref, see ?rank_ref
- Fix to downstream() via fix to rank_ref dataset to include "infraspecies" and make "unspecified" and "no rank" equivalent. Fix to col_downstream() to properly remove ranks lower than allowed. (#620) thanks @cdeterman !
- iucn_summary: changed to using rredlist package internally. sciname param changed to x. iucn_summary_id() now is deprecated in favor of iucn_summary(). iucn_summary() now has a S3 setup, with methods for character and iucn (#622)
- Added "cohort" to rank_ref dataset as that rank is sometimes used at NCBI (from bug reported in ncbi_downstream()) (#626)
- Fix to sci2comm(), add tryCatch() to internals to catch failed requests for specific pageid's (#624) thanks @glaroc !
- Fix URL for taxa for NBN taxonomic ids retrieved via get_nbnid() (#632)
- Remove ape::neworder_phylo object, which is not used anymore in taxize (#618) (#619) thanks @ashiklom
- New function ncbi_downstream() and now NCBI is an option in the function downstream() (#583) thanks for the push @andzandz11
- New data source: Wiki*, which includes Wikipedia, Wikispecies, and Wikidata - you can choose which you'd like to search. Uses new package wikitaxa, with contributions from @ezwelty (#317)
- scrapenames() gains a parameter return_content, a boolean, to optionally return the OCR content as a text string with the results. (#614) thanks @fgabriel1891
- New function get_iucn() - to get IUCN Red List ids for taxa. In addition, new S3 methods synonyms.iucn and sci2comm.iucn - no other methods could be made to work with IUCN Red List ids as they do not share their taxonomic classification data (#578) thanks @diogoprov
- bold now an option in classification() function (#588)
- fix to NBN to use new base URL (#582) (#597)
- genbank2uid() can give back more than 1 taxon matched to a given Genbank accession number. Now the function can return more than one match for each query, e.g., try genbank2uid(id = "AM420293") (#602) thanks @sariya
- had to modify cbind() usage to include ... for method consistency (#612)
- tax_rank() used to be able to do only ncbi and itis. Can now do a lot more data sources: ncbi, itis, eol, col, tropicos, gbif, nbn, worms, natserv, bold (#587)
- Added to classification() docs in a section Lots of results a note about how to deal with results when there are A LOT of them. (#596) thanks @ahhurlbert for raising the issue
- tnrs() now returns the resulting data.frame in the order of the names passed in by the user (#613) thanks @wpetry
- Changes to gnr_resolve() to now strip out taxonomic names submitted by user that are NA, or zero length strings, or are not of class character (#606)
- Added description of the columns of the data.frame output in gnr_resolve() (#610) thanks @kamapu
- Added note in tnrs() docs that the service doesn't provide any information about homonyms. (#610) thanks @kamapu
- Added parvorder to the taxize rank_ref dataset - used by NCBI - if a taxon is returned with that rank, some functions in taxize were failing due to that rank missing in our reference dataset rank_ref (#615)
- Fix to get_colid() via problem in parsing within col_search() (#585)
- Fix to gbif_downstream (and thus fix in downstream()): there were two rows with form in our rank_ref reference dataset of rank names, causing > 1 result in some cases, then causing vapply to fail as it's expecting length 1 result (#599) thanks @andzandz11
- Fix genbank2uid(): was failing when getting more than 1 result back, works now (#603) and fails better now, giving back warnings/error messages that are more informative (see also #602) thanks @sariya
- Fix to synonyms.tsn(): in some cases a TSN has > 1 accepted name. We get accepted names first from the TSN, then look for synonyms, and hadn't accounted for > 1 accepted name. Fixed now (#607) thanks @tdjames
- Fixed bug in sci2comm() - was not dealing internally with passing the simplify parameter (#616)
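The vapply failure mentioned for gbif_downstream (#599) is easy to reproduce in isolation. The sketch below uses a toy lookup table with made-up ids and a hypothetical rank_id() helper; it is not the real rank_ref dataset, only an illustration of why a duplicated row breaks a length 1 expectation.

```r
# Toy lookup table (hypothetical values) with a duplicated "form" row.
rank_tbl <- data.frame(
  rankid = c("100", "110", "110"),
  ranks = c("variety", "form", "form"),
  stringsAsFactors = FALSE
)

# Return the id(s) matching a rank name; duplicates give more than one value.
rank_id <- function(rank, tbl) tbl$rankid[tbl$ranks == rank]

# vapply() insists on exactly one character value per input, so this errors.
with_dup <- tryCatch(
  vapply("form", rank_id, character(1), tbl = rank_tbl),
  error = function(e) "vapply failed"
)

# Dropping the duplicated row restores a length 1 result.
without_dup <- vapply("form", rank_id, character(1), tbl = unique(rank_tbl))
```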
- Added WoRMS integration via the new worrms package on CRAN. Adds functions as.wormsid(), get_wormsid(), get_wormsid_(), children.wormsid(), classification.wormsid(), sci2comm.wormsid(), comm2sci.wormsid(), and synonyms.wormsid() (#574) (#579)
- New functions for NatureServe data, including as.natservid, get_natservid, get_natservid_, and classification.natservid (#126)
- EOL API keys were not passed on to internal functions. fixed now. thanks @dschlaep ! (#576)
- Fix in rankagg() with respect to vegan package to work with older and newer versions of vegan - thanks @jarioksa (#580) (#581)
- New data source added: Open Tree of Life. New functions for the data source added: get_tolid(), get_tolid_(), and as.tolid() (#517)
- related to above, classification() gains new method for TOL data
- related to above, lowest_common() gains new method for TOL data
- Now using ritis package, an external dependency for ITIS taxonomy data. Note that a large number of ITIS functions were removed, and are now available via the package ritis. However, there are still many high level functions for working with ITIS data (see functions prefixed with itis_), and get_tsn(), classification.tsn(), and similar high level functions remain unchanged. (#525)
- EUBON has a new API (v1.2). We now interact with that new API version. In addition, eubon() fxn is now eubon_search(), although either still works - though eubon() will be made defunct in the next version of this package. Additional new functions were added: eubon_capabilities(), eubon_children(), and eubon_hierarchy() (#567)
- lowest_common() function gains two new data source options: COL (Catalogue of Life) and TOL (Tree of Life) (#505)
- Added new function synonyms_df() as a slim wrapper around data.table::rbindlist() to make it easy to combine many outputs from synonyms() for a single data source - there is a lot of heterogeneity among data sources in how they report synonyms data, so we don't attempt to combine data across sources (#533)
- Change NCBI URLs to https from http (#571)
- Fixed bug in tax_name() in which when an invalid taxon was searched for then classification() returned no data and caused an error. Fixed now. (#560) thanks @ljvillanueva for reporting it!
- Fixed bug in gnr_resolve() in which order of input names to the function was not retained. fixed now. (#561) thanks @bomeara for reporting it!
- Fixed bug in gbif_parse() - data format changed coming back from GBIF - needed to replace NULL with NA (#568) thanks @ChrKoenig for reporting it!
- New vignette: "Strategies for programmatic name cleaning" (#549)
- get_*() functions now have new attributes to further help the user: multiple_matches (logical) indicating whether there were multiple matches or not, and pattern_match (logical) indicating whether a pattern match was made, or not. (#550) from (#547) discussion, thanks @ahhurlbert ! see also (#551)
- Change all xml2::xml_find_one() to xml2::xml_find_first() for new xml2 version (#546)
- gnr_resolve() now retains user supplied taxa that had no matches - this could affect your code, make sure to check your existing code (#558)
- gnr_resolve() - stop sorting output data.frame, so order of rows in output data.frame now same as user input vector/list (#559)
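As a rough illustration of how the multiple_matches and pattern_match attributes on get_*() results might be read, the sketch below builds a mock object by hand. The id value and the object itself are invented for the example; a real result would come from a get_*() call, which needs network access.

```r
# Mock result (hypothetical values), shaped like an id with attributes.
res <- structure(
  "3119134",
  match = "found",
  multiple_matches = TRUE,
  pattern_match = FALSE
)

# Flag results that were picked from several candidates and may need review.
needs_review <- isTRUE(attr(res, "multiple_matches"))
```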
- Fixed internal fxn sub_rows() inside of most get_*() functions to not fail when the data.frame rows were less than that requested by the user in rows parameter (#556)
- Fixed get_gbifid(), as sometimes calls failed because we now return numeric IDs but used to return character IDs (#555)
- Fix to all get_*() functions to call the internal sub_rows() function later in the function flow so as not to interfere with taxonomic based filtering (e.g., user filtering by a taxonomic rank) (#555)
- Fix to gnr_resolve(), to not fail on parsing when no data returned when a preferred data source specified (#557)
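The sub_rows() fix (#556) concerns asking for more rows than a result has. A minimal sketch of that idea follows; safe_rows() is a hypothetical stand-in written for this example, not the internal taxize function.

```r
# Hypothetical stand-in for the idea behind sub_rows(); not taxize's code.
safe_rows <- function(df, rows = NA) {
  if (all(is.na(rows))) return(df)
  rows <- rows[rows <= nrow(df)]  # ignore requests beyond the last row
  df[rows, , drop = FALSE]
}

hits <- data.frame(
  id = c("a1", "b2"),
  rank = c("genus", "species"),
  stringsAsFactors = FALSE
)

# Asking for rows 1 to 5 returns the two rows that exist.
safe_rows(hits, rows = 1:5)
```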
- Fix to iucn_summary() (#543) thanks @mcsiple
- Added message for when too many Ids passed in to ncbi_get_taxon_summary() suggesting to break up the ids into chunks (#541) thanks @daattali
- Fix to itis_acceptname() to accept multiple names (#534) and now gives back same output regardless of whether match found or not (#531)
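Breaking a long id vector into chunks, as the ncbi_get_taxon_summary() message suggests (#541), takes only a few lines of base R. The helper name chunk_ids() and the chunk size of 100 are assumptions made for this sketch; the changelog does not state a limit.

```r
# Hypothetical helper: split ids into chunks of at most `size` elements.
chunk_ids <- function(ids, size = 100) {
  unname(split(ids, ceiling(seq_along(ids) / size)))
}

ids <- as.character(seq_len(250))
chunks <- chunk_ids(ids, size = 100)

# Number of ids in each chunk.
vapply(chunks, length, integer(1))
```

Each element of chunks could then be passed to the function in turn, for example with lapply().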
- Fix to tax_name() for some queries that return no classification data via internal call to classification() (#542) thanks @daattali
- Another fix for tax_name() (#530) thanks @ibartomeus
- Fixed docs for rankagg() function, use requireNamespace() in examples to make sure user has vegan installed (#529)
- Changed defunct messages in eol_invasive() and gisd_invasive() to point to new location in the originr package. Also, cleaned out code in those functions as not avail. anymore (#494)
- Access to IUCN taxonomy information is now provided through the newish rredlist package. (Two issues dealing with IUCN problems (#475) (#492))
- Fix to get_gbifid() to use new internal code to provide two ways to search GBIF taxonomy API, either via /species/match or via /species/search, instead of /species/suggest, which we used previously. The suggest route was too coarse. get_gbifid() also gains a parameter method to toggle whether you search for names using /species/match or /species/search. (#528)
- Fix for col_search() to handle when COL can return a value of misapplied name, which a switch() statement didn't handle yet (#511) thanks @JoStaerk !
- Fixes for get_colid() and col_search() (#523) thanks @zachary-foster !
- Fixed bug in the package dependency bold, which fixes taxize::bold_search(), so no actual changes in taxize for this, but take note (#521)
- Fixed problem in gnr_resolve() where we indexed to data incorrectly. And added tests to account for this problem. Thanks @raredd ! (#519) (#520)
- Fixed bug in iucn_summary() introduced in last version. iucn_summary() now uses the package rredlist, which requires an API key, and I didn't document how to use the key. Function now allows user to pass the key in as a parameter, and documents how to get a key and save it in either .Renviron or in .Rprofile (#522)
- New function lowest_common() for obtaining the lowest common taxon and rank for a given taxon name or ID. Methods so far for ITIS, NCBI, and GBIF (#505)
- New contributor James O'Donnell (@jimmyodonnell) (via #505)
- Now importing rredlist
- New function iucn_summary_id() - same as iucn_summary(), except takes IUCN IDs as input instead of taxonomic names (#493)
- All taxonomic rank columns in data.frame's now given back as lower case. This provides consistency, which is important, and many functions use ranks to determine what to do next, so using a consistent case is good.
- iucn_summary() fixes, long story short: a number of bug fixes, and uses the new IUCN API via the newish package rredlist when IDs are given as input, but uses the old IUCN API when taxonomic names given. Also: gains new parameter distr_details (#174) (#472) (#487) (#488)
- Replaced XML with xml2 for XML parsing (#499)
- Fixes to internal use of httr::content to explicitly state encoding="UTF-8" (#498)
- gnr_resolve() now outputs a column (user_supplied_name) for the exact input taxon name - facilitates merging data back to original data inputs (#486) thanks @Alectoria
- eol_dataobjects() gains new parameter taxonomy to toggle whether to return any taxonomy details from different data providers (#497)
- Catalogue of Life URLs changed - updated all appropriate COL functions to use the new URLs (#501)
- classification() was giving back rank values in mixed case from different data providers (e.g., class vs. Class). All rank values are now all lowercase (#504)
- Changed number of results returned from internal GBIF search in get_gbifid to 50 from 20. Gives back more results, so more likely to get the thing searched for (#513)
- Fix to gni_search() to make all output columns character class
- iucn_id(), tpl_families(), and tpl_get() all gain a new parameter ... to pass on curl options to httr::GET()
- Fixes to get_eolid(): URI returned now always has the pageid, and goes to the right place; API key if passed in now actually used, woopsy (#484)
- Fixes to get_uid(): when a taxon not found, the "match" attribute was saying found sometimes anyway - that is now fixed; additionally, fixed docs to correctly state that we give back 'NA due to ask=FALSE' when ask = FALSE (#489) Additionally, made this doc fix in other get_*() function docs
- Fix to apgOrders() function (#490)
- Fixes to tp_search() which fixes get_tpsid(): Tropicos doesn't allow periods (.) in query strings, so those are URL encoded now; Tropicos doesn't like sub-specific rank names in name query strings, so we warn when those are found, but don't alter user inputs; and improved docs to be more clear about how the function fails (#491) thanks @scelmendorf !
- Fix to classification(db = "itis") to fail better when no taxa found (#495) thanks @ashenkin !
- eol_pages() fixes: the EOL API route for this method gained a new parameter taxonomy, this function gains that parameter. That change caused this fxn to fail. Now fixed. Also, parameter subject changed to subjects (#500)
- Fix to col_search() due to when misapplied name come back as a data slot. There was previously no parser for that type. Now there is, and it works (#512)
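The reason for returning ranks in lower case (#504) shows up as soon as outputs from two providers are compared. The data below are invented for illustration and are not real service responses.

```r
# Mock classification rows from two providers (hypothetical data).
a <- data.frame(name = "Aves", rank = "Class", stringsAsFactors = FALSE)
b <- data.frame(name = "Aves", rank = "class", stringsAsFactors = FALSE)

# With mixed case the same rank looks different across providers.
mixed_match <- identical(a$rank, b$rank)

# After lowercasing, the ranks compare equal and can drive later logic.
a$rank <- tolower(a$rank)
b$rank <- tolower(b$rank)
lower_match <- identical(a$rank, b$rank)
```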
- Now requires R >= 3.2.1. Good idea to update your R installation anyway (#476)
- New function ion() for obtaining data from Index of Organism Names (#345)
- New function eubon() for obtaining data from EU (European Union) BON taxonomy (#466) Note that you may only get partial results for some requests as paging isn't implemented yet in the EU BON API (#481)
- New suite of functions, with prefix fg_*() for obtaining data from Index Fungorum. More work has to be done yet on this data source, but these initial functions allow some Index Fungorum data access (#471)
- New function gbif_downstream() for obtaining downstream names from GBIF's backbone taxonomy. Also available in downstream(), where you can request downstream names from GBIF, along with other data sources (#414)
- Note added in docs for all db parameters to warn users that if they provide the wrong db value for the given taxon ID, they can get data back, but it would be wrong. That is, all taxonomic data sources available in taxize use their own unique IDs, so a single ID value can be in multiple data sources, even though the ID refers to different taxa in each data source. There is no way we can think of to prevent this from happening, so be cautious. (#465)
- A note added to all IUCN functions to warn users that sometimes incorrect data is returned. This is beyond our control, as sometimes IUCN itself gives back incorrect data, and sometimes EOL/Global Names (which we use in some of the IUCN functions) give back incorrect data. (#468) (#473) (#174) (#472) (#475)
- Fix to gnr_resolve() to by default capitalize first name of a name string passed to the function. GNR is case sensitive, so case matters (#469)
- phylomatic_tree() and phylomatic_format() are defunct. They were deprecated in recent versions, but are now gone. See the new package brranching for Phylomatic data (#479)
- stripauthority argument in gnr_resolve() has been renamed to canonical to better match what it actually does (#451)
- gnr_resolve() now returns a single data.frame in output, or NULL when no data found. The input taxa that have no match at all are returned in an attribute with name not_known (#448)
- updated some functions to work with R >3.2.x
- In vascan_search() changed callopts parameter to ... to pass in curl options to the request.
- In ipni_search() changed callopts parameter to ... to pass in curl options to the request. In addition, better http error handling, and added a test suite for this function. (#458)
- stringsAsFactors=FALSE now used for gbif_parse() (https://github.com/ropensci/taxize/commit/c0c4175d3a0b24d403f18c057258b67d3fbf17f0)
- Made nearly all column headers and list names lowercase to simplify indexing to elements, as well as combining outputs. (#462)
- Plantminer API updated to use a new API. Option to search ThePlantList or the Brazilian Flora Checklist (#464)
- Added more details to the documentation for get_uid() to make more clear how to use the various parameters to get the desired result, and how to avoid certain pitfalls (#436)
- Removed the parameter asdf from the function eol_dataobjects() - now returning data.frame's only.
- Added some error catching to get_eolid() via tryCatch() to fail better when names not found.
- Dropped openssl as a package dependency. Not needed anymore because uBio dropped.
- gnr_resolve() failed when no canonical form was found.
- Fixed gnr_resolve() when no results found when best_match_only=TRUE (#432)
- Fixed bug in internal function itisdf() to give back an empty data.frame when no results found, often with subspecific taxa. Helps solve errors reported in use of downstream(), itis_downstream(), and gethierarchydownfromtsn() (#459)
- gnr_resolve() gains new parameter with_canonical_ranks (logical) to choose whether infraspecific ranks are returned or not.
- New function iucn_id() to get the IUCN ID for a taxon from its name. (#431)
- All functions that interacted with the taxonomy service uBio are now defunct. Of course we would deprecate first, then make defunct later, to make transition easier, but that is out of our hands. The functions that are defunct are: ubio_classification(), ubio_classification_search(), ubio_id(), ubio_search(), ubio_synonyms(), get_ubioid(), ubio_ping(). In addition, ubio has been removed as an option in the synonyms() function, and references for uBio have been removed from the taxize_cite() utility function. (#449)
- rankagg() doesn't depend on data.table anymore (fixes issue with CRAN checks)
- Replaced RCurl::base64Decode() with openssl::base64_decode(), needed for ubio_*() functions (#447)
- Importing only functions (via importFrom) used across all imports now (#446). In addition, importFrom for all non-base R pkgs, including graphics, methods, stats and utils packages (#441)
- Fixes to prevent problems with httr v1, where you can't pass a zero length list to the query parameter in GET(), but can pass NULL (#445)
- Fixes to all of the gni_*() functions, including code tidying, some DRYing out, and ability to pass in curl options (#444)
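The httr v1 note (#445) amounts to never handing GET() an empty list as the query. One way to guard against that is sketched below; compact_query() is a hypothetical helper written for this example and the changelog does not say how the fix was actually implemented.

```r
# Hypothetical helper: drop NULL entries, and return NULL (not an empty
# list) when nothing is left to send as query parameters.
compact_query <- function(args) {
  args <- Filter(Negate(is.null), args)
  if (length(args) == 0) NULL else args
}

with_params <- compact_query(list(q = "Poa annua", page = NULL))
no_params <- compact_query(list())
```

The result could then be supplied as the query argument of an HTTP call without triggering the zero length list problem.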
- Fixed typo in taxize_cite()
- Fixed a bug in classification() where numeric IDs as input got converted to itis ids just because they were numeric. Fixed now. (#434)
- Catalogue of Life (COL) changed from using short numeric codes for taxa to long alphanumeric UUID type ids. This required fixing functions using COL web services (#435)
- Added a method for Catalogue of Life for the synonyms function to get name synonyms. (#430)
- Added datasets apgFamilies and apgOrders. (#418)
- col_search() gains parameters response to get a terse or full response, and ... to pass in curl options.
- eol_dataobjects() gains parameter ... to pass in curl options, and parameter returntype renamed to asdf (for "as data.frame").
- ncbi_get_taxon_summary() gains parameter ... to pass in curl options.
- The children() function gains the rows parameter passed on to get_*() functions, supported for data sources ITIS and Catalogue of Life, but not for NCBI.
- The upstream() function gains the rows parameter passed on to get_*() functions, supported for both data sources ITIS and Catalogue of Life.
- The classification() function gains the rows parameter passed on to get_*() functions, for all sources used in the function.
- The downstream() function gains the rows parameter passed on to get_*() functions, for all sources used in the function.
- Nearly all taxonomic ID retrieval functions (i.e., get_*()) gain new parameters to help filter results (e.g., division, phylum, class, family, parent, rank, etc.). These parameters allow direct matching or regex filters (e.g., .a to match any character followed by an a). (#410) (#385)
- Nearly all taxonomic ID retrieval functions (i.e., get_*()) now give back more information (mostly higher taxonomic data) to help in the interactive decision process. (#327)
- New data source added to synonyms() function: Catalogue of Life. (#430)
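The difference between direct matching and regex filtering mentioned for the get_*() filter parameters can be shown on a mock candidate table. The data frame below is invented for illustration; the real functions apply such filters to results returned by each data source.

```r
# Mock candidate table (hypothetical data) for an ambiguous genus name.
candidates <- data.frame(
  name = c("Satyrium", "Satyrium"),
  phylum = c("Tracheophyta", "Arthropoda"),
  stringsAsFactors = FALSE
)

# Direct match on a full value.
exact <- candidates[candidates$phylum == "Arthropoda", , drop = FALSE]

# Regex filter on the start of a value.
by_regex <- candidates[grepl("^Trach", candidates$phylum), , drop = FALSE]

# ".a" matches any single character followed by an "a".
dot_a <- grepl(".a", c("ba", "a", "xyz"))
```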
- vegan package, used in class2tree() function, moved from Imports to Suggests. (#392)
- Improved taxize_cite() a lot - get URLs and sometimes citation information for data sources available in taxize. (#270)
- Fixed typo in apg_lookup() function. (#422)
- Fixed documentation in apg_families() function. (#418)
- Across many functions, fixed support for passing in curl options, and added examples of curl option use.
- callopts parameter in eol_pages(), eol_search(), gnr_resolve(), tp_accnames(), tp_dist(), tp_search(), tp_summary(), tp_synonyms(), ubio_search() changed to ...
- accepted parameter in get_tsn() changed to FALSE by default. (#425)
- Default value of db parameter in resolve() changed to gnr as tnrs is often quite slow.
- General code tidying across the package to make code easier to read.
- Fixed encoding issues in tpl_families() and tpl_get(). (#424)
- The following functions that were deprecated are now defunct (no longer available): ncbi_getbyname(), ncbi_getbyid(), ncbi_search(), eol_invasive(), gisd_isinvasive(). These functions are available in the traits package. (#382)
- phylomatic_tree() is deprecated, but will be defunct in an upcoming version.
- New set of functions to ping each of the APIs used in taxize. E.g., itis_ping() pings ITIS and returns a logical, indicating if the ITIS API is working or not. You can also do a very basic test to see whether content returned matches what's expected. (#394)
- New function status_codes() to get vector of HTTP status codes. (#394)
- Removed startup message.
- Now can pass in curl options to itis_ping(), and all *_ping() functions.
- Moved examples that were in \donttest into \dontrun.
- New function genbank2uid() to get a NCBI taxonomic id (i.e., a uid) from either a GenBank accession number or GI number. (#375)
- New function get_nbnid() to get a UK National Biodiversity Network taxonomic id (i.e., a nbnid). (#332)
- New function nbn_classification() to get a taxonomic classification for a UK National Biodiversity Network taxonomic id. Using this new function, generic method classification() gains method for nbnid. (#332)
- New function nbn_synonyms() to get taxonomic synonyms for a UK National Biodiversity Network taxonomic id. Using this new function, generic method synonyms() gains method for nbnid. (#332)
- New function nbn_search() to search for taxa in the UK National Biodiversity Network. (#332)
- New function ncbi_children() to get direct taxonomic children for a NCBI taxonomic id. Using this new function, generic method children() gains method for ncbi. (#348) (#351) (#354)
- New function upstream() to get taxa upstream of a taxon. E.g., getting families upstream from a genus gets all families within the one level higher up taxonomic class than family. (#343)
- New suite of functions as.*() to coerce numeric/alphanumeric codes to taxonomic identifiers for various databases. There are methods on this function for each of itis, ncbi, tropicos, gbif, nbn, bold, col, eol, and ubio. By default as.*() functions make a quick check that the identifier is a real one by making a GET request against the identifier URI - this can be toggled off by setting check=FALSE. There are methods for returning itself, character, numeric, list, and data.frame. In addition, if the as.*.data.frame() function is used, a generic method exists to coerce the data.frame back to an identifier object. (#362)
- New suite of functions named, for example, get_tsn_() (the underscore is the only difference from the previous function name). These functions don't do the normal interactive process of prompts that e.g., get_tsn() do, but instead return a list of all ids, or a subset via the rows parameter. (#237)
- New function ncbi_get_taxon_summary() to get taxonomic name and rank for 1 or more NCBI uid's. (#348)
- assertthat removed from package imports, replaced with stopifnot(), to reduce dependency load. (#387)
- eol_hierarchy() now defunct (no longer available) (#228) (#381)
- tp_classification() now defunct (no longer available) (#228) (#381)
- col_classification() now defunct (no longer available) (#228) (#381)
- New manual page listing all the low level ITIS functions for which their manual pages are not shown in the package index, but are available if you do ?fxn-name.
- All get_*() functions gain a new parameter rows to allow selection of particular rows. For example, rows=1 to select the first row, or rows=1:3 to select rows 1 through 3. (#347)
- classification() now by default returns taxonomic identifiers for each of the names. This can be toggled off by the return_id=FALSE. (#359) (#360)
- Simplification of many higher level functions to use switch() on the db parameter, which helps give better error message when a db value is not possible or spelled incorrectly. (#379)
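Dispatching on a db value with switch(), with a final fallback that raises a clear error, might look roughly like the sketch below. The function name pick_source(), the returned labels, and the error wording are all invented for illustration and are not taken from taxize.

```r
# Hypothetical dispatcher: map a db value to a data source label, and stop
# with an informative message for anything unrecognised.
pick_source <- function(db) {
  switch(db,
    itis = "ITIS",
    ncbi = "NCBI",
    col = "Catalogue of Life",
    stop("the provided db value was not recognised: ", db, call. = FALSE)
  )
}

ok <- pick_source("itis")

# A misspelled value reaches the fallback and produces an error condition.
bad <- tryCatch(pick_source("itsi"), error = function(e) e)
```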
- Lots of reduction of redundancy in internal functions. (#378)
- New data sources added to taxize: BOLD (Barcode of Life Data Systems). Three more data sources were added (World Register of Marine Species (WoRMS), Pan-European Species directories Infrastructure (PESI), and Mycobank), but are not available on CRAN. Those three data sources provide data via SOAP web services protocol, which is hard to support in R. Thus, those sources are available on Github. See https://github.com/ropensci/taxize#version-with-soap-data-sources
- New function children(), which is a single interface to various data sources to get immediate children from a given taxonomic name. (#304)
- New functions added to search BOLD data: bold_search() that searches for taxa in the BOLD database of barcode data; get_boldid() to search for a BOLD taxon identifier. (#301)
- New function get_ubioid() to get a uBio taxon identifier. (#318)
- New function started (not complete yet) to get suggested citations for the various data sources available in taxize: taxize_cite(). (#270)
- Using jsonlite instead of RJSONIO throughout taxize.
- get_ids() gains new option to search for a uBio ID, in addition to the others, itis, ncbi, eol, col, tropicos, and gbif.
- Fixed documentation for stripauthority parameter in gnr_resolve(). (#325)
- iplant_resolve() now outputs data.frame structure instead of a list. (#306)
- Clarified parameter seqrange in ncbi_getbyname() and ncbi_search() (#328)
- synonyms() gains new data source, can now get synonyms from uBio data source (#319)
- vascan_search() giving back more useful results now.
- Added error catching for when URI is too long, i.e., when too many names provided (#329) (#330)
- Various fixes to tnrs() function, including more meaningful error messages on failures (#323) (#331)
- Fixed bug in getpublicationsfromtsn() that caused function to fail on data.frame's with no data on name assignment (#297)
- Fixed bug in sci2comm() that caused fxn to fail when using db=itis sometimes (#293)
- Fixes to scrapenames(). Sending a text blob via the text parameter now works.
- Fixes to resolve() so that function now works for all 3 data sources. (#337)
- New function iplant_resolve() to do name resolution using the iPlant name resolution service. Note, this is different from http://taxosaurus.org/ that is wrapped in the tnrs() function.
- New function ipni_search() to search for names in the International Plant Names Index (IPNI).
- New function resolve() that unifies name resolution services from iPlant's name resolution service (via iplant_resolve()), Taxosaurus' TNRS (via tnrs()), and GNR's name resolution service (via gnr_resolve()).
- All get_*() functions now returning a new uri attribute that is a link to the taxon on the web. If NA is given back (e.g. nothing found), the uri attribute is blank. You can go directly to the uri in your default browser by doing, for example: browseURL(attr(result, "uri")).
- get_eolid() now returns an attribute provider because EOL collates taxonomic data from a lot of sources, then gives back IDs that are internal EOL ids, not those matching the id of the source they pull from. This should help with provenance, and should help if there is confusion about why the id given back by this function does not match that from the original source.
- Within the get_tsn() function, now using the function itis_terms(), which gives back the accepted status of the taxa. This allows a new parameter in the function (accepted, logical) that allows user to say give back only accepted status names (accepted=TRUE), or to give back all names (accepted=FALSE).
- gnr_resolve() gains two new parameters best_match_only (logical, to return best match only) and preferred_data_sources (to return preferred data sources) and callopts to pass in curl options.
- tnrs(), tp_accnames(), tp_refs(), tp_summary(), and tp_synonyms() gain new parameter callopts to pass in curl options.
- class2tree() can now handle NA in classification objects.
- classification.eolid() and classification.colid() now return the submitted name along with the classification.
- Changed from CC0 to MIT license.
- Updated citation to have both the taxize paper in F1000 Research and the package citation.
- Sped up some functions by removing internal use of plyr functions, see #275.
- Removed dependency on rgbif - copied into this package a few functions needed internally. This avoids users having to install GDAL binary.
- Added in verbose parameter to many more functions to allow suppression of help messages.
- In most functions when using httr, now manually parsing JSON to a list then to another data format instead of allowing internal httr parsing - in addition added checks on content type and encoding in many functions.
- Added match.arg internally to get_ids() for the db parameter so that a) unique short abbreviations of possible values are possible, and b) gives a meaningful warning if unsupported values are given.
- Most long-named ITIS functions (e.g., getexpertsfromtsn, getgeographicdivisionsfromtsn) gain parameter curlopts to pass in curl options.
- Added stringsAsFactors=FALSE to all data.frame creations to eliminate factor variables.
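The behaviour that match.arg brings to a db parameter can be seen in a stand-alone sketch. The function resolve_db() and its list of choices are assumptions made for the example, not the signature of get_ids(); note also that base R's match.arg() signals an error, not a warning, for an unsupported value.

```r
# Hypothetical function showing match.arg() on a db-style parameter.
resolve_db <- function(db = c("itis", "ncbi", "eol", "col", "tropicos", "gbif")) {
  match.arg(db)
}

# A unique abbreviation is expanded to the full value.
short <- resolve_db("trop")

# An unsupported value is rejected with an error listing the valid choices.
unsupported <- tryCatch(resolve_db("xyz"), error = function(e) e)
```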
- classification.gbifid() did not return the correct result when taxon not found.
- Fixed bugs in many functions, see #245, #248, #254, #277.
- classification() used to fail when it was passed a subset of a vector of ids, in which case the class information was stripped off. Now works (#284)
- itis_downstream() and col_downstream() functions accessible now from a single function downstream() (#238)
- Added an extension function classification() for the gbif id class, classification.gbifid() (#241)
- Added some error catching to class2tree function. (#240)
- Fixed problems in cbind.classification() and rbind.classification() where the first column of the output was a useless column name, and all column names now lower case for consistency. (#243)
- classification() was giving back IDS instead of taxon names on the list element names, fixed this so hopefully all are giving back names. (#243)
- Fixed bugs in col_*() functions so they give back data.frame's now with character class columns instead of factors, damned stringsAsFactors! (#246)
- New dataset: Lookup-table for family, genus, and species names for ThePlantList under dataset name "theplantlist".
- get_ids() now accepts "gbif" as an option via use of get_gbifid().
- Changed function itis_phymat_format() to phylomatic_format() - this function gets the typical Phylomatic format name string "family/genus/genus_epithet"
- Updated gbif_parse() base url to the new one (http://api.gbif.org/v1/parser/name).
- Fixes to phylomatic_tree().
- New function class2tree() to convert list of classifications to a tree. For example, go from a list of classifications from the function classification() to this function to get a taxonomy tree in ape phylo format.
- New function get_gbifid() to get a Global Biodiversity Information Facility identifier. This is the ID GBIF uses in their backbone taxonomy.
- classification() outputs gain rbind() and cbind() generic methods that act on the various outputs of classification() to bind data row-wise, or column-wise, respectively.
- Updated ncbi_search() to retrieve more than a max of 500, slightly changed column headers in output data files, and if didn't before, now accepts a vector/list of taxonomic names instead of just one name.
- We attempted to make all output column names lowercase, and to increase consistency across column names in outputs from similar functions.
- New function scrapenames() uses the Global Names Recognition and Discovery service to extract taxonomic names from a web page, pdf, or other document.
- New function vascan_search() to search the CANADENSYS Vascan names database.
- Fixed bugs in get_tpsid(), get_eolid() and eol_pages().
- phylomatic_tree() bugs fixed.
- classification() methods were simplified. Now classification() is the workhorse for every data-source. col_classification(), eol_hierarchy(), and tp_classification() are now deprecated and will be removed in the next taxize version.
- classification() gains four new arguments: start, checklist, key, and callopts.
- comm2sci() gains argument simplify to optionally simplify output to a vector of names (TRUE by default).
- get_eolid() and get_tpsid() both gain new arguments key to specify an API key, and ... to pass on arguments to eol_search().
- Added ncbi as a data source (db="ncbi") in sci2comm().
- tax_agg() now accepts a matrix in addition to a data.frame. Thanks to @tpoi
- tnrs() changes: Using httr instead of RCurl; now forcing splitting up name vector when long. Still issues when using POST requests (getpost="POST") wherein a request sent with 100 names only returns 30 for example. Investigating this now.
- Function name change: tp_acceptednames() now tp_accnames().
- Function name change: tp_namedistributions() now tp_dist().
- Function name change: tp_namereferences() now tp_refs().
- Internal ldfast() function changed name to taxize_ldfast() to avoid namespace conflicts with similar function in another package.
- Three functions now with ncbi_* prefix: get_seqs() is now ncbi_getbyname(); get_genes() is now ncbi_getbyid(); and get_genes_avail() is now ncbi_search().
- classification() gains extension method classification.ids() to accept output from get_ids() - which attempts to get a taxonomic hierarchy from each of the taxon identifiers with the output from get_ids().
- synonyms() gains extension method synonyms.ids() to accept output from get_ids() - which attempts to get synonyms from each of the taxon identifiers with the output from get_ids().
- Reworked functions that interact with the ITIS API so that lower level functions were grouped together into higher level functions. All the approximately 50 lower level functions are still exported but are not included in the index help file (due to @keywords internal for each fxn) - but can still be used normally, and man files are available at ?functionName.
- New function itis_ping() to check if the ITIS API service is up, similar to eol_ping() for the EOL API.
- New function itis_getrecord() to get a partial or full record, using a TSN or lsid.
- New function itis_refs() to get references associated with a TSN.
- New function itis_kingdomnames() to get all kingdom names, or kingdom name for a TSN.
- New function itis_lsid() to get a TSN from an lsid, or to get a partial or full record from an lsid.
- New function itis_native() to get status as native, exotic, etc. in various geographic regions.
- New function itis_hierarchy() to get full hierarchy, or immediate up or downstream hierarchy.
- New function itis_terms() to get TSNs, authors, common names, and scientific names from a given query.
- New function sci2comm() to get common (vernacular) names from input scientific names from various data sources.
- New function comm2sci() to get scientific names from input common (vernacular) names from various data sources.
- New function get_ids() to get taxonomic identifiers across all sources.
- itis_taxrank() now outputs a character, not a factor; loses parameter verbose, and gains ..., which passes on further arguments to gettaxonomicranknamefromtsn.
- tp_synonyms(), tp_summary(), plantminer(), itis_downstream(), gisd_isinvasive(), get_genes_avail(), get_genes(), eol_invasive(), eol_dataobjects(), and tnrs() gain parameter verbose to optionally suppress messages.
- phylomatic_tree() format changed so that names are passed in normally (e.g., Poa annua) instead of in the slashpath format (family/genus/genus_species). Also, taxaformat parameter dropped.
- itis_acceptname() gains ... to pass in further arguments to getacceptednamesfromtsn()
- tp_namedistributions() loses parameter format.
- get_tsn() and get_uid() return information about the match as an attribute.
- clarified iucn-documentation
- Fixed bug in synonyms() so that further arguments can be passed on to get_tsn() to suppress messages.
- Removed test for ubio_classification_search(), a function that isn't operational yet.
- New functions added just like get_uid()/get_tsn() but for EOL, Catalogue of Life, and Tropicos, see get_eolid(), get_colid(), and get_tpsid(), respectively.
- classification() methods added for EOL, Catalogue of Life, and Tropicos, see functions classification.eolid(), classification.colid(), and classification.tpsid() respectively.
- New function col_search() to search for names in the Catalogue of Life.
- User can turn off interactive mode in get_* functions. All get_* functions gain an ask argument: if TRUE (default), the user is prompted to select which row they want; if FALSE, NA is returned when multiple results are available. Tests were added for the new argument. Affects downstream functions too.
- New function eol_invasive() to search EOL collections of invasive species lists.
- New function tp_search() to search for taxonomic IDs from Tropicos.
- New function tp_classification() to get a taxonomic hierarchy from Tropicos.
- New function gbif_parse() to parse scientific names into their components, using the GBIF name parser API.
- New function itis_searchcommon() to search for common names across both searchbycommonnamebeginswith and searchbycommonnameendswith.
- tax_name() and other functions broke because get_tsn() and get_uid() returned the wrong value when a taxon was not found. Fixed.
- Added tests for new classification() methods for EOL, COL, and Tropicos.
- Added tests for new functions tp_search() and tp_classification().
- Moved tests from inst/tests to tests/testthat according to new preferred location of tests.
- Updated CITATION in inst/ with our F1000Research paper info.
- Package repo name on GitHub changed from taxize_ to taxize - remember to use "taxize" in install_github() calls now instead of "taxize_".
- New function tpl_families() to get data.frame of families from The Plantlist.org site.
- New function names_list() to get a random vector of species names using the
- Added two new data sets, plantGenusNames.RData and plantNames.RData, to be used in names_list().
- New function ldfast(), a replacement function for plyr::ldply that should be faster in all cases.
- Changed API key names to be more consistent, now tropicosApiKey, eolApiKey, ubioApiKey, and pmApiKey - do change these in your .Rprofile if you store them there.
- Added a startup message.
- Across most functions, removed dependencies on plyr, using ldfast() instead, for increased speed.
- Across most functions, changed from using RCurl to using httr.
- Across most functions, stop_for_status() is now used directly after the Curl call to check the http status code, stopping the function if an error code is found.
- Many functions changed parameter ... to callopts, which passes on additional Curl options, with default an empty list (list()), which makes function testing easier.
- eol_search() gains parameters page, exact, filter_tid, filter_heid, filter_by_string, matching, cache_ttl, and callopts.
- eol_hierarchy() gains parameter callopts, and loses parameter usekey (always using API key now).
- eol_pages() gains parameters images, videos, sounds, maps, text, subject, licenses, details, common_names, synonyms, references, vetted, cache_ttl, and callopts.
- gni_search(): parameter url removed (it is now defined inside the function), and the .Rd file gains url references.
- phylomatic_tree() now checks to make sure family names were found for input taxa. If not, the function stops with a message saying so.
- tpl_get() updated with fixes/improvements by John Baumgartner - now gets taxa from all groups, whereas previously only Angiosperms were retrieved. In addition, csv files from The Plantlist.org are downloaded directly rather than read into R and written out again.
- tpl_search() now checks for missing data or errors, and stops function with error message.
- capwords() function renamed to taxize_capwords() to avoid namespace conflicts with other packages with a similar function.
- ubio_namebank() was giving back base64 encoded data, now decoded appropriately.
- Added John Baumgartner as an author.
- tax_name() accepts multiple ranks to query.
- tax_name() accepts vectors as input.
- tax_name() has an option to query both NCBI and ITIS in one call and return the union of both.
- new extractor function for iucn_summary(): iucn_status(), to extract status from iucn-objects.
- tax_agg(): A function to aggregate species data to given taxonomic rank.
- tax_rank(): Get taxonomic rank for a given taxon name.
- classification() accepts taxon names as input and returns a named list.
- new function apg_lookup() looks up APGIII taxonomy and replaces family names
- new function gni_parse() parses scientific names using EOL's name parser API
- new function iucn_getname() is a utility to find IUCN names using the EOL API
- new function rank_agg() aggregates data by a given taxonomic rank
- new data table apg_families
- new data table apg_orders
- gnr_resolve() gains new arguments resolve_once, with_context, stripauthority, highestscore, and http, and loses returndf (that is, a data.frame is returned by default)
- gni_search() gains parameter parse_names
- tnrs() parameter getpost changed from default of 'GET' to 'POST'
- Across all functions, the url parameter specifying an API endpoint was moved inside of functions (i.e., not available as a parameter in the function call)
- gnr_datasources() parameter todf=TRUE by default now, returning a data.frame
- col_classification() minor formatting improvements
- iucn_summary() returns no information about population estimates.
- get_tsn() raised a warning in specific situations.
- tax_name() did not work for multiple ranks with ITIS.
- fixed errors in getfullhierarchyfromtsn()
- fixed errors in gethierarchydownfromtsn()
- fixed errors in getsynonymnamesfromtsn()
- fixed errors in searchforanymatch()
- fixed errors in searchforanymatchedpage()
- Removed dependency to NCBI2R
- Improvements of documentation
- Citation added
- removed tests for now, until a longer term fix is made so that web APIs that are temporarily down don't cause tests to fail.
- added dependency on R (>= 2.15.0) so that package tests don't fail on some systems due to paste0()
- remove test for ubio_namebank() function as it sometimes fails
- iucn_summary() does not break when API returns no information.
- tax_name() returns NA when taxon is not found on API.
- get_uid() asks for user input when more than one UID is found for a taxon.
- changed base URL for phylomatic_tree(), and associated parameter changes
- added check for invasive species status for a set of species from GISD database via gisd_isinvasive().
- Further development with the EOL-API: eol_dataobjects().
- added Catalogue of Life: col_classification(), col_children(), and col_downstream().
- new function get_genes() to retrieve gene sequences from NCBI by accession number.
- new functions to interact with the Phylotastic name resolution service: tnrs_sources() and tnrs()
- Added unit tests
- itis_name() function deprecated - use tax_name() instead
- changed paste0 to paste to avoid problems on certain platforms.
- removed all tests until the next version so that tests will not fail on any platforms.
- plyr was missing as an import for the iucn_summary() function.
- added NEWS file.
- released to CRAN