APIs for Scientists
Erika Austhof edited this page Oct 3, 2022 · 3 revisions
I attended the Demystifying APIs for Scientists workshop on 9/28/2022 by David LeBauer.
Slides and the full workshop on YouTube
An API is useful for getting data for analyses.
- An API is made for machines: it is programmable, it can be automated, and it can export data in an analysis-ready format.
- Endpoints are the URLs where you can query data. Parameters are extra pieces of information you add to the query, such as codes or filters you want to apply. Constructing the full URL (endpoint plus parameters) lets you get text data from a website.
- For example: api.gbif.org/v1/species?name=Puma%20concolor
- species is the endpoint, name is the parameter, and %20 is an encoded space character
- The response is a lot of raw text that is not human-readable
- You can install a JSON formatter browser extension to make the data easier to read
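The endpoint-plus-parameter pattern above can be sketched in a few lines of Python. This is only an illustration: it assumes the GBIF base URL https://api.gbif.org/v1 (the https scheme is an assumption), and build_url is a hypothetical helper name, not part of any library. It builds the URL but does not send a request.

```python
from urllib.parse import urlencode, quote

# Base URL taken from the example above; the https scheme is assumed.
BASE_URL = "https://api.gbif.org/v1"


def build_url(endpoint, **params):
    """Join the base URL, an endpoint, and percent-encoded parameters."""
    # quote_via=quote encodes a space as %20 (the default would use "+").
    query = urlencode(params, quote_via=quote)
    url = f"{BASE_URL}/{endpoint}"
    return f"{url}?{query}" if query else url


# "species" is the endpoint, "name" is the parameter.
url = build_url("species", name="Puma concolor")
print(url)
```

Letting the library do the encoding avoids having to replace spaces by hand when a species name or filter value changes.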
JSON is a text format for representing data structures.
- It can support a really complex data structure
- You can have data that is nested, complex, and relational across different types of data
- It supports schemas, so the description of the data can be more complex too
- It is more effective, and widely used
- "this is the way"
- Using the previous example, you can get the rows!
- In R, the package jsonlite has a function called fromJSON, which reads the records into a list that is collapsed into a table. If you do this with a complex dataset, you might get a table within a table, so it could take some time to get the data into the format you need
- Why not a CSV? JSON allows nesting and more complicated structures. For example, some groups have mapped their data to an existing API, which is great for connecting data, but representing those connections (mapping across different time and spatial scales) in a CSV would be complex
- Offset is "where to start" in the results, for example starting on page 10
- Limit is the number of records returned per request. If the API has no maximum limit, you can just set the limit to the total number of records. Otherwise you have to set a limit and then "walk" through the results with the offset until you have all the records: e.g. limit 200, offset 100-199
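The JSON handling and the offset/limit paging described above can be sketched together in Python. Everything here is a stand-in so the example runs without a network connection: fake_api is a hypothetical function that plays the role of the server, the field names (results, endOfRecords, taxonomy) are assumptions, and offset is assumed to count records to skip (some APIs count pages instead). The flatten step is roughly what a "table within a table" needs before analysis.

```python
import json

# Hypothetical stand-in for server data: 5 records, each with a nested object.
_RECORDS = [
    {"key": i, "name": f"species {i}",
     "taxonomy": {"kingdom": "Animalia", "rank": "SPECIES"}}
    for i in range(5)
]


def fake_api(offset=0, limit=2):
    """Return one page of results as JSON text, as a paged API might."""
    body = {
        "offset": offset,
        "limit": limit,
        "endOfRecords": offset + limit >= len(_RECORDS),
        "results": _RECORDS[offset:offset + limit],
    }
    return json.dumps(body)


def flatten(record, prefix=""):
    """Turn nested objects into flat columns such as taxonomy.kingdom."""
    row = {}
    for field, value in record.items():
        name = f"{prefix}{field}"
        if isinstance(value, dict):
            row.update(flatten(value, prefix=f"{name}."))
        else:
            row[name] = value
    return row


def fetch_all(limit=2):
    """Walk through the pages, raising the offset by `limit` each time."""
    rows, offset = [], 0
    while True:
        page = json.loads(fake_api(offset=offset, limit=limit))
        rows.extend(flatten(r) for r in page["results"])
        if page["endOfRecords"]:
            return rows
        offset += limit


rows = fetch_all(limit=2)
print(len(rows), "rows; columns:", sorted(rows[0]))
```

With a real API, fake_api would be replaced by an HTTP request, and the stopping rule would follow whatever the API's documentation says about its last page.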
Postman is a workbench for using APIs.
- It helps you build queries for accessing data
- It lets you build them and see what is being pulled from the server
- cURL provides a way to run the same request from your local computer, the same way that Postman is pulling in the data. Paste the cURL command into a terminal
- Postman can generate code for lots of different languages and packages, for example R code using a library, which can then be executed in the R console
- Make sure that spaces are replaced with %20 so the request will work
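As a rough picture of what generated request code does, here is a minimal Python sketch using only the standard library. It is not Postman's actual output: build_request and send are hypothetical helper names, the URL assumes the GBIF species endpoint with an https scheme, and the Accept header is an assumption. The request is built and printed, but only sent if you call send yourself.

```python
import json
from urllib.parse import quote
from urllib.request import Request, urlopen


def build_request(name):
    """Build (but do not send) a GET request for a species name."""
    # quote() replaces the space with %20 so the URL is valid.
    url = f"https://api.gbif.org/v1/species?name={quote(name)}"
    return Request(url, headers={"Accept": "application/json"}, method="GET")


def send(request, timeout=30):
    """Send the request and parse the JSON body (needs network access)."""
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_request("Puma concolor")
print(request.get_method(), request.full_url)
```

Calling send(request) would perform the same kind of GET request that a cURL command pasted into a terminal does.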
A number of packages are available for pulling down, merging, and getting data into tables. Some of them even have API clients, which help you move through each of the steps. API clients are easier to use, whereas a browser is more flexible.
- rOpenSci hosts the prism package for weather data.