Source code: source/create.ts
RFC 5988 defines what the Link header looks like.
When the response has been processed, Got looks for the reference of the next relation.
This way Got knows the URL it should visit afterwards. The header can look like this:
Link: <https://api.github.com/repositories/18193978/commits?page=2>; rel="next", <https://api.github.com/repositories/18193978/commits?page=44>; rel="last"
By default, Got looks only at the next relation. To use other relations, you need to customize the paginate function described below.
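To make the Link header lookup concrete, here is a minimal sketch of extracting a relation from a header like the one above. The helpers `parseLinkHeaderSketch` and `findRelation` are hypothetical names for illustration, not Got's internal parser, and the parsing is deliberately simplified (it assumes no commas or semicolons inside quoted parameter values).

```typescript
// Hypothetical, simplified Link header parser (not Got's internal implementation).
type LinkEntry = {reference: string; parameters: Record<string, string>};

function parseLinkHeaderSketch(header: string): LinkEntry[] {
    const entries: LinkEntry[] = [];
    // Each entry looks like: <reference>; name="value"; name=value
    const entryPattern = /<([^>]*)>((?:\s*;\s*[^;,]+)*)/g;
    let match: RegExpExecArray | null;
    while ((match = entryPattern.exec(header)) !== null) {
        const parameters: Record<string, string> = {};
        for (const rawParameter of (match[2] ?? '').split(';')) {
            const separatorIndex = rawParameter.indexOf('=');
            if (separatorIndex === -1) {
                continue; // Skip empty fragments such as the one before the first ";".
            }
            const name = rawParameter.slice(0, separatorIndex).trim();
            // Strip the optional surrounding double quotes from the value.
            const value = rawParameter.slice(separatorIndex + 1).trim().replace(/^"|"$/g, '');
            parameters[name] = value;
        }
        entries.push({reference: match[1] ?? '', parameters});
    }
    return entries;
}

// Returns the URL for the given relation, or undefined when it is absent.
function findRelation(header: string, rel: string): string | undefined {
    const entry = parseLinkHeaderSketch(header).find(candidate => candidate.parameters.rel === rel);
    return entry ? entry.reference : undefined;
}

const exampleHeader = '<https://api.github.com/repositories/18193978/commits?page=2>; rel="next", <https://api.github.com/repositories/18193978/commits?page=44>; rel="last"';
const nextUrl = findRelation(exampleHeader, 'next');
const lastUrl = findRelation(exampleHeader, 'last');
```

A custom paginate function could use the same idea to follow a relation other than next, for example last.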
got.paginate(url, options?)
Returns an async iterator.
This is memory efficient, as each item is processed as soon as its page arrives, so the full result set never has to be held in memory.
import got from 'got';

const countLimit = 10;

const pagination = got.paginate(
    'https://api.github.com/repos/sindresorhus/got/commits',
    {
        pagination: {countLimit}
    }
);

console.log(`Printing latest ${countLimit} Got commits (newest to oldest):`);

for await (const commitData of pagination) {
    console.log(commitData.commit.message);
}
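The laziness of an async iterator can be illustrated without any network access. The sketch below is not Got's implementation: `fetchPage` is a hypothetical stand-in that serves pages from an in-memory array and counts how often it is called, and `paginateSketch` yields items one page at a time.

```typescript
// Hypothetical in-memory data source standing in for HTTP responses.
const fakePages: number[][] = [[1, 2], [3, 4], [5, 6]];
let fetchedPages = 0;

async function fetchPage(index: number): Promise<number[] | undefined> {
    fetchedPages++;
    return fakePages[index];
}

// Yields items page by page; the next page is requested only when the consumer asks for more.
async function* paginateSketch(): AsyncGenerator<number> {
    for (let index = 0; ; index++) {
        const items = await fetchPage(index);
        if (items === undefined) {
            return;
        }
        yield* items;
    }
}

// Consumes the iterator and stops as soon as `limit` items were received.
async function takeFirst(limit: number): Promise<number[]> {
    const taken: number[] = [];
    for await (const item of paginateSketch()) {
        taken.push(item);
        if (taken.length >= limit) {
            break;
        }
    }
    return taken;
}
```

Calling `takeFirst(3)` on this fake source requests only the first two pages; the third page is never fetched because the consumer stopped iterating.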
got.paginate.all(url, options?)
Returns a Promise for an array of all results.
Note:
- Querying a large dataset significantly increases memory usage, because every result is kept until the Promise resolves.
import got from 'got';

const countLimit = 10;

const results = await got.paginate.all('https://api.github.com/repos/sindresorhus/got/commits', {
    pagination: {countLimit}
});

console.log(`Printing latest ${countLimit} Got commits (newest to oldest):`);
console.log(results);
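Conceptually, collecting all results amounts to draining an async iterator into an array, which is why memory usage grows with the dataset. The following sketch shows that idea under the assumption that this is roughly what happens; `collectAll` and `fakeCommits` are hypothetical names, not part of Got.

```typescript
// Drains any async iterable into an array; every item stays in memory until the end.
async function collectAll<T>(iterable: AsyncIterable<T>): Promise<T[]> {
    const results: T[] = [];
    for await (const item of iterable) {
        results.push(item);
    }
    return results;
}

// Hypothetical stand-in for a paginated source.
async function* fakeCommits(): AsyncGenerator<string> {
    yield 'first commit';
    yield 'second commit';
}
```

With a real iterator, the same helper could be applied to the value returned by `got.paginate(…)`.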
pagination
Type: object
Default:
{
    transform: (response: Response) => {
        if (response.request.options.responseType === 'json') {
            return response.body;
        }

        return JSON.parse(response.body as string);
    },
    paginate: ({response}) => {
        const rawLinkHeader = response.headers.link;
        if (typeof rawLinkHeader !== 'string' || rawLinkHeader.trim() === '') {
            return false;
        }

        const parsed = parseLinkHeader(rawLinkHeader);
        const next = parsed.find(entry => entry.parameters.rel === 'next' || entry.parameters.rel === '"next"');

        if (next) {
            return {url: next.reference};
        }

        return false;
    },
    filter: () => true,
    shouldContinue: () => true,
    countLimit: Number.POSITIVE_INFINITY,
    backoff: 0,
    requestLimit: 10_000,
    stackAllItems: false
}
This option represents the pagination object.
pagination.transform
Type: Function
Default: response => JSON.parse(response.body)
A function that transforms Response into an array of items.
This is where you should do the parsing.
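As an illustration of a custom transform, suppose an imaginary API wraps its results in an envelope such as `{"items": [...]}` instead of returning a bare array. The envelope shape, the `ResponseLike` type and the name `transformEnvelope` are assumptions made for this sketch only.

```typescript
// Minimal stand-in for the part of Got's Response used here (assumption for the sketch).
type ResponseLike = {body: string};

// Hypothetical envelope returned by an imaginary API.
type Envelope = {items?: unknown[]};

// Parses the body and unwraps the array of items; falls back to an empty array.
const transformEnvelope = (response: ResponseLike): unknown[] => {
    const parsed = JSON.parse(response.body) as Envelope;
    return Array.isArray(parsed.items) ? parsed.items : [];
};
```

Such a function would be passed as `pagination: {transform: transformEnvelope}`.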
pagination.paginate
Type: Function
Default: Link header logic
The function takes an object with the following properties:
- response - The current response object,
- currentItems - Items from the current response,
- allItems - An empty array, unless stackAllItems is true, in which case it contains all emitted items.
It should return an object representing Got options pointing to the next page. If there is no next page, false should be returned instead.
The options are merged automatically with the previous request.
Therefore the options returned by pagination.paginate(…) must reflect changes only.
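The "changes only" rule can be sketched with a cursor-based paginate function. Everything specific here is hypothetical: the `x-next-cursor` header, the `cursor` query parameter, and the `PaginateDataLike` type, which models only the properties this sketch reads.

```typescript
// Models only the properties used below (assumption for the sketch).
type PaginateDataLike = {
    response: {headers: Record<string, string | string[] | undefined>};
    currentItems: unknown[];
};

const paginateByCursor = ({response, currentItems}: PaginateDataLike): false | {searchParams: {cursor: string}} => {
    // Hypothetical header carrying the cursor of the next page.
    const cursor = response.headers['x-next-cursor'];

    // No items or no cursor means there is no next page.
    if (currentItems.length === 0 || typeof cursor !== 'string' || cursor === '') {
        return false;
    }

    // Only the changed option is returned; the rest comes from the previous request.
    return {searchParams: {cursor}};
};
```

The function returns false to end pagination and otherwise returns just the option that differs from the previous request.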
pagination.filter
Type: Function
Default: ({item, currentItems, allItems}) => true
Whether the item should be emitted or not.
pagination.shouldContinue
Type: Function
Default: ({item, currentItems, allItems}) => true
Whether the pagination should continue. Returning false stops it.
Note:
- This function executes only when filter returns true.
For example, if you need to stop before emitting an entry with some flag, you should use ({item}) => !item.flag.
If you want to stop after emitting the entry, you should use ({allItems}) => !allItems.some(item => item.flag) instead, with stackAllItems set to true so that allItems is populated.
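The difference between stopping before and after a flagged entry is easier to see in a small model. The sketch below is a simplified, synchronous approximation of the emit loop for a single page, not Got's actual code; `emitPage`, `Item` and the bookkeeping of `currentItems` and `allItems` are assumptions for illustration.

```typescript
type Item = {id: number; flag?: boolean};
type Callback = (data: {item: Item; currentItems: Item[]; allItems: Item[]}) => boolean;

// Simplified model: filter first, then shouldContinue, then emit.
function emitPage(items: Item[], filter: Callback, shouldContinue: Callback, stackAllItems = false): Item[] {
    const emitted: Item[] = [];
    const currentItems: Item[] = [];
    const allItems: Item[] = [];

    for (const item of items) {
        if (!filter({item, currentItems, allItems})) {
            continue; // Filtered out: shouldContinue is not consulted for this item.
        }

        if (!shouldContinue({item, currentItems, allItems})) {
            break; // Stop before emitting the current item.
        }

        emitted.push(item);
        currentItems.push(item);
        if (stackAllItems) {
            allItems.push(item); // allItems stays empty unless stacking is enabled.
        }
    }

    return emitted;
}

const sampleItems: Item[] = [{id: 1}, {id: 2, flag: true}, {id: 3}];

// Stops before the flagged entry is emitted.
const stopBefore = emitPage(sampleItems, () => true, ({item}) => !item.flag);

// Stops after the flagged entry was emitted; needs stacking so allItems is populated.
const stopAfter = emitPage(sampleItems, () => true, ({allItems}) => !allItems.some(entry => entry.flag), true);
```

In this model, `stopBefore` contains only the item with id 1, while `stopAfter` contains the items with ids 1 and 2.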
pagination.countLimit
Type: number
Default: Number.POSITIVE_INFINITY
The maximum number of items that should be emitted.
pagination.backoff
Type: number
Default: 0
Milliseconds to wait before the next request is triggered.
pagination.requestLimit
Type: number
Default: 10000
The maximum number of requests that should be triggered.
Note:
- Retries on failure are not counted towards this limit.
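How countLimit and requestLimit bound the work can be approximated with a small synchronous model. This is a sketch under simplifying assumptions (no filter, no backoff delay, pages served from an array), and `simulatePagination` is a hypothetical name, not a Got function.

```typescript
type Limits = {countLimit: number; requestLimit: number};

// Simplified model: each array in `pages` stands for the items of one response.
function simulatePagination(pages: number[][], {countLimit, requestLimit}: Limits): {emitted: number[]; requests: number} {
    const emitted: number[] = [];
    let requests = 0;

    // Stop when the request budget is used up or there are no more pages.
    while (requests < requestLimit && requests < pages.length) {
        const page = pages[requests] ?? [];
        requests++;

        for (const item of page) {
            emitted.push(item);
            // Stop as soon as enough items were emitted, even in the middle of a page.
            if (emitted.length >= countLimit) {
                return {emitted, requests};
            }
        }
    }

    return {emitted, requests};
}

const samplePages = [[1, 2, 3], [4, 5, 6], [7, 8, 9]];
const limitedByCount = simulatePagination(samplePages, {countLimit: 4, requestLimit: 10});
const limitedByRequests = simulatePagination(samplePages, {countLimit: Number.POSITIVE_INFINITY, requestLimit: 2});
```

In the first call the item limit ends the run in the middle of the second page; in the second call the request limit ends it after two pages.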
pagination.stackAllItems
Type: boolean
Default: false
Defines how allItems is managed in pagination.paginate, pagination.filter and pagination.shouldContinue.
By default, allItems is always an empty array. Setting this to true will significantly increase memory usage when working with a large dataset.
In this example we will use searchParams instead of the Link header, to show how you can customize the paginate function.
The filter function looks exactly the same as shouldContinue. The latter tells Got to stop once we reach our timestamp.
The filter function is needed as well, because a single response can contain results with different timestamps.
import got from 'got';
import Bourne from '@hapi/bourne';

// Timestamp of one week ago; older commits are out of range.
const max = Date.now() - 1000 * 86400 * 7;

const iterator = got.paginate('https://api.github.com/repos/sindresorhus/got/commits', {
    pagination: {
        paginate: ({response, currentItems}) => {
            // If there is no more data, finish.
            if (currentItems.length === 0) {
                return false;
            }

            // Get the current page number.
            const {searchParams} = response.request.options;
            const previousPage = Number(searchParams.get('page') ?? 1);

            // Update the page number by one.
            return {
                searchParams: {
                    page: previousPage + 1
                }
            };
        },
        // Using `Bourne` to prevent prototype pollution.
        transform: response => Bourne.parse(response.body),
        filter: ({item}) => {
            // Emit the commit only if it is within our time range.
            const date = new Date(item.commit.committer.date);
            return date.getTime() - max >= 0;
        },
        shouldContinue: ({item}) => {
            // Continue only while the commit is within our time range.
            const date = new Date(item.commit.committer.date);
            return date.getTime() - max >= 0;
        },
        // We want only 50 results.
        countLimit: 50,
        // Wait 1s before making another request to prevent API rate limiting.
        backoff: 1000,
        // It is a good practice to set an upper limit of how many requests can be made.
        // This way we can avoid infinite loops.
        requestLimit: 10,
        // In this case, we don't need to store all the items we receive.
        // They are processed immediately.
        stackAllItems: false
    }
});

console.log('Last 50 commits from now to a week ago:');

for await (const item of iterator) {
    console.log(item.commit.message.split('\n')[0]);
}
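Since filter and shouldContinue apply the same time check in the example above, one option is to build that predicate once and reuse it. The sketch below shows this under stated assumptions: `createRecentCommitCheck` and `CommitLike` are hypothetical names, and the reference time is passed in explicitly so the check is easy to test.

```typescript
const WEEK_IN_MS = 1000 * 86400 * 7;

// Models only the fields read by the predicate (assumption for the sketch).
type CommitLike = {commit: {committer: {date: string}}};

// Builds a predicate that is true for commits made within one week before `now`.
const createRecentCommitCheck = (now: number) => {
    const oldestAllowed = now - WEEK_IN_MS;
    return ({item}: {item: CommitLike}): boolean =>
        new Date(item.commit.committer.date).getTime() >= oldestAllowed;
};

// Fixed reference time, used here only to keep the sketch deterministic.
const referenceTime = Date.parse('2024-01-15T00:00:00Z');
const isRecent = createRecentCommitCheck(referenceTime);
```

In real use the predicate would be created with `Date.now()` and passed as both `filter: isRecent` and `shouldContinue: isRecent`.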