Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement Symbol.asyncIterator for cursors #616

Merged
merged 2 commits into from
May 1, 2020
Merged

Conversation

pluma
Copy link
Contributor

@pluma pluma commented Aug 6, 2019

Not sure if this breaks compat with older node versions.

const cursor = await db.query("FOR i IN 1..100 RETURN i");
for await (const value of cursor) {
  console.log(value);
}

@pluma pluma changed the title Implement Symbol.asyncIterator Implement Symbol.asyncIterator for cursors Aug 6, 2019
@pluma pluma changed the base branch from master to v7 May 1, 2020 15:18
@pluma pluma force-pushed the feature/for-await-of-cursor branch from 317a86f to 0f5f502 Compare May 1, 2020 15:37
@pluma pluma merged commit ef71e96 into v7 May 1, 2020
@pluma pluma deleted the feature/for-await-of-cursor branch May 8, 2020 11:13
@JonDum
Copy link

JonDum commented Jul 4, 2020

Hey, @pluma. This is a really cool addition because I've found myself really wanting ArrayCursor.each/map to support async handlers, but I think this is probably not what you want most of the time because if you use any await inside the for loop b/c it will process and await serially. Worse, if you don't use await then the execution will go right past the loop before it's done.

Example:

const {Database} = require('arangojs')
const db = new Database({url: 'http://localhost:8529'})

const sleep = (ms = 1000) => new Promise(resolve => setTimeout(resolve, ms))

const getSomething = async (value) => {
    await sleep()
    return value * value
}

const doSomething = async (value, thing) => {
    await sleep()
    console.log('doSomething: ', value, thing)
}

async function forOfTest() {
    const cursor = await db.query("FOR i IN 1..10000 RETURN i");

    for await (const value of cursor) {
        const thing = await getSomething(value)
        await doSomething(value, thing)
    }
    console.log('done')
}

forOfTest()

Output

doSomething 1 1
(2s pause)
doSomething 2 4
(2s pause)
doSomething 4 16
(2s pause)
....
(20,000s later....)
...
done

That's most likely not what the user wants. And if instead you move it to a different function like so:

const processThing = async (value) => {
	const thing = await getSomething(value)
	await doSomething(value, thing)
}

async function forOfTest() {
    const cursor = await db.query("FOR i IN 1..10000 RETURN i");

    for await (const value of cursor) {
        processThing(value)
    }

	console.log('done')
}

Output

done
(2s pause)
doSomething 1 1
doSomething 2 4
doSomething 4 16
....

It doesn't wait through the loop :(

What I've found most useful is processing the cursor batch by batch and awaiting on that. Something like this:

const {Database} = require('arangojs')
const db = new Database({url: 'http://localhost:8529'})

const sleep = (ms = 1000) => new Promise(resolve => setTimeout(resolve, ms))

const getSomething = async (value) => {
    await sleep()
    return value * value
}

const doSomething = async (value, thing) => {
    await sleep()
    console.log('doSomething: ', value, thing)
}

async function process(cursor, fn) {
	while(cursor.hasNext) {
		const batch = await cursor.nextBatch()
		if(batch) {
			await Promise.all(batch.map(fn))
		}
	}
}

async function forOfTest() {
    const cursor = await db.query("FOR i IN 1..10000 RETURN i")

	await process(cursor, async (value) => {
		const thing = await getSomething(value)
		await doSomething(value, thing)
	})
        // properly awaits, process batch by batch
	console.log('done')
}

forOfTest()

It would be really awesome if that process function could be built into ArrayCursor somehow (possibly forEach?)

@pluma
Copy link
Contributor Author

pluma commented Jul 4, 2020

I don't see how doing it batchwise rather than item by item solves the problem of the cursor going away if you're too slow. Is the idea that this might help with situations where items in the same batch could be processed in parallel?

@pluma
Copy link
Contributor Author

pluma commented Jul 6, 2020

I could imagine having something like cursor.batched return a BatchedArrayCursor that mimics the API but only provides batch-wise access instead of item-wise. This would require a fair bit of duplication but it would allow me to remove all the batch logic from ArrayCursor.

@pluma
Copy link
Contributor Author

pluma commented Jul 6, 2020

@JonDum I've implemented a BatchedArrayCursor in #673, please check if that suits your needs. It provides exactly the same API ArrayCursor does except batch-wise rather than item-wise. The cursor is exposed as cursor.batches (which links back via batchedCursor.items). This also removes cursor.nextBatch, moves cursor.hasMore to batchedCursor.hasMore and adds batchedCursor.loadAll to fetch all remaining batches from the server.

@pluma
Copy link
Contributor Author

pluma commented Jul 6, 2020

Here's how your forOfTest function would look like using for-await-of with the batch-wise API:

async function forOfTest() {
  const cursor = await db.query("FOR i IN 1..10000 RETURN i");

  for await (const batch of cursor.batches) {
    await Promise.all(
      batch.map(async (value) => {
        const thing = await getSomething(value);
        await doSomething(value, thing);
      })
    );
  }

  console.log("done");
}

@pluma pluma added this to the v7 milestone Jul 6, 2020
@JonDum
Copy link

JonDum commented Jul 6, 2020

@pluma Holy moly! That's awesome!

I think this will give devs a lot more power to iterate through large collections without running oom!

Excited to see this drop in v7

@pluma
Copy link
Contributor Author

pluma commented Jul 6, 2020

The PR has been merged into the v7 branch: https://arangodb.github.io/arangojs/devel/classes/_cursor_.batchedarraycursor.html

pluma added a commit that referenced this pull request Jul 10, 2020
* Implement Symbol.asyncIterator

* Add to CHANGELOG, add warning
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants