Joins, or a way to pull extra data from other namespaces #39

nstott · 2015-01-01T18:43:11Z

After fetching a document from a source, we need a way to resolve pieces of the document whose data might exist in other namespaces.
E.g., if we have a document from a 'posts' namespace that looks like this:

{
    title: "this is a title",
    author: ObjectId("54179ce06570544fb3892b69"),
    content: "post content"
}

then we need to be able to query another namespace on the source to turn ObjectId("54179ce06570544fb3892b69") into an appropriate object.

One way to solve this would be to add a JavaScript VM to the source and let the user run a JS function, with a JavaScript builtin or other mechanism that performs a lookup against the source.


nstott · 2015-01-01T19:10:15Z

If we agree that merging the joined data into the existing document should be handled in a JavaScript VM in the source, then we're presented with a bit of a dilemma about how to request lookups.
E.g., consider the case of Redis and Mongo.
With Redis we might want to query with
GET <key> or HGET <key> <field>
whereas with Mongo we'd want to do a db.findOne({_id: <id>}), or possibly a findOne(<bson query>).

We are forced to either provide a generic function that accepts a variety of query styles,
e.g.
Mongo

module.exports = function(doc, source) {
    doc["author"] = source.lookup({namespace: "boom.authors", query: {_id: doc.author_id}});
}

Redis

module.exports = function(doc, source) {
    doc["author"] = source.lookup({method: "HGET", key: "authors", value: doc.author_id});
}

or we provide specialized functions for each source type.
Redis

module.exports = function(doc, source) {
    doc["author"] = source.HGET("authors", doc.author_id);
}

Each of these options has drawbacks.

opinions?

mrkurt · 2015-01-02T04:24:36Z

I think you can probably get a long way with a stupid simple lookup interface right now, even just k/v lookups (find by _id in Mongo, get in Redis). Advanced queries (anything special on Redis probably counts) can come later.

And, the less actual work that happens in Javascript the more chance there is to optimize / scale this stuff later. Letting people do arbitrary queries and then run logic against them in JS seems like it's going to create a really hard-to-optimize performance bottleneck.
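
To make the k/v-only idea concrete, here is a minimal sketch. Every name in it (makeLookup, lookup, joinAuthor, the in-memory fakeStore) is hypothetical and not part of the project; the fake store merely stands in for "find by _id in Mongo" or "GET in Redis". The transform only names a namespace and a key, so the host stays free to cache or batch lookups later.

```javascript
// Hypothetical sketch of a key/value-only lookup interface.
// None of these names come from the project; they are illustrative only.

// Wrap any backend that can fetch a single record by key.
function makeLookup(fetchByKey) {
  const cache = new Map(); // host-side cache: one fetch per (namespace, key)
  return function lookup(namespace, key) {
    const cacheKey = namespace + "\u0000" + String(key);
    if (!cache.has(cacheKey)) {
      cache.set(cacheKey, fetchByKey(namespace, key));
    }
    return cache.get(cacheKey);
  };
}

// In-memory stand-in for a real source (assumption, for demonstration).
const fakeStore = {
  "boom.authors": { a1: { name: "Ada" } },
};
let fetchCount = 0; // counts how often the backend is actually hit
function fetchFromFakeStore(namespace, key) {
  fetchCount += 1;
  const table = fakeStore[namespace] || {};
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : null;
}

// The user-facing transform: no query logic, just a namespace and a key.
// Returns a copy of the document with the looked-up author attached.
function joinAuthor(doc, lookup) {
  return Object.assign({}, doc, {
    author: lookup("boom.authors", doc.author_id),
  });
}
```

Usage would look like `joinAuthor(doc, makeLookup(fetchFromFakeStore))`; a missing key yields `author: null` in this sketch rather than an error, which is a design choice and not something the thread specifies.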

andrewreedy · 2015-04-03T19:13:52Z

👍

tiengtinh · 2015-05-03T15:24:57Z

👍 This is also one feature that I'm looking forward to

shividhar · 2015-07-02T04:26:01Z

+1 Definitely a sought after feature.

allanlundhansen · 2015-08-19T05:57:59Z

+1 denormalization of data for elasticsearch should be possible

vinodtolexo · 2015-08-30T17:16:16Z

+1 from me as well. I can also contribute to the code if required.

jipperinbham · 2017-03-13T19:32:15Z

dumping what the plan is here so I don't forget when I get to this soon...

t.Source(mongodb({uri: "connection string"…})).
  Join(postgres({uri: "connection string"…}), {
    id_map: {"account_id": "id"},
    field_map: {
      "name": "flegergle",
      "slug": "account_slug"
    },
    query_ref: "accounts"}).
  Save(elasticsearch({uri: "connection string"...}))

NOTE: this is contingent on changes to the JavaScript DSL, which are currently in progress.

the general idea here is to have a new method Join(...) that takes two parameters, an adaptor and a configuration for performing the query.

in the above pipeline, the following scenario would take place:

original doc

{
  "_id": "somespecial_ID",
  "name": "fancypants",
  "type": "foo",
  "account_id": 1567
}

and when sent to the Join the following query would be executed:

SELECT name AS flegergle, slug AS account_slug FROM accounts WHERE id = 1567

which would then send the following document down the pipeline:

{
  "_id": "somespecial_ID",
  "name": "fancypants",
  "type": "foo",
  "account_id": 1567,
  "flegergle": "Super Duper",
  "account_slug": "super-duper"
}

The initial implementation of this will likely only support joins to a single table/collection.
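
As a rough illustration of the scenario above, the sketch below shows how a join configuration of that shape could be turned into a query and how the result could be merged into the document. The helpers (quoteIdent, buildJoinQuery, mergeJoined) are hypothetical and not part of the project, and the use of quoted identifiers with PostgreSQL-style `$1` placeholders is an assumption; the plan only shows the resulting SQL with the value inlined.

```javascript
// Hypothetical helpers; none of these are part of the project.

// Quote an SQL identifier PostgreSQL-style (double quotes, doubled inside).
function quoteIdent(name) {
  return '"' + String(name).replace(/"/g, '""') + '"';
}

// Turn the join configuration plus a document into a parameterized query.
// field_map: joined column -> name it gets in the outgoing document
// id_map:    document field -> column it must equal in the joined table
// query_ref: table (or collection) to query
function buildJoinQuery(config, doc) {
  const columns = Object.keys(config.field_map).map(function (col) {
    return quoteIdent(col) + " AS " + quoteIdent(config.field_map[col]);
  });
  const values = [];
  const conditions = Object.keys(config.id_map).map(function (docField) {
    values.push(doc[docField]);
    return quoteIdent(config.id_map[docField]) + " = $" + values.length;
  });
  return {
    text:
      "SELECT " + columns.join(", ") +
      " FROM " + quoteIdent(config.query_ref) +
      " WHERE " + conditions.join(" AND "),
    values: values,
  };
}

// Merge the first returned row into a copy of the document.
// A missing row leaves the document unchanged.
function mergeJoined(doc, row) {
  return Object.assign({}, doc, row || {});
}
```

Passing values as parameters instead of concatenating them into the SQL string avoids injection through document contents; whether the real implementation does so is not stated in the plan.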

nstott added the enhancement label Jan 1, 2015

jipperinbham mentioned this issue Aug 31, 2015

Denormalizing Data from Two Collections Before Importing into ElasticSearch #127

Closed

jipperinbham mentioned this issue Dec 6, 2016

load mongo object while transporting to ES #163

Closed

jipperinbham added the next label Mar 10, 2017

jipperinbham added this to the v0.3.0 milestone Mar 10, 2017

jipperinbham added ready and removed next labels Mar 16, 2017

jipperinbham removed this from the v0.3.0 milestone Mar 16, 2017

nstott closed this as completed Jun 24, 2020
