Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Way to strip out unknown properties. #1

Closed
gothammm opened this issue Dec 14, 2015 · 48 comments
Closed

Way to strip out unknown properties. #1

gothammm opened this issue Dec 14, 2015 · 48 comments

Comments

@gothammm
Copy link

Is there a way I can strip out/delete unknown properties that are not defined in the schema?

@notheotherben
Copy link
Member

Not entirely sure about what your use case here is, but let me quickly describe how Iridium associates properties - and let's see if that clears things up for you.

Iridium pulls down the document from the database, for the most part, in its original form (the exception being when you provide the fields property in your query). It then selectively exposes properties defined in the schema through its Instance type. This is done to allow V8 to optimize the object type (dynamic properties result in the V8 engine de-optimizing the code and slowing everything down). With that in mind, the Instance you have will only expose the properties defined in the schema - while its document sub-property will contain all defined properties of the object as returned by the database.

If you're looking for a way to have Iridium automatically generate a fields property based on the schema - that is currently not implemented, but can definitely be something we look at if necessary.

@notheotherben
Copy link
Member

Ah, just noticed that this was on the Skmatc repository - little things we miss late at night 😉. Skmatc isn't really a transforms framework, the idea is that its operations remain idempotent. That being said, if you want to implement a "validator" which operates on objects and removes any properties not contained within the schema then it is certainly possible.

var skmatc = require('skmatc');

module.exports = skmatc.create(function(schema) {
  // Only operate on objects
  return typeof schema === 'object';
}, function(schema, data, path) {
  var dataKeys = Object.keys(data);
  for(var i = 0; i < dataKeys.length; i++) {
    if(schema[dataKeys[i]] === undefined) {
      delete data[dataKeys[i]];
    }
  }

  // Return nothing as we aren't validating
}, {
  name: '[destructive] strip unknown properties'
})

That module would then need to be registered with your Skmatc instance and used like this:

var skmatc = require('skmatc');
var myValidator = require('./myValidator');

var s = new skmatc({
  prop1: true
});

s.register(myValidator);

var data = {
  prop1: 12,
  prop2: 'should be removed'
};

var validationResult = s.validate(data);

// validationResult.success === true
// data ~== { prop1: 12 }

If you're using it with Iridium, you would expose your validator using the validators static property on your model.

@gothammm
Copy link
Author

Wow. Thanks, this is what I was looking for. And 1 more question, so when you register your custom skmatc validator, it does play along well other default validators right? ( like it merges with the default validators).

So, thing is I want this stripping of properties to work well with Iridium, and I see Iridium picks up a default validator. So just wondering how custom validators play along with the default ones

@notheotherben
Copy link
Member

That is correct, the default validators are run first, followed by your custom validators. Only validators which return true for the first function (handles) will be executed for a schema entry, and validators are recursively applied to object-type schema entries (you'll see the path parameter indicate where you are in the tree).

You shouldn't have any issues using this with Iridium - but if you do bump into anything, please let me know so that I can address it.

@gothammm
Copy link
Author

Understood. Thanks for clarifying. Yes will do :).

And great job with Iridium :) loving it!

@notheotherben
Copy link
Member

Thank you, glad to hear you're finding it useful 😄

@gothammm
Copy link
Author

Bump.

So I've a schema in Iridium something like this,

User.schema = {
  _id: false,
  email: { $type: /^.+@.+$/, $required: true, $message: 'Invalid Email Address' },
  username: { $type: String, $required: true },
  users: { $type: [UserTwo.schema], $required: true }
};
User.validators = [CustomValidator] (the example you gave).

But, while running the validators, it only hits, the "_id" part of the schema, and doesn't proceed to the next one like the email, username etc.

Any thoughts?

@notheotherben
Copy link
Member

Can you clarify exactly what you mean by "hits"?

Perhaps it would be easier if you simply swap it out for this version which prints a number of debug lines to help track down exactly what is happening...

var skmatc = require('skmatc');

module.exports = skmatc.create(function(schema) {
  console.log('check handling (%j): %j', schema, typeof schema === 'object');
  return typeof schema === 'object';
}, function(schema, data, path) {
  console.log('validating (%j|%j)@%s', data, schema, path);
  var dataKeys = Object.keys(data);
  for(var i = 0; i < dataKeys.length; i++) {
    if(schema[dataKeys[i]] === undefined) {
      console.log('removing %s', dataKeys[i]);
      delete data[dataKeys[i]];
    }
  }

  console.log('finished validating');
}, {
  name: '[destructive] strip unknown properties'
})

@gothammm
Copy link
Author

check handling (undefined): false

This is what, it hits.

@gothammm
Copy link
Author

I also tried just running the previous example, without iridium, it does not hit the validate area at all, it just breaks out after hitting the handle function.

@notheotherben
Copy link
Member

Oh, very interesting - can you by any chance print out the values you are passing in as a schema and data? Or if not, representative ones that I can look at

@gothammm
Copy link
Author

So considering, that User schema, and the validation, this is the data I'm sending.

// Schema
User.schema = {
  _id: false,
  email: { $type: /^.+@.+$/, $required: true, $message: 'Invalid Email Address' },
  username: { $type: String, $required: true },
  users: { $type: [UserTwo.schema], $required: true }
};
// Data
{
    email: '[email protected]',
    username: 'test',
    users: [{ userX: 'test', emailX: 'test' }],
    test: 'test',
    hello: 'can you hear me?'
  }

Also, I've tried with this data, as well.

{
  prop1: 12,
  prop2: 'should be removed'
};

@notheotherben
Copy link
Member

Okay, let me put together a quick test project and see if I can repro+fix it for you 😉

@gothammm
Copy link
Author

Awesome, thanks!

@notheotherben
Copy link
Member

Ah, the reason you're having troubles with this is that skmatc will use the first validator which reports that it handles a schema type - and ignore the results of the rest. You can see that here

I guess the better solution is to allow each validator to be executed in sequence (if it reports that it can handle the schema type) and then return the aggregate result. I'll need to see what effect that has on Iridium before I release the changes, but shouldn't be the end of the world to implement 😄

@gothammm
Copy link
Author

Yeah, so the other validators don't get a chance 😄 .

So, how long for changes 😉

@gothammm
Copy link
Author

I don't think it would effect too much, if you aggregate them, and send one single result. It would be of the same type in the end either way. Right?

@gothammm
Copy link
Author

I see, the Result.compound, Here, I guess this should help 😄 right.

@notheotherben
Copy link
Member

Just glanced through, unfortunately it's a bit more complex than that (sorry, haven't made changes on this codebase in a long time, getting to grips with it all over again). The problem is that certain validator types (control for example) actually rely on other validators not being executed in order to function correctly.

I've actually been itching to rewrite Skmatc for a while, and this may just be the push I need to get started on doing so - I'll look through it tonight and see how far I get, will try to make sure you have something to work with (or at least a workaround) by the end of the week at the latest.

@gothammm
Copy link
Author

Ah, got it. 😢.

Thanks a ton for your effort. Would love to hear some update on this 😄

@gothammm
Copy link
Author

@spartan563 Should there be a new issue created for this?

@gothammm
Copy link
Author

@spartan563 can we have some options apart from the schema, like.

var s = new Skmatc({
  prop1: Number
}, {
  stripUnknown: true
});

and pass on this options to the validators, and in ObjectValidator.js, we can have something like.

// ObjectValidator.js (validate func)
function(schema, data, path, options) {
    var stripUnknownProps = _.isBoolean(options.stripUnknown) ? options.stripUnknown : false;    
    if(data === null || data === undefined) return this.fail('Expected ' + (path || 'value') + ' to be a defined non-null value but got ' + data + ' instead');

    if (stripUnknownProps) {
      _.map(data, function(val, key) {
        if (!schema[key]) {
          delete data[key];
        }
      });
    }

    var result = Result.compound(_.map(schema, function(schemaType, key) {
        var newPath = (path ? path + '.' : '') + key;
        return skmatc.validate(this.skmatc.validators, schema[key], data[key], newPath);
    }, this));

    return result.failures;
}

Just trying to have a quick work around.

What do you think?

@notheotherben
Copy link
Member

I'd probably err away from that approach simply because it mutates the original document - something which violates the implicit guarantees about how a validation tool is supposed to work. It may be better, at least for this use-case.

The way I see it, if we implement something along the lines of a $strict: true option to enforce extra-property restrictions, then it should fail validation rather than damaging the original object. Conversely, for a sanitization solution, it may be a better idea to instead provide that functionality within Iridium - I'll look at doing that as well and see whether it maybe ends up being a more time-efficient solution.

@gothammm
Copy link
Author

Agreed, sorry about the bad work around 😞

Sure, looking forward for the change 😄

@notheotherben
Copy link
Member

Not a problem at all, no such thing as a bad idea - only way to find out what works 😉
I'm currently looking at implementing a document-level transform in Iridium (works the same way as the property level ones currently implemented) and also including property and model as additional (optional) arguments in your transforms - from that you should very easily be able to satisfy your use case and it keeps it nice and generic so that it can easily be used for other stuff down the line.

The idea is that you would have something like this on your Iridium Model:

class MyModel extends Iridium.Instance<MyDocument, MyModel> {
  static transforms = {
    $document: {
      toDB: (document: any, property: string, model: Iridium.Model<any, any>) => {
        // You can access the schema through model.schema, and can return a new
        // document which includes only the schema properties.
      },
      fromDB: document => document
    }
  };
}

Any concerns/complaints/suggestions while I'm still working on it?

@gothammm
Copy link
Author

So, this transforms is at Model level, and not at instance level (like the one right now) ?

The idea, sounds good to me.

PS: I don't know how transforms work at Instance levle 😞

I've something like this right now.

User.transforms = {
  _id: {
    fromDB: (value) => {
      console.log('From DB - ', value);
      return value;
    },
    toDB: (value) => {
      console.log('To DB - ', value);
      return value;
    }
  }
};

So, how does this work? I don't see the console.log, either way sorry for diverting the topic, just noticed right now. You let me know when you are done ^^

@notheotherben
Copy link
Member

Currently transforms operate on a per-property level, i.e. your _id transform will be triggered only for the _id property, and only when it is present (i.e. not undefined). The toDB transform is triggered when you create a document as well as whenever you set schema property on an instance. Similarly, the fromDB transform is used by the Instance's getters to retrieve the value from the backing document.

So I guess the best way to describe transforms is to say that they act as a layer between how the database represents your document's properties (as defined in the schema) and how Iridium exposes them.

Something like ObjectID for example is generally used as a string, but when you store it in the database you want to use the ObjectID type instead. Transforms allow Iridium to handle this without you needing to do any significant legwork.

This would add an additional transform using the special name $document which would receive the entire backing document instead of just a single property - allowing you to manipulate the document itself.

@gothammm
Copy link
Author

Got it. But when I try to fetch data using get/find/findOne the transforms dont trigger.

I put a console.log on transforms.fromDB like mentioned above, but I dont see the logs.

@notheotherben
Copy link
Member

That's correct, transforms are lazy for reads - you'll only trigger them when you read one of the properties that a transform is applied to 😉

@gothammm
Copy link
Author

Ok I'm a lil confused. So does it trigger when I say instance.document._id ?

@notheotherben
Copy link
Member

Precisely, maybe a bit of code will help explain it better...

Behind the scenes, Iridium constructs a virtual class for your Instance (because this never changes, V8 can do optimizations on it and all of your code which uses it, helping to boost performance).

Within this class, there are a bunch of properties which behave sort of like the following:

class MyVirtualClass extends MyModel {
  get _id() {
    if(MyModel.transforms['_id'])
      return MyModel.transforms['_id'].fromDB(this.document['_id'], '_id', MyModel);
    return this.document['_id'];
  }

  set _id(value) {
    if(MyModel.transforms['_id'])
      this.document['_id'] = MyModel.transforms['_id'].toDB(value, '_id', MyModel);
    else
      this.document['_id'] = value;
  }
}

@gothammm
Copy link
Author

Hmmm, still don't get it.

So here is my transform

// Schema transforms
User.transforms = {
  _id: {
    fromDB: (document) => { 
      console.log('From DB'); return document; 
    },
    toDB: document => document
  }
}

// It has schema defined etc.

 var UserModel = new Model(Core, User);
// UserModel
UserModel.get("XXXXX").then((data) => {
    console.log(data.document._id);
 });

So, in this code, how do I trigger the transforms?

@notheotherben
Copy link
Member

Here's an example of how transforms would work.

User.transforms = {
  email: {
    fromDB: value => {
      console.log('fromDB', value);
      return value.toUpperCase();
    },
    toDB: value => {
      console.log('toDB', value);
      return value.toLowerCase();
    }
  }
};

var UserModel = new Model(Core, User);
var p = UserModel.create({
  email: '[email protected]'
});

/* ASSUMING p RESOLVES HERE */
// > toDB [email protected]

p.then(user => {
   console.log(user.email);
  // > fromDB [email protected]
  // > [email protected]

  // At this point, using user.email will run the fromDB transform and return the value
  // user.email === '[email protected]'

  // But accessing the raw document's value will return the value as stored in the DB
  // user.document.email === '[email protected]'
  // (Where user.document is the raw document from the database)
})

@gothammm
Copy link
Author

user.email gives "undefined" , the getter seems to be not working. toDB works, not fromDB

@notheotherben
Copy link
Member

Curious, getters are definitely working but could be something else causing problems.

Just a quick progress update, the implementation and tests are done - I'm currently working on the documentation for all the new functionality, soon as I've got that done I'll push a release (probably 6.5.0) and you can start fiddling.

@gothammm
Copy link
Author

Ah my bad, yes it works. I was using the ES6 classes of Node v4.* for User Instance, and there is some issue with inheriting.

@notheotherben
Copy link
Member

Good to know, wonder if I should maybe add a warning to that effect to the README...

@gothammm
Copy link
Author

Nice, thanks for the update!

Yeah, the ES6 classes are acting a bit weird, and it doesn't work well with require('util').inherits. Should stick to plain ES5 for defining Instance atleast.

@notheotherben
Copy link
Member

Oh, are you defining them like this?

class MyInstance extends Iridium.Instance {

}

or are you trying to mix-and-match?

class MyInstance {

}

require('util').inherits(MyInstance, Iridium.Instance);

@gothammm
Copy link
Author

1st one doesn't work - I get the error class cannot be invoked without 'new' keyword.
So I went with the 2nd option, but there is an issue with that as well - nodejs/node#3452

And I tried using Object.setPrototypeOf (Which is the fix that I see in the issue above) but I'm not sure how it works, it seems to completely override the scope of Instance class.

And now I fallback to the usual ES5 syntax for Instance.

@notheotherben
Copy link
Member

Oh wow, that's significantly more broken that I realized - luckily hiding behind TypeScript's transpiler and not noticing any of this.

@gothammm
Copy link
Author

Hahaha, yeah 😄

@notheotherben
Copy link
Member

No worries, should also be fixed in the next release (will be 6.5.1).

@gothammm
Copy link
Author

Awesome, Waiting for the release 😄

@notheotherben
Copy link
Member

[email protected] is out 😉

@gothammm
Copy link
Author

Fantastic! Thanks for everything 😄 !

Checking out the new readme now.

@notheotherben
Copy link
Member

Not a problem, let me know if there are any problems with it

@gothammm
Copy link
Author

Sure. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants