Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Create less IO errors to improve efficiency of require (forwardport of #8189) #8312

Closed

Conversation

errendir
Copy link

@errendir errendir commented Sep 3, 2014

In some environment (especially when coffescript is involved) module.js
will spend a lot of time looking up nonexisting files. This is caused by the
lengthy error creation. Those errors are then immediately caught.

This change makes the lookup much faster, by avoiding the unnecessary error
creation.

module::statPath now uses the new second parameter of fs.statSync function of the
fs.js. There is a similar change for the fs.stat, adding the third parameter.

This is the forwardport of #8189

@errendir errendir changed the title fs.stat third parameter controlling error creation Create less IO errors to improve efficiency of require (forwardport of https://github.com/joyent/node/pull/8189) Sep 3, 2014
@errendir errendir changed the title Create less IO errors to improve efficiency of require (forwardport of https://github.com/joyent/node/pull/8189) Create less IO errors to improve efficiency of require (forwardport of #8189) Sep 3, 2014
binding.stat(pathModule._makeLong(path), cb);
function cb(err, stats) {
if (callback) callback(err ? false : true);
if (!nullCheck(path, undefined, true)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you passing undefined? If there's a callback then the error should be passed to the callback.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. we can't break the guarantee that the error will be passed to the callback. and we can't break the guarantee that no error will propagate at all.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e. it's probably necessary to make the API fs.exists(path[, throwSafe], callback).

Also, if there is no callback on an async API call then we will throw. Basically throwSafe should be documented to only prevent the creation of an Error object in cases working with the path (e.g. ENOENT), excluding throwing when there are null characters in the path name.

@trevnorris
Copy link

@errendir Finished first round of reviews.

@@ -168,12 +168,15 @@ Only available on Mac OS X.

Synchronous lchmod(2).

## fs.stat(path, callback)
## fs.stat(path, callback, [throwSafe])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not having the callback as the last argument is atypical, I think. (EDIT: I see Trevor already commented on that.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, wouldn't it be better to have an options object instead of a boolean? Seems a little more future proof.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to the options object.

In some environment (especially when coffescript is involved) module.js
will spend a lot of time looking up nonexisting files. This is caused by the
lengthy error creation. Those errors are then immediately caught.

This change makes the lookup much faster, by avoiding the unnecessary error
creation.

module::statPath now uses the new second parameter of fs.statSync function of the
fs.js. There is a similar change for the fs.stat, adding the third parameter.

Apply suggestions from libuv pull request

Thanks saghul: joyent/libuv#1428

Fix the datastructures for async

Documentation update

Add tests and simplify Stat function

Apply the code review comments
fs.js now exports the function CreateConfiguredFSObject, which takes the
configuration hash and creates the fs object with that configuration.

This way the public interface of fs.js remains the same apart from a new
function added to it. The callback-is-last principle is preserved and the
module.js can still use the optimized stat, just like any other code. All that
needs to be done is to create the configured fs object:

var throwSafeFs = fs.CreateConfiguredFSObject({ throwSafe: true });
@errendir errendir force-pushed the feature/remove-redundant-throws-v0.12 branch 3 times, most recently from a33fa92 to b17107c Compare September 8, 2014 22:18
@errendir errendir force-pushed the feature/remove-redundant-throws-v0.12 branch from b17107c to 3e7a2bc Compare September 8, 2014 22:39
@errendir
Copy link
Author

errendir commented Sep 8, 2014

The new commits make this change compatible with the current API.

fs.js now exports the function CreateConfiguredFSObject. It takes the configuration hash (options) and creates the fs object with that configuration. The options are stored in the non-enumerable options property.

The object exported from the fs.js is still the same object as it was previously, so it's backwards compatible. The only new function is the CreateConfiguredFSObject.

The callback-is-last principle is preserved and the module.js can still use the optimized stat, just like any other code. All that needs to be done is to create the configured fs object:

var throwSafeFs = fs.CreateConfiguredFSObject({ throwSafe: true });

What do you think of doing this change this way?

There are certainly some other configuration options that can be put in the options hash. Default encoding seems like one.

backtrace.stack.substr(backtrace.name.length);
err = backtrace;
throw err;
throw err; // Forgot a callback but don't know where? Use NODE_DEBUG=fs

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node is doing this wrong. We should be synchronously throwing if no callback is passed to an asynchronous function. e.g.

try {
  fs.stat('/not/here');
} catch(e) {
  // This won't be reached, so we have an
  // uncatchable error.
}

The above demonstrates horrible API design. We know immediately if there's a callback or not, and we should let the user know that as well.

Honestly I don't care if there's some strange backwards compatibility reason for this. It needs to change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a transition mechanism. It was deemed too aggressive to make missing callbacks throw right away, see commits a804347 and 6bd8b7e.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make sure I understand correctly, in v0.8 fs.stat('/some/path') would silently swallow the error that no callback was passed? Oy.

Anyway that was back in v0.8 and now it should be fine to now throw if there's no callback in v0.12.

I can't remember the v0.8 API, but now all the fs functions have fs.*Sync() counterparts. So while there are some legacy cases where a function returns synchronously if no callback is passed, the API now has an explicit way to call all functions synchronously.

Guess what I'm getting at is we've now finished the v0.10 transition period and the API should start throwing if no callback is called in v0.12.

@trevnorris
Copy link

Placing all the functions in CreateConfiguredFSObject() isn't a permissible solution.

__VA_ARGS__, \
NULL); \
if (err < 0) { \
if (dest != NULL && \
if (req_wrap.req.throw_safe == 1) { \
args.GetReturnValue().Set(False(env->isolate())); \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

args.GetReturnValue().Set(false) is enough.

@trevnorris
Copy link

I've gathered some numbers. First, here is the patch that I've used as a control:

diff --git a/src/node_file.cc b/src/node_file.cc
index 3b287ec..9a17563 100644
--- a/src/node_file.cc
+++ b/src/node_file.cc
@@ -131,23 +131,7 @@ static void After(uv_fs_t *req) {
   Local<Value> argv[2];

   if (req->result < 0) {
-    // If the request doesn't have a path parameter set.
-    if (req->path == NULL) {
-      argv[0] = UVException(req->result, NULL, req_wrap->syscall());
-    } else if ((req->result == UV_EEXIST ||
-                req->result == UV_ENOTEMPTY ||
-                req->result == UV_EPERM) &&
-               req_wrap->dest_len() > 0) {
-      argv[0] = UVException(req->result,
-                            NULL,
-                            req_wrap->syscall(),
-                            req_wrap->dest());
-    } else {
-      argv[0] = UVException(req->result,
-                            NULL,
-                            req_wrap->syscall(),
-                            static_cast<const char*>(req->path));
-    }
+    argv[0] = Null(env->isolate());
   } else {
     // error value is empty or null for non-error.
     argv[0] = Null(env->isolate());

And here is the test:

var fs = require('fs');
var ITER = (process.env[2] >>> 0) || 1e5;

var iter = ITER;
var t = process.hrtime();

(function runner(err) {
  if (--iter > 0)
    return fs.stat('/not/here', runner);
  t = process.hrtime(t);
  console.log(((t[0] * 1e9 + t[1]) / ITER).toFixed(1) + ' ns/op');
}());

On my machine, which has an SSD, the control takes ~12us to run. On current v0.12 it runs in ~24us.

So first I'd like to say that the current level of added code complexity to save ~12us on a path that I doubt is being called more than 100 times/sec., and only in special cases, is too much.

Instead I'd start by making Local<Value> UVException() faster. By calling into JS and creating the Error object and setting additional properties, instead of doing it all in C++, I estimate 50% of error creation time can be saved. Bringing the call down to ~18us. Also it would be much easier.

@bnoordhuis
Copy link
Member

I'm not sure how reliable that benchmark is. There are two costs to an error object: creating the stack trace and creating the object. In your benchmark, the error objects never make it out of the new space while in the real world, things are probably less clear cut; an error object moves through a chain of callbacks, is probably walked several times by the garbage collector, may end up getting promoted to the old space, etc.

@errendir
Copy link
Author

@trevnorris OK, I will look into the Local<Value> UVException().

Why is wrapping the functions in a CreateConfiguredFSObject() function not permissible? Is it a style guideline? Is it a rule against too many closures? This seems to be the only way to have a configuration hash available from every function.
Using the standard class pattern with the prototype can't guarantee that the functions will not be called bound to a different object. I guess I could manually bind all the functions upon object creation, but that sounds even worse.

@trevnorris
Copy link

@bnoordhuis You're correct, but I'm not sure there'd even be a reliable way to produce those results. Especially since the time taken could vary a lot based on the current conditions of the application. So we could say the benchmark measures the least possible time to generate an error. While with the diff we can see the amount of time taken to not bother with an error at all.

@errendir May want to look at e9ce8fc as an example for calling from C++ to JS and setting object properties.

@jasnell
Copy link
Member

jasnell commented Aug 15, 2015

@errendir ... Unfortunately this seems to have stalled out. At this point, the PR will need to be updated and reopened against master on http://github.com/nodejs/node. Closing this here (which is not to say that improvements aren't needed, just that this can't land the way it is currently)

@jasnell jasnell closed this Aug 15, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants