Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port NodeJS backend into Rust. #22

Merged
merged 1 commit into from
Aug 5, 2016
Merged

Port NodeJS backend into Rust. #22

merged 1 commit into from
Aug 5, 2016

Conversation

Mark-Simulacrum
Copy link
Member

@Mark-Simulacrum Mark-Simulacrum commented Aug 3, 2016

Removes JS file and replaces backend directory with Rust port.

Fixes: #21.

Running list to keep track of major future, either in this PR or separate, work (@nrc, feel free to add):

  • Use liblog rather than println for logging.
  • Track down a good way of parsing dates passed from the client.
  • Tests.
  • Think about and possibly replace error-chain.
  • Rewrite the make_times function to return a Timings struct which will keep the total phase separate, since it is a meta-phase. This is a huge project since this affects most of the codebase; as well as causing problems with the iteration over phases code: it probably expects the total phase to exist.

@nrc nrc mentioned this pull request Aug 3, 2016
@@ -0,0 +1,15 @@
[package]
authors = ["Mark-Simulacrum <[email protected]>"]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what to put in here--let me know and I can replace it/change it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

["Mark-Simulacrum <[email protected]>", "Nicholas Cameron <[email protected]>", "The rustc-perf contributors"]

@nrc
Copy link
Member

nrc commented Aug 3, 2016

Just at a first glance, it would be nice, I think, to break out at least a couple of modules - ideally main.rs should be just the extern crates and a main function. You should probably leave the JS version there for the moment - it will be easy to remove later, but it might be interesting to run both for testing. Could you add some more comments please, particularly around the major data structures.

@Mark-Simulacrum
Copy link
Member Author

Regarding the JS version: should I leave the ES6 version (upon which this work was based, at least in part: although I compared with both along the way to make sure I wasn't missing anything) or the version in master right now?

@nrc
Copy link
Member

nrc commented Aug 3, 2016

I guess the old version, since that is still our working reference.

@Mark-Simulacrum
Copy link
Member Author

Pushed the modularization and documentation. Should be at least ready for another review pass.


// Post processing to generate the summary data.
let summary_rustc = Summary::new(&data_rustc, last_date);
let summary_benchmarks = Summary::new(&data_benchmarks, last_date);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just did a test, and both of these complete in less than a second (in debug mode). Should we move them into the server logic, rather than splitting the summary into two steps? I don't see many advantages to doing so, but wanted to leave a comment noting that here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to take a global look at what code is where later, but for now I think this is fine.

@Mark-Simulacrum
Copy link
Member Author

I've also been thinking about the median computation for the summary page, and I'm not sure that it's beneficial: can you explain to me what the intended benefits are? I feel like it doesn't help to show the true by-week results, and skews the data by potentially never showing a week (especially if the week is an outlier!). I may be misreading something though.

@nrc
Copy link
Member

nrc commented Aug 4, 2016

re the medians, it is the median of three commits at apx the same point in the week, not three weeks, so we should never miss a week. The benefit is meant to be that there is some noise in the results and if we don't take the median, we are more likely to see a difference due to noise rather than signal, hopefully using a median gets rid of some of the noise.

@Mark-Simulacrum
Copy link
Member Author

it is the median of three commits at apx the same point in the week, not three weeks

I'm not sure I follow you here: Is there a guarantee that there are >=3 commits per week? If not, then the (at least current) logic takes the previous 3 data points, not necessarily within the current week:

// For a given date we'll get the three most recent sets of data
// and take the the median for each value.
let start_idx = start_idx(data, &date);
assert!(start_idx >= 3, "Less than 3 days of data");
let mut weeks = Vec::new();
for idx in 0..3 {
    weeks.push(&data[start_idx - idx].by_crate);
}

@nrc
Copy link
Member

nrc commented Aug 4, 2016

It's not guaranteed, and at the moment we've been struggling with perf issues (ha! irony), but we were averaging three runs per day, so it would be highly unlikely to ever miss a week. We could add a guard against this in the code.

@Mark-Simulacrum
Copy link
Member Author

A guard sounds good to me. (And this can be a future work type thing, I think).

This sound good as to a specification?: Assert that all three data points we just pulled are within the current week. Drop any data points that are outside the current week (and log a warning?). Possibly log another warning when 0 data points are now left and skip the week if no data points for it were found.

@Mark-Simulacrum
Copy link
Member Author

Also, I've been looking at the rust-lang/rust#34831 issue (as a case of a pre-existing URL pointing at perf.rust-lang.org), and upon testing it with the current code, things break. I don't know of a good way to "best effort parse this date we're throwing at you" in Rust (and even in JS, truth be told). Without that, we'll either need to try and list all the cases we can think of (attempting parsing with each, failing, and moving on to the next one), or think of something else. Perhaps input sanitization? Regex-based search and recombine the input into something more readily parseable?

Let me know if you think we shouldn't care about backwards compatibility with URLs (I think we should). I'm also not 100% certain how to generate the URL in that issue, my attempts don't seem to produce the HH:MM:SS GMT+0000 (UTC) part.

Error(Msg("while parsing Mon Jun 13 2016 19:51:42 GMT+0000 (UTC)"), (Some(ParseError(TooLong)), stack backtrace:
...


/// Return where the passed date is (Ok(usize)) or should go (Err(usize)) in
/// the data slice.
fn get_insert_location(data: &[Data], date: &NaiveDateTime) -> ::std::result::Result<usize, usize> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the return type, this should probably be called find or something like that

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was tempted, but find [to me] implies returning &Item (&Data) in this case; position could work, but since this can return the position where something should go, not where it is, I wasn't too sure.

Also, do you think it makes sense to return Result<usize, usize> here? Or would returning usize and doing the if is_ok(), unwrap(), else, unwrap_err() logic in here be better? We perform that logic anyway below near as I can tell.

data_rustc.push(Data::new(date, &header, times, true));

for timing in times.members() {
let crate_name = timing["crate"].as_str().unwrap().to_string();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is going on here: .as_str().unwrap().to_string() ?

Copy link
Member Author

@Mark-Simulacrum Mark-Simulacrum Aug 4, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

timing["crate"] is a JsonValue; so to convert it to a string we need to take a Option<&str> out, unwrap that (it could be something else) and then convert it to a String.

This is one of the places where Serde might help... I'm still split on which one is better (Serde's ergonomics are terrible IMO if you don't use the Serialize/Deserialize auto-impls).

@Mark-Simulacrum
Copy link
Member Author

I'd like to add tests, but I've been struggling to come up with good isolated portions of the code which can be tested. We can definitely do some form of overall diff-like testing though I think (create a InputData with predetermined state and call the server functions to generate output). That shouldn't be too difficult as the server functions don't depend on a request being passed, just the input data and [optionally] a body.

}

if kind == "benchmarks" && body["crates"].contains("total") {
return Err(format!("bad value for crates {:?} with kind benchmarks", body["crates"]).into());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas as to why a "benchmarks" kind can't have "total" in the crates? I'm a little confused by this.

@nrc
Copy link
Member

nrc commented Aug 5, 2016

Yeah, how we test this is a big question for me. I think in the medium term we should refactor to make portions testable, one piece at a time, and add unit tests - test coverage is a big goal for me for the rewrite.

In the short-term, that is probably too much to do. Some diff testing with real data from the timing repo would be good to do, if it is not too complex.

data.len() - 1
}
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both start_idx and end_idx basically just clamp the value into the bounds of 0..data.len(); is there something in the stdlib better than what we do now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that I know about, might be worth asking on #rust though

@Mark-Simulacrum
Copy link
Member Author

I think I can do some dirty testing with real data. Unfortunately, due to a few changes in logic (the percent update is the main one), we can't compare every page. I think I'll wait on the tests until we're blocking on them for merge if I can't come up with some good way of doing them quickly and cheaply. My current slow and semi-painful idea is to Save as... each page from a run of the JS version and do the same here and then diff them; but this grows rather quickly due to the semi-large number of variants for each page: kind, group_by, etc.

@nrc
Copy link
Member

nrc commented Aug 5, 2016

It might be worth separating out any corrections into a separate commit, so we can be sure that the old version matches the new version without fixes, then add the fixes back in. Not sure if that is easy enough to be worthwhile.

@Mark-Simulacrum
Copy link
Member Author

Hmm, yeah, I'm not sure how easy that would be either. I think the only major difference should be on the summary page (and this has so far proven true from visual testing); but I may be wrong. There may also be subtle changes that are more difficult to detect/fix: for example, right now I believe the dates on the summary page are off by one compared to the dates on perf.rust-lang.org. I'm not certain why this is, although I have suspicions that it has to do with my updating the data directory on my machine while the official one seems to have stopped auto-updating.

@nrc
Copy link
Member

nrc commented Aug 5, 2016

I should have thought of this earlier, but we should move either the Rust or node implementation into a separate directory so we know what is what.

@Mark-Simulacrum
Copy link
Member Author

I see a couple of options for that, not sure which is better, but I prefer either the first or the third personally. Creating a directory for a single file feels odd to me.

  1. Moving Rust into a rs-backend directory
  2. Moving Node into a node-backend directory
  3. Moving the [single file] node perf.js to top level and rename to node-backend.js.

@nrc
Copy link
Member

nrc commented Aug 5, 2016

I like 1 or 2, so 1 seems like a plan we can both get behind :-)

let mut chain = Chain::new(router);
chain.link(State::<InputData>::both(data));

Iron::new(chain).http("0.0.0.0:2346").unwrap();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pull the ip addr into a const please?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Crate-level or module-level? I'm torn either way: server-specific but also semi-global.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably stick at the top of the crate to make it as easy as possible to change. We'll probably read it from a config file at some point, I guess.

The intent is to increase maintainability and allow for faster development.

This rewrite does not intend to change any functionality other than fixing the
percent-handling code to use the proper formula.

Fixes: #21.
@nrc nrc merged commit 085f5ff into rust-lang:master Aug 5, 2016
@nrc
Copy link
Member

nrc commented Aug 5, 2016

Awesome work, thank you!

@Mark-Simulacrum Mark-Simulacrum deleted the rust-port branch August 7, 2016 22:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants