
Multi-step measurement implementation #618

Closed
sburnicki opened this issue May 13, 2016 · 14 comments

@sburnicki
Contributor

With a multi-step implementation, we want to get multiple separated measurements by executing a single script. Currently, you can only get one result per executed script.

There is an existing fork with a multi-step implementation (https://github.com/iteratec-hh/webpagetest/tree/multistep), however, the real multi-step implementation is somehow lost in different commits and the fork is not compatible with upstream.

For a new, clean multi-step implementation, I'd like to discuss the design before actually starting on it.

How is multi-step implemented in the fork?
The desired behavior is to be backward-compatible with the current single-step version.
Each new measurement therefore starts with a setEventName command.
Commands executed between two setEventName commands are regarded as a single measurement, which works just like the current implementation.

A script with three measurements would therefore look like this:

setEventName google
navigate http://www.google.com
setEventName aol
navigate http://www.aol.com
setEventName yahoo
navigate http://www.yahoo.com

If we left out the line setEventName yahoo, the second measurement result would be the same as executing both navigates (to aol and yahoo) with the upstream WebPagetest version.

Another upside of this behavior is that it requires very few changes to the current script processing. The main changes are in saving the results, sending them to the server, displaying them, and making them accessible via the REST API.
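The splitting rule described above could be sketched like this (Python is used purely for illustration; the actual agent code is not Python, and the default step name is an assumption):

```python
def split_into_steps(script_lines):
    """Group script commands into measurement steps.

    A `setEventName <label>` line starts a new step; commands before the
    first setEventName fall into a default step (an illustrative
    assumption, not the agent's actual parser).
    """
    steps = []
    current = {"name": "step_1", "commands": []}
    for line in script_lines:
        parts = line.strip().split(None, 1)
        if not parts:
            continue
        if parts[0] == "setEventName":
            if current["commands"]:
                steps.append(current)
            current = {"name": parts[1], "commands": []}
        else:
            current["commands"].append(line.strip())
    if current["commands"]:
        steps.append(current)
    return steps

script = [
    "setEventName google",
    "navigate http://www.google.com",
    "setEventName aol",
    "navigate http://www.aol.com",
    "setEventName yahoo",
    "navigate http://www.yahoo.com",
]
print([s["name"] for s in split_into_steps(script)])
# → ['google', 'aol', 'yahoo']
```

Leaving out `setEventName yahoo` would merge the last two navigates into the `aol` step, matching the backward-compatible behavior described above.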

@pmeenan
Contributor

pmeenan commented May 13, 2016

I don't know that the script modifications are strictly necessary. Even without them, the agent code generates synthetic names and knows the step number it is currently working on. Explicit names are certainly helpful when reporting or presenting to users.

For me, the biggest requirement is for backward compatibility with both the agents and consumers of the API. The structure of the JSON/XML/HAR can't change for the case of a single run and it would be best if the multi-step case was still backward compatible (possibly returning the combined data of all steps as a sequence where a single run would normally report). There is a lot of tooling and automation that uses the existing API and it needs to continue to work.

Something like this might work:

  • The "firstView" and "repeatView" entries under runs, average, standardDeviation and median would be the sum of the metrics from all of the steps.
  • The median run would be selected based on the aggregate timings for a given run
  • In multi-step cases only:
    • The requests would be missing and would need to be pulled from the individual steps
    • The video frames would be missing and would need to be pulled from the individual steps
    • There would be a "steps" array that contained the full "firstView"/"repeatView" entries for each step
    • Each step would include the label as well as the start time relative to the first step
    • The video and request timings in a given step would be relative to the start of that one step
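As a rough sketch of that proposed shape (Python dicts standing in for the JSON; metric names like loadTime and eventName are illustrative assumptions, not the final schema):

```python
# Illustrative sketch of the proposed multi-step result shape.
# Only "firstView" and "steps" come from the discussion; the other
# field names are assumptions for illustration.
steps = [
    {"eventName": "google", "loadTime": 1200},
    {"eventName": "aol",    "loadTime": 1800},
]

first_view = {
    # Top-level metrics are the sum over all steps, as proposed above.
    "loadTime": sum(s["loadTime"] for s in steps),
    # Full per-step entries live in a "steps" array.
    "steps": steps,
}
print(first_view["loadTime"])  # → 3000
```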

As far as the agents reporting the results, the page data, object data, screen shots, traces and other files should get an additional indicator in the file name to report the step number. It should be missing for the first reported step so the files are backwards-compatible with existing servers, and incremented for each reported/recorded step (i.e. don't increment it for steps that have a logData 0 block around them).

The video files also need to be reported in such a way that they get stored into separate directories for each step and so that reporting to a legacy server does not break.
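The file-naming rule for step-indexed files might look like this (an illustrative sketch; the base name and the exact "_&lt;n&gt;" suffix format are assumptions, not the agent's actual convention):

```python
def step_filename(base, ext, reported_step):
    """Build a result file name with a step indicator.

    The first reported step keeps the legacy name so existing servers
    still understand it; later steps get a numeric suffix. The suffix
    format ("_<n>") and base name are assumptions for illustration.
    """
    if reported_step == 1:
        return f"{base}.{ext}"
    return f"{base}_{reported_step}.{ext}"

print(step_filename("1_pagedata", "txt", 1))  # → 1_pagedata.txt
print(step_filename("1_pagedata", "txt", 2))  # → 1_pagedata_2.txt
```

The "reported" step counter is the key detail: steps wrapped in logData 0 don't increment it, so a script with suppressed steps still produces legacy-named files.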

@sburnicki
Contributor Author

About backward compatibility: is API compatibility the only concern, or behavioral compatibility as well?
Both the modifications to the JSON/XML results and to the agent result reports would at least change the behavior of existing scripts that execute multiple steps (where currently only the last step is considered in the result).

This is related to my next point about median, average, standardDeviation:

The "firstView" and "repeatView" entries under runs, average, standardDeviation and median would be the sum of the metrics from all of the steps.
The median run would be selected based on the aggregate timings for a given run

Depending on the use case, the data of the individual steps would also be interesting, not only the aggregated data (we call such a sequence of steps a "journey"). So in a journey with three steps, the median for different steps might come from different runs.
Maybe both should be supported, e.g. by introducing new subelements in the corresponding sections.

For our main interest, the OpenSpeedMonitor, these values do not really matter, so this is not a personal requirement, but a consideration.

The requests would be missing and would need to be pulled from the individual steps
Wouldn't missing requests break API compatibility?

By "pull", do you mean it would be needed to refer to the data in the steps array to avoid too much duplicate data? I would agree with that.

There would be a "steps" array that contained the full "firstView"/"repeatView" entries for each step

I think it makes sense to include a steps array per run, so it should be a subelement of each run in the runs array, right?

@pmeenan
Contributor

pmeenan commented May 19, 2016

Existing multiple-step scripts with multiple measurements are fundamentally broken if you try to do it with the current agent (all steps are crammed together). Behavior for multi-step scripts with only one reported step (logData 0/1 or combinesteps) needs to be maintained though.

For median/average/stddev, if the user wants to do something fancier and calculate the median of each step, they can (and should) calculate that directly off of the data from each run and just ignore the convenience metrics. And yes, by "pull" I mean don't include request data at all in the aggregates because of the large amount of duplicate data.

Agree that the steps would be within each run. To maintain backward compatibility though it probably needs to be one layer lower in the JSON and inside of the "firstView"/"repeatView" entry within each run.

i.e.

"runs" : [
   "1" : {
        "firstView" : {
            "steps": []
       }
    }
]
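Tooling that wants per-step medians could then compute them directly from the per-run step data, as suggested above. A sketch under the assumed structure (metric names are illustrative):

```python
import statistics

# Sketch: computing a per-step median directly from per-run step data,
# ignoring the aggregated convenience metrics. The structure mirrors
# the JSON sketch above; the "loadTime" metric name is an assumption.
runs = {
    "1": {"firstView": {"steps": [{"loadTime": 1000}, {"loadTime": 2000}]}},
    "2": {"firstView": {"steps": [{"loadTime": 1400}, {"loadTime": 1600}]}},
    "3": {"firstView": {"steps": [{"loadTime": 1200}, {"loadTime": 2400}]}},
}

def step_median(runs, step_index, metric="loadTime"):
    """Median of one metric for one step across all runs."""
    values = [run["firstView"]["steps"][step_index][metric]
              for run in runs.values()]
    return statistics.median(values)

print(step_median(runs, 0))  # → 1200
print(step_median(runs, 1))  # → 2000
```

Note that the median of step 1 comes from run 3 while the median of step 2 comes from run 1, illustrating the earlier point that per-step medians may come from different runs.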

@sburnicki
Contributor Author

sburnicki commented May 19, 2016

For multistep measurements you proposed to leave out requests and video frames and include them for the individual steps. I think the same would hold for images, thumbnails, and rawData, since they'd also differ for each step.

However, I'm not convinced of completely leaving them out for multi-step measurements. Of course the current execution of multi-step scripts doesn't return meaningful data, but removing these values might cause existing 3rd party software to crash instead of getting nonsense data.

The same idea holds for general "results" like runs.1.firstView.title and the other values: for multi-step these values do not make sense, but they should be filled with some value to not break other tools (e.g. taken from the first or last measured step).

The steps array with the correct per-step results should only be added in addition.

@sburnicki
Contributor Author

sburnicki commented Jun 8, 2016

Now that I have started digging into the server code, this is my plan for the multi-step implementation:

  • XML result exports
  • JSON result exports
  • HAR results exports
  • CSV exports
  • text export
  • gzip export
  • video creation
  • frame download
  • User Interface
    • generated images
    • summary
    • details
    • performance review
    • mimetype breakdown
    • domain breakdown
    • screenshots
    • filmstrip view
    • custom waterfall
    • page images
  • NodeJS agent
  • NodeJS multistep result handling in server

@sburnicki
Contributor Author

Again about compatibility, especially the output of different runs (taking XML results as an example):

  • What kind of data should exist at the top level of the run-specific output? Results of the first run, or aggregated data?
  • If we use aggregated data, what about the data that cannot be aggregated, or doesn't make sense to aggregate?
  • If xmlResults should contain data about all requests, should it be included at the top level, as now, to stay compatible? If yes, the amount of duplicate data will be quite large. If no, the XML wouldn't be compatible.

Also, I think the data of the first step should then be duplicated:

@pmeenan
Contributor

pmeenan commented Jun 16, 2016

My preference would be that for non-multistep tests nothing changes, but for multistep tests I think it makes sense to do something "smarter".

At the top level of a multi-step test I think it would be great if we could report the combined stats for the flow (like all steps combined into a sequence) and just the stats, not raw request-level stuff.

That way any automation that can trend high-level metrics like page load time, bytes in, etc would still do something reasonable with multi-step tests.

I'm also open to just not including anything at the top level for a multi-step test and require any tooling to handle it explicitly. Given that it hasn't been supported before we don't have to worry about backward compatibility for those (just whatever we can do to make tooling easier/more consistent).

@sburnicki
Contributor Author

Okay, so existing tooling doesn't need to be able to handle result data of multistep tests in any way. That's fine for me.

What I'd like to see is that tooling which supports multistep doesn't need to differentiate between multistep and singlestep results (since singlestep tests would then just be a special case of multistep runs).

However, this would also lead to duplicate data for singlestep runs.

What we could do is to check for a parameter (like &multistep=true) which forces multistep results even for singlestep runs. This way data processing for tooling can stay consistent.
However, a solution without parameters would be even nicer.

@sburnicki
Contributor Author

For documentation purposes:
Both XML and JSON singlestep results can be forced into the multistep format by passing &multistepFormat=1 as a URL parameter to xmlResult.php or jsonResult.php, respectively.

@sburnicki
Contributor Author

With #684 the major UI implementation is finished, so multistep should be usable at least for desktop agents. An investigation of how much effort the NodeJS agent would need follows next.

@pmeenan
Contributor

pmeenan commented Aug 15, 2016

Thanks so much for all of the work you put into this. It was pretty epic and it was great to have everything in manageable chunks.

@sburnicki
Contributor Author

You're welcome. Thank you for checking and merging all this.

@zeman
Contributor

zeman commented Nov 23, 2016

I noticed that HAR exports haven't been ticked off the list above. Can you confirm that they don't yet support multistep?

@sburnicki
Contributor Author

Yes, unfortunately that's right.
