
Getting Started

tustinova edited this page Aug 31, 2016 · 2 revisions

Run the RBADT demo

To run the RBAD tool from the command line, follow these steps:

  1. Open a terminal window.
  2. Navigate to the folder in which you installed the application. If you accepted the default settings, the folder is in one of the following locations:
       Windows: C:\Program Files\RBAD
       Mac OS X: /Applications/RBAD
       Linux: /usr/RBAD
  3. Run the application using one of the following commands:
       Windows: application\RBAD
       Mac OS X: ./RBAD.app/Contents/MacOS/RBAD
       Linux: ./RBAD

Demo mode

The tool currently runs in demo mode, because no DIA is in development to test for anomalies, no anomaly-injection tool is available, and RBADT is not yet integrated with the other DICE tools. In demo mode:

  1. The DIA performance model is created from (some of the) observations in a 32-point dataset obtained by running Oryx 2, instead of running a DIA each time a datapoint is needed and collecting that datapoint from D-Mon.
  2. The versioned model repository is implemented as a readable/writable data structure (a cell array). The model of the previous deployment (version_to_compare=1) is the Oryx 2 performance model (for batch processing times) with the additional term -40*executor-cores, which is added so that comparing the two models generates a report message.
  3. The trained model is not saved to the model repository (this functionality is implemented but commented out), because the model would be the same on every run of the tool.
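
The comparison described in item 2 can be illustrated with a minimal Python sketch. It is not the tool's implementation: it assumes a model can be represented as a mapping from term name to coefficient, and that a comparison reports every term whose coefficient differs. The function name `compare_models`, the `intercept` and `executor-memory` terms, and all coefficient values are hypothetical; only the -40*executor-cores term comes from the description above.

```python
def compare_models(current, previous, tol=1e-9):
    """Return one report line per term whose coefficient differs between models."""
    report = []
    for term in sorted(set(current) | set(previous)):
        new = current.get(term, 0.0)   # a missing term counts as coefficient 0
        old = previous.get(term, 0.0)
        if abs(new - old) > tol:
            report.append(f"{term}: coefficient changed from {old:g} to {new:g}")
    return report


# Hypothetical trained model (term -> coefficient); values are placeholders.
trained_model = {"intercept": 120.0, "executor-memory": -3.5}

# Demo "previous deployment" model: same model plus the term -40*executor-cores.
previous_model = dict(trained_model)
previous_model["executor-cores"] = -40.0

# Stand-in for the versioned model repository, keyed by deployment version.
repository = {1: previous_model}

report = compare_models(trained_model, repository[1])
for line in report:
    print(line)
```

Because the two models differ only in the added term, the report contains a single line, for executor-cores. Without that term the demo comparison would find no difference and produce no message.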

Configuration files

Configuration parameters for the tool are supplied in the configuration file config_main.txt. Input data for model training are supplied in config_factors.txt, except for observations, which are obtained from the DICE monitoring platform.
Important! These files must be placed in the same directory as the installed application.

Configuration file config_main.txt contains the following parameters:

  1. Metric. The performance metric for which to train a statistical model, set by the developer. Enter it using the same syntax as in the DICE monitoring platform (D-Mon). The tool does not currently use this parameter, because it is not yet integrated with D-Mon.
  2. Budget_constr (Yes/No). Indicates whether the developer is limited in how many observation points they can obtain for the model training step. As a general rule, more datapoints yield a better performance model, but at a higher cost, because the application under test must be run more times.
  3. Budget (integer value or empty). If there is a time or money limit on obtaining training data, enter the maximum number of experiments you can afford; otherwise leave it empty.
  4. Mode (manual/automated). The tool currently supports only manual mode. In this mode, the developer provides the list of inputs for model training (e.g. application configuration parameters, hardware parameters) with their settings (low and high) in the config_factors.txt file (see the file for an example input). In automated mode, this list will be supplied by the DICE deployment tool.
  5. R2 (default 0.85). The minimum desired goodness of fit for the trained model. As a general rule, the lower this value, the fewer datapoints are needed to train the model, but the model's predictive capability also diminishes.
  6. p_value (default 0.05). The significance level required for a term (from the list of input parameters) to be included in the trained model. As with R2, the higher this value (e.g. 0.1), the fewer experiments are needed to obtain observations for model fitting. However, a higher value increases the risk of including noise parameters in the model, that is, inputs that do not influence the performance metric being modelled.
  7. Version_to_compare. The version number of the application deployment whose performance model you want to compare against the model that will be trained for your most recent deployment. The tool retrieves the corresponding performance model from the model repository. Currently there is no application under test and RBADT works in demo mode (see the Demo mode section above).
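
The constraints in the list above can be summarised as a small Python check. This is only a sketch: the syntax of config_main.txt is not described here, so the example validates values that have already been read into a dictionary rather than parsing the file. The function `validate_main_config`, the metric name, and the requirement that Budget be positive are assumptions made for illustration.

```python
def validate_main_config(cfg):
    """Return a list of problems found in already-parsed config_main values."""
    errors = []

    constrained = str(cfg.get("Budget_constr", "")).strip().lower()
    if constrained not in ("yes", "no"):
        errors.append("Budget_constr must be Yes or No")

    # Budget is only required when the budget constraint is switched on.
    budget = cfg.get("Budget")
    if constrained == "yes" and (
        not isinstance(budget, int) or isinstance(budget, bool) or budget <= 0
    ):
        errors.append("Budget must be a positive integer when Budget_constr is Yes")

    if str(cfg.get("Mode", "")).strip().lower() != "manual":
        errors.append("Mode must be manual (the only mode currently supported)")

    if not 0 < cfg.get("R2", 0.85) <= 1:        # default taken from the text
        errors.append("R2 must be in (0, 1]")

    if not 0 < cfg.get("p_value", 0.05) < 1:    # default taken from the text
        errors.append("p_value must be in (0, 1)")

    return errors


# Hypothetical values; the metric name is a placeholder, not a real D-Mon metric.
example = {
    "Metric": "batch_processing_time",
    "Budget_constr": "Yes",
    "Budget": 16,
    "Mode": "manual",
    "R2": 0.85,
    "p_value": 0.05,
    "Version_to_compare": 1,
}
print(validate_main_config(example))
```

For the example values the function returns an empty list. Setting Mode to automated, or leaving Budget empty while Budget_constr is Yes, adds the corresponding message.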

The configuration file config_factors.txt contains five Spark configuration parameters with their settings (low and high) for the open-source machine learning application Oryx 2. Oryx 2 was run with all 32 possible combinations of these settings to obtain datapoints for the batch processing time metric, simulating a Data-Intensive Application (DIA) under test for RBADT. In a software development environment, the tool will run the DIA and collect one datapoint per model training step from the monitoring platform; in the current version, it instead picks one datapoint from this 32-observation dataset. The Oryx 2 dataset serves as a demo, and this feature will be replaced with interfaces to the DICE deployment tool and the DICE monitoring platform at the integration stage.
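
The figure of 32 follows from five factors with two settings each (2^5 = 32). As a rough illustration, the Python sketch below enumerates such a two-level full factorial design. The factor names other than executor-cores and all low/high values are hypothetical placeholders, not the contents of config_factors.txt.

```python
from itertools import product

# Hypothetical factors with (low, high) settings; only "executor-cores" is
# named in this page, the remaining names and all values are placeholders.
factors = {
    "executor-cores": (1, 4),
    "executor-memory-gb": (2, 8),
    "executor-instances": (2, 6),
    "driver-memory-gb": (1, 4),
    "default-parallelism": (8, 64),
}


def full_factorial(factors):
    """Return every combination of low/high settings as a list of dicts."""
    names = list(factors)
    levels = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*levels)]


design = full_factorial(factors)
print(len(design))   # 2**5 combinations
print(design[0])     # the run with every factor at its low setting
```

Each dictionary in `design` corresponds to one experiment, that is, one run of the application with a particular combination of settings.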
