2016 01 Release Notes


## Model Catalog

Model Catalog is a service that exposes the data models on the platform in an easily accessible way. The service provides basic information about analytics tool instances, such as GUID, hostname and credentials, and lists the data models for each instance. Currently, only H2O instances are supported; the next step is to extend the service to TAP Analytics Toolkit instances. (https://github.com/trustedanalytics/model-catalog)

## Downloader behind a proxy (OpenStack)

This solution supports installing TAP inside protected environments. A new environment variable for the downloader service enables or disables HTTP proxy usage, which controls how files are downloaded from outside the protected network. (https://github.com/trustedanalytics/downloader)
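
The release notes do not name the downloader's environment variable, so the following is only a minimal sketch of how such a switch can gate proxy usage in a Python downloader. `USE_HTTP_PROXY` is a hypothetical variable name; the proxy addresses are read from the conventional `http_proxy`/`https_proxy` variables, and `proxy_settings`/`make_downloader_opener` are illustrative helpers, not part of the downloader service.

```python
import os
import urllib.request

# Hypothetical switch name; the release notes do not name the real variable.
PROXY_SWITCH = "USE_HTTP_PROXY"


def proxy_settings(env=None):
    """Return the scheme -> proxy URL mapping the downloader should use.

    Proxy addresses come from the conventional http_proxy/https_proxy
    variables, but only when the switch is set to a true-like value.
    """
    env = os.environ if env is None else env
    enabled = env.get(PROXY_SWITCH, "false").strip().lower() in ("1", "true", "yes")
    if not enabled:
        return {}
    return {
        scheme: env[var]
        for scheme, var in (("http", "http_proxy"), ("https", "https_proxy"))
        if var in env
    }


def make_downloader_opener(env=None):
    """Build a urllib opener; an empty mapping disables proxies entirely."""
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxy_settings(env)))
```

Passing an explicit `ProxyHandler` replaces urllib's default one, so with the switch off the opener ignores any `http_proxy` value in the environment. Files are then fetched with `make_downloader_opener().open(url)`.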

## RStudio, Arcadia and Hue open in separate browser tabs

To improve the flow of work in the TAP Console, all tool instances in the Data Science section, the apps listed in the Applications section, and dataset views in the Data Catalog section now open in a new browser tab.

## Hue available from outside the CF network

Hue is now deployed like any other platform application. No manual adjustment is needed to make Hue accessible for datasets in the Data Catalog section.

## Service key management in TAP Console

You can now generate or destroy credentials for service instances from the console (as defined in the Cloud Foundry documentation: https://docs.cloudfoundry.org/devguide/services/service-keys.html). This feature is available in TAP Console under Services > Instances.

## Improvements in Application Broker

Application Broker can be used to create service offerings easily, without implementing a separate broker. All you need to do is prepare a reference application and register it in the broker. The broker then spawns copies of the reference app and treats them as service instances. The offerings are also visible in the CF marketplace. (https://github.com/trustedanalytics/application-broker)

Application Broker has changed the way it generates the names of spawned applications so that they no longer collide with existing instances. Application Broker can now handle multiple same-named instances of a service, as long as they are in different spaces.
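
The notes do not describe the new naming scheme. One common way to get collision-free names is to derive them from the service instance GUID, which Cloud Foundry makes unique even when two instances in different spaces share a name. The sketch below assumes that approach; `spawned_app_name` and its name-plus-GUID-prefix format are hypothetical, not Application Broker's actual code.

```python
import re
import uuid


def spawned_app_name(instance_name: str, instance_guid: str, max_len: int = 63) -> str:
    """Derive a collision-resistant app name for a spawned service instance.

    Hypothetical scheme: the sanitized instance name plus the first 8 hex
    digits of the instance GUID. Same-named instances in different spaces
    have different GUIDs, so their app names do not collide.
    """
    base = re.sub(r"[^a-z0-9-]+", "-", instance_name.lower()).strip("-") or "instance"
    suffix = uuid.UUID(instance_guid).hex[:8]
    # Trim the base so the whole name fits in max_len (a common DNS label limit).
    base = base[: max_len - len(suffix) - 1]
    return f"{base}-{suffix}"
```

Keeping the human-readable instance name as a prefix preserves recognizability in `cf apps` output, while the GUID fragment carries the uniqueness.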

## Gearpump on TAP

Gearpump is a real-time big data streaming engine. TAP supports Gearpump version 0.7.4. The Gearpump broker lets users create a Gearpump cluster with a single click in the TAP Marketplace. A dedicated UI application is created on Cloud Foundry, and Gearpump's master and workers are deployed to YARN. Visit http://www.gearpump.io to learn more about Gearpump's features.

## Change of the default OAuth client for TAP apps

This change breaks compatibility with the previous version of TAP, and your application might require an update. The default OAuth client used by some of our apps (console, user-management) was changed from developer_console to tap_console. The reason is that developer_console was a special client in UAA whose permissions were reset every time UAA restarted. After such restarts, which are common during maintenance, users could not log in until the developer_console permissions were restored by hand.
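
If your application requests tokens from UAA itself, the update may come down to using a different `client_id`. The sketch below builds a standard OAuth2 password-grant request against UAA's `/oauth/token` endpoint with `tap_console` as the client. It assumes a password grant; the UAA URL and the empty client secret are placeholders, and `token_request` is an illustrative helper, not part of TAP.

```python
import base64
import urllib.parse
import urllib.request


def token_request(uaa_url, username, password, client_id="tap_console", client_secret=""):
    """Build (without sending) an OAuth2 password-grant token request for UAA.

    Apps that previously used client_id="developer_console" would pass
    "tap_console" instead; the client secret here is a placeholder.
    """
    body = urllib.parse.urlencode(
        {"grant_type": "password", "username": username, "password": password}
    ).encode("ascii")
    # OAuth2 client authentication: HTTP Basic with client_id:client_secret.
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        uaa_url.rstrip("/") + "/oauth/token",
        data=body,
        headers={
            "Authorization": "Basic " + creds,
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen(req)` returns a JSON body containing the `access_token`.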

## Machine Learning

- Logistic Regression Enhancements – Enhanced the Logistic Regression algorithm with a summary table and support for a frequency column for observations.
- Principal Component Analysis (PCA) via Singular Value Decomposition (SVD) – Users can train a model on a given frame by specifying the columns and the number of components needed. Users can run predict on a trained PCA model to obtain the top 'k' principal components and the total t-squared index of the frame. A scoring model is also available; it computes the t-squared index per observation (adding a new column to the frame) and provides an option to normalize the data.
- Random Forest – Implementation of Random Forest as a classifier and as a regressor, with separate train, test and predict methods for each. A scoring model is available for the Random Forest classifier.
- Collaborative Filtering – Collaborative filtering has been rewritten as a model with a train method.
- ALS – Added a recommend method to the collaborative filtering model. The implementation was migrated to MLlib with train, score, predict and recommend methods. The CGD algorithm has been deprecated in favor of ALS.
- LBP – Refactored the existing code into a Pregel core and an LBP-specific implementation, in preparation for merging with the new LP implementation.
- LDA Model – Added a predict method to the LDA model that computes word occurrences, the conditional probability of a word given a document, the conditional probability of a topic given a word, and new word count/percentage computations.
- Categorical Summary – Users can now retrieve statistical information on non-numeric data.
- Scoring Engine – Redesigned the scoring engine, simplifying deployment and the binding of models to an engine instance.
- Deprecation of Giraph – Simplified the design of TAP analytics components by keeping the Spark implementations of algorithms and removing the Giraph implementations.
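
To illustrate what a frequency column means for logistic regression: an observation with frequency f contributes f times to the log-likelihood, exactly as if the row appeared f times in the frame. The sketch below shows this with plain batch gradient descent. `fit_logistic` is a hypothetical helper, and the solver is not necessarily the one ATK uses.

```python
import math


def fit_logistic(rows, labels, freqs=None, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent.

    freqs plays the role of a frequency column: each observation's gradient
    term is multiplied by its frequency, which is equivalent to repeating
    the row that many times.
    """
    n_features = len(rows[0])
    freqs = [1.0] * len(rows) if freqs is None else freqs
    total = float(sum(freqs))  # effective number of observations
    w = [0.0] * (n_features + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y, f in zip(rows, labels, freqs):
            z = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = f * (p - y)  # frequency-weighted residual
            grad[0] += err
            for j, xj in enumerate(x, start=1):
                grad[j] += err * xj
        # Average over the effective sample size, then take a gradient step.
        w = [wi - lr * g / total for wi, g in zip(w, grad)]
    return w
```

Fitting `[[0.0], [1.0], [1.0], [2.0]]` with labels `[0, 0, 1, 1]` and frequencies `[2, 1, 1, 3]` gives the same coefficients, up to floating-point rounding, as fitting the seven expanded rows without a frequency column.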

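The PCA item can be illustrated with a short NumPy sketch: training takes the SVD of the centered (optionally normalized) data, and prediction projects observations onto the top k components and computes Hotelling's t-squared index per observation as the sum of squared scores divided by the component variances. `train_pca` and `predict_pca` are hypothetical names, not the ATK Python API. The sketch assumes that "normalize" means dividing each column by its sample standard deviation, and it does not reproduce ATK's frame-level total.

```python
import numpy as np


def train_pca(X, k, normalize=False):
    """Train PCA on an n x p matrix via SVD of the centered data."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    # Assumed meaning of "normalize": scale each column to unit sample std.
    scale = X.std(axis=0, ddof=1) if normalize else np.ones(X.shape[1])
    Z = (X - mean) / scale
    # Rows of Vt are principal directions; singular values s are sorted descending.
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:k]
    variances = s[:k] ** 2 / (X.shape[0] - 1)  # eigenvalues of the covariance matrix
    return mean, scale, components, variances


def predict_pca(model, X):
    """Project observations onto the top-k components and compute t-squared."""
    mean, scale, components, variances = model
    Z = (np.asarray(X, dtype=float) - mean) / scale
    scores = Z @ components.T  # n x k principal component scores
    t_squared = np.sum(scores ** 2 / variances, axis=1)  # one value per observation
    return scores, t_squared
```

On the training data, the t-squared values sum to k(n - 1), because each component's squared scores sum to (n - 1) times its variance. This gives a quick sanity check on an implementation.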
For more details on the currently supported machine learning algorithms, please refer to the [ATK Documentation](http://trustedanalytics.github.io/atk/).

## Data Manipulation

- Simplified TAP Analytics Authentication – The client configuration between the TAP Analytics server and client has been simplified and now requires only the ATK server URI, user name and password. Client versioning has been improved for easier deployment and troubleshooting when TAP is updated.
- Data Catalog Python Integration – Python users can now view their datasets from Python code.
- HBase Import/Export – Users can import HBase tables as frames and export frames to HBase.
- JDBC Import/Export – PostgreSQL and MySQL integration with the Python API allows data to be transferred in and out of these JDBC databases.
- Hive – Data can be imported from Hive into frames and exported from frames to Hive.
- Deprecation of Titan – Removed the integration with the Titan graph database.
- Inspect Formatting – Improved the formatting control and defaults of the inspect Python command.

## Security

- Kerberos – The Analytics Python API works with Kerberos-enabled CDH. Batch operations are fully functional. Real-time analytics (scoring) is limited to non-Kerberos CDH deployments.

## Scale and Performance

- Module Loader – A module loader installs the Analytics JARs in HDFS, significantly improving performance by reducing the number of times code needs to be copied.
- CDH 5.5.x Upgrade – Analytics runs on Cloudera Hadoop (CDH) 5.5.x, gaining all the scale and performance improvements of that release.
- Various Tunings and Bug Fixes – Features such as JOIN have been tuned in this release to scale and perform better.

## Fixed Problems & Issues

- TRACS-12 "terraform-openstack-intel/cf-install/provision.sh" error
- TRACS-14 Blocked when download JDK after console prompt user input
- TRACS-17 deploy the CDH parcel to all hosts failed on local repo with proxy
- TRACS-19 slow network connection between VMs
- TRACS-20 zookeeper service failed
- TRACS-26 Http time out
- TRACS-30 Release is invalid
- TRACS-32 references an unknown release
- TRACS-33 logsearch required
- TRACS-34 error remove apps/run from the domain while bosh deploy
- TRACS-37 error cf login
- TRACS-39 502 server error while cf create-service-broker
- TRACS-42 Can't create broker
- TRACS-43 missing login-intel package
- TRACS-46 unzip problem --- only applies to Ubuntu platform -- different with TRACS-18
- TRACS-50 Unable to start all CF apps
- TRACS-60 DNS server behavior different on our DP2 platform and standard centos
- TRACS-66/TRACS-51 make update error
- TRACS-68 The version number in project template and hadoop-utils aren't the same

## Known bugs

- TRACS-83 Problem in CloudFoundry: provisioning frequently fails during buildpack_php compilation
- TRACS-82 Transfer submission fails with 500 Internal Server Error right after deployment

## Browser Compatibility

There are several interfaces for using the Analytics Toolkit, such as Jupyter Notebook, IntelliJ, Eclipse and the Python command-line interface. Jupyter Notebook is compatible with most major browsers except Internet Explorer (IE). We recommend that users do not use IE with Jupyter Notebook.
