2016 01 Release Notes

Iman Saleh edited this page Feb 9, 2016 · 10 revisions

##Model Catalog

Model Catalog is a service that exposes data models on the platform in an easy-to-access way. The service provides basic information about each analytics tool instance, such as GUID, hostname, and credentials, and lists the data models for each instance. Currently, only H2O instances are supported. The next step is to extend the service to TAP Analytics Toolkit instances. (https://github.com/trustedanalytics/model-catalog)

##Downloader behind proxy (OpenStack)

A solution that supports installing TAP inside protected environments. An environment variable was added to the downloader service that enables or disables HTTP proxy usage, which controls how files are downloaded from outside the protected network. (https://github.com/trustedanalytics/downloader)
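
The idea of an environment-variable proxy switch can be sketched as follows. This is only an illustration in Python, not the downloader's actual code: the variable names `DOWNLOADER_USE_PROXY` and `DOWNLOADER_PROXY_URL` are hypothetical, so check the downloader repository for the real setting.

```python
import os
import urllib.request


def make_opener(env=os.environ):
    """Build a URL opener that uses an HTTP proxy only when enabled."""
    # DOWNLOADER_USE_PROXY and DOWNLOADER_PROXY_URL are hypothetical names
    # used for illustration; they are not taken from the downloader service.
    enabled = env.get("DOWNLOADER_USE_PROXY", "false").lower() in ("1", "true", "yes")
    if enabled:
        proxy_url = env["DOWNLOADER_PROXY_URL"]
        handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    else:
        # An empty mapping disables proxies, including auto-detected ones.
        handler = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(handler)


opener = make_opener({
    "DOWNLOADER_USE_PROXY": "true",
    "DOWNLOADER_PROXY_URL": "http://proxy.example:3128",
})
```

Reading the switch once at start-up keeps the rest of the download code unaware of whether a proxy is in use.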

##RStudio, Arcadia, Hue open in separate browser tabs

An adjustment for a better flow when using TAP Console. All tool instances in the Data Science section, the apps listed in the Applications section, and the dataset views in the Data Catalog section now open in a new browser tab.

##Hue available from outside CF network

Hue is now deployed like any other platform application. No manual adjustment is needed to make Hue accessible for datasets in the Data Catalog section.

##Service keys management capability in TAP Console

Credentials for service instances can now be generated or deleted from the console (as defined in the Cloud Foundry documentation: https://docs.cloudfoundry.org/devguide/services/service-keys.html). This feature is available in TAP Console in the Services > Instances view.
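
For comparison, the same operations are available from the Cloud Foundry CLI. The sketch below builds the corresponding `cf` commands in Python; the instance and key names are placeholders, and running the commands assumes a logged-in `cf` CLI on the PATH.

```python
import subprocess


def service_key_command(action, instance, key_name):
    """Return the cf CLI argument list for a service-key action."""
    commands = {
        "create": ["cf", "create-service-key", instance, key_name],
        "show": ["cf", "service-key", instance, key_name],
        # -f skips the interactive confirmation prompt.
        "delete": ["cf", "delete-service-key", instance, key_name, "-f"],
    }
    return commands[action]


def run_service_key_command(action, instance, key_name):
    """Run the command; requires a logged-in cf CLI on the PATH."""
    return subprocess.run(service_key_command(action, instance, key_name), check=True)


# "my-db" and "my-key" are placeholder names.
print(" ".join(service_key_command("create", "my-db", "my-key")))
```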

##Improvements in Application Broker

Application Broker can be used to create service offerings without implementing a separate broker. All that is required is to prepare a reference application and register it in the broker. Copies of the reference app can then be spawned and treated as service instances. Offerings are also visible in the CF marketplace. (https://github.com/trustedanalytics/application-broker)

Application Broker changed the way it generates names for spawned applications so that they no longer collide with existing instances. The application-broker can now handle multiple instances of a service that share the same name (in different spaces).

##Gearpump on TAP

Gearpump is a real-time big data streaming engine. TAP supports Gearpump version 0.7.4. The Gearpump broker lets users create a Gearpump cluster with a single mouse click in TAP Marketplace. A dedicated UI application is created on Cloud Foundry, and Gearpump’s master and workers are deployed to YARN. Visit http://www.gearpump.io to check out Gearpump’s features.

##Change of the default OAuth client for TAP apps

This change breaks compatibility with previous versions of TAP, and your application might require an update. The default OAuth client used by some of our apps (console, user-management) was changed from developer_console to tap_console. This was done because “developer_console” was a special user in UAA whose permissions kept being reset after UAA restarts. After such restarts, which are common during maintenance, users were unable to log in until the “developer_console” permissions were restored by hand.

##Machine Learning

  • Logistic Regression Enhancements – Enhanced the Logistic Regression algorithm with a summary table and support for a frequency column for observations
  • Principal Component Analysis (PCA) via Singular Value Decomposition (SVD) – Users can train a model on a given frame by specifying the columns and the number of components needed. Users can run predict on a trained PCA model to obtain the top ‘k’ principal components and the total t-squared index of the frame. It also serves as a scoring model that computes the t-squared index per observation (adding a new column to the frame for it), with an option to normalize the data
  • Random Forest – Implementation of Random Forest as a classifier and as a regressor, with separate Train, Test, and Predict methods for each. A scoring model is available for Random Forest as a classifier
  • Collaborative Filtering – Collaborative filtering rewritten as a model (train)
  • ALS – Added a recommender (recommend) for the collaborative filtering model. Migrated the implementation to MLlib with train, score, predict, and recommend methods. The CGD algorithm has been deprecated in favor of ALS
  • LBP – Refactored the existing code into a Pregel core and an LBP-specific implementation in preparation for merging with the new LP
  • LDA Model – Added a predict method on the LDA model that computes word occurrences, the conditional probability of a word given a document, the conditional probability of a topic given a word, and new-word counts and percentages
  • Categorical Summary – Lets users retrieve statistical information on non-numeric data
  • Scoring Engine – Redesigned the scoring engine, simplifying deployment and the binding of models to an engine instance
  • Deprecation of Giraph – Simplified the design of TAP analytics components by supporting the Spark implementation of algorithms and removing Giraph implementations
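
To illustrate the PCA item above, here is a minimal NumPy sketch of PCA via SVD with a per-observation t-squared index. It does not use the ATK API. The function names, the `mean_center` option (standing in for the normalization option), and the use of the standard Hotelling's T-squared definition are all assumptions, so ATK's exact computation may differ.

```python
import numpy as np


def train_pca(rows, k, mean_center=True):
    """Fit the top-k principal components of `rows` with an SVD."""
    X = np.asarray(rows, dtype=float)
    mean = X.mean(axis=0) if mean_center else np.zeros(X.shape[1])
    # Rows of Vt are the principal directions, ordered by singular value.
    _, singular_values, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:k]
    # Variance along each kept component (for mean-centered data).
    variances = singular_values[:k] ** 2 / (X.shape[0] - 1)
    return mean, components, variances


def predict_pca(rows, mean, components, variances):
    """Project rows onto the components and compute T^2 per row."""
    scores = (np.asarray(rows, dtype=float) - mean) @ components.T
    # Hotelling's T^2: squared scores scaled by component variance.
    # Assumes every kept component has non-zero variance.
    t_squared = (scores ** 2 / variances).sum(axis=1)
    return scores, t_squared


rows = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]]
mean, components, variances = train_pca(rows, k=1)
scores, t_squared = predict_pca(rows, mean, components, variances)
print(t_squared)
```

Observations that lie far from the center along a kept component get a large t-squared value, which is why the index is useful for scoring individual rows.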

For more details on currently supported machine learning algorithms, please refer to the [ATK Documentation](http://trustedanalytics.github.io/atk/).

##Data Manipulation

  • Simplified TAP Analytics Authentication – Updated the client configuration for ease of use between the TAP server and the TAP Analytics client, which now requires only the ATK server URI, user name, and password. Client versioning has been improved for easier deployment and troubleshooting when TAP is updated
  • Data Catalog Python Integration – Python users can now view their datasets using Python code
  • HBase Import/Export – Users can import HBase tables as Frames and export Frames to HBase
  • JDBC Import/Export – Postgres and MySQL integration with the Python API, allowing data to be transferred in and out of these JDBC databases
  • Hive – Data can be imported from Hive into Frames and exported from Frames to Hive
  • Deprecation of Titan – Removed the integration with Titan graph database
  • Inspect Formatting – Improved the formatting control and defaults in the Inspect Python command

##Security

  • Kerberos – The Analytics Python API works with Kerberos-enabled CDH. Batch operations are fully functional. Real-time analytics (scoring) is limited to non-Kerberos CDH deployments

##Scale and Performance

  • Module Loader – A module loader was created to install Analytics JARs in HDFS, significantly increasing performance by reducing the number of times code needs to be copied
  • CDH 5.5.x Upgrade – Analytics now runs on Cloudera Hadoop 5.5.x, gaining the scale and performance improvements of the updated release
  • Various Tunings and Bug Fixes – Features have been tuned as part of the release so that features such as JOIN scale and perform better

##Fixed Problems & Issues

  • TRACS-12 “terraform-openstack-intel/cf-install/provision.sh” error
  • TRACS-14 Blocked when download JDK after console prompt user input
  • TRACS-17 deploy the CDH parcel to all hosts failed on local repo with proxy
  • TRACS-19 slow network connection between VMs
  • TRACS-20 zookeeper service failed
  • TRACS-26 Http time out
  • TRACS-30 Release is invalid
  • TRACS-32 references an unknown release
  • TRACS-33 logsearch required
  • TRACS-34 error remove apps/run from the domain while bosh deploy
  • TRACS-37 error cf login
  • TRACS-39 502 server error while cf create-service-broker
  • TRACS-42 Can't create broker
  • TRACS-43 missing login-intel package
  • TRACS-46 unzip problem --- only applies to Ubuntu platform -- different with TRACS-18
  • TRACS-50 Unable to start all CF apps
  • TRACS-60 DNS server behavior different on our DP2 platform and standard centos
  • TRACS-66/TRACS-51 make update error
  • TRACS-68 The version number in project template and hadoop-utils aren't the same

##Known bugs

  • TRACS-83 Problem in CloudFoundry: provisioning frequently fails during buildpack_php compilation
  • TRACS-82 Transfer submission fails with 500 Internal Server Error right after deployment

##Browser Compatibility

There are a number of interfaces for using the Analytics Toolkit, such as Jupyter Notebook, IntelliJ, Eclipse, and the Python command-line interface. The Jupyter Notebook is compatible with most major browsers except Internet Explorer (IE). We recommend that users not use IE with the Jupyter Notebook.
