This is the code repository for Predictive Analytics with TensorFlow, published by Packt. It contains all the supporting project files necessary to work through the book from start to finish.
Predictive analytics is becoming a major trend worldwide, serving a wide range of industries by identifying which decisions are most likely to produce the best results. Data mining, statistics, and machine learning allow users to discover predictive intelligence by uncovering patterns and relationships in both structured and unstructured data. This book will help you build solutions that make automated decisions and, by the end, tune and build your own predictive analytics models with TensorFlow.
The book is divided into three main sections. The first section, Applied Mathematics, Statistics, and Foundations of Predictive Analytics, covers the linear algebra needed to get started with data science in a practical manner, using the most commonly used Python packages. It also covers the background in probability and information theory that is a must for data scientists.
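For a flavour of the linear-algebra operations that section works through, here is a minimal sketch using NumPy; the matrix values are illustrative only and are not taken from the book:

```python
# Minimal linear-algebra sketch with NumPy (illustrative values only)
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x = np.linalg.solve(A, b)            # solve the linear system Ax = b
eigvals, eigvecs = np.linalg.eig(A)  # eigen-decomposition of A

print("Solution x:", x)
print("Eigenvalues:", eigvals)
```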
The second section shows how to develop large-scale predictive analytics pipelines using supervised (classification and regression) and unsupervised (clustering) learning algorithms. It then demonstrates how to develop predictive models for NLP, and finally uses reinforcement learning and recommendation systems to build predictive models.
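As a small taste of a supervised-learning pipeline of the kind covered there, below is a minimal classification sketch with scikit-learn; the dataset and model choice are illustrative and not taken from the book:

```python
# Minimal supervised-learning sketch with scikit-learn (illustrative only)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```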
The third section covers practical mastery of deep learning architectures for advanced predictive analytics, including deep neural networks (MLPs and DBNs) and recurrent neural networks for high-dimensional and sequence data. Finally, it shows how to develop convolutional neural network-based predictive models for emotion recognition, image classification, and sentiment analysis.
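To illustrate the style of models covered there, here is a minimal sketch of a single-hidden-layer MLP graph in TensorFlow 1.x; the layer sizes and placeholder shapes are illustrative only:

```python
# Minimal single-hidden-layer MLP graph in TensorFlow 1.x (illustrative sizes)
import tensorflow as tf

n_features, n_hidden, n_classes = 10, 64, 2

x = tf.placeholder(tf.float32, [None, n_features])
y = tf.placeholder(tf.float32, [None, n_classes])

hidden = tf.layers.dense(x, n_hidden, activation=tf.nn.relu)  # hidden layer
logits = tf.layers.dense(hidden, n_classes)                   # output layer

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_op = tf.train.AdamOptimizer(0.001).minimize(loss)
```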
In total, this book will help you harness the power of deep learning in diverse fields, providing best practices and tips from real-world use cases to support decision making based on predictive analytics.
All of the code is organized into folders. Each folder starts with a number followed by the application name, for example, Chapter02. A sample code block in the book looks like the following:
# Import necessary packages and modules
import pandas as pd
import numpy as np
import tensorflow as tf
import os
from datetime import datetime
from sklearn.metrics import roc_auc_score as auc
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
from tensorflow.python.framework import ops
# Initialize the variables
init_op = tf.global_variables_initializer()
# Instantiate the TensorFlow session and execute the computational graph
with tf.Session() as sess:
    now = datetime.now()
    sess.run(init_op)
    # train_x and batch_size are defined earlier in the chapter's code
    total_batch = int(train_x.shape[0] / batch_size)
All the examples have been implemented in Python 2 and 3 with TensorFlow 1.2.0+. You will also need some additional software and tools. To be more specific, the following tools and libraries are required, preferably the latest versions (a quick version check is shown after the list):
- Python (2.7.x or 3.3+)
- TensorFlow (1.0.0+)
- Bazel (latest version)
- pip/pip3 (latest version for Python 2 and 3 respectively)
- matplotlib (latest version)
- pandas (latest version)
- NumPy (latest version)
- SciPy (latest version)
- sklearn (latest version)
- yahoo_finance (latest version)
- CUDA (latest version)
- CuDNN (latest version)
- Linux distributions are preferable (including Debian, Ubuntu, Fedora, RHEL, and CentOS); to be more specific, for Ubuntu it is recommended to have a 14.04 (LTS) 64-bit (or later) complete installation, or VMware Player 12 or VirtualBox. You can also run TensorFlow jobs on Windows (XP/7/8/10) or Mac OS X (10.4.7+).
- Processor: a Core i5 or Core i7 with GPU support is recommended for the best results; multicore processing will provide faster data processing and better scalability of the predictive analytics jobs. Memory: at least 8 GB RAM (recommended) for standalone mode, at least 32 GB RAM for a single VM, and more for a cluster.
- Storage: enough disk space for running heavy jobs (depending on the dataset size you will be handling), preferably at least 50 GB of free disk storage.
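Once these are installed, a quick sanity check such as the following (not taken from the book) confirms that the main Python libraries are importable and prints their versions:

```python
# Quick environment check: import the main libraries and print their versions
import tensorflow as tf
import numpy as np
import pandas as pd
import sklearn
import matplotlib

for name, mod in [("TensorFlow", tf), ("NumPy", np), ("pandas", pd),
                  ("scikit-learn", sklearn), ("matplotlib", matplotlib)]:
    print("{}: {}".format(name, mod.__version__))
```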