Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Intel OpenVINO backend #2332

Closed
wants to merge 12 commits into from
Closed

Intel OpenVINO backend #2332

wants to merge 12 commits into from

Conversation

dkurt
Copy link

@dkurt dkurt commented Dec 21, 2020

Pull Request Template

Description

Add Intel OpenVINO backend for prediction. Backend optimizes only predictions, not training.

To enable OpenVINO, add use_openvino=True when create a model:

# examples/tox21/tox21_tf_progressive.py

model = dc.models.ProgressiveMultitaskClassifier(
    len(tox21_tasks),
    n_features,
    layer_sizes=[1000],
    dropouts=[.25],
    learning_rate=0.001,
    batch_size=50,
    use_openvino=True)

Efficiency measurements (not final, working on improvement):

benchmark code
from __future__ import print_function
from __future__ import division
from __future__ import unicode_literals

import os
import shutil
import numpy as np
import argparse
import deepchem as dc
from deepchem.molnet import load_tox21

parser = argparse.ArgumentParser()
parser.add_argument('--use_openvino', action='store_true')
args = parser.parse_args()

# Only for debug!
np.random.seed(123)

# Load Tox21 dataset
n_features = 1024
tox21_tasks, tox21_datasets, transformers = load_tox21()
train_dataset, valid_dataset, test_dataset = tox21_datasets

# Fit models
metric = dc.metrics.Metric(dc.metrics.roc_auc_score, np.mean)

model = dc.models.ProgressiveMultitaskClassifier(
    len(tox21_tasks),
    n_features,
    layer_sizes=[1000],
    dropouts=[.25],
    learning_rate=0.001,
    batch_size=50,
    use_openvino=args.use_openvino)

print("Evaluating model")
import time
for i in range(3):
  start = time.time()
  train_scores = model.evaluate(train_dataset, [metric], transformers)
  print(time.time() - start)
tox21_tf_progressive use_openvino=False use_openvino=True
Intel® Core™ i7 8665UE 7.75 sec 6.03 sec (x1.28)
Intel® Xeon® Gold 6258R 1.23 sec 0.72 sec (x1.7)

Tested models:

  • ProgressiveMultitaskClassifier (tox21)

Type of change

Please check the option that is related to your PR.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
    • In this case, we recommend to discuss your modification on GitHub issues before creating the PR
  • Documentations (modification for documents)

Checklist

  • My code follows the style guidelines of this project
    • Run yapf -i <modified file> and check no errors (yapf version must be 0.22.0)
    • Run mypy -p deepchem and check no errors
    • Run flake8 <modified file> --count and check no errors
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • New unit tests pass locally with my changes
  • I have checked my code and corrected any misspellings

@dkurt dkurt marked this pull request as draft December 21, 2020 07:00
@dkurt dkurt force-pushed the openvino branch 2 times, most recently from 4703992 to 83b9c42 Compare December 21, 2020 10:02
Copy link
Member

@rbharath rbharath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! I have a few preliminary comments below. I'm not very familiar with OpenVINO so some general information about its use cases and more documentation and tests would be very helpful for us to be able to maintain this as a feature.

deepchem/models/openvino_model.py Outdated Show resolved Hide resolved
deepchem/models/openvino_model.py Outdated Show resolved Hide resolved
deepchem/models/openvino_model.py Outdated Show resolved Hide resolved
@dkurt
Copy link
Author

dkurt commented Dec 22, 2020

@rbharath, Hi! Thanks for review! I fully agree with your comments and going to resolve them soon.

@codecov-io
Copy link

codecov-io commented Dec 23, 2020

Codecov Report

Merging #2332 (8471ded) into master (5099208) will increase coverage by 0.07%.
The diff coverage is 94.97%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2332      +/-   ##
==========================================
+ Coverage   85.00%   85.08%   +0.07%     
==========================================
  Files         292      294       +2     
  Lines       26018    26191     +173     
==========================================
+ Hits        22116    22284     +168     
- Misses       3902     3907       +5     
Impacted Files Coverage Δ
deepchem/data/data_loader.py 88.56% <ø> (ø)
deepchem/data/datasets.py 86.91% <ø> (ø)
deepchem/feat/graph_data.py 79.48% <ø> (ø)
deepchem/utils/openvino_model.py 92.37% <92.37%> (ø)
deepchem/models/keras_model.py 87.14% <100.00%> (+0.28%) ⬆️
deepchem/models/tests/test_openvino.py 100.00% <100.00%> (ø)
deepchem/models/torch_models/torch_model.py 89.14% <100.00%> (+0.22%) ⬆️
deepchem/rl/tests/test_rl_reload.py 96.61% <0.00%> (+1.69%) ⬆️
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 01ea79d...8471ded. Read the comment docs.

env.cpu.yml Outdated Show resolved Hide resolved
@dkurt
Copy link
Author

dkurt commented Dec 24, 2020

Failed test:

deepchem/models/tests/test_graph_models.py::test_dtnn_regression_model FAILED [ 44%]

Not sure that it's because of the changes in this PR

@dkurt dkurt force-pushed the openvino branch 2 times, most recently from 35715ae to d5c5548 Compare December 24, 2020 18:50
@dkurt dkurt marked this pull request as ready for review December 24, 2020 19:22
Copy link
Member

@rbharath rbharath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks better! Thanks for adding in some unit tests :). I've done a more detailed review pass now

As a first comment, I think we need some more work on the docstrings to match the rest of the codebase.

A second comment is whether we should consider making OpenVINOModel a private class.

My other major comment is about maintainability. If OpenVINO is experimental, I want to make sure that we can maintain it. If you have bandwidth to commit to maintaining I think we can make this work :)

Also, tagging in @peastman @nd-02110114 who might be interested to follow along. Please feel free to chime in if you have thoughts!

deepchem/models/openvino_model.py Outdated Show resolved Hide resolved
deepchem/models/openvino_model.py Outdated Show resolved Hide resolved
deepchem/models/openvino_model.py Outdated Show resolved Hide resolved
deepchem/models/openvino_model.py Outdated Show resolved Hide resolved
deepchem/models/openvino_model.py Outdated Show resolved Hide resolved
deepchem/models/openvino_model.py Outdated Show resolved Hide resolved
deepchem/models/openvino_model.py Outdated Show resolved Hide resolved
deepchem/models/openvino_model.py Outdated Show resolved Hide resolved
env.cpu.yml Outdated Show resolved Hide resolved
deepchem/models/openvino_model.py Outdated Show resolved Hide resolved
@nissy-dev
Copy link
Member

nissy-dev commented Dec 29, 2020

My other major comment is about maintainability.

I'm also concerned about this point.
I seems that OpenVINO is an optimized engine for computer vision such as image processing like CNN. However, most of DNN models for material science rarely treat with image processing and I don't think our users will gain much benefit from it. Also, in the field of chemistry, the inference performance is rarely required as severely as in image recognition or object recognition.

On the other hand, this PR depends on your published PyPI package and it is personalized.
So, I seem it is hard and not motivated for us to maintain.

@dkurt
Copy link
Author

dkurt commented Dec 29, 2020

@nd-02110114, Thanks for feedback!

Despite that OpenVINO was initially designed for computer vision tasks, it's a universal engine and from PR's description you can see that there is a benefit even for networks without convolutions at all (x1.7 improvement for tox21_tf_progressive which consists of GEMM layers only).

On the other hand, this PR depends on your published PyPI package and it is personalized.
So, I seem it is hard and not motivated for us to maintain.

I see you point. I'll try to switch to an official package.

@nissy-dev
Copy link
Member

@dkurt I'm sorry for a late response 🙇‍♂️

Despite that OpenVINO was initially designed for computer vision tasks, it's a universal engine and from PR's description you can see that there is a benefit even for networks without convolutions at all (x1.7 improvement for tox21_tf_progressive which consists of GEMM layers only).

I understood the inference performance for tox21_tf_progressive improved. However, as I mentioned, our users like chemists rarely face the situation that requires such inference performance improvements. So, I seem that most of our users don't feel benchmark improvments as a benefit. Generally, our users require train or preprocess performance improvements rather than inference. When making an inference, the bottleneck of the performance is mainly the preprocess in my experience. If you know the good usecases of OpenVINO in the area of chemistry, I want to know.

@rbharath
Copy link
Member

rbharath commented Jan 4, 2021

One possible application that comes to mind is if a user wants to run inference against a large chemical or materials library. For example, users sometimes want to run a graph conv model against a large database of compounds like Enamine REAL (~1 billion compounds). In that case, it might be useful to have inference speedups. Could OpenVINO help for this case? If so a small benchmark might really help establish the advantage.

@dkurt Seconding @nd-02110114 that it would be great to hear about any other use cases you have in mind :).

@dkurt
Copy link
Author

dkurt commented Jan 5, 2021

@nd-02110114, @rbharath, Thanks for comments! You're absolutely right that the best benefit of using such kind of optimizations is large volume data.

For example, users sometimes want to run a graph conv model against a large database of compounds like Enamine REAL (~1 billion compounds)

May I ask to point to a model so we can do benchmarking?

@rbharath
Copy link
Member

My apologies for the slow response here! This fell behind due to the DeepChem 2.4.0 release work.

As a suggestion for a benchmark model, could you try running dc.models.GraphConvModel with the OpenVino backend to see the speed improvements? You could use any collection of molecules for this, but the ZINC15 dataset is available in MoleculeNet (and is quite large) so might be a good scale benchmark

@dkurt dkurt marked this pull request as draft April 29, 2021 06:15
@rbharath
Copy link
Member

rbharath commented Apr 5, 2023

Closing this old PR for cleanup

@rbharath rbharath closed this Apr 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants