plots: introduce support for images #6431

Merged
merged 16 commits into from
Aug 26, 2021

pared
Contributor

@pared pared commented Aug 16, 2021

For #6145

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

EDIT:

Key takeaways:

  • repo.plots no longer returns Vega JSON; it now returns the data collected from particular revisions. @efiop I think we might need to bump the minor version with this one.
  • HTML rendering is moved out of repo - it still requires repo for some logic related to templates and config, but we might be able to get rid of that too
  • image support is there
  • --show-vega remains for backward compatibility; for an image target it just won't produce any result

What we need to document:

  • images support
  • plots modify will not have an effect on image plots (not restricting it as we support dirs, which can have mixed content)
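To illustrate the mixed-content point: a plots directory can hold both images and data files, so per-plot properties can only meaningfully apply to the data files. A minimal sketch of such a split, assuming a hypothetical `split_plot_targets` helper and an illustrative set of extensions (neither is taken from this PR):

```python
from pathlib import PurePosixPath

# Assumption: illustrative set of image extensions, not DVC's actual list.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".svg"}


def split_plot_targets(paths):
    """Partition plot file paths into (images, data_files) by extension."""
    images, data_files = [], []
    for path in paths:
        suffix = PurePosixPath(path).suffix.lower()
        (images if suffix in IMAGE_EXTENSIONS else data_files).append(path)
    return images, data_files


images, data_files = split_plot_targets(
    ["plots/accuracy.png", "dvclive/loss.tsv", "plots/loss.PNG"]
)
```

Only the entries in `data_files` would be candidates for template or axis settings; the images would be rendered as they are.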

Further development:

@pared pared requested a review from a team as a code owner August 16, 2021 07:45
@pared pared requested a review from skshetry August 16, 2021 07:45
@pared pared force-pushed the 6145_separate_rendering branch 3 times, most recently from deabe35 to f51aeb0 Compare August 18, 2021 12:34
@pared pared added feature is a feature A: plots Related to the plots labels Aug 19, 2021
@pared pared force-pushed the 6145_separate_rendering branch 2 times, most recently from e150b4c to bb5d920 Compare August 19, 2021 20:57
@dberenbaum
Collaborator

Hi @pared, what's the status on this PR? Is there anything we can do to move it along while you are out?

@pared pared force-pushed the 6145_separate_rendering branch from 44afa56 to a04558c Compare August 21, 2021 13:51
@@ -49,7 +49,6 @@ def _collect_paths(
             )
         else:
             logger.warning("'%s' was not found at: '%s'.", path_info, rev)
-            continue
Contributor Author

Removing - even if we want to have a warning that some file does not exist, we need to try to open that file later, so that we get FileNotFound in results, for API consistency.
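
The idea in this comment can be sketched as follows: rather than skipping a missing path, attempt to read it and record the resulting error, so that every requested revision/path pair has an entry in the result. This is only an illustration; `collect_plot_data`, `read_file`, and the result layout are hypothetical and not the actual DVC API.

```python
def collect_plot_data(revisions, paths, read_file):
    """Return {rev: {path: {"data": ...} or {"error": exc}}}.

    `read_file(rev, path)` is a hypothetical loader that raises
    FileNotFoundError when `path` does not exist at `rev`.
    """
    results = {}
    for rev in revisions:
        results[rev] = {}
        for path in paths:
            try:
                results[rev][path] = {"data": read_file(rev, path)}
            except FileNotFoundError as exc:
                # Record the failure instead of skipping the path, so every
                # requested (rev, path) pair has an entry in the result.
                results[rev][path] = {"error": exc}
    return results


# Usage with an in-memory stand-in for a repository.
fake_repo = {("workspace", "plots/loss.png"): b"png-bytes"}


def read_file(rev, path):
    try:
        return fake_repo[(rev, path)]
    except KeyError:
        raise FileNotFoundError(path) from None


results = collect_plot_data(["workspace", "HEAD"], ["plots/loss.png"], read_file)
```

A caller can then decide per entry whether to render the data or report the error, without special-casing paths that were silently dropped.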

@pared
Contributor Author

pared commented Aug 21, 2021

@dberenbaum I believe it should be fine as-is for now. There could be some polishing in the future.

@pared pared force-pushed the 6145_separate_rendering branch from 5ba0f96 to 4ffef00 Compare August 21, 2021 14:35
@pared pared changed the title [WIP] plots: introduce support for images plots: introduce support for images Aug 21, 2021
@pared pared requested review from Suor and removed request for Suor August 21, 2021 15:02
@pared
Contributor Author

pared commented Aug 21, 2021

cc @Suor @rogermparent @mattseddon

@pared
Contributor Author

pared commented Aug 21, 2021

Sample image plots rendering:

#!/bin/bash

set -ex

src=$(pwd)

pushd $TMPDIR

rm -rf test_workspace
mkdir test_workspace

pushd test_workspace
mkdir test_repo

pushd test_repo

git init --quiet
dvc init --quiet

echo -e "
from tensorflow.keras.callbacks import Callback
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras import utils
from collections import defaultdict
from dvclive.keras import DvcLiveCallback

results = defaultdict(list) 

class MetricsCallback(Callback):
    def on_epoch_end(self, epoch: int, logs: dict = None):
        logs = logs or {}
        for metric, value in logs.items():
            results[metric].append(value)
        results['epoch'].append(epoch)


def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()

    x_train = x_train.reshape(60000, 784)
    x_test = x_test.reshape(10000, 784)
    x_train = x_train.astype('float32')
    x_test = x_test.astype('float32')
    x_train /= 255
    x_test /= 255

    classes = 10
    y_train = utils.to_categorical(y_train, classes)
    y_test = utils.to_categorical(y_test, classes)
    return (x_train, y_train), (x_test, y_test)


def get_model():
    model = Sequential()

    model.add(Dense(256, input_dim=784))
    model.add(Activation('sigmoid'))
    model.add(Dense(10, input_dim=256))
    model.add(Activation('softmax'))

    model.compile(loss='categorical_crossentropy',
    metrics=['accuracy'], optimizer='sgd')
    return model


(x_train, y_train), (x_test, y_test) = load_data()
model = get_model()

model.fit(x_train,
          y_train,
          validation_data=(x_test, y_test),
          batch_size=128,
          epochs=5,
          callbacks=[MetricsCallback(), DvcLiveCallback()])

def write_results(results, d):
    import matplotlib
    import matplotlib.pyplot as plt
    import os
    os.makedirs(d, exist_ok=True)
    x = results.pop('epoch')
    for key, y in results.items():
        plt.figure()
        plt.plot(x, y)
        plt.ylabel(key)
        plt.savefig(os.path.join(d, key + '.png'))
    

write_results(results, 'plots')
" >> train.py
dvc run -d train.py --plots plots --plots dvclive -n train python train.py
git add -A
git commit -am "init"

# BSD/macOS sed syntax; with GNU sed, drop the empty "" argument after -i.
sed -i "" "s/sgd/adam/g" train.py
dvc repro train
dvc plots diff

Contributor

@efiop efiop left a comment

This change is isolated, so let's merge and address things in follow-ups (docs too).

@efiop efiop merged commit ab08db5 into iterative:master Aug 26, 2021
@@ -15,10 +17,10 @@ def create_summary(out):

     metrics, plots = out.repo.live.show(str(out.path_info))
 
-    html_path = out.path_info.with_suffix(".html")
+    html_path = out.path_info.fspath + "_dvc_plots"
Contributor

@daavoo daavoo Aug 30, 2021

@pared What is the reason for this change?

Should we update on DVCLive that {path}.html is no longer the default html output?

Collaborator

Good catch! To support images, the image files must be stored alongside the html, so the output is now a directory instead of a single file.

I would probably opt to do something like dvclive/dvc_plots rather than append them with underscores. Any other ideas are welcome.
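
To make the directory-output point concrete, here is a rough sketch of writing a page and its images into one directory so that the page can reference them by relative path. The helper name `write_plots_dir` and the `index.html` file name are assumptions for illustration, not what the PR implements.

```python
import os
import tempfile


def write_plots_dir(out_dir, images):
    """Write each image plus an index.html that references it, under out_dir.

    `images` maps a file name (e.g. "loss.png") to raw image bytes.
    Assumption: file names are trusted; real code should escape them.
    """
    os.makedirs(out_dir, exist_ok=True)
    tags = []
    for name, data in images.items():
        with open(os.path.join(out_dir, name), "wb") as fd:
            fd.write(data)
        # The page and the image share a directory, so the bare file name
        # works as a relative src.
        tags.append(f'<img src="{name}">')
    index_path = os.path.join(out_dir, "index.html")
    with open(index_path, "w", encoding="utf-8") as fd:
        fd.write("<html><body>\n" + "\n".join(tags) + "\n</body></html>\n")
    return index_path


with tempfile.TemporaryDirectory() as tmp:
    out_dir = os.path.join(tmp, "dvclive_dvc_plots")
    index = write_plots_dir(out_dir, {"loss.png": b"\x89PNG"})
    created = sorted(os.listdir(out_dir))
```

Because everything lives under one directory, the whole output can be moved, tracked, or ignored as a unit, whichever naming scheme is chosen.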

Contributor

@daavoo daavoo Aug 30, 2021

I have no strong preference but, if we go with dvclive/dvc_plots, the .html output will now be handled like the .tsv plots (i.e. based on live's cache option).

Previously, the .html was not tracked by DVC and it was up to the user whether to track it with Git/DVC or just ignore.

Collaborator

Ah, that's probably why @pared did it this way. Maybe just dvclive_plots then?

Another option would be to reorganize dvclive outputs so that the tsv files are under a subdirectory like dvclive/logs or dvclive/tsv and the html is another subdirectory like dvclive/plots or dvclive/html.

Contributor

I kind of like dvclive/tsv & dvclive/html.

Honestly, the word plots for referring to the HTML sounds kind of confusing to me because the .tsv files are also dvc plots in a sense.

Moving the outputs to subdirectories would require some changes in DVC core but it feels like a good direction. I might consider also moving the summary .json to dvclive/summary.json.

Contributor

Maybe we can continue the discussion in iterative/dvclive#148, as this is a comment in a closed PR.

def __init__(self):
    super().__init__(
        "Plot data extraction failed. Please see "
        "https://man.dvc.org/plot for supported data formats."
Contributor

Should be /plots

Comment on lines +33 to +39
rel_img_path = relpath(img_path, page_dir_path)
with open(img_path, "wb") as fd:
    fd.write(image_data)
return """
<div>
<p>{title}</p>
<img src="{src}">
Contributor

@jorgeorpinel jorgeorpinel Sep 27, 2021

I think you can probably inject HTML code here:

>>> os.path.relpath('#"> <script...')
'#"> <script...'
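
One possible way to address this, sketched under assumptions: escape every value before interpolating it into the template. `render_image_div` below is a hypothetical stand-in for the reviewed function, not the code in this PR; it uses the standard library's `html.escape`, whose `quote=True` mode also escapes double quotes.

```python
import html
import os


def render_image_div(title, img_path, page_dir_path):
    """Return an HTML snippet for an image, escaping interpolated values."""
    rel_img_path = os.path.relpath(img_path, page_dir_path)
    return '<div>\n<p>{title}</p>\n<img src="{src}">\n</div>'.format(
        title=html.escape(title),
        # quote=True also escapes double quotes, so the value cannot
        # terminate the src attribute early.
        src=html.escape(rel_img_path, quote=True),
    )


snippet = render_image_div("loss", '#"> <script>alert(1)</script>', ".")
```

With the path from the example above, the markup ends up as inert text inside the `src` attribute instead of a new element. A real fix might also percent-encode the path so that it stays a valid URL.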

Labels
A: plots Related to the plots feature is a feature
5 participants