Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Containerized Python Components #8353

Closed
statmike opened this issue Oct 12, 2022 · 10 comments · Fixed by #8359
Closed

[bug] Containerized Python Components #8353

statmike opened this issue Oct 12, 2022 · 10 comments · Fixed by #8359
Labels

Comments

@statmike
Copy link

kfp.__version__ = 1.8.14

Trying the instructions at 2. Containerized Python components:

  • command kfp component build ... results in error: "Error: No such command 'component'."
    • KFP API SDK Reference for kfp compontent build
    • changing kfp component to kfp components seem to get pass the command issue
  • using kfp components build throw errors for *.py files that do not contain components: ERROR: No KFP components found in file ...
    • the other *.py file are modules imported by the file.py with component definitions using relative imports. This seems to be
      exactly how the example is laid out doc example
@statmike
Copy link
Author

Update

I upgraded to kfp.__version__ = 2.0.0b5 and it fixe the error of "ERROR: No such command 'component'." and it matches the KFP API SDK.

I still see the behavior of ERROR: No KFP components found in file ... for any module in the folder that does not contain a component definition. I tried using --component-filepattern my_component.py but then it fails to find my other python modules and returns errors like No module named ... for each module being imported in the my_component.py.

To check that my non component python modules work I:

  • used import statements in an interactive Python environment and tested the included functions
  • ran the entrypoint python module from a command line and observed its successful completion

@connor-mccarthy
Copy link
Member

connor-mccarthy commented Oct 12, 2022

Hi, @statmike. Can you please provide the file tree that contains my_component.py and the accompanying source code files? Can you also provide what my_component.py looks like (feel free to omit the component body -- it's probably not necessary)?

@adampilz
Copy link

Having the same issue. Here are mine for a second look:

my_component.py

from kfp import dsl

@dsl.component(
base_image='python:3.9'
, target_image='gcr.io/my-project/my-containerized_python_component:v1'
, output_component_file="component_containerized_python_component.yaml"
)
def train_model(
):
# component functionality
print("hello")

my_helper_module.py

def adding_stuff(x, y):
return x + y

@statmike
Copy link
Author

Hello @connor-mccarthy ,
This is the full example I have been setting up:

tree

  • ./src
    • init.py
    • my_component.py
    • my_helper.py
    • train.py

init.py
is empty

my_component.py

# my_component.py
from kfp.v2 import dsl
import train

@dsl.component(
    base_image = 'python:3.7',
    target_image = 'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/tips_kfp_kfp_component',
    packages_to_install = ['numpy', 'scikit-learn']
)
def train_model(
    size: int,
    metrics: dsl.Output[dsl.Metrics],
    class_metrics: dsl.Output[dsl.ClassificationMetrics]
):
    # run
    cm, auPRC = train.runner(size)
    
    # output
    metrics.log_metric('auPRC', auPRC)
    class_metrics.log_confusion_matrix(['Not Fraud', 'Fraud'], cm)

train.py

# train.py
from sklearn import metrics
import my_helper

def runner(size):
    # make data
    x, y, p = my_helper.make_dataset(size)

    # fit logistic regression
    y_pred = my_helper.fit_logistic(x, y)

    # gather metrics
    cm = metrics.confusion_matrix(y, y_pred)
    auPRC = metrics.accuracy_score(y, y_pred)
    
    return cm, auPRC

cm, auPRC = runner(100)

my_helper.py

# my_helper.py
import numpy as np
from sklearn import linear_model

# Make some data where y = 0, 1 for a range of x's - let y=1 be more likely as x increases
def make_dataset(size):
    x = np.random.randn(size)
    p = 1 / (1 + np.exp(-1 * (5 * x)))
    y = np.random.binomial(1, p, size)   
    return x, y, p

# fit logistic regression
def fit_logistic(x, y):
    logisticReg = linear_model.LogisticRegression()
    x2 = x.reshape(-1,1)
    fit = logisticReg.fit(x2, y)
    return fit.predict(x2)

@chensun
Copy link
Member

chensun commented Oct 13, 2022

@statmike The linked PR will fix the issue.

I tested that your code would work after a minor fix.

In your my_component.py, the following line

class_metrics.log_confusion_matrix(['Not Fraud', 'Fraud'], cm)

should be updated to

class_metrics.log_confusion_matrix(['Not Fraud', 'Fraud'], cm.tolist())

because cm is of type np.ndarrary which is not JSON serializable, and the method requires a list typed input here.

@statmike
Copy link
Author

Thank you @chensun for the very fast response and fix. I look forward to using this once it is merged!

google-oss-prow bot pushed a commit that referenced this issue Oct 13, 2022
…8359)

* Update component.py

* Update component.py

* Update component.py
@statmike
Copy link
Author

Hello @chensun
I installed kfp==2.0.0b6 and tried the code above with the fix to cm and it looks like the import of the my_helper.py file is not occurring in time to be used.

Running kfp component build ./{DIR}/src Returns:

Building component using KFP package path: kfp==2.0.0-beta.6
No module named 'my_helper'

@connor-mccarthy
Copy link
Member

connor-mccarthy commented Oct 19, 2022

@statmike, thank you for raising this. This is indeed a bug. After some investigation, it happens (at least) in the case when the longest path of the import graph ending at the component involves >2 modules. For example:

# component.py
from module_one import one

@dsl.component
def comp(): ...
# module_one.py
from module_two import two
one = 1
# module_two.py
two = 2

The fix isn't immediately clear, but I will add this as a bug and look into it.

@statmike
Copy link
Author

@connor-mccarthy thank you for the quick validation. Do I need to open another issue for tracking? I notice this one is closed.

@connor-mccarthy
Copy link
Member

@statmike
I have made one here: #8385

jlyaoyuli pushed a commit to jlyaoyuli/pipelines that referenced this issue Jan 5, 2023
…#8353 (kubeflow#8359)

* Update component.py

* Update component.py

* Update component.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants