Python dataset support ubuntu 19/20 #950

Open
epa095 opened this issue Apr 28, 2020 · 31 comments
Labels: ADO (Issue is documented on MSFT ADO for internal tracking), cxp, Data4ML, machine-learning/svc, product-feedback (Indicating it's product feedback), triaged

Comments

@epa095

epa095 commented Apr 28, 2020

Issue: When attempting to download a dataset on ubuntu 19.10 I get NotImplementedError: Unsupported Linux distribution ubuntu 19.10.

It seems like the problem is that the dotnetcore2 pip package actually only supports ubuntu 18. But ubuntu 20.04 is the new LTS, so it makes sense to support it (and also ubuntu 19).

Also, can we agree that it is a bit of an architecture smell when downloading some CSV files (the dataset) causes a dependency to go looking for a distro-specific tar file for a custom installation of a third dependency? I don't know what the best solution is, but this can't be it.

Related: #713

@GiftA-MSFT GiftA-MSFT self-assigned this Apr 29, 2020
@GiftA-MSFT

@epa095 we will review your feedback and get back to you shortly. Thanks.

@SturgeonMi

Hi Erik,

Were you downloading an AML Dataset from an AML workspace?
Or were you downloading the CSV file directly?
Could you provide more details about the interface you are using?
Thanks!

@epa095
Author

epa095 commented Apr 29, 2020

Hi @SturgeonMi !
I was attempting to follow along with this tutorial on my Ubuntu 19.10 machine, but I hit the above-mentioned problem when I got to the step "Download the MNIST dataset". It crashes in MNIST.get_file_dataset, because that ends up calling attempt_get_deps in the file runtime.py in the dotnetcore2 package.

My relevant versions:
dotnetcore2==2.1.13
azureml-opendatasets==1.4.0
azureml-sdk==1.4.0
azure-core==1.4.0
ubuntu 19.10
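
For anyone adding a report to this thread, a version list like the one above can be collected programmatically. This is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper name `collect_versions` is hypothetical and not part of any Azure ML package.

```python
import platform
from importlib import metadata

# Packages mentioned in this thread; adjust the list as needed.
PACKAGES = ["dotnetcore2", "azureml-opendatasets", "azureml-sdk", "azureml-core"]


def collect_versions(packages=PACKAGES):
    """Return {package: version or 'not installed'} plus basic platform info."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    report["python"] = platform.python_version()
    report["platform"] = platform.platform()
    return report


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}=={value}")
```

The Linux distribution itself is not reliably reported by `platform.platform()`, so it is still worth pasting the output of `lsb_release -a` or the contents of `/etc/os-release` alongside.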

@SturgeonMi

Thanks a lot, Erik!
I opened a bug to track this on the AzureML side.
Will get back to you about updates.

@DebFro DebFro added the Data4ML label May 1, 2020
@SturgeonMi

Hi Erik,

We fixed a related bug in the Open Datasets SDK.

Could you try the steps below?

Please ensure you are using the latest Azure Open Datasets SDK. You can install the latest SDK by running the following commands:
!pip uninstall -y azureml-opendatasets
!pip install azureml-opendatasets

Also here is the latest version of the tutorial notebook: https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/image-classification-mnist-data/img-classification-part1-training.ipynb

Thanks!

@epa095
Author

epa095 commented May 2, 2020

Hi @SturgeonMi, the latest version of azureml-opendatasets I see on pypi is 1.4.0, and as you can see from my previous comment that is the version I am already using.

@SturgeonMi

SturgeonMi commented May 4, 2020

Hi Erik, we opened a bug for the dotnetcore2 issue.
Once it's fixed, we will update here.

@GiftA-MSFT

@epa095 hope the above solution helped. I will now proceed to close this thread. Let us know if you continue to encounter issues downloading the dataset. Thanks.

@SturgeonMi

Added a feature to support v19.

@epa095
Author

epa095 commented May 8, 2020

@SturgeonMi thanks for opening an issue for me in dotnetcore2. Is there any way I can track it (i.e. is it publicly available in any way)?

@lostmygithubaccount
Contributor

reopening per new policy - is this fixed?

@v-strudm-msft v-strudm-msft added the ADO Issue is documented on MSFT ADO for internal tracking label Apr 24, 2021
@gegnew

gegnew commented Jan 26, 2022

apparently not, also having issues

@SturgeonMi

Hi @gegnew, are you still getting NotImplementedError: Unsupported Linux distribution ubuntu 19.10 when downloading a dataset on Ubuntu 19.10? Or are you getting a different error message?

@gegnew

gegnew commented Jan 27, 2022

Hi @SturgeonMi, I'm getting the errors reported in this issue, but have been totally unable to get any workaround to function. It's not precisely the same error, but afaict it's related.

@gegnew

gegnew commented Jan 27, 2022

I'm on Arch, but installing the lttng modules doesn't resolve the missing dependency in the dotnet runtime

@SturgeonMi

SturgeonMi commented Jan 28, 2022

Would you mind providing more detail about what you were doing (which command you were running) when you got "NotImplementedError: Linux distribution arch . does not have automatic support.
.NET Core 2.1 can still be used via dotnetcore2 if the required dependencies are installed.
Visit https://aka.ms/dotnet-install-linux for Linux distro specific .NET Core install instructions.
Follow your distro specific instructions to install dotnet-runtime-* and replace * with 2.1."?

@vighnesh-sablok

Hi, I am getting the same error. Details below:

When the error occurs:
When I try to load an Azure dataset locally as a pandas DataFrame.
df = azure_workspace.datasets.get(dataset_name).to_pandas_dataframe()

Error Message:
NotImplementedError: Linux distribution ubuntu 22.04 does not have automatic support.
Missing packages: {'liblttng-ust.so.0'}
.NET Core 3.1 can still be used via dotnetcore2 if the required dependencies are installed.
Visit https://aka.ms/dotnet-install-linux for Linux distro specific .NET Core install instructions.
Follow your distro specific instructions to install dotnet-runtime-* and replace * with 3.1.23.

My system details
Distributor ID: Ubuntu
Description: Ubuntu 22.04 LTS
Release: 22.04
Codename: jammy
dotnetcore2== 3.1.23
azureml-sdk==1.42.0
azureml-core==1.42.0.post1
azureml-opendatasets==1.42.0

What have I tried as a solution:
Tried installing dotnet_runtime as mentioned in the error.
Command: sudo apt-get install -y dotnet-runtime-3.1.23

result :
E: Unable to locate package dotnet-runtime-3.1.23
E: Couldn't find any package by glob 'dotnet-runtime-3.1.23'
E: Couldn't find any package by regex 'dotnet-runtime-3.1.23'

Please suggest any solutions or alternatives. Ultimately, I want to load an Azure dataset locally, whichever way is possible.

@NielsHoogeveen1990

I want to run a job on Azure ML (as a Docker container where I train my model). However, I keep getting this error when the job fails:

Traceback (most recent call last):
  File "train.py", line 5, in <module>
    train()
  File "/usr/local/lib/python3.9/site-packages/mlops_i4t/machine_learning/model_utils.py", line 56, in train
    df = dataset.to_pandas_dataframe()
  File "/usr/local/lib/python3.9/site-packages/azureml/data/_loggerfactory.py", line 132, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/azureml/data/tabular_dataset.py", line 168, in to_pandas_dataframe
    dataflow = get_dataflow_for_execution(self._dataflow, 'to_pandas_dataframe', 'TabularDataset')
  File "/usr/local/lib/python3.9/site-packages/azureml/data/_loggerfactory.py", line 132, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/azureml/data/abstract_dataset.py", line 221, in _dataflow
    dataprep().api._datastore_helper._set_auth_type(self._registration.workspace)
  File "/usr/local/lib/python3.9/site-packages/azureml/dataprep/api/_datastore_helper.py", line 177, in _set_auth_type
    get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(auth_type, json.dumps(auth_value)))
  File "/usr/local/lib/python3.9/site-packages/azureml/dataprep/api/engineapi/api.py", line 19, in get_engine_api
    _engine_api = EngineAPI()
  File "/usr/local/lib/python3.9/site-packages/azureml/dataprep/api/engineapi/api.py", line 102, in __init__
    self._message_channel = launch_engine()
  File "/usr/local/lib/python3.9/site-packages/azureml/dataprep/api/engineapi/engine.py", line 333, in launch_engine
    dependencies_path = runtime.ensure_dependencies()
  File "/usr/local/lib/python3.9/site-packages/dotnetcore2/runtime.py", line 285, in ensure_dependencies
    if not attempt_get_deps(missing_pkgs):
  File "/usr/local/lib/python3.9/site-packages/dotnetcore2/runtime.py", line 279, in attempt_get_deps
    raise NotImplementedError(err_msg + '\n' + _unsupported_help_msg)
NotImplementedError: Linux distribution debian 11. does not have automatic support. 
Missing packages: {'libcurl.so.4', 'liblttng-ust.so.0'}
.NET Core 3.1 can still be used via `dotnetcore2` if the required dependencies are installed.
Visit https://aka.ms/dotnet-install-linux for Linux distro specific .NET Core install instructions.
Follow your distro specific instructions to install `dotnet-runtime-*` and replace `*` with `3.1.23`.

I am lost...what can I do to solve this?
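
The "Missing packages" line in the message above names shared libraries (`.so` files), not apt or pip packages. One way to check which of them the dynamic loader can actually find on a given machine or container is to try loading each with `ctypes`. This is a diagnostic sketch only, not what `dotnetcore2` does internally; the function name `find_missing_libraries` is hypothetical.

```python
import ctypes


def find_missing_libraries(names):
    """Return the subset of shared-library names the dynamic loader cannot load."""
    missing = []
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the loader cannot open the library
        except OSError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Library names copied from the error message in this thread.
    print(find_missing_libraries(["libcurl.so.4", "liblttng-ust.so.0"]))
```

An empty list only means the loader found every library by that exact name. It does not by itself confirm that the bundled .NET runtime will start.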

@kato-m

kato-m commented Jan 11, 2023

Same issue here, with the same stack trace as @NielsHoogeveen1990. It can be reproduced with the latest Ubuntu 22.04 MS Runner Image: https://github.com/actions/runner-images

@corticalstack

Experiencing same issue trying to consume a data asset registered in my AML workspace. Anyone able to resolve the "not supported... .NET Core" issue? Thanks

@SturgeonMi

Hi, debian 11 is not supported automatically. Could you try to install your Linux distro specific .NET Core based on guidance here https://learn.microsoft.com/en-us/dotnet/core/install/linux?
Follow your distro specific instructions to install dotnet-runtime-* and replace * with 3.1.23.
Thanks!

@corticalstack

> Hi, debian 11 is not supported automatically. Could you try to install your Linux distro specific .NET Core based on guidance here https://learn.microsoft.com/en-us/dotnet/core/install/linux? Follow your distro specific instructions to install dotnet-runtime-* and replace * with 3.1.23. Thanks!

I have followed the instructions you recommended, and I get the same result as reported by @vighnesh-sablok: "Unable to locate package dotnet-runtime-3.1.23"

@corticalstack

@SturgeonMi an easy way to replicate a test environment that produces this error is to set up a devcontainer within VS Code. Could you try following the dotnet installation instructions for Linux there? I have not been able to get them working. Thank you!

Example devcontainer.json

{
    "name": "my-aml-devcontainer",
    "build": {
        "dockerfile": "Dockerfile"
    }
}

Example Dockerfile

FROM mcr.microsoft.com/vscode/devcontainers/base:ubuntu-22.04

# Install packages from standard package manager
RUN apt-get update -qq && export DEBIAN_FRONTEND=noninteractive && \
    apt-get install -y --no-install-recommends \
        software-properties-common \
        apt-transport-https \
        wget \
        curl \
        tar \
        zip \
        unzip \
        sudo \
        apt-utils \
        file \
        git \
        python3 \
        python3-pip \
        python3-setuptools \
        nano

# Python packages
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt

# Install Azure CLI and extensions
RUN curl -sL https://aka.ms/InstallAzureCLIDeb | bash \
    && az extension add -n ml -y

# Cleanup cached apt data
RUN apt-get autoremove -y && apt-get clean && \
    rm -rf /var/lib/apt/lists/*

CMD ["/bin/bash"]

Your requirements.txt would list the Python packages, including azureml-core.

Then a simple AML Python SDK v1 script

from azureml.core import Workspace, Dataset, Experiment, Model
import pandas as pd
import numpy as np
workspace = Workspace.from_config()
dataset_name = 'your dataset name here'
ds = Dataset.get_by_name(workspace=workspace, name=dataset_name)

@ghost

ghost commented Mar 29, 2023

@SturgeonMi @corticalstack I'm facing the same issue. Is there any update?

@edgBR

edgBR commented May 30, 2023

Hi,

Same issue here. I was using an Ubuntu 20.04 image with SDK 1.48 and it was working, but after bumping to 22.04 it no longer works.

My base image is:

mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04

@microsoft-sampsa

Same issue here - all my azure ml cluster runs blow up because of this, when trying to use this as the base docker image of my environment:

https://github.com/Azure/AzureML-Containers/tree/master/base/gpu/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04

So: microsoft provided docker images won't work in microsoft azure ml clusters using microsoft azure ml APIs --> a major incompatibility within microsoft products.

@henrydleao

Any news on this? I am having the same dotnet error in this Ubuntu version when trying to use the lib "azureml-dataset-runtime".

@SturgeonMi

Ubuntu versions 14, 16, 18, and 20 are supported by "azureml-dataset-runtime". The package has a dependency on dotnetcore, and that is what imposes the restriction. We will publish a version 5.0.0 without the dotnetcore dependency in the coming weeks, and that should resolve this issue.
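
To see how a given image compares with the list of supported versions in the comment above, the distribution can be read from `/etc/os-release`. This is a rough sketch, assuming the supported set is exactly the Ubuntu major versions named in that comment; `parse_os_release` and `is_listed_ubuntu` are hypothetical helpers and do not reflect how `dotnetcore2` performs its own check.

```python
# Ubuntu major versions named as supported in this thread (an assumption
# taken from the comment above, not from the package source).
LISTED_UBUNTU_MAJORS = {"14", "16", "18", "20"}


def parse_os_release(text):
    """Parse the KEY=value lines of an os-release file into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        info[key] = raw.strip().strip("\"'")  # drop optional surrounding quotes
    return info


def is_listed_ubuntu(info):
    """True if ID is ubuntu and the major version is in the listed set."""
    major = info.get("VERSION_ID", "").split(".")[0]
    return info.get("ID") == "ubuntu" and major in LISTED_UBUNTU_MAJORS


if __name__ == "__main__":
    with open("/etc/os-release") as f:
        info = parse_os_release(f.read())
    print(info.get("ID"), info.get("VERSION_ID"), is_listed_ubuntu(info))
```

Running this inside the container image rather than on the host matters, since the base image determines which distribution the runtime sees.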

@jmwoloso

What about support for Ubuntu 22, @SturgeonMi?

@SturgeonMi

We plan to publish a newer package version without dotnetcore dependency in the coming weeks. This should resolve the "Unsupported Linux distribution ubuntu" issue. @anliakho2 can provide more details here.

@zso3n3n

zso3n3n commented Dec 15, 2023

Hello @SturgeonMi , any update on publishing the newer package?
