dataset.download() Unsupported Linux distribution #1003

Closed
Radeju opened this issue Jun 10, 2020 · 19 comments

Comments

@Radeju

Radeju commented Jun 10, 2020

I am trying to download an AzureML dataset on Ubuntu 20.04 using the azureml.core library. However, when I run it, I get the following error:

  File "/home/bartek/.local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 169, in attemp_get_deps
    blob_deps_to_file()
  File "/home/bartek/.local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 161, in blob_deps_to_file
    blob = request.urlopen(deps_url, context=ssl_context)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 531, in open
    response = meth(req, response)
  File "/usr/lib/python3.8/urllib/request.py", line 640, in http_response
    response = self.parent.error(
  File "/usr/lib/python3.8/urllib/request.py", line 569, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 649, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "setup/get_datasets.py", line 27, in <module>
    dataset.download(target_path=f'{path}/../.datasets/{dataset_name}', overwrite=True)
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 106, in wrapper
    return func(*args, **kwargs)
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/file_dataset.py", line 123, in download
    for p in self._to_path(activity='download.to_path')]
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/file_dataset.py", line 98, in _to_path
    dataflow, portable_path = _add_portable_path_column(self._dataflow)
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/_loggerfactory.py", line 106, in wrapper
    return func(*args, **kwargs)
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/data/abstract_dataset.py", line 203, in _dataflow
    dataprep().api._datastore_helper._set_auth_type(self._registration.workspace)
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py", line 136, in _set_auth_type
    get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(AuthType.DERIVED, json.dumps(auth)))
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py", line 18, in get_engine_api
    _engine_api = EngineAPI()
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py", line 55, in __init__
    self._message_channel = launch_engine()
  File "/home/bartek/.local/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/engine.py", line 300, in launch_engine
    dependencies_path = runtime.ensure_dependencies()
  File "/home/bartek/.local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 181, in ensure_dependencies
    if not attemp_get_deps():
  File "/home/bartek/.local/lib/python3.8/site-packages/dotnetcore2/runtime.py", line 175, in attemp_get_deps
    raise NotImplementedError('Unsupported Linux distribution {0} {1}.{2}'.format(dist, version[0], version[1]))
NotImplementedError: Unsupported Linux distribution ubuntu 20.04
The terminal process terminated with exit code: 1

Are you planning to support Ubuntu 20.04? Is there a roadmap? I found this issue from 6 months ago and would really appreciate hearing whether anything has changed since then.

Right now I am using the workaround from here to make it work.

Warm regards

@YutongTie-MSFT

@Radeju
Thanks for the feedback! We are currently investigating and will update you shortly.

@MayMSFT
Contributor

MayMSFT commented Jun 11, 2020

Thanks for the feedback. We have recorded it and added it as a feature request on our roadmap.

@DebFro DebFro added the Data4ML label Jun 12, 2020
@YutongTie-MSFT

@MayMSFT Hi May, are we good to close this or you want me to keep it open? Thanks.

@xkszltl

xkszltl commented Jun 17, 2020

Hi! I'm planning to switch our pipeline from 18.04 to 20.04 soon as well.
It looks like this may be a blocking issue.
Do we have a timeline for the fix?

Based on the log, it seems the distro version is checked against a whitelist. IMHO this is a bad design that could affect a lot of less popular distros such as Arch or Mint.
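To illustrate the concern, here is a minimal sketch of an exact-match distro whitelist, assuming the check keys on the ID and VERSION_ID fields of /etc/os-release (the pip output later in this thread shows dotnetcore2 depends on the distro package, which reads that file). The SUPPORTED set and both function names are hypothetical, not dotnetcore2's actual code.

```python
# Hypothetical sketch: NOT dotnetcore2's actual implementation.

def parse_os_release(text):
    """Return (ID, VERSION_ID) from the contents of an os-release file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        fields[key] = value.strip('"\'')  # values may be single- or double-quoted
    return fields.get("ID", ""), fields.get("VERSION_ID", "")

# Hypothetical whitelist of (distro ID, version) pairs with prebuilt dependency sets.
SUPPORTED = {("ubuntu", "16.04"), ("ubuntu", "18.04"), ("debian", "9")}

def is_whitelisted(os_release_text, supported=SUPPORTED):
    """Exact-match lookup: any distro or release not listed is rejected."""
    return parse_os_release(os_release_text) in supported

ubuntu_2004 = 'NAME="Ubuntu"\nID=ubuntu\nVERSION_ID="20.04"\n'
mint_20 = 'NAME="Linux Mint"\nID=linuxmint\nID_LIKE=ubuntu\nVERSION_ID="20"\n'
print(is_whitelisted(ubuntu_2004))  # False: a new release is not in the list yet
print(is_whitelisted(mint_20))      # False: an Ubuntu derivative is rejected too
```

An exact (ID, VERSION_ID) lookup rejects every new release and every derivative until someone updates the list, which is the brittleness described above.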

@MayMSFT
Contributor

MayMSFT commented Jun 17, 2020

Unfortunately, it depends on legal approval. @tot0 will share more details.

@tot0

tot0 commented Jun 17, 2020

@xkszltl Hi, unfortunately I don't have any concrete timeline for official support of new Linux distros. The legal processes involved in distributing the open-source packages that normally make Datasets 'just work' require care and aren't moving as fast as we'd hope.

Datasets will only fail with 'Unsupported Distro' if the required dependencies for .NET Core 2.1 are not present on the default library paths AND no pre-prepared dependency set exists.
We are working on improving the error message to link out to the official .NET Core documentation on how to install the correct dependencies for supported distributions.
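The rule above (fail only when libraries are missing AND no pre-prepared set exists) can be sketched as follows. This is a hypothetical illustration, not dotnetcore2's code: REQUIRED_LIBS, missing_native_libs, resolve_dependencies and the prebuilt_sets mapping are assumptions, and the library list only approximates .NET Core's Linux dependencies.

```python
from ctypes.util import find_library

# Assumed, illustrative list of native libraries .NET Core 2.1 loads on Linux
# (OpenSSL, ICU, Kerberos/GSSAPI, libstdc++, zlib); not the exact list dotnetcore2 checks.
REQUIRED_LIBS = ["ssl", "icuuc", "gssapi_krb5", "stdc++", "z"]

def missing_native_libs(required=REQUIRED_LIBS, finder=find_library):
    """Names whose shared library cannot be located on the default search paths."""
    return [name for name in required if finder(name) is None]

def resolve_dependencies(dist, version, missing, prebuilt_sets):
    """Fail only when libraries are missing AND no pre-prepared set exists."""
    if not missing:
        return None  # system libraries suffice; nothing extra to fetch
    key = (dist, version)
    if key in prebuilt_sets:
        return prebuilt_sets[key]  # location of the pre-prepared dependency set
    raise NotImplementedError(f"Unsupported Linux distribution {dist} {version}")
```

With this ordering, a distro that is not on any list still works as long as its system packages provide the libraries, which is why installing .NET Core's dependencies via apt is the suggested fix.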

@xkszltl Would you be able to try the first command here to install .NET Core's dependencies for Ubuntu 20.04 and see whether you're able to use dataset.download()?
https://docs.microsoft.com/en-us/dotnet/core/install/linux-ubuntu#2004-

@xkszltl

xkszltl commented Jun 18, 2020

Of course, if it's just a matter of installing .NET, that's totally fine for us.
We will actually do that regardless of whether we use AML Datasets.

Is 2.1 an exact or a minimum requirement?
Can we use a later version, namely 2.2 or 3+?

@tot0

tot0 commented Jun 21, 2020

Currently, Datasets requires .NET Core 2.1.

@YutongTie-MSFT

@Radeju
We will now proceed to close this thread. If there are further questions regarding this matter, please respond here and tag @YutongTie-MSFT, and we will gladly continue the discussion.

@corticalstack

corticalstack commented Oct 30, 2020

I'm getting the same issue when trying to use "from azureml.opendatasets import Diabetes", with the error "Unsupported Linux distribution ubuntu 20.04". I tried the fix suggested by @tot0, but it didn't resolve the issue:
https://docs.microsoft.com/en-us/dotnet/core/install/linux-ubuntu#2004-

@YutongTie-MSFT

@corticalstack

I hit this error again when trying to access my own dataset in a storage account blob (error below). The code is run in a local Jupyter notebook on Ubuntu 20.04; it is the "day1-part4-data" notebook:
https://github.com/Azure/MachineLearningNotebooks/blob/master/tutorials/get-started-day1/day1-part4-data.ipynb

which fails on line:
dataset = Dataset.File.from_files(path=(datastore, 'datasets/cifar10'))

HTTPError Traceback (most recent call last)
~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/runtime.py in attemp_get_deps()
198 try:
--> 199 blob_deps_to_file()
200 success = True

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/runtime.py in blob_deps_to_file()
190 ssl_context = ssl.create_default_context(cafile=cafile)
--> 191 blob = request.urlopen(deps_url, context=ssl_context)
192 with open(deps_tar_path, 'wb') as f:

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
221 opener = _opener
--> 222 return opener.open(url, data, timeout)
223

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in open(self, fullurl, data, timeout)
530 meth = getattr(processor, meth_name)
--> 531 response = meth(req, response)
532

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in http_response(self, request, response)
639 if not (200 <= code < 300):
--> 640 response = self.parent.error(
641 'http', request, response, code, msg, hdrs)

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in error(self, proto, *args)
568 args = (dict, 'default', 'http_error_default') + orig_args
--> 569 return self._call_chain(*args)
570

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in _call_chain(self, chain, kind, meth_name, *args)
501 func = getattr(handler, meth_name)
--> 502 result = func(*args)
503 if result is not None:

~/anaconda3/envs/pybasic383/lib/python3.8/urllib/request.py in http_error_default(self, req, fp, code, msg, hdrs)
648 def http_error_default(self, req, fp, code, msg, hdrs):
--> 649 raise HTTPError(req.full_url, code, msg, hdrs, fp)
650

HTTPError: HTTP Error 404: Not Found

During handling of the above exception, another exception occurred:

NotImplementedError Traceback (most recent call last)
in
----> 1 dataset = Dataset.File.from_files(path=(datastore, 'datasets/cifar10'))

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/data/_loggerfactory.py in wrapper(*args, **kwargs)
124 with _LoggerFactory.track_activity(logger, func.__name__, activity_type, custom_dimensions) as al:
125 try:
--> 126 return func(*args, **kwargs)
127 except Exception as e:
128 if hasattr(al, 'activity_info') and hasattr(e, 'error_code'):

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/data/dataset_factory.py in from_files(path, validate)
702 from azureml.data import FileDataset
703
--> 704 dataflow = dataprep().api.dataflow.Dataflow._path_to_get_files_block(_validate_and_normalize_path(path))
705 if validate:
706 _validate_has_data(dataflow, 'Cannot load any data from the specified path. '

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/dataflow.py in _path_to_get_files_block(path, archive_options)
2423 try:
2424 if _is_datapath(path) or _is_datapaths(path):
-> 2425 return datastore_to_dataflow(path)
2426 except ImportError:
2427 pass

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py in datastore_to_dataflow(data_source, query_timeout)
25 datastore_values = []
26 for source in data_source:
---> 27 datastore, datastore_value = get_datastore_value(source)
28 if not _is_fs_datastore(datastore):
29 raise NotSupportedDatastoreTypeError(datastore)

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py in get_datastore_value(data_source)
78
79 workspace = datastore.workspace
---> 80 _set_auth_type(workspace)
81 return (datastore, DatastoreValue(
82 subscription=workspace.subscription_id,

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/_datastore_helper.py in _set_auth_type(workspace)
141 get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(AuthType.SERVICEPRINCIPAL, json.dumps(auth)))
142 else:
--> 143 get_engine_api().set_aml_auth(SetAmlAuthMessageArgument(AuthType.DERIVED, json.dumps(auth)))
144
145

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py in get_engine_api()
17 global _engine_api
18 if not _engine_api:
---> 19 _engine_api = EngineAPI()
20
21 from .._dataset_resolver import register_dataset_resolver

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/api.py in __init__(self)
66 pass
67
---> 68 self._message_channel = launch_engine()
69 connect_to_requests_channel()
70

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/azureml/dataprep/api/engineapi/engine.py in launch_engine()
331 engine_path = _get_engine_path()
332 try:
--> 333 dependencies_path = runtime.ensure_dependencies()
334 except Exception as e:
335 _LoggerFactory.trace(log, 'Failed to ensure dependencies' + str(e))

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/runtime.py in ensure_dependencies()
211 return success
212
--> 213 if not attemp_get_deps():
214 # Failed accessing blob, likely an interrupted connection. Try again once more.
215 if not attemp_get_deps():

~/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/runtime.py in attemp_get_deps()
205 err_msg = 'Unsupported Linux distribution {0} {1}.{2}'.format(dist, version[0], version[1])
206 log_event('ensure_dependencies', error=err_msg, missing_pkgs=list(missing_pkgs))
--> 207 raise NotImplementedError(err_msg)
208 except Exception as e:
209 logger.debug("Exception when accessing blob: " + str(e))

NotImplementedError: Unsupported Linux distribution ubuntu 20.04

@tot0

tot0 commented Nov 2, 2020

Hi @corticalstack, could you try running the Python snippet below in your Ubuntu 20.04 environment?

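from dotnetcore2 import runtime
runtime._enable_debug_logging()  # print verbose details while dependencies are checked
runtime.ensure_dependencies()  # raises NotImplementedError if the dependencies can't be resolved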

This should reveal which dependencies Datasets is missing.

When installing .NET Core ahead of time, did you install dotnet-runtime-3.1 or dotnet-runtime-2.1?

Cheers.

@corticalstack

@tot0 Regarding .NET Core 2.1, I believe I installed 3.1, as per:
https://docs.microsoft.com/en-us/dotnet/core/install/linux-ubuntu#2004-

In a Jupyter notebook, I added the 3 lines as requested, then executed:

dataset = Dataset.File.from_files(path=(datastore, 'datasets/cifar10'))

and got what look like multiple errors logged in DEBUG mode:

DEBUG - Could not load run context RunEnvironmentException: Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run. InnerException None ErrorResponse { "error": { "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run." } }, switching offline: False
DEBUG - Could not load the run context and allow_offline set to False
DEBUG - Could not load run context RunEnvironmentException: Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run. InnerException None ErrorResponse { "error": { "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run." } }, switching offline: False
DEBUG - Could not load the run context and allow_offline set to False
DEBUG - Could not load run context RunEnvironmentException: Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run. InnerException None ErrorResponse { "error": { "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run." } }, switching offline: False
DEBUG - Could not load the run context and allow_offline set to False
DEBUG - Created a static thread pool for ServiceContext class
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - Authority:Performing instance discovery: ...
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - Authority:Performing static instance discovery
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - Authority:Authority validated via static instance discovery
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - TokenRequest:Getting token from cache with refresh if necessary.
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:finding with query keys: {'_clientId': '...', 'userId': '...'}
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:Looking for potential cache entries: {'_clientId': '...', 'userId': '...'}
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:Found 2 potential entries.
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:Resource specific token found.
DEBUG - 2a747d93-9b35-4d5e-a541-cae131c8a5b4 - CacheDriver:Returning token from cache lookup, AccessTokenId: b'ji5H/ccIOfhlbO6LhVa6SPJm1T+uGkOaz40LghSXBzc=', RefreshTokenId: b'WKAoyST6eg+Go79SJMjKcyHKHQ1z1tWx146fEyzlv8M='
DEBUG - Could not load run context RunEnvironmentException: Message: Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run. InnerException None ErrorResponse { "error": { "message": "Could not load a submitted run, if outside of an execution context, use experiment.start_logging to initialize an azureml.core.Run." } }, switching offline: False
DEBUG - Could not load the run context and allow_offline set to False

@tot0

tot0 commented Nov 3, 2020

@corticalstack Hmmm, those RunContext debug logs come from the Dataset calls; they shouldn't have been emitted by the runtime.ensure_dependencies() call. What version of dotnetcore2 is installed in your environment?
Would it be possible to see just the outcome of running the 3 lines I shared, without the from_files call? Thanks!

Unfortunately, the .NET Core docs no longer have any 2.1-specific advice. The package dotnet-runtime-2.1 does exist, though, and I recommend installing it instead of dotnet-runtime-3.1.
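To check which runtime actually ended up on the machine (2.1 versus 3.1), the `dotnet --list-runtimes` command lists installed runtimes. The sketch below parses that output, assuming the `dotnet` host is on PATH and prints lines in the format shown in the comment; the function names are illustrative. It only inspects what the system-wide `dotnet` host reports.

```python
import re
import shutil
import subprocess

# Matches lines such as:
# Microsoft.NETCore.App 2.1.23 [/usr/share/dotnet/shared/Microsoft.NETCore.App]
_RUNTIME_LINE = re.compile(r"^Microsoft\.NETCore\.App\s+(\d+)\.(\d+)\.(\d+)")

def netcore_runtime_versions(list_runtimes_output):
    """Return (major, minor, patch) for each Microsoft.NETCore.App runtime listed."""
    versions = []
    for line in list_runtimes_output.splitlines():
        match = _RUNTIME_LINE.match(line.strip())
        if match:
            versions.append(tuple(int(g) for g in match.groups()))
    return versions

def has_netcore_21(list_runtimes_output):
    """True if any 2.1.x runtime is present; a 3.1 runtime alone does not count."""
    return any(v[:2] == (2, 1) for v in netcore_runtime_versions(list_runtimes_output))

def installed_runtimes_output():
    """Run `dotnet --list-runtimes` if the dotnet host is on PATH, else return ''."""
    if shutil.which("dotnet") is None:
        return ""
    result = subprocess.run(["dotnet", "--list-runtimes"], capture_output=True, text=True)
    return result.stdout
```

Usage: `has_netcore_21(installed_runtimes_output())` returns False on a machine that only has dotnet-runtime-3.1 installed.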

@corticalstack

@tot0 The installed version of dotnetcore2 is 2.1.15.

The only Jupyter output from the 3 lines you shared is as follows:
'/home/jp/anaconda3/envs/pybasic383/lib/python3.8/site-packages/dotnetcore2/bin/deps'

Thanks

@tot0

tot0 commented Nov 5, 2020

OK, so if runtime.ensure_dependencies() returns a path like the one you shared, all the dependencies .NET Core needs to run exist locally.
dotnetcore2==2.1.17 is the newest version; it upgrades the underlying .NET Core runtime to support the newer OpenSSL versions installed on newer Linux distros (Ubuntu 20 included). It does not yet add full support for all the dependencies required on Ubuntu 20 (so the pre-install steps via apt-get are still required), but using the newer version of dotnetcore2 should enable Datasets to run on Ubuntu 20.
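A quick way to confirm that an environment already has at least dotnetcore2 2.1.17 is to read the installed package version with the standard library. This is a minimal sketch: the simple numeric comparison in parse_version stands in for a full PEP 440 parser (packaging's `Version` would be the more robust choice), and the function names are illustrative.

```python
import re
from importlib import metadata  # standard library in Python 3.8+

# The dotnetcore2 version tot0 points to for newer OpenSSL support on Ubuntu 20.
MIN_DOTNETCORE2 = "2.1.17"

def parse_version(text):
    """Turn '2.1.15' into (2, 1, 15); any non-numeric suffix in a part is ignored."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    return tuple(parts)

def dotnetcore2_is_recent(minimum=MIN_DOTNETCORE2):
    """True if the installed dotnetcore2 is at least `minimum`; False if older or absent."""
    try:
        installed = metadata.version("dotnetcore2")
    except metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(minimum)

print(dotnetcore2_is_recent())
```

If this prints False, `pip install --upgrade dotnetcore2` pulls the latest release. A recent version is necessary but not sufficient: the system packages still have to be installed, as the Ubuntu 20.10 report in this thread shows.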

@corticalstack

@tot0 I pip-uninstalled dotnetcore2 2.1.15 and installed the latest version; all good. Thanks!

@smougel

smougel commented Nov 18, 2020

from dotnetcore2 import runtime
runtime._enable_debug_logging()
runtime.ensure_dependencies()

NotImplementedError: Unsupported Linux distribution ubuntu 20.10

pip install dotnetcore2
Collecting dotnetcore2
Using cached dotnetcore2-2.1.19-py3-none-manylinux1_x86_64.whl (28.7 MB)
Requirement already satisfied: distro>=1.2.0 in ./.conda/envs/p8/lib/python3.8/site-packages (from dotnetcore2) (1.5.0)
Installing collected packages: dotnetcore2
Successfully installed dotnetcore2-2.1.19

Any ideas?

@smougel

smougel commented Nov 18, 2020

Issue solved. Installing .NET Core 2.1 directly failed first:

sudo apt install dotnet-runtime-2.1
The following packages have unmet dependencies:
dotnet-runtime-deps-2.1 : Depends: libicu but it is not installable or
libicu66 but it is not installable or
libicu65 but it is not installable or
libicu63 but it is not installable or
libicu60 but it is not installable or
libicu57 but it is not installable or
libicu55 but it is not installable or
libicu52 but it is not installable
E: Unable to correct problems, you have held broken packages.

  1. Install libicu63 from the Debian archive:
    wget http://ftp.us.debian.org/debian/pool/main/i/icu/libicu63_63.2-3_amd64.deb
    sudo dpkg -i libicu63_63.2-3_amd64.deb
  2. Install the runtime:
    sudo apt install dotnet-runtime-2.1

Is there a better way to do this?
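The apt error above lists the libicu versions that dotnet-runtime-deps-2.1 accepts, and the workaround works because it installs one of them (63). The sketch below checks whether the ICU shared library found on the machine is one of those versions. It uses the shared library on disk as a proxy for the apt package, assumes `find_library` reports a versioned soname such as `libicuuc.so.66`, and notes that `find_library` returns only one match when several ICU versions are installed; the function names are illustrative.

```python
import re
from ctypes.util import find_library

# libicu versions accepted by dotnet-runtime-deps-2.1, copied from the apt error above.
ACCEPTED_ICU = (66, 65, 63, 60, 57, 55, 52)

def icu_major(soname):
    """'libicuuc.so.63' -> 63; None if there is no soname or no version suffix."""
    if soname is None:
        return None
    match = re.search(r"\.so\.(\d+)", soname)
    return int(match.group(1)) if match else None

def installed_icu_major():
    # find_library returns a soname like 'libicuuc.so.66', or None if ICU is absent.
    return icu_major(find_library("icuuc"))

def icu_satisfies_deps(major, accepted=ACCEPTED_ICU):
    """True if this ICU major version is one the 2.1 deps package can use."""
    return major in accepted
```

For example, `icu_satisfies_deps(installed_icu_major())` is False when the only ICU present is a version newer than 66, which matches the unmet-dependency error reported above.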


8 participants