NotImplementedError: unable to open file: libtensorflow_io.so #1313

Closed
dgoldenberg-audiomack opened this issue Feb 25, 2021 · 14 comments

@dgoldenberg-audiomack

Running on AWS in EMR, bootstrapping as follows:

pip3 install --user tensorflow==2.4.0
pip3 install --user tensorflow_recommenders==v0.4.0
# Nightly fixes https://github.com/tensorflow/io/issues/1254 (has support for strings in input parquet).
pip3 install --user tensorflow-io-nightly

Seeing a library load error as below. The code is basically like this:

    import tensorflow_io as tfio

    def load_dataset(ds_name, files, columns):
        # Read the first parquet file, then append the remaining files in order.
        dataset = tfio.IODataset.from_parquet(files[0], columns=columns)
        for file_name in files[1:]:
            ds = tfio.IODataset.from_parquet(file_name, columns=columns)
            dataset = dataset.concatenate(ds)
        return dataset

I'm loading a parquet file like this: Loading s3://my-bucket/dir1/dir2/part-00000-6f1f5a9d-95ac-462c-b148-fcb9404d6972-c000.snappy.parquet

Error:

Traceback (most recent call last):
...
(my code) -- line 236, in load_dataset
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow_io/core/python/ops/io_dataset.py", line 275, in from_parquet
    filename, columns=columns, internal=True
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow_io/core/python/ops/parquet_dataset_ops.py", line 30, in __init__
    components, shapes, dtypes = core_ops.io_parquet_readable_info(
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow_io/core/python/ops/__init__.py", line 88, in __getattr__
    return getattr(self._load(), attrb)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow_io/core/python/ops/__init__.py", line 84, in _load
    self._mod = _load_library(self._library)
  File "/home/hadoop/.local/lib/python3.7/site-packages/tensorflow_io/core/python/ops/__init__.py", line 71, in _load_library
    + "{}, from paths: {}\ncaused by: {}".format(filename, filenames, errs)
NotImplementedError: unable to open file: libtensorflow_io.so, from paths: ['/home/hadoop/.local/lib/python3.7/site-packages/tensorflow_io/core/python/ops/libtensorflow_io.so']
caused by: ['/home/hadoop/.local/lib/python3.7/site-packages/tensorflow_io/core/python/ops/libtensorflow_io.so: undefined symbol: _ZNK10tensorflow10FileSystem8BasenameEN4absl14lts_2020_02_2511string_viewE']
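
The last line of the error names a mangled C++ symbol that the loader could not resolve. As a minimal sketch (the helper name `find_undefined_symbol` and its regular expression are illustrative assumptions, not part of tensorflow-io), the library path and symbol can be pulled out of the message so that the symbol can be handed to a demangler such as `c++filt`:

```python
import re
from typing import Dict, Optional

# Matches "<absolute path>.so: undefined symbol: <mangled name>" as it appears
# in the "caused by" line of the error above.
_UNDEFINED_SYMBOL = re.compile(
    r"(?P<library>/[^\s'\"]+\.so): undefined symbol: (?P<symbol>\w+)"
)


def find_undefined_symbol(message: str) -> Optional[Dict[str, str]]:
    """Return the library path and mangled symbol from a loader error, if any."""
    match = _UNDEFINED_SYMBOL.search(message)
    if match is None:
        return None
    return {"library": match.group("library"), "symbol": match.group("symbol")}


error_text = (
    "caused by: ['/home/hadoop/.local/lib/python3.7/site-packages/"
    "tensorflow_io/core/python/ops/libtensorflow_io.so: undefined symbol: "
    "_ZNK10tensorflow10FileSystem8BasenameEN4absl14lts_2020_02_2511string_viewE']"
)
info = find_undefined_symbol(error_text)
print(info["library"])
print(info["symbol"])
```

Demangled, this symbol should read `tensorflow::FileSystem::Basename(absl::lts_2020_02_25::string_view) const`, which suggests that the tensorflow-io binary was built against a different TensorFlow build than the one installed.
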
@yongtang
Member

@dgoldenberg-audiomack With PR #1309, the tensorflow-io-nightly has been switched to depending on tf-nightly. Since there are some changes in tensorflow core repo, tf-nightly's API might be slightly different from tensorflow 2.4.0.

For your specific issue, I think that if you pin tensorflow-io-nightly to a version earlier than 2021/02/10:

pip3 install --user tensorflow-io-nightly==<earlier than 2021/02/10>

the issue will be resolved for your case for now.
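
Choosing a nightly built before a cutoff date can be sketched as follows. This is a rough illustration only: it assumes nightly version strings carry a `.devYYYYMMDD...` build stamp, and the version strings in the example are hypothetical, not real releases. The helper names are likewise made up for this sketch.

```python
import re
from datetime import date
from typing import Iterable, Optional

# Assumption: nightly versions look like "<release>.devYYYYMMDDhhmmss".
_DEV_STAMP = re.compile(r"\.dev(\d{4})(\d{2})(\d{2})")


def build_date(version: str) -> Optional[date]:
    """Extract the build date from a nightly version string, if present."""
    match = _DEV_STAMP.search(version)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    return date(year, month, day)


def latest_before(versions: Iterable[str], cutoff: date) -> Optional[str]:
    """Return the most recent version built strictly before the cutoff."""
    dated = [(build_date(v), v) for v in versions]
    eligible = [(d, v) for d, v in dated if d is not None and d < cutoff]
    if not eligible:
        return None
    return max(eligible)[1]


# Hypothetical version strings, for illustration only.
candidates = [
    "0.17.0.dev20210205010101",
    "0.17.0.dev20210209020202",
    "0.18.0.dev20210211030303",
]
chosen = latest_before(candidates, date(2021, 2, 10))
print(f"pip3 install --user tensorflow-io-nightly=={chosen}")
```

The real list of available nightly versions would have to come from PyPI; the sketch only shows the date filter.
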

@dgoldenberg-audiomack
Author

@yongtang Thanks for your fast response on this one. Oddly, I moved my configuration back from nightly to tensorflow_io==0.17.0, and with that my parquet reads appear to be working. They looked broken per #1254, but somehow I'm not seeing the #1254 error now. I'm planning to run on 0.17.0 for now and, if something breaks with parquet reading, to follow your recommendation and switch to tensorflow-io-nightly==<earlier than 2021/02/10>.

@dgoldenberg-audiomack
Author

@yongtang Hi, I'm looking to run some code with Tensorflow v.2.3.1.

What version of tensorflow io would you recommend that would be compatible with 2.3.1?

Also, any way to get a hold of a compatible version which would have this fix for parquet processing?

When installing tensorflow-io with pip, any way to cause it not to install tensorflow? I want to make sure I'm using the pre-installed TF 2.3.1 and not change anything there.

Thanks.
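
On the question of keeping pip from touching the pre-installed TensorFlow: pip's `--no-deps` option installs a package without its declared dependencies. A minimal sketch of building such a command is below; the helper name is hypothetical and the version string is a placeholder to be replaced with whatever the compatibility table indicates.

```python
import sys
from typing import List


def pip_install_command(package: str, version: str, no_deps: bool = True) -> List[str]:
    """Build a pip command that pins a version and optionally skips dependencies."""
    cmd = [sys.executable, "-m", "pip", "install", "--user"]
    if no_deps:
        # --no-deps keeps pip from installing or upgrading tensorflow itself.
        cmd.append("--no-deps")
    cmd.append(f"{package}=={version}")
    return cmd


tfio_version = "0.16.0"  # placeholder; take the real value from the compatibility table
command = pip_install_command("tensorflow-io", tfio_version)
print(" ".join(command))
# To run it: subprocess.run(command, check=True)
```

Note that `--no-deps` skips every dependency, not only TensorFlow, so anything else tensorflow-io needs would have to be present already.
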

@yongtang
Member

@dgoldenberg-audiomack You can find version compatibility on:
https://github.com/tensorflow/io#tensorflow-version-compatibility

We normally don't patch past releases because of resource limitations. We can consider it if there is a great need (or for security/vulnerability reasons).

@dgoldenberg-audiomack
Author

Thank you for that pointer, @yongtang.

Specifically for TF 2.3.1, it was released on Sep 24, 2020. Looking at that table, I see

0.16.0 2.3.x Oct 23, 2020
0.15.0 2.3.x Aug 03, 2020

I assume I'll want to run TF IO 0.16.0.
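
The lookup above can be sketched as a small table search. Only the two rows quoted here are included; the function names are illustrative, and a real check would need the full table from the README.

```python
from typing import List, Optional, Tuple

# (TF IO release, compatible TensorFlow versions), copied from the rows quoted above.
COMPAT_ROWS: List[Tuple[str, str]] = [
    ("0.16.0", "2.3.x"),
    ("0.15.0", "2.3.x"),
]


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '0.16.0' into (0, 16, 0) so releases sort numerically."""
    return tuple(int(part) for part in version.split("."))


def matches(tf_version: str, pattern: str) -> bool:
    """Check a TensorFlow version against a pattern such as '2.3.x'."""
    pairs = zip(pattern.split("."), tf_version.split("."))
    return all(p == "x" or p == t for p, t in pairs)


def newest_compatible(tf_version: str) -> Optional[str]:
    """Return the newest listed TF IO release compatible with tf_version."""
    hits = [io for io, pattern in COMPAT_ROWS if matches(tf_version, pattern)]
    return max(hits, key=version_key) if hits else None


print(newest_compatible("2.3.1"))
```
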

So without a patch to 0.16.0, there is no support for strings in Parquet, which I would classify as a great need. Could we please have this patch generated? Thank you.

@yongtang
Member

"there is no support for strings in Parquet"

@dgoldenberg-audiomack Do you know the specific commit that fixed this issue? To release a patch we will need the following:

  1. Create a R0.16 branch
  2. Cherry-pick the specific commit to the branch
  3. Update the version to 0.16.1
  4. Build on GitHub CI and, if everything passes, push to pypi.org
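
The first two steps map onto ordinary git commands. The sketch below only assembles them for review; the base tag `v0.16.0` is an assumption about how the release is tagged, the commit is left as a placeholder because it is not known here, and the helper name is made up.

```python
from typing import List


def patch_release_commands(branch: str, base_ref: str, commit: str) -> List[List[str]]:
    """Return the git commands for steps 1 and 2 without running them."""
    return [
        # Step 1: create the release branch from the existing release.
        ["git", "checkout", "-b", branch, base_ref],
        # Step 2: bring over the fix; -x records the original commit in the message.
        ["git", "cherry-pick", "-x", commit],
    ]


commands = patch_release_commands(
    branch="R0.16",
    base_ref="v0.16.0",     # assumed tag name
    commit="<commit-sha>",  # placeholder; the fixing commit is not identified here
)
for cmd in commands:
    print(" ".join(cmd))
```

Steps 3 and 4 (the version bump and the CI build) depend on the repository's own tooling and are not covered by the sketch.
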

@dgoldenberg-audiomack
Author

@yongtang Hi, I don't know, I was just going by your earlier comment

@dgoldenberg-audiomack With PR #1309, the tensorflow-io-nightly has been switched to depending on tf-nightly. Since there are some changes in tensorflow core repo, tf-nightly's API might be slightly different from tensorflow 2.4.0.

For your specific issues, I think if you use a specific tensorflow-io-nightly version earlier than 2021/02/10:

pip3 install --user tensorflow-io-nightly==<earlier than 2021/02/10>

The issue will be resolved for your case for now.

@yongtang
Member

yongtang commented Apr 2, 2021

@dgoldenberg-audiomack The issue fixed by #1309 is an API compatibility issue, which is different from the "there is no support for strings in Parquet" comment. Can you explain a little more about the "strings support" you are referring to?

@dgoldenberg-audiomack
Author

dgoldenberg-audiomack commented Apr 2, 2021

@yongtang My goal is to be able to read parquet into tf datasets without any issues.

We had started this discussion here: #1254.

You had added PR #1262 for the fix to that issue.

If I'm using this:

pip3 install --user tensorflow==2.4.0
pip3 install --user tensorflow_recommenders==v0.4.0
# Nightly fixes https://github.com/tensorflow/io/issues/1254 (has support for strings in input parquet).
pip3 install --user tensorflow-io-nightly

then the question is, will I be able to do that with TF 2.4.0 or 2.4.1 and TFRS v0.4.0? What version of TF IO should I use so that I have the fix for #1254 but also don't get the error described in this ticket:

libtensorflow_io.so: undefined symbol: _ZNK10tensorflow10FileSystem8BasenameEN4absl14lts_2020_02_2511string_viewE

I'm just looking for a clean deployment. Whether any more fixes or patches are necessary, I cannot tell you.

It seems like if I do

pip3 install --user tensorflow==2.4.1
pip3 install --user tensorflow_recommenders==v0.4.0
pip3 install --user tensorflow-io
pip3 install --user boto3

then parquet is loaded with no issues. Maybe your fix for #1254 is already in TF IO latest? With this, I don't seem to get the undefined symbol, either...
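
To see which versions such an install actually resolved to, the installed distributions can be queried directly. This is a minimal sketch using the standard library's `importlib.metadata`, which requires Python 3.8 or later (the Python 3.7 environment in the traceback above would need the `importlib_metadata` backport instead); the helper name is illustrative.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(distributions: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in distributions:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


report = installed_versions(
    ["tensorflow", "tensorflow-io", "tensorflow-recommenders", "boto3"]
)
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Comparing the reported tensorflow and tensorflow-io versions against the compatibility table is a quick way to spot a mismatched pair before the native library fails to load.
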

@yongtang
Member

yongtang commented Apr 7, 2021

To release a version that includes #1262 and works with tensorflow 2.4.x, I think the easiest paths could be:

  1. Release 0.17.1 with a cherry-pick of #1262 ("Fix incomplete row reading issue in parquet files") and pin to tensorflow 2.4.x, or
  2. Release 0.18.0 with current master and pin to tensorflow 2.4.x, then move to tensorflow 2.5.0 for 0.19.0 release.

cc @kvignesh1420 @terrytangyuan any insight on next release?

@kvignesh1420
Member

@yongtang I think it's better to go with the first option: "Release 0.17.1 with cherry-pick of #1262 and pin to tensorflow 2.4.x" due to the following reasons:

  • The change is minor and will not affect other APIs.
  • If we release 0.18.0 (current master) and pin to 2.4.x, then the file-system plugins will fail to load, which is a major concern. For example, the API compatibility checks fail when the current tfio master is used with tf 2.4.x.

@yongtang
Member

V0.17.1 has been released:
https://github.com/tensorflow/io/releases/tag/v0.17.1
https://pypi.org/project/tensorflow-io/0.17.1/

binaries:

$ sha256sum *.whl
1cbd071850901d3adc4aecbb5030e54e9c772ce71494ac8458fa67bf2ce9521a  tensorflow_io-0.17.1-cp36-cp36m-macosx_10_13_x86_64.whl
b9cf74e838ea7feab56ece6ccb58fc488bb685cc71d0b23bc12eccf68a584744  tensorflow_io-0.17.1-cp36-cp36m-manylinux2010_x86_64.whl
ae101de08bfa8b640af42ca24843c64fe53371299e1c6c56d873fe4a6bcd697c  tensorflow_io-0.17.1-cp36-cp36m-win_amd64.whl
b04ed67434d2de57451fcf7b9a1897e7d9cdb8d308ef16608f8d51167b950aad  tensorflow_io-0.17.1-cp37-cp37m-macosx_10_13_x86_64.whl
00f8a8a1d7561013be4ec0f8eb5fc1ad23ac8864e025aa5164c33c7f9a4c9d3d  tensorflow_io-0.17.1-cp37-cp37m-manylinux2010_x86_64.whl
39f46862eda5b46b98b59da86ca0a5fdc1c2296f58ecd9be57e770ac722cb164  tensorflow_io-0.17.1-cp37-cp37m-win_amd64.whl
fafca8ae03d52f28d2ca153079e730cf4e26a0f237c8e89578400a2033fe8700  tensorflow_io-0.17.1-cp38-cp38-macosx_10_13_x86_64.whl
8bcc45bb7040037161db30f917935f5ea36d10dd21551c7b50030bd0a395ef2d  tensorflow_io-0.17.1-cp38-cp38-manylinux2010_x86_64.whl
8343d604d14257806059fa620294565a168f495cc43b17c9ca779403714457bc  tensorflow_io-0.17.1-cp38-cp38-win_amd64.whl
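
A downloaded wheel can be checked against the digests above without `sha256sum`. The following is a minimal sketch using only the standard library; the function names are illustrative, and the wheel is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, published_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == published_hex.strip().lower()


# Digest copied from the list above for the cp37 manylinux wheel.
wheel = Path("tensorflow_io-0.17.1-cp37-cp37m-manylinux2010_x86_64.whl")
expected = "00f8a8a1d7561013be4ec0f8eb5fc1ad23ac8864e025aa5164c33c7f9a4c9d3d"
if wheel.exists():
    print("checksum ok" if matches_published(wheel, expected) else "checksum MISMATCH")
else:
    print(f"{wheel} not found; download it first")
```
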

@yongtang
Member

@dgoldenberg-audiomack With the release of 0.17.1 the issue should have been fixed. I will close this issue for now, but please feel free to re-open if the issue persists.

@dgoldenberg-audiomack
Author

@yongtang Hi. Great, thanks very much for the fix and for the update.
