BLD: Update Gitpod to use docker installation flow and pip/meson for setup #54046

Merged: 23 commits into pandas-dev:main on Jul 11, 2023

@theuerc (Contributor) commented Jul 7, 2023

The Docker images on Dockerhub that are used with Gitpod are about 6 months out of date, which is causing the build to fail in Gitpod. The images ship Python 3.8.16, and pandas now uses functools.cache, which was only added in Python 3.9.

This is the original repo in gitpod with the issue (using the latest commit):
https://gitpod.io#https://github.com/pandas-dev/pandas/commit/457690995ccbfc5b8eee80a0818d62070d078bcf

(pandas-dev) gitpod > /workspace/pandas $ python -i
Python 3.8.16 | packaged by conda-forge | (default, Feb  1 2023, 16:01:55) 
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/workspace/pandas/pandas/__init__.py", line 46, in <module>
    from pandas.core.api import (
  File "/workspace/pandas/pandas/core/api.py", line 47, in <module>
    from pandas.core.groupby import (
  File "/workspace/pandas/pandas/core/groupby/__init__.py", line 1, in <module>
    from pandas.core.groupby.generic import (
  File "/workspace/pandas/pandas/core/groupby/generic.py", line 70, in <module>
    from pandas.core.frame import DataFrame
  File "/workspace/pandas/pandas/core/frame.py", line 137, in <module>
    from pandas.core.generic import (
  File "/workspace/pandas/pandas/core/generic.py", line 191, in <module>
    from pandas.core.window import (
  File "/workspace/pandas/pandas/core/window/__init__.py", line 1, in <module>
    from pandas.core.window.ewm import (
  File "/workspace/pandas/pandas/core/window/ewm.py", line 41, in <module>
    from pandas.core.window.numba_ import (
  File "/workspace/pandas/pandas/core/window/numba_.py", line 20, in <module>
    @functools.cache
AttributeError: module 'functools' has no attribute 'cache'
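The traceback above comes down to one missing attribute. As a minimal sketch (not what this PR does, since the PR fixes the problem by moving Gitpod to a newer Python), the snippet below shows why the decorator is unavailable on 3.8 and what an equivalent fallback would look like. The `cache` alias and the `square` function are hypothetical names used only for illustration.

```python
import functools
import sys

# functools.cache was added in Python 3.9. On older interpreters,
# functools.lru_cache(maxsize=None) provides the same unbounded memoization.
# `cache` is a hypothetical local alias, not something pandas defines.
cache = getattr(functools, "cache", None) or functools.lru_cache(maxsize=None)


@cache
def square(n):
    """Hypothetical example function used only to exercise the decorator."""
    return n * n


# Report the interpreter version and whether functools.cache exists on it.
print(sys.version_info[:2], hasattr(functools, "cache"))
```

On the Python 3.8.16 image shown above, `hasattr(functools, "cache")` is false, which is exactly the condition that raises the AttributeError at import time.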

I've made a few changes to fix this and other errors related to the Gitpod build. This is what it looks like with the changes:
https://gitpod.io#https://github.com/pandas-dev/pandas/pull/54046

gitpod@theuerc-pandas-5ztda316yrs:/workspace/pandas$ python -i
Python 3.10.8 (main, Dec  6 2022, 14:13:21) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas
+ /usr/local/bin/ninja
[1/1] Generating write_version_file with a custom command
>>> pandas.DataFrame({'test': 'testing'}, index=[0])
      test
0  testing

The Bigger Changes

I'm following the updated development environment creation instructions for these changes, but with the docker option instead of the mamba option (as mamba requires version pinning and causes other issues that can make it hard to maintain).

  • Update setup to use pip/meson.
  • Run the same line twice in the setup command to work around a small issue with pip when prebuilding with pip/meson. Strangely, this prebuild issue did not appear on every branch I tested.
  • Build the Gitpod image from the Dockerfile in the base of the repo instead of pulling the image from Dockerhub, so that it will always stay current with the rest of the repo.
    • Gitpod will build and install all of the dependencies inside the image, and then reuse the image after that. If there are any changes to the Dockerfile, it will rebuild the image automatically.
    • This removes the need to manually update the Docker image on Dockerhub every 3-6 months to get Gitpod working again. The image will rebuild itself when Gitpod detects a difference in the Dockerfile. (source)
[Screenshot, 2023-07-07]

The Smaller Changes

  • Update gitpod/Dockerfile to use the latest version of conda (though the mamba flow wouldn't be used at all anymore).
  • Remove intermediate echo statements.
  • Remove legacy plugin settings that were causing errors.

Next Steps and Other Considerations

I tried to make the minimal changes needed to get everything working again, but it seems like the gitpod/ folder could be removed entirely. Only .gitpod.yml, Dockerfile, and gitpod/settings.json are needed for Gitpod (and gitpod/settings.json could be moved to settings.json).

gitpod/Dockerfile, gitpod/gitpod.Dockerfile, and gitpod/workspace_config are customizations for the Gitpod workspace, but they have to be continually updated over time for them to work. Otherwise they just cause errors after a few months (like they're doing in the repo right now).

Enabling prebuilds for branches/forks/pull-requests would be cool in the future. It would allow for instantly opening/running pull requests in a web browser. Prebuilds save about 3 minutes of time each time Gitpod is booted up. I wasn't sure if there would be cost associated with it so I didn't enable autoprebuilds for those options in the .gitpod.yml file. Right now prebuilds have to be done manually since they are only enabled for main.

@theuerc changed the title from "BLD: Updates Gitpod to use docker installation flow and pip/meson for setup" to "BLD: Update Gitpod to use docker installation flow and pip/meson for setup" on Jul 8, 2023
@lithomas1 (Member) commented

Thanks for picking this up. This looks good to me.

One comment though:

While theoretically using the Dockerfile would be fine, some dependencies such as pyarrow and numba are a PITA to install, which is why I'd recommend using conda, and figuring out a way for it to update dependencies based on the environment.yml.

If it works with the Dockerfile (you probably want to verify that all tests pass in Gitpod), I'm fine with leaving it as is, though.

@theuerc (Contributor, Author) commented Jul 8, 2023

I can look into a better way to set up conda for this.

These are the results of the run of the full suite of tests with current gitpod configuration:

=================== 93 failed, 216967 passed, 2514 skipped, 2004 xfailed, 13 xpassed, 35 warnings, 56 errors in 3495.84s (0:58:15) ===================
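To make the summary line above easier to compare between runs, it can be broken into per-outcome counts. The following is a rough sketch; `parse_pytest_summary` is a hypothetical helper written for this illustration, not part of pytest or pandas, and it assumes the summary keeps the "count outcome" layout shown above.

```python
import re


def parse_pytest_summary(line):
    """Return {outcome: count} from a pytest final summary line.

    Hypothetical helper: it only matches "<number> <word>" pairs, so the
    elapsed time ("3495.84s") and the clock ("0:58:15") are ignored.
    """
    return {name: int(count) for count, name in re.findall(r"(\d+) ([a-z]+)", line)}


# Summary line copied from the Gitpod run reported above.
summary = (
    "=================== 93 failed, 216967 passed, 2514 skipped, "
    "2004 xfailed, 13 xpassed, 35 warnings, 56 errors in 3495.84s "
    "(0:58:15) ==================="
)

counts = parse_pytest_summary(summary)
# Failures and errors are reported separately, so add them for a total.
problems = counts.get("failed", 0) + counts.get("errors", 0)
print(counts)
print(problems)
```

Note that pytest reports failures and errors as separate outcomes, so a total of problem tests needs both keys.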

From what I saw, it looks like all of the errors/failures are connection failures like this:

ERROR pandas/tests/io/parser/test_network.py::TestS3::test_parse_public_s3_bucket_chunked - botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "http://127.0.0.1:5555/pandas-test-79107f44-e164-472a-b025-a7...

I'll run the tests again in my local environment to see if I get the same number of failures/exceptions. My understanding is that some tests shouldn't pass without more configuration:
[Screenshot, 2023-07-08]

I'll also save the output to a log file next time.

@theuerc (Contributor, Author) commented Jul 9, 2023

Just to update: I cannot get the full suite of tests to run in my local docker environment without the process being killed prematurely, but I was able to run the following tests, which isolate the failures/errors from the Gitpod full run:
pytest pandas/tests/io/test_sql.py pandas/tests/io/parser/test_network.py pandas/tests/io/test_fsspec.py pandas/tests/io/test_parquet.py pandas/tests/io/test_s3.py pandas/tests/io/excel/test_readers.py pandas/tests/io/excel/test_style.py pandas/tests/io/json/test_compression.py pandas/tests/io/json/test_pandas.py pandas/tests/io/xml/test_to_xml.py > myoutput.log

I've run this command in both Gitpod and my local docker installation. This dummy repo has the results of the runs:
https://github.com/theuerc/pandas_errors_analytics

The summary is that the errors are exactly the same in both the local docker build and in Gitpod (93 connection-related errors), so I think that everything is working as expected.
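One way to check that two runs fail on the same tests is to collect the test IDs from each log and compare them as sets. This is only a sketch of that idea: `failing_ids` is a hypothetical helper, the log excerpts are made up for illustration (only the `test_network.py` ID appears in this thread), and it assumes the logs contain pytest's short summary lines of the form `FAILED <id> - <message>` or `ERROR <id> - <message>`.

```python
def failing_ids(log_text):
    """Collect test IDs from pytest short-summary lines (FAILED/ERROR).

    Hypothetical helper for illustration only.
    """
    ids = set()
    for line in log_text.splitlines():
        status, _, rest = line.partition(" ")
        if status in ("FAILED", "ERROR") and rest:
            # Drop the trailing " - <message>" part, keeping only the test ID.
            ids.add(rest.split(" - ", 1)[0].strip())
    return ids


# Made-up excerpts standing in for the two myoutput.log files.
gitpod_log = """\
ERROR pandas/tests/io/parser/test_network.py::TestS3::test_parse_public_s3_bucket_chunked - botocore.exceptions.EndpointConnectionError
FAILED pandas/tests/io/test_s3.py::test_hypothetical_example - connection refused
"""
local_log = """\
FAILED pandas/tests/io/test_s3.py::test_hypothetical_example - connection refused
ERROR pandas/tests/io/parser/test_network.py::TestS3::test_parse_public_s3_bucket_chunked - botocore.exceptions.EndpointConnectionError
"""

only_gitpod = failing_ids(gitpod_log) - failing_ids(local_log)
only_local = failing_ids(local_log) - failing_ids(gitpod_log)
print(sorted(only_gitpod), sorted(only_local))
```

If both differences are empty, the two environments fail on the same set of tests, regardless of the order in which the lines were logged.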

Just to be comprehensive, this is the full output for the Gitpod run of the entire test suite:
myoutput.log

I think that everything is working as it should.

@lithomas1 lithomas1 added this to the 2.1 milestone Jul 11, 2023
@mroeschke mroeschke merged commit dbb19b9 into pandas-dev:main Jul 11, 2023
@mroeschke (Member) commented

Thanks @theuerc


Successfully merging this pull request may close these issues.

BUILD: Issue while creating DEV environment using Gitpod