
BUILD: Issue while creating DEV environment using Gitpod #53685

Closed
1 task done
rsm-23 opened this issue Jun 15, 2023 · 7 comments · Fixed by #54046
Labels
Build (Library building on various platforms)

Comments

@rsm-23
Contributor

rsm-23 commented Jun 15, 2023

Installation check

Platform

Linux-6.1.27-060127-generic-x86_64-with-glibc2.10

Installation Method

Built from source

pandas Version

2.0.2-dev

Python Version

3.8.16

Installation Logs

This error appears when I run python -m pytest pandas after creating the environment in Gitpod. A similar message also appears when I try to import pandas in the terminal.

ImportError while loading conftest '/workspace/pandas/pandas/conftest.py'.
pandas/__init__.py:46: in <module>
from pandas.core.api import (
pandas/core/api.py:47: in <module>
from pandas.core.groupby import (
pandas/core/groupby/__init__.py:1: in <module>
from pandas.core.groupby.generic import (
pandas/core/groupby/generic.py:69: in <module>
from pandas.core.frame import DataFrame
pandas/core/frame.py:141: in <module>
from pandas.core.generic import (
pandas/core/generic.py:182: in <module>
from pandas.core.window import (
pandas/core/window/__init__.py:1: in <module>
from pandas.core.window.ewm import (
pandas/core/window/ewm.py:41: in <module>
from pandas.core.window.numba_ import (
pandas/core/window/numba_.py:20: in <module>
@functools.cache
E AttributeError: module 'functools' has no attribute 'cache'

rsm-23 added the Build (Library building on various platforms) and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Jun 15, 2023
@lithomas1
Member

You need Python 3.9+ to run pandas main (as dictated by the NEP 29 schedule). Can you recreate your Gitpod environment?
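The traceback ends at @functools.cache, which was added to the standard library in Python 3.9, so on the reporter's Python 3.8.16 the attribute simply does not exist. A minimal sketch of an early version guard, assuming one wanted a readable error instead of the AttributeError; is_supported and require_python are hypothetical helpers for illustration, not part of pandas:

```python
import functools
import sys

# Minimum Python version for pandas main, per the comment above (3.9+).
MIN_PYTHON = (3, 9)


def is_supported(version_info=None, min_version=MIN_PYTHON):
    """Return True if version_info (default: the running interpreter) meets min_version."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor); micro/patch levels don't matter here.
    return tuple(version_info[:2]) >= tuple(min_version)


def require_python(min_version=MIN_PYTHON):
    """Hypothetical guard: fail with a readable message instead of an AttributeError."""
    if not is_supported(min_version=min_version):
        found = ".".join(map(str, sys.version_info[:2]))
        wanted = ".".join(map(str, min_version))
        raise RuntimeError(f"pandas main needs Python {wanted}+, found Python {found}")


# functools.cache was added in Python 3.9, so this is False on Python 3.8.
HAS_FUNCTOOLS_CACHE = hasattr(functools, "cache")
```

Calling require_python() before any pandas imports would turn the long import traceback into a one-line explanation of what needs to change in the environment.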

lithomas1 added the Closing Candidate (May be closeable, needs more eyeballs) label and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label Jun 15, 2023
@rsm-23
Contributor Author

rsm-23 commented Jun 15, 2023

@lithomas1 I did try rebuilding it, but Gitpod seems to pick up the old Python version, and since it builds the whole environment automatically, I can't change that. From what I understand, the Docker image needs to be updated.

jorisvandenbossche removed the Closing Candidate (May be closeable, needs more eyeballs) label Jun 19, 2023
@lithomas1
Member

We talked about this at the dev call. Will assign myself so I don't forget to update the instructions.

lithomas1 self-assigned this Jun 29, 2023
@rsm-23
Contributor Author

rsm-23 commented Jun 30, 2023

Thanks @lithomas1! I depend on Gitpod for my dev environment, but I have not been able to build the project there. This will be really helpful!

@theuerc
Contributor

theuerc commented Jul 2, 2023

Hey guys!

The Docker image pandas/pandas-gitpod:latest on Docker Hub is a few months out of date (this is the image referenced in .gitpod.yml). Whoever has permissions could update the image manually, but it will probably keep breaking every couple of months.

Instead, it might be worthwhile to set up Docker Hub autobuilds that run whenever the environment.yml file changes. pandas is a sponsored open-source project on Docker Hub, so I think it has access to autobuilds. This is probably the best solution.
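The "rebuild when environment.yml changes" idea boils down to a path filter over the files touched by a commit. A minimal Python sketch of that trigger logic; should_rebuild_image and WATCHED_FILES are hypothetical names, and this does not reproduce how Docker Hub build rules are actually configured:

```python
from pathlib import PurePosixPath

# Files whose changes should trigger a fresh image build. Only environment.yml is
# named in the comment above; anything else added here would be an assumption.
WATCHED_FILES = frozenset({"environment.yml"})


def should_rebuild_image(changed_files, watched=WATCHED_FILES):
    """Return True if any changed path (relative to the repo root) is a watched file."""
    # PurePosixPath collapses spurious "./" prefixes and repeated slashes.
    normalized = {PurePosixPath(path).as_posix() for path in changed_files}
    return not normalized.isdisjoint(watched)
```

For example, a commit touching only pandas/core/frame.py would not trigger a rebuild, while one that edits environment.yml would.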

Alternatively, Gitpod can build and reuse Docker images itself. The advantage of referencing the Dockerfile in .gitpod.yml is that the image always stays in sync with the rest of the repo. The disadvantage is that the initial image build takes 10 minutes (later load times are much shorter, because Gitpod reuses the image as long as the Dockerfile hasn't changed in the repo).
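The reuse behaviour described here amounts to caching the built image under a key derived from the Dockerfile's contents: unchanged text maps to the same key, so only the first build pays the full cost. A toy Python sketch of that content-addressed caching; ImageCache and the build callback are hypothetical, and Gitpod's real cache may key on more than the Dockerfile text:

```python
import hashlib


class ImageCache:
    """Toy content-addressed cache: one 'image' per distinct Dockerfile text."""

    def __init__(self, build):
        self._build = build  # callable doing the slow build; returns an image id
        self._images = {}
        self.builds = 0  # how many real (slow) builds have happened

    @staticmethod
    def key(dockerfile_text):
        # Identical Dockerfile text always hashes to the same key.
        return hashlib.sha256(dockerfile_text.encode("utf-8")).hexdigest()

    def get_image(self, dockerfile_text):
        k = self.key(dockerfile_text)
        if k not in self._images:  # cache miss: Dockerfile is new or changed
            self.builds += 1
            self._images[k] = self._build(dockerfile_text)
        return self._images[k]
```

Opening a second workspace with the same Dockerfile returns the cached image without calling build again; editing the Dockerfile produces a new key and triggers one fresh build.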

This fixes the issue right now:

In .gitpod.yml

- image: pandas/pandas-gitpod:latest
+ image:
+  file: Dockerfile

There are additional considerations. For example, the Dockerfile currently in the repo doesn't create a conda environment, so the setup tasks, which expect one, throw a few errors:

> mkdir -p .vscode
> cp gitpod/settings.json .vscode/settings.json
> conda activate pandas-dev
> git pull --unshallow  # need to force this else the prebuild fails
> git fetch --tags
> python setup.py build_ext --inplace -j 4
> echo "🛠 Completed rebuilding Pandas!! 🛠 "
bash: !!: event not found
> pre-commit install
> echo "✨ Pre-build complete! You can close this terminal ✨ "
> 
> }
bash: conda: command not found
hint: Pulling without specifying how to reconcile divergent branches is
hint: discouraged. You can squelch this message by running one of the following
hint: commands sometime before your next pull:
hint: 
hint:   git config pull.rebase false  # merge (the default strategy)
hint:   git config pull.rebase true   # rebase
hint:   git config pull.ff only       # fast-forward only
hint: 
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.
fatal: Unable to create '/workspace/pandas/.git/shallow.lock': File exists.
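In the log above the tasks keep running after conda turns out to be missing, which leads to the later, more confusing failures. A minimal sketch of a pre-flight check that could run before the setup tasks, assuming they need conda, git and pre-commit on PATH (the tool list, missing_tools and preflight are hypothetical, taken from the commands in the log):

```python
import shutil
import sys

# Executables invoked by the setup tasks in the log above (assumed list).
REQUIRED_TOOLS = ("conda", "git", "pre-commit")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH, in the order given."""
    return [tool for tool in tools if shutil.which(tool) is None]


def preflight(tools=REQUIRED_TOOLS):
    """Report anything missing on stderr and return a shell-style exit code (0 = ok)."""
    missing = missing_tools(tools)
    if missing:
        print("Missing required tools: " + ", ".join(missing), file=sys.stderr)
        return 1
    return 0
```

Exiting with preflight()'s return code would stop the task at the first real problem instead of cascading into git and pre-commit errors.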

However, if you just specify the Python version that was installed by the Dockerfile, everything works:

[Screenshots omitted: "Screen Shot 2023-07-02 at 12 50 02 PM" and "Screen Shot 2023-07-02 at 12 50 53 PM"]
gitpod@theuerc-pandas-do4jcm1pe06:/workspace/pandas$ python -m pytest pandas
================================================================ test session starts =================================================================
platform linux -- Python 3.10.8, pytest-7.4.0, pluggy-1.2.0
rootdir: /workspace/pandas
configfile: pyproject.toml
plugins: anyio-3.7.0, hypothesis-6.80.0, asyncio-0.21.0, cov-4.1.0, cython-0.2.1, localserver-0.7.1, xdist-3.3.1
asyncio: mode=strict
collected 179090 items
gitpod@theuerc-pandas-do4jcm1pe06:/workspace/pandas$ python -i
Python 3.10.8 (main, Dec  6 2022, 14:13:21) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.DataFrame({'dont': 'open', 'dead': 'inside'}, index=[0])
   dont    dead
0  open  inside

@lithomas1 I don't know what you have planned for this issue, but I can help out / pair-program some things if you want. I have a long weekend ahead and I am trapped in an Airbnb.

@lithomas1
Member

Thanks.

While I update the build instructions, can you help me figure out a good way to update the conda environment in the gitpod?

There seems to be a permissions issue with conda env update -f environment.yml.

If you get that working and add that step into the pre-build, you should be able to build pandas.

The Docker image will still need to be updated, though, so that new packages don't have to be downloaded every time.
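The comment doesn't say what causes the permissions issue with conda env update -f environment.yml. One possible cause, stated here purely as an assumption, is that the conda environment's prefix is not writable by the user running the Gitpod tasks. A small Python check along these lines (prefix_is_writable is a hypothetical helper) could confirm or rule that out before the update step:

```python
import os


def prefix_is_writable(prefix):
    """Return True if the current user can create entries inside the directory prefix.

    Note that os.access checks the real (not effective) user and group ids.
    """
    return os.path.isdir(prefix) and os.access(prefix, os.W_OK | os.X_OK)


if __name__ == "__main__":
    # conda sets CONDA_PREFIX for the active environment; it is unset otherwise.
    prefix = os.environ.get("CONDA_PREFIX", "")
    print(prefix or "<no active conda environment>", prefix_is_writable(prefix))
```

If this prints False for the environment being updated, the error is about ownership of that directory rather than about environment.yml itself.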

@theuerc
Contributor

theuerc commented Jul 5, 2023

Hey @jorisvandenbossche,
I'm pinging you because you made a lot of the commits for this part.

Is there a reason that we are using the mamba installation flow here instead of the Docker installation flow for building pandas in Gitpod? I understand that mamba is recommended in the installation guide, but using mamba with Docker images seems to introduce many unnecessary points of failure (version pinning for mamba, outdated images built from other outdated images on Docker Hub, etc.).

I think just using the Dockerfile in the base of the repo and letting Gitpod take care of keeping the image up to date is the simplest / most maintainable solution. The dependencies only have to be installed once when the image is initially built, and if there are any changes when the workspace is reopened Gitpod will rebuild the image (source):

[Screenshot omitted: "Screen Shot 2023-07-05 at 2 41 53 PM"]

This would completely circumvent the process of having to upload/pull images to and from Dockerhub, and it is just as fast after the image is built for the first time.

I've modified the code to automate the Docker installation flow from the installation guide (using the newer Meson build setup process), if that is the route we want to go. Let me know and I can create a pull request with updated code and documentation. Additionally, I think gitpod/Dockerfile, gitpod/workspace_config and gitpod/gitpod.Dockerfile would no longer be used if we're not using mamba.

This is my branch with the working .gitpod.yml:
https://github.com/theuerc/pandas/tree/gitpod_docker


4 participants