-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reactivate the integration tests for remote file access (GCS, S3) #210
base: dev
Are you sure you want to change the base?
Conversation
da00cae
to
1dc349c
Compare
1dc349c
to
6593606
Compare
@popescu-v & @folmos-at-orange : a detail we forgot the S3 & GCS drivers are not released publicly and thus not available in the dev image. When trying to access to the remote files during the tests khiops complains
|
cb7c126
to
35c3a6f
Compare
Yes, AFAIU we need to wait for the drivers to be publicly released before the tests can work. |
18cf7e9
to
a750996
Compare
ed8cb75
to
571f402
Compare
beab51e
to
04fbd2b
Compare
e9011a8
to
3351a38
Compare
3e90f62
to
53fdbc3
Compare
There is an issue remaining that breaks the tests and that I cannot fix right now. It seems that khiops under the conda env cannot use the remote files drivers installed system wide. They cannot be installed via conda as there is no conda release yet. This execution
fails with
|
This is true. Hence, IMHO the way to go for the current PR is to test the remote access only in the native case (Conda environments for test, but with the native Then, as soon as the Conda packages for the drivers are available, a new PR should activate these integration tests in full Conda environments. |
2e88e35
to
4602e44
Compare
.github/workflows/tests.yml
Outdated
@@ -43,7 +43,7 @@ jobs: | |||
# because the `env` context is only accessible at the step level; | |||
# hence, it is hard-coded | |||
image: |- | |||
ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04:${{ inputs.image-tag || 'latest' }} | |||
ghcr.io/khiopsml/khiops-python/khiopspydev-ubuntu22.04:${{ inputs.image-tag || '10.2.3.0.s3-gcs-remote-files' }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why it isn't latest
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may have misunderstood the process agreed with @popescu-v.
As long as the PR is in review, the specific image build with this branch is not set yet as the latest.
Just before the merge then I can change to the latest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. To fix beforem merging then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; in this case we need specific tag, as the Dockerfiles have been changed in the current PR. If the PR hadn't changed the Dockerfiles, then latest
could have been used.
Hence, the process would be:
- (before merging) Manually launch the Docker build workflow (cannot be set to
latest
on the PR!) to check that everything is OK - (before merging) Change tag to 'latest' in the Test workflow (hence, the tests would fail! Their workflow should be canceled from the outset)
- Merge the PR into
dev
- Manually launch Docker build workflow on
dev
, while settingSet to latest
andPublish
; thus, the Docker modifications from the (just merged PR) would be taggedlatest
. - (Just in case) Manually launch all tests (including long ones) on
dev
to check that everything is OK.
tests/test_remote_access.py
Outdated
Temporary skip the S3 and GCS tests under a conda environment | ||
until the drivers for remote files access | ||
are released as conda packages |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> Temporarily skip the S3 and GCS tests in a conda environment ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo fixed
def setUp(self): | ||
self.skip_if_no_config() | ||
if self.is_in_a_conda_env() and self.should_skip_in_a_conda_env(): | ||
# TODO : Temporary skip the test under a conda environment | ||
# until the drivers for remote files access | ||
# are released as conda packages | ||
self.skipTest( | ||
f"Remote test case {self.remote_access_test_case()} " | ||
"in a conda environment is currently skipped" | ||
) | ||
self.print_test_title() | ||
|
||
def tearDown(self): | ||
# Cleanup the output dir (the files within and the folder) | ||
if hasattr(self, "folder_name_to_clean_in_teardown"): | ||
for filename in fs.list_dir(self.folder_name_to_clean_in_teardown): | ||
fs.remove( | ||
fs.get_child_path( | ||
self.folder_name_to_clean_in_teardown, filename | ||
) | ||
) | ||
fs.remove(self.folder_name_to_clean_in_teardown) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid using hasattr
and setting up your own temporary urls, instead of a per test setup/tear-down, I'd use one for the whole class.
- In the
setupClass
method you simply set the env varKHIOPS_TMP_DIR
with a fresh runner. - In the
tearDownClass
method you clean up everything and restore the original temporary directory and runner.
Something like this example
import unittest
import os
import subprocess
from khiops import core as kh
from khiops.core.internals.runner import KhiopsLocalRunner
class TestWithAltRunner(unittest.TestCase):
@classmethod
def setUpClass(cls):
# Save the original runner and temp dir
cls.original_runner = kh.get_runner()
cls.original_khiops_temp_dir = os.environ.get("KHIOPS_TMP_DIR")
# Set the test suite temp dir
os.environ["KHIOPS_TMP_DIR"] = "/tmp/mytmpdir/" # replace here with the URL
cls.runner = KhiopsLocalRunner()
@classmethod
def tearDownClass(cls):
# Restore the original runner and temp dir
kh.set_runner(cls.original_runner)
if cls.original_khiops_temp_dir is None:
del os.environ["KHIOPS_TMP_DIR"]
else:
os.environ["KHIOPS_TMP_DIR"] = cls.original_khiops_temp_dir
# Clean the whole alternative temporary file
# ...
def test(self):
print("\n======= Current test runner =========")
kh.get_runner().print_status()
… non-conda environments - 2 fake servers (fakes3 and fake-gcs-server) are used locally instead of a real S3 and GCS servers. - The fake servers are provisioned with sample files before the tests run - During the construction of the khiopsdev docker image we need to install the s3/gcs drivers required by khiops - As these drivers are not released yet as conda packages we must disable the tests in conda environments (except when using a docker runner)
…s, upload_file and download_file do not return a response but raises an exception in case of error
…e is not available
4602e44
to
a5604b6
Compare
) | ||
# normalize the raised exception | ||
except Exception as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would restrict the "catch" here to S3UploadFailedError
:
except S3UploadFailedError as exc:
raise RuntimeError(...) from exc
Thus, wenever there is a different (unexpected) exception, it would not be handled and whence would be propagated to the caller code so that we know about it (e.g. from inspecting the CI logs).
# Avoid resolving a fake s3-bucket.localhost hostname | ||
# Alternate builders (buildx via moby buildkit) mount /etc/hosts read-only, the following command will fail | ||
# echo "127.0.0.1 s3-bucket.localhost" >> /etc/hosts | ||
# You will have to add the `add-hosts` input instead (https://github.com/docker/build-push-action/#inputs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why can't we do this directly here? This Docker file is only useful in the CI context anyway.
@@ -79,17 +118,47 @@ def skip_if_no_config(self): | |||
"has no configuration available" | |||
) | |||
|
|||
@staticmethod | |||
def is_in_a_conda_env(): | |||
# CONDA_DEFAULT_ENV and CONDA_PROMPT_MODIFIER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, what we want here is to distinguish whether the khiops-core
Khiops binaries package is installed in a Conda environment or in a native way (in the OS):
- if installed natively, then these tests can be run
- otherwise, they cannot be, until the Conda driver packages are shipped.
In order to detect whether we are running a Conda-installer khiops-core
, we can rely on the fact that Conda-installed khiops-core
has the Khiops executables in the same place as the Conda environment's bin
directory. Hence, we can do something like this:
@staticmethod
def is_in_a_conda_env():
"""Detects whether this is run from a Conda environment and khiops-core is installed in the same env"""
# Get path to the Khiops executable
khiops_path = kh.get_runner()._khiops_path
# If $(dirname khiops_path) is identical to $CONDA_PREFIX/bin, then return True
conda_prefix = os.environ.get("CONDA_PREFIX")
return conda_prefix is not None and os.path.join(conda_prefix, "bin") == os.path.dirname(khiops_path)
# CONDA_DEFAULT_ENV and CONDA_PROMPT_MODIFIER | ||
# are the only usable env variables that let us know we are in a conda env | ||
conda_default_env = os.environ.get("CONDA_DEFAULT_ENV") | ||
return conda_default_env and conda_default_env.endswith("_conda") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The .endswith
leaks CI details in the tests (as the _conda
env names for Conda-installed khiops-core
are a CI detail).
The proposal from the previous comment avoids this.
def setUp(self): | ||
self.skip_if_no_config() | ||
if self.is_in_a_conda_env() and self.should_skip_in_a_conda_env(): | ||
# TODO : Temporary skip the test under a conda environment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rephrase TODO comment to something like
# TODO: Temporary skip the test for Conda-installed khiops-core
@@ -177,17 +251,21 @@ def test_train_predictor_fail_and_log_with_remote_access(self): | |||
log_file_path = fs.get_child_path( | |||
self._khiops_temp_dir, f"khiops_log_{uuid.uuid4()}.log" | |||
) | |||
|
|||
# no cleaning required as an error would raise without any result produced |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Change "an error would raise" to "an exception would be raised".
|
||
runner.samples_dir = f"s3://{bucket_name}/khiops-cicd/samples" | ||
resources_directory = KhiopsTestHelper.get_resources_dir() | ||
# WARNING : khiops temp files cannot be remote |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add empty line before the comment (to have paragraphs).
resources_directory = KhiopsTestHelper.get_resources_dir() | ||
# WARNING : khiops temp files cannot be remote | ||
cls._khiops_temp_dir = f"{resources_directory}/tmp/khiops-cicd" | ||
# whereas root_temp_dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete "whereas".
Add empty line before the beginning of the comment (to have the commented block as a paragraph).
runner.samples_dir = f"gs://{bucket_name}/khiops-cicd/samples" | ||
cls._khiops_temp_dir = f"gs://{bucket_name}/khiops-cicd/tmp" | ||
resources_directory = KhiopsTestHelper.get_resources_dir() | ||
# WARNING : khiops temp files cannot be remote |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add empty line before comment.
resources_directory = KhiopsTestHelper.get_resources_dir() | ||
# WARNING : khiops temp files cannot be remote | ||
cls._khiops_temp_dir = f"{resources_directory}/tmp/khiops-cicd" | ||
# whereas root_temp_dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Delete "whereas". Add empty line before the first line of the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comments.
Docker is installed in the khiopspydev docker image so that it can start the local fake servers .
The container must be started with the --privileged option otherwise docker inside the container would not start.
The fake servers are pre-provisioned with sample files to be ready before the tests run
closes #175, #53