Python SDK: start_offline_to_online_ingestion Fails with default ingestion jar configuration #1275

Closed
jpugliesi opened this issue Jan 19, 2021 · 2 comments · Fixed by #1284

Expected Behavior

In the minimal_ride_hailing.ipynb example notebook, I expect the following cell to run:

job = client.start_offline_to_online_ingestion(
    driver_statistics,
    datetime(2020, 10, 10),
    datetime(2020, 10, 20)
)
# expect offline to online ingestion job to run

Current Behavior

In the minimal_ride_hailing.ipynb example notebook, the following cell:

job = client.start_offline_to_online_ingestion(
    driver_statistics,
    datetime(2020, 10, 10),
    datetime(2020, 10, 20)
)

Produces the following error:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-72-e6363419621a> in <module>
      2     driver_statistics,
      3     datetime(2020, 10, 10),
----> 4     datetime(2020, 10, 20)
      5 )

~/.local/lib/python3.7/site-packages/feast/client.py in start_offline_to_online_ingestion(self, feature_table, start, end)
   1065                 feature_table=feature_table,
   1066                 start=start,
-> 1067                 end=end,
   1068             )
   1069         else:

~/.local/lib/python3.7/site-packages/feast/pyspark/launcher.py in start_offline_to_online_ingestion(client, project, feature_table, start, end)
    250             ),
    251             deadletter_path=client._config.get(opt.DEADLETTER_PATH),
--> 252             stencil_url=client._config.get(opt.STENCIL_URL),
    253         )
    254     )

~/.local/lib/python3.7/site-packages/feast/pyspark/launchers/aws/emr.py in offline_to_online_ingestion(self, ingestion_job_params)
    256 
    257         jar_s3_path = _upload_jar(
--> 258             self._staging_location, ingestion_job_params.get_main_file_path()
    259         )
    260         step = _sync_offline_to_online_step(

~/.local/lib/python3.7/site-packages/feast/pyspark/launchers/aws/emr_utils.py in _upload_jar(jar_s3_prefix, local_path)
    127 
    128 def _upload_jar(jar_s3_prefix: str, local_path: str) -> str:
--> 129     with open(local_path, "rb") as f:
    130         return _s3_upload(
    131             f,

FileNotFoundError: [Errno 2] No such file or directory: 'https://storage.googleapis.com/feast-jobs/spark/ingestion/feast-ingestion-spark-develop.jar'
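
The traceback reduces to a small standalone case: the configured jar location is handed straight to the built-in open(), which only understands local filesystem paths. A minimal sketch, independent of feast (the helper name try_open_jar is hypothetical and exists only for illustration):

```python
def try_open_jar(path: str) -> str:
    """Mimic the open() call from the traceback and report what happened."""
    try:
        with open(path, "rb"):
            return "opened"
    except OSError as exc:  # FileNotFoundError is a subclass of OSError
        return type(exc).__name__


# The default jar location from the error message is a URL, not a local path,
# so open() looks for a file with that literal name on the local disk.
default_jar = (
    "https://storage.googleapis.com/feast-jobs/spark/ingestion/"
    "feast-ingestion-spark-develop.jar"
)
print(try_open_jar(default_jar))
```

No network access is involved here; the failure comes entirely from treating a URL as a local file name.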

Steps to reproduce

  1. Install feast:
pip install feast==0.8.2
  2. Run the cells in the example notebook. Note that I have configured the following Client fields, but not the spark_ingestion_jar config (this config works fine for defining features in feast):
client = Client(
    core_url='feast-feast-core.feast-dev:6565',
    spark_launcher="emr",
    emr_cluster_id="<redacted>",
    emr_region="<redacted>",
    spark_staging_location="<redacted>",
    emr_log_location="<redacted>",
    historical_feature_output_location="<redacted>"
)

Specifications

  • Version: 0.8.2
@jpugliesi

I now see this appears related to #1266

@jpugliesi

jpugliesi commented Jan 22, 2021

I get a similar error even when I define the spark_ingestion_jar configuration on the client, i.e. spark_ingestion_jar="s3://my-bucket/feast-ingestion.jar" (pointing to a valid jar, of course):

  1. The jar exists:
$ aws s3 ls s3://my-bucket/feast-ingestion.jar
2021-01-19 23:44:12   45031646 feast-ingestion.jar
  2. Failed attempt to kick off ingestion:
job = client.start_offline_to_online_ingestion(
    driver_statistics,
    datetime(2020, 10, 10),
    datetime(2020, 10, 20)
)

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-127-e6363419621a> in <module>
      2     driver_statistics,
      3     datetime(2020, 10, 10),
----> 4     datetime(2020, 10, 20)
      5 )

~/.local/lib/python3.7/site-packages/feast/client.py in start_offline_to_online_ingestion(self, feature_table, start, end)
   1167                 feature_table=feature_table,
   1168                 start=start,
-> 1169                 end=end,
   1170             )
   1171         else:

~/.local/lib/python3.7/site-packages/feast/pyspark/launcher.py in start_offline_to_online_ingestion(client, project, feature_table, start, end)
    280             ),
    281             deadletter_path=client._config.get(opt.DEADLETTER_PATH),
--> 282             stencil_url=client._config.get(opt.STENCIL_URL),
    283         )
    284     )

~/.local/lib/python3.7/site-packages/feast/pyspark/launchers/aws/emr.py in offline_to_online_ingestion(self, ingestion_job_params)
    269 
    270         jar_s3_path = _upload_jar(
--> 271             self._staging_location, ingestion_job_params.get_main_file_path()
    272         )
    273         step = _sync_offline_to_online_step(

~/.local/lib/python3.7/site-packages/feast/pyspark/launchers/aws/emr_utils.py in _upload_jar(jar_s3_prefix, local_path)
     82 
     83 def _upload_jar(jar_s3_prefix: str, local_path: str) -> str:
---> 84     with open(local_path, "rb") as f:
     85         uri = urlparse(os.path.join(jar_s3_prefix, os.path.basename(local_path)))
     86         return urlunparse(

FileNotFoundError: [Errno 2] No such file or directory: 's3://my-bucket/feast-ingestion.jar'
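
Both failures suggest the launcher would need to tell local files apart from remote URIs before calling open(). One possible shape for such a check, as a hedged sketch: resolve_jar_path, is_remote_uri, the REMOTE_SCHEMES set, and the upload callback are all hypothetical names, not feast's API, and this is not necessarily how #1284 resolved the issue. Whether the cluster can actually fetch a given scheme (for example an https URL on EMR) is a separate question this sketch does not answer.

```python
import os
from typing import Callable, List, Tuple
from urllib.parse import urlparse

# Assumption: schemes treated as "already remote, do not open() locally".
REMOTE_SCHEMES = {"s3", "s3a", "gs", "http", "https"}


def is_remote_uri(path: str) -> bool:
    """Return True if the path carries one of the remote URI schemes."""
    return urlparse(path).scheme.lower() in REMOTE_SCHEMES


def resolve_jar_path(
    staging_prefix: str,
    jar_path: str,
    upload: Callable[[str, str], None],
) -> str:
    """Return a location for the jar, uploading only local files.

    `upload(local_path, target_uri)` is a caller-supplied callback
    (hypothetical); it stands in for whatever performs the real upload.
    """
    if is_remote_uri(jar_path):
        # Already remote: pass it through instead of opening it locally.
        return jar_path
    target = staging_prefix.rstrip("/") + "/" + os.path.basename(jar_path)
    upload(jar_path, target)
    return target


# Usage: record upload requests instead of touching S3.
uploads: List[Tuple[str, str]] = []
record = lambda src, dst: uploads.append((src, dst))

print(resolve_jar_path("s3://staging/jars", "s3://my-bucket/feast-ingestion.jar", record))
print(resolve_jar_path("s3://staging/jars", "/tmp/feast-ingestion.jar", record))
print(uploads)
```

Passing the upload step in as a callback keeps the path decision testable without AWS credentials.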
