Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

destination connector method elements input #1674

Merged
merged 7 commits into from
Oct 17, 2023
Merged

Conversation

rbiseck3
Copy link
Contributor

@rbiseck3 rbiseck3 commented Oct 6, 2023

Description

Ingest destination connectors support for writing raw list of elements Along with the default write method used in the ingest pipeline to write the json content associated with the ingest docs, each destination connector can now also write a raw list of elements to the desired downstream location without having an ingest doc associated with it.

@rbiseck3 rbiseck3 requested review from ryannikolaidis, awalker4 and jasonbot and removed request for awalker4 October 6, 2023 21:02
@rbiseck3 rbiseck3 added enhancement New feature or request ingest labels Oct 6, 2023
@rbiseck3 rbiseck3 linked an issue Oct 6, 2023 that may be closed by this pull request
@rbiseck3 rbiseck3 marked this pull request as ready for review October 12, 2023 22:16
Comment on lines 241 to 258
def write_elements(
self,
elements: t.List[Element],
filename: t.Optional[str] = None,
indent: int = 4,
encoding: str = "utf-8",
*args,
**kwargs,
) -> None:
from fsspec import AbstractFileSystem, get_filesystem_class

fs: AbstractFileSystem = get_filesystem_class(self.connector_config.protocol)(
**self.connector_config.get_access_kwargs(),
)

logger.info(f"Writing content using filesystem: {type(fs).__name__}")

s3_folder = self.connector_config.path

s3_output_path = str(PurePath(s3_folder, filename)) if filename else s3_folder
element_dict = convert_to_dict(elements)
with tempfile.NamedTemporaryFile(mode="w+", encoding=encoding) as tmp_file:
json.dump(element_dict, tmp_file, indent=indent)
logger.debug(f"Uploading {tmp_file.name} -> {s3_output_path}")
fs.put_file(lpath=tmp_file.name, rpath=s3_output_path)

def write(self, docs: t.List[BaseIngestDoc]) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unable to do the write/write_dict/write_elements pattern for this as you did the others?

Copy link
Contributor Author

@rbiseck3 rbiseck3 Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is a bit unique because I need to write the content to the filesystem regardless to upload to s3 using the pattern we've been using of fs.put_file(). I can look into if that library support writing from a byte stream to the destination in which case maybe the write/write_dict/write_elements pattern could be used.

Comment on lines 87 to 89
def write_elements(self, elements: t.List[Element], *args, **kwargs) -> None:
elements_dict = convert_to_dict(elements)
self.write_dict(json_list=elements_dict)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as previous comment, can we make this a function defined on BaseIngestDoc?

@rbiseck3 rbiseck3 force-pushed the roman/dest-conn-elements branch from 7d154b1 to d544c50 Compare October 12, 2023 23:54
@rbiseck3
Copy link
Contributor Author

@ryannikolaidis given that there's only 3 implementations of this so far, and how varied those 3 are in the implementation, would make sense until we have some more use cases before we look for generalizations in the code.

@ryannikolaidis
Copy link
Contributor

@ryannikolaidis given that there's only 3 implementations of this so far, and how varied those 3 are in the implementation, would make sense until we have some more use cases before we look for generalizations in the code.

I have a hunch that once you get in and code, most implementations should fall into that write/write_dict/write_elements pattern and would definitely like to avoid another copy and paste across every single connector if there are patterns we can abstract (even if they are optionally overwritten where needed or mixed in). maybe worth taking a second to explore you're comment on s3 as an option.

overall I think the code itself makes sense if you want to start moving forward. that's just my biggest caution here.

@rbiseck3 rbiseck3 force-pushed the roman/dest-conn-elements branch from 21f16f7 to 88c07f3 Compare October 13, 2023 13:31
Copy link
Contributor

@ryannikolaidis ryannikolaidis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm! one request: make sure we have this included in the README destination connector steps when the doc PR lands

@rbiseck3 rbiseck3 force-pushed the roman/dest-conn-elements branch from 88c07f3 to 72a44e4 Compare October 16, 2023 20:25
@rbiseck3 rbiseck3 force-pushed the roman/dest-conn-elements branch from 72a44e4 to ddc8e7c Compare October 17, 2023 12:14
@rbiseck3 rbiseck3 changed the title Roman/dest conn elements destination connector method elements input Oct 17, 2023
@rbiseck3 rbiseck3 enabled auto-merge October 17, 2023 12:15
@rbiseck3 rbiseck3 added this pull request to the merge queue Oct 17, 2023
Merged via the queue into main with commit aeaae5f Oct 17, 2023
39 checks passed
@rbiseck3 rbiseck3 deleted the roman/dest-conn-elements branch October 17, 2023 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support destination connector method leveraging List[Elements]
2 participants