-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Maxi297/fix streams interface #46995
base: brian/concurrent_declarative_source
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -125,24 +125,20 @@ def read( | |
filtered_catalog = self._remove_concurrent_streams_from_catalog(catalog=catalog, concurrent_stream_names=concurrent_stream_names) | ||
yield from super().read(logger, config, filtered_catalog, state) | ||
|
||
def check(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteConnectionStatus: | ||
return super().check(logger=logger, config=config) | ||
|
||
def discover(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteCatalog: | ||
return AirbyteCatalog( | ||
streams=[stream.as_airbyte_stream() for stream in self.streams(config=config, include_concurrent_streams=True)] | ||
streams=[stream.as_airbyte_stream() for stream in self._concurrent_streams + self._synchronous_streams] | ||
) | ||
|
||
def streams(self, config: Mapping[str, Any], include_concurrent_streams: bool = False) -> List[Stream]: | ||
def streams(self, config: Mapping[str, Any]) -> List[Stream]: | ||
""" | ||
Returns the list of streams that can be run synchronously in the Python CDK. | ||
The `streams` method is used as part of the AbstractSource in the following cases: | ||
* ConcurrentDeclarativeSource.check -> ManifestDeclarativeSource.check -> AbstractSource.check -> DeclarativeSource.check_connection -> CheckStream.check_connection -> streams | ||
* ConcurrentDeclarativeSource.discover -> ManifestDeclarativeSource.discover -> streams | ||
|
||
NOTE: For ConcurrentDeclarativeSource, this method only returns synchronous streams because it usage is invoked within the | ||
existing Python CDK. Streams that support concurrency are started from read(). | ||
In both case, we will assume that calling the DeclarativeStream is perfectly fine as the result for these is the same regardless of if it is a DeclarativeStream or a DefaultStream (concurrent). This | ||
""" | ||
if include_concurrent_streams: | ||
return self._synchronous_streams + self._concurrent_streams # type: ignore # Although AbstractStream doesn't inherit stream, they were designed to fit the same interface when called from streams() | ||
return self._synchronous_streams | ||
return super().streams(config) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just to transcribe a convo w/ maxime. This method doesn't technically doesn't need to be overwritten because it just calls the parent implementation. However, this comment is very useful to explain why this work. Let's just add one more comment that we would deprecate this once the concurrent |
||
|
||
def _group_streams(self, config: Mapping[str, Any]) -> Tuple[List[AbstractStream], List[Stream]]: | ||
concurrent_streams: List[AbstractStream] = [] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By not using
DeclarativeSource
, we avoid the circular dependency. That being said while being more accurate and allowing to re-enable type checking from mypy inCheckStream
, we probably have a bigger design problem we should solve eventually.