Maxi297/fix streams interface #46995

Open
wants to merge 3 commits into
base: brian/concurrent_declarative_source
4 changes: 1 addition & 3 deletions airbyte-cdk/python/airbyte_cdk/sources/abstract_source.py
@@ -56,11 +56,9 @@ def check_connection(self, logger: logging.Logger, config: Mapping[str, Any]) ->
"""

@abstractmethod
def streams(self, config: Mapping[str, Any], include_concurrent_streams: bool = False) -> List[Stream]:
def streams(self, config: Mapping[str, Any]) -> List[Stream]:
"""
:param config: The user-provided configuration as specified by the source's spec.
:param include_concurrent_streams: Concurrent sources can be made up of streams that can be run concurrently and
ones that must be run synchronously. By default, for backwards compatibility this is disabled.
Any stream construction related operation should happen here.
:return: A list of the streams in this source connector.
"""
@@ -7,8 +7,8 @@
from dataclasses import InitVar, dataclass
from typing import Any, List, Mapping, Tuple

from airbyte_cdk import AbstractSource
from airbyte_cdk.sources.declarative.checks.connection_checker import ConnectionChecker
from airbyte_cdk.sources.source import Source
from airbyte_cdk.sources.streams.http.availability_strategy import HttpAvailabilityStrategy


@@ -27,8 +27,8 @@ class CheckStream(ConnectionChecker):
def __post_init__(self, parameters: Mapping[str, Any]) -> None:
self._parameters = parameters

def check_connection(self, source: Source, logger: logging.Logger, config: Mapping[str, Any]) -> Tuple[bool, Any]:
streams = source.streams(config=config, include_concurrent_streams=True) # type: ignore # source is always a DeclarativeSource, but this parameter type adheres to the outer interface
def check_connection(self, source: AbstractSource, logger: logging.Logger, config: Mapping[str, Any]) -> Tuple[bool, Any]:
streams = source.streams(config=config)
stream_name_to_stream = {s.name: s for s in streams}
if len(streams) == 0:
return False, f"No streams to connect to from source {source}"
@@ -6,7 +6,7 @@
from abc import ABC, abstractmethod
from typing import Any, Mapping, Tuple

from airbyte_cdk.sources.source import Source
from airbyte_cdk import AbstractSource


class ConnectionChecker(ABC):
@@ -15,7 +15,7 @@ class ConnectionChecker(ABC):
"""

@abstractmethod
def check_connection(self, source: Source, logger: logging.Logger, config: Mapping[str, Any]) -> Tuple[bool, Any]:
def check_connection(self, source: AbstractSource, logger: logging.Logger, config: Mapping[str, Any]) -> Tuple[bool, Any]:
Review comment (contributor, PR author):

By not using DeclarativeSource, we avoid the circular dependency. That said, although this is more accurate and lets us re-enable mypy type checking in CheckStream, we probably have a bigger design problem that we should solve eventually.
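
As a rough illustration of the point in the comment above (not code from this PR): typing the checker against an abstract base lets the concrete source depend on the checker without the checker depending back on the concrete source. All class names ending in `Sketch` are hypothetical stand-ins, and streams are simplified to plain strings.

```python
from abc import ABC, abstractmethod
from typing import Any, List, Mapping, Optional, Tuple


class AbstractSourceSketch(ABC):
    """Hypothetical stand-in for the abstract base the checker is typed against."""

    @abstractmethod
    def streams(self, config: Mapping[str, Any]) -> List[str]:
        """Return stream names (the real method returns Stream objects)."""


class ConnectionCheckerSketch(ABC):
    """Depends only on the abstract base, never on a concrete source class."""

    @abstractmethod
    def check_connection(
        self, source: AbstractSourceSketch, config: Mapping[str, Any]
    ) -> Tuple[bool, Optional[str]]:
        """Return (succeeded, error message or None)."""


class CheckStreamSketch(ConnectionCheckerSketch):
    def check_connection(self, source, config):
        # No extra keyword argument is needed: every source exposes streams(config).
        streams = source.streams(config=config)
        if len(streams) == 0:
            return False, f"No streams to connect to from source {source}"
        return True, None


class DeclarativeSourceSketch(AbstractSourceSketch):
    """The concrete source knows about the checker; the reverse is not true."""

    def __init__(self, stream_names: List[str], checker: ConnectionCheckerSketch) -> None:
        self._stream_names = stream_names
        self._checker = checker

    def streams(self, config):
        return list(self._stream_names)

    def check_connection(self, config):
        return self._checker.check_connection(self, config)
```

In this sketch the dependency arrows only point toward the abstract base, which is one way a static type checker can verify the `streams(config=config)` call without an ignore comment.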

"""
Tests if the input configuration can be used to successfully connect to the integration e.g: if a provided Stripe API token can be used to connect
to the Stripe API.
@@ -125,24 +125,20 @@ def read(
filtered_catalog = self._remove_concurrent_streams_from_catalog(catalog=catalog, concurrent_stream_names=concurrent_stream_names)
yield from super().read(logger, config, filtered_catalog, state)

def check(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteConnectionStatus:
return super().check(logger=logger, config=config)

def discover(self, logger: logging.Logger, config: Mapping[str, Any]) -> AirbyteCatalog:
return AirbyteCatalog(
streams=[stream.as_airbyte_stream() for stream in self.streams(config=config, include_concurrent_streams=True)]
streams=[stream.as_airbyte_stream() for stream in self._concurrent_streams + self._synchronous_streams]
)

def streams(self, config: Mapping[str, Any], include_concurrent_streams: bool = False) -> List[Stream]:
def streams(self, config: Mapping[str, Any]) -> List[Stream]:
"""
Returns the list of streams that can be run synchronously in the Python CDK.
The `streams` method is used as part of the AbstractSource in the following cases:
* ConcurrentDeclarativeSource.check -> ManifestDeclarativeSource.check -> AbstractSource.check -> DeclarativeSource.check_connection -> CheckStream.check_connection -> streams
* ConcurrentDeclarativeSource.discover -> ManifestDeclarativeSource.discover -> streams

NOTE: For ConcurrentDeclarativeSource, this method only returns synchronous streams because it usage is invoked within the
existing Python CDK. Streams that support concurrency are started from read().
In both case, we will assume that calling the DeclarativeStream is perfectly fine as the result for these is the same regardless of if it is a DeclarativeStream or a DefaultStream (concurrent). This
"""
if include_concurrent_streams:
return self._synchronous_streams + self._concurrent_streams # type: ignore # Although AbstractStream doesn't inherit stream, they were designed to fit the same interface when called from streams()
return self._synchronous_streams
return super().streams(config)
Review comment (contributor):

Just to transcribe a conversation with Maxime: this method technically doesn't need to be overridden, because it just calls the parent implementation.

However, this comment is very useful for explaining why this works. Let's just add one more comment saying that we will deprecate this once the concurrent AbstractStream class implements a more thorough check implementation.
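
A minimal sketch of the shape being discussed, under simplifying assumptions: streams are plain strings, and the `Sketch` class names and constructor arguments are hypothetical rather than taken from the CDK. It shows `streams` as a pure delegation to the parent, kept only to carry documentation, while `discover` combines the two groups directly.

```python
from typing import Any, Iterable, List, Mapping


class ManifestSourceSketch:
    """Hypothetical parent: builds every stream declared in the manifest."""

    def __init__(self, stream_names: Iterable[str]) -> None:
        self._stream_names = list(stream_names)

    def streams(self, config: Mapping[str, Any]) -> List[str]:
        return list(self._stream_names)


class ConcurrentSourceSketch(ManifestSourceSketch):
    """Hypothetical child: splits streams into concurrent and synchronous groups."""

    def __init__(self, stream_names: Iterable[str], concurrent_names: Iterable[str]) -> None:
        names = list(stream_names)
        concurrent = set(concurrent_names)
        super().__init__(names)
        self._concurrent_streams = [n for n in names if n in concurrent]
        self._synchronous_streams = [n for n in names if n not in concurrent]

    def discover(self, config: Mapping[str, Any]) -> List[str]:
        # Discovery reports both groups, so no flag on streams() is required.
        return self._concurrent_streams + self._synchronous_streams

    def streams(self, config: Mapping[str, Any]) -> List[str]:
        # Pure delegation: behaviour is identical to not overriding at all.
        # The override exists only as a place to document the call paths.
        return super().streams(config)
```

Removing the `streams` override from `ConcurrentSourceSketch` would not change any result here, which matches the reviewer's observation that the override is kept for its explanatory docstring.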


def _group_streams(self, config: Mapping[str, Any]) -> Tuple[List[AbstractStream], List[Stream]]:
concurrent_streams: List[AbstractStream] = []
@@ -88,7 +88,7 @@ def connection_checker(self) -> ConnectionChecker:
else:
raise ValueError(f"Expected to generate a ConnectionChecker component, but received {check_stream.__class__}")

def streams(self, config: Mapping[str, Any], include_concurrent_streams: bool = False) -> List[Stream]:
def streams(self, config: Mapping[str, Any]) -> List[Stream]:
self._emit_manifest_debug_message(extra_args={"source_name": self.name, "parsed_config": json.dumps(self._source_config)})
stream_configs = self._stream_configs(self._source_config)
