-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🎉 Source Shopify: convert prices into numbers #5244
🎉 Source Shopify: convert prices into numbers #5244
Conversation
/test connector=source-shopify
|
item_properties = object_properties.get("items", {}) | ||
self._transform_array(value, item_properties) | ||
|
||
def _transform(self, records: List[Mapping[str, Any]]) -> List[Mapping[str, Any]]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you don't use a generator here, this makes reading a single record slight heavier
def _transform_object(self, transform_object: Mapping[str, Any], properties: Mapping[str, Any]): | ||
for object_property, value in transform_object.items(): | ||
if not value: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure about the purpose of this method.
but if you got falsy value like:
- empty string
- 0
- None
- 0.0
- []
- {}
and your type does not match this value you in trouble, i.e. empty string -> number
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also methods like this better cover with unittest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need some advice. Should I reproduce record for tests or some simple structure will be enough?
self.start_date = start_date | ||
self.shop = shop | ||
self.api_password = api_password | ||
|
||
@staticmethod | ||
def _get_json_types(value_type) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def _get_json_types(value_type) -> str: | |
def _get_json_types(value_type) -> List[str]: |
schema_type = self._find_schema_type(schema_types) | ||
if not any(value_json_type in schema_types for value_json_type in value_json_types): | ||
if schema_type == "number": | ||
transform_object[object_property] = self._convert(value, schema_type) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mapping -> MutableMapping
continue | ||
value_json_types = self._get_json_types(type(value)) | ||
schema_type = self._find_schema_type(schema_types) | ||
if not any(value_json_type in schema_types for value_json_type in value_json_types): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks complicated, why don't you check
if set(value_json_types) - set(schema_types)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'n not sure if it works for int: ["integer", "number"].
value_json_types = self._get_json_types(type(value)) | ||
schema_type = self._find_schema_type(schema_types) | ||
if not any(value_json_type in schema_types for value_json_type in value_json_types): | ||
if schema_type == "number": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like convert works only with numbers, is it true?
why don't you move if
there then?
nested_properties = item_properties.get("properties", {}) | ||
for item in array: | ||
if schema_type == "object": | ||
self._transform_object(item, nested_properties) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what will happen to other types? numbers?
return json_types.get(value_type) | ||
|
||
@staticmethod | ||
def _convert(value: Any, convert_type: str): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method needs a better name
if schema_type == "object": | ||
self._transform_object(item, nested_properties) | ||
|
||
def _transform_object(self, transform_object: Mapping[str, Any], properties: Mapping[str, Any]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in general, it is hard to track what is going on, and why are we doing the transformation.
Nice clear comments could help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my comments
/test connector=source-shopify
|
/test connector=source-shopify
|
/test connector=source-shopify
|
/test connector=source-shopify
|
|
from typing import Any, Iterable, List, Mapping, MutableMapping | ||
|
||
|
||
class Transformer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we call this a clearer name? maybe DataTypeEnforcer
or something? (not a great name but you get the idea)
from typing import Any, Iterable, List, Mapping, MutableMapping | ||
|
||
|
||
class Transformer: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this can probably be a general utility in the CDK. Can you create an issue to make it more general and available in the CDK?
self._schema = schema | ||
|
||
@staticmethod | ||
def _get_json_types(value_type) -> List[str]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a typehint?
if schema_types: | ||
value_json_types = self._get_json_types(type(record)) | ||
schema_type = self._first_non_null_type(schema_types) | ||
if not any(value_json_type in schema_types for value_json_type in value_json_types): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add source code comments for what is happening? It's really unclear what check this branch is performing. This is particularly confusing because record
's type signature is a MutableMapping
which makes me think it will always be a dict
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking more at the code it seems like record
can actually be of any type? maybe it should be called field
instead? also the typehint should be changed
"subtotal_price": "19.00", | ||
"customer": {"total_spent": "0.00"}, | ||
}, | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you isolate the test inputs as much as possible so that it is clear what each input is testing?
For example, the current input tests many things:
- A field with the correct type is not transformed
- A string field which should be a number is transformed
- A field which is nested in an object is transformed
- A field which is nested in an object inside an array is transformed
Could we then have at least 4 test cases, each of which specifically test those?
The guiding principle here is that tests should be as isolated as possible so that:
- a failure clearly indicates what is wrong
- the reader can have confidence that all tests cases are tested (right now the monolithic test case makes it very difficult to know all test cases are accurately tested)
I think we also need an additional test case that array
which contains strings but that should be ints should also be tested right? Currently all items inside arrays are objects in this example
/test connector=source-shopify
|
# Conflicts: # airbyte-integrations/connectors/source-shopify/source_shopify/source.py
/publish connector=connectors/source-shopify
|
Thanks a ton to @vitaliizazmic and everyone else who worked on this. This has been a blocker to our company's adopt of Airbyte, and I'm so glad to see it merged 🎉 |
What
Found price fields, list of them is presented as comment in related issue. Changed type of fields for them in schemas from string to number. Implemented transforming for them from string to number in base stream class, since Shopify API returns strings.
Closes #4841
Pre-merge Checklist
airbyte_secret
./gradlew :airbyte-integrations:connectors:<name>:integrationTest
.README.md
docs/integrations/<source or destination>/<name>.md
including changelog. See changelog example