Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support custom data type (de)serialization with EFactory #26

Closed
wants to merge 1 commit into from

Conversation

hallvard
Copy link

The implementation uses EFactory (de)serialization for (built-in) types that are not known to be handled well in (emf)json, e.g. types implemented in the ecore package or involving java.lang classes.

Tests are implemented for a custom data type for LocalDate.

@martin-fleck-at
Copy link
Contributor

@hallvard Thank you very much for your contribution! I'll have a look at this a little later today.

Copy link
Contributor

@martin-fleck-at martin-fleck-at left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I had some time to look at this and in general, I really like the idea of using the EFactory createFromString and convertFromString method to serialize/deserialize data types. Thank you very much for that!

However, I am slightly worried that the approach we are taking here is too greedy as we aim to handle all java.lang.* types and the process fails without any particular error if you use such a type but do not provide the necessary create/convert methods. I am not really sure how to best approach this problem, so I am opening it up for discussion. We could do a try-fail approach, where we catch the exception from the calls and fall back to the default jackson approach but maybe there is a more elegant solution since we do not want to try this for all types all the time which would have a negative impact on performance for sure.

@hallvard What do you think?

@@ -26,6 +25,11 @@

public class EDataTypeDeserializer extends JsonDeserializer<Object> {

public static boolean isJavaLangType(final EDataType dataType) {
String instanceClassName = dataType.getInstanceClassName();
return instanceClassName.startsWith("java.lang.") || instanceClassName.indexOf('.') < 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a rather greedy approach since we cannot really guarantee that we support all java.lang.* types.
For instance, the EcoreUtil.createFromString call will fail hard if you do not provide the necessary create/convert function in the factory.

Also having this check as a static method makes it impossible to override this in any sub-class.

Copy link
Author

@hallvard hallvard Mar 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you define a data type, it is your duty (contract to be fulfilled) to implement the create/convert methods if you need to support serialisation. That's part of the job when using EMF. Hence, IMO (as detailed below) we should be even more greedy. I'm not sure why the original implementation doesn't rely on EFactory in general, I'd assume that all pre-defined data types in Ecore support serialization and just should work. However, the tests would need to be re-written, since there are assumptions there of the specific format that may not match Ecore's. Some cases won't work, e.g. Object[] is handled not by a general serialization of Objects, but using JSON-specific logic, if I understand it correctly. I would start by reverting the logic, use EFactory in all cases where the data type has the serializable flag set and then see if there are special cases that could need special handling, e.g. when a data type wraps an array of something that can be serialized.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to make certain data types easier to support, we can also define a separate Emfjson model with pre-defined data types, so anyone can import it and use its data types. This can be cleaner than hard-coding the logic in the serialization class.
An orthogonal option is to introduce some annotations on the data type, for controlling how emfjson-specific logic should be used for serialization.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are also not sure why the original implementation did not rely on EFactory in general but I think it is the right way forward. However, as to not break backwards compatibility, we should use an EAnnotation to allow switching between strategies as there might be some customizations we are not aware of. There is already little support for annotations in JsonAnnotations.

@@ -35,7 +39,7 @@ public Object deserialize(final JsonParser jp, final DeserializationContext ctxt
}
Class<?> type = dataType.getInstanceClass();

if (type == null || dataType instanceof EEnum || EJAVA_CLASS.equals(dataType)
if (type == null || (!isJavaLangType(dataType)) || EJAVA_CLASS.equals(dataType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you removed the EEnum check for this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't really matter, as I guess the default handling of EFactory and current emfjson implementation should be the same. I think the main decision should be made elsewhere and this class always use the EFactory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the EAnnotation approach this check should be replaced by checking the annotation to see which strategy should be used.


private JsonSerializer<Object> findValueSerializer(final SerializerProvider provider) throws JsonMappingException {
if (feature instanceof EAttribute && shouldUseEFactory(((EAttribute) feature).getEAttributeType())) {
return EDATA_TYPE_SERIALIZER;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid using static instances of concrete classes like this as it makes it impossible for adopters to avoid it's usage. At the moment serializers are registered through the EMFSerializers class so I think we should use the registration mechanism to retrieve the serializer we want:

provider.findValueSerializer(EcoreType.DataType.class);

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a good idea!

throws JsonMappingException {
if ((!feature.isMany()) && feature instanceof EAttribute
&& shouldUseEFactory(((EAttribute) feature).getEAttributeType())) {
return EDATA_TYPE_DESERIALIZER;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid using static instances of concrete classes like this as it makes it impossible for adopters to avoid it's usage. At the moment deserializers are registered through the EMFDeserializers class so I think we should use the registration mechanism to retrieve the deserializer we want:

ctxt.findContextualValueDeserializer(ctxt.getTypeFactory().constructType(EcoreType.DataType.class), null)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above: Sounds like a good idea!

@eneufeld
Copy link
Contributor

I added your test to the master branch and run it.
The error pointed me to use com.fasterxml.jackson.datatype:jackson-datatype-jsr310 .
Thus I added:

<dependency>
	<groupId>com.fasterxml.jackson.datatype</groupId>
	<artifactId>jackson-datatype-jsr310</artifactId>
	<version>2.11.3</version>
</dependency>

to the pom.xml .
In the StandardFixture class I added:
mapper.registerModule(new JavaTimeModule());

With this the test succeeds for me.
I'm wondering do we need additional custom code or does this solve your problem?

@eneufeld
Copy link
Contributor

I think it would be cleaner to add deserializer and serializer similar to how it is done in the module for the cases you need.
If we find multiple cases missing we could create a new optional module.

@hallvard
Copy link
Author

My contribution is an attempt to find a middle ground between emfjson's approach, and EMF approach to serialisation. The latter assumes all data value serialisation is handled by some EFactory, the basic cases are delegated to the default Ecore factory while any model may define its own datatypes with custom serialisation. You may argue that it doesn't really matter who does the serialization, as long as what you write is what you get back, but one thing you will miss is validation of values. A custom datatype based on String may e.g. limit the size and syntax, and one based on int can limit the range, and you will miss this if you don't use the EFactory. Hence, in my opinion, if the goal of emfjson is a json-based replacement of EMF serialization, it should delegate all serialisation of data values to the EFactory, to ensure compatible behavior.

An argument for emfjson's approach is when you want to consume/produce json from/to other sources, e.g. REST services. Then you (may argue that you) need to ensure that you are able to read the format. In my opinion, this should rather be done by defining custom EMF model with datatypes and serialisation suitable for that format. Emfjson could help by defining basic data types that can read/write the standard json data types, in the cases where Ecore's EFactory doesn't work.

The solution above where support for time types is provided by an existing library is a very special case that will not work in general. In my models, there are other types that will require using the EFactory. I don't consider it a solution to write a jackson-specific module with serialisers and deserialisers, I already have the necessary logic in my EFactory and want to use that.

As mentioned, my code tries to find a middle ground, a better approach would be to have some feature identifiers which allows you to select between various approaches with a "pure" EFactory approach being one option, a default approach which is similar to what I suggest above and maybe other relevant variations.

@eneufeld
Copy link
Contributor

eneufeld commented Mar 2, 2022

So after thinking about what you wrote, I see your point.

I agree that we should support the transformation of custom datatypes if it is supported by the EFactory without the need to rely on external modules.

Could you look at the review comments @martin-fleck-at wrote?

@martin-fleck-at
Copy link
Contributor

martin-fleck-at commented Mar 21, 2022

@hallvard Please excuse the delayed reply. First of all, thank you very much for taking the time to elaborate on your thoughts and the contribution, specifically. We had a deeper discussion about this and think the the extra annotation to mark which strategy should be used for datatype serialization/deserialization is a good middle ground to take here.

Our main concern was breaking the current behavior for adopters and also adding somewhat arbitrary checks for when to use the EFactory. However, using the annotation approach we can keep backwards compatibility while also extending the usage of the EFactory to a degree that feels more appropriate. As you mentioned, we should be able to rely on the convert/create EFactory contract baked into EMF and actually should use only the EFactory for serialization/deserialization. Adopters can then switch between the new and old strategy based on the new annotation.

What do you think? Would you have some time to implement that approach as part of this PR as well as have a look at the how to get the serializer/deserializer?

@maho7791
Copy link
Contributor

I think EMFJson should be able to support built in serializers and de-serializers out of the box. Forcing EMFJ to use just the EMF to string method should be activated by an annotation.
Increasing a version in a right way, and noting this change in the release note can point users to the change.
We e.g. want to serialize Double[] wrapped in a EDatatype. Serializing these array types is oob supported by Jackson, why not using this?
Regards,
Mark

@martin-fleck-at
Copy link
Contributor

@maho7791 We totally agree that both options (serializing through Jackson and serializing through EMF Factory) should be supported properly and managed through an annotation. This is basically what we are missing in this PR for it to be merged.

@koegel
Copy link

koegel commented Aug 23, 2022

This PR seems to have stalled. I will now close it, feel free to reopen if you want to address the remaining issues as indicated by @martin-fleck-at

@koegel koegel closed this Aug 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Features should be (de)serialized by using the EFactory, not Jackson's built-in logic
5 participants