Specify object subtyping #43

Closed
jclark opened this issue Mar 26, 2019 · 24 comments
Labels
Area/Lang Relates to the Ballerina language specification design/incomplete Part of design not yet worked out

@jclark
Collaborator

jclark commented Mar 26, 2019

The spec needs to specify how object subtyping works.

This will almost certainly depend on function subtyping #28.

Can we define this in terms of shapes? (i.e., a type denotes a set of shapes, and subtype means subset of the denoted set of shapes)

It would be intuitive if every object type was a subtype of object {}, since then we would not need any special way to write a type corresponding to the object basic type.

Privacy

One significant complication that is specific to objects is privacy. We need to be clear about what private means. There are two possible interpretations:

  1. access only via self (private to this object)
  2. access only by methods that belong to same object type descriptor (private to the object type)

Approach A

One possible approach is as follows. An object type S is a subtype of an object type T provided that the only differences between S and T are the following:

  1. the order of declaration of fields and methods may differ between S and T
  2. a method may be extern in one and not in the other
  3. method bodies in S and T may be different
  4. the function type of a method m in S may be a subtype of the function type of m in T
  5. the type of a field f in S may be a subtype of the type of field f in T
  6. S may have a method m, where T does not have any method m
  7. S may have a field f, where T does not have any field f
  8. the visibility (public/module/private) of a field or method x in S may be greater than the visibility of the corresponding field or method x in T (where public > module > private)

Approach B

An alternative approach is that the private aspects of a value do not affect its type.

  1. same
  2. same
  3. same
  4. same
  5. same
  6. S may have a method m, where T does not have any method m or has a private method m
  7. S may have a field f, where T does not have any field f or has a private field f
  8. S and T may have different private fields and methods
  9. a field or method x in S may be public and the field or method x in T may have module visibility

This only works with the first interpretation of private.

@jclark jclark added Area/Lang Relates to the Ballerina language specification design/incomplete Part of design not yet worked out labels Mar 26, 2019
@jclark
Collaborator Author

jclark commented Mar 26, 2019

@hasithaa Which interpretation of private do you use currently?

@sameerajayasoma
Contributor

The current implementation is based on the 2nd interpretation: access is allowed only from methods that belong to the same object type descriptor (private to the object type).

import ballerina/io;
type Person object {
    private string email;
    function __init(string emailAddr = "[email protected]") {
        self.email = emailAddr;
    }
    function foo(Person p) {
        io:println(self.email);
        io:println(p.email);
    }
};
public function main() {
    Person p1 = new;
    Person p2 = new(emailAddr = "[email protected]");
    p1.foo(p2);
}

@jclark
Collaborator Author

jclark commented Apr 2, 2019

Great. I think the 2nd interpretation is the more useful one. This means Approach B won’t work, which leaves Approach A.

@sameerajayasoma
Contributor

sameerajayasoma commented Apr 2, 2019

Btw, is it a strict requirement to have structural subtyping for objects? I know Flow and TypeScript have some nominal typing for classes.

I came across the following paper, which describes an approach that integrates nominal and structural subtyping to get rid of the weaknesses of both. The example situations it describes are practical situations that we've faced in the current Ballerina compiler front-end written in Java, as well as in the Ballerina JVM back-end written in Ballerina that we are working on these days.

FYI,
http://www.cs.cmu.edu/~aldrich/papers/ecoop08.pdf

We've defined structural subtyping for data related types and that works well. But for objects possibly we can think of a different approach. WDYT?

@jclark @sanjiva @hasithaa

@jclark
Collaborator Author

jclark commented Apr 3, 2019

Well one problem is that containers, which are typed semantically, can contain objects, which means that typing of objects has to also be defined in terms of a subset relationship between sets of values. I suppose you could think of an object as having the class name as part of its value, but I think you would want a very different syntax if that is your conceptual model. Maybe something to think about for Ballerina v2 or v3.

@sameerajayasoma
Contributor

sameerajayasoma commented Apr 3, 2019

I will open a new issue to discuss this matter.

Btw, I have a concern about point 8 of Approach A in your original proposal. Let me explain it using the following example.

Ballerina module: pkg1

import ballerina/io;

public type Person object {
    int age = 20;
    private string email = "[email protected]";

    public function runWith(Person other) {
        io:println(self.email);
        // Privacy violation 1
        io:println(other.email);
    }
};

public function tryRunning(Person p1, Person p2) {
    p1.runWith(p2);
    io:println(p1.age);
    // Privacy violation 2
    io:println(p2.age);
}

Ballerina module: pkg2

import ballerina/io;

public type Student object {
    int age = 32;
    string email = "[email protected]";

    public function runWith(Student other) {
        io:println(self.email);
        io:println(other.email);
    }

    public function walkWith(Student other){
    }
};

Ballerina module: pkgmain

import pkg1;
import pkg2;
import ballerina/io;

public function main() {
    var p = new pkg1:Person();
    var s = new pkg2:Student();
    pkg1:tryRunning(p,s);
}

Here is the output

In this example, the pkg2:Student object type is a subtype of the pkg1:Person object type according to Approach A. Also, in line with point 8, the visibility of field "email" in pkg2:Student is greater than the visibility of field "email" in pkg1:Person.

But this example demonstrates that pkg1 has access to a field of pkg2:Student which has "module" visibility.

@sameerajayasoma sameerajayasoma pinned this issue Apr 3, 2019
@sameerajayasoma sameerajayasoma unpinned this issue Apr 3, 2019
@sameerajayasoma
Contributor

However, I agree that privacy interpretation 1 (introduced above) is less useful, even though it does not break privacy rules. It has several drawbacks compared to interpretation 2: some scenarios cannot be implemented without exposing private members to the whole world, and an implementation that enforces interpretation 1 can be less efficient than one that enforces interpretation 2.

Even though interpretation 2 permits the privacy violations explained in my comment above, it is the more useful interpretation. Therefore privacy violation 1 in that comment may not be a concern.

However, I think the 2nd violation can be a problem.

@jclark
Collaborator Author

jclark commented Apr 4, 2019

Point 8 of my proposed approach needs rethinking.

@sameerajayasoma Don't we have a problem even with simpler cases? For example

import ballerina/io;

type P1 object {
    private string x = "P1";
    public function foo(P2 p) {
        P1 p1 = p; // allowed: P1 and P2 are structurally the same type
        io:println(p1.x); // whoops! we have circumvented the privacy of P2
    }
};
type P2 object {
    private string x = "P2";
    public function foo(P2 p) {
    }
};

@jclark
Collaborator Author

jclark commented Apr 4, 2019

I think we need to distinguish between the visibility specifier (public, nothing or private) and the region of code to which a visibility specifier limits access. We can describe the visibility region as being one of

  • public
  • module(M)
  • object(M, O)

where M is a reference to a module and O is a reference to a specific (non-abstract) object type descriptor.

Before doing type-checking, we need to resolve each (context-dependent) visibility specifier into a (context-independent) visibility region.

There is a partial order on visibility regions corresponding to region inclusion with

public > module(M) > object(M, O)

Some regions are incomparable (<>):

module(M1) <> module(M2)
object(M, O1) <> object(M, O2)
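To make the ordering concrete, here is a minimal Python sketch of the three kinds of visibility region and the inclusion order between them. It is only an illustration of the idea above, not anything from the spec or the compiler: the names `Region`, `resolve`, `includes` and `comparable` are hypothetical, and modules and object type descriptors are represented by plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """A resolved (context-independent) visibility region."""
    kind: str                      # "public", "module" or "object"
    module: Optional[str] = None   # M, for module(M) and object(M, O)
    obj: Optional[str] = None      # O, for object(M, O)


PUBLIC = Region("public")


def module(m):
    return Region("module", m)


def obj(m, o):
    return Region("object", m, o)


def resolve(specifier, m, o):
    """Resolve a visibility specifier written inside object type
    descriptor o of module m. None stands for 'no specifier'."""
    if specifier == "public":
        return PUBLIC
    if specifier == "private":
        return obj(m, o)
    return module(m)


def includes(a, b):
    """True if region a contains region b (a >= b in the partial order)."""
    if a == b or a.kind == "public":
        return True
    # module(M) contains object(M, O) for every O declared in M
    return a.kind == "module" and b.kind == "object" and a.module == b.module


def comparable(a, b):
    """False for incomparable pairs such as module(M1) and module(M2)."""
    return includes(a, b) or includes(b, a)
```

With these definitions, `includes(module("M"), obj("M", "O"))` holds, while `comparable(module("M1"), module("M2"))` and `comparable(obj("M", "O1"), obj("M", "O2"))` are both false, matching the incomparable cases listed above.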

@sameerajayasoma
Contributor

sameerajayasoma commented Apr 4, 2019

I have a concern about considering object members which have module and private visibility for object subtyping.

Imagine that I need to define an object type (S) that is a subtype of another object type (T) defined in a different Ballerina module. I would need to know the internal structure of T (including its module-level and private members) in order to make S a subtype of T. That internal structure is not available unless you have access to the source code of the module, which can be a problem IMO.

Can we think of a model where we define object subtype relationships based only on abstract objects?

  1. Abstract object type AT1 is a subtype of abstract object type AT2 (Need to define how)
  2. Non-abstract type T3 is a subtype of the abstract object type AT1. (Need to define how)
  3. We don't define subtype relationships of two non-abstract object types. They are incomparable.

If an abstract object type (AT1) contains members with module visibility, then other packages won't be able to create subtypes of AT1. This model is similar to Go interfaces.

@jclark
Collaborator Author

jclark commented Apr 4, 2019

Suppose *T copies the members along with the resolved visibility regions instead of the visibility specifiers. This makes a difference for module-level visibility.

At the moment abstract object types cannot have private members. But I think we should also say that public abstract object types should have only public members. However, I think it’s fine for protected abstract object types to have protected members.

@sameerajayasoma
Contributor

Sorry. I edited my comment to remove the restriction for public abstract objects having members with module (protected) visibility a few mins ago. (didn't see your reply). Go lang has done something like that and it is useful.

@jclark
Collaborator Author

jclark commented Apr 5, 2019

Based on further discussions with @sameerajayasoma, it actually does make sense to have public abstract objects with module-visibility members. What it means is that only the module that defines the abstract type can subtype it.

@jclark
Collaborator Author

jclark commented Apr 5, 2019

The overall effect is that objects with private fields/methods cannot have subtypes. This makes sense given that

  • private means private to object type not private to object value, and
  • we want privacy to be statically enforced by the type system, and
  • we allow *T only when T is an abstract type (which cannot have private members)

For object types with module level visibility, only that module can create subtypes.

Objects with only public fields/methods can be subtyped freely, just like with records.

@jclark
Collaborator Author

jclark commented Apr 5, 2019

If module M defines an abstract object type T with module-visibility members, then only M can use *T.

@jclark
Collaborator Author

jclark commented Apr 5, 2019

The previous comment but one is not quite correct, since a member of a subtype can have greater visibility than the corresponding member of the supertype (consistent with the substitution principle).

@jclark jclark added this to the 2019R1 milestone Apr 6, 2019
jclark added a commit that referenced this issue Apr 6, 2019
Means private to object type not private to object value. See discussion in issue #43.
@jclark
Collaborator Author

jclark commented Apr 12, 2019

To summarize, an object type T’ is a subtype of an object type T, if and only if for every field and method f of T, there is a field or method f’ of T’, such that

  • f’ has the same name as f
  • f’ is a method or field according as f is a method or field
  • the type of f’ is a subtype of the type of f
  • the region of code within which f’ is visible is a superset of the region of code within which f is visible

The last point is motivated by substitutability. We want it to be possible to substitute T objects with T’ objects. What is needed for this is for T’ to provide at least the level of access to a method or field that T does.
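The four conditions above can be sketched as a small checker. This is a hedged Python illustration under several assumptions: `Member`, `region_includes`, `type_is_subtype` and `is_object_subtype` are hypothetical names, member types are opaque strings with a toy subtype table standing in for the real type relation, and regions are tuples `("public",)`, `("module", M)` or `("object", M, O)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    kind: str      # "field" or "method"
    type: str      # member type, kept as an opaque name in this sketch
    region: tuple  # ("public",), ("module", M) or ("object", M, O)


def region_includes(a, b):
    """True if code region a is a superset of code region b."""
    if a == b or a[0] == "public":
        return True
    return a[0] == "module" and b[0] == "object" and a[1] == b[1]


# Toy stand-in for the real subtype relation on member types.
TOY_SUBTYPES = {("int", "anydata"), ("string", "anydata")}


def type_is_subtype(s, t):
    return s == t or (s, t) in TOY_SUBTYPES


def is_object_subtype(sub, sup):
    """sub and sup are lists of Member. Every member of sup needs a
    same-named, same-kind counterpart in sub whose type is a subtype
    and whose visibility region is at least as large."""
    by_name = {m.name: m for m in sub}
    for f in sup:
        g = by_name.get(f.name)
        if g is None or g.kind != f.kind:
            return False
        if not type_is_subtype(g.type, f.type):
            return False
        if not region_includes(g.region, f.region):
            return False
    return True
```

For example, a type in pkg2 whose `age` field has module visibility is not a subtype of a pkg1 type whose `age` field has module visibility, because `("module", "pkg2")` does not include `("module", "pkg1")`. Making the pkg2 field public would satisfy the region condition.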

Since objects can be members of containers, and subtyping of containers is defined semantically in terms of set relationships between the set of values (shapes) that types denote, we need to define the shape of an object in such a way that the above subtype relationship holds.

We can do this by saying that the shape of an object is a pair of maps <F,M>, where F maps keys to the shapes of fields and M maps keys to the shapes of methods, each of which will be a function shape. In both cases, a key is a pair <R, S>, where R is a code region and S is a string for the name of the field or method. A code region is either a module or an object type descriptor. F or M will have an entry <R, S> if there is a field or method named S that is visible in region R. Thus a field or method declared as public will have a map entry for every code region; one declared as module level will have a map entry for the code region of its module and of every object type descriptor in that module; one declared as private will have a map entry only for the code region of its object type descriptor. An object type is open and includes all shapes that have at least the fields and member functions specified in the type.
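As a rough illustration of the <R, S> keys, the following Python sketch builds the field map F for a list of declared fields and compares two such maps. Everything here is an assumption made for the example: the `UNIVERSE` of modules and object type descriptors is a small fixed table (the real set of code regions is open-ended), field shapes are reduced to type names, and the helper names are hypothetical. Only the F map is modelled; M would be built the same way.

```python
# Hypothetical universe of code regions: module name -> object type
# descriptors declared in that module.
UNIVERSE = {"pkg1": ["T", "S1"], "pkg2": ["S2"]}


def keys_for(name, visibility, mod, otd, universe=UNIVERSE):
    """Return the <R, S> keys for a member called name, declared with the
    given visibility in object type descriptor otd of module mod."""
    if visibility == "public":
        regions = set()
        for m, otds in universe.items():
            regions.add(("module", m))
            regions.update(("object", m, o) for o in otds)
    elif visibility == "module":
        regions = {("module", mod)}
        regions.update(("object", mod, o) for o in universe[mod])
    else:  # private
        regions = {("object", mod, otd)}
    return {(r, name) for r in regions}


def field_map(fields, mod, otd, universe=UNIVERSE):
    """fields is a list of (name, visibility, type_name) triples.
    Returns the map F from <R, S> keys to (simplified) field shapes."""
    out = {}
    for name, visibility, type_name in fields:
        for key in keys_for(name, visibility, mod, otd, universe):
            out[key] = type_name
    return out


def has_at_least(s_map, t_map):
    """Object types are open, so anything with the entries of s_map
    also satisfies t_map when t_map's entries are a subset of s_map's."""
    return t_map.items() <= s_map.items()
```

With this encoding, a module-level field `x` declared in `T` of `pkg1` yields three entries (the module and its two object type descriptors), a public `x` yields an entry for every region and therefore covers them, and two private fields named `x` in different object type descriptors never share a key.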

@pubudu91

Do we need to take the initializer into account for subtyping? The current behaviour is to ignore the initializer.

@jclark
Collaborator Author

jclark commented Apr 26, 2019

You mean the __init method, right? If you can call it explicitly, type safety requires it to be taken into account. At the moment, the spec does not prohibit calling it explicitly.

@jclark
Collaborator Author

jclark commented Apr 29, 2019

I think we have two choices as to how we handle visibility. When S is a subtype of T, and a method or field f occurs in both S and T, we have two choices for the requirement of the visibility regions of f in S and in T, either

  1. its visibility region in S must include its visibility region in T (as above), or
  2. its visibility region in S must equal its visibility region in T

1 allows more subtyping relationships, and is, I believe, what is required for substitutability, but 2 is perhaps easier to understand, and has the advantage that it allows us to approximate nominal typing: if you create an object with a protected field/method, then only objects defined in the same module can be a subtype; similarly, if you create an object with a private field/method, then the only objects that could be a subtype are local objects created by methods of that object.
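The difference between the two choices can be sketched in a few lines of Python. This is only an illustration: `region_includes` and `visibility_ok` are hypothetical names, and regions are written as tuples `("public",)`, `("module", M)` or `("object", M, O)`.

```python
def region_includes(a, b):
    """True if code region a is a superset of code region b."""
    if a == b or a[0] == "public":
        return True
    return a[0] == "module" and b[0] == "object" and a[1] == b[1]


def visibility_ok(s_region, t_region, option):
    """Check the visibility requirement for a member shared by S and T.
    option 1: the region in S must include the region in T.
    option 2: the two regions must be equal."""
    if option == 1:
        return region_includes(s_region, t_region)
    return s_region == t_region


PUBLIC = ("public",)
MOD_M = ("module", "M")
MOD_N = ("module", "N")
PRIV_M_O1 = ("object", "M", "O1")
PRIV_M_O2 = ("object", "M", "O2")

cases = [
    (PUBLIC, MOD_M),         # widened visibility: accepted only by option 1
    (MOD_M, MOD_M),          # same module: accepted by both
    (MOD_N, MOD_M),          # different modules: rejected by both
    (PRIV_M_O2, PRIV_M_O1),  # private in different types: rejected by both
]
results = [(visibility_ok(s, t, 1), visibility_ok(s, t, 2)) for s, t in cases]
```

Under option 2 a member with module visibility can only be matched by a member declared in the same module, and a private member only by one declared in the same object type descriptor, which is what gives the approximation to nominal typing.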

@sanjiva, @sameerajayasoma, @hasithaa Any thoughts?

@sanjiva
Contributor

sanjiva commented Apr 30, 2019

I think the idea that a subtype expands the visibility region is going to cause a lot of confusion to mort programmers.

So I prefer option 2.

@jclark jclark closed this as completed in aab2c8a Apr 30, 2019
@sameerajayasoma
Contributor

sameerajayasoma commented May 8, 2019

+1 for option 2. AFAIR, that is the option that was discussed in an offline discussion with @jclark on 4th of April. It is summarized in this comment. #43 (comment)

@sameerajayasoma
Contributor

Btw, these object subtyping rules give rise to many patterns that are useful in Ballerina programs with multiple Ballerina modules. They are related to the substitutability principle in OOP.

  • Define an object type or an abstract object type T such that an object of T can be substituted by an object of S where S is a subtype of T and S can be defined in any module.
    • Make all the fields and methods public.
  • Define an object type or an abstract object type T such that an object of T can be substituted by an object of S where S is a subtype of T and S is defined in the same module as T.
    • Define at least one field or a method to have module-level visibility.
  • Define an object type T such that an object of T can only be substituted by an object of T.
    • Define at least one field or a method to have private visibility.
    • An abstract object type cannot have a private member.

@jclark
Collaborator Author

jclark commented May 9, 2019

Option 2 is what I put in the spec.
