Type inference experiments. #14510

Merged
merged 1 commit into from
Dec 18, 2023

Conversation

ekpyron
Member

@ekpyron ekpyron commented Aug 22, 2023

This is far from complete, and even the parts that are in here need some reworking, so I'm mainly posting it as a basis for discussion and for splitting up further work.

Notes:

  • The SyntaxRestrictor is preliminary and (probably) incomplete.
  • The DebugWarner is just used for printing the result of type inference as info messages.

Among the issues:

  • The bogus codegen assumes everything is one stack slot - it will need to reintroduce an IRVariable mechanism to account for larger types.
    Also it'd be interesting to explore if and how "size on stack" could be conceptually a construct defined in-language (i.e. a type class property that is merely directly inherited/derived on type declarations; optional - also fine to determine it hard-coded for now).
  • The PR currently introduces several keywords for builtin types and type classes and has a notion of "builtin" and "primitive" types. That has several issues:
    • There's overhead in hardcoding these types/classes.
    • The stdlib concept means that these constructs shouldn't in fact be globally available, but rather explicitly imported.
    • The distinction between primitive/builtin/user-defined constructs introduces overhead on all uses.
      What should rather happen is:
      • Even the most basic primitive types and classes are user-defined.
      • We introduce a way to mark specific type and class declarations as having special semantics (like being "the" word type that may cross the assembly barrier). This means name resolution and type checking (other than for the cases involving special semantics) works as usual, so this immediately yields proper scoping and reduces special-casing. E.g. type word = __builtin("word"); or as separate top-level statement __builtin_type("word", word);
      • Only one type can be registered as any specific builtin type. I.e. type word1 = __builtin("word"); type word2 = __builtin("word"); is invalid.
  • Polymorphic recursion is not actively avoided (this may be implicit in the current handling and just need double-checking or a more involved tracking mechanism).
  • The order of visitation during type inference is important, which allows for introducing issues (e.g. importing into scope and generalizing too soon) and potentially makes error-reporting harder.
  • Potentially the split into TypeRegistration and TypeInference can be avoided - or, maybe, the system can be split up more properly in multiple passes (late unification as in "Generalized Hindley-Milner"? Issues with generalization on importing functions - but can potentially type-check functions (sets of mutually recursing functions?) individually)
  • The introduced syntax is temporary.
  • Longer-term we may need a monomorphization pass during analysis - and could then store more information in monomorphized annotations instead of having more logic during codegen.
  • Explicit free type variables in function signatures have improper scope (function f(x:a) {} works, while function f(x:a) -> a {} fails with undeclared identifier). Potentially we should have explicit syntax for free type variables (like 'a or ?a) or for declaring free types (like function f<a>(x:a) -> a or forall a. function f(x:a) -> a)
  • Obviously a lot of features are missing (below a subset in no particular ordering):
    • Generalized type classes (multiple type variables, etc.) - respectively, extracting the type-class logic from unification (HM(X)-style)
    • Conversions from literals
    • Algebraic data types
    • Function pointers.
    • Type class hierarchy: subclasses (a class that requires a different class, but extends it)
    • static assertions (will probably require a monomorphized analysis pass)
    • ...
  • Obviously, there's also no test coverage.
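
The builtin-registration idea from the list above (user-defined declarations marked as "the" word type, with at most one declaration per builtin) could be enforced with a small registry. The following is only a sketch under stated assumptions: BuiltinRegistry, TypeDeclaration and registerBuiltin are hypothetical names that do not exist in the PR, and the error text is made up.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a user-level declaration such as
// `type word = __builtin("word");`. This is not the compiler's AST node.
struct TypeDeclaration
{
    std::string name;
};

// Hypothetical registry mapping a builtin identifier ("word", ...) to the
// single declaration that claimed it.
class BuiltinRegistry
{
public:
    // Returns std::nullopt on success, or an error message if the builtin
    // has already been claimed by another declaration.
    std::optional<std::string> registerBuiltin(std::string const& _builtin, TypeDeclaration const& _declaration)
    {
        assert(!_builtin.empty());
        auto [it, inserted] = m_builtins.emplace(_builtin, &_declaration);
        if (inserted)
            return std::nullopt;
        return "Builtin \"" + _builtin + "\" is already registered as type \"" + it->second->name + "\".";
    }

    // Returns the declaration registered for a builtin, or nullptr if none.
    TypeDeclaration const* lookup(std::string const& _builtin) const
    {
        auto it = m_builtins.find(_builtin);
        return it == m_builtins.end() ? nullptr : it->second;
    }

private:
    std::map<std::string, TypeDeclaration const*> m_builtins;
};
```

With this shape, name resolution and scoping of word stay ordinary; only the lookup of the special semantics goes through the registry, and the second of two competing registrations is reported as an error.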

Infrastructure issues:

  • AST import/export test scripts need to exclude experimental solidity tests.

Further notes for future development: https://notes.ethereum.org/_OSmtx9aQAOHQXwa60IDsQ

Comment on lines 155 to 157
m_typeConstructors.at(m_primitiveTypeConstructors.at(PrimitiveType::TypeFunction).m_index).arities = {Arity{std::vector<Sort>{{typeSort},{typeSort}}, classType}};
m_typeConstructors.at(m_primitiveTypeConstructors.at(PrimitiveType::Function).m_index).arities = {Arity{std::vector<Sort>{{typeSort, typeSort}}, classType}};
m_typeConstructors.at(m_primitiveTypeConstructors.at(PrimitiveType::Function).m_index).arities = {Arity{std::vector<Sort>{{typeSort, typeSort}}, classType}};
Member

I guess this one should be Itself too as in #14510 (comment)?

Suggested change
- m_typeConstructors.at(m_primitiveTypeConstructors.at(PrimitiveType::TypeFunction).m_index).arities = {Arity{std::vector<Sort>{{typeSort},{typeSort}}, classType}};
- m_typeConstructors.at(m_primitiveTypeConstructors.at(PrimitiveType::Function).m_index).arities = {Arity{std::vector<Sort>{{typeSort, typeSort}}, classType}};
- m_typeConstructors.at(m_primitiveTypeConstructors.at(PrimitiveType::Function).m_index).arities = {Arity{std::vector<Sort>{{typeSort, typeSort}}, classType}};
+ m_typeConstructors.at(m_primitiveTypeConstructors.at(PrimitiveType::TypeFunction).m_index).arities = {Arity{std::vector<Sort>{{typeSort},{typeSort}}, classType}};
+ m_typeConstructors.at(m_primitiveTypeConstructors.at(PrimitiveType::Function).m_index).arities = {Arity{std::vector<Sort>{{typeSort, typeSort}}, classType}};
+ m_typeConstructors.at(m_primitiveTypeConstructors.at(PrimitiveType::Itself).m_index).arities = {Arity{std::vector<Sort>{{typeSort, typeSort}}, classType}};

Comment on lines +98 to +102
size_t arguments() const
{
    solAssert(!arities.empty());
    return arities.front().argumentSorts.size();
}
Member

Is the first element in arities meant to be always PrimitiveClass::Type? Can we assert that? If not, what would a different type class on the first position mean?

Can a type constructor have more than one PrimitiveClass::Type in arities? If so, what would that mean?

If there's always exactly one, then it seems to me that we should have a separate field for it and not just lump it in with other things in arities. It seems different in some ways and it's even excluded by some checks.

Member

@cameel cameel Sep 29, 2023

Oh, and by the way calling this thing Arity confused me a lot. Looking at other code I assumed it was just something describing the number of args, but it looks like an equivalent of TypeConstant but for type classes. I'm only now noticing that it has a type class inside, and TypeSystem::instantiateClass() makes more sense (it was a bit weird that you'd not need to say which class you're instantiating).

Can we come up with a more descriptive name for it?

Member Author

That's what these things are called :-). If you can come up with a better name, feel free to suggest it.
For a type constructor, an arity states that applying the constructor to type arguments of specific sorts (the argument sorts in the arity) yields a result that has a specific class (the class in the arity).
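
To make that reading of Arity concrete, here is a minimal sketch. It assumes heavily simplified stand-ins: classes are plain strings, a Sort is just a set of classes, and satisfies and resultHasClass are hypothetical helpers rather than functions from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Simplified stand-ins; the real TypeSystem uses indices and richer structures.
using TypeClass = std::string;

struct Sort
{
    std::set<TypeClass> classes;
};

// "Applying the constructor to arguments of these sorts yields a member of typeClass."
struct Arity
{
    std::vector<Sort> argumentSorts;
    TypeClass typeClass;
};

struct TypeConstructorInfo
{
    std::string name;
    std::vector<Arity> arities;

    // Number of type arguments, read off the first arity as in the quoted code.
    std::size_t arguments() const
    {
        assert(!arities.empty());
        return arities.front().argumentSorts.size();
    }
};

// True if _actual provides every class that _required asks for.
inline bool satisfies(Sort const& _actual, Sort const& _required)
{
    for (TypeClass const& typeClass: _required.classes)
        if (!_actual.classes.count(typeClass))
            return false;
    return true;
}

// Does applying _constructor to arguments of the given sorts yield a member of _class?
inline bool resultHasClass(
    TypeConstructorInfo const& _constructor,
    std::vector<Sort> const& _argumentSorts,
    TypeClass const& _class
)
{
    for (Arity const& arity: _constructor.arities)
    {
        if (arity.typeClass != _class || arity.argumentSorts.size() != _argumentSorts.size())
            continue;
        bool allMatch = true;
        for (std::size_t i = 0; i < _argumentSorts.size(); ++i)
            allMatch = allMatch && satisfies(_argumentSorts[i], arity.argumentSorts[i]);
        if (allMatch)
            return true;
    }
    return false;
}
```

In this sketch a Vector constructor with the arities ({Type} -> Type) and ({Type, Number} -> Container) is a Container only when its argument is known to be a Number, which is the sense in which an arity is more than a count of arguments.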

Comment on lines +253 to +268
Sort baseSort{{primitiveClass(PrimitiveClass::Type)}};
size_t index = m_typeConstructors.size();
m_typeConstructors.emplace_back(TypeConstructorInfo{
    _name,
    _canonicalName,
    {Arity{std::vector<Sort>{_arguments, baseSort}, primitiveClass(PrimitiveClass::Type)}},
    _declaration
});
TypeConstructor constructor{index};
if (_arguments)
{
    std::vector<Sort> argumentSorts;
    std::generate_n(std::back_inserter(argumentSorts), _arguments, [&](){ return Sort{{primitiveClass(PrimitiveClass::Type)}}; });
    std::vector<Type> argumentTypes;
    std::generate_n(std::back_inserter(argumentTypes), _arguments, [&](){ return freshVariable({}); });
    auto error = instantiateClass(type(constructor, argumentTypes), Arity{argumentSorts, primitiveClass(PrimitiveClass::Type)});
    solAssert(!error, *error);
}
Member

I don't get the purpose of instantiateClass() call here. Unless I'm missing something, it will simply insert a second, identical copy of PrimitiveClass::Type arity into arities and with how it's called, validations inside can never fail (we're calling it on a type constant and using matching numbers of args).

Is this bit just redundant? If so, why is it done only when there is at least one argument?

Member Author

@ekpyron ekpyron Oct 2, 2023

The system currently has a primitive class Type that contains all types (ultimately, we may not need that, actually, but currently at least it's there). When you declare a new type constructor, you need to clarify that it, applied to arguments of Type sorts gives you something of class Type - that doesn't happen automatically.

Member

@cameel cameel Oct 4, 2023

But why do we need two identical copies of that class in arities?

Member Author

Ah, yeah, we don't :-) - we should probably not add it manually in here and only let instantiateClass handle it, I need to read through it properly myself again :-).
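
One way the outcome of this thread could look is sketched below: the declaration starts with no arities and registers the Type arity exactly once, through instantiateClass only. Everything here is an assumption made for illustration. TypeSystemSketch is hypothetical, argument sorts are reduced to a plain count, and the registration is done unconditionally (also for zero arguments), which the PR does not do.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-ins for the PR's types (all hypothetical).
struct Arity
{
    std::size_t argumentCount; // stands in for the vector of argument sorts
    std::string typeClass;
};

struct TypeConstructorInfo
{
    std::string name;
    std::size_t arguments;
    std::vector<Arity> arities;
};

class TypeSystemSketch
{
public:
    // Declares a constructor and records "applied to Type arguments, the
    // result is a Type" once, via instantiateClass() rather than by hand.
    std::size_t declareTypeConstructor(std::string _name, std::size_t _arguments)
    {
        m_typeConstructors.push_back(TypeConstructorInfo{std::move(_name), _arguments, {}});
        std::size_t index = m_typeConstructors.size() - 1;
        auto error = instantiateClass(index, Arity{_arguments, "Type"});
        assert(!error);
        return index;
    }

    // Validates the arity against the constructor and appends it.
    // Returns an error message if the argument count does not match.
    std::optional<std::string> instantiateClass(std::size_t _constructor, Arity _arity)
    {
        TypeConstructorInfo& info = m_typeConstructors.at(_constructor);
        if (_arity.argumentCount != info.arguments)
            return "Invalid arity.";
        info.arities.push_back(std::move(_arity));
        return std::nullopt;
    }

    TypeConstructorInfo const& info(std::size_t _constructor) const
    {
        return m_typeConstructors.at(_constructor);
    }

private:
    std::vector<TypeConstructorInfo> m_typeConstructors;
};
```

After declareTypeConstructor, each constructor in this sketch carries a single Type arity regardless of its argument count, so there is no second, identical copy to explain.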

{
SecondarySourceLocation ssl;
ssl.append("Previous instantiation.", instantiation->second->location());
m_errorReporter.typeError(6620_error, _typeClassInstantiation.location(), ssl, "Duplicate type class instantiation.");
Member

@cameel cameel Sep 29, 2023

Also, I find it odd that you can constrain type arguments of instantiations and we can distinguish instantiations with different constraints (they'll have different Arity), but you can't instantiate multiple times.

For example:

pragma experimental solidity;

type Vector(Item);

class Self: Container {}

class Self: Number {}
class Self: String {}

instantiation Vector(Item: Number): Container {}
instantiation Vector(Item: String): Container {}
TypeError 6620: (171-219): Duplicate type class instantiation.

Is this something we're going to allow in the future? Does it just require support for parameterized type classes? Otherwise I don't see much point in allowing those constraints in instantiations.

I'm also generally unsure at what places we're going to allow constraints in the first place, because the prototype is pretty inconsistent about that. It disallows some things I'd expect to be possible (e.g. type Vector(T: Number);) while allowing some wild things (e.g. let x: T: Number: Number: Number;). It's incomplete so it's understandable, but it makes it hard to figure out what is supposed to be valid eventually and what not.

Member Author

@ekpyron ekpyron Oct 2, 2023

This is generally impossible. How would you distinguish between the two instantiations? I.e. how would you ever be able to tell which is the right one? There can only ever be a unique instantiation for any type constructor.

In situations where you'd want to do this, you'd instead move the specific logic to a new type class for the argument, i.e. have instantiation Vector(Item: NumberOrString): Container {} and define NumberOrString appropriately. We currently don't have that in the PR, but conceptually it's possible to declare Number and String as subclasses of NumberOrString by implementing the interface of NumberOrString with what's available for Number or String. In general, though, the current PR doesn't properly handle a dependency/inheritance hierarchy of type classes yet, partially because this would need to change anyway once we generalize them.
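
The uniqueness argument can be sketched as a lookup table keyed only by the class and the head type constructor. This is an illustration under assumptions, not the PR's data structure: InstantiationTable, add and resolve are hypothetical names, and classes and constructors are plain strings.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using TypeClass = std::string;
using TypeConstructor = std::string;
using Sort = std::set<TypeClass>; // classes an argument is required to belong to

struct Instantiation
{
    std::vector<Sort> argumentSorts;
};

class InstantiationTable
{
public:
    // Returns false if (class, constructor) already has an instantiation,
    // no matter which argument sorts the new one asks for.
    bool add(TypeClass const& _class, TypeConstructor const& _constructor, std::vector<Sort> _argumentSorts)
    {
        assert(!_class.empty() && !_constructor.empty());
        return m_instantiations.emplace(
            std::make_pair(_class, _constructor),
            Instantiation{std::move(_argumentSorts)}
        ).second;
    }

    // Resolution sees only the class and the head constructor of the type,
    // so there is nothing it could use to choose between two candidates.
    Instantiation const* resolve(TypeClass const& _class, TypeConstructor const& _constructor) const
    {
        auto it = m_instantiations.find({_class, _constructor});
        return it == m_instantiations.end() ? nullptr : &it->second;
    }

private:
    std::map<std::pair<TypeClass, TypeConstructor>, Instantiation> m_instantiations;
};
```

Under this model the argument sorts are a requirement checked after the instantiation has been found, not part of the key, which is why a second instantiation for Vector and Container is reported as a duplicate.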

Member Author

> I'm also generally unsure at what places we're going to allow constraints in the first place, because the prototype is pretty inconsistent about that. It disallows some things I'd expect to be possible (e.g. type Vector(T: Number);) while allowing some wild things (e.g. let x: T: Number: Number: Number;). It's incomplete so it's understandable, but it makes it hard to figure out what is supposed to be valid eventually and what not.

Type definitions are general and cannot be constrained to arguments of specific type classes (there's no use for doing that). It's only type class instantiations that can depend on specific argument sorts. So that part is intentional.

Stuff like let x:T:Number:Number:Number; - if that's really allowed - is just laziness so far. Remember that this is experimental and not yet meant to be perfect - it will have to change anyways, once we generalize type classes to more than one type argument.

Comment on lines +160 to +161
experimental::Type TypeSystem::freshVariable(Sort _sort)
{
    size_t index = m_numTypeVariables++;
    return TypeVariable(index, std::move(_sort));
}
Member

We increase the number first so type variable IDs start at 1 rather than 0. Is that intentional? If not, it would be more consistent with other IDs in the AST to start at 0.

while (index /= 26)
varName += static_cast<char>('a' + (index%26));
reverse(varName.begin(), varName.end());
stream << '\'' << varName;
Member

Is there an established convention for naming fixed free type variables? Is prefixing them with " a good idea?

Member Author

Change generic ones to be prefixed by ? instead of ' and use ' for fixed ones.
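
For reference, a self-contained sketch of such a naming helper, following the prefix convention from the reply above (? for generic variables, ' for fixed ones). typeVariableName is a hypothetical function, and the letter scheme (0 -> a, 25 -> z, 26 -> aa) is an assumption that is not necessarily identical to what the quoted snippet produces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: renders a type variable index as a letter name
// (0 -> a, 25 -> z, 26 -> aa, ...), prefixed by '?' for generic variables
// and by '\'' for fixed ones.
inline std::string typeVariableName(std::size_t _index, bool _fixed)
{
    std::string varName;
    // Bijective base-26: emit the least significant letter first, then reverse.
    std::size_t remaining = _index + 1;
    while (remaining > 0)
    {
        --remaining;
        varName += static_cast<char>('a' + (remaining % 26));
        remaining /= 26;
    }
    assert(!varName.empty());
    std::reverse(varName.begin(), varName.end());
    return std::string(1, _fixed ? '\'' : '?') + varName;
}
```

Keeping the prefix choice in one place makes it cheap to change the convention again if a different marker turns out to read better in error messages.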

Comment on lines +166 to +167
experimental::Type TypeSystem::freshTypeVariable(Sort _sort)
{
    _sort.classes.emplace(primitiveClass(PrimitiveClass::Type));
    return freshVariable(_sort);
}
Member

What does a fresh non-type variable even mean? Does it have any sensible interpretation in the type system?

The only meaningful use of freshVariable() on its own currently seems to be when we're declaring the type primitive class. It seems to me that it should be kept as an internal helper, especially given that the naming is confusing: it's not immediately clear what the difference between a "variable" and a "type variable" is.

Member Author

Higher-kinded type variables would be examples - but we don't have that implemented and potentially won't.

Member

What about naming then? Can we have something better than freshVariable()?

@cameel
Member

cameel commented Nov 1, 2023

Removing the commit with a hack that disabled codegen in syntax tests before I merge #14660.

This pull request is stale because it has been open for 14 days with no activity.
It will be closed in 7 days unless the stale label is removed.

@github-actions github-actions bot added the stale label Nov 16, 2023
@cameel cameel added the must have label and removed the stale label Nov 16, 2023
@ekpyron
Member Author

ekpyron commented Dec 6, 2023

Ok, there seems to be some difference in bytecode comparison's handling of unimplemented feature errors between cli and standard json - and something in soltest needs to deal with them more gracefully, but it doesn't look like much is left

Well, and the error code test coverage of course needs to ignore this (we need to see if there's a better way than adding all of them to the exclusion list, but if need be we can do that)

@nikola-matic
Collaborator

nikola-matic commented Dec 7, 2023

> Ok, there seems to be some difference in bytecode comparison's handling of unimplemented feature errors between cli and standard json - and something in soltest needs to deal with them more gracefully, but it doesn't look like much is left
>
> Well, and the error code test coverage of course needs to ignore this (we need to see if there's a better way than adding all of them to the exclusion list, but if need be we can do that)

We also have to deal with uncovered error codes somehow, and I'm not sure that ignoring them is the best option; should likely have at least some basic test coverage for them.

Also, the functionDependencyGraph tests should be excluded from the AST JSON tests, since we don't allow import/export of experimental ASTs.

@nikola-matic
Collaborator

On a side note: why are we disallowing this in experimental mode? We can't have semantic tests without it, yet we already have working tests, so all this assert did was essentially disallow previously working functionality.
Removing this assert allows the experimental tests to pass, and basically fixes all of the failing *soltest steps.

@cameel
Member

cameel commented Dec 11, 2023

> why are we disallowing this in the experimental mode

Yeah, I see no reason to disable it. When reviewing #14659 I assumed that all the asserted outputs are unusable due to missing annotation and either segfault or produce broken output. If they are not, we should re-enable them.

BTW, does metadata output work too? If it does, that would also solve the bytecode comparison problem.

> Ok, there seems to be some difference in bytecode comparison's handling of unimplemented feature errors between cli and standard json - and something in soltest needs to deal with them more gracefully, but it doesn't look like much is left

It's not different handling between CLI and StandardJSON. It's actually the lack of different handling. CLI and StandardJSON behave differently with regard to what outputs they produce, as I already reported in #13925. For example, StandardJSON may give you metadata even in the presence of codegen ICEs. I still think that the right way to solve this is to make these two interfaces behave consistently. But it can of course also be solved by hard-coding the report generator to ignore metadata in some cases. That's messier though.

> Well, and the error code test coverage of course needs to ignore this (we need to see if there's a better way than adding all of them to the exclusion list, but if need be we can do that)

I think we should just add them all to the exclusion list. That will force us to later do a proper pass of fixing error handling and adding proper coverage for them rather than glossing over it just to get the script to shut up. Right now the prototype is not complete enough for us to be able to cover it properly (unstable syntax, testing corner cases very often triggers ICEs). If we want to avoid forgetting about this, we could add an issue for it already or list it in Experimental Solidity quirks to deal with eventually.

@cameel
Member

cameel commented Dec 12, 2023

I made a quick checklist here to track remaining work on this: #14729.

Co-authored-by: Kamil Śliwak <[email protected]>
Co-authored-by: Matheus Aguiar <[email protected]>
Co-authored-by: Nikola Matic <[email protected]>
@ekpyron ekpyron marked this pull request as ready for review December 18, 2023 16:07
Member Author

@ekpyron ekpyron left a comment


I actually just wanted to leave these comments for future reference and then still approve and merge, but I cannot actually approve, since I'm the author :-).

Comment on lines +1094 to +1095
// TODO: consider still asserting unless we are in experimental solidity.
// solAssert(m_typeName, ""); solAssert(!m_typeExpression, "");
Member Author

We should note this down somewhere, or rather take care of it sooner rather than later.

{
solAssert(_kind == Token::Constructor || _kind == Token::Function || _kind == Token::Fallback || _kind == Token::Receive, "");
solAssert(isOrdinary() == !name().empty(), "");
// TODO: assert _returnParameters implies non-experimental _experimentalReturnExpression implies experimental
Member Author

Would also be nice to clean up soon. Unfortunately a bit of a hassle to pass the information whether the AST is experimental down to these...

@@ -2138,7 +2152,8 @@ class BinaryOperation: public Expression
):
Expression(_id, _location), m_left(std::move(_left)), m_operator(_operator), m_right(std::move(_right))
{
solAssert(TokenTraits::isBinaryOp(_operator) || TokenTraits::isCompareOp(_operator), "");
// TODO: assert against colon for non-experimental solidity
Member Author

Another one that'd be nice to get rid of.

Comment on lines +173 to +180
return std::apply([&](auto... _indexTuple) {
    return ([&](auto&& _step) {
        for (auto source: _sourceUnits)
            if (!_step.analyze(*source))
                return false;
        return true;
    }(std::tuple_element_t<decltype(_indexTuple)::value, AnalysisSteps>{*this}) && ...);
}, makeIndexTuple(std::make_index_sequence<std::tuple_size_v<AnalysisSteps>>{}));
Member Author

We actually kept this until now, haha :-D, well, now we merge it.
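
For readers who find the quoted fold hard to parse, the same idea can be sketched in a flatter form: construct each step in order, run it over all source units, and stop at the first failure. This variant swaps the tuple and index-sequence machinery for a plain parameter pack, and every name in it (AnalysisSketch, SourceUnit, SyntaxCheck, Inference, runStep, runSteps) is a hypothetical stand-in rather than code from the PR.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for a source unit and the analysis driver.
struct SourceUnit
{
    std::string name;
    bool wellFormed = true;
};

struct AnalysisSketch
{
    std::vector<std::string> log;
};

// Each step is constructed from the driver and exposes analyze().
struct SyntaxCheck
{
    explicit SyntaxCheck(AnalysisSketch& _analysis): m_analysis(_analysis) {}
    bool analyze(SourceUnit const& _source)
    {
        m_analysis.log.push_back("syntax:" + _source.name);
        return _source.wellFormed;
    }
    AnalysisSketch& m_analysis;
};

struct Inference
{
    explicit Inference(AnalysisSketch& _analysis): m_analysis(_analysis) {}
    bool analyze(SourceUnit const& _source)
    {
        m_analysis.log.push_back("infer:" + _source.name);
        return true;
    }
    AnalysisSketch& m_analysis;
};

// Runs one step type over all sources, stopping at the first failure.
template<typename Step>
bool runStep(AnalysisSketch& _analysis, std::vector<SourceUnit> const& _sources)
{
    Step step{_analysis};
    for (SourceUnit const& source: _sources)
    {
        assert(!source.name.empty()); // sketch-level sanity check only
        if (!step.analyze(source))
            return false;
    }
    return true;
}

// Runs all steps in declaration order. The && fold short-circuits, so later
// steps are neither constructed nor run once an earlier step has failed.
template<typename... Steps>
bool runSteps(AnalysisSketch& _analysis, std::vector<SourceUnit> const& _sources)
{
    return (runStep<Steps>(_analysis, _sources) && ...);
}
```

A call such as runSteps<SyntaxCheck, Inference>(analysis, sources) runs SyntaxCheck over every source before Inference starts, which matches the step-major order of the quoted code.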

// TODO move after step introduced in https://github.com/ethereum/solidity/pull/14578, but before TypeInference
FunctionDependencyAnalysis,
TypeInference,
DebugWarner
Member Author

Not sure we should keep this here on develop

}
}

// TODO: clean up rational parsing
Member Author

Yeah, all this is half-dead and duplicated code that won't stay like this at all, but well, works for now...


if (!m_activeInstantiations.empty())
{
// TODO: This entire logic is superfluous - I thought mutually recursive dependencies between
Member Author

Haha, we should also get rid of this quickly, before anyone thinks it is actually necessary.

using namespace solidity::frontend;
using namespace solidity::frontend::experimental;

/*std::optional<TypeConstructor> experimental::typeConstructorFromTypeName(Analysis const& _analysis, TypeName const& _typeName)
Member Author

Those comments we could also have cleaned out still :-).

@r0qs
Member

r0qs commented Dec 18, 2023

There is already a PR with the workaround to the failing prb-math test.

@ekpyron ekpyron merged commit ed52376 into develop Dec 18, 2023
64 of 65 checks passed
@ekpyron ekpyron deleted the newAnalysis branch December 18, 2023 20:15
Labels
experimental, must have (Something we consider an essential part of Solidity 1.0.)