feature: all you want for queries #1070
@monperrus <https://github.com/monperrus> does it make sense to
create a new P
Yes, go ahead.
We will first extensively discuss the design and contracts (javadoc) of:
* the new methods in existing interfaces
* the new interfaces (if any)
…--Martin
|
The main requirement not supported by #1088 is the possibility to create a query whose input element is not initialized, so that this query can be used as part of another query. Then we can build and use chains of queries, model analyses and refactorings. |
The main requirement which is not supported is the possibility to
create the query which has *not initialized input element*
What will be the algorithm / strategy used to infer the input element?
|
It will come from the previous step of the query chain. |
I guess it helps if I explain the big picture of the problem I would like to solve using Spoon queries. We have an old Java project where nearly every method is polluted with a project-specific bad design pattern. I want to get rid of that design pattern. It is easy to locate it, but it is harder to replace it, and even harder to clean up all the useless code that was written just to fulfill the needs of that bad pattern. So I need an algorithm which does something like this:
I know that it is a big challenge and that the algorithm will be more complicated than this raw sketch, but maybe you now understand why I need an interface like:
interface CtQueryStep {
    void evaluate(Object input, Consumer<Object> outputConsumer);
}
This interface and function let me build all the queries/filters/refactorings I need in an efficient way. Maybe I will fail and the project will stay as dirty as it is, but I like that game :-) ... except when Martin deletes my code ;-))) |
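For readers without Spoon at hand, the following plain-Java sketch illustrates the idea behind the proposed CtQueryStep: a step does not return a collection, it pushes each result into the consumer it receives. QueryStep and CAPITALIZED_WORDS are hypothetical names invented for this illustration; the interface discussed in this thread works on Spoon model elements, not on strings.

```java
import java.util.function.Consumer;

public class QueryStepSketch {
    // Hypothetical stand-in for the proposed CtQueryStep: instead of returning a
    // collection, a step pushes each result into the consumer it is given.
    interface QueryStep<I, O> {
        void evaluate(I input, Consumer<O> outputConsumer);
    }

    // Hypothetical example step: emits each capitalized word of a sentence
    // as soon as it is found, without building a result list.
    static final QueryStep<String, String> CAPITALIZED_WORDS = (sentence, out) -> {
        for (String word : sentence.split("\\s+")) {
            if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))) {
                out.accept(word);
            }
        }
    };

    public static void main(String[] args) {
        // Prints Martin, Spoon and API, each on its own line.
        CAPITALIZED_WORDS.evaluate("Martin reviews the Spoon query API", System.out::println);
    }
}
```

The caller decides what happens to each result (print it, count it, collect it), which is what makes the step usable both for listless processing and for building a list when one is really needed.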
But not for complex cases with many thousands of intermediate results.
Is your problem a pure performance problem? Or do you need an additional
feature?
As far as I understand, you would like a pipeline-based approach?
|
No, we already have a pipeline in CtQueryImpl.
Yes, but not only performance. See this example. In Spoon there are many places where the model Scanner is used. In all of them the code inherits from the base Scanner and hooks on |
One thing I still don't completely understand is the link between:
* scope-based queries
* pipeline architecture
To me, they seem to be two distinct problems. Could we handle them
separately one after the other?
|
Yes, we can handle them separately, but in this order:
After lazy result sending is finished, nothing more is needed, because scope-based queries and many other kinds of model analysis and refactoring algorithms can easily be implemented without any extra interfaces or changes in the Spoon core. |
OK. I renamed this issue accordingly. Now we focus the discussion on pipelines. Does your current step design in QueryImpl support pipelines? Is it only a matter of making CtConsumer public? |
No, it is not enough. I need
Then, for example, a scope-based query can be implemented like this:
query.map(new OverridenMethodQuery())...
class OverridenMethodQuery implements CtQueryStep {
    void evaluate(CtElement input, Consumer out) {
        input.getFactory().getModel().getRootPackage().filterChildren(new OverridenMethodFilter(input)).forEach(out);
    }
}
By the way, I have just learned about the Pipeline and Chain of Responsibility design patterns, and I see that we shouldn't call this PR "pipeline". I have not found the name of the design pattern I need. |
And the current CtQuery does not support forEach(Consumer), so instead of
in the example above, the client actually has to write
|
And this is the final solution, with optimized performance:
query.map(new OverridenMethodQuery())...
class OverridenMethodQuery implements CtQueryStep {
    // the filter and the query are created once and reused for every input
    OverridenMethodFilter filter = new OverridenMethodFilter();
    CtQuery query = new CtQueryImpl().filterChildren(filter);
    void evaluate(CtElement input, Consumer out) {
        filter.setMethod(input);
        query.evaluate(input.getFactory().getModel().getRootPackage(), out);
    }
} |
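The optimization in the example above is to allocate the filter and the query once and only reconfigure the filter for each input. A rough, Spoon-free sketch of that reuse pattern follows; NameFilter, PrefixStep and the list of strings are hypothetical stand-ins for OverridenMethodFilter, OverridenMethodQuery and the scan of the root package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class ReusableStepSketch {
    // Hypothetical filter whose target can be changed between evaluations,
    // mirroring filter.setMethod(input) in the example above.
    static class NameFilter implements Predicate<String> {
        private String prefix;

        void setPrefix(String prefix) { this.prefix = prefix; }

        @Override
        public boolean test(String candidate) { return candidate.startsWith(prefix); }
    }

    // The filter is built once and reused for every input.
    // Because the step keeps mutable state, it is NOT safe to share between threads.
    static class PrefixStep {
        private final NameFilter filter = new NameFilter();
        private final List<String> universe;

        PrefixStep(List<String> universe) { this.universe = universe; }

        void evaluate(String input, Consumer<String> out) {
            filter.setPrefix(input);          // reconfigure instead of re-allocating
            for (String s : universe) {       // stands in for scanning the root package
                if (filter.test(s)) {
                    out.accept(s);
                }
            }
        }
    }

    public static void main(String[] args) {
        PrefixStep step = new PrefixStep(Arrays.asList("getA", "getB", "setA", "toString"));
        List<String> results = new ArrayList<>();
        step.evaluate("get", results::add);
        step.evaluate("set", results::add);
        System.out.println(results); // [getA, getB, setA]
    }
}
```

The trade-off is that the saved allocations come at the cost of shared mutable state, so a single step instance should not be used concurrently without extra synchronization.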
To me, it seems that lazy evaluation gives you pipelining for free, with no change in the API:
Instead, we can have something like:
WDYT? |
I think we have been talking about different things all along. |
Please forget about pipelines. It was a pure misunderstanding. We already have a pipeline in CtQueryImpl. I have updated my answers above and changed the name of this issue. I will now try to explain the problem. There is really no feature missing in CtQueryImpl. What is missing is support for processing results in a lazy way. (I do not know the correct design pattern name for that, so do not focus on the name "lazy result".) Have a look at these 2 examples:
query.map(new OverridenMethodQuery())...
class OverridenMethodQuery implements CtFunction<CtMethod, List<CtMethod>> {
    List<CtMethod> evaluate(CtMethod input) {
        return input.getFactory().getModel().getRootPackage().filterChildren(new OverridenMethodFilter(input)).list();
    }
}
query.map(new OverridenMethodQuery())...
class OverridenMethodQuery implements CtQueryStep {
    void evaluate(CtElement input, Consumer out) {
        input.getFactory().getModel().getRootPackage().filterChildren(new OverridenMethodFilter(input)).forEach(out);
    }
}
Both examples give exactly the same results, but the second one is faster and uses less memory when OverridenMethodQuery produces 1,000,000 results. |
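To make this comparison concrete without depending on Spoon, here is a minimal sketch of the two delivery styles: a function that returns a list and a step that pushes results to a consumer. All names are hypothetical, and the integer range only stands in for a query that yields many elements. The fromFunction adapter shows that a list-returning function can always be wrapped as a step, so the two styles differ only in how results reach the next step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class FunctionVsStepSketch {
    // Hypothetical push-style step, mirroring the proposed CtQueryStep.
    interface Step<I, O> {
        void evaluate(I input, Consumer<O> out);
    }

    // Style 1: a function that materializes all results in a list first.
    static final Function<Integer, List<Integer>> RANGE_AS_LIST = n -> {
        List<Integer> result = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            result.add(i);
        }
        return result;
    };

    // Style 2: a step that pushes each result as soon as it is computed.
    static final Step<Integer, Integer> RANGE_AS_STEP = (n, out) -> {
        for (int i = 1; i <= n; i++) {
            out.accept(i);
        }
    };

    // A list-returning function can always be adapted to a step; only the
    // way results are delivered to the next step differs.
    static <I, O> Step<I, O> fromFunction(Function<I, List<O>> f) {
        return (input, out) -> f.apply(input).forEach(out);
    }

    public static void main(String[] args) {
        long[] sum = {0};
        // The step version aggregates 1,000,000 results without ever
        // holding them all in memory at once.
        RANGE_AS_STEP.evaluate(1_000_000, x -> sum[0] += x);
        System.out.println(sum[0]); // 500000500000
    }
}
```

Running the same aggregation through fromFunction(RANGE_AS_LIST) gives the same sum, but only after a list of one million boxed integers has been allocated.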
Do you mean that if the final result of |
yes it is
|
It is not a performance problem if you run it once. But I see the query as a core feature of analyses, refactorings and TemplateMatcher, and I think it will be evaluated many, many times... and then allocating useless lists is a bad idea. |
One solution is that
with no useless list involved. |
I think it is a very difficult task to implement Iterator iter = e.filterChildren(x).iterator() without a helper list, because of the way the CtScanner.enter method is called. Show me if you think otherwise! interface CtQueryStep {
    void evaluate(Object input, Consumer<Object> outputConsumer);
} ??? Why are you trying to invent a different solution??? Note: I already had a solution for this issue in #1018, and I still think it is the most efficient one. It needs only a little code in the Spoon core ( void evaluate(CtElement input, Consumer out) {
    input.getFactory().getModel().getRootPackage().filterChildren(new OverridenMethodFilter(input)).forEach(out);
} it is the fastest and has minimal memory overhead. Why don't you like it? Why are you trying to invent something else? Is it another misunderstanding? Did you understand my solution? Yes, I see some problems and possible improvements in the original #1018 (before you deleted support for lazy result sending) too. But the core idea is OK. I suggest we discuss these problems. I am sure we can find elegant solutions for them. At the beginning you wrote:
Who is "we"? Maybe I am just too active, and you just need time to discuss it? Is this discussion helpful for you? |
This discussion is very helpful to understand the interesting problems you raise and find a good solution. Thanks a lot for it, Pavel!
Oh no, we're just trying to identify the simplest and most intuitive solution. And a standard Java iterator is such a simple solution.
OK, but conceptually is this what we need? Is this what we would like to write? (I'd say yes)
Could you elaborate on what you mean by helper list in this context? |
Thanks for confirming that we are on a good path ;-)
I am sure. No, because such an implementation is really very difficult. Just try it; that is better than any further discussion. I can only imagine one solution: running a helper thread with some thread synchronization. Such a solution is complicated, slow and resource-consuming. Maybe you know a better solution. Then show it to me and we can discuss whether it is better than
I mean this: e.filterChildren(x).iterator()
can be written as
e.filterChildren(x).list().iterator(), which is the solution that creates a helper list with 1M items. |
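The helper-list argument can be illustrated with a toy tree. The recursive scan below mimics how a scanner pushes elements out while the traversal state lives on the call stack; the straightforward way to offer a pull-based Iterator on top of it is to collect everything into a list first. Node, scan and iterator are hypothetical names. For a plain tree one could also write an iterator with an explicit stack, but that means re-implementing the traversal instead of reusing an existing recursive visitor such as CtScanner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

public class ScannerIterationSketch {
    // Hypothetical tree node standing in for a Spoon model element.
    static class Node {
        final String name;
        final List<Node> children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = Arrays.asList(children);
        }
    }

    // Push-based, recursive visit in the style of a scanner: the traversal
    // state lives on the call stack, so results can only be pushed out.
    static void scan(Node node, Consumer<Node> out) {
        out.accept(node);
        for (Node child : node.children) {
            scan(child, out);
        }
    }

    // Turning that push-based visit into a pull-based Iterator the simple way
    // requires a helper list holding every visited node before iteration starts.
    static Iterator<Node> iterator(Node root) {
        List<Node> helper = new ArrayList<>();
        scan(root, helper::add);
        return helper.iterator();
    }

    public static void main(String[] args) {
        Node root = new Node("pkg", new Node("A", new Node("m1")), new Node("B"));
        scan(root, n -> System.out.print(n.name + " "));   // pkg A m1 B
        System.out.println();
        for (Iterator<Node> it = iterator(root); it.hasNext(); ) {
            System.out.print(it.next().name + " ");        // same order, but via a full copy
        }
        System.out.println();
    }
}
```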
We all agree on this.
I mean from a pure API viewpoint. Assuming this would be easy to implement, would it be what we want?
? So having |
Yes, because Spoon's CtScanner, which will be the core of most future queries, works this way. It is not possible to iterate over the child elements visited by CtScanner, and it is not efficient to collect the child elements in a long list. The most compatible with
Do you mean to implement
List result = e.map(..some CtFunction lambda ...).foreach(... some CtQueryStep based query ...).foreach(... some other CtQueryStep based query ...).list();
for (Object item : result) {
    ... a code which consumes result of query
}
? e.map(..some CtFunction lambda ...).map(... some CtQueryStep based query ...).map(... some other CtQueryStep based query ...).forEach(... a code which consumes result of query...)
Note that the second example has exactly the same behavior. |
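One way to see why the map(...).map(...).forEach(...) form needs no intermediate lists is to build the chain out of nested consumers. The sketch below is a hypothetical, Spoon-free mini query (Query, Step, of, map, forEach and list are illustrative names, not Spoon's API): map only wraps the downstream consumer, and forEach starts the evaluation, so each element flows through all steps before the next one is produced. list() is then just forEach into a collecting consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class LazyChainSketch {
    // Hypothetical push-style step.
    interface Step<I, O> {
        void evaluate(I input, Consumer<O> out);
    }

    // Hypothetical minimal query: a source of elements plus a chain of steps.
    static final class Query<T> {
        private final Consumer<Consumer<T>> source;

        private Query(Consumer<Consumer<T>> source) { this.source = source; }

        static <S> Query<S> of(S input) {
            return new Query<S>(out -> out.accept(input));
        }

        // map() does not run anything; it only wraps the downstream consumer.
        <R> Query<R> map(Step<T, R> step) {
            return new Query<R>(out -> source.accept(item -> step.evaluate(item, out)));
        }

        // forEach() triggers evaluation; no intermediate list is built.
        void forEach(Consumer<T> consumer) { source.accept(consumer); }

        // list() is just forEach() into a collecting consumer.
        List<T> list() {
            List<T> result = new ArrayList<>();
            forEach(result::add);
            return result;
        }
    }

    public static void main(String[] args) {
        Step<Integer, Integer> upTo = (n, out) -> { for (int i = 1; i <= n; i++) out.accept(i); };
        Step<Integer, String> label = (n, out) -> { if (n % 2 == 0) out.accept("even:" + n); };
        System.out.println(Query.of(6).map(upTo).map(label).list()); // [even:2, even:4, even:6]
    }
}
```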
Note: I do not insist on any names. I am open to renaming |
Added info about our convergence points to the requirements section at the beginning of this thread.
|
OK, there was a misunderstanding: according to your previous message, I thought we had reached an agreement on those two points. It's not perfect, it's not final, but it's a valuable intermediate step. Keep in mind that what matters is the API, not the implementation. In Spoon we often go for a more complex implementation in order to have a more intuitive and usable API (the ideal API being when you understand it just with the names and types, without reading the Javadoc). Let's come back to your house metaphor. What I'm doing is to evaluate whether what we put in Spoon is valuable and usable only for your house (bad) or valuable for other houses as well (good): lazy foreach meets this criterion (RQ1), reusable queries as well (RQ4). That's already a huge step forward and I thank you for this. |
Thanks for the explanation. I also like the idea of finishing the API first. The problem is that until now you have suggested APIs (some parts) which were not acceptable to me, because they would cause extra complexity in client code too and would be a bottleneck for some powerful future query features. Let's discuss the API
If you understand
If you like reusable queries as an agreed feature for the next PR, then I agree too. But note that:
So I suggest having the following function as the evaluation API of reusable queries:
I am open to renaming this method if |
Yes, I agree, an input and a query, hence My main problems with |
I do not understand how that will be used. Please provide client code which uses your API. |
|
Thanks for the nice example. I understand it. It is great that we were talking about the same thing :))) What is good? I see these problems:
Why should it capture the query? I do not understand you. Please explain. It is probably because you (like me) expected a different usage of that API. Here is the client code which uses
// reusable query
CtLazyFunction q = new OverridenMethodQuery();
// or, alternatively, this works too
CtBaseQuery q = new CtBaseQueryImpl().map(...).filterChildren(...);
// listless foreach
q.apply(aMethod, new CtConsumer<CtMethod>() {
    public void accept(CtMethod m) {
        System.out.println(m);
    }
});
Note:
I understand |
Excellent. I'll answer all your points, but first:
Is it a shortcut for the following? How is this different from:
|
Yes, it is the same.
See the list of problems above. It is the list of differences. |
I'd prefer to avoid this and have a single API.
OK for renaming.
I would say it's not a problem for now. We can document it and make it ready for concurrency afterwards. Baby steps first :-)
Runtime exception?
Runtime exception? |
I have two questions:
I guess you prefer one API because it is simpler for clients, right? Or is there another reason? Here is an example of a query which is based on
public class CtFieldReferenceQuery implements CtLazyFunction<CtField<?>, CtFieldReference<?>> {
@Override
public void apply(CtField<?> field, CtConsumer<CtFieldReference<?>> outputConsumer) {
if (field.hasModifier(ModifierKind.PRIVATE)) {
searchForPrivateField(field, outputConsumer);
} else if (field.hasModifier(ModifierKind.PUBLIC)) {
searchForPublicField(field, outputConsumer);
} else if (field.hasModifier(ModifierKind.PROTECTED)) {
searchForProtectedField(field, outputConsumer);
} else {
searchForPackageProtectedField(field, outputConsumer);
}
}
private void searchForPrivateField(CtField<?> field, CtConsumer<CtFieldReference<?>> outputConsumer) {
field.getTopLevelType().filterChildren(new DirectReferenceFilter(field.getReference())).forEach(outputConsumer);
}
private void searchForProtectedField(CtField<?> field, CtConsumer<CtFieldReference<?>> outputConsumer) {
field.getFactory().getModel().getRootPackage()
.filterChildren(new SubtypeFilter(field.getDeclaringType().getReference()))
.filterChildren(new DirectReferenceFilter(field.getReference()))
.forEach(outputConsumer);
}
private void searchForPublicField(CtField<?> field, CtConsumer<CtFieldReference<?>> outputConsumer) {
field.getFactory().getModel().getRootPackage().filterChildren(new DirectReferenceFilter(field.getReference())).forEach(outputConsumer);
}
private void searchForPackageProtectedField(CtField<?> field, CtConsumer<CtFieldReference<?>> outputConsumer) {
field.getTopLevelType().getPackage().filterChildren(new DirectReferenceFilter(field.getReference())).forEach(outputConsumer);
}
}
It is already working with PR #1076. What interface or class should it extend if we use |
Yes
What about making it extend the class QueryImpl directly (instead of implementing an interface) (do you remember our discussion on a possible |
I am glad that we both want a simple API for Spoon clients. It is probably unbelievable at first sight, but I think that these 2 simple APIs are simpler/better for clients than 1 more complex API. Why are 2 simpler than 1? You probably think: if there are 2 APIs, the client must choose which one to use, so it is more complex than 1 API, where this choice is not needed.
anElement.map(...).forEach(...);
//or
List aList = anElement.map(...).list();
Note that in this context the client automatically gets
CtBaseQuery query = factory.Query().map(...);
//... loop for each searched element ...
query.apply(element, new CtConsumer<CtElement>() {
    public void accept(CtElement e) {
        System.out.println(e);
    }
});
Note that in this context the client automatically gets
Baby steps in the wrong direction are still steps in the wrong direction, even when the parents love their baby. ;-) |
What if one simply wants the list of elements?
What if one wants to refine a reusable query as follows?
For those two cases, one needs either one single interface or |
Good question. There are several solutions:
CtBaseQuery query = factory.Query().map(...);
List l1 = new ArrayList();
// here there can be an optional loop which collects the results of all query evaluations in one list,
// without building useless intermediate lists
//{
query.apply(element, (item) -> l1.add(item));
//} end of loop
CtBaseQuery query = factory.Query().map(...);
...
// refining the query
query.map(...).filterChildren(...)
No problem. This code already works. See the CtBaseQuery API in PR #1076.
No, it would kill the idea of that design. Please read the changes in #1076 and you will see what I am talking about. |
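The two cases answered above (collecting the results of many evaluations into one list, and refining a reusable query) can be sketched in plain Java as follows. ReusableQuery, apply and filter are hypothetical names chosen to mirror the discussion, not the API of PR #1076: the query is built without an input element, applied to several inputs, and all results go into a single list through a consumer. In this sketch refining returns a new query and leaves the original unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

public class ReusableQuerySketch {
    // Hypothetical reusable query: built once without an input element,
    // then applied to many inputs via apply(input, consumer).
    interface ReusableQuery<I, O> {
        void apply(I input, Consumer<O> out);

        // Refining wraps this query in a new one; this query is left unchanged.
        default ReusableQuery<I, O> filter(Predicate<O> p) {
            return (input, out) -> apply(input, o -> {
                if (p.test(o)) {
                    out.accept(o);
                }
            });
        }
    }

    public static void main(String[] args) {
        ReusableQuery<String, Character> chars = (s, out) -> {
            for (char c : s.toCharArray()) {
                out.accept(c);
            }
        };
        ReusableQuery<String, Character> vowels = chars.filter(c -> "aeiou".indexOf(c) >= 0);

        List<Character> all = new ArrayList<>();
        for (String word : Arrays.asList("spoon", "query")) {
            vowels.apply(word, all::add);   // one result list across all evaluations
        }
        System.out.println(all); // [o, o, u, e]
    }
}
```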
Why? Currently, there is a huge copy'n'paste between CtQuery and CtBaseQuery (nearly all methods of CtQuery are duplicated in CtBaseQuery). |
The reason is the fluent API: the type of the returned value is different. So it is only a semantic copy; the implementation is different. I tried to use a generic type for the returned value, but I failed, because the return type MUST depend on the arguments of the |
I understand. Sometimes, flexible dynamic checks are better than rigid static typing. To me it's the case here. This clearly calls for a unique interface CtQuery: we'll have all appropriate methods accessible on the returned object, and no duplicate methods. |
It is not a design mistake that these methods seem to be copied in the second API. It is the Facade design pattern. |
Yes, especially if it's absolutely required for the implementation. I'm not saying it's a mistake, I'm saying it's simpler and better to have
Pavel, we've made awesome progress on this topic! |
This is my first open source project. I have been developing commercial software for 21 years (16 of them in Java). I like the idea of open source, but I did not know that progress is 10 times slower :( than in my daily work. |
Yes, I know it's slow, essentially because of asynchronous communication. With one or two 15-minute Skype meetings, we would probably have converged directly. Maybe we can do this next time. But it's really important that we're on the same page, so that we can maintain your code and write high-quality documentation of this new super feature together. |
I fully agree! And with skype it will probably not consume so much time :-) |
I mailed you my Skype name, and I have added the conceptual model and some explanations to PR #1076. I suggest we discuss and then update this conceptual model description so that it is always up to date and we can base future discussions on the agreed terms. I want to minimize misunderstandings. I think I have finally understood your ideas on how to implement the queries. These ideas were already implemented in my PR #1018 before you simplified it. I do not need a fast answer. Take your time. |
Closed together with #1090 |
Awesome. Thanks a lot Pavel for your creativity and patience. You set
the record for the longest PR in terms of discussion :-)
|
I am glad that it is not normal. :-) It was full of misunderstandings... That's why it took so long. |
Requirements/goals for new features of queries:
* CtQuery extends Iterator would be an elegant solution, but it is not possible because a Scanner is not iterable.
* CtQuery query = new CtBaseQueryImpl().map(...).map(...)
* The method CtQuery#map(CtQueryStep) is semantically equal to the existing method CtQuery#map(CtFunction). The only difference is the way the result of the mapping is delivered to the next query step:
  * CtFunction delivers it as a return value (return objectBooleanOrList;). This is optimal for mapping functions which return 1 or a few elements.
  * CtQueryStep delivers it by calling outputConsumer.apply(item) for each Spoon model element that has to be returned by this step. This is optimal for mapping functions which return many elements.

I need a Spoon query which correctly returns all the references to a private or protected class field. This means the query must search for references in the correct scope(s).
There was an idea of how to implement it during the development of PR #1018, but then we removed that code for simplicity reasons.
@monperrus does it make sense to create a new PR which reuses the same ideas, or do you know a better way?