feature: all you want for queries #1070

pvojtechovsky · 2016-12-22T10:04:29Z

Requirements/goals for new features of queries:

support of lazy foreach (better performance for large queries because no creation of useless intermediate lists)
- ~~CtQuery extends Iterator would an elegant solution but not possible because a Scanner is not iterable~~
support for chaining of foreach
- what should be the elements given for the next steps?
support for lazy evaluation in map for functions that returns an iterator (already works for functions returning Iterable).
support for queries reusable over multiple inputs (CtQuery query = new CtBaseQueryImpl().map(...).map(...) )
support for sending of results of one step directly to next step, without necessity to build helper lists.

interface CtQuery { // already exists
  /** Performs an action on each element of the query results.
    * Implemented efficiently for CtScanner based queries */
  CtQuery forEach(CtQueryStep);
}

interface CtQueryStep {
  /** 
   * evaluates query step and applies mapping of one input element to many output elements.
   * The output elements are `returned` by call of `outputConsumer.accept(outputElement)`, 
   * which is called once for each outputElement
   */
   void evaluate(Object input, CtConsumer<Object> outputConsumer);
}

/**This interface is equivavlent to java 8 Consumer. 
We can remove it after we upgrade to java 8 */
interface CtConsumer<T> {
 void accept(T item);
}

The method CtQuery#map(CtQueryStep) is semantically equal to the existing method CtQuery#map(CtFunction). The only difference is the way how result of mapping is delivered to the next query step.

Implementation of CtFunction - returns result by return objectBooleanOrList;. This is optimal for mapping functions which returns 1 or few elements.
Implementation of CtQueryStep- returns result by outputConsumer.apply(item), which is called for each spoon model element, which has to be returned by this step. This is optimal for mapping functions, which returns many elements.

I need a spoon query, which correctly returns all the references to an private or protected class field. It means that query must search for references in correct scope(s).

There was an idea how to implement it during development of PR #1018, but then we removed that code for simplicity reasons.

@monperrus does it makes sense to create a new PR, which will reuse the same ideas again or do you know some better way?

The text was updated successfully, but these errors were encountered:

monperrus · 2016-12-22T11:57:55Z

@monperrus <https://github.com/monperrus> does it makes sense to create a new P

Yes, go ahead. We will first extensively discuss the design and contracts (javadoc) of: * the new methods in existing interfaces * the new interfaces (if any)

…

--Martin

pvojtechovsky · 2016-12-22T14:08:27Z

The main requirement which is not supported by #1088 is possibility to create the query which has not initialized input element and then this query can be used as a part of another query. Then We can build and use queries/model analyzing/refactoring chains

monperrus · 2016-12-22T15:01:53Z

The main requirement which is not supported is the possibility to create the query which has *not initialized input element*

What will be the algorithm / strategy used to infer the input element?

pvojtechovsky · 2016-12-22T15:42:19Z

it will come from the previous step of the query chain

pvojtechovsky · 2016-12-22T17:38:27Z

I guess it helps, if I explain the big picture of the problem, I would like to solve using spoon queries. We have an old java project, where nearly each method is polluted with one project specific bad design pattern. I want to get rid of that design pattern. It is easy to localize it, but it is harder to replace it and even more harder to clean all the useless code, which was written just to fulfill needs of that bad pattern.

So I need an algorithm which does something like this:

starts on one method of one class. It is the input point for the example analyze, which I need to run first
then searches for all callees
then searches for all callees again
then focuses on parameters which are used to call these method and stores them in list1
then searches for all declarations of these parameters and stores them in list2
then searches for all other uses of these declarations
then filter these uses by a templateMatcher
collect the remaining uses in list3
after I analyze content of list3 more or less manually and configure templateMacher, and list 3 is finally empty.
I will run that chain again but at places where list1, list2 are collected, there will be called recursive refactoring, which will remove these parameters and their declarations, from the local scope of functions, from the parameters of methods (and all overridden/overriding methods), field variables of types etc. recursively.

I know that it is a big challenge and that the algorithm will be more complicated then this raw sketch, but may be you now understand why I need interface like

interface CtQueryStep {
void evaluate(Object input, Consumer<Object> outputConsumer);
}

This interface and function, let's me build all the queries/filters/refactoring, which I need in effective way.
Current CtQuery, without CtQueryStep and public Consumer interface is like calling spoon Scanner on whole model and to collect all the visited elements in the big List (without Filtering them) and then later filter that list and collect another list etc... bad approach. Yes it is nice and simple for small queries. But not for complex cases with many thousands intermediate results.

May be I will fail and the project stays dirty as it is, but I like that game :-) ... with exception when Martin deletes my code ;-)))
No, problem. You do it well Martin 👍

monperrus · 2016-12-22T22:10:36Z

But not for complex cases with many thousands intermediate results.

Is your problem a pure performance problem? of do you need an additional feature? As far as I understand, you would like a pipeline-based approach?

pvojtechovsky · 2016-12-23T07:43:59Z

pipeline

no, we already have a pipeline in CtQueryImpl

performance problem?

yes. But not only performance. See this example. In spoon there are many places where model Scanner is used. In all places the code inherits from base Scanner and hooks on enter method. If somebody would implement it different and would return all the model elements in one long list and then would process that long list .... how would you call it? Just performance problem? Bad design?
And it is the same case like current Query implementation. CtFunction is good for small amount of results. Pipeline can handle much more and it is better design for such cases. I like both approaches.

monperrus · 2016-12-23T09:02:50Z

One thing I still don't completely understand is the link between: * scope-based queries * pipeline architecture To me, they seem to be two distinct problems. Could we handle them separately one after the other?

pvojtechovsky · 2016-12-23T10:12:08Z

Yes we can handle them separately. But in this order
1 lazy result sending
2 scope based queries

After lazy result sending is finished. There is nothing more needed, because scope based queries and many other kinds of model analysis and refactoring algorithms can be easily implemented without need for any extra interaces or changes in the spoon core.

monperrus · 2016-12-23T10:43:12Z

But in this order
1 pipeline
2 scope based queries

OK. Renamed this issue accordingly. Now we focus the discussion on pipelines.

Does you current step design in QueryImpl support pipelines? Is it only a matter of having CtConsumer public?

pvojtechovsky · 2016-12-23T11:57:52Z

only CtConsumer public

no it is not enough. I need
1 CtConsumer
2 CtQueryStep (see above)
3 a new map function in CtQuery, which accepts CtQueryStep as the only parameter.

Then for example scope based query can be then implemented like this

query.map(new OverridenMethodQuery())...

class OverridenMethodQuery implements CtQueryStep {
void evaluate(CtElement input, Consumer out) {
input.factory.getRootPackage().filterChildren(new OverridenMethodFilter(input)).forEach(out);
}
}

by the way, I have just learned design pattern Pipeline and Chain of responsibility and I see that we shouldn't call this PR pipeline. I have not found the name of design pattern I need.
Current implementation of CtQueryImpl is already implemented as a kind of pipeline.

pvojtechovsky · 2016-12-23T12:27:21Z

and current CtQuery does not supports forEach(Consumer) so instead of

.forEach(out)

in the eaxample above, the client has to actually write

.map((ele)->{out.accept(ele);return false;}).list()

pvojtechovsky · 2016-12-23T12:39:58Z

And this is the final solution with optimized performance

query.map(new OverridenMethodQuery())...

class OverridenMethodQuery implements CtQueryStep {
  OverridenMethodFilter filter = new OverridenMethodFilter();
  CtQuery query = new CtQueryImpl().filterChildren(filter);
  void evaluate(CtElement input, Consumer out) {
    filter.setMethod(input);
    query.evaluate(input.factory.getRootPackage(),out);
  }
}

monperrus · 2016-12-23T13:48:21Z

To me, it seems that lazy evaluation gives you pipelining for free, with no change in the API:

# no pipeline, current implem
forEachSequential(steps, inputs):
for s : steps
   results = s.apply(inputs)
   inputs = results

instead we can have something like:

# simple version where each step yield no or one result
forEachPipeline(steps, inputs):
for i : inputs
   # applying the pipeline
   for s: steps
      j = i
      if j != null:
          j = s.apply(j)
      if laststep: results.add(j)

WDYT?

pvojtechovsky · 2016-12-23T18:55:20Z

I think that we speak all the time about different things.
I am trying to explain that returning 10 000 items by return longList; is not good idea and You are all the time offering solution like return shortList;
I guess we need a break :-)
Merry Christmas!

pvojtechovsky · 2016-12-25T09:24:32Z

please forget about pipelines. It was a pure misunderstanding. We already have a pipeline in CtQueryImpl. I have updated my answers above, and I have changed the name of this issues.

I will try now to explain the problem. There is really no feature missing in CtQueryImpl. What is missing, is support for processing of results in a lazy way (I do not know the correct design pattern name for that, so do not focus on the name "lazy result", but have a look at this 2 examples:

Code which works now

query.map(new OverridenMethodQuery())...

class OverridenMethodQuery implements CtFunction<CtMethod, CtMethod> {
  CtMethod evaluate(CtMethod input) {
    return input.factory.getRootPackage().filterChildren(new OverridenMethodFilter(input)).list();
  }
}

Code which can be written after lazy result sending is supported

query.map(new OverridenMethodQuery())...

class OverridenMethodQuery implements CtQueryStep {
   void evaluate(CtElement input, Consumer out) {
      input.factory.getRootPackage().filterChildren(new OverridenMethodFilter(input)).forEach(out);
   }
}

Both examples gives exactly same results, but the second one will perform faster with lower memory consumption in cases if OverridenMethodQuery would produce 1 000 000 results.
Yes OverridenMethodQuery will never return 1M results, but as you can see is in the big picture above, I need to write filters, which will do that. I used OverridenMethodQuery here as example, because it exists in current code, so it is easily understandable.

monperrus · 2016-12-25T10:03:05Z

Do you mean that if the final result of list() contains 1 000 000 elements, we actually build that list? that's the performance problem?

pvojtechovsky · 2016-12-25T14:12:30Z

yes it is

performance problem
and
it is not needed because results of one step can be sent as input of next step without collecting them in any list.
Actualy all 1M result items has to be collected in the list and then the code of CtQueryImpl will iterate this list and call nextStep.accept(item) 1M times.
I suggest to optionnaly!support another way: To let current step call the nextStep.accept(item) 1M times directly - without any useless allocation of a helper list

pvojtechovsky · 2016-12-25T15:39:37Z

It is not performance problem, if you run it once. But I see the Query as a core feture of analyses, refactorings and TemplateMatcher and I think it we be evaluated many many times ... and then allocation of useless lists is a bad idea.

monperrus · 2016-12-25T15:58:12Z

One solution is that CtQuery implements Iterator and we thus add a method iterator() and then on can write:

 for(CtStatement s : e.filterChildren(x)) {
    doSomething(s)
 }

with no useless list involved.

pvojtechovsky · 2016-12-25T18:00:34Z

I think, it is very difficult task to implement

Iterator iter = e.filterChildren(x).iterator()

without helper list, because the way how CtScanner.enter method is called. Show me that if you think opposite!
What is bad on interface method like this:

interface CtQueryStep {
   void evaluate(Object input, Consumer<Object> outputConsumer);
}

??? Why do you try to invent different solution???

Note: I already had solution for this issue in #1018 and I still think it is the most efficient solution. It needs only little code in spoon core (CtQueryStep#evaluate). It needs only little code in client's Query,

void evaluate(CtElement input, Consumer out) {
      input.factory.getRootPackage().filterChildren(new OverridenMethodFilter(input)).forEach(out);
   }

it is fastest and it has minimal memory overhead. Why do you not like it? Why do you try to invent something else? Is it another misunderstanding? Did you understood my solution?

Yes, I see some problems and possible improvements in original #1018 (before you deleted support of lazy result sending) too. But the core idea is OK. I suggest to discuss these problems. I am sure we can found an elegant solutions for them.

at the beginning you wrote:

We will first extensively discuss the design and contracts

Who is we?
A) Pavel and Martin
B) Inria Spoon team

May be I am just too active? And you just need time to discuss it? Is this discussion helpful for you?

monperrus · 2016-12-25T19:17:58Z

This discussion is very helpful to understand the interesting problems you raise and find a good solution. Thanks a lot for it, Pavel!

??? Why do you try to invent different solution???

Oh no, we're just trying to identify the simplest and most intuitive solution. And a standard Java iterator is such a simple solution

I think, it is very difficult task to implement Iterator iter = e.filterChildren(x).iterator()

OK, but conceptually is this what we need? Is this what we would like to write? (I'd say yes)

without helper list?

Could you elaborate on what you mean by helper list in this context?

pvojtechovsky · 2016-12-25T19:32:37Z

Thanks for confirming that we are on good way ;-)

I think, it is very difficult task to implement Iterator iter = e.filterChildren(x).iterator()

OK, but conceptually is this what we need? Is this what we would like to write? (I'd say yes)

I am sure No, because such implementation is really very difficult. Just try it, it is better then any further discussion. I can actually only imagine the one solution - running a helper Thread and some thread synchronization. Such solution is complicated, slow and resource consuming. May be you know better solution. Then show me that and we can discuss if it is better then CtQueryStep#evaluate(Object, Consumer)

Could you elaborate on what you mean by helper list in this context?

I mean this:

e.filterChildren(x).iterator()
can be written as
e.filterChildren(x).list().iterator()

which is the solution, which creates helper list with 1M items.

monperrus · 2016-12-25T19:44:46Z

e.filterChildren(x).list().iterator()is NOT a solution, which creates helper list with 1M items.

We all agree on this.

OK, but conceptually is this what we need? Is this what we would like to write? (I'd say yes)

I am sure No, because such implementation is really very difficult.

I mean from a pure API viewpoint. Assuming this would be easy to implement, would it be what we want?

 for(CtStatement s : e.filterChildren(x)) {
    doSomething(s)
 }

?

So having CtQuery#map(CtQueryStep) in the only feasible solution to you (as opposed to iterator())? I prefer to call it foreach(CtQueryStep) (because it is equivalent to for(CtStatement s : e.filterChildren(x)) which is called a foreach). OK?

pvojtechovsky · 2016-12-25T20:08:30Z

So having CtQuery#map(CtQueryStep) in the only feasible solution to you (as opposed to iterator())?

yes. Because Spoon CtScanner, which will be the core of the most of the future queries, works this way. It is not possible to iterate over children element visited by CtScanner and it is not efficient to collect children elements in a long list. The most compatible with CtScanner#enter method is the possibility to send the actually visited child element into next query step using call of an function - I suggest to call void CtConsumer<T>#accept(T).... or anything similar, but it must be a one function call for each visited element.

I prefer to call it foreach(CtQueryStep)

Do you mean to implement CtQuery#map(CtQueryStep), but to change the method name map to foreach?
So clients will then write code like

List result = e.map(..some CtFunction lambda ...).foreach(... some CtQueryStep based query ...).foreach(... some other CtQueryStep based query ...).list();
for(Object item : list) {
   ...  a code which consumes result of query
}

?
I would prefer to not use foreach here because of standard java 8 List#forEach(Consumer) method, which I would like to see at the end of query instead of list. Like this:

e.map(..some CtFunction lambda ...).map(... some CtQueryStep based query ...).map(... some other CtQueryStep based query ...).forEach(... a code which consumes result of query...)

note that second example has exactly same behavior.

pvojtechovsky · 2016-12-25T20:36:37Z

Note: I do not persist on any names. I am open to rename map to anything else, if you think it helps clients to understand. But I think that forEach would be really confusing at that place.

monperrus · 2016-12-25T22:37:24Z

Added info about our convergence points at the beginning of this thread in the section requirements.

about reusing CtScanner: what about method scan() which is used differently from enter?
what will method forEach return?
about the link with Java8, what about renaming QueryStep into CtConsumer?

monperrus · 2016-12-28T21:40:47Z

evaluate(CtElement input) is a step in wrong direction

OK, there was a misunderstanding: according to your previous message, I thought we had reached an agreement on those two points. It's not perfect, it's not final, but it's a valuable intermediate step. Keep in mind that what matters is the API, not the implementation. In Spoon we often go for a more complex implementation in order to have a more intuitive and usable API (the ideal API being when you understand it just with the names and types, without reading the Javadoc).

Let's come back to your house metaphor. What I'm doing is to evaluate whether what we put in Spoon is valuable and usable only for your house (bad) or valuable for other houses as well (good): lazy foreach meets this criterion (RQ1), reusable queries as well (RQ4). That's already a huge step forward and I thank you for this.

pvojtechovsky · 2016-12-28T22:37:29Z

Thanks for explanation. I like the idea to finish API first too. The problem is that till now you suggested API (some parts), which was not acceptable for me, because the suggested API would cause extra complexity in client's code too and would be a bottleneck for some future powerful query features.

Let's discuss API

RQ1

If you understand CtQuery#forEach(CtConsumer outputConsumer) implementation of #1076 as solution for RQ1, then I agree. This function is really valuable for everybody. I agree with this API.

reusable queries as well (RQ4)

If you like reusable queries as agreed feature for next PR, then I agree too. But note that:

reusable query is implemented by CtBaseQueryImpl (which is nothing else then some code moved from CtQueryImpl)
the API to evaluate reusable query needs two input parameters:

Object input - input element - which represents the starting point of the query
CtConsumer outputConsumer - the consumer, which will process the results of the query.
Note that this approach is 100% compatible with CtQuery#forEach(CtConsumer outputConsumer), which you already like as good API of query with bound input (represented by CtQuery).

So I suggest to have following function as evaluation API of reusable queries:

void CtBaseQuery#apply(Object input, CtConsumer outputConsumer).

I am open to rename this method if apply is not intuitive enough. But note that this method is a method inherited from CtLazyFunction! I need this inheritance, because then we can use one query as mapping step of another query.
I see other places where CtLazyFunction should be used in future. One of them is CtScanner, which should implement it in near future. So I am looking for a method name, which is generic enough.

monperrus · 2016-12-29T10:13:04Z

the API to evaluate reusable query needs two input parameters:

Yes, I agree, an input and a query, hence CtQuery.evaluate(CtElement input) (this is the first argument, input the second) or CtElement.evaluate(CtQuery query).

My main problems with apply(Object input, CtConsumer outputConsumer) is that it uses a CtConsumer which is not the abstraction that captures the query. It exposes in the API an implementation concern.

pvojtechovsky · 2016-12-29T10:23:16Z

I do not understad how that will be used. Please provide a client´s code which uses your API.
We probably again understand different thing behind the term "reusable query". So show me the use case and I understand what is your API for.

monperrus · 2016-12-29T10:30:26Z


// reusable query
CtQuery q =  new OverridenMethodQuery();

List l1 = q.evaluate(aMethod).list();
List l2 = q.evaluate(anotherMethod).list();

// listless foreach
q.evaluate(aMethod).forEach( new CtConsumer() {
   void accept(CtMethod m) {
      System.out.println(m);
   }
});

pvojtechovsky · 2016-12-29T12:28:09Z

Thanks for nice example. I understand it. It is super that we spooke about same thing :)))

What is good?
It is readable if client knows what to do.
It needs only minimum changes to existing API.

I see these problems:

the method evaluate does not evaluates the query, but only binds the input element to the query. It is the reason, why I did not understood how to use it. Solution: rename it?
This API works with single thread only. So the reusability of such API is limited to one thread. May be it is not problem now. But it is a design issue if we speak about reusable queries
CtQuery is interface for both use cases A) element based query B) reusable query.
3a) Therefore one can write bad code like

element.map(...).evaluate(element2).list(). What it does?
q.list(); without binding input element.
3b) Therefore CtQuery API is more complicated then needed.

once we release that we have to support that, even it will be not needed, because we will may be create better API later. And the spoon API will stay less understandable then it might be.

My main problems with apply(Object input, CtConsumer outputConsumer) is that it uses a CtConsumer which is not the abstraction that captures the query. It exposes in the API an implementation concern.

Why it should capture query? I do not understand You. Please explain it. It is probably because you (same like me) expected different usage of that API.

Here is the client code which uses apply method based API

// reusable query
CtLazyFunction q =  new OverridenMethodQuery()
//or optionally this will work too
CtBaseQuery q = new CtBaseQueryImpl().map(...).filterChildren(...);

// listless foreach
q.apply(aMethod, new CtConsumer() {
   void accept(CtMethod m) {
      System.out.println(m);
   }
});

Note:

I am not using CtQuery for reusable queries, but CtBaseQuery, which can be used to build complex query using map and filterChildren methods and finally can be executed several times using apply method of CtLazyFunction.
Each query use case is covered by one API A) CtQuery, B) CtBaseQuery. Each API is as simple as possible and avoids missuse.
CtLazyFunction/CtBaseQuery apply method can be used by multiple threads ... after internal implementation allows that too.

I understand evaluate based API. Please confirm that You understand apply based API or ask me.
Then we can discuss next steps.

monperrus · 2016-12-29T18:27:08Z

Excellent. I'll answer to all your points but first:

q.apply(aMethod, new CtConsumer() {
   void accept(CtMethod m) {
      System.out.println(m);
   }
});

Is it a shortcut for the following? How is this different from:

q.evaluate(aMethod).forEach(new CtConsumer() {
   void accept(CtMethod m) {
      System.out.println(m);
   }
});

pvojtechovsky · 2016-12-29T19:06:18Z

Yes, It is the same.

How is this different from:

See the list of problems above. It is the list of differences.
And note that CtBaseQuery has no method like list or forEach. There is only apply method, which can be used to evaluate query. So the client cannot use it wrong and it is clear for him/her what is the correct method to call. API is simpler then evaluate based. But there are 2 APIs in query engine. One for element based query - CtQuery and second for reusable query CtBaseQuery.

monperrus · 2016-12-29T20:30:35Z

But there are 2 APIs in query engine. One for element based query - CtQuery and second for reusable query CtBaseQuery.

I'd prefer to avoid this and have a single API.

the method evaluate does not evaluates the query, but only binds the input element to the query. It is the reason, why I did not understood how to use it. Solution: rename it?

OK for renaming. bindTo? prepareFor?`

This API works with single thread only. So the reusability of such API is limited to one thread. May be it is not problem now.

I would say it's not a problem for now. We can document it and make it ready for concurrency afterwards. Baby steps first :-)

element.map(...).evaluate(element2).list(). What it does?

Runtime exception?

q.list(); without binding input element.

Runtime exception?

pvojtechovsky · 2016-12-29T20:43:48Z

I have two questions:

But there are 2 APIs in query engine. One for element based query - CtQuery and second for reusable query CtBaseQuery.

I'd prefer to avoid this and have a single API.

I guess you prefer one API, because it is simpler for clients right? Or is there any other reason?

Here is an example of Query which is based on apply API

public class CtFieldReferenceQuery implements CtLazyFunction<CtField<?>, CtFieldReference<?>> {

	@Override
	public void apply(CtField<?> field, CtConsumer<CtFieldReference<?>> outputConsumer) {
		if (field.hasModifier(ModifierKind.PRIVATE)) {
			searchForPrivateField(field, outputConsumer);
		} else if (field.hasModifier(ModifierKind.PUBLIC)) {
			searchForPublicField(field, outputConsumer);
		} else if (field.hasModifier(ModifierKind.PROTECTED)) {
			searchForProtectedField(field, outputConsumer);
		} else {
			searchForPackageProtectedField(field, outputConsumer);
		}
	}
	private void searchForPrivateField(CtField<?> field, CtConsumer<CtFieldReference<?>> outputConsumer) {
		field.getTopLevelType().filterChildren(new DirectReferenceFilter(field.getReference())).forEach(outputConsumer);
	}
	private void searchForProtectedField(CtField<?> field, CtConsumer<CtFieldReference<?>> outputConsumer) {
		field.getFactory().getModel().getRootPackage()
			.filterChildren(new SubtypeFilter(field.getDeclaringType().getReference()))
			.filterChildren(new DirectReferenceFilter(field.getReference()))
			.forEach(outputConsumer);
	}
	private void searchForPublicField(CtField<?> field, CtConsumer<CtFieldReference<?>> outputConsumer) {
		field.getFactory().getModel().getRootPackage().filterChildren(new DirectReferenceFilter(field.getReference())).forEach(outputConsumer);
	}
	private void searchForPackageProtectedField(CtField<?> field, CtConsumer<CtFieldReference<?>> outputConsumer) {
		field.getTopLevelType().getPackage().filterChildren(new DirectReferenceFilter(field.getReference())).forEach(outputConsumer);
	}
}

It is already working with PR #1076.

What interface class it should extend if we use evaluate based API?

monperrus · 2016-12-29T20:48:54Z

I guess you prefer one API, because it is simpler for clients right?

Yes

What interface class it should extend if we use evaluate based API?

What about make it extend class QueryImpl directly (instead of implementing an interface) (do you remember our discussion on a possible class OverridenMethodQuery extends QueryImpl?)

pvojtechovsky · 2016-12-30T08:25:44Z

But there are 2 APIs in query engine. One for element based query - CtQuery and second for reusable query CtBaseQuery.

I'd prefer to avoid this and have a single API.

I guess you prefer one API, because it is simpler for clients right?

Yes

I am glad that we both want simple API for spoon clients. It is probably unbelievable at first sight, but I think that these 2 simple APIs are simpler/better for clients, then 1 more complex API.

Why 2 is simpler then 1?
Because the only reason, why I designed these 2 APIs is to make it simple for clients. There is no other internal reason for 2 APIs.

You probably think: If there are 2 APIs then client must choose, which one to use, so it is more complex then 1 API, where this choice is not needed.
But it is not true in this case, because this decision is done automatically by the context where the query is used. There are two contexts:

the query, which starts on CtElement

anElement.map(...).forEach(...);
//or 
List aList = anElement.map(...).list();

Note: that in this context client automatically gets CtQuery API. In this context there is no reason to confuse client with methods like evaluate, bindTo or prepareFor. Client does not need these methods here at all, therefore it is simpler for him/her when these methods are not there.

the reusable query, which is created without initial CtElement

CtBaseQuery query = factory.Query().map(...);

//... loop for each searched element ...
query.apply(element, new CtConsumer() {
   void accept(CtElement e) {
      System.out.println(e);
   }
});

Note: that in this context client automatically gets CtBaseQuery API. In this context there is no reason to confuse client with methods like evaluate, bindTo or prepareFor followed by forEach or list. Client has only one simple method apply. I have no problem to rename it, if the name is not good. But again one method is simpler then 2 methods, which has to be called in correct order, which has to be learned in documentation.

Baby steps in wrong direction are steps in wrong direction, even when parents love their baby. ;-)

monperrus · 2016-12-30T09:33:22Z

What if one simply wants the list of elements?

CtBaseQuery query = factory.Query().map(...);
List l1 = query.evaluate(aMethod).list();

What if one wants to refine a reusable query as follows?

CtBaseQuery query = factory.Query().map(...);
...
// refining the query
query.map(...).filterChildren(...)

For those two cases, one needs either one single interface or CtBaseQuery extends CtQuery. I'm fine with both.

pvojtechovsky · 2016-12-30T10:35:20Z

What if one simply wants the list of elements?

Good question. There are several solutions:
A) Nobody needs it in case of reusable queries.
B) If somebody will ak for it, then we can make a baby step and add support for that.
C) client can already write code like this

CtBaseQuery query = factory.Query().map(...);
List l1 = new ArrayList();
//here can be optional loop, which collects results of all the query evaluations in one list, 
//without building intermediate useless lists
//{
query.apply(element, (item)->l1.add(item));
//}end of loop

What if one wants to refine a reusable query as follows?

CtBaseQuery query = factory.Query().map(...);
...
// refining the query
query.map(...).filterChildren(...)

No problem. This code already works. See CtBaseQuery API in PR #1076.

CtBaseQuery extends CtQuery

no, it would kill the idea of that design. Please read the changes in #1076 and you will see what I am speaking about.

monperrus · 2016-12-30T13:34:42Z

CtBaseQuery extends CtQuery

no, it would kill the idea of that design

Why? Currently, there is a huge copy'n'paste between CtQuery and CtBaseQuery (nearly all methods of CtQuery are duplicated in CtBaseQuery).

pvojtechovsky · 2016-12-30T13:47:14Z

The reason is fluent API - The type of returned value, which is different. So it is only semantic copy. But implementation is different.
In one case we have to return CtQuery.
In second case we have to return CtBaseQuery.

I tried to used generic type for returned value, but I failed, because the return type MUST depend on the arguments of the map method. If anybody founds a way how to avoid copy past of declaration of these methods then I will warmly welcome that. I hate copying of code and I always do my best to avoid it.

monperrus · 2016-12-30T13:55:27Z

I understand.

Sometimes, flexible dynamic checks are better than rigid static typing. To me it's the case here.

This clearly calls for a unique interface CtQuery: we'll have all appropriate methods accessible on the returned object, and no duplicate methods.

pvojtechovsky · 2016-12-30T19:32:33Z

It is not design mistake that these methods seems to be copied insecond API. It is design pattern facade.

monperrus · 2016-12-31T08:56:15Z

we'll have all appropriate methods accessible on the returned object

including method apply(Object input, CtConsumer outputConsumer)?

Yes, especially if it's absolutely required for the implementation.

I'm not saying it's a mistake, I'm saying it's simpler and better to have forEach and apply in CtQuery (and evaluate if possible).

Pavel, we've made awesome progress on this topic!

pvojtechovsky · 2016-12-31T10:24:34Z

It is my first open source project. I am developing commercial SW for 21 years (16 java). I like the idea of open source, but I did not know that progress is 10 times slower :( then In my daily work.
Yes we did a big step in understanding of each other. Now you understood and accepted the solution, which I had implemented 3 weeks ago... and I have seen the problems of this solution, so I solved them in PR #1076. Now I am again 1 week further and I am disapointed with idea, that I will need next many weeks of discussions to get acceptance for that.
OK, I see, it is not Your mistake. It is principle of open source - at least two people has to have same problem and then they can solve it. You do not need/develop using queries so it is very hard for me to explain it to you. I will not give up like yesterday evening and will continue ... may be we found a way how to progress faster ;-)

monperrus · 2016-12-31T10:38:55Z

Yes, I know it's slow, essentially because of asynchronous communication. With one or two 15 minute Skype meetings, we would have probably converged directly. Maybe we can do this next time.

But it's really important that we're along the same page, so that we could maintain your code, and so that we can write together high-quality documentation of this new super feature.

pvojtechovsky · 2016-12-31T11:17:50Z

But it's really important that we're along the same page, so that we could maintain your code, and so that we can write together high-quality documentation of this new super feature.

I fully agree! And with skype it will probably not consume so much time :-)

pvojtechovsky · 2017-01-01T10:59:26Z

I mailed you my skype name and I have added the conceptual model and some explanations into PR #1076 . I suggest to discuss and then update this conceptual model description so it is always up to date and we can base future discussion on the agreed terms. I want to minimize misunderstandings.

I think I have finally understood your ideas how to implement the queries. These ideas were already implemented in my PR #1018 before you simplified it.
The forEach(CtConsumer) method was there
The evaluate, bindTo or prepareFor function is there too. See public CtQueryImpl#setInput(Object). It is just not part of public interface API, because I think that it makes the client API more complicated then needed. But let's discuss it later. First please focus on the description of concepts in PR #1076.

I do not need fast answer. Take your time

pvojtechovsky · 2017-01-12T13:43:33Z

Closed together with #1090

monperrus · 2017-01-12T14:05:08Z

Awesome. Thanks a lot Pavel for your creativity and patience. You set the record of the longest PRs in term of discussion :-)

pvojtechovsky · 2017-01-12T14:25:46Z

You set the record

I am glad that it is not normal. :-) It was full of misunderstandings... Therefore it needed so long.
I believe other PRs will proceed much faster. Thanks for your patience too.

monperrus changed the title ~~support for scope-based Queries~~ feature: add support for scope-based queries Dec 22, 2016

monperrus changed the title ~~feature: add support for scope-based queries~~ feature: add support for pipeline-based queries Dec 23, 2016

pvojtechovsky changed the title ~~feature: add support for pipeline-based queries~~ feature: add support for lazy result sending in queries Dec 25, 2016

surli mentioned this issue Jan 12, 2017

feature: reusable and efficient CtQueries #1090

Merged

pvojtechovsky closed this as completed Jan 12, 2017

monperrus mentioned this issue Mar 1, 2017

review: introduction of CtScannerFunction #1180

Merged

bolu61 mentioned this issue Jun 22, 2022

[Discussion] CtQuery Improvements #4756

Open

feature: all you want for queries #1070

feature: all you want for queries #1070

Comments

pvojtechovsky commented Dec 22, 2016 • edited by monperrus Loading

monperrus commented Dec 22, 2016 via email

pvojtechovsky commented Dec 22, 2016

monperrus commented Dec 22, 2016 via email

pvojtechovsky commented Dec 22, 2016

pvojtechovsky commented Dec 22, 2016

monperrus commented Dec 22, 2016 via email

pvojtechovsky commented Dec 23, 2016 • edited Loading

monperrus commented Dec 23, 2016 via email

pvojtechovsky commented Dec 23, 2016 • edited Loading

monperrus commented Dec 23, 2016

pvojtechovsky commented Dec 23, 2016 • edited Loading

pvojtechovsky commented Dec 23, 2016

pvojtechovsky commented Dec 23, 2016

monperrus commented Dec 23, 2016

pvojtechovsky commented Dec 23, 2016

pvojtechovsky commented Dec 25, 2016

monperrus commented Dec 25, 2016

pvojtechovsky commented Dec 25, 2016

pvojtechovsky commented Dec 25, 2016

monperrus commented Dec 25, 2016

pvojtechovsky commented Dec 25, 2016

monperrus commented Dec 25, 2016

pvojtechovsky commented Dec 25, 2016

monperrus commented Dec 25, 2016

pvojtechovsky commented Dec 25, 2016 • edited by monperrus Loading

pvojtechovsky commented Dec 25, 2016

monperrus commented Dec 25, 2016

monperrus commented Dec 28, 2016

pvojtechovsky commented Dec 28, 2016

monperrus commented Dec 29, 2016

pvojtechovsky commented Dec 29, 2016 • edited Loading

monperrus commented Dec 29, 2016

pvojtechovsky commented Dec 29, 2016

monperrus commented Dec 29, 2016

pvojtechovsky commented Dec 29, 2016 • edited Loading

monperrus commented Dec 29, 2016 • edited Loading

pvojtechovsky commented Dec 29, 2016

monperrus commented Dec 29, 2016

pvojtechovsky commented Dec 30, 2016

monperrus commented Dec 30, 2016

pvojtechovsky commented Dec 30, 2016 • edited Loading

monperrus commented Dec 30, 2016

pvojtechovsky commented Dec 30, 2016

monperrus commented Dec 30, 2016

pvojtechovsky commented Dec 30, 2016 • edited Loading

monperrus commented Dec 31, 2016

pvojtechovsky commented Dec 31, 2016

monperrus commented Dec 31, 2016

pvojtechovsky commented Dec 31, 2016

pvojtechovsky commented Jan 1, 2017 • edited Loading

pvojtechovsky commented Jan 12, 2017

monperrus commented Jan 12, 2017 via email

pvojtechovsky commented Jan 12, 2017

pvojtechovsky commented Dec 22, 2016 •

edited by monperrus

Loading

pvojtechovsky commented Dec 23, 2016 •

edited

Loading

pvojtechovsky commented Dec 23, 2016 •

edited

Loading

pvojtechovsky commented Dec 23, 2016 •

edited

Loading

pvojtechovsky commented Dec 25, 2016 •

edited by monperrus

Loading

pvojtechovsky commented Dec 29, 2016 •

edited

Loading

pvojtechovsky commented Dec 29, 2016 •

edited

Loading

monperrus commented Dec 29, 2016 •

edited

Loading

pvojtechovsky commented Dec 30, 2016 •

edited

Loading

pvojtechovsky commented Dec 30, 2016 •

edited

Loading

pvojtechovsky commented Jan 1, 2017 •

edited

Loading