Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Union implementation #444

Merged
merged 2 commits into from
Jan 13, 2016
Merged

Conversation

stevelinton
Copy link
Contributor

Avoids bug reported by Nick Loughlin and detects all cases where the result can be represented as a range without unpacking ranges in the arguments.

@stevelinton
Copy link
Contributor Author

This is WIP for a few reasons at the moment:

  • it assumes that adding an implication between IsPlistRep and HasLength is OK see IsPlistRep should imply HasLength #441.
  • it needs a few bits of documentation like an entry in dev/Updates
  • it needs more thorough tests. I'm not sure the current ones cover all the cases.
  • it should be benchmarked against the existing code, just to make sure it isn't a lot slower in some case.

@stevelinton
Copy link
Contributor Author

Documentation and tests now fixed. Benchmarking would still be good.

@ChrisJefferson
Copy link
Contributor

This example is hit very badly:

f := x -> List([1..x], y -> [y*5..(y+1)*5]); Union(f(10000));

Used to take 12ms, now takes 21,456ms.

While that would be fairly easy to fix with an explicit check for when all input ranges have skip 1, and the output also has skip 1, similar issues arise with skips as well.

f := x -> List([1..x], y -> [y*15,(y+1)*15..(y+5)*15]); Union(f(10000));

Used to take 10ms, now 4,274ms.

I've been talking this over, I think just special casing some important cases (small number of ranges, all ranges dense, ranges all with same skip) might be easier and faster, if less satisfying / does less merging.

@stevelinton
Copy link
Contributor Author

These hit a heuristic that I think I got wrong. If you comment out lines 2929 to 2937 it's much faster.

@ChrisJefferson
Copy link
Contributor

That is much faster. It's still a little slower (if I push up to 1,000,000, then old gap takes about 1.6s, new gap 4.8s). However, the old code slows down quite a bit if I shuffle the list.

I wrote a little extension (which can be seen at https://github.com/ChrisJefferson/gap/tree/union_sweep ), which special cases when all ranges, and the resulting range, are dense, and then does a simple one-pass, which speeds up the algorithm in that case back to the old one.

@ChrisJefferson
Copy link
Contributor

Another bad case:

f := x -> List([1..x], y -> [y..y+Random([1..100])]);

This takes about 1.3s in old GAP, 76s in head. I can greatly improve the performance by sorting by first element instead of largest first on line 2881. I think this is probably a better order, as it will reduce the amount of splitting of ranges which goes on (we could then also add an optimisation which checks if the first element of the remaining values is ever greater than the first element of the first range, and if so abort early).

@stevelinton
Copy link
Contributor Author

I think the latest version deals with this pretty well. I have switched to the order you use (by leftmost entry) and added a variation of your sweep that deals with all the matching stride inputs without creating new ranges. Then whatever is left goes on to be checked against non-matching stride inputs and non-ranges.

@ChrisJefferson
Copy link
Contributor

One small comment, based on what you said you seem to sort first by stride, then first element:
data := List(ranges, r-> [r[2]-r[1],r[1],-Length(r)]);

Other than that (and if you did that on purpose), then this is great. I can imagine no better union algorithm :)

…new test.

Improve test file to get 100% coverage of new code. Fix some bugs. Document change.

Add a sweep phase to deal with the matching stride case faster.
@stevelinton stevelinton changed the title New Union implementation (WIP) New Union implementation Jan 13, 2016
@stevelinton
Copy link
Contributor Author

Ready for merge I think. Fixes a bug so maybe should be in stable, but it's quite a lot of new code
for a very rare bug, so maybe not yet.

@markuspf
Copy link
Member

Looks good to me right now.

olexandr-konovalov pushed a commit that referenced this pull request Jan 13, 2016
@olexandr-konovalov olexandr-konovalov merged commit 71c75fc into gap-system:master Jan 13, 2016
@olexandr-konovalov
Copy link
Member

@stevelinton New diffs in manual examples after this PR, but only the 1st one points to a genuine problem:

============================================================
########> Diff in [ "./../../lib/coll.gd", 3084, 3095 ]
# Input is:
Union( [ (1,2,3), (1,2,3,4) ], Group( (1,2,3), (1,2) ) );
# Expected output:
[ (), (2,3), (1,2), (1,2,3), (1,2,3,4), (1,3,2), (1,3) ]
# But found:
Error, AppendList: <list2> must be a small list (not a object (compone\
nt))
########
########> Diff in [ "./../../lib/coll.gd", 3084, 3095 ]
# Input is:
Union( [ [1,2,4], [2,3,4], [1,3,4] ] )
  ;  # or one list of lists or collections
# Expected output:
[ 1, 2, 3, 4 ]
# But found:
[ 1 .. 4 ]
########
########> Diff in [ "./../../lib/grp.gd", 2679, 2687 ]
# Input is:
FactorGroup(g,n);
# Expected output:
Group([ f1, f2 ])
# But found:
<pc group of size 6 with 2 generators>
########
########> Diff in [ "./../../lib/oprt.gd", 1603, 1606 ]
# Input is:
MaximalBlocks(g,[1..8]);
# Expected output:
[ [ 1, 2, 3, 8 ], [ 4, 5, 6, 7 ] ]
# But found:
[ [ 1, 2, 3, 8 ], [ 4 .. 7 ] ]
########
########> Diff in [ "./pperm.xml", 573, 577 ]
# Input is:
ImageOfPartialPermCollection(GeneratorsOfInverseSemigroup(S));
# Expected output:
[ 1, 2, 3, 4, 5 ]
# But found:
[ 1 .. 5 ]
########
########> Diff in [ "./intrfc.xml", 749, 773 ]
# Input is:
a+b;
# Expected output:
[ 1, 2, 3, 4 ]
# But found:
[ 1 .. 4 ]
########

@stevelinton
Copy link
Contributor Author

Fixed in a commit a minute ago.

S

On 14 Jan 2016, at 11:31, Alexander Konovalov [email protected] wrote:

@stevelinton New diffs in manual examples after this PR, but only the 1st one points to a genuine problem:

########> Diff in [ "./../../lib/coll.gd", 3084, 3095 ]

Input is:

Union( [ (1,2,3), (1,2,3,4) ], Group( (1,2,3), (1,2) ) );

Expected output:

[ (), (2,3), (1,2), (1,2,3), (1,2,3,4), (1,3,2), (1,3) ]

But found:

Error, AppendList: must be a small list (not a object (compone
nt))
########
########> Diff in [ "./../../lib/coll.gd", 3084, 3095 ]

Input is:

Union( [ [1,2,4], [2,3,4], [1,3,4] ] )
; # or one list of lists or collections

Expected output:

[ 1, 2, 3, 4 ]

But found:

[ 1 .. 4 ]
########
########> Diff in [ "./../../lib/grp.gd", 2679, 2687 ]

Input is:

FactorGroup(g,n);

Expected output:

Group([ f1, f2 ])

But found:

<pc group of size 6 with 2 generators>
########
########> Diff in [ "./../../lib/oprt.gd", 1603, 1606 ]

Input is:

MaximalBlocks(g,[1..8]);

Expected output:

[ [ 1, 2, 3, 8 ], [ 4, 5, 6, 7 ] ]

But found:

[ [ 1, 2, 3, 8 ], [ 4 .. 7 ] ]
########
########> Diff in [ "./pperm.xml", 573, 577 ]

Input is:

ImageOfPartialPermCollection(GeneratorsOfInverseSemigroup(S));

Expected output:

[ 1, 2, 3, 4, 5 ]

But found:

[ 1 .. 5 ]
########
########> Diff in [ "./intrfc.xml", 749, 773 ]

Input is:

a+b;

Expected output:

[ 1, 2, 3, 4 ]

But found:

[ 1 .. 4 ]
########


Reply to this email directly or view it on GitHub.

@bh11
Copy link
Contributor

bh11 commented Jan 17, 2016

The new Union code breaks test code in the CRISP package, namely

Union (TrivialGroups, TrivialGroups);

The reason is that the new code tries to apply 'Set' to to an object representing a proper class (lines 2728 and 2755). Does it make sense to use 'Set' only if the object in question is a list and to return the collection otherwise? An attempt to unite two collections which happen to be identical might run into a similar problem, trying to expand them into a (possiby huge) ssorted list.

I suspect that there is a similar problem with resclasses.

@stevelinton
Copy link
Contributor Author

@bh11 you're quite right. The special path for union of a single object shouldn't blindly apply Set.

@stevelinton
Copy link
Contributor Author

Have pushed a further fix.

@olexandr-konovalov
Copy link
Member

Another related error could be in SimpComp package:

########> Diff in /home/hudson/hudson/workspace/GAP-major-release-test/GAPCOPT\
S/64build/GAPGMP/gmp/GAPTARGET/packages/label/graupius/gap4r9/pkg/simpcomp/tst\
/simpcomp.tst, line 982:
# Input is:
c:=SCFromDifferenceCycles([[1,1,6],[3,3,2]]);;
# Expected output:
# But found:
Error, Union: arguments must be lists or collections
########

@bh11
Copy link
Contributor

bh11 commented Jan 17, 2016

@stevelinton Thanks.

Btw., I think that line 2730 should read 'return Set(x)', not just 'Set(x)'.

@stevelinton
Copy link
Contributor Author

@alex-konovalov This one is slightly ugly.

simpcomp creates some objects that do not have IsListOrCollection but for which it installs a Union2 method. Since my code checks IsListOrCollection, it gives the error before ever getting to Union2.

This is easy enough to fix. I can remove the check and let bad cases go through to a No Method Found for Union2.

However, what am I supposed to do if asked for the Union of just ONE such object?

If I don't do the check then you get daft things like Union([Z(5)]) returning Z(5), but I can't easily tell whether a arbitrary object is going to have a method for Union2.

I coudl return Union2(x,x) in such a situation, but it seems rather a kludge.

I wonder if the right answer here is to suggest that simpcomp should set IsListOrCollection for these objects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants