-
Notifications
You must be signed in to change notification settings - Fork 323
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement relational NULL semantics for Nothing for in-memory Column operations #8816
Changes from 55 commits
32e5781
d76d2ab
8b5d2d1
f7b18e9
9b67c34
2543611
0e6422b
85ca4bf
f4e89ec
9a7b047
bba6daa
987c366
e35bfda
ad4bada
9ef1d05
1118fa2
42b7762
ef83197
0a62d1d
9046cba
49c54e6
db5634f
1c94d88
6f53139
8416e0b
10ef070
ae8c4c6
573de61
e923c7a
8425998
665a6a8
577a993
a2ef8fd
0dddfac
dcb3e0a
fd49520
4c99f18
656538c
36d9aa7
26e537e
276bf76
b33e5df
dbf0df7
8de8498
a23b341
aa193f7
af922af
1e8f1d9
a1c481a
cfce031
305c018
fd40de3
564587d
391a421
b4db21b
83645ba
b692bea
0d328cf
d5280bd
7f610d0
ac11b26
2f77775
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -55,16 +55,21 @@ public Storage<?> runMap(S storage, List<?> arg) { | |
Context context = Context.getCurrent(); | ||
CompactRepresentation<T> compactRepresentation = prepareList(arg); | ||
BitSet newVals = new BitSet(); | ||
BitSet missing = new BitSet(); | ||
for (int i = 0; i < storage.size(); i++) { | ||
if (storage.isNa(i) && compactRepresentation.hasNulls) { | ||
newVals.set(i); | ||
if (storage.isNa(i)) { | ||
missing.set(i); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What if Do we have a test case for such scenario? If not, please let's add it and ensure the implementation is correct. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done. |
||
} else if (compactRepresentation.coercedValues.contains(storage.getItemBoxed(i))) { | ||
newVals.set(i); | ||
} else if (compactRepresentation.hasNulls) { | ||
missing.set(i); | ||
} else { | ||
// Leave as default=false | ||
} | ||
|
||
context.safepoint(); | ||
} | ||
return new BoolStorage(newVals, new BitSet(), storage.size(), false); | ||
return new BoolStorage(newVals, missing, storage.size(), false); | ||
} | ||
|
||
@Override | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,17 @@ | ||
package org.enso.table.data.column.operation.map.bool; | ||
|
||
import java.util.BitSet; | ||
import java.util.List; | ||
import org.enso.table.data.column.operation.map.BinaryMapOperation; | ||
import org.enso.table.data.column.operation.map.MapOperationProblemAggregator; | ||
import org.enso.table.data.column.storage.BoolStorage; | ||
import org.enso.table.data.column.storage.Storage; | ||
import org.enso.table.util.ImmutableBitSet; | ||
import org.graalvm.polyglot.Context; | ||
|
||
/** | ||
* A specialized implementation for the IS_IN operation on booleans - since booleans have just three | ||
* possible values we can have a highly efficient implementation that does not even rely on hashmap | ||
* and after processing the input vector, performs the checks in constant time. | ||
* and after processing the input vector, performs the checks using only BitSet builtins. | ||
*/ | ||
public class BooleanIsInOp extends BinaryMapOperation<Boolean, BoolStorage> { | ||
public BooleanIsInOp() { | ||
|
@@ -60,43 +60,30 @@ public Storage<?> runZip( | |
} | ||
|
||
private BoolStorage run(BoolStorage storage, boolean hadNull, boolean hadTrue, boolean hadFalse) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Perhaps just my lack of knowledge of the codebase, but what is the difference between Null, Nothing and Missing? This PR is about Nothing, this function takes a hadNull and generates a newMissing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think they all mean the same concept here really, so it is just a bit of chaos that got into the codebase over time.
We could probably rename all of these to |
||
BitSet newVals; | ||
boolean negated = false; | ||
int size = storage.size(); | ||
ImmutableBitSet values = new ImmutableBitSet(storage.getValues(), size); | ||
ImmutableBitSet missing = new ImmutableBitSet(storage.getIsMissing(), size); | ||
boolean negated = storage.isNegated(); | ||
|
||
if (hadNull && hadTrue && hadFalse) { | ||
// We use empty newVals which has everything set to false and negate it to make all of that | ||
// set to true with zero cost. | ||
newVals = new BitSet(); | ||
negated = true; | ||
} else if (!hadNull && !hadTrue && !hadFalse) { | ||
// No values are present, so the result is to be false everywhere. | ||
newVals = new BitSet(); | ||
} else if (hadNull && !hadTrue && !hadFalse) { | ||
// Only missing values are in the set, so we just return the missing indicator. | ||
newVals = storage.getIsMissing(); | ||
} else if (hadTrue && hadFalse) { // && !hadNull | ||
// All non-missing values are in the set - so we just return the negated missing indicator. | ||
newVals = storage.getIsMissing(); | ||
negated = true; | ||
} else { | ||
// hadTrue != hadFalse | ||
newVals = storage.getValues().get(0, storage.size()); | ||
if (hadTrue) { | ||
if (storage.isNegated()) { | ||
newVals.flip(0, storage.size()); | ||
} | ||
} else { // hadFalse | ||
if (!storage.isNegated()) { | ||
newVals.flip(0, storage.size()); | ||
} | ||
} | ||
newVals.andNot(storage.getIsMissing()); | ||
ImmutableBitSet newValues; | ||
ImmutableBitSet newMissing; | ||
|
||
if (hadNull) { | ||
newVals.or(storage.getIsMissing()); | ||
} | ||
if (hadTrue && !hadFalse) { | ||
newValues = storage.isNegated() ? missing.notAndNot(values) : missing.notAnd(values); | ||
newMissing = | ||
hadNull ? (storage.isNegated() ? missing.or(values) : missing.orNot(values)) : missing; | ||
} else if (!hadTrue && hadFalse) { | ||
newValues = storage.isNegated() ? missing.notAnd(values) : missing.notAndNot(values); | ||
newMissing = | ||
hadNull ? (storage.isNegated() ? missing.orNot(values) : missing.or(values)) : missing; | ||
} else if (hadTrue && hadFalse) { | ||
newValues = missing.not(); | ||
newMissing = missing; | ||
} else { | ||
newValues = ImmutableBitSet.allFalse(size); | ||
newMissing = hadNull ? ImmutableBitSet.allTrue(size) : ImmutableBitSet.allFalse(size); | ||
} | ||
|
||
return new BoolStorage(newVals, new BitSet(), storage.size(), negated); | ||
return new BoolStorage(newValues.toBitSet(), newMissing.toBitSet(), size, false); | ||
} | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
package org.enso.table.util; | ||
|
||
import java.util.BitSet; | ||
|
||
/** | ||
* A wrapper around BitSet that implements boolean operations conveniently. Unlike BitSet, | ||
* ImmutableBitSet takes a size parameter, which allows .not to be implemented. | ||
*/ | ||
Comment on lines
+5
to
+8
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Cool! |
||
public class ImmutableBitSet { | ||
private BitSet bitSet; | ||
private int size; | ||
|
||
public ImmutableBitSet(BitSet bitSet, int size) { | ||
this.bitSet = bitSet; | ||
this.size = size; | ||
} | ||
|
||
public BitSet toBitSet() { | ||
return bitSet; | ||
} | ||
|
||
public ImmutableBitSet and(ImmutableBitSet other) { | ||
assert size == other.size; | ||
BitSet result = (BitSet) bitSet.clone(); | ||
result.and(other.bitSet); | ||
return new ImmutableBitSet(result, size); | ||
} | ||
|
||
public ImmutableBitSet or(ImmutableBitSet other) { | ||
assert size == other.size; | ||
BitSet result = (BitSet) bitSet.clone(); | ||
result.or(other.bitSet); | ||
return new ImmutableBitSet(result, size); | ||
} | ||
|
||
public ImmutableBitSet andNot(ImmutableBitSet other) { | ||
assert size == other.size; | ||
BitSet result = (BitSet) bitSet.clone(); | ||
result.andNot(other.bitSet); | ||
return new ImmutableBitSet(result, size); | ||
} | ||
|
||
public ImmutableBitSet not() { | ||
BitSet result = (BitSet) bitSet.clone(); | ||
result.flip(0, size); | ||
return new ImmutableBitSet(result, size); | ||
} | ||
|
||
public ImmutableBitSet notAnd(ImmutableBitSet other) { | ||
assert size == other.size; | ||
BitSet result = (BitSet) bitSet.clone(); | ||
result.flip(0, size); | ||
result.and(other.bitSet); | ||
return new ImmutableBitSet(result, size); | ||
} | ||
|
||
public ImmutableBitSet notAndNot(ImmutableBitSet other) { | ||
assert size == other.size; | ||
BitSet result = (BitSet) bitSet.clone(); | ||
result.flip(0, size); | ||
result.andNot(other.bitSet); | ||
return new ImmutableBitSet(result, size); | ||
} | ||
|
||
public ImmutableBitSet orNot(ImmutableBitSet other) { | ||
// Doing an extra operation to avoid doing an extra allocation. | ||
// a || !b => !(!a && b) | ||
assert size == other.size; | ||
BitSet result = (BitSet) bitSet.clone(); | ||
result.flip(0, size); | ||
result.and(other.bitSet); | ||
result.flip(0, size); | ||
return new ImmutableBitSet(result, size); | ||
} | ||
|
||
public static ImmutableBitSet allFalse(int size) { | ||
return new ImmutableBitSet(new BitSet(), size); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is return new ImmutableBitSet(new BitSet(size), size); going to make the allocation more efficient? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking at the usages, I think to the contrary. The thing is that So it creates the So actually I think that keeping not adding the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, |
||
} | ||
|
||
public static ImmutableBitSet allTrue(int size) { | ||
return new ImmutableBitSet(new BitSet(), size).not(); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Similarly return new ImmutableBitSet(new BitSet(size), size).not(); ? |
||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@radeusgd I'm not sure how to invoke the fallback implementation of
==
(line 214) -- what case uses this?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need more context for this question. Line 214 of which file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 214 of this file -- I can't leave a comment there.
Or here: https://github.com/enso-org/enso/blob/develop/distribution/lib/Standard/Table/0.0.0-dev/src/Data/Column.enso#L214
I think this code might need to be modified for handling nulls differently, but I'm not sure what cases use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try using
==
on aMixed
column, e.g.I think it should trigger it, but you can add some println to make sure :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I remember now --
==
uses skip_nulls=True, so the fallback is never used for nulls. So no change is needed.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh :D