Skip to content
This repository has been archived by the owner on Dec 28, 2017. It is now read-only.

Refactor RangeBuilder RangeSpliter KeyRangeUtils and etc. #129

Open
1 of 3 tasks
zhexuany opened this issue Nov 3, 2017 · 7 comments
Open
1 of 3 tasks

Refactor RangeBuilder RangeSpliter KeyRangeUtils and etc. #129

zhexuany opened this issue Nov 3, 2017 · 7 comments
Assignees

Comments

@zhexuany
Copy link
Contributor

zhexuany commented Nov 3, 2017

  • get rid of Comparables
  • refactor RangeBuilder
  • refactor RangeSpliter.

Where does this used in current code base?

KeyRangeUtils

  • toRange method
    *
  if (range == null || (range.getStart().isEmpty() && range.getEnd().isEmpty())) {
    return Range.all();
  }
  if (range.getStart().isEmpty()) {
    return Range.lessThan(ByteStringCMP.of(range.getEnd()));
  }
  if (range.getEnd().isEmpty()) {
    return Range.atLeast(ByteStringCMP.of(range.getStart()));
  }
  return Range.closedOpen(ByteStringCMP.of(range.getStart()), ByteStringCMP.of(range.getEnd()));
  • makeRange method
    *
  if (startKey.isEmpty() && endKey.isEmpty()) {
    return Range.all();
  }
  if (startKey.isEmpty()) {
    return Range.lessThan(Comparables.wrap(endKey));
  } else if (endKey.isEmpty()) {
    return Range.atLeast(Comparables.wrap(startKey));
  }
  return Range.closedOpen(Comparables.wrap(startKey), Comparables.wrap(endKey));
} 

RangeBuilder

  • exprsToIndexRanges method
    • exprsToPoints
      • exprToPoints
        • normalize condition
        • Iterate all constant values in normalized
      • exprToRanges
        • Comparable<?> comparableVal = Comparables.wrap(constVal.getValue());
  • exprToRanges method
    • get value from constant and make it comparable.
    • and Range need the constant to be a Comparable
      RegionManager
  • used in keyToRegionIdCache
    • keyToRegionIdCache.get(Comparables.wrap(key))
      ScanIterator
  • used in scanRange which is a Range. The key need to be comparable.
    *
  return scanRange.contains(Comparables.wrap(key));
}

Why we need this comparables?

Range need key to be comparable, but ByteString and Bytep[] are not comparable.
Comparables is just a wrapper. Comparables is just to resolve ByteString and Byte[].

Solution for replacing Comparables.

Create a new class call Key

public abstract class RangeKey<T> implements Comparable<RangeKey<T>>{
    T key;      
    public abstract int compareTo(RangeKey<T> b);
}

public class RangeByteStringKey<ByteString> extends RangeKey<ByteString> {
    RangeByteStringKey(ByteString key) {
        this.key = key;
    }

    public int compareTo(RangeKey<ByteString> b) {
        ByteString bKey = b.key;
       int n = Math.min(key.size(), bkey.size());
       for (int i = 0, j = 0; i < n; i++, j++) {
            int cmp = UnsignedBytes.compare(key.byteAt(i), bKey.byteAt(j));
            if (cmp != 0) return cmp;
        }
        // one is the prefix of other then the longer is larger
        return bytes.size() - otherBytes.size();
    }
}

public class RangeByteArrKey<byte[]> extends RangeKey<byte[]> {
    
    private static final Comparator<byte[]> comparator = UnsignedBytes.lexicographicalComparator();
    RangeByteArrKey(byte[] key) {
        this.key = key;
    }

    public int compareTo(RangeKey<byte[]> b) {
        Byte[] bKey = b.key;
       return comparator.compare(bytes, other.bytes);
    }
}

public class RangeCMPKey<C extends Comparable<C>> extends RangeKey<C extends Comparable<C>> {
     public int compareTo(RangeKey<C extends Comparable<C>> b) {
        b.key.compareTo(this.key);
    }
}

@zhexuany
Copy link
Contributor Author

zhexuany commented Nov 3, 2017

@ilovesoup @Novemser @birdstorm PTAL.

@birdstorm
Copy link
Contributor

birdstorm commented Nov 3, 2017

I would like to state my reason why we should replace Comparables with something else:

Although in current base branch, Comparables only needs to deal with comparisons of byte[] type and ByteString type RESPECTIVELY, it should also be capable to resolve more complicated conditions. For instance, when we need to compare byte[] with ByteString directly.

Sometimes it is confusing if we have two types of data waiting to compare with, while we have't get their types explicitly. We know that TiDB stores ranges in form of Datum, it indicates MAX and MIN of the data range by a byte array rather than its real type. When you read the encoded data from database, you would likely encounter various problems including and not limited to comparisons between Integer type and byte array.

In my perspective the new implementation should at least resolve problems above, that's why I have tried a rather silly approach in my selectivity branch, by implementing a TiKey<T> that wraps all types into same type if not implicitly comparable.

I know that wasn't a desirable solution to this issue, we should dig more into this. Because I can't see how the RangeKey<T> solution can dispel my concerns.

@birdstorm
Copy link
Contributor

I am considering building our own Datum type to make cleaner interface, we can store the data type along with data itself inside Datum and compare them in form of byte array implicitly(perhaps iff they are NOT compatible). So we could avoid regularly comparisons that leads to performance loss.

@ilovesoup
Copy link
Contributor

Per discussion offline, I agree with @birdstorm
there couple of problem to address:

  1. unify bytestring from grpc and some internally generated byte array;
  2. all values from user interface(spark sql for now) should use the same representation.
  3. some value are not cleanly represented (INFINITE and values beyond java realm)

@zhexuany
Copy link
Contributor Author

zhexuany commented Nov 3, 2017

@birdstorm @ilovesoup
Thanks for all insightful comments.

Per " when we need to compare byte[] with ByteString directly.", I see no usage in current code base or potential usage. But it is a good point, we need convert our value into a unified form, e.g byte[].

Originally, I came up with a solution converting all value into byte[]. I gave it up because I worried about the performance issue. For ByteString, we convert to byte[] and compare it. It results a quadratic runtime. I thought is not good. However, after the discussion with @ilovesoup offline, I changed my mind. We actually do not spend a lot runtime at plan generation stage. These cost are acceptable.

Here was my original solution:

public abstract class ByteArrayComparable implements Comparable<byte[]> {
       byte[] value;
}

For other types, we can extend from this base class.

@zhexuany
Copy link
Contributor Author

zhexuany commented Nov 6, 2017

this issue is blocked by double index read, will work on this once that PR get merged into master.

@zhexuany zhexuany self-assigned this Nov 6, 2017
@ilovesoup
Copy link
Contributor

I will take over it and refactor type system on the fly.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants