Collections API #1606

Merged: echeipesh merged 26 commits into locationtech:master from pomadchin:feature/collections-api on Sep 1, 2016.
Commits (the diff below reflects changes from the first 12 of the 26 commits):
c70a674 collections api init (pomadchin)
688d088 LayerCollection readers for all backends without optimisations; Hadoo… (pomadchin)
591ab02 fix hadoop collection reader (pomadchin)
0928cc5 add reading (pomadchin)
c67d646 Merge branch 'master' of github.com:pomadchin/geotrellis into feature… (pomadchin)
43405d5 +file multithread reads (pomadchin)
ee67aa6 fix collections api (pomadchin)
2abeb1f hbase collection reader (pomadchin)
2b91182 parallelize reads in collections api (pomadchin)
50e0023 improve collection api reads (pomadchin)
e85fb2d fixed thread pools in collection readers (pomadchin)
7158cf2 collections reading threads are configurable (pomadchin)
47352b7 Accumulo sim; removed partitions number, generic njoin func (pomadchin)
16b4117 Merge branch 'master' of github.com:pomadchin/geotrellis into feature… (pomadchin)
807134f accumulo and hbase etl fix (pomadchin)
7367d66 hide thread pool creation / closing inside njoin function (pomadchin)
821b09d improve thread pool size definition (pomadchin)
1e083f0 explicit return type in all colelction readers (pomadchin)
bbf27c3 hbase conenction control fix (pomadchin)
3eead88 hbase reads performance improvements (pomadchin)
20df234 rollback readers; they were operating normally; problems were caused … (pomadchin)
d08f30d safer hbase scanners handle (pomadchin)
238f320 LayerCollection.njoin function usage (pomadchin)
7a499dd LayerCollection.njoin function usage (pomadchin)
7b2f659 Merge branch 'feature/collections-api' of https://github.com/pomadchi… (echeipesh)
3c0134c Merge pull request #15 from echeipesh/feature/collections-api-njoin (pomadchin)
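The heart of the change: each backend gains a CollectionLayerReader whose read returns a ContextCollection, an in-memory Seq[(K, V)] that also carries the layer metadata, rather than a Spark RDD as the existing layer readers do. The PR's actual definitions live outside this diff, so the sketch below is only an illustration of that shape:

    trait Metadata[M] {
      def metadata: M
    }

    // Illustrative stand-in for the ContextCollection the readers below construct:
    // a plain Seq of key/value pairs that also exposes the layer metadata.
    class ContextCollection[K, V, M](seq: Seq[(K, V)], val metadata: M)
        extends Seq[(K, V)] with Metadata[M] {
      def iterator: Iterator[(K, V)] = seq.iterator
      def apply(i: Int): (K, V) = seq(i)
      def length: Int = seq.length
    }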
accumulo/src/main/scala/geotrellis/spark/io/accumulo/AccumuloCollectionReader.scala (new file, 43 additions)
package geotrellis.spark.io.accumulo

import geotrellis.spark.io.avro.codecs.KeyValueRecordCodec
import geotrellis.spark.io.avro.{AvroEncoder, AvroRecordCodec}
import geotrellis.spark.{Boundable, KeyBounds}

import org.apache.accumulo.core.data.{Range => AccumuloRange}
import org.apache.accumulo.core.security.Authorizations
import org.apache.avro.Schema
import org.apache.hadoop.io.Text

import scala.collection.JavaConversions._
import scala.reflect.ClassTag

object AccumuloCollectionReader {
  def read[K: Boundable: AvroRecordCodec: ClassTag, V: AvroRecordCodec: ClassTag](
    table: String,
    columnFamily: Text,
    queryKeyBounds: Seq[KeyBounds[K]],
    decomposeBounds: KeyBounds[K] => Seq[AccumuloRange],
    filterIndexOnly: Boolean,
    writerSchema: Option[Schema] = None
  )(implicit instance: AccumuloInstance): Seq[(K, V)] = {
    if(queryKeyBounds.isEmpty) return Seq.empty[(K, V)]

    val codec = KeyValueRecordCodec[K, V]
    val includeKey = (key: K) => queryKeyBounds.includeKey(key)

    val ranges = queryKeyBounds.flatMap(decomposeBounds)

    // One scanner per decomposed range; each Accumulo value holds an
    // Avro-encoded vector of key/value records.
    ranges flatMap { range: AccumuloRange =>
      val scanner = instance.connector.createScanner(table, new Authorizations())
      scanner.setRange(range)
      scanner.fetchColumnFamily(columnFamily)
      scanner.iterator.map { case entry =>
        AvroEncoder.fromBinary(writerSchema.getOrElse(codec.schema), entry.getValue.get)(codec)
      }.flatMap { pairs: Vector[(K, V)] =>
        if(filterIndexOnly) pairs
        else pairs.filter { pair => includeKey(pair._1) }
      }
    }
  }
}
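One thing to note in the file above: each range opens a Scanner that is never explicitly closed. A possible tightening, sketched here rather than taken from the PR and reusing the surrounding definitions (codec, includeKey, and the imports of the file above), would materialize each range's results and then release the scanner, assuming an Accumulo version where ScannerBase provides close():

    ranges flatMap { range: AccumuloRange =>
      val scanner = instance.connector.createScanner(table, new Authorizations())
      try {
        scanner.setRange(range)
        scanner.fetchColumnFamily(columnFamily)
        scanner.iterator.flatMap { entry =>
          val pairs = AvroEncoder.fromBinary(writerSchema.getOrElse(codec.schema), entry.getValue.get)(codec)
          if (filterIndexOnly) pairs
          else pairs.filter { pair => includeKey(pair._1) }
        }.toVector // force evaluation before the scanner is closed
      } finally scanner.close() // release server-side scan resources
    }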
accumulo/src/main/scala/geotrellis/spark/io/accumulo/AccumuloLayerCollectionReader.scala (new file, 47 additions)
package geotrellis.spark.io.accumulo

import geotrellis.spark._
import geotrellis.spark.io._
import geotrellis.spark.io.avro._
import geotrellis.util._

import org.apache.accumulo.core.data.{Range => AccumuloRange}
import org.apache.hadoop.io.Text
import spray.json._

import scala.reflect._

class AccumuloLayerCollectionReader(val attributeStore: AttributeStore)(implicit instance: AccumuloInstance) extends CollectionLayerReader[LayerId] {

  def read[
    K: AvroRecordCodec: Boundable: JsonFormat: ClassTag,
    V: AvroRecordCodec: ClassTag,
    M: JsonFormat: GetComponent[?, Bounds[K]]
  ](id: LayerId, rasterQuery: LayerQuery[K, M], numPartitions: Int, filterIndexOnly: Boolean) = {
    if (!attributeStore.layerExists(id)) throw new LayerNotFoundError(id)

    // Pull the layer's header, metadata, key index, and writer schema from the attribute store.
    val LayerAttributes(header, metadata, keyIndex, writerSchema) = try {
      attributeStore.readLayerAttributes[AccumuloLayerHeader, M, K](id)
    } catch {
      case e: AttributeNotFoundError => throw new LayerReadError(id).initCause(e)
    }

    val queryKeyBounds = rasterQuery(metadata)

    // Translate query key bounds into Accumulo row ranges via the layer's key index.
    val decompose = (bounds: KeyBounds[K]) =>
      keyIndex.indexRanges(bounds).map { case (min, max) =>
        new AccumuloRange(new Text(AccumuloKeyEncoder.long2Bytes(min)), new Text(AccumuloKeyEncoder.long2Bytes(max)))
      }

    val seq = AccumuloCollectionReader.read[K, V](header.tileTable, columnFamily(id), queryKeyBounds, decompose, filterIndexOnly, Some(writerSchema))
    new ContextCollection(seq, metadata)
  }
}

object AccumuloLayerCollectionReader {
  def apply(attributeStore: AccumuloAttributeStore)(implicit instance: AccumuloInstance): AccumuloLayerCollectionReader =
    new AccumuloLayerCollectionReader(attributeStore)

  def apply(implicit instance: AccumuloInstance): AccumuloLayerCollectionReader =
    new AccumuloLayerCollectionReader(AccumuloAttributeStore(instance.connector))
}
cassandra/src/main/scala/geotrellis/spark/io/cassandra/CassandraCollectionReader.scala (new file, 92 additions)
package geotrellis.spark.io.cassandra

import geotrellis.spark.{Boundable, KeyBounds, LayerId}
import geotrellis.spark.io.CollectionLayerReader
import geotrellis.spark.io.avro.codecs.KeyValueRecordCodec
import geotrellis.spark.io.avro.{AvroEncoder, AvroRecordCodec}
import geotrellis.spark.io.index.{IndexRanges, MergeQueue}
import geotrellis.spark.util.KryoWrapper

import org.apache.avro.Schema
import com.datastax.driver.core.querybuilder.QueryBuilder
import com.datastax.driver.core.querybuilder.QueryBuilder.{eq => eqs}
import com.typesafe.config.ConfigFactory
import scalaz.std.vector._
import scalaz.concurrent.{Strategy, Task}
import scalaz.stream.{Process, nondeterminism}

import java.util.concurrent.Executors
import scala.collection.JavaConversions._
import scala.reflect.ClassTag

object CassandraCollectionReader {
  def read[K: Boundable: AvroRecordCodec: ClassTag, V: AvroRecordCodec: ClassTag](
    instance: CassandraInstance,
    keyspace: String,
    table: String,
    layerId: LayerId,
    queryKeyBounds: Seq[KeyBounds[K]],
    decomposeBounds: KeyBounds[K] => Seq[(Long, Long)],
    filterIndexOnly: Boolean,
    writerSchema: Option[Schema] = None,
    numPartitions: Option[Int] = None,
    threads: Int = ConfigFactory.load().getInt("geotrellis.cassandra.threads.collection.read")
  ): Seq[(K, V)] = {
    if (queryKeyBounds.isEmpty) return Seq.empty[(K, V)]

    val includeKey = (key: K) => queryKeyBounds.includeKey(key)
    val _recordCodec = KeyValueRecordCodec[K, V]
    val kwWriterSchema = KryoWrapper(writerSchema) // Avro Schema is not Serializable

    // Merge overlapping index ranges before querying.
    val ranges = if (queryKeyBounds.length > 1)
      MergeQueue(queryKeyBounds.flatMap(decomposeBounds))
    else
      queryKeyBounds.flatMap(decomposeBounds)

    val bins = IndexRanges.bin(ranges, numPartitions.getOrElse(CollectionLayerReader.defaultNumPartitions)).toVector.map(_.toIterator)

    // One prepared statement, bound per index value.
    val query = QueryBuilder.select("value")
      .from(keyspace, table)
      .where(eqs("key", QueryBuilder.bindMarker()))
      .and(eqs("name", layerId.name))
      .and(eqs("zoom", layerId.zoom))
      .toString

    val pool = Executors.newFixedThreadPool(threads)

    val result = instance.withSessionDo { session =>
      val statement = session.prepare(query)

      bins flatMap { partition =>
        // Stream of index sub-ranges within this partition.
        val range: Process[Task, Iterator[Long]] = Process.unfold(partition) { iter =>
          if (iter.hasNext) {
            val (start, end) = iter.next()
            Some((start to end).toIterator, iter)
          }
          else None
        }

        // Read every index in a sub-range, decoding each row's Avro payload.
        val read: Iterator[Long] => Process[Task, Vector[(K, V)]] = { iterator =>
          Process.unfold(iterator) { iter =>
            if (iter.hasNext) {
              val index = iter.next()
              val row = session.execute(statement.bind(index.asInstanceOf[java.lang.Long]))
              if (row.nonEmpty) {
                val bytes = row.one().getBytes("value").array()
                val recs = AvroEncoder.fromBinary(kwWriterSchema.value.getOrElse(_recordCodec.schema), bytes)(_recordCodec)
                if (filterIndexOnly) Some(recs, iter)
                else Some(recs.filter { row => includeKey(row._1) }, iter)
              } else Some(Vector.empty, iter)
            } else {
              None
            }
          }
        }

        // Run up to `threads` sub-range reads concurrently and concatenate the results.
        nondeterminism.njoin(maxOpen = threads, maxQueued = threads) { range map read }(Strategy.Executor(pool)).runFoldMap(identity).unsafePerformSync
      }
    }

    pool.shutdown()
    result
  }
}
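The concurrency machinery above reduces to a small, reusable pattern: an outer Process emits units of work, an inner Process performs the reads, and nondeterminism.njoin runs up to `threads` of them at once on a dedicated pool (a later commit in this PR factors this out as a generic njoin helper). A stripped-down sketch, with readOne as a hypothetical per-unit read:

    import java.util.concurrent.Executors
    import scalaz.std.vector._
    import scalaz.concurrent.{Strategy, Task}
    import scalaz.stream.{Process, nondeterminism}

    def parRead[A, B](work: List[A], threads: Int)(readOne: A => Vector[B]): Vector[B] = {
      val pool = Executors.newFixedThreadPool(threads)
      // Outer stream of inner read tasks, one per unit of work.
      val source: Process[Task, Process[Task, Vector[B]]] =
        Process.emitAll(work).map(a => Process.eval(Task(readOne(a))(pool)))
      val result =
        nondeterminism.njoin(maxOpen = threads, maxQueued = threads)(source)(Strategy.Executor(pool))
          .runFoldMap(identity) // Vector's Monoid concatenates the per-unit results
          .unsafePerformSync
      pool.shutdown()
      result
    }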
cassandra/src/main/scala/geotrellis/spark/io/cassandra/CassandraLayerCollectionReader.scala (new file, 42 additions)
package geotrellis.spark.io.cassandra

import geotrellis.spark._
import geotrellis.spark.io._
import geotrellis.spark.io.avro._
import geotrellis.util._

import spray.json._

import scala.reflect._

class CassandraLayerCollectionReader(val attributeStore: AttributeStore, instance: CassandraInstance) extends CollectionLayerReader[LayerId] {

  def read[
    K: AvroRecordCodec: Boundable: JsonFormat: ClassTag,
    V: AvroRecordCodec: ClassTag,
    M: JsonFormat: GetComponent[?, Bounds[K]]
  ](id: LayerId, rasterQuery: LayerQuery[K, M], numPartitions: Int, filterIndexOnly: Boolean) = {
    if (!attributeStore.layerExists(id)) throw new LayerNotFoundError(id)

    val LayerAttributes(header, metadata, keyIndex, writerSchema) = try {
      attributeStore.readLayerAttributes[CassandraLayerHeader, M, K](id)
    } catch {
      case e: AttributeNotFoundError => throw new LayerReadError(id).initCause(e)
    }

    val queryKeyBounds = rasterQuery(metadata)

    val decompose = (bounds: KeyBounds[K]) => keyIndex.indexRanges(bounds)

    val seq = CassandraCollectionReader.read[K, V](instance, header.keyspace, header.tileTable, id, queryKeyBounds, decompose, filterIndexOnly, Some(writerSchema))
    new ContextCollection(seq, metadata)
  }
}

object CassandraLayerCollectionReader {
  def apply(instance: CassandraInstance): CassandraLayerCollectionReader =
    new CassandraLayerCollectionReader(CassandraAttributeStore(instance), instance)

  def apply(attributeStore: CassandraAttributeStore): CassandraLayerCollectionReader =
    new CassandraLayerCollectionReader(attributeStore, attributeStore.instance)
}
Review discussion

Review question: I know the scanner has a thread pool available to run multiple requests. Does it do the thing that would be very useful here and process multiple ranges asynchronously, or are the ranges essentially sequential with respect to each other, with only the work inside each range asynchronous?

Author's reply: Async for each range. I will fix that issue when updating this PR against master; I thought it was completely async, but forgot that this holds only in the case of HBase, where we can set up a multirange scanner.
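For context, the follow-up commits in this PR ("generic njoin func", "hide thread pool creation / closing inside njoin function") move the Accumulo reader onto the pooled njoin pattern shown for Cassandra above. The same effect can be sketched with plain scala.concurrent Futures, one task per decomposed range on a bounded pool; the names here are illustrative, not the PR's final API:

    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration

    // Run one read per range concurrently; the fixed pool bounds the parallelism.
    def readRangesConcurrently[R, T](ranges: Seq[R], threads: Int)(readRange: R => Vector[T]): Vector[T] = {
      val pool = Executors.newFixedThreadPool(threads)
      implicit val ec: ExecutionContext = ExecutionContext.fromExecutor(pool)
      try {
        val futures = ranges.map(r => Future(readRange(r)))
        Await.result(Future.sequence(futures), Duration.Inf).toVector.flatten
      } finally pool.shutdown()
    }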