-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(spanner): add SelectAll method to decode from Spanner iterator.Rows to golang struct #9206
Conversation
@CAFxX Can you please help review this please |
spanner/row.go
Outdated
reflect.ValueOf(dst).Elem().Field(i).Set(reflect.ValueOf(p).Elem()) | ||
} | ||
} | ||
sliceVal.Set(reflect.Append(sliceVal, sliceItem)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be great if we could preallocate the whole resultset, but IIRC this is not possible today because the spanner client does not know how many rows are there in the resultset until the end (not sure if this is a fundamental issue, or it can be improved)
@rahul2393 to be able to know what to focus on in the review, can you give me a rough idea of what kind of optimizations you have in your mind? |
Thanks CAFxX for review, I am trying to reduce the allocation ops as much as possible by preallocating as you said, since Spanner Row does not have meta information on number of rows we need to see how we can get maximum performance given the situation. |
CAFxX
For others
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second round
spanner/row.go
Outdated
|
||
isPrimitive := itemType.Kind() != reflect.Struct | ||
var pointers []interface{} | ||
isFistRow := true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] isFirstRow
spanner/row.go
Outdated
sliceItem.Field(i).Set(reflect.ValueOf(p).Elem()) | ||
} | ||
if rowIndex >= 0 { | ||
sliceVal.Index(rowIndex).Set(sliceItem) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this panic if the number of rows returned by RowsReturned is not precise (e.g. it's a lower bound)?
Does it have a huge impact if instead of using reflect.MakeSlice(sliceType, int(nRows), int(nRows))
we use reflect.MakeSlice(sliceType, 0, int(nRows))
and always append? this would handle both the cases in which RowsReturned returns a number that is too low (instead of panicking) or too high (instead of returning a slice with extra zero elements at the end). It would also make it easier in case we want to guarantee that we always append to a user-provided slice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, using reflect.MakeSlice(sliceType, 0, int(nRows))
was 25% slower compared to preallocated and setting the value via index, I have updated the code to not panic if RowsReturned
is lower bound,
btw it will be lower bound only for DML and PDML so normal Select queries should work without needing append.
spanner/row.go
Outdated
if isFistRow { | ||
nRows := rows.RowsReturned() | ||
if nRows != -1 { | ||
sliceVal = reflect.MakeSlice(sliceType, int(nRows), int(nRows)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the user passes in v
a slice that is preallocated, I think we should append to that slice instead of allocating a new one (and if so, this behavior should be documented as guaranteed)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have updated in comment that we will resets the destination slice
spanner/row.go
Outdated
@@ -378,3 +386,133 @@ func (r *Row) ToStructLenient(p interface{}) error { | |||
true, | |||
) | |||
} | |||
|
|||
// SelectAll scans rows into a slice (v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] we should probably document, just as a suggested best practice to avoid running out of memory, to only use SelectAll
on resultsets that the user knows to be bounded in size
spanner/row.go
Outdated
@@ -378,3 +386,133 @@ func (r *Row) ToStructLenient(p interface{}) error { | |||
true, | |||
) | |||
} | |||
|
|||
// SelectAll scans rows into a slice (v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should document the accepted types of v (or enforce it with generics)
spanner/row.go
Outdated
} | ||
vType := reflect.TypeOf(v) | ||
if k := vType.Kind(); k != reflect.Ptr { | ||
return errToStructArgType(v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this the right error message here? shouldn't it be errNotASlicePointer?
spanner/row.go
Outdated
isFistRow := true | ||
rowIndex := -1 | ||
return rows.Do(func(row *Row) error { | ||
sliceItem := reflect.New(itemType).Elem() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if I'm following correctly, this implies the type of v
must be something like *[]*struct {...}
.
Would it be too hard to also support *[]struct { ... }
? The benefit for users that pass a *[]struct { ... }
would be less calls to the allocator, lower allocated memory, and better memory locality (as all rows are guaranteed to be contiguous in memory, and no pointer chasing is required per element).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my bad, misread the code
spanner/read.go
Outdated
switch r.QueryStats["rows_returned"].(type) { | ||
case float64: | ||
return r.QueryStats["rows_returned"].(int64) | ||
case string: | ||
v, err := strconv.Atoi(r.QueryStats["rows_returned"].(string)) | ||
if err != nil { | ||
return -1 | ||
} | ||
return int64(v) | ||
default: | ||
return -1 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
switch r.QueryStats["rows_returned"].(type) { | |
case float64: | |
return r.QueryStats["rows_returned"].(int64) | |
case string: | |
v, err := strconv.Atoi(r.QueryStats["rows_returned"].(string)) | |
if err != nil { | |
return -1 | |
} | |
return int64(v) | |
default: | |
return -1 | |
} | |
switch rowsReturned := r.QueryStats["rows_returned"].(type) { | |
case float64: | |
return int64(rowsReturned) | |
case string: | |
v, err := strconv.ParseInt(rowsReturned, 10, 64) | |
if err != nil { | |
v = -1 | |
} | |
return v | |
} |
spanner/read.go
Outdated
// RowsReturned returns the number of rows returned by the query. If the query | ||
// was a DML statement, the number of rows affected is returned. If the query | ||
// was a PDML statement, the number of rows affected is a lower bound. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// RowsReturned returns the number of rows returned by the query. If the query | |
// was a DML statement, the number of rows affected is returned. If the query | |
// was a PDML statement, the number of rows affected is a lower bound. | |
// RowsReturned returns, a lower bound on the number of rows returned by the query. | |
// Currently this requires the query to be executed with query stats enabled. | |
// | |
// If the query was a DML statement, the number of rows affected is returned. | |
// If the query was a PDML statement, the number of rows affected is a lower bound. | |
// If the query was executed without query stats enabled, or if it is otherwise | |
// impossible to determine the number of rows in the resultset, -1 is returned. |
…ows to golang struct
@CAFxX please help take a look again. |
spanner/read.go
Outdated
@@ -121,6 +130,29 @@ type RowIterator struct { | |||
sawStats bool | |||
} | |||
|
|||
// RowsReturned returns, a lower bound on the number of rows returned by the query. | |||
// Currently, this requires the query to be executed with query stats enabled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should also document that the query stats are only included in the last response of the gRPC stream. That means that if you run a query with query stats enabled that returns a large number of rows, then this method is only guaranteed to return the number of rows once you have iterated through all the rows (which again makes it a bit less useful). So maybe add the following to this sentence: The query stats are only guaranteed to be available after iterating through all the rows.
And change the last point to something like this:
// If the query was executed without query stats enabled, if the query stats have not yet been
// returned by the server (the query stats are included in the last message of the query stream),
// or if it is otherwise impossible to determine the number of rows in the resultset, -1 is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it is especially true for SELECT
queries. DML/PDML will only return one PartialResultSet
, as they only return an update count and/or a small number of rows (in case the contain a THEN RETURN
clause).
SELECT
queries that are executed with AnalyzeMode=PROFILE
will include the number of rows returned, but only in the ResultSetStats
, and ResultSetStats
are returned together with the last PartialResultSet
of the stream. So for large queries, this value will only be available once you have iterated through enough of the rows for the client to have fetched the last PartialResultSet
from the stream (and in the current API, there is no way to determine when that is).
See https://cloud.google.com/spanner/docs/reference/rpc/google.spanner.v1#google.spanner.v1.ResultSetStats and https://cloud.google.com/spanner/docs/reference/rpc/google.spanner.v1#google.spanner.v1.PartialResultSet (the section on Stats for the last one)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the logic to rely on ReturnedRows stats since for large dataset row count is returned with last result set only hence it won't help in preallocation.
Thanks
spanner/read.go
Outdated
@@ -90,6 +91,14 @@ func streamWithReplaceSessionFunc( | |||
} | |||
} | |||
|
|||
// Iterator is an interface for iterating over Rows. | |||
type Iterator interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this interface? (Or put another way: What are the benefits of adding this interface?)
The name also seems very generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the interface name and converted to package private so that can't be used by customers, added this to ease mock and unit tests
spanner/row.go
Outdated
// Before starting, SelectAll resets the destination slice, | ||
// so if it's not empty it will overwrite all existing elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, as mentioned before, I think we should only append to the slice (without first resetting it) in case a non-nil slice is provided. WDYT?
The user may have pre-seeded the slice with some entries, or it may want to use the same slice in multiple successive calls. If we always reset, we basically preclude these options. Whereas with the alternative (just append) all these options are available (and the current behavior of resetting is trivial for the user to opt-in to: just pass s[:0]
instead of just s
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed reset logic, added unit test to validate it will append on base slice.
// | ||
// var singersByPtr []*Singer | ||
// err := spanner.SelectAll(row, &singersByPtr, spanner.WithLenient()) | ||
func SelectAll(rows rowIterator, destination interface{}, options ...DecodeOptions) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLDR: can you call this method from outside the spanner
package?
Maybe this answers my earlier question around whether the interface is useful or not...:
type rowIterator interface
is not exported.func SelectAll
is exported.- I assume that our standard
RowIterator
that is returned for query results implementsrowIterator
. - Can you call the
SelectAll
method from outside thespanner
package when the interfacerowIterator
is unexported, as long as you have a value that implements the unexported interface?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can call this method from outside spanner package.
- Yes, its intentional at the moment so that no one can use it outside the package, the only reason its present there is to ease in test setup.
- func SelectAll is exported: Yes and it can be used with RowIterator that is returned for query results
- Same as above.
- Yes we can
Fixes: #9111
Steps to benchmark
Compare results
Result of running the combined benchmark
BenchmarkScan
on a five field Go struct, here is the chart showing execution time vs number of rows scanned.