[SPARK-6994] Allow to fetch field values by name in sql.Row #5573

Closed
wants to merge 2 commits into from

vidma
@vidma vidma commented Apr 18, 2015

It seemed odd that, until now, Spark's Scala API offered no way to access fields of a DataFrame/sql.Row by name, only by index.

This PR addresses that.

@vidma vidma force-pushed the features/row-with-named-fields branch 2 times, most recently from 496cd36 to 10ae4d8 on April 18, 2015 17:00
@vidma vidma changed the title from "Add named field support for sql.Row" to "[SPARK-6994] Allow to fetch field values by name in sql.Row" on Apr 18, 2015

object RowImplicits {

implicit class RowWithNamedFields[K, V](sqlRow: Row) {
Contributor

Why don't we just add this to Row itself?

Author

👍 If so, I guess I should rename getValue[T] to an overloaded getAs[T](fieldName: String), for consistency with the existing getAs method? (I remember that overloading didn't work with implicits...)
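A minimal sketch of the rename discussed here, using a hypothetical, simplified `NamedRow` class (not Spark's `Row`) that stores its field names alongside its values. Both `getAs` overloads are ordinary members of the class, so the compiler picks one by argument type (`Int` vs. `String`) without involving any implicit conversion.

```scala
// Hypothetical, simplified row: values plus the field names they belong to.
// This is an illustration only, not Spark's org.apache.spark.sql.Row.
final class NamedRow(values: IndexedSeq[Any], fieldNames: IndexedSeq[String]) {

  // Existing style of access: by position. T is erased, so the cast is unchecked.
  def getAs[T](i: Int): T = values(i).asInstanceOf[T]

  // New overload: by field name, delegating to the positional version.
  def getAs[T](fieldName: String): T = {
    val i = fieldNames.indexOf(fieldName)
    if (i < 0) throw new IllegalArgumentException(s"Field '$fieldName' does not exist.")
    getAs[T](i)
  }
}

val row = new NamedRow(IndexedSeq("alice", 42), IndexedSeq("name", "age"))
println(row.getAs[String](0))  // alice
println(row.getAs[Int]("age")) // 42
```

Because the cast is unchecked, a mismatched type argument is not caught inside `getAs` itself; that trade-off matches the existing positional `getAs` style.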

Contributor

Yup, sounds good.

Author

For the new method to be convenient to use, it should go into Row.
However, a Row does not always have a schema defined, so Row.getAs[T](fieldName: String) will throw an UnsupportedOperationException in that case.
GenericRowWithSchema will implement the case where a schema exists.
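The split described in this comment can be sketched with two hypothetical classes, `SimpleRow` and `SimpleRowWithSchema`, standing in for `Row` and `GenericRowWithSchema`. Names and bodies are illustrative assumptions, not Spark's code: the base class rejects name-based lookup with an `UnsupportedOperationException`, and the subclass, which knows its field names, overrides it.

```scala
// Hypothetical base row: it only knows its values, so it cannot map names to positions.
class SimpleRow(values: IndexedSeq[Any]) {
  def getAs[T](i: Int): T = values(i).asInstanceOf[T]

  // Without a schema there is nothing to resolve a field name against.
  def getAs[T](fieldName: String): T =
    throw new UnsupportedOperationException("getAs by field name requires a schema")
}

// Hypothetical row that carries its field names and can therefore resolve them.
class SimpleRowWithSchema(values: IndexedSeq[Any], val fieldNames: IndexedSeq[String])
    extends SimpleRow(values) {
  override def getAs[T](fieldName: String): T = {
    val i = fieldNames.indexOf(fieldName)
    require(i >= 0, s"Field '$fieldName' does not exist.") // IllegalArgumentException
    getAs[T](i) // reuse the inherited positional overload
  }
}

val plain = new SimpleRow(IndexedSeq(1, "x"))
val named = new SimpleRowWithSchema(IndexedSeq(1, "x"), IndexedSeq("id", "tag"))
println(named.getAs[String]("tag"))                           // x
println(scala.util.Try(plain.getAs[String]("tag")).isFailure) // true
```

Putting the name-based method on the base type keeps call sites uniform; the cost is that the failure for schema-less rows shows up at runtime rather than at compile time.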

@vidma vidma force-pushed the features/row-with-named-fields branch 9 times, most recently from d823862 to 49698a4 on April 19, 2015 12:17
@vidma vidma force-pushed the features/row-with-named-fields branch from 49698a4 to a9a8854 on April 19, 2015 12:19
@vidma
Author

vidma commented Apr 19, 2015

Moved the changes to the Row and GenericRowWithSchema classes, and also added a fieldIndex method to the schema (StructType).

Also got rid of the dependency on HiveTest.
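A sketch of a `fieldIndex` lookup on a schema, assuming hypothetical stand-ins `FieldSketch` and `StructTypeSketch` that are reduced to field names only (Spark's real `StructField`/`StructType` carry more). Precomputing a name-to-position map is one plausible way to make repeated lookups cheap; throwing `IllegalArgumentException` for an unknown name is this sketch's choice.

```scala
// Hypothetical stand-ins for a schema field and a schema, reduced to names.
final case class FieldSketch(name: String)

final case class StructTypeSketch(fields: IndexedSeq[FieldSketch]) {
  // Built once on first use, so each lookup is a map access rather than a linear scan.
  // If names repeat, the last occurrence wins in this sketch.
  private lazy val nameToIndex: Map[String, Int] =
    fields.map(_.name).zipWithIndex.toMap

  def fieldIndex(name: String): Int =
    nameToIndex.getOrElse(name, throw new IllegalArgumentException(s"Field '$name' does not exist."))
}

val schema = StructTypeSketch(IndexedSeq(FieldSketch("name"), FieldSketch("age")))
println(schema.fieldIndex("age")) // 1
```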

@vidma vidma force-pushed the features/row-with-named-fields branch from a9a8854 to 0a2b8b0 on April 19, 2015 12:31
- add fieldIndex(name: String)
- add getAs[T](fieldName: String)
- add getValuesMap[T] returning a map of values for the requested fieldNames
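The three additions listed in this commit message fit together roughly as follows. This is a sketch on a hypothetical `RowSketch` class, not Spark's implementation, and it places `fieldIndex` on the row for brevity: `getAs[T](fieldName)` delegates to `fieldIndex`, and `getValuesMap[T]` collects the requested fields into a `Map[String, T]`.

```scala
// Hypothetical, simplified row that carries its field names; not Spark's Row.
final class RowSketch(values: IndexedSeq[Any], fieldNames: IndexedSeq[String]) {
  // name -> position, built once per row in this sketch.
  private val nameToIndex: Map[String, Int] = fieldNames.zipWithIndex.toMap

  // 1. Resolve a field name to its position, failing fast on unknown names.
  def fieldIndex(name: String): Int =
    nameToIndex.getOrElse(name, throw new IllegalArgumentException(s"Field '$name' does not exist."))

  def getAs[T](i: Int): T = values(i).asInstanceOf[T] // unchecked cast: T is erased

  // 2. Name-based access delegates to positional access.
  def getAs[T](fieldName: String): T = getAs[T](fieldIndex(fieldName))

  // 3. Collect the requested fields into a map keyed by field name.
  def getValuesMap[T](names: Seq[String]): Map[String, T] =
    names.map(name => name -> getAs[T](name)).toMap
}

val person = new RowSketch(IndexedSeq("alice", 42, "NYC"), IndexedSeq("name", "age", "city"))
println(person.getValuesMap[Any](Seq("name", "city")))
```

Since the fields of a row usually have different types, `getValuesMap[Any]` is the natural choice when mixing fields; a narrower `T` only makes sense when all requested fields share that type.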
@vidma vidma force-pushed the features/row-with-named-fields branch from 0a2b8b0 to 6145ae3 on April 19, 2015 12:41
@vidma
Author

vidma commented Apr 19, 2015

It's slightly odd that there is a RowSuite.scala in sql-core but none in sql-catalyst.
Anyway, the one in sql-core looks like an integration test, so I added a small test there too, to make sure the feature works on Rows created via DataFrames.

P.S. Isn't Jenkins supposed to run tests on this PR?

@marmbrus
Contributor

ok to test

@SparkQA

SparkQA commented Apr 21, 2015

Test build #30611 has finished for PR 5573 at commit 6145ae3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@marmbrus
Contributor

Thanks, merged to master.

@asfgit asfgit closed this in 2e8c6ca Apr 21, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Author: vidmantas zemleris <[email protected]>

Closes apache#5573 from vidma/features/row-with-named-fields and squashes the following commits:

6145ae3 [vidmantas zemleris] [SPARK-6994][SQL] Allow to fetch field values by name on Row
9564ebb [vidmantas zemleris] [SPARK-6994][SQL] Add fieldIndex to schema (StructType)

4 participants