Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't constrain GridBounds size to IntMax x IntMax #2292

Merged
merged 4 commits into from
Jul 25, 2017

Conversation

fosskers
Copy link
Contributor

@fosskers fosskers commented Jul 20, 2017

Issues

  • size returning a Long instead of Int is likely backwards incompatible.
  • I wanted to make coords return an Iterator instead of a Stream, but one single function (RDDKernelDensity.pointFeatureToSpatialKey) in GT requires the output of a for over coords
    to be a Seq, which Array can be cast to and Iterator cannot.
  • I ingested a 60k-by-60k GeoTiff after making this change, and while the ingest itself was very strange (took forever to process, was all single core, no I didn't mess up the Spark config), it did succeed.

Closes #2265 .

def coords: Stream[(Int, Int)] = for {
row <- Stream.range(0, height)
col <- Stream.range(0, width)
} yield (row, col)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really love to salvage the ability to use Iterator here.

@lossyrob
Copy link
Member

The problem here of course is that this breaks the API. Such a minor break, but still breaks it.
If we were to release this as a bugfix release, because it's fixing a bug, the semver gods will frown down upon us.

Client code like:

val gb: GridBounds = ???
val gridSize = gb.size

would be fine, but code like

val gb: GridBounds = ???
val gridSize:Int = gb.size

would fail.

It might be the case that exactly 0 codebases have code that would break on this change. But then again, maybe a million codebases. Or like 2.

Now that we switched to semver how do we handle this? Any thoughts from others? @hectcastro? @echeipesh? @metasim?

@fosskers
Copy link
Contributor Author

Regarding appeasing the SemVer gods, I suppose we'd have to wait to merge until we're ready to push more 2.0 features through?

@fosskers
Copy link
Contributor Author

fosskers commented Jul 20, 2017

The Array => Stream change is also breaking. They're both collections, but they don't support exactly the same operations. GridBounds.coords is never used this way in GT itself, but technically there was nothing stopping users from doing:

val gbs: GridBounds = ???
val coords: Array[(Int, Int)] = dbs.coords

println(coords(5)) // caution to the wind. Also a senseless operation.

Open question: do we have a right to overlook API breaks when we have sufficient belief that a certain use case (like this one) would never occur in reality? More of me wants to say "no" than "yes", but the "yes" is there nonetheless.

@fosskers fosskers changed the title Don't constrain GridBounds size to IntMax x IntMax [WIP] Don't constrain GridBounds size to IntMax x IntMax Jul 20, 2017
@echeipesh
Copy link
Contributor

GridBounds floats to the surface of the API way too often for us to pretend that its somehow a private class. I don't think I've ever written even a small module that didn't and up holding GridBounds in some way.

I'd be tempted to add: .sizeLong: Long and .cordsIterator: Iterator[(Int, Int)] and try to switch the breaking code to use the new methods. This could spider out but then again it could be tidy.

A cleaner version of above is to make a new class LongGridBounds or some such and convert to it to perform the above operations.

@fosskers
Copy link
Contributor Author

... to pretend that its somehow a private class.

Luckily I wasn't suggesting that. Can you think of a reason someone would honestly index the coords Array as in my example?

Alarm bells going off regarding LongGridBounds. The bug is occurring in the original class as used in vanilla ingests, so I don't think LGB would solve that.

The .*Long varieties might work, but they birth another hidden bug: the original coords Array being too big and throwing. Okay, so we bring in the invariant back in to protect against that... and we're back to square one.

@echeipesh
Copy link
Contributor

echeipesh commented Jul 20, 2017

Luckily I wasn't suggesting that. Can you think of a reason someone would honestly index the coords Array as in my example?

I didn't not mean that as an accusation, just not an option.

I would disagree that original .coords throwing when its too big is a hidden bug, thats pretty much a standard data-structure choice of sequence vs. an iterator. And we're free to @depricate it with a meaningful warning.

Edit: And I agree there is totally no reason to index into array of coords, the discussion is purely around being good software carpenters as prescribed by semver.

@fosskers
Copy link
Contributor Author

fosskers commented Jul 20, 2017

thats pretty much a standard data-structure choice of sequence vs. an iterator.

Yeah true. So if it blows, we say "told you so, here's the Iterator variety for that". @deprecated feels right.

The Stream/Iterator issue is still outstanding though.

def size = width * height
def isEmpty = size == 0

def size: Int = width * height
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to use the @migration annotation which is mentioned in the official Scaladoc docs, but it doesn't compile. There's no mention of it in the main scala repo.

@@ -50,7 +50,7 @@ object CutTiles {
val extent = inKey.extent
logger.debug(s"Cutting $inKey of ${tile.dimensions} cells covering $extent")
mapTransform(extent)
.coords.toIterator
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naughty!

- The original `coords` has been marked deprecated, and will be switched with
  `coordsIter` in 2.0.
@hectcastro
Copy link
Contributor

I agree with Eugene's instincts around sizeLong and LongGridBounds. It would be great if we could come up with replacement function or class names that we'd want to keep moving forward (e.g., coords -> coordinates).

size seems a lot trickier. I wasn't able to come up with a good alternative name that didn't lead to confusion. In this case, we may have no choice but to go with something like sizeLong, mark size for deprecation (but note that its type will change vs. actual function removal), rewrite size to return type Long in 2.0, and mark sizeLong for deprecation in 3.0.

As far as the high level plan moving forward, I think the process would be:

  • Publish the new functions with deprecation warnings on the old functions in a minor release
  • Highlight the above change in the CHANGELOG, or anywhere else the 1.2 release Is discussed
  • Schedule removal of the deprecated functions in a major release

@fosskers
Copy link
Contributor Author

I can add a deprecation warning to size (already done so for sizeLong), and update the CHANGELOG to include these warnings.

@fosskers fosskers changed the title [WIP] Don't constrain GridBounds size to IntMax x IntMax Don't constrain GridBounds size to IntMax x IntMax Jul 21, 2017
@fosskers
Copy link
Contributor Author

Figuring out test failures...

@echeipesh echeipesh added this to the 1.2 milestone Jul 25, 2017
@echeipesh echeipesh merged commit c9db248 into master Jul 25, 2017
@fosskers fosskers deleted the fix/cgw/big-gridbounds branch August 17, 2017 16:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants