Run data collection for Clay v0.2 #142
I suspect we are also dropping a very substantial share of inputs due to a single ... (lines 39 to 51 in ae70345).
See: https://github.com/Clay-foundation/office/issues/170#issuecomment-1914173261
For the latlon coordinate embeddings to capture the intended global structure, I believe the training set must have full global coverage. In my opinion that means adding full coverage from MODIS, either as composites or as raw images from several dates. Perhaps we could even train on MODIS only first, to warm up general latlon embeddings?
For Clay v0.2 we are not planning to change the input platforms. Adding MODIS would require changes in architecture. The idea for v0.2 was to use the same data sources but with a much larger sample.
Ran data collection with code from #173. We have 2535 MGRS tiles successfully processed; the data sits in s3://clay-tiles-04-sample-v02
We can use the current pipeline, but probably with the following changes:
Regarding the MGRS tile increase, the question is whether we want to change the ratio of the inputs. I discussed with @srmsoumya yesterday that we should maybe increase the fraction of the landcover classes with a human footprint, i.e. Urban and Agriculture. Presumably that is what users will be most interested in for search, so we could increase that fraction to give those classes more weight.
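To make the reweighting idea concrete, here is a minimal sketch of how the class ratio could be shifted towards Urban and Agriculture. This is not the code in the current pipeline: the function names, the base fractions, the boost factor, and the tile IDs are all hypothetical and only illustrate the arithmetic.

```python
import random


def reweight_classes(base_fractions, boosted_classes, boost):
    """Multiply the fraction of the boosted classes by `boost`, then renormalize to sum to 1."""
    raw = {
        cls: frac * (boost if cls in boosted_classes else 1.0)
        for cls, frac in base_fractions.items()
    }
    total = sum(raw.values())
    return {cls: value / total for cls, value in raw.items()}


def sample_tiles(tiles_by_class, fractions, n_tiles, seed=0):
    """Draw roughly `n_tiles` tiles, split across classes according to `fractions`."""
    rng = random.Random(seed)
    selected = {}
    for cls, frac in fractions.items():
        pool = tiles_by_class.get(cls, [])
        # Never ask for more tiles than the class actually has.
        k = min(len(pool), round(frac * n_tiles))
        selected[cls] = rng.sample(pool, k)
    return selected


# Hypothetical base ratio and candidate tiles, for illustration only.
base = {"Urban": 0.1, "Agriculture": 0.2, "Forest": 0.4, "Water": 0.3}
candidates = {cls: [f"{cls}-{i:03d}" for i in range(100)] for cls in base}

new_fractions = reweight_classes(base, {"Urban", "Agriculture"}, boost=2.0)
sample = sample_tiles(candidates, new_fractions, n_tiles=100)
```

Renormalizing after the boost keeps the total sample size under control, so giving more weight to the human-footprint classes necessarily reduces the share of the others.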