DeveloperNotes
Additional stuff for contributors not covered elsewhere
We use a combination of nbdev and pip to maintain the project's package dependencies. Please upgrade pip to the latest version (at least version 22.1) so that developing geowrangler with an editable install is supported. Follow the instructions in the DEVELOPMENT and CONTRIBUTING documents to set up a local development environment.
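A quick way to confirm that the installed pip meets the 22.1 minimum before attempting an editable install is sketched below. The helper names are hypothetical and not part of geowrangler; the version parsing only compares the leading numeric segments, which is enough for a simple minimum check.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_PIP = (22, 1)  # minimum pip version mentioned in these notes


def version_tuple(text: str) -> tuple:
    """Turn a version string like '22.1.2' or '24.0b1' into a tuple of ints.

    Only leading numeric release segments are kept; pre-release suffixes
    such as 'b1' end the parse, which is fine for a minimum-version check.
    """
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop after a segment carrying a suffix, e.g. '0b1'
    return tuple(parts)


def pip_is_recent_enough(installed: str, minimum: tuple = MIN_PIP) -> bool:
    """Return True if the given pip version string is at least `minimum`."""
    return version_tuple(installed) >= minimum


if __name__ == "__main__":
    try:
        current = version("pip")
    except PackageNotFoundError:
        current = "0"
    if pip_is_recent_enough(current):
        print(f"pip {current} is recent enough for an editable install")
    else:
        print(f"pip {current} is too old; run: python -m pip install --upgrade pip")
```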
One of the things that makes geowrangler different from other geospatial packages (or even other projects) is its use of nbdev, which makes it possible to develop Python packages alongside their documentation using Jupyter notebooks. See the section on the documentation site below.
- All the module code in geowrangler is built from Jupyter notebooks residing in the `notebooks` folder. However, not all notebooks in the `notebooks` folder contribute a code module to geowrangler; some are tutorials or provide an overview of the geowrangler project.
  - The implementation notebooks follow the `XX_<module_name>.ipynb` naming format, where `XX` is an arbitrary (possibly repeated) number.
  - The tutorial notebooks follow the `tutorial.<name>.ipynb` format and are usually shown in the documentation sidebar under the Tutorials section.
  - The overview notebook, `index.ipynb`, becomes the overview (`index.html`) page of the documentation site.
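The naming conventions above can be checked mechanically. The sketch below classifies notebook file names by those patterns; the function name and example file names are hypothetical, and the tutorial pattern assumes the `tutorial.<name>.ipynb` form.

```python
import re

# Naming patterns from these notes; the tutorial pattern is an assumption.
IMPLEMENTATION = re.compile(r"^(\d+)_(\w+)\.ipynb$")  # e.g. XX_<module_name>.ipynb
TUTORIAL = re.compile(r"^tutorial\.(.+)\.ipynb$")     # e.g. tutorial.<name>.ipynb


def classify_notebook(filename: str) -> str:
    """Return 'overview', 'implementation', 'tutorial' or 'other' for a notebook file name."""
    if filename == "index.ipynb":
        return "overview"
    if IMPLEMENTATION.match(filename):
        return "implementation"
    if TUTORIAL.match(filename):
        return "tutorial"
    return "other"


# Illustrative (hypothetical) file names
for name in ["index.ipynb", "00_my_module.ipynb", "tutorial.my_module.ipynb", "scratch.ipynb"]:
    print(f"{name}: {classify_notebook(name)}")
```

Note that the module a notebook exports to is set by its `default_exp` comment, not by the file name, so this only checks the naming convention.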
- To extract the code from the implementation notebooks into the module code in the `geowrangler` folder, run `nbdev_export`. If the module code already exists, it is overwritten with the latest copy from the notebook. However, if a notebook is deleted, the module code for that notebook is not automatically deleted; it must be deleted manually.
  This also makes it possible to write module code directly (without a matching notebook) if needed, as long as no notebook names the same module in its `default_exp` comment.
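Because `nbdev_export` never deletes modules, it can help to list package modules that no notebook exports to. The sketch below is a hypothetical helper, not part of geowrangler or nbdev: it scans notebooks for `default_exp` comments (both the nbdev2 `#| default_exp` and the older `# default_exp` forms) and reports top-level `.py` files with no matching notebook. Each hit is either a stale module to delete or a deliberately hand-written one.

```python
import json
import re
from pathlib import Path

# Matches '#| default_exp name' (nbdev2) or '# default_exp name' (older nbdev).
DEFAULT_EXP = re.compile(r"^#\s*\|?\s*default_exp\s+([\w.]+)", re.MULTILINE)


def exported_modules(notebooks_dir) -> set:
    """Collect the module names declared via default_exp in every notebook."""
    modules = set()
    for nb_path in Path(notebooks_dir).glob("*.ipynb"):
        notebook = json.loads(nb_path.read_text(encoding="utf-8"))
        for cell in notebook.get("cells", []):
            if cell.get("cell_type") != "code":
                continue
            source = cell.get("source", "")
            if isinstance(source, list):  # .ipynb usually stores source as a list of lines
                source = "".join(source)
            modules.update(DEFAULT_EXP.findall(source))
    return modules


def modules_without_notebook(notebooks_dir, package_dir) -> list:
    """List top-level .py files in the package that no notebook exports to."""
    declared = exported_modules(notebooks_dir)
    orphans = []
    for py_file in sorted(Path(package_dir).glob("*.py")):
        if py_file.stem.startswith("_"):  # skip __init__.py and other private files
            continue
        if py_file.stem not in declared:
            orphans.append(py_file.name)
    return orphans


if __name__ == "__main__":
    print(modules_without_notebook("notebooks", "geowrangler"))
```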
- The tutorial notebooks are an important part of geowrangler's documentation: along with the reference documentation, they provide examples of how the geowrangler modules can be used to wrangle geospatial data.
- We also want to encourage exploration of the geowrangler package by providing an Open in Colab button for all the Jupyter notebooks (especially the tutorials).
  - The button loads the notebook from the GitHub repo into Colab using a simple conversion recipe: for any URL in the GitHub repo pointing to a notebook (e.g. https://github.com/thinkingmachines/geowrangler/blob/master/notebooks/index.ipynb), replace the `https://` part with `https://colab.research.google.com/` and drop the `.com` from `github.com` (e.g. https://colab.research.google.com/github/thinkingmachines/geowrangler/blob/master/notebooks/index.ipynb).
    Note: Currently, each notebook must be edited manually to include an "Open in Colab" button.
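The conversion recipe above can be expressed as a small function, which may also help when adding the button to a notebook by hand. This is a sketch with hypothetical helper names; the badge image URL is the commonly used Colab badge and is an assumption here, not something these notes specify.

```python
GITHUB_PREFIX = "https://github.com/"
COLAB_PREFIX = "https://colab.research.google.com/github/"
# Commonly used Colab badge image (assumed, not taken from these notes)
BADGE_IMAGE = "https://colab.research.google.com/assets/colab-badge.svg"


def colab_url(github_url: str) -> str:
    """Convert a GitHub notebook URL into the matching Open-in-Colab URL."""
    if not github_url.startswith(GITHUB_PREFIX):
        raise ValueError(f"not a GitHub URL: {github_url}")
    if not github_url.endswith(".ipynb"):
        raise ValueError(f"not a notebook URL: {github_url}")
    # 'https://github.com/<rest>' -> 'https://colab.research.google.com/github/<rest>'
    return COLAB_PREFIX + github_url[len(GITHUB_PREFIX):]


def colab_badge_markdown(github_url: str) -> str:
    """Markdown for an 'Open in Colab' button to paste into a notebook's first cell."""
    return f"[![Open In Colab]({BADGE_IMAGE})]({colab_url(github_url)})"


print(colab_url("https://github.com/thinkingmachines/geowrangler/blob/master/notebooks/index.ipynb"))
```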
- To make the notebooks runnable in Colab, some additional steps are needed, such as a `pip install` of the `geowrangler` package.
  - These extra steps must run only when the notebook is running in Colab. This is done with the bash test expression `[ -e /content ]`, which assumes that only the Colab environment has a root-level directory named `content`. So to run a bash command like `pip install <my-package>` only in Colab, we add `! [ -e /content ] && pip install <my-package>`, which checks whether a `/content/` directory exists and runs the `pip install` only if it finds one. (The spaces inside the brackets are required; `[ -e /content]` is a bash syntax error.)
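The notebooks themselves use the bash one-liner, but the same guard can be written in Python, for example in a setup script. The sketch below is a hypothetical equivalent, not part of geowrangler: it mirrors `[ -e /content ]` with `os.path.exists` and, by default, only builds the `pip install` command instead of running it.

```python
import os
import subprocess
import sys


def running_in_colab(marker: str = "/content") -> bool:
    """Mirror the bash test `[ -e /content ]`: True if the marker path exists."""
    return os.path.exists(marker)


def colab_only_install(package: str, marker: str = "/content", dry_run: bool = True):
    """Build (and, if dry_run is False, run) a pip install only inside Colab.

    Returns the command as a list, or None when not running in Colab.
    """
    if not running_in_colab(marker):
        return None
    command = [sys.executable, "-m", "pip", "install", package]
    if not dry_run:
        subprocess.run(command, check=True)
    return command


print(colab_only_install("geowrangler"))  # None outside Colab
```

Some projects instead check whether the `google.colab` module can be imported; the `/content` check is used here to match the approach described above.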
- Another way to encourage exploration is by providing sample datasets (usually `.geojson` files) for the tutorials. In the repository, these sample datasets are stored in the `data` directory, so a clone of the repo has a `../data` directory (relative to the `notebooks` directory where the tutorial and implementation notebooks reside).
  - If the notebooks are copied and loaded individually in a Jupyter environment (like Colab), the `../data` directory might not exist and the datasets may not have been downloaded. So the tutorial notebooks also include a check that downloads the datasets into `../data` if they haven't been downloaded yet.
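The download check described above might look roughly like the following. This is a minimal sketch, not the code used in the tutorials: the function name, the dataset file names, and the raw-GitHub base URL (assuming the files live under `data/` on the `master` branch) are all assumptions. The `fetch` parameter exists only so the download step can be swapped out, e.g. in tests.

```python
import urllib.request
from pathlib import Path

# Assumed location of the raw sample files in the repo (hypothetical)
DATA_BASE_URL = "https://raw.githubusercontent.com/thinkingmachines/geowrangler/master/data/"


def ensure_datasets(filenames, data_dir="../data", base_url=DATA_BASE_URL,
                    fetch=urllib.request.urlretrieve):
    """Download any sample datasets missing from data_dir; return the names fetched."""
    data_path = Path(data_dir)
    data_path.mkdir(parents=True, exist_ok=True)  # create ../data if the notebook was copied alone
    downloaded = []
    for name in filenames:
        target = data_path / name
        if target.exists():
            continue  # already present, e.g. in a full clone of the repo
        fetch(base_url + name, str(target))
        downloaded.append(name)
    return downloaded
```

In a tutorial notebook this would be called once near the top, e.g. `ensure_datasets(["<dataset>.geojson"])`, with the real file names the tutorial needs.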
- As mentioned earlier, geowrangler maintains a set of unit tests in the `tests` folder. We check that the unit tests cover at least 80 percent of the module code, and we strongly encourage contributors to keep code coverage as close to 100 percent as possible. To check that the unit tests pass and that test coverage is at least 80 percent, run the following command:
  `pytest --cov --cov-config=.coveragerc --cov-fail-under=80 -n auto --cov-report=html`
  Besides checking that the tests pass and that coverage is at least 80 percent, this writes an HTML report to the `htmlcov` folder showing which lines of code were not executed during the test run. (The `--cov` options come from the pytest-cov plugin; `-n auto`, which runs the tests in parallel, comes from pytest-xdist.)
- The project also has several automated CI/CD pipelines (see the `.github/workflows` folder):
  - `pytest.yaml`, which checks that code being merged to master has passing unit tests and code coverage of at least 80 percent
  - `deploy.yaml`, which builds the final version of the documentation and publishes it to geowrangler's documentation site (geowrangler.thinkingmachin.es)
- Providing good, up-to-date documentation is a top priority for the geowrangler project and one of the primary reasons we adopted nbdev: in our workflow, the implementation notebooks that generate geowrangler's modules also generate the reference documentation. This means that all of geowrangler's classes, methods and parameters are easily kept in sync with the documentation.
  It also makes all the tutorials and reference docs "executable", which (along with the Open in Colab button) significantly lowers the barrier to exploring geowrangler's modules.
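As an illustration of how one notebook serves both purposes, here is a hypothetical exported cell from an implementation notebook (assume an earlier cell contains `#| default_exp example`). The `#| export` directive tells `nbdev_export` to copy the cell into the module, and the docs build renders the function's signature and docstring as reference documentation. The function itself is made up for this example and is not part of geowrangler.

```python
#| export
def bbox_area(minx: float, miny: float, maxx: float, maxy: float) -> float:
    """Return the area of an axis-aligned bounding box."""
    # Hypothetical example function; reject inverted boxes
    if maxx < minx or maxy < miny:
        raise ValueError("max coordinates must not be smaller than min coordinates")
    return (maxx - minx) * (maxy - miny)
```

A following cell without the `#| export` directive, such as `bbox_area(0, 0, 2, 3)`, stays out of the module but appears with its output in the rendered docs, which is what makes the reference pages executable examples.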
- Since the documentation is generated from the notebooks anyway, the geowrangler project has been set up so that the generated pages are NOT checked into the repository. We have also adopted the latest version of nbdev (nbdev2), which uses Quarto to generate the site.