If you aren't comfortable with Git or some of its features, the Pro Git book (part of the Git documentation) is a great resource from which you can cherry-pick any subject you need to understand.
Although optional, it is highly recommended to keep the project's dependencies separate from those of other projects by creating an isolated environment.
Popular choices include virtual environments, from the Python standard library:
$ python -m venv <env_path>
$ source <env_path>/bin/activate
and conda environments:
$ conda create --name <env_name> "python>=3.6.5,<3.9"
$ conda activate <env_name>
It boils down to fork, clone, and sync:
- First, fork the project repository (i.e. create a personal server-side copy): go to the project repository webpage and click the Fork button at the top right of the page.
- Then clone your fork to your local machine (i.e. create a local copy on your machine):
$ git clone https://github.com/your-username/funk-svd.git
- Finally, sync your fork with the upstream repository (the "central" server-side repository, i.e. the parent of the fork):
$ cd funk-svd
$ git remote add upstream https://github.com/gbolmier/funk-svd.git
The whole process is also well documented by GitHub.
Navigate to the cloned directory and install the library in editable mode, together with the required development dependencies, so that changes in the code take effect immediately (cf. the following Stack Overflow question):
$ pip install -e ".[dev]"
Now that our fork is synced with upstream, we can update our local master branch with the latest upstream changes:
$ git checkout master
$ git pull upstream master
It's good practice to make changes within an independent line of development so that the master branch reflects only production-ready code. Let's create a new feature branch and switch to it:
$ git branch <new_feature>
$ git checkout <new_feature>
Develop within your feature branch and record your changes to the repository:
$ git add <modified_files>
$ git commit
When you want your changes to appear publicly on your GitHub page, push your feature branch's commits to your fork:
$ git push -u origin <new_feature>
If the changes you're working on might be affected by changes in the upstream repository, you probably want to merge the latest upstream changes into your local feature branch before and after editing it, so that you can detect conflicts or breaking changes early. To do so:
$ git fetch upstream
$ git merge upstream/master
If you aren't familiar with resolving conflicts, you can refer to the related GitHub documentation.
The project follows the PEP 8 style guide for Python code and the NumPy format for docstrings. Unit testing is done with the pytest framework.
Before being merged, changes must pass the PEP 8 and unit testing checks, both executed from the root of the project:
$ flake8
$ pytest
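As an illustration of these conventions, here is a minimal sketch of a function documented in the NumPy docstring format, together with a pytest-style test. The names `clip_rating` and `test_clip_rating` are hypothetical and are not part of the library; they only show the expected shape of a contribution.

```python
def clip_rating(rating, min_rating=1.0, max_rating=5.0):
    """Clip a rating to a given range.

    Hypothetical helper, used only to illustrate the conventions.

    Parameters
    ----------
    rating : float
        Rating to clip.
    min_rating : float, default=1.0
        Lower bound of the range.
    max_rating : float, default=5.0
        Upper bound of the range.

    Returns
    -------
    float
        The rating, bounded to ``[min_rating, max_rating]``.
    """
    return max(min_rating, min(rating, max_rating))


def test_clip_rating():
    # pytest collects functions prefixed with ``test_`` and reports
    # any failing ``assert`` statement.
    assert clip_rating(3.5) == 3.5
    assert clip_rating(0.2) == 1.0
    assert clip_rating(7.0) == 5.0
```

Placed in a file whose name starts with `test_`, such a test would be picked up by the `pytest` command above.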
Clarity and conciseness are warmly encouraged. Since high readability is one of Python's assets, it would be a shame not to take advantage of it. We human beings have limited cognitive capacity: we can only hold a few items in short-term memory at once. To make the development experience friendlier for everyone, let's create abstractions by grouping conceptually related instructions together and giving them relevant names. This process of chunking and aliasing makes grouped items easier to remember, reducing our cognitive load. In more practical terms, it consists of:
- grouping instructions by concepts in variables, functions, and classes
- giving explicit names instead of cryptic ones
- choosing the right level of granularity for abstractions, avoiding series of instructions that are too long or too deeply nested
For example, instead of:
grades = [3.25, 2, 4.5, 3.75, 5]
m = sum(grades) / len(grades)
std = (sum(((grade - m)**2 for grade in grades)) / len(grades))**.5
prefer something like:
from numbers import Real
from typing import Sequence

grades = [3.25, 2, 4.5, 3.75, 5]


def standard_deviation(x: Sequence[Real]) -> float:
    n = len(x)
    x_mean = sum(x) / n
    x_centered_squared = ((xi - x_mean)**2 for xi in x)
    variance = sum(x_centered_squared) / n
    return variance**.5


grades_std = standard_deviation(grades)
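Explicit names also make a function's assumptions easier to check. As a sketch, and assuming the intent is the population standard deviation (the variance is divided by `n`, not `n - 1`), the readable version can be compared against `statistics.pstdev` from the standard library:

```python
import math
import statistics
from numbers import Real
from typing import Sequence


def standard_deviation(x: Sequence[Real]) -> float:
    n = len(x)
    x_mean = sum(x) / n
    x_centered_squared = ((xi - x_mean)**2 for xi in x)
    variance = sum(x_centered_squared) / n
    return variance**.5


grades = [3.25, 2, 4.5, 3.75, 5]

# statistics.pstdev also divides by n (population standard deviation),
# whereas statistics.stdev divides by n - 1 (sample standard deviation).
assert math.isclose(standard_deviation(grades), statistics.pstdev(grades))
```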
Obviously, code readability must be balanced against code performance, depending on the purpose of the code. Therefore, when you are forced to write complex code (e.g. to make it faster), add clear comments explaining what it does under the hood:
from numbers import Real
from typing import Sequence

import numpy as np


def fast_standard_deviation(x: Sequence[Real]) -> float:
    x = np.array(x)
    # Center and square x vector
    # Compute the arithmetic mean of the previous result
    # Return the square root of the previous result
    return np.sqrt(((x - x.mean())**2).mean())
Follow the GitHub documentation instructions to create a pull request from your fork and submit it to the upstream repository. Prefix your pull request title with [WIP] if it is still a work in progress, or with [MRG] once you consider it ready to be merged. In the latter case, don't forget to synchronize your feature branch with the latest upstream changes.