- 1. Introduction (to the Job Market)
- 2. Environment Setup
- 3. Git and Version Control
- 4. Organizing Your Workspace
- 5. Conclusion
This repository is for STEM graduates who want to become software professionals. It's designed for those who excel in science, statistics, or data analysis but have knowledge gaps in software engineering. If you're great academically but less great with coding, this repo is perfect for you no matter what data-based job you're aiming for.
After six years as a professional developer, I've returned to this practice repo and enhanced it with the tools, tips, and tutorials I wish I'd been given when starting out. This repository covers the essentials of interfacing with the computer as a software developer while practicing with LeetCode and Project Euler. It's designed to help you improve your computer skills, especially if you have a strong math background.
Please note, this repository does not focus on creating elaborate data science projects or complex machine learning models. Instead, it emphasizes the engineering principles and efficient data structures essential for developing robust, high-performance software.
After graduating with degrees in math and physics, I had only used Python in a research setting. My clunky code was holding me back. I needed to adopt a more computational approach to break new ground (e.g. a one-month runtime is unreasonable). I realized that deepening my understanding of software principles was key to solving more complex numerical problems. I then discovered Project Euler, a website focused on solving progressively challenging math problems with code. It was the perfect place to start. Later, I found that LeetCode was also crucial for improving, as it reinforces coding with efficient, industry-standard data structures.
If you've used R or Python in research, you may have used Anaconda to simplify setup and package installation to jump right into coding. Staying in this tutorial zone limits your growth and ability to push to production. Learning to set up Python and interact with your computer's filesystem is crucial foundational knowledge, even if it's frustrating at first.
First, visit the downloads section of the official Python website and download the latest version of the Python installer for Windows.
Unless you have a specific reason to use an older version (e.g., TensorFlow is incompatible with Python > 3.11 as of this writing), use the latest one. Does this mean that if you want to use TensorFlow, you have to uninstall the latest version of Python? No. You can use different Python versions for different projects, but that will be covered a little later.
Go to wherever you downloaded the installer, then run it.
The "Add python.exe to PATH" option during installation configures your Windows system to recognize Python commands from any command prompt.

Very few people need to install Python globally for all users (mainly IT administrators). Even if you're on a company laptop, just install Python for your own user account.

The "Add Python to Environment Variables" option is the same as "Add python.exe to PATH". It sets up the system's environment variables to include the directory where Python is installed.
This setup allows the operating system to locate Python executables and scripts from any command line or terminal without specifying the full path (e.g. `C:\Users\..\python.exe`). Instead, you can access Python by simply typing `python` in your chosen terminal.

After setup, verify that the installation was successful by opening Command Prompt and running `python --version`.
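The check looks like this in practice (the exact version string depends on what you installed):

```shell
python --version   # prints the installed version, e.g. "Python 3.12.x"
```

If the command isn't recognized, the PATH option was likely not selected during installation.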
Instead of Command Prompt or PowerShell, I prefer using Git Bash because it provides the same Bash shell used on Linux and macOS and is built around Git, the near-ubiquitous version control system. Download the latest stable build and install it with the default recommended settings.
Notepad++ is a widespread, open source upgrade to Microsoft's default Notepad. If you were to code in Python in a bare bones text editor, Notepad++ would be a logical choice. It has a clean GUI and a myriad of plug-ins and customizability, and it directly interfaces with Git Bash. Download the latest stable build and install it with the default recommended settings.
VS Code is the best code editor for the modern developer on a Windows machine. It is free, open-source, lightweight, cross-platform, fast, customizable, widely used, and equipped with all modern coding tools in the same interface. Download the latest stable build and install it with the default settings.
VS Code has a built-in terminal that allows you to run command-line operations without needing to switch windows. Be sure to set the default integrated terminal to Git Bash.
Hotkeys are the key to making coding feel comfortable and natural. It's essential to automate your workflow, and the less you have to move the mouse, the better your wrists will feel.
VS Code is fully accessible from the keyboard. The most important key combination to know is `F1`, which brings up the Command Palette. From there, you have access to all functionality within VS Code, including keyboard shortcuts for the most common operations. Here are some of the most useful ones for beginners.
| Shortcut | Action |
|---|---|
| `F1` | Open Command Palette |
| `Ctrl + /` | Toggle Line Comment |
| `Shift + Alt + ↑` | Copy Line Up (or Down ↓) |
| `Alt + ←` | Go Back (or Forward →) |
| `Alt + ↑` | Move Line Up (or Down ↓) |
| `Ctrl + D` | Add Selection to Next Find Match |
| `Ctrl + X` | Cut (the Entire Line) |
| `Ctrl + K Ctrl + O` | File: Open Folder... |
| `Shift + Alt + R` | File: Reveal in File Explorer |
| `Ctrl + F` | Find |
| `Ctrl + H` | Replace |
| `F5` | Debug: Start Debugging |
| `F12` | Go to Definition |
In a professional context, the debugger is crucial for solving bugs. While print statements have their place, the debugger is the most important tool for identifying critical issues or production bugs.
Like the built-in terminal, VS Code has a built-in debug console. When running a script with the debugger, you can explore variable states and the call stack when paused at a breakpoint.
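To try this out, here is a tiny practice script (the function and data are made up for illustration): set a breakpoint inside the loop, press `F5`, and watch `total` change in the debug console at each iteration.

```python
def running_mean(values):
    """Return the arithmetic mean of a list of numbers."""
    total = 0
    for v in values:  # set a breakpoint here to watch `total` grow each pass
        total += v
    return total / len(values)


print(running_mean([1, 2, 3, 4]))  # 2.5
```

While paused, the Variables pane shows `total` and `v`, and the Debug Console lets you evaluate expressions like `total / 2` on the fly.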
By default, VS Code creates a new terminal for each debug run. This can consume unnecessary RAM, especially for new programmers who don't need previous debug outputs. To avoid this, configure VS Code to send output to the integrated debug console, clearing history after each run. This setting is managed in the `launch.json` file that appears in the dynamically-generated `.vscode` folder when debugging is first prompted (with `F5`). Here is the standard one I use.
```json
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Current File",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "internalConsole",
            "internalConsoleOptions": "openOnSessionStart"
        }
    ]
}
```
After years of coding, I find these core extensions greatly enhance the VS Code experience.
- One Dark Pro by binaryify
  - This theme's syntax highlighting and clever use of italics ease frustration by making code more readable.
- Material Icon Theme by Philipp Kief
  - This theme enhances the VS Code file explorer with colored icons, making file searching simpler.
- Markdown All in One by Yu Zhang
  - This extension allows for editing and previewing Markdown documentation directly in VS Code, streamlining documentation workflow.
- Markdown PDF by yzane
  - Convert Markdown documents into PDF format with ease if you ever need to email a report.
- Rainbow CSV by mechatroner
  - Rainbow CSV makes reading comma-separated values directly in VS Code possible, reducing time spent context switching (to Excel).
- vscode-pdf by tomoki1207
  - With this, you can view PDF files directly in VS Code, reducing context switching.
- Python Indent by Kevin Rose
  - This extension automatically indents Python code to the correct level as you write new lines, reducing keystrokes and improving readability without distraction.
- Ruff by Astral Software
  - Automatically format your Python code in modern Black style, practically eliminating the need for manual formatting. I also recommend binding a keyboard shortcut to document formatting so you can format without leaving the keyboard.
- isort by Microsoft
  - This extension consistently sorts imports, enhancing code readability. Combined with Ruff, you can auto-format nearly everything in your Python scripts except for some comments and docstrings.
- autoDocstring: VSCode Python Docstring Generator by Nils Werner
  - Dynamically create Google-style docstrings based on a function's definition. This gives you that last piece of formatting automation.
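If you want a dedicated formatting shortcut, here is one possible `keybindings.json` entry (a sketch: the key combination is an arbitrary choice of mine, and `editor.action.formatDocument` is VS Code's built-in format command, which Ruff handles when it is set as the Python formatter). Open the file via "Preferences: Open Keyboard Shortcuts (JSON)" in the Command Palette.

```json
[
    {
        // Ctrl+Alt+F is an example choice; pick any unused combination
        "key": "ctrl+alt+f",
        "command": "editor.action.formatDocument",
        "when": "editorTextFocus"
    }
]
```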
Spend as little time on formatting as possible. Using Ruff, isort, and autoDocstring eliminates 99% of any formatting you'll need to do. Solving the problem at hand should always be your main focus.
"Black is an uncompromising Python code formatter that saves time and mental energy by automating code formatting, ensuring consistency, and reducing diffs for faster code reviews."
The Black style is a stricter, modern refinement of PEP 8, Python's original style guide, and it's arguably the best way to format a Python code base today. The Ruff extension applies Black-compatible formatting when it auto-formats.
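As a small illustration (the snippet is my own example, not from the Black docs), Black-style formatting normalizes quotes and spacing without changing behavior:

```python
# Before formatting (legal Python, but messy):
point = { 'x':1,'y': 2 }

# After Black-style formatting: double quotes, consistent spacing
point = {"x": 1, "y": 2}

print(point["x"] + point["y"])  # 3
```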
Black defaults to 88 characters per line, while PEP 8 recommends a maximum of 79 (72 for docstrings and comments, with up to 99 allowed by team agreement). Keeping all of this in mind, I configured my user settings to highlight certain line lengths (i.e. rulers) and wrap text at 88 characters. Here is a link to the user `settings.json` file I use while developing.

Note that these settings are user settings, so they must be set by accessing Preferences: Open User Settings (JSON) from the Command Palette. Remember, `F1` opens the Command Palette.

Once the user `settings.json` file is open, you can copy-and-paste my settings file, but note that it assumes all previously-mentioned extensions are installed.
Virtual environments are essential for managing project dependencies. They allow you to avoid the hassle of reinstalling Python multiple times when different projects require different versions (e.g., Python 3.8 vs. 3.12).
In a global workspace, Python packages are installed system-wide. While convenient for system tools and widely used libraries, this approach can lead to version conflicts and unintended updates across projects.
A local workspace, on the other hand, utilizes virtual environments to isolate project-specific dependencies. By creating a virtual environment for each project, you can install packages without affecting the system-wide Python installation. This isolation ensures that each project operates with its own set of dependencies, maintaining consistency and stability.
Local workspaces also let you experiment with different package versions, which helps when updating project dependencies (a process that can be very frustrating).
Once your workspace folder is open, do the following.
- Run `pip install virtualenv`.
  - What is `pip install`? In Python, `pip` stands for "Pip Installs Packages" and it's the default package installer for Python. It's used to install packages from the Python Package Index (PyPI). `install` is just one of the many commands available, and you can use `pip help` to see a list of all available commands.
  - What is `virtualenv`? `virtualenv` is a package for setting up virtual environments. Python does have a built-in module for this called `venv`, but it's only available in Python >= 3.3, whereas `virtualenv` runs on a wider range of versions and offers additional features for more complex development environments.
- Run `virtualenv venv`.
  - This creates a virtual environment folder named `venv` in your workspace folder. It contains a sequestered installation of Python (and the version can be specified with the `--python=python3.x` flag).
- Run `source venv/Scripts/activate`.
  - This runs the environment's activation script, which configures your shell session to use the environment's Python and packages. (On Windows, these scripts live in `venv/Scripts`; on macOS and Linux, the folder is `venv/bin`.)
  - After the environment is activated, you should see `(venv)` above the first line of the terminal prompt.
- At this point, you have a blank slate to begin installing packages. If you are running someone else's project, it should include a `requirements.txt` file that lists all modules necessary to get the software to work. To install them all in one go, run `pip install -r requirements.txt`.
What if you want to make your own `requirements.txt` file? As your projects grow more complex, you'll import various Python libraries, each potentially requiring specific versions of their own dependencies. Without a proper system to manage these interconnected dependencies, things can quickly become a mess. The standard way to export currently-installed modules is with `pip freeze > requirements.txt`, but there is a better way.

By default, running `pip freeze > requirements.txt` exports a list of your project dependencies into a file named `requirements.txt`. For a project with four dependent modules, the results might look like this:
```
build==1.2.1
click==8.1.7
colorama==0.4.6
contourpy==1.2.1
```
However, this list doesn't indicate which modules depend on which others.
This is where `pip-tools` becomes useful. After installing the package with `pip install pip-tools`, the flow to export dependencies becomes:

```shell
pip freeze > requirements.in
pip-compile
```

The resulting `requirements.txt` file includes not only the required modules and their versions, but also all of their interlaced dependencies. This detailed information is especially valuable when sharing code with other developers who want to update or modify dependencies in your project.
```
# This file is autogenerated by pip-compile with Python 3.12
# by the following command:
#
#    pip-compile
#
build==1.2.1
    # via
    #   -r requirements.in
    #   pip-tools
cfgv==3.4.0
    # via
    #   -r requirements.in
    #   pre-commit
click==8.1.7
    # via
    #   -r requirements.in
    #   pip-tools
colorama==0.4.6
    # via
    #   -r requirements.in
    #   build
    #   click
contourpy==1.2.1
    # via
    #   -r requirements.in
    #   matplotlib
```
Not using Git is a recipe for disaster. Developing enterprise software is complex, and tracking progress is essential. While there are various ad-hoc methods of version control (e.g., `myscript.py`, `myscript1.py`, `myscript1_final.py`), Git is by far the best. Although new developers may find Git intimidating at first, embracing it early on saves considerable pain and frustration in the long run.
Watch this video by ByteByteGo for a great overview of Git.
The four main areas Git interfaces with are:
- /localWorkingDir
- Staging Area
- Local Repository
- Remote Repository
The commands you'll use 99% of the time are:
- `git clone` - Create a copy of an existing repository from a remote server on your local machine.
- `git branch` - List, create, or delete branches within a repository.
- `git checkout` - Switch between branches or restore files in the working directory.
- `git add` - Stage changes (new, modified, or deleted files) to be included in the next commit.
- `git commit` - Record any staged changes to the repository with a descriptive message.
- `git push` - Upload local commits to a remote repository.
- `git pull` - Fetch and integrate changes from a remote repository into the current branch.
Less often, you'll need additional commands when managing a complex code base. This video by ByteByteGo is a great introduction to them.
- `git merge` - Combine changes from one branch into another, creating a merge commit that integrates the histories of both branches.
- `git rebase` - Move or re-apply a series of commits to a new base commit, creating a linear project history.
- Squash commit - Combine multiple commits into a single commit, simplifying the commit history. This can be done during a rebase or merge.
After cloning a project and authenticating, you want a cohesive general approach to developing new features in your repo.
- 💾 Commit frequently. You want to maintain a clean commit history by committing frequently, logically chunking updates with meaningful commit messages. This makes reading the commit history much easier.
- ⛔ Avoid partial pushes. Only push to remote when your work is ready. This is an essential practice to prevent bugs in production.
- 🌱 Develop on new branches. To make sure things don't get mixed up, it's best to develop new features or bugfixes on new branches. Working solely on `master` is only really appropriate when you're the sole author of a project, and even then, creating new branches for particularly large feature changes saves a lot of headache if anything ever needs to be rolled back.
Here's how my standard development process looks on a daily basis. Again, `merge` and `rebase` aren't used often, so they aren't a part of this flow.
- 📥 Clone the repository from its remote location to your local machine.
  - `git clone <url>`
  - This can be done with SSH or HTTPS. SSH saves time in the long run and could be considered slightly more secure, but both methods are just fine.
- 🌱 Make a new branch and switch to it.
  - `git branch <name>` to create the branch.
  - `git checkout <name>` to switch to it.
  - Or, use `git checkout -b <name>` to do it all at once (this is the easiest way).
- 📝 Make changes locally by creating new files or editing existing ones.
- 📨 Add the changes to the Staging Area.
  - `git add .`
  - Note that `.` means all files, but you could specify individual ones with `git add "myfile.txt"` if you wanted.
- 💬 Commit the changes with a meaningful message (easier said than done).
  - `git commit -m "Add input validation to parser"` (a descriptive message, not `"asdf"`)
  - Repeat steps 3-5 for subsequent changes, paced at logical intervals.
- 📤 Push all commits to the remote repository.
  - `git push`
- To learn even more, MIT provides an in-depth guide to learning the Git workflow.
Before you work professionally, you'll almost never collaborate with other devs on a coding project in the same repo, so merge conflicts are rare. However, once multiple team members begin working on the same codebase simultaneously, things can get messy.

For instance, what happens if a colleague pushes a series of commits while you're in the middle of developing a feature? This situation calls for merging the changes, which is typically straightforward if they involve separate files. If the changes are in the same file, however, merge conflicts can arise, and resolving them requires tedious manual review.
To avoid all of this headache, pull from the remote frequently (at minimum, before starting each new piece of work). This way, you are always working from the latest version of the code base, and you'll be less likely to have to perform a merge later.
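One concrete habit is to sync with the remote at the start of each work session before branching. A sketch (branch names are placeholders):

```shell
# Start-of-day sync before new work:
git checkout master                 # switch to the main branch
git pull origin master              # bring in teammates' latest commits
git checkout -b feature/my-change   # branch off the up-to-date tip
```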
Adopting virtual environments, Git, and a well-structured folder tree streamlines your workflow and reduces errors. These practices are foundational for maintaining a clean, manageable, and scalable codebase, so it's essential to practice them, even on personal projects.
Most Python projects have similarly-named directories to organize files in a way that is logical and familiar to other devs. The following table lists some of the most common directory names.
| Directory | Description |
|---|---|
| `doc` | Documentation |
| `img` | Images |
| `src` | Source code |
| `venv` | Virtual environment |
| `tests` | Unit tests |
| `bin` | Executables |
| `lib` | Additional non-pip-installable dependencies |
| `config` | Configuration files |
| `notebooks` | Interactive documentation |
| `examples` | Examples of how to use the project |
| `queries` | SQL queries for fetching data |
| `static` | CSS, JavaScript, and images for web apps |
| `templates` | HTML templates for web apps |
| `logs` | Logs generated by the app |
| `scripts` | Various utility or setup scripts |
| `dist` | Distribution packages for the project (i.e. wheels) |
| `build` | Build-related files and temporary build artifacts |
Even if these names or concepts aren't familiar yet, using these folders when creating a workspace structure will help other devs understand your project.
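Putting a few of these together, a new workspace could be scaffolded from Git Bash like so (the particular directories chosen here are just an example):

```shell
mkdir -p src tests doc config                 # common top-level directories
touch README.md requirements.in .gitignore    # essential root files
```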
In Python, I generally use the following file naming conventions when possible.
- Directories are lowercase `snake_case`.
- Non-code files are lowercase `kebab-case`.
- Classes are `PascalCase`.
- Functions are lowercase `snake_case`.
A file or directory that uses `kebab-case` can't be directly imported by Python, so I especially like using `kebab-case` for non-code files to reinforce this idea.
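You can see why kebab-case files aren't importable: a hyphen isn't valid in a Python identifier, so the import statement itself fails to parse (Python reads `my-module` as `my minus module`). A quick sketch using a hypothetical module name:

```python
# Compiling an import of a hyphenated name raises a SyntaxError,
# which is exactly why kebab-case suits non-code files.
try:
    compile("import my-module", "<example>", "exec")
    importable = True
except SyntaxError:
    importable = False

print(importable)  # False
```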
In my opinion, every repository worth its salt needs to have the following files:
- `.gitignore`
  - A configuration file that specifies files and directories that Git should ignore. It ensures that sensitive or unnecessary files (e.g. log files or compiled binaries) are not tracked by version control.
- `README.md`
  - A Markdown file that serves as the entry point and documentation for a project. It typically includes a description of the project, installation instructions, usage examples, and other relevant information.
- `requirements.in`
  - A text file used with the `pip-compile` tool. It lists the direct dependencies of a project without specifying how they are interconnected. It's used to generate the `requirements.txt` file.
- `requirements.txt`
  - A text file that lists the Python packages (including their specific versions) required for a project. It allows for easy installation of dependencies using `pip install -r requirements.txt`.
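A minimal `.gitignore` for the kind of Python project described here might look like this (these entries are common conventions, not an exhaustive list):

```
venv/
__pycache__/
*.pyc
logs/
dist/
build/
```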
As you automate your workflow further, you can introduce some of these intermediate configuration files. Although there are many others, most of my serious Python projects have used the following:

- `.flake8`
  - Configuration file for the Flake8 linter, which flags style violations and simple programming errors.
- `.gitattributes`
  - Configuration file that specifies attributes for files in a Git repository. It can define attributes such as text/binary handling, merge strategies, and end-of-line normalization.
- `.pre-commit-config.yaml`
  - Configuration file for the pre-commit framework, which manages and executes hooks for code formatting, linting, and more, ensuring code quality before commits. I've found this configuration file to be one of the most important when it comes to reducing diffs and unifying a code base.
  - You can have `pre-commit` run automatically on every `git commit` by running `pre-commit install` after the main module is installed. This installs a Git hook in the `.git/hooks` directory, specifically as a pre-commit hook.
  - If this file is introduced into a legacy code base and there are too many recommended changes to feasibly fix at once, committing with the `--no-verify` flag bypasses the `pre-commit` checks.
- `pyproject.toml`
  - The standard configuration file for Python projects (introduced by PEP 518), used to declare build requirements and configure tools such as Ruff.
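A minimal `pyproject.toml` sketch (the project name and metadata are placeholders; `line-length` is a real Ruff setting matching the Black default discussed earlier):

```toml
[project]
name = "my-project"        # placeholder name
version = "0.1.0"
requires-python = ">=3.12"

[tool.ruff]
line-length = 88           # match the Black default
```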
Great job if you've made it this far. This guide has a lot of condensed information, but it will help you code like a professional Python software engineer instead of a university academic. Practicing these principles and workflows alongside coding concepts and data projects is the difference between pushing a project to production or having it sit in a closet as a prototype. Docker containers are also relevant to this idea, but are outside of the scope of this repo.
If you were able to use these concepts to clone or fork this repository, start developing solutions to Project Euler or LeetCode problems, and treat this repo as a gym 🏋️. It's a place to show up every day and keep your mind and software skills sharp. Don't just solve problems; practice your workflows, and keep on top of your GitHub profile. Remember that consistency is key, and strength builds over time. You can't become a wizard in a day.