This is from my Python working session on managing Python environments. I think this is helpful for researchers and data scientists who want to bring some structure into their workflows.
In this blogpost, we will be learning how we can manage Python environments and isolate dependencies using various developer tools. Specifically, I’ll introduce:
- virtualenv: allows isolated Python environments
- pip-tools: a collection of useful utilities for managing dependencies
- make: build automation tool that reads recipes called Makefiles
We’ll walk through the following steps:
- Creating a virtual environment
- Managing dependencies within your virtualenv
- Using a Makefile for automation
Creating a virtual environment
A virtual environment is like a sandbox: when you create one, it’s still empty (no numpy, pandas, etc.) aside from your Python interpreter.
First, create a git repo:
We start our dev environment by putting everything under git. I recommend making this the first step, so that everything is version-controlled right away:
mkdir my_project
cd my_project
git init
We don’t want to track everything in Git (e.g. data, local editor configs, some macOS directories, etc.), so we list those inside a .gitignore file.
Since we’re working with Python, I recommend copy-pasting GitHub’s template Python .gitignore:
touch .gitignore
# open this file with your favorite editor, then copy paste the
# link above. In my case, I use the vim editor
vim .gitignore
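For reference, the entries that matter most for this workflow look something like this (an illustrative excerpt, not the full GitHub template):

```
# .gitignore (excerpt)
venv/
__pycache__/
*.py[cod]
.ipynb_checkpoints/
.DS_Store
```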
Create a virtual environment
A virtual environment (env) isolates your dependencies (and Python version) from the rest of your machine. So if you install PyTorch within that environment, it will only “show up” within that env.
python3 -m venv venv # creates a virtual env called venv (gitignored)
- python3: your Python interpreter
- -m: run a library module as a script
- venv (the first): the venv module that creates virtual environments
- venv (the second): the name of the virtual environment to be created
It will create a folder called venv. Inside that folder you’ll have a dedicated venv/bin/pip3 installer. You’ll use that instead of the “global” pip in your system.
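You can sanity-check the new folder right away (this sketch assumes python3 with the venv module is installed; on Debian/Ubuntu you may need to install the python3-venv package first):

```shell
# create the isolated environment, then peek inside it
python3 -m venv venv
ls venv/bin        # contains python3, pip3, activate, among others
```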
Activate the virtual environment
Once you’ve created the env, you need to explicitly activate it:

source venv/bin/activate

In some terminals, you’ll see that (venv) will show up in your prompt. You can deactivate it by typing:

# Don't do this for now
deactivate
Managing dependencies within your virtualenv
We often use pip-tools for this step. It gives a nice interface to manage dependencies.
Install pip-tools within your virtual env:
venv/bin/pip3 install pip-tools
Actually, once you’ve activated the env, you don’t need to explicitly specify the path. However, I think it’s better to be explicit than implicit, just in case!
Installing pip-tools gives you access to two important commands:
pip-compile: pins and resolves versions for your dependencies
pip-sync: installs dependencies and their exact versions in your env
Instead of installing dependencies one by one, we create a file that tracks them so that the setup is reproducible. Make a file called requirements.in, and let’s put in some of our favorite libraries:
# requirements.in
requests
numpy
pandas==1.1.3
Assume that you need the 1.1.3 version of pandas specifically, and you “don’t care” which versions of requests and numpy you end up with.
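For reference, requirements.in accepts the usual pip version specifiers, so you can be as loose or as strict as you need (an illustrative sketch; these particular constraints are made up):

```
# requirements.in
requests            # any version pip-compile can resolve
numpy>=1.19         # at least 1.19
pandas==1.1.3       # exactly 1.1.3
```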
Compile your requirements to get pinned versions
In app development, it’s super important that your versions are pinned: it helps with vulnerability tracking, reproducible (idempotent) builds, and more.
We use pip-compile for this; it will spit out a requirements.txt with fully pinned versions:

venv/bin/pip-compile -o requirements.txt requirements.in

Once you have the generated requirements.txt, you need to commit both files in git:
git add requirements.in
git add requirements.txt
git commit -m "Add dependencies"
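For reference, the generated requirements.txt looks roughly like this (the exact header and pinned versions below are illustrative; your output will depend on the pip-tools version and when you compile):

```
# This file is autogenerated by pip-compile
# To update, run:
#
#    pip-compile --output-file=requirements.txt requirements.in
#
numpy==1.19.2             # via -r requirements.in, pandas
pandas==1.1.3             # via -r requirements.in
python-dateutil==2.8.1    # via pandas
```

Note that transitive dependencies (like python-dateutil, pulled in by pandas) get pinned too, which is exactly what makes the build reproducible.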
At this point, you haven’t actually installed the dependencies yet. Running this:

# venv/bin/python3
>>> import pandas as pd

will result in a ModuleNotFoundError. We’ll use pip-sync to fix that:

venv/bin/pip-sync requirements.txt
Note that we pass requirements.txt, the pinned version, not requirements.in. pip-sync does basically what it says: it syncs your env with whatever’s in the requirements.txt in your git repo. So, if you open your Python interpreter:
# venv/bin/python3
>>> import pandas as pd
>>> pd.__version__
'1.1.3'
Using Makefile for automation
Usually, it’s good practice to automate these steps. We do this by writing a Makefile and executing it via make. Each rule in a Makefile pairs a target with a recipe, and the target can be anything: make venv, make dependencies, etc. A Makefile is also a DAG, so make can run specific recipes first before running another.
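Concretely, a make rule has this general shape (note that the recipe line must be indented with a tab character, not spaces):

```
target: prerequisites
	recipe    # shell commands that produce the target
```

When you run make target, make first ensures each prerequisite (a file or another target) is up to date, then executes the recipe.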
To standardize things, here’s how we often do it:
# Makefile
venv: ## create virtual environment if venv is not present
	python3 -m venv venv

requirements.txt: venv requirements.in ## generate requirements for release
	venv/bin/pip-compile -o requirements.txt requirements.in

dev: ## creates a development environment, install deps
	venv/bin/pip-sync requirements.txt
	venv/bin/pre-commit install  # (out-of-scope for this session)
Later on, you’ll see yourself adding new targets. Usually, I see things like:
- make run: runs a web server (maybe calling your app’s entry point)
- make test: runs all your tests using pytest
- make clean: removes artifact files like .pyc files, __pycache__, and .pytest_cache directories:
# Makefile
clean: ## Remove general artifact files
	find . -name '.coverage' -delete
	find . -name '*.pyc' -delete
	find . -name '*.pyo' -delete
	find . -name '.pytest_cache' -type d | xargs rm -rf
	find . -name '__pycache__' -type d | xargs rm -rf
	find . -name '.ipynb_checkpoints' -type d | xargs rm -rf

format: dev ## Scan and format all files with pre-commit
	venv/bin/pre-commit run --all-files

test: dev ## Run all tests with coverage
	venv/bin/pytest tests --cov=src -v --cov-report=term-missing
Separating prod and dev dependencies
Sometimes we also separate the dependencies only needed to run the app (app dependencies) from those that are needed to develop the app (dev dependencies). A good example is pytest: it’s a library for running tests and reporting coverage. You don’t really need it unless you’re the developer or part of the dev team.
Here’s how I set them up. I have a file,
requirements-dev.in, that contains
all these extra dependencies:
# requirements-dev.in
-r requirements.txt
pytest
Then I have separate targets for building dev and production environments in the application:
prod: ## creates a production environment
	venv/bin/pip-sync requirements.txt

dev: ## creates a development environment, install deps
	venv/bin/pip-sync requirements-dev.txt

requirements.txt: venv requirements.in ## generate requirements for release
	venv/bin/pip-compile -o requirements.txt requirements.in

requirements-dev.txt: venv requirements-dev.in ## generate requirements for dev
	venv/bin/pip-compile -o requirements-dev.txt requirements-dev.in
In this blogpost, we learned about managing Python environments using tools such as virtualenv, pip-tools, and make. From scratch, we created a git repository, added dependencies, and automated build steps using Makefile recipes. Hope you learned something new today!