Learn How to Build a Python Package from Scratch

Learn How to Build a Python Package from Scratch

Introduction

One of Python’s strongest features is its vast ecosystem of packages and modules that enable all kinds of useful functionality. As you advance in Python skills, it’s valuable to learn how to contribute back to the community by creating and sharing your own python packages.

In this comprehensive guide, we’ll walk through the steps to build a Python package from scratch – from conceptualizing functionality to publishing on PyPI for others to install and use. We’ll cover:

  • Planning package features and scope
  • Structuring code and files
  • Writing setup scripts
  • Versioning and change logs
  • Documentation and READMEs
  • Licensing
  • Testing and CI
  • Publishing to PyPI
  • Managing releases and contributions

By the end, you’ll have the knowledge to build and distribute your own Python packages to extend what’s possible with Python!

Planning Your Package

The first step is conceptualizing what your package will do. Some things to decide:

  • Use Case – What problem will your package solve? What will users be able to accomplish with it?
  • Scope – Will it provide a single cohesive capability or a suite of tools?
  • Name – What readable name summarizes the functionality?
  • Existing Packages – Is there overlap with existing modules? How will yours differ?

For example, our package could provide utilities for working with JSON data. We may call it jsontoolkit to summarize the functionality.

Keeping a narrow, well-defined scope for your first packages allows focusing on quality. Broad packages require more effort to maintain and document.

With the purpose clear, let’s look at how to structure the Python code.

Structuring Code and Layout

A Python package contains modules – .py files of related code and functionality.

A typical package structure looks like:

jsontoolkit/
  jsontoolkit/
    __init__.py
    utils.py
    transformer.py
  tests/
    test_utils.py
  setup.py
  README.md
  LICENSE

The package is contained within the top-level jsontoolkit directory which shares the package name. This folder contains:

  • The modules implementing functionality – like utils.py
  • __init__.py to mark this as a package
  • Tests for package code
  • README, LICENSE, setup files etc.

This gives a well-organized structure for package code and assets.

Now let’s look at how to write the module code.

Writing Python Package Modules

The .py files within the package contain the useful functionality we want to provide.

For example, utils.py may contain:

import json

def read_json(filepath):
  with open(filepath, 'r') as f:
    return json.load(f)
  
def write_json(filepath, data):
  with open(filepath, 'w') as f:
    json.dump(data, f)

This simple module provides utilities to read and write JSON files.

We can have additional modules like transformer.py that provides functionality related to transforming JSON data.

Each module will import the common libraries it needs and contain related functions, classes etc.

Writing Setup Scripts

The setup.py file contains metadata used when installing and distributing the package via pip.

A sample setup.py:

from setuptools import setup, find_packages

setup(
  name='jsontoolkit',
  version='1.0',
  description='Useful utilities for working with JSON',
  
  author='Example Author',
  author_email='author@example.com',
  
  packages=find_packages(),
  
  install_requires=['jsonschema'], # dependencies
  
  python_requires='>=3.6' # supported Python versions
)

This contains metadata like name, version, dependencies, supported Python versions etc. There are many more options available.

The find_packages() call automatically includes all modules in the package for distribution.

Versioning Packages

Proper versioning helps users understand package changes and improvements over time.

The semantic versioning standard is commonly used – MAJOR.MINOR.PATCH

  • MAJOR – Breaking API changes
  • MINOR – New backwards compatible functionality
  • PATCH – Backwards compatible bug fixes

For example, 1.0.0 for initial release, 1.1.0 for a minor feature addition, 1.0.2 for a patch release etc.

We configure the version in setup.py, and it should be incremented with each release.

Writing CHANGELOGs

A CHANGELOG keeps users informed about releases by summarizing package changes, fixes, and new features.

For example:

# Changelog

## 1.0.2 - 2022-05-20
- Fixed issue with read_json on Windows 
- Improved file writing performance

## 1.0.1 - 2022-04-15  
- Added support for JSON Schema validation
- Handled file encoding during reads

## 1.0.0 - 2022-02-10
- Initial release

The file can be named CHANGELOG.md and included with the package. Make the habit of maintaining a changelog to improve the user experience.

Writing READMEs

A README.md explains what the package does, how to install it, usage examples, etc. It’s the first thing users see.

A sample README:

# jsontoolkit

Useful utilities for working with JSON data in Python.

## Install

```bash
pip install jsontoolkit

Usage

from jsontoolkit import read_json, write_json

data = read_json('data.json')
processed = do_something(data) 
write_json('output.json', processed)

Documentation

Full documentation at https://example.com/docs


The README gives a quick overview and essential usage examples. It provides the first impression - so write a thorough, well-organized one.

## Adding Documentation

Full documentation helps users leverage your package fully. Tools like [Sphinx](https://www.sphinx-doc.org/) make documenting code easy.

For example, Sphinx can generate beautiful HTML docs from Python docstrings:

```python
def read_json(filepath):
  """Read JSON file into dict object.
  
  Arguments:
    filepath: Path to JSON file
  
  Returns: 
    dict: Parsed JSON contents
  """

Provide ample usage examples in docs. Host docs on tools like Read the Docs. Link to your full docs from the README.

Thorough documentation improves user experience and adoption.

Licensing Your Package

Licensing gives users permission to reuse your package under defined terms. The most popular open source license is MIT:

Copyright 2023 Your Name

Permission is hereby granted...

Add a LICENSE file with the license text to your package. Without an explicit license, your package defaults to closed-source.

Packages on PyPI should generally use open source licenses.

Testing Python Packages

Testing is essential to delivering robust, reliable packages.

Some tips:

  • Use PyTest, UnitTest etc. to create test cases
  • Keep tests in a separate tests module
  • Run tests locally before committing

GitHub Actions provide free CI for testing:

# .github/workflows/python-test.yml 

on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v2
    - uses: actions/setup-python@v2
    - run: pip install pytest
    - run: pytest tests

This runs pytest on every commit. Add CI early to enforce quality.

Publishing to PyPI

Once your package is ready, you can distribute it on PyPI for easy installation via pip.

First, create an account on PyPI.

Then install Twine for publishing:

pip install twine

Build your package:

python setup.py sdist bdist_wheel

This creates a .tar.gz source archive and .whl wheel file in dist/.

Then use Twine to upload to PyPI:

twine upload dist/*

Provide your PyPI username and password.

Your package will now be live on PyPI and installable via pip install!

pip install jsontoolkit

Publishing to PyPI allows you to easily share your package with the Python community.

Managing Releases

Use semantic versioning sensibly to release improvements:

  • Bug fixes and small changes – Release as PATCH bump
  • New features/APIs – Release as MINOR bump
  • Breaking changes – Release as MAJOR bump

Increment the version number in setup.py, update CHANGELOG, tag a release on GitHub etc.

Manage releases carefully over time to avoid breaking existing user workflows.

Handling Contributions

As your package becomes popular, you may receive contributions via pull requests:

  • Have a CONTRIBUTING.md guide for contributors
  • Use tools like GitHub issues to discuss/track contributions
  • Establish a style guide and conventions for contributions
  • Consider adding contributors to AUTHORS file

With an engaged open source community, accepting contributions can greatly improve your package over time.

Next Steps

Some additional best practices as your package matures:

  • Automated release workflows with GitHub Actions
  • Upload releases to private PyPI for internal use before public release
  • Add badges to README for PyPI version, CI status etc.
  • Promote your package on social media, Python newsletters etc. to reach users
  • Grow a community to support new users in forums, Discord etc.
  • Consider cross-platform C extensions for performance gains when needed

The joy of open source is seeing your package used and improved by the community over time.

Conclusion

In this guide, we covered implementing and publishing a Python package end-to-end – from planning functionality to maintaining releases. The key topics included:

  • Conceptualizing package features and scope
  • Structuring package files properly
  • Writing useful module code
  • Crafting setup scripts, READMEs, tests etc
  • Versioning and change logs
  • Publishing to PyPI
  • Managing releases and contributions

Python’s vast ecosystem is powered by developers generously sharing their packages. With this knowledge, you can start creating your own useful tools to give back to the community.

The steps here aim to help you build high-quality, well-documented packages worth sharing. As your open source project grows, you’ll learn lessons and best practices that can’t be taught but only experienced.

I hope you feel inspired to scratch your own itch by developing a Python package. Let me know if you end up publishing to PyPI – I’d be delighted to try out your creation!

Frequently Asked Questions

Should I open source my Python package or keep it private?

Some factors when deciding whether to open source a package:

  • Will a public package benefit your business or establish mindshare?
  • Is your package a core competitive advantage to keep proprietary?
  • Does it rely on private dependencies or data sources?
  • Does it present any security risks if made public?
  • Do you have resources to review contributions and provide support?

Evaluate these factors to determine if an open or closed source package makes sense.

What are some tips for writing good Pythonic code?

Some characteristics of idiomatic, Pythonic code:

  • Leverage built-in types like dicts, lists, sets rather than reinventing
  • Prefer simple readable code over condensed unreadable one-liners
  • Use list comprehensions and generators rather than verbose loops
  • Name variables, functions and classes descriptively
  • Avoid obvious shortcuts like mutable defaults which can cause bugs
  • Keep classes focused with single responsibilities
  • Use Exceptions rather than return codes to indicate errors
  • Take advantage of Python dynamism and duck typing where it simplifies

What are some tools to help creating Python packages?

Some useful tools when developing Python packages:

  • Cookiecutter – Template engine for starting projects
  • Poetry – Robust dependency and packaging management
  • setuptools/twine – Packaging and PyPI distribution
  • Sphinx – Creating documentation sites
  • bump2version – Automating version bumps
  • pytest – Testing code
  • Travis CI – Hosted continuous integration

Leverage tools like these to automate and reduce toil when developing Python packages.

Should I use a src layout with packages?

The src layout separates actual package code under src/packagename/ from outer metadata. This adds complexity but can improve import correctness. Some pros:

  • Avoids confusing imports with top-level package directory gone
  • Separates tests outside of src/ tree to avoid import issues
  • Allows clean import resolution in IDEs
  • Common standard for larger projects

So in summary, src layout improves import ergonomics at the cost of slightly more complexity.

What licensing should I use for my Python package?

Some common open source licenses for Python projects:

  • MIT – Very permissive, allows commercial use
  • BSD – Similar to MIT
  • Apache 2.0 – Permissive but with patent clause
  • GPL – Copyleft license requiring source sharing

MIT is the most popular and places minimal restrictions on reuse. Consider your project goals when choosing a license.

Leave a Reply

Your email address will not be published. Required fields are marked *