Skip to content

Wrapper around Pandas and SQLAlchemy to easily get DataFrames in and out of databases.

License

Notifications You must be signed in to change notification settings

antonlydell/Pandemy

Repository files navigation

Pandemy

PyPI conda-forge - Version PyPI - Python Version conda-forge - Platform Documentation status PyPI - License

Pandemy is a wrapper around pandas and SQLAlchemy to provide an easy class based interface for working with DataFrames and databases. This package is for those who enjoy working with pandas and SQL but do not want to learn all "bells and whistles" of SQLAlchemy. Use your existing SQL knowledge and provide text based SQL statements to load DataFrames from and write DataFrames to databases.

Installation

Pandemy is available for installation through PyPI using pip and conda-forge using conda. The source code is hosted on GitHub at: https://github.com/antonlydell/Pandemy

Install with pip:

$ pip install Pandemy

Install with conda:

$ conda install -c conda-forge pandemy

Dependencies

The core dependencies of Pandemy are:

  • pandas : powerful Python data analysis toolkit
  • SQLAlchemy : The Python SQL Toolkit and Object Relational Mapper

Databases except for SQLite require a third-party database driver package to be installed. The table below lists database driver packages for supported databases and their corresponding optional dependency identifier.

Optional dependencies of Pandemy.
Database Driver package Optional dependency identifier Version added
Oracle cx_Oracle oracle 1.1.0

To install cx_Oracle together with Pandemy run:

$ pip install Pandemy[oracle]

When using conda supply the driver package as a separate argument to the install command:

$ conda install -c conda-forge pandemy cx_oracle

A DataFrame to and from table round trip

This section shows a simple example of writing a DataFrame to a SQLite database and reading it back again.

Save a DataFrame to a table

Let's create a new SQLite database and save a DataFrame to it.

import io
import pandas as pd
import pandemy

# Data to save to the database
data = io.StringIO("""
ItemId,ItemName,MemberOnly,Description
1,Pot,0,This pot is empty.
2,Jug,0,This jug is empty.
3,Shears,0,For shearing sheep.
4,Bucket,0,It's a wooden bucket.
5,Bowl,0,Useful for mixing things.
6,Amulet of glory,1,A very powerful dragonstone amulet.
""")

df = pd.read_csv(filepath_or_buffer=data, index_col='ItemId')  # Create a DataFrame

# SQL statement to create the table Item in which to save the DataFrame df
create_table_item = """
-- The available items in General Stores
CREATE TABLE IF NOT EXISTS Item (
   ItemId      INTEGER,
   ItemName    TEXT    NOT NULL,
   MemberOnly  INTEGER NOT NULL,
   Description TEXT,

   CONSTRAINT ItemPk PRIMARY KEY (ItemId)
);
"""

db = pandemy.SQLiteDb(file='Runescape.db')  # Create the SQLite DatabaseManager instance

with db.engine.begin() as conn:
   db.execute(sql=create_table_item, conn=conn)
   db.save_df(df=df, table='Item', conn=conn)

The database is managed through the DatabaseManager class (in this case the SQLiteDb instance). Each SQL dialect is a subclass of DatabaseManager. The initialization of the DatabaseManager creates the database engine, which is used to create a connection to the database. The begin method of the engine returns a context manager with an open database transaction, which commits the statements if no errors occur or performs a rollback on error. The connection is automatically returned to the engine's connection pool when the context manager exits. If the database file does not exist it will be created. The execute method allows for execution of arbitrary SQL statements such as creating a table. The save_df method saves the DataFrame df to the table Item in the database db.

Load a table into a DataFrame

The DataFrame saved to the table Item of the database Runescape.db can easily be read back into a DataFrame.

import pandas as pd
from pandas.testing import assert_frame_equal
import pandemy

db = pandemy.SQLiteDb(file='Runescape.db', must_exist=True)

sql = """SELECT * FROM Item ORDER BY ItemId;"""  # Query to read back table Item into a DataFrame

with db.engine.connect() as conn:
   df_loaded = db.load_table(sql=sql, conn=conn, index_col='ItemId')

assert_frame_equal(df, df_loaded, check_dtype=False)
print(df)
               ItemName  MemberOnly                          Description
ItemId
1                   Pot           0                   This pot is empty.
2                   Jug           0                   This jug is empty.
3                Shears           0                  For shearing sheep.
4                Bucket           0                It's a wooden bucket.
5                  Bowl           0            Useful for mixing things.
6       Amulet of glory           1  A very powerful dragonstone amulet.

If the must_exist parameter is set to True pandemy.DatabaseFileNotFoundError will be raised if the database file is not found. This is useful if you expect the database to exist and you want to avoid creating a new database by mistake if it does not exist. The connect method of the engine is similar to begin, but without opening a transaction. The load_table method supports either a table name or a sql statement for the sql parameter.

Documentation

The full documentation is hosted at: https://pandemy.readthedocs.io

Tests

Pandemy has a test suite that is using the pytest framework. The test suite is located in the directory tests.

Run the test suite with the pytest command from the root directory of the repository:

$ pytest

License

Pandemy is distributed under the MIT-license.

Contributing

Suggestions, feature requests and feedback are welcome in text form on the tab GitHub Discussions, but not as written code. This project is meant as a source of practice for me to become a better Python developer and I prefer to write the code myself. Please use the category Ideas for suggestions and feature request and the General category for feedback on the project and general questions. Bug reports should be submitted at the Github Issues tab.

About

Wrapper around Pandas and SQLAlchemy to easily get DataFrames in and out of databases.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages