Florents-Tselai/pandas-sets

Pandas Sets: Set-oriented Operations in Pandas

If you store standard Python sets or frozensets in your Series or DataFrame objects, you'll find this useful.

The pandas_sets package adds a .set accessor to any pandas Series object; it's like .dt for datetime or .str for string, but for set.

It exposes all public methods of Python's built-in set type.
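For readers unfamiliar with custom accessors, the following is a minimal sketch of how an accessor of this kind can be registered through pandas' public extension API. It is not the package's actual implementation: the accessor name `demo_set` and the class `DemoSetAccessor` are hypothetical and chosen so they do not clash with the real `.set` accessor.

```python
import pandas as pd


# Hypothetical accessor name "demo_set"; the real package registers ".set".
@pd.api.extensions.register_series_accessor("demo_set")
class DemoSetAccessor:
    def __init__(self, series):
        # pandas passes the Series the accessor was called on.
        self._series = series

    def contains(self, item):
        # Elementwise membership test, returned as a boolean Series.
        return self._series.map(lambda s: item in s)

    def len(self):
        # Elementwise set size.
        return self._series.map(len)


tags = pd.Series([{"python", "pandas"}, {"philosophy"}, {"pandas"}])
mask = tags.demo_set.contains("pandas")
sizes = tags.demo_set.len()
```

Each method simply maps a plain set operation over the elements of the Series, which is the same general pattern the `.str` accessor follows for strings.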

Installation

pip install pandas-sets

Importing the pandas_sets package registers the .set accessor on every Series object.

import pandas_sets

Examples

import pandas_sets
import pandas as pd
df = pd.DataFrame({
    'post': [1, 2, 3, 4],
    'tags': [{'python', 'pandas'}, {'philosophy', 'strategy'}, {'scikit-learn'}, {'pandas'}],
})

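# Select the posts whose tag set contains 'pandas'
pandas_posts = df[df.tags.set.contains('pandas')]

# Add a single tag to each set
pandas_posts.tags.set.add('data')

# Add several tags to each set
pandas_posts.tags.set.update({'data', 'analysis'})

# Count the tags in each set
pandas_posts.tags.set.len()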
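For comparison, here is a rough plain-pandas sketch of similar operations without the accessor. It assumes the accessor methods act elementwise, as `.str` does. It also uses a non-mutating union in place of `add` and `update`, so it illustrates the idea rather than the package's exact behavior.

```python
import pandas as pd

df = pd.DataFrame({
    "post": [1, 2, 3, 4],
    "tags": [{"python", "pandas"}, {"philosophy", "strategy"}, {"scikit-learn"}, {"pandas"}],
})

# Membership test per row, used as a boolean mask.
has_pandas = df["tags"].map(lambda s: "pandas" in s)
pandas_posts = df[has_pandas]

# Non-mutating union: builds new sets instead of changing the originals.
with_extra = pandas_posts["tags"].map(lambda s: s | {"data", "analysis"})

# Number of tags per row.
tag_counts = with_extra.map(len)
```

Building new sets with `|` leaves the sets stored in `df` untouched, which avoids surprises when a filtered frame shares objects with the original.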

Notes

  • The implementation is primitive for now. It is based heavily on pandas' core StringMethods implementation (the class behind the .str accessor).
  • The public API has been tested for most expected scenarios.
  • The API will need to be extended to handle NA values appropriately.
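Until NA handling is part of the API, one possible workaround is to guard each element in plain pandas. The sketch below is an assumption about how missing values might reasonably be treated, not the package's behavior. The helper `contains_or_na` is hypothetical.

```python
import pandas as pd

# A column where some rows have no tag set at all.
tags = pd.Series([{"python", "pandas"}, None, float("nan"), frozenset({"pandas"})])


def contains_or_na(value, item):
    # Hypothetical helper: propagate missing values instead of raising a TypeError.
    if isinstance(value, (set, frozenset)):
        return item in value
    return pd.NA


# Keeps NA where the row had no set.
result = tags.map(lambda v: contains_or_na(v, "pandas"))

# Boolean mask that treats missing rows as "does not contain".
mask = tags.map(lambda v: contains_or_na(v, "pandas") is True)
```

Keeping `result` and `mask` separate leaves the choice open: propagate NA as `.str` methods do, or collapse it to False when a strict boolean mask is needed for indexing.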