If you store standard Python set
s or frozenset
s in your Series
or DataFrame
objects, you'll find this useful.
The pandas_sets
package adds a .set
accessor to any pandas Series
object;
it's like .dt
for datetime
or .str
for string
, but for set
.
It exposes all public methods available in the standard set
.
pip install pandas-sets
Just import the pandas_sets
package and it will register a .set
accessor to any Series
object.
import pandas_sets
import pandas_sets
import pandas as pd
df = pd.DataFrame({'post': [1, 2, 3, 4],
'tags': [{'python', 'pandas'}, {'philosophy', 'strategy'}, {'scikit-learn'}, {'pandas'}]
})
pandas_posts = df[df.tags.set.contains('pandas')]
pandas_posts.tags.set.add('data')
pandas_posts.tags.set.update({'data', 'analysis'})
pandas_posts.tags.set.len()
- The implementation is primitive for now. It's based heavily on the pandas' core
StringMethods
implementation. - The public API has been tested for most expected scenarios.
- The API will need to be extended to handle
NA
values appropriately.