Common Date Transformers #42

Open
jmackwinn opened this issue Oct 21, 2017 · 2 comments

@jmackwinn

First off, thanks to the devs for creating such an awesome and useful library. Just a suggestion: it would be great to add a few date transformers to this library. For example, pass in a list of date columns and, for each column, get back separate year, month, weekday, hour, etc. columns. Here is a rudimentary date differ transformer I use often.

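import pandas as pd
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

# BaseEstimator supplies get_params/set_params, so the transformer can be
# cloned and used inside pipelines and grid searches
class DateDiffer(BaseEstimator, TransformerMixin):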
    '''
    Takes the difference between consecutive date columns and returns
    the value in the requested unit (days by default).
    Please use DateFormatter() before using DateDiffer().

    How it works:
    If you specify 3 dates: [date1, date2, date3]
    Output will be 2 columns:
        date2 - date1
        date3 - date2

    The transformer takes the following parameter 'unit':
        Y:  year (see note)
        M:  month (see note)
        W:  week
        D:  day
        h:  hour
        m:  minute
        s:  second
        ms: millisecond
        us: microsecond
        ns: nanosecond
        ps: picosecond
        fs: femtosecond
        as: attosecond

    Note: Y and M are not fixed-length, so NumPy raises a TypeError when
    dividing day- or nanosecond-based differences by them.
    '''
    def __init__(self, unit='D'):
        self.unit = unit
    
    def fit(self, X, y=None):
        # stateless transformer
        return self

    def transform(self, X):
        # assumes X is a DataFrame whose columns are all datetime64
        beg_cols = X.columns[:-1]
        end_cols = X.columns[1:]
        # as_matrix() was removed in pandas 1.0; to_numpy() replaces it
        Xbeg = X[beg_cols].to_numpy()
        Xend = X[end_cols].to_numpy()
        Xd = (Xend - Xbeg) / np.timedelta64(1, self.unit)
        diff_cols = ['->'.join(pair) for pair in zip(beg_cols, end_cols)]
        Xdiff = pd.DataFrame(Xd, index=X.index, columns=diff_cols)
        return Xdiff
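
For the date-part idea mentioned at the top (year, month, weekday, hour per column), a minimal sketch might look like the following. The class name `DatePartExtractor` and its `cols` and `parts` parameters are hypothetical and not part of this library; it assumes the selected columns are already datetime64 and relies on the pandas `Series.dt` accessor.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DatePartExtractor(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: expands datetime columns into date parts."""

    def __init__(self, cols=None, parts=("year", "month", "weekday", "hour")):
        self.cols = cols
        self.parts = parts

    def fit(self, X, y=None):
        # stateless transformer
        return self

    def transform(self, X):
        # fall back to every column when no list is given
        cols = list(X.columns) if self.cols is None else list(self.cols)
        out = {}
        for col in cols:
            for part in self.parts:
                # Series.dt exposes year, month, weekday, hour, ...
                out[f"{col}_{part}"] = getattr(X[col].dt, part)
        return pd.DataFrame(out, index=X.index)


df = pd.DataFrame(
    {"signup": pd.to_datetime(["2017-10-21 14:30", "2017-01-02 08:00"])}
)
parts = DatePartExtractor(cols=["signup"]).fit_transform(df)
```

Here `parts` has one column per requested part, named `signup_year`, `signup_month`, and so on; `weekday` follows the pandas convention of Monday = 0.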


My Python-fu is limited. For example, I am unable to generalize the DateDiffer() transformer to an entire DataFrame or, say, pass it a list of columns and call fit_transform().
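
One possible way to get the "list of columns" behaviour is to store the column names on the transformer and select them inside transform(). This is only a sketch: `ColumnDateDiffer` and its `cols` parameter are hypothetical names, and it assumes every listed column is already datetime64.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class ColumnDateDiffer(BaseEstimator, TransformerMixin):
    """Hypothetical variant of DateDiffer restricted to a list of columns."""

    def __init__(self, cols=None, unit="D"):
        self.cols = cols
        self.unit = unit

    def fit(self, X, y=None):
        # stateless transformer
        return self

    def transform(self, X):
        # use every column when no list is given
        cols = list(X.columns) if self.cols is None else list(self.cols)
        dates = X[cols].to_numpy()
        # difference between each column and the one before it
        diffs = (dates[:, 1:] - dates[:, :-1]) / np.timedelta64(1, self.unit)
        names = [f"{a}->{b}" for a, b in zip(cols[:-1], cols[1:])]
        return pd.DataFrame(diffs, index=X.index, columns=names)


df = pd.DataFrame({
    "id": [1, 2],  # non-date column, ignored because it is not in cols
    "ordered": pd.to_datetime(["2017-10-01", "2017-10-05"]),
    "shipped": pd.to_datetime(["2017-10-03", "2017-10-06"]),
    "delivered": pd.to_datetime(["2017-10-10", "2017-10-08"]),
})
diffs = ColumnDateDiffer(cols=["ordered", "shipped", "delivered"]).fit_transform(df)
```

Because TransformerMixin supplies fit_transform(), the call above returns a DataFrame with the columns `ordered->shipped` and `shipped->delivered`, measured in days.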

Finally, is there a way to pass two numeric columns to a transformer and obtain the column differences? I know I can create interaction variables with sklearn's PolynomialFeatures transformer, but not df['x1'] + df['x2'], for instance.
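
For what it's worth, one way this could be done without a custom class is scikit-learn's FunctionTransformer wrapped around a plain function. The helper `column_difference` and its argument names below are made up for illustration; `validate=False` is passed so the DataFrame reaches the function unchanged.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def column_difference(X, minuend="x1", subtrahend="x2"):
    # hypothetical helper: returns a one-column DataFrame holding the difference
    diff = X[minuend] - X[subtrahend]
    return diff.to_frame(name=f"{minuend}-{subtrahend}")


differ = FunctionTransformer(
    column_difference,
    validate=False,
    kw_args={"minuend": "x1", "subtrahend": "x2"},
)

df = pd.DataFrame({"x1": [5, 7], "x2": [2, 10]})
out = differ.fit_transform(df)
```

Swapping the subtraction for an addition gives the df['x1'] + df['x2'] case; the same wrapper works for any column-wise arithmetic.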

@tgsmith61591
Owner

I think this is a reasonable request, and certainly a common enough use case. @charlesdrotar, let's spend some time discussing it.

@jmackwinn
Author

Thanks guys. I might add that some of your classes already solve pain points a lot of us have, e.g. safelabelencoder, which encodes unseen values. I referenced your work in this Stack Overflow thread.
