Lab 6.2: #69

aristsakpinis93 · 2023-05-16T08:17:19Z

    df = pd.read_csv(file_path)
    ## Convert to datetime columns
    df["firstorder"]=pd.to_datetime(df["firstorder"],errors='coerce')
    df["lastorder"] = pd.to_datetime(df["lastorder"],errors='coerce')
    ## Drop Rows with null values
    df = df.dropna()
    ## Create Column which gives the days between the last order and the first order
    df["first_last_days_diff"] = (df['lastorder']-df['firstorder']).dt.days
    ## Create Column which gives the days between when the customer record was created and the first order
    df['created'] = pd.to_datetime(df['created'])
    df['created_first_days_diff']=(df['created']-df['firstorder']).dt.days
    ## Drop Columns
    df.drop(['custid','created','firstorder','lastorder'],axis=1,inplace=True)
    ## Apply one hot encoding on favday and city columns
    df = pd.get_dummies(df,prefix=['favday','city'],columns=['favday','city'])
    return df
    
# convert the store_data file into csv format
store_data = pd.read_excel("storedata_total.xlsx")
store_data.to_csv("storedata_total.csv")

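When the CSV is read with df = pd.read_csv(file_path), the index needs to be taken care of. Because store_data.to_csv("storedata_total.csv") is called without index=False, the DataFrame index is written as an extra unnamed first column. Otherwise we end up with a file containing 22 instead of 21 columns, which breaks the inference. This fixes the issue: df = pd.read_csv(file_path, index_col=0)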
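
To make the extra column easy to reproduce, here is a minimal sketch using a small made-up DataFrame in place of the real store data (the two rows and their values are assumptions for illustration only, and an in-memory string stands in for storedata_total.csv). It compares the plain read with the index_col=0 read, and also shows writing the file with index=False as an alternative.

```python
import io

import pandas as pd

# Hypothetical stand-in for the store data; the real file has more columns.
store_data = pd.DataFrame(
    {"custid": ["A1", "B2"], "favday": ["Monday", "Friday"]}
)

# By default, to_csv writes the row index as an unnamed first column.
csv_text = store_data.to_csv()

# A plain read turns that index into an extra "Unnamed: 0" column.
naive = pd.read_csv(io.StringIO(csv_text))

# Declaring column 0 as the index restores the original column set.
fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Alternative: do not write the index in the first place.
no_index = pd.read_csv(io.StringIO(store_data.to_csv(index=False)))

print(list(naive.columns))
print(list(fixed.columns))
print(list(no_index.columns))
```

Either fix should work here; changing the read_csv call has the advantage that CSV files already written with the index column remain usable.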
