df = pd.read_csv(file_path)
## Convert to datetime columns
df["firstorder"]=pd.to_datetime(df["firstorder"],errors='coerce')
df["lastorder"] = pd.to_datetime(df["lastorder"],errors='coerce')
## Drop Rows with null values
df = df.dropna()
## Create Column which gives the days between the last order and the first order
df["first_last_days_diff"] = (df['lastorder']-df['firstorder']).dt.days
## Create Column which gives the days between when the customer record was created and the first order
df['created'] = pd.to_datetime(df['created'])
df['created_first_days_diff']=(df['created']-df['firstorder']).dt.days
## Drop Columns
df.drop(['custid','created','firstorder','lastorder'],axis=1,inplace=True)
## Apply one hot encoding on favday and city columns
df = pd.get_dummies(df,prefix=['favday','city'],columns=['favday','city'])
return df
# convert the store_data file into csv format
store_data = pd.read_excel("storedata_total.xlsx")
store_data.to_csv("storedata_total.csv")
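When reading the CSV in L2, `df = pd.read_csv(file_path)`, the index needs to be taken care of. `store_data.to_csv("storedata_total.csv")` writes the DataFrame index as an unnamed first column by default, so reading the file back without `index_col` gives 22 columns instead of 21, which breaks the inference. This fixes the issue: `df = pd.read_csv(file_path, index_col=0)`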
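A minimal sketch of the extra-column behaviour, assuming a small made-up frame (`frame`, with hypothetical columns `a` and `b`) in place of the real store data and an in-memory buffer in place of the file on disk. It also shows a possible alternative, writing the CSV with `index=False`, which is not part of the fix proposed above.

```python
import io

import pandas as pd

# Hypothetical two-column frame standing in for the store data.
frame = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# to_csv() writes the row index as an unnamed leading column by default.
csv_text = frame.to_csv()

# A plain read_csv() turns that index into an extra "Unnamed: 0" column.
naive = pd.read_csv(io.StringIO(csv_text))

# index_col=0 uses the first column as the index again.
fixed = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Alternative: do not write the index in the first place.
no_index = pd.read_csv(io.StringIO(frame.to_csv(index=False)))

print(list(naive.columns))
print(list(fixed.columns))
print(list(no_index.columns))
```

Either approach keeps the column count equal to that of the original frame; which one fits better depends on whether the CSV file can be regenerated.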