A data analysis task completed during my Future Work is Digital scholarship, provided by the Egyptian Ministry of Communications and Information Technology.
- This dataset collects information from 100k medical appointments in Brazil and is focused on the question of whether patients show up for their appointment. Several characteristics about the patient are included in each row.
- The data has 110527 rows and 14 columns.
- General Questions related to the existence of:
  - missing values?
  - wrong datatypes for columns?
  - complete duplicates in the data?
  - outliers in each column?
- Univariate Questions:
  - Which `Gender` is healthier than the other, judging by the number of entries?
  - Which `Age` values are valid, and are there values like 1000, etc.?
  - Which disease among `Hipertension`, `Diabetes` and `Alcoholism` is most dominant?
- Bivariate Questions:
  - Is the number of `PatientId` values the same as the number of `AppointmentID` values, or can there be more than one appointment for the same patient?
  - Does receiving a `Scholarship` have a strong effect on whether the appointment is cancelled?
  - Does the period between `AppointmentDay` and `ScheduledDay` have an effect on cancelling the appointment?
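The general questions above can be sketched with the Python standard library alone. This is only an illustration on a tiny made-up sample that reuses the dataset's column names; the 0 to 120 age bounds are an assumption, not a rule taken from the dataset.

```python
import csv
import io
from collections import Counter

# Tiny made-up sample with the column names used in this README (not real data).
SAMPLE = """PatientId,AppointmentID,Gender,Age,Scholarship
1,100,F,34,0
2,101,M,-1,1
1,102,F,34,0
2,101,M,-1,1
3,103,F,,0
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Missing values: count empty cells per column.
missing = Counter(col for row in rows for col, val in row.items() if val == "")

# Complete duplicates: rows that are identical in every column.
seen = Counter(tuple(row.values()) for row in rows)
duplicates = sum(n - 1 for n in seen.values())

# Implausible ages: the 0-120 bounds are an assumption for this sketch.
bad_ages = [row["Age"] for row in rows
            if row["Age"] and not 0 <= int(row["Age"]) <= 120]

# Repeat patients: fewer unique PatientId values than AppointmentID values.
unique_patients = len({row["PatientId"] for row in rows})
unique_appointments = len({row["AppointmentID"] for row in rows})

print(dict(missing), duplicates, bad_ages, unique_patients, unique_appointments)
```

On the real file, the same checks would run over all 110527 rows after replacing `io.StringIO(SAMPLE)` with an opened CSV file.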
I went through a 110527-row, 14-column dataset about medical appointments in Brazil to discover which features affect whether an appointment is `Canceled` or not.
- dropped the `AppointmentID` and `Handcap` columns.
- replaced `PatientId` values with values in the range `0 -> 62298`.
- renamed `No-show` to `Canceled` to avoid confusion.
- changed the datatypes for `Scholarship`, `Hipertension`, `Diabetes`, `Alcoholism` and `SMS_received`.
- replaced wrong values in the `Age` column.
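The cleaning steps above could look roughly like the following. It is a minimal sketch, not the code used in the analysis: the helper `clean` is hypothetical, the rows are made up, and it assumes that `No-show == "Yes"` means the patient did not attend, that the flag columns hold `"0"`/`"1"`, and that an out-of-range `Age` is replaced with `None` (the replacement actually used is not stated here).

```python
def clean(rows):
    """Return cleaned copies of the rows (hypothetical helper for illustration)."""
    patient_codes = {}  # original PatientId -> small integer code
    cleaned = []
    for row in rows:
        age = int(row["Age"])
        cleaned.append({
            # Replace each PatientId with a code 0, 1, 2, ... in order of first appearance.
            "PatientId": patient_codes.setdefault(row["PatientId"], len(patient_codes)),
            # Assumption: ages outside 0-120 are wrong and become None.
            "Age": age if 0 <= age <= 120 else None,
            # 0/1 flags become booleans.
            "Scholarship": row["Scholarship"] == "1",
            "SMS_received": row["SMS_received"] == "1",
            # Rename No-show to Canceled; assumption: "Yes" means the patient did not attend.
            "Canceled": row["No-show"] == "Yes",
            # AppointmentID and Handcap are dropped simply by not copying them.
        })
    return cleaned


# Made-up rows using the README's column names (not real data).
rows = [
    {"PatientId": "A17", "AppointmentID": "5001", "Age": "62", "Scholarship": "0",
     "SMS_received": "1", "Handcap": "0", "No-show": "No"},
    {"PatientId": "B42", "AppointmentID": "5002", "Age": "-1", "Scholarship": "1",
     "SMS_received": "0", "Handcap": "0", "No-show": "Yes"},
    {"PatientId": "A17", "AppointmentID": "5003", "Age": "62", "Scholarship": "0",
     "SMS_received": "0", "Handcap": "1", "No-show": "Yes"},
]

cleaned = clean(rows)
for row in cleaned:
    print(row)
```

The remaining flag columns (`Hipertension`, `Diabetes`, `Alcoholism`) would be converted the same way as `Scholarship`.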
- In our sample, `Females` care about their health more than `Males`, as they reserved more appointments.
- The most dominant disease in our sample is `Diabetes`.
- The percentage of `Alcohol` addiction is, by far, higher in `Males` than in `Females`.
- Receiving a `Scholarship` has a strong effect on not cancelling the appointment.
- A longer `ReservationPeriod` (the period between `AppointmentDay` and `ScheduledDay`) contributes to cancelled appointments.
- `Gender` has no notable effect on cancelling appointments.
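As an illustration of how `ReservationPeriod` and the cancellation comparisons above might be computed, here is a minimal sketch. The helpers `reservation_period` and `cancel_rate` are hypothetical names, the timestamp format is an assumption, and the rows are made up, so the printed rates say nothing about the real dataset.

```python
from datetime import datetime


def reservation_period(scheduled_day, appointment_day):
    """Whole days between booking and appointment.

    Assumes ISO-like strings such as '2016-04-29T08:00:00Z'; only the date part is used.
    """
    scheduled = datetime.strptime(scheduled_day[:10], "%Y-%m-%d").date()
    appointment = datetime.strptime(appointment_day[:10], "%Y-%m-%d").date()
    return (appointment - scheduled).days


def cancel_rate(rows, key):
    """Share of canceled appointments for each group returned by key(row)."""
    totals, canceled = {}, {}
    for row in rows:
        group = key(row)
        totals[group] = totals.get(group, 0) + 1
        canceled[group] = canceled.get(group, 0) + row["Canceled"]
    return {group: canceled[group] / totals[group] for group in totals}


# Made-up rows (not real data); Canceled and Scholarship are already booleans.
rows = [
    {"ScheduledDay": "2016-04-29T08:00:00Z", "AppointmentDay": "2016-04-29T00:00:00Z",
     "Scholarship": False, "Canceled": False},
    {"ScheduledDay": "2016-04-01T09:30:00Z", "AppointmentDay": "2016-04-29T00:00:00Z",
     "Scholarship": False, "Canceled": True},
    {"ScheduledDay": "2016-04-27T10:00:00Z", "AppointmentDay": "2016-04-29T00:00:00Z",
     "Scholarship": True, "Canceled": False},
    {"ScheduledDay": "2016-04-10T10:00:00Z", "AppointmentDay": "2016-04-29T00:00:00Z",
     "Scholarship": True, "Canceled": True},
]

for row in rows:
    row["ReservationPeriod"] = reservation_period(row["ScheduledDay"], row["AppointmentDay"])

# Compare cancellation rates for short and long waits, and by scholarship status.
by_wait = cancel_rate(rows, lambda r: "same week" if r["ReservationPeriod"] <= 7 else "longer")
by_scholarship = cancel_rate(rows, lambda r: r["Scholarship"])
print(by_wait)
print(by_scholarship)
```

The one-week cut-off is an arbitrary choice for the sketch; any other grouping of `ReservationPeriod` can be passed in through `key`.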
Receiving a financial `Scholarship` and having a shorter `ReservationPeriod` for the appointment are the two features most likely to lower the number of `Canceled` appointments.
- `Handcap` has no documentation in the Kaggle description and its name has no translation, so it may be important but we cannot interpret it.
- The description of the data on Kaggle says that `SMS_received` is a field indicating the number of messages received, but in this data it is only a binary field with either 0 or 1, which may lead to a misleading interpretation. (Thanks