GPS Analysis v1.0
Date completed | Feb 5, 2024 |
Release where first appeared | v2.0 |
Researcher / Developer | Georgios Efstathiadis |
import openwillis as ow
hourly, daily, summary = ow.gps_analysis(filepath = '', timezone = 'US/Eastern')
This function is used for summarizing geolocation information. It requires two inputs. The first is a CSV file containing four columns: timestamps, latitude, longitude, and the accuracy of GPS measurement from the source device. The second is the timezone the data was collected in.
The GPS data are first processed using the Forest library, which includes modules for GPS processing and imputation. Missing data are imputed using the bidirectional imputation algorithm described in “Bidirectional imputation of spatial GPS trajectories with missingness using sparse online Gaussian Process” by G. Liu and J.P. Onnela. The data are then separated into flights and pauses, where a flight “is defined to be a longest straight-line trip of a particle from one location to another without a directional change or pause” (taken from the Forest documentation).
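Because a flight is a straight-line trip between two points, its length can be approximated from the start and end coordinates alone. The sketch below uses the haversine great-circle formula for this; it is an illustration only, not Forest's implementation, and the function names, the tuple layout of a flight, and the Earth-radius constant are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot near antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def total_flight_distance_km(flights):
    """Sum the lengths of flights given as (start_lat, start_lon, end_lat, end_lon)."""
    return sum(haversine_km(*flight) for flight in flights)
```

Summing flight lengths in this way gives a distance in kilometers, the same unit reported for dist_travelled.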
This matrix of flights and pauses is then used to calculate summary statistics of interest at the hourly and daily levels. Timestamps are converted to local date and time using the timezone string that corresponds to the location of the device. Valid timezone values are those recognized by the pytz library (listed in pytz.all_timezones). The home location is taken to be the place where the source device spends the most time at night, between 7pm and 9am, across the dataset.
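The home-location rule can be illustrated with a small sketch. It assumes the timestamps have already been converted to local time and that samples are evenly spaced, so counting samples stands in for time spent. Snapping coordinates to a grid by rounding is a simplification chosen for this example; the function names and the made-up coordinates are hypothetical and do not reflect how Forest or OpenWillis implement the step.

```python
from collections import Counter
from datetime import datetime


def is_night(local_dt, start_hour=19, end_hour=9):
    """True if the local time falls in the 7pm to 9am window."""
    return local_dt.hour >= start_hour or local_dt.hour < end_hour


def infer_home(samples, precision=3):
    """Return the most frequent night-time (lat, lon) cell.

    samples: iterable of (local_datetime, latitude, longitude).
    precision: decimal places used to snap coordinates to a grid (assumption).
    """
    counts = Counter(
        (round(lat, precision), round(lon, precision))
        for local_dt, lat, lon in samples
        if is_night(local_dt)
    )
    if not counts:
        raise ValueError("no observations between 7pm and 9am; home cannot be inferred")
    return counts.most_common(1)[0][0]


# Made-up coordinates for illustration only.
samples = [
    (datetime(2023, 9, 20, 23, 0), 40.6781, -73.9442),
    (datetime(2023, 9, 21, 2, 0), 40.6782, -73.9441),
    (datetime(2023, 9, 21, 7, 30), 40.6779, -73.9439),
    (datetime(2023, 9, 21, 13, 0), 40.7506, -73.9935),
    (datetime(2023, 9, 21, 15, 0), 40.7506, -73.9935),
    (datetime(2023, 9, 21, 21, 0), 40.7128, -74.0060),
]
home = infer_home(samples)
```

The error raised when no night-time samples exist mirrors the requirement that the dataset contain data in the 7pm to 9am interval.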
The CSV file needs to contain at least 60 observations per hour for at least 5% of the hours of analysis in order to provide enough information to process the data and impute any missing values. Additionally, the dataset needs to contain data in the 7pm to 9am interval; otherwise, the home location cannot be inferred and the analysis will fail.
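A quick pre-check of the density requirement can save a failed run. The sketch below is one possible reading of the rule: it assumes "hours of analysis" means every clock hour from the first to the last observation, which may differ from how OpenWillis counts them. The function name and parameters are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def has_enough_data(timestamps, min_obs_per_hour=60, min_fraction_of_hours=0.05):
    """Check whether enough hours contain at least `min_obs_per_hour` observations.

    timestamps: list of datetime objects, one per GPS observation.
    """
    if not timestamps:
        return False
    # Bucket each observation into the clock hour it falls in.
    per_hour = Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )
    first, last = min(per_hour), max(per_hour)
    # Assumption: the analysis window spans every hour from first to last, inclusive.
    total_hours = int((last - first) / timedelta(hours=1)) + 1
    dense_hours = sum(1 for n in per_hour.values() if n >= min_obs_per_hour)
    return dense_hours / total_hours >= min_fraction_of_hours
```

For example, one fully sampled hour followed by a single observation 10 hours later passes the check (1 of 11 hours is dense), whereas the same hour followed by a single observation 30 hours later does not (1 of 31 hours).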
The final output will include summary statistics for each hour or day respectively:
- datetime of the summaries
  - YYYY-MM-DD for daily
  - YYYY-MM-DD HH_00_00 for hourly
- observed_time, which indicates how much of the GPS data were not imputed. At 'daily' level, observed_time_day and observed_time_night are also included, which separate observed_time into the portion that falls in the day (8am to 8pm) and the portion that falls at night (8pm to 8am).
- dist_travelled, which specifies the number of kms moved
- home_time, which is the number of hours spent at home
- home_max_dist, which indicates the maximum distance the person was from home
- home_mean_dist, which indicates the average distance the person was from home
The function’s first output is an hourly level summary. This includes:
- datetime of the measure, in the format “YYYY-MM-DD HH_00_00”.
- observed_time, which indicates how much of the GPS data were observed in the input file rather than imputed.
- dist_travelled, which specifies the number of kms moved.
- home_time, which is the number of hours spent at home.
- home_max_dist, which indicates the maximum distance the person was from home.
- home_mean_dist, which indicates the average distance the person was from home.
The function’s second output is a daily level summary. This includes:
- datetime of the measure, in the format “YYYY-MM-DD”.
- observed_time, which indicates how much of the GPS data were observed in the input file rather than imputed.
- observed_time_day, which indicates how much of the GPS data were observed in the input file rather than imputed during the day (8am to 8pm).
- observed_time_night, which indicates how much of the GPS data were observed in the input file rather than imputed during the night (8pm to 8am).
- dist_travelled, which specifies the number of kms moved.
- home_time, which is the number of hours spent at home.
- home_max_dist, which indicates the maximum distance the person was from home.
- home_mean_dist, which indicates the average distance the person was from home.
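To make the day and night split concrete, the sketch below divides one observed interval into hours that fall in the day (8am to 8pm) and hours that fall at night (8pm to 8am). It assumes naive local datetimes and ignores daylight-saving transitions; the function name is hypothetical and this is not necessarily how OpenWillis computes observed_time_day and observed_time_night.

```python
from datetime import datetime, timedelta

DAY_START_HOUR, DAY_END_HOUR = 8, 20  # day is 8am to 8pm, night is the rest


def split_day_night(start, end):
    """Return (day_hours, night_hours) covered by the interval [start, end)."""
    day = night = 0.0
    cur = start
    while cur < end:
        if DAY_START_HOUR <= cur.hour < DAY_END_HOUR:
            # In the day: the current segment ends at 8pm today.
            boundary = cur.replace(hour=DAY_END_HOUR, minute=0, second=0, microsecond=0)
            in_day = True
        elif cur.hour < DAY_START_HOUR:
            # Early morning: the night segment ends at 8am today.
            boundary = cur.replace(hour=DAY_START_HOUR, minute=0, second=0, microsecond=0)
            in_day = False
        else:
            # Evening: the night segment ends at 8am tomorrow.
            boundary = (cur + timedelta(days=1)).replace(
                hour=DAY_START_HOUR, minute=0, second=0, microsecond=0
            )
            in_day = False
        nxt = min(boundary, end)
        hours = (nxt - cur).total_seconds() / 3600
        if in_day:
            day += hours
        else:
            night += hours
        cur = nxt
    return day, night
```

Summing these pairs over all observed (non-imputed) intervals in a calendar day would give values comparable to the two daily columns.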
The summary dataframe compiles file-level information. It mostly includes the mean and standard deviation of the daily measures.
- no_days, the number of days analyzed (length of the daily dataframe)
- total_observed_time, the total time of observation (sum of observed_time from the daily summary statistics)
- mean_move_time, the average time spent moving per day (mean of move_time from the daily summary statistics)
- sd_move_time, the standard deviation of time spent moving per day (std of move_time from the daily summary statistics)
- mean_pause_time, the average time spent idle per day (mean of pause_time from the daily summary statistics)
- sd_pause_time, the standard deviation of time spent idle per day (std of pause_time from the daily summary statistics)
- mean_dist_travelled, the average distance traveled per day (mean of dist_travelled from the daily summary statistics)
- sd_dist_travelled, the standard deviation of distance traveled per day (std of dist_travelled from the daily summary statistics)
- mean_home_time, the average time spent at home per day (mean of home_time from the daily summary statistics)
- sd_home_time, the standard deviation of time spent at home per day (std of home_time from the daily summary statistics)
- mean_home_max_dist, the average max distance from home per day (mean of home_max_dist from the daily summary statistics)
- sd_home_max_dist, the standard deviation of max distance from home per day (std of home_max_dist from the daily summary statistics)
- mean_home_mean_dist, the average mean distance from home per day (mean of home_mean_dist from the daily summary statistics)
- sd_home_mean_dist, the standard deviation of mean distance from home per day (std of home_mean_dist from the daily summary statistics)
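The relationship between the daily and summary outputs can be sketched as follows. The example works on plain dictionaries rather than a pd.DataFrame, and it uses the sample standard deviation (n - 1 denominator); whether OpenWillis uses the sample or population form is an assumption here. The function name and the `measures` parameter are hypothetical.

```python
from statistics import mean, stdev

DAILY_MEASURES = (
    "move_time", "pause_time", "dist_travelled",
    "home_time", "home_max_dist", "home_mean_dist",
)


def summarize(daily_rows, measures=DAILY_MEASURES):
    """Build file-level statistics from a list of per-day dictionaries."""
    summary = {
        "no_days": len(daily_rows),
        "total_observed_time": sum(row["observed_time"] for row in daily_rows),
    }
    for name in measures:
        values = [row[name] for row in daily_rows]
        summary[f"mean_{name}"] = mean(values)
        # The sample standard deviation needs at least two days of data.
        summary[f"sd_{name}"] = stdev(values) if len(values) > 1 else float("nan")
    return summary


# The two example days shown in this page's sample output (columns available there).
rows = [
    {"observed_time": 6.24, "move_time": 2.4, "pause_time": 21.6,
     "dist_travelled": 2.12, "home_time": 18.2, "home_max_dist": 1.02},
    {"observed_time": 5.65, "move_time": 8.6, "pause_time": 15.4,
     "dist_travelled": 4.15, "home_time": 17, "home_max_dist": 0.25},
]
example_summary = summarize(rows, measures=DAILY_MEASURES[:5])
```

With a pandas dataframe, the same quantities would typically come from column-wise calls such as `daily['dist_travelled'].mean()` and `daily['dist_travelled'].std()`.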
filepath
Type | String |
Description | Path to the CSV file that contains the GPS data. |
timezone
Type | String |
Description | The time zone in which the GPS data were collected. Time zone codes are the same as those used in pytz; the full list is available as pytz.all_timezones. |
hourly
Type | pd.DataFrame |
Description | Hour-level summary statistics for GPS data. |
The data frame is the transpose of the table below:
datetime | |
observed_time | |
move_time | |
pause_time | |
dist_travelled | |
home_time | |
home_max_dist | |
home_mean_dist |
daily
Type | pd.DataFrame |
Description | Day-level summary statistics for GPS data. |
The data frame is the transpose of the table below:
datetime | |
observed_time | |
observed_time_day | |
observed_time_night | |
move_time | |
pause_time | |
dist_travelled | |
home_time | |
home_max_dist | |
home_mean_dist |
summary
Type | pd.DataFrame |
Description | File-level summary statistics for GPS data. |
The data frame is the transpose of the table below:
no_days | |
total_observed_time | |
mean_move_time | |
sd_move_time | |
mean_pause_time | |
sd_pause_time | |
mean_dist_travelled | |
sd_dist_travelled | |
mean_home_time | |
sd_home_time | |
mean_home_max_dist | |
sd_home_max_dist | |
mean_home_mean_dist | |
sd_home_mean_dist |
import openwillis as ow
hourly, daily, summary = ow.gps_analysis(filepath = 'data.csv', timezone = 'US/Eastern')
daily.head(2)
datetime | observed_time | observed_time_day | observed_time_night | move_time | pause_time | dist_travelled | home_time | home_max_dist |
2023-09-20 | 6.24 | 3.12 | 3.12 | 2.4 | 21.6 | 2.12 | 18.2 | 1.02 |
2023-09-21 | 5.65 | 2.65 | 3 | 8.6 | 15.4 | 4.15 | 17 | 0.25 |
Below are dependencies specific to calculation of this measure.
Dependency | License | Justification |
Forest | BSD 3-Clause | Used for GPS imputation algorithm and basic GPS trajectory processing |
OpenWillis was developed by a small team of clinicians, scientists, and engineers based in Brooklyn, NY.