Customizing Hive Implementation
Running AMP requires a configuration file (.ini) to be passed to the main Python file.
This configuration file has several parameters that define the data to use and how to perform the aggregation. An example can be found in config/ais.ini.
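As a rough illustration, a configuration of this shape could be read with Python's standard configparser module. The section name [agg], the helper load_settings, and every value below are assumptions made for this sketch; they are not copied from config/ais.ini, and AMP's own loading code may differ.

```python
import configparser

# Hypothetical configuration text. The section name "agg" and all values
# are illustrative assumptions, not the contents of config/ais.ini.
EXAMPLE_INI = """
[agg]
table_name = ais_points
table_schema_id = mmsi
table_schema_dt = dt
table_schema_lat = lat
table_schema_lon = lon
time_filter = 3600
distance_filter = 100
lower_left_lat = -90
lower_left_lon = -180
upper_right_lat = 90
upper_right_lon = 180
trip_name = ais_example
resolution_lat = 0.1
resolution_lon = 0.1
temporal_split = day
"""

# Parameters that hold numbers rather than names or labels.
INT_KEYS = ("time_filter",)
FLOAT_KEYS = (
    "distance_filter",
    "lower_left_lat", "lower_left_lon",
    "upper_right_lat", "upper_right_lon",
    "resolution_lat", "resolution_lon",
)


def load_settings(text, section="agg"):
    """Parse .ini text and return a dict with numeric values converted."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    raw = parser[section]
    settings = dict(raw)  # everything starts out as a string
    for key in INT_KEYS:
        settings[key] = raw.getint(key)
    for key in FLOAT_KEYS:
        settings[key] = raw.getfloat(key)
    return settings


settings = load_settings(EXAMPLE_INI)
```

To read from a file instead of a string, parser.read(path) can replace parser.read_string(text).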
table_name
- The name of the Hive table that contains your data.
table_schema_id
- The column of your Hive table that contains the track id or user id that identifies a track.
table_schema_dt
- The column of your Hive table that contains the timestamp to be used (YYYY-mm-dd HH:MM:SS).
table_schema_lat
- The column of your Hive table that contains latitude.
table_schema_lon
- The column of your Hive table that contains longitude.
time_filter
- The maximum number of seconds allowed between points on a track. Any segment with more time between points gets removed.
distance_filter
- The maximum distance, in KM, allowed between points on a track. Any segment with more distance between points gets removed.
lower_left_lat
- Latitude of the lower-left corner of the bounding box that contains the data.
lower_left_lon
- Longitude of the lower-left corner of the bounding box that contains the data.
upper_right_lat
- Latitude of the upper-right corner of the bounding box that contains the data.
upper_right_lon
- Longitude of the upper-right corner of the bounding box that contains the data.
trip_name
- A label for the aggregated data. Used in naming Hive tables.
resolution_lat
- The height of bins in degrees of latitude, where 1 degree is approximately 100 KM. This must be a power of 10 (e.g. 1 ~= 100KM, .1 ~= 10KM, .01 ~= 1KM).
resolution_lon
- The width of bins in degrees of longitude, where 1 degree is approximately 100 KM at the equator. This must be a power of 10 (e.g. 1 ~= 100KM, .1 ~= 10KM, .01 ~= 1KM).
temporal_split
- Further bins the data by a discrete unit of time. Valid values are "minute", "hour", "day", "month", "year", and "all"; "all" ignores timestamps when binning.
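The filtering and binning parameters above can be pictured with a small Python sketch. This is only an approximation under stated assumptions: distance is taken as great-circle (haversine) distance, bins are assigned by flooring the coordinate divided by the resolution, and temporal bins are made by truncating the timestamp. The function names are hypothetical, and AMP's actual Hive queries may compute these differently.

```python
import math
from datetime import datetime


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in KM (assumed distance measure)."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def keep_segment(p, q, time_filter, distance_filter):
    """p and q are (datetime, lat, lon) for consecutive points on one track.

    The segment is kept only if it exceeds neither time_filter (seconds)
    nor distance_filter (KM).
    """
    seconds = abs((q[0] - p[0]).total_seconds())
    km = haversine_km(p[1], p[2], q[1], q[2])
    return seconds <= time_filter and km <= distance_filter


def bin_index(lat, lon, resolution_lat, resolution_lon):
    """Integer (row, column) of the bin containing a point (assumed scheme)."""
    return (math.floor(lat / resolution_lat), math.floor(lon / resolution_lon))


# strftime patterns that truncate a timestamp to each temporal_split value.
TEMPORAL_FORMATS = {
    "minute": "%Y-%m-%d %H:%M",
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
    "year": "%Y",
}


def temporal_key(dt, temporal_split):
    """Label of the temporal bin; "all" puts every timestamp in one bin."""
    if temporal_split == "all":
        return "all"
    return dt.strftime(TEMPORAL_FORMATS[temporal_split])


# Timestamps follow the YYYY-mm-dd HH:MM:SS layout of table_schema_dt.
fmt = "%Y-%m-%d %H:%M:%S"
p = (datetime.strptime("2015-03-07 12:00:00", fmt), 36.85, -76.29)
q = (datetime.strptime("2015-03-07 12:10:00", fmt), 36.90, -76.20)

kept = keep_segment(p, q, time_filter=3600, distance_filter=100)  # True
cell = bin_index(p[1], p[2], 0.1, 0.1)                            # (368, -763)
hour = temporal_key(p[0], "hour")                                 # "2015-03-07 12"
```

With time_filter lowered to 300, the same segment would be dropped, since the two points are 600 seconds apart.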