Job Scheduler
The Job Scheduler allows you to import data from a SQL database into HDFS connected to TAP. Data can be imported in batch mode or by scheduling periodic, automatic updates.
From the TAP console main menu, navigate to Job Scheduler and then Import data.
TAP displays a form for you to fill out, starting with a job name, as shown below. Pick a unique name for your job.
- **JDBC URI** - You can enter the URI directly, or fill in the fields above the URI field to build a URI with the required schema:
  `jdbc:driver://host:port/database_name`
  You can append optional JDBC URI parameters, for example, to enable SSL for a PostgreSQL connection:
  `jdbc:postgresql://host:port/database_name?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory`
  (Note that parameters are driver-specific; check your database driver documentation for details.)
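As a convenience, the URI scheme above can be assembled from the individual form fields. A minimal sketch in Python (the helper and its field names are illustrative, not part of TAP):

```python
# Illustrative sketch: assemble a JDBC URI of the form
# jdbc:driver://host:port/database_name[?params] from separate fields.
from urllib.parse import urlencode

def build_jdbc_uri(driver, host, port, database, params=None):
    """Build a JDBC URI; 'params' is an optional dict of driver-specific options."""
    uri = f"jdbc:{driver}://{host}:{port}/{database}"
    if params:
        uri += "?" + urlencode(params)
    return uri

# PostgreSQL with SSL enabled, matching the example above
# (host and database names are placeholders):
uri = build_jdbc_uri("postgresql", "dbhost", 5432, "sales",
                     {"ssl": "true",
                      "sslfactory": "org.postgresql.ssl.NonValidatingFactory"})
print(uri)
```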
- **Username** and **Password** - The credentials used to connect to the data source.
- **Table** - The name of the database table to be imported into HDFS.
- **Destination dir** - The directory in the target HDFS where the imported data will be stored. Note: Make sure you have write access to this directory.
- **Choose import mode** - There are three import modes available: **Append**, **Overwrite**, and **Incremental**.
  - **Append** - Each import fetches the whole table into a separate file. Results of previous imports are not overwritten. Files on HDFS are named in the pattern `part-m-00000`, `part-m-00001`, and so on.
  - **Overwrite** - Each import fetches the entire source table and overwrites the results of the previous import, using `part-m-00000` as the filename.
  - **Incremental** - The import fetches records whose value in the column identified by the **Column name** parameter is not lower than the value given by the **Value** parameter. Each subsequent import fetches only records with values in that column higher than those previously fetched. For this identification, we recommend using an auto-incremented numeric column.
    - **Column name** - The database column (a unique numeric column) against which **Value** is checked; used to uniquely identify the data to be imported.
    - **Value** - A reference value used to filter records from the source database: only records whose value in the column identified by **Column name** is not smaller than this reference value will be imported.
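The incremental selection rule can be summarized in a short sketch, assuming an auto-incremented `id` column as recommended (this models the filtering behavior only, not TAP's actual implementation):

```python
# Illustrative model of incremental import selection.
# First run: fetch rows where the column value is >= the initial Value.
# Subsequent runs: fetch rows with values > the highest value already fetched.
def incremental_fetch(rows, last_value, first_run=False):
    """Return rows passing the incremental filter on the 'id' column."""
    if first_run:
        return [r for r in rows if r["id"] >= last_value]
    return [r for r in rows if r["id"] > last_value]

table = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]

# First import with Value = 2 fetches ids 2, 3, 4.
first = incremental_fetch(table, 2, first_run=True)

# New rows arrive; the next import fetches only ids above the last one seen (4).
table += [{"id": 5}, {"id": 6}]
second = incremental_fetch(table, 4)
```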
- **Start time** - The start time of your job. Note: If you enter a Start time prior to the current time, Oozie will try to "catch up" by executing jobs from the past.
- **End time** - The end time of your job. End time should always be later than Start time.
- **Frequency** - The frequency with which your job will be submitted.
- **Timezone** - The ID of the time zone for the entered start and end times.
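The scheduling parameters above interact as follows: the scheduler materializes one run per frequency interval between the start and end times, and any run time already in the past becomes a catch-up run. A small sketch of this behavior (illustrative of Oozie coordinator catch-up, not its actual API):

```python
# Illustrative sketch: materialize run times from Start time, End time,
# and Frequency; times before "now" are executed immediately as catch-up runs.
from datetime import datetime, timedelta

def materialize_runs(start, end, frequency_minutes, now):
    """Return (run_time, is_catch_up) pairs for the scheduled job."""
    runs, t = [], start
    while t < end:
        runs.append((t, t < now))
        t += timedelta(minutes=frequency_minutes)
    return runs

# A job submitted at 12:00 with a 10:00 start and hourly frequency:
now = datetime(2016, 5, 1, 12, 0)
runs = materialize_runs(datetime(2016, 5, 1, 10, 0),
                        datetime(2016, 5, 1, 14, 0),
                        60, now)
# The 10:00 and 11:00 runs are in the past, so they execute as catch-up runs.
```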
Selecting Job Scheduler and then Job browser from the TAP main menu allows you to view scheduled jobs. There are two tabs on the Job browser page: **Workflow jobs** and **Coordinator jobs**.
- **Workflow jobs** - This tab lists workflow jobs, which represent imports from databases to HDFS. Click **See details** to the right of a job name for additional information (example shown below).
  - **Details** - Additional information about the selected workflow job.
  - **See logs** - Logs related to the selected workflow job. You can kill the job by clicking the Kill button.
- **Coordinator jobs** - This tab contains configuration information and manages workflow jobs. Click **See details** to the right of a job for additional information (example shown below).
  - **Details** - Additional information about the coordinator job.
  - **Started workflow jobs** - A list of workflow jobs spawned by the coordinator job. Each workflow job on the list has a **See details** link, which redirects you to the selected workflow job's details.