Job scheduler

jakubzembik edited this page Apr 4, 2016 · 31 revisions

Job scheduling service

Import data from a SQL database

On the Job Scheduler page, select the Import data page to schedule an import.

  • Name - the name of your job

  • JDBC Uri - enter this value directly, or fill in the fields above. Required format of the JDBC URI:
jdbc:driver://host:port/database_name

  • Username and Password - the credentials for the database connection

  • Table - the name of the database table to import

  • Destination dir - the directory into which the data will be imported
  • Choose import mode - there are 3 possible modes: Append, Overwrite and Incremental
    • Append - each import fetches the whole table into a separate file. Results of previous imports are not overwritten. Files on HDFS are named part-m-00000, part-m-00001, and so on.
    • Overwrite - each import fetches the whole table and overwrites the results of the previous import.
    • Incremental - the import fetches records starting from Value in the specified Column name. Each subsequent import fetches records whose value is higher than the last value fetched from that column. We recommend using an auto-incremented column.
      • Column name - the database column whose values are checked
      • Value - the value from which you want to start importing
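
The Incremental mode described above can be illustrated with a small in-memory sketch. This is not the scheduler's implementation: the function name incremental_fetch and the sample table are hypothetical, and the sketch assumes every run fetches rows strictly greater than the stored value (the page does not say whether the first run includes Value itself).

```python
def incremental_fetch(rows, column, last_value):
    """Return the rows whose `column` is greater than `last_value`,
    together with the new last fetched value (hypothetical helper)."""
    fetched = [row for row in rows if row[column] > last_value]
    # If nothing new was fetched, keep the previous value unchanged.
    new_last = max((row[column] for row in fetched), default=last_value)
    return fetched, new_last


# Hypothetical table with an auto-incremented "id" column.
table = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]

# First import: Column name = "id", Value = 1.
first, last = incremental_fetch(table, "id", 1)

# A new record arrives before the next scheduled import.
table.append({"id": 4, "name": "d"})

# Next import continues from the last fetched value.
second, last = incremental_fetch(table, "id", last)
```

An auto-incremented column works well here because new records always receive a value higher than anything already fetched, so no record is skipped or imported twice.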

  • Start time - the start time of your job
  • End time - the end time of your job
    • End time must always be after Start time
    • Be aware that if you enter a Start time in the past, Oozie will try to catch up on the job executions it missed.
  • Frequency - how often your job will be submitted
  • Timezone - the ID of the time zone in which you entered the start and end times
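
To get a feel for the catch-up behaviour mentioned above, the following sketch lists the executions that would already be due for a given Start time, End time and Frequency. It is only an estimate under stated assumptions, not Oozie's actual algorithm: the function name due_runs is hypothetical, one run is assumed per frequency step from Start time up to (but not including) End time, and naive datetimes are used, so all values are taken to be in the same time zone.

```python
from datetime import datetime, timedelta


def due_runs(start, end, frequency, now):
    """List the nominal run times that lie before both `now` and `end`
    (hypothetical helper, assuming one run per frequency step)."""
    if frequency <= timedelta(0):
        raise ValueError("frequency must be positive")
    horizon = min(now, end)
    runs = []
    current = start
    while current < horizon:
        runs.append(current)
        current += frequency
    return runs


# Hypothetical job: daily import whose Start time is a few days in the past.
start = datetime(2016, 4, 1, 0, 0)
end = datetime(2016, 4, 10, 0, 0)
now = datetime(2016, 4, 4, 12, 0)

backlog = due_runs(start, end, timedelta(days=1), now)
print(len(backlog), "runs would be due at submission time")
```

With a short Frequency and a Start time far in the past, this backlog can grow large, which is worth checking before submitting the job.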

Job browser

On the Job browser page you can see submitted jobs. It contains two pages: Workflow jobs and Coordinator jobs.

  • Coordinator jobs - this page lists coordinator jobs. Coordinator jobs hold the configuration and spawn workflow jobs. Click See details to get additional information.
    • Details - additional information about the coordinator job

  • Started workflow jobs - the list of workflow jobs spawned by the coordinator job. Each workflow job in the list has a See details field, which redirects you to the details of the selected workflow job.

  • Workflow jobs - this page lists workflow jobs. A workflow job represents an import from a database to HDFS. Click See details to get additional information.
    • Details - additional information about the workflow job

  • See logs - shows the logs related to the workflow job