• Home
  • Search Tags
  • About

oozie

Topics related to oozie:

Getting started with oozie

Oozie is an Apache open source project, originally developed at Yahoo. Oozie is a general purpose scheduling system for multistage Hadoop jobs.

  • Oozie allow to form a logical grouping of relevant Hadoop jobs into an entity called Workflow. The Oozie workflows are DAG (Directed cyclic graph) of actions.
  • Oozie provides a way to schedule Time or Data dependent Workflow using an entity called Coordinator.
  • Further you can combine the related Coordinators into an entity called Bundle and can be scheduled on a Oozie server for execution.

Oozie support most of the Hadoop Jobs as Oozie Action Nodes like: MapRedude, Java, FileSystem (HDFS operations), Hive, Hive2, Pig, Spark, SSH, Shell, DistCp and Sqoop. It provides a decision capability using a Decision Control Node action and Parallel execution of the jobs using Fork-Join Control Node. It allow users to configure email option for Success/Failure notification of the Workflow using Email action.

Oozie 101

Oozie data triggered coordinator

    <done-flag>_SUCCESS</done_flag> 

The above snippet in coordinator.xml for input dataset signals the presence of input data. That means coordinator action will be in WAITING state till _SUCCESS file is present in the given input directory. Once it is present, workflow will start execution.

Content on the page is taken from Stack Overflow Documentation

This site is NOT affiliated with Stack Overflow or any of the contributors. | Privacy Policy | Terms of Service