Oozie is an Apache open source project, originally developed at Yahoo. Oozie is a general purpose scheduling system for multistage Hadoop jobs.
Oozie jobs come in three types: Workflow, Coordinator, and Bundle. An Oozie workflow is a DAG (directed acyclic graph) of actions. All three job types can be scheduled on an Oozie server for execution.
Oozie supports most Hadoop job types as action nodes, including MapReduce, Java, FileSystem (HDFS operations), Hive, Hive2, Pig, Spark, SSH, Shell, DistCp, and Sqoop. It provides decision capability through the Decision control node and parallel execution of jobs through the Fork and Join control nodes. It also lets users configure email notification of workflow success or failure using the Email action.
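As a rough illustration of how these node types fit together, the following is a minimal workflow.xml sketch. The node names, the parameters (inputDir, outputDir, notifyAddress, jobTracker, nameNode), and the schema versions are assumptions chosen for the example, not taken from any particular deployment. The Email action also assumes that SMTP settings are configured on the Oozie server.

```xml
<!-- Hypothetical workflow.xml: all names, paths and parameters are placeholders -->
<workflow-app name="example-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="check-input"/>

    <!-- Decision control node: continue only if the input directory exists -->
    <decision name="check-input">
        <switch>
            <case to="run-in-parallel">${fs:exists(inputDir)}</case>
            <default to="notify-failure"/>
        </switch>
    </decision>

    <!-- Fork control node: start both actions concurrently -->
    <fork name="run-in-parallel">
        <path start="make-output-dir"/>
        <path start="say-hello"/>
    </fork>

    <!-- FileSystem action; outputDir is assumed to be a full HDFS URI -->
    <action name="make-output-dir">
        <fs>
            <mkdir path="${outputDir}"/>
        </fs>
        <ok to="wait-for-both"/>
        <!-- Forked actions go straight to the kill node on error to keep the sketch small -->
        <error to="fail"/>
    </action>

    <!-- Shell action -->
    <action name="say-hello">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <exec>echo</exec>
            <argument>hello</argument>
        </shell>
        <ok to="wait-for-both"/>
        <error to="fail"/>
    </action>

    <!-- Join control node: proceed once every forked path has finished -->
    <join name="wait-for-both" to="notify-success"/>

    <!-- Email action on success -->
    <action name="notify-success">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${notifyAddress}</to>
            <subject>Workflow ${wf:id()} succeeded</subject>
            <body>All actions completed.</body>
        </email>
        <ok to="end"/>
        <error to="end"/>
    </action>

    <!-- Email action on failure (here: the input directory was missing) -->
    <action name="notify-failure">
        <email xmlns="uri:oozie:email-action:0.1">
            <to>${notifyAddress}</to>
            <subject>Workflow ${wf:id()} failed</subject>
            <body>Input directory ${inputDir} was not found.</body>
        </email>
        <ok to="fail"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Workflow failed at node ${wf:lastErrorNode()}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Every path started by the fork ends at the same join, so the success email is sent only after both actions have completed.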
<done-flag>_SUCCESS</done-flag>
The above snippet, placed in the input dataset definition in coordinator.xml, names the file whose presence signals that the input data is ready. The coordinator action stays in the WAITING state until a _SUCCESS file is present in the given input directory. Once the file is present, the workflow starts executing.
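To show where the done-flag element sits, here is a minimal coordinator.xml sketch. The application name, dataset name, dates, directory layout, schema version, and the workflowAppUri and nameNode parameters are all assumptions made for the example.

```xml
<!-- Hypothetical coordinator.xml: names, dates and paths are placeholders -->
<coordinator-app name="example-coord" frequency="${coord:days(1)}"
                 start="2024-01-01T00:00Z" end="2024-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <datasets>
        <!-- One dataset instance per day, resolved from the URI template -->
        <dataset name="input-logs" frequency="${coord:days(1)}"
                 initial-instance="2024-01-01T00:00Z" timezone="UTC">
            <uri-template>${nameNode}/data/logs/${YEAR}/${MONTH}/${DAY}</uri-template>
            <!-- The instance counts as available once this file exists in its directory -->
            <done-flag>_SUCCESS</done-flag>
        </dataset>
    </datasets>

    <!-- The coordinator action waits until the current day's instance is available -->
    <input-events>
        <data-in name="input" dataset="input-logs">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>

    <action>
        <workflow>
            <app-path>${workflowAppUri}</app-path>
            <configuration>
                <!-- Pass the resolved input directory to the workflow -->
                <property>
                    <name>inputDir</name>
                    <value>${coord:dataIn('input')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
```

If the done-flag element is omitted, Oozie defaults to looking for a _SUCCESS file. If the element is present but empty, the existence of the directory itself is treated as the signal.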