Oozie data-triggered coordinator

Remarks:

    <done-flag>_SUCCESS</done-flag>

The snippet above, placed in an input dataset definition in coordinator.xml, names the file that signals the input data is complete. The coordinator action stays in the WAITING state until a _SUCCESS file is present in the input directory. Once the file is present, the workflow starts executing.
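
For illustration, an upstream job can signal that a day's data is complete by creating the flag file as its last step. The sketch below assumes the hadoop CLI is on the PATH; the helper names done_flag_path and mark_ready, and the example directory in the usage note, are hypothetical.

```shell
#!/bin/bash
# Build the path of the done flag for one dataset instance directory.
# Hypothetical helper; the flag name defaults to _SUCCESS.
done_flag_path() {
    local instance_dir="${1%/}"   # drop a trailing slash, if any
    local flag="${2:-_SUCCESS}"
    echo "${instance_dir}/${flag}"
}

# Create an empty flag file so the waiting coordinator action can proceed.
# Call this only after all input files have been written.
mark_ready() {
    hadoop fs -touchz "$(done_flag_path "$1")"
}
```

Usage, once the day's files are in place: mark_ready /mypath/coord_job_example/input/20170429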

Oozie coordinator sample

The coordinator job below triggers one coordinator action per day, and each action runs a workflow once its input data is available. The workflow runs a shell script that copies the input directory to the output directory.

<coordinator-app name="log_process_coordinator" frequency="${coord:days(1)}" start="2017-04-29T06:00Z" end="2018-04-29T23:25Z" timezone="UTC"
             xmlns="uri:oozie:coordinator:0.2">
<datasets>
    <dataset name="input_dataset" frequency="${coord:days(1)}" initial-instance="2017-04-29T06:00Z" timezone="GMT">
        <uri-template>${nameNode}/mypath/coord_job_example/input/${YEAR}${MONTH}${DAY}</uri-template>
        <done-flag>_SUCCESS</done-flag>
    </dataset>
    <dataset name="output_dataset" frequency="${coord:days(1)}" initial-instance="2017-04-29T06:00Z" timezone="GMT">
        <uri-template>${nameNode}/mypath/coord_job_example/output/${YEAR}${MONTH}${DAY}</uri-template>
        <done-flag>_SUCCESS</done-flag>
    </dataset>
</datasets>
<input-events>
    <data-in name="input_event" dataset="input_dataset">
        <instance>${coord:current(0)}</instance>
    </data-in>
</input-events>
<output-events>
    <data-out name="output_event" dataset="output_dataset">
        <instance>${coord:current(0)}</instance>
    </data-out>
</output-events>
<action>
    <workflow>
        <app-path>${workflowAppUri}</app-path>
        <configuration>
            <property>
                <name>jobTracker</name>
                <value>${jobTracker}</value>
            </property>
            <property>
                <name>nameNode</name>
                <value>${nameNode}</value>
            </property>
            <property>
                <name>pool.name</name>
                <value>${poolName}</value>
            </property>
            <property>
                <name>inputDir</name>
                <value>${coord:dataIn('input_event')}</value>
            </property>
            <property>
                <name>outputDir</name>
                <value>${coord:dataOut('output_event')}</value>
            </property>
        </configuration>
    </workflow>
</action>

</coordinator-app>
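
To make the dataset resolution concrete: in this sample the coordinator and both datasets share the same start time and daily frequency, so each action's nominal time coincides with a dataset instance, and ${coord:current(0)} refers to that instance. The sketch below mimics in bash how the date variables in the uri-template would be filled in, assuming timestamps are handled in UTC. It is not Oozie code; the function name resolve_instance_dir and the namenode:8020 address are hypothetical.

```shell
#!/bin/bash
# Mimic how the YEAR, MONTH and DAY template variables are filled in
# for one dataset instance.
# Arguments: base URI, instance time in the form YYYY-MM-DDTHH:MMZ (UTC).
resolve_instance_dir() {
    local base="${1%/}"
    local ts="$2"
    local year="${ts:0:4}" month="${ts:5:2}" day="${ts:8:2}"
    echo "${base}/${year}${month}${day}"
}

# First action of the sample coordinator (nominal time = start time).
resolve_instance_dir "hdfs://namenode:8020/mypath/coord_job_example/input" "2017-04-29T06:00Z"
# prints hdfs://namenode:8020/mypath/coord_job_example/input/20170429
```

The resolved directory is the value that coord:dataIn('input_event') passes to the workflow as inputDir.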

Oozie workflow sample

<workflow-app xmlns="uri:oozie:workflow:0.4" name="shell-wf">
    <start to="shell-node"/>
    <action name="shell-node">
        <shell xmlns="uri:oozie:shell-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${poolName}</value>
                </property>
            </configuration>
            <exec>${myscript}</exec>
            <argument>${inputDir}</argument>
            <argument>${outputDir}</argument>
            <file>${myscriptPath}</file>
            <capture-output/>
        </shell>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>

job.properties sample

nameNode=hdfs://namenode:port
start=2016-04-12T06:00Z
end=2017-02-26T23:25Z
jobTracker=yourjobtracker
poolName=yourpool
oozie.coord.application.path=${nameNode}/hdfs_path/coord_job_example/coord
workflowAppUri=${oozie.coord.application.path}
myscript=myscript.sh
myscriptPath=${oozie.coord.application.path}/myscript.sh

Shell script sample

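#!/bin/bash
# Arguments passed by the workflow: input and output directories.
inputDir="${1}"
outputDir="${2}"
# Create the output directory, then copy the input files into it.
# The glob is quoted so that hadoop, not the local shell, expands it.
hadoop fs -mkdir -p "${outputDir}"
hadoop fs -cp "${inputDir}/*" "${outputDir}/"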

Submitting the coordinator job

Copy the script, coordinator.xml, and workflow.xml to HDFS. coordinator.xml must be in the directory specified by oozie.coord.application.path in job.properties, and workflow.xml must be in the directory specified by workflowAppUri (in this example both point to the same directory). job.properties is read from the local filesystem. Once everything is in place, run the following command from a shell:

oozie job -oozie <oozie_url>/oozie/ -config job.properties -run
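
The upload step described above could be scripted roughly as follows. This is a sketch under assumptions: hadoop is on the PATH, the three files sit in the current directory, and the helper names (run, deploy_app, parse_job_id) and the DRY_RUN switch are hypothetical. The last helper relies on the Oozie CLI printing a line of the form "job: <id>" when a job is started.

```shell
#!/bin/bash
# Print a command, then execute it unless DRY_RUN=1 (hypothetical switch).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

# Upload the application files to the HDFS directory that
# oozie.coord.application.path and workflowAppUri point to.
deploy_app() {
    local app_dir="$1"
    run hadoop fs -mkdir -p "$app_dir" || return 1
    run hadoop fs -put -f coordinator.xml workflow.xml myscript.sh "$app_dir/"
}

# Extract the job ID from the "job: <id>" line printed by the Oozie CLI.
parse_job_id() {
    echo "${1#job: }"
}
```

For the sample job.properties, the upload would be deploy_app /hdfs_path/coord_job_example/coord. After the job is started, the extracted ID can be passed to oozie job -oozie <oozie_url>/oozie/ -info <job_id> to check the coordinator's status.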
