'count'=>infinity
and element restart
in your plan.Childspec = #{id => foo ,start => {bar, baz, [arg1, arg2]} ,plan => [restart] ,count => infinity}.
If your process did not start after crash, director will lock and retries to restart your process infinity
times !
If you are using infinity
for 'count'
, always use {restart, MiliSeconds}
in 'plan'
instead of restart
.
Childspec1 = #{id => foo ,start => {bar, baz} ,plan => [restart,restart,delete,wait,wait, {restart, 4000}] ,count => infinity}. Childspec2 = #{id => foo ,start => {bar, baz} ,plan => [restart,restart,stop,wait, {restart, 20000}, restart] ,count => infinity}. Childspec3 = #{id => foo ,start => {bar, baz} ,plan => [restart,restart,stop,wait, {restart, 20000}, restart] ,count => 0}. Childspec4 = #{id => foo ,start => {bar, baz} ,plan => [] ,count => infinity}.
The rest of delete
element in Childspec1
and the rest of stop
element in Childspec2
will never evaluate!
In Childspec3
you want to run your plan 0 times!
In ChildSpec4
you have not any plan to run infinity
times!
release_handler
, release_handler
calls supervisor:get_callback_module/1
for fetching its callback module.get_callback_module/1
uses supervisor internal state record for giving its callback module. Our director does not know about supervisor internal state record, then supervisor:get_callback_module/1
does not work with directors.supervisor:get_callback_module/1
works perfectly with directors :).1> foo:start_link(). {ok,<0.105.0>} 2> supervisor:get_callback_module(foo_sup). foo 3>
Pouriya@Jahanbakhsh ~ $ git clone https://github.com/Pouriya-Jahanbakhsh/director.git
Note that OTP>=19 required (if you want to upgrade it using release_handler
).
Go to director
and use rebar
or rebar3
.
Pouriya@Jahanbakhsh ~ $ cd director
rebar
Pouriya@Jahanbakhsh ~/director $ rebar compile ==> director_test (compile) Compiled src/director.erl Pouriya@Jahanbakhsh ~/director $
rebar3
Pouriya@Jahanbakhsh ~/director $ rebar3 compile ===> Verifying dependencies... ===> Compiling director Pouriya@Jahanbakhsh ~/director $
director needs a callback module (like OTP supervisor).
In callback module you should export function init/1
.
What init/1
should return? wait, i'll explain step by step.
-module(foo). -export([init/1]). init(_InitArg) -> {ok, []}.
Save above code in foo.erl
in director directory and go to the Erlang shell.
Use erl -pa ./ebin
if you used rebar
to compile it and use rebar3 shell
if you used rebar3
.
Erlang/OTP 19 [erts-8.3] [source-d5c06c6] [64-bit] [smp:8:8] [async-threads:0] [hipe] [kernel-poll:false] Eshell V8.3 (abort with ^G) 1> c(foo). {ok,foo} 2> Mod = foo. foo 3> InitArg = undefined. %% i don't need it yet. undefined 4> {ok, Pid} = director:start_link(Mod, InitArg). {ok,<0.112.0>} 5>
Now we have a supervisor without children.
Good news is that director comes with full OTP/supervisor API and it has its advanced features and specific approach too.
5> director:which_children(Pid). %% You can use supervisor:which_children(Pid) too :) [] 6> director:count_children(Pid). %% You can use supervisor:count_children(Pid) too :) [{specs,0},{active,0},{supervisors,0},{workers,0}] 7> director:get_pids(Pid). %% You can NOT use supervisor:get_pids(Pid) because it hasn't :D []
OK, I'll make simple gen_server
and give it to our director.
-module(bar). -behaviour(gen_server). -export([start_link/0 ,init/1 ,terminate/2]). %% i am not going to use handle_call, handle_cast ,etc. start_link() -> gen_server:start_link(?MODULE, null, []). init(_GenServerInitArg) -> {ok, state}. terminate(_Reason, _State) -> ok.
Save above code in bar.erl
and got back to the Shell.
8> c(bar). bar.erl:2: Warning: undefined callback function code_change/3 (behaviour 'gen_server') bar.erl:2: Warning: undefined callback function handle_call/3 (behaviour 'gen_server') bar.erl:2: Warning: undefined callback function handle_cast/2 (behaviour 'gen_server') bar.erl:2: Warning: undefined callback function handle_info/2 (behaviour 'gen_server') {ok,bar} %% You should define unique id for your process. 9> Id = bar_id. bar_id %% You should tell diector about start module and function for your process. %% Should be tuple {Module, Function, Args}. %% If your start function doesn't need arguments (like our example) %% just use {Module, function}. 10> start = {bar, start_link}. {bar,start_link} %% What is your plan for your process? %% I asked you some questions at the first of this README file. %% Plan should be an empty list or list with n elemenst. %% Every element can be one of %% 'restart' %% 'delete' %% 'stop' %% {'stop', Reason::term()} %% {'restart', Time::pos_integer()} %% for example my plan is: %% [restart, {restart, 5000}, delete] %% In first crash director will restart my process, %% after next crash director will restart it after 5000 mili-seconds %% and after third crash director will not restart it and will delete it 11> Plan = [restart, {restart, 5000}, delete]. [restart,{restart,5000},delete] %% What if i want to restart my process 500 times? %% Do i need a list with 500 'restart's? %% No, you just need a list with one element, I'll explain it later. 12> Childspec = #{id => Id ,start => Start ,plan => Plan}. #{id => bar_id, plan => [restart,{restart,5000},delete], start => {bar,start_link}} 13> director:start_child(Pid, Childspec). %% You can use supervisor:start_child(Pid, ChildSpec) too :) {ok,<0.160.0>} 14>
Lets check it
14> director:which_children(Pid). [{bar_id,<0.160.0>,worker,[bar]}] 15> director:count_children(Pid). [{specs,1},{active,1},{supervisors,0},{workers,1}] %% What was get_pids/1? %% It will returns all RUNNING ids with their pids. 16> director:get_pids(Pid). [{bar_id,<0.160.0>}] %% We can get Pid for specific RUNNING id too 17> {ok, BarPid1} = director:get_pid(Pid, bar_id). {ok,<0.160.0>} %% I want to kill that process 18> erlang:exit(BarPid1, kill). true %% Check all running pids again 19> director:get_pids(Pid). [{bar_id,<0.174.0>}] %% changed (restarted) %% I want to kill that process again %% and i will check children before spending time 20> {ok, BarPid2} = director:get_pid(Pid, bar_id), erlang:exit(BarPid2, kill). true 21> director:get_pids(Pid). [] 22> director:which_children(Pid). [{bar_id,restarting,worker,[bar]}] %% restarting 23> director:get_pid(Pid, bare_id). {error,not_found} %% after 5000 ms 24> director:get_pids(Pid). [{bar_id,<0.181.0>}] 25> %% Yoooohoooooo
I mentioned advanced features, what are they?
Lets see other acceptable keys for Childspec
map.
-type childspec() :: #{'id' => id() ,'start' => start() ,'plan' => plan() ,'count' => count() ,'terminate_timeout' => terminate_timeout() ,'type' => type() ,'modules' => modules() ,'append' => append()}. %% 'id' is mandatory and can be any Erlang term -type id() :: term(). %% Sometimes 'start' is optional ! just wait and read carefully -type start() :: {module(), function()} % default Args is [] | mfa(). %% I explained 'restart', 'delete' and {'restart', MiliSeconds} %% 'stop': director will crash with reason {stop, [info about process crash]}. %% {'stop', Reason}: director exactly will crash with reason Reason. %% 'wait': director will not restart process, %% but you can restart it using director:restart_child/2 and you can use supervisor:restart_child/2 too. %% fun/2: director will execute fun with 2 arguments. %% First argument is crash reason for process and second argument is restart count for process. %% Fun should return terms like other plan elements. %% Default plan is: %% [fun %% (normal, _RestartCount) -> %% delete; %% (shutdown, _RestartCount) -> %% delete; %% ({shutdown, _Reason}, _RestartCount) -> %% delete; %% (_Reason, _RestartCount) -> %% restart %% end] -type plan() :: [plan_element()] | []. -type plan_element() :: 'restart' | {'restart', pos_integer()} | 'wait' | 'stop' | {'stop', Reason::term()} | fun((Reason::term() ,RestartCount::pos_integer()) -> 'restart' | {'restart', pos_integer()} | 'wait' | 'stop' | {'stop', Reason::term()}). %% How much time you want to run plan? %% Default value of 'count' is 1. %% Again, What if i want to restart my process 500 times? %% Do i need a list with 500 'restart's? %% You just need plan ['restart'] and 'count' 500 :) -type count() :: 'infinity' | non_neg_integer(). %% How much time director should wait for process termination? %% 0 means brutal kill and director will kill your process using erlang:exit(YourProcess, kill). %% For workers default value is 1000 mili-seconds and for supervisors default value is 'infinity'. -type terminate_timeout() :: 'infinity' | non_neg_integer(). %% default is 'worker' -type type() :: 'worker' | 'supervisor'. %% Default is first element of 'start' (process start module) -type modules() :: [module()] | 'dynamic'. %% :) %% Default value is 'false' %% I'll explan it -type append() :: boolean().
Edit foo
module:
-module(foo). -export([start_link/0 ,init/1]). start_link() -> director:start_link({local, foo_sup}, ?MODULE, null). init(_InitArg) -> Childspec = #{id => bar_id ,plan => [wait] ,start => {bar,start_link} ,count => 1 ,terminate_timeout => 2000}, {ok, [Childspec]}.
Go to the Erlang shell again:
1> c(foo). {ok,foo} 2> foo:start_link(). {ok,<0.121.0>} 3> director:get_childspec(foo_sup, bar_id). {ok,#{append => false,count => 1,id => bar_id, modules => [bar], plan => [wait], start => {bar,start_link,[]}, terminate_timeout => 2000,type => worker}} 4> {ok, Pid} = director:get_pid(foo_sup, bar_id), erlang:exit(Pid, kill). true 5> director:which_children(foo_sup). [{bar_id,undefined,worker,[bar]}] %% undefined 6> director:count_children(foo_sup). [{specs,1},{active,0},{supervisors,0},{workers,1}] 7> director:get_plan(foo_sup, bar_id). {ok,[wait]} %% I can change process plan %% I killed process one time. %% If i kill it again, entire supervisor will crash with reason {reached_max_restart_plan... because 'count' is 1 %% But after changing plan, its counter will restart from 0. 8> director:change_plan(foo_sup, bar_id, [restart]). ok 9> director:get_childspec(foo_sup, bar_id). {ok,#{append => false,count => 1,id => bar_id, modules => [bar], plan => [restart], %% here start => {bar,start_link,[]}, terminate_timeout => 2000,type => worker}} 10> director:get_pids(foo_sup). [] 11> director:restart_child(foo_sup, bar_id). {ok,<0.111.0>} 12> {ok, Pid2} = director:get_pid(foo_sup, bar_id), erlang:exit(Pid2, kill). true 13> director:get_pid(foo_sup, bar_id). {ok,<0.113.0>} 14> %% Hold onFinally what the
append
key is?
actually always we have one DefaultChildspec
.
14> director:get_default_childspec(foo_sup). {ok,#{count => 0,modules => [],plan => [],terminate_timeout => 0}} 15>
DefaultChildspec
is like normal childspecs except that it can't accept id
and append
keys.
If i change append
value to true
in my Childspec
:
My terminate_timeout
will be added to terminate_timeout
of DefaultChildspec
.
My count
will be added to count
of DefaultChildspec
.
My modules
will be added to modules
of DefaultChildspec
.
My plan
will be added to plan
of DefaultChildspec
.
And if i have start
key with value {ModX, FuncX, ArgsX}
in DefaultChildspec
and start
key with value {ModY, FunY, ArgsY}
in Childspec
, final value will be {ModY, FuncY, ArgsX ++ ArgsY}
.
And finally if i have start
key with value {Mod, Func, Args}
in DefaultChildspec
, start
key in Childspec
is optional for me.
You can return your own DefaultChildspec
as third element of tuple in init/1
.
Edit foo.erl
:
-module(foo). -behaviour(director). %% Yes, this is a behaviour -export([start_link/0 ,init/1]). start_link() -> director:start_link({local, foo_sup}, ?MODULE, null). init(_InitArg) -> Childspec = #{id => bar_id ,plan => [wait] ,start => {bar,start_link} ,count => 1 ,terminate_timeout => 2000}, DefaultChildspec = #{start => {bar, start_link} ,terminate_timeout => 1000 ,plan => [restart] ,count => 5}, {ok, [Childspec], DefaultChildspec}.
Restart the shell:
1> c(foo). {ok,foo} 2> foo:start_link(). {ok,<0.111.0>} 3> director:get_pids(foo_sup). [{bar_id,<0.112.0>}] 4> director:get_default_childspec(foo_sup). {ok,#{count => 5, plan => [restart], start => {bar,start_link,[]}, terminate_timeout => 1000}} 5> Childspec1 = #{id => 1, append => true}, %% Default 'plan' is [Fun], so 'plan' will be [restart] ++ [Fun] or [restart, Fun]. %% Default 'count' is 1, so 'count' will be 1 + 5 or 6. %% Args in above Childspec is [], so Args will be [] ++ [] or []. %% Default 'terminate_timeout' is 1000, so 'terminate_timeout' will be 1000 + 1000 or 2000. %% Default 'modules' is [bar], so 'modules' will be [bar] ++ [] or [bar]. 5> director:start_child(foo_sup, Childspec1). {ok,<0.116.0>} %% Test 6> director:get_childspec(foo_sup, 1). {ok,#{append => true, count => 6, id => 1, modules => [bar], plan => [restart,#Fun<director.default_plan_element_fun.2>], start => {bar,start_link,[]}, terminate_timeout => 2000, type => worker}} 7> director:get_pids(foo_sup). [{bar_id,<0.112.0>},{1,<0.116.0>}] %% I want to have 9 more children like that 8> [director:start_child(foo_sup ,#{id => Count, append => true}) || Count <- lists:seq(2, 10)]. [{ok,<0.126.0>}, {ok,<0.127.0>}, {ok,<0.128.0>}, {ok,<0.129.0>}, {ok,<0.130.0>}, {ok,<0.131.0>}, {ok,<0.132.0>}, {ok,<0.133.0>}, {ok,<0.134.0>}] 10> director:count_children(foo_sup). [{specs,11},{active,11},{supervisors,0},{workers,11}] 11>
You can change defaultChildspec
dynamically using change_default_childspec/2
!
And you can change Childspec
of children dynamically too and set their append
to true
!
But with changing them in different parts of code, you will make spaghetti code
Yessssss, diorector has its own debug and accepts standard sys:dbg_opt/0
.
director sends valid logs to sasl
and error_logger
in different states too.
1> Name = {local, dname}, Mod = foo, InitArg = undefined, DbgOpts = [trace], Opts = [{debug, DbgOpts}]. [{debug,[trace]}] 2> director:start_link(Name, Mod, InitArg, Opts). {ok,<0.106.0>} 3> 3> director:count_children(dname). *DBG* director "dname" got request "count_children" from "<0.102.0>" *DBG* director "dname" sent "[{specs,1}, {active,1}, {supervisors,0}, {workers,1}]" to "<0.102.0>" [{specs,1},{active,1},{supervisors,0},{workers,1}] 4> director:change_plan(dname, bar_id, [{restart, 5000}]). *DBG* director "dname" got request "{change_plan,bar_id,[{restart,5000}]}" from "<0.102.0>" *DBG* director "dname" sent "ok" to "<0.102.0>" ok 5> {ok, Pid} = director:get_pid(dname, bar_id). *DBG* director "dname" got request "{get_pid,bar_id}" from "<0.102.0>" *DBG* director "dname" sent "{ok,<0.107.0>}" to "<0.102.0>" {ok,<0.107.0>} %% Start SASL 6> application:start(sasl). ok ... %% Log about starting SASL 7> erlang:exit(Pid, kill). *DBG* director "dname" got exit signal for pid "<0.107.0>" with reason "killed" true =SUPERVISOR REPORT==== 4-May-2017::12:37:41 === Supervisor: dname Context: child_terminated Reason: killed Offender: [{id,bar_id}, {pid,<0.107.0>}, {plan,[{restart,5000}]}, {count,1}, {count2,0}, {restart_count,0}, {mfargs,{bar,start_link,[]}}, {plan_element_index,1}, {plan_length,1}, {timer_reference,undefined}, {terminate_timeout,2000}, {extra,undefined}, {modules,[bar]}, {type,worker}, {append,false}] 8> %% After 5000 mili-seconds *DBG* director "dname" got timer event for child-id "bar_id" with timer reference "#Ref<0.0.1.176>" =PROGRESS REPORT==== 4-May-2017::12:37:46 === supervisor: dname started: [{id,bar_id}, {pid,<0.122.0>}, {plan,[{restart,5000}]}, {count,1}, {count2,1}, {restart_count,1}, {mfargs,{bar,start_link,[]}}, {plan_element_index,1}, {plan_length,1}, {timer_reference,#Ref<0.0.1.176>}, {terminate_timeout,2000}, {extra,undefined}, {modules,[bar]}, {type,worker}, {append,false}] 8>
rebar:
Pouriya@Jahanbakhsh ~/director $ rebar doc
rebar3:
Pouriya@Jahanbakhsh ~/director $ rebar3 edoc
erl
Pouriya@Jahanbakhsh ~/director $ mkdir -p doc && erl -noshell\ -eval "edoc:file(\"./src/director.erl\", [{dir, \"./doc\"}]),init:stop()."
After running one of the above commands, HTML documentation should be in doc
directory.