Batch normalization


Prototxt for training

The following is an example definition for training a BatchNorm layer with channel-wise scale and bias. Typically a BatchNorm layer is inserted between convolution and rectification layers. In this example, the convolution layer produces the blob layerx and the rectification layer receives the blob layerx-bn (a sketch of the full layer stack follows the definitions below).

layer { bottom: 'layerx' top: 'layerx-bn' name: 'layerx-bn' type: 'BatchNorm'
  batch_norm_param {
    use_global_stats: false  # calculate the mean and variance for each mini-batch
    moving_average_fraction: .999  # decay of the running averages; doesn't affect training
  }
  param { lr_mult: 0 }  # mean blob: must stay frozen
  param { lr_mult: 0 }  # variance blob: must stay frozen
  param { lr_mult: 0 }} # moving-average scale factor: must stay frozen
# channel-wise scale and bias are separate
layer { bottom: 'layerx-bn' top: 'layerx-bn' name: 'layerx-bn-scale' type: 'Scale'
  scale_param { 
    bias_term: true
    axis: 1      # scale separately for each channel
    num_axes: 1  # ... but not spatially (default)
    filler { type: 'constant' value: 1 }           # initialize scaling to 1
    bias_filler { type: 'constant' value: 0.001 }  # initialize bias
}}
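Note that Caffe's BatchNorm layer only performs the normalization; the learned per-channel scale and bias come from the separate Scale layer above. As a rough sketch of where this stack sits in a network, the surrounding layers might look like the following (the layer names and the Convolution/ReLU hyperparameters here are illustrative assumptions, not part of the original example):

# Hypothetical surrounding layers: Convolution -> BatchNorm -> Scale -> ReLU.
layer { bottom: 'data' top: 'layerx' name: 'layerx' type: 'Convolution'
  convolution_param { num_output: 64 kernel_size: 3 pad: 1 } }
# ... the 'layerx-bn' BatchNorm and 'layerx-bn-scale' Scale layers defined above go here ...
layer { bottom: 'layerx-bn' top: 'layerx-bn' name: 'layerx-bn-relu' type: 'ReLU' }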

More information can be found in this thread.

Prototxt for deployment

The main change needed is to switch use_global_stats to true. This makes the layer use the accumulated moving-average mean and variance instead of the statistics of the current mini-batch.

layer { bottom: 'layerx' top: 'layerx-bn' name: 'layerx-bn' type: 'BatchNorm'
  batch_norm_param {
    use_global_stats: true  # use pre-calculated average and variance
  }
  param { lr_mult: 0 }  # mean blob: must stay frozen
  param { lr_mult: 0 }  # variance blob: must stay frozen
  param { lr_mult: 0 }} # moving-average scale factor: must stay frozen
# channel-wise scale and bias are separate
layer { bottom: 'layerx-bn' top: 'layerx-bn' name: 'layerx-bn-scale' type: 'Scale'
  scale_param { 
    bias_term: true
    axis: 1      # scale separately for each channel
    num_axes: 1  # ... but not spatially (default)
}}
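If you prefer to maintain a single prototxt rather than separate training and deployment files, one possibility is to define the BatchNorm layer twice with phase-specific include rules, so Caffe selects the correct variant automatically (a sketch, assuming the standard include { phase: ... } mechanism; only the BatchNorm layers are shown, and weights transfer between phases because the layer name is shared):

layer { bottom: 'layerx' top: 'layerx-bn' name: 'layerx-bn' type: 'BatchNorm'
  include { phase: TRAIN }
  batch_norm_param { use_global_stats: false }
  param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 }}
layer { bottom: 'layerx' top: 'layerx-bn' name: 'layerx-bn' type: 'BatchNorm'
  include { phase: TEST }
  batch_norm_param { use_global_stats: true }
  param { lr_mult: 0 } param { lr_mult: 0 } param { lr_mult: 0 }}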

Parameters:

Parameter: use_global_stats
Details: From rohrbach's post of 2nd March 2016: "By default, during training time, the network is computing global mean/variance statistics via a running average, which is then used at test time to allow deterministic outputs for each input. You can manually toggle whether the network is accumulating or using the statistics via the use_global_stats option. IMPORTANT: for this feature to work, you MUST set the learning rate to zero for all three parameter blobs, i.e., param { lr_mult: 0 } three times in the layer definition. This means by default (as the following is set in batch_norm_layer.cpp), you don't have to set use_global_stats at all in the prototxt: use_global_stats_ = this->phase_ == TEST;"
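
Given that default, a minimal BatchNorm definition that behaves correctly in both phases can omit batch_norm_param entirely (a sketch based on the default quoted above; the three frozen param blobs are still required):

layer { bottom: 'layerx' top: 'layerx-bn' name: 'layerx-bn' type: 'BatchNorm'
  # use_global_stats is omitted: it defaults to false in TRAIN and true in TEST
  param { lr_mult: 0 }
  param { lr_mult: 0 }
  param { lr_mult: 0 }}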
