# What's the difference between convolution in Keras vs Caffe?

wr1ttenyu zhao, CNN, 2022-5-10 14:51

I'm trying to replicate a large Caffe network in Keras (with the TensorFlow backend), but I'm having a lot of trouble doing it even for a single convolutional layer.

Let's say we had a 4D input with shape `(1, 500, 500, 3)`, and we had to perform a single convolution on this input with `96` filters with kernel size of `11` and `4x4` strides.
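For reference, the spatial size of the output for this setup can be worked out by hand. The sketch below is only an illustration; the helper name `valid_conv_output_size` is made up for this example and is not part of Keras or Caffe.

```python
def valid_conv_output_size(input_size: int, kernel_size: int, stride: int) -> int:
    """Output length along one axis for a convolution without padding."""
    return (input_size - kernel_size) // stride + 1


side = valid_conv_output_size(500, 11, 4)
print(side)  # 123

# The same result, written in each framework's default axis order
keras_shape = (1, side, side, 96)  # (batch, height, width, channels)
caffe_shape = (1, 96, side, side)  # (batch, channels, height, width)
print(keras_shape, caffe_shape)
```

Both shapes hold the same number of elements, but the axes are in a different order.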

Let's set our weight and input variables:

```python
import numpy as np

w = np.random.rand(11, 11, 3, 96)  # weights 1
b = np.random.rand(96)  # weights 2 (bias)
x = np.random.rand(500, 500, 3)
```

Simple convolution in Keras:

This is how it could be defined in Keras:

```python
import keras
import numpy as np
from keras.layers import Input
from keras.layers import Conv2D

inp = Input(shape=(500, 500, 3))
conv1 = Conv2D(filters=96, kernel_size=11, strides=(4, 4),
               activation=keras.activations.relu, padding='valid')(inp)

model = keras.Model(inputs=[inp], outputs=conv1)
model.layers[1].set_weights([w, b])  # set weights for convolutional layer (layers[0] is the input layer)

predicted = model.predict([x.reshape(1, 500, 500, 3)])
print(predicted.reshape(1, 96, 123, 123))  # reshape Keras output into the form of Caffe
```

Simple convolution in Caffe:

` simple.prototxt `:

```
name: "simple"
input: "inp"
input_shape {
  dim: 1
  dim: 3
  dim: 500
  dim: 500
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "inp"
  top: "conv1"
  param {
    lr_mult: 1
    decay_mult: 1
  }
  param {
    lr_mult: 2
    decay_mult: 0
  }
  convolution_param {
    num_output: 96
    kernel_size: 11
    pad: 0
    stride: 4
  }
}

layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
```

Caffe in Python:

```python
import caffe

net = caffe.Net('simple.prototxt', caffe.TEST)
net.params['conv1'][0].data[...] = w.reshape(96, 3, 11, 11)  # set weights 1
net.params['conv1'][1].data[...] = b  # set weights 2 (bias)
net.blobs['inp'].reshape(1, 3, 500, 500)  # reshape input layer to fit our input array x
print(net.forward(inp=x.reshape(1, 3, 500, 500)).get('conv1'))
```
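One thing worth checking in the two snippets above is how the arrays move between layouts. `reshape` keeps the elements in the same memory order and only reinterprets the shape, while `np.transpose` actually moves the axes. The sketch below is a NumPy-only illustration, assuming Keras stores kernels as `(height, width, in_channels, out_channels)` and Caffe as `(out_channels, in_channels, height, width)`; the variable names are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
w_keras = rng.random((11, 11, 3, 96))   # (height, width, in_channels, out_channels)
x_keras = rng.random((1, 500, 500, 3))  # (batch, height, width, channels)

# Move the axes instead of reinterpreting the flat buffer
w_caffe = np.transpose(w_keras, (3, 2, 0, 1))  # (out_channels, in_channels, height, width)
x_caffe = np.transpose(x_keras, (0, 3, 1, 2))  # (batch, channels, height, width)

# reshape produces the same shape, but the values land in different positions
w_reshaped = w_keras.reshape(96, 3, 11, 11)
print(w_caffe.shape == w_reshaped.shape)    # True
print(np.array_equal(w_caffe, w_reshaped))  # False
```

If this is the source of the mismatch, the same consideration would apply to the weights, the input, and the output that is reshaped for comparison.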

If we execute both snippets of code, we will notice that the outputs differ from each other. I understand that there are a few differences, such as Caffe's symmetric padding, but I didn't even use padding here. Yet the output of Caffe is different from the output of Keras...

Why is this so? I know that the Theano backend doesn't use correlation like Caffe does, and hence it requires the kernel to be rotated by 180 degrees, but is it the same for TensorFlow? From what I know, both TensorFlow and Caffe use cross-correlation instead of convolution.
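To make the distinction concrete, here is a small NumPy sketch of cross-correlation versus true convolution on a single-channel image. The two helper functions are written for this example only and are not taken from any framework; they just show that the two operations differ by a 180 degree rotation of the kernel.

```python
import numpy as np


def cross_correlate2d(image, kernel):
    """Slide the kernel over the image without flipping it (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def convolve2d(image, kernel):
    """True convolution: cross-correlation with the kernel rotated by 180 degrees."""
    return cross_correlate2d(image, kernel[::-1, ::-1])


image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 2.0], [3.0, 4.0]])

corr = cross_correlate2d(image, kernel)
conv = convolve2d(image, kernel)
print(corr[0, 0], conv[0, 0])  # 34.0 16.0
```

If both frameworks compute cross-correlation, as the question assumes, no kernel rotation is needed when moving weights between them.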

How could I make two identical models in Keras and Caffe that use convolution?

Any help would be appreciated, thanks!

### Answer

I found the problem, but I'm not sure how to fix it yet...

The difference between these two convolutional layers is the alignment of their output items. This alignment problem only occurs when the number of filters `N` satisfies `N > 1 && N > S`, where `S` is the dimension of the filter. In other words, the problem only occurs when the convolution produces a multi-dimensional array with more than one row and more than one column.
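One possible way to probe when such a misalignment can appear is to compare a plain `reshape` with an axis `transpose` for a few output shapes. This is a NumPy-only sketch under the assumption that the mismatch comes from the channels-last versus channels-first layout; the helper `reshape_matches_transpose` is hypothetical.

```python
import numpy as np


def reshape_matches_transpose(height, width, channels):
    """Check whether reshaping NHWC data to NCHW gives the same result as transposing it."""
    nhwc = np.arange(height * width * channels).reshape(1, height, width, channels)
    by_reshape = nhwc.reshape(1, channels, height, width)
    by_transpose = nhwc.transpose(0, 3, 1, 2)
    return np.array_equal(by_reshape, by_transpose)


print(reshape_matches_transpose(2, 2, 1))  # True: a single filter
print(reshape_matches_transpose(1, 1, 2))  # True: a single output position
print(reshape_matches_transpose(2, 2, 2))  # False: several filters and several positions
```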

To see this, I simplified my input and output data so that we can better analyze the mechanics of both layers.

` simple.prototxt `:

```
input: "input"
input_shape {
  dim: 1
  dim: 1
  dim: 2
  dim: 2
}

layer {
  name: "conv1"
  type: "Convolution"
  bottom: "input"
  top: "conv1"
  convolution_param {
    num_output: 2
    kernel_size: 1
    pad: 0
    stride: 1
  }
}

layer {
  name: "relu1"
  type: "ReLU"
  bottom: "conv1"
  top: "conv1"
}
```

` simple.py `:

```python
import keras
import caffe
import numpy as np
from keras.layers import Input, Conv2D
from keras.activations import relu
from keras import Model

filters = 2  # greater than 1 and ker_size
ker_size = 1

_input = np.arange(2 * 2).reshape(2, 2)
# Weights for Keras: the kernel is an array of 2's, the bias is an array of 0's
_weights = [np.reshape([[2 for _ in range(filters)] for _ in range(ker_size*ker_size)], (ker_size, ker_size, 1, filters)),
            np.reshape([0 for _ in range(filters)], (filters,))]
_weights_caffe = [_weights[0].T, _weights[1].T]  # just transpose them for Caffe

# Keras Setup
keras_input = Input(shape=(2, 2, 1), dtype='float32')
keras_conv = Conv2D(filters=filters, kernel_size=ker_size, strides=(1, 1), activation=relu, padding='valid')(keras_input)
model = Model(inputs=[keras_input], outputs=keras_conv)
model.layers[1].set_weights([_weights[0], _weights[1]])  # layers[0] is the input layer

# Caffe Setup
net = caffe.Net("simple.prototxt", caffe.TEST)
net.params['conv1'][0].data[...] = _weights_caffe[0]  # kernel
net.params['conv1'][1].data[...] = _weights_caffe[1]  # bias
net.blobs['input'].data[...] = _input.reshape(1, 1, 2, 2)

# Predictions
print("Input:\n---")
print(_input)
print(_input.shape)
print("\n")

print("Caffe:\n---")
print(net.forward()['conv1'])
print(net.forward()['conv1'].shape)
print("\n")

print("Keras:\n---")
print(model.predict([_input.reshape(1, 2, 2, 1)]))
print(model.predict([_input.reshape(1, 2, 2, 1)]).shape)
print("\n")
```

```
Input:
---
[[0 1]
 [2 3]]
(2, 2)


Caffe:
---
[[[[0. 2.]
   [4. 6.]]

  [[0. 2.]
   [4. 6.]]]]
(1, 2, 2, 2)


Keras:
---
[[[[0. 0.]
   [2. 2.]]

  [[4. 4.]
   [6. 6.]]]]
(1, 2, 2, 2)
```

If you look at the output of the Caffe model, you'll notice that our `2x2` array is first doubled (so that we have an array of two `2x2` arrays), and then each of those two arrays is multiplied by the weight of its filter. Something like this:

```
[[[[0. 1.]
   [2. 3.]]

  [[0. 1.]
   [2. 3.]]]]
```

```
[[[[(0 * 2) (1 * 2)]
   [(2 * 2) (3 * 2)]]

  [[(0 * 2) (1 * 2)]
   [(2 * 2) (3 * 2)]]]]
```
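The per-filter scaling described above can be reproduced with plain NumPy. This is only a sketch of what a `1x1` convolution on a single input channel amounts to, using the same input and weights as `simple.py`; it does not call Caffe or Keras.

```python
import numpy as np

x = np.arange(4, dtype=float).reshape(2, 2)  # single-channel 2x2 input
weights = np.array([2.0, 2.0])               # one scalar weight per filter
bias = np.zeros(2)                           # one bias per filter

# With a 1x1 kernel and one input channel, each filter just scales the whole input
out_chw = weights[:, None, None] * x[None, :, :] + bias[:, None, None]
out_chw = np.maximum(out_chw, 0.0)  # ReLU
print(out_chw)
```

The result is laid out as `(filters, height, width)`, which is the order in which Caffe prints its output.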

TensorFlow does something different: after doing the same thing as Caffe, it seems to align the 2D vectors of the output in ascending order. This seems like weird behavior, and I can't understand why they would do such a thing.
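One possible reading of the two printed outputs is that they hold the same values in different axis orders: Keras with the TensorFlow backend defaults to channels-last `(batch, height, width, channels)`, while Caffe uses channels-first `(batch, channels, height, width)`. The sketch below checks this on the numbers printed above, with the arrays copied by hand.

```python
import numpy as np

# Outputs copied from the printed results above
keras_out = np.array([[[[0., 0.], [2., 2.]],
                       [[4., 4.], [6., 6.]]]])  # read as (batch, height, width, channels)
caffe_out = np.array([[[[0., 2.], [4., 6.]],
                       [[0., 2.], [4., 6.]]]])  # read as (batch, channels, height, width)

# Move the channel axis of the Keras output to position 1
as_nchw = keras_out.transpose(0, 3, 1, 2)
print(np.array_equal(as_nchw, caffe_out))  # True
```

Under that reading, comparing the two frameworks would call for `transpose` rather than `reshape` on the Keras output.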

I have answered my own question about the cause of the problem, but I'm not aware of any clean solution yet. I still don't find my answer satisfying enough, so I'm going to accept an answer that has the actual solution.

The only solution I know of is to create a custom layer, which doesn't seem like a very neat solution to me.
