Popular Deep Learning Architectures using EINOPS
In this section we will rewrite the building blocks of deep learning in both the traditional PyTorch way and using the einops library.
Imports
First, we import the necessary libraries.
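A minimal set of imports covering the examples below might look like this (the exact list is an assumption; only torch, torch.nn, torch.nn.functional and the einops layers are strictly required):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

from einops import rearrange, reduce
from einops.layers.torch import Rearrange, Reduce
```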
Simple ConvNet
Using only PyTorch
Here is an implementation of a simple ConvNet using only PyTorch, without einops.
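A sketch of such a network, assuming the classic MNIST-style ConvNet (the exact hyper-parameters are an assumption):

```python
class ConvNetOld(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 2))
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        # this reshape silently mixes up batch and spatial axes if the input size changes
        x = x.view(-1, 320)
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
```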
Using EINOPS + PyTorch
Implementing the same ConvNet as above using einops and PyTorch.
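A sketch of the einops version, expressed as a single nn.Sequential (the sizes are assumptions matching the sketch above):

```python
conv_net_new = nn.Sequential(
    nn.Conv2d(1, 10, kernel_size=5),
    nn.MaxPool2d(kernel_size=2),
    nn.ReLU(),
    nn.Conv2d(10, 20, kernel_size=5),
    nn.MaxPool2d(kernel_size=2),
    nn.ReLU(),
    nn.Dropout2d(),
    # fails loudly if the feature map is not exactly 20 x 4 x 4
    Rearrange('b c h w -> b (c h w)', c=20, h=4, w=4),
    nn.Linear(320, 50),
    nn.ReLU(),
    nn.Dropout(),
    nn.Linear(50, 10),
    nn.LogSoftmax(dim=1),
)
```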
Why prefer the EINOPS implementation?
Following are the reasons to prefer the new implementation:
- In the original code, if the input is changed and the batch_size is divisible by 16 (which it usually is), we will get something senseless after reshaping.
- The new code using einops explicitly raises an error in that scenario. Hence better!
- We won't forget to use the self.training flag with the new implementation.
- Code is straightforward to read and analyze.
- nn.Sequential makes printing/saving/passing trivial, and there is no need for custom code to load the model (which also has lots of benefits).
- Don't need logsoftmax? Now you can use conv_net_new[:-1]. Another reason to prefer nn.Sequential.
- ... and we could also add an inplace ReLU.
Super-resolution
Only PyTorch
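A sketch along the lines of the standard PyTorch super-resolution example (the exact layer sizes are assumptions):

```python
class SuperResolutionNetOld(nn.Module):
    def __init__(self, upscale_factor):
        super().__init__()
        self.relu = nn.ReLU()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=5, padding=2)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=3, padding=1)
        self.conv3 = nn.Conv2d(64, 32, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(32, upscale_factor ** 2, kernel_size=3, padding=1)
        self.pixel_shuffle = nn.PixelShuffle(upscale_factor)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.relu(self.conv2(x))
        x = self.relu(self.conv3(x))
        x = self.pixel_shuffle(self.conv4(x))
        return x
```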
Using EINOPS
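A sketch of the einops version: the Rearrange layer plays the role of pixel_shuffle and drops the single-channel axis from the output (an assumption consistent with the improvements listed below):

```python
def super_resolution_net_new(upscale_factor):
    return nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=5, padding=2),
        nn.ReLU(inplace=True),
        nn.Conv2d(64, 64, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(64, 32, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(32, upscale_factor ** 2, kernel_size=3, padding=1),
        # equivalent to pixel_shuffle, but without a fake channel axis in the output
        Rearrange('b (h2 w2) h w -> b (h h2) (w w2)',
                  h2=upscale_factor, w2=upscale_factor),
    )
```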
Improvements over the old implementation
- No need for the special instruction pixel_shuffle (and the result is transferable between frameworks).
- The output does not contain a fake axis (and we could do the same for the input).
- An inplace ReLU is used now. For high-resolution images this becomes critical and saves a lot of memory.
- ... and all the benefits of nn.Sequential.
Gram Matrix / Style Transfer
Restyling the Gram matrix for style transfer.
Original Code using ONLY PyTorch
The original code is already very good; the first line shows what kind of input is expected.
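A sketch of the original function (close to the one used in the PyTorch fast-neural-style example):

```python
def gram_matrix_old(y):
    (b, ch, h, w) = y.size()   # the first line documents the expected input
    features = y.view(b, ch, w * h)
    features_t = features.transpose(1, 2)
    gram = features.bmm(features_t) / (ch * h * w)
    return gram
```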
Using EINSUM
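A sketch of the einsum version (the change of normalization is intentional, see the improvements below):

```python
def gram_matrix_new(y):
    b, ch, h, w = y.shape
    # for each batch and each pair of channels, sum over h and w
    return torch.einsum('bchw,bdhw->bcd', y, y) / (h * w)
```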
Improvements
The einsum operation should be read like:
- For each batch and each pair of channels, we sum over h and w.
- The normalization is also changed, because that is how the Gram matrix is defined. Otherwise we should call it a normalized Gram matrix or the like.
Recurrent Models (RNNs)
ONLY PyTorch
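A sketch along the lines of the PyTorch word-language-model example (the hyper-parameters and the choice of LSTM are assumptions):

```python
class RNNModelOld(nn.Module):
    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5):
        super().__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
        self.decoder = nn.Linear(nhid, ntoken)

    def forward(self, input, hidden):
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        output = self.drop(output)
        # flatten time and batch for the decoder, then un-flatten, all via size bookkeeping
        decoded = self.decoder(output.view(output.size(0) * output.size(1), output.size(2)))
        return decoded.view(output.size(0), output.size(1), decoded.size(1)), hidden
```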
Using EINOPS
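The einops rewrite mainly changes the forward pass, where the size bookkeeping becomes explicit (a sketch under the same assumptions):

```python
class RNNModelNew(nn.Module):
    def __init__(self, ntoken, ninp, nhid, nlayers, dropout=0.5):
        super().__init__()
        self.drop = nn.Dropout(dropout)
        self.encoder = nn.Embedding(ntoken, ninp)
        self.rnn = nn.LSTM(ninp, nhid, nlayers, dropout=dropout)
        self.decoder = nn.Linear(nhid, ntoken)

    def forward(self, input, hidden):
        t, b = input.shape
        emb = self.drop(self.encoder(input))
        output, hidden = self.rnn(emb, hidden)
        # merge time and batch for the decoder, then split them back by name
        output = rearrange(self.drop(output), 't b nhid -> (t b) nhid')
        decoded = rearrange(self.decoder(output), '(t b) token -> t b token', t=t, b=b)
        return decoded, hidden
```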
Improving RNN
Only PyTorch
Using EINOPS
Channel Shuffle (from ShuffleNet)
ONLY PyTorch
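Channel shuffle as it appears in typical ShuffleNet implementations:

```python
def channel_shuffle_old(x, groups):
    batchsize, num_channels, height, width = x.size()
    channels_per_group = num_channels // groups
    # reshape: separate the group axis
    x = x.view(batchsize, groups, channels_per_group, height, width)
    # swap the group and per-group channel axes
    x = torch.transpose(x, 1, 2).contiguous()
    # flatten back to (b, c, h, w)
    x = x.view(batchsize, -1, height, width)
    return x
```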
Using EINOPS
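With einops the same shuffle fits in a single rearrange call:

```python
def channel_shuffle_new(x, groups):
    return rearrange(x, 'b (c1 c2) h w -> b (c2 c1) h w', c1=groups)
```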
ShuffleNet
ONLY PyTorch
Using EINOPS
Improvements
Rewriting the code helped to identify the following:
- There is no sense in doing the reshuffling while not using groups in the first convolution (indeed, in the paper this is not the case). However, the result is an equivalent model.
- It is strange that the first convolution may not be grouped, while the last convolution is always grouped (and that's different from the paper).

Also,

- An identity layer for PyTorch is introduced here.
- The last thing to do is to get rid of conv1x1 and conv3x3 (those are not better than the standard implementation).
ResNet
ONLY PyTorch
Using EINOPS
FastText
ONLY PyTorch
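A sketch of a typical FastText classifier in plain PyTorch (names and sizes are assumptions):

```python
class FastTextOld(nn.Module):
    def __init__(self, vocab_size, embedding_dim, output_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.fc = nn.Linear(embedding_dim, output_dim)

    def forward(self, x):
        # x: (time, batch) of token ids
        embedded = self.embedding(x)           # (t, b, emb)
        embedded = embedded.permute(1, 0, 2)   # (b, t, emb)
        # average over time via a pooling layer
        pooled = F.avg_pool2d(embedded, (embedded.shape[1], 1)).squeeze(1)
        return self.fc(pooled)
```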
Using EINOPS
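A sketch of the einops version as a single nn.Sequential (placeholder sizes):

```python
vocab_size, embedding_dim, output_dim = 10_000, 100, 2   # placeholder sizes

fast_text_new = nn.Sequential(
    Rearrange('t b -> t b'),          # does nothing, documents the expected input
    nn.Embedding(vocab_size, embedding_dim),
    Reduce('t b c -> b c', 'mean'),   # average over time
    nn.Linear(embedding_dim, output_dim),
    Rearrange('b c -> b c'),          # does nothing, documents the output
)
```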
- Here, the first and last operations do nothing and can be removed; they were added only to explicitly show the expected input and output shapes.
- This also gives us the flexibility of changing the interface by editing a single line. Should you need to accept inputs of shape (b, t), you just need to change the first line to Rearrange('b t -> t b').
CNNs for text classification
ONLY PyTorch
Using EINOPS
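A hedged sketch of the einops-flavoured text CNN, using nn.Conv1d and max-over-time pooling; the hyper-parameters and exact module layout are assumptions, not the original post's code:

```python
class TextCNNNew(nn.Module):
    def __init__(self, emb_dim=100, num_filters=100,
                 filter_sizes=(3, 4, 5), num_classes=2, dropout=0.5):
        super().__init__()
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, num_filters, kernel_size=k) for k in filter_sizes])
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(num_filters * len(filter_sizes), num_classes)

    def forward(self, x):
        # first line does nothing, it only documents the expected input shape
        x = rearrange(x, 'b t c -> b t c')
        x = rearrange(x, 'b t c -> b c t')
        # one max-over-time feature vector per filter size; works for any number of sizes
        pooled = [reduce(conv(x), 'b c t -> b c', 'max') for conv in self.convs]
        return self.fc(self.dropout(torch.cat(pooled, dim=1)))
```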
Discussion
- The original code misuses nn.Conv2d while nn.Conv1d is the right choice.
- The new code can work with any number of filter_sizes and won't fail.
- The first line in the new code does nothing, but was added for simplicity and clarity of shapes.
Highway Convolutions
ONLY PyTorch
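A sketch of a highway 1-D convolution in plain PyTorch (the gating convention is an assumption):

```python
class HighwayConv1dOld(nn.Conv1d):
    def forward(self, inputs):
        # the convolution must produce twice the input channels: gate and candidate
        L = super().forward(inputs)
        H1, H2 = torch.chunk(L, chunks=2, dim=1)
        gate = torch.sigmoid(H1)
        return gate * H2 + (1.0 - gate) * inputs
```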
Using EINOPS
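The einops version replaces the chunking with a named decomposition of the channel axis:

```python
class HighwayConv1dNew(nn.Conv1d):
    def forward(self, inputs):
        L = super().forward(inputs)
        # split the channel axis into (gate, candidate) by name instead of chunking
        H1, H2 = rearrange(L, 'b (split c) t -> split b c t', split=2)
        gate = torch.sigmoid(H1)
        return gate * H2 + (1.0 - gate) * inputs
```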
Simple Attention
ONLY PyTorch
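A sketch of simple dot-product attention over 1-D feature maps in plain PyTorch (the (batch, channels, time) shape convention is an assumption):

```python
class AttentionOld(nn.Module):
    def forward(self, K, V, Q):
        # K, V: (b, c, t); Q: (b, c, l)
        A = torch.bmm(K.transpose(1, 2), Q) / Q.shape[1] ** 0.5
        A = F.softmax(A, dim=1)
        R = torch.bmm(V, A)
        return torch.cat((R, Q), dim=1)
```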
Using EINOPS
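The einsum version makes the contracted axes explicit (same assumed shape convention):

```python
def attention_new(K, V, Q):
    _, n_channels, _ = K.shape
    # score every key position against every query position, contracting channels
    A = torch.einsum('bct,bcl->btl', K, Q)
    A = F.softmax(A * n_channels ** (-0.5), dim=1)
    # weighted sum of values, contracting the key/time axis
    R = torch.einsum('bct,btl->bcl', V, A)
    return torch.cat((R, Q), dim=1)
```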
Multi-head Attention
ONLY PyTorch
Using EINOPS
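A sketch of multi-head attention where all head splitting and merging is done with rearrange and the attention itself with einsum; the layer sizes, dropout and the residual + LayerNorm wrapper are assumptions in the spirit of the Transformer:

```python
class MultiHeadAttentionNew(nn.Module):
    def __init__(self, n_head, d_model, d_k, d_v, dropout=0.1):
        super().__init__()
        self.n_head = n_head
        self.w_qs = nn.Linear(d_model, n_head * d_k)
        self.w_ks = nn.Linear(d_model, n_head * d_k)
        self.w_vs = nn.Linear(d_model, n_head * d_v)
        self.fc = nn.Linear(n_head * d_v, d_model)
        self.dropout = nn.Dropout(dropout)
        self.layer_norm = nn.LayerNorm(d_model)

    def forward(self, q, k, v, mask=None):
        residual = q
        # project and split the hidden axis into (head, per-head) in one step
        q = rearrange(self.w_qs(q), 'b l (head k) -> head b l k', head=self.n_head)
        k = rearrange(self.w_ks(k), 'b t (head k) -> head b t k', head=self.n_head)
        v = rearrange(self.w_vs(v), 'b t (head v) -> head b t v', head=self.n_head)
        attn = torch.einsum('hblk,hbtk->hblt', q, k) / q.shape[-1] ** 0.5
        if mask is not None:
            # mask is assumed to broadcast against (head, b, l, t)
            attn = attn.masked_fill(mask[None], float('-inf'))
        attn = torch.softmax(attn, dim=-1)
        output = torch.einsum('hblt,hbtv->hblv', attn, v)
        # merge the heads back into a single hidden axis
        output = rearrange(output, 'head b l v -> b l (head v)')
        output = self.dropout(self.fc(output))
        return self.layer_norm(output + residual), attn
```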