NumPy reshape and transpose
Almost everyone who has worked with NumPy has had to deal with multi-dimensional arrays, or even higher-dimensional tensors. Reshape and transpose are two methods you will inevitably use to rearrange an array's structure into a desired shape. The concepts are not intuitive at first, but with some understanding they become relatively easy.
The first hurdle may be picturing a high-dimensional array, such as one of shape (10000, 32, 16, 3). With the intuition gained from C arrays, up to three dimensions can be pictured as a cube, with each direction representing an axis, but beyond three dimensions it gets tricky.
For a higher-dimensional array, picture a tree structure instead. Each number after the first is the number of child branches of every node at the previous level. For the example above, there are 10000 trees: each root has 32 branches, each of those has another 16 branches, and each of those 16 branches has a further 3 branches holding the actual values.
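One way to see the tree picture in code is to index one level at a time: each index picks a branch, and the shape of what remains is the shape of that subtree. This is a minimal sketch; the zero-filled uint8 array is only a stand-in for real data.

```python
import numpy as np

# Stand-in data with the example shape; uint8 keeps memory small (about 15 MB).
a = np.zeros((10000, 32, 16, 3), dtype=np.uint8)

print(a[0].shape)        # one tree: (32, 16, 3)
print(a[0, 5].shape)     # one of its 32 branches: (16, 3)
print(a[0, 5, 7].shape)  # one of those 16 branches: (3,)

# Total number of leaves (stored values) across all 10000 trees.
print(a.size == 10000 * 32 * 16 * 3)  # True
```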
numpy.reshape takes a shape as input and formats the array into that shape. An intuitive way to think of it is that NumPy flattens your array into a plain list (in the default row-major order) and then cuts that long flattened list into the new form.
>>> import numpy as np
>>> data = np.array([[[0, 1], [2, 3]],
...                  [[4, 5], [6, 7]],
...                  [[8, 9], [10, 11]]])
>>> data.reshape(3, 4)
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
The example reshapes an array of shape (3, 2, 2) into shape (3, 4). Notice that it feels as if the original array is pulled out into a one-dimensional array and then cut into shape (3, 4). This is just an easy way to think about it.
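The flatten-then-cut picture can be checked directly. This sketch assumes NumPy's default row-major layout (order='C'); with order='F' the flattening order, and therefore the result, would differ.

```python
import numpy as np

# Same values and shape (3, 2, 2) as the example above.
data = np.arange(12).reshape(3, 2, 2)

# Mental model: flatten in row-major order, then cut into rows of 4.
flat = data.ravel()  # [0, 1, 2, ..., 11]
manual = np.array([flat[i * 4:(i + 1) * 4] for i in range(3)])

print(np.array_equal(manual, data.reshape(3, 4)))  # True

# Passing -1 lets NumPy infer that dimension from the total size (12 / 4 = 3).
print(data.reshape(-1, 4).shape)  # (3, 4)
```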
Transpose, on the other hand, is easy to understand and work out for a two-dimensional array, but harder to picture in a higher-dimensional setting.
Transpose switches rows and columns, so
[[1, 2],
 [3, 4]]
will become
[[1, 3],
 [2, 4]]
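In NumPy the two-dimensional case is just .T, which is shorthand for .transpose() with no arguments (that call reverses the order of the axes). A minimal sketch using the matrix above:

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])

t = m.T  # same as m.transpose()
print(t.tolist())  # [[1, 3], [2, 4]]

# Element (i, j) of the transpose is element (j, i) of the original.
print(all(t[i, j] == m[j, i] for i in range(2) for j in range(2)))  # True
```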
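NumPy's transpose method also takes axes as input, so you can choose how the axes are rearranged; this is very useful for tensors. For example,
data.transpose(1, 0, 2)
where 0, 1, 2 stand for the axes of data, and 0 refers to the outermost array. Axis i of the result is axis axes[i] of the input, so here the first two axes are swapped and the last one stays in place.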
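To make the axis rule concrete, here is a small sketch applying transpose(1, 0, 2) to the (3, 2, 2) array from the reshape example; the names swapped and ok are just illustrative.

```python
import numpy as np

data = np.arange(12).reshape(3, 2, 2)  # the (3, 2, 2) array from the reshape example
swapped = data.transpose(1, 0, 2)

# Axis i of the result is axis axes[i] of the input,
# so the new shape is (data.shape[1], data.shape[0], data.shape[2]).
print(swapped.shape)  # (2, 3, 2)

# Every element moves from data[i, j, k] to swapped[j, i, k].
ok = all(
    swapped[j, i, k] == data[i, j, k]
    for i in range(3) for j in range(2) for k in range(2)
)
print(ok)  # True

# swapped[0] collects the first row of each 2x2 block: data[:, 0, :]
print(swapped[0].tolist())  # [[0, 1], [4, 5], [8, 9]]
```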
Assume there is a dataset of shape (10000, 3072). Each of the 10,000 rows holds 3072 values describing 1024 pixels in RGB format: the first 1024 columns are the red channel values, the next 1024 the green, and the last 1024 the blue, adding up to 3072 columns.
Now we need to reconstruct each pixel so our program can display the image properly, which requires the shape (10000, 32, 32, 3). Assume the values in each row run from 0 to 3071. The first pixel should then have the value (0, 1024, 2048) rather than (0, 1, 2), because it takes one value from each chunk of 1024 values, giving its R, G and B components.
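It is tempting to reshape each row straight to (32, 32, 3), but since reshape only cuts the flattened values in order, that puts three consecutive red values into the first pixel. A small sketch on a single row, assuming the 0 to 3071 values described above:

```python
import numpy as np

row = np.arange(3072)  # one image: 1024 red, then 1024 green, then 1024 blue values

naive = row.reshape(32, 32, 3)
print(naive[0, 0].tolist())  # [0, 1, 2]: three red values, not one RGB pixel

# The wanted first pixel takes one value from each 1024-value chunk.
wanted = row[[0, 1024, 2048]].tolist()
print(wanted)  # [0, 1024, 2048]
```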
We can achieve this in two steps: reshape, then transpose.
First, create the data to work with:
>>> import numpy as np
>>> data = np.vstack([np.arange(3072) for _ in range(10000)])
>>> data.shape
(10000, 3072)
>>> data = data.reshape(10000, 3, 32, 32)
>>> data.shape
(10000, 3, 32, 32)
This breaks the 3072 columns of each row into three channels, each a 32 by 32 grid, laid out as
[[R1 ... R1024], [G1 ... G1024], [B1 ... B1024]]
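To see what this reshape does, it is enough to look at a single image. This sketch uses one row instead of all 10000; after reshaping, the channel is the first index and the pixel row and column come next.

```python
import numpy as np

row = np.arange(3072)
img = row.reshape(3, 32, 32)  # per image: (channel, pixel row, pixel column)

# The first value of each channel block: R1, G1, B1.
print(img[0, 0, 0], img[1, 0, 0], img[2, 0, 0])  # 0 1024 2048

# The whole red block holds the first 1024 values.
print(img[0].min(), img[0].max())  # 0 1023
```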
But we need the three channel values of each pixel grouped together:
[[R1, G1, B1], [R2, G2, B2], ..., [R1024, G1024, B1024]]
>>> data = data.transpose([0, 2, 3, 1])
>>> data.shape
(10000, 32, 32, 3)
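Reading the axes (0, 2, 3, 1) with the rule from earlier: new axis 0 is the image, 1 and 2 are the pixel row and column, and 3 is the old channel axis. In other words, the transpose moves the channel axis to the end, which is exactly what np.moveaxis(data, 1, -1) does. A sketch on a smaller stand-in batch of 4 images (instead of 10000) with the same per-row layout:

```python
import numpy as np

# Small stand-in batch: 4 images instead of 10000, same per-row layout.
data = np.vstack([np.arange(3072) for _ in range(4)]).reshape(4, 3, 32, 32)

out = data.transpose([0, 2, 3, 1])
# New axes: 0 <- old 0 (image), 1 <- old 2 (pixel row),
#           2 <- old 3 (pixel column), 3 <- old 1 (channel).
print(out.shape)  # (4, 32, 32, 3)

# Same result as moving the channel axis (1) to the end.
print(np.array_equal(out, np.moveaxis(data, 1, -1)))  # True

# Element-wise: out[n, r, c, ch] == data[n, ch, r, c]
print(out[2, 5, 7, 1] == data[2, 1, 5, 7])  # True
```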
Let’s inspect the first pixel:
>>> data[0, 0, 0]
array([   0, 1024, 2048])
That’s exactly what we want!
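The two steps can be wrapped in one small function and checked against a slow, pixel-by-pixel reconstruction. This is a sketch: rows_to_images is a hypothetical helper name, and it assumes the channel-planar row layout described above (all red values, then green, then blue).

```python
import numpy as np


def rows_to_images(rows, size=32):
    """Hypothetical helper: turn (N, 3 * size * size) channel-planar rows
    into (N, size, size, 3) images with RGB values grouped per pixel."""
    n = rows.shape[0]
    return rows.reshape(n, 3, size, size).transpose(0, 2, 3, 1)


rows = np.vstack([np.arange(3072) for _ in range(2)])
images = rows_to_images(rows)

# Reference: build each pixel explicitly from the three 1024-value chunks.
ref = np.empty((2, 32, 32, 3), dtype=rows.dtype)
for n in range(2):
    for p in range(1024):  # p is the pixel index in row-major order
        r, c = divmod(p, 32)
        ref[n, r, c] = [rows[n, p], rows[n, 1024 + p], rows[n, 2048 + p]]

print(np.array_equal(images, ref))  # True
print(images[0, 0, 1].tolist())     # [1, 1025, 2049]
```

Note that transpose returns a view rather than a copy, so the result is not C-contiguous; if a downstream library needs contiguous memory, np.ascontiguousarray(images) makes a compact copy.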