Sunday, November 24, 2024

Getting familiar with torch tensors




Two days ago, I introduced torch, an R package that provides the native functionality that is brought to Python users by PyTorch. In that post, I assumed basic familiarity with TensorFlow/Keras. Consequently, I portrayed torch in a way I figured would be helpful to someone who “grew up” with the Keras way of training a model: aiming to focus on differences, yet not lose sight of the overall process.

This post now changes perspective. We code a simple neural network “from scratch”, making use of just one of torch’s building blocks: tensors. This network will be as “raw” (low-level) as can be. (For the less math-inclined people among us, it may serve as a refresher of what’s actually going on beneath all those convenience tools they built for us. But the real purpose is to illustrate what can be done with tensors alone.)

Subsequently, three posts will progressively show how to reduce the effort: noticeably right from the start, enormously once we finish. At the end of this mini-series, you will have seen how automatic differentiation works in torch, how to use modules (layers, in keras speak, and compositions thereof), and how to use optimizers. By then, you’ll have a lot of the background desirable when applying torch to real-world tasks.

This post will be the longest, since there is a lot to learn about tensors: how to create them; how to manipulate their contents and/or modify their shapes; how to convert them to R arrays, matrices or vectors; and of course, given the omnipresent need for speed, how to get all those operations executed on the GPU. Once we’ve cleared that agenda, we code the aforementioned little network, seeing all those aspects in action.

Tensors

Creation

Tensors may be created by specifying individual values. Here we create two one-dimensional tensors (vectors), of types float and bool, respectively:

library(torch)
# a 1d vector of size 2
t <- torch_tensor(c(1, 2))
t

# also 1d, but of type boolean
t <- torch_tensor(c(TRUE, FALSE))
t
torch_tensor 
 1
 2
[ CPUFloatType{2} ]

torch_tensor 
 1
 0
[ CPUBoolType{2} ]

And here are two ways to create two-dimensional tensors (matrices). Note how in the second approach, you need to specify byrow = TRUE in the call to matrix() to get values arranged in row-major order.

# a 3x3 tensor (matrix)
t <- torch_tensor(rbind(c(1,2,0), c(3,0,0), c(4,5,6)))
t

# also 3x3
t <- torch_tensor(matrix(1:9, ncol = 3, byrow = TRUE))
t
torch_tensor 
 1  2  0
 3  0  0
 4  5  6
[ CPUFloatType{3,3} ]

torch_tensor 
 1  2  3
 4  5  6
 7  8  9
[ CPULongType{3,3} ]

In higher dimensions especially, it can be easier to specify the type of tensor abstractly, as in: “give me a tensor of <…> of shape n1 x n2”, where <…> could be “zeros”; or “ones”; or, say, “values drawn from a standard normal distribution”:

# a 3x3 tensor of standard-normally distributed values
t <- torch_randn(3, 3)
t

# a 4x2x2 (3d) tensor of zeroes
t <- torch_zeros(4, 2, 2)
t
torch_tensor 
-2.1563  1.7085  0.5245
 0.8955 -0.6854  0.2418
 0.4193 -0.7742 -1.0399
[ CPUFloatType{3,3} ]

torch_tensor 
(1,.,.) = 
  0  0
  0  0

(2,.,.) = 
  0  0
  0  0

(3,.,.) = 
  0  0
  0  0

(4,.,.) = 
  0  0
  0  0
[ CPUFloatType{4,2,2} ]

Many similar functions exist, including, e.g., torch_arange() to create a tensor holding a sequence of evenly spaced values, torch_eye() which returns an identity matrix, and torch_logspace() which fills a specified range with a list of values spaced logarithmically.

If no dtype argument is specified, torch will infer the data type from the passed-in value(s). For example:

t <- torch_tensor(c(3, 5, 7))
t$dtype

t <- torch_tensor(1L)
t$dtype
torch_Float
torch_Long

But we can explicitly request a different dtype if we want:

t <- torch_tensor(2, dtype = torch_double())
t$dtype
torch_Double

torch tensors live on a device. By default, this will be the CPU:

t$device
torch_device(type='cpu')

But we could also define a tensor to live on the GPU:

t <- torch_tensor(2, device = "cuda")
t$device
torch_device(type='cuda', index=0)

We’ll talk more about devices below.

There is another very important parameter to the tensor-creation functions: requires_grad. Here though, I need to ask for your patience: This one will prominently figure in the follow-up post.

Conversion to built-in R data types

To convert torch tensors to R, use as_array():

t <- torch_tensor(matrix(1:9, ncol = 3, byrow = TRUE))
as_array(t)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Depending on whether the tensor is one-, two-, or three-dimensional, the resulting R object will be a vector, a matrix, or an array:

t <- torch_tensor(c(1, 2, 3))
as_array(t) %>% class()

t <- torch_ones(c(2, 2))
as_array(t) %>% class()

t <- torch_ones(c(2, 2, 2))
as_array(t) %>% class()
[1] "numeric"

[1] "matrix" "array" 

[1] "array"

For one-dimensional and two-dimensional tensors, it is also possible to use as.integer() / as.matrix(). (One reason you might want to do this is to have more self-documenting code.)

If a tensor currently lives on the GPU, you need to move it to the CPU first:

t <- torch_tensor(2, device = "cuda")
as.integer(t$cpu())
[1] 2

Indexing and slicing tensors

Often, we want to retrieve not a complete tensor, but only some of the values it holds, or even just a single value. In these cases, we talk about slicing and indexing, respectively.

In R, these operations are 1-based, meaning that when we specify offsets, we assume the very first element in an array to reside at offset 1. The same behavior was implemented for torch. Thus, a lot of the functionality described in this section should feel intuitive.

The way I’m organizing this section is the following. We’ll inspect the intuitive parts first, where by intuitive I mean: intuitive to the R user who has not yet worked with Python’s NumPy. Then come things which, to this user, may look more surprising, but will turn out to be pretty useful.

Indexing and slicing: the R-like part

None of these should be overly surprising:

t <- torch_tensor(rbind(c(1,2,3), c(4,5,6)))
t

# a single value
t[1, 1]

# first row, all columns
t[1, ]

# first row, a subset of columns
t[1, 1:2]
torch_tensor 
 1  2  3
 4  5  6
[ CPUFloatType{2,3} ]

torch_tensor 
1
[ CPUFloatType{} ]

torch_tensor 
 1
 2
 3
[ CPUFloatType{3} ]

torch_tensor 
 1
 2
[ CPUFloatType{2} ]

Note how, just as in R, singleton dimensions are dropped:

t <- torch_tensor(rbind(c(1,2,3), c(4,5,6)))

# 2x3
t$size() 

# just a single row: will be returned as a vector
t[1, 1:2]$size() 

# a single element
t[1, 1]$size()
[1] 2 3

[1] 2

integer(0)

And just like in R, you can specify drop = FALSE to keep those dimensions:

t[1, 1:2, drop = FALSE]$size()

t[1, 1, drop = FALSE]$size()
[1] 1 2

[1] 1 1
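
For comparison, here is the base R behavior that the torch indexing mirrors. This is a small sketch using an ordinary R matrix only (no torch involved); the variable names are made up for illustration.

```r
# An ordinary 2x3 R matrix, analogous to the tensor used above.
m <- rbind(c(1, 2, 3), c(4, 5, 6))

# Without drop = FALSE, selecting one row collapses the result to a plain vector,
# which has no dim attribute at all.
dropped <- m[1, 1:2]
print(dim(dropped))

# With drop = FALSE, the singleton row dimension is kept.
kept <- m[1, 1:2, drop = FALSE]
print(dim(kept))
```

The one visible difference is in how the missing dimensions are reported: base R returns NULL for the vector, while the tensor above reports a one-element size.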

Indexing and slicing: What to look out for

Whereas R uses negative numbers to remove elements at specified positions, in torch negative values indicate that we start counting from the end of a tensor, with -1 pointing to its last element:

t <- torch_tensor(rbind(c(1,2,3), c(4,5,6)))

t[1, -1]

t[ , -2:-1] 
torch_tensor 
3
[ CPUFloatType{} ]

torch_tensor 
 2  3
 5  6
[ CPUFloatType{2,2} ]
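
To make the contrast concrete, the following base R sketch shows what negative indices do to an ordinary vector, and how one would get the torch-style "count from the end" results in base R instead. It uses no torch functions; the names are illustrative only.

```r
v <- c(10, 20, 30)

# Base R: a negative index REMOVES the element at that position.
dropped <- v[-1]

# To get what torch's -1 returns (the last element), index by length ...
last <- v[length(v)]

# ... and tail() gives the last two elements, like the torch slice -2:-1.
last_two <- tail(v, 2)

print(dropped)
print(last)
print(last_two)
```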

This is a feature you might know from NumPy. The same goes for the following.

When the slicing expression m:n is augmented by another colon and a third number (m:n:o), we take every o-th item from the range specified by m and n:

t <- torch_tensor(1:10)
t[2:10:2]
torch_tensor 
  2
  4
  6
  8
 10
[ CPULongType{5} ]
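
For readers coming from base R, the stepped slice corresponds roughly to building an index with seq() and its by argument. The sketch below is a comparison in plain R, not a description of how torch implements slicing; variable names are illustrative.

```r
# Base R counterpart of the stepped slice 2:10:2 shown above.
v <- 1:10

# seq(from, to, by) produces the positions 2, 4, 6, 8, 10.
idx <- seq(2, 10, by = 2)

# Use them as an ordinary index vector.
every_second <- v[idx]
print(every_second)
```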

Sometimes we don’t know how many dimensions a tensor has, but we do know what to do with the final dimension, or the first one. To subsume all others, we can use ..:

t <- torch_randint(-7, 7, size = c(2, 2, 2))
t

t[.., 1]

t[2, ..]
torch_tensor 
(1,.,.) = 
  2 -2
 -5  4

(2,.,.) = 
  0  4
 -3 -1
[ CPUFloatType{2,2,2} ]

torch_tensor 
 2 -5
 0 -3
[ CPUFloatType{2,2} ]

torch_tensor 
 0  4
-3 -1
[ CPUFloatType{2,2} ]

Now we move on to a topic that, in practice, is just as indispensable as slicing: changing tensor shapes.

Reshaping tensors

Changes in shape can occur in two fundamentally different ways. Seeing how “reshape” really means: keep the values but modify their layout, we could either alter how they’re arranged physically, or keep the physical structure as-is and just change the “mapping” (a semantic change, as it were).

In the first case, storage has to be allocated for two tensors, source and target, and elements are copied from the source to the target. In the second, there is physically just a single tensor, referenced by two logical entities with distinct metadata.

Not surprisingly, for performance reasons, the second operation is preferred.

Zero-copy reshaping

We start with zero-copy methods, as we’ll want to use them whenever we can.

A special case often seen in practice is adding or removing a singleton dimension.

unsqueeze() adds a dimension of size 1 at a position specified by dim:

t1 <- torch_randint(low = 3, high = 7, size = c(3, 3, 3))
t1$size()

t2 <- t1$unsqueeze(dim = 1)
t2$size()

t3 <- t1$unsqueeze(dim = 2)
t3$size()
[1] 3 3 3

[1] 1 3 3 3

[1] 3 1 3 3

Conversely, squeeze() removes singleton dimensions:

t4 <- t3$squeeze()
t4$size()
[1] 3 3 3

The same could be accomplished with view(). view(), however, is much more general, in that it allows you to reshape the data to any valid dimensionality. (Valid meaning: the number of elements stays the same.)

Here we have a 3x2 tensor that is reshaped to size 2x3:

t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))
t1

t2 <- t1$view(c(2, 3))
t2
torch_tensor 
 1  2
 3  4
 5  6
[ CPUFloatType{3,2} ]

torch_tensor 
 1  2  3
 4  5  6
[ CPUFloatType{2,3} ]

(Note how this is different from matrix transposition.)
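
The difference between re-reading the same values in row-major order and transposing can be reproduced with an ordinary R matrix. The sketch below uses base R only and is an analogy, not torch code: R matrices are stored column-major, so the row-major "view" has to be emulated explicitly via t() and byrow = TRUE.

```r
# Same values as the 3x2 tensor above.
m <- rbind(c(1, 2), c(3, 4), c(5, 6))

# Read the values in row-major order: 1 2 3 4 5 6
row_major_values <- as.vector(t(m))

# "View" them as 2x3: same order of values, different shape.
viewed <- matrix(row_major_values, nrow = 2, byrow = TRUE)

# Transposition, in contrast, swaps rows and columns.
transposed <- t(m)

print(viewed)
print(transposed)
```

Both results are 2x3, but only the first keeps the values in their original reading order.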

Instead of reshaping to another matrix, we can flatten the data into a single row. Passing -1 for a dimension asks torch to infer its size from the number of elements:

t4 <- t1$view(c(-1, 6))

t4$size()

t4
[1] 1 6

torch_tensor 
 1  2  3  4  5  6
[ CPUFloatType{1,6} ]

In contrast to indexing operations, this does not drop dimensions.

Like we said above, operations like squeeze() or view() do not make copies. Or, put differently: the output tensor shares storage with the input tensor. We can in fact verify this ourselves:

t1$storage()$data_ptr()

t2$storage()$data_ptr()
[1] "0x5648d02ac800"

[1] "0x5648d02ac800"

What’s different is the storage metadata torch keeps about both tensors. Here, the relevant information is the stride:

A tensor’s stride() method tracks, for every dimension, how many elements have to be traversed to arrive at its next element (row or column, in two dimensions). For t1 above, of shape 3x2, we have to skip over 2 items to arrive at the next row. To arrive at the next column though, in every row we just have to skip a single entry:

t1$stride()
[1] 2 1

For t2, of shape 2x3, the distance between column elements is the same, but the distance between rows is now 3:

t2$stride()
[1] 3 1
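
To see where these numbers come from, here is a small base R sketch that computes row-major strides from a shape and uses them to locate an element in flat storage. The helper names row_major_strides() and flat_position() are hypothetical, introduced only for this illustration; they are not part of torch.

```r
# Row-major strides: the stride of a dimension is the product of the
# sizes of all dimensions to its right.
row_major_strides <- function(shape) {
  n <- length(shape)
  strides <- numeric(n)
  running <- 1
  for (i in rev(seq_len(n))) {
    strides[i] <- running
    running <- running * shape[i]
  }
  strides
}

# 1-based position of an element in the flat storage,
# given its 1-based indices and the strides.
flat_position <- function(index, strides) {
  sum((index - 1) * strides) + 1
}

s_3x2 <- row_major_strides(c(3, 2))
s_2x3 <- row_major_strides(c(2, 3))
print(s_3x2)
print(s_2x3)

# Element [2, 1] of the 3x2 layout and element [1, 3] of the 2x3 layout
# map to the same slot in the shared storage.
pos_a <- flat_position(c(2, 1), s_3x2)
pos_b <- flat_position(c(1, 3), s_2x3)
print(c(pos_a, pos_b))
```

Under this model, both index pairs resolve to the third storage slot, which is consistent with the value 3 appearing at those positions in the two printouts above.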

While zero-copy operations are optimal, there are cases where they won’t work.

With view(), this can happen when a tensor was obtained via an operation (other than view() itself) that has already modified the stride. One example is the transpose, t():

t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))
t1
t1$stride()

t2 <- t1$t()
t2
t2$stride()
torch_tensor 
 1  2
 3  4
 5  6
[ CPUFloatType{3,2} ]

[1] 2 1

torch_tensor 
 1  3  5
 2  4  6
[ CPUFloatType{2,3} ]

[1] 1 2

In torch lingo, tensors like t2 that re-use existing storage but read it in an order different from its physical layout are said not to be “contiguous”. One way to reshape them is to call contiguous() on them first. We’ll see this in the next subsection.

Reshape with copy

In the following snippet, trying to reshape t2 using view() fails, as it already carries information indicating that the underlying data should not be read in physical order.

t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))

t2 <- t1$t()

t2$view(6) # error!
Error in (function (self, size)  : 
  view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces).
  Use .reshape(...) instead. (view at ../aten/src/ATen/native/TensorShape.cpp:1364)

However, if we first call contiguous() on it, a new tensor is created, which may then be (virtually) reshaped using view().

t3 <- t2$contiguous()

t3$view(6)
torch_tensor 
 1
 3
 5
 2
 4
 6
[ CPUFloatType{6} ]

Alternatively, we can use reshape(). reshape() defaults to view()-like behavior if possible; otherwise it will create a physical copy.

t2$storage()$data_ptr()

t4 <- t2$reshape(6)

t4$storage()$data_ptr()
[1] "0x5648d49b4f40"

[1] "0x5648d2752980"

Operations on tensors

Unsurprisingly, torch provides a bunch of mathematical operations on tensors; we’ll see some of them in the network code below, and you’ll encounter lots more when you continue your torch journey. Here, we quickly take a look at the overall tensor method semantics.

Tensor methods normally return references to new objects. Here, we add to t1 a clone of itself:

t1 <- torch_tensor(rbind(c(1, 2), c(3, 4), c(5, 6)))
t2 <- t1$clone()

t1$add(t2)
torch_tensor 
  2   4
  6   8
 10  12
[ CPUFloatType{3,2} ]

In this process, t1 has not been modified:

t1
torch_tensor 
 1  2
 3  4
 5  6
[ CPUFloatType{3,2} ]

Many tensor methods have variants for mutating operations. These all carry a trailing underscore:

t1$add_(t1)

# now t1 has been modified
t1
torch_tensor 
  2   4
  6   8
 10  12
[ CPUFloatType{3,2} ]

torch_tensor 
  2   4
  6   8
 10  12
[ CPUFloatType{3,2} ]

Alternatively, you can of course assign the new object to a new reference variable:

t3 <- t1$add(t1)
t3
torch_tensor 
  4   8
 12  16
 20  24
[ CPUFloatType{3,2} ]

There is one thing we need to discuss before we wrap up our introduction to tensors: How can we have all those operations executed on the GPU?

Running on GPU

To check if your GPU(s) is/are visible to torch, run

cuda_is_available()

cuda_device_count()
[1] TRUE

[1] 1

Tensors may be requested to live on the GPU right at creation:

device <- torch_device("cuda")

t <- torch_ones(c(2, 2), device = device) 

Alternatively, they can be moved between devices at any time:

t2 <- t$cuda()
t2$device
torch_device(type='cuda', index=0)
t3 <- t2$cpu()
t3$device
torch_device(type='cpu')

That’s almost it for our discussion of tensors. There is one torch feature that, although related to tensor operations, deserves special mention. It is called broadcasting, and “bilingual” (R + Python) users will know it from NumPy.

Broadcasting

We often have to perform operations on tensors with shapes that don’t match exactly.

Unsurprisingly, we can add a scalar to a tensor:

t1 <- torch_randn(c(3,5))

t1 + 22
torch_tensor 
 23.1097  21.4425  22.7732  22.2973  21.4128
 22.6936  21.8829  21.1463  21.6781  21.0827
 22.5672  21.2210  21.2344  23.1154  20.5004
[ CPUFloatType{3,5} ]

The same will work if we add a tensor of size 1:

t1 <- torch_randn(c(3,5))

t1 + torch_tensor(c(22))

Adding tensors of different sizes normally won’t work:

t1 <- torch_randn(c(3,5))
t2 <- torch_randn(c(5,5))

t1$add(t2) # error
Error in (function (self, other, alpha)  : 
  The size of tensor a (2) must match the size of tensor b (5) at non-singleton dimension 1 (infer_size at ../aten/src/ATen/ExpandUtils.cpp:24)

However, under certain conditions, one or both tensors may be virtually expanded so both tensors line up. This behavior is what is meant by broadcasting. The way it works in torch is not just inspired by, but actually identical to that of NumPy.

The rules are:

  1. We align array shapes, starting from the right.

    Say we have two tensors, one of size 8x1x6x1, the other of size 7x1x5.

    Here they are, right-aligned:

# t1, shape:     8  1  6  1
# t2, shape:        7  1  5

  2. Starting to look from the right, the sizes along aligned axes either have to match exactly, or one of them has to be equal to 1, in which case that one is broadcast to the size of the other.

    In the above example, this is the case for the second-from-last dimension. This now gives

# t1, shape:     8  1  6  1
# t2, shape:        7  6  5

    with broadcasting happening in t2.

  3. If on the left, one of the arrays has an additional axis (or more than one), the other is virtually expanded to have a size of 1 in that place, in which case broadcasting will happen as stated in (2).

    This is the case with t1’s leftmost dimension. First, there is a virtual expansion

# t1, shape:     8  1  6  1
# t2, shape:     1  7  1  5

    and then, broadcasting happens:

# t1, shape:     8  1  6  1
# t2, shape:     8  7  1  5

    Applying these rules to every axis of both tensors, the result of the operation has shape 8x7x6x5.
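
The three rules can be written down as a short shape calculation. The following base R sketch is only an illustration of the rules as stated here; broadcast_shape() is a hypothetical helper, not a torch function, and it works on shape vectors rather than on tensors.

```r
# Compute the shape that results from broadcasting two shapes together,
# or fail if the shapes are incompatible.
broadcast_shape <- function(shape_a, shape_b) {
  n <- max(length(shape_a), length(shape_b))

  # Rules 1 and 3: right-align by padding the shorter shape with 1s on the left.
  a <- c(rep(1, n - length(shape_a)), shape_a)
  b <- c(rep(1, n - length(shape_b)), shape_b)

  # Rule 2: aligned sizes must match, or one of them must be 1.
  ok <- (a == b) | (a == 1) | (b == 1)
  if (!all(ok)) {
    stop("shapes cannot be broadcast at dimension(s): ",
         paste(which(!ok), collapse = ", "))
  }

  # Where a has size 1, the result takes b's size; otherwise it keeps a's.
  ifelse(a == 1, b, a)
}

print(broadcast_shape(c(8, 1, 6, 1), c(7, 1, 5)))
print(broadcast_shape(c(3, 5), c(5)))

# Incompatible shapes raise an error, as in the 3x5 + 5x5 case.
res <- try(broadcast_shape(c(3, 5), c(5, 5)), silent = TRUE)
print(inherits(res, "try-error"))
```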

According to these rules, our above example

t1 <- torch_randn(c(3,5))
t2 <- torch_randn(c(5,5))

t1$add(t2)

could be modified in various ways that would allow for adding two tensors.

For example, if t2 were 1x5, it would only need to get broadcast to size 3x5 before the addition operation:

t1 <- torch_randn(c(3,5))
t2 <- torch_randn(c(1,5))

t1$add(t2)
torch_tensor 
-1.0505  1.5811  1.1956 -0.0445  0.5373
 0.0779  2.4273  2.1518 -0.6136  2.6295
 0.1386 -0.6107 -1.2527 -1.3256 -0.1009
[ CPUFloatType{3,5} ]

If it were of size 5, a virtual leading dimension would be added, and then, the same broadcasting would take place as in the previous case.

t1 <- torch_randn(c(3,5))
t2 <- torch_randn(c(5))

t1$add(t2)
torch_tensor 
-1.4123  2.1392 -0.9891  1.1636 -1.4960
 0.8147  1.0368 -2.6144  0.6075 -2.0776
-2.3502  1.4165  0.4651 -0.8816 -1.0685
[ CPUFloatType{3,5} ]

Here’s a extra complicated instance. Broadcasting how occurs each in t1 and in t2:

t1 <- torch_randn(c(1,5))
t2 <- torch_randn(c(3,1))

t1$add(t2)
torch_tensor 
 1.2274  1.1880  0.8531  1.8511 -0.0627
 0.2639  0.2246 -0.1103  0.8877 -1.0262
-1.5951 -1.6344 -1.9693 -0.9713 -2.8852
[ CPUFloatType{3,5} ]

As a nice concluding example, through broadcasting an outer product can be computed like so:

t1 <- torch_tensor(c(0, 10, 20, 30))

t2 <- torch_tensor(c(1, 2, 3))

t1$view(c(4,1)) * t2
torch_tensor 
  0   0   0
 10  20  30
 20  40  60
 30  60  90
[ CPUFloatType{4,3} ]
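
As a cross-check, the same numbers can be obtained in base R, where the outer product is available directly. This sketch uses only base R functions and mirrors the values from the example above.

```r
a <- c(0, 10, 20, 30)
b <- c(1, 2, 3)

# Built-in outer product: a 4x3 matrix with entries a[i] * b[j].
op <- outer(a, b)

# Equivalent formulation as column vector times row vector.
op_matmul <- a %*% t(b)

print(op)
```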

And now, we really get to implementing that neural network!

A simple neural network using torch tensors

Our task, which we approach in a low-level way today but considerably simplify in upcoming installments, consists of regressing a single target datum based on three input variables.

We directly use torch to simulate some data.

Toy data

library(torch)

# input dimensionality (number of input features)
d_in <- 3
# output dimensionality (number of predicted features)
d_out <- 1
# number of observations in training set
n <- 100


# create random data
# input
x <- torch_randn(n, d_in)
# target
y <- x[, 1, drop = FALSE] * 0.2 -
  x[, 2, drop = FALSE] * 1.3 -
  x[, 3, drop = FALSE] * 0.5 +
  torch_randn(n, 1)

Next, we need to initialize the network’s weights. We’ll have one hidden layer, with 32 units. The output layer’s size, being determined by the task, is equal to 1.

Initialize weights

# dimensionality of hidden layer
d_hidden <- 32

# weights connecting input to hidden layer
w1 <- torch_randn(d_in, d_hidden)
# weights connecting hidden to output layer
w2 <- torch_randn(d_hidden, d_out)

# hidden layer bias
b1 <- torch_zeros(1, d_hidden)
# output layer bias
b2 <- torch_zeros(1, d_out)

Now for the training loop proper. The training loop here really is the network.

Training loop

In each iteration (“epoch”), the training loop does four things:

  • runs through the network, computing predictions (forward pass)

  • compares those predictions to the ground truth and quantifies the loss

  • runs backwards through the network, computing the gradients that indicate how the weights should be changed

  • updates the weights, making use of the requested learning rate.

Here is the template we’re going to fill:

for (t in 1:200) {
    
    ### -------- Forward pass -------- 
    
    # here we'll compute the prediction
    
    
    ### -------- compute loss -------- 
    
    # here we'll compute the sum of squared errors
    

    ### -------- Backpropagation -------- 
    
    # here we'll pass through the network, calculating the required gradients
    

    ### -------- Update weights -------- 
    
    # here we'll update the weights, subtracting a portion of the gradients 
}

The forward pass effectuates two affine transformations, one each for the hidden and output layers. In-between, ReLU activation is applied:

  # compute pre-activations of hidden layers (dim: 100 x 32)
  # torch_mm does matrix multiplication
  h <- x$mm(w1) + b1
  
  # apply activation function (dim: 100 x 32)
  # torch_clamp cuts off values below/above given thresholds
  h_relu <- h$clamp(min = 0)
  
  # compute output (dim: 100 x 1)
  y_pred <- h_relu$mm(w2) + b2

Our loss here is the sum of squared errors:

  loss <- as.numeric((y_pred - y)$pow(2)$sum())

Calculating gradients the manual way is a bit tedious, but it can be done:

  # gradient of loss w.r.t. prediction (dim: 100 x 1)
  grad_y_pred <- 2 * (y_pred - y)
  # gradient of loss w.r.t. w2 (dim: 32 x 1)
  grad_w2 <- h_relu$t()$mm(grad_y_pred)
  # gradient of loss w.r.t. hidden activation (dim: 100 x 32)
  grad_h_relu <- grad_y_pred$mm(w2$t())
  # gradient of loss w.r.t. hidden pre-activation (dim: 100 x 32)
  grad_h <- grad_h_relu$clone()
  
  grad_h[h < 0] <- 0
  
  # gradient of loss w.r.t. b2 (shape: ())
  grad_b2 <- grad_y_pred$sum()
  
  # gradient of loss w.r.t. w1 (dim: 3 x 32)
  grad_w1 <- x$t()$mm(grad_h)
  # gradient of loss w.r.t. b1 (shape: (32, ))
  grad_b1 <- grad_h$sum(dim = 1)
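
One way to gain confidence in hand-derived gradients like these is to compare them against finite differences. The sketch below does this in base R on a tiny problem, re-implementing the same forward and backward formulas with ordinary matrices (no torch). The sizes, the seed and the helper forward() are assumptions made for this illustration, and only the gradient for w1 is checked.

```r
set.seed(42)
n <- 5; d_in <- 3; d_hidden <- 4

x  <- matrix(rnorm(n * d_in), n, d_in)
y  <- matrix(rnorm(n), n, 1)
w1 <- matrix(rnorm(d_in * d_hidden), d_in, d_hidden)
w2 <- matrix(rnorm(d_hidden), d_hidden, 1)
b1 <- rep(0, d_hidden)
b2 <- 0

# Forward pass for a given w1; returns the intermediates needed for backprop.
forward <- function(w1) {
  h <- sweep(x %*% w1, 2, b1, "+")   # pre-activations (n x d_hidden)
  h_relu <- pmax(h, 0)               # ReLU
  y_pred <- h_relu %*% w2 + b2       # predictions (n x 1)
  list(h = h, y_pred = y_pred, loss = sum((y_pred - y)^2))
}

# Analytic gradient of the sum-of-squares loss w.r.t. w1,
# following the same steps as the tensor code above.
f <- forward(w1)
grad_y_pred <- 2 * (f$y_pred - y)
grad_h <- grad_y_pred %*% t(w2)
grad_h[f$h < 0] <- 0
grad_w1 <- t(x) %*% grad_h

# Numerical gradient via central differences, one weight at a time.
eps <- 1e-6
num_grad_w1 <- matrix(0, d_in, d_hidden)
for (i in seq_len(d_in)) {
  for (j in seq_len(d_hidden)) {
    w_plus <- w1;  w_plus[i, j]  <- w_plus[i, j] + eps
    w_minus <- w1; w_minus[i, j] <- w_minus[i, j] - eps
    num_grad_w1[i, j] <-
      (forward(w_plus)$loss - forward(w_minus)$loss) / (2 * eps)
  }
}

# Should be close to zero if the derivation is right.
max_abs_diff <- max(abs(num_grad_w1 - grad_w1))
print(max_abs_diff)
```

The same pattern can be repeated for w2, b1 and b2. The check is only meaningful away from the ReLU kink at zero, which random inputs are very unlikely to hit exactly.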

The final step then uses the calculated gradients to update the weights:

  learning_rate <- 1e-4
  
  w2 <- w2 - learning_rate * grad_w2
  b2 <- b2 - learning_rate * grad_b2
  w1 <- w1 - learning_rate * grad_w1
  b1 <- b1 - learning_rate * grad_b1

Let’s use these snippets to fill in the gaps in the above template, and give it a try!

Putting it all together

library(torch)

### generate training data -----------------------------------------------------

# input dimensionality (number of input features)
d_in <- 3
# output dimensionality (number of predicted features)
d_out <- 1
# number of observations in training set
n <- 100


# create random data
x <- torch_randn(n, d_in)
y <-
  x[, 1, NULL] * 0.2 - x[, 2, NULL] * 1.3 - x[, 3, NULL] * 0.5 + torch_randn(n, 1)


### initialize weights ---------------------------------------------------------

# dimensionality of hidden layer
d_hidden <- 32
# weights connecting input to hidden layer
w1 <- torch_randn(d_in, d_hidden)
# weights connecting hidden to output layer
w2 <- torch_randn(d_hidden, d_out)

# hidden layer bias
b1 <- torch_zeros(1, d_hidden)
# output layer bias
b2 <- torch_zeros(1, d_out)

### network parameters ---------------------------------------------------------

learning_rate <- 1e-4

### training loop --------------------------------------------------------------

for (t in 1:200) {
  ### -------- Forward pass --------
  
  # compute pre-activations of hidden layers (dim: 100 x 32)
  h <- x$mm(w1) + b1
  # apply activation function (dim: 100 x 32)
  h_relu <- h$clamp(min = 0)
  # compute output (dim: 100 x 1)
  y_pred <- h_relu$mm(w2) + b2
  
  ### -------- compute loss --------

  loss <- as.numeric((y_pred - y)$pow(2)$sum())
  
  if (t %% 10 == 0)
    cat("Epoch: ", t, "   Loss: ", loss, "\n")
  
  ### -------- Backpropagation --------
  
  # gradient of loss w.r.t. prediction (dim: 100 x 1)
  grad_y_pred <- 2 * (y_pred - y)
  # gradient of loss w.r.t. w2 (dim: 32 x 1)
  grad_w2 <- h_relu$t()$mm(grad_y_pred)
  # gradient of loss w.r.t. hidden activation (dim: 100 x 32)
  grad_h_relu <- grad_y_pred$mm(
    w2$t())
  # gradient of loss w.r.t. hidden pre-activation (dim: 100 x 32)
  grad_h <- grad_h_relu$clone()
  
  grad_h[h < 0] <- 0
  
  # gradient of loss w.r.t. b2 (shape: ())
  grad_b2 <- grad_y_pred$sum()
  
  # gradient of loss w.r.t. w1 (dim: 3 x 32)
  grad_w1 <- x$t()$mm(grad_h)
  # gradient of loss w.r.t. b1 (shape: (32, ))
  grad_b1 <- grad_h$sum(dim = 1)
  
  ### -------- Update weights --------
  
  w2 <- w2 - learning_rate * grad_w2
  b2 <- b2 - learning_rate * grad_b2
  w1 <- w1 - learning_rate * grad_w1
  b1 <- b1 - learning_rate * grad_b1
  
}
Epoch:  10     Loss:  352.3585 
Epoch:  20     Loss:  219.3624 
Epoch:  30     Loss:  155.2307 
Epoch:  40     Loss:  124.5716 
Epoch:  50     Loss:  109.2687 
Epoch:  60     Loss:  100.1543 
Epoch:  70     Loss:  94.77817 
Epoch:  80     Loss:  91.57003 
Epoch:  90     Loss:  89.37974 
Epoch:  100    Loss:  87.64617 
Epoch:  110    Loss:  86.3077 
Epoch:  120    Loss:  85.25118 
Epoch:  130    Loss:  84.37959 
Epoch:  140    Loss:  83.44133 
Epoch:  150    Loss:  82.60386 
Epoch:  160    Loss:  81.85324 
Epoch:  170    Loss:  81.23454 
Epoch:  180    Loss:  80.68679 
Epoch:  190    Loss:  80.16555 
Epoch:  200    Loss:  79.67953 

This looks like it worked pretty well! It also should have fulfilled its purpose: showing what you can achieve using torch tensors alone. In case you didn’t feel like going through the backprop logic with too much enthusiasm, don’t worry: in the next installment, this will get significantly less cumbersome. See you then!
