
Linear Regression in mojo with NDBuffer

最老程序员闫涛 2024-06-26 14:23:30

Linear regression is one of the simplest machine learning algorithms. In this article I will use mojo's NDBuffer to implement a simple linear regression algorithm from scratch. I will reuse the NDArray class developed in the previous article. First, import the necessary libs and the NDArray definition:

# common imports
from String import String
from Bool import Bool
from List import VariadicList
from Buffer import NDBuffer
from List import DimList
from DType import DType
from Pointer import DTypePointer
from TargetInfo import dtype_sizeof, dtype_simd_width
from Index import StaticIntTuple
from Random import rand

# number of f32 elements per SIMD vector on this target
alias nelts = dtype_simd_width[DType.f32]()

# thin wrapper around NDBuffer that owns its backing memory
struct NDArray[rank:Int, dims:DimList, dtype:DType]:
    var ndb:NDBuffer[rank, dims, dtype]
    
    fn __init__(inout self, size:Int):
        # allocate `size` elements and wrap them in an NDBuffer
        let data = DTypePointer[dtype].alloc(size)
        self.ndb = NDBuffer[rank, dims, dtype](data)
        
    fn __getitem__(self, idxs:StaticIntTuple[rank]) -> SIMD[dtype,1]:
        # scalar load at the given multi-dimensional index
        return self.ndb.simd_load[1](idxs)
    
    fn __setitem__(self, idxs:StaticIntTuple[rank], val:SIMD[dtype,1]):
        # scalar store at the given multi-dimensional index
        self.ndb.simd_store[1](idxs, val)
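
To make sure the wrapper works, here is a minimal sanity check (a sketch assuming the same Mojo build and module layout as the imports above; it is not part of the original article, and the demo_* names are illustrative): create a 2×2 array, write one element, and read it back.

# sanity check of NDArray (illustrative sketch)
alias demo_rank = 2
var demo = NDArray[demo_rank, DimList(2, 2), DType.f32](4)
demo[StaticIntTuple[demo_rank](0, 1)] = 3.5
# should print 3.5
print(demo[StaticIntTuple[demo_rank](0, 1)])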

Let’s assume we want to figure out this function:
$$y = W \cdot X$$
Where:

  • W is the parameter
  • X is the sample design matrix. Each row is a sample, and each sample is an n-dimensional vector $\boldsymbol{x} \in R^{n}$. If we have m samples then $X \in R^{m \times n}$.
Here we will deal with a very simple toy problem. We assume $n=3$ and $m=5$. Let's define the $i$-th sample:
$$\boldsymbol{x}^{(i)} = \begin{bmatrix} x^{(i)}_{1} \\ x^{(i)}_{2} \\ x^{(i)}_{3} \end{bmatrix} \in R^{3 \times 1}, \quad i \in \{ 1, 2, 3, 4, 5 \}$$
Notes:
  • i is the index of the sample;
  • 1, 2, 3 are the subscripts of the feature dimensions;
  • m is the total number of samples. In this case m=5;
  • n is the dimension of the feature vector. In this case n=3.
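
Putting the notes together, the design matrix stacks the five samples row-wise:
$$X = \begin{bmatrix} \boldsymbol{x}^{(1)T} \\ \boldsymbol{x}^{(2)T} \\ \vdots \\ \boldsymbol{x}^{(5)T} \end{bmatrix} = \begin{bmatrix} x^{(1)}_{1} & x^{(1)}_{2} & x^{(1)}_{3} \\ \vdots & \vdots & \vdots \\ x^{(5)}_{1} & x^{(5)}_{2} & x^{(5)}_{3} \end{bmatrix} \in R^{5 \times 3}$$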

Let’s generate the dataset:

alias X_rank = 2
alias r1 = 5
alias r2 = 3
var X_size = r1 * r2
var X = NDArray[X_rank, DimList(r1, r2), DType.f32](X_size)
# generate random numbers, scale them, and store them into X
var rvs = DTypePointer[DType.f32].alloc(X_size)
rand[DType.f32](rvs, X_size)
for d1 in range(r1):
    for d2 in range(r2):
        X[StaticIntTuple[X_rank](d1, d2)] = rvs.load(d1*r2+d2)*5.0 + 1.0
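
As a quick check (a sketch, not in the original code): assuming rand fills the buffer with values in [0, 1), every entry of X should now lie in [1.0, 6.0). Printing the entries confirms this:

# optional sanity check of the generated features (illustrative only)
for d1 in range(r1):
    for d2 in range(r2):
        print(X[StaticIntTuple[X_rank](d1, d2)])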

Let's define the ground-truth parameter $\boldsymbol{w}$:
$$\boldsymbol{w} = \begin{bmatrix} 1.1 \\ 2.2 \\ 3.3 \end{bmatrix}$$

Let's define the ground-truth hypothesis function:
$$y = \boldsymbol{w}^{T} \cdot \boldsymbol{x} + b = \begin{bmatrix} 1.1 \\ 2.2 \\ 3.3 \end{bmatrix} \cdot \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \end{bmatrix} + b = 1.1 \cdot x_{1} + 2.2 \cdot x_{2} + 3.3 \cdot x_{3} + 1.8$$

Let's initialize the trainable parameter $\boldsymbol{w}$ with small values and the bias $b$ with zero:

alias w_rank = 2
alias w_r1 = 3
alias w_r2 = 1
var w = NDArray[w_rank, DimList(w_r1, w_r2), DType.f32](w_r1 * w_r2)
w[StaticIntTuple[w_rank](0,0)] = 0.01
w[StaticIntTuple[w_rank](1,0)] = 0.02
w[StaticIntTuple[w_rank](2,0)] = 0.03
var b = SIMD[DType.f32, 1](0.0)

Now we can compute the ground-truth labels $y$:

alias y_rank = 1
alias y_r1 = 5
var y = NDArray[y_rank, DimList(y_r1), DType.f32](y_r1)
for d1 in range(y_r1):
    y[StaticIntTuple[y_rank](d1)] = (1.1 * X[StaticIntTuple[X_rank](d1,0)] +
                2.2 * X[StaticIntTuple[X_rank](d1,1)] + 
                3.3 * X[StaticIntTuple[X_rank](d1,2)] + 1.8)

Let's define the function get_batch to fetch a mini-batch from the training dataset:

alias batch_size = 2
alias batch_rank = 2
# batch_idx can only be 0 or 1; the last sample in X is ignored
fn get_batch(inout batch_X:NDArray[batch_rank, DimList(batch_size, r2), DType.f32],
             inout batch_y:NDArray[y_rank, DimList(batch_size), DType.f32],
             X:NDArray[X_rank, DimList(r1, r2), DType.f32], 
             y:NDArray[y_rank, DimList(y_r1), DType.f32],
             batch_idx:Int):
    for b_idx in range(batch_size):
        # copy the label of the (batch_size*batch_idx + b_idx)-th sample
        batch_y[StaticIntTuple[y_rank](b_idx)] = y[StaticIntTuple[y_rank](batch_size*batch_idx+b_idx)]
        for c_idx in range(r2):
            # copy the corresponding feature row
            batch_X[StaticIntTuple[batch_rank](b_idx, c_idx)] = X[StaticIntTuple[X_rank](batch_size*batch_idx+b_idx, c_idx)]
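
A quick usage sketch (the demo_* names are my own, not from the original): fetch the first mini-batch and print its labels, which should equal y[0] and y[1]:

# fetch mini-batch 0 and inspect its labels (illustrative)
var demo_bX = NDArray[batch_rank, DimList(batch_size, r2), DType.f32](batch_size*r2)
var demo_by = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
get_batch(demo_bX, demo_by, X, y, 0)
print(demo_by[StaticIntTuple[y_rank](0)], demo_by[StaticIntTuple[y_rank](1)])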

Let's discuss the math behind linear regression. For the $i$-th sample we will omit the $(i)$ superscript for simplicity. The predicted label $\hat{y}$ is:
$$\hat{y} = w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b$$
Since we have the ground-truth label $y$, we define the loss function as:
$$\mathcal{l} = \frac{1}{2}(\hat{y}-y)^{2} = \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2}$$
Following the linear regression algorithm, we initialize the parameter $\boldsymbol{w}$ with small values and $b$ with zero. We compute $\hat{y}$ with this parameter setting and then the loss, which measures how good the parameters are. Our task is to find the parameter setting that minimizes the loss:
$$\arg\min_{\boldsymbol{w},b} \mathcal{l}$$
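
To make the formulas concrete, here is a small helper (a sketch; sample_loss and its signature are my own, not from the original) that computes $\hat{y}$ and the loss for a single sample i:

# per-sample prediction and squared-error loss (illustrative helper)
fn sample_loss(w:NDArray[w_rank, DimList(w_r1, w_r2), DType.f32],
               b:SIMD[DType.f32, 1],
               X:NDArray[X_rank, DimList(r1, r2), DType.f32],
               y:NDArray[y_rank, DimList(y_r1), DType.f32],
               i:Int) -> SIMD[DType.f32, 1]:
    # y_hat = w1*x1 + w2*x2 + w3*x3 + b
    let y_hat = (w[StaticIntTuple[w_rank](0,0)]*X[StaticIntTuple[X_rank](i,0)] +
                 w[StaticIntTuple[w_rank](1,0)]*X[StaticIntTuple[X_rank](i,1)] +
                 w[StaticIntTuple[w_rank](2,0)]*X[StaticIntTuple[X_rank](i,2)] + b)
    # l = 0.5 * (y_hat - y)^2
    let diff = y_hat - y[StaticIntTuple[y_rank](i)]
    return diff * diff * 0.5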

To minimize the loss we compute the gradient of the loss with respect to each individual parameter and adjust that parameter against its gradient direction. This is the gradient descent algorithm. Let's derive the gradient of the loss with respect to $w_{1}$:
$$\frac{\partial \mathcal{l}}{\partial w_{1}} = \frac{\partial \big( \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2} \big)}{\partial w_{1}} \\ = \frac{\partial \big( \frac{1}{2}(w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)^{2} \big)}{\partial (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)} \cdot \frac{\partial (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)}{\partial w_{1}} \\ = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{1}$$
We used the chain rule in the formula above. We can obtain the gradient of the loss with respect to every parameter in the same way:
$$\frac{\partial \mathcal{l}}{\partial w_{1}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{1} \\ \frac{\partial \mathcal{l}}{\partial w_{2}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{2} \\ \frac{\partial \mathcal{l}}{\partial w_{3}} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y) \cdot x_{3} \\ \frac{\partial \mathcal{l}}{\partial b} = (w_{1} \cdot x_{1} + w_{2} \cdot x_{2} + w_{3} \cdot x_{3} + b - y)$$
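
Translating these formulas directly into code for the first sample (a sketch; err and grad_* are illustrative names, and the real training loop below works on mini-batches instead):

# gradients of the loss for sample 0 (illustrative)
let err = (w[StaticIntTuple[w_rank](0,0)]*X[StaticIntTuple[X_rank](0,0)] +
           w[StaticIntTuple[w_rank](1,0)]*X[StaticIntTuple[X_rank](0,1)] +
           w[StaticIntTuple[w_rank](2,0)]*X[StaticIntTuple[X_rank](0,2)] +
           b - y[StaticIntTuple[y_rank](0)])
let grad_w1 = err * X[StaticIntTuple[X_rank](0,0)]   # dl/dw1
let grad_w2 = err * X[StaticIntTuple[X_rank](0,1)]   # dl/dw2
let grad_w3 = err * X[StaticIntTuple[X_rank](0,2)]   # dl/dw3
let grad_b = err                                     # dl/db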

Assuming the learning rate is $\alpha$, we can update the parameters:
$$w_{1} := w_{1} - \alpha \cdot \frac{\partial \mathcal{l}}{\partial w_{1}} \\ w_{2} := w_{2} - \alpha \cdot \frac{\partial \mathcal{l}}{\partial w_{2}} \\ w_{3} := w_{3} - \alpha \cdot \frac{\partial \mathcal{l}}{\partial w_{3}} \\ b := b - \alpha \cdot \frac{\partial \mathcal{l}}{\partial b}$$
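
One detail worth noting: the derivation above is for a single sample, while the training loop below processes a mini-batch of B = 2 samples and simply sums the per-sample gradients before each update, e.g. for $w_{1}$:
$$\frac{\partial \mathcal{l}_{batch}}{\partial w_{1}} = \sum_{i=1}^{B} (\hat{y}^{(i)} - y^{(i)}) \cdot x^{(i)}_{1}$$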

let epochs = 1000
var y_hat = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
alias loss_rank = 1
var loss = NDArray[loss_rank, DimList(batch_size), DType.f32](batch_size)
var coff = 0.5
var Xi = NDArray[batch_rank, DimList(batch_size, r2), DType.f32](batch_size*r2)
var yi = NDArray[y_rank, DimList(batch_size), DType.f32](batch_size)
var lr = 0.001
for epoch in range(epochs):
    # the number of mini-batches (2) happens to equal batch_size here
    for bidx in range(batch_size):
        get_batch(Xi, yi, X, y, bidx)
        # forward pass: y_hat = w1*x1 + w2*x2 + w3*x3 + b for both samples in the batch
        y_hat[StaticIntTuple[y_rank](0)] = (
                    w[StaticIntTuple[w_rank](0,0)]*Xi[StaticIntTuple[X_rank](0,0)] + 
                    w[StaticIntTuple[w_rank](1,0)]*Xi[StaticIntTuple[X_rank](0,1)] + 
                    w[StaticIntTuple[w_rank](2,0)]*Xi[StaticIntTuple[X_rank](0,2)] +
                    b)
        y_hat[StaticIntTuple[y_rank](1)] = (
                    w[StaticIntTuple[w_rank](0,0)]*Xi[StaticIntTuple[X_rank](1,0)] + 
                    w[StaticIntTuple[w_rank](1,0)]*Xi[StaticIntTuple[X_rank](1,1)] + 
                    w[StaticIntTuple[w_rank](2,0)]*Xi[StaticIntTuple[X_rank](1,2)] +
                    b)
        # calculate the loss against the labels of this mini-batch (yi, not y)
        loss[StaticIntTuple[loss_rank](0)] = coff*(
                (yi[StaticIntTuple[y_rank](0)]-y_hat[StaticIntTuple[y_rank](0)])*
                (yi[StaticIntTuple[y_rank](0)]-y_hat[StaticIntTuple[y_rank](0)])
        )
        loss[StaticIntTuple[loss_rank](1)] = coff*(
                (yi[StaticIntTuple[y_rank](1)]-y_hat[StaticIntTuple[y_rank](1)])*
                (yi[StaticIntTuple[y_rank](1)]-y_hat[StaticIntTuple[y_rank](1)])
        )
        # gradients: sum of per-sample gradients over the mini-batch, then update
        g_w1 = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,0)] + 
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,0)])
        w[StaticIntTuple[w_rank](0,0)] -= lr*g_w1
        g_w2 = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,1)] + 
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,1)])
        w[StaticIntTuple[w_rank](1,0)] -= lr*g_w2
        g_w3 = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)])*Xi[StaticIntTuple[X_rank](0,2)] + 
                (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)])*Xi[StaticIntTuple[X_rank](1,2)])
        w[StaticIntTuple[w_rank](2,0)] -= lr*g_w3
        g_b = ((y_hat[StaticIntTuple[y_rank](0)]-yi[StaticIntTuple[y_rank](0)]) + 
               (y_hat[StaticIntTuple[y_rank](1)]-yi[StaticIntTuple[y_rank](1)]))
        b -= lr*g_b
        # total loss of the mini-batch
        loss_val = loss[StaticIntTuple[loss_rank](0)] + loss[StaticIntTuple[loss_rank](1)]
        print('epoch_', epoch, ': idx=', bidx, ' loss=', loss_val, 
              '; w1=', w[StaticIntTuple[w_rank](0,0)], 
              ', w2=', w[StaticIntTuple[w_rank](1,0)], 
              ', w3=', w[StaticIntTuple[w_rank](2,0)], ', b=', b, ';')
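
After training, a quick check (a sketch, not in the original) is to compare the learned parameters with the ground truth used to generate the labels, i.e. w = (1.1, 2.2, 3.3) and b = 1.8:

# compare learned parameters against the ground truth (illustrative)
print('learned w1=', w[StaticIntTuple[w_rank](0,0)], ' target=1.1')
print('learned w2=', w[StaticIntTuple[w_rank](1,0)], ' target=2.2')
print('learned w3=', w[StaticIntTuple[w_rank](2,0)], ' target=3.3')
print('learned b =', b, ' target=1.8')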