<![CDATA[Utkal's blog]]>https://ukc.hashnode.devRSS for NodeFri, 11 Oct 2024 02:26:51 GMT60<![CDATA[Feature Scaling in Machine Learning]]>https://ukc.hashnode.dev/feature-scaling-in-machine-learninghttps://ukc.hashnode.dev/feature-scaling-in-machine-learningWed, 31 Jul 2024 22:20:43 GMT<![CDATA[<h1 id="heading-what-is-feature-scaling">What is Feature scaling?</h1><p>Feature scaling is a method used to normalize the range of independent variables or features of data. By scaling the features, we ensure that each feature contributes equally to the result, improving the performance and accuracy of machine learning models.</p><p>In data processing, it is also known as data normalization and is crucial for algorithms that are sensitive to the magnitudes of data, such as gradient descent and k-nearest neighbors.</p><p>In this blog, we will look at the need for Feature Scaling, its importance, and the different types of feature scaling used in Machine learning.</p><h2 id="heading-need-of-feature-scaling-in-ml">Need of Feature Scaling in ML</h2><p>When the data includes features with significantly different ranges, some features may disproportionately influence the model's output, potentially impacting its performance and accuracy. To enhance both efficiency and accuracy, it is essential to transform these numerical features to a common scale using feature scaling techniques.</p><p>For example, consider building a machine learning model to predict house prices from features such as the number of bedrooms (0-5), size (in 1000 sqft.), age of the house (1-100 years) and more.</p><p>The machine learning model assigns weights (\(w\)) to the independent variables according to their data points. If the difference between the feature ranges is large, the model may assign more weight (\(w\)) to the feature with larger magnitudes, which can mislead the model into treating that feature as more important. 
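</p><p>To see how larger magnitudes can dominate, here is a minimal NumPy sketch (with made-up values) of the Euclidean distance between two houses, the quantity that distance-based algorithms like k-nearest neighbors rely on:</p><pre><code class="lang-python">import numpy as np

# Two houses: [size in sqft., number of bedrooms] (made-up values)
house_a = np.array([2000.0, 5.0])
house_b = np.array([300.0, 1.0])

# The raw distance is driven almost entirely by the size feature
raw_distance = np.linalg.norm(house_a - house_b)
print(raw_distance)  # ~1700.0 -- the bedroom difference barely matters

# After min-max scaling both features to [0, 1], they contribute comparably
scaled_a = np.array([1.0, 1.0])  # (2000-300)/(2000-300) and (5-1)/(5-1)
scaled_b = np.array([0.0, 0.0])
scaled_distance = np.linalg.norm(scaled_a - scaled_b)
print(scaled_distance)  # ~1.414
</code></pre><p>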
A model with large weights (\(w\)) assigned to some features can often produce poor results.</p><p>To understand better, let's take an example -</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722458750109/24a9f1e9-5769-45db-8091-2bbd6cb5788c.png" alt class="image--center mx-auto" /></p><p>This is a plot of two features (number of bedrooms and size) of the houses, which will be used to build a model to predict housing prices. Clearly, the two features have very different ranges.</p><ul><li><p><strong>size (in sqft.)</strong>- \(300\leq x_{1}\leq2000\)</p></li><li><p><strong>number of bedrooms</strong>- \(1\leq x_{2}\leq5\)</p></li></ul><p>If we train our machine learning model on this dataset, it might assign a heavy weight (\(w\)) to \(x_{1}\) (size), as it has larger magnitudes, which might lessen the contribution of \(x_{2}\) (no. of bedrooms) to the housing prices. So, the model might produce poor results.</p><p>To train our machine learning model to produce more accurate results, we need to apply feature scaling to the features.</p><h2 id="heading-types-of-feature-scaling">Types of Feature Scaling</h2><p>There are 3 types of feature scaling we use -</p><ul><li><p>Min-max Normalization</p></li><li><p>Mean Normalization</p></li><li><p>Standardization</p></li></ul><h3 id="heading-min-max-normalization">Min-max Normalization</h3><p>This technique rescales a feature so that its values lie between 0 and 1.</p><p>$$x_{scaled} = \frac{x - min(x)}{max(x) - min(x)}$$</p><p>Applying it to our considered example -</p><p>\(x_{1scaled} = \frac{x_{1} - 300}{2000 - 300}\)</p><p>\(x_{2scaled} = \frac{x_{2} - 1}{5 - 1}\)</p><p>New ranges -</p><p>\(0 \leq x_{1scaled} \leq 1\)</p><p>\(0 \leq x_{2scaled} \leq 1\)</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722462840432/278b59c2-fc98-4a38-a9ab-99fe86cb0ccd.png" alt class="image--center mx-auto" /></p><h3 id="heading-mean-normalization">Mean Normalization</h3><p>This technique subtracts the mean of each feature from its values, bringing all features into a similar range centred around zero.</p><p>$$x_{rescaled} = \frac{x - \mu}{max(x)-min(x)}$$</p><p>where \(\mu\) is the mean of \(x\)</p><p>Applying it to our considered example -</p><p>\(x_{1rescaled} = \frac{x_{1} - 1183}{2000-300}\) (\(\mu_{x_{1}} = 1183\))</p><p>\(x_{2rescaled} = \frac{x_{2} - 2.45}{5-1}\) (\(\mu_{x_{2}} = 2.45\))</p><p>New ranges -</p><p>\(-0.519 \leq x_{1rescaled} \leq 0.480\)</p><p>\(-0.362 \leq x_{2rescaled} \leq 0.637\)</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722462641594/947596cf-8ac9-43de-8238-a2f546fc8196.png" alt class="image--center mx-auto" /></p><h3 id="heading-standardization">Standardization</h3><p>When the same process as in mean normalization is followed, but the standard deviation is used as the denominator, the technique is called standardization.</p><p>$$x_{rescaled} = \frac{x - \mu}{\sigma}$$</p><p>where,</p><ul><li><p>\(\mu\) is the mean of \(x\)</p></li><li><p>\(\sigma\) is the standard deviation of \(x\)</p></li></ul><p>This technique is mostly used when the distribution of the data follows a *Gaussian distribution.</p><p>*(a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean - <a target="_blank" href="https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcRpZkelUOBTmrxYq_hKy4NDXAa7LLW6SMxWMw&s">example</a>)</p><p>Applying it to our example -</p><p>\(x_{1rescaled} = \frac{x_{1} - 1183}{569.485}\) (\(\mu_{1} = 1183\) and \(\sigma_{1} = 569.485\))</p><p>\(x_{2rescaled} = \frac{x_{2} - 2.45}{1.283}\) (\(\mu_{2} = 2.45\) and \(\sigma_{2} = 1.283\))</p><p>New ranges -</p><p>\(-1.550 \leq x_{1rescaled} \leq 1.434\)</p><p>\(-1.129 \leq x_{2rescaled} \leq 1.986\)</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1722462517812/cb7aa372-f275-4c03-b6cc-728ccd4499a5.png" alt 
class="image--center mx-auto" /></p><h2 id="heading-conclusion">Conclusion</h2><p>By applying these techniques, we bring all features into similar ranges, ensuring that each feature contributes equally to the output and that no single feature disproportionately influences the model. This leads to more balanced and accurate predictions and enhances the performance of the machine learning model. Feature scaling is thus a crucial step in the data preprocessing phase for building more accurate machine learning models.</p><p><em>If you have any questions or need further clarification on any of the topics discussed, feel free to leave a comment below or reach out to me directly. Let's learn and grow together!</em></p><p>LinkedIn- <a target="_blank" href="https://www.linkedin.com/in/utkal-kumar-das-785074289/">https://www.linkedin.com/in/utkal-kumar-das-785074289/</a></p><p>To further explore the world of machine learning, here are some recommended resources:</p><ul><li><p>Coursera: Machine Learning by Andrew Ng- <a target="_blank" href="https://www.coursera.org/learn/machine-learning">https://www.coursera.org/learn/machine-learning</a></p></li><li><p>Towards Data Science- <a target="_blank" href="https://towardsdatascience.com/">https://towardsdatascience.com/</a></p></li></ul>]]>https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/-WXQm_NTK0U/upload/55afa5bf6d3b778cd9f5202a02dccf12.jpeg<![CDATA[Understanding Linear Regression in Machine Learning]]>https://ukc.hashnode.dev/understanding-linear-regression-in-machine-learninghttps://ukc.hashnode.dev/understanding-linear-regression-in-machine-learningTue, 23 Jul 2024 18:54:26 GMT<![CDATA[<p>Regression in ML is a supervised learning algorithm which computes a relationship between dependent and independent variables.</p><p>It is most often used to predict a continuous output (in most cases, a number).</p><p>There are two types of regressions we use in machine learning. 
They are-</p><ul><li><p>Linear Regression</p></li><li><p>Logistic Regression</p></li></ul><p>In this blog, we'll learn the basics of Linear regression and implement it in Python.</p><h1 id="heading-linear-regression">Linear Regression</h1><p>Linear regression is a type of supervised machine learning algorithm which computes a linear relationship between a dependent variable and other independent variables.</p><p>Linear regression is of two types -</p><ul><li><p>Simple Linear Regression - It involves only one dependent variable and one independent variable.</p></li><li><p>Multiple Linear Regression - It involves one dependent variable and more than one independent variable.</p></li></ul><h1 id="heading-simple-linear-regression">Simple Linear Regression</h1><p>A simple linear regression computes a relationship between one dependent variable and one independent variable. It is represented by -</p><p>$$\hat{y} = w \cdot x + b$$</p><p>where:</p><ul><li><p>\(\hat{y}\) is the dependent variable (output)</p></li><li><p>\(x\) is the independent variable (input)</p></li><li><p>\(w\) is the slope</p></li><li><p>\(b\) is the intercept</p></li></ul><p>** \(y\) <em>and</em> \(\hat{y}\) <em>are two different terms, where:</em></p><ul><li><p>\(y\) <em>is the true value (used to train the model)</em></p></li><li><p>\(\hat{y}\) <em>is the output value obtained from the linear regression model</em></p></li></ul><p>The goal of a simple linear regression algorithm is to find the best-fit line equation between the inputs and outputs. 
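</p><p>As a quick illustration (with made-up, unfitted parameter values), the prediction \(\hat{y} = w \cdot x + b\) can be computed directly:</p><pre><code class="lang-python">import numpy as np

# Illustrative parameters -- not fitted values
w = 200.0  # slope
b = 100.0  # intercept

# Sizes (in 1000s of sqft.)
x = np.array([1.0, 1.5, 2.0])

# Element-wise prediction y_hat = w*x + b
y_hat = w * x + b
print(y_hat)  # [300. 400. 500.]
</code></pre><p>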
As the name 'best-fit line' suggests, the error between the predicted values and the actual values should be minimal.</p><p><a target="_blank" href="https://media.geeksforgeeks.org/wp-content/uploads/20231129130431/11111111.png"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721687447458/2e6ad4b7-5c4e-4e1f-96d9-ccdfa6d86bbe.png" alt class="image--center mx-auto" /></a></p><p><em>Here, Y is the output variable and X is the input variable.</em></p><p>Linear regression is a model that performs the task of predicting the output \(\hat{y}\) based on the given input \(x\).</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721688863900/1370ede3-e2c1-4098-9f71-0666a10948b4.png" alt class="image--center mx-auto" /></p><h2 id="heading-predicting-the-values-of-w-and-b">Predicting the values of \(w\) and \(b\)</h2><p>In order to achieve the equation (<em>of the best-fit line</em>) that predicts the output value (\(\hat{y}\)) such that the error between the predicted values (\(\hat{y}\)) and the true values (\(y\)) is minimum, we need to update the values of \(w\) and \(b\).</p><h3 id="heading-cost-function">Cost Function</h3><p>A cost function is simply a measure of the error between the predicted values (\(\hat{y}\)) and the true values (\(y\)).</p><p>In Linear regression, we use the <strong>Mean Squared Error</strong> (MSE) cost function, which computes half the average of the squared errors between the predicted values (\(\hat{y}_i\)) and true values (\(y_i\)). The purpose of the cost function is to determine the values of \(w\) and \(b\) that minimize the error of the linear regression model.</p><p>The MSE cost function can be calculated as:</p><p>$$J(w,b) = \frac{1}{2m}\sum_{i=1}^m(\hat{y}_i-y_i)^2$$</p><p>where:</p><ul><li><p>\(J(w,b)\) represents the cost function.</p></li><li><p>\(m\) is the total number of training examples.</p></li><li><p>\(i = 1, 2, 3, \ldots, m\)</p></li></ul><p>The graph of the cost function \(J(w,b)\) is bowl-shaped.</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721698315849/a2e6437a-8f0e-4f17-9fb7-c790566b9517.png" alt class="image--center mx-auto" /></p><p>So, the values of \(w\) and \(b\) for which \(J(w,b)\) is minimum are the best fit for the Linear regression model.</p><h3 id="heading-gradient-descent">Gradient Descent</h3><p>Gradient descent is an optimization algorithm used to train the Linear regression model: it reduces the cost function to its minimum by modifying the parameters \(w\) and \(b\) iteratively. The idea is to start with random values of \(w\) and \(b\) and then iteratively update them to reach the minimum of \(J(w,b)\).</p><p>The algorithm of Gradient descent:</p><p><em>repeat till convergence</em> {</p><p>\(w = w - \alpha\frac{\partial}{\partial w}J(w,b)\)</p><p>\(b = b - \alpha\frac{\partial}{\partial b}J(w,b)\)</p><p>}</p><p>Differentiating \(J\) with respect to \(w\) (the factor of 2 from the chain rule cancels the 2 in the denominator):</p><p>\(\frac{\partial}{\partial w}J(w,b)\)</p><p>\(= \frac{\partial}{\partial w}\sum_{i=1}^m\frac{1}{2m}(\hat{y}_i-y_i)^2\)</p><p>\(= \frac{1}{2m}\frac{\partial}{\partial w}\sum_{i=1}^m((w \cdot x_i + b)-y_i)^2\)</p><p>\(= \frac{1}{m}\sum_{i=1}^m(\hat{y}_i-y_i)\cdot x_i\)</p><p>Differentiating \(J\) with respect to \(b\):</p><p>\(\frac{\partial}{\partial b}J(w,b)\)</p><p>\(= \frac{\partial}{\partial b}\sum_{i=1}^m\frac{1}{2m}(\hat{y}_i-y_i)^2\)</p><p>\(= \frac{1}{2m}\frac{\partial}{\partial b}\sum_{i=1}^m((w \cdot x_i + b)-y_i)^2\)</p><p>\(= \frac{1}{m}\sum_{i=1}^m(\hat{y}_i-y_i)\)</p><p>The Gradient descent algorithm (after putting in the values of the partial derivatives):</p><p><em>repeat till convergence</em> {</p><p>\(w = w - \alpha\frac{1}{m}\sum_{i=1}^m(\hat{y}_i-y_i)\cdot x_i\)</p><p>\(b = b - \alpha\frac{1}{m}\sum_{i=1}^m(\hat{y}_i-y_i)\)</p><p>}</p><p>where:</p><ul><li><p>\(w\) and \(b\) are the parameters of the linear regression model</p></li><li><p>\(\alpha\) is the learning 
rate</p></li></ul><p><strong>Learning Rate (</strong>\(\alpha\)) <strong>-</strong></p><p>The learning rate (\(\alpha\)) is a constant that multiplies the partial derivative term and determines how fast we reach the minimum of \(J(w,b)\).</p><p>If \(\alpha\) is too small, gradient descent will be slow and take a long time to reach the minimum of \(J(w,b)\).</p><p>If \(\alpha\) is too large, gradient descent may overshoot and never reach the minimum of \(J(w,b)\).</p><p>Therefore, we should choose a learning rate (\(\alpha\)) that's neither too small nor too large.</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721724088376/920959bf-7b24-4639-93cc-a71e24df1640.png" alt class="image--center mx-auto" /></p><p>That covers the mathematics of Linear regression; now let's go through the Python implementation of the Linear regression model.</p><h2 id="heading-python-implementation-of-linear-regression-model">Python Implementation of Linear regression model</h2><p>To understand the Linear regression model, we are going to use basic Python with libraries such as NumPy and Matplotlib.</p><p>Let's take a small dataset and develop a Linear regression model for it.</p><div class="hn-table"><table><thead><tr><td>Size (in 1000 sq ft.)</td><td>Price (in 1000s of dollars)</td></tr></thead><tbody><tr><td>1.0</td><td>300</td></tr><tr><td>1.5</td><td>360</td></tr><tr><td>2.0</td><td>500</td></tr><tr><td>2.75</td><td>540</td></tr><tr><td>3.0</td><td>650</td></tr></tbody></table></div><p>First, we will import all the necessary libraries.</p><pre><code class="lang-python">import math, copy
import numpy as np
import matplotlib.pyplot as plt</code></pre><p>Enter the data and store it in arrays, which will be used to train the linear regression model.</p><pre><code class="lang-python">x_train = np.array([1.0, 1.5, 2.0, 2.75, 3.0])
y_train = np.array([300.0, 360.0, 500.0, 540.0, 650.0])
m = x_train.shape[0]  # total number of training examples</code></pre><p>Now, computing the Linear regression model by taking some random values of the parameters \(w\) and \(b\).</p><pre><code class="lang-python">w = 100
b = 100

def compute_model_output(x, w, b):
    m = x.shape[0]
    y_hat = np.zeros(m)
    for i in range(m):
        y_hat[i] = w*x[i] + b  # predicted values
    return y_hat

temp_y_hat = compute_model_output(x_train, w, b)</code></pre><p>Let's plot the graph and see how well our Linear regression model fits.</p><pre><code class="lang-python">plt.plot(x_train, temp_y_hat, c='b', label="our predictions")
plt.scatter(x_train, y_train, marker='x', c='r')
plt.title("Housing prices")
plt.ylabel("Price (in 1000s of dollars)")
plt.xlabel("Size (in 1000 sqft.)")
plt.show()</code></pre><p><img 
src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721730641843/34a3c6df-5a75-440f-8670-ebe0caba082c.jpeg" alt class="image--center mx-auto" /></p><p>It's clearly visible that this is not the best-fit Linear regression model for our dataset. We need to update the values of the parameters \(w\) and \(b\).</p><p>Now we'll calculate the MSE cost function to find the error between the predicted and true values.</p><pre><code class="lang-python">def compute_cost(x, y, w, b):
    m = x.shape[0]
    cost_sum = 0
    for i in range(m):
        y_hat = w*x[i] + b
        cost = (y_hat - y[i])**2
        cost_sum += cost
    total_cost = cost_sum / (2*m)
    return total_cost

a = compute_cost(x_train, y_train, w, b)
print(a)</code></pre><p>$$15182.5$$</p><p>The value of the cost function is too high, so we need to run the Gradient descent algorithm to update the values of \(w\) and \(b\) and minimize the cost function.</p><p>Let's now write the code for the Gradient descent algorithm (which is the main objective of Linear regression).</p><pre><code class="lang-python"># computing the gradient (partial derivatives)
def compute_gradient(x, y, w, b):
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0
    for i in range(m):
        y_hat = w*x[i] + b
        dj_dw_i = (y_hat - y[i])*x[i]
        dj_db_i = (y_hat - y[i])
        dj_dw += dj_dw_i
        dj_db += dj_db_i
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_dw, dj_db</code></pre><pre><code class="lang-python">def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    # x = input data
    # y = target values
    # w_in and b_in = initial values of w and b
    # alpha = learning rate
    # num_iters = number of iterations to run gradient descent
    # cost_function = function to compute cost
    # gradient_function = function to compute gradient
    J_history = []  # history of cost values
    p_history = []  # history of parameters w and b
    w = w_in
    b = b_in
    for i in range(num_iters):
        dj_dw, dj_db = gradient_function(x, y, w, b)
        w = w - (alpha * dj_dw)
        b = b - (alpha * dj_db)
        # Save cost J at each iteration
        if i < 100000:  # to prevent resource exhaustion
            J_history.append(cost_function(x, y, w, b))
            p_history.append([w, b])
        # Print the cost at 10 evenly spaced intervals
        if i % math.ceil(num_iters/10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e} ",
                  f"w: {w: 0.3e}, b:{b: 0.5e}")
    return w, b, J_history, p_history

iterations = 10000
tmp_alpha = 0.01
w_final, b_final, J_hist, p_hist = gradient_descent(x_train, y_train, w, b, tmp_alpha, iterations, compute_cost, compute_gradient)
print(f"(w,b) found by gradient descent: ({w_final:8.4f},{b_final:8.4f})")</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721737579942/9ddba0a6-cc11-45ed-ab6d-11e615fcd217.jpeg" alt class="image--center mx-auto" /></p><p>Now we have the best-fit values of the parameters \(w\) and \(b\), which give the most optimized Linear regression model for the given dataset.</p><p>Let's plot the graph again with the obtained values of the parameters \(w\) and \(b\).</p><pre><code class="lang-python">temp_y_hat = compute_model_output(x_train, w_final, b_final)

plt.plot(x_train, temp_y_hat, c='b', label="our predictions")
plt.scatter(x_train, y_train, marker='x', c='r')
plt.title("Housing prices")
plt.ylabel("Price (in 1000s of dollars)")
plt.xlabel("Size (in 1000 sqft.)")
plt.show()</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721739213687/779cb1b4-4d62-4d82-b92f-81e20858e0d4.jpeg" alt class="image--center mx-auto" /></p><p>So, now we have the best-fit line, which means we have successfully trained the Linear regression model for our dataset. We can now enter new test values (size in 1000 sqft.) and the model will give us the output (price in 1000s of dollars) with minimum error.</p><pre><code class="lang-python">x_test = np.array([2.75])
y_hat_test = compute_model_output(x_test, w_final, b_final)
print(f"The predicted price for a house of size {x_test[0]*1000} sqft. is ${round(y_hat_test[0]*1000, 2)}")</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721740307603/0e6d2bc7-428c-42dd-aaa4-2b3f8d1f0cd7.png" alt class="image--center mx-auto" /></p><h1 id="heading-multiple-linear-regression">Multiple Linear Regression</h1><p>A multiple linear regression computes a relationship between one dependent variable and more than one independent variable. It is represented by -</p><p>$$\hat{y} = \vec{w} \cdot \vec{x} + b$$</p><p>where:</p><ul><li><p>\(\hat{y}\) is the dependent variable (output)</p></li><li><p>\(\vec{x}\) is the vector of all the independent variables (inputs)</p></li><li><p>\(\vec{w}\) is a vector of parameters corresponding to each \(x\)</p><p> <em>(the value of each</em> \(w\) <em>corresponding to an</em> \(x\) <em>(feature) depends on how much it affects the output result)</em></p></li><li><p>\(b\) is another parameter</p></li></ul><p>To understand Multiple linear regression, we can take the help of the previous example of predicting the housing price, where there was only one independent variable (\(x\)) (size of the house) and one dependent variable (\(\hat{y}\)) (price). 
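</p><p>The vector form above can be sketched with NumPy's dot product; the parameter values below are purely illustrative, not fitted:</p><pre><code class="lang-python">import numpy as np

# Features of one house: [size in 1000 sqft., age in years] (illustrative)
x_vec = np.array([2.0, 10.0])

# Illustrative parameter vector and intercept
w_vec = np.array([250.0, -2.0])  # a negative weight: older houses sell for less
b = 50.0

# y_hat = w . x + b
y_hat = np.dot(w_vec, x_vec) + b
print(y_hat)  # 530.0 (price in 1000s of dollars)
</code></pre><p>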
But we know that the Price of the house depends on a lot of factors other than size, like the age of the house, location and more.</p><p>So, to predict the price of the house more accurately, we have to consider more than one independent variable, such as age and size together, to build a Linear regression model.</p><p>The method to train a Multiple linear regression model is mostly the same as that of a Simple linear regression model, except that in place of all the scalar calculations, we use vector calculations. Later, in the upcoming blogs, I'll discuss more about Multiple Linear regression.</p><h1 id="heading-conclusion">Conclusion</h1><p>So, that covers the essentials of Linear Regression in Machine Learning. After reading this blog, you should now have a basic understanding of Linear Regression and be able to implement it in Python.</p><p><em>If you have any questions or need further clarification on any of the topics discussed, feel free to leave a comment below or reach out to me directly. Let's learn and grow together!</em></p><p>LinkedIn: <a target="_blank" href="https://www.linkedin.com/in/utkal-kumar-das-785074289">https://www.linkedin.com/in/utkal-kumar-das-785074289</a></p><p>To further explore the world of machine learning, here are some recommended resources:</p><ul><li><p>Coursera: Machine Learning by Andrew Ng - <a target="_blank" href="https://www.coursera.org/learn/machine-learning">https://www.coursera.org/learn/machine-learning</a></p></li><li><p>Towards Data Science - <a target="_blank" href="https://towardsdatascience.com/">https://towardsdatascience.com/</a></p></li></ul>]]><![CDATA[<p>Regression in ML is a supervised learning algorithm which computes a relationship between dependent and independent variables.</p><p>It is most often used to predict an output (in most cases, a number) from multiple possible outputs.</p><p>There are two types of regressions we use in machine learning.
They are -</p><ul><li><p>Linear Regression</p></li><li><p>Logistic Regression</p></li></ul><p>In this blog, we'll learn the basics of Linear regression and implement it in Python.</p><h1 id="heading-linear-regression">Linear Regression</h1><p>Linear regression is a type of supervised machine learning algorithm which computes a linear relationship between a dependent variable and other independent variables.</p><p>Linear regression is of two types -</p><ul><li><p>Simple Linear Regression - It involves only one dependent variable and one independent variable.</p></li><li><p>Multiple Linear Regression - It involves one dependent variable and more than one independent variable.</p></li></ul><h1 id="heading-simple-linear-regression">Simple Linear Regression</h1><p>A simple linear regression computes a relationship between one dependent variable and one independent variable. It is represented by -</p><p>$$\hat{y} = w.x + b$$</p><p>where:</p><ul><li><p>\(\hat{y}\) is the dependent variable (output)</p></li><li><p>\(x\) is the independent variable (input)</p></li><li><p>\(w\) is the slope</p></li><li><p>\(b\) is the intercept</p></li></ul><p><em>Note:</em> \(y\) <em>and</em> \(\hat{y}\) <em>are two different terms, where:</em></p><ul><li><p>\(y\) <em>is the true value (used to train the model)</em></p></li><li><p>\(\hat{y}\) <em>is the output value obtained from the linear regression model</em></p></li></ul><p>The goal of a simple linear regression algorithm is to find the best-fit line equation between the inputs and outputs.
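<p>The model equation above can be sketched in a couple of lines of NumPy. The slope and intercept values here are hypothetical, chosen only to illustrate how a prediction is computed:</p>

```python
import numpy as np

# y_hat = w*x + b: for a chosen slope w and intercept b,
# the model predicts an output for each input x
w, b = 200.0, 100.0            # hypothetical slope and intercept
x = np.array([1.0, 1.5, 2.0])  # sizes in 1000 sqft.
y_hat = w * x + b              # predicted prices in 1000s of dollars
print(y_hat)                   # [300. 400. 500.]
```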
As the name suggests, 'best-fit line' implies that the error between the predicted values and actual values should be minimum.</p><p><a target="_blank" href="https://media.geeksforgeeks.org/wp-content/uploads/20231129130431/11111111.png"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721687447458/2e6ad4b7-5c4e-4e1f-96d9-ccdfa6d86bbe.png" alt class="image--center mx-auto" /></a></p><p><em>Here, Y is the output variable and X is the input variable.</em></p><p>Linear regression is a model that performs the task of predicting the output \(\hat{y}\) based on the given input \(x\).</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721688863900/1370ede3-e2c1-4098-9f71-0666a10948b4.png" alt class="image--center mx-auto" /></p><h2 id="heading-predicting-the-values-of-w-and-b">Predicting the values of \(w\) and \(b\)</h2><p>In order to achieve the equation (<em>of the best-fit line</em>) that predicts the output value (\(\hat{y}\)), such that the error between the predicted values (\(\hat{y}\)) and the true values (\(y\)) is minimum, we need to update the values of \(w\) and \(b\).</p><h3 id="heading-cost-function">Cost Function</h3><p>The Cost Function is simply a measure of the error between the predicted values (\(\hat{y}\)) and the true values (\(y\)).</p><p>In Linear regression, we use the <strong>Mean Squared Error</strong> (MSE) cost function to calculate the average of the squared error between the predicted values (\(\hat{y}_i\)) and true values (\(y_i\)). The purpose of the cost function is to determine the values of \(w\) and \(b\) that would minimize the error of the linear regression model.</p><p>The MSE cost function can be calculated as:</p><p>$$J(w,b) = \frac{1}{2m}\times\sum_{i=1}^m(\hat{y}_i-y_i)^2$$</p><p>where:</p><ul><li><p>\(J(w,b)\) represents the cost function.</p></li><li><p>\(m\) is the total number of training examples.</p></li><li><p>\(i = 1, 2, 3, \dots, m\)</p></li></ul><p>The graph of the cost function \(J(w,b)\) is bowl-shaped.</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721698315849/a2e6437a-8f0e-4f17-9fb7-c790566b9517.png" alt class="image--center mx-auto" /></p><p>So, the values of \(w\) and \(b\) for which \(J(w,b)\) is minimum are the best fit for the Linear regression model.</p><h3 id="heading-gradient-descent">Gradient Descent</h3><p>Gradient descent is an optimization algorithm used to train the Linear regression model by reducing the cost function to its minimum, modifying the parameters \(w\) and \(b\) iteratively. The idea is to start with random values of \(w\) and \(b\) and then iteratively update the values to reach minimum \(J(w,b)\).</p><p>The algorithm of Gradient descent:</p><p><em>repeat till convergence</em> {</p><p>\(w = w - \alpha\frac{\partial}{\partial w}J(w,b)\)</p><p>\(b = b - \alpha\frac{\partial}{\partial b}J(w,b)\)</p><p>}</p><p>Differentiating \(J\) with respect to \(w\) :</p><p>\(\frac{\partial}{\partial w}J(w,b)\)</p><p>\(= \frac{\partial}{\partial w}\sum_{i=1}^m\frac{1}{2m}(\hat{y}_i-y_i)^2\)</p><p>\(= \frac{1}{2m}\frac{\partial}{\partial w}\sum_{i=1}^m((w\times x_i + b)-y_i)^2\)</p><p>\(= \frac{1}{m}\sum_{i=1}^m(\hat{y}_i-y_i).x_i\)</p><p>Differentiating \(J\) with respect to \(b\) :</p><p>\(\frac{\partial}{\partial b}J(w,b)\)</p><p>\(= \frac{\partial}{\partial b}\sum_{i=1}^m\frac{1}{2m}(\hat{y}_i-y_i)^2\)</p><p>\(= \frac{1}{2m}\frac{\partial}{\partial b}\sum_{i=1}^m((w\times x_i + b)-y_i)^2\)</p><p>\(= \frac{1}{m}\sum_{i=1}^m(\hat{y}_i-y_i)\)</p><p>The Gradient descent algorithm (after putting in the values of the partial derivatives):</p><p><em>repeat till convergence</em>{</p><p>\(w = w - \alpha\frac{1}{m}\sum_{i=1}^m(\hat{y}_i-y_i).x_i\)</p><p>\(b = b - \alpha\frac{1}{m}\sum_{i=1}^m(\hat{y}_i-y_i)\)</p><p>}</p><p>where:</p><ul><li><p>\(w\) and \(b\) are the parameters of the linear regression model</p></li><li><p>\(\alpha\) is the learning
rate</p></li></ul><p><strong>Learning Rate (</strong>\(\alpha\)) <strong>-</strong></p><p>The learning rate (\(\alpha\)) is a constant multiplied by the partial derivative term that determines how fast we reach minimum \(J(w,b)\).</p><p>If \(\alpha\) is too small, then the gradient descent may be slow and it would take a long time to reach minimum \(J(w,b)\).</p><p>If \(\alpha \) is too large, then the gradient descent may overshoot and never reach minimum \(J(w,b)\).</p><p>Therefore, we should choose a learning rate (\(\alpha\)) that's neither too small nor too large.</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721724088376/920959bf-7b24-4639-93cc-a71e24df1640.png" alt class="image--center mx-auto" /></p><p>So, that's it for the mathematics part of Linear regression; now let's go through the Python implementation of the Linear regression model.</p><h2 id="heading-python-implementation-of-linear-regression-model">Python Implementation of Linear regression model</h2><p>To understand the Linear regression model, we are going to use plain Python with libraries such as NumPy and Matplotlib.</p><p>Let's take a small dataset and develop a Linear regression model for it.</p><div class="hn-table"><table><thead><tr><td>Size (in 1000 sq ft.)</td><td>Price (in 1000s of dollars)</td></tr></thead><tbody><tr><td>1.0</td><td>300</td></tr><tr><td>1.5</td><td>360</td></tr><tr><td>2.0</td><td>500</td></tr><tr><td>2.75</td><td>540</td></tr><tr><td>3.0</td><td>650</td></tr></tbody></table></div><p>First, we will import all the necessary libraries.</p><pre><code class="lang-python">import math, copy
import numpy as np
import matplotlib.pyplot as plt</code></pre><p>Next, we enter the data and store it in arrays that will be used to train the linear regression model.</p><pre><code
class="lang-python">x_train = np.array([1.0, 1.5, 2.0, 2.75, 3.0])
y_train = np.array([300.0, 360.0, 500.0, 540.0, 650.0])
m = x_train.shape[0]  # total number of training examples</code></pre><p>Now, computing the Linear regression model by taking some random values of parameters \(w\) and \(b\).</p><pre><code class="lang-python">w = 100
b = 100

def compute_model_output(x, w, b):
    m = x.shape[0]
    y_hat = np.zeros(m)
    for i in range(m):
        y_hat[i] = w*x[i] + b  # predicted values
    return y_hat

temp_y_hat = compute_model_output(x_train, w, b)</code></pre><p>Let's plot the graph and see how well our Linear regression model fits.</p><pre><code class="lang-python">plt.plot(x_train, temp_y_hat, c='b', label="our predictions")
plt.scatter(x_train, y_train, marker='x', c='r')
plt.title("Housing prices")
plt.ylabel("Price (in 1000s of dollars)")
plt.xlabel("Size (in 1000 sqft.)")
plt.show()</code></pre><p><img
src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721730641843/34a3c6df-5a75-440f-8670-ebe0caba082c.jpeg" alt class="image--center mx-auto" /></p><p>It's clearly visible that this is not the best-fit Linear regression model for our dataset. We need to update the values of the parameters \(w\) and \(b\).</p><p>Now we'll calculate the MSE cost function to find the error between the predicted and true values.</p><pre><code class="lang-python">def compute_cost(x, y, w, b):
    m = x.shape[0]
    cost_sum = 0
    for i in range(m):
        y_hat = w*x[i] + b
        cost = (y_hat - y[i])**2
        cost_sum += cost
    total_cost = cost_sum / (2*m)
    return total_cost

a = compute_cost(x_train, y_train, w, b)
print(a)</code></pre><p>$$15182.5$$</p><p>The value of the cost function is too high, so we need to run the Gradient descent algorithm to update the values of \(w\) and \(b\) and minimize the cost function.</p><p>Let's now write the code for the Gradient descent algorithm (which is the main objective of Linear regression).</p><pre><code class="lang-python"># computing gradient (partial derivatives)
def compute_gradient(x, y, w, b):
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0
    for i in range(m):
        y_hat = w*x[i] + b
        dj_dw_i = (y_hat - y[i])*x[i]
        dj_db_i = (y_hat - y[i])
        dj_dw += dj_dw_i
        dj_db += dj_db_i
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_dw, dj_db</code></pre><pre><code class="lang-python">def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    # x = input data
    # y = target values
    # w_in and b_in = initial values of w and b
    # alpha = learning rate
    # num_iters = number of iterations to run gradient descent
    # cost_function = function to compute cost
    # gradient_function = function to compute gradient
    J_history = []  # history of cost values
    p_history = []  # history of parameters w and b
    w = w_in
    b = b_in
    for i in range(num_iters):
        dj_dw, dj_db = gradient_function(x, y, w, b)
        w = w - (alpha * dj_dw)
        b = b - (alpha * dj_db)
        # Save cost J at each iteration
        if i < 100000:  # to prevent resource exhaustion
            J_history.append(cost_function(x, y, w, b))
            p_history.append([w, b])
        # Print the cost every num_iters/10 iterations
        if i % math.ceil(num_iters/10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e} ",
                  f"w: {w: 0.3e}, b:{b: 0.5e}")
    return w, b, J_history, p_history

iterations = 10000
tmp_alpha = 0.01
w_final, b_final, J_hist, p_hist = gradient_descent(x_train, y_train, w, b, tmp_alpha, iterations, compute_cost, compute_gradient)
print(f"(w,b) found by gradient descent: ({w_final:8.4f},{b_final:8.4f})")</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721737579942/9ddba0a6-cc11-45ed-ab6d-11e615fcd217.jpeg" alt class="image--center mx-auto" /></p><p>Now we have the best-fit values of the parameters \(w\) and \(b\), with which we can achieve the most optimized Linear regression model for the given dataset.</p><p>Let's plot the graph again after obtaining the values of the parameters \(w\) and \(b\).</p><pre><code class="lang-python">temp_y_hat = compute_model_output(x_train, w_final, b_final)
plt.plot(x_train, temp_y_hat, c='b', label="our predictions")
plt.scatter(x_train, y_train, marker='x', c='r')
plt.title("Housing prices")
plt.ylabel("Price (in 1000s of dollars)")
plt.xlabel("Size (in 1000 sqft.)")
plt.show()</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721739213687/779cb1b4-4d62-4d82-b92f-81e20858e0d4.jpeg"
alt class="image--center mx-auto" /></p><p>So, now we have the best-fit line, which means we have successfully trained the Linear regression model for our dataset. We can now enter new test values (size in 1000 sqft.) and the model will give us the output (price in 1000s of dollars) with minimum error.</p><pre><code class="lang-python">x_test = np.array([2.75])
y_hat_test = compute_model_output(x_test, w_final, b_final)
print(f"The predicted price for a house of size {x_test[0]*1000} sqft. is ${round(y_hat_test[0]*1000,2)}")</code></pre><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721740307603/0e6d2bc7-428c-42dd-aaa4-2b3f8d1f0cd7.png" alt class="image--center mx-auto" /></p><h1 id="heading-multiple-linear-regression">Multiple Linear Regression</h1><p>A multiple linear regression computes a relationship between one dependent variable and more than one independent variable. It is represented by -</p><p>$$\hat{y} = \vec{w}.\vec{x} + b$$</p><p>where:</p><ul><li><p>\(\hat{y}\) is the dependent variable (output)</p></li><li><p>\(\vec{x}\) is the vector of all the independent variables (inputs)</p></li><li><p>\(\vec{w}\) is a vector of parameters corresponding to each \(x\)</p><p> <em>(the value of each</em> \(w\) <em>corresponding to the</em> \(x\) <em>(feature) depends on how much it affects the output result)</em></p></li><li><p>\(b\) is another parameter</p></li></ul><p>To understand Multiple linear regression, we can use the previous example of predicting housing prices, where there was only one independent variable (\(x\)) (the size of the house) and one dependent variable (\(\hat{y}\)) (the price).
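<p>The vector form above can be sketched with NumPy's dot product. This is a minimal illustration, not part of the model trained earlier: the features (size, age) and the parameter values below are hypothetical, chosen only to show how the scalar product \(w.x\) becomes \(\vec{w}.\vec{x}\).</p>

```python
import numpy as np

# Two hypothetical features per house: [size in 1000 sqft., age in years]
w = np.array([250.0, -2.0])  # one weight per feature (hypothetical values)
b = 80.0                     # intercept (hypothetical)
x = np.array([2.0, 10.0])    # a single house: 2000 sqft., 10 years old
y_hat = np.dot(w, x) + b     # the dot product replaces the scalar w*x
print(y_hat)                 # 560.0
```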
But we know that the Price of the house depends on a lot of factors other than size, like the age of the house, location and more.</p><p>So, to predict the price of the house more accurately, we have to consider more than one independent variable, such as age and size together, to build a Linear regression model.</p><p>The method to train a Multiple linear regression model is mostly the same as that of a Simple linear regression model, except that in place of all the scalar calculations, we use vector calculations. Later, in the upcoming blogs, I'll discuss more about Multiple Linear regression.</p><h1 id="heading-conclusion">Conclusion</h1><p>So, that covers the essentials of Linear Regression in Machine Learning. After reading this blog, you should now have a basic understanding of Linear Regression and be able to implement it in Python.</p><p><em>If you have any questions or need further clarification on any of the topics discussed, feel free to leave a comment below or reach out to me directly. Let's learn and grow together!</em></p><p>LinkedIn: <a target="_blank" href="https://www.linkedin.com/in/utkal-kumar-das-785074289">https://www.linkedin.com/in/utkal-kumar-das-785074289</a></p><p>To further explore the world of machine learning, here are some recommended resources:</p><ul><li><p>Coursera: Machine Learning by Andrew Ng - <a target="_blank" href="https://www.coursera.org/learn/machine-learning">https://www.coursera.org/learn/machine-learning</a></p></li><li><p>Towards Data Science - <a target="_blank" href="https://towardsdatascience.com/">https://towardsdatascience.com/</a></p></li></ul>]]>https://cdn.hashnode.com/res/hashnode/image/upload/v1721760682275/bb3daedb-5848-40c2-845f-a84364d215a5.jpeg<![CDATA[Introduction to Machine Learning and its Algorithms]]>https://ukc.hashnode.dev/introduction-to-machine-learning-and-its-algorithmshttps://ukc.hashnode.dev/introduction-to-machine-learning-and-its-algorithmsThu, 18 Jul 2024 16:53:33
GMT<![CDATA[<p>In this blog, I have summarized the term Machine Learning and the main algorithms used in real-world applications. If you want to dive into the field of ML or just get an overview of Machine Learning, this blog will help you.</p><p><em>Enjoy reading!</em></p><h2 id="heading-machine-learning">Machine Learning</h2><p>Machine Learning (ML) is a discipline of Artificial Intelligence (AI) that provides machines the ability to automatically learn from data and past experiences to identify patterns and make predictions with minimal human intervention.</p><h3 id="heading-machine-learning-vs-artificial-intelligence">Machine Learning vs Artificial Intelligence</h3><p>This is one of the most common questions that comes to our mind whenever we hear something about ML. So, in simple terms -</p><p>Artificial Intelligence is a discipline that focuses on creating machines that can perform tasks which require typical human intelligence. It involves the development of algorithms and systems that can reason, learn and make predictions based on input data.</p><p>On the other hand, Machine Learning is a sub-field of AI that involves teaching machines to learn from data. ML algorithms can identify patterns and trends and use them to make predictions. ML is used to build predictive models, classify data and recognize patterns. ML is an essential tool for AI.</p><h3 id="heading-what-is-deep-learning">What is Deep Learning?</h3><p>Talking about Machine Learning, we often come across another familiar term, 'Deep Learning'.
Deep learning is a subset of machine learning that uses artificial neural networks to process and analyze information, mimicking the learning process of the human brain.</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721314090737/9c546692-b392-408d-a283-3587ec707d15.png" alt class="image--center mx-auto" /></p><h2 id="heading-machine-learning-algorithms">Machine Learning Algorithms</h2><p>There are different types of algorithms that are used to train the machines, so that they can predict the outputs for new inputs. There are 2 types of Machine Learning algorithms that are used most in real-world applications. They are -</p><ul><li><p>Supervised Learning</p></li><li><p>Unsupervised Learning</p></li></ul><h3 id="heading-supervised-learning">Supervised Learning</h3><p>Supervised Machine Learning is an algorithm in which the machine is trained with inputs (X) and their corresponding correct outputs (Y). The trained model can then take a brand-new input and produce a likely output.</p><p>The Supervised Learning algorithm can further be classified into 2 types -</p><ul><li><p><strong>Regression</strong> - This algorithm is used when we need the machine to predict an output (mostly a number) from infinitely many possible outputs.</p><p> For example <em>- predicting the price of a house etc.</em></p></li><li><p><strong>Classification</strong> - This algorithm is used to predict categories, i.e., it predicts an output from a finite, limited set of outputs.</p><p> For example - <em>predicting whether an image is of a cat or not, predicting whether a student will pass or fail etc.</em></p></li></ul><h3 id="heading-unsupervised-learning">Unsupervised Learning</h3><p>Unsupervised Learning is an algorithm where the machine is trained with inputs (X) without corresponding outputs (Y). The machine tries to find patterns and relationships in the data on its own.
This type of learning is used for clustering and association problems, such as grouping similar items together or finding associations between items.</p><p>There are 3 types of unsupervised algorithms mostly used in real-world applications -</p><ul><li><p><strong>Clustering</strong> - This algorithm is used to group similar data points together.</p><p> <em>For example - Clustering is used in Google News</em></p></li><li><p><strong>Dimensionality reduction</strong> - This algorithm is used to reduce the number of input variables in a dataset while preserving its essential information.</p><p> <em>It is used in speech recognition, signal processing etc.</em></p></li><li><p><strong>Anomaly detection</strong> - This algorithm is used to identify unusual patterns in the data points.</p><p> <em>It is used in finance for fraud detection, used in manufacturing to identify defects etc.</em></p></li></ul><p>There are two more advanced Machine Learning Algorithms also used in industries -</p><ul><li><p><strong>Recommender Systems</strong> - A recommendation system is a type of machine learning system that provides personalized recommendations to users based on their past behaviors, preferences, and patterns.</p></li><li><p><strong>Reinforcement Learning</strong> - It is a Machine Learning technique that trains software to make decisions to obtain the most optimal results. It is similar to the trial-and-error learning process humans use.</p><p> <em>It is used to teach computers to play video games and can also be seen in ChatGPT, as it uses Reinforcement Learning from human feedback.</em></p></li></ul><h3 id="heading-conclusion">Conclusion</h3><p>Understanding machine learning and its diverse set of algorithms is crucial for anyone looking to explore the field of AI and data science and for tackling a wide range of problems. From linear regression to neural networks, each algorithm offers unique insights and solutions. Keep learning and experimenting to master these concepts.
I hope this overview has provided a solid foundation for your journey into machine learning. Happy learning!</p><p><em>If you have any questions or need further clarification on any of the topics discussed, feel free to leave a comment below or reach out to me directly. Let's learn and grow together!</em></p><p>To further explore the world of machine learning, here are some recommended resources:</p><ul><li><p>Coursera: Machine Learning by Andrew Ng - <a target="_blank" href="http://www.coursera.org/learn/machine-learning">www.coursera.org/learn/machine-learning</a></p></li><li><p>Towards Data Science - <a target="_blank" href="https://towardsdatascience.com/">https://towardsdatascience.com/</a></p></li></ul>]]><![CDATA[<p>In this blog, I have summarized the term Machine Learning and the main algorithms used in real-world applications. If you want to dive into the field of ML or just get an overview of Machine Learning, this blog will help you.</p><p><em>Enjoy reading!</em></p><h2 id="heading-machine-learning">Machine Learning</h2><p>Machine Learning (ML) is a discipline of Artificial Intelligence (AI) that provides machines the ability to automatically learn from data and past experiences to identify patterns and make predictions with minimal human intervention.</p><h3 id="heading-machine-learning-vs-artificial-intelligence">Machine Learning vs Artificial Intelligence</h3><p>This is one of the most common questions that comes to our mind whenever we hear something about ML. So, in simple terms -</p><p>Artificial Intelligence is a discipline that focuses on creating machines that can perform tasks which require typical human intelligence. It involves the development of algorithms and systems that can reason, learn and make predictions based on input data.</p><p>On the other hand, Machine Learning is a sub-field of AI that involves teaching machines to learn from data.
ML algorithms can identify patterns and trends and use them to make predictions. ML is used to build predictive models, classify data and recognize patterns. ML is an essential tool for AI.</p><h3 id="heading-what-is-deep-learning">What is Deep Learning?</h3><p>Talking about Machine Learning, we often come across another familiar term, 'Deep Learning'. Deep learning is a subset of machine learning that uses artificial neural networks to process and analyze information, mimicking the learning process of the human brain.</p><p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1721314090737/9c546692-b392-408d-a283-3587ec707d15.png" alt class="image--center mx-auto" /></p><h2 id="heading-machine-learning-algorithms">Machine Learning Algorithms</h2><p>There are different types of algorithms that are used to train the machines, so that they can predict the outputs for new inputs. There are 2 types of Machine Learning algorithms that are used most in real-world applications. They are -</p><ul><li><p>Supervised Learning</p></li><li><p>Unsupervised Learning</p></li></ul><h3 id="heading-supervised-learning">Supervised Learning</h3><p>Supervised Machine Learning is an algorithm in which the machine is trained with inputs (X) and their corresponding correct outputs (Y).
The trained model can then take a brand-new input and produce a likely output.</p><p>The Supervised Learning algorithm can further be classified into 2 types -</p><ul><li><p><strong>Regression</strong> - This algorithm is used when we need the machine to predict an output (mostly a number) from infinitely many possible outputs.</p><p> For example <em>- predicting the price of a house etc.</em></p></li><li><p><strong>Classification</strong> - This algorithm is used to predict categories, i.e., it predicts an output from a finite, limited set of outputs.</p><p> For example - <em>predicting whether an image is of a cat or not, predicting whether a student will pass or fail etc.</em></p></li></ul><h3 id="heading-unsupervised-learning">Unsupervised Learning</h3><p>Unsupervised Learning is an algorithm where the machine is trained with inputs (X) without corresponding outputs (Y). The machine tries to find patterns and relationships in the data on its own. This type of learning is used for clustering and association problems, such as grouping similar items together or finding associations between items.</p><p>There are 3 types of unsupervised algorithms mostly used in real-world applications -</p><ul><li><p><strong>Clustering</strong> - This algorithm is used to group similar data points together.</p><p> <em>For example - Clustering is used in Google News</em></p></li><li><p><strong>Dimensionality reduction</strong> - This algorithm is used to reduce the number of input variables in a dataset while preserving its essential information.</p><p> <em>It is used in speech recognition, signal processing etc.</em></p></li><li><p><strong>Anomaly detection</strong> - This algorithm is used to identify unusual patterns in the data points.</p><p> <em>It is used in finance for fraud detection, used in manufacturing to identify defects etc.</em></p></li></ul><p>There are two more advanced Machine Learning Algorithms also used in industries -</p><ul><li><p><strong>Recommender Systems</strong> - A
recommendation system is a type of machine learning system that provides personalized recommendations to users based on their past behaviors, preferences, and patterns.</p></li><li><p><strong>Reinforcement Learning</strong> - It is a Machine Learning technique that trains software to make decisions to obtain the most optimal results. It is similar to the trial-and-error learning process humans use.</p><p> <em>It is used to teach computers to play video games and can also be seen in ChatGPT, as it uses Reinforcement Learning from human feedback.</em></p></li></ul><h3 id="heading-conclusion">Conclusion</h3><p>Understanding machine learning and its diverse set of algorithms is crucial for anyone looking to explore the field of AI and data science and for tackling a wide range of problems. From linear regression to neural networks, each algorithm offers unique insights and solutions. Keep learning and experimenting to master these concepts. I hope this overview has provided a solid foundation for your journey into machine learning. Happy learning!</p><p><em>If you have any questions or need further clarification on any of the topics discussed, feel free to leave a comment below or reach out to me directly. Let's learn and grow together!</em></p><p>To further explore the world of machine learning, here are some recommended resources:</p><ul><li><p>Coursera: Machine Learning by Andrew Ng - <a target="_blank" href="http://www.coursera.org/learn/machine-learning">www.coursera.org/learn/machine-learning</a></p></li><li><p>Towards Data Science - <a target="_blank" href="https://towardsdatascience.com/">https://towardsdatascience.com/</a></p></li></ul>]]>https://cdn.hashnode.com/res/hashnode/image/stock/unsplash/U-WdRP2M56w/upload/c64af44d2f95ada2bcbb6743b4b6c0d1.jpeg