Supervised Learning

Vtantravahi
4 min read · Sep 5, 2022


Image Credit: Author Edit

In the realm of computers, programming and giving instructions are commonplace. But have you ever considered whether computers themselves might benefit from this wealth of recorded knowledge? We all know that a computer’s working memory is limited and eventually runs out, which is why we store everything as files or objects in permanent storage. The claim that computers learn from previously acquired data is widespread in the machine learning (ML) field, but is it true? In this article, let’s investigate by answering a series of questions.

What Is Data, and Where Does It Come From?

Data is nothing but a piece of information or an observation recorded during a task.

Image Credit: Author Edit

In a data corpus, data is typically whatever is captured at the user endpoint, and it usually contains a number of traits or features (the independent variables) that help predict a single output (the dependent variable). It might be gathered from user behavior on websites, payment transactions, IoT devices, smart watches, mobile phones, and many other sources. But does this kind of raw, recorded data help a program learn on its own? The simple answer is no, which is why we apply data transformation first. But can every observation be used to train a supervised learning model, even after transformation?
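To make the idea of data transformation concrete, here is a minimal sketch (the library choice, column names, and values are my own illustration, not from the article) of turning raw, mixed-type records into a purely numerical matrix that an algorithm can learn from:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw records captured at a user endpoint
raw = pd.DataFrame({
    "device": ["mobile", "watch", "mobile", "desktop"],  # categorical feature
    "session_minutes": [12.5, 3.0, 45.2, 7.8],           # numerical feature
})

# Encode the categorical column as numbers and scale the numerical column
transform = ColumnTransformer([
    ("categories", OneHotEncoder(), ["device"]),
    ("numbers", StandardScaler(), ["session_minutes"]),
])

X = transform.fit_transform(raw)
print(X)  # a purely numerical matrix, ready for a learning algorithm
```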

What is Supervised Learning?

Supervised learning (SL) is the machine learning task of learning a function that maps an input to an output based on example input-output pairs.[1] It infers a function from labeled training data consisting of a set of training examples. In supervised learning, each example is a pair consisting of an input object (typically a vector) and a desired output value (also called the supervisory signal).

In simple terms, supervised learning is a type of machine learning in which we label data (for instance, data that includes both features and expected results) and use that labeled data to train an algorithm.

For example:

| x  | y  | sum |
|----|----|-----|
| 2  | 3  | 5   |
| 4  | 1  | 5   |
| 18 | 14 | 32  |

Labeled data looks similar to this, where x and y are the features and sum is the output label (the feature to be predicted).
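As a minimal sketch of what “learning from labeled data” means here (scikit-learn and linear regression are my choices for illustration, not prescribed by the article), an algorithm can be fitted on the three labeled rows above and then asked about a pair it has never seen:

```python
from sklearn.linear_model import LinearRegression

# Labeled examples from the table above: features x, y and the label "sum"
X = [[2, 3], [4, 1], [18, 14]]
y = [5, 5, 32]

# Fit a model on the labeled pairs, then predict an unseen pair
model = LinearRegression()
model.fit(X, y)
print(model.predict([[10, 7]]))  # expected to be close to 17
```

The model was never told the rule “add x and y”; it inferred the relationship from the labeled examples.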

Does every supervised learning problem exhibit consistent behavior?

Image Source: Author Edit

Although the data used in supervised learning is never uniform, and may be either discrete/categorical or continuous/numerical depending on the situation, it is always used to discover rules, that is, to learn the relationship between the attributes and the intended output. Supervised learning splits into two categories: classification algorithms (which sort observations into classes or categories) and regression algorithms (which forecast real-valued outputs), and the list of methods within each category is lengthy. So how do you choose the appropriate algorithm at the right time? I personally use the Scikit-Learn model selection cheat sheet as a rule of thumb.

Image Source: Scikit-Learn

Do not consider this your only option, though; it simply comes in handy when you run into trouble selecting an algorithm. You can always experiment with other methods through a trial-and-error approach, something you will become accustomed to as you work on side projects with open-source datasets.
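As an illustration of that trial-and-error approach, here is a minimal sketch (the dataset and the three candidate algorithms are my own picks, not from the article) that compares a few scikit-learn classifiers on one open dataset using cross-validated accuracy:

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

# An open-source dataset bundled with scikit-learn
X, y = load_wine(return_X_y=True)

# Try a few candidate algorithms and compare their cross-validated accuracy
candidates = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "k-nearest neighbors": KNeighborsClassifier(),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

Whichever candidate scores best on the held-out folds becomes the natural starting point for further tuning.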

Practical Time:

Let’s examine a classification example in this exercise, which falls under the umbrella of supervised learning. But first, we need to familiarize ourselves with some popular ML jargon.

  • Dataset: A dataset is a collection of related, distinct pieces of data that may be viewed separately or together, or handled as a single unit. Each dataset includes features (independent variables that help predict the target) and a target (the dependent variable that the features are used to predict).
  • Model: A program/algorithm that can be trained to recognize patterns in data and predict the desired values.
  • Train, Test and Validation Split: Every ML challenge includes a train, test, and validation split of the dataset. Train data is what the algorithm learns from during training, which yields a model. Test data is data the trained model has never seen before, used to check how well it generalizes to similar examples. Validation data, in contrast, acts as a kind of real-time simulation, where we try to mimic production-like data to validate how well our model will perform in a production-like environment.
  • Metrics: Metrics are the mechanism by which we assess the performance of our model in numerical terms. In our practical example (see the sketch after this list) we use the confusion matrix as one of the metrics, which you can learn more about here.
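Putting the jargon together, here is a minimal end-to-end sketch of the classification exercise (the Iris dataset, the decision tree classifier, and the split proportions are my own assumptions; the article does not specify them):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Dataset: features (flower measurements) and target (species label)
X, y = load_iris(return_X_y=True)

# Train/test split: the model learns only from the training portion
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Model: an algorithm trained to recognize patterns in the labeled data
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Metrics: evaluate on test data the model has never seen before
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
print("Confusion matrix:\n", confusion_matrix(y_test, predictions))
```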

Conclusion:

As a final point, the industry’s advancement in ML and AI has taken quite a sharp turn. Supervised learning, which is the starting point of the career for many ML engineers and data scientists, has nevertheless firmly taken its place in the market. In my opinion, it is similar to how we learn and make judgments when we are uncertain about the outcome. I hope you fully grasped the idea, and I look forward to hearing from you.


Vtantravahi

👋Greetings, I am Venkatesh Tantravahi, your friendly tech wizard. By day, I am a grad student in CIS at SUNY, by night a data nerd turning ☕️🧑‍💻 and 😴📝