The world of applications is evolving, and it is evolving at a rapid pace. Most of the applications we see today claim to be smart and intelligent, and to make our lives smoother. These apps seem to understand our likes and dislikes, our communication styles, our language, our thoughts, and our behavior, but how do they do it? How does an app recommend movies, know which task to do first, or auto-complete our sentences? All these modern-day computing marvels are the result of a few clever algorithms, backed by mathematical concepts, operating on huge volumes of data. In short, they come from the field of Machine Learning, a subset of Artificial Intelligence.
There are plenty of articles and resources that cover the granular details of Machine Learning techniques and the math behind them. This article instead attempts an overarching view of the possibilities.
Why is ML all about Data?
Research in the field of Machine Learning dates back a few decades, and most of the classic algorithms have stood the test of time. So why are they only now being widely used? The answer is data. Let us use a simple example to see how we, as humans, process data and find a pattern. Consider the table below with 3 columns. The first 2 columns, a and b, are features/variables, and the 3rd is a function of the first two, f(a,b). At first glance, how would you relate the third column to the first two?
Did you conclude f(a,b) = a x b? What if we added more records to make the data look like this:
Would you like to change your understanding of f(a,b)? Yes: with the new dataset you re-trained yourself and devised a new relationship between the features, f(a,b) = a ^ b (a to the power b). If you predicted this at first glance, you are ahead of the game 👏, but you would still need the added records to confirm your conclusion. If our own understanding of the world depends on the data we see as humans, it should be clear why data is so important for machines to learn.
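The re-training step described above can be sketched in a few lines of Python. The records here are hypothetical values chosen for illustration (they are not the article's table), and the two candidate formulas are simply the two guesses discussed in the text. The sketch checks which guesses remain consistent before and after extra records arrive.

```python
# Hypothetical records (a, b, f); illustrative values, not the article's table.
initial_records = [(1, 1, 1), (2, 2, 4)]
extra_records = [(2, 3, 8), (3, 2, 9)]

# The two relationships a reader might guess from the data.
hypotheses = {
    "a x b": lambda a, b: a * b,
    "a ^ b": lambda a, b: a ** b,
}

def consistent(records):
    """Return the names of the hypotheses that reproduce every record."""
    return [
        name
        for name, fn in hypotheses.items()
        if all(fn(a, b) == f for a, b, f in records)
    ]

# With only the initial records, both guesses fit the data.
before = consistent(initial_records)
# With the extra records, only one guess survives.
after = consistent(initial_records + extra_records)

print("Before extra data:", before)
print("After extra data: ", after)
```

With the small initial dataset both formulas explain every row, so there is no way to choose between them; only the additional records rule one out.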
Data is the fuel for any machine learning algorithm, and the success of the algorithm depends not just on the quantity but also on the quality of this fuel. An algorithm is only as powerful as the data it learns from. Large volumes of high-quality data, combined with strong computational power, yield very good inferences. Let's continue with our previous example to illustrate the quality aspect. As we see in the augmented dataset below, the output is not just a function of the variables a and b but is also influenced by another factor, c. This completely changes the output function to f(a,b,c) = (a / b) x c.
There could be any number of other feature combinations that define the relation between the inputs and the output. We can never be 100% sure about one equation because new data keeps being generated; a dataset is never truly complete. If the additional data is redundant it can be discarded; otherwise it is used to re-train the machine and improve on its previous learning, just as we humans do. Therefore, in Machine Learning there isn't one fixed answer. There are multiple candidate outputs ranked in order of their probability, and the one that satisfies the most data points stays at the top. With a few more feature columns and 10 to 20 records, it is still feasible for us to find a pattern by manually scanning the data. But building a system with intelligence equivalent to even a fraction of the human brain requires millions of records and hundreds or probably thousands of feature columns.
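As a rough illustration of ranking candidate answers, the sketch below scores a handful of formulas by the fraction of records each one reproduces. The records (including one deliberately noisy row) and the candidate list are hypothetical, and the match fraction is only a crude stand-in for the probability mentioned above; real algorithms use more principled measures.

```python
import math

# Hypothetical records (a, b, c, output); the last row is deliberately noisy.
records = [
    (4, 2, 3, 6),
    (9, 3, 2, 6),
    (8, 4, 5, 10),
    (6, 3, 1, 2),
    (10, 5, 2, 5),  # (10 / 5) x 2 would give 4, not 5
]

# Candidate relationships between the features and the output.
candidates = {
    "a x b": lambda a, b, c: a * b,
    "a ^ b": lambda a, b, c: a ** b,
    "b x c": lambda a, b, c: b * c,
    "(a / b) x c": lambda a, b, c: (a / b) * c,
}

def match_fraction(fn):
    """Fraction of records whose output the candidate reproduces."""
    hits = sum(math.isclose(fn(a, b, c), out) for a, b, c, out in records)
    return hits / len(records)

scores = {name: match_fraction(fn) for name, fn in candidates.items()}

# Best-matching candidate first; ties are broken alphabetically.
ranking = sorted(scores, key=lambda name: (-scores[name], name))

for name in ranking:
    print(f"{name}: {scores[name]:.0%} of records matched")
```

No candidate matches every record here, yet one clearly explains more of the data than the others, which is why it stays at the top of the list.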
... it's not that we are using machines because we have a huge amount of data. We need a huge amount of data because we want to use machines as intelligent systems.
Salesforce - The reserve for Machine Learning fuel
As Salesforce developers, we know the prowess of the platform and the wide range of applications that can be built on it. With more connected devices and organizations, the data can all be tied together to give a unified 360-degree view of the customer. This means that we have the fuel ready, and in fact the platform keeps adding new data.
Machine Learning with data extracted from Salesforce
Below are a few use cases that can be achieved with the data in Salesforce. Most of them are available out of the box in Salesforce, while others can be made available using Einstein:
- Predict the chances of converting a lead to an opportunity
- Predict the chances of closing and winning a particular opportunity
- Forecast sales
- Predict if a warranty claim raised by a customer is genuine or fake
- Classify customers with similar features for targeted marketing
- Recommend products to the sales team during the quoting process based on past customer choices
Machine Learning Categories
Machine Learning can be broadly classified into 3 main categories as shown below.
The categorization is primarily based on the type of dataset that we have. For example:
- Do we have the data as a combination of inputs and outputs, so that the algorithm can learn to predict the output for unseen data? This is termed Supervised learning.
- Do we want the model to find structure in the data and draw inferences on its own, without labelled outputs? This is termed Unsupervised learning.
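The difference between the two questions above can be sketched with plain Python. The data is a hypothetical single numeric feature, and the two techniques used (1-nearest-neighbor for the supervised case, k-means with two clusters for the unsupervised case) are just simple representatives picked for illustration, not methods prescribed by this article.

```python
# Hypothetical 1-D data: one numeric feature per example, with a label.
labeled = [
    (1.0, "low"), (1.5, "low"), (2.0, "low"),
    (8.0, "high"), (9.0, "high"), (9.5, "high"),
]

def predict_nearest(x):
    """Supervised: copy the label of the closest labeled example (1-NN)."""
    _, label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return label

# The same values with the labels withheld.
unlabeled = [value for value, _ in labeled]

def two_means(values, iterations=10):
    """Unsupervised: split values into two groups (k-means with k=2)."""
    centers = [min(values), max(values)]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [
            sum(g) / len(g) if g else c
            for g, c in zip(groups, centers)
        ]
    return groups

print(predict_nearest(3.0))   # uses the known labels
print(two_means(unlabeled))   # discovers groups without any labels
```

The supervised function can only answer because someone supplied the "low" and "high" labels up front; the unsupervised function finds the same two groups but has no names for them.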
Additionally, the outcome that we are trying to predict also determines the category of the Machine Learning technique. For example:
Predicting a value
E.g. Predict a house price based on knowledge of existing prices.
Associating a label with data
E.g. Spam or not spam - find similar emails and group them.
Understanding the user's language
E.g. Usage in chat bots - performing Natural Language Processing
Identifying elements in an image
E.g. Face detection from an image - An application of computer vision.
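To make the first of these outcomes, predicting a value, a little more concrete, here is a minimal sketch of fitting a straight line with ordinary least squares. The house areas and prices are hypothetical and deliberately lie on a perfect line, so the fit is exact; real price data would be noisy and would need more features.

```python
# Hypothetical (area in square metres, price in thousands) pairs.
houses = [(50, 150), (70, 210), (90, 270), (110, 330)]

def fit_line(points):
    """Ordinary least squares fit of price = slope * area + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(houses)

def predict_price(area):
    """Predict the price of an unseen house from its area."""
    return slope * area + intercept

print(predict_price(100))
```

The existing prices play the role of the known outputs, and the fitted line is what the algorithm has "learned" from them.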
The "Supervised" and "Unsupervised" approaches described above are together regarded as Classical Learning. In the upcoming articles we will look at these Classical Learning techniques, along with Reinforcement Learning, in detail and also discuss some of the frequently used algorithms and libraries.