Artificial Intelligence, Machine Learning and all related keywords have been dominating the headlines lately, and for good reason. This new field has already transformed industries across the globe, and companies are racing to understand how to integrate this emerging technology. If we had an AI ready to give answers, what would we ask? And if we can think of a question, is it a valid one for an AI?
First things first: let’s see how Data Analytics leads to Machine Learning scenarios, and then look at ML.NET, Microsoft’s solution for everyone, but especially for #ASPNET developers.
Step 1, What is Data Analytics
Through data mining and various other techniques, vast quantities of (sometimes unstructured) data are collected. The data is analyzed for commonalities (such as averages, ratios, known mathematical graphs, etc.) and the results are presented as aggregations on a dashboard. Humans are then responsible for making assumptions and predicting the future.
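As a rough illustration of the kind of aggregations a dashboard would show, here is a minimal C# sketch written as a top-level program. The sales figures are hypothetical and only there to have something to aggregate.

```csharp
using System;
using System.Linq;

// Hypothetical daily sales figures; in practice these come from your collected data.
double[] dailySales = { 120, 135, 128, 150, 142, 160, 155 };

// Typical dashboard aggregations: a human reads these and draws conclusions.
double average = dailySales.Average();
double max = dailySales.Max();
double growthRatio = dailySales.Last() / dailySales.First();

Console.WriteLine($"Average: {average:F2}, Max: {max}, Growth ratio: {growthRatio:F2}");
```

Nothing here predicts anything. The numbers only summarize the past, and interpreting them is left to a person.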
Step 2, What is Predictive Analytics
Through the analysis done in the previous step, we naturally end up predicting what will come tomorrow. Predictions are based on patterns that repeat in historical data, and humans are called on to identify those patterns. In advanced scenarios they write the mathematical equations that represent a pattern, verify, test and adjust them, and finally apply that hard-earned logic to new, unknown data to predict the results.
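To make the "human writes the equation" idea concrete, here is a small C# sketch. The rule and its numbers (a start value of 120 and a growth of 6 per day) are made up for illustration. The point is only that a person chose them by looking at past data.

```csharp
using System;

// Hypothetical rule derived by an analyst from historical data:
// "sales start at 120 and grow by about 6 units per day".
double PredictSales(int day) => 120 + 6 * day;

// Verify the rule against a day we have data for, then apply it to a new day.
Console.WriteLine(PredictSales(3));  // compare with the observed value for day 3
Console.WriteLine(PredictSales(10)); // prediction for a day with no data yet
```

If the prediction for day 3 is far from what was observed, the analyst adjusts the equation by hand and tries again.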
Step 3, What is Machine Learning
Machine learning could be explained as a predictive analytics process with one key difference: a machine, not a human, makes the assumptions, runs the tests and applies the adjustments in order to finally learn how to predict results. Why is this better? Because a machine can study millions of datasets containing millions of seemingly unrelated data points, in ways and at speeds that are foreign to human nature. Through this study it can discover connections that “shouldn’t be possible” and provide solutions when the underlying functions are unknown or too complex to discover.
The simplest way I can think of to explain the difference between predictive analytics and machine learning is a mathematical one. A function f is applied to an input x and transforms it into an output y, so f(x) = y. If we already know f, it is not a machine learning problem: we simply apply f to new values of x. If we only know examples of x and their matching y, and f is what we are missing, then it is a machine learning problem.
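The following C# sketch shows the idea on the smallest possible scale: we hand the machine pairs of x and y and let it work out f. It uses ordinary least squares for a straight line, which is my choice for illustration and not something ML.NET-specific. The data is hypothetical and was generated from y = 2x + 1.

```csharp
using System;
using System.Linq;

// Known inputs x and observed outputs y; the function f is what we do not know.
double[] x = { 0, 1, 2, 3, 4 };
double[] y = { 1, 3, 5, 7, 9 }; // hypothetical data, generated by y = 2x + 1

// Ordinary least squares for a line y = slope * x + intercept.
double meanX = x.Average();
double meanY = y.Average();
double slope = x.Zip(y, (xi, yi) => (xi - meanX) * (yi - meanY)).Sum()
             / x.Sum(xi => (xi - meanX) * (xi - meanX));
double intercept = meanY - slope * meanX;

// The learned f can now be applied to an input it has never seen.
double Predict(double input) => slope * input + intercept;
Console.WriteLine($"f(x) = {slope} * x + {intercept}; f(10) = {Predict(10)}");
```

Nobody wrote "2x + 1" into the program. The slope and intercept were recovered from the examples, which is the part a human did by hand in the previous step.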
We are not going to go deeper into the subject, but a crucial success factor for every step is finding and preparing data, the so-called data pre-processing. These techniques go by many names and there are many of them, so we are just going to mention a few and leave the rest for a Google search!
- Data collection
Self-explanatory, but a rather difficult step: do you have tons of data somewhere? Collect it and try to aggregate it.
- Feature Selection
Identify those input variables that are most relevant to the task.
- Data Profiling
Check for trends, outliers, exceptions, incorrect, inconsistent, missing, or skewed information. Make your data consistent.
- Data Quality
Deal with erroneous data, missing values, extreme values, and outliers in your data.
- Feature Engineering
Derive new variables from available data.
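Two of the techniques above, data quality and feature engineering, can be sketched in a few lines of plain C#. The readings are hypothetical, and replacing missing values with the mean is just one possible strategy.

```csharp
using System;
using System.Linq;

// Hypothetical raw readings; double.NaN marks a missing value.
double[] temperatures = { 21.5, double.NaN, 23.0, 22.5, double.NaN };

// Data quality: replace missing values with the mean of the known ones.
double mean = temperatures.Where(t => !double.IsNaN(t)).Average();
double[] cleaned = temperatures.Select(t => double.IsNaN(t) ? mean : t).ToArray();

// Feature engineering: derive a new variable (change from the previous reading).
double[] deltas = cleaned.Zip(cleaned.Skip(1), (prev, next) => next - prev).ToArray();

Console.WriteLine(string.Join(", ", cleaned));
Console.WriteLine(string.Join(", ", deltas));
```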
What is ML.NET
ML.NET is a free, open-source, and cross-platform machine learning framework for the .NET developer platform.
ML.NET allows you to train, build, and ship custom machine learning models using C# or F# for a variety of ML scenarios. ML.NET includes features like automated machine learning (AutoML) and tools like ML.NET CLI and ML.NET Model Builder, which make integrating machine learning into your applications even easier.
A Hello World with ML.NET
Although this is a bit more complicated than your average Hello World app, we can easily break it into 7 distinct steps.
In case you want to check a working example instead of reading the next steps, you might find Microsoft.ML.Forecasting.GlobalTemperature on my GitHub account useful.
1. The ML.NET Context
MLContext is the starting point for all ML.NET operations. The MLContext is used for all aspects of creating and consuming an ML.NET model. It is similar conceptually to DbContext in Entity Framework.
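A minimal sketch of creating the context, assuming the Microsoft.ML NuGet package is referenced. The seed value is an arbitrary choice of mine, and the local variable names are only there to point at the different catalogs.

```csharp
using Microsoft.ML;

// One MLContext is created up front and reused for loading, transforming,
// training, evaluating and saving. The seed (arbitrary here) makes
// operations that rely on randomness repeatable.
var mlContext = new MLContext(seed: 0);

// Each area of the framework hangs off the context as a catalog.
var dataCatalog = mlContext.Data;                   // loading and splitting data
var transformCatalog = mlContext.Transforms;        // data transformations
var binaryCatalog = mlContext.BinaryClassification; // trainers and evaluators
var modelCatalog = mlContext.Model;                 // saving and loading models
```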
2. Load data
Data in ML.NET is represented as an IDataView, which is a flexible, efficient way of describing tabular data (for example, rows and columns). You can load data from files or from real-time streaming sources to an IDataView. For example, LoadFromTextFile allows you to load data from TXT, CSV, TSV, and other file formats.
Learn more about loading data here.
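Here is a small sketch of loading data, assuming the Microsoft.ML NuGet package. The SentimentData class, its two columns and the "sentiment.csv" path are hypothetical. The data is loaded from memory so the example runs without a file, with the file-based call shown in a comment.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

var mlContext = new MLContext(seed: 0);

// In-memory rows standing in for a real data source (hypothetical sample data).
var samples = new[]
{
    new SentimentData { Text = "I love this product", Label = true },
    new SentimentData { Text = "This was a terrible experience", Label = false },
};
IDataView data = mlContext.Data.LoadFromEnumerable(samples);

// Loading the same shape from a CSV file would look like this
// ("sentiment.csv" is a hypothetical path):
// IDataView data = mlContext.Data.LoadFromTextFile<SentimentData>(
//     "sentiment.csv", hasHeader: true, separatorChar: ',');

// The schema lists the columns ML.NET discovered.
foreach (var column in data.Schema)
    Console.WriteLine($"{column.Name}: {column.Type}");

// Hypothetical input class; LoadColumn maps file columns by index.
public class SentimentData
{
    [LoadColumn(0)] public string Text { get; set; } = string.Empty;
    [LoadColumn(1)] public bool Label { get; set; }
}
```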
3. Transform data
This is data pre-processing made easy, since data is usually not ready to be consumed as-is! Transformers take data, do some work on it, and return new, transformed data. For example, did you know that you can only feed numbers to the engine? This is how text gets transformed:
FeaturizeText("This is a text we want to use") => [0.86, 0.67, 0.45, 0.99....]
There is a built-in set of data transforms for replacing missing values, converting data, featurizing text, and more.
Learn more about data transformations here.
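The text-to-numbers transformation can be tried out with a sketch like the following, again assuming the Microsoft.ML NuGet package. The TextRow and FeaturizedRow classes are hypothetical helpers, and the actual numbers produced depend on the input texts.

```csharp
using System;
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);
var data = mlContext.Data.LoadFromEnumerable(new[]
{
    new TextRow { Text = "This is a text we want to use" },
    new TextRow { Text = "This is another text" },
});

// FeaturizeText turns the Text column into a numeric vector column named "Features".
var textTransform = mlContext.Transforms.Text.FeaturizeText(
    outputColumnName: "Features", inputColumnName: nameof(TextRow.Text));

// For a transform, Fit learns what it needs from the data (for example the
// vocabulary) and Transform produces the new, transformed data.
IDataView transformed = textTransform.Fit(data).Transform(data);

// Read the numeric vectors back to inspect them.
var rows = mlContext.Data.CreateEnumerable<FeaturizedRow>(transformed, reuseRowObject: false);
foreach (var row in rows)
    Console.WriteLine($"{row.Features.Length} numeric features");

// Hypothetical helper classes for this sketch.
public class TextRow { public string Text { get; set; } = string.Empty; }
public class FeaturizedRow { public float[] Features { get; set; } = Array.Empty<float>(); }
```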
4. Choose the correct algorithm
When using machine learning and ML.NET, you must choose a machine learning task that goes along with your scenario. ML.NET offers over 30 algorithms (or trainers) for a variety of ML tasks:
- Binary classification (for example, sentiment analysis)
- Multi-class classification (for example, topic categorization)
- Regression (for example, price prediction)
- Clustering (for example, customer segmentation)
- Anomaly Detection (for example, shampoo sales spike detection)
- Recommendation (for example, movie recommender)
- Ranking (for example, search results)
For example, the AveragedPerceptronTrainer can be used for sentiment analysis, which is a binary classification task.
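A sketch of choosing that trainer and chaining it after a text transform, assuming the Microsoft.ML NuGet package. The column names "Text", "Label" and "Features" are assumptions about how the input data is shaped.

```csharp
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);

// Text must be numeric before a trainer can use it, so featurize first...
var featurize = mlContext.Transforms.Text.FeaturizeText(
    outputColumnName: "Features", inputColumnName: "Text");

// ...then pick the trainer that matches the task (binary classification here).
var trainer = mlContext.BinaryClassification.Trainers.AveragedPerceptron(
    labelColumnName: "Label", featureColumnName: "Features");

// The pipeline is only a description at this point; nothing has been trained yet.
var pipeline = featurize.Append(trainer);
```

Swapping the algorithm means replacing only the trainer line, for example with another trainer from the same BinaryClassification catalog.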
5. Train Model
The data transformations and algorithms you have specified are not executed until you call the Fit() method (because of ML.NET’s lazy evaluation approach). This is when model training happens.
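A sketch of the training call, assuming the Microsoft.ML NuGet package. The Review class and the four sample rows are hypothetical and far too few for a useful model. They only make the example runnable.

```csharp
using System;
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);
IDataView trainingData = mlContext.Data.LoadFromEnumerable(new[]
{
    new Review { Text = "I love this product", Label = true },
    new Review { Text = "Absolutely fantastic service", Label = true },
    new Review { Text = "This was a terrible experience", Label = false },
    new Review { Text = "I hate waiting this long", Label = false },
});

// Describing the pipeline does no work yet.
var pipeline = mlContext.Transforms.Text
    .FeaturizeText(outputColumnName: "Features", inputColumnName: nameof(Review.Text))
    .Append(mlContext.BinaryClassification.Trainers.AveragedPerceptron());

// Fit runs the transforms and the trainer over the data and returns the trained model.
ITransformer model = pipeline.Fit(trainingData);
Console.WriteLine("Training finished.");

// Hypothetical input class: the text and its known sentiment.
public class Review
{
    public string Text { get; set; } = string.Empty;
    public bool Label { get; set; }
}
```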
6. Evaluate Model
ML.NET offers evaluators that assess the performance of your model on a variety of metrics:
- Area under the curve (AUC)
- Root Mean Squared Error (RMSE)
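Here is a sketch of evaluating a binary classifier, assuming the Microsoft.ML NuGet package. The Review class and the tiny hand-written training and test sets are hypothetical, so the metric values it prints say nothing about real-world quality.

```csharp
using System;
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);

// Hypothetical labelled reviews: a training set and a separate test set.
IDataView trainingData = mlContext.Data.LoadFromEnumerable(new[]
{
    new Review { Text = "I love this product", Label = true },
    new Review { Text = "Absolutely fantastic service", Label = true },
    new Review { Text = "This was a terrible experience", Label = false },
    new Review { Text = "I hate waiting this long", Label = false },
});
IDataView testData = mlContext.Data.LoadFromEnumerable(new[]
{
    new Review { Text = "Fantastic product, love it", Label = true },
    new Review { Text = "Terrible, I hate it", Label = false },
});

var model = mlContext.Transforms.Text
    .FeaturizeText(outputColumnName: "Features", inputColumnName: nameof(Review.Text))
    .Append(mlContext.BinaryClassification.Trainers.AveragedPerceptron())
    .Fit(trainingData);

// Score the unseen test rows, then compare the predictions with the true labels.
IDataView predictions = model.Transform(testData);
var metrics = mlContext.BinaryClassification.EvaluateNonCalibrated(predictions);
Console.WriteLine($"AUC: {metrics.AreaUnderRocCurve:F2}, Accuracy: {metrics.Accuracy:F2}");

// For a regression model the equivalent call is mlContext.Regression.Evaluate(...),
// whose result exposes RootMeanSquaredError.

public class Review
{
    public string Text { get; set; } = string.Empty;
    public bool Label { get; set; }
}
```

EvaluateNonCalibrated is used because the averaged perceptron produces raw scores and not calibrated probabilities.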
7. Deploy and Consume model
You can save your trained model as a binary file that is then integrated into your .NET applications.
Once you have saved the trained model, you can load it in your other .NET applications and start making predictions.
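Saving, loading and predicting could look roughly like this, assuming the Microsoft.ML NuGet package. The "model.zip" path and the Review and ReviewPrediction classes are hypothetical, and training is repeated inline only to keep the sketch self-contained.

```csharp
using System;
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);
IDataView trainingData = mlContext.Data.LoadFromEnumerable(new[]
{
    new Review { Text = "I love this product", Label = true },
    new Review { Text = "Absolutely fantastic service", Label = true },
    new Review { Text = "This was a terrible experience", Label = false },
    new Review { Text = "I hate waiting this long", Label = false },
});
ITransformer model = mlContext.Transforms.Text
    .FeaturizeText(outputColumnName: "Features", inputColumnName: nameof(Review.Text))
    .Append(mlContext.BinaryClassification.Trainers.AveragedPerceptron())
    .Fit(trainingData);

// Save the trained model together with its input schema.
mlContext.Model.Save(model, trainingData.Schema, "model.zip");

// In the consuming application: load the model and create a prediction engine.
ITransformer loaded = mlContext.Model.Load("model.zip", out DataViewSchema inputSchema);
var engine = mlContext.Model.CreatePredictionEngine<Review, ReviewPrediction>(loaded);

ReviewPrediction result = engine.Predict(new Review { Text = "I love it" });
Console.WriteLine($"Positive: {result.PredictedLabel} (score {result.Score:F2})");

public class Review
{
    public string Text { get; set; } = string.Empty;
    public bool Label { get; set; }
}

// Hypothetical output class; property names match the columns the trainer adds.
public class ReviewPrediction
{
    public bool PredictedLabel { get; set; }
    public float Score { get; set; }
}
```

A PredictionEngine is not thread-safe, so in an ASP.NET application you would typically not share one instance across requests. The PredictionEnginePool from the Microsoft.Extensions.ML package is designed for that scenario.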
Quite a long Hello World, right? Well, not exactly! If you consider that you just used 15-20 lines of code to train a machine to give you answers, it doesn’t seem that long! Could you have imagined that just a few years ago?
In case you want to check a working example, you might find Microsoft.ML.Forecasting.GlobalTemperature on my GitHub account useful.