A Hello World with Microsoft’s Machine Learning framework, ML.NET

by George Kosmidis / Published 5 years ago

Artificial Intelligence, Machine Learning and all relevant keywords have been leading the headlines lately and for good reason. This ~~new~~ field has already transformed industries across the globe, and companies are racing to understand how to integrate this emerging technology: If we had an AI ready to give answers, what would we ask? And if we can think of a question, is it valid for an AI?

Introduction

First, let’s see how Data Analytics leads to Machine Learning scenarios, and then look at ML.NET, Microsoft’s solution for everyone, but especially for #ASPNET developers.

Step 1, What is Data Analytics

Through data mining and various other techniques, vast quantities of (sometimes unstructured) data are collected. That data is analysed for commonalities (such as averages, ratios, known math graphs, etc.), the results are presented as aggregations on a dashboard, and humans are responsible for making assumptions and predicting the future.

Step 2, What is Predictive Analytics

The analysis done in the previous step naturally leads to predicting what will come tomorrow. Predictions are based on patterns that repeat in historical data, and humans are called on to identify those patterns. In advanced scenarios they also write the math equations that represent each pattern, verify, test and adjust them, and finally apply that hard-earned logic to new, unknown data to predict the results.

Step 3, What is Machine Learning

Machine learning could be explained as a predictive analysis process with one key difference: a machine, and not a human, makes the assumptions, the tests and the adjustments in order to finally learn how to predict results. Why is this better? Because a machine can study millions of datasets containing millions of seemingly unrelated data points, in ways and at speeds that are foreign to human nature. Through this study it can discover connections that “shouldn’t be possible” and give solutions when functions are unknown or too complex to discover.

The simplest way I can think of to explain the difference between predictive analytics and machine learning is a loosely mathematical one:

A function f is applied to x and transforms it to y: f(x)=y
If we know f and x but not y, it is not a machine learning problem.
If we know x and partially y but not f, then it is a machine learning problem.
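
To make the last case concrete, here is a toy sketch in plain C# (no ML.NET involved): we know a few x values and their matching y values, we do not know f, and we let the machine estimate it. It assumes that f is a straight line and uses ordinary least squares; the LineFit class and the sample numbers are made up for illustration.

```csharp
using System;
using System.Linq;

public static class LineFit
{
    // Least-squares estimate of slope a and intercept b for y ≈ a*x + b.
    // Assumes x and y have the same length and x is not constant.
    public static (double Slope, double Intercept) Fit(double[] x, double[] y)
    {
        double meanX = x.Average();
        double meanY = y.Average();

        // Covariance of x and y, and variance of x (both unnormalised).
        double covXY = x.Zip(y, (xi, yi) => (xi - meanX) * (yi - meanY)).Sum();
        double varX = x.Sum(xi => (xi - meanX) * (xi - meanX));

        double slope = covXY / varX;
        return (slope, meanY - slope * meanX);
    }

    // Apply the learned f to a new, unseen x.
    public static double Predict((double Slope, double Intercept) f, double x) =>
        f.Slope * x + f.Intercept;
}
```

Calling LineFit.Fit(new double[] { 1, 2, 3, 4 }, new double[] { 3, 5, 7, 9 }) gives a slope of 2 and an intercept of 1, so the learned f predicts 11 for the unseen x = 5.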

Data Preparation

We are not going to go deeper into the subject, but a crucial success factor for every step is finding and preparing data: the so-called data pre-processing techniques. They come under many names and there are various ways to apply them, so we are just going to mention a few and leave the rest for a Google search!

Data collection
Self-explanatory, but a rather difficult step: do you have tons of data somewhere? Collect it and try to aggregate it.
Feature Selection
Identify those input variables that are most relevant to the task.
Data Profiling
Check for trends, outliers, exceptions, incorrect, inconsistent, missing, or skewed information. Make your data consistent.
Data Quality
Deal with erroneous data, missing values, extreme values, and outliers in your data.
Feature Engineering
Derive new variables from available data.
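
As a rough illustration of two of these steps, the sketch below uses plain C# and LINQ (not ML.NET). The HouseRow record, its columns and the choice of mean imputation are all assumptions made up for this example: missing areas are replaced with the mean of the known ones (data quality), and a new price-per-square-metre variable is derived (feature engineering).

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical raw record: the area is sometimes missing.
public record HouseRow(double? AreaSqm, double Price);

public static class DataPrep
{
    // Data quality: replace missing areas with the mean of the known areas.
    // Throws if no row has a known area, since the mean is then undefined.
    public static List<HouseRow> FillMissingArea(IEnumerable<HouseRow> rows)
    {
        var list = rows.ToList();
        double meanArea = list
            .Where(r => r.AreaSqm.HasValue)
            .Average(r => r.AreaSqm.GetValueOrDefault());

        return list.Select(r => r with { AreaSqm = r.AreaSqm ?? meanArea }).ToList();
    }

    // Feature engineering: derive a new variable from the existing ones.
    // Returns NaN when the area is still missing.
    public static double PricePerSqm(HouseRow row) =>
        row.Price / row.AreaSqm.GetValueOrDefault(double.NaN);
}
```

Mean imputation is only one option; depending on the data, dropping the row or using the median may be a better fit.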

ML.NET

What is ML.NET

ML.NET is a free, open source, and cross platform machine learning framework for the .NET developer platform.

ML.NET allows you to train, build, and ship custom machine learning models using C# or F# for a variety of ML scenarios. ML.NET includes features like automated machine learning (AutoML) and tools like ML.NET CLI and ML.NET Model Builder, which make integrating machine learning into your applications even easier.

A Hello World with ML.NET

Although this is a bit more complicated than your average Hello World app, we can easily separate it into seven distinct steps.

In case you want to check a working example instead of reading the next steps, you might find Microsoft.ML.Forecasting.GlobalTemperature on my GitHub account useful.

1. The ML.NET Context

MLContext is the starting point for all ML.NET operations. The MLContext is used for all aspects of creating and consuming an ML.NET model. It is similar conceptually to DbContext in Entity Framework.

var mlContext = new MLContext();

2. Load data

Data in ML.NET is represented as an IDataView, which is a flexible, efficient way of describing tabular data (for example, rows and columns). You can load data from files or from real-time streaming sources to an IDataView. For example LoadFromTextFile allows you to load data from TXT, CSV, TSV, and other file formats.

var trainingData = mlContext.Data
    .LoadFromTextFile<SentimentInput>(dataPath, separatorChar: ',', hasHeader: true);

Learn more about loading data here.
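
The snippet above refers to a SentimentInput class that is not shown. A minimal sketch of what it could look like follows; the column order (text first, label second) is an assumption about the CSV file, so adjust the LoadColumn indexes to match your own data.

```csharp
using Microsoft.ML.Data;

// Hypothetical input schema for a two-column CSV: text, then a true/false label.
public class SentimentInput
{
    // First column of the file: the raw comment text.
    [LoadColumn(0)]
    public string SentimentText { get; set; }

    // Second column of the file: the label the model should learn to predict.
    [LoadColumn(1)]
    public bool Sentiment { get; set; }
}
```

The property names matter: they become the column names ("SentimentText", "Sentiment") that the transforms and the trainer refer to later.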

3. Transform data

This is data pre-processing made easy, since data is usually not ready to be consumed! Transformers take data, do some work on it, and return new, transformed data. For example, did you know that you can only feed numbers to a trainer? This is how a piece of text is transformed:

FeaturizeText("This is a text we want to use") => [0.86, 0.67, 0.45, 0.99....]

There is a built-in set of data transforms for replacing missing values, data conversion, featurizing text, and more.

// Convert sentiment text into numeric features
var dataTransformPipeline = mlContext.Transforms.Text
    .FeaturizeText("Features", "SentimentText");

Learn more about data transformations here.
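
If you are curious what the featurized text actually looks like, the following sketch fits the transform on a couple of in-memory rows and reads the resulting vector back. The TextRow, FeaturizedRow and FeaturizeDemo names and the second sample sentence are made up for this example; the exact numbers and the vector length depend on the data the transform is fitted on.

```csharp
using System.Linq;
using Microsoft.ML;

// Hypothetical input row with a single text column.
public class TextRow
{
    public string SentimentText { get; set; }
}

// Hypothetical output row used to read the numeric vector back.
public class FeaturizedRow
{
    public float[] Features { get; set; }
}

public static class FeaturizeDemo
{
    public static float[] Featurize(string text)
    {
        var mlContext = new MLContext(seed: 0);

        // Load a tiny in-memory dataset instead of a file.
        var data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new TextRow { SentimentText = text },
            new TextRow { SentimentText = "Another short example sentence" }
        });

        // Fit the text featurizer and apply it to the same data.
        var estimator = mlContext.Transforms.Text.FeaturizeText("Features", "SentimentText");
        var transformed = estimator.Fit(data).Transform(data);

        // Read the "Features" column of the first row as a float array.
        return mlContext.Data
            .CreateEnumerable<FeaturizedRow>(transformed, reuseRowObject: false)
            .First()
            .Features;
    }
}
```

FeaturizeDemo.Featurize("This is a text we want to use") returns the numeric vector that a trainer would see for that sentence.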

4. Choose the correct algorithm

When using machine learning and ML.NET, you must choose a machine learning task that goes along with your scenario. ML.NET offers over 30 algorithms (or trainers) for a variety of ML tasks:

ML Task	Algorithms
Binary classification (for example, sentiment analysis)	`AveragedPerceptronTrainer`, `SdcaLogisticRegressionBinaryTrainer`
Multi-class classification (for example, topic categorization)	`LightGbmMulticlassTrainer`, `OneVersusAllTrainer`
Regression (for example, price prediction)	`LbfgsPoissonRegressionTrainer`, `FastTreeRegressionTrainer`
Clustering (for example, customer segmentation)	`KMeansTrainer`
Anomaly Detection (for example, shampoo sales spike detection)	`RandomizedPcaTrainer`
Recommendation (for example, movie recommender)	`MatrixFactorizationTrainer`
Ranking (for example, search results)	`LightGbmRankingTrainer`, `FastTreeRankingTrainer`

For example, the following uses the AveragedPerceptronTrainer for sentiment analysis:

var trainer = mlContext.BinaryClassification.Trainers
    .AveragedPerceptron(labelColumnName: "Sentiment", featureColumnName: "Features");

var trainingPipeline = dataTransformPipeline.Append(trainer);
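
Trying a different algorithm from the table usually means changing only the trainer line. As a sketch, this is the same pipeline with SdcaLogisticRegression swapped in; the PipelineFactory class is a made-up wrapper, and the column names are the ones assumed throughout this post.

```csharp
using Microsoft.ML;

public static class PipelineFactory
{
    // Builds "featurize text, then train" with a different binary trainer.
    public static IEstimator<ITransformer> BuildSdcaPipeline(MLContext mlContext)
    {
        // Same data transformation as before.
        var featurize = mlContext.Transforms.Text
            .FeaturizeText("Features", "SentimentText");

        // Only this part differs from the AveragedPerceptron version.
        var trainer = mlContext.BinaryClassification.Trainers
            .SdcaLogisticRegression(labelColumnName: "Sentiment", featureColumnName: "Features");

        return featurize.Append(trainer);
    }
}
```

Which trainer works best depends on your data, so it is worth comparing a few of them with the evaluation step described further down.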

5. Train Model

The data transformations and algorithms you have specified are not executed until you call the Fit() method (because ML.NET pipelines are evaluated lazily). This is when model training happens.

var model = trainingPipeline.Fit(trainingData);

6. Evaluate Model

ML.NET offers evaluators that assess the performance of your model on a variety of metrics:

Accuracy
Area under the curve (AUC)
R-Squared
Root Mean Squared Error (RMSE)

// Make predictions on test data
IDataView predictions = model.Transform(testDataView);

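// Evaluate model and return metrics
// (AveragedPerceptron produces no Probability column, so use the non-calibrated evaluator)
var metrics = mlContext.BinaryClassification
    .EvaluateNonCalibrated(predictions, labelColumnName: "Sentiment");

// Print out accuracy metric
Console.WriteLine("Accuracy: " + metrics.Accuracy);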
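
The evaluation snippet uses a testDataView that was never created. One common way to get it is to hold back part of the loaded data before training, so that the model is scored on rows it has not seen. A minimal sketch, assuming a 20% test share (the SplitDemo wrapper and the fraction are arbitrary choices for this example):

```csharp
using Microsoft.ML;

public static class SplitDemo
{
    // Splits one IDataView into a training part and a held-back test part.
    public static (IDataView Train, IDataView Test) Split(MLContext mlContext, IDataView allData)
    {
        // testFraction is approximate: rows are assigned to the two sets at random.
        var split = mlContext.Data.TrainTestSplit(allData, testFraction: 0.2);
        return (split.TrainSet, split.TestSet);
    }
}
```

You would then call Fit() on the Train part and pass the Test part to model.Transform() as testDataView.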

7. Deploy and Consume model

You can save your trained model as a binary file that is then integrated into your .NET applications.
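
Saving is a single call. The sketch below wraps it in a made-up ModelStore helper; the "model.zip" default matches the file name used when loading below, and the schema passed in is the schema of the data the model was trained on.

```csharp
using Microsoft.ML;

public static class ModelStore
{
    // Writes the trained model and its input schema to a zip file.
    public static void SaveModel(
        MLContext mlContext,
        ITransformer model,
        IDataView trainingData,
        string path = "model.zip")
    {
        mlContext.Model.Save(model, trainingData.Schema, path);
    }
}
```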

Once you have saved the trained model, you can load the model in your other .NET applications and start making predictions:

var mlContext = new MLContext();
DataViewSchema predictionPipelineSchema;
var model = mlContext.Model.Load("model.zip", out predictionPipelineSchema);

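// SentimentOutput is an assumed output class exposing the predicted label as a bool Prediction property
var predEngine = mlContext.Model.CreatePredictionEngine<SentimentInput, SentimentOutput>(model);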
var sampleComment = new SentimentInput{ SentimentText = "This is very rude!" }; 
var result = predEngine.Predict(sampleComment);
Console.WriteLine(result.Prediction);
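
The prediction engine is generic over an input and an output class, i.e. CreatePredictionEngine<SentimentInput, SentimentOutput>(model), and result.Prediction above comes from that output class. It is not shown in this post, so here is a sketch of what it could look like; the class name and the extra Score property are assumptions, while "PredictedLabel" and "Score" are the column names that binary classification trainers write to.

```csharp
using Microsoft.ML.Data;

// Hypothetical output schema for a binary classification prediction.
public class SentimentOutput
{
    // Maps the model's "PredictedLabel" column to a friendlier property name.
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }

    // Raw score of the classifier; its sign decides the predicted label.
    public float Score { get; set; }
}
```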

Done!

Quite a long Hello World, right? Well, not exactly! If you consider that you just used 15-20 lines of code to train a machine to give you answers, it doesn’t seem that long! Could you have imagined that just a few years ago?

In case you want to check a working example, you might find Microsoft.ML.Forecasting.GlobalTemperature on my GitHub account useful.
