“A Gentle Guide to Machine Learning” by Raúl Garreta

Posted on MonkeyLearn

“Machine Learning is a subfield within Artificial Intelligence that builds algorithms that allow computers to learn to perform tasks from data instead of being explicitly programmed.

(…) some of the most common categories of practical Machine Learning applications:

Image Processing (…): Image tagging (…), Optical Character Recognition (…), Self-driving cars (…)

Text Analysis (…): Spam filtering, (…) Sentiment Analysis, (…) Information Extraction, (…)

Data Mining (…): Anomaly detection, (…) Grouping, (…) Predictions (…)

Video Games & Robotics

(…) examples:

Example 1: A system that, given an image, says whether Barack Obama’s face appears in it (the generalization would be something like Facebook’s auto-tagging).
Example 2: A system that, given a tweet, tells whether it expresses a positive or negative mood.
Example 3: A system that, given a person’s profile, assigns a score representing the probability of that person repaying a loan. (…)

In order to allow the algorithm to learn how to transform the input into the desired output, you have to provide what are called training instances (…)

Each training instance is usually represented as a fixed set of attributes or features. (…)

Calculating and selecting the proper features to represent an instance is one of the most important tasks when working with Machine Learning (…)
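To make the idea of a fixed set of features concrete, here is a minimal sketch (not from the original post) of one common representation for Example 2. It is a bag-of-words vector that counts how often each word of a small vocabulary appears in a tweet. The vocabulary and the function name tweet_features are hypothetical. The whitespace tokenization is deliberately naive, so punctuation stays attached to words.

```python
from collections import Counter

# Hypothetical fixed vocabulary, chosen only for illustration.
VOCABULARY = ["good", "great", "bad", "awful", "love", "hate"]


def tweet_features(text):
    """Map a tweet to a fixed-length vector of word counts over VOCABULARY."""
    counts = Counter(text.lower().split())  # naive whitespace tokenization
    return [counts[word] for word in VOCABULARY]


print(tweet_features("Great game, I love love this team"))  # prints [0, 1, 0, 0, 2, 0]
```

Every tweet maps to a vector of the same length, however long the tweet is. That shared shape is what lets a learning algorithm compare instances with one another.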

(…) two general categories of Machine Learning algorithms: (…)

Supervised Learning (…) can be seen as a process that has to transform a given input into a desired output. (…)
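As a rough illustration of learning an input-to-output mapping from labeled training instances, the sketch below uses a nearest-centroid classifier. It was chosen only because it is short; it is not one of the algorithms the post singles out. The feature vectors, labels, and function names are hypothetical.

```python
from math import dist


def train_nearest_centroid(examples):
    """examples: list of (feature_vector, label) pairs. Returns {label: centroid}."""
    sums, counts = {}, {}
    for x, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    # The centroid of a class is the mean of its training vectors.
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, x):
    """Predict the label whose centroid is closest (Euclidean distance) to x."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical training instances: (input features, desired output).
training = [
    ([2, 0], "positive"),
    ([3, 1], "positive"),
    ([0, 2], "negative"),
    ([1, 3], "negative"),
]
centroids = train_nearest_centroid(training)
print(predict(centroids, [4, 0]))  # prints positive
```

Training only summarizes the labeled examples. After that, prediction maps any new input to one of the outputs it has seen.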

Unsupervised Learning (…). In this case, the training examples only need to include the input to the algorithm, not the desired output. The typical use case is discovering the hidden structure of, and relations between, the training examples. (…)
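For contrast, here is a minimal sketch of unsupervised learning that uses k-means clustering (Lloyd's algorithm). This is a standard clustering method, but the excerpt does not name it. The algorithm receives only input points, with no desired outputs, and groups them by proximity. The data, the value of k, and the function names are hypothetical.

```python
import random
from math import dist


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: dist(p, centroids[i])) for p in points]


def k_means(points, k, iterations=20, seed=0):
    """Plain k-means: note that no labels are used anywhere."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # start from k randomly chosen points
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                # Move the centroid to the mean of the points assigned to it.
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[i])
        centroids = new_centroids
    return centroids, assign(points, centroids)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = k_means(points, k=2)
print(labels)
```

These points are well separated, so the first three end up in one cluster and the last three in the other. Which cluster gets index 0 depends on the random initialization. The "hidden structure" here is simply the two groups.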

There are a lot of Machine Learning algorithms, but let’s briefly mention three of the most popular ones:

Support Vector Machines: (…) in some domains it is one of the best Machine Learning algorithms you can use nowadays.
Probabilistic Models: (…) Perhaps the most popular algorithms in this category are Naive Bayes classifiers (…) they return not only a prediction but also its degree of certainty, which can be very useful.
Deep Learning: a new trend in Machine Learning based on the well-known Artificial Neural Network models. (…)
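The point about Naive Bayes returning a degree of certainty can be illustrated with a small multinomial Naive Bayes classifier that uses add-one (Laplace) smoothing, applied to a sentiment task like Example 2. This is a minimal sketch, not MonkeyLearn's implementation. The four training tweets and the function names are made up for illustration, and words never seen during training are simply ignored.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (text, label). Returns (class counts, per-class word counts, vocabulary)."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return class_counts, word_counts, vocabulary


def predict_proba(model, text):
    """Return {label: probability} instead of a bare label."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    log_scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)  # log prior
        for word in text.lower().split():
            if word in vocabulary:  # skip words never seen in training
                # Add-one smoothing keeps unseen (word, class) pairs from zeroing the product.
                score += math.log((word_counts[label][word] + 1) / (total_words + len(vocabulary)))
        log_scores[label] = score
    # Normalize in log space (log-sum-exp) to avoid underflow.
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    z = sum(exp_scores.values())
    return {label: v / z for label, v in exp_scores.items()}


training_tweets = [
    ("i love this", "positive"),
    ("great movie love it", "positive"),
    ("i hate this", "negative"),
    ("awful boring movie", "negative"),
]
model = train_naive_bayes(training_tweets)
print(predict_proba(model, "love it"))
```

The result is a probability for each class rather than just a label. For "love it", the positive class comes out well ahead with this toy data, at about 0.84.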

Machine Learning sounds like a beautiful concept, and it is, but some of the processes involved in Machine Learning are not so automatic. In fact, manual steps are often required when designing the solution, and they are of huge importance for obtaining decent results. Some of those aspects are:

Which type of Machine Learning algorithm should I use?
Supervised or unsupervised?(…)

Classification, regression or clustering?(…)

Deep Learning, SVM, Naive Bayes, Decision Trees… which is the best? (…)

So, you have to feed the Machine Learning algorithm with training examples. (…) The general rule regarding training examples is: the more high-quality training data you can gather, the better the results you are likely to get.

Testing examples and performance metrics (…) Accuracy is the most basic metric, but you should also look at other metrics such as Precision and Recall (…) For regression and clustering problems, there are other sets of metrics that let you check whether your algorithm is performing well.
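The difference between these metrics is easy to see on a skewed test set. The sketch below computes accuracy, precision, and recall for a binary task. The function name and the data are hypothetical. When precision or recall is undefined (no positive predictions, or no positive examples), it returns 0.0, which is one common convention but not the only one.

```python
def classification_metrics(y_true, y_pred, positive="positive"):
    """Return (accuracy, precision, recall) for one positive class. Assumes non-empty input."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # 0.0 when nothing was predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0  # 0.0 when there are no positive examples
    return accuracy, precision, recall


# Hypothetical skewed test set: 8 negative tweets, 2 positive ones.
y_true = ["negative"] * 8 + ["positive"] * 2
y_pred = ["negative"] * 10  # a lazy classifier that always answers "negative"
print(classification_metrics(y_true, y_pred))  # prints (0.8, 0.0, 0.0)
```

A classifier that always predicts "negative" scores 0.8 accuracy on this test set, yet it finds none of the positive examples. Recall exposes that failure, while accuracy alone hides it.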

(…)” Read the full post for a good introductory guide to machine learning.