Introduction to machine learning by Quentin de Laroussilhe - http://underflow.fr - @Underflow404
Machine learning
A machine learning algorithm is an algorithm that learns to accomplish a task by observing data.
●Used on complex tasks where it is hard to develop algorithms with handcrafted rules
●Exploits patterns in observed data and extracts rules automatically
Fields of application
●Computer vision
●Speech recognition
●Financial analysis
●Search engines
●Ad targeting
●Content suggestion
●Self-driving cars
●Assistants
●etc.
Example: object detection
Classifying an object in a picture is not an easy task. Visual features vary widely:
●Shape
●Background
●Size / position
Example: object detection
●Learn from an annotated corpus of examples (a dataset) to classify unknown images among different object types
●Observe images to learn patterns
●Lots of data available (e.g. the ImageNet dataset)
●Very good error rates (< 5% with deep CNNs)
General concepts
Types of ML algorithms
Supervised
Learn a function by observing examples containing the input and the expected output.
●Classification
●Regression
Unsupervised
Find underlying relations in data by observing the raw data only (without the expected output).
●Clustering
●Dimensionality reduction
Classification vs Regression
Regression
Learn a function mapping an input element to a real value.
e.g. Predict tomorrow's temperature given some meteorological signals
Classification
Learn a function mapping an input element to a class (within a finite set of possible classes).
e.g. Predict tomorrow's weather, one of {sunny, cloudy, rainy}, given some meteorological signals
[Figure: Regression, Classification]
Clustering
A clustering algorithm separates the observed data points into groups of similar points (clusters). We do not know the labels during training.
[Figure: data points grouped into Cluster 1, Cluster 2 and Cluster 3]
Reinforcement learning
Learn the optimal behavior of an agent in an environment so as to maximize a given objective (a reward).
Examples:
●Drive a car on a road and minimize the collision risk
●Play video games
●Choose the position of ads on a website to maximize the number of clicks
Feature extraction
The first step in a machine learning process is to extract useful values from the data (called features).
The goal is to extract the information useful for the task we want to learn. Examples:
●Stock market time series → [opening price, closing price, lowest, highest]
●Image → edge-filtered image
●Document → bag-of-words
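The stock-market and document examples above each take only a few lines of Python. This is a minimal sketch, not the author's code. The function names `stock_features` and `bag_of_words` are hypothetical. The sketch assumes one "period" of a time series is a plain chronological list of prices, and that a document is tokenized by a naive lowercase whitespace split.

```python
from collections import Counter

def stock_features(prices):
    # One period of a price time series (chronological list, an assumption)
    # -> [opening price, closing price, lowest, highest].
    return [prices[0], prices[-1], min(prices), max(prices)]

def bag_of_words(document):
    # Word counts, ignoring word order.
    # Tokenization is a naive lowercase whitespace split (an assumption).
    return Counter(document.lower().split())

print(stock_features([10.0, 12.5, 9.8, 11.2]))        # [10.0, 11.2, 9.8, 12.5]
print(bag_of_words("The cat sat on the mat")["the"])  # 2
```

Both functions turn raw data of arbitrary length into a fixed, task-oriented description. That is the point of feature extraction.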
Modeling process
k-nearest neighbors
●Classification and regression model
●Supervised learning: we have annotated examples
●We classify a new example based on the labels of its “nearest neighbors”
●k is the number of neighbors taken into consideration
k-nearest neighbors
To classify a point:
We look at its k nearest neighbors (here k = 5) and take a majority vote.
This point has 3 red neighbors and 2 blue neighbors, so it is classified as red.
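The majority vote fits in a short Python sketch. It assumes Euclidean distance (`math.dist`, Python 3.8+) and training data given as (point, label) pairs. `knn_classify` and the sample points are hypothetical. The points are chosen so that the query has 3 red and 2 blue points among its 5 nearest neighbors, as in the example.

```python
import math
from collections import Counter

def knn_classify(train, query, k=5):
    """Return the majority label among the k training points closest to `query`.

    train: list of (point, label) pairs; points are tuples of numbers.
    Distance is Euclidean here (an assumption; any distance function works).
    """
    # Sort training points by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count labels; on a tie, Counter.most_common favors the label
    # encountered first, i.e. first in distance order.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical points around the query (0, 0).
train = [
    ((1, 0), "red"), ((0, 1), "red"), ((-1, 0), "red"),
    ((0, -1.5), "blue"), ((1.5, 1.5), "blue"),
    ((5, 5), "blue"), ((6, 5), "blue"),
]

print(knn_classify(train, (0, 0), k=5))  # red  (3 red vs 2 blue)
print(knn_classify(train, (0, 0), k=7))  # blue (3 red vs 4 blue)
```

With k = 7 the same query also counts the two distant blue points, and the prediction flips to blue. The result therefore depends directly on the choice of k.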
k-nearest neighbors
●Stores all N training data points (they are searched at prediction time)
●Requires a distance function between points
●Regression (average of the values of the k nearest neighbors)
●Classification (majority vote of the k nearest neighbors)
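For regression, the vote is replaced by an average. The sketch below takes the distance function as a parameter, because k-NN needs one but does not prescribe which. Euclidean distance is only an assumed default. `knn_regress`, `manhattan` and all data values are hypothetical.

```python
import math

def knn_regress(train, query, k=3, distance=math.dist):
    """Predict the mean value of the k training points closest to `query`.

    train: list of (point, value) pairs; `distance` is any function of two points.
    """
    neighbors = sorted(train, key=lambda item: distance(item[0], query))[:k]
    return sum(value for _, value in neighbors) / len(neighbors)

def manhattan(a, b):
    # Sum of absolute coordinate differences (L1 distance).
    return sum(abs(x - y) for x, y in zip(a, b))

# Hypothetical 1-D data: input -> measured value.
train = [((1.0,), 10.0), ((2.0,), 12.0), ((3.0,), 14.0), ((10.0,), 30.0)]
print(knn_regress(train, (2.1,), k=3))  # 12.0, the mean of 12.0, 14.0 and 10.0

# The distance function can change which neighbors are "nearest".
train2 = [((3, 0), 3.0), ((2, 2), 4.0)]
print(knn_regress(train2, (0, 0), k=1))                      # 4.0 (Euclidean: 2.83 < 3)
print(knn_regress(train2, (0, 0), k=1, distance=manhattan))  # 3.0 (Manhattan: 3 < 4)
```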
k-nearest neighbors: effect of k
●k is the number of neighbors taken into consideration
●If k = 1
○The accuracy on the training set is 100% (each training point is its own nearest neighbor, assuming no identical points with different labels)
○It might not generalize to new data
●If k > 1
○The accuracy on the training set might not be 100%
○It might generalize better to unseen data
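The training-accuracy claim can be checked on a small sketch under a few assumptions: 1-D points, Euclidean distance, and a made-up dataset in which the red point at 4.5 plays the role of a noisy label among blue points. `knn_classify` and `training_accuracy` are hypothetical helpers.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    # Majority label among the k training points closest to `query` (Euclidean).
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

def training_accuracy(train, k):
    # Classify every training point using the training set itself as reference.
    correct = sum(knn_classify(train, point, k) == label for point, label in train)
    return correct / len(train)

# Hypothetical 1-D data: the red point at 4.5 sits among blue points.
train = [((0.0,), "red"), ((1.0,), "red"), ((2.0,), "red"),
         ((3.0,), "blue"), ((4.0,), "blue"), ((4.5,), "red"),
         ((5.0,), "blue"), ((6.0,), "blue")]

print(training_accuracy(train, k=1))     # 1.0: each point is its own nearest neighbor
print(training_accuracy(train, k=3))     # 0.875: the point at 4.5 is outvoted
print(knn_classify(train, (4.4,), k=1))  # red
print(knn_classify(train, (4.4,), k=3))  # blue
```

Whether the k = 3 answer for a new point near 4.5 is better depends on whether that red point is real signal or noise. The sketch only shows that a larger k smooths over isolated points, at the cost of training accuracy.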