Julien Harbulot
Technical blog
How to write a git commit message
The seven rules of a great Git commit message.

SSH Essentials
SSH is a protocol for starting shell sessions on remote computers. Here are my essential tips to be more productive.

An extension to rename downloads in Google Chrome
Every time I download an ebook, the filename is gibberish. So I decided to create a plugin to scrape the book title from the webpage and automatically rename the downloaded file. Here's a tutorial.

Introduction to PAC Learning
What is “learning” and do we have a formal model for it? I’ve decided to dive into the theoretical underpinnings of machine learning, so here’s a quick introduction to...

Complete guide to writing command-line tools
Turn your Python scripts into great command-line tools that others will want to use, with the help of these guidelines and libraries.

Scripting a mouse tracker on Windows
My journey to understanding the Windows API and building a mouse tracking script.

How to send emails from Python
Learn how to send emails from Python, whether to send yourself notifications or to automate emails to real people.

So you want to be a Python expert?
I love James Powell's talk at PyData. In this one, James shows some very interesting Python features.

Faster workflow with bash completion scripts
You can create autocompletion rules to use with your scripts and customize bash completion to boost your productivity.

Create a website using the Pelican static generator
This article recounts my journey building http://scripting.tips with the static website generator Pelican.

Understanding p-values
Hypothesis testing and p-values are often misused and misunderstood. In this article, I explain what a p-value is, and how to use it.

Introduction to hypothesis testing
We introduce the basic vocabulary required to understand hypothesis testing and define the p-value.

MLE: an information theory viewpoint
We show that the MLE is obtained by minimizing the KL divergence from an empirical distribution and interpret what it means.

The maximum likelihood estimator (MLE)
The maximum likelihood estimator is one of the most used estimators in statistics. In this article, we introduce this estimator and study its properties.

Introduction to statistical estimators
In this article we define what an estimator is. We focus on the theory to compare and assess estimators, rather than how to find one.

The effect of L2 regularization
When fitting a model to some training dataset, we want to avoid overfitting. A common method to do so is to use regularization. In this article, we discuss the impact of L2 regularization on the estimated parameters of a linear model.

Underfitting and overfitting illustrated
In this article, we define underfitting and overfitting and show some nice ways to visualize them on polynomial regressions.

How to assess an OLS regression?
We’ve just fitted OLS to our training set. How do we assess whether it was a good model to use? We will answer this question from the point of view...

OLS regressions from the probabilistic viewpoint
We will show that the loss function used by ordinary least squares (OLS) stems from the statistical theory of maximum likelihood estimation applied to the normal distribution.

Vector notation for linear regressions
A linear regression attempts to estimate an output value using a linear function. Such functions can be expressed concisely using vector notation. In this article, we define...

Primer on stochastic convergence
Types of convergence: in distribution, in probability and the fundamental convergence theorems.

Why there is more to classification than discrete regression
In a classification problem, the dataset consists of pairs of input vectors and discrete labels :

What is a statistic and why do we care?
In this article, we explain that a statistic is a way of compressing information contained in the data, and we show how it can be used for inference....

Derivative, Gradient and Jacobian unified
A summary about scalar and vector derivatives.

What is a generalized linear model?
To understand what a generalized linear model does, let’s look back at linear models.

Conditional expectations and regression with squared error loss
In this article we review the solution to a regression with squared error loss. We start with the theoretical formulation before tackling the problem in practice.

The geometry of the normal equations
In this article, I show that the normal equations define the orthogonal projection of a vector onto a linear subspace.

The Moore-Penrose (pseudoinverse) matrix
The Moore-Penrose inverse of a matrix is used to approximately solve a degenerate system of linear equations.

Understanding and solving the normal equations
The normal equations arise in several branches of mathematics, from statistics to geometry. In this article, we discuss how they emerge and how to solve them.

Why do we care about convexity?
In machine learning, the best parameters for a model are chosen so as to minimize the training objective. Strictly convex functions are particularly interesting because they have a...

Crafting a better download tool
This morning I spent over an hour sorting and renaming the ebooks I downloaded this year. There were over 300 ebooks. What a waste of my time.

Scraping with Python3 and Scrapy
Scrapy is one of the most popular Python frameworks for large-scale web scraping. It gives you all the tools you need to efficiently extract data from websites,...

Scraping basics with Python3 and urllib
Scraping means using a program to extract data from a source. When the source is a website or a blog, we say web scraping, and today we will...

Scraping with BeautifulSoup
In a previous article, we discussed how to use Python and urllib to scrape the web. In this article, we will see how the BeautifulSoup library replaces regexes...

Pause and restart a Python script using Pickle
You’ve just quickly crafted a Python script, and you’re ready to let it run the whole night while you sleep. Not long after you’ve fallen asleep, an error...

A probability distribution to quantify measurement errors
In this article we will derive the normal distribution as the probability distribution that models measurement errors. We start with a dart game and follow Herschel’s derivation.

The geometry of (normal) parameter estimation
This article shows geometrically where the best estimates for the mean and variance of a normally distributed random vector can be found. We start with a simple question...

Propositional logic derived as a special case of probability calculus
In this article, I will apply the rules of probability calculus to derive the rules of propositional logic (also called propositional calculus).

Why Bayesian inference is more powerful than logic
In a previous article I showed that the inference rules of propositional logic can be obtained from probability calculus. But actually, we can obtain much more, and even...

Key ideas in probability and statistics illustrated on a simple problem

Extending logic to deal with uncertainty
This article sketches a construction of probability calculus as an extension of classical logic to account for uncertainty so that by construction, it can be used to automate...

A Bayesian Perspective
Probability is not a property of an event or state; there is no such thing as the probability that the coin lands showing heads. Probability expresses a strength...

An information theory perspective on probability
In 1948, Claude Shannon invented information theory based on probability theory. The basic definition is entropy. Given a set of messages m_i, each one occurring with probability...

The change of basis matrix
This article explains the intuition behind the change of basis matrix.

A nontechnical introduction to statistics
This article explains in simple terms the purpose of statistical theory and gives an overview of how it is used.