# Roundup

We have finally arrived at the last post of our series on the proof that linear regression is indeed a sharp learner. Recall that in the first post we began by motivating linear regression as the problem of predicting house prices and quickly came to understand that there is a beautiful way to frame this problem abstractly: given any set of features $\mathfrak{X}$ and a finite-dimensional Euclidean space of labels $\mathfrak{y}$, as well as a dataspace $\mathfrak{D}$ satisfying the separation condition for a finite-dimensional hypothesis space $\mathfrak{H}\subset \mathfrak{y}^{\mathfrak{X}}$, is it possible to find a map $$h:\mathfrak{D} \longrightarrow \mathfrak{H}$$ such that $c(\Delta, h_\Delta)=\min_{h \in \mathfrak{H}} c(\Delta,h)$, where $$c(\Delta, h)=\sum_{(x,y)\in \Delta}\vert \vert y-h(x)\vert\vert^2$$?
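To make the cost $c(\Delta, h)$ concrete, here is a minimal sketch in Python with NumPy. It assumes, purely for illustration, that $\mathfrak{X}=\mathfrak{y}=\mathbb{R}$ and that $\mathfrak{H}$ consists of the affine functions $x \mapsto ax+b$; the names `cost`, `fit_affine` and the toy dataset are our own and do not come from the posts.

```python
import numpy as np

def cost(dataset, h):
    """Sum of squared errors c(Delta, h) over the pairs (x, y) in the dataset."""
    return sum(np.sum((np.asarray(y) - h(x)) ** 2) for x, y in dataset)

def fit_affine(dataset):
    """Least-squares minimizer over the hypotheses x -> a*x + b (illustrative choice of H)."""
    xs = np.array([x for x, _ in dataset], dtype=float)
    ys = np.array([y for _, y in dataset], dtype=float)
    # One row per data point, one column per basis hypothesis (x and the constant 1).
    design = np.column_stack([xs, np.ones_like(xs)])
    (a, b), *_ = np.linalg.lstsq(design, ys, rcond=None)
    return lambda x: a * x + b

# Toy dataset Delta, made up for this example.
dataset = [(0.0, 1.0), (1.0, 3.0), (2.0, 4.0)]
h_delta = fit_affine(dataset)
```

On this toy dataset the fitted hypothesis has a cost no larger than any other affine function we try, for instance `cost(dataset, h_delta) <= cost(dataset, lambda x: 2 * x + 1)`.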

# Coordinates: Regression the way we know it

In our post series on linear regression in machine learning, we have already done quite a bit of work: we first gave a mathematical definition of supervised learning. We then described how to interpret the concept of linear regression as a class of supervised learners. Along the way, we gave an overview of an important and underestimated tool in linear algebra: the pseudo-inverse. Our approach discussed linear regression in the highest possible generality.
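In coordinates, the pseudo-inverse gives the least-squares coefficients directly. The following is a small sketch, assuming a hypothetical design matrix whose rows are data points and whose columns are the values of a chosen basis of hypotheses (here $x$ and the constant $1$); the numbers are made up.

```python
import numpy as np

# Hypothetical design matrix: one row per data point, one column per basis hypothesis.
design = np.array([[0.0, 1.0],
                   [1.0, 1.0],
                   [2.0, 1.0]])
labels = np.array([1.0, 3.0, 4.0])

# Coefficients via the Moore-Penrose pseudo-inverse.
coefficients = np.linalg.pinv(design) @ labels

# Cross-check against NumPy's least-squares solver.
reference, *_ = np.linalg.lstsq(design, labels, rcond=None)
```

Both routes should agree up to floating-point error whenever the design matrix has full column rank, which can be checked with `np.allclose(coefficients, reference)`.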

# Defining Supervised Learning

In this inaugural post of the mathvsmachine blog we will introduce a mathematical definition that captures the idea of supervised learning. This definition is close to the one found in the literature, just stated a little more precisely. In this way, we’ll be able not only to prove that the algorithms we all know and love indeed fit this definition, but in time to expand and build a fully fledged theory of supervised learners.

# Dissecting Pseudo-inverses

Welcome to the third installment of our post series on linear regression…our way!! Let’s start by recapping what we already discussed: In the first post, we explained how to define linear regression as a supervised learner: Let $\mathfrak{X}$ be a set of features and $\mathfrak{y}$ a finite-dimensional inner product space. Pick $\mathfrak{H} \subset \mathfrak{y}^{\mathfrak{X}}$ to be any finite-dimensional subspace of functions as well as a collection $\mathfrak{D}$ of finite datasets $\Delta \subset \mathfrak{X}\times \mathfrak{y}$ that separate $\mathfrak{H}$ in the sense that any $h \in \mathfrak{H}$ is uniquely determined by its restriction to the features occurring in $\Delta$.
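The separation condition can be checked numerically once a basis of $\mathfrak{H}$ is fixed. The sketch below assumes real-valued labels ($\mathfrak{y}=\mathbb{R}$) and an illustrative basis $\{x, 1\}$ of affine functions; the helper `separates` is our own name. Under these assumptions, a dataset separates $\mathfrak{H}$ exactly when the matrix of basis values at its features has full column rank.

```python
import numpy as np

def separates(features, basis):
    """True if every hypothesis in span(basis) is determined by its values on `features`."""
    # Rows are data points, columns are basis hypotheses evaluated at those points.
    evaluation = np.array([[h(x) for h in basis] for x in features], dtype=float)
    return np.linalg.matrix_rank(evaluation) == len(basis)

# Illustrative basis of the affine functions on the real line.
basis = [lambda x: x, lambda x: 1.0]

two_distinct_points = separates([0.0, 1.0, 2.0], basis)
one_repeated_point = separates([1.0, 1.0], basis)
```

Here `two_distinct_points` is `True` and `one_repeated_point` is `False`: an affine function is pinned down by its values at two distinct features, but not by its value at a single one.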

Last time, we introduced linear regression as a new class of learners which we called linear. Let’s start with a little recap… We considered a set of features $\mathfrak{X}$ together with labels which in turn took values in a finite-dimensional inner product space $\mathfrak{y}$. We next considered any finite-dimensional subspace $\mathfrak{H}\subset \mathfrak{y}^\mathfrak{X}$ of the vector space of functions $\mathfrak{X}\longrightarrow \mathfrak{y}$ as the possible hypotheses as well as a dataspace $\mathfrak{D}$ consisting of finite subsets of $\mathfrak{X}\times \mathfrak{y}$ which separate the hypothesis space $\mathfrak{H}$.