Functionals
This is the first of several blog posts in which I learn the basics of functionals, through both an abstract lens (functional analysis) and an applied lens (calculus of variations and statistics). My goal is to understand the intuition behind the Gateaux derivative, since an influence function is a Gateaux derivative of a statistical functional. I will be channeling my knowledge from real analysis and differential geometry (mostly just being comfortable with differentials). There is a chance these concepts come up in my future, and I want to be prepared, as well as understand the concepts at a deeper level.
I am following notes from MIT and UW. I will be doing everything in \(\mathbb{R}\) because I do not respect complex numbers (just kidding).
Vector Spaces
A very loose definition of a function space is a set of functions between two fixed sets (domain to codomain, with the codomain being either \(\mathbb{R}\) or \(\mathbb{C}\)). It is useful to give a function space some sort of structure, such as closure under addition and scalar multiplication (ring any bells? a vector space!), or a requirement that all functions be continuous and bounded. I encountered function spaces in my undergrad, namely the set of all smooth (infinitely differentiable) functions \(C^{\infty}\), but mostly just as an example of an infinite-dimensional vector space (since a lot of infinite-dimensional vector spaces are function spaces). Unlike finite-dimensional vector spaces, which have really nice properties, infinite-dimensional vector spaces require some extra structure to study them.
Just as we might define distances on a set \(X\) to measure the closeness of two arbitrary elements, i.e. a metric space \((X, d)\), we want a notion of length for the vectors of a vector space \(V\). This is called a norm, and a vector space equipped with one is called a normed vector space. A norm is a function \(\| \cdot \|: V \rightarrow [0, \infty)\) that satisfies definiteness (\(\| v \| = 0\) only if \(v = 0\)), absolute homogeneity (\(\| \alpha v \| = |\alpha| \| v \|\)), and the triangle inequality (\(\| u + v \| \leq \| u \| + \| v \|\)). The first and third properties are similar to properties found in a metric, but absolute homogeneity is just pulling out constants!
We can induce a metric from a norm as \(d(x,y) = \| x - y \|\), so every normed vector space is a metric space: symmetry of \(d\) follows from absolute homogeneity with the constant \(-1\), and the triangle inequality for \(d\) follows from the one for the norm.
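As a quick sanity check, here is a minimal Python sketch of the induced metric on \(\mathbb{R}^2\) with the Euclidean norm. The helper names `norm2` and `dist` and the example vectors are my own choices for illustration, not something taken from the notes I am following.

```python
import math

def norm2(v):
    """Euclidean norm on R^n."""
    return math.sqrt(sum(a * a for a in v))

def dist(x, y, norm=norm2):
    """Metric induced by a norm: d(x, y) = ||x - y||."""
    return norm([a - b for a, b in zip(x, y)])

x, y, z = [1.0, 2.0], [4.0, 6.0], [0.0, -1.0]

# x - y = (-3, -4), so the induced distance is 5.
d_xy = dist(x, y)

# Absolute homogeneity: ||c v|| = |c| ||v||.
c = -3.0
homog_ok = math.isclose(norm2([c * a for a in x]), abs(c) * norm2(x))

# Symmetry and the triangle inequality of the induced metric.
sym_ok = math.isclose(dist(x, y), dist(y, x))
tri_ok = dist(x, z) <= dist(x, y) + dist(y, z) + 1e-12

print(d_xy, homog_ok, sym_ok, tri_ok)
```

Checking a few vectors proves nothing, of course, but it makes the definitions concrete before moving to spaces of functions.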
Now we might want to think about the angle between vectors. We cannot accomplish this with just a norm; we need an inner product (a generalization of the dot product). Channeling my differential geometry knowledge, we can induce a norm from an inner product (\(\| v \| = \sqrt{\langle v, v \rangle}\)). I am unsure how important this is for my purposes, but I am putting it here just in case.
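To make the angle idea concrete for functions, here is a rough sketch that assumes the inner product \(\langle f, g \rangle = \int_0^1 f(x) g(x) dx\) on \([0,1]\), approximated with a midpoint rule. The choice of inner product, the helper names, and the two example functions are all assumptions for illustration.

```python
import math

def inner(f, g, a=0.0, b=1.0, n=10_000):
    """Midpoint-rule approximation of <f, g> = integral of f(x) g(x) over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(n))

def norm(f):
    """Norm induced by the inner product: ||f|| = sqrt(<f, f>)."""
    return math.sqrt(inner(f, f))

def angle(f, g):
    """Angle between f and g, from cos(theta) = <f, g> / (||f|| ||g||)."""
    return math.acos(inner(f, g) / (norm(f) * norm(g)))

one = lambda x: 1.0
ident = lambda x: x

# Exact values: <1, x> = 1/2, ||1|| = 1, ||x|| = 1/sqrt(3), so cos(theta) = sqrt(3)/2.
theta = angle(one, ident)
print(theta, math.pi / 6)
```

So under this inner product the constant function and the identity sit at an angle of about \(\pi/6\), which is the kind of geometric statement a norm alone cannot make.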
Functionals from Functional Analysis
A functional is a map \(J: V \rightarrow \mathbb{R}\), where \(V\) is a vector space equipped with some extra structure (e.g. a norm), and the codomain is its field of scalars (since I am working with \(\mathbb{R}\), I am just keeping it as \(\mathbb{R}\)). A linear functional satisfies the linearity property, i.e. \(J[\alpha f+ \beta g] = \alpha J[f] + \beta J[g]\) for all \(\alpha, \beta \in \mathbb{R}\) and \(f, g \in V\). Linearity is important for keeping everything well-behaved, but I am unsure if I need to generalize later.
Notice that \(V\) doesn’t have to be a space of functions. I think historically, functionals were defined as functions of functions, but this is a more general form that includes that loose definition. There are some more things I need to develop in my understanding before moving on.
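Here is a small numerical sketch of the linearity property, under my own choice of examples: \(J[f] = \int_0^1 f(x) dx\) (linear) against \(J[f] = \int_0^1 f(x)^2 dx\) (not linear). The names `J_lin` and `J_sq` and the midpoint-rule integrator are hypothetical helpers, not part of any library.

```python
def integrate(f, a=0.0, b=1.0, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def J_lin(f):
    """J[f] = integral of f over [0, 1]; linear in f."""
    return integrate(f)

def J_sq(f):
    """J[f] = integral of f^2 over [0, 1]; not linear in f."""
    return integrate(lambda x: f(x) ** 2)

f = lambda x: x
g = lambda x: 1.0 + x * x
alpha, beta = 2.0, -3.0
combo = lambda x: alpha * f(x) + beta * g(x)

# Gap between J[alpha f + beta g] and alpha J[f] + beta J[g].
lin_gap = abs(J_lin(combo) - (alpha * J_lin(f) + beta * J_lin(g)))
sq_gap = abs(J_sq(combo) - (alpha * J_sq(f) + beta * J_sq(g)))

print(lin_gap)  # zero up to floating-point rounding
print(sq_gap)   # clearly nonzero
```

A single pair of functions cannot establish linearity, but one counterexample is enough to rule it out, which is all the second functional needs.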
In practice (i.e. calculus of variations), a functional is a map \(J: \mathcal{F} \rightarrow \mathbb{R}\), where \(\mathcal{F}\) is a collection of functions. The most standard form is
\[J[f] = \int_a^b L(x, f(x), f'(x)) dx\]
Here \(L\) is a given real-valued function of three arguments. A caveat is that \(\mathcal{F}\) is not just all possible functions. For example, we might only want to look at continuous functions. Furthermore, we want a notion of how close one function is to another. For example, how different are \(f(x) = 1 + x\) and \(g(x) = 5\) on the interval \(x \in [0,1]\)? We might choose the \(L_2\) norm to define that. This is why we need a notion of a normed vector space. In fact, we need something a bit stronger, but I will discuss that in the next blog post about functional analysis. I’m tired right now.
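To answer the question about \(f(x) = 1 + x\) and \(g(x) = 5\), here is a minimal sketch that approximates their \(L_2\) distance on \([0,1]\) and, for comparison, their sup-norm distance. By hand, \(\| f - g \|_2^2 = \int_0^1 (x - 4)^2 dx = 37/3\). The function names, the midpoint rule, and the sup norm comparison are my own additions.

```python
import math

def l2_distance(f, g, a=0.0, b=1.0, n=10_000):
    """Approximate ||f - g||_2 on [a, b] with the midpoint rule."""
    h = (b - a) / n
    total = sum((f(a + (i + 0.5) * h) - g(a + (i + 0.5) * h)) ** 2 for i in range(n))
    return math.sqrt(h * total)

def sup_distance(f, g, a=0.0, b=1.0, n=10_000):
    """Approximate sup |f - g| on [a, b] over a grid that includes both endpoints."""
    h = (b - a) / n
    return max(abs(f(a + i * h) - g(a + i * h)) for i in range(n + 1))

f = lambda x: 1.0 + x
g = lambda x: 5.0

d_l2 = l2_distance(f, g)    # exact value is sqrt(37/3)
d_sup = sup_distance(f, g)  # exact value is 4, attained at x = 0

print(d_l2, math.sqrt(37 / 3))
print(d_sup)
```

The two norms give different answers for the same pair of functions, so "how close" really does depend on which norm we equip the space with.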
Examples in Statistics
We can think of a lot of objects from basic statistics in terms of functionals. A reason to do this is to unify all these concepts of describing features of a population into one mathematical idea. For the following examples, consider a functional \(T\) that inputs a probability distribution (its CDF, call it \(P\)) and outputs a number.
The mean: \(T[P] = \int x dP(x)\)
The variance: \(T[P] = \int x^2 dP(x) - \left(\int x dP(x)\right)^2\)
These are both in integral form (with respect to a probability measure) to unite discrete and continuous probability distributions into one framework. In either case, the integral reduces to the expectation formula we expect from basic probability theory. Furthermore, if we input an empirical distribution function (i.e. the CDF that puts mass \(1/n\) on each of \(n\) observed data points), the expectation integral reduces to the sample mean. This is currently beyond the scope of what I was trying to do here, so I won’t write it in notation.
The median (non-linear): \(T[P] = P^{-1}(0.5)\)
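Here is a rough sketch of these three functionals applied to an empirical distribution. I am assuming a discrete distribution is represented as a list of `(value, probability)` pairs, and that \(P^{-1}\) means the generalized inverse \(\inf\{x : P(x) \geq q\}\); both are my own conventions, and the function names are hypothetical.

```python
import math
import statistics

def mean_T(P):
    """T[P] = integral of x dP(x) for a discrete P given as (value, prob) pairs."""
    return sum(p * x for x, p in P)

def var_T(P):
    """T[P] = integral of x^2 dP(x) minus the squared mean."""
    m = mean_T(P)
    return sum(p * x * x for x, p in P) - m * m

def quantile_T(P, q=0.5):
    """Generalized inverse CDF: the smallest x with P(X <= x) >= q."""
    cum = 0.0
    for x, p in sorted(P):
        cum += p
        if cum >= q - 1e-12:  # small tolerance for floating-point accumulation
            return x
    return max(x for x, _ in P)

def empirical(data):
    """Empirical distribution: mass 1/n on each observation."""
    n = len(data)
    return [(x, 1.0 / n) for x in data]

data = [2.0, 7.0, 3.0, 10.0, 3.0]
P_n = empirical(data)

# Plugging in the empirical distribution recovers the usual sample statistics.
print(mean_T(P_n), statistics.mean(data))
print(var_T(P_n), statistics.pvariance(data))
print(quantile_T(P_n), statistics.median_low(data))
```

Note that the variance functional matches the population-style variance (dividing by \(n\)), not the \(n - 1\) version, and that with this definition of \(P^{-1}\) an even sample size would give the lower of the two middle values.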
Nonparametric statistics is connected to infinite-dimensional spaces because we only assume that the distribution is some unknown member of the set of all distributions, which is infinite-dimensional. I just thought that was an important note to make.
Furthermore, a functional being non-linear makes the existence of its derivatives a bit harder to establish. I don’t know how much detail to go into here, but it seems like the median and IQR are well-behaved enough. This can become an issue when working with influence functions for such features of a population.
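One way to see the non-linearity, assuming "linear" is read as linear in \(P\) under mixtures, i.e. \(T[tP + (1-t)Q] = tT[P] + (1-t)T[Q]\), is the sketch below. The two small discrete distributions, the mixing weight, and the helper names are made up for illustration.

```python
import math

def mean_T(P):
    """Mean of a discrete distribution given as (value, prob) pairs."""
    return sum(p * x for x, p in P)

def median_T(P):
    """Smallest x whose cumulative probability reaches 0.5."""
    cum = 0.0
    for x, p in sorted(P):
        cum += p
        if cum >= 0.5 - 1e-12:
            return x
    return max(x for x, _ in P)

def mix(P, Q, t):
    """Mixture distribution t * P + (1 - t) * Q."""
    return [(x, t * p) for x, p in P] + [(x, (1 - t) * q) for x, q in Q]

P = [(0.0, 0.2), (1.0, 0.6), (2.0, 0.2)]  # mean 1, median 1
Q = [(10.0, 1.0)]                          # point mass at 10
t = 0.25
M = mix(P, Q, t)

# The mean of the mixture equals the mixture of the means.
mean_gap = abs(mean_T(M) - (t * mean_T(P) + (1 - t) * mean_T(Q)))

# The median of the mixture does not equal the mixture of the medians.
median_mix = median_T(M)
median_combo = t * median_T(P) + (1 - t) * median_T(Q)

print(mean_gap)
print(median_mix, median_combo)
```

The mean passes straight through the mixture, while the median jumps to wherever most of the mass sits, which is the behavior that makes its derivative harder to pin down.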
What I will cover later
In a future blog post, I will cover Banach and Hilbert spaces, as well as some other definitions like bounded linear operators, perturbations, the Riesz representation theorem, etc. Hopefully this will get us to a slightly more rigorous formulation of functional derivatives than what might be covered in calculus of variations, while staying grounded enough not to make me go completely insane.