Now that I have some background on functionals, as well as Banach and Hilbert spaces, I want to discuss functional derivatives. A lot of this is taken from Wikipedia.

In Calculus of Variations

We first need to define what we mean by a “small change in a function \(f\)”, which leads us to the concept of a variation.

A perturbation of a function \(f(x)\) is defined as \(f(x) \rightarrow f(x) + \epsilon h(x)\), where \(h(x)\) is an arbitrary function and \(\epsilon\) is a small real parameter.

A variation of a function \(f(x)\), denoted \(\delta f\), is… (why do we take the derivative with respect to \(\epsilon\)?)
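One way to see why we differentiate with respect to \(\epsilon\): for a functional \(J\), the quantity \(\frac{d}{d\epsilon} J[f + \epsilon h] \big|_{\epsilon = 0}\) (the usual definition of the first variation) is the coefficient of the term linear in \(\epsilon\) in \(J[f + \epsilon h] - J[f]\), i.e. the rate at which \(J\) changes as we push \(f\) in the direction \(h\). The sketch below checks this numerically for the arc-length functional \(J[f] = \int_0^1 \sqrt{1 + f'(x)^2}\, dx\), an example chosen here for illustration; the helper names (`integrate`, `arc_length`, `first_variation_fd`, `first_variation_formula`) are hypothetical. Differentiating under the integral gives \(\frac{d}{d\epsilon} J[f + \epsilon h] \big|_{\epsilon = 0} = \int_0^1 \frac{f'(x) h'(x)}{\sqrt{1 + f'(x)^2}}\, dx\), and a central difference in \(\epsilon\) should agree with it.

```python
import math


def integrate(g, a=0.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx


def arc_length(df):
    """Hypothetical example functional J[f] = integral over [0, 1] of sqrt(1 + f'(x)^2).

    Takes the derivative f' (as df) directly, since only f' appears in J.
    """
    return integrate(lambda x: math.sqrt(1.0 + df(x) ** 2))


def first_variation_fd(df, dh, eps=1e-5):
    """Central difference in eps of J[f + eps*h] at eps = 0."""
    plus = arc_length(lambda x: df(x) + eps * dh(x))
    minus = arc_length(lambda x: df(x) - eps * dh(x))
    return (plus - minus) / (2 * eps)


def first_variation_formula(df, dh):
    """d/d(eps) of J[f + eps*h] at eps = 0, differentiated under the integral."""
    return integrate(lambda x: df(x) * dh(x) / math.sqrt(1.0 + df(x) ** 2))


def dh(x):
    """h(x) = sin(pi x), so h'(x) = pi cos(pi x); h vanishes at 0 and 1."""
    return math.pi * math.cos(math.pi * x)


def d_line(x):
    """f(x) = x, so f'(x) = 1."""
    return 1.0


def d_parabola(x):
    """f(x) = x^2, so f'(x) = 2x."""
    return 2.0 * x


print(first_variation_fd(d_line, dh))  # approximately 0
print(first_variation_fd(d_parabola, dh), first_variation_formula(d_parabola, dh))
```

For the straight line \(f(x) = x\) the first variation vanishes, consistent with a straight line being a critical point of arc length among curves with the same endpoints (here \(h(0) = h(1) = 0\)).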

Big O and Little o-notation

Apparently this notation is common in math and statistics for describing asymptotics (which makes sense: we want to understand how a function grows as its argument approaches infinity, and when that behavior kicks in), so I might as well clear up any confusion, especially since I may want to look at data structures in the future.

\(O(g)\) is the set of functions \(f\) for which there exist constants \(k \in \mathbb{R}^{+}\) and \(a\) such that \(|f(x)| \leq k|g(x)|\) for all \(x > a\). This means that \(f\) grows no faster than \(g\) (up to a constant factor) as \(x \rightarrow \infty\). Little o is similar but stronger: \(f \in o(g)\) if for every \(k \in \mathbb{R}^{+}\) there is some \(a\) (which may depend on \(k\)) such that \(|f(x)| < k|g(x)|\) for all \(x > a\). In words, \(f\) becomes negligible compared to \(g\).

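In terms of limits (assuming \(g(x) \neq 0\) for large \(x\)), \(f \in O(g)\) if \(\limsup_{x \to \infty} \left| \frac{f(x)}{g(x)} \right| < \infty\), and \(f \in o(g)\) if \(\limsup_{x \to \infty} \left| \frac{f(x)}{g(x)} \right| = 0\), which is the same as \(\lim_{x \to \infty} \frac{f(x)}{g(x)} = 0\).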

Why the limit supremum? Because there are some functions where the limit does not exist (but recall from analysis that the \(\limsup\) always exists in the extended real number line).

Here are some examples:

\(400x^2 \in O(x^2)\)

\(3 + (n \bmod 2) \in O(1)\)

\(3x \notin o(x)\), but \(3x \in O(x)\)
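Sampling the ratio \(f(x)/g(x)\) at a few growing inputs can't prove an asymptotic statement, but it makes these examples concrete. A minimal sketch (the helper `ratios` is hypothetical):

```python
def ratios(f, g, xs):
    """Return the list of f(x) / g(x) at each sample point."""
    return [f(x) / g(x) for x in xs]


xs = [10, 100, 1_000, 10_000]

# 400x^2 in O(x^2): the ratio is the constant 400, which is bounded.
quadratic = ratios(lambda x: 400 * x**2, lambda x: x**2, xs)

# 3x in O(x) but not in o(x): the ratio stays at 3 instead of tending to 0.
linear = ratios(lambda x: 3 * x, lambda x: x, xs)

# 3 + (n mod 2) in O(1): the ratio alternates between 4 (odd n) and 3 (even n),
# so the limit does not exist, but the ratio stays bounded by 4.
oscillating = ratios(lambda n: 3 + n % 2, lambda n: 1, range(1, 11))

print(quadratic)
print(linear)
print(oscillating)
```

The oscillating sequence is exactly the case where the limit fails to exist while \(\limsup = 4 < \infty\).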

Multivariable Calculus and Differentials

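For a scalar function \(f: \mathbb{R}^n \rightarrow \mathbb{R}\), the total derivative at a point \(x\) is the linear map \(Df(x)\) satisfying \(f(x + v) - f(x) = Df(x)[v] + o(\|v\|)\) as \(v \rightarrow 0\). We write it in terms of a vector, the gradient, via \(Df(x)[v] = \nabla f(x) \cdot v\). This is the induced linear map I know from differential geometry. For vector-valued functions \(f: \mathbb{R}^n \rightarrow \mathbb{R}^m\) (vector fields are the case \(m = n\)), we extend this by adding another dimension: the total derivative becomes the \(m \times n\) Jacobian matrix. In one dimension, the total derivative is literally just the ordinary derivative, which is where I was initially confused.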
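A quick numerical check of the defining property of the total derivative, that the remainder \(f(x + v) - f(x) - Df(x)[v]\) is \(o(\|v\|)\). The vector field \(F(x, y) = (x^2 y, \sin x + y)\), its hand-computed Jacobian, and the helper names are hypothetical. Since the total derivative has to work for every direction of approach, the sketch takes the worst remainder ratio over a handful of directions at each step size; that ratio should shrink roughly in proportion to the step.

```python
import math


def F(x, y):
    """Hypothetical vector field F: R^2 -> R^2."""
    return (x * x * y, math.sin(x) + y)


def jacobian(x, y):
    """Hand-computed 2x2 Jacobian of F; row i is the gradient of component i."""
    return ((2 * x * y, x * x),
            (math.cos(x), 1.0))


def remainder_ratio(x, y, vx, vy):
    """||F(p + v) - F(p) - J(p) v|| / ||v|| at p = (x, y)."""
    fx, fy = F(x, y)
    gx, gy = F(x + vx, y + vy)
    (a, b), (c, d) = jacobian(x, y)
    rx = gx - fx - (a * vx + b * vy)
    ry = gy - fy - (c * vx + d * vy)
    return math.hypot(rx, ry) / math.hypot(vx, vy)


def worst_ratio(x, y, step, n_dirs=16):
    """Largest remainder ratio over n_dirs evenly spaced directions with ||v|| = step."""
    angles = (2 * math.pi * k / n_dirs for k in range(n_dirs))
    return max(remainder_ratio(x, y, step * math.cos(t), step * math.sin(t))
               for t in angles)


for step in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(step, worst_ratio(1.0, 2.0, step))
```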

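Definition of a directional derivative in limit form (including it because I don't think I learned this): for \(f: \mathbb{R}^n \rightarrow \mathbb{R}\), a point \(x\), and a direction \(v \in \mathbb{R}^n\),

\[D_v f(x) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}.\]

If \(f\) is differentiable at \(x\), then \(D_v f(x) = \nabla f(x) \cdot v\).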
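A small sketch of the limit definition, assuming a hypothetical \(f(x, y) = x^2 y + e^y\) with a hand-computed gradient: the difference quotient should approach \(\nabla f \cdot v\) as \(t\) shrinks. Here \(v\) is not normalized; some texts require \(\|v\| = 1\) in the definition, which only rescales the result.

```python
import math


def f(x, y):
    """Hypothetical scalar function f: R^2 -> R."""
    return x * x * y + math.exp(y)


def grad_f(x, y):
    """Hand-computed gradient of f."""
    return (2 * x * y, x * x + math.exp(y))


def difference_quotient(x, y, vx, vy, t):
    """(f(p + t v) - f(p)) / t from the limit definition, at p = (x, y)."""
    return (f(x + t * vx, y + t * vy) - f(x, y)) / t


x, y = 1.0, 0.5
vx, vy = 3.0, 4.0
gx, gy = grad_f(x, y)
exact = gx * vx + gy * vy  # grad f(p) . v

for t in [1e-1, 1e-3, 1e-5]:
    print(t, difference_quotient(x, y, vx, vy, t), exact)
```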

The Derivatives

Let \(B\) be a Banach space (remember this can be a space of functions, but it doesn't have to be), and let \(F: B \rightarrow \mathbb{R}\) be some functional. The differential of \(F\) at a point \(\rho \in B\) is a bounded linear functional \(\delta F_{\rho}[\cdot]\) such that for all \(\phi \in B\),

\[F[\rho + \phi] - F[\rho] = \delta F_{\rho}[\phi] + \epsilon \| \phi \|\]

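where \(\epsilon = \epsilon(\phi)\) is a real number that depends on \(\phi\) and satisfies \(\epsilon \rightarrow 0\) as \(\| \phi \| \rightarrow 0\) (this \(\epsilon\) is not the perturbation parameter from earlier). If such a \(\delta F_{\rho}\) exists, we say \(F\) is Frechet differentiable at \(\rho\), and \(\delta F_{\rho}\) is its Frechet derivative: the analogue of the total derivative.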

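In limit form, this says

\[\lim_{\| \phi \| \to 0} \frac{\left| F[\rho + \phi] - F[\rho] - \delta F_{\rho}[\phi] \right|}{\| \phi \|} = 0.\]

We shrink the norm of an arbitrary \(\phi\), so \(\phi\) can approach zero from any direction.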

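What if we instead fix an arbitrary direction \(\phi\) and shrink only along it? This is analogous to the directional derivative, and gives the Gateaux derivative of \(F\) at \(\rho\) in the direction \(\phi\):

\[dF_{\rho}[\phi] = \lim_{t \to 0} \frac{F[\rho + t\phi] - F[\rho]}{t} = \frac{d}{dt} F[\rho + t\phi] \Big|_{t = 0}.\]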

In fact, if the Frechet derivative exists, then the Gateaux derivative exists too and equals it in every direction \(\phi\); the converse does not hold in general. One terminology note: Frechet = strong, Gateaux = weak (pathwise).

Functional derivatives don't need to be defined on a Banach space, but the more general setting makes things a bit too complicated for my purposes.

Example: A simple functional

Let \(J[f] = \int (f(x))^2 dx\) be a functional. We want to compute its Frechet derivative.
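Here is one way to carry out the computation, assuming \(J\) is defined on \(L^2\) of whatever domain the integral runs over (the domain is left unspecified), with \(\| \phi \| = \left( \int \phi(x)^2 dx \right)^{1/2}\). Expanding the square,

\[J[f + \phi] - J[f] = \int \left( 2 f(x)\phi(x) + \phi(x)^2 \right) dx = 2 \int f(x) \phi(x)\, dx + \| \phi \|^2.\]

Matching this against \(F[\rho + \phi] - F[\rho] = \delta F_{\rho}[\phi] + \epsilon \| \phi \|\) gives \(\delta J_f[\phi] = 2 \int f(x) \phi(x)\, dx\) and \(\epsilon = \| \phi \|\), which does go to \(0\) as \(\| \phi \| \rightarrow 0\). The map \(\phi \mapsto 2 \int f \phi\, dx\) is linear, and it is bounded by Cauchy-Schwarz, since \(|\delta J_f[\phi]| \leq 2 \|f\| \|\phi\|\). So the Frechet derivative is \(\delta J_f[\phi] = 2 \int f(x)\phi(x)\,dx\), often summarized as \(\frac{\delta J}{\delta f}(x) = 2 f(x)\).

A numerical check, assuming the domain \([0, 1]\) and the hypothetical choices \(f(x) = \sin(2\pi x) + x\) and \(\phi(x) = \cos(3x)\): after discretizing with the midpoint rule, the remainder divided by \(\|s\phi\|\) should equal \(s\|\phi\|\), so it shrinks linearly with the step size \(s\).

```python
import math

N = 1000  # midpoint-rule cells on the assumed domain [0, 1]
XS = [(i + 0.5) / N for i in range(N)]


def integrate(values):
    """Midpoint-rule integral over [0, 1] of samples taken at XS."""
    return sum(values) / N


def J(f):
    """Discretized J[f] = integral of f(x)^2 dx."""
    return integrate([f(x) ** 2 for x in XS])


def dJ(f, phi):
    """Derived Frechet derivative: dJ_f[phi] = 2 * integral of f(x) phi(x) dx."""
    return 2 * integrate([f(x) * phi(x) for x in XS])


def f(x):
    return math.sin(2 * math.pi * x) + x


def phi(x):
    return math.cos(3 * x)


PHI_NORM = math.sqrt(J(phi))  # discretized L^2 norm of phi


def frechet_ratio(s):
    """(J[f + s phi] - J[f] - dJ_f[s phi]) / ||s phi|| for a step size s > 0."""
    def step(x):
        return f(x) + s * phi(x)

    def s_phi(x):
        return s * phi(x)

    return (J(step) - J(f) - dJ(f, s_phi)) / (s * PHI_NORM)


for s in [1e-1, 1e-2, 1e-3]:
    print(s, frechet_ratio(s), s * PHI_NORM)
```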

Now a case where the Frechet derivative doesn’t exist but the Gateaux derivative does:
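One standard example, chosen here for illustration and worked in the finite-dimensional Banach space \(B = \mathbb{R}^2\) with the Euclidean norm: let \(F(x, y) = \frac{x^3 y}{x^6 + y^2}\) for \((x, y) \neq (0, 0)\) and \(F(0, 0) = 0\). For a direction \((a, b)\) with \(b \neq 0\),

\[\frac{F(ta, tb) - F(0, 0)}{t} = \frac{t a^3 b}{t^4 a^6 + b^2} \rightarrow 0 \quad (t \rightarrow 0),\]

and \(F(ta, 0) = 0\), so the Gateaux derivative at the origin exists and is the zero map, which is linear and bounded. But along the curve \(y = x^3\), \(F(x, x^3) = \frac{x^6}{2x^6} = \frac{1}{2}\) for every \(x \neq 0\), so \(F\) is not even continuous at the origin. A Frechet derivative would have to agree with the Gateaux derivative (the zero map), and then the remainder \(|F(\phi)| / \|\phi\|\) along this curve is \(\frac{1}{2\|\phi\|} \rightarrow \infty\). A sketch of both behaviors (helper names are hypothetical):

```python
import math


def F(x, y):
    """F(x, y) = x^3 y / (x^6 + y^2), with F(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x ** 3 * y / (x ** 6 + y ** 2)


def gateaux_quotient(a, b, t):
    """(F(t*(a, b)) - F(0, 0)) / t: the Gateaux difference quotient at the origin."""
    return (F(t * a, t * b) - F(0.0, 0.0)) / t


# Along every straight line through the origin the quotient goes to 0,
# so the Gateaux derivative at the origin is the zero map.
for a, b in [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, -0.5)]:
    print((a, b), [gateaux_quotient(a, b, t) for t in (1e-1, 1e-3, 1e-5)])

# Along the curve y = x^3, F stays at 1/2, so F is not continuous at the
# origin, and the Frechet remainder |F(phi)| / ||phi|| blows up.
for x in (1e-1, 1e-2, 1e-3):
    y = x ** 3
    print(x, F(x, y), F(x, y) / math.hypot(x, y))
```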

Application: Influence Functions

What we are doing here is not optimization, so we don't need the tools from calculus of variations (although I would still like to cover the Euler-Lagrange equations). Instead, we want to see how much a perturbation changes a statistical quantity we are interested in, such as the mean or the conditional mean (regression).
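A standard way to make this precise is the influence function of a statistical functional \(T\) at a distribution \(P\): \(\mathrm{IF}(x; T, P) = \lim_{\epsilon \to 0^{+}} \frac{T((1 - \epsilon)P + \epsilon \delta_x) - T(P)}{\epsilon}\), where \(\delta_x\) is a point mass at \(x\). This is a Gateaux derivative of \(T\) in the direction \(\delta_x - P\). For the mean, \(T((1 - \epsilon)P + \epsilon\delta_x) = (1 - \epsilon)\mu + \epsilon x\), so \(\mathrm{IF}(x) = x - \mu\). A minimal sketch on a discrete (empirical) distribution, with hypothetical helper names and a made-up sample:

```python
def weighted_mean(points, weights):
    """T(P) = E_P[X] for a discrete distribution P given by points and weights."""
    return sum(p * w for p, w in zip(points, weights)) / sum(weights)


def contaminate(points, weights, x, eps):
    """Return the mixture (1 - eps) P + eps * delta_x as (points, weights)."""
    return points + [x], [(1 - eps) * w for w in weights] + [eps]


def influence(T, points, weights, x, eps=1e-6):
    """One-sided difference quotient (T((1 - eps) P + eps delta_x) - T(P)) / eps."""
    new_points, new_weights = contaminate(points, weights, x, eps)
    return (T(new_points, new_weights) - T(points, weights)) / eps


sample = [2.0, 3.5, 1.0, 4.5, 4.0]           # made-up data
uniform = [1.0 / len(sample)] * len(sample)  # empirical distribution
mu = weighted_mean(sample, uniform)

for x in [0.0, 3.0, 10.0]:
    print(x, influence(weighted_mean, sample, uniform, x), x - mu)
```

Because the mean is linear in \(P\), the difference quotient equals \(x - \mu\) up to rounding for any \(\epsilon\); for a nonlinear functional (a variance, say) the same code would only approximate the limit.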

Frechet and Gateaux Derivatives