This inequality appears on page 36 of the book Foundations of Machine Learning, but the author states it without proof. $$ \mathbb{P}\left[\left|R(h)-\widehat{R}_{S}(h)\right|>\epsilon\right] \leq 4 \Pi_{\mathcal{H}}(2 m) \exp \left(-\frac{m \epsilon^{2}}{8}\right) $$
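To get a feel for what the bound says, here is a small sketch (my own, not from the book) that just evaluates its right-hand side for a class whose growth function is $\Pi_{\mathcal{H}}(k) = k+1$, e.g. threshold functions on the real line:

```python
import math

def vc_bound(m, eps):
    """Right-hand side of the inequality for a hypothesis class with
    growth function Pi(k) = k + 1 (e.g. thresholds on the real line):
    4 * Pi(2m) * exp(-m * eps^2 / 8)."""
    return 4 * (2 * m + 1) * math.exp(-m * eps**2 / 8)

# The bound is vacuous (> 1) for small samples but becomes meaningful
# once m * eps^2 dominates the log of the growth function.
print(vc_bound(100, 0.1))    # large: polynomial factor dominates
print(vc_bound(10000, 0.1))  # below 1: exponential factor dominates
```

Note the trade-off the formula encodes: the polynomial growth factor $\Pi_{\mathcal{H}}(2m)$ fights the exponential decay in $m\epsilon^2$, so the bound is only informative for sufficiently large $m$.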

Here the growth function $\Pi_{\mathcal{H}}: \mathbb{N} \rightarrow \mathbb{N}$ of a hypothesis set $\mathcal{H}$ is defined by: $$ \forall m \in \mathbb{N}, \quad \Pi_{\mathcal{H}}(m)=\max _{\left\{x_{1}, \ldots, x_{m}\right\} \subseteq X}\left|\left\{\left(h\left(x_{1}\right), \ldots, h\left(x_{m}\right)\right): h \in \mathcal{H}\right\}\right| $$
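As a concrete illustration (my own example, not from the book), the growth function can be computed by brute force for a simple class. For threshold classifiers $h_t(x) = 1_{x \geq t}$ on the real line, $m$ distinct points admit exactly $m+1$ distinct labelings:

```python
def growth_function_thresholds(points):
    """Count the distinct labelings of `points` induced by threshold
    classifiers h_t(x) = 1 if x >= t else 0, as t ranges over the reals.
    Only thresholds at each point (and one above the maximum) can change
    the labeling, so it suffices to enumerate those."""
    thresholds = sorted(points) + [max(points) + 1]
    labelings = {tuple(int(x >= t) for x in points) for t in thresholds}
    return len(labelings)

# m = 3 distinct points -> m + 1 = 4 labelings:
# (1,1,1), (0,1,1), (0,0,1), (0,0,0)
print(growth_function_thresholds([0.3, 1.1, 2.5]))  # 4
```

This matches the maximum in the definition whenever the points are distinct, so $\Pi_{\mathcal{H}}(m) = m+1$ for this class.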

Given a hypothesis $h \in \mathcal{H}$, a target concept $c \in \mathcal{C}$, and an underlying distribution $\mathcal{D}$, the generalization error or risk of $h$ is defined by $$ R(h)=\underset{x \sim \mathcal{D}}{\mathbb{P}}[h(x) \neq c(x)]=\underset{x \sim \mathcal{D}}{\mathbb{E}}\left[1_{h(x) \neq c(x)}\right] $$ where $1_{\omega}$ is the indicator function of the event $\omega$.

And the empirical error or empirical risk of $h$ on a sample $S=\left(x_{1}, \ldots, x_{m}\right)$ drawn i.i.d. from $\mathcal{D}$ is defined by $$ \widehat{R}_{S}(h)=\frac{1}{m} \sum_{i=1}^{m} 1_{h\left(x_{i}\right) \neq c\left(x_{i}\right)} $$
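A quick simulation (a hypothetical setup of my own, not from the book) shows the two quantities the inequality compares. Take $\mathcal{D}$ uniform on $[0,1]$, target $c(x) = 1_{x \geq 0.5}$, and hypothesis $h(x) = 1_{x \geq 0.6}$, so they disagree exactly on $[0.5, 0.6)$ and $R(h) = 0.1$:

```python
import random

random.seed(0)  # reproducible sample

c = lambda x: int(x >= 0.5)  # target concept
h = lambda x: int(x >= 0.6)  # hypothesis under study

# True risk: h and c disagree exactly on [0.5, 0.6), which has
# probability 0.1 under the uniform distribution on [0, 1].
true_risk = 0.1

# Empirical risk on an i.i.d. sample S of size m.
m = 10000
S = [random.random() for _ in range(m)]
emp_risk = sum(h(x) != c(x) for x in S) / m

print(abs(emp_risk - true_risk))  # typically small for large m
```

The inequality above bounds the probability that this gap $|R(h)-\widehat{R}_S(h)|$ exceeds $\epsilon$, uniformly over $h \in \mathcal{H}$.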

In the book, the author proves a similar inequality, differing from this one only in the constants, using Rademacher complexity, but he remarks that the stated inequality can be proved without Rademacher complexity. Does anyone know how to prove it?