\section{Overview}
\url{https://www.math.ucla.edu/~tao/preprints/forms.pdf}
Differential calculus is a way to compute quantities related to functions by treating the smooth
curve or surface of function output values as being composed of many local linear functions. Each
linear approximation applies over a tiny (arbitrarily small) local interval; the linear
approximation in the next interval will in general have a slightly different gradient.
A central concept in differential calculus is the \textit{differential}: the change in output value
caused by a small change in the input value, at some starting input value. This describes the way in
which the function output changes in response to changes in input. Differentials are often used to
compute a \textit{derivative}: the ratio of change in output value to the change in some input
value. Derivatives define a local \textit{linear approximation} to the function: over a small local
region we consider the real function to be approximated by a line with gradient equal to the
derivative at that point.
The above is differential calculus. Integral calculus is concerned with ``summing'' the output values
of a function associated with some region in the input space. In the familiar case, the input space
is a section of the real number line, and the output values are also real numbers. So ``summing'' the
output values corresponds to calculating the area under a curve (i.e. under the graph of the
function).
Now allow the input space to be a higher dimensional Euclidean space, e.g. some region of the plane
$\R^2$, but keep the output values as being simply real numbers. One question is: what is the value
of the integral along some 1-dimensional \textit{path} through the input space? We imagine dividing
the input space up into many small sections (vectors) $\Delta x_i$, as usual. However, when computing
the contribution from one such infinitesimal section, it is not sufficient to say simply that this
is $f(x_i)|\Delta x_i|$. The reason is that the appropriate contribution might depend not only on the
position $x_i$ but also on the direction of the infinitesimal displacement vector
$\Delta x_i$. Therefore, we define $\omega_{x_i}$ to be the linear mapping that takes as input
$\Delta x_i$ and outputs the ``height'' $f(x_i)$.
What does this look like in the simple case where the answer is insensitive to the direction of the
infinitesimal displacement vector $\Delta x_i$? I think $\omega$ would depend on $|\Delta x_i|$ only,
and not otherwise on $\Delta x_i$.
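As a concrete (if informal) illustration of the direction-dependent picture, here is a minimal
numerical sketch. The 1-form $\omega = y \,\d x + x \,\d y$ and the path are my own illustrative
choices, not taken from the text above; the point is only the mechanics of feeding each displacement
vector $\Delta x_i$ through a linear map $\omega_{x_i}$ and summing along the path.
\begin{verbatim}
# A numerical sketch of integrating along a path in R^2 when the
# contribution of each small step depends on the step's direction.
# omega is the linear map described above: at each point it takes the
# displacement vector and returns a number. Illustrative choice:
# omega = y dx + x dy, whose exact integral along any path from
# (0, 0) to (1, 1) is 1.

def omega(point, step):
    """Linear in `step`: contribution of displacement `step` at `point`."""
    (x, y), (dx, dy) = point, step
    return y * dx + x * dy

def path(t):
    """An illustrative path from (0, 0) to (1, 1), for t in [0, 1]."""
    return (t, t * t)

N = 100_000
total = 0.0
for i in range(N):
    p0, p1 = path(i / N), path((i + 1) / N)
    step = (p1[0] - p0[0], p1[1] - p0[1])
    total += omega(p0, step)

print(total)  # ~1.0, independent of how the path is subdivided
\end{verbatim}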
Another question is what is the value of the integral over some higher dimensional region of input
space (e.g. a subset of the plane).
\section{Functions of a single variable}
\subsection{Definition of derivative}
Sussman et al. Structure and Interpretation of Classical Mechanics p.482-483:
\begin{quote}
``The derivative of a function $f$ is the function $D f$ whose value for a particular argument is something
that can be multiplied by an increment $\Delta x$ in the argument to get a linear approximation to the
increment in the value of $f$: $f(x + \Delta x) \approx f(x) + D f(x) \Delta x$.''\footnote{Sussman et
al. Structure and Interpretation of Classical Mechanics p.482}
\end{quote}
\begin{quote}
``The derivative of a real-valued function of multiple arguments is an object whose contraction with the tuple
of increments in the arguments gives a linear approximation to the increment in the function’s
value.''\footnote{Sussman et al. Structure and Interpretation of Classical Mechanics p.483}
\end{quote}
\begin{definition*}~\\
A \defn{derivative} of a function $f$ is the function $D f$. When $D f$ is evaluated at an input value the
result is something which can be multiplied by an increment to the function's input to give a linear
approximation to the increment in output:
\begin{align*}
f(x + \Delta x) \approx f(x) + (D f)(x)\Delta x.
\end{align*}
Note that this implies that the product (``contraction'' or matrix product etc) of the evaluated derivative
with the input increment is something which can be added to $f(x)$, i.e. it's in the codomain of $f$.
E.g. consider a linear map $f:\R^n \to \R^m$ (which can be represented by a
matrix $A \in \R^{m \times n}$). Let $x \in \R^n$ and let $U = (D f)(x)$. It must be the case that one or
other of
\begin{align*}
&f(x) + U \cdot \Delta x ~~~~~~~\text{xor} \\
&f(x) + \Delta x \cdot U
\end{align*}
is valid (compatible for multiplication) and is an approximation to $f(x + \Delta x)$.
We have $\Delta x \in \R^n$ and $f(x) \in \R^m$. So if we're saying that $f(x) = Ax$, then $x$ and $\Delta x$
are $(n \times 1)$ column vectors, and $f(x)$ is a $(m \times 1)$ column vector. So we need something that
maps column vectors in $\R^n$ to column vectors in $\R^m$, i.e. $U \in \R^{m \times n}$ and the version that
is valid is
\begin{align*}
&f(x) + U \cdot \Delta x.
\end{align*}
This definition holds for a function with $n$ inputs: the derivative function has $n$ inputs and $n$
outputs. Its output is something whose ``contraction''\footnote{I understand ``contraction'' to refer to the
multiplicative combination of one object with another object from the dual space. So for example, the
matrix product of a row vector on the left with a column vector on the right.} with the increment in the
function inputs gives a linear approximation to the increment in output.
In the case where these inputs and outputs are $n$-dimensional vectors in $\R^n$ we can write this
\begin{align*}
f(\overrightarrow{x} + \overrightarrow{\Delta x}) \approx \overrightarrow{f(x)} + \overrightarrow{(D f)(x)} \cdot \overrightarrow{\Delta x}.
\end{align*}
Note that the value of the derivative $(D f)(x)$ is compatible for multiplication with the increment
vector $\Delta x$. This is connected to the notions of column vector/row vector, linear
functional\footnote{\url{https://en.wikipedia.org/wiki/Linear_form}}, vector/covector, tensor algebra etc. In SICM they refer to the
output of the derivative function being a ``down tuple'', whereas all the other tuples here are ``up tuples''.
A \defn{partial derivative} is one component of the derivative of a function of multiple inputs.
So for a function $f:X \to Y$, the derivative is the function $D f:X \to X^*$, where $X^*$ is a space
containing versions of $x \in X$ that are compatible for multiplication/contraction with $x$, i.e. a ``dual''
space.
Suppose $f$ has an argument named $a$ that is of type $A$. Then the partial derivative of $f$ with respect to
that argument is $\partial_a{f}:X \to A^*$.
\end{definition*}
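A minimal numerical sketch of this definition (my own example: the map
$f(x, y) = (x^2 y, \sin x + y)$ and its Jacobian, not anything from SICM): the value $(D f)(x)$ is
the $m \times n$ matrix $U$, and $f(x) + U \cdot \Delta x$ approximates $f(x + \Delta x)$.
\begin{verbatim}
import numpy as np

def f(v):
    x, y = v
    return np.array([x**2 * y, np.sin(x) + y])

def Df(v):
    # Jacobian of f, worked out by hand for this example.
    x, y = v
    return np.array([[2*x*y,     x**2],
                     [np.cos(x), 1.0 ]])

x  = np.array([0.7, -1.3])
dx = np.array([1e-4, -2e-4])

lhs = f(x + dx)
rhs = f(x) + Df(x) @ dx   # U is (m x n); it left-multiplies the column dx
print(np.max(np.abs(lhs - rhs)))  # ~1e-8: agreement to second order in dx
\end{verbatim}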
\subsection{The chain rule}
\begin{theorem}
Let $g:U \to V$ and $f:V \to W$ be functions with derivatives $g':U \to U$ and $f':V \to V$\footnote{Actually,
the output of the derivative function is an element of a dual space, i.e. if the input to $f$ is a column
vector then the output of $f'$ is a row vector.}. Then their composition $f \circ g$ has
derivative $(f \circ g)':U \to U$ given by
\begin{align*}
(f \circ g)' = g' \cdot (f' \circ g).
\end{align*}
\end{theorem}
{\bf Intuition}: By definition, $(f \circ g)'$ is a function that takes in an increment in the domain of $g$ and returns
something which multiplies that increment to give an approximation to the resulting change in the output
of $f$. The change in the output of $f$ is due to two sources: the sensitivity of $g$ to changes in its input,
and the sensitivity of $f$ to the output of $g$.
Similarly, by definition, $g'$ is a function that takes an increment in the domain of $g$ and returns
something which multiplies that increment to give an approximation to the change in output of $g$.
And $(f' \circ g)$ is a function that takes in a value in the domain of $g$, and returns something which
multiplies an increment in the domain of $f$ to give an approximation to the change in output of $f$. It's the ``derivative of $f$ at $g$''.
In Leibniz notation this might be written as
\begin{align*}
\ddu f(g(u)) = \dgdu \dfdg.
\end{align*}
\begin{proof}
TODO
\end{proof}
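Pending a proof, a quick finite-difference check of the theorem in the scalar case ($f$, $g$ and
the evaluation point are my own arbitrary choices):
\begin{verbatim}
import math

g  = lambda u: u**3 + u
gp = lambda u: 3*u**2 + 1        # g'
f  = lambda v: math.sin(v)
fp = lambda v: math.cos(v)       # f'

u, h = 0.4, 1e-6
finite_diff = (f(g(u + h)) - f(g(u - h))) / (2 * h)  # centred difference
chain_rule  = gp(u) * fp(g(u))                       # g' . (f' o g)
print(abs(finite_diff - chain_rule))  # ~1e-10
\end{verbatim}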
\subsection{The product rule}
\begin{theorem}
Let $f:U \to U$ and $g:U \to U$. Then their product $fg:U \to U$ has derivative
$(fg)':U \to U$ given by
\begin{align*}
(fg)' = f'g + g'f.
\end{align*}
\end{theorem}
\begin{example}
\begin{align*}
\ddx \(x^2\sin(x)\) = 2x\sin(x) + \cos(x)x^2.
\end{align*}
In this example, $f(x) = x^2$ and $g(x) = \sin(x)$. Whereas the theorem was stated above at
the level of functions, this Leibniz notation gives the value of the derivative-of-the-product at
a single input value $x$.
\end{example}
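A quick numerical check of this example (the evaluation point is arbitrary):
\begin{verbatim}
import math

x, h = 1.1, 1e-6
prod = lambda x: x**2 * math.sin(x)
finite_diff  = (prod(x + h) - prod(x - h)) / (2 * h)
product_rule = 2*x*math.sin(x) + math.cos(x)*x**2
print(abs(finite_diff - product_rule))  # ~1e-10
\end{verbatim}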
\subsection{Integration by substitution}
\todo{Incomplete}
\begin{theorem}[Integration by substitution]
Let $g:X \to Y$ and $f:Y \to Z$. Then
\begin{align*}
\int f(g(x)) g'(x) \dx = \int f(g) \dg.
\end{align*}
\end{theorem}
\begin{proof}
From the chain rule we have that if $g:U \to V$ and $f:V \to W$, then
\begin{align*}
(f \circ g)' = g' \cdot (f' \circ g).
\end{align*}
Taking antiderivatives of both sides gives
\begin{align*}
f \circ g = \int (f' \circ g) \cdot g' \du + C,
\end{align*}
and we can make the replacement $g'\du = \dg$ yielding
\begin{align*}
f \circ g = \int (f' \circ g) \dg + C.
\end{align*}
\end{proof}
\begin{theorem*}[Integration by substitution]
Let $u = h(x)$. Then
\begin{align*}
\int g(h(x))h'(x) \dx = \int g(u) \du.
\end{align*}
\end{theorem*}
\begin{proof}
Let $G' = g$, i.e. $G$ is an antiderivative of $g$.
Recall the chain rule:
\begin{align*}
(G \circ h)' = (G' \circ h) \, h'
\end{align*}
Integrating both sides with respect to $x$ gives
\begin{align*}
G \circ h + C = \int (G' \circ h) h' \dx = \int (g \circ h) h' \dx.
\end{align*}
Let $u = h(x)$. Then
\begin{align*}
G(u) + C &= \int g(u) \du
= \int \frac{\dG}{\dh} \frac{\du}{\dx} \dx.
\end{align*}
\end{proof}
\subsection{Integration by parts}
\begin{theorem}
Let $f:X \to X$ and $g:X \to X$. Then
\begin{align*}
\int fg' \dx = fg - \int gf' \dx.
\end{align*}
\end{theorem}
\todo{Does the RHS need to be $fg - \int g\df$ instead?}
So, if you can recognise an integrand as having a factor that you can integrate, then rewriting the
integral in the IBP form may help.
In Leibniz notation this might be written
\begin{align*}
\int f(x)g'(x) \dx &= f(x)g(x) - \int f'(x)g(x) \dx,
\end{align*}
or
\begin{align*}
\int u \dvdx \dx = uv - \int v \du.
\end{align*}
\todo{$f'\du$ has become $\du$.}
\begin{proof}
From the product rule we have
\begin{align*}
(fg)' = f'g + g'f.
\end{align*}
Taking antiderivatives of both sides and rearranging gives the result.
\todo{But what happens to the constant of integration?}
\end{proof}
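A numerical sanity check of the theorem (my own example: $f(x) = x$ and $g(x) = e^x$ on $[0, 1]$,
where both sides equal $1$):
\begin{verbatim}
import math

def riemann(fn, a, b, n=200_000):
    # Midpoint Riemann sum for the definite integral of fn over [a, b].
    h = (b - a) / n
    return sum(fn(a + (i + 0.5) * h) for i in range(n)) * h

lhs = riemann(lambda x: x * math.exp(x), 0.0, 1.0)       # int f g' dx
rhs = (1 * math.exp(1) - 0 * math.exp(0)
       - riemann(lambda x: math.exp(x), 0.0, 1.0))       # [f g] - int g f' dx
print(lhs, rhs)  # both ~1.0
\end{verbatim}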
\subsection{Integration by parts: examples}
% \begin{mdframed}
% \includegraphics[width=400pt]{img/integration-by-parts-example-1.png}
% \end{mdframed}
\subsection{Integration by substitution: examples}
\footnotetext{\url{https://en.wikipedia.org/wiki/Integration_by_substitution\#Examples}}
\begin{example}
Evaluate
\begin{align*}
\int_{0}^{2} x\cos(x^2 + 1) \dx.
\end{align*}
\begin{mdframed}
\includegraphics[width=400pt]{img/calculus-integration-by-substitution-example-1.png}
\end{mdframed}
It's easy to see that an antiderivative is $\frac{1}{2}\sin(x^2 + 1)$, leading to the answer
$\frac{1}{2}(\sin 5 - \sin 1)$. 5 radians is in the fourth quadrant and 1 radian is in the first
quadrant, so $\sin 5$ is negative and $\sin 1$ is positive, and the final result is some negative
number (close to $-0.9$). But let's do it by substitution.
First, we define a function $u(x) = x^2 + 1$. So the integral is now
\begin{align*}
\int_{x=0}^{x=2} x\cos(u(x)) \dx.
\end{align*}
Next, we notice that $\dudx = 2x$, so the integral can be written as
\begin{align}
\int_{x=0}^{x=2} \frac{1}{2} \dudx \cos(u(x)) \dx. \label{int-by-subst-ex1-1}
\end{align}
So far, nothing we've done is questionable.
But now, we write the integral as
\begin{align*}
\int_{u=1}^{u=5} \frac{1}{2} \cos(u) \du.
\end{align*}
Clearly, this is going to give the same answer as above: $\frac{1}{2}(\sin 5 - \sin 1)$.
But, it requires justification. We've done 3 things:
\begin{enumerate}
\item We apparently replaced $\dudx \dx$ with $\du$.
\item We changed the integral limits to be the corresponding $u$ values.
\item We wrote $\cos(u)$ in place of $\cos(u(x))$.
\end{enumerate}
Note that \eqref{int-by-subst-ex1-1} is of the form
\begin{align*}
\int_{x=a}^{x=b} f(u(x))u'(x) \dx.
\end{align*}
How can we justify this jump?
First examine the indefinite integrals:
An antiderivative of $\frac{1}{2}\cos u$ is $\frac{1}{2}\sin u$.
What's an antiderivative of $\frac{1}{2} \dudx \cos(u(x))$?
\end{example}
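Whatever the formal justification, the steps can at least be confirmed numerically (a sketch of
mine): the original integral, the substituted integral, and the closed form all agree.
\begin{verbatim}
import math

def riemann(fn, a, b, n=200_000):
    # Midpoint Riemann sum for the definite integral of fn over [a, b].
    h = (b - a) / n
    return sum(fn(a + (i + 0.5) * h) for i in range(n)) * h

direct      = riemann(lambda x: x * math.cos(x**2 + 1), 0.0, 2.0)
substituted = riemann(lambda u: 0.5 * math.cos(u), 1.0, 5.0)
closed_form = 0.5 * (math.sin(5) - math.sin(1))
print(direct, substituted, closed_form)  # all ~ -0.9002
\end{verbatim}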
\begin{example}
Evaluate
\begin{align*}
\int _{0}^{1} \sqrt {1-x^{2}} \dx.
\end{align*}
We can see that this is going to be a positive number (larger than the integral without the square
root transformation). In fact, we can evaluate this immediately: note, for $x \in [0, 1]$, that
$\sqrt{1 - x^2}$ is the y-coordinate of the unit circle in the upper-right quadrant. So the answer
must be $\pi/4$.
This time, there's no obvious antiderivative.
But, we know that $\sin^2 \theta + \cos^2 \theta = 1$, and we notice that the expression
$\sqrt{1 - x^2}$ reminds us of $\sqrt{1 - \sin^2 \theta}$, which is equal to $\cos \theta$.
To proceed, we say ``Let $x = \sin \theta$.'' But what does that mean? Why can we just let $x$ be something else?
What we are doing is saying that, as we move from $x=0$ to $x = 1$, we are free to consider those
$x$ values to be the output of the $\sin$ function, as it sweeps through the first quadrant of the
unit circle ($0$ to $\frac{\pi}{2}$).\footnote{Note that the function $x(\theta) = \sin(\theta)$, when restricted to the domain
$(0, \frac{\pi}{2})$ is a bijective map between $\theta$ values in $(0, \frac{\pi}{2})$ and $x$ values in
$(0, 1)$. This means it is invertible: for every $x$ value along the path that we are integrating
over, there is a uniquely determined $\theta$ value.}
So basically, what we're going to do is evaluate this integral by expressing it as an integral along
a path through $\theta$ values instead of $x$ values. The mapping $x \mapsto \theta$ is defined by the inverse of the
$\sin$ function. We're doing this because, once expressed as an integral along a path through
$\theta$ values, it's going to be easy to evaluate.
So, the integral is now
\begin{align*}
\int _{x=0}^{x=1} \sqrt {1-\sin^{2} \theta} \dx,
\end{align*}
and we know that this is equivalent to
\begin{align*}
\int _{x=0}^{x=1} \cos \theta \dx.
\end{align*}
Notice that we have a $\dx$, and an integrand that's a function of some other variable $\theta$. So in
particular, it would be incorrect to just ``integrate $\cos \theta$'' and say that the answer is
$\sin \theta\Big|_0^1$.
What the integral is saying is: ``walk along the $x$ axis from 0 to 1, and accumulate $\cos \theta$ values as
you do so, where $\theta$ is the angle in the first quadrant whose $\sin$ is $x$.''
And to evaluate that integral, we want to express it as an integral over a path in $\theta$
space. Since $x = \sin \theta$, we have that $\dx = \cos \theta \d\theta$. So the integral is now
\begin{align*}
\int_{\theta=0}^{\theta=\pi/2} \cos^2 \theta \d\theta.
\end{align*}
To proceed one could use the double angle formula $\cos 2\theta = \cos^2\theta - \sin^2\theta$, or integration
by parts. These lead to a value of $\pi/4$, as they must, since the integral is the upper right
quadrant of the unit circle.
\end{example}
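A numerical confirmation of the change of variables (sketch of mine): the integral over $x$ and the
integral of $\cos^2$ over $\theta$ agree, and both equal $\pi/4$.
\begin{verbatim}
import math

def riemann(fn, a, b, n=200_000):
    h = (b - a) / n
    return sum(fn(a + (i + 0.5) * h) for i in range(n)) * h

in_x     = riemann(lambda x: math.sqrt(1 - x*x), 0.0, 1.0)
in_theta = riemann(lambda t: math.cos(t)**2, 0.0, math.pi / 2)
print(in_x, in_theta, math.pi / 4)  # all ~ 0.785398...
\end{verbatim}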
\section{Functions of multiple variables}
\subsection{The chain rule for a function with multiple inputs}
Suppose that a function $f$ measures something about a particle at a moment in time and depends on
three inputs:
\begin{enumerate}
\item the position $y(\alpha, t)$
\item the velocity $y'(\alpha, t)$
\item the time $t$
\end{enumerate}
where position and velocity depend on a parameter $\alpha$ in addition to time.
Now\footnote{Regarding $\Delta y$, $\Delta y'$, $\Delta f$: these are small increments in the \emph{value} of these
functions. The notation is bad: it implies that they are increments in the function itself (like a
``variation'' in calculus of variations). I can't think of a better notation.}, let the value of $\alpha$ be
changed slightly, to $\alpha + \Delta\alpha$, causing $y(t)$ to change to $y(t) + \Delta y$ and $y'(t)$ to
change to $y'(t) + \Delta y'$. These changes in turn cause $f(t)$ to change to $f(t) + \Delta f$.
We'll use the notation of Spivak (1965)\footnote{Calculus on Manifolds} and Sussman (2001)\footnote{Structure
and Interpretation of Classical Mechanics} for partial derivatives\footnote{See also
\url{http://www.vendian.org/mncharity/dir3/dxdoc/}}. This notation abandons all attempts to indicate what the argument \emph{is} with respect
to which a partial derivative is being taken, instead using an integer subscript to indicate \emph{which} argument it is
(first, second, third, etc.).
So define $\del_i g$ to be the partial derivative of a function $g$ with respect to its $i$-th
argument\footnote{Spivak (1965) uses $D_i g$ for this}. We also need a function composition notation that can
handle a function with multiple arguments. So
define ${(f \circ (y, y'))(\alpha, t) := f(y(\alpha, t), y'(\alpha, t), t)}$\footnote{In other
words, $f \circ (y, y')$ is a function which takes the same argument types as do $y$ and $y'$. (The
construction implies that the two functions on the RHS of the circle take the same argument types, as indeed
they do in this case, since one is the derivative of the other.) These arguments are fed independently into
both $y$ and $y'$; the result from $y$ yields the first argument to $f$, and the result from $y'$ yields the
second argument to $f$.}.
The increment in $f(t)$ comes from two sources: the change in $y(t)$ and the change in $y'(t)$. We can use the
definition of partial derivative to make an approximation\footnote{The additive nature of this approximation
needs to be justified I think.} to the increment in $f(t)$:
\begin{align*}
\Delta f \approx ~ &(\del_1 f)(y, y', t) \cdot \Delta y \\
+ &(\del_2 f)(y, y', t) \cdot \Delta y'.
\end{align*}
Here we are abusing notation again: $y$ and $y'$ are not functions but rather the values $y(\alpha, t)$
and $y'(\alpha, t)$.
And we can do the same for $\Delta y$ and $\Delta y'$, replacing them with their linear
approximations given the increment in $\alpha$:
\begin{align*}
\Delta f \approx ~ &(\del_1 f)(y, y', t) \cdot (\del_1 y)(\alpha, t) \cdot \Delta \alpha \\
+ &(\del_2 f)(y, y', t) \cdot (\del_1 y')(\alpha, t) \cdot \Delta \alpha.
\end{align*}
The partial derivative of $f$ with respect to $\alpha$ is written\footnote{It's hard not to want to
write $\del_\alpha f$ here even though that is not Spivak notation.}
$\del_\alpha f := \del_1 (f \circ (y, y'))$. It is defined to be a function which, when evaluated at
$(\alpha, t)$, yields a quantity which multiplies $\Delta\alpha$ to give a linear approximation to the
increment $\Delta f$:
\begin{align*}
\Delta f \approx \Delta\alpha \cdot (\del_\alpha f)(\alpha, t).
\end{align*}
So we see that the quantity
\begin{align*}
&(\del_1 f)(y, y', t) \cdot (\del_1 y)(\alpha, t) \\
+ &(\del_2 f)(y, y', t) \cdot (\del_1 y')(\alpha, t)
\end{align*}
fits the definition of $(\del_\alpha f)(\alpha, t)$. That is the partial derivative evaluated at a single
point in time. But we can write the partial derivative as an equation involving functions, as
opposed to function values:
\begin{align*}
\del_1 (f \circ (y, y')) = \del_1 f \cdot \del_1 y + \del_2 f \cdot \del_1 y'.
\end{align*}
Here we are multiplying and adding functions, with these operations defined pointwise.
Let's check the types. Let $t \in \R$, $\alpha \in \R$, and let the codomain of $f$ be $\R$. Then
we have
\begin{align*}
y: &\R^2 \to \R \\
y': &\R^2 \to \R \\
\del_1 y: &\R^2 \to \R \\
\del_1 y': &\R^2 \to \R \\
f: &\R^3 \to \R \\
\del_1 f: &\R^3 \to \R \\
\del_2 f: &\R^3 \to \R \\
f \circ (y, y'): &\R^2 \to \R \\
\del_1 (f \circ (y, y')): &\R^2 \to \R
\end{align*}
Alternatively, traditional (Leibniz) notation features a pattern of symbols that looks like
multiplication of fractions with cancellation:
\begin{align*}
\pdv{f}{\alpha} = \pdv{f}{y} \pdv{y}{\alpha} + \pdv{f}{y'}\pdv{y'}{\alpha}.
\end{align*}
\todo{What do the elements of the Leibniz notation mean?\footnote{\url{https://en.wikipedia.org/wiki/Chain_rule}}}
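A finite-difference check of the functional equation
$\del_1 (f \circ (y, y')) = \del_1 f \cdot \del_1 y + \del_2 f \cdot \del_1 y'$, using my own
illustrative choices $y(\alpha, t) = \alpha \sin t$ (so $y' = \alpha \cos t$) and
$f(a, b, t) = a^2 b + ta$:
\begin{verbatim}
import math

y  = lambda al, t: al * math.sin(t)
yp = lambda al, t: al * math.cos(t)     # y' = dy/dt
f  = lambda a, b, t: a*a*b + t*a

d1f  = lambda a, b, t: 2*a*b + t        # partial of f wrt 1st argument
d2f  = lambda a, b, t: a*a              # partial of f wrt 2nd argument
d1y  = lambda al, t: math.sin(t)        # partial of y wrt alpha
d1yp = lambda al, t: math.cos(t)        # partial of y' wrt alpha

al, t, h = 1.3, 0.8, 1e-6
composed = lambda al: f(y(al, t), yp(al, t), t)   # f o (y, y')
finite_diff = (composed(al + h) - composed(al - h)) / (2 * h)

a, b = y(al, t), yp(al, t)
chain = d1f(a, b, t) * d1y(al, t) + d2f(a, b, t) * d1yp(al, t)
print(abs(finite_diff - chain))  # ~1e-9
\end{verbatim}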
\subsection{Partial derivatives with respect to non-independent inputs}
Consider the function $f(x) = x^2 + 2x$. Clearly the derivative is $(D f)(x) = 2x + 2$.
However, suppose we choose to think of the function as $f(x, x^2) = x^2 + 2x$. In that case the
derivative is
\begin{align*}
(D f)(x, x^2) = (x^2 + 2, 1).
\end{align*}
\todo{Finish this.}
\subsection{Gradient and directional derivative}
\newpage
A working informal definition of derivative is
\begin{quote}
\emph{
The derivative of $f:\R^n \to \R$ at a point $\r$ is something that multiplies an increment
$\Delta \r$ in the input to give an approximation to the associated increment $\Delta f$ in output.
}
\end{quote}
Geometrically, we think of the gradient (i.e. the derivative of a function $\R^n \to \R$) and
directional derivative as, basically, directions in the \emph{input} space $\R^n$. I.e. the gradient
at $\r$ is a ``direction you walk in'' while watching the function value increase above you (and in
this direction it increases more steeply than in any other direction).
Superficially that seems to make some sense because, if the derivative is multiplying an increment
to the input then it has to be the ``same kind of thing'' as an increment to the input.
$\Delta \r$ is a vector in $\R^n$. However, in a vector space, there is no multiplication operation defined
on the set of vectors. So, although we think of the gradient as a vector in $\R^n$, the gradient
$(\grad f)(\r)$ can't literally be a vector in the same vector space as $\Delta \r$, with which it combines
multiplicatively, because no such multiplication operation is defined.
So, backing up, we can modify our definition of derivative as follows:
\begin{quote}
\emph{
The derivative of $f:\R^n \to \R$ at a point $\r$ is a \textbf{function} $\R^n \to \R$ that takes in an increment
$\Delta \r$ in the input and returns an approximation to the associated increment $\Delta f$ in output.
}
\end{quote}
Furthermore, we know that ``the derivative is linear''. What does this mean? Viewed as an operator
mapping functions to functions, this means that the derivative operator is linear under scalar
multiplication and addition \emph{of functions}. Alternatively, we might be saying that the
derivative $f'$ at a point $\r$ is a linear transformation on $\R^n$ in the sense that
$f'(a\Delta \r_1 + b\Delta \r_2) = af'(\Delta\r_1) + bf'(\Delta\r_2)$.
So we can improve our definition:
\begin{quote}
\emph{
The derivative of $f:\R^n \to \R$ at a point $\r$ is a \textbf{linear transformation}
$\R^n \to \R$ that takes in an increment $\Delta \r$ in the input and returns an approximation to the
associated increment $\Delta f$ in output.
}
\end{quote}
Now, given a choice of basis, a linear transformation $f':\R^n \to \R$ is represented by a
$1 \times n$ matrix. So when we apply the derivative to the increment in input, we are performing a
matrix-vector multiplication:
\begin{align*}
\Bigg[\pdfdx(\r), \pdfdy(\r), \pdfdz(\r)\Bigg] \bvecMMM{\Delta x}{\Delta y}{\Delta z} \approx \Delta f.
\end{align*}
In some sense this is ``the same'' as the dot product operation:
\begin{align*}
(\grad f)(\r) \cdot \Delta\r \approx \Delta f.
\end{align*}
When the dot product is first introduced, one is encouraged to think of it geometrically, as giving
the projection of one vector onto another, and defining the angle between the two vectors. And of
course, those two vectors are living in the same vector space, otherwise one wouldn't be able to
visualize their geometry like that.
So a correspondence exists: $\vec v_1 \cdot \vec v_2 = \vec v_1^T \vec v_2$, where on the LHS the two
vectors are in the same vector space, and on the RHS $\vec v_1^T$ is an element of a space of
$1 \times n$ matrices, or ``linear functionals''. In differential geometry this latter space is referred
to as the ``cotangent space''.
Because of this one-to-one correspondence between elements of $\R^n$ and linear transformations
$\R^n \to \R$, we are able to think of the gradient simultaneously as a vector in the input space
$\R^n$, \emph{and} as a linear transformation mapping $\Delta\r \in \R^n$ to an approximation to the
increment in output $\Delta f$.
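A small sketch of the two equivalent views (example function mine): the gradient as a
$1 \times n$ matrix applied to the column increment, and as a dot product.
\begin{verbatim}
import numpy as np

f      = lambda r: r[0]**2 + 3*r[0]*r[1] + np.sin(r[2])
grad_f = lambda r: np.array([2*r[0] + 3*r[1], 3*r[0], np.cos(r[2])])

r  = np.array([1.0, -0.5, 0.3])
dr = np.array([1e-4, 2e-4, -1e-4])

df_true   = f(r + dr) - f(r)
df_matrix = (grad_f(r).reshape(1, 3) @ dr.reshape(3, 1)).item()  # (1x3)(3x1)
df_dot    = np.dot(grad_f(r), dr)                                # dot product
print(df_true, df_matrix, df_dot)  # all agree to ~1e-8
\end{verbatim}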
\newpage
\begin{definition*}
Let $f:\R^2 \to \R$. The \defn{gradient} of $f$ evaluated at
$(x, y)$ is the row vector (cotangent vector\footnote{See
\url{https://math.stackexchange.com/a/54359/397805}})
\begin{align*}
(\nabla f)(x, y) = \(\pdfdx(x, y), \pdfdy(x, y)\).
\end{align*}
\end{definition*}
I believe this is the same concept as the Spivak/Sussman definition of the derivative $D f$:
\begin{quote}
``The derivative of a real-valued function of multiple arguments is
an object whose contraction with the tuple of increments in the
arguments gives a linear approximation to the increment in the
function’s value.''\footnote{Sussman et al. Structure and
Interpretation of Classical Mechanics p.483}
\end{quote}
\begin{theorem*}
Let $\d\r$ be an increment in input, and let $\df$ be the linear approximation to the increment in output. Then
\begin{align*}
\df = \grad f \cdot \d\r.
\end{align*}
\end{theorem*}
\begin{theorem*}
The direction of $\grad f$ is perpendicular to the surface\footnote{The
``surface'' of constant f will be a line if the domain of $f$ is
$\R^2$} of constant $f$.
\end{theorem*}
\todo{This stuff about directional derivative and why grad is the direction of steepest ascent is not quite there.}
\begin{definition*}
Let $f:\R^2 \to \R$ and let $u \in \R^2$. The \defn{directional derivative}
of $f$ in the direction of $u$ is
\begin{align*}
(\grad_u f)(\r)
&= u_1\pdfdx(\r) + u_2\pdfdy(\r) \\
&= \vec{u} \cdot (\grad f)(\r).
\end{align*}
\end{definition*}
\begin{theorem*}
The directional derivative converts an increment in the direction of $u$ into an approximation to
the resulting increment in $f$:
\begin{align*}
\Delta f \approx \grad_u f \cdot \Delta \r.
\end{align*}
\todo{but the notation needs to indicate that $\Delta\r$ is in the direction of $u$?}
\end{theorem*}
\begin{proof}
\todo{}
\end{proof}
\begin{theorem*}
The direction of $\grad f$ at $\r$ is the direction of steepest increase in $f$ at $\r$.
\end{theorem*}
\begin{proof}
Let $u \in \R^2$ be a unit vector. We seek the $u$ which maximises the directional derivative
$(\grad_u f)(\r)$. By definition of directional derivative we have
\begin{align*}
(\grad_u f)(\r) &= \vec{u} \cdot (\grad f)(\r),
\end{align*}
therefore the $u$ we seek is the $u$ which maximises this dot product. Since
$\vec{u} \cdot (\grad f)(\r) = |(\grad f)(\r)| \cos\theta$ for a unit vector $u$ at angle $\theta$ to the
gradient, the dot product is maximised when $\theta = 0$, i.e. when $u$ has the same direction
as $(\grad f)(\r)$.
\end{proof}
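An empirical version of this argument (sketch of mine): among many random unit vectors $u$, the
directional derivative $\vec{u} \cdot (\grad f)(\r)$ is largest for the $u$ pointing along the
gradient.
\begin{verbatim}
import numpy as np

rng  = np.random.default_rng(0)
grad = np.array([2.0, -1.0])        # (grad f)(r) at some point r (made up)

us = rng.normal(size=(10_000, 2))
us /= np.linalg.norm(us, axis=1, keepdims=True)   # random unit vectors
best = us[np.argmax(us @ grad)]                   # maximises u . grad
print(best, grad / np.linalg.norm(grad))          # best u ~ grad direction
\end{verbatim}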
\newpage
\section{The Fundamental Theorem of (Integral) Calculus}
\begin{figure}[h]
\centering
\includegraphics[width=500pt]{img/newton-october-1666-tract-ftc.png}
\captionsetup{labelformat=empty,justification=centering}
\caption[xxx]{Newton's October 1666 Tract on Fluxions.\\
\emph{``...the motion by which y increaseth will bee $bc = q$.''}}
\end{figure}
\footnotetext{\url{https://cudl.lib.cam.ac.uk/view/MS-ADD-03958/109}}
\includegraphics[width=300pt]{img/ftc.png}
Recall that the definition of $\int_a^b f(x) \dx$ is the area under the graph,
computed as the limit of approximating rectangles (Riemann sums).
Consider an ``accumulation function'', or ``area-so-far function'' $F$ defined
as
\begin{align*}
F(x) = \int_0^x f(u) \d u.
\end{align*}
$F(x)$ is the amount that has accumulated when we are at point $x$ in the
input space.
The FTC comes in two parts. Part I states that the derivative of the
area-so-far function is the original function of interest:
\begin{align*}
\ddx F(x) = f(x).
\end{align*}
Note that this is the first time we have connected integration with
differentiation: $F$ was defined as a definite integral (area-so-far); nothing
in its definition involved differentiation.
Part II states that the definite integral $\int_a^b f(x) \dx$ can be computed as
\begin{align*}
\int_a^b f(x) \dx = F(b) - F(a).
\end{align*}
I think that this is obvious from the definition of $F$ as area-so-far, but the
point is that part I has shown us that $F$ might be obtainable as an
antiderivative of $f$ rather than via some explicit area calculation
(e.g. Riemann sums).
So how do we prove this? What exactly is it we need to prove anyway? We have a
definition for derivative, and we have a definition for area-so-far (limit of
Riemann sums). So, first, using the definition of derivative,
\begin{align*}
\ddx F(x) := \lim_{h \to 0} \frac{F(x+h) - F(x)}{h}.
\end{align*}
In the numerator is the area above a horizontal section of width
$h$. Intuitively, this is approximately $hf(x)$, giving
\begin{align*}
\ddx F(x) = \lim_{h \to 0} \frac{hf(x)}{h} = f(x),
\end{align*}
as desired. How to make this rigorous? Using the Riemann sums definition of area,
\begin{align*}
\ddx F(x) &= \lim_{h \to 0} \frac{\lim_{N \to \infty} \sum_i^N \frac{h}{N} f\(x + \frac{ih}{N}\)}{h}\\
&= \lim_{N \to \infty} \frac{1}{N} \sum_i^N \lim_{h \to 0} f\(x + \frac{ih}{N}\)\\
&= \lim_{N \to \infty} \frac{1}{N} \sum_i^N f(x)\\
&= f(x).
\end{align*}
But in fact real proofs use the Extreme Value Theorem. I am told that one error
in the above proof is that it is not valid to exchange the order of the two
limits.
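Part I can at least be illustrated numerically (sketch of mine): build the area-so-far function
$F$ as a Riemann sum, differentiate it by finite differences, and recover $f$.
\begin{verbatim}
import math

f = lambda u: u * math.cos(u**2 + 1)   # any reasonably smooth integrand

def F(x, n=100_000):
    # Area-so-far: midpoint Riemann sum for int_0^x f(u) du.
    h = x / n
    return sum(f((i + 0.5) * h) for i in range(n)) * h

x, h = 1.2, 1e-4
dF = (F(x + h) - F(x - h)) / (2 * h)
print(dF, f(x))  # agree to ~1e-6
\end{verbatim}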
TODO FTC -- moving away from thinking that an integral ``just has to end with
d-something''. Why does one seek the antiderivative of the part without the
d-something?
\subsection*{FTC in Penrose - The Road To Reality}
\begin{mdframed}
\includegraphics[width=200pt]{img/calculus-ftc-penrose-1.png}
\includegraphics[width=250pt]{img/calculus-ftc-penrose-2.png}
\end{mdframed}
\begin{itemize}
\item An integral of a real-valued function $f$ gives the area under the curve $f(x)$.
\item So, basically, it's equal to the sum of a bunch of (base) x (height) calculations: $\Delta x \times f(x)$.
\item Now, suppose we can find a function $g$ whose \textit{slope} at $x$ is equal to the height $f(x)$.
\item That means that we can now think of $\Delta x \times f(x)$ as (increment in input) x (slope).
\item So, what we were thinking of as a sum of rectangles under $f$, we can now think of as a sum of
(increments in height of $g$).
\item The end result is that the net area accumulated under $f(x)$ is equal to the net change in height of
the function $g(x)$.
\item More generally (e.g. complex-valued $f$), an integral $\int_{a \to b} f(z) \dz$ gives an ``amount of
function value accumulated'' along some path from $a$ to $b$.
\item But the same argument applies: if we can find a function $g$ whose derivative $g'$ is equal to
$f$, then the integral becomes a sum of (increment in input) x (derivative) calculations, and the
value of the integral is equal to the net change in output of $g$ over the interval.
\end{itemize}
One implication of this is that if we are evaluating an integral of $f$ over some interval $(a, b)$ we
only need to find a $g$ whose derivative is $f$ over that same interval; it doesn't have to be over the
whole domain. Not sure what the version of that statement is for domains other than real intervals.
\subsubsection*{Examples}
In all the following examples, some quantity is
``accumulating''\footnote{``Accumulating'' can involve decreasing as well as
increasing. For example if the particle starts moving back towards the
origin, or if the vase is being filled with a tube and someone starts sucking
on it rather than dispensing water.}.
\begin{enumerate}
\item $F(x)$ is the area under a graph to the left of $x$.\\
$f(x)$ is the height of the graph at $x$.\\
\item $F(x)$ is the volume of a vase between the base and height $x$. \\
$f(x)$ is the cross-sectional area at height $x$.\\
\item $F(r)$ is the area of a circle with radius $r$.\\
$f(r)$ is the circumference of a circle with radius $r$.\\
\item $F(t)$ is the volume of water in a vase that is being filled, at time $t$.\\
$f(t)$ is the rate of filling at time $t$.\\
\item $F(t)$ is the position of a moving particle at time $t$, relative to the origin.\\
$f(t)$ is the velocity of the particle at time $t$.\\
\item $F(t)$ is the number of bacteria at time $t$.\\
$f(t)$ is the rate at which new bacteria are produced at time $t$.
\end{enumerate}
\subsubsection*{Constant rate}
\begin{enumerate}
\item The height of the graph is constant at $h$ (a rectangle).\\
The area to the left of $x$ is $hx$.\\
\item $F(x)$ is the volume of a vase between the base and height $x$. \\
The cross-sectional area is constant at $a$ (a cylinder).\\
$F(x) = ax$\\
\item $F(t)$ is the volume of water in a vase that is being filled, at time $t$.\\
Water enters at a constant rate $v$ liters/sec.\\
$F(t) = vt$\\
\item $F(t)$ is the displacement of a moving particle at time $t$, relative to the origin.\\
The velocity of the particle is constant at $v$ m/sec.\\
$F(t) = vt$.\\
\item $F(t)$ is the number of bacteria at time $t$.\\
Bacteria are produced at a constant rate $v$ bacteria/sec.\\
$F(t) = vt$.
\end{enumerate}
The amount-so-far can be computed manually:
\begin{enumerate}
\item If the rate of increase is constant at $v$, then the amount to the left
of $x$ is simply $vx$.\\
\item If the rate of increase at time $t$ is $ct$ (proportional to $t$), then
the amount-so-far graph is a triangle, so the amount to the left of $t$ is
$\frac{1}{2}\cdot ct \cdot t = \frac{1}{2}ct^2$.\\
\item If the rate of increase at point $r$ is $2\pi r$ (the outer edge of a
growing disc), then the amount-so-far graph is a triangle again, and the area
of the disc is $\frac{1}{2}\cdot r \cdot 2\pi r = \pi r^2$.
\end{enumerate}
What about if the rate of increase is a more complex function? We can still
compute the area so far manually, as a limit of Riemann sums:
Compare
\begin{align*}
\int_0^2 (2 - x^2) \dx
&= \lim_{N \to \infty}\sum_{i=1}^N \frac{2}{N}\(2 - \(\frac{2i}{N}\)^2\) \\
&= \lim_{N \to \infty}\sum_{i=1}^N \frac{4}{N} - \frac{8i^2}{N^3} \\
&= \lim_{N \to \infty}\(4 - \frac{8}{N^3}\sum_{i=1}^Ni^2 \)\\
&= \lim_{N \to \infty}\(4 - \frac{8}{N^3}\frac{N(N+1)(2N+1)}{6} \)\\
&= \lim_{N \to \infty}\(4 - 8\frac{(N+1)(2N+1)}{6N^2} \)\\
&= \lim_{N \to \infty}\(4 - 8\frac{2 + 3N^{-1} + N^{-2}}{6} \)\\
&= 4 - \frac{8}{3} = \frac{4}{3}\\
\end{align*}
with the solution using antiderivatives:
\begin{align*}
\int_0^2 (2 - x^2) \dx
&= \left[2x - \frac{x^3}{3}\right]_0^2 \\
&= 4 - \frac{8}{3} = \frac{4}{3}.
\end{align*}
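The limit computed above can also be watched converging directly; this small script (mine)
evaluates exactly the right-endpoint Riemann sum from the derivation:
\begin{verbatim}
for N in (10, 100, 1000, 10_000):
    s = sum((2 / N) * (2 - (2 * i / N)**2) for i in range(1, N + 1))
    print(N, s)   # -> 1.3333... = 4/3
\end{verbatim}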
\newpage
Let's fix a physical example for discussing FTC: a moving object. The key
quantity here is the distance from the starting point.
Next, before writing the equations that state the FTC, let's be clear about the
objects that are going to be involved in those equations. The most important
object is a function that gives the distance from the starting point as a
function of time.
More generally, this is an ``accumulation function'', or ``area-so-far
function''.
Now, let's introduce some notation. The notation $\int_3^4 f(t) \dt$ is
\textit{defined} to mean the area under the curve $f$, between $3$ and
$4$. It's really important to be clear here: the definition of
$\int_3^4 t^2 \dt$ is simply that it is the area under the $t^2$ curve between
those two points. (In particular, note that the definition does \textit{not}
involve $\frac{1}{3}t^3$).
Similarly, $\int_0^4 f(t) \dt$ is the area under the curve between 0 and 4. The
answer is a number. The answer doesn't involve $t$: $t$ is just a variable used
internally in that expression.
Now comes a slightly less obvious point: if the upper limit is not a fixed
number, but a variable, as in $\int_0^{x} f(t) \dt$, then that entire
expression represents a function of $x$: it takes in an $x$ value and outputs
the area under the curve, between 0 and $x$. We can give the new function a
name, $g$, and write the definition of $g$ as
\begin{align*}
g(x) = \int_0^{x} f(t) \dt.
\end{align*}
\includegraphics[width=200pt]{img/stewart-ftc-1.png}
Functions like $g$ are ``accumulation functions'', or ``area-so-far
functions'', because they tell you the area up to $x$, i.e. the area to the
left of $x$.
The FTC is usually split into two parts. The first part states\\
\begin{mdframed}
At any point $x$, the rate of change of the area-so-far function at that
point is the same as the height of the curve at that point.
\end{mdframed}
This is what Newton was saying when he wrote ``...the motion by which y
increaseth will bee $q$.'': in his diagram, $y$ is the area, and $q$ is the
height of the curve\footnote{He actually wrote ``$bc=q$''; $bc$ is a line in
his diagram with length $q$.}.
\section{Differentiation theorems}
\begin{theorem*}[Quotient rule]
$\(\frac{f}{g}\)' = \frac{gf' - fg'}{g^2}$
\end{theorem*}
\subsection{Derivatives of trigonometric functions}
\begin{claim*}
$\tan' = \frac{1}{\cos^2} =: \sec^2$
\end{claim*}
\begin{proof}
$\tan = \frac{\sin}{\cos}$, so by the quotient rule
\begin{align*}
\tan'
= \frac{\cos^2 + \sin^2}{\cos^2}
= \frac{1}{\cos^2}
= \sec^2.
\end{align*}
\end{proof}
\begin{claim*}
What is the derivative of $\sin^\1$?
\end{claim*}
\begin{proof}
\begin{align*}
\frac{\d \sin^\1 a}{\d a}
= \frac{\d \theta}{\d \sin \theta}
= \frac{1}{\cos \theta}
= \frac{1}{\sqrt{1 - \sin^2 \theta }}
= \frac{1}{\sqrt{1 - a^2}},
\end{align*}
where $\theta = \sin^\1 a$, i.e. $a = \sin \theta$ (and $\cos\theta \geq 0$ on the range of $\sin^\1$).
\end{proof}
\begin{claim*}
What is the derivative of $\tan^\1$?
\end{claim*}
\begin{proof}
\begin{align*}
\frac{\d \tan^\1(a)}{\d a}
= \frac{\d \theta}{\d \tan(\theta)}
= \cos^2(\theta)
= \cos^2(\tan^\1 a)
\end{align*}
Note that a right-angle triangle with angle $\tan^\1 a$ has opposite length $a$ relative to
adjacent length 1. Therefore $\cos(\tan^\1 a) = \frac{1}{\sqrt{1 + a^2}}$.
Therefore the derivative of $\tan^\1(a)$ is $\frac{1}{1 + a^2}$.
\end{proof}
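Finite-difference checks of both claims (evaluation point arbitrary):
\begin{verbatim}
import math

a, h = 0.5, 1e-6
fd_asin = (math.asin(a + h) - math.asin(a - h)) / (2 * h)
fd_atan = (math.atan(a + h) - math.atan(a - h)) / (2 * h)
print(fd_asin, 1 / math.sqrt(1 - a*a))  # d/da arcsin a = 1/sqrt(1 - a^2)
print(fd_atan, 1 / (1 + a*a))           # d/da arctan a = 1/(1 + a^2)
\end{verbatim}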
\section{Constrained optimization: Lagrange Multipliers}
Consider a scalar-valued function $f:\R^n \to \R$.
Viewed as its graph, $f$ is the set $\{(x, y) ~|~ x \in \R^n, y = f(x) \}$.
\begin{definition*}
The \defn{optimization problem} is to find the set of input values for which the function value is
minimal. I.e. the problem is to find
\begin{align*}
\argmin f = \{ x ~|~ x \in \R^n, f(x) = f^{\text{min}} \},
\end{align*}
where $f^{\text{min}} = \min\{f(x) ~ | ~ x \in \R^n\}$.
\end{definition*}
This can be solved using the standard search for stationary points of $f$: i.e. compute the
derivative function $\grad f$ (a vector field) and find the zeros of this function:
$\{\vec{x} ~|~ (\grad f)(x) = \vec{0} \}$. In other words, we are examining the \emph{input } space, looking
for points where the gradient is the zero vector. When considering a candidate point $\vec{x}$
we are concerned with the gradient at that point and not directly concerned with the function value
$f(\vec{x})$.
Now consider a \defn{constrained optimization problem}: we want to find minima within a certain
subset of the domain. We will initially require this subset to be a curve in the domain.
Recall that there are various ways to specify a curve in the domain $\R^n$, including:
\begin{enumerate}
\item As an ``implicit'' equation, i.e. a {\it relation} $g(x, y, z) = 0$ (the RHS may always be taken to be zero WLOG).
\item Parametrically, e.g. $\vecMMM{x(t)}{y(t)}{z(t)}$
\end{enumerate}
In the first case, for some curves it is possible to rearrange the implicit equation to express one
coordinate as a function of the others, i.e. $g(x, y, z) = 0 \iff z = h(x, y)$.
So for example, the explicit equation $y = 2x + 1$ is equivalent to the implicit relation
$2x - y + 1 = 0$. The explicit version describes a line in $\R^2$, whereas the implicit
version is a slice through an explicit equation of a plane in $\R^3$ ($z = 2x - y + 1$, sliced at $z = 0$).
On the other hand, the implicit relation $ax^2 + by^2 = 1$ (for $a, b > 0$, an ellipse in $\R^2$ centered
at the origin) cannot be expressed as a single explicit equation in $\R^2$.
Here we will specify the constraint set implicitly as the set of points in the domain satisfying
\begin{align*}
g(x) = 0,
\end{align*}
where $g:\R^n \to \R$ is a differentiable function (we take the RHS to be zero WLOG).
Geometrically, we can suppose that the domain is $\R^2$ and we can visualize the constraint function $g$ as a
surface in $\R^3$: the constraint set is the intersection of this surface with the x-y plane.
\todo{How does the theory hold up to distinct choices of $g$ which yield the same constraint set?}
So in other words, we can specify the points in the domain that satisfy the constraint arbitrarily
by choosing $g$ such that it is zero at those points; we just have to ensure that $g$ is differentiable.
Let $g:\R^n \to \R$, and consider the set $\{ x ~|~ x \in \R^n, g(x) = 0 \}$.
\begin{definition*}
The \defn{constrained optimization problem} is to find the set of input values \emph{in the
constraint set} for which $f$ is minimum:
\begin{align*}
\{ x ~|~ x \in \R^n, g(x) = 0, f(x) = f^{\text{min}} \},
\end{align*}
where $f^{\text{min}} = \min \{f(x) ~|~ x \in \R^n, g(x) = 0\}$.
\end{definition*}
\begin{theorem*}[Lagrange multiplier]
Let $f: \R^n \to \R$ and $g:\R^n \to \R$ be differentiable.
Define $\mathcal{L}(x, \lambda) = f(x) - \lambda g(x)$ for $\lambda \in \R$.
Then the $x$-coordinates of the stationary points of $\mathcal{L}$ are the candidate maxima/minima of $f$
subject to the constraint that $g(x) = 0$.
\end{theorem*}
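Before the worked example, a computational sketch of the theorem (assuming \texttt{sympy} is
available; the objective and the constraint $x + y = 1$ are my own illustrative choices):
\begin{verbatim}
import sympy as sp

x, y, lam = sp.symbols('x y lambda')
f = -(x**2 + y**2)          # objective
g = x + y - 1               # constraint g = 0
L = f - lam * g             # the Lagrangian defined in the theorem

# Stationary points: all first partials of L vanish.
stationary = sp.solve([sp.diff(L, v) for v in (x, y, lam)],
                      [x, y, lam], dict=True)
print(stationary)  # [{x: 1/2, y: 1/2, lambda: -1}]: constrained max at (1/2, 1/2)
\end{verbatim}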
\begin{example}
Consider $f:\R^2 \to \R$ given by $f(x_1, x_2) = -(x_1^2 + x_2^2)$. This is a concave function with its maximum at the origin.