latent variable
latent variables are variables that cannot be observed
One way to detect overfitting in practice is to observe that the model has low training risk but high test risk during cross-validation
overfitting = high acc during training and low acc during testing
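A minimal sketch of spotting this gap in practice, assuming scikit-learn is available; the synthetic dataset and the deliberately deep decision tree are placeholders of my own, not from the book:

```python
# Sketch: detect overfitting by comparing training vs. validation accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

model = DecisionTreeClassifier(max_depth=None, random_state=0).fit(X_train, y_train)
train_acc = model.score(X_train, y_train)  # typically close to 1.0 (low training risk)
val_acc = model.score(X_val, y_val)        # noticeably lower -> overfitting

print(f"train accuracy={train_acc:.2f}, validation accuracy={val_acc:.2f}")
```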
Model Fitting
how well a model is learning
cross-validation
technique used to evaluate how well your model generalizes, by repeatedly training on one part of the data and testing on the held-out part
validation set
I have always been confused by the validation set. It is a set used to provide a glimpse of how your model will react to the data. Usually you take a portion of the training set to create the validation set
Regularization
technique used to reduce overfitting
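A common example (not the only form of regularization) is the L2/ridge penalty, which adds a term penalizing large parameters to the training objective; \(\lambda\) controls the regularization strength:

$$\min_{\boldsymbol\theta}\ \sum_{i=1}^{N}\big(y_i - f(\boldsymbol x_i; \boldsymbol\theta)\big)^2 + \lambda\,\|\boldsymbol\theta\|_2^2$$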
overfitting
overfitting = during training the error is small whereas during testing it is large
Another phrase commonly used for expected risk is "population risk"
From what I know, population risk is the number of individuals at risk. Is it the same thing as expected risk?
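Following up: in this context "population risk" is a statistical-learning term, not an epidemiological one. Per the highlight above, it is just another name for the expected risk, i.e., the expected loss over the true (unknown) data-generating distribution:

$$R_{\text{pop}}(f) = \mathbb{E}_{(\boldsymbol x, y)\sim p(\boldsymbol x, y)}\big[\ell\big(y, f(\boldsymbol x)\big)\big]$$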
independent and identically distributed
what is a set of examples here? I am thinking of it as features rather than anything else. But features are dependent upon one another, so I am not sure what this means
Affine functions are often referred to as linear functions in machine learning
affine function = linear function
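To make the distinction concrete (strictly speaking, an affine function has an extra offset term):

$$f(\boldsymbol x) = \boldsymbol A\boldsymbol x + \boldsymbol b \ \ \text{(affine)}, \qquad f(\boldsymbol x) = \boldsymbol A\boldsymbol x \ \ \text{(linear, i.e., } \boldsymbol b = \boldsymbol 0\text{)}$$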
Training or parameter estimation
adjust predictive model based on training data.
In order to find good predictors, do one of two things: 1) find the best predictor based on some measure of quality (known as finding a point estimate), or 2) use Bayesian inference
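As shorthand for the two approaches (\(\boldsymbol\theta\) denotes the parameters, \(\mathcal{X}\) the training data; the maximum-likelihood form is just one example of a point estimate):

$$\hat{\boldsymbol\theta} = \arg\max_{\boldsymbol\theta}\, p(\mathcal{X}\mid\boldsymbol\theta) \ \ \text{(point estimate)}, \qquad p(\boldsymbol\theta\mid\mathcal{X}) = \frac{p(\mathcal{X}\mid\boldsymbol\theta)\,p(\boldsymbol\theta)}{p(\mathcal{X})} \ \ \text{(Bayesian posterior)}$$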
Prediction or inference
predict on unseen test data. 'inference' can mean prediction for non-probabilistic models or parameter estimation
goal of learning is to find a model and its corresponding parameters such that the resulting predictor will perform well on unseen data
important
noisy observation
real-life data is always noisy
example or data point
I thought rows were observations or instances?
we do not expect the identifier (the Name) to be informative for a machine learning task
This is a good reminder to only query the columns or data that are relevant to the exercise
features, attributes, or covariates
What do we mean by good models?
This is a great question. I usually think of models as algorithms
good models should perform well on unseen data
This is the main idea when implementing a machine learning model
This derivation is easiest to understand by drawing the reasoning as it progresses.
The reasoning of the derivative?
Example of a convex set
an easy example to identify convex sets. One way to determine whether a set is convex is to check that for any two points in the set, the line segment connecting them lies entirely within the set
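The formal version of this check, for a set \(\mathcal{C}\):

$$\forall\, \boldsymbol x, \boldsymbol y \in \mathcal{C},\ \forall\, \theta \in [0,1]:\quad \theta\boldsymbol x + (1-\theta)\boldsymbol y \in \mathcal{C}$$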
Lagrange multiplier
The method of Lagrange multipliers aims to find the local minima and maxima of a function subject to constraints
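More concretely, for minimizing \(f(\boldsymbol x)\) subject to an equality constraint \(g(\boldsymbol x) = 0\), one forms the Lagrangian and looks for its stationary points:

$$\mathfrak{L}(\boldsymbol x, \lambda) = f(\boldsymbol x) + \lambda\, g(\boldsymbol x), \qquad \nabla_{\boldsymbol x}\mathfrak{L} = \boldsymbol 0, \quad g(\boldsymbol x) = 0$$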
The step-size is also called the learning rate.
when implementing a neural net, the learning rate is a hyperparameter that controls how much to adjust the weights w.r.t. the gradient
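A toy sketch of my own (not from the book) of how the step-size \(\gamma\) enters the gradient-descent update \(x_{t+1} = x_t - \gamma\,\nabla f(x_t)\), using \(f(x) = x^2\):

```python
# Gradient descent on f(x) = x^2, whose gradient is f'(x) = 2x.
# The learning rate gamma scales how far each step moves against the gradient.
def gradient_descent(x0, gamma, steps):
    x = x0
    for _ in range(steps):
        grad = 2 * x          # gradient of f at the current point
        x = x - gamma * grad  # update rule: x_{t+1} = x_t - gamma * grad
    return x

print(gradient_descent(x0=10.0, gamma=0.1, steps=50))  # converges toward 0
print(gradient_descent(x0=10.0, gamma=1.1, steps=50))  # step too large: diverges
```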
We use the convention of row vectors for gradients
so a matrix? or just rows like this : [a b c ]?
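My reading of the convention: for a scalar-valued \(f : \mathbb{R}^n \to \mathbb{R}\) the gradient is a \(1 \times n\) row vector (so yes, like \([a\ b\ c]\)); it only becomes a matrix (the Jacobian) when \(f\) itself is vector-valued:

$$\frac{\mathrm{d}f}{\mathrm{d}\boldsymbol x} = \begin{bmatrix}\frac{\partial f}{\partial x_1} & \cdots & \frac{\partial f}{\partial x_n}\end{bmatrix} \in \mathbb{R}^{1\times n}$$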
\(\min_{x} f(x)\)
This is important. By convention, optimization problems are stated as minimization; maximizing f is equivalent to minimizing −f
Linear Program
interesting example. Seeing how the linear programs can be plotted.
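For reference, the general form of a linear program (minimize a linear objective subject to linear inequality constraints):

$$\min_{\boldsymbol x \in \mathbb{R}^d}\ \boldsymbol c^\top \boldsymbol x \quad \text{subject to} \quad \boldsymbol A\boldsymbol x \leq \boldsymbol b$$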
relative frequencies of events of interest to the total number of events that occurred
isn't this the definition of mean?
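They are related: the relative frequency is the sample mean of an event's indicator variable, and it serves as an estimate of the event's probability. For an event \(A\) occurring \(n_A\) times in \(N\) trials:

$$P(A) \approx \frac{n_A}{N} = \frac{1}{N}\sum_{i=1}^{N}\mathbb{1}\{A \text{ occurred in trial } i\}, \qquad \text{e.g., 53 heads in 100 flips} \Rightarrow P(\text{heads}) \approx 0.53$$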
abducted by aliens
lol
Theorem 4.3. A square matrix \(A \in \mathbb{R}^{n\times n}\) has \(\det(A) \neq 0\) if and only if \(\mathrm{rk}(A) = n\). In other words, A is invertible if and only if it is full rank
refer to section 2.6.2 for rank definition
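A quick 2×2 illustration: the second row is a multiple of the first, so the rank is 1 (not full) and the determinant is 0, i.e., the matrix is not invertible:

$$A = \begin{bmatrix}1 & 2\\ 2 & 4\end{bmatrix}, \qquad \det(A) = 1\cdot 4 - 2\cdot 2 = 0, \qquad \mathrm{rk}(A) = 1$$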
\(p_A(\lambda) := \det(A - \lambda I)\)
\((-1)^{k+j}\det(A_{k,j})\) is called a cofactor
\(\det(A_{k,j})\) is called a minor
$$\det(A) = \sum_{k=1}^{n}(-1)^{k+j}\,a_{kj}\,\det(A_{k,j})$$
Adding a multiple of a column/row to another one does not changedet(A)
Swapping two rows/columns changes the sign of det(A)
\(\det(\lambda A) = \lambda^{n}\det(A)\)
If A is regular (invertible), then \(\det(A^{-1}) = \frac{1}{\det(A)}\)
(4.7)
determinant of 3x3 matrix
(4.6)
determinant of 2x2 matrix
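For reference (I believe (4.6) is the 2×2 formula and (4.7) the 3×3 Sarrus rule):

$$\det\begin{bmatrix}a_{11} & a_{12}\\ a_{21} & a_{22}\end{bmatrix} = a_{11}a_{22} - a_{12}a_{21}$$

$$\det\begin{bmatrix}a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix} = a_{11}a_{22}a_{33} + a_{21}a_{32}a_{13} + a_{31}a_{12}a_{23} - a_{31}a_{22}a_{13} - a_{11}a_{32}a_{23} - a_{21}a_{12}a_{33}$$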
is invertible if and only if \(\det(A) \neq 0\)
Invertible: det(A) \(\neq 0\)
the determinant of a square matrix A ∈ R^{n×n} is a function that maps A onto a real number
determinant
rotation matrix
coordinates of the rotation expressed in terms of the basis vectors
rotation
linear mapping that rotates a plane by angle \(\theta\) about the origin
if the angle \(\theta > 0\), the rotation is counterclockwise
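The corresponding matrix in 2D; its columns are where the standard basis vectors land after the rotation, e.g., a 90° rotation sends \((1,0)\) to \((0,1)\):

$$R(\theta) = \begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}, \qquad R(90°)\begin{bmatrix}1\\0\end{bmatrix} = \begin{bmatrix}0\\1\end{bmatrix}$$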
orthogonal basis
$$<b_{i}, b_{j}> = 0, i \neq j$$
orthogonal complement
Let W be a subspace of a vector space V. Then the orthogonal complement of W is also a subspace of V. Furthermore, the intersection of W and its orthogonal complement is just the zero vector.
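The definition behind this, for completeness:

$$W^{\perp} = \{\,\boldsymbol v \in V \;:\; \langle \boldsymbol v, \boldsymbol w\rangle = 0 \ \text{for all } \boldsymbol w \in W\,\}$$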
normal vector
vector with magnitude 1, \(||w|| = 1\) and is perpendicular to the surface
Gram-Schmidt process
concatenate the (non-orthogonal, unnormalized) basis vectors into a matrix, apply Gaussian elimination, and obtain an orthonormal basis
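A quick NumPy sketch of the iterative, projection-based form of Gram-Schmidt (this may differ from the book's exact presentation; the function and example matrix here are my own):

```python
import numpy as np

def gram_schmidt(B):
    """Orthonormalize the columns of B (assumed linearly independent)."""
    Q = np.zeros_like(B, dtype=float)
    for j in range(B.shape[1]):
        v = B[:, j].astype(float)
        # subtract projections onto all previously built orthonormal vectors
        for i in range(j):
            v -= (Q[:, i] @ B[:, j]) * Q[:, i]
        Q[:, j] = v / np.linalg.norm(v)  # normalize to unit length
    return Q

B = np.array([[2.0, 1.0], [0.0, 1.0]])
Q = gram_schmidt(B)
print(np.round(Q.T @ Q, 6))  # identity matrix -> columns are orthonormal
```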
Orthonormal Basis
basis vectors = a linearly independent subset of vectors; an orthonormal basis is in particular an orthogonal basis
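Stated as conditions on the basis vectors (an orthogonal basis needs only the first condition; orthonormal requires both):

$$\langle \boldsymbol b_i, \boldsymbol b_j\rangle = 0 \ \text{ for } i \neq j, \qquad \langle \boldsymbol b_i, \boldsymbol b_i\rangle = 1$$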
3.32
a transformation by an orthogonal matrix preserves distances (vector lengths)
$$\|Ax\|^2 = (Ax)^\top(Ax) = x^\top A^\top A x = x^\top I x = x^\top x = \|x\|^2$$
this is an important proof that multiplying by an orthogonal matrix preserves the length (norm) of a vector
Orthogonal Matrix
$$AA^T = I = A^TA \Rightarrow A^{-1} = A^T$$ orthonormal columns
〈x, y〉
this is equal to 1, which does not meet the requirement of orthogonality
Orthogonality
if \(<x,y> = 0\) the vectors are orthogonal; if additionally \(||x|| = ||y|| = 1\) they are orthonormal<br /> any two lines that are perpendicular - 90 degree angle
cos ω
used to find the angle between two vectors
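The formula plus a quick check (using the dot product; \(\omega\) is the angle between x and y):

$$\cos\omega = \frac{\langle \boldsymbol x, \boldsymbol y\rangle}{\|\boldsymbol x\|\,\|\boldsymbol y\|}, \qquad \boldsymbol x = \begin{bmatrix}1\\0\end{bmatrix},\ \boldsymbol y = \begin{bmatrix}1\\1\end{bmatrix} \Rightarrow \cos\omega = \frac{1}{\sqrt{2}},\ \omega = 45°$$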
\((x, y) \mapsto d(x, y)\)
if x and y are two points in a vector space, then you can find the distance between them
$$d(x, y) := \|x - y\| = \sqrt{\langle x - y,\, x - y\rangle}$$
so a Euclidean distance is a distance from point x to point y, so the shortest path (a straight line). I don't understand the difference between distance and Euclidean distance. Isn't distance also a dot product? how would you do the calculation?
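A worked example for my own reference: the distance induced by the dot product is the Euclidean distance, while a different inner product induces a different (non-Euclidean) distance. With the dot product and x = (1, 2), y = (4, 6):

$$d(\boldsymbol x, \boldsymbol y) = \|\boldsymbol x - \boldsymbol y\| = \sqrt{(1-4)^2 + (2-6)^2} = \sqrt{9 + 16} = 5$$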
inner product returns smaller values than the dot product if x1 and x2 have the same sign
this is interesting
satisfies (3.11) is called symmetric, positive definite
symmetric positive definite
(3.9)
The inner product must be positive definite, symmetric and bilinear. test for inner product: let v = (1,2) -> <v,v> = (1)(1) - (1)(2) - (2)(1) + 2(2)(2) = 1 - 2 -2 + 8 = -3 + 8 = 5 (symmetric, bilinear and positive)
test for dot product: as per (3.5) the right side does not equal the left side
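For context, the inner product being tested above appears to be the book's example, which is what the expansion plugs v = (1, 2) into:

$$\langle \boldsymbol x, \boldsymbol y\rangle := x_1 y_1 - (x_1 y_2 + x_2 y_1) + 2\,x_2 y_2$$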
If we use the dot product defined in (3.5), we call \((V, \langle\cdot,\cdot\rangle)\) a Euclidean vector space
euclidean vector space
The pair (V, 〈·, ·〉) is called an inner product space
inner product space
A positive definite, symmetric bilinear mapping \(\Omega : V \times V \to \mathbb{R}\) is called an inner product on V
positive definite if \(\forall x \in V \setminus \{0\}: \Omega(x, x) > 0,\ \Omega(0, 0) = 0\)
symmetric if Ω(x, y) = Ω(y, x)
a symmetric matrix satisfies \(A = A^T\); for an invertible matrix, \((A^{-1})^T = (A^T)^{-1}\)
$$x^\top y = \sum_{i=1}^{n} x_i y_i$$
inner product and dot product interchangeable here
3.4
the length of a vector, i.e., its distance from the origin
Positive definite: \(\|x\| \geq 0\) and \(\|x\| = 0 \iff x = 0\)
Triangle inequality: \(\|x + y\| \leq \|x\| + \|y\|\)
Absolutely homogeneous: ‖λx‖ = |λ|‖x‖
A norm on a vector space V is a function \(\|\cdot\| : V \to \mathbb{R},\ x \mapsto \|x\|\), which assigns each vector x its length \(\|x\|\)
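Two standard examples satisfying the three properties above:

$$\|\boldsymbol x\|_1 = \sum_{i=1}^{n}|x_i| \ \text{(Manhattan norm)}, \qquad \|\boldsymbol x\|_2 = \sqrt{\sum_{i=1}^{n} x_i^2} \ \text{(Euclidean norm)}$$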