- Nov 2022
-
mml-book.github.io
-
latent variable
latent variables are variables that cannot be observed
-
One way to detect overfitting in practice is to observe that the model has low training risk but high test risk during cross-validation
overfitting = high acc during training and low acc during testing
-
Model Fitting
how well a model is learning
-
cross-validation
technique used to evaluate how well a model generalizes: split the data into k folds, train on k−1 of them, evaluate on the held-out fold, and rotate through all folds
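A minimal sketch, assuming scikit-learn is available (the model, data, and fold count are placeholders I chose for illustration):

```python
# Hypothetical example: 5-fold cross-validation on synthetic regression data.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 examples, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# Each fold is held out once for evaluation while the rest is used for training.
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5)
print(scores.mean())                           # average held-out score (R^2 here)
```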
-
validation set
I have always been confused by the validation set. It is a set used to get a glimpse of how your model will perform on unseen data. Usually you take a portion of the training set to create the validation set
-
regularization
technique used to reduce overfitting
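For example, L2 (ridge) regularization adds a penalty on the weights to the training objective; a standard formulation (not the book's exact equation), where \(\lambda\) controls how strongly large weights are punished:

$$\min_{\theta} \sum_{i=1}^{N} \big(y_i - \theta^{\top} x_i\big)^2 + \lambda \|\theta\|_2^2$$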
-
overfitting
overfitting = during training the error is small whereas during testing it is large
-
Another phrase commonly used for expected risk is "population risk"
From what I know, population risk is the number of individuals at risk. Is it the same thing as expected risk?
-
independent and identically distributed
what is a set of examples here? I am thinking of it as features rather than anything else. But features are dependent upon one another, so I am not sure what this means
-
Affine functions are often referred to as linear functions in machine learning
affine function = linear function
-
Training or parameter estimation
adjust the predictive model based on training data.
In order to find good predictors, do one of two things: 1) find the best predictor based on some measure of quality (known as finding a point estimate), or 2) use Bayesian inference
-
Prediction or inference
predict on unseen test data. 'inference' can mean prediction for non-probabilistic models, or parameter estimation
-
goal of learning is to find a model and its corresponding parameters such that the resulting predictor will perform well on unseen data
important
-
noisy observation
real-life data is always noisy
-
example or data point
I thought rows were observations or instances?
-
we do not expect the identifier (the Name) to be informative for a machine learning task
This is a good reminder to query only the columns or data that are relevant to the exercise
-
features, attributes, or covariates
-
What do we mean by good models?
This is a great question. I usually think of models as algorithms
-
good models should perform well on unseen data
This is the main idea behind implementing a machine learning model
-
- Oct 2022
-
mml-book.github.io
-
This derivation is easiest to understand by drawing the reasoning as it progresses.
The reasoning of the derivation?
-
Example of a convex set
an easy example for identifying convex sets. One way to determine whether a set is convex: if the line segment between any two points within the set lies entirely within the set, it is convex
-
Lagrange multiplier
the Lagrange multiplier method aims to find the local minima and maxima of a function subject to equality constraints
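A small worked example (my own, not from the book): maximize \(f(x, y) = xy\) subject to \(x + y = 1\). Form the Lagrangian

$$\mathcal{L}(x, y, \lambda) = xy + \lambda(1 - x - y)$$

Setting \(\partial\mathcal{L}/\partial x = y - \lambda = 0\), \(\partial\mathcal{L}/\partial y = x - \lambda = 0\), and \(\partial\mathcal{L}/\partial \lambda = 1 - x - y = 0\) gives \(x = y = \frac{1}{2}\), the maximum of \(xy\) on the constraint line.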
-
The step-size is alsocalled the learningrate.
when implementing a neural net, the learning rate is a hyperparameter that controls how much to adjust the weights w.r.t. the gradient
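A minimal sketch of the update rule on a toy loss (everything here is illustrative, not from the book):

```python
# Hypothetical example: gradient descent on f(x) = (x - 3)^2.
def grad(x):
    return 2.0 * (x - 3.0)             # derivative of (x - 3)^2

learning_rate = 0.1                    # the step-size / learning rate
x = 0.0                                # initial guess
for _ in range(100):
    x -= learning_rate * grad(x)       # step in the direction of steepest descent
print(x)                               # approaches the minimizer x = 3
```

Too large a learning rate overshoots and can diverge; too small a rate converges slowly.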
-
We use the convention of row vectors for gradients
so a matrix? or just a row like this: [a b c]?
-
\(\min_x f(x)\)
This is important. By convention, optimization problems are written as minimization problems; maximizing \(f\) is equivalent to minimizing \(-f\)
-
Linear Program
interesting example, showing how linear programs can be plotted.
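A sketch of solving a linear program numerically, assuming scipy is available (the objective and constraints are made up for illustration):

```python
# Hypothetical LP: maximize x + y (i.e., minimize -x - y)
# subject to x + 2y <= 4, 3x + y <= 6, x >= 0, y >= 0.
from scipy.optimize import linprog

c = [-1.0, -1.0]                       # objective coefficients (minimized)
A_ub = [[1.0, 2.0], [3.0, 1.0]]        # inequality constraint matrix
b_ub = [4.0, 6.0]                      # right-hand sides
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print(res.x, res.fun)                  # optimum at (1.6, 1.2), objective -2.8
```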
-
relative frequencies of events of interest to the total number of events that occurred
isn't this the definition of mean?
-
abducted by aliens
lol
-
Theorem 4.3. A square matrix \(A \in \mathbb{R}^{n \times n}\) has \(\det(A) \neq 0\) if and only if \(\text{rk}(A) = n\). In other words, \(A\) is invertible if and only if it is full rank
refer to section 2.6.2 for rank definition
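A quick numerical check of the theorem (matrices chosen arbitrarily):

```python
# Hypothetical check: det(A) != 0 <=> full rank.
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 1.0]])   # det = 1, rank 2: invertible
B = np.array([[1.0, 2.0], [2.0, 4.0]])   # rows are dependent: det = 0, rank 1
for M in (A, B):
    print(np.linalg.det(M), np.linalg.matrix_rank(M))
```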
-
\(p_A(\lambda) := \det(A - \lambda I)\)
-
\((-1)^{k+j} \det(A_{k,j})\) is called a cofactor
-
\(\det(A_{k,j})\) is called a minor
-
$$\det(A) = \sum_{k=1}^{n} (-1)^{k+j} a_{kj} \det(A_{k,j})$$
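A direct (and deliberately inefficient) sketch of this expansion, expanding along the first column:

```python
# Hypothetical example: recursive Laplace expansion along column j = 0 (0-indexed).
import numpy as np

def det_laplace(A):
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for k in range(n):
        # minor: A with row k and column 0 removed
        minor = np.delete(np.delete(A, k, axis=0), 0, axis=1)
        total += (-1) ** k * A[k, 0] * det_laplace(minor)  # entry times cofactor
    return total

A = np.array([[1.0, 2.0], [3.0, 4.0]])
print(det_laplace(A), np.linalg.det(A))  # both -2.0
```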
-
Adding a multiple of a column/row to another one does not change \(\det(A)\)
-
Swapping two rows/columns changes the sign of det(A)
-
\(\det(\lambda A) = \lambda^n \det(A)\)
-
If \(A\) is regular (invertible), then \(\det(A^{-1}) = \frac{1}{\det(A)}\)
-
(4.7)
determinant of 3x3 matrix
-
(4.6)
determinant of 2x2 matrix
-
is invertible if and only if \(\det(A) \neq 0\)
Invertible: det(A) \(\neq 0\)
-
determinant of a square matrix \(A \in \mathbb{R}^{n \times n}\) is a function that maps \(A\) onto a real number
determinant
-
- Sep 2022
-
mml-book.github.io
-
rotation matrix
the columns of the rotation matrix are the coordinates of the rotated basis vectors
-
rotation
linear mapping that rotates a plane by angle \(\theta\) about the origin
if angle \(\theta\) > 0, the rotation is counterclockwise
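A quick numpy sketch (angle chosen arbitrarily):

```python
# Hypothetical example: rotate e1 = (1, 0) by 90 degrees counterclockwise.
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(R @ np.array([1.0, 0.0]))    # ~(0, 1): e1 is rotated onto e2
```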
-
orthogonal basis
$$\langle b_i, b_j \rangle = 0, \quad i \neq j$$
-
orthogonal complement
Let W be a subspace of a vector space V. Then the orthogonal complement of W is also a subspace of V. Furthermore, the intersection of W and its orthogonal complement is just the zero vector.
-
normal vector
a vector with magnitude 1, \(||w|| = 1\), that is perpendicular to the surface
-
Gram-Schmidt process
concatenate the basis vectors (non-orthogonal and unnormalized) into a matrix, apply Gaussian elimination, and obtain an orthonormal basis
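A sketch of the projection-based version of the process (my own illustration; the book's route via Gaussian elimination reaches the same result):

```python
# Hypothetical example: classical Gram-Schmidt on the columns of B.
import numpy as np

def gram_schmidt(B):
    # Orthonormalize the columns of B (assumed linearly independent).
    Q = np.zeros_like(B, dtype=float)
    for j in range(B.shape[1]):
        v = B[:, j].astype(float)
        for i in range(j):
            v -= (Q[:, i] @ B[:, j]) * Q[:, i]   # remove component along earlier basis vector
        Q[:, j] = v / np.linalg.norm(v)          # normalize to unit length
    return Q

B = np.array([[2.0, 1.0], [0.0, 1.0]])
Q = gram_schmidt(B)
print(Q.T @ Q)    # ~identity matrix: columns are orthonormal
```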
-
Orthonormal Basis
basis vectors are a linearly independent subset of vectors; an orthonormal basis is one whose vectors are mutually orthogonal with unit norm; dropping the unit-norm requirement gives an orthogonal basis
-
3.32
orthogonal matrices preserve distances
-
$$\|Ax\|^2 = (Ax)^{\top}(Ax) = x^{\top}A^{\top}Ax = x^{\top}Ix = x^{\top}x = \|x\|^2$$
this is an important proof that an orthogonal matrix preserves the dot product, and hence vector lengths
-
Orthogonal Matrix
$$AA^T = I = A^TA \Rightarrow A^{-1} = A^T$$ orthonormal columns
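A quick numerical check using a rotation matrix (my own example):

```python
# Hypothetical example: rotation matrices are orthogonal.
import numpy as np

theta = 0.7
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.allclose(A @ A.T, np.eye(2)))       # True: A A^T = I
print(np.allclose(np.linalg.inv(A), A.T))    # True: A^{-1} = A^T
```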
-
〈x, y〉
this is equal to 1, which does not meet the requirement of orthogonality
-
Orthogonality
if \(\langle x, y \rangle = 0\); for orthonormality, additionally \(||x|| = ||y|| = 1\)
any two lines that are perpendicular (90 degree angle)
-
cos ω
used to find the angle between vectors: \(\cos\omega = \frac{\langle x, y \rangle}{\|x\|\,\|y\|}\)
-
\((x, y) \mapsto d(x, y)\)
if x and y are two points in a vector space, then you can find the distance between them
-
$$d(x, y) := \|x - y\| = \sqrt{\langle x - y, x - y \rangle}$$
so the Euclidean distance is the distance from point x to point y along a straight line (the shortest path). I don't understand the difference between distance and Euclidean distance. Isn't distance also computed with a dot product? How would you do the calculation?
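To the calculation question: with the dot product as the inner product, this is exactly the Euclidean distance; a different inner product gives a different (non-Euclidean) distance. A quick numpy sketch (vectors chosen arbitrarily):

```python
# Hypothetical example: distance induced by the dot product.
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
d = np.sqrt((x - y) @ (x - y))     # sqrt(<x - y, x - y>) with the dot product
print(d, np.linalg.norm(x - y))    # both 5.0
```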
-
inner product returns smaller values than the dot product if x1 and x2 have the same sign
this is interesting
-
satisfies (3.11) is called symmetric, positive definite
symmetric positive definite
-
(3.9)
The inner product must be positive definite, symmetric, and bilinear. Test for the inner product: let v = (1, 2); then \(\langle v, v \rangle = (1)(1) - (1)(2) - (2)(1) + 2(2)(2) = 1 - 2 - 2 + 8 = 5 > 0\) (symmetric, bilinear, and positive definite).
Test against the dot product: as per (3.5), the right side does not equal the left side, so this inner product differs from the dot product
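The form in (3.9) can be written as \(\langle x, y \rangle = x^{\top} A y\) with the symmetric matrix below (my own rewriting); positive eigenvalues of A confirm positive definiteness:

```python
# Hypothetical check: <x, y> = x1*y1 - (x1*y2 + x2*y1) + 2*x2*y2 as x^T A y.
import numpy as np

A = np.array([[1.0, -1.0], [-1.0, 2.0]])
print(np.linalg.eigvalsh(A))   # both eigenvalues positive (~0.38, ~2.62)

v = np.array([1.0, 2.0])
print(v @ A @ v)               # 5.0, matching the hand computation above
```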
-
use the dot product defined in (3.5), we call \((V, \langle \cdot, \cdot \rangle)\) a Euclidean vector space
euclidean vector space
-
The pair (V, 〈·, ·〉) is called an inner product space
inner product space
-
positive definite, symmetric bilinear mapping \(\Omega : V \times V \to \mathbb{R}\) is called an inner product on V
-
positive definite if \(\forall x \in V \setminus \{0\} : \Omega(x, x) > 0\), \(\Omega(0, 0) = 0\)
-
symmetric if Ω(x, y) = Ω(y, x)
note: a symmetric matrix satisfies \(A = A^{\top}\); the identity \((A^{-1})^{\top} = (A^{\top})^{-1}\) holds for any invertible matrix
-
$$x^{\top} y = \sum_{i=1}^{n} x_i y_i$$
inner product and dot product interchangeable here
-
3.4
the distance of a vector from the origin
-
Positive definite: ‖x‖ ≥ 0 and ‖x‖ = 0 ⟺ x = 0
-
Triangle inequality: ‖x + y‖ ≤ ‖x‖ + ‖y‖
-
Absolutely homogeneous: ‖λx‖ = |λ|‖x‖
-
A norm on a vector space V is a function \(\|\cdot\| : V \to \mathbb{R}\), \(x \mapsto \|x\|\), which assigns each vector x its length \(\|x\|\)
-