-
Describe how you could incorporate this information into your analysis.
Flag: suggested answer (don't read if you don't want to see a (possibly incorrect) attempt):
Update: I realise some bi-modal continuous distribution may be better (but it is potentially more difficult to perform the update with).
Attempt: we model the parameter π in a Bayesian way. We put a distribution on π (0.7 with probability 1/2, 0.2 with probability 1/2), then weight each 1/2 by the likelihood of the observations given that parameter value. That is, we compute the likelihood when π = 0.7, multiply it by 1/2, and divide by the normalising constant to get our new probability for π = 0.7 (and do the same for π = 0.2). The normalising constant is the sum of the 'scores' (prior times likelihood) for 0.7 and 0.2, so we can't divide by it until we have the score for both values.
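Here is a minimal sketch of that update in code, assuming (my assumption, not stated in the question excerpt above) that the data are k successes out of n independent trials, so the likelihood is binomial:

```python
# Discrete Bayesian update over two candidate values of pi.
# Assumes a binomial likelihood for k successes in n trials (my assumption).
from math import comb

def binom_lik(k, n, pi):
    """Likelihood of observing k successes in n trials with parameter pi."""
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def posterior(k, n, pis=(0.7, 0.2), prior=(0.5, 0.5)):
    """Posterior probabilities for the candidate values of pi."""
    scores = [pr * binom_lik(k, n, pi) for pi, pr in zip(pis, prior)]
    z = sum(scores)                       # normalising constant
    return {pi: s / z for pi, s in zip(pis, scores)}

print(posterior(k=2, n=3))                # e.g. 2 successes out of 3 trials
```

The normalising constant z is exactly the sum of the two 'scores' described above, so, as noted, it can only be computed once both scores are available.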
-
Explain your answers.
Flag: suggested answer (don't read if you don't want to see a (possibly incorrect) attempt):
Grateful for comments here, as I am not very certain about the situations where the MLE approach is better versus those where the Bayesian approach is better.
Suggested answer:
c(i) is the frequentist approach, where we have one parameter estimate (the MLE). c(ii) is the Bayesian approach: a distribution over parameters, where we update our prior belief based on observations. If we have no prior belief, c(i) may be a better estimate: in (my version of) c(ii) we are constraining the parameter to be 0.7 or 0.2 and only updating our relative convictions about these, which is a strong prior assumption (we can never get 0.5, for instance). If we do have prior belief, and also want to incorporate uncertainty estimates for our parameters, I think c(ii) is better. If the MLE is 0.7, then c(i) gives 0.7 and c(ii) gives 0.7 with very high probability and 0.2 with very low probability, so the two methods will perform similarly.
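For what it's worth, here is a rough, entirely illustrative numerical comparison of the two approaches under the same binomial-likelihood assumption as in the sketch above; the data values k and n are made up:

```python
# Illustrative comparison of c(i) (MLE point estimate) and c(ii) (posterior
# over the two allowed values of pi). The binomial likelihood and the data
# (k = 2 successes in n = 3 trials) are my assumptions.
from math import comb

def binom_lik(k, n, pi):
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

k, n = 2, 3
mle = k / n                                          # c(i): a single number
scores = {pi: 0.5 * binom_lik(k, n, pi) for pi in (0.7, 0.2)}
z = sum(scores.values())
post = {pi: s / z for pi, s in scores.items()}       # c(ii): a distribution
print(f"MLE = {mle:.3f}, posterior = {post}")
# When the data favour a value near 0.7, the posterior puts most of its mass
# on 0.7 and the two approaches give similar answers in practice.
```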
-
What is the maximum likelihood estimator of π?
Flag: suggested answer (don't read if you don't want to see a (possibly incorrect) attempt):
Attempt: MLE = k/3.
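A minimal derivation sketch, assuming (as the answer k/3 suggests) that the data are k successes in 3 independent Bernoulli(π) trials:

$$
L(\pi) = \binom{3}{k}\,\pi^{k}(1-\pi)^{3-k}, \qquad
\frac{d}{d\pi}\log L(\pi) = \frac{k}{\pi} - \frac{3-k}{1-\pi} = 0
\;\Rightarrow\; \hat{\pi}_{\mathrm{ML}} = \frac{k}{3}.
$$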
-
If you thought that this assumption was unrealistic, how would you relax this assumption?
Flag: don't read if you don't want to see a (possibly incorrect) attempt at an answer. (Grateful for any comments/disagreements or further points to add.)
Attempted answer: the assumption is that, given a class, the features are independent. We could relax this by using 2-d Gaussians for our class-conditional distributions with non-zero covariance (off-diagonal) terms, so that we have dependencies between features (currently these are set to zero to enforce independence).
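A minimal sketch of what that relaxation could look like, with made-up data and illustrative variable names (not the course's notation):

```python
# Fit one class's 2-d feature distribution with a full-covariance Gaussian
# (off-diagonal terms estimated from data) vs a diagonal-covariance Gaussian
# (the independence assumption). Data and names here are illustrative.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
X_class = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=200)

mu = X_class.mean(axis=0)
cov_full = np.cov(X_class, rowvar=False)       # keeps off-diagonal terms
cov_diag = np.diag(np.diag(cov_full))          # zeros them: independence

x = np.array([1.0, 1.0])
print(multivariate_normal(mu, cov_full).pdf(x))   # correlated-feature model
print(multivariate_normal(mu, cov_diag).pdf(x))   # naive-Bayes-style model
```

The classifier would then use one such full-covariance Gaussian per class as its class-conditional likelihood.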
-
overfitting.
(TL;DR: I can summarise the question I am asking in this post as: imagine you were emailing a machine learning expert (and an expert in deduction) about an ML experiment you had done, and you suspect you are overfitting. A priori, they know nothing about what you are doing: what is the minimum amount of information they need to be able to respond 'yes, you are overfitting'?)
Assume she might say this because the average test error is higher than the average training error, which is typically the case for overfitting models (but we have seen in lectures that this can be the case for underfitting models too, i.e. the loss vs K (complexity) graph from lecture 5).
We tell someone the mean train and test errors for one fitted model, and we tell them that the mean test error is higher than the mean training error. The question asks us to explain why, from this data alone, one cannot infer that the model is overfitting.
This question has highlighted some gaps in my knowledge about what overfitting actually means, and about what information we would need to tell someone about the results of an experiment for them to be able to correctly deduce 'ah yes, your model is overfitting'.
I have listed my thoughts below in terms of different 'further information' we could tell them (but I am not entirely sure about the answer, hence the post):
The main thoughts I have are: (i) We give them more granular information about the train and test errors (rather than just the means): can we tell them the train and test loss for every point? Then they know the variation as well as the mean of the train and test loss; can they deduce overfitting from this? (I don't think so, but I'm not sure; maybe low variation of the train error is indicative of overfitting?)
(ii) Do we need to tell them about the model we have actually learned (i.e. the parameters), not just the errors on the training and test sets? Intuitively this would be sufficient (I think): if it were me, I would want to draw the function and see 'how wiggly it is' (whether it extrapolates complex patterns that the training data doesn't support). If so, I would say 'yes, it seems your function is overfitting' (but maybe I can't actually make that statement).
(iii) Maybe telling them about just one model isn't actually enough: we have to tell them the results for a less complex model as well (i.e. models from different experiments). If you told them that the test error was lower on a less complex model, then I think they can confidently say 'yes, your model is overfitting' (see the sketch after this list).
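To make point (iii) concrete, here is a small illustrative sketch (the synthetic data and polynomial-degree models are my choices, not from the question) that reports train/test error across several model complexities:

```python
# Compare train/test MSE across models of increasing complexity.
# Synthetic data and polynomial fits are illustrative choices only.
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2 * np.pi * x)
x_train = np.linspace(0, 1, 15)
x_test = np.linspace(0, 1, 100)
y_train = f(x_train) + 0.2 * rng.standard_normal(x_train.size)
y_test = f(x_test) + 0.2 * rng.standard_normal(x_test.size)

for degree in (1, 3, 9):
    w = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(w, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(w, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
# Seeing test error get worse while train error keeps improving as complexity
# grows is much stronger evidence of overfitting than a single train/test pair.
```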
I am a bit confused here, so any help would be fantastic.
-