- Jan 2022
-
arxiv.org
-
Conclusion
It might be me, but I was missing a section on possible future directions and the drawbacks of using this model. I noticed that there was a lot of praise, but I couldn't find many critiques. (Or I might have overlooked it, as I'm writing this at half past four in the morning.)
Normally, you can find the published reviews on OpenReview with the discussed critiques (see: https://papers.nips.cc/paper/2016/file/fb87582825f9d28a8d42c5e5e5e8b23d-Reviews.html, which I used for another repro course), but when googling this paper, I couldn't find them :(
Either way, here's the OpenReview link to the current paper: https://openreview.net/forum?id=FOL6UA7JYtk
-
We believe that StylEx is a promising step towards detection and mitigation of previously unknown biases in classifiers.
Question: What kind of biases in classifiers?
-
83.2%
Quite impressive results imo, SOTA!
-
In general for all coordinates extracted by StylEx, except one, there is a common word shared by all descriptions, and only in two coordinates the most common word is the same. On the other hand, for Wu et al. less than half of the coordinates have a common word in all their descriptions, and two pairs of coordinates have the same most common word.
-
Users are shown four animated GIFs, each corresponding to modifying an attribute for a given image.
-
We first demonstrate that each StylEx attribute corresponds to clear visual concepts, and then show that these can be used to explain classifier outputs on specific images
Quantitative evaluation criteria; see the column on the right-hand side here.
-
However, the three following criteria seem key to any such method
Quantitative evaluation criteria.
-
See Sec. 4.2.2 for user studies that further demonstrate this
Quite nice that they strengthened their analysis with user studies, something I haven't often seen in this type of ML paper.
-
In other words, the modified classifier output may reflect biases in classifier training, and not a true correlation between the label and visual attributes
Interesting nuance made.
-
high accuracy of at least 95% on their test sets.
Ah nice!
-
greedy search (i.e., at each step find the next most influential attribute for this image, given the subset of attributes selected so far; halt once the classifier has flipped its classification). We can then visualize the effect of modifying this subset. We refer to this as Subset selection
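As I read it, the greedy subset search could look roughly like the sketch below. This is my own reconstruction, not the authors' code: `generate`, `classifier`, the `shift` size and the toy stand-ins at the bottom are all placeholders I made up.

```python
import torch

def subset_selection(style, generate, classifier, candidate_coords, shift=3.0):
    """Greedy 'Subset selection' as I understand it: at each step add the single
    most influential StylEx coordinate, and halt once the predicted class flips.
    generate, classifier and the shift size are placeholders, not the paper's API."""
    base_prob = float(classifier(generate(style, {})))   # P(y) for the unmodified image
    chosen = {}                                          # coord -> overridden value
    remaining = list(candidate_coords)

    while remaining:
        # Score every remaining coordinate on top of the ones already chosen.
        trials = []
        for c in remaining:
            override = dict(chosen, **{c: float(style[c]) + shift})
            p = float(classifier(generate(style, override)))
            trials.append((abs(p - base_prob), c, p))
        _, best_c, best_p = max(trials)                  # most influential coordinate this step
        chosen[best_c] = float(style[best_c]) + shift
        remaining.remove(best_c)
        if (best_p > 0.5) != (base_prob > 0.5):          # classifier flipped its decision -> stop
            break
    return chosen


# Toy stand-ins just to show the call shape (NOT the real StylEx models):
style = torch.zeros(6000)                                          # flattened StyleSpace (size is made up)
generate = lambda s, overrides: torch.tensor(sum(overrides.values(), 0.0))  # fake "image"
classifier = lambda img: torch.sigmoid(img - 4.0)                  # fake classifier
print(subset_selection(style, generate, classifier, candidate_coords=[10, 42, 97]))
```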
-
The simplest is to iterate over StylEx attributes, calculate the effect of changing each on the classifier output for this image, and return the top-k of these. We can then visualize the resulting k modified images. We refer to this strategy as Independent selection.
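Independent selection sounds even simpler; a rough sketch, using the same placeholder interface as the greedy sketch above (again my own names, not the paper's code):

```python
def independent_selection(style, generate, classifier, candidate_coords, k=4, shift=3.0):
    """'Independent selection' as I read it: score every StylEx attribute on its own
    by how much toggling it changes the classifier output for this image, then keep
    the top-k.  generate/classifier/shift are assumptions of mine."""
    base_prob = float(classifier(generate(style, {})))
    effects = []
    for c in candidate_coords:
        p = float(classifier(generate(style, {c: float(style[c]) + shift})))
        effects.append((abs(p - base_prob), c))          # each coordinate scored independently
    effects.sort(reverse=True)
    top_k = [c for _, c in effects[:k]]
    # The k modified images that would then be visualised:
    images = [generate(style, {c: float(style[c]) + shift}) for c in top_k]
    return top_k, images
```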
-
AttFind proceeds as follows: At each iteration it considers all K style coordinates and calculates their effect on the probability of y. It then selects the coordinate with largest effect, and removes all images where changing this coordinate had a large effect on their probability to belong to class y
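My attempt to turn that loop into pseudocode-ish Python. Everything here (the 0.3 drop threshold, the shift size, the `generate`/`classifier` callables) is my own guess at the interface, not the released implementation:

```python
def attfind(latents, generate, classifier, K, M=10, shift=3.0, drop_threshold=0.3):
    """Rough sketch of the AttFind loop as I understand it.
    latents    -- w codes of N images currently NOT classified as the target class y
    generate   -- callable(w, {coord: value}) -> image
    classifier -- callable(image) -> P(y)
    """
    selected, remaining = [], list(latents)

    while len(selected) < M and remaining:
        candidates = [c for c in range(K) if c not in selected]
        # Per-image effect of every candidate coordinate on P(y).
        base = [float(classifier(generate(w, {}))) for w in remaining]
        effect = {c: [float(classifier(generate(w, {c: shift}))) - base[i]
                      for i, w in enumerate(remaining)]
                  for c in candidates}
        # Pick the coordinate with the largest *average* increase in P(y)...
        best = max(candidates, key=lambda c: sum(effect[c]) / len(remaining))
        selected.append(best)
        # ...and drop the images on which this coordinate already changed P(y) a lot.
        remaining = [w for i, w in enumerate(remaining)
                     if effect[best][i] < drop_threshold]
    return selected
```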
-
AttFind takes as input the trained model and a set of N images whose predicted label by C is different from y. For each class y (e.g., y=“cat” or y=“dog”), AttFind then finds a set Sy of M style coordinates (i.e., Sy ⊂ [1, . . . , K] and |Sy| = M), such that changing these coordinates increases the average probability of the class y on these images.
Question: How can you actually define a style coordinate? How should I envision this?
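My current mental model, based on the StyleSpace paper [35] rather than on code from StylEx itself: each StyleGAN2 layer has a learned affine map that turns w into that layer's style vector, which modulates the layer's conv channels; StyleSpace is the concatenation of all those per-layer vectors, and a "style coordinate" is one scalar entry in it (a layer/channel pair). The layer widths and shift below are made up for illustration.

```python
import torch
import torch.nn as nn

w_dim = 512
channels_per_layer = [512, 512, 256, 128, 64]            # made-up layer widths, for illustration only
affines = [nn.Linear(w_dim, c) for c in channels_per_layer]   # stand-ins for StyleGAN2's per-layer affines

w = torch.randn(1, w_dim)                                 # latent code of one image (e.g. from the encoder)
with torch.no_grad():                                     # no training here, just illustrating shapes
    style_vectors = [A(w) for A in affines]               # per-layer style vectors s_l
    style_space = torch.cat(style_vectors, dim=1)         # flattened StyleSpace: K coordinates in total
print(style_space.shape)                                  # torch.Size([1, 1472]) with these made-up widths

# "Modifying style coordinate k" then just means nudging one of these scalars
# before it modulates its conv layer, e.g.:
k = 600                                                   # one (layer, channel) pair, flattened to an index
style_space[0, k] += 3.0                                  # the shift size is a placeholder of mine
```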
-
StylEx architecture.
TLDR methodology: 1) StyleGAN + classifier 2) Search StyleSpace
-
ablation
Question: Ablation refers to the removal of a component within the AI system. Does it mean in this context that individual loss terms are removed one at a time to see their effect?
-
LPIPS
Referring to Learned Perceptual Image Patch Similarity (Zhang et al., 2018).
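For my own reference, this is how the metric is typically used with the `lpips` PyPI package (the reference implementation by Zhang et al.); the random images are obviously just for the call shape, and whether StylEx uses this exact package is an assumption:

```python
import torch
import lpips   # pip install lpips

# LPIPS compares deep features of two images rather than raw pixels, which is
# why it works as a perceptual reconstruction loss.  Inputs are expected in
# [-1, 1], NCHW layout.
loss_fn = lpips.LPIPS(net='alex')                 # 'alex' is the commonly used backbone
img0 = torch.rand(1, 3, 256, 256) * 2 - 1
img1 = torch.rand(1, 3, 256, 256) * 2 - 1
distance = loss_fn(img0, img1)                    # small value = perceptually similar
print(distance.item())
```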
-
The basic GAN training recipe is to train the generator G and an adversarial discriminator D simultaneously [8]. Additionally, we jointly train an encoder E with the generator G, using a reconstruction loss (i.e., the Encoder and Generator function together as an autoencoder). Finally, we add two components that introduce the classifier into the training procedure.
Adjustments to the standard GAN training recipe made for this work.
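How I picture the combined generator-side objective described in that paragraph. The adversarial and reconstruction terms follow the quote; the classifier conditioning and the KL-style classifier term are my own guesses at the "two components", and the (omitted) weights are not from the text:

```python
import torch
import torch.nn.functional as F

def stylex_generator_loss(x, E, G, D, C, lpips_fn):
    """Sketch of the training objective as I understand the quoted paragraph.
    E: encoder, G: generator, D: discriminator, C: frozen classifier returning probabilities,
    lpips_fn: the LPIPS metric from above.  Loss weights are omitted."""
    w = E(x)                                    # encoder: image -> latent w
    x_rec = G(w, C(x))                          # generator conditioned on the classifier output (my assumption)

    adv_loss = F.softplus(-D(x_rec)).mean()     # standard non-saturating GAN generator loss
    rec_loss = F.l1_loss(x_rec, x) + lpips_fn(x_rec, x).mean()        # pixel + perceptual reconstruction
    cls_loss = F.kl_div(C(x_rec).log(), C(x), reduction='batchmean')  # keep classifier output unchanged (assumption)

    return adv_loss + rec_loss + cls_loss
```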
-
latent vector w
Question: This is a compressed representation of the image, right?
-
The observation recently made by [35] that StyleGAN2 tends to inherently contain a disentangled StyleSpace, which can be used to extract individual attributes. Thus, we argue that modifying coordinates of StyleSpace is a natural approach to our problem of modifying classifier-related attributes.
Key property of StyleGAN2 on which the authors base this research.
-
a) A conditional generative model that maps an embedding w into an output image. b) An encoder that maps any input image into an embedding w, so that the generator can modify attributes in real images. c) A mechanism for “intervening” with the generation process to change visual attributes in the image
Network architecture in detail; the generative model is StyleGAN2. Question: are there possible alternatives to using StyleGAN2?
-
cannot visualize/explain well attributes that are not spatially localized, like size, color, etc. In addition, they can show which areas of the image may be changed in order to affect the classification, but not how they should be changed
Drawbacks of using heat maps -- they show which regions matter, but not how they should change :(
-
It should be emphasized that our goal is not to explain the true label, but rather what classifiers are learning.
Interesting! If we were to use post-hoc explanations (think SHAP and LIME), would you say that we intend to explain the true label instead?
-
Our contributions are as follows
Contributions, elaborated on further below.
-
Adversarial examples [9, 12], which are slight modifications to the input x that change the classification to the wrong class,
Reminded me of this example again (https://openai.com/blog/multimodal-neurons/) of how you can trick CLIP into believing an actual apple is an iPod.
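For completeness, the classic way to construct such a "slight modification" is FGSM (Goodfellow et al.); this is a standard illustration of the concept, not something from the StylEx paper:

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, label, eps=0.01):
    """Fast Gradient Sign Method: a tiny perturbation of x that increases the loss,
    often enough to flip the classification while looking unchanged to a human."""
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    # One small step in the direction that maximally increases the loss:
    x_adv = x + eps * x.grad.sign()
    return x_adv.detach()
```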
-
A key advantage of this approach is that it provides per-example explanations, pinpointing which parts of the input are salient towards the classification and also how they can be changed in order to obtain an alternative outcome.
Main advantage of using counterfactual explanations (CEs).
-
A counterfactual explanation is a statement of the form “Had the input x been x̃ then the classifier output would have been ỹ instead of y”, where the difference between x and x̃ is easy to explain.
Definition of a counterfactual explanation in ML.
-
counterfactual example, i.e., visualizing how manipulation of this attribute (style coordinate) affects the classifier output probability.
-
Deep net classifiers are often described as “black boxes” whose decisions are opaque and hard for humans to understand. This significantly holds back their usage, especially in areas of high societal impact, such as medical imaging and autonomous driving, where human oversight is critical.
Would you argue that e.g. an MLP is also a deep net? I would slightly rephrase the term "deep net" into something like "neural network-based classifiers".
But, related to this, I actually found quite a relevant and interesting additional paper! It happened to be published 3 days ago, and it basically argues that you don't necessarily need deep convnets to solve similar societal problems: any problem that can be solved using a transformer can be solved by an MLP/CNN, and vice versa, provided that you do exhaustive tuning and use the right inductive bias. Ditto for RNNs!
Link to CNN paper: https://arxiv.org/abs/2201.03545 Link to RNN paper: https://arxiv.org/abs/1611.09913
-
driving latent attributes in the GAN’s StyleSpace to capture classifier-specific attributes
Details on which specific attributes are visualized -> this is done by generating counterfactual examples.
-
StylEx explains the decisions of a classifier by discovering and visualizing multiple attributes that affect its prediction.
Main contribution of the paper.
-