272 Matching Annotations
  1. Mar 2022
    1. Well, at some level, every aspect of reality seems to be made of interconnected parts. Atoms and molecules are connected to each other with chemical bonds. Your neurons connect to each other through synapses, and the different parts of your brain connect to each other through groups of neurons interacting with each other. At a larger level, you are interconnected with other humans through social networks, and our economy is a global, interconnected trade network. The Earth’s food chain is an ecological network, and larger still, every object with mass in the universe is connected to every other object through a gravitational network.

      pagerank algorithm?

  2. Feb 2022
  3. Dec 2021
    1. Our model could say that this distribution of a Gaussian, for example, in which case we would have to estimate the parameters of this Gaussian

      rephrase

      "We could assume, for example, that our data are normally distributed. Then, our only goal would be to estimate the parameters of this distribution -- this is called a parametric model. On the other hand, we might not know what the distribution is at all, and we're just trying to fit our data the best we can. This situation is called a nonparametric model."

    2. We consider the distribution where our observations are sampled from as really anything

      The distribution our observations are sampled from could be anything

    3. The first is given two graphs, and their corresponding latent positions, are their positions the same?

      Both tests compare the latent positions of the two networks, in slightly different ways. The first type of test is intended to figure out if the latent positions themselves are exactly the same between the two networks. The second is intended to determine whether the distributions of the latent positions between the two networks are the same.

    1. $a \rightarrow a$, $b \rightarrow b$, $c \rightarrow d$, $d \rightarrow c$

      i'd just write this out instead of doing this arrow stuff

  4. Nov 2021
    1. choose a particular percentile, and divide it by 100 to obtain the quantile

      it's a little unclear to me why we're dividing by 100.

      if we have edge weights 3, 5, 7, and we pick 20%, then we'd get 0.2 when dividing by 100. But we wouldn't set any edge weights below 0.2 to 0, because there are no edge weights below 0.2. I don't think that's what you mean to do, but I think the phrasing implies that?
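
      Here's roughly what I think is meant (a quick sketch, assuming the goal is to truncate edge weights below the chosen percentile; the weights and the 20th percentile are made up):

      ```python
      import numpy as np

      # hypothetical edge weights and a chosen percentile
      weights = np.array([3.0, 5.0, 7.0])
      percentile = 20
      quantile = percentile / 100   # 0.2 -- this is a quantile, not a weight threshold

      # the cutoff is the *value of the weights* at that quantile,
      # not the number 0.2 itself
      cutoff = np.quantile(weights, quantile)

      # weights below the cutoff get truncated (here, set to 0)
      truncated = np.where(weights < cutoff, 0, weights)
      print(cutoff, truncated)
      ```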

    2. it might in fact overfit the training data and model spurious noise, which raises the variance

      A low-bias model, for instance, might fit our training data too well. This fit would just model noise, raising the variance when applied to new data.

    3. $A' = \frac{1}{2}(A + A^\top) = \frac{1}{2}\left(\begin{bmatrix} a_{11} & ... & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & ... & a_{nn} \end{bmatrix} + \begin{bmatrix} a_{11} & ... & a_{n1} \\ \vdots & \ddots & \vdots \\ a_{1n} & ... & a_{nn} \end{bmatrix}\right) = \begin{bmatrix} \frac{1}{2}(a_{11} + a_{11}) & ... & \frac{1}{2}(a_{1n} + a_{n1}) \\ \vdots & \ddots & \vdots \\ \frac{1}{2}(a_{n1} + a_{1n}) & ... & \frac{1}{2}(a_{nn} + a_{nn}) \end{bmatrix} = \begin{bmatrix} a_{11} & ... & \frac{1}{2}(a_{1n} + a_{n1}) \\ \vdots & \ddots & \vdots \\ \frac{1}{2}(a_{n1} + a_{1n}) & ... & a_{nn} \end{bmatrix}$

      feels a little bulky, maybe unnecessary
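
      If it stays, it could probably just be shown in code rather than spelled out entry-by-entry (a quick sketch, assuming A is a square numpy array):

      ```python
      import numpy as np

      # a small asymmetric example matrix
      A = np.array([[0, 1, 0],
                    [0, 0, 1],
                    [1, 0, 0]])

      # averaging A with its transpose keeps the diagonal as-is and replaces
      # each off-diagonal pair a_ij, a_ji with their average (a_ij + a_ji) / 2
      A_sym = (A + A.T) / 2
      print(A_sym)
      ```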

    4. node $i$ being stimulated leading to node $j$ does not necessarily mean that node $j$ being stimulated leads to node $i$ being stimulated.

      rephrase

    1. Next, we can select a different neighbor of node $i$, for which there are $d_i - 1$ total. This gives us a triplet consisting of node $i$, one of $d_i$ possible nodes, and one of $d_i - 1$ possible nodes, since there will exist at least two edges between them (one edge from node $i$ to one of its $d_i$ neighbors, and the other edge from node $i$ to one of its other $d_i - 1$ neighbors). Therefore, the number of open and closed triplets is the quantity $\sum_i d_i (d_i - 1)$.

      didn't really understand this description too well

      Here's how you can find an arbitrary triplet:

      1. Pick a neighbor for node $i$
      2. Pick a different neighbor for node $i$
      3. Since node $i$ has edges with both of these neighbors, the triplet consisting of $i$ and its two neighbors will have at least two edges.
      4. If those neighbors are connected to each other, the triplet will be closed, and if they aren't, it will be open
      5. we can figure out how many total triplets there are by counting the number of times we can go through this process

      (maybe revise point 5, or reorganize into paragraphs, and/or add more numerical specifics -- see the sketch below)
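
      Here's a rough sketch of the counting I have in mind (assuming an undirected, unweighted adjacency matrix with no self-loops; the example matrix is made up):

      ```python
      import numpy as np

      A = np.array([[0, 1, 1, 0],
                    [1, 0, 1, 1],
                    [1, 1, 0, 0],
                    [0, 1, 0, 0]])

      degrees = A.sum(axis=0)

      # each node i is the center of d_i * (d_i - 1) ordered pairs of neighbors,
      # so this counts every triplet (open or closed)
      n_triplets = np.sum(degrees * (degrees - 1))

      # closed triplets are the ones whose two neighbors are themselves connected;
      # trace(A^3) counts each triangle 6 times, i.e. once per ordered closed triplet
      n_closed = np.trace(A @ A @ A)

      print(n_triplets, n_closed, n_closed / n_triplets)  # 10, 6, 0.6 (the clustering coefficient)
      ```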

    2. nodes from our example network. To do this, we will look only at the boroughs Staten Island, Manhattan, Brooklyn, and Queens. Our network looks like this:

      Let's look at only Staten Island, Manhattan, Brooklyn, and Queens in our example network.

    3. Here, we cover some useful quantities we might want to compute about a network.

      These properties are called network summary statistics. Although this book will be more focused on finding and using representations for networks than using summary statistics, they're useful to know about.

    4. where $a_{ij}$ takes the value of 1 if nodes $i$ and $j$ are connected, and the value 0 if nodes $i$ and $j$ are not connected.

      ?

    5. /opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/sklearn/utils/validation.py:585: FutureWarning: np.matrix usage is deprecated in 1.0 and will raise a TypeError in 1.2. Please convert to a numpy array with np.asarray. For more information see: https://numpy.org/doc/stable/reference/generated/numpy.matrix.html warnings.warn(

      delete

    6. summing all of the adjacencies corresponding to a potential edge incident a node $i$:

      overcomplicated sentence

      "We can get the degree of a node $i$ by counting all the edges incident to it. To do this, we can just sum along the $i$th row (or column) of the adjacency matrix:

      (equation)"
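
      e.g. something like this (a minimal sketch, assuming an undirected binary adjacency matrix; the example is made up):

      ```python
      import numpy as np

      A = np.array([[0, 1, 1],
                    [1, 0, 0],
                    [1, 0, 0]])

      i = 0
      # the degree of node i is just the number of 1s in row i
      # (or column i, since the network is undirected)
      d_i = A[i, :].sum()
      print(d_i)            # 2

      # all the degrees at once
      print(A.sum(axis=1))  # [2 1 1]
      ```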

    7. From the description above, we learned that every edge incident a node $i$ will have $a_{ij}$ take the value of one. Therefore,

      Since every edge incident to $i$ will have $a_{ij}$ take the value of 1, we can count...

    8. For most purposes, we will largely be considered with binary networks, which are also more traditionally called

      For most purposes, we'll primarily consider unweighted or binary networks.

    1. Thus, $A$ and $B$ are said to be isomorphic.

      don't think introducing the term "isomorphic" is necessary here - can just say "So A and B are the same network, but the nodes just have different indices"

    2. fig, axs = plt.subplots(1, 3, figsize=(20, 20))
       heatmap(A, ax=axs[0], cbar=False, title = r'$A_T$')
       heatmap(B, ax=axs[1], cbar=False, title = r'$A_F$')
       heatmap(P@B@P.T, ax=axs[2], cbar=False, title = r'$A_F$

      same deal as before, separate out plotting code

    3. fig, axs = plt.subplots(1, 3, figsize=(20, 20))
       heatmap(B, ax=axs[0], cbar=False, title = r'Original Matrix $B$')
       heatmap(P@B, ax=axs[1], cbar=False, title = r'Row Permutation $PB$:')
       heatmap(B@P.T, ax=axs[2], cbar=False, title = r'Row Permutation $BP^T$:')
      • separate plotting stuff out into a new codeblock and hide the cell
      • add ; at the end of the code block to prevent the <AxesSubplot:title={'center':'Row Permutation $BP^T$:'}>
    4. As you can imagine, there are a very large number of these possible mappings

      There are a ton of ways to match the nodes in F to the nodes in T. In fact...
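
      ...and the count itself is easy to show, since a one-to-one matching of the nodes is just a permutation (quick sketch, the n is made up):

      ```python
      from math import factorial

      n = 10  # a hypothetical number of nodes
      # number of possible one-to-one matchings between the nodes of two n-node networks
      print(factorial(n))  # 3628800
      ```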

  5. Oct 2021
    1. arred sections, these sections will assume familiarity with more advanced mathematical and probability concepts.

      consider a paragraph that just introduces all the models with like a sentence-description

  6. Sep 2021
    1. a posteriori Stochastic Block Model, Recap We just covered many details about how to perform statistical inference with a realization of a random network which we think can be well summarized by a Stochastic Block Model. For this reason, we will review some of the key things that were covered, to better put them in context: We learned that the Adjacency Spectral Embedding is a key algorithm for making sense of networks we believe may be realizations of networks which are well-summarized by Stochastic Block Models, as inference on the estimated latent positions is key for learning about community assignments. We learned how unsupervised learning allows us to use the estimated latent positions to learn community assignments for nodes within our realization. We learned how to align the labels produced by our unsupervised learning technique with true labels in our network, using remap_labels. We learned how to produce community assignments, regardless of whether we know how many communities may be present in the first place.

      I think this recap should be the introductory paragraph, and should be expanded

    2. Unlike the SBM example, the scatter plots for the adjacency spectral embedding of a realization of an ER network no longer show the distinct separability into individual communities.

      Unlike with the SBM, we can't see any obvious clusters in this pairs plot

    3. histograms of the indicated values for the indicated dimension.

      "the indicated values for the indicated dimension" I don't feel like I understand the histograms better after reading this sentence

    4. we will find reasonable “guesses” at community assignments further down the line.

      that we'll be able to guess our community assignments reasonably well

    5. Remember that as we learned in the single network models section, even though the communities eachh node is assigned to look obvious, this is an artifact of the ordering of the nodes.

      consider:

      Remember that if we reorder the nodes, the community each node is assigned to won't be as visually obvious

    6. the approach we will take will be to use $A$ to produce a best guess as to which community each node of $A$ is from, and then use our best guesses as to which community each node is from to learn about $\vec\pi$ and $B$.

      I have had to linger on this sentence for the past 15 seconds to understand it - rewrite

    7. l = RDPGEstimator(loops=False) # number of latent dimensions is not known
       model.fit(A)
       Phat = model.p_mat_

      i'd make a "d_hat" variable somewhere in this code
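
      e.g. something like this (a sketch -- I'm assuming RDPGEstimator accepts an n_components argument, and the simulated network is just a stand-in for the section's A):

      ```python
      from graspologic.simulations import sbm
      from graspologic.models import RDPGEstimator

      # stand-in for the example's adjacency matrix A
      A = sbm([50, 50], [[0.5, 0.2], [0.2, 0.5]])

      d_hat = 2  # our guess (or estimate) of the number of latent dimensions
      model = RDPGEstimator(loops=False, n_components=d_hat)
      model.fit(A)
      Phat = model.p_mat_
      print(Phat.shape)
      ```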

    8. What if we did not know that $d$ was 2 ahead of time

      what if we didn't know that there were two latent dimensions ahead of time?

      I would also visualize the latent position matrix somewhere
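
      e.g. (a sketch -- I believe AdjacencySpectralEmbed picks the number of dimensions automatically when n_components isn't given, and the simulated network is just a stand-in for this example's A):

      ```python
      import matplotlib.pyplot as plt
      from graspologic.simulations import sbm
      from graspologic.embed import AdjacencySpectralEmbed

      # stand-in for the example's adjacency matrix A
      A = sbm([75, 75], [[0.6, 0.2], [0.2, 0.4]])

      ase = AdjacencySpectralEmbed()   # n_components=None -> chosen automatically
      Xhat = ase.fit_transform(A)
      d_hat = Xhat.shape[1]
      print("estimated number of latent dimensions:", d_hat)

      # quick look at the estimated latent position matrix
      plt.imshow(Xhat, aspect="auto", cmap="Purples")
      plt.xlabel("latent dimension")
      plt.ylabel("node")
      plt.colorbar();
      ```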

    9. from graphbook_code import plot_latents
       fig, axs = plt.subplots(1, 2, figsize=(12, 6))
       heatmap(Phat, vmin=0, vmax=1, font_scale=1.5, title="$\hat P_{RDPG}$", ax=axs[0])
       heatmap(P, vmin=0, vmax=1, font_scale=1.5, title="$P_{RDPG}$", ax=axs[1])
       fig;

      hide this?

    10. We will evaluate the performance of the RDPG estimator agaigraspologic.plot the estimated probability matrix, $\hat P = \hat X \hat X^\top$, to the true probability matrix, $P = XX^\top$.

      this sentence is confusing

    11. st 1) and when people are very far apart, we think that they will have a very low probability of being friends (almost 0). We define $X$ to have rows given by: $\vec x_i = \begin{bmatrix} \left(\frac{60 - i}{60}\right)^2 \\ \left(\frac{i}{60}\right)^2 \end{bmatrix}$

      this latent position matrix doesn't really line up with the street situation you're describing - I'd either change the latent position matrix or change the example

    12. We define $X$ to have rows given by: $\vec x_i = \begin{bmatrix} \left(\frac{60 - i}{60}\right)^2 \\ \left(\frac{i}{60}\right)^2 \end{bmatrix}$

      this doesn't feel like it transitions from the street example well enough: I don't immediately see the connection between X and the street and this equation
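
      One way to make (or sanity-check) the connection is to just build the X above and look at the dot products it produces (a quick sketch):

      ```python
      import numpy as np

      n = 60                      # 60 people along the road
      i = np.arange(1, n + 1)

      # the latent positions defined in the text
      X = np.column_stack((((n - i) / n) ** 2, (i / n) ** 2))

      # edge probabilities are the dot products of latent positions
      P = X @ X.T

      # neighbors near the start of the road, neighbors in the middle,
      # and the two people at opposite ends
      print(P[0, 1], P[29, 30], P[0, 59])
      ```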

    13. Let’s assume that we have 60 people who live along a very long road that is 60 miles long, and each person is 1 mile apart.

      I would literally have this be the first sentence of the section

    14. The $\hat\cdot$ symbol just means that $\hat d$ is an estimate of the number of latent dimensions $d$, and not necessarily the actual number of latent dimension

      consider rewriting to something like

      "the hat symbol above the $d$ means that it's our best guess for the number of dimensions (using some reasonable estimation method), rather than it being the actual number of dimensions."

    15. We might have a reasonable ability to "guess" what $d$ is ahead of time, but this will often not be the case

      in some situations we can guess $d$, but we want some way of picking it automatically

    16. he estimate of $X$ is produced by using the Adjacency Spectral Embedding, by embedding the observed network $A$ into $d$ (if the number of latent dimensions is known) or $\hat d$ (if the number of latent dimensions is not known) dimensions.

      "... by embedding the observed network $A$ into $d$ or $\hat d$, depending on whether the number of latent dimensions is known or not."

    17. We estimate $X$ extremely simply for a realization $A$ of a random network $\mathbf A$ which is characterized using the a priori Random Dot Product Graph.

      I didn't understand this sentence on first read, would rephrase

    18. That expression, it turns out, is a lot more complicated than what we had to deal with for the a priori Stochastic Block Model. Taking the log gives us that:

      give the reader warning in advance

    19. , in fact, additional steps on top of how we estimate parameters for an a priori Random Dot Product Graph (RDPG).

      run-on sentence and confusing phrasing

    20. $\mathbb P_\theta(A) = \sum_{\vec\tau \in \mathcal T} \prod_{k=1}^K \left[\pi_k^{n_k} \cdot \prod_{k'=1}^K b_{k'k}^{m_{k'k}} (1 - b_{k'k})^{n_{k'k} - m_{k'k}}\right]$

      I would organize this whole section like:

      "look how complicated the thing below is:

      <the equation>

      this equation is way too complicated, so we're going to have to get a bit creative here"

    1. original matrix B
       [[ 1  2  3  4]
        [ 5  6  7  8]
        [ 9 10 11 12]
        [13 14 15 16]]
       row permutation:
       [[ 5  6  7  8]
        [ 9 10 11 12]
        [13 14 15 16]
        [ 1  2  3  4]]
       column permutation:
       [[ 2  3  4  1]
        [ 6  7  8  5]
        [10 11 12  9]
        [14 15 16 13]]

      the formatting is a bit visually chaotic

      I might replace with a figure that uses color to show row/column movement & a heatmap with all 0's and a row of 1s
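
      Roughly what I have in mind (a sketch using graspologic's heatmap; the B and P here are made up):

      ```python
      import numpy as np
      import matplotlib.pyplot as plt
      from graspologic.plot import heatmap

      # a matrix of all 0s with a single row of 1s, so the movement is easy to see
      B = np.zeros((4, 4))
      B[0, :] = 1

      # a permutation matrix sending index 0 -> 1, 1 -> 2, 2 -> 3, 3 -> 0
      P = np.eye(4)[[3, 0, 1, 2]]

      fig, axs = plt.subplots(1, 3, figsize=(15, 5))
      heatmap(B, ax=axs[0], cbar=False, title="Original $B$")
      heatmap(P @ B, ax=axs[1], cbar=False, title="Row permutation $PB$")
      heatmap(B @ P.T, ax=axs[2], cbar=False, title="Column permutation $BP^T$");
      ```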

    2. Due to the one-to-one nature of these matchings, they are also known as bijections

      this sentence will be meaningless to many readers

    1. Stated another way, our observed network is assumed to be a realization of a governing random network. From now on, when we say the word network without the word random in front of it, we are referring to the realizations of random networks.

      dax: this sentence is doing a ton of heavy lifting and needs more pomp+circumstance.

      maybe in its own section, and another paragraph (or two)?

    1. the diagonal is entirely 0

      "edges can't connect to themselves" - I wouldn't talk about the diagonal, that's not about the network, its about the adjacency matrix

    1. et’s try an example of an a priori RDPG. We will use the same example that we used in the single network models section, where we

      I'd add a big header here that says something like "Fitting Models For Random Dot Product Graphs"

    2. $\frac{\partial}{\partial b_{l'l}} \log \mathbb P_\theta(A) = 0 + \frac{\partial}{\partial b_{l'l}}\left[m_{l'l}\log b_{l'l} + (n_{l'l} - m_{l'l})\log(1 - b_{l'l})\right] = \frac{m_{l'l}}{b_{l'l}} - \frac{n_{l'l} - m_{l'l}}{1 - b_{l'l}} = 0 \;\Rightarrow\; b^*_{l'l} = \frac{m_{l'l}}{n_{l'l}}$

      too mathy

    3. $\frac{\partial}{\partial b_{l'l}} \log \mathbb P_\theta(A) = \frac{\partial}{\partial b_{l'l}} \sum_{k, k' \in [K]} \left[m_{k'k}\log b_{k'k} + (n_{k'k} - m_{k'k})\log(1 - b_{k'k})\right] = \sum_{k, k' \in [K]} \frac{\partial}{\partial b_{l'l}}\left[m_{k'k}\log b_{k'k} + (n_{k'k} - m_{k'k})\log(1 - b_{k'k})\right]$

      too mathy

    4. $\mathbb P_\theta(A) = \prod_{k, k' \in [K]} b_{k'k}^{m_{k'k}} \cdot (1 - b_{k'k})^{n_{k'k} - m_{k'k}}$ where $n_{k'k} = \sum_{i < j}\mathbb 1_{\tau_i = k}\mathbb 1_{\tau_j = k'}$ was the number of possible edges between nodes in community $k$ and $k'$, and $m_{k'k} = \sum_{i < j}\mathbb 1_{\tau_i = k}\mathbb 1_{\tau_j = k'}a_{ij}$ was the number of edges in the realization $A$ between nodes within communities $k$ and $k'$.

      way too mathy, nobody in industry is gonna know what an indicator function is

    5. ve, which we omit since it is rather mathematically tedious, we see that the second derivative at $p^*$ is negative, so we indeed have found an estimate of the maximum, and will be denoted by $\hat p$. This gives that the Maximum Likelihood Estimate (or, the MLE, for short) of the probability $p$ for a random network $\mathbf A$ which is ER is: $\hat p = \frac{m}{\binom{n}{2}}$

      I don't think we should be talking about MLEs right when we introduce ER, way too in-depth

    6. xt, we take the derivative with respect to $p$, set equal to 0, and we

      I don't think there's any reason to talk about derivatives in the first couple paragraphs of the introduction to an ER model
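
      Agreed -- the end result can just be shown directly, e.g. (sketch, assuming an undirected binary A with no self-loops):

      ```python
      import numpy as np

      A = np.array([[0, 1, 1],
                    [1, 0, 0],
                    [1, 0, 0]])
      n = A.shape[0]

      m = np.triu(A, k=1).sum()      # number of edges in the network
      n_possible = n * (n - 1) / 2   # number of possible edges, "n choose 2"
      p_hat = m / n_possible         # the fraction of possible edges that actually exist
      print(p_hat)                   # 2/3 here
      ```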

    7. $\frac{d}{dp}\log \mathbb P(\mathbf x_1 = x_1, ..., \mathbf x_n = x_n; p) = \frac{\sum_{i=1}^n x_i}{p} - \frac{n - \sum_{i=1}^n x_i}{1 - p} = 0 \;\Rightarrow\; \frac{\sum_{i=1}^n x_i}{p} = \frac{n - \sum_{i=1}^n x_i}{1 - p} \;\Rightarrow\; (1 - p)\sum_{i=1}^n x_i = p\left(n - \sum_{i=1}^n x_i\right) \;\Rightarrow\; \sum_{i=1}^n x_i - p\sum_{i=1}^n x_i = pn - p\sum_{i=1}^n x_i \;\Rightarrow\; p^* = \frac{1}{n}\sum_{i=1}^n x_i$

      way way way too mathy, anyone who isn't an academic will not read any of this

    8. $\mathbb P_\theta(\mathbf x_1 = x_1, ..., \mathbf x_n = x_n; p) = \prod_{i=1}^n \mathbb P(\mathbf x_i = x_i) = \prod_{i=1}^n p^{x_i}(1 - p)^{1 - x_i} = p^{\sum_{i=1}^n x_i}(1 - p)^{n - \sum_{i=1}^n x_i}$

      too mathy imo, a non-academic's eyes will glaze over

    9. We estimate $X$ extremely simply for a realization $A$ of a random network $\mathbf A$ which is characterized using the a priori Random Dot Product Graph.

      no idea what this means

    10. Whereas the log of a product of terms is the sum of the logs of the terms, no such easy simplification exists for the log of a sum of terms. This means that we will have to get a bit creative here. Instead, we will turn first to the a priori Random Dot Product Graph, and then figure out how to estimate parameters from a a posteriori SBM using that.

      I'm lost here because I didn't read the 'why use statistical models' section recently

      (which implies that someone would have had to read and understand the 'why use statistical models' section to understand this)

    11. $\mathbb P_\theta(A) = \sum_{\vec\tau \in \mathcal T} \prod_{k=1}^K \left[\pi_k^{n_k} \cdot \prod_{k'=1}^K b_{k'k}^{m_{k'k}}(1 - b_{k'k})^{n_{k'k} - m_{k'k}}\right]$ That expression, it turns out, is a lot more complicated than what we had to deal with for the a priori Stochastic Block Model. Taking the log gives us that: $\log \mathbb P_\theta(A) = \log\left(\sum_{\vec\tau \in \mathcal T} \prod_{k=1}^K \left[\pi_k^{n_k} \cdot \prod_{k'=1}^K b_{k'k}^{m_{k'k}}(1 - b_{k'k})^{n_{k'k} - m_{k'k}}\right]\right)$

      I don't think anyone who is non-mathy will get anything from this

      (my eyes glazed over when I saw those equations)

  7. Aug 2021
  8. Jul 2021
    1. In particular, what this result says is that if we were to look at the sum of $A$ expressed above, and only look at the sum of the first $k$ of those terms, that rank $k$ matrix $A_k$ is the most similar rank $k$ matrix to $A$, according to the Frobenius norm.

      This is telling us that $A_k$ is the matrix which is as similar as possible to $A$, but is less complicated (meaning, it's only rank $k$ instead of being full-rank).
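
      This could even be shown numerically (a sketch; the random symmetric matrix is just for illustration):

      ```python
      import numpy as np

      rng = np.random.default_rng(0)
      A = rng.normal(size=(10, 10))
      A = (A + A.T) / 2              # make it symmetric

      U, s, Vt = np.linalg.svd(A)

      k = 3
      # keep only the first k singular values/vectors: the best rank-k
      # approximation of A in Frobenius norm
      A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

      print(np.linalg.matrix_rank(A_k))      # 3
      print(np.linalg.norm(A - A_k, "fro"))  # the approximation error
      print(np.linalg.norm(A, "fro"))        # compared to the size of A itself
      ```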

    2. Discussing the below, we will delve briefly into the concept of matrix rank, which we will define now.

      needs more of a transition: why are we talking about matrix rank now? What's the connection to what we were just talking about?

    3. Let’s illustrate

      Add a sentence describing orthogonality geometrically

      so, " it's useful to think about the fact that the columns are orthogonal geometrically. If you think of each column as a vector in space, those vectors will all be at 90 degree angles from each other - and so every pair of columns will have a dot product of 0." (maybe you can go into what having a dot product of 0 implies about the information that the columns contain)

    4. hat is, $A \in \mathbb R^{n \times n}$, and for any $i, j \in [n]$, $a_{ij} = a_{ji}$

      I would put a figure here that shows a small symmetric matrix, with an arrow pointing to the symmetry and text that says something like "the upper and lower portions of the matrix are the same"

    5. This description of the SVD has been modified to fit our purposes: particularly, the description we provide applies only

      This description only works for square matrices because it's easier and more intuitive to think through, but you can (and should!) find other descriptions that generalize to nonsquare matrices

    6. Note that for these and successive sections, we will present a simplified, and non-rigorous, review of the SVD and many results that are important for developing intuition around this decomposition.

      feels pretty formal

  9. Jun 2021
    1. ax=adjplot(A[tuple([vtx_perm])][:,vtx_perm], meta=meta, color="School", palette="Blues")

      I might add something that explains what the blue lines mean

    2. ting_context("talk", font_scale=1):
           ax = sns.heatmap(X, cmap="Purples", ax=ax, cbar_kws=dict(shrink=1), yticklabels=False, xticklabels=False, vmin=0, vmax=1)
           ax.set_title(titl

      have text annotations in the center of this matrix and then remove the colorbar

    3. import matplotlib.pyplot as plt
       import seaborn as sns
       import numpy as np

      I'd change the figure a bit to map the colors to numbers a bit better so that the reader understands how the figure represents a vector

    4. Next, let’s plot what $\vec\tau$ and $B$ look like:
       import matplotlib.pyplot as plt
       import seaborn as sns
       import numpy as np
       import matplotlib

       def plot_tau(tau, title="", xlab="Node"):
           cmap = matplotlib.colors.ListedColormap(["skyblue", 'blue'])
           fig, ax = plt.subplots(figsize=(10,2))
           with sns.plotting_context("talk", font_scale=1):
               ax = sns.heatmap((tau - 1).reshape((1,tau.shape[0])), cmap=cmap, ax=ax, cbar_kws=dict(shrink=1), yticklabels=False, xticklabels=False)
               ax.set_title(title)
               cbar = ax.collections[0].colorbar
               cbar.set_ticks([0.25, .75])
               cbar.set_ticklabels(['School 1', 'School 2'])
               ax.set(xlabel=xlab)
               ax.set_xticks([.5,149.5,299.5])
               ax.set_xticklabels(["1", "150", "300"])
               cbar.ax.set_frame_on(True)
           return

       n = 300  # number of students
       # tau is a column vector of 150 1s followed by 150 2s
       # this vector gives the school each of the 300 students are from
       tau = np.vstack((np.ones((int(n/2),1)), np.full((int(n/2),1), 2)))
       plot_tau(tau, title="Tau, Node Assignment Vector", xlab="Student")

      I would move image to right where you first explained it

    5. To describe the a priori SBM, we will use a latent variable model. To do so, we will assume there is some vector-valued random variable, $\vec{\pmb\tau}$, which we will call the node assignment vector. This random variable takes values $\vec\tau$ which are in the space $\{1,...,K\}^n$. That means that each element of a realization $\vec\tau$ takes one of $K$ possible values. Each node receives a community assignment, so we say that $i$ goes from 1 to $n$. Stated another way, each node $i$ of our network receives an assignment $\tau_i$ to one of the $K$ communities. This model is called the a priori SBM because we use it when we have a realization $\vec\tau$ that we know ahead of time. In our social network example, for instance, $\tau_i$ would reflect that each student can attend one of two possible schools. For a single node $i$ that is in community $\ell$, where $\ell \in \{1, ..., K\}$, we write that $\tau_i = \ell$.

      We want to assign each node to one of some number of communities, which we're calling K.

      We can designate which community each node belongs to with a big list. The list will have the same length as the number of nodes. If there are three communities, for instance, and the first two nodes were in community 1, the third was in community 2, and the fourth was in community 3, then the list would look like this: [1, 1, 2, 3]

      This list doesn't have to be set in stone. In fact, we can draw our community values randomly bla bla bla probability distributions etc
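
      e.g. (a minimal sketch; the community probabilities are made up):

      ```python
      import numpy as np

      n = 10                  # number of nodes
      K = 3                   # number of communities
      pi = [0.5, 0.3, 0.2]    # probability of a node landing in each community

      rng = np.random.default_rng(0)
      # draw a community assignment for every node at random
      tau = rng.choice(np.arange(1, K + 1), size=n, p=pi)
      print(tau)  # one community label (1, 2, or 3) per node
      ```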

    6. school 1 or school 2. Our network has 100 nodes, and each node represents a single student. The edges of this network represent whether a pair of students are friends. Intuitively, if two students go to the same school, it might make sense to say that they have a higher chance of being friends than if they do not go to the same school. If we were to try to characterize this network using an ER network, we would run into a problem very similar to when we tried to capture the two cluster coin flip example with only a single coin. Intuitively, there must be a better way! The Stochastic Block Model, or SBM, captures this idea by assigning each of the $n$ nodes in the network to one of $K$ communities. A community is a group of

      I would add a layouts-type figure to show the schools visually here

    7. If we were to try to characterize this network using an ER network

      If we were to say that every single student has the same probability of being friends, like we would have to if we described this situation with an erdos-renyi model, we'd be wrong

    8. we would run into a problem very similar to when we tried to capture the two cluster coin flip example with only a single coin.

      delete if coin stuff is getting deleted

    9. Add something like:

      "You could view this setup as a network!. In this network, there would be 100 nodes, and each node would correspond to a particular student"

  10. May 2021
    1. So now we have a combined representation for our separate embeddings, but we have a new problem: our latent positions suddenly have way too many dimensions. In this example they have eight (the number of columns in our combined matrix), but remember that in general we’d have $m \times d$. This somewhat defeats the purpose of an embedding: we took a bunch of high-dimensional objects and turned them all into a single high-dimensional object. Big whoop. We can’t see what our combined embedding look like in euclidean space, unless we can somehow visualize $m \times d$ dimensional space (hint: we can’t). We’d like to just have $d$ dimensions - that was the whole point of using $d$ components for each of our Adjacency Spectral Embeddings in the first place!

      talk about rotational invariance

    1. __drawArrow__

      let's not use this __method__ syntax in class methods?

      that's usually reserved for magic/dunder methods which change stuff like what happens when you use operators on instances of the class

      https://www.tutorialsteacher.com/python/magic-methods-in-python

      if you want a method to be private, let's use the _method syntax, or if you really want it to be private, the __method syntax (the double-underscore at the beginning actually causes python to change the name of the method in the namespace)
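
      (a quick illustration of the difference, if useful:)

      ```python
      class Plotter:
          def _draw_arrow(self):        # "private by convention"
              return "single underscore"

          def __draw_arrow(self):       # name-mangled to _Plotter__draw_arrow
              return "double underscore"

      p = Plotter()
      print(p._draw_arrow())             # works; the underscore just signals "internal"
      print(p._Plotter__draw_arrow())    # the mangled name python actually stores
      # p.__draw_arrow()                 # AttributeError: no attribute '__draw_arrow'
      ```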

  11. Apr 2021
    1. reflect

      dont like this word, maybe just "we might want to say that..."

      also, I think could restructure the group of sentences, might make more sense to say that Alice has a lower probability of being friends with Bob than a random person, since Bob is unpopular

    2. def plot_lp(X, title="", ylab="Student"):

      I like the idea, but it looks like the plot is continuous rather than discrete. Again a big fan of having only 30 students (or less), and making lines under each row, that way the reader can think through this in terms of smaller numbers

    3. $\vec x_1 = \begin{bmatrix}1 \\ 0\end{bmatrix}$,

      I think it's a bit of a cognitive load on readers to see column vectors and then have to realize that those column vectors correspond to the row vectors of X

    4. Let’s assume, for instance,

      i'd put the "let's assume" part as the first sentence, remove the "for instance", and restructure to account for that, that way we begin with a scenario

    5. We write that $X \in \mathbb R^{n \times d}$, which means that it is a matrix with real values, $n$ rows, and $d$ columns.

      hopefully we also explain this notation earlier

    6. What is so special about this formulation of the SBM problem?

      latent positions are really, really important, and this feels kind of tucked away, I'd give this part its own heading, and multiple figures, and say something at the beginning of the section like, "and now we get to one of the most important ideas in network modeling: that there's a way to give the nodes of networks a location in the coordinate space that other machine learning algorithms use".

    7. $\mathbf a_{ii} = 0$.

      instead of statements like "a_{ii} = 0", in general, I'd say something like "there are zeroes along the diagonal"? seems more clear to non-advanced readers

    8. $\tau_i = \ell$ and $\tau_j = k$, that $\mathbf a_{ij} \sim Bern(b_{\ell k})$.

      not convinced that someone without a math background would understand this statement, I'd write it with words instead

    9. Say we have 300 students, and we know that each student goes to one of two possible schools.

      i'd put the figure directly below this sentence to illustrate

    10. In the first set of twenty coin flips, all of the coin flips are performed with the same coin. Stated another way, we have a single cluster, or a set of coin flips which are similar. On the other hand, in the second set of twenty coin flips, twenty of the coin flips are performed with a fair coin, and ten of the coin flips are performed with a different coin which is not fair. Here, we have two clusters of coin flips, those that occur with the first coin, and those that occur with the second coin. Since the first cluster of coin flips are with a fair coin, we expect that coin flips from the first cluster will not necessarily have an identical number of heads and tails, but at least a similar number of heads and tails. On the other hand, coin flips from the second cluster will tend to have more heads than tails. What does this example have to do with networks? In the above examples, the two sets of coin flips differ in the number of coins with different probabilities that we use for the example. The first example has only one unique coin, whereas the second example has two unique coins with different probabilities of heads or tails. If we were to assume that the second example had been performed with only a single coin when in reality it was performed with two different coins, we would be unable to capture that the second ten coin flips had a substantially different chance of landing on heads than the first ten coin flips. Just like coin flips can be performed with fundamentally different coins, the nodes of a network could also be fundamentally different. The way in which two nodes differ (or do not differ) sometimes holds value in determining the probability that an edge exists between them.

      this should all probably be slimmed down a lot, I read the whole thing out of willpower but my brain was telling me to skim it

    11. Basically, the approach is to look at each entry of $A$ which can take different values, and multiply the total number of possibilities by 2 for every element which can take different values

      i'd approach this kinda like the intro does it, using example of 2, then 4, etc
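
      e.g. the counts for small n could be shown directly (sketch):

      ```python
      from math import comb

      # number of possible simple undirected networks on n nodes:
      # each of the (n choose 2) possible edges is either present or absent
      for n in [2, 3, 4, 5]:
          print(n, 2 ** comb(n, 2))   # 2, 8, 64, 1024
      ```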

    12. If $A$ is $2 \times 2$, there are $\binom{2}{2} = 1$ unique entry of $A$, which takes one of 2 values. There are 2 possible ways that $A$ could look: $\begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$ or $\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ If $A$ is $3 \times 3$, there are $\binom{3}{2} = \frac{3 \times 2}{2} = 3$ unique entries of $A$, each of which takes one of 2 values. There are 8 possible ways that $A$ could look: $\begin{bmatrix}0&1&1\\1&0&1\\1&1&0\end{bmatrix}$ or $\begin{bmatrix}0&1&0\\1&0&1\\0&1&0\end{bmatrix}$ or $\begin{bmatrix}0&0&1\\0&0&1\\1&1&0\end{bmatrix}$ or $\begin{bmatrix}0&1&1\\1&0&0\\1&0&0\end{bmatrix}$ or $\begin{bmatrix}0&0&1\\0&0&0\\1&0&0\end{bmatrix}$ or $\begin{bmatrix}0&0&0\\0&0&1\\0&1&0\end{bmatrix}$ or $\begin{bmatrix}0&1&0\\1&0&0\\0&0&0\end{bmatrix}$ or $\begin{bmatrix}0&0&0\\0&0&0\\0&0&0\end{bmatrix}$

      I like this stuff

    13. n = 300
        xs_1 = np.random.normal(size=n)
        pi = 0.5
        ys = np.random.binomial(1, pi, size=n)
        xs_2 = np.random.normal(loc=ys*5)
        sex=["Male", "Female"]

      might actually rename variables and show this block of code
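
      e.g. something like this (a sketch of the same block with more descriptive names -- my reading, based on the surrounding text, is that ys is the sex indicator and xs_2 is the sex-dependent feature):

      ```python
      import numpy as np

      n_students = 300
      p_female = 0.5

      # a feature that doesn't depend on sex: same distribution for everyone
      feature_unrelated = np.random.normal(size=n_students)

      # 0 = male, 1 = female, assigned to each student with probability 1/2
      is_female = np.random.binomial(1, p_female, size=n_students)

      # a feature whose mean depends on sex: mean 0 for males, mean 5 for females
      feature_by_sex = np.random.normal(loc=is_female * 5)

      sex_labels = ["Male", "Female"]
      ```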

    14. it has clearly delineated communities, which are the vertices that comprise the obvious “squares” in the above adjacency matrix.

      it has distinct communities. Remember that the connections for each node is represented by a row of this heatmap. The first half of the rows have strong connections with the first half of the columns, so the students in the first school are likely close friends

      ^ I don't like this much, would like to see more simplified, but probably how I would describe it

    15. We notice this from the fact that there are more connections between people from school 1 than from school 2.

      unnecessary sentence (basically the same as the previous one, but reworded)

    16. =False)
        meta = pd.DataFrame(
            data = {"School": tau.reshape((n)).astype(int)}
        )
        ax=adjplot(A, meta=meta, color="School", palette="Blues")

      add legend that says which color belongs to which school

    17. meta = pd.DataFrame(
            data = {"School": tau.reshape((n)).astype(int)}
        )
        ax=adjplot(A, meta=meta, color="School", palette="Blues")

      this code is confusing for a reader who hasn't played with graspologic's adjplot section

      I'd break into two sections and hide

    18. 0.20.2.

      Now, let's turn this scenario into a network, which we can model with a Stochastic Block Model. Students can represent nodes, and their friendships can represent edges.

    19. $i$ is male, is a realization of a Gaussian random variable with mean 0 and variance 1, and if $i$ is female, is a realization of a Gaussian random variable with mean 5 and variance 1.

      sentence structure is confusing to me

    20. For instance, if we see a half of the nodes have a very high degree, and the rest of the nodes with a much lower degree, we can reasonably conclude the network might be more complex than can be described by the ER model.

      I'd replace this with "For instance, if we see that half the nodes have a ton of edges (meaning, they have a high degree), and half don't, we should probably use a more complicated model than an Erdos-Renyi"

    21. <hypothesis-highlight class="hypothesis-highlight">$\vec\tau$</hypothesis-highlight>

      rendering weird

    1. The goal of MASE is to embed the networks into a single space, with each point in that space representing a single node

      you want to separately identify homogeneity and heterogeneity

  12. Mar 2021
    1. let’s imagine we are tossing a coin 100 times, and we want to determine what the probability of the coin landing on heads is.

      Maybe start the note with this example, and then lead into "the traditional framework..." after? homl does this a lot, where the first thing you read will be an example or a thought experiment