272 Matching Annotations
  1. Mar 2021
    1. Tuning the weight noticeably improved our clustering. Below, you can see the difference between our embedding prior to tuning and our embedding after tuning.

      BIC or silhouette score both fine

      jovo prefers BIC >> take the k-means clusters and compute a BIC score under a model where each cluster has a spherical covariance matrix
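      something like this is what I mean (rough sketch; assumes sklearn's GaussianMixture with spherical covariances as the scoring model, and X standing in for our embedding):

      # pick the number of clusters by BIC under a spherical-covariance model
      import numpy as np
      from sklearn.mixture import GaussianMixture

      X = np.random.default_rng(0).normal(size=(300, 2))  # stand-in for the embedding
      bics = {}
      for k in range(1, 7):
          gm = GaussianMixture(n_components=k, covariance_type="spherical",
                               random_state=0).fit(X)
          bics[k] = gm.bic(X)  # lower BIC = better fit/complexity tradeoff
      best_k = min(bics, key=bics.get)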

      • a network is "vertices, edges, vertex attributes, edge attributes, graph attributes"
      • title: joint representation learning

      • make network be the tuple of all the things

      • clarify what a network is in the "what is a network" section

        • but a network can include graph, vertex, edge attributes. the network is the tuple of all of those things
      • figure showing the other options for joint embedding (jointly embed covariates/network with omni or MASE, embed them both separately, do case) would be helpful

    1. That is, the statistical network model indicates what, exactly, is random, and how we can characterize its behavior statistically.

      potentially could remove? but maybe fine

    1. For instance, in our social network example, we might only know whether two people are from the same school, and might not know whether they are in the same grade or share classes together, even though we would expect these facts to impact whether they might have a higher chance of being friends.

      run-on sentence

    2. unless you are looking at a graph in which the vertices are already arranged in an order which respects the community struucture.

      unless we're looking at a network that already has reasonably-ordered nodes

    3. ipython-input-11-8581568507e8>:8: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result. heatmap(A[[vtx_perm]] [:,vtx_perm])

      wot. this FutureWarning is leaking into the book's output; the indexing needs fixing so the warning doesn't show up
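      a sketch of the likely fix (assumes A and vtx_perm as in the book's snippet; the stand-in A below is just to make this runnable):

      import numpy as np
      from graspologic.plot import heatmap

      A = np.random.binomial(1, 0.3, size=(50, 50))  # stand-in adjacency matrix
      vtx_perm = np.random.permutation(A.shape[0])

      # A[[vtx_perm]][:, vtx_perm] triggers the FutureWarning; either of these doesn't:
      heatmap(A[vtx_perm][:, vtx_perm])        # permute rows, then columns
      heatmap(A[np.ix_(vtx_perm, vtx_perm)])   # same thing via an open mesh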

    4. The below graph shows the exact same set of adjacencies as-above, but wherein $\pmb A$ has had its vertices resorted randomly.

      The heatmap below is the same network, it just has its nodes reordered randomly

    5. Consider, for instance, a similar adjacency matrix to the graph plotted above, with the exact same realization, up to a permutation (reordering) of the vertices

      Think about a network that's exactly the same as the one above, except we've reordered the nodes

    6. Indeed, the block structure may only be apparent given a particular ordering of the vertices,

      If you think about an adjacency matrix, we don't actually need to order the nodes any particular way. If two nodes that aren't in the same group are right on top of each other in an adjacency matrix, we won't even see any block structure at all! This doesn't mean that our network isn't an SBM, it just means that the way we ordered our nodes doesn't let us see any communities.

    7. a block structure is visually discernable.

      we can actually see the blocks

      or, changing the whole thing, "a network might be an SBM even if we can't see any blocks when we visualize it"

    8. The block structure is clearly delineated by the first 50 vertices being from a single community, and the second 50 vertices being from a different community.

      dont know if we need this if we say that blocks in SBM = communities in network earlier

    9. These blocks are the apparent “subgraphs”, or square patterns, observed in the above graph.

      "blocks in the heatmap of stochastic block models represent the communities in our network. Each community could be also be considered its own network, separate from the others! This is called a subgraph"

    10. In the above simulation, we can clearly see an apparent 4-“block structure”, which describes the fact that the probability of an edge existing depends upon which of the 4 “blocks” the edge falls into

      This Stochastic Block Model has two groups: the first is .....

    11. # for simplicity, the simulation code generates samples wherein
      # vertices from the same community are ordered in the vertex set by
      # their community order.

      had to think about this for a sec to understand it

    12. the adjacency matrices describing a graph which has the $SBM_n(\vec \tau, \pmb B)$ distribution.

      generate and visualize a stochastic block model

    13. graph which has the $SBM_n(\vec \tau, \pmb B)$ distribution.

      a stochastic block model
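      for reference, a minimal graspologic sketch of generating and visualizing an SBM (the n and B values here are just placeholders):

      import numpy as np
      from graspologic.simulations import sbm
      from graspologic.plot import heatmap

      n = [50, 50]                   # 50 nodes in each of 2 communities
      B = np.array([[0.5, 0.1],
                    [0.1, 0.5]])     # within-community edges more likely
      A = sbm(n=n, p=B)              # sample one adjacency matrix
      heatmap(A, title="SBM simulation")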

    14. due to the symmetry of $\pmb B$, $A_{ji} | v_i = k, v_j = l \sim Bernoulli(b_{kl})$, for all $i,j \in 1,...,n$.

      "due to the symmetry of B, ..." is jargonerific

      I think we should just assume that anybody reading this will forget what most of the single letters/symbols mean essentially as soon as they read them

    15. if vertex $v_i$ is in community $k$ and vetex $v_j$ is in community $l$, then an edge $e_{ij}$ or $e_{ji}$ exists between $v_i$ and $v_j$ with probability $b_{kl} = b_{lk}$. Fomally, we wite that $\pmb A \sim SBM_n(\vec \tau, \pmb B)$ if $A_{ij} | v_i = k, v_j = l \sim Bernoulli(b_{kl})$,

      hella jargonerific

    16. Intuitionally, this would correspond to the graph in which each of the The m

      I like this, needs to be a real sentence (and probably be way bigger, the main thing we want to emphasize)

    17. Further, the matrix $\pmb B$ is supposed to be symmetric; that is, for any $b_{kl}$, it is always the case that $b_{k,l} = b_{lk}$ for all $k = 1,..., K$.

      also very jargonerific, "symmetric" is unclear

    18. π‘˜=1,...,𝐾k=1,...,Kk = 1,..., K tend to exceed off-diagonal entries π‘π‘˜π‘™bklb_{kl} where π‘˜β‰ π‘™kβ‰ lk \neq l and π‘˜,𝑙=1,...,𝐾k,l=1,...,Kk,l = 1,...,K

      too formal

    19. $\pmb B$ such that the diagonal entries $b_{kk}$

      a non-mathematician will read this and say, "what the hell is B again? what is a b_kk??"

    20. $\pmb B$ with entries $b_{kl}$ for $k, l = 1,..., K$ defines the probability of an edge existing between vertices which are in community $k$

      too formal

    21. For instance, in a social network in which the vertices are students and the edges define whether two students are friends, a vertex assignment vector might denote the school in which each student learns.

      probably move this analogy to the top and elaborate/expand it more, this kinda stuff is what we want I think

    22. vertex assignment vector has entries $\vec \tau_i$, where $i = 1, ..., n$, for each of the vertices in the graph. For a given vertex $v_i \in \mathcal V$, the corresponding vertex assignment $\vec \tau_i$ defines which of the $K$ communities in which $v_i$ is a member.

      too formal

    23. In this case, rather than having a single edge existence probability, each pair of communities has its own unique edge existence probability

      relate this to ER?

      "We can make our erdyos-renyi graph a little more complicated by... in order to..."

    24. 𝐺=(,)G=(V,E)G = (\mathcal V, \mathcal E) is an SBM with 𝑛nn vertices, each vertex 𝑣𝑖viv_i can take be a member of one (and only one) of 𝐾KK possible communities.

      introducing statistical notation way too fast

    25. The Stochastic Block Model, or SBM, is a random graph model which produces graphs in which edge existence probabilities depend upon which vertices a given edge is adjacent to

      probably start with an analogy: "imagine we had two groups of people..."

    26. Which is in close agreement to the true probability, $p = 0.3$

      "which is pretty close to the actual probability"

      I think in general we want to talk as if we're having a casual conversation with a friend

    27. Given a graph with an adjacency matrix $\pmb A^{(s)}$, we can also use graspologic to estimate the probability parameter of the $ER_n(p)$ model:

      If we have a network, we can use graspologic to estimate the chance that a given pair of nodes is connected
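      e.g., something along these lines (sketch; assumes graspologic's EREstimator exposes the fitted probability as p_ after .fit()):

      import numpy as np
      from graspologic.simulations import er_np
      from graspologic.models import EREstimator

      A = er_np(n=50, p=0.3)            # pretend this is our observed network
      er = EREstimator(directed=False, loops=False)
      er.fit(A)
      print(er.p_)                      # should land near the true p = 0.3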

    28. square is dark red if an edge is present, and white if no edge is present

      I'm thinking that for heatmaps in this book, especially unweighted, we maybe want them black-and-white with no colorbar?

      or just a binary colorbar, like I have set up in the covariates matrix code in the CASC subsection (see that subsection for matplotlib code that does that)

      was thinking of just PRing the ability to do a binary colorbar into graspologic too
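      a rough matplotlib version of the binary-colorbar idea (not the exact CASC-subsection code, just a sketch):

      import numpy as np
      import matplotlib.pyplot as plt
      from matplotlib.colors import ListedColormap

      A = np.random.binomial(1, 0.3, size=(50, 50))  # stand-in unweighted network
      im = plt.imshow(A, cmap=ListedColormap(["white", "black"]), vmin=0, vmax=1)
      cbar = plt.colorbar(im, ticks=[0.25, 0.75])    # midpoints of the two color bands
      cbar.set_ticklabels(["no edge", "edge"])
      plt.show()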

    29. In the simple simulation above, we sample a single, undirected, network with loops, with adjacency matrix $\pmb A^{(s)}$.

      "Here's an example of what an erdos-renyi network looks like once we make it from our model"

    30. n = 50  # number of vertices
      ps = 0.3  # probability of an edge existing is .3
      # sample a single adj. mtx from ER(50, .3)
      As = er_np(n=n, p=ps, directed=True, loops=True)
      # and plot it
      heatmap(As, title="ER(50, 0.3) Simulation")

      maybe this figure should be right in the beginning without code (using code cell hiding), and then the reader has an image in their head as they go?

      although it'd still be good to have this code visible somewhere

    31. The following python code can be used to generate and visualize a graph which is generated by the $ER_n(p)$ model. Here, we let $n = 50$ vertices, and the probability of an edge $p = .3$:

      graph > network

      "We'll use a network with 50 nodes, and each pair of nodes has a 30% probability of being connected"

    32. $$
      \begin{align*}
      \mathbb E[deg(v_i)] &= \mathbb E\left[\sum_{j \neq i} A_{ij}\right] \\
      &= \sum_{j \neq i} \mathbb E[A_{ij}] \;\;\;\;\textrm{Expectation of a finite sum is the sum of the expectations} \\
      &= (n-1) p
      \end{align*}
      $$
      Which follows by using the fact that all of the $n-1$ possible edges which are incident vertex $v_i$ have the same expected probability of occurence, $p$, governed by the parameter for the $ER_n(p)$ model. This tractability of theoretical results makes the $ER_n(p)$ an ideal candidate graph to study in describing properties of networks to be expected if the network is $ER_n(p)$. Similarly, we can easily invert the properties of $ER_n(p)$ networks, to study when a graph is not an $ER_n(p)$ random graph, and may merit more careful inferential tasks. On another front, when one wishes to devise new computational techniques and deduce the efficiency or effectiveness of a technique on a network with a given number of nodes and a given number of edges, and is not concerned with how efficient the technique is if the network displays other (potentially exploitable) properties, the $ER_n(p)$ model also makes a good candidate for analysis. This is particularly common when dealing with graphs which are known to be sparse; that is, $p$ is very small (usually, on the order or less than $1/n$).

      very jargonerific paragraph, had to reread it a few times
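      a quick numeric sanity check might help ground it, e.g. comparing the mean degree of one ER sample against the $(n-1)p$ claim:

      import numpy as np
      from graspologic.simulations import er_np

      n, p = 50, 0.3
      A = er_np(n=n, p=p)                 # undirected, no loops by default
      mean_degree = A.sum(axis=1).mean()  # average observed node degree
      print(mean_degree, (n - 1) * p)     # these should be close (~14.7)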

    33. This is particularly common when dealing with graphs which are known to be sparse; that is, $p$ is very small (usually, on the order or less than $1/n$).

      "network"

      sparse > jargonerific

    34. given number of nodes and a given number of edges, and is not concerned with how efficient the technique is if the network displays other (potentially exploitable) properties, the $ER_n(p)$ model also makes a good candidate for analysis

      run-on sentence

    35. Similarly, we can easily invert the properties of $ER_n(p)$ networks, to study when a graph is not an $ER_n(p)$ random graph, and may merit more careful inferential tasks. On

      not sure what this sentence means

    36. This tractability of theoretical results makes the $ER_n(p)$ an ideal candidate graph to study in describing properties of networks to be expected if the network is $ER_n(p)$.

      a little circular, if we know something is ER, then of course ER is a good candidate model to use

    37. $\mathbb E[deg(v_i)] = \mathbb E\left[\sum_{j \neq i} A_{ij}\right] = \sum_{j \neq i} \mathbb E[A_{ij}]$

      I think any time we use an equation, we should explain what it's doing in words above?

      also not sure if we need this equation? or at least needs motivation first

    38. $deg(v_i) = \sum_{j \neq i} A_{ij}$

      we're only summing across j for a fixed i, but written like this I think it reads as summing everything that isn't on the diagonal

      also maybe just delete? dunno if we need this sum

    39. “identical” clarifies that the edge probability $p$ (or the probability of no edge, $1-p$) is the same for all edges within the network.

      every pair of nodes in our network has the same probability of being connected

    40. the occurence or not occurence of other edges in the network does not impact the probability of occurence of a given edge

      edges don't affect each other

    41. generative model we select for our network will not be the true generative model that underlies our network

      using the word "generative model" twice sounds a little awkward

    42. The vertex assignment vector has entries $\vec \tau_i$, where $i = 1, …, n$, for each of the vertices in the graph. For a given vertex $v_i \in \mathcal V$, the corresponding vertex assignment $\vec \tau_i$ defines which of the $K$ communities in which $v_i$ is a member. For instance, in a social network in which the vertices are students and the edges define whether two students are friends, a vertex assignment vector might denote the school in which each student learns. The matrix $\pmb B$ with entries $b_{kl}$ for $k, l = 1,…, K$ defines the probability of an edge existing between vertices which are in community $k$ with vertices which are in community $l$. For instance, in the social network example, one might select $\pmb B$ such that the diagonal entries $b_{kk}$ for $k = 1,…, K$ tend to exceed off-diagonal entries $b_{kl}$ where $k \neq l$ and $k,l = 1,…,K$. Further, the matrix $\pmb B$ is supposed to be symmetric; that is, for any $b_{kl}$, it is always the case that $b_{k,l} = b_{lk}$ for all $k = 1,…, K$. Intuitionally, this would correspond to the graph in which each of the The matrix $\pmb B$ defines that if vertex $v_i$ is in community $k$ and vetex $v_j$ is in community $l$, then an edge $e_{ij}$ or $e_{ji}$ exists between $v_i$ and $v_j$ with probability $b_{kl}=b_{lk}$. Fomally, we wite that $\pmb A \sim SBM_n(\vec \tau, \pmb B)$ if $A_{ij} | v_i = k, v_j = l \sim Bernoulli(b_{kl})$, or equivalently due to the symmetry of $\pmb B$, $A_{ji} | v_i = k, v_j = l \sim Bernoulli(b_{kl})$, for all $i,j \in 1,…,n$.

      too mathy

    1. We need a graph and some covariates. To start off, we’ll make a pretty straightforward Stochastic Block Model with 1500 nodes and 3 communities.

      need to add: equations for sampling the data, then the algorithm bit, then a demonstration that it works when it's supposed to
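      a placeholder sketch for that setup (the B values and covariate construction here are made up, just to have something concrete to hang the equations on):

      import numpy as np
      from graspologic.simulations import sbm

      n = [500, 500, 500]                           # 1500 nodes, 3 communities
      B = np.full((3, 3), 0.02) + 0.1 * np.eye(3)   # denser within communities
      A, labels = sbm(n=n, p=B, return_labels=True)

      # toy covariates: a noisy one-hot indicator of each node's community
      X = 0.8 * np.eye(3)[labels] + np.random.uniform(0, 0.2, size=(1500, 3))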