from statistics.
little too paternalistic
communities
probabilities
Tuning the weight noticeably improved our clustering. Below, you can see the difference between our embedding prior to tuning and our embedding after tuning.
BIC or silhouette score both fine
jovo prefers BIC >> take kmeans clusters, fit a BIC score under a model where each cluster has a spherical covariance matrix
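a minimal sketch of that recipe, assuming sklearn (a spherical GMM's BIC standing in for "kmeans clusters + BIC under spherical covariance"; data is a made-up stand-in for an embedding):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    X = np.random.normal(size=(300, 2))  # stand-in for an embedding

    # fit a spherical-covariance mixture for each candidate number of
    # clusters, then keep the k with the lowest BIC
    bics = {}
    for k in range(2, 6):
        gmm = GaussianMixture(n_components=k, covariance_type="spherical").fit(X)
        bics[k] = gmm.bic(X)  # lower BIC = better fit/complexity tradeoff
    best_k = min(bics, key=bics.get)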
$XX^T_{i, j}$
need parentheses: $(XX^T)_{i, j}$
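for reference, a bare-bones version of the combination whose weight is being tuned (illustrative numpy only, not graspologic's implementation; toy A and X, and note the $(XX^T)_{i,j}$ term with the parentheses):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.binomial(1, 0.2, size=(100, 100)).astype(float)  # toy adjacency
    A = np.triu(A, 1) + np.triu(A, 1).T                      # symmetrize it
    X = rng.normal(size=(100, 3))                            # toy covariates

    a = 0.5                          # the weight being tuned
    combined = A + a * (X @ X.T)     # network term plus weighted covariate term
    _, vecs = np.linalg.eigh(combined)
    embedding = vecs[:, -2:]         # top eigenvectors as a simple joint embedding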
title: joint representation learning
make network be the tuple of all the things
clarify what a network is in the "what is a network" section
figure showing the other options for joint embedding (jointly embed covariates/network with omni or MASE, embed them both separately, do CASE) would be helpful; see figure 1 here: http://www.cis.jhu.edu/~parky/CEP-Publications/MCPTP-JOC2010.pdf
title="Visualization of the covariates", xticks=[], ylabel="Rows
add an xlabel, make the ylabel simpler?
data that matches the data near perfectly
had to think about what this meant
the information we did not get to observe
that information
vertices
nodes
That is, the statistical network model indicates what, exactly, is random, and how we can characterize its behavior statistically.
potentially could remove? but maybe fine
graphs
networks
we can still summarize that more complex model using a $p$ term without much work
had to think for a sec to figure out what this meant
positive aspects to the ER network’s simplicity
ways that the simplicity of an Erdos-Renyi network might be useful
If we do not know any patterns defining how people are or are not friends
If we don't know anything about the social community in our school,
For instance, in our social network example, we might only know whether two people are from the same school, and might not know whether they are in the same grade or share classes together, even though we would expect these facts to impact whether they might have a higher chance of being friends.
run-on sentence
ss you are looking at a graph in which the vertices are already arranged in an order which respects the community structure.
unless we're looking at a network that already has reasonably-ordered nodes
ipython-input-11-8581568507e8>:8: FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. In the future this will be interpreted as an array index, `arr[np.array(seq)]`, which will result either in an error or a different result. heatmap(A[[vtx_perm]] [:,vtx_perm])
this FutureWarning leaks into the book output; the indexing should be A[np.ix_(vtx_perm, vtx_perm)] or A[vtx_perm][:, vtx_perm] instead of A[[vtx_perm]][:, vtx_perm]
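a self-contained version of the fix, with a toy adjacency matrix standing in for the chapter's A:

    import numpy as np
    from graspologic.plot import heatmap

    A = np.random.binomial(1, 0.3, size=(50, 50))   # toy adjacency matrix
    vtx_perm = np.random.permutation(A.shape[0])

    # permute rows and columns consistently; np.ix_ builds the mesh so the
    # result is A[vtx_perm[i], vtx_perm[j]], with no FutureWarning
    heatmap(A[np.ix_(vtx_perm, vtx_perm)])
    # equivalently: heatmap(A[vtx_perm][:, vtx_perm])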
The graph has an identical block structure (up to the reordering of the vertices) as the preceding graph illustrated.
think you already said this
ll. Consider, for instance, a s
"For instance, look at this network:" (hidden code cell that makes the network) then description
The below graph shows the exact same set of adjacencies as above, but wherein $\pmb A$ has had its vertices resorted randomly.
The heatmap below is the same network, it just has its nodes reordered randomly
Consider, for instance, a similar adjacency matrix to the graph plotted above, with the exact same realization, up to a permutation (reordering) of the vertice
Think about a network that's exactly the same as the one above, except we've reordered the nodes
Indeed, the block structure may only be apparent given a particular ordering of the vertices,
If you think about an adjacency matrix, we don't actually need to order the nodes any particular way. If two nodes that aren't in the same group are right on top of each other in an adjacency matrix, we won't even see any block structure at all! This doesn't mean that our network isn't an SBM, it just means that the way we ordered our nodes doesn't let us see any communities.
may
jargonerific
a block structure is visually discernable.
we can actually see the blocks
or, changing the whole thing, "a network might be an SBM even if we can't see any blocks when we visualize it"
The block structure is clearly delineated by the first $50$ vertices being from a single community, and the second $50$ vertices being from a different community.
don't know if we need this if we say that blocks in SBM = communities in network earlier
These blocks are the apparent “subgraphs”, or square patterns, observed in the above graph.
"blocks in the heatmap of stochastic block models represent the communities in our network. Each community could be also be considered its own network, separate from the others! This is called a subgraph"
n the above simulation, we can clearly see an apparent $4$-“block structure”, which describes the fact that the probability of an edge existing depends upon which of the $4$ “blocks” the edge falls into
This Stochastic Block Model has two groups: the first is .....
In the above simulation,
"in this heatmap"
# for simplicity, the simulation code generates samples wherein
# vertices from the same community are ordered in the vertex set by
# their community order.
had to think about this for a sec to understand it
the adjacency matrices describing a graph which has the $SBM_n(\vec \tau, \pmb B)$ distribution.
generate and visualize a stochastic block model
graph which has the $SBM_n(\vec \tau, \pmb B)$ distribution.
a stochastic block model
o the symmetry of $\pmb B$, $A_{ji} | v_i = k, v_j = l \sim Bernoulli(b_{kl})$, for all $i, j \in 1, ..., n$.
"due to the symmetry of B, ..." is jargonerific
I think we should just assume that anybody reading this will forget what most of the single letters/symbols mean essentially as soon as they read them
if vertex $v_i$ is in community $k$ and vertex $v_j$ is in community $l$, then an edge $e_{ij}$ or $e_{ji}$ exists between $v_i$ and $v_j$ with probability $b_{kl} = b_{lk}$. Formally, we write that $\pmb A \sim SBM_n(\vec \tau, \pmb B)$ if $A_{ij} | v_i = k, v_j = l \sim Bernoulli(b_{kl})$,
hella jargonerific
ntuitionally, this would correspond to the graph in which each of the The m
I like this, needs to be a real sentence (and probably be way bigger, the main thing we want to emphasize)
Further, the matrix $\pmb B$ is supposed to be symmetric; that is, for any $b_{kl}$, it is always the case that $b_{kl} = b_{lk}$ for all $k = 1, ..., K$.
also very jargonerific, "symmetric" is unclear
$k = 1, ..., K$ tend to exceed off-diagonal entries $b_{kl}$ where $k \neq l$ and $k, l = 1, ..., K$
too formal
$\pmb B$ such that the diagonal entries $b_{kk}$
a non-mathematician will read this and say, "what the hell is B again? what is a b_kk??"
$\pmb B$ with entries $b_{kl}$ for $k, l = 1, ..., K$ defines the probability of an edge existing between vertices which are in community $k$
too formal
For instance, in a social network in which the vertices are students and the edges define whether two students are friends, a vertex assignment vector might denote the school in which each student learns.
probably move this analogy to the top and elaborate/expand it more, this kinda stuff is what we want I think
ex assignment vector has entries $\vec \tau_i$, where $i = 1, ..., n$, for each of the vertices in the graph. For a given vertex $v_i \in \mathcal V$, the corresponding vertex assignment $\vec \tau_i$ defines which of the $K$ communities in which $v_i$ is a member.
too formal
n this case, rather than having a single edge existence probability, each pair of communities has its own unique edge existence probabili
relate this to ER?
"We can make our erdyos-renyi graph a little more complicated by... in order to..."
$G = (\mathcal V, \mathcal E)$ is an SBM with $n$ vertices, each vertex $v_i$ can be a member of one (and only one) of $K$ possible communities.
introducing statistical notation way too fast
The Stochastic Block Model, or SBM, is a random graph model which produces graphs in which edge existence probabilities depend upon which vertices a given edge is adjacent to
probably start with an analogy: "imagine we had two groups of people..."
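a tiny sample could back the analogy right away; made-up block probabilities, using graspologic's sbm:

    import numpy as np
    from graspologic.simulations import sbm

    # two groups of 50 students; friendships are more likely within a group
    B = np.array([[0.6, 0.1],
                  [0.1, 0.6]])   # made-up block probabilities
    A = sbm(n=[50, 50], p=B)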
l
don't know what 'l' stands for
Which is in close agreement with the true probability, $p = 0.3$
"which is pretty close to the actual probability"
I think in general we want to talk as if we're having a casual conversation with a friend
Given a graph with an adjacency matrix $\pmb A^{(s)}$, we can also use graspologic to estimate the probability parameter of the $ER_n(p)$ model:
If we have a network, we can use graspologic to estimate the chance that a given pair of nodes is connected
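roughly like this, if I have the graspologic API right (EREstimator and its fitted p_ attribute):

    from graspologic.simulations import er_np
    from graspologic.models import EREstimator

    A = er_np(n=50, p=0.3)                    # sample a network from ER(50, 0.3)
    er = EREstimator(directed=False, loops=False)
    er.fit(A)
    print(er.p_)                              # estimated edge probability, close to 0.3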
uare is dark red if an edge is present, and white if no edge is pr
I'm thinking that for heatmaps in this book, especially unweighted, we maybe want them black-and-white with no colorbar?
or just a binary colorbar, like I have set up in the covariates matrix code in the CASC subsection (see that subsection for matplotlib code that does that)
was thinking of just PRing the ability to do a binary colorbar into graspologic too
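the matplotlib side of a binary colorbar is short enough to sketch here (generic matplotlib with a toy matrix, not the CASC subsection's exact code):

    import numpy as np
    import matplotlib.pyplot as plt
    from matplotlib.colors import ListedColormap

    A = np.random.binomial(1, 0.3, size=(50, 50))   # toy unweighted adjacency

    cmap = ListedColormap(["white", "black"])       # two colors only
    fig, ax = plt.subplots()
    im = ax.imshow(A, cmap=cmap, vmin=0, vmax=1)
    cbar = fig.colorbar(im, ticks=[0.25, 0.75])     # one tick per color band
    cbar.set_ticklabels(["no edge", "edge"])
    plt.show()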
n the simple simulation above, we sample a single, undirected, network with loops, with adjacency matrix $\pmb A^{(s)}$.
"Here's an example of what an erdos-renyi network looks like once we make it from our model"
n = 50  # number of vertices
ps = 0.3  # probability of an edge existing is .3
# sample a single adj. mtx from ER(50, .3)
As = er_np(n=n, p=ps, directed=True, loops=True)
# and plot it
heatmap(As, title="ER(50, 0.3) Simulation")
maybe this figure should be right in the beginning without code (using code cell hiding), and then the reader has an image in their head as they go?
although it'd still be good to have this code visible somewhere
single adj. mtx from ER(50, .3)
adj. mtx > adjacency matrix
don't think we should use abbreviations if we can help it
The following python code can be used to generate and visualize a graph which is generated by the $ER_n(p)$ model. Here, we let $n = 50$ vertices, and the probability of an edge $p = .3$:
graph > network
"We'll use a network with 50 nodes, and each pair of nodes has a 30% probability of being connected"
\begin{align*}
\mathbb E[deg(v_i)] &= \mathbb E\left[\sum_{j \neq i} A_{ij}\right] \\
&= \sum_{j \neq i} \mathbb E[A_{ij}]\;\;\;\;\textrm{Expectation of a finite sum is the sum of the expectations} \\
&= (n-1) p
\end{align*}
Which follows by using the fact that all of the $n-1$ possible edges which are incident vertex $v_i$ have the same expected probability of occurrence, $p$, governed by the parameter for the $ER_n(p)$ model. This tractability of theoretical results makes the $ER_n(p)$ an ideal candidate graph to study in describing properties of networks to be expected if the network is $ER_n(p)$. Similarly, we can easily invert the properties of $ER_n(p)$ networks, to study when a graph is not an $ER_n(p)$ random graph, and may merit more careful inferential tasks. On another front, when one wishes to devise new computational techniques and deduce the efficiency or effectiveness of a technique on a network with a given number of nodes and a given number of edges, and is not concerned with how efficient the technique is if the network displays other (potentially exploitable) properties, the $ER_n(p)$ model also makes a good candidate for analysis. This is particularly common when dealing with graphs which are known to be sparse; that is, $p$ is very small (usually, on the order of or less than $1/n$).
very jargonerific paragraph, had to reread it a few times
This is particularly common when dealing with graphs which are known to be sparse; that is, $p$ is very small (usually, on the order of or less than $1/n$).
"network"
sparse > jargonerific
given number of nodes and a given number of edges, and is not concerned with how efficient the technique is if the network displays other (potentially exploitable) properties, the $ER_n(p)$ model also makes a good candidate for analysis
run-on sentence
deduce the efficiency or effectiveness of a technique on a networ
jargonerific
larly, we can easily invert the properties of $ER_n(p)$ networks, to study when a graph is not an $ER_n(p)$ random graph, and may merit more careful inferential tasks. On
not sure what this sentence means
This tractability of theoretical results makes the $ER_n(p)$ an ideal candidate graph to study in describing properties of networks to be expected if the network is $ER_n(p)$.
a little circular, if we know something is ER, then of course ER is a good candidate model to use
incident vertex $v_i$
jargonerific
$ER_n(p)$
write out
$\mathbb E[deg(v_i)] = \mathbb E\left[\sum_{j \neq i} A_{ij}\right] = \sum_{j \neq i} \mathbb E[A_{ij}]$
I think any time we use an equation, we should explain what it's doing in words above?
also not sure if we need this equation? or at least needs motivation first
a trivial exercise:
"pretty simple"
the phrase "trivial exercise" I think is pretty specific to mathematicians
$deg(v_i) = \sum_{j \neq i} A_{ij}$
only summing across i or j, I think this implies summing everything that isn't diagonal
also maybe just delete? dunno if we need this sum
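if the sum stays, a quick empirical check might motivate it better than prose (graspologic's er_np; expected degree is $(n-1)p = 14.7$ here):

    import numpy as np
    from graspologic.simulations import er_np

    n, p = 50, 0.3
    A = er_np(n=n, p=p)                       # undirected, no loops by default
    print(A.sum(axis=1).mean(), (n - 1) * p)  # observed vs expected degree, ~14.7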
“identical” clarifies that the edge probability $p$ (or the probability of no edge, $1-p$) is the same for all edges within the network.
every pair of nodes in our network has the same probability of being connected
the occurrence or non-occurrence of other edges in the network does not impact the probability of occurrence of a given edge
edges don't affect each other
(including which vertices an edge is incident, other edges within the graph, nor other factors)
unclear sentence
vertices
do we want to say "nodes" or "vertices"? I think we should use the same word throughout the book. I vote "nodes".
we may lack obvious things such as knowing
we might not know
generative model we select for our network will not be the true generative model that underlies our network
using the word "generative model" twice sounds a little awkward
The vertex assignment vector has entries $\vec \tau_i$, where $i = 1, …, n$, for each of the vertices in the graph. For a given vertex $v_i \in \mathcal V$, the corresponding vertex assignment $\vec \tau_i$ defines which of the $K$ communities in which $v_i$ is a member. For instance, in a social network in which the vertices are students and the edges define whether two students are friends, a vertex assignment vector might denote the school in which each student learns. The matrix $\pmb B$ with entries $b_{kl}$ for $k, l = 1,…, K$ defines the probability of an edge existing between vertices which are in community $k$ with vertices which are in community $l$. For instance, in the social network example, one might select $\pmb B$ such that the diagonal entries $b_{kk}$ for $k = 1,…, K$ tend to exceed off-diagonal entries $b_{kl}$ where $k \neq l$ and $k,l = 1,…,K$. Further, the matrix $\pmb B$ is supposed to be symmetric; that is, for any $b_{kl}$, it is always the case that $b_{kl} = b_{lk}$ for all $k = 1,…, K$. Intuitionally, this would correspond to the graph in which each of the The matrix $\pmb B$ defines that if vertex $v_i$ is in community $k$ and vertex $v_j$ is in community $l$, then an edge $e_{ij}$ or $e_{ji}$ exists between $v_i$ and $v_j$ with probability $b_{kl}=b_{lk}$. Formally, we write that $\pmb A \sim SBM_n(\vec \tau, \pmb B)$ if $A_{ij} | v_i = k, v_j = l \sim Bernoulli(b_{kl})$, or equivalently due to the symmetry of $\pmb B$, $A_{ji} | v_i = k, v_j = l \sim Bernoulli(b_{kl})$, for all $i,j \in 1,…,n$.
too mathy
We need a graph and some covariates. To start off, we’ll make a pretty straightforward Stochastic Block Model with 1500 nodes and 3 communities.
need to add equations for sampling the data, then the algorithm bit, then demonstrate it works when it's supposed to work
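for the "demonstrate it works" part, a starting point could be (block probabilities made up; graspologic's sbm and heatmap):

    import numpy as np
    from graspologic.simulations import sbm
    from graspologic.plot import heatmap

    # 3 communities of 500 nodes each; B[k, l] is the probability of an edge
    # between a node in community k and a node in community l
    B = np.array([[0.8, 0.2, 0.2],
                  [0.2, 0.8, 0.2],
                  [0.2, 0.2, 0.8]])   # made-up block probabilities
    A = sbm(n=[500, 500, 500], p=B)
    heatmap(A, title="SBM with 3 communities")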