heatmap
range -1 +1
heatmap
range -1 +1
Since most of these metrics already exist in networkx, you'll just pull from there. You can check the networkx documentation for details.
use graspologic
community
estimated community
features
these features
for reference, gives an indication of how clustered the network is, and works by counting the proportion of triangles out of all edges that share a node.
refer to previous definition
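If it helps to make this concrete, networkx exposes this statistic directly; a minimal sketch with a placeholder graph (not the book's network):

import networkx as nx

G = nx.les_miserables_graph()        # placeholder network, just for illustration
print(nx.transitivity(G))            # proportion of closed triangles among connected triples
print(nx.average_clustering(G))      # mean of the per-node clustering coefficients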
triangles
clarify earlier that we always mean closed after
statistics
for the 4 statistics we show
$10^{350}$
cite # atoms in universe, number of chess games, number of go games on a standard go board
How do you define a distance between one node and another node?
How do you divide a network by the number 6
largest
mention strongly and weakly connected (in a sentence)
(SI, BK), (BK, MH), (MH, Q), (MH, BX), (Q, BX)
use 1 letter acronyms for each borough
Island
,
$A$
for undirected graphs
figure
move MH
see
change layout so they are not linear
degree
do directed case too
matrix
do directed case too
$degree(i) = \sum_{j = 1}^n a_{ij} = \sum_{j : a_{ij} = 1} a_{ij} + \sum_{j : a_{ij} = 0} a_{ij} = \sum_{j : a_{ij} = 1} 1 + 0$
remove
adjacency
weighted
$j$
for example
matrix
hollow
trite
why?
In the context of this book, you will usually only worry about the undirected case, or when the presence of an arrow implies that the other direction exists, too. A network is undirected if an edge between node $i$ and node $j$ implies that node $j$ is also connected to node $i$. For this reason, you will usually omit the arrows entirely, like you show below:
we
red
super not salient
you look
we see
The
An Adjacency Matrix not The Adjacency Matrix
Matrix
there are n! different adjacency matrices, and we are looking at 1 of them.
Laplacian
mention directed graphs
and mention all the ones in graspologic
This lets you make inferences with $L^{normalized}$ that you couldn't make with $L$.
true?
code
there should be a function in graspologic to convert an adjacency to a laplacian and back.
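A minimal sketch of that conversion, assuming graspologic's to_laplacian utility and a placeholder adjacency matrix; the back-conversion shown works for the normalized ("DAD") form as long as you keep the degrees around:

import numpy as np
from graspologic.utils import to_laplacian

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]])                      # placeholder symmetric, hollow adjacency matrix

L_dad = to_laplacian(A, form="DAD")            # D^{-1/2} A D^{-1/2}
D_half = np.diag(np.sqrt(A.sum(axis=0)))       # D^{1/2}, built from the degrees
A_back = D_half @ L_dad @ D_half               # recovers A, since D^{1/2} L D^{1/2} = A
print(np.allclose(A, A_back))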
below
?
AttributeError Traceback (most recent call last)
fix
4.1.2. The Incidence Matrix

Instead of having values in a symmetric matrix represent possible edges, like with the Adjacency Matrix, we could have rows represent nodes and columns represent edges. This is called the Incidence Matrix, and it's useful to know about - although it won't appear too much in this book. If there are $n$ nodes and $m$ edges, you make an $n \times m$ matrix. Then, to determine whether a node is a member of a given edge, you'd go to that node's row and the edge's column. If the entry is nonzero ($1$ if the network is unweighted), then the node is a member of that edge, and if there's a $0$, the node is not a member of that edge. You can see the incidence matrix for our network below. Notice that with incidence plots, edges are (generally arbitrarily) assigned indices as well as nodes.

from networkx.linalg.graphmatrix import incidence_matrix

cmap = sns.color_palette("Purples", n_colors=2)
I = incidence_matrix(nx.Graph(A)).toarray().astype(int)

fig, axs = plt.subplots(1, 2, figsize=(12, 6))
plot = sns.heatmap(I, annot=True, linewidths=.1, cmap=cmap, cbar=False,
                   xticklabels=True, yticklabels=True, ax=axs[0]);
plot.set_xlabel("Edges")
plot.set_ylabel("Nodes")
plot.set_title("Incidence matrix", fontsize=18)

ax2 = nx.draw_networkx(G, with_labels=True, node_color="tab:purple", pos=pos,
                       font_size=10, font_color="whitesmoke", arrows=True,
                       edge_color="black", width=1, ax=axs[1])
ax2 = plt.gca()
ax2.text(.24, 0.2, s="Edge 1", color='black', fontsize=11, rotation=65)
ax2.text(.45, 0.01, s="Edge 0", color='black', fontsize=11)
ax2.set_title("Layout plot", fontsize=18)
sns.despine(ax=ax2, left=True, bottom=True)

When networks are large, incidence matrices tend to be extremely sparse - meaning, their values are mostly 0's. This is because each column must have exactly two nonzero values along its rows: one value for the first node its edge is connected to, and another for the second. Because of this, incidence matrices are usually represented in Python computationally as scipy's sparse matrices rather than as numpy arrays, since this data type is much better-suited for matrices which contain mostly zeroes. You can also add orientation to incidence matrices, even in undirected networks, which we'll discuss next.

4.1.3. The Oriented Incidence Matrix

The oriented incidence matrix is extremely similar to the normal incidence matrix, except that you assign a direction or orientation to each edge: you define one of its nodes as being the head node, and the other as being the tail. For undirected networks, you can assign directionality arbitrarily. Then, for the column in the incidence matrix corresponding to a given edge, the tail node has a value of $-1$, and the head node has a value of $1$. Nodes who aren't a member of a particular edge are still assigned values of $0$. We'll give the oriented incidence matrix the name $N$.
from networkx.linalg.graphmatrix import incidence_matrix

cmap = sns.color_palette("Purples", n_colors=3)
N = incidence_matrix(nx.Graph(A), oriented=True).toarray().astype(int)

fig, axs = plt.subplots(1, 2, figsize=(12, 6))
plot = sns.heatmap(N, annot=True, linewidths=.1, cmap=cmap, cbar=False,
                   xticklabels=True, yticklabels=True, ax=axs[0]);
plot.set_xlabel("Edges")
plot.set_ylabel("Nodes")
plot.set_title("Oriented Incidence matrix $N$", fontsize=18)
plot.annotate("Tail Node", (.05, .95), color='black', fontsize=11)
plot.annotate("Head Node", (.05, 1.95), color='white', fontsize=11)

ax2 = nx.draw_networkx(G, with_labels=True, node_color="tab:purple", pos=pos,
                       font_size=10, font_color="whitesmoke", arrows=True,
                       edge_color="black", width=1, ax=axs[1])
ax2 = plt.gca()
ax2.text(.24, 0.2, s="Edge 1", color='black', fontsize=11, rotation=65)
ax2.text(.45, 0.01, s="Edge 0", color='black', fontsize=11)
ax2.set_title("Layout plot", fontsize=18)
sns.despine(ax=ax2, left=True, bottom=True)
ax2.text(-.1, -.05, s="Tail Node", color='black', fontsize=11)
ax2.text(.9, -.05, s="Head Node", color='black', fontsize=11)

Although we won't use incidence matrices, oriented or otherwise, in this book too much, we introduced them because there's a deep connection between incidence matrices, adjacency matrices, and a matrix representation that we haven't introduced yet called the Laplacian. Before we can explore that connection, we'll discuss one more representation: the degree matrix.
put in appendix
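One way to preview the deep connection mentioned at the end of that passage: for an unweighted, undirected network, the oriented incidence matrix satisfies $N N^\top = D - A$, which is exactly the (unnormalized) Laplacian. A minimal sketch, assuming only networkx and a placeholder graph:

import numpy as np
import networkx as nx

G = nx.erdos_renyi_graph(6, 0.5, seed=1)              # placeholder network for illustration
N = nx.incidence_matrix(G, oriented=True).toarray()   # each edge column: tail -1, head +1
L = nx.laplacian_matrix(G).toarray()                  # D - A
print(np.allclose(N @ N.T, L))                        # True: N N^T recovers the Laplacian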
ax2.text(0, 0.2, s="Nodes 0 and 2 \nare connected", color='black', fontsize=11, rotation=63)
add nodes 0 and 1 are connected
The most
A
1.1.2. Viewing Networks Statistically
Network machine learning v. network data science
1.1. What Is A Network?
What is Network Machine Learning
features
give example, but then show something else
For instance, each node might have a set of features attached to it: extra information that comes in the form of a table. For instance
2 for instances
refer to this in preface, don't put it here
Acknowledgements
add all the other crap he included
models
i wouldn't use the word 'model' in this book
The theoretical underpinnings of the techniques described in the book
i'd put in appendix
network data science
maybe make a venn diagram including networks, data science, and machine learning
Network Machine Learning and You
start with something awesome that happened in the world because of network data science?
a statistical object
avoid 'statistics' everywhere
This book assumes you know next to nothing about how networks can be viewed as a statistical object
about how you can learn from network data.
not 'viewed as a stats object'
With the nodes as employees, edges as team co-assignment between employees, and node covariates as employee performance, you can isolate groups within your company which are over or underperforming employee expectations.
move to end before brains
So
delete 'so'
Representations of
dos and donts
, but are often more generalizable
remove
Connection
provide intuition or move to appendix
$n$ nodes
permutation invariant
this book
first in sentence
2
1
without
so we estimate them?
$\rho$
let's mention Rho too
5.5.3. Correlated Network Models
put time-series after this
$R^{(1)}$ and
let these be the identity matrix. i think you can just QR these, and then inverse QR R^3
score matrices
clarify that they will be dfeind below
Let's try this first by looking at the latent position matrices for $\mathbf A^{(1)}$ and $\mathbf A^{(2)}$ from the random networks for Monday and Tuesday:
they are implicit and not necessarily the same, rather, only X'X is the same.
π
M
$P = XX^\top$!
"!"
Pseudoinverse
go end to end
array([0.74745577, 0.49019859])
too many sig digs
If you think about it, however, you can think of this adjacency vector as kind of an estimate for edge probabilities. Say you sample a node's adjacency vector from an RDPG, then you sample again, and again. Averaging all of your samples will get you closer and closer to the actual edge connection probabilities.
not quite
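To make the repeated-sampling idea concrete, here is a minimal sketch using graspologic's rdpg simulator; the latent positions are made up for illustration:

import numpy as np
from graspologic.simulations import rdpg

X = np.array([[0.9, 0.1],
              [0.6, 0.4],
              [0.3, 0.7]])                       # made-up latent positions
samples = np.stack([rdpg(X) for _ in range(2000)])
print(np.round(samples.mean(axis=0), 2))         # off-diagonal entries approach P = X X^T
print(np.round(X @ X.T, 2))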
The average values were .775 and .149 - so, pretty close!
say we computed
took a long time
computation
ER
ER_n(p) or ER random network every time
their paper is called "On a two-truths phenomenon in spectral graph clustering" - it's an interesting read for the curious)
up
- in 2018 -
,
Adjacency
combine with the below
useful information
careful
them
explicitly refer to the figure
Heck
goes earlier
Laplacian
never write 'the laplacian' always write "normalize" (etc) Laplacian
two
is this a rank 2 matrix?
Stochastic Block Model
no caps
Spectral Embedding
no caps
$\lambda_1 \vec u_1 \vec u_1^\top + \lambda_2 \vec u_2 \vec u_2^\top + ... + \lambda_n \vec u_n \vec u_n^\top$
confusing
-
close
in MSE
ninety-degree
maybe not so correct, eg, 90 degrees in high dimensions does not specify the other angles?
Singular Value Decomposition
lower case
and another which rotates them bac
i wouldn't say this
\begin{align*} \begin{bmatrix} x_{11} & & & " \\ & x_{22} & & \\ & & \ddots & \\ " & & & x_{nn} \end{bmatrix} \end{align*}
makes me think this is diagonal
you'll
the alg
submatrices
not them
break a single matrix
decompose a matrix
or factorize?
default
for undirected graphs
Laplacians
such a thing as directed laplacians
The
comes out of nowhere?
started
present tense
The
make more high level
5.2.7. Network models for networks which aren't simple
make part of ER section
whether youβre using a neural network, or a decision tree, or whether your goal is to classify observations or to predict a value using regression -
those 'or's are not parallel
machine learning
CS and physics
6.3.1. The Coin Flip Example
starred? maybe mention MLE in section title?
$p^*$
ptilde
two plots look almost nothing alike
i wouldn't say that
$\mathbb P_\theta(\mathbf A = A) = \mathbb P(\mathbf a_{11} = a_{11}, \mathbf a_{12} = a_{12}, ..., \mathbf a_{nn} = a_{nn}) = \prod_{i,j} \mathbb P(\mathbf a_{ij} = a_{ij})$
don't forget punctuation
(in network statistics, often just one)
not quite
we can never hope to understand the true distribution of $\mathbf A$.
strong
$\langle \vec x - \vec y, \vec x - \vec y \rangle$
remove
With $\vec x$ defined as above in the sum example, we would have that: $\prod_{i = 1}^3 x_i = 6.12$. If we were to use $\mathcal I = \{1,3\}$, then: $\prod_{i \in \mathcal I} x_i = 3.4$
redundant
ted to multiply all the elements of $\vec x$, we would write: $\prod_{i = 1}^d x_i = x_1 \times x_2 \times ... \times x_d$, where $\times$ is just multiplication like you are probably used to. Again, we have the exact same indexing conventions, where: $\prod_{i \in [d]} x_i = \prod_{i = 1}^d x_i = x_1 \times x_2 \times ... \times x_d$
redundant
In the general case, for a set $\mathcal S$, we would say that $S \in \mathcal S^{r \times c}$ if (think this through!) for any $i \in [r]$ and any $j \in [c]$, $s_{ij} \in \mathcal S$
overkill?
. We Like previously, there are two types of RDPGs: one in which $\mathbf X$ is treated as known, and another in which $\mathbf X$ is treated as unknown.
delete
Symbolically
this is just 1 word.
such
such as
300
be consistent. let's stick with 100 students
, since they are undirected; if we relax the requirement of undirectedness (and allow directed networks) $B$ no longer need be symmetric.
delete
simple
undirected
Imagine that we are flipping a fair single coin. A fair coin is a coin in which the probability of seeing either a heads or a tails on a coin flip is $\frac{1}{2}$. Let's imagine we flip the coin $20$ times, and we see $10$ heads and $10$ tails. What would happen if we were to flip $2$ coins, which had a different probability of seeing heads or tails? Imagine that we flip each coin 10 times. The first 10 flips are with a fair coin, and we might see an outcome of five heads and five tails. On the other hand, the second ten flips are not with a fair coin, but with a coin that has a $\frac{4}{5}$ probability to land on heads, and a $\frac{1}{5}$ probability of landing on tails. In the second set of $10$ flips, we might see an outcome of nine heads and one tails. In the first set of 20 coin flips, all of the coin flips are performed with the same coin. Stated another way, we have a single group, or a set of coin flips which are similar. On the other hand, in the second set of twenty coin flips, ten of the coin flips are performed with a fair coin, and ten of the coin flips are performed with a different coin which is not fair. Here, we have two clusters of coin flips, those that occur with the first coin, and those that occur with the second coin. Since the first cluster of coin flips are with a fair coin, we expect that coin flips from the first cluster will not necessarily have an identical number of heads and tails, but at least a similar number of heads and tails. On the other hand, coin flips from the second cluster will tend to have more heads than tails.
but a simplified version of this after the description.
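A simplified version of this example, as suggested, might look like the following sketch (the seed and flip counts are arbitrary):

import numpy as np

rng = np.random.default_rng(0)
cluster_1 = rng.binomial(1, 1/2, size=10)    # ten flips of the fair coin
cluster_2 = rng.binomial(1, 4/5, size=10)    # ten flips of the biased coin
print(cluster_1.sum(), cluster_2.sum())      # heads counts; the second tends to be larger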
happens
but below equation on 1 line
ER network
ER random network
nework
network
Also, we let $\mathbf a_{ii} = 0$, which means that all self-loops are always unconnected
Also, we assume it is not possible for anyone to be friends with themselves, which means that a_ii = NaN for all i.
When $i > j$, we allow $\mathbf a_{ij} = \mathbf a_{ji}$. This means that the connections across the diagonal of the adjacency matrix are all equal, which means that we have built-in the property of undirectedness into our networks
too complicated.
We assume here that edges are undirected meaning that if there is an edge from i to j then there is also an edge from j to i
Erdös Rényi network
no such thing as an ER network
The Erdös Rényi model formalizes this relatively simple model with a single parameter:
change something, either this sentence or the table, iid belongs somewhere
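For reference, sampling from this model is a one-liner in graspologic; a minimal sketch (the values of n and p here are arbitrary):

from graspologic.simulations import er_np

A = er_np(n=50, p=0.3)    # one sample from ER_n(p) with 50 nodes and edge probability 0.3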
model
use a different word
If $A$ and $A''$ are members of different equivalence classes; that is, $A \in E$ and $A'' \in E'$ where $E, E'$ are equivalence classes, then $\mathbb P_\theta(A) \neq \mathbb P_\theta(A'')$.
delete
($A$
otherwise, ...
$E$
use different notation
For the case of one observed network $A$, an estimate of $\theta$ (referred to as $\hat\theta$) would simply be for $\hat\theta$ to have a $1$ in the entry corresponding to our observed network, and a $0$ everywhere else. Inferentially, this would imply that the network-valued random variable $\mathbf A$ which governs realizations $A$ is deterministic, even if this is not the case.
MLE
variaable
typo
parameters
distribution
,
above figure goes before section header
To
directed with loops for simplicity
think
first think about nodes
first
first edge?
edges are not ordered
Fundamentally
and it borrows strength
Now, the only question is how to actually pull the separate latent positions for each network from this matrix.
clarify
big
do we want to use 4 different symbols for the 4 different numbers?
rows
blocks
you don't have to rotate or flip your points to line them up across embeddings.
assuming no multiplicity of eigenvalues
big circle
big red circle. i prefer no outlines
bad
not very bad
-
remove '-' here
rotations of each other
i don't think this is english
none of the embeddings are rotations of each other
not quite right. talk to pedigo.
two
d_i
So
ask pedigo
In this example they have eight (the number of columns in our combined matrix), but remember that in general we'd have $m \times d$. T
the problem is that they are not in the same subspace
averaging
where did .5 go
great
often a great idea (for example....)
node
these embeddings should look different
he edges of
remove
Remember
new section heading: averaging
Grouping the Networks Separately
different ways to do joint analysis
normal
removr
Below, we turn our list of networks into a 3-D numpy array, and then do the above multiplication to get a new 3D numpy array of scores matrices. Because we embedded into two dimensions, each score matrix is $2 \times 2$, and the four score matrices are "slices" along the 0th axis of the numpy array.
generate multiple networks
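A sketch of the multiplication being described, assuming networks is the list of $n \times n$ adjacency matrices and Vhat is the shared $n \times 2$ matrix of joint embedding vectors from the previous step (both names are placeholders):

import numpy as np

A_tensor = np.stack(networks)         # shape (4, n, n): one slice per network
scores = Vhat.T @ A_tensor @ Vhat     # broadcasts to shape (4, 2, 2): one score matrix per network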
Any
explain how to get each V first
Combining the networks
add omni as another one
Combining the classifiers
remove
Multiple Adjacency Spectral Embedding
3.5.1
communities
we are learning representations and we can do whatever we want with them, we illustrate with clasification just to show something,
product
find a better word
Model
clarify label swapping
Adjacency Spectral Embedding
only under some strict models. clarify
For example, say you have nine networks of normal brains and nine networks of abnormal brains.
nine networks from each population.
To illustrate this point, we simulate 18 brains using the below code.
The normal and abnormal brains are all drawn from stochastic block models, but they come from different distributions. If you look at the code, you'll see that the normal brains were set up to have strong connections within communities, whereas the abnormal brains were set up to have strong connections between communities. Below is a plot of the adjacency networks for every normal and every abnormal brain we've created.
this paragraph goes below this figure?
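A minimal sketch of that simulation with graspologic's sbm sampler; the community sizes and block probabilities below are placeholders, not the book's exact values:

from graspologic.simulations import sbm

n = [25, 25]                                   # two communities of 25 nodes each
p_normal = [[0.5, 0.1], [0.1, 0.5]]            # strong within-community connections
p_abnormal = [[0.1, 0.5], [0.5, 0.1]]          # strong between-community connections
normal_brains = [sbm(n, p_normal) for _ in range(9)]
abnormal_brains = [sbm(n, p_abnormal) for _ in range(9)]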
types
groups
create
estimate
algorithm
please no
β
(0,1)
($\vec x_i^\top \vec x_j$
define transpose notation
2
maybe 1
We will call the matrix $P$ the probability matrix whose $i^{th}$ row and $j^{th}$ column is the entry $p_{ij}$, as defined above. Stated another way, $P = (p_{ij})$
this goes before any of the actual models, it is the fundamental object associated with any independent edge model
$P$
for each model, write out the P matrix.
We showed above the model for the random node-assignment variable, $\vec{\pmb \tau}$, above.
remove
process
variable
π11
connect to tau better. and remove the \mathcal{E}
R.V.
write out random variable
50
what happend to the other 100?
Likelihood
can you also write out
P = tau' B tau
use this later to show that you can always rewrite the above as
P = x' x, where x = tau * sqrt(B)
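A worked version of the identity being suggested, assuming each node $i$ has a one-hot community-assignment vector $\vec\tau_i$ and $B$ is positive semidefinite (a sketch, not the book's final notation):

\begin{align*}
p_{ij} = \vec\tau_i^\top B \vec\tau_j = \left(\sqrt{B}\,\vec\tau_i\right)^\top \left(\sqrt{B}\,\vec\tau_j\right) = \vec x_i^\top \vec x_j, \qquad \vec x_i = \sqrt{B}\,\vec\tau_i,
\end{align*}

so $P = X X^\top$, where the rows of $X$ are the vectors $\vec x_i^\top$.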
$K$
little k
πβ
not a parameter, given realization of a random variable
known
forgot an asterisk
Logically
not logic
unique
remove
cluster
group
2
write out numbers < 10
$E_i$,
maybe add the union of these equivalence classes is A_n
1. and 2.
remove periods
if it is totally unambiguous what $\theta$ refers to,
remove
As there are no diagonal entries to our matrix $A$, this means that there are $n^2 - n = n(n - 1)$ values that are definitely not $0$ in $A$, since there are $n$ diagonal entries of $A$ (due to the fact that it is hollow)
wrong, might not be 0
don't talk about AdjMat as the network
To summarize
before this, put the 'formally, ...." stuff
$A$ is an $n \times n$ matrix with 0s and 1s, $A$ is symmetric, $A$ is hollow
this should be formal if you write 'formally'
a_ij \in {0,1}, a_ij = a_ji, a_ii = 0, \forall i,j \in [n]
$|\mathcal E|$
m, and then use it, not m(A)
This means that all the networks are hollow.
hollow is a property of the adjacency matrix
right
appropriate/ justified / reasonable
learn
explore and exploit
If you care about what's under the hood you should also have a reasonable understanding of college-level math, such as calculus, linear algebra, probability, and statistics.
add, 'but you won't need it to understand the content, except certain sections that we will highlight as 'advanced material'
http://docs.neurodata.io/graph-stats-book/intro.html
hyperlink
Johns Hopkins University
the NeuroData lab at
for example
capitalize.
for each example, define the nodes and edges
Network Machine Learning
no caps