- Apr 2022
-
docs.neurodata.io
-
heatmap
range -1 +1
-
Since most of these metrics already exist in networkx, you'll just pull from there. You can check the networkx documentation for details.
use graspologic
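As the note suggests, these summary statistics can be pulled straight from networkx; a minimal sketch (the small graph here is a stand-in, not the book's network):

```python
import networkx as nx

# A tiny stand-in network: a triangle with one pendant node.
G = nx.Graph([(0, 1), (1, 2), (2, 0), (2, 3)])

# Density: fraction of possible edges that are present.
density = nx.density(G)

# Transitivity: proportion of closed triangles among all connected triples.
transitivity = nx.transitivity(G)

# Average clustering coefficient, averaged over nodes.
avg_clustering = nx.average_clustering(G)
```

graspologic interoperates with networkx graphs, so these can be computed on the same objects the book embeds.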
-
community
estimated community
-
features
these features
-
for reference, gives an indication of how clustered the network is, and works by counting the proportion of triangles out of all edges that share a node.
refer to previous definition
-
triangles
clarify earlier that we always mean closed after
-
statistics
for the 4 statistics we show
-
10^{350}
cite # atoms in universe, number of chess games, number of go games on a standard go board
-
How do you define a distance between one node and another node?
How do you divide a network by the number 6
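One standard answer to the distance question is shortest-path length (the number of hops on the shortest route), which networkx computes directly; a sketch on a toy path graph (the graph and endpoints here are illustrative):

```python
import networkx as nx

G = nx.path_graph(5)  # nodes 0-1-2-3-4 connected in a line

# Shortest-path distance between two nodes.
d = nx.shortest_path_length(G, source=0, target=4)

# The "six degrees" idea refers to the average of this quantity over all pairs.
avg_d = nx.average_shortest_path_length(G)
```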
-
-
docs.neurodata.io
-
largest
mention strongly and weakly connected (in a sentence)
-
(SI,BK),(BK,MH),(MH,Q),(MH,BX),(Q,BX)
use 1-letter acronyms for each borough
-
Island
,
-
π΄
for undirected graphs
-
figure
move MH
-
see
change layout so they are not linear
-
degree
do directed case too
-
matrix
do directed case too
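For the directed case the note asks for, one convention (assumed here: $a_{ij} = 1$ means an edge from node $i$ to node $j$) gives out-degrees as row sums of the adjacency matrix and in-degrees as column sums:

```python
import numpy as np

# Directed adjacency matrix; a_ij = 1 means an edge from node i to node j.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])

out_degree = A.sum(axis=1)  # row sums: edges leaving each node
in_degree = A.sum(axis=0)   # column sums: edges entering each node
```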
-
$degree(v_i) = \sum_{j=1}^n a_{ij} = \sum_{j : a_{ij} = 1} a_{ij} + \sum_{j : a_{ij} = 0} a_{ij} = \sum_{j : a_{ij} = 1} 1 + 0$
remove
-
adjacency
weighted
-
πjj
for example
-
matrix
hollow
-
trite
why?
-
In the context of this book, you will usually only worry about the undirected case, or when the presence of an arrow implies that the other direction exists, too. A network is undirected if an edge between node $i$ and node $j$ implies that node $j$ is also connected to node $i$. For this reason, you will usually omit the arrows entirely, like you show below:
we
-
red
super not salient
-
you look
we see
-
-
docs.neurodata.io
-
The
An Adjacency Matrix not The Adjacency Matrix
-
Matrix
there are n! different adjacency matrices, and we are looking at 1 of them.
-
Laplacian
mention directed graphs
and mention all the ones in graspologic
-
This lets you make inferences with $L^{normalized}$ that you couldn't make with $L$.
true?
-
code
there should be a function in graspologic to convert an adjacency to a laplacian and back.
-
below
?
-
AttributeError Traceback (most recent call last)
fix
-
4.1.2. The Incidence Matrix
Instead of having values in a symmetric matrix represent possible edges, like with the Adjacency Matrix, we could have rows represent nodes and columns represent edges. This is called the Incidence Matrix, and it's useful to know about - although it won't appear too much in this book. If there are $n$ nodes and $m$ edges, you make an $n \times m$ matrix. Then, to determine whether a node is a member of a given edge, you'd go to that node's row and the edge's column. If the entry is nonzero ($1$ if the network is unweighted), then the node is a member of that edge, and if there's a $0$, the node is not a member of that edge. You can see the incidence matrix for our network below. Notice that with incidence plots, edges are (generally arbitrarily) assigned indices as well as nodes.

from networkx.linalg.graphmatrix import incidence_matrix
cmap = sns.color_palette("Purples", n_colors=2)
I = incidence_matrix(nx.Graph(A)).toarray().astype(int)
fig, axs = plt.subplots(1, 2, figsize=(12,6))
plot = sns.heatmap(I, annot=True, linewidths=.1, cmap=cmap, cbar=False, xticklabels=True, yticklabels=True, ax=axs[0])
plot.set_xlabel("Edges")
plot.set_ylabel("Nodes")
plot.set_title("Incidence matrix", fontsize=18)
ax2 = nx.draw_networkx(G, with_labels=True, node_color="tab:purple", pos=pos, font_size=10, font_color="whitesmoke", arrows=True, edge_color="black", width=1, ax=axs[1])
ax2 = plt.gca()
ax2.text(.24, 0.2, s="Edge 1", color='black', fontsize=11, rotation=65)
ax2.text(.45, 0.01, s="Edge 0", color='black', fontsize=11)
ax2.set_title("Layout plot", fontsize=18)
sns.despine(ax=ax2, left=True, bottom=True)

When networks are large, incidence matrices tend to be extremely sparse - meaning, their values are mostly 0's. This is because each column must have exactly two nonzero values along its rows: one value for the first node its edge is connected to, and another for the second. Because of this, incidence matrices are usually represented in Python computationally as scipy's sparse matrices rather than as numpy arrays, since this data type is much better-suited for matrices which contain mostly zeroes. You can also add orientation to incidence matrices, even in undirected networks, which we'll discuss next.

4.1.3. The Oriented Incidence Matrix
The oriented incidence matrix is extremely similar to the normal incidence matrix, except that you assign a direction or orientation to each edge: you define one of its nodes as being the head node, and the other as being the tail. For undirected networks, you can assign directionality arbitrarily. Then, for the column in the incidence matrix corresponding to a given edge, the tail node has a value of $-1$, and the head node has a value of $1$. Nodes who aren't a member of a particular edge are still assigned values of $0$. We'll give the oriented incidence matrix the name $N$.

from networkx.linalg.graphmatrix import incidence_matrix
cmap = sns.color_palette("Purples", n_colors=3)
N = incidence_matrix(nx.Graph(A), oriented=True).toarray().astype(int)
fig, axs = plt.subplots(1, 2, figsize=(12,6))
plot = sns.heatmap(N, annot=True, linewidths=.1, cmap=cmap, cbar=False, xticklabels=True, yticklabels=True, ax=axs[0])
plot.set_xlabel("Edges")
plot.set_ylabel("Nodes")
plot.set_title("Oriented Incidence matrix $N$", fontsize=18)
plot.annotate("Tail Node", (.05, .95), color='black', fontsize=11)
plot.annotate("Head Node", (.05, 1.95), color='white', fontsize=11)
ax2 = nx.draw_networkx(G, with_labels=True, node_color="tab:purple", pos=pos, font_size=10, font_color="whitesmoke", arrows=True, edge_color="black", width=1, ax=axs[1])
ax2 = plt.gca()
ax2.text(.24, 0.2, s="Edge 1", color='black', fontsize=11, rotation=65)
ax2.text(.45, 0.01, s="Edge 0", color='black', fontsize=11)
ax2.set_title("Layout plot", fontsize=18)
sns.despine(ax=ax2, left=True, bottom=True)
ax2.text(-.1, -.05, s="Tail Node", color='black', fontsize=11)
ax2.text(.9, -.05, s="Head Node", color='black', fontsize=11)

Although we won't use incidence matrices, oriented or otherwise, in this book too much, we introduced them because there's a deep connection between incidence matrices, adjacency matrices, and a matrix representation that we haven't introduced yet called the Laplacian. Before we can explore that connection, we'll discuss one more representation: the degree matrix.
put in appendix
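The "deep connection" the passage alludes to can be checked directly: for an undirected network, the oriented incidence matrix satisfies $N N^\top = D - A$, the unnormalized Laplacian. A numpy sketch on an illustrative path graph 0-1-2 (orientation chosen arbitrarily, as the passage allows):

```python
import numpy as np

# Oriented incidence matrix for the path 0-1-2: one column per edge,
# -1 at the tail node, +1 at the head node.
N = np.array([[-1,  0],
              [ 1, -1],
              [ 0,  1]])

# Adjacency and degree matrices for the same graph.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
D = np.diag(A.sum(axis=0))

# The connection to the Laplacian: N @ N.T equals D - A.
L = N @ N.T
```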
-
ax2.text(0, 0.2, s="Nodes 0 and 2 \nare connected", color='black', fontsize=11, rotation=63)
add nodes 0 and 1 are connected
-
The most
A
-
- Mar 2022
-
docs.neurodata.io
-
1.1.2. Viewing Networks Statistically
Network machine learning v. network data science
-
1.1. What Is A Network?
What is Network Machine Learning
-
features
give example, but then show something else
-
For instance, each node might have a set of features attached to it: extra information that comes in the form of a table. For instance
2 for instances
-
-
docs.neurodata.io
-
refer to this in preface, don't put it here
-
-
docs.neurodata.io
-
Acknowledgements
add all the other crap he included
-
models
i wouldn't use the word 'model' in this book
-
The theoretical underpinnings of the techniques described in the book
i'd put in appendix
-
network data science
maybe make a venn diagram including networks, data science, and machine learning
-
Network Machine Learning and You
start with something awesome that happened in the world because of network data science?
-
a statistical object
avoid 'statistics' everywhere
-
This book assumes you know next to nothing about how networks can be viewed as a statistical object
about how you can learn from network data.
not 'viewed as a stats object'
-
With the nodes as employees, edges as team co-assignment between employees, and node covariates as employee performance, you can isolate groups within your company which are over or underperforming employee expectations.
move to end before brains
-
So
delete 'so'
-
- Dec 2021
-
docs.neurodata.io
-
Representations of
dos and don'ts
-
-
docs.neurodata.io
-
, but are often more generalizable
remove
-
Connection
provide intuition or move to appendix
-
$n$ nodes
permutation invariant
-
this book
first in sentence
-
-
docs.neurodata.io
-
2
1
-
without
so we estimate them?
-
-
docs.neurodata.io
-
$\rho$
let's mention Rho too
-
5.5.3. Correlated Network Models
put time-series after this
-
$R^{(1)}$ and
let these be the identity matrix. i think you can just QR these, and then inverse QR R^3
-
score matrices
clarify that they will be defined below
-
Let's try this first by looking at the latent position matrices for $\mathbf A^{(1)}$ and $\mathbf A^{(2)}$ from the random networks for Monday and Tuesday first:
they are implicit and not necessarily the same, rather, only X'X is the same.
-
π
M
-
- Oct 2021
-
docs.neurodata.io
-
$P = XX^\top$!
"!"
-
-
docs.neurodata.io
-
Pseudoinverse
go end to end
-
array([0.74745577, 0.49019859])
too many sig digs
-
If you think about it, however, you can think of this adjacency vector as kind of an estimate for edge probabilities. Say you sample a node's adjacency vector from an RDPG, then you sample again, and again. Averaging all of your samples will get you closer and closer to the actual edge connection probabilities.
not quite
-
The average values were .775 and .149 - so, pretty close!
say we computed
-
took a long time
computation
-
-
docs.neurodata.io
-
$ER$
ER_n(p) or ER random network every time
-
- Sep 2021
-
docs.neurodata.io
-
their paper is called "On a two-truths phenomenon in spectral graph clustering" - it's an interesting read for the curious)
up
-
- in 2018 -
,
-
Adjacency
combine with the below
-
useful information
careful
-
them
explicitly refer to the figure
-
Heck
goes earlier
-
Laplacian
never write 'the laplacian', always write "normalized" (etc) Laplacian
-
two
is this a rank 2 matrix?
-
Stochastic Block Model
no caps
-
Spectral Embedding
no caps
-
$\lambda_1 \vec u_1 \vec u_1^\top + \lambda_2 \vec u_2 \vec u_2^\top + ... + \lambda_n \vec u_n \vec u_n^\top$
confusing
-
-
-
close
in MSE
-
ninety-degree
maybe not so correct, eg, 90 degrees in high dimensions does not specify the other angles?
-
Singular Value Decomposition
lower case
-
and another which rotates them back
i wouldn't say this
-
\begin{bmatrix} x_{11} & & & " \\ & x_{22} & & \\ & & \ddots & \\ " & & & x_{nn} \end{bmatrix}
makes me think this is diagonal
-
you'll
the alg
-
submatrices
not them
-
break a single matrix
decompose a matrix
or factorize?
-
default
for undirected graphs
-
Laplacians
such a thing as directed laplacians
-
- Aug 2021
-
docs.neurodata.io
-
The
comes out of nowhere?
-
started
present tense
-
The
make more high level
-
-
docs.neurodata.io
-
5.2.7. Network models for networks which aren't simple
make part of ER section
-
-
docs.neurodata.io
-
whether you're using a neural network, or a decision tree, or whether your goal is to classify observations or to predict a value using regression -
those 'or's are not parallel
-
machine learning
CS and physics
-
-
docs.neurodata.io
-
6.3.1. The Coin Flip Example
starred? maybe mention MLE in section title?
-
$p^*$
ptilde
-
two plots look almost nothing alike
i wouldn't say that
-
$\mathbb P_\theta(\mathbf A = A) = \mathbb P(\mathbf a_{11} = a_{11}, \mathbf a_{12} = a_{12}, ..., \mathbf a_{nn} = a_{nn}) = \prod_{i,j} \mathbb P(\mathbf a_{ij} = a_{ij})$
don't forget punctuation
-
(in network statistics, often just one)
not quite
-
we can never hope to understand the true distribution of $\mathbf A$.
strong
-
-
docs.neurodata.io
-
$\langle \vec x - \vec y, \vec x - \vec y \rangle$
remove
-
With $\vec x$ defined as above in the sum example, we would have that: $\prod_{i = 1}^3 x_i = 6.12$. If we were to use $\mathcal I = \{1,3\}$, then: $\prod_{i \in \mathcal I} x_i = 3.4$
redundant
-
ted to multiply all the elements of $\vec x$, we would write: $\prod_{i = 1}^d x_i = x_1 \times x_2 \times ... \times x_d$, where $\times$ is just multiplication like you are probably used to. Again, we have the exact same indexing conventions, where: $\prod_{i \in [d]} x_i = \prod_{i = 1}^d x_i = x_1 \times x_2 \times ... \times x_d$
redundant
-
In the general case, for a set $\mathcal S$, we would say that $S \in \mathcal S^{r \times c}$ if (think this through!) for any $i \in [r]$ and any $j \in [c]$, $s_{ij} \in \mathcal S$
overkill?
-
- Jun 2021
-
docs.neurodata.io
-
. We Like previously, there are two types of RDPGs: one in which $\mathbf X$ is treated as known, and another in which $\mathbf X$ is treated as unknown.
delete
-
Symbolically
this is just 1 word.
-
such
such as
-
300
be consistent. let's stick with 100 students
-
, since they are undirected; if we relax the requirement of undirectedness (and allow directed networks) $B$ no longer need be symmetric.
delete
-
simple
undirected
-
Imagine that we are flipping a fair single coin. A fair coin is a coin in which the probability of seeing either a heads or a tails on a coin flip is $\frac{1}{2}$. Let's imagine we flip the coin 20 times, and we see 10 heads and 10 tails. What would happen if we were to flip 2 coins, which had a different probability of seeing heads or tails? Imagine that we flip each coin 10 times. The first 10 flips are with a fair coin, and we might see an outcome of five heads and five tails. On the other hand, the second ten flips are not with a fair coin, but with a coin that has a $\frac{4}{5}$ probability to land on heads, and a $\frac{1}{5}$ probability of landing on tails. In the second set of 10 flips, we might see an outcome of nine heads and one tails. In the first set of 20 coin flips, all of the coin flips are performed with the same coin. Stated another way, we have a single group, or a set of coin flips which are similar. On the other hand, in the second set of twenty coin flips, ten of the coin flips are performed with a fair coin, and ten of the coin flips are performed with a different coin which is not fair. Here, we have two clusters of coin flips, those that occur with the first coin, and those that occur with the second coin. Since the first cluster of coin flips are with a fair coin, we expect that coin flips from the first cluster will not necessarily have an identical number of heads and tails, but at least a similar number of heads and tails. On the other hand, coin flips from the second cluster will tend to have more heads than tails.
put a simplified version of this after the description.
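The simplified version the note asks for can be a few lines of simulation - flip the two coins and compare head counts (probabilities as in the passage; the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Cluster 1: ten flips of a fair coin (heads probability 1/2).
fair_flips = rng.binomial(1, 0.5, size=10)

# Cluster 2: ten flips of a biased coin (heads probability 4/5).
biased_flips = rng.binomial(1, 0.8, size=10)

# The biased cluster will tend to show more heads than the fair one.
fair_heads, biased_heads = fair_flips.sum(), biased_flips.sum()
```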
-
happens
but below equation on 1 line
-
ER network
ER random network
-
nework
network
-
Also, we let $\mathbf a_{ii} = 0$, which means that all self-loops are always unconnected
Also, we assume it is not possible for anyone to be friends with themselves, which means that a_ii = NaN for all i.
-
When $i > j$, we allow $\mathbf a_{ij} = \mathbf a_{ji}$. This means that the connections across the diagonal of the adjacency matrix are all equal, which means that we have built-in the property of undirectedness into our networks
too complicated.
We assume here that edges are undirected meaning that if there is an edge from i to j then there is also an edge from j to i
-
Erdös Rényi network
no such thing as an ER network
-
The Erdös Rényi model formalizes this relatively simple model with a single parameter:
change something, either this sentence or the table, iid belongs somewhere
-
model
use a different word
-
If $A$ and $A''$ are members of different equivalence classes; that is, $A \in E$ and $A'' \in E'$ where $E, E'$ are equivalence classes, then $\mathbb P_\theta(A) \neq \mathbb P_\theta(A'')$.
delete
-
($A$
otherwise, ...
-
$E$
use different notation
-
For the case of one observed network $A$, an estimate of $\theta$ (referred to as $\hat\theta$) would simply be for $\hat\theta$ to have a $1$ in the entry corresponding to our observed network, and a $0$ everywhere else. Inferentially, this would imply that the network-valued random variable $\mathbf A$ which governs realizations $A$ is deterministic, even if this is not the case.
MLE
-
variaable
typo
-
parameters
distribution
-
,
above figure goes before section header
-
To
directed with loops for simplicity
-
think
first think about nodes
-
first
first edge?
edges are not ordered
-
-
docs.neurodata.io
-
Fundamentally
and it borrows strength
-
Now, the only question is how to actually pull the separate latent positions for each network from this matrix.
clarify
-
big
do we want to use 4 different symbols for the 4 different numbers?
-
rows
blocks
-
you donβt have to rotate or flip your points to line them up across embeddings.
assuming no multiplicity of eigenvalues
-
big circle
big red circle. i prefer no outlines
-
bad
not very bad
-
β
remove '-' here
-
rotations of each other
i don't think this is english
-
none of the embeddings are rotations of each other
not quite right. talk to pedigo.
-
- May 2021
-
docs.neurodata.io
-
two
d_i
-
So
ask pedigo
-
In this example they have eight (the number of columns in our combined matrix), but remember that in general we'd have $m \times d$. T
the problem is that they are not in the same subspace
-
averaging
where did .5 go
-
great
often a great idea (for example....)
-
node
these embeddings should look different
-
the edges of
remove
-
Remember
new section heading: averaging
-
Grouping the Networks Separately
different ways to do joint analysis
-
normal
remove
-
- Apr 2021
-
docs.neurodata.io
-
Below, we turn our list of networks into a 3-D numpy array, and then do the above multiplication to get a new 3D numpy array of scores matrices. Because we embedded into two dimensions, each score matrix is $2 \times 2$, and the four score matrices are "slices" along the 0th axis of the numpy array.
generate multiple networks
-
Any
explain how to get each V first
-
Combining the networks
add omni as another one
-
Combining the classifiers
remove
-
Multiple Adjacency Spectral Embedding
3.5.1
-
communities
we are learning representations and we can do whatever we want with them, we illustrate with classification just to show something,
-
product
find a better word
-
Model
clarify label swapping
-
Adjacency Spectral Embedding
only under some strict models. clarify
-
For example, say you have nine networks of normal brains and nine networks of abnormal brains.
nine networks from each population.
To illustrate this point, we simulate 18 brains using the below code.
-
The normal and abnormal brains are all drawn from stochastic block models, but they come from different distributions. If you look at the code, you'll see that the normal brains were set up to have strong connections within communities, whereas the abnormal brains were set up to have strong connections between communities. Below is a plot of the adjacency networks for every normal and every abnormal brain we've created.
this paragraph goes below this figure?
-
types
groups
-
create
estimate
-
algorithm
please no
-
-
docs.neurodata.io
-
β
(0,1)
-
($\vec x_i^\top \vec x_j$
define transpose notation
-
2
maybe 1
-
We will call the matrix $P$ the probability matrix whose $i^{th}$ row and $j^{th}$ column is the entry $p_{ij}$, as defined above. Stated another way, $P = (p_{ij})$
this goes before any of the actual models, it is the fundamental object associated with any independent edge model
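Writing out the $P$ matrix concretely, as the notes here ask: for an $ER_n(p)$ random network every entry shares the one parameter $p$, so $P$ is a constant matrix (the $n$ and $p$ below are illustrative):

```python
import numpy as np

n, p = 5, 0.3

# For an ER_n(p) random network, every potential edge has the same
# probability p, so the probability matrix P is constant.
P = np.full((n, n), p)
```

The other independent-edge models (SBM, RDPG) differ only in how the entries of $P$ are filled in.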
-
π
for each model, write out the P matrix.
-
We showed above the model for the random node-assignment variable, $\vec{\pmb \tau}$, above.
remove
-
process
variable
-
π11
connect to tau better. and remove the \mathcal{E}
-
R.V.
write out random variable
-
50
what happened to the other 100?
-
Likelihood
can you also write out
P = tau' B tau
use this later to show that you can always rewrite the above as
P = x' x, where x = tau * sqrt(B)
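The identity sketched in this note can be checked numerically: with a one-hot community matrix (written C below, playing the role of tau), $P = C B C^\top$; and taking $X = C\sqrt{B}$ (matrix square root, which exists when $B$ is positive semidefinite) recovers the RDPG form $P = XX^\top$. The specific $B$ and assignments are illustrative:

```python
import numpy as np

# Block matrix for two communities (positive semidefinite, so a square root exists).
B = np.array([[0.6, 0.2],
              [0.2, 0.4]])

# One-hot community assignments for four nodes: two per community.
C = np.array([[1, 0],
              [1, 0],
              [0, 1],
              [0, 1]])

# SBM probability matrix: P = C B C^T.
P = C @ B @ C.T

# Matrix square root of B via its eigendecomposition.
vals, vecs = np.linalg.eigh(B)
sqrtB = vecs @ np.diag(np.sqrt(vals)) @ vecs.T

# RDPG form: X = C sqrt(B) gives back the same P as X X^T.
X = C @ sqrtB
P2 = X @ X.T
```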
-
β
little k
-
πβ
not a parameter, given realization of a random variable
-
known
forgot an asterisk
-
Logically
not logic
-
unique
remove
-
cluster
group
-
2
write out numbers < 10
-
$E_i$,
maybe add the union of these equivalence classes is A_n
-
1. and 2.
remove periods
-
if it is totally unambiguous what $\theta$ refers to,
remove
-
As there are no diagonal entries to our matrix $A$, this means that there are $n^2 - n = n(n - 1)$ values that are definitely not $0$ in $A$, since there are $n$ diagonal entries of $A$ (due to the fact that it is hollow)
wrong, might not be 0
don't talk about AdjMat as the network
-
To summarize
before this, put the 'formally, ...." stuff
-
$A$ is an $n \times n$ matrix with $0$s and $1$s, $A$ is symmetric, $A$ is hollow
this should be formal if you write 'formally'
a_ij \in {0,1}, a_ij = a_ji, a_ii = 0, \forall i,j \in [n]
-
$|\mathcal E|$
m, and then use it, not m(A)
-
This means that all the networks are hollow.
hollow is a property of the adjacency matrix
-
-
docs.neurodata.io
-
right
appropriate/ justified / reasonable
-
learn
explore and exploit
-
If you care about whatβs under the hood you should also have a reasonable understanding of college-level math, such as calculus, linear algebra, probability, and statistics.
add, 'but you won't need it to understand the content, except certain sections that we will highlight as 'advanced material'
-
http://docs.neurodata.io/graph-stats-book/intro.html
hyperlink
-
Johns Hopkins University
the NeuroData lab at
-
for example
capitalize.
for each example, define the nodes and edges
-
Network Machine Learning
no caps
-