One of the most popular statistics to use to determine sparsity in realized networks is the network density, but there are many others that have their own advantages [7], [8].
delete?
infinite
doesn't need to be infinite, I don't think
$\vec{x}^{(n)}$, this quantity could be written: $\mathbb{x}$
why does 'x' look different
ip:
no . after Pr
element $x_i^{(n)}$ is a random
why superscript
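Possible fix, as a LaTeX sketch (the exact convention here is an assumption, not the book's): keep the arrow for the vector and reserve the parenthesized superscript for the observation index:

    \vec{x}^{(n)} =
      \begin{pmatrix} x_1^{(n)} \\ \vdots \\ x_d^{(n)} \end{pmatrix},
    \qquad
    x_i^{(n)} \ \text{is element $i$ of observation $n$}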
9.3.1.3. The algorithmic implications
give example with a big Matvec Op
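A sketch of what that example could look like (scipy.sparse is an assumption; the size is shrunk from the text's 200,000 so it runs anywhere):

    import numpy as np
    from scipy import sparse

    n = 20_000  # shrunk from the 200,000 in the text
    # ~40k stored entries instead of 4e8 dense cells
    X = sparse.random(n, n, density=1e-4, format="csr", random_state=0)
    v = np.ones(n)

    y = X @ v  # CSR matvec: work scales with the stored non-zeros,
               # not with all n*n cells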
Unfortunately, Fisher’s exact test has a slight caveat: it can be extremely computationally intensive, especially when the number of data observations we have is really big (in this case, 200,000, and it could be even bigger).
no
SNP 1, alternative base T
SNP 300M
Let’s assume that we have a small task, where for each row in the matrix $X$, we want to compute the row-wise sum. Stated another way, for a given row $i$, the quantity that you want to compute is $\sum_{j=1}^{m} x_{ij}$. If you ignore sparsity altogether, you can do this operation pretty easily: there are $n$ rows, and $m$ terms that you need to add together for each row, which means that you will have $n \cdot m$ total operations to perform (for each of $n$ rows, perform an addition involving $m$ terms).
no m
If the rows can be sparse, the columns could be too; let’s assume that we have a matrix where $m'$ of the columns are not sparse. Following a similar approach to the above, if we had a list $Y$ with $m'$ elements telling us which columns were not sparse, we could just store the $m'$ non-sparse columns (each of which has $n$ rows), and then the list of the $m'$ non-sparse column indices. Like above, we can store this information with $64 \cdot (n \cdot m' + m' + 1)$ bits.
necessary
Let’s say that of these $n$ rows, we know ahead of time that a lot of the rows are sparse. By “row sparse”, what we mean is that $x_{ij} = 0$ for every column $j$ of each sparse row $i$. Let’s assume that of the $n$ total rows, only $n'$ are not sparse. We could, for instance, store the non-sparse rows in a little set $\mathcal{X}$ which has $n'$ elements telling us which rows are not sparse. For these non-sparse rows, we store all $m$ pieces of column-wise information, but for the sparse rows, we just ignore them entirely. To store this entire matrix, we will need $64 \cdot (n' \cdot m)$ bits (64 bits for each entry of a non-sparse row), plus $64 \cdot n'$ bits (to store each element of $\mathcal{X}$), plus $64$ bits (to store the total number of rows that the matrix has), for a total of $64 \cdot (n' \cdot m + n' + 1)$ bits.
matrix sparsity
the rows are sparse. By “row sparse”, what we mean is that $x_{ij} = 0$ for all of these sparse rows $i$.
these are different
but a common cutoff is if the number of non-zero elements is at most the number of rows or columns.
don't think so
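To make the storage argument concrete, a small sketch (scipy's CSR also stores index arrays, which the bit counts above ignore):

    import numpy as np
    from scipy import sparse

    n, m, n_prime = 1000, 1000, 10
    X = np.zeros((n, m))
    X[:n_prime] = 1.0                     # only n' non-sparse rows

    dense_bits = X.size * 64              # 64 * n * m
    row_scheme_bits = 64 * (n_prime * m + n_prime + 1)
    csr = sparse.csr_matrix(X)            # stores values plus indices
    print(dense_bits, row_scheme_bits, csr.data.size * 64)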
L’Hopital’s
I hope not
Examples
problems in NML:
1 para per ch. 4 thing, but also re-read HOML and check if any are easily portable (e.g., too small a network, or too dense a network)
intuitively
tie back to assumptions
allows
assumes a particular form of
attributes
edge node network multi-network
mentally
alien
Networks with cross-network attributes
multiple networks with node attributes and/or labels
For
give AlphaFold example too
with more than one element
new sentence
for it to be a meaningful network, there must be multiple nodes and edges
usually
is defined by a
whether or not the approach can be used in isolation from a statistical model (non-model based or model-based network learning systems).
add a paragraph about edge vs node vs community vs network
and
vs
As the internet became widespread and coding tools became easier to use – Python became prevalent in machine learning, for instance, and cloud computing came into its own with Amazon’s AWS and Microsoft’s Azure –
delete
cloud computing came into its own with Amazon’s AWS and Microsoft’s Azure –
remove
One crucially influential application for networks came in 1996, when a graduate student at Stanford named Larry Page created the PageRank algorithm. The idea was that websites on the internet (which, in 1996, had barely formed) could be ordered into a hierarchy by “link popularity”: a web page would rank higher the more links there were to it. Larry Page and his friend Sergey Brin realized that PageRank could be used to create a search engine, and so they used the PageRank algorithm to found a small web-search company they called Google.
this paragraph is redundant
machine
refer back to venn diagram
Fig
'network population' --> 'network population assumption'
'network sample = data'
'network machine learning' <-- 'learn about the network sample'
to the right, is 'guess about some property of network population'
who could potentially have the mental illness
psychological property, or skill
network
special cases
1.2.2.3. We might errorfully observe the networks¶
goes first
, and although this book doesn’t focus on GNNs specifically, it does give you the fundamental ideas that you can build off of to understand them.
. This book provides the basic foundational concepts and intuition required to understand how, when, and why GNNs, or any other network machine learning tool, works.
organized
can be thought of as
Broadly
replace ML with 'statistical learning'
add pointer to ML which is the overlap of SL + DS
add pointer graph theory = overlap of NS + DS
Dr
isn't he a section contributor?
independence
hypothesis
Microsoft
DARPA program manager
ericwb95 - at - gmail - dot - com
use your neurodata email address. eric@neurodata.io
ask Jong
Doksum
add diversity to recommendations
texts
others
would be
is
we think a reasonable
our favorite
learning
add bullets to appendix
which
that
unfortunately
remove
Machine Learning
decapitalize
easy to use
hyphenate?
everything unique
overclaim
maybe mention a Chinese/Indian one
nearly
over
the development of machine learning strategies for data that is a network.
machine learning for network-valued data
For
a lot of 'for instances' here
have
choose
We don’t really like that word
we like it, there is a downside
machine learning
and data science
and each column represented the length and biological sex (male or female) of the lobster
just 'sex'
a piece of
some
Wikipedia
fix to say what wiki says
Learning
make these pages auto-generate a ToC
Hands-on Network Machine Learning with Scikit-Learn and Graspologic
list authors
SignalSubgraph
check that it deals with ties properly
10.2.3.2. Classification with Bayes Plugin Classifier
Graph Classification
10.2.3.1. Bayes Plugin Classifier (Statistical Intuition)
appendix
humans
humans
astronauts
martians
Can we come up with a signal subnetwork classifier?
this means find the subnets that differ
astronauts
the descendants of the astronauts are astronauts, are they?
lobes
did we change that? I thought we were going with sensory modalities
Estimation
consolidate bootstrap stuff, maybe in appendix
above
ensure the result is still in (0,1)
There will be the same number of adjacency matrices as there are time points, since our network will be changing over time.
confusing
implement
run
VNSGM
introduce acronym earlier
8.4
Jovo didn't do this yet
-
show match ratio here too
Unshuffling
Matching
$\text{match ratio}(P, P_u)$
update equation
match_ratio
put in graspologic
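Until it lands in graspologic, a sketch of what the helper might look like (the name match_ratio and the permutation-vector representation are assumptions):

    import numpy as np

    def match_ratio(perm_true, perm_est):
        # fraction of vertices the matching got exactly right
        perm_true = np.asarray(perm_true)
        perm_est = np.asarray(perm_est)
        return np.mean(perm_true == perm_est)

    print(match_ratio([0, 1, 2, 3], [0, 1, 3, 2]))  # 0.5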
$PB$
transpose
reorder
not really
{0, 1, 2, 3}
I don't think this example works, because there are multiple permutations that yield 0
The
/linebreak
If we consider the worst possible case (every edge in $A$ does not exist in $B$):
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad A - B = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad \|A - B\|_F^2 = 6$$
seems unnecessary
$$A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad A - B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}, \quad \|A - B\|_F^2 = 0$$
this seems unnecessary
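If the worked cases stay, both could collapse into one code sketch (numpy only; the permutation is given as an index vector, and applying $PBP^\top$ is just a simultaneous re-ordering of B's rows and columns):

    import numpy as np

    def match_objective(A, B, perm):
        # squared Frobenius norm ||A - P B P^T||_F^2
        B_perm = B[np.ix_(perm, perm)]
        return np.sum((A - B_perm) ** 2)

    A = np.array([[0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 0]])
    B = np.zeros((3, 3), dtype=int)
    print(match_objective(A, B, [0, 1, 2]))  # 6, the worst case above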
package
only sklearn, scipy, graspologic
stochastic_block_test
put this in graspologic
familywise error rate
previously FWER
𝐵(
need period
special
no
This
and we don't need matched!!!!
multipletests
does that work?
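It should; statsmodels' multipletests supports Holm. A minimal check (p-values made up):

    import numpy as np
    from statsmodels.stats.multitest import multipletests

    pvals = np.array([0.001, 0.009, 0.04, 0.12])  # illustrative
    reject, pvals_adj, _, _ = multipletests(pvals, alpha=0.05,
                                            method="holm")
    print(reject, pvals_adj)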
$a \neq b$
not nested
assumptions
guess
Let’s formalize this situation a little bit more. We have the following three hypotheses: $H_0: p_1 = p_2 = p_3 = a$, against $H_1: p_1 = p_2 = a$ but $p_3 = c$. Finally, we have $H_2: p_1 = a$, $p_2 = b$, and $p_3 = c$. The hypothesis $H$ is nested in the hypothesis $H'$ if whenever $H$ is true, $H'$ is also true. In this sense, the hypothesis $H'$ is said to contain the hypothesis $H$. Let’s consider $H_0$ and $H_1$, for instance. Notice that if $H_0$ is true, then $p_1 = p_2 = p_3 = a$. However, $H_1$ is also true, since $p_1 = p_2 = a$, and $p_3 = c$ can also be set equal to $p_1$ and $p_2$ if $c = a$. A sequence of hypotheses $H_0, H_1, \ldots, H_n$ is called sequentially nested if $H_0$ is nested in $H_1$, which is nested in $H_2$, and so on, up to $H_{n-1}$ being nested in $H_n$. Note that the sequence of hypotheses that we presented for our three-coin example is sequentially nested. We already saw that $H_0$ was nested in $H_1$. Now, let’s compare $H_2$ to $H_1$. Note that if $a = b$, then $p_1 = p_2$ and $p_3 = c$, exactly as in $H_1$, so $H_1$ is nested in $H_2$. Therefore, since $H_0$ is nested in $H_1$ and $H_1$ is nested in $H_2$, the sequence $H_0$, $H_1$, $H_2$ is sequentially nested.
dense.
draw a diagram
samples with which we are presented
data
and
by
presenting
selecting among
describe
may describe
=
\neq
faithful
accurate, veridical,
Pretty exciting, huh?
this pvalue is not valid
see appendix for a robust approach that has higher power for weighted networks.
overcoming
appendix
. Unfortunately, if the data is not well-summarized by a normal distribution, the $t$-test tends to be a fairly poor choice for hypothesis testing.
not quite right
8.2.2.2.2. Weighted Networks
appendix
,
no space after comma
below plot
weird formatting
8.2. Testing for Differences between Groups of Edges
between known groups of edges
the
same here
the number of adjacencies in cluster one with an adjacency of zero
the # of zero-valued adjacencies
8.2.2.1. Hypothesis Testing with coin flips
these sections all go in appendix
alternative
null, as opposed to the alternative
indicates
I don't think they indicate anything
they assert
RDPG
not true. GRDPG does
8.2.1. The Structured Independent Edge Model is parametrized by a Cluster-Assignment Matrix and a probability vector
this is a model, so goes in ch. 5
higher chance two students are friends if they go to the same school than if they go to two different schools.
RDPG must find this.
so, use GRDPG or a different model/hypothesis
resort
re-sort
the
remove word
8.1.1.2. Evaluating
the interesting thing for k-means, silhouette, ARI, etc. is showing them in a graph, and showing when they get it wrong.
and then showing AutoGMM gets it right.
8.1.1.1
non-graph things go in appendix, including:
- k-means
- silhouette score
- ARI
- confusion matrices
heatmap
adjacency matrix
what if your true labels are disproportionate
it doesn't normalize for chance.
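A toy check of the chance-normalization point with sklearn (labels made up):

    from sklearn.metrics import rand_score, adjusted_rand_score

    true = [0] * 90 + [1] * 10   # disproportionate true labels
    pred = [0] * 100             # degenerate clustering: one big cluster

    print(rand_score(true, pred))           # ~0.82, deceptively high
    print(adjusted_rand_score(true, pred))  # 0.0 once corrected for chance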
You
add a section on graspologic's thingy.
that may require updating graspologic documentation
Temporary cluster assignments
Find closest center for each point
Centers from previous iteration
Compute all distances to center
3 step
2
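Agreed it's two alternating steps; a minimal numpy sketch of one iteration (assumes no cluster goes empty):

    import numpy as np

    def kmeans_step(X, centers):
        # step 1: temporary cluster assignments -- closest center per point
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # step 2: recompute each center as the mean of its assigned points
        centers = np.array([X[labels == k].mean(axis=0)
                            for k in range(centers.shape[0])])
        return labels, centers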
smack dab
approximately
ry to find reasonable guesses at the “centers”
not our goal here
dataset
and the label of each point
our goal is to learn about the block matrix, $B$,
learn the latent community assignment vector
these nodes tend to be more connected (more edges exist between and amongst them)
communities are groups of nodes that are stochastically equivalent.
Non-Identifiability
move to ase section?
had to delete
deleted
and so, f
Finally
Embedding
the point is that your embeddings are not in the same space.
humans + aliens
maybe clarify that
and so forth
remove
forth
,
first
introduce mase before omni if you are explaining mase before omni
used
that used
However, as you can see, the colors are flipped: the communities are in different places relative to each other.
this doesn't make any sense.
also, label communities L and R not 0 and 1.
plot_latents
plot these on the same scale
one
before this, show the true latent positions, label them Lhuman, Rhuman, Lalien, Ralien. Maybe all on one coordinate axis.
consider showing that they are not rotations of one another.
P = np.array([[pa, pb], [pc, pd]])
return sbm([n, n], P, return_labels=return_labels)

# make nine human networks
# and nine alien networks
p1, p2, p3 = .12, .06, .03
too many parameters and don't write 9 unless you sample 9
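One way to trim it, as a sketch (graspologic's sbm; fewer parameters, the loop count matches what's actually sampled, and the probability values are placeholders):

    import numpy as np
    from graspologic.simulations import sbm

    n = 50                      # nodes per community
    p_in, p_out = 0.12, 0.03    # placeholder probabilities
    P = np.array([[p_in, p_out],
                  [p_out, p_in]])

    # sample exactly as many networks as the text claims
    human_networks = [sbm([n, n], P) for _ in range(9)]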
because
b
bilateralized
bilateral
you’ll
We'll
you’ll just simulate
we'll simulate human and ...
simply
remove
having less stuff to deal with
?
Advisors
add:
Sambit, Ali, Jason
dependence
logical and statistical
wheel
crank
statsmodels
use sklearn or scipy
Curse
make suck less
element
adjacency rows, but then d=n
Spectral Embedding
and GNN
5.6.1
haven't done this yet
RDPG
a different RDPG
Adm
L, A, M
IER
venn diagram on 1 graph models and n-graph models
easily
impossible. parachute
coins
unique
Ranking
comment that binarization is decimation of ranking
networks
this comes after sparsification and truncation because you modify every edge
normalization
global rescaling
Sparsification
this is a special case of 'edge trimming'
add truncation
Lowering
this isn't lowering edge bias
done
clarify that if it is weighted, the remaining edges keep their weights, as compared to binarization,
Note
One cannot get arbitrary densities if one has repeated values for weights unless one has a procedure for discarding replicates.
exclude the diagonal
check graspologic, and make issue/PR
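A numpy sketch of the tie problem when targeting a density (assumes a symmetric weighted adjacency matrix; the tie-breaking policy is the assumption that matters):

    import numpy as np

    def sparsify_to_density(A, density):
        # keep the largest-weight edges; surviving edges keep their
        # weights (unlike binarization). excludes the diagonal.
        w = A[np.triu_indices_from(A, k=1)]
        thresh = np.quantile(w, 1 - density)
        B = np.where(A > thresh, A, 0)
        np.fill_diagonal(B, 0)
        # with repeated weights at the threshold, the achieved density
        # can miss the target unless replicates are broken somehow
        return B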
bias
thresholding reduces variance, adds bias
the task easier to estimate
not necessarily
The bias/variance tradeoff is
reference ESL chapter
Ignoring
no. only do this when the matrix is stored as upper/lower. but then don't quite do this
degree
remove 'pendants' and 'pizza huts'
4.4.1. Regularization of the Nodes
Node pruning
Degree
show this, and re-order to do this node trimming first.
show the degree distribution before and after
You
be more clear, and show result
Nodes
describe node latent space here, and network latent space in bag of networks
space
network latent space (as opposed to the node latent space we use to visualize nodes in a network)
Embedding this new matrix will give us a point in space for each network.
maybe move down?
dissimilarity
label the axes and update title to be dissimilarity of networks
All you need to get out of this code is that you have six networks from the first group, and another twelve networks from the second.
why not 5 and 5? or 10 and 10?
the whole
each
Nodes plotted as 2D points
Each node is a point.
Add a caption to this figure:
Each point is a node displayed in 2D latent space. Because there are 20 nodes in this graph, there are 20 points in this figure. Because there are 10 nodes in each community, 10 points share each color, indicating which community each node is in.
on a coordinate axis
in latent space
Euclidean
reals?
moving
mapping
Euclidean
not necessarily Euclidean
issue
for sound theoretical reasons
statsmodels
graspologic
end
probably
outlier
signal
outlier
signal
that
Clarify: the issue is not computing these features, but rather interpreting them, and in particular, interpreting them in a causal light.
you’d
one could
If you’re familiar with correlation, you’ll notice that these correlation numbers generally have a pretty high magnitude: each feature generally tells you a lot about each other feature.
Not quite: some are high, some are low; some say a lot about the others.
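A quick numpy illustration of that mix (synthetic features):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=500)
    F = np.column_stack([x,
                         x + rng.normal(size=500),   # related to column 0
                         rng.normal(size=500)])      # unrelated noise

    print(np.corrcoef(F, rowvar=False).round(2))
    # off-diagonal entries: one high (~0.7), the others near zero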