- Apr 2021
-
docs.neurodata.io
-
re $\mathcal L_\theta(A) \propto \mathbb P_\theta(A)$;
do everything in terms of probabilities
-
The reason we write $\propto$ instead of $=$ is that, in the case of probability distributions, it is often rather tedious to take care of all constants when simplifying expressions, and these constants can detract from much of the intuition of what is going on. If we instead focus on the likelihood, we only need to worry about the parts of the probability statement that deal directly with the parameters $\theta$ and the realization itself $A$.
tedious. remove until we decide we need it
-
What does the likelihood for the a priori SBM look like? Fortunately, since $\vec \tau$ is a parameter of the a priori SBM, the likelihood is a bit simpler than for the a posteriori SBM. This is because the a posteriori SBM requires a marginalization over potential realizations of $\vec{\pmb \tau}$, whereas the a priori SBM does not. The likelihood is as follows, omitting detailed explanations of steps that are described above:
\begin{align*}
\mathcal L_\theta(A) &\propto \mathbb P_{\theta}(\mathbf A = A) \\
&= \prod_{j > i} \mathbb P_\theta(\mathbf a_{ij} = a_{ij})\;\;\;\;\textrm{Independence Assumption} \\
&= \prod_{j > i} b_{\ell k}^{a_{ij}}(1 - b_{\ell k})^{1 - a_{ij}}\;\;\;\;\textrm{p.m.f. of Bernoulli distribution} \\
&= \prod_{k, \ell}b_{\ell k}^{|\mathcal E_{\ell k}|}(1 - b_{\ell k})^{n_{\ell k} - |\mathcal E_{\ell k}|}
\end{align*}
Like the ER model, there are again equivalence classes of the sample space $\mathcal A_n$ in terms of their likelihood. Let $|\mathcal E_{\ell k}(A)|$ denote the number of edges in the $(\ell, k)$ block of adjacency matrix $A$. For a two-community setting, with $\vec \tau$ and $B$ given, the equivalence classes are the sets:
\begin{align*}
E_{a,b,c}(\vec \tau, B) &= \left\{A \in \mathcal A_n : \mathcal E_{11}(A) = a, \mathcal E_{21}(A) = \mathcal E_{12}(A) = b, \mathcal E_{22}(A) = c\right\}
\end{align*}
The number of equivalence classes possible scales with the number of communities, and the manner in which vertices are assigned to communities (particularly, the number of nodes in each community). As before, we have the following. For any $\vec \tau$ and $B$: if $A, A' \in E_{a,b,c}(\vec \tau, B)$ (that is, $A$ and $A'$ are in the same equivalence class), then $\mathcal L_\theta(A) = \mathcal L_\theta(A')$, and if $A \in E_{a, b, c}(\vec \tau, B)$ but $A' \in E_{a', b', c'}(\vec \tau, B)$ where either $a \neq a'$, $b \neq b'$, or $c \neq c'$, then $\mathcal L_\theta(A) \neq \mathcal L_\theta(A')$.
goes in starred section
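If the starred section keeps this derivation, a minimal numpy sketch of evaluating the block-wise form could accompany it (hypothetical helper, not code from the book; assumes an undirected, loopless network, integer labels `tau` in `0..K-1`, and `0 < B[l, k] < 1`):

```python
import numpy as np

def apriori_sbm_log_likelihood(A, tau, B):
    """Log-likelihood (up to the proportionality constant) of adjacency matrix A
    under an a priori SBM with known labels tau and block matrix B."""
    A, tau, B = np.asarray(A), np.asarray(tau), np.asarray(B)
    upper = np.triu(np.ones(A.shape, dtype=bool), k=1)   # each pair (i, j), i < j, counted once
    log_lik = 0.0
    for l in range(B.shape[0]):
        for k in range(B.shape[1]):
            block = np.outer(tau == l, tau == k) & upper
            m_lk = A[block].sum()          # |E_{lk}|: edges in the (l, k) block
            n_lk = block.sum()             # n_{lk}: possible edges in the (l, k) block
            log_lik += m_lk * np.log(B[l, k]) + (n_lk - m_lk) * np.log(1 - B[l, k])
    return log_lik
```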
-
What does the likelihood for the a posteriori SBM look like? In this case, $\theta = (\vec \pi, B)$ are the parameters for the model, so the likelihood for a realization $A$ of $\mathbf A$ is:
\begin{align*}
\mathcal L_\theta(A) &\propto \mathbb P_\theta(\mathbf A = A)
\end{align*}
Next, we use the fact that the probability that $\mathbf A = A$ is, in fact, the marginalization (over realizations of $\pmb \tau$) of the joint $(\mathbf A, \pmb \tau)$. In the line after that, we use Bayes' Theorem to separate the joint probability into a conditional probability and a marginal probability:
\begin{align}
&= \int_\tau \mathbb P_\theta(\mathbf A = A, \pmb \tau = \tau)\textrm{d}\tau \nonumber\\
&= \int_\tau \mathbb P_\theta(\mathbf A = A \big | \pmb \tau = \tau)\mathbb P_\theta(\pmb \tau = \tau)\textrm{d}\tau \label{eqn:apost_sbm_eq1}
\end{align}
Let's think about each of these probabilities separately. Remember that for $\pmb \tau$, each entry $\pmb \tau_i$ is sampled independently and identically from $Categorical(\vec \pi)$. The probability mass for a $Categorical(\vec \pi)$-valued random variable is $\mathbb P(\pmb \tau_i = \tau_i; \vec \pi) = \pi_{\tau_i}$. Finally, note that if we are taking the products of $n$ $\pi_{\tau_i}$ terms, many of these values will end up being the same. Consider, for instance, the vector $\tau = [1,2,1,2,1]$. We end up with three terms of $\pi_1$ and two terms of $\pi_2$, and it does not matter which order we multiply them in. Rather, all we need to keep track of are the counts of each $\pi_k$ term. Written another way, we can use the indicator that $\tau_i = k$, given by $\mathbb 1_{\tau_i = k}$, and a running counter over all of the community probability assignments $\pi_k$ to make this expression a little more sensible. We will use the symbol $n_k = \sum_{i = 1}^n \mathbb 1_{\tau_i = k}$ to denote this value:
\begin{align*}
\mathbb P_\theta(\pmb \tau = \tau) &= \prod_{i = 1}^n \mathbb P_\theta(\pmb \tau_i = \tau_i)\;\;\;\;\textrm{Independence Assumption} \\
&= \prod_{i = 1}^n \pi_{\tau_i} \;\;\;\;\textrm{p.m.f. of a Categorical R.V.}\\
&= \prod_{k = 1}^K \pi_{k}^{n_k}
\end{align*}
Next, let's think about the conditional probability term, $\mathbb P_\theta(\mathbf A = A \big | \pmb \tau = \tau)$. Remember that the entries are all independent conditional on $\pmb \tau$ taking the value $\tau$. This means that we can separate the probability of the entire $\mathbf A = A$ into the product of the probabilities edge-wise. Further, remember that conditional on $\pmb \tau_i = \ell$ and $\pmb \tau_j = k$, $\mathbf a_{ij}$ is $Bern(b_{\ell k})$. The distribution of $\mathbf a_{ij}$ does not depend on any of the other entries of $\pmb \tau$. Remembering that the probability mass function of a Bernoulli R.V. is given by $\mathbb P(\mathbf a_{ij}=a_{ij}; p) = p^{a_{ij}}(1 - p)^{1 - a_{ij}}$, this gives:
\begin{align*}
\mathbb P_\theta(\mathbf A = A \big | \pmb \tau = \tau) &= \prod_{j > i}\mathbb P_\theta(\mathbf a_{ij} = a_{ij} \big | \pmb \tau = \tau)\;\;\;\;\textrm{Independence Assumption} \\
&= \prod_{j > i}\mathbb P_\theta(\mathbf a_{ij} = a_{ij} \big | \pmb \tau_i = \ell, \pmb \tau_j = k) \;\;\;\;\textrm{$\mathbf a_{ij}$ depends only on $\tau_i$ and $\tau_j$}\\
&= \prod_{j > i} b_{\ell k}^{a_{ij}} (1 - b_{\ell k})^{1 - a_{ij}}
\end{align*}
Again, we can simplify this expression a bit. Recall the indicator function above. Let $|\mathcal E_{\ell k}| = \sum_{j > i}\mathbb 1_{\tau_i = \ell}\mathbb 1_{\tau_j=k}a_{ij}$, and let $n_{\ell k}= \sum_{j>i}\mathbb 1_{\tau_i = \ell}\mathbb 1_{\tau_j = k}$. Note that $|\mathcal E_{\ell k}|$ is the number of edges between nodes in community $\ell$ and community $k$, and $n_{\ell k}$ is the number of possible edges between nodes in community $\ell$ and community $k$. This expression can be simplified to:
\begin{align*}
\mathbb P_\theta(\mathbf A = A \big | \pmb \tau = \tau) &= \prod_{\ell,k} b_{\ell k}^{|\mathcal E_{\ell k}|}(1 - b_{\ell k})^{n_{\ell k} - |\mathcal E_{\ell k}|}
\end{align*}
Combining these into the integrand from Equation (\ref{eqn:apost_sbm_eq1}) gives:
\begin{align*}
\mathcal L_\theta(A) &\propto \int_\tau \mathbb P_\theta(\mathbf A = A \big | \pmb \tau = \tau)\mathbb P_\theta(\pmb \tau = \tau)\textrm{d}\tau \\
&= \int_\tau \prod_{k = 1}^K \pi_k^{n_k} \cdot \prod_{\ell, k} b_{\ell k}^{|\mathcal E_{\ell k}|}(1 - b_{\ell k})^{n_{\ell k} - |\mathcal E_{\ell k}|}\textrm{d}\tau
\end{align*}
i love it. it's complicated. make it a 'starred subsection' or something.
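If the starred subsection keeps the marginalization, a brute-force numerical check might also help readers: a sketch that sums over every assignment vector for a tiny network (hypothetical helper, not the book's code; only feasible for very small n, since there are K^n terms):

```python
import numpy as np
from itertools import product

def aposteriori_sbm_likelihood(A, pi, B):
    """Likelihood (up to a constant) of A under an a posteriori SBM,
    computed by summing over all K^n community-assignment vectors."""
    A, pi, B = np.asarray(A), np.asarray(pi), np.asarray(B)
    n, K = A.shape[0], len(pi)
    total = 0.0
    for tau in product(range(K), repeat=n):
        tau = np.asarray(tau)
        term = np.prod(pi[tau])                     # P(tau) = prod_i pi_{tau_i}
        for i in range(n):
            for j in range(i + 1, n):               # undirected, loopless: j > i
                b = B[tau[i], tau[j]]
                term *= b ** A[i, j] * (1 - b) ** (1 - A[i, j])
        total += term
    return total
```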
-
$\int_\tau$
make it a sum
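A possible sum form (a sketch, assuming the marginalization runs over the finite set of assignment vectors $\tau \in \{1, \dots, K\}^n$ rather than an integral):
\begin{align*}
\mathcal L_\theta(A) &\propto \sum_{\tau \in \{1, \dots, K\}^n} \mathbb P_\theta(\mathbf A = A \big | \pmb \tau = \tau)\,\mathbb P_\theta(\pmb \tau = \tau) \\
&= \sum_{\tau \in \{1, \dots, K\}^n} \prod_{k = 1}^K \pi_k^{n_k} \prod_{\ell, k} b_{\ell k}^{|\mathcal E_{\ell k}|}(1 - b_{\ell k})^{n_{\ell k} - |\mathcal E_{\ell k}|}
\end{align*}
where $n_k$, $|\mathcal E_{\ell k}|$, and $n_{\ell k}$ are all computed from the particular $\tau$ in each summand.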
-
$|\mathcal E_{\ell k}|$
use m_lk
-
$n_k = \sum_{i = 1}^n \mathbb 1_{\tau_i = k}$
spell out in words what n_k is
-
$\pmb \tau$
can put paragraph back in, but must introduce latent variables
-
$\mathcal A_n$
didn't render properly
-
Theory
motivation
-
$= \prod_{j > i} \mathbb P_\theta(\mathbf a_{ij} = a_{ij})$
please write out the left side of this equation
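e.g., something along the lines of (the left-hand side is implied by the preceding line of the derivation):
\begin{align*}
\mathbb P_\theta(\mathbf A = A) &= \prod_{j > i} \mathbb P_\theta(\mathbf a_{ij} = a_{ij})
\end{align*}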
-
model
model is not a random process, etc.
-
-
docs.neurodata.io
-
Fig. 3.1 The MASE algorithm
i'd elaborate here showing that we went from graph layouts to adjacency matrices. and explain the colors.
-
Well, you could embed
this paragraph should be about considering the many different ways one could do 'joint embedding'
see figure from cep's paper, probably include a version of it.
include just averaging graphs and then embedding the average, which is optimal in the absence of across-graph heterogeneity
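For the "average the graphs, then embed the average" baseline, a minimal numpy sketch (toy data and hypothetical variable names; graspologic's spectral embedding could be substituted for the eigendecomposition):

```python
import numpy as np

rng = np.random.default_rng(0)
n, M, d = 50, 4, 2
# toy collection: M symmetric adjacency matrices on the same n nodes
graphs = [np.triu(rng.binomial(1, 0.2, (n, n)), 1) for _ in range(M)]
graphs = [G + G.T for G in graphs]

Abar = np.mean(graphs, axis=0)                 # entry-wise average network

# embed the average: top-d spectral embedding
evals, evecs = np.linalg.eigh(Abar)
top = np.argsort(np.abs(evals))[::-1][:d]      # keep the d largest-magnitude eigenvalues
latents = evecs[:, top] * np.sqrt(np.abs(evals[top]))   # one d-dimensional point per node
```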
-
The goal of MASE is to embed the networks into a single space, with each point in that space representing a single node
not quite
-
However, what youβd really like to do is combine them all into a single representation to learn from every network at once.
not necessarily
-
- Mar 2021
-
docs.neurodata.io
-
Non-Assortative Case
move it later
-
silhouette
replace with BIC, talk to tingshan
-
section
which is a computationally efficient line-search approach
-
K-means
lower case k
-
Lloyd2
Lloyd made up Lloyd's algorithm for approximately solving k-means
-
a searching procedure until all the cluster centers are in nice places
when no points move clusters from one iteration to the next
-
essentially random places in our data
k-means++ does not do this, the cluster centers are far from one another
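For whoever rewrites that passage, a bare-bones sketch of Lloyd's algorithm with a simple farthest-point seeding (in the spirit of k-means++, which samples proportionally to squared distance rather than taking the max; hypothetical function, not the book's code):

```python
import numpy as np

def lloyd_kmeans(X, k, seed=None):
    """Approximate k-means via Lloyd's algorithm.
    Stops when no point changes cluster between two iterations."""
    rng = np.random.default_rng(seed)
    # seeding: first center at random, then repeatedly take the point
    # farthest from its nearest already-chosen center
    centers = [X[rng.integers(len(X))]]
    while len(centers) < k:
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers, dtype=float)

    labels = np.full(len(X), -1)
    while True:
        # assignment step: each point goes to its nearest center
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # no point moved: converged
            return centers, labels
        labels = new_labels
        # update step: each center moves to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
```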
-
faster implementation
remove
-
1
this equation is still wrong, needs a transpose
-
covariates
explain why we have ~50 dimensions, especially given that we are only embedding into 2
-
from statistics
not a necessary clause.
-
Stochastic Block Model
i always want: words --> math --> figure
-
4.2359312775571826e-05 and a maximum weight of 40.00562586173658.
just use 2 sig digs
-
CASE simply sums these two matrices together, using a weight for $XX^T$ so that they both contribute an equal amount of useful information to the result.
CASE is a weighted sum
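A sketch of the weighted sum itself may make this concrete (toy data; `L` is a regularized Laplacian and `X` a node-covariate matrix, and the eigenvalue-matching weight below is just one plausible balancing choice, not necessarily the one the book uses):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
A = np.triu(rng.binomial(1, 0.1, (n, n)), 1)
A = A + A.T                                        # toy symmetric adjacency matrix
tau = A.sum(axis=1).mean()                         # average-degree regularizer
D_inv_sqrt = np.diag(1 / np.sqrt(A.sum(axis=1) + tau))
L = D_inv_sqrt @ A @ D_inv_sqrt                    # regularized Laplacian
X = rng.normal(size=(n, d))                        # toy node covariates

LL = L @ L                                         # connectivity term
XXt = X @ X.T                                      # covariate-similarity term
# pick the weight so the two terms are on comparable scales,
# here by matching their leading eigenvalues
a = np.linalg.eigvalsh(LL)[-1] / np.linalg.eigvalsh(XXt)[-1]
case_matrix = LL + a * XXt                         # the matrix CASE embeds
```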
-
$XX^T_{i, j}$
is not valid notation.
-
which we denote here by L for brevity
-
.
cite CASC paper here
-
them
"then plots the results, color coding each node by its true community"
-
best
avoid 'best'
-
-
docs.neurodata.io
-
With a single network observed (or really, any number of networks we could collect in the real world) we would never be able to estimate $2^{n^2}$ parameters. The number grows too quickly with $n$ for any realistic choice of $n$ in real-world data. This would lead to a thing called a lack of identifiability with a single network, which means that we would never be able to estimate $2^{n^2}$ parameters from $1$ network.
unclear what you mean. MLE is [1,0,....,0]
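One concrete way to phrase the comment (a sketch of the reasoning, assuming the single observed network is $A^{(1)}$ and the model places no constraints on the distribution): the maximum likelihood estimate simply memorizes the observation,
\begin{align*}
\hat{\mathbb P}(A) &= \begin{cases} 1 & A = A^{(1)} \\ 0 & \textrm{otherwise},\end{cases}
\end{align*}
i.e., the vector $[1, 0, \dots, 0]$ once $\mathcal A_n$ is ordered with the observed network first. The estimate exists, but it says nothing about networks we did not observe.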
-
, we would need about 30,000,000 times the total number of storage available in the world to represent the parameters of a single distribution.
confused
-
We use a semi-colon to denote that the parameters $\theta$ are supposed to be fixed quantities for a given $\mathbf A$.
remove
-
What is the most natural choice for $\mathcal P(\Theta)$ that makes any sense?
remove
-
It is, in general, good for $\mathcal P(\Theta)$ to be fairly rich; that is, when we specify a parametrized statistical model $(\mathcal A_n, \mathcal P(\Theta))$, we want $\mathcal P(\Theta)$ to contain distributions that we think faithfully could represent our network realization $A$
remove
-
Note that by construction, we have that $\left|\mathcal P(\Theta)\right| = \left|\Theta\right|$. That is, the two sets have the same number of elements, since each $\theta \in \Theta$ has a particular distribution $\mathbb P_\theta \in \mathcal P(\Theta)$, and vice-versa.
remove
-
$\left|\mathcal P(\Theta)\right| = \left|\Theta\right|$
define notation
-
So, now we know that we have probability distributions on networks, and a set $\mathcal A_n$ which defines all of the adjacency matrices that every probability distribution must assign a probability to. Now, just what is a single network model? The single network model is the tuple $(\mathcal A_n, \mathcal P)$. Above, we learned that $\mathcal A_n$ was the set of all possible adjacency matrices for unweighted networks with $n$ nodes. We will call $\mathcal A_n$ the sample space of $n$-node networks. In general, $\mathcal A_n$ will be the same sample space for all $n$-node network models. This means that for any $n$-node network realization $A$, we can calculate a probability that $A$ is described by any probability distribution on $\mathcal A_n$ found in $\mathcal P$. What is $\mathcal P$? It depends on the model we want to use! In general, $\mathcal P$ has only one rule: it is a nonempty set (it contains at least something), where for every $\mathbb P \in \mathcal P$, $\mathbb P$ is a probability distribution on $\mathcal A_n$. Note that this says only that $\mathcal P$ cannot be empty, but it doesn't say anything about how big or diverse it can be! In general, we will simplify $\mathcal P$ through something called parametrization; that is, we will write $\mathcal P$ as the set:
don't love it
-
$\left\{A : A \in \{0, 1\}^{n \times n}\right\}$
prob remove this
-
When you see the short-hand expression $\mathbb P(A)$, you should typically think back to the most recent random network $\mathbf A$ that has been discussed, and it is typically assumed that $\mathbb P$ refers to the probability distribution of that random network; e.g., $\mathbf A \sim \mathbb P$.
delete
-
Is this set the same for any unweighted random network, or is it ever different? It turns out that the answer here is fairly straightforward: for any unweighted random network with $n$ nodes, the set of possible realizations, which we represent with the symbol $\mathcal A_n$, is exactly the same!
i think this is more confusing than helpful.
-
possibly
remove
-
if $\mathbf A$ is an unweighted random network with $n$ nodes,
not necessary
-
topology
topology
-
random network $\mathbf A$
network-valued random variable \mathbf{A}
-
topology
remove
-
$B$
why is it symmetric?
-
This is an especially common approach when people deal with networks that are said to be sparse. A sparse network is a network in which the number of edges is much less than the total possible number of edges. This contrasts with a dense network, which is a network in which the number of edges is close to the maximum number of possible edges. In the case of an $ER_n(p)$ network, the network is sparse when $p$ is small (closer to $0$), and dense when $p$ is large (closer to $1$).
why talk about sparse? if so, let's dedicate a section to it, not have it here.
sparse can mean lots of different things:
- computationally sparse, meaning that storing the graph as an edge list is smaller than as an adjacency matrix (see the sketch after this list).
- the probability of an edge scales with n, rather than n^2. Thus, p must be a function of n. This is an asymptotic claim, and therefore, does not make sense to apply to any given network.
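A quick illustration of the first (computational) sense, using scipy's sparse matrices (a sketch; the graph here is just a random directed example for the storage comparison, not data from the book):

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)
n, p = 2000, 0.001                      # few edges relative to n^2
A_dense = (rng.random((n, n)) < p).astype(np.uint8)   # toy directed graph

A_sparse = sparse.csr_matrix(A_dense)   # stores only the nonzero entries
dense_bytes = A_dense.nbytes            # ~n^2 bytes, regardless of edge count
sparse_bytes = (A_sparse.data.nbytes
                + A_sparse.indices.nbytes
                + A_sparse.indptr.nbytes)
print(dense_bytes, sparse_bytes)        # dense ~4,000,000 bytes; sparse only tens of KB
```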
-
Probability that an edge exists between a pair of vertices
iid on edges
-
0.3
use the same notation in the title as in the rest of the paper; we write
ER_n(p)
throughout the text, so let's also use it in the title.
-
True
false
-
True
undirected
-
ps
i'd use p
-
same
i'd write
"is the same number, n*p
-
The
also need to be able to estimate parameters of the model
-
unlikely
impossible
-
the model
model is a set
-
framework
not a framework
-
.3
always write 0.3 instead of .3
-
Structured Independent Edge Model (SIEM)
i think this should go at the end, ie, right before IER, since it is a different special case?
-
_{s}
remove _s i think
-
$\pmb A \sim ER_n(p)$
write this out by factorizing, eg, show
A ~ Bern(P) = \prod_ij Bern(p_ij) = \prod_ij Bern(p)
and explain why
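One way that write-up might go (a sketch; assumes an undirected, loopless network, so the product runs over pairs $j > i$):
\begin{align*}
\mathbb P_\theta(\mathbf A = A) &= \prod_{j > i} \mathbb P(\mathbf a_{ij} = a_{ij})\;\;\;\;\textrm{edges are independent} \\
&= \prod_{j > i} p_{ij}^{a_{ij}}(1 - p_{ij})^{1 - a_{ij}}\;\;\;\;\textrm{each } \mathbf a_{ij} \sim Bern(p_{ij}) \\
&= \prod_{j > i} p^{a_{ij}}(1 - p)^{1 - a_{ij}}\;\;\;\;\textrm{ER: every } p_{ij} = p
\end{align*}
so the ER model is the special case of edge-wise Bernoullis in which every edge shares the same probability $p$; the product then collapses to $p^m(1-p)^{\binom{n}{2} - m}$, where $m$ is the number of edges in $A$.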
-
-
docs.neurodata.io
-
Model Selection: The model is appropriate for the data w
appropriateness
-
Machine Learning
is this capitalized?
-
underlies
nix for govern, everywhere maybe?
-
underlying
remove
-
Stated another way, even if we believe that the process underlying the network isnβt random at all, we can still extract value by using a statistical model.
clarify
-
underlies
governs
-
statistics
ML
-
statistics
replace everywhere with network machine learning.
-
Comparing
somewhere in here we need to remind people of our notational conventions.
-
lives
and also talk about missing people, people with multiple accounts (eg, famous people have personal and pro accounts), etc. also do that above.
-
quantity
variable
-
not
on a specific social media site
-
random
if we are assuming iid, we should say it here.
-
For instance, we might think that
In this simple example,
-
Instead
remove
-
this
remove
-
discrimative modeling,
is discriminative always classification? seems not, please clarify
-
relating to how the network behaves
about properties of the network
-
network
and potential network, node, and edge attributes
-
statistics
network ML
-
$d$-dimensions
i don't want to corner stats into Euclidean stats. The 'problem' with classical stats with regards to network machine learning is that it doesn't tell us how to leverage the structure of a network efficiently
-
presentation
representation
-
well
remove
-
is that we have
is concerned with
-
reaization
realization.
This sentence is not right, however. We assume that our observed network is merely a realization of a random network.
-
with which we seek
we use
-
Statistical modelling is a technique in which
In statistical modeling
-
Perhaps
what about missing people from the social network? that is a big one
-
perhaps
remove
-
is filled with
includes much
-
might
remove
-
the question that we
we may
-
A common question as a scientist that we might have is how, exactly, we could describe the network in the simplest way possible
this is just not where to start
-
Topology
simple
-
before
no time yet please
-
are
correspond to
or
represent
-
are
correspond to
or
represent
-
topology of a network
simple network
-
$\mathbf A$
A is realization of a vector/matrix \mathbf{A} is vector/matrix valued RV
a is scalar realization \mathbf{a} is a scalar valued RV
a_ij is the realization of an edge
-
$\pmb a$
i don't think we can use 'a', A is observed, and \mathbf{A} is random variable
-
$x_i$
should be x not x_i since we use X later.
-
value
whereas x takes on two possible values: heads or tails
-
$x$
but x is not, it is the actual observed realization of a coin flip
-
generative
we also use discriminative modeling, eg, signal subgraph
-
might have a slightly different group of friends depending on when we look at their friend circle
let's not introduce time varying yet
-
we assume that the true network is a network for which we could never observe completely, as each time we look at the network, we will see it slightly differently.
we have measurement error and other sources of uncertainty in our data. that is an empirical fact. let's preserve the word assumption for explicit model assumptions.
-
The way we characterize the network is called the choice of an underlying statistical model.
grammar
-
if we know people within the social network are groups of student
grammar
-
-
docs.neurodata.io
-
What Is A Network?
- simple network
- weighted
- loopless
- directed
- attributed
then we discuss different representations of networks again
- edge list
- adjacency matrix
- various laplacians.
maybe in its own subsection
-
-
docs.neurodata.io
-
Preface
add pretty cover art from @pedigo
-
-
docs.neurodata.io
-
$LL + aXX^T$
make latex work :)
-
- Oct 2020
-
-
The p-value determines the probability of observing data (or more extreme results), contingent on having assumed the null-hypothesis is true. The formal definition can be expressed as follows: $P(X \geq x \mid H_0)$ or $P(X \leq x \mid H_0)$,
I'm sure you know this, but this sentence, as written, I do not think is quite right. Assume that X is a random variable, and x is a realization of that random variable, and we sample n times identically and independently from some true but unknown distribution P. Then, choose a test statistic, T, which maps from the data (X1, ..., Xn) to a scalar t. Now, we can define the p-value as the probability of observing data with a test statistic as or more extreme than the observed, contingent on having assumed the null-hypothesis is true.
The fact that there is a test statistic in there, I think, is incredibly important, because obviously (to you), if one chooses a different test statistic, one can obtain a different p-value.
This is also important for your PV8, where "result" is ambiguous. The result here implicitly refers to the test statistic, which was not previously mentioned. It is easy to mistakenly believe that the 'result' somehow magically implies something about 'the data'. For example, anecdotally, I often find that people think a big p-value on a t-test implies no effect, whereas had they used a robust test, or tested for a change in variance rather than the mean, the effect is clear.
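A small illustration of that last point (a sketch using scipy; the data are simulated, not from any cited study): two samples with equal means but different variances, where a t-test reports a large p-value while a variance-sensitive test reports a small one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, scale=1.0, size=200)   # mean 0, sd 1
y = rng.normal(loc=0.0, scale=3.0, size=200)   # mean 0, sd 3

t_stat, p_mean = stats.ttest_ind(x, y, equal_var=False)   # test statistic sensitive to a mean shift
w_stat, p_var = stats.levene(x, y)                         # test statistic sensitive to a variance change

print(f"t-test p-value:   {p_mean:.3f}")   # typically large: no mean effect
print(f"Levene's p-value: {p_var:.3g}")    # typically tiny: clear variance effect
```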
-