519 Matching Annotations
  1. Mar 2023
    1. One of the most popular statistics to use to determine sparsity in realized networks is the network density, but there are many others that have their own advantages [7], [8].

      delete?
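
      For reference, the density statistic is just the fraction of possible edges that are present. A minimal sketch, assuming an undirected, loopless network stored as a numpy adjacency matrix (networkx.density computes the same quantity):

      ```python
      import numpy as np

      def network_density(A):
          """Fraction of possible edges present in an undirected, loopless network."""
          n = A.shape[0]
          n_edges = np.triu(A, k=1).sum()  # count each undirected edge once
          n_possible = n * (n - 1) / 2     # n choose 2 potential edges
          return n_edges / n_possible

      A = np.array([[0, 1, 1],
                    [1, 0, 0],
                    [1, 0, 0]])
      print(network_density(A))  # 2 of 3 possible edges -> 0.667
      ```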

    2. Unfortunately, Fisher’s exact test has a slight caveat: it can be extremely computationally intensive to compute, especially when the number of data observations that we have (in this case, 200,000) is really big (it could be even bigger than 200,000).

      no
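
      Whatever the verdict on the claim, a pointer to the actual call might anchor the discussion. A minimal sketch with scipy; the 2x2 counts are made up:

      ```python
      from scipy.stats import fisher_exact

      # hypothetical 2x2 contingency table:
      # rows are the two groups, columns are edge present / edge absent
      table = [[15, 185],
               [40, 160]]
      odds_ratio, p_value = fisher_exact(table)
      print(odds_ratio, p_value)
      ```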

    3. Let’s assume that we have a small task, where for each row in the matrix X, we want to compute the row-wise sum. Stated another way, for a given row i, the quantity that you want to compute is $\sum_{j=1}^m x_{ij}$. If you ignore sparsity altogether, you can do this operation pretty easily: there are n rows, and m terms that you need to add together for each row, which means that you will have n⋅m total operations to perform (for each of n rows, perform an addition involving m terms).

      no m
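
      To make the n⋅m operation count concrete, a sketch of the naive computation the passage describes (the shapes here are illustrative):

      ```python
      import numpy as np

      n, m = 5, 4
      X = np.arange(n * m, dtype=float).reshape(n, m)

      # naive row-wise sum: for each of the n rows, add up all m entries,
      # giving n * m additions regardless of how many entries are zero
      row_sums = np.zeros(n)
      for i in range(n):
          for j in range(m):
              row_sums[i] += X[i, j]

      assert np.allclose(row_sums, X.sum(axis=1))
      ```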

    4. If the rows can be sparse, the columns could be too; let’s assume that we have a matrix where m′ of the columns are sparse. Following a similar approach to the above, if we had a list Y with m′ elements telling us which columns were not sparse, we could just store the m′ non-sparse columns (each of which has n rows), and then the list of the m′ non-zero elements. Like above, we can store this information with 64⋅(n⋅m′+m′+1) bits.

      necessary

    5. Let’s say that of these n rows, we know ahead of time that a lot of the rows are sparse. By “row sparse”, what we mean is that $x_{ij} = 0$ for all of these sparse rows i. Let’s assume that of the n total rows, only n′ are not sparse. We could, for instance, store the non-sparse rows in a little set X which has n′ elements telling us which rows are not sparse. For these non-sparse rows, we store all m pieces of column-wise information, but for the sparse rows, we just ignore them entirely. To store this entire matrix, we will need 64⋅(n′⋅m) (64 bits for each entry of a non-sparse row) +64⋅n′ (64 bits to store each element of X) +64 (to store the total number of rows that the matrix has), for a total of 64⋅(n′⋅m+n′+1) bits.

      matrix sparsity
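
      However items 4 and 5 get resolved, a small worked example of the row-sparse bookkeeping might help readers. A sketch with a made-up matrix; variable names follow the text:

      ```python
      import numpy as np

      # a matrix where most rows are entirely zero ("row sparse")
      A = np.array([[0., 0., 0., 0.],
                    [2., 0., 5., 1.],
                    [0., 0., 0., 0.],
                    [0., 3., 0., 0.]])
      n, m = A.shape

      # the set X of non-sparse row indices, with n' elements
      nonsparse = [i for i in range(n) if A[i].any()]
      stored_rows = A[nonsparse]  # the n' x m block we actually keep

      # storage cost in bits, per the text: 64 * (n' * m + n' + 1)
      n_prime = len(nonsparse)
      print(64 * (n_prime * m + n_prime + 1))  # 64 * (8 + 2 + 1) = 704 bits
      ```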

  2. May 2022
    1. Examples

      problems in NML:

      • pizza hut nodes
      • pendants
      • disconnected networks
      • directed networks
      • big weights

      1 para per ch. 4 thing, but also re-read HOML and check if any are easily portable (e.g., too small a network, or too dense a network)

    1. whether or not the approach can be used in isolation from a statistical model (non-model based or model-based network learning systems).

      add a paragraph about edge vs node vs community vs network

    1. As the internet became widespread and coding tools became easier to use – Python became prevalent in machine learning, for instance, and cloud computing came into its own with Amazon’s AWS and Microsoft’s Azure –

      delete

    2. One crucially influential application for networks was in 1996, when a graduate student at Stanford named Larry Page made the PageRank algorithm. The idea was that websites on the internet (which, in 1996, had barely formed) could be ordered into a hierarchy by “link popularity”: a web page would rank higher the more links there were to it. Larry Page and his friend Sergey Brin realized that PageRank could be used to create a search engine – and so they used the PageRank algorithm to found a small web searching company they called Google.

      this paragraph is redundant

    3. Fig

      'network population' --> 'network population assumption'

      'network sample = data'

      'network machine learning' <-- 'learn about the network sample'

      to the right is 'guess about some property of network population'

    4. , and although this book doesn’t focus on GNNs specifically, it does give you the fundamental ideas that you can build off of to understand them.

      . This book provides the basic foundational concepts and intuition required to understand how, when, and why GNNs, and other network machine learning tools, work.

    1. Broadly

      replace ML with 'statistical learning'

      add pointer to ML which is the overlap of SL + DS

      add pointer graph theory = overlap of NS + DS

    1. There will be the same number of adjacency matrices as there are time points, since our network will be changing over time.

      confusing
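
      One way to de-confuse the sentence is to show the data structure directly: the network over T time points is just a length-T list of adjacency matrices. A sketch with illustrative shapes:

      ```python
      import numpy as np

      n, T = 10, 5  # 10 nodes observed at 5 time points
      rng = np.random.default_rng(0)

      networks = []
      for t in range(T):
          # sample a symmetric, loopless adjacency matrix for time point t
          upper = np.triu(rng.binomial(1, 0.2, size=(n, n)), k=1)
          networks.append(upper + upper.T)

      assert len(networks) == T  # one adjacency matrix per time point
      ```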

    1. If we consider the worst possible case (every edge in $A$ does not exist in $B$), $A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, $A - B = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$, $\|A - B\|_F^2 = 6$

      seems unnecessary

    2. $A = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$, $B = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{bmatrix}$, $A - B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, $\|A - B\|_F^2 = 0$

      this seems unnecessary
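
      If both worked examples get cut, a short numpy check could replace them. A sketch:

      ```python
      import numpy as np

      A = np.array([[0, 1, 1],
                    [1, 0, 1],
                    [1, 1, 0]])
      B = np.zeros((3, 3))

      # squared Frobenius norm of the difference: the sum of squared entries,
      # which is 6 in this worst case and 0 when A and B are identical
      print(np.linalg.norm(A - B, ord="fro") ** 2)
      ```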

    1. Let’s formalize this situation a little bit more. We have the following three hypotheses: $H_0: p_1 = p_2 = p_3 = a$, against $H_1: p_1 = p_2 = a$, but $p_3 = c$. Finally, we have $H_2: p_1 = a$, $p_2 = b$, and $p_3 = c$. The hypothesis $H$ is nested in the hypothesis $H'$ if whenever $H$ is true, $H'$ is also true. In this sense, the hypothesis $H'$ is said to contain the hypothesis $H$. Let’s consider $H_0$ and $H_1$, for instance. Notice that if $H_0$ is true, then $p_1 = p_2 = p_3 = a$. However, $H_1$ is also true, since $p_1 = p_2 = a$, and $p_3 = c$ can also be set equal to $p_1$ and $p_2$ if $c = a$. A sequence of hypotheses $H_0, H_1, ..., H_n$ is called sequentially nested if $H_0$ is nested in $H_1$, which is nested in $H_2$, so on and so forth, up to $H_{n-1}$ being nested in $H_n$. Note that the sequence of hypotheses that we presented for our three-coin example is sequentially nested. We already saw that $H_0$ was nested in $H_1$. Now, let’s compare $H_2$ to $H_1$. Note that if $a = b$, then $p_1 = p_2$, and $p_3 = c$, exactly as in $H_1$, so $H_1$ is nested in $H_2$. Therefore, since $H_0$ is nested in $H_1$ and $H_1$ is nested in $H_2$, the sequence $H_0$, $H_1$, $H_2$ is sequentially nested.

      dense.

      draw a diagram
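
      For the diagram, even a one-line schematic of the growing parameter sets might do. A sketch in LaTeX:

      ```latex
      % each hypothesis's parameter set is contained in the next,
      % so the sequence is sequentially nested
      \[
      \underbrace{\{p_1 = p_2 = p_3 = a\}}_{H_0}
      \subset
      \underbrace{\{p_1 = p_2 = a,\; p_3 = c\}}_{H_1}
      \subset
      \underbrace{\{p_1 = a,\; p_2 = b,\; p_3 = c\}}_{H_2}
      \]
      ```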

    1. . Unfortunately, if the data is not well-summarized by a normal distribution, the $t$-test tends to be a fairly poor choice for hypothesis testing.

      not quite right

    2. 8.2.1. The Structured Independent Edge Model is parametrized by a Cluster-Assignment Matrix and a probability vector

      this is a model, so goes in ch. 5

    3. higher chance two students are friends if they go to the same school than if they go to two different schools.

      RDPG must find this.

      so, use the GRDPG or a different model/hypothesis
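
      To back this note up: an RDPG can only represent a block probability matrix that is positive semidefinite, so an eigenvalue check shows when the GRDPG (or another model) becomes necessary. A sketch with made-up probabilities:

      ```python
      import numpy as np

      # same-school probability higher (assortative) vs. lower (disassortative)
      B_assort = np.array([[0.12, 0.03],
                           [0.03, 0.12]])
      B_disassort = np.array([[0.03, 0.12],
                              [0.12, 0.03]])

      for name, B in [("assortative", B_assort), ("disassortative", B_disassort)]:
          eigvals = np.linalg.eigvalsh(B)
          # a factorization B = X X^T (an RDPG) exists only if B is PSD;
          # a negative eigenvalue means the GRDPG is required instead
          print(name, eigvals, "RDPG ok" if (eigvals >= 0).all() else "needs GRDPG")
      ```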

  3. Apr 2022
    1. 8.1.1.2. Evaluating

      the interesting thing for k-means, silhouette, ARI, etc. is showing them in a graph, and showing when they get it wrong.

      and then showing AutoGMM gets it right.
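
      For the "show when they get it wrong" comparison, the scorers are one import away. A minimal sketch with made-up clusters:

      ```python
      import numpy as np
      from sklearn.cluster import KMeans
      from sklearn.metrics import adjusted_rand_score, silhouette_score

      rng = np.random.default_rng(0)
      X = np.vstack([rng.normal(0, 0.5, (50, 2)),   # made-up cluster 1
                     rng.normal(3, 0.5, (50, 2))])  # made-up cluster 2
      y_true = np.array([0] * 50 + [1] * 50)

      y_pred = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

      # ARI compares predictions to the truth; silhouette needs no truth
      print(adjusted_rand_score(y_true, y_pred))
      print(silhouette_score(X, y_pred))
      ```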

    2. these nodes tend to be more connected (more edges exist between and amongst them)

      communities are groups of nodes that are stochastically equivalent.

    1. However, as you can see, the colors are flipped: the communities are in different places relative to each other.

      this doesn't make any sense.

      also, label communities L and R not 0 and 1.

    2. one

      before this, show the true latent positions, label them Lhuman, Rhuman, Lalien, Ralien. maybe all on one coordinate axis.

      consider showing that they are not rotations of one another.
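
      A sketch of what this note proposes; the latent position values here are hypothetical placeholders (the real ones would come from the chapter's simulation parameters):

      ```python
      import matplotlib.pyplot as plt

      # hypothetical true latent positions for each community/species pair
      positions = {
          "Lhuman": (0.8, 0.2),
          "Rhuman": (0.2, 0.8),
          "Lalien": (0.9, 0.4),
          "Ralien": (0.4, 0.9),
      }

      fig, ax = plt.subplots()
      for label, (x, y) in positions.items():
          ax.scatter(x, y)
          ax.annotate(label, (x, y))
      ax.set_xlabel("Dimension 1")
      ax.set_ylabel("Dimension 2")
      ax.set_title("True latent positions on one coordinate axis")
      plt.show()
      ```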

    3. P = np.array([[pa, pb], [pc, pd]])
       return sbm([n, n], P, return_labels=return_labels)

       # make nine human networks
       # and nine alien networks
       p1, p2, p3 = .12, .06, .03

      too many parameters and don't write 9 unless you sample 9
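
      A possible shape for the rewrite this note asks for: one within- and one between-community probability per species, and a loop that samples exactly the nine networks the text mentions. Assumes graspologic's sbm; the probability values are placeholders:

      ```python
      import numpy as np
      from graspologic.simulations import sbm

      n = 50  # nodes per community

      # placeholder probabilities; only the within-community value differs
      P_human = np.array([[.12, .03],
                          [.03, .12]])
      P_alien = np.array([[.06, .03],
                          [.03, .06]])

      # sample exactly as many networks as the text claims
      human_networks = [sbm([n, n], P_human) for _ in range(9)]
      alien_networks = [sbm([n, n], P_alien) for _ in range(9)]
      ```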

    1. Note

      One cannot get arbitrary densities if one has repeated values for weights unless one has a procedure for discarding replicates.
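
      A sketch of the tie problem, using a made-up weight vector: the repeated weights cross any threshold together, so some target densities are unreachable without a tie-breaking procedure.

      ```python
      import numpy as np

      # upper-triangle edge weights with many repeated values
      weights = np.array([1., 1., 1., 1., 2., 3.])

      target_density = 0.5  # we would like to keep half the edges
      threshold = np.quantile(weights, 1 - target_density)  # = 1.0

      # the four tied weights move together: achievable densities jump
      # from 2/6 straight to 6/6, so 0.5 cannot be hit exactly
      print((weights > threshold).mean())   # 0.333...
      print((weights >= threshold).mean())  # 1.0
      ```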

    1. All you need to get out of this code is that you have six networks from the first group, and another twelve networks from the second.

      why not 5 and 5? or 10 and 10?

    2. Nodes plotted as 2d points

      Each node is a point.

      Add a caption to this figure:

      Each point is a node displayed in 2D latent space. Because there are 20 nodes in this graph, there are 20 points in this figure. Because there are 10 nodes in each community, each color is shared by 10 points, indicating which community each node belongs to.

    3. that

      Clarify: the issue is not computing these features, but rather interpreting them, and in particular, interpreting them in a causal light.

    4. If you’re familiar with correlation, you’ll notice that these correlation numbers generally have a pretty high magnitude: each feature generally tells you a lot about each other feature.

      not quite: some are high, some are low, and only some say a lot about the others.