519 Matching Annotations
  1. Mar 2023
    1. One of the most popular statistics to use to determine sparsity in realized networks is the network density, but there are many others that have their own advantages [7], [8].

      delete?

    2. infinite

      doesn't need to be infinite, I don't think

    3. $\vec{x}^{(n)}$, this quantity could be written: $\mathbb{x}$

      why does 'x' look different

    4. ip:

      no . after Pr

    5. element $x_i^{(n)}$ is a random

      why superscript

    6. 9.3.1.3. The algorithmic implications#

      give example with a big Matvec Op
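
      One possible sketch for that example (assuming scipy.sparse is in scope for this chapter; sizes are illustrative):

        import numpy as np
        from scipy import sparse

        n = 100_000
        # A large, very sparse matrix: a dense matvec touches all n*n entries,
        # while a CSR matvec only touches the nonzero ones.
        A = sparse.random(n, n, density=1e-4, format="csr", random_state=0)
        x = np.random.rand(n)
        y = A @ x  # work scales with the number of nonzeros, not with n^2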

    7. Unfortunately, Fisher’s exact test has a slight caveat: it can be extremely computationally intensive to compute, especially when the number of data observations that we have (in this case, 200,000) is really big (it could be even bigger than 200,000).

      no
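
      If this gets rewritten, a quick sanity check supports the "no" (a sketch assuming scipy; the table is hypothetical): for a single 2x2 table, fisher_exact is fast in practice even with ~200,000 observations.

        from scipy.stats import fisher_exact

        # Hypothetical 2x2 contingency table with 200,000 total observations.
        table = [[60_000, 40_000], [55_000, 45_000]]
        odds_ratio, p_value = fisher_exact(table)  # fast for 2x2 tables
        print(odds_ratio, p_value)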

    8. SNP 1, alternative base T

      SNP 300M

    9. Let’s assume that we have a small task, where for each row in the matrix $X$, we want to compute the row-wise sum. Stated another way, for a given row $i$, the quantity that you want to compute is $\sum_{j=1}^m x_{ij}$. If you ignore sparsity all-together, you can do this operation pretty easily: there are $n$ rows, and $m$ terms that you need to add together for each row, which means that you will have $n \cdot m$ total operations to perform (for each of $n$ rows, perform an addition involving $m$ terms).

      no m
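
      A sketch of the two cost regimes (assuming numpy/scipy; shapes are illustrative):

        import numpy as np
        from scipy import sparse

        X = np.random.rand(1000, 500)
        row_sums = X.sum(axis=1)  # dense: touches all n*m entries

        X_sp = sparse.random(1000, 500, density=0.01, format="csr", random_state=0)
        row_sums_sp = np.asarray(X_sp.sum(axis=1)).ravel()  # touches only nonzeros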

    10. If the rows can be sparse, the columns could be too; let’s assume that we have a matrix where $m'$ of the columns are sparse. Following a similar approach to the above, if we had a list $Y$ with $m'$ elements telling us which columns were not sparse, we could just store the $m'$ non-sparse columns (each of which has $n$ rows), and then the list of the $m'$ non-zero elements. Like above, we can store this information with $64 \cdot (n \cdot m' + m' + 1)$ bits.

      necessary

    11. Let’s say that of these $n$ rows, we know ahead of time that a lot of the rows are sparse. By “row sparse”, what we mean is that $x_{ij} = 0$ for all of these sparse rows $i$. Let’s assume that of the $n$ total rows, only $n'$ are not sparse. We could, for instance, store the non-sparse rows in a little set $X$ which has $n'$ elements telling us which rows are not sparse. For these non-sparse rows, we store all $m$ pieces of column-wise information, but for the sparse rows, we just ignore them entirely. To store this entire matrix, we will need $64 \cdot (n' \cdot m)$ (64 bits for each entry of a non-sparse row) $+ 64 \cdot n'$ (64 bits to store each element of $X$) $+ 64$ (to store the total number of rows that the matrix has), for a total of $64 \cdot (n' \cdot m + n' + 1)$ bits.

      matrix sparsity
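
      A tiny worked check of the two storage formulas from these paragraphs (sizes are illustrative):

        # Storage in bits, following the formulas in the text.
        n, m = 1000, 1000      # matrix dimensions
        n_prime = 50           # number of non-sparse rows
        m_prime = 50           # number of non-sparse columns

        full = 64 * n * m                              # 64,000,000 bits
        row_sparse = 64 * (n_prime * m + n_prime + 1)  # 3,203,264 bits
        col_sparse = 64 * (n * m_prime + m_prime + 1)  # 3,203,264 bits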

    12. the rows are sparse. By “row sparse”, what we mean is that $x_{ij} = 0$ for all of these sparse rows $i$.

      these are different

    13. but a common cutoff is if the number of non-zero elements is at most the number of rows or columns.

      don't think so

    14. L’Hopital’s

      I hope not

  2. May 2022
    1. Examples

      problems in NML:

      • pizza hut nodes
      • pendants
      • disconnected networks
      • directed networks
      • big weights

      1 para per ch. 4 thing but also re-read Homl and check if any are easily portable (e.g., too small a network, or too dense a network)

    1. intuitively

      tie back to assumptions

    2. allows

      assumes a particular form of

    3. attributes

      edge, node, network, multi-network

    4. mentally

      alien

    5. Networks with cross-network attributes

      multiple networks with node attributes and/or labels

    6. For

      give AlphaFold example too

    7. with more than one element

      new sentence

      for it to be a meaningful network, there must be multiple nodes and edges

    8. usually

      is defined by a

    9. whether or not the approach can be used in isolation from a statistical model (non-model based or model-based network learning systems).

      add a paragraph about edge vs node vs community vs network

    10. and

      vs

    1. As the internet became widespread and coding tools became easier to use – Python became prevalent in machine learning, for instance, and cloud computing came into its own with Amazon’s AWS and Microsoft’s Azure –

      delete

    2. cloud computing came into its own with Amazon’s AWS and Microsoft’s Azure –

      remove

    3. One crucially influential application for networks was in 1996, when a graduate student at Stanford named Larry Page made the PageRank algorithm. The idea was that websites on the internet (which, in 1996, had barely formed) could be ordered into a hierarchy by “link popularity”: a web page would rank higher the more links there were to it. Larry Page and his friend Sergey Brin realized that PageRank could be used to create a search engine – and so they used the PageRank algorithm to found a small web searching company they called Google.

      this paragraph is redundant

    4. machine

      refer back to venn diagram

    5. Fig

      'network population' --> 'network population assumption'

      'network sample = data'

      'network machine learning' <-- 'learn about the network sample'

      to the right, is 'guess about some property of network population'

    6. who could potentially have the mental illness

      psychological property, or skill

    7. network

      special cases

    8. 1.2.2.3. We might errorfully observe the networks¶

      goes first

    9. , and although this book doesn’t focus on GNNs specifically, it does give you the fundamental ideas that you can build off of to understand them.

      . This book provides the basic foundational concepts and intuition required to understand how, when, and why GNNs, or any other network machine learning tool, works.

    10. organized

      can be thought of as

    1. Broadly

      replace ML with 'statistical learning'

      add pointer to ML which is the overlap of SL + DS

      add pointer graph theory = overlap of NS + DS

    2. Dr

      isn't he a section contributor

    3. independence

      hypothesis

    4. Microsoft

      DARPA program manager

    5. ericwb95 - at - gmail - dot - com

      use your neurodata email address. eric@neurodata.io

      ask Jong

    6. Doksum

      add diversity to recommendations

    7. texts

      others

    8. would be

      is

    9. we think a reasonable

      our favorite

    10. learning

      add bullets to appendix

    11. which

      that

    12. unfortunately

      remove

    13. Machine Learning

      decapitalize

    14. easy to use

      hyphenate?

    15. everything unique

      overclaim

    16. Twitter

      maybe mention a Chinese/Indian one

    17. nearly

      over

    1. the development of machine learning strategies for data that is a network.

      machine learning for network-valued data

    2. For

      a lot of 'for instances' here

    3. have

      choose

    4. We don’t really like that word

      we like it, there is a downside

    5. machine learning

      and data science

    6. nd each column represented the length and biological sex (male or female) of the lobster

      just 'sex'

    7. a piece of

      some

    8. wikipedia

      fix to say what wiki says

    1. SignalSubgraph

      check that it deals with ties properly

    2. 10.2.3.2. Classification with Bayes Plugin Classifier¶

      Graph Classification

    3. 10.2.3.1. Bayes Plugin Classifier (Statistical Intuition)¶

      appendix

    4. humans

      humans

    5. astronauts

      martians

    6. Can we come up with a signal subnetwork classifier?

      this means find the subnets that differ

    7. astronauts

      the descendants of the astronauts are astronauts, are they?

    8. lobes

      did we change that? I thought we were going with sensory modalities

    1. Estimation

      consolidate bootstrap stuff, maybe in appendix

    2. above

      ensure the result is still (0,1)

    3. There will be the same number of adjacency matrices as there are time points, since our network will be changing over time.

      confusing

    1. -

      show match ratio here too

    2. Unshuffling

      Matching

    3. match ratio$(P, P_u)$

      update equation

    4. match_ratio

      put in graspologic
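
      A minimal sketch of what that helper might look like if contributed to graspologic (hypothetical; the name and signature are not assumed to exist there yet):

        import numpy as np

        def match_ratio(perm_true, perm_estimated):
            # Fraction of nodes whose estimated match agrees with the true
            # permutation; both inputs are permutation vectors, where entry i
            # gives the index that node i is mapped to.
            perm_true = np.asarray(perm_true)
            perm_estimated = np.asarray(perm_estimated)
            return np.mean(perm_true == perm_estimated)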

    5. $PB$

      transpose

    6. reorder

      not really

    7. 0,1,2,3}

      I don't think this example works because there are multiple permutations that yield 0

    8. The

      /linebreak

    9. If we consider the worst possible case (every edge in $A$ does not exist in $B$), $A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$, $B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, $A - B = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$, $||A - B||_F^2 = 6$

      seems unnecessary

    10. $A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$, $B = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$, $A - B = \begin{pmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}$, $||A - B||_F^2 = 0$

      this seems unnecessary

    1. multipletests

      does that work?
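
      For reference, the statsmodels call looks like this (alpha and method here are illustrative choices):

        import numpy as np
        from statsmodels.stats.multitest import multipletests

        pvals = np.array([0.001, 0.02, 0.04, 0.30])
        # Holm correction; `reject` flags which nulls are rejected at alpha.
        reject, pvals_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")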

    2. $a \neq b$

      not nested

    3. assumptions

      guess

    4. Let’s formalize this situation a little bit more. We have the following three hypotheses. $H_0: p_1 = p_2 = p_3 = a$, against $H_1: p_1 = p_2 = a$, but $p_3 = c$. Finally, we have $H_2: p_1 = a$, $p_2 = b$, and $p_3 = c$. The hypothesis $H$ is nested in the hypothesis $H'$ if whenever $H$ is true, $H'$ is also true. In this sense, the hypothesis $H'$ is said to contain the hypothesis $H$. Let’s consider $H_0$ and $H_1$, for instance. Notice that if $H_0$ is true, then $p_1 = p_2 = p_3 = a$. However, $H_1$ is also true, since $p_1 = p_2 = a$, and $p_3 = c$ can also be set equal to $p_1$ and $p_2$ if $c = a$. A sequence of hypotheses $H_0, H_1, ..., H_n$ is called sequentially nested if $H_0$ is nested in $H_1$, which is nested in $H_2$, so on and so forth up to $H_{n-1}$ is nested in $H_n$. Note that the sequence of hypotheses that we presented for our three coin example is sequentially nested. We already saw that $H_0$ was nested in $H_1$. Now, let’s compare $H_2$ to $H_1$. Note that if $a = b$, then $p_1 = p_2$, and $p_3 = c$, exactly as in $H_1$, so $H_1$ is nested in $H_2$. Therefore, since $H_0$ is nested in $H_1$ and $H_1$ is nested in $H_2$, the sequence $H_0$, $H_1$, and $H_2$ is sequentially nested.

      dense.

      draw a diagram
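
      One way to draw the requested diagram, as a LaTeX sketch of the parameter-set chain (grounded in the hypotheses as stated above):

        % H_0 (all three probabilities equal) sits inside H_1 (first two equal),
        % which sits inside H_2 (all three free):
        \[
        \underbrace{\{p_1 = p_2 = p_3 = a\}}_{H_0}
        \subseteq
        \underbrace{\{p_1 = p_2 = a,\ p_3 = c\}}_{H_1}
        \subseteq
        \underbrace{\{p_1 = a,\ p_2 = b,\ p_3 = c\}}_{H_2}
        \]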

    5. samples with which we are presented

      data

    6. and

      by

    7. presenting

      selecting among

    8. describe

      may describe

    9. =

      \neq

    10. faithful

      accurate, veridical,

    1. Pretty exciting, huh?

      this pvalue is not valid

      see appendix for a robust approach that has higher power for weighted networks.

    2. overcoming

      appendix

    3. . Unfortunately, if the data is not well-summarized by a normal distribution, the $t$-test tends to be a fairly poor choice for hypothesis testing.

      not quite right

    4. 8.2.2.2.2. Weighted Networks¶

      appendix

    5. ,

      no space after comma

    6. below plot

      weird formatting

    7. 8.2. Testing for Differences between Groups of Edges¶

      between known groups of edges

    8. the

      same here

    9. the number of adjacencies in cluster one with an adjacency of zero

      the # of zero-valued adjacencies

    10. 8.2.2.1. Hypothesis Testing with coin flips¶

      these sections all go in appendix

    11. alternative

      null, as opposed to the alternative

    12. indicates

      I don't think they indicate anything

      they assert

    13. RDPG

      not true. GRDPG does

    14. 8.2.1. The Structured Independent Edge Model is parametrized by a Cluster-Assignment Matrix and a probability vector

      this is a model, so goes in ch. 5

    15. higher chance two students are friends if they go to the same school than if they go to two different schools.

      RDPG must find this.

      so, use GRDPG or a different model/hypothesis

    16. resort

      re-sort

    17. the

      remove word

  3. Apr 2022
    1. 8.1.1.2. Evaluating

      the interesting thing for k-means, silhouette, ARI, etc. is showing them in a graph, and showing when they get it wrong.

      and then showing AutoGMM gets it right.

    2. 8.1.1.1

      non-graph things go in appendix, including:

      • k-means
      • silhouette score
      • ARI
      • confusion

    3. heatmap

      adjacency matrix

    4. hat if your true labels are disproportionate

      it doesn't normalize for chance.

    5. You

      add a section on graspologic's thingy.

      that may require updating graspologic documentation

    6. Temporary cluster assignments

      Find closest center for each point

    7. enters from previous iteration

      Compute all distances to center

    8. 3 step

      2
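
      For reference, the two alternating steps as a bare numpy sketch (illustrative, not graspologic's implementation; assumes no cluster goes empty):

        import numpy as np

        def kmeans_step(X, centers):
            # Step 1: assign each point to its closest center.
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Step 2: recompute each center as the mean of its assigned points.
            centers = np.array([X[labels == k].mean(axis=0)
                                for k in range(len(centers))])
            return labels, centers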

    9. smack dab

      approximately

    10. ry to find reasonable guesses at the “centers”

      not our goal here

    11. dataset

      and the label of each point

    12. ur goal is to learn about the block matrix, $B$,

      learn the latent community assignment vector

    13. these nodes tend to be more connected (more edges exist between and amongst them)

      communities are groups of nodes that are stochastically equivalent.

    1. Non-Identifiability

      move to ASE section?

    2. had to delete

      deleted

    3. and so, f

      Finally

    4. Embedding

      the point is that your embeddings are not in the same space.

    5. humans + aliens

      maybe clarify that

    6. and so forth

      remove

    7. forth

      ,

    8. first

      introduce mase before omni if you are explaining mase before omni

    9. used

      that used

    10. However, as you can see, the colors are flipped: the communities are in different places relative to each other.

      this doesn't make any sense.

      also, label communities L and R not 0 and 1.

    11. plot_latents

      plot these on the same scale

    12. one

      before this, show the true latent positions, label them Lhuman, Rhuman, Lalien, Ralien. Maybe all on one coordinate axis.

      consider showing that they are not rotations of one another.

    13. P = np.array([[pa, pb], [pc, pd]])
        return sbm([n, n], P, return_labels=return_labels)
        # make nine human networks
        # and nine alien networks
        p1, p2, p3 = .12, .06, .03

      too many parameters and don't write 9 unless you sample 9

    14. because

      b

    15. bilateralized

      bilateral

    16. you’ll

      We'll

    17. you’ll just simulate

      we'll simulate human and ...

    18. simply

      remove

    19. aving less stuff to deal with

      ?

    1. Ranking

      comment that binarization is decimation of ranking

    2. networks

      this comes after sparsification and truncation because you modify every edge

    3. normalization

      global rescaling

    4. Sparsification

      this is a special case of 'edge trimming'

      add truncation

    5. Lowering

      this isn't lowering edge bias

    6. done

      clarify that if it is weighted, the remaining edges keep their weights, as compared to binarization,

    7. Note

      One cannot get arbitrary densities if one has repeated values for weights unless one has a procedure for discarding replicates.
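
      A small illustration of the tie problem (weights are made up):

        import numpy as np

        weights = np.array([0.5, 0.5, 0.5, 0.5, 0.2, 0.1])
        # Target density 0.5: with four tied weights at 0.5, any threshold
        # keeps all of them or none of them.
        t = np.quantile(weights, 0.5)   # t == 0.5 here
        print((weights > t).mean())     # 0.0
        print((weights >= t).mean())    # ~0.67; a density of exactly 0.5 is
        # unreachable without a rule for discarding some tied (replicate) weights.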

    8. exclude the diagonal

      check graspologic, and make issue/PR

    9. bias

      thresholding reduces variance, adds bias

    10. he task easier to estimate

      not necessarily

    11. The bias/variance tradeoff is

      reference ESL chapter

    12. Ignoring

      no. only do this when the matrix is stored as upper/lower. but then don't quite do this

    13. degree

      remove 'pendants' and 'pizza huts'

    14. 4.4.1. Regularization of the Nodes

      Node pruning

    15. Degree

      show this, and re-order to do this node trimming first.

      show the degree distribution before and after

    16. You

      be more clear, and show result

    1. Nodes

      describe node latent space here, and network latent space in bag of networks

    2. space

      network latent space (as opposed to the node latent space we use to visualize nodes in a network)

    3. Embedding this new matrix will give us a point in space for each network.

      maybe move down?

    4. dissimilarity

      label the axes and update title to be dissimilarity of networks

    5. All you need to get out of this code is that you have six networks from the first group, and another twelve networks from the second.

      why not 5 and 5? or 10 and 10?

    6. the whole

      each

    7. Nodes plotted \nas 2d points

      Each node is a point.

      Add a caption to this figure:

      Each point is a node displayed in 2D latent space. Because there are 20 nodes in this graph, there are 20 points in this figure. Because there are 10 nodes in each community, the points are colored to indicate which community each node is in.

    8. on a coordinate axis

      in latent space

    9. Euclidean

      reals?

    10. moving

      mapping

    11. Euclidean

      not necessarily Euclidean

    12. issue

      for sound theoretical reasons

    13. statsmodels

      graspologic

    14. end

      probably

    15. outlier

      signal

    16. outlier

      signal

    17. that

      Clarify: the issue is not computing these features, but rather, interpreting them, and in particular, interpreting them in a causal light.

    18. you’d

      one could

    19. f you’re familiar with correlation, you’ll notice that these correlation numbers generally have a pretty high magnitude: each feature generally tells you a lot about each other feature.

      not quite. some are high, some are low, some say a lot about the others