142 Matching Annotations
  1. Last 7 days
    1. Lecture 3 - Before the Startup (Paul Graham)

      [[Seedling: How to start a startup?]] [[startup]]

    1. Bret Victor - Inventing on Principle

      [[Principle]] [[Seedling: principle of life]]

  2. Aug 2022
    1. Alan Kay on Learning and Computer Science

      [[problem solving]]

  3. Jul 2022
    1. The Transistor: a 1953 documentary, anticipating its coming impact on technology

      [[Video: The Transistor: a 1953 documentary, anticipating its coming impact on technology]]

    1. AT&T Archives: The Phone Boom of the 1950s

      [[Video: AT&T Archives: The Phone Boom of the 1950s]]

    1. AT&T Archives: The UNIX Operating System

      [[Video: AT&T Archives: The UNIX Operating System]]

  4. May 2022
    1. What is Stablecoin?: A Survey on Its Mechanism and Potential as Decentralized Payment Systems

      [[Paper: What is Stablecoin?: A Survey on Its Mechanism and Potential as Decentralized Payment Systems]]

  5. Apr 2022
    1. Blockchain for the Metaverse: A Review

      [[Paper Survey: Blockchain for the Metaverse: A Review]]

  6. arxiv.org
    1. VIDEO (LANGUAGE) MODELING: A BASELINE FOR GENERATIVE MODELS OF NATURAL VIDEOS

      [[Paper: VIDEO (LANGUAGE) MODELING: A BASELINE FOR GENERATIVE MODELS OF NATURAL VIDEOS]]

    2. The former approach turns the classification problem into regression. As mentioned in sec. 1, this is hard because it is very easy for the model to produce relatively low reconstruction errors by merely blurring the last frame. In our experiments, we found that this approach was harder to optimize and yielded results only marginally better than simply predicting the last frame (relative MSE improvement of 20% only).

      [[Seedling: challenges of videos frame classifications task]]

    3. The above mentioned methods work on a sequence of discrete input values; however, video frames are usually received as continuous vectors (to the extent that 8-bit numbers are continuous). If we want to use these methods to process video sequences, we can follow two main strategies. We can either replace the cross-entropy loss with mean squared error (or some other regression loss), or we can discretize the frames.

      [[Seedling: challenges of videos frame classifications task]]

  7. Mar 2022
    1. Representations of temporal networks

      [[Seedling: representation of temporal networks]]

    2. Modern temporal network theory: a colloquium

      [[Paper: Modern temporal network theory: a colloquium]]

    1. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting

      [[Paper: Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting]]

    1. STRUCTURED SEQUENCE MODELING WITH GRAPH CONVOLUTIONAL RECURRENT NETWORKS

      [[Paper: STRUCTURED SEQUENCE MODELING WITH GRAPH CONVOLUTIONAL RECURRENT NETWORKS]]

  8. arxiv.org
    1. Evaluating Link Prediction Accuracy on Dynamic Networks with Added and Removed Edges

      [[Paper: Evaluating Link Prediction Accuracy on Dynamic Networks with Added and Removed Edges]]

    1. Organizational Building Blocks for Blockchain Governance: A Survey of 241 Blockchain White Papers

      [[Paper: Organizational Building Blocks for Blockchain Governance: A Survey of 241 Blockchain White Papers]]

      [[blockchain]], [[decentralized governance]]

  9. Feb 2022
    1. Based on the above observations, the following three key challenges should be addressed: (a) how to design a specific feature extraction module that highlights motion information and concurrently preserves both spatial and temporal knowledge from videos? (b) how to model multi-scale temporal dependencies in an explicit way to increase the temporal diversity of large amounts of unlabeled videos? (c) how to build a unified self-supervised learning framework that can learn a video understanding model with strong video representative power and generalize well to downstream tasks?

      [[Seedling: challenges of videos frame classifications task]]

    2. For video frames, their inherent information is conveyed at different frequencies [33]–[35]. As shown in Figure 2, a video frame can be decomposed into a low spatial frequency component that describes the smoothly changing structure (scene representation) and a high spatial frequency that describes the rapidly changing details (motion representation). The low-frequency representation can retain most of the scene information, while in the high frequency the scene information would be counteracted and the distinct motion edges would be highlighted. In order to capture both temporal dynamics and scene appearance through different frequencies, we calculate the feature frequency spectrum along the temporal domain based on the discrete cosine transform, and then distill the discriminative spatial-temporal knowledge from videos.

      [[Seedling: properties of videos frames]]

      [[Seedling: How to learn spatial invariants features from sequential/temporal varying data?]]

      [[Seedling: how to learn temporal dependencies of sequential events of any kinds]]
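
The frequency decomposition described above can be sketched with a plain DCT-II: project a temporal signal onto the cosine basis and split the coefficients into a low-frequency (scene-like) band and a high-frequency (motion-like) band. The 1-D toy signal, the explicit basis construction, and the `keep` cutoff are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dct2_matrix(N: int) -> np.ndarray:
    # Unnormalized DCT-II basis: D[k, n] = cos(pi * (n + 0.5) * k / N).
    n = np.arange(N)
    k = np.arange(N)[:, None]
    return np.cos(np.pi * (n + 0.5) * k / N)

def low_high_split(signal: np.ndarray, keep: int):
    """Split a 1-D signal into a slowly-varying part (first `keep` DCT
    coefficients) and a rapidly-varying remainder, then map each band
    back to the time domain with the basis pseudo-inverse."""
    D = dct2_matrix(len(signal))
    coeffs = D @ signal
    low_c, high_c = coeffs.copy(), coeffs.copy()
    low_c[keep:] = 0.0
    high_c[:keep] = 0.0
    D_inv = np.linalg.pinv(D)
    return D_inv @ low_c, D_inv @ high_c

# A slow component plus a fast oscillation, split at an illustrative cutoff.
t = np.linspace(0.0, 1.0, 64)
x = np.sin(2 * np.pi * t) + 0.2 * np.sin(2 * np.pi * 20 * t)
low, high = low_high_split(x, keep=8)
# The two bands sum back to the original signal.
```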

    3. Long-term and short-term temporal dependencies have a complicated interdependence: they are correlated and complementary to each other. Inspired by the convincing performance and high interpretability of graph convolutional networks (GCN) [24]–[28], several works [29]–[32] were proposed to increase the temporal diversity by using GCN in a supervised learning fashion with labeled videos. Unfortunately, due to the lack of principles to explore the multi-scale temporal knowledge of unlabeled videos, it is quite challenging to utilize GCN for self-supervised multi-scale temporal dependencies modeling.

      [[Seedling: How to embed temporal dependencies in graph? aka how to invent dynamic network embedding]]

    4. Recently, a variety of approaches have been proposed such as order verification [10], [11], order prediction [12]–[14], and speediness prediction [15], [16].

      [[Seedling: how to design contrastive loss for dynamic graph?]]

    5. Illustration of the multi-scale temporal dependencies. The handshaking contains the long-term (inter-snippet) temporal dependencies of walking forward, shaking hands, and hugging, while it also includes the short-term (intra-snippet) temporal dependencies of periodic hands and feet movement.

      [[Seedling: How to embed temporal dependencies in graph? aka how to invent dynamic network embedding]]

    6. TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning

      [[Paper: TCGL: Temporal Contrastive Graph for Self-supervised Video Representation Learning]]

    1. How EIP-1559 fixed this system: EIP-1559 has eliminated the first-price auction mechanism to calculate the fee for a transaction. In the EIP-1559 update, there is a base fee for all transactions to be included in the next block and a priority fee that speeds up the processing of transactions. The base fee fluctuates according to the network congestion and is then burned. The user submits a fee higher than the base fee with the transaction. As the base fee fluctuates with the network congestion, users can put up a fee cap. After its inclusion, the users only pay the difference between the final base fee and the fee cap. These changes in the transaction fee system allow users to estimate cost better, since the base fee is the minimum price for being included in the next block. Overall, this will result in fewer users overpaying for transactions.

      [[Seedling: how to economically control computation resources in blockchain?]]
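
A minimal sketch of the fee rule the quote describes, assuming the standard EIP-1559 semantics (base fee burned, tip paid to the block producer, fee cap as the user's maximum per-gas price); the function name and values are illustrative, not from the source.

```python
def effective_gas_price(base_fee: int, priority_fee: int, fee_cap: int) -> int:
    """EIP-1559-style per-gas price: base fee plus a tip, never above fee_cap.

    The base fee is burned; the producer receives only the tip. All values
    are in gwei per gas for readability.
    """
    assert fee_cap >= base_fee, "not includable: fee cap is below the base fee"
    tip = min(priority_fee, fee_cap - base_fee)
    return base_fee + tip

# A user caps the fee at 100 gwei with a 2 gwei tip; the base fee settles at 90 gwei.
price = effective_gas_price(base_fee=90, priority_fee=2, fee_cap=100)
# price == 92; the remaining 8 gwei up to the cap is refunded.
```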

    1. There are two main factors for determining the price of a transaction: gas limit and gas price. The gas limit is the maximum amount of computational resources that can be utilized to complete the transaction, and is usually estimated fairly accurately. Sometimes you'll actually use less than estimated, in which case the leftover amount will be refunded. The least expensive transaction on the Ethereum blockchain is the simple function of sending ETH from one externally owned account (an "EOA") to another; this would have a gas limit of 21000. The more complicated a transaction is and the more functions it involves, the more gas that transaction will require. Some highly complicated transactions can use millions of units of gas. The gas price is commonly denoted in gwei and is on a scale that starts from 0 and goes to 500+ in some extreme cases.

      [[Blockchain/gas]]

      [[Seedling: how to economically control computation resources in blockchain?]]
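
As a quick worked example of the two factors, a hypothetical helper converting gas used and gas price (in gwei) into an ETH-denominated fee; 1 ETH = 10^9 gwei.

```python
def tx_fee_eth(gas_used: int, gas_price_gwei: float) -> float:
    # Fee = gas actually used * per-gas price; divide by 1e9 to go gwei -> ETH.
    return gas_used * gas_price_gwei / 1e9

# The simple EOA-to-EOA transfer quoted above: 21000 gas at 50 gwei.
fee = tx_fee_eth(21_000, 50)  # 0.00105 ETH
```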

    1. Costs of interacting with smart contracts: Since all contract executions are run by everyone running an Ethereum node, an attacker could try creating contracts including lots of computationally expensive operations to slow down the network. To prevent such attacks from happening, every opcode has its own base gas cost. Furthermore, several complicated opcodes also charge a dynamic gas cost. For instance, the opcode KECCAK256 (formerly known as SHA3) has a base cost of 30 gas, and a dynamic cost of 6 gas per word (words are 256-bit items). Computationally expensive instructions charge a higher gas fee than simple, straightforward instructions. On top of that, every transaction starts at 21000 gas.

      [[Seedling: trade off between block size and speed of blockchains]]
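
The KECCAK256 pricing in the quote (base 30 gas plus 6 gas per 256-bit word) can be written down directly; the helper name is illustrative.

```python
def keccak256_gas(data_len_bytes: int) -> int:
    """Gas charged for KECCAK256 per the quote: 30 base gas plus 6 gas per
    32-byte word, with the input length rounded up to whole words."""
    words = (data_len_bytes + 31) // 32  # ceil(len / 32)
    return 30 + 6 * words

keccak256_gas(0)   # 30 (base cost only)
keccak256_gas(64)  # 30 + 6 * 2 = 42
```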

  10. Jan 2022
    1. The intention is that, since permissions are grouped into semantic bundles of functionality, it will be possible to develop specialized mechanisms for mediating access to the underlying functionality (i.e. specialized funding mechanisms and specialized dispute-resolution mechanisms, as opposed to a general-purpose 'voting' mechanism meant to handle all possible decisions).

      [[seedling: voting mechanism in DAO]]

  11. Dec 2021
    1. While a communication event (interaction) only propagates local information across two nodes, an association event changes the topology and thereby has a more global effect.

      [[Seedling: Comparing relatives influence of different types of dynamical process on dynamic graph]]

    2. DYREP: LEARNING REPRESENTATIONS OVER DYNAMIC GRAPHS

      [[Paper: DYREP: LEARNING REPRESENTATIONS OVER DYNAMIC GRAPHS]]

    1. Sampling-based algorithm for link prediction in temporal networks

      [[Paper: Sampling-based algorithm for link prediction in temporal networks]] [[dynamic link prediction]]

    2. on is represented b

      note: highlight doesn't work on this page, so read the note.

      [[seedling: how to evaluate dynamic link prediction]]

      6.2. Experimental setup: For each temporal network in the dataset tested, we input $T_N$ snapshots of the graphs $G_1, G_2, \ldots, G_{T_N}$. At each time step $t_0$, $t_0 = 1, 2, \ldots, T_N - T$, we used the next $T$ graphs, $G_{t_0}, G_{t_0+1}, \ldots, G_{t_0+T-1}$, to test the link prediction algorithm for detection of the potential links in $G_{t_0+T}$. Because we already know the topological structure of $G_{t_0+T}$, the prediction result can be assessed by comparing the links predicted with the actual presence of the links in $G_{t_0+T}$.

      In the first part of our experiments, we tested our TS-VLP algorithm and compared the quality of its results with that of the other algorithms based on a reduced static graph. The reduced static graph-based method has been frequently used in methods for link prediction in temporal networks. In this method, networks represented by the snapshots of the graphs are first reduced and represented by a static graph. Then, an algorithm for link prediction in static graphs is exploited to predict potential links in the reduced static graph. That is to say, the series $G_{t_0}, G_{t_0+1}, \ldots, G_{t_0+T-1}$ is transformed into one reduced binary graph $G_{t_0,T}$ represented by a new adjacency matrix $A_{t_0,T}$ with the elements defined as $$ A_{t_0,T}(i,j) = \begin{cases} 1 & \text{if } \exists k \in [t_0, t_0+T-1]: A_k(i,j) = 1 \\ 0 & \text{otherwise} \end{cases} $$ Then, a static network link prediction method, such as Common Neighbor, Jaccard, Resource Allocation, Hub Promoted Index, Local Random Walk, Superposed Random Walk, and Leicht Holme Newman2, is used on the reduced static graph $G_{t_0,T}$, and the result is used as the final link prediction solution on $G_{t_0,T}$. Those reduced static graph-based methods are denoted as $CN$, $JC$, $RA$, $HPI$, $LRW$, $SRW$, and $LHN2$, respectively. In the other parts of our experiments, we test TS-VLP under different parameter values, such as error bound $\varepsilon$, probability $\delta$, and length of sampled path $L$, and show the influence of different parameter values on the precision of the results and the computational time of the algorithm.
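
The reduced static graph defined in the quote is just an element-wise logical OR over the $T$ snapshot adjacency matrices; a small NumPy sketch (function name and data are illustrative):

```python
import numpy as np

def reduce_snapshots(snapshots):
    """Collapse T binary adjacency snapshots A_{t0}, ..., A_{t0+T-1} into the
    reduced static graph of the quoted definition: an edge (i, j) exists in
    the reduced graph iff it appears in at least one snapshot."""
    reduced = np.zeros_like(snapshots[0], dtype=bool)
    for A in snapshots:
        reduced = np.logical_or(reduced, A)
    return reduced.astype(int)

# Two 2-node snapshots: an edge present at t0, absent at t0+1.
A1 = np.array([[0, 1], [1, 0]])
A2 = np.array([[0, 0], [0, 0]])
A_reduced = reduce_snapshots([A1, A2])  # keeps the edge seen in A1
```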

    1. Temporal Link Prediction: Techniques and Challenges

      [[Paper: Temporal Link Prediction: Techniques and Challenges]]

    2. 6.3 Link prediction challenges: Seasonal fluctuation and ill-behaved nodes are two phenomena that skew the prediction proceeding. In seasonal fluctuations nodes add or delete a massive num-

      [[Seedling: challenges of dynamic link prediction (temporal link prediction)]]

      [[re-written]] Behavior related to link prediction itself adds another layer of complexity, such as seasonal fluctuation or ill-behaved nodes (aka faulty or missing nodes).

    3. 6.2 Linked-data challenges: Mining linked data is about how to effectively and efficiently utilize information from both node attributes and link structure. Many existing models seek to represent link structures using selected statistics on networks, then combine these statistics with node attributes [1]. Coping with noisy data is a main concern in linked data. In some networks, relations can be missing or be invalid [10]. On the other hand, sparse dynamic networks are sensitive to noise. Precisely, the noise-to-signal ratio can be easily changed on sparse networks [13].

      [[Seedling: challenges of dynamic link prediction (temporal link prediction)]]

      [[re-written]] Linked data may not be useful for learning (it may be noisy).

    4. ber of connections in a very short period of time, may indicate something significant, such as a shocking event happening in a social network, or a serious detection of cancer [17]. Ill-behaved nodes, referring to nodes with random behavior, usually provide less information [17]. They randomly start and end relationships with other nodes, which sabotages the network stableness. The point is that this circumstance has a nonlinear transformation over time and is commonly considered in dynamic networks, which has an expensive cost for catching this nonlinearity [16].

      [[Seedling: challenges of dynamic link prediction (temporal link prediction)]]

    5. 6.1 Dynamic network challenges: One of the main concerns in a dynamic network is dealing with node failures. This phenomenon is a serious challenge in temporal link prediction when, in different snapshots of graphs, nodes join and leave the network. Unfortunately, most of the methods ignore this and assume that the nodes V remain the same at all time steps [9] [4]. If we ignore the dynamic nature of the nodes and assume that all nodes in all snapshots are present, then we miss the information that corresponds to creating a new link due to the appearance of one of its nodes, and vice versa [8]. Large dynamic networks may be complicated by the high dimensionality of responses, the large number of observations and the complexity of choices to be made among the explanatory variables [1].

      [[Seedling: challenges of dynamic link prediction (temporal link prediction)]]

    6. linked data challenges, and link prediction specific challenges (Fig. 4). In this categorization, not only are dynamic network challenges dependent on link prediction, but their challenges are also primary for the link prediction problem.

      [[link prediction]] [[Seedling: challenges of dynamic link prediction (temporal link prediction)]]

    7. We classify the link prediction challenges in three categories: dynamic network challenges,

      [[link prediction]] [[Seedling: challenges of dynamic link prediction (temporal link prediction)]]

    8. Given a series of snapshots {Gt−N, Gt−N+1, . . . , Gt} of an evolving graph Gt = ⟨V, Et⟩, in which each e = (u, v) ∈ Et represents a link between u and v that took place at a particular time t, we seek to predict the most likely link state in the next time step Gt+1. Temporal link prediction is also called periodic link prediction when N ≫ 0 [7] [15] (Fig. 2). Almost every method that we study in this paper assumes that the nodes V remain the same at all time steps but the edges Et change for each time t. In this paper we focus on this type of link prediction.

      [[link prediction]]

    9. Link prediction is a task of forecasting relations in a network. Predicting unknown links falls into two categories in accordance with the linked data: (i) Missing Link Prediction and (ii) Temporal Link Prediction [16]. Missing Link Prediction is a task of predicting an unseen link with the current state of the network, in order to complete the network [5].

      [[link prediction]]

    1. The corrupted features, X̃, are obtained by row-wise shuffling of X. That is, the corrupted graph consists of exactly the same nodes as the original graph, but they are located in different places in the graph, and will therefore receive different patch representations.

      [[node feature shuffle (NFS)]]

      In this paper, NFS is used as data augmentation in the transductive setting by keeping the adjacency matrix untouched and shuffling the node features, X.

      This allows the model to learn structure invariance using the global-local loss (aka patch-node loss); see the paper for more information on the patch-node loss.
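
A minimal sketch of the row-shuffling corruption (NFS) described above, keeping the adjacency matrix untouched; the function name and seed are illustrative.

```python
import numpy as np

def corrupt_by_row_shuffle(X: np.ndarray, seed: int = 0) -> np.ndarray:
    """DGI-style corruption for negative samples: permute the rows of the
    node-feature matrix X while the adjacency matrix A is left untouched,
    so the same features sit at different positions in the same graph."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    return X[perm]

X = np.arange(12).reshape(4, 3)       # 4 nodes, 3 features each
X_tilde = corrupt_by_row_shuffle(X)   # same rows, different order
```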

    2. As all of the derived patch representations are driven to preserve mutual information with the global graph summary, this allows for discovering and preserving similarities on the patch level, for example, distant nodes with similar structural roles (which are known to be a strong predictor for many node classification tasks).

      [[Seedling: characteristics and properties of graph embeddings]]

    3. Negative samples for D are provided by pairing the summary s from (X, A) with patch representations h̃j of an alternative graph, (X̃, Ã). In a multi-graph setting, such graphs may be obtained as other elements of a training set. However, for a single graph, an explicit (stochastic) corruption function, C : R^{N×F} × R^{N×N} → R^{M×F} × R^{M×M}, is required to obtain a negative example from the original graph, i.e. (X̃, Ã) = C(X, A). The choice of the negative sampling procedure will govern the specific kinds of structural information that are desirable to be captured as a byproduct of this maximization.

      [[seedling: How to choose contrastive loss?]]

    4. A key consequence is that the produced node embeddings, h_i, summarize a patch of the graph centered around node i rather than just the node itself. In what follows, we will often refer to h_i as patch representations to emphasize this point.

      [[patch representation]]

    1. Figure 1: Schematic illustration of negative sampling methods for the example of classifying species of tree. Top row: uniform sampling of negative examples (red rings) mostly focuses on data points very different from the anchor (yellow triangle), and may even sample examples from the same class (triangles vs. circles). Bottom row: hard negative sampling prefers examples that are (incorrectly) close to the anchor.

      [[Seedling: sampling bias]]

      [[hard negative sample]]

    1. Figure 2: Sampling bias leads to performance drop: results on CIFAR-10 for drawing x−i from p(x) (biased) and from data with different labels, i.e., truly semantically different data (unbiased).

      [[Seedling: sampling bias]]

    2. Figure 1: “Sampling bias”: The common practice of drawing negative examples x−i from the data distribution p(x) may result in x−i that are actually similar to x.

      [[Seedling: sampling bias]]

    1. Under the setup in section 4.1, the standard approach to obtain negative samples is to draw from the marginal distribution p; in practice p is often approximated using the empirical distribution of the data. There are two issues with this sampling process. The first issue is sampling bias. As was pointed out in [7], sampling from p(x) instead of the (ideal) p−(x′|x) = p(x′ | c(x′) ≠ c(x)) produces a biased sampling scheme for the negative samples, since chances are that a sample with the same semantic class is drawn, resulting in a larger loss value, and might produce very different learned representations [7, Figure 2].

      [[Seedling: sampling bias on graph]]
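
A sketch of the debiasing correction the quote refers to (the estimator of Chuang et al., cited here as [7]), assuming a single positive per anchor; `tau_plus` is the class prior and all names and values are illustrative.

```python
import numpy as np

def debiased_nce(pos_sim, neg_sims, tau_plus=0.1, temperature=0.5):
    """Debiased contrastive loss in the style of Chuang et al.: correct the
    negative term for the probability tau_plus that a drawn "negative"
    actually shares the anchor's latent class (one positive per anchor)."""
    pos = np.exp(pos_sim / temperature)
    neg = np.exp(np.asarray(neg_sims, dtype=float) / temperature)
    n = len(neg_sims)
    # Corrected negative mass, clipped at its theoretical minimum e^{-1/t}.
    g = np.maximum((neg.mean() - tau_plus * pos) / (1.0 - tau_plus),
                   np.exp(-1.0 / temperature))
    return -np.log(pos / (pos + n * g))

loss = debiased_nce(pos_sim=0.8, neg_sims=[0.1, -0.3, 0.5, 0.0])
```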

    2. Figure 2: An illustration of the necessity of the time-dependent similarity formulation (6). In figure (a) above, the structures of the two temporal subgraphs corresponding to nodes u1 and u2 are identical, and between the scope of these two subgraphs there are no relevant interactions targeting u1. The situation becomes more complicated in figure (b), where two identical temporal subgraphs (with target nodes u1 and u3 respectively) exhibit a fairly large time gap. There might be interactions targeting node u1 that are not included in either subgraph; under the time-independent setting, cases (a) and (b) would share the same similarity score, which is problematic because of the uncaptured interaction that happens at t2 in figure (b).

      [[Seedling: what are the types of similarity to think about when working on contrastive learning on graphs?]]

      Given that nodes u_1, u_2, and u_3 are different, figure 2 shows that u_1 and u_3 cannot be treated as identical: figure (b) shows that the structure of u_1 has changed over time, so saying that u_1 and u_3 are the same disregards the structural evolution of u_1.

    3. In dynamic setups, using a constant weighting matrix Ω may be problematic.

      [[seedling: How to choose contrastive loss?]] [[Seedling: what are the types of similarity to think about when working on contrastive learning on graphs?]]

    4. The GAN-type objective was previously shown to exhibit superior performance on unsupervised representation learning tasks over static graphs [17, 42].

      [[seedling: How to choose contrastive loss?]]

    5. Self-supervised Representation Learning on Dynamic Graphs

      [[Paper: Self-supervised Representation Learning on Dynamic Graphs]]

    6. It is thus reasonable to assume that the evolution process is in general "smooth", which means that for a small amount of elapsed time, the semantic of each individual object remains the same. This observation motivates the positive sample construction process which we term temporal views, formally defined as follows: given an input tuple (v, t, G_t^k(v)) and an upper limit δ_max = max{s : s ∈ (0, t), c(v, t, G_t^k(v)) = c(v, t−s, G_{t−s}^k(v))} (3)

      [[seedling: How to choose contrastive loss?]] [[temporal views]]

    7. In [48], it was reported that different graph perturbation strategies for static graph representation learning (i.e., node dropping, edge perturbation) behave differently according to the distributions of the underlying graph data.

      [[seedling: how to apply data augmentation policy in contrastive learning?]]

    8. We found during evaluation that using the MoCo learning rule consistently outperforms the E2E learning rule, hence without further clarification, the results are produced using MoCo. We will make an additional assessment of the performance difference between MoCo and E2E learning in section 5.7.

      [[Seedling: how to train self supervised learning models?]]

    9. We identify two challenges of self-supervised dynamic graph representation learning. Firstly, previous empirical observations [48] suggest that the effectiveness of different graph data augmentation methods depends on the underlying graph data. Consequently, the learned representations may be less transferable compared with computer vision setups. Secondly, recent theoretical developments in contrastive learning [37] discovered the fact that negative samples which are "false negatives" hurt the generalization ability of contrastive frameworks. Hence it is worthwhile to design a better negative sampling process to improve the generalizability of contrastive approaches.

      [[Seedling: how to self-supervised graph neural network?]]

      [[Seedling: my paper on self-supervised dynamic graph neural networks.]]

      [[Add to Research Idea List]]

    10. Despite the success of these time-dependent proposals, all of them still require labeled data to train. However, it is often very costly to obtain annotated labels for graph-related tasks since they rely heavily on domain knowledge [38].

      [[Seedling: How to integrate domain knowledge into design of deep learning models]]

      [[Paper: INFOGRAPH: UNSUPERVISED AND SEMI-SUPERVISED GRAPH-LEVEL REPRESENTATION LEARNING VIA MUTUAL INFORMATION MAXIMIZATION]]

    11. There are various ways to define pretext tasks, among which instance discrimination trained using a contrastive objective [14, 44] has received considerable success in the domain of computer vision [6].

      [[instance level discrimination objectives]] [[Paper: Unsupervised Feature Learning via Non-Parametric Instance Discrimination]] [[Paper: Dimensionality Reduction by Learning an Invariant Mapping]]

    12. Prior work on representation learning over dynamic graphs preprocesses the dynamic graphs into a sequence of snapshots, and uses recurrent or attentive architectures to combine intermediate representations extracted via GNN-type encoders [12, 30, 36], usually referred to as discrete-time methods [20]. The primal drawback of discrete-time approaches is that the right granularity for temporal discretization is often subtle, leading to coarsely captured temporal information. A recent line of work [35, 40, 43, 46] explored continuous-time approaches to integrate temporal information directly into the representation learning process and was shown to significantly improve on time-independent and discrete-time approaches on standard datasets.

      [[dynamic graph modeling/discrete-time approaches]] [[Continous-time network models]] [[Seedling: how to train dynamic graph data in deep learning]]

    13. Naive application of static graph learning paradigms such as message-passing style GNNs to dynamic graphs fails to capture the temporal information and was shown to be sub-optimal in dynamic graph scenarios [46].

      [[Seedling: how to train dynamic graph data in deep learning]]

    14. The rule (1) does not effectively exploit the temporal information. Recent works [35, 46] suggest that careful treatment of event times has a significant impact on the performance of dynamic graph models. The treatment is based on the assumption that time-dependent aggregation schemes shall be translation-invariant in time, thereby obtaining a form of generic time encoding [45, 46] that stems from the design of random Fourier features [33]. Time encoding defines a learnable map TE(t) that takes any t ≥ 0 into a d_T-dimensional embedding vector, and a time-aware GNN encoder is thus defined as: h_t^{l+1}(v) = AGG(MERGE(h_t^l(v), TE(0)), {MERGE(h_t^l(u), TE(t_v − t_u)), u ∈ N_t(v)}) (2), where t_u and t_v are the event times of u and v, and the MERGE operation fuses nodes' (hidden) features and their time encodings. According to [46], simple merging functions like summation or concatenation work reasonably well.

      [[Seedling: How to embed temporal dependencies in graph? aka how to invent dynamic network embedding]]
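
A toy version of the time encoding TE(t) and a concatenation-style MERGE from equation (2), using fixed random Fourier features instead of learned ones; all names, dimensions, and values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_T = 8                                    # time-encoding dimension
omega = rng.normal(size=d_T)               # frequencies (learnable in the papers,
phi = rng.uniform(0, 2 * np.pi, size=d_T)  # fixed here for illustration)

def TE(t: float) -> np.ndarray:
    # Random-Fourier-feature style time encoding; translation invariance
    # comes from feeding in time *gaps* t_v - t_u rather than raw times.
    return np.cos(omega * t + phi)

def merge(h: np.ndarray, t_gap: float) -> np.ndarray:
    # MERGE by concatenation, which the quote reports works reasonably well.
    return np.concatenate([h, TE(t_gap)])

h_u = rng.normal(size=16)       # hidden feature of a neighbour u
msg = merge(h_u, t_gap=3.5)     # message carrying h_u together with TE(t_v - t_u)
msg.shape                       # (24,)
```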

    15. We conduct extensive experiments on benchmark datasets by testing the DDGCL framework under two different self-supervision schemes: pretraining and finetuning, and multi-task learning.

      [[pre-training and fine-tuning (PF)]] [[Joint Learning (JL)]] #[[joint-training]]

    16. Inspired by recent theoretical developments in contrastive learning, we propose a novel debiased GAN-type contrastive loss as the learning objective to correct the sampling bias that occurred in the negative sample construction process.

      [[Seedling: How to efficiently train model that use contrastive learning?]]

    17. dynamic graph contrastive learning (DDGCL), the first self-supervised representation learning framework on dynamic graphs.

      [[dynamic graph contrastive learning (DDGCL)]]

    1. Next, about half of the new solutions drew on another existing theory (ranging from Fuzzy logic to the Condorcet voting mechanism), while the other half did not. Almost all of the new algorithms aimed to address the limitations of the common consensus, namely: resource consumption (energy and/or time), limited throughput, high latency, and security, to name a few.

      [[Seedling: common limitation of blockchain consensus protocol]]

    1. Decentralized Autonomous Organizations on Blockchain: Analysis and Visualization

      [[Paper: Decentralized Autonomous Organizations on Blockchain: Analysis and Visualization]]

    2. Table 4.3: Voting comparison by platform and network

      [[Seedling: evaluation metrics to compare DAO performance]]

      [[Seedling: design of governance system within Decentralized Autonomous Organization (DAO)]]

    3. Figure 4.4: Number of active users of the three platforms. Lighter colors just refer to mainnet users. Darker colors take into account users from mainnet and xDai.

      [[Seedling: evaluation metrics to compare DAO performance]]

    4. Figure 4.3: Number of active DAOs of the three platforms. Lighter colors just refer to mainnet DAOs. Darker colors take into account DAOs from mainnet and xDai.

      [[Seedling: evaluation metrics to compare DAO performance]]

    5. Figure 4.1: Number of DAOs of Aragon and DAOhaus. Lighter colors just refer to mainnet DAOs. Darker colors take into account DAOs from mainnet and xDai.

      [[Seedling: evaluation metrics to compare DAO performance]]

    6. Figure 4.2: Number of users of Aragon and DAOhaus. Lighter colors just refer to mainnet users. Darker colors take into account users from both networks.

      [[Seedling: evaluation metrics to compare DAO performance]]

    7. Table 4.1: Network comparison

      [[Seedling: evaluation metrics to compare DAO performance]]

    8. Table 4.2: Comparison of the three ecosystems in terms of number of DAOs, users and proposals

      [[Seedling: evaluation metrics to compare DAO performance]]

    9. Decentralized Autonomous Organizations on Blockchain: Analysis and Visualization

      [[Paper: Decentralized Autonomous Organizations on Blockchain: Analysis and Visualization]]

    10. In the case of DAOstack, their DAO users are so-called Reputation holders. Concerning the Stakes category, it is a special vote in this ecosystem (see Section 2.4.2). Stakes are also used to calculate the success rate metrics (e.g., total success rate of the stakes, success rate of the stakes by type); these show how good the predictors (stakes) are versus the real result (proposal outcome). This is better described in (Faqir-Rhazoui et al., 2021b).

      [[Seedling: design of governance system within Decentralized Autonomous Organization (DAO)]]

    11. Holographic Consensus was validated (Faqir-Rhazoui et al., 2021b), and the results show that, usually, the larger a DAO is, the better this mechanism works.

      [[Seedling: design of governance system within Decentralized Autonomous Organization (DAO)]]

    12. With that mechanism, DAO members do not spend that much time deciding what proposals are really interesting for the community because, if staking works correctly, stakers actually filter the good proposals from the bad ones. To be rewarded, stakers need to be aligned with the DAO global opinion; otherwise, they will lose their investment.

      [[Seedling: design of governance system within Decentralized Autonomous Organization (DAO)]]

    13. DAOstack proposes its own decision-making system, the Holographic Consensus (Field, 2018b, 2019). The HC states three actors in the stage: the DAO members, the proposals, and the stakers. DAO members send and vote for proposals. Proposals are approved by absolute majority. And finally, the stakers are external actors who stake their money trying to guess if a proposal will be approved. If it finally is, the staker will gain a bounty; if not, they will lose the money staked. Besides, if a proposal receives enough stakes, it will reduce its majority from absolute to relative.

      [[Seedling: design of governance system within Decentralized Autonomous Organization (DAO)]]
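      The staking flow described in these highlights can be sketched as a toy simulation. The boost threshold, bounty rate, and vote counts below are illustrative assumptions, not DAOstack's actual parameters.

```python
# Toy sketch of Holographic Consensus: stakers predict proposal outcomes;
# enough "pass" stakes boost a proposal from absolute to relative majority.
# All thresholds and the bounty rate are illustrative, not DAOstack's real values.

def proposal_passes(votes_for, votes_against, members, boosted):
    if boosted:
        return votes_for > votes_against      # relative majority
    return votes_for > members / 2            # absolute majority

def settle_stakes(stakes, passed, bounty_rate=0.1):
    """Each staker gains a bounty if aligned with the outcome, else loses the stake."""
    return {
        staker: amount * bounty_rate if predicted_pass == passed else -amount
        for staker, (amount, predicted_pass) in stakes.items()
    }

stakes = {"alice": (100, True), "bob": (50, False)}
boosted = sum(a for a, p in stakes.values() if p) >= 100   # hypothetical boost threshold
passed = proposal_passes(votes_for=30, votes_against=20, members=100, boosted=boosted)
payoffs = settle_stakes(stakes, passed)
```

      Note how boosting flips the outcome here: 30 of 100 votes would fail an absolute majority, but passes the relative one once stakers boost the proposal.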

    14. A naive solution to this could be reducing the required quorum (i.e., a relative majority), but it also introduces new flaws. For example, an attacker could spam lots of decisions in a small time frame requesting the DAO funds, and it will be easier to get the funds using a lower quorum. Field states that increasing the DAO membership will reduce its resilience (Field, 2018a).

      [[Seedling: design of governance system within Decentralized Autonomous Organization (DAO)]]

    15. However, the case of Colony will not be covered in this paper due to the lack of data availability (at the date of this work) and its novelty (released in February 2020); besides, the way that a Colony DAO works breaks with the "traditional" vote-driven DAO, where each action of the DAO must be voted. In Colony, they use the concept of work-driven, where works are published, and members accept them for a bounty (Mannan, 2018). So, comparing vote-driven and work-driven DAOs makes no sense due to the different nature of their conceptions

      [[work-driven DAO]] vs [[vote-driven DAO]]

      [[CRYPTO:CLNY]]

    16. As we previously said, there is no way to know how much gas fee you have to pay for a transaction. Users can set a gas limit to pay for their transactions. However, a low limit may lead to a long time to accomplish the transaction, or even worse, the transaction may not accomplish, and the user loses his/her money. On the contrary, a higher gas limit may derive from an overpriced fee. Due to that, there are many tools (e.g., ETH Gas Station, or Etherchain) to estimate the gas limit you need to set. Usually, those tools give you an estimated price depending on the speed and the cost you need. For example, the "safe low" transaction takes less than 30 minutes to be processed, and it is the cheapest one. The "fastest" transaction takes at most 30 seconds to be processed, but it is the most expensive one (Pierro and Rocha, 2019).

      [[Seedling: how to economically control computation resources in blockchain?]]

    17. To better understand this, we have to introduce what is the "block gas limit." Each block is filled with a certain number of transactions, and in turn, this number is tied to the total number of gas which those transactions spend. So, each block has a limit of gas, which transactions can use, and this limit is set by the miners (Sousa et al., 2020). Increasing the block gas limit will also take more time to propagate the changes around the network. If those blocks take more time to be processed, it will also take more time to discover new blocks, decreasing the network’s decentralization (0xNick, 2020)

      [[Seedling: how to economically control computation resources in blockchain?]]

      [[Seedling: trade off between block size and speed of blockchains]]

    18. The idea behind the gas is to make the users pay for computational resources (e.g., CPU, energy) whenever a transaction has to be processed (Dannen, 2017). So, each time you want to run your program on the blockchain, you must pay a fee for it in order to compensate for the used resources that miners use to carry it out. However, the amount you must pay is not fixed, and it will be tied to the complexity of the code you want to run. For example, a bare transfer may use 21,000 gas, but a more complex transaction (e.g., decentralized finance apps) could increase its need to 1,000,000 gas (Kordyban, 2020b; ethereumprice, 2020).

      [[Seedling: how to economically control computation resources in blockchain?]]
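      The fee arithmetic behind these figures is simple: fee = gas used × gas price, with prices quoted in gwei (1 gwei = 1e-9 ETH). The 50 gwei price below is an assumed value for illustration.

```python
def tx_fee_eth(gas_used, gas_price_gwei):
    """Fee in ETH: gas units consumed times price per unit (1 gwei = 1e-9 ETH)."""
    return gas_used * gas_price_gwei * 1e-9

# Gas figures from the highlight above, at an assumed price of 50 gwei:
simple_transfer = tx_fee_eth(21_000, 50)      # bare transfer
defi_call = tx_fee_eth(1_000_000, 50)         # complex DeFi transaction
```

      At the same gas price, the complex transaction costs roughly 48x more than the bare transfer, purely because it consumes more computation.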

    1. Internet Organizations must thus assume the lowest common denominator: that every member is rationally self-interested and focussed entirely on maximising personal utility and profit, and given incentives accordingly. This gets to the heart of Colony: a protocol seeking to facilitate the same kind of meritocratic division of labour and authority that the idealised model of the corporate hierarchy should, except from the bottom up, and less prone to error. Decentralised, self-organising companies, where decision-making power derives from a fairly-assessed contribution of value

      [[Seedling: design of governance system within Decentralized Autonomous Organization (DAO)]]

    2. COLONY

      [[CRYPTO:CLNY]]

    1. On all other matters, the only way you get a say in a colony’s operations is by having a good reputation within that colony. And that means you have to be involved and actively contributing to its success. In other words, only those people with a “living stake” in the enterprise count as true stakeholders with a right to influence its governance and management.

      [[Seedling: design of governance system within Decentralized Autonomous Organization (DAO)]]

    2. Colony makes another important distinction between reputation and tokens. Reputation gives you influence in the governance of a colony, whereas tokens generally do not. There is only one question on which token matters in colony decisions, and that’s around the fiscal policy of changing the supply of tokens.

      [[Seedling: important factors to consider when designing your colony token and reputation system. (colony coin)]]

      [[Seedling: design of governance system within Decentralized Autonomous Organization (DAO)]]

    1. neighbors and a graph diffusion. We showed that unlike visual representation learning, increasing the number of views or contrasting multi-scale encodings do not improve the performance.

      [[seedling: how to apply data augmentation policy in contrastive learning?]]

    1. [[Seedling: What are risks associated with Proof-of-Stake mechanism]]

    2. ShibaSwap will highlight our three flagship tokens: Shiba Inu: $SHIB, Leash Dogecoin Killer: $LEASH, Bone: $BONE

      On ShibaSwap your Shibs will DIG for BONES, or even BURY their tokens. The best trainers even teach their Shibas to SWAP which allows the pup to exchange one token for another token.

      When Shibas DIG, BURY or SWAP, they generate "Returns" that are distributed to the Puppy Pools where #SHIBARMY has either BURIED their tokens or are DIGGING for BONES.

    3. Shiba Inu Token and the Shib Army have evolved beyond a simple experiment. As of the publishing of this document and the release of ShibaSwap, we have become a Decentralized Ecosystem enriched by its own DEX.

    4. "You must understand that there is more than one path to the top of the mountain"

      • [[@Miyamoto Musashi]]

      [[Quote]]

    5. The goal of ShibaSwap is to provide a safe place to trade your valuable crypto while remaining decentralized. We are loyal to our holders, and that gives us the means to grow exponentially. We will constantly scale this Ecosystem so it may bring ever increasing interested parties to the ShibaSwap platform.

      Our unique tokenomics, solid design, technical implementation, and the viral growth from our good 'ole fashion memes, will reinforce the platform's strength and ultimately provide residual benefits to the Ecosystem.

      Making a swap without a budget is a unique challenge but by tackling various genres and product lines, a focused development team rose from the ranks of our community.

    6. Shiba Inus are incredible dogs. From the tips of their little teddy-bear noses to the ends of their curled tails, they are fiercely intelligent, brave, and independent, with an equal propensity for loyalty and mischief.

      Sadly, the characteristics which make them extraordinary are the same ones which can make them a challenging pet. New or inexperienced owners can quickly find themselves overwhelmed and unprepared for life with a breed that's known for its bold (i.e. stubborn) personality.

    7. Our founder also chose to send 50% of the total supply to Vitalik Buterin's wallet because, in his words, "We sent over 50% of the TOTAL supply to Vitalik. There is no greatness without a vulnerable point and as long as VB doesn't rug us, then SHIBA will grow and survive."

    8. "To know ten thousand things, know one well."

      • [[@Miyamoto Musashi]] [[Quote]]
    9. "You can only fight the you you practice"

      • [[@Miyamoto Musashi]] [[Quote]]
    1. traditional link prediction and continuous DGNN works (Xu et al., 2020; Rossi et al., 2020). This is the only approach not to train in a "roll forward" manner. It is presumably beneficial to exploit the temporal information in the training set and roll forward during training. One way to do this is to only encode the previous snapshot when attempting to predict the next snapshot. This does not require any snapshot aggregation. This can be seen as a sliding window of size 1 and is the approach used by Pareja et al. (2020) for static GNNs. Complex networks tend to be rather sparse. It might therefore be beneficial to use a sliding window. We explore sliding windows of size 5 and 10. Size 10 is the default for EGCN; we chose to additionally use size 5 to investigate whether the size of the sliding window is influential. For the static models, these "sliding snapshot windows" are aggregated into one snapshot. For even sparser networks it may be beneficial to represent the dynamic network as an evolving network. For this, we use an expanding window. We refer to this option as 'expanding'.

      [[Seedling: how to train dynamic graph data in deep learning]]

      [[Seedling: is sliding window evaluation needed to evaluate dynamic graph models? OR does it make sense to even use it in dynamic graph models?]]
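      The window options discussed above (previous snapshot only, fixed sliding windows of size 5 or 10, and an expanding window) can be sketched as a roll-forward generator; the snapshots below are integer stand-ins for snapshot graphs.

```python
# Roll-forward training pairs over an ordered list of snapshots.
# mode="sliding" with size=1 recovers "previous snapshot only";
# mode="expanding" grows the input window over the full history.

def training_windows(snapshots, mode="sliding", size=5):
    """Yield (input_snapshots, target_snapshot) pairs, rolling forward in time."""
    for t in range(1, len(snapshots)):
        start = 0 if mode == "expanding" else max(0, t - size)
        yield snapshots[start:t], snapshots[t]

snaps = list(range(8))                               # stand-ins for snapshot graphs
pairs = list(training_windows(snaps, size=5))        # sliding window of size 5
prev_only = list(training_windows(snaps, size=1))    # "previous snapshot only"
```

      For static models, each yielded input window would then be aggregated into a single snapshot before encoding.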

    2. A.4.1 STATIC. Static GNNs encode a graph. Since our training set includes multiple snapshots, we convert these snapshots into one graph to enable the training of the GNN. We can aggregate an arbitrary number of snapshots into one snapshot by including a link in the output snapshot if it occurs in any of the input snapshots, thus turning a discrete network into a static one. We do this in three different ways and consider these approaches a hyperparameter. The most straightforward way to train a GNN on a dynamic network is to combine all the snapshots in the training set into one big snapshot. We call this approach 'static'. This is the approach taken by

      [[Seedling: how to train dynamic graph data in deep learning]]
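      The "static" aggregation described here (a link is kept if it occurs in any input snapshot) amounts to a union of edge sets; representing each snapshot as a set of edge tuples is an assumption for illustration.

```python
# Combine discrete snapshots into one static graph: an edge belongs to the
# aggregate if it appears in any input snapshot.

def aggregate_snapshots(snapshots):
    """Union the edge sets of several snapshots into one static edge set."""
    merged = set()
    for edges in snapshots:
        merged |= set(edges)
    return merged

s1 = {(0, 1), (1, 2)}
s2 = {(1, 2), (2, 3)}
static_graph = aggregate_snapshots([s1, s2])
```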

    3. BENCHMARKING GRAPH NEURAL NETWORKS ONDYNAMIC LINK PREDICTION

      [[Paper: BENCHMARKING GRAPH NEURAL NETWORKS ON DYNAMIC LINK PREDICTION]]

    4. Most models are spread across a large spectrum of scores, implying that optimizing the hyperparameters is essential for obtaining a representative score for both GNNs and DGNNs

      [[Seedling: how to train dynamic graph data in deep learning]]

    5. A sliding time-window of size 5 or 10 consistently produces the best results, particularly for the discrete models. This indicates that it is beneficial to use a sliding window when training DGNNs

      [[Seedling: is sliding window evaluation needed to evaluate dynamic graph models? OR does it make sense to even use it in dynamic graph models?]]

    6. get scores for missing links beyond common neighbors, whereas the other methods give a similarity score of 0 if there are no common neighbors

      [[Seedling: Techniques for computing link prediction heuristics]]

    7. To represent link prediction heuristics we select three well established methods (Liben-Nowell & Kleinberg, 2007; Martínez et al., 2016): Common Neighbors (CN), Adamic-Adar (AA) (Adamic & Adar, 2003) and Jaccard. As well as two modern heuristics: Newton's gravitational law (Newton) (Wahid-Ul-Ashraf et al., 2017) and Common Neighbor and Centrality based Parameterized Algorithm (CCPA) (Ahmad et al., 2020). The modern methods use the shortest path between nodes to

      [[Seedling: Techniques for computing link prediction heuristics]]
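      The three classic heuristics named in this highlight can be computed directly from adjacency sets: CN = |Γ(u) ∩ Γ(v)|, Jaccard = |Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|, and Adamic-Adar = Σ 1/log|Γ(w)| over common neighbors w. The toy graph below is illustrative.

```python
import math

def common_neighbors(adj, u, v):
    """Number of shared neighbors of u and v."""
    return len(adj[u] & adj[v])

def jaccard(adj, u, v):
    """Shared neighbors normalized by the neighborhood union."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0

def adamic_adar(adj, u, v):
    """Common neighbors weighted inversely by (log of) their degree."""
    return sum(1.0 / math.log(len(adj[w]))
               for w in adj[u] & adj[v] if len(adj[w]) > 1)

# Toy undirected graph as node -> set of neighbors
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {1, 2}}
cn_03 = common_neighbors(adj, 0, 3)   # nodes 1 and 2 are shared
```

      As the highlight notes, all three scores are 0 whenever u and v have no common neighbors, which is what the path-based heuristics (Newton, CCPA) avoid.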

    8. We prepare two versions of each dataset, a directed continuous interaction network and an undirected discrete network. Continuous models encode the continuous network. The static and discrete models encode the discrete network. In the conversion from continuous to discrete, reciprocal edges are added to make the discrete networks undirected. The number of times an edge occurs in a snapshot is added as a weight to the snapshot's edge

      [[Seedling: How to aggregate set of of dynamic graph in dynamic graph neural network?]]

    9. We chose interaction networks as they allow us to easily convert to more coarse-grained temporal granularities such as discrete networks. Sparser snapshots indicate a greater imbalance between links and non-links, thus making the classification problem harder. Details on the datasets are found in the appendix, Section A.1

      [[Seedling: How to evaluate dynamic graph models?]]

    1. While processing massive graph streams is an active area of research, it was hitherto unknown whether it was possible to analyze graphs in the sliding-window model. We present an extensive set of positive results including algorithms for constructing basic graph synopses like combinatorial sparsifiers and spanners as well as approximating classic graph properties such as the size of a graph matching or minimum spanning tree

      [[Seedling: is sliding window evaluation needed to evaluate dynamic graph models? OR does it make sense to even use it in dynamic graph models?]]

    1. “Altcoins like SHIB are primarily community-based, meaning their success is largely dependent on the success and growth of its community instead of its utility,” says Boneparth

      [[Seedling: risk of blockchains]]

    1. Layers of projection head. We experiment with different numbers of hidden layers. Unlike [11], we only use different layers in pre-training and perform the linear evaluation on top of the same backbone by removing the entire projection head. In Table 6, we can see using 3 hidden layers yields the best performance and we choose this as our default setting

      [[Seedling: How to efficiently train model that use contrastive learning?]]

    2. Previous work provides temporal augmentation techniques like sorting video frames or clips [42, 39, 72], altering playback rates [75, 67], etc. However, directly incorporating them into CVRL would result in learning temporally invariant features, which opposes the temporally evolving nature of videos. We instead account for the temporal changes using a sampling strategy. The main motivation is that two clips from the same video would be more distinct when their temporal interval is larger. If we sample temporally distant clips with smaller probabilities, the contrastive loss (Equation 1) would focus more on the temporally close clips, pulling their features closer and imposing less penalty over the clips that are far away in time. Given an input video of length T, our sampling strategy takes two steps. We first draw a time interval t from a distribution P(t) over [0, T]. We then uniformly sample a clip from [0, T − t], followed by the second clip which is delayed by t after the first. More details on the sampling procedure can be found in Appendix A. We experiment with monotonically increasing, decreasing, and uniform distributions, as illustrated in Figure 3. We find that decreasing distributions (a-c) generally perform better than the uniform (d) or increasing ones (e-f), aligning well with our motivation above of assigning lower sampling probability on larger temporal intervals. [Figure 3 plot residue; recoverable panel labels: (a) P(t) ∝ −t + c (63.8% acc.), (b) P(t) ∝ −t^0.5 + c (63.1% acc.)]

      [[re-written]] Previous work on [[temporal augmentation]], such as sorting video frames or altering playback rate, leads CVRL to learn temporally invariant features.

      For example, augmenting data by sorting clips essentially asks the model to learn order-invariant features, which do not necessarily capture the ever-evolving temporal nature of the underlying video.

      To let the model learn the intrinsically evolving temporal features of videos, an alternative is to encourage the model to distinguish temporal distances between clips that are closer together and further apart in the video. This is, I think, exactly what the probability-based temporal sampling augmentation proposed in this paper does.

      [[Seedling: How to allow self-supervised model to learn temporal varying features? AND How to relax temporal invariance constraints in self-supervised model?]]
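      A minimal sketch of the two-step sampling described above, using the linearly decreasing distribution of panel (a), P(t) ∝ (T − t); the exact weights are an illustrative choice, not the paper's implementation.

```python
import random

def sample_clip_pair(T, rng=random):
    """Step 1: draw interval t with P(t) proportional to (T - t), so small
    intervals are more likely. Step 2: uniformly place the first clip in
    [0, T - t]; the second clip starts t after the first."""
    weights = [T - t for t in range(T + 1)]
    t = rng.choices(range(T + 1), weights=weights)[0]
    start1 = rng.uniform(0, T - t)
    start2 = start1 + t
    return start1, start2

s1, s2 = sample_clip_pair(T=200)
```

      Because close pairs dominate the sampled positives, the contrastive loss pulls temporally nearby clips together harder than distant ones, which is the relaxation of temporal invariance the note above describes.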

    3. Figure 3. Performance of different sampling distributions. The x-axis is the temporal interval t between two clips in a video, and the y-axis is the sampling probability P(t). We report linear evaluation accuracy upon 200 epochs of pre-training on Kinetics-400

      [[Seedling: How to allow self-supervised model to learn temporal varying features? AND How to relax temporal invariance constraints in self-supervised model?]]

    4. Temporal Augmentation: a sampling perspective.

      [[Seedling: How to allow self-supervised model to learn temporal varying features? AND How to relax temporal invariance constraints in self-supervised model?]]

    5. Although the question of how to apply strong spatial augmentations to videos remains open, a natural strategy is to utilize existing image-based spatial augmentation methods on the video frames one by one. However, this method could break the motion cues across frames. Spatial augmentation methods often contain some randomness such as random cropping, color jittering and blurring as important ways to strengthen their effectiveness. In videos, however, such randomness between consecutive frames could negatively affect the representation learning along the temporal dimension. Therefore, we design a simple yet effective approach to address this issue, by making the spatial augmentations consistent along the temporal dimension. With fixed randomness across frames, the 3D video encoder is able to better utilize spatiotemporal cues. This approach is validated by experimental results in Table 9. Algorithm 1 demonstrates the detailed procedure of our temporally consistent spatial augmentations, where the hyper-parameters are only generated once for each video and applied to all frames. An illustration can be found in Appendix C.2

      [[Seedling: How to learn spatial invariants features from sequential/temporal varying data?]]
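      The key idea (draw augmentation parameters once per video and reuse them for every frame) can be sketched for random cropping; frames are nested lists here for simplicity, and the function is a hypothetical stand-in for the paper's Algorithm 1.

```python
import random

def consistent_random_crop(frames, crop_h, crop_w, rng=random):
    """Crop every frame of a clip with the SAME randomly chosen window,
    so motion cues across frames stay aligned."""
    h, w = len(frames[0]), len(frames[0][0])
    top = rng.randint(0, h - crop_h)      # drawn once for the whole clip
    left = rng.randint(0, w - crop_w)
    return [[row[left:left + crop_w] for row in frame[top:top + crop_h]]
            for frame in frames]

# Toy clip: 3 frames of 4x4 "pixels" whose values encode (frame, row, col)
video = [[[f * 100 + r * 10 + c for c in range(4)] for r in range(4)]
         for f in range(3)]
cropped = consistent_random_crop(video, crop_h=2, crop_w=2)
```

      Per-frame independent cropping would instead jitter the window between frames, injecting spurious "motion" into the clip.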

    6. Spatial Augmentation: a temporally consistent design

      [[Seedling: how to apply data augmentation in contrastive learning?]]

  12. proceedings.mlr.press proceedings.mlr.press
    1. A linear learning rate scaling is used here. Figure B.1 shows using a square root learning rate scaling can improve performance of ones with small batch sizes

      [[Seedling: How to efficiently train model that use contrastive learning?]]
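      The two batch-size scalings compared here can be written down directly; the base learning rate of 0.3 at batch size 256 is an illustrative convention, not necessarily the paper's exact constants.

```python
def linear_lr(batch_size, base_lr=0.3, base_batch=256):
    """Linear scaling: LR grows proportionally with batch size."""
    return base_lr * batch_size / base_batch

def sqrt_lr(batch_size, base_lr=0.3, base_batch=256):
    """Square-root scaling: gentler growth, reported to help small batches."""
    return base_lr * (batch_size / base_batch) ** 0.5
```

      At small batch sizes the square-root rule keeps the learning rate from shrinking as aggressively as the linear rule does.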

    2. Figure 5 shows linear evaluation results under individual and composition of transformations. We observe that no single transformation suffices to learn good representations, even though the model can almost perfectly identify the positive pairs in the contrastive task. When composing augmentations, the contrastive prediction task becomes harder, but the quality of representation improves dramatically. Appendix B.2 provides a further study on composing a broader set of augmentations

      [[seedling: how to apply data augmentation policy in contrastive learning?]]

    3. Linear evaluation (top-1) for models trained with differentloss functions. “sh” means using semi-hard negative mining.

      [[seedling: How to choose contrastive loss?]]

    4. while (semi-hard) negative mining helps, the best result is still much worse than our default NT-Xent loss

      [[seedling: How to choose contrastive loss?]]

    5. Table 2 shows the objective function as well as the gradient to the input of the loss function. Looking at the gradient, we observe 1) ℓ2 normalization (i.e. cosine similarity) along with temperature effectively weights different examples, and an appropriate temperature can help the model learn from hard negatives; and 2) unlike cross-entropy, other objective functions do not weigh the negatives by their relative hardness. As a result, one must apply semi-hard negative mining (Schroff et al., 2015) for these loss functions: instead of computing the gradient over all loss terms, one can compute the gradient using semi-hard negative terms (i.e., those that are within the loss margin and closest in distance, but farther than positive examples).

      [[seedling: How to choose contrastive loss?]]

      [[Seedling: how can relative hardness of negative samples be included in loss function]]
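      A minimal, plain-Python sketch of the NT-Xent idea for a single anchor: cosine similarity scaled by a temperature τ, with a softmax cross-entropy over one positive against in-batch negatives. The vectors and τ below are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity, i.e. the l2-normalized dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nt_xent(anchor, positive, negatives, tau=0.5):
    """-log( exp(sim(a,p)/tau) / (exp(sim(a,p)/tau) + sum_n exp(sim(a,n)/tau)) )"""
    pos = math.exp(cosine(anchor, positive) / tau)
    neg = sum(math.exp(cosine(anchor, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

loss = nt_xent([1.0, 0.0], [0.9, 0.1], [[-1.0, 0.0], [0.0, 1.0]])
```

      Because the softmax denominator weights each negative by exp(sim/τ), harder (more similar) negatives automatically receive larger gradient weight, which is the "relative hardness" weighting the highlight attributes to cross-entropy-style losses.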

    6. Furthermore, even when nonlinear projection is used, the layer before the projection head, h, is still much better (>10%) than the layer after, z = g(h), which shows that the hidden layer before the projection head is a better representation than the layer after.

      [[Seedling: How to efficiently train model that use contrastive learning?]]

      [[Seedling: characteristics and properties of embeddings]]

    7. Different loss functions impose different weightings of positive and negative examples.

      [[seedling: How to choose contrastive loss?]]

    8. A nonlinear projection head improves the representation quality of the layer before it

      [[Seedling: How to efficiently train model that use contrastive learning?]]

    9. One composition of augmentations stands out: random cropping and random color distortion. We conjecture that one serious issue when using only random cropping as data augmentation is that most patches from an image share a similar color distribution. Figure 6 shows that color histograms alone suffice to distinguish images. Neural nets may exploit this shortcut to solve the predictive task. Therefore, it is critical to compose cropping with color distortion in order to learn generalizable features.

      [[Seedling: How to determine degree of difficulty of pretext task in contrastive learning]]

    10. Figure 5. Linear evaluation (ImageNet top-1 accuracy) under individual or composition of data augmentations, applied only to one branch. For all columns but the last, diagonal entries correspond to single transformation, and off-diagonals correspond to composition of two transformations (applied sequentially). The last column reflects the average over the row.

      This figure shows that the order of data augmentation operations doesn't matter much, while the choice of which operations to combine matters a lot (at least when exactly two operations are composed).

      [[Seedling: How to train model that use contrastive learning?]]

    11. Composition of data augmentation operations is crucial for learning good representations

      [[Seedling: How to train model that use contrastive learning?]]

    12. Contrastive learning needs stronger data augmentation than supervised learning

      [[Seedling: How to train model that use contrastive learning?]]

  13. Nov 2021
    1. Maybe the sweet spot is something more like 40 functions general enough to operate usefully on a universal data structure such as lists, but also 10 sets of 6 functions each that are relevant when we take one of 10 specialized views of that universal data structure

      [[re-written]] A good program should consist of flexible functions that can do many things, plus a few sets of specialized functions that let humans efficiently apply our naturally powerful associative memories as tools for reasoning and building things.

    1. What Is a Layer-1 Blockchain? A layer-1 blockchain is a set of solutions that improve the base protocol itself to make the overall system a lot more scalable. There are two most common layer-1 solutions, and these are the consensus protocol changes as well as sharding. When it comes to consensus protocol changes, projects like Ethereum are moving from older, clunky consensus protocols such as proof-of-work (PoW) to much faster and less energy-wasteful protocols such as proof-of-stake (PoS). Sharding is one of the most popular layer-1 scalability methods out there as well. Instead of making a network sequentially work on each transaction, sharding breaks these transaction sets into small data sets which are known as "shards," and these can then be processed by the network in parallel. One of the pros when it comes to layer-1 solutions is that there is no need to add anything on top of the existing infrastructure

      [[FAQ]] is layer-1 blockchain an application layer?
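      The sharding idea described above (partitioning the transaction set so shards can be processed in parallel) can be illustrated with a toy partitioner; the hash-by-sender rule is purely illustrative, not how any real chain assigns shards.

```python
import zlib

def shard_of(sender, num_shards):
    """Deterministically map a sender id to one of num_shards partitions."""
    return zlib.crc32(sender.encode()) % num_shards

def partition(txs, num_shards):
    """Split a transaction list into shards that could be validated in parallel."""
    shards = [[] for _ in range(num_shards)]
    for tx in txs:
        shards[shard_of(tx["from"], num_shards)].append(tx)
    return shards

txs = [{"from": f"user{i}", "value": i} for i in range(6)]
shards = partition(txs, num_shards=4)
```

      Each shard only sees a fraction of the transactions, which is where the parallelism (and the cross-shard communication problem real designs must solve) comes from.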

  14. Oct 2021