911 Matching Annotations
  1. Nov 2023
    1. In this paper, we address the central question inherent in this trade-off: when is coordination strictly necessary to maintain application-level consistency? To do so, we enlist the aid of application programmers to specify their correctness criteria in the form of invariants.

      I.e., invariants as a way to specify database validity criteria, which grants knowledge about which txes are safe to run in parallel and which need to run on top of each other.

    1. Use cases that require transactional isolation of repeatable reads [5] are not a good fit, as FabricCRDT commits transactions even if their read-set is outdated.

      I.e., having tx results as CRDTs allows merging them. But this is only possible if one tx does not change the read deps of the other tx. Otherwise the result of the other tx would be invalid, and it would be invalid to merge it. It would need to be re-run on top. E.g., for cases where a double-spend can occur.

    2. According to our findings, FabricCRDT successfully merges all conflicting transactions without any failures when all transactions use CRDTs.

      Note: when all txes use CRDTs. When there is a non-CRDT tx it may fail.

    3. CRDT Transactions in a Block

      Non-CRDT tx is akin to a snapshot. Multiple snapshots for the same key cannot exist within a block if you don't have rules on how to merge them (the approach described in the XOX paper for post-order execution).

      CRDT ops can be applied on top of a snapshot.

      A snapshot cannot be applied on top of CRDT ops.

      So a block with non-CRDT and CRDT ops seems to only be fully valid if it has the snapshot first and then CRDT deltas on top.
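      A minimal sketch of this validity rule (hypothetical types, not FabricCRDT's actual API), assuming a numeric counter key: at most one snapshot write per key, ordered first, with CRDT deltas merged on top.

      ```typescript
      // Hypothetical sketch: a block of writes to one key is mergeable only if it
      // contains at most one snapshot write, ordered first, with CRDT deltas on top.
      type Write =
        | { kind: "snapshot"; value: number }      // non-CRDT write: replaces state
        | { kind: "delta"; increment: number };    // CRDT delta: e.g. a counter increment

      function applyBlock(initial: number, writes: Write[]): number {
        let state = initial;
        let snapshotSeen = false;
        for (const [i, w] of writes.entries()) {
          if (w.kind === "snapshot") {
            if (snapshotSeen) throw new Error("two snapshots for the same key cannot be merged");
            if (i !== 0) throw new Error("a snapshot cannot be applied on top of CRDT deltas");
            state = w.value;
            snapshotSeen = true;
          } else {
            state += w.increment;                  // deltas commute, so they merge freely
          }
        }
        return state;
      }

      // Valid: snapshot first, then deltas.
      console.log(applyBlock(0, [{ kind: "snapshot", value: 10 }, { kind: "delta", increment: 2 }])); // 12
      ```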

    4. Once a transaction fails, the only option for clients is to create a new transaction and resubmit, which adds to the complexity of Fabric application development

      As shown by XOX, it's possible to retain conflicting transactions and perform them on top of detected "happened before" ones.

    1. where there exists a happened-before relation VoteRegister1

      A malicious voter may issue votes with no Vote1 "happened-before" Vote2. If we have a 2-of-4 endorsement requirement, then he can send Vote1 to two orgs and Vote2 to the other two. Both sets of orgs will endorse the vote.

      Seems it may be tackled by imposing 3-of-4 as the endorsement threshold.

      Or, in general case, > 1/2 orgs.

      Given orgs may be malicious (faulty), we can take it into account by setting a fault threshold. Say to 1/4.

      Then the endorsement threshold will be: ((1/2 + 1/4) * number of orgs) + 1.

      Or, in the general case, endorsement threshold = ((1/2 + fault threshold) * number of orgs) + 1.
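      A tiny sketch of the note's formula (hypothetical helper, not from the paper): require strictly more than (1/2 + fault fraction) of the orgs, so that two disjoint endorsement sets for conflicting votes cannot both reach the threshold even if up to that fraction of orgs endorse both.

      ```typescript
      // Hypothetical helper implementing the formula in the note above.
      function endorsementThreshold(numOrgs: number, faultFraction: number): number {
        return Math.floor((0.5 + faultFraction) * numOrgs) + 1;
      }

      console.log(endorsementThreshold(4, 0));    // 3  -> the "3 of 4" case
      console.log(endorsementThreshold(4, 0.25)); // 4  -> with 1/4 possibly-faulty orgs
      ```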

    1. Compatibility with external oracles: To allow the use of external oracles in the deterministic second execution phase, we gather and save oracle inputs in the pre-order execution step.

      I.e., external oracles are read deps as of pre-order execution step.

    2. A dependency exists when two transactions overlap in some keys of their RW sets (read-only transactions do not even enter the orderer). In that case, we cannot process them independently. Therefore, we need to keep track of dependencies between transactions so we know which subsets of transactions can be processed concurrently

      Note: RW of dependee transaction overlaps with W of dependent transactions. As clarified later in the paper.

    3. This requires the use of a deterministic contract language, such as Ethereum’s Solidity, which must be learned by the application developer community

      Or any other deterministic compute, such as Docker, or Nix, Guix, IPVM.

    1. Rather than moving forward by a fixed number of predecessors in each round trip, this algorithm allows some commits to be skipped, so that it reaches the common predecessors faster.

      Binary search as an alternative

    1. Instead of resolving branches, it permits them, resulting in a Directed Acyclic Graph (DAG) structure of the blockchain rather than a linear one.

      Seems it makes double-spending possible.

    1. In this chapter, we address the orthogonal problem of automatically suggesting an efficient set of MV for a given workload of SPARQL queries.

      Can we avoid the problem of selecting MVs by treating each query as an AST, preserving its results? A set of queries would then form a forest of ASTs, where shared nodes would only be computed once.

      Nodes become MVs.


      Interestingly, a node may be only partially computed.

      E.g., if there is a join node AB that joins A and B nodes, then A could be fully computed, and B could be computed only to find matching results for AB pivot.

      Although here we mix the logical representation of node A, node B, and a pivot node AB, pivot(A, B), with its physical representation of join(A, B) and its optimisation join(A, B(A)), where B(A) is B based on the results of A, which resembles embedding A and AB, rather than A, B, AB.

      These are different ASTs. B based on A not= B.


      All in all, out of logical queries physical ASTs can be derived that are optimized for those queries.


      Perhaps having embeddings as joins is optimal. E.g., AB = join(A, B(A)), ABC = join(AB, C(AB)).
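      A minimal sketch of this idea (assumed shapes, not from the thesis): queries as a forest of AST nodes, with shared nodes memoized so each node is computed once and effectively acts as an MV.

      ```typescript
      // Hypothetical query ASTs; shared nodes are cached by a canonical key.
      type Node =
        | { op: "scan"; pattern: string }
        | { op: "join"; left: Node; right: Node };

      const cache = new Map<string, string[]>();      // node key -> memoized "solutions"

      const key = (n: Node): string =>
        n.op === "scan" ? `scan(${n.pattern})` : `join(${key(n.left)},${key(n.right)})`;

      function evaluate(n: Node, data: Record<string, string[]>): string[] {
        const k = key(n);
        const hit = cache.get(k);
        if (hit) return hit;                          // shared node: reuse the materialized result
        const result =
          n.op === "scan"
            ? data[n.pattern] ?? []
            : evaluate(n.left, data).filter(x => evaluate(n.right, data).includes(x));
        cache.set(k, result);                         // every node becomes a (candidate) MV
        return result;
      }

      // Two queries sharing node A: A is evaluated once, its result reused.
      const A: Node = { op: "scan", pattern: "A" };
      const q1: Node = { op: "join", left: A, right: { op: "scan", pattern: "B" } };
      const q2: Node = { op: "join", left: A, right: { op: "scan", pattern: "C" } };
      const data = { A: ["x", "y"], B: ["y"], C: ["x"] };
      console.log(evaluate(q1, data), evaluate(q2, data)); // [ 'y' ] [ 'x' ]
      ```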

    2. Getting the complete set of solutions of q would require to query the residual pattern (?a,p4,?e) over the RDF dataset and join the results with the partial results of i1 and i2.

      Residual patterns could have indexes created for them on-the-fly as overlapping embeddings, making use of precomputed solutions of those embeddings.

      So during query processing we don't seek matches for all residual patterns and join afterwards, dropping some of the matches. Rather we query for matches based on embedding matches, so we don't get "to be dropped" matches at all.

      This is an approach to "how to efficiently compute a join of embeddings". And it's about having a "join embedding" that joins based on the most-restrictive embedding first. I.e., the embedding that has the fewest matches or, if embeddings haven't been computed yet, the embedding that is expected to be most restrictive (e.g., because it has the most restrictive BGP).
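      A hedged sketch of the "most-restrictive embedding first" join (invented shapes): probe from the embedding with the fewest matches and intersect outward, so "to be dropped" candidates are never materialized.

      ```typescript
      // Hypothetical embedding results keyed by a shared join value.
      type Embedding = { name: string; matches: Set<string> };

      function joinMostRestrictiveFirst(embeddings: Embedding[]): Set<string> {
        const ordered = [...embeddings].sort((a, b) => a.matches.size - b.matches.size);
        // Start from the smallest (most restrictive) embedding and intersect outward.
        return ordered
          .slice(1)
          .reduce(
            (acc, e) => new Set([...acc].filter(x => e.matches.has(x))),
            new Set(ordered[0].matches)
          );
      }

      const result = joinMostRestrictiveFirst([
        { name: "i1", matches: new Set(["a", "b", "c", "d"]) },
        { name: "i2", matches: new Set(["b", "c"]) },   // most restrictive, probed first
      ]);
      console.log([...result]); // [ 'b', 'c' ]
      ```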

    3. We focus on covers with overlapping embeddings since they allow better estimations of the cost savings that can be achieved with them (see Section 2.3).

      I.e., covers with overlapping embeddings allow better cost saving estimation than covers with non-overlapping embeddings.

    4. Query

      This query could have its own index, making use of (c) and (d) to maintain its results by joining their solutions.

      I.e., there can be an index per query. Query patterns would be broken down to embeddings for the most efficient compute of queries.

      I.e., query as an AST of embeddings.

    5. However, for the rest of this thesis we only consider intensional overlaps to avoid the costly pre-computation and maintenance of extensional overlaps

      An extensional overlap is the one not made intensional.

      Extensional overlaps can be made intensional overlaps.

      And maintained in the same way.

    6. Let G be a data graph, I the set of indexes on G, and Q a query against G. We call an index I ∈ I eligible for Q iff P(I) v P(Q). The set of all eligible indexes for query Q is denoted by IQ.

      The notation is meh.

      Alternative:

      G - graph

      P - pattern

      S - solution

      I - index

      EI - eligible index

      EIs = ei(P, I) - eligible indexes I for P

    7. it is not possible to map variables ?y and ?z from i2 to any variable of q

      ?c would contain all ?y. Since ?y is a more restrictive pattern than ?c.

      I.e., it may be possible to adopt some of the ?y matches for ?c.

    8. each occurrence of the triple pattern (?a,p1,?b) of q must be contained in the set of occurrences of i1

      Not only be contained, but exactly match.

    9. During index creation, indexes are computed and persistently materialized and the results are made available for the query processor. In our approach, indexes are defined offline, i.e., created by an administrator before queries are executed.

      An alternative approach is to create indexes at an arbitrary time, e.g., after query execution.

      Then, during query execution, the available indexes are used. And the result can always be indexed as one needs.

      This would speed up query execution, as there is no cost of index creation and maintenance at this stage.

    10. The frequency of I in G, written as #G(P), is the number of occurrences of P in G.

      |O| would be more uniform in style, akin to |P|.

      Or O(P(Q), G), |P(Q)| - size of query, |O(P(Q), G)| - size of solutions.

      Or, in polish notation, (O (P Q) G).

    11. O represents the set of all occurrences of P in G

      I.e., O = solutions of P.

    12. which is the set of triple patterns in the body of Q

      I.e., a BGP.

    13. Figure 2.1 shows the graph representation of a triple pattern.

      I.e., a triple pattern reified in RDF.

    14. The query pattern must be divided into that part that is executed by using materialized results and the rest of the query patterns that are not possible or not beneficial to execute using them

      Perhaps it would be nice to have that as an optimization at the query engine level?

    15. 1.2.6 Materialized View Selection Problem

      A naive approach is to have MV per each intersection of queries.

    16. The MV–selection problem can be viewed as a combinatorial optimization problem since its solution has to be chosen from among a finite number of possible configurations. Therefore, explicit and complete enumeration and scoring of all possible MV subsets is, in principle, a possible way to solve it. This method is however impractical in most cases, since the computational effort grows exponentially with the number of candidate MV [25].

      Enumerating all possible MV combinations in order to find the best one is found to be impractical.

    17. However, we deal with the fact that queries may not be fully answered using only materialized results. Thus, the partial results need to be extended to answer the query.

      Resembles "Partial Materialized Views" approach used by SageDB.

    1. doc.get(“todo”).idx(1).get(“done”) := true;

      A fine dot.notation interface for delta-based CRDT

    2. In order to generate globally unique operation identifiers without requiring synchronous coordination between replicas we use Lamport timestamps [42].

      Ehh. Isn't author+deps enough?

    3. “done”: true

      Ehh, why?

      This node has been marked as deleted and has a "done": true field. Why is this field leaked out of the deleted node?

      Data Laced with History gets it right by having the data structure as an AST. There you'd have a node with this field, and the node has been marked as deleted, resulting in the node and all its fields being "deleted".

    1. Collaboration = keep all edits and merge them

  2. agregore.mauve.moe
    1. Agregore works by letting you load web pages and content from peer to peer protocols the same way you would load them from a website.

      I.e., use Hypercore to maintain a personal website. Use Agregore as the browser.

    1. Append a new value to the autobase.

      This makes autobase view not just a view.

      Why isn't it done in one of the inputs?

    1. Hypercore is a secure, distributed append-only log built for sharing large datasets and streams of real-time data. It comes with a secure transport protocol, making it easy to build fast and scalable peer-to-peer applications.

      Hypercore bundles CRDT with p2p.

      It is akin to OrbitDB's db, if only it had one writer.

      OrbitDB is a toolkit where you can assemble your CRDT p2p solution, making use of components:
      - ipfs-log as the base for your CRDT
      - libp2p as the base for your p2p
      - auth to manage readers and writers

      Whereas Hypercore seems to bundle it all up.

  3. dat-ecosystem-archive.github.io
    1. Multi-writer will allow Dats to be modified by multiple devices and multiple authors at the same time. Each author will have their own secret key and publish a Dat with their data in it. Multi-writer fuses all of these separate Dats into one “meta-Dat” that is a view of everyone’s data combined
    1. multi-causal operations will prevent the structured log from being linearly interpretable and may severely affect performance, simplicity, and intent

      Welp, what can we do when an op does express intent that depends on multiple others? E.g., transfer money from Bob to Alice.

    2. This won’t be convergent by default since the operations don’t have an inherent total order, but it’s easy to fix this by giving each one a globally-unique ID in the form of an owner UUID 3 plus a Lamport timestamp. (In the diagrams below, this is encoded as “SX@TY”, where “SX” represents the UUID for site X, and Y is just the timestamp.) With this scheme, no two operations can have the same ID: operations from the same owner will have different timestamps, while operations from different owners will have different UUIDs

      Welp, would be nice to have the author inside the operation.

      Lamport timestamp doesn't seem to add any new information. It's an index of the op, which can be derived from the longest chain of its parents if needed.

      Wall clock is useful. It's akin to "decision time".

      Wall clock + author would be enough for uniqueness of operations. And linking can be done by hash.
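      A small sketch of that alternative (hypothetical op format, not the post's scheme): author + wall clock for uniqueness, causal links by content hash; a Lamport-style index stays derivable from the parent chain if needed.

      ```typescript
      import { createHash } from "node:crypto";

      type Op = { author: string; wallClock: number; parents: string[]; payload: string };

      const opId = (op: Op): string =>
        createHash("sha256").update(JSON.stringify(op)).digest("hex");

      const root: Op = { author: "site-A", wallClock: 1700000000000, parents: [], payload: "insert x" };
      const child: Op = {
        author: "site-B",
        wallClock: 1700000000500,
        parents: [opId(root)],          // causal link by hash
        payload: "insert y",
      };
      console.log(opId(child));
      ```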

    3. Nonetheless, I made one additional tweak to ensure that remapping only happens very rarely. Instead of storing just the UUID in the site map, I also store the wall clock time at which the UUID was added. In the site map, these tuples are sorted first by time, then by UUID. Assuming that modern connected devices tend to have relatively accurate clocks (but not relying on this fact for correctness), we can ensure that new sites almost always get appended to the end of the ordered array and thus avoid shifting any of the existing UUIDs out of their previous spots. The only exception is when multiple sites happen to be added concurrently or when the wall clock on a site is significantly off.

      Perhaps an alternative way is to have a tx log on each device, which will only be accreted with newly observed ops. Then its order never changes and we can use the idx as identifiers that stay evergreen.

    4. I’ve solved this with the help of a secondary CRDT that’s stored and transferred along with the CT: an ordered, insert-only array of known UUIDs called the site map. The 16-bit site identifier corresponding to a UUID is simply its index in the array.

      Reminds of the technique WasmTree uses, where an internal representation of RDF nodes replaces URLs with an integer ID and has a map to resolve the URLs back.

      Except that here we use the idx of a log entry in place of its UUID. Which requires updating the mapping on merge of new ops. Whereas by having integer ids in entries, in place of UUIDs, we would be spared from that. The con is that there is a storage cost to store those integer ids, which is lower than the cost of UUIDs, but higher than that of the idx approach.

    1. We see further interesting problems around types, schema migrations, and compatibility. Different collaborators may be using different versions of an application, potentially with different features. As there is no central database server, there is no authoritative “current” schema for the data. How can we write software so that varying application versions can safely interoperate, even as data formats evolve?

      Events to the rescue?

    2. CRDTs do not require a peer-to-peer networking layer; using a server for communication is fine for CRDTs. However, to fully realize the longevity goal of local-first software, we want applications to outlive any backend services managed by their vendors, so a decentralized solution is the logical end goal.

      Should not bake a server into lofi software as a means to sync. It needs to outlive the server.

    3. URLs are a good mechanism for sharing

      URLs as collaboration sites have been found handy.

      They could be treated as "compilation sites" on a subject.

      Where each user could have their own compiled view.

      I.e., they are IDs of subjects. Note: ID not= content. Content is user-specific. A personal compilation / view on it.

    4. Visualizing document history is important

      Would it be valuable to have it generic?

      Same time-travel interface for all the apps.

    5. However, in all the prototypes we developed, we found that the default merge semantics to be sufficient, and we have so far not identified any case requiring customised semantics

      The mentioned "one user sets state on a task to one state and another to another" happened. As mentioned, it's not clear how to resolve. So they did encounter merge conflicts. I guess these conflicts are not app-specific and conflict resolution can be implemented in a generic way as well. Such as present users with choice which delta to adopt.

    6. In practice, the CouchDB model has not been widely adopted [42]. Various reasons have been cited for this: scalability problems when a separate database per user is required; difficulty embedding the JavaScript client in native apps on iOS and Android; the problem of conflict resolution; the unfamiliar MapReduce model for performing queries; and more. All in all, while we agree with much of the philosophy behind CouchDB, we feel that the implementation has not been able to realize the local-first vision in practice.

      CouchDB is a meh local-first backend (storage).

    7. These thick-client apps have the advantage of being fast and working offline, because the server sync happens in the background.

      Rather "thick-client local-first apps". As "thick-client" on it's own does not imply local-first, there may be a server.

    8. web apps will never be able to provide all the local-first properties we are looking for, due to the fundamental thin-client nature of the platform

      Web apps can be thick clients.

      Optionally they can be loaded gradually. E.g., with granularity down to components, as shown by ReactJS.

      Having loaded the entirety, they can potentially work offline.

      Making a web app and its components content-addressable would allow loading them with no regard to location / some authoritative service.

    9. In many web browsers, if the user clears their cookies, all data in local storage is also deleted [121]; while this is not a problem for a cache, it makes the browser’s local storage unsuitable for storing data of any long-term importance

      Local Storage is not a good solution for persistence.

    10. In principle it is possible to collaborate without a repository service, e.g. by sending patch files by email [48],

      Sending git diffs as text over email or other media is the way it was meant to be used, if I'm not mistaken.

    11. It’s interesting to note that most software engineers have been reluctant to embrace cloud software for their editors, IDEs, runtime environments, and build tools. In theory, we might expect this demographic of sophisticated users to embrace newer technologies sooner than other types of users.

      I.e., software engineers may be the most keen to adopt lofi concepts.

    12. But Git has no capability for real-time, fine-grained collaboration, such as the automatic, instantaneous merging that occurs in tools like Google Docs, Trello, and Figma

      Because it captures snapshots and not deltas.

    13. Git is excellent for asynchronous collaboration, especially using pull requests (Figure 6, [61]), which take a coarse-grained set of changes and allow them to be discussed and amended before merging them into the shared master branch

      There is no "shared master branch" in git. Subsequently, no authority that decides what's in it. Each git user is free to adopt any changes they see fit.

    14. If your computer’s hard drive fails, you can restore your work simply by installing the app and waiting for it to sync.

      In the event that the sync service is down and you lose your local copy, you won't be able to restore.

      For this case it's better to have multiple sync options. Such as your other devices and incentivized third-parties.

    15. The flip side to this is a total loss of ownership and control: the data on the server is what counts, and any data on your client device is unimportant — it is merely a cache

      Whereas it is the opposite in reality. Devices are the primary sites of intent they issue. And server is merely an aggregation site.

    16. Attachments are easy to understand and trustworthy.

      Given email is signed.

    17. Using a version control system such as Git

      Git alone does not dictate how one transfers stuff.

    18. Consider a significant personal creation, such as a PhD thesis or the raw footage of a film. For these you might be willing to take responsibility for storage and backups in order to be certain that your data is safe and fully under your control

      Losing a PhD thesis, especially after it has been published and some other papers have already referred to it / based their reasoning on top of it, would be a serious hit to humanity's knowledge. Sadly, this is the reality today; it may happen.

    19. With data ownership comes responsibility: maintaining backups or other preventative measures against data loss, protecting against ransomware, and general organizing and managing of file archives

      Would be nice for lofi software to make it as seamless as replicating across chosen parties automatically.

      E.g., in your account you linked your devices and chose them as replication sites. In addition you chose a third party, e.g., Web3Storage.

      So you can use apps and rest worry-free about persistence of your data.

    20. so you have the freedom to process this data in arbitrary ways

      Even better if the software uses a known data model, such as RDF, so you're not locked onto the tool.

      Perhaps even better to capture your intent as events and have data models derived out of it. This way you're not even locked into, e.g., the RDF ecosystem.

    21. You should be able to copy and modify data in any way, write down any thought,7 and no company should restrict what you are allowed to do

      I.e., don't rely on a third-party as a medium to express yourself. Otherwise they're in control of what you can express.

      E.g., Twitter/X, Google Docs and their bans.

    22. Although there does not seem to be a great danger of Google shutting down Google Docs anytime soon, popular products (e.g. Google Reader) do sometimes get shut down [106] or lose data [105], so we know to be careful

      I have dread in my head every time I use a cloud service that there is an issue about to happen. There will be a data breach, data loss, the service becoming unavailable (e.g., they don't like you anymore; e.g., GitHub can block you from your private repos at their will), it's just a matter of time.

      So I need to get my data out, back it up, if the service has been kind enough to allow for that. But that's a burden. It's a hidden cost you learn to recognize over time. It may not be apparent from the start to everybody.

    23. Cuneiform script on clay tablet, ca. 3000 BCE. Image from Wikimedia Commons [5].

      This needs explanation.

    24. Besides having several people edit the same document in real-time, it is sometimes useful for one person to tentatively propose changes that can be reviewed and selectively applied by someone else

      This stems from the design that there is a mutable place and some authority over it. E.g., a Google Doc, Wikipedia page.

      Would be nicer to embrace people as primary sites of their intent. And allow composing them. This way a person who would previously need to request changes to a doc would simply have a kind of forked doc locally, where he would make the change. And that delta may be shared with the original doc's authors, who may accept or reject it. Regardless of their decision, the forked version stays as intended, with the delta.

      I.e., docs are sites that aggregated some deltas. Have a site per person. Allow sites to exchange deltas. This allows for having personal views on a subject.

      Akin to how it's done in Git. If only it tracked deltas and they could be arbitrary.

      I.e., NextGraph gets it pretty damn right.

      Also RhizomeDB is in this spirit.

    25. never needing to show you a spinner while you wait

      For the cases when you do wish to send an op to a remote party, you'd still need a spinner. Given you embrace distributed compute.

    26. Local-first software is different: because it keeps the primary copy of the data on the local device, there is never a need for the user to wait for a request to a server to complete.

      Local-first has the most responsive UX due to operations being performed locally, as opposed to being sent to the server.

    27. The user interface may try to hide that latency by showing the operation as if it were complete, even though the request is still in progress — a pattern known as Optimistic UI [92] — but until the request is complete, there is always the possibility that it may fail (for example, due to an unstable Internet connection).

      Optimistic UI / Optimistic Updates may actually fail at the source being updated (surprise-surprise, the word "optimistic" is in the definition).

    1. Online event processing (OLEP) [36] is an approach for handling such multi-partition interactions by breaking them down into multiple stream processing stages.

      This breaks down the original intent transferMoney(Alice -> Bob) into multiple ones. Thus we don't have atomicity, as shown in the paper.

      Breaking down seems to be just one approach of how such events can be handled by OLEP though. An implementation detail.

      It seems possible to achieve atomicity and isolation for such events, as described here.

    2. then it is no longer sufficient to process the events in each partition independently

      We can treat partitions as aggregates of events.

      Given there is Alice and Bob, and Alice transfers Bob money.

      Then we have 3 partitions, that aggregate their respective events.

      Event transferMoney(Alice -> Bob) would end up in all three partitions: Alice, Bob, Alice+Bob.

    3. A challenge in many event-based systems is how to handle changes in the schema or data format [56]. The problem is especially pronounced in systems where we cannot guarantee that all replicas are running the same version of the software. For example, in applications that run on end-users’ devices, it is up to the user to decide when to install a software update; thus, a user who is hesitant to install updates may be running a much older version of an application than a user who always installs the latest version. Nevertheless, those users should be able to interoperate as much as possible, which means that any data format changes must be both forward and backward compatible. The challenge becomes even greater if users are able to customise the software they are running, e.g. through end-user programming [30].

      Event-processing logic may be different across users.

    4. In systems that are based on immutable events, one promising approach is to use bidirectional functions (lenses) to convert between different versions of a data model, which allows different replicas to use different state representations while still being able to interoperate [49].

      Bidirectional lenses as a way to integrate different data models.

    5. Partially ordered event-based systems are well placed to support such branching-and-merging workflows, since they already make data changes explicit in the form of events, and their support for concurrent updates allows several versions of a dataset to coexist side-by-side.

      Event-based systems allow for divergent views as first-class citizens.

    6. in Git terms, one user can create a branch (a set of commits that are not yet part of the main document version), and another user can choose whether to merge it

      Users are empowered to create composed views out of events of their choice.

      I.e., collaboration as composition.

      I.e., divergent views as first-class citizens.

    7. Database transactions support a weak form of multi-version concurrency control by allowing an uncommitted transaction’s writes to be either committed or rolled back by aborting the transaction [8]. However, most databases do not allow one user to share the state of an uncommitted transaction with another user, and most databases do not allow the user to find out what data has changed in an uncommitted transaction (the equivalent of git diff). Moreover, in most database systems, an uncommitted transaction may hold locks and thus prevent other transactions from making progress.

      I.e., DB-first approach does not have divergent view as first-class citizen.

    8. compute the differences between versions of the document to visualise the change history
    9. we can reconstruct the state of the document at any past moment in time
    10. an initial event represents only the intention to perform a certain action
    11. Moreover, if processing an event may have external side-effects besides updating a replica state – for example, if it may trigger an email to be sent – then the time warp approach requires some way of undoing or compensating for those side-effects in the case where a previously processed event is affected by a late-arriving event with an earlier timestamp. It is not possible to un-send an email once it has been sent, but it is possible to send a follow-up email with a correction, if necessary. If the possibility of such corrections is unacceptable, optimistic replication cannot be used, and SMR or another strongly consistent approach must be used instead. In many business systems, corrections or apologies arise from the regular course of business anyway [27], so maybe occasional corrections due to out-of-order events are also acceptable in practice.
    12. However, when a replica processes events out of timestamp order (inserting an event somewhere in the middle of the timestamp-ordered sequence), it must be able to roll back the replica state to the state at the time corresponding to the insertion position, apply the new event, and then replay the events whose timestamps are greater than that of the new event [64]. This approach is known as time warp [31].

      time warp as solution for eventually consistent derived state out of events.

      Alternative solution is to keep snapshots.

      These two solutions can complement each other.
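      A minimal sketch of combining the two (assumed event/snapshot shapes, not from the paper): on a late-arriving event, roll back to the nearest snapshot at or before its timestamp and replay only from there, instead of replaying from the very beginning.

      ```typescript
      type Event = { ts: number; delta: number };
      type Snapshot = { ts: number; state: number };

      function insertWithTimeWarp(
        log: Event[],
        snapshots: Snapshot[],
        apply: (state: number, e: Event) => number,
        late: Event
      ): { log: Event[]; state: number } {
        const newLog = [...log, late].sort((a, b) => a.ts - b.ts);
        // Nearest snapshot not after the late event's timestamp (fall back to an empty state).
        const base =
          [...snapshots].reverse().find(s => s.ts <= late.ts) ?? { ts: -Infinity, state: 0 };
        const state = newLog.filter(e => e.ts > base.ts).reduce(apply, base.state);
        return { log: newLog, state };
      }

      const apply = (s: number, e: Event) => s + e.delta;
      const log: Event[] = [{ ts: 1, delta: 1 }, { ts: 3, delta: 10 }];
      const snapshots: Snapshot[] = [{ ts: 1, state: 1 }];
      console.log(insertWithTimeWarp(log, snapshots, apply, { ts: 2, delta: 5 }).state); // 16
      ```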

    13. In a partially ordered system it is still possible to enforce a total order on events after the fact, as illustrated in Figure 2. We do this by attaching a logical timestamp to each event; Lamport timestamps [45] are a common choice.
    14. However, other events may be concurrent, which means that neither happened before the other; in this case, different replicas may process those events in a different order [10].
    15. If permanent deletion of records is required (e.g. to delete personal data in compliance with the GDPR right to be forgotten [62]), an immutable event log requires extra care.
    16. In applications with a high rate of events, storing and replaying the log may be expensive
    17. the level of indirection between the event log and the resulting database state adds complexity in some types of applications that are more easily expressed in terms of state mutations
    18. it is less familiar to most application developers than mutable-state databases
    19. Blockchains and distributed ledgers also use SMR, in which case the chain of blocks (and the transactions therein) constitutes the event log, the ledger (e.g. the balance of every account) is the resulting state, and smart contracts or the network’s built-in transaction processing logic are the state transition function [66].
    20. it is easy to maintain several different views onto the same underlying event log if needed
    21. If the application developers wish to change the logic for processing an event, for example to change the resulting database schema or to fix a bug, they can set up a new replica, replay the existing event log using the new processing function, switch clients to reading from the new replica instead of the old one, and then decommission the old replica [34].
    22. well-designed events often capture the intent and meaning of operations better than events that are a mere side-effect of a state mutation [68].
    1. Disadvantages of the OLEP approach. In the previous examples, log consumers update the state in data stores (the database and search index in Figure 2; the account balances and account statements in Figure 3). While the OLEP approach ensures every event in the log will eventually be processed by every consumer, even in the face of crashes, there is no upper bound on the time until an event is processed. This means if a client reads from two different data stores that are updated by two different consumers or log partitions, then the values read by the client may be inconsistent with each other. For example, reading the source and destination accounts of a payment may return the source account after the payment has been processed, but the destination account before it has been processed. Thus, even though the accounts will eventually converge toward a consistent state, they may be inconsistent when read at one particular point in time

      There are two culprits here: 1. The original intent (transfer from Alice to Bob) has been split into different intents (withdraw from Alice, receive to Bob).

      I.e., there is no atomicity at the level of the original intent, and subsequently there is no isolation.

      2. Reads are executed on derived views (DBs) that are stale.

      I.e., there is no consistency.

      Instead we can have one event that describes the original intent. It can be complemented by a withdrawal approval and a receiving approval. This augmented event, in the form of (approvedReception (approvedWithdrawal (transaction Alice->Bob))), can be replicated across the corresponding source account and destination account logs. This way we have a holistic view on what happens. Replicating this augmented event across the source account and destination account leads to eventually consistent states there. Since we have a requirement on atomicity of writes, this event belongs in the Alice+Bob log, where each event relating to both Alice and Bob ends up. The Alice log and the Bob log depend on the Alice+Bob log, thus ensuring that Alice and Bob are consistent with regard to Alice+Bob transactions.

      To cater for 2), we need to make sure that reads happen on top of the latest log entry. We can represent a read as an event that depends on the latest log event. Thus, it'll be processed on the DB state as of that depended-on event.
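      A hedged sketch of the augmented-event idea (invented shapes, not from the article): one transfer event carrying the original intent lands in the joint Alice+Bob log and is replicated into both account logs, so both balances converge on the same transaction.

      ```typescript
      type Transfer = { kind: "transfer"; from: string; to: string; amount: number };

      const logs: Record<string, Transfer[]> = { Alice: [], Bob: [], "Alice+Bob": [] };

      function appendTransfer(e: Transfer): void {
        logs[`${e.from}+${e.to}`].push(e);   // the joint log owns the atomic write...
        logs[e.from].push(e);                // ...and both account logs replicate the same event
        logs[e.to].push(e);
      }

      function balance(account: string, initial: number): number {
        return logs[account].reduce(
          (b, e) => b + (e.to === account ? e.amount : -e.amount),
          initial
        );
      }

      appendTransfer({ kind: "transfer", from: "Alice", to: "Bob", amount: 10 });
      console.log(balance("Alice", 100), balance("Bob", 0)); // 90 10
      ```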

    2. Splitting a “transaction” into a multistage pipeline of stream processors allows each stage to make progress based only on local data; it ensures that one partition is never blocked waiting for communication or coordination with another partition.

      Event logs allow for gradual processing of events.

    3. From a data analysis point of view, an event log is more valuable than the state in a database. For example, in an e-commerce setting, it is valuable for business analysts to see not only the final state of the cart at checkout, but also the full sequence of items added to and removed from the cart, since the removed items carry information, too (for example, one product is a substitute for another, or the customer may return to buy a certain item on a later occasion).

      Event logs represent deltas, whereas DBs are snapshots. Deltas may be useful in order to get insight about intentions.

    4. In contrast, in a database that supports arbitrary insertions, updates, and deletes, it is much harder to recover from incorrect writes, potentially requiring the database to be restored from a backup.

      The DB-first approach makes it harder to reconcile events that have happened.

    5. Some log-based stream processors such as Apache Flink support so-called exactly-once semantics, which means that even though an event may be processed more than once, the effect of the processing will be the same as if it had been processed exactly once. This behavior is implemented by managing side effects within the processing framework and atomically committing these side effects together with the checkpoint that marks a section of the log as processed.

      Only-once semantics for side-effects, by linking side-effects to a checkpoint. Thus we have a kind of commit with side-effects as of some checkpoint. They are processed only then. And having the checkpoint guarantees that processing will resume from this state, and so side-effects won't be issued twice.

      This is not the same as idempotence of side-effects. It merely guarantees that a side-effect is expressed only once.

      It relies on event processing not replaying events before a checkpoint. This is not the case for all architectures. Some may use the time-warp technique. Some may replay from the start.

    6. Since it is possible for an event to be processed more than once when recovering from a failure, state updates must also be idempotent.

      Idempotency of side-effects as a solution for multiple invocation of the same side-effect (intent).

    7. Thus, if a subscriber crashes and restarts, it may append duplicate events to other logs.

      A solution is to have events idempotent.

      E.g., to have events describe intent, and be content-addressable.

      Then, the same intent = the same content-based id = noop by others that already received it.
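      A small sketch of that (hypothetical log API, not from the article): intent events are content-addressed, so a duplicate append after a crash is a no-op.

      ```typescript
      import { createHash } from "node:crypto";

      type Intent = { action: string; params: Record<string, unknown> };

      const seen = new Map<string, Intent>();    // content id -> event

      function append(e: Intent): string {
        const id = createHash("sha256").update(JSON.stringify(e)).digest("hex");
        if (!seen.has(id)) seen.set(id, e);      // duplicate delivery: same id, nothing appended
        return id;
      }

      append({ action: "transfer", params: { from: "Alice", to: "Bob", amount: 10 } });
      append({ action: "transfer", params: { from: "Alice", to: "Bob", amount: 10 } }); // no-op
      console.log(seen.size); // 1
      ```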

    8. A subscriber periodically checkpoints the latest LSN it has processed to stable storage. When a subscriber crashes, upon recovery it resumes processing from the latest checkpointed LSN. Thus, a subscriber may process some events twice (those processed between the last checkpoint and the crash), but it never skips any events. Events in the log are processed at least once by each subscriber.

  4. Oct 2023
    1. To refer to a piece of content, assign it an immutable Unique ID (UID). Use that UID in operations involving the content, instead of using a mutable descriptor like its index in a list.

      Or use ID for a node in causally aware DAG as its hash.

    1. all commits depending on an expiring commit must expire at the same time as, or earlier than the one they’re depending on.

      why so?

    2. Data deletion is possible by setting an expiry time for the storage objects.

      This is akin to specifying "validTo". However, why remove "no longer valid" commits? They can be of use for time-travelling queries, e.g., "give me the state as of that validTime". And performing such queries at a time when something's expired could omit/filter it out.

    3. its commits can be garbage collected

      those, that are not used in other branches

    4. or the branch owner can make a snapshot and compact the commits of the previous branch and thus remove dependencies on earlier commits

      Huh, we're losing authorship this way. And commit deduplication.

    5. and removing access from past members by excluding those members from the newly encrypted branch definition

      why not have a removeAccess kind of commit?

      Which would allow to have management of authorized parties without the need for creating a new branch.

    6. in order to reduce branching in the DAG

      how so?

    7. and to allow deduplication of commit bodies

      Nice

    8. Valid commits are acknowledged by a quorum of publishers in subsequent commits.

      We may end up in a scenario where a SPARQL tx that generated commits across multiple repos is only partially valid. In one repo its commit is valid, in another it's considered invalid. Leaving us in a half-horse-half-zebra state.

    9. ackDelay

      commitAckDelay?

    10. Reliability

      Transmitting an event twice is a noop when we have causal deps on events. Is this purely for optimization purpose?

    11. Causal delivery

      Why have it if each commit has its dependencies?

    12. public key & secret

      Why have secret? Is public key not enough to uniquely ID a repo?

    13. Each user is responsible for acquiring access to one or more core nodes of their choice, which can be self-hosted or offered by service providers.

      So the brokers are not per pub/sub / repo, but per user. They are a contact point / API / gateway for that user.

    14. The core network facilitates communication among remote nodes in different edge networks, and enables asynchronous communication via store-and-forward message brokers that provide routing and storage services.

      These brokers take on two responsibilities: 1) overall network health, 2) store-and-forward for a specific overlay.

      They'd need to be incentivized. The stakeholders of the responsibilities differ: for 1) the stakeholders are everybody; for 2) the stakeholders are that specific overlay's members.

      How is this incentivization done? Can IPFS services be used for 2)? Such as Web3Storage for storage.

    15. Data that has been removed by a commit remains in the branch, since all commits are kept in the branch.

      Since it's content-addressed, can peers agree on not keeping / pinning such content?

    1. how does decentralised identity and authentication work?

      e.g., auth as DID that have public keys on it

    2. CRDTs are the answer to this challenge as they bring strong eventual consistency and also offline-first capabilities.

      CRDTs at the level of compute-as-data especially interesting imo

    3. E2EE forces the data to be present and manipulated locally, which in turn implies that data processing (query engine) must run in embedded/mobile devices.

      Still, a third-party can be authorized with access to data and asked to perform a query on behalf of user.

    4. Therefor decentralised PubSub technologies should be used in order to synchronise efficiently semantic data between peers.

      Semantic Web PubSub on top of libp2p?

    1. O(m ∗ n) time

      Hmm, it seems that the proposed algorithm may terminate prior to reaching n, having found the latest dependency.

      Additionally, the algorithm can be restructured to complete in one pass, as "go through the log until you have found all latest deps". Then the algorithm will have time complexity up to O(n), if I'm not mistaken.

      As an alternative, perhaps keeping indexes per variable may provide an interesting tradeoff of time for space. UPD: this technique is described in 3.3.1.
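      A sketch of the single-pass variant (invented log shape, not the paper's data structures): walk the log from newest to oldest and stop once the latest write for every read variable has been found, giving at most O(n).

      ```typescript
      type LogEntry = { index: number; writes: Set<string> };

      function latestDeps(log: LogEntry[], readVars: string[]): Map<string, number> {
        const deps = new Map<string, number>();
        const pending = new Set(readVars);
        for (let i = log.length - 1; i >= 0 && pending.size > 0; i--) {
          for (const v of [...pending]) {
            if (log[i].writes.has(v)) {
              deps.set(v, log[i].index);   // latest entry writing v
              pending.delete(v);           // no need to look further back for v
            }
          }
        }
        return deps;
      }

      const log: LogEntry[] = [
        { index: 0, writes: new Set(["x"]) },
        { index: 1, writes: new Set(["y"]) },
        { index: 2, writes: new Set(["x"]) },
      ];
      console.log(latestDeps(log, ["x", "y"])); // Map { 'x' => 2, 'y' => 1 }
      ```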

    2. Connect the new node with existing DFG nodes to track any AST read dependencies.

      It seems this can also be delayed until a read request, since dependencies are needed to perform computation, and computation is only needed for reads.

    3. Each node in the DFG is labelled with the positive integer, which specifies their specific position in the ordered sequence.

      Could instead store the last AST and have that AST point to the dependent ASTs, making one AST DAG. Content-addressing of ASTs seems nice too.

    1. familiar querying power of SQL in the client.

      SQL may not be that familiar for web devs.

      GraphQL and dot.notation is what they're used to.

    2. They both suggested emulating API request/response patterns through a distributed state machine running on a replicated object.

      Have authority to sign / process updates.

      I.e., there's an authority, e.g., a shop, and it would need to process requests.

      What the architecture can do is abstract from URLs/exact endpoints, having an ID of the authority instead, and abstract away the transport layer, i.e., nodes in the network talk with each other and we don't care how; all we care about is expressing intents as data, and they get passed around somehow.

    1. I had off channel conversation with @expede where she identified following problem with the task/spawn / task/fork design Introduces more privitives for things that we can already do (enqueuing more tasks) Introduces nesting of invocations (more cycles and network to discover the entire workflow) Need to add a version that says "but don't run it yourself" when that's not a problem that we have today The label "spawn" reads poorly if it's going to dedup (looks like an imperative command but isn't) One that especially resonated that is that they imply that you always ask executor to run the task as opposed to find a prior task execution.

      Good points.

    1. actors

      couples producer with consumer. Pub/sub would be a simpler approach

    2. Task

      Receipt, perhaps?

    3. WasmTask

      Task, perhaps

    4. signature

      Have it as part of Workflow, signing over its fields? As in UCAN and Invocation spec

    1. This field helps prevent replay attacks and ensures a unique CID per delegation.

      have REQUIRED decisionTime instead?

    1. which requires gaining a global lock on the job

      Perhaps an alternative strategy is to authorize side-effectful resources with "only once" restrictions?

      E.g., issue UCAN that limits capability to 1 invocation.

      E.g., one tweet.

      It can be freely delegated by executor further, but only one invocation will be possible.

    2. There may simply be a tradeoff for the programmer to say "yes I really need this effect, even though I'll have fewer options".

      Decouple effect from pure compute? Make it "await" that compute.

  5. Sep 2023
    1. Yeah, I agree with you, at least in principal! I think that revocation and memoization get harder if we aren't including the proof CIDs in the credential directly. How would you handle these cases?

      Memoization:

      ```
      let allUCANs = findAllUCANs(UCANs, UCANRevocations)
      // a solution of UCANs that allows for the op
      let opUCANs = findOpUCANs(op, allUCANs)
      let whetherWithinTimeBounds = withinTimeBounds?(opUCANs, now)

      let whetherStillCan = stillCan?(opUCANs, opUCANRevocations, whetherWithinTimeBounds) // memoized
      // becomes false when revocations arrive or time bounds are exceeded
      ```

    2. I do still think that defining and spec-ing unattenuated delegation is good idea. With all the above I also think that { with: “*”, can: “*” } is the most intuitive way.

      att represents attenuation / restriction / narrowing down.

      Perhaps the absence of attenuation (the att field) is a way to represent "nothing's restricted".

    1. because it now has to be evaluated twice

      Even if the BGP evaluation engine is to cache results, merging the low-selectivity BGP, b1, would incur the cost of joining it with the merged-in nodes, b2 and b3. Which is one merge more than in the original BE-tree.

    2. μ1 ∈ Ω1 ∧ μ2 ∈ Ω2 ∧ μ1

      Shouldn't these be intersecting / common bindings?

      I.e., for each common binding, they should be equivalent.

      I.e.,

      Ω1 |><| Ω2 = {μ1 ∪ μ2 | μ1 ∈ Ω1, μ2 ∈ Ω2, and for each common variable v, μ1(v) = μ2(v)}

    3. 1

      meant 2, I guess

    4. 1

      meant 2, I guess

    1. (tp1 ✶ tp2) ✶ tp3

      But we need tp2 >< tp3.

      This example rewrite violates that, as optional tp2 will be added on top even if it doesn't >< with tp3.

    2. E.g

      By lifting optionals to the upper level, this example of

      ((Pa ]>< Pb) >< (Pc ]>< Pd)) ]>< (Pe ]>< Pf)

      could be transformed into:

      ((Pa >< Pc) >< (Pa ]>< Pb) >< (Pc ]>< Pd)) ]>< (Pe ]>< Pf)

      In prefix notation with multiple args to functions, it looks like:

      (]>< (>< Pa Pc) Pb Pd (]>< Pe Pf))

    3. However, the inner-join P2 = tp2 ✶ tp3 has to be evaluated before the left-outer-join P1 ✶ P2, due to the restrictions on the reorderability of left-outer-joins.

      ]>< is kinda "enhancing".

      I.e., can be evaluated on top of ><.

      And so it can be lifted to the upper layer.

      E.g., tp1 ]>< (tp2 >< tp3) = tp1 >< ((tp1 ]>< tp2) >< (tp1 ]>< tp3))

      E.g., (tp1 ]>< tpo1) >< (tp2 ]>< tpo2) = ((tp1 >< tp2) ]>< tpo1) >< ((tp1 >< tp2) ]>< tpo2)

      Or, if we allow joins to be functions of many arguments, in prefix notation, then:

      (]>< tp1 tpo1 tpo2)

      (]>< (>< tp1 tp2) tpo1 tpo2)

      Overall, we can build a plan where non-optionals are evaluated first and then enhanced by optionals. And it makes sense to do so; it's the least computation-expensive strategy.

    4. e.g., in the case of Q2 above, left-outer-join between tp1 and tp2 cannot be performed before the inner-join P2 = (tp2 ✶ tp3).

      I.e., tp1 ]>< (tp2 >< tp3) not= (tp1 ]>< tp2) >< tp3 This shows that ]>< is not associative.

      To show that it's not commutative, tp1 ]>< (tp2 >< tp3) not= (tp2 >< tp3) ]>< tp1

      Also, ]>< is not distributive over ><.

      E.g.,

      tp1 ]>< (tp2 >< tp3) not= (tp1 ]>< tp2) >< (tp1 ]>< tp3)

      Also, >< is not distributive over ]><.

      E.g., tp1 >< (tp2 ]>< tp3) not= (tp1 >< tp2) ]>< (tp1 >< tp3)

    1. It is useful to be able to have queries that allow information to be added to the solution where the information is available, but do not reject the solution because some part of the query pattern does not match.

      Optional is meant to only accrete solution with values, never restrict a match for a solution.

    3. GROUP BY

      Can also be used with multiple variables.

    4. (?p*(1-?discount) AS ?price)

      This, and BIND, could be expressed as CONSTRUCT, allowing for uniform representation of how data's stored - as triples.

      E.g., in SPARQL:

        PREFIX dc: <http://purl.org/dc/elements/1.1/>
        PREFIX ns: <http://example.org/ns#>
        SELECT ?title ?discountedPrice
        WHERE { ?x ns:discountedPrice ?discountedPrice }
        CONSTRUCT { ?x ns:discountedPrice (?p*(1-?discount)) }
        WHERE { ?x ns:price ?p . ?x dc:title ?title . ?x ns:discount ?discount }

      I.e., in Clojure:

        (-> graph
            match-product-with-discounts
            derive-discounted-price
            (select-keys [:title :discountedPrice]))

    5. (?p*(1-?discount) AS ?price)

      This can be expressed as BIND at the level of the query, sparing the need to introduce another special bind at the level of SELECT. Although SELECT may work on GROUPed solutions.

    6. GRAPH ?g

      CONSTRUCT returns a graph; perhaps it would be of value to be able to further feed it into queries, perhaps as GRAPH ?constructedGraph {query}.

    7. HAVING operates over grouped solution sets, in the same way that FILTER operates over un-grouped ones.

      What's value of HAVING over writing it as FILTERing of a subquery?

    8. ToMultiSet

      There won't be duplicate values, yet we wrap with a multiset

    9. Filter( ?v1 < 3 ,   LeftJoin( BGP(?s :p1 ?v1), BGP(?s :p2 ?v2), true) ,   )

      Here, the ?s that are to be removed will still get LeftJoined with the optional. Seems like redundant work. Perhaps filter first and then add the optional?

    1. However, after having received the complete sequence of triples, it turns out that μ2 is a solution for the query but μ1 is not; instead, the following new mapping is another solution in the (sound and complete) query result:

      From what I understand, OPTIONAL would behave like that only when appearing before a non-optional clause. When it appears after a non-optional clause, it may only accrete values; it would not restrict.

    1. In fact, as updates change the content of the RDF store, all the active subscriptions must be checked on the same RDF store snapshot

      It would be possible to not block on updates if they were captured with tx time. Then notifications would be able to grab the delta at tx time and do notification detection in parallel with incoming updates.

    1. ucan:./* represents all of the UCANs in the current proofs array.

      If we are to have UCANs as delta restrictions, then this behaviour would be expressed automatically - all capabilities of UCANs in proofs would be delegated as-is if no further restrictions are specified.

    2. diagram

      Perhaps the delegation arrows could be reversed to denote that they include / reference.

      Also a separate prf field would be of use for tracking delegation.

      Also the arrows between capabilities are misleading; capabilities are not references.

    3. All of any scheme "owned" by a DID

      ", that the issuer is authorized to"?

    4. "*"

      Is there value in having "*", perhaps instead we could make ucan-selector optional?

    5. eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCIsInVjdiI6IjAuOC4xIn0

      Not clear what's in it, perhaps annotate that it's a UCAN or substitute with a human-readable hash "bafy...UCAN"?

    6. Escalation by adding new capability

      "Amplification by adding new caveat"?

    7. capability

      "caveat"?

    8. Removes

      "Attenuates"?

    9. Escalation

      "Amplification"?

    10. as an empty array means "in no case"

      This is contrary to the intuition behind caveats - they are restrictions; if no restrictions are set, allow all.

    11. (an "OR", NOT an "and")

      "and" seems useful to have.

      E.g., in the example above I'd actually read it as "and":

        "mailto:username@example.com": {
          "msg/send": [{}], // Proof is (correctly) broader than claimed
          "msg/receive": [
            {
              "max_count": 5,
              "templates": ["newsletter", "marketing"]
            }
          ]
        },

    12. Capabilities composed from multiple proofs (see rights amplification)

      Could be unified with 1., having it as

      A strict subset (attenuation) of the capability authorities from the prf field

    13. Ability

      "Operation"? The way it's named in incovation spec.

    14. Abilities

      "Ability" or "Capability"?

    15. Capabilities

      "Abilities"?

    16. Capabilities composed from multiple proofs (see rights amplification)

      What's value in having amplification?

      Such composed tokens may become partially invalid if one of the proofs becomes revoked.

      E.g., composing write access to / and /photos/ into /. Having the / proof become invalid would prevent access to /photos/, as it's been merged. Keeping them separate would ensure a more invalidation-resilient token. Although that is a point towards not merging rather than composition.

      Also, composition may accidentally attenuate, as the merged / access inherits the timeframe of /, whereas the timeframe of /photos/ could have been larger.

      Also, due to differences in caveats, merging by ability with no regard to caveats may lead to the / access attenuating the less-caveated /photos/ access. We could try to determine if the / caveats are more ample than the /photos/ caveats and merge only then, but that may be error-prone and complex to do.

      Also, proofs will most likely have different valid timeframes, so the composed token will be partially invalid at some times.

    17. This map is REQUIRED but MAY be empty.

      Seems redundant. What's value in having an empty map vs absent key?

    18. until

      "to"?

    19. identifier for multiple keys instead of an identity

      "identify with multiple keys instead of a specific identifier"?

    20. broader

      I.e., these delegated capabilities did not attenuate.

    21. 2.11.1 Valid Time Range

      It could be additionally clarified that the delegation validFrom + validTo timerange should be contained in the timerange of its proof UCAN.

    22. exp

      exp may be mistakenly thought of as "expires in some time" rather than "expires at time".

      Perhaps use "validFromTime" vft and "validToTime" vtt or smth?

    23. Yes

      Why not optional, as with nbf?

  6. Aug 2023
    1. alice

      bob?

    2. bafy...sendBobEmailInvocation

      Should it include "cause" field, pointing to updateDnsInvocation?

    3. Batched

      Should we reverse arrows to indicate that tasks depend / contain?

    4. Executor MUST fail Task that Awaits failed output of the successful Task.

      What's the value from doing so? Could Executor not run such task at all?

    5. Effect

      Can we represent Effect as Task?

      In order to make spec more compact / generic.

    6. version

      of what?

    7. fork and join

      "sync" and "async"

    8. Result

      How is it different from Receipt?

    9. "cause": {"/": "bafy...somePriorInvocation"},

      Can it reuse await/ok approach?

    10. updateDnsTask

      updateDnsInstruction?

    11. sendEmailTask

      sendEmailInstruction?

    12. createBlogPostTask

      createBlogPostInstruction?

    1. "200GB"

      Should it be in the same domain model as other byte values? E.g., [200 "giga" "bytes"].

    2. [500, "kilo", "bytes"]

      Would it be useful for "memory" (and other byte value fields) to support number value in bytes?

    3. [500, "milli", "seconds"]

      Would it be useful for "timeout" to support number value in milliseconds.

      It's a rather standard approach, may be easy to use.

    1. The operation of adding up all changes is stream integration.

      Akin to reduce(previousDB, tx) => currentDB

    2. ΔV = D(↑Q(DB)) = D(↑Q(I(T)))

      ^Q can be generalized as yet another T, denoted as ^T (^ hints that this "live" T may be applied on top of other Ts / maintains a "live" view).

      This gives the ability for a ^T to depend on other ^Ts.

      So, for each ^T in Ts, ^T(I(Ts)) = ^T(DB).


      Additionally, DB is a snapshot. Perhaps ^T(DB) is better denoted as ^T(Ts).

      Thus the relation can be written as

      ΔV = D(^T(Ts))

      Additionally, D is akin to Δ; denoting it as such, we end up with

      ΔV = Δ^T(Ts), for each ^T in Ts.

      And since Ts are versioned, ^T(TsN) implicitly has access to ^T(TsN-1).

      I.e., TsN contains ^T(TsN-1), for each ^T.

      Which allows ^T to be incrementally computed over its previous value.

      ^T(^T(TsN-1), TN)

      ^T has a function signature akin to that of reduce, i.e., ^T(accumulator, sequence element).
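      A minimal sketch of this reduce-shaped view maintenance (assumed shapes, not the paper's formalism): a live ^T takes (previous view value, next transaction), so the view over the log is a fold, and a new tx only needs one incremental step.

      ```typescript
      type Tx = { insertedValues: number[] };

      // ^T: maintain a running sum of all inserted values, incrementally.
      const liveSum = (acc: number, tx: Tx): number =>
        acc + tx.insertedValues.reduce((a, b) => a + b, 0);

      const txLog: Tx[] = [{ insertedValues: [1, 2] }, { insertedValues: [3] }];

      // Integration of the whole log...
      const full = txLog.reduce(liveSum, 0);                       // 6
      // ...or incremental update of the previous view value with just the new delta.
      const incremental = liveSum(full, { insertedValues: [4] });  // 10
      console.log(full, incremental);
      ```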

    1. However, while developing a system, classes will be defined in various places, and it makes sense to be able to see relevant (applicable) methods adjacent to these classes.

      Classes / ontologies are not a core feature of the language.

      It's how we have RDF and OWL - they're separate.

      Classes can be built on top of pure functions and data - these two are the core, nothing else.

      Perhaps even functions can be eliminated from the core. A function is a template of some computation. It can be baked into the program, since names are a user-level feature.

      So we end up with data and ops on it as the core, plus some flow control primitives (perhaps "or" and "and" are enough). The rest can be built on top. As to what data to provide, multisets seem to be the most universal / least restrictive data structure, out of which more specialized data structures can be derived. And with the advent of SSDs we are no longer limited by performance to sequential reads, so perhaps it's not all too crazy to switch from lists to multisets as the basic structural block of programs.

    2. There will also be means of associating a name with the generic function.

      A naming system is not the core part of a language.

      A naming system serves two purposes:

      1. Create structure of a program

      2. Give a user-friendly interface

      You don't need 2. in the core of your language. How data (your program) is displayed should be up to the end-user (programmer). If he wants to see it as text formatted as a LISP - his choice; if he wants to see it as text in a Java-like style - ok; Haskell-like - sure; visual - no prob.

      Having languages as data allows just that. It helps us get rid of the accidental complexity of managing a syntax-heavy bag of text files (and having compilers). E.g., how Unison lang has the AST as a data structure and a text-based interface to tweak it.

      Having code as data would also make run-time tweaking easier, bringing us closer to the promise of LISP.

      And also all the rest of the neat features on top of content-addressing of code that are now way easier to implement, such as incremental compilation, distributed compute, caching.

      Have names as a user-level feature, as personal dictionaries. Some will call the reducing function reduce, some fold, some foldr, some will represent it as a triangle (for visual code management).

    3. more elaborate object-oriented support

      It is in no part a core feature of a language.

      The mainstream OOP is complex and has many responsibilities.

      OOP as envisioned by its creator is the actor model - state management (state + a managing actor) paired with linking actors together - a complex approach. It can be broken down into its primitives. And OOP can be constructed out of them, if so desired, but not at the core level.

      A good reference of language decomplection is Clojure.

    4. In the second layer I include multiple values

      Treating single values as a special case of multiple values is generally more performant.

    5. the elaborate IO functions

      IO is not the core of the language. It's more of a utility layer that allows the language to speak to the outside world.