Partial Materialized Views
doc.get(“todo”).idx(1).get(“done”) := true;
A fine dot-notation interface for a delta-based CRDT
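A minimal TypeScript sketch of what such a cursor-style interface over a delta-based CRDT could look like; the `DocCursor` shape and `set` method are made up for illustration, not any particular library's API.

```typescript
// Hypothetical cursor-style interface over a delta-based CRDT document.
// Each navigation step narrows the cursor; `set` emits a delta for that path.
interface Delta { path: (string | number)[]; value: unknown }

class DocCursor {
  constructor(private path: (string | number)[] = [], private deltas: Delta[] = []) {}
  get(key: string): DocCursor { return new DocCursor([...this.path, key], this.deltas) }
  idx(i: number): DocCursor { return new DocCursor([...this.path, i], this.deltas) }
  set(value: unknown): Delta {
    const delta = { path: this.path, value }
    this.deltas.push(delta) // delta to be broadcast to other replicas
    return delta
  }
}

// doc.get("todo").idx(1).get("done") := true  becomes:
const doc = new DocCursor()
doc.get("todo").idx(1).get("done").set(true)
```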
In order to generate globally unique operation identifiers without requiring synchronous coordination between replicas we use Lamport timestamps [42].
Ehh. Isn't author+deps enough?
“done”: true
Ehh, why?
This node has been marked as deleted, yet it still has a "done": true field. Why is this field leaked out of the deleted node?
"Data Laced with History" gets it right by modeling the data structure as an AST: you'd have a node carrying this field, that node gets marked as deleted, and the node and all its fields end up "deleted" together.
Collaboration = keep all edits and merge them
Agregore works by letting you load web pages and content from peer to peer protocols the same way you would load them from a website.
I.e., use Hypercore to maintain a personal website; use Agregore as the browser.
Append a new value to the autobase.
This makes autobase view not just a view.
Why isn't it done in one of the inputs?
Hypercore is a secure, distributed append-only log built for sharing large datasets and streams of real-time data. It comes with a secure transport protocol, making it easy to build fast and scalable peer-to-peer applications.
Hypercore bundles CRDT with p2p.
It is akin to an OrbitDB database, if only it had a single writer.
OrbitDB is a toolkit for assembling your own CRDT p2p solution, making use of components:
- ipfs-log as the base for your CRDT
- libp2p as the base for your p2p
- auth to manage readers and writers
Whereas Hypercore seems to bundle it all up.
Multi-writer will allow Dats to be modified by multiple devices and multiple authors at the same time. Each author will have their own secret key and publish a Dat with their data in it. Multi-writer fuses all of these separate Dats into one “meta-Dat” that is a view of everyone’s data combined
multi-causal operations will prevent the structured log from being linearly interpretable and may severely affect performance, simplicity, and intent
Welp, what can we do when an op does express intent that depends on multiple others? E.g., transfer money from Bob to Alice.
This won’t be convergent by default since the operations don’t have an inherent total order, but it’s easy to fix this by giving each one a globally-unique ID in the form of an owner UUID plus a Lamport timestamp. (In the diagrams below, this is encoded as “SX@TY”, where “SX” represents the UUID for site X, and Y is just the timestamp.) With this scheme, no two operations can have the same ID: operations from the same owner will have different timestamps, while operations from different owners will have different UUIDs
Welp, would be nice to have author inside the operation.
A Lamport timestamp doesn't seem to add any new information. It's an index of the op, which can be derived from the longest chain of its parents if needed.
Wall clock is useful. It's akin to "decision time".
Wall clock + author would be enough for uniqueness of operations. And linking can be done by hash.
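A sketch of the scheme suggested above, assuming a generic hash function: an op carries its author and wall-clock time for uniqueness, links to its parents by hash, and its own ID is just the hash of its content. Names are illustrative.

```typescript
import { createHash } from "node:crypto"

// Hypothetical operation shape: author + wall clock give uniqueness,
// parent hashes give causal linking; the op ID is the hash of the whole op.
interface Op {
  author: string        // author/device identifier
  wallClock: number     // "decision time", not relied on for ordering
  parents: string[]     // hashes of causally preceding ops
  payload: unknown
}

function opId(op: Op): string {
  return createHash("sha256").update(JSON.stringify(op)).digest("hex")
}
```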
Nonetheless, I made one additional tweak to ensure that remapping only happens very rarely. Instead of storing just the UUID in the site map, I also store the wall clock time at which the UUID was added. In the site map, these tuples are sorted first by time, then by UUID. Assuming that modern connected devices tend to have relatively accurate clocks (but not relying on this fact for correctness), we can ensure that new sites almost always get appended to the end of the ordered array and thus avoid shifting any of the existing UUIDs out of their previous spots. The only exception is when multiple sites happen to be added concurrently or when the wall clock on a site is significantly off.
Perhaps an alternative is to have a tx log on each device, which is only ever accreted with newly observed ops. Then its order never changes and we can use the index as an identifier that stays evergreen.
I’ve solved this with the help of a secondary CRDT that’s stored and transferred along with the CT: an ordered, insert-only array of known UUIDs called the site map. The 16-bit site identifier corresponding to a UUID is simply its index in the array.
Reminds me of the technique WasmTree uses, where the internal representation of RDF nodes replaces URLs with integer IDs and keeps a map to resolve the URLs back.
Except that here we use the index of a log entry in place of its UUID, which requires updating the mapping on merge of new ops. Storing integer IDs in the entries themselves, in place of UUIDs, would spare us from that. The con is the storage cost of those integer IDs, which is lower than the cost of UUIDs but higher than the index approach.
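A sketch of the interning idea (WasmTree-style): a grow-only array of UUIDs where the small integer identifier is simply the index. Purely illustrative.

```typescript
// Grow-only site map: UUID <-> small integer index.
// New UUIDs are appended, so existing indices never change on this replica;
// the cost is remapping when merging ops that carry another replica's indices.
class SiteMap {
  private uuids: string[] = []
  private index = new Map<string, number>()

  intern(uuid: string): number {
    let i = this.index.get(uuid)
    if (i === undefined) {
      i = this.uuids.length
      this.uuids.push(uuid)
      this.index.set(uuid, i)
    }
    return i
  }

  resolve(i: number): string { return this.uuids[i] }
}
```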
We see further interesting problems around types, schema migrations, and compatibility. Different collaborators may be using different versions of an application, potentially with different features. As there is no central database server, there is no authoritative “current” schema for the data. How can we write software so that varying application versions can safely interoperate, even as data formats evolve?
Events to the rescue?
CRDTs do not require a peer-to-peer networking layer; using a server for communication is fine for CRDTs. However, to fully realize the longevity goal of local-first software, we want applications to outlive any backend services managed by their vendors, so a decentralized solution is the logical end goal.
We should not bake a server into lofi software as the means to sync; the software needs to outlive the server.
URLs are a good mechanism for sharing
URLs as collaboration sites have been found handy.
They could be treated as "compilation sites" on a subject.
Where each user could have their own compiled view.
I.e., they are IDs of subjects. Note: ID not= content. Content is user-specific. A personal compilation / view on it.
Visualizing document history is important
Would it be valuable to have it generic?
Same time-travel interface for all the apps.
However, in all the prototypes we developed, we found that the default merge semantics to be sufficient, and we have so far not identified any case requiring customised semantics
The mentioned case of "one user sets a task to one state and another user to another" did happen. As mentioned, it's not clear how to resolve it, so they did encounter merge conflicts. I guess these conflicts are not app-specific and conflict resolution can be implemented in a generic way as well, such as presenting users with a choice of which delta to adopt.
In practice, the CouchDB model has not been widely adopted [42]. Various reasons have been cited for this: scalability problems when a separate database per user is required; difficulty embedding the JavaScript client in native apps on iOS and Android; the problem of conflict resolution; the unfamiliar MapReduce model for performing queries; and more. All in all, while we agree with much of the philosophy behind CouchDB, we feel that the implementation has not been able to realize the local-first vision in practice.
CouchDB is a meh local-first backend (storage).
These thick-client apps have the advantage of being fast and working offline, because the server sync happens in the background.
Rather "thick-client local-first apps", as "thick-client" on its own does not imply local-first; there may be a server.
web apps will never be able to provide all the local-first properties we are looking for, due to the fundamental thin-client nature of the platform
Web apps can be thick clients.
Optionally they can be loaded gradually. E.g., with granularity down to components, as shown by ReactJS.
Having loaded in their entirety, they can potentially work offline.
Making a web app and its components content-addressable would allow loading them with no regard to location / an authoritative service.
In many web browsers, if the user clears their cookies, all data in local storage is also deleted [121]; while this is not a problem for a cache, it makes the browser’s local storage unsuitable for storing data of any long-term importance
Local Storage is not a good solution for persistence.
In principle it is possible to collaborate without a repository service, e.g. by sending patch files by email [48],
Sending git diffs as text over email or other mediums is the way it was meant to be used, if I'm not mistaken.
It’s interesting to note that most software engineers have been reluctant to embrace cloud software for their editors, IDEs, runtime environments, and build tools. In theory, we might expect this demographic of sophisticated users to embrace newer technologies sooner than other types of users.
I.e., software engineers may be the most keen to adopt lofi concepts.
But Git has no capability for real-time, fine-grained collaboration, such as the automatic, instantaneous merging that occurs in tools like Google Docs, Trello, and Figma
Because it captures snapshots and not deltas.
Git is excellent for asynchronous collaboration, especially using pull requests (Figure 6, [61]), which take a coarse-grained set of changes and allow them to be discussed and amended before merging them into the shared master branch
There is no "shared master branch" in git, and consequently no authority that decides what's in it. Each git user is free to adopt any changes they see fit.
If your computer’s hard drive fails, you can restore your work simply by installing the app and waiting for it to sync.
In the event that the sync service is down and you lose your local copy, you won't be able to restore.
For this case it's better to have multiple sync options, such as your other devices and incentivized third parties.
The flip side to this is a total loss of ownership and control: the data on the server is what counts, and any data on your client device is unimportant — it is merely a cache
Whereas in reality it is the opposite: devices are the primary sites of the intents they issue, and the server is merely an aggregation site.
Attachments are easy to understand and trustworthy.
Given email is signed.
Using a version control system such as Git
Git alone does not dictate how one transfers stuff.
Consider a significant personal creation, such as a PhD thesis or the raw footage of a film. For these you might be willing to take responsibility for storage and backups in order to be certain that your data is safe and fully under your control
Losing a PhD thesis, especially after it has been published and other papers already refer to it / base their reasoning on top of it, would be a serious hit to humanity's knowledge. Sadly, this is the reality today; it may happen.
With data ownership comes responsibility: maintaining backups or other preventative measures against data loss, protecting against ransomware, and general organizing and managing of file archives
It would be nice for lofi software to make this as seamless as replicating across chosen parties automatically.
E.g., in your account you have linked your devices and chosen them as replication sites. In addition you have chosen a third party, e.g., Web3Storage.
So you can use apps and rest worry-free about the persistence of your data.
so you have the freedom to process this data in arbitrary ways
Even better if the software uses a known data model, such as RDF, so you're not locked into the tool.
Perhaps even better to capture your intent as events and have data models derived out of them. This way you're not even locked into, e.g., the RDF ecosystem.
You should be able to copy and modify data in any way, write down any thought, and no company should restrict what you are allowed to do
I.e., don't rely on a third party as the medium to express yourself. Otherwise they're in control of what you can express.
E.g., Twitter/X, Google Docs and their bans.
Although there does not seem to be a great danger of Google shutting down Google Docs anytime soon, popular products (e.g. Google Reader) do sometimes get shut down [106] or lose data [105], so we know to be careful
I have dread in my head every time I use a cloud service that an issue is about to happen. There will be a data breach, data loss, or the service becoming unavailable (e.g., they don't like you anymore; e.g., GitHub can block you from your private repos at will); it's just a matter of time.
So I need to get my data out and back it up, if the service has been kind enough to allow for that. But that's a burden. It's a hidden cost you learn to recognize over time; it may not be apparent to everybody from the start.
Cuneiform script on clay tablet, ca. 3000 BCE. Image from Wikimedia Commons [5].
This needs explanation.
Besides having several people edit the same document in real-time, it is sometimes useful for one person to tentatively propose changes that can be reviewed and selectively applied by someone else
This stems from a design where there is a mutable place and some authority over it, e.g., a Google Doc or a Wikipedia page.
It would be nicer to embrace people as the primary sites of their intent, and allow composing them. This way a person who would previously need to request changes to a doc would simply have a kind of forked doc locally, where they would make the change. That delta may be shared with the original doc's authors, who may accept or reject it. Regardless of their decision, the forked version stays as intended, with the delta.
I.e., docs are sites that aggregate some deltas. Have a site per person. Allow sites to exchange deltas. This allows for personal views on a subject.
Akin to how it's done in Git, if only it tracked deltas and they could be arbitrary.
I.e., NextGraph gets it pretty damn right.
Also, RhizomeDB is in this spirit.
never needing to show you a spinner while you wait
For the cases when you do wish to send an op to a remote party you'd still need a spinner, given you embrace distributed compute.
Local-first software is different: because it keeps the primary copy of the data on the local device, there is never a need for the user to wait for a request to a server to complete.
Local-first has the most responsive UX, due to operations being performed locally as opposed to being sent to a server.
The user interface may try to hide that latency by showing the operation as if it were complete, even though the request is still in progress — a pattern known as Optimistic UI [92] — but until the request is complete, there is always the possibility that it may fail (for example, due to an unstable Internet connection).
Optimistic UI / Optimistic Updates may actually fail at the source being updated (surprise-surprise, the word "optimistic" is in the definition).
Online event processing (OLEP) [36] is an approach for handling such multi-partition interactions by breaking them down into multiple stream processing stages.
This breaks down the original intent transferMoney(Alice -> Bob) into multiple ones. Thus we don't have atomicity, as shown in the paper.
Breaking them down seems to be just one approach of how such events can be handled by OLEP, though; an implementation detail.
It seems possible to achieve atomicity and isolation for such events, as described here.
then it is no longer sufficient to process the events in each partition independently
We can treat partitions as aggregates of events.
Given there is Alice and Bob, and Alice transfers Bob money.
Then we have 3 partitions, that aggregate their respective events.
Event transferMoney(Alice -> Bob) would end up in all three partitions: Alice, Bob, Alice+Bob.
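A sketch of that partitioning, with illustrative shapes: the same transfer event gets appended to the Alice log, the Bob log, and a joint Alice+Bob log.

```typescript
// Illustrative event and partition routing for transferMoney(Alice -> Bob).
interface TransferEvent { kind: "transferMoney"; from: string; to: string; amount: number }

type Log = TransferEvent[]
const partitions = new Map<string, Log>() // partition key -> append-only log

function append(key: string, event: TransferEvent) {
  if (!partitions.has(key)) partitions.set(key, [])
  partitions.get(key)!.push(event)
}

function publishTransfer(event: TransferEvent) {
  // The event is aggregated by each involved party, plus the joint partition
  // that serializes everything touching both accounts.
  append(event.from, event)
  append(event.to, event)
  append([event.from, event.to].sort().join("+"), event)
}

publishTransfer({ kind: "transferMoney", from: "Alice", to: "Bob", amount: 10 })
```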
A challenge in many event-based systems is how to handle changes in the schema or data format [56]. The problem is especially pronounced in systems where we cannot guarantee that all replicas are running the same version of the software. For example, in applications that run on end-users’ devices, it is up to the user to decide when to install a software update; thus, a user who is hesitant to install updates may be running a much older version of an application than a user who always installs the latest version. Nevertheless, those users should be able to interoperate as much as possible, which means that any data format changes must be both forward and backward compatible. The challenge becomes even greater if users are able to customise the software they are running, e.g. through end-user programming [30].
Event-processing logic may be different across users.
In systems that are based on immutable events, one promising approach is to use bidirectional functions (lenses) to convert between different versions of a data model, which allows different replicas to use different state representations while still being able to interoperate [49].
Bidirectional lenses as a way to integrate different data models.
Partially ordered event-based systems are well placed to support such branching-and-merging workflows, since they already make data changes explicit in the form of events, and their support for concurrent updates allows several versions of a dataset to coexist side-by-side.
Event-based systems allow for divergent views as first-class citizens.
in Git terms, one user can create a branch (a set of commits that are not yet part of the main document version), and another user can choose whether to merge it
Users are empowered to create composed views out of events of their choice.
I.e., collaboration as composition.
I.e., divergent views as first-class citizens.
Database transactions support a weak form of multi-version concurrency control by allowing an uncommitted transaction’s writes to be either committed or rolled back by aborting the transaction [8]. However, most databases do not allow one user to share the state of an uncommitted transaction with another user, and most databases do not allow the user to find out what data has changed in an uncommitted transaction (the equivalent of git diff). Moreover, in most database systems, an uncommitted transaction may hold locks and thus prevent other transactions from making progress.
I.e., DB-first approach does not have divergent view as first-class citizen.
compute the differences between versions of the document to visualise the change history
we can reconstruct the state of the document at any past moment in time
an initial event represents only the intention to perform a certain action
Moreover, if processing an event may have external side-effects besides updating a replica state – for example, if it may trigger an email to be sent – then the time warp approach requires some way of undoing or compensating for those side-effects in the case where a previously processed event is affected by a late-arriving event with an earlier timestamp. It is not possible to un-send an email once it has been sent, but it is possible to send a follow-up email with a correction, if necessary. If the possibility of such corrections is unacceptable, optimistic replication cannot be used, and SMR or another strongly consistent approach must be used instead. In many business systems, corrections or apologies arise from the regular course of business anyway [27], so maybe occasional corrections due to out-of-order events are also acceptable in practice.
However, when a replica processes events out of timestamp order (inserting an event somewhere in the middle of the timestamp-ordered sequence), it must be able to roll back the replica state to the state at the time corresponding to the insertion position, apply the new event, and then replay the events whose timestamps are greater than that of the new event [64]. This approach is known as time warp [31].
time warp as solution for eventually consistent derived state out of events.
Alternative solution is to keep snapshots.
These two solutions can complement each other.
In a partially ordered system it is still possible to enforce a total order on events after the fact, as illustrated in Figure 2. We do this by attaching a logical timestamp to each event; Lamport timestamps [45] are a common choice.
However, other events may be concurrent, which means that neither happened before the other; in this case, different replicas may process those events in a different order [10].
If permanent deletion of records is required (e.g. to delete personal data in compliance with the GDPR right to be forgotten [62]), an immutable event log requires extra care.
In applications with a high rate of events, storing and replaying the log may be expensive
the level of indirection between the event log and the resulting database state adds complexity in some types of applications that are more easily expressed in terms of state mutations
it is less familiar to most application developers than mutable-state databases
Blockchains and distributed ledgers also use SMR, in which case the chain of blocks (and the transactions therein) constitutes the event log, the ledger (e.g. the balance of every account) is the resulting state, and smart contracts or the network’s built-in transaction processing logic are the state transition function [66].
it is easy to maintain several different views onto the same underlying event log if needed
If the application developers wish to change the logic for processing an event, for example to change the resulting database schema or to fix a bug, they can set up a new replica, replay the existing event log using the new processing function, switch clients to reading from the new replica instead of the old one, and then decommission the old replica [34].
well-designed events often capture the intent and meaning of operations better than events that are a mere side-effect of a state mutation [68].
Disadvantages of the OLEP approach. In the previous examples, log consumers update the state in data stores (the database and search index in Figure 2; the account balances and account statements in Figure 3). While the OLEP approach ensures every event in the log will eventually be processed by every consumer, even in the face of crashes, there is no upper bound on the time until an event is processed. This means if a client reads from two different data stores that are updated by two different consumers or log partitions, then the values read by the client may be inconsistent with each other. For example, reading the source and destination accounts of a payment may return the source account after the payment has been processed, but the destination account before it has been processed. Thus, even though the accounts will eventually converge toward a consistent state, they may be inconsistent when read at one particular point in time
There are two culprits here:
1. The original intent (transfer from Alice to Bob) has been split into different intents (withdraw from Alice, receive to Bob).
2. Reads are not tied to a position in the log, so the two stores may be observed at different points of processing.
I.e., there is no atomicity at the level of the original intent, and consequently there is no isolation.
I.e., there is no consistency.
Instead we can have 1 event that describes the original intent.
It can be complemented by withdrawal approval and receiving approval.
This augmented event, in form of
(approvedReception (approvedWithdrawal (transaction Alice->Bob)))
can be replicated across the corresponding source account and destination account logs.
This way we have holistic view on what happens.
This augmented event can be replicated across the source account and destination account logs, leading to eventually consistent states there. Since we have a requirement on atomicity of writes, this event belongs in the Alice+Bob log, where every event relating to both Alice and Bob ends up. The Alice log and the Bob log depend on the Alice+Bob log, thus ensuring that Alice and Bob are consistent with regard to Alice+Bob transactions.
To cater for 2), we need to make sure that reads happen on top of the latest log entry. We can represent a read as an event that depends on the latest log event; thus it'll be processed on the DB state as of that depended-on event.
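A sketch of the shapes involved, purely illustrative: the augmented (nested-approval) event and a read expressed as an event that depends on the latest log entry.

```typescript
// Illustrative shapes for the augmented event and a read that depends on it.
interface Transaction { kind: "transaction"; from: string; to: string; amount: number }
interface ApprovedWithdrawal { kind: "approvedWithdrawal"; of: Transaction }
interface ApprovedReception { kind: "approvedReception"; of: ApprovedWithdrawal }

// A read expressed as an event that depends on the latest entry of the joint log,
// so it is only answered against a state that already includes that entry.
interface ReadBalance { kind: "readBalance"; account: string; dependsOn: string }

const tx: Transaction = { kind: "transaction", from: "Alice", to: "Bob", amount: 10 }
const augmented: ApprovedReception = {
  kind: "approvedReception",
  of: { kind: "approvedWithdrawal", of: tx },
}
const read: ReadBalance = {
  kind: "readBalance",
  account: "Alice",
  dependsOn: "id-of-latest-log-event", // placeholder id, illustrative only
}
```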
Splitting a “transaction” into a multistage pipeline of stream processors allows each stage to make progress based only on local data; it ensures that one partition is never blocked waiting for communication or coordination with another partition.
Event logs allow for gradual processing of events.
From a data analysis point of view, an event log is more valuable than the state in a database. For example, in an e-commerce setting, it is valuable for business analysts to see not only the final state of the cart at checkout, but also the full sequence of items added to and removed from the cart, since the removed items carry information, too (for example, one product is a substitute for another, or the customer may return to buy a certain item on a later occasion).
Event logs represent deltas, whereas DBs are snapshots. Deltas may be useful in order to get insight into intentions.
In contrast, in a database that supports arbitrary insertions, updates, and deletes, it is much harder to recover from incorrect writes, potentially requiring the database to be restored from a backup.
The DB-first approach makes it harder to reconcile events that have happened.
Some log-based stream processors such as Apache Flink support so-called exactly-once semantics, which means that even though an event may be processed more than once, the effect of the processing will be the same as if it had been processed exactly once. This behavior is implemented by managing side effects within the processing framework and atomically committing these side effects together with the checkpoint that marks a section of the log as processed.
Only-once semantics for side-effects, by linking them to a checkpoint. Thus we have a kind of commit of side-effects as of some checkpoint: they are applied only then, and having the checkpoint guarantees that processing resumes from that state, so the side-effects won't be issued twice.
This is not the same as idempotence of side-effects. It merely guarantees that a side-effect is expressed only once.
It relies on event processing not replaying events before a checkpoint. This is not the case for all architectures: some may use the time-warp technique, some may replay from the start.
Since it is possible for an event to be processed more than once when recovering from a failure, state updates must also be idempotent.
Idempotency of side-effects as a solution for multiple invocation of the same side-effect (intent).
Thus, if a subscriber crashes and restarts, it may append duplicate events to other logs.
A solution is to have events idempotent.
E.g., to have events describe intent, and be content-addressable.
Then, the same intent = the same content-based id = noop by others that already received it.
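A sketch of that idempotence-by-content-addressing: the event ID is a hash of the intent, so a consumer that has already seen the ID treats a re-delivery as a noop. Assumes a canonical serialization in practice; names are illustrative.

```typescript
import { createHash } from "node:crypto"

// Content-addressed events: the same intent always hashes to the same ID,
// so a consumer that has already seen the ID can drop the duplicate.
// (A real implementation would need a canonical serialization.)
function eventId(event: object): string {
  return createHash("sha256").update(JSON.stringify(event)).digest("hex")
}

const seen = new Set<string>()

function appendIfNew(log: object[], event: object) {
  const id = eventId(event)
  if (seen.has(id)) return // duplicate delivery: noop
  seen.add(id)
  log.push(event)
}
```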
A subscriber periodically checkpoints the latest LSN it has processed to stable storage. When a subscriber crashes, upon recovery it resumes processing from the latest checkpointed LSN. Thus, a subscriber may process some events twice (those processed between the last checkpoint and the crash), but it never skips any events. Events in the log are processed at least once by each subscriber.
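A sketch of that checkpointing loop: resume from the last durable LSN, apply idempotently, checkpoint periodically. Shapes are illustrative.

```typescript
// At-least-once consumption with periodic checkpointing of the last processed LSN.
interface LogEntry { lsn: number; payload: unknown }

function consume(
  log: LogEntry[],
  loadCheckpoint: () => number,          // last durably stored LSN
  saveCheckpoint: (lsn: number) => void, // persist to stable storage
  apply: (e: LogEntry) => void           // must be idempotent
) {
  let last = loadCheckpoint()
  for (const entry of log) {
    if (entry.lsn <= last) continue      // already covered by the checkpoint
    apply(entry)                         // may run twice after a crash
    last = entry.lsn
    if (last % 100 === 0) saveCheckpoint(last) // periodic, not per-event
  }
  saveCheckpoint(last)
}
```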
To refer to a piece of content, assign it an immutable Unique ID (UID). Use that UID in operations involving the content, instead of using a mutable descriptor like its index in a list.
Or use the hash of a node in a causally aware DAG as its ID.
all commits depending on an expiring commit must expire at the same time as, or earlier than the one they’re depending on.
why so?
Data deletion is possible by setting an expiry time for the storage objects.
This is akin to specifying "validTo". However, why remove "no longer valid" commits? They can be of use for time-travelling queries, e.g., "give me the state as of that validTime". And performing such queries at a time when something has expired could omit/filter it out.
its commits can be garbage collected
those that are not used in other branches
or the branch owner can make a snapshot and compact the commits of the previous branch and thus remove dependencies on earlier commits
Huh, we're losing authorship this way. And commit deduplication.
and removing access from past members by excluding those members from the newly encrypted branch definition
why not have a removeAccess kind of commit?
Which would allow managing authorized parties without the need to create a new branch.
in order to reduce branching in the DAG
how so?
and to allow deduplication of commit bodies
Nice
Valid commits are acknowledged by a quorum of publishers in subsequent commits.
We may end up in a scenario where a SPARQL tx that generated commits across multiple repos is only partially valid: in one repo its commit is valid, in another it's considered invalid, leaving us in a half-horse-half-zebra state.
ackDelay
commitAckDelay?
Reliability
Transmitting an event twice is a noop when we have causal deps on events. Is this purely for optimization purpose?
Causal delivery
Why have it if each commit has its dependencies?
public key & secret
Why have a secret? Isn't the public key enough to uniquely ID a repo?
Each user is responsible for acquiring access to one or more core nodes of their choice, which can be self-hosted or offered by service providers.
So the brokers are not per pub/sub / repo, but per user. They are a contact point / API / gateway for that user.
The core network facilitates communication among remote nodes in different edge networks, and enables asynchronous communication via store-and-forward message brokers that provide routing and storage services.
These brokers take on two responsibilities: 1) overall network health, 2) store-and-forward for a specific overlay.
They'd need to be incentivized. The stakeholders of these responsibilities differ: for 1) the stakeholders are everybody, for 2) the stakeholders are that specific overlay's members.
How is this incentivization done? Can IPFS services be used for 2), such as Web3Storage for storage?
Data that has been removed by a commit remains in the branch, since all commits are kept in the branch.
Since it's content-addressed, can peers agree on not keeping / pinning such content?
how does decentralised identity and authentication work?
e.g., auth as a DID that has public keys on it
CRDTs are the answer to this challenge as they bring strong eventual consistency and also offline-first capabilities.
CRDTs at the level of compute-as-data especially interesting imo
E2EE forces the data to be present and manipulated locally, which in turn implies that data processing (query engine) must run in embedded/mobile devices.
Still, a third party can be authorized with access to the data and asked to perform a query on behalf of the user.
Therefor decentralised PubSub technologies should be used in order to synchronise efficiently semantic data between peers.
Semantic Web PubSub on top of libp2p?
O(m ∗n) time
Hmm, it seems that the proposed algorithm may terminate prior to reaching n, having found the latest dependency.
Additionally, the algorithm can be restructured to complete in one pass, as "go through the log until you have found all the latest deps". Then the algorithm will have time complexity up to O(n), if I'm not mistaken.
As an alternative, perhaps keeping an index per variable may provide an interesting time/space tradeoff. UPD: this technique is described in 3.3.1.
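A sketch of the one-pass variant: walk the log from newest to oldest and stop as soon as the latest entry for every variable of interest has been found. This is my reconstruction of the idea, not the paper's algorithm.

```typescript
// Walk the log newest-to-oldest; record the first (i.e. latest) entry seen
// for each variable we depend on, and stop once all are found.
interface LogEntry { index: number; variable: string }

function latestDeps(log: LogEntry[], vars: Set<string>): Map<string, number> {
  const found = new Map<string, number>()
  for (let i = log.length - 1; i >= 0 && found.size < vars.size; i--) {
    const { variable, index } = log[i]
    if (vars.has(variable) && !found.has(variable)) found.set(variable, index)
  }
  return found // variable -> index of its latest occurrence
}
```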
Connect the new node with existing DFG nodes to track any AST read dependencies.
It seems this can also be delayed until a read request, since dependencies are needed to perform the computation, and the computation is only needed for a read.
Each node in the DFG is labelled with the positive integer, which specifies their specific position in the ordered sequence.
Could instead store the last AST and have that AST point to the ASTs it depends on, making one AST DAG. Content-addressing of ASTs seems nice too.
familiar querying power of SQL in the client.
SQL may not be that familiar to web devs.
GraphQL and dot notation are what they're used to.
They both suggested emulating API request/response patterns through a distributed state machine running on a replicated object.
Have authority to sign / process updates.
I.e., there's an authority, e.g., a shop, and it would need to process requests.
What the architecture can do is abstract away URLs / exact endpoints, having an ID of the authority instead, and abstract away the transport layer: nodes in the network talk with each other and we don't care how; all we care about is expressing intents as data, and they get passed around somehow.
Invoke Invoke
Creating Invocation in order to delegate execution?
I had an off-channel conversation with @expede where she identified the following problems with the task/spawn / task/fork design:
- Introduces more primitives for things that we can already do (enqueuing more tasks)
- Introduces nesting of invocations (more cycles and network to discover the entire workflow)
- Need to add a version that says "but don't run it yourself" when that's not a problem that we have today
- The label "spawn" reads poorly if it's going to dedup (looks like an imperative command but isn't)
One point that especially resonated is that they imply you always ask the executor to run the task, as opposed to finding a prior task execution.
Good points.
actors
Couples producer with consumer; pubsub would be a simpler approach.
Task
Receipt, perhaps?
WasmTask
Task, perhaps
signature
Have it as part of Workflow, signing over its fields? As in UCAN and Invocation spec
This field helps prevent replay attacks and ensures a unique CID per delegation.
have REQUIRED decisionTime instead?
which requires gaining a global lock on the job
Perhaps an alternative strategy is to authorize side-effectful resources with "only once" restrictions?
E.g., issue UCAN that limits capability to 1 invocation.
E.g., one tweet.
It can be freely delegated by executor further, but only one invocation will be possible.
There may simply be a tradeoff for the programmer to say "yes I really need this effect, even though I'll have fewer options".
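A sketch of what such a one-shot delegation could look like, written as a plain object; the `max_invocations` caveat and the resource/ability names are hypothetical, not part of the UCAN spec.

```typescript
// Hypothetical UCAN-style capability restricted to a single invocation.
// The executor may re-delegate it further, but only one invocation can ever
// be accepted against it (enforced by whoever validates the invocation).
const oneShotCapability = {
  with: "https://twitter.com/alice",  // illustrative resource
  can: "tweet/post",                  // illustrative ability
  nb: { max_invocations: 1 },         // hypothetical caveat: use at most once
}
```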
Decouple effect from pure compute? Make it "await" that compute.
| Success "ok" -- End task with Success object
ok != handle(error)
Yeah, I agree with you, at least in principal! I think that revocation and memoization get harder if we aren't including the proof CIDs in the credential directly. How would you handle these cases?
Memoization:

```
// a solution of UCANs that allows for the op
let allUCANs = findAllUCANs(UCANs, UCANRevocations)
let opUCANs = findOpUCANs(op, allUCANs)
let whetherWithinTimeBounds = withinTimeBounds?(opUCANs, now)
// memoized; becomes false when revocations arrive or time bounds are exceeded
let whetherStillCan = stillCan?(opUCANs, opUCANRevocations, whetherWithinTimeBounds)
```
I do still think that defining and spec-ing unattenuated delegation is good idea. With all the above I also think that { with: “*”, can: “*” } is the most intuitive way.
att represents attenuation / restriction / narrowing down.
Perhaps the absence of attenuation (the att field) is a way to represent "nothing is restricted".
because it now has to be evaluated twice
Even if the BGP evaluation engine were to cache results, merging the low-selectivity BGP b1 would incur the cost of joining it with the merged-in nodes b2 and b3, which is one merge more than in the original BE-tree.
μ1 ∈ Ω1 ∧ μ2 ∈ Ω2 ∧ μ1
Shouldn't these be intersecting / common bindings?
I.e., for each common binding, they should be equivalent.
I.e.,
Ω1 |><| Ω2 = { μ1 ∪ μ2 | μ1 ∈ Ω1, μ2 ∈ Ω2, and for each common variable Vcommon, μ1(Vcommon) = μ2(Vcommon) }
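For reference, the standard SPARQL-algebra formulation of this: two mappings are compatible when they agree on every shared variable, and the join keeps the unions of exactly the compatible pairs.

```latex
\mu_1 \sim \mu_2 \iff \forall v \in \operatorname{dom}(\mu_1) \cap \operatorname{dom}(\mu_2) : \mu_1(v) = \mu_2(v)

\Omega_1 \bowtie \Omega_2 = \{\, \mu_1 \cup \mu_2 \mid \mu_1 \in \Omega_1,\ \mu_2 \in \Omega_2,\ \mu_1 \sim \mu_2 \,\}
```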
1
meant 2, I guess
1
meant 2, I guess
(tp1 ✶ tp2) ✶ tp3
But we need tp2 >< tp3.
This example rewrite violates that, as optional tp2 will be added on top even if it doesn't >< with tp3.
E.g., by lifting optionals to the upper level, this example of
((Pa ]>< Pb) >< (Pc ]>< Pd)) ]>< (Pe ]>< Pf)
could be transformed into:
((Pa >< Pc) >< (Pa ]>< Pb) >< (Pc ]>< Pd)) ]>< (Pe ]>< Pf)
or, in prefix notation with multiple args to functions:
(]>< (>< Pa Pc) Pb Pd (]>< Pe Pf))
However, the inner-join P2 = tp2 ✶ tp3 has to be evaluated before the left-outer-join P1 ✶ P2, due to the restrictions on the reorderability of left-outer-joins.
]>< is kinda "enhancing".
I.e., can be evaluated on top of ><.
And so it can be lifted to the upper layer.
E.g.,
tp1 ]>< (tp2 >< tp3)
=
tp1 >< ((tp1 ]>< tp2) >< (tp1 ]>< tp3))
E.g.,
(tp1 ]>< tpo1) >< (tp2 ]>< tpo2)
=
((tp1 >< tp2) ]>< tpo1) >< ((tp1 >< tp2) ]>< tpo2)
Or, if we allow joins to be functions of many arguments, in prefix notation, then:
(]>< tp1 tpo1 tpo2)
(]>< (>< tp1 tp2) tpo1 tpo2)
Overall, we can build a plan where non-optionals are evaluated first and then enhanced by optionals. It makes sense to do so; it's the least computation-expensive strategy.
e.g., in the case of Q2 above, left-outer-join between tp1 and tp2 cannot be performed before the inner-join P2 = (tp2 ✶ tp3).
I.e.,
tp1 ]>< (tp2 >< tp3)
not=
(tp1 ]>< tp2) >< tp3
This shows that ]>< is not associative.
To show that it's not commutative,
tp1 ]>< (tp2 >< tp3)
not=
(tp2 >< tp3) ]>< tp1
Also, ]>< is not distributive over ><.
E.g.,
tp1 ]>< (tp2 >< tp3)
not=
(tp1 ]>< tp2) >< (tp1 ]>< tp3)
Also, >< is not distributive over ]><.
E.g.,
tp1 >< (tp2 ]>< tp3)
not=
(tp1 >< tp2) ]>< (tp1 >< tp3)
It is useful to be able to have queries that allow information to be added to the solution where the information is available, but do not reject the solution because some part of the query pattern does not match.
Optional is meant to only accrete solution with values, never restrict a match for a solution.
GROUP BY
Can also be used with multiple variables.
(?p*(1-?discount) AS ?price)
This, and BIND, could be expressed as CONSTRUCT, allowing for a uniform representation of how data is stored: as triples.
E.g.,
```sparql
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ns: <http://example.org/ns#>

SELECT ?title ?discountedPrice
WHERE
  { ?x ns:discountedPrice ?discountedPrice }

CONSTRUCT
  { ?x ns:discountedPrice (?p*(1-?discount)) }
WHERE
  { ?x ns:price ?p .
    ?x dc:title ?title .
    ?x ns:discount ?discount
  }
```
I.e.,
```clojure
(-> graph
    match-product-with-discounts
    derive-discounted-price
    (select-keys [:title :discountedPrice]))
```
(?p*(1-?discount) AS ?price)
This can be expressed as a BIND at the level of the query, sparing the need to introduce another special bind at the level of SELECT. Although SELECT may work on GROUPed solutions.
GRAPH ?g
CONSTRUCT returns a graph; perhaps it would be of value to be able to feed it further into queries, perhaps as GRAPH ?constructedGraph { query }.
HAVING operates over grouped solution sets, in the same way that FILTER operates over un-grouped ones.
What's the value of HAVING over writing it as a FILTER over a subquery?
ToMultiSet
There won't be duplicate values, yet we wrap with a multiset
Filter( ?v1 < 3 , LeftJoin( BGP(?s :p1 ?v1), BGP(?s :p2 ?v2), true) , )
Here, the ?s that are to be removed will still get LeftJoined with the optional. Seems like redundant work. Perhaps filter first and then add the optional?
However, after having received the complete sequence of triples, it turns out that μ2 is a solution for the query but μ1 is not; instead, the following new mapping is another solution in the (sound and complete) query result:
From what I understand, OPTIONAL would behave like that only when it appears before a non-optional clause. When it appears after a non-optional clause, it may only accrete values; it would not restrict.
In fact, as updates change the content of the RDF store, all the active subscriptions must be checked on the same RDF store snapshot
It would be possible not to block on updates if they were captured with tx time. Then notifications would be able to grab the delta at tx time and do notification detection in parallel with incoming updates.
ucan:./* represents all of the UCANs in the current proofs array.
If we are to have UCANs as delta restrictions, then this behaviour would be expressed automatically: all capabilities of the UCANs in proofs would be delegated as-is if no further restrictions are specified.
diagram
Perhaps the delegation arrows could be reversed, to denote that they include / reference.
Also, showing a separate prf field would be of use for tracking delegation.
Also, the arrows between capabilities are misleading; capabilities are not references.
All of any scheme "owned" by a DID
", that the issuer is authorized to"?
"*"
Is there value in having "*", perhaps instead we could make ucan-selector optional?
eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCIsInVjdiI6IjAuOC4xIn0
Not clear what's in it, perhaps annotate that it's a UCAN or substitute with a human-readable hash "bafy...UCAN"?
Escalation by adding new capability
"Amplification by adding new caveat"?
capability
"caveat"?
Removes
"Attenuates"?
Escalation
"Amplification"?
as an empty array means "in no case"
This is contrary to the intuition behind caveats: they are restrictions; if no restrictions are set, allow all.
(an "OR", NOT an "and")
"and" seems useful to have.
E.g., in the example above I'd read it as "and", actually
```json
"mailto:username@example.com": {
  "msg/send": [{}], // Proof is (correctly) broader than claimed
  "msg/receive": [
    {
      "max_count": 5,
      "templates": [
        "newsletter",
        "marketing"
      ]
    }
  ]
},
```
Capabilities composed from multiple proofs (see rights amplification)
Could be unified with 1., having it as
A strict subset (attenuation) of the capability authorities from the prf field
Ability
"Operation"? The way it's named in incovation spec.
Abilities
"Ability" or "Capability"?
Capabilities
"Abilities"?
Capabilities composed from multiple proofs (see rights amplification)
What's value in having amplification?
Such composed tokens may become partially invalid if one of the proofs becomes revoked.
E.g., composing write access to / and /photos/ into /. Having the / proof become invalid would then prevent access to /photos/, as it's been merged. Keeping them separate would ensure a more invalidation-resilient token. Although that is a point against merging rather than against composition.
Also, composition may accidentally attenuate, as the merged / access inherits the timeframe of /, whereas the timeframe of /photos/ could have been larger.
Also, due to differences in caveats, merging by ability with no regard to caveats may lead to the / access attenuating the less-caveated /photos/ access. We could try to determine whether the / caveats are more ample than the /photos/ caveats and merge only then, but that may be error-prone and complex to do.
Also, proofs will most likely have different valid timeframes, so the composed token will be partially invalid at some times.
This map is REQUIRED but MAY be empty.
Seems redundant. What's value in having an empty map vs absent key?
until
"to"?
identifier for multiple keys instead of an identity
"identify with multiple keys instead of a specific identifier"?
broader
These delegated capabilities did not attenuate.
2.11.1 Valid Time Range
It could additionally be clarified that the delegation's validFrom + validTo timerange should be contained in the timerange of its proof UCAN.
exp
exp may be mistakenly read as "expires in some time" rather than "expires at a time".
Perhaps use "validFromTime" vft and "validToTime" vtt or smth?
Yes
Why not optional, as with nbf?
Use of this claim is OPTIONAL.
alice
bob?
bafy...sendBobEmailInvocation
Should it include "cause" field, pointing to updateDnsInvocation?
Batched
Should we reverse arrows to indicate that tasks depend / contain?
Executor MUST fail Task that Awaits failed output of the successful Task.
What's the value of doing so? Could the Executor simply not run such a task at all?
Effect
Can we represent Effect as Task?
In order to make spec more compact / generic.
version
of what?
fork and join
"sync" and "async"
Result
How is it different from Receipt?
"cause": {"/": "bafy...somePriorInvocation"},
Can it reuse await/ok approach?
updateDnsTask
updateDnsInstruction?
sendEmailTask
sendEmailInstruction?
createBlogPostTask
createBlogPostInstruction?
"200GB"
Should it be in the same domain model as other byte values? E.g., [200 "giga" "bytes"].
[500, "kilo", "bytes"]
Would it be useful for "memory" (and other byte-value fields) to support a plain number value in bytes?
[500, "milli", "seconds"]
Would it be useful for "timeout" to support a plain number value in milliseconds?
It's a rather standard approach and may be easy to use.
The operation of adding up all changes is stream integration.
Akin to reduce(previousDB, tx) => currentDB
ΔV = D(↑Q(DB)) = D(↑Q(I(T)))
^Q can be generalized as yet another T, denoted as ^T (^ hints that this "live" T may be applied on top of other Ts / maintains a "live" view).
This gives the ability for a ^T to depend on other ^Ts.
So, for each ^T in Ts, ^T(I(Ts)) = ^T(DB).
Additionally, DB is a snapshot. Perhaps ^T(DB) better denoted as ^T(Ts).
Thus the relation can be written as
ΔV = D(^T(Ts))
Additionally, D is akin to Δ, denoting it as such we end up with
ΔV = Δ^T(Ts), for each ^T in Ts.
And since Ts are versioned, ^T(TsN) implicitly has access to ^T(TsN-1).
I.e., TsN contains ^T(TsN-1), for each ^T.
Which allows ^T to be incrementally computed over its previous value.
^T(^T(TsN-1), TN)
^T has function signature akin to that of reduce, i.e., ^T(accumulator, sequence element)
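A sketch of that reduce-shaped signature in TypeScript: a live view ^T is a fold over the transaction stream, and each step reuses the previous view value. Names are illustrative.

```typescript
// A "live" view as a fold over the transaction log:
// viewN = step(viewN-1, txN), so each update is incremental.
type LiveView<S, T> = {
  init: S
  step: (previous: S, tx: T) => S
}

function integrate<S, T>(view: LiveView<S, T>, txs: T[]): S {
  return txs.reduce(view.step, view.init)
}

// Illustrative: a running count of transactions touching a given account.
const countForAlice: LiveView<number, { accounts: string[] }> = {
  init: 0,
  step: (n, tx) => (tx.accounts.includes("Alice") ? n + 1 : n),
}
```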
However, while developing a system, classes will be defined in various places, and it makes sense to be able to see relevant (applicable) methods adjacent to these classes.
Classes / ontologies are not a core feature of the language.
It's how we have RDF and OWL - they're separate.
Classes can be built on top of pure functions and data - these two are the core, nothing else.
Perhaps even functions can be eliminated from the core. A function is a template of some computation; it can be baked into the program, since names are a user-level feature.
So we end up with data and ops on it as the core, plus some flow control primitives (perhaps "or" and "and" are enough). The rest can be built on top.
As for what data to provide, multisets seem to be the most universal / least restrictive data structure, out of which more specialized data structures can be derived. And with the advent of SSDs we are no longer limited by performance to sequential reads, so perhaps it wouldn't be all that crazy to switch from lists to multisets as the basic structural block of programs.
There will also be means of associating a name with the generic function.
A naming system is not a core part of a language.
A naming system serves two purposes:
1. Create the structure of a program
2. Give a user-friendly interface
You don't need 2. in the core of your language. How data (your program) is displayed should be up to the end user (the programmer). If they want to see it as text formatted as a LISP - their choice; as text in a Java-like style - ok; Haskell-like - sure; visual - no problem.
Having languages as data allows just that. It helps us get rid of the accidental complexity of managing a syntax-heavy bag of text files (and having compilers). E.g., how Unison lang has the AST as a data structure and a text-based interface to tweak it.
Having code as data would also make run-time tweaking easier, bringing us closer to the promise of LISP.
And also all the rest of the neat features on top of content-addressing of code that are now waaay easier to implement, such as incremental compilation, distributed compute, and caching.
Have names as a user-level feature, personal dictionaries. Some will call the reducing function reduce, some fold, some foldr, and some will represent it as a triangle (for visual code management).
more elaborate object-oriented support
It is in no part a core feature of a language.
The mainstream OOP is complex and has many responsibilities.
OOP as envisioned by its creator is the actor model: state management (state + a managing actor) paired with linking actors together - a complex approach. It can be broken down into its primitives, and OOP can be constructed out of them if so desired, but not at the core level.
A good reference of language decomplection is Clojure.
In the second layer I include multiple values
Treating single values as a special case of multiple values is generally more performant.
the elaborate IO functions
IO is not the core of the language. It's more of a utility layer that allows the language to speak to the outside world.
macros
Would be nice to have them at run-time.
and very basic object-oriented support.
OOP is an abstraction that is useful for a very narrow set of use cases, adding accidental complexity to the others. Personally, I'd leave it out of the core.
I believe nothing in the kernel need be dynamically redefinable.
This moves us away from the value of LISP as a meta-language that can change itself. We have macros at compile time, not run-time. Having them at run-time gives us the power we were originally promised by the LISP philosophy. Having no run-time dynamism would not allow for this feature.
I.e., having the codebase as a persistent data structure, tweakable at run-time, sounds powerful.
Our environments should not discriminate against non-Lisp programmers the way existing environments do. Lisp is not the center of the world.
A possible approach to that is having LISP hosted, providing a simpler interface on top of established but less simple and expressive environments. E.g., the Clojure way.
All interested parties must step forward for the longer-term effort.
This is an effort to battle the LISP Curse. It would be a great movement, as that's one of LISP's weak spots (another is adoption, which Clojure tries to solve).
And soon they will have incremental compilation and loading.
That would be great. And it is. Unison Lang gives us this power. Hope for wider adoption.
Having a content-addressable codebase simplifies it a ton.
Perhaps an AST can be built on top of IPVM, akin to Unison Lang, but for any of the 40+ WASM languages out there.
and using the same basic data representations
Having common data structures would be great.
Atm each language implements the same data structures on its own.
Would be nice to have them protocol-based, implementation-agnostic.
Likely, IPLD gives us the common data structures.
The very best Lisp foreign functionality is simply a joke when faced with the above reality.
That's been true. It is fixed in Clojure; Clojure is top-tier in integration with its host platforms.
The real problem has been that almost no progress in Lisp environments has been made in the last 10 years.
That's a good point. Again, not something inherent in LISPs. Possibly due to lack of adoption and the LISP Curse.
Seventh, environments are not multi-user when almost all interesting software is now written in groups.
Not an inherent trait of LISP envs. And LISPs do have multiuser envs, e.g., REPL to which you can connect multiple clients.
Sixth, using the environment is difficult. There are too many things to know. It’s just too hard to manage the mechanics.
Environments for LISPs are simpler due to the simplicity of the language they serve. For a more complex language they'd only be more complex and hence more difficult to use.
what is fully defined and what is partially defined
Not sure I get that
Fifth, information is not brought to bear at the right times.
How that is a trait of non-LISP environments?
More than that, it seems LISPs give you greater introspection. E.g., because of REPL. E.g., because of macros that can sprinkle code with introspection in dev environment and be rid of it in prod env.
Fourth, they do not address the software lifecycle in any extensive way.
How is that an inherent property of LISP environments?
Surely they can be extended with all the mentioned properties.
It does require effort to implement, and with sufficient resources behind an environment it would be wise to invest there.
E.g., Emacs has great docs. Clojure ecosystem has great docs and is a joy to work in.
I'd say LISPs have greater potential user-friendliness due to simplicity of interface. Simple interface + good docs is more user-friendly than complex interface + great docs.
And you don't need that many docs in the first place for a simple interface.
Also, a simple, well-designed interface can serve as documentation itself, because you can grok it instead of going through docs. You mostly need docs for the mindset.
Third, they are not multi-lingual even when foreign interfaces are available.
This is great. It's a desirable trait. But I don't see how that is a unique value available only for Non-Lisp environments.
Files are used to keep persistent data -- how 1960s.
What's wrong with that? Files are universally accessible on a machine (and with IPFS, across machines), which seems to be a good design for the times. Any program can interoperate through files - a common interface.
Files are 'caches' of computations made.
Sure, it would be nice to capture the computations behind them as well, although that was not practical back then and is not that much needed.
But nowadays IPVM does just that at global scale. And, thankfully, we also have data structures as the means to communicate, and not custom text.
I don't see what's wrong with that approach. Taking it further (as IPVM does) gives a next level of simplicity and interoperability, along with immutability/persistence - a game changer.
In fact, I believe no currently available Lisp environment has any serious amount of integration.
Well, that's a shame. Composability is a valuable trait of computer programs, the user interface included. The fact that they're not composable may mean that the problem domain is not well known, so it wasn't clear what the components are. Perhaps with time it'll become clearer. This, interestingly, is a non-the-right-thing approach: the UI got shipped to satisfy the need without covering all the cases (integration of UIs) - a lean startup approach. E.g., Emacs started as non-composable and is now turning composable.
The virus lives while the complex organism is stillborn. Lisp must adapt, not the other way around.
What's "right" is context-dependent. For programmers the right thing will be a simple and performant and mainstream etc. language.
LISP did not check all the boxes back then. Clojure now tries to get closer to checking what a programmer may need in production, and has broader success.
Clojure had effort put into its design to make it a simple-interface thing, and it's excellent at that. It had effort put into making it easy to adopt. So it's a well-designed virus. The right thing. Virality is one of the traits of the right thing, in the context of production programming.
You know, you cannot write production code as bad as this in C.
Performance is not the only metric of "goodness" of code. Simplicity is another.
The following examples of badly performing Lisp programs were all written by competent Lisp programmers while writing real applications that were intended for deployment
Often performance is not the highest value for a business, especially when we have ever more powerful hardware.
LISP allows for simplicity of interface. You can iterate on making the implementation performant later on, if you need to.
The lesson to be learned from this is that it is often undesirable to go for the right thing first.
Great, don't go 100% in, especially since those last 20% take 80% of the time. But please do have a good interface design in those 50%; it is here to stay.
The right thing is frequently a monolithic piece of software
Unix is a mono kernel. C is a fixed language, whereas LISP can be extended with macros.
Composability is a trait of good design. I'd expect The Right Thing approach to produce composable products, and the Worse Is Better approach to produce complex ones.
there are many more compiler experts who want to make C compilers better than want to make Lisp compilers better
It is more fun to play with simple things. They're more rewarding.
C is simpler in implementation than a LISP; it's more fun to play with its compiler. LISP is simpler in interface than C; it's more fun to play with it as a language (hence the LISP Curse).
I wonder why we don't have a C Curse at the level of compiler though. Or do we?
and third will be improved to a point that is almost the right thing
Huge doubts there. The original interface will stay there in one way or another, and it is complex, so users will pay its cost from then on.
E.g., Java is OOP; it introduces a functional style, but OOP stays. Some folks would like to switch to functional, but there is pressure from the legacy codebase and the legacy mindset around them to carry on in the same fashion.
E.g., C is still C. C++ and the other C* languages are not far off in the simplicity of their interfaces.
Unix is still a text-based chatter mono kernel.
It seems hard to impossible to change the core design, so in Worse is Better it stays Worse.
namely, implementation simplicity was more important than interface simplicity.
Can you go far with such a design before getting bogged down in accidental complexity from having complex abstractions?
Abstracting away is the bread and butter of programming. In order to do it efficiently you need simple abstractions - a simple interface. For this task, interface simplicity is way more valuable than implementation simplicity. E.g., you may have a functional interface on top of an imperative implementation.
Early Unix and C are examples of the use of this school of design
C is complex, compared to LISPs.
Unix has a mono kernel and a ton of C.
Are they the crux of simplicity, which is the highest value of Worse is Better?
it’s about half of the global population at 47.1 percent
there are 7.26 billion mobile phone users worldwide