105 Matching Annotations

Jun 2021
pdos.csail.mit.edu pdos.csail.mit.edu

6.824 Lab 4: Sharded Key/Value Service

1
1. peerless07 21 Jun 2021
  
  in Public
  
  While the optimization above is good, we can still do better
  
  这句话说的好
Visit annotations in context

Annotators

peerless07

URL

pdos.csail.mit.edu/6.824/labs/lab-shard.html
Jun 2020
15721.courses.cs.cmu.edu 15721.courses.cs.cmu.edu

An Empirical Evaluation of In-Memory Multi-Version Concurrency Control

76
1. peerless07 22 Jun 2020
  
  in Public
  
  he DBMS performs the best on both the low-contention and high-contention workloads with the Oracle/MySQLand NuoDB configurations. This is because these systems’ stor-age schemes scale well in multi-core and in-memory systems, andtheirMV2PLprotocol provides comparatively higher performanceregardless of the workload contention. HYRISE, MemSQL, andHyPer’s configurations yield relatively lower performance, as theuse ofMVOCCprotocol can bring high overhead due to the read-settraversal required by the validation phase. Postgres and Hekaton’sconfigurations lead to the worst performance, and the major reasonis that the use of append-only storage withO2Nordering severelyrestricts the scalability of the system. This experiment demonstratesthat both concurrency control protocol and version storage schemecan have a strong impact on the throughput.
  
  结论不同数据库比较
  
  Experiment
2. peerless07 22 Jun 2020
  
  in Public
  
  We also observed that the performance of a MVCC DBMS istightly coupled with its GC implementation. In particular, we foundthat a transaction-level GC provided the best performance with thesmallest memory footprint. This is because it reclaims expiredtuple versions with lower synchronization overhead than the otherapproaches. We note that the GC process can cause oscillations inthe system’s throughput and memory footprint.
  
  结论GC
  
  Experiment
3. peerless07 22 Jun 2020
  
  in Public
  
  Lastly, we found that the index management scheme can alsoaffect the DBMS’s performance for databases with many secondaryindexes are constructed. The results inSect. 7.5show that logicalpointer scheme always achieve a higher throughput especially whenprocessing update-intensive workloads.
  
  结论索引管理
  
  Experiment
4. peerless07 22 Jun 2020
  
  in Public
  
  We observed that the performanceof append-only and time-travel schemes are influenced by the effi-ciency of the underlying memory allocation schemes; aggressivelypartitioning memory spaces per core resolves this problem. Deltastorage scheme is able to sustain a comparatively high performanceregardless of the memory allocation, especially when only a subsetof the attributes stored in the table is modified. But this schemesuffers from low table scan performance, and may not be a good fitfor read-heavy analytical workloads.
  
  结论Version Storage
  
  Experiment
5. peerless07 22 Jun 2020
  
  in Public
  
  Overall,we found thatMVTOworks well on a variety of workloads. Noneof the systems that we list in Table 1 adopt this protocol
  
  结果CCP
  
  Experiment
6. peerless07 22 Jun 2020
  
  in Public
  
  performance gap is enlarged to 40% with the number of secondaryindexes increased to 20.Fig. 23further shows the advantage of logi-cal pointers. The results show that for the high contention workload,the DBMS’s throughput when using logical pointers is 45% higherthan the throughput of physical pointers. This performance gapdecreases in both the low contention and high contention workloadswith the increase of number of threads
  
  索引管理：Logical pointer更优
  
  Experiment
7. peerless07 22 Jun 2020
  
  in Public
  
  The results in Fig. 22b show that under high contention, logicalpointer achieves 25% higher performance compared to physicalpointer scheme
  
  索引管理：Logical pointer更优
  
  Experiment
8. peerless07 22 Jun 2020
  
  in Public
  
  The results in Fig. 20a indicate that transaction-levelGCachievesslightly better performance than tuple-levelGCfor theread-intensive,but the gap increases to 20% inFig. 20bfor theupdate-intensiveworkload. Transaction-level GC removes expired versions in batches,thereby reducing the synchronization overhead. Both mechanismsimprove throughput by 20–30% compared to when GC is disabled.Fig. 21 shows that both mechanisms reduce the memory usage
  
  GC：不同level比较2
  
  Experiment
9. peerless07 22 Jun 2020
  
  in Public
  
  The results in Fig. 18 show thatCOOPachieves 45% higherthroughput compared toVACunder read-intensive workloads. InFig. 19, we see thatCOOPhas a 30–60% lower memory footprintper transaction thanVAC. Compared toVAC,COOP’s performanceis more stable, as it amortizes the GC overhead across multiplethreads and the memory is reclaimed more quickly. For both work-loads, we see that performance declines over time when GC isdisabled because the DBMS traverses longer version chains to re-trieve the versions. Furthermore, because the system never reclaimsmemory, it allocates new memory for every new version.
  
  GC：不同level比较
  
  Experiment
10. peerless07 22 Jun 2020
  
  in Public
  
  the append-only and time-travel schemes’performance is stable regardless of the number of modified attributes.As expected, the delta scheme performs the best when the numberof modified attributes is small because it copies less data per version.But as the scope of the update operations increases, it is equivalentto the others because it copies the same amount of data per delta
  
  Version Storage比较modified attributes
  
  Experiment
11. peerless07 22 Jun 2020
  
  in Public
  
  We use transaction-level background vacuuming GC andcompare the orderings using two YCSB workload mixtures. We setthe transaction length to 10. We fix the number of DBMS threads to40 and vary the workload’s contention level.
  
  Version Chain Ordering: N2O vs O2N
  
  Experiment
12. peerless07 22 Jun 2020
  
  in Public
  
  We use theYCSBworkload mixtures inthis experiment, but the database is changed to contain a singletable with 10 million tuples, each with one 64-bit primary key and avariable number of 100-byte non-inlineVARCHARtype attributes. Weuse theread-intensiveandupdate-intensiveworkloads under lowcontention (θ=0.2) on 40 threads with each transaction executing 10operations. Each operation only accesses one attribute in a tuple.
  
  Version Storage: 标准setup
  
  Experiment
13. peerless07 22 Jun 2020
  
  in Public
  
  MVOCCis more likely to abortNewOrdertransactions, whereas thePaymentabort rate inMV2PLis 6.8×higher thanNewOrdertransactions. These two transactionsaccess the same table, and again the optimistic protocols only de-tect read conflicts inNewOrdertransactions in the validation phase.SI+SSNachieves a low abort rate due to its anti-dependency track-ing, whereasMVTOavoids false aborts because the timestampassigned to each transaction directly determines their ordering
  
  TPC-C：比较
  
  Experiment
14. peerless07 22 Jun 2020
  
  in Public
  
  MVOCCincurs wastedcomputation because it only detects conflicts in the validation phase.
  
  TPC-C: MVOOC still sucks
  
  Experiment
15. peerless07 22 Jun 2020
  
  in Public
  
  there is not a great difference among the protocols exceptMV2PL; they handle write-write conflicts in a similar way and againmulti-versioning does not help reduce this type of conflicts
  
  MV2PL: WW Conflict
  
  Experiment
16. peerless07 22 Jun 2020
  
  in Public
  
  Beyond this contention level, the performanceofMVOCCis reduced by∼50%. This is becauseMVOCCdoesnot discover that a transaction will abort due to a conflict until afterthe transaction has already executed its operations. There is nothingabout multi-versioning that helps this situation.
  
  MVOCC: reduced performance
  
  Experiment
17. peerless07 22 Jun 2020
  
  in Public
  
  TPC-C:This benchmark is the current standard for measuring theperformance of OLTP systems [43]. It models a warehouse-centricorder processing application with nine tables and five transactiontypes. We modified the original TPC-C workload to include a newtable scan query, calledStockScan, that scans theStocktable andcounts the number of items in each warehouse. The amount of con-tention in the workload is controlled by the number of warehouses
  
  Benchmark: TPC-C -- nine tables & five transaction types
  
  Experiment
18. peerless07 22 Jun 2020
  
  in Public
  
  YCSB:We modified the YCSB [14] benchmark to model differ-ent workload settings of OLTP applications. The database containsa single table with 10 million tuples, each with one 64-bit primarykey and 10 64-bit integer attributes. Each operation is independent;that is, the input of an operation does not depend on the output ofa previous operation. We use three workload mixtures to vary thenumber of reads/update operations per transaction: (1)read-only(100%reads), (2)read-intensive(80%reads, 20%updates), and(3)update-intensive(20%reads, 80%updates). We also vary thenumber of attributes that operations read or update in a tuple. Theoperations access tuples following a Zipfian distribution that is con-trolled by a parameter (θ) that affects the amount of contention (i.e.,skew), whereθ=1.0 is the highest skew setting.
  
  Benchmark：YCSB -- 独立操作，三个workloads
  
  Experiment
19. peerless07 22 Jun 2020
  
  in Public
  
  For each trial, we execute the workload for 60 seconds to let theDBMS to warm up and measure the throughput after another 120seconds. We execute each trial five times and report the averageexecution time.
  
  标准
  
  Experiment
20. peerless07 22 Jun 2020
  
  in Public
  
  We execute all transactions as storedprocedures under theSERIALIZABLEisolation level
  
  实验用的最高isolation level
  
  Experiment
21. peerless07 22 Jun 2020
  
  in Public
  
  With this second scheme, the DBMS stores the physical addressof versions in the index entries. This approach is only applicablefor append-only storage, since the DBMS stores the versions in thesame table and therefore all of the indexes can point to them. Whenupdating any tuple in a table, the DBMS inserts the newly createdversion into all the secondary indexes. In this manner, the DBMScan search for a tuple from a secondary index without comparingthe secondary key with all of the indexed versions. Several MVCCDBMSs, including MemSQL and Hekaton, employ this scheme
  
  Physical Pointers: 只对Append-only storage有用
  
  问题：“since the DBMS stores the versions in the same table and therefore all of the indexes can point to the”这句话怎么理解
  
  Likewise, using physical pointers is better for read-intensive workloads because an index entry points to the exact version. But it is slower for update operations because this scheme requires theDBMS to insert an entry into every secondary index for each new version, which makes update operations slower
  
  Question Concept
22. peerless07 22 Jun 2020
  
  in Public
  
  The main idea of using logical pointers is that the DBMS usesa fixed identifier that does not change for each tuple in its indexentry. Then, as shown in Fig. 5a, the DBMS uses an indirectionlayer that maps a tuple’s identifier to the HEAD of its version chain.This avoids the problem of having to update all of a table’s indexesto point to a new physical location whenever a tuple is modifiedHEADINDEXVERSION CHAINSHEADINDIRECTION(a)Logical PointersHEADINDEXVERSION CHAINSHEAD(b)Physical PointersFigure 5: Index Management– The two ways to map keys to tuples in aMVCC are to use logical pointers with an indirection layer to the versionchain HEAD or to use physical pointers that point to an exact version.(even if the indexed attributes were not changed). Only the mappingentry needs to change each time. But since the index does not pointto the exact version, the DBMS traverses the version chain fromthe HEAD to find the visible version. This approach is compatiblewith any version storage scheme. As we now discuss, there are twoimplementation choices for this mapping
  
  Logical Pointers:
  
  logical pointer approach is better for write-intensive workloads, as the DBMS updates the secondary indexes only when a transaction modifies the indexes attributes.
  
  Concept
23. peerless07 22 Jun 2020
  
  in Public
  
  Tuple Id (TupleId):One drawback of thePKeypointers is thatthe database’s storage overhead increases as the size of a tuple’sprimary key increases, since each secondary index has an entirecopy of it. In addition to this, since most DBMSs use an order-preserving data structure for its primary key indexes, the cost ofperforming the additional look-up depends on the number of entries.An alternative is to use a unique 64-bit tuple identifier instead ofthe primary key and a separate latch-free hash table to maintain themapping information to the tuple’s version chain HEAD.
  
  Logical Pointers: TupleId
  
  Concept
24. peerless07 22 Jun 2020
  
  in Public
  
  Primary Key (PKey):With this, the identifier is the same as thecorresponding tuple’s primary key. When the DBMS retrieves anentry from a secondary index, it performs another look-up in thetable’s primary key index to locate the version chain HEAD. If asecondary index’s attributes overlap with the primary key, then theDBMS does not have to store the entire primary key in each entry.
  
  Logical Pointers: PKey
  
  Concept
25. peerless07 22 Jun 2020
  
  in Public
  
  Primary key indexes always point to the current version of a tuple.But how often the DBMS updates a primary key index depends onwhether or not its version storage scheme creates new versions whena tuple is updated. For example, a primary key index in the deltascheme always points to the master version for a tuple in the maintable, thus the index does not need to be updated. For append-only,it depends on the version chain ordering:N2Orequires the DBMSto update the primary key index every time a new version is created.If a tuple’s primary key is modified, then the DBMS applies this tothe index as aDELETEfollowed by anINSERT
  
  Index什么时候会被更新
  
  Example: N2O
  
  Concept
26. peerless07 22 Jun 2020
  
  in Public
  
  We define anindex entryas a key/value pair, where thekeyis atuple’s indexed attribute(s) and thevalueis a pointer to that tuple.The DBMS follows this pointer to a tuple’s version chain and thenscans the chain to locate the version that is visible for a transaction.The DBMS will never incur a false negative from an index, but itmay get false positive matches because the index can point to aversion for a key that may not be visible to a particular transaction.
  
  Index entires and false positives
  
  Concept
27. peerless07 22 Jun 2020
  
  in Public
  
  Since MVCC creates new versions when transactions updatetuples, the system will run out of space unless it reclaims the versionsthat are no longer needed.
  
  为何要进行GC
  
  GC在长交易不太行：The DBMS’s performance drops in the presence of long-running transactions. This is because all the versions generated during the lifetime of such a transaction cannot be removed until it completes.
  
  Concept
28. peerless07 22 Jun 2020
  
  in Public
  
  In this GC mechanism, the DBMS reclaims storage space attransaction-level granularity. It is compatible with all of the versionstorage schemes. The DBMS considers a transaction as expiredwhen the versions that it generated are not visible to any activetransaction. After an epoch ends, all of the versions that weregenerated by the transactions belonging to that epoch can be safely
  
  Transaction-level Garbage Collection:
  
  Downside: "however, is that the DBMS tracksthe read/write sets of transactions for each epoch instead of justusing the epoch’s membership counter"
  
  Concept
29. peerless07 22 Jun 2020
  
  in Public
  
  Cooperative Cleaning (COOP):When executing a transaction,the DBMS traverses the version chain to locate the visible version.During this traversal, it identifies the expired versions and recordsthem in a global data structure. This approach scales well as theGC threads no longer needs to detect expired versions, but it onlyworks for theO2Nappend-only storage. One additional challengeis that if transactions do not traverse a version chain for a particulartuple, then the system will never remove its expired versions. Thisproblem is called “dusty corners” in Hekaton [16]. The DBMSovercomes this by periodically performing a complete GC pass witha separate thread like inVAC.
  
  Tuple-level GC: COOP
  
  如果不遍历某一个chain，那么就无法GC而变成dusty corners
  
  Concept
30. peerless07 22 Jun 2020
  
  in Public
  
  Background Vacuuming (VAC):The DBMS uses backgroundthreads that periodically scan the database for expired versions. Asshown in Table 1, this is the most common approach in MVCCDBMSs as it is easier to implement and works with all version stor-age schemes. But this mechanism does not scale for large databases,especially with a small number of GC threads. A more scalableapproach is where transactions register the invalidated versions ina latch-free data structure [27]. The GC threads then reclaim theseexpired versions using the epoch-based scheme described above.Another optimization is where the DBMS maintains a bitmap ofdirty blocks so that the vacuum threads do not examine blocks thatwere not modified since the last GC pass
  
  Tuple-level GC: VAC
  
  Concept
31. peerless07 22 Jun 2020
  
  in Public
  
  There are two GC implementations for a MVCC that differ onhow the DBMS looks for expired versions. The first approachistuple-levelGC wherein the DBMS examines the visibility ofindividual tuples. The second istransaction-levelGC that checkswhether any version created by a finished transaction is visible
  
  GC的两种实现
  
  Concept
32. peerless07 22 Jun 2020
  
  in Public
  
  The DBMS registers each newtransaction into the active epoch and increments this counter. Whena transaction finishes, the DBMS removes it from its epoch (whichmay no longer be the current active one) and decrements this counter.If a non-active epoch’s counter reaches zero and all of the previousepochs also do not contain active transactions, then it is safe for theDBMS to reclaim expired versions that were updated in this epoch
  
  Epoch流程
  
  Concept
33. peerless07 22 Jun 2020
  
  in Public
  
  An in-memory DBMS can avoid this problem with coarse-grainedepoch-basedmemory management that tracks the versions createdby transactions [44]. There is always one active epoch and an FIFOqueue of prior epochs
  
  GC如何解决bottleneck：用epoch
  
  Concept
34. peerless07 22 Jun 2020
  
  in Public
  
  The GC process is divided into three steps: (1) detect expiredversions, (2) unlink those versions from their associated chains andindexes, and (3) reclaim their storage space. The DBMS considers aversion as expired if it is either an invalid version (i.e., created byan aborted transaction) or it is not visible to any active transaction.For the latter, the DBMS checks whether a version’send-tsisless than theTidof all active transactions. The DBMS maintainsa centralized data structure to track this information, but this is ascalability bottleneck in a multi-core system
  
  GC的三个步骤
  
  为什么end-ts比其他交易的时间戳小，那么他就对任何active的交易不可见呢？（可以从版本来理解，end-ts在最新版本里是被set成无穷的）
  
  Concept
35. peerless07 22 Jun 2020
  
  in Public
  
  all of the tuple versions for a table are storedin the same storage space. This approach is used in Postgres, aswell as in-memory DBMSs like Hekaton, NuoDB, and MemSQL.To update an existing tuple, the DBMS first acquires an empty slotfrom the table for the new tuple version. It then copies the contentof the current version to the new version. Finally, it applies themodifications to the tuple in the newly allocated version slot.
  
  Append-only Storage：
  
  “better for analytical queries that perform large scans because ver-sions are stored contiguously in memory, which minimizes CPUcache misses and is ideal for hardware prefetching”
  
  但是旧版本需要chase pointer
  
  同时给索引结构暴露了很多物理版本，所以可以有很多索引管理的选项
  
  Concept
36. peerless07 22 Jun 2020
  
  in Public
  
  With this last scheme, the DBMS maintains the master versionsof tuples in the main table and a sequence ofdelta versionsin aseparatedelta storage. This storage is referred to as therollbacksegmentin MySQL and Oracle, and is also used in HyPer. Mostexisting DBMSs store the current version of a tuple in the main table.To update an existing tuple, the DBMS acquires a continuous spacefrom the delta storage for creating a new delta version.
  
  Delta Storage: 有利于更新操作，因为 “This scheme is ideal forUPDATEoperations that modify a subsetof a tuple’s attributes because it reduces memory allocations”
  
  但是对于read intensive的工作，却有额外的overhead
  
  Concept
37. peerless07 22 Jun 2020
  
  in Public
  
  To update a tuple, the DBMS first acquires a slot in the time-traveltable and then copies the master version to this location. It thenmodifies the master version stored in the main table. Indexes arenot affected by version chain updates because they always point tothe master version. As such, it avoids the overhead of maintainingthe database’s indexes whenever a transaction updates a tuple and isideal for queries that access the current version of a tuple.
  
  Master版本始终在主表中：不用更新索引减少overhead
38. peerless07 22 Jun 2020
  
  in Public
  
  the older versions are stored in a separate table. TheDBMS maintain amasterversion of each tuple in the main table andmultiple versions of the same tuple in a separate time-travel table.In some DBMSs, like SQL Server, the master version is the currentversion of the tuple. Other systems, like SAP HANA, store theoldest version of a tuple as the master version to provide snapshotisolation
  
  Time-Travel Storage
  
  Concept
39. peerless07 22 Jun 2020
  
  in Public
  
  Another issue with append-only storage is how to deal with non-inline attributes (e.g., BLOBs). Consider a table that has two at-tributes (one integer, one BLOB). When a transaction updates atuple in this table, under the append-only scheme the DBMS createsa copy of the BLOB attributes (even if the transaction did not modifyit), and then the new version will point to this copy. This is wastefulbecause it creates redundant copies. To avoid this problem, oneoptimization is to allow the multiple physical versions of the sametuple to point to the same non-inline data. The DBMS maintainsreference counters for this data to ensure that values are deleted onlywhen they are no longer referenced by any version
  
  BLOB如何处理，复制的话太没有效率，所以允许不同版本指向同一个BLOB但是要维护一个counter用来决定BLOB什么时候可以被删除
40. peerless07 22 Jun 2020
  
  in Public
  
  Newest-to-Oldest (N2O):The alternative is to store the newestversion of the tuple as the version chain’s HEAD (see Fig. 3b). Sincemost transactions access the latest version of a tuple, the DBMS doesnot have to traverse the chain. The downside, however, is that thechain’s HEAD changes whenever a tuple is modified. The DBMSthen updates all of the table’s indexes (both primary and secondary)to point to the new version. As we discuss in Sect. 6.1, one canavoid this problem through an indirection layer that provides a singlelocation that maps the tuple’s latest version to physical address. Withthis setup, the indexes point to tuples’ mapping entry instead of theirphysical locations. This works well for tables with many secondaryindexes but increases the storage overhead
  
  N2O
  
  Concept
41. peerless07 22 Jun 2020
  
  in Public
  
  The key decision with the append-only scheme is how the DBMSorders the tuples’ version chains. Since it is not possible to maintaina latch-free doubly linked list, the version chain only points in onedirection. This ordering has implications on how often the DBMSupdates indexes whenever transactions modify tuples.Oldest-to-Newest (O2N):With this ordering, the chain’s HEADis the oldest extant version of a tuple (see Fig. 3a). This versionmight not be visible to any active transaction but the DBMS has yetto reclaim it. The advantage ofO2Nis that the DBMS need notupdate the indexes to point to a newer version of the tuple wheneverit is modified. But the DBMS potentially traverses a long versionchain to find the latest version during query processing. This is slowbecause of pointer chasing and it pollutes CPU caches by readingunneeded versions. Thus, achieving good performance withO2Nishighly dependent on the system’s ability to prune old versions
  
  O2N
  
  Concept
42. peerless07 22 Jun 2020
  
  in Public
  
  thechain’s HEAD is either the newest or oldest version
  
  Version Storage: head of chain
  
  Concept
43. peerless07 22 Jun 2020
  
  in Public
  
  A transaction can commit only when all ofthe transactions that it depends on have committed.
  
  eagerly update
  
  Concept
44. peerless07 22 Jun 2020
  
  in Public
  
  Atransaction is allowed to commit only when its dependency counteris zero, whereupon the DBMS traverses its dependency list anddecrements the counters for all the transactions that are waitingfor it to finish.
  
  speculative read uncommitted
  
  Concept
45. peerless07 22 Jun 2020
  
  in Public
  
  The serial safety net (SSN) is a newer certifier-based proto-col [45]. Unlike withSSI, which is only applicable to snapshotisolation,SSNworks with any isolation level that is at least as strongasREAD COMMITTED. It also uses a more more precise anomaly de-tection mechanism that reduces the number of unnecessary aborts.SSNencodes the transaction dependency information into meta-data fields and validates a transactionT’s consistency by computinga low watermark that summarizes “dangerous” transactions thatcommitted before theTbut must be serialized afterT[45]. Re-ducing the number of false aborts makesSSNmore amenable toworkloads with read-only or read-mostly transactions
  
  Serialization Certifier: SSN -- 利于只读/Read-mostly的交易
  
  Concept
46. peerless07 22 Jun 2020
  
  in Public
  
  the DBMS maintains a serialization graphfor detecting and removing “dangerous structures” formed by con-current transactions
  
  Serialization Certifier
  
  Concept
47. peerless07 22 Jun 2020
  
  in Public
  
  When a transaction commits, the DBMSassigns it a unique timestamp (Tcommit) that is used to update thebegin-tsfield for the versions created by that transaction and thenreleases all of the transaction’s locks.
  
  MV2PL: when to release locks
  
  Concept
48. peerless07 22 Jun 2020
  
  in Public
  
  Similarly, atransaction is allowed to update a versionBxonly if bothread-cntandtxn-idare set to zero
  
  MV2PL: when can update
  
  Concept
49. peerless07 22 Jun 2020
  
  in Public
  
  To perform a read operation on a tupleA, the DBMS searches fora visible version by comparing a transaction’sTidwith the tuples’begin-tsfield. If it finds a valid version, then the DBMS incre-ments that tuple’sread-cntfield if itstxn-idfield is equal to zero(meaning that no other transaction holds the write lock)
  
  MV2PL: when can read
  
  Concept
50. peerless07 22 Jun 2020
  
  in Public
  
  The tuple’swrite lockis thetxn-idfield. For theread lock, the DBMS usesaread-cntfield to count the number of active transactions thathave read the tuple. Although it is not necessary, the DBMS canpacktxn-idandread-cntinto contiguous 64-bit word so that theDBMS can use a single CaS to update them at the same time
  
  MV2PL: 锁同样大小所以单个CAS就可以执行更新
  
  Concept
51. peerless07 22 Jun 2020
  
  in Public
  
  This protocol uses the two-phase locking (2PL) method [11] toguarantee the transaction serializability. Every transaction acquiresthe proper lock on the current version of logical tuple before it isallowed to read or modify it.
  
  MV2PL
  
  Concept
52. peerless07 22 Jun 2020
  
  in Public
  
  But atransaction cannot read a new version until the other transaction thatcreated it commits. A transaction that reads an outdated version willonly find out that it should abort in the validation phase
  
  MVOCC: when cannot read
  
  Concept
53. peerless07 22 Jun 2020
  
  in Public
  
  If the transaction passes these checks, it then enters thewrite phasewhere the DBMS installs all the new versions and setstheirbegin-tstoTcommitandend-tstoINF
  
  Write Phase of MVOCC: end-ts initialized to INF
  
  Concept
54. peerless07 22 Jun 2020
  
  in Public
  
  When a transaction instructs the DBMS that it wants to commit,it then enters thevalidation phase. First, the DBMS assigns thetransaction another timestamp (Tcommit) to determine the serializationorder of transactions. The DBMS then determines whether the tuplesin the transaction’s read set was updated by a transaction that alreadycommitted.
  
  Validation Phase of MVOCC
  
  Concept
55. peerless07 22 Jun 2020
  
  in Public
  
  TheMVOCCprotocol splits a transaction into three phases.When the transaction starts, it is in theread phase. This is where thetransaction invokes read and update operations on the database. LikeMVTO, to perform a read operation on a tupleA, the DBMS firstsearches for a visible versionAxbased onbegin-tsandend-tsfields.Tis allowed to update versionAxif its write lock is not ac-quired. In a multi-version setting, if the transaction updates versionBx, then the DBMS creates versionBx+1with itstxn-idset toTid
  
  Read Phase of MVOCC: read operation & creation of new versions
  
  Concept
56. peerless07 22 Jun 2020
  
  in Public
  
  The motivation behind OCCis that the DBMS assumes that transactions are unlikely to conflict,and thus a transaction does not have to acquire locks on tuples whenit reads or updates them.
  
  MVOCC: optimistic about the conflict
  
  Concept
57. peerless07 22 Jun 2020
  
  in Public
  
  WithMVTO, a transaction always updates the latest version ofa tuple. TransactionTcreates a new versionBx+1if (1) no activetransaction holdsBx’s write lock and (2)Tidis larger thanBx’sread-tsfield. If these conditions are satisfied, then the DBMScreates a new versionBx+1and sets itstxn-idtoTid. WhenTcommits, the DBMS setsBx+1’sbegin-tsandend-tsfields toTidandINF(respectively), andBx’send-tsfield toTid
  
  MVTO: update create new version based on two conditions and update the begin-ts field on commit
  
  Concept
58. peerless07 22 Jun 2020
  
  in Public
  
  Upon readingAx, the DBMS setsAx’sread-tsfield toTidif its current value is less thanTid. Otherwise, the transaction readsan older version without updating this field
  
  MVCC: after read
  
  Concept
59. peerless07 22 Jun 2020
  
  in Public
  
  When transactionTinvokes a read operation on logical tupleA,the DBMS searches for a physical version whereTidis in betweenthe range of thebegin-tsandend-tsfields
  
  MVTO: when can read
  
  Concept
60. peerless07 22 Jun 2020
  
  in Public
  
  The crux ofthis approach is to use the transactions’ identifiers (Tid) to pre-compute their serialization order. In addition to the fields describedin Sect. 2.2, the version headers also contain the identifier of the lasttransaction that read it (read-ts). The DBMS aborts a transactionthat attempts to read or update a version whose write lock is held byanother transaction.
  
  MVTO：记录上一个交易读操作的时间戳
  
  Concept
61. peerless07 22 Jun 2020
  
  in Public
  
  Existing approachesto provide serializable transaction processing use either (1) addi-tional latches in the index [35, 44] or (2) extra validation steps whentransactions commit [27].
  
  有一点像乐观/悲观的Approaches
  
  Concept
62. peerless07 22 Jun 2020
  
  in Public
  
  multi-versioning does notbring any benefits to phantom prevention
  
  How to prevent Phantom in MVCC?
  
  Question
63. peerless07 22 Jun 2020
  
  in Public
  
  Every DBMS includes aconcurrency control protocolthat coor-dinates the execution of concurrent transactions [11]. This protocoldetermines (1) whether to allow a transaction to access or modify aparticular tuple version in the database at runtime, and (2) whether toallow a transaction to commit its modifications.
  
  并发控制
  
  Concept
64. peerless07 22 Jun 2020
  
  in Public
  
  Tuples:As shown in Fig. 1, each physical version contains fourmeta-data fields in its header that the DBMS uses to coordinatethe execution of concurrent transactions (some of the concurrencycontrol protocols discussed in the next section include additionalfields). Thetxn-idfield serves as the version’s write lock. Everytuple has this field set to zero when the tuple is not write-locked.Most DBMSs use a 64-bittxn-idso that it can use a single compare-and-swap (CaS) instruction to atomically update the value. If atransactionTwith identifierTidwants to update a tupleA, then theDBMS checks whetherA’stxn-idfield is zero. If it is, then DBMSwill set the value oftxn-idtoTidusing a CaS instruction [27, 44].begin-tscolumnsContentHeadertxn-idend-ts...pointerFigure 1: Tuple Format– The basic layout of a physical version of a tuple.Any transaction that attempts to updateAis aborted if thistxn-idfield is neither zero or not equal to itsTid. The next two meta-datafields are thebegin-tsandend-tstimestamps that represent thelifetime of the tuple version. Both fields are initially set to zero. TheDBMS sets a tuple’sbegin-tstoINFwhen the transaction deletesit. The last meta-data field is thepointerthat stores the address ofthe neighboring (previous or next) version (if any)
  
  Concept
65. peerless07 22 Jun 2020
  
  in Public
  
  Transactions:The DBMS assigns a transactionTa unique,monotonically increasing timestamp as its identifier (Tid) whenthey first enter the system. The concurrency control protocols usethis identifier to mark the tuple versions that a transaction accesses.Some protocols also use it for the serialization order of transactions
  
  每一个交易都要有一个时间戳
  
  Concept
66. peerless07 22 Jun 2020
  
  in Public
  
  Foremost is that it canpotentially allow for greater concurrency than a single-version sys-tem.
  
  优点1:更加并发，相比与单版本系统
  
  读查询可以查阅历史版本，读写查询可以继续更新对象。
  
  如果旧版本不被移除，甚至可以“时间旅行”😂
  
  Concept
67. peerless07 22 Jun 2020
  
  in Public
  
  This is especially true for in-memory DBMSswhere disk is no longer the main bottleneck
  
  印证了之前的bottleneck
  
  Concept
68. peerless07 22 Jun 2020
  
  in Public
  
  A transaction management scheme permits end-users to access adatabase in a multi-programmed fashion while preserving the illu-
  
  让用户以为每一个程序都在单独执行：
  
  Atomicity和Isolation原则
  
  Concept
69. peerless07 22 Jun 2020
  
  in Public
  
  this previous work does not reflect recent trends in latch-free [27] and serializable [20] concurrency control, as well as in-memory storage [36] and hybrid workloads [40]
  
  过去的研究没有考虑的问题，因为disk-oriented的overhead主要是读写，而in-memory的overhead在于以上几方面
  
  Concept
70. peerless07 22 Jun 2020
  
  in Public
  
  including Oracle (since 1984 [4]), Postgres(since 1985 [41]), and MySQL’s InnoDB engine (since 2001). Butwhile there are plenty of contemporaries to these older systemsthat use a single-version scheme (e.g., IBM DB2, Sybase), almostevery new transactional DBMS eschews this approach in favor ofMVCC [37]. This includes both commercial (e.g., Microsoft Heka-ton [16], SAP HANA [40], MemSQL [1], NuoDB [3]) and academic(e.g., HYRISE [21], HyPer [36]) systems
  
  Example
71. peerless07 22 Jun 2020
  
  in Public
  
  Multi-versioning allows read-only transactionsto access older versions of tuples without preventing read-writetransactions from simultaneously generating newer versions.
  
  一般来讲更新操作会实时地override原有的tuple
  
  Concept
72. peerless07 22 Jun 2020
  
  in Public
  
  but almost every MVCC DBMS uses tuples because itprovides a good balance between parallelism versus the overheadof version tracking
  
  Concept
73. peerless07 22 Jun 2020
  
  in Public
  
  Thebasic idea of MVCC is that the DBMS maintains multiple physicalversions of each logical object in the database to allow operations onthe same object to proceed in parallel.
  
  How to make sure different physical copies are in sync?
  
  Concept Question
74. peerless07 22 Jun 2020
  
  in Public
  
  Computer architecture advancements has led to the rise of multi-core, in-memory DBMSs that employ efficient transaction man-agement mechanisms to maximize parallelism without sacrificingserializability.
  
  lecture原话
  
  Important
75. peerless07 22 Jun 2020
  
  in Public
  
  when there are a large numberof threads running in parallel, the synchronization overhead canoutweigh the benefits of multi-versioning
  
  如何保证延展性
  
  Question
76. peerless07 22 Jun 2020
  
  in Public
  
  Although MVCC was discovered inthe late 1970s, it is used in almost every major relational DBMSreleased in the last decade
  
  他lecture中提到了，1970年代的东西现在却被广泛使用
Visit annotations in context

Tags

Concept

Important

Experiment

Question

Example

Annotators

peerless07

URL

15721.courses.cs.cmu.edu/spring2018/papers/05-mvcc1/wu-vldb2017.pdf
Apr 2020
www.jianshu.com www.jianshu.com

AnalyticDB: Real-time OLAP Database System at Alibaba Cloud

2
1. peerless07 28 Apr 2020
  
  in Public
  
  注：由于存在很多协调节点，所以某个协调节点上缓存的版本可能不是最新版本。
  
  如何解决
2. peerless07 27 Apr 2020
  
  in Public
  
  他们在修改数据时，都会同步修改索引,这种方式在全列索引的情况下不可行（会造成写延时和写放大）。
  
  OLTP场景一般是高并发低时延，OLAP一般为低并发吞吐量大，
Visit annotations in context

Annotators

peerless07

URL

jianshu.com/p/ce05e29c4766
www.vldb.org www.vldb.org

p2059-zhan.pdf

4
1. peerless07 27 Apr 2020
  
  in Public
  
  hybridrow-column layou
  
  查一查
  
  https://downloads.teradata.com/database/articles/hybrid-rowcolumn-stores-a-general-and-flexible-approach
  
  especially those that scan most rows of a table but only a small subset of the columns, column-stores are clearly preferable. On the other hand, there any many workloads that contain very selective predicates and require access to the entire tuple for those rows which pass the predicate. For such workloads, row-stores are clearly preferable.
2. peerless07 27 Apr 2020
  
  in Public
  
  AnalyticDBmaintains all-column indexes in an asynchronous mannerwith acceptable overhead
  
  异步维护所有列的索引并且overhead较小
3. peerless07 27 Apr 2020
  
  in Public
  
  structured and complex data types
  
  不同类型的数据结构：
  
  结构化的数据
  
  非结构化的数据，向量化
4. peerless07 27 Apr 2020
  
  in Public
  
  实时数据分析
Visit annotations in context

Annotators

peerless07

URL

vldb.org/pvldb/vol12/p2059-zhan.pdf
guides.rubyonrails.org guides.rubyonrails.org

Active Record Basics — Ruby on Rails Guides

1
1. peerless07 17 Apr 2020
  
  in Public
  
  Mouse mice
  
  nb
Visit annotations in context

Annotators

peerless07

URL

guides.rubyonrails.org/active_record_basics.html
Mar 2020
localhost:3000 localhost:3000

Unswot

1
1. peerless07 24 Mar 2020
  
  in Public
  
  =6667&s
  
  aaa
Visit annotations in context

Annotators

peerless07

URL

localhost:3000/
beautyschool.webflow.io beautyschool.webflow.io

EuropeanPro

1
1. peerless07 21 Mar 2020
  
  in Public
  
  rtist, it is extremely important to keep your hands and workstations sanitized and supplies properly d
  
  sfsdf
Visit annotations in context

Annotators

peerless07

URL

beautyschool.webflow.io/lessons/lesson2
Feb 2020
piazza.com piazza.com

Fall 2019 CSE 127: Midterm Exam

7
1. peerless07 11 Feb 2020
  
  in Public
  
  (a) (6 points) What memory safety vulnerability does this code have?
  
  n might exceed the boundary (off the end) of either pointer dst or pointer src.
  
  step can be negative. (stack overflow)
  
  step to be greater then INT_MAX / 2 (integer overflow) (stack overflow)
2. peerless07 11 Feb 2020
  
  in Public
  
  (b) (6 points) To demonstrate the bug to Steve, Parisa wrote anattackfunction, partlydefined above. What parameters could she callvulncopy()with to trigger the memory-safety violation?
  
  dst： buffer to overflow src：pointer to the shellcode n：大于dst+src长度的整数 step：1
  
  和1里面提到的vulnerability一样
3. peerless07 11 Feb 2020
  
  in Public
  
  Cross-site request forgery (CSRF)
  
  link use post request to make attacker appear as the user
  
  Attack. unauthorized commands are transmitted from a user that the web application trusts
  
  Add CSRF token
4. peerless07 11 Feb 2020
  
  in Public
  
  Return-oriented programming (ROP).
  
  Attack.
  
  Damage it causes: The attack can run whatever code that can piece together.
  
  A way to mitigate it: CFI can mitigate.
5. peerless07 11 Feb 2020
  
  in Public
  
  Control-flow integrity (CFI)
  
  CFI is a technique for dynamically checking if indirect control transfersduring program execution are consistent with those described by the program source code
  
  Defense. Defend against control-flow attacks
6. peerless07 11 Feb 2020
  
  in Public
  
  True
  
  XSS 和 injection 有什么区别。仅仅限于javascript
7. peerless07 11 Feb 2020
  
  in Public
  
  False
  
  Cookie is generated by server to the the web browser
Visit annotations in context

Annotators

peerless07

URL

piazza.com/class_profile/get_resource/k52y3c7qerb24d/k6bs43bj6nw1eo
Jan 2020
cseweb.ucsd.edu cseweb.ucsd.edu

Computer Security

2
1. peerless07 31 Jan 2020
  
  in Public
  
  cy and anonymity. The goal of the course is to provide an appreciation of how to think adversarially with respect to computer systems as well as an
  
  ？？？？不懂
  
  Quesiton
2. peerless07 31 Jan 2020
  
  in Public
  
  on, covert channels, network s
  
  CSE127 就是打蜡吉
Visit annotations in context

Tags

Quesiton

Annotators

peerless07

URL

cseweb.ucsd.edu/~dstefan/cse127-winter20/
cseweb.ucsd.edu cseweb.ucsd.edu

Assignment 3 - Computer Security

1
1. peerless07 30 Jan 2020
  
  in Public
  
  You will be provided a two skeleton files (memhack.c and timehack.c) that you will use to exploit side channels present in a target library (sysapp.c) to obtain a protected key
  
  Goal
Visit annotations in context

Annotators

peerless07

URL

cseweb.ucsd.edu/~dstefan/cse127-winter20/pa/pa3/
www.math.ucsd.edu www.math.ucsd.edu

w20_180b_pm1.pdf

2
1. peerless07 27 Jan 2020
  
  in Public
  
  3.(15 points) Suppose thatˆX=(X1X2)is bivariate normal with mean ˆμ=(00)andcovariance matrix Σ =(1 00 1). Suppose that we multiplyˆXby an invertible 2×2 matrix
  
  Study linear transformation of bivariate normal
2. peerless07 27 Jan 2020
  
  in Public
  
  Suppose thatXandYare independent random variables withP(X= 1) =P(X=−1) =12andY∼Unif[−1,1]. Prove thatX+Y∼Unif[−2,2]
  
  Use MGF sum = product of each a. Write down the MGF for each function
  
  Calculate CDF
Visit annotations in context

Annotators

peerless07

URL

math.ucsd.edu/~ynemish/180b/w20_180b_pm1.pdf
Local file Local file

Simpson - Won't Somebody Please Think of the Children.pdf

3
1. peerless07 27 Jan 2020
  
  in Public
  
  This suggestion isradical in that it proposes something different to the prevailing ap-proaches in anti-hate speech law. But it is also mild in the sense thatit’s likely to be more agreeable to those who think that these pre-vailing approaches run unacceptably close to being outright prohi-bitions on certain forbidden opinions
  
  折衷
2. peerless07 27 Jan 2020
  
  in Public
  
  apro tantocase
  
  to such extent
3. peerless07 27 Jan 2020
  
  in Public
  
  prima facie
  
  初步证明
Annotators

peerless07
www.math.ucsd.edu www.math.ucsd.edu

180blecture2afterClass.pdf

2
1. peerless07 24 Jan 2020
  
  in Public
  
  Probabitityreviewiportantdsretedistributionst.BYBer(p),pelo,1),"successlfailureexpriment"IfX-Berlp),thenPCK1)=t-PCX-a)=p,E(X)=p,Var(x)=pa-p)Z.BinominalB(hip),neTN.pe(Oil).HX-BCnip),thenpx(K)=(f)pkf-pjk.kthor.int,E=np,Var(X)-npltp)"Humberofsuccessesinaseriesofnindep.Bernoullitrials"3.GéométrieGeomlp),pelai).IfXrCream(p),thenpx(K)=p(tp)"",K-tir.---E=¥,Var(X)="Firstsuccessfuttrialinaseriesofindep.Bernoullitrials"4.PoissonPois(d),d>oIfXnPois(d),thenpx(K)=Ige",K-Oil,--E(X)=D,Yar(X)=d
  
  比较常见的RV。
2. peerless07 24 Jan 2020
  
  in Public
  
  XTY-Pois(dtn)
  
  两个Poisson variable相加，sum还是Poisson（系数相加）
Visit annotations in context

Annotators

peerless07

URL

math.ucsd.edu/~ynemish/180b/180blecture2afterClass.pdf
Local file Local file

Kant - Groundwork (selections) (2).pdf

1
1. peerless07 22 Jan 2020
  
  in Public
  
  By all means! It is just then that the worth of character 4:399 comes out, which is moral and incomparably the highest, namely that he is beneficent not from inclination but from duty
  
  我们崇尚那些即使自己很垃圾但依然给世界带来爱与和平的人。这种人我们赋予他高尚的品格。
Annotators

peerless07
idl.hbdlib.cn idl.hbdlib.cn

151020.pdf

1
1. peerless07 20 Jan 2020
  
  in Public
  
  不可通约性
  
  https://en.wikipedia.org/wiki/Commensurability_(philosophy_of_science)
Visit annotations in context

Annotators

peerless07

URL

idl.hbdlib.cn/book/00000000000000/pdfbook/012/001/151020.pdf

peerless07

Annotations: 105

Joined: January 8, 2020

Annotators

URL

Tags

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

URL

Tags

Annotators

URL

Annotators

URL

Annotators

URL

Annotators

Annotators

URL

Annotators

Annotators

URL