29 Matching Annotations
  1. Feb 2024
    1. The difference between the predicted and actual values is captured through the loss function and back-propagated into the network,

      This is the difference between this approach and our delta LSTMs. Our delta LSTMs treat the target as a word in the vocabulary (because of the one-hot encoding), while this approach treats it as a numerical value and uses the difference to compute the loss function.
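
      A minimal sketch of that contrast, assuming squared error for the numerical approach and cross-entropy over a one-hot target for ours. The delta vocabulary and the probabilities are made up for illustration; neither loss is confirmed as the exact one used in the paper.

```python
import math

def squared_error(predicted, actual):
    """Regression-style loss: grows with the numerical distance."""
    return (predicted - actual) ** 2

def cross_entropy(probs, target_index):
    """Classification-style loss against a one-hot target."""
    return -math.log(probs[target_index])

# Hypothetical delta vocabulary (illustrative values only).
delta_vocab = [-64, 64, 128, 4096]
actual_delta = 64
target_index = delta_vocab.index(actual_delta)

# Regression: a near miss (128) is penalised far less than a far miss (4096).
reg_near = squared_error(128, actual_delta)
reg_far = squared_error(4096, actual_delta)

# Classification: both wrong guesses leave the same probability (0.10)
# on the true class, so they receive the same loss.
probs_near = [0.05, 0.10, 0.80, 0.05]  # most mass on 128
probs_far = [0.05, 0.10, 0.05, 0.80]   # most mass on 4096
cls_near = cross_entropy(probs_near, target_index)
cls_far = cross_entropy(probs_far, target_index)

print(reg_near < reg_far, math.isclose(cls_near, cls_far))
```

      With a one-hot target, the loss carries no notion of how far a wrong delta is from the right one, whereas the numerical loss does.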

    2. Most importantly, they accept top-k predictions at a time, so as to increase the chances of a correct prediction

      In my understanding, higher k => lower accuracy, higher coverage, and more timeliness issues
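
      A toy check of the accuracy/coverage part of this, assuming "coverage" means the fraction of accesses whose page appears among the k candidates and "accuracy" means useful prefetches divided by issued prefetches. The scores and page names are invented, and timeliness is not modelled.

```python
def top_k(scores, k):
    """Return the k candidate pages with the highest scores."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

def evaluate(predictions, actual_pages, k):
    """Return (coverage, accuracy) when k prefetches are issued per access."""
    hits = 0
    issued = 0
    for scores, actual in zip(predictions, actual_pages):
        chosen = top_k(scores, k)
        issued += len(chosen)
        hits += actual in chosen
    coverage = hits / len(actual_pages)  # accesses that were covered
    accuracy = hits / issued             # issued prefetches that were useful
    return coverage, accuracy

# Hypothetical per-access scores over three candidate pages.
predictions = [
    {"A": 0.6, "B": 0.3, "C": 0.1},
    {"A": 0.2, "B": 0.5, "C": 0.3},
    {"A": 0.4, "B": 0.35, "C": 0.25},
]
actual_pages = ["A", "B", "B"]

results = {k: evaluate(predictions, actual_pages, k) for k in (1, 2, 3)}
for k, (coverage, accuracy) in results.items():
    print(k, round(coverage, 3), round(accuracy, 3))
```

      On this toy data coverage rises with k while the useful fraction of issued prefetches falls. Real traces need not be this clean.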

    3. when the output value space is significantly large (number of different pages), the RNN prediction accuracy tends to be low

      This seems to be an issue no matter what ML architecture we deploy.

  2. Sep 2023
    1. This extra wait-time due to lazy cache eviction policy adds to the overall latency, especially in a high memory pressure scenario

      A previous paper we read (The Working Set Model for Program Behavior, Peter J. Denning) suggested that we should not replace until we absolutely have to, because aggressively preloading pages can be futile.

    2. Data path latencies for two access patterns. Memory disaggregation systems have some constant implementation overheads that cap their minimum latency to around 1 μs

      Sequential prefetching is performing worse than regular disk accesses?

    3. Linux ABIs

      An ABI (Application Binary Interface) defines how data structures or computational routines are accessed in machine code, which is a low-level, hardware-dependent format.

    1. NUMA nodes

      Non-uniform memory access is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory.

    2. A/B testing methodology

      A/B testing (also known as split testing or bucket testing) is a methodology for comparing two versions of a webpage or app against each other to determine which one performs better.

  3. Jul 2023
  4. Jun 2023
    1. We can do this by following the pushes and pops of the stack through the dataflow graph showing that they are balanced between subsequent executions of the graph kernel.

      So, let's say our next 10 instructions are 5 POPs followed by 5 PUSHes. Can we then avoid changing the stack pointer after every instruction?
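
      One way to picture this, as a sketch under my own assumptions (8-byte slots, a downward-growing stack, and a hypothetical `resolve_stack_offsets` helper; this is not the paper's mechanism): if the pushes and pops balance, each one can be rewritten as a load or store at a fixed offset from the entry stack pointer, and the net change is zero.

```python
WORD = 8  # assumed stack slot size in bytes

def resolve_stack_offsets(ops):
    """Map each PUSH/POP to a fixed offset from the stack pointer at entry.

    Returns (accesses, net_change). A net_change of 0 means the sequence is
    balanced, so the stack pointer is the same on exit as on entry.
    """
    offset = 0
    accesses = []
    for op in ops:
        if op == "PUSH":
            offset -= WORD                     # stack grows downwards
            accesses.append(("store", offset))
        elif op == "POP":
            accesses.append(("load", offset))
            offset += WORD
        else:
            raise ValueError(f"unknown op: {op}")
    return accesses, offset

ops = ["POP"] * 5 + ["PUSH"] * 5
accesses, net_change = resolve_stack_offsets(ops)
print(accesses)
print("net stack pointer change:", net_change)
```

      For the 5 POP + 5 PUSH case the net change is 0, so in this model the stack pointer never has to be updated between the instructions.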

    2. If the memory latency given by the load chain is higher than the independent work executed between subsequent delinquent loads, hiding the memory latency is impossible, even with infinite run-ahead.

      Why would this be a problem if we have an infinite run-ahead (which I am assuming means we can look far ahead into the pattern and know what needs to be fetched)?
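
      A possible reading, sketched as a toy model. It assumes the delinquent loads form a serial chain (each address depends on the data returned by the previous load, as in pointer chasing), so run-ahead cannot issue the next load before the previous one returns. The cycle counts and function names are hypothetical.

```python
def cycles_per_load(chain_latency, independent_work):
    """Steady-state cycles per delinquent load in the toy model.

    Independent work overlaps with the outstanding load, but the next load
    in the chain cannot start until the current one has returned.
    """
    return max(chain_latency, independent_work)

def exposed_stall(chain_latency, independent_work):
    """Cycles per load that no amount of run-ahead can hide in this model."""
    return max(0, chain_latency - independent_work)

# Hypothetical numbers: 200-cycle chain latency.
for work in (50, 200, 400):
    print(work, cycles_per_load(200, work), exposed_stall(200, work))
```

      Under that assumption, knowing the future pattern does not help: the data dependence, not the lookahead distance, bounds the rate at which loads can be issued.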

    1. Finally, dealing with rarely occurring deltas is non-trivial.

      I imagine the cache misses caused by rarely occurring deltas won't be that expensive because of their rarity. Why would this case be non-trivial then?

      PS: I understand the concern about rare words in NLP, but I imagine that for prefetching we won't need such high accuracy and thus can neglect the rare ones.

    2. we need high resolution in every area where addresses are used

      We need high resolution because we cannot prefetch large chunks of memory, so we can't use this quantization approach.
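
      A rough back-of-the-envelope for why uniform quantization loses too much resolution. The 48-bit address space, 64-byte cache line, and bin count are my own assumptions for illustration, not figures from the paper.

```python
ADDRESS_BITS = 48       # assumed virtual address width
CACHE_LINE_BYTES = 64   # assumed cache line size

def bin_width_bytes(num_bins, address_bits=ADDRESS_BITS):
    """Bytes covered by one bin when the address space is split uniformly."""
    return (1 << address_bits) // num_bins

def lines_per_bin(num_bins, address_bits=ADDRESS_BITS):
    """Number of distinct cache lines that collapse into a single bin."""
    return bin_width_bytes(num_bins, address_bits) // CACHE_LINE_BYTES

num_bins = 1 << 16  # hypothetical output vocabulary of 65,536 bins
print(bin_width_bytes(num_bins), lines_per_bin(num_bins))
```

      Under these assumptions each bin spans gigabytes, so predicting the right bin says almost nothing about which cache line to fetch.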

    3. Stride prefetchers

      A stride prefetcher observes the strides between successive memory accesses and predicts that the same pattern will continue in the future.
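
      A minimal sketch of the idea, assuming a single global stream and a simple "same stride seen twice in a row" rule. The class name and the `degree` parameter are hypothetical; real stride prefetchers usually keep per-PC tables and confidence counters.

```python
class StridePrefetcher:
    """Toy stride prefetcher tracking a single access stream."""

    def __init__(self, degree=1):
        self.last_addr = None
        self.last_stride = None
        self.degree = degree  # how many strides ahead to prefetch

    def access(self, addr):
        """Record an access and return the addresses to prefetch."""
        prefetches = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Predict only once the same non-zero stride repeats.
            if stride != 0 and stride == self.last_stride:
                prefetches = [addr + stride * i
                              for i in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_addr = addr
        return prefetches

prefetcher = StridePrefetcher(degree=2)
trace = [100, 164, 228, 292]
issued = [prefetcher.access(addr) for addr in trace]
print(issued)
```

      The first two accesses only establish the stride; from the third access onward the prefetcher issues addresses that continue the pattern.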