17 Matching Annotations
  1. Nov 2023
  2. nightlies.apache.org nightlies.apache.org
    1. Note that the write*() methods on DataStream are mainly intended for debugging purposes. They are not participating in Flink’s checkpointing, this means these functions usually have at-least-once semantics. The data flushing to the target system depends on the implementation of the OutputFormat. This means that not all elements send to the OutputFormat are immediately showing up in the target system. Also, in failure cases, those records might be lost. For reliable, exactly-once delivery of a stream into a file system, use the FileSink. Also, custom implementations through the .addSink(...) method can participate in Flink’s checkpointing for exactly-once semantics.

      生产禁用write*(),改用addSink()

  3. Dec 2022
  4. Jun 2022
  5. nightlies.apache.org nightlies.apache.org
    1. Flink supports Counters, Gauges, Histograms and Meters.

      计数器,仪表,直方图,

    1. 拥有一个预先存在的集群可以节省大量时间申请资源和启动 TaskManager
    2. 每个 TaskManager 有一个 slot,这意味着每个 task 组都在单独的 JVM 中运行(例如,可以在单独的容器中启动)。具有多个 slot 意味着更多 subtask 共享同一 JVM。同一 JVM 中的 task 共享 TCP 连接(通过多路复用)和心跳信息。它们还可以共享数据集和数据结构,从而减少了每个 task 的开销。
    3. 调整 task slot 的数量,用户可以定义 subtask 如何互相隔离
    4. 每个 worker(TaskManager)都是一个 JVM 进程
    5. 算子链接成 task 是个有用的优化:它减少线程间切换、缓冲的开销,并且减少延迟的同时增加整体吞吐量。链行为是可以配置的;请参考链文档以获取详细信息。
    6. JobMaster 负责管理单个JobGraph的执行。Flink 集群中可以同时运行多个作业,每个作业都有自己的 JobMaster。
    7. 决定何时调度下一个 task(或一组 task)、对完成的 task 或执行失败做出反应、协调 checkpoint、并且协调从失败中恢复等等
    8. JobManager 具有许多与协调 Flink 应用程序的分布式执行有关的职责
  6. May 2022
  7. nightlies.apache.org nightlies.apache.org
    1. we define a tumbling window with the size of 2 milliseconds, which results in windows of the form [0,1], [2,3],

      相当于[0,2),[2,4)

    2. Because this behaves like an inner join, elements of one stream that do not have elements from another stream in their tumbling window are not emitted!

      inner-join

    3. Those elements that do get joined will have as their timestamp the largest timestamp that still lies in the respective window. For example a window with [5, 10) as its boundaries would result in the joined elements having 9 as their timestamp.

      那些确实被加入的元素将具有仍然位于相应窗口中的最大时间戳作为它们的时间戳。例如,以 [5, 10) 作为其边界的窗口将导致连接的元素以 9 作为其时间戳。

    4. like an inner-join
  8. Feb 2021
    1. Consider the amount of data and the speed of the data, if low latency is your priority use Akka Streams, if you have huge amounts of data use Spark, Flink or GCP DataFlow.

      For low latency = Akka Streams

      For huge amounts of data = Spark, Flink or GCP DataFlow