2 Matching Annotations
  1. Mar 2019
    1. Within the frame-based loss term, we apply a weighting to encourage accuracy at the start of the note.

      From the Onsets and Frames paper

      "we define the weighted frame loss as:

      $$L_{frame}(l,p) = \begin{cases} c L'_{frame}(l,p) & t_1 \leq t \leq t_2 \\ \frac{c}{t-t_2} L'_{frame}(l,p) & t_2 < t \leq t_3 \\ L'_{frame}(l,p) & \text{ elsewhere } \end{cases}$$

      where c = 5.0 as determined with coarse hyperparameter search."
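
      The piecewise weighting quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes that t, t1, t2 and t3 are integer frame indices, that t1 and t2 bound the onset region and t3 marks the end of the note (the quote does not define them), and that the unweighted loss L'_frame is a per-frame binary cross-entropy. The function names are hypothetical.

```python
import math


def frame_loss_weight(t, t1, t2, t3, c=5.0):
    """Weight applied to the unweighted frame loss at frame t.

    Assumption: t1..t2 is the onset region and t3 is the last frame of the
    note; the quoted passage does not spell this out.
    """
    if t1 <= t <= t2:
        return c
    if t2 < t <= t3:
        # Decays as 1 / (t - t2); as written, it drops below 1 once t - t2 > c.
        return c / (t - t2)
    return 1.0


def binary_cross_entropy(label, prob, eps=1e-7):
    """Stand-in for L'_frame (an assumption, see the lead-in)."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def weighted_frame_loss(labels, probs, t1, t2, t3, c=5.0):
    """Per-frame weighted loss for one note on one pitch."""
    return [
        frame_loss_weight(t, t1, t2, t3, c) * binary_cross_entropy(label, prob)
        for t, (label, prob) in enumerate(zip(labels, probs))
    ]


# Example: a note whose onset covers frames 2-3 and which ends at frame 6.
weights = [frame_loss_weight(t, t1=2, t2=3, t3=6) for t in range(10)]
```

      With integer frames the first frame after the onset region still receives the full weight c, because t - t2 = 1 there; the weight then falls off with distance from the onset.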

    2. we also restrict the final output of the model to start new notes only when the onset detector is confident that a note onset is in that frame.

      From the Onsets and Frames paper

      "We also use the thresholded output of the onset detector during the inference process, similar to concurrent research described in [24]. An activation from the frame detector is only allowed to start a note if the onset detector agrees that an onset is present in that frame."


      From the referenced paper [24]

      "Finally, we peak pick the two-channel activation matrix to convert the framewise piano roll to a list of note events. Per note, we step through each time frame and place an onset at positions where the articulation channel is above a set threshold, and then include all frames onward until the sustain channel is under another fixed threshold, at which point we output an offset. If a new articulation is found during an active note event we simply fragment it by outputting additional offsets and onsets."


      Here, the articulation channel is the parallel piano-roll channel in which only the frames corresponding to note onsets are active, so it plays the role of our onset labels (onsets = articulations in the authors' lingo). The sustain channel corresponds to our frame-level predictions, i.e. the note-level frame labels.
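
      The decoding step quoted from [24] can be sketched for a single pitch as follows. This is a rough illustration under stated assumptions, not either paper's code: an onset is taken to be a rising edge of the articulation activation above its threshold (the quote only says "above a set threshold"), both thresholds default to a hypothetical 0.5, and the function name is made up. The same loop also reflects the Onsets and Frames rule that a frame activation cannot start a note without an onset.

```python
def decode_notes(articulation, sustain, onset_threshold=0.5, sustain_threshold=0.5):
    """Convert two activation sequences for one pitch into note events.

    Returns (onset_frame, offset_frame) pairs; the offset frame is exclusive.
    """
    notes = []
    start = None          # onset frame of the currently active note, if any
    prev_onset = False    # was the articulation channel above threshold last frame?
    for t, (a, s) in enumerate(zip(articulation, sustain)):
        is_onset = a > onset_threshold
        new_onset = is_onset and not prev_onset  # rising edge (assumption)
        if start is not None and new_onset:
            # New articulation during an active note: fragment it.
            notes.append((start, t))
            start = t
        elif start is not None and s < sustain_threshold and not is_onset:
            # Sustain dropped below its threshold: output an offset.
            notes.append((start, t))
            start = None
        elif start is None and new_onset:
            # Only the articulation (onset) channel may start a note.
            start = t
        prev_onset = is_onset
    if start is not None:
        # Close a note that is still active at the end of the sequence.
        notes.append((start, len(articulation)))
    return notes


articulation = [0.0, 0.9, 0.1, 0.1, 0.8, 0.1, 0.0, 0.0]
sustain = [0.0, 0.9, 0.8, 0.7, 0.9, 0.6, 0.2, 0.0]
events = decode_notes(articulation, sustain)
```

      In this example the second articulation at frame 4 splits the held note in two, and a sustain activation with no accompanying articulation would produce no note at all.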