20 Matching Annotations
  1. May 2023
    1. articulates requirements for readability, stating that identifiers must be: Any printable character from the Universal Character Set of ISO/IEC 10646 (ISO 2012): UTF-8 encoding is required; Case insensitive: only ASCII case folding is allowed.

      {UTF-8} {ASCII Case Folding}

  2. Dec 2022
  3. Nov 2022
    1. The btoa() function takes a JavaScript string as a parameter. In JavaScript, strings are represented using the UTF-16 character encoding: in this encoding, strings are represented as a sequence of 16-bit (2-byte) units. Every ASCII character fits into the first byte of one of these units, but many other characters don't. Base64, by design, expects binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. So if you pass a string into btoa() containing characters that occupy more than one byte, you will get an error, because this is not considered binary data:
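
The limitation described above is easy to reproduce. A minimal sketch (the TextEncoder workaround at the end is the commonly suggested fix, not part of the quoted text):

```javascript
// btoa() accepts only "binary strings": every char code must be <= 0xFF.
console.log(btoa("hello")); // ASCII-only input is fine: "aGVsbG8="

// A character like "’" (U+2019) occupies a 16-bit unit above 0xFF,
// so btoa() throws an InvalidCharacterError:
try {
  btoa("’");
} catch (e) {
  console.log("btoa rejected the multi-byte character");
}

// Common workaround: UTF-8-encode first, then map each byte to a
// one-byte character before calling btoa().
const bytes = new TextEncoder().encode("’"); // Uint8Array [226, 128, 153]
const base64 = btoa(String.fromCharCode(...bytes));
console.log(base64); // "4oCZ"
```

Decoding reverses the two steps: atob() back to a binary string, then a TextDecoder to reassemble the UTF-8 bytes.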
  4. Oct 2022
    1. Of course, if super-intelligent aliens arrive on our planet, bearing a writing system with billions of characters, I will withdraw this proposal and donate the name "UTF-64" to the Unicode Consortium.
  5. May 2022
    1. You can use a heuristic: only change strings that have one of the bad characters in them, like â. This works well if a character like â won’t ever appear in a valid string. The last time I fixed this kind of bug, though, I wanted to play it safe. I used another useful tool to help: my eyes. Whenever I found a badly encoded string, I printed it out, along with its replacement:
      • no magic solutions!
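
The heuristic above (only touch strings containing a telltale character, and print each replacement for eyeball review) can be sketched like this. Assumptions: JavaScript has no built-in CP-1252 *encoder*, so the `toByte` table is built here by inverting TextDecoder's byte-to-character mapping:

```javascript
// Build a reverse CP-1252 table: character -> original byte.
const cp1252 = new TextDecoder("windows-1252");
const toByte = new Map();
for (let b = 0; b < 256; b++) {
  toByte.set(cp1252.decode(new Uint8Array([b])), b);
}

// Undo mojibake: re-encode each character to its CP-1252 byte, then
// read the resulting byte sequence as the UTF-8 it originally was.
function fixMojibake(s) {
  const bytes = Uint8Array.from([...s], (ch) => toByte.get(ch));
  return new TextDecoder("utf-8").decode(bytes);
}

// Heuristic: only change strings that contain a bad character like â,
// and print each string next to its replacement so a human can check.
for (const s of ["plain text", "donâ€™t"]) {
  if (s.includes("â")) {
    console.log(`${s}  ->  ${fixMojibake(s)}`);
  }
}
```

This is a sketch under the assumption that every character in a bad string maps back to a single CP-1252 byte; a robust version would skip strings where `toByte.get` comes up empty.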
    2. It seems like those three bytes should be read as UTF-8, where they’d represent a curly quote. Instead, each byte is showing up as a different character. So, which encoding would represent [226, 128, 153] as â€™? If you look at a few tables of popular encodings, you’ll see it’s Windows-1252.

      - In UTF-8 it's 3 bytes - In W1252 one byte = one char

    1. This only tells the client which encoding to use to interpret and display the characters. But the actual problem is that you're already sending â€™ (encoded in UTF-8) to the client instead of ’. The client is correctly displaying â€™ using the UTF-8 encoding. If the client was misinstructed to use, for example, ISO-8859-1, you would likely have seen Ã¢â‚¬â„¢ instead.
      • HERE IT IS!
    2. So what's the problem? It's the ’ (RIGHT SINGLE QUOTATION MARK, U+2019) character which is being decoded as CP-1252 instead of UTF-8. If you check the encodings table, you see that this character is composed in UTF-8 of the bytes 0xE2, 0x80 and 0x99. If you check the CP-1252 code page layout, you'll see that each of those bytes stands for the individual characters â, € and ™.
      • HERE IT IS!
    1. This works for me: PowerShell -Command "TREE /F | Out-File output.txt -Encoding utf8"
      • WITH POWERSHELL
    2. You should add the command chcp 65001 before the dir command to change the code page to UTF-8: @echo off — CHCP 65001>nul — dir>1.txt — Further reading about the CHCP command
      • DIR NAMES IN UTF-8
    1. Most European keyboards have keycap labels for the apostrophe and both accents. These have always looked as they do in the ISO and Unicode standards. The photo below shows the relevant keys highlighted on a standard German PC keyboard, which has the acute/grave accent key left of and the number-sign/apostrophe key below the backspace key:
      • unicode!
  6. Sep 2021