9 Matching Annotations
- Mar 2023
-
github.com
-
user = User.new(password: "あ" * 25) # 25 characters, 75 bytes
characters vs. bytes
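The distinction in the quoted comment can be checked directly. This is a minimal Python sketch rather than the Ruby of the quoted line; the variable names are illustrative, and UTF-8 is assumed as the encoding, which is what the "75 bytes" figure implies.

```python
# Hypothetical stand-in for the password in the quoted Ruby line.
password = "あ" * 25

# Length in characters (Unicode code points).
char_count = len(password)

# Length in bytes once the string is encoded as UTF-8 (assumed encoding).
# "あ" (U+3042) takes three bytes in UTF-8.
byte_count = len(password.encode("utf-8"))

print(char_count, byte_count)
```

Any limit stated in bytes has to be checked against the second number, not the first.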
-
- Jan 2023
-
-
And misunderstandings so easily occur here, when we're talking about encodings, but not those encodings, the other encoding, which is really charset. And it's especially hard because you can't visually tell the difference and in so many cases everything still works even though it is wrong.
-
- Nov 2022
-
developer.mozilla.org
-
The btoa() function takes a JavaScript string as a parameter. In JavaScript strings are represented using the UTF-16 character encoding: in this encoding, strings are represented as a sequence of 16-bit (2 byte) units. Every ASCII character fits into the first byte of one of these units, but many other characters don't. Base64, by design, expects binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. So if you pass a string into btoa() containing characters that occupy more than one byte, you will get an error, because this is not considered binary data:
-
If you need to encode Unicode text as ASCII using btoa(), one option is to convert the string such that each 16-bit unit occupies only one byte.
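The same constraint can be illustrated outside the browser. The sketch below is a Python analogy, not the btoa() API itself: Python's base64.b64encode also accepts only binary data, so text has to be turned into bytes first. Encoding as UTF-16LE splits each 16-bit unit into two bytes, which is roughly the idea the quoted passage describes; the sample string and variable names are assumptions for illustration.

```python
import base64

# Sample text mixing ASCII with characters outside Latin-1 (illustrative only).
text = "a \u0100 \u6587 \U0001F984"

# Passing text directly is rejected: Base64 expects binary input.
try:
    base64.b64encode(text)
    rejected = False
except TypeError:
    rejected = True

# Convert the text so that every 16-bit unit becomes two single bytes,
# then Base64-encode those bytes into an ASCII string.
utf16_bytes = text.encode("utf-16-le")
encoded = base64.b64encode(utf16_bytes).decode("ascii")

# Reverse both steps to recover the original text.
decoded = base64.b64decode(encoded).decode("utf-16-le")

print(encoded)
print(decoded == text)
```

Encoding as UTF-8 instead of UTF-16LE would work equally well here and usually produces shorter output for mostly-ASCII text; the only requirement is that encoder and decoder agree.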
-
-
www.w3.org
-
The character exists in Unicode/ISO 10646, but not in the character encoding used for the document. In this case, use Numeric Character References (NCRs, example: &#x5678;).
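A rough sketch of the rule in the quote: keep every character the document encoding can represent and replace the rest with hexadecimal NCRs. The helper name to_ncr and the choice of Latin-1 as the document encoding are assumptions for illustration; the sample character is U+5678 (噸).

```python
import html


def to_ncr(text: str, encoding: str) -> str:
    """Replace characters that `encoding` cannot represent with hex NCRs."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)  # representable: keep the character as-is
            out.append(ch)
        except UnicodeEncodeError:
            out.append(f"&#x{ord(ch):X};")  # not representable: emit an NCR
    return "".join(out)


# U+5678 does not exist in Latin-1, so it becomes an NCR.
escaped = to_ncr("1 \u5678", "latin-1")

# The standard library offers a similar error handler (decimal NCRs).
builtin = "1 \u5678".encode("latin-1", errors="xmlcharrefreplace")

# An HTML parser turns the NCR back into the original character.
restored = html.unescape(escaped)

print(escaped, builtin, restored)
```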
-
-
en.wikipedia.org
- Aug 2021
-
www.w3.org
-
�Yes, but how will we ever keep track of such a large project?�
I'm unsure of the text encoding here. I'm forcing the bytes to be interpreted as Unicode, hence the appearance of the replacement character. My browser's default is to treat this document as "Central European (Windows)", but in that case they appear as majuscule and minuscule S-cedilla characters (e.g. Şhypertextş). By a reasonable guess, these are supposed to be opening and closing quotation marks. I've seen these appear in other TBL-authored documents from the same era.
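The two renderings described in this note can be reproduced in a few lines. The raw byte values below are an assumption inferred from the S-cedilla rendering: in Windows-1250 ("Central European (Windows)"), Ş and ş are bytes 0xAA and 0xBA. The actual bytes in the document are not confirmed here.

```python
# Assumed raw bytes around the word, inferred from the cp1250 rendering.
raw = b"\xaahypertext\xba"

# Interpreted as Windows-1250, the stray bytes become S-cedilla letters.
as_cp1250 = raw.decode("cp1250")

# Interpreted as UTF-8, 0xAA and 0xBA are invalid on their own, so each one
# is replaced by U+FFFD, the replacement character.
as_utf8 = raw.decode("utf-8", errors="replace")

print(as_cp1250)
print(as_utf8)
```

Neither result is the intended text; both only show how the same bytes look under a wrong guess at the encoding.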
-
- Apr 2021
-
en.wikipedia.org
-
The use of U+212B 'Angstrom sign', which was encoded due to round-trip mapping compatibility with an East-Asian character encoding, is discouraged, and the preferred representation is U+00C5 'capital letter A with ring above', which has the same glyph.
Is there a difference in semantic meaning between the two? And if so, what is it? 
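One way to probe the question is to ask the Unicode Character Database how it treats the two code points. This is a small sketch using Python's unicodedata module; it shows that the two are distinct code points with different names, yet canonically equivalent, so normalization maps the Angstrom sign to the letter. It does not settle whether any author intended a semantic difference.

```python
import unicodedata

angstrom_sign = "\u212b"  # U+212B ANGSTROM SIGN
a_with_ring = "\u00c5"    # U+00C5 LATIN CAPITAL LETTER A WITH RING ABOVE

# As raw code points, the two strings are different.
same_code_point = angstrom_sign == a_with_ring

# Under NFC normalization, U+212B is replaced by U+00C5.
nfc_equal = unicodedata.normalize("NFC", angstrom_sign) == a_with_ring

# The character names record the different origins.
names = (unicodedata.name(angstrom_sign), unicodedata.name(a_with_ring))

print(same_code_point, nfc_equal, names)
```

In practice this means a plain string comparison or search can miss one form when given the other unless both sides are normalized first.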
-