artists: fix misnormalization of emoji in other names.

Fix `normalize_whitespace` to not strip zero-width joiner characters
(U+200D). These characters are used in emoji and stripping them breaks
some artist other names that use emoji.
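
As a rough illustration of the breakage described above, the sketch below applies the old and new character classes to a single emoji ZWJ sequence. The helper `strip_zero_width` and the sample name are hypothetical and written only for this example; they are not Danbooru's actual `normalize_whitespace` method.

```ruby
# Hypothetical stand-ins for the old and new character classes in this commit.
old_pattern = /[\u180E\u200B\u200C\u200D\u2060\uFEFF]/
new_pattern = /[\u180E\u200B\u200C\u2060\uFEFF]/

# Illustrative helper (not part of Danbooru): remove every match of pattern.
def strip_zero_width(text, pattern)
  text.gsub(pattern, "")
end

# Woman (U+1F469) + zero-width joiner (U+200D) + artist palette (U+1F3A8)
# is rendered as one "woman artist" emoji by fonts that support the sequence.
name = "\u{1F469}\u200D\u{1F3A8}"

old_result = strip_zero_width(name, old_pattern)
new_result = strip_zero_width(name, new_pattern)

# Print the code points so the difference is visible even without emoji fonts.
puts old_result.codepoints.map { |c| format("U+%04X", c) }.join(" ")
puts new_result.codepoints.map { |c| format("U+%04X", c) }.join(" ")
```

With the old pattern the joiner is removed and the name degrades into two separate emoji; with the new pattern the sequence is left untouched, while the other zero-width characters are still stripped.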
evazion
2021-01-10 02:46:20 -06:00
parent 0899194f6b
commit d18dc573fb
2 changed files with 4 additions and 2 deletions

@@ -46,8 +46,9 @@ module Danbooru
# Normalize various horizontal space characters to ASCII space.
text = gsub(/\p{Zs}|\t/, " ")
-# Strip various zero width space characters.
-text = text.gsub(/[\u180E\u200B\u200C\u200D\u2060\uFEFF]/, "")
+# Strip various zero width space characters. Zero width joiner (200D)
+# is allowed because it's used in emoji.
+text = text.gsub(/[\u180E\u200B\u200C\u2060\uFEFF]/, "")
# Normalize various line ending characters to CRLF.
text = text.gsub(/\r?\n|\r|\v|\f|\u0085|\u2028|\u2029/, "\r\n")