Fix #5365: Don't allow whitespace-only text submission.
Fix bug where it was possible to submit blank text in various text fields. The cause was that `String#blank?` doesn't consider certain Unicode characters to be blank. `blank?` is defined as `match?(/\A[[:space:]]*\z/)`, where `[[:space:]]` matches ASCII whitespace (space, tab, newline, etc.) and Unicode characters in the Separator (Z) category [1]. However, there are other space-like characters outside this category. These include U+200B (ZERO WIDTH SPACE) and many more.

It turns out the "Default Ignorable Code Points" [2][3] are what we're after. These are the 400 or so formatting and control characters that are invisible when displayed.

Note that some other control characters aren't invisible when rendered; instead, they're shown with a placeholder glyph. These include the C0 (ASCII) and C1 control codes [4], certain Unicode format control characters [5], and unassigned, reserved, and private-use code points.

There is one outlier: U+2800 (BRAILLE PATTERN BLANK) [6]. This character is visually blank, but it is classified as neither a space nor a default ignorable code point, so it is matched explicitly.

[1]: https://codepoints.net/search?gc[]=Z
[2]: https://codepoints.net/search?DI=1
[3]: https://www.unicode.org/review/pr-5.html
[4]: https://codepoints.net/search?gc[]=Cc
[5]: https://codepoints.net/search?gc[]=Cf
[6]: https://codepoints.net/U+2800
[7]: https://en.wikipedia.org/wiki/Whitespace_character
[8]: https://character.construction/blanks
[9]: https://invisible-characters.com
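The gap is easy to reproduce in plain Ruby, without Rails. The sketch below assumes, as the commit message states, that `blank?` reduces to `/\A[[:space:]]*\z/`, and compares that pattern against the `INVISIBLE_REGEX` added in this commit. The sample characters and the `results` hash are illustrative choices, not part of the change.

```ruby
# Pattern behind ActiveSupport's String#blank?, per the commit message.
BLANK_REGEX = /\A[[:space:]]*\z/

# Pattern added by this commit as INVISIBLE_REGEX.
INVISIBLE_REGEX = /\A[[:space:]\p{di}\u2800]*\z/

# Illustrative single-character inputs (not taken from the commit).
SAMPLES = {
  "U+0020 SPACE" => " ",
  "U+00A0 NO-BREAK SPACE" => "\u00A0",
  "U+200B ZERO WIDTH SPACE" => "\u200B",
  "U+2060 WORD JOINER" => "\u2060",
  "U+FEFF ZERO WIDTH NO-BREAK SPACE" => "\uFEFF",
  "U+2800 BRAILLE PATTERN BLANK" => "\u2800",
  "U+0001 START OF HEADING" => "\u0001",
}.freeze

# Map each sample to [matches the blank? pattern, matches the invisible? pattern].
results = SAMPLES.transform_values do |str|
  [str.match?(BLANK_REGEX), str.match?(INVISIBLE_REGEX)]
end

results.each do |name, (blank, invisible)|
  puts format("%-34s blank=%-5s invisible=%s", name, blank, invisible)
end
```

U+200B, U+2060, U+FEFF, and U+2800 pass only the second check, while U+0001, a C0 control code that is rendered with a placeholder glyph, fails both.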
@@ -5,6 +5,22 @@ require "danbooru"
 module Danbooru
   module Extensions
     module String
+      # https://invisible-characters.com
+      # https://character.construction/blanks
+      # https://www.unicode.org/review/pr-5.html (5.22 Default Ignorable Code Points)
+      # https://en.wikipedia.org/wiki/Whitespace_character
+      #
+      # [[:space:]] = https://codepoints.net/search?gc[]=Z (Space_Separator | Line_Separator | Paragraph_Separator | U+0009 | U+000A | U+000B | U+000C | U+000D | U+0085)
+      # \p{di} = https://codepoints.net/search?DI=1 (Default_Ignorable_Code_Point)
+      # \u2800 = https://codepoints.net/U+2800 (BRAILLE PATTERN BLANK)
+      INVISIBLE_REGEX = /\A[[:space:]\p{di}\u2800]*\z/
+
+      # Returns true if the string consists entirely of invisible characters. Like `#blank?`, but includes control
+      # characters and certain other invisible Unicode characters that aren't classified as spaces.
+      def invisible?
+        match?(INVISIBLE_REGEX)
+      end
+
       def to_escaped_for_sql_like
         string = self.gsub(/%|_|\*|\\\*|\\\\|\\/) do |str|
           case str
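The diff only adds the method; the call sites that reject input aren't shown here. As a minimal sketch of how a form check could use it, the script below mixes an equivalent module into `String` and wraps it in a hypothetical `validate_text` helper. Both `InvisibleStringExtension` and `validate_text` are illustrative names, and mixing in with `String.include` is an assumption about the wiring, not necessarily how Danbooru loads its extensions.

```ruby
# Hypothetical stand-in for Danbooru::Extensions::String from the diff.
module InvisibleStringExtension
  # Same pattern as INVISIBLE_REGEX in the diff.
  INVISIBLE_REGEX = /\A[[:space:]\p{di}\u2800]*\z/

  # Returns true if the string consists entirely of invisible characters.
  def invisible?
    match?(INVISIBLE_REGEX)
  end
end

# Assumption: mix the method into every String; Danbooru's wiring may differ.
String.include(InvisibleStringExtension)

# Hypothetical validation helper: returns an error message, or nil if the text is acceptable.
def validate_text(text)
  return "can't be blank" if text.invisible?

  nil
end

["hello", "", "   ", "\u200B\u200B", " \u2800 \n", "\u200Bhi"].each do |input|
  puts "#{input.inspect} => #{validate_text(input).inspect}"
end
```

Including the module patches every `String` globally. A refinement (`refine String` plus `using`) would limit `invisible?` to files that opt in, at the cost of adding `using` wherever it's called.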