Commit Graph

4 Commits

Author SHA1 Message Date
evazion
d9dc84325f Fix #5365: Don't allow whitespace-only text submission.
Fix bug where it was possible to submit blank text in various text fields.

Caused by `String#blank?` not considering certain Unicode characters as blank. `blank?` is defined
as `match?(/\A[[:space:]]*\z/)`, where `[[:space:]]` matches ASCII spaces (space, tab, newline, etc)
and Unicode characters in the Space category ([1]). However, there are other space-like characters
not in the Space category. This includes U+200B (Zero-Width Space), and many more.

It turns out the "Default ignorable code points" [2][3] are what we're after. These are the set of 400
or so formatting and control characters that are invisible when displayed.

Note that there are other control characters that aren't invisible when rendered, instead they're
shown with a placeholder glyph. These include the ASCII C0 and C1 control codes [4], certain Unicode
control characters [5], and unassigned, reserved, and private use codepoints.

There is one outlier: the Braille pattern blank (U+2800) [6]. This character is visually blank, but is
not considered to be a space or an ignorable code point.

[1]: https://codepoints.net/search?gc[]=Z
[2]: https://codepoints.net/search?DI=1
[3]: https://www.unicode.org/review/pr-5.html
[4]: https://codepoints.net/search?gc[]=Cc
[5]: https://codepoints.net/search?gc[]=Cf
[6]: https://codepoints.net/U+2800
[7]: https://en.wikipedia.org/wiki/Whitespace_character
[8]: https://character.construction/blanks
[9]: https://invisible-characters.com
2022-12-05 01:58:34 -06:00
evazion
95f39afcd9 tests: fixup broken tests. 2021-01-11 05:12:09 -06:00
evazion
efb836ac02 wikis: normalize Unicode characters in wiki bodies.
* Introduce an abstraction for normalizing attributes. Very loosely
  modeled after https://github.com/fnando/normalize_attributes.
* Normalize wiki bodies to Unicode NFC form.
* Normalize Unicode space characters in wiki bodies (strip zero width
  spaces, normalize line endings to CRLF, normalize Unicode spaces to
  ASCII spaces).
* Trim spaces from the start and end of wiki page bodies. This may cause
  wiki page diffs to show spaces being removed even when the user didn't
  explicitly remove the spaces themselves.
2020-12-21 20:47:50 -06:00
evazion
cebf29f83e Allow escaping wildcards (\*) in wildcard searches. 2017-05-31 16:15:18 -05:00