Commit Graph

14 Commits

Author SHA1 Message Date
evazion
be1251b6be autocomplete: optimize various types of bogus input.
Optimize autocomplete to ignore various types of bogus input that will
never match anything. It turns out it's not uncommon for people to do
things like paste random URLs into autocomplete, or hold down keys, or
enter long strings of gibberish text (sometimes in other languages).
Some things, like autocorrect and slash abbreviations, become
pathologically slow when fed certain types of bad input.

Autocomplete will abort and return nothing in the following situations:

* Searching for URLs (tags that start with http:// or https://).
* Overly long tags (strings longer than the 170 char tag name limit).
* Slash abbreviations longer than 10 chars (e.g. typing `/qwoijqoiqogirqewgoi`).
* Slash abbreviations that aren't alphanumeric (e.g. typing `/////////`).
* Autocorrect input that contains too much punctuation and not enough actual letters.
2021-01-11 05:12:09 -06:00
evazion
fc5db679e4 autocomplete: optimize searching by artist/wiki page other names.
Optimize searches for non-English phrases in autocomplete. These
searches were pretty slow, and could sometimes cause sitewide lag spikes
when users typed long strings of non-English text into the search box
and caused an unintentional DoS.

The trick is to use an `array_to_tsvector(other_names) USING gin` index
on other_names. This supports fast string prefix matching against all
elements of the array. The downside is that it doesn't allow infix or
suffix matches, so we can't support wildcards in general. Wildcards
didn't quite work anyway, since artist and wiki other names can contain
literal '*' characters.
2021-01-10 03:35:12 -06:00
evazion
5b7894a8b2 autocomplete: fix exception when type param is missing. 2021-01-01 04:06:38 -06:00
evazion
3ad4beac02 autocomplete: fix exception when completing unsupported metatags. 2020-12-20 01:27:48 -06:00
evazion
2c1da660fd tags: allow tag abbreviations in searches and during tagging.
Expand the tag abbreviation system introduced in b0be8ae45 so that it
works in searches and when tagging posts, not just in autocomplete.

For example, you can tag a post with /evth and it will add the tag
eyebrows_visible_through_hair. You can search for /evth and it will
search for the tag eyebrows_visible_through_hair.

Some more examples:

* /ops is short for one-piece_swimsuit
* /hooe is short for hair_over_one_eye
* /saol is short for standing_on_one_leg
* /tlozbotw is short for the_legend_of_zelda:_breath_of_the_wild

If two tags have the same abbreviation, then the larger tag takes
precedence. For example, /be is short for blue_eyes, not brown_eyes,
because blue_eyes is the bigger tag.

If there is an existing shortcut alias that conflicts with the
abbreviation, then the alias take precedence. For example, /sh is short
for suzumiya_haruhi, not short_hair, because there's an old alias for
/sh -> suzumiya_haruhi.
2020-12-17 23:57:13 -06:00
evazion
26246b0ac9 autocomplete: fix exception when typing "/" in autocomplete.
Fix an exception that could occur when typing "/" by itself in
autocomplete and a regular tag starting with "/" was returned. This
caused an exception in `r[:antecedent].length` because the tag's
antecedent was nil.
2020-12-14 21:57:28 -06:00
evazion
4cdaf7bcdf autocomplete: update html data attributes.
* Remove the `source` and `weight` html data attributes (no longer used).
* Make the `type` html data attribute properly indicate the completion
  type. Valid types: `tag`, `tag-alias`, `tag-abbreviation`,
  `tag-autocorrect`, `tag-other-name`.
2020-12-14 18:58:11 -06:00
evazion
c02c31b966 autocomplete: recognize Japanese tags in autocomplete.
Allowing typing Japanese tags in autocomplete. For example, typing 東方
in autocomplete will be completed to the touhou tag. Typing ぶくぶ will
complete to the bkub tag.

This works using wiki page and artist other names. Effectively, any name
listed as an other name in a wiki or artist page will be treated like an
alias for autocomplete purposes. This is limited to non-ASCII other names,
to prevent English other names from interfering with regular tag searches.
2020-12-14 18:58:11 -06:00
evazion
b002bf25f5 autocomplete: display autocorrected tags like aliases.
Display autocorrected tags similar to aliases, with an arrow pointing at
the corrected tag, but with a dotted underline beneath the misspelled
tag to indicate that it's misspelled.
2020-12-13 04:10:48 -06:00
evazion
6a46aeb55c autocomplete: tune autocorrect algorithm.
Tune autocorrect to produce fewer false positives. Before we used
trigram similarity. Now we use Levenshtein edit distance with a dynamic
typo threshold. Trigram similarity was able to correct large
transpositions (e.g. `miku_hatsune` -> `hatsune_miku`), but it was bad
at correcting small typos. Levenshtein is good at small typos, but can't
correct large transpositions.
2020-12-13 04:10:48 -06:00
evazion
119268e118 autocomplete: fix exception when completing saved search labels.
Fix an exception that was thrown when trying to autocomplete saved
search labels (e.g. `search:all`) as an anonymous user. This was a
pre-existing bug.
2020-12-13 00:45:22 -06:00
evazion
d6a5b9e252 autocomplete: rework cache policy.
The previous cache policy was that all autocomplete results were cached
for a fixed 7 days. The new policy is that if autocomplete returns more
than 10 results they're cached for 24 hours, otherwise if it returns
less than 10 results they're cached for 1 hour.

The rationale is that if autocomplete returns a lot of results, then the
top 10 results are relatively stable and unlikely to change, but if it
returns less than 10 results, then the results are unstable and can be
easily changed.

We also change it so that autocomplete calls can be cached publicly.
Public caching means that HTTP requests are cached by Cloudflare. This
will ideally reduce load on the server and reduce latency for end users.
This is only safe for calls that return the same results for all users
(i.e. the results don't depend on the current user), since the cache is
publicly shared by all users. Currently username, favgroup, and saved
search autocomplete results depend on the current user, so they can't be
publicly cached.
2020-12-13 00:45:22 -06:00
evazion
b0be8ae456 autocomplete: rework tag autocomplete behavior.
Reworks tag autocomplete to work the same way for all users. Previously
autocomplete for Builders worked differently than autocomplete for
regular users.

This is how it works now:

* If the search starts with a slash (/), then do a tag abbreviation
  match. For example, `/evth` matches eyebrows_visible_through_hair.
* Otherwise if the search contains a wildcard (*), then just do a simple
  wildcard search.
* Otherwise do a tag prefix match against tags and aliases. For example,
  `black` matches all tags or aliases beginning with `black`.
* If the tag prefix match returns no results, then do a autocorrect match.

The differences for regular users:

* You can abbreviate tags with a slash (/).

The differences for Builders:

* Now tag abbreviations have to start with a slash (/).
* Autocorrect isn't performed unless a regular search returns no results.
* Results are always sorted by tag count. Before different types of
  results (regular tag matches, alias matches, abbreviation matches,
  and autocorrect matches) were all mixed together based on a tag
  weighting scheme.
2020-12-13 00:45:22 -06:00
evazion
adc1c2c2cc autocomplete: refactor javascript to use /autocomplete endpoint.
This refactors the autocomplete Javascript to use a single dedicated
/autocomplete.json endpoint instead of a bunch of separate endpoints.

This simplifies the autocomplete Javascript by making it so that instead
of calling a different endpoint for each type of query (for users, wiki
pages, pools, artists, etc), then having to parse the results of each
call to get the data we need, we can call a single endpoint that returns
exactly what we need.

This also means we don't have to parse searches clientside in order to
autocomplete metatags. Instead we can just pass the search term to the
server and let it parse the search, which is easy to do serverside.

Finally, this makes autocomplete easier to test, and it makes it easier
to add more sophisticated autocomplete behavior, since most of the logic
lives serverside.
2020-12-13 00:45:22 -06:00