Rewrite the implementation of related tags to be simpler, faster, and
more accurate:
* The related tags are now calculated by taking a random sample of 1000
posts, finding the top 250 most frequent tags among those posts, then
ordering those tags by cosine similarity.
* Related tags can generally be calculated in 50-300ms at these sample
sizes. Very high sample sizes (25000+ posts) are still relatively fast
(1-3 seconds), but generally they don't improve accuracy much.
* Related tags are now cached in redis rather than in the tags table.
The related_tags column in the tags table is no longer used.
* Only the related tags in the search taglist are cached. The related
tags returned by the 'Related tags' button are not cached.
* The cache lifetime is a fixed 4 hours.
* The 'Related tags' button now works with metatags.
* The /related_tag page now works with metatags and multitag searches.
Fixes#4134, #4146.
* Drop support for `source:pixiv/artist-name` searches. This was a hack
that only worked on old pixiv urls that haven't been used for years.
* Replace the old SourcePattern(lower(source)) index with a trigram index.
Drop support for https://danbooru.donmai.us/cache/tags.json. This was a
nightly dump of the tags table that was originally added in #1012. It
was never documented and never really used except for by the DanbooruUp
extension.
Fixes POST/PUT API requests failing with InvalidAuthenticityToken errors
due to missing CSRF tokens.
CSRF protection is only necessary for cookie-based authentication. For
non-cookie-based authentication we can safely disable it. That is, if
the user is already passing their login + api_key, then we don't need
to additionally verify the request with a CSRF token.
ref: 2e407fa476 (comments)
Setting the statement timeout at the beginning didn't work because
`PostPruner.new.prune!` clobbers the timeout (it calls `without_timeout`,
which doesn't restore the timeout properly if the timeout was zero).
Also fixes a bug where mod actions weren't logged on mass updates.
Creating the mod action silently failed because it was called when
CurrentUser wasn' set.
Changes:
* Drop Users.id_to_name.
* Don't cache Users.name_to_id.
* Replace calls to name_to_id with find_by_name when possible.
* Don't autodefine creator_name in belongs_to_creator.
* Don't autodefine updater_name in belongs_to_updater.
* Instead manually define creator_name / updater_name only on models that need
to return these fields in the api.
id_to_name was cached to reduce the impact of N+1 query patterns in
certain places, especially in api responses that return creator_name /
updater_name fields. But it still meant we were doing N calls to
memcache. Using `includes` to prefetch users avoids this N+1 pattern.
name_to_id had no need be cached, it was never used in any performance-
sensitive contexts.
Avoiding caching also avoids the need to keep these caches consistent.
* Drop /posts?ro=true param (broken).
* Clean up tag_match (rescuing PG::ConnectionBad didn't do anything, we
just build the query here, we don't run it).
* Don't allow adding tags with invalid names when they already exist in
the tags table.
* If an invalid tag is added, show an warning and ignore the tag instead
of failing with a hard error.
* Move the _(cosplay) tag validation into the tag name validator.
* Simplify code.
* Show backtraces for all users, not just builders.
* Show backtraces only for unexpected server errors (status 5xx), not
for normal client errors (status 4xx).
* Log expected errors at info level (reduce noise in production logs).
Avoid some queries used in wiki page excerpts:
* Only try to load the artist if the tag is an artist tag.
* Avoid using `exists?` queries for wiki pages.
* Bugfix: don't show wiki excerpts for deleted wikis.