Rewrite the implementation of related tags to be simpler, faster, and
more accurate:
* The related tags are now calculated by taking a random sample of 1000
posts, finding the top 250 most frequent tags among those posts, then
ordering those tags by cosine similarity.
* Related tags can generally be calculated in 50-300ms at these sample
sizes. Very high sample sizes (25000+ posts) are still relatively fast
(1-3 seconds), but generally they don't improve accuracy much.
* Related tags are now cached in redis rather than in the tags table.
The related_tags column in the tags table is no longer used.
* Only the related tags in the search taglist are cached. The related
tags returned by the 'Related tags' button are not cached.
* The cache lifetime is a fixed 4 hours.
* The 'Related tags' button now works with metatags.
* The /related_tag page now works with metatags and multitag searches.
Fixes#4134, #4146.
Changes:
* Drop Users.id_to_name.
* Don't cache Users.name_to_id.
* Replace calls to name_to_id with find_by_name when possible.
* Don't autodefine creator_name in belongs_to_creator.
* Don't autodefine updater_name in belongs_to_updater.
* Instead manually define creator_name / updater_name only on models that need
to return these fields in the api.
id_to_name was cached to reduce the impact of N+1 query patterns in
certain places, especially in api responses that return creator_name /
updater_name fields. But it still meant we were doing N calls to
memcache. Using `includes` to prefetch users avoids this N+1 pattern.
name_to_id had no need be cached, it was never used in any performance-
sensitive contexts.
Avoiding caching also avoids the need to keep these caches consistent.
In multi-tag searches, we calculated the tags in the sidebar by sampling
300 random posts from within the search and finding the most frequently
used tags. This meant we were effectively doing two searches on every
page load, one for the actual search and one for the sidebar. This is
not so bad for fast searches, but very bad for slow searches.
Instead, now we calculate the related tags from the current page of
results. This is much faster, at the cost of slightly lower accuracy and
the tag list changing slightly as you browse between pages.
We could use caching here, which would help when browsing between pages,
but we would still have to calculate the tags on the first page load,
which can be very slow in the worst case.
Move Post#humanized_essential_tag_string to TagSetPresenter#humanized_essential_tag_string.
This allows humanized_essential_tag_string to reuse the same set of tags
already fetched by the tag set presenter for the sidebar.
This avoids fetching the tag categories from memcache again (via
Post#typed_tags) when we're already fetched the tags once before.
This also means it's no longer necessary to cache humanized_essential_tag_string
itself in memcache, since it can be generated as quickly as the sidebar taglist.
Move PostPresenter#categorized_tag_groups to TagSetPresenter#split_tag_list_text.
This allows split_tag_list_text to reuse the same set of tags already
fetched by the tag set presenter for the sidebar.
This avoids a memcache call to get the tag categories when rendering the
tag string for the post edit form.
Refactor the tag set presenter to get both the tag categories and the
tag counts in the same call to the database, instead of getting the
counts from the db and the categories from memcache.
Refactor tag_list_html, split_tag_list_html, and inline_tag_list_html to
take the `show_extra_links` and `current_query` options explicitly,
rather than implicitly relying on CurrentUser or taking `params[:tags]`
from the template.
Captioned post previews (previews with the pool name, similarity, or
size beneath) need to have `height: auto` set, otherwise they'll default
to `height: 154px` (just enough for the image) and the caption won't be visible.