* Change cutoffs on upload page to max 5 results, min. 20% similarity.
* Change cutoffs on standalone /iqdb_queries page to max 20 results, min. 0% similarity.
* /iqdb_queries.json: add `limit` and `similarity` params to change default cutoffs.
DText is processed in three phases: a preprocessing phase, the regular
parsing phases, and a postprocessing phase.
In the preprocessing phase we extract all the wiki links from all the
dtext messages on the page (more precisely, we do this in forum threads
and on comment pages, because these are the main places with lots of
dtext). This is so we can lookup all the tags and wiki pages in one
query, which is necessary because in the worst case (in certain forum
threads and in certain list_of_* wiki pages) there can be hundreds of
tags per page.
In the postprocessing phase we fixup the html generated by the ragel
parser to add CSS classes to wiki links. We do this in a postprocessing
step because it's easier than doing it in the ragel parser itself.
* Parse the wiki page with the actual dtext parser instead of by hand.
This is so that wiki links inside things like [nodtext] or [code]
blocks are handled properly.
* Only include tags that exist and are nonempty. Don't include links to
dead pages or blank tags.
There are a handful of places where we need to strip markup from a piece
of dtext, primarily in <meta> description tags in the wiki. Currently
the dtext parser handles this by having a special mode where it parses
the text but doesn't output html tags. Here we refactor to instead parse
the text normally then strip out the html tags after the fact.
This is more flexible and allows us to simplify a lot of things in the
dtext parser. This also produces more readable output than before in
certain cases.
Previously the page-based (numbered) paginator would always count the
total_pages, even in API calls when it wasn't needed. This could be very
slow in some cases. Refactor so that total_pages isn't calculated unless
it's called.
While we're at it, refactor to condense all the sequential vs. numbered
pagination logic into one module. This incidentally fixes a couple more
bugs:
* "page=b0" returned all pages rather than nothing.
* Bad parameters like "page=blaha123" and "page=a123blah" were accepted.
Disable database timeouts durings daily maintenance. Fixes
`regenerate_post_counts!` timing out. Remove calls to without_timeout
because otherwise it will reenable the timeout when trying to restore
the old timeout (see 97cc873a3f).
This was a mod-only report that used Google BigQuery to search post
versions by tag. 2b4ee0ee8 allows all users to search post versions by
tag, so this report is no longer necessary.
* Automatically fix all tags with incorrect counts during daily
maintenance (previously only tags with negative counts were fixed).
* Log fixed tags to NewRelic.
* Remove the ability to manually fix tag counts with the "Fix" button on
the /tags listing. This is no longer necessary now that tags are
fixed automatically.
* Only check for conflicts with existing aliases/implications when
requests are created or approved, not when requests are rejected.
* Use `update!(status: "deleted")` instead of `update(status: "deleted")`
so that if rejecting the request fails we fail immediately instead of
continuing on and updating the forum topic.
* Wrap `reject!` and `TagChangeRequestPruner.reject_expired` in
transactions so that if updating either the request or the forum
fails, they both get rolled back.
Replace the `method_attributes` and `hidden_attributes` methods with
`api_attributes`. `api_attributes` can be used as a class macro:
# include only the given attributes.
api_attributes :id, :created_at, :creator_name, ...
# include all default attributes plus the `creator_name` method.
api_attributes including: [:creator_name]
or as an instance method:
def api_attributes
[:id, :created_at, :creator_name, ...]
end
By default, all attributes are included except for IP addresses and
tsvector columns.
In xml responses, if the result is an empty array we want the response
to look like this:
<posts type="array"/>
not like this (the default):
<nil-classes type="array"/>
This refactors controllers so that this is done automatically instead of
having to manually call `@things.to_xml(root: "things")` everywhere. We
do this by overriding the behavior of `respond_with` in `ApplicationResponder`
to set the `root` option by default in xml responses.
Stop using the pool_string field internally, but keep maintaining it
until we can drop it later.
* Stop using the pool_string for `pool:<name>` metatag searches.
* Stop using the pool_string in the `Post#pools` method. This is used to
get the list of pools on post show pages.
* Change the source index on posts from `(lower(source) gin_trgm_ops) WHERE source != ''`
to just `(source gin_trgm_ops)`. The WHERE clause prevented the index
from being used in source:<url> searches because we didn't specify
the `source != ''` clause in the search itself. Excluding blank
sources only saved a marginal amount of space anyway. This fixes
timeouts in source:<url> searches and in the bookmarklet (since we do
a source dupe check on the upload page too).
* Also switch from indexing `lower(name)` to `name` on pools and users.
We don't need to lowercase the column because GIN indexes can be used
with both LIKE and ILIKE queries.
Make /favorites redirect to a ordfav:<user> search instead of having a
separate view just for favorites. This duplicated a lot of code for no
good reason.
Rewrite the implementation of related tags to be simpler, faster, and
more accurate:
* The related tags are now calculated by taking a random sample of 1000
posts, finding the top 250 most frequent tags among those posts, then
ordering those tags by cosine similarity.
* Related tags can generally be calculated in 50-300ms at these sample
sizes. Very high sample sizes (25000+ posts) are still relatively fast
(1-3 seconds), but generally they don't improve accuracy much.
* Related tags are now cached in redis rather than in the tags table.
The related_tags column in the tags table is no longer used.
* Only the related tags in the search taglist are cached. The related
tags returned by the 'Related tags' button are not cached.
* The cache lifetime is a fixed 4 hours.
* The 'Related tags' button now works with metatags.
* The /related_tag page now works with metatags and multitag searches.
Fixes#4134, #4146.
* Drop support for `source:pixiv/artist-name` searches. This was a hack
that only worked on old pixiv urls that haven't been used for years.
* Replace the old SourcePattern(lower(source)) index with a trigram index.
Drop support for https://danbooru.donmai.us/cache/tags.json. This was a
nightly dump of the tags table that was originally added in #1012. It
was never documented and never really used except for by the DanbooruUp
extension.