Bug: the uploads page showed a remote size of 146 bytes for Pixiv uploads.
Cause: we didn't spoof the Referer header when making the HEAD request
for the image, causing Pixiv to return a 403 error.
Also fix the case where the Content-Length header is absent.
Remove the Downloads::File class. Move download methods to
Danbooru::Http instead. This means that:
* HTTParty has been replaced with http.rb for downloading files.
* Downloading is no longer tightly coupled to source strategies. Before
Downloads::File tried to automatically look up the source and download
the full size image instead if we gave it a sample url. Now we can
do plain downloads without source strategies altering the url.
* The Cloudflare Polish check has been changed from checking for a
Cloudflare IP to checking for the CF-Polished header. Looking up the
list of Cloudflare IPs was slow and flaky during testing.
* The SSRF protection code has been factored out so it can be used for
normal http requests, not just for downloads.
* The Webmock gem can be removed, since it was only used for stubbing
out certain HTTParty requests in the download tests. The Webmock gem
is buggy and caused certain tests to fail during CI.
* The retriable gem can be removed, since we no longer autoretry failed
downloads. We assume that if a download fails once then retrying
probably won't help.
Get rid of `normalized_for_artist_finder?` and `normalizable_for_artist_finder?`.
This was legacy bullshit that was originally designed to avoid API calls
when saving artist entries containing old Pixiv direct image urls that
had already been normalized, or that couldn't be normalized because they
were bad id.
Nowadays we store profile urls in artist entries instead of direct image
urls, so we don't normally need to do any API calls to normalize the
profile url. Strategies should take care to avoid triggering API calls
inside `profile_url` when possible.
* Move the source normalization logic out of the post model
and into individual sources' strategies.
* Rewrite normalization tests to be handled into each source's test,
and expand them significantly. Previously we were only testing
a very small subset of domains and variants.
* Fix up normalization for several sites.
* Normalize fav.me urls into normal deviantart urls.
When doing a tag search, we have to be careful about which user we're
running the search as because the results depend on the current user.
Specifically, things like private favorites, private favorite groups,
post votes, saved searches, and flagger names depend on the user's
permissions, and whether non-safe or deleted posts are filtered out
depend on whether the user has safe mode on or the hide deleted posts
setting enabled.
* Refactor internal searches to explicitly state whether they're
running as the system user (DanbooruBot) or as the current user.
* Explicitly pass in the current user to PostQueryBuilder instead of
implicitly relying on the CurrentUser global.
* Get rid of CurrentUser.admin_mode? (used to ignore the hide deleted
post setting) and CurrentUser.without_safe_mode (used to ignore safe
mode).
* Change the /counts/posts.json endpoint to ignore safe mode and the
hide deleted posts settings when counting posts.
* Fix searches not correctly overriding the hide deleted posts setting
when multiple status: metatags were used (e.g. `status:banned status:active`)
* Fix fast_count not respecting the hide deleted posts setting when the
status:banned metatag was used.
* Rename `unique_id` to `tag_name`.
* Add `other_names` and `profile_urls` methods that sources can override
to provide extra names or urls when creating new artist entries.
* When creating an artist by clicking the '?' next to the artist tag in
the tag list, prefill the new artist form by finding the artist's last
upload and fetching its source data.
Previously we filled the urls with the source of the artist's last
upload, which was wrong because it was usually a direct image URL (#3078).
* Fix the other names field not escaping spaces within names to underscores.
* Fix the other names field being potentially prefilled with duplicate names.
* Normalize spaces to underscores when saving other names. Preserve case
since case can be significant.
* Fix WikiPage#other_names_include to search case-insensitively (note:
this prevents using the index).
* Fix sources to return the raw tags in `#tags` and the normalized tags
in `#normalized_tags`. The normalized tags are the tags that will be
matched against other names.
Fix sources choosing the wrong strategy when the referer belongs to a
different site (for example, when uploading a twitter post with a pixiv
referer).
* Fix `match?` to only consider the main url, not the referer.
* Change `match?` to match against a list of domains given by the `domains` method.
* Change `match?` to an instance method.
Derive the artist name / profile url / page url from the source URLs when
the API response is unavailable because the Tumblr post was deleted.
This fixes the artist finder to work on bad_tumblr_id posts.
Allow searching the URL field by regex or by wildcard.
If the query looks like `/twitter/` do a regex search, otherwise if it
looks like `http://www.twitter.com/*` do a wildcard search, otherwise if
it looks like an url do an artist finder search, lastly if it looks like
`twitter` do a `*twitter*` search.
Rename Artist#find_all_by_url to url_matches and drop previous
url_matches method, along with find_artists and search_for_profile.
Previously find_artists tried to lookup the url, referer url, and profile
url in turn until an artist match was found. This was wasteful, because
the source strategy already knows which url to lookup (usually the profile
url). If that url doesn't find a match, then the artist doesn't exist.
* On the /uploads/new page, instead of just showing a "This post has
probably already been uploaded" message, show the actual thumbnails of
posts having the same source as what the user is trying to upload.
* Move the iqdb results section up top, beside the related posts section.
Resolves aliases in translated tags. For example, say we lookup `遠坂凛`
and find `tohsaka_rin` and `toosaka_rin`. We apply aliases so that
`tohsaka_rin` becomes `toosaka_rin`, which is then returned as the only
translated tag.
* Only suggest the Danbooru tag with the same name if there is no
matching wiki other name. Example: if we have the Pixiv tag `Fate` and
the Danbooru tag `fate_(series)` with other name `fate`, suggest that,
not the Danbooru tag `fate`.
* Don't suggest tags that are empty or whose wiki is deleted.
* Only split tags on "/" if there are no other matches, and only for Pixiv.
* For Pixiv, only include traditional media tags in tag list, not digital media (Photoshop, SAI).
* Add some tests.