Prevent age-restricted Fanbox posts from raising errors when source data
is fetched, so that error messages aren't shown to users when they
switch to the edit tab on a post.
This will cause uploads of age-restricted posts to fail with an
unrelated error because we either can't find the image url (if we were
given only the html page) or we can't download the image (because we're
not logged in to Fanbox).
The image_url method makes a request to `https://seiga.nicovideo.jp/images/source/:image_id`
to see where this URL redirects. Previously we did a GET request, which
downloaded the full image; this could fail with a timeout error if the
download took too long. We also cached the request, which cached the full
image even though we only need the headers. Change it to a HEAD request so
we don't download the entire image just to check the URL.
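A minimal sketch of the new behavior, using http.rb; the image id here is
made up:

```ruby
require "http"

# HEAD returns only the headers, so we can read the redirect target
# without downloading the image body.
response = HTTP.head("https://seiga.nicovideo.jp/images/source/123456")
image_url = response.headers["Location"] if response.status.redirect?
```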
* Factor out the Cloudflare Polish bypass code to a standalone feature.
* Add an `http_downloader` method to the base source strategy. This is an
HTTP client that should be used for downloading images or making
requests to image URLs. It ensures that referer spoofing and Cloudflare
Polish bypassing are performed (see the sketch below).
This fixes a bug where the upload page reported the polished filesize
instead of the original filesize when uploading ArtStation images.
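A hypothetical sketch of the `http_downloader` idea; the feature name and
configuration are illustrative, not Danbooru's actual API:

```ruby
# Sketch only: the base strategy hands subclasses a client that is already
# configured for image requests, so they can't forget the referer spoofing
# or the Polish bypass. :unpolish_cloudflare is a made-up feature name.
def http_downloader
  http
    .headers(Referer: page_url)   # referer spoofing
    .use(:unpolish_cloudflare)    # Cloudflare Polish bypass
end
```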
Bug: the uploads page showed a remote size of 146 bytes for Pixiv uploads.
Cause: we didn't spoof the Referer header when making the HEAD request
for the image, causing Pixiv to return a 403 error.
Also fix the case where the Content-Length header is absent.
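A sketch of the fixed request, assuming http.rb; the image URL is made up:

```ruby
require "http"

# Pixiv's image CDN returns 403 unless the request carries a pixiv.net Referer.
response = HTTP.headers(Referer: "https://www.pixiv.net")
               .head("https://i.pximg.net/img-original/img/0001/example_p0.png")

# Content-Length may be absent, so guard against nil.
remote_size = response.headers["Content-Length"]&.to_i
```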
Nijie tests often fail under parallel testing. This is because every
test needs to log in to Nijie first, but Nijie rate-limits the login
endpoint, so eventually we hit the limit and tests start failing.
This is made worse by a thundering herd problem: eight test processes
try to log in to Nijie at the same time, but only one succeeds, so the
rest sleep and try again, but they all wake up at the same time and hit
the rate limits again.
The workaround is to set the retry limit ridiculously high, higher than
we would ideally like in production. Another workaround would be to
serialize the Nijie tests in the test suite, which can be done with
lockfiles and flock(2), as sketched below. This helps, but we can still
hit the rate limit even under serialized execution.
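A sketch of the flock(2) approach; the lockfile path and helper name are
arbitrary:

```ruby
# Serialize Nijie tests across processes with an exclusive file lock.
def with_nijie_lock
  File.open("/tmp/nijie-tests.lock", "w") do |file|
    file.flock(File::LOCK_EX)  # blocks until no other test process holds the lock
    yield
  end                          # closing the file releases the lock
end

# with_nijie_lock { run_nijie_test }
```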
Fix Nicoseiga strategy to work with certain direct image urls that we
can't otherwise extract any information from.
Examples:
* https://dic.nicovideo.jp/oekaki/52833.png
Bug: if a Nijie login failed with a 429 Too Many Requests error, the
error would get cached, so when we retried the request, we would just
get our own cached response back every time. The 429 error would
eventually be passed up to the Nijie strategy, which caused random
methods to fail because they couldn't get the html page.
Fix: add the `retriable` feature *after* the `cache` feature so that
retries don't go through the cache. This is a hack. We want retries to
go at the bottom of the stack, below caching, but we can't enforce this
ordering.
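A self-contained schematic of the two orderings; these helpers are
stand-ins, not the real feature stack:

```ruby
CACHE = {}

def cached(url)
  CACHE[url] ||= yield   # stores whatever comes back, even a 429 error
end

def with_retries(tries = 3)
  tries.times do
    response = yield
    return response unless response == "429 Too Many Requests"
    sleep 1
  end
  "429 Too Many Requests"
end

# Broken order (retries above the cache): the first 429 is cached, so every
# retry just reads the cached error back:
#
#   with_retries { cached(url) { fetch(url) } }
#
# Desired order (retries below the cache): each retry really hits the network:
#
#   cached(url) { with_retries { fetch(url) } }
```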
The Nijie login process works like this:
* First we submit our `email` and `password` to `https://nijie.info/login_int.php`.
* Then we save the NIJIEIEID session cookie from the response.
* We optionally retry if login failed. Nijie returns 429 errors with a
`Retry-After: 5` header if we send too many login requests. This can
happen during parallel testing.
* We cache the login cookies for only 1 hour so we don't have to worry
about them becoming invalid if we cache them too long.
Cookie handling and retrying failed requests are handled transparently by Danbooru::Http.
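A condensed sketch of the flow above, assuming http.rb and the Rails cache;
the method name is illustrative:

```ruby
require "http"

def nijie_cookies(email, password)
  # Cache the session cookies for 1 hour, as described above.
  Rails.cache.fetch("nijie-session", expires_in: 1.hour) do
    response = HTTP.post("https://nijie.info/login_int.php",
                         form: { email: email, password: password })

    # Retrying 429s (honoring Retry-After) is left to the client layer.
    raise "Nijie login failed: #{response.status}" unless response.status.success?
    response.cookies  # includes the NIJIEIEID session cookie
  end
end
```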
Remove the Downloads::File class. Move download methods to
Danbooru::Http instead. This means that:
* HTTParty has been replaced with http.rb for downloading files.
* Downloading is no longer tightly coupled to source strategies. Before,
Downloads::File automatically looked up the source and downloaded the
full-size image if we gave it a sample url. Now we can do plain
downloads without source strategies altering the url.
* The Cloudflare Polish check has been changed from checking for a
Cloudflare IP to checking for the CF-Polished header. Looking up the
list of Cloudflare IPs was slow and flaky during testing.
* The SSRF protection code has been factored out so it can be used for
normal http requests, not just for downloads.
* The Webmock gem can be removed, since it was only used for stubbing
out certain HTTParty requests in the download tests. Webmock is buggy
and caused certain tests to fail in CI.
* The retriable gem can be removed, since we no longer autoretry failed
downloads. We assume that if a download fails once then retrying
probably won't help.
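A hedged sketch of what a plain download looks like now, assuming http.rb;
Danbooru::Http's real method may differ:

```ruby
require "http"
require "tempfile"

def download(url)
  response = HTTP.follow.get(url)
  raise "download failed: #{response.status}" unless response.status.success?

  # The new Polish check: Cloudflare adds a CF-Polished header to images it
  # has recompressed, so no IP list lookup is needed.
  warn "file was recompressed by Cloudflare Polish" if response.headers["CF-Polished"]

  file = Tempfile.new
  file.binmode
  response.body.each { |chunk| file.write(chunk) }  # stream, don't buffer
  file.tap(&:rewind)
end
```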
* Get rid of mechanize, fully switch to Danbooru::Http
* Switch to the mobile API, improving speed
* Merge main and manga clients
* Add full support for manga pages
* Add support for anonymous and R-15 images
* Don't fail when attempting to upload oekaki direct links
* Various misc fixes
Fix regression in #4475. Fetch the commentary as html instead of
plaintext so that we don't lose links or other formatting.
Also fix it so that /jump.php redirect links are replaced with the
actual url.
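A sketch of the link rewriting, assuming Nokogiri; Pixiv encodes the real
target as the query string of /jump.php:

```ruby
require "cgi"
require "nokogiri"

def unwrap_jump_links!(doc)
  doc.css("a[href*='jump.php?']").each do |link|
    target = link["href"].split("jump.php?", 2).last
    link["href"] = CGI.unescape(target)
  end
end

# doc = Nokogiri::HTML.fragment(commentary_html)
# unwrap_jump_links!(doc)
```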
Get rid of `normalized_for_artist_finder?` and `normalizable_for_artist_finder?`.
This was legacy bullshit that was originally designed to avoid API calls
when saving artist entries containing old Pixiv direct image urls that
had already been normalized, or that couldn't be normalized because they
had bad ids.
Nowadays we store profile urls in artist entries instead of direct image
urls, so we don't normally need to do any API calls to normalize the
profile url. Strategies should take care to avoid triggering API calls
inside `profile_url` when possible.
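For illustration, a `profile_url` that avoids the API by building the URL
from data we already have; the names are hypothetical:

```ruby
# Hypothetical: artist_id is parsed from the post url itself, so no API
# call is needed to produce the profile url.
def profile_url
  "https://www.pixiv.net/users/#{artist_id}" if artist_id.present?
end
```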
The download was failing not because the 4k size didn't exist, but
because the ArtStation strategy had no way to handle cover image URLs,
causing it to pass nil to the download function.
Additionally, there was no way to get the preview URL size, i.e. the
smallest available image for an ArtStation image URL.
- Adds support for cover URLs
- Adds support for preview URL size
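A sketch of the size handling, assuming ArtStation image URLs embed the
size as a path component (e.g. `.../large/name.jpg`); "small" is assumed
to be the preview size:

```ruby
# Swap the size component of an ArtStation image url to select a variant.
def url_with_size(url, size)
  url.sub(%r{/(small|medium|large|4k|original)/}, "/#{size}/")
end

# url_with_size(image_url, "small")  # smallest / preview size
# url_with_size(image_url, "4k")     # largest size, when it exists
```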