Factor out referrer spoofing so that it can be used outside of downloading
files. We also need to spoof the referrer when determining the remote
filesize of images on the uploads page.
Bug: the uploads page showed a remote size of 146 bytes for Pixiv uploads.
Cause: we didn't spoof the Referer header when making the HEAD request
for the image, causing Pixiv to return a 403 error.
Also fix the case where the Content-Length header is absent.
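Roughly, the check now amounts to something like this (a sketch using the http.rb gem directly rather than the actual Danbooru::Http wrapper; the helper name and urls are illustrative):

require "http"

# Return the remote file size in bytes, or nil if it can't be determined.
def remote_file_size(url, referer:)
  # Spoof the Referer so hosts like Pixiv don't respond with a 403 error.
  response = HTTP.headers("Referer" => referer).head(url)
  return nil unless response.status.success?

  # The Content-Length header may be absent; return nil rather than crashing.
  response.headers["Content-Length"]&.to_i
end

remote_file_size("https://i.pximg.net/img-original/img/0001/example.png", referer: "https://www.pixiv.net")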
These exceptions are no longer thrown now that we've switched from
HTTParty to http.rb. Swallowing unexpected exceptions during testing was
a bad practice anyway.
The Nijie login process works like this:
* First we submit our `email` and `password` to `https://nijie.info/login_int.php`.
* Then we save the NIJIEIEID session cookie from the response.
* We optionally retry if the login fails. Nijie returns 429 errors with a
`Retry-After: 5` header if we send too many login requests. This can
happen during parallel testing.
* We cache the login cookies for only 1 hour so we don't have to worry
  about them becoming invalid while cached.
Cookies and retrying on failure are handled transparently by Danbooru::Http.
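For illustration, the flow above boils down to something like this (a sketch using http.rb directly; in the real code Danbooru::Http handles the cookies, retries, and caching, and the form field names are assumptions):

require "http"

def nijie_login(email, password)
  3.times do
    response = HTTP.post("https://nijie.info/login_int.php", form: { email: email, password: password })

    # Too many login attempts; wait as instructed and try again.
    if response.status.code == 429
      sleep(response.headers["Retry-After"].to_i)
      next
    end

    # On success, keep the session cookies from the response for later requests.
    return response.cookies if response.status.success?
  end

  nil
end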
Allow cookies to be saved and sent back when making several requests in
a row. Usage:
http = Danbooru::Http.use(:session)
# saves the foo=42 cookie set by the response.
http.get("https://httpbin.org/cookies/set/foo/42")
# sends back the foo=42 cookie from the previous request.
http.get("https://httpbin.org/cookies")
Remove the Downloads::File class. Move download methods to
Danbooru::Http instead. This means that:
* HTTParty has been replaced with http.rb for downloading files.
* Downloading is no longer tightly coupled to source strategies. Before,
Downloads::File tried to automatically look up the source and download
the full size image instead if we gave it a sample url. Now we can
do plain downloads without source strategies altering the url (see
the sketch after this list).
* The Cloudflare Polish check has been changed from checking for a
Cloudflare IP to checking for the CF-Polished header. Looking up the
list of Cloudflare IPs was slow and flaky during testing.
* The SSRF protection code has been factored out so it can be used for
normal http requests, not just for downloads.
* The Webmock gem can be removed, since it was only used for stubbing
out certain HTTParty requests in the download tests. The Webmock gem
is buggy and caused certain tests to fail during CI.
* The retriable gem can be removed, since we no longer autoretry failed
downloads. We assume that if a download fails once then retrying
probably won't help.
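In rough terms, a plain download now looks something like this (a simplified sketch, not the actual Danbooru::Http implementation):

require "http"
require "tempfile"

# Download url to a tempfile; no source strategy rewrites the url first.
def download(url)
  response = HTTP.follow.get(url)
  raise "download failed: #{response.status}" unless response.status.success?

  file = Tempfile.new("download")
  file.binmode
  response.body.each { |chunk| file.write(chunk) }
  file.rewind

  # Cloudflare Polish is now detected by header, not by checking for a Cloudflare IP.
  polished = !response.headers["CF-Polished"].nil?

  [file, polished]
end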
Revert to the previous workaround of fetching the previous day if the
current day returns no result. This is a terrible hack; really we should
convert dates to Reportbooru's timezone, but that has other complications.
This reverts commit e83d07ea7b.
It was worth a try, but unfortunately it seems that once
someone sets tools in a Pixiv upload, they become defaults and
are applied to all of their subsequent uploads, so we get some
posts with two or three different digital tags.
* Get rid of mechanize, fully switch to Danbooru::Http
* Switch to mobile api, improving speed
* Merge main and manga clients
* Add full support for manga pages
* Add support for anonymous and r-15 images
* Don't fail when attempting to upload oekaki direct links
* Various misc fixes
* Combine MissedSearchService, PostViewCountService, and
PopularSearchService into a single ReportbooruService class.
* Use Danbooru::Http for these services instead of HTTParty.
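Shape-wise, the combined class looks roughly like this (a sketch only; the endpoint paths, config name, and Danbooru::Http constructor here are assumptions, not Reportbooru's actual API):

class ReportbooruService
  def initialize(base_url: Danbooru.config.reportbooru_server)
    @base_url = base_url
  end

  # One shared http client for all three kinds of report queries.
  def http
    Danbooru::Http.new
  end

  def missed_searches
    http.get("#{@base_url}/missed_searches")
  end

  def post_view_count(post_id)
    http.get("#{@base_url}/post_views/#{post_id}")
  end

  def popular_searches(date)
    http.get("#{@base_url}/post_searches/rank?date=#{date}")
  end
end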
Bug: Replacing posts hosted on cdn.donmai.us didn't work.
Cause: Original files on cdn.donmai.us are hosted under /var/www/danbooru/original/, but replacements
were trying to store them directly under /var/www/danbooru, which failed with a permission error.
We were trying to store them in the wrong directory because we didn't respect the `original_subdir`
option when generating file paths.
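Illustratively (the method and option names here are stand-ins, not the actual storage code), the path generation needs to include the subdirectory:

# Wrong: ignores the subdir, so the file lands directly under /var/www/danbooru.
def bad_replacement_path(base_dir, md5, file_ext)
  File.join(base_dir, "#{md5}.#{file_ext}")
end

# Right: respects the original_subdir option, matching where originals are actually hosted.
def replacement_path(base_dir, md5, file_ext, original_subdir: "original/")
  File.join(base_dir, original_subdir, "#{md5}.#{file_ext}")
end

replacement_path("/var/www/danbooru", "d34e4cf0a437a5d65f8e82b7bcd02606", "jpg")
# => "/var/www/danbooru/original/d34e4cf0a437a5d65f8e82b7bcd02606.jpg"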
Fix a regression in #4475. Fetch the commentary as html instead of
plaintext so that we don't lose links or other formatting.
Also fix it so that /jump.php redirect links are replaced with the
actual url.
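The jump.php replacement works along these lines (a sketch assuming Nokogiri; the helper name and sample html are illustrative):

require "nokogiri"
require "cgi"

def rewrite_jump_links(html)
  fragment = Nokogiri::HTML.fragment(html)

  fragment.css("a").each do |link|
    href = link["href"].to_s
    next unless href.start_with?("/jump.php?")

    # The real destination is url-encoded in the query string after jump.php?.
    link["href"] = CGI.unescape(href.delete_prefix("/jump.php?"))
  end

  fragment.to_html
end

rewrite_jump_links('<a href="/jump.php?https%3A%2F%2Fexample.com">example.com</a>')
# => '<a href="https://example.com">example.com</a>'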