Commit Graph

435 Commits

Author SHA1 Message Date
evazion
4074cc99f9 uploads: fix incorrect remote sizes on pixiv uploads.
Bug: the uploads page showed a remote size of 146 bytes for Pixiv uploads.

Cause: we didn't spoof the Referer header when making the HEAD request
for the image, causing Pixiv to return a 403 error.

Also fix the case where the Content-Length header is absent.
2020-06-24 03:02:45 -05:00
evazion
85f58bf2f6 newgrounds: fix style nitpicks. 2020-06-24 00:25:45 -05:00
evazion
42c06b0f1e Merge pull request #4516 from nonamethanks/add_newgrounds_support
Add NewGrounds support
2020-06-24 00:20:16 -05:00
evazion
185693b99b Merge branch 'master' into fix-pixiv-profile-url 2020-06-24 00:06:55 -05:00
evazion
617211f405 Merge pull request #4521 from nonamethanks/zerochan_png
Zerochan: Normalize png links
2020-06-23 02:38:39 -05:00
evazion
f2ae9eeae0 Merge pull request #4528 from nonamethanks/fix_4520
Twitter: don't get api without a status id
2020-06-23 02:36:32 -05:00
evazion
31802fb666 nijie: fix parallel test failures.
Nijie tests fail often under parallel testing. This is because every
test needs to login to Nijie first, but Nijie rate-limits the login
endpoint, so eventually we hit the limit and tests start failing.

This is made worse by a thundering herd problem. Eight test processes
try to login to Nijie at the same time, but only one succeeds, so the
rest sleep and try again, but they all wakeup and try again at the same
time, hitting the rate limits again.

The workaround is to set the retry limit ridiculously high, higher than
we would ideally like in production. Another workaround would be to
serialize the Nijie tests in the test suite. This can be done with
lockfiles and flock(2). This helps, but we can still hit the rate limit
even under serialized execution.
2020-06-22 22:21:17 -05:00
evazion
8c6759bbd7 nicoseiga: fix login endpoint.
* Update the login endpoint. The old endpoint returns 404 now.

  POST https://account.nicovideo.jp/api/v1/login ->
  POST https://account.nicovideo.jp/login/redirector?site=seiga

* Let Danbooru::Http cache the login request instead of caching it manually.

* Let Danbooru::Http automatically follow redirects instead of dealing
  with the Location header manually.
2020-06-22 18:46:47 -05:00
evazion
95fee75d9a nicoseiga: fix uploads not working for certain direct image urls.
Fix Nicoseiga strategy to work with certain direct image urls that we
can't otherwise extract any information from.

Examples:

* https://dic.nicovideo.jp/oekaki/52833.png
2020-06-22 16:53:50 -05:00
evazion
db3407caa3 uploads: fix uploading from source not working.
ref: 26ad844bbe (r40077579).
2020-06-22 15:32:48 -05:00
evazion
f85eef9bcd nijie: fix bug with retries returning cached responses.
Bug: if a Nijie login failed with a 429 Too Many Requests error, the
error would get cached, so when we retried the request, we would just
get our own cached response back every time. The 429 error would
eventually be passed up to the Nijie strategy, which caused random
methods to fail because they couldn't get the html page.

Fix: add the `retriable` feature *after* the `cache` feature so that
retries don't go through the cache. This is a hack. We want retries to
go at the bottom of the stack, below caching, but we can't enforce this
ordering.
2020-06-21 18:13:21 -05:00
evazion
a4efeb2260 gems: drop Mechanize, HTTParty, and Sinatra gems. 2020-06-21 15:13:42 -05:00
evazion
7e471fe223 sources: replace HTTParty with Danbooru::Http in http_exists?. 2020-06-21 15:11:56 -05:00
evazion
5604ab0079 pixiv: remove fanbox support.
This is broken and it needs to be rewritten as a separate source
strategy anyway.
2020-06-21 11:59:51 -05:00
evazion
2da8174ce2 hentai foundry: replace HTTParty with Danbooru::Http. 2020-06-21 05:22:57 -05:00
evazion
6e6ce6e62f nijie: replace Mechanize with Danbooru::Http.
The Nijie login process works like this:

* First we submit our `email` and `password` to `https://nijie.info/login_int.php`.
* Then we save the NIJIEIEID session cookie from the response.
* We optionally retry if login failed. Nijie returns 429 errors with a
  `Retry-After: 5` header if we send too many login requests. This can
  happen during parallel testing.
* We cache the login cookies for only 1 hour so we don't have to worry
  about them becoming invalid if we cache them too long.

Cookies and retrying errors on failure are handled transparently by Danbooru::Http.
2020-06-21 05:22:57 -05:00
nonamethanks
79a59e52ec Twitter: don't get api without a status id 2020-06-20 15:08:49 +02:00
evazion
26ad844bbe downloads: refactor Downloads::File into Danbooru::Http.
Remove the Downloads::File class. Move download methods to
Danbooru::Http instead. This means that:

* HTTParty has been replaced with http.rb for downloading files.

* Downloading is no longer tightly coupled to source strategies. Before
  Downloads::File tried to automatically look up the source and download
  the full size image instead if we gave it a sample url. Now we can
  do plain downloads without source strategies altering the url.

* The Cloudflare Polish check has been changed from checking for a
  Cloudflare IP to checking for the CF-Polished header. Looking up the
  list of Cloudflare IPs was slow and flaky during testing.

* The SSRF protection code has been factored out so it can be used for
  normal http requests, not just for downloads.

* The Webmock gem can be removed, since it was only used for stubbing
  out certain HTTParty requests in the download tests. The Webmock gem
  is buggy and caused certain tests to fail during CI.

* The retriable gem can be removed, since we no longer autoretry failed
  downloads. We assume that if a download fails once then retrying
  probably won't help.
2020-06-20 00:20:39 -05:00
evazion
67a52dbc2d tumblr: support new va.media.tumblr.com urls. 2020-06-19 13:53:35 -05:00
nonamethanks
8a06d3a744 Zerochan: Normalize png links 2020-06-18 08:07:45 +02:00
nonamethanks
7a41ee9c34 Add NewGrounds support 2020-06-18 03:22:30 +02:00
BrokenEagle
158a4aa916 Fix Pixiv user profile URL to use the latest format
This will only affect new artist and commentary records going forward.
2020-06-17 07:07:33 +00:00
evazion
1aa0f65187 sources: fix rubocop warnings. 2020-06-16 00:10:37 -05:00
nonamethanks
5b186f3072 Support for new nicoseiga cdn domain 2020-06-15 04:01:34 +02:00
nonamethanks
6fc4d3ec44 Nicoseiga: Add support for drm-served manga 2020-06-15 03:37:51 +02:00
nonamethanks
260bc997f6 NicoSeiga: Add preview urls 2020-06-15 03:37:51 +02:00
nonamethanks
9f0e85e1b5 Refactor nicoseiga strategy
* Get rid of mechanize, fully switch to Danbooru::Http
* Switch to mobile api, improving speed
* Merge main and manga clients
* Add full support for manga pages
* Add support for anonymous and r-15 images
* Don't fail when attempting to upload oekaki direct links
* Various misc fixes
2020-06-15 03:37:51 +02:00
evazion
d002701bc1 Merge pull request #4494 from nonamethanks/fix_deviantart_api_downloads
Deviantart: fix api downloads
2020-06-09 01:37:03 -05:00
nonamethanks
25b801619f Deviantart: fix api downloads 2020-05-31 07:01:43 +02:00
evazion
855e31ac90 nijie: fetch commentary as html instead of plaintext.
Fix regression in #4475. Fetch the commentary as html instead of
plaintext so that we don't lose links or other formatting.

Also fix it so that /jump.php redirect links are replaced with the
actual url.
2020-05-29 15:36:21 -05:00
evazion
88d9fc4e5e sources: simplify artist finder url normalization.
Get rid of `normalized_for_artist_finder?` and `normalizable_for_artist_finder?`.
This was legacy bullshit that was originally designed to avoid API calls
when saving artist entries containing old Pixiv direct image urls that
had already been normalized, or that couldn't be normalized because they
were bad id.

Nowadays we store profile urls in artist entries instead of direct image
urls, so we don't normally need to do any API calls to normalize the
profile url. Strategies should take care to avoid triggering API calls
inside `profile_url` when possible.
2020-05-29 15:35:15 -05:00
nonamethanks
d339947647 Weibo: add source normalization 2020-05-28 01:05:11 +02:00
evazion
feeea6602c Merge pull request #4488 from nonamethanks/add_weibo_support
Add Weibo support
2020-05-27 16:53:14 -05:00
evazion
2c60a51f64 Merge pull request #4475 from nonamethanks/refactor_source_normalizing
Refactor source normalization
2020-05-27 16:52:17 -05:00
evazion
241894428a Merge pull request #4480 from BrokenEagle/fix-artstation
Fixes issues with Artstation source strategy
2020-05-27 15:37:23 -05:00
nonamethanks
5c7307a1c9 Add Weibo support 2020-05-27 11:30:05 +02:00
BrokenEagle
2d88569fac Fixes issues with Artstation source strategy
The reason that the download was failing was not because the 4k size
didn't exist, but because the Artstation had no way to handle image
cover URLs. This caused it to pass nil to the download function.

Additionally, there was no way to get the preview URL size, i.e. the
smallest available image for an Arstation image URL.

- Adds support for cover URLs
- Adds support for preview URL size
2020-05-24 00:38:54 +00:00
nonamethanks
116f3a67ef Nijie: fetch full commentary rather than truncated preview 2020-05-22 02:47:19 +02:00
nonamethanks
307df3b3e4 Refactor source normalization
* Move the source normalization logic out of the post model
  and into individual sources' strategies.
* Rewrite normalization tests to be handled into each source's test,
  and expand them significantly. Previously we were only testing
  a very small subset of domains and variants.
* Fix up normalization for several sites.
* Normalize fav.me urls into normal deviantart urls.
2020-05-21 22:46:51 +02:00
evazion
deeb465b72 Merge pull request #4457 from lllusion3469/fix_da
Fix Deviantart
2020-05-11 16:22:48 -05:00
evazion
1578841a8a Merge pull request #4445 from nonamethanks/hentai_foundry_support
Add hentai-foundry support
2020-05-11 14:01:07 -05:00
lllusion3469
45ae8bfb6f deviantart: support non-downloadable videos 2020-05-11 19:51:04 +02:00
lllusion3469
40fa985e26 deviantart: use #at_css instead of #search
only one result needed, query is css
2020-05-11 19:51:04 +02:00
lllusion3469
0c180b521c deviantart: avoid download api call if not downloadable
because it's included in api_response which is part of /source.json
2020-05-11 19:51:04 +02:00
lllusion3469
70beb7288d rubocop: fix various issues 2020-05-11 19:51:04 +02:00
lllusion3469
0d5e31868f deviantart: fix non-downloadable flash files 2020-05-11 19:51:04 +02:00
lllusion3469
46e9f2dede deviantart: switch to Danbooru::Http
httprb doesn't seem to support a base_uri parameter so use URI.join with
a relative path instead
2020-05-11 16:11:15 +02:00
lllusion3469
2794cd254d deviantart: return nil on failure instead of ""
was also part of eba6440b8b
2020-05-11 16:11:15 +02:00
lllusion3469
413227e7de deviantart: remove #api_url
similar change in eba6440b8b

in case of #page it may get rid of the redirect if artist and title are
found
2020-05-11 16:11:15 +02:00
lllusion3469
c4a403afca deviantart: remove unreachable else
api_deviation is either #blank? (if condition) or #present?

was also part of eba6440b8b
2020-05-11 16:11:14 +02:00