Commit Graph

406 Commits

Author SHA1 Message Date
evazion
855e31ac90 nijie: fetch commentary as html instead of plaintext.
Fix regression in #4475. Fetch the commentary as html instead of
plaintext so that we don't lose links or other formatting.

Also fix it so that /jump.php redirect links are replaced with the
actual url.
2020-05-29 15:36:21 -05:00
evazion
88d9fc4e5e sources: simplify artist finder url normalization.
Get rid of `normalized_for_artist_finder?` and `normalizable_for_artist_finder?`.
This was legacy bullshit that was originally designed to avoid API calls
when saving artist entries containing old Pixiv direct image urls that
had already been normalized, or that couldn't be normalized because they
were bad id.

Nowadays we store profile urls in artist entries instead of direct image
urls, so we don't normally need to do any API calls to normalize the
profile url. Strategies should take care to avoid triggering API calls
inside `profile_url` when possible.
2020-05-29 15:35:15 -05:00
nonamethanks
d339947647 Weibo: add source normalization 2020-05-28 01:05:11 +02:00
evazion
feeea6602c Merge pull request #4488 from nonamethanks/add_weibo_support
Add Weibo support
2020-05-27 16:53:14 -05:00
evazion
2c60a51f64 Merge pull request #4475 from nonamethanks/refactor_source_normalizing
Refactor source normalization
2020-05-27 16:52:17 -05:00
evazion
241894428a Merge pull request #4480 from BrokenEagle/fix-artstation
Fixes issues with Artstation source strategy
2020-05-27 15:37:23 -05:00
nonamethanks
5c7307a1c9 Add Weibo support 2020-05-27 11:30:05 +02:00
BrokenEagle
2d88569fac Fixes issues with Artstation source strategy
The reason that the download was failing was not because the 4k size
didn't exist, but because the Artstation had no way to handle image
cover URLs. This caused it to pass nil to the download function.

Additionally, there was no way to get the preview URL size, i.e. the
smallest available image for an Arstation image URL.

- Adds support for cover URLs
- Adds support for preview URL size
2020-05-24 00:38:54 +00:00
nonamethanks
116f3a67ef Nijie: fetch full commentary rather than truncated preview 2020-05-22 02:47:19 +02:00
nonamethanks
307df3b3e4 Refactor source normalization
* Move the source normalization logic out of the post model
  and into individual sources' strategies.
* Rewrite normalization tests to be handled into each source's test,
  and expand them significantly. Previously we were only testing
  a very small subset of domains and variants.
* Fix up normalization for several sites.
* Normalize fav.me urls into normal deviantart urls.
2020-05-21 22:46:51 +02:00
evazion
deeb465b72 Merge pull request #4457 from lllusion3469/fix_da
Fix Deviantart
2020-05-11 16:22:48 -05:00
evazion
1578841a8a Merge pull request #4445 from nonamethanks/hentai_foundry_support
Add hentai-foundry support
2020-05-11 14:01:07 -05:00
lllusion3469
45ae8bfb6f deviantart: support non-downloadable videos 2020-05-11 19:51:04 +02:00
lllusion3469
40fa985e26 deviantart: use #at_css instead of #search
only one result needed, query is css
2020-05-11 19:51:04 +02:00
lllusion3469
0c180b521c deviantart: avoid download api call if not downloadable
because it's included in api_response which is part of /source.json
2020-05-11 19:51:04 +02:00
lllusion3469
70beb7288d rubocop: fix various issues 2020-05-11 19:51:04 +02:00
lllusion3469
0d5e31868f deviantart: fix non-downloadable flash files 2020-05-11 19:51:04 +02:00
lllusion3469
46e9f2dede deviantart: switch to Danbooru::Http
httprb doesn't seem to support a base_uri parameter so use URI.join with
a relative path instead
2020-05-11 16:11:15 +02:00
lllusion3469
2794cd254d deviantart: return nil on failure instead of ""
was also part of eba6440b8b
2020-05-11 16:11:15 +02:00
lllusion3469
413227e7de deviantart: remove #api_url
similar change in eba6440b8b

in case of #page it may get rid of the redirect if artist and title are
found
2020-05-11 16:11:15 +02:00
lllusion3469
c4a403afca deviantart: remove unreachable else
api_deviation is either #blank? (if condition) or #present?

was also part of eba6440b8b
2020-05-11 16:11:14 +02:00
lllusion3469
f4b4e12235 deviantart: use image_url as it's a single image 2020-05-11 16:10:56 +02:00
lllusion3469
769bf87a4a deviantart: don't apply /intermediary/ hack for gifs
gifs are always stored as original anyways so the /intermediary/ url
doesn't actually exist for gifs

example:
https://www.deviantart.com/heartgear/art/Silent-Night-579982816
2020-05-11 16:10:33 +02:00
lllusion3469
c2e86385a3 deviantart: don't strip metadata
was also part of eba6440b8b
2020-05-11 16:10:33 +02:00
lllusion3469
1a49ef46f9 deviantart: cache refresh token for 11 weeks
it's valid for 3 months according to this:
https://www.deviantart.com/developers/authentication#refresh

use 11 weeks instead to be safe
2020-05-11 16:10:33 +02:00
lllusion3469
f58564a71f deviantart: don't rewrite download url
it's all handled through something like
https://api-da.wixmp.com/_api/download/file?downloadToken=$TOKEN
now so those modifications aren't necessary anymore.
In fact, the one to "strip s3 query params" removes the token, breaking
the download url.
2020-05-11 16:10:32 +02:00
lllusion3469
9205c32424 deviantart: revert to 7f482dc35b
that's the latest commit made to deviantart files before switching from
the developer API to the Javascript backend from the new "Eclipse"
frontend.
This is necessary because it's basically impossible to download posts
now with the JS backend without being logged in, i.e. having the cookies
from a logged in user, which can't be used for very long even if
exporting them from a browser. You would have to save the cookies
deviantart sends you back via the "Set-Cookie" header in a database
somewhere in addition to the other added complexity.

also
* (temporarily) replace HttpartyCache with HTTParty as it's long been
  removed
* fix one case of "last argument as keyword parameter"
* change repository url (5d1a1cc87e)
* remove self-explanatory comment
2020-05-11 16:09:00 +02:00
evazion
e3187e0bd0 tags: add general?, character?, copyright?, artist?, meta?, empty? helper methods. 2020-05-10 23:56:50 -05:00
evazion
f38c38f26e search: split tag_match into user_tag_match / system_tag_match.
When doing a tag search, we have to be careful about which user we're
running the search as because the results depend on the current user.
Specifically, things like private favorites, private favorite groups,
post votes, saved searches, and flagger names depend on the user's
permissions, and whether non-safe or deleted posts are filtered out
depend on whether the user has safe mode on or the hide deleted posts
setting enabled.

* Refactor internal searches to explicitly state whether they're
  running as the system user (DanbooruBot) or as the current user.
* Explicitly pass in the current user to PostQueryBuilder instead of
  implicitly relying on the CurrentUser global.
* Get rid of CurrentUser.admin_mode? (used to ignore the hide deleted
  post setting) and CurrentUser.without_safe_mode (used to ignore safe
  mode).
* Change the /counts/posts.json endpoint to ignore safe mode and the
  hide deleted posts settings when counting posts.
* Fix searches not correctly overriding the hide deleted posts setting
  when multiple status: metatags were used (e.g. `status:banned status:active`)
* Fix fast_count not respecting the hide deleted posts setting when the
  status:banned metatag was used.
2020-05-07 03:29:44 -05:00
nonamethanks
9724b206b0 Add hentai-foundry support 2020-05-05 21:54:34 +02:00
nonamethanks
804c87a0ea Artstation: artists can have underscores in their name 2020-04-17 01:10:23 +02:00
lllusion3469
65fb0cf510 tumblr: pick biggest image based on resolution
photo[:alt_sizes] may contain a bigger image than
photo[:original_size]

fixes #4324
2020-04-08 16:42:17 +02:00
BrokenEagle
a45ae09d72 Account for additional Twitter video image links 2020-03-29 19:27:05 +00:00
BrokenEagle
9e16c01285 Allow video thumbnails as direct Twitter images 2020-03-20 20:59:37 +00:00
evazion
ddffffb413 artists: factor out artist finder to separate module. 2020-03-06 23:23:38 -06:00
evazion
49a3538933 pixiv: add support for techorus urls. 2020-03-04 00:00:39 -06:00
evazion
266e4054b0 Fix #4293: ArtStation: use 4k images.
Also fixes #4290 (Image replacements: undefined method hostname for nil:NilClass)
2020-03-03 23:01:29 -06:00
evazion
1244e02fe2 pixiv: handle new https://i-f.pximg.net urls. 2020-02-18 19:22:57 -06:00
BrokenEagle
0569e8346c Fix profile url for normalization when Pawoo errors 2020-01-29 22:27:10 +00:00
evazion
60bf21ff80 twitter: fix preview_urls when source url is a direct image.
Fix preview_urls returning an empty array when the source url is a
direct image from Twitter.

Also return preview_urls in /source.json.
2020-01-21 16:34:03 -06:00
evazion
e42881fbbf Fix #4262: Exception when using Twitter video image links. 2020-01-15 15:20:33 -06:00
evazion
aff3d3b18f Fix various rubocop issues. 2020-01-11 19:01:40 -06:00
BrokenEagle
114637f594 Use more efficient use of regexes for Twitter hashtag normalization 2020-01-10 15:30:54 +00:00
BrokenEagle
2eef73e8fe Add additional Twitter hashtag
- It was mentioned on issue #4047 but was forgotten to be included
2020-01-10 14:52:49 +00:00
evazion
faeec18efc twitter: add hashtag normalization test.
Add test for #4243. Also fix warning from bootsnap:

    iseq.rb:18: warning: nested repeat operator '+' and '?' was replaced with '*' in regular expression: /(?<!\A)生誕祭(?:\d+)?\z/
2020-01-05 17:38:30 -06:00
BrokenEagle
67860a845c Add normalization for Twitter hashtags 2020-01-05 23:01:50 +00:00
evazion
309821bf73 rubocop: fix various style issues. 2019-12-22 21:23:37 -06:00
evazion
84ba1d417f Fix #4220: Uploading from Tumblr is broken. 2019-12-15 19:04:52 -06:00
evazion
41378bc8e3 sources: replace HttpartyCache with Danbooru::Http. 2019-12-15 17:06:58 -06:00
evazion
da84e3a2f2 twitter: replace twitter gem with our own API client.
The twitter gem had several problems:

* It's been unmaintained for over a year.
* It pulled in a lot of dependencies, many of which were outdated. In
  particular, it locked the `http` gem to version 3.3, preventing us
  from upgrading to 4.2.
* It raised exceptions on normal error conditions, like for deleted
  tweets or suspended users, which we really don't want.
* We had to wrap it to provide caching.

Changes:

* Fixes #4226 (Exception when creating new artists entries for suspended
  Twitter accounts)
* Drop support for scraping images from summary cards. Summary cards
  are the previews you get when you link to a website in a tweet. These
  preview images aren't always the best image.
2019-12-13 17:27:03 -06:00