132 Commits

Author SHA1 Message Date
evazion
7ed8f95a8e sources: add Source::URL class; factor out Source::URL::Twitter.
Introduce a Source::URL class for parsing URLs from source sites. Refactor the Twitter
source strategy to use it.

This is the first step towards factoring all the URL parsing logic out of source
strategies and moving it to subclasses of Source::URL. Each site will have a subclass
of Source::URL dedicated to parsing URLs from that site. Source strategies will use
these classes to extract information from URLs.

This is to simplify source strategies. Most sites have many different URL formats we have
to parse or rewrite, and handling all these different cases tends to make source
strategies very complex. Isolating the URL parsing logic from the site scraping logic
should make source strategies easier to maintain.
2022-02-23 23:46:04 -06:00
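A minimal sketch of the "one URL class per site" idea, assuming a simplified design. `SourceURL`, `TwitterURL`, `.parse`, `status_id` and `username` are hypothetical names chosen for illustration; this is not Danbooru's actual `Source::URL` API. The base class parses the URL once and hands it to the first registered subclass that claims the host. All Twitter-specific URL knowledge stays in the subclass.

```ruby
require "uri"

# Hypothetical base class: parses a URL once and dispatches to a site subclass.
class SourceURL
  SUBCLASSES = []

  attr_reader :uri, :host, :path_segments

  # Every subclass registers itself when it is defined.
  def self.inherited(subclass)
    super
    SUBCLASSES << subclass
  end

  # Returns an instance of the first subclass whose .match? accepts the host, or nil.
  def self.parse(url)
    uri = URI.parse(url)
    klass = SUBCLASSES.find { |k| k.match?(uri.host.to_s.downcase) }
    klass&.new(uri)
  rescue URI::InvalidURIError
    nil
  end

  def initialize(uri)
    @uri = uri
    @host = uri.host.to_s.downcase
    @path_segments = uri.path.split("/").reject(&:empty?)
  end
end

# Hypothetical Twitter subclass: knows only about Twitter URL formats.
class TwitterURL < SourceURL
  def self.match?(host)
    %w[twitter.com mobile.twitter.com pbs.twimg.com].include?(host)
  end

  # https://twitter.com/<username>/status/<id>
  def status_id
    _, keyword, id = path_segments
    id if keyword == "status" && id&.match?(/\A\d+\z/)
  end

  # Image URLs on pbs.twimg.com carry no username.
  def username
    path_segments.first unless host == "pbs.twimg.com"
  end
end
```

A source strategy would then call `SourceURL.parse(url)` and ask the result for `status_id` or `username` instead of running its own regexes.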
evazion
68ba447494 uploads: remove batch upload page.
* Make /uploads/batch redirect to /uploads/new.
* Remove /uploads/image_proxy.
2022-02-21 00:03:43 -06:00
evazion
7cfbd891ae pixiv: avoid unnecessary API call when uploading Pixiv posts.
Do one less API call when fetching the image URLs for a Pixiv post. The
`is_ugoira?` check in `image_urls` caused us to do an extra API call
when fetching the image URLs for a non-ugoira post.

API calls to Pixiv take around 800ms, so this reduces the minimum upload
time for Pixiv posts from ~1.6 seconds (two calls) to ~0.8 seconds.
2022-02-15 18:55:12 -06:00
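The commit does not say exactly how the extra call was removed. One common way to avoid this kind of duplicate request is to memoize the API response so that `is_ugoira?` and `image_urls` share a single fetch. The sketch below shows that approach. `FakePixivApi`, `PixivStrategySketch`, `api_illust` and the response keys are hypothetical stand-ins, not Danbooru's code.

```ruby
# Hypothetical stand-in for the Pixiv API client; counts calls so the
# effect of memoization is visible.
class FakePixivApi
  attr_reader :calls

  def initialize(response)
    @response = response
    @calls = 0
  end

  def illust(_id)
    @calls += 1
    @response
  end
end

# Hypothetical strategy: every method reads from one memoized response.
class PixivStrategySketch
  def initialize(api, illust_id)
    @api = api
    @illust_id = illust_id
  end

  # The API is hit on first use only; later calls reuse the cached hash.
  def api_illust
    @api_illust ||= @api.illust(@illust_id)
  end

  def is_ugoira?
    api_illust[:type] == "ugoira"
  end

  def image_urls
    is_ugoira? ? [api_illust[:ugoira_zip_url]] : api_illust[:urls]
  end
end
```

With this structure, asking `is_ugoira?` inside `image_urls` costs nothing extra, because both read the same cached response.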
evazion
0ba6dc9ee5 Fix #4945: Search for an artist by URL throws an exception. 2021-12-18 01:55:29 -06:00
evazion
a7dc05ce63 Enable frozen string literals.
Make all string literals immutable by default.
2021-12-14 21:33:27 -06:00
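For reference, Ruby enables this per file with the `# frozen_string_literal: true` magic comment, which must sit in the file's leading comment block. The small illustration below shows the effect and the usual escape hatch, unary plus (`String#+@`), which returns an unfrozen copy. The variable names are arbitrary.

```ruby
# frozen_string_literal: true

# With the magic comment above, every plain string literal in this file is frozen.
greeting = "hello"
mutable =
  begin
    greeting << " world" # raises FrozenError when the literal is frozen
    true
  rescue FrozenError
    false
  end
puts "literal mutable: #{mutable}"

# Code that really needs a mutable buffer asks for one explicitly.
buffer = +"hello"
buffer << " world"
puts buffer
```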
evazion
bc506ed1b8 uploads: refactor to simplify ugoira handling and replacements:
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
  the MediaAsset is created, not when the post is created. This way it's
  possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
  with the ugoira frame data already attached to it, instead of returning
  the frame data in the `data` field and then passing it around separately
  in the `context` field of the upload.
2021-10-18 05:18:46 -05:00
evazion
bb7f24d279 Add HTTP proxy support.
Add support for using a proxy for HTTP requests. The proxy is only used
for external requests, such as downloading files or talking to source
sites like Pixiv or Twitter. Internal requests, such as talking to IQDB
or Reportbooru, bypass it.
2021-08-28 04:53:33 -05:00
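A minimal sketch of the external/internal split using Ruby's standard `Net::HTTP`, assuming internal services can be recognized by hostname. `INTERNAL_HOSTS`, its entries, and `build_http` are hypothetical. Passing `nil` as the proxy address to `Net::HTTP.new` disables proxying, including any proxy set in the environment. No connection is opened here.

```ruby
require "net/http"
require "uri"

# Hypothetical hostnames for internal services that must bypass the proxy.
INTERNAL_HOSTS = %w[iqdb.internal reportbooru.internal].freeze

# Build (but do not start) a Net::HTTP client for +url+.
def build_http(url, proxy_host:, proxy_port:)
  uri = URI.parse(url)
  if INTERNAL_HOSTS.include?(uri.host)
    # nil proxy address: connect directly and ignore ENV proxy settings.
    Net::HTTP.new(uri.host, uri.port, nil)
  else
    Net::HTTP.new(uri.host, uri.port, proxy_host, proxy_port)
  end
end
```

Keeping the decision in one constructor means callers cannot accidentally route internal traffic through the external proxy.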
nonamethanks
073f63cfa7 Pixiv: don't add auto-generated usernames to the other names field 2021-03-16 02:44:49 +01:00
evazion
7b60a476e5 sources: add artist profile links to fetch source data box.
Add site icons linking to all the artist's sites in the fetch source
data box.

Some artist entries have a large number of URLs. Various heuristics are
applied to try to present the most useful URLs first. Dead URLs and
redundant URLs (Pixiv stacc and Twitter intent URLs) are filtered out.
Remaining URLs are sorted first by site (to put sites like Pixiv and
Twitter first), then by URL (to break ties when an artist has multiple
accounts on the same site).

Some sites have shitty hard-to-read icons. It can't be helped. The icons
are the official favicons of each site.
2021-02-26 01:24:30 -06:00
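The filter-then-sort heuristic described above might look roughly like this. The code is an illustration, not the actual implementation. `ArtistURL`, `SITE_PRIORITY`, the `REDUNDANT_URL` pattern and `display_urls` are hypothetical. The sort key is (site priority, site name, URL), so the listed sites come first and ties within a site are broken by URL.

```ruby
# Hypothetical site priority: listed sites sort first, in this order.
SITE_PRIORITY = %w[Pixiv Twitter].freeze

# Redundant URL forms that duplicate a main profile (Pixiv stacc, Twitter intent).
REDUNDANT_URL = %r{pixiv\.net/stacc/|twitter\.com/intent/}

ArtistURL = Struct.new(:site, :url, :active, keyword_init: true)

def display_urls(urls)
  urls
    .select(&:active)                           # drop dead URLs
    .reject { |u| u.url.match?(REDUNDANT_URL) } # drop redundant URLs
    .sort_by { |u| [SITE_PRIORITY.index(u.site) || SITE_PRIORITY.size, u.site, u.url] }
end
```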
evazion
23a06aff1d Fix #4720: Pixiv commentary links all create invalid urls.
Regression caused by the switch from the mobile API to the Ajax API. In
the Ajax API, commentaries have /jump.php?<url> links that we have to strip out.
2021-02-13 17:41:01 -06:00
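A minimal sketch of stripping these redirect links, assuming the target URL is the whole query string after `/jump.php?` and may be percent-encoded. `JUMP_LINK` and `strip_jump_links` are hypothetical names. `CGI.unescape` leaves an unencoded URL unchanged, apart from turning `+` into a space.

```ruby
require "cgi"

# Hypothetical pattern for /jump.php?<url> links, relative or absolute.
JUMP_LINK = %r{(?:https?://www\.pixiv\.net)?/jump\.php\?([^"'\s<>]+)}

# Replace each redirect link with the (decoded) URL it points to.
def strip_jump_links(commentary)
  commentary.gsub(JUMP_LINK) { CGI.unescape(Regexp.last_match(1)) }
end
```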
evazion
39cc3ed5cf pixiv: fix API breakage.
Fix the Pixiv API no longer working by rewriting the Pixiv strategy to
use the Ajax API instead of the mobile API.

Previously we authenticated to the mobile API using the OAuth 2.0
grant_type=password authentication flow. This no longer works. Logging
in now requires going through an HTML page protected by Google
reCAPTCHA, which makes using the mobile API infeasible.

Instead, we switch to the Ajax API, which only needs a PHPSESSID cookie
to authenticate. The cookie can be obtained by logging in manually and
extracting it with the browser devtools.

This also temporarily removes support for Pixiv novels. This should be
moved to a separate source strategy.
2021-02-09 06:18:36 -06:00
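As a rough illustration of cookie-based authentication, the sketch below builds (but does not send) a GET request that carries the PHPSESSID cookie. The endpoint path `/ajax/illust/<id>` and the `pixiv_ajax_request` helper are assumptions made for the example. Pixiv may require further headers that are not shown here.

```ruby
require "net/http"
require "uri"

# Build an Ajax API request authenticated only by the session cookie.
# The endpoint path is an assumption for illustration.
def pixiv_ajax_request(illust_id, phpsessid)
  uri = URI("https://www.pixiv.net/ajax/illust/#{illust_id}")
  request = Net::HTTP::Get.new(uri)
  request["Cookie"] = "PHPSESSID=#{phpsessid}"
  request
end
```

To send it, one would use something like `Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }`.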
evazion
dbb66ace90 routes: replace hardcoded routes in models with route helpers.
Add a Routes module that gives models access to route helpers outside of
views, and use it to replace various hardcoded routes.
2020-12-24 00:17:19 -06:00
nonamethanks
9a7a1e20ca Add fanbox support 2020-08-09 00:21:57 +02:00
nonamethanks
3179509791 Uploads: Check if strategy is enabled before use
Avoid returning bare API tracebacks from Pixiv et al. when login details
are not configured; raise a generic error instead.
2020-07-11 04:56:46 +02:00
evazion
185693b99b Merge branch 'master' into fix-pixiv-profile-url 2020-06-24 00:06:55 -05:00
evazion
5604ab0079 pixiv: remove fanbox support.
This is broken and it needs to be rewritten as a separate source
strategy anyway.
2020-06-21 11:59:51 -05:00
BrokenEagle
158a4aa916 Fix Pixiv user profile URL to use the latest format
This will only affect new artist and commentary records going forward.
2020-06-17 07:07:33 +00:00
evazion
1aa0f65187 sources: fix rubocop warnings. 2020-06-16 00:10:37 -05:00
evazion
88d9fc4e5e sources: simplify artist finder url normalization.
Get rid of `normalized_for_artist_finder?` and `normalizable_for_artist_finder?`.
This was legacy bullshit that was originally designed to avoid API calls
when saving artist entries containing old Pixiv direct image urls that
had already been normalized, or that couldn't be normalized because they
were bad id.

Nowadays we store profile urls in artist entries instead of direct image
urls, so we don't normally need to do any API calls to normalize the
profile url. Strategies should take care to avoid triggering API calls
inside `profile_url` when possible.
2020-05-29 15:35:15 -05:00
nonamethanks
307df3b3e4 Refactor source normalization
* Move the source normalization logic out of the post model
  and into individual sources' strategies.
* Move the normalization tests into each source's own test file,
  and expand them significantly. Previously we were only testing
  a very small subset of domains and variants.
* Fix up normalization for several sites.
* Normalize fav.me urls into normal deviantart urls.
2020-05-21 22:46:51 +02:00
evazion
49a3538933 pixiv: add support for techorus urls. 2020-03-04 00:00:39 -06:00
evazion
1244e02fe2 pixiv: handle new https://i-f.pximg.net urls. 2020-02-18 19:22:57 -06:00
evazion
309821bf73 rubocop: fix various style issues. 2019-12-22 21:23:37 -06:00
evazion
03d9b3feca pixiv: support new https://www.pixiv.net/artworks/:id urls. 2019-09-24 03:33:21 -05:00
evazion
64eb6dbb2a pixiv: possible fix for #4152. 2019-09-02 13:13:58 -05:00
evazion
e781c6b608 pixiv: temp disable source strategy (#4152).
Disable use of the Pixiv API until we get it working again.
2019-09-02 11:13:00 -05:00
evazion
13dff046f7 pixiv: fix illust id parsing (fixup 8cadef2dd) 2019-01-13 15:02:51 -06:00
evazion
8cadef2dd7 pixiv: fix illust id parsing (fix #4043).
* Tighten up illust id parsing to avoid misparsing ids from
  non-illust urls (sketch urls and novel urls).

* Move id parsing tests from post_test.rb to sources/pixiv_test.rb.

* Drop support for touch.pixiv.net urls. These urls are no longer used
  by Pixiv and aren't present as the source of any posts on Danbooru.
2019-01-13 14:28:51 -06:00
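"Tightening" the parser here means extracting an ID only from URL shapes known to point at illusts, so sketch and novel URLs yield nothing. Here is a sketch of that approach. The patterns and `illust_id_from` are hypothetical and simplified. The `/artworks/:id` form comes from a later commit in this log.

```ruby
require "uri"

# Hypothetical, simplified whitelist of illust URL shapes.
ILLUST_URL_PATTERNS = [
  %r{\Ahttps?://www\.pixiv\.net/artworks/(?<id>\d+)\z},
  %r{\Ahttps?://i\.pximg\.net/img-original/img/(?:\d+/){6}(?<id>\d+)_p\d+\.\w+\z},
].freeze

# Returns the illust ID as an Integer, or nil for non-illust URLs.
def illust_id_from(url)
  uri = URI.parse(url)
  if uri.host == "www.pixiv.net" && uri.path == "/member_illust.php"
    id = URI.decode_www_form(uri.query.to_s).to_h["illust_id"]
    return id&.match?(/\A\d+\z/) ? id.to_i : nil
  end

  ILLUST_URL_PATTERNS.each do |pattern|
    match = pattern.match(url)
    return match[:id].to_i if match
  end
  nil
end
```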
evazion
04d5b16da7 pixiv: fix failure to upload bad pixiv id images (fix #4031)
Bug: Uploading bad pixiv id images failed because the pixiv strategy
raised a BadIDError exception when the upload service checked for the
ugoira frame data.
2019-01-03 18:01:20 -06:00
evazion
2129e60b2b pixiv: include stacc url in new artist entries (#4028). 2018-12-27 15:03:11 -06:00
evazion
1f73e60514 sources: add methods for customizing new artist entries.
* Rename `unique_id` to `tag_name`.

* Add `other_names` and `profile_urls` methods that sources can override
  to provide extra names or urls when creating new artist entries.
2018-12-27 15:03:11 -06:00
evazion
c700ea4b5f Fix #4016: Translated tags failing to find some tags.
* Normalize spaces to underscores when saving other names. Preserve case
  since case can be significant.

* Fix WikiPage#other_names_include to search case-insensitively (note:
  this prevents using the index).

* Fix sources to return the raw tags in `#tags` and the normalized tags
  in `#normalized_tags`. The normalized tags are the tags that will be
  matched against other names.
2018-12-16 11:37:57 -06:00
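The two rules above (normalize whitespace to underscores while keeping case, then compare case-insensitively) can be shown in a few lines. `normalize_other_name` and `other_names_include?` are hypothetical helpers for illustration. They are not the actual `WikiPage#other_names_include` implementation, which works at the database level.

```ruby
# Collapse runs of whitespace (including full-width spaces) to one underscore.
# Case is preserved because it can be significant.
def normalize_other_name(name)
  name.strip.gsub(/[[:space:]]+/, "_")
end

# Case-insensitive membership test against already-normalized other names.
def other_names_include?(other_names, name)
  target = normalize_other_name(name).downcase
  other_names.any? { |n| n.downcase == target }
end
```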
evazion
5cf6a43918 sources: fix sources sometimes choosing wrong strategy (fix #3968)
Fix sources choosing the wrong strategy when the referer belongs to a
different site (for example, when uploading a twitter post with a pixiv
referer).

* Fix `match?` to only consider the main url, not the referer.

* Change `match?` to match against a list of domains given by the `domains` method.

* Change `match?` to an instance method.
2018-11-04 13:00:17 -06:00
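A minimal sketch of these three changes together. Each strategy lists its `domains`, `match?` is an instance method, and only the main URL is consulted, so a Pixiv referer cannot pull a Twitter upload into the Pixiv strategy. `BaseStrategy`, `PixivStrategy`, `TwitterStrategy`, `STRATEGIES` and `find_strategy` are hypothetical names, not Danbooru's classes.

```ruby
require "uri"

# Hypothetical base strategy.
class BaseStrategy
  attr_reader :url, :referer_url

  def initialize(url, referer_url = nil)
    @url = url
    @referer_url = referer_url
  end

  # Subclasses list the domains they handle.
  def domains
    []
  end

  # Only the main URL decides the match; referer_url is deliberately ignored.
  def match?
    host = URI.parse(url).host.to_s
    domains.any? { |d| host == d || host.end_with?(".#{d}") }
  end
end

class PixivStrategy < BaseStrategy
  def domains
    %w[pixiv.net pximg.net]
  end
end

class TwitterStrategy < BaseStrategy
  def domains
    %w[twitter.com twimg.com]
  end
end

STRATEGIES = [PixivStrategy, TwitterStrategy].freeze

# First strategy whose domains contain the main URL's host, or nil.
def find_strategy(url, referer_url = nil)
  STRATEGIES.map { |klass| klass.new(url, referer_url) }.find(&:match?)
end
```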
r888888888
e060236fb7 add exception for direct links to pixiv fanbox images 2018-10-20 23:58:19 -07:00
Albert Yi
a85f3773e3 fix nil commentary case for pixiv strategy #3948 2018-10-12 14:35:44 -07:00
evazion
fbd5f6b7f2 pixiv: fix preview_urls for ugoiras (#3891). 2018-09-12 00:43:10 -05:00
evazion
37fc215d75 pixiv: fix preview_urls to use correct url (#3891). 2018-09-11 23:55:46 -05:00
Albert Yi
4972c998f8 rely on preview urls if available for gallery 2018-09-11 15:06:12 -07:00
evazion
950fcdb7b2 uploads: add new source:<url> dupe check (fix #3873)
* On the /uploads/new page, instead of just showing a "This post has
  probably already been uploaded" message, show the actual thumbnails of
  posts having the same source as what the user is trying to upload.

* Move the iqdb results section up top, beside the related posts section.
2018-09-06 20:43:20 -05:00
evazion
5c457fbe51 pixiv: remove obsolete edgesuite.net rewrite rule.
This CDN hasn't been seen for several years.

ref: https://danbooru.donmai.us/forum_topics/10766
2018-09-04 18:15:21 -05:00
evazion
4bbe09762d pixiv: remove dead methods (#is_manga?, #page_count, #page). 2018-09-04 18:15:21 -05:00
Albert Yi
8ec96f42f7 fix specs 2018-09-04 13:38:09 -07:00
Albert Yi
4a56f8d160 fixes #3856 for pixiv fanbox urls 2018-09-04 12:53:58 -07:00
evazion
c689a161f6 pixiv: fix failure when normalizing pixiv stacc artist urls (#3856). 2018-08-30 19:24:44 -05:00
Albert Yi
762dc3da24 Refactor sources 2018-08-24 12:10:51 -07:00
Albert Yi
5ae37597cd fixes #3728 2018-05-25 13:24:49 -07:00
Albert Yi
c97b0245d6 reduce expiry for cached pixiv tokens to 1 week, revert to old method for extracting image url from page (fixes #3722) 2018-05-25 10:04:28 -07:00
Albert Yi
bd49f5ed20 rely on pixiv api for getting image url (fixes #3721)
[skip ci]
2018-05-22 09:46:37 -07:00
Albert Yi
be54741dba bubble up source errors in pixiv strategy in test env 2018-05-16 17:19:20 -07:00
evazion
302994e5d9 Fix #3639: Favorite count pixiv tags aren't skipped by translated tags. 2018-04-13 22:39:52 -05:00