Commit Graph

165 Commits

Author SHA1 Message Date
evazion
b8f154d301 artists: add more artist url icons. 2022-03-30 22:04:24 -05:00
evazion
bfbc932025 Fix #5082: NoMethodError when searching an old-style dead fanbox url in artist urls.
This API call:

    # profile: https://www.pixiv.net/fanbox/creator/40684196
    curl -H "Origin: https://fanbox.cc" "https://api.fanbox.cc/creator.get?userId=40684196"

returns `{ "body": null }` when the artist is deleted. We didn't expect `body` to be nil.

Also fix it so that `profile_url` returns the `https://www.pixiv.net/fanbox/creator/40684196`
URL if we can't get the `https://<username>.fanbox.cc` URL, usually because the API call failed
because the artist is deleted.
2022-03-30 18:19:08 -05:00
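The nil-body handling described in bfbc932025 could be sketched roughly as follows. This is a minimal illustration, not the actual Danbooru code: the helper name `fanbox_profile_url`, its parameters, and the `creatorId` response key are assumptions made for the example.

```ruby
require "json"

# Hypothetical helper: choose a profile URL from a creator.get response.
# The "creatorId" key and the method signature are assumptions.
def fanbox_profile_url(response_json, user_id)
  data = JSON.parse(response_json)

  # Hash#dig returns nil instead of raising NoMethodError when "body" is null.
  username = data.dig("body", "creatorId")

  if username
    "https://#{username}.fanbox.cc"
  else
    # Fall back to the old-style URL when the lookup yields nothing.
    "https://www.pixiv.net/fanbox/creator/#{user_id}"
  end
end

puts fanbox_profile_url('{"body": null}', 40684196)
```

Using `dig` keeps the deleted-artist case on the same code path as the normal one, so no separate rescue is needed.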
evazion
a272c19b98 Fix #5078: Pixiv booth upload broken.
Allow image URLs from https://booth.pximg.net to be uploaded. Fix bug
where Booth.pm URLs were incorrectly caught by the Pixiv extractor.
2022-03-30 03:25:42 -05:00
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
34aa22f90b sources: fix fandom.com page urls.
Fix it so that sources like this:

* https://vignette.wikia.nocookie.net/valkyriecrusade/images/c/c5/Crimson_Hatsune_H.png/revision/latest?cb=20180702031954

link to this:

* https://valkyriecrusade.fandom.com/?file=Crimson_Hatsune_H.png

instead of this:

* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png

The `/wiki/File:$name` URL redirects to whatever wiki page contains the
image instead of showing the file itself.
2022-03-23 23:38:06 -05:00
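The image-to-page mapping described in 34aa22f90b might look something like the sketch below. The helper name `fandom_page_url` and the path pattern are assumptions based only on the single example URL in the commit message; the real parser likely handles more URL shapes.

```ruby
require "uri"

# Hypothetical helper, not Danbooru's actual code: map a
# vignette.wikia.nocookie.net image URL to a fandom.com file page.
def fandom_page_url(image_url)
  uri = URI.parse(image_url)
  return nil unless uri.host == "vignette.wikia.nocookie.net"

  # Assumed path shape: /<wiki>/images/<h>/<hh>/<file>/revision/latest
  match = uri.path.match(%r{\A/(?<wiki>[^/]+)/images/\h/\h{2}/(?<file>[^/]+)})
  return nil unless match

  # Use ?file=<name> rather than /wiki/File:<name>, which redirects.
  "https://#{match[:wiki]}.fandom.com/?file=#{match[:file]}"
end

puts fandom_page_url("https://vignette.wikia.nocookie.net/valkyriecrusade/images/c/c5/Crimson_Hatsune_H.png/revision/latest?cb=20180702031954")
```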
evazion
5941c47b79 nicoseiga: support a few more url types. 2022-03-23 23:38:06 -05:00
evazion
c07c5ea594 nicoseiga: fix page_url method not to return seiga.nicovideo.jp/image/source/:id urls.
Fix the page_url method not to return URLs like this:

    https://seiga.nicovideo.jp/image/source/8017978 (page: https://seiga.nicovideo.jp/watch/mg310193)

These are direct image URLs, not page URLs. It's not generally possible
to get to the page URL from an image URL like this.

This fixes it so that we don't incorrectly set the source of NicoSeiga
uploads to the image URL.
2022-03-23 23:38:06 -05:00
evazion
4ef8178bd1 sources: remove canonical_url method.
Refactor source strategies to remove the `canonical_url` method.

`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the page URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the image URL is used as the source.

This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
2022-03-23 23:38:06 -05:00
evazion
fbca31d29e artists: add more artist url icons. 2022-03-23 02:59:22 -05:00
evazion
3aa5cab2aa sources: refactor normalize_for_source.
`normalize_for_source` was used to convert image URLs to page URLs when displaying sources
on the post show page. Move all the code for converting image URLs to page URLs from
`Sources::Strategies#normalize_for_source` to `Source::URL#page_url`.

Before we had to be very careful in source strategies not to make any network calls in
`normalize_for_source`, since it was used in the view for the post show page. Now all the
code for generating page URLs is isolated in Source::URL, which makes source strategies
simpler. It also makes it easier to check if a source is an image URL or page URL, and if
the image URL is convertible to a page URL, which will make autotagging bad_link or
bad_source feasible.

Finally, this fixes it to generate better page URLs in a handful of cases:

* https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP
* https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6 instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6
* http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png
* https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1
2022-03-23 01:34:04 -05:00
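The point of 3aa5cab2aa is that page URLs are derived from the URL string alone, with no network calls. A minimal sketch for one of the listed sites, assuming a hypothetical helper `minitokyo_page_url` (the actual `Source::URL#page_url` implementation is not shown in this log):

```ruby
require "uri"

# Hypothetical, network-free conversion for a single site.
# Returns nil when the URL cannot be converted to a page URL.
def minitokyo_page_url(url)
  uri = URI.parse(url)
  return nil unless uri.host == "gallery.minitokyo.net"

  # Both /view/<id> and /download/<id> identify the same work.
  id = uri.path[%r{\A/(?:view|download)/(\d+)\z}, 1]
  id && "http://gallery.minitokyo.net/view/#{id}"
end

puts minitokyo_page_url("http://gallery.minitokyo.net/download/365677")
```

Because the method only inspects the string, it is cheap enough to call from a view, and a `nil` result doubles as a signal that the source is not convertible.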
evazion
770f850c66 instagram: add a couple more url types. 2022-03-22 04:35:50 -05:00
evazion
452ce8d165 artstation: add partial support for video clips (#5063).
Add partial support for fetching videos from ArtStation posts that
contain videos. Most of this code is disabled for now because actually
downloading these videos requires bypassing a Cloudflare captcha.
2022-03-21 16:51:42 -05:00
evazion
7c887f8adc artists: fix exception when adding TwitPic urls. 2022-03-20 21:56:38 -05:00
evazion
7394660ba9 posts: fix exception when post has source like 'https://www.twitter.com/username'.
`twitter.com` sources worked but `www.twitter.com` didn't.

Also match the URL by class instead of by site name to ensure we match
the expected class.
2022-03-20 21:08:05 -05:00
evazion
01b683798e sources: add Tinami support. 2022-03-19 00:50:36 -05:00
evazion
40cbc0423c sources: add Instagram profile url normalization. 2022-03-18 18:20:29 -05:00
evazion
c6e528a073 artstation: normalize https://artstation.com/artist/username/albums/all/ urls. 2022-03-18 17:10:26 -05:00
evazion
cc54b5f730 fanbox: normalize http://www.pixiv.net/fanbox/creator/3113804/post urls. 2022-03-18 17:10:26 -05:00
evazion
26d23c49d0 pawoo: normalize https://pawoo.net/users/evazion urls. 2022-03-18 17:10:26 -05:00
evazion
10dac3ee51 artists: normalize urls added to artist entries.
When a URL is added to an artist entry, normalize it to a standard form.

Artist URLs have both a `url` column and a `normalized_url` column. The
`normalized_url` is used for artist finding and the `url` is the raw URL
entered by the user. Previously only the `normalized_url` field was
normalized; now the URL entered by the user is also converted to a
normalized form.

This means that if a URL like one of these is added to an artist entry:

* http://www.pixiv.net/member.php?id=1234
* http://www.pixiv.net/en/users/1234
* http://www.twitter.com/DanbooruBot/
* http://mobile.twitter.com/DanbooruBot/

It will get normalized to this:

* https://www.pixiv.net/users/1234
* https://twitter.com/DanbooruBot

This fixes problems with duplicate URLs being added to artist entries
because URLs weren't normalized to a single form.
2022-03-18 02:06:50 -05:00
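The normalization described in 10dac3ee51 can be illustrated with a small sketch that covers only the four example URLs from the commit message. The method name `normalize_artist_url` and its matching rules are assumptions for illustration; the real code handles many more sites and formats.

```ruby
require "uri"

# Hypothetical normalizer covering only the Pixiv and Twitter examples.
# Unrecognized URLs are returned unchanged.
def normalize_artist_url(url)
  uri = URI.parse(url)
  host = uri.host.to_s.sub(/\A(?:www|mobile)\./, "")
  path = uri.path.chomp("/")

  case host
  when "pixiv.net"
    # Old style: /member.php?id=1234. New style: /users/1234 or /en/users/1234.
    id = URI.decode_www_form(uri.query.to_s).to_h["id"] if path == "/member.php"
    id ||= path[%r{\A/(?:en/)?users/(\d+)\z}, 1]
    id ? "https://www.pixiv.net/users/#{id}" : url
  when "twitter.com"
    name = path[%r{\A/(\w+)\z}, 1]
    name ? "https://twitter.com/#{name}" : url
  else
    url
  end
end

puts normalize_artist_url("http://www.pixiv.net/member.php?id=1234")
puts normalize_artist_url("http://mobile.twitter.com/DanbooruBot/")
```

Mapping every variant to one canonical string is what lets a plain equality check catch duplicate artist URLs.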
evazion
455ee9a52a fc2: parse more url types. 2022-03-18 02:06:30 -05:00
evazion
03d2a86ef1 artists: normalize fc2.com profile urls. 2022-03-17 19:42:57 -05:00
evazion
ded03df1ff tests: fix more broken tests. 2022-03-15 05:14:56 -05:00
evazion
644dfaf74c tests: fix broken tests. 2022-03-15 04:45:30 -05:00
evazion
133c45ee29 sources: parse more profile url formats.
Add support for parsing these URL formats:

* https://www.artstation.com/felipecartin/profile
* https://www.deviantart.com/nlpsllp/gallery
* https://fantia.jp/asanagi
* https://www.lofter.com/front/blog/home-page/noshiqian
* https://www.lofter.com/app/xiaokonggedmx
* https://www.lofter.com/blog/semblance
* https://q.nicovideo.jp/users/18700356
* https://dic.nicovideo.jp/u/11141663
* https://3d.nicovideo.jp/users/109584
* https://3d.nicovideo.jp/u/siobi
* https://game.nicovideo.jp/atsumaru/users/7757217
* https://www.pixiv.net/user/13569921/series/81967
* https://pixiv.cc/zerousagi/
* https://www.plurk.com/u/ddks2923
* https://www.plurk.com/m/u/leiy1225
* https://www.plurk.com/s/u/salmonroe13
* https://www.plurk.com/RSSSww/invite/4
* https://skeb.jp/@okku_oxn/works
* https://www.tumblr.com/blog/view/artofelaineho/187614935612
* https://www.tumblr.com/blog/view/artofelaineho
* https://www.tumblr.com/blog/artofelaineho
* https://www.tumblr.com/dashboard/blog/dankwartart
* https://rosarrie.tumblr.com/archive
* https://whereisnovember.tumblr.com/tagged/art
* https://twitpic.com/photos/Type10TK
* https://www.weibo.com/detail/4676597657371957
* https://www.weibo.com/u/5957640693/home?wvr=5
* https://www.weibo.com/lvxiuzi0/home
2022-03-15 00:49:54 -05:00
evazion
1d9a15a119 weibo: handle a couple more profile url types.
Parse these profile URL types:

* https://www.weibo.cn/endlessnsmt
* https://www.weibo.com/p/1005055399876326

Also add anchors around the regexes so they have to match the full string.
2022-03-13 20:32:57 -05:00
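The effect of anchoring mentioned in 1d9a15a119 can be shown with a small comparison. The patterns below are hypothetical stand-ins for a numeric `/u/<id>` profile path, since the real regexes are not shown in the commit. Note that in Ruby `^` and `$` match at line boundaries, so `\A` and `\z` are the anchors that force a full-string match.

```ruby
# Hypothetical patterns; the actual Weibo regexes are not shown in this log.
UNANCHORED_PROFILE = %r{/u/\d+}
ANCHORED_PROFILE   = %r{\A/u/\d+\z}

def profile_path?(path, pattern)
  path.match?(pattern)
end

paths = ["/u/5957640693", "/u/5957640693/home", "/detail/u/123"]
paths.each do |path|
  loose  = profile_path?(path, UNANCHORED_PROFILE)
  strict = profile_path?(path, ANCHORED_PROFILE)
  puts "#{path}: unanchored=#{loose} anchored=#{strict}"
end
```

The unanchored pattern accepts any path that merely contains `/u/<digits>`, while the anchored one accepts only a path that consists of exactly that.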
evazion
9343f7c912 Source::URL: add profile_url method.
Add a method for converting a source URL into a profile URL. This will
be used for normalizing profile URLs in artist entries.

Also add the ability to parse a few more profile URL formats.
2022-03-13 03:54:17 -05:00
evazion
787b5c8e27 sources: merge Sta.sh strategy into DeviantArt strategy.
This turns out to be a little simpler than keeping them separate. The
only thing special we have to do for Sta.sh is use the Sta.sh page when
we have a DeviantArt image with a Sta.sh referer.
2022-03-12 00:57:43 -06:00
evazion
28971fe103 sources: factor out site_name method. 2022-03-11 23:20:53 -06:00
evazion
b4aea72d04 sources: remove preview_urls method from base strategy.
Remove the `preview_urls` method from strategies. The only place this was used was
when doing IQDB searches, to download the thumbnail image from the source instead of
the full image.

This wasn't worth it for a few reasons:

* Thumbnails on other sites are sometimes not the size we want, which could affect
  IQDB results.
* Grabbing thumbnails is complex for some sites. You can't always just rewrite the
  image URL. Sometimes it requires extra API calls, which can be slower than just
  grabbing the full image.
* For videos and animations, thumbnails from other sites don't always match our
  thumbnails. We do smart thumbnail generation to try to avoid blank thumbnails, which
  means we don't always pick the first frame, which could affect IQDB results.

API changes:

* /iqdb_queries?search[file_url] now downloads the URL as is without any modification.
  Before it tried to change thumbnail and sample size image URLs to the full version.

* /iqdb_queries?search[url] now returns an error if the URL is for an HTML page that
  contains multiple images. Before it would grab only the first image and silently
  ignore the rest.
2022-03-11 03:22:23 -06:00
evazion
5016d9ad26 Merge pull request #5043 from nonamethanks/fantia-support
Add Fantia support
2022-03-10 15:21:03 -06:00
evazion
29fc072cf1 Merge pull request #5042 from nonamethanks/weibo-fix-typo
weibo: fix typo in strategy
2022-03-10 15:01:12 -06:00
nonamethanks
a6549bc6fe Add Fantia support
Also fixes a regression in 74fdeef10c
that stopped mastodon urls from being given the right priority.
2022-03-10 17:43:32 +01:00
evazion
43a665a66d sources: factor out Source::URL::NicoSeiga. 2022-03-10 04:53:51 -06:00
nonamethanks
93adba06e5 weibo: fix typo in strategy 2022-03-10 08:31:23 +01:00
evazion
34854185be sources: factor out Source::URL::DeviantArt and Source::URL::Stash. 2022-03-10 00:29:49 -06:00
evazion
bb4b8619f5 pixiv: fix Source::URL::Pixiv not being included in Source::URL list. 2022-03-09 01:14:09 -06:00
evazion
77c88fd867 Merge pull request #5038 from nonamethanks/remove-redundant-comments
sources: remove redundant comments
2022-03-08 23:28:29 -06:00
evazion
6afb2f8e3c Merge pull request #5037 from nonamethanks/tumblr-refactor
sources: factor out Source::URL::Tumblr
2022-03-08 23:26:30 -06:00
evazion
cf4b9a6114 Merge pull request #5039 from nonamethanks/simplify-lofter-tag-parsing
Lofter: simplify tag extraction logic
2022-03-08 23:21:57 -06:00
evazion
52a2d3418c pixiv: fixup bugs in 1c620f805.
* Fix error when uploading non-ugoira files.
* Fix sample image URLs not being rewritten to full images correctly. We
  have to get the full image URL from the API because given an
  /img-master/ URL, we don't know what the original file extension is.
2022-03-08 23:07:24 -06:00
evazion
1c620f8055 sources: factor out Source::URL::Pixiv.
* Drop support for preview_urls. This means that IQDB lookups may be
  slower, especially for ugoiras, since we have to download the full
  ugoira now. However, ugoira lookups should produce better results,
  since the ugoira thumbnail chosen by Pixiv wasn't necessarily the same
  as the thumbnail chosen by Danbooru.

* Drop support for uploading single manga pages:

    http://www.pixiv.net/member_illust.php?mode=manga_big&illust_id=18557054&page=2

  Previously, uploading a URL like this would only upload a single image
  out of a multi-image work. Now it will upload all images in the work.
  Pixiv no longer supports URLs like this, so we don't either.

* Add support for parsing URLs like this:

    https://i.pximg.net/c/360x360_70/custom-thumb/img/2022/03/08/00/00/56/96755248_p0_custom1200.jpg

  Apparently artists can choose a custom thumbnail now (not like anyone
  will try to upload one though).
2022-03-08 22:17:38 -06:00
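Parsing a custom-thumbnail URL like the one in 1c620f8055 mostly comes down to reading the work ID and page number out of the filename. A rough sketch, assuming a hypothetical helper `parse_pixiv_filename` and a filename shape of `<id>_p<page>[_suffix].<ext>` inferred from the single example above:

```ruby
# Hypothetical parser for the filename portion of an i.pximg.net URL.
# The accepted filename shape is an assumption based on one example.
def parse_pixiv_filename(url)
  match = File.basename(url).match(/\A(?<id>\d+)_p(?<page>\d+)(?:_\w+)?\.\w+\z/)
  return nil unless match

  { work_id: match[:id].to_i, page: match[:page].to_i }
end

p parse_pixiv_filename("https://i.pximg.net/c/360x360_70/custom-thumb/img/2022/03/08/00/00/56/96755248_p0_custom1200.jpg")
```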
evazion
df0bb70486 sources: factor out Source::URL::PixivSketch.
Add upload support for Pixiv Sketch. Fetch tags, commentary, and artist,
and rewrite sample images to full images.

Authentication isn't required. R18 images are hidden in the browser but
visible in the API.
2022-03-08 18:24:12 -06:00
nonamethanks
ff6bfff311 Lofter: simplify tag extraction logic
Now that we have a separate parsing class we can just use it to properly
parse tag urls as well.
2022-03-08 17:01:50 +01:00
nonamethanks
ebd3670076 sources: remove redundant comments
These comments are already present under the parse blocks, so the huge
walls of text before the code are not needed anymore.
2022-03-08 16:56:00 +01:00
nonamethanks
b9c7e467e5 sources: factor out Source::URL::Tumblr
Also adds support for fetching source data from direct image urls when
possible.
2022-03-08 15:06:06 +01:00
evazion
de61e56161 Merge pull request #5032 from nonamethanks/factor-out-weibo
sources: factor out Source::URL::Weibo
2022-03-07 18:31:15 -06:00
nonamethanks
d8e2f2ee33 sources: factor out Source::URL::Weibo
Additionally, fixed some broken tests and changed normalization for urls
of album type to point to the mobile version instead, because they're
only visible to logged-in users.
2022-03-07 16:52:43 +01:00
nonamethanks
d195d30587 Foundation: fix normalization error
URLs like https://foundation.app/@yohan1754/fso/3 would get normalized
to https://foundation.app/@foundation/foundation/3, which was wrong
because it would point to a completely different collection.
2022-03-07 06:52:23 +01:00
user
2600dcdbfa nijie: extract post ID from new image URL. 2022-02-28 21:14:47 +01:00