Commit Graph

102 Commits

Author SHA1 Message Date
evazion
e3af738371 tests: fix broken tests. 2022-08-24 02:03:37 -05:00
evazion
09dfab1f0d hentai foundry: update url for Hentai Foundry tags.
Change the URL used for Hentai Foundry tags from:

    https://www.hentai-foundry.com/search/index?query=elf&search_in=keywords

to:

    https://www.hentai-foundry.com/pictures/tagged/elf
2022-08-24 00:25:37 -05:00
evazion
2c36e02810 foundation.app: fix scraping of image urls.
Foundation changed their HTML page format and we can no longer scrape
the image URL directly from the page. Instead we have to build it based
on API data.
2022-08-24 00:25:37 -05:00
evazion
228850b749 newgrounds: support parsing video urls.
Fixes URLS like `https://www.newgrounds.com/portal/view/830293` being treated as bad_source.
2022-08-23 13:39:32 -05:00
evazion
9c2d362e93 tumblr: fix misparsing of image urls.
Fix URLs like https://yogurtmedia.tumblr.com/post/45732863347 being
misparsed as image urls.
2022-08-20 21:20:46 -05:00
evazion
9cab67c0ac artstation: fix parsing of reserved usernames. 2022-07-06 16:00:54 -05:00
evazion
7149845677 Merge pull request #5202 from nonamethanks/fix-nicoseiga-oekaki-bad-tag
Nicoseiga: normalize oekaki links
2022-06-05 15:56:37 -05:00
nonamethanks
e7584c7e0a Nicoseiga: normalize oekaki links 2022-06-04 22:57:54 +02:00
nonamethanks
2fd8e9bc14 Deviantart: fix regression in 3a0a32b98a 2022-06-04 20:26:14 +02:00
nonamethanks
3a0a32b98a Fix deviantart strategy to get biggest available size 2022-05-27 17:07:22 +02:00
evazion
6b54415c47 Merge pull request #5170 from nonamethanks/fix-fc2-bad-source
Fc2: don't mark valid blog page sources as bad_source
2022-05-16 15:12:07 -05:00
nonamethanks
dcbb2216aa Fc2: don't mark valid blog page sources as bad_source 2022-05-15 18:46:50 +02:00
evazion
80f3778616 Merge pull request #5159 from nonamethanks/fix-furaffinity-ascii-urls
Furaffinity: fix uploads for non-ascii image urls
2022-05-09 14:16:32 -05:00
nonamethanks
5b8402751c Furaffinity: fix uploads for non-ascii image urls
Use Addressable::URI, which supports non-ascii urls.
2022-05-09 18:38:38 +02:00
evazion
c07b099bf8 Fix #5152: Nicovideo video urls getting bad_source. 2022-05-03 03:59:15 -05:00
evazion
2d9bba4abb posts: automatically add the bad_link and bad_source tags.
Automatically add the bad_link tag when the source is an image url from
a known site, but it can't be converted to a page url (for example, a
Twitter or Tumblr direct image link).

Automatically add the bad_source tag when the source is from a known
site, but it's not an image or page url (for example, a Twitter or Pixiv
profile url)
2022-05-01 21:01:36 -05:00
evazion
23b8350320 sources: add image_url?, page_url?, and profile_url? methods.
Add methods to Source::URL for determining whether a URL is an image
URL, a page URL, or a profile URL.

Also add more source URL tests and fix various URL parsing bugs.
2022-05-01 21:01:36 -05:00
nonamethanks
8edd5dd810 Add furaffinity support 2022-04-27 03:47:59 +02:00
evazion
76d9e86724 Fix #5140: Unexpected error: PublicSuffix::DomainInvalid for searching some newgrounds urls in /artists
When the artist name couldn't found for a Newgrounds URL, for example
for `https://www.newgrounds.com/dump/item`, then the `profile_url`
method erroneously returned `https://.newgrounds.com`. This led to an
error later on when the artist finder tried to parse the invalid URL.

Also fix `strategy_should_work` to test that the profile URL is a valid
URL, and not to try to download the file when image_urls is empty.
2022-04-22 23:16:41 -05:00
evazion
90182148aa Merge pull request #5137 from nonamethanks/foundation-videos
Foundation: fix some video posts not being extracted
2022-04-22 01:50:26 -05:00
evazion
57a92ad336 Fix #5072: Fandom source normalization is wrong 2022-04-22 01:27:17 -05:00
evazion
40dda8a672 Merge pull request #5138 from nonamethanks/fix-fandom-links
Fix normalization for fandom sources
2022-04-22 00:36:11 -05:00
nonamethanks
e1b9166a56 Sources: do not use an empty else in case blocks 2022-04-22 03:53:18 +02:00
nonamethanks
3b055138ff Fix normalization for fandom sources 2022-04-22 03:27:05 +02:00
nonamethanks
e6cb255a7a Foundation: fix some video posts not being extracted
Also adjusts SourceTestHelper to not autogenerate contexts, so that
tests can be launched individually.
2022-04-21 17:54:22 +02:00
nonamethanks
c5e6044c23 Anifty: regex fixup for c9227645d9 2022-04-19 15:54:56 +02:00
nonamethanks
c9227645d9 Add anifty.jp support 2022-04-18 16:50:26 +02:00
evazion
d19e307e69 Merge pull request #5125 from nonamethanks/booth-support
Add Booth support
2022-04-17 23:00:14 -05:00
nonamethanks
9612578fcb Add Booth support 2022-04-16 17:52:18 +02:00
evazion
15a0fd604b tags: exclude deprecated tags from related tags list.
Don't show deprecated tags in the related tags or translated tags lists
when editing a post. It doesn't make sense to recommended adding tags
that can't be added to the post.
2022-04-13 03:07:09 -05:00
evazion
83f5124a5e Fix #5091: Normalize reddit sources. 2022-04-03 03:46:17 -05:00
evazion
d96db350f3 pixiv: fix non-www Pixiv urls not being recognized.
Fix non-www Pixiv URLs (e.g. `https://pixiv.net/users/3584828`) URLs not
being recognized by the URL parser.
2022-04-03 03:07:42 -05:00
evazion
ca8083465b newgrounds: exclude links to other works in commentary.
Sometimes when a Newgrounds post is part of a set, there is a list of
links to other posts in the set in the artist's commentary. Exclude
these links because they're not really part of the commentary.

Example: https://www.newgrounds.com/art/view/boxofwant/annie-hughes-1 (NSFW)
2022-04-02 23:13:26 -05:00
evazion
6807ed7786 Fix #5077: Images rated "Adult" on Newgrounds no longer upload. 2022-04-02 17:55:29 -05:00
evazion
9c5b60b630 sources: normalize artist urls for ask.fm, ameblo.jp, anidb.net, animenewsnetwork.com 2022-03-31 02:36:17 -05:00
evazion
54cfbf84c6 pawoo: fix www.pawoo.net urls not being normalized to pawoo.net.
Fix artist URLs like https://www.pawoo.net/@01051708 not being normalized to https://pawoo.net/@01051708.
2022-03-31 02:17:51 -05:00
evazion
990d7f6380 instagram: strip '@' from usernames in profile urls. 2022-03-31 02:02:28 -05:00
evazion
b8f154d301 artists: add more artist url icons. 2022-03-30 22:04:24 -05:00
evazion
bfbc932025 Fix #5082: NoMethodError when searching an old-style dead fanbox url in artist urls.
This API call:

    # profile: https://www.pixiv.net/fanbox/creator/40684196
    curl -H "Origin: https://fanbox.cc" "https://api.fanbox.cc/creator.get?userId=40684196"

returns `{ "body": nil }` when the artist is deleted. We didn't expect `body` to be nil.

Also fix it so that `profile_url` returns the `https://www.pixiv.net/fanbox/creator/40684196`
URL if we can't get the `https://<username>.fanbox.cc` URL, usually because the API call failed
because the artist is deleted.
2022-03-30 18:19:08 -05:00
evazion
a272c19b98 Fix #5078: Pixiv booth upload broken.
Allow image URLs from https://booth.pximg.net to be uploaded. Fix bug
where Booth.pm URLs were incorrectly caught by the Pixiv extractor.
2022-03-30 03:25:42 -05:00
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
34aa22f90b sources: fix fandom.com page urls.
Fix it so that sources like this:

* https://vignette.wikia.nocookie.net/valkyriecrusade/images/c/c5/Crimson_Hatsune_H.png/revision/latest?cb=20180702031954

link to this:

* https://valkyriecrusade.fandom.com/?file=Crimson_Hatsune_H.png

instead of this

* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png

The `/wiki/File:$name` URL redirects to whatever wiki page contains the
image instead of showing the file itself.
2022-03-23 23:38:06 -05:00
evazion
5941c47b79 nicoseiga: support a few more url types. 2022-03-23 23:38:06 -05:00
evazion
c07c5ea594 nicoseiga: fix page_url method not to return seiga.nicovideo.jp/image/source/:id urls.
Fix the page_url method not to return URLs like this:

    https://seiga.nicovideo.jp/image/source/8017978 (page: https://seiga.nicovideo.jp/watch/mg310193)

These are direct image URLs, not page URLs. It's not generally possible
to get to the page URL from an image URL like this.

This fixes it so that we don't incorrectly set the source of NicoSeiga
uploads to the image URL.
2022-03-23 23:38:06 -05:00
evazion
4ef8178bd1 sources: remove canonical_url method.
Refactor source strategies to remove the `canonical_url` method.

`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the image URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the page URL is used as the source.

This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
2022-03-23 23:38:06 -05:00
evazion
fbca31d29e artists: add more artist url icons. 2022-03-23 02:59:22 -05:00
evazion
3aa5cab2aa sources: refactor normalize_for_source.
`normalize_for_source` was used to convert image URLs to page URLs when displaying sources
on the post show page. Move all the code for converting image URLs to page URLs from
`Sources::Strategies#normalize_for_source` to `Source::URL#page_url`.

Before we had to be very careful in source strategies not to make any network calls in
`normalize_for_source`, since it was used in the view for the post show page. Now all the
code for generating page URLs is isolated in Source::URL, which makes source strategies
simpler. It also makes it easier to check if a source is an image URL or page URL, and if
the image URL is convertible to a page URL, which will make autotagging bad_link or
bad_source feasible.

Finally, this fixes it to generate better page URLs in a handful of cases:

* https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP
* https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6s instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6
* http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png
* https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1
2022-03-23 01:34:04 -05:00
evazion
770f850c66 instagram: add a couple more url types. 2022-03-22 04:35:50 -05:00
evazion
452ce8d165 artstation: add partial support for video clips (#5063).
Add partial support for fetching videos from ArtStation posts that
contain videos. Most of this code is disabled for now because actually
downloading these videos requires bypassing a Cloudflare captcha.
2022-03-21 16:51:42 -05:00
evazion
7c887f8adc artists: fix exception when adding TwitPic urls. 2022-03-20 21:56:38 -05:00