evazion
228850b749
newgrounds: support parsing video urls.
...
Fixes URLs like `https://www.newgrounds.com/portal/view/830293` being treated as bad_source.
2022-08-23 13:39:32 -05:00
evazion
9c2d362e93
tumblr: fix misparsing of image urls.
...
Fix URLs like https://yogurtmedia.tumblr.com/post/45732863347 being
misparsed as image urls.
2022-08-20 21:20:46 -05:00
evazion
9cab67c0ac
artstation: fix parsing of reserved usernames.
2022-07-06 16:00:54 -05:00
nonamethanks
e7584c7e0a
Nicoseiga: normalize oekaki links
2022-06-04 22:57:54 +02:00
evazion
6b54415c47
Merge pull request #5170 from nonamethanks/fix-fc2-bad-source
...
Fc2: don't mark valid blog page sources as bad_source
2022-05-16 15:12:07 -05:00
nonamethanks
dcbb2216aa
Fc2: don't mark valid blog page sources as bad_source
2022-05-15 18:46:50 +02:00
evazion
c07b099bf8
Fix #5152: Nicovideo video urls getting bad_source.
2022-05-03 03:59:15 -05:00
evazion
2d9bba4abb
posts: automatically add the bad_link and bad_source tags.
...
Automatically add the bad_link tag when the source is an image url from
a known site, but it can't be converted to a page url (for example, a
Twitter or Tumblr direct image link).
Automatically add the bad_source tag when the source is from a known
site, but it's not an image or page url (for example, a Twitter or Pixiv
profile url).
2022-05-01 21:01:36 -05:00
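The tagging rule described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `ParsedSource` and `automatic_source_tags` are hypothetical names, and the boolean fields stand in for whatever the real URL parser reports.

```ruby
# Hypothetical stand-in for a parsed source URL (not the project's Source::URL).
ParsedSource = Struct.new(:known_site, :image_url, :page_url, :convertible, keyword_init: true)

# Returns the tags that would be added automatically for a given source.
def automatic_source_tags(source)
  # Unknown sites are left alone.
  return [] unless source.known_site

  if source.image_url
    # Image URL from a known site: bad_link only when no page URL can be derived.
    source.convertible ? [] : ["bad_link"]
  elsif source.page_url
    []
  else
    # Known site, but neither an image URL nor a page URL (e.g. a profile URL).
    ["bad_source"]
  end
end

direct_image = ParsedSource.new(known_site: true, image_url: true, convertible: false)
profile      = ParsedSource.new(known_site: true)

automatic_source_tags(direct_image) # => ["bad_link"]
automatic_source_tags(profile)      # => ["bad_source"]
```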
evazion
23b8350320
sources: add image_url?, page_url?, and profile_url? methods.
...
Add methods to Source::URL for determining whether a URL is an image
URL, a page URL, or a profile URL.
Also add more source URL tests and fix various URL parsing bugs.
2022-05-01 21:01:36 -05:00
nonamethanks
8edd5dd810
Add furaffinity support
2022-04-27 03:47:59 +02:00
evazion
90182148aa
Merge pull request #5137 from nonamethanks/foundation-videos
...
Foundation: fix some video posts not being extracted
2022-04-22 01:50:26 -05:00
evazion
57a92ad336
Fix #5072: Fandom source normalization is wrong
2022-04-22 01:27:17 -05:00
evazion
40dda8a672
Merge pull request #5138 from nonamethanks/fix-fandom-links
...
Fix normalization for fandom sources
2022-04-22 00:36:11 -05:00
nonamethanks
e1b9166a56
Sources: do not use an empty else in case blocks
2022-04-22 03:53:18 +02:00
nonamethanks
3b055138ff
Fix normalization for fandom sources
2022-04-22 03:27:05 +02:00
nonamethanks
e6cb255a7a
Foundation: fix some video posts not being extracted
...
Also adjusts SourceTestHelper to not autogenerate contexts, so that
tests can be launched individually.
2022-04-21 17:54:22 +02:00
nonamethanks
c5e6044c23
Anifty: regex fixup for c9227645d9
2022-04-19 15:54:56 +02:00
nonamethanks
c9227645d9
Add anifty.jp support
2022-04-18 16:50:26 +02:00
nonamethanks
9612578fcb
Add Booth support
2022-04-16 17:52:18 +02:00
evazion
83f5124a5e
Fix #5091: Normalize reddit sources.
2022-04-03 03:46:17 -05:00
evazion
d96db350f3
pixiv: fix non-www Pixiv urls not being recognized.
...
Fix non-www Pixiv URLs (e.g. `https://pixiv.net/users/3584828`) not
being recognized by the URL parser.
2022-04-03 03:07:42 -05:00
evazion
9c5b60b630
sources: normalize artist urls for ask.fm, ameblo.jp, anidb.net, animenewsnetwork.com
2022-03-31 02:36:17 -05:00
evazion
54cfbf84c6
pawoo: fix www.pawoo.net urls not being normalized to pawoo.net.
...
Fix artist URLs like https://www.pawoo.net/@01051708 not being normalized to https://pawoo.net/@01051708.
2022-03-31 02:17:51 -05:00
evazion
990d7f6380
instagram: strip '@' from usernames in profile urls.
2022-03-31 02:02:28 -05:00
evazion
b8f154d301
artists: add more artist url icons.
2022-03-30 22:04:24 -05:00
evazion
a272c19b98
Fix #5078: Pixiv booth upload broken.
...
Allow image URLs from https://booth.pximg.net to be uploaded. Fix bug
where Booth.pm URLs were incorrectly caught by the Pixiv extractor.
2022-03-30 03:25:42 -05:00
evazion
34aa22f90b
sources: fix fandom.com page urls.
...
Fix it so that sources like this:
* https://vignette.wikia.nocookie.net/valkyriecrusade/images/c/c5/Crimson_Hatsune_H.png/revision/latest?cb=20180702031954
link to this:
* https://valkyriecrusade.fandom.com/?file=Crimson_Hatsune_H.png
instead of this:
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png
The `/wiki/File:$name` URL redirects to whatever wiki page contains the
image instead of showing the file itself.
2022-03-23 23:38:06 -05:00
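As a rough illustration of the conversion described in the commit above, the following sketch maps a `vignette.wikia.nocookie.net` image URL to the `?file=` page URL. The helper name `fandom_page_url` is hypothetical and it only handles the single URL shape shown in the commit message; the real parser presumably covers more cases.

```ruby
require "uri"

# Hypothetical helper; handles only
# https://vignette.wikia.nocookie.net/:wiki/images/:a/:ab/:file/... URLs.
def fandom_page_url(image_url)
  uri = URI.parse(image_url)
  return nil unless uri.host == "vignette.wikia.nocookie.net"

  # Path segments: wiki name, "images", two hash directories, then the file name.
  wiki, images, _dir1, _dir2, file = uri.path.split("/").reject(&:empty?)
  return nil unless images == "images" && file

  "https://#{wiki}.fandom.com/?file=#{file}"
end

fandom_page_url("https://vignette.wikia.nocookie.net/valkyriecrusade/images/c/c5/Crimson_Hatsune_H.png/revision/latest?cb=20180702031954")
# => "https://valkyriecrusade.fandom.com/?file=Crimson_Hatsune_H.png"
```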
evazion
5941c47b79
nicoseiga: support a few more url types.
2022-03-23 23:38:06 -05:00
evazion
c07c5ea594
nicoseiga: fix page_url method not to return seiga.nicovideo.jp/image/source/:id urls.
...
Fix the page_url method not to return URLs like this:
https://seiga.nicovideo.jp/image/source/8017978 (page: https://seiga.nicovideo.jp/watch/mg310193)
These are direct image URLs, not page URLs. It's not generally possible
to get to the page URL from an image URL like this.
This fixes it so that we don't incorrectly set the source of NicoSeiga
uploads to the image URL.
2022-03-23 23:38:06 -05:00
evazion
4ef8178bd1
sources: remove canonical_url method.
...
Refactor source strategies to remove the `canonical_url` method.
`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the page URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the image URL is used as the source.
This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
2022-03-23 23:38:06 -05:00
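The selection rule for the post source can be sketched in a few lines: prefer the page URL when one can be derived, otherwise keep the URL as given. `ExampleURL` and `source_after_upload` are hypothetical names used only for this illustration, and the page URL in the first example is a placeholder.

```ruby
# Hypothetical stand-in for a parsed URL; page_url is nil when no page URL
# can be derived from the original URL.
ExampleURL = Struct.new(:original, :page_url)

# Prefer the page URL; fall back to the original (image) URL.
def source_after_upload(url)
  url.page_url || url.original
end

convertible   = ExampleURL.new("https://img.example.com/123_p0.png", "https://example.com/artworks/123")
unconvertible = ExampleURL.new("https://img.example.com/abcdef.png", nil)

source_after_upload(convertible)   # => "https://example.com/artworks/123"
source_after_upload(unconvertible) # => "https://img.example.com/abcdef.png"
```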
evazion
fbca31d29e
artists: add more artist url icons.
2022-03-23 02:59:22 -05:00
evazion
3aa5cab2aa
sources: refactor normalize_for_source.
...
`normalize_for_source` was used to convert image URLs to page URLs when displaying sources
on the post show page. Move all the code for converting image URLs to page URLs from
`Sources::Strategies#normalize_for_source` to `Source::URL#page_url`.
Previously, we had to be very careful in source strategies not to make any network calls in
`normalize_for_source`, since it was used in the view for the post show page. Now all the
code for generating page URLs is isolated in Source::URL, which makes source strategies
simpler. It also makes it easier to check if a source is an image URL or page URL, and if
the image URL is convertible to a page URL, which will make autotagging bad_link or
bad_source feasible.
Finally, this generates better page URLs in a handful of cases:
* https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP
* https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6 instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6
* http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png
* https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1
2022-03-23 01:34:04 -05:00
evazion
770f850c66
instagram: add a couple more url types.
2022-03-22 04:35:50 -05:00
evazion
452ce8d165
artstation: add partial support for video clips (#5063).
...
Add partial support for fetching videos from ArtStation posts that
contain videos. Most of this code is disabled for now because actually
downloading these videos requires bypassing a Cloudflare captcha.
2022-03-21 16:51:42 -05:00
evazion
7c887f8adc
artists: fix exception when adding TwitPic urls.
2022-03-20 21:56:38 -05:00
evazion
7394660ba9
posts: fix exception when post has source like 'https://www.twitter.com/username'.
...
`twitter.com` sources worked but `www.twitter.com` didn't.
Also match the URL by class instead of by site name to ensure we match
the expected class.
2022-03-20 21:08:05 -05:00
evazion
01b683798e
sources: add Tinami support.
2022-03-19 00:50:36 -05:00
evazion
40cbc0423c
sources: add Instagram profile url normalization.
2022-03-18 18:20:29 -05:00
evazion
c6e528a073
artstation: normalize https://artstation.com/artist/username/albums/all/ urls.
2022-03-18 17:10:26 -05:00
evazion
cc54b5f730
fanbox: normalize http://www.pixiv.net/fanbox/creator/3113804/post urls.
2022-03-18 17:10:26 -05:00
evazion
26d23c49d0
pawoo: normalize https://pawoo.net/users/evazion urls.
2022-03-18 17:10:26 -05:00
evazion
10dac3ee51
artists: normalize urls added to artist entries.
...
When a URL is added to an artist entry, normalize it to a standard form.
Artist URLs have both a `url` column and a `normalized_url` column. The
`normalized_url` is used for artist finding and the `url` is the raw URL
entered by the user. Previously only the `normalized_url` field was
normalized; now the URL entered by the user is also converted to a
normalized form.
This means that if URLs like these are added to an artist entry:
* http://www.pixiv.net/member.php?id=1234
* http://www.pixiv.net/en/users/1234
* http://www.twitter.com/DanbooruBot/
* http://mobile.twitter.com/DanbooruBot/
They will get normalized to these:
* https://www.pixiv.net/users/1234
* https://twitter.com/DanbooruBot
This fixes problems with duplicate URLs being added to artist entries
because URLs weren't normalized to a single form.
2022-03-18 02:06:50 -05:00
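A minimal sketch of this kind of normalization, covering only the four example URLs from the commit above. `normalize_artist_url` is a hypothetical helper, not the project's implementation, and it returns unrecognized URLs unchanged.

```ruby
require "uri"

# Hypothetical normalizer for a few Pixiv and Twitter profile URL shapes.
def normalize_artist_url(url)
  uri  = URI.parse(url)
  host = uri.host.to_s.downcase
  path = uri.path.chomp("/")

  case host
  when "pixiv.net", "www.pixiv.net"
    # Old style: /member.php?id=1234. New style: /users/1234 or /en/users/1234.
    id = uri.query.to_s[/(?:\A|&)id=(\d+)/, 1] if path == "/member.php"
    id ||= path[%r{\A(?:/en)?/users/(\d+)\z}, 1]
    id ? "https://www.pixiv.net/users/#{id}" : url
  when "twitter.com", "www.twitter.com", "mobile.twitter.com"
    name = path[%r{\A/(\w+)\z}, 1]
    name ? "https://twitter.com/#{name}" : url
  else
    url
  end
end

normalize_artist_url("http://www.pixiv.net/member.php?id=1234")  # => "https://www.pixiv.net/users/1234"
normalize_artist_url("http://mobile.twitter.com/DanbooruBot/")   # => "https://twitter.com/DanbooruBot"
```

Because every variant maps to one canonical string, a plain equality check is enough to detect a duplicate before it is added to an artist entry.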
evazion
455ee9a52a
fc2: parse more url types.
2022-03-18 02:06:30 -05:00
evazion
03d2a86ef1
artists: normalize fc2.com profile urls.
2022-03-17 19:42:57 -05:00
evazion
ded03df1ff
tests: fix more broken tests.
2022-03-15 05:14:56 -05:00
evazion
644dfaf74c
tests: fix broken tests.
2022-03-15 04:45:30 -05:00
evazion
133c45ee29
sources: parse more profile url formats.
...
Add support for parsing these URL formats:
* https://www.artstation.com/felipecartin/profile
* https://www.deviantart.com/nlpsllp/gallery
* https://fantia.jp/asanagi
* https://www.lofter.com/front/blog/home-page/noshiqian
* https://www.lofter.com/app/xiaokonggedmx
* https://www.lofter.com/blog/semblance
* https://q.nicovideo.jp/users/18700356
* https://dic.nicovideo.jp/u/11141663
* https://3d.nicovideo.jp/users/109584
* https://3d.nicovideo.jp/u/siobi
* https://game.nicovideo.jp/atsumaru/users/7757217
* https://www.pixiv.net/user/13569921/series/81967
* https://pixiv.cc/zerousagi/
* https://www.plurk.com/u/ddks2923
* https://www.plurk.com/m/u/leiy1225
* https://www.plurk.com/s/u/salmonroe13
* https://www.plurk.com/RSSSww/invite/4
* https://skeb.jp/@okku_oxn/works
* https://www.tumblr.com/blog/view/artofelaineho/187614935612
* https://www.tumblr.com/blog/view/artofelaineho
* https://www.tumblr.com/blog/artofelaineho
* https://www.tumblr.com/dashboard/blog/dankwartart
* https://rosarrie.tumblr.com/archive
* https://whereisnovember.tumblr.com/tagged/art
* https://twitpic.com/photos/Type10TK
* https://www.weibo.com/detail/4676597657371957
* https://www.weibo.com/u/5957640693/home?wvr=5
* https://www.weibo.com/lvxiuzi0/home
2022-03-15 00:49:54 -05:00
evazion
1d9a15a119
weibo: handle a couple more profile url types.
...
Parse these profile URL types:
* https://www.weibo.cn/endlessnsmt
* https://www.weibo.com/p/1005055399876326
Also add anchors around the regexes so they have to match the full string.
2022-03-13 20:32:57 -05:00
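The effect of anchoring can be shown with a small example. The two patterns below are hypothetical and are not the regexes used by the Weibo parser. In Ruby, `\A` and `\z` anchor to the start and end of the whole string, whereas `^` and `$` only anchor to line boundaries.

```ruby
# Hypothetical patterns for a numeric profile path.
UNANCHORED = %r{/u/\d+}
ANCHORED   = %r{\A/u/\d+\z}

profile_path = "/u/5957640693"
longer_path  = "/u/5957640693/home"

UNANCHORED.match?(longer_path)  # => true, matches a substring
ANCHORED.match?(longer_path)    # => false, the full string must match
ANCHORED.match?(profile_path)   # => true
```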
evazion
9343f7c912
Source::URL: add profile_url method.
...
Add a method for converting a source URL into a profile URL. This will
be used for normalizing profile URLs in artist entries.
Also add the ability to parse a few more profile URL formats.
2022-03-13 03:54:17 -05:00
evazion
787b5c8e27
sources: merge Sta.sh strategy into DeviantArt strategy.
...
This turns out to be a little simpler than keeping them separate. The
only thing special we have to do for Sta.sh is use the Sta.sh page when
we have a DeviantArt image with a Sta.sh referer.
2022-03-12 00:57:43 -06:00