Commit Graph

17 Commits

Author SHA1 Message Date
nonamethanks
d525df9ad5 Newgrounds: fix exception for deleted videos
Fixup for 8c0f2255f9
2022-11-11 12:28:23 +01:00
nonamethanks
8c0f2255f9 Newgrounds: fix support for some old videos 2022-11-11 11:01:13 +01:00
nonamethanks
35bfcbc3bd Newgrounds: support video uploads 2022-11-09 15:01:28 +01:00
nonamethanks
8008b7a5a2 Newgrounds: rewrite tests 2022-10-08 16:23:14 +02:00
evazion
228850b749 newgrounds: support parsing video urls.
Fixes URLS like `https://www.newgrounds.com/portal/view/830293` being treated as bad_source.
2022-08-23 13:39:32 -05:00
evazion
23b8350320 sources: add image_url?, page_url?, and profile_url? methods.
Add methods to Source::URL for determining whether a URL is an image
URL, a page URL, or a profile URL.

Also add more source URL tests and fix various URL parsing bugs.
2022-05-01 21:01:36 -05:00
evazion
76d9e86724 Fix #5140: Unexpected error: PublicSuffix::DomainInvalid for searching some newgrounds urls in /artists
When the artist name couldn't found for a Newgrounds URL, for example
for `https://www.newgrounds.com/dump/item`, then the `profile_url`
method erroneously returned `https://.newgrounds.com`. This led to an
error later on when the artist finder tried to parse the invalid URL.

Also fix `strategy_should_work` to test that the profile URL is a valid
URL, and not to try to download the file when image_urls is empty.
2022-04-22 23:16:41 -05:00
evazion
ca8083465b newgrounds: exclude links to other works in commentary.
Sometimes when a Newgrounds post is part of a set, there is a list of
links to other posts in the set in the artist's commentary. Exclude
these links because they're not really part of the commentary.

Example: https://www.newgrounds.com/art/view/boxofwant/annie-hughes-1 (NSFW)
2022-04-02 23:13:26 -05:00
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
4ef8178bd1 sources: remove canonical_url method.
Refactor source strategies to remove the `canonical_url` method.

`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the image URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the page URL is used as the source.

This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
2022-03-23 23:38:06 -05:00
evazion
3aa5cab2aa sources: refactor normalize_for_source.
`normalize_for_source` was used to convert image URLs to page URLs when displaying sources
on the post show page. Move all the code for converting image URLs to page URLs from
`Sources::Strategies#normalize_for_source` to `Source::URL#page_url`.

Before we had to be very careful in source strategies not to make any network calls in
`normalize_for_source`, since it was used in the view for the post show page. Now all the
code for generating page URLs is isolated in Source::URL, which makes source strategies
simpler. It also makes it easier to check if a source is an image URL or page URL, and if
the image URL is convertible to a page URL, which will make autotagging bad_link or
bad_source feasible.

Finally, this fixes it to generate better page URLs in a handful of cases:

* https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP
* https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6s instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6
* http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png
* https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1
2022-03-23 01:34:04 -05:00
evazion
2f61486ac6 sources: remove image_url method from base strategy.
Remove the `image_url` method from source strategies. This method would
return only the first image if a source had multiple images. The
`image_urls` method should be used instead. Tests were the main place
that still used `image_url` instead of `image_urls`.

Also make post replacements return an error if replacing with a source
that contains multiple images, instead of just blindly replacing the
post with the first image in the source.
2022-03-11 01:59:21 -06:00
evazion
f062f2d145 sources: factor out Source::URL::Newgrounds.
Also fix it so that the image URL is set as the source for Newgrounds
posts, not the page URL. It's possible to generate the page URL from the
image URL (except for images after the first in multi-image posts).

* Page: https://www.newgrounds.com/art/view/natthelich/weaver
* Image: https://art.ngfiles.com/images/1520000/1520217_natthelich_weaver.jpg?f1606365031
2022-02-25 23:04:03 -06:00
evazion
873d719481 tests: fix broken Newgrounds test.
Broken again by the source post being deleted.
2021-01-04 23:54:18 -06:00
evazion
cc64f8b7ee tests: fix broken source tests.
Fix various tests broken by source files changing or being deleted.
2020-11-10 14:52:54 -06:00
evazion
85f58bf2f6 newgrounds: fix style nitpicks. 2020-06-24 00:25:45 -05:00
nonamethanks
7a41ee9c34 Add NewGrounds support 2020-06-18 03:22:30 +02:00