Fix an exception when trying to fetch source data for URLs like
https://va.media.tumblr.com/tumblr_pgohk0TjhS1u7mrsl.mp4.
For these URLs it's not possible to use the trick where we try to open
the URL as a HTML page and scrape the post id from the HTML. Instead we
get the raw video if we try to to this.
Bug: When uploading a direct Pixiv image URL, we ignored it in favor of the
image URL returned by the Pixiv API. This meant if you tried to upload the
original version of a revised image, we would get the revised version instead.
Fix: When given a direct Pixiv image URL, use it as-is if it's a full
image URL. If it's a sample image URL, ignore it in favor of the full image
URL as returned by the API, unless the post is deleted and the API data
is unavailable.
Foundation changed their HTML page format and we can no longer scrape
the image URL directly from the page. Instead we have to build it based
on API data.
Add methods to Source::URL for determining whether a URL is an image
URL, a page URL, or a profile URL.
Also add more source URL tests and fix various URL parsing bugs.
When the artist name couldn't found for a Newgrounds URL, for example
for `https://www.newgrounds.com/dump/item`, then the `profile_url`
method erroneously returned `https://.newgrounds.com`. This led to an
error later on when the artist finder tried to parse the invalid URL.
Also fix `strategy_should_work` to test that the profile URL is a valid
URL, and not to try to download the file when image_urls is empty.
Sometimes when a Newgrounds post is part of a set, there is a list of
links to other posts in the set in the artist's commentary. Exclude
these links because they're not really part of the commentary.
Example: https://www.newgrounds.com/art/view/boxofwant/annie-hughes-1 (NSFW)