This was an obsolete URL format briefly used by Pixiv around 2019-2020.
There were only ~80 posts with sources using this format. They have been
manually fixed.
Bug: When uploading a direct Pixiv image URL, we ignored it in favor of the
image URL returned by the Pixiv API. This meant if you tried to upload the
original version of a revised image, we would get the revised version instead.
Fix: When given a direct Pixiv image URL, use it as-is if it's a full
image URL. If it's a sample image URL, ignore it in favor of the full image
URL as returned by the API, unless the post is deleted and the API data
is unavailable.
Foundation changed their HTML page format and we can no longer scrape
the image URL directly from the page. Instead we have to build it based
on API data.
Automatically add the bad_link tag when the source is an image url from
a known site, but it can't be converted to a page url (for example, a
Twitter or Tumblr direct image link).
Automatically add the bad_source tag when the source is from a known
site, but it's not an image or page url (for example, a Twitter or Pixiv
profile url)
Add methods to Source::URL for determining whether a URL is an image
URL, a page URL, or a profile URL.
Also add more source URL tests and fix various URL parsing bugs.
When the artist name couldn't found for a Newgrounds URL, for example
for `https://www.newgrounds.com/dump/item`, then the `profile_url`
method erroneously returned `https://.newgrounds.com`. This led to an
error later on when the artist finder tried to parse the invalid URL.
Also fix `strategy_should_work` to test that the profile URL is a valid
URL, and not to try to download the file when image_urls is empty.
Don't show deprecated tags in the related tags or translated tags lists
when editing a post. It doesn't make sense to recommended adding tags
that can't be added to the post.
Sometimes when a Newgrounds post is part of a set, there is a list of
links to other posts in the set in the artist's commentary. Exclude
these links because they're not really part of the commentary.
Example: https://www.newgrounds.com/art/view/boxofwant/annie-hughes-1 (NSFW)
Fix the page_url method not to return URLs like this:
https://seiga.nicovideo.jp/image/source/8017978 (page: https://seiga.nicovideo.jp/watch/mg310193)
These are direct image URLs, not page URLs. It's not generally possible
to get to the page URL from an image URL like this.
This fixes it so that we don't incorrectly set the source of NicoSeiga
uploads to the image URL.
Refactor source strategies to remove the `canonical_url` method.
`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the image URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the page URL is used as the source.
This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.