Add support for uploading posts from Gelbooru. Note that the translated
tags will include both the Gelbooru tags and the tags from the Gelbooru
post's source. The commentary and artist information will also be taken
from the Gelbooru post's source. The source of the Danbooru post however
will be left as the Gelbooru post itself, not as the Gelbooru post's source.
* Fixed a bug where manga posts with a single tag would raise an error
* Fixed a bug where dic.nicovideo.jp/oekaki posts weren't uploadable due
to SSL issues
* Added support for more manga corner cases
Fix an exception when trying to fetch source data for URLs like
https://va.media.tumblr.com/tumblr_pgohk0TjhS1u7mrsl.mp4.
For these URLs it's not possible to use the trick where we try to open
the URL as a HTML page and scrape the post id from the HTML. Instead we
get the raw video if we try to to this.
This was an obsolete URL format briefly used by Pixiv around 2019-2020.
There were only ~80 posts with sources using this format. They have been
manually fixed.
Bug: When uploading a direct Pixiv image URL, we ignored it in favor of the
image URL returned by the Pixiv API. This meant if you tried to upload the
original version of a revised image, we would get the revised version instead.
Fix: When given a direct Pixiv image URL, use it as-is if it's a full
image URL. If it's a sample image URL, ignore it in favor of the full image
URL as returned by the API, unless the post is deleted and the API data
is unavailable.
Automatically add the bad_link tag when the source is an image url from
a known site, but it can't be converted to a page url (for example, a
Twitter or Tumblr direct image link).
Automatically add the bad_source tag when the source is from a known
site, but it's not an image or page url (for example, a Twitter or Pixiv
profile url)
Add methods to Source::URL for determining whether a URL is an image
URL, a page URL, or a profile URL.
Also add more source URL tests and fix various URL parsing bugs.
Fix the page_url method not to return URLs like this:
https://seiga.nicovideo.jp/image/source/8017978 (page: https://seiga.nicovideo.jp/watch/mg310193)
These are direct image URLs, not page URLs. It's not generally possible
to get to the page URL from an image URL like this.
This fixes it so that we don't incorrectly set the source of NicoSeiga
uploads to the image URL.
Refactor source strategies to remove the `canonical_url` method.
`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the image URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the page URL is used as the source.
This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
Add partial support for fetching videos from ArtStation posts that
contain videos. Most of this code is disabled for now because actually
downloading these videos requires bypassing a Cloudflare captcha.