Add support for uploading posts from Gelbooru. Note that the translated
tags will include both the Gelbooru tags and the tags from the Gelbooru
post's source. The commentary and artist information will also be taken
from the Gelbooru post's source. The source of the Danbooru post however
will be left as the Gelbooru post itself, not as the Gelbooru post's source.
Remove the last remaining uses of the PixivUgoiraFrameData model. As of
32bfb8407, Ugoira frame data is now stored in the MediaMetadata model,
under the `Ugoira:FrameDelays` EXIF field.
The pixiv_ugoira_frame_data table still exists, but it can be removed
after this commit is deployed.
Fixes#5264: Error when replacing with ugoira.
* Fixed a bug where manga posts with a single tag would raise an error
* Fixed a bug where dic.nicovideo.jp/oekaki posts weren't uploadable due
to SSL issues
* Added support for more manga corner cases
Remove the `CurrentUser.ip_addr` global variable and replace it with
`request.remote_ip`. Before we had to track the current user's IP in a
global variable so that when we edited a post for example, we could pass
down the user's IP to the model and save it in the post_versions table.
Now that we now longer save IPs in version tables, we don't need a global
variable to get access to the current user's IP outside of controllers.
Fix an exception when trying to fetch source data for URLs like
https://va.media.tumblr.com/tumblr_pgohk0TjhS1u7mrsl.mp4.
For these URLs it's not possible to use the trick where we try to open
the URL as a HTML page and scrape the post id from the HTML. Instead we
get the raw video if we try to to this.
This was an obsolete URL format briefly used by Pixiv around 2019-2020.
There were only ~80 posts with sources using this format. They have been
manually fixed.
Bug: When uploading a direct Pixiv image URL, we ignored it in favor of the
image URL returned by the Pixiv API. This meant if you tried to upload the
original version of a revised image, we would get the revised version instead.
Fix: When given a direct Pixiv image URL, use it as-is if it's a full
image URL. If it's a sample image URL, ignore it in favor of the full image
URL as returned by the API, unless the post is deleted and the API data
is unavailable.
Foundation changed their HTML page format and we can no longer scrape
the image URL directly from the page. Instead we have to build it based
on API data.
Add methods to Source::URL for determining whether a URL is an image
URL, a page URL, or a profile URL.
Also add more source URL tests and fix various URL parsing bugs.
When the artist name couldn't found for a Newgrounds URL, for example
for `https://www.newgrounds.com/dump/item`, then the `profile_url`
method erroneously returned `https://.newgrounds.com`. This led to an
error later on when the artist finder tried to parse the invalid URL.
Also fix `strategy_should_work` to test that the profile URL is a valid
URL, and not to try to download the file when image_urls is empty.
Sometimes when a Newgrounds post is part of a set, there is a list of
links to other posts in the set in the artist's commentary. Exclude
these links because they're not really part of the commentary.
Example: https://www.newgrounds.com/art/view/boxofwant/annie-hughes-1 (NSFW)
Fix the page_url method not to return URLs like this:
https://seiga.nicovideo.jp/image/source/8017978 (page: https://seiga.nicovideo.jp/watch/mg310193)
These are direct image URLs, not page URLs. It's not generally possible
to get to the page URL from an image URL like this.
This fixes it so that we don't incorrectly set the source of NicoSeiga
uploads to the image URL.
Refactor source strategies to remove the `canonical_url` method.
`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the image URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the page URL is used as the source.
This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
Add partial support for fetching videos from ArtStation posts that
contain videos. Most of this code is disabled for now because actually
downloading these videos requires bypassing a Cloudflare captcha.
Support grabbing the full image for Tinami uploads, rather than the sample.
Getting the full image requires making a request like this:
curl -X POST \
-H 'Referer: https://www.tinami.com/' \
-H 'Content-Type: application/x-www-form-urlencoded' \
-H 'Cookie: Tinami2SESSID=<redacted>;' \
--data-raw 'action_view_original=true&cont_id=1087268ðna_csrf=<redacted>' \
https://www.tinami.com/view/1087268
Then scraping the <img> tag from the resulting HTML page.
If the post has multiple images, then we need to scrape and pass the
`sub_id` of the image too.
Fixes#2818.