101 Commits

Author SHA1 Message Date
evazion
c19fc16885 sources: don't escape Unicode characters in tag search URLs.
Fix it so that Unicode characters aren't unnecessarily percent-encoded when generating tag search
URLs. For example, generate URLs like this:

* https://www.pixiv.net/tags/オリジナル/artworks

Not like this:

* https://www.pixiv.net/tags/%E3%82%AA%E3%83%AA%E3%82%B8%E3%83%8A%E3%83%AB/artworks
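
A minimal sketch of the idea in Ruby. The helper names `tag_search_url` and `fully_escaped_tag_search_url` are hypothetical and are not the methods changed by this commit. The first escapes only unsafe ASCII characters and passes non-ASCII characters through; the second shows the unwanted, fully percent-encoded form for comparison.

```ruby
require "uri"

# Hypothetical helper, for illustration only. Percent-encodes ASCII characters
# that are not URL-unreserved, but leaves non-ASCII characters untouched.
def tag_search_url(tag)
  escaped = tag.gsub(/[^A-Za-z0-9\-._~]/) do |char|
    char.ascii_only? ? format("%%%02X", char.ord) : char
  end
  "https://www.pixiv.net/tags/#{escaped}/artworks"
end

# For comparison: escaping every non-ASCII byte produces the unwanted form.
def fully_escaped_tag_search_url(tag)
  "https://www.pixiv.net/tags/#{URI.encode_www_form_component(tag)}/artworks"
end
```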
2022-12-02 16:35:49 -06:00
evazion
2f508f03cf Fix #5362: Suggest ai-generated as a translated tag from pixiv response 2022-12-02 15:42:11 -06:00
evazion
869fddbb1a tests: fix broken tests. 2022-11-16 22:26:01 -06:00
evazion
c2adf279ee ugoira: remove the PixivUgoiraFrameData model.
Remove the last remaining uses of the PixivUgoiraFrameData model. As of
32bfb8407, Ugoira frame data is now stored in the MediaMetadata model,
under the `Ugoira:FrameDelays` EXIF field.

The pixiv_ugoira_frame_data table still exists, but it can be removed
after this commit is deployed.

Fixes #5264: Error when replacing with ugoira.
2022-10-10 18:21:30 -05:00
evazion
1d2bac7b95 Remove CurrentUser.ip_addr.
Remove the `CurrentUser.ip_addr` global variable and replace it with
`request.remote_ip`. Before we had to track the current user's IP in a
global variable so that when we edited a post for example, we could pass
down the user's IP to the model and save it in the post_versions table.
Now that we no longer save IPs in version tables, we don't need a global
variable to get access to the current user's IP outside of controllers.
2022-09-18 05:02:10 -05:00
evazion
600bdc9ae6 pixiv: drop support for https://tc-pximg01.techorus-cdn.com urls.
This was an obsolete URL format briefly used by Pixiv around 2019-2020.
There were only ~80 posts with sources using this format. They have been
manually fixed.
2022-08-24 15:54:10 -05:00
evazion
bf3ee9cfb8 Fix #5238: Trying to upload a pixiv direct image url that got trumped by a revision redirects to the new post if it's uploaded.
Bug: When uploading a direct Pixiv image URL, we ignored it in favor of the
image URL returned by the Pixiv API. This meant if you tried to upload the
original version of a revised image, we would get the revised version instead.

Fix: When given a direct Pixiv image URL, use it as-is if it's a full
image URL. If it's a sample image URL, ignore it in favor of the full image
URL as returned by the API, unless the post is deleted and the API data
is unavailable.
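
The rule in the fix can be sketched roughly as follows. This is an illustration, not the actual strategy code: `image_url_to_download` is a hypothetical name, and it assumes that full-size Pixiv image URLs contain `/img-original/` and that the API URL is nil when the post is deleted.

```ruby
# Hypothetical sketch. Assumes full-size Pixiv image URLs contain
# "/img-original/" and that api_image_url is nil when API data is unavailable.
def image_url_to_download(given_url, api_image_url)
  # A full image URL is used as-is, even if the work was revised later.
  return given_url if given_url.include?("/img-original/")

  # A sample URL is replaced by the API's full image URL when available,
  # otherwise we fall back to the URL we were given.
  api_image_url || given_url
end
```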
2022-08-24 15:40:04 -05:00
evazion
23b8350320 sources: add image_url?, page_url?, and profile_url? methods.
Add methods to Source::URL for determining whether a URL is an image
URL, a page URL, or a profile URL.

Also add more source URL tests and fix various URL parsing bugs.
2022-05-01 21:01:36 -05:00
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
4ef8178bd1 sources: remove canonical_url method.
Refactor source strategies to remove the `canonical_url` method.

`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the page URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the image URL is used as the source.

This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
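
A rough sketch of how the preferred source could be chosen, assuming the intent is that a convertible image URL is replaced by its page URL and any other URL is kept. `preferred_source` and the lookup-table converter are hypothetical stand-ins for `Source::URL#page_url`, and the URLs are illustrative.

```ruby
# Hypothetical sketch. `page_url_for` is any callable that returns a page URL
# for the given URL, or nil when no page URL can be derived.
def preferred_source(url, page_url_for)
  page_url_for.call(url) || url
end

# Toy converter backed by a lookup table instead of real URL parsing.
KNOWN_PAGE_URLS = {
  "https://i.pximg.net/c/360x360_70/custom-thumb/img/2022/03/08/00/00/56/96755248_p0_custom1200.jpg" =>
    "https://www.pixiv.net/artworks/96755248"
}.freeze

TOY_CONVERTER = ->(url) { KNOWN_PAGE_URLS[url] }
```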
2022-03-23 23:38:06 -05:00
evazion
3aa5cab2aa sources: refactor normalize_for_source.
`normalize_for_source` was used to convert image URLs to page URLs when displaying sources
on the post show page. Move all the code for converting image URLs to page URLs from
`Sources::Strategies#normalize_for_source` to `Source::URL#page_url`.

Before we had to be very careful in source strategies not to make any network calls in
`normalize_for_source`, since it was used in the view for the post show page. Now all the
code for generating page URLs is isolated in Source::URL, which makes source strategies
simpler. It also makes it easier to check if a source is an image URL or page URL, and if
the image URL is convertible to a page URL, which will make autotagging bad_link or
bad_source feasible.

Finally, this fixes it to generate better page URLs in a handful of cases:

* https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP
* https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6s instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6
* http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png
* https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1
2022-03-23 01:34:04 -05:00
evazion
2f61486ac6 sources: remove image_url method from base strategy.
Remove the `image_url` method from source strategies. This method would
return only the first image if a source had multiple images. The
`image_urls` method should be used instead. Tests were the main place
that still used `image_url` instead of `image_urls`.

Also make post replacements return an error if replacing with a source
that contains multiple images, instead of just blindly replacing the
post with the first image in the source.
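
The replacement check might look something like this. It is a sketch only: `MultipleImagesError` and `replacement_image_url` are hypothetical names, not the ones used in the codebase.

```ruby
# Hypothetical error class for sources that resolve to several images.
class MultipleImagesError < StandardError; end

# Returns the single image URL to replace a post with, and refuses when the
# source contains more than one image instead of silently taking the first.
def replacement_image_url(image_urls)
  if image_urls.size > 1
    raise MultipleImagesError, "source contains #{image_urls.size} images"
  end

  image_urls.first
end
```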
2022-03-11 01:59:21 -06:00
evazion
1c620f8055 sources: factor out Source::URL::Pixiv.
* Drop support for preview_urls. This means that IQDB lookups may be
  slower, especially for ugoiras, since we have to download the full
  ugoira now. However, ugoira lookups should produce better results,
  since the ugoira thumbnail chosen by Pixiv wasn't necessarily the same
  as the thumbnail chosen by Danbooru.

* Drop support for uploading single manga pages:

    http://www.pixiv.net/member_illust.php?mode=manga_big&illust_id=18557054&page=2

  Previously uploading a URL like this would only upload a single image
  out of a multi-image work. Now it will upload all images in the work.
  Pixiv no longer supports URLs like this, so we don't either.

* Add support for parsing URLs like this:

    https://i.pximg.net/c/360x360_70/custom-thumb/img/2022/03/08/00/00/56/96755248_p0_custom1200.jpg

  Apparently artists can choose a custom thumbnail now (not like anyone
  will try to upload one though).
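
As an illustration of the kind of parsing involved, the sketch below pulls the illust id and page number out of an image URL. The pattern is inferred from the example URL above, and `PixivImage` and `parse_pixiv_image_url` are hypothetical names rather than part of `Source::URL::Pixiv`.

```ruby
# Hypothetical value object holding the parsed parts of an image URL.
PixivImage = Struct.new(:illust_id, :page, :custom_thumbnail)

# Parses file names such as "96755248_p0.png" or "96755248_p0_custom1200.jpg".
# Returns nil when the URL does not end in a file name of that shape.
def parse_pixiv_image_url(url)
  match = url.match(%r{/(?<id>\d+)_p(?<page>\d+)(?<custom>_custom\d+)?\.\w+\z})
  return nil unless match

  PixivImage.new(match[:id].to_i, match[:page].to_i, !match[:custom].nil?)
end
```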
2022-03-08 22:17:38 -06:00
evazion
df0bb70486 sources: factor out Source::URL::PixivSketch.
Add upload support for Pixiv Sketch. Fetch tags, commentary, and artist,
and rewrite sample images to full images.

Authentication isn't required. R18 images are hidden in the browser but
visible in the API.
2022-03-08 18:24:12 -06:00
evazion
abdab7a0a8 uploads: rework upload process.
Rework the upload process so that files are saved to Danbooru first
before the user starts tagging the upload.

The main user-visible change is that you have to select the file first
before you can start tagging it. Saving the file first lets us fix a
number of problems:

* We can check for dupes before the user tags the upload.
* We can perform dupe checks and show preview images for users not using the bookmarklet.
* We can show preview images without having to proxy images through Danbooru.
* We can show previews of videos and ugoira files.
* We can reliably show the filesize and resolution of the image.
* We can let the user save files to upload later.
* We can get rid of a lot of spaghetti code related to preprocessing
  uploads. This was the cause of most weird "md5 confirmation doesn't
  match md5" errors.

(Not all of these are implemented yet.)

Internally, uploading is now a two-step process: first we create an upload
object, then we create a post from the upload. This is how it works:

* The user goes to /uploads/new and chooses a file or pastes an URL into
  the file upload component.
* The file upload component calls `POST /uploads` to create an upload.
* `POST /uploads` immediately returns a new upload object in the `pending` state.
* Danbooru starts processing the upload in a background job (downloading,
  resizing, and transferring the image to the image servers).
* The file upload component polls `/uploads/$id.json`, checking the
  upload `status` until it returns `completed` or `error`.
* When the upload status is `completed`, the user is redirected to /uploads/$id.
* On the /uploads/$id page, the user can tag the upload and submit it.
* The upload form calls `POST /posts` to create a new post from the upload.
* The user is redirected to the new post.
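
The polling step in the flow above can be sketched as a small loop. This is an illustration under assumptions: `wait_for_upload` is a hypothetical name, and `fetch_status` stands in for whatever performs the request to `/uploads/$id.json` and returns the `status` field.

```ruby
# Polls until the upload reaches a terminal status. `fetch_status` is a
# hypothetical callable standing in for a GET of /uploads/$id.json.
def wait_for_upload(fetch_status, interval: 1.0, max_attempts: 30)
  max_attempts.times do
    status = fetch_status.call
    return status if %w[completed error].include?(status)

    sleep(interval)
  end

  raise "upload did not finish after #{max_attempts} attempts"
end

# Usage with a fake sequence of statuses instead of real HTTP requests.
fake_statuses = %w[pending processing completed].each
wait_for_upload(-> { fake_statuses.next }, interval: 0)
```

Capping the number of attempts keeps the caller from polling forever if the background job never finishes.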

This is the data model:

* An upload represents a set of files uploaded to Danbooru by a user.
  Uploaded files don't have to belong to a post. An upload has an
  uploader, a status (pending, processing, completed, or error), a
  source (unless uploading from a file), and a list of media assets
  (image or video files).

* There is a has-and-belongs-to-many relationship between uploads and
  media assets. An upload can have many media assets, and a media asset
  can belong to multiple uploads. Uploads are joined to media assets
  through an upload_media_assets table.

  An upload could potentially have multiple media assets if it's a Pixiv
  or Twitter gallery. This is not yet implemented (at the moment all
  uploads have one media asset).

  A media asset can belong to multiple uploads if multiple people try
  to upload the same file, or if the same user tries to upload the same
  file more than once.
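
The many-to-many relationship can be sketched in plain Ruby, without ActiveRecord. The `UploadMediaAssets` class below is a hypothetical in-memory stand-in for the upload_media_assets join table, and the field names on the structs are simplified assumptions.

```ruby
Upload = Struct.new(:id, :uploader, :status, :source)
MediaAsset = Struct.new(:id, :md5)

# In-memory stand-in for the join table: rows of [upload_id, media_asset_id].
class UploadMediaAssets
  def initialize
    @rows = []
  end

  # Links an upload to a media asset, ignoring duplicate links.
  def link(upload, asset)
    @rows |= [[upload.id, asset.id]]
  end

  def asset_ids_for(upload)
    @rows.select { |upload_id, _| upload_id == upload.id }.map(&:last)
  end

  def upload_ids_for(asset)
    @rows.select { |_, asset_id| asset_id == asset.id }.map(&:first)
  end
end
```

Two users uploading the same file would then produce two uploads that both link to one media asset.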

New features:

* On the upload page, you can press Ctrl+V to paste an URL and immediately upload it.
* You can save files for upload later. Your saved files are at /uploads.

Fixes:

* Improved error messages when uploading invalid files, bad URLs, and
  when forgetting the rating.
2022-01-28 04:13:22 -06:00
evazion
0ba6dc9ee5 Fix #4945: Search for an artist by URL throws an exception. 2021-12-18 01:55:29 -06:00
evazion
bc506ed1b8 uploads: refactor to simplify ugoira-handling and replacements:
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
  the MediaAsset is created, not when the post is created. This way it's
  possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
  with the ugoira frame data already attached to it, instead of returning it
  in the `data` field then passing it around separately in the `context`
  field of the upload.
2021-10-18 05:18:46 -05:00
evazion
ac12efb636 tests: fix test failures when running without API keys.
Fix the test suite failing when trying to run it in the default state
with no config file or API keys configured. Most source sites require
API keys or login credentials to be set in order to work. Skip these
tests when credentials aren't configured.
2021-09-22 04:33:36 -05:00
evazion
a3587c30b2 Fix broken tests. 2021-08-28 04:53:33 -05:00
nonamethanks
073f63cfa7 Pixiv: don't add auto-generated usernames to the other names field 2021-03-16 02:44:49 +01:00
evazion
23a06aff1d Fix #4720: Pixiv commentary links all create invalid urls.
Regression caused by the switch from the mobile API to the Ajax API. In
the Ajax API, commentaries have /jump.php?<url> links that we have to strip out.
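
A minimal sketch of stripping such links. `strip_jump_link` is a hypothetical helper, and it assumes the target URL is percent-encoded after the `?`; the actual fix may differ.

```ruby
require "uri"

# Hypothetical helper. Unwraps /jump.php?<url> links and leaves other links
# unchanged. Assumes the wrapped URL is percent-encoded.
def strip_jump_link(href)
  match = href.match(%r{\A(?:https?://www\.pixiv\.net)?/jump\.php\?(.+)\z})
  match ? URI.decode_www_form_component(match[1]) : href
end
```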
2021-02-13 17:41:01 -06:00
evazion
39cc3ed5cf pixiv: fix API breakage.
Fix the Pixiv API no longer working by rewriting the Pixiv strategy to
use the Ajax API instead of the mobile API.

Before we could authenticate in the mobile API by using the OAuth 2.0
grant_type=password authentication flow. This no longer works. Now it
requires logging in through an HTML page, which is protected by Google
reCaptcha. This makes using the mobile API infeasible.

Instead we switch to the Ajax API, which only needs a PHPSESSID to
authenticate. This can be obtained by logging in manually and using the
devtools to extract the cookie.

This also temporarily removes support for Pixiv novels. This should be
moved to a separate source strategy.
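
For illustration, cookie-based authentication of this kind could be set up as below. The sketch only builds the request and does not send it. The `/ajax/illust/:id` path and the `ajax_request` helper are assumptions made for the example, not details taken from this commit.

```ruby
require "net/http"
require "uri"

# Builds (but does not send) a request authenticated with a session cookie.
# The /ajax/illust/:id path is an assumption used for illustration.
def ajax_request(illust_id, phpsessid)
  uri = URI("https://www.pixiv.net/ajax/illust/#{illust_id}")
  request = Net::HTTP::Get.new(uri)
  request["Cookie"] = "PHPSESSID=#{phpsessid}"
  request
end
```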
2021-02-09 06:18:36 -06:00
evazion
dbb66ace90 routes: replace hardcoded routes in models with route helpers.
Add a Routes module that gives models access to route helpers outside of
views, and use it to replace various hardcoded routes.
2020-12-24 00:17:19 -06:00
evazion
cc64f8b7ee tests: fix broken source tests.
Fix various tests broken by source files changing or being deleted.
2020-11-10 14:52:54 -06:00
nonamethanks
9a7a1e20ca Add fanbox support 2020-08-09 00:21:57 +02:00
evazion
4074cc99f9 uploads: fix incorrect remote sizes on pixiv uploads.
Bug: the uploads page showed a remote size of 146 bytes for Pixiv uploads.

Cause: we didn't spoof the Referer header when making the HEAD request
for the image, causing Pixiv to return a 403 error.

Also fix the case where the Content-Length header is absent.
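
Both parts of the fix can be sketched without touching the network. `head_request` and `remote_size` are hypothetical helpers, and the default Referer value is an assumption.

```ruby
require "net/http"
require "uri"

# Builds a HEAD request with a spoofed Referer header. The default referer
# value is an assumption made for this sketch.
def head_request(image_url, referer: "https://www.pixiv.net/")
  request = Net::HTTP::Head.new(URI(image_url))
  request["Referer"] = referer
  request
end

# Returns the remote size in bytes, or nil when Content-Length is absent.
# `headers` is anything that responds to [] with a lowercase header name.
def remote_size(headers)
  value = headers["content-length"]
  value.nil? ? nil : Integer(value, 10)
end
```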
2020-06-24 03:02:45 -05:00
evazion
185693b99b Merge branch 'master' into fix-pixiv-profile-url 2020-06-24 00:06:55 -05:00
evazion
83a8468ee9 tests: remove unnecessary rescuing of Net::OpenTimeout errors.
These exceptions are no longer thrown now that we've switched from
HTTParty to http.rb. Swallowing unexpected exceptions during testing was
a bad practice anyway.
2020-06-23 03:12:44 -05:00
evazion
8a21c9a8db Merge pull request #4523 from nonamethanks/revert_pixiv_tools
Revert "Pixiv: don't blacklist digital tools"
2020-06-23 02:39:18 -05:00
evazion
5604ab0079 pixiv: remove fanbox support.
This is broken and it needs to be rewritten as a separate source
strategy anyway.
2020-06-21 11:59:51 -05:00
nonamethanks
0a396c8b95 Revert "Pixiv: don't blacklist digital tools"
This reverts commit e83d07ea7b.

It was worth a try, but unfortunately it seems that once
someone sets tools in a Pixiv upload, they become defaults and
are applied to all of their subsequent uploads, so we get some
posts with two or three different digital tags.
2020-06-19 08:08:46 +02:00
BrokenEagle
158a4aa916 Fix Pixiv user profile URL to use the latest format
This will only affect new artist and commentary records going forward.
2020-06-17 07:07:33 +00:00
BrokenEagle
05f9b78ee3 Distinctly separate and label explicit/guro content in Pixiv test
This helps discern why these tests might be failing and serves as a
reminder to set the permissions for the Pixiv account correctly.
2020-06-17 07:07:33 +00:00
evazion
19727ab5c4 Merge pull request #4505 from nonamethanks/pixiv_digital_tags
Pixiv: don't blacklist digital tools anymore
2020-06-15 20:56:56 -05:00
nonamethanks
e83d07ea7b Pixiv: don't blacklist digital tools anymore 2020-06-12 04:15:20 +02:00
evazion
2d05004bef tests: don't cache pixiv sessions. 2020-06-11 00:47:12 -05:00
evazion
d6b266514b tests: disable known broken pixiv fanbox tests. 2020-06-10 18:21:44 -05:00
nonamethanks
307df3b3e4 Refactor source normalization
* Move the source normalization logic out of the post model
  and into individual sources' strategies.
* Rewrite normalization tests to be handled in each source's test,
  and expand them significantly. Previously we were only testing
  a very small subset of domains and variants.
* Fix up normalization for several sites.
* Normalize fav.me urls into normal deviantart urls.
2020-05-21 22:46:51 +02:00
evazion
49a3538933 pixiv: add support for techorus urls. 2020-03-04 00:00:39 -06:00
evazion
09046783ac pixiv: fix tests. 2020-03-03 23:54:03 -06:00
evazion
1244e02fe2 pixiv: handle new https://i-f.pximg.net urls. 2020-02-18 19:22:57 -06:00
evazion
309821bf73 rubocop: fix various style issues. 2019-12-22 21:23:37 -06:00
evazion
03d9b3feca pixiv: support new https://www.pixiv.net/artworks/:id urls. 2019-09-24 03:33:21 -05:00
evazion
8cadef2dd7 pixiv: fix illust id parsing (fix #4043).
* Tighten up illust id parsing to avoid misparsing ids from
  non-illust urls (sketch urls and novel urls).

* Move id parsing tests from post_test.rb to sources/pixiv_test.rb.

* Drop support for touch.pixiv.net urls. These urls are no longer used
  by Pixiv and aren't present as the source of any posts on Danbooru.
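
A sketch of what tighter id parsing could look like, assuming ids are only taken from `member_illust.php` URLs on the main Pixiv host. `parse_illust_id` is a hypothetical helper and covers far fewer URL formats than the real strategy.

```ruby
require "uri"

# Hypothetical sketch: returns the illust id as an Integer, or nil when the
# URL is not an illust URL (for example a sketch or novel URL).
def parse_illust_id(url)
  uri = URI.parse(url)
  return nil unless %w[www.pixiv.net pixiv.net].include?(uri.host)
  return nil unless uri.path == "/member_illust.php"

  id = URI.decode_www_form(uri.query.to_s).to_h["illust_id"]
  id&.match?(/\A\d+\z/) ? id.to_i : nil
end
```

Checking the host and path before reading the query string is what keeps ids from other URL types from being picked up by accident.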
2019-01-13 14:28:51 -06:00
evazion
04d5b16da7 pixiv: fix failure to upload bad pixiv id images (fix #4031)
Bug: Uploading bad pixiv id images failed because the pixiv strategy
raised a BadIDError exception when the upload service checked for the
ugoira frame data.
2019-01-03 18:01:20 -06:00
evazion
2129e60b2b pixiv: include stacc url in new artist entries (#4028). 2018-12-27 15:03:11 -06:00
evazion
8f6c710c6b tests: fix translated tags test failures. 2018-11-12 18:04:07 -06:00
evazion
fbd5f6b7f2 pixiv: fix preview_urls for ugoiras (#3891). 2018-09-12 00:43:10 -05:00
evazion
37fc215d75 pixiv: fix preview_urls to use correct url (#3891). 2018-09-11 23:55:46 -05:00
Albert Yi
a5df178bcc Merge pull request #3886 from r888888888/source-api-caching
cache api clients
2018-09-11 17:34:25 -07:00