Commit Graph

556 Commits

Author SHA1 Message Date
evazion
8a50148823 pixiv: fixup bug with fetching image_urls for bad_id posts.
Fix `image_urls` returning `[nil]` when fetching data for an image URL
that was bad_id. In that case `original_urls` is empty, so we fall back
to using the deleted image URL as-is.
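A minimal Ruby sketch of the fallback described above. The method signature is hypothetical (the real strategy computes these values internally). It only illustrates returning the requested URL instead of `[nil]` when `original_urls` is empty.

```ruby
# Hypothetical sketch: `original_urls` stands in for the URLs found in the Pixiv
# API response, and `requested_url` is the image URL being uploaded.
def image_urls(original_urls, requested_url)
  urls = original_urls.compact
  # bad_id posts have no original URLs, so use the deleted image URL as-is.
  urls.empty? ? [requested_url] : urls
end
```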
2022-03-09 01:14:09 -06:00
evazion
77c88fd867 Merge pull request #5038 from nonamethanks/remove-redundant-comments
sources: remove redundant comments
2022-03-08 23:28:29 -06:00
evazion
6afb2f8e3c Merge pull request #5037 from nonamethanks/tumblr-refactor
sources: factor out Source::URL::Tumblr
2022-03-08 23:26:30 -06:00
evazion
cf4b9a6114 Merge pull request #5039 from nonamethanks/simplify-lofter-tag-parsing
Lofter: simplify tag extraction logic
2022-03-08 23:21:57 -06:00
evazion
987f2985d3 Merge pull request #5040 from nonamethanks/fix-weibo-404
Weibo: fix exception for deleted url
2022-03-08 23:08:37 -06:00
evazion
52a2d3418c pixiv: fixup bugs in 1c620f805.
* Fix error when uploading non-ugoira files.
* Fix sample image URLs not being rewritten to full images correctly. We
  have to get the full image URL from the API because given an
  /img-master/ URL, we don't know what the original file extension is.
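  A rough sketch of why the rewrite needs the API, assuming sample URLs of
  the form `.../img-master/img/<date>/<id>_p<page>_master1200.jpg`. The
  path gives the illust ID and page but not the original extension, so the
  full URL is looked up from the API result. The `PIXIV_MASTER` pattern,
  the method name, and `api_original_urls` (standing in for the API
  response) are hypothetical.

```ruby
# Hypothetical pattern for Pixiv sample URLs; the date is six numeric path segments.
PIXIV_MASTER = %r{\Ahttps://i\.pximg\.net/img-master/img/(?:\d+/){6}(?<id>\d+)_p(?<page>\d+)_master1200\.\w+\z}

# `api_original_urls` stands in for the per-page original URLs from the Pixiv API.
# The sample's ".jpg" says nothing about whether the original is a .jpg or a .png.
def full_image_url(sample_url, api_original_urls)
  m = PIXIV_MASTER.match(sample_url)
  return sample_url unless m

  api_original_urls.fetch(m[:page].to_i, sample_url)
end
```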
2022-03-08 23:07:24 -06:00
nonamethanks
c9be77d1f8 Weibo: fix exception for deleted url 2022-03-09 05:31:38 +01:00
evazion
1c620f8055 sources: factor out Source::URL::Pixiv.
* Drop support for preview_urls. This means that IQDB lookups may be
  slower, especially for ugoiras, since we have to download the full
  ugoira now. However, ugoira lookups should produce better results,
  since the ugoira thumbnail chosen by Pixiv wasn't necessarily the same
  as the thumbnail chosen by Danbooru.

* Drop support for uploading single manga pages:

    http://www.pixiv.net/member_illust.php?mode=manga_big&illust_id=18557054&page=2

  Previously, uploading a URL like this would upload only a single image
  out of a multi-image work. Now it will upload all images in the work.
  Pixiv no longer supports URLs like this, so we don't either.

* Add support for parsing URLs like this:

    https://i.pximg.net/c/360x360_70/custom-thumb/img/2022/03/08/00/00/56/96755248_p0_custom1200.jpg

  Apparently artists can choose a custom thumbnail now (not like anyone
  will try to upload one though).
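  For illustration, a hypothetical Ruby pattern that extracts the illust ID
  and page from the custom thumbnail URL above. It skips the
  `/c/360x360_70/` size segment and assumes the date is always six numeric
  path segments.

```ruby
# Hypothetical pattern for Pixiv custom thumbnail URLs.
PIXIV_CUSTOM_THUMB = %r{\Ahttps://i\.pximg\.net/c/[^/]+/custom-thumb/img/(?:\d+/){6}(?<id>\d+)_p(?<page>\d+)_custom1200\.\w+\z}

# Returns { illust_id:, page: } for a custom thumbnail URL, or nil otherwise.
def parse_custom_thumb(url)
  m = PIXIV_CUSTOM_THUMB.match(url)
  m && { illust_id: m[:id].to_i, page: m[:page].to_i }
end
```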
2022-03-08 22:17:38 -06:00
evazion
df0bb70486 sources: factor out Source::URL::PixivSketch.
Add upload support for Pixiv Sketch. Fetch tags, commentary, and artist,
and rewrite sample images to full images.

Authentication isn't required. R18 images are hidden in the browser but
visible in the API.
2022-03-08 18:24:12 -06:00
nonamethanks
ff6bfff311 Lofter: simplify tag extraction logic
Now that we have a separate parsing class we can just use it to properly
parse tag urls as well.
2022-03-08 17:01:50 +01:00
nonamethanks
ebd3670076 sources: remove redundant comments
These comments are already present under the parse blocks, so the huge
walls of text before the code are not needed anymore.
2022-03-08 16:56:00 +01:00
nonamethanks
b9c7e467e5 sources: factor out Source::URL::Tumblr
Also adds support for fetching source data from direct image urls when
possible.
2022-03-08 15:06:06 +01:00
nonamethanks
d8e2f2ee33 sources: factor out Source::URL::Weibo
Additionally, fixed some broken tests and changed normalization for album
URLs to point to the mobile version instead, because the desktop album
pages are only visible to logged-in users.
2022-03-07 16:52:43 +01:00
evazion
1609059bf4 sources: factor out Source::URL::Fanbox.
Also fix it so that we grab the full image for cover URLs like this:

* Sample: https://pixiv.pximg.net/c/1620x580_90_a2_g5/fanbox/public/images/creator/1566167/cover/QqxYtuWdy4XWQx1ZLIqr4wvA.jpeg
* Full: https://pixiv.pximg.net/fanbox/public/images/creator/1566167/cover/QqxYtuWdy4XWQx1ZLIqr4wvA.jpeg
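The sample-to-full rewrite for these two URLs amounts to dropping the `/c/<size>/` segment. A minimal sketch, assuming every Fanbox sample has exactly one such segment right after the host (the helper name is hypothetical):

```ruby
# Hypothetical helper: strip the /c/<size>/ resize segment from a Fanbox sample URL.
# URLs without the segment are returned unchanged.
def fanbox_full_image_url(url)
  url.sub(%r{\Ahttps://pixiv\.pximg\.net/c/[^/]+/fanbox/}, "https://pixiv.pximg.net/fanbox/")
end
```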
2022-02-28 06:25:06 -06:00
evazion
317ec886bc sources: factor out Source::URL::Nijie.
Also fixes the uploader uploading all images when trying to upload only a
single image in a multi-image work. Caused by `image_urls` incorrectly
returning all images when the source strategy was given a url for a
single image.
2022-02-27 02:27:35 -06:00
evazion
fcf517834d sources: factor out Source::URL::ArtStation. 2022-02-26 21:03:49 -06:00
evazion
9169f00e80 sources: factor out Source::URL::Moebooru. 2022-02-26 17:46:44 -06:00
evazion
74fdeef10c sources: factor out Source::URL::Mastodon. 2022-02-26 15:08:27 -06:00
evazion
86d8e2d13d sources: factor out Source::URL::Lofter. 2022-02-25 23:43:10 -06:00
evazion
f062f2d145 sources: factor out Source::URL::Newgrounds.
Also fix it so that the image URL is set as the source for Newgrounds
posts, not the page URL. It's possible to generate the page URL from the
image URL (except for images after the first in multi-image posts).

* Page: https://www.newgrounds.com/art/view/natthelich/weaver
* Image: https://art.ngfiles.com/images/1520000/1520217_natthelich_weaver.jpg?f1606365031
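A sketch of deriving the page URL from the image URL above, assuming the filename has the form `<id>_<username>_<title>.<ext>` and that the username contains no underscores. The helper name and pattern are hypothetical, and the sketch does not handle the later images in multi-image posts.

```ruby
# Hypothetical pattern for Newgrounds image URLs; the trailing ?f<timestamp> is optional.
NEWGROUNDS_IMAGE = %r{\Ahttps?://art\.ngfiles\.com/images/\d+/\d+_(?<user>[^_/]+)_(?<title>[^/.?]+)\.\w+(?:\?.*)?\z}

# Returns the page URL for a first-image URL, or nil if the URL doesn't match.
def newgrounds_page_url(image_url)
  m = NEWGROUNDS_IMAGE.match(image_url)
  return nil unless m

  "https://www.newgrounds.com/art/view/#{m[:user]}/#{m[:title]}"
end
```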
2022-02-25 23:04:03 -06:00
evazion
64472a7b7e sources: factor out Source::URL::HentaiFoundry.
Add support for these URL types:

* http://pictures.hentai-foundry.com//s/soranamae/363663.jpg
* http://www.hentai-foundry.com/piccies/d/dmitrys/1183.jpg
* http://www.hentai-foundry.com/pic-149160.php
* http://www.hentai-foundry.com/user-RockCandy.php
* http://www.hentai-foundry.com/profile-sawao.php

These URL types are obsolete, but still present in some old posts.
2022-02-25 22:01:17 -06:00
evazion
e6ded89f85 sources: factor out Source::URL::Plurk.
Also fix it so that for adult works, we get the images posted by the
artist in the replies. Example: https://www.plurk.com/p/omc64y (nsfw).
2022-02-25 02:06:57 -06:00
evazion
26f4cf1ebd sources: factor out Source::URL::Skeb. 2022-02-25 02:06:57 -06:00
evazion
ffe52f5ead sources: factor out Source::URL::Foundation.
Add support for a couple more URL types:

* https://foundation.app/@asuka111art/dinner-with-cats-82426
* https://f8n-production-collection-assets.imgix.net/0x3B3ee1931Dc30C1957379FAc9aba94D1C48a5405/128711/QmcBfbeCMSxqYB3L1owPAxFencFx3jLzCPFx6xUBxgSCkH/nft.png

Also include these URLs in the list of profile URLs:

* https://foundation.app/0x7E2ef75C0C09b2fc6BCd1C68B6D409720CcD58d2 (for https://foundation.app/@mochiiimo)

These URLs should be stable even if the user changes their name.
2022-02-23 23:49:31 -06:00
evazion
043c08eb05 sources: factor out Source::URL::TwitPic. 2022-02-23 23:49:31 -06:00
evazion
7ed8f95a8e sources: add Source::URL class; factor out Source::URL::Twitter.
Introduce a Source::URL class for parsing URLs from source sites. Refactor the Twitter
source strategy to use it.

This is the first step towards factoring all the URL parsing logic out of source
strategies and moving it to subclasses of Source::URL. Each site will have a subclass
of Source::URL dedicated to parsing URLs from that site. Source strategies will use
these classes to extract information from URLs.

This is to simplify source strategies. Most sites have many different URL formats we have
to parse or rewrite, and handling all these different cases tends to make source
strategies very complex. Isolating the URL parsing logic from the site scraping logic
should make source strategies easier to maintain.
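A minimal sketch of the structure described above, using Ruby's standard `URI` library. Only the names `Source::URL` and `Source::URL::Twitter` come from this commit; the registry, `match?`, and the attribute names are hypothetical and not Danbooru's actual API.

```ruby
require "uri"

# Hypothetical sketch of the Source::URL design.
module Source
  class URL
    attr_reader :uri

    # Every site-specific subclass registers itself here when it is defined.
    def self.registry
      @registry ||= []
    end

    def self.inherited(subclass)
      super
      URL.registry << subclass
    end

    # Return an instance of the first subclass that recognizes the URL, or nil
    # if no site matches or the string isn't a valid URL.
    def self.parse(string)
      uri = URI.parse(string)
      klass = URL.registry.find { |site| site.match?(uri) }
      klass&.new(uri)
    rescue URI::InvalidURIError
      nil
    end

    def initialize(uri)
      @uri = uri
      parse
    end

    # Subclasses extract site-specific fields from the URL here.
    def parse; end
  end

  class URL::Twitter < URL
    attr_reader :username, :status_id

    def self.match?(uri)
      %w[twitter.com mobile.twitter.com].include?(uri.host)
    end

    def parse
      m = %r{\A/(?<user>\w+)/status/(?<id>\d+)}.match(uri.path.to_s)
      return unless m

      @username = m[:user]
      @status_id = m[:id]
    end
  end
end
```

In this sketch, a strategy would call `Source::URL.parse(url)` and read `username` or `status_id` instead of matching URL formats itself.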
2022-02-23 23:46:04 -06:00
evazion
112b323f01 foundation: fix exception when uploading new Foundation url format.
Fix 'null value in column "source_url"' exception when uploading urls like this:

* https://foundation.app/@KILLERGF/kgfgen/4
* https://foundation.app/@mochiiimo/foundation/97376
2022-02-22 13:29:28 -06:00
evazion
7b009cc893 nicoseiga: fix inability to login to nicoseiga.
NicoSeiga changed it so that on every login, you must enter a 2FA code
sent by email. This broke the NicoSeiga strategy. The fix is to just use
a static session cookie instead (and hope it doesn't expire, and isn't
tied to an IP).

The `nico_seiga_login` and `nico_seiga_password` config settings have
been removed from config/danbooru_default_config.rb and replaced by
`nico_seiga_user_session`. If you run your own Danbooru instance, you
will have to update your config file manually.
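For instance, a local override might look like the following. This is a hypothetical sketch: it assumes settings are defined as plain methods on a `Danbooru::Configuration` class, and the `NICO_SEIGA_USER_SESSION` environment variable name is made up for illustration.

```ruby
# Hypothetical sketch; the class layout and environment variable name are assumptions.
module Danbooru
  class Configuration
    # Session cookie value from a browser that is already logged in to NicoSeiga.
    # Replaces the old nico_seiga_login / nico_seiga_password settings.
    def nico_seiga_user_session
      ENV.fetch("NICO_SEIGA_USER_SESSION", nil)
    end
  end
end
```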
2022-02-22 12:23:01 -06:00
evazion
7d49ab6130 Add Danbooru::URL class.
Introduce a Danbooru::URL class for dealing with URLs. This is a wrapper
around Addressable::URI that adds some additional helper methods. Most
significantly, the `parse` method only allows valid http/https URLs, and
it returns nil instead of raising an exception when the URL is invalid.
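A minimal sketch of the `parse` behavior described above. It substitutes Ruby's standard `URI` library for Addressable::URI, and the `domain` helper is a hypothetical example of an added method, not necessarily one Danbooru::URL provides.

```ruby
require "uri"

# Sketch only: stdlib URI stands in for Addressable::URI.
module Danbooru
  class URL
    attr_reader :uri

    def initialize(uri)
      @uri = uri
    end

    # Return a URL for valid http/https URLs; return nil instead of raising otherwise.
    # URI::HTTPS is a subclass of URI::HTTP, so one check covers both schemes.
    def self.parse(string)
      uri = URI.parse(string.to_s.strip)
      return nil unless uri.is_a?(URI::HTTP) && uri.host.to_s != ""

      new(uri)
    rescue URI::InvalidURIError
      nil
    end

    # Hypothetical helper: the host without a leading "www.".
    def domain
      uri.host.delete_prefix("www.")
    end
  end
end
```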
2022-02-22 00:17:53 -06:00
evazion
68ba447494 uploads: remove batch upload page.
* Make /uploads/batch redirect to /uploads/new.
* Remove /uploads/image_proxy.
2022-02-21 00:03:43 -06:00
evazion
9a5a04d74e nijie: fix uploads not working for new image URL format.
Fix uploads not working for image URLs like this:

    https://pic.nijie.net/07/nijie/17/95/728995/illust/0_0_403fdd541191110c_c25585.jpg
2022-02-15 20:45:28 -06:00
evazion
7cfbd891ae pixiv: avoid unnecessary API call when uploading Pixiv posts.
Do one less API call when fetching the image URLs for a Pixiv post. The
`is_ugoira?` check in `image_urls` caused us to do an extra API call
when fetching the image URLs for a non-ugoira post.

API calls to Pixiv take around 800ms, so this reduces the minimum upload
time for Pixiv posts from ~1.6 seconds (two calls) to ~0.8 seconds.
2022-02-15 18:55:12 -06:00
evazion
e4d7453180 uploads: improve error messages.
Improve upload error messages when downloading a URL fails, or when the
downloaded file isn't an image or video file.
2022-02-15 18:54:55 -06:00
evazion
b6538fde38 uploads: fix NicoSeiga sources not working.
Fix uploads for NicoSeiga sources not working because the strategy
returned URLs like the one below in the list of image_urls, which
require a login to download:

    https://seiga.nicovideo.jp/image/source/10315315

Also fix certain URLs like https://dic.nicovideo.jp/oekaki/52833.png not
working, because they didn't contain an image ID and the image_urls
method returned an empty list in this case.
2022-02-15 17:12:02 -06:00
evazion
37075988ce uploads: fix page_url for null strategy.
Fix the null source strategy setting a page URL. The page URL should be
nil when we can't determine which page contains the image URL.

Fixes the upload_media_assets.page_url field being filled for uploads
from unknown sites.
2022-02-15 00:59:22 -06:00
evazion
27d71f2727 uploads: raise download timeout.
Raise the timeout for downloading files from the source to 60 seconds globally.

Previously we had a lower timeout because uploads were processed in the
foreground when not using the bookmarklet, and we didn't want to tie up
Puma worker processes with slow downloads. Now that all uploads are
processed in the background, we can have a higher timeout.
2022-02-15 00:56:51 -06:00
evazion
26da728a07 deviant art: fix new image URLs not being recognized.
Partial fix for #5008. DeviantArt now returns https://wixmp-ed30a86b8c4ca887773594c2.wixmp.com
URLs instead of https://images-wixmp-ed30a86b8c4ca887773594c2.wixmp.com for images in the
API. Fix these URLs not being recognized by the DeviantArt strategy.
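A sketch of a host check that accepts both forms, assuming the part after `wixmp-` is always lowercase hexadecimal (the constant and method names are hypothetical):

```ruby
require "uri"

# Matches both the old images-wixmp-<hex>.wixmp.com host and the new wixmp-<hex>.wixmp.com host.
DEVIANTART_IMAGE_HOST = /\A(?:images-)?wixmp-[0-9a-f]+\.wixmp\.com\z/

def deviantart_image_url?(url)
  host = URI.parse(url).host
  !host.nil? && DEVIANTART_IMAGE_HOST.match?(host)
rescue URI::InvalidURIError
  false
end
```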
2022-02-14 00:33:50 -06:00
evazion
d6f7725a1e nijie: fix exception in login process.
Fix an exception when we can't find the 'url' field in the login form
because we're rate limited by Nijie and couldn't scrape the login page.
2022-02-12 17:26:25 -06:00
nonamethanks
1c9014a5bb Fix lofter not working with iqdb 2022-02-05 09:43:17 +01:00
evazion
abdab7a0a8 uploads: rework upload process.
Rework the upload process so that files are saved to Danbooru first
before the user starts tagging the upload.

The main user-visible change is that you have to select the file first
before you can start tagging it. Saving the file first lets us fix a
number of problems:

* We can check for dupes before the user tags the upload.
* We can perform dupe checks and show preview images for users not using the bookmarklet.
* We can show preview images without having to proxy images through Danbooru.
* We can show previews of videos and ugoira files.
* We can reliably show the filesize and resolution of the image.
* We can let the user save files to upload later.
* We can get rid of a lot of spaghetti code related to preprocessing
  uploads. This was the cause of most weird "md5 confirmation doesn't
  match md5" errors.

(Not all of these are implemented yet.)

Internally, uploading is now a two-step process: first we create an upload
object, then we create a post from the upload. This is how it works:

* The user goes to /uploads/new and chooses a file or pastes a URL into
  the file upload component.
* The file upload component calls `POST /uploads` to create an upload.
* `POST /uploads` immediately returns a new upload object in the `pending` state.
* Danbooru starts processing the upload in a background job (downloading,
  resizing, and transferring the image to the image servers).
* The file upload component polls `/uploads/$id.json`, checking the
  upload `status` until it returns `completed` or `error`.
* When the upload status is `completed`, the user is redirected to /uploads/$id.
* On the /uploads/$id page, the user can tag the upload and submit it.
* The upload form calls `POST /posts` to create a new post from the upload.
* The user is redirected to the new post.
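
The polling step can be sketched as a simple loop. This is an illustration in Ruby, not the file upload component itself (which runs in the browser). `fetch_status` is a hypothetical stand-in for a request to `/uploads/$id.json` that returns the upload's `status` field, and the interval and attempt limit are arbitrary.

```ruby
# Raised when the upload never reaches a final state (hypothetical error class).
class UploadTimeout < StandardError; end

# Poll until the upload's status is "completed" or "error", then return that status.
def wait_for_upload(upload_id, fetch_status:, interval: 0.5, max_attempts: 120)
  max_attempts.times do
    status = fetch_status.call(upload_id)
    return status if %w[completed error].include?(status)

    sleep interval
  end
  raise UploadTimeout, "upload #{upload_id} did not finish in time"
end
```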

This is the data model:

* An upload represents a set of files uploaded to Danbooru by a user.
  Uploaded files don't have to belong to a post. An upload has an
  uploader, a status (pending, processing, completed, or error), a
  source (unless uploading from a file), and a list of media assets
  (image or video files).

* There is a has-and-belongs-to-many relationship between uploads and
  media assets. An upload can have many media assets, and a media asset
  can belong to multiple uploads. Uploads are joined to media assets
  through an upload_media_assets table.

  An upload could potentially have multiple media assets if it's a Pixiv
  or Twitter gallery. This is not yet implemented (at the moment all
  uploads have one media asset).

  A media asset can belong to multiple uploads if multiple people try
  to upload the same file, or if the same user tries to upload the same
  file more than once.
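
An in-memory Ruby sketch of this relationship, for illustration only. The real models are database-backed; here the join records mirror the upload_media_assets table, and reusing an existing media asset by MD5 when the same file is uploaded twice is an assumption.

```ruby
Upload = Struct.new(:id, :uploader, :status, :source)
MediaAsset = Struct.new(:id, :md5)
UploadMediaAsset = Struct.new(:upload_id, :media_asset_id) # one row per join

class Store
  def initialize
    @uploads = {}
    @assets_by_md5 = {}
    @joins = []
  end

  def add_upload(upload)
    @uploads[upload.id] = upload
  end

  # Reuse the existing asset when the same file (same MD5) is uploaded again.
  def attach(upload, md5)
    asset = @assets_by_md5[md5] ||= MediaAsset.new(@assets_by_md5.size + 1, md5)
    @joins << UploadMediaAsset.new(upload.id, asset.id)
    asset
  end

  def media_assets_for(upload)
    ids = @joins.select { |j| j.upload_id == upload.id }.map(&:media_asset_id)
    @assets_by_md5.values.select { |a| ids.include?(a.id) }
  end

  def uploads_for(asset)
    ids = @joins.select { |j| j.media_asset_id == asset.id }.map(&:upload_id)
    ids.map { |id| @uploads[id] }
  end
end
```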

New features:

* On the upload page, you can press Ctrl+V to paste a URL and immediately upload it.
* You can save files for upload later. Your saved files are at /uploads.

Fixes:

* Improved error messages when uploading invalid files, bad URLs, and
  when forgetting the rating.
2022-01-28 04:13:22 -06:00
evazion
00ebd2e13c Merge pull request #4956 from nonamethanks/fix-skeb
Skeb: fix several issues with the strategy
2022-01-14 22:04:44 -06:00
nonamethanks
33db1a2761 Skeb: fix several issues with the strategy
* Fix fetching of videos
* Fix fetching of original commentary
* Fix images being returned out of order in bookmarklet
2022-01-14 21:24:48 +01:00
evazion
17fb34922b nijie: fix failure to fetch source data due to change in login system.
Nijie changed their login system so that now there are two cookies that
need to be remembered: NIJIEIJIEID, and nijie_tok.
2022-01-11 15:14:54 -06:00
evazion
0ba6dc9ee5 Fix #4945: Search for an artist by URL throws an exception. 2021-12-18 01:55:29 -06:00
evazion
a7dc05ce63 Enable frozen string literals.
Make all string literals immutable by default.
2021-12-14 21:33:27 -06:00
evazion
c94fdef3b2 foundation: fix fetching artist commentary.
The markup for the description changed from a <div> to a <h2>.
2021-12-08 03:01:54 -06:00
nonamethanks
41f9fde2e0 Fix foundation urls not working with iqdb 2021-11-15 10:10:49 +01:00
nonamethanks
49e232f2ae Foundation: add support for unconventional account names 2021-11-09 13:35:52 +01:00
nonamethanks
6c9b49c194 Foundation: add support for videos 2021-11-05 09:43:49 +01:00
nonamethanks
060223c9e2 Add Plurk support 2021-11-01 16:21:27 +01:00