Commit Graph

543 Commits

Author SHA1 Message Date
evazion
1609059bf4 sources: factor out Source::URL::Fanbox.
Also fix it so that we grab the full image for cover URLs like this:

* Sample: https://pixiv.pximg.net/c/1620x580_90_a2_g5/fanbox/public/images/creator/1566167/cover/QqxYtuWdy4XWQx1ZLIqr4wvA.jpeg
* Full: https://pixiv.pximg.net/fanbox/public/images/creator/1566167/cover/QqxYtuWdy4XWQx1ZLIqr4wvA.jpeg
2022-02-28 06:25:06 -06:00
evazion
317ec886bc sources: factor out Source::URL::Nijie.
Also fixes the uploader uploading all images when trying to upload only a
single image in a multi-image work. Caused by `image_urls` incorrectly
returning all images when the source strategy was given a url for a
single image.
2022-02-27 02:27:35 -06:00
evazion
fcf517834d sources: factor out Source::URL::ArtStation. 2022-02-26 21:03:49 -06:00
evazion
9169f00e80 sources: factor out Source::URL::Moebooru. 2022-02-26 17:46:44 -06:00
evazion
74fdeef10c sources: factor out Source::URL::Mastodon. 2022-02-26 15:08:27 -06:00
evazion
86d8e2d13d sources: factor out Source::URL::Lofter. 2022-02-25 23:43:10 -06:00
evazion
f062f2d145 sources: factor out Source::URL::Newgrounds.
Also fix it so that the image URL is set as the source for Newgrounds
posts, not the page URL. It's possible to generate the page URL from the
image URL (except for images after the first in multi-image posts).

* Page: https://www.newgrounds.com/art/view/natthelich/weaver
* Image: https://art.ngfiles.com/images/1520000/1520217_natthelich_weaver.jpg?f1606365031
2022-02-25 23:04:03 -06:00
evazion
64472a7b7e sources: factor out Source::URL::HentaiFoundry.
Add support for these URL types:

* http://pictures.hentai-foundry.com//s/soranamae/363663.jpg
* http://www.hentai-foundry.com/piccies/d/dmitrys/1183.jpg
* http://www.hentai-foundry.com/pic-149160.php
* http://www.hentai-foundry.com/user-RockCandy.php
* http://www.hentai-foundry.com/profile-sawao.php

These URL types are obsolete, but still present in some old posts.
2022-02-25 22:01:17 -06:00
evazion
e6ded89f85 sources: factor out Source::URL::Plurk.
Also fix it so that for adult works, we get the images posted by the
artist in the replies. Example: https://www.plurk.com/p/omc64y (nsfw).
2022-02-25 02:06:57 -06:00
evazion
26f4cf1ebd sources: factor out Source::URL::Skeb. 2022-02-25 02:06:57 -06:00
evazion
ffe52f5ead sources: factor out Source::URL::Foundation.
Add support for a couple more URL types:

* https://foundation.app/@asuka111art/dinner-with-cats-82426
* https://f8n-production-collection-assets.imgix.net/0x3B3ee1931Dc30C1957379FAc9aba94D1C48a5405/128711/QmcBfbeCMSxqYB3L1owPAxFencFx3jLzCPFx6xUBxgSCkH/nft.png

Also include these URLs in the list of profile URLs:

* https://foundation.app/0x7E2ef75C0C09b2fc6BCd1C68B6D409720CcD58d2 (for https://foundation.app/@mochiiimo)

These URLs should be stable even if the user changes their name.
2022-02-23 23:49:31 -06:00
evazion
043c08eb05 sources: factor out Source::URL::TwitPic. 2022-02-23 23:49:31 -06:00
evazion
7ed8f95a8e sources: add Source::URL class; factor out Source::URL::Twitter.
Introduce a Source::URL class for parsing URLs from source sites. Refactor the Twitter
source strategy to use it.

This is the first step towards factoring all the URL parsing logic out of source
strategies and moving it to subclasses of Source::URL. Each site will have a subclass
of Source::URL dedicated to parsing URLs from that site. Source strategies will use
these classes to extract information from URLs.

This is to simplify source strategies. Most sites have many different URL formats we have
to parse or rewrite, and handling all these different cases tends to make source
strategies very complex. Isolating the URL parsing logic from the site scraping logic
should make source strategies easier to maintain.
2022-02-23 23:46:04 -06:00
evazion
112b323f01 foundation: fix exception when uploading new Foundation url format.
Fix 'null value in column "source_url"' exception when uploading urls like this:

* https://foundation.app/@KILLERGF/kgfgen/4
* https://foundation.app/@mochiiimo/foundation/97376
2022-02-22 13:29:28 -06:00
evazion
7b009cc893 nicoseiga: fix inability to login to nicoseiga.
NicoSeiga changed it so that on every login, you must enter a 2FA code
sent by email. This broke the NicoSeiga strategy. The fix is to just use
a static session cookie instead (and hope it doesn't expire, and isn't
tied to an IP).

The `nico_seiga_login` and `nico_seiga_password` config settings have
been removed from config/danbooru_default_config.rb and replaced by
`nico_seiga_user_session`. If you run your own Danbooru instance, you
will have to update your config file manually.
2022-02-22 12:23:01 -06:00
evazion
7d49ab6130 Add Danbooru::URL class.
Introduce a Danbooru::URL class for dealing with URLs. This is a wrapper
around Addressable::URI that adds some additional helper methods. Most
significantly, the `parse` method only allows valid http/https URLs, and
it returns nil instead of raising an exception when the URL is invalid.
2022-02-22 00:17:53 -06:00
evazion
68ba447494 uploads: remove batch upload page.
* Make /uploads/batch redirect to /uploads/new.
* Remove /uploads/image_proxy.
2022-02-21 00:03:43 -06:00
evazion
9a5a04d74e nijie: fix uploads not working for new image URL format.
Fix uploads not working for image URLs like this:

    https://pic.nijie.net/07/nijie/17/95/728995/illust/0_0_403fdd541191110c_c25585.jpg
2022-02-15 20:45:28 -06:00
evazion
7cfbd891ae pixiv: avoid unnecessary API call when uploading Pixiv posts.
Do one less API call when fetching the image URLs for a Pixiv post. The
`is_ugoira?` check in `image_urls` caused us to do an extra API call
when fetching the image URLs for a non-ugoira post.

API calls to Pixiv take around ~800ms, so this reduces minimum upload
time for Pixiv posts from ~1.6 seconds (two calls) to ~0.8 seconds.
2022-02-15 18:55:12 -06:00
evazion
e4d7453180 uploads: improve error messages.
Improve upload error messages when downloading an URL fails, or it isn't
an image or video file.
2022-02-15 18:54:55 -06:00
evazion
b6538fde38 uploads: fix NicoSeiga sources not working.
Fix uploads for NicoSeiga sources not working because the strategy
returned URLs like the one below in the list of image_urls, which
require a login to download:

    https://seiga.nicovideo.jp/image/source/10315315

Also fix certain URLs like https://dic.nicovideo.jp/oekaki/52833.png not
working, because they didn't contain an image ID and the image_urls
method returned an empty list in this case.
2022-02-15 17:12:02 -06:00
evazion
37075988ce uploads: fix page_url for null strategy.
Fix the null source strategy setting the page URL. The page URL is
expected to be nil when we can't determine the page containing the image URL.

Fixes the upload_media_assets.page_url field being filled for uploads
from unknown sites.
2022-02-15 00:59:22 -06:00
evazion
27d71f2727 uploads: raise download timeout.
Raise the timeout for downloading files from the source to 60 seconds globally.

Previously had a lower timeout because uploads were processed in the
foreground when not using the bookmarklet, and we didn't want to tie up
Puma worker processes with slow downloads. Now that all uploads are
processed in the background, we can have a higher timeout.
2022-02-15 00:56:51 -06:00
evazion
26da728a07 deviant art: fix new image URLs not being recognized.
Partial fix for #5008. DeviantArt now returns https://wixmp-ed30a86b8c4ca887773594c2.wixmp.com
URLs instead of https://images-wixmp-ed30a86b8c4ca887773594c2.wixmp.com for images in the
API. Fix these URLs not being recognized by the DeviantArt strategy.
2022-02-14 00:33:50 -06:00
evazion
d6f7725a1e nijie: fix exception in login process.
Fix an exception when we can't find the 'url' field in the login form
because we're rate limited by Nijie and couldn't scrape the login page.
2022-02-12 17:26:25 -06:00
nonamethanks
1c9014a5bb Fix lofter not working with iqdb 2022-02-05 09:43:17 +01:00
evazion
abdab7a0a8 uploads: rework upload process.
Rework the upload process so that files are saved to Danbooru first
before the user starts tagging the upload.

The main user-visible change is that you have to select the file first
before you can start tagging it. Saving the file first lets us fix a
number of problems:

* We can check for dupes before the user tags the upload.
* We can perform dupe checks and show preview images for users not using the bookmarklet.
* We can show preview images without having to proxy images through Danbooru.
* We can show previews of videos and ugoira files.
* We can reliably show the filesize and resolution of the image.
* We can let the user save files to upload later.
* We can get rid of a lot of spaghetti code related to preprocessing
  uploads. This was the cause of most weird "md5 confirmation doesn't
  match md5" errors.

(Not all of these are implemented yet.)

Internally, uploading is now a two-step process: first we create an upload
object, then we create a post from the upload. This is how it works:

* The user goes to /uploads/new and chooses a file or pastes an URL into
  the file upload component.
* The file upload component calls `POST /uploads` to create an upload.
* `POST /uploads` immediately returns a new upload object in the `pending` state.
* Danbooru starts processing the upload in a background job (downloading,
  resizing, and transferring the image to the image servers).
* The file upload component polls `/uploads/$id.json`, checking the
  upload `status` until it returns `completed` or `error`.
* When the upload status is `completed`, the user is redirected to /uploads/$id.
* On the /uploads/$id page, the user can tag the upload and submit it.
* The upload form calls `POST /posts` to create a new post from the upload.
* The user is redirected to the new post.

This is the data model:

* An upload represents a set of files uploaded to Danbooru by a user.
  Uploaded files don't have to belong to a post. An upload has an
  uploader, a status (pending, processing, completed, or error), a
  source (unless uploading from a file), and a list of media assets
  (image or video files).

* There is a has-and-belongs-to-many relationship between uploads and
  media assets. An upload can have many media assets, and a media asset
  can belong to multiple uploads. Uploads are joined to media assets
  through a upload_media_assets table.

  An upload could potentially have multiple media assets if it's a Pixiv
  or Twitter gallery. This is not yet implemented (at the moment all
  uploads have one media asset).

  A media asset can belong to multiple uploads if multiple people try
  to upload the same file, or if the same user tries to upload the same
  file more than once.

New features:

* On the upload page, you can press Ctrl+V to paste an URL and immediately upload it.
* You can save files for upload later. Your saved files are at /uploads.

Fixes:

* Improved error messages when uploading invalid files, bad URLs, and
  when forgetting the rating.
2022-01-28 04:13:22 -06:00
evazion
00ebd2e13c Merge pull request #4956 from nonamethanks/fix-skeb
Skeb: fix several issues with the strategy
2022-01-14 22:04:44 -06:00
nonamethanks
33db1a2761 Skeb: fix several issues with the strategy
* Fix fetching of videos
* Fix fetching of original commentary
* Fix images being returned out of order in bookmarklet
2022-01-14 21:24:48 +01:00
evazion
17fb34922b nijie: fix failure to fetch source data due to change in login system.
Nijie changed their login system so that now there are two cookies that
need to be remembered: NIJIEIJIEID, and nijie_tok.
2022-01-11 15:14:54 -06:00
evazion
0ba6dc9ee5 Fix #4945: Search for an artist by URL throws an exception. 2021-12-18 01:55:29 -06:00
evazion
a7dc05ce63 Enable frozen string literals.
Make all string literals immutable by default.
2021-12-14 21:33:27 -06:00
evazion
c94fdef3b2 foundation: fix fetching artist commentary.
The markup for the description changed from a <div> to a <h2>.
2021-12-08 03:01:54 -06:00
nonamethanks
41f9fde2e0 Fix foundation urls not working with iqdb 2021-11-15 10:10:49 +01:00
nonamethanks
49e232f2ae Foundation: add support for unconventional account names 2021-11-09 13:35:52 +01:00
nonamethanks
6c9b49c194 Foundation: add support for videos 2021-11-05 09:43:49 +01:00
nonamethanks
060223c9e2 Add Plurk support 2021-11-01 16:21:27 +01:00
evazion
5177a28f2c Merge pull request #4910 from nonamethanks/feat-foundation
Add Foundation support
2021-11-01 05:07:44 -05:00
nonamethanks
043f2fb124 Add Foundation support 2021-11-01 01:39:56 +01:00
evazion
9ff4d94382 Merge pull request #4909 from nonamethanks/add-lofter-theme
Lofter: Add support for additional theme
2021-10-31 05:13:04 -05:00
nonamethanks
5946544f71 Lofter: Add support for additional theme 2021-10-30 17:22:45 +02:00
evazion
bc506ed1b8 uploads: refactor to simplify ugoira-handling and replacements:
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
  the MediaAsset is created, not when the post is created. This way it's
  possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
  with the ugoira frame data already attached to it, instead of returning it
  in the `data` field then passing it around separately in the `context`
  field of the upload.
2021-10-18 05:18:46 -05:00
nonamethanks
45313c56a6 Lofter: fix tag extraction 2021-10-04 14:21:07 +02:00
evazion
738be825ff twitter: include artist name in source URLs on post pages.
Show Twitter sources on post pages like this:

    https://twitter.com/BOW999/status/1261877313349640194

and not like this:

    https://twitter.com/i/web/status/1261877313349640194

We originally removed the artist name because the link would be broken
when the artist changed their name. This is no longer the case.
2021-09-27 11:07:25 -05:00
evazion
34de3b4d18 Merge pull request #4879 from nonamethanks/fix-artist-name
Sources: fix artist_name not being caught in skeb and weibo
2021-09-14 05:39:06 -05:00
nonamethanks
a845477cba Sources: fix artist_name not being caught in skeb and weibo 2021-09-14 11:32:24 +02:00
nonamethanks
9a6a6e52ea Lofter: raise timeout for file download 2021-09-10 13:10:29 +02:00
evazion
38c9559fe8 nokogiri: switch to the nokogumbo-based html5 parser.
https://github.com/sparklemotion/nokogiri/blob/main/CHANGELOG.md#1120--2021-08-02
2021-08-30 21:21:27 -05:00
evazion
bb7f24d279 Add HTTP proxy support.
Add support for using a proxy for HTTP requests. Only used for external
requests, such as downloading files or talking to source sites such as
Pixiv or Twitter, not for internal requests, such as talking to IQDB or
Reportbooru.
2021-08-28 04:53:33 -05:00
nonamethanks
f60fce614b Fix lofter strategy due to changes in their image urls 2021-08-15 02:16:57 -05:00