Commit Graph

356 Commits

Author SHA1 Message Date
evazion
452ce8d165 artstation: add partial support for video clips (#5063).
Add partial support for fetching videos from ArtStation posts that
contain videos. Most of this code is disabled for now because actually
downloading these videos requires bypassing a Cloudflare captcha.
2022-03-21 16:51:42 -05:00
evazion
7394660ba9 posts: fix exception when post has source like 'https://www.twitter.com/username'.
`twitter.com` sources worked but `www.twitter.com` didn't.

Also match the URL by class instead of by site name to ensure we match
the expected class.
2022-03-20 21:08:05 -05:00
evazion
7f58cfbe5e tinami: get the full image.
Support grabbing the full image for Tinami uploads, rather than the sample.

Getting the full image requires making a request like this:

    curl -X POST \
    -H 'Referer: https://www.tinami.com/' \
    -H 'Content-Type: application/x-www-form-urlencoded' \
    -H 'Cookie: Tinami2SESSID=<redacted>;' \
    --data-raw 'action_view_original=true&cont_id=1087268&ethna_csrf=<redacted>' \
    https://www.tinami.com/view/1087268

Then scraping the <img> tag from the resulting HTML page.

If the post has multiple images, then we need to scrape and pass the
`sub_id` of the image too.

Fixes #2818.
2022-03-19 23:22:09 -05:00
evazion
01b683798e sources: add Tinami support. 2022-03-19 00:50:36 -05:00
evazion
a78b6528dc weibo: fix https://m.weibo.cn/detail/1234 urls not finding the artist.
Fix https://m.weibo.cn/detail/4506950043618873 type URLs not finding the
artist because the profile_url method returned nil instead of the actual
profile URL.
2022-03-18 06:01:51 -05:00
evazion
cf8b8207e2 artists: change how artist urls are normalized.
Change how artist URLs are normalized in artist entries. Don't try to secretly
convert image URLs to profile URLs in artist entries. For example, if someone puts a
Pixiv image URL in an artist entry, don't secretly try to fetch the source and
convert it into a profile URL in the `normalized_url` field.

We did this because years ago, it was standard practice to put image URLs in artist
entries. Pixiv image URLs used to contain the artist's username, so we used to put
image URLs in artist entries for artist finding purposes. But Pixiv changed it so
that image URLs no longer contained the username, so we dealt with it by adding a
`normalized_url` column to artist_urls and secretly converting image URLs to profile
URLs in this field. But this is no longer necessary because now we don't normally put
image URLs in artist entries in the first place.

Now the `profile_url` method in `Source::URL` is used to normalize URLs in artist
entries. This lets us parse various profile URL formats and normalize them into a
single canonical form.

This also removes the `normalize_for_artist_finder` method from source strategies.
Instead the `profile_url` method is used for artist finding purposes. So the profile
URL returned by the source strategy needs to be the same as the URL in the artist
entry in order for artist finding to work.
2022-03-13 03:54:17 -05:00
evazion
f2028c14fb Fix #5045: Exception on uploads when SauceNAO is the referrer URL.
Bug: We assumed the referer URL was from the same site as the target
URL. We tried to call methods on the referer only supported by the
target URL.

Fix: Ignore the referer URL when it's from a different site than the
target URL.
2022-03-12 00:04:39 -06:00
evazion
28971fe103 sources: factor out site_name method. 2022-03-11 23:20:53 -06:00
evazion
b4aea72d04 sources: remove preview_urls method from base strategy.
Remove the `preview_urls` method from strategies. The only place this was used was
when doing IQDB searches, to download the thumbnail image from the source instead of
the full image.

This wasn't worth it for a few reasons:

* Thumbnails on other sites are sometimes not the size we want, which could affect
  IQDB results.
* Grabbing thumbnails is complex for some sites. You can't always just rewrite the
  image URL. Sometimes it requires extra API calls, which can be slower than just
  grabbing the full image.
* For videos and animations, thumbnails from other sites don't always match our
  thumbnails. We do smart thumbnail generation to try to avoid blank thumbnails, which
  means we don't always pick the first frame, which could affect IQDB results.

API changes:

* /iqdb_queries?search[file_url] now downloads the URL as is without any modification.
  Before it tried to change thumbnail and sample size image URLs to the full version.

* /iqdb_queries?search[url] now returns an error if the URL is for a HTML page that
  contains multiple images. Before it would grab only the first image and silently
  ignore the rest.
2022-03-11 03:22:23 -06:00
evazion
2f61486ac6 sources: remove image_url method from base strategy.
Remove the `image_url` method from source strategies. This method would
return only the first image if a source had multiple images. The
`image_urls` method should be used instead. Tests were the main place
that still used `image_url` instead of `image_urls`.

Also make post replacements return an error if replacing with a source
that contains multiple images, instead of just blindly replacing the
post with the first image in the source.
2022-03-11 01:59:21 -06:00
evazion
5016d9ad26 Merge pull request #5043 from nonamethanks/fantia-support
Add Fantia support
2022-03-10 15:21:03 -06:00
evazion
8bba1b0a54 weibo: add test for m.weibo.cn/detail urls. 2022-03-10 15:04:02 -06:00
nonamethanks
a6549bc6fe Add Fantia support
Also fixes a regression in 74fdeef10c
that stopped mastodon urls from being given the right priority.
2022-03-10 17:43:32 +01:00
evazion
43a665a66d sources: factor out Source::URL::NicoSeiga. 2022-03-10 04:53:51 -06:00
evazion
6afb2f8e3c Merge pull request #5037 from nonamethanks/tumblr-refactor
sources: factor out Source::URL::Tumblr
2022-03-08 23:26:30 -06:00
evazion
987f2985d3 Merge pull request #5040 from nonamethanks/fix-weibo-404
Weibo: fix exception for deleted url
2022-03-08 23:08:37 -06:00
nonamethanks
c9be77d1f8 Weibo: fix exception for deleted url 2022-03-09 05:31:38 +01:00
evazion
1c620f8055 sources: factor out Source::URL::Pixiv.
* Drop support for preview_urls. This means that IQDB lookups may be
  slower, especially for ugoiras, since we have to download the full
  ugoira now. However, ugoira lookups should produce better results,
  since the ugoira thumbnail chosen by Pixiv wasn't necessarily the same
  as the thumbnail chosen by Danbooru.

* Drop support for uploading single manga pages:

    http://www.pixiv.net/member_illust.php?mode=manga_big&illust_id=18557054&page=2

  Previously uploading an URL like this would only upload a single image
  out of a multi-image work. Now it will upload all images in the work.
  Pixiv no longer supports URLs like this, so we don't either.

* Add support for parsing URLs like this:

    https://i.pximg.net/c/360x360_70/custom-thumb/img/2022/03/08/00/00/56/96755248_p0_custom1200.jpg

  Apparently artists can choose a custom thumbnail now (not like anyone
  will try to upload one though).
2022-03-08 22:17:38 -06:00
evazion
df0bb70486 sources: factor out Source::URL::PixivSketch.
Add upload support for Pixiv Sketch. Fetch tags, commentary, and artist,
and rewrite sample images to full images.

Authentication isn't required. R18 images are hidden in the browser but
visible in the API.
2022-03-08 18:24:12 -06:00
nonamethanks
b9c7e467e5 sources: factor out Source::URL::Tumblr
Also adds support for fetching source data from direct image urls when
possible.
2022-03-08 15:06:06 +01:00
nonamethanks
d8e2f2ee33 sources: factor out Source::URL::Weibo
Additionally, fixed some broken tests and changed normalization for urls
of album type to point to the mobile version instead, because they're
only visible to logged-in users.
2022-03-07 16:52:43 +01:00
evazion
1609059bf4 sources: factor out Source::URL::Fanbox.
Also fix it so that we grab the full image for cover URLs like this:

* Sample: https://pixiv.pximg.net/c/1620x580_90_a2_g5/fanbox/public/images/creator/1566167/cover/QqxYtuWdy4XWQx1ZLIqr4wvA.jpeg
* Full: https://pixiv.pximg.net/fanbox/public/images/creator/1566167/cover/QqxYtuWdy4XWQx1ZLIqr4wvA.jpeg
2022-02-28 06:25:06 -06:00
evazion
317ec886bc sources: factor out Source::URL::Nijie.
Also fixes the uploader uploading all images when trying to upload only a
single image in a multi-image work. Caused by `image_urls` incorrectly
returning all images when the source strategy was given a url for a
single image.
2022-02-27 02:27:35 -06:00
evazion
9169f00e80 sources: factor out Source::URL::Moebooru. 2022-02-26 17:46:44 -06:00
evazion
f062f2d145 sources: factor out Source::URL::Newgrounds.
Also fix it so that the image URL is set as the source for Newgrounds
posts, not the page URL. It's possible to generate the page URL from the
image URL (except for images after the first in multi-image posts).

* Page: https://www.newgrounds.com/art/view/natthelich/weaver
* Image: https://art.ngfiles.com/images/1520000/1520217_natthelich_weaver.jpg?f1606365031
2022-02-25 23:04:03 -06:00
evazion
e6ded89f85 sources: factor out Source::URL::Plurk.
Also fix it so that for adult works, we get the images posted by the
artist in the replies. Example: https://www.plurk.com/p/omc64y (nsfw).
2022-02-25 02:06:57 -06:00
evazion
ffe52f5ead sources: factor out Source::URL::Foundation.
Add support for a couple more URL types:

* https://foundation.app/@asuka111art/dinner-with-cats-82426
* https://f8n-production-collection-assets.imgix.net/0x3B3ee1931Dc30C1957379FAc9aba94D1C48a5405/128711/QmcBfbeCMSxqYB3L1owPAxFencFx3jLzCPFx6xUBxgSCkH/nft.png

Also include these URLs in the list of profile URLs:

* https://foundation.app/0x7E2ef75C0C09b2fc6BCd1C68B6D409720CcD58d2 (for https://foundation.app/@mochiiimo)

These URLs should be stable even if the user changes their name.
2022-02-23 23:49:31 -06:00
evazion
043c08eb05 sources: factor out Source::URL::TwitPic. 2022-02-23 23:49:31 -06:00
evazion
112b323f01 foundation: fix exception when uploading new Foundation url format.
Fix 'null value in column "source_url"' exception when uploading urls like this:

* https://foundation.app/@KILLERGF/kgfgen/4
* https://foundation.app/@mochiiimo/foundation/97376
2022-02-22 13:29:28 -06:00
evazion
9a5a04d74e nijie: fix uploads not working for new image URL format.
Fix uploads not working for image URLs like this:

    https://pic.nijie.net/07/nijie/17/95/728995/illust/0_0_403fdd541191110c_c25585.jpg
2022-02-15 20:45:28 -06:00
evazion
fefa6036fb tests: fix broken upload tests.
* Fix broken Skeb test caused by 404'd image.
* Fix broken Sta.sh tests caused by DeviantArt URL changes.
* Fix broken Nijie tests caused by Nijie URL changes.
2022-02-15 20:33:52 -06:00
evazion
b6538fde38 uploads: fix NicoSeiga sources not working.
Fix uploads for NicoSeiga sources not working because the strategy
returned URLs like the one below in the list of image_urls, which
require a login to download:

    https://seiga.nicovideo.jp/image/source/10315315

Also fix certain URLs like https://dic.nicovideo.jp/oekaki/52833.png not
working, because they didn't contain an image ID and the image_urls
method returned an empty list in this case.
2022-02-15 17:12:02 -06:00
evazion
26da728a07 deviant art: fix new image URLs not being recognized.
Partial fix for #5008. DeviantArt now returns https://wixmp-ed30a86b8c4ca887773594c2.wixmp.com
URLs instead of https://images-wixmp-ed30a86b8c4ca887773594c2.wixmp.com for images in the
API. Fix these URLs not being recognized by the DeviantArt strategy.
2022-02-14 00:33:50 -06:00
nonamethanks
1c9014a5bb Fix lofter not working with iqdb 2022-02-05 09:43:17 +01:00
evazion
2bb5ad78fb tests: fix broken tests.
* Fix a bug where creating posts failed if IQDB wasn't configured.
* Fix broken Skeb test caused by changed URL.
* Fix broken IP geolocation tests caused by API returning different data.
* Fix broken post regeneration tests.
2022-01-31 14:17:14 -06:00
evazion
abdab7a0a8 uploads: rework upload process.
Rework the upload process so that files are saved to Danbooru first
before the user starts tagging the upload.

The main user-visible change is that you have to select the file first
before you can start tagging it. Saving the file first lets us fix a
number of problems:

* We can check for dupes before the user tags the upload.
* We can perform dupe checks and show preview images for users not using the bookmarklet.
* We can show preview images without having to proxy images through Danbooru.
* We can show previews of videos and ugoira files.
* We can reliably show the filesize and resolution of the image.
* We can let the user save files to upload later.
* We can get rid of a lot of spaghetti code related to preprocessing
  uploads. This was the cause of most weird "md5 confirmation doesn't
  match md5" errors.

(Not all of these are implemented yet.)

Internally, uploading is now a two-step process: first we create an upload
object, then we create a post from the upload. This is how it works:

* The user goes to /uploads/new and chooses a file or pastes an URL into
  the file upload component.
* The file upload component calls `POST /uploads` to create an upload.
* `POST /uploads` immediately returns a new upload object in the `pending` state.
* Danbooru starts processing the upload in a background job (downloading,
  resizing, and transferring the image to the image servers).
* The file upload component polls `/uploads/$id.json`, checking the
  upload `status` until it returns `completed` or `error`.
* When the upload status is `completed`, the user is redirected to /uploads/$id.
* On the /uploads/$id page, the user can tag the upload and submit it.
* The upload form calls `POST /posts` to create a new post from the upload.
* The user is redirected to the new post.

This is the data model:

* An upload represents a set of files uploaded to Danbooru by a user.
  Uploaded files don't have to belong to a post. An upload has an
  uploader, a status (pending, processing, completed, or error), a
  source (unless uploading from a file), and a list of media assets
  (image or video files).

* There is a has-and-belongs-to-many relationship between uploads and
  media assets. An upload can have many media assets, and a media asset
  can belong to multiple uploads. Uploads are joined to media assets
  through a upload_media_assets table.

  An upload could potentially have multiple media assets if it's a Pixiv
  or Twitter gallery. This is not yet implemented (at the moment all
  uploads have one media asset).

  A media asset can belong to multiple uploads if multiple people try
  to upload the same file, or if the same user tries to upload the same
  file more than once.

New features:

* On the upload page, you can press Ctrl+V to paste an URL and immediately upload it.
* You can save files for upload later. Your saved files are at /uploads.

Fixes:

* Improved error messages when uploading invalid files, bad URLs, and
  when forgetting the rating.
2022-01-28 04:13:22 -06:00
evazion
00ebd2e13c Merge pull request #4956 from nonamethanks/fix-skeb
Skeb: fix several issues with the strategy
2022-01-14 22:04:44 -06:00
nonamethanks
33db1a2761 Skeb: fix several issues with the strategy
* Fix fetching of videos
* Fix fetching of original commentary
* Fix images being returned out of order in bookmarklet
2022-01-14 21:24:48 +01:00
evazion
17fb34922b nijie: fix failure to fetch source data due to change in login system.
Nijie changed their login system so that now there are two cookies that
need to be remembered: NIJIEIJIEID, and nijie_tok.
2022-01-11 15:14:54 -06:00
evazion
450594b803 tests: fix broken tests. 2022-01-07 14:44:24 -06:00
evazion
edd0656b73 tests: fix broken tests. 2022-01-06 00:41:18 -06:00
evazion
0ba6dc9ee5 Fix #4945: Search for an artist by URL throws an exception. 2021-12-18 01:55:29 -06:00
evazion
e57e38b35f tests: fix broken tests. 2021-12-08 03:01:54 -06:00
evazion
594b46a85d tests: fix broken tests. 2021-11-23 23:18:54 -06:00
evazion
b561ca49f2 foundation: fix mojibake in artist commentaries.
Fix certain artist commentaries for foundation.app containing scrambled
characters. Apparently caused by the Nokogiri HTML5 parser not handling
UTF-8 input correctly when the encoding isn't explicitly set to UTF-8.
2021-11-15 04:55:48 -06:00
nonamethanks
49e232f2ae Foundation: add support for unconventional account names 2021-11-09 13:35:52 +01:00
nonamethanks
6c9b49c194 Foundation: add support for videos 2021-11-05 09:43:49 +01:00
evazion
dccc2edb75 tests: fix broken tests.
* Fix a Twitter test broken by a privated tweet.
* Fix an IP geolocation test broken by the ipregistry.co API returning new data.
2021-11-02 04:42:07 -05:00
nonamethanks
060223c9e2 Add Plurk support 2021-11-01 16:21:27 +01:00
evazion
5177a28f2c Merge pull request #4910 from nonamethanks/feat-foundation
Add Foundation support
2021-11-01 05:07:44 -05:00