Commit Graph

25 Commits

Author SHA1 Message Date
nonamethanks
79a9081efa Moebooru: rewrite tests 2022-10-08 16:10:39 +02:00
evazion
23b8350320 sources: add image_url?, page_url?, and profile_url? methods.
Add methods to Source::URL for determining whether a URL is an image
URL, a page URL, or a profile URL.

Also add more source URL tests and fix various URL parsing bugs.
2022-05-01 21:01:36 -05:00
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
4ef8178bd1 sources: remove canonical_url method.
Refactor source strategies to remove the `canonical_url` method.

`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the image URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the page URL is used as the source.

This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
2022-03-23 23:38:06 -05:00
evazion
3aa5cab2aa sources: refactor normalize_for_source.
`normalize_for_source` was used to convert image URLs to page URLs when displaying sources
on the post show page. Move all the code for converting image URLs to page URLs from
`Sources::Strategies#normalize_for_source` to `Source::URL#page_url`.

Before we had to be very careful in source strategies not to make any network calls in
`normalize_for_source`, since it was used in the view for the post show page. Now all the
code for generating page URLs is isolated in Source::URL, which makes source strategies
simpler. It also makes it easier to check if a source is an image URL or page URL, and if
the image URL is convertible to a page URL, which will make autotagging bad_link or
bad_source feasible.

Finally, this fixes it to generate better page URLs in a handful of cases:

* https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP
* https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6s instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6
* http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png
* https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1
2022-03-23 01:34:04 -05:00
evazion
f2028c14fb Fix #5045: Exception on uploads when SauceNAO is the referrer URL.
Bug: We assumed the referer URL was from the same site as the target
URL. We tried to call methods on the referer only supported by the
target URL.

Fix: Ignore the referer URL when it's from a different site than the
target URL.
2022-03-12 00:04:39 -06:00
evazion
b4aea72d04 sources: remove preview_urls method from base strategy.
Remove the `preview_urls` method from strategies. The only place this was used was
when doing IQDB searches, to download the thumbnail image from the source instead of
the full image.

This wasn't worth it for a few reasons:

* Thumbnails on other sites are sometimes not the size we want, which could affect
  IQDB results.
* Grabbing thumbnails is complex for some sites. You can't always just rewrite the
  image URL. Sometimes it requires extra API calls, which can be slower than just
  grabbing the full image.
* For videos and animations, thumbnails from other sites don't always match our
  thumbnails. We do smart thumbnail generation to try to avoid blank thumbnails, which
  means we don't always pick the first frame, which could affect IQDB results.

API changes:

* /iqdb_queries?search[file_url] now downloads the URL as is without any modification.
  Before it tried to change thumbnail and sample size image URLs to the full version.

* /iqdb_queries?search[url] now returns an error if the URL is for a HTML page that
  contains multiple images. Before it would grab only the first image and silently
  ignore the rest.
2022-03-11 03:22:23 -06:00
evazion
2f61486ac6 sources: remove image_url method from base strategy.
Remove the `image_url` method from source strategies. This method would
return only the first image if a source had multiple images. The
`image_urls` method should be used instead. Tests were the main place
that still used `image_url` instead of `image_urls`.

Also make post replacements return an error if replacing with a source
that contains multiple images, instead of just blindly replacing the
post with the first image in the source.
2022-03-11 01:59:21 -06:00
evazion
9169f00e80 sources: factor out Source::URL::Moebooru. 2022-02-26 17:46:44 -06:00
evazion
abdab7a0a8 uploads: rework upload process.
Rework the upload process so that files are saved to Danbooru first
before the user starts tagging the upload.

The main user-visible change is that you have to select the file first
before you can start tagging it. Saving the file first lets us fix a
number of problems:

* We can check for dupes before the user tags the upload.
* We can perform dupe checks and show preview images for users not using the bookmarklet.
* We can show preview images without having to proxy images through Danbooru.
* We can show previews of videos and ugoira files.
* We can reliably show the filesize and resolution of the image.
* We can let the user save files to upload later.
* We can get rid of a lot of spaghetti code related to preprocessing
  uploads. This was the cause of most weird "md5 confirmation doesn't
  match md5" errors.

(Not all of these are implemented yet.)

Internally, uploading is now a two-step process: first we create an upload
object, then we create a post from the upload. This is how it works:

* The user goes to /uploads/new and chooses a file or pastes an URL into
  the file upload component.
* The file upload component calls `POST /uploads` to create an upload.
* `POST /uploads` immediately returns a new upload object in the `pending` state.
* Danbooru starts processing the upload in a background job (downloading,
  resizing, and transferring the image to the image servers).
* The file upload component polls `/uploads/$id.json`, checking the
  upload `status` until it returns `completed` or `error`.
* When the upload status is `completed`, the user is redirected to /uploads/$id.
* On the /uploads/$id page, the user can tag the upload and submit it.
* The upload form calls `POST /posts` to create a new post from the upload.
* The user is redirected to the new post.

This is the data model:

* An upload represents a set of files uploaded to Danbooru by a user.
  Uploaded files don't have to belong to a post. An upload has an
  uploader, a status (pending, processing, completed, or error), a
  source (unless uploading from a file), and a list of media assets
  (image or video files).

* There is a has-and-belongs-to-many relationship between uploads and
  media assets. An upload can have many media assets, and a media asset
  can belong to multiple uploads. Uploads are joined to media assets
  through a upload_media_assets table.

  An upload could potentially have multiple media assets if it's a Pixiv
  or Twitter gallery. This is not yet implemented (at the moment all
  uploads have one media asset).

  A media asset can belong to multiple uploads if multiple people try
  to upload the same file, or if the same user tries to upload the same
  file more than once.

New features:

* On the upload page, you can press Ctrl+V to paste an URL and immediately upload it.
* You can save files for upload later. Your saved files are at /uploads.

Fixes:

* Improved error messages when uploading invalid files, bad URLs, and
  when forgetting the rating.
2022-01-28 04:13:22 -06:00
evazion
4074cc99f9 uploads: fix incorrect remote sizes on pixiv uploads.
Bug: the uploads page showed a remote size of 146 bytes for Pixiv uploads.

Cause: we didn't spoof the Referer header when making the HEAD request
for the image, causing Pixiv to return a 403 error.

Also fix the case where the Content-Length header is absent.
2020-06-24 03:02:45 -05:00
evazion
8eac82a971 pixiv: fix regression with new user profile urls.
* Update tests to use new Pixiv profile urls.
* Fix issue with artist finder not working when given direct image or
  html page urls.
2020-06-24 02:41:11 -05:00
evazion
9ca848d732 tests: fix more ruby 2.7 deprecation warnings. 2020-05-29 15:36:21 -05:00
nonamethanks
307df3b3e4 Refactor source normalization
* Move the source normalization logic out of the post model
  and into individual sources' strategies.
* Rewrite normalization tests to be handled into each source's test,
  and expand them significantly. Previously we were only testing
  a very small subset of domains and variants.
* Fix up normalization for several sites.
* Normalize fav.me urls into normal deviantart urls.
2020-05-21 22:46:51 +02:00
evazion
a15bbe4264 moebooru: fix preview_urls to fallback to image_urls.
Fix the moebooru strategy to fallback to returning the image url if we
can't find the preview url. Fixes iqdb lookups failing in some cases
because the strategy didn't return a valid url for preview_url.
2019-10-26 13:51:58 -05:00
evazion
c700ea4b5f Fix #4016: Translated tags failing to find some tags.
* Normalize spaces to underscores when saving other names. Preserve case
  since case can be significant.

* Fix WikiPage#other_names_include to search case-insensitively (note:
  this prevents using the index).

* Fix sources to return the raw tags in `#tags` and the normalized tags
  in `#normalized_tags`. The normalized tags are the tags that will be
  matched against other names.
2018-12-16 11:37:57 -06:00
evazion
c8d538f618 moebooru: delegate to substrategy based on post source (#3911).
If the yande.re or konachan.com post has a source from a supported site,
for example Pixiv or Twitter, then delegate the artist and commentary
lookup to that substrategy.

Only do this for sources from recognized sites, not the null strategy.
2018-10-06 14:27:49 -05:00
evazion
e5a4193dd4 moebooru: support batch bookmarklet previews (#3911). 2018-10-06 00:58:22 -05:00
evazion
fdb6e4ecee moebooru: rewrite konachan urls for Post#normalized_source (#3911). 2018-10-06 00:58:22 -05:00
evazion
864349dc7b moebooru: fetch tags (#3911). 2018-10-06 00:58:22 -05:00
evazion
bd3fb7d70e Post#normalized_source: fix for yande.re urls.
Fix regex for yande.re urls like this:

    https://files.yande.re/image/b66909b940e8d77accab7c9b25aa4dc3/yande.re%20377828.png
2018-10-01 20:03:21 -05:00
evazion
958a9f505b moebooru: rewrite sample urls + support bookmarklet on html page.
* Fixes #2942: Add Moebooru Rewrite for Sample Images.
* Addresses #3911: Improve Moebooru support.
2018-09-19 23:32:21 -05:00
evazion
4a99cb098f moebooru: use the image url as the canonical url. 2018-09-16 21:00:11 -05:00
evazion
d9ce953752 Fix #3906: Moebooru strategy raises NotImplementedError. 2018-09-16 21:00:11 -05:00
evazion
cae78fa8ee moebooru: move tests from unit/downloads to unit/sources. 2018-09-16 21:00:11 -05:00