Commit Graph

46 Commits

Author SHA1 Message Date
evazion
e3af738371 tests: fix broken tests. 2022-08-24 02:03:37 -05:00
evazion
23b8350320 sources: add image_url?, page_url?, and profile_url? methods.
Add methods to Source::URL for determining whether a URL is an image
URL, a page URL, or a profile URL.

Also add more source URL tests and fix various URL parsing bugs.
2022-05-01 21:01:36 -05:00
Lily
9dde90ef94 fix broken assertion in nijie test 2022-04-09 12:32:45 -03:00
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
4ef8178bd1 sources: remove canonical_url method.
Refactor source strategies to remove the `canonical_url` method.

`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the image URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the page URL is used as the source.

This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
2022-03-23 23:38:06 -05:00
evazion
3aa5cab2aa sources: refactor normalize_for_source.
`normalize_for_source` was used to convert image URLs to page URLs when displaying sources
on the post show page. Move all the code for converting image URLs to page URLs from
`Sources::Strategies#normalize_for_source` to `Source::URL#page_url`.

Before we had to be very careful in source strategies not to make any network calls in
`normalize_for_source`, since it was used in the view for the post show page. Now all the
code for generating page URLs is isolated in Source::URL, which makes source strategies
simpler. It also makes it easier to check if a source is an image URL or page URL, and if
the image URL is convertible to a page URL, which will make autotagging bad_link or
bad_source feasible.

Finally, this fixes it to generate better page URLs in a handful of cases:

* https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP
* https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6s instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6
* http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png
* https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1
2022-03-23 01:34:04 -05:00
evazion
b4aea72d04 sources: remove preview_urls method from base strategy.
Remove the `preview_urls` method from strategies. The only place this was used was
when doing IQDB searches, to download the thumbnail image from the source instead of
the full image.

This wasn't worth it for a few reasons:

* Thumbnails on other sites are sometimes not the size we want, which could affect
  IQDB results.
* Grabbing thumbnails is complex for some sites. You can't always just rewrite the
  image URL. Sometimes it requires extra API calls, which can be slower than just
  grabbing the full image.
* For videos and animations, thumbnails from other sites don't always match our
  thumbnails. We do smart thumbnail generation to try to avoid blank thumbnails, which
  means we don't always pick the first frame, which could affect IQDB results.

API changes:

* /iqdb_queries?search[file_url] now downloads the URL as is without any modification.
  Before it tried to change thumbnail and sample size image URLs to the full version.

* /iqdb_queries?search[url] now returns an error if the URL is for a HTML page that
  contains multiple images. Before it would grab only the first image and silently
  ignore the rest.
2022-03-11 03:22:23 -06:00
evazion
2f61486ac6 sources: remove image_url method from base strategy.
Remove the `image_url` method from source strategies. This method would
return only the first image if a source had multiple images. The
`image_urls` method should be used instead. Tests were the main place
that still used `image_url` instead of `image_urls`.

Also make post replacements return an error if replacing with a source
that contains multiple images, instead of just blindly replacing the
post with the first image in the source.
2022-03-11 01:59:21 -06:00
evazion
317ec886bc sources: factor out Source::URL::Nijie.
Also fixes the uploader uploading all images when trying to upload only a
single image in a multi-image work. Caused by `image_urls` incorrectly
returning all images when the source strategy was given a url for a
single image.
2022-02-27 02:27:35 -06:00
evazion
9a5a04d74e nijie: fix uploads not working for new image URL format.
Fix uploads not working for image URLs like this:

    https://pic.nijie.net/07/nijie/17/95/728995/illust/0_0_403fdd541191110c_c25585.jpg
2022-02-15 20:45:28 -06:00
evazion
fefa6036fb tests: fix broken upload tests.
* Fix broken Skeb test caused by 404'd image.
* Fix broken Sta.sh tests caused by DeviantArt URL changes.
* Fix broken Nijie tests caused by Nijie URL changes.
2022-02-15 20:33:52 -06:00
evazion
17fb34922b nijie: fix failure to fetch source data due to change in login system.
Nijie changed their login system so that now there are two cookies that
need to be remembered: NIJIEIJIEID, and nijie_tok.
2022-01-11 15:14:54 -06:00
evazion
cdd46b0ac5 tests: fix more spurious test failures in CI.
* Skip Nijie tests because they fail a lot due to Nijie rate limiting us.
* Skip ArtStation downloads tests because they sometimes return different file sizes.
* Fix random duplicate favgroup errors because favgroup names weren't random enough.
2021-10-01 18:05:25 -05:00
evazion
ac12efb636 tests: fix test failures when running without API keys.
Fix the test suite failing when trying to run it in the default state
with no config file or API keys configured. Most source sites require
API keys or login credentials to be set in order to work. Skip these
tests when credentials aren't configured.
2021-09-22 04:33:36 -05:00
evazion
3202ec8b9a tests: fix broken tests. 2021-09-06 03:25:03 -05:00
nonamethanks
1234d93292 Nijie: get correct image when using bookmarklet 2021-05-25 12:20:39 +02:00
evazion
869a99d9a3 nijie: clear session cookie if it's expired (#4665).
If we detect that the session cookie has expired (by the presence of the
`#login_illust` element on the page), then clear the cached session
cookie. The current source fetch will still fail, but the next fetch
will try to login again and hopefully succeed.
2021-03-08 01:30:02 -06:00
nonamethanks
e3fd3f42cc Nijie: Add doujin support 2020-08-15 09:29:29 +02:00
evazion
ff83cbeba9 tests: fix broken nijie test.
This post apparently had a tag added to it.
2020-08-12 11:01:48 -05:00
nonamethanks
76b0380927 Nijie: fix login
Fixes a regression in 6e6ce6e62f
2020-07-31 23:26:35 +02:00
evazion
7e471fe223 sources: replace HTTParty with Danbooru::Http in http_exists?. 2020-06-21 15:11:56 -05:00
evazion
6e6ce6e62f nijie: replace Mechanize with Danbooru::Http.
The Nijie login process works like this:

* First we submit our `email` and `password` to `https://nijie.info/login_int.php`.
* Then we save the NIJIEIEID session cookie from the response.
* We optionally retry if login failed. Nijie returns 429 errors with a
  `Retry-After: 5` header if we send too many login requests. This can
  happen during parallel testing.
* We cache the login cookies for only 1 hour so we don't have to worry
  about them becoming invalid if we cache them too long.

Cookies and retrying errors on failure are handled transparently by Danbooru::Http.
2020-06-21 05:22:57 -05:00
evazion
855e31ac90 nijie: fetch commentary as html instead of plaintext.
Fix regression in #4475. Fetch the commentary as html instead of
plaintext so that we don't lose links or other formatting.

Also fix it so that /jump.php redirect links are replaced with the
actual url.
2020-05-29 15:36:21 -05:00
evazion
2c60a51f64 Merge pull request #4475 from nonamethanks/refactor_source_normalizing
Refactor source normalization
2020-05-27 16:52:17 -05:00
nonamethanks
116f3a67ef Nijie: fetch full commentary rather than truncated preview 2020-05-22 02:47:19 +02:00
nonamethanks
307df3b3e4 Refactor source normalization
* Move the source normalization logic out of the post model
  and into individual sources' strategies.
* Rewrite normalization tests to be handled into each source's test,
  and expand them significantly. Previously we were only testing
  a very small subset of domains and variants.
* Fix up normalization for several sites.
* Normalize fav.me urls into normal deviantart urls.
2020-05-21 22:46:51 +02:00
evazion
b7350b8fe0 tests: fix various broken tests. 2020-01-17 19:21:20 -06:00
evazion
1b426fb23f Fix #4150: Nijie strategy fails for mp4 files. 2019-09-03 22:33:09 -05:00
evazion
9ecf36585c nijie: update for new image urls.
Nijie moved from this:

    https://pic03.nijie.info/nijie_picture/236014_20170620101426_0.png (page: https://www.nijie.info/view.php?id=218856)

to this:

    https://pic.nijie.net/03/nijie_picture/236014_20170620101426_0.png (page: https://www.nijie.info/view.php?id=218856)
2019-08-04 17:49:54 -05:00
evazion
bea8c2a4b8 nijie: fix failure to handle certain image urls.
Fix IMAGE_URL regex not matching urls of this form:

* https://pic04.nijie.info/nijie_picture/diff/main/287736_161475_20181112032855_1.png

This caused the illust id to not be parsed from the url, which led to `#image_url`
returning nil, which led to uploads failing because the url to download was missing.
2018-11-12 18:04:07 -06:00
evazion
8f6c710c6b tests: fix translated tags test failures. 2018-11-12 18:04:07 -06:00
evazion
d9063a9f2a nijie: support preview urls (#3919). 2018-09-24 17:08:37 -05:00
evazion
5525bbe1ca nijie: normalize all thumbnail urls (#3919). 2018-09-23 20:08:14 -05:00
evazion
d294514dc0 nijie: don't crash on invalid urls or deleted works (#3919). 2018-09-23 20:08:14 -05:00
evazion
b6228505aa nijie: fix page_url method.
The id in a bare image url is the member id, not the illust id.
2018-09-23 20:08:13 -05:00
evazion
d6235d6f9e nijie: add canonical url tests. 2018-08-31 23:23:15 -05:00
Albert Yi
762dc3da24 Refactor sources 2018-08-24 12:10:51 -07:00
r888888888
abce4d2551 Raise error on unpermitted params.
Fail loudly if we forget to whitelist a param instead of silently
ignoring it.

misc models: convert to strong params.

artist commentaries: convert to strong params.

* Disallow changing or setting post_id to a nonexistent post.

artists: convert to strong params.

* Disallow setting `is_banned` in create/update actions. Changing it
  this way instead of with the ban/unban actions would leave the artist in
  a partially banned state.

bans: convert to strong params.

* Disallow changing the user_id after the ban has been created.

comments: convert to strong params.

favorite groups: convert to strong params.

news updates: convert to strong params.

post appeals: convert to strong params.

post flags: convert to strong params.

* Disallow users from setting the `is_deleted` / `is_resolved` flags.

ip bans: convert to strong params.

user feedbacks: convert to strong params.

* Disallow users from setting `disable_dmail_notification` when creating feedbacks.
* Disallow changing the user_id after the feedback has been created.

notes: convert to strong params.

wiki pages: convert to strong params.

* Also fix non-Builders being able to delete wiki pages.

saved searches: convert to strong params.

pools: convert to strong params.

* Disallow setting `post_count` or `is_deleted` in create/update actions.

janitor trials: convert to strong params.

post disapprovals: convert to strong params.

* Factor out quick-mod bar to shared partial.
* Fix quick-mod bar to use `Post#is_approvable?` to determine visibility
  of Approve button.

dmail filters: convert to strong params.

password resets: convert to strong params.

user name change requests: convert to strong params.

posts: convert to strong params.

users: convert to strong params.

* Disallow setting password_hash, last_logged_in_at, last_forum_read_at,
  has_mail, and dmail_filter_attributes[user_id].

* Remove initialize_default_image_size (dead code).

uploads: convert to strong params.

* Remove `initialize_status` because status already defaults to pending
  in the database.

tag aliases/implications: convert to strong params.

tags: convert to strong params.

forum posts: convert to strong params.

* Disallow changing the topic_id after creating the post.
* Disallow setting is_deleted (destroy/undelete actions should be used instead).
* Remove is_sticky / is_locked (nonexistent attributes).

forum topics: convert to strong params.

* merges https://github.com/evazion/danbooru/tree/wip-rails-5.1
* lock pg gem to 0.21 (1.0.0 is incompatible with rails 5.1.4)
* switch to factorybot and change all references

Co-authored-by: r888888888 <r888888888@gmail.com>
Co-authored-by: evazion <noizave@gmail.com>

add diffs
2018-04-06 18:09:57 -07:00
r888888888
9d5e4f969f fix source tests 2017-11-20 12:30:29 -08:00
evazion
9b031a9c20 Fix #3266: Normalize fullwidth Unicode characters in translated tags. 2017-08-12 17:13:52 -05:00
evazion
b880b07387 sources: factor out html-to-dtext code to DText.from_html. 2017-07-01 11:15:48 -05:00
evazion
294358b4a6 nijie: fetch image_urls for batch bookmarklet. 2017-06-20 16:29:31 -05:00
evazion
2d5fc191dd nijie: convert commentary to dtext. 2017-06-20 16:11:16 -05:00
evazion
25e7db860a nijie: fetch artist commentary. 2017-06-20 16:11:16 -05:00
r888888888
6174d0eef2 normalize nijie popup urls (fixes #3153) 2017-06-14 12:26:13 -07:00
r888888888
85fa58cb7c add test for #3153 2017-06-14 11:36:04 -07:00