Commit Graph

58 Commits

Author SHA1 Message Date
evazion
23b8350320 sources: add image_url?, page_url?, and profile_url? methods.
Add methods to Source::URL for determining whether a URL is an image
URL, a page URL, or a profile URL.

Also add more source URL tests and fix various URL parsing bugs.
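
A minimal sketch of what such predicates could look like for a single site. The class name `SketchTwitterURL`, the host list, and the path patterns are illustrative assumptions, not the code added by this commit.

```ruby
require "uri"

# Hypothetical stand-in for a site-specific Source::URL class. The host names
# and path patterns below are assumptions made for illustration only.
class SketchTwitterURL
  attr_reader :uri

  def initialize(url)
    @uri = URI.parse(url)
  end

  # Direct image files are assumed to be served from pbs.twimg.com.
  def image_url?
    uri.host == "pbs.twimg.com"
  end

  # A page URL is assumed to look like /<username>/status/<id>.
  def page_url?
    twitter_host? && uri.path.match?(%r{\A/\w+/status/\d+\z})
  end

  # A profile URL is assumed to be a single path segment: /<username>.
  def profile_url?
    twitter_host? && uri.path.match?(%r{\A/\w+/?\z})
  end

  private

  def twitter_host?
    %w[twitter.com www.twitter.com mobile.twitter.com].include?(uri.host)
  end
end
```

Keeping the predicates free of network calls means they can be used anywhere a URL string is available.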
2022-05-01 21:01:36 -05:00
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
4ef8178bd1 sources: remove canonical_url method.
Refactor source strategies to remove the `canonical_url` method.

`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the page URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the image URL is used as the source.

This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
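
The selection rule can be sketched roughly as follows: prefer the page URL when one can be derived, otherwise keep the image URL. `SketchSource` and `source_after_upload` are hypothetical names for illustration, and `page_url` is assumed to be nil when no conversion is possible.

```ruby
# Hypothetical value object: `url` is the URL the user uploaded from and
# `page_url` is the derived page URL, or nil when it cannot be derived.
SketchSource = Struct.new(:url, :page_url)

# Prefer the page URL when available; otherwise fall back to the original URL.
def source_after_upload(source)
  source.page_url || source.url
end
```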
2022-03-23 23:38:06 -05:00
evazion
3aa5cab2aa sources: refactor normalize_for_source.
`normalize_for_source` was used to convert image URLs to page URLs when displaying sources
on the post show page. Move all the code for converting image URLs to page URLs from
`Sources::Strategies#normalize_for_source` to `Source::URL#page_url`.

Before we had to be very careful in source strategies not to make any network calls in
`normalize_for_source`, since it was used in the view for the post show page. Now all the
code for generating page URLs is isolated in Source::URL, which makes source strategies
simpler. It also makes it easier to check if a source is an image URL or page URL, and if
the image URL is convertible to a page URL, which will make autotagging bad_link or
bad_source feasible.

Finally, this fixes it to generate better page URLs in a handful of cases:

* https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP
* https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6 instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6
* http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png
* https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1
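
As a rough illustration of the kind of rewriting involved, the sketch below maps three of the older URL forms listed above to their preferred forms. The rule table and the `sketch_page_url` helper are assumptions built only from those example URLs; the real Source::URL classes are structured differently and cover many more variants.

```ruby
# Illustrative rewrite rules derived from the example URLs in this message.
# Each entry is [pattern for the older form, template for the preferred form].
PAGE_URL_RULES = [
  [%r{\Ahttps?://[\w-]+\.artstation\.com/projects/(\w+)\z}, 'https://www.artstation.com/artwork/\1'],
  [%r{\Ahttp://gallery\.minitokyo\.net/download/(\d+)\z}, 'http://gallery.minitokyo.net/view/\1'],
  [%r{\Ahttps://(\w+)\.wikia\.com/wiki/(.+)\z}, 'https://\1.fandom.com/wiki/\2']
].freeze

# Returns the rewritten URL for the first matching rule, or nil if none match.
def sketch_page_url(url)
  PAGE_URL_RULES.each do |pattern, template|
    return url.sub(pattern, template) if url.match?(pattern)
  end
  nil
end
```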
2022-03-23 01:34:04 -05:00
evazion
7394660ba9 posts: fix exception when post has source like 'https://www.twitter.com/username'.
`twitter.com` sources worked but `www.twitter.com` didn't.

Also match the URL by class instead of by site name to ensure we match
the expected class.
2022-03-20 21:08:05 -05:00
evazion
b4aea72d04 sources: remove preview_urls method from base strategy.
Remove the `preview_urls` method from strategies. The only place this was used was
when doing IQDB searches, to download the thumbnail image from the source instead of
the full image.

This wasn't worth it for a few reasons:

* Thumbnails on other sites are sometimes not the size we want, which could affect
  IQDB results.
* Grabbing thumbnails is complex for some sites. You can't always just rewrite the
  image URL. Sometimes it requires extra API calls, which can be slower than just
  grabbing the full image.
* For videos and animations, thumbnails from other sites don't always match our
  thumbnails. We do smart thumbnail generation to try to avoid blank thumbnails, which
  means we don't always pick the first frame, which could affect IQDB results.

API changes:

* /iqdb_queries?search[file_url] now downloads the URL as is without any modification.
  Before it tried to change thumbnail and sample size image URLs to the full version.

* /iqdb_queries?search[url] now returns an error if the URL is for an HTML page that
  contains multiple images. Before it would grab only the first image and silently
  ignore the rest.
2022-03-11 03:22:23 -06:00
evazion
2f61486ac6 sources: remove image_url method from base strategy.
Remove the `image_url` method from source strategies. This method would
return only the first image if a source had multiple images. The
`image_urls` method should be used instead. Tests were the main place
that still used `image_url` instead of `image_urls`.

Also make post replacements return an error if replacing with a source
that contains multiple images, instead of just blindly replacing the
post with the first image in the source.
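
The replacement check could be sketched as below. `MultipleImagesError` and `replacement_image_url` are hypothetical names; the actual error type and message used by the application are not given in this message.

```ruby
# Hypothetical error type for a source that resolves to more than one image.
class MultipleImagesError < StandardError; end

# Returns the single image URL to replace the post with, and refuses to guess
# when the source contains zero or several images.
def replacement_image_url(image_urls)
  raise ArgumentError, "source contains no images" if image_urls.empty?
  raise MultipleImagesError, "source contains #{image_urls.size} images" if image_urls.size > 1

  image_urls.first
end
```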
2022-03-11 01:59:21 -06:00
evazion
043c08eb05 sources: factor out Source::URL::TwitPic. 2022-02-23 23:49:31 -06:00
evazion
dccc2edb75 tests: fix broken tests.
* Fix a Twitter test broken by a privated tweet.
* Fix an IP geolocation test broken by the ipregistry.co API returning new data.
2021-11-02 04:42:07 -05:00
evazion
738be825ff twitter: include artist name in source URLs on post pages.
Show Twitter sources on post pages like this:

    https://twitter.com/BOW999/status/1261877313349640194

and not like this:

    https://twitter.com/i/web/status/1261877313349640194

We originally removed the artist name because the link would be broken
when the artist changed their name. This is no longer the case.
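
A small sketch of the two URL shapes. The helper name `twitter_page_url` and the fallback to the `/i/web/status/` form when no username is known are assumptions for illustration.

```ruby
# Builds a tweet page URL. When the username is unknown, fall back to the
# username-less form (an assumed fallback, not confirmed by this commit).
def twitter_page_url(status_id, username = nil)
  if username
    "https://twitter.com/#{username}/status/#{status_id}"
  else
    "https://twitter.com/i/web/status/#{status_id}"
  end
end
```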
2021-09-27 11:07:25 -05:00
evazion
3639d7eae5 tests: fixup twitter tests for c90ef9f1b. 2021-02-05 03:33:07 -06:00
evazion
73506bac33 twitter: add tests for uploading profile banners (#4520). 2020-06-23 02:37:21 -05:00
evazion
2ede41c4dc tests: fix twitter test broken by deleted tweet. 2020-06-10 20:22:16 -05:00
nonamethanks
307df3b3e4 Refactor source normalization
* Move the source normalization logic out of the post model
  and into individual sources' strategies.
* Rewrite normalization tests to be handled in each source's test,
  and expand them significantly. Previously we were only testing
  a very small subset of domains and variants.
* Fix up normalization for several sites.
* Normalize fav.me urls into normal deviantart urls.
2020-05-21 22:46:51 +02:00
BrokenEagle
a45ae09d72 Account for additional Twitter video image links 2020-03-29 19:27:05 +00:00
evazion
31424ce545 twitter: add test for video thumbnails (#4262). 2020-03-20 16:19:07 -05:00
evazion
60bf21ff80 twitter: fix preview_urls when source url is a direct image.
Fix preview_urls returning an empty array when the source url is a
direct image from Twitter.

Also return preview_urls in /source.json.
2020-01-21 16:34:03 -06:00
evazion
e42881fbbf Fix #4262: Exception when using Twitter video image links. 2020-01-15 15:20:33 -06:00
evazion
faeec18efc twitter: add hashtag normalization test.
Add test for #4243. Also fix warning from bootsnap:

    iseq.rb:18: warning: nested repeat operator '+' and '?' was replaced with '*' in regular expression: /(?<!\A)生誕祭(?:\d+)?\z/
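
For reference, the nested `(?:\d+)?` can be written as `\d*`, which matches the same strings without triggering the warning. The constant and helper names below are hypothetical, and whether the commit used exactly this form is an assumption.

```ruby
# Strips a trailing "生誕祭" plus optional digits, but only when it is not the
# whole tag (the lookbehind rejects a match at the start of the string).
BIRTHDAY_SUFFIX = /(?<!\A)生誕祭\d*\z/

def strip_birthday_suffix(tag)
  tag.sub(BIRTHDAY_SUFFIX, "")
end
```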
2020-01-05 17:38:30 -06:00
evazion
309821bf73 rubocop: fix various style issues. 2019-12-22 21:23:37 -06:00
evazion
da84e3a2f2 twitter: replace twitter gem with our own API client.
The twitter gem had several problems:

* It's been unmaintained for over a year.
* It pulled in a lot of dependencies, many of which were outdated. In
  particular, it locked the `http` gem to version 3.3, preventing us
  from upgrading to 4.2.
* It raised exceptions on normal error conditions, like for deleted
  tweets or suspended users, which we really don't want.
* We had to wrap it to provide caching.

Changes:

* Fixes #4226 (Exception when creating new artists entries for suspended
  Twitter accounts)
* Drop support for scraping images from summary cards. Summary cards
  are the previews you get when you link to a website in a tweet. These
  preview images aren't always the best image.
2019-12-13 17:27:03 -06:00
evazion
a8896b664d twitter: fix batch bookmarklet selecting wrong image.
Fix regression in 7e465aeda.

https://danbooru.donmai.us/forum_topics/9127?page=276#forum_post_158779
2019-08-06 10:42:45 -05:00
evazion
7e465aedae Fix #4110: New Twitter image urls are broken in bookmarklet. 2019-08-04 20:23:10 -05:00
evazion
0f513d1a1b twitter: include intent url in new artist entries (#4028). 2018-12-27 15:03:11 -06:00
evazion
6a7cd6ce8e Fix #3984: Twitter: undefined method `first' for nil:NilClass.
Fix Sources::Strategies::Twitter#image_urls to return an empty array
instead of nil when the tweet doesn't contain any images.
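
One way to guarantee an array is to wrap the possibly-missing media list in `Array()`. The sketch below assumes the tweet is a parsed JSON hash with an `extended_entities.media` list whose items carry a `media_url_https` key; `sketch_image_urls` is a hypothetical name.

```ruby
# Returns every image URL in the tweet, or [] when the tweet has no media.
# Array(nil) evaluates to [], so callers can safely call .first or .each.
def sketch_image_urls(tweet)
  media = tweet.dig("extended_entities", "media")
  Array(media).map { |item| item["media_url_https"] }
end
```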
2018-11-11 17:41:32 -06:00
evazion
5cf6a43918 sources: fix sources sometimes choosing wrong strategy (fix #3968)
Fix sources choosing the wrong strategy when the referer belongs to a
different site (for example, when uploading a twitter post with a pixiv
referer).

* Fix `match?` to only consider the main url, not the referer.

* Change `match?` to match against a list of domains given by the `domains` method.

* Change `match?` to an instance method.
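
The three changes above can be sketched together as follows. The class names and domain lists are illustrative assumptions, not the actual strategy classes.

```ruby
require "uri"

# Hypothetical base class: `match?` is an instance method that looks only at
# the main URL's host and ignores the referer entirely.
class SketchStrategy
  attr_reader :url, :referer

  def initialize(url, referer = nil)
    @url = url
    @referer = referer
  end

  # Subclasses list the domains they handle.
  def domains
    []
  end

  def match?
    host = URI.parse(url).host.to_s
    domains.any? { |domain| host == domain || host.end_with?(".#{domain}") }
  end
end

class SketchTwitterStrategy < SketchStrategy
  def domains
    %w[twitter.com twimg.com] # assumed domain list
  end
end

class SketchPixivStrategy < SketchStrategy
  def domains
    %w[pixiv.net pximg.net] # assumed domain list
  end
end
```

With this shape, a Twitter URL uploaded with a Pixiv referer is matched only by the Twitter strategy.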
2018-11-04 13:00:17 -06:00
evazion
96e89cecfb tests: move twitter canonical url test. 2018-09-17 23:27:53 -05:00
evazion
f135a7c064 twitter: normalize canonical urls.
Normalize http://mobile.twitter.com to http://twitter.com in canonical urls.
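
A minimal sketch of this normalization, assuming a simple host rewrite that keeps the original scheme; `normalize_twitter_host` is a hypothetical helper name.

```ruby
# Rewrites the mobile host to the main host and leaves other URLs untouched.
def normalize_twitter_host(url)
  url.sub(%r{\A(https?://)mobile\.twitter\.com(?=/|\z)}, '\1twitter.com')
end
```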
2018-09-16 15:03:47 -05:00
evazion
bd47641601 twitter: don't fail when api key isn't configured. 2018-09-16 15:03:47 -05:00
evazion
325120ee51 twitter: fix parsing of the artist name from the url.
Fixes URLs like https://twitter.com/intent/user?user_id=123 being
incorrectly normalized to http://twitter.com/intent/ in artist entries.

Also fixes the artist name to be taken from the url when it can't be
obtained from the api (when the tweet is deleted).
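
The parsing fix can be illustrated roughly like this. The list of reserved path segments is an assumption and not exhaustive, and `twitter_username_from_url` is a hypothetical name.

```ruby
require "uri"

# Path segments assumed not to be usernames (illustrative, not exhaustive).
RESERVED_TWITTER_SEGMENTS = %w[intent i search hashtag home].freeze

# Returns the username from the first path segment, or nil when the URL
# points at a reserved route such as /intent/user?user_id=123.
def twitter_username_from_url(url)
  segment = URI.parse(url).path.split("/").reject(&:empty?).first
  return nil if segment.nil? || RESERVED_TWITTER_SEGMENTS.include?(segment.downcase)

  segment
end
```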
2018-09-16 15:03:23 -05:00
Albert Yi
4972c998f8 rely on preview urls if available for gallery 2018-09-11 15:06:12 -07:00
evazion
9a980367f6 twitter: normalize artist commentaries to nfkc (#3719)
Fixes hashtags not being interpreted when the author uses a fullwidth
number sign (＃, U+FF03).

ref: https://github.com/r888888888/danbooru/issues/3719#issuecomment-419535610
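
A short sketch of the effect, using Ruby's built-in `String#unicode_normalize`. The `sketch_hashtags` helper and its hashtag pattern are assumptions for illustration.

```ruby
# NFKC folds the fullwidth number sign (U+FF03) into the ASCII "#" (U+0023),
# so a plain "#" pattern finds hashtags written with either character.
def sketch_hashtags(text)
  text.unicode_normalize(:nfkc).scan(/#([[:word:]]+)/).flatten
end
```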
2018-09-10 21:45:50 -05:00
evazion
a1044dbc19 twitter: fix handling of direct image urls without a referer url. 2018-08-29 17:14:57 -05:00
Albert Yi
762dc3da24 Refactor sources 2018-08-24 12:10:51 -07:00
evazion
4fd4cbd2a6 twitter: fix tests. 2018-04-28 12:33:05 -05:00
r888888888
abce4d2551 Raise error on unpermitted params.
Fail loudly if we forget to whitelist a param instead of silently
ignoring it.

misc models: convert to strong params.

artist commentaries: convert to strong params.

* Disallow changing or setting post_id to a nonexistent post.

artists: convert to strong params.

* Disallow setting `is_banned` in create/update actions. Changing it
  this way instead of with the ban/unban actions would leave the artist in
  a partially banned state.

bans: convert to strong params.

* Disallow changing the user_id after the ban has been created.

comments: convert to strong params.

favorite groups: convert to strong params.

news updates: convert to strong params.

post appeals: convert to strong params.

post flags: convert to strong params.

* Disallow users from setting the `is_deleted` / `is_resolved` flags.

ip bans: convert to strong params.

user feedbacks: convert to strong params.

* Disallow users from setting `disable_dmail_notification` when creating feedbacks.
* Disallow changing the user_id after the feedback has been created.

notes: convert to strong params.

wiki pages: convert to strong params.

* Also fix non-Builders being able to delete wiki pages.

saved searches: convert to strong params.

pools: convert to strong params.

* Disallow setting `post_count` or `is_deleted` in create/update actions.

janitor trials: convert to strong params.

post disapprovals: convert to strong params.

* Factor out quick-mod bar to shared partial.
* Fix quick-mod bar to use `Post#is_approvable?` to determine visibility
  of Approve button.

dmail filters: convert to strong params.

password resets: convert to strong params.

user name change requests: convert to strong params.

posts: convert to strong params.

users: convert to strong params.

* Disallow setting password_hash, last_logged_in_at, last_forum_read_at,
  has_mail, and dmail_filter_attributes[user_id].

* Remove initialize_default_image_size (dead code).

uploads: convert to strong params.

* Remove `initialize_status` because status already defaults to pending
  in the database.

tag aliases/implications: convert to strong params.

tags: convert to strong params.

forum posts: convert to strong params.

* Disallow changing the topic_id after creating the post.
* Disallow setting is_deleted (destroy/undelete actions should be used instead).
* Remove is_sticky / is_locked (nonexistent attributes).

forum topics: convert to strong params.

* merges https://github.com/evazion/danbooru/tree/wip-rails-5.1
* lock pg gem to 0.21 (1.0.0 is incompatible with rails 5.1.4)
* switch to factorybot and change all references

Co-authored-by: r888888888 <r888888888@gmail.com>
Co-authored-by: evazion <noizave@gmail.com>

add diffs
2018-04-06 18:09:57 -07:00
evazion
f8a5620768 Partial fix for #3514: Handle https://twitter.com/i/web/status/:id URL. 2018-01-23 23:07:21 -06:00
r888888888
461ddbf017 fixes #3422 2017-12-21 11:33:23 -08:00
r888888888
b5d72ae8d8 fixes #3422 2017-12-15 17:21:33 -08:00
r888888888
9d5e4f969f fix source tests 2017-11-20 12:30:29 -08:00
Albert Yi
058783755d Merge pull request #3379 from evazion/fix-3377
Fix #3377: Batch bookmarklet doesn't fetch artist/tags from twitter
2017-11-16 12:02:06 -08:00
evazion
f633222ef0 twitter: test fetching source data from direct image with referer. 2017-11-16 13:29:58 -06:00
r888888888
cd5d9cdaeb update twitter test 2017-11-16 11:19:39 -08:00
evazion
8b70e0099b twitter: fix to handle extended tweets (fix #3254, #3072). 2017-08-05 23:12:55 -05:00
r888888888
610e2bdedd fixes #3191 2017-06-27 14:48:55 -07:00
evazion
128e655aef twitter: fetch hashtags for translated tags (fixes #3171). 2017-06-17 16:15:23 -05:00
evazion
5dd3151d5b twitter: convert commentary to dtext.
* Convert hashtags and mentions to dtext links.
* Replace http://t.co urls with the actual url.
* Strip the http://t.co url linking to the tweet itself.
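
The three conversions above can be sketched in a single pass over the text. The DText link form `"text":[url]`, the hashtag and mention URL shapes, and the `url_map` parameter (short URL to expanded URL, with nil meaning "strip") are assumptions made for illustration, not the commit's implementation.

```ruby
# One regex with three alternatives, so expanded URLs are never re-scanned
# for "#" or "@" after substitution.
COMMENTARY_TOKEN = %r{(?<url>https?://t\.co/\w+)|#(?<tag>[[:word:]]+)|@(?<user>\w+)}

def commentary_to_dtext(text, url_map = {})
  converted = text.gsub(COMMENTARY_TOKEN) do
    m = Regexp.last_match
    if m[:url]
      # Unknown short URLs are kept as is; a nil mapping removes the URL.
      url_map.fetch(m[:url], m[:url]).to_s
    elsif m[:tag]
      format('"#%<t>s":[https://twitter.com/hashtag/%<t>s]', t: m[:tag])
    else
      format('"@%<u>s":[https://twitter.com/%<u>s]', u: m[:user])
    end
  end
  converted.strip
end
```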
2017-06-16 12:56:55 -05:00
r888888888
f27d065e1f fixes #3119 2017-06-14 16:30:19 -07:00
r888888888
ff5586cb01 refactor twitter service to handle cards (fixes #3031) 2017-05-09 12:48:11 -07:00
Albert Yi
0ea7d78584 remove usage of vcr cassettes; delete unused fixtures; fix some broken unit tests 2016-12-28 15:47:28 -08:00