42 Commits

Author SHA1 Message Date
evazion
c07b099bf8 Fix #5152: Nicovideo video urls getting bad_source. 2022-05-03 03:59:15 -05:00
evazion
23b8350320 sources: add image_url?, page_url?, and profile_url? methods.
Add methods to Source::URL for determining whether a URL is an image
URL, a page URL, or a profile URL.

Also add more source URL tests and fix various URL parsing bugs.
2022-05-01 21:01:36 -05:00
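The three predicates named in this commit can be illustrated with a minimal sketch. It is not Danbooru's `Source::URL` implementation: the class name `SketchNicoSeigaURL` is hypothetical, and the `/seiga/im:id` and `/user/illust/:id` path patterns are assumptions made for illustration. Only the `/image/source/:id` and `/watch/mg:id` shapes appear elsewhere in this log.

```ruby
require "uri"

# Hypothetical classifier, not the project's real class.
class SketchNicoSeigaURL
  def initialize(url)
    @path = URI.parse(url).path
  end

  # Direct image URLs, e.g. /image/source/8017978
  def image_url?
    @path.match?(%r{\A/image/source/\d+\z})
  end

  # Page URLs, e.g. /watch/mg310193 (the /seiga/im:id form is an assumption)
  def page_url?
    @path.match?(%r{\A/(seiga/im\d+|watch/mg\d+)\z})
  end

  # Profile URLs (the /user/illust/:id form is an assumption)
  def profile_url?
    @path.match?(%r{\A/user/illust/\d+\z})
  end
end

image = SketchNicoSeigaURL.new("https://seiga.nicovideo.jp/image/source/8017978")
page = SketchNicoSeigaURL.new("https://seiga.nicovideo.jp/watch/mg310193")
puts image.image_url?
puts page.page_url?
```

Keeping the predicates as pure path checks means they need no network access, which matches the later commits' goal of isolating URL logic from the extractors.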
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
5941c47b79 nicoseiga: support a few more url types. 2022-03-23 23:38:06 -05:00
evazion
c07c5ea594 nicoseiga: fix page_url method not to return seiga.nicovideo.jp/image/source/:id urls.
Fix the page_url method not to return URLs like this:

    https://seiga.nicovideo.jp/image/source/8017978 (page: https://seiga.nicovideo.jp/watch/mg310193)

These are direct image URLs, not page URLs. It's not generally possible
to get to the page URL from an image URL like this.

This fixes it so that we don't incorrectly set the source of NicoSeiga
uploads to the image URL.
2022-03-23 23:38:06 -05:00
evazion
4ef8178bd1 sources: remove canonical_url method.
Refactor source strategies to remove the `canonical_url` method.

`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the page URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the image URL is used as the source.

This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
2022-03-23 23:38:06 -05:00
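The source-selection rule in this commit, where a page URL is preferred whenever one can be derived and the given URL is kept otherwise, can be sketched as follows. `SketchURL` and `preferred_source` are hypothetical names, and the `example.com` URLs are placeholders. This is an illustration under those assumptions, not the project's code.

```ruby
# Hypothetical stand-in for a parsed source URL. `page_url` is nil when
# no page URL can be derived from the original URL.
SketchURL = Struct.new(:original, :page_url, keyword_init: true)

# Prefer the page URL when one is available; otherwise keep the URL as given.
def preferred_source(url)
  url.page_url || url.original
end

convertible = SketchURL.new(
  original: "https://img.example.com/123.jpg",
  page_url: "https://example.com/works/123"
)
unconvertible = SketchURL.new(
  original: "https://img.example.com/raw/abc.jpg",
  page_url: nil
)

puts preferred_source(convertible)
puts preferred_source(unconvertible)
```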
evazion
3aa5cab2aa sources: refactor normalize_for_source.
`normalize_for_source` was used to convert image URLs to page URLs when displaying sources
on the post show page. Move all the code for converting image URLs to page URLs from
`Sources::Strategies#normalize_for_source` to `Source::URL#page_url`.

Before we had to be very careful in source strategies not to make any network calls in
`normalize_for_source`, since it was used in the view for the post show page. Now all the
code for generating page URLs is isolated in Source::URL, which makes source strategies
simpler. It also makes it easier to check if a source is an image URL or page URL, and if
the image URL is convertible to a page URL, which will make autotagging bad_link or
bad_source feasible.

Finally, this fixes it to generate better page URLs in a handful of cases:

* https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP
* https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6 instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6
* http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png
* https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1
2022-03-23 01:34:04 -05:00
evazion
2f61486ac6 sources: remove image_url method from base strategy.
Remove the `image_url` method from source strategies. This method would
return only the first image if a source had multiple images. The
`image_urls` method should be used instead. Tests were the main place
that still used `image_url` instead of `image_urls`.

Also make post replacements return an error if replacing with a source
that contains multiple images, instead of just blindly replacing the
post with the first image in the source.
2022-03-11 01:59:21 -06:00
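The replacement behavior described in this commit, which rejects a multi-image source instead of silently taking the first image, might look roughly like this. `MultipleImagesError` and `replacement_image_url` are hypothetical names introduced for the sketch, and the URLs are placeholders.

```ruby
# Hypothetical error type for the sketch.
class MultipleImagesError < StandardError; end

# Return the single image URL to replace with, or raise if the source is
# ambiguous because it contains more than one image.
def replacement_image_url(image_urls)
  if image_urls.size > 1
    raise MultipleImagesError, "source contains #{image_urls.size} images"
  end

  image_urls.first
end

puts replacement_image_url(["https://example.com/a.png"])

begin
  replacement_image_url(["https://example.com/a.png", "https://example.com/b.png"])
rescue MultipleImagesError => e
  puts "error: #{e.message}"
end
```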
evazion
43a665a66d sources: factor out Source::URL::NicoSeiga. 2022-03-10 04:53:51 -06:00
evazion
b6538fde38 uploads: fix NicoSeiga sources not working.
Fix uploads for NicoSeiga sources not working because the strategy
returned URLs like the one below in the list of image_urls, which
require a login to download:

    https://seiga.nicovideo.jp/image/source/10315315

Also fix certain URLs like https://dic.nicovideo.jp/oekaki/52833.png not
working, because they didn't contain an image ID and the image_urls
method returned an empty list in this case.
2022-02-15 17:12:02 -06:00
evazion
ac12efb636 tests: fix test failures when running without API keys.
Fix the test suite failing when trying to run it in the default state
with no config file or API keys configured. Most source sites require
API keys or login credentials to be set in order to work. Skip these
tests when credentials aren't configured.
2021-09-22 04:33:36 -05:00
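One way to decide whether a test should be skipped is to check which required credentials are blank. The helper below is a sketch of that idea. The name `missing_credentials` and the config keys are hypothetical and do not correspond to the project's actual configuration.

```ruby
# Return the required keys whose values are nil or blank in the config.
def missing_credentials(config, required_keys)
  required_keys.select { |key| config[key].to_s.strip.empty? }
end

# Hypothetical configuration with one credential set and one left blank.
config = { site_a_session: "abc123", site_b_api_key: nil }

puts missing_credentials(config, [:site_a_session]).inspect
puts missing_credentials(config, [:site_a_session, :site_b_api_key]).inspect
```

In a Minitest test case, a guard such as `skip "credentials not configured" unless missing_credentials(config, keys).empty?` at the top of a test would mark the test as skipped rather than failed.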
nonamethanks
cb6196c259 Nicoseiga: auto-add spoiler tags to commentary 2021-04-06 14:08:49 +02:00
Aaron Franke
191b528ad7 Ensure files end in newlines (POSIX compliance) 2020-10-04 05:13:39 -04:00
evazion
b583b3c810 tests: fix nicoseiga download tests. 2020-06-16 00:10:35 -05:00
nonamethanks
9f0e85e1b5 Refactor nicoseiga strategy
* Get rid of mechanize, fully switch to Danbooru::Http
* Switch to mobile api, improving speed
* Merge main and manga clients
* Add full support for manga pages
* Add support for anonymous and r-15 images
* Don't fail when attempting to upload oekaki direct links
* Various misc fixes
2020-06-15 03:37:51 +02:00
nonamethanks
307df3b3e4 Refactor source normalization
* Move the source normalization logic out of the post model
  and into individual sources' strategies.
* Rewrite normalization tests to be handled in each source's test,
  and expand them significantly. Previously we were only testing
  a very small subset of domains and variants.
* Fix up normalization for several sites.
* Normalize fav.me urls into normal deviantart urls.
2020-05-21 22:46:51 +02:00
evazion
309821bf73 rubocop: fix various style issues. 2019-12-22 21:23:37 -06:00
Albert Yi
f6a11e6363 remove residual code 2019-02-25 14:46:43 -08:00
Albert Yi
90ce42a537 add support for nico seiga manga (fixes #4060) 2019-02-25 14:44:45 -08:00
evazion
39f9e01b13 nicoseiga: fix canonical_url to use the image url. 2018-09-22 11:07:18 -05:00
Albert Yi
762dc3da24 Refactor sources 2018-08-24 12:10:51 -07:00
evazion
4c39783d28 Fix #3424: /iqdb_queries.json fails for certain urls.
Fix the HTML page -> image URL download rewrite strategy failing for
https://lohas.nicoseiga.jp/thumb/${id}i URLs.
2017-12-15 10:16:06 -06:00
evazion
047fb68f45 Fix #3117: Nicoseiga handler grabbing wrong commentary source
* `summary` is the wrong field. It's the list of comments left by users,
  not the artist's commentary.

* For some reason `doc.response.image.description` returns nil even
  though the description element exists. Switch to `Hash.from_xml` to
  avoid this.
2017-06-06 13:44:43 -05:00
r888888888
216ca06fee fixes #3100 2017-05-30 15:38:01 -07:00
r888888888
0b8d4105aa fix tests 2017-04-04 12:39:17 -07:00
Albert Yi
0ea7d78584 remove usage of vcr cassettes; delete unused fixtures; fix some broken unit tests 2016-12-28 15:47:28 -08:00
r888888888
fc7afd44ea refactor source pixiv test
refactor pixiv download tests
refactor upload test
refactor nico seiga test
refactor twitter tests
2016-09-28 11:25:29 -07:00
r888888888
45e5ea817a update tests 2016-06-12 15:12:30 -07:00
r888888888
58aa5c6d66 fix tests 2016-05-28 14:08:44 -07:00
r888888888
341b29ce41 fix tests 2015-08-18 17:40:53 -07:00
r888888888
8ef7462b6b fix tests 2015-07-06 18:32:54 -07:00
r888888888
66dd4f072e update tests 2015-06-02 19:20:48 -07:00
r888888888
c92c32ecda fix tests 2015-02-15 12:23:53 -08:00
r888888888
39ce77bbb1 fix nico seiga tests 2014-12-04 22:58:27 -08:00
r888888888
8d4c9d7955 fix pixiv tests 2014-10-22 17:22:36 -07:00
Toks
76d1f0a66b Update seiga and pixiv source tests, and seiga vcr cassettes
fixes #2142
2014-04-30 15:26:58 -04:00
r888888888
e2571e74cc refactored nico seiga sources 2013-08-07 18:12:16 -07:00
r888888888
5547f0937a switch vcr web backend to fakeweb, add vcr support for nico seiga tests 2013-05-03 16:24:10 -07:00
小太
cba839ba76 Kill trailing whitespace in ruby files 2013-03-19 23:10:10 +11:00
albert
7f11fb4583 fix for artist search 2013-02-19 21:41:35 -05:00
albert
f188c6d70d refactored source tests 2011-09-29 12:28:17 -04:00
albert
06054e5eb2 added source tests 2011-09-29 12:11:30 -04:00