Sources::Site#normalize_for_artist_finder! catches errors that occur
during the normalization process. However it only catches
StandardErrors, not Exceptions, so this makes Sources::Error a
StandardError.
Fix random VCR failures in Pixiv tests.
Sometimes tests randomly fail because the PHPSESSID they use in their
HTTP requests to Pixiv is different than the one that was originally
recorded by VCR. This causes VCR to complain that the requests don't
match.
This is caused by the PHPSESSID being globally cached in Memcache.
Depending on the order the tests run in (which is random), one set of
tests can use a PHPSESSID that was recorded for a /different/ set of
tests.
Improve Pixiv URL matching.
* Allow URLs that are missing the http:// part. These are sometimes seen
in artist entries.
* Ignore URLs from random Pixiv domains such as dic.pixiv.net,
blog.pixiv.net, etc. These are also sometimes in artist entries.
Improve normalize_for_artist_finder! URL matching.
* Normalize www.pixiv.net/stacc/username URLs.
* Correctly normalize URLs that are missing the illust ID part on the end
(i.e. http://i2.pixiv.net/img04/img/syounen_no_uta/). These are common
in artist entries.
Match URLs strictly when normalizing for artist entries.
Only normalize Pixiv URLs that strictly match a known format. Pass any
unrecognized URLs through without attempting to normalize them, just to
be safe.
Normalize URLs when saving artist entries.
Refactors things such that Sources::Site has a normalize_for_artist_finder!
method that delegates to the strategy for the appropriate site. This way
any site that needs to normalize URLs for the artist finder can do so.
There are two kinds of thumbnails that need to be rewritten. First case:
new /img-master/ URLs need to be rewritten to /img-original/ URLs like this:
http://i2.pixiv.net/c/600x600/img-master/img/2014/10/04/03/59/52/46337015_p0_master1200.jpg
=> http://i2.pixiv.net/img-original/img/2014/10/04/03/59/52/46337015_p0.png
This is what `rewrite_new_medium_images` does. In order to do this, it
has to use the Pixiv API to get the correct file extension.
Second case: Old small/medium size URLs need to be rewritten to full
size URLs like this:
http://i2.pixiv.net/img18/img/evazion/14901720_m.png
=> http://i2.pixiv.net/img18/img/evazion/14901720.png
But when the medium size URL is actually for a manga image, it needs to be
rewritten to the big manga URL instead:
http://i2.pixiv.net/img04/img/syounen_no_uta/46170939_m.jpg
=> http://i2.pixiv.net/img04/img/syounen_no_uta/46170939_big_p0.jpg
But we can't tell whether it's a manga image from the URL, so we have to
use the manga page count from either the HTML page or the API to
determine whether it's part of a manga gallery.
So in order to make this work, `rewrite_old_small_and_medium_images`
takes an `is_manga` flag. `Sources::Strategies::Pixiv#get` gets the
page count from the HTML and passes the `is_manga` flag on down through
the call chain until `rewrite_old_small_and_medium_images` gets it.
When `rewrite_old_small_and_medium_images` is called from
`Downloads::Strategies::Pixiv#rewrite_thumbnails`, the `is_manga` flag
isn't passed in because we didn't scrape the HTML. This causes
`rewrite_old_small_and_medium_images` to look it up in the API instead.
Currently this is done twice: once when the page first loads (although
this one isn't used) and then a second time asynchronously with
javascript (which is used). This commit removes the first one, improving
upload page load time.
* Support rewriting source when user uploads from a thumbnail url or
html page url
* Fix bug where site did not log in correctly
* Fix bug where the image url couldn't be extracted from the page if the
image was rated as adults only on seiga
* Normalize direct image url to html page url so tags, etc., can be
extracted