82 Commits

Author SHA1 Message Date
r888888888
9763e76707 fixes #2324 2014-12-13 00:23:38 -08:00
r888888888
7175df2099 fix for Pixiv#get_image_url_from_page 2014-12-12 12:33:52 -08:00
Toks
6d565d9f87 Fix regression with pixiv manga uploads 2014-12-11 13:21:43 -05:00
r888888888
029202068d fix Pixiv#get_image_url_from_page 2014-12-11 10:05:59 -08:00
Toks
e0b0760b39 should fix #2320
Sources::Site#normalize_for_artist_finder! catches errors that occur
during the normalization process. However, it only rescues
StandardError and its subclasses, not every Exception, so this commit
makes Sources::Error a StandardError.
2014-12-09 17:14:28 -05:00
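The rescue behavior that commit e0b0760b39 relies on can be checked with a minimal sketch. The two error classes and the helper method below are hypothetical stand-ins for illustration, not the project's code; the only language fact assumed is that a bare `rescue` handles `StandardError` and its subclasses.

```ruby
# Hypothetical stand-ins for an error class before and after the change.
class ErrorFromException < Exception; end
class ErrorFromStandardError < StandardError; end

# Returns true if a bare `rescue` clause handles the given error class,
# false if the error escapes it.
def bare_rescue_catches?(error_class)
  begin
    raise error_class, "normalization failed"
  rescue
    # A bare rescue only handles StandardError and its subclasses.
    return true
  end
rescue Exception
  # Reached only when the inner bare rescue did not handle the error.
  false
end

puts bare_rescue_catches?(ErrorFromStandardError)
puts bare_rescue_catches?(ErrorFromException)
```

An error class that inherits directly from `Exception` slips past the bare `rescue`, which is why re-parenting the error class is enough to make an existing handler effective.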
r888888888
4fcb1d2bbc support for twitter downloads 2014-12-05 14:19:36 -08:00
r888888888
39ce77bbb1 fix nico seiga tests 2014-12-04 22:58:27 -08:00
r888888888
e8a8999a73 fix class error 2014-12-04 17:32:19 -08:00
r888888888
d7f044e18b loosen constraints on pixiv img server regexp 2014-12-03 13:39:47 -08:00
evazion
57bb51621e Add debugger gem.
Fix random VCR failures in Pixiv tests.

Sometimes tests randomly fail because the PHPSESSID they use in their
HTTP requests to Pixiv is different than the one that was originally
recorded by VCR. This causes VCR to complain that the requests don't
match.

This is caused by the PHPSESSID being globally cached in Memcache.
Depending on the order the tests run in (which is random), one set of
tests can use a PHPSESSID that was recorded for a /different/ set of
tests.

Improve Pixiv URL matching.

* Allow URLs that are missing the http:// part. These are sometimes seen
  in artist entries.
* Ignore URLs from random Pixiv domains such as dic.pixiv.net,
  blog.pixiv.net, etc. These are also sometimes in artist entries.

Improve normalize_for_artist_finder! URL matching.

* Normalize www.pixiv.net/stacc/username URLs.
* Correctly normalize URLs that are missing the illust ID part on the end
  (e.g. http://i2.pixiv.net/img04/img/syounen_no_uta/). These are common
  in artist entries.

Match URLs strictly when normalizing for artist entries.

Only normalize Pixiv URLs that strictly match a known format. Pass any
unrecognized URLs through without attempting to normalize them, just to
be safe.

Normalize URLs when saving artist entries.
2014-12-03 13:16:05 -08:00
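The strict-matching rules described in commit 57bb51621e can be sketched roughly as follows. This is an illustration only: the constant names, the method name, and especially the canonical output form are assumptions, not taken from the project. The point it shows is the shape of the approach: patterns anchored at both ends, an optional scheme, a restricted set of hosts, and pass-through for anything unrecognized.

```ruby
# Hypothetical patterns. Anchored with \A and \z so partial matches are rejected.
# The scheme is optional, since artist entries sometimes omit "http://".
STACC_URL = %r{\A(?:https?://)?www\.pixiv\.net/stacc/(?<user>[^/?#]+)/?\z}i

# Image server URL, with or without the trailing illust ID file name.
IMG_DIR_URL = %r{\A(?:https?://)?i\d+\.pixiv\.net/img\d+/img/(?<user>[^/?#]+)/(?:\d+(?:_\w+)?\.\w+)?\z}i

def normalize_for_artist_finder_sketch(url)
  match = STACC_URL.match(url) || IMG_DIR_URL.match(url)

  # Unrecognized URLs (dic.pixiv.net, blog.pixiv.net, other sites) pass through untouched.
  return url unless match

  # Assumed canonical form, chosen only for this example.
  "http://www.pixiv.net/stacc/#{match[:user]}/"
end

puts normalize_for_artist_finder_sketch("www.pixiv.net/stacc/evazion")
puts normalize_for_artist_finder_sketch("http://i2.pixiv.net/img04/img/syounen_no_uta/")
puts normalize_for_artist_finder_sketch("http://dic.pixiv.net/a/example")
```

Because each pattern names its host explicitly, other Pixiv subdomains never match and are returned unchanged.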
Toks
fc0328b9df fixes #2317 2014-12-02 16:02:21 -05:00
r888888888
16f9a61d63 fixes #2299 2014-10-29 15:14:17 -07:00
r888888888
8d4c9d7955 fix pixiv tests 2014-10-22 17:22:36 -07:00
Toks
5f9ce7ee47 Fix get_image_url_from_page call 2014-10-19 02:30:02 -07:00
r888888888
4c73fb9f79 add ugoira support in view 2014-10-19 02:30:02 -07:00
r888888888
3bb06c2be4 integrate ugoiras into zip+webm+preview 2014-10-19 02:30:02 -07:00
r888888888
fb2219d4ac integrate ugoira converted into upload flow 2014-10-19 02:30:01 -07:00
Toks
2e8230f92a Merge pull request #2263 from evazion/new-pixiv-urls-fixes
Fix artist finder and URL rewriting for new Pixiv URLs
2014-10-05 16:16:04 -04:00
evazion
c75d2d208e normalize_for_artist_finder!: Don't crash on bad URLs
If we can't normalize the URL (because of a bad ID, a malformed URL, or
a change in the HTML page), just return the unnormalized URL.
2014-10-05 14:11:32 -05:00
evazion
7f3b98969f Refactor normalize_for_artist_finder!
Refactor so that Sources::Site has a normalize_for_artist_finder!
method that delegates to the strategy for the appropriate site. This way
any site that needs to normalize URLs for the artist finder can do so.
2014-10-05 14:11:31 -05:00
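The delegation described in commit 7f3b98969f might look roughly like the sketch below. Everything here lives in a hypothetical `SketchSources` namespace: the class layout, the `url_match?` hook, and the placeholder transformation (dropping the query string) are assumptions made for illustration, not the project's code.

```ruby
module SketchSources
  module Strategies
    # Default strategy: no site-specific normalization.
    class Base
      def self.url_match?(_url)
        false
      end

      def initialize(url)
        @url = url
      end

      def normalize_for_artist_finder!
        @url
      end
    end

    class Pixiv < Base
      def self.url_match?(url)
        url.match?(%r{\Ahttps?://(?:www|i\d+)\.pixiv\.net/}i)
      end

      # Placeholder transformation, for illustration only.
      def normalize_for_artist_finder!
        @url.sub(/\?.*\z/, "")
      end
    end
  end

  class Site
    STRATEGIES = [Strategies::Pixiv].freeze

    def initialize(url)
      klass = STRATEGIES.find { |strategy| strategy.url_match?(url) } || Strategies::Base
      @strategy = klass.new(url)
    end

    # Site holds no site-specific logic; it forwards to the chosen strategy.
    def normalize_for_artist_finder!
      @strategy.normalize_for_artist_finder!
    end
  end
end

puts SketchSources::Site.new("http://www.pixiv.net/stacc/evazion?from=feed").normalize_for_artist_finder!
```

With this layout, supporting another site means adding a strategy class to the list; callers of `Site` do not change.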
Toks
f4529e73e3 Cache seiga and nijie sessions 2014-10-05 12:11:08 -04:00
evazion
ec0f226f46 Make the artist finder work with new Pixiv URLs. 2014-10-04 12:45:37 -05:00
evazion
964b5efcd3 Rewrite Pixiv small/medium images to full size images.
There are two kinds of thumbnails that need to be rewritten. First case:
new /img-master/ URLs need to be rewritten to /img-original/ URLs like this:

    http://i2.pixiv.net/c/600x600/img-master/img/2014/10/04/03/59/52/46337015_p0_master1200.jpg
    => http://i2.pixiv.net/img-original/img/2014/10/04/03/59/52/46337015_p0.png

This is what `rewrite_new_medium_images` does. In order to do this, it
has to use the Pixiv API to get the correct file extension.

Second case: Old small/medium size URLs need to be rewritten to full
size URLs like this:

    http://i2.pixiv.net/img18/img/evazion/14901720_m.png
    => http://i2.pixiv.net/img18/img/evazion/14901720.png

But when the medium size URL is actually for a manga image, it needs to be
rewritten to the big manga URL instead:

    http://i2.pixiv.net/img04/img/syounen_no_uta/46170939_m.jpg
    => http://i2.pixiv.net/img04/img/syounen_no_uta/46170939_big_p0.jpg

The URL alone doesn't reveal whether it's a manga image, so we have to
use the manga page count from either the HTML page or the API to
determine whether it's part of a manga gallery.

So in order to make this work, `rewrite_old_small_and_medium_images`
takes an `is_manga` flag. `Sources::Strategies::Pixiv#get` gets the
page count from the HTML and passes the `is_manga` flag on down through
the call chain until `rewrite_old_small_and_medium_images` gets it.

When `rewrite_old_small_and_medium_images` is called from
`Downloads::Strategies::Pixiv#rewrite_thumbnails`, the `is_manga` flag
isn't passed in because we didn't scrape the HTML. This causes
`rewrite_old_small_and_medium_images` to look it up in the API instead.
2014-10-04 12:45:37 -05:00
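The two rewrites described in commit 964b5efcd3 can be sketched with plain regular expressions. The method names below carry a `_sketch` suffix to mark them as hypothetical. Two simplifications are assumed: the caller supplies `is_manga` and the original file extension directly (the commit obtains them from the HTML page or the Pixiv API, which is not reproduced here), and the `_s` suffix for small images is a guess, since the commit message only shows `_m`.

```ruby
# Old-style thumbnail, e.g. .../img18/img/evazion/14901720_m.png
OLD_THUMB = %r{\A(?<dir>https?://i\d+\.pixiv\.net/img\d+/img/[^/]+/)(?<id>\d+)_(?:s|m)\.(?<ext>\w+)\z}

# New-style thumbnail, e.g. .../c/600x600/img-master/img/2014/10/04/03/59/52/46337015_p0_master1200.jpg
NEW_MEDIUM = %r{\A(?<host>https?://i\d+\.pixiv\.net)/c/\d+x\d+/img-master/(?<path>img/(?:\d+/){6})(?<id>\d+)_p(?<page>\d+)_master\d+\.\w+\z}

def rewrite_old_small_and_medium_sketch(url, is_manga)
  m = OLD_THUMB.match(url)
  return url unless m

  # Manga galleries use the big manga URL for the first page instead.
  suffix = is_manga ? "_big_p0" : ""
  "#{m[:dir]}#{m[:id]}#{suffix}.#{m[:ext]}"
end

def rewrite_new_medium_sketch(url, original_ext)
  m = NEW_MEDIUM.match(url)
  return url unless m

  # The thumbnail extension can differ from the original, so it is passed in.
  "#{m[:host]}/img-original/#{m[:path]}#{m[:id]}_p#{m[:page]}.#{original_ext}"
end

puts rewrite_new_medium_sketch("http://i2.pixiv.net/c/600x600/img-master/img/2014/10/04/03/59/52/46337015_p0_master1200.jpg", "png")
puts rewrite_old_small_and_medium_sketch("http://i2.pixiv.net/img18/img/evazion/14901720_m.png", false)
puts rewrite_old_small_and_medium_sketch("http://i2.pixiv.net/img04/img/syounen_no_uta/46170939_m.jpg", true)
```

URLs that match neither pattern are returned unchanged, so both helpers are safe to call on full-size URLs.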
evazion
36a78361d7 Normalize URLs to the mode=medium page correctly.
This handles a few new cases that weren't handled correctly previously.

* http://i1.pixiv.net/img-zip-ugoira/img/2014/10/03/17/29/16/46323924_ugoira1920x1080.zip
* http://i1.pixiv.net/c/600x600/img-master/img/2014/10/02/13/51/23/46304396_p0_master1200.jpg
* http://www.pixiv.net/member_illust.php?mode=manga&illust_id=18557054
* http://www.pixiv.net/member_illust.php?mode=manga_big&illust_id=18557054&page=1
* http://www.pixiv.net/i/18557054
2014-10-04 12:45:36 -05:00
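One way to handle the URL shapes listed in commit 36a78361d7 is to extract the illust ID and rebuild the page URL from it. The sketch below is an assumption about how that could be done, not the commit's code: the pattern list, the method names, and the exact form of the `mode=medium` URL (inferred from the `member_illust.php` URLs in the list) are all hypothetical.

```ruby
# Each pattern captures the illust ID from one of the URL shapes listed above.
ILLUST_ID_PATTERNS = [
  %r{/img-zip-ugoira/img/(?:\d+/){6}(\d+)_ugoira\d+x\d+\.zip\z},
  %r{/img-master/img/(?:\d+/){6}(\d+)_p\d+_master\d+\.\w+\z},
  %r{/member_illust\.php\?(?:.*&)?illust_id=(\d+)},
  %r{\Ahttps?://www\.pixiv\.net/i/(\d+)\z}
].freeze

def illust_id_from(url)
  ILLUST_ID_PATTERNS.each do |pattern|
    m = pattern.match(url)
    return m[1] if m
  end
  nil
end

def medium_page_url_sketch(url)
  id = illust_id_from(url)
  # Leave the URL alone when no ID can be found.
  return url unless id

  "http://www.pixiv.net/member_illust.php?mode=medium&illust_id=#{id}"
end

puts medium_page_url_sketch("http://www.pixiv.net/i/18557054")
```

Keeping ID extraction separate from URL construction means a new URL shape only requires one more pattern.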
evazion
f889dbf10f Add get_metadata_from_spapi! 2014-10-04 12:45:36 -05:00
evazion
7f98b370ec Fix scraping the Pixiv artist username.
The artist's username is no longer contained in the image thumbnail URL on the
HTML page. Get it from the Feed link instead.
2014-10-04 12:45:36 -05:00
evazion
74c116ffb7 Fix for scraping the manga page count.
The string for the page count has changed. It now looks like "複数枚投稿 3P"
on all Pixiv posts I've checked.
2014-10-04 12:45:36 -05:00
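Parsing the page count string quoted in commit 74c116ffb7 could be done along these lines. The method name and the regular expression are assumptions made for illustration; only the sample string "複数枚投稿 3P" comes from the commit message.

```ruby
# Returns the page count as an Integer, or nil when the text has no
# gallery marker. [[:space:]] is used so that any whitespace between
# the label and the number is tolerated.
def manga_page_count_sketch(text)
  m = /複数枚投稿[[:space:]]*(\d+)P/.match(text)
  m ? m[1].to_i : nil
end

puts manga_page_count_sketch("複数枚投稿 3P")
```

Returning `nil` rather than a default number lets the caller distinguish a single-image post from a page whose markup has changed again.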
Toks
6dce66f33d Cache pixiv session 2014-09-29 01:38:53 -04:00
Toks
5aca6aa7c9 Fix pixiv gallery page counts 2014-07-16 11:58:43 -04:00
Toks
224da8a7da Prevent pixiv downloader from returning wrong image 2014-06-25 11:14:08 -04:00
Toks
9621ec7dac Support translating Pixiv "x users iri" tags 2014-06-16 14:20:28 -04:00
Toks
be28a8e624 Fix Seiga sample/thumbnail rewriting 2014-06-13 16:59:08 -04:00
Toks
3230ab8781 Add warning when Pixiv post is a gallery of multiple images 2014-06-13 16:33:38 -04:00
Toks
7ca7ac2709 #1866: Support Nijie source data getting 2014-06-03 18:42:24 -04:00
Toks
d092ea0094 fixes #1207 2014-05-29 23:43:19 -04:00
Toks
eb81f06eb2 merge translated tags branch 2014-05-29 23:11:34 -04:00
Toks
4f8b455830 fixes #2168 2014-05-29 18:46:09 -04:00
Toks
5f70768962 #1866: Support HTTPS urls 2014-05-23 14:15:33 -04:00
Toks
38c0e01f9b Support referrer matching for seiga and da 2014-05-23 14:15:23 -04:00
Toks
b18bb73f4b Implementation for #2141 2014-05-22 20:07:15 -04:00
Toks
a3d120c632 #1866: Support HTTPS urls 2014-05-15 23:35:57 -04:00
Toks
0a75402cc7 Support referrer matching for seiga and da 2014-05-08 20:25:11 -04:00
Toks
47f56cd19d #1866: Fix deviantart regex again and support alternate url style 2014-05-04 15:54:11 -04:00
Toks
bb07dc429b Seiga: fix source uploads still not working in some cases 2014-04-30 15:18:53 -04:00
Toks
884be2b711 Seiga: fix source uploads not working 2014-04-30 14:40:21 -04:00
Toks
281c7e4bf7 Seiga: fix getting tags 2014-04-30 14:32:14 -04:00
Toks
899fd8f71f Don't instantly make a request to get info when using bookmarklet
Currently the request is made twice: once when the page first loads
(this result isn't used) and a second time asynchronously with
JavaScript (this result is used). This commit removes the first request,
improving upload page load time.
2014-04-30 14:28:07 -04:00
Toks
ce2bcc4570 Seiga: support alternate type of direct link url 2014-04-30 12:31:41 -04:00
Toks
b559f11c99 Seiga: fix getting artist name 2014-04-30 12:29:35 -04:00
Toks
0507064004 #1866: Add nico seiga support and fix various seiga bugs
* Support rewriting source when user uploads from a thumbnail URL or
  HTML page URL
* Fix bug where site did not log in correctly
* Fix bug where the image URL couldn't be extracted from the page if the
  image was rated as adults only on Seiga
* Normalize direct image URL to HTML page URL so tags, etc., can be
  extracted
2014-04-29 11:46:08 -04:00