Files
danbooru/app/models/artist_url.rb
evazion 57bb51621e Add debugger gem.
Fix random VCR failures in Pixiv tests.

Sometimes tests randomly fail because the PHPSESSID they use in their
HTTP requests to Pixiv is different than the one that was originally
recorded by VCR. This causes VCR to complain that the requests don't
match.

This is caused by the PHPSESSID being globally cached in Memcache.
Depending on the order the tests run in (which is random), one set of
tests can use a PHPSESSID that was recorded for a /different/ set of
tests.

Improve Pixiv URL matching.

* Allow URLs that are missing the http:// part. These are sometimes seen
  in artist entries.
* Ignore URLs from random Pixiv domains such as dic.pixiv.net,
  blog.pixiv.net, etc. These are also sometimes in artist entries.

Improve normalize_for_artist_finder! URL matching.

* Normalize www.pixiv.net/stacc/username URLs.
* Correctly normalize URLs that are missing the illust ID part on the end
  (i.e. http://i2.pixiv.net/img04/img/syounen_no_uta/). These are common
  in artist entries.

Match URLs strictly when normalizing for artist entries.

Only normalize Pixiv URLs that strictly match a known format. Pass any
unrecognized URLs through without attempting to normalize them, just to
be safe.

Normalize URLs when saving artist entries.
2014-12-03 13:16:05 -08:00

49 lines
1.4 KiB
Ruby

class ArtistUrl < ActiveRecord::Base
before_save :initialize_normalized_url, on: [ :create ]
before_save :normalize
validates_presence_of :url
belongs_to :artist
attr_accessible :url, :artist_id, :normalized_url
def self.normalize(url)
if url.nil?
nil
else
url = url.gsub(/^https:\/\//, "http://")
url = url.gsub(/^http:\/\/blog\d+\.fc2/, "http://blog.fc2")
url = url.gsub(/^http:\/\/blog-imgs-\d+\.fc2/, "http://blog.fc2")
url = url.gsub(/^http:\/\/blog-imgs-\d+-\w+\.fc2/, "http://blog.fc2")
url = Sources::Site.new(url).normalize_for_artist_finder!
url = url.gsub(/\/+\Z/, "")
url + "/"
end
end
def self.normalize_for_search(url)
if url =~ /\.\w+\Z/ && url =~ /\w\/\w/
url = File.dirname(url)
end
url = url.gsub(/^https:\/\//, "http://")
url = url.gsub(/^http:\/\/blog\d+\.fc2/, "http://blog*.fc2")
url = url.gsub(/^http:\/\/blog-imgs-\d+\.fc2/, "http://blog*.fc2")
url = url.gsub(/^http:\/\/blog-imgs-\d+-\w+\.fc2/, "http://blog*.fc2")
url = url.gsub(/^http:\/\/img\d+\.pixiv\.net/, "http://img*.pixiv.net")
url = url.gsub(/^http:\/\/i\d+\.pixiv\.net\/img\d+/, "http://*.pixiv.net/img*")
end
def normalize
if !Sources::Site.new(normalized_url).normalized_for_artist_finder?
self.normalized_url = self.class.normalize(url)
end
end
def initialize_normalized_url
self.normalized_url = url
end
def to_s
url
end
end