Introduce a Source::URL class for parsing URLs from source sites. Refactor the Twitter
source strategy to use it.
This is the first step towards factoring all the URL parsing logic out of source
strategies and moving it to subclasses of Source::URL. Each site will have a subclass
of Source::URL dedicated to parsing URLs from that site. Source strategies will use
these classes to extract information from URLs.
This is to simplify source strategies. Most sites have many different URL formats we have
to parse or rewrite, and handling all these different cases tends to make source
strategies very complex. Isolating the URL parsing logic from the site scraping logic
should make source strategies easier to maintain.
Do one less API call when fetching the image URLs for a Pixiv post. The
`is_ugoira?` check in `image_urls` caused us to do an extra API call
when fetching the image URLs for a non-ugoira post.
API calls to Pixiv take around ~800ms, so this reduces minimum upload
time for Pixiv posts from ~1.6 seconds (two calls) to ~0.8 seconds.
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
the MediaAsset is created, not when the post is created. This way it's
possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
with the ugoira frame data already attached to it, instead of returning it
in the `data` field then passing it around separately in the `context`
field of the upload.
Add support for using a proxy for HTTP requests. Only used for external
requests, such as downloading files or talking to source sites such as
Pixiv or Twitter, not for internal requests, such as talking to IQDB or
Reportbooru.
Add site icons linking to all the artist's sites in the fetch source
data box.
Some artist entries have a large number of URLs. Various heuristics are
applied to try to present the most useful URLs first. Dead URLs and
redundant URLs (Pixiv stacc and Twitter intent URLs) are filtered out.
Remaining URLs are sorted first by site (to put sites like Pixiv and
Twitter first), then by URL (to break ties when an artist has multiple
accounts on the same site).
Some sites have shitty hard-to-read icons. It can't be helped. The icons
are the official favicons of each site.
Regression caused by the switch from the mobile API to the Ajax API. In
the Ajax API, commentaries have /jump.php?<url> links that we have to strip out.
Fix the Pixiv API no longer working by rewriting the Pixiv strategy to
use the Ajax API instead of the mobile API.
Before we could authenticate in the mobile API by using the OAuth 2.0
grant_type=password authentication flow. This no longer works. Now it
requires logging in through a HTML page, which is protected by Google
reCaptcha. This makes using the mobile API infeasible.
Instead we switch to the Ajax API, which only needs a PHPSESSID to
authenticate. This can be obtained by logging in manually and using the
devtools to extract the cookie.
This also temporarily removes support for Pixiv novels. This should be
moved to a separate source strategy.
Get rid of `normalized_for_artist_finder?` and `normalizable_for_artist_finder?`.
This was legacy bullshit that was originally designed to avoid API calls
when saving artist entries containing old Pixiv direct image urls that
had already been normalized, or that couldn't be normalized because they
were bad id.
Nowadays we store profile urls in artist entries instead of direct image
urls, so we don't normally need to do any API calls to normalize the
profile url. Strategies should take care to avoid triggering API calls
inside `profile_url` when possible.
* Move the source normalization logic out of the post model
and into individual sources' strategies.
* Rewrite normalization tests to be handled into each source's test,
and expand them significantly. Previously we were only testing
a very small subset of domains and variants.
* Fix up normalization for several sites.
* Normalize fav.me urls into normal deviantart urls.
* Tighten up illust id parsing to avoid misparsing ids from
non-illust urls (sketch urls and novel urls).
* Move id parsing tests from post_test.rb to sources/pixiv_test.rb.
* Drop support for touch.pixiv.net urls. These urls are no longer used
by Pixiv and aren't present as the source of any posts on Danbooru.
Bug: Uploading bad pixiv id images failed because the pixiv strategy
raised a BadIDError exception when the upload service checked for the
ugoira frame data.
* Rename `unique_id` to `tag_name`.
* Add `other_names` and `profile_urls` methods that sources can override
to provide extra names or urls when creating new artist entries.
* Normalize spaces to underscores when saving other names. Preserve case
since case can be significant.
* Fix WikiPage#other_names_include to search case-insensitively (note:
this prevents using the index).
* Fix sources to return the raw tags in `#tags` and the normalized tags
in `#normalized_tags`. The normalized tags are the tags that will be
matched against other names.
Fix sources choosing the wrong strategy when the referer belongs to a
different site (for example, when uploading a twitter post with a pixiv
referer).
* Fix `match?` to only consider the main url, not the referer.
* Change `match?` to match against a list of domains given by the `domains` method.
* Change `match?` to an instance method.
* On the /uploads/new page, instead of just showing a "This post has
probably already been uploaded" message, show the actual thumbnails of
posts having the same source as what the user is trying to upload.
* Move the iqdb results section up top, beside the related posts section.