Stop the last remaining uses of the `artist_urls.normalized_url` column.
It's already no longer used by the artist finder. The only remaining
uses were by API users. Those users should use the `url` column instead.
Show a warning when creating a duplicate artist; that is, when adding a
URL that already belongs to another artist.
This is a soft warning rather than a hard error because there are some
cases where multiple artists legitimately share the same site or account.
Change how artist URLs are normalized in artist entries. Don't try to secretly
convert image URLs to profile URLs in artist entries. For example, if someone puts a
Pixiv image URL in an artist entry, don't secretly try to fetch the source and
convert it into a profile URL in the `normalized_url` field.
We did this because years ago, it was standard practice to put image URLs in artist
entries. Pixiv image URLs used to contain the artist's username, so we used to put
image URLs in artist entries for artist finding purposes. But Pixiv changed it so
that image URLs no longer contained the username, so we dealt with it by adding a
`normalized_url` column to artist_urls and secretly converting image URLs to profile
URLs in this field. But this is no longer necessary because now we don't normally put
image URLs in artist entries in the first place.
Now the `profile_url` method in `Source::URL` is used to normalize URLs in artist
entries. This lets us parse various profile URL formats and normalize them into a
single canonical form.
This also removes the `normalize_for_artist_finder` method from source strategies.
Instead the `profile_url` method is used for artist finding purposes. So the profile
URL returned by the source strategy needs to be the same as the URL in the artist
entry in order for artist finding to work.
* Remove unnecessary trailing slashes when artist URLs are saved.
* Automatically add `http://` to new artist URLs if it's missing (before
this was an error; now it's automatically fixed).
Fix URLs being normalized after checking for duplicates rather than
before, which meant that URLs that differed in capitalization weren't
detected as duplicates.
Add site icons linking to all the artist's sites in the fetch source
data box.
Some artist entries have a large number of URLs. Various heuristics are
applied to try to present the most useful URLs first. Dead URLs and
redundant URLs (Pixiv stacc and Twitter intent URLs) are filtered out.
Remaining URLs are sorted first by site (to put sites like Pixiv and
Twitter first), then by URL (to break ties when an artist has multiple
accounts on the same site).
Some sites have shitty hard-to-read icons. It can't be helped. The icons
are the official favicons of each site.
- Converts scheme and hostname to lowercase
- Converts unicode hostnames into Punycode
This all gets done before the normalized URL gets assigned.
Additionally, this removes the dead commented out line for Nicoseiga.
The problem was that the Addressable parser does not catch all invalid
URL cases, so some extra checks were added in.
- hostname must contain a dot
This accounts for URLs of the following type:
http://http://something.com
which has a hostname of http.
The artist URL tests were also updated with cases which test all validation
errors.
Get rid of `normalized_for_artist_finder?` and `normalizable_for_artist_finder?`.
This was legacy bullshit that was originally designed to avoid API calls
when saving artist entries containing old Pixiv direct image urls that
had already been normalized, or that couldn't be normalized because they
were bad id.
Nowadays we store profile urls in artist entries instead of direct image
urls, so we don't normally need to do any API calls to normalize the
profile url. Strategies should take care to avoid triggering API calls
inside `profile_url` when possible.
- The only string works much the same as before with its comma separation
-- Nested includes are indicated with square brackets "[ ]"
-- The nested include is the value immediately preceding the square brackets
-- The only string is the comma separated string inside those brackets
- Default includes are split between format types when necessary
-- This prevents unnecessary includes from being added on page load
- Available includes are those items which are allowed to be accessible to the user
-- Some aren't because they are sensitive, such as the creator of a flag
-- Some aren't because the number of associated items is too large
- The amount of times the same model can be included to prevent recursions
-- One exception is the root model may include the same model once
--- e.g. the user model can include the inviter which is also the user model
-- Another exception is if the include is a has_many association
--- e.g. artist urls can include the artist, and then artist urls again
Convert to an autosave association on urls. This ensures that when we
save the artist we only validate the added urls, not bad urls that we're
trying to remove, and that url validation errors are propagated up to
the artist object.
This also fixes invalid urls being saved in the artist history despite
validation failing (#3720).
* `legacy_normalize` came from c6012535, which is no longer a problem.
* `normalize_for_search` is only used for "[mass edit]" links
in artist entries. These links are a shortcut for performing a
`-artist_name source:<artist_url> -> artist_name` mass edit to tag
untagged artists, but this won't work for most sites these days.