Commit Graph

81 Commits

Author SHA1 Message Date
nonamethanks
8edd5dd810 Add furaffinity support 2022-04-27 03:47:59 +02:00
nonamethanks
c9227645d9 Add anifty.jp support 2022-04-18 16:50:26 +02:00
nonamethanks
9612578fcb Add Booth support 2022-04-16 17:52:18 +02:00
evazion
8ef72d59c1 artists: allow url_matches param to take multiple urls.
Pass as an array or space-separated string:

* https://danbooru.donmai.us/artists?search[url_matches]=https://www.pixiv.net/en/users/32777+https://www.pixiv.net/en/users/3584828
* https://danbooru.donmai.us/artists?search[url_matches][]=https://www.pixiv.net/en/users/32777&search[url_matches][]=https://www.pixiv.net/en/users/3584828
2022-04-03 02:54:30 -05:00
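As a rough illustration of the two accepted forms described in 8ef72d59c1, the sketch below flattens either an array or a space-separated string into one list of URLs. The helper name `parse_url_matches` is hypothetical and is not taken from the Danbooru codebase.

```ruby
# Hypothetical helper (not Danbooru's actual code): accept either an array
# of URLs or one whitespace-separated string and return a flat list.
def parse_url_matches(param)
  Array(param)
    .flat_map { |value| value.to_s.split(/\s+/) } # split each entry on whitespace
    .reject(&:empty?)                             # drop blanks from leading spaces
    .uniq                                         # ignore repeated URLs
end
```

A `+` in a query string decodes to a space, which is why the first example URL in the commit message ends up as a space-separated string by the time it reaches a helper like this.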
evazion
0d480eb832 artist urls: stop using normalized_url.
Stop the last remaining uses of the `artist_urls.normalized_url` column.
It's already no longer used by the artist finder. The only remaining
uses were by API users. Those users should use the `url` column instead.
2022-04-02 23:58:01 -05:00
evazion
b8f154d301 artists: add more artist url icons. 2022-03-30 22:04:24 -05:00
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
912e996027 Fix #4470: Check URLs for duplicates when creating artists
Show a warning when creating a duplicate artist; that is, when adding a
URL that already belongs to another artist.

This is a soft warning rather than a hard error because there are some
cases where multiple artists legitimately share the same site or account.
2022-03-18 17:10:23 -05:00
evazion
10dac3ee51 artists: normalize urls added to artist entries.
When a URL is added to an artist entry, normalize it to a standard form.

Artist URLs have both a `url` column and a `normalized_url` column. The
`normalized_url` is used for artist finding and the `url` is the raw URL
entered by the user. Previously only the `normalized_url` field was
normalized; now the URL entered by the user is also converted to a
normalized form.

This means that if a URL like one of these is added to an artist entry:

* http://www.pixiv.net/member.php?id=1234
* http://www.pixiv.net/en/users/1234
* http://www.twitter.com/DanbooruBot/
* http://mobile.twitter.com/DanbooruBot/

It will get normalized to this:

* https://www.pixiv.net/users/1234
* https://twitter.com/DanbooruBot

This fixes problems with duplicate URLs being added to artist entries
because URLs weren't normalized to a single form.
2022-03-18 02:06:50 -05:00
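The normalization described in 10dac3ee51 can be sketched with Ruby's standard `URI` library. This is only an approximation under stated assumptions: the method name `normalize_profile_url` and the exact patterns are hypothetical, and only the Pixiv and Twitter forms listed in the commit message are handled.

```ruby
require "uri"

# Hypothetical sketch; the real logic lives in Danbooru's own URL classes.
# Unrecognized or unparseable URLs are returned unchanged.
def normalize_profile_url(url)
  uri = URI.parse(url)
  host = uri.host.to_s.downcase
  path = uri.path.to_s

  case host
  when "pixiv.net", "www.pixiv.net"
    # Old style: /member.php?id=1234. New style: /users/1234 or /en/users/1234.
    id = URI.decode_www_form(uri.query.to_s).to_h["id"] if path == "/member.php"
    id ||= path[%r{\A/(?:en/)?users/(\d+)/?\z}, 1]
    id ? "https://www.pixiv.net/users/#{id}" : url
  when "twitter.com", "www.twitter.com", "mobile.twitter.com"
    name = path[%r{\A/(\w+)/?\z}, 1]
    name ? "https://twitter.com/#{name}" : url
  else
    url
  end
rescue URI::InvalidURIError
  url
end
```

With this sketch, all four input URLs from the commit message map onto the two normalized forms it lists.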
evazion
03d2a86ef1 artists: normalize fc2.com profile urls. 2022-03-17 19:42:57 -05:00
evazion
cf8b8207e2 artists: change how artist urls are normalized.
Change how artist URLs are normalized in artist entries. Don't try to secretly
convert image URLs to profile URLs in artist entries. For example, if someone puts a
Pixiv image URL in an artist entry, don't secretly try to fetch the source and
convert it into a profile URL in the `normalized_url` field.

We did this because years ago, it was standard practice to put image URLs in artist
entries. Pixiv image URLs used to contain the artist's username, so we used to put
image URLs in artist entries for artist finding purposes. But Pixiv changed it so
that image URLs no longer contained the username, so we dealt with it by adding a
`normalized_url` column to artist_urls and secretly converting image URLs to profile
URLs in this field. But this is no longer necessary because now we don't normally put
image URLs in artist entries in the first place.

Now the `profile_url` method in `Source::URL` is used to normalize URLs in artist
entries. This lets us parse various profile URL formats and normalize them into a
single canonical form.

This also removes the `normalize_for_artist_finder` method from source strategies.
Instead the `profile_url` method is used for artist finding purposes. So the profile
URL returned by the source strategy needs to be the same as the URL in the artist
entry in order for artist finding to work.
2022-03-13 03:54:17 -05:00
evazion
28971fe103 sources: factor out site_name method. 2022-03-11 23:20:53 -06:00
nonamethanks
a6549bc6fe Add Fantia support
Also fixes a regression in 74fdeef10c that stopped Mastodon URLs from
being given the right priority.
2022-03-10 17:43:32 +01:00
evazion
5837b614d4 artists: fix exception on show page when artist has invalid URLs.
Fix an exception on the artist show page when the artist entry contained invalid URLs such
as `http://ttp://album.yahoo.co.jp/photos/my/8027988`. Caused by `ArtistUrl#domain`
returning nil for certain invalid URLs, which caused `Artist#sorted_urls` to blow up.

ref: https://danbooru.donmai.us/forum_posts/206488
2022-02-25 02:06:57 -06:00
evazion
26f4cf1ebd sources: factor out Source::URL::Skeb. 2022-02-25 02:06:57 -06:00
evazion
c5777f360e artist urls: normalize trailing slashes and missing http://.
* Remove unnecessary trailing slashes when artist URLs are saved.
* Automatically add `http://` to new artist URLs if it's missing
  (previously this was an error; now it's automatically fixed).
2022-02-22 00:17:53 -06:00
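The two rules from c5777f360e could look roughly like the following. The helper name `tidy_artist_url` is hypothetical, and the sketch assumes that every trailing slash counts as unnecessary, which the commit message does not spell out.

```ruby
# Hypothetical helper illustrating the two rules; not the actual model code.
def tidy_artist_url(url)
  url = url.strip
  # Prepend http:// when the URL has no scheme at all.
  url = "http://#{url}" unless url.match?(%r{\A[a-z][a-z0-9+.-]*://}i)
  # Assumption: all trailing slashes are treated as unnecessary.
  url.sub(%r{/+\z}, "")
end
```

For example, `tidy_artist_url("www.example.com/user/")` gives `"http://www.example.com/user"`, while a URL that already has a scheme and no trailing slash is returned as-is.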
evazion
60a26af6e3 rails: add 'URL' inflection.
Make it so we can write `ArtistURL` instead of `ArtistUrl`.
2022-02-22 00:17:53 -06:00
evazion
21c0d55aa4 Fix #5002: "Urls url has already been taken" when submitting duplicate urls with different capitalization
Fix URLs being normalized after checking for duplicates rather than
before, which meant that URLs that differed in capitalization weren't
detected as duplicates.
2022-02-08 19:15:55 -06:00
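The ordering problem fixed in 21c0d55aa4 can be shown with a small sketch: duplicates that differ only in capitalization are found only if normalization runs before the comparison. Both helper names are hypothetical, and the sketch assumes that only the scheme and host are case-insensitive.

```ruby
# Hypothetical sketch: lowercase the scheme and host, leave the path alone.
def normalize_case(url)
  url.sub(%r{\A([a-z][a-z0-9+.-]*://)([^/?#]*)}i) do
    "#{Regexp.last_match(1).downcase}#{Regexp.last_match(2).downcase}"
  end
end

# Group URLs by their normalized form and return the groups that collide.
def find_duplicates(urls)
  urls.group_by { |url| normalize_case(url) }.values.select { |group| group.size > 1 }
end
```

Comparing the raw strings `"http://Example.com/Artist"` and `"http://example.com/Artist"` finds no duplicate, whereas `find_duplicates` puts them in the same group.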
evazion
a7dc05ce63 Enable frozen string literals.
Make all string literals immutable by default.
2021-12-14 21:33:27 -06:00
nonamethanks
060223c9e2 Add Plurk support 2021-11-01 16:21:27 +01:00
nonamethanks
043f2fb124 Add Foundation support 2021-11-01 01:39:56 +01:00
evazion
79fdfa86ae Fix various rubocop warnings. 2021-09-27 00:46:13 -05:00
evazion
1716cc5bf9 artists: add more artist url icons. 2021-03-08 01:30:02 -06:00
evazion
7b60a476e5 sources: add artist profile links to fetch source data box.
Add site icons linking to all the artist's sites in the fetch source
data box.

Some artist entries have a large number of URLs. Various heuristics are
applied to try to present the most useful URLs first. Dead URLs and
redundant URLs (Pixiv stacc and Twitter intent URLs) are filtered out.
Remaining URLs are sorted first by site (to put sites like Pixiv and
Twitter first), then by URL (to break ties when an artist has multiple
accounts on the same site).

Some sites have shitty hard-to-read icons. It can't be helped. The icons
are the official favicons of each site.
2021-02-26 01:24:30 -06:00
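The heuristics described in 7b60a476e5 (filter dead and redundant URLs, then sort by site and by URL) might be sketched as follows. The helper names, the hash shape with `:url` and `:active` keys, and the default priority list are all assumptions made for illustration.

```ruby
require "uri"

# Hypothetical: rank a URL by its site's position in the priority list.
# Unlisted or unparseable URLs sort last.
def site_rank(url, priority)
  host = URI.parse(url).host.to_s.downcase
  priority.index { |site| host == site || host.end_with?(".#{site}") } || priority.size
rescue URI::InvalidURIError
  priority.size
end

# Hypothetical: Pixiv stacc and Twitter intent URLs count as redundant.
def redundant_url?(url)
  url.match?(%r{pixiv\.net/stacc/|twitter\.com/intent/})
end

# Drop dead and redundant links, then sort by site rank, then by URL.
def sorted_links(links, priority: %w[pixiv.net twitter.com])
  links
    .select { |link| link[:active] && !redundant_url?(link[:url]) }
    .sort_by { |link| [site_rank(link[:url], priority), link[:url]] }
end
```

Sorting on the pair `[rank, url]` is what breaks ties when an artist has several accounts on the same site.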
evazion
ee4516f5fe searchable: refactor searchable_includes.
Pass searchable associations directly to search_attributes instead of
defining them separately in searchable_includes.
2020-12-16 23:57:07 -06:00
evazion
e771c0fca8 searchable: don't automatically include id, created_at, updated_at.
Don't make search methods on models call super in order to search
certain default attributes (id, created_at, updated_at). Simplifies some
magic.
2020-12-16 23:57:07 -06:00
evazion
8d87b1a0c0 models: fix deprecated errors[:base] << "message" calls.
Replace the idiom `errors[:base] << "message"` with
`errors.add(:base, "message")`. The former is deprecated in Rails 6.1.
2020-12-13 04:10:48 -06:00
evazion
92b6204a77 Merge pull request #4630 from nonamethanks/fix_fc2
Fix blog.fc2 urls matching wrong artists
2020-12-05 12:53:51 -06:00
nonamethanks
32f4cb1236 Fix blog.fc2 urls matching wrong artists 2020-12-04 00:17:02 +01:00
BrokenEagle
c4009efccd Convert models to use new search includes mechanism 2020-07-27 19:29:18 +00:00
evazion
3d90414e06 Merge pull request #4487 from BrokenEagle/fix-invalid-url
Fix invalid artist URLs being allowed
2020-06-29 17:46:13 -05:00
evazion
8eac82a971 pixiv: fix regression with new user profile urls.
* Update tests to use new Pixiv profile urls.
* Fix issue with artist finder not working when given direct image or
  html page urls.
2020-06-24 02:41:11 -05:00
BrokenEagle
ed9135bcf3 Perform some scheme and hostname normalization on the URL itself
- Converts scheme and hostname to lowercase
- Converts unicode hostnames into Punycode

This all gets done before the normalized URL gets assigned.

Additionally, this removes the dead commented out line for Nicoseiga.
2020-05-31 06:34:06 +00:00
BrokenEagle
c21af0c853 Fix invalid artist URLs being allowed
The problem was that the Addressable parser does not catch all invalid
URL cases, so some extra checks were added in.

- hostname must contain a dot

This accounts for URLs of the following type:

http://http://something.com

which has a hostname of http.

The artist URL tests were also updated with cases which test all validation
errors.
2020-05-31 06:34:06 +00:00
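The extra check from c21af0c853 (a hostname must contain a dot) can be approximated as below. The commit refers to the Addressable parser; this sketch swaps in Ruby's standard `URI` library instead, and the method name `valid_artist_url?` and the http/https restriction are assumptions.

```ruby
require "uri"

# Hypothetical validation sketch using stdlib URI rather than Addressable.
def valid_artist_url?(url)
  uri = URI.parse(url)
  # Assumption: only http and https URLs are accepted.
  return false unless %w[http https].include?(uri.scheme)

  # A parser may accept a host with no dot (such as "http"), so reject it here.
  uri.host.to_s.include?(".")
rescue URI::InvalidURIError
  false
end
```

A URL such as `http://localhost/x` parses without error but fails the dot check, which is the same class of problem as the doubled-scheme URL in the commit message.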
evazion
88d9fc4e5e sources: simplify artist finder url normalization.
Get rid of `normalized_for_artist_finder?` and `normalizable_for_artist_finder?`.
This was legacy bullshit that was originally designed to avoid API calls
when saving artist entries containing old Pixiv direct image urls that
had already been normalized, or that couldn't be normalized because they
were bad id.

Nowadays we store profile urls in artist entries instead of direct image
urls, so we don't normally need to do any API calls to normalize the
profile url. Strategies should take care to avoid triggering API calls
inside `profile_url` when possible.
2020-05-29 15:35:15 -05:00
BrokenEagle
63b3503bfc Add ability to use nested only parameter
- The only string works much the same as before with its comma separation
-- Nested includes are indicated with square brackets "[ ]"
-- The nested include is the value immediately preceding the square brackets
-- The only string is the comma separated string inside those brackets
- Default includes are split between format types when necessary
-- This prevents unnecessary includes from being added on page load
- Available includes are those items which are allowed to be accessible to the user
-- Some aren't because they are sensitive, such as the creator of a flag
-- Some aren't because the number of associated items is too large
- The number of times the same model can be included is limited to prevent recursion
-- One exception is the root model may include the same model once
--- e.g. the user model can include the inviter which is also the user model
-- Another exception is if the include is a has_many association
--- e.g. artist urls can include the artist, and then artist urls again
2020-02-12 23:58:53 +00:00
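The bracket syntax described in 63b3503bfc can be illustrated with a small recursive parser. This is a sketch of the syntax only: the method names and the nested-hash output are hypothetical, and none of the permission or recursion-limit rules from the commit are modelled.

```ruby
# Hypothetical parser: turns "id,name,urls[id,url]" into a nested hash.
def parse_only(string)
  tokens = string.scan(/[^,\[\]\s]+|[\[\]]/) # names and brackets; commas are skipped
  tree = parse_only_fields(tokens)
  raise ArgumentError, "unexpected ']'" unless tokens.empty?

  tree
end

def parse_only_fields(tokens)
  fields = {}
  last = nil
  until tokens.empty?
    case (token = tokens.shift)
    when "["
      # The nested include is the name immediately preceding the bracket.
      raise ArgumentError, "'[' must follow a name" unless last

      fields[last] = parse_only_fields(tokens)
      raise ArgumentError, "missing ']'" unless tokens.shift == "]"

      last = nil
    when "]"
      tokens.unshift(token) # let the caller consume the closing bracket
      return fields
    else
      last = token
      fields[last] = {}
    end
  end
  fields
end
```

Calling `parse_only("id,name,urls[id,url]")` yields a hash whose `"urls"` key holds its own hash of `"id"` and `"url"`, mirroring the nesting in the string.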
evazion
5a4b24f6a0 pixiv: normalize new profile urls.
ref: https://danbooru.donmai.us/forum_topics/9127?page=290#forum_post_162222
2020-01-08 17:33:55 -06:00
evazion
309821bf73 rubocop: fix various style issues. 2019-12-22 21:23:37 -06:00
evazion
7b8584e3b0 Model#search: refactor searching for attributes. 2019-08-29 20:44:33 -05:00
evazion
03cc3dfa50 artists: fix editing invalid urls in artist entries (fix #3720, #3927, #3781)
Convert to an autosave association on urls. This ensures that when we
save the artist we only validate the added urls, not bad urls that we're
trying to remove, and that url validation errors are propagated up to
the artist object.

This also fixes invalid urls being saved in the artist history despite
validation failing (#3720).
2018-10-04 19:49:16 -05:00
evazion
3afc0b3a78 artist urls: add more url search params for /artist_urls.
Adds these search params:

* /artist_urls?search[url]=...
* /artist_urls?search[url_eq]=...
* /artist_urls?search[url_not_eq]=...
* /artist_urls?search[url_like]=...
* /artist_urls?search[url_ilike]=...
* /artist_urls?search[url_not_like]=...
* /artist_urls?search[url_not_ilike]=...
* /artist_urls?search[url_regex]=...
* /artist_urls?search[url_not_regex]=...

and likewise for normalized_url.
2018-09-15 19:58:54 -05:00
evazion
c06af060f9 artist urls: add artist, url_matches search params to /artist_urls. 2018-09-15 19:58:31 -05:00
evazion
1fce794b99 artist urls: add /artist_urls index page. 2018-09-15 19:58:05 -05:00
evazion
99a5e885e0 artist_url.rb: remove legacy artist url normalization code.
* `legacy_normalize` came from c6012535, which is no longer a problem.

* `normalize_for_search` is only used for "[mass edit]" links
in artist entries. These links are a shortcut for performing a
`-artist_name source:<artist_url> -> artist_name` mass edit to tag
untagged artists, but this won't work for most sites these days.
2018-09-07 12:55:51 -05:00
evazion
8e7dd9e97f artist_url.rb: remove unnecessary deviantart profile url normalization.
This is now handled in the source strategy.
2018-09-07 12:26:50 -05:00
evazion
aee1906761 Fix #3738: Artist URL search should be case-insensitive for domains. 2018-09-05 19:14:24 -05:00
Albert Yi
762dc3da24 Refactor sources 2018-08-24 12:10:51 -07:00
Albert Yi
95b72f5f5c normalize https into http for artist urls 2018-07-27 14:25:47 -07:00
Albert Yi
135b97d511 additional fixes for deviantart artist search (#3771) 2018-07-27 12:31:26 -07:00
Albert Yi
77854349e5 testing 2018-07-26 18:11:19 -07:00