Commit Graph

127 Commits

Author SHA1 Message Date
evazion
f8aa985a16 Fix #4908: Prevent artist entries from being made on disambiguation tags.
Don't allow artist entries to be created for deprecated tags.
2022-04-30 15:21:10 -05:00
evazion
76d9e86724 Fix #5140: Unexpected error: PublicSuffix::DomainInvalid for searching some newgrounds urls in /artists
When the artist name couldn't found for a Newgrounds URL, for example
for `https://www.newgrounds.com/dump/item`, then the `profile_url`
method erroneously returned `https://.newgrounds.com`. This led to an
error later on when the artist finder tried to parse the invalid URL.

Also fix `strategy_should_work` to test that the profile URL is a valid
URL, and not to try to download the file when image_urls is empty.
2022-04-22 23:16:41 -05:00
evazion
0d480eb832 artist urls: stop using normalized_url.
Stop the last remaining uses of the `artist_urls.normalized_url` column.
It's already no longer used by the artist finder. The only remaining
uses were by API users. Those users should use the `url` column instead.
2022-04-02 23:58:01 -05:00
evazion
231075fb49 artists: fix artist finder to return nothing if it finds too many duplicates 2022-03-26 15:08:55 -05:00
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
455ee9a52a fc2: parse more url types. 2022-03-18 02:06:30 -05:00
evazion
cf8b8207e2 artists: change how artist urls are normalized.
Change how artist URLs are normalized in artist entries. Don't try to secretly
convert image URLs to profile URLs in artist entries. For example, if someone puts a
Pixiv image URL in an artist entry, don't secretly try to fetch the source and
convert it into a profile URL in the `normalized_url` field.

We did this because years ago, it was standard practice to put image URLs in artist
entries. Pixiv image URLs used to contain the artist's username, so we used to put
image URLs in artist entries for artist finding purposes. But Pixiv changed it so
that image URLs no longer contained the username, so we dealt with it by adding a
`normalized_url` column to artist_urls and secretly converting image URLs to profile
URLs in this field. But this is no longer necessary because now we don't normally put
image URLs in artist entries in the first place.

Now the `profile_url` method in `Source::URL` is used to normalize URLs in artist
entries. This lets us parse various profile URL formats and normalize them into a
single canonical form.

This also removes the `normalize_for_artist_finder` method from source strategies.
Instead the `profile_url` method is used for artist finding purposes. So the profile
URL returned by the source strategy needs to be the same as the URL in the artist
entry in order for artist finding to work.
2022-03-13 03:54:17 -05:00
evazion
c5777f360e artist urls: normalize trailing slashes and missing http://.
* Remove unnecessary trailing slashes when artist URLs are saved.
* Automatically add `http://` to new artist URLs if it's missing (before
  this was an error; now it's automatically fixed).
2022-02-22 00:17:53 -06:00
evazion
60a26af6e3 rails: add 'URL' inflection.
Make it so we can write `ArtistURL` instead of `ArtistUrl`.
2022-02-22 00:17:53 -06:00
evazion
21c0d55aa4 Fix #5002: "Urls url has already been taken" when submitting duplicate urls with different capitalization
Fix URLs being normalized after checking for duplicates rather than
before, which meant that URLs that differed in capitalization weren't
detected as duplicates.
2022-02-08 19:15:55 -06:00
evazion
6d2a2eee59 Fix #4017: Artist tag in upload page should account for aliases
Disallow creating artist entries for aliased tags. Add a fix script to
move existing artist entries for tags that have been aliased.
2022-02-01 12:33:45 -06:00
evazion
02c9498860 artists: normalize group names.
Normalize artist group names following the same rules as artist other names.

This means artist group names now use underscores instead of spaces.
It also means extra space characters at the beginning and end of names
is stripped, and Unicode characters are normalized.

Fixes #4647, which was caused by users accidentally replacing group
names with a single space character when trying to remove a group.
2022-01-20 00:17:06 -06:00
evazion
ac12efb636 tests: fix test failures when running without API keys.
Fix the test suite failing when trying to run it in the default state
with no config file or API keys configured. Most source sites require
API keys or login credentials to be set in order to work. Skip these
tests when credentials aren't configured.
2021-09-22 04:33:36 -05:00
evazion
ffbf7f1ccf tests: fix broken tests. 2021-05-15 02:48:13 -05:00
evazion
d18dc573fb artists: fix misnormalization of emoji in other names.
Fix `normalize_whitespace` to not strip zero-width joiner characters
(U+200D). These characters are used in emoji and stripping them breaks
some artist other names that use emoji.
2021-01-10 02:46:20 -06:00
evazion
5962152ee3 wiki pages, artists: fix normalization of other names.
Fix wiki pages and artists to normalize other names more consistently
and correctly.

For wiki pages, we strip leading / trailing / repeated underscores to
fix user typos, and we normalize to NFKC form to make search more consistent.

For artists, we allow leading / trailing / repeated underscores because
some artist names have these, and we normalize to NFC form because some
artists have weird names that would be lost by NFKC form. This does make
search less consistent.
2021-01-10 02:03:12 -06:00
nonamethanks
3801e08ae6 Update pixiv url matching tests 2020-12-16 13:53:16 +01:00
evazion
cc781ba2b9 tests: add tests for #4551, #4630. 2020-12-05 12:54:32 -06:00
evazion
8717c319ab aliases/implications: remove 'pending' state.
Remove the pending status from tag aliases and implications.

Previously aliases would be created first in the pending state then
changed to active when the alias was later processed in a delayed job.
This meant that BURs weren't processed completely sequentially; first
all the aliases in a BUR would be created in one go, then later they
would be processed and set to active sequentially.

This was problematic in complex BURs that tried to reverse or swap
around aliases, since new pending aliases could be created before old
conflicting aliases were removed.
2020-12-01 18:58:45 -06:00
evazion
45d050d918 tests: fix artist ban tests. 2020-12-01 14:16:47 -06:00
evazion
bfc31acbed artists: fix accidental gentag category changes.
Bug: if you created an artist with the name of an existing general tag,
then the gentag would be changed to an artist tag, no matter how big the
gentag was.

Now we only allow creating artist entries for non-artist tags if the tag
is empty.

Ref: https://danbooru.donmai.us/forum_topics/17095
2020-07-07 12:47:18 -05:00
evazion
e822d5f16e tests: fix invalid artist url test. 2020-07-03 12:06:12 -05:00
evazion
8eac82a971 pixiv: fix regression with new user profile urls.
* Update tests to use new Pixiv profile urls.
* Fix issue with artist finder not working when given direct image or
  html page urls.
2020-06-24 02:41:11 -05:00
evazion
83a8468ee9 tests: remove unnecessary rescueing of Net::OpenTimeout errors.
These exceptions are no longer thrown now that we've switched from
HTTParty to http.rb. Swallowing unexpected exceptions during testing was
a bad practice anyway.
2020-06-23 03:12:44 -05:00
lllusion3469
9205c32424 deviantart: revert to 7f482dc35b
that's the latest commit made to deviantart files before switching from
the developer API to the Javascript backend from the new "Eclipse"
frontend.
This is necessary because it's basically impossible to download posts
now with the JS backend without being logged in, i.e. having the cookies
from a logged in user, which can't be used for very long even if
exporting them from a browser. You would have to save the cookies
deviantart sends you back via the "Set-Cookie" header in a database
somewhere in addition to the other added complexity.

also
* (temporarily) replace HttpartyCache with HTTParty as it's long been
  removed
* fix one case of "last argument as keyword parameter"
* change repository url (5d1a1cc87e)
* remove self-explanatory comment
2020-05-11 16:09:00 +02:00
evazion
66c8c1f53f artists: fix artist bans not being recorded in artist history.
Using update_column bypasses callbacks, so a new artist version wasn't
created when the is_banned flag was changed.
2020-05-04 03:39:41 -05:00
evazion
ddffffb413 artists: factor out artist finder to separate module. 2020-03-06 23:23:38 -06:00
evazion
4c11e339bd artists: rename is_active flag to is_deleted.
Rename is_active to is_deleted. This is for better consistency with
other models, and to reduce confusion over what "active" means for
artists. Sometimes users think active is for whether the artist is
actively producing work.
2020-03-06 14:50:21 -06:00
evazion
2dab9aa075 models: remove creator_id from artists, notes, and pools.
Remove the creator_id field from artists, notes, and pools. The
creator_id wasn't otherwise used and was inconsistent with the
artist/note/pool history in some cases, especially for old artists.
2020-02-16 23:09:00 -06:00
evazion
ef3188a7fe artists/edit: refactor editing nested wiki pages.
Refactor to use accepts_nested_attributes_for instead of the notes
attribute to facilitate editing wikis on the artist edit page.

This fixes the notes attribute unintentionally showing up in the API.

This also changes it so that renaming an artist entry doesn't
automatically rename the corresponding wiki page. This had bad behavior
when there was a conflict between wiki pages (the wikis would be
silently merged, which usually isn't what you want). It also didn't warn
about wiki links being broken by renames.
2020-02-16 18:48:41 -06:00
evazion
b4ce2d83a6 models: remove belongs_to_creator macro.
The belongs_to_creator macro was used to initialize the creator_id field
to the CurrentUser. This made tests complicated because it meant you had
to create and set the current user every time you wanted to create an
object, when lead to the current user being set over and over again. It
also meant you had to constantly be aware of what the CurrentUser was in
many different contexts, which was often confusing. Setting creators
explicitly simplifies everything greatly.
2020-01-21 00:09:38 -06:00
evazion
309821bf73 rubocop: fix various style issues. 2019-12-22 21:23:37 -06:00
evazion
258fa06422 tests: replace workoff_active_jobs with perform_enqueued_jobs. 2019-09-07 22:21:55 -05:00
evazion
7d07b5f289 artist: drop unused member_names method. 2019-09-06 16:18:29 -05:00
evazion
fc3b822bdf artists: reduce queries in artist summaries.
Avoid a few queries when searching for single artist tags.
2019-09-05 00:00:15 -05:00
evazion
eba6440b8b Fix #4144: Deviantart Eclipse update broke strategy. 2019-08-28 23:40:29 -05:00
evazion
27a118dfc8 tests: drop timecop gem. 2019-08-18 11:24:41 -05:00
evazion
145894fe8b tests: fix alias/implication tests. 2019-08-17 02:41:07 -05:00
evazion
065609ff4f artists: prevent creating artist entries for non-artist tags (#4107, 2717).
Validate that artist entries belong to an artist tag. Don't allow
creating artist entries for character/copyright/meta tags, but do allow
entries for general tags (the gentag will be automatically changed to an
artist tag).
2019-08-01 20:20:06 -05:00
evazion
bcaefa0a7e Fix #4107: Can't create artist entry if tag already has a wiki #4107 2019-08-01 19:34:21 -05:00
evazion
2129e60b2b pixiv: include stacc url in new artist entries (#4028). 2018-12-27 15:03:11 -06:00
evazion
2170961f47 artists: improve prefilling of new artist form (#4028)
* When creating an artist by clicking the '?' next to the artist tag in
  the tag list, prefill the new artist form by finding the artist's last
  upload and fetching its source data.

  Previously we filled the urls with the source of the artist's last
  upload, which was wrong because it was usually a direct image URL (#3078).

* Fix the other names field not escaping spaces within names to underscores.

* Fix the other names field being potentially prefilled with duplicate names.
2018-12-27 15:03:11 -06:00
evazion
286bf2f285 artists: filter out duplicates from other names (#4028). 2018-12-27 15:03:11 -06:00
evazion
41ff05c121 artists: convert other_names to array (#3987). 2018-11-15 14:31:16 -06:00
evazion
741462ae68 artist versions: convert other_names, url_string to arrays (#3987). 2018-11-14 14:25:02 -06:00
Albert Yi
5a13d06501 rescue failed network calls on artist test 2018-10-25 13:47:16 -07:00
evazion
bb5f291112 artists: don't create new version when nothing changed.
Fix an issue where saving an artist entry without changing anything
would create a new artist version.
2018-10-04 20:01:38 -05:00
evazion
03cc3dfa50 artists: fix editing invalid urls in artist entries (fix #3720, #3927, #3781)
Convert to an autosave association on urls. This ensures that when we
save the artist we only validate the added urls, not bad urls that we're
trying to remove, and that url validation errors are propagated up to
the artist object.

This also fixes invalid urls being saved in the artist history despite
validation failing (#3720).
2018-10-04 19:49:16 -05:00
evazion
09a8198979 /artists: add wildcard, regex search to url field (#3900)
Allow searching the URL field by regex or by wildcard.

If the query looks like `/twitter/` do a regex search, otherwise if it
looks like `http://www.twitter.com/*` do a wildcard search, otherwise if
it looks like an url do an artist finder search, lastly if it looks like
`twitter` do a `*twitter*` search.
2018-09-21 21:19:01 -05:00
evazion
a4608daf38 /artists: add more search options for other names, group name.
Add these search params:

* /artists?search[<field>]=
* /artists?search[<field>_eq]=
* /artists?search[<field>_not_eq]=
* /artists?search[<field>_like]=
* /artists?search[<field>_not_like]=
* /artists?search[<field>_ilike]=
* /artists?search[<field>_not_ilike]=
* /artists?search[<field>_regex]=
* /artists?search[<field>_not_regex]=

where `<field>` can be `name`, `group_name`, or `other_names`.

Remove these search params:

* /artists?search[name_matches]=
* /artists?search[other_names_match]=
* /artists?search[group_name_matches]=

`/artists?search[<field>_like]=` effectively does the same thing that
these searches did.
2018-09-21 20:55:14 -05:00