Commit Graph

205 Commits

Author SHA1 Message Date
evazion
b234727832 tags: ensure aliased tag categories stay in sync.
* When a tag's category is changed, also change the category of any aliases pointing to it. For
  example, if "ff7" is aliased to "final_fantasy_vii", and "final_fantasy_vii" is changed to a
  copyright tag, then change the empty "ff7" tag to be a copyright tag too.

* Don't allow changing the category of an aliased tag. For example, if "ff7" is aliased to
  "final_fantasy_vii", then don't allow changing the "ff7" tag to be a non-copyright tag.

This ensures that the categories of aliased tags stay in sync with that of their parent tags. This
way aliased tags are colored correctly in wikis and other places.
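
A minimal in-memory sketch of the two rules above. It assumes aliases are a plain hash from the aliased
tag (antecedent) to its parent (consequent). `Tag`, `TagCategoryUpdater` and the symbol categories are
hypothetical stand-ins for the real models.

```ruby
# Hypothetical stand-in for the tags table.
Tag = Struct.new(:name, :category)

class TagCategoryUpdater
  # aliases maps an aliased tag name to its parent tag name,
  # e.g. { "ff7" => "final_fantasy_vii" }.
  def initialize(tags, aliases)
    @tags = tags.to_h { |tag| [tag.name, tag] }
    @aliases = aliases
  end

  def update_category(name, category)
    # An aliased tag's category follows its parent, so it can't be changed directly.
    raise ArgumentError, "can't change the category of aliased tag #{name}" if @aliases.key?(name)

    @tags.fetch(name).category = category

    # Propagate the new category to every tag aliased to this one.
    @aliases.each do |antecedent, consequent|
      @tags[antecedent].category = category if consequent == name && @tags.key?(antecedent)
    end
  end

  def category_of(name)
    @tags.fetch(name).category
  end
end
```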
2022-11-22 22:00:23 -06:00
evazion
1a9718250f replacements: strip spaces from replacement URL.
Fix a handful of replacements whose replacement URL had a leading or trailing space.
This caused problems when searching by replacement URL.
2022-11-21 17:47:56 -06:00
evazion
1e478ab1b5 favgroups: add stricter favgroup naming rules.
Don't allow favgroup names that:

* Start or end with underscores.
* Contain multiple underscores in a row.
* Contain asterisks or non-printable characters.
* Consist of only underscores.
* Consist of only digits (conflicts with `favgroup:1234` syntax).

Add a fix script that fixes favgroups that violate these rules and notifies the user.
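
A sketch of the naming rules above as a standalone validation function. `favgroup_name_errors` and its
error messages are hypothetical. It checks only these five rules, and presence or length checks are
assumed to happen elsewhere.

```ruby
# Returns a list of rule violations for a favgroup name (empty if the name is OK).
def favgroup_name_errors(name)
  errors = []
  errors << "can't start or end with an underscore" if name.start_with?("_") || name.end_with?("_")
  errors << "can't contain multiple underscores in a row" if name.include?("__")
  errors << "can't contain asterisks" if name.include?("*")
  # [[:print:]] is Unicode-aware for UTF-8 strings; control characters like tabs don't match it.
  errors << "can't contain non-printable characters" if name.match?(/[^[:print:]]/)
  errors << "can't consist of only underscores" if name.match?(/\A_+\z/)
  # All-digit names would be ambiguous with the `favgroup:1234` ID syntax.
  errors << "can't consist of only digits" if name.match?(/\A\d+\z/)
  errors
end
```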
2022-11-20 22:09:57 -06:00
evazion
80b3e34bd1 replacements: initialize media_asset_id, old_media_asset_id columns. 2022-11-09 00:22:17 -06:00
evazion
83d14a281f replacements: backfill images in parallel. 2022-11-08 21:41:59 -06:00
evazion
09f1ace357 replacements: add fix script to backfill old images from Gelbooru.
Add a fix script to download images from Gelbooru for old replacements where we deleted the original
image. For archival purposes, we want to try to find the original file for every replacement.

These images will be uploaded as unposted assets under DanbooruBot's name.
2022-11-08 15:45:57 -06:00
evazion
00db63e885 Fix #5336: Nuke old danboorubot replacement comments
Add a fix script that imports the md5 for old post replacements from the corresponding DanbooruBot
replacement comment, then deletes all replacement comments.

There are about 250 replacements left that still have a null md5 because they don't have a matching
comment. This is because if a post was replaced but the file didn't change, it didn't leave a comment.
2022-11-08 02:26:50 -06:00
evazion
f083f29c3b users: add is_deleted flag.
Add is_deleted flag to users table in preparation for fixing #4555.
2022-11-06 01:41:14 -05:00
evazion
4ae3ebf845 artists: add SQL script to find incorrect artist URLs. 2022-11-05 19:09:56 -05:00
evazion
4c0f62254e script/fixes/123_refresh_media_metadata.rb: refresh metadata in parallel. 2022-11-03 22:09:24 -05:00
evazion
acc511ab7d media assets: fix dimensions of flash files.
Use ExifTool to get the dimensions of Flash files instead of calculating
them ourselves. Avoids copying third-party code.

Fixes a bug where Flash files with fractional dimensions (e.g. 607.6 x 756.6)
had their dimensions rounded down instead of rounded up.

Fixes another bug where Flash files could return negative dimensions.
This happened for two files:

* https://danbooru.donmai.us/media_assets/228662 (-179.2 x -339.2)
* https://danbooru.donmai.us/media_assets/228664 (-179.2 x -339.2)

Now we round these up to 1x1. This is still wrong, but it's less wrong than before.
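
A sketch of the rounding described above. It assumes the raw width and height arrive as floats that
may be fractional or negative. `flash_dimensions` is a hypothetical helper, not the actual ExifTool
integration.

```ruby
# Round fractional Flash dimensions up, and clamp negative or zero values to 1.
def flash_dimensions(width, height)
  [width, height].map { |dimension| [dimension.ceil, 1].max }
end
```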
2022-10-31 17:30:40 -05:00
evazion
27e4ae3d33 script/fixes/123_refresh_media_metadata.rb: don't wrap in transaction.
Don't wrap the metadata refresh script in a transaction because it could
be a very long running operation and it's not good to leave a transaction
open that long.
2022-10-31 02:29:07 -05:00
evazion
214a877c3c users: fix typo in contributor/approver migration script.
Fixup for #5306.
2022-10-30 21:19:05 -05:00
evazion
d65a35d4ae media assets: add fix script to refresh metadata.
Add a script to go through every media asset and check the metadata
(width, height, duration, filesize, md5, EXIF metadata) and update it
if it's changed. This is necessary after upgrading ExifTool because the
metadata it returns may have changed.
2022-10-30 14:49:12 -05:00
nonamethanks
ca31e7a47c Users: add Contributor and Approver user levels 2022-10-21 20:52:31 +02:00
evazion
873c67db58 emails: disallow names ending with a period.
Update email validation rules to disallow the percent character (e.g.
`foo%bar@gmail.com`) and names ending with a period (e.g. `foo.@gmail.com`).
Names ending with a period are invalid according to the RFCs and cause
`Mail::Address.new` to raise an exception.

The percent character is technically legal, but only one email used it
and it was probably a typo.
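
A sketch of only the two new rules, not the full validator. `disallowed_local_part?` is a hypothetical
helper. The remaining checks in `Danbooru::EmailAddress` are assumed to run separately.

```ruby
# Returns true if the part before the last "@" ends with a period or contains "%".
def disallowed_local_part?(address)
  local, separator, _domain = address.rpartition("@")
  return true if separator.empty? || local.empty? # not an email address at all

  local.end_with?(".") || local.include?("%")
end
```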
2022-10-17 22:13:19 -05:00
evazion
e31977ac29 emails: move EmailValidator into Danbooru::EmailAddress. 2022-10-17 22:13:19 -05:00
evazion
f6516e0e37 emails: add script to fix typo'd emails.
Add fix script to fix emails containing typos, such as `name@gamil.com`.
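
A sketch of a table-driven domain typo fix. Only the `gamil.com` entry comes from this commit. The
other entries, `DOMAIN_TYPOS`, and `fix_email_typo` are hypothetical illustrations.

```ruby
# Hypothetical typo table; only "gamil.com" is taken from the commit message.
DOMAIN_TYPOS = {
  "gamil.com" => "gmail.com",
  "gmial.com" => "gmail.com",
  "hotmial.com" => "hotmail.com",
}.freeze

# Returns the corrected address, or the original address if the domain isn't a known typo.
def fix_email_typo(address)
  local, separator, domain = address.rpartition("@")
  return address if separator.empty?

  fixed_domain = DOMAIN_TYPOS.fetch(domain.downcase, domain)
  "#{local}@#{fixed_domain}"
end
```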
2022-10-14 23:28:23 -05:00
evazion
01d10a54f8 ugoira: store frame delays in MediaMetadata model.
Store Ugoira frame delays in the MediaMetadata model as a fake EXIF
field instead of in the PixivUgoiraFrameData model. This way we can get
rid of the PixivUgoiraFrameData model completely. This is a step towards
fixing #5264.
2022-10-09 22:25:20 -05:00
evazion
0cfd0ff436 emails: add fix script to renormalize email addresses.
Whenever the email address normalization procedure changes, the
`normalized_address` column of the email address table must be updated.
This normally happens when the list of canonical domain mappings changes.

Renormalizing addresses may also require deleting duplicates.
2022-10-03 02:55:30 -05:00
evazion
86e69e3401 emails: add fix script to delete duplicate email addresses.
In the past it was possible for users to create multiple accounts with
the same email address. We had about 9000 such accounts. This removes
the email address from these accounts.

When multiple accounts have the same email address, the account that
visited the site last gets to keep the address.
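
A sketch of the "last visitor keeps the address" rule over in-memory records. `User` and
`remove_duplicate_emails` are hypothetical. Addresses are assumed to match exactly (no case folding),
and a missing `last_visited_at` is treated as the oldest possible visit.

```ruby
# Hypothetical stand-in for the users table.
User = Struct.new(:name, :email, :last_visited_at)

# For each email shared by several accounts, keep it on the most recently
# active account and clear it from the others. Returns the users whose email was removed.
def remove_duplicate_emails(users)
  users.select(&:email).group_by(&:email).flat_map do |_email, group|
    keeper = group.max_by { |user| user.last_visited_at || Time.at(0) }
    # Compare by identity so two otherwise-identical records are handled correctly.
    group.reject { |user| user.equal?(keeper) }.each { |user| user.email = nil }
  end
end
```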
2022-10-02 23:59:54 -05:00
evazion
21747e1f8e emails: add fix script to fix invalid email addresses.
Add a fix script that fixes invalid email addresses where possible and
deletes them otherwise.

For a long time we didn't have any email validation, so we ended up with
a lot of invalid email addresses containing typos or other random garbage.
This tries to fix the most common typos when possible, otherwise the
email address is deleted.

In many cases the user created two accounts, one with a typo in the
email and one with the correct email. In these cases we can't fix the
invalid email, so we just delete it.
2022-10-02 20:44:10 -05:00
evazion
85cb434b2c users: fix bug in invalid username deletion script.
Fix a bug in script/fixes/115_delete_invalid_users.rb where certain
usernames containing punctuation weren't deleted.
2022-10-02 03:42:51 -05:00
evazion
3dc765ca9d mod actions: add fix script to populate subject field.
Add a fix script to populate the mod_actions subject field by parsing
mod action descriptions. Most mod actions contain an ID, so finding the
subject is easy, but some don't. And some mod actions refer to deleted
objects, such as deleted posts or comments. In these cases the subject
will be null.

For IP bans, the mod action description only contains the IP, but it's
possible to have multiple bans for the same IP. So we look for IP bans
created by the same user, for the same IP, within the same time range.

For user bans, the mod action only contains the banned user's name and
the ban reason. This makes it difficult to find the banned user's ID in
some cases, because the user may have changed their name without the
name change being recorded, the banner may have edited the ban reason,
or the ban may have been deleted. So we try multiple things until we
find the closest match.
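
A sketch of the easy case only: pulling an ID out of descriptions like "deleted post #1234". The IP ban
and user ban matching described above is not shown. `SUBJECT_MODELS`, `SUBJECT_PATTERN` and
`parse_mod_action_subject` are hypothetical, and only two subject types are covered.

```ruby
# Hypothetical mapping from the noun used in mod action messages to a model name.
SUBJECT_MODELS = { "post" => "Post", "comment" => "Comment" }.freeze
SUBJECT_PATTERN = /\b(post|comment) #(\d+)\b/

# Returns [model_name, id] for messages like "deleted post #1234", or nil if no ID is present.
def parse_mod_action_subject(description)
  match = SUBJECT_PATTERN.match(description)
  return nil if match.nil?

  [SUBJECT_MODELS.fetch(match[1]), match[2].to_i]
end
```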
2022-09-25 21:19:43 -05:00
evazion
aea3837f9a users: delete accounts with invalid names.
Add a fix script to delete all accounts with invalid usernames. Also
change it so the owner-level user can delete accounts belonging to other
users.

Users who have logged in in the last year and who have a valid email
address will be given a one week warning. After that all accounts with
invalid names will be deleted. Anyone who has visited the site in the
last 6 months will have already seen a warning page that their name must
be changed to keep using the site.
2022-09-19 05:09:44 -05:00
evazion
2119a8efc5 mod actions: fix messages to use consistent format.
Fix mod actions to use the same message format everywhere.

Before mod actions were formatted in various inconsistent ways:

* "deleted post #1234"
* "comment #1234 updated by <user>"
* "<user> updated forum #1234"
* "<user> level changed Member -> Builder"

Now all mod actions consistently use this format:

* "deleted post #1234"
* "updated comment #1234"
* "updated forum #1234"
* "promoted <user> from Member to Builder"

This way mod actions are formatted consistently with other actions on
the /user_actions page, where everything is written as "<user> did X".

Also add a fix script to fix existing mod actions.
2022-09-18 21:56:57 -05:00
evazion
ec382357b8 tags: populate words column.
Add code for parsing tags into words and for populating the `words` column
in the tags table.
2022-09-01 23:54:07 -05:00
evazion
1aeb52186e Add AI tag model and UI.
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.
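
A rough back-of-the-envelope check of the size figure above. It assumes PostgreSQL's standard heap
layout: a 23-byte tuple header padded to 24 bytes, 8-byte tuple alignment, and a 4-byte line pointer
per row. It ignores page headers and free space, so it is an estimate, not a measurement.

```ruby
# Row count from the figures in this commit message.
images = 6_000_000
tags_per_image = 50
rows = images * tags_per_image # 300 million

payload = 4 + 4 + 2  # integer + integer + smallint = 10 bytes
tuple_header = 24    # 23-byte heap tuple header, padded to 8-byte alignment
tuple = ((tuple_header + payload) / 8.0).ceil * 8 # 34 bytes, padded to 40
line_pointer = 4     # one item pointer per row in the page
bytes_per_row = tuple + line_pointer # 44 bytes

total_bytes = rows * bytes_per_row
puts "#{rows} rows, ~#{(total_bytes / 1e9).round(1)} GB before indexes"
```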

You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
2022-06-24 04:54:26 -05:00
evazion
fec92d765a users: change default blacklist to furry -rating:g. 2022-06-02 00:06:34 -05:00
evazion
173e43b192 user upgrades: add upgrade code system.
Add a system for upgrading accounts using upgrade codes. Users purchase
an upgrade code off-site then redeem it on-site to upgrade their account
to Gold. Upgrade codes are randomly pre-generated and are one time use
only. Codes have enough randomness that guessing a code is infeasible.
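
A minimal in-memory sketch of one-time-use codes. `UpgradeCodeStore` and the 16-character length are
assumptions, not the real schema. `SecureRandom.alphanumeric` draws from A-Z, a-z and 0-9, so 16
characters give about 16 × log2(62) ≈ 95 bits of randomness.

```ruby
require "securerandom"
require "set"

# Hypothetical in-memory store; the real system keeps codes in the database.
class UpgradeCodeStore
  CODE_LENGTH = 16 # assumed length: ~95 bits of randomness

  def initialize
    @unused = Set.new
  end

  # Pre-generates `count` random codes to be sold off-site.
  def generate(count)
    Array.new(count) do
      code = SecureRandom.alphanumeric(CODE_LENGTH)
      @unused << code
      code
    end
  end

  # Redeems a code exactly once. Returns false if the code is unknown or already used.
  def redeem(code)
    !@unused.delete?(code).nil?
  end
end
```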
2022-06-01 18:31:46 -05:00
evazion
4ba993319a media assets: add file_key, is_public columns.
`file_key` is a random 9-character base-62 string that will be used as
the image filename in the future.

`is_public` is whether the image can be viewed without authentication or not.

Users running downstream boorus must run `bin/rails db:migrate` and
`script/fixes/109_generate_media_asset_file_keys.rb` after this commit.
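
A sketch of generating a key with the properties described above, using Ruby's `SecureRandom`.
`generate_file_key` is a hypothetical helper, not necessarily how the fix script does it. A 9-character
base-62 key has 62**9 (about 1.35 × 10^16) possible values, or about 53.6 bits.

```ruby
require "securerandom"

FILE_KEY_LENGTH = 9

# Generates a random 9-character base-62 key (characters A-Z, a-z, 0-9).
def generate_file_key
  SecureRandom.alphanumeric(FILE_KEY_LENGTH)
end

# Bits of randomness in one key: 9 * log2(62).
FILE_KEY_BITS = (FILE_KEY_LENGTH * Math.log2(62)).round(1)
```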
2022-05-04 23:19:53 -05:00
evazion
703fd05025 favgroups: don't allow favgroups to be named 'any' or 'none'.
'any' and 'none' are now reserved keywords for the favgroup: metatag.

Also add a fix script to rename existing favgroups.
2022-04-17 23:17:18 -05:00
evazion
226faae8ec BURs: fix tags field not finding all BURs with that tag.
Fix the Tags field in the BUR search form not finding all BURs
mentioning that tag. Specifically, tags that were part of a mass update,
and that were prefixed with `~` or `-` (OR tags and NOT tags), weren't
indexed as tags affected by the BUR.

This requires re-running script/fixes/064_initialize_bulk_update_request_tags.rb
to fix old BURs.
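
A sketch of the indexing fix: strip the `~` (OR) and `-` (NOT) prefixes before recording which tags a
mass update mentions. It assumes a whitespace-separated tag query and ignores metatags and other query
syntax. `affected_tags` is a hypothetical helper.

```ruby
# Extracts the tag names mentioned in a query such as "aaa ~bbb -ccc".
def affected_tags(query)
  query.split
       .map { |term| term.delete_prefix("~").delete_prefix("-") }
       .reject(&:empty?)
       .uniq
end
```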
2022-03-29 21:06:24 -05:00
evazion
4b1264991f users: remove 'spoilers' tag from default blacklist.
Rationale:

* The spoilers tag is the most frequently removed tag from the default blacklist.
* It's frustrating for regular users to have posts randomly hidden because of trivial
  spoilers from a series they don't care about.
* The spoilers tag is used way too liberally for things that aren't considered
  spoilers on other sites.
* If you're looking up fanart on the internet, you should expect to see a certain
  level of spoilers.
* The tag is used very inconsistently, with some characters like Nia_(blade)_(xenoblade)
  getting the spoilers tag half the time and the rest of the time not.
2022-03-20 16:49:36 -05:00
evazion
04c03fa4e6 artist: normalize more artist url formats. 2022-03-16 17:17:50 -05:00
evazion
04226d3409 pixiv: normalize pixiv urls in artist entries.
Normalize Pixiv URLs to `https://www.pixiv.net/users/1234` format.
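
A sketch covering two older profile URL shapes, `member.php?id=1234` and `/en/users/1234`. Which
formats the real normalizer handles is an assumption. `PIXIV_PROFILE_URL` and `normalize_pixiv_url`
are hypothetical.

```ruby
# Matches old and new Pixiv profile URLs, capturing the user ID.
PIXIV_PROFILE_URL = %r{\Ahttps?://(?:www\.)?pixiv\.net/(?:en/)?(?:users/(\d+)|member\.php\?id=(\d+))}i

# Rewrites recognized profile URLs to https://www.pixiv.net/users/<id>; leaves others unchanged.
def normalize_pixiv_url(url)
  match = PIXIV_PROFILE_URL.match(url)
  return url if match.nil?

  "https://www.pixiv.net/users/#{match[1] || match[2]}"
end
```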
2022-03-14 16:43:19 -05:00
evazion
223742c365 weibo: normalize weibo urls in artist entries.
Normalize all Weibo URLs in artist entries to one of these forms:

* https://www.weibo.com/u/5399876326
* https://www.weibo.com/p/1005055399876326
* https://www.weibo.com/chengziyou666
2022-03-13 21:16:56 -05:00
evazion
eb032d54c1 uploads: set upload_media_asset.status to active.
Fix the status being set to pending instead of active for new upload
media assets.
2022-02-14 00:40:40 -06:00
evazion
04d242c60c uploads: save filename, image URL, page URL for uploads.
* Save the filename for files uploaded from disk. This could be used in
  the future to extract source data if the filename is from a known site.

* Save both the image URL and the page URL for files uploaded from
  source. This is needed for multi-file uploads. The image URL is the
  URL of the file actually downloaded from the source. This can be
  different from the URL given by the user, if the user tried to upload
  a sample URL and we automatically changed it to the original URL. The
  page URL is the URL of the page containing the image. We don't always
  know this; for example, if someone uploads a Twitter image without the
  bookmarklet, then we can't find the page URL.

* Add a fix script to backfill URLs for existing uploads. For file
  uploads, the filename will be set to "unknown.jpg". For source
  uploads, we fetch the source data again to get the image and page
  URLs. This may fail for uploads that have been deleted from the
  source since uploading.
2022-02-12 15:22:41 -06:00
evazion
9a23970ab1 uploads: fix media_asset_count. 2022-02-12 15:22:24 -06:00
evazion
1a61e329ba uploads: add column for error messages.
Change it so uploads store errors in an `error` column instead of in the
`status` field.
2022-02-07 15:44:39 -06:00
evazion
19a9cf3d2f uploads: delete old upload records from before the rework.
Delete all old upload records from before the upload rework in abdab7a0a
/ f11c46b4f. Uploads from before the rework don't have any attached
media assets, so they're not valid under the new system because we can't
find which files they were for.

Before the rework, completed uploads were only saved for 1 hour, and
failed uploads were only saved for 3 days, so deleting this data
doesn't really lose anything that wouldn't have been deleted before.
2022-02-07 15:11:09 -06:00
evazion
6d2a2eee59 Fix #4017: Artist tag in upload page should account for aliases
Disallow creating artist entries for aliased tags. Add a fix script to
move existing artist entries for tags that have been aliased.
2022-02-01 12:33:45 -06:00
evazion
61c043c6b1 posts: normalize Unicode to NFC form in post sources.
Fix strings like "pokémon" (NFD form) and "pokémon" (NFC form) being
considered different strings in sources.

Also add a fix script to fix existing sources. There were only 15 posts
with unnormalized sources.
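
A small illustration of the NFD/NFC mismatch using Ruby's built-in `String#unicode_normalize`.
`normalize_source` is a hypothetical helper.

```ruby
# "e" followed by U+0301 COMBINING ACUTE ACCENT (NFD) vs. precomposed U+00E9 (NFC).
NFD_SOURCE = "poke\u0301mon"
NFC_SOURCE = "pok\u00E9mon"

# Normalizes a post source to NFC so equivalent strings compare equal.
def normalize_source(source)
  source.unicode_normalize(:nfc)
end
```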
2022-01-31 14:16:49 -06:00
evazion
d2a24e6b10 Fix #4971: NoMethodError when trying to display some modreports.
Delete modreports for hard-deleted comments. There were a total of six
invalid modreports for deleted comments.
2022-01-22 18:12:07 -06:00
evazion
56722df753 forum: delete posts when topic is deleted.
Fix it so that when a forum topic is deleted, all posts in the topic are
deleted too. Also make it so that when a forum topic is undeleted, all
posts in it are undeleted too.

Before when a topic was deleted, only the topic itself was marked as
deleted, not the posts inside the topic. This meant that when a spam
topic was deleted, the OP wouldn't be marked as deleted, so any
modreports against it wouldn't be marked as handled.

Also change it so that it's not possible to undelete a post in a deleted
topic, or to delete the OP of a topic without deleting the topic itself.

Finally, add a fix script to delete all active posts in deleted topics,
and to undelete all deleted OPs in active topics.
2022-01-21 22:35:20 -06:00
evazion
c8d27c2719 Fix #4669: Track moderation report status.
* Add ability to mark moderation reports as 'handled' or 'rejected'.
* Automatically mark reports as handled when the comment or forum post
  is deleted.
* Send a dmail to the reporter when their report is handled.
* Don't show the report notice on comments or forum posts when all
  reports against them have been handled or rejected.
* Add a fix script to mark all existing reports for deleted comments,
  forum posts, or dmails as handled.
2022-01-20 20:50:23 -06:00
evazion
98aee048f2 artists: fix old artists with invalid names.
There are a lot of old artist entries with Japanese names. These names
are now invalid and these artist entries can't be edited because they
fail validation checks.

Add a fix script to delete all artist entries with non-ASCII names, and
rename them to `artist_1234`.
2022-01-20 16:01:31 -06:00
evazion
02c9498860 artists: normalize group names.
Normalize artist group names following the same rules as artist other names.

This means artist group names now use underscores instead of spaces.
It also means extra space characters at the beginning and end of names
are stripped, and Unicode characters are normalized.

Fixes #4647, which was caused by users accidentally replacing group
names with a single space character when trying to remove a group.
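
A sketch of the normalization described above. The choice of NFC and the collapsing of inner
whitespace runs into one underscore are assumptions. `normalize_group_name` is a hypothetical helper.
A name that is only a space normalizes to an empty string.

```ruby
# Unicode-normalize, trim leading/trailing whitespace, and replace inner whitespace runs with "_".
def normalize_group_name(name)
  name
    .unicode_normalize(:nfc)
    .gsub(/\A[[:space:]]+|[[:space:]]+\z/, "")
    .gsub(/[[:space:]]+/, "_")
end
```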
2022-01-20 00:17:06 -06:00
evazion
acf565be7b Fix #4678: Validate custom CSS.
* Make it an error to add invalid custom CSS to your account.
* Add a fix script to remove custom CSS from all accounts with invalid CSS.
2022-01-15 23:20:49 -06:00