Commit Graph

314 Commits

Author SHA1 Message Date
evazion
3dc765ca9d mod actions: add fix script to populate subject field.
Add a fix script to populate the mod_actions subject field by parsing
mod action descriptions. Most mod actions contain an ID, so finding the
subject is easy, but some don't. And some mod actions refer to deleted
objects, such as deleted posts or comments. In these cases the subject
will be null.

For IP bans, the mod action description only contains the IP, but it's
possible to have multiple bans for the same IP. So we look for IP bans
created by the same user, for the same IP, within the same time range.

For user bans, the mod action only contains the banned user's name and
the ban reason. This makes it difficult to find the banned user's ID in
some cases, because it's possible for the user to have changed their
name, and for the name change to have not been recorded, and for the
banner to have edited the ban reason, or for the ban to have been
deleted. So we try multiple things until we find the closest match.
2022-09-25 21:19:43 -05:00
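For the easy case described in 3dc765ca9d, where the description contains an ID, the lookup could be sketched roughly as below. The `parse_subject` helper and the list of subject types are hypothetical and are not taken from the fix script; descriptions without an ID simply yield nil here.

```ruby
# Hypothetical helper: pull a subject type and ID out of a mod action
# description such as "deleted post #1234". Not the actual fix script.
SUBJECT_PATTERN = /\b(post|comment|forum|user|pool|artist) #(\d+)/

def parse_subject(description)
  match = SUBJECT_PATTERN.match(description)
  return nil unless match # no ID in the description: subject stays null

  { type: match[1], id: match[2].to_i }
end
```

Descriptions for IP bans and user bans carry no ID, which is why the commit falls back to matching on creator, IP, time range, or name instead.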
evazion
aea3837f9a users: delete accounts with invalid names.
Add a fix script to delete all accounts with invalid usernames. Also
change it so the owner-level user can delete accounts belonging to other
users.

Users who have logged in in the last year and who have a valid email
address will be given a one week warning. After that all accounts with
invalid names will be deleted. Anyone who has visited the site in the
last 6 months will have already seen a warning page that their name must
be changed to keep using the site.
2022-09-19 05:09:44 -05:00
evazion
2119a8efc5 mod actions: fix messages to use consistent format.
Fix mod actions to use the same message format everywhere.

Before, mod actions were formatted in various inconsistent ways:

* "deleted post #1234"
* "comment #1234 updated by <user>"
* "<user> updated forum #1234"
* "<user> level changed Member -> Builder"

Now all mod actions consistently use this format:

* "deleted post #1234"
* "updated comment #1234"
* "updated forum #1234"
* "promoted <user> from Member to Builder"

This way mod actions are formatted consistently with other actions on
the /user_actions page, where everything is written as "<user> did X".

Also add a fix script to fix existing mod actions.
2022-09-18 21:56:57 -05:00
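A minimal sketch of how the old formats listed in 2119a8efc5 could be rewritten into the new one. The `REWRITES` table and `normalize_description` are hypothetical names, and the patterns cover only the four examples from the commit message.

```ruby
# Hypothetical rewrite rules for old-style mod action descriptions.
# The real fix script may handle many more formats than these.
REWRITES = [
  # "comment #1234 updated by <user>" -> "updated comment #1234"
  [/\A(\w+) #(\d+) (\w+) by .+\z/, '\3 \1 #\2'],
  # "<user> updated forum #1234" -> "updated forum #1234"
  [/\A\S+ (\w+) (\w+) #(\d+)\z/, '\1 \2 #\3'],
  # "<user> level changed Member -> Builder" -> "promoted <user> from Member to Builder"
  # (this sketch always says "promoted"; demotions are not handled)
  [/\A(\S+) level changed (\w+) -> (\w+)\z/, 'promoted \1 from \2 to \3'],
].freeze

def normalize_description(description)
  REWRITES.each do |pattern, replacement|
    return description.sub(pattern, replacement) if description.match?(pattern)
  end
  description # already in the new format
end
```

Messages already in the "did X" form, such as "deleted post #1234", match none of the patterns and pass through unchanged.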
evazion
ec382357b8 tags: populate words column.
Add code for parsing tags into words and for populating the `words` column
in the tags table.
2022-09-01 23:54:07 -05:00
evazion
1aeb52186e Add AI tag model and UI.
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.

You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
2022-06-24 04:54:26 -05:00
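The size figures in 1aeb52186e can be checked with some rough arithmetic. The row count follows directly from the commit message; the per-row overheads are assumptions about PostgreSQL's heap layout (a 23-byte tuple header padded to 24 bytes, 8-byte alignment, and a 4-byte line pointer per row), not measurements from production.

```ruby
# Back-of-the-envelope estimate for the ai_tags table.
IMAGES         = 6_000_000
TAGS_PER_IMAGE = 50
ROWS           = IMAGES * TAGS_PER_IMAGE # 300 million rows

DATA_BYTES    = 4 + 4 + 2 # integer + integer + smallint
HEADER_BYTES  = 24        # assumed: 23-byte tuple header, padded
POINTER_BYTES = 4         # assumed: per-row line pointer

# Round the tuple up to the next multiple of 8 bytes (assumed alignment).
TUPLE_BYTES = ((HEADER_BYTES + DATA_BYTES + 7) / 8) * 8

# About 13.2 GB under these assumptions, before indexes.
TOTAL_GB = ROWS * (TUPLE_BYTES + POINTER_BYTES) / 1e9
```

Under these assumptions the fixed per-row overhead is larger than the 10 bytes of column data, which is consistent with keeping the columns as narrow as possible.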
evazion
fec92d765a users: change default blacklist to furry -rating:g. 2022-06-02 00:06:34 -05:00
evazion
173e43b192 user upgrades: add upgrade code system.
Add a system for upgrading accounts using upgrade codes. Users purchase
an upgrade code off-site then redeem it on-site to upgrade their account
to Gold. Upgrade codes are randomly pre-generated and are one time use
only. Codes have enough randomness that guessing a code is infeasible.
2022-06-01 18:31:46 -05:00
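The upgrade code scheme from 173e43b192 could be modeled in memory roughly like this. The `UpgradeCodes` class, the 16-character length, and the alphanumeric alphabet are all assumptions for illustration; the commit does not say how codes are generated or stored.

```ruby
require "securerandom"
require "set"

# Hypothetical in-memory model of pre-generated, one-time upgrade codes.
class UpgradeCodes
  def initialize(count, length: 16)
    @unused = Set.new
    # A Set ignores duplicates, so loop until we have `count` distinct codes.
    @unused << SecureRandom.alphanumeric(length) while @unused.size < count
  end

  def codes
    @unused.to_a
  end

  # Returns true and consumes the code if it is unused; false otherwise.
  def redeem(code)
    !@unused.delete?(code).nil?
  end
end
```

With 62 possible characters per position, a 16-character code carries about 95 bits of randomness, which is the kind of margin that makes guessing infeasible.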
evazion
4ba993319a media assets: add file_key, is_public columns.
`file_key` is a random 9-character base-62 string that will be used as
the image filename in the future.

`is_public` is whether the image can be viewed without authentication or not.

Users running downstream boorus must run `bin/rails db:migrate` and
`script/fixes/109_generate_media_asset_file_keys.rb` after this commit.
2022-05-04 23:19:53 -05:00
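One way to produce a key of the shape described in 4ba993319a is Ruby's `SecureRandom.alphanumeric`, which draws from A-Z, a-z, and 0-9 (62 symbols). Whether the fix script generates keys this way is an assumption; `generate_file_key` is a hypothetical name.

```ruby
require "securerandom"

# Hypothetical generator for a random 9-character base-62 file key.
def generate_file_key(length = 9)
  SecureRandom.alphanumeric(length)
end

# Number of distinct 9-character base-62 keys, roughly 1.35 * 10**16.
KEYSPACE = 62**9
```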
evazion
703fd05025 favgroups: don't allow favgroups to be named 'any' or 'none'.
'any' and 'none' are now reserved keywords for the favgroup: metatag.

Also add a fix script to rename existing favgroups.
2022-04-17 23:17:18 -05:00
evazion
226faae8ec BURs: fix tags field not finding all BURs with that tag.
Fix the Tags field in the BUR search form not finding all BURs
mentioning that tag. Specifically, tags that were part of a mass update,
and that were prefixed with `~` or `-` (OR tags and NOT tags), weren't
indexed as tags affected by the BUR.

This requires re-running script/fixes/064_initialize_bulk_update_request_tags.rb
to fix old BURs.
2022-03-29 21:06:24 -05:00
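The indexing fix in 226faae8ec amounts to stripping the `~` and `-` prefixes before recording which tags a mass update touches. A minimal sketch, with `tags_in_search` as a hypothetical helper rather than the real BUR code:

```ruby
# Hypothetical: collect the tag names mentioned in a mass update's search,
# dropping the `-` (NOT) and `~` (OR) prefixes so the bare names get indexed.
def tags_in_search(search)
  search.split.map { |term| term.sub(/\A[-~]/, "") }.reject(&:empty?).uniq
end
```

For example, a search of `~cat -dog bird` would index `cat`, `dog`, and `bird`.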
evazion
4b1264991f users: remove 'spoilers' tag from default blacklist.
Rationale:

* The spoilers tag is the most frequently removed tag from the default blacklist.
* It's frustrating for regular users to have posts randomly hidden because of trivial
  spoilers from a series they don't care about.
* The spoilers tag is used way too liberally for things that aren't considered
  spoilers on other sites.
* If you're looking up fanart on the internet, you should expect to see a certain
  level of spoilers.
* The tag is used very inconsistently, with some characters like Nia_(blade)_(xenoblade)
  getting the spoilers tag half the time and the rest of the time not.
2022-03-20 16:49:36 -05:00
evazion
04c03fa4e6 artist: normalize more artist url formats. 2022-03-16 17:17:50 -05:00
evazion
04226d3409 pixiv: normalize pixiv urls in artist entries.
Normalize Pixiv URLs to `https://www.pixiv.net/users/1234` format.
2022-03-14 16:43:19 -05:00
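A rough sketch of the normalization in 04226d3409, covering only two input shapes: the older `member.php?id=` form and the `/users/` form with an optional `/en/` prefix. `normalize_pixiv_url` is a hypothetical helper; the actual code likely recognizes more URL variants.

```ruby
require "uri"

# Hypothetical normalizer covering two common Pixiv profile URL shapes.
# Anything it does not recognize is returned unchanged.
def normalize_pixiv_url(url)
  uri = URI.parse(url)
  host = uri.host.to_s
  return url unless host == "pixiv.net" || host.end_with?(".pixiv.net")

  user_id =
    if (match = uri.path.match(%r{\A/(?:en/)?users/(\d+)}))
      match[1]
    elsif uri.path == "/member.php"
      # Accept the id query parameter only if it is purely numeric.
      URI.decode_www_form(uri.query.to_s).to_h["id"].to_s[/\A\d+\z/]
    end

  user_id ? "https://www.pixiv.net/users/#{user_id}" : url
rescue URI::InvalidURIError
  url
end
```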
evazion
223742c365 weibo: normalize weibo urls in artist entries.
Normalize all Weibo URLs in artist entries to one of these forms:

* https://www.weibo.com/u/5399876326
* https://www.weibo.com/p/1005055399876326
* https://www.weibo.com/chengziyou666
2022-03-13 21:16:56 -05:00
evazion
eb032d54c1 uploads: set upload_media_asset.status to active.
Fix the status being set to pending instead of active for new upload
media assets.
2022-02-14 00:40:40 -06:00
evazion
04d242c60c uploads: save filename, image URL, page URL for uploads.
* Save the filename for files uploaded from disk. This could be used in
  the future to extract source data if the filename is from a known site.

* Save both the image URL and the page URL for files uploaded from
  source. This is needed for multi-file uploads. The image URL is the
  URL of the file actually downloaded from the source. This can be
  different from the URL given by the user, if the user tried to upload
  a sample URL and we automatically changed it to the original URL. The
  page URL is the URL of the page containing the image. We don't always
  know this, for example if someone uploads a Twitter image without the
  bookmarklet, then we can't find the page URL.

* Add a fix script to backfill URLs for existing uploads. For file
  uploads, the filename will be set to "unknown.jpg". For source
  uploads, we fetch the source data again to get the image and page
  URLs. This may fail for uploads that have been deleted from the
  source since uploading.
2022-02-12 15:22:41 -06:00
evazion
9a23970ab1 uploads: fix media_asset_count. 2022-02-12 15:22:24 -06:00
evazion
1a61e329ba uploads: add column for error messages.
Change it so uploads store errors in an `error` column instead of in the
`status` field.
2022-02-07 15:44:39 -06:00
evazion
19a9cf3d2f uploads: delete old upload records from before the rework.
Delete all old upload records from before the upload rework in abdab7a0a
/ f11c46b4f. Uploads from before the rework don't have any attached
media assets, so they're not valid under the new system because we can't
find which files they were for.

Before the rework, completed uploads were only saved for 1 hour, and
failed uploads were only saved for 3 days, so deleting this data
doesn't really lose anything that wouldn't have been deleted before.
2022-02-07 15:11:09 -06:00
evazion
6d2a2eee59 Fix #4017: Artist tag in upload page should account for aliases
Disallow creating artist entries for aliased tags. Add a fix script to
move existing artist entries for tags that have been aliased.
2022-02-01 12:33:45 -06:00
evazion
61c043c6b1 posts: normalize Unicode to NFC form in post sources.
Fix strings like "pokémon" (NFD form) and "pokémon" (NFC form) being
considered different strings in sources.

Also add a fix script to fix existing sources. There were only 15 posts
with unnormalized sources.
2022-01-31 14:16:49 -06:00
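The difference that 61c043c6b1 addresses can be reproduced with Ruby's built-in `String#unicode_normalize`. The `normalize_source` wrapper is a hypothetical name for illustration.

```ruby
# Two spellings that render identically but differ in code points.
NFD_SOURCE = "poke\u0301mon" # "e" followed by U+0301 COMBINING ACUTE ACCENT
NFC_SOURCE = "pok\u00e9mon"  # precomposed U+00E9

# Hypothetical wrapper: normalize a post source to NFC form.
def normalize_source(source)
  source.unicode_normalize(:nfc)
end
```

The two constants compare as unequal until both are passed through `normalize_source`, after which they are the same string.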
evazion
d2a24e6b10 Fix #4971: NoMethodError when trying to display some modreports.
Delete modreports for hard-deleted comments. There were a total of six
invalid modreports for deleted comments.
2022-01-22 18:12:07 -06:00
evazion
56722df753 forum: delete posts when topic is deleted.
Fix it so that when a forum topic is deleted, all posts in the topic are
deleted too. Also make it so that when a forum topic is undeleted, all
posts in it are undeleted too.

Before, when a topic was deleted, only the topic itself was marked as
deleted, not the posts inside the topic. This meant that when a spam
topic was deleted, the OP wouldn't be marked as deleted, so any
modreports against it wouldn't be marked as handled.

Also change it so that it's not possible to undelete a post in a deleted
topic, or to delete the OP of a topic without deleting the topic itself.

Finally, add a fix script to delete all active posts in deleted topics,
and to undelete all deleted OPs in active topics.
2022-01-21 22:35:20 -06:00
evazion
c8d27c2719 Fix #4669: Track moderation report status.
* Add ability to mark moderation reports as 'handled' or 'rejected'.
* Automatically mark reports as handled when the comment or forum post
  is deleted.
* Send a dmail to the reporter when their report is handled.
* Don't show the report notice on comments or forum posts when all
  reports against it have been handled or rejected.
* Add a fix script to mark all existing reports for deleted comments,
  forum posts, or dmails as handled.
2022-01-20 20:50:23 -06:00
evazion
98aee048f2 artists: fix old artists with invalid names.
There are a lot of old artist entries with Japanese names. These names
are now invalid and these artist entries can't be edited because they
fail validation checks.

Add a fix script to delete all artist entries with non-ASCII names, and
rename them to `artist_1234`.
2022-01-20 16:01:31 -06:00
evazion
02c9498860 artists: normalize group names.
Normalize artist group names following the same rules as artist other names.

This means artist group names now use underscores instead of spaces.
It also means extra space characters at the beginning and end of names
are stripped, and Unicode characters are normalized.

Fixes #4647, which was caused by users accidentally replacing group
names with a single space character when trying to remove a group.
2022-01-20 00:17:06 -06:00
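The rules described in 02c9498860 might look something like the following. `normalize_group_name` is hypothetical, and NFC is assumed as the Unicode form since the commit does not name one.

```ruby
# Hypothetical normalizer: Unicode-normalize (NFC assumed), trim leading and
# trailing whitespace, then replace inner whitespace runs with underscores.
def normalize_group_name(name)
  name.unicode_normalize(:nfc).strip.gsub(/\s+/, "_")
end
```

Under this sketch a group name consisting of a single space normalizes to the empty string, which can then be treated as "no group" instead of being stored as a name.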
evazion
acf565be7b Fix #4678: Validate custom CSS.
* Make it an error to add invalid custom CSS to your account.
* Add a fix script to remove custom CSS from all accounts with invalid CSS.
2022-01-15 23:20:49 -06:00
evazion
33103f6dc4 pools: add ability to search for pools linking to given tag.
Add ability to search for pools linking to a given tag in the pool
description. Example:

    https://danbooru.donmai.us/pools?search[linked_to]=touhou

(This isn't actually exposed in the UI to avoid cluttering the pool
search form with rarely used options.)

Pools with broken links can be found here:

    https://danbooru.donmai.us/dtext_links?search[has_linked_tag]=No&search[has_linked_wiki]=No&search[model_type]=Pool

Lays the groundwork for fixing #4629.
2022-01-15 20:26:30 -06:00
evazion
c3c4f5a2a7 Fix #4957: Autotag non-web_source.
Autotag non-web_source on posts whose source isn't an http:// or https:// URL.
Add a fix script to backfill old posts.

Syntactically invalid URLs are still considered web sources. For
example, `https://google,com` technically isn't a valid URL, but it's
not considered a non-web source.
2022-01-14 22:58:27 -06:00
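The rule in c3c4f5a2a7 can be approximated with a prefix check rather than full URL validation, which is why a malformed URL still counts as a web source. `web_source?` is a hypothetical helper, not the actual implementation.

```ruby
# Hypothetical check: a source counts as a web source if it starts with
# http:// or https://, even when the rest is not a valid URL.
def web_source?(source)
  source.match?(%r{\Ahttps?://}i)
end
```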
evazion
2e1c7ce6d3 Fix #4951: chartags:0 returning posts with chartags.
* Add fix script to fix posts with incorrect tag_count_* fields.
* Simplify the code for updating tag_count_* fields (no functional change).
2022-01-10 13:33:56 -06:00
evazion
85e1ae3c9b favorites: fix posts with incorrect fav_count fields.
There were about 4000 posts with an incorrect fav_count.
2022-01-09 19:31:45 -06:00
evazion
ab4214dc00 emails: mark all invalid emails as undeliverable. 2022-01-09 13:24:53 -06:00
evazion
c09cd9e9fd users: fix incorrect count columns on users table.
Fix incorrect post_upload_count, note_update_count, and
unread_dmail_count columns on the users table.
2022-01-09 12:51:10 -06:00
evazion
5623b139aa db: add foreign key constraints on all tables.
Add foreign key constraints on all foreign keys on all tables.

These constraints are deferrable so that they're checked at the end of
the transaction, rather than at the end of the statement. This is to reduce
lock duration and to allow for cyclic relationships.

Constraints are added in one migration then validated in another so that
the entire table isn't locked against reads and writes while the foreign
key constraints are being validated.

A few tables had invalid foreign keys. Add a fix script to fix these tables:

* A couple artist versions belonged to deleted artists.
* One dmail belonged to a deleted user.
* One forum topic visit belonged to that same deleted user.
* A few dozen note versions belonged to nonexistent posts. This came
  from RaisingK moving notes to different posts years ago, back when it
  was possible for users to set a note's post ID in the API.
* Some uploads had their parent ID set to 0.
2022-01-09 11:01:00 -06:00
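In PostgreSQL terms, the two-step approach from 5623b139aa corresponds to adding the constraint as `NOT VALID` and running `VALIDATE CONSTRAINT` separately. The sketch below only builds the SQL strings; the helper names and the `fk_<table>_<column>` naming scheme are illustrative, and the actual migrations are not shown in this log.

```ruby
# Builds the statement for step one: add a deferrable foreign key without
# validating existing rows. Constraint naming here is illustrative only.
def add_foreign_key_sql(table, column, ref_table)
  name = "fk_#{table}_#{column}"
  <<~SQL.strip
    ALTER TABLE #{table}
      ADD CONSTRAINT #{name} FOREIGN KEY (#{column})
      REFERENCES #{ref_table} (id)
      DEFERRABLE INITIALLY DEFERRED NOT VALID;
  SQL
end

# Builds the statement for step two: validate existing rows in a later
# migration.
def validate_foreign_key_sql(table, column)
  "ALTER TABLE #{table} VALIDATE CONSTRAINT fk_#{table}_#{column};"
end
```

Rows that violate the constraint, like the orphaned records listed above, have to be fixed before the validation step can succeed.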
evazion
3814aa21b3 favorites: delete favorites for expunged posts.
Delete favorites that have an invalid post_id because they belong to an
expunged post.

This bug of not deleting favorites after a post is expunged was fixed
long ago, but old favorites were never cleaned up.

Fixes #4711: Some users have incorrect fav count.
2022-01-08 20:55:55 -06:00
evazion
d903f45935 emails: add script to fix invalid emails.
Add a fix script to fix email addresses that are invalid or that contain
common typos. If the email can't be fixed, usually because the fixed
address is already in use by another account, then the email address is
marked undeliverable.
2022-01-02 16:08:35 -06:00
evazion
3d4d8ae2ae media assets: fix thumbnail backfill script to ignore Flash files.
We can't generate thumbnails for Flash files, so ignore them.
2021-12-05 21:48:57 -06:00
evazion
d47154a9a6 media assets: fix typo in thumbnail backfill script. 2021-12-05 19:04:21 -06:00
evazion
bcc773390b media assets: add script to backfill new thumbnail sizes. 2021-12-05 16:41:32 -06:00
evazion
be505920d1 media assets: add script to fix assets with deleted files.
Mark assets that have missing files as expunged. This happened with
uploads that were abandoned and had their files deleted, but that didn't
destroy their media asset record.

Fixes an issue where uploads could have missing files because someone
resumed an abandoned upload that had its files deleted.
2021-10-24 23:00:00 -05:00
evazion
d8de58d991 Fix bug in 079_fix_duration.rb
`assets` was unused.
2021-10-17 18:31:54 -05:00
evazion
e72446463e Fix #4901: Duplicate disapprovals
* Add uniqueness constraint on post_disapprovals (user_id, post_id).
* Add fix script to remove existing duplicates.
2021-10-12 20:22:00 -05:00
evazion
6d3d7b0bd1 Fix #4651: Favorites table contains duplicate favorites
Add fix script to remove duplicate favorites. When a user has duplicate
favorites on the same post, the earliest favorite will be kept and the
rest will be removed.
2021-10-08 05:17:01 -05:00
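The keep-the-earliest rule from 6d3d7b0bd1 can be illustrated on plain Ruby objects. The `Favorite` struct and `dedupe_favorites` are hypothetical stand-ins; the real fix script presumably works against the database rather than in memory.

```ruby
Favorite = Struct.new(:id, :user_id, :post_id, :created_at)

# Returns [kept, removed]: for each (user_id, post_id) pair the earliest
# favorite is kept and any later duplicates are removed.
def dedupe_favorites(favorites)
  kept = favorites.group_by { |fav| [fav.user_id, fav.post_id] }
                  .map { |_, group| group.min_by(&:created_at) }
  [kept, favorites - kept]
end
```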
evazion
0731b07d27 posts: store duration of animations and videos.
Start storing the duration of animations and videos in the `duration`
field on the media_assets table. This had to wait until 3d30bfd69 was
deployed, which had to wait until Postgres was upgraded in order to add
the duration column to the media_assets table without downtime.

Also add a fix script to backfill the duration on existing posts. Usage:

    TAGS=animated ./script/fixes/079_fix_duration.rb
2021-10-07 03:21:08 -05:00
evazion
0fed4b557b Remove Unicorn.
No longer used now that we use Puma in production. If you still used
Unicorn in your install, switch to `bin/rails server` instead. See
config/puma.rb for config settings.
2021-09-20 06:17:57 -05:00
evazion
4b6e706e5e Fix #4603: Total Upload Limit Being Reduced After A Failed Appeal 2021-06-28 06:04:14 -05:00
evazion
ad4c75eb1a docs: add more docs to app/{jobs,logical}.
These were missed in the last commit.
2021-06-28 05:09:19 -05:00
evazion
0563ca3001 docs: document config/ and some directories in app/.
* Add README files to several directories in app/ giving a brief
  overview of some parts of Danbooru's architecture.
* Add documentation for files in config/.
2021-06-27 05:21:38 -05:00
evazion
4439293bf1 newrelic: fix newrelic starting without license key.
Fix an issue where the New Relic agent always started in the production
environment, even when a license key wasn't configured.

Also make the New Relic agent log to stdout instead of log/newrelic_agent.log.
2021-05-24 21:58:01 -05:00
evazion
c22df03804 Move script/delayed_job to bin/delayed_job.
This has been the recommended location since Rails 4.
2021-05-24 17:38:56 -05:00