Commit Graph

179 Commits

Author SHA1 Message Date
evazion
ec382357b8 tags: populate words column.
Add code for parsing tags into words and for populating the `words` column
in the tags table.
2022-09-01 23:54:07 -05:00
evazion
1aeb52186e Add AI tag model and UI.
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.

You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
2022-06-24 04:54:26 -05:00
evazion
fec92d765a users: change default blacklist to furry -rating:g. 2022-06-02 00:06:34 -05:00
evazion
173e43b192 user upgrades: add upgrade code system.
Add a system for upgrading accounts using upgrade codes. Users purchase
an upgrade code off-site then redeem it on-site to upgrade their account
to Gold. Upgrade codes are randomly pre-generated and are one time use
only. Codes have enough randomness that guessing a code is infeasible.
2022-06-01 18:31:46 -05:00
evazion
4ba993319a media assets: add file_key, is_public columns.
`file_key` is a random 9-character base-62 string that will be used as
the image filename in the future.

`is_public` is whether the image can be viewed without authentication or not.

Users running downstream boorus must run `bin/rails db:migrate` and
`script/fixes/109_generate_media_asset_file_keys.rb` after this commit.
2022-05-04 23:19:53 -05:00
evazion
703fd05025 favgroups: don't allow favgroups to be named 'any' or 'none'.
'any' and 'none' are now reserved keywords for the favgroup: metatag.

Also add a fix script to rename existing favgroups.
2022-04-17 23:17:18 -05:00
evazion
226faae8ec BURs: fix tags field not finding all BURs with that tag.
Fix the Tags field in the BUR search form not finding all BURs
mentioning that tag. Specifically, tags that were part of a mass update,
and that were prefixed with `~` or `-` (OR tags and NOT tags), weren't
indexed as tags affected by the BUR.

This requires re-running script/fixes/064_initialize_bulk_update_request_tags.rb
to fix old BURs.
2022-03-29 21:06:24 -05:00
evazion
4b1264991f users: remove 'spoilers' tag from default blacklist.
Rationale:

* The spoilers tag is the most frequently removed tag from the default blacklist.
* It's frustrating for regular users to have posts randomly hidden because of trivial
  spoilers from a series they don't care about.
* The spoilers tag is used way too liberally for things that aren't considered
  spoilers on other sites.
* If you're looking up fanart on the internet, you should expect to see a certain
  level of spoilers.
* The tag is used very inconsistently, with some characters like Nia_(blade)_(xenoblade)
  getting the spoilers tag half the time and the rest of the time not.
2022-03-20 16:49:36 -05:00
evazion
04c03fa4e6 artist: normalize more artist url formats. 2022-03-16 17:17:50 -05:00
evazion
04226d3409 pixiv: normalize pixiv urls in artist entries.
Normalize Pixiv URLs to `https://www.pixiv.net/users/1234` format.
2022-03-14 16:43:19 -05:00
evazion
223742c365 weibo: normalize weibo urls in artist entries.
Normalize all Weibo URLs in artist entries to one of these forms:

* https://www.weibo.com/u/5399876326
* https://www.weibo.com/p/1005055399876326
* https://www.weibo.com/chengziyou666
2022-03-13 21:16:56 -05:00
evazion
eb032d54c1 uploads: set upload_media_asset.status to active.
Fix the status being set to pending instead of active for new upload
media assets.
2022-02-14 00:40:40 -06:00
evazion
04d242c60c uploads: save filename, image URL, page URL for uploads.
* Save the filename for files uploaded from disk. This could be used in
  the future to extract source data if the filename is from a known site.

* Save both the image URL and the page URL for files uploaded from
  source. This is needed for multi-file uploads. The image URL is the
  URL of the file actually downloaded from the source. This can be
  different from the URL given by the user, if the user tried to upload
  a sample URL and we automatically changed it to the original URL. The
  page URL is the URL of the page containing the image. We don't always
  know this, for example if someone uploads a Twitter image without the
  bookmarklet, then we can't find the page URL.

* Add a fix script to backfill URLs for existing uploads. For file
  uploads, the filename will be set to "unknown.jpg". For source
  uploads, we fetch the source data again to get the image and page
  URLs. This may fail for uploads that have been deleted from the
  source since uploading.
2022-02-12 15:22:41 -06:00
evazion
9a23970ab1 uploads: fix media_asset_count. 2022-02-12 15:22:24 -06:00
evazion
1a61e329ba uploads: add column for error messages.
Change it so uploads store errors in an `error` column instead of in the
`status` field.
2022-02-07 15:44:39 -06:00
evazion
19a9cf3d2f uploads: delete old upload records from before the rework.
Delete all old upload records from before the upload rework in abdab7a0a
/ f11c46b4f. Uploads from before the rework don't have any attached
media assets, so they're not valid under the new system because we can't
find which files they were for.

Before the rework, completed uploads were only saved for 1 hour, and
failed uploads were only saved for 3 days, so deleting this data
doesn't really lose anything that wouldn't have been deleted before.
2022-02-07 15:11:09 -06:00
evazion
6d2a2eee59 Fix #4017: Artist tag in upload page should account for aliases
Disallow creating artist entries for aliased tags. Add a fix script to
move existing artist entries for tags that have been aliased.
2022-02-01 12:33:45 -06:00
evazion
61c043c6b1 posts: normalize Unicode to NFC form in post sources.
Fix strings like "pokémon" (NFD form) and "pokémon" (NFC form) being
considered different strings in sources.

Also add a fix script to fix existing sources. There were only 15 posts
with unnormalized sources.
2022-01-31 14:16:49 -06:00
evazion
d2a24e6b10 Fix #4971: NoMethodError when trying to display some modreports.
Delete modreports for hard-deleted comments. There were a total of six
invalid modreports for deleted comments.
2022-01-22 18:12:07 -06:00
evazion
56722df753 forum: delete posts when topic is deleted.
Fix it so that when a forum topic is deleted, all posts in the topic are
deleted too. Also make it so that when a forum topic is undeleted, all
posts in it are undeleted too.

Before when a topic was deleted, only the topic itself was marked as
deleted, not the posts inside the topic. This meant that when a spam
topic was deleted, the OP wouldn't be marked as deleted, so any
modreports against it wouldn't be marked as handled.

Also change it so that it's not possible to undelete a post in a deleted
topic, or to delete the OP of a topic without deleting the topic itself.

Finally, add a fix script to delete all active posts in deleted topics,
and to undelete all deleted OPs in active topics.
2022-01-21 22:35:20 -06:00
evazion
c8d27c2719 Fix #4669: Track moderation report status.
* Add ability to mark moderation reports as 'handled' or 'rejected'.
* Automatically mark reports as handled when the comment or forum post
  is deleted.
* Send a dmail to the reporter when their report is handled.
* Don't show the report notice on comments or forum posts when all
  reports against it have been handled or rejected.
* Add a fix script to mark all existing reports for deleted comments,
  forum posts, or dmails as handled.
2022-01-20 20:50:23 -06:00
evazion
98aee048f2 artists: fix old artists with invalid names.
There are a lot of old artist entries with Japanese names. These names
are now invalid and these artist entries can't be edited because they
fail validation checks.

Add a fix script to delete all artist entries with non-ASCII names, and
rename them to `artist_1234`.
2022-01-20 16:01:31 -06:00
evazion
02c9498860 artists: normalize group names.
Normalize artist group names following the same rules as artist other names.

This means artist group names now use underscores instead of spaces.
It also means extra space characters at the beginning and end of names
is stripped, and Unicode characters are normalized.

Fixes #4647, which was caused by users accidentally replacing group
names with a single space character when trying to remove a group.
2022-01-20 00:17:06 -06:00
evazion
acf565be7b Fix #4678: Validate custom CSS.
* Make it an error to add invalid custom CSS to your account.
* Add a fix script to remove custom CSS from all accounts with invalid CSS.
2022-01-15 23:20:49 -06:00
evazion
33103f6dc4 pools: add ability to search for pools linking to given tag.
Add ability to search for pools linking to a given tag in the pool
description. Example:

    https://danbooru.donmai.us/pools?search[linked_to]=touhou

(This isn't actually exposed in the UI to avoid cluttering the pool
search form with rarely used options.)

Pools with broken links can be found here:

    https://danbooru.donmai.us/dtext_links?search[has_linked_tag]=No&search[has_linked_wiki]=No&search[model_type]=Pool

Lays the groundwork for fixing #4629.
2022-01-15 20:26:30 -06:00
evazion
c3c4f5a2a7 Fix #4957: Autotag non-web_source.
Autotag non-web_source on posts that have a non-http:// or https:// URL.
Add a fix script to backfill old posts.

Syntactically invalid URLs are still considered web sources. For
example, `https://google,com` technically isn't a valid URL, but it's
not considered a non-web source.
2022-01-14 22:58:27 -06:00
evazion
2e1c7ce6d3 Fix #4951: chartags:0 returning posts with chartags.
* Add fix script to fix posts with incorrect tag_count_* fields.
* Simplify the code for updating tag_count_* fields (no functional change).
2022-01-10 13:33:56 -06:00
evazion
85e1ae3c9b favorites: fix posts with incorrect fav_count fields.
There were about 4000 posts with an incorrect fav_count.
2022-01-09 19:31:45 -06:00
evazion
ab4214dc00 emails: mark all invalid emails as undeliverable. 2022-01-09 13:24:53 -06:00
evazion
c09cd9e9fd users: fix incorrect count columns on users table.
Fix incorrect post_upload_count, note_update_count, and
unread_dmail_count columns on the users table.
2022-01-09 12:51:10 -06:00
evazion
5623b139aa db: add foreign key constraints on all tables.
Add foreign key constraints on all foreign keys on all tables.

These constraints are deferrable so that they're checked at the end of
the transaction, rather at the end of the statement. This is to reduce
lock duration and to allow for cyclic relationships.

Constraints are added in one migration then validated in another so that
the entire table isn't locked against reads and writes while the foreign
key constraints are being validated.

A few tables had invalid foreign keys. Add a fix script to fix these tables:

* A couple artist versions belonged to deleted artists.
* One dmail belonged to a deleted user.
* One forum topic visit belonged to that same deleted user.
* A few dozen note versions belonged to nonexistent posts. This came
  from RaisingK moving notes to different posts years ago, back when it
  was possible for users to set a note's post ID in the API.
* Some uploads had their parent ID set to 0.
2022-01-09 11:01:00 -06:00
evazion
3814aa21b3 favorites: delete favorites for expunged posts.
Delete favorites that have an invalid post_id because they belong to an
expunged post.

This bug of not deleting favorites after a post is expunged was fixed
long ago, but old favorites were never cleaned up.

Fixes #4711: Some users have incorrect fav count.
2022-01-08 20:55:55 -06:00
evazion
d903f45935 emails: add script to fix invalid emails.
Add a fix script to fix email addresses that are invalid or that contain
common typos. If the email can't be fixed, usually because the fixed
address is already in use by another account, then the email address is
marked undeliverable.
2022-01-02 16:08:35 -06:00
evazion
3d4d8ae2ae media assets: fix thumbnail backfill script to ignore Flash files.
We can't generate thumbnails for Flash files, so ignore them.
2021-12-05 21:48:57 -06:00
evazion
d47154a9a6 media assets: fix typo in thumbnail backfill script. 2021-12-05 19:04:21 -06:00
evazion
bcc773390b media assets: add script to backfill new thumbnail sizes. 2021-12-05 16:41:32 -06:00
evazion
be505920d1 media assets: add script to fix assets with deleted files.
Mark assets that have missing files as expunged. This happened with
uploads that were abandoned and had their files deleted, but that didn't
destroy their media asset record.

Fixes an issue where uploads could have missing files because someone
resumed an abandoned upload that had its files deleted.
2021-10-24 23:00:00 -05:00
evazion
d8de58d991 Fix bug in 079_fix_duration.rb
`assets` was unused.
2021-10-17 18:31:54 -05:00
evazion
e72446463e Fix #4901: Duplicate disapprovals
* Add uniqueness constraint on post_disapprovals (user_id, post_id).
* Add fix script to remove existing duplicates.
2021-10-12 20:22:00 -05:00
evazion
6d3d7b0bd1 Fix #4651: Favorites table contains duplicate favorites
Add fix script to remove duplicate favorites. When a user has duplicate
favorites on the same post, the earliest favorite will be kept and the
rest will be removed.
2021-10-08 05:17:01 -05:00
evazion
0731b07d27 posts: store duration of animations and videos.
Start storing the duration of animations and videos in the `duration`
field on the media_assets table. This had to wait until 3d30bfd69 was
deployed, which had to wait until Postgres was upgraded in order to add
the duration column to the media_assets table without downtime.

Also add a fix script to backfill the duration on existing posts. Usage:

    TAGS=animated ./script/fixes/079_fix_duration.rb
2021-10-07 03:21:08 -05:00
evazion
4b6e706e5e Fix #4603: Total Upload Limit Being Reduced After A Failed Appeal 2021-06-28 06:04:14 -05:00
evazion
ad4c75eb1a docs add more docs to app/{jobs,logical}.
These were missed in the last commit.
2021-06-28 05:09:19 -05:00
evazion
0563ca3001 docs: document config/ and some directories in app/.
* Add README files to several directories in app/ giving a brief
  overview of some parts of Danbooru's architecture.
* Add documentation for files in config/.
2021-06-27 05:21:38 -05:00
GlassedSilver
4cf62c520c Script unflat (#4798)
Added fix script to unflatten data w/o symlinking
2021-05-15 02:58:56 -05:00
evazion
29d2e7fed2 storage manager: remove hierarchical option.
Remove the `hierarchical` file storage option. This means that image
files are always stored in MD5-based subdirectories, like this:

   https://danbooru.donmai.us/data/original/f3/a7/f3a70a89c350b5ed4db22dbb25b934bb.jpg
   https://danbooru.donmai.us/data/sample/f3/a7/sample-f3a70a89c350b5ed4db22dbb25b934bb.jpg
   https://danbooru.donmai.us/data/preview/f3/a7/f3a70a89c350b5ed4db22dbb25b934bb.jpg

instead of in a single flat directory, like this:

   https://danbooru.donmai.us/data/original/f3a70a89c350b5ed4db22dbb25b934bb.jpg

This option is removed because storing files in a single directory is a
bad idea for large installations, and migrating from a single directory
to subdirectories later is a pain.

Downstream boorus who still have files in the old layout can migrate by
running this script:

   `./script/fixes/077_symlink_subdirectories.rb`

This will create symlinks that redirect the 00-ff subdirectories back to
the current directory, so that you can still store files in a single
directory, but use URLs containing subdirectories.

You should also make sure to remove the `hierarchical` option from
`storage_manager` in `config/danbooru_local_config.rb` if you set it
there.
2021-03-18 01:33:56 -05:00
evazion
921dde6522 users: don't set inviter field; clear inviter field for most users.
* Don't set the inviter field for newly promoted users, or for Gold/Plat
  upgrades.

* Clear the inviter field for paid Gold/Plat upgrades, and for users who
  have a feedback or a modaction listing who invited them. This leaves
  about 600 remaining users with an inviter field with no other record
  of who invited them.

See #4750.
2021-03-08 03:16:34 -06:00
evazion
0a8fe506b4 posts: add script to remove duplicate votes.
There were 67 duplicate votes in the production database.
2021-01-29 02:09:30 -06:00
evazion
c9570e698b comments: add fix script to remove duplicate votes.
There are about 100 duplicate comment votes. This is because there
wasn't a uniqueness constraint in the database to prevent duplicate
votes. This adds a script to remove duplicate votes so that a constraint
can be added later.
2021-01-21 07:58:50 -06:00
evazion
9af407c94b comments: set default comment threshold to -8.
* Set the default comment threshold to -8. This means that comments are
  hidden at -8 or lower and greyed out at -4 or lower.

* Reset the comment threshold to -8 for anyone with a threshold greater
  than -8. For reference, only about ~3100 users had a non-default
  threshold. About 1600 of those had their threshold reset to -8.

* Change the comment threshold to a less-than-or-equal comparison
  instead of a less-than comparsion. This means that a threshold of 0
  before is the same as a threshold of -1 now. Since everyone's
  thresholds were reset, this only affects people whose thresholds were
  already less than -8, which is so low that the difference shouldn't
  matter much.

* Set the maximum comment threshold to 5. For reference, less than 1% of
  comments have a score greater than 5.

* Set the minimum comment threshold to -100. For reference, the most
  downvoted comment has a score of -60.
2021-01-19 05:48:25 -06:00