Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.
AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.
The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.
You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).
You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.
To generate tags, use the `autotag` script from the Autotagger repo, something like this:
docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz
To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
Add a system for upgrading accounts using upgrade codes. Users purchase
an upgrade code off-site then redeem it on-site to upgrade their account
to Gold. Upgrade codes are randomly pre-generated and are one time use
only. Codes have enough randomness that guessing a code is infeasible.
`file_key` is a random 9-character base-62 string that will be used as
the image filename in the future.
`is_public` is whether the image can be viewed without authentication or not.
Users running downstream boorus must run `bin/rails db:migrate` and
`script/fixes/109_generate_media_asset_file_keys.rb` after this commit.
Fix the Tags field in the BUR search form not finding all BURs
mentioning that tag. Specifically, tags that were part of a mass update,
and that were prefixed with `~` or `-` (OR tags and NOT tags), weren't
indexed as tags affected by the BUR.
This requires re-running script/fixes/064_initialize_bulk_update_request_tags.rb
to fix old BURs.
Rationale:
* The spoilers tag is the most frequently removed tag from the default blacklist.
* It's frustrating for regular users to have posts randomly hidden because of trivial
spoilers from a series they don't care about.
* The spoilers tag is used way too liberally for things that aren't considered
spoilers on other sites.
* If you're looking up fanart on the internet, you should expect to see a certain
level of spoilers.
* The tag is used very inconsistently, with some characters like Nia_(blade)_(xenoblade)
getting the spoilers tag half the time and the rest of the time not.
* Save the filename for files uploaded from disk. This could be used in
the future to extract source data if the filename is from a known site.
* Save both the image URL and the page URL for files uploaded from
source. This is needed for multi-file uploads. The image URL is the
URL of the file actually downloaded from the source. This can be
different from the URL given by the user, if the user tried to upload
a sample URL and we automatically changed it to the original URL. The
page URL is the URL of the page containing the image. We don't always
know this, for example if someone uploads a Twitter image without the
bookmarklet, then we can't find the page URL.
* Add a fix script to backfill URLs for existing uploads. For file
uploads, the filename will be set to "unknown.jpg". For source
uploads, we fetch the source data again to get the image and page
URLs. This may fail for uploads that have been deleted from the
source since uploading.
Delete all old upload records from before the upload rework in abdab7a0a
/ f11c46b4f. Uploads from before the rework don't have any attached
media assets, so they're not valid under the new system because we can't
find which files they were for.
Before the rework, completed uploads were only saved for 1 hour, and
failed uploads were only saved for 3 days, so deleting this data
doesn't really lose anything that wouldn't have been deleted before.
Fix strings like "pokémon" (NFD form) and "pokémon" (NFC form) being
considered different strings in sources.
Also add a fix script to fix existing sources. There were only 15 posts
with unnormalized sources.
Fix it so that when a forum topic is deleted, all posts in the topic are
deleted too. Also make it so that when a forum topic is undeleted, all
posts in it are undeleted too.
Before when a topic was deleted, only the topic itself was marked as
deleted, not the posts inside the topic. This meant that when a spam
topic was deleted, the OP wouldn't be marked as deleted, so any
modreports against it wouldn't be marked as handled.
Also change it so that it's not possible to undelete a post in a deleted
topic, or to delete the OP of a topic without deleting the topic itself.
Finally, add a fix script to delete all active posts in deleted topics,
and to undelete all deleted OPs in active topics.
* Add ability to mark moderation reports as 'handled' or 'rejected'.
* Automatically mark reports as handled when the comment or forum post
is deleted.
* Send a dmail to the reporter when their report is handled.
* Don't show the report notice on comments or forum posts when all
reports against it have been handled or rejected.
* Add a fix script to mark all existing reports for deleted comments,
forum posts, or dmails as handled.
There are a lot of old artist entries with Japanese names. These names
are now invalid and these artist entries can't be edited because they
fail validation checks.
Add a fix script to delete all artist entries with non-ASCII names, and
rename them to `artist_1234`.
Normalize artist group names following the same rules as artist other names.
This means artist group names now use underscores instead of spaces.
It also means extra space characters at the beginning and end of names
is stripped, and Unicode characters are normalized.
Fixes#4647, which was caused by users accidentally replacing group
names with a single space character when trying to remove a group.
Autotag non-web_source on posts that have a non-http:// or https:// URL.
Add a fix script to backfill old posts.
Syntactically invalid URLs are still considered web sources. For
example, `https://google,com` technically isn't a valid URL, but it's
not considered a non-web source.
Add foreign key constraints on all foreign keys on all tables.
These constraints are deferrable so that they're checked at the end of
the transaction, rather at the end of the statement. This is to reduce
lock duration and to allow for cyclic relationships.
Constraints are added in one migration then validated in another so that
the entire table isn't locked against reads and writes while the foreign
key constraints are being validated.
A few tables had invalid foreign keys. Add a fix script to fix these tables:
* A couple artist versions belonged to deleted artists.
* One dmail belonged to a deleted user.
* One forum topic visit belonged to that same deleted user.
* A few dozen note versions belonged to nonexistent posts. This came
from RaisingK moving notes to different posts years ago, back when it
was possible for users to set a note's post ID in the API.
* Some uploads had their parent ID set to 0.
Delete favorites that have an invalid post_id because they belong to an
expunged post.
This bug of not deleting favorites after a post is expunged was fixed
long ago, but old favorites were never cleaned up.
Fixes#4711: Some users have incorrect fav count.
Add a fix script to fix email addresses that are invalid or that contain
common typos. If the email can't be fixed, usually because the fixed
address is already in use by another account, then the email address is
marked undeliverable.
Mark assets that have missing files as expunged. This happened with
uploads that were abandoned and had their files deleted, but that didn't
destroy their media asset record.
Fixes an issue where uploads could have missing files because someone
resumed an abandoned upload that had its files deleted.
Add fix script to remove duplicate favorites. When a user has duplicate
favorites on the same post, the earliest favorite will be kept and the
rest will be removed.
Start storing the duration of animations and videos in the `duration`
field on the media_assets table. This had to wait until 3d30bfd69 was
deployed, which had to wait until Postgres was upgraded in order to add
the duration column to the media_assets table without downtime.
Also add a fix script to backfill the duration on existing posts. Usage:
TAGS=animated ./script/fixes/079_fix_duration.rb
* Add README files to several directories in app/ giving a brief
overview of some parts of Danbooru's architecture.
* Add documentation for files in config/.
* Don't set the inviter field for newly promoted users, or for Gold/Plat
upgrades.
* Clear the inviter field for paid Gold/Plat upgrades, and for users who
have a feedback or a modaction listing who invited them. This leaves
about 600 remaining users with an inviter field with no other record
of who invited them.
See #4750.
There are about 100 duplicate comment votes. This is because there
wasn't a uniqueness constraint in the database to prevent duplicate
votes. This adds a script to remove duplicate votes so that a constraint
can be added later.
* Set the default comment threshold to -8. This means that comments are
hidden at -8 or lower and greyed out at -4 or lower.
* Reset the comment threshold to -8 for anyone with a threshold greater
than -8. For reference, only about ~3100 users had a non-default
threshold. About 1600 of those had their threshold reset to -8.
* Change the comment threshold to a less-than-or-equal comparison
instead of a less-than comparsion. This means that a threshold of 0
before is the same as a threshold of -1 now. Since everyone's
thresholds were reset, this only affects people whose thresholds were
already less than -8, which is so low that the difference shouldn't
matter much.
* Set the maximum comment threshold to 5. For reference, less than 1% of
comments have a score greater than 5.
* Set the minimum comment threshold to -100. For reference, the most
downvoted comment has a score of -60.
Fix wiki pages and artists to normalize other names more consistently
and correctly.
For wiki pages, we strip leading / trailing / repeated underscores to
fix user typos, and we normalize to NFKC form to make search more consistent.
For artists, we allow leading / trailing / repeated underscores because
some artist names have these, and we normalize to NFC form because some
artists have weird names that would be lost by NFKC form. This does make
search less consistent.