danbooru/script/fixes
evazion 1aeb52186e Add AI tag model and UI.
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.
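
As a rough sanity check on those numbers, the row count and table size can be estimated with a few lines of Ruby. This is only a sketch: the row counts come from the text above, but the per-row overhead constants are assumptions about PostgreSQL's heap layout, not measurements from production, and page headers and indexes are ignored.

```ruby
# Back-of-the-envelope size estimate for the ai_tags table.
# Row counts are from the commit message; the overhead figures below are
# assumptions about PostgreSQL's row layout, not measured values.

IMAGES        = 6_000_000
TAGS_PER_POST = 50

COLUMN_BYTES = 4 + 4 + 2 # media_asset_id integer + tag_id integer + score smallint
TUPLE_HEADER = 24        # assumed per-row header size after padding
LINE_POINTER = 4         # assumed per-row pointer stored in the page
ALIGNMENT    = 8         # assumed row alignment boundary

# Round a byte count up to the next multiple of the boundary.
def align(bytes, boundary)
  ((bytes + boundary - 1) / boundary) * boundary
end

# Estimated heap size in bytes, excluding page headers and indexes.
def estimated_table_bytes
  rows = IMAGES * TAGS_PER_POST
  row_bytes = align(TUPLE_HEADER + COLUMN_BYTES, ALIGNMENT) + LINE_POINTER
  rows * row_bytes
end

puts "rows: #{IMAGES * TAGS_PER_POST}"
puts "estimated heap size: #{(estimated_table_bytes / 1e9).round(1)} GB"
```

Under these assumptions the fixed row overhead is larger than the 10 bytes of column data, which is why keeping the columns narrow matters at this row count.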

You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).
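
The logic of these two searches can be illustrated on toy data. This is a minimal sketch of the set logic only, not Danbooru's search implementation; the `Post` struct, the sample posts, and the method names are all hypothetical.

```ruby
require "set"

# Hypothetical posts with human-applied tags and AI-predicted tags.
Post = Struct.new(:id, :tags, :ai_tags)

POSTS = [
  Post.new(1, Set["scenery", "sky"], Set["scenery", "sky"]),
  Post.new(2, Set["1girl"],          Set["1girl", "scenery"]),
  Post.new(3, Set["scenery"],        Set["cloud"]),
]

# Equivalent of `ai:TAG -TAG`: the AI predicted the tag, but the post lacks it.
def potentially_missing(posts, tag)
  posts.select { |post| post.ai_tags.include?(tag) && !post.tags.include?(tag) }
end

# Equivalent of `TAG -ai:TAG`: the post has the tag, but the AI did not predict it.
def potentially_mistagged(posts, tag)
  posts.select { |post| post.tags.include?(tag) && !post.ai_tags.include?(tag) }
end

p potentially_missing(POSTS, "scenery").map(&:id)   # prints [2]
p potentially_mistagged(POSTS, "scenery").map(&:id) # prints [3]
```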

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. For a Danbooru-size dataset, expect tag
generation to take hours to days and the import another 20-30 minutes. Currently this all has to be done by hand.
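
Before importing, it can be useful to spot-check the gzipped output of the `autotag` command. The sketch below streams a gzipped file line by line without unpacking it to disk. The row layout `filename,tag,score`, the absence of quoted fields, the `Prediction` struct, and the 0.5 threshold are all assumptions made for illustration; check the Autotagger repo for the actual output format.

```ruby
require "zlib"
require "stringio"

# ASSUMPTION: each row is "filename,tag,score" with no quoted fields.
Prediction = Struct.new(:filename, :tag, :score)

# Stream predictions from a gzip-compressed IO, one row at a time.
# Note that GzipReader.wrap closes the underlying IO when it finishes.
def each_prediction(io)
  return enum_for(:each_prediction, io) unless block_given?

  Zlib::GzipReader.wrap(io) do |gz|
    gz.each_line do |line|
      # Split from both ends so a comma inside the tag name is preserved.
      filename, rest = line.chomp.split(",", 2)
      tag, _, score = rest.rpartition(",")
      yield Prediction.new(filename, tag, Float(score))
    end
  end
end

# Build a tiny gzipped sample in memory so the example runs without a real file.
sample = StringIO.new
writer = Zlib::GzipWriter.new(sample)
writer.write("abc.jpg,scenery,0.93\nabc.jpg,sky,0.41\n")
writer.finish # writes the gzip trailer but leaves the StringIO open
sample.rewind

# Print only the predictions at or above the hypothetical threshold.
each_prediction(sample)
  .select { |prediction| prediction.score >= 0.5 }
  .each { |prediction| puts "#{prediction.filename} #{prediction.tag}" }
```

For a real file, something like `File.open("tags.csv.gz", "rb") { |f| each_prediction(f) { |row| ... } }` would take the place of the in-memory sample.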
2022-06-24 04:54:26 -05:00

Fixes

This directory contains one-off scripts written to fix specific problems with the production database. Most of them only make sense for that database and aren't meant to be run by downstream users.

Most of the older scripts here no longer work because of changes to the code over the years.