Add AI tag model and UI.

Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. The
schema is designed to be as space-efficient as possible, since in production we have over 300
million AI-generated tags (6 million images at 50 tags per image). This amounts to over 10GB of
table data, plus indexes.
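
As a rough sanity check on those figures, the arithmetic can be sketched as below. The per-row
overhead values are assumptions, not something this commit states: they presume PostgreSQL heap
storage on a 64-bit build (tuple header padded to 24 bytes, rows aligned to 8 bytes, a 4-byte line
pointer per row) and ignore page headers and free space. Treat the result as an estimate, not a
measurement.

```ruby
# Back-of-the-envelope size estimate for the ai_tags table (indexes excluded).
# All overhead constants are assumptions about PostgreSQL heap storage.

# Round `bytes` up to the next multiple of `boundary`.
def align_up(bytes, boundary)
  ((bytes + boundary - 1) / boundary) * boundary
end

def estimate_ai_tags_size(images: 6_000_000, tags_per_image: 50)
  column_bytes = 4 + 4 + 2 # media_asset_id integer, tag_id integer, score smallint
  tuple_header = 24        # assumed: row header padded to an 8-byte boundary
  line_pointer = 4         # assumed: per-row slot in the page's item array

  rows = images * tags_per_image
  row_bytes = align_up(tuple_header + column_bytes, 8) + line_pointer

  { rows: rows, row_bytes: row_bytes, total_gb: (rows * row_bytes / 1e9).round(1) }
end

estimate = estimate_ai_tags_size
puts "#{estimate[:rows]} rows x #{estimate[:row_bytes]} bytes/row = ~#{estimate[:total_gb]} GB"
```

Under these assumptions the fixed per-row overhead is larger than the 10 bytes of column data,
which is why keeping the columns narrow only goes so far.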

You can search for AI tags using e.g. `ai:scenery`. Use `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or, more likely, where the AI missed the tag).
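
Both searches are set differences between the posts a human tagged and the posts the AI tagged.
A minimal sketch of that idea, using hypothetical in-memory hashes (post id => tag names) instead
of the real query builder:

```ruby
require "set"

# Ids of the posts whose tag list includes `tag`.
def posts_with(tags_by_post, tag)
  tags_by_post.select { |_id, tags| tags.include?(tag) }.keys.to_set
end

# Equivalent of `ai:<tag> -<tag>`: predicted by the AI, but not tagged by a human.
def possibly_missing(human_tags, ai_tags, tag)
  posts_with(ai_tags, tag) - posts_with(human_tags, tag)
end

# Equivalent of `<tag> -ai:<tag>`: tagged by a human, but not predicted by the AI.
def possibly_mistagged(human_tags, ai_tags, tag)
  posts_with(human_tags, tag) - posts_with(ai_tags, tag)
end

# Hypothetical sample data for illustration only.
human_tags = { 1 => %w[scenery sky], 2 => %w[1girl], 3 => %w[scenery] }
ai_tags    = { 1 => %w[scenery], 2 => %w[scenery 1girl], 3 => %w[1girl] }

p possibly_missing(human_tags, ai_tags, "scenery").to_a
p possibly_mistagged(human_tags, ai_tags, "scenery").to_a
```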

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. For a Danbooru-size dataset, expect tag
generation to take hours to days and the import to take another 20-30 minutes. Currently this all
has to be done by hand.
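
For reference, a gzipped CSV like `tags.csv.gz` can be streamed in Ruby without unpacking it to
disk first. This is only a sketch and not the actual fix script: the column layout (filename, tag,
confidence) is an assumption made for illustration, and `each_prediction` is a hypothetical helper
name.

```ruby
require "csv"
require "zlib"

# Yields one [filename, tag, confidence] triple per CSV row.
# The three-column layout is assumed; adjust it to the autotagger's real output.
def each_prediction(path)
  return enum_for(:each_prediction, path) unless block_given?

  Zlib::GzipReader.open(path) do |gz|
    CSV.new(gz).each do |filename, tag, confidence|
      yield filename, tag, Float(confidence)
    end
  end
end
```

Calling `each_prediction("tags.csv.gz").first(5)` would then return the first five rows for a
quick look before starting a long import.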
evazion
2022-06-24 04:35:29 -05:00
parent ae9495ec7c
commit 1aeb52186e
20 changed files with 247 additions and 3 deletions


@@ -211,6 +211,8 @@ class AutocompleteService
   autocomplete_favorite_group(value)
 when :search
   autocomplete_saved_search_label(value)
+when :ai, :unaliased
+  autocomplete_tag(value)
 when *STATIC_METATAGS.keys
   autocomplete_static_metatag(metatag, value)
 else


@@ -430,7 +430,13 @@ module Searchable
 end
 if model == Post && params["#{attr}_tags_match"].present?
-  relation = relation.where(attr => Post.user_tag_match(params["#{attr}_tags_match"], current_user).reorder(nil))
+  posts = Post.user_tag_match(params["#{attr}_tags_match"], current_user).reorder(nil)
+  if association.through_reflection?
+    relation = relation.includes(association.through_reflection.name).where(association.through_reflection.name => { attr => posts })
+  else
+    relation = relation.where(attr => posts)
+  end
 end
 if params["has_#{attr}"].to_s.truthy? || params["has_#{attr}"].to_s.falsy?


@@ -38,7 +38,7 @@ class PostQueryBuilder
   ordpool note comment commentary id rating source status filetype
   disapproved parent child search embedded md5 width height mpixels ratio
   score upvotes downvotes favcount filesize date age order limit tagcount pixiv_id pixiv
-  unaliased exif duration random is has
+  unaliased exif duration random is has ai
 ] + COUNT_METATAGS + COUNT_METATAG_SYNONYMS + CATEGORY_COUNT_METATAGS
 ORDER_METATAGS = %w[
@@ -163,6 +163,8 @@ class PostQueryBuilder
   relation.tags_include(value)
 when "exif"
   relation.exif_matches(value)
+when "ai"
+  relation.ai_tags_include(value)
 when "user"
   relation.uploader_matches(value)
 when "approver"