Add AI tag model and UI.

Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images at 50 tags per image). The table is over 10GB in size, plus
indexes.
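
As a rough sanity check on those numbers, here is a back-of-the-envelope estimate in Ruby. The row
and column counts come from the paragraph above; the per-row overhead (a 24-byte tuple header,
8-byte alignment, and a 4-byte line pointer) is an assumption based on a PostgreSQL-style heap
layout, not something measured on the production table.

```ruby
# Rough size estimate for the ai_tags table (a sketch, not a measurement).
IMAGES = 6_000_000            # from the commit message
TAGS_PER_IMAGE = 50           # from the commit message
ROWS = IMAGES * TAGS_PER_IMAGE

# Column payload: two 4-byte integers plus one 2-byte smallint.
PAYLOAD_BYTES = 4 + 4 + 2

# Assumed per-row overhead for a PostgreSQL-style heap table.
HEADER_BYTES = 24
LINE_POINTER_BYTES = 4

# Round header + payload up to the next multiple of 8 bytes (integer division).
TUPLE_BYTES = (HEADER_BYTES + PAYLOAD_BYTES + 7) / 8 * 8
ROW_BYTES = TUPLE_BYTES + LINE_POINTER_BYTES

TOTAL_GB = ROWS * ROW_BYTES / 1e9

puts "rows: #{ROWS}"
puts "bytes per row: #{ROW_BYTES}"
puts format("approx. table size: %.1f GB", TOTAL_GB)
```

Under these assumptions the estimate lands a little above 10GB before indexes, which is in line
with the figure quoted above. It also shows why a 2-byte `smallint` score is enough to keep the
row inside a single 8-byte alignment step.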

You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or, more likely, posts where the AI missed the tag).
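
The two queries are set differences between the human tags and the AI tags. The sketch below
illustrates this on in-memory data. The sample posts, the helper names, and the data layout are
all hypothetical; the threshold of 50 mirrors the `score: (50..)` condition in `ai_tags_include`
in this commit.

```ruby
require "set"

# Minimum AI score for a tag to count as a match (mirrors `score: (50..)`).
THRESHOLD = 50

# Hypothetical sample data: post id => human-applied tags.
POST_TAGS = {
  1 => Set["scenery", "sky"],
  2 => Set["sky"],
  3 => Set["scenery"],
}.freeze

# Hypothetical sample data: post id => AI tag scores.
AI_SCORES = {
  1 => { "scenery" => 93 },
  2 => { "scenery" => 71 },
  3 => { "scenery" => 12 },
}.freeze

def tagged?(post_id, tag)
  POST_TAGS.fetch(post_id).include?(tag)
end

def ai_tagged?(post_id, tag)
  AI_SCORES.fetch(post_id, {}).fetch(tag, 0) >= THRESHOLD
end

# Equivalent of `ai:<tag> -<tag>`: the AI predicts the tag, but the post lacks it.
def possibly_missing(tag)
  POST_TAGS.keys.select { |id| ai_tagged?(id, tag) && !tagged?(id, tag) }
end

# Equivalent of `<tag> -ai:<tag>`: the post has the tag, but the AI does not predict it.
def possibly_mistagged(tag)
  POST_TAGS.keys.select { |id| tagged?(id, tag) && !ai_tagged?(id, tag) }
end

p possibly_missing("scenery")
p possibly_mistagged("scenery")
```

With this sample data, post 2 is the only candidate for a missing scenery tag and post 3 is the
only candidate for a mistag.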

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
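
For illustration, here is a minimal sketch of reading such a gzipped CSV and turning each line
into values that fit the `ai_tags` schema. This is not the fix script from script/fixes/. The
column layout (file name, tag name, confidence between 0 and 1), the assumption that files are
named after their MD5, and the helper name `each_ai_tag_row` are all hypothetical; the score is
stored as an integer percentage, which is inferred from the `smallint` column and the
`score: (50..)` filter.

```ruby
require "csv"
require "stringio"
require "zlib"

# Yields [md5, tag_name, score] for every line of a gzipped CSV stream.
# Assumed (hypothetical) line layout: file name, tag name, confidence in 0.0..1.0.
def each_ai_tag_row(gzipped_io)
  return enum_for(:each_ai_tag_row, gzipped_io) unless block_given?

  gz = Zlib::GzipReader.new(gzipped_io)

  # Read line by line so the whole file never has to fit in memory.
  gz.each_line do |line|
    filename, tag_name, confidence = CSV.parse_line(line)

    md5 = File.basename(filename, ".*")      # assumption: file name is the MD5
    score = (Float(confidence) * 100).round  # integer percentage, fits a smallint

    yield [md5, tag_name, score]
  end
ensure
  gz&.close
end

# Usage with a small in-memory example instead of a real tags.csv.gz.
buffer = StringIO.new
writer = Zlib::GzipWriter.new(buffer)
writer.write("/images/abc123.jpg,scenery,0.934\n/images/def456.jpg,rating:s,0.5\n")
writer.finish

each_ai_tag_row(StringIO.new(buffer.string)) do |md5, tag_name, score|
  puts "#{md5} #{tag_name} #{score}"
end
```

For a real import, the rows would still need to be resolved to `media_asset_id` and `tag_id` and
written in batches, for example by grouping the enumerator with `each_slice`.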

Author: evazion
Date: 2022-06-24 04:35:29 -05:00
Parent: ae9495ec7c
Commit: 1aeb52186e
20 changed files with 247 additions and 3 deletions

app/models/ai_tag.rb Normal file

@@ -0,0 +1,40 @@
# frozen_string_literal: true

class AITag < ApplicationRecord
  belongs_to :tag
  belongs_to :media_asset
  has_one :post, through: :media_asset

  validates :score, inclusion: { in: (0.0..1.0) }

  def self.search(params)
    q = search_attributes(params, :media_asset, :tag, :post, :score)

    if params[:tag_name].present?
      q = q.where(tag_id: Tag.find_by_name_or_alias(params[:tag_name])&.id)
    end

    if params[:is_posted].to_s.truthy?
      q = q.where.associated(:post)
    elsif params[:is_posted].to_s.falsy?
      q = q.where.missing(:post)
    end

    q = q.apply_default_order(params)
    q
  end

  def self.default_order
    order(media_asset_id: :desc, tag_id: :asc)
  end

  def correct?
    if post.nil?
      false
    elsif tag.name =~ /\Arating:(.)\z/
      post.rating == $1
    else
      post.has_tag?(tag.name)
    end
  end
end

@@ -20,6 +20,7 @@ class MediaAsset < ApplicationRecord
  has_many :upload_media_assets, dependent: :destroy
  has_many :uploads, through: :upload_media_assets
  has_many :uploaders, through: :uploads, class_name: "User", foreign_key: :uploader_id
  has_many :ai_tags

  delegate :metadata, to: :media_metadata
  delegate :is_non_repeating_animation?, :is_greyscale?, :is_rotated?, to: :metadata

@@ -1307,6 +1307,14 @@ class Post < ApplicationRecord
    where(md5: metadata.select(:md5))
  end

  def ai_tags_include(value)
    tag = Tag.find_by_name_or_alias(value)
    return none if tag.nil?

    ai_tags = AITag.joins(:media_asset).where(tag: tag, score: (50..))
    where(ai_tags.where("media_assets.md5 = posts.md5").arel.exists)
  end

  def uploader_matches(username)
    case username.downcase
    when "any"

@@ -13,6 +13,7 @@ class Tag < ApplicationRecord
  has_many :antecedent_implications, -> {active}, :class_name => "TagImplication", :foreign_key => "antecedent_name", :primary_key => "name"
  has_many :consequent_implications, -> {active}, :class_name => "TagImplication", :foreign_key => "consequent_name", :primary_key => "name"
  has_many :dtext_links, foreign_key: :link_target, primary_key: :name
  has_many :ai_tags

  validates :name, tag_name: true, uniqueness: true, on: :create
  validates :name, tag_name: true, on: :name