Add AI tag model and UI.

Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images with 50 tags each). The table alone is over 10 GB, not counting
indexes.
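As a rough sanity check on those size figures, the sketch below estimates the heap size of `ai_tags`
from PostgreSQL's on-disk page layout. It assumes the default 8 kB block size, 8-byte MAXALIGN
(typical for 64-bit builds), a 23-byte tuple header, 4-byte line pointers, and a 24-byte page
header, and it ignores fillfactor, the free-space and visibility map forks, and indexes. The `WIDE`
layout (a `bigint` surrogate key, `bigint` foreign keys, and a `real` score) is hypothetical and is
included only to show what the compact schema saves.

```python
# Back-of-envelope heap size estimate for ai_tags (assumptions noted inline).
PAGE_SIZE = 8192      # default PostgreSQL block size, in bytes
PAGE_HEADER = 24      # PageHeaderData at the start of each page
LINE_POINTER = 4      # ItemIdData, one per row
TUPLE_HEADER = 23     # HeapTupleHeaderData, padded up to MAXALIGN
MAXALIGN = 8          # assumed 8-byte alignment (typical 64-bit build)


def align(offset: int, alignment: int) -> int:
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def tuple_bytes(columns: list[tuple[int, int]]) -> int:
    """Size of one heap tuple whose columns are all fixed-width and NOT NULL.

    columns is a list of (size, alignment) pairs in table order; with no
    nullable columns there is no null bitmap after the header.
    """
    offset = align(TUPLE_HEADER, MAXALIGN)
    for size, alignment in columns:
        offset = align(offset, alignment) + size
    return align(offset, MAXALIGN)


def heap_bytes(rows: int, columns: list[tuple[int, int]]) -> int:
    """Approximate heap size, assuming pages are packed full (fillfactor 100)."""
    per_row = tuple_bytes(columns) + LINE_POINTER
    rows_per_page = (PAGE_SIZE - PAGE_HEADER) // per_row
    pages = -(-rows // rows_per_page)  # ceiling division
    return pages * PAGE_SIZE


# ai_tags: media_asset_id integer, tag_id integer, score smallint
AI_TAGS = [(4, 4), (4, 4), (2, 2)]
# Hypothetical wider layout: id bigint, media_asset_id bigint, tag_id bigint, score real
WIDE = [(8, 8), (8, 8), (8, 8), (4, 4)]

rows = 6_000_000 * 50  # 6 million images x 50 tags each
print(rows)                                         # 300000000
print(tuple_bytes(AI_TAGS), tuple_bytes(WIDE))      # 40 56
print(f"{heap_bytes(rows, AI_TAGS) / 1e9:.1f} GB")  # 13.3 GB
print(f"{heap_bytes(rows, WIDE) / 1e9:.1f} GB")     # 18.1 GB
```

Under these assumptions only 10 of the roughly 44 bytes stored per row are column data; the rest is
PostgreSQL's fixed per-row overhead, which is why trimming column widths matters at this row count.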

You can search for AI tags using e.g. `ai:scenery`. Use `ai:scenery -scenery` to find posts where
the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are potentially
mistagged (or, more likely, where the AI missed the tag).
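To make the semantics of these combinations concrete, here is a minimal sketch that treats a search
as a conjunction of terms evaluated against two tag sets per post: the human-applied tags and the
AI-predicted tags. The `search` and `term_matches` functions and the data layout are hypothetical
illustrations, not Danbooru's search implementation, and the sketch ignores any confidence threshold
the real `ai:` search may apply.

```python
def term_matches(term: str, tags: set[str], ai_tags: set[str]) -> bool:
    """True if a single term (`tag`, `-tag`, `ai:tag`, or `-ai:tag`) matches a post."""
    negated = term.startswith("-")
    if negated:
        term = term[1:]
    if term.startswith("ai:"):
        hit = term[len("ai:"):] in ai_tags  # check the AI-predicted tags
    else:
        hit = term in tags                  # check the human-applied tags
    return hit != negated                   # a leading "-" inverts the match


def search(query: str, posts: dict[int, tuple[set[str], set[str]]]) -> list[int]:
    """Return the ids of posts matching every space-separated term in query."""
    terms = query.split()
    return sorted(
        post_id
        for post_id, (tags, ai_tags) in posts.items()
        if all(term_matches(t, tags, ai_tags) for t in terms)
    )


# post id -> (human tags, AI tags)
posts = {
    1: ({"scenery"}, {"scenery"}),  # tagged, and the AI agrees
    2: (set(), {"scenery"}),        # AI predicts scenery, but the tag is missing
    3: ({"scenery"}, set()),        # tagged, but the AI missed it (or it's mistagged)
}

print(search("ai:scenery -scenery", posts))  # [2]
print(search("scenery -ai:scenery", posts))  # [3]
```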

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.
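The sketch below shows the kind of query such a page needs, using Python's built-in `sqlite3` as a
stand-in for PostgreSQL. The `ai_tags` columns come from the schema described earlier; the
`tags(id, name)` and `posts(id, media_asset_id)` tables are simplified hypothetical stand-ins for
Danbooru's own tables, and treating `score` as a 0-100 confidence percentage is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ai_tags (media_asset_id integer NOT NULL, tag_id integer NOT NULL,
                          score smallint NOT NULL);
    -- Hypothetical, simplified stand-ins for the real tags and posts tables.
    CREATE TABLE tags (id integer PRIMARY KEY, name text NOT NULL);
    CREATE TABLE posts (id integer PRIMARY KEY, media_asset_id integer NOT NULL);
""")
conn.executemany("INSERT INTO tags VALUES (?, ?)", [(1, "scenery"), (2, "1girl")])
conn.executemany("INSERT INTO posts VALUES (?, ?)", [(100, 10)])  # only asset 10 is posted
conn.executemany(
    "INSERT INTO ai_tags VALUES (?, ?, ?)",
    [(10, 1, 95), (11, 1, 80), (12, 1, 30), (11, 2, 99)],
)


def unposted_assets_with_ai_tag(conn, tag_name, min_score):
    """Media asset ids predicted to have tag_name with score >= min_score and no post."""
    rows = conn.execute(
        """
        SELECT ai.media_asset_id
        FROM ai_tags ai
        JOIN tags t ON t.id = ai.tag_id
        LEFT JOIN posts p ON p.media_asset_id = ai.media_asset_id  -- keep assets with no post
        WHERE t.name = ? AND ai.score >= ? AND p.id IS NULL
        ORDER BY ai.media_asset_id
        """,
        (tag_name, min_score),
    )
    return [media_asset_id for (media_asset_id,) in rows]


print(unposted_assets_with_ai_tag(conn, "scenery", 50))  # [11]
print(unposted_assets_with_ai_tag(conn, "scenery", 0))   # [11, 12]
```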

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. For a Danbooru-sized dataset, expect tag
generation to take hours to days and the import to take 20-30 minutes. Currently all of this has to
be done by hand.
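The actual import is done by the fix script mentioned above. Purely to illustrate the general
approach, here is a hypothetical sketch that streams a gzipped CSV and inserts rows in batches, again
using `sqlite3` as a stand-in for PostgreSQL. The row layout `media_asset_id,tag_name,score` with a
float score in [0, 1] is an assumption, not the Autotagger's documented output format, and the
`tag_ids` lookup and the conversion to an integer percentage are likewise illustrative.

```python
import csv
import gzip
import os
import sqlite3
import tempfile
from itertools import islice

BATCH_SIZE = 10_000  # rows per executemany() call; a hypothetical tuning value


def parse_rows(lines, tag_ids):
    """Yield (media_asset_id, tag_id, score) tuples from CSV lines.

    Hypothetical layout: media_asset_id,tag_name,score, with score a float in [0, 1].
    Rows whose tag name is not in tag_ids are skipped.
    """
    for media_asset_id, tag_name, score in csv.reader(lines):
        tag_id = tag_ids.get(tag_name)
        if tag_id is None:
            continue
        # Assumed conversion: store confidence as a 0-100 integer so it fits a smallint.
        yield int(media_asset_id), tag_id, round(float(score) * 100)


def import_ai_tags(conn, path, tag_ids):
    """Insert rows from the gzipped CSV at path in batches; return the row count."""
    total = 0
    with gzip.open(path, "rt", newline="") as f:
        rows = parse_rows(f, tag_ids)
        while batch := list(islice(rows, BATCH_SIZE)):
            conn.executemany("INSERT INTO ai_tags VALUES (?, ?, ?)", batch)
            total += len(batch)
    conn.commit()
    return total


# Usage: write a tiny sample file, then import it into an in-memory database.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tags.csv.gz")
    with gzip.open(path, "wt", newline="") as f:
        csv.writer(f).writerows(
            [[10, "scenery", 0.953], [10, "no_humans", 0.81], [11, "unknown_tag", 0.5]]
        )
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ai_tags (media_asset_id integer NOT NULL,"
        " tag_id integer NOT NULL, score smallint NOT NULL)"
    )
    print(import_ai_tags(conn, path, {"scenery": 1, "no_humans": 2}))  # 2
    print(conn.execute("SELECT * FROM ai_tags ORDER BY tag_id").fetchall())
    # [(10, 1, 95), (10, 2, 81)]
```

Against PostgreSQL, `COPY` is usually much faster than batched `INSERT`s for a load of hundreds of
millions of rows.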
Author: evazion
Date: 2022-06-24 04:35:29 -05:00
Parent: ae9495ec7c
Commit: 1aeb52186e
20 changed files with 247 additions and 3 deletions

@@ -89,6 +89,17 @@ SET default_tablespace = '';
 SET default_table_access_method = heap;
+--
+-- Name: ai_tags; Type: TABLE; Schema: public; Owner: -
+--
+CREATE TABLE public.ai_tags (
+    media_asset_id integer NOT NULL,
+    tag_id integer NOT NULL,
+    score smallint NOT NULL
+);
 --
 -- Name: api_keys; Type: TABLE; Schema: public; Owner: -
 --
@@ -3123,6 +3134,27 @@ ALTER TABLE ONLY public.wiki_pages
     ADD CONSTRAINT wiki_pages_pkey PRIMARY KEY (id);
+--
+-- Name: index_ai_tags_on_media_asset_id; Type: INDEX; Schema: public; Owner: -
+--
+CREATE INDEX index_ai_tags_on_media_asset_id ON public.ai_tags USING btree (media_asset_id);
+--
+-- Name: index_ai_tags_on_score; Type: INDEX; Schema: public; Owner: -
+--
+CREATE INDEX index_ai_tags_on_score ON public.ai_tags USING btree (score);
+--
+-- Name: index_ai_tags_on_tag_id; Type: INDEX; Schema: public; Owner: -
+--
+CREATE INDEX index_ai_tags_on_tag_id ON public.ai_tags USING btree (tag_id);
 --
 -- Name: index_api_keys_on_key; Type: INDEX; Schema: public; Owner: -
 --
@@ -5942,6 +5974,7 @@ INSERT INTO "schema_migrations" (version) VALUES
 ('20220410050628'),
 ('20220504235329'),
 ('20220514175125'),
-('20220525214746');
+('20220525214746'),
+('20220623052547');
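The three single-column indexes presumably serve three different lookups: `media_asset_id` for
listing the AI tags of one image, `tag_id` for `ai:` searches, and `score` for filtering by
confidence. The table also has no primary key, so there is no surrogate `id` column or primary-key
index to pay for. The sketch below recreates the table and its indexes in Python's `sqlite3` (a
stand-in only; PostgreSQL's planner can choose differently) and prints `EXPLAIN QUERY PLAN` output
to show which index each lookup uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ai_tags (media_asset_id integer NOT NULL, tag_id integer NOT NULL,
                          score smallint NOT NULL);
    CREATE INDEX index_ai_tags_on_media_asset_id ON ai_tags (media_asset_id);
    CREATE INDEX index_ai_tags_on_score ON ai_tags (score);
    CREATE INDEX index_ai_tags_on_tag_id ON ai_tags (tag_id);
""")


def plan(sql, params=()):
    """Return SQLite's query-plan detail strings (the last column of each plan row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


# AI tags for one image: should use index_ai_tags_on_media_asset_id.
print(plan("SELECT tag_id, score FROM ai_tags WHERE media_asset_id = ?", (10,)))
# Images with a given AI tag: should use index_ai_tags_on_tag_id.
print(plan("SELECT media_asset_id FROM ai_tags WHERE tag_id = ?", (1,)))
# Rows above a confidence threshold: the planner may or may not use index_ai_tags_on_score.
print(plan("SELECT * FROM ai_tags WHERE score >= ?", (90,)))
```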