Commit Graph

17 Commits

Author SHA1 Message Date
evazion
1aeb52186e Add AI tag model and UI.
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.

You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
2022-06-24 04:54:26 -05:00
evazion
4c7cfc73c6 search: add new tag search parser.
Add a new tag tag search parser that supports full boolean expressions, including `and`,
`or`, and `not` operators and parenthesized subexpressions.

This is only the parser itself, not the code for converting the search into SQL. The new
parser isn't used yet for actual searches. Searches still use the old parser.

Some example syntax:

* `1girl 1boy`
* `1girl and 1boy` (same as `1girl 1boy`)
* `1girl or 1boy`
* `~1girl ~1boy` (same as `1girl or 1boy`)
* `1girl and ((blonde_hair blue_eyes) or (red_hair green_eyes))`
* `1girl ~(blonde_hair blue_eyes) ~(red_hair green_eyes)` (same as above)
* `1girl -(blonde_hair blue_eyes)`
* `*_hair *_eyes`
* `*_hair or *_eyes`
* `user:evazion or fav:evazion`
* `~user:evazion ~fav:evazion`

Rules:

AND is implicit between terms, but may be written explicitly:

* `a b c` is `a and b and c`

AND has higher precedence (binds tighter) than OR:

* `a or b and c or d` is `a or (b and c) or d`
* `a or b c or d e` is `a or (b and c) or (d and e)`

All `~` operators in the same subexpression are combined into a single OR:

* `a b ~c ~d` is `a b (c or d)`
* `~a ~b and ~c ~d` is `(a or b) (c or d)`
* `(~a ~b) (~c ~d)` is `(a or b) (c or d)`

A single `~` operator in a subexpression by itself is ignored:

* `a ~b` is `a b`
* `~a and ~b` is `a and b`, which is `a b`
* `(~a) ~b` is `a ~b`, which is `a b`

The parser is written as a backtracking recursive descent parser built on top of
StringScanner and a handful of parser combinators. The parser generates an AST, which is
then simplified using Boolean algebra to remove redundant nodes and to convert the
expression to conjunctive normal form (that is, a product of sums, or an AND of ORs).
2022-03-29 18:21:46 -05:00
evazion
60a26af6e3 rails: add 'URL' inflection.
Make it so we can write `ArtistURL` instead of `ArtistUrl`.
2022-02-22 00:17:53 -06:00
evazion
ef28576673 Fix #3400: Smarter thumbnail generation for videos 2021-09-05 06:10:18 -05:00
evazion
35a0c6b11f Fix #4736: Display network prefix length (if present) in API key IP whitelist. 2021-03-01 02:38:18 -06:00
evazion
3a3d456bd2 html: standardize font sizes and heading tags.
Standardize font sizes and heading tags (<h1>-<h6>) to be more
consistent across the site.

Changes:

* Introduce font size CSS variables and start replacing hardcoded font
  sizes with standard sizes.
* Change header tags to use only one <h1> per page. One <h1> per page is
  recommended for SEO purposes. Usually this is for the page title, like
  in forum threads or wiki pages.
* Standardize on <h2> for section headers in sidebars and <h3> for
  smaller subsection headers. Don't use <h4>-<h6>.
* In DText, make h1-h4 headers all the same size. Standard wiki style is
  to ignore h1-h3 and start at h4.
* In DText, make h4-h6 the same size as the h1-h3 tags outside of DText.
* In the tag list, change the <h1> and <h2> tag category headers to <h3>.
* Make usernames in comments and forum posts smaller. Also change the
  <h4> tag for the commenter name to <div class="author-name">.
* Make the tag list, paginator, and nav menu smaller on mobile.
* Change h1#app-name-header to a#app-name-header.
2020-07-23 17:34:17 -05:00
r888888888
fad0ab7c93 fixes #2133 2014-04-16 17:43:34 -07:00
albert
c8bcf5ad7c updated to rails 3.2, fixed tests 2012-01-27 14:22:47 -05:00
albert
c92bdf491e updated to rails 3.1.rc5 2011-08-06 16:22:49 -04:00
albert
98403d0cb7 fix user feedback controller test 2011-07-17 18:40:24 -04:00
albert
969185ad24 work 2011-05-26 19:10:08 -04:00
albert
1c964b5189 upgraded to rails 3.1.0.rc1 2011-05-24 18:04:25 -04:00
albert
a156cc8c62 moved some donmai-specific stuff out of default config 2010-11-19 13:44:11 -05:00
albert
5610731b35 sync 2010-08-18 18:42:33 -04:00
albert
ac98d7db37 stubbed in blank controllers/helpers/functional tests 2010-03-10 18:21:43 -05:00
albert
703eb6a1b6 added bans 2010-02-19 17:30:11 -05:00
Albert Yi
9bb07046cd initial 2010-02-04 15:08:49 -05:00