Allow searching for e.g. `ai:monochrome,0%` to find posts where the AI has 0% confidence
the post should be tagged monochrome.
This is useful for finding mistagged posts. You can search `monochrome ai:monochrome,0%`
to find posts that are potentially mistagged as monochrome, that is, posts that are
tagged monochrome even though the AI has 0% confidence they should be.
Not all cases will be mistags. The majority will be cases where the AI failed to identify
the tag. This is also useful for studying the AI's failures.
Fix a bug where `Post#has_tag?` would return false for tags that
contained colons. This caused the /ai_tags page to incorrectly say
certain tags weren't present on the post.
Allow searching for e.g. `ai:solo,>=90%` to find posts that have the
solo tag with >=90% confidence. The default confidence level is 50%. The
delimiter is a comma because it's one of the few characters not allowed
in tag names.
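For illustration, the confidence suffix could be parsed along these lines (hypothetical
names; this assumes a bare percentage means an exact score, while a bare `ai:` tag uses
the 50% default):

    AI_CONFIDENCE = /\Aai:(?<tag>[^,]+)(?:,(?<op>>=|<=|>|<)?(?<score>\d+)%)?\z/

    def parse_ai_metatag(metatag)
      match = metatag.match(AI_CONFIDENCE) or return nil

      if match[:score].nil?
        { tag: match[:tag], op: ">=", score: 50 }  # bare `ai:scenery` uses the 50% default
      else
        { tag: match[:tag], op: match[:op] || "==", score: match[:score].to_i }
      end
    end

    parse_ai_metatag("ai:solo,>=90%")    # => { tag: "solo", op: ">=", score: 90 }
    parse_ai_metatag("ai:monochrome,0%") # => { tag: "monochrome", op: "==", score: 0 }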
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.
AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.
The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.
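For illustration, a migration matching that schema might look like this (the indexes
here are assumptions, not taken from the actual schema):

    class CreateAiTags < ActiveRecord::Migration[7.0]
      def change
        # No primary key, no timestamps: at 300+ million rows every per-row byte counts.
        create_table :ai_tags, id: false do |t|
          t.integer :media_asset_id, null: false
          t.integer :tag_id, null: false
          t.integer :score, null: false, limit: 2 # limit: 2 maps to smallint in Postgres
        end

        # Assumed indexes to support `ai:tag` searches and per-asset lookups.
        add_index :ai_tags, [:tag_id, :score]
        add_index :ai_tags, :media_asset_id
      end
    end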
You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).
You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.
To generate tags, use the `autotag` script from the Autotagger repo, something like this:
    docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz
To import tags, use the fix script in script/fixes/. Expect a Danbooru-sized dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
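The import boils down to a batched bulk insert. A rough sketch, assuming a hypothetical
`AITag` model and an `md5,tag_name,confidence` CSV layout (the real script in
script/fixes/ may differ):

    require "csv"
    require "zlib"

    batch = []
    Zlib::GzipReader.open("tags.csv.gz") do |gz|
      CSV.new(gz).each do |md5, tag_name, confidence|
        asset = MediaAsset.find_by(md5: md5) or next
        tag = Tag.find_by(name: tag_name) or next

        # Assumes scores are stored as integer percentages (0..100).
        batch << { media_asset_id: asset.id, tag_id: tag.id, score: (confidence.to_f * 100).round }

        if batch.size >= 10_000
          AITag.insert_all(batch) # bulk insert, skipping model validations for speed
          batch = []
        end
      end
    end
    AITag.insert_all(batch) if batch.any?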
* Add "general" rating.
* Rename "safe" rating to "sensitive".
* Change safe mode to include both rating:s and rating:g.
* Treat rating:safe as a synonym for rating:sensitive.
* Link "howto:rate" in the post edit form.
Refactor ratings to not be hardcoded in various places. Make it so
all ratings are defined in Post::RATINGS.
Also make it so that you can search multiple ratings at once with `rating:q,e`.
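For illustration, the constant and the multi-rating search could look something like
this (a sketch, not the exact definition):

    class Post < ApplicationRecord
      RATINGS = {
        "g" => "General",
        "s" => "Sensitive",
        "q" => "Questionable",
        "e" => "Explicit",
      }

      # `rating:q,e` becomes one IN clause instead of two separate searches.
      def self.with_rating(values)
        where(rating: values.split(",") & RATINGS.keys)
      end
    end

    Post.with_rating("q,e").to_sql # ... WHERE "posts"."rating" IN ('q', 'e')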
* Remove the default list of blocked tags in safe mode.
* Change it so that tags that are blocked in safe mode are filtered out
at the database level rather than at the HTML level.
The second bit of the `bit_flags` field was previously used for the
`has_cropped` flag, which is still set on many posts, so it's not safe
to reuse it for the `is_taken_down` flag.
Bug: If a tag edit failed because it contained a metatag that raised an
exception, then a new post version would be created even though the edit
didn't go through. This could happen if the newpool:, fav:, favgroup:,
disapproved:, status:active, or status:banned metatags failed (for
example, because of a privilege error).
Fix: Silently ignore all errors raised when applying metatags. This way
the edit will always succeed, so erroneous post versions won't be created.
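A minimal sketch of the fix (names are illustrative):

    def apply_metatags(post, metatags)
      metatags.each do |metatag|
        metatag.apply!(post)
      rescue StandardError
        # Swallow privilege errors etc. so one bad metatag can't abort the
        # whole edit and leave behind a post version for a failed change.
      end
    end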
Automatically add the bad_link tag when the source is an image URL from
a known site, but it can't be converted to a page URL (for example, a
Twitter or Tumblr direct image link).
Automatically add the bad_source tag when the source is from a known
site, but it's not an image or page URL (for example, a Twitter or Pixiv
profile URL).
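The rule is roughly this (a sketch with hypothetical source predicates):

    def source_quality_tag(source)
      return nil unless source.known_site? # hypothetical predicate

      if source.image_url? && source.page_url.nil?
        "bad_link"   # e.g. a Twitter/Tumblr direct image link with no page URL
      elsif !source.image_url? && !source.page_url?
        "bad_source" # e.g. a Twitter or Pixiv profile URL
      end
    end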
This setting automatically added the `-status:deleted` metatag to all searches. This meant deleted
posts were filtered out at the database level, rather than at the HTML level. This way searches
wouldn't return less-than-full pages.
The cost was that searches were slower, mainly because post counts weren't cached. Normally when you
search for a tag, we can get the post count from the tags table. If the search is actually like
`touhou -status:deleted`, then we don't know the count and we have to calculate it on demand.
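Roughly, the two count paths look like this (hypothetical names):

    def post_count(query)
      if query.simple_tag? # a single plain tag, e.g. `touhou`
        Tag.find_by(name: query.to_s)&.post_count # cached count on the tags table
      else
        query.posts.count # `touhou -status:deleted`: full COUNT(*), computed on demand
      end
    end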
This option is being removed because it did the opposite of what people thought it did. People
thought it made deleted posts visible, when actually it made them more hidden.
Factor out most of the tag edit logic from the Post class to a new
PostEdit class. The PostEdit class contains the logic for parsing tags
and metatags from the tag edit string, and for determining which tags
were added or removed by the edit.
Fixes various bugs caused by not calculating the set of added or removed
tags correctly, for example when tag category prefixes were used (e.g.
`copy:touhou`) or when the same tag was added and removed in the same
edit (e.g. `touhou -touhou`).
Fixes #5123: Tag categorization prefixes bypass deprecation check
Fixes #5126: Negating a deprecated tag will still cause the warning to show
Fixes #3477: Remove tag validator triggering on tag category changes
Fixes #4848: newpool: metatag doesn't parse correctly
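A condensed sketch of the added/removed-tag calculation (simplified; the real PostEdit
also parses metatags):

    class PostEdit
      CATEGORY_PREFIXES = %w[general gen artist art copyright copy co character char ch meta]

      def initialize(old_tag_string, edit_string)
        @old_tags = old_tag_string.split
        @terms = edit_string.split
      end

      def removed_tags
        @terms.filter_map { |t| normalize(t.delete_prefix("-")) if t.start_with?("-") }
      end

      def added_tags
        added = @terms.grep_v(/\A-/).map { |t| normalize(t) }
        added - @old_tags - removed_tags # `touhou -touhou` nets out as a removal
      end

      private

      # `copy:touhou` adds the plain tag `touhou` (and recategorizes it).
      def normalize(term)
        prefix, _, name = term.partition(":")
        CATEGORY_PREFIXES.include?(prefix) && !name.empty? ? name : term
      end
    end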
Switch to the post search engine using the new PostQuery parser. The new
engine fully supports AND, OR, and NOT operators and grouping expressions
with parentheses.
Highlights:
New OR operator:
* `skirt or dress` (same as `~skirt ~dress`)
Tags can be grouped with parentheses:
* `1girl (skirt or dress)`
* `(blonde_hair blue_eyes) or (red_hair green_eyes)`
* `~(blonde_hair blue_eyes) ~(red_hair green_eyes)` (same as above)
* `(pantyhose or thighhighs) (black_legwear or brown_legwear)`
* `(~pantyhose ~thighhighs) (~black_legwear ~brown_legwear)` (same as above)
Metatags can be OR'd together:
* `user:evazion or fav:evazion`
* `~user:evazion ~fav:evazion`
Wildcard tags can be combined with either AND or OR:
* `black_* white_*` (find posts with at least one black_* tag AND one white_* tag)
* `black_* or white_*` (find posts with at least one black_* tag OR one white_* tag)
* `~black_* ~white_*` (same as above)
See 4c7cfc73 for more syntax examples.
Fixes #4949: And+or search?
Fixes #5056: Wildcard searches return unexpected results when combined with OR searches
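Under the hood the new engine builds an expression tree. A toy illustration
(illustrative names, not the real PostQuery internals):

    require "set"

    TagNode = Struct.new(:name) do
      def match?(tags)
        tags.include?(name)
      end
    end

    AndNode = Struct.new(:children) do
      def match?(tags)
        children.all? { |c| c.match?(tags) }
      end
    end

    OrNode = Struct.new(:children) do
      def match?(tags)
        children.any? { |c| c.match?(tags) }
      end
    end

    NotNode = Struct.new(:child) do
      def match?(tags)
        !child.match?(tags)
      end
    end

    # `1girl (skirt or dress)` parses to roughly:
    query = AndNode.new([TagNode.new("1girl"), OrNode.new([TagNode.new("skirt"), TagNode.new("dress")])])
    query.match?(Set["1girl", "dress"]) # => true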
* Deprecated tags can't be added to posts, but existing deprecated tags
in a post won't be removed
* Only empty tags can be marked as deprecated manually
* No tags can be manually undeprecated
** These limits don't apply to admins
* Deprecating or undeprecating a tag will create a new mod action to
prevent people from going rogue
* Added deprecate/undeprecate commands for BURs
* Deprecating a tag via BUR removes all implications to and from it as well
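The non-admin limits condense to roughly this validation (hypothetical names like
`updater` and `is_deprecated`):

    class Tag < ApplicationRecord
      validate :validate_deprecation, if: :is_deprecated_changed?

      def validate_deprecation
        return if updater.is_admin? # the limits don't apply to admins

        if is_deprecated? && post_count > 0
          errors.add(:base, "Only empty tags can be deprecated")
        elsif !is_deprecated?
          errors.add(:base, "Tags can't be manually undeprecated")
        end
      end
    end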
Fix a PublicSuffix::DomainNotAllowed exception raised when viewing or editing a post
with a source like `Blog.`.
This happened when parsing the post's source. `Danbooru::URL.parse("Blog.")` would
heuristically parse the source into `http://blog`. Calling any methods related to the
URL's hostname or domain would lead to calling `PublicSuffix.parse("blog")`, which
would fail with PublicSuffix::DomainNotAllowed.
Introduce a Danbooru::URL class for dealing with URLs. This is a wrapper
around Addressable::URI that adds some additional helper methods. Most
significantly, the `parse` method only allows valid http/https URLs, and
it returns nil instead of raising an exception when the URL is invalid.
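A rough shape of the class (a sketch, not the exact implementation):

    require "addressable/uri"

    module Danbooru
      class URL
        attr_reader :url

        # Returns nil instead of raising, and rejects anything that isn't an
        # absolute http(s) URL, so sources like "Blog." can't blow up later.
        def self.parse(string)
          uri = Addressable::URI.parse(string)
          return nil unless %w[http https].include?(uri&.scheme) && !uri.host.to_s.empty?
          new(uri)
        rescue Addressable::URI::InvalidURIError
          nil
        end

        def initialize(uri)
          @url = uri
        end
      end
    end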
* Move replacement tests from test/unit/upload_service_test.rb to
test/functional/post_replacement_controller_test.rb
* Move UploadService::Replacer to PostReplacementProcessor.
* Fix a minor bug where if you used the API to replace a post with a file,
the replacement would fail unless you passed an empty string for the
replacement_url.
Fix strings like "pokémon" (NFD form) and "pokémon" (NFC form) being
considered different strings in sources.
Also add a fix script to fix existing sources. There were only 15 posts
with unnormalized sources.
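Ruby's built-in Unicode normalization covers this. The idea:

    nfd = "poke\u0301mon" # "pokémon" as NFD: "e" plus combining acute accent
    nfc = "pok\u00E9mon"  # "pokémon" as NFC: precomposed "é"

    nfd == nfc                         # => false: same text, different bytes
    nfd.unicode_normalize(:nfc) == nfc # => true

    # So sources get normalized to NFC before being saved/compared, e.g.:
    # post.source = source.unicode_normalize(:nfc)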
Fix not being able to change the post's source when submitting the
upload. For example, if you were uploading a Twitter image from a direct
Twitter image URL, and you tried to change the source to the tweet URL
on the upload page before creating the post, then the source would be
ignored when the post was created.
Rework the upload process so that files are saved to Danbooru first
before the user starts tagging the upload.
The main user-visible change is that you have to select the file first
before you can start tagging it. Saving the file first lets us fix a
number of problems:
* We can check for dupes before the user tags the upload.
* We can perform dupe checks and show preview images for users not using the bookmarklet.
* We can show preview images without having to proxy images through Danbooru.
* We can show previews of videos and ugoira files.
* We can reliably show the filesize and resolution of the image.
* We can let the user save files to upload later.
* We can get rid of a lot of spaghetti code related to preprocessing
uploads. This was the cause of most weird "md5 confirmation doesn't
match md5" errors.
(Not all of these are implemented yet.)
Internally, uploading is now a two-step process: first we create an upload
object, then we create a post from the upload. This is how it works:
* The user goes to /uploads/new and chooses a file or pastes a URL into
the file upload component.
* The file upload component calls `POST /uploads` to create an upload.
* `POST /uploads` immediately returns a new upload object in the `pending` state.
* Danbooru starts processing the upload in a background job (downloading,
resizing, and transferring the image to the image servers).
* The file upload component polls `/uploads/$id.json`, checking the
upload `status` until it returns `completed` or `error`.
* When the upload status is `completed`, the user is redirected to /uploads/$id.
* On the /uploads/$id page, the user can tag the upload and submit it.
* The upload form calls `POST /posts` to create a new post from the upload.
* The user is redirected to the new post.
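For illustration, a minimal API client for this flow could poll like so (the endpoints
and the `status` values are from the description above; everything else is an assumption):

    require "net/http"
    require "json"
    require "uri"

    def wait_for_upload(upload_id, base: "https://danbooru.donmai.us")
      loop do
        json = Net::HTTP.get(URI("#{base}/uploads/#{upload_id}.json"))
        upload = JSON.parse(json)

        case upload["status"]
        when "completed" then return upload
        when "error" then raise "upload #{upload_id} failed"
        else sleep 1 # still pending/processing; keep polling
        end
      end
    end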
This is the data model:
* An upload represents a set of files uploaded to Danbooru by a user.
Uploaded files don't have to belong to a post. An upload has an
uploader, a status (pending, processing, completed, or error), a
source (unless uploading from a file), and a list of media assets
(image or video files).
* There is a has-and-belongs-to-many relationship between uploads and
media assets. An upload can have many media assets, and a media asset
can belong to multiple uploads. Uploads are joined to media assets
through an upload_media_assets table.
An upload could potentially have multiple media assets if it's a Pixiv
or Twitter gallery. This is not yet implemented (at the moment all
uploads have one media asset).
A media asset can belong to multiple uploads if multiple people try
to upload the same file, or if the same user tries to upload the same
file more than once.
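In ActiveRecord terms the relationships look roughly like this (the join table name is
from above; the rest is a plausible sketch, not the exact code):

    class Upload < ApplicationRecord
      belongs_to :uploader, class_name: "User"
      has_many :upload_media_assets
      has_many :media_assets, through: :upload_media_assets
      # status: pending, processing, completed, or error
    end

    class UploadMediaAsset < ApplicationRecord
      belongs_to :upload
      belongs_to :media_asset
    end

    class MediaAsset < ApplicationRecord
      has_many :upload_media_assets
      has_many :uploads, through: :upload_media_assets
    end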
New features:
* On the upload page, you can press Ctrl+V to paste a URL and immediately upload it.
* You can save files for upload later. Your saved files are at /uploads.
Fixes:
* Improved error messages when uploading invalid files or bad URLs, and
when forgetting the rating.
Automatically add the non-web_source tag to posts whose source isn't an http:// or https:// URL.
Add a fix script to backfill old posts.
Syntactically invalid URLs are still considered web sources. For
example, `https://google,com` technically isn't a valid URL, but it's
not considered a non-web source.
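In other words the check is a simple prefix test, not a full URL parse (a sketch):

    def web_source?(source)
      source.match?(%r{\Ahttps?://}i)
    end

    web_source?("https://google,com")      # => true: invalid URL, but still a web source
    web_source?("file:///home/me/pic.jpg") # => false: gets tagged non-web_source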
Further micro-optimize thumbnails to reduce string allocations.
`Post#levelblocked?` gets called once per thumbnail. Previously it split the
tag string, which meant one string allocation for each tag on each post.
This added up to thousands of string allocations per pageload.
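One way to avoid the per-tag allocations is to test membership against the raw tag
string instead of splitting it (a sketch, not necessarily the actual patch):

    class Post < ApplicationRecord
      # Checks membership without allocating one string per tag.
      def has_tag_fast?(tag)
        # Pad with spaces so "cat" doesn't match inside "catgirl".
        " #{tag_string} ".include?(" #{tag} ")
      end
    end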