Commit Graph

912 Commits

Author SHA1 Message Date
evazion
c4498f9be7 ai tags: allow searching for ai:monochrome,0%
Allow searching for e.g. `ai:monochrome,0%` to find posts where the AI has 0% confidence
the post should be tagged monochrome.

This is useful for finding mistagged posts. You can search `monochrome ai:monochrome,0%`
to find posts that are potentially mistagged as monochrome, that is, posts that are
tagged monochrome but where the AI has 0% confidence that it should be tagged monochrome.

Not all cases will be mistags. The majority will be cases where the AI failed to identify
the tag. This is also useful for studying the AI's failures.
2022-06-27 19:36:15 -05:00
evazion
6f24db92e5 ai tags: make ai tags accessible in api via includes.
Make these things work:

* https://danbooru.donmai.us/posts.json?only=ai_tags
* https://danbooru.donmai.us/media_assets.json?only=ai_tags
* https://danbooru.donmai.us/ai_tags.json?only=media_asset,post,tag
2022-06-26 20:37:35 -05:00
evazion
54e2bbd86b posts: fix has_tag? method.
Fix a bug where `Post#has_tag?` would return false for tags that
contained colons. This caused the /ai_tags page to incorrectly say
certain tags weren't present on the post.
2022-06-26 01:07:12 -05:00
evazion
c059b4a39a ai tags: add ability to filter by confidence in post searches.
Allow searching for e.g. `ai:solo,>=90%` to to find posts that have the
solo tag with >=90% confidence. The default confidence level is 50%. The
delimiter is a comma because it's one of the few characters not allowed
in tag names.
2022-06-25 21:51:48 -05:00
evazion
1aeb52186e Add AI tag model and UI.
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.

You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
2022-06-24 04:54:26 -05:00
evazion
4364516f2b posts: change safe mode to only allow rating:g posts. 2022-06-06 00:56:52 -05:00
evazion
81bd86d202 posts: add "general" rating; rename "safe" rating to "sensitive".
* Add "general" rating.
* Rename "safe" rating to "sensitive".
* Change safe mode to include both rating:s and rating:g.
* Treat rating:safe as a synonym for rating:sensitive.
* Link "howto:rate" in the post edit form.
2022-05-22 13:38:45 -05:00
evazion
e53e8da3a1 Fix #5171: Don't allow wide_image and tall_image to be manually tagged. 2022-05-21 14:01:31 -05:00
evazion
25f0b01d50 posts: optimize has: metatag.
Use EXISTS queries instead of `id IN (?)` subqueries because they're
faster, especially when negated.
2022-05-18 13:59:04 -05:00
evazion
181639368c posts: add is: and has: metatags.
Add the following metatags:

* is:parent
* is:child
* is:safe
* is:questionable
* is:explicit
* is:sfw (same as -rating:q,e)
* is:nsfw (same as rating:q,e)
* is:active
* is:deleted
* is:pending
* is:flagged
* is:appealed
* is:banned
* is:modqueue
* is:unmoderated
* is:jpg
* is:png
* is:gif
* is:mp4
* is:webm
* is:swf
* is:zip
* has:parent
* has:children
* has:source
* has:appeals
* has:flags
* has:replacements
* has:comments
* has:commentary
* has:notes
* has:pools

All of these searches were already possible with other metatags, but these might be more convenient.
2022-05-18 13:04:15 -05:00
evazion
141044d352 posts: refactor hardcoded ratings.
Refactor ratings to not be hardcoded in various places. Make it so
all ratings are defined in Post::RATINGS.

Also make it so that you can search multiple ratings at once with `rating:q,e`.
2022-05-18 13:04:15 -05:00
evazion
ce18c866d9 Fix #4582: Safebooru should not block "censored" tag
* Remove the default list of blocked tags in safe mode.
* Change it so that tags that are blocked in safe mode are filtered out
  at the database level rather than at the html level.
2022-05-17 02:24:16 -05:00
evazion
34038d71ae Fix #5153: Using the child metatag without an id causes unexpected behavior 2022-05-04 23:19:58 -05:00
evazion
d511a6b6cf posts: fix is_taken_down flag.
The second bit of the `bit_flags` field was previously used for the
`has_cropped` flag, which is still set on many posts, so it's not safe
to reuse it for the `is_taken_down flag.
2022-05-03 06:48:29 -05:00
evazion
48b8daa397 posts: add is_taken_down flag.
Posts with the is_taken_down flag are "double-banned" and only visible to moderators.
2022-05-03 05:51:17 -05:00
evazion
ac98c142a4 posts: move expunged image to trash folder.
When a post is expunged, move the image to a trash folder so it can be
recovered if needed.
2022-05-03 05:51:09 -05:00
evazion
d2502a0c40 Fix #4877: Error when tagging favgroup:foo when post is already in favgroup:foo
Bug: If a tag edit failed because it contained a metatag that raised an
exception, then a new post version would be created even though the edit
didn't go through. This could happen if the newpool:, fav:, favgroup:,
disapproved:, status:active, or status:banned metatags failed (for
example, because of a privilege error).

Fix: Silently ignore all errors raised when applying metatags. This way
the edit will always succeed, so erroneous post versions won't be created.
2022-05-02 15:56:16 -05:00
evazion
93352b318e Fix #5146: Adding an existing favorite to favorite groups leads to an error.
Show "Favgroup already contains post XXX" error when trying to add a
post to a favgroup that already contains that post.
2022-05-02 15:56:16 -05:00
evazion
2d9bba4abb posts: automatically add the bad_link and bad_source tags.
Automatically add the bad_link tag when the source is an image url from
a known site, but it can't be converted to a page url (for example, a
Twitter or Tumblr direct image link).

Automatically add the bad_source tag when the source is from a known
site, but it's not an image or page url (for example, a Twitter or Pixiv
profile url)
2022-05-01 21:01:36 -05:00
evazion
f117049750 users: remove 'hide deleted posts' account setting.
This setting automatically added the `-status:deleted` metatag to all searches. This meant deleted
posts were filtered out at the database level, rather than at the html level. This way searches
wouldn't have less-than-full pages.

The cost was that searches were slower, mainly because post counts weren't cached. Normally when you
search for a tag, we can get the post count from the tags table. If the search is actually like
`touhou -status:deleted`, then we don't know the count and we have to calculate it on demand.

This option is being removed because it did the opposite of what people thought it did. People
thought it made deleted posts visible, when actually it made them more hidden.
2022-05-01 00:47:46 -05:00
evazion
fdc1130aea Fix #5150: rating: metatag doesn't work on betabooru upload page. 2022-04-30 20:22:26 -05:00
evazion
bbe748bd2b posts: factor out post edit logic.
Factor out most of the tag edit logic from the Post class to a new
PostEdit class. The PostEdit class contains the logic for parsing tags
and metatags from the tag edit string, and for determining which tags
were added or removed by the edit.

Fixes various bugs caused by not calculating the set of added or removed
tags correctly, for example when tag category prefixes were used (e.g.
`copy:touhou`) or when the same tag was added and removed in the same
edit (e.g. `touhou -touhou`).

Fixes #5123: Tag categorization prefixes bypass deprecation check
Fixes #5126: Negating a deprecated tag will still cause the warning to show
Fixes #3477: Remove tag validator triggering on tag category changes
Fixes #4848: newpool: metatag doesn't parse correctly
2022-04-29 17:13:33 -05:00
nonamethanks
234ac98640 Posts: don't try to ban/unban a post that is already banned/unbanned 2022-04-19 21:40:47 +02:00
evazion
af183467b6 post queries: switch to new post search engine.
Switch to the post search engine using the new PostQuery parser. The new
engine fully supports AND, OR, and NOT operators and grouping expressions
with parentheses.

Highlights:

New OR operator:

* `skirt or dress` (same as `~skirt ~dress`)

Tags can be grouped with parentheses:

* `1girl (skirt or dress)`
* `(blonde_hair blue_eyes) or (red_hair green_eyes)`
* `~(blonde_hair blue_eyes) ~(red_hair green_eyes)` (same as above)
* `(pantyhose or thighhighs) (black_legwear or brown_legwear)`
* `(~pantyhose ~thighhighs) (~black_legwear ~brown_legwear)` (same as above)

Metatags can be OR'd together:

* `user:evazion or fav:evazion`
* `~user:evazion ~fav:evazion`

Wildcard tags can combined with either AND or OR:

* `black_* white_*` (find posts with at least one black_* tag AND one white_* tag)
* `black_* or white_*` (find posts with at least one black_* tag OR one white_* tag)
* `~black_* ~white_*` (same as above)

See 4c7cfc73 for more syntax examples.

Fixes #4949: And+or search?
Fixes #5056: Wildcard searches return unexpected results when combined with OR searches
2022-04-17 23:20:22 -05:00
evazion
e35bbb8bc8 Merge pull request #5120 from nottalulah/favgroup-any
favgroups: allow favgroup:any/none searches
2022-04-17 22:55:55 -05:00
Talulah
c1996e4f06 favgroups: allow favgroup:any/none searches 2022-04-12 23:01:19 -03:00
nonamethanks
1a990d5ab9 Allow post disapprovals to be edited 2022-04-11 21:05:44 +02:00
nonamethanks
ea76a889db Add ability to mark tags as deprecated
* Deprecated tags can't be added to posts, but existing deprecated tags
  in a post won't be removed
* Only empty tags can be marked as deprecated manually
* No tags can be manually undeprecated
** These limits don't apply to admins
* Deprecating or undeprecating a tag will create a new mod action to
  prevent people from going rogue
* Added deprecate/undeprecate commands for BURs
* Deprecating a tag via BUR removes all implications to and from it as well
2022-04-08 09:07:14 +02:00
evazion
86de5cb5d2 posts: fixup flagger: metatag.
Fix regression in 01a22930e.
2022-04-06 23:57:50 -05:00
evazion
01a22930e7 posts: move attribute search methods from PostQueryBuilder to Post.
Move `status_matches` etc methods from PostQueryBuilder to Post. This is
to make refactoring to use the new query parser easier.
2022-04-06 20:25:09 -05:00
evazion
c707190bc1 posts: refactor modules to use concerning. 2022-04-06 20:25:00 -05:00
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
3aa5cab2aa sources: refactor normalize_for_source.
`normalize_for_source` was used to convert image URLs to page URLs when displaying sources
on the post show page. Move all the code for converting image URLs to page URLs from
`Sources::Strategies#normalize_for_source` to `Source::URL#page_url`.

Before we had to be very careful in source strategies not to make any network calls in
`normalize_for_source`, since it was used in the view for the post show page. Now all the
code for generating page URLs is isolated in Source::URL, which makes source strategies
simpler. It also makes it easier to check if a source is an image URL or page URL, and if
the image URL is convertible to a page URL, which will make autotagging bad_link or
bad_source feasible.

Finally, this fixes it to generate better page URLs in a handful of cases:

* https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP
* https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6s instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6
* http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png
* https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1
2022-03-23 01:34:04 -05:00
evazion
56f47c60e1 posts: fix exception when viewing post with source Blog..
Fix a PublicSuffix::DomainNotAllowed exception raised with viewing or editing a post
with a source like `Blog.`.

This happened when parsing the post's source. `Danbooru::URL.parse("Blog.")` would
heuristically parse the source into `http://blog`. Calling any methods related to the
URL's hostname or domain would lead to calling `PublicSuffix.parse("blog")`, which
would fail with PublicSuffix::DomainNotAllowed.
2022-03-21 03:24:50 -05:00
evazion
ad3f3fdce3 Fix unqualified column references.
Fix various places to avoid unqualified column references to prevent any
potential ambiguous column errors.
2022-03-01 17:48:16 -06:00
evazion
7d49ab6130 Add Danbooru::URL class.
Introduce a Danbooru::URL class for dealing with URLs. This is a wrapper
around Addressable::URI that adds some additional helper methods. Most
significantly, the `parse` method only allows valid http/https URLs, and
it returns nil instead of raising an exception when the URL is invalid.
2022-02-22 00:17:53 -06:00
evazion
bdf83d1ffd uploads: refactor /uploads/:id page for multi-file uploads. 2022-02-14 00:41:08 -06:00
evazion
9dd1afbedd posts: fix exception in expunge method.
Fix regression in 7c63ac8db. Posts no longer have an association with
uploads, so expunge failed when it tried to destroy the associated upload.
2022-02-05 22:47:53 -06:00
evazion
2b1c58c959 Fix #4987: Can't populate tag string from upload url anymore.
Usage: https://danbooru.donmai.us/uploads/new?url=...&post[tag_string]=...&post[rating]=...

* Pass the URL parameters from the /uploads/new page to the /uploads/:id page.
* Fix the /uploads/:id page throwing an "unpermitted parameters" error
  when given URL params for the post edit form.
2022-02-03 19:41:04 -06:00
evazion
65b7c08e33 post replacements: refactor and fix tests.
* Move replacement tests from test/unit/upload_service_test.rb to
  test/functional/post_replacement_controller_test.rb
* Move UploadService::Replacer to PostReplacementProcessor.
* Fix a minor bug where if you used the API to replace a post with a file,
  the replacement would fail unless you passed an empty string for the
  replacement_url.
2022-01-31 14:17:14 -06:00
evazion
61c043c6b1 posts: normalize Unicode to NFC form in post sources.
Fix strings like "pokémon" (NFD form) and "pokémon" (NFC form) being
considered different strings in sources.

Also add a fix script to fix existing sources. There were only 15 posts
with unnormalized sources.
2022-01-31 14:16:49 -06:00
evazion
dadd6aed47 uploads: fix not being able to change the source field during upload.
Fix not being able to change the post's source when submitting the
upload. For example, if you were uploading a Twitter image from a direct
Twitter image URL, and you tried to change the source to the tweet URL
on the upload page before creating the post, then the source would be
ignored when the post was created.
2022-01-30 03:13:49 -06:00
evazion
abdab7a0a8 uploads: rework upload process.
Rework the upload process so that files are saved to Danbooru first
before the user starts tagging the upload.

The main user-visible change is that you have to select the file first
before you can start tagging it. Saving the file first lets us fix a
number of problems:

* We can check for dupes before the user tags the upload.
* We can perform dupe checks and show preview images for users not using the bookmarklet.
* We can show preview images without having to proxy images through Danbooru.
* We can show previews of videos and ugoira files.
* We can reliably show the filesize and resolution of the image.
* We can let the user save files to upload later.
* We can get rid of a lot of spaghetti code related to preprocessing
  uploads. This was the cause of most weird "md5 confirmation doesn't
  match md5" errors.

(Not all of these are implemented yet.)

Internally, uploading is now a two-step process: first we create an upload
object, then we create a post from the upload. This is how it works:

* The user goes to /uploads/new and chooses a file or pastes an URL into
  the file upload component.
* The file upload component calls `POST /uploads` to create an upload.
* `POST /uploads` immediately returns a new upload object in the `pending` state.
* Danbooru starts processing the upload in a background job (downloading,
  resizing, and transferring the image to the image servers).
* The file upload component polls `/uploads/$id.json`, checking the
  upload `status` until it returns `completed` or `error`.
* When the upload status is `completed`, the user is redirected to /uploads/$id.
* On the /uploads/$id page, the user can tag the upload and submit it.
* The upload form calls `POST /posts` to create a new post from the upload.
* The user is redirected to the new post.

This is the data model:

* An upload represents a set of files uploaded to Danbooru by a user.
  Uploaded files don't have to belong to a post. An upload has an
  uploader, a status (pending, processing, completed, or error), a
  source (unless uploading from a file), and a list of media assets
  (image or video files).

* There is a has-and-belongs-to-many relationship between uploads and
  media assets. An upload can have many media assets, and a media asset
  can belong to multiple uploads. Uploads are joined to media assets
  through a upload_media_assets table.

  An upload could potentially have multiple media assets if it's a Pixiv
  or Twitter gallery. This is not yet implemented (at the moment all
  uploads have one media asset).

  A media asset can belong to multiple uploads if multiple people try
  to upload the same file, or if the same user tries to upload the same
  file more than once.

New features:

* On the upload page, you can press Ctrl+V to paste an URL and immediately upload it.
* You can save files for upload later. Your saved files are at /uploads.

Fixes:

* Improved error messages when uploading invalid files, bad URLs, and
  when forgetting the rating.
2022-01-28 04:13:22 -06:00
evazion
c3c4f5a2a7 Fix #4957: Autotag non-web_source.
Autotag non-web_source on posts that have a non-http:// or https:// URL.
Add a fix script to backfill old posts.

Syntactically invalid URLs are still considered web sources. For
example, `https://google,com` technically isn't a valid URL, but it's
not considered a non-web source.
2022-01-14 22:58:27 -06:00
evazion
33828ec8a4 posts: remove set_tag_string method. 2022-01-11 10:06:46 -06:00
evazion
2e1c7ce6d3 Fix #4951: chartags:0 returning posts with chartags.
* Add fix script to fix posts with incorrect tag_count_* fields.
* Simplify the code for updating tag_count_* fields (no functional change).
2022-01-10 13:33:56 -06:00
evazion
72ea78e697 searchable: replace find_ordered with in_order_of.
Rails 7 added an `in_order_of` method that does what our `find_ordered`
method did before.
2022-01-07 14:24:57 -06:00
evazion
53527b9b29 posts: remove pool_string, fav_string from ignored columns.
These columns have been removed from the database.
2022-01-07 11:22:10 -06:00
evazion
993965b654 posts: reduce string allocations during thumbnail generation.
Further micro-optimize thumbnails to reduce string allocations.

`Post#levelblocked?` gets called once per thumbnail. Before it split the
tag string, which meant one string allocation for each tag on each post.
This added up to thousands of string allocations per pageload.
2021-12-16 17:17:06 -06:00
evazion
1c5786d20f posts: remove cropped thumbnails. 2021-12-16 15:58:29 -06:00