Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.
AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.
The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images, with about 50 tags per image). This amounts to over 10GB of data, plus
indexes.
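For illustration, a migration along these lines would produce that schema (column types mirror the schema above; skipping the surrogate primary key keeps rows small, and the index choices here are only examples):

    class CreateAITags < ActiveRecord::Migration[6.1]
      def change
        # No surrogate primary key; every byte matters at 300+ million rows.
        create_table :ai_tags, id: false do |t|
          t.integer :media_asset_id, null: false
          t.integer :tag_id, null: false
          t.integer :score, null: false, limit: 2 # 2 bytes => smallint
        end

        # Example indexes for tag browsing and per-asset lookups.
        add_index :ai_tags, [:tag_id, :score]
        add_index :ai_tags, :media_asset_id
      end
    end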
You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).
You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.
To generate tags, use the `autotag` script from the Autotagger repo, something like this:
docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz
To import tags, use the fix script in script/fixes/. Expect tag generation to take hours to days
for a Danbooru-sized dataset, and the import to take 20-30 minutes. Currently this all has to be done by hand.
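The import itself is conceptually just bulk-inserting rows from that CSV. A simplified sketch (the `AITag` model name, the CSV column order, and md5-based filenames are assumptions, not necessarily what the real fix script does):

    require "zlib"
    require "csv"

    Zlib::GzipReader.open("tags.csv.gz") do |gz|
      CSV.new(gz).each_slice(10_000) do |rows|
        AITag.insert_all(rows.map do |filename, tag_name, score|
          # Assumes thumbnails are named by md5 and scores fit in a smallint.
          {
            media_asset_id: MediaAsset.find_by!(md5: File.basename(filename, ".*")).id,
            tag_id: Tag.find_by!(name: tag_name).id,
            score: score.to_i
          }
        end)
      end
    end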
Add options to disable comments, the forum, and autocomplete. This is
for personal boorus and potentially for safe mode. Note that disabling
the forum may cause difficulties with creating and approving BURs.
Disabling comments and the forum merely hides them from most areas,
rather than completely removing them.
Refactor ratings to not be hardcoded in various places. Make it so
all ratings are defined in Post::RATINGS.
Also make it so that you can search multiple ratings at once with `rating:q,e`.
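A sketch of the idea (the exact contents of `Post::RATINGS` are whatever the codebase defines; the point is that search code derives valid ratings from the constant instead of hardcoding them):

    class Post < ApplicationRecord
      RATINGS = { "s" => "Safe", "q" => "Questionable", "e" => "Explicit" }
    end

    # rating:q,e expands to an OR over each listed rating.
    def rating_matches(relation, value)
      ratings = value.split(",").map(&:downcase) & Post::RATINGS.keys
      relation.where(rating: ratings)
    end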
Fix category prefix metatags not working in autocomplete. Now typing
e.g. `copy:t` will show copyright tags starting with 't' in autocomplete.
Also fix it so that tags beginning with a '(' work in autocomplete.
Typing e.g. `-(tou` will show `touhou` in autocomplete.
This also changes autocomplete so that when you type a negated tag,
e.g. `-touhou`, it sends `touhou` in the autocomplete API call rather
than `-touhou`. This makes caching more effective, since negated tags
are cached the same as non-negated tags.
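Conceptually the typed term is normalized before matching, roughly like this (the method name and category mapping are illustrative):

    def normalize_autocomplete_term(term)
      term = term.delete_prefix("-") # "-touhou" is looked up (and cached) as "touhou"
      metatag, rest = term.split(":", 2)

      category = { "gen" => 0, "art" => 1, "copy" => 3, "char" => 4, "meta" => 5 }[metatag]
      if rest && category
        { query: rest, category: category } # "copy:t" => copyright tags starting with "t"
      else
        { query: term, category: nil }
      end
    end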
Restructure the Dockerfile and the CSS/JS files so that we only rebuild
the CSS and JS when they change, not on every commit.
Before it took several minutes to rebuild the Docker image after every
commit, even when the JS/CSS files didn't change. This also made pulling
images slower.
This requires refactoring the CSS and JS to not use embedded Ruby (ERB)
templates, since this made the CSS and JS dependent on the Ruby
codebase, which is why we had to rebuild the assets after every Ruby
change.
Optimize autocomplete to ignore various types of bogus input that will
never match anything. It turns out it's not uncommon for people to do
things like paste random URLs into autocomplete, or hold down keys, or
enter long strings of gibberish text (sometimes in other languages).
Some things, like autocorrect and slash abbreviations, become
pathologically slow when fed certain types of bad input.
Autocomplete will abort and return nothing in the following situations (sketched in code after this list):
* Searching for URLs (tags that start with http:// or https://).
* Overly long tags (strings longer than the 170 char tag name limit).
* Slash abbreviations longer than 10 chars (e.g. typing `/qwoijqoiqogirqewgoi`).
* Slash abbreviations that aren't alphanumeric (e.g. typing `/////////`).
* Autocorrect input that contains too much punctuation and not enough actual letters.
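Roughly, as guard clauses (the helper and method names here are illustrative):

    def autocomplete_tag(term)
      return [] if term.match?(%r{\Ahttps?://}) # pasted URLs
      return [] if term.length > 170            # longer than the tag name limit

      if term.start_with?("/")
        abbrev = term.delete_prefix("/")
        return [] if abbrev.length > 10                  # e.g. "/qwoijqoiqogirqewgoi"
        return [] unless abbrev.match?(/\A[a-z0-9]+\z/i) # e.g. "/////////"
        return abbreviation_matches(abbrev)
      end

      results = prefix_matches(term)
      # Only fall back to autocorrect when the input is mostly letters.
      results = autocorrect_matches(term) if results.empty? && term.count("a-zA-Z") > term.length / 2
      results
    end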
Optimize searches for non-English phrases in autocomplete. These
searches were pretty slow, and could sometimes cause sitewide lag spikes
when users typed long strings of non-English text into the search box
and caused an unintentional DoS.
The trick is to add a GIN index on `array_to_tsvector(other_names)`.
This supports fast string prefix matching against all
elements of the array. The downside is that it doesn't allow infix or
suffix matches, so we can't support wildcards in general. Wildcards
didn't quite work anyway, since artist and wiki other names can contain
literal '*' characters.
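In migration form, plus an example of the prefix query the index can answer (the `:*` suffix marks a tsquery prefix match; infix and suffix wildcards can't use the index):

    # In a migration, for both tables that carry other_names:
    add_index :wiki_pages, "array_to_tsvector(other_names)", using: :gin
    add_index :artists, "array_to_tsvector(other_names)", using: :gin

    # Find wiki pages with any other name starting with "東方".
    WikiPage.where("array_to_tsvector(other_names) @@ ?::tsquery", "'東方':*")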
Expand the tag abbreviation system introduced in b0be8ae45 so that it
works in searches and when tagging posts, not just in autocomplete.
For example, you can tag a post with /evth and it will add the tag
eyebrows_visible_through_hair. You can search for /evth and it will
search for the tag eyebrows_visible_through_hair.
Some more examples:
* /ops is short for one-piece_swimsuit
* /hooe is short for hair_over_one_eye
* /saol is short for standing_on_one_leg
* /tlozbotw is short for the_legend_of_zelda:_breath_of_the_wild
If two tags have the same abbreviation, then the larger tag takes
precedence. For example, /be is short for blue_eyes, not brown_eyes,
because blue_eyes is the bigger tag.
If there is an existing shortcut alias that conflicts with the
abbreviation, then the alias takes precedence. For example, /sh is short
for suzumiya_haruhi, not short_hair, because there's an old alias for
/sh -> suzumiya_haruhi.
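One way to express the matching (an illustration, not the exact implementation; model and scope names are assumed): each letter of the abbreviation has to start a word of the tag, ties are broken by post count, and an existing alias wins outright.

    def abbreviation_to_regex(abbrev)
      # "evth" => "^e[^_-]*[_-]v[^_-]*[_-]t[^_-]*[_-]h[^_-]*$"
      "^" + abbrev.downcase.chars.join("[^_-]*[_-]") + "[^_-]*$"
    end

    def expand_abbreviation(abbrev)
      # An existing alias like /sh -> suzumiya_haruhi beats the abbreviation.
      aliased = TagAlias.active.find_by(antecedent_name: "/#{abbrev}")
      return aliased.consequent_name if aliased

      # Otherwise the biggest matching tag wins (/be -> blue_eyes, not brown_eyes).
      Tag.where("name ~ ?", abbreviation_to_regex(abbrev)).order(post_count: :desc).first&.name
    end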
Fix an exception that could occur when typing "/" by itself in
autocomplete and a regular tag starting with "/" was returned. The
exception was raised in `r[:antecedent].length` because the tag's
antecedent was nil.
* Remove the `source` and `weight` html data attributes (no longer used).
* Make the `type` html data attribute properly indicate the completion
type. Valid types: `tag`, `tag-alias`, `tag-abbreviation`,
`tag-autocorrect`, `tag-other-name`.
Allow typing Japanese tags in autocomplete. For example, typing 東方
in autocomplete will be completed to the touhou tag. Typing ぶくぶ will
complete to the bkub tag.
This works using wiki page and artist other names. Effectively, any name
listed as an other name in a wiki or artist page will be treated like an
alias for autocomplete purposes. This is limited to non-ASCII other names,
to prevent English other names from interfering with regular tag searches.
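A sketch of the matching side (not the actual code; quote-escaping of the search term is omitted): prefix-match the typed text against other_names, then keep only non-ASCII matches and treat them as aliases for the wiki title.

    def other_name_matches(term)
      WikiPage.where("array_to_tsvector(other_names) @@ ?::tsquery", "'#{term}':*").flat_map do |wiki|
        matches = wiki.other_names.grep(/\A#{Regexp.escape(term)}/)
        # Plain-ASCII other names are ignored so they can't shadow normal tag matches.
        matches.reject(&:ascii_only?).map { |name| { antecedent: name, tag: wiki.title } }
      end
    end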
Display autocorrected tags similar to aliases, with an arrow pointing at
the corrected tag, but with a dotted underline beneath the misspelled
tag to indicate that it's misspelled.
Tune autocorrect to produce fewer false positives. Before we used
trigram similarity. Now we use Levenshtein edit distance with a dynamic
typo threshold. Trigram similarity was able to correct large
transpositions (e.g. `miku_hatsune` -> `hatsune_miku`), but it was bad
at correcting small typos. Levenshtein is good at small typos, but can't
correct large transpositions.
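A sketch of the approach (the threshold formula here is an assumption, not the exact one used): allow more edits for longer terms so that short tags don't get "corrected" into unrelated tags.

    require "did_you_mean" # stdlib; provides DidYouMean::Levenshtein.distance

    def autocorrect_matches(term, candidate_tags)
      max_edits = 1 + term.length / 8 # dynamic typo threshold (illustrative)

      candidate_tags.select { |tag| DidYouMean::Levenshtein.distance(term, tag.name) <= max_edits }
                    .sort_by { |tag| -tag.post_count }
    end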
Fix an exception that was thrown when trying to autocomplete saved
search labels (e.g. `search:all`) as an anonymous user. This was a
pre-existing bug.
The previous cache policy was that all autocomplete results were cached
for a fixed 7 days. The new policy is that if autocomplete returns more
than 10 results, they're cached for 24 hours; if it returns fewer than
10 results, they're cached for 1 hour.
The rationale is that when autocomplete returns a lot of results, the
top 10 are relatively stable and unlikely to change, but when it returns
fewer than 10 results, the results are unstable and can easily change.
We also change it so that autocomplete calls can be cached publicly.
Public caching means that HTTP requests are cached by Cloudflare. This
will ideally reduce load on the server and reduce latency for end users.
This is only safe for calls that return the same results for all users
(i.e. the results don't depend on the current user), since the cache is
publicly shared by all users. Currently username, favgroup, and saved
search autocomplete results depend on the current user, so they can't be
publicly cached.
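Putting both rules together, the controller logic is roughly this (paraphrased, not the exact code; the behavior at exactly 10 results is a guess):

    def set_autocomplete_cache_headers(results, personalized:)
      duration = results.size >= 10 ? 24.hours : 1.hour

      # Username, favgroup, and saved-search completions depend on the current
      # user, so they stay private; everything else can be cached by Cloudflare.
      expires_in duration, public: !personalized
    end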
Rework tag autocomplete to work the same way for all users. Previously
autocomplete for Builders worked differently than autocomplete for
regular users.
This is how it works now (a code sketch follows the lists below):
* If the search starts with a slash (/), then do a tag abbreviation
match. For example, `/evth` matches eyebrows_visible_through_hair.
* Otherwise if the search contains a wildcard (*), then just do a simple
wildcard search.
* Otherwise do a tag prefix match against tags and aliases. For example,
`black` matches all tags or aliases beginning with `black`.
* If the tag prefix match returns no results, then do an autocorrect match.
The differences for regular users:
* You can abbreviate tags with a slash (/).
The differences for Builders:
* Now tag abbreviations have to start with a slash (/).
* Autocorrect isn't performed unless a regular search returns no results.
* Results are always sorted by tag count. Before different types of
results (regular tag matches, alias matches, abbreviation matches,
and autocorrect matches) were all mixed together based on a tag
weighting scheme.
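The flow above, roughly as code (the helper names are illustrative):

    def tag_matches(term)
      results =
        if term.start_with?("/")
          abbreviation_matches(term.delete_prefix("/")) # /evth
        elsif term.include?("*")
          wildcard_matches(term)                        # simple wildcard search
        else
          prefix_matches(term).presence || autocorrect_matches(term)
        end

      results.sort_by { |tag| -tag.post_count }         # always sorted by tag count
    end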
This refactors the autocomplete Javascript to use a single dedicated
/autocomplete.json endpoint instead of a bunch of separate endpoints.
This simplifies the autocomplete Javascript: instead of calling a
different endpoint for each type of query (users, wiki pages, pools,
artists, etc.), then parsing the results of each call to get the data
we need, we can call a single endpoint that returns exactly what we
need.
This also means we don't have to parse searches clientside in order to
autocomplete metatags. Instead we can just pass the search term to the
server and let it parse the search, which is easy to do serverside.
Finally, this makes autocomplete easier to test, and it makes it easier
to add more sophisticated autocomplete behavior, since most of the logic
lives serverside.
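For example, a call might look something like this (the parameter and field names here are hypothetical, not the real API contract):

    require "net/http"
    require "json"

    uri = URI("https://danbooru.donmai.us/autocomplete.json")
    uri.query = URI.encode_www_form(query: "/evth", type: "tag_query", limit: 10)

    # Each result is expected to carry the value to insert into the search box
    # and the completion type (tag, tag-alias, tag-abbreviation, ...).
    JSON.parse(Net::HTTP.get(uri)).each { |r| puts "#{r["type"]}: #{r["value"]}" }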