Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.
AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.
The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.
You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).
You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.
To generate tags, use the `autotag` script from the Autotagger repo, something like this:
docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz
To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
Don't record most HTTP request and response headers in the APM, except
for the User-Agent, Referer, Save-Data, X-Forwarded-For, Accept-Language,
and Content-Type headers. Recording every HTTP header for every request
takes up a lot of space and most of them aren't very useful.
Add a new tag tag search parser that supports full boolean expressions, including `and`,
`or`, and `not` operators and parenthesized subexpressions.
This is only the parser itself, not the code for converting the search into SQL. The new
parser isn't used yet for actual searches. Searches still use the old parser.
Some example syntax:
* `1girl 1boy`
* `1girl and 1boy` (same as `1girl 1boy`)
* `1girl or 1boy`
* `~1girl ~1boy` (same as `1girl or 1boy`)
* `1girl and ((blonde_hair blue_eyes) or (red_hair green_eyes))`
* `1girl ~(blonde_hair blue_eyes) ~(red_hair green_eyes)` (same as above)
* `1girl -(blonde_hair blue_eyes)`
* `*_hair *_eyes`
* `*_hair or *_eyes`
* `user:evazion or fav:evazion`
* `~user:evazion ~fav:evazion`
Rules:
AND is implicit between terms, but may be written explicitly:
* `a b c` is `a and b and c`
AND has higher precedence (binds tighter) than OR:
* `a or b and c or d` is `a or (b and c) or d`
* `a or b c or d e` is `a or (b and c) or (d and e)`
All `~` operators in the same subexpression are combined into a single OR:
* `a b ~c ~d` is `a b (c or d)`
* `~a ~b and ~c ~d` is `(a or b) (c or d)`
* `(~a ~b) (~c ~d)` is `(a or b) (c or d)`
A single `~` operator in a subexpression by itself is ignored:
* `a ~b` is `a b`
* `~a and ~b` is `a and b`, which is `a b`
* `(~a) ~b` is `a ~b`, which is `a b`
The parser is written as a backtracking recursive descent parser built on top of
StringScanner and a handful of parser combinators. The parser generates an AST, which is
then simplified using Boolean algebra to remove redundant nodes and to convert the
expression to conjunctive normal form (that is, a product of sums, or an AND of ORs).
Fix this deprecation warning:
DEPRECATION WARNING: ActiveSupport::TimeWithZone.name has been deprecated
and from Rails 7.1 will use the default Ruby implementation. You can set
`config.active_support.remove_deprecated_time_with_zone_name = true` to
enable the new behavior now.
Triggered by the XML serializer in the API.
Fix regression introduced in 0db20e0ca. Setting `format: false` on the
wiki pages resource disabled format negotiation on all wiki page routes,
not just the show page, which meant /wiki_pages.json no longer worked.
The fix to monkey patch the internal Rails method that parses the file
extension from the URL, and have it ignore everything but the .html,
.json, .js, and .xml extensions. This is really hacky and may break in
future Rails releases.
* Update framework files with `bin/rails app:update`.
* Update to use new Rails 7.0 default settings, except for a couple
things regarding new cookie and cache formats that would prevent us
from rolling back to Rails 6.1 if necessary.
Add ability to search jobs on the /jobs page by job type or by status.
Fixes#2577 (Search filters for delayed jobs). This wasn't possible
before with DelayedJobs because it stored the job data in a YAML string,
which made it difficult to search jobs by type. GoodJobs stores job data
in a JSON object, which is easier to search in Postgres.
Remove the DelayedJobs gem and database table. Completes the transition
to GoodJob started in c06bfa64f and f4953549a.
Downstream users can upgrade as follows:
* Stop the Rails server.
* Stop the DelayedJobs worker (normally running as `bin/delayed_job` or `bin/rails jobs:work`).
* Run `bin/rails jobs:work` to finish any pending delayed jobs.
* Run `bin/rails db:migrate` to create the good_jobs table and drop the delayed_jobs table.
* Start the Rails server again.
* Start the GoodJobs worker with `bin/good_job start`.
Switch the ActiveJob backend from DelayedJob to GoodJob. Differences:
* The job worker is run with `bin/good_job start` instead of `bin/delayed_job`.
* Jobs have an 8 hour timeout instead of a 4 hour timeout.
* Jobs don't automatically retry on failure.
* Finishing jobs are preserved and pruned after 7 days.
Monkey-patch Symbol#to_s to return a frozen (immutable) string instead
of a mutable string.
This should reduce string allocations, and thereby reduce memory usage
and garbage collector pressure, but it may be incompatible with
libraries that expect Symbol#to_s to return a mutable string.
https://bugs.ruby-lang.org/issues/16150https://github.com/Shopify/symbol-fstring
Remove the SFTP file storage backend. Downstream users can use either
sshfs (which is what Danbooru now uses in production) or rclone instead.
The Ruby SFTP gem was much slower than sshfs.
Send all logs to stderr by default instead of stdout. Fixes a problem
where parsing the output of sandboxed commands could fail, because they
could contain Rails log messages in their stdout.
When we run a command in a sandbox, we call fork+exec to run the command
in the background so we can capture its output. If Rails prints
anything to stdout between the fork and exec calls, then it will be
inadvertently captured along with the command's output. This will break
parsing of the command's output. This can happen if warning messages are
printed by Rails while setting up the sandbox between the fork and exec
calls.
Writing to stderr is also more correct, since stdout is buffered by
default, which means logs could potentially be lost if the process dies
unexpectedly before the buffers are flushed. Stderr is unbuffered by
default, which means logs will always be output immediately.
Unlike Unicorn, Puma doesn't have a builtin HTTP request timeout
mechanism, so we have to use Rack::Timeout instead.
See the caveats in the Rack::Timeout documentation [1]. In Unicorn, a
timeout would send a SIGKILL to the worker, immediately killing it. This
would result in a dropped connection and a Cloudflare 502 error to the
user. In Puma, it raises an exception, which we can catch and return a
better error to the user. On the other hand, raising an exception can
potentially corrupt application state if it's sent at the wrong time, or
be delayed indefinitely if the app is stuck in IO or C extension code.
The default request timeout is 65 seconds. 65 seconds is to give things
like HTTP requests on a 60 second timeout enough time to complete. Set
the RACK_REQUEST_TIMEOUT environment variable to change the timeout.
1: https://github.com/sharpstone/rack-timeout#further-documentation
Using `Rails.logger` here causes server boot to fail with a `Undefined
method 'tagged'` error, possibly because `Rails.logger` isn't ready yet
during early initialization.
* Add README files to several directories in app/ giving a brief
overview of some parts of Danbooru's architecture.
* Add documentation for files in config/.
Fix an issue where the New Relic agent always started in the production
environment, even when a license key wasn't configured.
Also make the New Relic agent log to stdout instead of log/newrelic_agent.log.
Fix Rails complaining about IpAddressType not being reloaded by hot
reloading:
DEPRECATION WARNING: Initialization autoloaded the constant IpAddressType.
Being able to do this is deprecated. Autoloading during initialization is going
to be an error condition in future versions of Rails.
Reloading does not reboot the application, and therefore code executed during
initialization does not run again. So, if you reload IpAddressType, for example,
the expected changes won't be reflected in that stale Class object.
This autoloaded constant has been unloaded.
In order to autoload safely at boot time, please wrap your code in a reloader
callback this way:
Rails.application.reloader.to_prepare do
# Autoload classes and modules needed at boot time here.
end
That block runs when the application boots, and every time there is a reload.
For historical reasons, it may run twice, so it has to be idempotent.
Check the "Autoloading and Reloading Constants" guide to learn more about how
Rails autoloads and reloads.
Remove a monkey patch that added a `to_i` method to `FalseClass` so that
`false.to_i` returned 0. This is legacy code that shouldn't still be in
use anywhere. It doesn't really work anyway, because `true.to_i` isn't
defined.
* Fix it so that all edit forms show an error banner if the form
has validation errors. Previously forms had to manually call
`error_messages_for`, which not all forms did.
* Fix it so that the full validation error message is shown next to each
input attribute that had errors. Also update the styling of these
error messages to look better.
Fix `normalize_whitespace` to not strip zero-width joiner characters
(U+200D). These characters are used in emoji and stripping them breaks
some artist other names that use emoji.
This upgrades from the legacy version of Stripe's checkout system to the
new version:
> The legacy version of Checkout presented customers with a modal dialog
> that collected card information, and returned a token or a source to
> your website. In contrast, the new version of Checkout is a smart
> payment page hosted by Stripe that creates payments or subscriptions. It
> supports Apple Pay, Dynamic 3D Secure, and many other features.
Basic overview of the new system:
* We send the user to a checkout page on Stripe.
* Stripe collects payment and sends us a webhook notification when the
order is complete.
* We receive the webhook notification and upgrade the user.
Docs:
* https://stripe.com/docs/payments/checkout
* https://stripe.com/docs/payments/checkout/migration#client-products
* https://stripe.com/docs/payments/handling-payment-events
* https://stripe.com/docs/payments/checkout/fulfill-orders
Most of the new settings aren't relevant to us. We do have to fix some
tests to work around a Rails bug. `assert_enqueued_email_with` uses the
wrong queue, so we have to specify it explicitly. This is fixed in Rails
HEAD but not yet released.
* Introduce an abstraction for normalizing attributes. Very loosely
modeled after https://github.com/fnando/normalize_attributes.
* Normalize wiki bodies to Unicode NFC form.
* Normalize Unicode space characters in wiki bodies (strip zero width
spaces, normalize line endings to CRLF, normalize Unicode spaces to
ASCII spaces).
* Trim spaces from the start and end of wiki page bodies. This may cause
wiki page diffs to show spaces being removed even when the user didn't
explicitly remove the spaces themselves.
Expand the tag abbreviation system introduced in b0be8ae45 so that it
works in searches and when tagging posts, not just in autocomplete.
For example, you can tag a post with /evth and it will add the tag
eyebrows_visible_through_hair. You can search for /evth and it will
search for the tag eyebrows_visible_through_hair.
Some more examples:
* /ops is short for one-piece_swimsuit
* /hooe is short for hair_over_one_eye
* /saol is short for standing_on_one_leg
* /tlozbotw is short for the_legend_of_zelda:_breath_of_the_wild
If two tags have the same abbreviation, then the larger tag takes
precedence. For example, /be is short for blue_eyes, not brown_eyes,
because blue_eyes is the bigger tag.
If there is an existing shortcut alias that conflicts with the
abbreviation, then the alias take precedence. For example, /sh is short
for suzumiya_haruhi, not short_hair, because there's an old alias for
/sh -> suzumiya_haruhi.
Standardize font sizes and heading tags (<h1>-<h6>) to be more
consistent across the site.
Changes:
* Introduce font size CSS variables and start replacing hardcoded font
sizes with standard sizes.
* Change header tags to use only one <h1> per page. One <h1> per page is
recommended for SEO purposes. Usually this is for the page title, like
in forum threads or wiki pages.
* Standardize on <h2> for section headers in sidebars and <h3> for
smaller subsection headers. Don't use <h4>-<h6>.
* In DText, make h1-h4 headers all the same size. Standard wiki style is
to ignore h1-h3 and start at h4.
* In DText, make h4-h6 the same size as the h1-h3 tags outside of DText.
* In the tag list, change the <h1> and <h2> tag category headers to <h3>.
* Make usernames in comments and forum posts smaller. Also change the
<h4> tag for the commenter name to <div class="author-name">.
* Make the tag list, paginator, and nav menu smaller on mobile.
* Change h1#app-name-header to a#app-name-header.
Rework sitemaps to provide more coverage of the site. We want every
important page on the site - including every post, tag, and wiki page -
to be indexed by Google. We do this by generating sitemaps and sitemap
indexes that contain links to every important page on the site.
Makes it so that models that have maximum length validations will add
maxlength attributes to form fields. This includes flag reasons, appeal
reasons, and forum topic titles.
Partially fixes#4519 (Add "n/m characters remaining" character counter to the appeal reason).
https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/maxlength
* Refactors DText form fields to use a custom SimpleForm input instead
of manually generated html. This fixes it so that DText fields use the
same markup as normal SimpleForm fields, which lets us apply browser
maxlength validations to DText input fields.
* Fixes autocomplete for @-mentions only working in comments and forum posts.
Now @-mention autocomplete works in all DText fields, including dmails.
Known bug: it applies in artist commentary fields when it shouldn't.