180 Commits

Author SHA1 Message Date
evazion
b41b67af6c media assets: add dynamically-generated thumbnails (owner-only).
Add ability to dynamically generate thumbnails with:

* https://danbooru.donmai.us/media_assets/6961761.jpg?width=180&height=180

This is currently restricted to the Owner-level user because it's slow.
2022-11-01 01:36:38 -05:00
evazion
edc7e52353 emails: automatically fix typos in email addresses.
Try to automatically fix various kinds of typos and common mistakes in
email addresses when a user creates a new account. It's common for users
to sign up with addresses like `name@gmai.com`, which leads to bounces
when we try to send the welcome email.
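
A minimal sketch of this kind of correction, not the actual implementation. The `DOMAIN_TYPOS` table and the `fix_email_typos` helper are hypothetical names introduced here for illustration; the commit message doesn't list which typos are handled.

```ruby
# Hypothetical table of common domain typos (assumption: the real list is
# not given in the commit message).
DOMAIN_TYPOS = {
  "gmai.com"    => "gmail.com",
  "gmial.com"   => "gmail.com",
  "hotmial.com" => "hotmail.com",
}.freeze

# Strips surrounding whitespace, lowercases the domain, and replaces the
# domain if it is a known typo. Addresses without a usable "@" are returned
# stripped but otherwise unchanged.
def fix_email_typos(address)
  local, _, domain = address.strip.rpartition("@")
  return address.strip if local.empty? || domain.empty?

  domain = domain.downcase
  "#{local}@#{DOMAIN_TYPOS.fetch(domain, domain)}"
end
```

For example, `fix_email_typos("name@gmai.com")` returns `"name@gmail.com"`.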
2022-10-14 18:49:33 -05:00
evazion
1aeb52186e Add AI tag model and UI.
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.

You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).
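
The two searches above amount to set differences between AI-predicted tags and human-applied tags. A rough sketch with in-memory hashes standing in for the database (the helper names and sample data are hypothetical):

```ruby
require "set"

# Like `ai:<tag> -<tag>`: the AI predicted the tag but the post lacks it.
def possibly_missing(tag, human_tags, ai_tags)
  ai_tags.select { |id, tags| tags.include?(tag) && !human_tags.fetch(id, Set.new).include?(tag) }.keys
end

# Like `<tag> -ai:<tag>`: the post has the tag but the AI didn't predict it.
def possibly_mistagged(tag, human_tags, ai_tags)
  human_tags.select { |id, tags| tags.include?(tag) && !ai_tags.fetch(id, Set.new).include?(tag) }.keys
end

# Hypothetical sample data: post id => set of tags.
sample_human = { 1 => Set["scenery", "sky"], 2 => Set["1girl"], 3 => Set["scenery"] }
sample_ai    = { 1 => Set["scenery"], 2 => Set["scenery", "1girl"], 3 => Set["1girl"] }

possibly_missing("scenery", sample_human, sample_ai)   # => [2]
possibly_mistagged("scenery", sample_human, sample_ai) # => [3]
```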

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
2022-06-24 04:54:26 -05:00
evazion
c187d56cce apm: record only select http headers in the apm.
Don't record most HTTP request and response headers in the APM, except
for the User-Agent, Referer, Save-Data, X-Forwarded-For, Accept-Language,
and Content-Type headers. Recording every HTTP header for every request
takes up a lot of space and most of them aren't very useful.
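
The allowlist idea can be sketched in plain Ruby as below. This is not the Elastic APM agent's configuration or API; `RECORDED_HEADERS` and `recorded_headers` are hypothetical names used only to illustrate the filtering.

```ruby
# Headers named in the commit message, compared case-insensitively.
RECORDED_HEADERS = %w[
  User-Agent Referer Save-Data X-Forwarded-For Accept-Language Content-Type
].map(&:downcase).freeze

# Keeps only the allowlisted headers from a name => value hash.
def recorded_headers(headers)
  headers.select { |name, _value| RECORDED_HEADERS.include?(name.downcase) }
end
```

Given `{ "User-Agent" => "curl", "Cookie" => "secret" }`, only the `User-Agent` entry is kept.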
2022-04-19 06:59:24 -05:00
evazion
5f1c296011 tags: don't allow tags with unbalanced parentheses.
Don't allow tags to have unbalanced parentheses, except for a few
emoticon tags as special exceptions to the rule.
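
One way such a check could look, as a sketch. The exception list here is hypothetical; the commit message doesn't say which emoticon tags are exempt.

```ruby
# Hypothetical exception list (assumption: the real set of emoticon tags
# is not given in the commit message).
EMOTICON_EXCEPTIONS = [":)", ":(", ";)", ";("].freeze

# True if every ")" closes an earlier "(" and no "(" is left open.
def balanced_parentheses?(tag)
  depth = 0
  tag.each_char do |char|
    depth += 1 if char == "("
    depth -= 1 if char == ")"
    return false if depth.negative?
  end
  depth.zero?
end

def valid_tag_name?(tag)
  EMOTICON_EXCEPTIONS.include?(tag) || balanced_parentheses?(tag)
end
```

With this sketch, `fate_(series)` passes, `fate_(series` is rejected, and `:)` passes only because it is on the exception list.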
2022-04-17 23:20:22 -05:00
evazion
c21c25089d apm: disable Elastic APM initializer.
This caused problems because it effectively started the APM agent twice,
causing the configuration to be ignored and duplicate events to be sent.
2022-04-16 18:07:04 -05:00
evazion
f69847fc59 Add Elastic APM integration.
https://www.elastic.co/guide/en/apm/agent/ruby/4.x/introduction.html
2022-04-12 20:49:10 -05:00
evazion
98b313f8de Remove NewRelic integration.
Remove the NewRelic integration in preparation for migrating to Elastic APM instead.
2022-04-11 01:46:30 -05:00
evazion
4c7cfc73c6 search: add new tag search parser.
Add a new tag search parser that supports full boolean expressions, including `and`,
`or`, and `not` operators and parenthesized subexpressions.

This is only the parser itself, not the code for converting the search into SQL. The new
parser isn't used yet for actual searches. Searches still use the old parser.

Some example syntax:

* `1girl 1boy`
* `1girl and 1boy` (same as `1girl 1boy`)
* `1girl or 1boy`
* `~1girl ~1boy` (same as `1girl or 1boy`)
* `1girl and ((blonde_hair blue_eyes) or (red_hair green_eyes))`
* `1girl ~(blonde_hair blue_eyes) ~(red_hair green_eyes)` (same as above)
* `1girl -(blonde_hair blue_eyes)`
* `*_hair *_eyes`
* `*_hair or *_eyes`
* `user:evazion or fav:evazion`
* `~user:evazion ~fav:evazion`

Rules:

AND is implicit between terms, but may be written explicitly:

* `a b c` is `a and b and c`

AND has higher precedence (binds tighter) than OR:

* `a or b and c or d` is `a or (b and c) or d`
* `a or b c or d e` is `a or (b and c) or (d and e)`

All `~` operators in the same subexpression are combined into a single OR:

* `a b ~c ~d` is `a b (c or d)`
* `~a ~b and ~c ~d` is `(a or b) (c or d)`
* `(~a ~b) (~c ~d)` is `(a or b) (c or d)`

A single `~` operator in a subexpression by itself is ignored:

* `a ~b` is `a b`
* `~a and ~b` is `a and b`, which is `a b`
* `(~a) ~b` is `a ~b`, which is `a b`

The parser is written as a backtracking recursive descent parser built on top of
StringScanner and a handful of parser combinators. The parser generates an AST, which is
then simplified using Boolean algebra to remove redundant nodes and to convert the
expression to conjunctive normal form (that is, a product of sums, or an AND of ORs).
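
To illustrate the precedence rules above, here is a much-reduced recursive descent parser built on StringScanner. It is a sketch, not the parser from this commit: the class name and AST shape are assumptions, it handles only `and`, `or`, `-`, and parentheses, it treats `~` and metatags as ordinary tag text, and it does no backtracking or simplification to conjunctive normal form.

```ruby
require "strscan"

# Hypothetical, simplified parser. The AST is nested arrays:
# [:tag, name], [:not, node], [:and, *nodes], [:or, *nodes].
class TagSearchParser
  def initialize(query)
    @scanner = StringScanner.new(query)
  end

  def parse
    ast = parse_or
    skip_spaces
    raise ArgumentError, "unexpected input at #{@scanner.pos}" unless @scanner.eos?
    ast
  end

  private

  def skip_spaces
    @scanner.skip(/\s+/)
  end

  # Consumes `word` only when it stands alone as a token.
  def keyword(word)
    skip_spaces
    @scanner.scan(/#{word}(?=[\s()]|\z)/)
  end

  # or_expr := and_expr ("or" and_expr)*
  def parse_or
    terms = [parse_and]
    terms << parse_and while keyword("or")
    terms.one? ? terms.first : [:or, *terms]
  end

  # and_expr := factor (["and"] factor)*  -- AND is implicit between terms.
  def parse_and
    terms = [parse_factor]
    loop do
      explicit = keyword("and")
      skip_spaces
      at_end = @scanner.eos? || @scanner.check(/\)/) || @scanner.check(/or(?=[\s()]|\z)/)
      break if !explicit && at_end
      terms << parse_factor
    end
    terms.one? ? terms.first : [:and, *terms]
  end

  # factor := "-" factor | "(" or_expr ")" | tag
  def parse_factor
    skip_spaces
    if @scanner.scan(/-/)
      [:not, parse_factor]
    elsif @scanner.scan(/\(/)
      expr = parse_or
      skip_spaces
      @scanner.scan(/\)/) or raise ArgumentError, "expected ')' at #{@scanner.pos}"
      expr
    elsif (name = @scanner.scan(/[^\s()]+/))
      [:tag, name]
    else
      raise ArgumentError, "expected a tag at #{@scanner.pos}"
    end
  end
end

TagSearchParser.new("a or b and c or d").parse
# => [:or, [:tag, "a"], [:and, [:tag, "b"], [:tag, "c"]], [:tag, "d"]]
```

Because `parse_or` calls `parse_and` for each operand, AND groups its terms before OR sees them, which is what makes `a or b and c or d` parse as `a or (b and c) or d`.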
2022-03-29 18:21:46 -05:00
evazion
c989726313 rails: enable remove_deprecated_time_with_zone_name.
Fix this deprecation warning:

    DEPRECATION WARNING: ActiveSupport::TimeWithZone.name has been deprecated
    and from Rails 7.1 will use the default Ruby implementation. You can set
    `config.active_support.remove_deprecated_time_with_zone_name = true` to
    enable the new behavior now.

Triggered by the XML serializer in the API.
2022-03-09 01:14:09 -06:00
evazion
60a26af6e3 rails: add 'URL' inflection.
Make it so we can write `ArtistURL` instead of `ArtistUrl`.
2022-02-22 00:17:53 -06:00
evazion
90be15e0b5 Fix #4973: Wiki pages json index returns 404.
Fix regression introduced in 0db20e0ca. Setting `format: false` on the
wiki pages resource disabled format negotiation on all wiki page routes,
not just the show page, which meant /wiki_pages.json no longer worked.

The fix is to monkey patch the internal Rails method that parses the file
extension from the URL, and have it ignore everything but the .html,
.json, .js, and .xml extensions. This is really hacky and may break in
future Rails releases.
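
The allowlist behavior can be sketched in isolation as below. This is not the Rails internal method or the actual monkey patch; `split_format` is a hypothetical helper that only shows how unrecognized extensions are left as part of the path.

```ruby
RECOGNIZED_FORMATS = %w[html json js xml].freeze

# Splits a URL path into [path_without_extension, format]. An extension
# outside the allowlist is treated as part of the path, so format is nil.
def split_format(path)
  match = path.match(/\A(.+)\.(\w+)\z/)
  return [path, nil] unless match && RECOGNIZED_FORMATS.include?(match[2])

  [match[1], match[2]]
end

split_format("/wiki_pages.json")    # => ["/wiki_pages", "json"]
split_format("/wiki_pages/foo.bar") # => ["/wiki_pages/foo.bar", nil]
```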
2022-01-22 16:52:20 -06:00
evazion
bd7018a3ae rails: update cache format version to 7.0.
2022-01-10 11:39:09 -06:00
evazion
87dfc66073 rails: update framework files and settings to 7.0.
* Update framework files with `bin/rails app:update`.
* Update to use new Rails 7.0 default settings, except for a couple
  things regarding new cookie and cache formats that would prevent us
  from rolling back to Rails 6.1 if necessary.
2022-01-07 21:10:55 -06:00
evazion
82211ba935 jobs: add ability to search jobs on /jobs page.
Add ability to search jobs on the /jobs page by job type or by status.

Fixes #2577 (Search filters for delayed jobs). This wasn't possible
before with DelayedJobs because it stored the job data in a YAML string,
which made it difficult to search jobs by type. GoodJobs stores job data
in a JSON object, which is easier to search in Postgres.
2022-01-04 17:18:36 -06:00
evazion
3841fba78e jobs: remove DelayedJobs.
Remove the DelayedJobs gem and database table. Completes the transition
to GoodJob started in c06bfa64f and f4953549a.

Downstream users can upgrade as follows:

* Stop the Rails server.
* Stop the DelayedJobs worker (normally running as `bin/delayed_job` or `bin/rails jobs:work`).
* Run `bin/rails jobs:work` to finish any pending delayed jobs.
* Run `bin/rails db:migrate` to create the good_jobs table and drop the delayed_jobs table.
* Start the Rails server again.
* Start the GoodJobs worker with `bin/good_job start`.
2022-01-04 15:58:12 -06:00
evazion
f4953549ae jobs: switch from DelayedJob to GoodJob.
Switch the ActiveJob backend from DelayedJob to GoodJob. Differences:

* The job worker is run with `bin/good_job start` instead of `bin/delayed_job`.
* Jobs have an 8 hour timeout instead of a 4 hour timeout.
* Jobs don't automatically retry on failure.
* Finished jobs are preserved and pruned after 7 days.
2022-01-04 13:52:08 -06:00
evazion
67b96135dd Make Symbol#to_s return frozen string.
Monkey-patch Symbol#to_s to return a frozen (immutable) string instead
of a mutable string.

This should reduce string allocations, and thereby reduce memory usage
and garbage collector pressure, but it may be incompatible with
libraries that expect Symbol#to_s to return a mutable string.

https://bugs.ruby-lang.org/issues/16150
https://github.com/Shopify/symbol-fstring
2021-12-14 21:33:27 -06:00
evazion
6fc0854b4c Remove StorageManager::SFTP.
Remove the SFTP file storage backend. Downstream users can use either
sshfs (which is what Danbooru now uses in production) or rclone instead.
The Ruby SFTP gem was much slower than sshfs.
2021-12-01 23:46:20 -06:00
evazion
3f9a85a828 Rails: send logs to stderr by default, not stdout.
Send all logs to stderr by default instead of stdout. Fixes a problem
where parsing the output of sandboxed commands could fail, because they
could contain Rails log messages in their stdout.

When we run a command in a sandbox, we call fork+exec to run the command
in the background so we can capture its output. If Rails prints
anything to stdout between the fork and exec calls, then it will be
inadvertently captured along with the command's output. This will break
parsing of the command's output. This can happen if warning messages are
printed by Rails while setting up the sandbox between the fork and exec
calls.

Writing to stderr is also more correct, since stdout is buffered by
default, which means logs could potentially be lost if the process dies
unexpectedly before the buffers are flushed. Stderr is unbuffered by
default, which means logs will always be output immediately.
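
A small illustration of why keeping logs on stderr protects parsed output. It uses Ruby's standard `Open3.capture3` rather than the sandbox code described above, and `run_child` is a hypothetical helper.

```ruby
require "open3"
require "rbconfig"

# Runs a Ruby one-liner in a child process and captures stdout and stderr
# separately. Returns [stdout, stderr, status].
def run_child(script)
  Open3.capture3(RbConfig.ruby, "-e", script)
end

# The child writes its result to stdout and a log-style message to stderr.
out, err, status = run_child('$stdout.puts "42"; $stderr.puts "warning: noisy log line"')

# Only the result is on stdout, so parsing it is safe.
result = Integer(out)
```

Had the warning gone to stdout instead, `Integer(out)` would raise because the captured text would no longer be a bare number.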
2021-11-11 09:20:57 -06:00
evazion
a2a4ab887d newrelic: insert browser timing header manually.
Insert the <script> tag that monitors browser timing into the <head>
manually. This is to avoid this error:

    Skipping RUM instrumentation. Unable to find <body> tag in first 50000 bytes of document.

See also https://docs.newrelic.com/docs/agents/ruby-agent/features/new-relic-browser-ruby-agent/#manual_instrumentation
2021-09-27 00:46:13 -05:00
evazion
1f77e6980a puma: disable request timeout in development.
2021-09-26 23:11:12 -05:00
evazion
ae7d964bf1 MediaFile: replace APNGInspector with ExifTool.
Replace our own handwritten APNG parser with ExifTool. This makes
ExifTool a hard requirement for handling APNGs.
2021-09-21 07:47:45 -05:00
evazion
5995571885 clockwork: add heartbeat task.
Add a cron job that touches a file every minute so we can be sure
clockwork (the cronjob daemon) is still running.
2021-09-20 01:32:12 -05:00
evazion
4cc8dd41ec puma: add rack-timeout gem.
Unlike Unicorn, Puma doesn't have a builtin HTTP request timeout
mechanism, so we have to use Rack::Timeout instead.

See the caveats in the Rack::Timeout documentation [1]. In Unicorn, a
timeout would send a SIGKILL to the worker, immediately killing it. This
would result in a dropped connection and a Cloudflare 502 error to the
user. In Puma, it raises an exception, which we can catch and return a
better error to the user. On the other hand, raising an exception can
potentially corrupt application state if it's sent at the wrong time, or
be delayed indefinitely if the app is stuck in IO or C extension code.

The default request timeout is 65 seconds. 65 seconds is to give things
like HTTP requests on a 60 second timeout enough time to complete. Set
the RACK_REQUEST_TIMEOUT environment variable to change the timeout.
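
The "raise an exception, then return a better error" pattern can be sketched with Ruby's standard `Timeout` module. This is not Rack::Timeout itself; the helper names and the 503 response are assumptions made for illustration. Only the 65 second default and the `RACK_REQUEST_TIMEOUT` variable come from the text above.

```ruby
require "timeout"

DEFAULT_REQUEST_TIMEOUT = 65 # seconds

# Reads the timeout from the environment, falling back to the default.
def request_timeout(env = ENV)
  Integer(env.fetch("RACK_REQUEST_TIMEOUT", DEFAULT_REQUEST_TIMEOUT))
end

# Runs the block; if it exceeds the limit, the exception is caught and a
# friendlier response is returned instead of a dropped connection.
def handle_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  [503, "Request timed out"] # hypothetical error response
end
```

The same caveat applies to this sketch as to the real thing: the exception interrupts the block wherever it happens to be, so state touched by the block may be left half-updated.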

1: https://github.com/sharpstone/rack-timeout#further-documentation
2021-09-12 09:32:12 -05:00
evazion
540a3e111a Replace streamio-ffmpeg library.
Replace the streamio-ffmpeg library with our own very thin FFmpeg wrapper.
2021-09-05 06:54:56 -05:00
evazion
ef28576673 Fix #3400: Smarter thumbnail generation for videos
2021-09-05 06:10:18 -05:00
evazion
8f24e789b6 newrelic: fix crash during bootup caused by Rails.logger.
Using `Rails.logger` here causes server boot to fail with an `Undefined
method 'tagged'` error, possibly because `Rails.logger` isn't ready yet
during early initialization.
2021-08-15 02:16:57 -05:00
evazion
0563ca3001 docs: document config/ and some directories in app/.
* Add README files to several directories in app/ giving a brief
  overview of some parts of Danbooru's architecture.
* Add documentation for files in config/.
2021-06-27 05:21:38 -05:00
evazion
f65f24be0b docker: add cron service to compose file.
2021-05-25 01:16:59 -05:00
evazion
4439293bf1 newrelic: fix newrelic starting without license key.
Fix an issue where the New Relic agent always started in the production
environment, even when a license key wasn't configured.

Also make the New Relic agent log to stdout instead of log/newrelic_agent.log.
2021-05-24 21:58:01 -05:00
evazion
a062c040cb saved searches: fail gracefully when Redis is disabled.
Just make saved searches return nothing when Redis is disabled.
2021-03-30 05:35:42 -05:00
evazion
803efe8501 Don't use secure cookies on non-HTTPS deployments.
Fixes not being able to log in or sign up when running in production mode
on a non-HTTPS site.
2021-03-30 03:58:34 -05:00
evazion
12436c4aa9 Fix IpAddressType autoload warning.
Fix Rails complaining about IpAddressType not being reloaded by hot
reloading:

    DEPRECATION WARNING: Initialization autoloaded the constant IpAddressType.

    Being able to do this is deprecated. Autoloading during initialization is going
    to be an error condition in future versions of Rails.

    Reloading does not reboot the application, and therefore code executed during
    initialization does not run again. So, if you reload IpAddressType, for example,
    the expected changes won't be reflected in that stale Class object.

    This autoloaded constant has been unloaded.

    In order to autoload safely at boot time, please wrap your code in a reloader
    callback this way:

        Rails.application.reloader.to_prepare do
        # Autoload classes and modules needed at boot time here.
        end

    That block runs when the application boots, and every time there is a reload.
    For historical reasons, it may run twice, so it has to be idempotent.

    Check the "Autoloading and Reloading Constants" guide to learn more about how
    Rails autoloads and reloads.
2021-03-29 03:01:02 -05:00
evazion
b79bd8407f Remove FalseClass#to_i core extension.
Remove a monkey patch that added a `to_i` method to `FalseClass` so that
`false.to_i` returned 0. This is legacy code that shouldn't still be in
use anywhere. It doesn't really work anyway, because `true.to_i` isn't
defined.
2021-03-11 17:34:05 -06:00
evazion
35a0c6b11f Fix #4736: Display network prefix length (if present) in API key IP whitelist.
2021-03-01 02:38:18 -06:00
evazion
cde76e66f6 forms: fix form validation error messages.
* Fix it so that all edit forms show an error banner if the form
  has validation errors. Previously forms had to manually call
  `error_messages_for`, which not all forms did.

* Fix it so that the full validation error message is shown next to each
  input attribute that had errors. Also update the styling of these
  error messages to look better.
2021-02-22 02:38:26 -06:00
evazion
d18dc573fb artists: fix misnormalization of emoji in other names.
Fix `normalize_whitespace` to not strip zero-width joiner characters
(U+200D). These characters are used in emoji and stripping them breaks
some artist other names that use emoji.
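
A sketch of the distinction. The exact set of characters the real `normalize_whitespace` strips is not stated here, so the character class below is an assumption, and `strip_zero_width_spaces` is a hypothetical name.

```ruby
# Strips a few invisible characters (zero-width space U+200B, word joiner
# U+2060, BOM U+FEFF) but deliberately leaves the zero-width joiner U+200D
# alone, because emoji sequences depend on it.
def strip_zero_width_spaces(text)
  text.gsub(/[\u200B\u2060\uFEFF]/, "")
end

technologist = "\u{1F469}\u200D\u{1F4BB}" # woman + ZWJ + laptop, one emoji
strip_zero_width_spaces(technologist) == technologist # => true
strip_zero_width_spaces("a\u200Bb")                   # => "ab"
```

Removing U+200D from the first string would split it into two separate emoji, which is the breakage this commit fixes.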
2021-01-10 02:46:20 -06:00
evazion
7762489d7d user upgrades: upgrade to new Stripe checkout system.
This upgrades from the legacy version of Stripe's checkout system to the
new version:

> The legacy version of Checkout presented customers with a modal dialog
> that collected card information, and returned a token or a source to
> your website. In contrast, the new version of Checkout is a smart
> payment page hosted by Stripe that creates payments or subscriptions. It
> supports Apple Pay, Dynamic 3D Secure, and many other features.

Basic overview of the new system:

* We send the user to a checkout page on Stripe.
* Stripe collects payment and sends us a webhook notification when the
  order is complete.
* We receive the webhook notification and upgrade the user.

Docs:

* https://stripe.com/docs/payments/checkout
* https://stripe.com/docs/payments/checkout/migration#client-products
* https://stripe.com/docs/payments/handling-payment-events
* https://stripe.com/docs/payments/checkout/fulfill-orders
2020-12-24 19:58:29 -06:00
evazion
f3880569e1 rails: update settings to 6.1 defaults.
Most of the new settings aren't relevant to us. We do have to fix some
tests to work around a Rails bug. `assert_enqueued_email_with` uses the
wrong queue, so we have to specify it explicitly. This is fixed in Rails
HEAD but not yet released.
2020-12-21 22:42:50 -06:00
evazion
906430b983 config: add option for customizing session cookie name.
Fixes getting logged out when you visited Testbooru because of
Testbooru's session cookies clobbering Danbooru's session cookies.
2020-12-21 22:42:50 -06:00
evazion
efb836ac02 wikis: normalize Unicode characters in wiki bodies.
* Introduce an abstraction for normalizing attributes. Very loosely
  modeled after https://github.com/fnando/normalize_attributes.
* Normalize wiki bodies to Unicode NFC form.
* Normalize Unicode space characters in wiki bodies (strip zero width
  spaces, normalize line endings to CRLF, normalize Unicode spaces to
  ASCII spaces).
* Trim spaces from the start and end of wiki page bodies. This may cause
  wiki page diffs to show spaces being removed even when the user didn't
  explicitly remove the spaces themselves.
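
Roughly, the normalization steps listed above could look like the following. This is a sketch, not the abstraction introduced by this commit: `normalize_wiki_body` is a hypothetical name and the set of Unicode space characters is an assumption.

```ruby
def normalize_wiki_body(body)
  body
    .unicode_normalize(:nfc)                      # compose characters (e + U+0301 -> é)
    .delete("\u200B")                             # strip zero-width spaces
    .gsub(/[\u00A0\u2000-\u200A\u3000]/, " ")     # Unicode spaces -> ASCII space
    .gsub(/\r?\n/, "\r\n")                        # line endings -> CRLF
    .strip                                        # trim start and end
end

normalize_wiki_body("  cafe\u0301\u3000au lait\n") # => "café au lait"
```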
2020-12-21 20:47:50 -06:00
evazion
6849a3d68b Update app files to Rails 6.1 defaults.
2020-12-19 00:26:27 -06:00
evazion
2c1da660fd tags: allow tag abbreviations in searches and during tagging.
Expand the tag abbreviation system introduced in b0be8ae45 so that it
works in searches and when tagging posts, not just in autocomplete.

For example, you can tag a post with /evth and it will add the tag
eyebrows_visible_through_hair. You can search for /evth and it will
search for the tag eyebrows_visible_through_hair.

Some more examples:

* /ops is short for one-piece_swimsuit
* /hooe is short for hair_over_one_eye
* /sool is short for standing_on_one_leg
* /tlozbotw is short for the_legend_of_zelda:_breath_of_the_wild

If two tags have the same abbreviation, then the larger tag takes
precedence. For example, /be is short for blue_eyes, not brown_eyes,
because blue_eyes is the bigger tag.

If there is an existing shortcut alias that conflicts with the
abbreviation, then the alias takes precedence. For example, /sh is short
for suzumiya_haruhi, not short_hair, because there's an old alias for
/sh -> suzumiya_haruhi.
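
The precedence rules can be sketched as below. The splitting rule (first character of each word, splitting on `_` and `-`) is inferred from the examples above, and the helper names and post counts are hypothetical.

```ruby
# First character of each word, where words are separated by "_" or "-".
def abbreviation(tag)
  tag.split(/[_-]/).reject(&:empty?).map { |word| word[0] }.join
end

# Resolves an abbreviation to a tag name. Existing aliases win; otherwise
# the matching tag with the highest post count wins.
def expand_abbreviation(abbrev, post_counts, aliases = {})
  return aliases[abbrev] if aliases.key?(abbrev)

  candidates = post_counts.select { |tag, _count| abbreviation(tag) == abbrev }
  candidates.max_by { |_tag, count| count }&.first
end

# Hypothetical post counts, for illustration only.
counts = { "blue_eyes" => 1000, "brown_eyes" => 500, "short_hair" => 2000 }

expand_abbreviation("be", counts)                              # => "blue_eyes"
expand_abbreviation("sh", counts, "sh" => "suzumiya_haruhi")   # => "suzumiya_haruhi"
```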
2020-12-17 23:57:13 -06:00
evazion
3a3d456bd2 html: standardize font sizes and heading tags.
Standardize font sizes and heading tags (<h1>-<h6>) to be more
consistent across the site.

Changes:

* Introduce font size CSS variables and start replacing hardcoded font
  sizes with standard sizes.
* Change header tags to use only one <h1> per page. One <h1> per page is
  recommended for SEO purposes. Usually this is for the page title, like
  in forum threads or wiki pages.
* Standardize on <h2> for section headers in sidebars and <h3> for
  smaller subsection headers. Don't use <h4>-<h6>.
* In DText, make h1-h4 headers all the same size. Standard wiki style is
  to ignore h1-h3 and start at h4.
* In DText, make h4-h6 the same size as the h1-h3 tags outside of DText.
* In the tag list, change the <h1> and <h2> tag category headers to <h3>.
* Make usernames in comments and forum posts smaller. Also change the
  <h4> tag for the commenter name to <div class="author-name">.
* Make the tag list, paginator, and nav menu smaller on mobile.
* Change h1#app-name-header to a#app-name-header.
2020-07-23 17:34:17 -05:00
evazion
42f0112c38 seo: increase sitemap coverage.
Rework sitemaps to provide more coverage of the site. We want every
important page on the site - including every post, tag, and wiki page -
to be indexed by Google. We do this by generating sitemaps and sitemap
indexes that contain links to every important page on the site.
2020-07-10 00:18:30 -05:00
evazion
048bc7faf5 Revert "simple form: enable HTML5 maxlength validations."
This incorrectly added maxlength=4 to parent id fields.

This reverts commit 5e23861bea.
2020-06-30 13:03:51 -05:00
evazion
804a2ef9a5 Fix #4535: Comment edit forms contain duplicate IDs.
Prefix comment form IDs with `post_<id>_comment_<id>` to ensure
uniqueness.
2020-06-26 14:57:19 -05:00
evazion
5e23861bea simple form: enable HTML5 maxlength validations.
Makes it so that models that have maximum length validations will add
maxlength attributes to form fields. This includes flag reasons, appeal
reasons, and forum topic titles.

Partially fixes #4519 (Add "n/m characters remaining" character counter to the appeal reason).

https://developer.mozilla.org/en-US/docs/Web/HTML/Attributes/maxlength
2020-06-25 16:28:18 -05:00
evazion
883856d4af simple form: refactor DText form fields to use SimpleForm.
* Refactors DText form fields to use a custom SimpleForm input instead
  of manually generated html. This fixes it so that DText fields use the
  same markup as normal SimpleForm fields, which lets us apply browser
  maxlength validations to DText input fields.

* Fixes autocomplete for @-mentions only working in comments and forum posts.
  Now @-mention autocomplete works in all DText fields, including dmails.
  Known bug: it applies in artist commentary fields when it shouldn't.
2020-06-25 16:28:09 -05:00