Commit Graph

56 Commits

Author SHA1 Message Date
evazion
a7dc05ce63 Enable frozen string literals.
Make all string literals immutable by default.
2021-12-14 21:33:27 -06:00
evazion
26fe3e26e0 tests: silence output from pg_amcheck.
Fix pg_amcheck flooding the test suite output in Github.
2021-10-27 04:59:21 -05:00
evazion
6d2ce5c8c1 saved searches: lower job priority.
Lower the priority of the populate saved search job. This is so that
large numbers of saved searches don't overwhelm the job queue and
prevent higher priority jobs from running.
2021-10-24 22:56:59 -05:00
evazion
8d5e0a5b58 replacements: don't delete replaced files.
Don't delete replaced files after 30 days. There are only about 30k
replacements in total, so the cost of keeping replaced files is
negligible. It was also wrong because the media asset wasn't destroyed
too, so there were active media assets with missing files.
2021-10-24 04:35:13 -05:00
evazion
5e8c91700c tests: fix amcheck job tests. 2021-10-13 04:19:44 -05:00
evazion
1653392361 posts: stop updating fav_string attribute.
Stop updating the fav_string attribute on posts. The column still exists
on the table, but is no longer used or updated.

Like the pool_string in 7d503f08, the fav_string was used in the past to
facilitate `fav:X` searches. Posts had a hidden fav_string column that
contained a list of every user who favorited the post. These were
treated like fake hidden tags on the post so that a search for `fav:X`
was treated like a tag search.

The fav_string attribute has been unused for search purposes for a while
now. It was only kept because of technicalities that required
departitioning the favorites table first (340e1008e) before it could be
removed. Basically, removing favorites with `@favorite.destroy` was
slow because Rails always deletes object by ID, but we didn't have an
index on favorites.id, and we couldn't easily add one until the
favorites table was departitioned.

Fixes #4652. See https://github.com/danbooru/danbooru/issues/4652#issuecomment-754993802
for more discussion of issues caused by the fav_string (in short: write
amplification, post table bloat, and favorite inconsistency problems).
2021-10-09 22:36:26 -05:00
evazion
950bc608c2 maintenance: add job to check for database corruption.
Add a job to run pg_amcheck hourly to check for corrupt database indexes.

https://www.postgresql.org/docs/14/app-pgamcheck.html
2021-10-06 08:08:52 -05:00
evazion
ec9e844ab3 jobs: disable timeouts by default for all jobs.
Fix a regression in 52bf4a3a6 that caused certain jobs to timeout.
2021-09-27 09:17:50 -05:00
evazion
52bf4a3a6b maintenance: break maintenance tasks into individual jobs.
Break the hourly/daily/weekly/monthly maintenance tasks down into
individual delayed jobs. This way if one task fails, it won't prevent
other tasks from running. Also, jobs can be run in parallel, and can be
individually retried if they fail.
2021-09-26 20:38:30 -05:00
evazion
c6bf3e7934 BURs: don't automatically retry failed BURs.
If a bulk update job fails, don't automatically retry it. Retrying it
will clobber the original error message if it fails again.
2021-09-20 16:33:23 -05:00
evazion
9ba84efc07 BURs: process BURs sequentially in a single job.
Change the way BURs are processed. Before, we spawned a background job
for each line of the BUR, then processed each job sequentially. Now, we
process the entire BUR sequentially in a single background job.

This means that:

* BURs are truly sequential now. Before certain things like removing
  aliases weren't actually performed in a background job, so they were
  performed out-of-order before everything else in the BUR.

* Before, if an alias or implication line failed, then subsequent alias
  or implication lines would still be processed. This was because each
  alias or implication line was queued as a separate job, so a failure
  of one job didn't block another. Now, if any alias or implication
  fails, the entire BUR will fail and stop processing after that line.
  This may be good or bad, depending on whether we actually need the BUR
  to be processed in order or not.

* Before, BURs were processed inside a database transaction (except for the
  actual updating of posts). Now they're not. This is because we can't
  afford to hold transactions open while processing long-running aliases
  or implications. This means that if BUR fails in the middle when it is
  initially approved, it will be left in a half-complete state. Before
  it would be rolled back and left in a pending state with no changes
  performed.

* Before, only one BUR at a time could be processed. If multiple BURs
  were approved at the same time, then they would queue up and be
  processed one at a time. Now, multiple BURs can be processed at the
  same time. This may be undesirable when processing large BURs, or BURs
  that must be approved in a specific order.

* Before, large tag category changes could time out. This was because
  they weren't actually performed in a background job. Now they are, so
  they shouldn't time out.
2021-09-20 01:12:14 -05:00
evazion
ad4c75eb1a docs add more docs to app/{jobs,logical}.
These were missed in the last commit.
2021-06-28 05:09:19 -05:00
evazion
0563ca3001 docs: document config/ and some directories in app/.
* Add README files to several directories in app/ giving a brief
  overview of some parts of Danbooru's architecture.
* Add documentation for files in config/.
2021-06-27 05:21:38 -05:00
evazion
00ca7526bb docs: add remaining docs for classes in app/logical. 2021-06-24 01:31:41 -05:00
evazion
e5cfb7904c CurrentUser: remove #as method.
Replace with CurrentUser#scoped.
2021-06-22 23:39:30 -05:00
evazion
07e23204b6 rubocop: fix various Rubocop warnings. 2021-06-17 04:17:53 -05:00
evazion
0f36bbf8d3 iqdb: update API client to use new version of IQDB.
Replace the old IQDB API client with a new client for the new forked
version of IQDB at https://github.com/danbooru/iqdb.

Changes:

* The /iqdb_queries endpoint now returns `hash` and `signature` fields.
  The `signature` is the full decoded Haar signature, while the `hash`
  is a encoded version of the signature.
* The /iqdb_queries endpoint no longer returns `width` and `height`
  fields in the response (these were always 128x128).
* We no longer need the IQDBs frontend server, now we talk to the IQDB
  instance directly.
* We no longer send add/remove image commands to IQDB through AWS SQS,
  now we send them to IQDB directly. They are sent in a delayed job so
  that if IQDB is down, uploading images is still possible, the add
  image commands will just get queued up.
* Fix a bug where regenerating an image's thumbnails didn't regenerate
  IQDB, because IQDB silently ignored add image commands when the image
  already existed in the database.
2021-06-16 05:36:24 -05:00
evazion
f235b72b3f Export public database dumps to BigQuery.
* Export daily public database dumps to BigQuery and Google Cloud Storage.
* Only data visible to anonymous users is exported. Some tables have
  null or missing fields because of this.
* The bans table is excluded because some bans have an expires_at
  timestamp set beyond year 9999, which BigQuery doesn't support.
* The favorites table is excluded because it's too slow to dump (it
  doesn't have an id index, which is needed by find_each).
* Version tables are excluded because dumping them every day is
  inefficient, streaming insertions should be used instead.

Links:

* https://console.cloud.google.com/bigquery?project=danbooru1
* https://console.cloud.google.com/storage/browser/danbooru_public
* https://storage.googleapis.com/danbooru_public/data/posts.json
2021-03-10 02:52:16 -06:00
evazion
b63d8207a9 forum: automatically post new forum posts to Discord. 2021-02-18 07:08:45 -06:00
evazion
b6f9c9a866 post regenerations: regenerate posts asynchronously.
Regenerate posts asynchronously using a delayed job.

Regenerating a post can be slow because it involves downloading the
original file, regenerating the thumbnails, and redistributing the new
thumbnails back to the image servers. It's better to run this in the
background, especially if a user is trying to regenerate posts in bulk.

The downside is there's no notification to the user when the regeneration
is complete. You have to check the modactions log to see when it's finished.
2021-01-04 21:43:27 -06:00
evazion
9e37f5a588 BURs: don't log mod actions for aliases/implications/mass updates.
Don't log mod actions when aliases, implications, or mass updates are
processed.

Originally aliases and implications were logged because they could be
approved outside of a BUR. Mass updates could also be performed by mods
without making a forum request. This is no longer the case.

They were also logged for debugging purposes. This is no longer needed.
This generated a lot of spam in the mod action logs when a large BUR was
approved.
2020-12-02 12:20:28 -06:00
evazion
4741a52cc4 aliases/implications: remove 'error' state.
Remove the error status from aliases and implications. Aliases and
implications normally shouldn't fail because they're validated
beforehand. If they do, just let the delayed job itself record the
failure.

Also disable the delayed job from retrying if the alias/implication
somehow fails.
2020-12-01 18:58:45 -06:00
evazion
8717c319ab aliases/implications: remove 'pending' state.
Remove the pending status from tag aliases and implications.

Previously aliases would be created first in the pending state then
changed to active when the alias was later processed in a delayed job.
This meant that BURs weren't processed completely sequentially; first
all the aliases in a BUR would be created in one go, then later they
would be processed and set to active sequentially.

This was problematic in complex BURs that tried to reverse or swap
around aliases, since new pending aliases could be created before old
conflicting aliases were removed.
2020-12-01 18:58:45 -06:00
evazion
9a287cd71f Fix #4483: Wrong order for BUR caused 12k mistags.
Bug: if a BUR contained a mass update followed by an alias, then the
alias would become active before the mass update, which could cause
the mass update to return incorrect results if both the alias and mass
update touched the same tags.

This happened because all aliases and implications in the BUR were set
to a queued state before the mass update was processed, but putting an
alias in the queued state effectively made it active.

The fix is to remove the queued state. This was only used anyway as a
debugging tool anyway to monitor the state of BURs as they were being
processed.
2020-11-12 16:09:56 -06:00
evazion
23944a1794 Fix #4491: Have tag rename option for bulk update requests.
* Add a `rename A -> B` command for bulk update requests.
* Change mass updates to only retag the posts, not to move saved
  searches or blacklists.

A tag rename does the same thing an alias does, except it doesn't
create a permanent alias. More precisely, a tag rename:

* Moves the wiki.
* Moves the artist entry.
* Moves saved searches.
* Moves blacklists.
* Merges the wikis, if both tags have wiki pages.
* Merges the artist entries, if both tags have artist pages.
* Fixes links in wiki pages to point to the new tag.
* Retags the posts.
2020-08-26 19:53:04 -05:00
evazion
67aab0236d search: apply aliases after parsing searches.
Make PostQueryBuilder apply aliases earlier, immediately after parsing
the search.

On the post index page there are multiple places where we need to apply
aliases:

* When running the search with PostQueryBuilder#build.
* When calculating the search count with PostQueryBuilder#fast_count.
* When calculating the related tags for the sidebar.
* When tracking missed searches and popular searches for Reportbooru.
* When looking up wiki excerpts.

Applying aliases after parsing ensures we only have to apply aliases
once for all of these things.

We also normalize the order of tags in searches and strip repeated tags.
This is so that we have consistent cache keys for fast_count.

* Fixes searches for aliased tags being counted as missed searches (fixes #4433).
* Fixes wiki excerpts not showing up when searching for aliased tags.
2020-05-07 13:53:35 -05:00
evazion
f38c38f26e search: split tag_match into user_tag_match / system_tag_match.
When doing a tag search, we have to be careful about which user we're
running the search as because the results depend on the current user.
Specifically, things like private favorites, private favorite groups,
post votes, saved searches, and flagger names depend on the user's
permissions, and whether non-safe or deleted posts are filtered out
depend on whether the user has safe mode on or the hide deleted posts
setting enabled.

* Refactor internal searches to explicitly state whether they're
  running as the system user (DanbooruBot) or as the current user.
* Explicitly pass in the current user to PostQueryBuilder instead of
  implicitly relying on the CurrentUser global.
* Get rid of CurrentUser.admin_mode? (used to ignore the hide deleted
  post setting) and CurrentUser.without_safe_mode (used to ignore safe
  mode).
* Change the /counts/posts.json endpoint to ignore safe mode and the
  hide deleted posts settings when counting posts.
* Fix searches not correctly overriding the hide deleted posts setting
  when multiple status: metatags were used (e.g. `status:banned status:active`)
* Fix fast_count not respecting the hide deleted posts setting when the
  status:banned metatag was used.
2020-05-07 03:29:44 -05:00
evazion
3dab648d0e search: refactor PostQueryBuilder class methods into instance methods.
* Make scan_query, parse_query, normalize_query into instance methods
  instead of class methods. This is to a) clean up the API and b)
  prepare for moving certain tag utility methods into PostQueryBuilder.

* Fix a few cases where a caller used scan_query when they should have
  used split_query or parse_tag_edit.
2020-04-22 19:38:17 -05:00
evazion
7726563733 search: refactor scan_query callers to use split_query.
Refactor to use split_query instead of scan_query to split a query on
spaces. Preparation for refactoring scan_query into something smarter.
2020-04-19 02:54:44 -05:00
evazion
ab1839c613 uploads: fix exception with preprocessed uploads.
Fix exception when submitting an upload and an in-progress preprocessed
upload already exists. In this case we forgot to pass the upload params
when calling UploadService#delayed_start.
2020-03-31 21:57:34 -05:00
evazion
08ce5a71c4 Eliminate various dead code. 2020-03-31 21:57:34 -05:00
evazion
0e7632ed8a aliases/implications: remove forum topic updating code.
Remove code for updating forum topics when an alias or implication is
approved or rejected. This code was only used when approving single
alias or implication requests. This is no longer used now that all
alias/implication requests are done through BURs.
2020-03-10 20:55:20 -05:00
evazion
967d398c8e search: move query parsing code from tag model to post query builder. 2020-03-06 23:23:38 -06:00
evazion
5817af4014 burs/show: remove BUR update count estimate.
Remove the post update count estimate from BUR show pages. This was
complex, slow, and usually inaccurate since it assumed that requests in
a BUR had no overlap with each other, which usually wasn't the case.
2020-02-16 19:21:56 -06:00
evazion
22cb0ea322 models: replace raw LIKE queries with where_like. 2020-01-22 13:21:31 -06:00
evazion
153a8339ab Inherit errors from StandardError instead of Exception. 2020-01-11 19:07:28 -06:00
evazion
3bcc503cf7 BURs: process bulk updates in dedicated job queue.
Add a dedicated queue for bulk update requests and process it using a
single worker. This prevents bulk updates from consuming all available
workers and preventing other job types from running.

This also effectively serializes bulk updates so that they're processed
one-at-a-time instead of in parallel. This will be slower overall but
may avoid some of the issues with indeterminate update order under
parallel updates.
2019-12-27 12:24:51 -06:00
evazion
06e12973e2 BURs: lock posts when updating tags during bulk updates.
Fixes updates sometimes getting clobbered when multiple aliases or mass
updates tried to update the same post at the same time.
2019-12-27 12:11:42 -06:00
evazion
309821bf73 rubocop: fix various style issues. 2019-12-22 21:23:37 -06:00
evazion
2c6567b5d2 Remove uses of the read replica database.
https://danbooru.donmai.us/forum_topics/9127?page=283#forum_post_160508

There was a recent outage that was caused by the read replica
(yukinoshita.donmai.us) being temporarily unavailable. The pg driver in
rails got hardstuck trying to connect to the replica, which brought down
the whole site. The app servers stopped responding and could only be
brought down with SIGKILL. Even try to boot the rails console didn't
work.

We only really used this to calculate tag counts inside Post.fast_count,
which wasn't really beneficial since the read replica is slower than the
main database.
2019-10-22 12:15:46 -05:00
evazion
f2dccf8cf1 Remove mod-only bulk revert system (#4178).
The mass undo system added in #4178 is a replacement for the mod-only
bulk revert system.
2019-09-27 21:12:53 -05:00
evazion
bc34fb16a4 tags: automatically fix incorrect tag counts during maintenance.
* Automatically fix all tags with incorrect counts during daily
  maintenance (previously only tags with negative counts were fixed).
* Log fixed tags to NewRelic.
* Remove the ability to manually fix tag counts with the "Fix" button on
  the /tags listing. This is no longer necessary now that tags are
  fixed automatically.
2019-09-25 17:57:11 -05:00
evazion
0a6661d145 uploads: switch to active job.
* Switch upload processing from DelayedJob to ActiveJob.
* Remove remaining references to delayed job from tests.

Closes #4128.
2019-09-23 15:11:18 -05:00
evazion
763ac1a7e0 pools: stop maintaining pool category pseudotags in pool strings (#4160)
Stop maintaining pool category pseudo tags (pool:series, pool:collection)
in pool strings. They're no longer used and the changes to the
`Post#pools` method in dc4d2e54b caused issues with this.

Also allow Members to change the category of large pools again. This was
only restricted because maintaining these pseudotags forced us to update
every post in the pool whenever a pool's category was changed.
2019-09-08 23:28:02 -05:00
evazion
a932b25608 Fix #4142: Missing images after upload. 2019-09-01 13:10:37 -05:00
evazion
6dd331745a Rewrite related tags implementation.
Rewrite the implementation of related tags to be simpler, faster, and
more accurate:

* The related tags are now calculated by taking a random sample of 1000
  posts, finding the top 250 most frequent tags among those posts, then
  ordering those tags by cosine similarity.

* Related tags can generally be calculated in 50-300ms at these sample
  sizes. Very high sample sizes (25000+ posts) are still relatively fast
  (1-3 seconds), but generally they don't improve accuracy much.

* Related tags are now cached in redis rather than in the tags table.
  The related_tags column in the tags table is no longer used.

* Only the related tags in the search taglist are cached. The related
  tags returned by the 'Related tags' button are not cached.

* The cache lifetime is a fixed 4 hours.

* The 'Related tags' button now works with metatags.

* The /related_tag page now works with metatags and multitag searches.

Fixes #4134, #4146.
2019-08-30 20:03:36 -05:00
evazion
dab43d96c9 jobs: migrate mass updates to ActiveJob.
Also fixes a bug where mod actions weren't logged on mass updates.
Creating the mod action silently failed because it was called when
CurrentUser wasn' set.
2019-08-19 00:46:31 -05:00
evazion
48488d04db jobs: migrate bulk reverts to ActiveJob. 2019-08-19 00:46:30 -05:00
evazion
ddcc7eddfb jobs: migrate fixing post counts to ActiveJob. 2019-08-19 00:46:30 -05:00
evazion
868a2256d1 jobs: migrate file deletion jobs to ActiveJob. 2019-08-16 20:49:35 -05:00