Mark old columns as ignored in preparation for dropping them. Make the
rating and tag_string nullable so they don't have to be set when
creating uploads and can be ignored too.
Add a join table that allows multiple media assets (images or videos) to
be attached to uploads. This is for a future ability to upload multiple
files at once.
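A minimal sketch of what the join table might look like (the table and column names here are assumptions, not the final schema):

    create_table :upload_media_assets do |t|
      t.timestamps
      t.references :upload, null: false       # the upload the file belongs to
      t.references :media_asset, null: false  # the image or video file itself
    end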
* Add ability to mark moderation reports as 'handled' or 'rejected'.
* Automatically mark reports as handled when the comment or forum post
is deleted.
* Send a dmail to the reporter when their report is handled.
* Don't show the report notice on comments or forum posts when all
reports against it have been handled or rejected.
* Add a fix script to mark all existing reports for deleted comments,
forum posts, or dmails as handled.
Add foreign key constraints on all foreign keys on all tables.
These constraints are deferrable so that they're checked at the end of
the transaction, rather than at the end of the statement. This is to reduce
lock duration and to allow for cyclic relationships.
Constraints are added in one migration then validated in another so that
the entire table isn't locked against reads and writes while the foreign
key constraints are being validated.
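In Postgres terms the two-step pattern looks roughly like this (the constraint, table, and column names are illustrative):

    # Migration 1: add the constraint without validating existing rows.
    # This only takes a brief lock on the table.
    execute <<~SQL
      ALTER TABLE comments
        ADD CONSTRAINT fk_comments_creator_id
        FOREIGN KEY (creator_id) REFERENCES users (id)
        DEFERRABLE INITIALLY DEFERRED
        NOT VALID
    SQL

    # Migration 2: validate existing rows. VALIDATE CONSTRAINT only takes a
    # weak lock, so reads and writes can continue while it scans the table.
    execute "ALTER TABLE comments VALIDATE CONSTRAINT fk_comments_creator_id"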
A few tables had invalid foreign keys. Add a fix script to fix these tables:
* A couple artist versions belonged to deleted artists.
* One dmail belonged to a deleted user.
* One forum topic visit belonged to that same deleted user.
* A few dozen note versions belonged to nonexistent posts. This came
from RaisingK moving notes to different posts years ago, back when it
was possible for users to set a note's post ID in the API.
* Some uploads had their parent ID set to 0.
is_note_locked, is_rating_locked, and is_status_locked have been unused
since 126046cb6.
tag_index has been unused since 37a8dc5db.
fav_string has been unused since 165339236.
pool_string has been unused since 7d503f088.
Remove the DelayedJob gem and database table. Completes the transition
to GoodJob started in c06bfa64f and f4953549a.
Downstream users can upgrade as follows:
* Stop the Rails server.
* Stop the DelayedJob worker (normally running as `bin/delayed_job` or `bin/rails jobs:work`).
* Run `bin/rails jobs:work` to finish any pending delayed jobs.
* Run `bin/rails db:migrate` to create the good_jobs table and drop the delayed_jobs table.
* Start the Rails server again.
* Start the GoodJob worker with `bin/good_job start`.
Switch the ActiveJob backend from DelayedJob to GoodJob. Differences:
* The job worker is run with `bin/good_job start` instead of `bin/delayed_job`.
* Jobs have an 8 hour timeout instead of a 4 hour timeout.
* Jobs don't automatically retry on failure.
* Finished jobs are preserved and pruned after 7 days.
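A hedged sketch of the corresponding configuration (exact option names depend on the GoodJob version in use):

    # config/application.rb
    config.active_job.queue_adapter = :good_job

    # config/initializers/good_job.rb
    GoodJob.preserve_job_records = true       # keep finished jobs so they can be pruned later
    GoodJob.retry_on_unhandled_error = false  # don't automatically retry failed jobs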
This is the first step towards replacing DelayedJob with GoodJob. Compared to
DelayedJob:
* GoodJob supports Rails 7 (DelayedJob is currently a blocker for Rails 7
because it has a version bound on ActiveRecord <6.2).
* GoodJob has a built-in admin dashboard.
* GoodJob supports threaded job workers.
* GoodJob supports scheduled cronjobs.
* GoodJob supports healthchecks for workers.
* GoodJob uses Postgres notifications instead of polling to pick up new
jobs. This allows jobs to be picked up faster and scales better with
large numbers of workers.
https://github.com/bensheldon/good_job
Add an md5 uniqueness constraint on media assets to prevent duplicate
assets from being created. This way we can guarantee that there is one
active media asset per uploaded file.
Also make it so that if two people are uploading the same file at the
same time, the file is processed only once.
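The constraint itself is just a unique index on the md5 column; a sketch (in practice it may be a partial index limited to active assets):

    # Guarantees at most one media asset row per distinct file. A second
    # concurrent upload of the same file fails the insert and reuses the
    # existing asset instead of processing the file again.
    add_index :media_assets, :md5, unique: true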
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
the MediaAsset is created, not when the post is created. This way it's
possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
with the ugoira frame data already attached to it, instead of returning it
in the `data` field then passing it around separately in the `context`
field of the upload.
Refactor full-text search on several tables (comments, dmails,
forum_posts, forum_topics, notes, and wiki_pages) to use to_tsvector
expression indexes instead of dedicated tsvector columns. This way
full-text search works the same way across all tables.
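The shape of such an index, using the comments table as an example (the text search configuration is assumed to be the standard 'english' one):

    execute <<~SQL
      CREATE INDEX index_comments_on_body_tsvector
        ON comments
        USING gin (to_tsvector('english', body))
    SQL

Queries then filter with an expression of the same shape, e.g. `to_tsvector('english', body) @@ websearch_to_tsquery('english', ?)`, so the planner can use the index because the expression matches exactly.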
API changes:
* Changed /wiki_pages.json?search[body_matches] to match against only
the body. Before `body_matches` matched against both the title and the body.
* Added /wiki_pages.json?search[title_or_body_matches] to match against
both the title and the body.
* Fixed /dmails.json?search[message_matches] to match against both the
title and body when doing a wildcard search. Before a wildcard search
only matched against the body.
* Added /dmails.json?search[body_matches] to match against only the dmail body.
Drop the final dependency on the Postgres test_parser extension.
We also have to remove references to test_parser in the migration where
it was first defined, otherwise replaying all migrations from the
beginning will fail. Replaying all migrations from the beginning
normally isn't done except in testing.
After this, it should be possible to use a vanilla install of Postgres
with Danbooru. It's still recommended to use Danbooru's Docker image for
Postgres (https://ghcr.io/danbooru/postgres), as other Postgres extensions
may be necessary in the future.
Change the wiki_pages tsvector_update_trigger to use
`pg_catalog.english` instead of `public.danbooru`. This changes how wiki
page text is parsed for full-text search to use the standard English
parser instead of test_parser. This is to prepare for dropping
test_parser. Using test_parser here was wrong anyway because it meant
that punctuation wasn't removed from words when indexing wiki pages for
full-text search.
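For reference, the trigger follows the standard Postgres tsvector_update_trigger pattern, roughly like this (the indexed columns here are assumptions):

    execute <<~SQL
      CREATE TRIGGER trigger_wiki_pages_on_update
        BEFORE INSERT OR UPDATE ON wiki_pages
        FOR EACH ROW
        EXECUTE PROCEDURE tsvector_update_trigger(body_index, 'pg_catalog.english', title, body)
    SQL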
Stop updating the fav_string attribute on posts. The column still exists
on the table, but is no longer used or updated.
Like the pool_string in 7d503f08, the fav_string was used in the past to
facilitate `fav:X` searches. Posts had a hidden fav_string column that
contained a list of every user who favorited the post. These were
treated like fake hidden tags on the post so that a search for `fav:X`
was treated like a tag search.
The fav_string attribute has been unused for search purposes for a while
now. It was only kept because of technicalities that required
departitioning the favorites table first (340e1008e) before it could be
removed. Basically, removing favorites with `@favorite.destroy` was
slow because Rails always deletes objects by ID, but we didn't have an
index on favorites.id, and we couldn't easily add one until the
favorites table was departitioned.
Fixes #4652. See https://github.com/danbooru/danbooru/issues/4652#issuecomment-754993802
for more discussion of issues caused by the fav_string (in short: write
amplification, post table bloat, and favorite inconsistency problems).
Merge the 100 favorite subtables into a single table.
Previously the favorites table was partitioned by user id into 100
subtables to try to make searching by user id faster. This wasn't really
necessary and probably slower than just making an index on
(favorites.user_id, favorites.id) to satisfy ordfav searches. BTree
indexes are logarithmic so dividing an index by 100 doesn't make it 100
times faster to search; instead it just removes a layer or two from the
tree.
This also adds a uniqueness index on (user_id, post_id) to prevent
duplicate favorites. Previously we had to check for duplicates at the
application layer, which required careful locking to do it correctly.
Finally, this adds an index on favorites.id, which was surprisingly
missing before. This made ordering and deleting favorites by id really
slow because it degraded to a sequential scan.
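A sketch of the two indexes described above (names are illustrative):

    # Prevent duplicate favorites at the database level.
    add_index :favorites, [:user_id, :post_id], unique: true

    # Let ordering and deleting favorites by id use an index scan instead of
    # a sequential scan. (In practice this is likely the primary key of the
    # merged table.)
    add_index :favorites, :id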
Add a model for storing image and video metadata for uploaded files.
Metadata is extracted using ExifTool. You will need to install ExifTool
after this commit. ExifTool 12.22 is the minimum required version
because we use the `--binary` option, which was added in this release.
The MediaMetadata model is separate from the MediaAsset model because
some files contain tons of metadata, and most of it is non-essential.
The MediaAsset model represents an uploaded file and contains essential
metadata, like the file's size and type, while the MediaMetadata model
represents all the other non-essential metadata associated with a file.
Metadata is stored as a JSON column in the database.
ExifTool returns all the file's metadata, not just the EXIF metadata.
EXIF is one of several types of image metadata, which is why we call
it MediaMetadata instead of EXIFMetadata.
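A minimal sketch of the extraction step, assuming ExifTool's JSON output mode (the flags and method name are illustrative, not necessarily what Danbooru uses):

    require "json"
    require "shellwords"

    # Return all of the file's metadata (EXIF, XMP, IPTC, ...) as a Hash,
    # suitable for storing in the MediaMetadata JSON column.
    def extract_metadata(path)
      output = `exiftool -G1 -json #{Shellwords.escape(path)}`
      JSON.parse(output).first
    end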
A MediaAsset represents an image or video file uploaded to Danbooru. It
stores the metadata associated with the image or video. This is to work
on decoupling files from posts so that images can be uploaded separately
from posts.
Make it so that when a user removes their own vote, the vote is soft
deleted (the is_deleted flag is set) instead of hard deleted.
Changes:
* Add is_deleted flag to comment votes.
* Relax uniqueness constraint so you can have multiple deleted votes on
the same comment. You can still only have one active vote on the comment.
* Add `soft_delete` method to Deletable concern.
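A rough sketch of the pieces involved (the concern's internals and the index condition are assumptions):

    # app/models/concerns/deletable.rb (sketch)
    module Deletable
      extend ActiveSupport::Concern

      def soft_delete
        update!(is_deleted: true)
      end
    end

    # Migration sketch: uniqueness only applies to non-deleted votes.
    add_index :comment_votes, [:comment_id, :user_id], unique: true, where: "is_deleted = false"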
Changes:
* Change the `expires_at` field to `duration`.
* Make moderators choose from a fixed set of standard ban lengths,
instead of allowing arbitrary ban lengths.
* List `duration` in seconds in the /bans.json API.
* Dump bans to BigQuery.
Note that some old bans have a negative duration. This is because their
expiration date was before their creation date, which is because in 2013
bans were migrated to Danbooru 2 and the original ban creation dates
were lost.
Rework the rate limit implementation to make it more flexible:
* Allow setting different rate limits for different actions. Before we
had a single rate limit for all write actions. Now different
controller endpoints can have different limits.
* Allow actions to be rate limited by user ID, by IP address, or both.
Before actions were only limited by user ID, which meant non-logged-in
actions like creating new accounts or attempting to log in couldn't be rate
limited. Also, because actions were limited by user ID only, you could
use multiple accounts with the same IP to get around limits.
Other changes:
* Remove the API Limit field from user profile pages.
* Remove the `remaining_api_limit` field from the `/profile.json` endpoint.
* Rename the `X-Api-Limit` header to `X-Rate-Limit` and change it from a
number to a JSON object containing all the rate limit info
(including the refill rate, the burst factor, the cost of the call,
and the current limits).
* Fix a potential race condition where, if you flooded requests fast
enough, you could exceed the rate limit. This was because we checked
and updated the rate limit in two separate steps, which was racy;
simultaneous requests could pass the check before the update happened.
The new code uses some tricky SQL to check and update multiple limits
in a single statement.
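Conceptually the single statement is a token bucket maintained in SQL; a simplified sketch of the idea (table and column names are assumptions, and the real query updates multiple limits at once):

    # Refill the bucket based on elapsed time, subtract the cost of this
    # request, and read back the remaining points in one atomic statement.
    # A negative result means the request is over the limit and is rejected.
    remaining = ActiveRecord::Base.connection.select_value(<<~SQL)
      UPDATE rate_limits
      SET points = LEAST(burst_points, points + refill_rate * EXTRACT(EPOCH FROM (now() - updated_at))) - 1,
          updated_at = now()
      WHERE action = 'comments:create' AND user_id = 123
      RETURNING points
    SQL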
Add the ability to restrict API keys so that they can only be used with
certain IP addresses or certain API endpoints.
Restricting your key is useful to limit damage in case it gets leaked or
stolen, for example if your key is on a remote server that gets hacked,
or if you accidentally check your key in to GitHub.
Restricting your key's API permissions is useful if a third-party app or
script wants your key, but you don't want to give full access to your
account.
If you're an app or userscript developer, and your app needs an API key
from the user, you should only request a key with the minimum
permissions needed by your app.
If you have a privileged account, and you have scripts running under
your account, you are highly encouraged to restrict your key to limit
damage in case your key gets leaked or stolen.
Should improve performance for rating:e and rating:q searches. Rating:s
isn't indexed because Postgres is unlikely to use the index for
rating:s searches (the selectivity is too low, ~77% of all posts are
rating:s).
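A sketch of what the partial indexes might look like (the indexed column and the index names are assumptions):

    # Only rows matching the WHERE clause are stored in the index, so it
    # stays small and is only used for searches on that rating.
    add_index :posts, :id, where: "rating = 'e'", name: "index_posts_on_id_where_rating_e"
    add_index :posts, :id, where: "rating = 'q'", name: "index_posts_on_id_where_rating_q"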
Fix various minor inconsistencies between the production database schema
and the declared schema in db/structure.sql.
* tags.category was a smallint instead of an integer in production.
* The unique_schema_migrations index didn't exist outside production.
* The index_posts_on_tag_index index was called index_posts_on_tags_index
outside production.
* The posts.tag_index column didn't have a statistics target defined
outside production.
* ID sequences didn't have `AS integer` defined in production.
Specify the default settings for new users inside the User model instead
of inside the database. This makes it easier to change defaults, and it
makes the code clearer.
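In Rails this is typically done with the `attribute` API; a sketch (the specific settings and default values are illustrative, not Danbooru's actual defaults):

    class User < ApplicationRecord
      # Defaults live in the model instead of as database column defaults.
      attribute :per_page, :integer, default: 20
      attribute :default_image_size, :string, default: "large"
      attribute :receive_email_notifications, :boolean, default: false
    end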
Optimize searches for non-English phrases in autocomplete. These
searches were pretty slow, and could sometimes cause sitewide lag spikes
when users typed long strings of non-English text into the search box
and caused an unintentional DoS.
The trick is to use a gin expression index on `array_to_tsvector(other_names)`.
This supports fast string prefix matching against all
elements of the array. The downside is that it doesn't allow infix or
suffix matches, so we can't support wildcards in general. Wildcards
didn't quite work anyway, since artist and wiki other names can contain
literal '*' characters.
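A sketch of the index and the kind of prefix query it supports, using the artists table as an example (the 'simple' search configuration and input escaping are assumptions):

    execute <<~SQL
      CREATE INDEX index_artists_on_other_names_tsvector
        ON artists
        USING gin (array_to_tsvector(other_names))
    SQL

    # Prefix match the user's input against every element of other_names.
    # Real code must escape/quote the input before building the tsquery.
    Artist.where("array_to_tsvector(other_names) @@ to_tsquery('simple', ? || ':*')", "hatsune")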
Add tracking of certain important user actions. These events include:
* Logins
* Logouts
* Failed login attempts
* Account creations
* Account deletions
* Password reset requests
* Password changes
* Email address changes
This is similar to the mod actions log, except for account activity
related to a single user.
The information tracked includes the user, the event type (login,
logout, etc), the timestamp, the user's IP address, IP geolocation
information, the user's browser user agent, and the user's session ID
from their session cookie. This information is visible to mods only.
This is done with three models. The UserEvent model tracks the event
type (login, logout, password change, etc) and the user. The UserEvent
is tied to a UserSession, which contains the user's IP address and
browser metadata. Finally, the IpGeolocation model contains the
geolocation information for IPs, including the city, country, ISP, and
whether the IP is a proxy.
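A rough sketch of how the three models relate (associations and column lists are assumptions):

    class UserEvent < ApplicationRecord
      belongs_to :user
      belongs_to :user_session
      # category: login, logout, failed_login, account creation, ...
    end

    class UserSession < ApplicationRecord
      # ip_addr, session_id, user_agent
    end

    class IpGeolocation < ApplicationRecord
      # ip_addr, city, country, ISP, is_proxy
    end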
This tracking will be used for a few purposes:
* Letting users view their account history, to detect things like logins
from unrecognized IPs, failed login attempts, password changes, etc.
* Rate limiting failed login attempts.
* Detecting sockpuppet accounts using their login history.
* Detecting unauthorized account sharing.
In Danbooru 1, aliases (and implications) had a `reason` field where
either the admin or the alias requester gave a reason for the alias.
This field was removed from the code and the database schema, but it
still existed in the production database. This adds the field back, so
that the dev schema is consistent with the production schema, and so
that legacy reasons can be viewed on site again.
* Add back the legacy tag_aliases.reason and tag_implications.reason fields.
* Make /tag_aliases and /tag_implications show legacy reasons.
* Add the reason field to the search form.
* Remove the PostRegeneration model. Instead just use a mod action
to log when a post is regenerated.
* Change it so that IQDB is also updated when the image samples are
regenerated. This is necessary because when the image samples are
regenerated, the thumbnail may change, which means IQDB needs to be
updated too. This can happen when regenerating old images with
transparent backgrounds where the transparency was flattened to black
instead of white in the thumbnail.
* Only display one "Regenerate image" option in the post sidebar, to
regenerate both the images and IQDB. Regenerating IQDB only can be
done through the API. Having two options in the sidebar is too much
clutter, and it's too confusing for Mods who don't know the difference
between an IQDB-only regeneration and a full image regeneration.
* Add a confirm prompt to the "Regenerate image" link.
* Refactor to move upgrade logic from UserPromotion to UserUpgrade.
* Send the recipient and the purchaser of a gifted upgrade separate
dmail notifications.
Add a model to store the status of user upgrades.
* Store the upgrade purchaser and the upgrade receiver (these are
different for a gifted upgrade, the same for a self upgrade).
* Store the upgrade type: gold, platinum, or gold-to-platinum.
* Store the upgrade status:
** pending: User is still on the Stripe checkout page, no payment
received yet.
** processing: User has completed checkout, but the checkout status in
Stripe is still 'unpaid'.
** complete: We've received notification from Stripe that the payment
has gone through and the user has been upgraded.
* Store the Stripe checkout ID, to cross-reference the upgrade record on
Danbooru with the checkout record on Stripe.
This is the upgrade flow:
* When the user clicks the upgrade button on the upgrade page, we call
POST /user_upgrades and create a pending UserUpgrade.
* We redirect the user to the checkout page on Stripe.
* When the user completes checkout on Stripe, Stripe sends us a webhook
notification at POST /webhooks/receive.
* When we receive the webhook, we check the payment status, and if it's
paid we mark the UserUpgrade as complete and upgrade the user.
* After Stripe sees that we have successfully processed the webhook,
they redirect the user to the /user_upgrades/:id page, where we show
the user their upgrade receipt.
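A condensed sketch of the webhook step, which ties the flow together (the model internals, the `stripe_id` column, and helper names are assumptions):

    # Called when Stripe notifies us that a checkout session changed.
    def receive_checkout_webhook(event)
      checkout = event.data.object
      upgrade = UserUpgrade.find_by!(stripe_id: checkout.id)

      if checkout.payment_status == "paid"
        upgrade.update!(status: :complete)
        upgrade.process_upgrade!  # hypothetical helper that raises the user's level
      else
        upgrade.update!(status: :processing)
      end
    end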