Commit Graph

277 Commits

Author SHA1 Message Date
evazion
0c327a2228 tags: add tag_versions table.
Add a tag_versions table for tracking the history of tags.

A couple notable differences from other version tables:

* There is a previous_version_id column that points to the previous
  version. This allows finding the first, last, previous, or next
  version efficiently in SQL.

* There is a `version` column that tracks the revision number (1, 2, 3, etc).
  Post versions and note versions have this, but other version tables don't.

* The `updater_id` column is optional. This is because we don't know who
  the last updater was before we started tracking the history of tags,
  so the initial updater will be NULL in the first version of the tag.
2022-09-11 17:35:53 -05:00
evazion
f36f1ff37b tags: drop is_locked column.
This column was deprecated in 208b6189. Finish removing it.
2022-09-09 15:58:48 -05:00
evazion
e058cfba4d tags: add words column.
Add a `words` column to the tags table. This will be used for parsing
tags into words for word-based matching in autocomplete.

For example, "very_long_hair" can be parsed into ["very", "long", "hair"].

The `array_to_tsvector(words)` index is for performing wildcard
searches. It lets us do e.g

    SELECT * FROM tags WHERE array_to_tsvector(words) @@ 'hand:* & hold:*'

to find tags containing the words "hand*" and "hold*".
2022-09-01 23:54:07 -05:00
evazion
fb926a1bd2 ai tags: replace tag_id index with (tag_id, score) index.
Index on (tag_id, score) instead of (tag_id) to allow tags to be
filtered and sorted by confidence more efficiently. This index takes up
about the same amount of space as an index on tag_id alone, so including
the score in the index is essentially free.
2022-06-27 18:29:35 -05:00
evazion
1aeb52186e Add AI tag model and UI.
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.

You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
2022-06-24 04:54:26 -05:00
evazion
173e43b192 user upgrades: add upgrade code system.
Add a system for upgrading accounts using upgrade codes. Users purchase
an upgrade code off-site then redeem it on-site to upgrade their account
to Gold. Upgrade codes are randomly pre-generated and are one time use
only. Codes have enough randomness that guessing a code is infeasible.
2022-06-01 18:31:46 -05:00
evazion
4b65e96abc upgrades: rename stripe_id to transaction_id
* Rename the stripe_id column to transaction_id.
* Add a new payment_processor column to identity the processor used for
  this transaction (and hence, which backend system the transaction_id is for).
2022-05-15 01:05:24 -05:00
evazion
4ba993319a media assets: add file_key, is_public columns.
`file_key` is a random 9-character base-62 string that will be used as
the image filename in the future.

`is_public` is whether the image can be viewed without authentication or not.

Users running downstream boorus must run `bin/rails db:migrate` and
`script/fixes/109_generate_media_asset_file_keys.rb` after this commit.
2022-05-04 23:19:53 -05:00
evazion
90b800ced7 tags: add index on is_deprecated. 2022-04-10 00:07:15 -05:00
nonamethanks
ea76a889db Add ability to mark tags as deprecated
* Deprecated tags can't be added to posts, but existing deprecated tags
  in a post won't be removed
* Only empty tags can be marked as deprecated manually
* No tags can be manually undeprecated
** These limits don't apply to admins
* Deprecating or undeprecating a tag will create a new mod action to
  prevent people from going rogue
* Added deprecate/undeprecate commands for BURs
* Deprecating a tag via BUR removes all implications to and from it as well
2022-04-08 09:07:14 +02:00
evazion
70c5332be8 artist urls: add index on url column. 2022-04-03 17:08:36 -05:00
evazion
86e9cf1d05 artist urls: make normalized_url nullable.
Make the normalized_url column nullable in preparation for dropping it.
2022-04-02 23:30:39 -05:00
evazion
c64df46de4 artists: make artist finder use url instead of normalized_url.
Make the artist finder search for artists using the `url` field instead
of the `normalized_url` field. This lets us get rid of `normalized_url`
in the future.

As described in 10dac3ee5, artist URLs have both a `url` column and a
`normalized_url` column. The `normalized_url` column was the one used
for artist finding. The `url` was secretly normalized behind the scenes
so that artist finding would work no matter how the URL was written in
the artist entry. This is no longer necessary now that URLs are directly
normalized in artist entries.

This fixes various cases where artist finding didn't work for non-obvious
reasons, usually because the URL wasn't written in the right format so
it wasn't properly normalized behind the scenes.

This also makes it so that artist finding is case-insensitive, which
fixes #4821. Hopefully no sites are perverse enough to allow two
different usernames that differ only in case.

Users running their own Danbooru instance may have to fix the URLs in
their artist entries for artist finding to work again. There are a few
fix scripts to help with this:

* script/fixes/104_normalize_weibo_artist_urls.rb
* script/fixes/105_normalize_pixiv_artist_urls.rb
* script/fixes/106_normalize_artist_urls.rb
2022-03-18 04:00:16 -05:00
evazion
44ca178d7a uploads: add upload_media_assets.page_url.
This is needed for multi-file uploads. We need to know both the image
url and the page url to set the post's source correctly when converting
an upload media asset into a post.
2022-02-11 02:51:20 -06:00
evazion
c2ed5c2841 uploads: make upload_media_assets.media_asset_id nullable.
Make upload_media_assets.media_asset_id nullable in order to support
multi-file uploads. The media asset will be null while the image is
still being downloaded from the source.
2022-02-11 02:49:52 -06:00
evazion
70d38d9e0b uploads: add columns needed for multi-file uploads.
* uploads.media_asset_count - the number of media assets attached to this upload.
* upload_media_assets.status - the status of each media asset attached to this upload (processing, active, failed)
* upload_media_assets.source_url - the source of each media asset attached to this upload
* upload_media_assets.error - the error message if uploading the media asset failed
2022-02-10 12:06:57 -06:00
evazion
1a61e329ba uploads: add column for error messages.
Change it so uploads store errors in an `error` column instead of in the
`status` field.
2022-02-07 15:44:39 -06:00
evazion
7c63ac8dbd uploads: drop unused columns. 2022-02-04 02:19:30 -06:00
evazion
2dfec29da7 uploads: mark old columns as ignored.
Mark old columns as ignored in preparation for dropping them. Make the
rating and tag_string nullable so they don't have to be set when
creating uploads and can be ignored too.
2022-02-03 14:07:09 -06:00
evazion
c4775d96a9 uploads: add upload_media_assets table.
Add a join table that allows multiple media assets (images or videos) to
be attached to uploads. This is for a future ability to upload multiple
files at once.
2022-01-25 16:10:53 -06:00
evazion
c8d27c2719 Fix #4669: Track moderation report status.
* Add ability to mark moderation reports as 'handled' or 'rejected'.
* Automatically mark reports as handled when the comment or forum post
  is deleted.
* Send a dmail to the reporter when their report is handled.
* Don't show the report notice on comments or forum posts when all
  reports against it have been handled or rejected.
* Add a fix script to mark all existing reports for deleted comments,
  forum posts, or dmails as handled.
2022-01-20 20:50:23 -06:00
evazion
fd2db2ff23 Update Ruby gems and Yarn packages. 2022-01-10 11:32:59 -06:00
evazion
5623b139aa db: add foreign key constraints on all tables.
Add foreign key constraints on all foreign keys on all tables.

These constraints are deferrable so that they're checked at the end of
the transaction, rather at the end of the statement. This is to reduce
lock duration and to allow for cyclic relationships.

Constraints are added in one migration then validated in another so that
the entire table isn't locked against reads and writes while the foreign
key constraints are being validated.

A few tables had invalid foreign keys. Add a fix script to fix these tables:

* A couple artist versions belonged to deleted artists.
* One dmail belonged to a deleted user.
* One forum topic visit belonged to that same deleted user.
* A few dozen note versions belonged to nonexistent posts. This came
  from RaisingK moving notes to different posts years ago, back when it
  was possible for users to set a note's post ID in the API.
* Some uploads had their parent ID set to 0.
2022-01-09 11:01:00 -06:00
evazion
dbf4e1e98e db: remove unused tsvector triggers.
This was forgotten in 080dbf5a8.
2022-01-06 20:03:50 -06:00
evazion
e8c52432a4 db: remove unused columns on posts table.
is_note_locked, is_rating_locked, and is_status_locked have been unused
since 126046cb6.

tag_index has been unused since 37a8dc5db.

fav_string has been unused since 165339236.

pool_string has been unused since 7d503f088.
2022-01-06 11:39:18 -06:00
evazion
080dbf5a8c db: remove unused tsvector columns.
These columns have been unused since e3b836b50.
2022-01-06 11:25:51 -06:00
evazion
3841fba78e jobs: remove DelayedJobs.
Remove the DelayedJobs gem and database table. Completes the transition
to GoodJob started in c06bfa64f and f4953549a.

Downstream users can upgrade as follows:

* Stop the Rails server.
* Stop the DelayedJobs worker (normally running as `bin/delayed_job` or `bin/rails jobs:work`).
* Run `bin/rails jobs:work` to finish any pending delayed jobs.
* Run `bin/rails db:migrate` to create the good_jobs table and drop the delayed_jobs table.
* Start the Rails server again.
* Start the GoodJobs worker with `bin/good_job start`.
2022-01-04 15:58:12 -06:00
evazion
c06bfa64f5 Add GoodJob gem.
This is the first step towards replacing DelayedJob with GoodJob. Compared to
DelayedJob:

* GoodJob supports Rails 7 (DelayedJob is currently a blocker for Rails 7
  because it has a version bound on ActiveRecord <6.2).
* GoodJob has a builtin admin dashboard.
* GoodJob supports threaded job workers.
* GoodJob supports scheduled cronjobs.
* GoodJob supports healthchecks for workers.
* GoodJob uses Postgres notifications instead of polling to pick up new
  jobs. This allows jobs to be picked up faster and scales better with
  large numbers of workers.

https://github.com/bensheldon/good_job
2022-01-02 17:13:41 -06:00
evazion
be5173c8dd votes: add is_deleted flag to post_votes table.
Add an is_deleted flag to post_votes so they can be soft-deleted in the future.
2021-11-21 02:36:30 -06:00
evazion
5c7a0f225c media assets: prevent duplicate media assets.
Add a md5 uniqueness constraint on media assets to prevent duplicate
assets from being created. This way we can guarantee that there is one
active media asset per uploaded file.

Also make it so that if two people are uploading the same file at the
same time, the file is processed only once.
2021-10-24 04:35:06 -05:00
evazion
bc506ed1b8 uploads: refactor to simplify ugoira-handling and replacements:
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
  the MediaAsset is created, not when the post is created. This way it's
  possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
  with the ugoira frame data already attached to it, instead of returning it
  in the `data` field then passing it around separately in the `context`
  field of the upload.
2021-10-18 05:18:46 -05:00
evazion
85c3b4f2d1 ugoiras: add md5 column to pixiv_ugoira_frame_data.
This is necessary so we can associate ugoira frame data with the media
asset instead of with the post.
2021-10-18 00:34:24 -05:00
evazion
e3b836b506 Refactor full-text search to get rid of tsvector columns.
Refactor full-text search on several tables (comments, dmails,
forum_posts, forum_topics, notes, and wiki_pages) to use to_tsvector
expression indexes instead of dedicated tsvector columns. This way
full-text search works the same way across all tables.

API changes:

* Changed /wiki_pages.json?search[body_matches] to match against only
  the body. Before `body_matches` matched against both the title and the body.

* Added /wiki_pages.json?search[title_or_body_matches] to match against
  both the title and the body.

* Fixed /dmails.json?search[message_matches] to match against both the
  title and body when doing a wildcard search. Before a wildcard search
  only matched against the body.

* Added /dmails.json?search[body_matches] to match against only the dmail body.
2021-10-16 07:44:27 -05:00
evazion
d50cfdb856 db: drop dependency on Postgres test_parser extension.
Drop the final dependency on the Postgres test_parser extension.

We also have to remove references to test_parser in the migration where
it was first defined, otherwise replaying all migrations from the
beginning will fail. Replaying all migrations from the beginning
normally isn't done except in testing.

After this, it should be possible to use a vanilla install of Postgres
with Danbooru. It's still recommended to use Danbooru's Docker image for
Postgres (https://ghcr.io/danbooru/postgres), as other Postgres extensions
may be necessary in the future.
2021-10-14 02:41:44 -05:00
evazion
e72446463e Fix #4901: Duplicate disapprovals
* Add uniqueness constraint on post_disapprovals (user_id, post_id).
* Add fix script to remove existing duplicates.
2021-10-12 20:22:00 -05:00
evazion
7976323f7a wiki pages: change tsvector update trigger to not use test_parser.
Change the wiki_pages tsvector_update_trigger to use
`pg_catalog.english` instead of `public.danbooru`. This changes how wiki
page text is parsed for full-text search to use the standard English
parser instead of test_parser. This is to prepare for dropping
test_parser. Using test_parser here was wrong anyway because it meant
that punctuation wasn't removed from words when indexing wiki pages for
full-text search.
2021-10-11 03:34:47 -05:00
evazion
51e9ea2772 posts: add string_to_array(tag_string, ' ') index.
This is preparation for removing tag_index and test_parser.
2021-10-10 17:45:19 -05:00
evazion
340e1008e9 favorites: merge favorites subtables.
Merge the 100 favorite subtables into a single table.

Previously the favorites table was partitioned by user id into 100
subtables to try to make searching by user id faster. This wasn't really
necessary and probably slower than just making an index on
(favorites.user_id, favorites.id) to satisfy ordfav searches. BTree
indexes are logarithmic so dividing an index by 100 doesn't make it 100
times faster to search; instead it just removes a layer or two from the
tree.

This also adds a uniqueness index on (user_id, post_id) to prevent
duplicate favorites. Previously we had to check for duplicates at the
application layer, which required careful locking to do it correctly.

Finally, this adds an index on favorites.id, which was surprisingly
missing before. This made ordering and deleting favorites by id really
slow because it degraded to a sequential scan.
2021-10-08 21:26:42 -05:00
evazion
01cdc7da7f media assets: add status column. 2021-09-26 08:06:13 -05:00
evazion
3d30bfd69d media assets: add duration column.
Add a column for tracking the duration (length) of videos and
animations. The duration will be null for static images.
2021-09-26 07:45:29 -05:00
evazion
3a0614bb55 db: recreate post versions and pool versions tables.
Add the post and pool versions tables back. Currently only used by the
test suite to make it easier to run. Not yet used for production.
2021-09-21 12:35:46 -05:00
evazion
3d660953d4 Add MediaMetadata model.
Add a model for storing image and video metadata for uploaded files.

Metadata is extracted using ExifTool. You will need to install ExifTool
after this commit. ExifTool 12.22 is the minimum required version
because we use the `--binary` option, which was added in this release.

The MediaMetadata model is separate from the MediaAsset model because
some files contain tons of metadata, and most of it is non-essential.
The MediaAsset model represents an uploaded file and contains essential
metadata, like the file's size and type, while the MediaMetadata model
represents all the other non-essential metadata associated with a file.

Metadata is stored as a JSON column in the database.

ExifTool returns all the file's metadata, not just the EXIF metadata.
EXIF is one of several types of image metadata, hence why we call
it MediaMetadata instead of EXIFMetadata.
2021-09-08 05:00:54 -05:00
evazion
207a1b7688 db/structure.sql: fix syntax error. 2021-09-04 03:59:35 -05:00
evazion
b068c113a8 Add MediaAsset model.
A MediaAsset represents an image or video file uploaded to Danbooru. It
stores the metadata associated with the image or video. This is to work
on decoupling files from posts so that images can be uploaded separately
from posts.
2021-09-02 06:07:52 -05:00
evazion
247934ad83 db: add non-null constraints to all non-optional columns.
Add non-null constraints to all columns that are non-optional. Now the
only columns that are nullable are optional columns.
2021-03-30 04:52:01 -05:00
evazion
6b91e55283 comments: allow votes to be soft deleted.
Make it so that when a user removes their own vote, the vote is soft
deleted (the is_deleted flag is set) instead of hard deleted.

Changes:

* Add is_deleted flag to comment votes.
* Relax uniqueness constraint so you can have multiple deleted votes on
  the same comment. You can still only have one active vote on the comment.
* Add `soft_delete` method to Deletable concern.
2021-03-30 00:10:22 -05:00
evazion
808c039f03 db/structure.sql: fixup ban duration field from 81fe68d39. 2021-03-12 23:33:51 -06:00
evazion
81fe68d392 bans: change expires_at field to duration.
Changes:

* Change the `expires_at` field to `duration`.
* Make moderators choose from a fixed set of standard ban lengths,
  instead of allowing arbitrary ban lengths.
* List `duration` in seconds in the /bans.json API.
* Dump bans to BigQuery.

Note that some old bans have a negative duration. This is because their
expiration date was before their creation date, which is because in 2013
bans were migrated to Danbooru 2 and the original ban creation dates
were lost.
2021-03-11 02:59:58 -06:00
evazion
4492610dfe rate limits: rework rate limit implementation.
Rework the rate limit implementation to make it more flexible:

* Allow setting different rate limits for different actions. Before we
  had a single rate limit for all write actions. Now different
  controller endpoints can have different limits.

* Allow actions to be rate limited by user ID, by IP address, or both.
  Before actions were only limited by user ID, which meant non-logged-in
  actions like creating new accounts or attempting to login couldn't be rate
  limited. Also, because actions were limited by user ID only, you could
  use multiple accounts with the same IP to get around limits.

Other changes:

* Remove the API Limit field from user profile pages.
* Remove the `remaining_api_limit` field from the `/profile.json` endpoint.
* Rename the `X-Api-Limit` header to `X-Rate-Limit` and change it from a
  number to a JSON object containing all the rate limit info
  (including the refill rate, the burst factor, the cost of the call,
  and the current limits).
* Fix a potential race condition where, if you flooded requests fast
  enough, you could exceed the rate limit. This was because we checked
  and updated the rate limit in two separate steps, which was racy;
  simultaneous requests could pass the check before the update happened.
  The new code uses some tricky SQL to check and update multiple limits
  in a single statement.
2021-03-05 16:00:54 -06:00
evazion
25fda1ecc2 api keys: add IP whitelist and API permission system.
Add the ability to restrict API keys so that they can only be used with
certain IP addresses or certain API endpoints.

Restricting your key is useful to limit damage in case it gets leaked or
stolen. For example, if your key is on a remote server and it gets
hacked, or if you accidentally check-in your key to Github.

Restricting your key's API permissions is useful if a third-party app or
script wants your key, but you don't want to give full access to your
account.

If you're an app or userscript developer, and your app needs an API key
from the user, you should only request a key with the minimum
permissions needed by your app.

If you have a privileged account, and you have scripts running under
your account, you are highly encouraged to restrict your key to limit
damage in case your key gets leaked or stolen.
2021-02-14 21:02:07 -06:00