Remove the /ip_addresses page. This page allowed moderators to search
users by IP, and to see recent activity tied to an IP. However, it was
limited to IPs tied to uploads, comments, dmails, artist edits, note
edits, and wiki edits.
Remove this page because it was limited in scope and because there are
better ways of doing what it did. The /user_events page is better at
catching sockpuppets because it tracks IPs for every login, not just for
certain types of edits. And the /user_actions page is better at
monitoring user activity because it shows all activity associated with
an account, not just for certain types of edits.
Removing this allows us to drop IP addresses from all tables besides the
user_events table. This is good because these IPs are no longer necessary
for any purpose, and because storing them forever is a liability.
Add a user_actions view. This view unions together a bunch of tables to
produce an event log of every action taken by a user.
Also add a bunch of indexes to make queries on this view efficient.
Even though the view is an enormous query combining about 30 different
tables, queries are very efficient as long as every table has
`created_at` and `(user_id, created_at)` indexes.
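As a rough sketch of the idea (table and column names are illustrative,
and the real view unions about 30 tables, not three):

    CREATE VIEW user_actions AS
      SELECT 'Comment'::text AS model_type, id AS model_id, creator_id AS user_id, created_at FROM comments
      UNION ALL
      SELECT 'ForumPost', id, creator_id, created_at FROM forum_posts
      UNION ALL
      SELECT 'NoteVersion', id, updater_id, created_at FROM note_versions;

    -- Each underlying table needs both of these for queries on the view to be fast:
    CREATE INDEX ON comments (created_at);
    CREATE INDEX ON comments (creator_id, created_at);

With those indexes in place, a query like "the latest actions by user X"
can be answered by merging the per-table indexes instead of scanning
every table.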
Change ID columns from `bigint` (64-bit) to `integer` (32-bit) on various tables.
Rails 6.0 switched the default ID type on new tables from `integer` to
`bigint`, so now we have a mix of tables with integer IDs and bigint IDs.
Switch back to integer IDs on certain tables because we're going to
build a view that unions a bunch of tables together to build a user
activity timeline, and for this purpose all the tables need to have IDs
of the same type in order for Postgres to optimize the query effectively.
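The per-table change is mechanically simple. A hedged example (table
name illustrative; note that in Postgres this rewrites the table and
holds an exclusive lock while it runs):

    ALTER TABLE comment_votes ALTER COLUMN id TYPE integer;
    ALTER TABLE comment_votes ALTER COLUMN user_id TYPE integer;

Mixing integer and bigint ID columns in the UNION forces implicit
casts, which can prevent the planner from using the per-table indexes.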
Add a tag_versions table for tracking the history of tags.
A couple notable differences from other version tables:
* There is a previous_version_id column that points to the previous
version. This allows finding the first, last, previous, or next
version efficiently in SQL.
* There is a `version` column that tracks the revision number (1, 2, 3, etc).
Post versions and note versions have this, but other version tables don't.
* The `updater_id` column is optional. This is because we don't know who
the last updater was before we started tracking the history of tags,
so the initial updater will be NULL in the first version of the tag.
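A rough sketch of the schema just described (column list abridged; not
the exact migration):

    CREATE TABLE tag_versions (
      id serial PRIMARY KEY,
      tag_id integer NOT NULL REFERENCES tags (id),
      previous_version_id integer REFERENCES tag_versions (id), -- NULL for the first version
      version integer NOT NULL,                                 -- 1, 2, 3, etc.
      updater_id integer REFERENCES users (id),                 -- NULL if the updater is unknown
      created_at timestamp NOT NULL DEFAULT now()
    );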
Add a `words` column to the tags table. This will be used for parsing
tags into words for word-based matching in autocomplete.
For example, "very_long_hair" can be parsed into ["very", "long", "hair"].
The `array_to_tsvector(words)` index is for performing wildcard
searches. It lets us do, e.g.:

    SELECT * FROM tags WHERE array_to_tsvector(words) @@ 'hand:* & hold:*'

to find tags containing the words "hand*" and "hold*".
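The index itself would be a GIN expression index along these lines (the
index name is illustrative):

    CREATE INDEX index_tags_on_words ON tags USING gin (array_to_tsvector(words));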
Index ai_tags on (tag_id, score) instead of (tag_id) to allow tags to be
filtered and sorted by confidence more efficiently. This index takes up
about the same amount of space as an index on tag_id alone, so including
the score in the index is essentially free.
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.
AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.
The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.
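In SQL terms, the schema amounts to something like this (index names
illustrative; the (tag_id, score) index is the one described above):

    CREATE TABLE ai_tags (
      media_asset_id integer NOT NULL,
      tag_id integer NOT NULL,
      score smallint NOT NULL
    );

    CREATE INDEX ON ai_tags (media_asset_id);
    CREATE INDEX ON ai_tags (tag_id, score); -- filter and sort tags by confidence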
You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).
You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.
To generate tags, use the `autotag` script from the Autotagger repo, something like this:
docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz
To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
Add a system for upgrading accounts using upgrade codes. Users purchase
an upgrade code off-site then redeem it on-site to upgrade their account
to Gold. Upgrade codes are randomly pre-generated and are one-time use
only. Codes have enough randomness that guessing a code is infeasible.
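A minimal sketch of how such codes could be stored and redeemed, with
hypothetical table and column names:

    CREATE TABLE upgrade_codes (
      id serial PRIMARY KEY,
      code text NOT NULL UNIQUE,                 -- long random string; infeasible to guess
      redeemer_id integer REFERENCES users (id), -- NULL until the code is used
      redeemed_at timestamp                      -- set on redemption
    );

    -- Redeeming a code marks it as used atomically, enforcing one-time use:
    UPDATE upgrade_codes SET redeemer_id = 123, redeemed_at = now()
    WHERE code = 'xyz...' AND redeemed_at IS NULL;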
* Rename the stripe_id column to transaction_id.
* Add a new payment_processor column to identify the processor used for
this transaction (and hence, which backend system the transaction_id is for).
Rewrite db/populate.rb:
* Fix broken code.
* Pull random posts from Danbooru for more realistic data.
* Generate more data (wiki pages, artist commentaries, artist urls).
* Make the amount of data generated configurable with environment variables.
* Use FFaker to generate better random text and usernames.
Usage:
* docker-compose exec danbooru bin/rails runner db/populate.rb # with Docker
* bin/rails runner db/populate.rb # without Docker
`file_key` is a random 9-character base-62 string that will be used as
the image filename in the future.
`is_public` is whether the image can be viewed without authentication or not.
Users running downstream boorus must run `bin/rails db:migrate` and
`script/fixes/109_generate_media_asset_file_keys.rb` after this commit.
* Deprecated tags can't be added to posts, but existing deprecated tags
in a post won't be removed
* Only empty tags can be marked as deprecated manually
* No tags can be manually undeprecated
** These limits don't apply to admins
* Deprecating or undeprecating a tag will create a new mod action to
prevent people from going rogue
* Added deprecate/undeprecate commands for BURs
* Deprecating a tag via BUR removes all implications to and from it as well
Make the artist finder search for artists using the `url` field instead
of the `normalized_url` field. This lets us get rid of `normalized_url`
in the future.
As described in 10dac3ee5, artist URLs have both a `url` column and a
`normalized_url` column. The `normalized_url` column was the one used
for artist finding. The `url` was secretly normalized behind the scenes
so that artist finding would work no matter how the URL was written in
the artist entry. This is no longer necessary now that URLs are directly
normalized in artist entries.
This fixes various cases where artist finding didn't work for non-obvious
reasons, usually because the URL wasn't written in the right format so
it wasn't properly normalized behind the scenes.
This also makes it so that artist finding is case-insensitive, which
fixes #4821. Hopefully no sites are perverse enough to allow two
different usernames that differ only in case.
Users running their own Danbooru instance may have to fix the URLs in
their artist entries for artist finding to work again. There are a few
fix scripts to help with this:
* script/fixes/104_normalize_weibo_artist_urls.rb
* script/fixes/105_normalize_pixiv_artist_urls.rb
* script/fixes/106_normalize_artist_urls.rb
This is needed for multi-file uploads. We need to know both the image
url and the page url to set the post's source correctly when converting
an upload media asset into a post.
Make upload_media_assets.media_asset_id nullable in order to support
multi-file uploads. The media asset will be null while the image is
still being downloaded from the source.
* uploads.media_asset_count - the number of media assets attached to this upload.
* upload_media_assets.status - the status of each media asset attached to this upload (processing, active, failed)
* upload_media_assets.source_url - the source of each media asset attached to this upload
* upload_media_assets.error - the error message if uploading the media asset failed
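Taken together, the join table looks roughly like this (types assumed;
this is a sketch, not the exact schema):

    CREATE TABLE upload_media_assets (
      id serial PRIMARY KEY,
      upload_id integer NOT NULL REFERENCES uploads (id),
      media_asset_id integer REFERENCES media_assets (id), -- NULL while the file is still downloading
      status text NOT NULL, -- processing, active, or failed
      source_url text,      -- the source of this particular file
      error text            -- error message if processing the file failed
    );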
Mark old columns as ignored in preparation for dropping them. Make the
rating and tag_string nullable so they don't have to be set when
creating uploads and can be ignored too.
Add a join table that allows multiple media assets (images or videos) to
be attached to uploads. This is for a future ability to upload multiple
files at once.
* Add ability to mark moderation reports as 'handled' or 'rejected'.
* Automatically mark reports as handled when the comment or forum post
is deleted.
* Send a dmail to the reporter when their report is handled.
* Don't show the report notice on comments or forum posts when all
reports against it have been handled or rejected.
* Add a fix script to mark all existing reports for deleted comments,
forum posts, or dmails as handled.
Add foreign key constraints on all foreign key columns in all tables.
These constraints are deferrable so that they're checked at the end of
the transaction, rather than at the end of each statement. This is to
reduce lock duration and to allow for cyclic relationships.
Constraints are added in one migration then validated in another so that
the entire table isn't locked against reads and writes while the foreign
key constraints are being validated.
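The two-step pattern looks roughly like this for one table (table,
column, and constraint names illustrative):

    -- Migration 1: add the constraint without validating existing rows.
    -- NOT VALID means only new and updated rows are checked, so there's
    -- no long table scan while an exclusive lock is held.
    ALTER TABLE comments
      ADD CONSTRAINT comments_creator_id_fkey
      FOREIGN KEY (creator_id) REFERENCES users (id)
      DEFERRABLE INITIALLY DEFERRED NOT VALID;

    -- Migration 2: validate existing rows. This takes only a weak lock,
    -- so reads and writes can continue while the scan runs.
    ALTER TABLE comments VALIDATE CONSTRAINT comments_creator_id_fkey;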
A few tables had invalid foreign keys. Add a fix script to fix these tables:
* A couple artist versions belonged to deleted artists.
* One dmail belonged to a deleted user.
* One forum topic visit belonged to that same deleted user.
* A few dozen note versions belonged to nonexistent posts. This came
from RaisingK moving notes to different posts years ago, back when it
was possible for users to set a note's post ID in the API.
* Some uploads had their parent ID set to 0.
is_note_locked, is_rating_locked, and is_status_locked have been unused
since 126046cb6.
tag_index has been unused since 37a8dc5db.
fav_string has been unused since 165339236.
pool_string has been unused since 7d503f088.
Remove the DelayedJob gem and database table. Completes the transition
to GoodJob started in c06bfa64f and f4953549a.
Downstream users can upgrade as follows:
* Stop the Rails server.
* Stop the DelayedJob worker (normally running as `bin/delayed_job` or `bin/rails jobs:work`).
* Run `bin/rails jobs:work` to finish any pending delayed jobs.
* Run `bin/rails db:migrate` to create the good_jobs table and drop the delayed_jobs table.
* Start the Rails server again.
* Start the GoodJob worker with `bin/good_job start`.
Switch the ActiveJob backend from DelayedJob to GoodJob. Differences:
* The job worker is run with `bin/good_job start` instead of `bin/delayed_job`.
* Jobs have an 8 hour timeout instead of a 4 hour timeout.
* Jobs don't automatically retry on failure.
* Finished jobs are preserved and pruned after 7 days.
This is the first step towards replacing DelayedJob with GoodJob. Compared to
DelayedJob:
* GoodJob supports Rails 7 (DelayedJob is currently a blocker for Rails 7
because it has a version bound on ActiveRecord <6.2).
* GoodJob has a builtin admin dashboard.
* GoodJob supports threaded job workers.
* GoodJob supports scheduled cronjobs.
* GoodJob supports healthchecks for workers.
* GoodJob uses Postgres notifications instead of polling to pick up new
jobs. This allows jobs to be picked up faster and scales better with
large numbers of workers.
https://github.com/bensheldon/good_job
Add an md5 uniqueness constraint on media assets to prevent duplicate
assets from being created. This way we can guarantee that there is one
active media asset per uploaded file.
Also make it so that if two people are uploading the same file at the
same time, the file is processed only once.
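One way to express this in Postgres is a partial unique index (the
column and status names here are assumed):

    -- At most one live media asset per file hash:
    CREATE UNIQUE INDEX index_media_assets_on_md5 ON media_assets (md5)
      WHERE status IN ('processing', 'active');

With this in place, two simultaneous uploads of the same file race on
the index, only one row wins, and the losing upload can attach itself
to the winning asset instead of processing the file again.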
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
the MediaAsset is created, not when the post is created. This way it's
possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
with the ugoira frame data already attached to it, instead of returning it
in the `data` field then passing it around separately in the `context`
field of the upload.
Refactor full-text search on several tables (comments, dmails,
forum_posts, forum_topics, notes, and wiki_pages) to use to_tsvector
expression indexes instead of dedicated tsvector columns. This way
full-text search works the same way across all tables.
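The pattern, sketched for one table (the exact text search
configuration is assumed):

    CREATE INDEX index_comments_on_body ON comments
      USING gin (to_tsvector('english', body));

    -- Queries must use the same expression for the index to be matched:
    SELECT * FROM comments
    WHERE to_tsvector('english', body) @@ websearch_to_tsquery('english', 'touhou');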
API changes:
* Changed /wiki_pages.json?search[body_matches] to match against only
the body. Before `body_matches` matched against both the title and the body.
* Added /wiki_pages.json?search[title_or_body_matches] to match against
both the title and the body.
* Fixed /dmails.json?search[message_matches] to match against both the
title and body when doing a wildcard search. Before a wildcard search
only matched against the body.
* Added /dmails.json?search[body_matches] to match against only the dmail body.
Drop the final dependency on the Postgres test_parser extension.
We also have to remove references to test_parser in the migration where
it was first defined; otherwise, replaying all migrations from the
beginning will fail. Replaying all migrations from the beginning
normally isn't done except in testing.
After this, it should be possible to use a vanilla install of Postgres
with Danbooru. It's still recommended to use Danbooru's Docker image for
Postgres (https://ghcr.io/danbooru/postgres), as other Postgres extensions
may be necessary in the future.
Change the wiki_pages tsvector_update_trigger to use
`pg_catalog.english` instead of `public.danbooru`. This changes how wiki
page text is parsed for full-text search to use the standard English
parser instead of test_parser. This is to prepare for dropping
test_parser. Using test_parser here was wrong anyway because it meant
that punctuation wasn't removed from words when indexing wiki pages for
full-text search.
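tsvector_update_trigger is a built-in Postgres function, so the changed
trigger looks something like this (trigger and column names
illustrative):

    CREATE TRIGGER wiki_pages_body_index_update
      BEFORE INSERT OR UPDATE ON wiki_pages
      FOR EACH ROW EXECUTE FUNCTION
        tsvector_update_trigger(body_index, 'pg_catalog.english', body);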
Stop updating the fav_string attribute on posts. The column still exists
on the table, but is no longer used or updated.
Like the pool_string in 7d503f08, the fav_string was used in the past to
facilitate `fav:X` searches. Posts had a hidden fav_string column that
contained a list of every user who favorited the post. These were
treated like fake hidden tags on the post so that a search for `fav:X`
was treated like a tag search.
The fav_string attribute has been unused for search purposes for a while
now. It was only kept because of technicalities that required
departitioning the favorites table first (340e1008e) before it could be
removed. Basically, removing favorites with `@favorite.destroy` was
slow because Rails always deletes objects by ID, but we didn't have an
index on favorites.id, and we couldn't easily add one until the
favorites table was departitioned.
Fixes #4652. See https://github.com/danbooru/danbooru/issues/4652#issuecomment-754993802
for more discussion of issues caused by the fav_string (in short: write
amplification, post table bloat, and favorite inconsistency problems).
Merge the 100 favorite subtables into a single table.
Previously the favorites table was partitioned by user id into 100
subtables to try to make searching by user id faster. This wasn't really
necessary, and was probably slower than just making an index on
(favorites.user_id, favorites.id) to satisfy ordfav searches. BTree
indexes are logarithmic so dividing an index by 100 doesn't make it 100
times faster to search; instead it just removes a layer or two from the
tree.
This also adds a unique index on (user_id, post_id) to prevent
duplicate favorites. Previously we had to check for duplicates at the
application layer, which required careful locking to do it correctly.
Finally, this adds an index on favorites.id, which was surprisingly
missing before. This made ordering and deleting favorites by id really
slow because it degraded to a sequential scan.
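The indexes described above, sketched in SQL (whether the id index is a
plain unique index or a primary key is a detail of the migration):

    -- Prevents duplicate favorites at the database layer:
    CREATE UNIQUE INDEX ON favorites (user_id, post_id);

    -- Satisfies ordfav: searches by scanning one user's favorites in id order:
    CREATE INDEX ON favorites (user_id, id);

    -- Lets favorites be ordered and deleted by id without a sequential scan:
    CREATE UNIQUE INDEX ON favorites (id);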
Add a model for storing image and video metadata for uploaded files.
Metadata is extracted using ExifTool. You will need to install ExifTool
after this commit. ExifTool 12.22 is the minimum required version
because we use the `--binary` option, which was added in this release.
The MediaMetadata model is separate from the MediaAsset model because
some files contain tons of metadata, and most of it is non-essential.
The MediaAsset model represents an uploaded file and contains essential
metadata, like the file's size and type, while the MediaMetadata model
represents all the other non-essential metadata associated with a file.
Metadata is stored as a JSON column in the database.
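Storing the metadata as JSON keeps the schema simple while remaining
queryable. For example, something like this works against a jsonb
column (the column and key names are assumed, following ExifTool's
grouped JSON output):

    SELECT metadata->>'File:FileType' AS file_type
    FROM media_metadata
    WHERE media_asset_id = 1;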
ExifTool returns all the file's metadata, not just the EXIF metadata.
EXIF is only one of several types of image metadata, which is why we
call it MediaMetadata instead of EXIFMetadata.