Commit Graph

3193 Commits

Author SHA1 Message Date
evazion
a607cb1cb1 posts: fix IP leak in /posts.json includes.
This leaks the full post object in the error message, which includes the
uploader's IP:

* https://danbooru.donmai.us/posts/4871548.json?only=updater
2021-10-27 02:36:24 -05:00
evazion
082544ab03 StorageManager: remove Post-specific code.
Refactor StorageManager to remove all image URL generation code. Instead
the image URL generation code lives in MediaAsset.

Now StorageManager is only concerned with how to read and write files to
remote storage backends like S3 or SFTP, not with how image URLs should
be generated. This way the file storage code isn't tightly coupled to
posts, so it can be used to store any kind of file, not just images
belonging to posts.
2021-10-27 00:05:30 -05:00
evazion
afe5095ee6 posts: mark media asset as expunged when post is expunged.
Fix it so that when a post is expunged, the media asset is also marked
as expunged. This way the files will be deleted, but the media asset
will still remain as a record of what was expunged. The media asset will
have the md5, width, height, file ext, and file size of the deleted file.
2021-10-26 02:53:32 -05:00
evazion
f1b5c34b4d posts: show length of videos and animations in thumbnails.
Show the length of videos and animated posts in the thumbnail. The
length is shown the top left corner in MM:SS format. This replaces the
play button icon.

Show a speaker icon instead of a music note icon for posts with sound.

Doing this requires doing `.includes(:media_asset)` in a bunch of
places to avoid N+1 queries when we access the post's duration.
2021-10-25 02:56:55 -05:00
evazion
a9088d8a87 search: fix flag_count:N metatag being broken. 2021-10-24 17:02:38 -05:00
evazion
8d5e0a5b58 replacements: don't delete replaced files.
Don't delete replaced files after 30 days. There are only about 30k
replacements in total, so the cost of keeping replaced files is
negligible. It was also wrong because the media asset wasn't destroyed
too, so there were active media assets with missing files.
2021-10-24 04:35:13 -05:00
evazion
d258790199 uploads: don't delete files of abandoned uploads.
Just leave them. They don't take up that much space and they may be used
in the future if someone else tries to upload the same file.
2021-10-24 04:35:13 -05:00
evazion
f5e7d50dbb media assets: don't destroy ugoira data on destroy.
Don't destroy Pixiv Ugoira frame data when the media asset is destroyed.
This is wrong because when uploads were pruned, it could delete the
frame data of an active post.
2021-10-24 04:35:13 -05:00
evazion
5c7a0f225c media assets: prevent duplicate media assets.
Add a md5 uniqueness constraint on media assets to prevent duplicate
assets from being created. This way we can guarantee that there is one
active media asset per uploaded file.

Also make it so that if two people are uploading the same file at the
same time, the file is processed only once.
2021-10-24 04:35:06 -05:00
evazion
bc506ed1b8 uploads: refactor to simplify ugoira-handling and replacements:
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
  the MediaAsset is created, not when the post is created. This way it's
  possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
  with the ugoira frame data already attached to it, instead of returning it
  in the `data` field then passing it around separately in the `context`
  field of the upload.
2021-10-18 05:18:46 -05:00
evazion
1d034a3223 media assets: move more file-handling logic into MediaAsset.
Move more of the file-handling logic from UploadService and
StorageManager into MediaAsset. This is part of refactoring posts and
uploads to allow multiple images per post.
2021-10-18 00:10:29 -05:00
evazion
e3b836b506 Refactor full-text search to get rid of tsvector columns.
Refactor full-text search on several tables (comments, dmails,
forum_posts, forum_topics, notes, and wiki_pages) to use to_tsvector
expression indexes instead of dedicated tsvector columns. This way
full-text search works the same way across all tables.

API changes:

* Changed /wiki_pages.json?search[body_matches] to match against only
  the body. Before `body_matches` matched against both the title and the body.

* Added /wiki_pages.json?search[title_or_body_matches] to match against
  both the title and the body.

* Fixed /dmails.json?search[message_matches] to match against both the
  title and body when doing a wildcard search. Before a wildcard search
  only matched against the body.

* Added /dmails.json?search[body_matches] to match against only the dmail body.
2021-10-16 07:44:27 -05:00
evazion
300bc6941e newrelic: log with_timeout errors as expected.
Make it so that when a database call inside a `with_timeout` block times
out, the error logged to New Relic is marked as expected. This is so
that expected timeouts, such as timeouts when calculating search counts
or timeouts when generating related tags for the sidebar, don't count
against the error rate.
2021-10-14 23:39:21 -05:00
evazion
e72446463e Fix #4901: Duplicate disapprovals
* Add uniqueness constraint on post_disapprovals (user_id, post_id).
* Add fix script to remove existing duplicates.
2021-10-12 20:22:00 -05:00
evazion
341be51f95 posts: remove unused flag! and approve! methods.
These methods were unused outside of the test suite
2021-10-11 20:05:09 -05:00
evazion
f155023b77 posts: remove unused exception classes. 2021-10-11 18:58:15 -05:00
evazion
7976323f7a wiki pages: change tsvector update trigger to not use test_parser.
Change the wiki_pages tsvector_update_trigger to use
`pg_catalog.english` instead of `public.danbooru`. This changes how wiki
page text is parsed for full-text search to use the standard English
parser instead of test_parser. This is to prepare for dropping
test_parser. Using test_parser here was wrong anyway because it meant
that punctuation wasn't removed from words when indexing wiki pages for
full-text search.
2021-10-11 03:34:47 -05:00
evazion
37a8dc5dbd posts: use string_to_array index for tag searches.
Use the `string_to_array(tag_string, ' ')` index instead of the
`tag_index` for tag searches. The string_to_array index lets us treat
the tag_string as an array for searching purposes. This lets us get rid
of the tag_index column and the test_parser dependency in the future.
2021-10-10 22:00:10 -05:00
evazion
1653392361 posts: stop updating fav_string attribute.
Stop updating the fav_string attribute on posts. The column still exists
on the table, but is no longer used or updated.

Like the pool_string in 7d503f08, the fav_string was used in the past to
facilitate `fav:X` searches. Posts had a hidden fav_string column that
contained a list of every user who favorited the post. These were
treated like fake hidden tags on the post so that a search for `fav:X`
was treated like a tag search.

The fav_string attribute has been unused for search purposes for a while
now. It was only kept because of technicalities that required
departitioning the favorites table first (340e1008e) before it could be
removed. Basically, removing favorites with `@favorite.destroy` was
slow because Rails always deletes object by ID, but we didn't have an
index on favorites.id, and we couldn't easily add one until the
favorites table was departitioned.

Fixes #4652. See https://github.com/danbooru/danbooru/issues/4652#issuecomment-754993802
for more discussion of issues caused by the fav_string (in short: write
amplification, post table bloat, and favorite inconsistency problems).
2021-10-09 22:36:26 -05:00
evazion
c4a4e77ca5 favorites: order /favorites.json by id.
Order https://danbooru.donmai.us/favorites.json by favorite ID instead
of by post ID. This way you can get a feed of recently added favorites.
Previously this wasn't possible because we didn't have an index on
favorite ID.
2021-10-08 21:26:42 -05:00
evazion
340e1008e9 favorites: merge favorites subtables.
Merge the 100 favorite subtables into a single table.

Previously the favorites table was partitioned by user id into 100
subtables to try to make searching by user id faster. This wasn't really
necessary and probably slower than just making an index on
(favorites.user_id, favorites.id) to satisfy ordfav searches. BTree
indexes are logarithmic so dividing an index by 100 doesn't make it 100
times faster to search; instead it just removes a layer or two from the
tree.

This also adds a uniqueness index on (user_id, post_id) to prevent
duplicate favorites. Previously we had to check for duplicates at the
application layer, which required careful locking to do it correctly.

Finally, this adds an index on favorites.id, which was surprisingly
missing before. This made ordering and deleting favorites by id really
slow because it degraded to a sequential scan.
2021-10-08 21:26:42 -05:00
evazion
73acc16271 Merge pull request #4898 from nonamethanks/feat-meme-tag
Autotag the meme tag on posts with *_(meme) tags
2021-10-08 05:49:17 -05:00
evazion
7fa23c5fbf users: give all users unlimited favorites.
Let all users have unlimited favorites. Formerly the limit was 10k
favorites for regular members, 20k for Gold, and unlimited for Platinum.

Limiting favorites doesn't make sense since upvotes are unlimited.
2021-10-07 06:27:09 -05:00
evazion
7d503f088e posts: stop using pool_string attribute.
Stop using the pool_string attribute on posts:

* Stop updating it when adding or removing posts from pools.
* Stop returning pool_string in the /posts.json API.
* Stop including the `data-pools` attribute on thumbnails.

The pool_string attribute was used in the past to facilitate pool:X
searches. Posts had a hidden pool_string attribute that contained a list
of every pool the post belonged to. These pools were treated like fake
hidden tags on the post and a search for `pool:X` was treated like a tag
search.

The pool_string has no longer been used for this purpose for a long time
now, and was only maintained for API compatibility purposes. Getting rid
of it eliminates a bunch of legacy cruft relating to adding and removing
posts from pools.

If you need to see which pools a post belongs to, do this:

* https://danbooru.donmai.us/pools.json?search[post_ids_include_any]=318550

The `data-pools` attribute on thumbnails was used by some people to add
custom borders to pooled posts with custom CSS. This will no longer
work. This was already broken because it included things like collection
pools and deleted pools, which you probably didn't want. Use a
userscript to add this attribute back to thumbnails if you need it.
2021-10-07 05:55:43 -05:00
evazion
0731b07d27 posts: store duration of animations and videos.
Start storing the duration of animations and videos in the `duration`
field on the media_assets table. This had to wait until 3d30bfd69 was
deployed, which had to wait until Postgres was upgraded in order to add
the duration column to the media_assets table without downtime.

Also add a fix script to backfill the duration on existing posts. Usage:

    TAGS=animated ./script/fixes/079_fix_duration.rb
2021-10-07 03:21:08 -05:00
nonamethanks
d6daec8918 Autotag the meme tag on posts with *_(meme) tags 2021-10-06 18:06:46 +02:00
evazion
f6a6289c8d posts: autoremove tagme on posts with >30 tags.
If you're able to add 30 tags then you don't need to tag it tagme.
2021-10-06 08:08:52 -05:00
evazion
c99d0523bb /media_assets: add basic index and show pages.
* Add a basic index page at https://danbooru.donmai.us/media_assets.
* Add a basic show page at https://danbooru.donmai.us/media_assets/1.
* Add ability to search /media_assets.json by metadata. Example:
** https://danbooru.donmai.us/media_assets.json?search[metadata][File:ColorComponents]=3
* Add a "»" link next to the filesize on posts linking to the metadata page.

Known issues:

* Sometimes the MD5 links on the /media_assets page return "That record
  was not found" errors. These are unfinished uploads that haven't been
  made into posts yet.
* No good way to search for custom metadata fields in the search form.
* Design is ugly.
2021-09-29 07:46:11 -05:00
evazion
f9d25660b8 Fixup regression in 2eb89a835.
Fix regression in 2eb89a835 that broke the modqueue page because the
arguments to `paginated_search` changed and weren't updated here.

Also fix incorrect YARD documentation syntax.
2021-09-29 06:28:53 -05:00
evazion
2eb89a8354 Fix #4601: Hide deleted pools by default in pool search.
* On /pools, hide deleted pools by default in HTML responses. Don't
  filter out deleted pools in API responses.

* API change: on /forum_topics, only hide deleted forum topics by
  default for HTML responses, not for API responses. Explicitly do
  https://danbooru.donmai.us/forum_topics.json?search[is_deleted]=false
  to filter out deleted topics.

* API change: on /tags, only hide empty tags by default for HTML
  responses, not for API responses. Explicitly do
  https://danbooru.donmai.us/tags.json?search[is_empty]=false to filter
  out empty tags.

* API change: on /pools, default to 20 posts per page for API responses,
  not 40.

* API change: add `search[is_empty]` param to /tags.json endpoint.
  `search[hide_empty]=true` is deprecated in favor of `search[is_empty]=false`.

* On /pools, add option to show/hide deleted pools in search form.

* Fix the /forum_topics page putting `search[order]=sticky&limit=40` in
  the URL when browsing past page 1.
2021-09-29 05:44:59 -05:00
evazion
126046cb69 posts: remove rating, note, and status locks.
Remove the ability for users to lock ratings, note, and post statuses.

Historically the majority of locked posts were from 10+ years ago when
certain users habitually locked ratings and notes on every post they
touched for no reason. Nowadays most posts have been unlocked. Only a
handful of locked posts are left, none of which deserve to be locked.

The is_rating_locked, is_note_locked, and is_status_locked columns still
exist in the database, but aren't used.
2021-09-27 22:32:30 -05:00
evazion
79fdfa86ae Fix various rubocop warnings. 2021-09-27 00:46:13 -05:00
evazion
7d3e491dc6 posts: stop autotagging huge_filesize.
https://danbooru.donmai.us/forum_topics/19526
2021-09-26 18:26:38 -05:00
evazion
1075277d36 posts: remove unused methods. 2021-09-26 08:15:17 -05:00
evazion
01cdc7da7f media assets: add status column. 2021-09-26 08:06:13 -05:00
evazion
ab3f35580f metadata: move metadata parsing into ExifTool::Metadata.
Move the metadata parsing code from MediaAsset to ExifTool::Metadata so
we can use it outside the context of a MediaAsset, in particular when
dealing with a MediaFile that hasn't been saved to disk yet.
2021-09-26 07:19:36 -05:00
evazion
7d3eebaced posts: purge all cached URLs when post is regenerated
Fix not all URLs being purged from Cloudflare when a post is
regenerated.
2021-09-26 01:21:32 -05:00
evazion
463e6d7b49 artists: fix deadlock when banning artists.
Caused by d854bf6b. Banning an artist would deadlock because it was
performed in a transaction, which didn't work with the `parallel_each`
inside the "create an implication to banned_artist" step.
2021-09-24 08:40:33 -05:00
evazion
74b03a7bd0 posts: fix incorrect exif rotation for PNGs.
Fix a bug where where PNG images could be incorrectly detected as
exif-rotated. This would happen when a PNG contained the
IFD0:Orientation flag. It's technically possible for a PNG to contain
this flag, but it's ignored by libvips and by browsers.

post #3762340 (nsfw) is an example of a PNG like this.

The fix is to use `autorot` to let libvips apply the rotation instead of
trying to interpret the exif data ourselves. Note that libvips-8.9 has a
bug where it doesn't strip the orientation flag after applying
`autorot`, which leads to the image being incorrectly rotated a second
time when generating the thumbnail. Use libvips-8.11 instead.
2021-09-23 00:10:00 -05:00
evazion
e7a455ea44 Merge pull request #4884 from nonamethanks/remove_long_image
Posts: stop autotagging long_image
2021-09-22 23:09:08 -05:00
evazion
ee5cd8330d uploads: fix exception when pruning expired uploads.
Hourly pruning of expired uploads was failing because of nil deference
errors in `media_asset.destroy!`. There are various cases where an
upload doesn't have a media asset, for example when the source url
fails to download or when the upload is of an invalid filetype.
2021-09-22 13:24:27 -05:00
evazion
6740ef17ab posts: fix detection of exif_rotation tag.
`IFD0:Orientation` is the orientation of the main image.
`IFD1:Orientation` is the orientation of the embedded thumbnail, if it
has one. Using `IFD1:Orientation` was incorrect here because some images
have a non-rotated main image but a rotated thumbnail. Post #1023563 is
an example.
2021-09-22 11:17:28 -05:00
evazion
b378785582 Fix #3692: Rotate pictures based on metadata
Rotate the image based on the EXIF orientation flag when generating
thumbnails and samples.

Also fix the width and height to be calculated correctly for rotated
images. Vips gives us the unrotated width and height of the image; we
have to detect whether the image is rotated and swap the width and
height manually to correct them. For example, if an image with the
"Rotate 90 CW" flag is 100x500 before rotation, then after rotation it's
500x100. This should fix #4883 (Exif rotation breaks Javascript fit-to-window)

We also have to fix it so that regenerating a post updates the width and
height of the post, in the event that it's a rotated image.

Finally we set `image-orientation: from-image;` even though it's
probably not necessary.
2021-09-22 11:12:50 -05:00
nonamethanks
ce8c8e1ab7 Posts: stop autotagging long_image 2021-09-22 11:16:52 +02:00
evazion
5af21f03de versions: default ARCHIVE_DATABASE_URL to DATABASE_URL.
Make it so that when ARCHIVE_DATABASE_URL isn't set, it defaults to
DATABASE_URL. In other words, if you don't have a separate archive
database configured, then default to using the main database for
post/pool versions.

Fixes an issue where running the test suite would fail if you didn't
explicitly set ARCHIVE_DATABASE_URL because it tried to use
`archive_test` as the post/pool versions database name.
2021-09-22 00:34:28 -05:00
evazion
c69ba54b5a Fix #4442: Autotag image metadata.
Autotag `greyscale`, `non-repeating_animation`, and `exif_rotation`.

Note that this does not detect all (or even most) greyscale images.
Artists often save greyscale images as RGB instead of as greyscale.
2021-09-21 11:18:06 -05:00
evazion
d5981754c4 posts: automatically tag animated_gif & animated_png on tag edit.
Automatically tag animated_gif and animated_png when a post is edited.
Add them back if the user tries to remove them from an animated post,
or remove them if the user tries to add them to a non-animated post.

Before we added these tags at upload time, but it was possible for users
to remove them after upload, or to incorrectly add them to non-animated
posts. They were added at upload time because we couldn't afford to open
the file and parse the metadata on every tag edit. Now that we save the
metadata in the database, we can do this.

This also makes it so you can't tag ugoira on non-ugoira files.

Known bug: it's possible to have an animated GIF where every frame is
identical. Post #3770975 is an example. This will be detected as an
animated GIF even though visually it doesn't appear to be animated.

Fixes #4041: Animated_gif tag not added to preprocessed uploads
2021-09-21 08:26:02 -05:00
evazion
1d8a3bf09f BURs: allow failed BURs to be reapproved.
Fix it so that you can reapprove a failed BUR to run it again. Before
this would fail because it would end up trying to create the aliases
or implications again, which would fail because they already existed.
Now it ignores when an alias or implication already exists. It will
however finish tagging the posts if they haven't been fully moved.
2021-09-20 16:29:38 -05:00
evazion
98b3c82ac5 tests: fix deadlock during artist ban test.
The artist ban tests deadlocked because of a weird interaction between
threads and database transactions when tagging posts in parallel. Add a
hack to work around it.
2021-09-20 02:09:14 -05:00
evazion
d854bf6b53 BURs: update posts in parallel.
When processing an alias, rename, implication, mass update, or nuke,
update the posts in parallel. This means that if we alias foo to bar,
for example, then we use four processes at once to retag the posts from
foo to bar.

This doesn't mean that if we have two aliases in a BUR, we process both
aliases in parallel. It simply means that when processing an alias, we
update the posts in parallel for that alias.
2021-09-20 01:12:14 -05:00