Refactor StorageManager to remove all image URL generation code. That
code now lives in MediaAsset instead.
Now StorageManager is only concerned with how to read and write files to
remote storage backends like S3 or SFTP, not with how image URLs should
be generated. This way the file storage code isn't tightly coupled to
posts, so it can be used to store any kind of file, not just images
belonging to posts.
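A rough sketch of the new division of labor (method names here are
illustrative, not Danbooru's actual API):

```ruby
# StorageManager: only knows how to move bytes to and from a backend.
class StorageManager
  def store(io, path); raise NotImplementedError; end
  def open(path);      raise NotImplementedError; end
  def delete(path);    raise NotImplementedError; end
end

# MediaAsset: knows its own md5 and file extension, so URL generation
# lives here instead of in the storage layer.
class MediaAsset < ApplicationRecord
  def file_url(variant = :original)
    "https://cdn.example.com/#{variant}/#{md5[0..1]}/#{md5[2..3]}/#{md5}.#{file_ext}"
  end
end
```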
Fix it so that when a post is expunged, the media asset is also marked
as expunged. This way the files will be deleted, but the media asset
will still remain as a record of what was expunged. The media asset will
have the md5, width, height, file ext, and file size of the deleted file.
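Roughly, the flow looks like this (a sketch; the actual callback and
method names may differ):

```ruby
class Post < ApplicationRecord
  belongs_to :media_asset

  def expunge!
    media_asset&.expunge!
    destroy
  end
end

class MediaAsset < ApplicationRecord
  def expunge!
    delete_files!              # remove the files from storage (hypothetical helper)
    update!(status: :expunged) # keep the record: md5, width, height, file_ext, file_size
  end
end
```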
Switch the font to 11px bold Arial. This is more compact and more
readable than 9px Tahoma. Also add a slight border radius and margins
around the indicator to make it stand out from the edge of the image.
Fix various elements to use standard font sizes instead of ad-hoc sizes.
Noticeable changes:
* Tags in autocomplete are slightly smaller.
* The favorite heart icon on posts is slightly smaller.
* Pool titles on thumbnails in the pool gallery page are slightly bigger.
* The page footer is slightly smaller.
* Timestamps on comments and forum posts are very slightly smaller.
* "Pending"/"approved"/"rejected" labels on forum posts are very slightly smaller.
Use rem units for font sizes so that font sizes are relative to the root
<html> element, not the parent element.
Fixes an issue where the video duration indicator would be too small on
parent/child thumbnails in post show pages. This was because of nesting
issues with em units. Em units are relative to their parent element, so
if you had a parent element with a font size of 0.8em and a child
element also with a font size of 0.8em, then the child's final computed
font size would be 0.8 × 0.8 = 0.64 times the base font size.
Show the length of videos and animated posts in the thumbnail. The
length is shown in the top left corner in MM:SS format. This replaces the
play button icon.
Show a speaker icon instead of a music note icon for posts with sound.
Doing this requires adding `.includes(:media_asset)` in a number of
places to avoid N+1 queries when we access the post's duration.
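For example, when rendering a page of thumbnails (a sketch; `duration`
living on MediaAsset is inferred from this commit):

```ruby
# Without eager loading, post.media_asset.duration would issue one
# MediaAsset query per thumbnail (an N+1 query).
posts = Post.order(id: :desc).limit(20).includes(:media_asset)

posts.each do |post|
  duration = post.media_asset&.duration # already loaded, no extra query
end
```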
Mark assets that have missing files as expunged. Missing files happened
with uploads that were abandoned and had their files deleted, but whose
media asset record wasn't destroyed.
Fixes an issue where uploads could have missing files because someone
resumed an abandoned upload that had its files deleted.
Lower the priority of the populate saved search job. This is so that
large numbers of saved searches don't overwhelm the job queue and
prevent higher priority jobs from running.
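With ActiveJob this is a one-line change; the exact priority value here
is an assumption:

```ruby
class PopulateSavedSearchJob < ApplicationJob
  # For most queue adapters, a higher number means a lower priority, so
  # bulk saved-search jobs yield to more urgent work.
  queue_with_priority 20
end
```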
Don't delete replaced files after 30 days. There are only about 30k
replacements in total, so the cost of keeping replaced files is
negligible. The old behavior was also buggy: the media asset wasn't
destroyed along with the files, leaving active media assets with missing files.
Don't destroy Pixiv Ugoira frame data when the media asset is destroyed.
Destroying it was wrong because pruning uploads could delete the frame
data of an active post.
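In association terms the fix amounts to dropping the cascade (the
association name is an assumption):

```ruby
class MediaAsset < ApplicationRecord
  # Was `has_one :pixiv_ugoira_frame_data, dependent: :destroy`; destroying
  # the asset no longer takes the frame data down with it.
  has_one :pixiv_ugoira_frame_data
end
```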
Add an md5 uniqueness constraint on media assets to prevent duplicate
assets from being created. This way we can guarantee that there is one
active media asset per uploaded file.
Also make it so that if two people are uploading the same file at the
same time, the file is processed only once.
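A sketch of both halves; the partial-index condition on active assets is
an assumption:

```ruby
class AddMd5UniquenessToMediaAssets < ActiveRecord::Migration[6.1]
  def change
    # One active asset per md5; expunged assets don't count.
    add_index :media_assets, :md5, unique: true, where: "status = 'active'"
  end
end

# With the index in place, simultaneous uploads of the same file can race
# safely: the loser's INSERT fails on the unique index and Rails hands
# back the winner's row, so the file is only processed once.
def find_or_create_asset(md5)
  MediaAsset.create_or_find_by!(md5: md5)
end
```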
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
the MediaAsset is created, not when the post is created. This way it's
possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
with the ugoira frame data already attached to it, instead of returning it
in the `data` field then passing it around separately in the `context`
field of the upload.
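A rough sketch of the last point (the download helper and predicate are
hypothetical):

```ruby
# Before: frame data came back in `data` and was threaded through the
# upload's `context` field. Now it rides along on the MediaFile itself.
def download_file!(url)
  media_file = http_download(url) # hypothetical helper returning a MediaFile
  media_file.frame_data = ugoira_frame_data if ugoira?
  media_file
end
```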
Move more of the file-handling logic from UploadService and
StorageManager into MediaAsset. This is part of refactoring posts and
uploads to allow multiple images per post.
Fix how the duration of videos and animated GIFs / PNGs is calculated.
If we can't determine the duration from the file metadata, then play the
entire video or animation back using FFmpeg and scrape the duration and
frame count.
This is necessary for things like WebM files where the duration metadata
is optional, or animated GIFs and PNGs that don't have a duration field
in the metadata, only a frame count and a sequence of frame delays.
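The fallback is essentially a full decode to a null muxer; a sketch,
assuming ffmpeg is on the PATH:

```ruby
require "open3"

# Decode the whole file without writing any output, then parse ffmpeg's
# final progress line, which reports the total frame count and elapsed time.
def probe_duration_by_decoding(path)
  _out, err, _status = Open3.capture3("ffmpeg", "-i", path, "-f", "null", "-")

  frame_count = err.scan(/frame=\s*(\d+)/).flatten.last.to_i
  h, m, s = err.scan(/time=(\d+):(\d+):(\d+\.\d+)/).last
  duration = h.to_i * 3600 + m.to_i * 60 + s.to_f

  [duration, frame_count]
end
```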
Refactor full-text search on several tables (comments, dmails,
forum_posts, forum_topics, notes, and wiki_pages) to use to_tsvector
expression indexes instead of dedicated tsvector columns. This way
full-text search works the same way across all tables.
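For one table, the change looks roughly like this (assuming the default
English text search configuration):

```ruby
class IndexWikiPagesWithTsvectorExpression < ActiveRecord::Migration[6.1]
  def change
    # An expression index means there's no separate tsvector column to
    # keep in sync with the body on every write.
    add_index :wiki_pages, "to_tsvector('english', body)",
              using: :gin, name: "index_wiki_pages_on_body_tsvector"
  end
end

# Queries have to use the same expression for the index to apply:
WikiPage.where("to_tsvector('english', body) @@ plainto_tsquery('english', ?)", "tag group")
```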
API changes:
* Changed /wiki_pages.json?search[body_matches] to match against only
the body. Before `body_matches` matched against both the title and the body.
* Added /wiki_pages.json?search[title_or_body_matches] to match against
both the title and the body.
* Fixed /dmails.json?search[message_matches] to match against both the
title and body when doing a wildcard search. Before a wildcard search
only matched against the body.
* Added /dmails.json?search[body_matches] to match against only the dmail body.
Make it so that when a database call inside a `with_timeout` block times
out, the error logged to New Relic is marked as expected. This is so
that expected timeouts, such as timeouts when calculating search counts
or timeouts when generating related tags for the sidebar, don't count
against the error rate.
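A simplified sketch of the pattern, using the New Relic Ruby agent's
`expected:` flag (the real `with_timeout` does more than this):

```ruby
def with_timeout(milliseconds)
  ActiveRecord::Base.connection.execute("SET statement_timeout = #{milliseconds.to_i}")
  yield
rescue ActiveRecord::QueryCanceled => error
  # Record the timeout in New Relic without counting it as a real error.
  NewRelic::Agent.notice_error(error, expected: true)
  nil
ensure
  ActiveRecord::Base.connection.execute("SET statement_timeout = DEFAULT")
end
```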
Drop the final dependency on the Postgres test_parser extension.
We also have to remove references to test_parser from the migration where
it was first defined, otherwise replaying all migrations from the
beginning will fail. Replaying from scratch normally only happens in
testing.
After this, it should be possible to use a vanilla install of Postgres
with Danbooru. It's still recommended to use Danbooru's Docker image for
Postgres (https://ghcr.io/danbooru/postgres), as other Postgres extensions
may be necessary in the future.
Restructure the Dockerfile and the CSS/JS files so that we only rebuild
the CSS and JS when they change, not on every commit.
Previously, rebuilding the Docker image took several minutes after every
commit, even when the JS/CSS files didn't change. This also made pulling
images slower.
This requires refactoring the CSS and JS to not use embedded Ruby (ERB)
templates, since this made the CSS and JS dependent on the Ruby
codebase, which is why we had to rebuild the assets after every Ruby
change.
Move all the code for defining tag categories from the config file to
TagCategory. It didn't belong in the config because it's not possible to
add new tag categories purely in the config without editing other things
like the CSS.
Also change it so that tag colors are hardcoded in the CSS instead of
generated using ERB. Generating the CSS in ERB meant that the Docker
build had to recompile the CSS on every commit, even when it didn't
change, because the CSS depended on Ruby code that we couldn't guarantee
hadn't changed.
Try to optimize certain types of common slow searches:
* Searches for mutually-exclusive tags (e.g. `1girl multiple_girls`,
`touhou solo -1girl -1boy`)
* Relatively large tags that are heavily skewed towards old posts
(e.g. lucky_star, haruhi_suzumiya_no_yuuutsu, inazuma_eleven_(series),
imageboard_desourced).
* Mid-sized tags in the <30k post range that Postgres thinks are
  big enough to warrant a post id index scan, when a tag index scan is
  actually faster.
The general pattern is Postgres not using the tag index because it
thinks scanning down the post id index would be faster, but it's
actually much slower because it degrades to a full table scan. This
usually happens when Postgres thinks a tag is larger or more common than
it really is. Here we try to force Postgres into using the tag index
when we know the search is small.
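One heavily simplified way to express the trick; Danbooru's real schema
differs, and the `PostTag` join table here is hypothetical:

```ruby
# If the tag's cached post count says the search is small, pull the
# matching post ids through the tag index first, then page over that id
# list. This stops the planner from switching to a post id index scan
# that degrades into a full table scan.
def search_small_tag(tag, limit: 20)
  post_ids = PostTag.where(tag_id: tag.id).pluck(:post_id)
  Post.where(id: post_ids).order(id: :desc).limit(limit)
end
```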
One case that is still slow is `2girls -multiple_girls`. This returns no
results, but we can't know that without searching all of `2girls`. The
general case is searching for `A -B` where A is a subset of B and A and B
are both large tags.
Hopefully fixes #581, #654, #743, #1020, #1039, #1421, #2207, #4070,
#4337, #4896, and various other issues raised over the years regarding
slow searches.
When a search is performed, we cache the post count so we don't have to
calculate it again every time the user switches pages. However, if the
count timed out, we didn't cache it, so we redid the slow count on every
page load. This usually happens on multi-tag searches that return a lot
of results, `1girl solo` for example.
This changes it so that the count is cached even when it times out. This
will speed up large multi-tag searches.
This also changes it so that the count is cached for a fixed 5 minutes.
Before, the cache duration varied with the size of the count, but that
probably didn't make much difference.
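A sketch of the new caching behavior, with `fast_count` standing in for
the real count method:

```ruby
def cached_post_count(tag_string)
  count = Rails.cache.fetch("post-count:#{tag_string}", expires_in: 5.minutes) do
    fast_count(tag_string) || :timed_out # cache timeouts too, not just successes
  end

  count == :timed_out ? nil : count
end
```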