Allow searching the /uploads and /media_assets pages by the following metatags:
* id:
* md5:
* width:
* height:
* duration:
* mpixels:
* ratio:
* filesize:
* filetype:
* date:
* age:
* status:<processing|active|deleted|expunged|failed> (for /media_assets)
* status:<pending|processing|active|failed> (for /uploads)
* is:<filetype>, is:<status>
* exif:
Examples:
* https://betabooru.donmai.us/media_assets?search[ai_tags_match]=filetype:png
* https://betabooru.donmai.us/uploads?search[ai_tags_match]=filetype:png
Note that in /uploads search, the id:, date:, and age: metatags refer to the upload media asset, not
the upload itself.
Note also that uploads may contain multiple assets, so for example searching uploads by
`filetype:png` will return all uploads containing at least one PNG file, even if they contain other
non-PNG files.
Fix a bug where it was possible to submit blank text in various text fields.
Caused by `String#blank?` not considering certain Unicode characters as blank. `blank?` is defined
as `match?(/\A[[:space:]]*\z/)`, where `[[:space:]]` matches ASCII spaces (space, tab, newline, etc.)
and Unicode characters in the Space category ([1]). However, there are other space-like characters
not in the Space category. These include U+200B (Zero-Width Space) and many others.
It turns out the "Default ignorable code points" [2][3] are what we're after. These are the set of 400
or so formatting and control characters that are invisible when displayed.
Note that there are other control characters that aren't invisible when rendered; instead, they're
shown with a placeholder glyph. These include the C0 and C1 control codes [4], certain Unicode
control characters [5], and unassigned, reserved, and private use codepoints.
There is one outlier: the Braille pattern blank (U+2800) [6]. This character is visually blank, but is
not considered to be a space or an ignorable code point.
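A minimal sketch of a stricter blankness check, assuming Ruby's Onigmo regex engine, which supports the `\p{Default_Ignorable_Code_Point}` property. The constant and method names are hypothetical, and this is not necessarily how Danbooru implements the fix. It treats whitespace, default ignorable code points, and the Braille pattern blank as invisible.

```ruby
# Hypothetical pattern: [[:space:]] (what String#blank? uses), plus default
# ignorable code points (U+200B Zero-Width Space, U+00AD soft hyphen, U+FEFF,
# etc.), plus the Braille pattern blank (U+2800), which is in neither set.
INVISIBLE_TEXT = /\A[[:space:]\p{Default_Ignorable_Code_Point}\u2800]*\z/

# Returns true if the string would render as nothing visible.
def invisible_text?(string)
  INVISIBLE_TEXT.match?(string)
end

"\u200B".match?(/\A[[:space:]]*\z/) # => false (the original bug)
invisible_text?("\u200B")           # => true
invisible_text?("a\u200B")          # => false
```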
[1]: https://codepoints.net/search?gc[]=Z
[2]: https://codepoints.net/search?DI=1
[3]: https://www.unicode.org/review/pr-5.html
[4]: https://codepoints.net/search?gc[]=Cc
[5]: https://codepoints.net/search?gc[]=Cf
[6]: https://codepoints.net/U+2800
[7]: https://en.wikipedia.org/wiki/Whitespace_character
[8]: https://character.construction/blanks
[9]: https://invisible-characters.com
Add ability to undelete accounts from within the console. Their password is reset, their name is
restored to their last known user name, and a mod action is logged.
Upload files in natural order rather than archive order when uploading archive files.
Before, files were listed in the same order they appeared in the zip file. This could be
non-alphabetical, and files from different directories could even be interleaved with each
other. Now files are uploaded in natural order, which is alphabetical order but with numbers sorted
properly, so that `file-9.jpg` appears before `file-10.jpg`.
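A minimal natural-sort sketch in plain Ruby. The method names are hypothetical and this is not necessarily the sort Danbooru uses. Each filename is split into runs of digits and non-digits, and digit runs are compared as integers.

```ruby
# "file-10.jpg" becomes [[1, "file-"], [0, 10], [1, ".jpg"]]. Tagging digit
# runs with 0 and text runs with 1 means a number is never compared directly
# against a string, so Array#<=> is always well defined.
def natural_sort_key(filename)
  filename.scan(/\d+|\D+/).map do |chunk|
    chunk.match?(/\A\d+\z/) ? [0, chunk.to_i] : [1, chunk]
  end
end

# Falls back to plain string order to break ties like "01" vs "1".
def natural_sort(filenames)
  filenames.sort_by { |filename| [natural_sort_key(filename), filename] }
end

natural_sort(["file-10.jpg", "file-9.jpg", "a/file-2.jpg"])
# => ["a/file-2.jpg", "file-9.jpg", "file-10.jpg"]
```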
A couple of non-obvious consequences:
* Users can't flag non-rating:G posts in safe mode.
* Non-Gold users can flag Gold-only posts if they're the uploader.
* Add og:image:width, og:image:height, and og:image:type tags.
* Use og:video tags for videos.
* Use 720x720 instead of 150x150 preview images for videos.
* Add duration tag to JSON-LD data for videos.
* Add OpenGraph tags to media assets show page.
* Respect Twitter max image size limits.
* Don't include OpenGraph image tags when someone shares a plain https://danbooru.donmai.us link
with no tag search. This caused random, potentially NSFW images to be shown when someone shared a
https://danbooru.donmai.us link on social media, and those previews could be cached for long periods of time.
Allow admins to delete media asset files.
This only deletes the image file itself, not the upload or media asset record. The upload will still
be in the user's upload list, but the image will be gone. The media asset page will still exist, but
it will only show the file's metadata, not the image itself. We don't delete the metadata so we have
a record of what the file's MD5 was and who uploaded it, to prevent the file from being uploaded
again and to take action against the user if necessary.
Show sources on the media asset show page. An asset can have more than one source if the same
file is uploaded from multiple sites.
Only sources from known sites are shown. Sources from unknown sites aren't shown because they
could potentially contain private information or identify the uploader in some way.
Known issue: Twitter posts often show two sources, the direct image URL and the page URL. This is
because someone uploaded the direct image URL first, and we're not able to tell that the image URL
and the page URL are for the same tweet.
* When a tag's category is changed, also change the category of any aliases pointing to it. For
example, if "ff7" is aliased to "final_fantasy_vii", and "final_fantasy_vii" is changed to a
copyright tag, then change the empty "ff7" tag to be a copyright tag too.
* Don't allow changing the category of an aliased tag. For example, if "ff7" is aliased to
"final_fantasy_vii", then don't allow changing the "ff7" tag to be a non-copyright tag.
This ensures that the categories of aliased tags stay in sync with that of their parent tags. This
way aliased tags are colored correctly in wikis and other places.
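A sketch of the two rules above using a hypothetical in-memory registry. The class and method names are illustrative only; real tags and aliases live in the database. Recategorizing a parent tag propagates to its aliases, and an aliased tag can't be moved to a category different from its parent's.

```ruby
# Hypothetical in-memory model; real tags and aliases live in the database.
SketchTag = Struct.new(:name, :category)

class TagAliasRegistry
  def initialize
    # Unknown tags spring into existence as empty general tags.
    @tags = Hash.new { |hash, name| hash[name] = SketchTag.new(name, "general") }
    @aliases = {} # antecedent name => consequent name
  end

  def [](name)
    @tags[name]
  end

  def create_alias(antecedent, consequent)
    @aliases[antecedent] = consequent
    @tags[antecedent].category = @tags[consequent].category
  end

  def change_category(name, category)
    # An aliased tag can't diverge from the tag it's aliased to.
    parent = @aliases[name]
    if parent && category != @tags[parent].category
      raise ArgumentError, "#{name} is aliased to #{parent}"
    end

    # Recategorizing a tag also recategorizes every tag aliased to it.
    @tags[name].category = category
    @aliases.each do |antecedent, consequent|
      @tags[antecedent].category = category if consequent == name
    end
  end
end

tags = TagAliasRegistry.new
tags.create_alias("ff7", "final_fantasy_vii")
tags.change_category("final_fantasy_vii", "copyright")
tags["ff7"].category # => "copyright"
```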
When regenerating thumbnails for a media asset, don't redistribute the original file. This is
unnecessary and also slow if it's a large file on remote storage.
Don't allow favgroup names that:
* Start or end with underscores.
* Contain multiple underscores in a row.
* Contain asterisks or non-printable characters.
* Consist of only underscores.
* Consist of only digits (conflicts with `favgroup:1234` syntax).
Add a fix script that fixes favgroups that violate these rules and notifies the user.
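A sketch of the naming rules as a plain Ruby check. The method name and error messages are hypothetical. "Non-printable" is approximated here with `[[:cntrl:]]` (control characters), which is an assumption and may not match Danbooru's exact definition.

```ruby
# Returns a list of rule violations for a proposed favgroup name.
def favgroup_name_errors(name)
  errors = []
  errors << "can't start or end with an underscore" if name.start_with?("_") || name.end_with?("_")
  errors << "can't contain multiple underscores in a row" if name.include?("__")
  # Assumption: control characters stand in for "non-printable" characters.
  errors << "can't contain asterisks or control characters" if name.match?(/[*[:cntrl:]]/)
  errors << "can't consist of only underscores" if name.match?(/\A_+\z/)
  # An all-digit name would be ambiguous with the favgroup:1234 syntax.
  errors << "can't consist of only digits" if name.match?(/\A\d+\z/)
  errors
end

favgroup_name_errors("my_favorites") # => []
favgroup_name_errors("1234")         # => ["can't consist of only digits"]
```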
Fix the ban! and unban! methods to:
* Lock the artist while it is being banned or unbanned.
* Perform the edits as a mass update, so that the posts are updated in parallel.
* Edit the artist as the banner rather than as the current user.
* Soft delete the banned_artist implication when an artist is unbanned instead of hard deleting it.
* Ignore the banned_artist implication if it's deleted.
When a user is banned, send them a "You have been banned" dmail instead of a "Your user record has
been updated" dmail.
When a user loses approver status due to inactivity, don't send them a "Your user record has been
updated" dmail for the "Lost approver privileges" neutral feedback they receive.
Fix a bug where, when uploading an entire 4chan thread, the source of each post would get set to
the 4chan thread rather than to the individual 4chan post.
Allow uploading .zip, .rar, and .7z files from disk. The archive will be extracted and the images
inside will be uploaded.
This only works for archive files uploaded from disk, not from a source URL.
Post source URLs will look something like this: "file://foo.zip/1.jpg", "file://foo.zip/2.jpg", etc.
Sometimes artists use Shift JIS or other encodings instead of UTF-8 for filenames. In these cases
we just assume the filename is UTF-8 and replace invalid characters with '?', so some filenames may
be garbled.
There are various protections to prevent uploading malicious archive files:
* Archives with more than 100 files aren't allowed.
* Archives that decompress to more than 100MB aren't allowed.
* Archives with filenames containing '..' components aren't allowed (e.g. '../../../../../etc/passwd').
* Archives with filenames containing absolute paths aren't allowed (e.g. '/etc/passwd').
* Archives containing symlinks aren't allowed (e.g. 'foo -> /etc/passwd').
* Archive types other than .zip, .rar, and .7z aren't allowed (e.g. .tar.gz, .cpio).
* File permissions, owners, and other metadata are ignored.
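A sketch of the filename and size checks above, run over hypothetical archive entry metadata. Actually reading .zip, .rar, or .7z files is out of scope here, and `ArchiveEntry`, `archive_errors`, and `utf8_filename` are illustrative names, not Danbooru's code. "100MB" is assumed to mean 100 MiB.

```ruby
# Hypothetical metadata for one archive member.
ArchiveEntry = Struct.new(:name, :uncompressed_size, :symlink, keyword_init: true)

MAX_ENTRIES = 100
MAX_TOTAL_BYTES = 100 * 1024 * 1024 # assumption: "100MB" read as 100 MiB

# Treat the raw filename bytes as UTF-8 and replace invalid bytes with "?".
def utf8_filename(raw_name)
  raw_name.dup.force_encoding(Encoding::UTF_8).scrub("?")
end

# Returns a list of reasons to reject the archive (empty if it passes).
def archive_errors(entries)
  errors = []
  errors << "more than #{MAX_ENTRIES} files" if entries.size > MAX_ENTRIES
  errors << "decompresses to more than 100MB" if entries.sum(&:uncompressed_size) > MAX_TOTAL_BYTES

  entries.each do |entry|
    name = utf8_filename(entry.name)
    errors << "#{name}: absolute path" if name.start_with?("/")
    errors << "#{name}: '..' path component" if name.split("/").include?("..")
    errors << "#{name}: symlink" if entry.symlink
  end

  errors
end
```

Sizes declared in archive headers can be forged, so a real extractor should also count bytes as it decompresses rather than trusting `uncompressed_size` alone.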
Partial fix for #5340: Add support for extracting archive attachments from certain sources
Fix temp files generated during the upload process not being cleaned up quickly enough. This included
downloaded files, generated preview images, and Ugoira video conversions.
Before, we relied on `Tempfile` cleaning up files automatically. But this only happened when the
Tempfile object was garbage collected, which could take a long time. In the meantime, we could have
hundreds of megabytes of temp files lying around.
The fix is to explicitly close temp files when we're done with them. But the standard `Tempfile`
class doesn't immediately delete the file when it's closed. So we also have to introduce a
Danbooru::Tempfile wrapper that deletes the tempfile as soon as it's closed.
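A minimal sketch of such a wrapper, assuming it can be a subclass of Ruby's standard `Tempfile`. `EagerTempfile` is a hypothetical name, not the actual `Danbooru::Tempfile`. The standard `Tempfile#close` takes an `unlink_now` argument that defaults to `false`; this subclass flips the default so closing also deletes the file.

```ruby
require "tempfile"

# Hypothetical stand-in for a Danbooru::Tempfile-style wrapper: #close deletes
# the file immediately by default, instead of leaving it on disk until the
# object is garbage collected.
class EagerTempfile < Tempfile
  def close(unlink_now = true)
    super(unlink_now)
  end
end

file = EagerTempfile.new("upload")
path = file.path
file.write("data")
file.close
File.exist?(path) # => false
```

For short-lived files scoped to a single block, the standard `Tempfile.create { |file| ... }` also deletes the file when the block returns.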
Fix three exploits that allowed one to keep using their account after it was deleted:
* It was possible to use session cookies from another computer to log in after you deleted your account.
* It was possible to use API keys to make API requests after you deleted your account.
* It was possible to request a password reset, delete your account, then use the password reset link
to change your password and log in to your deleted account.
* Don't delete the user's favorites unless private favorites are enabled. The general rule is that
public account activity is kept and private account activity is deleted.
* Delete the user's API keys, forum topic visits, private favgroups, downvotes, and upvotes (if
privacy is enabled).
* Reset all of the user's account settings to default. This means custom CSS is deleted, where it
wasn't before.
* Delete everything but the user's name and password asynchronously.
* Don't log the current user out if it's the owner deleting another user's account.
* Fix #5067 (Mod actions sometimes not created for user deletions) by wrapping the deletion process
in a transaction.
Automatically add the `sound` tag if the post has sound. Remove the tag if the post doesn't have sound.
A video is considered to have sound if its peak loudness is greater than -70 dB. The quietest post currently
on Danbooru has a peak loudness of -62 dB (post #3470668), but it's possible to have audible sound at
-80 dB or possibly even lower. It's hard to draw a clear line between "silent" and "barely audible".
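A sketch of the threshold check, assuming decoded audio samples normalized to floats in [-1.0, 1.0]. How Danbooru actually measures peak loudness (for example with FFmpeg) isn't shown here, and the method names are hypothetical. Peak level in dBFS is 20 * log10 of the largest absolute sample value.

```ruby
SOUND_THRESHOLD_DB = -70.0

# Peak level in dBFS. Digital silence (or no samples at all) is -Infinity.
def peak_db(samples)
  peak = samples.map(&:abs).max
  return -Float::INFINITY if peak.nil? || peak.zero?

  20 * Math.log10(peak)
end

def has_sound?(samples)
  peak_db(samples) > SOUND_THRESHOLD_DB
end
```

For scale, -70 dB is a peak amplitude of 10^(-70/20), about 0.00032 of full scale, or roughly 10 out of 32768 in 16-bit audio. The -62 dB post peaks at about 26 out of 32768.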
If a media asset is corrupt, include the error message from libvips or
ffmpeg in the "Vips:Error" or "FFmpeg:Error" fields in the media
metadata table.
Corrupt files can't be uploaded nowadays, but they could be in the past,
so we have some old corrupted files that we can't generate thumbnails
for. This lets us mark these files in the metadata so they're findable
with the tag search `exif:Vips:Error`.
Known bug: Vips has a single global error buffer that is shared between
threads and that isn't cleared between operations. So we can't reliably
get the actual error message because it may pick up errors from other
threads, or from previous operations in the same thread.
When searching posts by width, height, file size, or file extension, use the
values from the media_assets table rather than the posts table.
This makes filetype: searches faster because the file_ext is indexed on
the media assets table, but not on the posts table.
This paves the way for getting rid of the width, height, file_size, and
file_ext indexes on the posts table in the future. It's wasteful to
index these columns on both the posts table and the media assets table.
Add a `MediaAsset#regenerate!` method that regenerates everything about
the asset, including the metadata, thumbnails, IQDB, cached Cloudflare
URLs, and AI tags.
Fixes it so that a) it's possible to regenerate media assets that aren't
attached to posts, and b) regenerating a post regenerates everything. Before,
it didn't regenerate the metadata, AI tags, or all of the cached URLs.
Fix it so that trying to regenerate AI tags for a Flash file doesn't
fail because Flash files have no image preview.
Also let `MediaFile.open` take a block argument.
Fix certain corrupt GIFs returning dimensions of 0x0. This happened
when the GIF was too corrupt for libvips to read. Fixed by using
ExifTool to read the dimensions instead.
Also add validations to ensure that it's not possible to have media
assets with a width or height of 0.
Add a script to go through every media asset and check the metadata
(width, height, duration, filesize, md5, EXIF metadata) and update it
if it's changed. This is necessary after upgrading ExifTool because the
metadata it returns may have changed.
Fix StatementInvalid exception when uploading https://files.catbox.moe/vxoe2p.mp4.
This was a result of multiple bugs:
* First, generating thumbnails for the video failed. This was because
the video uses the AV1 codec, which FFmpeg failed to decode. It failed
because our version of FFmpeg was built without the `--enable-libdav1d`
flag, so it uses the builtin AV1 decoder, which apparently can't
handle this particular video (it spews a bunch of errors about "Failed
to get pixel format" and "missing sequence header" and "failed to get
reference frame").
* Because generating the thumbnails failed, an exception was raised. We
tried to save the error message in the upload_media_assets.error
field. However, this also failed because the error message was 77 KB
long (it contained the entire output of the ffmpeg command), but the
`upload_media_assets` table had a btree index on the `error` column,
which meant the maximum length of the error column was limited to
~2.7 KB. This led to a StatementInvalid exception being raised.
* Because the StatementInvalid exception was raised while we were trying
to set the upload media asset's status to `failed`, the upload was
left stuck in the `processing` state rather than being set to the
`failed` state.
* Because the upload was stuck in the `processing` state, the upload
page would hang forever waiting for the upload to complete.
The fixes are to:
* Build FFmpeg with `--enable-libdav1d` to use libdav1d for decoding AV1
videos instead of the builtin AV1 decoder.
* Remove the index on the `upload_media_assets.error` column so that
setting overly long error messages won't fail.
* Catch unexpected exceptions in ProcessUploadMediaAssetJob so we can
mark uploads as failed, even if `process_upload!` itself fails because
it raises an unexpected exception inside its own exception handler.
* Check that the video is playable with `MediaFile::Video#is_corrupt?` before
allowing it to be uploaded. This way we can return a better error
message if we can't generate thumbnails because the video isn't
playable. This requires decoding the entire video, so it means uploads
may take several seconds longer for long videos. It's also a potential security
risk if FFmpeg has any exploitable bugs.
* Define `MediaAsset#preview!` as raising an exception on error, so
it's clear that generating thumbnails can fail. Define `MediaAsset#preview`
as returning nil on error for when we don't care about the cause of
the error.
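A sketch of two of the fixes above: the `preview!`/`preview` naming convention, and a last-resort rescue in the job so an upload never stays stuck in `processing`. `FakeUploadMediaAsset`, `PreviewError`, and `perform_upload_job` are hypothetical stand-ins, not Danbooru's classes.

```ruby
class PreviewError < StandardError; end

# Hypothetical stand-in modeling only what's needed for the two conventions.
class FakeUploadMediaAsset
  attr_reader :status, :error

  def initialize(decodable:)
    @decodable = decodable
    @status = "processing"
  end

  # Bang version: raises when thumbnails can't be generated.
  def preview!
    raise PreviewError, "failed to decode video" unless @decodable

    "preview.jpg"
  end

  # Non-bang version: returns nil on error, for callers that don't care why.
  def preview
    preview!
  rescue PreviewError
    nil
  end

  def process_upload!
    preview!
    @status = "active"
  end

  def fail!(message)
    @status = "failed"
    @error = message
  end
end

# Job-level safety net: whatever process_upload! raises, mark the upload as
# failed instead of leaving it in "processing".
def perform_upload_job(asset)
  asset.process_upload!
rescue StandardError => e
  asset.fail!(e.message)
end
```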
Add a JPEG conversion for .avif and .webp files. The `full` variant is
the .avif or .webp file converted to JPEG format, with the same
resolution as the original file (full resolution).
Known bug: When converting an HDR .avif file to .jpeg, the resulting
image is too bright compared to the original image as rendered by
Firefox or Chrome.