Commit Graph

29 Commits

Author SHA1 Message Date
evazion
5f92f452fe media file: factor out file type detection code.
Factor out the file type detection code from MediaFile into a FileTypeDetector class so we can use
it to detect archive files (.zip, .rar, .7z) too.
2022-11-14 20:14:37 -06:00
evazion
3172031caa media assets: track corrupted files in media metadata.
If a media asset is corrupt, include the error message from libvips or
ffmpeg in the "Vips:Error" or "FFmpeg:Error" fields in the media
metadata table.

Corrupt files can't be uploaded nowadays, but they could be in the past,
so we have some old corrupted files that we can't generate thumbnails
for. This lets us mark these files in the metadata so they're findable
with the tag search `exif:Vips:Error`.

Known bug: Vips has a single global error buffer that is shared between
threads and that isn't cleared between operations. So we can't reliably
get the actual error message because it may pick up errors from other
threads, or from previous operations in the same thread.
2022-11-02 20:48:15 -05:00
evazion
d2c520035b media assets: fix regenerating AI tags for flash files.
Fix it so that trying to regenerate AI tags for a Flash file doesn't
fail because Flash files have no image preview.

Also let `MediaFile.open` take a block argument.
2022-11-01 16:30:50 -05:00
evazion
b41b67af6c media assets: add dynamically-generated thumbnails (owner-only).
Add ability to dynamically generate thumbnails with:

* https://danbooru.donmai.us/media_assets/6961761.jpg?width=180&height=180

This is currently restricted to the Owner-level user because it's slow.
2022-11-01 01:36:38 -05:00
evazion
83ba91425f uploads: fix .mp4 filetype detection.
Fix a bug where MP4 files with major brand "iso4" weren't detected as
MP4, so they couldn't be uploaded.

This switches our MP4 detection code to something very similar to Firefox's
MP4 sniffing algorithm. Ours is slightly wrong because a) we only check
the major_brand, not the minor_brands, and b) we falsely detect certain 3GP
videos as MP4. 3GP is a very similar format to MP4, close enough that it
can be played by Chrome (but not Firefox), but it's technically not MP4
and should not have a .mp4 file extension. We leave it alone because we
have two existing 3GP media assets that were falsely detected as MP4.

https://danbooru.donmai.us/forum_topics/22356
https://github.com/mozilla/gecko-dev/blob/master/toolkit/components/mediasniffer/nsMediaSniffer.cpp#L78
https://mimesniff.spec.whatwg.org/#signature-for-mp4
2022-10-28 03:51:46 -05:00
evazion
a9d586e93a Fix #3615: Unsupported video codecs.
Don't allow uploading videos with unsupported video codecs.

The only video codecs we allow for MP4 files are H.264 and VP9. Other
codecs, including H.265 (aka HEVC), MPEG-4 part 2, and AV1, are
disallowed because they're not universally supported by browsers.
Firefox doesn't support H.265 or MPEG-4 part 2, and Safari doesn't
support AV1.

Additionally, don't allow videos with multiple video tracks, multiple
audio tracks, or no video tracks. Multiple video and audio tracks are
disallowed because they're rare and for moderation purposes, we don't
want people hiding content in extra tracks.

These restrictions really only apply to MP4 videos, since WebM files
don't support multiple video or audio tracks and only support a limited
number of codecs (VP8 and VP9 for videos, Vorbis and Opus for audio).

There are currently 22 posts with unsupported video codecs:

* https://danbooru.donmai.us/posts?tags=video+is:mp4+-exif:Track1:CompressorID=avc1+-exif:Track2:CompressorID=avc1+-exif:Track1:CompressorID=vp09+-exif:Track2:CompressorID=vp09 # AVC1 is H.264

There is one post that has multiple audio tracks:

* https://danbooru.donmai.us/posts/2382057
2022-10-27 01:43:33 -05:00
evazion
48ecb80d6b Fix #5230: video upload 500 error (StatementInvalid) & empty error panel on page
Fix StatementInvalid exception when uploading https://files.catbox.moe/vxoe2p.mp4.

This was a result of multiple bugs:

* First, generating thumbnails for the video failed. This was because
  the video uses the AV1 codec, which FFmpeg failed to decode. It failed
  because our version of FFmpeg was built without the `--enable-libdav1d`
  flag, so it uses the builtin AV1 decoder, which apparently can't
  handle this particular video (it spews a bunch of errors about "Failed
  to get pixel format" and "missing sequence header" and "failed to get
  reference frame").

* Because generating the thumbnails failed, an exception was raised. We
  tried to save the error message in the upload_media_assets.error
  field. However, this also failed because the error message was 77kb
  long (it contained the entire output of the ffmpeg command), but the
  `upload_media_assets` table had a btree index on the `error` column,
  which meant the maximum length of the error column was limited to
  ~2.7kb. This lead to a StatementInvalid exception being raised.

* Because the StatementInvalid exception was raised while we were trying
  to set the upload media asset's status to `failed`, the upload was
  left stuck in the `processing` state rather than being set to the
  `failed` state.

* Because the upload was stuck in the `processing` state, the upload
  page would hang forever waiting for the upload to complete.

The fixes are to:

* Build FFmpeg with `--enable-libdav1d` to use libdav1d for decoding AV1
  videos instead of the builtin AV1 decoder.

* Remove the index on the `upload_media_assets.error` column so that
  setting overly long error messages won't fail.

* Catch unexpected exceptions in ProcessUploadMediaAssetJob so we can
  mark uploads as failed, even if `process_upload!` itself fails because
  it raises an unexpected exception inside its own exception handler.

* Check that the video is playable with `MediaFile::Video#is_corrupt?` before
  allowing it to be uploaded. This way we can return a better error
  message if we can't generate thumbnails because the video isn't
  playable. This requires decoding the entire video, so it means uploads
  may take several seconds longer for long videos. It's also a security
  risk in case ffmpeg has any bugs.

* Define `MediaAsset#preview!` as raising an exception on error, so
  it's clear that generating thumbnails can fail. Define `MediaAsset#preview`
  as returning nil on error for when we don't care about the cause of
  the error.
2022-10-26 22:49:55 -05:00
evazion
acea0d5553 Fix #5065: .webp images upload support
Add ability to upload .webp images.

Animated WebP images aren't supported. This is because they aren't
supported by FFmpeg yet[1], so generating thumbnails and samples for
them would be more complicated than for other formats.

[1]: https://trac.ffmpeg.org/ticket/4907
2022-10-25 22:41:36 -05:00
evazion
df0e9bc4a7 uploads: fix it being possible to upload .mkv files as .webm.
Fix it being possible to upload arbitrary .mkv files and have them
be treated as .webm. This was possible because WebM uses the Matroska
container format, and we only checked for the Matroska header, not that
the file was actually a WebM.

There were only 6 such files in production:

* https://danbooru.donmai.us/posts?tags=exif:Matroska:DocType=matroska
* https://danbooru.donmai.us/posts/5522036
* https://danbooru.donmai.us/posts/4743498
* https://danbooru.donmai.us/posts/3925427
* https://danbooru.donmai.us/posts/3147897
* https://danbooru.donmai.us/posts/2965862
* https://danbooru.donmai.us/posts/2430436

These videos are playable in Chrome, but not in Firefox, since Firefox
doesn't support .mkv files (it supports some, depending on which codecs
are used, but not .mkv files in general).
2022-10-25 19:32:31 -05:00
evazion
c96d60a840 uploads: add support for uploading .avif files.
Features of AVIF include:

* Lossless and lossy compression.
* High dynamic range (HDR) images
* Wide color gamut images (i.e. 10- and 12-bit color depths)
* Transparency (through alpha planes).
* Animations (with an optional cover image).
* Auxiliary image sequences, where the file contains a single primary
  image and a short secondary video, like Apple's Live Photos.
* Metadata rotation, mirroring, and cropping.

The AVIF format is still relatively new and some of these features aren't well
supported by browsers or other software:

* Animated AVIFs aren't supported by Firefox or by libvips.
* HDR images aren't supported by Firefox.
* Rotated, mirrored, and cropped AVIFs aren't supported by Firefox or Chrome.
* Image grids, where the file contains multiple images that are tiled
  together into one big image, aren't supported by Firefox.
* AVIF as a whole has only been supported for a year or two by Chrome
  and Firefox, and less than a year by Safari.

For these reasons, only basic AVIFs that don't use animation, rotation,
cropping, or image grids can be uploaded.
2022-10-25 03:29:58 -05:00
evazion
0a5ebcc69d uploads: refactor media asset validation logic.
Refactor the upload validation logic to not depend on the current user.
Fixes several broken upload tests.
2022-09-15 05:09:07 -05:00
evazion
a9fe73a483 ai tags: save ai tags on upload.
Save the AI tags when a media asset is uploaded.
2022-06-28 03:12:46 -05:00
evazion
58fc00e549 uploads: allow uploading iso5 .mp4 files.
This is an MP4 ftyp sometimes used by Twitter.
2022-02-09 16:48:11 -06:00
evazion
572878fb0d uploads: allow uploading .m4v format videos.
Fix not being able to upload .m4v format videos as reported here:

* https://danbooru.donmai.us/forum_posts/205248
* https://github.com/danbooru/danbooru/issues/3615#issuecomment-1030950924

From https://en.wikipedia.org/wiki/M4V:

  The M4V file format is a video container format developed by Apple and
  is very similar to the MP4 format. The primary difference is that M4V
  files may optionally be protected by DRM copy protection.

This could be a problem if it allows uploading videos that are
unplayable because of DRM.
2022-02-06 21:41:35 -06:00
evazion
1c5786d20f posts: remove cropped thumbnails. 2021-12-16 15:58:29 -06:00
evazion
a7dc05ce63 Enable frozen string literals.
Make all string literals immutable by default.
2021-12-14 21:33:27 -06:00
evazion
f86e16cfbd MediaFile: allow generating thumbnails for corrupt files.
We need this so we can regenerate thumbnails for old posts with
corrupted images.
2021-12-01 04:45:26 -06:00
evazion
e5ba6d4afc MediaFile: fix thumbnail dimension calculation.
Calculate the dimensions of thumbnails ourselves instead of letting
libvips calculate them for us. This way we know the exact size of
thumbnails, so we can set the right width and height for <img> tags. If
we let libvips calculate thumbnail sizes for us, then we can't predict
the exact size of thumbnails, because sometimes libvips rounds numbers
differently than us.
2021-12-01 04:45:26 -06:00
evazion
bc506ed1b8 uploads: refactor to simplify ugoira-handling and replacements:
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
  the MediaAsset is created, not when the post is created. This way it's
  possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
  with the ugoira frame data already attached to it, instead of returning it
  in the `data` field then passing it around separately in the `context`
  field of the upload.
2021-10-18 05:18:46 -05:00
evazion
0e901b2f84 media file: get duration of animated GIFs, PNGs, and ugoiras.
Add methods to MediaFile to calculate the duration, frame count, and
frame rate of animated GIFs, PNGs, Ugoiras, and videos.

Some considerations:

* It's possible to have a GIF or PNG that's technically animated but
  just has one frame. These are treated as non-animated images.

* It's possible to have an animated GIF that has an unspecified
  frame rate. In this case we assume the frame rate is 10 FPS; this is
  browser dependent and may not be correct.

* Animated GIFs, PNGs, and Ugoiras all support variable frame rates.
  Technically, each frame has a separate delay, and the delays can be
  different frame-to-frame. We report only the average frame rate.

* Getting the duration of an APNG is surprisingly hard. Most tools don't
  have good support for APNGs since it's a rare and non-standardized
  format. The best we can do is get the frame count using ExifTool and the
  frame rate using ffprobe, then calculate the duration from that.
2021-09-27 05:18:25 -05:00
evazion
9cc8d8aa4a metadata: add CLI script for printing image metadata
Add a utility script for printing image metadata from the command line.

Usage: `bin/lsmetadata 1.jpg 2.jpg`
2021-09-15 21:39:56 -05:00
evazion
3d660953d4 Add MediaMetadata model.
Add a model for storing image and video metadata for uploaded files.

Metadata is extracted using ExifTool. You will need to install ExifTool
after this commit. ExifTool 12.22 is the minimum required version
because we use the `--binary` option, which was added in this release.

The MediaMetadata model is separate from the MediaAsset model because
some files contain tons of metadata, and most of it is non-essential.
The MediaAsset model represents an uploaded file and contains essential
metadata, like the file's size and type, while the MediaMetadata model
represents all the other non-essential metadata associated with a file.

Metadata is stored as a JSON column in the database.

ExifTool returns all the file's metadata, not just the EXIF metadata.
EXIF is one of several types of image metadata, hence why we call
it MediaMetadata instead of EXIFMetadata.
2021-09-08 05:00:54 -05:00
evazion
00ca7526bb docs: add remaining docs for classes in app/logical. 2021-06-24 01:31:41 -05:00
evazion
a6994cd4d7 media file: fix exception on empty files.
This may happen if a user uploads from a source that returns an error
HTTP response with no data.
2020-06-22 18:49:36 -05:00
evazion
8a2ae91ff2 tests: skip video file tests if ffmpeg isn't installed. 2020-06-10 18:07:54 -05:00
evazion
f94c52478c media file: memoize expensive methods in subclasses. 2020-05-27 14:31:39 -05:00
evazion
364343453c uploads: factor out remaining image methods to MediaFile. 2020-05-19 02:42:19 -05:00
evazion
45064853de uploads: move thumbnail generation code to MediaFile.
* Move image thumbnail generation code to MediaFile::Image.
* Move video thumbnail generation code to MediaFile::Video.
* Move ugoira->webm conversion code to MediaFile::Ugoira.

This separates thumbnail generation from the upload process so that it's
possible to generate thumbnails outside of uploads.
2020-05-18 04:19:04 -05:00
evazion
e477232e02 uploads: factor out image dimension and filetype detection code.
* Add MediaFile abstraction. A MediaFile represents an image or video file.
* Move filetype detection and dimension parsing code from uploads to MediaFile.
2020-05-06 00:33:35 -05:00