59 Commits

Author SHA1 Message Date
evazion
acea0d5553 Fix #5065: .webp images upload support
Add ability to upload .webp images.

Animated WebP images aren't supported. This is because they aren't
supported by FFmpeg yet[1], so generating thumbnails and samples for
them would be more complicated than for other formats.

[1]: https://trac.ffmpeg.org/ticket/4907
2022-10-25 22:41:36 -05:00
evazion
c96d60a840 uploads: add support for uploading .avif files.
Features of AVIF include:

* Lossless and lossy compression.
* High dynamic range (HDR) images
* Wide color gamut images (i.e. 10- and 12-bit color depths)
* Transparency (through alpha planes).
* Animations (with an optional cover image).
* Auxiliary image sequences, where the file contains a single primary
  image and a short secondary video, like Apple's Live Photos.
* Metadata rotation, mirroring, and cropping.

The AVIF format is still relatively new and some of these features aren't well
supported by browsers or other software:

* Animated AVIFs aren't supported by Firefox or by libvips.
* HDR images aren't supported by Firefox.
* Rotated, mirrored, and cropped AVIFs aren't supported by Firefox or Chrome.
* Image grids, where the file contains multiple images that are tiled
  together into one big image, aren't supported by Firefox.
* AVIF as a whole has only been supported for a year or two by Chrome
  and Firefox, and less than a year by Safari.

For these reasons, only basic AVIFs that don't use animation, rotation,
cropping, or image grids can be uploaded.
2022-10-25 03:29:58 -05:00
evazion
78fa652646 media assets: make file storage paths and URLs configurable.
Add config options to customize where uploads are stored, and how image URLs are generated.

* Add `media_asset_file_path` option to customize where uploads are stored.
* Add `media_asset_file_url` option to customize how image URLs are generated.
* Remove the `enable_seo_post_urls` config option. The `media_asset_file_url` option
  should be used instead to include the tags in the image URL.
2022-10-16 22:36:52 -05:00
evazion
16e74650e8 media assets: include file URLs in /media_assets.json API.
Include information about the asset's variants (sample images) in the /media_assets.json API:

    {
      "id": 6410907,
      "created_at": "2022-07-31T15:44:34.522-04:00",
      "updated_at": "2022-07-31T15:44:38.002-04:00",
      "md5": "19a2be6a1a8582bb349de9734b7a649a",
      "file_ext": "jpg",
      "file_size": 369029,
      "image_width": 600,
      "image_height": 900,
      "duration": null,
      "status": "active",
      "file_key": "R4DBCxBID",
      "is_public": true,
      "variants": [
         {
           "variant": "preview",
           "url": "https://cdn.donmai.us/preview/19/a2/19a2be6a1a8582bb349de9734b7a649a.jpg",
           "width": 100,
           "height": 150,
           "file_ext": "jpg"
         },
         {
           "variant": "180x180",
           "url": "https://cdn.donmai.us/180x180/19/a2/19a2be6a1a8582bb349de9734b7a649a.jpg",
           "width": 120,
           "height": 180,
           "file_ext": "jpg"
         },
         {
           "variant": "360x360",
           "url": "https://cdn.donmai.us/360x360/19/a2/19a2be6a1a8582bb349de9734b7a649a.jpg",
           "width": 240,
           "height": 360,
           "file_ext": "jpg"
         },
         {
           "variant": "720x720",
           "url": "https://cdn.donmai.us/720x720/19/a2/19a2be6a1a8582bb349de9734b7a649a.webp",
           "width": 480,
           "height": 720,
           "file_ext": "webp"
         },
         {
           "variant": "original",
           "url": "https://cdn.donmai.us/original/19/a2/19a2be6a1a8582bb349de9734b7a649a.jpg",
           "width": 600,
           "height": 900,
           "file_ext": "jpg"
         }
      ]
    }
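
A client consuming this response might look up a variant by name. The following is a minimal sketch, assuming the response shape shown above; `variant_url` is a hypothetical helper for illustration and is not part of Danbooru.

```ruby
require "json"

# Abbreviated copy of the response above (only two variants kept).
response = <<~JSON
  {
    "id": 6410907,
    "md5": "19a2be6a1a8582bb349de9734b7a649a",
    "variants": [
      { "variant": "180x180", "url": "https://cdn.donmai.us/180x180/19/a2/19a2be6a1a8582bb349de9734b7a649a.jpg", "width": 120, "height": 180, "file_ext": "jpg" },
      { "variant": "original", "url": "https://cdn.donmai.us/original/19/a2/19a2be6a1a8582bb349de9734b7a649a.jpg", "width": 600, "height": 900, "file_ext": "jpg" }
    ]
  }
JSON

# Hypothetical helper: return the URL of the named variant, or nil if the
# asset has no variant with that name.
def variant_url(asset, name)
  variant = asset.fetch("variants", []).find { |v| v["variant"] == name }
  variant && variant["url"]
end

asset = JSON.parse(response)
puts variant_url(asset, "180x180")
```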
2022-10-16 17:28:23 -05:00
evazion
3b0e94040f posts: fix placeholder thumbnail for Flash files.
* Replace the "Download" placeholder thumbnail for Flash files with a
  new placeholder that specifically says it's a Flash file.
* Fix a bug where the Flash placeholder thumbnail was too small when
  using larger thumbnail sizes.
* Fix it so that media assets don't falsely consider Flash files to have
  thumbnails. This could potentially cause errors if someone tried to
  expunge, replace, or regenerate a Flash post.
2022-10-16 16:46:18 -05:00
evazion
c2adf279ee ugoira: remove the PixivUgoiraFrameData model.
Remove the last remaining uses of the PixivUgoiraFrameData model. As of
32bfb8407, Ugoira frame data is now stored in the MediaMetadata model,
under the `Ugoira:FrameDelays` EXIF field.

The pixiv_ugoira_frame_data table still exists, but it can be removed
after this commit is deployed.

Fixes #5264: Error when replacing with ugoira.
2022-10-10 18:21:30 -05:00
evazion
1d5db37f56 posts: automatically tag AI-generated on NovelAI posts.
Automatically add the AI-generated tag to posts that have the
`PNG:Software=NovelAI` EXIF attribute.

This is not foolproof because this metadata may get removed if an
AI-generated post is resaved or uploaded to a site that strips EXIF
metadata. It also only works for NovelAI. Currently it detects 29 out of
177 AI-generated uploads on Danbooru.
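
The check described above can be sketched roughly as follows. This is an illustration only: the metadata is assumed to be a plain hash keyed by field name, the lowercase tag spelling `ai-generated` is an assumption, and both method names are hypothetical.

```ruby
# Hypothetical check: true when the metadata carries the NovelAI marker.
# Assumes metadata is a Hash of "Group:Field" => value pairs.
def novelai_generated?(metadata)
  metadata["PNG:Software"] == "NovelAI"
end

# Add the tag only when the marker is present; `|` avoids duplicates.
def tags_with_ai_marker(tags, metadata)
  novelai_generated?(metadata) ? (tags | ["ai-generated"]) : tags
end

p tags_with_ai_marker(["1girl"], { "PNG:Software" => "NovelAI" })
```

As the message notes, a resaved file loses the `PNG:Software` field, so a check of this kind returns false for it.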
2022-10-10 04:04:35 -05:00
evazion
88ac91f5f3 search: refactor to pass in the current user explicitly. 2022-09-22 04:31:21 -05:00
evazion
0a5ebcc69d uploads: refactor media asset validation logic.
Refactor the upload validation logic to not depend on the current user.
Fixes several broken upload tests.
2022-09-15 05:09:07 -05:00
evazion
9e16de13ef Merge pull request #5220 from nonamethanks/duration-validation
Uploads: allow admins to bypass duration limits again
2022-09-15 03:46:21 -05:00
evazion
e3af738371 tests: fix broken tests. 2022-08-24 02:03:37 -05:00
evazion
d7e08d1313 media assets: add ability to search by AI tags.
Add ability to search the /media_assets index by AI tags. Multi-tag
searches are supported, including AND/OR/NOT operators, but metatags
aren't supported. Multi-tag searches will probably be slow.

The default AI tag confidence threshold is 50%. There's a hidden
search[min_score] URL param that lets you change this.
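
To illustrate the threshold, here is a minimal sketch of filtering by confidence. It assumes scores are stored as percentages from 0 to 100 and that the threshold is inclusive; the `AiTag` struct and `confident_tags` method are hypothetical names, not Danbooru's implementation.

```ruby
# Hypothetical record: a predicted tag name and its confidence (0-100).
AiTag = Struct.new(:tag, :score)

DEFAULT_MIN_SCORE = 50 # the default threshold named above

# Keep only tags at or above the threshold (inclusive is an assumption).
def confident_tags(ai_tags, min_score: DEFAULT_MIN_SCORE)
  ai_tags.select { |t| t.score >= min_score }.map(&:tag)
end

tags = [AiTag.new("scenery", 92), AiTag.new("cloud", 50), AiTag.new("tree", 31)]
p confident_tags(tags)
p confident_tags(tags, min_score: 90)
```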
2022-07-06 01:38:41 -05:00
evazion
67798c9ece Fix #5221: Trying to upload an unsupported url shows ai tags error. 2022-07-01 18:13:36 -05:00
nonamethanks
b1ae6112bd Uploads: allow admins to bypass duration limits again 2022-06-29 21:17:39 +02:00
evazion
a9fe73a483 ai tags: save ai tags on upload.
Save the AI tags when a media asset is uploaded.
2022-06-28 03:12:46 -05:00
evazion
6f24db92e5 ai tags: make ai tags accessible in api via includes.
Make these things work:

* https://danbooru.donmai.us/posts.json?only=ai_tags
* https://danbooru.donmai.us/media_assets.json?only=ai_tags
* https://danbooru.donmai.us/ai_tags.json?only=media_asset,post,tag
2022-06-26 20:37:35 -05:00
evazion
1aeb52186e Add AI tag model and UI.
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.

You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).
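
The two searches amount to set differences between human tags and AI tags. A rough in-memory sketch, where the `Post` struct and both method names are hypothetical and stand in for the real database query:

```ruby
# Hypothetical in-memory post: human-applied tags and AI-predicted tags.
Post = Struct.new(:id, :tags, :ai_tags)

# Like `ai:scenery -scenery`: the AI predicts the tag, the post lacks it.
def possibly_missing(posts, tag)
  posts.select { |p| p.ai_tags.include?(tag) && !p.tags.include?(tag) }
end

# Like `scenery -ai:scenery`: the post has the tag, the AI doesn't predict it.
def possibly_mistagged(posts, tag)
  posts.select { |p| p.tags.include?(tag) && !p.ai_tags.include?(tag) }
end

posts = [
  Post.new(1, %w[scenery cloud], %w[scenery]),
  Post.new(2, %w[cloud], %w[scenery cloud]),
  Post.new(3, %w[scenery], %w[tree])
]
p possibly_missing(posts, "scenery").map(&:id)
p possibly_mistagged(posts, "scenery").map(&:id)
```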

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
2022-06-24 04:54:26 -05:00
evazion
181639368c posts: add is: and has: metatags.
Add the following metatags:

* is:parent
* is:child
* is:safe
* is:questionable
* is:explicit
* is:sfw (same as -rating:q,e)
* is:nsfw (same as rating:q,e)
* is:active
* is:deleted
* is:pending
* is:flagged
* is:appealed
* is:banned
* is:modqueue
* is:unmoderated
* is:jpg
* is:png
* is:gif
* is:mp4
* is:webm
* is:swf
* is:zip
* has:parent
* has:children
* has:source
* has:appeals
* has:flags
* has:replacements
* has:comments
* has:commentary
* has:notes
* has:pools

All of these searches were already possible with other metatags, but these might be more convenient.
2022-05-18 13:04:15 -05:00
evazion
4ba993319a media assets: add file_key, is_public columns.
`file_key` is a random 9-character base-62 string that will be used as
the image filename in the future.

`is_public` is whether the image can be viewed without authentication or not.

Users running downstream boorus must run `bin/rails db:migrate` and
`script/fixes/109_generate_media_asset_file_keys.rb` after this commit.
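
A random 9-character base-62 string can be produced with Ruby's standard library. This is a sketch of one way to do it, not necessarily how the fix script generates keys; `generate_file_key` is a hypothetical name.

```ruby
require "securerandom"

# SecureRandom.alphanumeric draws from A-Z, a-z, and 0-9 (62 symbols),
# so a 9-character result is a base-62 string.
def generate_file_key(length = 9)
  SecureRandom.alphanumeric(length)
end

key = generate_file_key
puts key
```

Nine base-62 characters give 62^9 (about 1.35 × 10^16) possible keys.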
2022-05-04 23:19:53 -05:00
evazion
ac98c142a4 posts: move expunged image to trash folder.
When a post is expunged, move the image to a trash folder so it can be
recovered if needed.
2022-05-03 05:51:09 -05:00
Michał Frąckiewicz
93635a20d9 Configurable max video duration 2022-03-21 19:22:34 +01:00
evazion
fc5aec7de0 media assets: optimize /media_assets?search[is_posted] query.
Followup to 093a808a3. Using a NOT EXISTS clause is much faster than the
`LEFT OUTER JOIN posts WHERE posts.id IS NULL` clause generated by
`.where.missing(:post)`.
2022-02-18 04:24:33 -06:00
evazion
093a808a36 Fix #4986: Add ability to filter images in /media_assets and /uploads depending on if they have become posts 2022-02-18 03:39:08 -06:00
evazion
e4d7453180 uploads: improve error messages.
Improve upload error messages when downloading a URL fails, or it isn't
an image or video file.
2022-02-15 18:54:55 -06:00
evazion
87a00a1182 uploads: fix "ArgumentError: string contains null byte" error
Fix an error when trying to upload a file larger than the file size
limit. In this case we tried to dump the whole HTTP response into the
error message, which included the binary file itself, which caused this
exception because it contained null bytes.
2022-02-15 18:16:47 -06:00
evazion
02edb52569 uploads: enable multi-file uploads when uploading from source.
Make the upload page automatically detect when a source URL has multiple images
and let the user choose which images to post.

For example, when uploading a Twitter or Pixiv post with more than one image, we
direct the user to a page showing a thumbnail for each image and letting
them choose which ones to post.

This is similar to the batch upload page, except we actually download each image
in the background, instead of just hotlinking or proxying the thumbnails through
our servers. This avoids various problems with proxying and makes new features
possible, like showing which images in the batch have already been posted.
2022-02-14 16:13:55 -06:00
evazion
e7744cb6e3 uploads: generate thumbnails in parallel.
Make uploads faster by generating and saving thumbnails in parallel.

We generate each thumbnail in parallel, then send each thumbnail to the
backend image servers in parallel.

Most images have 5 variants: 'preview' (150x150), 180x180, 360x360,
720x720, and 'sample' (850px width). Plus the original file, that's 6
files we have to save. In production we have 2 image servers, so we have
to save each file twice, to 2 remote servers. Doing all this in parallel
should make uploads significantly faster.
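
The fan-out described above (6 files, each saved to 2 servers) could be sketched with plain threads as below. The server names and the `store` stand-in are hypothetical; the real code writes to remote image servers, which this sketch does not attempt.

```ruby
VARIANTS = %w[preview 180x180 360x360 720x720 sample original].freeze
SERVERS = %w[image-server-1 image-server-2].freeze # hypothetical names

# Stand-in for a slow remote write; returns a label for what was stored.
def store(server, variant)
  "#{server}:#{variant}"
end

# Start one thread per (variant, server) pair, then wait for all of them.
# Thread#value joins the thread and returns the block's result.
def store_all_in_parallel(variants, servers)
  threads = variants.product(servers).map do |variant, server|
    Thread.new { store(server, variant) }
  end
  threads.map(&:value)
end

results = store_all_in_parallel(VARIANTS, SERVERS)
puts results.size
```

Because the work is dominated by network I/O, threads can overlap the waits even under Ruby's global VM lock.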
2022-02-04 16:20:50 -06:00
evazion
92a4d045e2 media assets: add thumbnail view to /media_assets page.
Add a thumbnail view to the /media_assets page. This page lets you see
all images uploaded to Danbooru by all users (although you can't see who
the uploader is). Also add a link to this page in the subnav bar on the
upload page.
2022-02-02 01:12:56 -06:00
evazion
43c4158d36 uploads: merge tags when a duplicate is uploaded (fix #3130).
Automatically merge tags when uploading a duplicate.

There are two cases:

* You try to upload an image, but it's already on Danbooru. In this case
  you'll be immediately redirected to the original post, before you
  can start tagging the upload.

* You're uploading an image, it wasn't a dupe when you first opened the
  upload page, but you got sniped while tagging it. In this case your tags
  will be merged with the original post, and you will be redirected to the
  original post.

There are a few corner cases:

* If you don't have permission to edit the original post, for example
  because it's banned or has a censored tag, then your tags won't be
  merged and will be silently ignored.

* Only the tags, rating, and parent ID will be merged. The source and
  artist commentary won't be merged. This is so that if an artist uploads
  the exact same file to multiple sites, the new source won't override
  the original source.

* Some tags might be contradictory. For example, the new post might
  be tagged translation_request, but the original post might already be
  translated. It's up to the user to fix these things afterwards.
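
The merge rules above can be sketched on plain hashes. How the rating and parent ID are reconciled is not stated in this message, so the "uploaded value wins when present" rule below is purely an assumption, and `merge_duplicate` is a hypothetical name.

```ruby
# Merge an upload's attributes into the original post (both plain hashes).
# Tags are unioned. Rating and parent ID take the upload's value when it
# is present (assumption). Source and commentary are never touched.
def merge_duplicate(original, upload, can_edit: true)
  return original unless can_edit # tags are silently ignored

  original.merge(
    tags: original[:tags] | upload[:tags],
    rating: upload[:rating] || original[:rating],
    parent_id: upload[:parent_id] || original[:parent_id]
  )
end

original = { tags: %w[1girl solo], rating: "g", parent_id: nil, source: "https://example.com/a" }
upload = { tags: %w[solo smile], rating: nil, parent_id: 123, source: "https://example.com/b" }
p merge_duplicate(original, upload)
```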
2022-01-30 03:14:22 -06:00
evazion
11b7bcac91 uploads: fix broken tests.
* Fix broken upload tests.
* Fix uploads to return an error if both a file and a source are given
  at the same time, or if neither is given. Also fix the error message
  in this case so that it doesn't include "base" at the start of the string.
* Fix uploads to percent-encode any Unicode characters in the source URL.
* Add a max filesize validation to media assets.
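
Percent-encoding the Unicode characters in a URL can be sketched as below. This illustrates the idea only and is not the code from this commit; `percent_encode_non_ascii` is a hypothetical name, and it deliberately leaves ASCII characters (including existing `%` escapes) untouched.

```ruby
# Replace each run of non-ASCII characters with the percent-encoded form
# of its UTF-8 bytes, e.g. "é" (bytes C3 A9) becomes "%C3%A9".
def percent_encode_non_ascii(url)
  url.gsub(/[^\x00-\x7F]+/) do |run|
    run.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

puts percent_encode_non_ascii("https://example.com/caf\u00E9.jpg")
```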
2022-01-29 05:14:49 -06:00
evazion
abdab7a0a8 uploads: rework upload process.
Rework the upload process so that files are saved to Danbooru first
before the user starts tagging the upload.

The main user-visible change is that you have to select the file first
before you can start tagging it. Saving the file first lets us fix a
number of problems:

* We can check for dupes before the user tags the upload.
* We can perform dupe checks and show preview images for users not using the bookmarklet.
* We can show preview images without having to proxy images through Danbooru.
* We can show previews of videos and ugoira files.
* We can reliably show the filesize and resolution of the image.
* We can let the user save files to upload later.
* We can get rid of a lot of spaghetti code related to preprocessing
  uploads. This was the cause of most weird "md5 confirmation doesn't
  match md5" errors.

(Not all of these are implemented yet.)

Internally, uploading is now a two-step process: first we create an upload
object, then we create a post from the upload. This is how it works:

* The user goes to /uploads/new and chooses a file or pastes a URL into
  the file upload component.
* The file upload component calls `POST /uploads` to create an upload.
* `POST /uploads` immediately returns a new upload object in the `pending` state.
* Danbooru starts processing the upload in a background job (downloading,
  resizing, and transferring the image to the image servers).
* The file upload component polls `/uploads/$id.json`, checking the
  upload `status` until it returns `completed` or `error`.
* When the upload status is `completed`, the user is redirected to /uploads/$id.
* On the /uploads/$id page, the user can tag the upload and submit it.
* The upload form calls `POST /posts` to create a new post from the upload.
* The user is redirected to the new post.
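
The polling step in the flow above can be sketched as a small loop. The status source is injected as a callable so the sketch runs without a server; in a real client it would fetch `/uploads/$id.json` and read the `status` field. The method name, polling interval, and attempt limit are all assumptions.

```ruby
# Poll until the upload reaches a terminal state, then return that state.
# `fetch_status` is any callable returning the current status string.
def wait_for_upload(fetch_status, interval: 0.5, max_attempts: 20)
  max_attempts.times do
    status = fetch_status.call
    return status if %w[completed error].include?(status)

    sleep interval
  end
  raise "upload still not finished after #{max_attempts} attempts"
end

# Simulated status sequence standing in for successive API responses.
statuses = %w[pending processing completed].each
result = wait_for_upload(-> { statuses.next }, interval: 0)
puts result
```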

This is the data model:

* An upload represents a set of files uploaded to Danbooru by a user.
  Uploaded files don't have to belong to a post. An upload has an
  uploader, a status (pending, processing, completed, or error), a
  source (unless uploading from a file), and a list of media assets
  (image or video files).

* There is a has-and-belongs-to-many relationship between uploads and
  media assets. An upload can have many media assets, and a media asset
  can belong to multiple uploads. Uploads are joined to media assets
  through an upload_media_assets table.

  An upload could potentially have multiple media assets if it's a Pixiv
  or Twitter gallery. This is not yet implemented (at the moment all
  uploads have one media asset).

  A media asset can belong to multiple uploads if multiple people try
  to upload the same file, or if the same user tries to upload the same
  file more than once.

New features:

* On the upload page, you can press Ctrl+V to paste a URL and immediately upload it.
* You can save files for upload later. Your saved files are at /uploads.

Fixes:

* Improved error messages when uploading invalid files, bad URLs, and
  when forgetting the rating.
2022-01-28 04:13:22 -06:00
evazion
1c5786d20f posts: remove cropped thumbnails. 2021-12-16 15:58:29 -06:00
evazion
163ba8e7da posts: micro-optimize allocations during thumbnail generation.
Do a few micro-optimizations to reduce the number of memory allocations
during thumbnail generation.

This commit, combined with freezing string literals in a7dc05 and
67b961, reduces the number of allocations on the front page from 180,000
to 150,000, and the number of retained objects from 8,000 to 4,000.
2021-12-16 00:53:48 -06:00
evazion
a7dc05ce63 Enable frozen string literals.
Make all string literals immutable by default.
2021-12-14 21:33:27 -06:00
evazion
c22f7b799b media assets: fix error when generating thumbnails for corrupt files.
Fix an error being raised when trying to generate thumbnails for corrupt
files. If the original image is corrupt, then ignore any errors and let
libvips try to generate a thumbnail as best it can. This will usually
result in an incomplete thumbnail.
2021-12-05 21:46:14 -06:00
evazion
ad49a10147 media assets: fix bug in thumbnail generation.
Fix thumbnail generation throwing a NoMatchingPatternError.
2021-12-05 19:04:17 -06:00
evazion
9cb70fa632 posts: add 720x720 thumbnail size.
This is used to provide higher resolution thumbnails for high pixel
density displays, such as phones or laptops. If your screen has a 2x
pixel density ratio, then 360x360 thumbnails will be rendered at 720x720
resolution.

We use WebP here because it's about 15% smaller than the equivalent
JPEG, and because if a device has a high enough pixel density to use
this, then it probably supports WebP.

720x720 thumbnails average about 36kb in size, compared to 20.35kb for
360x360 thumbnails and 7.55kb for 180x180 thumbnails.
2021-12-05 09:19:29 -06:00
evazion
17537084fe posts: generate 180x180px and 360x360px thumbnails (#4932).
Add two new thumbnail sizes. These new thumbnail sizes are generated on
upload, but not used yet.
2021-12-02 23:42:44 -06:00
evazion
e5ba6d4afc MediaFile: fix thumbnail dimension calculation.
Calculate the dimensions of thumbnails ourselves instead of letting
libvips calculate them for us. This way we know the exact size of
thumbnails, so we can set the right width and height for <img> tags. If
we let libvips calculate thumbnail sizes for us, then we can't predict
the exact size of thumbnails, because sometimes libvips rounds numbers
differently than us.
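
Computing the dimensions up front might look like the sketch below. It uses exact rational arithmetic to avoid floating-point surprises; the rounding rule (round half away from zero) and the choice never to upscale are assumptions, and `thumbnail_dimensions` is a hypothetical name rather than the MediaFile method.

```ruby
# Scale (width, height) to fit inside a max_size x max_size box while
# keeping the aspect ratio. Never upscales (assumption).
def thumbnail_dimensions(width, height, max_size)
  scale = [Rational(max_size, width), Rational(max_size, height), 1].min
  [(width * scale).round, (height * scale).round].map { |n| [n, 1].max }
end

p thumbnail_dimensions(600, 900, 180)
p thumbnail_dimensions(600, 900, 720)
```

With a single known formula, the same numbers can be written into the `<img>` width and height attributes before the thumbnail itself is generated.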
2021-12-01 04:45:26 -06:00
evazion
8f36ebe2b8 Fix #4914: RuntimeError corrupting uploads
Bug: If a media asset got stuck in the 'processing' state during upload,
then it would stay stuck forever and the file couldn't be uploaded again
later.

Fix: Mark stuck assets as failed before raising the "Upload failed"
error. Once the asset is marked as failed, it can be uploaded again
later. Also, only wait for assets to finish processing if they were
uploaded less than 5 minutes ago. If a processing asset is more than 5
minutes old, consider it stuck and mark it as failed immediately.
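
The five-minute rule can be sketched as follows. The `Asset` struct and method names are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```ruby
STUCK_AFTER = 5 * 60 # seconds; the 5-minute cutoff described above

# Hypothetical stand-in for a media asset record.
Asset = Struct.new(:status, :created_at)

# An asset is stuck if it is still processing past the cutoff.
def stuck?(asset, now: Time.now)
  asset.status == "processing" && (now - asset.created_at) > STUCK_AFTER
end

# Mark a stuck asset as failed so the file can be uploaded again later.
def resolve_stuck!(asset, now: Time.now)
  asset.status = "failed" if stuck?(asset, now: now)
  asset
end

now = Time.at(1_000_000)
old = Asset.new("processing", now - 301)
puts resolve_stuck!(old, now: now).status
```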

Assets getting stuck in the processing state is a 'this should never
happen' error. Normally if any kind of exception is raised while
uploading the asset, the asset will be set to the 'failed' state. The
only way an asset can get stuck is if it fails and the exception handler
doesn't run, or the exception handler itself fails. This might happen if
the process is unexpectedly killed, or possibly if the HTTP request
times out and a TimeoutError is raised at an inopportune time. See below
for discussion of issues with Timeout.

[1]: https://vaneyckt.io/posts/the_disaster_that_is_rubys_timeout_method/
[2]: https://jvns.ca/blog/2015/11/27/why-rubys-timeout-is-dangerous-and-thread-dot-raise-is-terrifying/
[3]: https://adamhooper.medium.com/in-ruby-dont-use-timeout-77d9d4e5a001
[4]: https://ruby-doc.org/core-3.0.2/Thread.html#method-c-handle_interrupt-label-Guarding+from+Timeout-3A-3AError
2021-11-08 18:22:04 -06:00
evazion
4095d14f2a media assets: fix tagged filenames option.
Fix the `enable_seo_post_urls` config option not being respected. This
option controls whether filenames in image URLs contain the tags. This
option requires URL rewrites in Nginx to work, so it's disabled by
default.
2021-10-29 07:14:21 -05:00
evazion
082544ab03 StorageManager: remove Post-specific code.
Refactor StorageManager to remove all image URL generation code. Instead
the image URL generation code lives in MediaAsset.

Now StorageManager is only concerned with how to read and write files to
remote storage backends like S3 or SFTP, not with how image URLs should
be generated. This way the file storage code isn't tightly coupled to
posts, so it can be used to store any kind of file, not just images
belonging to posts.
2021-10-27 00:05:30 -05:00
evazion
afe5095ee6 posts: mark media asset as expunged when post is expunged.
Fix it so that when a post is expunged, the media asset is also marked
as expunged. This way the files will be deleted, but the media asset
will still remain as a record of what was expunged. The media asset will
have the md5, width, height, file ext, and file size of the deleted file.
2021-10-26 02:53:32 -05:00
evazion
f5e7d50dbb media assets: don't destroy ugoira data on destroy.
Don't destroy Pixiv Ugoira frame data when the media asset is destroyed.
This is wrong because when uploads were pruned, it could delete the
frame data of an active post.
2021-10-24 04:35:13 -05:00
evazion
5c7a0f225c media assets: prevent duplicate media assets.
Add an md5 uniqueness constraint on media assets to prevent duplicate
assets from being created. This way we can guarantee that there is one
active media asset per uploaded file.

Also make it so that if two people are uploading the same file at the
same time, the file is processed only once.
2021-10-24 04:35:06 -05:00
evazion
bc506ed1b8 uploads: refactor to simplify ugoira-handling and replacements:
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
  the MediaAsset is created, not when the post is created. This way it's
  possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
  with the ugoira frame data already attached to it, instead of returning it
  in the `data` field then passing it around separately in the `context`
  field of the upload.
2021-10-18 05:18:46 -05:00
evazion
1d034a3223 media assets: move more file-handling logic into MediaAsset.
Move more of the file-handling logic from UploadService and
StorageManager into MediaAsset. This is part of refactoring posts and
uploads to allow multiple images per post.
2021-10-18 00:10:29 -05:00
evazion
0731b07d27 posts: store duration of animations and videos.
Start storing the duration of animations and videos in the `duration`
field on the media_assets table. This had to wait until 3d30bfd69 was
deployed, which had to wait until Postgres was upgraded in order to add
the duration column to the media_assets table without downtime.

Also add a fix script to backfill the duration on existing posts. Usage:

    TAGS=animated ./script/fixes/079_fix_duration.rb
2021-10-07 03:21:08 -05:00
evazion
c99d0523bb /media_assets: add basic index and show pages.
* Add a basic index page at https://danbooru.donmai.us/media_assets.
* Add a basic show page at https://danbooru.donmai.us/media_assets/1.
* Add ability to search /media_assets.json by metadata. Example:
** https://danbooru.donmai.us/media_assets.json?search[metadata][File:ColorComponents]=3
* Add a "»" link next to the filesize on posts linking to the metadata page.

Known issues:

* Sometimes the MD5 links on the /media_assets page return "That record
  was not found" errors. These are unfinished uploads that haven't been
  made into posts yet.
* No good way to search for custom metadata fields in the search form.
* Design is ugly.
2021-09-29 07:46:11 -05:00
evazion
79fdfa86ae Fix various rubocop warnings. 2021-09-27 00:46:13 -05:00