Add ability to search the /media_assets index by AI tags. Multi-tag
searches are supported, including AND/OR/NOT operators, but metatags
aren't supported. Multi-tag searches will probably be slow.
The default AI tag confidence threshold is 50%. There's a hidden
`search[min_score]` URL param that lets you change this.
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.
AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.
The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images × 50 tags per image). This amounts to over 10GB of table data, plus
indexes.
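For reference, a Rails migration implementing this schema might look like the sketch below (the indexes are an assumption, not the actual migration):

    class CreateAITags < ActiveRecord::Migration[6.1]
      def change
        # No primary key column, to keep each row as small as possible.
        create_table :ai_tags, id: false do |t|
          t.integer :media_asset_id, null: false
          t.integer :tag_id, null: false
          t.integer :score, null: false, limit: 2 # limit: 2 maps to smallint in Postgres
        end

        add_index :ai_tags, [:media_asset_id, :tag_id], unique: true
        add_index :ai_tags, [:tag_id, :score]
      end
    end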
You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).
You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.
To generate tags, use the `autotag` script from the Autotagger repo, something like this:
    docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz
To import tags, use the fix script in script/fixes/. Expect a Danbooru-sized dataset to take
hours to days to tag, then 20-30 minutes to import. Currently this all has to be done by hand.
`file_key` is a random 9-character base-62 string that will be used as
the image filename in the future.
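For illustration, such a key could be generated like this (a hedged sketch; the method name is hypothetical and the actual implementation may differ):

    require "securerandom"

    BASE62_CHARS = [*"0".."9", *"a".."z", *"A".."Z"] # 62 characters

    # Returns a random 9-character base-62 string, e.g. "aX3k9Qz1B".
    def generate_file_key
      Array.new(9) { BASE62_CHARS[SecureRandom.random_number(62)] }.join
    end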
`is_public` controls whether the image can be viewed without authentication.
Users running downstream boorus must run `bin/rails db:migrate` and
`script/fixes/109_generate_media_asset_file_keys.rb` after this commit.
Followup to 093a808a3. Using a NOT EXISTS clause is much faster than the
`LEFT OUTER JOIN posts ... WHERE posts.id IS NULL` query generated by
`.where.missing(:post)`.
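Roughly, the change looks like this (a sketch; the exact join condition is an assumption based on posts and media assets sharing an md5):

    # Before: generates `LEFT OUTER JOIN posts ... WHERE posts.id IS NULL`.
    MediaAsset.where.missing(:post)

    # After: a NOT EXISTS subquery, which Postgres can execute as a
    # cheaper anti-join.
    MediaAsset.where("NOT EXISTS (SELECT 1 FROM posts WHERE posts.md5 = media_assets.md5)")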
Fix an error when trying to upload a file larger than the file size
limit. In this case we tried to dump the whole HTTP response into the
error message, including the binary file itself, which raised an
exception because the message contained null bytes.
Make the upload page automatically detect when a source URL has multiple images
and let the user choose which images to post.
For example, when uploading a Twitter or Pixiv post with more than one image, we
direct the user to a page showing a thumbnail for each image and letting
them choose which ones to post.
This is similar to the batch upload page, except we actually download each image
in the background, instead of just hotlinking or proxying the thumbnails through
our servers. This avoids various problems with proxying and makes new features
possible, like showing which images in the batch have already been posted.
Make uploads faster by generating and saving thumbnails in parallel.
We generate each thumbnail in parallel, then send each thumbnail to the
backend image servers in parallel.
Most images have 5 variants: 'preview' (150x150), 180x180, 360x360,
720x720, and 'sample' (850px width). Plus the original file, that's 6
files we have to save. In production we have 2 image servers, so each
file has to be saved twice, once to each remote server. Doing all this
in parallel should make uploads significantly faster.
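A minimal sketch of the idea, assuming hypothetical `generate_variant` and `store` helpers:

    VARIANTS = ["preview", "180x180", "360x360", "720x720", "sample"]

    def generate_and_save_variants(media_file, servers)
      # Generate all thumbnails concurrently, one thread per variant.
      thumbnails = VARIANTS.map do |variant|
        Thread.new { media_file.generate_variant(variant) }
      end.map(&:value)

      # Then save every file (thumbnails + original) to every image
      # server concurrently.
      (thumbnails + [media_file]).flat_map do |file|
        servers.map { |server| Thread.new { server.store(file) } }
      end.each(&:join)
    end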
Add a thumbnail view to the /media_assets page. This page lets you see
all images uploaded to Danbooru by all users (although you can't see who
the uploader is). Also add a link to this page in the subnav bar on the
upload page.
Automatically merge tags when uploading a duplicate.
There are two cases:
* You try to upload an image, but it's already on Danbooru. In this case
you'll be immediately redirected to the original post, before you
can start tagging the upload.
* You're uploading an image that wasn't a dupe when you first opened the
upload page, but you got sniped while tagging it. In this case your tags
will be merged with the original post, and you will be redirected to the
original post.
There are a few corner cases:
* If you don't have permission to edit the original post, for example
because it's banned or has a censored tag, then your tags won't be
merged and will be silently ignored.
* Only the tags, rating, and parent ID will be merged. The source and
artist commentary won't be merged. This is so that if an artist uploads
the exact same file to multiple sites, the new source won't override
the original source.
* Some tags might be contradictory. For example, the new post might
be tagged translation_request, but the original post might already be
translated. It's up to the user to fix these things afterwards.
* Fix broken upload tests.
* Fix uploads to return an error if both a file and a source are given
at the same time, or if neither is given. Also fix the error message
in this case so that it doesn't include "base" at the start of the string.
* Fix uploads to percent-encode any Unicode characters in the source URL.
* Add a max filesize validation to media assets.
Rework the upload process so that files are saved to Danbooru first
before the user starts tagging the upload.
The main user-visible change is that you have to select the file first
before you can start tagging it. Saving the file first lets us fix a
number of problems:
* We can check for dupes before the user tags the upload.
* We can perform dupe checks and show preview images for users not using the bookmarklet.
* We can show preview images without having to proxy images through Danbooru.
* We can show previews of videos and ugoira files.
* We can reliably show the filesize and resolution of the image.
* We can let the user save files to upload later.
* We can get rid of a lot of spaghetti code related to preprocessing
uploads. This was the cause of most weird "md5 confirmation doesn't
match md5" errors.
(Not all of these are implemented yet.)
Internally, uploading is now a two-step process: first we create an upload
object, then we create a post from the upload. This is how it works:
* The user goes to /uploads/new and chooses a file or pastes a URL into
the file upload component.
* The file upload component calls `POST /uploads` to create an upload.
* `POST /uploads` immediately returns a new upload object in the `pending` state.
* Danbooru starts processing the upload in a background job (downloading,
resizing, and transferring the image to the image servers).
* The file upload component polls `/uploads/$id.json`, checking the
upload `status` until it returns `completed` or `error`.
* When the upload status is `completed`, the user is redirected to /uploads/$id.
* On the /uploads/$id page, the user can tag the upload and submit it.
* The upload form calls `POST /posts` to create a new post from the upload.
* The user is redirected to the new post.
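From a client's point of view, the polling step might look like this sketch (field names follow the description above; error handling omitted):

    require "net/http"
    require "json"

    # Poll the upload's JSON endpoint until processing finishes.
    def wait_for_upload(base_url, upload_id)
      loop do
        json = JSON.parse(Net::HTTP.get(URI("#{base_url}/uploads/#{upload_id}.json")))
        return json if %w[completed error].include?(json["status"])
        sleep 1 # poll roughly once per second
      end
    end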
This is the data model:
* An upload represents a set of files uploaded to Danbooru by a user.
Uploaded files don't have to belong to a post. An upload has an
uploader, a status (pending, processing, completed, or error), a
source (unless uploading from a file), and a list of media assets
(image or video files).
* There is a has-and-belongs-to-many relationship between uploads and
media assets. An upload can have many media assets, and a media asset
can belong to multiple uploads. Uploads are joined to media assets
through an upload_media_assets table.
An upload could potentially have multiple media assets if it's a Pixiv
or Twitter gallery. This is not yet implemented (at the moment all
uploads have one media asset).
A media asset can belong to multiple uploads if multiple people try
to upload the same file, or if the same user tries to upload the same
file more than once.
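In ActiveRecord terms, the associations described above amount to something like this sketch (validations and scopes omitted):

    class Upload < ApplicationRecord
      has_many :upload_media_assets
      has_many :media_assets, through: :upload_media_assets
    end

    class MediaAsset < ApplicationRecord
      has_many :upload_media_assets
      has_many :uploads, through: :upload_media_assets
    end

    # The join model backing the upload_media_assets table.
    class UploadMediaAsset < ApplicationRecord
      belongs_to :upload
      belongs_to :media_asset
    end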
New features:
* On the upload page, you can press Ctrl+V to paste a URL and immediately upload it.
* You can save files for upload later. Your saved files are at /uploads.
Fixes:
* Improved error messages when uploading invalid files, bad URLs, and
when forgetting the rating.
Do a few micro-optimizations to reduce the number of memory allocations
during thumbnail generation.
This commit, combined with freezing string literals in a7dc05 and
67b961, reduces the number of allocations on the front page from 180,000
to 150,000, and the number of retained objects from 8,000 to 4,000.
Fix an error being raised when trying to generate thumbnails for corrupt
files. If the original image is corrupt, then ignore any errors and let
libvips try to generate a thumbnail as best it can. This will usually
result in an incomplete thumbnail.
Add 720x720 WebP thumbnails. These provide higher resolution thumbnails
for high pixel density displays, such as phones or laptops. If your
screen has a 2x pixel density ratio, then 360x360 thumbnails will be
rendered at 720x720 resolution.
We use WebP here because it's about 15% smaller than the equivalent
JPEG, and because if a device has a high enough pixel density to use
this, then it probably supports WebP.
720x720 thumbnails average about 36KB in size, compared to 20.35KB for
360x360 thumbnails and 7.55KB for 180x180 thumbnails.
Calculate the dimensions of thumbnails ourselves instead of letting
libvips calculate them for us. This way we know the exact size of
thumbnails, so we can set the right width and height for <img> tags. If
we let libvips calculate thumbnail sizes for us, then we can't predict
the exact size of thumbnails, because sometimes libvips rounds numbers
differently than we do.
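The calculation is a standard aspect-fit scale; a sketch (the exact rounding rule is an assumption):

    # Scale (width, height) to fit within max_width x max_height,
    # preserving aspect ratio and never upscaling.
    def thumbnail_dimensions(width, height, max_width, max_height)
      scale = [max_width.to_f / width, max_height.to_f / height, 1].min
      [(width * scale).round, (height * scale).round]
    end

    thumbnail_dimensions(1920, 1080, 360, 360) # => [360, 203]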
Bug: If a media asset got stuck in the 'processing' state during upload,
then it would stay stuck forever and the file couldn't be uploaded again
later.
Fix: Mark stuck assets as failed before raising the "Upload failed"
error. Once the asset is marked as failed, it can be uploaded again
later. Also, only wait for assets to finish processing if they were
uploaded less than 5 minutes ago. If a processing asset is more than 5
minutes old, consider it stuck and mark it as failed immediately.
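A hedged sketch of this logic (names are illustrative, not the actual implementation):

    STUCK_TIMEOUT = 5.minutes

    def resolve_processing_asset!(asset)
      if asset.created_at < STUCK_TIMEOUT.ago
        # Older than 5 minutes: assume the worker died and the exception
        # handler never ran. Mark it failed so the file can be uploaded again.
        asset.update!(status: "failed")
        raise "Upload failed"
      else
        # Recent: the asset is probably still being processed, so wait.
        sleep 1 while asset.reload.status == "processing"
      end
    end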
Assets getting stuck in the processing state is a 'this should never
happen' error. Normally if any kind of exception is raised while
uploading the asset, the asset will be set to the 'failed' state. The
only way an asset can get stuck is if it fails and the exception handler
doesn't run, or the exception handler itself fails. This might happen if
the process is unexpectedly killed, or possibly if the HTTP request
times out and a TimeoutError is raised at an inopportune time. See below
for discussion of issues with Timeout.
[1]: https://vaneyckt.io/posts/the_disaster_that_is_rubys_timeout_method/
[2]: https://jvns.ca/blog/2015/11/27/why-rubys-timeout-is-dangerous-and-thread-dot-raise-is-terrifying/
[3]: https://adamhooper.medium.com/in-ruby-dont-use-timeout-77d9d4e5a001
[4]: https://ruby-doc.org/core-3.0.2/Thread.html#method-c-handle_interrupt-label-Guarding+from+Timeout-3A-3AError
Fix the `enable_seo_post_urls` config option not being respected. This
option controls whether filenames in image URLs contain the post's tags.
It requires URL rewrites in Nginx to work, so it's disabled by default.
Refactor StorageManager to remove all image URL generation code. The
image URL generation code now lives in MediaAsset instead.
Now StorageManager is only concerned with how to read and write files to
remote storage backends like S3 or SFTP, not with how image URLs should
be generated. This way the file storage code isn't tightly coupled to
posts, so it can be used to store any kind of file, not just images
belonging to posts.
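After the refactor, StorageManager boils down to a pure file I/O interface, roughly like this (method names are illustrative):

    class StorageManager
      # Write the given file to `path` on the backend (local disk, S3, SFTP, ...).
      def store(file, path); raise NotImplementedError; end

      # Delete the file at `path` from the backend.
      def delete(path); raise NotImplementedError; end

      # Open the file at `path` for reading.
      def open(path); raise NotImplementedError; end
    end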
Fix it so that when a post is expunged, the media asset is also marked
as expunged. This way the files will be deleted, but the media asset
will still remain as a record of what was expunged. The media asset will
have the md5, width, height, file ext, and file size of the deleted file.
Don't destroy Pixiv ugoira frame data when the media asset is destroyed.
The old behavior was wrong because pruning uploads could delete the
frame data of an active post.
Add an md5 uniqueness constraint on media assets to prevent duplicate
assets from being created. This way we can guarantee that there is one
active media asset per uploaded file.
Also make it so that if two people are uploading the same file at the
same time, the file is processed only once.
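As a migration, the constraint might look like this sketch (whether the real index is partial is an assumption based on the "one active media asset" wording):

    class AddMd5IndexToMediaAssets < ActiveRecord::Migration[6.1]
      def change
        add_index :media_assets, :md5, unique: true, where: "status = 'active'"
      end
    end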
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
the MediaAsset is created, not when the post is created. This way it's
possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
with the ugoira frame data already attached to it, instead of returning it
in the `data` field then passing it around separately in the `context`
field of the upload.
Move more of the file-handling logic from UploadService and
StorageManager into MediaAsset. This is part of refactoring posts and
uploads to allow multiple images per post.
Start storing the duration of animations and videos in the `duration`
field on the media_assets table. This had to wait until 3d30bfd69 was
deployed, which had to wait until Postgres was upgraded in order to add
the duration column to the media_assets table without downtime.
Also add a fix script to backfill the duration on existing posts. Usage:
    TAGS=animated ./script/fixes/079_fix_duration.rb
Move the metadata parsing code from MediaAsset to ExifTool::Metadata so
we can use it outside the context of a MediaAsset, in particular when
dealing with a MediaFile that hasn't been saved to disk yet.
Fix a bug where PNG images could be incorrectly detected as
exif-rotated. This would happen when a PNG contained the
IFD0:Orientation flag. It's technically possible for a PNG to contain
this flag, but it's ignored by libvips and by browsers.
Post #3762340 (nsfw) is an example of a PNG like this.
The fix is to use `autorot` to let libvips apply the rotation instead of
trying to interpret the exif data ourselves. Note that libvips-8.9 has a
bug where it doesn't strip the orientation flag after applying
`autorot`, which leads to the image being incorrectly rotated a second
time when generating the thumbnail. Use libvips-8.11 instead.
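With ruby-vips this is a one-liner, something like:

    require "vips"

    # Let libvips apply the EXIF orientation instead of interpreting the
    # IFD0/IFD1 tags ourselves. Requires libvips >= 8.11 so the
    # orientation flag is stripped after rotation.
    image = Vips::Image.new_from_file("upload.jpg")
    image = image.autorot
    image.write_to_file("rotated.jpg")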
`IFD0:Orientation` is the orientation of the main image.
`IFD1:Orientation` is the orientation of the embedded thumbnail, if it
has one. Using `IFD1:Orientation` was incorrect here because some images
have a non-rotated main image but a rotated thumbnail. Post #1023563 is
an example.
Autotag `greyscale`, `non-repeating_animation`, and `exif_rotation`.
Note that this does not detect all (or even most) greyscale images.
Artists often save greyscale images as RGB instead of as greyscale.
Automatically tag animated_gif and animated_png when a post is edited.
Add them back if the user tries to remove them from an animated post,
or remove them if the user tries to add them to a non-animated post.
Previously these tags were added at upload time, but it was possible for users
to remove them after upload, or to incorrectly add them to non-animated
posts. They were added at upload time because we couldn't afford to open
the file and parse the metadata on every tag edit. Now that we save the
metadata in the database, we can do this.
This also makes it so you can't tag ugoira on non-ugoira files.
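A hedged sketch of the enforcement (the callback and helper names are illustrative):

    class Post < ApplicationRecord
      before_save :normalize_animated_tags

      def normalize_animated_tags
        if media_asset.is_animated_gif?
          add_tag("animated_gif")    # add it back if the user removed it
        else
          remove_tag("animated_gif") # strip it from non-animated posts
        end
      end
    end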
Known bug: it's possible to have an animated GIF where every frame is
identical. Post #3770975 is an example. This will be detected as an
animated GIF even though visually it doesn't appear to be animated.
Fixes #4041: Animated_gif tag not added to preprocessed uploads
Add a model for storing image and video metadata for uploaded files.
Metadata is extracted using ExifTool. You will need to install ExifTool
after this commit. ExifTool 12.22 is the minimum required version
because we use the `--binary` option, which was added in this release.
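Extraction boils down to asking ExifTool for everything as JSON, along these lines (the -G1 flag, which produces group-prefixed keys like IFD0:Orientation, is an assumption):

    require "json"
    require "open3"

    def extract_metadata(path)
      output, _status = Open3.capture2("exiftool", "-json", "-G1", "--binary", path)
      JSON.parse(output).first # ExifTool returns one array entry per file
    end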
The MediaMetadata model is separate from the MediaAsset model because
some files contain tons of metadata, and most of it is non-essential.
The MediaAsset model represents an uploaded file and contains essential
metadata, like the file's size and type, while the MediaMetadata model
represents all the other non-essential metadata associated with a file.
Metadata is stored as a JSON column in the database.
ExifTool returns all the file's metadata, not just the EXIF metadata.
EXIF is only one of several types of image metadata, which is why we
call it MediaMetadata instead of EXIFMetadata.
A MediaAsset represents an image or video file uploaded to Danbooru. It
stores the metadata associated with the image or video. This is part of
the work to decouple files from posts, so that images can be uploaded
separately from posts.