Commit Graph

233 Commits

Author SHA1 Message Date
evazion
03560bafc6 uploads: add limit to prevent users from submitting too many uploads at once.
Add a limit so that users can't upload more if they already have more
than 250 images queued for upload.

For example, if you upload a Pixiv post that has 200 images, then you'll
have 200 queued images for upload. This will go down as the images are
processed. If you exceed the limit, then trying to create new uploads
will return an error.

This is to prevent single users from overwhelming the site by uploading
too many images at once, thereby preventing other users from uploading
because the job queue is backed up and can't process new uploads by
other users until existing uploads are finished.
2022-02-28 23:10:12 -06:00
evazion
26f4cf1ebd sources: factor out Source::URL::Skeb. 2022-02-25 02:06:57 -06:00
evazion
7d49ab6130 Add Danbooru::URL class.
Introduce a Danbooru::URL class for dealing with URLs. This is a wrapper
around Addressable::URI that adds some additional helper methods. Most
significantly, the `parse` method only allows valid http/https URLs, and
it returns nil instead of raising an exception when the URL is invalid.
2022-02-22 00:17:53 -06:00
evazion
202dfe5d87 uploads: allow uploading multiple files from your computer at once.
Allow uploading multiple files from your computer at once.

The maximum limit is 100 files at once. There is still a 50MB size limit
that applies to the whole upload. This limit is at the Nginx level.

The upload widget no longer shows a thumbnail preview of the uploaded
file. This is because there isn't room for it in a multi-file upload,
and because the next page will show a preview anyway after the files are
uploaded.

Direct file uploads are processed synchronously, so they may be slow.

API change: the `POST /uploads` endpoint now expects the param to be
`upload[files][]`, not `upload[file]`.
2022-02-19 00:00:56 -06:00
evazion
093a808a36 Fix #4986: Add ability to filter images in /media_assets and /uploads depending on if they have become posts 2022-02-18 03:39:08 -06:00
evazion
6b56b6a122 uploads: fix error when source doesn't have any images.
Fix an error when trying to upload a source that doesn't have any
images, for example a Twitter post with no images.
2022-02-15 18:55:12 -06:00
evazion
02edb52569 uploads: enable multi-file uploads when uploading from source.
Make the upload page automatically detect when a source URL has multiple images
and let the user choose which images to post.

For example, when uploading a Twitter or Pixiv post with more than one image, we
direct the user to a page showing a thumbnail for each image and letting
them choose which ones to post.

This is similar to the batch upload page, except we actually download each image
in the background, instead of just hotlinking or proxying the thumbnails through
our servers. This avoids various problems with proxying and makes new features
possible, like showing which images in the batch have already been posted.
2022-02-14 16:13:55 -06:00
evazion
bdf83d1ffd uploads: refactor /uploads/:id page for multi-file uploads. 2022-02-14 00:41:08 -06:00
evazion
eb032d54c1 uploads: set upload_media_asset.status to active.
Fix the status being set to pending instead of active for new upload
media assets.
2022-02-14 00:40:40 -06:00
evazion
04d242c60c uploads: save filename, image URL, page URL for uploads.
* Save the filename for files uploaded from disk. This could be used in
  the future to extract source data if the filename is from a known site.

* Save both the image URL and the page URL for files uploaded from
  source. This is needed for multi-file uploads. The image URL is the
  URL of the file actually downloaded from the source. This can be
  different from the URL given by the user, if the user tried to upload
  a sample URL and we automatically changed it to the original URL. The
  page URL is the URL of the page containing the image. We don't always
  know this, for example if someone uploads a Twitter image without the
  bookmarklet, then we can't find the page URL.

* Add a fix script to backfill URLs for existing uploads. For file
  uploads, the filename will be set to "unknown.jpg". For source
  uploads, we fetch the source data again to get the image and page
  URLs. This may fail for uploads that have been deleted from the
  source since uploading.
2022-02-12 15:22:41 -06:00
evazion
9a23970ab1 uploads: fix media_asset_count. 2022-02-12 15:22:24 -06:00
evazion
44c9c7f1ac uploads: removed unused /uploads/preprocess route. 2022-02-11 03:15:12 -06:00
evazion
70d38d9e0b uploads: add columns needed for multi-file uploads.
* uploads.media_asset_count - the number of media assets attached to this upload.
* upload_media_assets.status - the status of each media asset attached to this upload (processing, active, failed)
* upload_media_assets.source_url - the source of each media asset attached to this upload
* upload_media_assets.error - the error message if uploading the media asset failed
2022-02-10 12:06:57 -06:00
evazion
1a61e329ba uploads: add column for error messages.
Change it so uploads store errors in an `error` column instead of in the
`status` field.
2022-02-07 15:44:39 -06:00
evazion
7c63ac8dbd uploads: drop unused columns. 2022-02-04 02:19:30 -06:00
evazion
2dfec29da7 uploads: mark old columns as ignored.
Mark old columns as ignored in preparation for dropping them. Make the
rating and tag_string nullable so they don't have to be set when
creating uploads and can be ignored too.
2022-02-03 14:07:09 -06:00
evazion
11b7bcac91 uploads: fix broken tests.
* Fix broken upload tests.
* Fix uploads to return an error if both a file and a source are given
  at the same time, or if neither are given. Also fix the error message
  in this case so that it doesn't include "base" at the start of the string.
* Fix uploads to percent-encode any Unicode characters in the source URL.
* Add a max filesize validation to media assets.
2022-01-29 05:14:49 -06:00
evazion
21dcf53dcb uploads: show similar images for disk uploads.
Fix the upload page so that it shows similar images (IQDB matches) for
files uploaded from your computer. Before this only worked for files
uploaded from a source.
2022-01-28 21:07:06 -06:00
evazion
abdab7a0a8 uploads: rework upload process.
Rework the upload process so that files are saved to Danbooru first
before the user starts tagging the upload.

The main user-visible change is that you have to select the file first
before you can start tagging it. Saving the file first lets us fix a
number of problems:

* We can check for dupes before the user tags the upload.
* We can perform dupe checks and show preview images for users not using the bookmarklet.
* We can show preview images without having to proxy images through Danbooru.
* We can show previews of videos and ugoira files.
* We can reliably show the filesize and resolution of the image.
* We can let the user save files to upload later.
* We can get rid of a lot of spaghetti code related to preprocessing
  uploads. This was the cause of most weird "md5 confirmation doesn't
  match md5" errors.

(Not all of these are implemented yet.)

Internally, uploading is now a two-step process: first we create an upload
object, then we create a post from the upload. This is how it works:

* The user goes to /uploads/new and chooses a file or pastes an URL into
  the file upload component.
* The file upload component calls `POST /uploads` to create an upload.
* `POST /uploads` immediately returns a new upload object in the `pending` state.
* Danbooru starts processing the upload in a background job (downloading,
  resizing, and transferring the image to the image servers).
* The file upload component polls `/uploads/$id.json`, checking the
  upload `status` until it returns `completed` or `error`.
* When the upload status is `completed`, the user is redirected to /uploads/$id.
* On the /uploads/$id page, the user can tag the upload and submit it.
* The upload form calls `POST /posts` to create a new post from the upload.
* The user is redirected to the new post.

This is the data model:

* An upload represents a set of files uploaded to Danbooru by a user.
  Uploaded files don't have to belong to a post. An upload has an
  uploader, a status (pending, processing, completed, or error), a
  source (unless uploading from a file), and a list of media assets
  (image or video files).

* There is a has-and-belongs-to-many relationship between uploads and
  media assets. An upload can have many media assets, and a media asset
  can belong to multiple uploads. Uploads are joined to media assets
  through a upload_media_assets table.

  An upload could potentially have multiple media assets if it's a Pixiv
  or Twitter gallery. This is not yet implemented (at the moment all
  uploads have one media asset).

  A media asset can belong to multiple uploads if multiple people try
  to upload the same file, or if the same user tries to upload the same
  file more than once.

New features:

* On the upload page, you can press Ctrl+V to paste an URL and immediately upload it.
* You can save files for upload later. Your saved files are at /uploads.

Fixes:

* Improved error messages when uploading invalid files, bad URLs, and
  when forgetting the rating.
2022-01-28 04:13:22 -06:00
evazion
f11c46b4f8 uploads: stop pruning uploads. 2022-01-28 04:13:22 -06:00
evazion
18c08688df Merge pull request #4947 from nonamethanks/fix-duration-twitter
Fix duration check for uploads from twitter
2021-12-29 22:33:59 -06:00
nonamethanks
15bd5f73b3 Fix duration check for uploads from twitter
Some twitter videos near the max duration had some stray
milliseconds that made the check fail.

For example
https://twitter.com/kivo_some_18/status/1152167154059321344?s=20 (nsfw)
has 140.053333 duration.
2021-12-29 14:25:14 +01:00
evazion
a7dc05ce63 Enable frozen string literals.
Make all string literals immutable by default.
2021-12-14 21:33:27 -06:00
evazion
d258790199 uploads: don't delete files of abandoned uploads.
Just leave them. They don't take up that much space and they may be used
in the future if someone else tries to upload the same file.
2021-10-24 04:35:13 -05:00
evazion
bc506ed1b8 uploads: refactor to simplify ugoira-handling and replacements:
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
  the MediaAsset is created, not when the post is created. This way it's
  possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
  with the ugoira frame data already attached to it, instead of returning it
  in the `data` field then passing it around separately in the `context`
  field of the upload.
2021-10-18 05:18:46 -05:00
evazion
1d034a3223 media assets: move more file-handling logic into MediaAsset.
Move more of the file-handling logic from UploadService and
StorageManager into MediaAsset. This is part of refactoring posts and
uploads to allow multiple images per post.
2021-10-18 00:10:29 -05:00
evazion
79fdfa86ae Fix various rubocop warnings. 2021-09-27 00:46:13 -05:00
evazion
ee5cd8330d uploads: fix exception when pruning expired uploads.
Hourly pruning of expired uploads was failing because of nil deference
errors in `media_asset.destroy!`. There are various cases where an
upload doesn't have a media asset, for example when the source url
fails to download or when the upload is of an invalid filetype.
2021-09-22 13:24:27 -05:00
evazion
3d660953d4 Add MediaMetadata model.
Add a model for storing image and video metadata for uploaded files.

Metadata is extracted using ExifTool. You will need to install ExifTool
after this commit. ExifTool 12.22 is the minimum required version
because we use the `--binary` option, which was added in this release.

The MediaMetadata model is separate from the MediaAsset model because
some files contain tons of metadata, and most of it is non-essential.
The MediaAsset model represents an uploaded file and contains essential
metadata, like the file's size and type, while the MediaMetadata model
represents all the other non-essential metadata associated with a file.

Metadata is stored as a JSON column in the database.

ExifTool returns all the file's metadata, not just the EXIF metadata.
EXIF is one of several types of image metadata, hence why we call
it MediaMetadata instead of EXIFMetadata.
2021-09-08 05:00:54 -05:00
evazion
07e23204b6 rubocop: fix various Rubocop warnings. 2021-06-17 04:17:53 -05:00
evazion
4deb8aeea2 uploads: disallow uploading new Flash files.
Flash is dead. It's no longer supported by browsers, it's not
well-supported by emulators, and only two Flash posts were uploaded in
the last year anyway. Old Flash files will continue to exist, but new
Flash uploads will no longer be allowed.
2021-03-31 20:47:35 -05:00
evazion
520b72948f Fix #4695: Raise max video length to match Twitter's (2:20). 2021-02-04 00:01:03 -06:00
evazion
94e125709c users: add Restricted user level.
Add a Restricted user level. Restricted users are level 10, below
Members. New users start out as Restricted if they sign up from a proxy
or an IP recently used by another user.

Restricted users can't update or edit any public content on the site
until they verify their email address, at which point they're promoted
to Member. Restricted users are only allowed to do personal actions
like keep favorites, keep favgroups and saved searches, mark dmails as
read or deleted, or mark forum posts as read.

The restricted state already existed before, the only change here is
that now it's an actual user level instead of a hidden state. Before it
was based on two hidden flags on the user, the `requires_verification`
flag (set when a user signs up from a proxy, etc), and the `is_verified`
flag (set after the user verifies their email). Making it a user level
means that now the Restricted status will be shown publicly.

Introducing a new level below Member means that we have to change every
`is_member?` check to `!is_anonymous` for every place where we used
`is_member?` to check that the current user is logged in.
2021-01-07 17:10:29 -06:00
evazion
ee4516f5fe searchable: refactor searchable_includes.
Pass searchable associations directly to search_attributes instead of
defining them separately in searchable_includes.
2020-12-16 23:57:07 -06:00
evazion
e771c0fca8 searchable: don't automatically include id, created_at, updated_at.
Don't make search methods on models call super in order to search
certain default attributes (id, created_at, updated_at). Simplifies some
magic.
2020-12-16 23:57:07 -06:00
evazion
8d87b1a0c0 models: fix deprecated errors[:base] << "message" calls.
Replace the idiom `errors[:base] << "message"` with
`errors.add(:base, "message")`. The former is deprecated in Rails 6.1.
2020-12-13 04:10:48 -06:00
BrokenEagle
c4009efccd Convert models to use new search includes mechanism 2020-07-27 19:29:18 +00:00
evazion
ed79b623cc Fix #4544: Show limited view of other user's uploads on the upload index.
* Show completed uploads to other users.
* Don't show failed or incomplete uploads to other users.
* Don't show tags to other users.
* Delete completed uploads after 1 hour.
* Delete incomplete uploads after 1 day.
* Delete failed uploads after 3 days.
2020-07-13 19:25:30 -05:00
evazion
8b5ffb4c43 uploads: allow admins to upload videos more than 2 minutes long.
At some point the ability for admins to bypass the video length
restriction got lost.

ref: https://danbooru.donmai.us/forum_topics/14647
2020-06-09 03:08:06 -05:00
evazion
cf88411dce uploads: fix /uploads listing search not working.
Upload#search was declared as an instance method instead of a class
method.
2020-05-24 00:29:19 -05:00
evazion
364343453c uploads: factor out remaining image methods to MediaFile. 2020-05-19 02:42:19 -05:00
evazion
45064853de uploads: move thumbnail generation code to MediaFile.
* Move image thumbnail generation code to MediaFile::Image.
* Move video thumbnail generation code to MediaFile::Video.
* Move ugoira->webm conversion code to MediaFile::Ugoira.

This separates thumbnail generation from the upload process so that it's
possible to generate thumbnails outside of uploads.
2020-05-18 04:19:04 -05:00
evazion
ad02e0f62c posts/index: fix rating:s being included in page title in safe mode.
Fixes bug described in d3e4ac7c17 (commitcomment-39049351)

When dealing with searches, there are several variables we have to keep
in mind:

* Whether tag aliases should be applied.
* Whether search terms should be sorted.
* Whether the rating:s and -status:deleted metatags should be added by
  safe mode and the hide deleted posts setting.

Which of these things we need to do depends on the context:

* We want to apply aliases when actually doing the search, calculating
  the count, looking up the wiki excerpt, recording missed/popular
  searches in Reportbooru, and calculating related tags for the sidebar,
  but not when displaying the raw search as typed by the user (for
  example, in the page title or in the tag search box).
* We want to sort the search when calculating cache keys for fast_count
  or related tags, and when recording missed/popular searches, but not
  in the page title or when displaying the raw search.
* We want to add rating:s and -status:deleted when performing the
  search, calculating the count, or recording missed/popular searches,
  but not when calculating related tags for the sidebar, or when
  displaying the page title or raw search.

Here we introduce normalized_query and try to use it in contexts where
query normalization is necessary. When to use the normalized query
versus the raw unnormalized query is still subtle and prone to error.
2020-05-12 21:47:00 -05:00
evazion
67aab0236d search: apply aliases after parsing searches.
Make PostQueryBuilder apply aliases earlier, immediately after parsing
the search.

On the post index page there are multiple places where we need to apply
aliases:

* When running the search with PostQueryBuilder#build.
* When calculating the search count with PostQueryBuilder#fast_count.
* When calculating the related tags for the sidebar.
* When tracking missed searches and popular searches for Reportbooru.
* When looking up wiki excerpts.

Applying aliases after parsing ensures we only have to apply aliases
once for all of these things.

We also normalize the order of tags in searches and strip repeated tags.
This is so that we have consistent cache keys for fast_count.

* Fixes searches for aliased tags being counted as missed searches (fixes #4433).
* Fixes wiki excerpts not showing up when searching for aliased tags.
2020-05-07 13:53:35 -05:00
evazion
dd0d9dff4a search: move misc search parsing helpers to PostQueryBuilder.
* Move various search parser helper methods (`has_metatag?`,
  `is_single_tag?` et al) from PostSets and the Tag model to
  PostQueryBuilder.

* Fix various minor bugs stemming from trying to check if a search query
  contains certain metatags using regexes or other adhoc techniques.
2020-04-23 01:51:30 -05:00
evazion
dc6575dc76 uploads: fix corrupted image detection.
* Fix corrupted image detection. We were shelling out to vips and trying
  to grep for error messages, but the error message for jpeg files changed.
  Now we load the file in ruby vips, which raises an error on failure.

* Don't attempt to redownload corrupted images. If a download completes
  without any errors yet the downloaded file is corrupt, then something is
  wrong at the source and redownloading is unlikely to help. Let the
  upload fail and the user retry if necessary.

* Validate that all uploads are uncorrupted, including files uploaded
  from a computer, not just files uploaded from a source.
2020-04-13 15:30:17 -05:00
evazion
1e0f6f730a uploads: only let users see their own uploads on /uploads listing. 2020-04-06 14:13:22 -05:00
evazion
743b6f0854 Fix #4377: Save commentary by default.
Remove the "Include artist commentary" checkbox. Commentary is included
by default unless the commentary fields are blank.
2020-04-04 00:46:36 -05:00
evazion
83a0cb0a71 models: refactor class methods into scopes. 2020-02-17 02:10:08 -06:00
evazion
8649ff6dbe API: remove various associated fields included by default.
Remove various associated fields that were included by default on
certain endpoints. API users can use the only param to include the
full association if they need these fields.

* /artists.json: urls.
* /artist_urls.json: artist.
* /comments.json: creator_name and updater_name.
* /notes.json: creator_name.
* /pools.json: creator_name.
* /posts.json: uploader_name, children_ids, pixiv_ugoira_frame_data.
* /post_appeals.json: is_resolved.
* /post_versions.json: updater_name.
* /uploads.json: uploader_name.
2020-02-15 06:17:11 -06:00