Change how artist URLs are normalized in artist entries: don't silently
convert image URLs to profile URLs. For example, if someone puts a Pixiv
image URL in an artist entry, don't fetch the source behind the scenes and
convert it into a profile URL in the `normalized_url` field.
We did this because, years ago, it was standard practice to put image URLs
in artist entries: Pixiv image URLs used to contain the artist's username,
which made them useful for artist finding. When Pixiv changed its URLs so
that they no longer contained the username, we dealt with it by adding a
`normalized_url` column to artist_urls and silently converting image URLs to
profile URLs in that field. This is no longer necessary because we no longer
put image URLs in artist entries in the first place.
Now the `profile_url` method in `Source::URL` is used to normalize URLs in artist
entries. This lets us parse various profile URL formats and normalize them into a
single canonical form.
This also removes the `normalize_for_artist_finder` method from source
strategies. Instead, the `profile_url` method is used for artist finding, so
the profile URL returned by the source strategy needs to match the URL in the
artist entry for artist finding to work.
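For example, something like the following should hold for `profile_url` (the
exact inputs and canonical form here are illustrative, not necessarily what
the parser produces):

```ruby
# Two different Pixiv profile URL formats normalize to one canonical form.
Source::URL.parse("https://www.pixiv.net/member.php?id=9948").profile_url
# => "https://www.pixiv.net/users/9948"

Source::URL.parse("https://www.pixiv.net/en/users/9948").profile_url
# => "https://www.pixiv.net/users/9948"
```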
Add stricter username rules:
* Only allow usernames to contain basic letters, numbers, CJK characters, underscores, dashes and periods.
* Don't allow names to start or end with punctuation.
* Don't allow names to have multiple underscores in a row.
* Don't allow active users to have names that look like deleted users (e.g. "user_1234").
* Don't allow emoji or any other Unicode characters except for Chinese, Japanese, and Korean
characters. CJK characters are currently grandfathered in but will be disallowed in the future.
Users with an invalid name will be shown a permanent sitewide banner until they change their name.
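A rough sketch of a validation implementing these rules (the regexes, error
messages, and method name are illustrative, not the exact implementation):

```ruby
class User < ApplicationRecord
  validate :validate_name, if: :name_changed?

  def validate_name
    # Basic letters, numbers, CJK characters, underscores, dashes, periods.
    errors.add(:name, "contains invalid characters") unless name.match?(/\A[a-zA-Z0-9_.\-\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]+\z/)
    # No leading or trailing punctuation.
    errors.add(:name, "can't begin or end with punctuation") if name.match?(/\A[_.\-]|[_.\-]\z/)
    # No multiple underscores in a row.
    errors.add(:name, "can't contain multiple underscores in a row") if name.include?("__")
    # No names that look like deleted users.
    errors.add(:name, "can't look like a deleted user") if name.match?(/\Auser_[0-9]+\z/i)
  end
end
```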
Add a limit so that users can't create new uploads if they already have more
than 250 images queued for upload.
For example, if you upload a Pixiv post that has 200 images, you'll have 200
images queued for upload. This count goes down as the images are processed.
If you exceed the limit, trying to create new uploads will return an error.
This prevents a single user from overwhelming the site by uploading too many
images at once; otherwise the job queue could get backed up and be unable to
process new uploads from other users until the existing uploads finished.
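The check might look something like this (the constant, association, and
error wording are assumptions, not the actual code):

```ruby
class Upload < ApplicationRecord
  MAX_QUEUED_IMAGES = 250 # assumed constant name

  belongs_to :uploader, class_name: "User"
  validate :validate_queued_image_limit, on: :create

  def validate_queued_image_limit
    # Images still being processed across the user's pending uploads.
    queued = uploader.uploads.where(status: "processing").sum(:media_asset_count)
    if queued > MAX_QUEUED_IMAGES
      errors.add(:base, "You have too many images queued for upload. Try again later.")
    end
  end
end
```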
* Remove unnecessary trailing slashes when artist URLs are saved.
* Automatically add `http://` to new artist URLs if it's missing (before
this was an error; now it's automatically fixed).
Introduce a Danbooru::URL class for dealing with URLs. This is a wrapper
around Addressable::URI that adds some additional helper methods. Most
significantly, the `parse` method only allows valid http/https URLs, and
it returns nil instead of raising an exception when the URL is invalid.
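A minimal sketch of such a wrapper (details assumed; the real class has more
helpers):

```ruby
require "addressable/uri"

module Danbooru
  class URL
    attr_reader :url

    # Returns a Danbooru::URL, or nil if the string isn't a valid http(s) URL.
    def self.parse(string)
      url = Addressable::URI.parse(string)
      return nil unless %w[http https].include?(url&.scheme&.downcase)
      new(url)
    rescue Addressable::URI::InvalidURIError
      nil
    end

    def initialize(url)
      @url = url
    end
  end
end

Danbooru::URL.parse("https://example.com/foo") # => #<Danbooru::URL ...>
Danbooru::URL.parse("not a url")               # => nil
```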
Allow uploading multiple files from your computer at once.
The maximum is 100 files at once. A 50MB size limit still applies to the
whole upload; this limit is enforced at the Nginx level.
The upload widget no longer shows a thumbnail preview of the uploaded
file. This is because there isn't room for it in a multi-file upload,
and because the next page will show a preview anyway after the files are
uploaded.
Direct file uploads are processed synchronously, so they may be slow.
API change: the `POST /uploads` endpoint now expects the param to be
`upload[files][]`, not `upload[file]`.
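For example, a multi-file upload with Ruby's standard library might look like
this (hypothetical client code; the endpoint and field name are per the
change above):

```ruby
require "net/http"

uri = URI("https://danbooru.donmai.us/uploads.json")
files = [File.open("1.jpg", "rb"), File.open("2.jpg", "rb")]

request = Net::HTTP::Post.new(uri)
request.set_form(files.map { |f| ["upload[files][]", f] }, "multipart/form-data")

response = Net::HTTP.start(uri.hostname, uri.port, use_ssl: true) do |http|
  http.request(request)
end
```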
Followup to 093a808a3. Using a NOT EXISTS clause is much faster than the
`LEFT OUTER JOIN posts WHERE posts.id IS NULL` clause generated by
`.where.missing(:post)`.
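For illustration (model and column names assumed):

```ruby
# Before: generates `LEFT OUTER JOIN posts ... WHERE posts.id IS NULL`.
MediaAsset.where.missing(:post)

# After: an equivalent NOT EXISTS subquery, which Postgres plans much better.
MediaAsset.where("NOT EXISTS (SELECT 1 FROM posts WHERE posts.md5 = media_assets.md5)")
```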
Fix an error when trying to upload a file larger than the file size limit.
In this case we tried to dump the whole HTTP response into the error message;
the response included the binary file itself, which caused an exception
because it contained null bytes.
Make the "completed" status for an upload mean "at least one file in the
upload successfully completed". The "error" status means "all files in
the upload failed".
This means that when an upload has multiple assets and some succeed and
some fail, the whole upload is considered completed. This can happen
when uploading multiple files and some files are over the size limit,
for example. The upload is considered failed only if all files in the
upload fail.
This fixes an issue where an upload of a single file that failed because it
was over the size limit wasn't marked as failed.
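A sketch of the status rollup under these rules (method and association names
assumed):

```ruby
class Upload < ApplicationRecord
  has_many :upload_media_assets

  # "error" only if every file failed; "completed" once every file has
  # finished processing and at least one succeeded.
  def update_status!
    if upload_media_assets.all?(&:failed?)
      update!(status: "error")
    elsif upload_media_assets.none?(&:processing?)
      update!(status: "completed")
    end
  end
end
```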
Make the upload page automatically detect when a source URL has multiple images
and let the user choose which images to post.
For example, when uploading a Twitter or Pixiv post with more than one
image, we direct the user to a page that shows a thumbnail for each image and
lets them choose which ones to post.
This is similar to the batch upload page, except we actually download each image
in the background, instead of just hotlinking or proxying the thumbnails through
our servers. This avoids various problems with proxying and makes new features
possible, like showing which images in the batch have already been posted.
This page shows each individual file you've uploaded. This is different
from the regular uploads page because files in multi-file uploads are
not grouped together.
* Save the filename for files uploaded from disk. This could be used in
the future to extract source data if the filename is from a known site.
* Save both the image URL and the page URL for files uploaded from
source. This is needed for multi-file uploads. The image URL is the
URL of the file actually downloaded from the source. This can be
different from the URL given by the user, if the user tried to upload
a sample URL and we automatically changed it to the original URL. The
page URL is the URL of the page containing the image. We don't always
know this; for example, if someone uploads a Twitter image without the
bookmarklet, we can't find the page URL.
* Add a fix script to backfill URLs for existing uploads. For file
uploads, the filename will be set to "unknown.jpg". For source
uploads, we fetch the source data again to get the image and page
URLs. This may fail for uploads that have been deleted from the
source since uploading.
This is needed for multi-file uploads. We need to know both the image URL
and the page URL to set the post's source correctly when converting an
upload media asset into a post.
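For instance, the conversion might pick the source along these lines (the
helper and field names are hypothetical, not the actual code):

```ruby
# Prefer the page URL, which is human-visitable, over the raw image URL.
def source_for(upload_media_asset)
  upload_media_asset.page_url || upload_media_asset.source_url
end
```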
Make upload_media_assets.media_asset_id nullable in order to support
multi-file uploads. The media asset will be null while the image is
still being downloaded from the source.
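The schema change itself is a one-line migration (sketch; class name and
Rails version assumed):

```ruby
class MakeUploadMediaAssetIdNullable < ActiveRecord::Migration[7.0]
  def change
    change_column_null :upload_media_assets, :media_asset_id, true
  end
end
```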
Add new columns to support multi-file uploads:
* uploads.media_asset_count - the number of media assets attached to this upload.
* upload_media_assets.status - the status of each media asset attached to this upload (processing, active, failed)
* upload_media_assets.source_url - the source of each media asset attached to this upload
* upload_media_assets.error - the error message if uploading the media asset failed
* Group URLs by site.
* List most important URLs first and dead URLs last.
* Add site icons next to URLs.
* Put other names and group name beneath the artist name, instead of beneath the wiki.
Fix URLs being normalized after checking for duplicates rather than before,
which meant that URLs differing only in capitalization weren't detected as
duplicates.
Fix two issues that could lead to duplicate errors when creating posts:
* Make the submit button on the upload form disable itself on submit, to
prevent accidental double-submission errors.
* Fix a race condition when checking for MD5 duplicates. MD5 uniqueness is checked on both
the Rails level, with a uniqueness validation, and on the database level, with a unique
index on the md5 column. Creating a post could fail with an ActiveRecord::RecordNotUnique
error if the uniqueness validation in Rails passed, but the uniqueness constraint in the
database failed. In this case, we catch the RecordNotUnique error and convert it to a
Rails validation error so we can treat it like a normal validation failure.
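In essence (error message assumed):

```ruby
begin
  post.save
rescue ActiveRecord::RecordNotUnique
  # The unique index on posts.md5 fired even though the Rails-level
  # uniqueness validation passed; surface it as a normal validation error.
  post.errors.add(:md5, "has already been taken")
end
```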
`string.mb_chars.downcase` was used to correctly downcase Unicode characters
in Ruby < 2.4. It hasn't been needed since Ruby 2.4, when `String#downcase`
became Unicode-aware.
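For example:

```ruby
# Ruby >= 2.4: String#downcase handles Unicode natively.
"ÄËÏ".downcase               # => "äëï"

# The old ActiveSupport workaround, no longer needed:
"ÄËÏ".mb_chars.downcase.to_s # => "äëï"
```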
Make uploads faster by generating and saving thumbnails in parallel: we
generate each thumbnail concurrently, then send each one to the backend
image servers concurrently.
Most images have 5 variants: 'preview' (150x150), 180x180, 360x360, 720x720,
and 'sample' (850px width). Including the original file, that's 6 files we
have to save. In production we have 2 image servers, so each file has to be
saved twice, once to each remote server. Doing all this in parallel should
make uploads significantly faster.
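A sketch of the idea using threads (the helper names here are hypothetical,
and the real code may use a different concurrency primitive):

```ruby
variants = %w[preview 180x180 360x360 720x720 sample]

# Generate every variant concurrently, then collect the resulting files.
files = variants.map { |v| Thread.new { generate_variant(v) } }.map(&:value)
files << original_file # the original is saved as-is

# Save each file to each image server concurrently.
files.flat_map do |file|
  image_servers.map { |server| Thread.new { server.store(file) } }
end.each(&:join)
```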