Add stricter username rules:
* Only allow usernames to contain basic letters, numbers, CJK characters, underscores, dashes and periods.
* Don't allow names to start or end with punctuation.
* Don't allow names to have multiple underscores in a row.
* Don't allow active users to have names that look like deleted users (e.g. "user_1234").
* Don't allow emoji or any other Unicode characters. Chinese, Japanese, and Korean
characters are currently grandfathered in but will be disallowed in the future.
Users with an invalid name will be shown a permanent sitewide banner until they change their name.
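A hedged sketch of roughly what these rules look like as a validation in Ruby (the actual character classes and checks in the codebase may differ, and length limits are omitted):

    # Illustrative only. WORD covers basic letters, numbers, and CJK characters;
    # punctuation can't start or end the name, and underscores can't repeat.
    WORD = /[a-zA-Z0-9\p{Han}\p{Hiragana}\p{Katakana}\p{Hangul}]/
    NAME_REGEX = /\A#{WORD}((#{WORD}|_(?!_)|[.-])*#{WORD})?\z/

    def name_acceptable?(name)
      name.match?(NAME_REGEX) && !name.match?(/\Auser_[0-9]+\z/i) # not like a deleted user
    end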
The median username length is 8 characters. The 99th percentile is 18
characters, and the 99.9th percentile is 24 characters. About 750 users have
a name more than 24 characters long.
This doesn't do anything about existing users with long usernames.
Note that this is the length in Unicode codepoints, not grapheme
clusters. Some characters and emoji render as a single glyph but are
composed of multiple codepoints.
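For example, in Ruby:

    "e\u0301".length                      # => 2 codepoints ("e" + combining acute accent)
    "e\u0301".grapheme_clusters.length    # => 1 user-perceived character
    "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}".length                    # 👨‍👩‍👧 => 5 codepoints
    "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}".grapheme_clusters.length  # => 1 grapheme cluster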
Fix `Cannot write log file 'ffmpeg2pass-0.log' for pass-1 encoding: Permission denied` error
when uploading ugoira files. This happened because 2-pass encoding writes its log file to
the current directory by default, which fails in production because the Docker image's
default working directory, /danbooru, is read-only.
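A minimal sketch of one way to handle this, assuming the fix points ffmpeg's pass log at a writable temp directory (the actual change may differ; the codec and bitrate settings here are placeholders):

    require "tmpdir"

    # Illustrative helper, not the real implementation: run 2-pass encoding with
    # the pass log written to a temp directory instead of the read-only CWD.
    def encode_two_pass(input, output)
      Dir.mktmpdir do |dir|
        passlog = File.join(dir, "ffmpeg2pass")
        system("ffmpeg", "-y", "-i", input, "-c:v", "libvpx-vp9", "-b:v", "2M",
               "-pass", "1", "-passlogfile", passlog, "-an", "-f", "null", "/dev/null",
               exception: true)
        system("ffmpeg", "-y", "-i", input, "-c:v", "libvpx-vp9", "-b:v", "2M",
               "-pass", "2", "-passlogfile", passlog, output, exception: true)
      end
    end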
Also fix the uploader uploading all images when trying to upload only a
single image from a multi-image work. This was caused by `image_urls`
incorrectly returning all images when the source strategy was given a
URL for a single image.
Add `#basename`, `#filename`, and `#file_ext` utility methods to
Danbooru::URL and change a few places to use them. This simplifies
parsing filenames from source URLs.
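Illustrative usage; which of `basename`/`filename` includes the extension is an assumption here, not taken from the code:

    url = Danbooru::URL.parse("https://cdn.example.com/images/artwork_p0.png?large=1")

    url.basename  # => "artwork_p0.png"  (assumed: last path component with extension)
    url.filename  # => "artwork_p0"      (assumed: name without the extension)
    url.file_ext  # => "png"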
Introduce a Source::URL class for parsing URLs from source sites. Refactor the Twitter
source strategy to use it.
This is the first step towards factoring all the URL parsing logic out of source
strategies and moving it to subclasses of Source::URL. Each site will have a subclass
of Source::URL dedicated to parsing URLs from that site. Source strategies will use
these classes to extract information from URLs.
This is to simplify source strategies. Most sites have many different URL formats we have
to parse or rewrite, and handling all these different cases tends to make source
strategies very complex. Isolating the URL parsing logic from the site scraping logic
should make source strategies easier to maintain.
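A rough sketch of the shape this takes (the class layout and helper names here are hypothetical, not the real API):

    class Source::URL::Twitter < Source::URL
      attr_reader :username, :status_id

      # Does this URL belong to Twitter?
      def self.match?(url)
        url.host.in?(%w[twitter.com mobile.twitter.com pbs.twimg.com])
      end

      # Extract the username and status ID from page URLs like
      # https://twitter.com/<username>/status/<id>.
      def parse
        if path =~ %r{\A/([^/]+)/status/(\d+)}
          @username = $1
          @status_id = $2
        end
      end
    end

The Twitter source strategy can then ask this object for `username` and `status_id` instead of pattern-matching URLs itself.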
Fix certain ugoiras having very low quality webm samples. This was
because we had a target bitrate of 5 Mbps, but this wasn't enough for
videos that were high resolution or that had choppy, hard-to-compress
motion, such as post 5081776 (nsfw).
NicoSeiga changed it so that on every login, you must enter a 2FA code
sent by email. This broke the NicoSeiga strategy. The fix is to just use
a static session cookie instead (and hope it doesn't expire, and isn't
tied to an IP).
The `nico_seiga_login` and `nico_seiga_password` config settings have
been removed from config/danbooru_default_config.rb and replaced by
`nico_seiga_user_session`. If you run your own Danbooru instance, you
will have to update your config file manually.
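For example, in config/danbooru_local_config.rb (assuming the usual CustomConfiguration subclass; the cookie value is a placeholder):

    module Danbooru
      class CustomConfiguration < Configuration
        # The value of the `user_session` cookie from a logged-in NicoSeiga account.
        def nico_seiga_user_session
          "user_session_..." # placeholder; copy the real cookie value here
        end
      end
    end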
Introduce a Danbooru::URL class for dealing with URLs. This is a wrapper
around Addressable::URI that adds some additional helper methods. Most
significantly, the `parse` method only allows valid http/https URLs, and
it returns nil instead of raising an exception when the URL is invalid.
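For example:

    Danbooru::URL.parse("https://www.example.com/posts/1?tags=touhou")  # => a Danbooru::URL instance
    Danbooru::URL.parse("ftp://example.com/file.zip")                   # => nil (not http/https)
    Danbooru::URL.parse("not a url")                                    # => nil instead of raising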
Fixes a bug where the Foundation source strategy failed because http.rb
automatically sent a `Content-Length: 0` header with all GET requests,
which caused Foundation to return a 400 Bad Request error. This behavior
was fixed in http.rb 5.x.
http.rb 5.x has a breaking change where it now includes the request object
inside the response object, which we have to handle in a few places.
Add data attributes to thumbnails on the /uploads, /upload_media_assets,
and /media_assets pages. Add a `data-is-posted` attribute for styling
thumbnails based on whether they've already been posted.
Make one fewer API call when fetching the image URLs for a Pixiv post. The
`is_ugoira?` check in `image_urls` caused us to do an extra API call
when fetching the image URLs for a non-ugoira post.
API calls to Pixiv take around 800ms, so this reduces the minimum upload
time for Pixiv posts from ~1.6 seconds (two calls) to ~0.8 seconds.
Fix an error when trying to upload a file larger than the file size
limit. In this case we tried to dump the whole HTTP response into the
error message, including the binary file itself, which caused an
exception because the error message contained null bytes.
Fix uploads for NicoSeiga sources not working because the strategy
returned URLs like the one below in the list of image_urls, which
require a login to download:
https://seiga.nicovideo.jp/image/source/10315315
Also fix certain URLs like https://dic.nicovideo.jp/oekaki/52833.png not
working, because they didn't contain an image ID and the image_urls
method returned an empty list in this case.
Fix the null source strategy setting the page URL. The page URL is
expected to be nil when we can't determine the page containing the image URL.
Fixes the upload_media_assets.page_url field being filled for uploads
from unknown sites.
Raise the timeout for downloading files from the source to 60 seconds globally.
We previously had a lower timeout because uploads were processed in the
foreground when not using the bookmarklet, and we didn't want to tie up
Puma worker processes with slow downloads. Now that all uploads are
processed in the background, we can have a higher timeout.
Fix an exception thrown by app/logical/pixiv_ajax_client.rb:406 when a
Pixiv API call failed with a network error. In this case we tried to log
the response body, but this failed because we returned a faked HTTP
response with an empty string for the body, which http.rb rejected
because it expects an IO-like object as the body.
Fix URLs being normalized after checking for duplicates rather than
before, which meant that URLs that differed in capitalization weren't
detected as duplicates.
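The ordering issue in miniature (`normalize` is a stand-in for whatever normalization is applied):

    urls = ["HTTPS://Example.com/image.jpg", "https://example.com/image.jpg"]

    urls.uniq.map { |url| normalize(url) }  # before: both survive, since uniq ran on the raw strings
    urls.map { |url| normalize(url) }.uniq  # after: normalize first, so the duplicate collapses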
Fix a potential exploit where private information could be leaked if
it was contained in the error message of an unexpected exception.
For example, NoMethodError contains a raw dump of the object in the
error message, which could leak private user data if you could force a
User object to raise a NoMethodError.
Fix the error page to only show known-safe error messages from expected
exceptions, not unknown error messages from unexpected exceptions.
API changes:
* JSON errors now have a `message` param. The message will be blank for unknown exceptions.
* XML errors have a new format. This is a breaking change. They now look like this:
  <result>
    <success type="boolean">false</success>
    <error>PaginationExtension::PaginationError</error>
    <message>You cannot go beyond page 5000.</message>
    <backtrace type="array">
      <backtrace>app/logical/pagination_extension.rb:54:in `paginate'</backtrace>
      <backtrace>app/models/application_record.rb:17:in `paginate'</backtrace>
      <backtrace>app/logical/post_query_builder.rb:529:in `paginated_posts'</backtrace>
      <backtrace>app/logical/post_sets/post.rb:95:in `posts'</backtrace>
      <backtrace>app/controllers/posts_controller.rb:22:in `index'</backtrace>
    </backtrace>
  </result>
  instead of like this:
  <result success="false">You cannot go beyond page 5000.</result>
Make uploads faster by generating and saving thumbnails in parallel: we
generate the thumbnails concurrently, then send each one to the backend
image servers concurrently.
Most images have 5 variants: 'preview' (150x150), 180x180, 360x360,
720x720, and 'sample' (850px width). Plus the original file, that's 6
files we have to save. In production we have 2 image servers, so we have
to save each file twice, to 2 remote servers. Doing all this in parallel
should make uploads significantly faster.
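A simplified sketch of the idea (`generate_variant`, `image_servers`, and friends are stand-ins, not the real API):

    variants = %w[preview 180x180 360x360 720x720 sample]

    # Generate every thumbnail variant concurrently.
    files = variants.map do |variant|
      Thread.new { generate_variant(original_file, variant) }
    end.map(&:value)

    # Then push the original plus all variants to every image server concurrently.
    (files + [original_file]).flat_map do |file|
      image_servers.map { |server| Thread.new { server.store(file) } }
    end.each(&:join)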
Lock the post during replacement to ensure we have the latest version of
the tags and to ensure nobody else can modify the post until after the
replacement is finished.
Perform the replacement in a before_create callback so that it runs in a
transaction; if it fails, the transaction rolls back and the replacement
record isn't created.
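A hedged sketch of that shape (heavily simplified; the real callback and model do much more):

    class PostReplacement < ApplicationRecord
      belongs_to :post

      # before_create runs inside the same transaction as the INSERT, so if this
      # raises, the transaction rolls back and no replacement record is created.
      before_create :process_replacement!

      def process_replacement!
        post.lock!  # SELECT ... FOR UPDATE: reloads the latest tags and blocks
                    # other writers until the transaction finishes
        # ... download the new file, swap it in, update the post ...
      end
    end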
Doing the replacement in a transaction isn't great because, for one, it
could hold the transaction open for a long time, which isn't good for
the database. And two, if the transaction rolls back, the database
changes will be undone, but a replacement file that has already been
saved to disk won't be, which could leave a dangling file.