Introduce a Source::URL class for parsing URLs from source sites. Refactor the Twitter
source strategy to use it.
This is the first step towards factoring all the URL parsing logic out of source
strategies and moving it to subclasses of Source::URL. Each site will have a subclass
of Source::URL dedicated to parsing URLs from that site. Source strategies will use
these classes to extract information from URLs.
The goal is to simplify source strategies. Most sites have many different URL formats we
have to parse or rewrite, and handling all these cases tends to make source
strategies very complex. Isolating the URL parsing logic from the site scraping logic
should make source strategies easier to maintain.
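To make this concrete, here is a rough sketch of what a per-site subclass
might look like. The helper methods and accessors shown here are illustrative
assumptions, not the final API:

    class Source::URL::Twitter < Source::URL
      # Claim URLs on Twitter's domains for this class.
      def self.match?(url)
        url.domain.in?(%w[twitter.com twimg.com])
      end

      # Extract the status ID from a URL like
      # https://twitter.com/user/status/1234567890.
      def status_id
        path[%r{\A/\w+/status/(\d+)}, 1]
      end
    end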
Fix certain ugoiras having very low-quality webm samples. This happened
because we used a fixed target bitrate of 5 Mbps, which wasn't enough for
high-resolution videos or videos with choppy, hard-to-compress motion,
such as post 5081776 (nsfw).
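The commit doesn't spell out the new encoder settings, but one way to avoid
a fixed bitrate ceiling is constant-quality (CRF) encoding. A minimal sketch,
assuming ffmpeg with libvpx-vp9 (the file paths and CRF value are placeholders):

    # Constant-quality mode: quality stays consistent regardless of how
    # hard the motion is to compress. "-b:v 0" disables the bitrate cap.
    system("ffmpeg", "-i", "ugoira.mp4",
           "-c:v", "libvpx-vp9", "-crf", "30", "-b:v", "0",
           "sample.webm")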
NicoSeiga changed their login flow so that on every login you must enter a
2FA code sent by email. This broke the NicoSeiga strategy. The fix is to
use a static session cookie instead (and hope it doesn't expire and isn't
tied to an IP).
The `nico_seiga_login` and `nico_seiga_password` config settings have
been removed from config/danbooru_default_config.rb and replaced by
`nico_seiga_user_session`. If you run your own Danbooru instance, you
will have to update your config file manually.
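For example, in your local config (a sketch; the cookie value is a
placeholder you would copy from a logged-in NicoSeiga session):

    module Danbooru
      class CustomConfiguration < Configuration
        # The value of the `user_session` cookie from a logged-in NicoSeiga account.
        def nico_seiga_user_session
          "user_session_12345678_0123456789abcdef"
        end
      end
    end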
Introduce a Danbooru::URL class for dealing with URLs. This is a wrapper
around Addressable::URI that adds some additional helper methods. Most
significantly, the `parse` method only allows valid http/https URLs, and
it returns nil instead of raising an exception when the URL is invalid.
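In other words (return values shown as comments, following the behavior
described above):

    Danbooru::URL.parse("https://twitter.com/foo")  # => #<Danbooru::URL ...>
    Danbooru::URL.parse("not a url")                # => nil
    Danbooru::URL.parse("ftp://example.com/file")   # => nil (not http/https)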
Fixes a bug where the Foundation source strategy failed because http.rb
automatically sent a `Content-Length: 0` header with all GET requests,
which caused Foundation to return a 400 Bad Request error. This http.rb
behavior was fixed in 5.x, so upgrading resolves the issue.
http.rb 5.x has a breaking change where it now includes the request object
inside the response object, which we have to handle in a few places.
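For example (assuming http.rb 5.x's `Response#request` accessor):

    require "http"

    response = HTTP.get("https://danbooru.donmai.us")
    response.request  # => the HTTP::Request that produced this response;
                      #    code that logs or serializes responses now has
                      #    to handle this extra object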
Add data attributes to thumbnails on the /uploads, /upload_media_assets,
and /media_assets pages. Add a `data-is-posted` attribute for styling
thumbnails based on whether they've already been posted.
Make one fewer API call when fetching the image URLs for a Pixiv post. The
`is_ugoira?` check in `image_urls` caused an extra API call when fetching
the image URLs for a non-ugoira post.
API calls to Pixiv take around 800ms, so this reduces the minimum upload
time for Pixiv posts from ~1.6 seconds (two calls) to ~0.8 seconds.
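A sketch of the shape of the fix (method names are assumptions): defer the
extra API call until we know the work is actually an ugoira.

    def image_urls
      if api_illust["illustType"] == 2  # 2 = ugoira in Pixiv's ajax API
        ugoira_image_urls               # the second API call happens only here
      else
        api_pages.map { |page| page.dig("urls", "original") }
      end
    end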
Fix an error when trying to upload a file larger than the file size
limit. In this case we tried to dump the whole HTTP response into the
error message, including the binary file itself, which raised an
exception because the message contained null bytes.
Fix uploads for NicoSeiga sources not working because the strategy
returned URLs like the one below in the list of image_urls, which
require a login to download:
https://seiga.nicovideo.jp/image/source/10315315
Also fix certain URLs like https://dic.nicovideo.jp/oekaki/52833.png not
working, because they didn't contain an image ID and the image_urls
method returned an empty list in this case.
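A sketch of the fallback for the second case (helper names are illustrative):

    def image_urls
      if image_id.blank?
        [url]        # e.g. https://dic.nicovideo.jp/oekaki/52833.png, returned as-is
      else
        [image_url]  # hypothetical helper that builds a downloadable image URL
      end
    end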
Fix the null source strategy so that it no longer sets the page URL. The
page URL is expected to be nil when we can't determine the page containing
the image URL.
This fixes the upload_media_assets.page_url field being filled in for
uploads from unknown sites.
Raise the timeout for downloading files from the source to 60 seconds globally.
We previously had a lower timeout because uploads were processed in the
foreground when not using the bookmarklet, and we didn't want to tie up
Puma worker processes with slow downloads. Now that all uploads are
processed in the background, we can afford a higher timeout.
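In http.rb terms, the setting amounts to something like this:

    require "http"

    # Allow up to 60 seconds for the whole download (http.rb's global timeout).
    HTTP.timeout(60).get("https://example.com/image.png")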
Fix an exception thrown by app/logical/pixiv_ajax_client.rb:406 when a
Pixiv API call failed with a network error. In this case we tried to log
the response body, but this failed because we returned a faked HTTP
response with an empty string for the body, which the http.rb library
rejected because it expects an IO-like object for the body.
Fix URLs being normalized after checking for duplicates rather than
before, which meant that URLs that differed in capitalization weren't
detected as duplicates.
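In outline (a toy example; real URL normalization does more than lowercasing):

    # Normalize first, then deduplicate, so URLs differing only in case
    # collapse into one entry.
    urls.map { |url| url.downcase }.uniq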
Fix a potential exploit where private information could be leaked if
it was contained in the error message of an unexpected exception.
For example, NoMethodError contains a raw dump of the object in the
error message, which could leak private user data if you could force a
User object to raise a NoMethodError.
Fix the error page to only show known-safe error messages from expected
exceptions, not unknown error messages from unexpected exceptions.
API changes:
* JSON errors now have a `message` param. The message will be blank for unknown exceptions. (See the JSON example after the XML sample below.)
* XML errors have a new format. This is a breaking change. They now look like this:
<result>
  <success type="boolean">false</success>
  <error>PaginationExtension::PaginationError</error>
  <message>You cannot go beyond page 5000.</message>
  <backtrace type="array">
    <backtrace>app/logical/pagination_extension.rb:54:in `paginate'</backtrace>
    <backtrace>app/models/application_record.rb:17:in `paginate'</backtrace>
    <backtrace>app/logical/post_query_builder.rb:529:in `paginated_posts'</backtrace>
    <backtrace>app/logical/post_sets/post.rb:95:in `posts'</backtrace>
    <backtrace>app/controllers/posts_controller.rb:22:in `index'</backtrace>
  </backtrace>
</result>
instead of like this:
<result success="false">You cannot go beyond page 5000.</result>
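For comparison, a JSON error now looks roughly like this (the fields follow
the XML example above; `message` is blank for unexpected exceptions):

    {
      "success": false,
      "error": "PaginationExtension::PaginationError",
      "message": "You cannot go beyond page 5000.",
      "backtrace": ["app/logical/pagination_extension.rb:54:in `paginate'", "..."]
    }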
Make uploads faster by generating and saving thumbnails in parallel.
We generate the thumbnails in parallel, then send each one to the
backend image servers, also in parallel.
Most images have 5 variants: 'preview' (150x150), 180x180, 360x360,
720x720, and 'sample' (850px width). Plus the original file, that's 6
files we have to save. In production we have 2 image servers, so each
file has to be saved to both remote servers. Doing all this in parallel
should make uploads significantly faster.
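A sketch of the approach with plain threads (the actual code may use a
different concurrency primitive; `variants` and `image_servers` are
illustrative names):

    # Generate every variant in parallel...
    variants.map { |variant| Thread.new { variant.generate! } }.each(&:join)

    # ...then copy every file to every image server in parallel.
    variants.product(image_servers).map do |variant, server|
      Thread.new { server.store(variant.file) }
    end.each(&:join)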
Lock the post during replacement to ensure we have the latest version of
the tags and to ensure nobody else can modify the post until after the
replacement is finished.
Perform the replacement in a before_create callback so that it runs in a
transaction: if it fails, the transaction rolls back and the replacement
record isn't created.
Doing the replacement in a transaction isn't ideal, though. For one, it
can hold the transaction open a long time, which is bad for the database.
For another, if the transaction rolls back, the database changes are
undone, but a replacement file already saved to disk is not, which could
leave a dangling file.
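Rough shape of the flow (model and method names are approximate):

    class PostReplacement < ApplicationRecord
      belongs_to :post
      before_create :process_replacement!

      def process_replacement!
        post.lock!  # SELECT ... FOR UPDATE; also reloads the latest tags
        # ... replace the file and update the post here ...
        # before_create runs inside the save's transaction, so if this
        # raises, the transaction rolls back and the PostReplacement
        # record is never created.
      end
    end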
* Fix a bug where creating posts failed if IQDB wasn't configured.
* Fix broken Skeb test caused by changed URL.
* Fix broken IP geolocation tests caused by API returning different data.
* Fix broken post regeneration tests.
* Move replacement tests from test/unit/upload_service_test.rb to
test/functional/post_replacement_controller_test.rb
* Move UploadService::Replacer to PostReplacementProcessor.
* Fix a minor bug where if you used the API to replace a post with a file,
the replacement would fail unless you passed an empty string for the
replacement_url.
Fix the upload page so that it shows similar images (IQDB matches) for
files uploaded from your computer. Before this only worked for files
uploaded from a source.
* Fix `UploadService is not a class` error.
* Update the list of available job classes (remove UploadPreprocessorDelayedStartJob
  and UploadServiceDelayedStartJob; add ProcessUploadJob).
Rework the upload process so that files are saved to Danbooru first
before the user starts tagging the upload.
The main user-visible change is that you have to select the file first
before you can start tagging it. Saving the file first lets us fix a
number of problems:
* We can check for dupes before the user tags the upload.
* We can perform dupe checks and show preview images for users not using the bookmarklet.
* We can show preview images without having to proxy images through Danbooru.
* We can show previews of videos and ugoira files.
* We can reliably show the filesize and resolution of the image.
* We can let the user save files to upload later.
* We can get rid of a lot of spaghetti code related to preprocessing
uploads. This was the cause of most weird "md5 confirmation doesn't
match md5" errors.
(Not all of these are implemented yet.)
Internally, uploading is now a two-step process: first we create an upload
object, then we create a post from the upload. This is how it works:
* The user goes to /uploads/new and chooses a file or pastes a URL into
the file upload component.
* The file upload component calls `POST /uploads` to create an upload.
* `POST /uploads` immediately returns a new upload object in the `pending` state.
* Danbooru starts processing the upload in a background job (downloading,
resizing, and transferring the image to the image servers).
* The file upload component polls `/uploads/$id.json`, checking the
upload `status` until it returns `completed` or `error`.
* When the upload status is `completed`, the user is redirected to /uploads/$id.
* On the /uploads/$id page, the user can tag the upload and submit it.
* The upload form calls `POST /posts` to create a new post from the upload.
* The user is redirected to the new post.
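From an API client's point of view, the flow above looks roughly like this
(`api` stands in for any JSON HTTP client; the exact parameter names are
assumptions, not the documented API):

    upload = api.post("/uploads.json", upload: { source: "https://example.com/image.png" })

    until %w[completed error].include?(upload["status"])
      sleep 1
      upload = api.get("/uploads/#{upload["id"]}.json")
    end

    # Create the post from the finished upload.
    api.post("/posts.json", upload_id: upload["id"],
             post: { tag_string: "tagme", rating: "q" })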
This is the data model:
* An upload represents a set of files uploaded to Danbooru by a user.
Uploaded files don't have to belong to a post. An upload has an
uploader, a status (pending, processing, completed, or error), a
source (unless uploading from a file), and a list of media assets
(image or video files).
* There is a has-and-belongs-to-many relationship between uploads and
media assets. An upload can have many media assets, and a media asset
can belong to multiple uploads. Uploads are joined to media assets
through an upload_media_assets table.
An upload could potentially have multiple media assets if it's a Pixiv
or Twitter gallery. This is not yet implemented (at the moment all
uploads have one media asset).
A media asset can belong to multiple uploads if multiple people try
to upload the same file, or if the same user tries to upload the same
file more than once.
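The associations, roughly (modeled as a has_many :through in Rails terms):

    class Upload < ApplicationRecord
      belongs_to :uploader, class_name: "User"
      has_many :upload_media_assets
      has_many :media_assets, through: :upload_media_assets
    end

    class MediaAsset < ApplicationRecord
      has_many :upload_media_assets
      has_many :uploads, through: :upload_media_assets
    end

    class UploadMediaAsset < ApplicationRecord
      belongs_to :upload
      belongs_to :media_asset
    end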
New features:
* On the upload page, you can press Ctrl+V to paste a URL and immediately upload it.
* You can save files for upload later. Your saved files are at /uploads.
Fixes:
* Improved error messages when uploading invalid files or bad URLs, and
  when forgetting the rating.
Fix the paginator not appearing when all posts on the page are hidden,
because of deleted posts, banned artists, censored tags, or non-safe
posts in safe mode. This prevented navigating to the next or previous
page.