Commit Graph

2623 Commits

Author SHA1 Message Date
evazion
a160a3acce users: add stricter username rules.
Add stricter username rules:

* Only allow usernames to contain basic letters, numbers, CJK characters, underscores, dashes and periods.
* Don't allow names to start or end with punctuation.
* Don't allow names to have multiple underscores in a row.
* Don't allow active users to have names that look like deleted users (e.g. "user_1234").
* Don't allow emoji or any other Unicode characters except for Chinese, Japanese, and Korean
  characters. CJK characters are currently grandfathered in but will be disallowed in the future.

Users with an invalid name will be shown a permanent sitewide banner until they change their name.
2022-03-05 01:08:53 -06:00
evazion
b4620f561c users: lower max username length to 25 characters.
The median username length is 8 characters. The 99% percentile is 18
characters. The 99.9% percentile is 24 characters. About 750 users have
a name more than 24 characters long.

This doesn't do anything about existing users with long usernames.

Note that this is the length in Unicode codepoints, not grapheme
clusters. Some Unicode characters and emoji may be a single glyph but
composed of multiple codepoints.
2022-03-01 21:23:21 -06:00
evazion
99221af855 ugoiras: fix regression in 7031fd13d.
Fix `Cannot write log file 'ffmpeg2pass-0.log' for pass-1 encoding: Permission denied` error
when uploading ugoira files. Caused by the fact that 2-pass encoding tries to write a log file in
the current directory by default, which fails in production because the default working directory in
the Docker image is /danbooru, which is read-only.
2022-03-01 00:16:55 -06:00
evazion
7031fd13d7 ugoiras: encode .webm samples using VP9 instead of VP8.
Switch the codec for .webm samples from VP8 to VP9. All modern browsers
support VP9 (Safari was the last to add support in ~2020), so it should
be safe to provide only VP9 .webms without a fallback.

VP9 lets us use two-pass encoding, which should offer better compression.

Fixes ugoira samples still having poor quality even after 4c652cf3e.
4c652cf3e tried to remove the max bitrate limit by setting `-b:v 0`, but
this only worked in FFmpeg 4.2. In production Danbooru uses FFmpeg 4.4,
and apparently in 4.4 `-b:v 0` means "use the default max bitrate of
256kb/s" instead of "no bitrate limit".

https://trac.ffmpeg.org/wiki/Encode/VP9
https://developers.google.com/media/vp9/bitrate-modes
https://developers.google.com/media/vp9/settings/vod
http://wiki.webmproject.org/ffmpeg/vp9-encoding-guide
https://www.reddit.com/r/AV1/comments/k7colv/encoder_tuning_part_1_tuning_libvpxvp9_be_more/
2022-02-28 22:02:56 -06:00
user
2600dcdbfa nijie: extract post ID from new image URL. 2022-02-28 21:14:47 +01:00
user
550e0bef93 nijie: fix pattern for new image URL. 2022-02-28 21:14:31 +01:00
evazion
1609059bf4 sources: factor out Source::URL::Fanbox.
Also fix it so that we grab the full image for cover URLs like this:

* Sample: https://pixiv.pximg.net/c/1620x580_90_a2_g5/fanbox/public/images/creator/1566167/cover/QqxYtuWdy4XWQx1ZLIqr4wvA.jpeg
* Full: https://pixiv.pximg.net/fanbox/public/images/creator/1566167/cover/QqxYtuWdy4XWQx1ZLIqr4wvA.jpeg
2022-02-28 06:25:06 -06:00
evazion
317ec886bc sources: factor out Source::URL::Nijie.
Also fixes the uploader uploading all images when trying to upload only a
single image in a multi-image work. Caused by `image_urls` incorrectly
returning all images when the source strategy was given a url for a
single image.
2022-02-27 02:27:35 -06:00
evazion
926a8fa81f Danbooru::URL: add #basename, #filename, and #file_ext utility methods.
Add `#basename`, `#filename`, and `#file_ext` utility methods to
Danbooru::URL and change a few places to use them. Simplifies parsing
filenames in source URLs in various places.
2022-02-27 02:27:21 -06:00
evazion
fcf517834d sources: factor out Source::URL::ArtStation. 2022-02-26 21:03:49 -06:00
evazion
9169f00e80 sources: factor out Source::URL::Moebooru. 2022-02-26 17:46:44 -06:00
evazion
74fdeef10c sources: factor out Source::URL::Mastodon. 2022-02-26 15:08:27 -06:00
evazion
86d8e2d13d sources: factor out Source::URL::Lofter. 2022-02-25 23:43:10 -06:00
evazion
f062f2d145 sources: factor out Source::URL::Newgrounds.
Also fix it so that the image URL is set as the source for Newgrounds
posts, not the page URL. It's possible to generate the page URL from the
image URL (except for images after the first in multi-image posts).

* Page: https://www.newgrounds.com/art/view/natthelich/weaver
* Image: https://art.ngfiles.com/images/1520000/1520217_natthelich_weaver.jpg?f1606365031
2022-02-25 23:04:03 -06:00
evazion
64472a7b7e sources: factor out Source::URL::HentaiFoundry.
Add support for these URL types:

* http://pictures.hentai-foundry.com//s/soranamae/363663.jpg
* http://www.hentai-foundry.com/piccies/d/dmitrys/1183.jpg
* http://www.hentai-foundry.com/pic-149160.php
* http://www.hentai-foundry.com/user-RockCandy.php
* http://www.hentai-foundry.com/profile-sawao.php

These URL types are obsolete, but still present in some old posts.
2022-02-25 22:01:17 -06:00
evazion
e6ded89f85 sources: factor out Source::URL::Plurk.
Also fix it so that for adult works, we get the images posted by the
artist in the replies. Example: https://www.plurk.com/p/omc64y (nsfw).
2022-02-25 02:06:57 -06:00
evazion
26f4cf1ebd sources: factor out Source::URL::Skeb. 2022-02-25 02:06:57 -06:00
evazion
ffe52f5ead sources: factor out Source::URL::Foundation.
Add support for a couple more URL types:

* https://foundation.app/@asuka111art/dinner-with-cats-82426
* https://f8n-production-collection-assets.imgix.net/0x3B3ee1931Dc30C1957379FAc9aba94D1C48a5405/128711/QmcBfbeCMSxqYB3L1owPAxFencFx3jLzCPFx6xUBxgSCkH/nft.png

Also include these URLs in the list of profile URLs:

* https://foundation.app/0x7E2ef75C0C09b2fc6BCd1C68B6D409720CcD58d2 (for https://foundation.app/@mochiiimo)

These URLs should be stable even if the user changes their name.
2022-02-23 23:49:31 -06:00
evazion
043c08eb05 sources: factor out Source::URL::TwitPic. 2022-02-23 23:49:31 -06:00
evazion
7ed8f95a8e sources: add Source::URL class; factor out Source::URL::Twitter.
Introduce a Source::URL class for parsing URLs from source sites. Refactor the Twitter
source strategy to use it.

This is the first step towards factoring all the URL parsing logic out of source
strategies and moving it to subclasses of Source::URL. Each site will have a subclass
of Source::URL dedicated to parsing URLs from that site. Source strategies will use
these classes to extract information from URLs.

This is to simplify source strategies. Most sites have many different URL formats we have
to parse or rewrite, and handling all these different cases tends to make source
strategies very complex. Isolating the URL parsing logic from the site scraping logic
should make source strategies easier to maintain.
2022-02-23 23:46:04 -06:00
evazion
4c652cf3ec ugoiras: increase quality of webm samples.
Fix certain ugoiras having very low quality webm samples. This was
because we had a target bitrate of 5 Mbps, but this wasn't enough for
videos that were high resolution or that had choppy, hard-to-compress
motion, such as post 5081776 (nsfw).
2022-02-23 02:50:14 -06:00
evazion
112b323f01 foundation: fix exception when uploading new Foundation url format.
Fix 'null value in column "source_url"' exception when uploading urls like this:

* https://foundation.app/@KILLERGF/kgfgen/4
* https://foundation.app/@mochiiimo/foundation/97376
2022-02-22 13:29:28 -06:00
evazion
7b009cc893 nicoseiga: fix inability to login to nicoseiga.
NicoSeiga changed it so that on every login, you must enter a 2FA code
sent by email. This broke the NicoSeiga strategy. The fix is to just use
a static session cookie instead (and hope it doesn't expire, and isn't
tied to an IP).

The `nico_seiga_login` and `nico_seiga_password` config settings have
been removed from config/danbooru_default_config.rb and replaced by
`nico_seiga_user_session`. If you run your own Danbooru instance, you
will have to update your config file manually.
2022-02-22 12:23:01 -06:00
evazion
7d49ab6130 Add Danbooru::URL class.
Introduce a Danbooru::URL class for dealing with URLs. This is a wrapper
around Addressable::URI that adds some additional helper methods. Most
significantly, the `parse` method only allows valid http/https URLs, and
it returns nil instead of raising an exception when the URL is invalid.
2022-02-22 00:17:53 -06:00
evazion
60a26af6e3 rails: add 'URL' inflection.
Make it so we can write `ArtistURL` instead of `ArtistUrl`.
2022-02-22 00:17:53 -06:00
evazion
fbab273c81 Upgrade http.rb gem to 5.0.4.
Fixes a bug where the Foundation source strategy failed because http.rb
automatically sent a `Content-Length: 0` header with all GET requests,
which caused Foundation to return a 400 Bad Request error. This behavior
was fixed in http.rb 5.x.

http.rb 5.x has a breaking change where it now includes the request object
inside the response object, which we have to handle in a few places.
2022-02-22 00:17:05 -06:00
evazion
68ba447494 uploads: remove batch upload page.
* Make /uploads/batch redirect to /uploads/new.
* Remove /uploads/image_proxy.
2022-02-21 00:03:43 -06:00
evazion
a1d2572bad uploads: add data attributes to thumbnail html.
Add data attributes to thumbnails on the /uploads, /upload_media_assets,
and /media_assets pages. Add a `data-is-posted` attribute for styling
thumbnails based on whether they've already been posted.
2022-02-18 15:44:12 -06:00
evazion
9a5a04d74e nijie: fix uploads not working for new image URL format.
Fix uploads not working for image URLs like this:

    https://pic.nijie.net/07/nijie/17/95/728995/illust/0_0_403fdd541191110c_c25585.jpg
2022-02-15 20:45:28 -06:00
evazion
7cfbd891ae pixiv: avoid unnecessary API call when uploading Pixiv posts.
Do one less API call when fetching the image URLs for a Pixiv post. The
`is_ugoira?` check in `image_urls` caused us to do an extra API call
when fetching the image URLs for a non-ugoira post.

API calls to Pixiv take around ~800ms, so this reduces minimum upload
time for Pixiv posts from ~1.6 seconds (two calls) to ~0.8 seconds.
2022-02-15 18:55:12 -06:00
evazion
ccaf5398f6 nico seiga: fix error when login fails.
Fix an undefined variable exception being thrown when login to NicoSeiga
failed (the `url` variable wasn't defined).
2022-02-15 18:55:12 -06:00
evazion
e4d7453180 uploads: improve error messages.
Improve upload error messages when downloading an URL fails, or it isn't
an image or video file.
2022-02-15 18:54:55 -06:00
evazion
87a00a1182 uploads: fix "ArgumentError: string contains null byte" error
Fix an error when trying to upload a file larger than the file size
limit. In this case we tried to dump the whole HTTP response into the
error message, which included the binary file itself, which caused this
exception because it contained null bytes.
2022-02-15 18:16:47 -06:00
evazion
b6538fde38 uploads: fix NicoSeiga sources not working.
Fix uploads for NicoSeiga sources not working because the strategy
returned URLs like the one below in the list of image_urls, which
require a login to download:

    https://seiga.nicovideo.jp/image/source/10315315

Also fix certain URLs like https://dic.nicovideo.jp/oekaki/52833.png not
working, because they didn't contain an image ID and the image_urls
method returned an empty list in this case.
2022-02-15 17:12:02 -06:00
evazion
37075988ce uploads: fix page_url for null strategy.
Fix the null source strategy setting the page URL. The page URL is
expected to be nil when we can't determine the page containing the image URL.

Fixes the upload_media_assets.page_url field being filled for uploads
from unknown sites.
2022-02-15 00:59:22 -06:00
evazion
27d71f2727 uploads: raise download timeout.
Raise the timeout for downloading files from the source to 60 seconds globally.

Previously had a lower timeout because uploads were processed in the
foreground when not using the bookmarklet, and we didn't want to tie up
Puma worker processes with slow downloads. Now that all uploads are
processed in the background, we can have a higher timeout.
2022-02-15 00:56:51 -06:00
evazion
26da728a07 deviant art: fix new image URLs not being recognized.
Partial fix for #5008. DeviantArt now returns https://wixmp-ed30a86b8c4ca887773594c2.wixmp.com
URLs instead of https://images-wixmp-ed30a86b8c4ca887773594c2.wixmp.com for images in the
API. Fix these URLs not being recognized by the DeviantArt strategy.
2022-02-14 00:33:50 -06:00
evazion
d6f7725a1e nijie: fix exception in login process.
Fix an exception when we can't find the 'url' field in the login form
because we're rate limited by Nijie and couldn't scrape the login page.
2022-02-12 17:26:25 -06:00
evazion
117d31e633 Fix undefined method readpartial' for \"\":String` error.
This exception was thrown by app/logical/pixiv_ajax_client.rb:406 when a
Pixiv API call failed with a network error. In this case we tried to log
the response body, but this failed because we returned a faked HTTP
response with an empty string for the body, which the http.rb library
didn't like because it was expecting an IO-like object for the body.
2022-02-12 15:22:24 -06:00
evazion
58fc00e549 uploads: allow uploading iso5 .mp4 files.
This is an MP4 ftyp sometimes used by Twitter.
2022-02-09 16:48:11 -06:00
evazion
51ba56e8a3 Fix #5001: Media assets not searchable through upload records.
Fix this:

  https://danbooru.donmai.us/uploads.json?search[media_assets][md5]=b83daa7f1ae7e4127b1befd32f71ba10

failing with an ActiveRecord::StatementInvalid error.

The bug was that for a `has_many through: ...` association, like
`has_many :media_assets, through: :upload_media_assets`, we weren't
joining on the associated table properly so we ended up generating
invalid SQL.
2022-02-08 19:18:11 -06:00
evazion
21c0d55aa4 Fix #5002: "Urls url has already been taken" when submitting duplicate urls with different capitalization
Fix URLs being normalized after checking for duplicates rather than
before, which meant that URLs that differed in capitalization weren't
detected as duplicates.
2022-02-08 19:15:55 -06:00
evazion
572878fb0d uploads: allow uploading .m4v format videos.
Fix not being able to upload .m4v format videos as reported here:

* https://danbooru.donmai.us/forum_posts/205248
* https://github.com/danbooru/danbooru/issues/3615#issuecomment-1030950924

From https://en.wikipedia.org/wiki/M4V:

  The M4V file format is a video container format developed by Apple and
  is very similar to the MP4 format. The primary difference is that M4V
  files may optionally be protected by DRM copy protection.

This could be a problem if it allows uploading videos that are
unplayable because of DRM.
2022-02-06 21:41:35 -06:00
evazion
7bed81812d Don't show error messages that could contain private information.
Fix a potential exploit where private information could be leaked if
it was contained in the error message of an unexpected exception.

For example, NoMethodError contains a raw dump of the object in the
error message, which could leak private user data if you could force a
User object to raise a NoMethodError.

Fix the error page to only show known-safe error messages from expected
exceptions, not unknown error messages from unexpected exceptions.

API changes:

* JSON errors now have a `message` param. The message will be blank for unknown exceptions.
* XML errors have a new format. This is a breaking change. They now look like this:

    <result>
      <success type="boolean">false</success>
      <error>PaginationExtension::PaginationError</error>
      <message>You cannot go beyond page 5000.</message>
      <backtrace type="array">
        <backtrace>app/logical/pagination_extension.rb:54:in `paginate'</backtrace>
        <backtrace>app/models/application_record.rb:17:in `paginate'</backtrace>
        <backtrace>app/logical/post_query_builder.rb:529:in `paginated_posts'</backtrace>
        <backtrace>app/logical/post_sets/post.rb:95:in `posts'</backtrace>
        <backtrace>app/controllers/posts_controller.rb:22:in `index'</backtrace>
      </backtrace>
    </result>

  instead of like this:

    <result success="false">You cannot go beyond page 5000.</result>
2022-02-06 18:09:54 -06:00
nonamethanks
1c9014a5bb Fix lofter not working with iqdb 2022-02-05 09:43:17 +01:00
evazion
e7744cb6e3 uploads: generate thumbnails in parallel.
Make uploads faster by generating and saving thumbnails in parallel.

We generate each thumbnail in parallel, then send each thumbnail to the
backend image servers in parallel.

Most images have 5 variants: 'preview' (150x150), 180x180, 360x360,
720x720, and 'sample' (850px width). Plus the original file, that's 6
files we have to save. In production we have 2 image servers, so we have
to save each file twice, to 2 remote servers. Doing all this in parallel
should make uploads significantly faster.
2022-02-04 16:20:50 -06:00
evazion
c4852b3486 rails: fix deprecated #to_s(:format) method.
Fix this deprecation:

    Deprecate passing a format to #to_s in favor of #to_formatted_s in
    Array, Range, Date, DateTime, Time, BigDecimal, Float and, Integer.

https://guides.rubyonrails.org/7_0_release_notes.html#active-support-deprecations
2022-02-01 13:19:50 -06:00
evazion
8cdc11a3e1 Fix #4983: Weird result for status:DELETED. 2022-02-01 01:59:09 -06:00
evazion
7435f2e516 Fix #4969: Tag changes made by replacements wipe out edits done at the same time.
Lock the post during replacement to ensure we have the latest version of
the tags and to ensure nobody else can modify the post until after the
replacement is finished.
2022-02-01 01:16:00 -06:00
evazion
60a13fd2d5 Fix #4913: Invalid replacements created if an error is raised during replacement
Perform the replacement in a before_create callback so that it runs in a
transaction and if it fails, the transaction will rollback and the
replacement record won't be created.

Doing the replacement in a transaction isn't great because, for one
thing, it could hold the transaction open a long time, which isn't good
for the database. And two, if the transaction rolls back, the database
changes will be undone, but if the replacement file has already been saved
to disk, then it won't be undone, which could result in a dangling file.
2022-02-01 01:14:41 -06:00