Commit Graph

70 Commits

Author SHA1 Message Date
evazion
30b286a030 Fix #5241: Source extractors not using 60s timeout.
Raise the default HTTP timeout from 10 seconds to 20 seconds.
2022-10-30 17:26:42 -05:00
evazion
eff82c43d2 emails: fix validation of undeliverable email addresses.
* Fix bug where bogus domains weren't found because we checked for `nil?` instead of `blank?`.
* Fix bug where bogus names weren't found because we used a nonexistent variable in the
  RCPT TO check (`to_address` instead of `address`).
2022-10-30 14:49:12 -05:00
evazion
0d835983ce reports: fix error when report is empty.
Fix an exception when a report is empty, for example when performing a
tag search that returns no results:

* https://betabooru.donmai.us/reports/posts?search[group]=uploader&search[tags]=does_not_exist
2022-10-23 21:55:06 -05:00
evazion
f73d2e3956 reports: add ability to group reports by column.
Add ability to group reports by various columns. For example, you can see
the posts by the top 10 uploaders over time, or posts grouped by rating
over time.
2022-10-22 04:05:10 -05:00
evazion
412b7f2727 http: split requests into internal and external requests.
Split requests made by Danbooru::Http into either internal or external
requests. Internal requests are API calls to internal services run by
Danbooru. External requests are requests to external websites, for
example fetching sources or downloading files. External requests may use
a HTTP proxy if one is configured. Internal requests don't.

Fixes a few source extractors not using the HTTP proxy for certain API calls.
2022-10-19 01:49:28 -05:00
evazion
873c67db58 emails: disallow names ending with a period.
Update email validation rules to disallow the percent character (e.g.
`foo%bar@gmail.com`) and names ending with a period (e.g. `foo.@gmail.com`).
Names ending with a period are invalid according to the RFCs and cause
`Mail::Address.new` to raise an exception.

The percent character is technically legal, but only one email used it
and it was probably a typo.
2022-10-17 22:13:19 -05:00
evazion
e31977ac29 emails: move EmailValidator into Danbooru::EmailAddress. 2022-10-17 22:13:19 -05:00
evazion
9ea2c34f17 emails: add more typo correction rules for Gmail. 2022-10-17 22:13:19 -05:00
evazion
edc7e52353 emails: automatically fix typos in email addresses.
Try to automatically fix various kind of typos and common mistakes in
email addresses when a user creates a new account. It's common for users
to signup with addresses like `name@gmai.com`, which leads to bounces
when we try to send the welcome email.
2022-10-14 18:49:33 -05:00
nonamethanks
d51cc17eaf Nicoseiga: rewrite tests and fix several bugs
* Fixed a bug where manga posts with a single tag would raise an error
* Fixed a bug where dic.nicovideo.jp/oekaki posts weren't uploadable due
  to SSL issues
* Added support for more manga corner cases
2022-09-29 14:37:46 +02:00
evazion
530d8cf762 searchable: fix searching for invalid IP addresses.
Fix an ArgumentError exception when searching for an invalid IP address.

Also allow searching for multiple subnets at once.
2022-09-29 04:36:12 -05:00
evazion
adba70a0de api: make IP addresses in the API.
Make the following fields visible in API responses:

* ip_bans.ip_addr
* ip_geolocations.ip_addr
* ip_geolocations.network
* users.last_ip_addr (mod only)
* user_sessions.ip_addr
* api_keys.last_ip_address
* api_keys.permitted_ip_addresses

Before IP addresses were globally hidden in API responses because IPs were
present in a lot of tables and we didn't want to accidentally leak them.
Now that we've gotten rid of IPs from most tables, it's safe to unhide them.
2022-09-24 03:48:45 -05:00
evazion
56f47c60e1 posts: fix exception when viewing post with source Blog..
Fix a PublicSuffix::DomainNotAllowed exception raised with viewing or editing a post
with a source like `Blog.`.

This happened when parsing the post's source. `Danbooru::URL.parse("Blog.")` would
heuristically parse the source into `http://blog`. Calling any methods related to the
URL's hostname or domain would lead to calling `PublicSuffix.parse("blog")`, which
would fail with PublicSuffix::DomainNotAllowed.
2022-03-21 03:24:50 -05:00
evazion
42144eaa4b Fix #5012: Fc2 image link paste not uploading.
Fix referer spoofing not working for certain fc2.com image URLs.

Spoofing the referer like this redirects to an HTML error page:

* curl -H "Referer: http://wwwew.web.fc2.com" http://wwwew.web.fc2.com/e/405.jpg

Spoofing it like this works:

* curl -H "Referer: http://wwwew.web.fc2.com/e/405.jpg" http://wwwew.web.fc2.com/e/405.jpg
2022-03-18 04:39:13 -05:00
evazion
926a8fa81f Danbooru::URL: add #basename, #filename, and #file_ext utility methods.
Add `#basename`, `#filename`, and `#file_ext` utility methods to
Danbooru::URL and change a few places to use them. Simplifies parsing
filenames in source URLs in various places.
2022-02-27 02:27:21 -06:00
evazion
fcf517834d sources: factor out Source::URL::ArtStation. 2022-02-26 21:03:49 -06:00
evazion
26f4cf1ebd sources: factor out Source::URL::Skeb. 2022-02-25 02:06:57 -06:00
evazion
7ed8f95a8e sources: add Source::URL class; factor out Source::URL::Twitter.
Introduce a Source::URL class for parsing URLs from source sites. Refactor the Twitter
source strategy to use it.

This is the first step towards factoring all the URL parsing logic out of source
strategies and moving it to subclasses of Source::URL. Each site will have a subclass
of Source::URL dedicated to parsing URLs from that site. Source strategies will use
these classes to extract information from URLs.

This is to simplify source strategies. Most sites have many different URL formats we have
to parse or rewrite, and handling all these different cases tends to make source
strategies very complex. Isolating the URL parsing logic from the site scraping logic
should make source strategies easier to maintain.
2022-02-23 23:46:04 -06:00
evazion
7d49ab6130 Add Danbooru::URL class.
Introduce a Danbooru::URL class for dealing with URLs. This is a wrapper
around Addressable::URI that adds some additional helper methods. Most
significantly, the `parse` method only allows valid http/https URLs, and
it returns nil instead of raising an exception when the URL is invalid.
2022-02-22 00:17:53 -06:00
evazion
fbab273c81 Upgrade http.rb gem to 5.0.4.
Fixes a bug where the Foundation source strategy failed because http.rb
automatically sent a `Content-Length: 0` header with all GET requests,
which caused Foundation to return a 400 Bad Request error. This behavior
was fixed in http.rb 5.x.

http.rb 5.x has a breaking change where it now includes the request object
inside the response object, which we have to handle in a few places.
2022-02-22 00:17:05 -06:00
evazion
e4d7453180 uploads: improve error messages.
Improve upload error messages when downloading an URL fails, or it isn't
an image or video file.
2022-02-15 18:54:55 -06:00
evazion
87a00a1182 uploads: fix "ArgumentError: string contains null byte" error
Fix an error when trying to upload a file larger than the file size
limit. In this case we tried to dump the whole HTTP response into the
error message, which included the binary file itself, which caused this
exception because it contained null bytes.
2022-02-15 18:16:47 -06:00
evazion
117d31e633 Fix undefined method readpartial' for \"\":String` error.
This exception was thrown by app/logical/pixiv_ajax_client.rb:406 when a
Pixiv API call failed with a network error. In this case we tried to log
the response body, but this failed because we returned a faked HTTP
response with an empty string for the body, which the http.rb library
didn't like because it was expecting an IO-like object for the body.
2022-02-12 15:22:24 -06:00
evazion
a7dc05ce63 Enable frozen string literals.
Make all string literals immutable by default.
2021-12-14 21:33:27 -06:00
evazion
b561ca49f2 foundation: fix mojibake in artist commentaries.
Fix certain artist commentaries for foundation.app containing scrambled
characters. Apparently caused by the Nokogiri HTML5 parser not handling
UTF-8 input correctly when the encoding isn't explicitly set to UTF-8.
2021-11-15 04:55:48 -06:00
evazion
3f9a85a828 Rails: send logs to stderr by default, not stdout.
Send all logs to stderr by default instead of stdout. Fixes a problem
where parsing the output of sandboxed commands could fail, because they
could contain Rails log messages in their stdout.

When we run a command in a sandbox, we call fork+exec to run the command
in the background so we can capture its output. If Rails prints
anything to stdout between the fork and exec calls, then it will be
inadvertently captured along with the command's output. This will break
parsing of the command's output. This can happen if warning messages are
printed by Rails while setting up the sandbox between the fork and exec
calls.

Writing to stderr is also more correct, since stdout is buffered by
default, which means logs could potentially be lost if the process dies
unexpectedly before the buffers are flushed. Stderr is unbuffered by
default, which means logs will always be output immediately.
2021-11-11 09:20:57 -06:00
evazion
f8d52e6758 /status: add more information to /status page.
Add the following:

* Container name, machine name, worker id.
* Container uptime, puma uptime, worker uptime.
* Number of requests processed by current worker.
* ExifTool version.

Also change /status page to show information in tables instead of lists.
2021-09-26 23:11:08 -05:00
evazion
bb7f24d279 Add HTTP proxy support.
Add support for using a proxy for HTTP requests. Only used for external
requests, such as downloading files or talking to source sites such as
Pixiv or Twitter, not for internal requests, such as talking to IQDB or
Reportbooru.
2021-08-28 04:53:33 -05:00
evazion
ad4c75eb1a docs add more docs to app/{jobs,logical}.
These were missed in the last commit.
2021-06-28 05:09:19 -05:00
evazion
00ca7526bb docs: add remaining docs for classes in app/logical. 2021-06-24 01:31:41 -05:00
evazion
e2704f6a7b Danbooru::Http: redirect POST to GET on 302.
When a POST request returns a 302 redirect, follow the redirect with a
GET request instead of with a POST request.

HTTP standards leave it unspecified whether a POST request that returns
a 302 redirect should be followed with a GET or with a POST. A GET is
what most browsers use, which means it's what most servers expect.

Fixes the /tagme Discord command not working because when we uploaded
the image to DeepDanbooru, the POST request returned a 302 redirect,
which the server expected us to follow with a GET, not with a POST.

Ref:

* https://stackoverflow.com/questions/17605915/what-is-the-correct-behavior-expected-of-an-http-post-302-redirect-to-get
2021-03-29 03:01:02 -05:00
evazion
1a7a108d47 discord: add /tagme command. 2021-03-19 04:44:22 -05:00
evazion
52adf87489 Fix #4666: Broken network link for some IPs. 2021-03-01 20:44:51 -06:00
evazion
92b8f24724 ip addresses: move more logic to Danbooru::IpAddress.
* Move `is_local?` from IpLookup to Danbooru::IpAddress.
* Refactor more things to use Danbooru::IpAddress instead of using
  IPAddress directly.
2021-03-01 20:13:14 -06:00
evazion
35a0c6b11f Fix #4736: Display network prefix length (if present) in API key IP whitelist. 2021-03-01 02:38:18 -06:00
evazion
65be2c99b0 Fix #4657: Hentai-Foundry: Document tree depth limit exceeded. 2021-01-06 03:05:36 -06:00
evazion
9dc788c0ce users: improve sockpuppet detection on signup.
Require new accounts to verify their email address if any of the
following conditions are true:

* Their IP is a proxy.
* Their IP is under a partial IP ban.
* They're creating a new account while logged in to another account.
* Somebody recently created an account from the same IP in the last week.

Changes from before:

* Allow logged in users to view the signup page and create new accounts.
  Creating a new account while logged in to your old account is now
  allowed, but it requires email verification. This is a honeypot.
* Creating multiple accounts from the same IP is now allowed, but they
  require email verification. Previously the same IP check was only for
  the last day (now it's the last week), and only for an exact IP match
  (now it's a subnet match, /24 for IPv4 or /64 for IPv6).
* New account verification is disabled for private IPs (e.g. 127.0.0.1,
  192.168.0.1), to make development or running personal boorus easier
  (fixes #4618).
2020-12-27 23:41:07 -06:00
evazion
5917587fd5 http: add logger for debugging purposes.
Usage: Danbooru::Http.new.use(:logger).get(url).
2020-08-12 13:11:33 -05:00
evazion
f5c9a78797 danbooru::http: fix SSLError exceptions not being caught.
Bug: The frontpage failed due to a SSL error. We couldn't fetch the
popular tag list from Reportbooru because Reportbooru's SSL certificate
had expired and HTTP.rb raised an SSLError exception that we didn't
catch.

Fix: Convert the SSLError to a 5xx HTTP error to prevent SSL exceptions
from leaking through HTTP.rb.
2020-06-29 14:49:59 -05:00
evazion
5af50b7fcd danbooru::http: factor out Cloudflare Polish bypassing.
* Factor out the Cloudflare Polish bypass code to a standalone feature.

* Add `http_downloader` method to the base source strategy. This is a
  HTTP client that should be used for downloading images or making
  requests to images. This client ensures that referrer spoofing and
  Cloudflare bypassing are performed.

This fixes a bug with the upload page reporting the polished filesize
instead of the original filesize when uploading ArtStation images.
2020-06-24 22:54:04 -05:00
evazion
d3bb5c67ee danbooru::http: factor out referrer spoofing.
Factor out referrer spoofing so that it can be used outside of downloading
files. We also need to spoof the referrer when determining the remote
filesize of images on the uploads page.
2020-06-24 21:46:59 -05:00
evazion
7f5e87568a danbooru::http: raise exception on failed downloads.
Restore behavior from a6994cd4d, it breaks tests when they try to the
response body from a fake 599 response.
2020-06-22 22:51:36 -05:00
evazion
a6994cd4d7 media file: fix exception on empty files.
This may happen if a user uploads from a source that returns an error
HTTP response with no data.
2020-06-22 18:49:36 -05:00
evazion
bd25be95f5 danbooru::http: factor out cache feature.
Fixes a bug with cookies stored by the `session` feature not being sent
with cached requests.
2020-06-21 18:28:37 -05:00
evazion
f85eef9bcd nijie: fix bug with retries returning cached responses.
Bug: if a Nijie login failed with a 429 Too Many Requests error, the
error would get cached, so when we retried the request, we would just
get our own cached response back every time. The 429 error would
eventually be passed up to the Nijie strategy, which caused random
methods to fail because they couldn't get the html page.

Fix: add the `retriable` feature *after* the `cache` feature so that
retries don't go through the cache. This is a hack. We want retries to
go at the bottom of the stack, below caching, but we can't enforce this
ordering.
2020-06-21 18:13:21 -05:00
evazion
a4efeb2260 gems: drop Mechanize, HTTParty, and Sinatra gems. 2020-06-21 15:13:42 -05:00
evazion
05d7355ebb danbooru::http: support automatically following redirects.
Replace http.rb's builtin redirect following option with our own
redirect follower. This fixes an issue with http.rb losing cookies after
following a redirect.
2020-06-21 05:22:57 -05:00
evazion
71b0bc6c0f danbooru::http: support tracking cookies between requests.
Allow cookies to be saved and sent back when making several requests in
a row. Usage:

    http = Danbooru::Http.use(:session)

    # saves the foo=42 cookie sent by the response.
    http.get("https://httpbin.org/cookies/set/foo/42")

    # sends back the foo=42 cookie from the previous request.
    http.get("https://httpbin.org/cookies")
2020-06-21 05:22:56 -05:00
evazion
87ed882234 danbooru::http: support automatically retrying 429 errors. 2020-06-21 05:22:30 -05:00
evazion
a929f3134e danbooru::http: parse html responses. 2020-06-21 05:22:27 -05:00