Commit Graph

11446 Commits

Author SHA1 Message Date
evazion
f31878bc2c Fix blank error page when performing search with invalid regex.
Fix searches like this:

    https://danbooru.donmai.us/artist_urls?search[url_not_regex]=https://www/.artstation/.com/[a-zA-Z0-9-.]

returning an HTTP 500 error with a blank page.

Here the problem is that Postgres raises an error because `[a-zA-Z0-9-.]`
is an invalid character class (it should be `[a-zA-Z0-9.-]`). This error
happens on the error page itself, when rendering the <link rel="next"> /
<link rel="prev"> links in the <head>, so it causes the error page to fail.
2022-03-30 22:45:44 -05:00
evazion
b8f154d301 artists: add more artist url icons. 2022-03-30 22:04:24 -05:00
evazion
bfbc932025 Fix #5082: NoMethodError when searching an old-style dead fanbox url in artist urls.
This API call:

    # profile: https://www.pixiv.net/fanbox/creator/40684196
    curl -H "Origin: https://fanbox.cc" "https://api.fanbox.cc/creator.get?userId=40684196"

returns `{ "body": nil }` when the artist is deleted. We didn't expect `body` to be nil.

Also fix it so that `profile_url` returns the `https://www.pixiv.net/fanbox/creator/40684196`
URL if we can't get the `https://<username>.fanbox.cc` URL, usually because the API call failed
because the artist is deleted.
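
The nil handling and URL fallback described above could look roughly like the sketch below. The helper name `fanbox_profile_url` and the `creatorId` field are assumptions made for illustration, not the extractor's actual code.

```ruby
require "json"

# Hypothetical helper: build a profile URL from a parsed `creator.get` response.
# Assumes the username is stored under body.creatorId (not confirmed by this commit).
def fanbox_profile_url(response, user_id)
  # Hash#dig returns nil instead of raising NoMethodError when "body" is nil.
  username = response.dig("body", "creatorId")

  if username.nil? || username.empty?
    # Fall back to the old-style URL when the API gives us nothing,
    # e.g. because the artist is deleted.
    "https://www.pixiv.net/fanbox/creator/#{user_id}"
  else
    "https://#{username}.fanbox.cc"
  end
end

deleted = JSON.parse('{ "body": null }')
puts fanbox_profile_url(deleted, 40684196)
# prints "https://www.pixiv.net/fanbox/creator/40684196"
```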
2022-03-30 18:19:08 -05:00
Thayol
89b40a65ba Refactor to hash from multiple ifs 2022-03-30 17:54:52 +02:00
evazion
a272c19b98 Fix #5078: Pixiv booth upload broken.
Allow image URLs from https://booth.pximg.net to be uploaded. Fix bug
where Booth.pm URLs were incorrectly caught by the Pixiv extractor.
2022-03-30 03:25:42 -05:00
evazion
04551b8154 autocomplete: replace calls to PostQueryBuilder with PostQuery. 2022-03-30 02:12:25 -05:00
evazion
6edff247f2 search: replace calls to PostQueryBuilder#fast_count with PostQuery#fast_count.
Prepare a few more places for the new tag search parser.
2022-03-30 01:37:11 -05:00
evazion
8c9e045a9c PostQuery::AST: fix #to_infix to not add unnecessary parentheses.
* Fix the `#to_infix` method to not add unnecessary parentheses around subexpressions.
* Fix metatags to add quotes around values when necessary.
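
As an illustration of the first point, here is a minimal sketch of precedence-aware infix printing. It assumes a toy AST made of nested arrays (`[:and, ...]`, `[:or, ...]`, `[:not, x]`) with plain strings as tags; this is not the real `PostQuery::AST` class, and the function name is hypothetical.

```ruby
# Hypothetical printer for a toy array-based AST (not PostQuery::AST itself).
# A child is wrapped in parentheses only when it binds looser than its parent.
def to_infix(node, parent_precedence = 0)
  return node if node.is_a?(String)

  op, *children = node
  # Higher number binds tighter: NOT > AND > OR.
  precedence = { or: 1, and: 2, not: 3 }.fetch(op)

  text =
    case op
    when :not then "-#{to_infix(children.first, precedence)}"
    when :and then children.map { |child| to_infix(child, precedence) }.join(" ")
    when :or  then children.map { |child| to_infix(child, precedence) }.join(" or ")
    end

  precedence < parent_precedence ? "(#{text})" : text
end

puts to_infix([:or, "a", [:and, "b", "c"], "d"])  # prints "a or b c or d"
puts to_infix([:and, "a", [:or, "b", "c"]])       # prints "a (b or c)"
puts to_infix([:not, [:and, "a", "b"]])           # prints "-(a b)"
```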
2022-03-30 01:05:08 -05:00
evazion
fb0a7851bf BURs: fix mass update A -> B not being allowed.
Fix mass updates of the form `mass update A -> B` not being allowed.

This was originally because after `rename` was introduced, we wanted to
prevent people from using mass updates to move tags. Now, `mass update A -> B`
adds B to all posts tagged A instead of moving A to B. So `mass update A -> B`
should no longer be disallowed.

This also makes it so that it's an error to create a mass update with a
syntax error in the search. Before searches couldn't have syntax errors,
but now with the new query parser it's possible.
2022-03-29 21:31:28 -05:00
evazion
226faae8ec BURs: fix tags field not finding all BURs with that tag.
Fix the Tags field in the BUR search form not finding all BURs
mentioning that tag. Specifically, tags that were part of a mass update,
and that were prefixed with `~` or `-` (OR tags and NOT tags), weren't
indexed as tags affected by the BUR.

This requires re-running script/fixes/064_initialize_bulk_update_request_tags.rb
to fix old BURs.
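
A rough sketch of the indexing idea, assuming a hypothetical helper `tags_in_search` that only handles plain space-separated terms (no metatags or parentheses):

```ruby
# Hypothetical helper: collect the tag names mentioned in a mass update search,
# stripping a leading `~` (OR tag) or `-` (NOT tag) so those tags are indexed too.
def tags_in_search(search)
  search.split.map { |term| term.sub(/\A[~-]/, "") }.reject(&:empty?).uniq
end

p tags_in_search("~blonde_hair ~red_hair -1boy solo")
# prints ["blonde_hair", "red_hair", "1boy", "solo"]
```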
2022-03-29 21:06:24 -05:00
evazion
823fa5c6e9 search: switch to new tag search parser in a few places. 2022-03-29 18:21:47 -05:00
evazion
4c7cfc73c6 search: add new tag search parser.
Add a new tag search parser that supports full boolean expressions, including `and`,
`or`, and `not` operators and parenthesized subexpressions.

This is only the parser itself, not the code for converting the search into SQL. The new
parser isn't used yet for actual searches. Searches still use the old parser.

Some example syntax:

* `1girl 1boy`
* `1girl and 1boy` (same as `1girl 1boy`)
* `1girl or 1boy`
* `~1girl ~1boy` (same as `1girl or 1boy`)
* `1girl and ((blonde_hair blue_eyes) or (red_hair green_eyes))`
* `1girl ~(blonde_hair blue_eyes) ~(red_hair green_eyes)` (same as above)
* `1girl -(blonde_hair blue_eyes)`
* `*_hair *_eyes`
* `*_hair or *_eyes`
* `user:evazion or fav:evazion`
* `~user:evazion ~fav:evazion`

Rules:

AND is implicit between terms, but may be written explicitly:

* `a b c` is `a and b and c`

AND has higher precedence (binds tighter) than OR:

* `a or b and c or d` is `a or (b and c) or d`
* `a or b c or d e` is `a or (b and c) or (d and e)`

All `~` operators in the same subexpression are combined into a single OR:

* `a b ~c ~d` is `a b (c or d)`
* `~a ~b and ~c ~d` is `(a or b) (c or d)`
* `(~a ~b) (~c ~d)` is `(a or b) (c or d)`

A single `~` operator in a subexpression by itself is ignored:

* `a ~b` is `a b`
* `~a and ~b` is `a and b`, which is `a b`
* `(~a) ~b` is `a ~b`, which is `a b`

The parser is written as a backtracking recursive descent parser built on top of
StringScanner and a handful of parser combinators. The parser generates an AST, which is
then simplified using Boolean algebra to remove redundant nodes and to convert the
expression to conjunctive normal form (that is, a product of sums, or an AND of ORs).
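
To make the precedence rules above concrete, here is a much smaller sketch. It is a plain predictive recursive descent parser over a token list produced with StringScanner, not the backtracking combinator-based parser this commit adds. It ignores `~`, metatags, tags containing parentheses, and the conversion to conjunctive normal form. The names `tokenize`, `TinySearchParser`, and `parse_search` are hypothetical.

```ruby
require "strscan"

# Split the search into tokens: "(" and ")", :not for a leading "-",
# :and / :or / :not for keywords, and plain strings for tags.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    if scanner.skip(/\s+/)
      next
    elsif (paren = scanner.scan(/[()]/))
      tokens << paren
    elsif scanner.scan(/-/)
      tokens << :not
    else
      word = scanner.scan(/[^\s()]+/)
      tokens << (%w[and or not].include?(word.downcase) ? word.downcase.to_sym : word)
    end
  end
  tokens
end

class TinySearchParser
  def initialize(tokens)
    @tokens = tokens
  end

  def parse
    node = parse_or
    raise ArgumentError, "unexpected #{@tokens.first.inspect}" unless @tokens.empty?
    node
  end

  private

  # or_expr := and_expr ("or" and_expr)*
  def parse_or
    terms = [parse_and]
    terms << parse_and while accept(:or)
    terms.size == 1 ? terms.first : [:or, *terms]
  end

  # and_expr := unary ("and"? unary)*   (AND is implicit between terms)
  def parse_and
    terms = [parse_unary]
    loop do
      explicit = accept(:and)
      break unless explicit || starts_term?
      terms << parse_unary
    end
    terms.size == 1 ? terms.first : [:and, *terms]
  end

  # unary := ("-" | "not") unary | "(" or_expr ")" | tag
  def parse_unary
    token = @tokens.shift
    case token
    when :not then [:not, parse_unary]
    when "("
      node = parse_or
      raise ArgumentError, "expected )" unless accept(")")
      node
    when String
      raise ArgumentError, "unexpected )" if token == ")"
      token
    else
      raise ArgumentError, "unexpected #{token.inspect}"
    end
  end

  def starts_term?
    token = @tokens.first
    !token.nil? && token != :or && token != :and && token != ")"
  end

  def accept(expected)
    return false unless @tokens.first == expected
    @tokens.shift
    true
  end
end

def parse_search(input)
  TinySearchParser.new(tokenize(input)).parse
end

p parse_search("a or b and c or d")
# prints [:or, "a", [:and, "b", "c"], "d"]
```

Because `parse_or` calls `parse_and` for each operand, AND groups are built before OR sees them, which is what gives AND the higher precedence.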
2022-03-29 18:21:46 -05:00
evazion
8b2798d487 Merge pull request #5068 from NamelessContributor/patch-1
tag_list_component: fix overflow of some long tags
2022-03-27 02:29:37 -05:00
evazion
a12f82cb86 tests: fix tag name '(' test broken by dd21d4b45. 2022-03-26 16:31:20 -05:00
evazion
231075fb49 artists: fix artist finder to return nothing if it finds too many duplicates 2022-03-26 15:08:55 -05:00
evazion
44903abe28 BURs: change mass updates to not remove left-hand side tags.
Change mass updates to not automatically remove the left-hand side tags
from the post. This won't work with full boolean searches in the future
and already doesn't work with complex searches involving metatags or OR-tags.
2022-03-26 02:01:04 -05:00
evazion
dd21d4b45c tags: don't allow tags to start with '(' or '['.
Also don't allow the words 'and', 'or', and 'not'.

Related to #4949.
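
A sketch of what such a check might look like. The helper `tag_name_errors` and its messages are hypothetical; the real validation lives in the tag model and is not shown in this log.

```ruby
# Hypothetical validator: returns a list of error messages for a tag name.
def tag_name_errors(name)
  errors = []
  # A leading bracket would be ambiguous with a parenthesized subexpression.
  errors << "cannot begin with '(' or '['" if name.start_with?("(", "[")
  # These words are reserved as search operators.
  errors << "cannot be the word '#{name.downcase}'" if %w[and or not].include?(name.downcase)
  errors
end

p tag_name_errors("(foo)")        # one error
p tag_name_errors("and")          # one error
p tag_name_errors("nia_(blade)")  # no errors: parentheses elsewhere are fine
```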
2022-03-26 00:38:34 -05:00
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
34aa22f90b sources: fix fandom.com page urls.
Fix it so that sources like this:

* https://vignette.wikia.nocookie.net/valkyriecrusade/images/c/c5/Crimson_Hatsune_H.png/revision/latest?cb=20180702031954

link to this:

* https://valkyriecrusade.fandom.com/?file=Crimson_Hatsune_H.png

instead of this:

* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png

The `/wiki/File:$name` URL redirects to whatever wiki page contains the
image instead of showing the file itself.
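
A minimal sketch of the conversion, assuming the image path always has the shape `/<wiki>/images/<x>/<xy>/<filename>/...` seen in the example above. The function name `fandom_page_url` is hypothetical.

```ruby
require "uri"

# Hypothetical converter for vignette.wikia.nocookie.net image URLs.
# Returns nil when the URL doesn't look like the example in this commit.
def fandom_page_url(image_url)
  uri = URI.parse(image_url)
  return nil unless uri.host == "vignette.wikia.nocookie.net"

  # e.g. /valkyriecrusade/images/c/c5/Crimson_Hatsune_H.png/revision/latest
  match = uri.path.match(%r{\A/(?<wiki>[^/]+)/images/\h/\h\h/(?<file>[^/]+)})
  return nil unless match

  # Link to ?file=<name> rather than /wiki/File:<name>, which redirects elsewhere.
  "https://#{match[:wiki]}.fandom.com/?file=#{match[:file]}"
end

puts fandom_page_url("https://vignette.wikia.nocookie.net/valkyriecrusade/images/c/c5/Crimson_Hatsune_H.png/revision/latest?cb=20180702031954")
# prints "https://valkyriecrusade.fandom.com/?file=Crimson_Hatsune_H.png"
```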
2022-03-23 23:38:06 -05:00
evazion
5941c47b79 nicoseiga: support a few more url types. 2022-03-23 23:38:06 -05:00
evazion
c07c5ea594 nicoseiga: fix page_url method not to return seiga.nicovideo.jp/image/source/:id urls.
Fix the page_url method not to return URLs like this:

    https://seiga.nicovideo.jp/image/source/8017978 (page: https://seiga.nicovideo.jp/watch/mg310193)

These are direct image URLs, not page URLs. It's not generally possible
to get to the page URL from an image URL like this.

This fixes it so that we don't incorrectly set the source of NicoSeiga
uploads to the image URL.
2022-03-23 23:38:06 -05:00
evazion
4ef8178bd1 sources: remove canonical_url method.
Refactor source strategies to remove the `canonical_url` method.

`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the page URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the image URL is used as the source.

This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
2022-03-23 23:38:06 -05:00
NamelessContributor
136e2777cb tag_list_component: fix overflow of some long tags
Fixes rare cases of tags containing a long word overflowing outside the sidebar e.g.
https://danbooru.donmai.us/posts?tags=iwakiyamayukisatoshironanogojuurokushi_akira
2022-03-23 11:37:10 +01:00
evazion
eef6e8f55f Merge pull request #5066 from nonamethanks/use-non-ass-fantia-icon
Fantia: use non-transparent site icon
2022-03-23 03:00:48 -05:00
evazion
fbca31d29e artists: add more artist url icons. 2022-03-23 02:59:22 -05:00
evazion
c51d1a6f5e artists: add more sites to artist finder blacklist. 2022-03-23 02:30:52 -05:00
evazion
3aa5cab2aa sources: refactor normalize_for_source.
`normalize_for_source` was used to convert image URLs to page URLs when displaying sources
on the post show page. Move all the code for converting image URLs to page URLs from
`Sources::Strategies#normalize_for_source` to `Source::URL#page_url`.

Before we had to be very careful in source strategies not to make any network calls in
`normalize_for_source`, since it was used in the view for the post show page. Now all the
code for generating page URLs is isolated in Source::URL, which makes source strategies
simpler. It also makes it easier to check if a source is an image URL or page URL, and if
the image URL is convertible to a page URL, which will make autotagging bad_link or
bad_source feasible.

Finally, this fixes it to generate better page URLs in a handful of cases:

* https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP
* https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6 instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6
* http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png
* https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1
2022-03-23 01:34:04 -05:00
nonamethanks
38bd487e6b Fantia: use non-transparent site icon 2022-03-22 19:17:35 +01:00
evazion
770f850c66 instagram: add a couple more url types. 2022-03-22 04:35:50 -05:00
evazion
1e6e519709 artists: add more sites to artist finder blacklist. 2022-03-22 04:27:46 -05:00
evazion
cdbcc8e768 artists: add more artist url icons. 2022-03-22 03:43:23 -05:00
evazion
8705b8ec89 Merge pull request #5053 from NamelessContributor/fix-5052
posts: fix sidebar min-width on small screens (fix #5052)
2022-03-22 03:42:41 -05:00
evazion
a5115473d0 Merge pull request #5064 from CoreMack/master
Fix modqueue highlighting after topic #20445 (screenshots)
2022-03-22 03:42:12 -05:00
evazion
4d885acdbb Merge pull request #5061 from SystemZ/feat-conf-max-video-length
Configurable max video duration during upload
2022-03-22 03:41:32 -05:00
evazion
dd764bb4cf Merge pull request #5062 from NamelessContributor/fix-naver-duplicate
artists: add naver.com to artist finder blacklist.
2022-03-22 03:41:06 -05:00
CoreMack
dc45e6ddcb correct modqueue screencap highlighting 2022-03-21 16:03:07 -07:00
evazion
452ce8d165 artstation: add partial support for video clips (#5063).
Add partial support for fetching videos from ArtStation posts that
contain videos. Most of this code is disabled for now because actually
downloading these videos requires bypassing a Cloudflare captcha.
2022-03-21 16:51:42 -05:00
Michał Frąckiewicz
93635a20d9 Configurable max video duration 2022-03-21 19:22:34 +01:00
NamelessContributor
038c767455 artists: add naver.com to artist finder blacklist. 2022-03-21 17:34:44 +01:00
evazion
3fc01de19c reports: fix exception when deleting reported comments or forum posts.
Fix regression in 1a4efbda3. Locking the comment before validation
failed when the comment had unsaved changes, as is the case when
clearing reports from a comment before it is deleted.
2022-03-21 03:53:53 -05:00
evazion
56f47c60e1 posts: fix exception when viewing post with source Blog..
Fix a PublicSuffix::DomainNotAllowed exception raised when viewing or editing a post
with a source like `Blog.`.

This happened when parsing the post's source. `Danbooru::URL.parse("Blog.")` would
heuristically parse the source into `http://blog`. Calling any methods related to the
URL's hostname or domain would lead to calling `PublicSuffix.parse("blog")`, which
would fail with PublicSuffix::DomainNotAllowed.
2022-03-21 03:24:50 -05:00
evazion
defea08084 posts: fix exception in random:1 searches.
Fix regression in 1ad0e8688. Caused by `relation.order_values` returning
an array of Arel nodes instead of an array of strings when doing a
`random:1` search.
2022-03-21 01:29:10 -05:00
evazion
f52dc9e2ad tinami: fix 'http already memoized' warning. 2022-03-21 00:37:55 -05:00
evazion
1a4efbda33 Fix #5058: Duplicate report can't be rejected. 2022-03-21 00:36:59 -05:00
evazion
7c887f8adc artists: fix exception when adding TwitPic urls. 2022-03-20 21:56:38 -05:00
evazion
705edfb175 artists: add more patterns to artist finder blacklist. 2022-03-20 21:27:38 -05:00
evazion
7394660ba9 posts: fix exception when post has source like 'https://www.twitter.com/username'.
`twitter.com` sources worked but `www.twitter.com` didn't.

Also match the URL by class instead of by site name to ensure we match
the expected class.
2022-03-20 21:08:05 -05:00
evazion
1ad0e8688d posts: fix timeouts for searches using sequential navigation.
Fix certain searches timing out when using sequential navigation (page=b1234).

The problem was that the so-called "small search optimization" (AKA: force Postgres
to use the tag index for small searches instead of a sequential scan) wasn't triggering
because the ORDER BY clause for sequential navigation was `posts.id desc`, and we
were only checking for `posts.id DESC`.
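
One way to make such a check robust to case and spacing is sketched below. The helper `default_order?` is hypothetical and assumes the ORDER BY clause is available as a plain string; the commit's actual fix is not shown in this log.

```ruby
# Hypothetical check: is the search ordered by the default `posts.id DESC`?
# Normalizes whitespace and compares case-insensitively.
def default_order?(order_clause)
  order_clause.to_s.strip.gsub(/\s+/, " ").casecmp?("posts.id desc")
end

p default_order?("posts.id DESC")    # prints true
p default_order?("posts.id desc")    # prints true
p default_order?("posts.score desc") # prints false
```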
2022-03-20 18:46:06 -05:00
evazion
4b1264991f users: remove 'spoilers' tag from default blacklist.
Rationale:

* The spoilers tag is the most frequently removed tag from the default blacklist.
* It's frustrating for regular users to have posts randomly hidden because of trivial
  spoilers from a series they don't care about.
* The spoilers tag is used way too liberally for things that aren't considered
  spoilers on other sites.
* If you're looking up fanart on the internet, you should expect to see a certain
  level of spoilers.
* The tag is used very inconsistently, with some characters like Nia_(blade)_(xenoblade)
  getting the spoilers tag half the time and the rest of the time not.
2022-03-20 16:49:36 -05:00
evazion
71f42d67a7 tinami: return nothing if getting the full image fails.
Fix to make sure `image_urls` returns an empty array instead of `[nil]`
if grabbing the full image URL fails for whatever reason.
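
A minimal sketch of the idea, using a hypothetical `full_image_url` lookup that returns nil on failure (the real extractor's methods are not shown here):

```ruby
# Hypothetical stand-in for the extractor's lookup; returns nil when it fails.
def full_image_url(page)
  page[:full_image_url]
end

def image_urls(page)
  # Array#compact drops nil, so a failed lookup yields [] rather than [nil].
  [full_image_url(page)].compact
end

p image_urls({ full_image_url: "https://example.com/full.jpg" })
# prints ["https://example.com/full.jpg"]
p image_urls({})
# prints []
```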
2022-03-19 23:42:34 -05:00