Commit Graph

2490 Commits

Author SHA1 Message Date
nonamethanks
3b055138ff Fix normalization for fandom sources 2022-04-22 03:27:05 +02:00
nonamethanks
c9227645d9 Add anifty.jp support 2022-04-18 16:50:26 +02:00
evazion
6c5dd5ffed tests: fix broken tests. 2022-04-18 00:31:31 -05:00
evazion
eca0ab04f7 post queries: raise error on invalid searches.
Raise an error if the search is invalid for one of the following reasons:

* It contains multiple conflicting order: metatags (e.g. `order:score order:favcount` or `ordfav:a ordfav:b`).
* It contains a metatag that can't be used more than once (e.g. `limit:5 limit:10`, `random:5 random:10`).
* It contains a metatag that can't be negated (e.g. `-order:score`, `-limit:20`, or `-random:20`).
* It contains a metatag that can't be used in an OR clause (e.g. `touhou or order:score`, `touhou or limit:20`, `touhou or random:20`).
2022-04-17 23:20:22 -05:00
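A rough sketch of the first three checks described in eca0ab04f7. This is not the actual Danbooru code: the real checks presumably run on the parsed query, while this one scans whitespace-separated terms. The method name `search_errors` and the two constant lists are assumptions made for illustration, and the OR-clause rule is left out because it needs a real parse tree.

```ruby
# Hypothetical lists, taken only from the examples in the commit message.
UNREPEATABLE = %w[order ordfav limit random].freeze
UNNEGATABLE  = %w[order limit random].freeze

# Returns a list of human-readable problems found in the query string.
def search_errors(query)
  errors = []

  # Collect [name, negated] pairs for every term that looks like a metatag.
  metatags = query.split.filter_map do |term|
    match = term.match(/\A(-)?([a-z]+):/)
    [match[2], !match[1].nil?] if match
  end

  # Rule 1 and 2: these metatags may appear at most once.
  metatags.group_by(&:first).each do |name, uses|
    errors << "#{name}: used more than once" if UNREPEATABLE.include?(name) && uses.size > 1
  end

  # Rule 3: these metatags may not be negated.
  metatags.each do |name, negated|
    errors << "#{name}: can't be negated" if negated && UNNEGATABLE.include?(name)
  end

  errors
end
```

Under these assumptions, `search_errors("touhou order:score")` returns an empty list, while `search_errors("-limit:20")` reports the negation.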
evazion
c45d1d42c2 post queries: fix parsing of trailing parentheses.
Fix queries like `(fate_(series) saber)` being parsed as `fate_(series` + `saber)`
instead of `fate_(series)` + `saber`.

This is pretty hacky. We assume that parentheses in tags are balanced.
So the rule is that trailing parentheses are part of the tag as long as
they're balanced, and not part of the tag if they're unbalanced.
2022-04-17 23:20:22 -05:00
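The balance rule from c45d1d42c2 can be sketched like this. It is an illustration only, not the parser's real code; `split_trailing_parens` is a made-up helper name, and it assumes the opening `(` of the group has already been consumed by the parser.

```ruby
# Splits a raw token into [tag, closing_parens]. Trailing ")" characters stay
# in the tag while the tag's parentheses are balanced, and are peeled off
# (to be treated as group terminators) once they would unbalance it.
def split_trailing_parens(token)
  tag = token.dup
  closers = +""

  while tag.end_with?(")") && tag.count(")") > tag.count("(")
    tag.chop!
    closers << ")"
  end

  [tag, closers]
end
```

For the query `(fate_(series) saber)`, the tokens after the opening parenthesis are `fate_(series)` and `saber)`: the first is already balanced and kept whole, the second gives up its trailing `)` to close the group.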
evazion
5f1c296011 tags: don't allow tags with unbalanced parentheses.
Don't allow tags to have unbalanced parentheses, except for a few
emoticon tags as special exceptions to the rule.
2022-04-17 23:20:22 -05:00
evazion
3e8e33e663 post queries: fix handling of '~' operator.
Fix queries like `(~a ~b) (~c ~d)` being handled like `~a ~b ~c ~d`.
Caused by trimming AND nodes from the tree before rewriting the '~'
operator, which caused `~a` terms to be incorrectly lifted out of
subexpressions.
2022-04-17 23:20:22 -05:00
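To make the intent of 3e8e33e663 concrete, here is a small sketch of combining `~` terms per subexpression. It is not the real `PostQuery` code: subexpressions are modelled as nested arrays of strings, and `rewrite_tildes` is a hypothetical name.

```ruby
# Rewrites one subexpression. Strings starting with "~" are OR terms; nested
# arrays are parenthesized subexpressions and are rewritten on their own, so
# their "~" terms are never lifted into the parent.
def rewrite_tildes(subexpression)
  or_terms, and_terms = subexpression.partition { |t| t.is_a?(String) && t.start_with?("~") }

  and_terms = and_terms.map { |t| t.is_a?(Array) ? rewrite_tildes(t) : t }
  or_terms = or_terms.map { |t| t.delete_prefix("~") }

  # A lone "~" term is treated as a plain term; two or more become one OR.
  and_terms << (or_terms.size == 1 ? or_terms.first : [:or, *or_terms]) unless or_terms.empty?

  and_terms.size == 1 ? and_terms.first : [:and, *and_terms]
end
```

With this model, `(~a ~b) (~c ~d)` is `[["~a", "~b"], ["~c", "~d"]]` and comes out as an AND of two ORs, whereas the flat `~a ~b ~c ~d` comes out as a single four-way OR.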
evazion
af183467b6 post queries: switch to new post search engine.
Switch to the post search engine using the new PostQuery parser. The new
engine fully supports AND, OR, and NOT operators and grouping expressions
with parentheses.

Highlights:

New OR operator:

* `skirt or dress` (same as `~skirt ~dress`)

Tags can be grouped with parentheses:

* `1girl (skirt or dress)`
* `(blonde_hair blue_eyes) or (red_hair green_eyes)`
* `~(blonde_hair blue_eyes) ~(red_hair green_eyes)` (same as above)
* `(pantyhose or thighhighs) (black_legwear or brown_legwear)`
* `(~pantyhose ~thighhighs) (~black_legwear ~brown_legwear)` (same as above)

Metatags can be OR'd together:

* `user:evazion or fav:evazion`
* `~user:evazion ~fav:evazion`

Wildcard tags can be combined with either AND or OR:

* `black_* white_*` (find posts with at least one black_* tag AND one white_* tag)
* `black_* or white_*` (find posts with at least one black_* tag OR one white_* tag)
* `~black_* ~white_*` (same as above)

See 4c7cfc73 for more syntax examples.

Fixes #4949: And+or search?
Fixes #5056: Wildcard searches return unexpected results when combined with OR searches
2022-04-17 23:20:22 -05:00
evazion
d19e307e69 Merge pull request #5125 from nonamethanks/booth-support
Add Booth support
2022-04-17 23:00:14 -05:00
evazion
adbef1b869 Merge pull request #5113 from nonamethanks/prune-disapprovals-on-queue-enter
Posts: prune disapprovals on new appeal or flag
2022-04-17 22:57:49 -05:00
evazion
e35bbb8bc8 Merge pull request #5120 from nottalulah/favgroup-any
favgroups: allow favgroup:any/none searches
2022-04-17 22:55:55 -05:00
evazion
4f684044e3 Merge pull request #5114 from nonamethanks/editable-post-disapprovals
Allow post disapprovals to be edited
2022-04-17 22:54:57 -05:00
nonamethanks
9612578fcb Add Booth support 2022-04-16 17:52:18 +02:00
nonamethanks
70148366d9 Add source test helper 2022-04-15 23:09:27 +02:00
evazion
363cf2014b views: fix deprecated calls to ViewComponent#with_variant. 2022-04-13 00:18:53 -05:00
Talulah
c1996e4f06 favgroups: allow favgroup:any/none searches 2022-04-12 23:01:19 -03:00
nonamethanks
1a990d5ab9 Allow post disapprovals to be edited 2022-04-11 21:05:44 +02:00
nonamethanks
63bd5daa3b Posts: prune disapprovals on new appeal or flag 2022-04-11 15:54:28 +02:00
evazion
e37f3e6538 Merge pull request #5104 from nonamethanks/no-wiki-no-deprecation
Tags: don't allow deprecation of tags without wiki
2022-04-10 00:05:08 -05:00
nonamethanks
11281d6f58 Tags: don't allow deprecation of tags without wiki 2022-04-09 20:16:55 +02:00
Lily
9dde90ef94 fix broken assertion in nijie test 2022-04-09 12:32:45 -03:00
nonamethanks
ea76a889db Add ability to mark tags as deprecated
* Deprecated tags can't be added to posts, but existing deprecated tags
  in a post won't be removed
* Only empty tags can be marked as deprecated manually
* No tags can be manually undeprecated
** These limits don't apply to admins
* Deprecating or undeprecating a tag will create a new mod action to
  prevent people from going rogue
* Added deprecate/undeprecate commands for BURs
* Deprecating a tag via BUR removes all implications to and from it as well
2022-04-08 09:07:14 +02:00
evazion
98a9b2484b post queries: parse order:*_count synonyms. 2022-04-06 23:57:55 -05:00
evazion
86de5cb5d2 posts: fixup flagger: metatag.
Fix regression in 01a22930e.
2022-04-06 23:57:50 -05:00
evazion
a4d43ae72a post queries: track whether metatag values are quoted.
This is necessary for the `commentary:` metatag, which has different
behavior depending on whether the metatag value is quoted. For example,
`commentary:translated` finds translated commentaries, while
`commentary:"translated"` finds commentaries containing the literal word
"translated".
2022-04-06 17:20:27 -05:00
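A minimal sketch of what tracking the quoted state (a4d43ae72a) might look like. The `Metatag` struct and `parse_metatag` helper are invented for this example and only handle double quotes without escapes; the real parser is more general.

```ruby
# Hypothetical value object: the metatag name, its value with any quotes
# removed, and whether the value was written inside double quotes.
Metatag = Struct.new(:name, :value, :quoted, keyword_init: true)

def parse_metatag(term)
  match = term.match(/\A(?<name>[a-z]+):(?:"(?<quoted>[^"]*)"|(?<bare>\S+))\z/)
  return nil unless match

  Metatag.new(
    name: match[:name],
    value: match[:quoted] || match[:bare],
    quoted: !match[:quoted].nil?
  )
end
```

Both `commentary:translated` and `commentary:"translated"` yield the value `translated`; only the `quoted` flag differs, which is what lets the search code choose between the two behaviors.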
evazion
2adc530ba0 post queries: parse count metatag synonyms. 2022-04-06 17:20:27 -05:00
evazion
783419bcd7 post queries: support single-quoted strings in metatags. 2022-04-06 00:18:38 -05:00
evazion
bf7c721815 post queries: refactor AST #simplify method.
Refactor the `PostQuery::AST#simplify` method to split it into three
methods: `#trim` to eliminate redundant AND and OR clauses, `#simplify`
to expand deeply nested subexpressions, and `#sort` to sort the query
into alphabetical order.

This is so we can normalize queries written by users by parsing and
rewriting them, but without expanding out nested subexpressions, which
can substantially alter the way the query is written.
2022-04-04 00:48:40 -05:00
evazion
8ef72d59c1 artists: allow url_matches param to take multiple urls.
Pass as an array or space-separated string:

* https://danbooru.donmai.us/artists?search[url_matches]=https://www.pixiv.net/en/users/32777+https://www.pixiv.net/en/users/3584828
* https://danbooru.donmai.us/artists?search[url_matches][]=https://www.pixiv.net/en/users/32777&search[url_matches][]=https://www.pixiv.net/en/users/3584828
2022-04-03 02:54:30 -05:00
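For API users, the two request forms from 8ef72d59c1 can be built with Ruby's standard `URI` module. This is a client-side sketch; `url_matches_queries` and `ARTISTS_ENDPOINT` are names made up here, and the output is percent-encoded rather than written with raw brackets as in the examples above.

```ruby
require "uri"

ARTISTS_ENDPOINT = "https://danbooru.donmai.us/artists"

# Builds both query styles for the same list of artist URLs.
def url_matches_queries(urls)
  # Space-separated form: one parameter holding all URLs joined by spaces.
  joined = URI.encode_www_form("search[url_matches]" => urls.join(" "))

  # Array form: the `search[url_matches][]` key repeated once per URL.
  repeated = URI.encode_www_form(urls.map { |url| ["search[url_matches][]", url] })

  { joined: "#{ARTISTS_ENDPOINT}?#{joined}", repeated: "#{ARTISTS_ENDPOINT}?#{repeated}" }
end
```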
evazion
0d480eb832 artist urls: stop using normalized_url.
Stop the last remaining uses of the `artist_urls.normalized_url` column.
It's already no longer used by the artist finder. The only remaining
uses were by API users. Those users should use the `url` column instead.
2022-04-02 23:58:01 -05:00
evazion
ca8083465b newgrounds: exclude links to other works in commentary.
Sometimes when a Newgrounds post is part of a set, there is a list of
links to other posts in the set in the artist's commentary. Exclude
these links because they're not really part of the commentary.

Example: https://www.newgrounds.com/art/view/boxofwant/annie-hughes-1 (NSFW)
2022-04-02 23:13:26 -05:00
evazion
54cfbf84c6 pawoo: fix www.pawoo.net urls not being normalized to pawoo.net.
Fix artist URLs like https://www.pawoo.net/@01051708 not being normalized to https://pawoo.net/@01051708.
2022-03-31 02:17:51 -05:00
evazion
bfbc932025 Fix #5082: NoMethodError when searching an old-style dead fanbox url in artist urls.
This API call:

    # profile: https://www.pixiv.net/fanbox/creator/40684196
    curl -H "Origin: https://fanbox.cc" "https://api.fanbox.cc/creator.get?userId=40684196"

returns `{ "body": nil }` when the artist is deleted. We didn't expect `body` to be nil.

Also fix it so that `profile_url` returns the `https://www.pixiv.net/fanbox/creator/40684196`
URL if we can't get the `https://<username>.fanbox.cc` URL, usually because the API call failed
because the artist is deleted.
2022-03-30 18:19:08 -05:00
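The fallback described in bfbc932025 could look roughly like the sketch below. It is not the extractor's actual code: `profile_url` here takes the already-parsed API response as a Hash, and the `creatorId` field name is an assumption about the response shape.

```ruby
# Returns the https://<username>.fanbox.cc URL when the API response has a
# usable body, and falls back to the old-style pixiv.net URL otherwise.
def profile_url(api_response, user_id)
  # Hash#dig returns nil instead of raising when "body" is nil or missing.
  username = api_response.dig("body", "creatorId") # field name is assumed

  if username
    "https://#{username}.fanbox.cc"
  else
    "https://www.pixiv.net/fanbox/creator/#{user_id}"
  end
end
```

Using `dig` rather than chained `[]` calls is what avoids the NoMethodError when `body` is nil.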
evazion
a272c19b98 Fix #5078: Pixiv booth upload broken.
Allow image URLs from https://booth.pximg.net to be uploaded. Fix bug
where Booth.pm URLs were incorrectly caught by the Pixiv extractor.
2022-03-30 03:25:42 -05:00
evazion
8c9e045a9c PostQuery::AST: fix #to_infix to not add unnecessary parentheses.
* Fix the `#to_infix` method to not add unnecessary parentheses around subexpressions.
* Fix metatags to add quotes around values when necessary.
2022-03-30 01:05:08 -05:00
evazion
fb0a7851bf BURs: fix mass update A -> B not being allowed.
Fix mass updates of the form `mass update A -> B` not being allowed.

This was originally because after `rename` was introduced, we wanted to
prevent people from using mass updates to move tags. Now, `mass update A -> B`
adds B to all posts tagged A instead of moving A to B. So `mass update A -> B`
should no longer be disallowed.

This also makes it so that it's an error to create a mass update with a
syntax error in the search. Before, searches couldn't have syntax errors,
but with the new query parser they can.
2022-03-29 21:31:28 -05:00
evazion
226faae8ec BURs: fix tags field not finding all BURs with that tag.
Fix the Tags field in the BUR search form not finding all BURs
mentioning that tag. Specifically, tags that were part of a mass update,
and that were prefixed with `~` or `-` (OR tags and NOT tags), weren't
indexed as tags affected by the BUR.

This requires re-running script/fixes/064_initialize_bulk_update_request_tags.rb
to fix old BURs.
2022-03-29 21:06:24 -05:00
evazion
4c7cfc73c6 search: add new tag search parser.
Add a new tag search parser that supports full boolean expressions, including `and`,
`or`, and `not` operators and parenthesized subexpressions.

This is only the parser itself, not the code for converting the search into SQL. The new
parser isn't used yet for actual searches. Searches still use the old parser.

Some example syntax:

* `1girl 1boy`
* `1girl and 1boy` (same as `1girl 1boy`)
* `1girl or 1boy`
* `~1girl ~1boy` (same as `1girl or 1boy`)
* `1girl and ((blonde_hair blue_eyes) or (red_hair green_eyes))`
* `1girl ~(blonde_hair blue_eyes) ~(red_hair green_eyes)` (same as above)
* `1girl -(blonde_hair blue_eyes)`
* `*_hair *_eyes`
* `*_hair or *_eyes`
* `user:evazion or fav:evazion`
* `~user:evazion ~fav:evazion`

Rules:

AND is implicit between terms, but may be written explicitly:

* `a b c` is `a and b and c`

AND has higher precedence (binds tighter) than OR:

* `a or b and c or d` is `a or (b and c) or d`
* `a or b c or d e` is `a or (b and c) or (d and e)`

All `~` operators in the same subexpression are combined into a single OR:

* `a b ~c ~d` is `a b (c or d)`
* `~a ~b and ~c ~d` is `(a or b) (c or d)`
* `(~a ~b) (~c ~d)` is `(a or b) (c or d)`

A single `~` operator in a subexpression by itself is ignored:

* `a ~b` is `a b`
* `~a and ~b` is `a and b`, which is `a b`
* `(~a) ~b` is `a ~b`, which is `a b`

The parser is written as a backtracking recursive descent parser built on top of
StringScanner and a handful of parser combinators. The parser generates an AST, which is
then simplified using Boolean algebra to remove redundant nodes and to convert the
expression to conjunctive normal form (that is, a product of sums, or an AND of ORs).
2022-03-29 18:21:46 -05:00
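The precedence rules in 4c7cfc73c6 can be illustrated with a much smaller recursive descent parser built on StringScanner. This is a toy, not the real `PostQuery` parser: `TinyQueryParser` is a made-up class, it does no backtracking, and it ignores `~`, metatags, quoting, tags containing parentheses, and the conversion to conjunctive normal form.

```ruby
require "strscan"

# Grammar (AND binds tighter than OR, and AND may be implicit):
#   or_expr  := and_expr ("or" and_expr)*
#   and_expr := term (("and")? term)*
#   term     := "(" or_expr ")" | "-" term | tag
class TinyQueryParser
  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  def parse
    ast = or_expr
    skip_space
    raise ArgumentError, "unexpected input at #{@scanner.pos}" unless @scanner.eos?
    ast
  end

  private

  def skip_space
    @scanner.skip(/\s+/)
  end

  # Consumes a keyword only when followed by whitespace, so that tags such as
  # `orange` or `android` are not mistaken for operators.
  def keyword(word)
    skip_space
    @scanner.skip(/#{word}(?=\s)/i)
  end

  def or_expr
    operands = [and_expr]
    operands << and_expr while keyword("or")
    operands.size == 1 ? operands.first : [:or, *operands]
  end

  def and_expr
    operands = [term]
    loop do
      explicit = keyword("and")
      skip_space
      # Without an explicit `and`, stop at the end, at ")" or at an `or`.
      break if !explicit && (@scanner.eos? || @scanner.check(/\)|or(?=\s)/i))
      operands << term
    end
    operands.size == 1 ? operands.first : [:and, *operands]
  end

  def term
    skip_space
    if @scanner.skip(/\(/)
      inner = or_expr
      skip_space
      @scanner.skip(/\)/) or raise ArgumentError, "missing closing parenthesis"
      inner
    elsif @scanner.skip(/-/)
      [:not, term]
    elsif (tag = @scanner.scan(/[^\s()]+/))
      tag
    else
      raise ArgumentError, "expected a term at #{@scanner.pos}"
    end
  end
end
```

Because `or_expr` calls `and_expr` for each operand, `a or b and c or d` groups as `a or (b and c) or d`, matching the rule stated above.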
evazion
a12f82cb86 tests: fix tag name '(' test broken by dd21d4b45. 2022-03-26 16:31:20 -05:00
evazion
231075fb49 artists: fix artist finder to return nothing if it finds too many duplicates 2022-03-26 15:08:55 -05:00
evazion
44903abe28 BURs: change mass updates to not remove left-hand side tags.
Change mass updates to not automatically remove the left-hand side tags
from the post. This won't work with full boolean searches in the future
and already doesn't work with complex searches involving metatags or OR-tags.
2022-03-26 02:01:04 -05:00
evazion
d9d3c1dfe4 sources: rename Sources::Strategies to Source::Extractor.
Rename Sources::Strategies to Source::Extractor. A Source::Extractor
represents a thing that extracts information from a given URL.
2022-03-24 03:49:44 -05:00
evazion
34aa22f90b sources: fix fandom.com page urls.
Fix it so that sources like this:

* https://vignette.wikia.nocookie.net/valkyriecrusade/images/c/c5/Crimson_Hatsune_H.png/revision/latest?cb=20180702031954

link to this:

* https://valkyriecrusade.fandom.com/?file=Crimson_Hatsune_H.png

instead of this:

* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png

The `/wiki/File:$name` URL redirects to whatever wiki page contains the
image instead of showing the file itself.
2022-03-23 23:38:06 -05:00
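The conversion in 34aa22f90b can be sketched as follows. This is an illustration based only on the single example URL in the message; `fandom_page_url` is a hypothetical helper, and the assumed path layout (`/<wiki>/images/<x>/<xx>/<file>/...`) may not cover every Fandom image URL.

```ruby
require "uri"

# Converts a vignette.wikia.nocookie.net image URL to the `?file=` page URL
# on the wiki's fandom.com subdomain. Returns nil for anything else.
def fandom_page_url(image_url)
  uri = URI.parse(image_url)
  return nil unless uri.host == "vignette.wikia.nocookie.net"

  # Path segments: wiki name, "images", two hash directories, then the file.
  wiki, _images, _dir1, _dir2, file = uri.path.split("/").reject(&:empty?)
  return nil unless wiki && file

  "https://#{wiki}.fandom.com/?file=#{file}"
end
```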
evazion
5941c47b79 nicoseiga: support a few more url types. 2022-03-23 23:38:06 -05:00
evazion
c07c5ea594 nicoseiga: fix page_url method not to return seiga.nicovideo.jp/image/source/:id urls.
Fix the page_url method not to return URLs like this:

    https://seiga.nicovideo.jp/image/source/8017978 (page: https://seiga.nicovideo.jp/watch/mg310193)

These are direct image URLs, not page URLs. It's not generally possible
to get to the page URL from an image URL like this.

This fixes it so that we don't incorrectly set the source of NicoSeiga
uploads to the image URL.
2022-03-23 23:38:06 -05:00
evazion
4ef8178bd1 sources: remove canonical_url method.
Refactor source strategies to remove the `canonical_url` method.

`canonical_url` returned the URL that should be used as the source of
the post after upload. Now we simply use `Source::URL#page_url` to
determine the source after upload. If the source is an image URL that is
convertible to a page URL, then the page URL is used as the source. If
the source is an image URL that is not convertible to a page URL, then
the image URL is used as the source.

This simplifies source strategies so that all they have to care about is
implementing the `Source::URL#page_url` and `Sources::Strategies#page_url`
methods, and the preferred source will be chosen for posts automatically.
2022-03-23 23:38:06 -05:00
evazion
3aa5cab2aa sources: refactor normalize_for_source.
`normalize_for_source` was used to convert image URLs to page URLs when displaying sources
on the post show page. Move all the code for converting image URLs to page URLs from
`Sources::Strategies#normalize_for_source` to `Source::URL#page_url`.

Before, we had to be very careful in source strategies not to make any network calls in
`normalize_for_source`, since it was used in the view for the post show page. Now all the
code for generating page URLs is isolated in Source::URL, which makes source strategies
simpler. It also makes it easier to check if a source is an image URL or page URL, and if
the image URL is convertible to a page URL, which will make autotagging bad_link or
bad_source feasible.

Finally, this fixes it to generate better page URLs in a handful of cases:

* https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP
* https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6s instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6
* http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677
* https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png
* https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1
2022-03-23 01:34:04 -05:00
evazion
452ce8d165 artstation: add partial support for video clips (#5063).
Add partial support for fetching videos from ArtStation posts that
contain videos. Most of this code is disabled for now because actually
downloading these videos requires bypassing a Cloudflare captcha.
2022-03-21 16:51:42 -05:00
evazion
56f47c60e1 posts: fix exception when viewing post with source `Blog.`.
Fix a PublicSuffix::DomainNotAllowed exception raised when viewing or editing a post
with a source like `Blog.`.

This happened when parsing the post's source. `Danbooru::URL.parse("Blog.")` would
heuristically parse the source into `http://blog`. Calling any methods related to the
URL's hostname or domain would lead to calling `PublicSuffix.parse("blog")`, which
would fail with PublicSuffix::DomainNotAllowed.
2022-03-21 03:24:50 -05:00
evazion
defea08084 posts: fix exception in random:1 searches.
Fix regression in 1ad0e8688. Caused by `relation.order_values` returning
an array of Arel nodes instead of an array of strings when doing a
`random:1` search.
2022-03-21 01:29:10 -05:00