Commit Graph

12 Commits

Author SHA1 Message Date
evazion
918f32c554 Fix #4461: Improve posts/index page titles. 2022-04-30 01:52:33 -05:00
evazion
db6bb2ccac Fix #5136: Regular tags are now case-sensitive.
* Fix `AST.tag` to downcase the tag name.
* Change PostQuery::Parser to use build nodes using `AST.tag`,
  `AST.metatag`, `AST.wildcard`, etc methods instead of building nodes
  directly. This way all the normalization happens in the node
  constructor methods instead of in the parser.
2022-04-22 02:14:07 -05:00
evazion
ca5dd61728 post queries: optimize zero tag and single tag searches.
Avoid going through the full post query parser for empty searches or
simple single-tag searches.
2022-04-17 23:20:22 -05:00
evazion
eca0ab04f7 post queries: raise error on invalid searches.
Raise an error if the search is invalid for one of the following reasons:

* It contains multiple conflicting order: metatags (e.g. `order:score order:favcount` or `ordfav:a ordfav:b`).
* It contains a metatag that can't be used more than once: (e.g. `limit:5 limit:10`, `random:5 random:10`).
* It contains a metatag that can't be negated (e.g. `-order:score`, `-limit:20`, or `-random:20`).
* It contains a metatag that can't be used in an OR clause (e.g. ` touhou or order:score`, `touhou or limit:20`, `touhou or random:20`).
2022-04-17 23:20:22 -05:00
evazion
a4d43ae72a post queries: track whether metatag values are quoted.
This is necessary for the `commentary:` metatag, which has different
behavior depending on whether the metatag value is quoted. For example,
`commentary:translated` finds translated commentaries, while
`commentary:"translated"` finds commentaries containing the literal word
"translated".
2022-04-06 17:20:27 -05:00
evazion
783419bcd7 post queries: support single-quoted strings in metatags. 2022-04-06 00:18:38 -05:00
evazion
7fe717506d post queries: add methods for normalizing queries. 2022-04-04 03:56:56 -05:00
evazion
1957cb354e post queries: add #replace_aliases method. 2022-04-04 03:56:54 -05:00
evazion
bf7c721815 post queries: refactor AST #simplify method.
Refactor the `PostQuery::AST#simplify` method to split it into three
methods: `#trim` to eliminate redundant AND and OR clauses, `#simplify`
to expand deeply nested subexpressions, and `#sort` to sort the query
into alphabetical order.

This is so we can normalize queries written by users by parsing and
rewriting them, but without expanding out nested subexpressions, which
can substantially alter the way the query is written.
2022-04-04 00:48:40 -05:00
evazion
04551b8154 autocomplete: replace calls to PostQueryBuilder with PostQuery. 2022-03-30 02:12:25 -05:00
evazion
8c9e045a9c PostQuery::AST: fix #to_infix to not add unnecessary parentheses.
* Fix the `#to_infix` method to not add unnecessary parentheses around subexpressions.
* Fix metatags to add quotes around values when necessary.
2022-03-30 01:05:08 -05:00
evazion
4c7cfc73c6 search: add new tag search parser.
Add a new tag tag search parser that supports full boolean expressions, including `and`,
`or`, and `not` operators and parenthesized subexpressions.

This is only the parser itself, not the code for converting the search into SQL. The new
parser isn't used yet for actual searches. Searches still use the old parser.

Some example syntax:

* `1girl 1boy`
* `1girl and 1boy` (same as `1girl 1boy`)
* `1girl or 1boy`
* `~1girl ~1boy` (same as `1girl or 1boy`)
* `1girl and ((blonde_hair blue_eyes) or (red_hair green_eyes))`
* `1girl ~(blonde_hair blue_eyes) ~(red_hair green_eyes)` (same as above)
* `1girl -(blonde_hair blue_eyes)`
* `*_hair *_eyes`
* `*_hair or *_eyes`
* `user:evazion or fav:evazion`
* `~user:evazion ~fav:evazion`

Rules:

AND is implicit between terms, but may be written explicitly:

* `a b c` is `a and b and c`

AND has higher precedence (binds tighter) than OR:

* `a or b and c or d` is `a or (b and c) or d`
* `a or b c or d e` is `a or (b and c) or (d and e)`

All `~` operators in the same subexpression are combined into a single OR:

* `a b ~c ~d` is `a b (c or d)`
* `~a ~b and ~c ~d` is `(a or b) (c or d)`
* `(~a ~b) (~c ~d)` is `(a or b) (c or d)`

A single `~` operator in a subexpression by itself is ignored:

* `a ~b` is `a b`
* `~a and ~b` is `a and b`, which is `a b`
* `(~a) ~b` is `a ~b`, which is `a b`

The parser is written as a backtracking recursive descent parser built on top of
StringScanner and a handful of parser combinators. The parser generates an AST, which is
then simplified using Boolean algebra to remove redundant nodes and to convert the
expression to conjunctive normal form (that is, a product of sums, or an AND of ORs).
2022-03-29 18:21:46 -05:00