
2504 Commits

Author SHA1 Message Date
nonamethanks
49e232f2ae Foundation: add support for unconventional account names 2021-11-09 13:35:52 +01:00
evazion
2225c9b472 Merge pull request #4912 from nonamethanks/feat-foundation-video
Foundation: add support for videos
2021-11-05 06:03:45 -05:00
evazion
a5f589f9e0 aliases/implications: change automatic retirement rules.
Change the rules for automatically retiring aliases and implications:

* Retire aliases to tags that are empty, or that are for a general or
  artist tag that hasn't received any new posts in the last two years.
* Retire implications from tags that are empty.
* Don't retire aliases or implications for character, copyright, or
  meta tags any more, unless the tags are empty.
2021-11-05 05:46:50 -05:00
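The retirement rules in a5f589f9e0 can be sketched in Python as follows. This is an illustrative sketch only, not Danbooru's Ruby code. The `Tag` fields, the function names, and the approximation of "two years" as 730 days are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical tag record; field names are illustrative assumptions.
@dataclass
class Tag:
    name: str
    category: str           # "general", "artist", "character", "copyright", or "meta"
    post_count: int
    last_post_at: datetime  # time of the newest post with this tag (assumed available)

INACTIVE_AFTER = timedelta(days=2 * 365)  # "the last two years", approximated as 730 days

def should_retire_alias(consequent: Tag, now: datetime) -> bool:
    """Aliases are judged by the tag they point *to*."""
    if consequent.post_count == 0:
        return True
    if consequent.category in ("general", "artist"):
        return now - consequent.last_post_at > INACTIVE_AFTER
    # Character, copyright, and meta tags are only retired when empty.
    return False

def should_retire_implication(antecedent: Tag) -> bool:
    """Implications are judged by the tag they point *from*."""
    return antecedent.post_count == 0
```

The asymmetry follows the rules as written: an alias is judged by the tag it points to, while an implication is judged by the tag it points from.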
nonamethanks
6c9b49c194 Foundation: add support for videos 2021-11-05 09:43:49 +01:00
evazion
65ab7f1eb5 API: fix regression in expires_in URL parameter.
Fix `https://danbooru.donmai.us/artists.json?expires_in=300` failing with
an `'300' is not a valid duration` error. This call pattern is used by the
Translate Pixiv Tags userscript.

Caused by a5ed8c72, which changed the `age:N` metatag to require time
units, but this inadvertently changed the `expires_in` parameter to
require them too.

Using `expires_in` without time units is deprecated and will be removed
in the future.
2021-11-04 03:51:39 -05:00
evazion
7709e84502 BURs: allow reapproving failed BURs containing alias or implication removals.
Make it possible to reapprove failed BURs that removed aliases or
implications.

Previously, if a BUR failed midway through and we tried to reapprove it,
it would fail when it got to a `remove alias` line because the
alias had already been removed. Now we keep going if we try to remove an
alias or implication that has already been removed.
2021-11-03 19:45:28 -05:00
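Commit 7709e84502 works because removal becomes idempotent, so re-running a partially applied BUR is safe. Below is a minimal Python sketch of that idea. `AliasStore` and `process_bur` are hypothetical stand-ins, not Danbooru classes.

```python
from typing import Iterable, Set, Tuple

class AliasStore:
    """Hypothetical in-memory stand-in for the table of active aliases."""

    def __init__(self) -> None:
        self.active: Set[Tuple[str, str]] = set()

    def create(self, antecedent: str, consequent: str) -> None:
        self.active.add((antecedent, consequent))

    def remove(self, antecedent: str, consequent: str) -> None:
        # Idempotent: removing an alias that is already gone is a no-op
        # rather than an error, so a half-finished BUR can be re-run.
        self.active.discard((antecedent, consequent))

def process_bur(store: AliasStore, lines: Iterable[Tuple[str, str, str]]) -> None:
    for command, antecedent, consequent in lines:
        if command == "create alias":
            store.create(antecedent, consequent)
        elif command == "remove alias":
            store.remove(antecedent, consequent)
        else:
            raise ValueError(f"unknown BUR command: {command}")
```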
evazion
148752d3c4 PostQueryBuilder: remove useless code.
The workaround for `unaliased:fav:1` is no longer needed since favorites
are no longer included in the post's tag_index.
2021-11-02 04:07:21 -05:00
evazion
4b94cac757 Merge pull request #4911 from nonamethanks/feat-plurk
Add Plurk support
2021-11-02 04:07:07 -05:00
evazion
a5ed8c72c9 search: fix parsing of invalid metatag values.
* Change `age:` metatag to require time units. This means e.g.
  `age:<600` no longer works; instead you have to say `age:<600sec`.

* Allow time units in the `age:` metatag to be abbreviated as long as
  they're unambiguous. This means `age:<60sec`, `age:<5min`, and
  `age:<5mon` now work, in addition to `age:<60s` and `age:<60seconds`.

* Allow the `ratio:` metatag to be written like `ratio:16/9` in addition
  to `ratio:16:9`.

* Fix invalid date searches like `date:foo` or `date:05-15-2021`
  to return nothing instead of raising an "undefined method
  'beginning_of_day' for nil" exception. (`date:05-15-2021` is invalid
  because it's parsed as DD-MM-YYYY.)

* Fix invalid searches like `score:foo`, `ratio:foo`, and `mpixels:foo`
  to return nothing instead of being treated like `score:0`, `ratio:0`,
  and `mpixels:0`.

* Fix `age:<60m` to return nothing instead of silently being treated
  like `age:<60seconds`.

* Fix `age:foo` to return nothing instead of silently being treated like
  `age:0d` (return all uploads from today).

Fixes #4389.
2021-11-02 01:54:05 -05:00
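The abbreviation rule in a5ed8c72c9 amounts to prefix matching: a unit abbreviation is accepted only if it is a prefix of exactly one unit name. That is why `mon` and `min` work while `m` does not. The Python sketch below is an illustration, not Danbooru's parser. The unit table, the 30-day month, the 365-day year, and the assumption that the `<`/`>` operator has already been stripped off are all hypothetical. For `ratio:`, Python's `Fraction` already accepts `16/9`, so mapping `:` to `/` covers both spellings.

```python
import re
from fractions import Fraction
from typing import Optional

# Hypothetical unit table; the real set of units and their lengths may differ.
UNITS = {
    "seconds": 1,
    "minutes": 60,
    "hours": 60 * 60,
    "days": 24 * 60 * 60,
    "weeks": 7 * 24 * 60 * 60,
    "months": 30 * 24 * 60 * 60,   # assumption: a month is 30 days
    "years": 365 * 24 * 60 * 60,   # assumption: a year is 365 days
}

def parse_age(value: str) -> Optional[int]:
    """Parse e.g. '60sec', '5min', or '5mon' into seconds; None means 'match nothing'."""
    m = re.fullmatch(r"(\d+)([a-z]+)", value.strip().lower())
    if m is None:
        return None                    # e.g. 'foo', or '600' with no unit
    number, unit = int(m.group(1)), m.group(2)
    candidates = [name for name in UNITS if name.startswith(unit)]
    if len(candidates) != 1:
        return None                    # unknown or ambiguous, e.g. 'm'
    return number * UNITS[candidates[0]]

def parse_ratio(value: str) -> Optional[Fraction]:
    """Accept '16:9', '16/9', or a plain number; None for invalid input like 'foo'."""
    try:
        return Fraction(value.replace(":", "/"))
    except (ValueError, ZeroDivisionError):
        return None
```

Returning `None` instead of a default such as `0` is what allows an invalid value to produce an empty result rather than being silently reinterpreted.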
nonamethanks
060223c9e2 Add Plurk support 2021-11-01 16:21:27 +01:00
evazion
788dcbd87b Temp disable dumping favorites table to BigQuery.
The favorites table is too big and dumping it tends to time out. The
job then keeps retrying even though it always fails, and multiple
instances of the job build up in the job queue because the old jobs
never finish.
2021-11-01 05:15:31 -05:00
evazion
5177a28f2c Merge pull request #4910 from nonamethanks/feat-foundation
Add Foundation support
2021-11-01 05:07:44 -05:00
nonamethanks
043f2fb124 Add Foundation support 2021-11-01 01:39:56 +01:00
evazion
9ff4d94382 Merge pull request #4909 from nonamethanks/add-lofter-theme
Lofter: Add support for additional theme
2021-10-31 05:13:04 -05:00
nonamethanks
5946544f71 Lofter: Add support for additional theme 2021-10-30 17:22:45 +02:00
evazion
f593828bb9 storage manager: refactor base_dir option.
Fix it so the `base_dir` option is only required by subclasses that
actually use it. The StorageManager::Mirror class doesn't use it.
2021-10-29 07:14:21 -05:00
evazion
4095d14f2a media assets: fix tagged filenames option.
Fix the `enable_seo_post_urls` config option not being respected. This
option controls whether filenames in image URLs contain the tags. This
option requires URL rewrites in Nginx to work, so it's disabled by
default.
2021-10-29 07:14:21 -05:00
evazion
1614b301e3 storage managers: add mirror storage manager.
Add a storage manager that allows mirroring files to multiple storage
backends.
2021-10-29 07:14:21 -05:00
evazion
e697d1886d Fix #4899: Alias fails when implication already exists. 2021-10-27 01:18:54 -05:00
evazion
eddff747d6 Fix certain IPs not being recognized as proxies.
Fix certain IPs (namely Digital Ocean IPs) no longer being recognized as
proxy IPs by the Ipregistry.co API. Caused by some sudden change in the
API.
2021-10-27 00:05:44 -05:00
evazion
082544ab03 StorageManager: remove Post-specific code.
Refactor StorageManager to remove all image URL generation code. Instead
the image URL generation code lives in MediaAsset.

Now StorageManager is only concerned with how to read and write files to
remote storage backends like S3 or SFTP, not with how image URLs should
be generated. This way the file storage code isn't tightly coupled to
posts, so it can be used to store any kind of file, not just images
belonging to posts.
2021-10-27 00:05:30 -05:00
evazion
84212acfae Merge pull request #4905 from nottalulah/remove-locks-from-autocomplete
remove references to locks
2021-10-25 21:18:36 -05:00
evazion
f1b5c34b4d posts: show length of videos and animations in thumbnails.
Show the length of videos and animated posts in the thumbnail. The
length is shown in the top left corner in MM:SS format. This replaces the
play button icon.

Show a speaker icon instead of a music note icon for posts with sound.

Doing this requires doing `.includes(:media_asset)` in a bunch of
places to avoid N+1 queries when we access the post's duration.
2021-10-25 02:56:55 -05:00
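The MM:SS label from f1b5c34b4d can be produced with a simple divmod. This Python sketch is illustrative only. The commit does not say whether Danbooru rounds or truncates the duration, so rounding to the nearest second is an assumption here. Because there is no hour field, minutes can exceed 59.

```python
def format_duration(seconds: float) -> str:
    """Format a duration as MM:SS for a thumbnail overlay (rounding is an assumption)."""
    total = round(seconds)
    minutes, secs = divmod(total, 60)
    return f"{minutes:02d}:{secs:02d}"
```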
Lily
647848b499 remove references to locks 2021-10-24 15:16:48 -03:00
evazion
8d5e0a5b58 replacements: don't delete replaced files.
Don't delete replaced files after 30 days. There are only about 30k
replacements in total, so the cost of keeping replaced files is
negligible. Deleting them was also wrong because the corresponding media
assets weren't destroyed, so there were active media assets with missing files.
2021-10-24 04:35:13 -05:00
evazion
f78378cc69 recommendations: reduce cache lifetime to 5 minutes.
These calls aren't actually slow and don't need to be cached for long
periods of time.
2021-10-21 00:46:47 -05:00
evazion
0221ecdf29 uploads: remove useless code. 2021-10-18 06:25:02 -05:00
evazion
748fdf33d4 uploads: don't autotag sound on videos.
Don't automatically add the sound tag to videos. This was incorrect
nearly 20% of the time because of silent audio tracks. This error rate
is too high.

https://danbooru.donmai.us/posts?tags=exif:Track2:AudioChannels+-sound
https://danbooru.donmai.us/posts?tags=exif:Track1:AudioChannels+-sound
2021-10-18 06:16:47 -05:00
evazion
bc506ed1b8 uploads: refactor to simplify ugoira-handling and replacements:
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
  the MediaAsset is created, not when the post is created. This way it's
  possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
  with the ugoira frame data already attached to it, instead of returning it
  in the `data` field then passing it around separately in the `context`
  field of the upload.
2021-10-18 05:18:46 -05:00
evazion
1d034a3223 media assets: move more file-handling logic into MediaAsset.
Move more of the file-handling logic from UploadService and
StorageManager into MediaAsset. This is part of refactoring posts and
uploads to allow multiple images per post.
2021-10-18 00:10:29 -05:00
evazion
8b3ab04724 media file: fix calculation of video/animation duration.
Fix how the duration of videos and animated GIFs / PNGs is calculated.
If we can't determine the duration from the file metadata, then play the
entire video or animation back using FFmpeg and scrape the duration and
frame count.

This is necessary for things like WebM files where the duration metadata
is optional, or animated GIFs and PNGs that don't have a duration field
in the metadata, only a frame count and a sequence of frame delays.
2021-10-17 20:15:51 -05:00
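The fallback in 8b3ab04724 decodes the whole file and reads the final counters that FFmpeg prints. A command such as `ffmpeg -i input.webm -f null -` decodes every frame, discards the output, and writes progress lines containing `frame=` and `time=` to stderr. The Python sketch below parses only that text. It does not call FFmpeg. The sample string in the test is hand-written for illustration and was not captured from a real run. FFmpeg's exact output format can vary between versions.

```python
import re
from typing import Optional, Tuple

def scrape_ffmpeg_stats(stderr: str) -> Optional[Tuple[int, float]]:
    """Return (frame_count, duration_seconds) from FFmpeg's last progress update.

    Assumes stderr from something like `ffmpeg -i input.webm -f null -`,
    where progress updates contain `frame=N` and `time=HH:MM:SS.ss`.
    """
    frames = re.findall(r"frame=\s*(\d+)", stderr)
    times = re.findall(r"time=(\d+):(\d{2}):(\d{2}(?:\.\d+)?)", stderr)
    if not frames or not times:
        return None  # e.g. decoding failed or FFmpeg printed time=N/A
    hours, minutes, seconds = times[-1]
    duration = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return int(frames[-1]), duration
```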
evazion
2845164872 search: support quoted phrases, OR, and NOT operators in full-text search.
Make all full-text search fields support quoted phrases and OR and NOT
operators.

This affects all text search fields (any search field that looks like `*_matches`).

Examples:

* hakurei reimu    - matches anything containing the words "hakurei" and "reimu", in any order.
* hakurei or reimu - matches either "hakurei" or "reimu".
* hakurei -reimu   - matches "hakurei" but not "reimu".
* "hakurei reimu"  - matches the exact phrase "hakurei reimu".
* "reimu hakurei"  - matches the exact phrase "reimu hakurei".

* https://danbooru.donmai.us/notes?search[body_matches]=reimu+hakurei
* https://danbooru.donmai.us/notes?search[body_matches]=reimu+or+hakurei
* https://danbooru.donmai.us/notes?search[body_matches]=reimu+-hakurei
* https://danbooru.donmai.us/notes?search[body_matches]="hakurei+reimu"
* https://danbooru.donmai.us/notes?search[body_matches]="reimu+hakurei"

The phrase search ability partially fixes #4536 (Inconsistent behavior
of search function for comments/forums).

See `websearch_to_tsquery` [1] for full details of the search syntax.

[1]: https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES
2021-10-16 19:13:09 -05:00
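The search syntax in 2845164872 can be modeled as a disjunction of AND-ed terms. `-` negates a term, and `or` separates branches. This reflects tsquery precedence, where `!` binds tighter than `&`, which binds tighter than `|`. The Python sketch below is a toy evaluator written for illustration. It is not PostgreSQL's `websearch_to_tsquery`. It does not model stemming, stop words, or Postgres's tokenizer rules.

```python
import re
from typing import List, Tuple

Term = Tuple[bool, List[str]]  # (negated, words); several words form a phrase

def tokenize(text: str) -> List[str]:
    # Lowercase alphanumeric words; real Postgres parsing also stems and drops stop words.
    return re.findall(r"[a-z0-9]+", text.lower())

def parse_query(query: str) -> List[List[Term]]:
    """Split a query into OR-separated groups of AND-ed terms."""
    groups: List[List[Term]] = [[]]
    for m in re.finditer(r'(-?)"([^"]*)"|(\S+)', query):
        word = m.group(3)
        if word is not None and word.lower() == "or":
            groups.append([])                      # start a new OR branch
            continue
        if word is not None:                       # bare word, possibly -negated
            negated, words = word.startswith("-"), tokenize(word.lstrip("-"))
        else:                                      # quoted phrase, possibly -negated
            negated, words = m.group(1) == "-", tokenize(m.group(2))
        if words:
            groups[-1].append((negated, words))
    return [group for group in groups if group]

def contains_phrase(tokens: List[str], words: List[str]) -> bool:
    n = len(words)
    return any(tokens[i:i + n] == words for i in range(len(tokens) - n + 1))

def matches(text: str, query: str) -> bool:
    tokens = tokenize(text)
    # A text matches if every term in at least one OR branch is satisfied.
    return any(
        all(contains_phrase(tokens, words) != negated for negated, words in group)
        for group in parse_query(query)
    )
```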
evazion
e3b836b506 Refactor full-text search to get rid of tsvector columns.
Refactor full-text search on several tables (comments, dmails,
forum_posts, forum_topics, notes, and wiki_pages) to use to_tsvector
expression indexes instead of dedicated tsvector columns. This way
full-text search works the same way across all tables.

API changes:

* Changed /wiki_pages.json?search[body_matches] to match against only
  the body. Previously, `body_matches` matched against both the title and the body.

* Added /wiki_pages.json?search[title_or_body_matches] to match against
  both the title and the body.

* Fixed /dmails.json?search[message_matches] to match against both the
  title and body when doing a wildcard search. Previously, a wildcard search
  only matched against the body.

* Added /dmails.json?search[body_matches] to match against only the dmail body.
2021-10-16 07:44:27 -05:00
evazion
c0f744f84d Fix #4893: Add a FIELD_present parameter variation for text fields.
Usage:

* https://danbooru.donmai.us/wiki_pages.json?search[body_present]=true
* https://danbooru.donmai.us/wiki_pages.json?search[body_present]=false
2021-10-13 04:10:23 -05:00
evazion
206a4b5de5 docker: avoid rebuilding CSS/JS assets on every commit.
Restructure the Dockerfile and the CSS/JS files so that we only rebuild
the CSS and JS when they change, not on every commit.

Before it took several minutes to rebuild the Docker image after every
commit, even when the JS/CSS files didn't change. This also made pulling
images slower.

This requires refactoring the CSS and JS to not use embedded Ruby (ERB)
templates, since this made the CSS and JS dependent on the Ruby
codebase, which is why we had to rebuild the assets after every Ruby
change.
2021-10-13 02:48:30 -05:00
evazion
587a9d0c8f tags: move tag category definitions out of the config file.
Move all the code for defining tag categories from the config file to
TagCategory. It didn't belong in the config because it's not possible to
add new tag categories purely in the config without editing other things
like the CSS.

Also change it so that tag colors are hardcoded in the CSS instead of
generated using ERB. Generating the CSS in ERB meant that the Docker
build had to recompile the CSS on every commit, even when it didn't
change, because it relied on Ruby code outside the CSS that we couldn't
guarantee didn't change.
2021-10-12 21:17:17 -05:00
evazion
92e20713e3 search: fixup hardcoded small search threshold.
Fixup for f6abf39eb.
2021-10-12 19:01:31 -05:00
evazion
f6abf39ebc search: try to optimize slow searches.
Try to optimize certain types of common slow searches:

* Searches for mutually-exclusive tags (e.g. `1girl multiple_girls`,
  `touhou solo -1girl -1boy`)
* Relatively large tags that are heavily skewed towards old posts
  (e.g. lucky_star, haruhi_suzumiya_no_yuuutsu, inazuma_eleven_(series),
  imageboard_desourced).
* Mid-sized tags in the <30k post range that Postgres thinks are
  big enough for a post id index scan, but a tag index scan is faster.

The general pattern is Postgres not using the tag index because it
thinks scanning down the post id index would be faster, but it's
actually much slower because it degrades to a full table scan. This
usually happens when Postgres thinks a tag is larger or more common than
it really is. Here we try to force Postgres into using the tag index
when we know the search is small.

One case that is still slow is `2girls -multiple_girls`. This returns no
results, but we can't know that without searching all of `2girls`. The
general case is searching for `A -B` where A is a subset of B and both
are large tags.

Hopefully fixes #581, #654, #743, #1020, #1039, #1421, #2207, #4070,
#4337, #4896, and various other issues raised over the years regarding
slow searches.
2021-10-12 02:30:30 -05:00
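One piece of reasoning behind f6abf39ebc is that a search `A B` can never return more posts than the smaller of A and B. When that upper bound is small, scanning the tag index is the safe choice. The Python sketch below models only that decision. It is not Danbooru's query builder. The threshold value, the function names, and the returned labels are hypothetical. The sketch does not cover the mutually-exclusive-tags case that the commit also targets.

```python
from typing import Dict, List

SMALL_SEARCH_THRESHOLD = 1_000  # hypothetical cutoff, not Danbooru's real value

def result_upper_bound(positive_tags: List[str], tag_counts: Dict[str, int]) -> int:
    # `A B` can never return more posts than the smaller of A and B.
    return min(tag_counts.get(tag, 0) for tag in positive_tags)

def choose_scan(positive_tags: List[str], tag_counts: Dict[str, int]) -> str:
    if not positive_tags:
        return "planner_default"  # nothing to narrow the search by
    if result_upper_bound(positive_tags, tag_counts) <= SMALL_SEARCH_THRESHOLD:
        return "tag_index"        # known-small search: force the tag index
    return "planner_default"      # otherwise leave the choice to Postgres
```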
evazion
0b22e873c9 search: cache timed out search counts.
When a search is performed, we cache the post count so we don't have to
calculate it again every time the user switches pages. However, if the
count timed out, we previously didn't cache it, so we redid the slow
count on every page load. This usually happens on multi-tag searches
that return a lot of results, `1girl solo` for example.

This changes it so that the count is cached even when it times out. This
will speed up large multi-tag searches.

This also changes it so that the count is cached for a fixed 5 minutes.
Previously the lifetime varied with the size of the count, but this
probably didn't make much difference.
2021-10-12 01:33:21 -05:00
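The key change in 0b22e873c9 is that a timed-out count is cached like any other result, with a fixed 5-minute lifetime. The Python sketch below models a timeout as `None` and stores that value as well. The class, the method names, and the injectable clock (used for testing) are hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

CACHE_TTL = 5 * 60  # fixed 5-minute lifetime, per the commit message

class CountCache:
    """Hypothetical cache of post counts keyed by search query."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        self.entries: Dict[str, Tuple[float, Optional[int]]] = {}

    def get_count(self, query: str, count_fn: Callable[[str], Optional[int]]) -> Optional[int]:
        now = self.clock()
        hit = self.entries.get(query)
        if hit is not None and now - hit[0] < CACHE_TTL:
            return hit[1]                    # may be None: a cached timeout
        count = count_fn(query)              # None means the count timed out
        self.entries[query] = (now, count)   # cache it even when it timed out
        return count
```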
evazion
f155023b77 posts: remove unused exception classes. 2021-10-11 18:58:15 -05:00
evazion
7976323f7a wiki pages: change tsvector update trigger to not use test_parser.
Change the wiki_pages tsvector_update_trigger to use
`pg_catalog.english` instead of `public.danbooru`. This changes how wiki
page text is parsed for full-text search to use the standard English
parser instead of test_parser. This is to prepare for dropping
test_parser. Using test_parser here was wrong anyway because it meant
that punctuation wasn't removed from words when indexing wiki pages for
full-text search.
2021-10-11 03:34:47 -05:00
evazion
37a8dc5dbd posts: use string_to_array index for tag searches.
Use the `string_to_array(tag_string, ' ')` index instead of the
`tag_index` for tag searches. The string_to_array index lets us treat
the tag_string as an array for searching purposes. This lets us get rid
of the tag_index column and the test_parser dependency in the future.
2021-10-10 22:00:10 -05:00
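With the `string_to_array(tag_string, ' ')` index from 37a8dc5dbd, a tag search becomes an array test. In PostgreSQL, "has all these tags" corresponds to the `@>` (contains) operator, and "has none of these tags" corresponds to `NOT (... && ...)` (no overlap). The exact SQL Danbooru generates is not shown in the commit, so that mapping is an assumption. The Python sketch below mirrors those semantics with sets. Note that Postgres returns an empty array for an empty string, while `str.split` would return `['']`.

```python
from typing import Iterable, List

def tag_array(tag_string: str) -> List[str]:
    """Rough Python analogue of string_to_array(tag_string, ' ')."""
    return tag_string.split(" ") if tag_string else []

def post_matches(tag_string: str, include: Iterable[str], exclude: Iterable[str]) -> bool:
    tags = set(tag_array(tag_string))
    # include ~ `arr @> ARRAY[...]`        (array contains every listed tag)
    # exclude ~ `NOT (arr && ARRAY[...])`  (array shares no listed tag)
    return set(include) <= tags and tags.isdisjoint(exclude)
```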
evazion
1653392361 posts: stop updating fav_string attribute.
Stop updating the fav_string attribute on posts. The column still exists
on the table, but is no longer used or updated.

Like the pool_string in 7d503f08, the fav_string was used in the past to
facilitate `fav:X` searches. Posts had a hidden fav_string column that
contained a list of every user who favorited the post. These were
treated like fake hidden tags on the post so that a search for `fav:X`
was treated like a tag search.

The fav_string attribute has been unused for search purposes for a while
now. It was only kept because of technicalities that required
departitioning the favorites table first (340e1008e) before it could be
removed. Basically, removing favorites with `@favorite.destroy` was
slow because Rails always deletes objects by ID, but we didn't have an
index on favorites.id, and we couldn't easily add one until the
favorites table was departitioned.

Fixes #4652. See https://github.com/danbooru/danbooru/issues/4652#issuecomment-754993802
for more discussion of issues caused by the fav_string (in short: write
amplification, post table bloat, and favorite inconsistency problems).
2021-10-09 22:36:26 -05:00
evazion
5ce36b482f maintenance: disable amcheck job.
Creates too much load and causes creating favorites to time out.
2021-10-09 11:45:36 -05:00
evazion
8b0d58130c posts: add workaround to avoid falsely deleting pending posts.
Add a temporary workaround for the database index corruption bug. Add a
check to skip deleting pending posts if they're not really pending.
2021-10-08 21:47:56 -05:00
evazion
c4eeeb8531 search: optimize counting posts for fav: and pool: searches.
Optimize counting the number of posts returned by fav:<name> and
pool:<name> searches. Use cached counts to avoid slow count(*) queries
for users with lots of favorites.
2021-10-08 21:26:42 -05:00
evazion
26a411ba27 favorites: include favorites in bigquery exports.
Include the favorites table in the nightly database dumps in BigQuery.
Previously we couldn't do this because we didn't have an index on
the favorite ID, which we needed to iterate across the table efficiently.

Note that this doesn't include private favorites. Note also that if a
user switches their favorites from private to public, then their
favorites will begin to appear in these dumps.
2021-10-08 21:26:42 -05:00
evazion
340e1008e9 favorites: merge favorites subtables.
Merge the 100 favorite subtables into a single table.

Previously the favorites table was partitioned by user id into 100
subtables to try to make searching by user id faster. This wasn't really
necessary and was probably slower than just making an index on
(favorites.user_id, favorites.id) to satisfy ordfav searches. BTree
indexes are logarithmic so dividing an index by 100 doesn't make it 100
times faster to search; instead it just removes a layer or two from the
tree.

This also adds a uniqueness index on (user_id, post_id) to prevent
duplicate favorites. Previously we had to check for duplicates at the
application layer, which required careful locking to do it correctly.

Finally, this adds an index on favorites.id, which was surprisingly
missing before. This made ordering and deleting favorites by id really
slow because it degraded to a sequential scan.
2021-10-08 21:26:42 -05:00
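The BTree argument in 340e1008e9 is simple arithmetic. A tree with fanout f needs about log_f(N) levels, and log_f(N/100) = log_f(N) - log_f(100). For any fanout above 100, that difference is less than one level. The Python sketch below counts levels with integer arithmetic to avoid floating-point rounding. The row count (200 million) and the fanout (300) are made-up example values, not measurements of Danbooru's tables.

```python
def btree_levels(rows: int, fanout: int) -> int:
    """Smallest number of levels L such that fanout**L >= rows."""
    levels, capacity = 1, fanout
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels

# One big index vs. one of 100 equal partitions (hypothetical sizes).
whole = btree_levels(200_000_000, 300)
partition = btree_levels(200_000_000 // 100, 300)
```

With these example values, partitioning saves one level per lookup (4 vs. 3), not a factor of 100.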
evazion
595e02ab45 posts: add duration:<x> and order:duration metatags.
Add duration:<x> and order:duration metatags for searching animated
posts by duration.

https://danbooru.donmai.us/posts?tags=animated+duration:<5.0
https://danbooru.donmai.us/posts?tags=animated+duration:>60
https://danbooru.donmai.us/posts?tags=animated+order:duration
2021-10-07 03:21:08 -05:00
evazion
2595f18b2f posts: fix calculation of animated PNG duration.
Fix certain animated PNGs returning NaN as the duration because the
frame rate was being reported as "0/0" by FFmpeg. This happens when the
animation has zero delay between frames. This is supposed to mean a PNG
with an infinitely fast frame rate, but in practice browsers limit it to
around 10FPS. The exact frame rate browsers will use is unknown and
implementation defined.
2021-10-06 21:04:36 -05:00
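The bug fixed in 2595f18b2f arises when a duration is computed as frames divided by a frame rate of "0/0". In Ruby, dividing 0.0 by zero yields NaN. The Python sketch below treats a zero numerator or denominator as "unknown" and substitutes a fallback rate. The fallback of 10 FPS is a hypothetical choice based on the "around 10FPS" browser behavior described above. The commit does not say which value Danbooru actually uses, and the function names are illustrative.

```python
from fractions import Fraction
from typing import Optional

ASSUMED_ZERO_DELAY_FPS = 10  # hypothetical fallback; the commit gives no exact value

def parse_frame_rate(rate: str) -> Optional[Fraction]:
    """Parse an FFmpeg-style rational like '30000/1001'; '0/0' means unknown."""
    num, _, den = rate.partition("/")
    numerator, denominator = int(num), int(den or 1)
    if numerator == 0 or denominator == 0:
        return None
    return Fraction(numerator, denominator)

def animation_duration(frame_count: int, rate: str) -> float:
    fps = parse_frame_rate(rate)
    if fps is None:
        fps = Fraction(ASSUMED_ZERO_DELAY_FPS)  # avoid 0/0 producing NaN
    return float(frame_count / fps)
```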