Commit Graph

2852 Commits

Author SHA1 Message Date
evazion
c96bdd1766 autocomplete: fix ranking of exact matches.
Fix a bug where searching for `sakana~` ranked `sakana~_(meme)` beneath
random artist tags containing the word `sakana`. Now, if the search contains
punctuation, we rank exact matches first, even for small tags. Previously, we
ranked exact matches for small tags lower than inexact matches for large
tags. If the search contains punctuation, it's a strong signal the user
is looking for an exact match.
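
The ranking rule described above can be sketched roughly as follows. This is only an illustration, not the actual implementation: the function name `rank_matches`, the definition of "exact" as "the tag contains the query verbatim", the exclusion of `_` from punctuation, and the post counts in the usage note are all assumptions.

```python
import string

# Assumption: "_" is a word separator in tag names, so it does not count as punctuation.
PUNCTUATION = set(string.punctuation) - {"_"}


def rank_matches(query, candidates):
    """Order (tag_name, post_count) pairs for an autocomplete query.

    If the query contains punctuation, tags containing the query verbatim
    are ranked first, regardless of size. Otherwise larger tags come first.
    """
    has_punctuation = any(ch in PUNCTUATION for ch in query)

    def sort_key(candidate):
        name, post_count = candidate
        exact_first = has_punctuation and query in name
        # False sorts before True, so exact matches come first; ties break on size.
        return (not exact_first, -post_count)

    return sorted(candidates, key=sort_key)
```

With made-up post counts, `rank_matches("sakana~", [("sakana_(artist)", 5000), ("sakana~_(meme)", 100)])` puts `sakana~_(meme)` first, while the query `sakana` (no punctuation) still ranks the larger tag first.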
2022-09-09 15:58:48 -05:00
evazion
0cc76625eb Update Ruby gems and Yarn packages. 2022-09-07 03:13:13 -05:00
evazion
d2147eca80 tumblr: fix exception when fetching data for video urls.
Fix an exception when trying to fetch source data for URLs like
https://va.media.tumblr.com/tumblr_pgohk0TjhS1u7mrsl.mp4.

For these URLs it's not possible to use the trick where we try to open
the URL as an HTML page and scrape the post id from the HTML. Instead we
get the raw video if we try to do this.
2022-09-05 16:15:47 -05:00
evazion
f55951ab58 tumblr: fix exception when parsing mangled image urls.
Fix a nil exception when trying to parse invalid URLs like `https://25.media.tumblr.com/91719d337b218681abc48cdc24e`.
2022-09-05 16:15:46 -05:00
evazion
f83af31a00 autocomplete: fix usernames not being highlighted in @mentions.
Fix usernames not being highlighted when completing @mentions.

This also changes it so the autocomplete results don't include the '@'
in front of the name.

Minor breaking change to the /autocomplete.json API. Autocomplete
results for mentions now have the type `mention` instead of `user`.
2022-09-03 20:34:26 -05:00
evazion
03c02ef78e autocomplete: fix accidental API breakage.
Fix /autocomplete.json returning results like this:

    [
      {
        "table": {
          "type": "tag-word",
          "label": "long hair",
          "value": "long_hair",
          "category": 0,
          "post_count": 2818211,
          "antecedent": null
        }
      }
    ]

instead of like this:

    [
      {
        "type": "tag-word",
        "label": "long hair",
        "value": "long_hair",
        "category": 0,
        "post_count": 2818211
      }
    ]

Also change it so that optional attributes like `antecedent` aren't
returned if they're null. This could be a minor breaking change if users
rely on these attributes always being present.
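
For API consumers, the practical effect of omitting null attributes might look like the sketch below. The helper `compact_result` is hypothetical and only mimics the described behavior. It is not the server code.

```python
import json


def compact_result(result):
    """Drop attributes whose value is null (None), keeping falsy values such as 0."""
    return {key: value for key, value in result.items() if value is not None}


result = {
    "type": "tag-word",
    "label": "long hair",
    "value": "long_hair",
    "category": 0,
    "post_count": 2818211,
    "antecedent": None,
}

compacted = compact_result(result)
print(json.dumps([compacted], indent=2))

# Clients should read optional attributes defensively instead of indexing directly.
antecedent = compacted.get("antecedent")
```

Note that `category` is kept even though it is `0`. Only attributes that are actually null are removed.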

Lastly change XML results to look like this:

    <autocomplete-service-results type="array">
      <autocomplete-service-result>
        <type>tag-word</type>
        <label>long hair</label>
        <value>long_hair</value>
        <category type="integer">0</category>
        <post-count type="integer">2801117</post-count>
      </autocomplete-service-result>
    </autocomplete-service-results>

instead of like this:

    <objects type="array">
      <object>
        <type>tag</type>
        <label>hair ornament</label>
        <value>hair_ornament</value>
        <category type="integer">0</category>
        <post-count type="integer">883340</post-count>
        <antecedent nil="true"/>
      </object>
    </objects>
2022-09-03 18:53:29 -05:00
evazion
6235965da0 autocomplete: fix labels for static metatags.
When completing a static metatag, such as rating: or order:,
don't include the metatag name in the autocomplete menu.

For example, when completing `rating:g`, show `general` in the
autocomplete menu, not `rating:general`.

This makes static metatags consistent with other metatags.
2022-09-02 13:56:53 -05:00
evazion
f8e4e5724f autocomplete: switch to word-based tag matching.
Switch autocomplete to match individual words in the tag, instead of
only matching the start of the tag.

For example, "hair" matches any tag containing the word "hair", not just tags
starting with "hair". "long_hair" matches all tags containing the words "long"
and "hair", which includes "very_long_hair" and "absurdly_long_hair".

Words can be in any order and words can be left out. So "closed_eye" matches
"one_eye_closed". "asuka_langley_souryuu" matches "souryuu_asuka_langley".

This has several advantages:

* You can search characters by first name. For example, "miku" matches "hatsune_miku".
  "zelda" matches both "princess_zelda" and "the_legend_of_zelda".
* You can find the right tag even if you get the word order wrong, or forget a word.
  For example, "eyes_closed" matches "closed_eyes". "hair_over_eye" matches "hair_over_one_eye".
* You can find more related tags. For example, searching "skirt" shows all tags
  containing the word "skirt", not just tags starting with "skirt".

The downside is this may break muscle memory by changing the autocomplete order of
some tags. This is an acceptable trade-off.

You can get the old behavior by writing a "*" at the end of the tag. For
example, searching "skirt*" gives the same results as before.
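
The matching rule can be approximated with the sketch below. It assumes that tags are split into words on underscores only and that each search word must be a prefix of some word in the tag. The real word-splitting and ranking rules may differ, and the function names are hypothetical.

```python
def split_words(tag):
    """Split a tag into lowercase words on underscores (an assumed rule)."""
    return [word for word in tag.lower().split("_") if word]


def matches(query, tag):
    """Return True if the autocomplete query matches the tag."""
    if query.endswith("*"):
        # Old behavior: only match the start of the tag.
        return tag.startswith(query[:-1])

    tag_words = split_words(tag)
    # Every query word must be a prefix of some tag word; order is ignored
    # and the tag may contain extra words.
    return all(
        any(tag_word.startswith(query_word) for tag_word in tag_words)
        for query_word in split_words(query)
    )
```

Under these assumptions, `matches("closed_eye", "one_eye_closed")` and `matches("miku", "hatsune_miku")` are true, while `matches("skirt*", "pleated_skirt")` is false.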
2022-09-02 13:56:26 -05:00
evazion
2b76a4c5ba tumblr: fix exception when parsing subdomainless Tumblr URLs.
Fix exception when a post has a Tumblr source without a subdomain, such
as `https://tumblr.com`.
2022-08-30 01:52:55 -05:00
evazion
cf13ab1540 autocomplete: render html server-side.
Render the HTML for autocomplete results server-side instead of in
JavaScript. This is cleaner than building HTML in JavaScript, but it may
hurt caching because the HTTP responses are larger.

Fixes #4698: user autocomplete contains links to /posts

Also fixes a bug where tag counts in the autocomplete menu were different
from tag counts displayed elsewhere because of differences in rounding.
2022-08-30 01:26:02 -05:00
evazion
55266be2ef autocomplete: drop support for shortcut abbreviations in aliases.
Previously if you typed e.g. "/tr" in autocomplete we would first check
if "/tr" was aliased to another tag before expanding out the abbreviation.
This was for compatibility with legacy shortcut aliases. These aliases
have been removed so this is dead code now.
2022-08-30 01:25:46 -05:00
evazion
4448b3d15b posts: fix incorrect post counts for -pool:, -fav: searches
Fix `-pool:1234` and `-fav:evazion` searches incorrectly returning the
same post count as `pool:1234` and `fav:evazion` searches.
2022-08-28 23:17:01 -05:00
evazion
f7794de0b7 weibo: fix bad artist name suggestions in new artist form.
Fix the new artist form suggesting invalid Chinese tag names for Weibo
artists. Suggest `weibo_123456` instead as a placeholder.
2022-08-26 01:25:05 -05:00
evazion
4d009568fd Fix #5165: add support for weibo share urls 2022-08-26 01:12:23 -05:00
evazion
115085006e Fix #5194: AND/OR no longer trigger autocomplete.
Also change the /autocomplete.json API to no longer strip '-' and '~'
from the start of the tag. This may be a breaking change if third-party
scripts relied on this behavior.
2022-08-25 20:45:22 -05:00
evazion
4215d5ed86 Fix #5233: is: metatags don't show deleted posts if relevant.
Fix thumbnails of deleted posts still being hidden when searching for `is:deleted`.
2022-08-24 16:05:34 -05:00
evazion
600bdc9ae6 pixiv: drop support for https://tc-pximg01.techorus-cdn.com urls.
This was an obsolete URL format briefly used by Pixiv around 2019-2020.
There were only ~80 posts with sources using this format. They have been
manually fixed.
2022-08-24 15:54:10 -05:00
evazion
bf3ee9cfb8 Fix #5238: Trying to upload a pixiv direct image url that got trumped by a revision redirects to the new post if it's uploaded.
Bug: When uploading a direct Pixiv image URL, we ignored it in favor of the
image URL returned by the Pixiv API. This meant if you tried to upload the
original version of a revised image, we would get the revised version instead.

Fix: When given a direct Pixiv image URL, use it as-is if it's a full
image URL. If it's a sample image URL, ignore it in favor of the full image
URL as returned by the API, unless the post is deleted and the API data
is unavailable.
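
The decision described in the fix can be summarized by the sketch below. The function and parameter names are hypothetical, and whether a URL is a full or a sample image is passed in as a flag rather than derived from the URL, since the URL formats are not spelled out here.

```python
def choose_image_url(given_url, is_full_image, api_full_url):
    """Pick which Pixiv image URL to upload.

    given_url:      the direct image URL the user submitted
    is_full_image:  True if given_url is a full-size image, False if it is a sample
    api_full_url:   full image URL from the API, or None if the post is
                    deleted and the API data is unavailable
    """
    if is_full_image:
        # Use the given URL as-is, so the original version of a revised image is kept.
        return given_url
    if api_full_url is not None:
        # Prefer the full image over a sample.
        return api_full_url
    # Deleted post: the sample URL is all we have.
    return given_url
```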
2022-08-24 15:40:04 -05:00
evazion
f46134e87f Fix #5234: Weibo URLs get normalized incorrectly in some cases. 2022-08-24 14:47:00 -05:00
evazion
e3af738371 tests: fix broken tests. 2022-08-24 02:03:37 -05:00
evazion
09dfab1f0d hentai foundry: update url for Hentai Foundry tags.
Change the URL used for Hentai Foundry tags from:

    https://www.hentai-foundry.com/search/index?query=elf&search_in=keywords

to:

    https://www.hentai-foundry.com/pictures/tagged/elf
2022-08-24 00:25:37 -05:00
evazion
2c36e02810 foundation.app: fix scraping of image urls.
Foundation changed their HTML page format and we can no longer scrape
the image URL directly from the page. Instead we have to build it based
on API data.
2022-08-24 00:25:37 -05:00
evazion
228850b749 newgrounds: support parsing video urls.
Fixes URLs like `https://www.newgrounds.com/portal/view/830293` being treated as bad_source.
2022-08-23 13:39:32 -05:00
evazion
9c2d362e93 tumblr: fix misparsing of image urls.
Fix URLs like https://yogurtmedia.tumblr.com/post/45732863347 being
misparsed as image urls.
2022-08-20 21:20:46 -05:00
evazion
9cab67c0ac artstation: fix parsing of reserved usernames. 2022-07-06 16:00:54 -05:00
evazion
d7e08d1313 media assets: add ability to search by AI tags.
Add ability to search the /media_assets index by AI tags. Multi-tag
searches are supported, including AND/OR/NOT operators, but metatags
aren't supported. Multi-tag searches will probably be slow.

The default AI tag confidence threshold is 50%. There's a hidden
search[min_score] URL param that lets you change this.
2022-07-06 01:38:41 -05:00
evazion
52ff12dffb dtext: fix wiki links not showing tag type for empty tags. 2022-07-05 15:19:41 -05:00
evazion
9000fa63bc related tags: fix AI tags not showing rating tags.
* Fix the suggested tags list in the related tags box not showing rating tags.
* Fix the suggested tags list showing tags that have been aliased to another tag.
2022-07-02 19:05:12 -05:00
evazion
6386962357 Fix #5225: PG::AmbiguousColumn: ERROR: column reference "bit_prefs" is ambiguous 2022-07-02 17:37:22 -05:00
evazion
0d953e2492 related tags: add AI tags to related tags section.
Add a Suggested tags list to the Related Tags box. The suggested tags
are just the AI tags for the post.

Suggested tags are currently hidden in CSS for beta testing. Use custom
CSS to unhide them.
2022-07-02 05:29:59 -05:00
evazion
d1ace32c40 bigquery: exclude large tables from nightly BigQuery dumps.
Exclude the posts, post_votes, favorites, media_assets, and ai_tags
tables from the BigQuery dumps. These usually take too long to complete
and also consume huge amounts of memory in the background workers.
2022-06-30 21:42:30 -05:00
evazion
a9fe73a483 ai tags: save ai tags on upload.
Save the AI tags when a media asset is uploaded.
2022-06-28 03:12:46 -05:00
evazion
ab7462a42d discord: fixup /tagme command.
Add code left out of 04359d67f4.
2022-06-27 03:57:09 -05:00
evazion
04359d67f4 discord: update /tagme command to use new autotagger service. 2022-06-27 01:40:44 -05:00
evazion
ee57ada33b ai tags: add autotagger API client.
Add API client for https://github.com/danbooru/autotagger service.
2022-06-27 01:09:14 -05:00
evazion
1aeb52186e Add AI tag model and UI.
Add a database model for storing AI-predicted tags, and add a UI for browsing and searching these tags.

AI tags are generated by the Danbooru Autotagger (https://github.com/danbooru/autotagger). See that
repo for details about the model.

The database schema is `ai_tags (media_asset_id integer, tag_id integer, score smallint)`. This is
designed to be as space-efficient as possible, since in production we have over 300 million
AI-generated tags (6 million images and 50 tags per post). This amounts to over 10GB in size, plus
indexes.

You can search for AI tags using e.g. `ai:scenery`. You can do `ai:scenery -scenery` to find posts
where the scenery tag is potentially missing, or `scenery -ai:scenery` to find posts that are
potentially mistagged (or more likely where the AI missed the tag).

You can browse AI tags at https://danbooru.donmai.us/ai_tags. On this page you can filter by
confidence level. You can also search unposted media assets by AI tag.

To generate tags, use the `autotag` script from the Autotagger repo, something like this:

  docker run --rm -v ~/danbooru/public/data/360x360:/images ghcr.io/danbooru/autotagger ./autotag -c -f /images | gzip > tags.csv.gz

To import tags, use the fix script in script/fixes/. Expect a Danbooru-size dataset to take
hours to days to generate tags, then 20-30 minutes to import. Currently this all has to be done by hand.
2022-06-24 04:54:26 -05:00
evazion
ae9495ec7c discord: allow only rating:g posts in SFW channels. 2022-06-08 17:36:49 -05:00
evazion
4364516f2b posts: change safe mode to only allow rating:g posts. 2022-06-06 00:56:52 -05:00
evazion
7149845677 Merge pull request #5202 from nonamethanks/fix-nicoseiga-oekaki-bad-tag
Nicoseiga: normalize oekaki links
2022-06-05 15:56:37 -05:00
evazion
821b97b4e2 Merge pull request #5201 from nonamethanks/fix-deviantart
Deviantart: fix regression in 3a0a32b98a
2022-06-05 15:56:06 -05:00
evazion
05df3a194c posts: add more free metatags.
The following metatags no longer count against the tag search limit:

* is
* id
* date
* age
* filesize
* filetype
* parent
* child
* md5
* width
* height
* duration
* mpixels
* ratio
* score
* upvotes
* downvotes
* favcount
* embedded
* tagcount
* pixiv_id
* pixiv

These are mostly metatags that have to do with properties of the post
itself. Other metatags still count because they involve things like
subqueries or joins, or they're more tag-like in function.
2022-06-05 14:28:44 -05:00
nonamethanks
e7584c7e0a Nicoseiga: normalize oekaki links 2022-06-04 22:57:54 +02:00
nonamethanks
2fd8e9bc14 Deviantart: fix regression in 3a0a32b98a 2022-06-04 20:26:14 +02:00
evazion
572c58d115 Merge pull request #5187 from Hyozen1/patch-1
Allow background-clip and -webkit-background-clip properties in notes
2022-06-01 19:26:06 -05:00
evazion
cc2285ed6f Merge pull request #5188 from nonamethanks/fix-deviantart
Fix deviantart strategy to get biggest available size
2022-06-01 18:39:43 -05:00
evazion
173e43b192 user upgrades: add upgrade code system.
Add a system for upgrading accounts using upgrade codes. Users purchase
an upgrade code off-site then redeem it on-site to upgrade their account
to Gold. Upgrade codes are randomly pre-generated and are one time use
only. Codes have enough randomness that guessing a code is infeasible.
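
A minimal sketch of pre-generated, one-time-use codes is shown below. The alphabet, code length, and class name are assumptions made for illustration. The actual code format is not described here.

```python
import secrets

# Hypothetical format: 16 symbols from a 32-character alphabet, which gives
# 16 * 5 = 80 bits of randomness per code.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
CODE_LENGTH = 16


def generate_code():
    """Generate one random code using a cryptographically secure RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(CODE_LENGTH))


class UpgradeCodes:
    """In-memory stand-in for a table of pre-generated upgrade codes."""

    def __init__(self, count):
        self.unused = {generate_code() for _ in range(count)}

    def redeem(self, code):
        """Redeem a code. Returns True once, then False for any reuse."""
        if code in self.unused:
            self.unused.remove(code)
            return True
        return False
```

Redeeming the same code twice returns `True` the first time and `False` the second, which is the one-time-use property described above.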
2022-06-01 18:31:46 -05:00
Hyozen1
547af83af8 Add background-clip and -webkit complementary properties to notes 2022-06-01 18:55:18 -03:00
nonamethanks
3a0a32b98a Fix deviantart strategy to get biggest available size 2022-05-27 17:07:22 +02:00
evazion
81bd86d202 posts: add "general" rating; rename "safe" rating to "sensitive".
* Add "general" rating.
* Rename "safe" rating to "sensitive".
* Change safe mode to include both rating:s and rating:g.
* Treat rating:safe as a synonym for rating:sensitive.
* Link "howto:rate" in the post edit form.
2022-05-22 13:38:45 -05:00
evazion
d346adabc9 Revert "posts: fix rounding errors in ratio: metatag."
This reverts commit 80ced3e418.

This turned out to be intentional. Rounding the aspect ratio to 2
decimal places is so that searches for exact ratios like `ratio:16:9` or
`ratio:1.78` work even when the ratio doesn't exactly match. Rounding to
2 decimal places means that the ratio: metatag has a 1% error tolerance.
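
The effect of the rounding can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes both the stored ratio and the searched ratio are rounded to 2 decimal places before being compared.

```python
def aspect_ratio(width, height):
    """Aspect ratio of a post, rounded to 2 decimal places."""
    return round(width / height, 2)


def parse_ratio(value):
    """Parse a ratio: search value such as "16:9" or "1.78"."""
    if ":" in value:
        width, height = value.split(":")
        return round(float(width) / float(height), 2)
    return round(float(value), 2)


def ratio_matches(query, width, height):
    """Return True if a post's dimensions satisfy an exact ratio: search."""
    return parse_ratio(query) == aspect_ratio(width, height)
```

For example, a 1366x768 image has a ratio of about 1.7786, which is not exactly 16:9, but `ratio_matches("16:9", 1366, 768)` and `ratio_matches("1.78", 1366, 768)` are both true because every value rounds to 1.78.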
2022-05-22 12:37:26 -05:00