danbooru

Author	SHA1	Message	Date
evazion	a272c19b98	Fix #5078: Pixiv booth upload broken. Allow image URLs from https://booth.pximg.net to be uploaded. Fix bug where Booth.pm URLs were incorrectly caught by the Pixiv extractor.	2022-03-30 03:25:42 -05:00
evazion	04551b8154	autocomplete: replace calls to PostQueryBuilder with PostQuery.	2022-03-30 02:12:25 -05:00
evazion	6edff247f2	search: replace calls to PostQueryBuilder#fast_count with PostQuery#fast_count. Prepare a few more places for the new tag search parser.	2022-03-30 01:37:11 -05:00
evazion	8c9e045a9c	PostQuery::AST: fix #to_infix to not add unnecessary parentheses. * Fix the `#to_infix` method to not add unnecessary parentheses around subexpressions. * Fix metatags to add quotes around values when necessary.	2022-03-30 01:05:08 -05:00
evazion	fb0a7851bf	BURs: fix `mass update A -> B` not being allowed. Fix mass updates of the form `mass update A -> B` not being allowed. This was originally because after `rename` was introduced, we wanted to prevent people from using mass updates to move tags. Now, `mass update A -> B` adds B to all posts tagged A instead of moving A to B. So `mass update A -> B` should no longer be disallowed. This also makes it so that it's an error to create a mass update with a syntax error in the search. Before searches couldn't have syntax errors, but now with the new query parser it's possible.	2022-03-29 21:31:28 -05:00
evazion	226faae8ec	BURs: fix tags field not finding all BURs with that tag. Fix the Tags field in the BUR search form not finding all BURs mentioning that tag. Specifically, tags that were part of a mass update, and that were prefixed with `~` or `-` (OR tags and NOT tags), weren't indexed as tags affected by the BUR. This requires re-running script/fixes/064_initialize_bulk_update_request_tags.rb to fix old BURs.	2022-03-29 21:06:24 -05:00
evazion	823fa5c6e9	search: switch to new tag search parser in a few places.	2022-03-29 18:21:47 -05:00
evazion	4c7cfc73c6	search: add new tag search parser. Add a new tag search parser that supports full boolean expressions, including `and`, `or`, and `not` operators and parenthesized subexpressions. This is only the parser itself, not the code for converting the search into SQL. The new parser isn't used yet for actual searches. Searches still use the old parser. Some example syntax: * `1girl 1boy` * `1girl and 1boy` (same as `1girl 1boy`) * `1girl or 1boy` * `~1girl ~1boy` (same as `1girl or 1boy`) * `1girl and ((blonde_hair blue_eyes) or (red_hair green_eyes))` * `1girl ~(blonde_hair blue_eyes) ~(red_hair green_eyes)` (same as above) * `1girl -(blonde_hair blue_eyes)` * `_hair _eyes` * `_hair or _eyes` * `user:evazion or fav:evazion` * `~user:evazion ~fav:evazion` Rules: AND is implicit between terms, but may be written explicitly: * `a b c` is `a and b and c` AND has higher precedence (binds tighter) than OR: * `a or b and c or d` is `a or (b and c) or d` * `a or b c or d e` is `a or (b and c) or (d and e)` All `~` operators in the same subexpression are combined into a single OR: * `a b ~c ~d` is `a b (c or d)` * `~a ~b and ~c ~d` is `(a or b) (c or d)` * `(~a ~b) (~c ~d)` is `(a or b) (c or d)` A single `~` operator in a subexpression by itself is ignored: * `a ~b` is `a b` * `~a and ~b` is `a and b`, which is `a b` * `(~a) ~b` is `a ~b`, which is `a b` The parser is written as a backtracking recursive descent parser built on top of StringScanner and a handful of parser combinators. The parser generates an AST, which is then simplified using Boolean algebra to remove redundant nodes and to convert the expression to conjunctive normal form (that is, a product of sums, or an AND of ORs).	2022-03-29 18:21:46 -05:00
evazion	231075fb49	artists: fix artist finder to return nothing if it finds too many duplicates	2022-03-26 15:08:55 -05:00
evazion	44903abe28	BURs: change mass updates to not remove left-hand side tags. Change mass updates to not automatically remove the left-hand side tags from the post. This won't work with full boolean searches in the future and already doesn't work with complex searches involving metatags or OR-tags.	2022-03-26 02:01:04 -05:00
evazion	dd21d4b45c	tags: don't allow tags to start with '(' or '['. Also don't allow the words 'and', 'or', and 'not'. Related to #4949.	2022-03-26 00:38:34 -05:00
evazion	d9d3c1dfe4	sources: rename Sources::Strategies to Source::Extractor. Rename Sources::Strategies to Source::Extractor. A Source::Extractor represents a thing that extracts information from a given URL.	2022-03-24 03:49:44 -05:00
evazion	34aa22f90b	sources: fix fandom.com page urls. Fix it so that sources like this: * https://vignette.wikia.nocookie.net/valkyriecrusade/images/c/c5/Crimson_Hatsune_H.png/revision/latest?cb=20180702031954 link to this: * https://valkyriecrusade.fandom.com/?file=Crimson_Hatsune_H.png instead of this * https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png The `/wiki/File:$name` URL redirects to whatever wiki page contains the image instead of showing the file itself.	2022-03-23 23:38:06 -05:00
evazion	5941c47b79	nicoseiga: support a few more url types.	2022-03-23 23:38:06 -05:00
evazion	c07c5ea594	nicoseiga: fix page_url method not to return seiga.nicovideo.jp/image/source/:id urls. Fix the page_url method not to return URLs like this: https://seiga.nicovideo.jp/image/source/8017978 (page: https://seiga.nicovideo.jp/watch/mg310193) These are direct image URLs, not page URLs. It's not generally possible to get to the page URL from an image URL like this. This fixes it so that we don't incorrectly set the source of NicoSeiga uploads to the image URL.	2022-03-23 23:38:06 -05:00
evazion	4ef8178bd1	sources: remove `canonical_url` method. Refactor source strategies to remove the `canonical_url` method. `canonical_url` returned the URL that should be used as the source of the post after upload. Now we simply use `Source::URL#page_url` to determine the source after upload. If the source is an image URL that is convertible to a page URL, then the page URL is used as the source. If the source is an image URL that is not convertible to a page URL, then the image URL is used as the source. This simplifies source strategies so that all they have to care about is implementing the `Source::URL#page_url` and `Sources::Strategies#page_url` methods, and the preferred source will be chosen for posts automatically.	2022-03-23 23:38:06 -05:00
evazion	fbca31d29e	artists: add more artist url icons.	2022-03-23 02:59:22 -05:00
evazion	c51d1a6f5e	artists: add more sites to artist finder blacklist.	2022-03-23 02:30:52 -05:00
evazion	3aa5cab2aa	sources: refactor normalize_for_source. `normalize_for_source` was used to convert image URLs to page URLs when displaying sources on the post show page. Move all the code for converting image URLs to page URLs from `Sources::Strategies#normalize_for_source` to `Source::URL#page_url`. Before we had to be very careful in source strategies not to make any network calls in `normalize_for_source`, since it was used in the view for the post show page. Now all the code for generating page URLs is isolated in Source::URL, which makes source strategies simpler. It also makes it easier to check if a source is an image URL or page URL, and if the image URL is convertible to a page URL, which will make autotagging bad_link or bad_source feasible. Finally, this fixes it to generate better page URLs in a handful of cases: * https://www.artstation.com/artwork/qPVGP instead of https://anubis1982918.artstation.com/projects/qPVGP * https://yande.re/post/show?md5=b4b1d11facd1700544554e4805d47bb6s instead of https://yande.re/post?tags=md5:b4b1d11facd1700544554e4805d47bb6 * http://gallery.minitokyo.net/view/365677 instead of http://gallery.minitokyo.net/download/365677 * https://valkyriecrusade.fandom.com/wiki/File:Crimson_Hatsune_H.png instead of https://valkyriecrusade.wikia.com/wiki/File:Crimson_Hatsune_H.png * https://rule34.paheal.net/post/view/852405 instead of https://rule34.paheal.net/post/list/md5:854806addcd3b1246424e7cea49afe31/1	2022-03-23 01:34:04 -05:00
evazion	770f850c66	instagram: add a couple more url types.	2022-03-22 04:35:50 -05:00
evazion	1e6e519709	artists: add more sites to artist finder blacklist.	2022-03-22 04:27:46 -05:00
evazion	dd764bb4cf	Merge pull request #5062 from NamelessContributor/fix-naver-duplicate artists: add naver.com to artist finder blacklist.	2022-03-22 03:41:06 -05:00
evazion	452ce8d165	artstation: add partial support for video clips (#5063). Add partial support for fetching videos from ArtStation posts that contain videos. Most of this code is disabled for now because actually downloading these videos requires bypassing a Cloudflare captcha.	2022-03-21 16:51:42 -05:00
NamelessContributor	038c767455	artists: add naver.com to artist finder blacklist.	2022-03-21 17:34:44 +01:00
evazion	56f47c60e1	posts: fix exception when viewing post with source `Blog.`. Fix a PublicSuffix::DomainNotAllowed exception raised when viewing or editing a post with a source like `Blog.`. This happened when parsing the post's source. `Danbooru::URL.parse("Blog.")` would heuristically parse the source into `http://blog`. Calling any methods related to the URL's hostname or domain would lead to calling `PublicSuffix.parse("blog")`, which would fail with PublicSuffix::DomainNotAllowed.	2022-03-21 03:24:50 -05:00
evazion	defea08084	posts: fix exception in random:1 searches. Fix regression in `1ad0e8688`. Caused by `relation.order_values` returning an array of Arel nodes instead of an array of strings when doing a `random:1` search.	2022-03-21 01:29:10 -05:00
evazion	f52dc9e2ad	tinami: fix 'http already memoized' warning.	2022-03-21 00:37:55 -05:00
evazion	7c887f8adc	artists: fix exception when adding TwitPic urls.	2022-03-20 21:56:38 -05:00
evazion	705edfb175	artists: add more patterns to artist finder blacklist.	2022-03-20 21:27:38 -05:00
evazion	7394660ba9	posts: fix exception when post has source like 'https://www.twitter.com/username '. `twitter.com` sources worked but `www.twitter.com` didn't. Also match the URL by class instead of by site name to ensure we match the expected class.	2022-03-20 21:08:05 -05:00
evazion	1ad0e8688d	posts: fix timeouts for searches using sequential navigation. Fix certain searches timing out when using sequential navigation (page=b1234). The problem was that the so-called "small search optimization" (AKA: force Postgres to use the tag index for small searches instead of a sequential scan) wasn't triggering because the ORDER BY clause for sequential navigation was `posts.id desc`, and we were only checking for `posts.id DESC`.	2022-03-20 18:46:06 -05:00
evazion	71f42d67a7	tinami: return nothing if getting the full image fails. Fix to make sure `image_urls` returns an empty array instead of `[nil]` if grabbing the full image URL fails for whatever reason.	2022-03-19 23:42:34 -05:00
evazion	7f58cfbe5e	tinami: get the full image. Support grabbing the full image for Tinami uploads, rather than the sample. Getting the full image requires making a request like this: curl -X POST \ -H 'Referer: https://www.tinami.com/' \ -H 'Content-Type: application/x-www-form-urlencoded' \ -H 'Cookie: Tinami2SESSID=<redacted>;' \ --data-raw 'action_view_original=true&cont_id=1087268&ethna_csrf=<redacted>' \ https://www.tinami.com/view/1087268 Then scraping the <img> tag from the resulting HTML page. If the post has multiple images, then we need to scrape and pass the `sub_id` of the image too. Fixes #2818.	2022-03-19 23:22:09 -05:00
evazion	01b683798e	sources: add Tinami support.	2022-03-19 00:50:36 -05:00
evazion	40cbc0423c	sources: add Instagram profile url normalization.	2022-03-18 18:20:29 -05:00
evazion	c6e528a073	artstation: normalize `https://artstation.com/artist/username/albums/all/` urls.	2022-03-18 17:10:26 -05:00
evazion	cc54b5f730	fanbox: normalize http://www.pixiv.net/fanbox/creator/3113804/post urls.	2022-03-18 17:10:26 -05:00
evazion	26d23c49d0	pawoo: normalize https://pawoo.net/users/evazion urls.	2022-03-18 17:10:26 -05:00
evazion	a78b6528dc	weibo: fix https://m.weibo.cn/detail/1234 urls not finding the artist. Fix https://m.weibo.cn/detail/4506950043618873 type URLs not finding the artist because the profile_url method returned nil instead of the actual profile URL.	2022-03-18 06:01:51 -05:00
evazion	03d2098d6d	artists: fix artist finder returning wrong results when given nil url. Fix the artist finder returning incorrect results when given a nil URL. This only happened when an artist with a URL like this existed: http:///blog.naver.com/dan_rak Note the triple `///`; the extra `/` messed up the artist finder. The artist finder may be given a nil URL when a source strategy returns a nil profile URL, usually because the source is bad_id.	2022-03-18 06:01:36 -05:00
evazion	42144eaa4b	Fix #5012: Fc2 image link paste not uploading. Fix referer spoofing not working for certain fc2.com image URLs. Spoofing the referer like this redirects to an HTML error page: * curl -H "Referer: http://wwwew.web.fc2.com" http://wwwew.web.fc2.com/e/405.jpg Spoofing it like this works: * curl -H "Referer: http://wwwew.web.fc2.com/e/405.jpg" http://wwwew.web.fc2.com/e/405.jpg	2022-03-18 04:39:13 -05:00
evazion	c64df46de4	artists: make artist finder use `url` instead of `normalized_url`. Make the artist finder search for artists using the `url` field instead of the `normalized_url` field. This lets us get rid of `normalized_url` in the future. As described in `10dac3ee5`, artist URLs have both a `url` column and a `normalized_url` column. The `normalized_url` column was the one used for artist finding. The `url` was secretly normalized behind the scenes so that artist finding would work no matter how the URL was written in the artist entry. This is no longer necessary now that URLs are directly normalized in artist entries. This fixes various cases where artist finding didn't work for non-obvious reasons, usually because the URL wasn't written in the right format so it wasn't properly normalized behind the scenes. This also makes it so that artist finding is case-insensitive, which fixes #4821. Hopefully no sites are perverse enough to allow two different usernames that differ only in case. Users running their own Danbooru instance may have to fix the URLs in their artist entries for artist finding to work again. There are a few fix scripts to help with this: * script/fixes/104_normalize_weibo_artist_urls.rb * script/fixes/105_normalize_pixiv_artist_urls.rb * script/fixes/106_normalize_artist_urls.rb	2022-03-18 04:00:16 -05:00
evazion	10dac3ee51	artists: normalize urls added to artist entries. When a URL is added to an artist entry, normalize it to a standard form. Artist URLs have both a `url` column and a `normalized_url` column. The `normalized_url` is used for artist finding and the `url` is the raw URL entered by the user. Previously only the `normalized_url` field was normalized; now the URL entered by the user is also converted to a normalized form. This means that if a URL like this is added to an artist entry: * http://www.pixiv.net/member.php?id=1234 * http://www.pixiv.net/en/users/1234 * http://www.twitter.com/DanbooruBot/ * http://mobile.twitter.com/DanbooruBot/ It will get normalized to this: * https://www.pixiv.net/users/1234 * https://twitter.com/DanbooruBot This fixes problems with duplicate URLs being added to artist entries because URLs weren't normalized to a single form.	2022-03-18 02:06:50 -05:00
evazion	455ee9a52a	fc2: parse more url types.	2022-03-18 02:06:30 -05:00
evazion	03d2a86ef1	artists: normalize fc2.com profile urls.	2022-03-17 19:42:57 -05:00
evazion	ded03df1ff	tests: fix more broken tests.	2022-03-15 05:14:56 -05:00
evazion	644dfaf74c	tests: fix broken tests.	2022-03-15 04:45:30 -05:00
evazion	133c45ee29	sources: parse more profile url formats. Add support for parsing these URL formats: * https://www.artstation.com/felipecartin/profile * https://www.deviantart.com/nlpsllp/gallery * https://fantia.jp/asanagi * https://www.lofter.com/front/blog/home-page/noshiqian * https://www.lofter.com/app/xiaokonggedmx * https://www.lofter.com/blog/semblance * https://q.nicovideo.jp/users/18700356 * https://dic.nicovideo.jp/u/11141663 * https://3d.nicovideo.jp/users/109584 * https://3d.nicovideo.jp/u/siobi * https://game.nicovideo.jp/atsumaru/users/7757217 * https://www.pixiv.net/user/13569921/series/81967 * https://pixiv.cc/zerousagi/ * https://www.plurk.com/u/ddks2923 * https://www.plurk.com/m/u/leiy1225 * https://www.plurk.com/s/u/salmonroe13 * https://www.plurk.com/RSSSww/invite/4 * https://skeb.jp/@okku_oxn/works * https://www.tumblr.com/blog/view/artofelaineho/187614935612 * https://www.tumblr.com/blog/view/artofelaineho * https://www.tumblr.com/blog/artofelaineho * https://www.tumblr.com/dashboard/blog/dankwartart * https://rosarrie.tumblr.com/archive * https://whereisnovember.tumblr.com/tagged/art * https://twitpic.com/photos/Type10TK * https://www.weibo.com/detail/4676597657371957 * https://www.weibo.com/u/5957640693/home?wvr=5 * https://www.weibo.com/lvxiuzi0/home	2022-03-15 00:49:54 -05:00
evazion	1d9a15a119	weibo: handle a couple more profile url types. Parse these profile URL types: * https://www.weibo.cn/endlessnsmt * https://www.weibo.com/p/1005055399876326 Also add anchors around the regexes so they have to match the full string.	2022-03-13 20:32:57 -05:00
evazion	be9ef0c49f	artists: add m.weibo.cn urls to artist finder blacklist.	2022-03-13 03:54:17 -05:00
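
As a rough illustration of the precedence rules listed in commit 4c7cfc73c6 (implicit AND between terms, AND binding tighter than OR, parenthesized subexpressions), the sketch below parses a small subset of the syntax with StringScanner. It is not the PostQuery parser: `MiniQueryParser` and its nested-array AST are hypothetical names invented here, it is a plain recursive descent parser without backtracking or combinators, and it deliberately ignores `~`, `-`, `not`, metatags, quoting, and the conversion to conjunctive normal form.

```ruby
require "strscan"

# Hypothetical mini-parser (not Danbooru's PostQuery parser).
# Supports only bare tags, `and`, `or`, and parentheses.
class MiniQueryParser
  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  # Returns a tag string, or a nested array such as [:or, "a", [:and, "b", "c"]].
  def parse
    expr = or_expr
    skip_space
    raise ArgumentError, "unexpected input at position #{@scanner.pos}" unless @scanner.eos?
    expr
  end

  private

  def skip_space
    @scanner.skip(/\s+/)
  end

  # Consume a keyword only when it stands alone (so `orange` stays a tag).
  def keyword(word)
    skip_space
    !@scanner.scan(/#{word}(?=[\s()]|\z)/i).nil?
  end

  # True if another term could start here (not end of input, `)`, or `or`).
  def term_ahead?
    skip_space
    !@scanner.eos? && !@scanner.check(/\)/) && !@scanner.check(/or(?=[\s()]|\z)/i)
  end

  # or_expr := and_expr ("or" and_expr)*   (lowest precedence)
  def or_expr
    terms = [and_expr]
    terms << and_expr while keyword("or")
    terms.size == 1 ? terms.first : [:or, *terms]
  end

  # and_expr := term (["and"] term)*   (AND is implicit between terms)
  def and_expr
    terms = [term]
    loop do
      explicit = keyword("and")
      break unless explicit || term_ahead?
      terms << term
    end
    terms.size == 1 ? terms.first : [:and, *terms]
  end

  # term := "(" or_expr ")" | tag
  def term
    skip_space
    if @scanner.scan(/\(/)
      expr = or_expr
      skip_space
      @scanner.scan(/\)/) or raise ArgumentError, "missing ')'"
      expr
    elsif (tag = @scanner.scan(/[^\s()]+/))
      tag
    else
      raise ArgumentError, "expected a tag at position #{@scanner.pos}"
    end
  end
end

p MiniQueryParser.new("a or b and c or d").parse
p MiniQueryParser.new("a or b c or d e").parse
```

Putting the OR rule above the AND rule in the grammar is what makes AND bind tighter: each operand of `or` is itself a complete AND-chain, so `a or b and c or d` groups as `a or (b and c) or d`, as the commit message describes.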

2706 Commits
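
The artist URL normalization described in commit 10dac3ee51 can be pictured with a small sketch. The method name `normalize_profile_url` and the matching rules are assumptions made for illustration, covering only the four example inputs from that commit message. Danbooru's real logic lives in its `Source::URL` classes and handles many more sites and formats.

```ruby
require "uri"

# Hypothetical helper (not Danbooru's implementation): maps a few known
# Pixiv and Twitter profile URL variants to one canonical form, and returns
# any unrecognized URL unchanged.
def normalize_profile_url(raw)
  uri = URI.parse(raw)
  host = uri.host.to_s.downcase
  path = uri.path.to_s

  case host
  when "www.pixiv.net", "pixiv.net"
    # /users/1234 and /en/users/1234
    id = path[%r{\A/(?:en/)?users/(\d+)/?\z}, 1]
    # legacy /member.php?id=1234
    id ||= URI.decode_www_form(uri.query.to_s).to_h["id"] if path == "/member.php"
    id ? "https://www.pixiv.net/users/#{id}" : raw
  when "twitter.com", "www.twitter.com", "mobile.twitter.com"
    name = path[%r{\A/(\w+)/?\z}, 1]
    name ? "https://twitter.com/#{name}" : raw
  else
    raw
  end
end

puts normalize_profile_url("http://www.pixiv.net/member.php?id=1234")
puts normalize_profile_url("http://mobile.twitter.com/DanbooruBot/")
```

Because every variant collapses to the same string, a simple equality or uniqueness check on the stored URL is enough to catch the duplicate entries that the commit message mentions.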