Introduce a Source::URL class for parsing URLs from source sites. Refactor the Twitter
source strategy to use it.
This is the first step towards factoring all the URL parsing logic out of source
strategies and moving it to subclasses of Source::URL. Each site will have a subclass
of Source::URL dedicated to parsing URLs from that site. Source strategies will use
these classes to extract information from URLs.
This is to simplify source strategies. Most sites have many different URL formats we have
to parse or rewrite, and handling all these different cases tends to make source
strategies very complex. Isolating the URL parsing logic from the site scraping logic
should make source strategies easier to maintain.
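A minimal sketch of the intended shape (the dispatch mechanism and the
Twitter patterns here are illustrative assumptions, not the final API):

```ruby
module Source
  class URL
    attr_reader :url

    def initialize(url)
      @url = url
    end

    # Return an instance of the first site-specific subclass that
    # recognizes the URL, or nil if no site matches. (Subclass
    # registration is elided here.)
    def self.parse(url)
      subclass = [Source::URL::Twitter].find { |klass| klass.match?(url) }
      subclass&.new(url)
    end
  end

  class URL::Twitter < URL
    def self.match?(url)
      url.match?(%r{\Ahttps?://(?:www\.|mobile\.)?twitter\.com/}i)
    end

    # "https://twitter.com/user/status/943446161586733056" => "943446161586733056"
    def status_id
      url[%r{/status(?:es)?/(\d+)}, 1]
    end
  end
end
```

The Twitter strategy can then ask the parsed URL for its status id instead
of scattering regexes throughout the strategy itself.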
Introduce a Danbooru::URL class for dealing with URLs. This is a wrapper
around Addressable::URI that adds some additional helper methods. Most
significantly, the `parse` method only allows valid http/https URLs, and
it returns nil instead of raising an exception when the URL is invalid.
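Roughly, the contract looks like this (a sketch; the real validation is
more involved):

```ruby
require "addressable/uri"

module Danbooru
  class URL
    attr_reader :url

    # Raises ArgumentError unless given a valid http/https URL.
    def initialize(url)
      @url = Addressable::URI.parse(url)
      raise ArgumentError unless %w[http https].include?(@url&.scheme)
    rescue Addressable::URI::InvalidURIError
      raise ArgumentError
    end

    # Like new, but returns nil instead of raising on invalid input.
    def self.parse(url)
      new(url)
    rescue ArgumentError
      nil
    end
  end
end

Danbooru::URL.parse("https://example.com/image.jpg") # => #<Danbooru::URL ...>
Danbooru::URL.parse("not a url")                     # => nil
```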
Raise the timeout for downloading files from the source to 60 seconds globally.
Previously we had a lower timeout because uploads were processed in the
foreground when not using the bookmarklet, and we didn't want to tie up
Puma worker processes with slow downloads. Now that all uploads are
processed in the background, we can afford a higher timeout.
Rework the upload process so that files are saved to Danbooru first
before the user starts tagging the upload.
The main user-visible change is that you have to select the file first
before you can start tagging it. Saving the file first lets us fix a
number of problems:
* We can check for dupes before the user tags the upload.
* We can perform dupe checks and show preview images for users not using the bookmarklet.
* We can show preview images without having to proxy images through Danbooru.
* We can show previews of videos and ugoira files.
* We can reliably show the filesize and resolution of the image.
* We can let the user save files to upload later.
* We can get rid of a lot of spaghetti code related to preprocessing
uploads. This was the cause of most of the weird "md5 confirmation
doesn't match md5" errors.
(Not all of these are implemented yet.)
Internally, uploading is now a two-step process: first we create an upload
object, then we create a post from the upload. This is how it works:
* The user goes to /uploads/new and chooses a file or pastes a URL into
the file upload component.
* The file upload component calls `POST /uploads` to create an upload.
* `POST /uploads` immediately returns a new upload object in the `pending` state.
* Danbooru starts processing the upload in a background job (downloading,
resizing, and transferring the image to the image servers).
* The file upload component polls `/uploads/$id.json`, checking the
upload `status` until it becomes `completed` or `error` (see the
polling sketch after this list).
* When the upload status is `completed`, the user is redirected to /uploads/$id.
* On the /uploads/$id page, the user can tag the upload and submit it.
* The upload form calls `POST /posts` to create a new post from the upload.
* The user is redirected to the new post.
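For example, a client could drive the background-processing half of this
flow like so (a sketch; the `error` field name is an assumption):

```ruby
require "json"
require "net/http"

# Poll the upload until Danbooru finishes processing it in the background.
def wait_for_upload(upload_id, base_url: "https://danbooru.donmai.us")
  loop do
    upload = JSON.parse(Net::HTTP.get(URI("#{base_url}/uploads/#{upload_id}.json")))

    case upload["status"]
    when "completed" then return upload   # ready to tag at /uploads/$id
    when "error"     then raise "upload failed: #{upload["error"]}"
    else sleep 1                          # pending/processing; keep polling
    end
  end
end
```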
This is the data model:
* An upload represents a set of files uploaded to Danbooru by a user.
Uploaded files don't have to belong to a post. An upload has an
uploader, a status (pending, processing, completed, or error), a
source (unless uploading from a file), and a list of media assets
(image or video files).
* There is a has-and-belongs-to-many relationship between uploads and
media assets. An upload can have many media assets, and a media asset
can belong to multiple uploads. Uploads are joined to media assets
through an upload_media_assets table.
An upload could potentially have multiple media assets if it's a Pixiv
or Twitter gallery. This is not yet implemented (at the moment all
uploads have one media asset).
A media asset can belong to multiple uploads if multiple people try
to upload the same file, or if the same user tries to upload the same
file more than once.
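In ActiveRecord terms, the data model described above looks roughly like
this (association names are assumptions based on the join table name):

```ruby
class Upload < ApplicationRecord
  belongs_to :uploader, class_name: "User"
  has_many :upload_media_assets
  has_many :media_assets, through: :upload_media_assets
  # status: pending, processing, completed, or error
end

class UploadMediaAsset < ApplicationRecord
  belongs_to :upload
  belongs_to :media_asset
end

class MediaAsset < ApplicationRecord
  has_many :upload_media_assets
  has_many :uploads, through: :upload_media_assets
end
```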
New features:
* On the upload page, you can press Ctrl+V to paste a URL and immediately upload it.
* You can save files for upload later. Your saved files are at /uploads.
Fixes:
* Improved error messages when uploading invalid files, using bad URLs,
or forgetting the rating.
* Make it so replacing a post doesn't generate a dummy upload as a side effect.
* Make it so you can't replace a post with itself (the post should be regenerated instead).
* Refactor uploads and replacements to save the ugoira frame data when
the MediaAsset is created, not when the post is created. This way it's
possible to view the ugoira before the post is created.
* Make `download_file!` in the Pixiv source strategy return a MediaFile
with the ugoira frame data already attached to it, instead of returning
the frame data in the `data` field and then passing it around separately
in the `context` field of the upload (see the sketch after this list).
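The changed contract, roughly (the frame data accessor on MediaFile is an
assumption):

```ruby
# Before: frame data came back separately and rode along in upload.context.
# After: the MediaFile carries its own frame data.
media_file = strategy.download_file!(image_url)
media_file.frame_data # => ugoira frame delays, available before any post exists
```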
Add support for using a proxy for HTTP requests. The proxy is only used
for external requests, such as downloading files or talking to source
sites like Pixiv or Twitter, not for internal requests, such as talking
to IQDB or Reportbooru.
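Since external requests go through http.rb (see below), proxying can use
http.rb's built-in support; a sketch (the env var name is hypothetical):

```ruby
require "http"

proxy = URI.parse(ENV["DANBOORU_HTTP_PROXY"]) # e.g. "http://proxy.example.com:3128"

# External requests (downloads, Pixiv/Twitter APIs) go through the proxy;
# internal requests (IQDB, Reportbooru) use a client without .via.
external_http = HTTP.via(proxy.host, proxy.port)
external_http.get("https://www.pixiv.net/artworks/46324488")
```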
Add site icons linking to all the artist's sites in the fetch source
data box.
Some artist entries have a large number of URLs. Various heuristics are
applied to try to present the most useful URLs first. Dead URLs and
redundant URLs (Pixiv stacc and Twitter intent URLs) are filtered out.
Remaining URLs are sorted first by site (to put sites like Pixiv and
Twitter first), then by URL (to break ties when an artist has multiple
accounts on the same site).
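The ordering amounts to something like this (the priority list and the
helper predicates are illustrative):

```ruby
SITE_PRIORITY = %w[pixiv.net twitter.com] # preferred sites first

icon_urls = artist_urls.reject { |url| dead?(url) || redundant?(url) }
icon_urls.sort_by! do |url|
  [SITE_PRIORITY.index(domain_of(url)) || SITE_PRIORITY.size, url]
end
```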
Some sites have shitty, hard-to-read icons. It can't be helped; the
icons are the official favicons of each site.
* Factor out the Cloudflare Polish bypass code to a standalone feature.
* Add an `http_downloader` method to the base source strategy. This is
an HTTP client that should be used for downloading images or making
other requests to image hosts. This client ensures that Referer
spoofing and the Cloudflare Polish bypass are performed (sketched
below).
This fixes a bug with the upload page reporting the polished filesize
instead of the original filesize when uploading ArtStation images.
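In the strategy, this looks something like the following sketch, where
`http` and `page_url` are the strategy's existing helpers and the Polish
bypass is elided:

```ruby
class Sources::Strategies::Base
  # All requests for image files should go through this client, so that
  # Referer spoofing and the Cloudflare Polish bypass are always applied.
  def http_downloader
    http.headers("Referer" => page_url) # plus the Polish bypass
  end
end
```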
Bug: the uploads page showed a remote size of 146 bytes for Pixiv uploads.
Cause: we didn't spoof the Referer header when making the HEAD request
for the image, causing Pixiv to return a 403 error.
Also fix the case where the Content-Length header is absent.
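The fix amounts to sending the HEAD request through the spoofing client
and tolerating a missing header; in terms of http.rb:

```ruby
require "http"

# Without the Referer, Pixiv serves a 403 error page (the 146-byte body).
def remote_size(image_url)
  res = HTTP.headers("Referer" => "https://www.pixiv.net/").head(image_url)
  res.content_length # nil when Content-Length is absent
end
```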
Remove the Downloads::File class. Move download methods to
Danbooru::Http instead. This means that:
* HTTParty has been replaced with http.rb for downloading files.
* Downloading is no longer tightly coupled to source strategies. Before,
Downloads::File tried to automatically look up the source and download
the full-size image instead if we gave it a sample url. Now we can do
plain downloads without source strategies altering the url (see the
sketch after this list).
* The Cloudflare Polish check has been changed from checking for a
Cloudflare IP to checking for the CF-Polished header. Looking up the
list of Cloudflare IPs was slow and flaky during testing.
* The SSRF protection code has been factored out so it can be used for
normal http requests, not just for downloads.
* The Webmock gem can be removed, since it was only used for stubbing
out certain HTTParty requests in the download tests. Webmock is buggy
and caused certain tests to fail during CI.
* The retriable gem can be removed, since we no longer autoretry failed
downloads. We assume that if a download fails once then retrying
probably won't help.
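A plain download is now just a single call (the helper name is an
assumption):

```ruby
# No source-strategy URL rewriting: we download exactly the URL given.
response, media_file = Danbooru::Http.download_media("https://example.com/file.jpg")
```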
Get rid of `normalized_for_artist_finder?` and `normalizable_for_artist_finder?`.
This was legacy code originally designed to avoid API calls when saving
artist entries containing old Pixiv direct image urls that had already
been normalized, or that couldn't be normalized because they had bad IDs.
Nowadays we store profile urls in artist entries instead of direct image
urls, so we don't normally need to do any API calls to normalize the
profile url. Strategies should take care to avoid triggering API calls
inside `profile_url` when possible.
* Move the source normalization logic out of the post model
and into individual sources' strategies.
* Rewrite the normalization tests to live in each source's own tests,
and expand them significantly. Previously we were only testing a very
small subset of domains and URL variants.
* Fix up normalization for several sites.
* Normalize fav.me urls into normal deviantart urls.
When doing a tag search, we have to be careful about which user we're
running the search as, because the results depend on the current user.
Specifically, things like private favorites, private favorite groups,
post votes, saved searches, and flagger names depend on the user's
permissions, and whether non-safe or deleted posts are filtered out
depends on whether the user has safe mode on or the hide deleted posts
setting enabled.
* Refactor internal searches to explicitly state whether they're
running as the system user (DanbooruBot) or as the current user.
* Explicitly pass in the current user to PostQueryBuilder instead of
implicitly relying on the CurrentUser global (see the sketch after
this list).
* Get rid of CurrentUser.admin_mode? (used to ignore the hide deleted
post setting) and CurrentUser.without_safe_mode (used to ignore safe
mode).
* Change the /counts/posts.json endpoint to ignore safe mode and the
hide deleted posts settings when counting posts.
* Fix searches not correctly overriding the hide deleted posts setting
when multiple status: metatags were used (e.g. `status:banned status:active`).
* Fix fast_count not respecting the hide deleted posts setting when the
status:banned metatag was used.
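After the refactor, call sites spell out who the search runs as; a
sketch (the constructor signature is an assumption):

```ruby
# User-facing search: respects safe mode, hidden deleted posts, privacy.
PostQueryBuilder.new("rating:s status:active", CurrentUser.user).build

# Internal search: runs as the system user (DanbooruBot), ignoring
# per-user visibility settings.
PostQueryBuilder.new("rating:s status:active", User.system).build
```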
* Rename `unique_id` to `tag_name`.
* Add `other_names` and `profile_urls` methods that sources can override
to provide extra names or urls when creating new artist entries.
* When creating an artist by clicking the '?' next to the artist tag in
the tag list, prefill the new artist form by finding the artist's last
upload and fetching its source data.
Previously we filled the urls with the source of the artist's last
upload, which was wrong because it was usually a direct image URL (#3078).
* Fix the other names field not converting spaces within names to underscores.
* Fix the other names field being potentially prefilled with duplicate names.
* Normalize spaces to underscores when saving other names. Preserve case
since case can be significant.
* Fix WikiPage#other_names_include to search case-insensitively (note:
this prevents using the index).
* Fix sources to return the raw tags in `#tags` and the normalized tags
in `#normalized_tags`. The normalized tags are the tags that will be
matched against other names.
Fix sources choosing the wrong strategy when the referer belongs to a
different site (for example, when uploading a Twitter post with a Pixiv
referer).
* Fix `match?` to only consider the main url, not the referer.
* Change `match?` to match against a list of domains given by the `domains` method.
* Change `match?` to an instance method.
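A sketch of the reworked check (the domain list and the `parsed_url`
helper are illustrative):

```ruby
class Sources::Strategies::Twitter < Sources::Strategies::Base
  def domains
    %w[twitter.com twimg.com]
  end

  # An instance method now, and only the main url is consulted: a Pixiv
  # referer on a Twitter post can no longer select the wrong strategy.
  def match?
    domains.include?(parsed_url&.domain)
  end
end
```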
Derive the artist name / profile url / page url from the source URLs when
the API response is unavailable because the Tumblr post was deleted.
This makes the artist finder work on bad_tumblr_id posts.
Allow searching the URL field by regex or by wildcard.
If the query looks like `/twitter/`, do a regex search; if it looks like
`http://www.twitter.com/*`, do a wildcard search; if it looks like a URL,
do an artist finder search; lastly, if it looks like `twitter`, do a
`*twitter*` search (see the sketch below).
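Roughly, the dispatch is (a sketch; the query scopes are illustrative and
escaping of LIKE metacharacters is elided):

```ruby
def self.url_search(query)
  if query =~ %r{\A/(.*)/\z}        # /twitter/ => regex search
    where("url ~ ?", $1)
  elsif query.include?("*")         # http://www.twitter.com/* => wildcard search
    where("url LIKE ?", query.tr("*", "%"))
  elsif query =~ %r{\Ahttps?://}i   # a URL => artist finder search
    url_matches(query)
  else                              # twitter => *twitter* search
    where("url LIKE ?", "%#{query}%")
  end
end
```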
Rename Artist#find_all_by_url to url_matches and drop the previous
url_matches method, along with find_artists and search_for_profile.
Previously, find_artists tried to look up the url, referer url, and
profile url in turn until an artist match was found. This was wasteful,
because the source strategy already knows which url to look up (usually
the profile url). If that url doesn't find a match, then the artist
doesn't exist.
* On the /uploads/new page, instead of just showing a "This post has
probably already been uploaded" message, show the actual thumbnails of
posts with the same source as the file the user is trying to upload.
* Move the iqdb results section up top, beside the related posts section.
Resolve aliases in translated tags. For example, say we look up `遠坂凛`
and find `tohsaka_rin` and `toosaka_rin`. We apply aliases so that
`tohsaka_rin` becomes `toosaka_rin`, which is then returned as the only
translated tag.
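As code, the resolution step is roughly (the `active` scope is an
assumption; antecedent_name/consequent_name are the alias columns):

```ruby
translated = %w[tohsaka_rin toosaka_rin]

resolved = translated.map do |name|
  TagAlias.active.find_by(antecedent_name: name)&.consequent_name || name
end

resolved.uniq # => ["toosaka_rin"]
```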