tumblr: fix exception when fetching data for video urls.

Fix an exception when trying to fetch source data for URLs like
https://va.media.tumblr.com/tumblr_pgohk0TjhS1u7mrsl.mp4.

For these URLs it's not possible to use the trick where we try to open
the URL as an HTML page and scrape the post ID from the HTML. If we try
to do this, we get the raw video back instead, so image_url_html now
returns nil unless the response is text/html.
evazion
2022-09-05 16:12:25 -05:00
parent f55951ab58
commit d2147eca80
3 changed files with 27 additions and 1 deletion

@@ -101,13 +101,16 @@ class Source::Extractor
     end
     def post_url_from_image_html
       return nil unless parsed_url.image_url? && parsed_url.file_ext&.in?(%w[jpg png pnj gif])
       extracted = image_url_html(parsed_url)&.at("[href*='/post/']")&.[](:href)
       Source::URL.parse(extracted)
     end
     memoize :post_url_from_image_html
     def image_url_html(image_url)
       resp = http.cache(1.minute).headers(accept: "text/html").get(image_url)
-      return nil if resp.code != 200
+      return nil if resp.code != 200 || resp.mime_type != "text/html"
       resp.parse
     end
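The patched guard in image_url_html can be exercised in isolation with the minimal Ruby sketch below. It does not use Danbooru's code. `FakeResponse`, `html_body` and `first_post_href` are hypothetical names, and the struct stands in for the response returned by the extractor's `http` client. A regex stands in for the Nokogiri `[href*='/post/']` lookup, so it only approximates real HTML parsing. The video response is assumed to come back as a 200 with a non-HTML MIME type.

```ruby
# Hypothetical stand-in for the HTTP response object; the real code uses the
# extractor's `http` client, whose response exposes #code, #mime_type and #parse.
FakeResponse = Struct.new(:code, :mime_type, :body)

# Mirrors the patched guard: hand back a body to scrape only when the response
# is a 200 *and* is actually HTML. A direct video URL returns video data, so it
# is rejected here instead of reaching the HTML parsing step.
def html_body(resp)
  return nil if resp.code != 200 || resp.mime_type != "text/html"

  resp.body
end

# Hypothetical regex stand-in for the Nokogiri lookup
# `at("[href*='/post/']")&.[](:href)`: the first href that contains "/post/".
# A regex is only an approximation of real HTML parsing.
def first_post_href(html)
  html&.[](/href=["']([^"']*\/post\/[^"']*)["']/, 1)
end

# Assumed responses: an HTML page linking to a post, and a raw video answered
# with status 200 and a video MIME type.
page  = FakeResponse.new(200, "text/html", %(<a href="https://example.tumblr.com/post/123">post</a>))
video = FakeResponse.new(200, "video/mp4", "<raw video bytes>")

p first_post_href(html_body(page))  # prints "https://example.tumblr.com/post/123"
p first_post_href(html_body(video)) # prints nil
```

Checking the MIME type before parsing keeps the "open it as HTML" trick for image URLs that serve a page. Anything that is not HTML short-circuits to nil, so callers see "no post URL found" rather than an error.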