Export public database dumps to BigQuery.

* Export daily public database dumps to BigQuery and Google Cloud Storage
  (see the sketch after this list).
* Only data visible to anonymous users is exported. Some tables have
  null or missing fields because of this.
* The bans table is excluded because some bans have an expires_at
  timestamp set beyond year 9999, which BigQuery doesn't support.
* The favorites table is excluded because it's too slow to dump (it
  doesn't have an id index, which is needed by find_each).
* Version tables are excluded because dumping them in full every day is
  inefficient; streaming inserts should be used for them instead.
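
A rough sketch of the export flow for a single table, assuming the
google-cloud-storage and google-cloud-bigquery gems and the
google_cloud_credentials setting added in this commit. The bucket and object
names come from the links below; the BigQuery dataset name and the use of a
tempfile are illustrative guesses, not the exact code:

    require "tempfile"
    require "google/cloud/storage"
    require "google/cloud/bigquery"

    # May be a keyfile path or a parsed key hash, depending on the config.
    credentials = Danbooru.config.google_cloud_credentials

    # Dump the table as newline-delimited JSON. Only data visible to
    # anonymous users should be serialized here.
    file = Tempfile.new(["posts", ".json"])
    Post.find_each { |post| file.puts(post.to_json) }
    file.flush

    # Upload the dump to Google Cloud Storage.
    storage = Google::Cloud::Storage.new(credentials: credentials)
    storage.bucket("danbooru_public").create_file(file.path, "data/posts.json")

    # Load the same dump from GCS into BigQuery as newline-delimited JSON.
    bigquery = Google::Cloud::Bigquery.new(credentials: credentials)
    dataset = bigquery.dataset("danbooru", skip_lookup: true)
    dataset.load("posts", "gs://danbooru_public/data/posts.json",
                 format: "json", autodetect: true)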

Links:

* https://console.cloud.google.com/bigquery?project=danbooru1
* https://console.cloud.google.com/storage/browser/danbooru_public
* https://storage.googleapis.com/danbooru_public/data/posts.json
evazion
2021-03-10 01:31:32 -06:00
parent 5623cfb145
commit f235b72b3f
8 changed files with 200 additions and 0 deletions

@@ -512,6 +512,18 @@ module Danbooru
def cloudflare_zone
end

# Google Cloud API key. Used for exporting data to BigQuery and to Google
# Cloud Storage. Should be the JSON key object you get after creating a
# service account. Must have the "BigQuery User" and "Storage Admin" roles.
#
# * Go to https://console.cloud.google.com/iam-admin/serviceaccounts and create a service account.
# * Go to "Keys" and add a new key.
# * Go to https://console.cloud.google.com/iam-admin/iam and add the
# BigQuery User and Storage Admin roles to the service account.
# * Paste the JSON key file here.
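#
# A hedged sketch of what pasting the key might look like (all values below
# are placeholders, not real credentials; the key could also be read from a
# file in a local config instead of being pasted inline):
#
#   def google_cloud_credentials
#     {
#       "type" => "service_account",
#       "project_id" => "danbooru1",
#       "private_key_id" => "...",
#       "private_key" => "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
#       "client_email" => "exporter@danbooru1.iam.gserviceaccount.com",
#       "client_id" => "...",
#     }
#   end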
def google_cloud_credentials
end

# The URL for the recommender server (https://github.com/evazion/recommender).
# Optional. Used to generate post recommendations.
# Set to http://localhost/mock/recommender to enable a fake recommender