Export public database dumps to BigQuery.

* Export daily public database dumps to BigQuery and Google Cloud Storage.
* Only data visible to anonymous users is exported. Some tables have
  null or missing fields because of this.
* The bans table is excluded because some bans have an expires_at
  timestamp set beyond year 9999, which BigQuery doesn't support.
* The favorites table is excluded because it's too slow to dump (it
  doesn't have an id index, which is needed by find_each).
* Version tables are excluded because dumping them in full every day is
  inefficient; streaming inserts should be used for them instead.
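
For illustration only, below is a minimal sketch of how a per-table dump could
work using ActiveRecord's find_each together with the google-cloud-storage and
google-cloud-bigquery gems. The class name, column choices, dataset name, and
file layout are assumptions made up for the example; this is not the code added
by this commit.

    require "google/cloud/bigquery"
    require "google/cloud/storage"

    # Hypothetical sketch, not the export service added by this commit.
    class PublicDataExportSketch
      BUCKET_NAME  = "danbooru_public"   # bucket name taken from the links above
      DATASET_NAME = "danbooru_public"   # assumed dataset name

      # Dump only the given (anonymous-visible) columns of a model to
      # newline-delimited JSON, upload it to GCS, then load it into BigQuery.
      def export!(model, columns:)
        path = "/tmp/#{model.table_name}.json"

        # find_each batches rows by primary key, which is why a table without
        # an id index (e.g. favorites) is too slow to dump this way.
        File.open(path, "w") do |file|
          model.find_each do |record|
            file.puts(record.as_json(only: columns).to_json)
          end
        end

        # Upload the dump to Google Cloud Storage.
        storage = Google::Cloud::Storage.new
        storage.bucket(BUCKET_NAME).create_file(path, "data/#{model.table_name}.json")

        # Replace the previous day's BigQuery table with today's dump.
        bigquery = Google::Cloud::Bigquery.new
        dataset = bigquery.dataset(DATASET_NAME, skip_lookup: true)
        dataset.load(model.table_name, File.open(path),
                     format: "json", write: "truncate", autodetect: true)
      end
    end

    # Example: PublicDataExportSketch.new.export!(Post, columns: %w[id created_at rating tag_string])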

Links:

* https://console.cloud.google.com/bigquery?project=danbooru1
* https://console.cloud.google.com/storage/browser/danbooru_public
* https://storage.googleapis.com/danbooru_public/data/posts.json
Author: evazion
Date: 2021-03-10 01:31:32 -06:00
Commit: f235b72b3f (parent 5623cfb145)
8 changed files with 200 additions and 0 deletions


@@ -49,6 +49,7 @@ jobs:
 DANBOORU_RAKISMET_KEY: ${{ secrets.DANBOORU_RAKISMET_KEY }}
 DANBOORU_RAKISMET_URL: ${{ secrets.DANBOORU_RAKISMET_URL }}
 DANBOORU_IP_REGISTRY_API_KEY: ${{ secrets.DANBOORU_IP_REGISTRY_API_KEY }}
+DANBOORU_GOOGLE_CLOUD_CREDENTIALS: ${{ secrets.DANBOORU_GOOGLE_CLOUD_CREDENTIALS }}
 DANBOORU_STRIPE_SECRET_KEY: ${{ secrets.DANBOORU_STRIPE_SECRET_KEY }}
 DANBOORU_STRIPE_PUBLISHABLE_KEY: ${{ secrets.DANBOORU_STRIPE_PUBLISHABLE_KEY }}
 DANBOORU_STRIPE_WEBHOOK_SECRET: ${{ secrets.DANBOORU_STRIPE_WEBHOOK_SECRET }}
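
The workflow change above only forwards the new secret as an environment
variable. A plausible (but assumed) way for the application to consume it is to
parse the service account keyfile JSON out of the env var and pass it to the
client constructor; the project id "danbooru1" comes from the BigQuery console
link above, everything else is an assumption.

    require "json"
    require "google/cloud/bigquery"

    # Assumption: the secret holds a service account JSON keyfile.
    keyfile = JSON.parse(ENV.fetch("DANBOORU_GOOGLE_CLOUD_CREDENTIALS"))

    # Build a BigQuery client for the danbooru1 project using those credentials.
    bigquery = Google::Cloud::Bigquery.new(project_id: "danbooru1", credentials: keyfile)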