Export public database dumps to BigQuery.

* Export daily public database dumps to BigQuery and Google Cloud Storage.
* Only data visible to anonymous users is exported. Some tables have
  null or missing fields because of this.
* The bans table is excluded because some bans have an expires_at
  timestamp set beyond year 9999, which BigQuery doesn't support.
* The favorites table is excluded because it's too slow to dump (it
  doesn't have an id index, which is needed by find_each).
* Version tables are excluded because dumping them every day is
  inefficient; streaming insertions should be used instead.
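
The exclusion rules above can be sketched in plain Ruby. This is an illustration only, not the code added by this commit: `EXCLUDED_TABLES`, `exportable_table?`, and `bigquery_timestamp?` are hypothetical names, and identifying version tables by a `_versions` suffix is an assumption. The range constants follow BigQuery's documented TIMESTAMP limits (0001-01-01 through 9999-12-31 UTC).

```ruby
# BigQuery's documented TIMESTAMP range, in UTC.
BIGQUERY_MIN_TIME = Time.utc(1, 1, 1)
BIGQUERY_MAX_TIME = Time.utc(9999, 12, 31, 23, 59, 59, 999_999)

# Hypothetical exclusion list mirroring the bullets above.
EXCLUDED_TABLES = {
  "bans"      => "expires_at can be set beyond year 9999",
  "favorites" => "no id index, so batched dumping is too slow",
}.freeze

# Assumption: version tables are recognizable by a "_versions" suffix.
def exportable_table?(name)
  !EXCLUDED_TABLES.key?(name) && !name.end_with?("_versions")
end

# True when the time fits in a BigQuery TIMESTAMP column.
def bigquery_timestamp?(time)
  time.between?(BIGQUERY_MIN_TIME, BIGQUERY_MAX_TIME)
end
```

A check like `bigquery_timestamp?` shows why the bans table is a problem: a ban expiring at `Time.utc(10000, 1, 1)` falls outside the range, so the row could not be loaded as-is.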

Links:

* https://console.cloud.google.com/bigquery?project=danbooru1
* https://console.cloud.google.com/storage/browser/danbooru_public
* https://storage.googleapis.com/danbooru_public/data/posts.json
evazion
2021-03-10 01:31:32 -06:00
parent 5623cfb145
commit f235b72b3f
8 changed files with 200 additions and 0 deletions

@@ -15,6 +15,7 @@ module DanbooruMaintenance
     safely { BulkUpdateRequestPruner.warn_old }
     safely { BulkUpdateRequestPruner.reject_expired }
     safely { Ban.prune! }
+    safely { BigqueryExportService.async_export_all! }
     safely { ActiveRecord::Base.connection.execute("vacuum analyze") unless Rails.env.test? }
   end
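
The hunk does not show how `safely` is defined. Judging by how it wraps each maintenance task, it presumably keeps one failing task from aborting the rest. The following is a minimal sketch under that assumption; this `safely` is a hypothetical stand-in that collects errors in an array, not the project's actual helper.

```ruby
# Hypothetical stand-in for the `safely` helper used in the hunk above.
# Runs the block and records any StandardError instead of propagating it.
def safely(errors = [])
  yield
rescue StandardError => e
  errors << e
  nil
end

errors = []
ran = []
safely(errors) { ran << :prune }
safely(errors) { raise "export failed" } # recorded, does not stop later tasks
safely(errors) { ran << :vacuum }
```

With a wrapper like this, a failure while queueing the export would still let the `vacuum analyze` step run.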