Exclude the posts, post_votes, favorites, media_assets, and ai_tags
tables from the BigQuery dumps. Dumping these tables usually takes too
long to complete and consumes large amounts of memory in the background
workers.
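A minimal sketch of what the exclusion could look like in the dump job; the
constant name and the `dump_table` helper are illustrative assumptions, not
the actual Danbooru code:

```ruby
# Hypothetical sketch: skip the heavy tables when walking the schema.
EXCLUDED_TABLES = %w[posts post_votes favorites media_assets ai_tags]

def dump_tables
  (ActiveRecord::Base.connection.tables - EXCLUDED_TABLES).each do |table|
    dump_table(table) # illustrative per-table dump helper
  end
end
```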
Fix BigQuery export jobs failing with:
Google::Cloud::InvalidArgumentError: required: Bucket is requester pays
bucket but no user project provided.
This was caused by switching the storage bucket to requester pays. On
requester pays buckets, the `user_project` param must be set to true so
that usage is billed to the current project.
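A minimal sketch of the fix using the google-cloud-storage gem; the exact
call site and the upload path are assumptions for illustration:

```ruby
require "google/cloud/storage"

# On a requester pays bucket, pass user_project: true so that access is
# billed to the current project instead of raising InvalidArgumentError.
storage = Google::Cloud::Storage.new
bucket = storage.bucket("danbooru_public", user_project: true)

# Subsequent operations on the bucket (uploads, listings, downloads) now
# include the project to bill.
bucket.create_file("/tmp/posts.json", "data/posts.json")
```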
The favorites table is too big, so dumping it tends to time out. The job
keeps retrying even though it always fails, and multiple instances of the
job build up in the job queue because the old jobs never finish.
Include the favorites table in the nightly database dumps in BigQuery.
Previously we couldn't do this because the table didn't have an index on
the favorite ID, which is needed to iterate over the table efficiently.
Note that this doesn't include private favorites. Note also that if a
user switches their favorites from private to public, then their
favorites will begin to appear in these dumps.
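A rough sketch of the batched iteration this index enables; the
`public_only` scope and `write_row` helper are hypothetical names, not the
actual Danbooru code:

```ruby
# find_each walks the table in batches ordered by the primary key, so it
# needs the index on favorites.id to avoid re-scanning the whole table for
# each batch.
Favorite.public_only.find_each(batch_size: 5000) do |favorite|
  write_row(favorite) # illustrative: append one row to the dump file
end
```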
Changes:
* Change the `expires_at` field to `duration`.
* Make moderators choose from a fixed set of standard ban lengths,
instead of allowing arbitrary ban lengths.
* List `duration` in seconds in the /bans.json API.
* Dump bans to BigQuery.
Note that some old bans have a negative duration. Their expiration date is
earlier than their creation date because the original ban creation dates
were lost when bans were migrated to Danbooru 2 in 2013.
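A hedged sketch of how the old `expires_at` values could be backfilled into
the new `duration` field; this is illustrative, not the actual migration:

```ruby
# Illustrative backfill: store the ban length in seconds. Bans whose
# expires_at predates created_at (the 2013 migration artifacts mentioned
# above) come out with a negative duration.
Ban.find_each do |ban|
  ban.update_column(:duration, (ban.expires_at - ban.created_at).to_i)
end
```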
* Export daily public database dumps to BigQuery and Google Cloud Storage.
* Only data visible to anonymous users is exported. Some tables have
null or missing fields because of this.
* The bans table is excluded because some bans have an expires_at
timestamp set beyond year 9999, which BigQuery doesn't support.
* The favorites table is excluded because it's too slow to dump (it
doesn't have an id index, which is needed by find_each).
* Version tables are excluded because dumping them every day is
inefficient; streaming inserts should be used instead.
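A rough sketch of the load path using the google-cloud-bigquery gem; the
dataset name, table name, and file layout are placeholders based on the
links below, not the actual export job:

```ruby
require "google/cloud/bigquery"

# Illustrative: load a newline-delimited JSON dump from Cloud Storage into
# BigQuery, replacing the previous day's table.
bigquery = Google::Cloud::Bigquery.new(project_id: "danbooru1")
dataset  = bigquery.dataset("danbooru_public") || bigquery.create_dataset("danbooru_public")

job = dataset.load_job(
  "posts",
  "gs://danbooru_public/data/posts.json",
  format: "json",     # newline-delimited JSON
  autodetect: true,   # infer the schema from the data
  write: "truncate"   # overwrite yesterday's dump
)
job.wait_until_done!
raise job.error["message"] if job.failed?
```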
Links:
* https://console.cloud.google.com/bigquery?project=danbooru1
* https://console.cloud.google.com/storage/browser/danbooru_public
* https://storage.googleapis.com/danbooru_public/data/posts.json