Commit Graph

14 Commits

Author SHA1 Message Date
evazion
d1ace32c40 bigquery: exclude large tables from nightly BigQuery dumps.
Exclude the posts, post_votes, favorites, media_assets, and ai_tags
tables from the BigQuery dumps. These usually take too long to complete
and also consume huge amounts of memory in the background workers.
2022-06-30 21:42:30 -05:00
evazion
fe18e37566 Fix #4954: BigQuery exports not updating.
Fix BigQuery export jobs failing with:

    Google::Cloud::InvalidArgumentError: required: Bucket is requester pays
    bucket but no user project provided.

Caused by changing the storage bucket to requester pays. The
`user_project` param must be set to true on requester pays buckets to
bill usage to the current project.
2022-01-14 00:12:20 -06:00
evazion
aedc09f301 bigquery: exclude GoodJob::Job from BigQuery. 2022-01-10 00:12:31 -06:00
evazion
72d5291a27 bigquery: exclude more GoodJobs classes from BigQuery. 2022-01-06 11:13:55 -06:00
evazion
123edc63a1 bigquery: don't dump good_jobs table to bigquery. 2022-01-06 00:41:26 -06:00
evazion
090125e239 Revert "Temp disable dumping favorites table to BigQuery."
This reverts commit 788dcbd87b.
2022-01-04 18:08:54 -06:00
evazion
9000facaf7 Revert "bigquery: temp disable dumping the posts table."
This reverts commit f02b437085.
2022-01-04 18:08:47 -06:00
evazion
a7dc05ce63 Enable frozen string literals.
Make all string literals immutable by default.
2021-12-14 21:33:27 -06:00
evazion
f02b437085 bigquery: temp disable dumping the posts table.
Dumping the posts table to BigQuery tends to timeout and leave stuck
jobs in the jobs table. Disable it until it can be fixed.
2021-12-08 18:04:14 -06:00
evazion
788dcbd87b Temp disable dumping favorites table to BigQuery.
The favorites table is too big and dumping it tends to time out. Then
the job keeps retrying even though it always fails, then multiple
instances of the job build up in the job queue because the old jobs
never finish.
2021-11-01 05:15:31 -05:00
evazion
26a411ba27 favorites: include favorites in bigquery exports.
Include the favorites table in the nightly database dumps in BigQuery.
Previously we couldn't do this because we didn't have an index on
the favorite ID, which we needed to iterate across the table efficiently.

Note that this doesn't include private favorites. Note also that if a
user switches their favorites from private to public, then their
favorites will begin to appear in these dumps.
2021-10-08 21:26:42 -05:00
evazion
ed302fdf4d docs: add documentation for various classes in app/logical. 2021-06-23 06:23:29 -05:00
evazion
81fe68d392 bans: change expires_at field to duration.
Changes:

* Change the `expires_at` field to `duration`.
* Make moderators choose from a fixed set of standard ban lengths,
  instead of allowing arbitrary ban lengths.
* List `duration` in seconds in the /bans.json API.
* Dump bans to BigQuery.

Note that some old bans have a negative duration. This is because their
expiration date was before their creation date, which is because in 2013
bans were migrated to Danbooru 2 and the original ban creation dates
were lost.
2021-03-11 02:59:58 -06:00
evazion
f235b72b3f Export public database dumps to BigQuery.
* Export daily public database dumps to BigQuery and Google Cloud Storage.
* Only data visible to anonymous users is exported. Some tables have
  null or missing fields because of this.
* The bans table is excluded because some bans have an expires_at
  timestamp set beyond year 9999, which BigQuery doesn't support.
* The favorites table is excluded because it's too slow to dump (it
  doesn't have an id index, which is needed by find_each).
* Version tables are excluded because dumping them every day is
  inefficient, streaming insertions should be used instead.

Links:

* https://console.cloud.google.com/bigquery?project=danbooru1
* https://console.cloud.google.com/storage/browser/danbooru_public
* https://storage.googleapis.com/danbooru_public/data/posts.json
2021-03-10 02:52:16 -06:00