Remove the `category_name` field from the `/wiki_page.json` API. This
field was originally added only because it was needed by our autocomplete
Javascript. It was also misnamed, it wasn't the tag's category name, it
was the category's ID.
Users should use `https://danbooru.donmai.us/wiki_pages.json?only=title,tag`
instead if they need this.
This triggered a N+1 query pattern when dumping wiki pages to BigQuery,
which made dumping wiki pages very slow. It also meant this field was
included in the database dump, even though it wasn't a real database
column.
Fix the `normalize` and `array_attribute` macros conflicting with each
other on the WikiPage model. This meant code like
`wiki_page.other_names = "foo bar"` didn't work. Both macros defined a
`other_names=` method, but one method overrode the other.
The fix is to use anonymous modules and prepend so we can chain method
calls with super.
Fix wiki pages and artists to normalize other names more consistently
and correctly.
For wiki pages, we strip leading / trailing / repeated underscores to
fix user typos, and we normalize to NFKC form to make search more consistent.
For artists, we allow leading / trailing / repeated underscores because
some artist names have these, and we normalize to NFC form because some
artists have weird names that would be lost by NFKC form. This does make
search less consistent.
* Introduce an abstraction for normalizing attributes. Very loosely
modeled after https://github.com/fnando/normalize_attributes.
* Normalize wiki bodies to Unicode NFC form.
* Normalize Unicode space characters in wiki bodies (strip zero width
spaces, normalize line endings to CRLF, normalize Unicode spaces to
ASCII spaces).
* Trim spaces from the start and end of wiki page bodies. This may cause
wiki page diffs to show spaces being removed even when the user didn't
explicitly remove the spaces themselves.
Don't allow wiki pages to have invalid names.
This incidentally means that you can't create wiki pages for pools. For
example, you can't create a wiki titled "pool:almost_heart-warming".
This is not a valid tag name, so it's not a valid wiki name either. This
was done in a handful of cases to translate Pixiv tags to Danbooru pools
(see: <https://danbooru.donmai.us/wiki_page_versions?search[title_like]=pool:*>)
Also fix it so that titles are normalized before validation, not before save.
Allowing typing Japanese tags in autocomplete. For example, typing 東方
in autocomplete will be completed to the touhou tag. Typing ぶくぶ will
complete to the bkub tag.
This works using wiki page and artist other names. Effectively, any name
listed as an other name in a wiki or artist page will be treated like an
alias for autocomplete purposes. This is limited to non-ASCII other names,
to prevent English other names from interfering with regular tag searches.
This refactors the autocomplete Javascript to use a single dedicated
/autocomplete.json endpoint instead of a bunch of separate endpoints.
This simplifies the autocomplete Javascript by making it so that instead
of calling a different endpoint for each type of query (for users, wiki
pages, pools, artists, etc), then having to parse the results of each
call to get the data we need, we can call a single endpoint that returns
exactly what we need.
This also means we don't have to parse searches clientside in order to
autocomplete metatags. Instead we can just pass the search term to the
server and let it parse the search, which is easy to do serverside.
Finally, this makes autocomplete easier to test, and it makes it easier
to add more sophisticated autocomplete behavior, since most of the logic
lives serverside.
When aliasing A to B, update any wikis linking to [[A]] to link to [[B]]
instead.
This is a best-effort process based on rough heuristics. There are a few
known problems:
* We don't always know how to capitalize the new tag. We try to mimic
the capitalization of the old tag, such that if the old tag was
capitalized (because it was at the beginning of a sentence), or if
every word in the old link was capitalized (because it's a proper
noun), then the new link will be capitalized in the same way. This can
handle simple general tags and character tags, but will fail for
copyright tags with mixed capitalization. For example, we don't know
that [[jojo_no_kimyou_na_bouken]] should be capitalized as [[JoJo no
Kimyou na Bouken]]. If we don't know how to capitalize the new tag, we
leave the old tag as-is so it can manually be fixed.
* Some aliases might require changing how a tag is pluralized. If we
changed [[rat]] to [[mouse]], then we should change `[[rat]]s` to
[[mice]]. We don't try to deal with this.
* In general, some changes might require entire sentences to be
rewritten to keep the grammar correct. Changing something like
[[skirt lift]] to [[lifting skirt]] could break the grammar of the
sentence. We don't try to deal with this.
Move html_data_attributes definitions from models to policies. Which
attributes are permitted as data-* attributes is a view level concern
and should be defined on the policy level, not the model level. Models
should be agnostic about how they're used in views.
Also add inputs on the search page for both the linked_to and the
not_linked_to search parameters. Additionally, normalize the title
first since autocomplete adds trailing spaces. The search query was
also simplified a bit by taking advantage of Rails associations.
Refactor models so that we define attribute API permissions in policy
files instead of directly in models.
This is cleaner because a) permissions are better handled by policies
and b) which attributes are visible to the API is an API-level concern
that models shouldn't have to care about.
This fixes an issue with not being able to precompile CSS/JS assets
unless the database was up and running. This was a problem when building
Docker images because we don't have a database at build time. We needed
the database because `api_attributes` was a class-level macro in some
places, which meant it ran at boot time, but this triggered a database
call because api_attributes used database introspection to get the list
of allowed API attributes.
Rename is_active to is_deleted. This is for better consistency with
other models, and to reduce confusion over what "active" means for
artists. Sometimes users think active is for whether the artist is
actively producing work.
* Replace the .category-N CSS classes on tags with .tag-type-N. Before
we were inconsistent about whether tag colors were indicated with
.category-N or .tag-type-N. Now it's always .tag-type-N.
* Fix various places to not use Tag.category_for. Tag.category_for does
one Redis call per tag lookup, which leads to N Redis calls on many
pages. This was inefficient because usually we either already had the
tags from the database, or we could fetch them easily.
- The only string works much the same as before with its comma separation
-- Nested includes are indicated with square brackets "[ ]"
-- The nested include is the value immediately preceding the square brackets
-- The only string is the comma separated string inside those brackets
- Default includes are split between format types when necessary
-- This prevents unnecessary includes from being added on page load
- Available includes are those items which are allowed to be accessible to the user
-- Some aren't because they are sensitive, such as the creator of a flag
-- Some aren't because the number of associated items is too large
- The amount of times the same model can be included to prevent recursions
-- One exception is the root model may include the same model once
--- e.g. the user model can include the inviter which is also the user model
-- Another exception is if the include is a has_many association
--- e.g. artist urls can include the artist, and then artist urls again
Remove support for the `search[sort]` param on certain index pages. This
hasn't been used for years, and it caused the `search[order]=` param to
be added to pagination links even when the order was blank.
Allow all users to view and edit artist entries and wiki pages belonging
to banned artists. There was little need to hide these pages from
Members, it was mainly to appease artists who didn't like us even
linking to their sites.
These restrictions also had multiple flaws:
* Banned artist information was still visible in the API.
* It was still possible to edit banned artists using the API.
* It was still possible for unprivileged users to revert banned
artist entries or wiki pages to previous versions.
* The restrictions were inconsistent: in various places they were
either Member-only, Gold-only, or Builder-only.
* Warn when renaming a wiki that still has links from other wikis.
* When renaming a wiki that still has posts, just show a warning instead
of returning an error and making the user confirm the rename.
Drop the creator_id and updater_id fields from wiki pages. These fields
had several issues:
* The creator_id field was inconsistent with the wiki_page_versions
table. Apparently during the migration to Danbooru 2 in 2012-2013 the
creator_id field got reset to whoever last updated the wiki at that
point in time.
* Saving a wiki would set the updater_id even when nothing actually
changed. This also caused the updated_at timestamp to get bumped.
Because of this, anything that saved a wiki, including things like
creating aliases or implications, would bump the updater_id and
updated_at even though the wiki didn't actually change. This meant
these fields weren't consistent with the wiki_page_versions history.
Changes:
* Remove `creator_name` field from the /wiki_pages.json API.
* Remove creator name search option from /wiki_pages/search.
Add a dtext_links table for tracking links between wiki pages. This is
to allow for broken link detection and "what links here" searches, among
other uses.
DText is processed in three phases: a preprocessing phase, the regular
parsing phases, and a postprocessing phase.
In the preprocessing phase we extract all the wiki links from all the
dtext messages on the page (more precisely, we do this in forum threads
and on comment pages, because these are the main places with lots of
dtext). This is so we can lookup all the tags and wiki pages in one
query, which is necessary because in the worst case (in certain forum
threads and in certain list_of_* wiki pages) there can be hundreds of
tags per page.
In the postprocessing phase we fixup the html generated by the ragel
parser to add CSS classes to wiki links. We do this in a postprocessing
step because it's easier than doing it in the ragel parser itself.
* Parse the wiki page with the actual dtext parser instead of by hand.
This is so that wiki links inside things like [nodtext] or [code]
blocks are handled properly.
* Only include tags that exist and are nonempty. Don't include links to
dead pages or blank tags.