posts: normalize Unicode to NFC form in post sources.

Fix strings like "pokémon" (NFD form) and "pokémon" (NFC form) being
considered different strings in sources.
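
As an illustration of the mismatch, here is a minimal plain-Ruby sketch, not part of this commit. The variable names are made up for the example; `String#unicode_normalize` is the same core method the new `normalize_source` calls.

```ruby
# The same visible word, built from two different code point sequences.
nfd = "pok\u0065\u0301mon" # "e" followed by U+0301 COMBINING ACUTE ACCENT (decomposed)
nfc = "pok\u00E9mon"       # U+00E9, the precomposed "é" (composed)

puts nfd == nfc                          # false: the code points differ
puts nfd.length                          # 8
puts nfc.length                          # 7
puts nfd.unicode_normalize(:nfc) == nfc  # true: NFC composes "e" + accent into "é"
```

Normalizing both sides to the same form before storing is what lets such sources compare equal.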

Also add a fix script to fix existing sources. There were only 15 posts
with unnormalized sources.
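
The fix script itself is not shown on this page. The following is only a rough plain-Ruby sketch of how unnormalized sources could be detected and repaired, assuming the sources are available as an array of strings. The `sources` array and its contents are hypothetical sample data, and the real script presumably works against the posts table instead.

```ruby
# Hypothetical sample data standing in for post sources.
sources = [
  "https://example.com/artist", # ASCII only, already in NFC
  "pok\u00E9mon",               # precomposed "é", already in NFC
  "pok\u0065\u0301mon",         # decomposed "e" + U+0301, needs fixing
]

# String#unicode_normalized? reports whether a string is already in the given form.
unnormalized = sources.reject { |s| s.unicode_normalized?(:nfc) }

# Normalizing everything is idempotent, so already-normalized strings are unchanged.
fixed = sources.map { |s| s.unicode_normalize(:nfc) }

puts unnormalized.size # 1
puts fixed.uniq.size   # 2: both spellings of the word collapse into one
```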
evazion
2022-01-31 10:56:27 -06:00
parent 0132c5f0a5
commit 61c043c6b1
4 changed files with 26 additions and 33 deletions

@@ -1,6 +1,4 @@
 # frozen_string_literal: true
-# normalize unicode in non-web sources
-# normalize percent-encode unicode in source urls
 class Post < ApplicationRecord
   class RevertError < StandardError; end
@@ -14,9 +12,9 @@ class Post < ApplicationRecord
   deletable
+  normalize :source, :normalize_source
   before_validation :merge_old_changes
   before_validation :normalize_tags
-  before_validation :strip_source
   before_validation :parse_pixiv_id
   before_validation :blank_out_nonexistent_parents
   before_validation :remove_parent_loops
@@ -1334,8 +1332,8 @@ class Post < ApplicationRecord
     self
   end
-  def strip_source
-    self.source = source.try(:strip)
+  def self.normalize_source(source)
+    source.to_s.strip.unicode_normalize(:nfc)
   end
   def mark_as_translated(params)