linux-toradex.git/fs/unicode, branch v5.19-rc8

kbuild: unify cmd_copy and cmd_shipped

2022-02-14T01:37:32+00:00

cmd_copy and cmd_shipped have similar functionality. The difference is
that cmd_copy uses 'cp' while cmd_shipped uses 'cat'.

Unify them into cmd_copy because this macro name is more intuitive.

Going forward, cmd_copy will use 'cat' to avoid the permission issue.
I also thought of 'cp --no-preserve=mode' but this option is not
mentioned in the POSIX spec [1], so I am keeping the 'cat' command.
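A rough userspace illustration of why 'cat' sidesteps permission trouble. This rests on an assumption the commit does not spell out: the "permission issue" is most likely that 'cp' creates a new target with the source file's permission bits, so a read-only source yields a read-only copy that a later rebuild cannot overwrite. The hypothetical helper copy_like_cat() below is a POSIX C sketch, not kbuild code. It mimics `cat src > dst` by creating the target with mode 0666 filtered by the umask, so the source's mode never carries over.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of kbuild): copy src to dst the way
 * `cat src > dst` does. A newly created dst gets mode 0666 filtered by
 * the process umask; an existing dst keeps its mode and is truncated.
 * In neither case are the source file's permission bits copied.
 * Returns 0 on success, -1 on any I/O error.
 */
int copy_like_cat(const char *src, const char *dst)
{
	char buf[4096];
	ssize_t n;
	int in, out, ret = 0;

	assert(src != NULL && dst != NULL);

	in = open(src, O_RDONLY);
	if (in < 0)
		return -1;
	out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
	if (out < 0) {
		close(in);
		return -1;
	}

	while ((n = read(in, buf, sizeof(buf))) > 0) {
		ssize_t off = 0;

		/* write() may be partial, so loop until the whole chunk is out. */
		while (off < n) {
			ssize_t w = write(out, buf + off, (size_t)(n - off));

			if (w < 0) {
				ret = -1;
				goto done;
			}
			off += w;
		}
	}
	if (n < 0)
		ret = -1;
done:
	close(in);
	if (close(out) < 0)
		ret = -1;
	return ret;
}
```

With a umask of 022 and a 0444 source, the copy ends up 0644 and can be rewritten on the next run, whereas a plain 'cp' of the same source would create a 0444 file.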

[1]: https://pubs.opengroup.org/onlinepubs/009695299/utilities/cp.html
Signed-off-by: Masahiro Yamada 
Reviewed-by: Nick Desaulniers 
Reviewed-by: Gabriel Krisman Bertazi

Merge tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode

2022-02-01T19:13:24+00:00

Pull unicode cleanup from Gabriel Krisman Bertazi:
 "A fix from Christoph Hellwig merging the CONFIG_UNICODE_UTF8_DATA into
  the previous CONFIG_UNICODE. It is -rc material since we don't want to
  expose the former symbol on 5.17.

  This has been living on linux-next for the past week"

* tag 'unicode-for-next-5.17-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode:
  unicode: clean up the Kconfig symbol confusion

unicode: clean up the Kconfig symbol confusion

2022-01-21T00:57:24+00:00

Turn the CONFIG_UNICODE symbol into a tristate that generates some
always-built-in code, and remove the confusing CONFIG_UNICODE_UTF8_DATA symbol.

Note that many of the IS_ENABLED() checks could be turned from cpp #if
statements into normal C if statements, but this change is intended to
be fairly mechanical, so that cleanup is left for later.
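The tristate and IS_ENABLED() points above can be illustrated with a small standalone re-creation of the preprocessor trick from the kernel's include/linux/kconfig.h. The helper macros are renamed with a KC_ prefix here, so treat this as a sketch rather than the kernel header. Kconfig defines CONFIG_FOO to 1 for a built-in (y) symbol and CONFIG_FOO_MODULE to 1 for a modular (m) one, and IS_ENABLED() is true in either case. The demo assumes CONFIG_UNICODE=m and uses the same check both as a cpp #if and as a plain C if.

```c
#include <assert.h>

/*
 * Standalone re-creation of the IS_ENABLED() trick from the kernel's
 * include/linux/kconfig.h, with the helper macros renamed (KC_ prefix).
 * Kconfig emits "#define CONFIG_FOO 1" for FOO=y and
 * "#define CONFIG_FOO_MODULE 1" for FOO=m.
 */
#define KC_PLACEHOLDER_1 0,
#define KC_TAKE_SECOND(ignored, val, ...) val

/* Expands to 1 if x is a macro defined as 1, otherwise to 0. */
#define KC_IS_DEFINED(x)      KC_IS_DEFINED_(x)
#define KC_IS_DEFINED_(val)   KC_IS_DEFINED__(KC_PLACEHOLDER_##val)
#define KC_IS_DEFINED__(junk) KC_TAKE_SECOND(junk 1, 0)

/* Expands to 1 if x is 1, otherwise to y (x and y must be 0 or 1). */
#define KC_OR(x, y)      KC_OR_(x, y)
#define KC_OR_(x, y)     KC_OR__(KC_PLACEHOLDER_##x, y)
#define KC_OR__(junk, y) KC_TAKE_SECOND(junk 1, y)

#define IS_BUILTIN(option) KC_IS_DEFINED(option)
#define IS_MODULE(option)  KC_IS_DEFINED(option##_MODULE)
#define IS_ENABLED(option) KC_OR(IS_BUILTIN(option), IS_MODULE(option))

/* Assumption for this demo: CONFIG_UNICODE=m, as Kconfig would emit it. */
#define CONFIG_UNICODE_MODULE 1

/* cpp form: the unused branch is never compiled at all. */
#if IS_ENABLED(CONFIG_UNICODE)
#define UNICODE_CPP_BRANCH 1
#else
#define UNICODE_CPP_BRANCH 0
#endif

/* IS_ENABLED() is an integer constant, so it works in static_assert too. */
static_assert(IS_ENABLED(CONFIG_UNICODE), "=m counts as enabled");

/*
 * Plain C form: both branches are parsed and type-checked, and the
 * compiler drops the dead one because the condition is a constant.
 */
int unicode_enabled(void)
{
	if (IS_ENABLED(CONFIG_UNICODE))
		return 1;
	return 0;
}

int unicode_builtin(void) { return IS_BUILTIN(CONFIG_UNICODE); }
int unicode_module(void) { return IS_MODULE(CONFIG_UNICODE); }
int unicode_cpp_branch(void) { return UNICODE_CPP_BRANCH; }
```

Defining CONFIG_UNICODE to 1 instead (the y case) would keep IS_ENABLED() at 1 while IS_BUILTIN() and IS_MODULE() swap values. The plain if form is the kind of later cleanup the commit message mentions: the code stays visible to the compiler in every configuration.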

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Reported-by: Linus Torvalds 
Reviewed-by: Eric Biggers 
Signed-off-by: Christoph Hellwig 
Signed-off-by: Gabriel Krisman Bertazi

unicode: fix .gitignore for generated utfdata file

2022-01-17T05:26:43+00:00

Commit 2b3d04787012 ("unicode: Add utf8-data module") changed the
generated utf8data file from 'utf8data.h' to 'utf8data.c', but didn't
change the comments or the .gitignore to match.

The comments should be updated too, but at least they don't cause any
visible breakage.  But the gitignore file needs changing to avoid git
complaining about untracked files.

Fixes: 2b3d04787012 ("unicode: Add utf8-data module")
Signed-off-by: Linus Torvalds

unicode: only export internal symbols for the selftests

2021-10-12T14:41:39+00:00

The exported symbols in utf8-norm.c are not needed for normal
file system consumers, so move them to conditional _GPL exports
just for the selftest.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Gabriel Krisman Bertazi

unicode: Add utf8-data module

2021-10-12T14:41:39+00:00

utf8data.h contains a large database table which is an auto-generated
decodification trie for the unicode normalization functions.

Allow building it into a separate module.

Based on a patch from Shreeya Patel.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Gabriel Krisman Bertazi

unicode: cache the normalization tables in struct unicode_map

2021-10-11T20:02:02+00:00

Instead of repeatedly looking up the version, add pointers to the
NFD and NFD+CF tables to struct unicode_map, and pass a
unicode_map plus an index to the functions that use the normalization
tables.
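A minimal sketch of the caching pattern described above. Every type, table, and function name here (struct norm_table, enum norm_index, find_table(), unicode_map_init(), unicode_table(), and the AGE() version encoding) is hypothetical and far simpler than the kernel's code. The idea it shows is that the version is resolved to table pointers once, when the map is set up, and callers afterwards pass the map plus an index.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical version encoding: major.minor.revision packed in an int. */
#define AGE(maj, min, rev) \
	(((unsigned int)(maj) << 16) | ((unsigned int)(min) << 8) | (unsigned int)(rev))
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for one normalization trie. */
struct norm_table {
	unsigned int age;  /* Unicode version this table implements */
	const char *form;  /* label only; a real table holds trie data */
};

/* Which cached table a caller wants. */
enum norm_index { NORM_NFD = 0, NORM_NFDCF, NORM_MAX };

static const struct norm_table nfd_tables[] = {
	{ AGE(11, 0, 0), "NFD" },
	{ AGE(12, 1, 0), "NFD" },
};
static const struct norm_table nfdcf_tables[] = {
	{ AGE(11, 0, 0), "NFD+CF" },
	{ AGE(12, 1, 0), "NFD+CF" },
};

struct unicode_map {
	unsigned int version;
	const struct norm_table *ntab[NORM_MAX]; /* filled once at init */
};

static const struct norm_table *find_table(const struct norm_table *t,
					   size_t count, unsigned int age)
{
	for (size_t i = 0; i < count; i++)
		if (t[i].age == age)
			return &t[i];
	return NULL;
}

/* The version lookup happens here, once, instead of on every call. */
int unicode_map_init(struct unicode_map *um, unsigned int version)
{
	um->version = version;
	um->ntab[NORM_NFD] = find_table(nfd_tables, ARRAY_LEN(nfd_tables), version);
	um->ntab[NORM_NFDCF] = find_table(nfdcf_tables, ARRAY_LEN(nfdcf_tables), version);
	return (um->ntab[NORM_NFD] && um->ntab[NORM_NFDCF]) ? 0 : -1;
}

/* Callers pass the map plus an index and get the cached table directly. */
const struct norm_table *unicode_table(const struct unicode_map *um,
				       enum norm_index n)
{
	assert(n < NORM_MAX);
	return um->ntab[n];
}
```

Keeping the tables in an array indexed by an enum means a new normalization form only needs a new enum value and one more lookup at init time, rather than another per-call version search.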

Signed-off-by: Christoph Hellwig 
Signed-off-by: Gabriel Krisman Bertazi

unicode: move utf8cursor to utf8-selftest.c

2021-10-11T20:01:58+00:00

Only used by the tests, so no need to keep it in the core.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Gabriel Krisman Bertazi

unicode: simplify utf8len

2021-10-11T20:01:54+00:00

Just use the utf8nlen implementation with a (size_t)-1 len argument,
similar to utf8_lookup.  Also move the function to utf8-selftest.c, as
it isn't used anywhere else.
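The utf8len-via-utf8nlen idea, sketched with hypothetical toy functions. Passing (size_t)-1, the largest size_t value, as the length makes the bound effectively unlimited, so only the terminating NUL stops the scan. Unlike the real utf8nlen(), which reports the length of the normalized string, toy_utf8nlen() just counts UTF-8 code points and performs no normalization.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical toy version of utf8nlen(): scans at most len bytes of s,
 * stopping early at a NUL, and counts UTF-8 code points (every byte that
 * is not a 10xxxxxx continuation byte). No normalization is performed.
 */
size_t toy_utf8nlen(const char *s, size_t len)
{
	size_t count = 0;

	assert(s != NULL);
	for (size_t i = 0; i < len && s[i] != '\0'; i++)
		if (((unsigned char)s[i] & 0xC0) != 0x80)
			count++;
	return count;
}

/*
 * The utf8len() pattern from the commit: no second loop, just call the
 * bounded version with (size_t)-1 (SIZE_MAX), so the NUL terminator is
 * the only thing that ends the scan.
 */
size_t toy_utf8len(const char *s)
{
	return toy_utf8nlen(s, (size_t)-1);
}
```

For example, toy_utf8len("caf\xc3\xa9") counts the two-byte é once and returns 4.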

Signed-off-by: Christoph Hellwig 
Signed-off-by: Gabriel Krisman Bertazi

unicode: remove the unused utf8{,n}age{min,max} functions

2021-10-11T20:01:50+00:00

Not actually used anywhere.

Signed-off-by: Christoph Hellwig 
Signed-off-by: Gabriel Krisman Bertazi