I've been struggling to assemble a set of test/specimen text samples that covers the Unicode scripts. Such a set might be a useful resource for font developers.
I'm not looking for an exhaustive character set (which could be auto-generated), but for something more user-friendly that could go in a specimen book and serve as a lightweight initial test of a font with extended script coverage. Yes, there are questions about which language to use for a given script, and about regional variations, but as a lightweight test I think a set of readily available texts would be useful for font work.
Some options I've looked at:
- Article 1 of the UDHR (Universal Declaration of Human Rights - "All human beings are born free ...") available at https://github.com/unicode-org/udhr. Good length. However, the 500-ish translations cover only 43 of the 150 Unicode scripts (I'm still on Unicode v12.1).
- Genesis 11:1 ("Now the whole world had one language and a common speech"). A bit short. I've collected about 76 scripts (only about 50%), and many of the "under-served" scripts have only images available, not Unicode text.
- Pangrams. Would need development for less-used scripts, and that is daunting.
- Representative Characters. A small set of characters that demonstrate the typographic attributes of that script. This might be useful, but is typically very short, does not represent body text, and does not give the 'feel' of the script from a user perspective.
- Character Strings. For scripts where I have none of the above, I've been falling back on a string of the first hundred or so assigned characters, excluding combining diacritics and other oddballs, with some random spaces thrown in to approximate body text. A pretty poor substitute for body text, but that's all I've come up with ...
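For what it's worth, that character-string fallback is easy to automate. Here is a minimal sketch in Python. It assumes you supply the script as a code-point range, because the standard `unicodedata` module exposes general categories but not the Script property. It also assumes that "oddballs" means the general categories listed in `SKIP_CATEGORIES`. That list, the function name `fallback_sample`, and its parameters are my own illustrative choices, not an established tool. "Assigned" here reflects whatever Unicode version your Python's `unicodedata` was built with, which is not necessarily v12.1.

```python
import random
import unicodedata

# General categories skipped as "oddballs" (an assumption, adjust to taste):
# unassigned (Cn); combining marks (Mn, Mc, Me); controls, format characters,
# private use and surrogates (Cc, Cf, Co, Cs); separators (Zs, Zl, Zp).
SKIP_CATEGORIES = {"Cn", "Mn", "Mc", "Me", "Cc", "Cf", "Co", "Cs", "Zs", "Zl", "Zp"}


def fallback_sample(start, end, count=100, seed=0, min_word=2, max_word=7):
    """Hypothetical helper: take the first `count` usable characters in the
    code-point range [start, end] and split them into pseudo-words of random
    length (min_word..max_word) to approximate body text."""
    chars = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        if unicodedata.category(ch) in SKIP_CATEGORIES:
            continue
        chars.append(ch)
        if len(chars) == count:
            break

    rng = random.Random(seed)  # seeded so the same range gives the same sample
    words = []
    i = 0
    while i < len(chars):
        n = rng.randint(min_word, max_word)  # inclusive on both ends
        words.append("".join(chars[i:i + n]))
        i += n
    return " ".join(words)
```

For example, `fallback_sample(0x1200, 0x137F)` gives 100 Ethiopic characters grouped into pseudo-words of 2 to 7 characters (the last one may be shorter). Skipping `Mc` as well as `Mn` drops spacing vowel signs in Brahmic scripts, which makes the result look even less like real text, so you may want to keep `Mc` for those.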