Package xmlschema_acue :: Module codepoints

Module codepoints


This module defines Unicode character categories and blocks as sets of code points.

Classes
	UnicodeSubset Represents a subset of Unicode code points, implemented as an ordered list of integer values and ranges.

Functions

code_point_order(cp)
Ordering function for code points.


code_point_reverse_order(cp)
Reverse ordering function for code points.


iter_code_points(code_points, reverse=False)
Iterates over a sequence of code points.


check_code_point(cp)
Checks a code point or code point range.


code_point_repr(cp)
Returns the string representation of a code point.


iterparse_character_group(s, expand_ranges=False)
Parses a regex character group part, generating a sequence of code points and code point ranges.


get_unicodedata_categories()
Extracts Unicode category information from the `unicodedata` library.


save_unicode_categories(filename=None)
Saves Unicode categories to a JSON file.


build_unicode_categories(filename=None)
Builds the Unicode categories as `UnicodeSubset` instances.


Variables
	CHARACTER_GROUP_ESCAPED = `{ord(c) for c in r'-\|.^?*+{}()[]\'}` Code Points of escaped chars in a character group.
	UCS4_MAXUNICODE = `1114111`
	UNICODE_CATEGORIES = `build_unicode_categories()`
	UNICODE_BLOCKS = `{'IsBasicLatin': UnicodeSubset('-'), 'IsLat...`
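
The value of `UCS4_MAXUNICODE` listed above is the decimal form of U+10FFFF, the highest Unicode code point. A quick check, assuming a Python 3.3 or later interpreter (where `sys.maxunicode` has the same value); the constant is redefined locally here rather than imported from the module:

```python
import sys

# Value copied from the Variables listing above.
UCS4_MAXUNICODE = 1114111

# 1114111 is U+10FFFF, the last code point in the Unicode code space.
assert UCS4_MAXUNICODE == 0x10FFFF
# On Python 3.3 and later, sys.maxunicode is always 0x10FFFF.
assert UCS4_MAXUNICODE == sys.maxunicode
# chr() accepts the maximum value and returns a single character.
assert len(chr(UCS4_MAXUNICODE)) == 1
```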

Function Details

iter_code_points(code_points, reverse=False)


Iterates over a sequence of code points. Contiguous code points are merged into ranges.

:param code_points: an iterable of code points and code point ranges.
:param reverse: if `True`, reverses the order of the sequence.
:return: yields code points or code point ranges.
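
The merging of contiguous code points can be illustrated with a minimal sketch. This is not the module's implementation: `merge_contiguous` is a hypothetical name, it accepts plain integers only, and it assumes ranges are inclusive `(start, end)` tuples, which may differ from the convention used by `iter_code_points`.

```python
def merge_contiguous(code_points, reverse=False):
    """Yield single code points or inclusive (start, end) pairs.

    Hypothetical helper, not the module's iter_code_points(): it takes
    plain integers only and uses inclusive range bounds by assumption.
    """
    ordered = sorted(set(code_points))
    if not ordered:
        return
    runs = []
    start = end = ordered[0]
    for cp in ordered[1:]:
        if cp == end + 1:
            end = cp              # extend the current run
        else:
            runs.append((start, end))
            start = end = cp      # begin a new run
    runs.append((start, end))
    if reverse:
        runs.reverse()
    for start, end in runs:
        # A run of length one is yielded as a bare integer.
        yield start if start == end else (start, end)


print(list(merge_contiguous([97, 98, 99, 120, 48, 49])))
# [(48, 49), (97, 99), 120]
```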


check_code_point(cp)


Checks a code point or code point range.

:return: a valid code point range.
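
A check of this kind might look like the following sketch. It is an assumption-laden illustration, not the module's `check_code_point`: the name `normalize_code_point`, the inclusive `(start, end)` return value, and the exception types are all hypothetical. Only the bounds `[0, sys.maxunicode]` are taken from this page.

```python
import sys


def normalize_code_point(cp):
    """Return an inclusive (start, end) pair, or raise on invalid input.

    Hypothetical helper: the return convention of the module's
    check_code_point() is not shown on this page.
    """
    if isinstance(cp, int) and not isinstance(cp, bool):
        start = end = cp                    # single code point
    elif isinstance(cp, tuple) and len(cp) >= 2:
        start, end = cp[0], cp[1]           # code point range
    else:
        raise TypeError("expected an int or a tuple of ints: %r" % (cp,))
    if not (isinstance(start, int) and isinstance(end, int)):
        raise TypeError("range bounds must be integers: %r" % (cp,))
    if not 0 <= start <= end <= sys.maxunicode:
        raise ValueError("code point out of range: %r" % (cp,))
    return start, end
```

For example, `normalize_code_point(65)` returns `(65, 65)` and `normalize_code_point((97, 122))` returns `(97, 122)`, while `normalize_code_point(-1)` raises `ValueError`.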


code_point_repr(cp)


Returns the string representation of a code point.

:param cp: an integer or a tuple with at least two integers. Values must be in the interval [0, sys.maxunicode].


iterparse_character_group(s, expand_ranges=False)


Parses a regex character group part, generating a sequence of code points and code point ranges. An unescaped hyphen (-) that is not at the start or at the end is interpreted as a range specifier.

:param s: a string representing a character group part.
:param expand_ranges: if set to `True`, expands character ranges into single code points.
:return: yields integers or pairs of integers.
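
The hyphen rule can be sketched with a simplified parser. This is a hypothetical illustration, not the module's `iterparse_character_group`: the name `parse_group_part` is invented, a backslash simply makes the next character literal (character class escapes are not interpreted), and ranges are assumed to be inclusive `(start, end)` pairs.

```python
def parse_group_part(s, expand_ranges=False):
    """Yield code points and inclusive (start, end) pairs from a group part.

    Simplified, hypothetical parser: a backslash makes the following
    character literal; no other escape handling is attempted.
    """
    # First pass: resolve backslashes, remembering which chars were escaped.
    tokens = []  # list of (code_point, was_escaped)
    i = 0
    while i < len(s):
        if s[i] == '\\' and i + 1 < len(s):
            tokens.append((ord(s[i + 1]), True))
            i += 2
        else:
            tokens.append((ord(s[i]), False))
            i += 1

    # Second pass: an unescaped '-' with a token on both sides is a range.
    hyphen = (ord('-'), False)
    k = 0
    while k < len(tokens):
        start = tokens[k][0]
        if k + 2 < len(tokens) and tokens[k + 1] == hyphen:
            end = tokens[k + 2][0]
            if end < start:
                raise ValueError("bad character range in %r" % s)
            if expand_ranges:
                yield from range(start, end + 1)
            else:
                yield (start, end)
            k += 3
        else:
            yield start  # literal, including '-' at the start or end
            k += 1


print(list(parse_group_part('a-z_\\-')))
# [(97, 122), 95, 45]
```

A hyphen at either end of the string never has a token on both sides, so it falls through to the literal branch.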


get_unicodedata_categories()


Extracts Unicode category information from the `unicodedata` library. Each category is represented as an ordered list of code points and code point ranges.

:return: a dictionary with category names as keys and lists as values.
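
One way to derive such a dictionary from the standard library is sketched below. It is not the module's code: `collect_categories` and its `limit` parameter are hypothetical, and contiguous runs are stored as inclusive `(start, end)` tuples by assumption. Only `unicodedata.category()` is a real standard-library call.

```python
import sys
import unicodedata
from collections import defaultdict


def collect_categories(limit=sys.maxunicode + 1):
    """Map each general category to an ordered list of code points and ranges.

    Hypothetical sketch: runs of contiguous code points are stored as
    inclusive (start, end) tuples, single code points as plain integers.
    `limit` (exclusive upper bound) exists only to allow small test runs.
    """
    categories = defaultdict(list)
    run_start = 0
    run_cat = unicodedata.category(chr(0))
    for cp in range(1, limit + 1):
        # cp == limit acts as a sentinel that closes the final run.
        cat = unicodedata.category(chr(cp)) if cp < limit else None
        if cat != run_cat:
            end = cp - 1
            categories[run_cat].append(
                run_start if run_start == end else (run_start, end)
            )
            run_start, run_cat = cp, cat
    return dict(categories)


ascii_categories = collect_categories(limit=128)
print(ascii_categories['Lu'])
# [(65, 90)]
```

Because code points are visited in increasing order, each list comes out already sorted.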


save_unicode_categories(filename=None)


Saves Unicode categories to a JSON file.

:param filename: the JSON file to save. If `None`, the predefined filename 'unicode_categories.json' is used and the function tries to save the file in the directory of this module.


build_unicode_categories(filename=None)


Builds the Unicode categories as `UnicodeSubset` instances. To speed up the build, a pre-built JSON file with Unicode categories data can be used. If the JSON file is missing or not accessible, the categories data is rebuilt using the `unicodedata.category()` API.

:param filename: the name of the JSON file to load for a fast build of the categories. If not provided, the predefined filename 'unicode_categories.json' is used.
:return: a dictionary that associates Unicode category names with `UnicodeSubset` instances.
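
The load-or-rebuild behaviour described above follows a common cache-with-fallback pattern, sketched here under stated assumptions. `load_or_rebuild` and its `rebuild` argument are hypothetical names, the sketch returns plain JSON data rather than `UnicodeSubset` instances, and treating a corrupt file like a missing one is a choice of this sketch, not something the page documents.

```python
import json


def load_or_rebuild(filename, rebuild):
    """Return data loaded from `filename`, or from `rebuild()` on failure.

    Hypothetical helper: `rebuild` is any zero-argument callable that
    produces the same mapping the JSON file would contain.
    """
    try:
        with open(filename, encoding='utf-8') as fp:
            return json.load(fp)
    except (OSError, ValueError):
        # OSError: file missing or not accessible.
        # ValueError: invalid JSON (json.JSONDecodeError is a subclass).
        return rebuild()
```

Note that JSON has no tuple type, so a range saved as `(65, 90)` is loaded back as the list `[65, 90]`; a caller that needs tuples has to convert after loading.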


Variables Details

UNICODE_BLOCKS

Value:

{'IsBasicLatin': UnicodeSubset('\x00-\x7f'),
 'IsLatin-1Supplement': UnicodeSubset('\x80-ÿ'),
 'IsLatinExtended-A': UnicodeSubset('Ā-ſ'),
 'IsLatinExtended-B': UnicodeSubset('ƀ-ɏ'),
 'IsIPAExtensions': UnicodeSubset('ɐ-ʯ'),
 'IsSpacingModifierLetters': UnicodeSubset('ʰ-˿'),
 'IsCombiningDiacriticalMarks': UnicodeSubset('̀-ͯ'),
 'IsGreek': UnicodeSubset('Ͱ-Ͽ'),
 'IsCyrillic': UnicodeSubset('Ѐ-ӿ'),
 'IsArmenian': UnicodeSubset('԰-֏'),
 'IsHebrew': UnicodeSubset('֐-׿'),
 'IsArabic': UnicodeSubset('؀-ۿ'),
 'IsSyriac': UnicodeSubset('܀-ݏ'),
 'IsThaana': UnicodeSubset('ހ-޿'),
 'IsDe
...

Copyright(C) 2019 Arno-Can Uestuensoez @Ingenieurbuero Arno-Can Uestuensoez	https://arnocan.wordpress.com
Generated by Epydoc 4.0.4 / Python-3.8 / fedora27 on Fri Dec 13 15:25:22 2019	http://epydoc.sourceforge.net