Involved Source Files
Package bidi contains functionality for bidirectional text support.
See https://www.unicode.org/reports/tr9.
NOTE: UNDER CONSTRUCTION. This API may change in backwards incompatible ways
and without notice.
    bracket.go
    core.go
    prop.go
    tables15.0.0.go
    trieval.go
Package-Level Type Names (total 17, in which 7 are exported)
An Ordering holds the computed visual order of runs of a Paragraph. Calling
SetBytes or SetString on the originating Paragraph invalidates an Ordering.
The methods of an Ordering should only be called by one goroutine at a time.
    directions []Direction
    runes [][]rune
    startpos []int
Direction reports the directionality of the runs.
The direction may be LeftToRight, RightToLeft, Mixed, or Neutral.
NumRuns returns the number of runs.
Run returns the ith run within the ordering.
func (*Paragraph).Line(start, end int) (Ordering, error)
func (*Paragraph).Order() (Ordering, error)
func calculateOrdering(levels []level, runes []rune) Ordering
A Paragraph holds a single Paragraph for Bidi processing.
    o Ordering
    options options
    opts []Option
    p []byte
    pairTypes []bracketType
    pairValues []rune
    runes []rune
    types []Class
Direction returns the direction of the text of this paragraph.
The direction may be LeftToRight, RightToLeft, Mixed, or Neutral.
IsLeftToRight reports whether the principal direction of rendering for this
paragraph is left-to-right. If this returns false, the principal direction
of rendering is right-to-left.
Line computes the visual ordering of runs for a single line starting and
ending at the given positions in the original text.
Order computes the visual ordering of all the runs in a Paragraph.
RunAt reports the Run at the given position of the input text.
This method can be used for computing line breaks on paragraphs.
SetBytes configures p for the given paragraph text. It replaces text
previously set by SetBytes or SetString. If b contains a paragraph separator
it will only process the first paragraph and report the number of bytes
consumed from b including this separator. Error may be non-nil if options are
given.
SetString configures p for the given paragraph text s. It replaces text
previously set by SetBytes or SetString. If s contains a paragraph separator
it will only process the first paragraph and report the number of bytes
consumed from s including this separator. Error may be non-nil if options are
given.
Initialize the p.pairTypes, p.pairValues and p.types from the input previously
set by p.SetBytes() or p.SetString(). Also limit the input up to (and including) a paragraph
separator (bidi class B).
The function p.Order() needs these values to be set, so this preparation could be postponed.
But since the SetBytes and SetString functions return the length of the input up to the paragraph
separator, the whole input needs to be processed anyway and should not be done twice.
The function has the same return values as SetBytes() / SetString().
Properties provides access to BiDi properties of runes.
    entry uint8
    last uint8
Class returns the Bidi class for p.
IsBracket reports whether the rune is a bracket.
IsOpeningBracket reports whether the rune is an opening bracket.
IsBracket must return true.
TODO: find a better API and expose.
func Lookup(s []byte) (p Properties, sz int)
func LookupRune(r rune) (p Properties, size int)
func LookupString(s string) (p Properties, sz int)
A Run is a continuous sequence of characters of a single direction.
    direction Direction
    runes []rune
    startpos int
Bytes returns the text of the run in its original order.
Direction reports the direction of the run.
Pos returns the position of the Run within the text passed to SetBytes or SetString of the
originating Paragraph value.
String returns the text of the run in its original order.
*Run : fmt.Stringer
*Run : context.stringer
*Run : runtime.stringer
func (*Ordering).Run(i int) Run
func (*Paragraph).RunAt(pos int) Run
bidiTrie. Total size: 19904 bytes (19.44 KiB). Checksum: b1f201ed2debb6c8.
lookup returns the trie value for the first UTF-8 encoding in s and
the width in bytes of this encoding. The size will be 0 if s does not
hold enough bytes to complete the encoding. len(s) must be greater than 0.
lookupString returns the trie value for the first UTF-8 encoding in s and
the width in bytes of this encoding. The size will be 0 if s does not
hold enough bytes to complete the encoding. len(s) must be greater than 0.
lookupStringUnsafe returns the trie value for the first UTF-8 encoding in s.
s must start with a full and valid UTF-8 encoded rune.
lookupUnsafe returns the trie value for the first UTF-8 encoding in s.
s must start with a full and valid UTF-8 encoded rune.
lookupValue determines the type of block n and looks up the value for b.
func newBidiTrie(i int) *bidiTrie
var trie *bidiTrie
bracketPair holds a pair of index values for opening and closing bracket
location of a bracket pair.
    closer int
    opener int
(*bracketPair) String() string
*bracketPair : fmt.Stringer
*bracketPair : context.stringer
*bracketPair : runtime.stringer
    // directional bidi codes for an isolated run
    // array of index values into the original string
    // list of positions for opening brackets
    bracket pair positions sorted by location of opening bracket
    // direction corresponding to start of sequence
assignBracketType implements rule N0 for a single bracket pair.
classBeforePair determines which strong types are present before a Bracket
Pair. Return R or L if strong type found, otherwise ON.
classifyPairContent reports the strong types contained inside a Bracket Pair,
assuming the given embedding direction.
It returns ON if no strong type is found. If a single strong type is found,
it returns this type. Otherwise it returns the embedding direction.
TODO: use separate type for "strong" directionality.
getStrongTypeN0 maps character's directional code to strong type as required
by rule N0.
TODO: have separate type for "strong" directionality.
locateBrackets locates matching bracket pairs according to BD16.
This implementation uses a linked list instead of a stack, because, while
elements are added at the front (like a push) they are not generally removed
in atomic 'pop' operations, reducing the benefit of the stack archetype.
matchOpener reports whether characters at given positions form a matching
bracket pair.
resolveBrackets implements rule N0 for a list of pairs.
(*bracketPairer) setBracketsToType(loc bracketPair, dirPair Class, initialTypes []Class)
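The linked-list approach described for locateBrackets can be sketched with the standard library alone. This is a simplified illustration, not the package's code: `pair`, `closerToOpener`, `isOpener`, and `locatePairs` are hypothetical names, only three ASCII bracket pairs are recognized, and the 63-element limit and canonical-equivalence handling of BD16 are left out. Openers are pushed at the front of the list; a closer searches from the front for its opener and, on a match, discards that opener together with every opener pushed after it.

```go
package main

import (
	"container/list"
	"fmt"
	"sort"
)

// pair records the indexes of a matched opening and closing bracket.
type pair struct{ opener, closer int }

// closerToOpener is a tiny illustrative table; the real data comes from
// the Unicode Bidi_Paired_Bracket property.
var closerToOpener = map[rune]rune{')': '(', ']': '[', '}': '{'}

func isOpener(r rune) bool { return r == '(' || r == '[' || r == '{' }

// locatePairs returns the matched bracket pairs in text, sorted by the
// position of the opening bracket.
func locatePairs(text []rune) []pair {
	openers := list.New() // indexes of unmatched openers, most recent first
	var pairs []pair
	for i, r := range text {
		if isOpener(r) {
			openers.PushFront(i)
			continue
		}
		want, ok := closerToOpener[r]
		if !ok {
			continue // not a bracket
		}
		// Search from the most recent opener for one that matches.
		for e := openers.Front(); e != nil; e = e.Next() {
			idx := e.Value.(int)
			if text[idx] != want {
				continue
			}
			pairs = append(pairs, pair{opener: idx, closer: i})
			// Drop every opener pushed after the match, then the match itself.
			for openers.Front() != e {
				openers.Remove(openers.Front())
			}
			openers.Remove(e)
			break
		}
	}
	sort.Slice(pairs, func(a, b int) bool { return pairs[a].opener < pairs[b].opener })
	return pairs
}

func main() {
	fmt.Println(locatePairs([]rune("a(b[c)d]"))) // [{1 5}]
	fmt.Println(locatePairs([]rune("((a)[b])"))) // [{0 7} {1 3} {4 6}]
}
```

In the first input the `[` at index 3 is discarded when `)` matches the earlier `(`, so the trailing `]` stays unpaired. This multi-element removal is the "non-atomic pop" the description refers to.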
bracketPairs is a slice of bracketPairs with a sort.Interface implementation.
(bracketPairs) Len() int
(bracketPairs) Less(i, j int) bool
(bracketPairs) Swap(i, j int)
bracketPairs : sort.Interface
Bidi_Paired_Bracket_Type
BD14. An opening paired bracket is a character whose
Bidi_Paired_Bracket_Type property value is Open.
BD15. A closing paired bracket is a character whose
Bidi_Paired_Bracket_Type property value is Close.
func newParagraph(types []Class, pairTypes []bracketType, pairValues []rune, levels level) (*paragraph, error)
func validatePbTypes(pairTypes []bracketType) error
func validatePbValues(pairValues []rune, pairTypes []bracketType) error
const bpClose
const bpNone
const bpOpen
    eos Class
    // indexes to the original string
    level level
    p *paragraph
    // resolved levels after application of rules
    sos Class
    // type of each character using the index
(*isolatingRunSequence) Len() int
Applies the levels and types resolved in rules W1-I2 to the
resultLevels array.
Algorithm validation. Assert that all values in types are in the
provided set.
Return the limit of the run consisting only of the types in validSet
starting at index. This checks the value at index, and will return
index if that value is not in validSet.
7) resolving implicit embedding levels Rules I1, I2.
6) resolving neutral types Rules N1-N2.
Resolving weak types Rules W1-W7.
Note that some weak types (EN, AN) remain after this processing is
complete.
func resolvePairedBrackets(s *isolatingRunSequence)
A paragraph contains the state of a paragraph.
    // default: = implicitLevel;
    initialTypes []Class
Index of matching isolate initiator for PDI characters. For other
characters, and for PDIs with no matching isolate initiator, the value of
matchingIsolateInitiator will be set to -1.
Index of matching PDI for isolate initiator characters. For other
characters, the value of matchingPDI will be set to -1. For isolate
initiators with no matching PDI, matchingPDI will be set to the length of
the input string.
Arrays of properties needed for paired bracket evaluation in N0
    // paired Bracket types for paragraph
    // rune for opening bracket or pbOpen and pbClose; 0 for pbNone
    resultLevels []level
at the paragraph levels
(*paragraph) Len() int
Assign level information to characters removed by rule X9. This is for
ease of relating the level information to the original input data. Note
that the levels assigned to these codes are arbitrary, they're chosen so
as to avoid breaking level runs.
Determine explicit levels using rules X1 - X8
Definition BD13. Determine isolating run sequences.
determineLevelRuns returns an array of level runs. Each level run is
described as an array of indexes into the input string.
Determines the level runs. Rule X9 will be applied in determining the
runs, in the way that makes sure the characters that are supposed to be
removed are not included in the runs.
determineMatchingIsolates determines the matching PDI for each isolate
initiator and vice versa.
Definition BD9.
At the end of this function:
- The member variable matchingPDI is set to point to the index of the
matching PDI character for each isolate initiator character. If there is
no matching PDI, it is set to the length of the input text. For other
characters, it is set to -1.
- The member variable matchingIsolateInitiator is set to point to the
index of the matching isolate initiator character for each PDI character.
If there is no matching isolate initiator, or the character is not a PDI,
it is set to -1.
determineParagraphEmbeddingLevel reports the resolved paragraph direction of
the substring limited by the given range [start, end).
Determines the paragraph level based on rules P2, P3. This is also used
in rule X5c to find if an FSI should resolve to LRI or RLI.
getLevels computes levels array breaking lines at offsets in linebreaks.
Rule L1.
The linebreaks array must include at least one value. The values must be
in strictly increasing order (no duplicates) between 1 and the length of
the text, inclusive. The last value must be the length of the text.
getReordering returns the reordering of lines from a visual index to a
logical index for line breaks at the given offsets.
Lines are concatenated from left to right. So for example, the fifth
character from the left on the third line is
    getReordering(linebreaks)[linebreaks[1] + 4]
(linebreaks[1] is the position after the last character of the second
line, which is also the index of the first character on the third line,
and adding four gets the fifth character from the left).
The linebreaks array must include at least one value. The values must be
in strictly increasing order (no duplicates) between 1 and the length of
the text, inclusive. The last value must be the length of the text.
Rule X10, second bullet: Determine the start-of-sequence (sos) and end-of-sequence (eos) types,
either L or R, for each isolating run sequence.
The algorithm. Does not include line-based processing (Rules L1, L2).
These are applied later in the line-based phase of the algorithm.
func newParagraph(types []Class, pairTypes []bracketType, pairValues []rune, levels level) (*paragraph, error)
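As an illustration of rules P2 and P3, which determineParagraphEmbeddingLevel is described as applying, the following sketch finds the first strong character that is not inside an isolate and derives the paragraph level from it. It is a simplified stand-in under stated assumptions: the `class` type, its constants, and `paragraphLevel` are hypothetical, and the real function works on a [start, end) range using the precomputed matchingPDI indexes rather than a depth counter.

```go
package main

import "fmt"

// class is a hypothetical, reduced stand-in for the Bidi classes.
type class int

const (
	other       class = iota // anything that is neither strong nor an isolate control
	strongL                  // L
	strongR                  // R
	strongAL                 // AL
	isolateInit              // LRI, RLI or FSI
	pdi                      // PDI
)

// paragraphLevel returns 0 (left-to-right) or 1 (right-to-left).
// P2: use the first strong character, skipping text between an isolate
// initiator and its matching PDI (or the end of the text if unmatched).
// P3: L gives level 0, R or AL gives level 1; with no strong character
// the level defaults to 0.
func paragraphLevel(types []class) int {
	depth := 0 // number of currently open isolates
	for _, t := range types {
		switch t {
		case isolateInit:
			depth++
		case pdi:
			if depth > 0 {
				depth--
			}
		case strongL:
			if depth == 0 {
				return 0
			}
		case strongR, strongAL:
			if depth == 0 {
				return 1
			}
		}
	}
	return 0
}

func main() {
	fmt.Println(paragraphLevel([]class{other, strongAL, strongL}))             // 1
	fmt.Println(paragraphLevel([]class{isolateInit, strongR, pdi, strongL})) // 0
}
```

An unmatched isolate initiator leaves the depth above zero for the rest of the input, so every later strong character is skipped, which matches the "end of the paragraph" clause of P2.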
Package-Level Functions (total 23, in which 6 are exported)
AppendReverse reverses the order of characters of in, appends them to out,
and returns the result. Modifiers will still follow the runes they modify.
Brackets are replaced with their counterparts.
DefaultDirection sets the default direction for a Paragraph. The direction is
overridden if the text contains directional characters.
Lookup returns properties for the first rune in s and the width in bytes of
its encoding. The size will be 0 if s does not hold enough bytes to complete
the encoding.
LookupRune returns properties for r.
LookupString returns properties for the first rune in s and the width in
bytes of its encoding. The size will be 0 if s does not hold enough bytes to
complete the encoding.
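The zero-size convention shared by Lookup and LookupString can be illustrated with the standard library alone. The sketch below is not the package's trie lookup; `firstRuneWidth` is a hypothetical helper that uses unicode/utf8 to report the width of the first encoded rune, or 0 when the input ends before the encoding is complete.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRuneWidth is a hypothetical helper: it returns the width in bytes of
// the first UTF-8 encoding in s, or 0 if s does not hold enough bytes to
// complete that encoding.
func firstRuneWidth(s string) int {
	if !utf8.FullRuneInString(s) {
		return 0
	}
	_, size := utf8.DecodeRuneInString(s)
	return size
}

func main() {
	fmt.Println(firstRuneWidth("a"))        // 1: ASCII letter
	fmt.Println(firstRuneWidth("\u05d0bc")) // 2: Hebrew alef takes two bytes
	fmt.Println(firstRuneWidth("\xd7"))     // 0: truncated two-byte sequence
}
```

A caller reading from a stream can treat a size of 0 as a signal to fetch more bytes before retrying the lookup.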
ReverseString reverses the order of characters in s and returns a new string.
Modifiers will still follow the runes they modify. Brackets are replaced with
their counterparts.
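The bracket-swapping part of this behaviour can be sketched without the package. The code below is an assumption-laden illustration, not ReverseString itself: `mirror` is a small hand-written table covering three ASCII pairs, `reverseWithBrackets` is a hypothetical name, and modifiers (combining marks) are not treated specially.

```go
package main

import "fmt"

// mirror is an illustrative table of bracket counterparts; the real
// package derives this information from Unicode data.
var mirror = map[rune]rune{
	'(': ')', ')': '(',
	'[': ']', ']': '[',
	'{': '}', '}': '{',
}

// reverseWithBrackets reverses the runes of s and replaces each bracket
// with its counterpart so that pairs still open and close correctly.
func reverseWithBrackets(s string) string {
	in := []rune(s)
	out := make([]rune, len(in))
	for i, r := range in {
		if m, ok := mirror[r]; ok {
			r = m
		}
		out[len(in)-1-i] = r
	}
	return string(out)
}

func main() {
	fmt.Println(reverseWithBrackets("a(b)c")) // c(b)a
}
```

Without the swap, reversing "a(b)c" would give "c)b(a", which no longer reads as a bracketed group.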
Return multiline reordering array for a given level array. Reordering
does not occur across a line break.
Return reordering array for a given level array. This reorders a single
line. The reordering is a visual to logical map. For example, the
leftmost char is string.charAt(order[0]). Rule L2.
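A visual-to-logical map of the kind described above can be built from a level array by following rule L2 of UAX #9: from the highest level down to the lowest odd level, reverse every contiguous run of characters at that level or higher. The sketch below is a minimal version of that rule; `reorderLine` is a hypothetical name and the code is not taken from the package.

```go
package main

import "fmt"

// reorderLine returns a visual-to-logical index map for one line:
// the character shown at visual position v is the one at logical
// index order[v].
func reorderLine(levels []int) []int {
	order := make([]int, len(levels))
	highest, lowestOdd := 0, -1
	for i, lv := range levels {
		order[i] = i
		if lv > highest {
			highest = lv
		}
		if lv%2 == 1 && (lowestOdd == -1 || lv < lowestOdd) {
			lowestOdd = lv
		}
	}
	if lowestOdd == -1 {
		return order // no right-to-left levels, nothing to reverse
	}
	for lv := highest; lv >= lowestOdd; lv-- {
		for i := 0; i < len(order); {
			if levels[order[i]] < lv {
				i++
				continue
			}
			// Find the end of the run at level lv or higher and reverse it.
			j := i
			for j < len(order) && levels[order[j]] >= lv {
				j++
			}
			for a, b := i, j-1; a < b; a, b = a+1, b-1 {
				order[a], order[b] = order[b], order[a]
			}
			i = j
		}
	}
	return order
}

func main() {
	// Upper-case letters stand in for right-to-left characters.
	logical := []rune("abCDEf")
	order := reorderLine([]int{0, 0, 1, 1, 1, 0})
	visual := make([]rune, len(order))
	for v, l := range order {
		visual[v] = logical[l]
	}
	fmt.Println(order, string(visual)) // [0 1 4 3 2 5] abEDCf
}
```

For a multiline variant, the same function could be applied to each line's slice of levels separately, with the line's start offset added to the resulting indexes, since reordering does not cross a line break.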
isRemovedByX9 reports whether the type is one of the types removed in X9.
isWhitespace reports whether the type is considered a whitespace type for the
line break rules.
newParagraph initializes a paragraph. The user needs to supply a few arrays
corresponding to the preprocessed text input. The types correspond to the
Unicode BiDi classes for each rune. pairTypes indicates the bracket type for
each rune. pairValues provides a unique bracket class identifier for each
rune (suggested is the rune of the open bracket for opening and matching
close brackets, after normalization). The embedding levels are optional, but
may be supplied to encode embedding levels of styled text.
resolvePairedBrackets runs the paired bracket part of the UBA algorithm.
For each rune, it takes the indexes into the original string, the class, the
bracket type (in pairTypes) and the bracket identifier (pairValues). It also
takes the direction type for the start-of-sequence and the embedding level.
The identifiers for bracket types are the rune of the canonicalized opening
bracket for brackets (open or close) or 0 for runes that are not brackets.
LeftToRight indicates the text contains no right-to-left characters and
that either there are some left-to-right characters or the option
DefaultDirection(LeftToRight) was passed.
RightToLeft indicates the text contains no left-to-right characters and
that either there are some right-to-left characters or the option
DefaultDirection(RightToLeft) was passed.
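The two definitions above can be read as a small decision table. The sketch below encodes it under assumptions: `direction`, its constants, and `overall` are hypothetical names, and the mixed and neutral outcomes are taken from the list of possible directions (LeftToRight, RightToLeft, Mixed, or Neutral) rather than from definitions given in this section.

```go
package main

import "fmt"

// direction is a hypothetical stand-in for the package's Direction type.
type direction int

const (
	leftToRight direction = iota
	rightToLeft
	mixed
	neutral
)

// overall derives a direction from whether the text contains left-to-right
// or right-to-left characters. def is the default direction, or neutral
// when no default was requested.
func overall(hasLTR, hasRTL bool, def direction) direction {
	switch {
	case hasLTR && hasRTL:
		return mixed
	case hasLTR:
		return leftToRight
	case hasRTL:
		return rightToLeft
	default:
		return def // no directional characters: fall back to the default
	}
}

func main() {
	fmt.Println(overall(true, false, neutral) == leftToRight)      // true
	fmt.Println(overall(false, false, rightToLeft) == rightToLeft) // true
}
```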