# String

A fixed-length sequence of UTF-16 code units.

The String API works very much like JavaScript's (see MDN), with the notable difference that the `string` type is an actual alias of `String`.

## Static members

  • function fromCharCode(unit: i32, surr?: i32): string

    Creates a one-character string from the specified UTF-16 code unit and, if given, a second code unit `surr` (the trailing surrogate of a surrogate pair).

  • function fromCharCodes(units: u16[]): string

    Creates a string from a sequence of UTF-16 code units.

  • function fromCodePoint(code: i32): string

    Creates a one-character string from the specified Unicode code point. Code points above U+FFFF produce a surrogate pair, that is, a string of length 2.

  • function fromCodePoints(codes: i32[]): string

    Creates a string from a sequence of Unicode code points.
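Since the String API mirrors JavaScript's, the relationship between code units and code points can be sketched in plain TypeScript run on a JavaScript engine (an illustration, not AssemblyScript output). JavaScript's `String.fromCharCode` and `String.fromCodePoint` are variadic, whereas the AssemblyScript signatures above take one or two code units, one code point, or an array; the array-based `fromCharCodes` and `fromCodePoints` are approximated here by spreading an array into the variadic functions.

```typescript
// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16
// represents it as a surrogate pair of two code units.
const high = 0xd83d; // leading (high) surrogate
const low = 0xde00; // trailing (low) surrogate

// Building the character from its two code units...
const fromUnits = String.fromCharCode(high, low);
// ...or from its code point yields the same string.
const fromPoint = String.fromCodePoint(0x1f600);

const same = fromUnits === fromPoint; // true
const units = fromPoint.length; // 2: length counts code units

// Array inputs: spread into the variadic JavaScript functions.
const hi = String.fromCharCode(...[0x48, 0x69]); // "Hi"
const mixed = String.fromCodePoint(...[0x48, 0x1f600]); // "H" + U+1F600, length 3

console.log(same, units, hi, mixed.length); // true 2 Hi 3
```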

## Instance members

### Fields

  • readonly length: i32

    The length of the string in UTF-16 code units.
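The TypeScript snippet below shows what measuring length in UTF-16 code units means in practice; JavaScript defines `length` the same way. The `Array.from` comparison is a JavaScript facility used here only to count code points; it is not part of the AssemblyScript API listed on this page.

```typescript
// length counts UTF-16 code units, not characters or code points.
const ascii = "abc";
const emoji = "\ud83d\ude00"; // one code point (U+1F600), two code units
const combined = "e\u0301"; // "e" plus a combining acute accent

const asciiLen = ascii.length; // 3
const emojiLen = emoji.length; // 2
const combinedLen = combined.length; // 2, although it renders as one glyph

// For comparison (JavaScript only): iterating a string yields whole
// code points, so a surrogate pair counts once.
const emojiCodePoints = Array.from(emoji).length; // 1

console.log(asciiLen, emojiLen, combinedLen, emojiCodePoints); // 3 2 2 1
```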

### Methods

  • function at(pos: i32): string

    Gets the UTF-16 code unit at the specified position as a single-character string. Both positive and negative positions are allowed; negative positions count back from the end of the string, so -1 refers to the last code unit.

  • function charAt(pos: i32): string

    Gets the UTF-16 code unit at the specified position as a single-character string. Returns "" (the empty string) if out of bounds.

  • function charCodeAt(pos: i32): i32

    Gets the UTF-16 code unit at the specified position as a number. Returns -1 if out of bounds.

  • function codePointAt(pos: i32): i32

    Gets the Unicode code point at the specified position (given in UTF-16 code units) as a number, combining a surrogate pair that starts at that position into a single code point. Returns -1 if out of bounds.

  • function concat(other: string): string

    Concatenates this string with another string, in that order, and returns the resulting string.

  • function endsWith(search: string, end?: i32): bool

    Tests whether the string ends with the specified string. If specified, `end` is treated as the length of the string, so the test behaves as if the string ended at that position.

  • function includes(search: string, start?: i32): bool

    Tests whether the string contains the search string. If specified, `start` indicates the position at which to begin searching.

  • function indexOf(search: string, start?: i32): i32

    Gets the index of the first occurrence of the specified search string within the string, or -1 if not found. If specified, `start` indicates the position at which to begin searching.

  • function lastIndexOf(search: string, start?: i32): i32

    Gets the index of the last occurrence of the specified search string within the string, or -1 if not found. If specified, `start` indicates the position at which to begin searching from right to left.

  • function padStart(length: i32, pad: string): string

    Pads the start of the string with the contents of `pad`, possibly multiple times, until the result reaches the specified `length`, and returns the resulting string.

  • function padEnd(length: i32, pad: string): string

    Pads the end of the string with the contents of `pad`, possibly multiple times, until the result reaches the specified `length`, and returns the resulting string.

  • function repeat(count?: i32): string

    Repeats the string `count` times and returns the concatenated result.

  • function replace(search: string, replacement: string): string

    Replaces the first occurrence of `search` with `replacement`.

  • function replaceAll(search: string, replacement: string): string

    Replaces all occurrences of `search` with `replacement`.

  • function slice(start: i32, end?: i32): string

    Returns the region of the string from `start` (inclusive) to `end` (exclusive) as a new string. If omitted, `end` defaults to the end of the string.

  • function split(separator?: string, limit?: i32): string[]

    Splits the string at each occurrence of `separator` and returns the result as a new array of strings with at most `limit` elements. If `limit` is omitted, no limit is assumed. If `separator` is omitted or not present in the string, the string becomes the sole element of the array. If `separator` is an empty string, the split is performed between code units (not code points), potentially breaking surrogate pairs apart.

  • function startsWith(search: string, start?: i32): bool

    Tests whether the string starts with the specified string. If specified, `start` indicates the position at which to begin the test, which is then treated as the start of the string.

  • function substring(start: i32, end?: i32): string

    Gets the part of the string between `start` (inclusive) and `end` (exclusive) as a new string. If omitted, `end` defaults to the end of the string.

  • function toString(): this

    Returns the string itself.

  • function trim(): string

    Removes whitespace characters from both the start and the end of the string and returns the resulting string.

  • function trimStart(): string
    function trimLeft(): string

    Removes whitespace characters from the start of the string and returns the resulting string.

  • function trimEnd(): string
    function trimRight(): string

    Removes whitespace characters from the end of the string and returns the resulting string.
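The position-based and search methods above can be tried out in plain TypeScript, since for in-bounds arguments their results match JavaScript's. The documented difference is the out-of-bounds convention: AssemblyScript's `charCodeAt` and `codePointAt` return -1, whereas JavaScript returns `NaN` and `undefined`. The helpers `charCodeAtAS` and `codePointAtAS` below are hypothetical and only model that convention; they belong to neither API. `at` is left out because it needs an ES2022 library target when type-checked.

```typescript
// Hypothetical helpers modelling AssemblyScript's documented
// out-of-bounds result (-1); JavaScript itself returns NaN / undefined.
function charCodeAtAS(s: string, pos: number): number {
  return pos >= 0 && pos < s.length ? s.charCodeAt(pos) : -1;
}

function codePointAtAS(s: string, pos: number): number {
  const cp = s.codePointAt(pos);
  return cp === undefined ? -1 : cp;
}

// Code units: 'a', 0xD83D, 0xDE00, 'b'
const s = "a\ud83d\ude00b";

const loneHigh = s.charAt(1) === "\ud83d"; // true: half of the pair
const outOfBounds = s.charAt(10); // "" (empty string)
const unit = charCodeAtAS(s, 1); // 0xd83d
const pair = codePointAtAS(s, 1); // 0x1f600: the pair is combined
const lowHalf = codePointAtAS(s, 2); // 0xde00: trailing surrogate alone
const missing = charCodeAtAS(s, 4); // -1

const text = "hello world";
const first = text.indexOf("o"); // 4
const next = text.indexOf("o", 5); // 7
const last = text.lastIndexOf("o"); // 7
const lastBefore = text.lastIndexOf("o", 5); // 4: searches leftwards from 5
const inTail = text.includes("world", 7); // false: only "orld" remains
const endsEarly = text.endsWith("hello", 5); // true: as if length were 5
const startsLate = text.startsWith("world", 6); // true: as if it began at 6
const joined = "foo".concat("bar"); // "foobar"
```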
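The slicing, splitting and transforming methods can likewise be tried in plain TypeScript. The results shown are JavaScript's; this page states that AssemblyScript's String API works very much like it, but edge cases such as negative arguments to `slice` and `substring` are assumed to behave identically here and are worth checking against your compiler version. When type-checked with tsc, `padStart`, `trimStart` and `replaceAll` need ES2017, ES2019 and ES2021 library targets respectively.

```typescript
const word = "hello";

// slice counts negative positions from the end...
const middle = word.slice(1, 3); // "el"
const tail = word.slice(-3); // "llo"
// ...while substring clamps negatives to 0 and swaps start > end.
const whole = word.substring(-3); // "hello"
const swapped = word.substring(3, 1); // "el"

// split: limit caps the number of elements; a separator that does not
// occur yields the whole string as the sole element.
const parts = "a,b,c".split(","); // ["a", "b", "c"]
const limited = "a,b,c".split(",", 2); // ["a", "b"]
const unsplit = "abc".split(";"); // ["abc"]
// An empty separator splits between code units, breaking the pair apart.
const units = "a\ud83d\ude00".split(""); // ["a", "\ud83d", "\ude00"]

// Padding repeats the pad string and truncates it to fit.
const zeroPadded = "5".padStart(3, "0"); // "005"
const endPadded = "ab".padEnd(5, "xy"); // "abxyx"

const repeated = "ab".repeat(3); // "ababab"
const once = "a-b-c".replace("-", "+"); // "a+b-c": first occurrence only
const all = "a-b-c".replaceAll("-", "+"); // "a+b+c"

const spaced = "  hi  ";
const both = spaced.trim(); // "hi"
const leading = spaced.trimStart(); // "hi  "
const trailing = spaced.trimEnd(); // "  hi"
```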

## Encoding API

### UTF8

When integrating with an environment that uses UTF-8, the following helpers can be used to quickly re-encode String data.

  • function String.UTF8.byteLength(str: string, nullTerminated?: bool): i32

    Calculates the byte length of the specified string when encoded as UTF-8, optionally null terminated.

  • function String.UTF8.encode(str: string, nullTerminated?: bool): ArrayBuffer

    Encodes the specified string to UTF-8 bytes, optionally null terminated.

  • function String.UTF8.encodeUnsafe(str: usize, len: i32, buf: usize, nullTerminated?: bool): usize

    Encodes the specified raw string to UTF-8 bytes, optionally null terminated. Returns the number of bytes written.

  • function String.UTF8.decode(buf: ArrayBuffer, nullTerminated?: bool): string

    Decodes the specified buffer from UTF-8 bytes to a string, optionally null terminated.

  • function String.UTF8.decodeUnsafe(
      buf: usize,
      len: usize,
      nullTerminated?: bool
    ): string

    Decodes raw UTF-8 bytes to a string, optionally null terminated.
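To make concrete what `String.UTF8.byteLength` counts, here is a TypeScript sketch of the same computation over a string's UTF-16 code units. The function name `utf8ByteLength` is hypothetical and this is not AssemblyScript's implementation. Counting an unpaired surrogate as 3 bytes is an assumption: that is the size of both its WTF-8 encoding and the U+FFFD replacement character.

```typescript
// Hypothetical sketch: number of bytes a string occupies when encoded
// as UTF-8, optionally including a trailing null byte.
function utf8ByteLength(str: string, nullTerminated = false): number {
  let bytes = 0;
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    if (c < 0x80) {
      bytes += 1; // U+0000..U+007F (ASCII)
    } else if (c < 0x800) {
      bytes += 2; // U+0080..U+07FF
    } else if (
      c >= 0xd800 && c <= 0xdbff &&
      i + 1 < str.length &&
      (str.charCodeAt(i + 1) & 0xfc00) === 0xdc00
    ) {
      bytes += 4; // surrogate pair: one supplementary code point
      i++; // skip the trailing surrogate
    } else {
      bytes += 3; // rest of the BMP, including (by assumption) lone surrogates
    }
  }
  return nullTerminated ? bytes + 1 : bytes;
}

const plain = utf8ByteLength("abc"); // 3
const plainNul = utf8ByteLength("abc", true); // 4
const mixedBytes = utf8ByteLength("\u00e9\u20ac"); // 5 (2 + 3)
const emojiBytes = utf8ByteLength("\ud83d\ude00"); // 4
const loneBytes = utf8ByteLength("\ud800"); // 3
```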

Tip

Any ArrayBuffer return value is internally a pointer to the buffer's data and can therefore be passed directly to, for example, a C function. However, if the pointer is meant to outlive the immediate external function call, the buffer's lifetime must be tracked so that it is not collected prematurely, which would leave the data invalid.

### UTF16

The following helpers mainly exist to provide a safe way to copy between Strings and ArrayBuffers; unlike the UTF8 helpers, they do not involve a re-encoding step.

  • function String.UTF16.byteLength(str: string): i32

    Calculates the byte length of the specified string when encoded as UTF-16.

  • function String.UTF16.encode(str: string): ArrayBuffer

    Encodes the specified string to UTF-16 bytes.

  • function String.UTF16.encodeUnsafe(str: usize, len: i32, buf: usize): usize

    Encodes the specified raw string to UTF-16 bytes. Returns the number of bytes written.

  • function String.UTF16.decode(buf: ArrayBuffer): string

    Decodes the specified buffer from UTF-16 bytes to a string.

  • function String.UTF16.decodeUnsafe(buf: usize, len: usize): string

    Decodes raw UTF-16 bytes to a string.
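Because no re-encoding takes place, these helpers amount to copying code units to and from bytes. The TypeScript sketch below models that copy with a `DataView`, assuming little-endian byte order (WebAssembly linear memory is little-endian). The names `utf16Encode` and `utf16Decode` are hypothetical stand-ins for `String.UTF16.encode` and `String.UTF16.decode`, not their implementation. Since nothing is re-encoded, even an unpaired surrogate survives the round trip.

```typescript
// Hypothetical stand-in for String.UTF16.encode: copy each code unit
// into the buffer as two little-endian bytes.
function utf16Encode(str: string): ArrayBuffer {
  const buf = new ArrayBuffer(str.length * 2); // 2 bytes per code unit
  const view = new DataView(buf);
  for (let i = 0; i < str.length; i++) {
    view.setUint16(i * 2, str.charCodeAt(i), true);
  }
  return buf;
}

// Hypothetical stand-in for String.UTF16.decode: read the code units back.
// In this sketch a trailing odd byte, if any, is ignored.
function utf16Decode(buf: ArrayBuffer): string {
  const view = new DataView(buf);
  let out = "";
  for (let i = 0; i + 1 < buf.byteLength; i += 2) {
    out += String.fromCharCode(view.getUint16(i, true));
  }
  return out;
}

const original = "a\ud800b"; // contains an unpaired high surrogate
const encoded = utf16Encode(original);
const encodedBytes = encoded.byteLength; // 6
const roundTrip = utf16Decode(encoded); // identical to original
const firstByte = new Uint8Array(encoded)[0]; // 0x61 ("a", low byte first)
```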

## Considerations

TL;DR: AssemblyScript does strings like JavaScript, but...

AssemblyScript strings purposely share their semantics with JavaScript strings, including that isolated surrogates can occur and are not eagerly sanitized. This is done for two reasons. First, as the Unicode Standard, Version 13.0, states in §2.7 Unicode Strings:

> Depending on the programming environment, a Unicode string may or may not be required to be in the corresponding Unicode encoding form. For example, strings in Java, C#, or ECMAScript are Unicode 16-bit strings, but are not necessarily well-formed UTF-16 sequences. In normal processing, it can be far more efficient to allow such strings to contain code unit sequences that are not well-formed UTF-16, that is, isolated surrogates. Because strings are such a fundamental component of every program, checking for isolated surrogates in every operation that modifies strings can create significant overhead, especially because supplementary characters are extremely rare as a percentage of overall text in programs worldwide.

Second, sanitization is also undesirable between AssemblyScript and JavaScript, or between two AssemblyScript modules, where it can be critical to maintain equality, inequality and hash integrity of idiomatic strings when function calls cross JS<->AS or AS<->AS boundaries. For example, it is common for an application to be composed of both AssemblyScript and JavaScript code, or for an AssemblyScript module to replace a JavaScript module, and two AssemblyScript modules are expected to be able to exchange strings; in all of these cases, eagerly sanitizing (mutating) strings would be unnecessary and dangerous. Also, ESM integration aims to make JavaScript and WebAssembly modules more broadly interchangeable, which can only be achieved safely when compatibility is maintained.
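The equality concern can be illustrated in TypeScript. The hypothetical `sanitize` helper below performs the kind of sanitization discussed here, replacing every unpaired surrogate with U+FFFD (the replacement WebIDL applies when converting to USVString). Applied eagerly at a boundary, it maps distinct strings to the same value, so equality and set or map membership no longer survive the crossing.

```typescript
// Hypothetical sanitizer: replaces every unpaired surrogate with U+FFFD
// and leaves well-formed surrogate pairs untouched.
function sanitize(s: string): string {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const c = s.charCodeAt(i);
    const isHigh = c >= 0xd800 && c <= 0xdbff;
    const isLow = c >= 0xdc00 && c <= 0xdfff;
    if (isHigh && i + 1 < s.length) {
      const next = s.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        out += s.slice(i, i + 2); // keep the valid pair as-is
        i++;
        continue;
      }
    }
    out += isHigh || isLow ? "\ufffd" : s.charAt(i);
  }
  return out;
}

// Two distinct strings that differ only in an unpaired surrogate.
const keyA: string = "key\ud800";
const keyB: string = "key\udbff";

const distinctBefore = keyA !== keyB; // true
const equalAfter = sanitize(keyA) === sanitize(keyB); // true: both "key\ufffd"
const pairKept = sanitize("\ud83d\ude00") === "\ud83d\ude00"; // true

// As set members (or map keys), the strings stay distinct only if
// they are passed along unmodified.
const setBefore = new Set([keyA, keyB]).size; // 2
const setAfter = new Set([keyA, keyB].map((k) => sanitize(k))).size; // 1
```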

Note that this stance differs from what the majority within the WebAssembly CG has decided for Interface Types and the Component Model, including its Web integration, despite our concerns (1, 2). We believe that sanitization should only be performed where unavoidable, and ideally as late as possible, matching what Web APIs do today (for example, right when calling an HTTP API, but no earlier), so as not to impose unresolvable problems on the many popular languages that already match what JavaScript and the Web do. As such, we would prefer that Interface Types and similar or related proposals revise their approach to follow WebIDL's precedent, where DOMString represents the general concept and USVString is the special case. As WebIDL states:

> When in doubt, use DOMString.

Of course, we agree that USVString should be supported in some capacity to match what Rust and C++ do. But making it the only supported concept would break not only the other half of the ecosystem at a fundamental level, but also the Web platform as a whole. This is unnecessary and avoidable, given that sanitization can easily be performed by an interface adapter, only when needed.