Index: docs/accessibility.md |
diff --git a/docs/accessibility.md b/docs/accessibility.md |
index 5029490c48d5376f8a5a9e906038df9c0ee00e67..6b560e02e476385e4e9e5c36e55a09065904bc2e 100644 |
--- a/docs/accessibility.md |
+++ b/docs/accessibility.md |
@@ -1,55 +1,383 @@ |
# Accessibility Overview |
-This document describes how accessibility is implemented throughout Chromium at |
-a high level. |
+Accessibility means ensuring that all users, including users with disabilities, |
+have equal access to software. One piece of this involves basic design |
+principles such as using appropriate font sizes and color contrast, |
+avoiding using color to convey important information, and providing keyboard |
+alternatives for anything that is normally accomplished with a pointing device. |
+However, the majority of accessibility code in Chromium is concerned with |
aboxhall
2016/11/04 17:02:07
Rather than "the majority of accessibility code in
dmazzoni
2016/11/04 22:09:05
Done.
|
+providing full access to Chromium's UI via external accessibility APIs that |
+are utilized by assistive technology. |
+ |
+Assistive technology includes: |
aboxhall
2016/11/04 17:02:07
How about "'Assistive technology' here refers to s
dmazzoni
2016/11/04 22:09:05
Done.
|
+ |
+* Screen readers for blind users that describe the screen using |
+ synthesized speech or braille, |
+* Voice control applications that let you speak to the computer, |
+* Switch access that lets you control the computer with a small number |
+ of physical switches, |
+* Magnifiers that magnify a portion of the screen, and often highlight the |
+ cursor and caret for easier viewing, and |
+* Assistive learning and literacy software that helps users who have a hard |
+ time reading print, by highlighting and/or speaking selected text. |
+ |
+In addition, because accessibility APIs provide a convenient and universal |
+way to explore and control applications, they're often used by automated |
+testing scripts and by UI automation software such as password managers. |
+ |
+Web browsers play an important role in this ecosystem because they need |
+to not only provide access to their own UI, but also provide access to |
+all of the content of the web. |
+ |
+Each operating system has its own native accessibility API. While the |
+core APIs tend to be well-documented, it's unfortunately common for |
+screen readers in particular to depend on additional undocumented or |
+vendor-specific APIs in order to fully function, especially with web |
+browsers, because the standard APIs are insufficient to handle the |
+complexity of the web. |
+ |
+Chromium needs to support all of these operating system and |
+vendor-specific accessibility APIs in order to be usable with the full |
+ecosystem of assistive technology on all platforms. Just as Chromium |
+sometimes mimics the quirks and bugs of older browsers, Chromium often |
+needs to mimic the quirks and bugs of other browsers' implementations |
+of accessibility APIs, too. |
## Concepts |
-The three central concepts of accessibility are: |
+While each operating system and vendor accessibility API is different, |
+there are some concepts all of them share. |
1. The *tree*, which models the entire interface as a tree of objects, exposed |
- to screenreaders or other accessibility software; |
-2. *Events*, which let accessibility software know that a part of the tree has |
+ to assistive technology via accessibility APIs; |
+2. *Events*, which let assistive technology know that a part of the tree has |
changed somehow; |
-3. *Actions*, which come from accessibility software and ask the interface to |
+3. *Actions*, which come from assistive technology and ask the interface to |
change. |
-Here's an example of an accessibility tree looks like. The following HTML: |
+Consider the following small HTML file: |
``` |
-<select title="Select A"> |
- <option value="1">Option 1</option> |
- <option value="2" selected>Option 2</option> |
- <option value="3">Option 3</option> |
-</select> |
+<html> |
+<head> |
+ <title>How old are you?</title> |
+</head> |
+<body> |
+ <label for="age">Age</label> |
+ <input id="age" type="number" name="age" value="42"> |
+ <div> |
+ <button>Back</button> |
+ <button>Next</button> |
+ </div> |
+</body> |
+</html> |
``` |
-has a generated accessibility tree like this: |
+### The Accessibility Tree and Accessibility Attributes |
+ |
+Internally, Chromium represents the accessibility tree for that web page |
+using a data structure something like this: |
``` |
-0: AXMenuList title="Select A" |
-1: AXMenuListOption title="Option 1" |
-2: AXMenuListOption title="Option 2" selected |
-3: AXMenuListOption title="Option 3" |
+id=1 role=WebArea name="How old are you?" |
+ id=2 role=Label name="Age" |
+ id=3 role=TextField labelledByIds=[2] value="42" |
+ id=4 role=Group |
+ id=5 role=Button name="Back" |
+ id=6 role=Button name="Next" |
``` |
-Given that accessibility tree, an example of the events generated when selecting |
-"Option 1" might be: |
+Note that the tree structure closely resembles the structure of the |
+HTML elements, but is slightly simplified. Each node in the accessibility |
+tree has an ID and a role. Many have a name. The text field has a value, |
+and instead of a name it has labelledByIds, which indicates that its |
+accessible name comes from another node in the tree, the label node |
+with id=2. |
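
To make the `labelledByIds` indirection concrete, here is a minimal sketch of how an accessible name could be resolved for the example tree. The `Node` struct, `BuildExampleTree`, and `ComputeName` are hypothetical names invented for illustration and are not Chromium's real classes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified node type (an assumption, not Chromium's class).
struct Node {
  int32_t id;
  std::string role;
  std::string name;                      // Empty if there is no direct name.
  std::string value;
  std::vector<int32_t> labelled_by_ids;  // Nodes that supply the name.
  std::vector<int32_t> child_ids;
};

using Tree = std::map<int32_t, Node>;

// Builds the six-node example tree shown above.
Tree BuildExampleTree() {
  Tree tree;
  tree[1] = {1, "WebArea", "How old are you?", "", {}, {2, 3, 4}};
  tree[2] = {2, "Label", "Age", "", {}, {}};
  tree[3] = {3, "TextField", "", "42", {2}, {}};
  tree[4] = {4, "Group", "", "", {}, {5, 6}};
  tree[5] = {5, "Button", "Back", "", {}, {}};
  tree[6] = {6, "Button", "Next", "", {}, {}};
  return tree;
}

// Returns the node's own name if it has one; otherwise joins the names of
// the nodes listed in labelled_by_ids, separated by spaces.
std::string ComputeName(const Tree& tree, int32_t id) {
  auto it = tree.find(id);
  assert(it != tree.end());
  const Node& node = it->second;
  if (!node.name.empty())
    return node.name;
  std::string result;
  for (int32_t label_id : node.labelled_by_ids) {
    auto label = tree.find(label_id);
    if (label == tree.end())
      continue;  // Dangling reference: skip it.
    if (!result.empty())
      result += " ";
    result += label->second.name;
  }
  return result;
}
```

With this sketch, asking for the name of node 3 (the text field) follows the reference to node 2 and yields the label's text.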
+ |
+On a particular platform, each node in the accessibility tree is implemented |
+by an object that conforms to a particular protocol. |
+ |
+On Windows, the root element implements the IAccessible protocol and |
+if you call IAccessible::get_accRole, it returns ROLE_SYSTEM_DOCUMENT, |
+and if you call IAccessible::get_accName, it returns "How old are you?". |
+Other methods let you walk the tree. |
+ |
+On macOS, the root element implements the NSAccessibility protocol and |
+if you call accessibilityRole(), it returns @"AXWebArea", and if you |
Elly Fong-Jones
2016/11/04 15:59:06
objc methods are generally referenced like -[NSAcc
dmazzoni
2016/11/04 22:09:05
Good idea
|
+call accessibilityLabel(), it returns "How old are you?". |
+ |
Elly Fong-Jones
2016/11/04 15:59:06
what (if anything) happens on Linux?
dmazzoni
2016/11/04 22:09:05
Done.
|
+So while the details of the interface vary, the underlying concepts are |
+similar. Both IAccessible and NSAccessibility have a concept of a role, |
+but IAccessible uses a role of "document" for a web page, while NSAccessibility |
+uses a role of "web area". Both IAccessible and NSAccessibility have a |
+concept of the primary accessible text for a node, but IAccessible calls |
+it the "name" while NSAccessibility calls it the "label". |
+ |
+**Historical note:** The internal names of roles and attributes in |
+Chromium tend to match the macOS accessibility API most closely |
+because Chromium was originally based on WebKit, where most of the |
+accessibility code was written by Apple. Over time we're slowly |
+migrating internal names to match what those roles and attributes are |
+called in web accessibility standards, like ARIA. |
+ |
+### Accessibility Events |
+ |
+In Chromium's internal terminology, an Accessibility Event always represents |
+communication from the app to the assistive technology, indicating that the |
+accessibility tree changed in some way. |
+ |
+As an example, if the user were to press the Tab key and the text |
+field from the example above became focused, Chromium would fire a |
+"focus" accessibility event that assistive technology could listen |
+to. A screen reader might then announce the name and current value of |
+the text field. A magnifier might zoom the screen to its bounding |
+box. If the user types some text into the text field, Chromium would |
+fire a "value changed" accessibility event. |
+ |
+As with nodes in the accessibility tree, each platform has a slightly different |
+API for accessibility events. On Windows we'd fire EVENT_OBJECT_FOCUS for |
+a focus change, and on Mac we'd fire @"AXFocusedUIElementChanged". |
+Those are pretty similar. Sometimes they're quite different - to support |
+live regions (notifications that certain key parts of a web page have changed), |
+on Mac we simply fire @"AXLiveRegionChanged", but on Windows we need to |
+fire IA2_EVENT_TEXT_INSERTED and IA2_EVENT_TEXT_REMOVED events individually |
+on each affected node within the changed region, with additional attributes |
+like "container-live:polite" to indicate that the affected node was part of |
+a live region. The point is just to illustrate that the concepts are similar, |
Elly Fong-Jones
2016/11/04 15:59:06
is this last sentence a meta-comment on the rest o
dmazzoni
2016/11/04 22:09:05
Yes, I tried to clarify it a bit. I just didn't wa
|
+but the details of notifying software on each platform about changes can |
+vary quite a bit. |
+ |
+### Accessibility Actions |
+ |
+Each native object that implements a platform's native accessibility API |
+supports a number of actions, which are requests from the assistive |
+technology to control or change the UI. This is the opposite of events, |
+which are messages from Chromium to the assistive technology. |
+ |
+For example, if the user had a voice control application running, such as |
+Voice Access on Android, the user could just speak the name of one of the |
+buttons on the page, like "Next". Upon recognizing that text and finding |
+that it matches one of the UI elements on the page, the voice control |
+app executes the action to click the button id=6 in Chromium's accessibility |
+tree. Internally we call that action "do default" rather than click, since |
+it represents the default action for any type of control. |
+ |
+Other examples of actions include setting focus, changing the value of |
+a control, and scrolling the page. |
+ |
+### Parameterized attributes |
+ |
+In addition to accessibility attributes, events, and actions, native |
+accessibility APIs often have so-called "parameterized attributes". |
+The most common example of this is for text - for example there may be |
+a function to retrieve the bounding box for a range of text, or a |
+function to retrieve the text properties (font family, font size, |
+weight, etc.) at a specific character position. |
+ |
+Parameterized attributes are particularly tricky to implement because |
+of Chromium's multi-process architecture. More on this in the next section. |
+ |
+## Chromium's multi-process architecture |
aboxhall
2016/11/04 17:02:07
A diagram somewhere in here would be very helpful,
dmazzoni
2016/11/04 22:09:05
Sounds great, but I'll save that for a future revi
|
+ |
+Native accessibility APIs tend to have a *functional* interface, where |
+Chromium implements an interface for a canonical accessible object that |
+includes methods to return various attributes, walk the tree, or perform |
+an action like click(), focus(), or setValue(...). |
+ |
+In contrast, the web has a largely *declarative* interface. The shape |
+of the accessibility tree is determined by the DOM tree (occasionally |
+influenced by CSS), and the accessible semantics of a DOM element can |
+be modified by adding ARIA attributes. |
+ |
+One important complication is that all of these native accessibility APIs |
+are *synchronous*, while Chromium is multi-process, with the contents of |
+each web page living in a different process than the process that |
+implements Chromium's UI and the native accessibility APIs. |
+ |
+Chromium's multi-process architecture means that we can't implement |
+accessibility APIs the same way that a single-process browser can - |
+namely, by calling directly into the DOM to compute the result of each |
+API call. For example, on some operating systems there might be an API |
+to get the bounding box for a particular range of characters on the |
+page. In other browsers, this might be implemented by creating a DOM |
+selection object and asking for its bounding box. |
+ |
+That implementation would be impossible in Chromium because it'd require |
+blocking the main thread while waiting for a response from the renderer |
+process that implements that web page's DOM. (Not only is blocking the |
+main thread strictly disallowed, but the latency of doing this for every |
+API call makes it prohibitively slow anyway.) Instead, Chromium takes an |
+approach where a representation of the entire accessibility tree is |
+cached in the main process. Great care needs to be taken to ensure that |
+this representation is as concise as possible. |
+ |
+In Chromium, we build a data structure representing all of the |
+information for a web page's accessibility tree, send the data |
+structure from the renderer process to the main browser process, cache |
+it in the main browser process, and implement native accessibility |
+APIs using solely the information in that cache. |
+ |
+As the accessibility tree changes, tree updates and accessibility events |
+get sent from the renderer process to the browser process. The browser |
+cache is updated atomically in the main thread, so whenever an external |
+client (like assistive technology) calls an accessibility API function, |
+we're always returning something from a complete and consistent snapshot |
+of the accessibility tree. From time to time, the cache may lag behind what's |
+in the renderer process by a fraction of a second. |
+ |
+Here are some of the specific challenges faced by this approach and |
+how we've addressed them. |
+ |
+### Sparse data |
+ |
+There are a *lot* of possible accessibility attributes for any given |
+node in an accessibility tree. For example, there are more than 150 |
+unique accessibility API methods that Chromium implements on the Windows |
+platform alone. We need to implement all of those APIs, many of which |
+request rather rare or obscure attributes, but storing all possible |
+attribute values in a single struct would be quite wasteful. |
+ |
+To avoid each accessible node object containing hundreds of fields, the |
+data for each accessibility node is stored in a relatively compact |
+data structure, ui::AXNodeData. Every AXNodeData has an integer ID, a |
+role enum, and a couple of other mandatory fields, but everything else |
+is stored in attribute arrays, one for each major data type. |
``` |
-AXMenuListItemUnselected 2 |
-AXMenuListItemSelected 1 |
-AXMenuListValueChanged 0 |
+struct AXNodeData { |
+ int32_t id; |
+ AXRole role; |
+ ... |
+ std::vector<std::pair<AXStringAttribute, std::string>> string_attributes; |
+ std::vector<std::pair<AXIntAttribute, int32_t>> int_attributes; |
+ ... |
+}; |
``` |
-An example of a command used to change the selection from "Option 1" to "Option |
-3" might be: |
+So if a text field has a placeholder attribute, we can store |
+that by adding an entry to `string_attributes` with an attribute |
+of ui::AX_ATTR_PLACEHOLDER and the placeholder string as the value. |
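
As a rough illustration of the sparse storage described above, the following sketch stores and looks up string attributes in a vector of pairs. `StringAttribute`, `NodeData`, `AddStringAttribute`, and `GetStringAttribute` are hypothetical stand-ins chosen for this example; they only mirror the shape of the struct shown above.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical attribute keys (assumption: the real enum is much larger).
enum class StringAttribute { kName, kValue, kPlaceholder };

// Simplified node data: mandatory fields plus one sparse attribute array.
struct NodeData {
  int32_t id = 0;
  std::vector<std::pair<StringAttribute, std::string>> string_attributes;
};

// Appends one attribute; nodes that never set it pay no storage cost.
void AddStringAttribute(NodeData* data, StringAttribute attr,
                        const std::string& value) {
  assert(data != nullptr);
  data->string_attributes.emplace_back(attr, value);
}

// Linear scan over the few attributes a node actually has.
// Returns true and fills *value if the attribute is present.
bool GetStringAttribute(const NodeData& data, StringAttribute attr,
                        std::string* value) {
  for (const auto& entry : data.string_attributes) {
    if (entry.first == attr) {
      *value = entry.second;
      return true;
    }
  }
  return false;
}
```

A linear scan is a reasonable trade-off in this sketch because a typical node carries only a handful of attributes, so the vector stays short.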
+ |
+### Incremental tree updates |
+ |
+Web pages change frequently. It'd be terribly inefficient to send a |
+new copy of the accessibility tree every time any part of it changes. |
+However, the accessibility tree can change shape in complicated ways - |
+for example, whole subtrees can be reparented dynamically. |
+ |
+Rather than writing code to deal with every possible way the |
+accessibility tree could be modified, Chromium has a general-purpose |
+tree serializer class that's designed to send small incremental |
+updates of a tree from one process to another. The tree serializer has |
+just a few requirements: |
+ |
+* Every node in the tree must have a unique integer ID. |
+* The tree must be acyclic. |
+* The tree serializer must be notified when a node's data changes. |
+* The tree serializer must be notified when the list of child IDs of a |
+ node changes. |
+ |
+The tree serializer doesn't know anything about accessibility attributes. |
aboxhall
2016/11/04 17:02:07
Would it be dangerous to point to the specific loc
dmazzoni
2016/11/04 22:09:05
My plan was to cover all of the concepts at the to
|
+It keeps track of the previous state of the tree, and every time the tree |
+structure changes (based on notifications of a node changing or a node's |
+children changing), it walks the tree and builds up an incremental tree |
+update that serializes as few nodes as possible. |
+ |
+In the other process, the unserialization code applies the incremental |
+tree update atomically. |
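
The receiving side can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual unserialization code: `TreeUpdate`, `CachedTree`, and the rule that unreachable nodes are dropped are all hypothetical choices made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

struct NodeData {
  int32_t id;
  std::string name;
  std::vector<int32_t> child_ids;
};

// Hypothetical update format: full data for every node that changed.
struct TreeUpdate {
  std::vector<NodeData> nodes;
};

class CachedTree {
 public:
  explicit CachedTree(int32_t root_id) : root_id_(root_id) {}

  // Applies the update to a copy and swaps it in at the end, so a reader on
  // the same thread never observes a half-applied update.
  void Apply(const TreeUpdate& update) {
    std::map<int32_t, NodeData> next = nodes_;
    for (const NodeData& node : update.nodes)
      next[node.id] = node;
    // Drop nodes that are no longer reachable from the root.
    std::set<int32_t> reachable;
    Collect(next, root_id_, &reachable);
    for (auto it = next.begin(); it != next.end();) {
      if (reachable.count(it->first) == 0)
        it = next.erase(it);
      else
        ++it;
    }
    nodes_.swap(next);
  }

  bool Contains(int32_t id) const { return nodes_.count(id) != 0; }
  std::size_t size() const { return nodes_.size(); }

  const NodeData& Get(int32_t id) const {
    auto it = nodes_.find(id);
    assert(it != nodes_.end());
    return it->second;
  }

 private:
  // Depth-first walk; the insert check also guards against revisiting.
  static void Collect(const std::map<int32_t, NodeData>& nodes, int32_t id,
                      std::set<int32_t>* out) {
    auto it = nodes.find(id);
    if (it == nodes.end() || !out->insert(id).second)
      return;
    for (int32_t child : it->second.child_ids)
      Collect(nodes, child, out);
  }

  int32_t root_id_;
  std::map<int32_t, NodeData> nodes_;
};
```

In this sketch, removing a child only requires resending its parent with a shorter child list; the removed subtree is pruned on the receiving side.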
+ |
+### Text bounding boxes |
+ |
+One challenge faced by Chromium is that accessibility clients want to be |
+able to query the bounding box of an arbitrary range of text - not necessarily |
+just the current cursor position or selection. As discussed above, it's |
+not possible to block Chromium's main browser process while waiting for this |
+information from Blink, so instead we cache enough information to satisfy these |
+queries in the accessibility tree. |
+ |
+To compactly store the bounding box of every character on the page, we |
+split the text into *inline text boxes*, sometimes called *text runs*. |
+For example, in a typical paragraph, each line of text would be its own |
+inline text box. In general, an inline text box or text run contains a |
+sequence of text characters that are all oriented in the same direction, |
+in a line, with the same font, size, and style. |
+ |
+Each inline text box stores its own bounding box, and then the relative |
+x-coordinate of each character in its text (assuming left-to-right). |
+From that it's possible to compute the bounding box |
+of any individual character. |
+ |
+The inline text boxes are part of Chromium's internal accessibility tree. |
+They're used purely internally and aren't ever exposed directly via any |
+native accessibility APIs. |
+ |
+For example, suppose that a document contains a text field with the text |
+"Hello world", but the field is narrow, so "Hello" is on the first line and |
+"world" is on the second line. Internally, Chromium's accessibility tree |
+might look like this: |
``` |
-AccessibilityMsg_DoDefaultAction 3 |
+staticText location=(8, 8) size=(38, 36) name='Hello world' |
+ inlineTextBox location=(0, 0) size=(36, 18) name='Hello ' characterOffsets=12,19,23,28,36 |
+ inlineTextBox location=(0, 18) size=(38, 18) name='world' characterOffsets=12,20,25,29,37 |
``` |
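
The per-character computation described above can be sketched as follows. The sketch assumes left-to-right text and assumes that each entry in `characterOffsets` is the x-coordinate of the right edge of the corresponding character, relative to the inline text box; `Rect`, `InlineTextBox`, and `CharacterBounds` are hypothetical names used only here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Rect {
  int x;
  int y;
  int width;
  int height;
};

struct InlineTextBox {
  Rect bounds;                         // Relative to the static text node.
  std::string text;
  std::vector<int> character_offsets;  // Assumed: right edge of each char.
};

// Returns the bounding box of one character, in the same coordinate space
// as the inline text box's own bounds.
Rect CharacterBounds(const InlineTextBox& box, std::size_t index) {
  assert(index < box.character_offsets.size());
  // The left edge of a character is the right edge of the previous one.
  int left = index == 0 ? 0 : box.character_offsets[index - 1];
  int right = box.character_offsets[index];
  return Rect{box.bounds.x + left, box.bounds.y, right - left,
              box.bounds.height};
}
```

Under these assumptions, the "o" in the second inline text box of the example starts at x=12 and is 8 units wide, on the line that begins at y=18.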
-All three concepts are handled at several layers in Chromium. |
+### Scrolling, transformations, and animation |
+ |
+Native accessibility APIs typically want the bounding box of every element in the |
+tree, either in window coordinates or global screen coordinates. If we |
+stored the global screen coordinates for every node, we'd be constantly |
+re-serializing the whole tree every time the user scrolls or drags the |
+window. |
+ |
+Instead, we store the bounding box of each node in the accessibility tree |
+relative to its *offset container*, which can be any ancestor. If no offset |
+container is specified, it's assumed to be the root of the tree. |
+ |
+In addition, any offset container can contain scroll offsets, which can be |
+used to scroll the bounding boxes of anything in that subtree. |
+ |
+Finally, any offset container can also include an arbitrary 4x4 transformation |
+matrix, which can be used to represent 3-D rotations, translations, |
+scaling, and more. The transformation matrix applies to the whole subtree. |
+ |
+Storing coordinates this way means that any time an object scrolls, moves, or |
+animates its position and scale, only the root of the scrolling or animation |
+needs to post updates to the accessibility tree. Everything in the subtree |
+remains valid relative to that offset container. |
+ |
+Computing the global screen coordinates for an object in the accessibility |
+tree just means walking up its ancestor chain and applying offsets and |
+occasionally multiplying by a 4x4 matrix. |
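
A minimal sketch of that ancestor walk is shown below. It handles only offsets and scroll offsets and deliberately omits the 4x4 transformation matrix; it also assumes the offset-container chain is acyclic and ends at the root. `Node`, `kNoContainer`, and `RootRelativeOrigin` are hypothetical names, and converting the result to screen coordinates would additionally require the window's position.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

constexpr int32_t kNoContainer = -1;  // Assumption: means "use the root".

struct Node {
  float x, y, width, height;    // Relative to the offset container.
  int32_t offset_container_id;  // kNoContainer if unspecified.
  float scroll_x, scroll_y;     // Applied to everything this node contains.
};

struct Point {
  float x;
  float y;
};

// Walks up the chain of offset containers, adding each container's origin
// and subtracting its scroll offset, until the root is reached.
Point RootRelativeOrigin(const std::map<int32_t, Node>& nodes,
                         int32_t root_id, int32_t id) {
  auto it = nodes.find(id);
  assert(it != nodes.end());
  Point p{it->second.x, it->second.y};
  int32_t current = id;
  while (current != root_id) {
    int32_t container_id = nodes.at(current).offset_container_id;
    if (container_id == kNoContainer)
      container_id = root_id;
    const Node& container = nodes.at(container_id);
    p.x += container.x - container.scroll_x;
    p.y += container.y - container.scroll_y;
    current = container_id;
  }
  return p;
}
```

Note that when a container scrolls, only its own `scroll_x` and `scroll_y` change in this model; the stored coordinates of its descendants stay untouched.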
+ |
+### Site isolation / out-of-process iframes |
+ |
+At one point in time, all of the content of a single tab or other web view |
+was contained in the same Blink process, and it was possible to serialize |
+the accessibility tree for a whole frame tree in a single pass. |
+ |
+Today the situation is a bit more complicated, as Chromium supports |
+out-of-process iframes. (It also supports "browser plugins" such as |
+the `<webview>` tag in Chrome packaged apps, which embeds a whole |
+browser inside a browser, but for the purposes of accessibility this |
+is handled the same as frames.) |
+ |
+Rather than handling in-process and out-of-process frames |
+differently, Chromium builds a separate independent accessibility tree |
+for each frame. Each frame gets its own tree ID, and it keeps track of |
+the tree ID of its parent frame (if any) and any child frames. |
+ |
+In Chromium's main browser process, the accessibility trees for each frame |
+are cached separately, and when an accessibility client (assistive |
+technology) walks the accessibility tree, Chromium dynamically composes |
+all of the frames into a single virtual accessibility tree on the fly, |
+using those aforementioned tree IDs. |
+ |
+The node IDs for accessibility trees only need to be unique within a |
+single frame. Where necessary, separate unique IDs are used within |
+Chromium's main browser process. In Chromium accessibility, a "node ID" |
+always means an ID that's unique only within a frame, and a "unique ID" |
+means an ID that's globally unique. |
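
One way to picture the relationship between the two kinds of ID is a table keyed by the pair (tree ID, node ID). The sketch below is purely illustrative: `UniqueIdMap` and `GetUniqueId` are hypothetical names, and this is not necessarily how the browser process allocates unique IDs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Maps a (tree ID, node ID) pair to a globally unique ID, allocating a new
// one the first time a pair is seen.
class UniqueIdMap {
 public:
  int32_t GetUniqueId(int32_t tree_id, int32_t node_id) {
    auto key = std::make_pair(tree_id, node_id);
    auto it = ids_.find(key);
    if (it != ids_.end())
      return it->second;  // Same pair always yields the same unique ID.
    assert(next_id_ < INT32_MAX);
    int32_t unique_id = next_id_++;
    ids_[key] = unique_id;
    return unique_id;
  }

 private:
  std::map<std::pair<int32_t, int32_t>, int32_t> ids_;
  int32_t next_id_ = 1;
};
```

Two frames can both contain a node with node ID 5; in this sketch they receive different unique IDs because their tree IDs differ.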
## Blink |
@@ -106,7 +434,7 @@ usually forwarded to [BrowserAccessibilityManager] which is responsible for: |
On Chrome OS, RenderFrameHostImpl does not route events to |
BrowserAccessibilityManager at all, since there is no platform screenreader |
-outside Chrome to integrate with. |
+outside Chromium to integrate with. |
## Views |