This section explains the Screen Element Detection functionality and its settings.
Screen Element Detection Functionality
The screen element detection step checks whether buttons, text input fields, or other screen elements that match specific criteria appear on the screen.
You can specify what happens both when the element is found and when it is not found.
By default, when detection succeeds, the step taps the center of the detected element.
Screen Element Detection Settings

Target Element
You can configure the detection target when you add the step, or later by tapping the ✏️ icon in the step editor.

When you open the detection target tool, the screen elements currently displayed are highlighted with red boxes. Tap the element you want to detect.
The tool includes two modes:
- Selection Mode: Tap a highlighted element to set it as the detection target.
- Operation Mode: Allows you to interact with the foreground app or switch apps while keeping the red highlight overlay visible.
By tapping “Settings”, you can also view and modify the current detection criteria.
The detection target includes the following fields:
Field | Description |
---|---|
App Name | The identifier (package name) of the app that contains the target screen element. |
Element Type | The type of screen element to detect. |
Internal ID | The internal ID (viewId) assigned to the screen element. Available only if the app developer has defined one. |
Text Content | The text displayed by the screen element itself. |
Contained Text | Text displayed inside the element, for example by a child element such as a label within a button. |
Description | The content description assigned to the screen element. This may differ from the visually displayed text. |
Element Position | The rectangular area (bounding box) representing the position and size of the element on the screen. |
Element Structure | A number indicating the element's order within the screen's element hierarchy. |
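
To picture how these fields fit together, each on-screen element can be thought of as one record. The sketch below is hypothetical: the class name `ScreenElement`, its attribute names, and the (left, top, right, bottom) pixel order used for Element Position are assumptions for illustration, not the app's internal format. The `center()` helper computes the point that the default "tap the center" action would target.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ScreenElement:
    """Hypothetical record holding the detection fields of one element."""
    app_name: str                      # package name, e.g. "com.example.app"
    element_type: str                  # e.g. "Button"
    internal_id: Optional[str]         # viewId; None if the developer set none
    text_content: Optional[str]        # text shown by the element itself
    contained_text: Optional[str]      # text shown by elements nested inside it
    description: Optional[str]         # content description
    bounds: Tuple[int, int, int, int]  # assumed (left, top, right, bottom) in pixels
    structure_index: int               # order within the screen hierarchy

    def center(self) -> Tuple[int, int]:
        """Center of the bounding box, rounded down to whole pixels."""
        left, top, right, bottom = self.bounds
        return ((left + right) // 2, (top + bottom) // 2)


ok_button = ScreenElement(
    app_name="com.example.app",
    element_type="Button",
    internal_id="com.example.app:id/ok",
    text_content="OK",
    contained_text=None,
    description=None,
    bounds=(100, 200, 300, 260),
    structure_index=4,
)
print(ok_button.center())  # (200, 230)
```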
Detection Conditions
- Restrict to tappable elements
  When enabled, only tappable elements, such as buttons, are detected.
  In the detection tool, only tappable elements are highlighted. Enable this option if the action performed after detection is a tap.
- Restrict to elements that support text input
  When enabled, only elements that accept text input, such as text fields, are detected.
  In the detection tool, only input-capable elements are highlighted. Enable this option if the action performed after detection is text input.
- Usage mode per condition
  For each field (for example, text or position), you can set how it is used during detection.
  Detection works in two stages: first, only elements that satisfy every field marked as Required are kept as candidates; then, among those candidates, the element that best matches the fields marked as Preferred is selected.
  You can choose one of the following for each field:
  - Required: If this field does not match, the element is excluded.
  - Preferred: If this field matches, the element is ranked higher among the candidates.
  - Ignored: This field is not used for comparison.
- Text matching method
  For the Text Content, Contained Text, and Description fields, you can choose how the text is matched:
  - Exact match: Matches elements whose text is exactly the given text.
  - Contains: Matches elements whose text contains the given text.
  - Starts with: Matches elements whose text starts with the given text.
  - Ends with: Matches elements whose text ends with the given text.
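
The detection conditions above can be combined into a single selection pipeline: the restriction options narrow the pool, Required fields filter it, and Preferred fields rank what remains. The sketch below is a minimal illustration under several assumptions: elements are modeled as a hypothetical `Element` record holding field values in a dictionary, non-text fields such as position are compared by plain equality, text comparison is case-sensitive, Preferred fields are scored by simply counting matches, and ties go to the earliest element. The names `detect`, `Element`, `Usage`, and `TextMatch` are invented for illustration and do not correspond to any setting or API in the app.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class Usage(Enum):
    REQUIRED = "required"
    PREFERRED = "preferred"
    IGNORED = "ignored"


class TextMatch(Enum):
    EXACT = "exact"
    CONTAINS = "contains"
    STARTS_WITH = "starts_with"
    ENDS_WITH = "ends_with"


@dataclass
class Element:
    # Hypothetical element: field name -> value, plus the two restriction flags.
    fields: Dict[str, str]
    tappable: bool = False
    editable: bool = False


def text_matches(actual: Optional[str], expected: str, method: TextMatch) -> bool:
    """Apply one text matching method (case-sensitive in this sketch)."""
    if actual is None:
        return False  # an element without this text never matches
    if method is TextMatch.EXACT:
        return actual == expected
    if method is TextMatch.CONTAINS:
        return expected in actual
    if method is TextMatch.STARTS_WITH:
        return actual.startswith(expected)
    return actual.endswith(expected)  # TextMatch.ENDS_WITH


def detect(elements: List[Element],
           target: Dict[str, str],
           usage: Dict[str, Usage],
           text_methods: Dict[str, TextMatch],
           tappable_only: bool = False,
           text_input_only: bool = False) -> Optional[Element]:
    """Return the best-matching element, or None when nothing qualifies."""
    def matches(e: Element, name: str) -> bool:
        method = text_methods.get(name)
        if method is not None:  # text fields use their chosen matching method
            return text_matches(e.fields.get(name), target[name], method)
        return e.fields.get(name) == target[name]  # plain equality otherwise

    # 1. Restriction options narrow the pool.
    pool = [e for e in elements
            if (e.tappable or not tappable_only)
            and (e.editable or not text_input_only)]
    # 2. Keep only elements that satisfy every Required field.
    required = [n for n, u in usage.items() if u is Usage.REQUIRED]
    candidates = [e for e in pool if all(matches(e, n) for n in required)]
    if not candidates:
        return None
    # 3. Rank by number of matching Preferred fields; Ignored fields are never checked.
    preferred = [n for n, u in usage.items() if u is Usage.PREFERRED]
    # max() returns the first of equally scored elements (assumed tie-break).
    return max(candidates, key=lambda e: sum(matches(e, n) for n in preferred))


screen = [
    Element({"type": "TextView", "text": "Send"}),
    Element({"type": "Button", "text": "Send later", "id": "later"}, tappable=True),
    Element({"type": "Button", "text": "Send", "id": "send"}, tappable=True),
]
hit = detect(
    screen,
    target={"type": "Button", "text": "Send", "id": "send_old"},
    usage={"type": Usage.REQUIRED, "text": Usage.PREFERRED, "id": Usage.IGNORED},
    text_methods={"text": TextMatch.EXACT},
    tappable_only=True,
)
print(hit.fields["id"])  # send
```

A `None` result in this sketch corresponds to the "not found" case, for which the step lets you configure a separate action.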
If Screen Element Detection Does Not Work Properly
If screen element detection does not work as expected, see the FAQ section on screen element detection for troubleshooting tips.