Script Specifications - PowerClicker Guide Page

This section explains the specifications of scripts that can be used in PowerClicker.

Overview

PowerClicker uses Lua 5.2 as the scripting language.
In addition to standard Lua functions, PowerClicker provides custom functions (e.g., tap()) to define scenario actions.
When a scenario runs, the run() function is called automatically. The script must implement this function.
If a scenario requires screen capture for image detection or text detection, the script must implement the should_capture_screen() function and return true. If false is returned or if should_capture_screen() is missing, the scenario will terminate when attempting to perform image or text detection.
If an error occurs during script scenario execution, the execution will be interrupted. The Scenario Execution Log will display the error details, so please check the log to identify and resolve the issue.
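
The requirements above can be sketched as a minimal script skeleton. This is only an illustration: the tap coordinates and log messages are placeholders, and should_capture_screen() is included on the assumption that the scenario will later use image or text detection.

```lua
-- Return true so that the screen is captured for image or text detection.
-- A scenario that never uses detection can omit this function.
function should_capture_screen()
    return true
end

-- Entry point: called automatically when the scenario runs.
function run()
    print("Scenario started")   -- shown in the Scenario Execution Log
    tap(500, 500, 100)          -- placeholder coordinates
    delay(1000)
    print("Scenario finished")
end
```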

Scenario Execution Log
The print() function in Lua is available for use. Any output from print() will be displayed in the Scenario Execution Log.
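
As a small sketch of logging with print(), the helper below adds a prefix and format arguments using the standard string.format function. The helper name log and the prefix text are hypothetical and not part of PowerClicker.

```lua
-- Hypothetical helper: prefixes each message and supports format arguments.
local function log(fmt, ...)
    print(string.format("[my-scenario] " .. fmt, ...))
end

function run()
    log("starting")
    for i = 1, 3 do
        log("iteration %d of %d", i, 3)
    end
end
```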

PowerClicker Custom Functions

delay(milliSeconds)

Pauses execution for the given time (in milliseconds).

Arguments
- milliSeconds: Time to wait in milliseconds.
Return Value
None

tap(x, y, duration)

Taps on the screen at the given (x, y) coordinates for the specified duration.

Arguments
- x: X-coordinate of the tap position.
- y: Y-coordinate of the tap position.
- duration: Time (in milliseconds) to hold the tap.
Return Value
None
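
The following sketch combines tap() and delay() to show a short tap, a double tap, and a long press. The helper double_tap, the coordinates, and the timing values are illustrative assumptions; suitable values depend on the target app.

```lua
-- Hypothetical helper: two short taps at the same position.
local function double_tap(x, y)
    tap(x, y, 50)    -- short press
    delay(100)       -- brief gap between the two taps
    tap(x, y, 50)
end

function run()
    double_tap(540, 960)
    delay(500)
    tap(540, 960, 1000)  -- hold for 1000 ms (long press)
end
```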

gesture(path, duration)

Performs a gesture operation on the screen using a specified path. This function allows you to execute a single-finger gesture by providing a sequence of keyframes.

Arguments
- path: Table. A list of keyframes defining the gesture path.
  Each keyframe is specified as {x, y, frame}, where:
  - x : X-coordinate of the touch position.
  - y : Y-coordinate of the touch position.
  - frame: Time in milliseconds from the start of the gesture when the touch point reaches this coordinate.
  Example: Swipe from (500, 500) to (100, 100) in 100 milliseconds
```
gesture({{500,500, 0},{100, 100, 100}}, 100)
```
- duration: The execution time for the gesture.
  - If the duration matches the last keyframe’s frame value, the gesture will move at a constant speed.
  - If a smaller duration is specified, the gesture will execute faster than the original timing defined in the keyframes.
Return Value
None
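
Building keyframes by hand is error-prone, so a small helper can generate the path for a straight swipe. This is a sketch only: swipe_path is a hypothetical helper, and the coordinates are placeholders.

```lua
-- Hypothetical helper: builds a two-keyframe path for a straight swipe.
local function swipe_path(x1, y1, x2, y2, duration_ms)
    return {
        {x1, y1, 0},            -- touch starts here at 0 ms
        {x2, y2, duration_ms},  -- touch reaches the end point at duration_ms
    }
end

function run()
    -- Swipe upward over 300 ms; the duration matches the last keyframe's frame value.
    gesture(swipe_path(540, 1500, 540, 500, 300), 300)
    delay(500)
end
```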

gesture(path_list, duration)

Performs a gesture operation on the screen using multiple paths, allowing for multi-finger gestures (e.g., pinch, zoom, multi-finger swipes).

Arguments
- path_list: Table. A list of paths, where each path represents the movement of a single finger. Each path consists of keyframes defined as {x, y, frame}, where:
  - x : X-coordinate of the touch position.
  - y : Y-coordinate of the touch position.
  - frame: Time in milliseconds from the start of the gesture when the touch point reaches this coordinate.
  Example: Pinch-out (spread) gesture centered around (500, 500) over 1000 ms, with both fingers moving away from the center
```
gesture({{{450, 450, 0}, {200, 200, 100}}, {{550,550, 0},{800, 800, 100}}}, 1000)
```
- duration: The execution time for the gesture.
  - If the duration matches the last keyframe’s frame value, the gesture will move at a constant speed.
  - If a smaller duration is specified, the gesture will execute faster than the original timing defined in the keyframes.
Return Value
None
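
A two-finger path list can be generated from a center point, as in the sketch below. The helper pinch_paths is hypothetical, and the diagonal finger placement is just one possible choice.

```lua
-- Hypothetical helper: two diagonal finger paths that move symmetrically
-- about (cx, cy), from start_offset to end_offset pixels away on each axis.
local function pinch_paths(cx, cy, start_offset, end_offset, duration_ms)
    local function finger(sign)
        return {
            {cx + sign * start_offset, cy + sign * start_offset, 0},
            {cx + sign * end_offset,   cy + sign * end_offset,   duration_ms},
        }
    end
    return { finger(-1), finger(1) }
end

function run()
    -- Both fingers move toward the center (pinch in) over 500 ms.
    gesture(pinch_paths(500, 500, 300, 50, 500), 500)
end
```

Passing a start_offset smaller than end_offset makes the fingers move apart instead.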

input_text(text)

Inputs text into the currently focused text input field.

Arguments
- text: The text to be entered
Return Value
Whether the text input was successful

input_text(element, text)

Inputs text into the specified text input field.

Arguments
- element: The element information of the text input field.
  This can be obtained as the return value of detect_ui.
- text: The text to be entered
Return Value
Whether the text input was successful
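
The return value can be used to detect a failed input. The sketch below assumes that the return value is truthy on success and falsy on failure; the tap coordinates used to focus a field are placeholders.

```lua
function run()
    tap(540, 400, 100)   -- placeholder: tap a text field to give it focus
    delay(300)
    -- Assumption: input_text returns a truthy value on success.
    if input_text("hello") then
        print("Text entered")
    else
        print("Text input failed")
    end
end
```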

detect_image(image_id, require_detection_count, parameters, timeout)

Performs image detection. The image used for detection must be pre-registered in the app (see Image List).

detect_image returns the detection result as its return value.
You can use the detected bounding box from the result to perform actions such as tapping the detected image.

Example: Detect Image ID “3” and Tap its Center

```
local result = detect_image("3", 1, {grayscale = true, error_tolerance = 70, scaling_level = 0}, 1000)
if result ~= nil then
    local r1 = result.rects[1] -- The first detection rectangle
    -- Tap the center of the detected image
    tap(r1[1] + r1[3] * 0.5, r1[2] + r1[4] * 0.5, 100)
    delay(100)
else
    -- Image not detected
end
```

Arguments
- image_id: String. The ID of the image to detect.
- require_detection_count: The minimum number of successful detections required to return a result. If fewer than this number of images are detected, the function returns nil. Use 2 or higher when handling multiple detections (e.g., drag between detected targets).
- parameters: A table specifying detection parameters. If omitted, default values are used.
  - max_detection: Maximum number of detections to return. Must be ≥ require_detection_count. Default is 1.
  - target_rect: Specifies the detection area in rectangle format. Default is fullscreen.
  - grayscale: Enables grayscale detection. Default is true.
  - error_tolerance: Default is 70. Lower values make detection stricter (range: 0–100).
  - scaling_level: Controls scaling for detection. Smaller values shrink the image for efficiency. Scaling factor is 2^(scaling_level). Default is 0.
  For more details, refer to Image Detection Step Settings (🖼️Image Detection).
- timeout: Specifies the detection timeout in milliseconds.
Return Value
The function returns a table containing the detection result. If detection fails, it returns nil.

The table format is { rects = list of detected bounding boxes }. Each detected bounding box is in rectangle format.
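
When max_detection is greater than 1, result.rects can hold several bounding boxes. The sketch below taps each one in turn. It assumes the rectangle layout is {x, y, width, height}, as implied by the tap example above; rect_center is a hypothetical helper, and the image ID "3" is a placeholder.

```lua
-- Center of a rectangle, assuming the layout {x, y, width, height}.
local function rect_center(r)
    return r[1] + r[3] * 0.5, r[2] + r[4] * 0.5
end

function should_capture_screen()
    return true
end

function run()
    -- Ask for up to 5 matches of image "3", requiring at least 1.
    local result = detect_image("3", 1, {max_detection = 5}, 2000)
    if result == nil then
        print("Image 3 not found")
        return
    end
    for i, r in ipairs(result.rects) do
        local x, y = rect_center(r)
        print(string.format("match %d at (%.0f, %.0f)", i, x, y))
        tap(x, y, 100)
        delay(200)
    end
end
```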

detect_text(text, require_detection_count, parameters, timeout)

Performs text detection.

detect_text returns the detection result as its return value.
The detected bounding box can be used to perform actions such as tapping the detected text.

Example

Detect the text “Test” or “Scenario”, and tap the center of the detected bounding box.

```
local result = detect_text({"Test", "Scenario"}, 1, {matches_only_element = false, scaling_level = 0}, 1000)
if result ~= nil then
    local r1 = result.rects[1] -- The first detection rectangle
    -- Tap the center of the detected text
    tap(r1[1] + r1[3] * 0.5, r1[2] + r1[4] * 0.5, 100)
    delay(100)
else
    -- Text not detected
end
```

Arguments
- text: The text to detect. Can be a string or an array of strings.
- require_detection_count: The minimum number of detections required. Returns a value only if this number or more detections are found. Use 2 or more for operations like dragging between detected targets.
- parameters: A table specifying detection parameters. If omitted, default values are used.
  - target_rect: Specifies the detection area in rectangle format. Default is fullscreen.
  - matches_only_element: If true, detection succeeds only when an entire detected element matches. Default is false.
  - scaling_level: Defines scaling level for detection. Default is 0. Lower values shrink the detection area for efficiency. The scale factor is 2^(scaling_level).
  See Text Detection Step Settings (🔤Text Detection) for more details.
- timeout: Specifies the detection timeout in milliseconds.

Return Value

A table representing the text detection result. Returns nil if detection fails.

The table format is:

```
{
    rects = { ... },  -- Array of detected bounding boxes
    text = "detected_text",  -- Detected text string
    rect_map = { ["detected_text"] = { bounding_boxes } }  -- Mapping of detected texts to bounding boxes
}
```

Since multiple texts can be detected at once, all detected information is included in rect_map.
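
The sketch below shows one way rect_map might be used when several texts are searched at once. It assumes that each rect_map key is a detected text and each value is an array of bounding boxes in {x, y, width, height} layout; the texts "OK" and "Cancel" are placeholders.

```lua
function should_capture_screen()
    return true
end

function run()
    local result = detect_text({"OK", "Cancel"}, 1, {matches_only_element = false}, 1000)
    if result == nil then
        print("Neither text was found")
        return
    end
    -- Log every detected text and how many boxes it has.
    for text, boxes in pairs(result.rect_map) do
        print(string.format("%s: %d box(es)", text, #boxes))
    end
    -- Prefer "OK" when it was detected; otherwise fall back to the first box.
    local ok_boxes = result.rect_map["OK"]
    local r = (ok_boxes and ok_boxes[1]) or result.rects[1]
    tap(r[1] + r[3] * 0.5, r[2] + r[4] * 0.5, 100)
end
```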

detect_ui(condition, require_detection_count, settings, timeout)

Performs screen element detection.

detect_ui returns the detection result as its return value.
You can use the detection result, including the bounding rectangles and element information, to perform actions such as text input on the detected element.

Example
Detect a text input field that contains the text “Type” and input “Test” into it:

```
local result = detect_ui(
    {
        class_name = "android.widget.EditText",
        contained_text = "Type"
    },
    1,
    {
        match_settings = {
            class_name = "required",
            contained_text = "required"
        },
        text_match_settings = {
            contained_text = "exact"
        }
    },
    1000
)
if result ~= nil then
    input_text(result.candidates[1], "Test")
    delay(100)
else
    -- Element not detected
end
```

Arguments

To specify attributes such as the app name in the condition table, use the following keys:

| Attribute | Key | Description |
| --- | --- | --- |
| App Name | package_name | The identifier (package name) of the app that contains the target screen element. |
| Element Type | class_name | The type of screen element to detect. |
| Internal ID | view_id | The internal ID (viewId) assigned to the screen element. This can be used when defined by the app developer. |
| Text Content | text | The text displayed on the screen element. |
| Contained Text | contained_text | Text that is displayed inside the specified element. |
| Description | content_description | The content description assigned to the screen element. This may differ from the visually displayed text. |
| Element Position | bounds | The rectangular area (bounding box) representing the position and size of the element on the screen. |
| Element Structure | path | A number representing the order of the element in the screen hierarchy, indicating its position within the structure. |

- condition: A table defining the target element conditions. Each key accepts the following value type:
  - bounds: Rectangle format.
  - path: Array of integers.
  - Others: String.
- require_detection_count: The minimum number of detections required. Returns a value only if this number or more detections are found. Use 2 or more for operations like dragging between detected targets.

- settings: A table that configures matching behavior.
  - clickable_only: A boolean value that specifies whether to limit detection to clickable elements. Defaults to false if omitted.
  - editable_only: A boolean value that specifies whether to limit detection to text-editable elements. Defaults to false if omitted.
  - match_settings: A table that defines how each attribute should be used for matching. Unspecified attributes are ignored. Valid values:

| Usage mode | Value | Description |
| --- | --- | --- |
| Required | required | If this does not match, the element will be excluded. |
| Preferred | preferred | If this matches, the element will be prioritized among candidates. |
| Ignored | ignored | This attribute will not be used for comparison. |

  - text_match_settings: A table defining how to match text-related fields (text, contained_text, content_description). Valid values:

| Text matching method | Value | Description |
| --- | --- | --- |
| Exact Match | exact | Matches elements whose text exactly equals the given text. |
| Contains | contains | Matches elements that contain the given text. |
| Starts With | starts_with | Matches elements that start with the given text. |
| Ends With | ends_with | Matches elements that end with the given text. |

- timeout: Specifies the detection timeout in milliseconds.

Return Value
A table representing the result of screen element detection.

The table format is { rects = array of bounding rectangles, candidates = array of detected elements }. Both arrays are sorted based on the “preferred” match settings, so the most relevant results are rects[1] and candidates[1].
You can pass candidates[1] to input_text or other related functions.
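
As a further sketch, the call below combines a required text match with a preferred class name and limits detection to clickable elements, then taps the best candidate. The text "Login" and the class name are placeholders. The sketch also assumes that detect_ui returns nil when nothing is detected and that rectangles use the {x, y, width, height} layout, as in the examples above.

```lua
function run()
    local result = detect_ui(
        { text = "Login", class_name = "android.widget.Button" },
        1,
        {
            clickable_only = true,
            match_settings = { text = "required", class_name = "preferred" },
            text_match_settings = { text = "contains" },
        },
        1000
    )
    if result == nil then
        print("No matching element found")
        return
    end
    -- rects[1] belongs to the most relevant candidate.
    local r = result.rects[1]
    tap(r[1] + r[3] * 0.5, r[2] + r[4] * 0.5, 100)
end
```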