Astro RPA User Guide

OCR pattern editor

The OCR pattern editor is a plug-in for the studio used to develop patterns for reading scanned documents. To open the editor, click Tools -> OCR pattern editor. At the moment, the editor supports only Microsoft OCR when working with patterns (Microsoft OCR is built into Windows starting from version 8). The tool is still new and will be improved as we receive user feedback.

Screenshot

The editor is composed of the following components:

  • Main menu

  • Recognized image of the pattern document

  • Tabs for switching between the rotations of the pattern document

  • Text groups

  • Properties

The main menu contains the following buttons:

  • Create a pattern

  • Open a pattern

  • Save a pattern

  • Save a pattern as ...

  • Test a pattern

  • Test an image

  • Settings

To create a new pattern, click the «Create a pattern» button, then select the file containing the scanned reference document. OCR recognizes the file and highlights the text blocks it finds with red rectangles; the resulting image, together with the blocks, is displayed on the screen.

When working with recognized documents, the main entities are the anchor and the group. An anchor is a text block that serves as a reference point when searching for text groups (above, below, to the left, or to the right of the anchor). A group is a set of text blocks that form useful data and either occupy an unambiguous position relative to the selected anchors or lie within specified coordinates (proportional to the image size).

To work with a block's properties, click the red rectangle of the desired block. Each text block found has the following properties:

  • Name

  • Text

  • Anchor

  • Case

  • Whitespaces

  • Regular expression

  • Fuzzy

The Name property is mnemonic and is used during further pattern development. The Text property is the basis for searching for anchors in recognized documents. The Anchor property determines whether the block can be treated as an anchor. The Case property determines whether the anchor search is case-sensitive. The Whitespaces property determines whether spaces in the text are taken into account when searching for an anchor. The Regular expression property lets you enter a regular expression that is used in text comparison when searching for an anchor. The Fuzzy property determines whether fuzzy logic is used when comparing texts during the anchor search. Anchors can be located on different rotations of the document, so it makes sense to go through all the rotation tabs (0, 90, 180, 270).
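How the Case, Whitespaces, Regular expression, and Fuzzy properties might interact can be pictured with a short Python sketch. It is only an illustration: the editor's actual comparison logic is not described in this guide, and the function name `anchor_matches`, its parameter names, the precedence of the regular expression over plain comparison, and the 0.8 similarity threshold are all assumptions.

```python
import re
from difflib import SequenceMatcher


def anchor_matches(block_text, anchor_text, *, case_sensitive=False,
                   keep_whitespace=True, regex=None, fuzzy=False,
                   fuzzy_threshold=0.8):
    """Hypothetical check of a recognized text block against an anchor.

    The parameters mirror the Case, Whitespaces, Regular expression and
    Fuzzy properties; the real editor may combine them differently.
    """
    def normalize(text):
        # Whitespaces off: drop all whitespace before comparing.
        if not keep_whitespace:
            text = re.sub(r"\s+", "", text)
        # Case off: compare in lower case.
        return text if case_sensitive else text.lower()

    if regex:
        # Assumption: a regular expression, when given, replaces plain comparison.
        flags = 0 if case_sensitive else re.IGNORECASE
        candidate = block_text if keep_whitespace else re.sub(r"\s+", "", block_text)
        return re.search(regex, candidate, flags) is not None

    recognized, expected = normalize(block_text), normalize(anchor_text)
    if fuzzy:
        # Assumption: "fuzzy" means a similarity ratio above a threshold,
        # which tolerates typical OCR slips such as "l" read instead of "I".
        return SequenceMatcher(None, recognized, expected).ratio() >= fuzzy_threshold
    return recognized == expected
```

With these assumptions, a block recognized as "lnvoice No" would not match the anchor text "Invoice No" exactly, but would match once the fuzzy option is enabled.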

After the initial recognition of the pattern, define the anchor blocks (using the Anchor property) and fill in the properties needed to find those anchors in the documents that will be recognized with this pattern. After defining the anchors, create the text groups: click an empty row in the groups panel and enter the name of the new group. A text block is included in a group only if it lies completely within the search coordinate area. Each group has the following properties:

  • Name

  • Rotation

  • Coordinates

  • Anchor - *

The Name property is mnemonic and is used when working with documents from the Recognize form element. The Rotation property determines on which rotations of the document (0, 90, 180, 270) to search for this group. The Coordinates property determines in which area of the document to search for this group (in % relative to the document size). The Anchor properties determine which anchors to use when searching for the group. There are four such anchors in total (left, right, lower, and upper), and each has the following properties:

  • Name

  • Upper

  • Left

  • Lower

  • Right

The Name property defines the name of the anchor used in this search. The Upper, Left, Lower, and Right properties define the offsets of the beginning and end of the search area relative to the anchor (in % relative to the anchor block size).
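The geometry behind group search areas can be sketched in Python as follows. This is a minimal illustration, not the editor's implementation: the `Rect` type, the function names, and in particular the convention that each offset pushes the corresponding edge of the anchor outward are assumptions. The containment rule (a block belongs to a group only if it lies completely inside the area) follows the description in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixels; the origin is the top-left corner."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self):
        return self.right - self.left

    @property
    def height(self):
        return self.bottom - self.top

    def contains(self, other):
        # True only if `other` lies completely inside this rectangle.
        return (self.left <= other.left and self.top <= other.top
                and self.right >= other.right and self.bottom >= other.bottom)


def area_from_percent(doc_width, doc_height, left, top, right, bottom):
    """Convert a Coordinates area given in % of the document size to pixels."""
    return Rect(doc_width * left / 100, doc_height * top / 100,
                doc_width * right / 100, doc_height * bottom / 100)


def area_from_anchor(anchor, upper, left, lower, right):
    """Build a search area around an anchor block.

    Assumed convention: each offset moves the matching edge of the anchor
    outward by that percentage of the anchor block's width or height.
    """
    return Rect(anchor.left - anchor.width * left / 100,
                anchor.top - anchor.height * upper / 100,
                anchor.right + anchor.width * right / 100,
                anchor.bottom + anchor.height * lower / 100)


def blocks_in_area(area, blocks):
    """Return the names of the blocks that lie completely inside the area."""
    return [name for name, rect in blocks.items() if area.contains(rect)]
```

For example, under these assumptions an anchor block 100 pixels wide with a Right offset of 200 yields a search area that extends 200 pixels past the anchor's right edge; a block that crosses that boundary is not included in the group.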

If you use coordinates, remember that the frames and sizes of the processed documents must be identical to those of the reference document: cut-off parts or additional blank areas in the scanned image are unacceptable. Also remember that OCR works reliably only with good-quality, high-resolution documents. The studio comes with several recognition patterns. These patterns are not final and require customization for your scan formats.

To test the pattern, click either the «Test a pattern» button or the «Test an image» button. After testing, the screen displays the results of processing the image with the pattern.

Screenshot

Testing a pattern differs from testing an image as follows: when testing a pattern, the image attached to the pattern is tested, whereas when testing an image, you must select a file to test.

The created pattern must be saved to disk, after which it can be used in the Recognize form element.

To configure OCR, click the Settings button. In the window that opens, you can edit the following properties:

  • Language (defines the document language)

The selected OCR language must be installed in the operating system (see the description of the Microsoft OCR element).

Last updated 2 years ago