Textract connector

The Textract connector uses the Amazon Textract service to extract text and metadata from JPEG and PNG files that are smaller than 5 MB.

The implementation value of the Textract connector in a service task would be similar to the following:

<bpmn2:serviceTask id="ServiceTask_6thy7kl" implementation="textractConnector.EXTRACT" />

Amazon Web Services (AWS) configuration

The connector calls two Amazon Textract APIs: the Detect Document Text API, for which the connector joins all returned LINE block objects with a line separator between them, and the Analyze Document API, with FORMS and TABLES analysis.
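
The joining of LINE blocks can be illustrated with a minimal sketch. It assumes a response shaped like the one Amazon Textract returns for Detect Document Text (a `Blocks` list whose entries carry `BlockType` and `Text`). The function name `join_line_blocks`, the sample data, and the choice of `"\n"` as the separator are assumptions made for illustration, not the connector's actual implementation.

```python
def join_line_blocks(response, separator="\n"):
    """Join the text of every LINE block in a Textract-style response.

    The separator is an assumption; the connector's actual line
    separator is not specified in this documentation.
    """
    lines = [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]
    return separator.join(lines)


# Hypothetical sample response: PAGE and WORD blocks are ignored.
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Invoice 1001"},
        {"BlockType": "WORD", "Text": "Invoice"},
        {"BlockType": "WORD", "Text": "1001"},
        {"BlockType": "LINE", "Text": "Total: 42.00"},
    ]
}

print(join_line_blocks(sample_response))
```

Only LINE blocks contribute to the output, so the WORD blocks that repeat the same text are not duplicated in the result.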

The Textract connector requires an AWS account to access Amazon features. It also requires an Identity and Access Management (IAM) user with the textract:DetectDocumentText and textract:AnalyzeDocument permissions.
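
As a rough illustration, the two permissions could be granted with an IAM policy like the one built below. The variable name `textract_policy` and the use of `"Resource": "*"` are assumptions. Any further permissions your deployment needs, for example for the S3 bucket, are not stated here and are not included.

```python
import json

# A minimal IAM policy sketch granting only the two Textract actions
# named above. "Resource": "*" is an assumption for illustration.
textract_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "textract:DetectDocumentText",
                "textract:AnalyzeDocument",
            ],
            "Resource": "*",
        }
    ],
}

# Print the policy as JSON, ready to paste into the IAM console.
print(json.dumps(textract_policy, indent=2))
```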

Input parameters

The EXTRACT action of the Textract connector accepts the following input parameters:

| Parameter | Description | Type | Required? |
| --- | --- | --- | --- |
| nodeId | The node ID of the image to use from Alfresco Content Services | String | * |
| uri | The URI of the image to use | String | * |
| files | A file uploaded in a process and set as a process variable, or uploaded as part of a form or another connector | File | * |
| outputFormat | Sets the output format to JSON or txt. The default is JSON | String | No |
| confidenceLevel | The minimum confidence level to use for a label. The default is 0.75 | String | No |
| timeout | The timeout period for calling the Textract service in milliseconds | Integer | No |

* Only one of nodeId, uri, or files is required.
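
The rule for the starred parameters and the documented defaults can be sketched as a small validation helper. The function `build_extract_inputs` and the constant `SOURCE_PARAMS` are hypothetical names, not part of the connector. The sketch assumes that "only one is required" means at least one of the three must be supplied; whether the connector rejects more than one is not stated here.

```python
# Hypothetical helper: the names below are illustrative only.
SOURCE_PARAMS = ("nodeId", "uri", "files")


def build_extract_inputs(**params):
    """Check the image source and apply the documented defaults."""
    supplied = [name for name in SOURCE_PARAMS if params.get(name)]
    if not supplied:
        raise ValueError("Provide one of: " + ", ".join(SOURCE_PARAMS))

    # Defaults taken from the parameter table; both are String typed.
    inputs = {"outputFormat": "JSON", "confidenceLevel": "0.75"}
    # Explicitly supplied values override the defaults.
    inputs.update({k: v for k, v in params.items() if v is not None})
    return inputs


print(build_extract_inputs(uri="https://example.com/scan.png", timeout=30000))
```

No default is applied for timeout because none is documented.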

Output parameters

The EXTRACT action of the Textract connector returns the following output parameters to the process:

| Parameter | Description | Type |
| --- | --- | --- |
| textract.error | A list of errors if any are caught by the connector | String |
| awsResult | The result of the image analysis. The format is defined by the input parameter outputFormat | JSON |
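
One way a downstream step might consume these outputs is sketched below. It assumes the process variables are available as a plain dictionary keyed by the parameter names above. The function `read_extract_outputs` is hypothetical, and how variables are actually exposed depends on your process runtime.

```python
import json


def read_extract_outputs(variables):
    """Return the analysis result, or raise if the connector reported errors."""
    error = variables.get("textract.error")
    if error:
        raise RuntimeError(f"Textract connector reported: {error}")

    result = variables.get("awsResult")
    if isinstance(result, str):
        try:
            # outputFormat JSON: decode the serialized result.
            return json.loads(result)
        except json.JSONDecodeError:
            # outputFormat txt: keep the plain text as-is.
            return result
    return result


# Hypothetical process variables for illustration.
print(read_extract_outputs({"awsResult": '{"text": "Invoice 1001"}'}))
print(read_extract_outputs({"awsResult": "Invoice 1001\nTotal: 42.00"}))
```

Checking textract.error before reading awsResult avoids treating a failed extraction as an empty document.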

Configuration parameters

Values for configuration parameters that are specific to a connector instance can be set in the modeling application or during application deployment.

The following configuration parameters need to be set for the Textract connector:

| Parameter | Description | Required? |
| --- | --- | --- |
| AWS_ACCESS_KEY_ID | The access key to be used to authenticate against AWS | Yes |
| AWS_SECRET_KEY | The secret key to be used to authenticate against AWS | Yes |
| AWS_REGION | The region of AWS to use the Textract service in | Yes |
| AWS_S3_BUCKET | The name of the S3 bucket to use | Yes |
| ALFRESCO_CONTENT_REPO_BASE_URL | The base URL of the Content Services deployment | |
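
Before deployment, it can help to check that every parameter marked as required has a value. The sketch below assumes the values are supplied through an environment-style mapping, which is only one possible setup. The names `REQUIRED` and `load_textract_config` are hypothetical. ALFRESCO_CONTENT_REPO_BASE_URL is treated as optional only because it is not marked as required above.

```python
import os

# Parameters marked "Yes" in the configuration table.
REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_KEY", "AWS_REGION", "AWS_S3_BUCKET")


def load_textract_config(env):
    """Collect connector settings from a mapping, failing on missing values."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("Missing configuration: " + ", ".join(missing))

    config = {name: env[name] for name in REQUIRED}
    # Included only when present (assumed optional, see the note above).
    base_url = env.get("ALFRESCO_CONTENT_REPO_BASE_URL")
    if base_url:
        config["ALFRESCO_CONTENT_REPO_BASE_URL"] = base_url
    return config


if __name__ == "__main__":
    try:
        settings = load_textract_config(os.environ)
        print("Configured parameters:", sorted(settings))
    except KeyError as exc:
        print(exc)
```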

© 2023 Alfresco Software, Inc. All Rights Reserved.