TextCollectionAttachment
An array of TextCollectionAttachment objects to be labeled.
Video Support
The video attachment's content should be a link. Supported media types are listed on the MDN Web Docs.
HTML Support in TextCollection Attachments:
When creating a TextCollection task, customers can pass Markdown as the string content. Markdown also allows HTML tags to be embedded within the Markdown syntax.
However, to keep the TextCollection platform secure, all HTML tags passed within the Markdown are sanitized with the sanitize-html JavaScript package. Sanitization removes every tag except the allowed tags listed in the "HTML tags allowed" table below.
Any HTML tag that is not on the allow list is stripped from the string during sanitization. Restricting content to these tags keeps what is displayed to taskers secure and consistent with our standards, and prevents the security risks that unauthorized tags could introduce.
Parameter | Type | Description |
---|---|---|
type* | string | One of |
content* | string | Content or link to relevant file. |
forms | array | Array of |
HTML tags allowed:
Category | Allowed tags |
---|---|
Content sectioning | 'address', 'article', 'aside', 'footer', 'header', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'hgroup', 'main', 'nav', 'section' |
Text content | 'blockquote', 'dd', 'div', 'dl', 'dt', 'figcaption', 'figure', 'hr', 'li', 'main', 'ol', 'p', 'pre', 'ul' |
Inline text semantics | 'a', 'abbr', 'b', 'bdi', 'bdo', 'br', 'cite', 'code', 'data', 'dfn', 'em', 'i', 'kbd', 'mark', 'q', 'rb', 'rp', 'rt', 'rtc', 'ruby', 's', 'samp', 'small', 'span', 'strong', 'sub', 'sup', 'time', 'u', 'var' |
Table content | 'caption', 'col', 'colgroup', 'table', 'tbody', 'td', 'tfoot', 'th', 'thead', 'tr' |
Additional tags | 'img', 'iframe' |
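Because tags outside this list are removed silently, it can be worth checking Markdown content before creating a task. The sketch below is a hypothetical client-side helper (findDisallowedTags is not part of any TextCollection API) that scans a string for tag names and reports those missing from the allow list. It is a rough regex check for catching mistakes early, not a sanitizer; the platform's own sanitization remains the authority on what is kept.

```javascript
// Allowed tags, copied from the "HTML tags allowed" table.
const ALLOWED_TAGS = new Set([
  // Content sectioning
  "address", "article", "aside", "footer", "header", "h1", "h2", "h3", "h4",
  "h5", "h6", "hgroup", "main", "nav", "section",
  // Text content
  "blockquote", "dd", "div", "dl", "dt", "figcaption", "figure", "hr", "li",
  "ol", "p", "pre", "ul",
  // Inline text semantics
  "a", "abbr", "b", "bdi", "bdo", "br", "cite", "code", "data", "dfn", "em",
  "i", "kbd", "mark", "q", "rb", "rp", "rt", "rtc", "ruby", "s", "samp",
  "small", "span", "strong", "sub", "sup", "time", "u", "var",
  // Table content
  "caption", "col", "colgroup", "table", "tbody", "td", "tfoot", "th",
  "thead", "tr",
  // Additional tags
  "img", "iframe",
]);

// Hypothetical pre-flight check: returns the sorted, de-duplicated names of
// tags in `content` that would be stripped. Regex-based, so it only looks at
// things shaped like <tag ...> or </tag>; it does not parse HTML.
function findDisallowedTags(content) {
  const tagPattern = /<\/?([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>/g;
  const disallowed = new Set();
  for (const match of content.matchAll(tagPattern)) {
    const name = match[1].toLowerCase(); // tag names are case-insensitive
    if (!ALLOWED_TAGS.has(name)) disallowed.add(name);
  }
  return [...disallowed].sort();
}

const markdown =
  "## Review\n<p>Rate the <b>summary</b>.</p><script>alert(1)</script><video src=\"x.mp4\"></video>";
console.log(findDisallowedTags(markdown)); // [ 'script', 'video' ]
```

Being regex-based, it also reports Markdown autolinks such as <https://example.com> as a tag named "https", so treat its output as a prompt to review rather than a verdict.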
UnitField
UnitField objects define simple components for data collection.
Beta: Conditional Fields
Sometimes a field should only be presented if specific choices are selected for other fields. In these cases, you can specify the conditions: the dependent questions and the corresponding sets of choices.
The conditions property should be an array of objects, each of which defines one set of conditions that allows the field to be shown. The operators AND ({ }), OR ([ ]), and NOT (not) are supported, so you can specify an arbitrary combination of fields and choices. Each set may contain objects or arrays with the following:
Key: the field_id of the dependent field.
Value: an object specifying the desired choices for the dependent field.
For example conditions, see the Example below.
Conditions currently only work with dependent fields of type CategoryField. Using them with other dependent field types is syntactically valid, but may raise errors or cause undefined behavior.
Parameter | Type | Default | Description |
---|---|---|---|
type* | string | | One of |
field_id* | string | | A unique identifier for the field, which should not change among tasks within a project. |
title* | string | | Field title to be displayed to taskers. This should be short and singular. This may change among tasks within a project. Must not be an empty string. |
description | string | undefined | A brief description about what the response should be. This may change among tasks within a project. |
hint | string | undefined | Longer explanation of why the field exists and how it should be used. Renders as a tooltip. |
required | boolean | false | Determines whether or not a response for this field is required. |
min_responses_required | integer | 1 | The minimum number of separate annotations allowed for this field. Must be larger than 0. |
max_responses_required | integer | 1 | The maximum number of separate annotations allowed for this field. Must be larger than or equal to |
conditions | array_object | undefined | A set of conditions which must be satisfied for this field to be shown. |
Additional fields | | | See the TextField, BooleanField, NumberField, DatetimeField, and CategoryField sections. |
Example
// Example of UnitField with conditions
{
type: "category",
field_id: "occlusion",
title: "Is there occlusion in the image?",
choices: [{label: 'None', value: '0' },
{label: 'A little', value: '1'},
{label: 'A lot', value: '2'}],
conditions: [{}],
},
{
type: "category",
field_id: "occlusion_detail",
title: "What is the cause of the occlusion?",
choices: [{label: 'Rain', value: 'rain'},
{label: 'Shadow', value: 'shadow'}],
conditions: [{
occlusion: ['1', '2'], // show if 1 or 2 are selected
// equivalently {not: [[], ['0']]}
// equivalently [{not: []}, {not: ['0']}]
// equivalently [['1'],['2']]
}],
},
{
type: "text",
field_id: "a_lot_of_shadow",
title: "Please describe why there is so much shadow.",
conditions: [{
// show if 2 and shadow are selected in their respective fields
occlusion: ['2'],
occlusion_detail: ['shadow'],
}],
},
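To make the AND/OR structure concrete, here is a minimal sketch that evaluates the simple forms used in the example above. isFieldShown and the answers shape are hypothetical, not a platform API. It treats the outer array as OR, each object as AND, and each value list as "any of these choices is selected". It deliberately skips not and the nested-array equivalents, whose exact semantics are not fully specified here, so the platform's own evaluator remains authoritative.

```javascript
// Hypothetical helper, not part of the platform: decides whether a field with
// `conditions` would be shown, given the current answers.
// Simplified semantics (an assumption based on the example):
//   - `conditions` is an array of condition sets; the field is shown if ANY set matches.
//   - Each set is an object; it matches if EVERY key matches (an empty object always matches).
//   - Each key is a dependent field_id whose value is an array of choice values;
//     it matches if ANY of those values is currently selected.
// The `not` operator and nested-array forms are intentionally not handled.
function isFieldShown(conditions, answers) {
  if (!Array.isArray(conditions)) return true; // no conditions: always shown
  return conditions.some((set) =>
    Object.entries(set).every(([fieldId, wanted]) => {
      const selected = answers[fieldId] ?? [];
      return wanted.some((value) => selected.includes(value));
    }),
  );
}

// Answers keyed by field_id; CategoryField responses are arrays of selected values.
const answers = { occlusion: ["2"], occlusion_detail: ["shadow"] };
console.log(isFieldShown([{}], answers)); // true
console.log(isFieldShown([{ occlusion: ["1", "2"] }], answers)); // true
console.log(isFieldShown([{ occlusion: ["2"], occlusion_detail: ["rain"] }], answers)); // false
```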
TextField
Subclass of UnitField that returns a string response.
Example
{
"type": "text",
"field_id": "summary",
"title": "Summary",
"min_responses_required": 1,
"max_responses_required": 3,
"max_characters": 500,
"required": true
}
Parameter | Type | Default | Description |
---|---|---|---|
max_characters | integer | | The maximum number of characters allowed in the field. |
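As a rough illustration of how max_characters interacts with the response-count limits, the sketch below checks a TextField annotation before you rely on it. validateTextAnnotation is hypothetical; it assumes, following the callback format described later on this page, that the annotation is a single string when max_responses_required is 1 and an array of strings when it is greater than 1.

```javascript
// Hypothetical client-side check of a TextField annotation against its spec.
// Assumption (from the callback format): the annotation is a single string when
// max_responses_required is 1 and an array of strings when it is greater than 1.
function validateTextAnnotation(field, annotation) {
  const min = field.min_responses_required ?? 1;
  const max = field.max_responses_required ?? 1;
  const responses = max > 1 ? annotation : [annotation];
  if (!Array.isArray(responses)) return ["expected an array of strings"];

  const errors = [];
  if (responses.length < min) errors.push(`needs at least ${min} response(s)`);
  if (responses.length > max) errors.push(`allows at most ${max} response(s)`);
  responses.forEach((text, i) => {
    if (typeof text !== "string") {
      errors.push(`response ${i} is not a string`);
    } else if (field.max_characters !== undefined && text.length > field.max_characters) {
      // .length counts UTF-16 code units, not user-perceived characters.
      errors.push(`response ${i} exceeds ${field.max_characters} characters`);
    }
  });
  return errors;
}

const summaryField = {
  type: "text",
  field_id: "summary",
  title: "Summary",
  min_responses_required: 1,
  max_responses_required: 3,
  max_characters: 500,
  required: true,
};
console.log(validateTextAnnotation(summaryField, ["Short summary."])); // []
console.log(validateTextAnnotation(summaryField, ["a", "b", "c", "d"])); // [ 'allows at most 3 response(s)' ]
```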
BooleanField
Subclass of UnitField that returns a boolean response. Has no additional parameters.
Example
{
"type": "boolean",
"field_id": "availability",
"title": "Item Availability",
"description": "Choose true if available."
}
NumberField
Subclass of UnitField that returns a string response based on the annotated number.
Example
{
"type": "number",
"field_id": "item_price",
"title": "Item Price",
"description": "Leave empty if not applicable.",
"required": false,
"use_slider": true,
"min": 0,
"max": 100
}
Parameter | Type | Default | Description |
---|---|---|---|
use_slider | boolean | | Set to |
min | float | | Sets the minimum value of the slider. |
max | float | | Sets the maximum value of the slider. |
step | float | | Sets the step value of the slider. |
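Because a NumberField annotation is returned as a string, consumers typically parse it before doing arithmetic. The sketch below shows one way to do that, plus a check against the slider bounds. Both functions are hypothetical, and the step rule (whole steps counted from min, as with an HTML range input) is an assumption rather than documented slider behavior.

```javascript
// Hypothetical helper: NumberField annotations arrive as strings (for example
// "11.79"), so convert them before doing arithmetic. Returns null for an empty
// or non-numeric value, e.g. when a non-required field was left blank.
function parseNumberAnnotation(value) {
  if (typeof value !== "string" || value.trim() === "") return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

// Assumption: with use_slider enabled, a value should lie within [min, max]
// and, if step is set, on a whole number of steps counted from min (as with
// an HTML range input). Fields without use_slider are not checked.
function fitsSlider(field, n) {
  if (!field.use_slider) return true;
  if (field.min !== undefined && n < field.min) return false;
  if (field.max !== undefined && n > field.max) return false;
  if (field.step !== undefined) {
    const steps = (n - (field.min ?? 0)) / field.step;
    return Math.abs(steps - Math.round(steps)) < 1e-9; // tolerate float rounding
  }
  return true;
}

const priceField = { type: "number", field_id: "item_price", use_slider: true, min: 0, max: 100 };
const price = parseNumberAnnotation("11.79");
console.log(price, fitsSlider(priceField, price)); // 11.79 true
```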
DatetimeField
Subclass of UnitField that returns a DatetimeAnnotation response.
Definition: DatetimeSpec
An enum that consists of year, month, day, hour, and minute.
Definition: DatetimeAnnotation
An interface that contains optional number fields: year, month, day, hour, and minute.
Example
{
"type": "datetime",
"field_id": "release_date",
"title": "Date of Product Release",
"description": "Leave empty if not applicable.",
"include": ["year", "month", "day"],
"defaults": {
"year": 2021,
"month": 4,
"day": 13
}
}
Parameter | Type | Default | Description |
---|---|---|---|
include* | array | | An array of |
defaults | DatetimeAnnotation | | Default value for the response. |
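The include list and DatetimeAnnotation parts map naturally onto a JavaScript Date. The sketch below is hypothetical and rests on assumptions the docs do not state: month is 1-based (4 is April), parts that are missing or not listed in include fall back to the earliest value, and the annotation is interpreted as UTC because it carries no time zone.

```javascript
// Hypothetical helper: turn a DatetimeAnnotation into a UTC Date.
// Assumptions (not confirmed by the API docs): month is 1-based, missing or
// excluded parts fall back to the earliest value, and UTC is used.
const DATETIME_SPEC = ["year", "month", "day", "hour", "minute"];

function annotationToDate(annotation, include = DATETIME_SPEC) {
  const part = (name, fallback) =>
    include.includes(name) && Number.isInteger(annotation[name]) ? annotation[name] : fallback;
  const year = part("year", null);
  if (year === null) return null; // no year: cannot build a calendar date
  return new Date(Date.UTC(
    year,
    part("month", 1) - 1, // Date.UTC expects a 0-based month
    part("day", 1),
    part("hour", 0),
    part("minute", 0),
  ));
}

const releaseDateField = {
  type: "datetime",
  field_id: "release_date",
  include: ["year", "month", "day"],
  defaults: { year: 2021, month: 4, day: 13 },
};
const d = annotationToDate(releaseDateField.defaults, releaseDateField.include);
console.log(d.toISOString()); // 2021-04-13T00:00:00.000Z
```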
CategoryField
Subclass of UnitField that returns an array of selected CategoryChoiceValue elements in its response.
CategoryChoice elements with subchoices are only used for navigation; the only selectable CategoryChoice elements are those with no subchoices.
Example
{
"type": "category",
"field_id": "genre",
"title": "Select all genres that apply.",
"choices": [
{
"label": "Hip-Hop/Rap",
"value": "hip-hop-rap",
"hint":
"It consists of a stylized rhythmic music that commonly accompanies rapping, a rhythmic and rhyming speech that is chanted.",
"subchoices": [
{ "label": "Dirty South", "value": "dirty-south" },
{ "label": "Industrial Hip Hop", "value": "industrial-hip-hop" },
{ "label": "Nerdcore", "value": "nerdcore" },
{ "label": "Rap", "value": "rap" }
]
},
{
"label": "R&B/Soul",
"value": "rb-soul",
"subchoices": [
{ "label": "Disco", "value": "disco" },
{ "label": "Funk", "value": "funk" },
{ "label": "Motown", "value": "motown" }
]
}
],
"min_choices": 1,
"max_choices": 5
}
Parameter | Type | Default | Description |
---|---|---|---|
choices* | array | | An array of |
min_choices | integer | | Minimum number of choices to select. |
max_choices | integer | | Maximum number of choices to select. If this value is greater than 1, the form renders checkboxes; otherwise, it renders radio buttons. |
CategoryChoice
Parameter | Type | Default | Description |
---|---|---|---|
label* | string | | The label of the choice field. This label may change among tasks within a project. |
value* | CategoryChoiceValue | | The value of the choice field. Must be a |
hint | string | undefined | An array of |
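The selectable-leaf rule and the checkbox/radio rule above can be expressed as small helpers. These are hypothetical, not a platform API; treating a missing max_choices as 1 is an assumption, since the default is not shown here.

```javascript
// Hypothetical helpers for a CategoryField spec; not a platform API.
// Choices with subchoices are navigation-only, so only leaf values are selectable.
function selectableValues(choices) {
  return choices.flatMap((choice) =>
    Array.isArray(choice.subchoices) && choice.subchoices.length > 0
      ? selectableValues(choice.subchoices)
      : [choice.value],
  );
}

// max_choices > 1 renders checkboxes, otherwise radio buttons.
// Assumption: a missing max_choices is treated as 1.
function widgetFor(field) {
  return (field.max_choices ?? 1) > 1 ? "checkbox" : "radio";
}

// Checks a response (array of selected values) against the spec.
function validateCategoryAnnotation(field, selected) {
  const allowed = new Set(selectableValues(field.choices));
  const errors = selected
    .filter((value) => !allowed.has(value))
    .map((value) => `"${value}" is not a selectable choice`);
  if (field.min_choices !== undefined && selected.length < field.min_choices) {
    errors.push(`select at least ${field.min_choices}`);
  }
  if (field.max_choices !== undefined && selected.length > field.max_choices) {
    errors.push(`select at most ${field.max_choices}`);
  }
  return errors;
}

// Trimmed version of the genre example.
const genreField = {
  type: "category",
  field_id: "genre",
  title: "Select all genres that apply.",
  choices: [
    { label: "Hip-Hop/Rap", value: "hip-hop-rap", subchoices: [
      { label: "Nerdcore", value: "nerdcore" },
      { label: "Rap", value: "rap" },
    ] },
    { label: "R&B/Soul", value: "rb-soul", subchoices: [
      { label: "Funk", value: "funk" },
    ] },
  ],
  min_choices: 1,
  max_choices: 5,
};
console.log(selectableValues(genreField.choices)); // [ 'nerdcore', 'rap', 'funk' ]
console.log(widgetFor(genreField)); // checkbox
console.log(validateCategoryAnnotation(genreField, ["hip-hop-rap"]));
```

The last call flags "hip-hop-rap", because a parent choice only groups its subchoices and cannot itself be selected.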
TimerangeField
Subclass of UnitField.
Example
{
"type": "time_range",
"field_id": "hours",
"title": "Store Hours",
"defaults_seconds": [
28800,
72000
],
"increment_seconds": 300,
"max_responses_required": 2,
"min_responses_required": 0
}
Parameter | Type | Default | Description |
---|---|---|---|
default_seconds* | array | | Must have length 2, and be in range [0, 24 * 60 * 60]. |
increment_seconds | number | | Must be between 1 and 60 * 60. |
default_from_field | string | | Must be a valid field_id. |
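The range values appear to be seconds since midnight (28800 is 08:00 and 72000 is 20:00 in the store-hours example), so the constraints above are easy to check before submitting a task. This sketch is hypothetical: checkTimerangeSpec takes the default pair and the increment directly rather than reading them from a field spec, and formatSeconds simply renders a value as HH:MM.

```javascript
const SECONDS_PER_DAY = 24 * 60 * 60;

// Hypothetical check mirroring the table: a default range of exactly two values
// in [0, 24 * 60 * 60], and an increment between 1 and 60 * 60 seconds.
function checkTimerangeSpec(defaultSeconds, incrementSeconds) {
  const errors = [];
  if (!Array.isArray(defaultSeconds) || defaultSeconds.length !== 2) {
    errors.push("default range must have length 2");
  } else if (defaultSeconds.some((s) => s < 0 || s > SECONDS_PER_DAY)) {
    errors.push(`default range values must be in [0, ${SECONDS_PER_DAY}]`);
  }
  if (incrementSeconds !== undefined && (incrementSeconds < 1 || incrementSeconds > 60 * 60)) {
    errors.push("increment_seconds must be between 1 and 3600");
  }
  return errors;
}

// Seconds since midnight -> "HH:MM", e.g. for displaying a returned range.
function formatSeconds(seconds) {
  const h = Math.floor(seconds / 3600);
  const m = Math.floor((seconds % 3600) / 60);
  return `${String(h).padStart(2, "0")}:${String(m).padStart(2, "0")}`;
}

console.log(checkTimerangeSpec([28800, 72000], 300)); // []
console.log([28800, 72000].map(formatSeconds)); // [ '08:00', '20:00' ]
```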
SelectField
Subclass of UnitField.
Example
{
"type": "select",
"field_id": "sentiment",
"title": "Sentiment",
"description": "Choose a sentiment that best describes this text",
"required": true,
"choices_from_field": "Options"
}
Parameter | Type | Default | Description |
---|---|---|---|
choices | array | | An array of selectable options, |
choices_from_field | string | | Must be a valid field_id. |
RankingField
RankingField objects allow you to define a task that asks taskers to rank task attachments.
Returns a list response with the options in ranked order.
Example
{
"type": "ranking_order",
"field_id": "relevance_ranking",
"title": "Rank titles based on their relevance to the article",
"hint": "From the most relevant to the least one",
"first_label": "Best",
"last_label": "Worst",
"num_items_to_rank": 3
}
Parameter | Type | Default | Description |
---|---|---|---|
title | string | | Field title to be displayed to taskers. This may change among tasks within a project. |
hint | string | | Longer explanation of why the field exists and how it should be used. Renders as a tooltip. |
first_label | string | | Label for the first (top) position in the ranking, for example "Best". |
last_label | string | | Label for the last (bottom) position in the ranking, for example "Worst". |
num_items_to_rank | integer | | The number of options that must be ranked (can be less than the number of attachments). |
required | boolean | false | Determines whether or not all |
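The exact shape of a RankingField response is not spelled out beyond "a list with ordered options", so the following is a hypothetical consistency check built on that reading. checkRanking and the option identifiers are illustrative assumptions: the response is taken to be an array of option identifiers ordered from first_label to last_label, with no repeats.

```javascript
// Hypothetical check of a RankingField response. Assumption: the response is
// an array of option identifiers ordered from first_label (e.g. "Best") to
// last_label (e.g. "Worst"), without repeats, drawn from the available options.
function checkRanking(field, ranking, availableOptions) {
  const errors = [];
  const expected = field.num_items_to_rank ?? availableOptions.length;
  if (ranking.length !== expected) {
    errors.push(`expected ${expected} ranked items, got ${ranking.length}`);
  }
  if (new Set(ranking).size !== ranking.length) {
    errors.push("an item is ranked more than once");
  }
  for (const item of ranking) {
    if (!availableOptions.includes(item)) errors.push(`unknown item "${item}"`);
  }
  return errors;
}

const rankingField = {
  type: "ranking_order",
  field_id: "relevance_ranking",
  first_label: "Best",
  last_label: "Worst",
  num_items_to_rank: 3,
};
// Hypothetical identifiers for four attachments; only three need ranking.
const options = ["title_a", "title_b", "title_c", "title_d"];
console.log(checkRanking(rankingField, ["title_c", "title_a", "title_d"], options)); // []
```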
FormField
FormField objects allow you to create several mini-forms associated with different attachments. These mini-forms are populated by the object's child fields.
Returns a dictionary response with key-value pairs defined by its child fields.
Note: FormField objects can only be located on the top level of the fields task parameter. If one FormField object is used, all the other top-level objects must also be FormField objects.
Example
{
"type": "form",
"field_id": "form_query",
"title": "Query Intention",
"fields": [
{
"type": "text",
"field_id": "query_intention",
"title": "Query Intention",
"hint": "Please investigate the search links."
}
]
}
Parameter | Type | Default | Description |
---|---|---|---|
type* | string | | For |
field_id* | string | | A unique identifier for the field, which should not change among tasks within a project. |
title* | string | | Field title to be displayed to taskers. This should be short and singular. This may change among tasks within a project. |
description | string | | A brief description about what the response should be. This may change among tasks within a project. |
fields* | array | | An array of child UnitField and FieldSet objects. Any FieldSet objects here must have inline set to true. |
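The FormField placement rule is easy to violate when fields are generated programmatically, so a pre-flight check can help. The sketch below is hypothetical, not a platform API, and assumes that nested containers (FormField and FieldSet) list their children under a fields key, as FormField does.

```javascript
// Hypothetical pre-flight check for the FormField placement rule: FormField
// objects may only appear at the top level of fields, and if one top-level
// field is a FormField, all of them must be.
function checkFormFieldPlacement(fields) {
  const errors = [];
  const formCount = fields.filter((f) => f.type === "form").length;
  if (formCount > 0 && formCount !== fields.length) {
    errors.push("if one top-level field is a FormField, all top-level fields must be");
  }
  // Walk every nested child list (assumed to live under `fields`) for stray forms.
  const visitChildren = (parent) => {
    for (const child of parent.fields ?? []) {
      if (child.type === "form") {
        errors.push(`FormField "${child.field_id}" is not at the top level`);
      }
      visitChildren(child);
    }
  };
  fields.forEach((field) => visitChildren(field));
  return errors;
}

const formQuery = {
  type: "form",
  field_id: "form_query",
  title: "Query Intention",
  fields: [{ type: "text", field_id: "query_intention", title: "Query Intention" }],
};
console.log(checkFormFieldPlacement([formQuery])); // []
console.log(checkFormFieldPlacement([formQuery, { type: "text", field_id: "notes", title: "Notes" }]));
```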
Text Collection Callback Format
The response object, which is part of the callback POST request and permanently stored as part of the task object, has an annotations field. The annotations object is a dictionary in which each key is a field_id defined in the task parameters and each value is the annotation for that field.
Each annotation is of the type defined by its field above. If max_responses_required is applicable and greater than 1, the annotation is an array of that type.
Note: See the Callback section for more details about callbacks.
Example
{
"response": {
"annotations": {
"category_name": "Soup", //TextField
"category_items": [ //FieldSet with max_responses_required greater than one
{
"item_name": "Tom Yum Chicken Soup", //TextField
"item_price": "11.79" //NumberField
},
{
"item_name": "Tom Yum Beef Soup", //TextField
"item_price": "11.79" //NumberField
}
],
"category_metadata": { //FieldSet
"gluten_friendly": true, //BooleanField
"labels": [ //TextField with max_responses_required greater than one
"Free Range",
"All Natural"
]
}
}
},
"task_id": "5774cc78b01249ab09f089dd",
"task": {
// populated task for convenience
}
}
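Reading a callback comes down to looking up annotations by field_id and remembering that NumberField values are strings. The sketch below is a hypothetical handler fragment for the example payload above; summarizeMenuCategory is illustrative, and HTTP handling and any authentication are omitted.

```javascript
// Hypothetical callback handler fragment: reads annotations by field_id from the
// example payload. NumberField values arrive as strings, so they are converted
// with Number() before summing.
function summarizeMenuCategory(callbackBody) {
  const { annotations } = callbackBody.response;
  const items = annotations.category_items ?? []; // array because max_responses_required > 1
  const total = items.reduce((sum, item) => sum + Number(item.item_price), 0);
  return {
    category: annotations.category_name,
    itemCount: items.length,
    totalPrice: Math.round(total * 100) / 100, // round to cents for display
    labels: annotations.category_metadata?.labels ?? [],
  };
}

const body = {
  response: {
    annotations: {
      category_name: "Soup",
      category_items: [
        { item_name: "Tom Yum Chicken Soup", item_price: "11.79" },
        { item_name: "Tom Yum Beef Soup", item_price: "11.79" },
      ],
      category_metadata: { gluten_friendly: true, labels: ["Free Range", "All Natural"] },
    },
  },
  task_id: "5774cc78b01249ab09f089dd",
};
console.log(summarizeMenuCategory(body));
```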
Text Collection Hypothesis
When creating a textcollection task, you can provide pre-labels in the hypothesis field so that workers don't have to start annotating the task from scratch.
To add pre-labels to a task, provide them in the hypothesis field of the payload when creating the task. The schema of the hypothesis object must match the schema of the task response:
1. Verify the task response field schema for the desired task type.
2. Review your project taxonomy (label names, attribute conditions, annotation types, etc.).
3. Generate pre-labels formatted to match that schema and taxonomy.
4. Create a task that includes a hypothesis field containing the pre-labels, at the same top level as other task fields such as project and instructions.
The hypothesis format largely mirrors Scale’s task response format. For this task type, the annotations field is mandatory inside the hypothesis object.
The only difference between the hypothesis and response formats is that every field you want to pre-annotate needs two additional keys:
type: the field type (category, select, text, etc.)
field_id: the identifier given to this field for tracking (the field name)
You can find both values in your task taxonomy.
Note: For text fields, the response format differs from the other types. For this field type, the response field is an array containing a single string instead of an array of arrays containing strings.
task_payload_with_hypothesis
{
...
"batch": "regular_batch_name",
"hypothesis": {
"annotations": {
"(EXAMPLE) Multiple Choice Question": {
"type": "category",
"field_id": "(EXAMPLE) Multiple Choice Question",
"response": [
[
"B"
]
]
}
}
},
...
}
task_taxonomy
{
"fields": [
{
"type": "category",
"field_id": "(EXAMPLE) Multiple Choice Question",
"title": "Which option best fits this task?",
"choices": [
{
"label": "A",
"value": "A"
},
{
"label": "B",
"value": "B"
},
{
"label": "C",
"value": "C"
}
],
"min_choices": 1,
"max_choices": 1,
"description": "Select one of the following. "
}
]
}
task_payload_with_hypothesis_text_field
{
...
"hypothesis": {
"annotations": {
"(EXAMPLE) Text Input Field": {
"type": "text",
"field_id": "(EXAMPLE) Text Input Field",
"response": [
"Dolore in dolor occaecat deserunt ex in qui non amet est."
]
}
}
},
...
}
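The wrapping rules above (an array holding one string for text fields, an array of arrays of strings for other types, plus type and field_id copied from the taxonomy) can be applied mechanically. buildHypothesis is a hypothetical helper, not a Scale SDK function. It keys annotations by field_id, and it assumes all values chosen for one field go into a single inner array, which matches the one-choice example but is unverified for multi-choice fields.

```javascript
// Hypothetical builder for the hypothesis payload; not a Scale SDK function.
// Each pre-annotated field repeats its type and field_id from the taxonomy, and
// annotations are keyed by field_id. Text fields take an array with a single
// string; other types (e.g. category) take an array of arrays of strings.
// Assumption: all values for one field go into a single inner array.
function buildHypothesis(taxonomyFields, prelabels) {
  const annotations = {};
  for (const field of taxonomyFields) {
    if (!(field.field_id in prelabels)) continue; // leave other fields unlabeled
    const value = prelabels[field.field_id];
    annotations[field.field_id] = {
      type: field.type,
      field_id: field.field_id,
      response: field.type === "text" ? [String(value)] : [[].concat(value).map(String)],
    };
  }
  return { annotations };
}

const taxonomyFields = [
  {
    type: "category",
    field_id: "(EXAMPLE) Multiple Choice Question",
    choices: [{ label: "A", value: "A" }, { label: "B", value: "B" }, { label: "C", value: "C" }],
    min_choices: 1,
    max_choices: 1,
  },
  { type: "text", field_id: "(EXAMPLE) Text Input Field" },
];
const hypothesis = buildHypothesis(taxonomyFields, {
  "(EXAMPLE) Multiple Choice Question": "B",
  "(EXAMPLE) Text Input Field": "Dolore in dolor occaecat deserunt ex in qui non amet est.",
});
// Merge into the task payload alongside project, instructions, etc.
const taskPayload = { batch: "regular_batch_name", hypothesis };
console.log(JSON.stringify(taskPayload, null, 2));
```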
NamedEntityRecognitionLabel
NamedEntityRecognitionLabel objects define the taxonomy of labels to use to annotate spans of text.
NamedEntityRecognitionAttribute objects define form fields for individual annotations.
AttributeSelectOption objects define possible values for select attributes.
NamedEntityRecognitionLabel
Parameter | Type | Default | Description |
---|---|---|---|
name* | string | | A unique identifier for this label. |
display_name | string | | An alias for this label to display to taskers. |
description | string | | A description of what this label should represent. Displayed to taskers to improve quality. |
children | array_object | | An array of |
attributes (optional) | object | | |
NamedEntityRecognitionAttribute
Parameter | Type | Description |
---|---|---|
type | string | Only 'select' for now. |
options | array_object | List of select option objects. |
display_name | string | Optional display name. |
description | string | Optional description. |
AttributeSelectOption
Parameter | Type | Description |
---|---|---|
value | string | The value that will show up in the response if this option is selected. |
display_name | string | Optional display name if different from the value. |
NamedEntityRecognitionRelationshipDefinition
NamedEntityRecognitionRelationshipDefinition objects specify the types of relationship that can exist between two text spans.
A relationship can either be named or unnamed. A named relationship is useful if you need to distinguish between multiple types of relationship that could exist between the same two text spans. For instance, if you're annotating a description of someone's family history, you might want to distinguish a "child of" relationship from a "sibling of" relationship.
A task cannot mix named and unnamed relationships: either all the relationships in a task must be named, or all must be unnamed.
Parameter | Type | Default | Description |
---|---|---|---|
name | string | | A unique identifier for this type of relationship. Required for named relationships; disallowed for unnamed relationships. |
display_name | string | | A description for this relationship to display to taskers. Should be able to be used to construct a short phrase describing the relationship. For example, a relationship between two text spans "A" and "B" with |
is_directed | boolean | false | Indicates whether the direction of this relationship matters. For example, an "is parent of" relationship would likely be directed, whereas an "is sibling of" relationship would likely not be. Optional for named relationships; disallowed for unnamed relationships. |
source_label | string | | A string referencing the |
target_label | string | | A string referencing the |
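The rules for relationship definitions (all named or all unnamed, unique names, is_directed only on named relationships) can be checked before a task is created. checkRelationshipDefinitions is hypothetical, and it assumes source_label and target_label reference the name of a NamedEntityRecognitionLabel; the display_name in the usage example is likewise illustrative.

```javascript
// Hypothetical validation of NamedEntityRecognitionRelationshipDefinition objects:
// a task's relationships are all named or all unnamed, names are unique, and
// is_directed is disallowed on unnamed ones.
// Assumption: source_label / target_label reference a NamedEntityRecognitionLabel name.
function checkRelationshipDefinitions(definitions, labelNames) {
  const errors = [];
  const named = definitions.filter((d) => d.name !== undefined);
  if (named.length > 0 && named.length !== definitions.length) {
    errors.push("relationships must be all named or all unnamed");
  }
  const names = named.map((d) => d.name);
  if (new Set(names).size !== names.length) {
    errors.push("relationship names must be unique");
  }
  for (const d of definitions) {
    if (d.name === undefined && d.is_directed !== undefined) {
      errors.push("is_directed is not allowed on unnamed relationships");
    }
    for (const key of ["source_label", "target_label"]) {
      if (d[key] !== undefined && !labelNames.includes(d[key])) {
        errors.push(`${key} "${d[key]}" is not a defined label`);
      }
    }
  }
  return errors;
}

const labelNames = ["person", "conference"];
// Hypothetical named, directed relationship between the two labels.
const speakerAt = {
  name: "speaker_at",
  display_name: "is a speaker at",
  is_directed: true,
  source_label: "person",
  target_label: "conference",
};
console.log(checkRelationshipDefinitions([speakerAt], labelNames)); // []
console.log(checkRelationshipDefinitions([speakerAt, {}], labelNames)); // [ 'relationships must be all named or all unnamed' ]
```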
Named Entity Recognition Callback Format
The response object is part of the callback POST request and is permanently stored as part of the task object.
NamedEntityRecognitionResponse
The structure of a response object for named entity recognition consists of two arrays: one for entity annotations and another for relationships between these entities.
NamedEntityRecognitionAnnotation
Each entity annotation in the response records a unique identifier, the position and content of the recognized text span, its label, and any optional attributes.
NamedEntityRecognitionRelationship
In tasks with undirected relationships, the source_ref and target_ref fields are interchangeable. In tasks whose links do not have relationship names, the name field is left blank.
Example
{
"annotations": [
{
"id": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
"start": 10,
"end": 19,
"text": "Alex Wang",
"label": "person"
},
{
"id": "a76da53e-4ebd-4466-aed7-80db6fb98329",
"start": 22,
"end": 31,
"text": "Transform",
"label": "conference"
}
],
"relationships": [
{
"id": "ade8e9e9-ef9c-4fc7-9517-62d79a15c1cb",
"source_ref": "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
"target_ref": "a76da53e-4ebd-4466-aed7-80db6fb98329",
"name": "speaker_at"
}
]
}
NamedEntityRecognitionResponse
Field | Type | Description |
---|---|---|
annotations | object array | List of |
relationships | object array | List of |
NamedEntityRecognitionAnnotation
Field | Type | Description |
---|---|---|
id | string | Unique identifier. |
start | number | Start index of the text span. |
end | number | End index of the text span. |
text | string | Text of the text span. |
label | string | References the |
attributes (optional) | object | The keys of the object reference keys of the |
NamedEntityRecognitionRelationship
Field | Type | Description |
---|---|---|
id | string | Unique identifier. |
source_ref | string | References the |
target_ref | string | References the |
name (optional) | string | References the |
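Consumers usually want each relationship expressed in terms of the text it connects. The sketch below is a hypothetical helper that resolves source_ref and target_ref to annotation text. For undirected tasks, where the two refs are interchangeable, it sorts the pair to give a stable order, and it shows a blank name as "(unnamed)".

```javascript
// Hypothetical helper: resolves each relationship in a NamedEntityRecognitionResponse
// to the text of its two spans via source_ref and target_ref. Pass directed = false
// for tasks with undirected relationships; the pair is then sorted for a stable order.
function describeRelationships(response, directed = true) {
  const byId = new Map(response.annotations.map((a) => [a.id, a]));
  return response.relationships.map((rel) => {
    const source = byId.get(rel.source_ref);
    const target = byId.get(rel.target_ref);
    if (!source || !target) {
      throw new Error(`relationship ${rel.id} references an unknown annotation`);
    }
    const pair = [source.text, target.text];
    if (!directed) pair.sort();
    return { name: rel.name || "(unnamed)", from: pair[0], to: pair[1] };
  });
}

// Abridged from the callback example (start/end offsets omitted).
const nerResponse = {
  annotations: [
    { id: "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b", text: "Alex Wang", label: "person" },
    { id: "a76da53e-4ebd-4466-aed7-80db6fb98329", text: "Transform", label: "conference" },
  ],
  relationships: [
    {
      id: "ade8e9e9-ef9c-4fc7-9517-62d79a15c1cb",
      source_ref: "b86c22a3-1f7c-4be2-bb8f-899ee9324c0b",
      target_ref: "a76da53e-4ebd-4466-aed7-80db6fb98329",
      name: "speaker_at",
    },
  ],
};
console.log(describeRelationships(nerResponse));
```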