Documentation

Contents

How to guides | How to call the Specrom Text Analytics REST APIs | Detect Language | Extract Keywords from Text | Get Sentiments Score from Text

I. How to guides

1. Sign up

Specrom Text Analytics resources are available 24/7 in the cloud. Before you can upload your content for analysis, you must sign up to get an access key; each call to the API must include that key in the request.

Start with a free Algorithmia account, or contact us to set up an account directly with us. The free account lets you experiment at no charge.

For Text Analytics, there is a free tier for exploration and evaluation, and billable tiers for production workloads. The Algorithmia platform works well for small to medium loads; for higher loads, we recommend requesting access to our own AWS EC2 cloud instances.

Back to top

2. Get an access key

Once you have signed up for an Algorithmia account, go to your dashboard and select My API keys (shown below). You can also generate new API keys from that page.

API keys

Back to top

3. How to call the Specrom Text Analytics REST APIs

Calls to the Text Analytics API are HTTP POST/GET calls, which you can formulate in any language. In this article, we use REST and Postman to demonstrate key concepts.

Each request must include your access key and an HTTP endpoint. The endpoint specifies the region you chose during sign up, the service URL, and a resource used on the request: sentiment, keyphrases, languages, and entities.

Recall that Text Analytics is stateless so there are no data assets to manage. Your text is uploaded, analyzed upon receipt, and results are returned immediately to the calling application.

3.1 Prerequisites

You must have an API account, either directly with us or at Algorithmia, as shown in the sign-up section above.

You must also have the endpoint and access key that are generated for you when you sign up for an account with us or at Algorithmia.

3.2 JSON Schema Definition

You can currently submit the same documents for all Text Analytics operations: sentiment, key phrase, language detection, and entity identification. (The schema is likely to vary for each analysis in the future.)

  • id: The data type is string, but in practice document IDs tend to be integers. The system uses the IDs you provide to structure the output. Language codes, key phrases, and sentiment scores are generated for each ID in the request.
  • text: Unstructured raw text, up to 5,000 characters. For language detection, text can be expressed in any language. For sentiment analysis, key phrase extraction and entity identification, the text must be in a supported language.
  • documents: A collection of the id and text entries described above.
{ "documents": 
  [
    { "id": "1", "text": "Late in the 21st century, man develops artificial intelligence." },
    { "id": "2", "text": "Don Vito Corleone, head of a mafia family, decides to hand over his empire to his youngest son Michael." },
    { "id": "3", "text": "During WWII, Rick, a nightclub owner in Casablanca, agrees to help his former lover Ilsa and her husband."}
  ]
}
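
If you are assembling the request body programmatically, the collection can be built from plain strings. The following is a minimal sketch in Python; the build_documents helper is a hypothetical name, and the length check reflects the 5,000-character per-document limit noted above.

import json

MAX_CHARS = 5000  # documented per-document character limit

def build_documents(texts):
    # Assemble the documents collection in the schema shown above.
    documents = []
    for i, text in enumerate(texts, start=1):
        if len(text) > MAX_CHARS:
            raise ValueError(f"Document {i} exceeds {MAX_CHARS} characters")
        documents.append({"id": str(i), "text": text})
    return {"documents": documents}

payload = build_documents([
    "Late in the 21st century, man develops artificial intelligence.",
    "Don Vito Corleone, head of a mafia family, decides to hand over his empire to his youngest son Michael.",
])
print(json.dumps(payload, indent=2))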

3.3 Set up a request in Postman

The service accepts requests up to 1 MB in size. If you are using Postman (or another Web API test tool), set up the endpoint to include the resource you want to use, and provide the access key in a request header. Each operation requires that you append the appropriate resource to the endpoint.

  1. In Postman:

    • Choose POST as the request type.
    • Paste in the endpoint you copied from the portal page.
    • Append a resource.

  2. Set the two request headers:

    • Authorization: your access key, obtained from the Algorithmia platform.
    • Content-Type: application/json.
  3. Click Body and choose raw for the format.

  4. Paste in some JSON documents in a format that is valid for the intended analysis. For more information about a particular analysis, see the topics below.

  5. Click Send to submit the request. You can submit up to 100 requests per minute.

In Postman, the response is displayed in the next window down, as a single JSON document, with an item for each document ID provided in the request.
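
If you prefer to make the same call from code instead of Postman, the following is a minimal sketch using Python's requests library. The endpoint URL and access key are placeholders that you must replace with the values from your dashboard; the two headers match the ones listed above, and the exact request contract is defined by the individual API pages.

import requests

ENDPOINT = "https://<your-endpoint>/<resource>"  # placeholder: your endpoint plus the resource (for example, languages)
ACCESS_KEY = "<your-access-key>"                 # placeholder: key from the Algorithmia dashboard

headers = {
    "Authorization": ACCESS_KEY,         # your access key
    "Content-Type": "application/json",  # the body is JSON
}

payload = {
    "documents": [
        {"id": "1", "text": "Late in the 21st century, man develops artificial intelligence."}
    ]
}

response = requests.post(ENDPOINT, headers=headers, json=payload)
response.raise_for_status()
print(response.json())  # one item per document ID in the request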

Back to top

4. Detect Language

This algorithm returns ISO 639-1 language codes and normalized probability scores (0-1); the underlying pretrained model supports 97 languages.

It is often useful to detect the language of text before applying further text processing APIs. For example, if you load thousands of tweets for further processing such as named entity recognition, it is important to make sure you only select tweets in languages your model supports (English, Spanish, etc.), and our NaturalLanguageDetection API can help you do that preprocessing quickly.

The complete list of languages supported by the pretrained model is: af, am, an, ar, as, az, be, bg, bn, br, bs, ca, cs, cy, da, de, dz, el, en, eo, es, et, eu, fa, fi, fo, fr, ga, gl, gu, he, hi, hr, ht, hu, hy, id, is, it, ja, jv, ka, kk, km, kn, ko, ku, ky, la, lb, lo, lt, lv, mg, mk, ml, mn, mr, ms, mt, nb, ne, nl, nn, no, oc, or, pa, pl, ps, pt, qu, ro, ru, rw, se, si, sk, sl, sq, sr, sv, sw, ta, te, th, tl, tr, ug, uk, ur, vi, vo, wa, xh, zh, zu.

4.1 Preparation

You must have JSON documents in this format: id, text

Document size must be under 5,000 characters per document, and you can have up to 1,000 items (IDs) per collection. The collection is submitted in the body of the request. The following is an example of content you might submit for language detection.


{ "documents": [
    { "id": "1", "text": "This is a document written in English." },
    { "id": "2", "text": "Este es un document escrito en Español." },
    { "id": "3", "text": "这是一个用中文写的文件" }]}
    

Step 1: Structure the request

Details on request definition can be found in How to call the Specrom Text Analytics REST APIs (section 3). The following points are restated for convenience:

  • Create a POST request. Review the API documentation for this request: Language Detection API

  • Set the HTTP endpoint for language detection. It must include the language detection resource: https://algorithmia.com/algorithms/specrom/NaturalLanguageDetection

  • Set a request header to include the access key for Text Analytics operations. For more information, see How to find endpoints and access keys.

  • In the request body, provide the JSON documents collection you prepared for this analysis.

Step 2: Post the request

Analysis is performed upon receipt of the request. The service accepts up to 100 requests per minute. Each request can be a maximum of 1 MB.

Recall that the service is stateless. No data is stored in your account. Results are returned immediately in the response.

Step 3: View results

{
  "documents": [
    {
      "Detected_language": [
        {
          "ISO631-1_language_code": "en",
          "normalized_probability": 0.9999999998851724
        }
      ],
      "id": "1"
    },
    {
      "Detected_language": [
        {
          "ISO631-1_language_code": "es",
          "normalized_probability": 0.9999992791970168
        }
      ],
      "id": "2"
    },
    {
      "Detected_language": [
        {
          "ISO631-1_language_code": "zh",
          "normalized_probability": 1
        }
      ],
      "id": "3"
    }
  ]
}

All POST requests return a JSON formatted response with the IDs and detected properties.

Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system, and then import it into an application that allows you to sort, search, and manipulate the data.

Results for the example request should look like the JSON shown above. Notice that it is a single response with one item per document ID in the request. Each language identifier is a code in ISO 639-1 format, accompanied by a normalized probability score.

A normalized_probability score of 1.0 expresses the highest possible confidence level of the analysis.
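
As an illustration, a minimal Python sketch that reads the detected code and score out of a parsed response might look like this. The field names are taken from the sample output above, and results is assumed to hold the parsed JSON (for example, response.json()).

# Assumes `results` holds the parsed JSON response shown above.
for doc in results["documents"]:
    detection = doc["Detected_language"][0]      # highest-confidence detection
    code = detection["ISO631-1_language_code"]   # field name as it appears in the sample output
    score = detection["normalized_probability"]
    print(f"Document {doc['id']}: {code} ({score:.4f})")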

Back to top

5. Extract Keywords from Text

The Key Phrase Extraction API evaluates unstructured text, and for each JSON document, returns a list of key phrases.

This capability is useful if you need to quickly identify the main points in a collection of documents. For example, given input text about the movie Godfather “Don Vito Corleone, head of a mafia family, decides to hand over his empire to his youngest son Michael. However, his decision unintentionally puts the lives of his loved ones in grave danger.”, the service returns the main talking points: “family”, “corleone”, “unintentionally”, “son”.

5.1 Preparation

Key phrase extraction works best when you give it bigger chunks of text to work on. This is opposite from sentiment analysis, which performs better on smaller blocks of text. To get the best results from both operations, consider restructuring the inputs accordingly.

You must have JSON documents in this format: documents, id, text.

{ "documents": [
    { "id": "1", "text": "Late in the 21st century, man develops artificial intelligence (referred to simply as the Machines). The Machines take control of Earth. Man fights back." },
    { "id": "2", "text": "Don Vito Corleone, head of a mafia family, decides to hand over his empire to his youngest son Michael. However, his decision unintentionally puts the lives of his loved ones in grave danger." },
    { "id": "3", "text": "During WWII, Rick, a nightclub owner in Casablanca, agrees to help his former lover Ilsa and her husband. Soon, Ilsa's feelings for Rick resurface and she finds herself renewing her love for him."}]}

Document size must be under 5,000 characters per document, and you can have up to 1,000 items (IDs) per collection. The collection is submitted in the body of the request. The preceding example illustrates content you might submit for key phrase extraction.

Step 1: Structure the request

Details on request definition can be found in How to call the Text Analytics API. The following points are restated for convenience:

  • Create a POST request. Review the API documentation for this request: Key Phrases API

  • Set the HTTP endpoint for key phrase extraction. It must include the resource: https://algorithmia.com/algorithms/specrom/ExtractKeywordsfromText

  • Set a request header to include the access key for Text Analytics operations. For more information, see How to find endpoints and access keys.

  • In the request body, provide the JSON documents collection you prepared for this analysis.

Step 2: Post the request

Analysis is performed upon receipt of the request. The service accepts up to 100 requests per minute. Each request can be a maximum of 1 MB.

Recall that the service is stateless. No data is stored in your account. Results are returned immediately in the response.

Step 3: View results

All POST requests return a JSON formatted response with the IDs and detected properties.

Output is returned immediately. You can stream the results to an application that accepts JSON or save the output to a file on the local system, and then import it into an application that allows you to sort, search, and manipulate the data.

An example of the output for key phrase extraction is shown next:

{
  "documents": [
    {
      "id": "1",
      "keywords": [
        "man",
        "intelligence"
      ]
    },
    {
      "id": "2",
      "keywords": [
        "son",
        "corleone",
        "family",
        "unintentionally"
      ]
    },
    {
      "id": "3",
      "keywords": [
        "rick",
        "ilsa",
        "soon"
      ]
    }
  ]
}
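
If you want to save the output to a local file as described above, a minimal Python sketch might look like the following. The filename keywords.json and the response variable (the object returned by posting the request, as in the section 3.3 sketch) are assumptions for illustration.

import json

# Assumes `response` is the object returned by posting the key phrase request.
results = response.json()

# Save the output locally so it can be imported into other tools.
with open("keywords.json", "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=2)

for doc in results["documents"]:
    print(doc["id"], ", ".join(doc["keywords"]))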

Back to top

6. Get Sentiments Score from Text

The Sentiment Analysis API evaluates text input and returns a sentiment score for each document, ranging from 0 (negative) to 1 (positive). This capability is useful for detecting positive and negative sentiment in social media, customer reviews, and discussion forums.

6.1 Concepts

Text Analytics uses a machine learning classification algorithm to generate a sentiment score between 0 and 1. Scores closer to 1 indicate positive sentiment, while scores closer to 0 indicate negative sentiment.

Sentiment analysis is performed on the entire document, as opposed to extracting sentiment for a particular entity in the text. In practice, scoring accuracy tends to improve when documents contain one or two sentences rather than a large block of text. A score near 0.5 indicates neutral sentiment.

We have many types of sentiment analysis models pretrained on data from social media, user reviews on e-commerce sites, news stories, etc., and each of them performs well on datasets similar to its training data. If you have a well-defined use case, we highly recommend that you contact us to get access to one of those specific models instead of the general-purpose model.

6.2 Preparation

Sentiment analysis produces a higher quality result when you give it smaller chunks of text to work on. This is opposite from key phrase extraction, which performs better on larger blocks of text. To get the best results from both operations, consider restructuring the inputs accordingly.

You must have JSON documents in this format: id, text, and documents.

Document size must be under 5,000 characters per document, and you can have up to 1,000 items (IDs) per collection. The collection is submitted in the body of the request. The following is an example of content you might submit for sentiment analysis.

{ "documents": [
    { "id": "1", "text": "Late in the 21st century, man develops artificial intelligence (referred to simply as the Machines). The Machines take control of Earth. Man fights back." },
    { "id": "2", "text": "During WWII, Rick, a nightclub owner in Casablanca, agrees to help his former lover Ilsa and her husband. Soon, Ilsa's feelings for Rick resurface and she finds herself renewing her love for him."}]}

Step 1: Structure the request

Details on request definition can be found in How to call the Text Analytics API. The following points are restated for convenience:

  • Create a POST request. Review the API documentation for this request: Sentiment Analysis API

  • Set the HTTP endpoint for sentiment analysis. It must include the sentiment analysis resource: https://api.algorithmia.com/v1/algo/specrom/GetSentimentsScorefromText/0.1.1

  • Set a request header to include the access key for Text Analytics operations. For more information, see How to find endpoints and access keys.

  • In the request body, provide the JSON documents collection you prepared for this analysis.

Step 2: Post the request

Analysis is performed upon receipt of the request. The service accepts up to 100 requests per minute. Each request can be a maximum of 1 MB.

Recall that the service is stateless. No data is stored in your account. Results are returned immediately in the response.
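
Putting the pieces together, a minimal sketch of posting the example collection from section 6.2 to the sentiment endpoint listed in Step 1 is shown below. It assumes the endpoint accepts the documents collection directly in the request body with the two headers from section 3.3; check the Sentiment Analysis API documentation for the exact request contract.

import requests

ENDPOINT = "https://api.algorithmia.com/v1/algo/specrom/GetSentimentsScorefromText/0.1.1"
ACCESS_KEY = "<your-access-key>"  # placeholder: key from the Algorithmia dashboard

headers = {"Authorization": ACCESS_KEY, "Content-Type": "application/json"}

payload = {
    "documents": [
        {"id": "1", "text": "Late in the 21st century, man develops artificial intelligence (referred to simply as the Machines). The Machines take control of Earth. Man fights back."},
        {"id": "2", "text": "During WWII, Rick, a nightclub owner in Casablanca, agrees to help his former lover Ilsa and her husband. Soon, Ilsa's feelings for Rick resurface and she finds herself renewing her love for him."},
    ]
}

response = requests.post(ENDPOINT, headers=headers, json=payload)
response.raise_for_status()
print(response.json())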

Step 3: View results

The sentiment analyzer classifies text as predominantly positive or negative, assigning a score in the range of 0 to 1. Scores near 1 indicate positive sentiment, scores near 0 indicate negative sentiment, and values close to 0.5 are neutral or indeterminate.

{
  "documents": [
    {
      "id": "1",
      "sentiments_score": 0.41
    },
    {
      "id": "2",
      "sentiments_score": 0.625
    }
  ]
}
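
If you want to turn these scores into labels in client code, one possible sketch is shown below. The 0.4-0.6 neutral band is an illustrative assumption, not something defined by the API.

# Assumes `results` holds the parsed JSON response shown above.
def label(score):
    # Neutral band chosen for illustration only.
    if score >= 0.6:
        return "positive"
    if score <= 0.4:
        return "negative"
    return "neutral"

for doc in results["documents"]:
    score = doc["sentiments_score"]
    print(f"Document {doc['id']}: {score} -> {label(score)}")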

Back to top