Supametas.AI

Import Web Page Task

Enter the content URL of the page you want to scrape, then set the scraping rules and the periodic scheduling rules. The system will scrape the corresponding parameters from the page according to the established rules.

Endpoint: POST {{BaseUrl}}/web-task

Request body:

{
  "contentUrl": "https://yourdomain.com/news/13084793",
  "getDemandFormat": "json",
  "contentType": "list",
  "title": 1,
  "contentDetails": 1,
  "customKeys": [
    {
      "key": "c1",
      "desc": "c1 desc"
    }
  ],
  "loopTimeValue": "24",
  "needPage": "1",
  "depthValue": "3"
}

Response (Data Part):

{
  "num": 0,
  "taskId": "xxxx010"
}
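
As a rough illustration, the request above could be assembled in Python with only the standard library. This is a sketch under stated assumptions, not the official client: `BASE_URL` is a placeholder for `{{BaseUrl}}`, and the `Authorization` header is hypothetical, since this page does not describe authentication (see "Create API Key" and "Standard Request and Response" for the actual scheme). The helper name `build_web_task_request` is also made up for this example.

```python
import json
import urllib.request

# Assumption: placeholder for {{BaseUrl}}; the real value is not given on this page.
BASE_URL = "https://api.example.com"


def build_web_task_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request to {{BaseUrl}}/web-task."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/web-task",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: hypothetical auth header, not confirmed by this page.
            "Authorization": f"Bearer {api_key}",
        },
    )


# Same fields as the request body example for a list page.
payload = {
    "contentUrl": "https://yourdomain.com/news/13084793",
    "getDemandFormat": "json",
    "contentType": "list",
    "title": 1,
    "contentDetails": 1,
    "customKeys": [{"key": "c1", "desc": "c1 desc"}],
    "loopTimeValue": "24",
    "needPage": "1",
    "depthValue": "3",
}

request = build_web_task_request("YOUR_API_KEY", payload)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before any network call is made.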

Request:

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| contentUrl | STRING | YES | The target URL to scrape |
| getDemandFormat | STRING | YES | Output document format: 1 : json; 2 : markdown |
| contentType | STRING | YES | Web page type: list : list page; detail : detail page |
| title | INTEGER | YES | Title for the detail page: 1 : get; 0 : don't get |
| contentDetails | INTEGER | YES | Content Details for the detail page: 1 : get; 0 : don't get |
| name | STRING | NO | Column Title for the list page: 1 : get; 0 : don't get |
| link | STRING | NO | Hyperlink for the list page: 1 : get; 0 : don't get |
| publicationTime | STRING | NO | Publication Time for the list page: 1 : get; 0 : don't get |
| customKeys | OBJECT | NO | Custom fields |
| -key | STRING | NO | Custom field key |
| -desc | STRING | NO | Custom field description |
| loopTimeValue | INTEGER | NO | Loop interval duration. Unit: hours. If no need for repeated execution, set to "0". Detail page cannot include this parameter |
| needPage | STRING | NO | Whether to paginate: 1 : paginate; 0 : no pagination. Detail page cannot include this parameter |
| depthValue | STRING | NO | Crawl depth. If no need to crawl deeper, set to "0". Detail page cannot include this parameter |
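
The rules in the request parameters (five required fields, two allowed values for `contentType`, and three parameters that a detail page cannot include) can be checked on the client before sending. The following is a minimal sketch, assuming the payload is a plain Python dict; `validate_web_task` and its error messages are hypothetical and are not part of the Supametas.AI API, which may validate differently on the server side.

```python
# Parameters marked "YES" in the Required column.
REQUIRED = ("contentUrl", "getDemandFormat", "contentType", "title", "contentDetails")
# Parameters documented as "Detail page cannot include this parameter".
LIST_ONLY = ("loopTimeValue", "needPage", "depthValue")


def validate_web_task(payload: dict) -> list[str]:
    """Return a list of problems found in the payload (empty if none)."""
    errors = [
        f"missing required parameter: {name}"
        for name in REQUIRED
        if name not in payload
    ]

    content_type = payload.get("contentType")
    if content_type is not None and content_type not in ("list", "detail"):
        errors.append(f"contentType must be 'list' or 'detail', got {content_type!r}")

    if content_type == "detail":
        errors.extend(
            f"{name} is not allowed for a detail page"
            for name in LIST_ONLY
            if name in payload
        )
    return errors


detail_payload = {
    "contentUrl": "https://yourdomain.com/news/13084793",
    "getDemandFormat": "json",
    "contentType": "detail",
    "title": 1,
    "contentDetails": 1,
    "needPage": "1",  # not allowed together with contentType "detail"
}
print(validate_web_task(detail_payload))
# prints ['needPage is not allowed for a detail page']
```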

Response (Data Part):

| Parameter | Type | Description |
| --- | --- | --- |
| num | INTEGER | Number of files processed |
| taskId | STRING | Import task ID, which can be used to query task status |
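
To keep the `taskId` around for a later status query, the data part can be read into a small typed structure. This sketch assumes the caller has already extracted the data part from the full response; the surrounding envelope is not described on this page, and `WebTaskResult` and `parse_web_task_data` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WebTaskResult:
    num: int       # number of files processed
    task_id: str   # import task ID, usable to query task status


def parse_web_task_data(data: dict) -> WebTaskResult:
    """Convert the response data part into a WebTaskResult."""
    return WebTaskResult(num=int(data["num"]), task_id=str(data["taskId"]))


result = parse_web_task_data({"num": 0, "taskId": "xxxx010"})
print(result.task_id)  # prints xxxx010
```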


Last updated 4 months ago