Import Web Page Task
Provide the content URL of the page you want to scrape, and set the scraping rules and periodic scheduling rules. The system will scrape the corresponding fields from the page according to the established rules.
Endpoint: POST {{BaseUrl}}/web-task
Request:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| contentUrl | STRING | YES | The target URL to scrape. |
| getDemandFormat | STRING | YES | Output document format: 1 = JSON, 2 = Markdown. |
| contentType | STRING | YES | Web page type: list = list page, detail = detail page. |
| title | INTEGER | YES | Title for the detail page: 1 = get, 0 = don't get. |
| contentDetails | INTEGER | YES | Content details for the detail page: 1 = get, 0 = don't get. |
| name | STRING | NO | Column title for the list page: 1 = get, 0 = don't get. |
| link | STRING | NO | Hyperlink for the list page: 1 = get, 0 = don't get. |
| publicationTime | STRING | NO | Publication time for the list page: 1 = get, 0 = don't get. |
| customKeys | OBJECT | NO | Custom fields. |
| customKeys.key | STRING | NO | Custom field key. |
| customKeys.desc | STRING | NO | Custom field description. |
| loopTimeValue | INTEGER | NO | Loop interval duration, in hours. Set to "0" if repeated execution is not needed. Detail pages cannot include this parameter. |
| needPage | STRING | NO | Whether to paginate: 1 = paginate, 0 = no pagination. Detail pages cannot include this parameter. |
| depthValue | STRING | NO | Crawl depth. Set to "0" if no deeper crawling is needed. Detail pages cannot include this parameter. |
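As a rough illustration of how these parameters fit together, the sketch below submits a list-page scraping task with Python's requests library. The base URL, the bearer-token header, and the exact shape of customKeys are assumptions made for the example, not part of the documented API; adjust them to your deployment.

```python
# Minimal sketch of creating a web-page import task.
# BASE_URL and the Authorization header are placeholders (assumptions),
# substitute whatever your environment actually uses for {{BaseUrl}} and auth.
import requests

BASE_URL = "https://api.example.com"   # stands in for {{BaseUrl}}
API_TOKEN = "YOUR_API_TOKEN"           # hypothetical credential

payload = {
    "contentUrl": "https://news.example.com/articles",  # target list page
    "getDemandFormat": "2",        # 2 = Markdown output
    "contentType": "list",         # scraping a list page
    "title": 1,                    # 1 = get the title
    "contentDetails": 1,           # 1 = get the content details
    "name": "1",                   # column title on the list page
    "link": "1",                   # hyperlinks on the list page
    "publicationTime": "1",        # publication time on the list page
    "customKeys": {                # shape follows the parameter table above
        "key": "author",                       # example custom field key
        "desc": "Author of each article",      # example custom field description
    },
    "loopTimeValue": 24,           # re-run every 24 hours (0 = run once)
    "needPage": "1",               # 1 = paginate through the list
    "depthValue": "1",             # crawl one level deeper (0 = no deeper crawl)
}

resp = requests.post(
    f"{BASE_URL}/web-task",
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```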
Response (Data Part):
| Field | Type | Description |
| --- | --- | --- |
| num | INTEGER | Number of files processed. |
| taskId | STRING | Import task ID, which can be used to query the task status. |
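The snippet below continues the request sketch above and pulls the two documented fields out of the response. Whether they sit under a top-level data key is an assumption inferred from the "Data Part" wording; unwrap according to the actual response envelope.

```python
# Continues the request sketch above. The "data" envelope is an assumption
# based on the "Response (Data Part)" wording; adjust as needed.
body = resp.json()
data = body.get("data", body)

num = data["num"]          # number of files processed
task_id = data["taskId"]   # ID for querying the import task's status later

print(f"Processed {num} file(s); track progress with task {task_id}")
```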