Import Web Page Task
Provide the content URL of the page you want to scrape, and set the scraping rules and periodic scheduling rules. The system will scrape the corresponding fields from the page according to the established rules.
Endpoint: POST {{BaseUrl}}/web-task
Request:
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| contentUrl | STRING | YES | The target URL to scrape. |
| getDemandFormat | STRING | YES | Output document format: 1 = JSON, 2 = Markdown. |
| contentType | STRING | YES | Web page type: list = list page, detail = detail page. |
| title | INTEGER | YES | Title for the detail page: 1 = get, 0 = don't get. |
| contentDetails | INTEGER | YES | Content details for the detail page: 1 = get, 0 = don't get. |
| name | STRING | NO | Column title for the list page: 1 = get, 0 = don't get. |
| link | STRING | NO | Hyperlink for the list page: 1 = get, 0 = don't get. |
| publicationTime | STRING | NO | Publication time for the list page: 1 = get, 0 = don't get. |
| customKeys | OBJECT | NO | Custom fields. |
| customKeys.key | STRING | NO | Custom field key. |
| customKeys.desc | STRING | NO | Custom field description. |
| loopTimeValue | INTEGER | NO | Loop interval duration, in hours. Set to "0" if repeated execution is not needed. Detail pages cannot include this parameter. |
| needPage | STRING | NO | Whether to paginate: 1 = paginate, 0 = no pagination. Detail pages cannot include this parameter. |
| depthValue | STRING | NO | Crawl depth. Set to "0" if no deeper crawling is needed. Detail pages cannot include this parameter. |
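As a rough illustration of how these parameters fit together, the sketch below submits a list-page scraping task with Python's requests library. The base URL, the bearer-token header, and the exact shape of customKeys are assumptions made for the example, not part of the documented API; adjust them to your deployment.

```python
# Minimal sketch of creating a web-page import task.
# BASE_URL and the Authorization header are placeholders (assumptions),
# substitute whatever your environment actually uses for {{BaseUrl}} and auth.
import requests

BASE_URL = "https://api.example.com"   # stands in for {{BaseUrl}}
API_TOKEN = "YOUR_API_TOKEN"           # hypothetical credential

payload = {
    "contentUrl": "https://news.example.com/articles",  # target list page
    "getDemandFormat": "2",        # 2 = Markdown output
    "contentType": "list",         # scraping a list page
    "title": 1,                    # 1 = get the title
    "contentDetails": 1,           # 1 = get the content details
    "name": "1",                   # column title on the list page
    "link": "1",                   # hyperlinks on the list page
    "publicationTime": "1",        # publication time on the list page
    "customKeys": {                # shape follows the parameter table above
        "key": "author",                       # example custom field key
        "desc": "Author of each article",      # example custom field description
    },
    "loopTimeValue": 24,           # re-run every 24 hours (0 = run once)
    "needPage": "1",               # 1 = paginate through the list
    "depthValue": "1",             # crawl one level deeper (0 = no deeper crawl)
}

resp = requests.post(
    f"{BASE_URL}/web-task",
    json=payload,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```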
Response (Data Part):
| Field | Type | Description |
| --- | --- | --- |
| num | INTEGER | Number of files processed. |
| taskId | STRING | Import task ID, which can be used to query the task status. |
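The snippet below continues the request sketch above and pulls the two documented fields out of the response. Whether they sit under a top-level data key is an assumption inferred from the "Data Part" wording; unwrap according to the actual response envelope.

```python
# Continues the request sketch above. The "data" envelope is an assumption
# based on the "Response (Data Part)" wording; adjust as needed.
body = resp.json()
data = body.get("data", body)

num = data["num"]          # number of files processed
task_id = data["taskId"]   # ID for querying the import task's status later

print(f"Processed {num} file(s); track progress with task {task_id}")
```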