Importing local text processing

If your source data is text, follow this guide to import and process it.

Select 'From Local Text' and 'New Task' in the 'Import Data Source' section on the dataset details page.

Every import is handled as a task, and each task can include multiple similar data entries for processing. A local text task can therefore contain several text files.

1. Create a New Task

On the task creation page, provide a name for your task (up to 20 characters). This name will assist you in quickly locating and managing this task in the task list.

2. Upload Local Text Files

Drag and drop the local files you want to import into the upload area, or click the upload button to select them.
Supported file formats include: .docx, .pdf, .txt, .md, .json.
You can upload up to 50 files per task, and each file must not exceed 200MB (in some cases, the CDN we use may only accept uploads of around 100MB).
Make sure the files uploaded within one task have similar content, because the same parameter extraction and output settings are applied to all of them.
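
The limits above can be checked locally before you start an upload. The following is a minimal Python sketch, not part of the product: the function names `check_upload_batch` and `scan_folder` and the constants are hypothetical, and it assumes 1MB means 1024 * 1024 bytes.

```python
from pathlib import Path

# Limits taken from this page; the constant and function names are illustrative only.
ALLOWED_EXTENSIONS = {".docx", ".pdf", ".txt", ".md", ".json"}
MAX_FILES_PER_TASK = 50
MAX_FILE_BYTES = 200 * 1024 * 1024   # documented per-file limit (assumes 1MB = 1024 * 1024 bytes)
CDN_SAFE_BYTES = 100 * 1024 * 1024   # above this size the CDN may reject the upload


def check_upload_batch(files):
    """Check (name, size_in_bytes) pairs against the limits; return (errors, warnings)."""
    errors, warnings = [], []
    if len(files) > MAX_FILES_PER_TASK:
        errors.append(f"{len(files)} files selected; the limit is {MAX_FILES_PER_TASK} per task")
    for name, size in files:
        ext = Path(name).suffix.lower()
        if ext not in ALLOWED_EXTENSIONS:
            errors.append(f"{name}: unsupported format {ext or '(none)'}")
        if size > MAX_FILE_BYTES:
            errors.append(f"{name}: larger than 200MB")
        elif size > CDN_SAFE_BYTES:
            warnings.append(f"{name}: larger than 100MB, upload may be rejected")
    return errors, warnings


def scan_folder(folder):
    """Collect (name, size) pairs for the regular files directly inside a local folder."""
    return [(p.name, p.stat().st_size) for p in sorted(Path(folder).iterdir()) if p.is_file()]
```

For example, `errors, warnings = check_upload_batch(scan_folder("./to_import"))` reports files that would exceed the limits, where `./to_import` is a placeholder folder name.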

3. Task Settings

Task settings work the same way as for web import tasks: you configure the fields and how content is extracted.
Select an appropriate parsing method based on the file type to ensure that the system can correctly process the uploaded text files.

4. Get Parameters

Default Field Types:
- Title: The system will attempt to extract title information from the file content.
- Content Details: The system will capture and store the main content of the file.
Custom Fields:
- If you need to categorize specific extracted data into designated fields, you can click on "+ Add Field" and add field names and descriptions.
- For example, if there is a nickname to be extracted from the text, the field name key could be: nickname; field description: user nickname.
- Please use English when adding custom fields; more detailed descriptions lead to more accurate extraction.
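
To plan custom fields before entering them in the form, it can help to write them down as key and description pairs. The sketch below is illustrative only: the `signup_date` field, the key naming rule, and the `review_fields` helper are assumptions made for this example, not rules enforced by the system.

```python
import re

# Hypothetical field definitions mirroring the "+ Add Field" form: key -> description.
custom_fields = {
    "nickname": "user nickname",
    "signup_date": "date the user registered, in YYYY-MM-DD format if present in the text",
}

# Assumed naming rule for this sketch: ASCII letters, digits, underscores, starting with a letter.
KEY_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def review_fields(fields):
    """Return a list of problems found in field keys and descriptions."""
    problems = []
    for key, description in fields.items():
        if not KEY_PATTERN.fullmatch(key):
            problems.append(f"{key!r}: key should use English letters, digits, or underscores")
        if not description.isascii():
            problems.append(f"{key!r}: description should be written in English")
        elif len(description.split()) < 2:
            problems.append(f"{key!r}: description is very short; more detail tends to improve extraction")
    return problems
```

Calling `review_fields(custom_fields)` returns an empty list when every key and description passes these assumed checks.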

5. Output Settings

Once you have configured the parameters, set up the output settings, which determine how the extracted data is saved and exported.

Output Format Settings
- You can choose to save the retrieved data in either JSON or Markdown format. JSON format is more suitable for subsequent API program calls, while Markdown format is better suited for knowledge base data processing.
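
To illustrate the difference between the two formats, the sketch below renders one extracted record both ways. The record shape (`title`, `content`, plus a custom `nickname` field) and the Markdown layout are assumptions made for this example; the actual export produced by the system may be structured differently.

```python
import json

# Hypothetical extracted record: the two default fields plus one custom field.
record = {
    "title": "Weekly report",
    "content": "Summary of the work completed this week.",
    "nickname": "alex",
}


def to_json(rec):
    """Serialize the record as JSON, convenient for programs that consume it through an API."""
    return json.dumps(rec, ensure_ascii=False, indent=2)


def to_markdown(rec):
    """Render the record as Markdown: title as a heading, extra fields as a list, then the body."""
    extras = {k: v for k, v in rec.items() if k not in ("title", "content")}
    lines = [f"# {rec['title']}", ""]
    lines += [f"- {key}: {value}" for key, value in extras.items()]
    if extras:
        lines.append("")
    lines.append(rec["content"])
    return "\n".join(lines)
```

The JSON form keeps every field addressable by key, which suits downstream code, while the Markdown form reads as a document, which suits knowledge base ingestion.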

6. Save or Execute Task Immediately

Save and manually execute the task later:
- If you want to configure the task first without starting processing right away, click the "Save and manually execute the task later" button. The task is saved in the task list, where you can start it manually at a later time.
Execute the task immediately:
- If you are ready to process the uploaded files right away, click the "Execute the task immediately" button. The system will start extracting the data and import it into the specified dataset.

