Importing and processing local text
If your source data is text, follow this guide to import and process it.
Select 'From Local Text' and 'New Task' in the 'Import Data Source' section on the dataset details page.
Every import is called a task, and each task can contain multiple similar data entries. For local text, a single task can hold several text files.
On the task creation page, provide a name for your task (up to 20 characters). This name will assist you in quickly locating and managing this task in the task list.
Drag and drop the local files you want to import into the upload box, or click the upload area (or the upload button) to select files.
Supported file formats: .docx, .pdf, .txt, .md, and .json.
You can upload up to 50 files per task, and each file must not exceed 200MB. In some cases, the CDN used for uploads may only accept files of around 100MB.
Make sure the files uploaded in one task have similar content, since the same extraction parameters and output settings apply to every file in the task.
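Before creating a task, you may want to check your files against the limits above. The following is a minimal local sketch in Python, not part of the product: the names LocalFile and check_upload are hypothetical, it assumes "200MB" means 200 MiB, and it treats the ~100MB CDN limit as a warning rather than a hard error.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Limits taken from this guide. Assumption: "MB" is read as MiB (1024 * 1024 bytes).
SUPPORTED_EXTENSIONS = {".docx", ".pdf", ".txt", ".md", ".json"}
MAX_FILES_PER_TASK = 50
MAX_FILE_BYTES = 200 * 1024 * 1024
CDN_SOFT_LIMIT_BYTES = 100 * 1024 * 1024  # approximate; only some uploads hit this


@dataclass
class LocalFile:
    """Hypothetical description of a file you plan to upload."""
    name: str
    size_bytes: int


def check_upload(files: list[LocalFile]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for one planned task. Hypothetical helper."""
    errors: list[str] = []
    warnings: list[str] = []
    if len(files) > MAX_FILES_PER_TASK:
        errors.append(f"{len(files)} files selected; the limit is {MAX_FILES_PER_TASK} per task")
    for f in files:
        # Compare extensions case-insensitively, so "Report.PDF" counts as .pdf.
        ext = PurePath(f.name).suffix.lower()
        if ext not in SUPPORTED_EXTENSIONS:
            errors.append(f"{f.name}: unsupported format {ext or '(no extension)'}")
        if f.size_bytes > MAX_FILE_BYTES:
            errors.append(f"{f.name}: exceeds the 200MB per-file limit")
        elif f.size_bytes > CDN_SOFT_LIMIT_BYTES:
            # Allowed by the stated limit, but the CDN may reject it.
            warnings.append(f"{f.name}: larger than ~100MB; the upload may fail")
    return errors, warnings


if __name__ == "__main__":
    planned = [
        LocalFile("notes.md", 12_000),
        LocalFile("scan.png", 40_000),
        LocalFile("report.pdf", 150 * 1024 * 1024),
    ]
    errors, warnings = check_upload(planned)
    print("errors:", errors)
    print("warnings:", warnings)
```

Splitting results into errors and warnings mirrors the guide: the 50-file, format, and 200MB rules are stated limits, while the ~100MB CDN behavior only applies in some cases.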
Task settings work the same way as for tasks that import from the web: you configure the fields and how content is extracted.
Select an appropriate parsing method based on the file type to ensure that the system can correctly process the uploaded text files.
Default Field Types:
Title: The system will attempt to extract title information from the file content.
Content Details: The system will capture and store the main content of the file.
Custom Fields:
If you need specific extracted data to go into designated fields, click "+ Add Field" and enter a field name and description.
For example, to extract a nickname from the text, set the field name (key) to nickname and the field description to user nickname.
Please use English when adding custom fields; more detailed descriptions lead to more accurate extraction.
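The sketch below shows one way to review custom field definitions before entering them in the UI. It is a hypothetical local helper, not a product API: the CustomField structure is an assumption, and both the English check (ASCII-only text) and the two-word minimum for descriptions are rough heuristics.

```python
import re
from dataclasses import dataclass

# Heuristic: an English-style key starts with a letter and uses only A-Z, a-z, 0-9, "_".
KEY_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


@dataclass(frozen=True)
class CustomField:
    """Hypothetical local representation of a field name (key) and description."""
    key: str
    description: str


def review_fields(fields: list[CustomField]) -> list[str]:
    """Return a list of problems; an empty list means the definitions look fine."""
    problems: list[str] = []
    seen: set[str] = set()
    for f in fields:
        if not KEY_PATTERN.fullmatch(f.key):
            problems.append(f"{f.key!r}: use an English name such as 'nickname'")
        if f.key.lower() in seen:
            problems.append(f"{f.key!r}: duplicate field name")
        seen.add(f.key.lower())
        if not f.description.strip():
            problems.append(f"{f.key!r}: add a description")
        elif not f.description.isascii():
            # Rough proxy for "written in English".
            problems.append(f"{f.key!r}: description should be written in English")
        elif len(f.description.split()) < 2:
            problems.append(f"{f.key!r}: consider a more detailed description")
    return problems


if __name__ == "__main__":
    fields = [CustomField("nickname", "user nickname"), CustomField("age", "age")]
    for problem in review_fields(fields):
        print(problem)
```

Here the nickname field from the example passes, while a one-word description such as "age" is flagged, following the advice that more detailed descriptions lead to more accurate extraction.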
Once you have configured the extraction parameters, set up the output settings to determine how the extracted data is saved and exported.
Output Format Settings
You can save the extracted data in either JSON or Markdown format. JSON is better suited to programs that consume the data through an API, while Markdown is better suited to knowledge base processing.
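To illustrate the difference between the two formats, the sketch below renders one extracted record both ways. The record layout (keys title and content, plus custom fields such as nickname) is an assumption for illustration; the files the system actually produces may be structured differently.

```python
import json


def to_json(record: dict) -> str:
    """Serialize one record as JSON, which programs can parse directly."""
    # ensure_ascii=False keeps non-English text readable in the output.
    return json.dumps(record, ensure_ascii=False, indent=2)


def to_markdown(record: dict) -> str:
    """Render one record as Markdown for knowledge base ingestion.

    Assumes hypothetical keys 'title' and 'content'; any other keys are
    treated as custom fields and listed as bullet points.
    """
    lines = [f"# {record.get('title', '')}", ""]
    extras = {k: v for k, v in record.items() if k not in ("title", "content")}
    for key, value in extras.items():
        lines.append(f"- **{key}**: {value}")
    if extras:
        lines.append("")
    lines.append(str(record.get("content", "")))
    return "\n".join(lines)


if __name__ == "__main__":
    record = {"title": "Welcome post", "content": "Hello from the community.", "nickname": "alice"}
    print(to_json(record))
    print(to_markdown(record))
```

The JSON form round-trips losslessly through json.loads, which is why it suits API consumers; the Markdown form trades that structure for readable headings and lists.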
Save and manually execute the task later:
If you want to configure the task now but process it later, click the "Save and manually execute the task later" button. The task is saved in the task list so you can start it manually at a later time.
Execute the task immediately:
If you are ready to process the files now, click the "Execute the task immediately" button. The system will start processing the data and import it into the specified dataset.