A module for parsing and manipulating HTML using Cheerio.
It provides functions to load HTML from strings, files, and URLs, allowing for easy querying and manipulation of HTML documents.
This module is particularly useful for web scraping, data extraction, and HTML manipulation tasks in Gingee applications.
It abstracts the complexities of working with raw HTML, providing a simple and consistent API for developers.
It leverages the Cheerio library to provide a jQuery-like syntax for traversing and manipulating the HTML structure.
It supports both synchronous and asynchronous operations, making it flexible for various use cases.
Methods
(static) fromFile(scope, filePath) → {Promise.<cheerio.CheerioAPI>}
- Description:
- Reads and parses an HTML file from the secure filesystem. The file is read through Gingee's secure file system module and its content is parsed into a Cheerio instance, so HTML files stored in the Gingee filesystem can be queried or manipulated without handling the file read yourself.
Example
const $ = await html.fromFile(fs.BOX, 'data/myfile.html');
console.log($('.test').text()); // Outputs the text content of the .test element
Parameters:
Name | Type | Description |
---|---|---|
scope | string | The scope to operate in (fs.BOX or fs.WEB). |
filePath | string | The path to the HTML file. |
Throws:
If the file cannot be read or parsed.
- Type
- Error
Returns:
A Promise that resolves to the Cheerio instance.
- Type
- Promise.<cheerio.CheerioAPI>
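Since the returned Promise rejects when the file cannot be read or parsed, callers may want a fallback instead of an unhandled rejection. The following is a minimal sketch of one way to do that. `loadOrFallback` is a hypothetical helper, not part of this module, and the commented usage assumes `html` and `fs` are already in scope as in the example above.

```javascript
// Hypothetical helper (not part of the html module): runs an async loader
// and resolves to a fallback value if the loader throws or rejects.
async function loadOrFallback(load, fallback = null, onError = () => {}) {
  try {
    return await load();
  } catch (err) {
    // Documented failure mode: an Error when the file cannot be read or parsed.
    onError(err);
    return fallback;
  }
}

// Intended usage inside an async Gingee script (assumes `html` and `fs` exist):
//   const $ = await loadOrFallback(
//     () => html.fromFile(fs.BOX, 'data/myfile.html'),
//     null,
//     (err) => console.log('Could not load HTML: ' + err.message)
//   );
//   if ($) console.log($('.test').text());
```

Passing the loader as a function keeps the helper independent of any particular scope or path, so the same wrapper could also be used around `html.fromUrl`.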
(static) fromFileSync(scope, filePath) → {cheerio.CheerioAPI}
- Description:
- Synchronously reads and parses an HTML file from the secure filesystem. This is the blocking counterpart of fromFile: the file is read through Gingee's secure file system module and its content is parsed into a Cheerio instance before the call returns. Use it where HTML files stored in the Gingee filesystem must be queried or manipulated in a synchronous context.
Example
const $ = html.fromFileSync(fs.BOX, 'data/myfile.html');
console.log($('.test').text()); // Outputs the text content of the .test element
Parameters:
Name | Type | Description |
---|---|---|
scope | string | The scope to operate in (fs.BOX or fs.WEB). |
filePath | string | The path to the HTML file. |
Throws:
If the file cannot be read or parsed.
- Type
- Error
Returns:
The Cheerio instance for querying.
- Type
- cheerio.CheerioAPI
(static) fromString(htmlString) → {cheerio.CheerioAPI}
- Description:
- Parses an HTML document from a string. This function takes a raw HTML string and returns a Cheerio instance for querying and manipulating the HTML content. It is useful for scenarios where HTML content is dynamically generated or fetched from an external source.
Example
const $ = html.fromString('<div class="test">Hello, World!</div>');
console.log($('.test').text()); // Outputs: Hello, World!
Parameters:
Name | Type | Description |
---|---|---|
htmlString | string | The raw HTML content to parse. |
Throws:
If the input is not a string.
- Type
- Error
Returns:
The Cheerio instance for querying.
- Type
- cheerio.CheerioAPI
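Because fromString throws when its input is not a string, content obtained from an external source as raw bytes needs to be converted first. The sketch below shows one possible way to normalize input before parsing. `toHtmlString` is a hypothetical helper, not part of this module, and it assumes a runtime where the standard `TextDecoder` global is available and that the bytes are UTF-8 encoded.

```javascript
// Hypothetical helper (not part of the html module): coerce common input
// types to a string so the result can be handed to html.fromString.
function toHtmlString(value) {
  if (typeof value === 'string') return value;
  if (value instanceof Uint8Array) {
    // Node.js Buffers are Uint8Array subclasses; assume UTF-8 content.
    return new TextDecoder('utf-8').decode(value);
  }
  throw new TypeError('Expected HTML as a string or byte array, got ' + typeof value);
}

// Intended usage (assumes `html` is in scope and `body` may be bytes or text):
//   const $ = html.fromString(toHtmlString(body));
```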
(static) fromUrl(url, [options]) → {Promise.<cheerio.CheerioAPI>}
- Description:
- Asynchronously fetches and parses an HTML document from a URL, returning a Cheerio instance for querying and manipulating it. This is useful for web scraping and data extraction, since the HTTP request and the parsing of the response are handled for you. Only responses with a content type of 'text/html' are supported; the content type is checked before parsing.
Example
const $ = await html.fromUrl('https://example.com');
console.log($('.test').text()); // Outputs the text content of the .test element
Parameters:
Name | Type | Attributes | Description |
---|---|---|---|
url | string | | The URL of the webpage to scrape. |
options | object | <optional> | Options passed to the HTTP call (such as request headers). |
Throws:
If the response is not of type 'text/html' or if the HTML cannot be parsed.
- Type
- Error
Returns:
A Promise that resolves to the Cheerio instance.
- Type
- Promise.<cheerio.CheerioAPI>
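The content type requirement can be illustrated with a small check on a `Content-Type` header value. This is only a sketch of the kind of test described above; the module's actual implementation is not shown in this documentation, and `isHtmlContentType` is a hypothetical name. It may be handy for deciding in advance whether a URL is worth passing to fromUrl.

```javascript
// Hypothetical helper (not the module's real code): returns true when a
// Content-Type header value denotes an HTML document.
function isHtmlContentType(headerValue) {
  if (typeof headerValue !== 'string') return false;
  // Drop parameters such as "; charset=utf-8" and compare the media type only.
  const mediaType = headerValue.split(';')[0].trim().toLowerCase();
  return mediaType === 'text/html';
}
```

Media types are case-insensitive and may carry parameters, which is why the sketch lowercases the value and ignores everything after the first semicolon.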