ScraperAPI Plugin
Web scraping integration via ScraperAPI for structured Google SERP, Google News, and HTML fetching with geotargeting capabilities.
Overview
The @tokenring-ai/scraperapi package provides a ScraperAPI-based web search provider that integrates with the Token Ring AI platform. It enables AI agents and applications to perform web searches, fetch structured Google Search Engine Results Pages (SERP), retrieve Google News articles, and scrape web pages through a unified interface.
This package extends the WebSearchProvider from @tokenring-ai/websearch, offering:
- Google SERP Search: Structured search results with organic listings, knowledge graphs, and related questions
- Google News Search: Structured news articles with sources, thumbnails, dates, and links
- HTML Fetching: Retrieve page content with optional JavaScript rendering and geotargeting
- Error Handling: Robust error management with retry logic via doFetchWithRetry
- Geotargeting: Support for country-specific searches and custom TLDs
- Plugin Integration: Automatic registration with Token Ring applications
Key Features
Core Capabilities
- Web Scraping: Fetch HTML content from any URL with optional rendering
- Google Search: Perform structured searches with comprehensive parameter support
- Google News: Retrieve news articles with metadata and thumbnails
- Geotargeting: Country-specific searches with support for multiple Google TLDs
- Pagination: Support for result pagination and continuation
- Structured Data: JSON responses with consistent response formats
Configuration Options
- API key authentication
- Country code and TLD customization
- JavaScript rendering toggle
- Device type selection (desktop/mobile)
Installation
This package is part of the Token Ring AI monorepo. To use it:
# Install dependencies
bun install
Configuration
Prerequisites
- Sign up for a ScraperAPI account at scraperapi.com
- Obtain your API key
Package Configuration
Add the ScraperAPI configuration to your Token Ring configuration file (e.g., .tokenring/writer-config.js):
export default {
websearch: {
providers: {
scraperapi: {
type: "scraperapi",
apiKey: process.env.SCRAPERAPI_KEY, // Required
countryCode: "us", // Optional (e.g., 'us', 'gb', 'ca')
tld: "com", // Optional (e.g., 'com', 'co.uk')
render: false, // Optional (enable JS rendering)
deviceType: "desktop", // Optional ('desktop' or 'mobile')
}
}
}
};
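Because apiKey is the only required field, it can help to fail fast when the environment variable is unset instead of passing undefined into the provider. The following is a minimal sketch; requireEnv is a hypothetical helper and is not part of this package.

```typescript
// Hypothetical helper, not part of @tokenring-ai/scraperapi.
// Reads a required variable from an environment-like record and
// throws a descriptive error when it is missing or blank.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Possible use in a config file:
//   apiKey: requireEnv(process.env, "SCRAPERAPI_KEY")
```

Passing the environment in as an argument keeps the helper easy to test without touching the real process environment.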
Configuration Schema
The package uses Zod schema validation for configuration:
const ScraperAPIWebSearchProviderOptionsSchema = z.object({
apiKey: z.string(), // Required
countryCode: z.string().optional(), // Optional
tld: z.string().optional(), // Optional
render: z.boolean().optional(), // Optional
deviceType: z.enum(["desktop", "mobile"]).optional(), // Optional
});
Usage
Basic Usage
import ScraperAPIWebSearchProvider from '@tokenring-ai/scraperapi';
// Initialize the provider
const provider = new ScraperAPIWebSearchProvider({
apiKey: 'your-api-key',
countryCode: 'us',
tld: 'com',
render: false,
deviceType: 'desktop'
});
// Perform Google SERP search
const searchResults = await provider.searchWeb('cherry tomatoes', {
countryCode: 'us'
});
console.log(searchResults.organic);
console.log(searchResults.knowledgeGraph);
console.log(searchResults.relatedSearches);
// Search Google News
const newsResults = await provider.searchNews('Space exploration', {
countryCode: 'us',
num: 10
});
console.log(newsResults.news);
// Fetch page content
const pageContent = await provider.fetchPage('https://example.com', {
render: true,
countryCode: 'gb'
});
console.log(pageContent.markdown);
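In the example above, fetchPage is called with countryCode 'gb' on a provider constructed with 'us'. One plausible reading is that per-call options take precedence over the constructor defaults. The sketch below illustrates that precedence rule only; the GeoOptions type and resolveOptions function are assumptions for illustration, and the provider's actual merge logic may differ.

```typescript
// Illustrative only: the real merge logic lives inside the provider.
interface GeoOptions {
  countryCode?: string;
  tld?: string;
  render?: boolean;
  deviceType?: "desktop" | "mobile";
}

// Per-call values win; anything left undefined falls back to the defaults.
function resolveOptions(defaults: GeoOptions, perCall: GeoOptions = {}): GeoOptions {
  return {
    countryCode: perCall.countryCode ?? defaults.countryCode,
    tld: perCall.tld ?? defaults.tld,
    render: perCall.render ?? defaults.render,
    deviceType: perCall.deviceType ?? defaults.deviceType,
  };
}

const effective = resolveOptions(
  { countryCode: "us", tld: "com", render: false, deviceType: "desktop" },
  { countryCode: "gb", render: true },
);
// effective.countryCode is "gb", effective.tld is still "com"
```

Using ?? rather than || means an explicit render: false on a call is respected instead of being replaced by the default.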
Google Search Parameters
The package supports comprehensive Google search parameters:
// Search with geotargeting parameters
const recentResults = await provider.searchWeb('technology news', {
countryCode: 'us',
gl: 'us'
});
// Search with result limit and pagination
const limitedResults = await provider.searchWeb('AI research', {
countryCode: 'us',
num: 20,
start: 10
});
// News search with time range
const weeklyNews = await provider.searchNews('climate change', {
countryCode: 'us',
tbs: 'w' // Past week
});
// Fetch page with JavaScript rendering
const renderedContent = await provider.fetchPage('https://example.com', {
render: true,
countryCode: 'gb'
});
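When paging through results with num and start, the offset arithmetic is easy to get wrong. The sketch below assumes start is a zero-based result offset (as with Google's own start parameter); pageToStart is a hypothetical helper and not part of this package.

```typescript
// Hypothetical helper: converts a 1-based page number into the
// zero-based `start` offset used alongside `num`.
function pageToStart(page: number, num: number): number {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(num) || num < 1) {
    throw new RangeError("num must be an integer >= 1");
  }
  return (page - 1) * num;
}

// Options for the second page of 20 results: { num: 20, start: 20 }
const secondPage = { num: 20, start: pageToStart(2, 20) };
```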
API Reference
ScraperAPIWebSearchProvider
The main provider class that extends WebSearchProvider.
Constructor
new ScraperAPIWebSearchProvider(config: ScraperAPIWebSearchProviderOptions)
Parameters:
- apiKey (string, required): Your ScraperAPI API key
- countryCode (string, optional): Two-letter ISO country code for geotargeting
- tld (string, optional): Google TLD (e.g., 'com', 'co.uk')
- render (boolean, optional): Enable JavaScript rendering
- deviceType (string, optional): Device type ('desktop' or 'mobile')
Methods
searchWeb
async searchWeb(query: string, options?: WebSearchProviderOptions): Promise<WebSearchResult>
Performs a Google SERP search and returns structured results.
Parameters:
- query (string): Search query
- options (WebSearchProviderOptions, optional): Search options
Returns: WebSearchResult containing:
- organic: Array of organic search results
- knowledgeGraph: Knowledge graph information (if available)
- relatedSearches: Array of related search queries
searchNews
async searchNews(query: string, options?: WebSearchProviderOptions): Promise<NewsSearchResult>
Performs a Google News search and returns structured results.
Parameters:
- query (string): Search query
- options (WebSearchProviderOptions, optional): Search options
Returns: NewsSearchResult containing:
- news: Array of news articles with source, title, description, date, and link
fetchPage
async fetchPage(url: string, options?: WebPageOptions): Promise<WebPageResult>
Fetches HTML content from a URL using ScraperAPI.
Parameters:
- url (string): URL to fetch
- options (WebPageOptions, optional): Fetch options
Returns: WebPageResult containing:
- markdown: Page content in markdown format
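To make the fetch options more concrete, the sketch below builds a request URL for ScraperAPI's public endpoint using its documented api_key, url, render, country_code, and device_type query parameters. How this package maps its options onto those parameters internally is an assumption here; buildScraperApiUrl and FetchOptions are hypothetical names used only for illustration.

```typescript
// Hypothetical option shape mirroring the fetch-related settings in this README.
interface FetchOptions {
  render?: boolean;
  countryCode?: string;
  deviceType?: "desktop" | "mobile";
}

// Builds a ScraperAPI request URL. Optional parameters are only
// added when set, so the service's own defaults apply otherwise.
function buildScraperApiUrl(
  apiKey: string,
  targetUrl: string,
  options: FetchOptions = {},
): string {
  const url = new URL("https://api.scraperapi.com/");
  url.searchParams.set("api_key", apiKey);
  url.searchParams.set("url", targetUrl); // URL-encoded by searchParams
  if (options.render) url.searchParams.set("render", "true");
  if (options.countryCode) url.searchParams.set("country_code", options.countryCode);
  if (options.deviceType) url.searchParams.set("device_type", options.deviceType);
  return url.toString();
}
```

Going through URL and searchParams, rather than string concatenation, ensures a target URL that carries its own query string is encoded correctly.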
Response Types
Google SERP Response
interface GoogleSerpResponse {
search_information: {
query_displayed: string;
total_results?: number;
time_taken_displayed?: number;
};
knowledge_graph?: {
position: number;
title: string;
image?: string;
description: string;
};
organic_results: Array<{
position: number;
title: string;
snippet: string;
highlights?: string[];
link: string;
displayed_link: string;
}>;
related_questions?: Array<{
question: string;
position: number;
}>;
videos?: Array<{
position: number;
link: string;
title: string;
source: string;
channel: string;
publish_date: string;
thumbnail: string;
duration: string;
}>;
pagination: {
pages_count: number;
current_page: number;
next_page_url?: string;
prev_page_url?: string;
pages: Array<{
page: number;
url: string;
}>;
};
}
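The raw response uses snake_case fields, while searchWeb is described as returning organic, knowledgeGraph, and relatedSearches. The sketch below shows one way such a mapping could look. The field correspondence (in particular related_questions feeding relatedSearches) and the SimplifiedResult type are assumptions; the real WebSearchResult type is defined in @tokenring-ai/websearch.

```typescript
// Trimmed-down version of the raw response type, for illustration.
interface RawSerp {
  organic_results: Array<{ position: number; title: string; snippet: string; link: string }>;
  knowledge_graph?: { title: string; description: string };
  related_questions?: Array<{ question: string; position: number }>;
}

// Hypothetical simplified result shape (not the package's actual type).
interface SimplifiedResult {
  organic: Array<{ position: number; title: string; snippet: string; link: string }>;
  knowledgeGraph?: { title: string; description: string };
  relatedSearches: string[];
}

function toSimplifiedResult(raw: RawSerp): SimplifiedResult {
  return {
    // Copy before sorting so the raw response is not mutated.
    organic: [...raw.organic_results].sort((a, b) => a.position - b.position),
    knowledgeGraph: raw.knowledge_graph,
    // Optional sections are normalized to an empty array.
    relatedSearches: (raw.related_questions ?? []).map((q) => q.question),
  };
}
```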
Google News Response
interface GoogleNewsResponse {
search_information: {
query_displayed: string;
total_results: number;
time_taken_displayed: number;
};
articles: Array<{
source: string;
thumbnail?: string;
title: string;
description: string;
date: string;
link: string;
}>;
pagination: {
pagesCount: number;
currentPage: number;
nextPageUrl?: string;
prevPageUrl?: string;
pages: Array<{
page: number;
url: string;
}>;
};
}
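When several pages of news results are combined, the same article can appear more than once. The sketch below merges article arrays and removes duplicates by link. It assumes the link field uniquely identifies an article; mergeArticles is a hypothetical helper, not part of this package.

```typescript
// Mirrors the article shape from the response type above.
interface NewsArticle {
  source: string;
  thumbnail?: string;
  title: string;
  description: string;
  date: string;
  link: string;
}

// Concatenates pages in order, keeping the first occurrence of each link.
function mergeArticles(...pages: NewsArticle[][]): NewsArticle[] {
  const seen = new Set<string>();
  const merged: NewsArticle[] = [];
  for (const page of pages) {
    for (const article of page) {
      if (seen.has(article.link)) continue;
      seen.add(article.link);
      merged.push(article);
    }
  }
  return merged;
}
```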
Error Handling
The package provides standardized error handling with detailed error information:
try {
const results = await provider.searchWeb('query');
} catch (error) {
console.error('Search failed:', error.message);
console.error('Status code:', error.status);
console.error('Hint:', error.hint);
// Handle specific error cases
if (error.status === 429) {
console.log('Rate limit exceeded - consider upgrading your plan');
}
}
Error Types:
- 400: Missing required parameters (url, query, apiKey)
- 429: Rate limit exceeded
- 5xx: Server errors from ScraperAPI
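The status codes listed above can be turned into a small classifier so calling code can branch on the kind of failure. This is a sketch that assumes thrown errors carry a numeric status property, as in the try/catch example above; classifyError and ErrorKind are hypothetical names.

```typescript
type ErrorKind = "bad-request" | "rate-limited" | "server-error" | "unknown";

// Maps an unknown thrown value onto the error categories listed above.
function classifyError(error: unknown): ErrorKind {
  const status =
    typeof error === "object" && error !== null && "status" in error
      ? (error as { status?: unknown }).status
      : undefined;
  if (typeof status !== "number") return "unknown";
  if (status === 400) return "bad-request";
  if (status === 429) return "rate-limited";
  if (status >= 500 && status <= 599) return "server-error";
  return "unknown";
}
```

Accepting unknown rather than a specific error type keeps the helper safe to call from any catch block.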
Plugin Integration
The package includes automatic plugin integration:
// plugin.ts
import { TokenRingPlugin } from '@tokenring-ai/app';
import { WebSearchConfigSchema, WebSearchService } from '@tokenring-ai/websearch';
import { z } from 'zod';
import packageJSON from './package.json' with { type: 'json' };
import ScraperAPIWebSearchProvider, { ScraperAPIWebSearchProviderOptionsSchema } from './ScraperAPIWebSearchProvider.ts';
const packageConfigSchema = z.object({
websearch: WebSearchConfigSchema.optional()
});
export default {
name: packageJSON.name,
version: packageJSON.version,
description: packageJSON.description,
install(app, config) {
if (config.websearch) {
app.waitForService(WebSearchService, cdnService => {
for (const name in config.websearch!.providers) {
const provider = config.websearch!.providers[name];
if (provider.type === "scraperapi") {
cdnService.registerProvider(name, new ScraperAPIWebSearchProvider(
ScraperAPIWebSearchProviderOptionsSchema.parse(provider)
));
}
}
});
}
},
config: packageConfigSchema
} satisfies TokenRingPlugin<typeof packageConfigSchema>;
Runtime Dependencies
{
"@tokenring-ai/app": "0.2.0",
"@tokenring-ai/chat": "0.2.0",
"@tokenring-ai/agent": "0.2.0",
"@tokenring-ai/websearch": "0.2.0",
"@tokenring-ai/utility": "0.2.0"
}
Development Dependencies
- vitest: Testing framework
- typescript: Type checking
Testing
Run the test suite:
# Run all tests
bun run test
# Run tests in watch mode
bun run test:watch
# Run tests with coverage
bun run test:coverage
Package Structure
pkg/scraperapi/
├── index.ts # Package entry point
├── ScraperAPIWebSearchProvider.ts # Main provider implementation
├── plugin.ts # Token Ring plugin integration
├── package.json # Package metadata and dependencies
└── README.md # This documentation
Rate Limiting and Usage
- ScraperAPI quotas: Respects your plan's rate limits
- 429 handling: Automatic retry with exponential backoff
- Usage tracking: Monitor your usage through ScraperAPI dashboard
- Best practices: Implement caching for repeated queries
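To illustrate what retrying with exponential backoff means in practice, here is a generic sketch. It is not the implementation of doFetchWithRetry; the base delay, cap, and attempt count are assumed values, and backoffDelayMs and withRetry are hypothetical names.

```typescript
// Delay before retry number `attempt` (0-based): base, 2x base, 4x base, ... capped at maxMs.
function backoffDelayMs(attempt: number, baseMs = 500, maxMs = 8000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Runs `task`, retrying failures that `shouldRetry` accepts,
// up to `maxAttempts` total attempts.
async function withRetry<T>(
  task: () => Promise<T>,
  shouldRetry: (error: unknown) => boolean,
  maxAttempts = 4,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      // Give up on the last attempt or on non-retryable errors.
      if (attempt + 1 >= maxAttempts || !shouldRetry(error)) throw error;
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Capping the delay keeps a long outage from producing unbounded waits, while the doubling spreads retries out enough to let a rate-limit window reset.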
Ethical Considerations
- Rate Limits: Respect ScraperAPI's usage limits and quotas
- Robots.txt: The service automatically respects robots.txt directives
- Frequency: Avoid high-frequency scraping; implement caching where appropriate
- Terms of Service: Comply with ScraperAPI's terms of service and target websites' policies
- Geographic Targeting: Use appropriate country codes for targeted content
Troubleshooting
Common Issues
- Missing API Key:
if (!config?.apiKey) throw new Error("ScraperAPIWebSearchProvider requires apiKey");
- Rate Limiting (429):
// Check your ScraperAPI plan limits
// Consider implementing caching
- Country Targeting:
// Verify country code is supported
// Use both countryCode and tld parameters
- JavaScript Rendering:
// JS rendering consumes more credits
// Only enable when necessary
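For the country targeting advice above, a small lookup can keep countryCode and tld in step. The table below lists a few well-known Google country domains and is deliberately incomplete; GOOGLE_TLD_BY_COUNTRY and geoOptionsFor are hypothetical names, and whether a given country code is supported by ScraperAPI still needs to be checked against its documentation.

```typescript
// Partial mapping from ISO country code to Google TLD (illustrative, not exhaustive).
const GOOGLE_TLD_BY_COUNTRY: Record<string, string> = {
  us: "com",
  gb: "co.uk",
  ca: "ca",
  au: "com.au",
  de: "de",
  fr: "fr",
  jp: "co.jp",
  in: "co.in",
};

// Returns matching countryCode and tld options, or throws for unmapped codes.
function geoOptionsFor(countryCode: string): { countryCode: string; tld: string } {
  const code = countryCode.toLowerCase();
  const tld = GOOGLE_TLD_BY_COUNTRY[code];
  if (tld === undefined) {
    throw new Error(`No TLD mapping for country code: ${countryCode}`);
  }
  return { countryCode: code, tld };
}
```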
Debug Information
Validate the configuration and log raw responses to troubleshoot issues:
// Check configuration validation
const validatedConfig = ScraperAPIWebSearchProviderOptionsSchema.parse(config);
// Monitor API responses
const response = await provider.searchWeb('test query');
console.log('Response:', response);
Performance Optimization
- Caching: Implement result caching for repeated queries
- Batch Processing: Group similar requests when possible
- Monitoring: Track usage and performance metrics
- Error Handling: Implement proper error recovery strategies
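The caching suggestion can be sketched as a small in-memory cache with a time-to-live. This is an illustration only: TtlCache is a hypothetical class, not something this package provides, and the choice of cache key and TTL is left to the application.

```typescript
// Minimal in-memory cache whose entries expire after ttlMs milliseconds.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be tested without real waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // Drop expired entries lazily on read.
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Possible use around a search call (key format is an assumption):
//   const key = JSON.stringify(["searchWeb", query, options]);
//   const cached = cache.get(key);
```

Since each ScraperAPI request counts against the plan's quota, even a short TTL can noticeably reduce usage for repeated queries.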
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Add tests for new functionality (bun test)
- Ensure all tests pass (bun run test:coverage)
- Submit a pull request
Development Guidelines
- Follow TypeScript best practices
- Include comprehensive tests for new features
- Update documentation for API changes
- Respect semantic versioning (major.minor.patch)
- Use proper error handling patterns
- Add JSDoc comments for all public APIs
Code Style
- Use consistent naming conventions
- Implement proper error handling
- Follow existing patterns for plugin integration
- Use Zod schemas for configuration validation
- Include proper TypeScript types
- Add comprehensive documentation
Support
For issues related to:
- ScraperAPI service: Refer to ScraperAPI documentation
- Token Ring integration: Check the main Token Ring repository
- Package bugs: Open an issue in this repository
- Feature requests: Submit a pull request or issue
Getting Help
- Check the troubleshooting section above
- Review the design documents in design/
- Examine test files for usage examples
- Open an issue with detailed error information
Version: 0.2.0
License: MIT
Maintainers: Token Ring AI Team