Huge Number of Source Sites, in Unknown Formats…
The client had a subscriber base to which it sent these trade signals daily.
Scale was the primary concern, because the number of sources involved was huge: upwards of 30,000 websites. Availability, velocity, and coverage of the data mattered too, given how dynamic financial markets are.
Artificial Intelligence Scoring
The system was tuned to operate at this scale and to crawl sources adaptively based on their update frequency (very active sources vs. nearly dormant ones).
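For illustration only, a minimal Python sketch of such adaptive scheduling, assuming a simple per-source back-off rule; the class name, interval bounds, and multipliers are illustrative, not the client's actual implementation:

```python
import time
from dataclasses import dataclass

@dataclass
class SourceSchedule:
    """Per-source crawl cadence (illustrative names and values)."""
    url: str
    interval_s: float = 3600.0   # start every source at an hourly crawl
    last_crawled: float = 0.0

    MIN_INTERVAL_S = 300.0       # floor for very active sources: every 5 min
    MAX_INTERVAL_S = 86400.0     # ceiling for nearly dormant sources: daily

    def record_result(self, content_changed: bool) -> None:
        # Speed up when a source keeps changing; back off when it is quiet.
        if content_changed:
            self.interval_s = max(self.MIN_INTERVAL_S, self.interval_s / 2)
        else:
            self.interval_s = min(self.MAX_INTERVAL_S, self.interval_s * 1.5)
        self.last_crawled = time.time()

    def due(self) -> bool:
        return time.time() - self.last_crawled >= self.interval_s
```

Halving the interval when a source changes and growing it slowly when it does not keeps hot sources fresh while nearly dormant ones consume almost no crawl budget.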
Alerts flagged dead sources, so crawl results stayed accurate and the overall system ran more efficiently than the client required. To meet the low-latency requirement of 5-10 minutes, a few components were added to supply the necessary computing power.
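A sketch of the dead-source alerting in the same spirit; the consecutive-failure threshold and the log-based alert are assumptions, since the case study does not describe the actual mechanism:

```python
import logging

log = logging.getLogger("crawler.health")

DEAD_AFTER_FAILURES = 5  # assumed threshold; the real value is not documented

failure_counts: dict[str, int] = {}

def record_crawl(url: str, ok: bool) -> None:
    """Track consecutive failures per source and alert once a source
    looks dead, so it can be pulled from the active crawl set."""
    if ok:
        failure_counts[url] = 0
        return
    failure_counts[url] = failure_counts.get(url, 0) + 1
    if failure_counts[url] == DEAD_AFTER_FAILURES:
        # The production system might page an operator or update a source
        # registry; this sketch only emits a log-based alert.
        log.warning("Source looks dead after %d consecutive failures: %s",
                    DEAD_AFTER_FAILURES, url)
```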
The crawled data was indexed using a hosted indexing component and fed into an Artificial Intelligence API, which, every few minutes, scored the crawled items and classified them as positive or negative.
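For illustration, a Python sketch of the scoring call; the endpoint URL and the request/response shape are hypothetical, since the case study does not name the AI API that was used:

```python
import requests

SCORING_ENDPOINT = "https://ai-scoring.example.com/v1/score"  # hypothetical URL

def score_item(text: str) -> dict:
    """Send one crawled item to the (hypothetical) scoring API and return
    its sentiment label and score."""
    resp = requests.post(SCORING_ENDPOINT, json={"text": text}, timeout=10)
    resp.raise_for_status()
    payload = resp.json()
    # Assumed response shape: {"score": 0.87, "sentiment": "positive"}
    return {"sentiment": payload["sentiment"], "score": payload["score"]}
```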
Final results were delivered to the client's information system in JSON format.
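The case study does not document the delivery schema, but a record along these lines shows the kind of JSON payload such a feed could carry (all field names are assumptions):

```python
import json
from datetime import datetime, timezone

def to_feed_record(url: str, title: str, sentiment: str, score: float) -> str:
    """Serialize one scored item into a JSON record for the client's
    information system (illustrative schema)."""
    record = {
        "source_url": url,
        "title": title,
        "sentiment": sentiment,  # "positive" or "negative"
        "score": score,          # model confidence in [0, 1]
        "crawled_at": datetime.now(timezone.utc).isoformat(),
    }
    return json.dumps(record)
```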
Benefits for Client
- 100% API availability and continuous data feeds
- Dynamic list of keywords and sources
- Zero data processing efforts at client’s end
- Scalable infrastructure reduced client’s costs
- Artificial Intelligence and Deep Learning analysis
- Client’s workload limited to querying internal datasets and running analysis