What web data extraction can and can't do

Big data companies like mine make a living by solving other people's problems with data. "Big data" is such a buzzword that it creates a lot of misconceptions in people's minds. I blame TechCrunch and other blogs, which played a major role in confusing people. Wouldn't you?

There are many things big data can do and many things it can't, and educating prospects on the difference is a tough job.

What can't web scraping do?

Instead of giving a long, boring lecture, I think it is better to cite some examples of what can't be done. Don't take the word "impossible" literally when I use it; read it as "it is not possible to do that within a reasonable timeframe and budget."

1) It is impossible to scrape data from all the e-commerce websites in the US.

2) It is impossible to build a single web scraper to scrape all the websites.

3) It is impossible to crawl the whole web to extract only startup data.

Why can't these things be done?

The answer is rather simple: machines can't tell one type of data from another unless the difference can be expressed as programming logic. As of now, there is simply no programming logic that solves these problems.
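To make that concrete, here is a minimal Python sketch using only the standard library's html.parser. The two shop pages, their markup, and the helper name extract_first are hypothetical, made up for illustration. The point it shows: an extraction rule is programming logic tied to one site's HTML structure, so a rule written for one page finds nothing on another page that shows the same kind of data (a price) with different markup.

```python
from html.parser import HTMLParser
from typing import Optional


class ClassTextExtractor(HTMLParser):
    """Collects the text inside the first <tag> whose class list contains target_class.

    Deliberately simple: it does not handle the same tag nested inside itself.
    """

    def __init__(self, tag: str, target_class: str) -> None:
        super().__init__()
        self.tag = tag
        self.target_class = target_class
        self._capturing = False
        self.result: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None for bare attributes
        if self.result is None and tag == self.tag:
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self._capturing = True
                self.result = ""

    def handle_endtag(self, tag):
        if self._capturing and tag == self.tag:
            self._capturing = False

    def handle_data(self, data):
        if self._capturing:
            self.result += data


def extract_first(html: str, tag: str, css_class: str) -> Optional[str]:
    """Hypothetical helper: apply one hand-written (tag, class) rule to a page."""
    parser = ClassTextExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.result.strip() if parser.result is not None else None


# Two hypothetical shops showing the same fact (a price) with different markup.
site_a_html = '<div class="product"><h1>Mug</h1><span class="price">$12.99</span></div>'
site_b_html = '<div class="item"><h2>Mug</h2><p class="cost-label">USD 12.99</p></div>'

if __name__ == "__main__":
    print(extract_first(site_a_html, "span", "price"))     # rule written for site A works on A
    print(extract_first(site_b_html, "span", "price"))     # same rule finds nothing on B
    print(extract_first(site_b_html, "p", "cost-label"))   # B needs its own rule
```

Scaling this to every e-commerce site in the US would mean writing and maintaining a rule like this for each site, which is exactly the timeframe-and-budget problem behind the "impossible" examples.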

Read the full story here: What web data extraction can and can't do
