What web data extraction can and can't do
Big data companies like mine make a living by solving other people's problems with data. "Big data" is such a buzzword that it creates a lot of misconceptions in people's minds. I blame TechCrunch and other blogs, which played a major role in confusing people. Will you?
There are many things big data can do and many it can't. Educating prospects about both is a tough job.
What can’t web scraping do?
Instead of giving a long, boring lecture, I think it is better to give some examples of what can’t be done. Don't take the word "impossible" literally when I use it. Read it as "it is not possible to do that in a reasonable timeframe and budget."
1) It is impossible to scrape data from all the e-commerce websites in the US.
2) It is impossible to build a single web scraper to scrape all the websites.
3) It is impossible to crawl the whole web to extract only startup data.
Why can’t these things be done?
The answer is rather simple: machines can’t differentiate one type of data from another unless the difference can be expressed in programming logic. As of now, there is simply no programming logic that solves these problems.
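To make the point about programming logic concrete, here is a minimal sketch in Python. The two HTML snippets, the shop names, and the `extract_price` helper are all invented for illustration; real sites are far messier. It only shows that a rule written for one site's markup finds nothing on another site that publishes the same fact differently.

```python
import re

# Two hypothetical product pages that publish the same fact (a price)
# in different markup. Both snippets are invented for illustration.
PAGE_SHOP_A = '<div class="product"><span class="price">$19.99</span></div>'
PAGE_SHOP_B = '<table><tr><td>Cost</td><td>USD 19.99</td></tr></table>'

# Each site needs its own hand-written extraction rule.
RULES = {
    "shop_a": re.compile(r'<span class="price">\$([\d.]+)</span>'),
    "shop_b": re.compile(r'<td>Cost</td><td>USD ([\d.]+)</td>'),
}


def extract_price(html, rule_name):
    """Return the price as a float, or None if the rule does not match."""
    match = RULES[rule_name].search(html)
    return float(match.group(1)) if match else None


print(extract_price(PAGE_SHOP_A, "shop_a"))  # rule written for shop A
print(extract_price(PAGE_SHOP_B, "shop_a"))  # same rule applied to shop B
print(extract_price(PAGE_SHOP_B, "shop_b"))  # rule written for shop B
```

The rule for shop A returns nothing on shop B's page, even though a person would spot the price at once. Multiply that by every site layout in existence and the three "impossible" examples start to look like what they are: a question of time and budget.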