Monday 26 May 2014

Smarter Web scraping from Connotate

Connotate has significantly upgraded its technology with the introduction of Connotate4, which, the company explains, simplifies and streamlines the “Webdata” extraction process and ensures full coverage of a website. It adds that the key component of Connotate4 is a custom browser built on WebKit, the engine that powers such browsers as Safari and Chrome.

Connotate states its core technology is based on visual abstraction techniques that allow machines to view Web pages as humans do, enabling high-volume extraction of data from Web pages to be automated through a point-and-click interface. Further, it says, because “agents” are not relying on HTML code to find the data to extract, they can easily adjust to moderate site changes without breaking.

In addition to the custom browser at the center of Connotate4, the company highlights the following capabilities:

    Inline data transformation within the agent development process is a powerful new capability that will ease data integration and customization.

    Enhanced change detection with highlighting can be enabled during the agent development process via a point-and-click checkbox; changes are then highlighted at the character, word or phrase level.

    Parallel extraction tasks shorten completion times, allowing greater scalability for larger extractions.

    Build-and-expand capabilities turn the act of reusing a single agent for related extraction tasks into a one-click event, allowing for faster Agent creation.

    A simplified user interface enables easier and faster Agent development.
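
To make the idea of word-level change highlighting concrete, here is a minimal sketch in Python using the standard library's difflib module. It is not Connotate's implementation, which the article does not describe; the function name highlight_changes and the [[ ]] markers are assumptions chosen purely for illustration.

```python
import difflib


def highlight_changes(old: str, new: str) -> str:
    """Return `new` with inserted or replaced words wrapped in [[ ]].

    Hypothetical helper for illustration only; the marker style is arbitrary.
    """
    old_words = old.split()
    new_words = new.split()
    # Compare the two texts word by word rather than character by character.
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(new_words[j1:j2])
        elif tag in ("replace", "insert"):
            out.append("[[" + " ".join(new_words[j1:j2]) + "]]")
        # "delete": words dropped from the old text leave nothing to mark here.
    return " ".join(out)


if __name__ == "__main__":
    print(highlight_changes("Price: 10 USD", "Price: 12 USD"))
```

Splitting on characters instead of words would give character-level highlighting by the same approach.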

The new release allows Connotate’s intelligent extraction Agents to access about 95 percent of Webdata. And the adaptive platform can quickly accommodate new Web properties and technologies as they emerge, providing the ability to scale far beyond the competitive landscape. Existing customers of Connotate’s hosted solution will not be affected by the introduction of this new platform. On-premises customers will be migrated on an as-needed basis.

Source: http://www.kmworld.com/Articles/News/News/Smarter-Web-scraping-from-Connotate-96637.aspx

Monday 19 May 2014

Five tips to get started with big data

Everyone seems to be talking about "big data" these days. Do you wonder what you’re missing out on? Let’s take a look at how you can get started with big data.

    Learn what it is, and what it is not. While we are all comfortable with the concept of data, why the emphasis on "big" data now? The world of data has changed. Data is now produced, and often must be consumed, at nearly the speed of thought. With sensor data, geospatial tracking data and the discourse of human conversation being captured in social media today, our heads are left spinning with the amount of data being collected. And this is not just facts about dollars and cents or counts of activity. These are complex data points, often verbose text data that must be interpreted to identify the meanings relevant to your business. Big data simply means that you have a lot of data, and it is quite complex to use.

    Get with the program. Organizations can no longer stick their collective heads in the sand about the amount of data they must handle.

    Find the right people. Notice people come before tools. Data analysis tools are often touted as the “magic solution” to big data. While they will help you get there, you must have the right people who understand business strategy and data analysis (including statistical analysis, business modeling, and data mining).

    Find the right tools. The centralized IT group cannot keep up with the business’s demands for data and analytics. Self-service BI and analytics is the name of the game here. You want tools with features such as in-memory analytics that provide high-powered analysis on desktop-sized machines that your analysts can use.

    Be an IT-hugger. So often, once the business users and analysts get their hands on high-powered tools, they decide that IT is no longer needed. This could not be further from the truth. Often, IT has developed complex solutions for integrating data from large and diverse systems, providing standard methods for retrieving the data you need. In addition, IT is trained to plan for critical elements of your environment such as security and scalability. Figure out what the business needs, develop solid business rules around these requirements and learn all you can from IT.

Not so hard, right? Now, the next step? Go analyze. Go figure out how to turn your good company into a great company with the power of analytics.

Source: http://www.bizjournals.com/nashville/blog/2014/05/five-tips-to-get-started-with-big-data.html