
Eric Ries’ Startup Lessons Learned 3 days, 22 hours ago. by Joseph Kelly

In June the Infochimps attended an event in Austin where Eric Ries gave a talk about the Lean Startup. His ideas inspired further reading, and we have been applying his methodology to making Infochimps.org a sustainable and profitable web service. Here is a breakdown of two of the ideas Eric writes about, which also overlap with Steve Blank’s wonderful book, The 4 Steps to the Epiphany.

1) Product development vs. customer development: In product development the team builds a product that they spec’d out themselves in the early stages. Customer development instead is about developing the market. It is a more holistic approach to building a company and launching a product. And customer development deeply integrates with agile software development. Every code deploy happens for a reason - it is in the service of some story that solves an identified need of the customer or users. How do you know what those needs are? You need to have talked to real customers and users.

Our site is built by two physics researchers, scientists intimately familiar with the problems of finding and sharing data on the web. They have thought well into the future about how our site can solve these problems. Our feature list is long and describes a killer application. Problems arise, however, when we try to organize and prioritize this list. User testing helps tremendously. Observing how people use the site teaches us which features our users have trouble with and which features we can neglect because nobody uses them. For example, user testing showed that Search is our most important feature and that browsing by category matters less.

Once we started talking to customers, our organizational priorities became much clearer as well. Talking to Data Suppliers taught us which features of the site matter most to them, which clauses of our Data Supplier Agreement they have the most trouble with, and the best way to talk to them about selling their data on our site.

2) What type of market are you in? Steve Blank drives this point home in nearly every chapter of his book. Is your product competing in a market that already exists? If so, does it resegment that market by price or niche? Or is your product creating a new market?

Steve’s clearest example of this is the PDA market. When the first PDA came out, it created a new market. People could now do something they had never been able to do before: sync their computer with a handheld device and work on the go. Marketing and PR efforts had to focus on educating people about these new tools and what they could do, not on product features. Once PDAs became an existing market with multiple players, marketing and PR efforts had to switch goals, and the conversations became less about new possibilities and more about individual features, like whether this PDA had 8MB of memory and a 10in screen.

Infochimps has to split our pitch between the existing markets we resegment and the new markets we create. Data is already sold in the Market Research and Finance industries; our website resegments these existing industries by offering different features and benefits. When we spoke to Zogby we didn’t have to tell them they could sell their data; they already do. We just had to show them why Infochimps is different and a better solution. Most other businesses don’t sell their data yet, and our website makes that possible. It is much harder to talk a taxicab company into selling their data: we first have to make the case that doing so can be profitable. Our job is to educate this mainstream market about the new opportunities their data opens up.

The data landscape online, as we see it. Part 1 24 days ago. by Joseph Kelly

Nathan at FlowingData did a wonderful job last week culling 30 great resources from the web for finding data. Yesterday another site launched, Factual, making great resource number 31. We are excited to see a growing number of companies spring up that increase everyone’s access to data. Solving the problems with data online is no small task, and not one any single player can take on alone. It’s a team effort, and we are proud to be a part of it.

We thought we would take a minute today to talk about the problems as we see them, and how players within the online data market are choosing to tackle these problems.

The first problems are finding and sharing data. Most of these sources already solve this problem. Socrata and Factual let users upload data onto their sites, and each company’s datasets are easily searchable along with what’s on Data.gov and Numbrary.

There are also other, more technical issues. Swivel, Socrata, Factual, Many Eyes: all of these websites let users play around with data live on the site. This creates costly problems for the hosting company:

1. The data has to live in their platform and reconcile with the whole.

2. Many new datasets are on the order of gigabytes in size.

Whereas datasets on Infochimps can be of any size, format, or shape, their datasets must be in a standard csv/tsv/xls format and are limited to a few hundred megabytes. In reality, statisticians want data in SAS formats, and geographical data comes in GIS formats. Because today’s datasets are so large, in-browser tools are insufficient for working with and understanding the data, and a person’s options for distributing that data are limited as well.

Data, especially valuable data, is often proprietary. Its owners won’t release it unless there are clear licenses and terms of use. Unfortunately, open data doesn’t include all of the data in the world. We differ from the other open data players in that, while we are committed to hosting open data for free and maintaining our open data commons for everyone’s benefit, we will also host licensed data: organizations can permit only users who have agreed to a license or paid for access to download their data. As the data marketplace grows, we believe more and more buyers will see the value in looking for data on Infochimps. Our aim is to give an incentive to the long tail of businesses with data gathering dust on hard drives that could be useful to another person or organization.

Calling all Pollsters 26 days ago. by Joseph Kelly

Carl Bialik, of the WSJ’s Numbers Guy blog, highlighted the recent controversy in the opinion polling industry over Strategic Vision’s decision not to share their polling methodology or raw data. Pollster.com and FiveThirtyEight have also weighed in on the problem.

Our message to opinion polling firms is this: share or sell your data on Infochimps.org.

Free public polls can be distributed at no cost on our site. If you’d like to charge for downloads of your data, set your own price. Your data will live in a place where the whole world can find it, bringing you a larger and broader audience.

Get in touch at upload@infochimps.org.

New site is live 1 month, 15 days ago. by Joseph Kelly

Thanks to all the new visitors who have come by. We appreciate the coverage from www.gigaom.com and others. We thought we’d take a moment to explain what we hope to accomplish with this launch.

With this launch anyone can edit or add datasets to the site. Very soon, uploading will work and we can host and distribute open licensed datasets for free. These are our steps towards building an open data commons.

Additionally, the new site offers a few datasets for sale. These datasets are not ours but are owned by others, and we make a commission on each sale. One example is the TAKS dataset, which contains all of the standardized test score data for students in the state of Texas. Prying this dataset out of the government cost one researcher $1,400, and the format it came in was awful. On Infochimps you can find the same dataset in a cleaned-up format for a much lower price: $15.

We see this marketplace as an incentive for the world’s data gatherers to put their data somewhere others can find it. By letting people charge for their data, we encourage data to come out of the woodwork that might otherwise remain behind closed doors.

We hope you enjoy playing around with the site. If you are excited to send data our way before we get upload working, please get in touch: upload@infochimps.org.

jammin’ to data 1 month, 17 days ago. by maegan

While swinging through the jungle, one of the Infochimps came across this awesome video by They Might Be Giants, Meet the Elements, featured on FlowingData.

Inspired, we decided to start hunting for more awesome data viz music videos.

Here’s Radiohead’s House of Cards (uses 3D data)

Another one of our favorites is Röyksopp’s Remind Me.

Hungry for more? Check out search results from FlowingData on music and video.

APIs and Datasets, living in harmony 1 month, 21 days ago. by Joseph Kelly

The most popular way to access data on the web right now is through an API. APIs offer two big advantages: real-time data and outsourced computation. These benefit the end user and the developer, while the API provider has to eat the cost of providing the service. It is worth it for the provider, though, since a myriad of services can be built on top of its primary service.

There are some things an API can’t give you, though. An API generally can’t give you historical data; Twitter’s API, for example, only lets you go back XX tweets. This means a service built late in the game may not carry the same value as one built in the early days of an API, because the earlier service’s data goes back further.

Next, APIs only give you pieces. The scale of the questions you can ask is limited by the rate limit and by the size of the pieces returned. Services can’t ask for everything, and they may be further limited by the bandwidth and load on the primary API.
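
To make the rate-limit point concrete, here is a back-of-the-envelope sketch. The crawl size, page size, and hourly limit below are hypothetical numbers chosen for illustration, not the actual limits of Twitter or any other API.

```python
import math


def crawl_hours(total_items: int, items_per_request: int, requests_per_hour: int) -> float:
    """Hours needed to page through total_items under a fixed hourly rate limit.

    All three inputs are hypothetical; substitute the real limits of the API you crawl.
    """
    requests_needed = math.ceil(total_items / items_per_request)
    return requests_needed / requests_per_hour


# Hypothetical crawl: 1,000,000 friend lists, one list per request, 150 requests per hour.
hours = crawl_hours(1_000_000, 1, 150)
print(f"{hours:.0f} hours, about {hours / 24:.0f} days")  # prints "6667 hours, about 278 days"
```

Under those assumed numbers a single client needs most of a year to collect what a complete dump would hand over in one download.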

The types of questions we’re talking about concern the deep structure of the data in question. One reason our near-complete scrape of Twitter’s friend graph was so popular is that this type of dataset is extremely valuable to network researchers. The research a graph like Twitter’s makes possible is phenomenal. Without such a dataset, researchers are left like the Antarctic explorers of the past: slowly crawling new territory, making maps and filling in details only as they come along, piece by piece.
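
As a small illustration of a "deep structure" question, the sketch below computes follower counts and mutual-follow pairs from a toy edge list. The account names and edges are invented; a real friend-graph dump would hold millions of (follower, followed) pairs, and the answers are only complete if every edge is present.

```python
from collections import Counter

# Toy "who follows whom" edge list of (follower, followed) pairs; names are hypothetical.
edges = [
    ("alice", "bob"),
    ("carol", "bob"),
    ("dave", "bob"),
    ("bob", "alice"),
    ("carol", "alice"),
]

# Follower count (in-degree) for each account, computed directly from the edges.
followers = Counter(followed for _, followed in edges)

# Reciprocal pairs: both (a, b) and (b, a) appear; keep each pair once via a < b.
edge_set = set(edges)
mutual = sorted((a, b) for a, b in edge_set if (b, a) in edge_set and a < b)

print(followers.most_common())  # [('bob', 3), ('alice', 2)]
print(mutual)                   # [('alice', 'bob')]
```

With a partial crawl, both results silently undercount, which is why researchers prize the complete graph.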

The value APIs provide to services and to the outside world is undeniable. The problems that APIs leave open can be solved by services periodically providing complete dumps. These datasets of complete and historical data will not only let researchers get to work improving their science, but will also let applications seed their service with the latest dataset and then keep it current through the API.
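
The seed-then-update pattern can be sketched in a few lines. This is a minimal illustration under stated assumptions: `seed_then_update` is a hypothetical helper, and each record is assumed to be a dict with an `"id"` and a monotonically increasing `"seq"` number. Real dumps and APIs will expose their own identifiers and cursors.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def seed_then_update(dump: Iterable[Record], api_updates: Iterable[Record]) -> Tuple[Dict[Any, Record], int]:
    """Load a bulk dump into a local store, then apply only records newer than the dump.

    Hypothetical record shape: {"id": ..., "seq": int, ...}; "seq" increases over time.
    Returns the store and the newest sequence number seen (the cursor for the next poll).
    """
    store = {rec["id"]: rec for rec in dump}
    dump_seq = max((rec["seq"] for rec in store.values()), default=0)
    latest = dump_seq
    # Apply updates in sequence order so later versions of a record win.
    for rec in sorted(api_updates, key=lambda r: r["seq"]):
        if rec["seq"] > dump_seq:  # skip anything the dump already covers
            store[rec["id"]] = rec
            latest = rec["seq"]
    return store, latest


dump: List[Record] = [{"id": 1, "seq": 1, "text": "a"}, {"id": 2, "seq": 2, "text": "b"}]
updates: List[Record] = [
    {"id": 2, "seq": 2, "text": "b"},   # already in the dump, ignored
    {"id": 2, "seq": 4, "text": "b2"},  # newer version of record 2
    {"id": 3, "seq": 3, "text": "c"},   # brand-new record
]
store, cursor = seed_then_update(dump, updates)
```

The API then only has to serve the small tail of changes since the dump, which is where the cost savings come from.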

Should services share their data on a platform like Infochimps, they not only provide a great service to applications and researchers, but also reduce their own costs. The load on their API is lighter, as fewer requests have to be made for data. And when researchers have the complete dataset sitting on their hard drive, they no longer depend on the API provider for compute time; local access to the data makes their job much faster and easier.

The two approaches to sharing data are complementary. Freebase does a great job at this, and we hope other services will soon follow suit.

SXSW Data Panels 2 months, 20 days ago. by Joseph Kelly

We are especially excited to announce that big data is coming to SXSW. Here are the panels we like:

Pete Skomoroch of DataWrangling: Petabyte As Platform, Making Big Data Accessible Online - We have long been fans of Pete Skomoroch’s work; this is your chance to hear from him about web applications built on massive datasets.

Our own mrflip: Scraping the Social Web - Flip has done extensive work building massive datasets from social media sites.  Hear him talk about the nuances involved and ask him about best practices.

Michael Driscoll of Dataspora: Cloud Crunching Big Data with HIVE/Hadoop and R and Become a Sexy Data Geek in One Week - Another friend of ours, Michael will be talking about how to use the right tools to massage big datasets and produce results, and about what it takes to become a data geek.

Stu Hood of Rackspace: Using Hadoop to Manage a Ton of Data - Hadoop might be the most important tool to know for working with terabytes and terabytes of data.

Ian Davis of Talis: Set Your Data Free - Talis does great work.  Listen to Ian cover topics very relevant to Infochimps.org’s collection: data copyright and licensing.

Dave Bowker of Designing the News: Engaging Data Visualizations and Infographic Communication - Glad to see some data viz stuff at SXSW.

Casey Caplowe of GOOD: Interactive Infographics - More visualizations, GOOD stuff.

Leave a comment if you know of any other good ones.

Infochimps receives a donation from SmartBear 3 months, 10 days ago. by Joseph Kelly

Smart Bear Software is an Austin-based company whose founder, Jason Cohen, is one of our favorite people.  Jason grew Smart Bear from the ground up, and he has helped the Infochimps team in the past with practical advice.  Jason blogs about marketing and small business at http://blog.asmartbear.com/ and he is well worth reading.  

The Infochimps rely on agile methods to build Infochimps.org, a process that benefits from a code review tool. Smart Bear’s product, Code Collaborator, is a well-known online peer code review tool that simplifies and speeds up code reviews, helping teams produce higher-quality, tested code more efficiently.

Smart Bear’s latest promotion offered 5 seats of one of their code review tools for $5.  As a part of this promotion, they selected a start-up company to receive the funds collected from the promotion.  Infochimps won!  Smart Bear has graciously donated $2220 to Infochimps to help our mission of increasing the world’s access to data.  We appreciate their acknowledgment of our work and we know we can put the funds to good use.

To see how we reacted to the news, check out the video below:

Open a banana like a Monkey does 3 months, 27 days ago. by mrflip

Open a Banana like a Monkey - most human primates do it wrong!

To go with the open banana, here is some open banana data:

It's Hot, Damn Hot. So Hot I saw a Chimp in Orange Robes Burst into Flames. 3 months, 29 days ago. by mrflip

It’s been ridiculously hot ridiculously early this year in Austin. A friend passed along this link to a visualization of 100+ degree days over the last 10 years. The author couldn’t find data extending back farther than 2000, but luckily I knew where to look.

I pulled the NCDC weather data for Austin from 1948 to the present (see the infochimps.org link for details) and got my Tufte on.

This temperature cycle is hotter than but comparable to the 1950-1965 era. I’ve got no idea if it’s global warming or the peak of a cycle. The fundamental conclusion — that this year so far, 2000 and 2008 were damn hot — stands up well.
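
Counting 100+ degree days per year is a simple aggregation once the daily highs are in hand. The sketch below assumes a simplified, hypothetical CSV with a `date,tmax_f` header and ISO dates; raw NCDC files use their own layouts and units and would need converting first. The sample rows are made up for illustration and are not real Austin readings.

```python
import csv
import io
from collections import Counter


def hot_days_per_year(csv_text: str, threshold_f: float = 100.0) -> Counter:
    """Count days per year whose maximum temperature reached threshold_f (°F).

    Assumes a hypothetical two-column CSV: header "date,tmax_f", dates as YYYY-MM-DD.
    Rows with a missing reading are skipped.
    """
    counts: Counter = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["tmax_f"] and float(row["tmax_f"]) >= threshold_f:
            counts[row["date"][:4]] += 1  # key by the year part of the date
    return counts


# Invented sample rows, including one missing reading.
sample = """date,tmax_f
2008-08-01,101
2008-08-02,99
2008-08-03,103
2009-06-25,100
2009-06-26,
"""
result = hot_days_per_year(sample)
```

Plotting `result` year by year over the full 1948-present record gives the kind of long-run comparison described above.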
