<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>blog.infochimps.org</title>
	<atom:link href="http://blog.infochimps.org/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.infochimps.org</link>
	<description>Organizing huge information resources</description>
	<lastBuildDate>Tue, 09 Mar 2010 16:43:28 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Data Cluster Meetup</title>
		<link>http://blog.infochimps.org/2010/03/09/data-cluster-meetup/</link>
		<comments>http://blog.infochimps.org/2010/03/09/data-cluster-meetup/#comments</comments>
		<pubDate>Tue, 09 Mar 2010 16:43:28 +0000</pubDate>
		<dc:creator>maegan</dc:creator>
				<category><![CDATA[main]]></category>

		<guid isPermaLink="false">http://blog.infochimps.org/?p=555</guid>
		<description><![CDATA[Austin, TX may be the live music capital of the world, but next weekend Rackspace, together with Infochimps, WolframAlpha, Factual, and knowmore, is putting together an event that will prove it’s not just about the music.
Data geeks from all over the nation will come together to discuss the latest developments in the world of data [...]]]></description>
			<content:encoded><![CDATA[<p>Austin, TX may be the live music capital of the world, but next weekend <a href="http://www.rackspace.com/index.php">Rackspace</a>, together with <a href="http://infochimps.org">Infochimps</a>, <a href="http://www.wolframalpha.com/">WolframAlpha</a>, <a href="http://www.factual.com/">Factual</a>, and knowmore, is putting together an event that will prove it’s not just about the music.</p>
<p>Data geeks from all over the nation will come together to discuss the latest developments in the world of data during birds-of-a-feather sessions, talks and pure and simple mingling (not to mention munching on free food) at the <a href="http://datacluster.infochimps.org">Data Cluster Meetup</a> (Sunday, March 14, 6pm at <a href="http://maps.google.com/maps?f=d&amp;source=s_d&amp;saddr=4th+%26+Trinity,+Austin+TX+78701+(Austin+Convention+Center)&amp;daddr=700+West+6th+Street,+Austin,+TX+78701+(Infochimps+Meetup+at+Opal+Divine%27s)&amp;geocode=FZfPzQEdHpss-ilTRaTcp7VEhjEddh6-FyCCQg%3BFZDizQEdGXUs-ilTkQMtDLVEhjFTQ86cS4ttJw&amp;hl=en&amp;mra=ls&amp;dirflg=r&amp;date=2%2F21%2F10&amp;time=6:00pm&amp;ttype=dep&amp;noexp=0&amp;noal=0&amp;sort=&amp;sll=30.266962,-97.744868&amp;sspn=0.015734,0.014956&amp;ie=UTF8&amp;ll=30.267667,-97.74446&amp;spn=0.015734,0.014956&amp;z=16&amp;start=0/">Opal Divine’s Freehouse</a>).</p>
<p>Not excited yet?  Read on…</p>
<p><strong>Non-relational Database Smackdown</strong><br />
Stu Hood of the Cassandra project will lead a debate on the merits of various non-relational databases.  Any CouchDB or MongoDB users out there?  RSVP and get in touch if you’d like to join the panel.</p>
<p><strong>Birds-of-a-feather</strong><br />
There will be five birds-of-a-feather sessions going on concurrently.  The topics were chosen so that everyone can find one that matches their interests:</p>
<p>1.	Operations (managing data) – Stu Hood of <a href="http://www.rackspace.com/index.php">Rackspace</a> and the <a href="http://incubator.apache.org/cassandra/">Apache Cassandra project</a> will lead a discussion on non-relational databases<br />
2.	Analytics (exploring data) – [No moderators locked in yet.  Interested?  Email info@infochimps.org]<br />
3.	Web Applications (humanizing data) – [No moderators locked in yet.  Interested?  Email info@infochimps.org]<br />
4.	Visualization (seeing data) – [No moderators locked in yet.  Interested?  Email info@infochimps.org]<br />
5.	Data Commons (freeing data) – <a href="http://infochimps.org">Infochimps</a>’s own Flip Kromer, together with <a href="http://www.factual.com/">Factual</a>’s Gil Elbaz, will lead a discussion on building a cross-domain data commons.</p>
<p><strong>Mingling</strong><br />
The best part of this event is the people.  You’ll have time to talk, eat, and network with some of the greatest minds in the data world and exchange cutting-edge ideas.</p>
<p>If you’re a really smart data geek, you can’t miss this chance to immerse yourself in the world you love.  RSVP now at <a href="http://datacluster.infochimps.org">http://datacluster.infochimps.org</a>.  Afterwards, check out our Facebook event page for more information on who’s coming and the latest updates.</p>
<p>None of this would be possible without our sponsors, <a href="http://www.rackspace.com/index.php">Rackspace</a>, <a href="http://infochimps.org">Infochimps</a>, <a href="http://www.wolframalpha.com/">WolframAlpha</a>, <a href="http://www.factual.com/">Factual</a> and knowmore.  To all of them, thank you!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.infochimps.org/2010/03/09/data-cluster-meetup/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How to create datasets that the rest of the world needs</title>
		<link>http://blog.infochimps.org/2010/03/04/how-to-create-datasets-that-the-rest-of-the-world-needs/</link>
		<comments>http://blog.infochimps.org/2010/03/04/how-to-create-datasets-that-the-rest-of-the-world-needs/#comments</comments>
		<pubDate>Thu, 04 Mar 2010 18:39:31 +0000</pubDate>
		<dc:creator>Joseph Kelly</dc:creator>
				<category><![CDATA[main]]></category>

		<guid isPermaLink="false">http://blog.infochimps.org/?p=546</guid>
		<description><![CDATA[We recently created a dataset for the site that maps IP addresses to zip codes and census demographic information.  The work behind it is representative of the kind of community involvement we want Infochimps to have in the future.  The people who will [...]]]></description>
			<content:encoded><![CDATA[<p>We recently created a dataset for the site that maps IP addresses to zip codes and census demographic information.  The work behind it is representative of the kind of community involvement we want Infochimps to have in the future.  The people who will find this dataset useful &#8211; web site owners, internet advertisers &#8211; are not always going to be the same people who can create such a dataset.  This division of labor can only happen when experts at data gathering can share their data in a place where people who want to use it can find it.</p>
<p>Our social media expert Maegan recently interviewed Carl, a member of our data team, to talk about this dataset creation process.  You can find the IP-Census data he&#8217;s talking about here: <a href="http://infochimps.org/collections/ip-address-to-us-census-data">http://infochimps.org/collections/ip-address-to-us-census-data</a>.</p>
<p>M: Hi Carl, would you start by introducing yourself and telling us what you do for Infochimps?</p>
<p>C: I’m a member of the data team here at Infochimps.  Basically, we’re the team in charge of gathering data that’s available on the web, cleaning it up and making it more useful for other people out there who are looking for this sort of data.</p>
<p>M: I can imagine how appealing that data is to a lot of people.  Speaking of useful data, I heard that you recently came up with a collection of datasets that link IP addresses to Census information.  Can you tell me more about it?</p>
<p>C: Well, we heard from a few people that that sort of thing might be interesting.  There are a lot of people out there who want to know more about the people who come to their website.  Using this dataset, they can get demographic details from their visitors’ IP addresses.  That way they can improve their understanding of their audience and target the content on their website better.  The dataset that we have links IP addresses to zip codes, and then zip codes to all sorts of demographic data from the Census.</p>
<p>M: I saw that you have so many different types of information from the Census.  Where did you go to find the data to mash together?</p>
<p>C: For the Census data, that’s a fairly well-known source.  The US government has a Census website, <a href="http://factfinder.census.gov">Factfinder.census.gov</a>, where you can go to download all sorts of information.  As for the IP-to-geolocation data, there are a lot of datasets available.  We were looking for one that had good coverage of IP addresses, was available for free, and had a license that allowed us to take that data, do what we wanted with it and make it available on our site.</p>
<p>M: Is this a new kind of dataset?  Or is it available elsewhere?</p>
<p>C: The IP to geolocation dataset is available from where we got it &#8211; at <a href="http://www.maxmind.com">MaxMind</a>.  Linking that to the Census data is something that I don’t think we’ve seen elsewhere.</p>
<p>M: How did the process work once you had the data?</p>
<p>C: The Census data is divided into a lot of different geographic segments – national, state, city, county and all those sorts of things, but the IP geolocation data only uses zip codes.  We wanted just the data from the Census that’s associated with the zip codes, so I had to comb through the Census data and pull out just the lines of the data that are associated with zip codes and then use that to match up to the IP addresses in the geolocation data.</p>
<p>M: Is it just how they’re organized?</p>
<p>C: Yeah, it’s more about how it’s organized.  The Census data is organized into a few different files.  You have one file that lists all the different breakdowns of how the data is divided up – like I was saying, by state, city, zip code or the whole country.  Each of those breakdowns is associated with a logical record number.  Then, each line of the actual Census data files starts with the logical record number, followed by the numbers for all the different fields.  I had to pick out just the logical record numbers that were associated with zip codes in that first file, and then pull those records out of the Census data to match them to the zip codes from the IP addresses.</p>
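<p>The logical-record-number matching Carl describes can be sketched in Python as below.  This is a minimal illustration under stated assumptions, not the code the data team actually used: it assumes simplified comma-separated files in which each geography row holds a summary level, a logical record number and a zip code, and each data row starts with the logical record number followed by its field values.  The level marker <code>"ZIP"</code>, the function names and the file layouts are hypothetical placeholders; the real Census files use their own layouts and codes.  The data file is streamed row by row, so only the small zip-code lookup table has to sit in memory.</p>

```python
import csv
import io
from typing import Dict, List, TextIO


def zip_logrecnos(geo_file: TextIO, zip_level: str = "ZIP") -> Dict[str, str]:
    """Map logical record number -> zip code, keeping only zip-code rows.

    Assumes a simplified, hypothetical layout: level,logrecno,zip
    """
    logrec_to_zip: Dict[str, str] = {}
    for level, logrecno, zip_code in csv.reader(geo_file):
        if level == zip_level:
            logrec_to_zip[logrecno] = zip_code
    return logrec_to_zip


def census_by_zip(data_file: TextIO, logrec_to_zip: Dict[str, str]) -> Dict[str, List[str]]:
    """Stream the data file row by row and keep rows for zip-code records.

    Assumes each row is: logrecno,value1,value2,...
    """
    by_zip: Dict[str, List[str]] = {}
    for row in csv.reader(data_file):
        zip_code = logrec_to_zip.get(row[0])
        if zip_code is not None:
            by_zip[zip_code] = row[1:]
    return by_zip


# Tiny made-up example: record 0001 is a state row, 0002 and 0003 are zip rows.
geo = io.StringIO("STATE,0001,\nZIP,0002,78701\nZIP,0003,78702\n")
data = io.StringIO("0001,1000,500\n0002,6000,3100\n0003,7000,3600\n")
by_zip = census_by_zip(data, zip_logrecnos(geo))
print(by_zip)  # {'78701': ['6000', '3100'], '78702': ['7000', '3600']}
```
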
<p>M: I would imagine that Census data would involve big files – did this make them difficult to manage?</p>
<p>C: Yeah, the Census data files are really large and so it took a lot of space to load everything into memory.  Then, I made a list of what data we needed from the Census data files and searched through them line by line to match zip codes to demographic information.</p>
<p>M: That sounds like a lot of work.  Did you have to do anything else to process the data?</p>
<p>C: The other thing I did was work out the column headings to make the data more useful.  As the US Census Bureau presents it, each column of data has a heading that is just a code you have to look up somewhere else to figure out what it actually means.  I went through and did a lot of manual editing to make the column headings more readable.  Now if you just look at it, you have a better idea of what’s actually going on and it’s not just meaningless code.</p>
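<p>The heading cleanup amounts to a lookup from each coded column name to a readable label.  Here is a minimal Python sketch, assuming a hand-built dictionary; the codes <code>F0001</code> and <code>F0002</code>, their labels and the function name are hypothetical placeholders, not real Census field codes.  Codes without a label are left unchanged so that no column silently disappears.</p>

```python
from typing import Dict, List

# Hypothetical code-to-label table; in practice it would be built by hand
# from the Census Bureau's documentation for each coded field.
CODE_LABELS: Dict[str, str] = {
    "F0001": "total_population",
    "F0002": "median_household_income",
}


def readable_headers(headers: List[str], labels: Dict[str, str]) -> List[str]:
    """Swap coded column headings for readable names; unknown codes are kept."""
    return [labels.get(code, code) for code in headers]


print(readable_headers(["zip", "F0001", "F0002", "F9999"], CODE_LABELS))
# ['zip', 'total_population', 'median_household_income', 'F9999']
```
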
<p>M: How did you find data with licenses that actually let you mash them?</p>
<p>C: We were looking for datasets whose licenses let you freely download the data, mash it up and mix it with other datasets, and sell it on your own site or do anything commercial with it.  Of course, most of these licenses have attribution requirements, so we made sure to list all our sources in the dataset.  The final dataset that we have available clearly says that this data originally came from the US Census Bureau and from MaxMind.</p>
<p>M: In the end, what licenses did you put on the dataset that you made?</p>
<p>C: The license on it now is a very open one that lets people use the data for whatever they need: the Open Database License.</p>
<p>M: Are there any other difficulties you faced?</p>
<p>C: One thing we wanted to make sure of was that the IP address data we got was reliable and had broad coverage of general IP addresses.  We did a quick test: we took the IP addresses from six months’ worth of page visits in our own website’s logs and ran them all through the IP address database.  It matched over 90% of our IP addresses, which was a pretty good indication that the dataset was fairly complete, with much better coverage than others that we’d heard covered only 50%.</p>
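<p>A coverage test like this boils down to looking up each logged address in a table of IP ranges and counting the hits.  The Python sketch below assumes the geolocation data has been loaded as non-overlapping, inclusive IPv4 ranges of the form (first address, last address, zip code), stored as integers, and uses binary search to find the candidate range.  The class and function names are hypothetical, and counting distinct addresses rather than raw page visits is our own assumption; the interview does not say which was used.</p>

```python
import bisect
import ipaddress
from typing import List, Optional, Tuple

# (first_ip, last_ip, zip_code); addresses are integers, bounds inclusive.
IpRange = Tuple[int, int, str]


def ip_to_int(ip: str) -> int:
    return int(ipaddress.IPv4Address(ip))


class IpZipLookup:
    """Zip-code lookup over sorted, non-overlapping IPv4 ranges."""

    def __init__(self, ranges: List[IpRange]) -> None:
        self.ranges = sorted(ranges)
        self.starts = [first for first, _, _ in self.ranges]

    def lookup(self, ip: str) -> Optional[str]:
        n = ip_to_int(ip)
        # Position of the last range whose first address is <= n.
        i = bisect.bisect_right(self.starts, n) - 1
        if i >= 0 and n <= self.ranges[i][1]:
            return self.ranges[i][2]
        return None


def coverage(table: IpZipLookup, logged_ips: List[str]) -> float:
    """Fraction of distinct logged addresses that fall inside some range."""
    distinct = set(logged_ips)
    if not distinct:
        return 0.0
    hits = sum(1 for ip in distinct if table.lookup(ip) is not None)
    return hits / len(distinct)


# Made-up ranges and log entries, for illustration only.
table = IpZipLookup([
    (ip_to_int("10.0.0.0"), ip_to_int("10.0.0.255"), "78701"),
    (ip_to_int("10.0.2.0"), ip_to_int("10.0.2.255"), "78702"),
])
log_ips = ["10.0.0.7", "10.0.2.9", "10.0.1.1", "10.0.0.7"]
print(table.lookup("10.0.2.9"))  # 78702
print(coverage(table, log_ips))  # 2 of 3 distinct addresses match
```

<p>Keeping the range starts in a sorted list means each lookup is a binary search rather than a scan over every range, which matters when months of log entries are run through the table.</p>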
<p>M: Is the availability of the IP addresses a privacy concern?</p>
<p>C: I don’t think it’s a privacy concern, because the data doesn’t match an IP address to a specific street address, only to a zip code.  Since a zip code contains a very large number of people, it’s hard to tell whether an IP address belongs to one specific person or even one specific household.</p>
<p>M: Ok, thank you very much, Carl.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.infochimps.org/2010/03/04/how-to-create-datasets-that-the-rest-of-the-world-needs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Partner with us</title>
		<link>http://blog.infochimps.org/2010/01/26/partner-with-us/</link>
		<comments>http://blog.infochimps.org/2010/01/26/partner-with-us/#comments</comments>
		<pubDate>Tue, 26 Jan 2010 19:42:55 +0000</pubDate>
		<dc:creator>Joseph Kelly</dc:creator>
				<category><![CDATA[main]]></category>

		<guid isPermaLink="false">http://blog.infochimps.org/?p=537</guid>
		<description><![CDATA[2009 was a great year for us.  We made lots of progress on the website (though there&#8217;s still a long way to go), and we were especially excited about all the great contacts we made with other developers and companies interested in data.  We strongly encourage all of our followers (you!) to get in touch with us [...]]]></description>
			<content:encoded><![CDATA[<p>2009 was a great year for us.  We made lots of progress on the website (though there&#8217;s still a long way to go), and we were especially excited about all the great contacts we made with other developers and companies interested in data.  We strongly encourage all of our followers (you!) to get in touch with us to talk about your expertise and data needs.</p>
<p>Soon we will create a page on the site that lists our network of data mechanics, data tools, and solution providers.  One of the issues with our site is that many of the datasets can&#8217;t be used by everybody &#8211; some are too large for Excel and other everyday tools, and others require specialized skills to use.  Our <a href="http://infochimps.org/datasets/texas-assessment-of-knowedge-and-skills-taks-exams-2003-2007">TAKS</a> and <a href="http://infochimps.org/datasets/twitter-census-::-conversation-metrics-one-year-of-urls-hashtags">Twitter</a> datasets are examples of data that become really powerful for many businesses only after an expert has had the chance to analyze them.</p>
<p>Here are a few of the great companies who have worked with us so far:</p>
<p><a href="http://www.qvapps.com/">QVApps</a>: QVApps is a marketplace for QlikView applications.  <a href="http://www.qlikview.com/">QlikView</a> is business intelligence software that supports third-party applications.  Having trouble understanding your AWS reports?  QVApps has a <a href="http://www.qvapps.com/en/qlikview-applications/web-services/amazon-web-services-analyzer.html">free AWS report analyzer</a> that we&#8217;ve found useful.</p>
<p>UPDATE: Check out QVApps&#8217; <a href="http://www.slideshare.net/qvapps/qvapps-twitter-hashtags-2009-3205119">slideshow on some of the data from our Twitter Census</a>! See below for the embedded slideshow.</p>
<p><a href="http://www.data-applied.com/">Data Applied</a>: Data Applied&#8217;s application is truly magical.  Their software is putting the power of machine learning into the world&#8217;s hands.  Techniques and algorithms that people wrote PhD theses on 20 years ago are now available at the click of a mouse.  <a href="https://data-applied.com/Web/TryNow/Overview.aspx">Try them out with a free account.</a></p>
<p><a href="http://dataminingtools.net/">DataMiningTools.net</a>: A startup based in India, DataMiningTools.net is doing a wonderful job working to educate the masses on data mining tools and resources.  Find <a href="http://dataminingtools.net/browse.php">tutorials</a> on clustering analysis, R, Matlab &#8211; you name it.  Check out videos on <a href="http://dataminingtools.net/videos.php?id=7">Data Applied</a> and your very own <a href="http://dataminingtools.net/videos.php?id=9">Infochimps</a>!</p>
<p>If you are a Data Mechanic, represent another data company, or are just interested in being listed as a solutions provider, please get in touch with me at joe@infochimps.org.  Likewise, if you&#8217;re a Ruby/Rails developer, we&#8217;re hiring!</p>
<div id="__ss_3205119" style="width: 425px; text-align: left;"><a style="font:14px Helvetica,Arial,Sans-serif;display:block;margin:12px 0 3px 0;text-decoration:underline;" title="QVApps Twitter Hashtags 2009" href="http://www.slideshare.net/qvapps/qvapps-twitter-hashtags-2009-3205119">QVApps Twitter Hashtags 2009</a><object style="margin:0px" classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" width="425" height="355" codebase="http://download.macromedia.com/pub/shockwave/cabs/flash/swflash.cab#version=6,0,40,0"><param name="allowFullScreen" value="true" /><param name="allowScriptAccess" value="always" /><param name="src" value="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=qvappstwitterhashtags2009-100217024029-phpapp02&amp;stripped_title=qvapps-twitter-hashtags-2009-3205119" /><param name="allowfullscreen" value="true" /><embed style="margin:0px" type="application/x-shockwave-flash" width="425" height="355" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=qvappstwitterhashtags2009-100217024029-phpapp02&amp;stripped_title=qvapps-twitter-hashtags-2009-3205119" allowscriptaccess="always" allowfullscreen="true"></embed></object></div>
]]></content:encoded>
			<wfw:commentRss>http://blog.infochimps.org/2010/01/26/partner-with-us/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Data.gov import</title>
		<link>http://blog.infochimps.org/2010/01/13/data-gov-import/</link>
		<comments>http://blog.infochimps.org/2010/01/13/data-gov-import/#comments</comments>
		<pubDate>Wed, 13 Jan 2010 18:59:35 +0000</pubDate>
		<dc:creator>Joseph Kelly</dc:creator>
				<category><![CDATA[main]]></category>

		<guid isPermaLink="false">http://blog.infochimps.org/?p=528</guid>
		<description><![CDATA[Infochimps is pleased to announce a recent import of all of the data from Data.gov!  Data.gov was one of the more exciting things to happen last year for the world community, and it has had a big impact in the US and internationally by setting a precedent for government data sharing.  We hope that these datasets&#8217; [...]]]></description>
			<content:encoded><![CDATA[<p>Infochimps is pleased to announce a recent import of all of the data from <a href="http://www.data.gov/">Data.gov</a>!  Data.gov was one of the more exciting things to happen last year for the world community, and it has had a big impact in the US and <a href="http://data.london.gov.uk/">internationally</a> by setting a precedent for government data sharing.  We hope that <a href="http://infochimps.org/collections/data-gov" target="_blank">including these datasets</a> in our collection increases their visibility and makes them useful to the world at large.</p>
<p>The fact that users can edit this data makes it much more usable and interesting.  Unlike on Data.gov, users on Infochimps can upload datasets and even upload new versions of existing datasets.  So when a dataset comes from the government in some messy, incomprehensible format, you can do what Infochimps user Ganglion did and upload a better version.  This kind of Wikipedia-style curation is where Infochimps got its name: data drudge work (column titles, formatting issues, etc.) is fit for a chimp, so it should only have to be done once, and the result should live on at Infochimps.</p>
<p><a href="http://infochimps.org/collections/data-gov" target="_blank">Take a look</a> at the Data.gov collection to get started.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.infochimps.org/2010/01/13/data-gov-import/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Visualizing Chinese media</title>
		<link>http://blog.infochimps.org/2009/12/29/visualizing-chinese-media/</link>
		<comments>http://blog.infochimps.org/2009/12/29/visualizing-chinese-media/#comments</comments>
		<pubDate>Tue, 29 Dec 2009 16:24:35 +0000</pubDate>
		<dc:creator>nickster</dc:creator>
				<category><![CDATA[news]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://blog.infochimps.org/?p=523</guid>
		<description><![CDATA[For data geeks interested in the developing world, few places are more compelling to gather numbers about than China. This owes much to its legendary economic growth, the staggering size of its population and global footprint, and its hybrid political system. But there is another, often overlooked characteristic of the country at work here: its relentless [...]]]></description>
			<content:encoded><![CDATA[<p>For data geeks interested in the developing world, few places are more compelling to gather numbers about than China. This owes much to its legendary economic growth, the staggering size of its population and global footprint, and its hybrid political system. But there is another, often overlooked characteristic of the country at work here: its relentless pursuit of what it calls &#8220;scientific development&#8221;, which emphasizes the use of scientific research as a means to achieve social harmony and balanced economic growth, has led to an explosion in data-fueled, science-based policy. As a result, China is now one of the largest and most sophisticated data-gathering entities in the world.</p>
<p>There&#8217;s a good reason for this. Unlike China&#8217;s early post-revolution cadres, the ranks of China&#8217;s top leadership today are <a id="sezc" title="brimming with scientists and engineers" href="http://www.time.com/time/world/article/0,8599,165453,00.html">brimming with scientists and engineers</a>, including President Hu Jintao, who has a degree in hydraulic engineering. When government &#8220;works&#8221;, these technocrats steer Chinese policy down a painfully cautious course based on five-, ten-, and even twenty-year plans crafted to satisfy <a id="ipz5" title="discrete social, economic, and technological benchmarks" href="http://en.ndrc.gov.cn/hot/t20060529_71334.htm">discrete social, economic, and technological benchmarks</a>. At any given moment, the country is teeming with pilot projects spanning areas like subsidized housing, health care, industrial development, and family planning, which will ultimately be scrutinized by the country&#8217;s <a id="elr0" title="National Reform and Development Commission" href="http://en.ndrc.gov.cn/">National Development and Reform Commission</a> for use at the national level.</p>
<p>This science-based approach is exactly why China has recently come forward with <a id="g5_0" title="ambitious carbon emissions targets" href="http://www.carnegieendowment.org/publications/index.cfm?fa=view&amp;id=24275">ambitious carbon emissions targets</a>&#8211;global warming has a direct, significant impact on its population, and therefore social stability. None of these projects could be completed without good data, and China knows it.</p>
<p>While we&#8217;ve been emphasizing social media data in recent posts, we hope to shine more light on the state of Chinese data and bring more of it into the repository in the near future. To this end, and as a special holiday treat, we&#8217;re releasing a visualization of major Chinese websites we scraped this past October during the country&#8217;s <a id="fd_9" title="meticulously executed" href="http://www.guardian.co.uk/world/video/2009/oct/01/china-national-day-timelapse">meticulously executed</a> celebration of the 60th anniversary of its founding. We find the bright colors and flashing lights to be particularly seasonally appropriate.</p>
<p><a href="http://dev.infochimps.org/static/gallery/media/nationalday/index.html">Click here for the visualization</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.infochimps.org/2009/12/29/visualizing-chinese-media/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Open Data Applications</title>
		<link>http://blog.infochimps.org/2009/12/10/open-data-applications/</link>
		<comments>http://blog.infochimps.org/2009/12/10/open-data-applications/#comments</comments>
		<pubDate>Thu, 10 Dec 2009 17:35:03 +0000</pubDate>
		<dc:creator>maegan</dc:creator>
				<category><![CDATA[main]]></category>

		<guid isPermaLink="false">http://blog.infochimps.org/?p=485</guid>
		<description><![CDATA[With President Obama&#8217;s Open Government Directive and news about Data.gov&#8217;s overhaul, more and more people have been talking about the benefits of open data.  Yes, this includes greater transparency and a more accountable government, but it also gives birth to useful apps that use these newly available datasets.
A lot of these apps have been [...]]]></description>
			<content:encoded><![CDATA[<p>With President Obama&#8217;s Open Government Directive and news about Data.gov&#8217;s overhaul, more and more people have been talking about the benefits of open data.  Yes, this includes greater transparency and a more accountable government, but it also gives birth to useful apps that use these newly available datasets.</p>
<p>A lot of these apps have been made for competitions like Sunlight Labs&#8217; Apps for America and various cities&#8217; own initiatives like NYC BigApps.  Understandably, these competitions provide appealing incentives for programmers: if the recognition isn&#8217;t enough, the cash prizes are.</p>
<p>All that said, these competitions have spawned very useful apps.  Here are 5 that we feel are great examples of the good that can be done with government data:</p>
<p><a href="http://www.thisweknow.org/"><img src="http://blog.infochimps.org/wp-content/uploads/2009/12/This-We-Know_-Explore-U.S.-Government-Data-About-Your-Community-11-150x150.jpg" alt="This We Know_ Explore U.S. Government Data About Your Community-1" /></a><br />
1. <a href="http://www.thisweknow.org/">This We Know</a> (www.thisweknow.org)<br />
This We Know is an excellent tool that provides a wealth of information sourced mainly from Data.gov.  You name a place and it tells you what we know about that location &#8211; things like demographics or the number of factories in the area.  It&#8217;s also presented in a very clear fashion, condensing data into an easily understandable and still useful format.</p>
<p><a href="http://www.outsideindc.com/stumblesafely"><img src="http://blog.infochimps.org/wp-content/uploads/2009/12/stumble-150x89.png" alt="stumble"  /></a><br />
2. <a href="http://www.outsideindc.com/stumblesafely">StumbleSafely</a> (www.outsideindc.com/stumblesafely)<br />
This app from DC literally helps you stumble safely home.  It uses data on crime and geography to map out safe routes from the more (in)famous bars in the city, no matter what time you like to party &#8211; day, evening or night.</p>
<p><a href="http://www.nycway.com/"><img src="http://blog.infochimps.org/wp-content/uploads/2009/12/photo_185-150x150.jpg" alt="photo_185"  /></a><br />
3. <a href="http://www.nycway.com/">NYC Way</a> (www.nycway.com)<br />
An iPhone app, NYC Way provides you with a plethora of useful information for locals and tourists alike right at your fingertips.  Location-aware, it draws from a variety of datasets from the NYC.gov Data Mine and gives you facts about nearby zoos, wi-fi spots, emergency rooms, and a lot of other useful places to help you find your way in the big city.</p>
<p><a href="http://www.everyblock.com/"><img src="http://blog.infochimps.org/wp-content/uploads/2009/12/everyblock_0-150x105.jpg" alt="everyblock_0"  /></a><br />
4. <a href="http://www.everyblock.com/">EveryBlock</a> (www.everyblock.com)<br />
This one&#8217;s not yet available in Austin, but it does have versions for 15 cities across the nation.  EveryBlock provides you with a newsfeed of things going on around a user-specified address or location in these cities.  It also allows you to browse by topic and track trends over time.</p>
<p><a href="http://www.ikidny.com/"><img src="http://blog.infochimps.org/wp-content/uploads/2009/12/ikid-150x150.jpg" alt="ikid"  /></a><br />
5. <a href="http://www.ikidny.com/">iKidNY</a> (www.ikidny.com)<br />
Not all apps are useful just for adults &#8211; this iPhone app, iKidNY, helps you find kid-friendly places all over NYC.  It provides you with locations and information about activities, kid-friendly restaurants, playgrounds, and even changing tables and subway elevators.</p>
<p>If you want to look at more apps, these competitions&#8217; submission galleries are worth a look:<br />
<a href="http://www.sunlightlabs.com/contests/appsforamerica2/apps/">Apps for America 2</a><br />
<a href="http://www.appsfordemocracy.org/application-directory/">Apps for Democracy</a><br />
<a href="http://www.nycbigapps.com/application-gallery">NYC BigApps</a><br />
<a href="http://datasf.org/showcase/">DataSF</a></p>
<p>Did we miss out on your favorite app?  Let us know!  We&#8217;d love to check it out.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.infochimps.org/2009/12/10/open-data-applications/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Twitter data, open questions to Developers, Academics, and Data Geeks</title>
		<link>http://blog.infochimps.org/2009/11/20/twitter-data-open-questions-to-developers-academics-and-data-geeks/</link>
		<comments>http://blog.infochimps.org/2009/11/20/twitter-data-open-questions-to-developers-academics-and-data-geeks/#comments</comments>
		<pubDate>Fri, 20 Nov 2009 21:23:44 +0000</pubDate>
		<dc:creator>Joseph Kelly</dc:creator>
				<category><![CDATA[main]]></category>

		<guid isPermaLink="false">http://blog.infochimps.org/?p=463</guid>
		<description><![CDATA[We are excited to announce the re-release of the Twitter datasets, and a discount on the Twitter API Map dataset.  Again, the datasets are:

Twitter API Map discounted to $100 for this weekend only

and
Conversation Metrics, with Token Count of:

Smileys for free
Hashtags, URLs, and Smileys by month for $1,000
Hashtags, URLs, and Smileys by hour for $8,000

This time [...]]]></description>
			<content:encoded><![CDATA[<p>We are excited to announce the re-release of the Twitter datasets, and a discount on the Twitter API Map dataset.  Again, the datasets are:</p>
<ul>
<li><a href="http://infochimps.org/datasets/twitter-census-::-developer-tools-mapping-from-twitter-user-sear">Twitter API Map</a> discounted to $100 for this weekend only</li>
</ul>
<p>and</p>
<p><a href="http://infochimps.org/datasets/twitter-census-::-conversation-metrics-one-year-of-urls-hashtags">Conversation Metrics</a>, with Token Count of:</p>
<ul>
<li><a href="http://infochimps.org/datasets/twitter-census-::-conversation-metrics-one-year-of-urls-hashtags/payloads/15374">Smileys</a> for free</li>
<li><a href="http://infochimps.org/datasets/twitter-census-::-conversation-metrics-one-year-of-urls-hashtags/payloads/15371">Hashtags, URLs, and Smileys</a> by month for $1,000</li>
<li><a href="http://infochimps.org/datasets/twitter-census-::-conversation-metrics-one-year-of-urls-hashtags/payloads/15371">Hashtags, URLs, and Smileys</a> by hour for $8,000</li>
</ul>
<p>This time, the data is being released with Twitter&#8217;s approval.  We are talking with them about how to expand access to bulk data, and we need your help showing them how useful this data really is.</p>
<p>To those of you with privacy concerns: we absolutely hear and respect your points, and so does Twitter.  These datasets contain NO personally identifiable information, they do NOT contain whole tweets, and they meet the guidelines on personally identifiable information laid out in <a href="http://www.eff.org/deeplinks/2009/09/what-information-personally-identifiable">this EFF document</a>.</p>
<p>We encourage everybody to take advantage of this weekend&#8217;s discount and go build great things with this data.  Let&#8217;s show Twitter and the world what is possible when one has access to bulk data:</p>
<ul>
<li>Data geeks and visualization studs: what would you do if you could run jobs across our massive crawl (or the full Twitter graph)?</li>
<li>App devs: what data do you want those nerds to extract?  How would it improve the experience of Twitter or enable new things?</li>
<li>Businesses: how can this data improve your services?  How can this data make you money?</li>
<li>Academic researchers: what amazing things will you uncover by exploring the social network&#8217;s deep structure?</li>
</ul>
<p>Reach out to us in the comments or send us ideas at info@infochimps.org.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.infochimps.org/2009/11/20/twitter-data-open-questions-to-developers-academics-and-data-geeks/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The data landscape (Part 2), and Microsoft</title>
		<link>http://blog.infochimps.org/2009/11/19/the-data-landscape-part-2-and-microsoft/</link>
		<comments>http://blog.infochimps.org/2009/11/19/the-data-landscape-part-2-and-microsoft/#comments</comments>
		<pubDate>Thu, 19 Nov 2009 07:57:21 +0000</pubDate>
		<dc:creator>Joseph Kelly</dc:creator>
				<category><![CDATA[main]]></category>

		<guid isPermaLink="false">http://blog.infochimps.org/?p=457</guid>
		<description><![CDATA[The data platform industry has a new entrant this week!  Yesterday Microsoft announced a data store of their own at their developer conference.  Called Dallas, their offering is another example of a data marketplace.  The market for selling data online in an open way is still young (how many platforms besides ours and Microsoft&#8217;s do you [...]]]></description>
			<content:encoded><![CDATA[<p>The data platform industry has a new entrant this week!  Yesterday Microsoft <a href="http://gigaom.com/2009/11/17/microsofts-future-lies-in-software-and-data/">announced a data store of its own</a> at its developer conference.  Called Dallas, Microsoft&#8217;s offering is another example of a data marketplace.  The market for selling data online in an open way is still young (how many platforms besides ours and Microsoft&#8217;s do you know?), so it is validating to see another entrant in this space.  We know that Microsoft will encourage the developer community to explore what these new platforms make possible.</p>
<p>Like many other services, Dallas meters out data through an API, which is helpful to programmers with limited resources.  With Infochimps, however, developers get full datasets in bulk, which is better for many applications and essential for any kind of analytic work.</p>
<p>Both our marketplaces have the same value proposition: open up your data and profit.  When trying to convince an organization to open up its data, APIs can be an easier sell.  Even though APIs are costly to build and run, organizations may prefer the control they give over what people can access, compared with our simple and cheap bulk solution.</p>
<p>It is still unclear what size and format restrictions Dallas imposes.  If it is like other services out there (Socrata, Factual), it needs data in a structured, rectangular format.  These constraints enable such services to display their data live online.  While Infochimps doesn&#8217;t have that feature (yet!), we can handle datasets at the terabyte scale as well as those that don&#8217;t fit the spreadsheet paradigm, such as social network graphs.</p>
<p>Dallas is also part of a platform that forces users to integrate with other Microsoft services.  Infochimps&#8217; mission is simply to connect people with the data they&#8217;re looking for, and we let anyone download data without having to register for an account.</p>
<p>We are proud to be part of a strong community that&#8217;s grown over the past year, and to continue our commitment to an open data commons.  On the commercial side, after months of talking with this new market about what is possible, we are narrowing our focus to the right verticals.  That ultimately is what this is about &#8211; enabling something that couldn&#8217;t be done before, and connecting buyers to sellers and people to knowledge.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.infochimps.org/2009/11/19/the-data-landscape-part-2-and-microsoft/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Twitter data update</title>
		<link>http://blog.infochimps.org/2009/11/16/twitter-data-update/</link>
		<comments>http://blog.infochimps.org/2009/11/16/twitter-data-update/#comments</comments>
		<pubDate>Mon, 16 Nov 2009 06:50:26 +0000</pubDate>
		<dc:creator>Joseph Kelly</dc:creator>
				<category><![CDATA[main]]></category>

		<guid isPermaLink="false">http://blog.infochimps.org/?p=419</guid>
		<description><![CDATA[Our launch of the Twitter data was a great success, and we thank Marshall Kirkpatrick at ReadWriteWeb (also) and Jordan Golson at GigaOm for their coverage. The community reaction has been overwhelming and energizing. We accomplished our two main goals: crack open some issues close to our hearts and kick-start the conversation about sharing data [...]]]></description>
			<content:encoded><![CDATA[<p>Our launch of the Twitter data was a great success, and we thank <a href="http://www.readwriteweb.com/archives/twitter_data_dump_infochimp_puts_1b_connections_up.php">Marshall Kirkpatrick at ReadWriteWeb</a> (<a href="http://www.readwriteweb.com/archives/the_value_of_twitter_data_the_future_of_tweetdeck.php">also</a>) and <a href="http://gigaom.com/2009/11/11/is-infochimps-aggregated-data-a-boon-to-researchers-or-a-privacy-nightmare/">Jordan Golson at GigaOm</a> for their coverage. The community reaction has been overwhelming and energizing. We accomplished our two main goals: cracking open some issues close to our hearts and kick-starting the conversation about sharing data online.</p>
<p>Twitter has raised some reasonable concerns, however, and has asked us to take the datasets down. We have temporarily disabled downloads while we discuss licensing terms. We hope the outcome of these discussions will encourage more internet services to open up and share data in bulk. The two biggest issues this data release highlighted are third-party redistribution and user privacy.</p>
<p><strong>Redistribution rights</strong>. Twitter maintains a legendarily open API:</p>
<blockquote><p>&ldquo;Except as permitted through the Services (or these Terms), you have to use the Twitter API if you want to reproduce, modify, <strong>create derivative works, distribute, sell, transfer</strong>, publicly display, publicly perform, transmit, or otherwise use the Content or Services.<br />
&ldquo;We <strong>encourage and permit broad re-use of Content</strong>. The Twitter API exists to enable this.&rdquo; <small><em>[highlighting added by us]</em></small></p>
</blockquote>
<p>However, Twitter wants to control more closely who has access to data at massive scale and to prevent its malicious use. We understand this concern: innovation is always a double-edged sword. Still, the applications and services that can use this data to make the world a better place far outnumber those with bad intentions, and good people need better access to this type of data. The best solution is to apply a reasonable license to the data. We are addressing this in our talks with Twitter, and we expect to have a resolution soon.</p>
<p><strong>User privacy</strong>. What little criticism we heard from the community concerned the potential for a breach of user privacy. This is an issue with many types of internet data, and one we take seriously. We ensured that the released datasets posed no such dangers. The Token Count data contained no personally identifying information, only what Twitter users as a whole were discussing over time. The API ID Mapping Dataset is simply a sort of phone book for the Twitter APIs: it converts screen names to numeric IDs and reveals absolutely nothing about the corresponding user. <strong>Infochimps.org&#8217;s policy is to not host any personally identifying information of non-consenting individuals</strong>; we apply this rule to any data that goes on the site from any source.</p>
<p>These are hard issues, and it took a bold move to bring them into the open. It will take further sharing and discussion to establish best practices for these concerns so that Twitter and other internet services (Facebook, Amazon, etc.) can share their data to the benefit of the greater online community. Stay tuned while we agree upon appropriate licensing for open sharing of this social data.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.infochimps.org/2009/11/16/twitter-data-update/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Twitter Census: Publishing the First of Many Datasets</title>
		<link>http://blog.infochimps.org/2009/11/11/twitter-census-publishing-the-first-of-many-datasets/</link>
		<comments>http://blog.infochimps.org/2009/11/11/twitter-census-publishing-the-first-of-many-datasets/#comments</comments>
		<pubDate>Wed, 11 Nov 2009 19:32:52 +0000</pubDate>
		<dc:creator>Joseph Kelly</dc:creator>
				<category><![CDATA[main]]></category>

		<guid isPermaLink="false">http://blog.infochimps.org/?p=364</guid>
		<description><![CDATA[As useful as the Twitter API is, developers, designers, and researchers have long clamored for more than the trickle of data that service currently allows. We agree &#8212; some of the sexiest uses of data require processing not just all that is now, but the vast historical record. Twitter doesn&#8217;t provide the only use case [...]]]></description>
			<content:encoded><![CDATA[<p>As useful as the Twitter API is, developers, designers, and researchers have long clamored for more than the trickle of data that service currently allows. We agree &#8212; some of the sexiest uses of data require processing not just all that is now, but the vast historical record. <a href="http://blogs.wsj.com/numbersguy/if-statisticians-could-turn-back-time-840/" alt="WSJ Number's Guy Blog">Twitter doesn&#8217;t provide the only use case for this</a>, but until now its historical bulk data has been hard to find.</p>
<p>Today we are publishing a few items collected from our large scrape of Twitter&#8217;s API. The data was collected, cleaned, and packaged over twelve months and contains almost the entire history of Twitter: 35 million users, one billion relationships, and half a billion Tweets, reaching back to March 2006. The initial datasets are a part of our <a href="http://infochimps.org/collections/twitter-census">Twitter Census</a> collection.</p>
<p>The first dataset, a <a href="http://bit.ly/twtrcensus1" alt="Token Count dataset">Token Count</a>, counts the number of tokens (hashtags, smileys, and URLs) that have been tweeted. The data is available for free by month and for pay by hour. Think about comparing this data to the stock market, new movies, new video games, or even trendingtopics.org. For example, use it to track the adoption of Google Wave through the rate of its mentions. <a href="http://infochimps.org/datasets/twitter-token-counts/payloads/15372" alt="Token Count free payload">On one payload&#8217;s page</a> you will find a snippet with a sample taken during Kanye West&#8217;s outburst in September, and <a href="http://infochimps.org/datasets/twitter-token-counts/payloads/15374" alt="Smiley Dataset">on another&#8217;s</a> you can see that the &#8220;:)&#8221; emoticon has been used 135,000 times.</p>
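<p>A minimal sketch of how per-month token counts like these could be computed, in Python. The regular expressions, the short smiley list, and the (timestamp, text) input shape are assumptions for illustration; this is not the pipeline that actually produced the Token Count dataset.</p>

```python
import re
from collections import Counter

# Hypothetical token rules; the real ones behind the dataset are not published here.
HASHTAG = re.compile(r"#\w+")
URL = re.compile(r"https?://\S+")
SMILEYS = (":)", ":(", ":D", ";)")  # illustrative subset only


def extract_tokens(text):
    """Return the hashtags (lowercased), URLs, and smileys found in one tweet."""
    tokens = [tag.lower() for tag in HASHTAG.findall(text)]
    tokens.extend(URL.findall(text))
    for smiley in SMILEYS:
        # Count every occurrence of each smiley, not just the first.
        tokens.extend([smiley] * text.count(smiley))
    return tokens


def count_by_month(tweets):
    """Aggregate token counts into (YYYY-MM, token) buckets.

    tweets: iterable of (iso_timestamp, text) pairs, an assumed input shape.
    Only aggregate totals are returned; no user fields are read or kept.
    """
    counts = Counter()
    for timestamp, text in tweets:
        month = timestamp[:7]  # e.g. "2009-09-13T23:00:00" becomes "2009-09"
        for token in extract_tokens(text):
            counts[(month, token)] += 1
    return counts
```

<p>Because only (month, token) totals leave the function, the output carries no user IDs or screen names; swapping the month slice for timestamp[:13] would give hourly buckets instead.</p>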
<p>The second dataset solves a major problem developers face when they use both Twitter&#8217;s Search API and the Twitter API: each API returns a different unique ID for every user on Twitter. This dataset maps user IDs between the two APIs for 24.5 million users. This mapping should be a godsend to Twitter app developers, as it lets them combine data from both APIs, so that friends lists fetched through the Twitter API can be matched against results from the Twitter Search API.</p>
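<p>A minimal sketch of how a mapping like this could be used to join the two APIs, in Python. The row layout (screen name, Twitter API ID, Search API ID), the sample IDs, and the from_user_id key on search results are assumptions for illustration, not the dataset&#8217;s documented format.</p>

```python
# Hypothetical rows from the ID mapping dataset, laid out as
# (screen_name, twitter_api_id, search_api_id); the real file format may differ.
MAPPING_ROWS = [
    ("alice", 1001, 555001),
    ("bob", 1002, 555002),
    ("carol", 1003, 555003),
]


def build_lookup(rows):
    """Index the mapping by Search API user ID, returning Twitter API IDs."""
    return {search_id: twitter_id for _name, twitter_id, search_id in rows}


def search_hits_among_friends(search_results, friend_ids, lookup):
    """Keep search results whose author appears in a Twitter API friends list.

    search_results: dicts carrying a Search API author ID under the
        assumed key "from_user_id".
    friend_ids: set of Twitter API user IDs, e.g. from a friends list call.
    Authors missing from the mapping are skipped.
    """
    hits = []
    for result in search_results:
        twitter_id = lookup.get(result["from_user_id"])
        if twitter_id is not None and twitter_id in friend_ids:
            hits.append(result)
    return hits
```

<p>Building the dictionary once makes each lookup average constant time, so the join stays cheap per search result even with millions of mapped users.</p>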
<p>These datasets are only views from the massive collection we have been growing over the last year. We will be releasing additional datasets regularly over the next few weeks so please check back for updates. If you&#8217;d like a custom slice or analysis done on this data, please get in touch at <a href="mailto:imw@infochimps.org">imw@infochimps.org</a>.</p>
<p>With the release of this data, we hope to send a signal that this data is valuable and useful to real-time search engines, Twitter apps, and social media researchers, and that Infochimps.org is the place to go to find data. This should start a conversation about where the value really lies in this type of data and about the ownership and privacy issues that arise. We invite interested parties to get in touch and <a href="https://infochimps.org/signup" alt="Signup">begin uploading their data</a> (try invite code &#8220;newsupplier&#8221;) today as part of the Infochimps marketplace.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.infochimps.org/2009/11/11/twitter-census-publishing-the-first-of-many-datasets/feed/</wfw:commentRss>
		<slash:comments>35</slash:comments>
		</item>
	</channel>
</rss>
