
Real geeks don’t use IE – Infochimps Browser Usage Analytics 11 hours, 55 minutes ago. by Jesse Crouch

Browser usage by the somewhat normal web

When scoping out a web project, one of the first things a designer or web programmer will want to know is “what browsers are we supporting?”. The decision is usually guided by a quick Google search that turns up a page like W3Schools’ browser statistics, which quickly tells you:

2010    IE8     IE7    IE6    Firefox   Chrome   Safari   Opera
July    15.6%   7.6%   7.2%   46.4%     16.7%    3.4%     2.3%

About 30.4% of the browser world belongs to IE (much better than the way things were just a few years ago). Almost 15% of your users are using such an old version of IE that you may be tempted to code using IE6 or 7 as your lowest common denominator.

Browser usage by Infochimps users

Consider who is visiting your site though. Are your users more net savvy? Are they geeks? Here’s what our visitors use:

About 10% of infochimps.org users use IE, almost a third of the norm.
Half of our IE users use IE8 (a much more capable version of IE), leaving a meager 5% in the IE6/7 realm, which is split half and half (2.5% total IE6 users – again, almost a third of the norm).
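
The comparison is easy to check with a few lines of arithmetic. This is a minimal sketch in Python, assuming only the July 2010 figures from the table above and the rounded infochimps.org percentages quoted in this post; the variable names are ours.

```python
# General browser share (percent) for July 2010, copied from the table above.
general = {
    "IE8": 15.6, "IE7": 7.6, "IE6": 7.2,
    "Firefox": 46.4, "Chrome": 16.7, "Safari": 3.4, "Opera": 2.3,
}

# Rounded figures for infochimps.org visitors, as quoted in this post.
infochimps_ie_total = 10.0
infochimps_ie6 = 2.5

# Total IE share and the share held by the older versions (IE6 + IE7).
ie_total = round(sum(share for name, share in general.items() if name.startswith("IE")), 1)
legacy_ie = round(general["IE6"] + general["IE7"], 1)

# How our visitors compare to the general numbers, as a fraction of the norm.
ie_ratio = infochimps_ie_total / ie_total
ie6_ratio = infochimps_ie6 / general["IE6"]

print(f"IE overall: {ie_total}%, IE6+IE7: {legacy_ie}%")
print(f"Infochimps IE share is {ie_ratio:.2f} of the norm")
print(f"Infochimps IE6 share is {ie6_ratio:.2f} of the norm")
```

Both ratios come out close to one third, which is where the "almost a third" figures come from.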

Conclusion: Real nerds don’t use Internet Explorer

As far as design philosophy goes, we strive to design our sites (infochimps.org, api.infochimps.com) in a progressive enhancement fashion so that all browsers can be supported well (enough) and accessibility is simple and works. IE6 isn’t number one on our list of things to deal with.

When you have limited resources (like a startup), consider who is actually using your site before spending those resources on any one group of browsers.

Infochimps notes from Lone Star Ruby Conference 2010 2 days, 13 hours ago. by Jesse Crouch

Notes and repos from @jessecrouch at LSRC 2010:

http://github.com/infochimps/stronglinks-example
http://github.com/ioda006/lsrc-jekyll

Find the notes in /presentation for stronglinks-example. All slides are done with S5/Operashow and can be viewed with Firefox and Opera by opening the show.html file and pressing F11. Use pageup/pagedown to navigate.

SXSW PANELS 2011 10 days ago. by Sarah Nordquist

Infochimps is ecstatic to promote the following proposed 2011 SXSW Interactive panels. We’ve categorized them loosely according to topic. To the data geeks of the world: go forth and vote!

Seeing Data

Beautiful Data: Interactive Visualization of Social Media

What are the different ways in which data can be displayed, and what tools are used to create them? What are the benefits and practical uses of presenting data visually? Finally, what are the most exciting and innovative specimens of data visualization built around social data?

Social Media Data Visualization: Mapping the World’s Conversations

All about Infographics. How are Infographics constructed and what information can they convey beyond that of raw data?

Exploring Data

Data Overload: Probabilistic Computing For Breakthrough Data Analytics

What is probabilistic computing and how does it differ from more common types of programming? How does probabilistic computing fit into other data analysis tools?

Making Sense of Social Media Data

Explains the ins-and-outs of social media monitoring tools, the techniques that work and realistic expectations of what they can deliver.

Managing Data

Big Data for Everyone (No Data Scientists Required)

What makes Big Data so darn big? Topics of discussion include (the necessity of) non-traditional solutions for handling Big Data, how those solutions fit into existing architecture, and common pitfalls encountered.

A Showdown at the Database Corral

Oh yes, there are new sheriffs in town. They answer to Cassandra, Drivel, and Drupal. Panelists will talk about use cases for each, their relation to traditional, distributed, and non-relational databases, and other topics of interest for folks with their heads in the clouds.

Data Nerds, Is Big Data Crushing the Web?

How does a business discern differences between Hadoop, bulk raw data and web crawlers as big data solutions? How does the average non-programmer tap into big data’s value? What sorts of tools are available to access big data, and what are their differences? What problems with our current business systems can be fixed to handle big data more manageably? Is it feasible to make big data repositories open source?

Humanizing Data

What the F*** is the Semantic Web

Good question! In this panel geared for everyone with a soupçon of curiosity and a brain, the Semantic Web is defined and discussed. How do web developers become part of it, and what are the business opportunities?

Open Data & What It Means For You

Does the mere thought of open data cause you to quiver in excitement? You’re not alone. More on the open data movement than you could shake a stick at!

Paying with Data: how free services aren’t free

How is your Facebook information being used and how could it be used in the future? How concerned should we all be? Panelists will also discuss current policy on privacy issues online.

Refreshed Datasets! 1 month, 0 days ago. by Sarah Nordquist

By popular demand, we have refreshed our massive corpus of Twitter data. As part of the facelift, some of our API fields have been eliminated, and many more have been added. Trstrank, for instance, will include a new field called Trstquotient, or TQ, which can be used as a spam indicator. (For details on how that works, stay tuned for a forthcoming blog post.) The fields we chose to eliminate from Trstrank–followers, following, and statuses–can be readily accessed via Twitter’s API.

Our new datasets will provide the most accurate and up-to-date reflection of a Twitter user’s measure of influence (Trstrank), activity level (Influencer Metrics), and interactions between two given users (Conversations). The datasets that changed the most, Influencer Metrics and Conversations, have lots of new fields.  Influencer Metrics is now a more rigorous way to measure retweets and @ replies, both incoming and outgoing, and Conversations gives a full summary of the interactions between two users.

We’re versioning the new API calls, to prevent the unpleasantness that could accompany a rapid switcheroo, but our old calls will be phased out quickly. We welcome your feedback on this exciting update!

Cool things to be built with the Infochimps API 1 month, 25 days ago. by Jesse Crouch

We started a page of ideas of cool things you can build using the Query API. There are a ton of valuable things that can be done using the current API calls and we’d love to see them made. Here are some of them:

  • Filter influencers or non-influencers from any feed of tweets (Influence and/or Trstrank)
  • Filter Twitter spam (Trstrank and/or influence)
  • Build a word cloud for a Twitter user in any app (Wordbag)
  • Target content/ads based on words a user tweets about the most (Wordbag)
  • Find the true influence of a Twitter user by combining their Trstrank, ratio of friends/followers, ratio of statuses to retweets in, etc (Trstrank and Influence)
  • Find social circles on Twitter, not by followers, but by who is actually talking to each other (Conversation)
  • Target content/ads based on IP address (IP→Census)
  • A/B test your website/web app based on demographic data (IP→Census)
  • Build a site that lists a person’s Twitter followers with columns for trstrank, influence metrics (display them as ratios) and wordbag. (Trstrank, Influence, Wordbag)
  • Integrate reputation metrics into your Twitter client to help users decide who and who not to follow and also filter their tweet streams. (Trstrank, Influence, Wordbag)
  • Demographic web analytics. Build an app/plugin/etc to analyze web server logs (or log it and analyze remotely with JavaScript) that gives demographic information about a website’s users (IP→Census)

If you’ve got your own idea, feel free to post it here or just send it to us!
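
As a starting point for the first two ideas in the list, here is a minimal sketch in Python of filtering a feed by Trstrank. It assumes you have already looked up a Trstrank score for each author and stored it in a plain dictionary. The function name, the sample data and the threshold are all made up for illustration, and no API call is shown.

```python
def filter_by_trstrank(tweets, trstrank_by_user, minimum=1.0):
    """Keep tweets whose author has a Trstrank of at least `minimum`.

    Authors with no known score are treated as 0.0, so they are dropped
    for any positive threshold.
    """
    return [
        tweet for tweet in tweets
        if trstrank_by_user.get(tweet["screen_name"], 0.0) >= minimum
    ]


# Made-up sample data, for illustration only.
feed = [
    {"screen_name": "infochimps", "text": "New datasets are up"},
    {"screen_name": "spam_bot_123", "text": "Win a free phone!!!"},
    {"screen_name": "someone_new", "text": "Hello world"},
]
scores = {"infochimps": 4.2, "spam_bot_123": 0.1}

kept = filter_by_trstrank(feed, scores, minimum=1.0)
print([tweet["screen_name"] for tweet in kept])
```

Flipping the comparison would give you the non-influencers instead, and the right threshold will depend on your own feed.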

Infochimps API in Action 2 months, 6 days ago. by Sarah Nordquist

Back in May, when our API was still in its infancy, Sean McDonald, founder of Jute Networks, requested access to the Trstrank data to explore its potential application to network relationship management. He created a thorough report and raised some pointed questions that some of our other datasets can now answer. We thought it prudent to showcase his work, not only because it’s just plain nifty, but also because it illustrates the exciting synergy of our calls and their particularly appetizing value to market researchers.

If you’re attempting to promote something on Twitter, it’s likely that you would want to focus on promoting it amongst the Twitter luminaries. Enter Trstrank, our exciting little measure of Twitter luminescence. Getting your product promoted by someone with a high Trstrank could potentially be marketing gold. The likelihood, however, of someone with a very high Trstrank nurturing your product’s visibility with a steady stream of cooing retweets is slim to, well, none. So how to know where to focus your evangelizing efforts?

Sean wondered the same thing when he set about promoting his report. He created the following visualization of an arbitrarily selected sample of his Twitter friends, positioning himself in the center, companies in the inner circle, and contacts associated with those companies in the outer circle. Any contact or company with a Trstrank greater than five is designated by a blue dot; those with a Trstrank between two and five are designated by an orange dot. This gives a useful snapshot of who occupies a “strategic position” in his Twitter universe.

Sean hypothesized that the users least likely to engage with and retweet his report were both the highest-ranked and the lowest-ranked. Eliminating those two tails would yield a swath of active users to target: the orange dots. Ten of Sean’s thirty sample contacts were orange dots. Of those ten users, Sean eliminated seven based on his personal knowledge of them (i.e., he didn’t know them very well or knew they didn’t care about data and data visualization). This left him with three contacts to enlist in his promotional efforts. Sean’s strategy is very savvy, but requires some amount of personal familiarity with contacts, a luxury not every promoter has.
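
The banding Sean used is simple to reproduce. This is a minimal sketch in Python, assuming you already have a Trstrank for each contact. The contact names and scores are invented, and treating "between two and five" as inclusive at both ends is our assumption.

```python
def dot_color(trstrank):
    """Map a Trstrank to the dot colors used in the visualization.

    Assumption: "between two and five" is inclusive at both ends.
    Anything below two gets no dot at all.
    """
    if trstrank > 5:
        return "blue"
    if 2 <= trstrank <= 5:
        return "orange"
    return None


# Invented contacts and scores, for illustration only.
contacts = {
    "contact_a": 7.3,
    "contact_b": 3.1,
    "contact_c": 0.8,
    "contact_d": 5.0,
    "contact_e": 2.0,
}

# The middle band (orange dots) is the group worth targeting.
targets = sorted(name for name, rank in contacts.items() if dot_color(rank) == "orange")
print(targets)
```

The step that cannot be automated this way is the last one, where Sean pruned the orange dots using what he personally knew about each contact.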

Fortunately, two of our newer API calls can simulate Sean’s marketing method. Influencer Metrics will show you how likely a user is to retweet a post based on their tweeting history. Coupling Influencer Metrics with Trstrank would enable a promoter to identify not only the users most likely to engage, but also the most influential of those users. Throw Wordbag into the mix and a promoter could also discover if users in the active, influential target population have a potential interest in their product.
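
To make that combination concrete, here is a rough sketch in Python. It assumes the results of the three calls have already been merged into one record per user. The field names (`retweets_out`, `statuses`, `trstrank`, `wordbag`), the sample users and the ranking rule are hypothetical and chosen for illustration; they are not the actual response format.

```python
def shortlist(users, keywords, min_trstrank=2.0):
    """Rank users who tweet about `keywords`, most retweet-prone first.

    Each user is a dict with hypothetical fields: screen_name, trstrank,
    retweets_out, statuses and wordbag (a list of words).
    """
    wanted = {word.lower() for word in keywords}
    candidates = []
    for user in users:
        shared = wanted & {word.lower() for word in user["wordbag"]}
        if not shared or user["trstrank"] < min_trstrank:
            continue
        # Share of a user's statuses that are retweets of other people.
        rate = user["retweets_out"] / user["statuses"] if user["statuses"] else 0.0
        candidates.append((rate, user["trstrank"], user["screen_name"]))
    # Sort by retweet rate first, then by Trstrank, highest first.
    candidates.sort(reverse=True)
    return [name for _, _, name in candidates]


# Invented sample records, for illustration only.
users = [
    {"screen_name": "viz_fan", "trstrank": 3.5, "retweets_out": 40,
     "statuses": 200, "wordbag": ["data", "visualization", "maps"]},
    {"screen_name": "quiet_expert", "trstrank": 4.8, "retweets_out": 5,
     "statuses": 500, "wordbag": ["data", "hadoop"]},
    {"screen_name": "foodie", "trstrank": 3.0, "retweets_out": 90,
     "statuses": 300, "wordbag": ["tacos", "bbq"]},
    {"screen_name": "newbie", "trstrank": 0.4, "retweets_out": 30,
     "statuses": 60, "wordbag": ["data"]},
]

picks = shortlist(users, ["data", "visualization"])
print(picks)
```

How to weigh retweet rate against Trstrank is a judgment call; sorting on one and breaking ties with the other is just the simplest option.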

We would love reader feedback about our current API calls. How do you envision them working together? What other kinds of calls would be of benefit to you? Let us know your ideas.

Access the Infochimps Query API via commandline 2 months, 15 days ago. by Jesse Crouch

A tutorial on how to use chimps to access the Infochimps Query API via commandline.

  1. Sign up for the API
  2. When you get your API key, create your chimps dotfile: nano ~/.chimps
  3. Put this in your dotfile:
    :query:
          :username: your_api_name
          :key:      your_api_key
    
    
  4. Install chimps: sudo gem install chimps. (Make sure you have gemcutter as a source; otherwise it won’t find the gem: gem sources -a http://gemcutter.org)
  5. Run a query! % chimps query soc/net/tw/influence screen_name=infochimps

It should return with something like this:

{"replies_out":13,"account_age":602,"statuses":166,"id":15748351,"replies_in":22,"screen_name":"infochimps"}

That’s it!
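
If you want to use that output in a script rather than read it in the terminal, the response is plain JSON. This is a minimal sketch in Python that parses the sample response shown above. The ratio computed at the end is only an illustration, not a metric the API defines.

```python
import json

# The sample response from the query above, pasted in as a string.
raw = (
    '{"replies_out":13,"account_age":602,"statuses":166,'
    '"id":15748351,"replies_in":22,"screen_name":"infochimps"}'
)

metrics = json.loads(raw)

# Illustrative only: outgoing replies as a share of all statuses.
replies_per_status = metrics["replies_out"] / metrics["statuses"]

print(metrics["screen_name"], "received", metrics["replies_in"], "replies")
print(f"replies sent per status: {replies_per_status:.3f}")
```

In practice you would capture the output of the chimps command instead of pasting it in.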

Introducing the Infochimps Query API 2 months, 20 days ago. by Sarah Nordquist

Infochimps is pleased to announce the release of our Query API in public beta today. As part of our ongoing effort to democratize access to structured data, the Infochimps Query API offers several calls that allow you to analyze a prodigious amount of Twitter data dating back to 2006. Our current operational calls include the following:

Trstrank

Trstrank uses an algorithm similar to Google’s PageRank to generate a numerical rank that indicates the amount of influence a particular user has. This is a much more robust way to determine a Twitter user’s influence than by their number of followers alone.
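
To give a feel for how a PageRank-style score differs from a raw follower count, here is a small sketch in Python of textbook PageRank run on a toy follow graph. It is not the Trstrank algorithm itself, whose details are not covered here. The graph, the damping factor of 0.85 and the iteration count are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each user to the list of users they follow. A follow
    acts like a vote, and votes from highly ranked users count for more.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this user's rank evenly among the users they follow.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A user who follows nobody spreads their rank over everyone.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Toy follow graph, invented for illustration.
follows = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol", "alice"],
}

ranks = pagerank(follows)
for user, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{user}: {score:.3f}")
```

In this toy graph alice has only two followers, but one of them is the top-ranked carol, so alice ends up far ahead of bob and dave. That effect is the reason a rank of this kind says more than a follower count.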

Wordbag

Wordbag enables you to discover what a specific Twitter user finds interesting. After entering the handle of a specific Twitter user, Wordbag generates a list of words unique to that Twitter user.

Influencer Metrics

Influencer Metrics measures the number of retweets, mentions, and @replies that a specific Twitter user has. Retweets and mentions can indicate the value the Twitter community gives to the tweets of a specific user. Coupling Trstrank with Influencer Metrics provides a particularly powerful way to gauge the influence of a Twitter user.

The potential applications of our API are limited only by the imagination. We hope market researchers, brazen self-promoters, statisticians, sociologists, cultural anthropologists, linguists, and all the curious Georges out there will find it as compelling as we do.

Looking to the future, our development team will be constantly polishing and updating the API. Follow @infochimps on Twitter for announcements. We received many requests during our private beta for more frequent data refreshes and fuller coverage. Our next update will deliver just that. We have additional API calls percolating, including one that will allow you to discover close-knit interactions between Twitter users and see the level of interaction between them.

For features and pricing, including our totally free package, the Baboon, click here.

Visualizing a Socially Connected World 3 months, 3 days ago. by Sarah Nordquist

Fresh visualizations for the world of data viz! Using data from our Twitter Census, two data viz enthusiasts have created some sleek visualizations in the past month.

If you want to see Twitter users simply shimmer against a backdrop of a dusky world map, look no further than Ernesto Badillo’s visualization. His map was inspired by the NASA Earth at Night picture. All Twitter users with latitude and longitude coordinates are plotted. With automatic updates from some mobile clients like ÜberTwitter, etchings of some roads, particularly in the U.S., can even be seen. Take a look at Ernesto’s blog post on that page for some details on how he made the visualization.

Twitter users worldwide VIZ

William Johnson of Chatanooga Data submitted a handful of visualizations, again of Twitter users by location. His visualization provides a way to immediately apprehend the ubiquity of Twitter users with playfully colored dots. These visualizations were made with Tableau.

If you, dear reader, are seeking refuge from the oppressive summer sun in the confines of your cool, well-wired basement, consider making your own viz with data from our massive Twitter scrape and submitting it to us. We’ll think you’re awesome.

5 Interesting Data Articles 3 months, 17 days ago. by maegan

Inspired by Pete Warden’s Five Short Links, we decided to put up a post about the most interesting data articles we’ve come across in recent months.

Data
Data, Data Everywhere: The Economist ran a pretty comprehensive and accessible special report on data with a series of articles covering the different implications – both good and bad – of the growing amount of data in existence. Make sure you click on the links under “In this special report” to read the rest of the articles.

Personal data collection
The Data-Driven Life: This New York Times article shows that even the most mundane-seeming data can be useful. Shared are stories about people who collect personal data using tools, applications and processes, bringing home the point that all this tracking isn’t merely creepy – it gathers data that can help us make better informed decisions.

Privacy
Informavore: The Future of Data Privacy: Here the author explores the extent to which social network data should be private. Citing various case studies and Danah Boyd’s talk during SXSWi earlier this year, she highlights many points of debate and provides readers with some food for thought.

Data visualization
Four Ways of Looking at Twitter: Jeff Clark is a data viz enthusiast and has taken Twitter data and created four interesting visualizations. The four are just a few of the visualizations that have come about in the past few months, and are great examples of what people can do with the rich data source of social networks.

Twitter influence
On Twitter, Followers Don’t Equal Influence: This research is a great example of why Trst.me is a better measure of influence than follower count, and goes into much more depth in explaining why. This press release from Carnegie Mellon University offers some similar validation. Scientists there determined that Twitter could be as good at determining public opinion as a Gallup Poll.