Bing Copies Google?

If you’re keeping up with web-related news, this should be high on your personal radar: Google recently accused Bing of copying their search results. This has been covered on broadcast news, in major newspapers, and of course in the tech blogosphere.

Here are some links on the issue:

Fox News, Search Engine Land, Matt Cutts (video is very interesting and worth the 40 minutes), David Pogue (NYT, may require registration), Nate Silver (NYT), more here, and here, and of course what started it all.

I posted a variant of this for my students to answer/discuss, and this space is for my take on the issue…

There are two topics here – the original conceit, which regards search spam destroying the basic value of searching; and secondly, the stunning PR coup perpetrated by Google on Microsoft.

I also learned Blekko is a fancy directory service, a competent version of dmoz.

Matt Cutts is a better communicator than Harry Shum, and Vivek Wadhwa and Rich Skrenta appear to be friends.

Onward to the issues… The PR coup is Google changing the subject away from search relevance and into a plagiarism issue for Microsoft. Read Nate Silver’s blog at the NYTimes for some statistical musings on the possibilities, but I think Microsoft got caught…

All toolbars collect data and do an ET-phone-home routine. BUT – so do most anti-virus products, and I expect many of them are selling data back to Microsoft – and this may amplify honeypot results. I don’t think these guys want to get into that part of things and reveal too much about what data traffic really moves around between engines and browsers and toolbars and anti-virus (Wireshark is very enlightening). But the Microsoft emphasis on “no one reads EULAs” is a blatant tell, at least to me.

The real issue regarding relevance of search is obviously a topic which Google aims to avoid discussing. Hence we have a lot of dissembling and yapping about algorithmic approaches and so on. Google doesn’t know how to deal with the problem; neither does Microsoft, and neither wants to go the directory route to solve it. But Google does understand it must be solved, and soon.

I am wondering if the spam issue might be the real reason behind the recent shakeup at the top of Google – Schmidt may recognize the real danger while the founders are still blind to criticism.

Back in ’02 I attended a conference for the professional journal industry (mostly medical but some engineering thrown in) and recall a speaker who predicted the eventual demise of Google – indeed he felt all “free” search engines would eventually fail – due to the internal conflicts of the basic business model.

I think we’re beginning to see this play out.

The next-gen search will have a subscription model – pay a fee and block the spammers or search for “free” and accept the less useful results.

I agree there are likely technological methods available to block the spammers, but the incentives have to be there to implement them. I’ve often thought selective economic boycott would do wonders to the email spam problem… along with public execution of some of the purchasers of spamvertised goods.

Ye Olde Greate Aggregator Panic (Spokeo.com)

The latest Facebook panic:

ATTN! There’s a site called spokeo.com that’s a new online USA phone book w/personal information: everything from pics you’ve posted on FB, your home address, credit score, home value, income, age, etc. REMOVE yourself by searching your name, copy the URL of your page, go to the bottom right corner of the page and click on the Privacy button to remove yourself. Copy & re-post so your FB friends are aware.

My reaction: YAWN.

Um, seriously, folks, spokeo is not the threat. Your own habits and those of your state and local government are the threats, if you’d like to look at it like that.

Spokeo is an aggregator. They run a “spider” or robotic software which scours databases and websites, looking for names and phone numbers and email addresses and regular addresses and blends all this together with public databases to produce reports, which they will then sell.
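For the curious, the "looking for" part is not sophisticated. Here is a minimal sketch in Python of how a spider might pull e-mail addresses and US-style phone numbers out of a page's text. The patterns, the function name `extract_contacts` and the sample text are my own illustration, not anything Spokeo has published; real harvesters are surely more elaborate.

```python
import re

# Deliberately simple patterns (assumptions for illustration only);
# real-world contact data is far messier than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b")

def extract_contacts(page_text):
    """Return the e-mail addresses and 10-digit phone numbers found in a page."""
    # Lowercase e-mails so the same address in different case counts once.
    emails = sorted({m.lower() for m in EMAIL_RE.findall(page_text)})
    # Strip punctuation so "(555) 010-1234" and "555-010-1234" are the same number.
    phones = sorted({re.sub(r"\D", "", m) for m in PHONE_RE.findall(page_text)})
    return {"emails": emails, "phones": phones}

# Hypothetical page text, invented for this example.
sample_page = (
    "Contact Jane Doe at Jane.Doe@example.com or (555) 010-1234. "
    "Office: 555-010-1234."
)

print(extract_contacts(sample_page))
```

Note that the two differently formatted phone numbers collapse into one entry once the punctuation is stripped, which is exactly what makes the later matching step easy.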

Go ahead. Go tell Spokeo that the information they have is correct (you’re doing that by “removing” yourself). It’s ok. PeopleSearch, ZabaSearch, PPLFinder, WhitePagesUSA and many many others will be happy to continue to list you – and not all of them have ways for you to “remove” yourself either.

You see, you’ve provided all the information collected by these data-miners. Look at those magazine or newspaper subscriptions. Recall those little mail-in cards, and questionnaires as part of renewing? Recall all those sweepstakes where you just answered a few questions (and included your email address and/or phone number)? How about some of those warranty cards you mailed back (after answering a few questions)?

Did you remember to use the same email address and phone number for all those entries? Good. The data-mining industry thanks you for your cooperation.

And as for the information you didn’t provide (like how much your house is worth), well, you did. When your deed was recorded, or mortgage(s) issued, your local government made a copy (and taxed you a bit on the transaction). All that data is public and accessible – and data-miners are the biggest purchasers. They tie the addresses and names back to the subscription/warranty/sweepstakes lists, and – we have a spokeo.
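The "tying back" step is plain record matching. The sketch below, in Python, merges records from different lists whenever they share an e-mail address, phone number or street address. Everything in it is hypothetical: the field names, the sample records and the function `link_records` are mine, and actual data-miners presumably use fuzzier matching (names, misspellings, old addresses).

```python
def link_records(records):
    """Merge records that share an e-mail, phone number or street address."""
    profiles = []   # merged profiles, in order of first appearance
    index = {}      # (field, value) identifier -> the profile it belongs to
    for rec in records:
        keys = [(f, rec[f]) for f in ("email", "phone", "address") if rec.get(f)]
        # Reuse the first existing profile that shares any identifier.
        # Simplification: a record bridging two existing profiles does not merge them.
        profile = next((index[k] for k in keys if k in index), None)
        if profile is None:
            profile = {}
            profiles.append(profile)
        for field, value in rec.items():
            profile.setdefault(field, value)   # keep the first value seen
        for k in keys:
            index[k] = profile
    return profiles

# Invented sample data standing in for the kinds of lists described above.
sample_records = [
    # magazine subscription card
    {"name": "J. Doe", "email": "jdoe@example.com", "address": "12 Example St"},
    # warranty card
    {"email": "jdoe@example.com", "phone": "5550101234", "income": "50-75k"},
    # public deed record
    {"address": "12 Example St", "home_value": 250000},
    # sweepstakes entry made with a different e-mail and no phone
    {"name": "J. Doe", "email": "throwaway@example.net", "hobby": "sailing"},
]

for profile in link_records(sample_records):
    print(profile)
```

In this toy version the first three records collapse into one profile, while the sweepstakes entry made with a different e-mail address stays separate, because nothing in it matches on the chosen identifiers.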


Protecting yourself…

You can’t really keep the data-miners out of the government records arena, or out of the conventional phone books. But you can make life a bit harder for them.

Don’t go filling in lots of warranty card data, or else put bogus stuff in. In most jurisdictions it is not even necessary to “register” a warranty, despite what the card says (your state law trumps the fine print on the warranty card). Try to avoid using the same email address all the time – or even the same phone number.

You might really go whole hog and use blocking and anonymizer software systems, but it really shouldn’t be necessary, though a good adblocker will certainly help. For myself, I use a variety of email addresses, various phone numbers (between home, business, Google Voice, etc. there are plenty to go around) and the NoScript and AdBlocker plugins in the Firefox browser to stay out of the data mines.

Finally – don’t panic about this stuff. Go to the site, search a few names, marvel at how often they get things really laughably wrong… and go back to living.

[wpw]

This is a repository of my ramblings on topics related to my instructional activities. It’s also a demonstration site, for viewing and commenting by students (yes, Frank, there are those out there who’ve never commented on a blog).

The usual caveats apply: All opinions are mine and not those of others including but not limited to incidental employers and/or employees; etc.

Google – where “Free” becomes awful costly

This note was written in response (in part) to the following article, as passed along by Frank Stallone:
http://www.duckworksmagazine.com/11/columns/guest/winter/index.htm

Summary: The author of that page relied on Google’s AdSense program and custom YouTube videos for revenue, and at some point “violated terms” with Google AdSense and was summarily booted off, with no recourse or understanding of the “violation.”

The problem lies in the details… and many of the details are hidden away. Google has three levels of rules: those in the contract (the standard “fine print”); those located in FAQ documents; and those enforced only through opaque algorithms. It’s this last group which causes the most headaches.

You can go along, playing by all the published rules (contract and FAQs) and one day, suddenly, you’re locked out and nothing works – and there’s no explanation, other than a terse “This account is suspended due to a Terms of Service violation.”

The above-captioned individual was reliant upon Google’s AdSense program for revenue. That may work, IF you’re willing to play by the unrevealed rules – which to my mind means you only run AdSense, or perhaps AdSense with Amazon, on any site. Period. Mr. Winter tried to get cute (subscriptions, local ad syndication and affiliations, cross-feeds) and found out the downside of playing with Google. They “fired” him.

I’m not going to analyse specifics with AdSense. It’s not something I have many dealings with – I rarely see the need to run syndicated advertising on a website. None of my sites, nor any site I maintain, has syndicated advertising.


The main issue I think is the hidden ToS (terms of service), and the selective enforcement thereof by Google.

A lot of people (individuals, schools, corporations, nonprofits, etc) have relationships with Google. A lot rely on Google services for critical communications and even business-critical services.

If the relationship with Google is a paid one – you’re paying them a fee for service – then this is fine. Where it is problematical is when it’s “Free.”

The user has very limited control over “free” applications, and little or no recourse if something goes wrong.

One of the websites I support is http://www.njchurchscape.com/

NJChurchscape has been hosted on my server since the site’s inception and is the busiest site on the server. A couple of years ago, in an attempt to broaden research methods and experiment with public collaboration tools, a subordinate “wiki-style” site was launched via the Google Sites offering:

http://sites.google.com/site/cumberlandchurchscape/

and if you follow the link you’ll see it’s been cut off – for some sort of ToS violation, the details of which have not been revealed. Google won’t answer questions regarding this – but I don’t see how a discussion of old churches in Cumberland County NJ causes a ToS violation. However this was done using the “free” version of Sites, and thus there is no recourse, no one to call or talk to or blame. It’s all faceless.

I personally use a handful of Google Apps sites, all associated with domain names, and should any of these end up being deemed “critical” for line-of-business, I will pay the $50/user/year fee. For now, these are experimental sites.

I know a few people who read this (at least I hope read this) are running their primary business applications on the “free” Google services. I hope “free” turns out well for them.

My bottom line: Google should not be relied on as a source of income, unless you’re an employee of the company. Google should not be relied on as a provider of any service, unless paid for.

Google is NOT your “friend.” EVER.