YOUR DATA FOR SALE: Web Data Mining
OK, let's blame it on the festive season that you have gained a few pounds in the last 2 months... And now you have a vacation lined up, after which you will definitely be posting quite a few pictures to flaunt your latest trip a little.
But then, you would want to be in shape before all this happens, before someone points it out on your FB wall and embarrasses you for no reason. So, here begins your weight loss spree..!
It's 12am, and you're surfing for weight loss tips. You search for a daily diet regime and exercise plan, since you intend to start following it the very next morning.
While you surf, the different pages show you the clothes and shoes you browsed the other night. You decide not to stray from the topic, but the ads somehow tempt you to visit their website and buy something more useful at this moment. You end up buying a weighing machine after comparing the prices at different websites and reading their reviews.
And you are now quite a happy customer, having made a cheap purchase with the coupons available on the site. You are almost ready with the diet and exercise plan and now move on to your daily routine of posting in a forum or two and tweeting about your day.
At this point, if you sense someone peering over your shoulder, it will probably be your spouse/mom/dad/sister looking for a midnight snack. You definitely won't be thinking about electronic privacy and the personal information your computer leaves as it weaves from site to site.
And you won't even realize that by now data-mining companies have gathered a vast amount of data about you, taken from the websites you looked at, the stuff you bought, your Facebook photos, your tweets, your warranty cards, your customer-reward cards, the songs you listen to online, surveys you were guilted into filling out and magazines you subscribe to. It's easy to think of yourself as a small speck of sand in an invisible web of servers.
Once again, technology is making us weigh what we're sacrificing in privacy against what we're gaining in instant access to information. Some facts about you were always public -- the price of your home, some divorce papers, your criminal records, your political donations -- but they were held in different buildings, accessible only by those who filled out annoying forms; now they can be clicked on.
Other information was not possible to compile pre-Internet because it would have required sending a person to follow each of us around the mall, listen to our conversations and watch what we read in the newspaper. Personal information was once stored in just two places: your home or your head.
Now all of those activities happen online and can be tracked instantaneously, and this same information is worth billions to advertisers and data brokers, who operate quietly in nearly every facet of your life.
Below, explore the complicated relationship between you and Big Data: from how your information is collected, to where it’s used, to how you can profit instead.
It Starts With You...
The simple act of browsing the Internet or subscribing to magazines will push your personal information into a complex industry of buyers, sellers and brokers.
➤ The Internet: Ads That Follow You and Junk Mail
Advocates of what has been dubbed “behavioural advertising” say they can eliminate annoying and out-of-place ads by collecting information about your interests. Monitoring begins with the cookie, a small text file advertisers save on your computer. It's retrieved later and compiled with other cookies to develop a complex portrait of your online behaviour. The scale of online tracking can be staggering.
Since targeted ads are so much more effective than non-targeted ones, websites can charge much more for them. This is why --- compared with the old banners and pop-ups --- online ads have become smaller and less invasive, and why websites have been able to provide better content and still be free.
Advertisers are interested only in tiny chunks of information about your behavior, not your whole profile, which is one of the reasons that some people actually argue that data mining does no real damage. Junk mail is a familiar evil that's barely changed over the decades.
Data mining and the advertising it supports get more refined every month. The latest trick to freak people out is re-targeting --- when you look at an item in an online store and then an ad for that item follows you around to other sites.
➤ Social Networks: Facebook and Google Troves
Your social network is part of the richest data set ever produced. Gnip, the largest provider of social data, stores more than 100 billion social activities each month, offering companies access to a vast database of content from Twitter, Foursquare, Tumblr, WordPress, Disqus, StockTwits, Facebook, YouTube, Instagram, Google+ and others. You can download your Facebook data to see how much information is on record, including a history of every advertisement you've clicked.
Our identities, however, were never completely within our control: our friends keep letters we've forgotten writing, our enemies tell stories about us we remember differently, our yearbook photos are in way too many people's houses. Opting out of all those interactions is opting out of society. Which is why Facebook is such a confusing privacy hub.
Many data-mining companies make this argument: how can people complain about having their last trip data-mined when they are posting photos of themselves on Facebook and writing columns about how they enjoyed their luxury car ride there?
So your privacy on Facebook --- that's up to you. You choose what to share and what circle of friends gets to see it, and you can untag yourself from any photos of you that other people put up. However, from a miner's point of view, Facebook has the most valuable trove of data ever assembled: not only have you told it everything you like, but it also knows what your friends like, which is an amazing predictor of what you'll like.
Facebook doesn't sell any of your data, partly because it doesn't have to --- 23.1% of all online ads not on search engines, video or e-mail run on Facebook. But data-mining companies are "scraping" all your personal data that's not set to private and selling it to any outside party that's interested. So that information is being bought and sold unless you squeeze your Facebook privacy settings tight, which keeps you from a lot of the social interaction that drew you to the site in the first place.
The only company that might have an even better dossier on you than Facebook is Google. Google keeps the data it has about you from various parts of its company separate. One category is the personally identifiable account data it can attach to your name, age, gender, e-mail address and ZIP code when you signed up for services like Gmail, YouTube, Blogger, Picasa, iGoogle, Google Voice or Calendar.
The other is log data associated with your computer, which it "anonymizes" after nine months: your search history, Chrome browser data, Google Maps requests and all the info its myriad data trackers and ad agencies (DoubleClick, AdSense, AdMob) collect when you're on other sites and Android phone apps.
You can change your settings on the former at Google Dashboard and the latter at Google Ads Preferences --- where you can opt out of having your data mined or change the company's guesses about what you're into.
Google says that its mission is to organise the world's information and make it universally accessible and useful. Which is awesome, except for the fact that your own information is part of the world's information.
➤ Mobile Phone & Loyalty Cards
More companies have started to use your phone's location data to track your movements and send you location-specific offers, or to follow your movements in their stores. Telecommunications giant Rogers Communications announced an advertising program last year to send four promotions per week to its mobile customers via text message. The idea is simple: opt in and get 15% off a pizza slice when you're near a Pizza Hut at lunch time.
Around 50 per cent of all Indians participate in loyalty programs, according to a global research firm. In exchange for some basic information, loyalty cards offer discounts and rewards for repeat business. But for businesses, the real value lies in mining that information for insights into customer behaviour. Businesses can track your movements and buying habits, helping them determine whether you're a vegetarian or predict when you're about to have a baby.
Collected by Data Brokers
Data flows from social networks and Internet companies to data brokers. They combine this with other data to create valuable lists --- curated data on specific groups with names like “Indians with Discretionary Funds” and “Tropical Beach Resort Goers.” These lists are bought, sold, exchanged and bartered, forming the basis of the Big Data economy. People can still be wrapped up in the Big Data economy without using mobile phones or loyalty cards.
One of the largest data brokers, Acxiom Communications, uses more than 500 phone directories to create a profile about you by pairing your name, phone number and address with census information such as income and housing. In the United States, it is known for trying to collect 1,500 data points on every American, making use of everything from court records to birth records to magazine subscriptions.
Google's Ads Preferences believes that you are a guy interested in politics, Asian food, perfume, celebrity gossip, animated movies and crime but who doesn't care about "books & literature" or "people & society." (So not true.)
Yahoo! has you down as a 20-25-year-old male who uses a Mac computer and likes hockey, rap, rock, recipes, clothes and beauty products; it also thinks that you live in Delhi, even though you moved to IIT Dhanbad more than two years ago.
A data-mining company that was recently banned by Facebook because it mined people's user IDs has you down as a 20-25-year-old male who is still enjoying his school days. It knows that you like studying and watching cricket. Since you followed some matches online, it also thinks you are a hard-core sports enthusiast.
Each of these pieces of information (and misinformation) about you is sold for about two-fifths of a cent to advertisers, which then deliver you an internet ad, send you a catalog or mail you a credit card offer. This data is collected in lots of ways, such as tracking devices (like cookies) on websites that allow a company to identify you as you travel around the Web, and apps you download on your cell that look at your contact list and location.
You know how everything has seemed free for the past few years? It wasn't. It's just that no one told you that instead of using money, you were paying with your personal information.
Purchased by Data Users
The destination of all the collecting, analyzing and trading is the data users. These are mostly marketers and advertisers, but can include fundraisers or non-profits. They buy or rent lists to better understand their target demographic based on specific traits such as ethnicity, income, property value and hobbies.
Data users don't target you specifically. Instead, they'll analyze a specific list to build profiles. Marketers can blanket a target area with advertising or open a new store in an up-and-coming neighbourhood --- expensive decisions made all the more easy by using data.
How Can You Profit Instead?
But if you can't beat them, how about joining them? Most people are actually interested in selling their data for rewards, according to a survey by Microsoft. Now some websites are popping up to give you the tools.
Datacoup is a New York startup that pays to access your data, including your social networks and credit card statements. Currently in beta testing, the company will eventually sell intel on its users to marketing companies (after removing any personally identifying information).
Matt Hogan, the company's CEO and founder, says collections of personal information are like valuable diamonds created every day through purchases and internet use. “Data brokers are finding all these diamonds and reaping all the benefits,” he says.
Even if you don't want to sell your data, some are advocating a system where each person can maintain one official file of data in the cloud, dubbed the “golden record.”
This system would store everything from phone numbers to health records to tax information, with each person controlling who has access and how it's used. Personal data vaults, such as the one offered by Personal.com, already offer this service. In the future, data vaults could include features to open your data to companies --- for a price.
How Do People Mine the Web?
Crawlers aren't easy to create, so unless you know how the Internet works and how to code, you probably need to buy an engine or hire a programmer to create one for you. The type of crawler you create depends on the type of mining you want to do. You probably have limited finances, so collecting and storing everything is usually too expensive. A crawler can log every bit of information it finds or just log the specific information you want to collect.
A crawler is basically a bot. A bot is a program. Every search engine has a bot. These bots are the most popular data mining tools. You can think of a bot as a different kind of browser. Instead of grabbing a web page from a server and displaying the HTML on a user's screen, the bot finds a page, grabs information and logs this information to a database.
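To make the "different kind of browser" idea concrete, here is a minimal sketch of such a bot using only the Python standard library. It finds a page, grabs one piece of information (the page title) and logs it to a database. The names `TitleParser`, `fetch`, `crawl_once`, the `pages` table and the `example-bot/0.1` user agent are all made up for illustration; a real crawler would collect far more.

```python
import sqlite3
import urllib.request
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the page's <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch(url):
    """Download a page like a browser would, but return the raw HTML."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-bot/0.1"}  # hypothetical bot name
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_once(conn, url, fetch_page=fetch):
    """Find a page, grab one piece of information and log it to the database."""
    html = fetch_page(url)
    parser = TitleParser()
    parser.feed(html)
    title = parser.title.strip()
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title) VALUES (?, ?)", (url, title)
    )
    conn.commit()
    return title
```

Passing the download step in as `fetch_page` is only a convenience so the parsing and logging can be tried out on a saved page without touching the network.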
A bot usually runs based on some kind of trigger. You can manually run this program, but most data mining bots run on a schedule. You can schedule it for certain times in the day or based on some kind of trigger such as finding a new website or link. You can even use your website visitors to trigger the bot.
For instance, a user enters information into your e-commerce sign-up form and gives you a URL as a referral for how they found your site. After storing this URL, you then crawl it to mine data from it. How you use this information is just as important as how you collect it.
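One way to picture the difference between a schedule and a trigger is a small queue of URLs, each with the time it should be crawled. This is only a sketch: `CrawlQueue`, `schedule`, `on_referral` and `due` are hypothetical names, and times are plain numbers (seconds) to keep it simple.

```python
import heapq


class CrawlQueue:
    """Holds URLs together with the time at which each should be crawled."""

    def __init__(self):
        self._heap = []  # entries are (run_at, url), earliest first

    def schedule(self, url, run_at):
        """Scheduled run: crawl this URL at a chosen time."""
        heapq.heappush(self._heap, (run_at, url))

    def on_referral(self, url, now):
        """Trigger: a visitor supplied a referral URL, so crawl it right away."""
        self.schedule(url, now)

    def due(self, now):
        """Pop and return every URL whose scheduled time has arrived."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[1])
        return ready
```

Your sign-up form handler would call `on_referral`, while a timer would call `due` every so often and hand the results to the bot.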
A good database design helps maintain data integrity and avoids redundancy. Good database design also affects performance, so unless you want your reports to take hours to render, make sure your database design is normalized and indexed.
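As a rough illustration of "normalized and indexed", the sketch below stores each site once, links its pages to it by ID, and adds an index on the column a report is likely to filter on. The table names, column names and the helpers `open_store` and `record_page` are assumptions made for this example, not a recommended schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS sites (
    id   INTEGER PRIMARY KEY,
    host TEXT NOT NULL UNIQUE          -- each host is stored only once
);
CREATE TABLE IF NOT EXISTS pages (
    id      INTEGER PRIMARY KEY,
    site_id INTEGER NOT NULL REFERENCES sites(id),
    path    TEXT NOT NULL,
    status  INTEGER,
    UNIQUE (site_id, path)             -- no duplicate rows for the same page
);
CREATE INDEX IF NOT EXISTS idx_pages_status ON pages(status);
"""


def open_store(path=":memory:"):
    """Open the database and create the tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def record_page(conn, host, path, status):
    """Insert or update one crawled page without duplicating its site."""
    conn.execute("INSERT OR IGNORE INTO sites (host) VALUES (?)", (host,))
    site_id = conn.execute(
        "SELECT id FROM sites WHERE host = ?", (host,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR REPLACE INTO pages (site_id, path, status) VALUES (?, ?, ?)",
        (site_id, path, status),
    )
    conn.commit()
```

Crawling the same page twice updates its row instead of adding a second one, which is the redundancy the paragraph above warns about.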
A bot's complexity varies, but you should think of it in the same way you think of your browser. First, the bot does a lookup on the URL using DNS. DNS servers translate friendly domain names to IP addresses. The bot program can then use the IP address to “find” the web server and website on the Internet.
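The lookup step can be sketched in a few lines with Python's `socket` module. The helper name `resolve` is made up for this example; it asks the system resolver for the addresses behind the host in a URL, which is roughly what a browser does before it connects.

```python
import socket
from urllib.parse import urlsplit


def resolve(url):
    """Return the sorted, de-duplicated IP addresses for the host in a URL."""
    parts = urlsplit(url)
    # Fall back to the default port for the scheme when the URL names none.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    infos = socket.getaddrinfo(parts.hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry ends with a socket address whose first item is the IP.
    return sorted({info[4][0] for info in infos})
```

A host can map to several addresses, so the bot gets a list back and would typically try them in turn.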
Next, the bot can view and store server headers. Server headers are set by your host, but if you have a dedicated server or VPS, you can set custom server headers. Server headers tell you a few things about the site. First, they often reveal the server software and sometimes the operating system. Second, the server sends a response code. There are several response codes. For instance, a “200” means the page was returned without an error. A “404” means the page was not found. A “503” means the service is unavailable (usually for scheduled downtime such as maintenance). These are some of the most common responses, but your bot needs to account for each server response.
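Here is one possible way a bot might decide what to do with each response code. The action names (`parse`, `follow-redirect`, `drop`, `retry-later`, `skip`) and the functions `plan_for_status` and `describe` are invented for this sketch; the grouping of codes is a judgment call, not a standard.

```python
from http import HTTPStatus


def plan_for_status(code):
    """Map an HTTP response code to what the bot should do next."""
    if 200 <= code < 300:
        return "parse"            # page came back fine, read the HTML
    if code in (301, 302, 303, 307, 308):
        return "follow-redirect"  # the page has moved, follow the new URL
    if code in (404, 410):
        return "drop"             # page is not there, stop asking for it
    if code == 429 or 500 <= code < 600:
        return "retry-later"      # server is busy or down, come back later
    return "skip"                 # anything else: leave it alone


def describe(code):
    """Return the standard phrase for a code, or 'Unknown' if there is none."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "Unknown"
```

Treating a 503 as "come back later" rather than as a missing page matters, because scheduled maintenance usually ends.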
After you determine server response codes, you can grab the HTML. The HTML contains the page markup and links, and it references the JavaScript and CSS files the page uses. You'll need to crawl the HTML if you want any information from the front-end screen that the user sees in the browser. Back-links, content, site structure and code languages are all reasons to obtain the web page's content.
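Pulling links and referenced files out of the HTML could look something like the sketch below, which uses the standard library parser. `ResourceParser` and its three lists are hypothetical names; relative addresses are resolved against the page's own URL so they can be crawled later.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceParser(HTMLParser):
    """Collects links, script files and stylesheets referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.scripts = []
        self.stylesheets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "script" and attrs.get("src"):
            self.scripts.append(urljoin(self.base_url, attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only <link rel="stylesheet"> elements point at CSS files.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.stylesheets.append(urljoin(self.base_url, attrs["href"]))
```

You would create one parser per page, call `feed()` with the HTML, and then read the three lists.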
One issue to remember is that site owners sometimes watch bots and anonymous browsing. If you use too much of a website's resources, the site owner or the host might block your bot. You should be considerate with a bot and acknowledge and honor a robots.txt file. A robots.txt file contains directives for bots. It's always stored in the root of the site, and it tells bots which directories and files the webmaster doesn't want crawled. If you don't honor this file, webmasters and hosts might block your bot either by user agent or IP address.
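Python ships a robots.txt reader in `urllib.robotparser`, so honoring the file takes very little code. In this sketch the helper names `build_rules` and `allowed` and the user agent `example-bot` are made up; the rules are parsed from text you have already downloaded.

```python
from urllib.robotparser import RobotFileParser


def build_rules(robots_txt):
    """Parse the text of a robots.txt file into a rule checker."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(rules, user_agent, url):
    """Return True if the rules let this user agent crawl the URL."""
    return rules.can_fetch(user_agent, url)
```

For example, with the rules `User-agent: *` and `Disallow: /private/`, the bot may fetch the home page but should leave everything under `/private/` alone.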
Even if you have permission to crawl the site, you still shouldn't send too much traffic to a website. Some web hosts don't have the resources to handle large amounts of data, so your bot can affect regular user traffic. If the site goes down, your bot can cost the webmaster sales and organic search engine traffic.
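A simple way to avoid flooding a site is to wait a minimum time between two requests to the same host. The sketch below assumes a two-second gap, which is an arbitrary number chosen for illustration; `PoliteThrottle` and its parameters are hypothetical names.

```python
import time
from urllib.parse import urlsplit


class PoliteThrottle:
    """Enforces a minimum delay between requests to the same host."""

    def __init__(self, min_delay=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay  # seconds between hits on one host (assumed)
        self._clock = clock
        self._sleep = sleep
        self._last_hit = {}         # host -> time of the most recent request

    def wait(self, url):
        """Sleep if this host was hit too recently; return the seconds waited."""
        host = urlsplit(url).hostname
        now = self._clock()
        last = self._last_hit.get(host)
        waited = 0.0
        if last is not None:
            waited = max(0.0, self.min_delay - (now - last))
            if waited:
                self._sleep(waited)
        self._last_hit[host] = now + waited
        return waited
```

The bot calls `wait(url)` just before each download. Requests to different hosts do not slow each other down, only repeat visits to the same one.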
Track Back
➤ Monitor the information your computer sends out
When a user clicks on a website, a "session" begins. A session tracks you from the first page you click on until you exit the site. Your session can be monitored in several ways. Your IP address, the numeric address assigned to your connection by your Internet provider, can provide website owners with your approximate location, including city, suburb and state; together with the headers your browser sends, it also tells them what type of browser and operating system you run.
Although IP addresses can provide a fairly detailed summary of your computer, Web browser cookies provide a more complete profile of a user's preferences. Three types of cookies are sent out when you surf the Internet. A session cookie is a simple text file that expires once you close the website.
A persistent cookie exists as a text file as well, but it remains on your hard drive and either expires at a set time or remains until you delete it. Often used when someone logs in to a site and wants to remain logged in for a set amount of time, persistent or permanent cookies collect information about you and your Web browsing habits. The important thing to note is that these types of cookies generally exist for only one domain.
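You can tell the two kinds apart by looking at what the site sends: a cookie with an expiry date or a maximum age is persistent, and one without either disappears when the browser session ends. The sketch below applies that rule to raw `Set-Cookie` header values; `classify_cookies` and the cookie names in the example are made up.

```python
from http.cookies import SimpleCookie


def classify_cookies(set_cookie_headers):
    """Label each cookie as 'session' or 'persistent' from its attributes."""
    kinds = {}
    for header in set_cookie_headers:
        jar = SimpleCookie()
        jar.load(header)
        for name, morsel in jar.items():
            # An Expires date or a Max-Age makes the cookie outlive the session.
            persistent = bool(morsel["expires"] or morsel["max-age"])
            kinds[name] = "persistent" if persistent else "session"
    return kinds
```

For instance, a header such as `sid=abc123; Path=/` describes a session cookie, while `remember_me=1; Max-Age=2592000; Path=/` describes one that stays on your drive for 30 days.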
➤ Not all Internet cookies are created equal
The last type of cookie is a third-party ad-serving cookie, which monitors your Web browsing to show you advertisements that relate to your interests. The site owner places third-party ads on the site, but the actual ads are hosted by another site. If your computer accepts the third-party cookie, the company hosting the ad can access your information and compile detail-rich profiles, including your IP address, location, shopping preferences and in some cases the means and methods by which you pay online. In order to maintain your privacy, your internet browser will allow you to decline all third-party cookies.
Although you may be actively blocking third-party cookies, tracking can also appear in the form of Web bugs. Web bugs are small graphics embedded into a webpage. Web bugs are used to hide the fact that the page is being monitored. Information collected by Web bugs includes IP addresses, times that the image was viewed and data from related cookies on your computer. Web bugs can track you as you move from site to site and create personal profiles of users.
You can check whether Web bugs are planted within a page by viewing the page source. If you see images called 'clear.gif' or find images linking to another site, you'll have found Web bugs. This is one way companies collect your private information.
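The manual check described above can be roughed out in code. This sketch flags images named clear.gif and images hosted on another site, as the text suggests, and adds one assumption of its own: images declared as 0 or 1 pixel in size are treated as suspicious too. `WebBugFinder` is a hypothetical name, and the result is only a list of suspects, since plenty of harmless images are also served from other hosts.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class WebBugFinder(HTMLParser):
    """Lists images on a page that look like they could be Web bugs."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.page_host = urlsplit(page_url).hostname
        self.suspects = []  # (image URL, list of reasons)

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src")
        if not src:
            return
        full = urljoin(self.page_url, src)
        reasons = []
        if urlsplit(full).path.lower().endswith("clear.gif"):
            reasons.append("named clear.gif")
        # Assumption: a declared size of 0 or 1 pixel suggests a tracking image.
        if attrs.get("width") in ("0", "1") and attrs.get("height") in ("0", "1"):
            reasons.append("tiny image")
        if urlsplit(full).hostname != self.page_host:
            reasons.append("hosted on another site")
        if reasons:
            self.suspects.append((full, reasons))
```

Feed it the page source with `feed()` and read `suspects`; an image that collects several reasons at once deserves the closest look.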
➤ Control your personal information online and offline
In much the same way that companies gauge the strength of their branding by monitoring how you watch television, the way you travel through the Web is analyzed and tabulated into statistical data. This data allows businesses both large and small to develop new products, discover the shopping habits of their target markets and make important marketing decisions.
On one hand, without access to this information, you would find companies struggling to properly determine the interests of their mainstream online audience. On the other hand, having your Web browsing monitored can make you feel as though your personal privacy is being invaded.
However you feel, there are distinct ways in which companies collect your private information when you browse online, and it is important to know exactly how that works.