regression towards the datascience

Tornado Error during WebSocket handshake


I was getting the following exception in the WebSocket client when trying to connect to a Tornado WebSocket server.

WebSocket connection to 'ws://localhost:5678/echo' failed: Error during WebSocket handshake: Unexpected response code: 403

and in the server-side log:

WARNING:tornado.access:403 GET /echo (::1) 6.00ms

A simple Echo WebSocket server …
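
One common cause of a 403 during the handshake is Tornado's origin check: since Tornado 4.0, WebSocketHandler rejects connections whose Origin header does not match the Host header unless check_origin is overridden. The excerpt above is cut off, so this may or may not be the cause here. The sketch below is a minimal echo server under that assumption; ALLOWED_ORIGIN_HOSTS, EchoHandler, make_app and main are illustrative names, and the port and path are taken from the error message.

```python
from urllib.parse import urlparse

import tornado.ioloop
import tornado.web
import tornado.websocket

# Hypothetical allow-list: replace with the hosts that serve your client page.
ALLOWED_ORIGIN_HOSTS = {"localhost", "127.0.0.1"}


class EchoHandler(tornado.websocket.WebSocketHandler):
    def check_origin(self, origin):
        # Tornado calls this during the handshake and answers 403
        # when it returns False.
        return urlparse(origin).hostname in ALLOWED_ORIGIN_HOSTS

    def on_message(self, message):
        # Echo every incoming message back to the client.
        self.write_message(message)


def make_app():
    return tornado.web.Application([(r"/echo", EchoHandler)])


def main():
    # Port 5678 matches the ws://localhost:5678/echo URL in the error above.
    make_app().listen(5678)
    tornado.ioloop.IOLoop.current().start()
```

Calling main() starts the server. Returning True unconditionally from check_origin also removes the 403, but it accepts connections from any site, so an explicit allow-list is usually the safer choice.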

Binding port 80 to tomcat in Ubuntu


Edit the file /etc/default/tomcat7

change the line




and run the following commands

$ sudo touch /etc/authbind/byport/80
$ sudo chmod 500 /etc/authbind/byport/80
$ sudo chown tomcat7 /etc/authbind/byport/80

Hope this helps.

Building Hadoop source code


Apache Hadoop is a framework that allows for distributed processing of large data sets across clusters of computers using MapReduce.

The steps listed below build and package Hadoop from source code. This guide assumes a fresh installation of Ubuntu 14.04.

  1. Let's start with installing Oracle …

Pig script to process CSV file with quotes and multiline


When writing a Pig script, we usually use PigStorage to load a CSV file.

Consider a sample CSV file in the following format.

2,Loading successfull,2014-09-25
3,Loading successfull,2014-09-25
4,Loading successfull,2014-09-25

This file can be loaded as

logs = LOAD 'log_folder/log_file.csv' USING …

Split array based on difference with NumPy


I had a NumPy array of numbers, which I had to split based on the change of value.

For example, consider an array as shown below.

values = [112.0, 111.0, 113.0, 111.0, 112.0, 112.0, 112.0, 113.0, 113.0,
       113.0, 114.0, 114 …
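
A minimal sketch of one way to do this, assuming "change of value" means any difference between consecutive elements: np.diff marks where the value changes, and np.split cuts the array at those positions. The sample array and the split_on_change helper are hypothetical, not taken from the post.

```python
import numpy as np

# Hypothetical sample data, not the array from the post.
sample = np.array([112.0, 112.0, 113.0, 113.0, 113.0, 111.0, 114.0, 114.0])


def split_on_change(arr):
    """Split arr into runs of equal consecutive values."""
    arr = np.asarray(arr)
    # np.diff is non-zero wherever the value changes; adding 1 moves each
    # index to the first element of the new run.
    boundaries = np.flatnonzero(np.diff(arr) != 0) + 1
    return np.split(arr, boundaries)


chunks = split_on_change(sample)
for chunk in chunks:
    print(chunk)
```

To split only on jumps larger than some threshold, the condition could be replaced with something like np.abs(np.diff(arr)) > threshold.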

Import postgres table to HDFS using sqoop


Data from PostgreSQL tables can be imported into HDFS using Sqoop with the following steps.

Make sure the PostgreSQL JDBC connector is available in the /usr/share/java directory.

To list all available tables in the PostgreSQL database:

$ sqoop list-tables \
    --connect jdbc:postgresql:// \
    --username myUserName \
    --password myPassword

To …

Categorizing and summing data in d3js


This post shows how to categorize an array into different buckets in d3js with d3.nest.

For example, consider a dataset that contains an array of integers ranging from 1 to 200. Suppose we have to group this array into the following four buckets:

  • value >= 150, categorize as excess-heat
  • value >= 140 …

Connect remote postgresql server with pgAdmin


Trying to connect to a remote PostgreSQL server with pgAdmin can lead to an "Unable to connect" error.

To enable pgAdmin to connect, edit the following configuration files on the server:

$ sudo vim /etc/postgresql/9.3/main/postgresql.conf
   listen_addresses = '*'

$ sudo vim /etc/postgresql/9.3/main/pg_hba.conf
   local    all     postgresql          trust …