regression towards the datascience

Getting started with AWS Kinesis using Python


Amazon Kinesis is a fully managed streaming service hosted on AWS. It is used to collect and process large streams of data in real time. Together with Kinesis Analytics, Kinesis Firehose, AWS Lambda, AWS S3 and AWS EMR, you can build a robust distributed application to power your real-time monitoring dashboards, do ...

AWS DynamoDB full table scan


Amazon DynamoDB is a NoSQL database service hosted on AWS. It is a fully managed and scalable document store database. It is quite similar to MongoDB.

As the data grows, a scan over the full table returns only part of the data, along with a LastEvaluatedKey. The application should initiate the scan again ...
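
A minimal sketch of that pagination loop, assuming a boto3 `Table` resource (or any object whose `scan` method returns the same `Items` / `LastEvaluatedKey` dictionary). The helper name `scan_all` and the table name in the usage comment are hypothetical.

```python
def scan_all(table, **scan_kwargs):
    """Collect every item from a table by following LastEvaluatedKey."""
    items = []
    while True:
        page = table.scan(**scan_kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            # No key in the response means the scan reached the end.
            return items
        # Resume the next scan right after the last evaluated item.
        scan_kwargs["ExclusiveStartKey"] = last_key


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   table = boto3.resource("dynamodb").Table("my-table")  # hypothetical name
#   all_items = scan_all(table)
```

Holding every item in a list is fine for small tables; for large ones a generator that yields page by page would keep memory bounded.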

Manage multiple SSH keys for GIT


Configuring and managing multiple SSH keys for Git accounts in Ubuntu is shown below:

Using ssh-keygen, create a public/private key pair for each account and name them user1_github and user1_bitbucket.

Create a file ~/.ssh/config and put the following in it.

# user1 github account
Host user1_github
PreferredAuthentications publickey
IdentityFile ~/.ssh ...

Resample timeseries data with custom function


With timeseries data we often need to resample to a different interval before feeding it into our analytics model.

Pandas resample has a built-in list of widely used methods. However, if the built-in methods are not sufficient, it is always possible to write a custom function to resample.
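
As a rough illustration of the idea, the sketch below passes a custom function to `resample(...).apply(...)`. The function `value_range`, the 5-minute rule and the sample data are all assumptions made up for this example, not taken from the post.

```python
import pandas as pd


def value_range(window):
    """Custom aggregation: spread between the largest and smallest value."""
    if window.empty:
        return float("nan")
    return window.max() - window.min()


def resample_with_range(series, rule="5min"):
    """Resample a time-indexed series, applying value_range to each bin."""
    return series.resample(rule).apply(value_range)


# Ten one-minute readings, grouped into two 5-minute bins.
index = pd.date_range("2024-01-01", periods=10, freq="min")
readings = pd.Series(range(10), index=index)
result = resample_with_range(readings)
print(result)
```

Any callable that takes the values of one bin and returns a single value can be used the same way.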

This post shows ...

Birthday paradox


Building python visualization of the famous Birthday paradox or Birthday problem.

The Birthday problem asks for the probability that at least two people in a randomly chosen group share the same birthday.

Assume a year of 365 days (ignoring February 29th). The following chart shows how the probability increases with ...
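
The probability itself can be computed directly, as in this small sketch. It assumes 365 equally likely birthdays and works through the complement: the chance that everyone has a distinct birthday. The function name is hypothetical.

```python
def birthday_collision_probability(people, days=365):
    """Probability that at least two of `people` share a birthday."""
    if people > days:
        return 1.0  # pigeonhole: a shared birthday is guaranteed
    p_all_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


for n in (10, 23, 50):
    print(n, round(birthday_collision_probability(n), 4))
```

With 23 people the probability already passes one half (about 0.507), which is what makes the result feel paradoxical.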

Adding empty directory within GIT


While maintaining Git repositories, we sometimes need to add an empty directory as a placeholder, or we want all the files in a directory to stay local. However, Git tracks only files, not directories, so an empty directory cannot be added to the repository on its own.

I ...

Tornado Error during WebSocket handshake


I was getting the following exception in a WebSocket client when trying to connect to a Tornado WebSocket server.

WebSocket connection to 'ws://localhost:5678/echo' failed: Error during WebSocket handshake: Unexpected response code: 403

and on the server-side log

WARNING:tornado.access:403 GET /echo (::1) 6.00ms

A simple Echo WebSocket server ...
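
A 403 during the handshake is commonly caused by Tornado's origin check: since Tornado 4.0, `WebSocketHandler.check_origin` rejects cross-origin connections by default. Assuming that is the cause here, a minimal sketch of an echo handler that accepts a small set of origins might look like this. The `ALLOWED_HOSTS` set and the handler name are hypothetical.

```python
from urllib.parse import urlparse

import tornado.web
import tornado.websocket

# Hypothetical allow-list; adjust to the hosts that serve your client page.
ALLOWED_HOSTS = {"localhost", "127.0.0.1"}


class EchoHandler(tornado.websocket.WebSocketHandler):
    def check_origin(self, origin):
        # Accept the handshake only when the Origin header names a known host.
        return urlparse(origin).hostname in ALLOWED_HOSTS

    def on_message(self, message):
        # Echo every incoming message back to the client.
        self.write_message(message)


def make_app():
    return tornado.web.Application([(r"/echo", EchoHandler)])


# Usage: make_app().listen(5678), then start tornado.ioloop.IOLoop.current().
```

Returning `True` unconditionally from `check_origin` also clears the 403, but it allows any site to open a socket to the server, so an explicit allow-list is the safer choice.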

Binding port 80 to tomcat in Ubuntu


Edit the file /etc/default/tomcat7

change the line

#AUTHBIND=no

to

AUTHBIND=yes

and run the following commands

$ sudo touch /etc/authbind/byport/80
$ sudo chmod 500 /etc/authbind/byport/80
$ sudo chown tomcat7 /etc/authbind/byport/80

Hope this helps.

Building Hadoop source code


Apache Hadoop is a framework that allows for distributed processing of large data sets across clusters of computers using MapReduce.

The steps listed below build and package Hadoop from source code. This guide assumes a fresh installation of Ubuntu 14.04.

  1. Let's start with installing ...