all that is technology

Syncing files to an AWS S3 bucket using the AWS CLI


AWS CLI is a command line tool used for interacting with AWS services.

In this post, I'll share the command that syncs your local files to an S3 bucket and removes any files in the bucket that are not present in the local folder.

$ aws s3 sync . s3://bucket-name --delete

--delete option …
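Because --delete removes objects from the bucket, it can help to preview a sync first. The sketch below is only an illustration, not part of the original post: it builds the same command from Python and adds the CLI's --dryrun flag by default. The helper names build_sync_command and run_sync are hypothetical, and it assumes the aws executable is installed and configured.

```python
import subprocess


def build_sync_command(local_dir, bucket, delete=False, dry_run=True):
    """Build the argument list for `aws s3 sync` (hypothetical helper)."""
    cmd = ["aws", "s3", "sync", local_dir, f"s3://{bucket}"]
    if delete:
        # Remove objects in the bucket that no longer exist locally.
        cmd.append("--delete")
    if dry_run:
        # Print the planned operations without performing them.
        cmd.append("--dryrun")
    return cmd


def run_sync(local_dir, bucket, delete=False, dry_run=True):
    """Run the sync; raises CalledProcessError if the CLI exits non-zero."""
    return subprocess.run(
        build_sync_command(local_dir, bucket, delete, dry_run), check=True
    )
```

Calling run_sync(".", "bucket-name", delete=True) would list what the sync would upload and delete; passing dry_run=False performs it.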

Securing your webapp with AWS Cognito


AWS Cognito provides authentication, authorization, and user management for your webapps. In this article I will show Angular snippets to perform authentication with AWS Cognito credentials.

First, I'll show the CognitoService class with just signIn functionality. I've removed other operations like register, confirm registration, etc. from the snippet for simplicity …

Read and Write DataFrame from Database using PySpark


To load a DataFrame from a MySQL table in PySpark

source_df = spark.read.format('jdbc').options(

And to write a DataFrame to a MySQL table

          url='jdbc:mysql …
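Since the snippets above are cut off, here is a minimal sketch of what a JDBC round trip can look like. It is an assumption-laden illustration rather than the post's own code: the helper names, the host, database, table, and credentials are all hypothetical, the driver class assumes MySQL Connector/J 8 or later is on Spark's classpath, and spark is assumed to be an existing SparkSession.

```python
def mysql_jdbc_options(host, port, database, table, user, password):
    """Collect the JDBC options Spark needs (hypothetical helper)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Assumption: Connector/J 8+; older jars use com.mysql.jdbc.Driver.
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def load_table(spark, options):
    """Read a table into a DataFrame through the JDBC data source."""
    return spark.read.format("jdbc").options(**options).load()


def write_table(df, options, mode="append"):
    """Write a DataFrame to a table; mode may be append, overwrite, etc."""
    df.write.format("jdbc").options(**options).mode(mode).save()
```

A typical call would be load_table(spark, mysql_jdbc_options("localhost", 3306, "shop", "orders", "app", "secret")), with every argument replaced by your own values.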

Calculate difference with previous row in PySpark


To find the difference between the current row's value and the previous row's value in PySpark, use the approach below.

Let's say we have the following DataFrame, and we want to calculate the difference in values between consecutive rows.

+---+-----+
| id|value|
+---+-----+
|  1|   65|
|  2|   66|
|  3|   65|
|  4 …
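One common way to do this, sketched here as an assumption rather than as the post's exact code, is a window ordered by id combined with the lag function. The function name add_diff_from_previous and the prev_value and diff column names are hypothetical.

```python
def add_diff_from_previous(df, order_col="id", value_col="value"):
    """Return df with 'prev_value' and 'diff' columns added."""
    # Imported inside the function so the sketch loads without a Spark install.
    from pyspark.sql import functions as F
    from pyspark.sql.window import Window

    # No partitionBy: Spark moves all rows to a single partition to order
    # them, which is fine for small data but slow for large tables.
    window = Window.orderBy(order_col)
    previous = F.lag(F.col(value_col), 1).over(window)
    return (
        df.withColumn("prev_value", previous)
          .withColumn("diff", F.col(value_col) - F.col("prev_value"))
    )


# Usage, assuming an existing SparkSession named `spark`:
#   df = spark.createDataFrame([(1, 65), (2, 66), (3, 65)], ["id", "value"])
#   add_diff_from_previous(df).show()
```

With the first three rows shown above, the first row has no previous value, so its diff is null; the next two rows get 1 and -1.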

Getting started with AWS Kinesis using Python


Amazon Kinesis is a fully managed streaming service hosted on AWS. It is used to collect and process large streams of data in real time. Together with Kinesis Analytics, Kinesis Firehose, AWS Lambda, AWS S3, and AWS EMR, you can use it to build a robust distributed application to power your real-time monitoring dashboards, do …

AWS DynamoDB full table scan


Amazon DynamoDB is a NoSQL database service hosted on AWS. It is a fully managed, scalable document store, broadly comparable to MongoDB.

As the data grows, a scan of the full table returns only part of the data per request, along with a LastEvaluatedKey. The application should initiate the scan again …
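The paging loop can be sketched as follows. This is a minimal illustration, not the post's own code: scan_all_items is a hypothetical name, and table is assumed to be a boto3 DynamoDB Table resource (or anything with a compatible scan method).

```python
def scan_all_items(table, **scan_kwargs):
    """Collect every item from a table by following LastEvaluatedKey."""
    items = []
    kwargs = dict(scan_kwargs)
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            # No key in the response means the scan reached the end.
            return items
        # Resume the next request where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


# Usage, assuming boto3 is installed and "my-table" is your table name:
#   import boto3
#   table = boto3.resource("dynamodb").Table("my-table")
#   all_items = scan_all_items(table)
```

Collecting everything in a list keeps the sketch short; for very large tables a generator that yields one page at a time would use less memory.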

Manage multiple SSH keys for Git


Configuring and managing multiple SSH keys for Git accounts on Ubuntu is shown below:

Using ssh-keygen, create a public/private key pair for each account and name them user1_github and user1_bitbucket.

Create a file ~/.ssh/config and put the following in it.

# user1 github account
Host user1_github
PreferredAuthentications publickey
IdentityFile ~/.ssh …

Resample timeseries data with custom function


With time series data, we often need to resample at a different interval to feed into our analytics model.

Pandas resample has a built-in list of widely used methods. However, if the built-in methods are not sufficient, it is always possible to write a custom function to resample.

This post shows …
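As a minimal sketch of the idea, assuming a custom aggregate is what is needed, the example below resamples one-minute readings into three-minute bins using a function that pandas does not provide as a built-in resample method. The function value_range and the sample data are made up for illustration and are not from the original post.

```python
import pandas as pd


def value_range(window):
    """Custom aggregate: spread between the highest and lowest value in a bin."""
    return window.max() - window.min()


def resample_with(series, rule, func):
    """Resample a time-indexed Series, applying func to each bin."""
    return series.resample(rule).apply(func)


# Illustrative data: six readings, one per minute.
index = pd.date_range("2024-01-01 00:00", periods=6, freq="min")
readings = pd.Series([1, 4, 2, 10, 5, 6], index=index)

# Two three-minute bins: [1, 4, 2] and [10, 5, 6].
spread = resample_with(readings, "3min", value_range)
```

Here spread holds one value per bin: 3 for the first three readings and 5 for the last three.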