Create files and folders as shown below. Technically,
app.py is sufficient to create a function …
upstream repository that has to be synced with your fork
- Fetch the
- Pull the
Resolve any merge conflicts
Push to …
To load a DataFrame from a MySQL table in PySpark:
source_df = sqlContext.read.format('jdbc').options(
    url='jdbc:mysql://localhost/database_name',
    driver='com.mysql.jdbc.Driver',
    dbtable='SourceTableName',
    user='your_user_name',
    password='your_password').load()
And to write a DataFrame to a MySQL table
destination_df.write.format('jdbc').options( url='jdbc:mysql …
Let's say we have the following DataFrame, and we want to calculate the difference in values between consecutive rows.
+---+-----+
| id|value|
+---+-----+
|  1|   65|
|  2|   66|
|  3|   65|
|  4 …
Amazon Kinesis is a fully managed streaming service hosted on AWS. It is used to collect and process large streams of data in real time. Combined with Kinesis Analytics, Kinesis Firehose, AWS Lambda, Amazon S3, and Amazon EMR, it lets you build a robust distributed application to power your real-time monitoring dashboards, do …
Amazon DynamoDB is a NoSQL database service hosted on AWS. It is a fully managed, scalable key-value and document database. As a document store, it is quite similar to MongoDB.
As the data grows, a scan operation on the full table returns the data in parts, each accompanied by a
LastEvaluatedKey. The application should initiate the scan again …
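The paging behaviour described above can be sketched as a small generator. This is a minimal sketch, not the post's own code: the helper name `scan_all` is hypothetical, and `table` is assumed to be an object with a boto3-style `scan` method that accepts `ExclusiveStartKey` and returns a dict with `Items` and, when more data remains, `LastEvaluatedKey`.

```python
def scan_all(table, **scan_kwargs):
    """Yield every item from a full table scan, one page at a time.

    `table` is assumed to behave like a boto3 DynamoDB Table resource.
    `scan_all` is a hypothetical helper name used only for this sketch.
    """
    kwargs = dict(scan_kwargs)
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])

        # A missing LastEvaluatedKey means the scan has reached the end.
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break

        # Resume the next scan from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

With boto3 installed and credentials configured, usage would look roughly like `items = list(scan_all(boto3.resource("dynamodb").Table("my_table")))`, where `my_table` is a placeholder table name.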
Configuring and managing multiple SSH keys for Git accounts on Ubuntu is shown below:
Use ssh-keygen to create a public/private key pair and name it as
Create a file ~/.ssh/config and put the following in it.
# user1 github account
Host user1_github
    HostName github.com
    PreferredAuthentications publickey
    IdentityFile ~/.ssh …
With time-series data, we often need to resample at a different interval before feeding the data into our analytics model.
Pandas resample has a built-in list of widely used methods. However, if the built-in methods are not sufficient, it is always possible to write a custom function to resample.
This post shows …
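As a rough illustration of the difference between a built-in method and a custom function, here is a minimal sketch. The sample values, the 30-minute interval, and the `value_range` function are all made up for this example and are not taken from the post.

```python
import pandas as pd


def value_range(window):
    """Hypothetical custom aggregation: spread between max and min."""
    if len(window) == 0:
        return float("nan")
    return window.max() - window.min()


# Made-up sample: six readings taken ten minutes apart.
index = pd.date_range("2020-01-01", periods=6, freq="10min")
series = pd.Series([1.0, 4.0, 2.0, 8.0, 5.0, 7.0], index=index)

# Built-in method: mean of each 30-minute bin.
builtin_mean = series.resample("30min").mean()

# Custom function: applied to each 30-minute bin in turn.
custom_range = series.resample("30min").apply(value_range)
```

Passing a function to `apply` hands each bin to it as a Series, so any reduction that returns a single value per bin can be plugged in the same way.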
Building a Python visualization of the famous birthday paradox, also known as the birthday problem.
The birthday problem asks for the probability that at least one pair in a given set of randomly chosen people shares the same birthday.
Assume a year has 365 days (ignoring February 29th). The following chart shows how the probability increases with …
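The probability itself is easy to compute by working with the complement: the chance that everyone has a distinct birthday is the product of (365 - k) / 365 for k from 0 to n - 1, and the chance of at least one shared birthday is one minus that. A minimal sketch, assuming uniformly distributed birthdays; the function name `shared_birthday_probability` is chosen here for illustration.

```python
def shared_birthday_probability(people, days=365):
    """Probability that at least two of `people` share a birthday.

    Assumes every day is equally likely and ignores February 29th.
    """
    if people > days:
        # Pigeonhole principle: a shared birthday is guaranteed.
        return 1.0

    p_all_distinct = 1.0
    for k in range(people):
        # The (k + 1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days

    return 1.0 - p_all_distinct
```

Evaluating the function over a range of group sizes gives the values one would plot; with 23 people the probability already exceeds one half.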
While maintaining Git repositories, we may sometimes need to add an empty directory as a placeholder, or we may want all the files in a directory to stay local. However, Git tracks only files, not directories, so an empty directory cannot be added to a repository on its own.