Chandan Rajpurohit

An Artist With Technical Skills

  • What is RDD? How to perform Join in Spark Core?

    When you start working with Spark Core, RDD is one of the first important topics you encounter. What is an RDD, and why is it needed? RDD stands for Resilient Distributed Dataset. An RDD can be treated simply as blocks of data. RDD is similar to Directory…


  • Working With Avro

    Data analysis and data processing involve a variety of file formats, and Avro is one of them. What is Avro? Avro is a data serialization and RPC library. It is used to improve data interchange, interoperability, and versioning in MapReduce. Avro was created by Doug Cutting. Avro utilizes a compact binary data…


  • What is Link Spam?

    Once it became clear that PageRank and the other techniques used by Google had made earlier forms of spam ineffective, spammers turned to methods designed to fool the PageRank algorithm into overvaluing certain pages. The techniques for artificially increasing the PageRank of a page are collectively called link spam. We will examine how spammers…


  • HyperLogLog

    Let us build a web analytics tool in which one data point is the number of unique users that visited a URL. The problem we face in implementing this is web scale, as you may have millions of users. A naive MapReduce implementation of the aggregation would use a hashtable to store and count the unique users. Hashtable for…
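
The naive exact count described in this teaser can be sketched in a few lines of Python. This is a single-process illustration under stated assumptions, not a MapReduce job and not HyperLogLog itself: the function name `count_unique_users` and the sample `visits` data are hypothetical. It keeps one set of user IDs per URL, so memory grows with the number of distinct users, which is the cost an approximate counter is meant to avoid.

```python
from collections import defaultdict


def count_unique_users(visits):
    """Return {url: number of distinct users} from (url, user_id) pairs."""
    # One set per URL; every distinct user ID is held in memory.
    users_by_url = defaultdict(set)
    for url, user_id in visits:
        users_by_url[url].add(user_id)
    return {url: len(users) for url, users in users_by_url.items()}


# Hypothetical sample data: alice visits /home twice but is counted once.
visits = [
    ("/home", "alice"),
    ("/home", "bob"),
    ("/home", "alice"),
    ("/about", "carol"),
]
unique_counts = count_unique_users(visits)
print(unique_counts)
```

With millions of users per URL, the per-URL sets become the bottleneck, so the exact answer is traded for a small, fixed-size estimate.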


  • How to Find Friends of Friends

    Social media provides us with recommendations and suggestions for new friends. Most such platforms use the Friends-Of-Friends (FoF) algorithm. Friends-Of-Friends (FoF) is implemented using the shortest-path algorithm. It helps social media platforms broaden their network of users. The friends-of-friends algorithm suggests friends that users may know but who aren’t part of their immediate network.…
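
The idea of suggesting people a user may know but who are not in their immediate network can be illustrated with a small in-memory sketch. It is a plain two-hop expansion over an adjacency map (equivalent to collecting nodes at shortest-path distance 2), and may differ from the implementation the post goes on to describe. The function name `friends_of_friends`, the sample `graph`, and the ranking by number of mutual friends are all assumptions made for illustration.

```python
def friends_of_friends(graph, user):
    """Return (candidate, mutual_friend_count) pairs for `user`.

    `graph` maps each user to the set of their direct friends.
    Candidates are people exactly two hops away: friends of the user's
    friends who are neither the user nor already a direct friend.
    """
    direct = graph.get(user, set())
    candidates = {}
    for friend in direct:
        for fof in graph.get(friend, set()):
            if fof != user and fof not in direct:
                candidates[fof] = candidates.get(fof, 0) + 1
    # Most mutual friends first; ties broken alphabetically.
    return sorted(candidates.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical undirected friendship graph.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}
suggestions = friends_of_friends(graph, "alice")
print(suggestions)
```

Here dave is reachable through both bob and carol, so he ranks ahead of erin, who shares only carol as a mutual friend.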
