
Currently a Research Scholar at IIIT Bangalore. I make videos on ML/DL/NLP topics.

Outlier Detection Methods (Visuals and Code)

Modified Image from Source

Outliers are observations that differ strongly (have different properties) from the other data points in a sample of a population. In this blog, we will go through 5 outlier detection techniques that every “Data Enthusiast” must know. But first, let’s look at where outliers come from.

What are the possible sources of outliers in a dataset?

There are multiple reasons why a dataset can contain outliers: human errors (wrong data entry), measurement errors (system/tool errors), data manipulation errors (faulty data preprocessing), sampling errors (creating samples from heterogeneous sources), etc. Detecting and treating these outliers is important for learning a robust and generalizable machine learning system.
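To make the idea concrete, here is a minimal sketch of one widely used rule of thumb, the interquartile range (IQR) rule. It is only an illustration and is not necessarily one of the five techniques covered in the post; the `heights` data and the function name `iqr_outliers` are made up for this example.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical sample: heights in cm with one data-entry error (1700 instead of 170).
heights = [165, 170, 172, 168, 174, 169, 171, 1700, 167, 173]
print(iqr_outliers(heights))  # [1700]
```

The multiplier `k=1.5` is the conventional default; a larger value flags fewer points.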

Understanding Sampling Methods (Visuals and Code)

Image from Author

Sampling is the process of selecting a subset (a predetermined number of observations) from a larger population. It’s a pretty common technique in which we run experiments and draw conclusions about the population without having to study the entire population. In this blog, we will go through two types of sampling methods:

  1. Probability Sampling: Here we choose a sample based on the theory of probability.
  2. Non-Probability Sampling: Here we choose a sample based on non-random criteria, and not every member of the population has a chance of being included.
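The difference between the two families can be sketched in a few lines. This is a rough illustration: the population of member IDs is hypothetical, simple random sampling stands in for probability sampling, and convenience sampling stands in for non-probability sampling.

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 member IDs

# Probability sampling (simple random sampling): every member has the
# same chance of being selected.
rng = random.Random(42)  # fixed seed so the draw is reproducible
probability_sample = rng.sample(population, k=10)

# Non-probability sampling (convenience sampling): take whoever is easiest
# to reach, here simply the first 10 members. Members 11..100 can never be picked.
convenience_sample = population[:10]

print(probability_sample)
print(convenience_sample)
```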

Readlist of NLP Research Paper Summary Blogs and Videos

Image from Source

This blog aims to organize all the blogs that I have written so far on Medium explaining NLP research papers. For easy searchability, I have grouped the blogs under common high-level topics.

Before you criticize me for not crediting the original authors in this post 😠: all the information about the authors, organizations, and papers is in the respective blog posts. 😊 Happy reading!

P.S. I will keep updating this blog as I add more research paper summaries. (Last updated: 08/06/2021, in dd/mm/yyyy format)

Topics Covered so far…

  1. On-Device NLP…

Summarizing approaches from 10 research papers for Unsupervised Keyword Extraction from Text

Image from Source

This blog lists (all?) popular unsupervised keyword extraction algorithms in Natural Language Processing (NLP). Keywords are the important phrases/expressions that are representative of the underlying document. They can accurately describe the document’s content and facilitate fast information processing. Keywords also act as meta-information for a document and can be used to construct document glossaries.

For this blog, we will focus only on unsupervised techniques, because supervised keyword extraction algorithms require large amounts of manually selected keywords as training data, which might not always be available and are costly and labour-intensive to produce…
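As a point of reference for what "unsupervised" means here, the sketch below ranks words by raw frequency after dropping stopwords, with no labelled keywords needed. It is a naive baseline for illustration only, not one of the algorithms summarized in the post; the stopword list, the function name `top_keywords`, and the sample text are all assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {"a", "an", "and", "are", "as", "be", "by", "can", "for", "from",
             "in", "is", "of", "on", "that", "the", "to", "which", "with"}

def top_keywords(text, n=3):
    """Return the n most frequent non-stopword tokens in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(n)]

doc = ("Keyword extraction finds keywords in a document. "
       "Keywords describe the document, and keywords help search.")
print(top_keywords(doc, n=2))  # ['keywords', 'document']
```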

Research Paper Explanation

Image by Author


Keyword/keyphrase extraction is the task of extracting important words that are relevant to the underlying document. It enables faster search over documents by indexing them under these keywords as document aliases, and it even helps in categorizing a given piece of text by its central topics. Keyphrases can be either abstractive (relevant keywords from outside of the written text) or extractive (relevant keywords present in the written text). Both have their own benefits, but for this blog, we will go through this very interesting piece of work by researchers from Swisscom AG and EPFL on extractive keyphrase…

These were my Medium Statistics for June 2021 which got me a bonus of $500.

Image from Source

Total Stories Published in June

I managed to publish 10 stories in total for the month of June.

Views/Reads/Fans in June

The figure below shows my stats chart with relevant details like views, reads and fans over the month of June.

Slackbot tutorial with Code in Python

Image from Source

Chatbots are software programs built to converse with humans around some theme in order to achieve a defined goal. They can help us automate a particular task or carry on general chit-chat, with the communication medium being either text or voice.

Some popular examples of real-world chatbot deployments include:

  • Restaurants allowing customers to order food from their table using a bot.
  • A bot that helps customers make e-commerce purchases.
  • etc…

There are many platforms where you can deploy your bot and make it available for the public to use. Some of…

Discussing Popular Text Readability Methods

Image from Source

People have been using language as a medium for ages to express their thoughts and emotions on any particular topic or thing. Different authors also have different styles of writing: some might use pretty complex vocabulary, whereas others would use common words to express the same thing. Sentence structure also varies a lot from person to person.

In this blog, we will discuss some of the existing formulas for measuring the complexity/readability of a text document from a linguistics point of view. …
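One well-known formula of this kind is the Flesch Reading Ease score, 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words), where higher means easier to read. The sketch below is only illustrative and the post may cover different formulas; the syllable counter is a crude vowel-run heuristic (an assumption), and the code expects non-empty text.

```python
import re

def count_syllables(word):
    """Rough heuristic: one syllable per run of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    """Flesch Reading Ease for non-empty English text (higher = easier)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

simple = "The cat sat on the mat. The dog ran to me."
dense = "Organizational restructuring necessitates comprehensive communication."
print(flesch_reading_ease(simple) > flesch_reading_ease(dense))  # True
```

Short sentences with short words push the score up; long words with many syllables pull it down sharply.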

Language Models in NLP (Visuals and Examples)

Modified Image from Source

Most modern NLP systems follow a pretty standard approach for training new models for various use-cases: first pre-train, then fine-tune. Here, the goal of pre-training is to leverage large amounts of unlabeled text to build a general model of language understanding, which is then fine-tuned on specific NLP tasks such as machine translation, text summarization, etc.

In this blog, we will discuss two popular pre-training schemes, namely, Masked Language Modeling (MLM) and Causal Language Modeling (CLM).
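The two schemes differ mainly in how training examples are built from raw text. Here is a minimal sketch on a toy token list. The helper names, the `[MASK]` string, and the default masking probability are illustrative assumptions; real pipelines work on subword IDs and add further details.

```python
import random

tokens = ["the", "cat", "sat", "on", "the", "mat"]

def clm_examples(tokens):
    """Causal LM: predict each token from the tokens to its left only."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def mlm_example(tokens, mask_prob=0.15, seed=0, mask_token="[MASK]"):
    """Masked LM: hide some tokens and predict them from both sides."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(mask_token)
            targets[i] = tok  # position -> original token to recover
        else:
            masked.append(tok)
    return masked, targets

for context, target in clm_examples(tokens):
    print(context, "->", target)
print(mlm_example(tokens, mask_prob=0.5))
```

In the causal case the model never sees tokens to the right of the target, while in the masked case the context on both sides of each hidden position stays visible.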

Don’t have time to read the entire blog? …

Understanding Bisecting K-Means Clustering Algorithm (Visuals and Code)

Modified Image from Source

Bisecting K-means is a small modification of the regular K-Means algorithm in which you fix the procedure for dividing the data into clusters. Instead of partitioning the data into K clusters in one go, we start from a single cluster and apply regular K-means with K=2 to split it into two (that’s why the word bisecting), initializing the two centroids either randomly or from some prior, as in regular K-means. We keep repeating this bisection step on one of the existing clusters until the desired number of clusters is reached. …
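The repeated bisection can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the post's implementation: it starts with all points in one cluster, always splits the cluster with the largest within-cluster sum of squared errors (a common choice that the text above does not specify), and seeds each 2-means run with two random distinct points. All function names and the sample points are made up for the example.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def sse(points):
    """Within-cluster sum of squared distances to the centroid."""
    center = mean(points)
    return sum(sq_dist(p, center) for p in points)

def two_means(points, rng, iters=20):
    """Regular K-means with K=2 on a single cluster."""
    centers = rng.sample(sorted(set(points)), 2)  # two distinct starting points
    split = None
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            nearer_second = sq_dist(p, centers[1]) < sq_dist(p, centers[0])
            groups[1 if nearer_second else 0].append(p)
        if not groups[0] or not groups[1]:
            break  # keep the last valid split
        split = groups
        centers = [mean(g) for g in groups]
    return list(split)

def bisecting_kmeans(points, k, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]  # start with everything in one cluster
    while len(clusters) < k:
        splittable = [c for c in clusters if len(set(c)) > 1]
        if not splittable:
            break  # nothing left that can be split
        # Assumption: bisect the cluster with the largest SSE.
        target = max(splittable, key=sse)
        clusters.remove(target)
        clusters.extend(two_means(target, rng))
    return clusters

points = [(0, 0), (0, 1), (1, 0), (10, 0), (10, 1), (11, 0),
          (100, 100), (100, 101), (101, 100)]
for cluster in bisecting_kmeans(points, k=3):
    print(sorted(cluster))
```

For ready-made use, scikit-learn (version 1.1 and later) ships a `BisectingKMeans` estimator in `sklearn.cluster`.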

Prakhar Mishra
