Levenshtein distance is the minimum number of insertions, deletions, and symbol substitutions required to transform string a into string b.
Example: Consider string a: mouse & string b: morse
Levenshtein distance between string a and string b is 1. You need to substitute u with r to transform string a to string b. (Without substitutions, you would need two edits: delete u and insert r.)
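To make the definition concrete, here is a minimal sketch of the standard dynamic-programming approach, assuming every insertion, deletion, and substitution costs exactly 1. The function name `levenshtein` is my own choice for illustration and does not come from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] is the distance between the empty prefix of a and the first j characters of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # curr[0] is the distance between the first i characters of a and the empty string.
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1  # a match costs nothing, a substitution costs 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # keep the match or substitute ca with cb
            ))
        prev = curr
    return prev[-1]


# A single substitution (u -> r) is enough for this pair.
print(levenshtein("mouse", "morse"))  # prints 1
```

The sketch keeps only two rows of the table at a time, so memory grows with the length of string b rather than with the product of both lengths.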
There is also one other modification to Levenshtein distance:
In the above example, Damerau-Levenshtein…
A certain death of an artist is overconfidence — Robin Trower
Remember the time you switched off Google Maps because you were so confident that you knew the way to your destination, only to find that the road was shut due to construction, and you had to use Google Maps after all? That was you being overconfident. How do we tackle overconfidence in machines? Sometimes it is easy to confuse one thing with another. Hence, it is better to be just a little less confident about the things you are 100% confident about.
With the help of machine learning, a system can make decisions comparable to the ones humans make. Machine learning models learn from the data they are given. For now, machine learning tries to imitate the way humans learn in a computationally efficient manner.
Supervised learning uses a dataset consisting of a set of features and the corresponding target label. These features are also known as predictors because they help in predicting the target label for each data sample.
For example: Imagine giving a kid an ice-cream whenever he finishes his…
We like to believe that this is an era of ML and AI, but what we are forgetting is that this is also an era of black-box models, which provide no justification or interpretation of the classifier and the decisions it makes.
One way to make a change is to audit black-box models. Wait… what are black-box models?
A black-box model is one whose inner workings are hidden: one can observe only the input and output variables, not what is going on inside it. …
My father gave me an Excel sheet with his work details and asked me to infer anything and everything that I could. This is what I did.
Wikipedia says “EDA is an approach to analyzing datasets to summarize their main characteristics, often with visual methods”. It is an approach which will help you build a better relationship with the new dataset. If you use it wisely, half of your analysis is done.
For example: When you are buying something online, do you read the reviews? Do you see the price? Do you see the color? Do you swipe through the…
Currently, I am a Master’s in Data Science student at the NYU Center for Data Science. I completed my first ever internship in Data Science and I would love to tell you all about it.
I worked for a contextual-AI company, where we detect toxic behaviors on various platforms.
God! Did I have fun? The answer is — hell YES!
One major thing to take away is how academic Data Science differs from industry Data Science. But the transition is what makes it so exciting.
We talk about working with data all the time to gain specific insights related to our projects, our experiences, maximising profits, increasing revenue and so many other reasons. This is what everyone thinks about all the time. But let us take a step back and think in terms of “actual” data by taking a break from our “hypothetical” data.
It is not always easy to get data. No company will give it to you for free. There are a lot of trades that take place. For instance, you help the other company monetarily and they will help you with the…
What is the first word that comes to your mind when you hear the word “insight”? As a Data Scientist, I use the word insight when I extract some useful information from my data. I believe Data, Information and Insight follow a particular order.
There is a famous hierarchical structure that comes to mind after looking at these three words together:
Note from the editors: Towards Data Science is a Medium publication primarily based on the study of data science and machine learning. We are not health professionals or epidemiologists, and the opinions of this article should not be interpreted as professional advice. To learn more about the coronavirus pandemic, you can click here.
Just to give you a little background as to why I am preprocessing tweets: given the current situation as of May 2020, I am interested in the political discourse of the US Governors with respect to the ongoing pandemic. I would like to analyse how did the…
Masters in Data Science @ NYU