Learn to use word vectors to calculate the semantic similarity of texts programmatically

Photo by Nika Charakova

In a nutshell, word vectors are nothing but series of real numbers that represent the meanings of natural language words. …
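Here is a minimal sketch of the idea. It assumes tiny hand-made vectors; real word vectors, such as those shipped with spaCy's larger models, have hundreds of dimensions. Similarity between two words is commonly measured as the cosine of the angle between their vectors. One simple way to represent a whole text is to average the vectors of its words. The vectors and the helper names cosine_similarity and text_vector are hypothetical.

```python
import math

# Hypothetical 4-dimensional word vectors (real embeddings are much larger).
vectors = {
    "cat": [0.8, 0.1, 0.6, 0.2],
    "dog": [0.7, 0.2, 0.5, 0.3],
    "car": [0.1, 0.9, 0.0, 0.7],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def text_vector(words):
    """Average the vectors of the known words in a text."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("none of the words has a vector")
    dims = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dims)]


# Semantically close words should score higher than unrelated ones.
print(cosine_similarity(vectors["cat"], vectors["dog"]))
print(cosine_similarity(vectors["cat"], vectors["car"]))
```

Averaging ignores word order. It is only a baseline, but it is often good enough for rough text similarity.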


Photo by Igor Shabalin

If you search for ways to improve your stock investment skills, you’ll definitely find ‘do some research’ among other things.

The decision whether to buy, sell, or hold a stock is often made based on emotions, which is definitely not the best strategy. …


Photo by Dmitry Starinov

It’s quite common nowadays to employ computer technologies to predict stock returns. Many stock prediction algorithms rely on machine learning to search for patterns and insights in stock data. Before you can do that, however, you first need to obtain a data set with the necessary stock data and then load it into a data structure in your program. This article covers how you can accomplish those first steps required to get started with stock data analysis in Python.

Getting Stock Data from Yahoo Finance

Where can I get data for analysis? This is perhaps the first question you have once you decide to perform stock analysis programmatically. If so, the Yahoo Finance API is one of the most relevant answers, allowing you to obtain stock market data for free. With the yfinance library, built on top of the Yahoo Finance API, you can obtain the data for a certain stock within a specified time period in Python with just a few lines of code. …
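A call such as yf.download("AAPL", start="2021-01-01", end="2021-03-01") returns the data as a pandas DataFrame indexed by date. A typical next step is computing daily returns with pct_change. The sketch below uses a small hypothetical DataFrame with a Close column in place of a live download, so it runs offline. The prices are made up, and the exact column layout can differ between yfinance versions, so inspect what yours returns before reusing this.

```python
import pandas as pd

# Hypothetical closing prices standing in for downloaded data (rows indexed by date).
data = pd.DataFrame(
    {"Close": [100.0, 102.0, 99.96, 101.0]},
    index=pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03", "2021-03-04"]),
)

# Daily return: relative change of the closing price from the previous trading day.
# The first row has no previous day, so its return is NaN.
data["Return"] = data["Close"].pct_change()

print(data)
```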


Photo by Igor Shabalin

Sentiment analysis is a text classification task: it is about determining whether a piece of writing is positive, negative, or neutral. Given a set of sentences labeled accordingly, you can train a machine learning model and then use it to make predictions on new sentences. This article illustrates that such a model can be trained with just a few lines of code in a Python script that employs the sklearn library.
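A minimal sketch of such a script follows. It assumes a bag-of-words representation (CountVectorizer) feeding a logistic regression classifier. The article may use a different vectorizer or classifier, and the six labeled sentences are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled sentences: 1 = positive, 0 = negative.
sentences = [
    "Great phone, works perfectly",
    "Excellent sound quality, very happy",
    "I love this headset",
    "Terrible battery, very disappointed",
    "Broke after one week, waste of money",
    "Poor quality and awful support",
]
labels = [1, 1, 1, 0, 0, 0]

# Turn each sentence into word counts, then fit a linear classifier on those counts.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(sentences, labels)

# Predict the sentiment of new, unseen sentences.
print(model.predict(["Very happy with this great headset", "Awful, a waste of money"]))
```

Wrapping both steps in a pipeline means new sentences go through exactly the same vectorization as the training data.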

Loading the Data from a Data Set

In the article example, we’ll take advantage of the Sentiment Labelled Sentences Data Set available from the UCI Machine Learning Repository. This data set contains reviews labeled with positive or negative sentiment, taken from three different websites: imdb.com, amazon.com, and yelp.com. So, the data comes in three different files. The data set contains 3000 instances in total. In the article example, we’ll use the instances (labeled sentences) taken from amazon.com …
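Suppose each line of a file holds a sentence followed by a tab and a 0/1 label. That is an assumption about the format, so inspect the file first. Under it, pandas can load the file into a DataFrame as shown below. The in-memory sample and its sentences are hypothetical; in practice you would pass the path of the amazon.com file to read_csv instead.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for the file: "sentence<TAB>label" on every line.
raw = io.StringIO(
    "The case feels cheap.\t0\n"
    "Battery lasts all day.\t1\n"
    "Setup was quick and easy.\t1\n"
)

# header=None: the file has no header row, so we name the columns ourselves.
# QUOTE_NONE keeps any quote characters inside sentences as literal text.
df = pd.read_csv(raw, sep="\t", header=None, names=["sentence", "label"],
                 quoting=csv.QUOTE_NONE)

print(df)
```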


Photo by Igor Shabalin

Finding the necessary information on the internet manually can be quite time-consuming. This article discusses how you might set up your own channel in a messenger to automatically receive the latest information on topics of interest to you. The idea is the following: you create a bot that collects information for your channel and then connect this bot to the channel. You need to make the bot available to everyone, so that the results of bot users’ requests come to your channel.

This approach allows you to discover the latest trends, since you will receive the information that the users of your bot are interested in. For example, if you notice that many users have recently been interested in Tesla stock, it may encourage you to take a closer look at it. And thanks to your channel, you’ll have this information in the form of the latest articles written by stock analysts on this topic. …
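Spotting such a trend can be as simple as tallying the topics users request. The sketch below assumes the bot keeps a hypothetical list of request topics. collections.Counter then reveals which topics are most popular.

```python
from collections import Counter

# Hypothetical topics requested by bot users; a real bot would record these as requests arrive.
user_requests = ["Tesla stock", "Apple earnings", "tesla stock", "oil prices",
                 "Tesla stock ", "Apple earnings"]

# Normalize case and whitespace so the same topic is counted once.
topics = [r.strip().lower() for r in user_requests]

# The two most frequently requested topics, with their counts.
trending = Counter(topics).most_common(2)
print(trending)
```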


Photo by Igor Shabalin

Nowadays, the internet has become the main source of information for most of us. When we need to learn or master something, we typically go online, using a web search engine like Google to obtain the necessary information. Reviewing the retrieved results, however, may take considerable time, requiring you to look into each link to see whether the information it contains really suits you. You can significantly shorten your research time when you know exactly what you want to find and can narrow down your search accordingly.

The problem, though, is that sometimes it’s too hard to explain all your requirements to the search engine. For example, you may need only the latest information about a business entity, that is, only the resources published, say, within the last week. To address this problem, you can conduct an advanced search with a search engine. To accomplish this programmatically, you can take advantage of a web scraping API, which allows you to specify the parameters of the search from within a script. …
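The article does not name the scraping API at this point, so the sketch below only shows the kind of parameter involved. It builds a Google search URL restricted to the past week. It assumes Google's tbs=qdr:w parameter, where qdr:d, qdr:m and qdr:y select a day, month or year. A scraping API would typically accept the query and such options and return the results. The function name build_search_url is hypothetical.

```python
from urllib.parse import urlencode


def build_search_url(query, period="w"):
    """Build a Google search URL limited to results from a recent period.

    Assumption: period is "d" (day), "w" (week), "m" (month) or "y" (year),
    passed through Google's tbs=qdr:<period> parameter.
    """
    params = {"q": query, "tbs": f"qdr:{period}"}
    # urlencode escapes the values, e.g. spaces become "+" and ":" becomes "%3A".
    return "https://www.google.com/search?" + urlencode(params)


print(build_search_url("Tesla"))
```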


Photo by Ivan Brigida

Everything occurs somewhere. That is why the spatial attributes of an object can be no less important than its nonspatial attributes when performing data analysis that involves that object. As a very simple example, consider ordering a taxi via an application. When you order a taxi, you might want to know not only some basic information about the car and the driver assigned to your order but also track their current location on a map while they are heading to you.

Python has a robust ecosystem of existing libraries that can be leveraged in spatial analysis, such as geopandas, shapely, and geopy, to name a few. Since many data analysis scenarios involve both the spatial and nonspatial attributes of an object, you will often need to use these spatial libraries in conjunction with general-purpose data analysis libraries such as Pandas. …
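As an illustration of mixing both kinds of attributes in Pandas, the sketch below stores hypothetical taxis with their driver and car alongside their coordinates. It then ranks the taxis by distance to a passenger. It uses the haversine formula on a spherical Earth as a dependency-free stand-in. With geopy, geopy.distance.geodesic(point_a, point_b).km gives a more accurate ellipsoidal distance. All names and coordinates are made up.

```python
import math

import pandas as pd


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


# Hypothetical taxis: nonspatial attributes (driver, car) next to spatial ones (lat, lon).
taxis = pd.DataFrame({
    "driver": ["Anna", "Boris"],
    "car": ["Sedan", "Minivan"],
    "lat": [51.5074, 51.5155],
    "lon": [-0.1278, -0.0922],
})

passenger = (51.5033, -0.1196)  # hypothetical pickup point (lat, lon)

# Distance from each taxi to the passenger, computed row by row.
taxis["distance_km"] = taxis.apply(
    lambda row: haversine_km(row["lat"], row["lon"], *passenger), axis=1
)

print(taxis.sort_values("distance_km"))
```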


Photo by Sergey Arianov

Samples of natural language are typically contextual and mostly driven by intention. So, as a chatbot developer, your primary task is to ‘teach’ your bot to understand the intent and context of what a user says. In my previous article, I touched upon how you can make your bot recognize the context of a sentence being processed, finding the antecedents of substitutes in the previous discourse. In this article, I’ll share some tips on how you can make your bot recognize the intent behind a sentence, using syntactic dependency labels and named entities.

An Example of a User Request

Whether your chatbot is used for food-ordering, ticket-booking, or taxi-ordering, it needs to understand a customer’s intent to take an order correctly. As an example, consider the following request that might be submitted to a ticket booking chatbot…
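The article's own example request is not reproduced here. As a hypothetical illustration, take the request ‘I want to book two tickets to Berlin’. The sketch below hand-writes a dependency parse of it, using labels in the style of spaCy's English models, though this is not actual parser output. It derives the intent from the direct object and its head verb, and collects the named entities that fill in the order details. With spaCy, the same fields are available as token.lemma_, token.dep_, token.head and token.ent_type_.

```python
# Each token: (text, lemma, dependency label, index of its syntactic head, entity type).
# Hand-written for illustration only; a real bot would get these from a parser.
tokens = [
    ("I", "I", "nsubj", 1, ""),
    ("want", "want", "ROOT", 1, ""),
    ("to", "to", "aux", 3, ""),
    ("book", "book", "xcomp", 1, ""),
    ("two", "two", "nummod", 5, "CARDINAL"),
    ("tickets", "ticket", "dobj", 3, ""),
    ("to", "to", "prep", 3, ""),
    ("Berlin", "Berlin", "pobj", 6, "GPE"),
]


def extract_intent(tokens):
    """Return (verb lemma, object lemma) for the first direct object found, else None."""
    for _, lemma, dep, head, _ in tokens:
        if dep == "dobj":
            return tokens[head][1], lemma
    return None


def extract_entities(tokens):
    """Collect (text, entity type) pairs for tokens tagged as named entities."""
    return [(text, ent) for text, _, _, _, ent in tokens if ent]


print(extract_intent(tokens))
print(extract_entities(tokens))
```

The verb-object pair identifies what the user wants to do. The entities (a quantity and a place) supply the parameters of the order.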


Photo by Igor Shabalin

One of the most challenging tasks in chatbot development is to ‘teach’ the bot to recognize the context of a sentence being processed. A well-designed bot is expected to understand what a pronoun in the current sentence stands for, based on what was said previously in the discourse. Say, a smart bot should understand that in the phrase ‘What do you know about it?’, ‘it’ is a substitute for a word or a phrase mentioned in one of the previous sentences.

Understanding Substitutes

It’s quite common in natural language to use pronouns in place of nouns, proper nouns, or even entire phrases, which were mentioned earlier in a discourse. For those who develop bots, this makes things a bit more complicated. …
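One very simple heuristic resolves ‘it’ to the last noun or proper noun in the previous sentence. It is shown below purely as an illustration, not necessarily as the approach the article takes. The sentences are hypothetical, and their tags are hand-written in the style of spaCy's token.pos_ values. The heuristic ignores number, gender and syntactic constraints, so a production bot needs something sturdier.

```python
# Hypothetical discourse with hand-written coarse part-of-speech tags.
previous = [("I", "PRON"), ("read", "VERB"), ("an", "DET"),
            ("article", "NOUN"), ("about", "ADP"), ("Python", "PROPN")]
current = [("What", "PRON"), ("do", "AUX"), ("you", "PRON"),
           ("know", "VERB"), ("about", "ADP"), ("it", "PRON"), ("?", "PUNCT")]


def resolve_it(previous, current):
    """Naive heuristic: map 'it' in the current sentence to the last noun said before."""
    if not any(word.lower() == "it" for word, _ in current):
        return None
    # Walk the previous sentence backwards and take the closest noun or proper noun.
    for word, pos in reversed(previous):
        if pos in ("NOUN", "PROPN"):
            return word
    return None


print(resolve_it(previous, current))
```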



This article discusses how you can analyze official COVID-19 cases data with Python, employing the Pandas library. You’ll see how you can glean insights from actual datasets, discovering information that may not be so obvious at first glance. In particular, the example provided in the article illustrates how you can derive information about the rate of spread of the disease in different countries.

Preparing Your Working Environment

To follow along, you need to have the Pandas library installed in your Python environment. If you don’t have it yet, you can install it with the pip command:

pip install pandas

Then, you’ll need to pick an actual dataset to work with. For the example in this article, I needed a dataset that includes the total confirmed cases of COVID-19 by country and date. Such a dataset can be downloaded from https://data.humdata.org/dataset/novel-coronavirus-2019-ncov-cases as a CSV file: time_series_covid19_confirmed_global_narrow.csv …
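Once loaded, the data can be reshaped to show how fast the disease spreads. The sketch below uses a few hypothetical rows in a narrow layout: one row per country and date, with cumulative confirmed cases in a Value column. The column names are assumptions, so check them against the header of the downloaded CSV before calling pd.read_csv on it. For each country, the sketch computes daily new cases and the day-over-day growth rate with groupby.

```python
import pandas as pd

# Hypothetical rows: cumulative confirmed cases per country and date.
df = pd.DataFrame({
    "Country/Region": ["A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(["2020-03-01", "2020-03-02", "2020-03-03"] * 2),
    "Value": [100, 150, 225, 100, 110, 121],
})

# Order rows by country and date so differences are taken between consecutive days.
df = df.sort_values(["Country/Region", "Date"])
by_country = df.groupby("Country/Region")["Value"]

# New cases per day, and the relative daily growth of the cumulative total.
df["New"] = by_country.diff()
df["Growth"] = by_country.pct_change()

# Mean daily growth rate per country: a rough measure of the rate of spread.
growth = df.groupby("Country/Region")["Growth"].mean()
print(growth)
```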

About

Yuli Vasiliev is the author of Natural Language Processing with Python and spaCy (2020, No Starch Press, https://nostarch.com/NLPPython)
