About Yandex’s new “Palekh” algorithm in simple words

Recently we held an internal seminar about Yandex’s new Palekh algorithm, and I decided to publish the three main topics of the seminar on the blog. Some things are simplified so that most readers can follow them.

Ask your questions in the comments, and I will try to clarify anything that is unclear.


1. How do text algorithms with dictionaries differ from algorithms based on neural networks?


Simple text algorithms look up thematic words in the corresponding thematic dictionaries, which can also include a dictionary of synonyms. Say the search phrase contains the word “phone”. The synonym dictionary contains the phrase “mobile phone”, and the thematic dictionary contains the word “display”. The presence of these words in the content raises the estimated text relevance of the page for the query “phone”.

What happens if the text contains the word “matrix”, but that word is not in the algorithm’s thematic dictionary? It gets a zero score, and the text loses the relevance this word could have added. Yet everyone understands that “matrix” (as in a display matrix) is related to “phone”; the connection is just a little more indirect.
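
To make the dictionary approach concrete, here is a minimal sketch of such a scorer. The dictionaries and the scores (1.0 for an exact match, 0.75 for a synonym, 0.5 for a thematic word) are hypothetical and only illustrate the idea; they are not Yandex’s actual values.

```python
# Toy dictionary-based relevance scorer (illustrative only).
# The dictionaries and scores are hypothetical, not Yandex's real data.
SYNONYMS = {"phone": {"mobile phone"}}
THEMATIC = {"phone": {"display"}}


def term_score(query_word: str, content_term: str) -> float:
    """Score one content term against one query word using dictionaries only."""
    if content_term == query_word:
        return 1.0
    if content_term in SYNONYMS.get(query_word, set()):
        return 0.75
    if content_term in THEMATIC.get(query_word, set()):
        return 0.5
    # Any term missing from the dictionaries contributes nothing.
    return 0.0


def text_relevance(query_word: str, content_terms: list[str]) -> float:
    """Sum the dictionary scores of all content terms."""
    return sum(term_score(query_word, t) for t in content_terms)


print(text_relevance("phone", ["mobile phone", "display"]))  # 1.25
print(text_relevance("phone", ["matrix"]))                   # 0.0
```

The second call shows the problem described above: “matrix” is related to “phone”, but because it is absent from both dictionaries it adds nothing.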

In fact, any term N is related to any term M to some degree. The stronger the connection between a search term and a content term, the more that content term contributes to text relevance. To estimate the strength of these connections, search engines build neural networks.


The further apart two words are in this network, the weaker their connection, that is, the less related they are in meaning. Is that clear?
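
One common way to model such graded connections (an assumption here; the post does not say exactly how Palekh represents words) is to give every word a vector and measure the cosine similarity between vectors: the closer two vectors point, the stronger the connection. The three-dimensional vectors below are invented purely for illustration.

```python
import math

# Hypothetical 3-dimensional word vectors, made up for illustration.
VECTORS = {
    "phone":   [0.9, 0.8, 0.1],
    "display": [0.8, 0.6, 0.2],
    "matrix":  [0.6, 0.5, 0.4],
    "sofa":    [0.1, 0.0, 0.9],
}


def connection_strength(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means same direction, near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


for word in ("display", "matrix", "sofa"):
    print(word, round(connection_strength(VECTORS["phone"], VECTORS[word]), 3))
```

Unlike the dictionary scorer, “matrix” now gets a high (though not maximal) score instead of zero, while “sofa” stays weakly connected to “phone”.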

The neural network is trained by machine learning algorithms (at Yandex, “Matrixnet”). To build the complex network of connections, the algorithm is shown good and bad examples of connections, and it then learns from those examples on its own.
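
The post does not describe how Matrixnet is trained, so the following is only a stand-in to show the general idea of learning from good and bad examples. It learns a small vector per word with logistic loss and gradient descent, applied one example at a time, so that “good” pairs end up with a high connection score and “bad” pairs with a low one. The word pairs, vector size, learning rate and number of epochs are all hypothetical.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical training pairs: 1 = good connection, 0 = bad connection.
EXAMPLES = [
    ("phone", "display", 1),
    ("phone", "matrix", 1),
    ("phone", "sofa", 0),
    ("display", "sofa", 0),
]


def train(examples, dim=8, lr=0.1, epochs=2000, seed=0):
    """Learn one vector per word so that sigmoid(dot) matches the labels."""
    rng = random.Random(seed)
    words = {w for a, b, _ in examples for w in (a, b)}
    # Small random start; sorted() keeps the result reproducible.
    vec = {w: [rng.gauss(0.0, 0.1) for _ in range(dim)] for w in sorted(words)}
    for _ in range(epochs):
        for a, b, label in examples:
            u, v = vec[a], vec[b]
            p = sigmoid(sum(x * y for x, y in zip(u, v)))
            g = p - label  # gradient of the log loss w.r.t. the dot product
            vec[a] = [x - lr * g * y for x, y in zip(u, v)]
            vec[b] = [y - lr * g * x for x, y in zip(u, v)]
    return vec


def strength(vec, a: str, b: str) -> float:
    """Learned connection strength between two words, in (0, 1)."""
    return sigmoid(sum(x * y for x, y in zip(vec[a], vec[b])))


vectors = train(EXAMPLES)
for word in ("display", "matrix", "sofa"):
    print(word, round(strength(vectors, "phone", word), 3))
```

Real systems train on vastly more examples and much richer models; the point here is only that the connection strengths are not written by hand but fitted to labeled examples.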

2. About the priority of words in search phrases

Let’s say the phrase “blackberry sofa” comes into a search engine.

The search engine estimates how important each word is to the query as a whole, for example, by how often the term occurs in a general or thematic word base. Say “sofa” is a frequent word and “blackberry” is a rare one. Then “blackberry” carries more weight in this phrase than “sofa”, for example, 75% (“blackberry”) versus 25% (“sofa”).
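
A standard way to turn “frequent vs. rare” into numbers is inverse document frequency (IDF): the rarer a word is across documents, the higher its weight. The sketch below uses IDF as an assumption (the post does not say which formula Yandex uses) and hypothetical document counts chosen so that the shares come out to the 75% / 25% split from the example.

```python
import math


def term_weights(query_words, doc_freq, total_docs):
    """Share of importance per query word, based on inverse document frequency."""
    idf = {w: math.log(total_docs / doc_freq[w]) for w in query_words}
    total = sum(idf.values())
    return {w: idf[w] / total for w in query_words}


# Hypothetical counts: "sofa" appears in 100 of 1000 documents, "blackberry" in 1.
weights = term_weights(["blackberry", "sofa"], {"blackberry": 1, "sofa": 100}, 1000)
print({w: round(s, 2) for w, s in weights.items()})  # {'blackberry': 0.75, 'sofa': 0.25}
```

Here log(1000 / 1) is exactly three times log(1000 / 100), which is where the 3:1 ratio comes from.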

What does this mean? If we put all our effort (even 120%) into the word “sofa” alone, we will still rank below a page that contains the word “blackberry”. If we do the opposite, our chances are much higher. I hope this is clear.
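
The consequence can be shown with a deliberately simplified scoring model (an assumption for illustration, not how Yandex actually ranks): a page earns the weight of each query word it contains, and repeating a word adds nothing extra. The weights are the 75% / 25% split from the example; the pages are hypothetical.

```python
# Weights from the post's example: "blackberry" 75%, "sofa" 25%.
WEIGHTS = {"blackberry": 0.75, "sofa": 0.25}


def page_score(page_words, weights=WEIGHTS):
    """Toy model: sum the weights of the query words that occur on the page."""
    present = set(page_words)  # repetition does not add weight here
    return sum(w for term, w in weights.items() if term in present)


sofa_only = ["sofa"] * 20 + ["soft", "cheap"]   # "120% attention" to "sofa"
blackberry_only = ["blackberry", "color"]
both = ["blackberry", "sofa", "color"]

print(page_score(sofa_only))        # 0.25
print(page_score(blackberry_only))  # 0.75
print(page_score(both))             # 1.0
```

However many times the “sofa” page repeats its word, it cannot catch up with a page that covers the heavier word “blackberry”.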

By the way, I have put together an internal memo for copywriters on analyzing key phrases. If you need it, email me with the subject “Memo on analyzing key phrases”.
