WDF*IDF: What can the SEO miracle formula really do?

The struggle to grab the top spot in the search results of Google (and other search engines) is an ongoing battle. It used to be considered almost an SEO sport to pack as many keywords as possible into a site's content, but now the art of search engine optimization revolves around creating unique texts. Whether on the homepage, a subpage, a product page or a category page of your website: exclusive, relevant content that differs from that of competitors in terms of copywriting and keyword usage is key when it comes to outperforming the competition and placing first in the results. A term that is increasingly used in this context is the WDF*IDF analysis or formula.

What is WDF*IDF?

WDF*IDF is an analysis method that can be used within the scope of search engine optimization to determine keywords and terms that sustainably increase the relevance of published texts and, therefore, of the entire website. It’s a formula that multiplies two values: Within Document Frequency (WDF) and Inverse Document Frequency (IDF). The result is the relative term frequency (also called term weighting) of a term in a document, relative to all other web documents that also contain the keyword included in the analysis. Before the WDF*IDF analysis can be run, you first need to determine the two factors mentioned.

How to determine the Within Document Frequency (WDF) value

The WDF describes how often a particular term occurs in a document compared to all other terms it contains. To increase the validity of the determined value, the formula is based on a logarithm that prevents the central term from being weighted too heavily. The term was first mentioned in 1992 by Donna Harman, whose article “Ranking Algorithms” presents WDF as a way to give the words of a particular document a weighting value useful for information science. In website optimization, the WDF value has been used for some time as an alternative to the less flexible keyword density value, which merely reflects the relative frequency of a key term. The formula for determining the Within Document Frequency is:

$$\mathrm{WDF}(i,j) = \frac{\log_2(\mathrm{Freq}(i,j) + 1)}{\log_2(L_j)}$$

The individual components of the equation can be explained as follows:

i	Term whose Within Document Frequency is being determined
j	Document to be analyzed
L_j	Total number of words in the document "j"
Freq(i,j)	Frequency of the word "i" in the document "j"
log2	Logarithm of the number x to base 2

Therefore, the WDF value for a term “i” in the document “j” is determined by adding 1 to the frequency of the term and dividing the result by the total number of words in that document, with the logarithm “log2” applied to both values before the division. This gives you more meaningful results for the term than pure keyword density or relative frequency does. An example can illustrate this:

An examined term that appears 50 times in a 1,000-word document has a Within Document Frequency of 0.57 (rounded). The relative frequency, in this case, is 5 percent. If you now increase the frequency of the term for optimization purposes, to say 500, you get a WDF value of 0.9 (rounded), i.e. a value that is around 1.6 times the original one. On the other hand, if you choose the relative frequency (which has now risen to 50 percent) as the basis, you will see a tenfold increase over the original value.
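The figures in this example can be checked with a few lines of Python. The following is a minimal sketch, assuming the formula WDF(i,j) = log2(Freq(i,j) + 1) / log2(L_j) as described above; the function name `wdf` is a hypothetical helper chosen for illustration and is not part of any tool mentioned in this article.

```python
import math

def wdf(term_count: int, total_words: int) -> float:
    """Within Document Frequency: log2(Freq(i,j) + 1) / log2(L_j)."""
    if term_count < 0 or total_words < 2:
        raise ValueError("need term_count >= 0 and total_words >= 2")
    return math.log2(term_count + 1) / math.log2(total_words)

original = wdf(50, 1000)   # term appears 50 times in 1,000 words
stuffed = wdf(500, 1000)   # term appears 500 times in 1,000 words

print(round(original, 2))            # 0.57
print(round(stuffed, 2))             # 0.9
print(round(stuffed / original, 2))  # growth of the WDF value
print(500 / 50)                      # growth of the relative frequency
```

The logarithm is what dampens the effect: a tenfold increase in raw occurrences raises the WDF value by far less than a factor of ten.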

How to determine the Inverse Document Frequency (IDF) value

Inverse Document Frequency (IDF) is a value that measures the significance of a term, not by its frequency in a particular document, but by its distribution and use across the entire body of documents: the more potential a term has, the higher its Inverse Document Frequency. The optimal case is that a term is very common in just a few documents. On the other hand, words that appear in almost every document or appear only rarely are of minor importance. For example, the word “imprint” has a very low IDF value because it is used on almost every website.

To calculate the Inverse Document Frequency value, the following formula is needed (it also uses a logarithm to adjust the results):

$$\mathrm{IDF}(i) = \log\left(1 + \frac{N_D}{f_i}\right)$$

The different components of the IDF equation can be explained as follows:

i	Term for which the Inverse Document Frequency is being determined
log	Logarithm of the number x to base 10 or to any other base b
N_D	Number of documents in the result set (containing relevant terms)
f_i	Number of documents in which the term i occurs

Therefore, to determine the IDF value of a term "i", divide the total number of (relevant) documents contained in the result set by the number of documents containing the term and then add the number 1. Finally, take the logarithm “log” of the result of that calculation.
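The calculation steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `idf`, the default base 10 and the document counts in the usage lines are hypothetical and chosen purely for demonstration.

```python
import math

def idf(total_docs: int, docs_with_term: int, base: float = 10.0) -> float:
    """Inverse Document Frequency: log(1 + N_D / f_i) to the given base."""
    if total_docs < 1 or docs_with_term < 1:
        raise ValueError("both document counts must be at least 1")
    return math.log(1 + total_docs / docs_with_term, base)

# Hypothetical result set of 1,000 documents
rare = idf(1000, 10)     # term occurs in only 10 documents
common = idf(1000, 900)  # term occurs in 900 documents

print(rare > common)  # True
```

Under this formula, a term that occurs in fewer documents of the result set receives the higher IDF value, and the logarithm keeps that value from growing in proportion to the raw ratio.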

How is the number of all relevant documents in the result set calculated?

Because of N_D, the IDF value cannot be determined in a uniform way. Instead, it depends on the frequency of all meaningful words in the examined document as well as on the underlying absolute number of documents. When analyzing web documents for SEO purposes, that number is potentially huge, as all pages indexed by Google (or other search engines) are eligible. To obtain a specific value nevertheless, the number of search results for every relevant term in the document is determined and the counts are added up. For example, a highly simplified document that only contains the terms “Search Engine Optimization” (17,300,000 search results, December 2017) and “Web Analytics” (2,200,000 search results, December 2017) has an N_D value of 19,500,000.
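The summation in this example can be reproduced with a short Python sketch. It uses the two search result counts quoted above (17,300,000 and 2,200,000). Treating each term's search result count as its f_i value and using base 10 for the logarithm are assumptions made here for illustration only.

```python
import math

# Search result counts quoted in the example above (December 2017).
search_results = {
    "Search Engine Optimization": 17_300_000,
    "Web Analytics": 2_200_000,
}

# N_D: sum of the search results of all relevant terms in the document.
n_d = sum(search_results.values())
print(n_d)  # 19500000

# Assumption: each term's own result count serves as f_i.
for term, f_i in search_results.items():
    idf = math.log10(1 + n_d / f_i)
    print(f"{term}: IDF = {idf:.3f}")
```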

WDF*IDF: The combination of both formulae

Because Within Document Frequency represents the relevance of a term within a particular document and the Inverse Document Frequency reflects the role of a term relative to all of the search result documents, merging both values provides deep insights into the actual term frequency and the potential of the term for optimizing existing text content. For this purpose, it is only necessary to multiply both values, which results in the following overall formula for the WDF*IDF analysis and helps determine the most exact, usable term frequency:

$$\mathrm{WDF{*}IDF}(i,j) = \frac{\log_2(\mathrm{Freq}(i,j) + 1)}{\log_2(L_j)} \cdot \log\left(1 + \frac{N_D}{f_i}\right)$$

In principle, it means bringing all the important components together and using them to determine the validity of terms used in web texts. Of course, the bigger the database, the more meaningful the results are. However, to make the WDF*IDF analysis useful for search engine optimization, it must be applied to all meaningful words within a document. This would simply be too much effort to do manually, which is why a WDF*IDF tool is part of any serious repertoire for calculating term weighting. On the one hand, these programs (see below) help to analyze the existing textual material. On the other hand, they also provide clues as to which concepts a document lacks in order to be as unique and relevant as possible.
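To give an idea of what such a tool does at its core, the following Python sketch applies the combined formula to every word of a document. It is a simplified illustration and not the implementation of any tool named in this article. The assumptions are: words are split on whitespace, N_D is taken as the size of a small hypothetical sample corpus (rather than summed search result counts), the logarithm for the IDF part uses base 10, and the function name `wdf_idf_scores` is made up for this example.

```python
import math
from collections import Counter

def wdf_idf_scores(document: str, corpus: list[str]) -> dict[str, float]:
    """Return the WDF*IDF weight of every word in `document`."""
    words = document.lower().split()
    if len(words) < 2:
        raise ValueError("document needs at least two words")
    counts = Counter(words)
    log_length = math.log2(len(words))            # log2(L_j)
    doc_sets = [set(text.lower().split()) for text in corpus]
    n_d = len(corpus)                             # assumed N_D: corpus size

    scores = {}
    for term, freq in counts.items():
        f_i = sum(1 for doc in doc_sets if term in doc)
        if f_i == 0:
            continue                              # term unknown to the corpus
        wdf = math.log2(freq + 1) / log_length
        idf = math.log10(1 + n_d / f_i)
        scores[term] = wdf * idf
    return scores

# Hypothetical sample corpus; the first entry is the document under analysis.
corpus = [
    "wdf idf helps weight terms in seo texts",
    "seo texts need unique content",
    "the imprint page of a shop",
    "the imprint and the contact page",
]
scores = wdf_idf_scores(corpus[0], corpus)
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term}: {score:.3f}")
```

In this sample, every word occurs once in the analyzed document, so the ranking is driven by the IDF part alone: words found in fewer corpus documents end up with the higher weight.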

Conclusion

The weighted frequency of the term "i" in the document "j" can be determined by multiplying the Within Document Frequency of the term "i" in the document "j" by the Inverse Document Frequency of the term "i" across the result set.

The benefits of WDF*IDF for Search Engine Optimization

The advantages of a comprehensive WDF*IDF analysis are obvious: the values obtained for weighting key terms serve as useful reference points for writing texts so that:

they have high relevance for search engines
they cover topics which do not have a lot of competition
they do not have any keyword spam
and are as unique as possible

Anyone who is dissatisfied with their own website rankings or strives for improved optimization has a helpful ally in WDF*IDF values. Based on analysis data, copywriters can create concrete guidelines for revising their content that aren’t just aimed at increasing the keyword density or incorporating other keywords into the text.

Note

For all the usefulness of a thorough WDF*IDF analysis, you should never forget that content is written primarily for readers and not for search engines. In addition, since search engines are getting better and better at capturing texts semantically, in the long run there is simply no way around strong content in which keywords and other technical additions play just a minor part.

What are the weak aspects of WDF*IDF analyses?

Although WDF*IDF provides very valuable input for website optimization, there are a few issues that should be considered before analyzing and evaluating results. For example, a fundamental problem is that a WDF*IDF analysis always includes all the textual elements of a document, whether they are headings, category/product descriptions, or captions. The individual components are not differentiated. Even if only one paragraph is too keyword-heavy or contains too few elementary terms, the analysis method won’t reveal this, since the frequency weighting is always evaluated for the entire document.

Tip

Before considering a WDF*IDF analysis for your own website, you should carefully check whether the embedded content is suitable for the term frequency analysis method. In addition, the results obtained should be carefully scrutinized in order to detect potential sources of error (too small a database, for example).

Another weakness of the WDF*IDF formula is that it only gets really interesting with a high word count. For shorter passages like product descriptions, smaller blog entries or news articles, the analysis does not provide meaningful, usable results. This is why it’s often unsuitable for certain websites like online stores or news portals. For sites that rely on heavy editorial work, the drawback is that WDF*IDF analysis is difficult to incorporate into the standard workflow. Since fast response times and timeliness are particularly in demand here, optimizing the texts after publishing would be a practical, if complex, solution.

An overview of the advantages and disadvantages of the WDF*IDF analysis

Advantages of the WDF*IDF analysis	Disadvantages of the WDF*IDF analysis
Provides a great opportunity to expose existing keyword spam	Always examines the complete text content of a document
Puts relevance and uniqueness in the foreground as crucial criteria for frequency weighting	Provides no information about specific paragraphs or passages that are worth optimizing
Rates terms with lower competition better than highly competitive ones	Not suitable for short texts with few words
Unites document-specific and cross-document analysis	Hard to integrate into work processes that prioritize timeliness and responsiveness
Flattens values through logarithms for more meaningful results	Precise number of all relevant documents is difficult to determine

What WDF*IDF tools are there?

There are several tools that can be used to perform a WDF*IDF analysis. It is important to distinguish between applications that are only part of an SEO suite and those that are available as standalone solutions. Depending on the range of functions and the usage options, the individual tools differ in terms of cost. To give a brief overview of the variety of applications, we have compiled some of the best WDF*IDF tools in the following list:

OnpageDoc: If you would like to analyze and optimize your website’s SEO status, you can use OnpageDoc, the complete package from SAC Solutions GmbH in Cologne, Germany. If you take out a monthly subscription, you’ll have access to a variety of features to review and improve keywords, meta tags, backlinks and more. A WDF*IDF tool for term weighting analysis and targeted competitive comparison is also part of the portfolio. Those who do not want to access the entire suite can also download the tool for free at wdfidf-tool.com. However, the drawback is that the number of possible queries is limited to 100 per hour (shared among all users).
SEOlyze: Semantic analysis and research based on the WDF*IDF principle can also be done with the paid content analysis section of SEOlyze. Helminger GmbH, which is based in Austria, focuses on helping clients perfect website content and offers various tools like a W-questions tool for research, a duplicate content checker or readability analyses (Flesch/Wiener factual text formula) to achieve this. The centerpiece, however, is the comprehensive WDF*IDF analysis function, the results of which can be implemented directly into the SEOlyze interface, thanks to the integrated editor. In addition to the WDF*IDF tool, the SEO suite includes various rank-tracking features, as well as several other tools for general on-page optimization (keyword analysis, metadata checker, images, links, etc.).
XOVI: XOVI, a subsidiary of Plesk since 2017, provides its customers with an SEO suite that leaves little to be desired. The paid XOVI Toolbox, which is available in multiple languages, comes in three different models (Pro, Business and Enterprise). It also includes tools to keep track of ads, traffic, keywords, backlinks and social signals. The XOVI TextOptimizer also includes a WDF*IDF text tool that not only calculates the relevance of the terms used and suggests other terms based on the first ten Google search results pages, but also allows for direct editing.
Seobility: Seobility offers numerous SEO tools free of charge on its homepage, such as a simple WDF*IDF tool. The web application allows users to determine the weighting of a term based on the WDF*IDF formula. In addition, the tool displays other terms (including their frequency values) that match the word you are looking for. Access to the Seobility program is limited to five analyses per day per user. Users who create an account gain access to the advanced search settings in order to, for example, adjust the base of the logarithm, increase the number of search results considered or select the platform (desktop/mobile) to optimize for.

Reviewer

Melissa Senn
Melissa Senn has been with IONOS since 2016 and focuses on website, e-commerce, social media, and AI in the Digital Guide. As a digital strategist, she uses her extensive SEO and GEO experience to transform complex technological trends into practical content.
