The struggle to grab the top spot in the search results of Google and other search engines is an ongoing battle. Cramming as many keywords as possible into content used to be considered almost an SEO sport, but today the art of search engine optimization revolves around creating unique texts. Whether on the homepage, a subpage, a product page or a category page of your website: exclusive, relevant content that differs from that of your competitors in copywriting and keyword usage is key to outperforming the competition and placing first in the results. A term that is increasingly used in this context is the WDF*IDF analysis or formula.

What is WDF*IDF?

WDF*IDF is an analysis method that can be used within the scope of search engine op­ti­miza­tion to determine keywords and terms that sus­tain­ably increase the relevance of published texts, and, therefore, the entire website. It’s a formula that mul­ti­plies the two values of Within Document Frequency (WDF) and Inverse Document Frequency (IDF). The result is the relative term frequency (also term weighting) of a document, relative to all other web documents which also contain the keyword included in the analysis. Before the WDF*IDF analysis can be run, you first need to determine the two factors mentioned.

How to determine the Within-Document-Frequency (WDF) value

The WDF describes how often a particular term occurs in a document compared to all other terms it contains. To increase the validity of the determined value, the formula is based on a logarithm that prevents the central term from being weighted too heavily. The term was first mentioned in 1992 by Donna Harman, whose article “Ranking Algorithms” presents WDF as a way to assign the words of a particular document a weighting value useful for information science. In website optimization, the WDF value has been used for some time as an alternative to the less flexible keyword density value, which merely reflects the relative abundance of a key term. The formula for determining the Within Document Frequency is:

$$\mathrm{WDF}(i) = \frac{\log_2(\mathrm{Freq}(i,j) + 1)}{\log_2(L_j)}$$

The in­di­vid­ual com­po­nents of the equation can be explained as follows:

i: Term whose Within Document Frequency is being determined
j: Document to be analyzed
L_j: Total number of words in the document j
Freq(i,j): Frequency of the term i in the document j
log_2: Logarithm of a number x to base 2

Therefore, the WDF value for a term “i” in the document “j” is determined by adding 1 to the frequency of the term, taking the base-2 logarithm of the result, and dividing it by the base-2 logarithm of the total number of words in that document. Because both values are passed through the logarithm “log2”, the result is more meaningful for the term than pure keyword density or relative frequency. An example can illustrate this:

A term that appears 50 times in a 1,000-word document has a Within Document Frequency of 0.57 (rounded). The relative frequency in this case is 5 percent. If you now increase the frequency of the term to, say, 500 for optimization purposes, you get a WDF value of 0.9 (rounded), i.e. a value that is only around 1.6 times higher than in the original text. If, on the other hand, you take the relative frequency (which has now risen to 50 percent) as the basis, you see an increase to 10 times the original value.
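The figures in this example can be reproduced with a few lines of Python. The following is a minimal sketch, assuming the WDF formula as described above; the function names `wdf` and `keyword_density` are illustrative and not taken from any particular SEO tool.

```python
import math


def wdf(term_count: int, total_words: int) -> float:
    """Within Document Frequency: log2(Freq(i,j) + 1) / log2(L_j)."""
    if term_count < 0:
        raise ValueError("term_count must not be negative")
    if total_words < 2:
        # log2(1) is 0, so the formula needs at least two words.
        raise ValueError("total_words must be at least 2")
    return math.log2(term_count + 1) / math.log2(total_words)


def keyword_density(term_count: int, total_words: int) -> float:
    """Plain relative frequency of the term, for comparison."""
    return term_count / total_words


wdf_50 = wdf(50, 1000)
wdf_500 = wdf(500, 1000)

print(round(wdf_50, 2))            # 0.57
print(round(wdf_500, 2))           # 0.9
print(round(wdf_500 / wdf_50, 2))  # 1.58

# The relative frequency grows tenfold over the same change.
density_growth = keyword_density(500, 1000) / keyword_density(50, 1000)
print(round(density_growth, 2))    # 10.0
```

The comparison shows the dampening effect of the logarithm: ten times as many occurrences raise the WDF value by a factor of roughly 1.6, while the keyword density rises by a factor of 10.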

How to determine the Inverse Document Frequency (IDF) value

Inverse Document Frequency (IDF) is a value that measures the significance of a term not by its frequency in a particular document, but by its distribution and use across the entire set of documents: the more potential a term has, the higher its Inverse Document Frequency. The optimal case is a term that is very common in just a few documents. Words that appear in almost every document, or that appear only rarely, are of minor importance. For example, the word “imprint” has a very low IDF value because it is used on almost every website.

To calculate the Inverse Document Frequency value, the following formula is needed (it also uses a logarithm to dampen the results):

$$\mathrm{IDF}(i) = \log\left(1 + \frac{N_D}{f_i}\right)$$

The different com­po­nents of the IDF equation can be explained as follows:

i: Term for which the Inverse Document Frequency is being determined
log: Logarithm of a number x to base 10 or to any other base b
N_D: Number of documents in the result sets (containing relevant terms)
f_i: Number of documents in which the term i occurs

Therefore, to determine the IDF value of a term "i", divide the total number of (relevant) documents contained in the result sets by the number of documents containing the term, and then add 1. Finally, take the logarithm “log” of the result of that calculation.

How is the number of all relevant documents in the result set cal­cu­lat­ed?

Because it includes N_D, the IDF formula cannot be determined uniformly. Instead, its result depends on all meaningful words in the examined document as well as on the underlying absolute number of documents. When analyzing web documents for SEO purposes, the number of potential documents is huge, since all pages indexed by Google (or other search engines) are eligible. Nevertheless, to obtain a specific value, the number of search results for each relevant term in the document is determined and the numbers are added together. For example, a highly simplified document that only contains the terms “Search Engine Optimization” (17,300,000 search results, December 2017) and “Web Analytics” (2,200,000 search results, December 2017) has an N_D value of 19,500,000.
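The calculation of N_D and the resulting IDF values can be sketched in Python as follows. This is only an illustration under two assumptions that are not spelled out in the text: the logarithm uses base 10, and the search result count of each term is used as its document count f_i. The function name `idf` is illustrative.

```python
import math


def idf(num_documents: int, docs_with_term: int, base: float = 10.0) -> float:
    """Inverse Document Frequency: log(1 + N_D / f_i)."""
    if num_documents <= 0 or docs_with_term <= 0:
        raise ValueError("both document counts must be positive")
    return math.log(1 + num_documents / docs_with_term, base)


# Search result counts from the simplified example (December 2017).
search_results = {
    "Search Engine Optimization": 17_300_000,
    "Web Analytics": 2_200_000,
}

# N_D: sum of the search results of all relevant terms.
n_d = sum(search_results.values())
print(n_d)  # 19500000

# Assumption: the search result count of a term stands in for f_i.
idf_values = {term: idf(n_d, hits) for term, hits in search_results.items()}
for term, value in idf_values.items():
    print(f"{term}: {value:.2f}")
# Search Engine Optimization: 0.33
# Web Analytics: 0.99
```

The term with fewer search results receives the higher IDF value, which matches the idea that terms occurring in fewer documents carry more weight.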

WDF*IDF: The com­bi­na­tion of both formulae

Because the Within Document Frequency represents the relevance of a term within a particular document and the Inverse Document Frequency reflects the role of a term relative to all of the search result documents, merging both values provides deep insights into the actual term weighting and into the potential of the term to optimize existing text content. To this end, you only need to multiply both values, which results in the following overall formula for the WDF*IDF analysis:

$$\mathrm{WDF}(i) \cdot \mathrm{IDF}(i) = \frac{\log_2(\mathrm{Freq}(i,j) + 1)}{\log_2(L_j)} \cdot \log\left(1 + \frac{N_D}{f_i}\right)$$

In principle, this means bringing all the important components together and using them to determine the significance of terms used in web texts. Of course, the bigger the database, the more meaningful the results. However, to make the WDF*IDF analysis useful for search engine optimization, it must be applied to all meaningful words within a document. This would simply be too much effort to do manually, which is why a WDF*IDF tool is part of any serious toolkit for calculating term weighting. On the one hand, these programs (see below) help to analyze the existing textual material. On the other hand, they also provide clues as to which terms a document lacks in order to be as unique and relevant as possible.

Con­clu­sion

The weighting of the term "i" in the document "j" is determined by multiplying the Within Document Frequency of the term "i" in the document "j" by the Inverse Document Frequency of the term "i" across the result sets.
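To illustrate how the combined value is applied to every word of a document, here is a minimal Python sketch. It makes several simplifying assumptions: the three sample texts are invented for illustration, N_D is simply the number of documents in this small collection (rather than a sum of search results), the logarithm for the IDF uses base 10, and the helper names `tokenize` and `wdf_idf_scores` are hypothetical. Real tools work on far larger data sets and typically filter out stop words.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def wdf_idf_scores(document: str, corpus: list[str]) -> dict[str, float]:
    """Compute WDF * IDF for every term of `document` against `corpus`."""
    words = tokenize(document)
    if len(words) < 2:
        raise ValueError("the document needs at least two words")
    counts = Counter(words)
    corpus_terms = [set(tokenize(text)) for text in corpus]
    n_d = len(corpus)  # simplification: N_D = size of the sample corpus

    scores = {}
    for term, freq in counts.items():
        # f_i: number of corpus documents that contain the term
        f_i = sum(1 for terms in corpus_terms if term in terms)
        if f_i == 0:
            continue  # term never occurs in the corpus, IDF is undefined
        wdf = math.log2(freq + 1) / math.log2(len(words))
        idf = math.log10(1 + n_d / f_i)
        scores[term] = wdf * idf
    return scores


# Invented sample texts, purely for illustration.
corpus = [
    "the imprint of the shop lists the shop address",
    "the imprint names the publisher of the blog",
    "term weighting helps the editor rank the term",
]

scores = wdf_idf_scores(corpus[2], corpus)
ranking = sorted(scores.items(), key=lambda item: item[1], reverse=True)
for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

In the third sample text, "term" and "the" both occur twice and therefore share the same WDF value. Because "the" appears in every document of the collection and "term" in only one, "term" ends up with the higher overall weighting.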

The benefits of WDF*IDF for Search Engine Op­ti­miza­tion

The advantages of a comprehensive WDF*IDF analysis are clear: the values obtained for weighting key terms serve as useful guideposts for writing texts that:

  • have high relevance for search engines
  • cover topics which do not have a lot of competition
  • do not contain keyword spam
  • are as unique as possible

Anyone who is dissatisfied with their website rankings or strives for improved optimization has a helpful ally in WDF*IDF values. Based on the analysis data, copywriters can create concrete guidelines for revising their content that aren’t just aimed at increasing the keyword density or incorporating other keywords into the text.

Note

For all the usefulness of a thorough WDF*IDF analysis, you should never forget that content is written primarily for readers and not for search engines. In addition, since search engines are getting better and better at understanding texts semantically, in the long run there is simply no way around strong content in which keywords and other technical additions play just a minor part.

What are the weak aspects of WDF*IDF analyses?

Although WDF*IDF provides very valuable input for website optimization, there are a few issues that should be considered before analyzing and evaluating results. For example, a fundamental problem is that a WDF*IDF analysis always includes all the textual elements of a document, whether they are headings, category/product descriptions, or captions. The individual components are not differentiated. Even if only one paragraph is too keyword-heavy or contains too few essential terms, the analysis method won’t provide a satisfactory answer, since the frequency weighting is always evaluated for the entire document.

Tip

Before considering a WDF*IDF analysis for your own website, you should carefully check whether its content is suitable for this term frequency analysis method. In addition, the results obtained should be carefully scrutinized in order to detect potential sources of error (too small a database, for example).

Another weakness of the WDF*IDF formula is that it only becomes really informative with a high word count. For shorter passages like product descriptions, smaller blog entries or news articles, the analysis does not provide meaningful, usable results. This is why it’s often unsuitable for certain websites like online stores or news portals. For sites that rely on heavy editorial work, the drawback is that WDF*IDF analysis is difficult to incorporate into the standard workflow. Since fast response times and timeliness are particularly in demand here, optimizing the texts after publishing would be a practical, if complex, solution.

An overview of the ad­van­tages and dis­ad­van­tages of the WDF*IDF analysis

Advantages of the WDF*IDF analysis:

  • Provides a great opportunity to expose existing keyword spam
  • Makes relevance and uniqueness the crucial criteria for frequency weighting
  • Rates terms with lower competition better than highly competitive ones
  • Unites the disciplines of document-specific and cross-document analysis
  • Flattens results through logarithms for more meaningful values

Disadvantages of the WDF*IDF analysis:

  • Always examines the complete text content of a document
  • Provides no information about specific paragraphs or passages that are worth optimizing
  • Not suitable for short texts with few words
  • Hard to integrate into work processes which prioritize timeliness and responsiveness
  • Precise number of all relevant documents is difficult to determine

What WDF*IDF tools are there?

There are several tools that can be used to perform a WDF*IDF analysis. It is important to dis­tin­guish between ap­pli­ca­tions that are only part of an SEO suite and those that are available as stand­alone solutions. Depending on the range of functions and the usage options, the in­di­vid­ual tools differ in terms of cost. To give a brief overview of the variety of ap­pli­ca­tions, we have compiled some of the best WDF*IDF tools in the following list:

  • OnpageDoc: If you would like to analyze and optimize your website’s SEO status, you can use OnpageDoc, the complete package from SAC Solutions GmbH in Cologne, Germany. With a monthly subscription, you have access to a variety of features to review and improve keywords, meta tags, backlinks and more. A WDF*IDF tool for term weighting analysis and targeted competitive comparison is also part of the portfolio. Those who do not want to access the entire suite can also download the tool for free at wdfidf-tool.com. However, the number of possible queries there is limited to 100 per hour (shared by all users).
  • SEOlyze: Semantic analysis and research based on the WDF*IDF principle can also be done with the paid content analysis section of SEOlyze. Helminger GmbH, which is based in Austria, focuses on helping clients perfect website content and offers various tools like a W-questions tool for research, a duplicate content checker or read­abil­i­ty analyses (Flesch/Wiener factual text formula) to achieve this. The cen­ter­piece, however, is the com­pre­hen­sive WDF*IDF analysis function, the results of which can be im­ple­ment­ed directly into the SEOlyze interface, thanks to the in­te­grat­ed editor. In addition to the WDF*IDF tool, the SEO suite includes various rank-tracking features, as well as several other tools for general on-page op­ti­miza­tion (keyword analysis, metadata checker, images, links, etc.).
     
  • XOVI: XOVI, a subsidiary of Plesk since 2017, provides its customers with an SEO suite that leaves little to be desired. The paid XOVI Toolbox, which is available in multiple languages, comes in three different plans (Pro, Business and Enterprise). It includes tools to keep track of ads, traffic, keywords, backlinks and social signals. The XOVI TextOptimizer also includes a WDF*IDF text tool that not only calculates the relevance of the terms used and suggests other terms based on the first ten Google search results pages, but also allows for direct editing.
     
  • Seobility: Seobility offers numerous SEO tools free of charge on its homepage, including a simple WDF*IDF tool. The web application allows users to analyze the weighting of a term based on the WDF*IDF formula. In addition, the tool displays other terms (including their frequency values) that match the word you are looking for. Access to the Seobility tool is limited to five analyses per day per user. Users who create an account gain access to the advanced search settings in order to, for example, adjust the base of the logarithm, increase the number of considered search results or select the platform (desktop/mobile) to optimize for.