Google Algorithm Ranking Factors Revealed
Ruby Media Group Founder Kris Ruby spoke with Gizmodo about the latest Google Algorithm Leak.
“SEO experts say a massive leak of 14,000 ranking features exposes the blueprint for how Google secretly curates the Internet.”
Thousands of pages from Google’s internal Content Warehouse API documentation leaked, detailing 14,014 attributes related to Google’s ranking systems. Two of the attributes are signals related to Covid authority and election scoring. This potentially means Google had an internal config for domain level authority scoring pertaining to elections and vaccine related content.
Kristen Ruby, CEO of Ruby Media Group who has worked in digital public relations and SEO for more than 15 years, tells Gizmodo she received an ominous text on Monday night about the Google leak before it happened.
Kristen Ruby quickly found the leak and noted two ranking features that stuck out to her: “isElectionAuthority” and “isCovidLocalAuthority.” These features seem to be Google’s way of ranking a web page’s credibility for providing proper information about elections and COVID-19, respectively.
In 2019, Ruby wrote extensively about how Google’s measure of trustworthy web pages (which Google refers to as E-E-A-T, standing for Experience, Expertise, Authoritativeness, and Trust) is inherently political.
She notes that Google’s measure of these factors tends to skew along political lines. “It is problematic to me that Google is providing no context on critical items in the data such as ‘isElectionAuthority’ or ‘isCovidLocalAuthority.’ How is Google defining an authority in these critical domains?” Ruby said in an emailed statement.
“I should not have to guess at what the answer is. Google should be forthcoming and tell me what the answer is.”
Even though Google is a business with a right to private information, Ruby argues that Google has an obligation to answer questions on these ranking features that shape the world around us. Google has a right to keep certain IP proprietary. They don’t have to reveal everything. But once it is out there, they do have an obligation to answer real questions on what the information means. If Google wants to maintain their right to secrecy, then they must reciprocate the same right to users who do not want their privacy repeatedly violated with digital surveillance tools.
How does Google deal with Covid and Election content?
“Kristen Ruby, CEO of Ruby Media Group, questioned two of the leaked features: ‘isElectionAuthority’ and ‘isCovidLocalAuthority.’ Ruby is interested in what Google’s criteria are for deeming certain sources authoritative in these sensitive topics.”
The recent leak of Google’s internal search ranking factors generated significant buzz in the SEO community. But what matters to the SEO community pales in comparison to what matters to the public at large. Critical questions remain on how Google scored vaccine authority, election authority, and a “pedo” page rank score.
Natural Language Politics
According to Kristen Ruby, the API documentation shows alleged evidence of political bias and censorship in the Google code pertaining to elections and vaccines. The documents allegedly show that Google may have manipulated search results to control the flow of information on the Internet. In the past, Google has repeatedly denied all allegations of bias and censorship, stating that the documents had been taken out of context and that their systems are designed to be unbiased. Most recently, Google confirmed the validity of the documents and told users not to take the documents out of context.
What is problematic about Google’s ranking of election and Covid related content?
Google stated that people should be cautious against making inaccurate assumptions about search based on out of context, outdated, or incomplete information. Raw data will always be out of context. It is up to a subject matter expert to put that data into context based on their industry experience. It is problematic to me that Google is providing no context on critical items in the data such as “Is election authority” or “is Covid local authority.” How is Google defining an authority in these critical domains? These topics shape the world around us.
Based on my extensive reporting in “Political Bias: How Google Quality Rater Guidelines Impact SERPs,” we know that Google exhibits political bias in SERPs based on how quality raters are told to score content at the page level. What we didn’t know is how that translates to machine learning models and internal levels of authority. Is topical authority evenly distributed? Can someone whose point of view opposes the mainstream media position on a topic still be ranked as an authority on that topic?
The election and Covid lines in Google’s Content Warehouse API documents are cause for concern.
- Did Google implement a specific authoritative metric for election authority and Covid authority?
- Was this only at the topic level or the domain level?
- If so, how did Google measure the expertise on these topics?
- Is an expert only someone who agrees with the narrative being pushed out?
- Or can an expert also be someone who has a different opinion?
My concern is that expertise is not actually about expertise. Rather, expertise may in fact be used as a vector associated with the entity of truth scoring. What if a site is associated with someone Google deems to be a non-authoritative source on Covid or elections? How does that impact the overall site level authority, ranking, and visibility in search?
The Google API documentation is evidence that there are scores of topical authority associated with these highly polarized topics. How was that authority score measured and what impact did it have on search engine visibility?
Google must let people know if this election authority factor is still in place prior to an upcoming U.S. election. Google needs to clarify the meaning of election authority and Covid local authority. If Google refuses to do this, a formal investigation must be launched to uncover how this scoring system was used and what impact this had on search engine visibility during some of the most critical periods in U.S. history.
How did Google use election and Covid authority in its search ranking system?
There is a library we cannot see that pertains to covid-local-authority, covid-vaccine-refinement, and Election-authority. They are internal hotlinks, but the variables are in the system as booleans. This shows it’s a factor in the content assessment tech.
These are internal links to libraries we don’t have access to so we can only speculate on how they may work.
As an example, “Is Covid…” is a Boolean, so it is a yes or no ‘bit’ that says whether a site has the local authority Covid signal. Therefore, it can be assumed that a “local Covid authority signal” exists.
It could also be a way to note that a site can appear in a Covid related search or it could be a way to note a location where someone could get a Covid vaccine.
- How is someone determined to be an election authority?
- What does being an election authority actually mean?
Google can weasel out of the Covid tag by pinning it on local vaccine sites. However, they can’t get away without answering the election authority library of code that determines whether someone is or isn’t an election authority.
The ‘is Covid local authority’ value is either yes or no. There is a separate package of software that processes a page or site and designates whether that page or site is a local Covid authority or an election authority. Imagine every page gets a piece of data that says yes, this is an authority, or no, it isn’t. If a query triggers a config that uses that flag to improve performance, then the flag would determine whether that page can appear or not. For example, during Covid, this could apply to queries about shots.
If you searched for local vaccines, the entities you see in search could be those who offer the vaccine. Could this also be used to say that if you didn’t have that flag you can’t be the one providing Covid shot location information? That would be the censorship concern here.
Another possible concern: if you aren’t designated here then you couldn’t appear. Theoretically, it could be speculated that any related Covid information wouldn’t appear in search depending on the implementation. Did Google require the Boolean to be true in order for you to appear in searches related to Covid? If so, that is a form of censorship.
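To make the difference between these two possibilities concrete, here is a minimal Python sketch. It is purely illustrative: the leaked documentation names the Boolean but does not show how it is consumed, so the `Page` class, the `hard_filter` and `soft_boost` functions, the scores, and the boost value are all my own assumptions. The first function treats the flag as a requirement to appear at all. The second treats it as one signal among many.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    base_score: float  # hypothetical relevance score before any flag is applied
    is_covid_local_authority: bool  # mirrors the leaked Boolean name; semantics assumed


def hard_filter(pages):
    """Hypothetical gate: only flagged pages are eligible to appear."""
    return [p for p in pages if p.is_covid_local_authority]


def soft_boost(pages, boost=0.2):
    """Hypothetical boost: every page stays eligible; flagged pages score higher."""
    scored = [
        (p.base_score + (boost if p.is_covid_local_authority else 0.0), p)
        for p in pages
    ]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for _, p in ranked]


pages = [
    Page("clinic.example", 0.5, True),
    Page("blog.example", 0.6, False),
]

gated = [p.url for p in hard_filter(pages)]    # unflagged page disappears
boosted = [p.url for p in soft_boost(pages)]   # unflagged page is outranked but present
```

In the gated version, the unflagged page never appears, which is the censorship scenario described above. In the boosted version, it still appears but ranks lower. Only Google can say which of these, if either, resembles the real implementation.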
Out of the two lines of code, the election line is most alarming.
Hi @Google
Please explain the election authority signal. 🗳️ pic.twitter.com/A41gz3yDJl
— Kristen Ruby (@sparklingruby) May 29, 2024
The election line of code is more scandalous because it is not focused on the polling places. It is most likely related to election results and following the election and what sites can and can’t appear during high pressure temporal periods. If this is a true override, does it apply to what places online are giving election results? If so, it is a form of censorship to say that one media outlet is allowed to cover the election and receive search engine prominence and another one is not. It could be the election results, it could also be anything related to the election. We just don’t know.
Election coverage – is this web page or website an appropriate website to give information on the election? To me, that is extremely concerning regarding big tech censorship concerns pertaining to U.S. elections. When you put these two together – it becomes problematic.
These two Booleans are the smoking gun in the Google documentation because they show possible manual intervention in search results. They signal a case where the search engine has to be manually managed, and where Google had to figure it out using this type of classifier. If the system runs automatically, as you can see with these API options, the system has to catch exceptions. Requiring this shows that it was hard for Google to figure out. They had to algorithmically figure out whether someone was an election authority or not, and they had to do that differently than whether someone is an authority on any other topic, for example, turtles.
Regarding the election line in the Google documentation, this could be used to cover the election regarding expertise and authority. Since ‘local’ isn’t noted in that line, it seems to be more aligned with manual tagging. Theoretically, it could be related to polling places and actual ‘election,’ but that’s less likely. That’s more of a signal of likely censorship. If that was just tagging polling places it would be called local election polling place or similar.
What are the biggest findings in the Google Search API leak pertaining to censorship?
The word fringe in the API documents is more of a smoking gun than the election and Covid authority lines. Here’s why.
FRINGE:
“The twiddler framework is the part of Superroot (http://go/sr) responsible for re-ranking of results from a single corpus. (The other major ranking component in Superroot is the universal packer, which combines results from multiple corpora, i.e., for universal search.) A twiddler is a C++ object that makes ranking recommendations (twiddles) given a provisional search response from a single corpus.”
The word fringe is a classifier that is equivalent to adult, pedo, etc. There are over 52 references to fringe in the new documents. Fringe is a way to reclassify search queries as controversial.
Google has an entire team evaluating whether a context or concept is on the fringe of society’s acceptability. The fringe ranking team is a cross-department team that weeds out things like conspiracy theories that are not in the mainstream and shows them only to people who want them. If you understand what fringe means and apply it to this API, it becomes very spicy in the realm of Internet censorship. Labeling controversial topics as conspiracy theories can be a way to label real news as fake news. If you label something with a fringe classifier, it will impact the search engine visibility on the topic. We need to further explore the public statements made by the Google fringe results team.
If you cross reference the fringe cases with this API – it is chilling from a censorship perspective. I don’t think Google is properly classifying semantic intent. Instead, I think they are reclassifying queries to their personal intent and throwing semantics to the wind.
Fringe rankings are used to evaluate topics and context. It’s also able to be applied to pages and sites. If you look at this new API data alongside the data Zach Vorhies released a few years ago, you can put together a comprehensive picture of how Google operated during high stakes cases pertaining to the election and the pandemic. In my professional opinion, any alleged censorship is buried in the library behind fringe, which is referenced in the API. These hidden libraries serve as a knowledge vault for queries related to real vs. fake news, elections, and the pandemic.
Based on my research and analysis, the sensational aspect is that the election authority, Covid authority, and adult scoring are commingling with the fringe concept. The fringe concept is controversial because it is a way to label what is and isn’t a conspiracy theory.
What is Fringe?
- What is and isn’t mainstream.
- What is and isn’t a site that’s going to report about vaccines in a way that is mainstream or conspiracy.
- Fringe topics may include mainstream conspiracies and fake news.
Is Google training classifiers to essentially state that non controversial topics are fringe? If so, the machine learning system will make predictions based off of false classifications.
For the first time – how Google actually did that is in the API leak. Everyone searches for queries. Most people are classifying fringe as an unusual or unique query. That’s not actually what fringe is. If Google determines something is controversial or fringe and you don’t agree with it – that is where the potential resides for censorship. For example, if I think a cat is really a dog – if it goes through the process – it will call that belief fringe. No one knew how this actually worked until this document. If you don’t agree with what this team discerns – it is classified as fringe. Fringe is the act of deciding what is and is not mainstream.
The smoking gun in the data is buried in fringe. Many of the fringe classifications are not truly controversial. Google is determining what is and isn’t mainstream and classifying high stakes queries as fringe. The problem is that many of the fringe queries aren’t controversial at all, but they are being labeled as controversial, and flagged as fringe. This is a way to demote content in search and reduce visibility for content that isn’t fringe in the first place.
Did Google censor autocomplete for the Trump assassination query?
Kris Ruby: Yes. Autocomplete isn’t only bound by trends data. The answer depends on if Google encoded it as a knowledge panel entity.
The query is in the knowledge graph. There is story encoding on the actual…
— Kristen Ruby (@sparklingruby) July 28, 2024
TWIDDLER:
Twiddler is an override layer meant to clean up results for queries deemed controversial that the normal algorithm wouldn’t catch. Words represent controversial queries. Then the Twiddler kicks in and adapts the search results in a non-standard way. The problem is that these are not actually controversial queries. So why did Twiddler kick in for these queries at all? If you review the blacklists Mr. Vorhies shared many years ago, you will see that Google is using this as a way to reclassify non-controversial queries as controversial. The recent API documentation shows the importance of Twiddler. The words Mr. Vorhies shared were part of a Twiddler query blacklist, and many of those queries are not actually controversial at all. Twiddler never should have been used for these words. All of this is a system of misclassification at scale. For example, the ADL is not a controversial query. Twiddler never should have kicked in for it.
When you look at the fringe cases in the data, you will see that there are many, and the term fringe is frequently misused. Think of fringe as manual intervention. Twiddler is put on the side of the main system so it can handle something and treat it uniquely. The same thing is happening for the fringe query. For example, Zach’s documents show that in 2018, queries related to the NRA were being flagged as controversial. The new API data compounds this issue by showing how expansive and comprehensive the scope is. When you match the module name to the API list, you can cross-reference the data. There are specific functions that interact with Twiddler. Everyone already knew what it does. However, what they did not know is how it could work and interact with other systems. This was revealed quite significantly in the new data. It tells you how Twiddler could theoretically function for high stakes queries.
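One way to picture how a twiddler could theoretically function is the Python sketch below. It follows only the description quoted earlier: a twiddler receives a provisional response from a single corpus and returns ranking recommendations. Everything else is an assumption for illustration. Google’s twiddlers are described as C++ objects, and the `Result` class, the `demote_label` and `rerank` functions, the "fringe" label, and the 0.5 multiplier are hypothetical names and values of mine, not anything taken from the leaked documents.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Result:
    url: str
    score: float  # provisional score from the main ranking pass
    labels: frozenset = frozenset()  # hypothetical classifier labels


# In this sketch a "twiddle" is a score multiplier per URL (an assumption).
Twiddler = Callable[[List[Result]], Dict[str, float]]


def demote_label(label: str, factor: float) -> Twiddler:
    """Build a hypothetical twiddler that recommends demoting labeled results."""
    def twiddler(results: List[Result]) -> Dict[str, float]:
        return {r.url: factor for r in results if label in r.labels}
    return twiddler


def rerank(results: List[Result], twiddlers: List[Twiddler]) -> List[Result]:
    """Combine every twiddler's recommendations, then re-sort the results."""
    multipliers = {r.url: 1.0 for r in results}
    for twiddler in twiddlers:
        for url, factor in twiddler(results).items():
            multipliers[url] *= factor
    return sorted(results, key=lambda r: r.score * multipliers[r.url], reverse=True)


provisional = [
    Result("a.example", 0.9, frozenset({"fringe"})),
    Result("b.example", 0.8),
    Result("c.example", 0.7),
]
final = [r.url for r in rerank(provisional, [demote_label("fringe", 0.5)])]
```

In this toy example, the top provisional result drops to last place once it carries the label. That is why the accuracy of the label matters so much: if a non-controversial page is mislabeled, a re-ranking step like this one would reduce its visibility regardless of how well it scored originally.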
The Future of AI Search
In the era of AI answers, Ruby notes that the way Google ranks web pages is more important than ever. Instead of a series of links to various perspectives, you might just get one straight answer thanks to Google’s new AI Overviews.
However, we’ve seen 10-year-old Reddit posts get strange amounts of authority, telling some users to put glue in their pizza. How Google chooses authority is increasingly important, since the top result may be the only one with a voice now.
“We are switching gears. We are moving from one system of search to another,” Ruby said. “AI is impacting search results in profound ways.”
The search engine industry is in the middle of a rapid change. Google once held market dominance, but recent botched product launches have led to decreased consumer trust and paved the way for new AI search competitors to eat into the highly coveted search monopoly that Google once held. How we search for information and retrieve information today will be extremely different than how we did yesterday. The algorithmic changes matter less than understanding the foundational changes of the entire system of search and information retrieval.
What should people who aren’t in SEO take away from this algorithm “leak”?
The search engine industry is in the middle of a rapid change, driven by the integration of artificial intelligence technology. Understanding how to adapt your search engine optimization strategy to rank in AI search results is what matters most. The battle to compete for consumer attention is changing with machine learning, and those who know how to rank their business in the new system will ultimately reap the benefits of visibility.
The foundation of how we search for and retrieve information is rapidly shifting. There is a difference between how Google ranks content and how we search for content.
How will AI impact search results in the future?
AI will impact search engine results more than anything shown in these files.
The files are the Rosetta Stone for search engine marketing professionals, but what matters is the change in the underlying machine. Think of it like a car. We are switching gears. We are moving from one system of search to another. Most SEOs are deep in the technical weeds of this discussion and do not realize that the majority of Americans are unaware of the profound shift that has taken place in how we search for and retrieve information.
AI is impacting search results in profound ways. If creators stop publishing online because they don’t want their business content used to train a model, that impacts search. Creators’ participation in the creator economy will impact search more than anything else. If people start opting out of the current information environment because they can’t trust that their work won’t be stolen by an AI company, that will wreak havoc on the search landscape. It will ultimately result in people with less authority having more authority, and people with real authority fleeing the Internet. This means non-subject matter experts will be prioritized over real subject matter experts. That is not a Google problem. It is an information architecture problem powered by AI companies who fail to see their role in this.
A NOTE TO SEOS:
It is time to choose truth over rankings.
Overview of what the leaked Content Warehouse API documentation reveals about Google’s ranking features:
Google tells SEOs one thing and does another. SEO professionals blindly accept this because it is not advantageous to question Google when your livelihood depends on ranking clients in Google search results. It is not in their best interest to pick a fight with Google. Not only will doing that hurt them, it will also potentially hurt their clients’ ability to rank if any retaliatory action is ever taken. Unfortunately, this is why some search engine industry professionals remain silent on critical issues facing our country. The incentive and reward structure is directly in opposition to reporting on the internal technical issues in the industry.
The problem is not only the bias of Google, but the bias of those who work in the SEO industry. Some SEOs conflate their opinion with facts and present that opinion as definitive fact to people who do not know what they are looking at. This is a serious issue when only a few people know how to interpret this type of data, and those who interpret the data often misinterpret it to their own political advantage. By political advantage, I am referring to their ability to climb the corporate ladder in the hierarchy of the top SEO positions in the United States.
Unfortunately, those who question the data or accurately interpret the data will find themselves on the outskirts of the SEO community and on Google at large. Over time, technical experts have learned that questioning the political nature of Google can result in a botched knowledge graph. Political warfare is intertwined with information warfare. This is a threat to the future of the entire information infrastructure. Google will bury this under veiled policy decisions that few ever read as justification for fringe use cases. But the real problem is that fringe use cases are often not fringe. Non-controversial topics are being mislabeled as controversial. This is a way to distort reality in the wrong direction.
SEOs realize the power of their words when it comes to ranking in search. They know that discussing controversial topics like a “pedo page score” means they will get dinged on Google for being associated with illegal and nefarious entities. Therefore, they choose their own search engine visibility over reporting the truth because they don’t want to hurt their search appearance. There are many unwritten rules, and topics like “pedo page score,” vaccine content, or election authority scoring are among the many that are considered off the table.
There must be a better way to differentiate those who report on these entities from those who are the entities. If people fear reporting on these topics because Google will confuse the entity as being associated with the topic, it hurts the future of news and search. If people are afraid to report accurate information because it hurts their reputation score in search, this will lead to a decline of factual reporting. We’re seeing this issue play out on AI, where AI does not understand the difference between researching a topic and searching for information. An academic researcher or reporter Googling “pedo page score” is not the same as someone looking for this content. As it stands, AI does not know the difference and will build a user profile showing someone is searching for this content.
How is it possible that I am the only one in SEO who noticed this pedo page score? Certainly, the brightest technical minds in SEO must have noticed it too. So why is it that when this story came out in the national media, not a single person in the SEO community raised a question about it? In fact, SEOs have been doing the rounds on national television, and still, this has not been mentioned.
The real SEO secret revealed is that the SEO community constantly buries news. Whether they realize it or not, they continue to do Google’s bidding. Google will have to do minimal damage control with this group of SEOs because they can continue to mislead the public. There are no consequences because the SEO industry will not question Google. They can discover a pedo page score and will still say nothing. While this group runs around touting how diverse and inclusive they are, they fail to adhere to any real constructs around political diversity or diversity of opinions or thought. This was the same group that attempted to cancel me in 2020 for raising critical questions about how Google ranked covid related content and the impact on medical practices.
The latest Google API documentation proves what I said in 2020 was correct. But to some politically charged SEOs, facts don’t matter. What matters is a hive mind mentality, and not straying from the approved Google narrative. Despite the fact that everything I said was correct, they still won’t acknowledge they were wrong. You cannot cancel someone for providing accurate information. I provided accurate SEO analysis but was shunned for saying the quiet part out loud.
We must start rewarding search engine marketing professionals who provide accurate counsel to clients during hard times, rather than those who outright lie and cancel anyone who disagrees with them. It is patently absurd that to be accepted in SEO, one is expected to peddle nonsense. Those who hold the keys to the digital castle have a moral and professional responsibility to question code that changes not only rankings, but elections. That they choose not to do that shows that at the end of the day, they choose their own visibility over national visibility on critical issues.
I was willing to go on air and provide accurate SEO advice to the public when none of them would. They lacked the courage, fortitude, and bravery to give sound counsel because they chose personal politics over empirical evidence, data, and facts. Four years later, the truth comes out on how Google really ranked content pertaining to elections and Covid content. Keep in mind that the SEO who revealed this story earlier in the week stated that this was a great thing, and he agreed with Google’s decision.
When data influences politics, you have a moral obligation to accurately report the data. Your opinion is not a fact.
ABOUT RUBY MEDIA GROUP | KRIS RUBY
Ruby Media Group helps clients rank in search with SEO and digital PR strategies. Kris Ruby reports on the politics of big tech and social media. Kris Ruby is the author of The Ruby Files, the real story of AI Censorship and how Twitter used natural language processing to censor speech at scale. Kris Ruby has worked in the search engine optimization (SEO) industry for more than a decade.
PRESS:
Gizmodo: Leaked Documents Reveal How Google Search Gatekeeps the Internet
SEO Roundtable: Google Confirms Search Leak But Urges Caution
Fox News: Google Gemini using invisible commands to define toxicity
Ruby Media Group is an award-winning NY Public Relations Firm and NYC Social Media Marketing Agency. The New York PR Firm specializes in healthcare marketing, healthcare PR and medical practice marketing. Ruby Media Group helps companies increase their exposure through leveraging social media and digital PR. RMG conducts a thorough deep dive into an organization’s brand identity, and then creates a digital footprint and comprehensive strategy to execute against.