

Dr. Anthony Fauci recently stated that the Coronavirus behind the current COVID-19 pandemic “is very efficient,” meaning it has the ability to protect and advance itself through its environment with explosive propagative efficiency.

This quote illustrates that viruses have a billion-year evolutionary head start on humans. A virus may infect a horseshoe bat with little to no effect, because of the bat's longer evolutionary history with such viruses; humans, on the other hand, do not have the same evolutionary advantage. Natural selection over eons has given the virus the power to proliferate at the rate of millions of human host infections during a single outbreak.

Natural selection is a theoretical concept that applies to all living beings on this earth. It holds that a being that selects the best means of survival for its kind, while overcoming challenges in its environment, improves its chances of remaining viable and of reproducing and regenerating. As part of this process, that being, whether an animal, a human or another organism, outcompetes other beings in this endeavor, resulting in the decline or extinction of others in nature that compete with it.

The propagative effects of the virus resemble certain mathematical theories. The authors posit that applying these mathematical theories in reverse of the disease's progression will help mitigate it and prevent it from progressing further, which is commonly referred to as "flattening the curve." We have succeeded in using these same mathematical theories to identify the baseline (minimum) level of mitigation required to flatten the curve and ultimately reverse it.

Trend Analysis vs. Data Modeling vs. AI

The prominent methods of statistical analysis utilized to reliably predict future trends, including the spread of diseases such as viruses, are Trend Analysis and Data Modeling.

Trend Analysis uses a minimal amount of the most recent data to estimate the path the data will follow in the near term. Naturally, the more data accumulated over the course of collection, the better and longer-term the trend projections can be. Trend analysis typically focuses on one subject at a time and does not account for all variables (because the data subset is small), but it can later be combined with other analyses to achieve better, longer-term projections.
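As a minimal sketch of the kind of short-term trend analysis described above, the Python function below fits an ordinary least-squares line to only the most recent observations and extrapolates it forward. The function name and the 7-point default window are illustrative assumptions, not a method taken from this paper or from the task force.

```python
from typing import Sequence


def linear_trend_projection(values: Sequence[float], steps_ahead: int, window: int = 7) -> float:
    """Fit a least-squares line to the last `window` values and extrapolate it.

    `values` are equally spaced observations (e.g. daily counts), oldest first.
    """
    recent = list(values[-window:])  # trend analysis uses only the most recent data
    n = len(recent)
    if n < 2:
        raise ValueError("need at least two observations to fit a line")
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(recent) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, recent))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Project `steps_ahead` steps beyond the last observation (x = n - 1).
    return intercept + slope * (n - 1 + steps_ahead)


# The older value 100 falls outside the 4-point window and is ignored.
print(linear_trend_projection([100, 1, 2, 3, 4], steps_ahead=2, window=4))
```

Because only the last few points are used, the projection reacts quickly to recent change but says nothing reliable about the longer term, which is the limitation the paragraph above describes.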

Data Modeling, on the other hand, accumulates large and varied data sets to achieve longer-term, more nuanced projections. It is the preferred method whenever sufficient data is available.

Artificial Intelligence builds on data modeling techniques to project trends beyond what data modeling alone can achieve. It uses multiple algorithmic techniques drawn from mathematical theory to create accurate, reliable projections of complicated trends such as the spread of viral diseases.


Pandemics and the Advent of Data Modeling

Defining a standard of testing during a pandemic

The White House Coronavirus task force has employed various data modeling techniques to attempt to project the course of this epidemic. With the help of the University of Washington, it has provided a semblance of the direction of the epidemic, which it has reported in its daily briefings. A key problem with the task force’s approach is that most of its projections utilize trend analysis and not data modeling.

Data modeling requires much more extensive data and detailed mathematical algorithms. It is able to arrive at substantially accurate projections even for the long term. Without it, projections are often unreliable and inaccurate. Let me give you an example of the unreliability of the trend analysis techniques employed by the Coronavirus task force.

On April 6, 2020, the Coronavirus task force initially projected 100,000 to 240,000 COVID-19 deaths nationally by the end of the pandemic. The week of April 13, it revised its projection to only 60,000 deaths by August. The revision occurred because the death toll had risen to only about 20,000 nationwide at the time. However, within one week, that is, by April 20, the death toll had doubled to over 40,000. This meant that 60,000 virus deaths looked highly probable by the end of April or sooner. The task force's latest revised trend analysis projection was therefore off by at least three months.
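The arithmetic behind that conclusion can be checked with a simple linear extrapolation. The sketch below assumes, for illustration only, about 20,000 deaths on April 13 and 40,000 on April 20 (the approximate figures above) and a constant daily increase; a doubling (exponential) trend would cross 60,000 even sooner. The function name is hypothetical.

```python
from datetime import date, timedelta


def date_reaching(target: int, d0: date, v0: int, d1: date, v1: int) -> date:
    """Linearly extrapolate a cumulative count from two observations.

    Returns the first whole date on which the count reaches `target`,
    assuming the daily increase between d0 and d1 stays constant.
    """
    span_days = (d1 - d0).days
    increase = v1 - v0
    if span_days <= 0 or increase <= 0:
        raise ValueError("need a positive increase over a positive span of days")
    remaining = target - v1
    if remaining <= 0:
        return d1
    # Integer ceiling division avoids floating-point rounding in the day count.
    days_needed = -(-remaining * span_days // increase)
    return d1 + timedelta(days=days_needed)


print(date_reaching(60_000, date(2020, 4, 13), 20_000, date(2020, 4, 20), 40_000))
```

Under these assumptions the 60,000 mark is reached in late April, months ahead of the revised August projection.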

To achieve accurate long-term data modeling during a crisis, it is therefore critical to obtain a substantial, reliable sample set. Essential to that task is standardized data accumulation and tracking.

Numerous states have complained about the lack of testing and have feverishly sought to increase their own testing, because they recognize that testing is the best barometer available to predict the need for vital resources such as Personal Protective Equipment (PPE) and ventilators, and to provide accurate predictions to the public. A foundational problem, however, is that a standard system of data collection and analysis must be created so that projections are accurate and comprehensive, long-term and nationwide. That standard needs to be both universally applicable and universally applied, so that all participating parties can accurately evaluate their own situation, relate and compare their data to that of their counterparts, and thus evaluate how both their area and the nation as a whole are doing in addressing the problem.


A Standardized Approach to Testing Evaluations

TEC Factor (Test Efficiency Co-Factor)

Unfortunately, the government has not created a standard for data collection and analysis, not even one specifying the minimum number of tests that must be recorded to form a foundation for analysis. iQwest Information Technologies therefore created a system called the Test Efficiency Co-Factor, or TEC Factor, to help officials easily communicate their virus emergency goals, including when to safely resume normal economic activity. Those goals would be based on needs and trends that have been identified and reliably validated by a TEC Factor number. We propose using 4 different parameters in the calculation of this number.

We have purposely designed the TEC Factor to be in the range of 1 to 100, with the goal of each state reaching a TEC Factor of 100 to begin affecting the progression of the virus and reducing it. States that go over 100 will be able to tackle the progression faster; 100 is the minimum they need to reach.

Based on initial estimates from all states up to April 25th, the total number of tests required across the country to allow states to get control of the pandemic approaches 20 million. This does not account for infected people having to retest multiple times to determine whether they are still infected.

Since 2010 iQwest Information Technologies has been utilizing data modeling that incorporates artificial intelligence. We have successfully used these metrics for “Big Data” analysis for major clients in the Legal, Life Sciences, Pharmaceutical, Insurance and other fields. Our company has published several articles on this topic that demonstrate the science and reasoning behind this approach. Two such articles referencing these data modeling techniques are available at the following links:

Finding the needle in the digital multilingual haystack

Case Study: How iQwest Helped Samsung Win

The following are the 4 parameters we used to arrive at the TEC Factor.

  1.  Minimum of 4 percent of population testing

Our extensive experience shows that there needs to be a minimum 2-percent sample set of data to even start making any reasonably reliable, albeit limited, projections. With at least 4 percent of data collected, however, certainty increases dramatically for longer-term projections. A data collection of 2-percent, therefore, produces short-term trend analysis, while 4-percent data collection yields more accurate and longer-term data modeling.

Therefore, gathering a substantial sample set (at least 4 percent) of COVID-19 patients would enable accurate data modeling of the virus. This figure is used as a goal in our calculations to determine baselines for progress; before that goal is reached, we can still make broad predictions. The CDC does something similar from time to time, using "Representative Random Sampling" of diseases to better predict their nature and progression.

Since this epidemic is not of a constant nature, the 4 percent of data cannot be gathered at a single point on the viral timeline, whether earlier or later in the epidemic. For example, if some regions lag behind this 4 percent target in testing, the effort to make up for lost time will grow exponentially in terms of number of tests. Most of the virus tests performed so far have occurred in the latter part of the crisis (4 million tests as of the week of April 20, which represents less than 2 percent of the U.S. population). This is simply not enough data, either numerically or representatively, to allow government officials to make accurate longer-term projections.

However, this testing target is only one parameter necessary to develop the data model and projections that are the ultimate goal. The cycle of chasing the target number will continue endlessly if we are not vigilant in proactively testing and removing the ill and their immediate contacts from the general population. We therefore propose three additional parameters to help formulate a thorough, accurate projection, since testing at this 4 percent rate is only one of the many mitigation factors.

  2.  Progression/Regression Intersects

The second parameter we propose factors in the progression of the disease. Experience in other countries demonstrates that if mitigation efforts are taken, the general course of the disease is approximately 12 weeks. The following chart illustrates the general course of this pandemic in an afflicted area; the curve may be higher or flatter depending on testing and other mitigation factors. Chart 1 illustrates this progression and regression pattern. Since this pandemic is not over, the regression part of the chart has not yet been reached; it is aspirational and figures into our calculations as the desired outcome. A key point in evaluating this parameter is that most progressions have occurred on a 30-45 degree incline, and this figures into our calculation of this parameter.

We posit that to best understand and mitigate the spread of the virus, it is necessary to examine the graph of the viral progression overlaid with the graph of its regression, essentially splitting the above graph in half and folding it onto itself. Chart 2 below illustrates the disease progression (orange line) versus the disease regression (yellow line). This graph demonstrates an actual reversal, not just a flattening, of the viral trend if testing is performed at a sufficient rate. Our study indicates that to mitigate the viral spread, testing needs to be performed, at minimum, in reverse of the viral trend, producing a chart that mirrors the disease progression graph.

The intersection point on the overlaid chart reveals the key point required for the calculation of our second parameter, defined as the mean average of progression/regression or intersect.
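Locating the intersection of two overlaid curves is a standard numerical step. The sketch below is a generic illustration, not iQwest's calculation (which the paper does not publish): it takes two daily series sampled on the same days, treats each as a straight line between samples, and returns the first day on which they meet. The function name and the synthetic series are assumptions for illustration.

```python
from typing import Optional, Sequence


def intersect_day(progression: Sequence[float], regression: Sequence[float]) -> Optional[float]:
    """Return the first (possibly fractional) day index where the two curves meet.

    Both series are sampled on the same days. Between samples the curves are
    treated as straight lines, so the crossing is found by linear interpolation.
    Returns None if the curves never meet.
    """
    n = min(len(progression), len(regression))
    for t in range(n - 1):
        d0 = progression[t] - regression[t]
        d1 = progression[t + 1] - regression[t + 1]
        if d0 == 0:
            return float(t)
        if (d0 < 0) != (d1 < 0):  # the difference changes sign between t and t + 1
            return t + d0 / (d0 - d1)
    if n > 0 and progression[n - 1] == regression[n - 1]:
        return float(n - 1)
    return None


# A steadily rising 6-week curve (days 0-42) folded onto its mirror image.
rising = list(range(43))
print(intersect_day(rising, rising[::-1]))
```

For a symmetric rise folded onto itself, the crossing falls exactly at the 21-day midpoint of the 6-week window; real progression data that is not symmetric will cross elsewhere.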

The point of intersection (the mean average number) serves as a defined barometer of comparison. We can use this number to compare regions (State, City or Zip Code) and evaluate progress from one region to another. This becomes the second parameter in our calculations, and its actual value is further refined in the definition of the next parameter. To calculate this parameter properly, we rely on the 30-45 degree rate of incline referenced in previous sections. If mitigation efforts, especially widespread testing, are not maintained, this number will gradually shift toward the 21-day intersect point, and possibly below it for some states.

To create an average that can serve as a barometer, our initial calculations place this number at 28.16 days as of the first week of April 2020, or roughly 4 weeks into a 6-week period. Notice that this is later than the 21-day halfway point of a 6-week period that has been presumed to be the midpoint of the evaluative duration. Based on our calculations, we believe the midpoint will realistically occur at 28.16 days rather than the 21 days illustrated in the chart above. This is primarily due to the progression we have seen so far, and it may change slightly over time. Since our primary goal was to establish the TEC Factor in a range of 1 to 100, we use this parameter to illustrate why this number should start at a value of 28.16.
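In numbers, the 28.16-day intersect sits about two-thirds of the way through the 6-week (42-day) window rather than at its midpoint:

```latex
\frac{28.16\ \text{days}}{7\ \text{days/week}} \approx 4.02\ \text{weeks},
\qquad
\frac{28.16\ \text{days}}{42\ \text{days}} \approx 0.67
\quad \text{versus} \quad
\frac{21\ \text{days}}{42\ \text{days}} = 0.5
```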

  3.  6-Week Rolling Period with Different Weighting Factors

To keep the third parameter on the same time standard as the progression/regression parameter, we use the same 6-week period. Mitigation efforts in a pandemic are most important early in the progression of the disease, when the disease progresses more slowly before climbing on its 30-45 degree incline. We therefore propose a data model whose sample set comprises three 2-week testing periods: the initial 2-week period carries the weight of a 3-week period, because of the incline and the importance of this period; the second carries the weight of a 2-week period; and the final 2-week period carries the weight of 1 week. This 6-week rolling period lets us look back over the previous 6 weeks to identify this crucial parameter for defining the TEC Factor.
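The 3-2-1 weighting described above can be sketched as follows. This assumes weekly test counts listed oldest first and reads "the weight of a 3-week period" as scaling the 2-week total by 3/2 (and likewise 2/2 and 1/2); that reading, the function name, and the weekly input format are assumptions rather than iQwest's published algorithm.

```python
from typing import Sequence

# From the text: the oldest 2-week period carries the weight of 3 weeks,
# the middle period 2 weeks, and the most recent period 1 week.
PERIOD_WEEK_WEIGHTS = (3, 2, 1)
PERIOD_LENGTH_WEEKS = 2


def weighted_six_week_tests(weekly_tests: Sequence[float]) -> float:
    """Weighted number of tests over the last 6 weeks (weekly counts, oldest first)."""
    if len(weekly_tests) < 6:
        raise ValueError("need at least 6 weeks of weekly test counts")
    window = list(weekly_tests[-6:])  # rolling: only the most recent 6 weeks count
    total = 0.0
    for i, week_weight in enumerate(PERIOD_WEEK_WEIGHTS):
        period_tests = window[2 * i] + window[2 * i + 1]
        # Scale each 2-week total to the number of weeks it is meant to represent.
        total += period_tests * week_weight / PERIOD_LENGTH_WEEKS
    return total


print(weighted_six_week_tests([10, 10, 10, 10, 10, 10]))
print(weighted_six_week_tests([100, 0, 0, 0, 0, 0]))
```

Under this reading the weights sum to 6 weeks, so a region testing at a constant rate gets the same total as an unweighted count, while tests performed early in the window earn more credit than the same number of tests performed late.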

To calculate this parameter properly, we also defined a baseline of 300,000 tests over the course of this 6-week period, allocating 100,000 tests to each 2-week period, as provided in the illustration below. Any region's number of tests can then be compared with this baseline. Since each State's testing depends on a number of factors, we felt it important to define this baseline to allow comparative evaluations from State to State.
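One way to compare a region against the 300,000-test baseline is to weight both the region's tests and the baseline the same way and express the result on a 100-point scale. The sketch below is a hypothetical illustration of that comparison only; it is not the TEC Factor formula, which the paper does not publish. It assumes the 3/2, 2/2 and 1/2 multipliers for the three 2-week periods (oldest first), and the function name is invented.

```python
from typing import Sequence

BASELINE_PER_PERIOD = 100_000         # from the text: 300,000 tests over 6 weeks
PERIOD_MULTIPLIERS = (1.5, 1.0, 0.5)  # 3-, 2- and 1-week weights on 2-week periods


def baseline_score(period_tests: Sequence[float]) -> float:
    """Weighted tests as a percentage of the identically weighted baseline."""
    if len(period_tests) != 3:
        raise ValueError("expected three 2-week test totals, oldest first")
    weighted = sum(m * t for m, t in zip(PERIOD_MULTIPLIERS, period_tests))
    # 1.5 * 100,000 + 1.0 * 100,000 + 0.5 * 100,000 = 300,000
    weighted_baseline = sum(m * BASELINE_PER_PERIOD for m in PERIOD_MULTIPLIERS)
    return 100 * weighted / weighted_baseline


print(baseline_score([100_000, 100_000, 100_000]))
print(baseline_score([50_000, 100_000, 150_000]))  # same total, testing started late
print(baseline_score([150_000, 100_000, 50_000]))  # same total, testing started early
```

Because the baseline is weighted with the same multipliers, a region that exactly matches 100,000 tests per period scores 100, and two regions with the same total score differently depending on how early their testing began.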

iQwest has created an algorithm that evaluates tests performed across this 6-week period, giving declining weight to each successive period. Since natural selection and disease progression follow the laws of nature and are inherently mathematical, it is logical that the ultimate solution to combat the progression is to use mathematics in the same way nature does.

  4.  Congestion Factors

The algorithm also takes into account population density across each State, City or Zip Code, generating a congestion factor to better identify and track the spread of the virus. This variable is crucial, since congested areas such as New York and New Jersey have experienced the highest death rates and incidence of the virus in the country. Other mitigation measures, such as early intervention, physical distancing and contact tracing, also powerfully affect the progression of the disease and viral projections; they are not the topic of this paper, which allows us to focus on defining this factor. Antibody serology tests can also affect the degree to which these measures are necessary. At the end of this paper, we have provided a sample of the TEC Factor as of April 19th and April 25th for comparative purposes.

We have not included the Congestion Factor (CF) in the calculation of the states' TEC Factors, mainly because of the vast and varying amounts of empty space within each state. If our methodology is adopted for cities and more densely populated areas, the CF can be included in the final calculation. For now, we include it only as a guide.
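A congestion factor based on population density can be sketched as each region's density relative to the average density of the regions being compared, so a value above 1 means more congested than average. This is a hypothetical illustration: the paper does not give its CF formula, and the use of an unweighted mean of regional densities, the function names, and the sample regions are all assumptions.

```python
from statistics import mean


def congestion_factors(regions: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Return each region's population density divided by the mean regional density.

    `regions` maps a region name to (population, land_area). Any area unit works,
    as long as every region uses the same one.
    """
    densities = {name: pop / area for name, (pop, area) in regions.items()}
    average = mean(list(densities.values()))
    return {name: density / average for name, density in densities.items()}


def adjusted_requirement(base_tests: float, cf: float) -> float:
    """Hypothetical: scale a region's testing requirement by its congestion factor."""
    return base_tests * cf


cf = congestion_factors({"Region A": (1_000, 10), "Region B": (3_000, 10)})
print(cf)
print(adjusted_requirement(1_000, cf["Region B"]))
```

Scaling the requirement by the CF is only one possible reading of "increase or decrease testing requirements" based on how far a region's CF is above or below the mean.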


In summary, we have determined the four parameters needed for proper data modeling and for correctly defining the TEC Factor, which will enable accurate disease trend projections for the COVID-19 Coronavirus:

  1. Testing of at least 4 percent of the population, to provide a sufficiently representative sample
  2. A mean average of progression/regression of the disease, defined as 28.16 days
  3. A 6-week rolling data collection period divided into three 2-week periods, each given a different weight based on the incline of progression and the importance of early detection. Together with the first two parameters, this allows an accurate calculation of the TEC Factor at any point in time for the previous 6 weeks
  4. The Congestion Factor (CF) of each region (State, City, Zip code), which increases or decreases testing requirements for that region based on how far its CF is above or below the mean

With these four parameters determined, the TEC Factor can provide every state, county and municipality in the country with up-to-the-minute checks on its testing performance, provided that all states follow the same standard for comparison purposes. This is why a government-mandated standard of data collection and analysis is essential, and major government leaders are already seeking one. The TEC Factor can not only provide government and medical authorities with an accurate projection tool, but also help identify the spread factor (risk of spread) from one state to another.

The TEC Factor ranges from 1 to 100 over the course of a 6-week period. We posit that, to achieve the testing efficiency needed for predictive accuracy, each region needs to reach a TEC Factor of 100 to gain control of its epidemic.

Furthermore, our analysis indicates that for every two weeks that mitigation efforts are delayed, the TEC Factor needs to be decreased by up to 25 percent, which has an inverse effect on existing mitigation efforts. The Congestion Factor will skew this number up or down by percentage points for locales above or below the mean Congestion Factor of all states.
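The delay rule can be read in more than one way. The sketch below assumes the worst case (the full 25 percent for each complete 2-week period of delay) and assumes that successive reductions compound, so two periods of delay leave 0.75 × 0.75 of the original value. Both assumptions and the function name are illustrative, not taken from the paper, and the Congestion Factor skew is left out.

```python
def delay_adjusted_tec(tec: float, delay_weeks: int, cut_per_period: float = 0.25) -> float:
    """Reduce a TEC value for delayed mitigation.

    Assumptions (not from the paper): the full 25 percent cut applies for each
    complete 2-week period of delay, and successive cuts compound.
    """
    if delay_weeks < 0:
        raise ValueError("delay_weeks must be non-negative")
    if not 0 <= cut_per_period <= 1:
        raise ValueError("cut_per_period must be between 0 and 1")
    full_periods = delay_weeks // 2  # partial 2-week periods are ignored
    return tec * (1 - cut_per_period) ** full_periods


for weeks in (0, 2, 3, 4):
    print(weeks, delay_adjusted_tec(100, weeks))
```

A linear reading (subtracting 25 percent of the original value per period) would diverge from this compounding version after the first period of delay.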

Using the four parameters, together with a region's total population and the number of tests already performed, any region's TEC Factor can be calculated at any point in time.

For illustrative purposes, the next page provides a sampling of the TEC Factor for the ten states with the highest number of Coronavirus cases as of April 19th and April 25th. Since these are preliminary numbers based on data we have collected, this table is intended only to demonstrate this capability and should not be used to enact any policies.

Each region (State, City or Zip code) can calculate their own TEC Factor based on real numbers.




About iQwest

For almost two decades iQwest has provided electronic discovery, managed services and consulting to AMLAW100 firms and large organizations throughout the technology, pharmaceutical, manufacturing and other industries. We provide organizations with the proper know-how to establish technology solutions internally. We deal with many confidential and court-protected documents, and can provide non-disclosure agreements (NDAs) and contractual assurances regarding any information.

Mr. Peter Afrasiabi, the President of iQwest, is a proven expert at integrating technology-assisted business processes into organizations. He has almost 30 years of groundbreaking experience in the field and has been a leader in the litigation support industry for 20 years. He has overseen managed services and e-discovery since the inception of the company, and has spearheaded projects involving the processing of over 100 million documents. Previously, he was in charge of deploying technology solutions (CRM, email, infrastructure, etc.) across large enterprises. He has deep knowledge of business processes and project management, and extensive experience working with C-level executives.


Pete Afrasiabi

iQwest Information Technologies, Inc.

844-894-0100 x236 / 714-812-8051 Direct
