Geothermal Data Reduction Guide

The following material is provided by the State of California, Department of Conservation, its divisions, officers and employees (hereinafter the “state“). The state makes no warranties, guarantees of representations of any kind, express or implied. This material provided is provided on an “as is” basis. The state shall not be liable for any loss or damage, of any kind, (including, without limitations, incidental or consequential damages, lost profits or damages resulting from any lost data or business interpretation) resulting from its use.​

FAQs for Excel Tools

(These tools are part of the paper titled, "New Data Reduction Tools and their Application to The Geysers Geothermal Field")


Microsoft Excel-based (using Visual Basic for Applications) Data Reduction and Visualization Tools have been developed that enable the user to numerically reduce large sets of data to any size. The data can be quickly sifted and graphed to allow their study, or check for data validity. These Tools were developed to visualize and check the data validity for The Geysers Geothermal field. With 45 years of production and injection history, and approximately 600 wells, the data set is very large. Therefore, we developed Tools to reduce large amount of data into smaller sets. 

Q: What do Tools do?

A: The Tools have three components.

  1. Data Query and Import
    An Excel macro-based dialog box uses a Microsoft Access database to automatically retrieve data-groups based on pre-defined queries. These queries can easily be modified to suit changing needs, or link to different database environments. Another option is to copy and paste data into the working area for data reduction tools.

  2. Data Reduction. This is the main function of Tools.
    Generally, large data sets are reduced by using existing criteria such as year, month, a physical boundary, a certain well, et al. The data-reduction technique provided here needs no such criteria—and it is simple, yet powerful. The user may choose any numerical data-reduction rate.

  3. Data Charting (Graphing). The graphical representation of the above-mentioned reduced-data has also been automated. The user can easily and quickly change different combinations of data-sets for graphing, comparison, offset realignment, and curve fitting.

Q: What are system requirements?

A: Two databases with The Geysers geothermal field data are provided with Tools as examples. The databases are in MS Access 2000 (and MS Office 2007) formats. To use Tools, including importing data from The Geysers database example, you do not have to have the Microsoft Access program on your computer, but you must have Excel2000 or later.

Q: How do I start?

A: Unzip all the files included in Preferably, select the default unzip directories, which should be as follows:

  • FTP:\DOGGR_Geysers\ Data_Reduction.xls

  • FTP:\DOGGR_Geysers\Readme.txt

  • FTP:\DOGGR​_Geysers\ Database\TheGeysers.mdb

Now open the Excel file Data_Reduction.xls. If your Excel is macro disabled or if you selected a wrong choice, it will show you instructions  to open the file correctly, or you may look at the instructions, by clicking here. If the file is opened properly, you should now see a page with a black background, a graph, several check boxes, radio buttons and drop-down lists.

Q: I want to work with the database provided with Tools, how do I start?

A: Click on the very top left button, which is named Query Data. You will see a dialog box. Most of the selections here are self-explanatory.

You may select data from one of the several predefined query-models, or individual wells. Even though the main purpose of Tools was to visualize data in large aggregates, included features that allow individual well data to also be quickly graphed and analyzed. For example, if you select individual wells, when you come back for the next query, the next well will be pre-selected.

Q: What are my choices in importing data for Data Reduction?

A: You have three choices:

  1. As described in the previous answer, you could import data from the database provided.

  2. You may copy and paste your data into the Data_Reduction.xls file. You must paste it onto the worksheet Enter_Data. You can paste your data starting from cell B10. Row # 9 is reserved for column headings. You may type or paste your column headings there. The program works on 10 columns: columns B through K.

  3. You may replace the included database TheGeysers.mdb, with your own database with similar structure, or you may slightly modify the Excel code to directly access your database. We tried to make it as generic as possible, but you will need a little basic knowledge of VBA to accomplish. You may be able to have Excel automatically record the macro for that part of the query and paste it into the existing code.

Q: What are my choices for Data Reduction?

A: Choices available are: Average, Summation, Median, Mode, Geomean, Minimum, Maximum, and Cumulative

Q: What other Tools are provided?

A: Some other helpful Tools are being added. For example, we have added a tool that allows calculation of the saturation temperature and superheat for pure steam system. Click here for the equation and other details.

Q: How do I use the Data Reduction Tools?

A: If the data have column headings, they will be listed above the first row of check boxes. If not, just the column names are listed. You can type over your own column headings.

The second row with check boxes is initially checked for all  the columns that have data. If you would like to speed up the program,  uncheck some of the check boxes and the program will skip over those without calculating. You do not need to uncheck the columns that contain no data, as the program will skip them. If your data in any of the columns are less than at least two calculated values, the program will skip them.

The rows of data required to make at least two calculated values depend on your selection of the Data-Reduction interval. For example,  if you select a Data Reduction of 200, your unreduced (original) data must have at least 400 rows to give any results.

The program will start from the top of the list (cell B10 in Enter_Data worksheet) and calculate for each 100 data points and place that in the Reduced_Data worksheet. Then it selects the next 100 points and does the same thing until it reaches the end of the page, which in the Excel 2003 version is 65,536. However, if you have a larger database, you may use multiple pages.

Q: How do I select data reduction?

A: In the drop-down list for each column you may select all that is available in the list. After doing that for each column, click the Data-Reduction Button. It will then ask you what Data-Reduction interval you would like to use.

Q: What Data-Reduction intervals can I select?

A: You may select any integer from 1 to 5000. Of course, selecting 1 would mean that the data is not reduced, which means the original data is copied as reduced.

Q: How do I  just paste my data and use the Data_Reduction Tools?

A: Go to the Enter_Data worksheet and paste your data in the cell starting at B10. In row # 9 you may enter your column headings.

Q: Is there anything I should watch for?

A: As is the case with any statistical tool, the user must be careful when choosing the type of information to be grouped and how the data are reduced. Unrelated groups, wrong selection of data-reduction criteria, grouping with statistical irrelevance, or statistical insignificance may give “viable looking” results, but they may not necessarily be valid

Q: How do I select the X-axis and the Y-axis for the graph?

A: For the X-axis, select any one of the 10 data columns by clicking on the radio buttons. For the Y-axis select up to three of the ten check boxes. If you select more than three, it will disregard everything except for the first three from left to right.

Q: Is there any data-cleaning or other option provided?

A: Yes, there are two types of options provided. One is for the unaltered data in the Enter_Data worksheet. The second is for the data after reduction, which is on Reduced _Data worksheet. These options are, by default, hidden to keep the main part of the Tools more accessible. They may be opened by clicking one of the buttons.

Q: What are these different options?

A: One is to remove zeros, negative numbers and numerically incorrect values. Another is to sort the data differently than it is in the database.

Q: How can you calculate unweighted averages  for vector data?

A: Data-points generated by these tools are purely mathematical operations with no regard to the "relative location" or "weight" of each individual point. However, logical selection of individual wells to form data-groups will produce meaningful results. When analyzing data using this kind of mathematical "averaging," and using "un-corrected" data, conclusions should be general and relative rather than absolute, unless the user can tie these relative conclusions to some corrected data points.

When analyzing such a large data set, the effect of measuring instrument uncertainties and random variations in the data measured tends to be minimized. In the absence of smaller variations, the larger variations become more conspicuous and relatable to the actual events. Authors also observed this benefit of “aggregate analysis” of data (Barker and Pingol, 1997, Khan, 1993). However, instrument and measurement bias will not be minimized when using a large data set, and unless corrected, can lead to erroneous conclusions. The old adage of “garbage in, garbage out” is just as true here as with any computational tool.