Data Mining and Knowledge Discovery Nuggets 96:26, e-mailed 96-08-13
From: Data Warehouse Report, Volume 6, Summer 1996.
Published with kind permission from Data Warehouse Network
PO Box 7, Skibbereen, Cork, Ireland.
Tel +353 28 38483.
Fax +353 28 38485.
© Data Warehouse Network, 1996, Ireland.
Written by: Wouter Senf and Pieter Adriaans
The creation of large data warehouses and data mining environments will have a deep influence on database technology.
Data warehousing and data mining involve a shift from viewing database systems as plain administrative records to viewing
them as a production factor. Databases are seen as potential sources of new information. Knowledge discovery in databases
(KDD) is defined as the extraction of implicit, previously unknown, and potentially useful information from databases.
This article shows how bit-mapped indexes can speed up data mining search algorithms, in particular for building decision
trees and finding association rules.
In order to fulfil the needs of a KDD environment, adaptation of traditional relational technology is necessary. In
contrast to common beliefs, data mining is first and foremost a server technology. Current-generation data mining tools
are mainly client tools, with attractive graphical user interfaces, that use flat files or relational databases as input.
The performance of these tools on truly large data sets is poor, and will not improve unless the underlying database
technology is adapted.
The current relational technology, with its emphasis on efficient updates and exact identification of records, is ill
suited to support data mining algorithms. Pattern recognition algorithms require efficient sampling methods,
storage of intermediate results during execution of queries, flexible user-defined functions, bit-mapped operations,
and geometric indexes that allow users to search in the neighbourhood of a specific record. All these functions can in
principle be implemented on top of existing database management systems, since they are based on generalisations of
the same mathematical foundations: first-order logic and relational algebra.
The actual pattern discovery stage of the KDD process is currently called data mining. At this moment, most data mining
tools are developed on top of traditional database platforms, and use SQL as a query language. There are indications,
however, that this situation is far from ideal.
Data mining is not a single technique: any technique that will help to get more information out of data is useful.
Therefore, data mining techniques form a heterogeneous group, and different techniques are used for different purposes.
Two of the more interesting data mining techniques - decision trees and association rules - are discussed below.
Our database contains attributes like age, income, and credit. If we want to predict customer behaviour, we might
investigate which of these attributes provides the most information. If we want to predict who will buy a car magazine,
what would help us more: information about a person's age, or about their income? It could be that age is more
important, which would mean we can predict whether or not someone will buy a car magazine from knowledge of their age
alone.
If this is the case, we can split this attribute in two. That is, we must investigate whether there is a certain age
threshold that separates car magazine buyers from non-car magazine buyers. The split-function determines this threshold
value. In this way, we could start with the first attribute, find a certain threshold, go on to the next one, find a
certain threshold, and repeat this process until we have made a correct classification for our customers, thus creating a
decision tree for our database.
There are many algorithms that build such decision trees automatically. They are very efficient, since they have
O(n log n) complexity.
Figure 1: A simple decision tree for the car magazine
(target column = CAR_MAGAZINE and depth = 2)
Figure 1 shows the result of applying a tree induction algorithm to our data set. We are interested in a description of
the readers of our car magazine: we need to build a decision tree that tells us exactly what type of customers would be
interested in such a magazine. A tree of depth 1 gives the a priori chance that people buy car magazines, in this case
33%. For a tree of depth two, age appears to be the most decisive attribute. The threshold lies at 44.5 years: above
this age only 1% of the people buy a car magazine, while below it 62% have a subscription.
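To make the split-function concrete, here is a minimal sketch, in Python, of how a threshold for a single numeric attribute can be chosen by maximising information gain. The function and variable names are illustrative only and do not come from any of the tools discussed here; the sorting step is where the n log n cost comes from, while this naive version recomputes the entropies from scratch at every candidate split.

import math

def entropy(labels):
    # Shannon entropy of a list of 0/1 class labels.
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def best_threshold(records):
    # Find the value of a numeric attribute that best separates the two classes,
    # i.e. the split with the highest information gain.
    # records is a list of (value, label) pairs with 0/1 labels.
    records = sorted(records)               # the O(n log n) part
    labels = [label for _, label in records]
    base = entropy(labels)
    best_gain, best_split = 0.0, None
    for i in range(1, len(records)):
        if records[i - 1][0] == records[i][0]:
            continue                        # no split between equal values
        left, right = labels[:i], labels[i:]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        gain = base - weighted
        if gain > best_gain:
            best_gain = gain
            best_split = (records[i - 1][0] + records[i][0]) / 2
    return best_split, best_gain

# Toy data: (age, buys car magazine); younger customers tend to subscribe.
sample = [(25, 1), (31, 1), (33, 0), (38, 1), (42, 1), (47, 0), (53, 0), (61, 0)]
print(best_threshold(sample))               # splits at age 44.5 on this toy data

On this toy data the threshold comes out at 44.5 years, echoing Figure 1; a real tool repeats this search for every attribute at every node of the growing tree.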
Suppose we have a database with information on the gender of customers, the colour and type of their car, the type of
pets they have, and a number of products they are likely to buy. An association rule discovered in such a database
might read as follows:
In 90% of the records where the gender is female, the car is a sports car, the colour of the car is red, and the pet is
a small dog, the perfume bought is Chanel No 5.
Association rules are always defined on binary attributes, like the ones we used in our sample database to represent
subscriptions to magazines. So we have to flatten the table mentioned above before we can execute an association algorithm.
This is illustrated in Figure 2: on the left the original table is shown, on the right the flattened version of the table
is shown.
Original table (left in Figure 2):

| Customer | Area | Age-group |
| 1 | 1 | young |
| 2 | 4 | old |
| 3 | 2 | old |
| 4 | 3 | young |
| 5 | 3 | medium |
| 6 | 2 | old |
| 7 | 1 | young |

Flattened table (right in Figure 2):

| Customer | Area #1 | Area #2 | Area #3 | Area #4 | Young | Medium | Old |
| 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 3 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 4 | 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 5 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 6 | 0 | 1 | 0 | 0 | 0 | 0 | 1 |
| 7 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
Figure 2: An example of flattening a table
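Flattening of this kind is essentially what is now called one-hot encoding, and it is easy to sketch. The Python fragment below (purely illustrative, not taken from any particular product) reproduces the transformation of Figure 2; the column names follow the figure.

def flatten(rows, areas=(1, 2, 3, 4), age_groups=("young", "medium", "old")):
    # Turn (customer, area, age-group) rows into the binary columns of Figure 2.
    flattened = []
    for customer, area, age_group in rows:
        record = {"Customer": customer}
        for a in areas:
            record["Area #%d" % a] = 1 if area == a else 0
        for g in age_groups:
            record[g.capitalize()] = 1 if age_group == g else 0
        flattened.append(record)
    return flattened

original = [(1, 1, "young"), (2, 4, "old"), (3, 2, "old"), (4, 3, "young"),
            (5, 3, "medium"), (6, 2, "old"), (7, 1, "young")]
for row in flatten(original):
    print(row)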
The problem with association rules is that one is bound to find so many associations that it will be very difficult to
separate valuable information from mere noise. It is therefore necessary to introduce some measures to distinguish
interesting associations from non-interesting ones. We will represent an association rule in the following way:
attribute_i (, attribute_j, ...) => target-attribute (confidence, support)
For example:
MUSIC_MAG, HOUSE_MAG => CAR_MAG (97%, 9%)
Interesting associations are those with many examples in the database. We call this the support of an association rule.
In our case the support of the rule is the percentage of records for which MUSIC_MAG, HOUSE_MAG, and CAR_MAG all hold;
that is, all the people that read all three magazines.
Support in itself is not enough, however. A considerable group of people may read all three magazines, but a much larger
group may read MUSIC_MAG and HOUSE_MAG, but not CAR_MAG. In this case the association is weak, although the support
might be relatively high. We need an additional measure: confidence. In our case the confidence is the percentage of
records for which CAR_MAG holds within the group of records for which MUSIC_MAG and HOUSE_MAG hold.
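In other words, support measures how often the whole pattern occurs, while confidence measures how reliable the implication is within the antecedent group. As a purely illustrative sketch (the record layout and names below are assumptions, not any vendor's API), the two measures for a single rule can be computed like this:

def confidence_and_support(records, antecedent, target):
    # support    = fraction of all records where antecedent and target all hold
    # confidence = fraction of the antecedent records where the target also holds
    with_antecedent = [r for r in records if all(r[a] == 1 for a in antecedent)]
    with_both = [r for r in with_antecedent if r[target] == 1]
    support = len(with_both) / len(records)
    confidence = len(with_both) / len(with_antecedent) if with_antecedent else 0.0
    return confidence, support

# Each record flags the magazines one customer subscribes to (1 = subscribes).
records = [
    {"MUSIC_MAG": 1, "HOUSE_MAG": 1, "CAR_MAG": 1},
    {"MUSIC_MAG": 1, "HOUSE_MAG": 1, "CAR_MAG": 0},
    {"MUSIC_MAG": 0, "HOUSE_MAG": 1, "CAR_MAG": 0},
    {"MUSIC_MAG": 1, "HOUSE_MAG": 0, "CAR_MAG": 1},
]
print(confidence_and_support(records, ["MUSIC_MAG", "HOUSE_MAG"], "CAR_MAG"))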
In our example, with five attributes for the five magazines, there are 25 possible unary association rules, some of which
are trivial. This number grows rapidly if we allow multiple attribute associations. As in the case of decision trees, it
will be better to use an environment that enables us to zoom in on interesting sets of association rules interactively
using zoom scan algorithms.
| Association with Car-magazine | (Confidence, Support) |
| Sports-magazine | (36%, 45%) |
| Music-magazine | (96%, 15%) |
| Comic-magazine | (57%, 8%) |

| Target | Minimal confidence | Minimal support | A priori |
| Car-magazine | 30% | 3% | 30% |

Figure 3: Binary associations for the car magazine
Figure 3 illustrates such an environment. We have selected CAR_MAG as our target attribute; that is, we are interested in
readers of the car magazine. The confidence and support levels are set at 33% and 3%. This means that we will not be
interested in sub-groups smaller than 3% of the database and that within these sub-groups we want to find associations
that hold for at least 33% of the records.
The association between music-magazines and car-magazines is the most interesting, since it has a high confidence level
(96%) with a fairly high support (15%).
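The environment of Figure 3 is a product feature rather than an algorithm given in this article, but the filtering step behind it is easy to sketch. Purely as an illustration, assuming the flattened record layout used above, the fragment below enumerates every single-antecedent rule for a chosen target and keeps only those that clear the minimal confidence and support thresholds.

def binary_rules(records, target, min_confidence=0.33, min_support=0.03):
    # List every rule "attribute => target" that clears both thresholds,
    # strongest confidence first.
    attributes = [a for a in records[0] if a != target]
    rules = []
    for attribute in attributes:
        having_attr = [r for r in records if r[attribute] == 1]
        if not having_attr:
            continue
        both = [r for r in having_attr if r[target] == 1]
        support = len(both) / len(records)
        confidence = len(both) / len(having_attr)
        if support >= min_support and confidence >= min_confidence:
            rules.append((attribute, target, confidence, support))
    return sorted(rules, key=lambda rule: rule[2], reverse=True)

# e.g. binary_rules(records, "CAR_MAG") on a flattened subscription table.
Zooming in then amounts to re-running the same scan with a different target or with tighter thresholds.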
The second part of this article, which discusses the implementation of decision trees and association rules using bitmaps,
will be published in the next issue.
About the authors:
WOUTER SENF is technical leader of the decision support group at Tandem's High Performance Research Centre where he is closely involved in large data warehousing projects in Europe and with new software developments at Tandem. Prior to joining the HPRC, he worked for 6 years as a consultant at Tandem Netherlands BV where he specialised in the areas of large databases and performance.
PIETER ADRIAANS has been active in research in the areas of artificial intelligence and relational database systems since 1984. He is a director at Syllogic, where he is responsible for the development of tools for the management of client-server systems and databases specialising in the integration of artificial intelligence techniques, machine learning, object orientation, and management systems.
David Aubrey
While the practice of corporate data warehousing and data mining is receiving a lot of hype, there's considerable confusion about what it is and who should be using it. The vague, even proprietary terminology for describing this sophisticated approach to storing and retrieving database information hides the serious technology that comprises this practice. This leads to questions such as, "Isn't data warehousing just a glorified term for database management?" To give you the real answer, we'll first define data warehousing and data mining, then explore the advantages these methods bring to corporations.
In the most simplistic definition, data warehousing and data mining are complementary methods of applying various technologies to use the information stored in a company's existing database more effectively. In a broad sense, these technologies are the most dramatic utilization of a PC's raw computational power.
But there is controversy surrounding the effectiveness of data warehousing and data mining. Skeptics say that data warehousing is actually an expensive step backward, disguised as a step forward. They describe it as stuffing all the data contained in a company's many small databases into one large database, which is then managed by a trained staff and accessed by users through a friendly front end. The detractors see this as revisiting the age of mainframes and dumb terminals. They also say the sheer volume of database records being warehoused would inevitably drag down productivity; lost files and records would take up valuable storage space while important data would remain largely unused.
This is a glib view, and on closer examination, it's also inaccurate. Data warehousing is much more than simply dumping piles of files into a single database. Properly executed, it has the potential to be the most sophisticated example of client/server software architecture to date.
Let's start with an accurate definition of a data warehouse. A data-warehouse environment is a subject-oriented data collection used mainly to aid in organization-wide decision making. This is why many IS managers have adopted the term "decision support" in place of data warehousing, since a decision-support construct can be viewed as a specialized database that is maintained separately from your organization's operational databases.
But why isn't the traditional database technology, used for operational databases, sufficient? Well, that's the nub of the warehousing question and the reason the warehousing industry is spewing out new buzzwords at a ridiculous--and even careless--rate. Regular, or operational, databases weren't designed with data-warehousing applications in mind. Their primary function is to serve as a data-processing center for business support. All data stored in an operational database is based on the software process required to input and process that data. The database's innards are highly structured and repetitive, and accessing the data requires complicated data-entry procedures and batch-processing or online-transaction-processing (OLTP) queries.
This is fine for storing a fairly large number of records, with the ability to retrieve them for future use. What sets a data-warehousing operation apart is that it's designed to support not just the storage and retrieval of records, but also the actual manipulation and cross-fertilization of that data to help users make more-informed decisions. To do this, a decision-support system has to be built using an entirely different foundation than an operational database.
A large number of the buzzwords claim to be the backbone of a data warehouse, but most are simply the tools used to get the most out of a decision-support system. On an architectural level, there are two primary technologies vying for attention: multidimensional database (MMD) technology and relational online analytical processing (ROLAP).
Where standard operational databases, often based on OLTP, store records in a two-dimensional architecture generally called tables, data warehousing involves much more. For example, an MMD-based construct arranges its records in an N-dimensional "cube." Essentially, the database performs a large number of precalculations for all the multidimensional views of its cube and stores them as part of the cube for later use, when users access or cross-reference the data. That way, when a user calls for data in one of these multidimensional views, it's retrieved much faster than from a two-dimensional system where the database would need to spend a considerable amount of time scanning its relational tables.
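The precalculation idea can be illustrated with a toy pre-aggregation in Python. The fact rows, dimension names, and measure below are made-up examples, not any vendor's engine; the point is only that every multidimensional view is summed up once, in advance, so that a later query becomes a lookup rather than a table scan.

from itertools import combinations

def precalculate_cube(rows, dimensions, measure):
    # Pre-aggregate the measure over every subset of the dimensions.
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            for row in rows:
                key = (dims, tuple(row[d] for d in dims))
                cube[key] = cube.get(key, 0) + row[measure]
    return cube

facts = [
    {"product": "perfume", "region": "north", "quarter": "Q1", "sales": 120},
    {"product": "perfume", "region": "south", "quarter": "Q1", "sales": 80},
    {"product": "magazine", "region": "north", "quarter": "Q2", "sales": 45},
]
cube = precalculate_cube(facts, ("product", "region", "quarter"), "sales")
# A multidimensional view is now a single lookup, e.g. total perfume sales:
print(cube[(("product",), ("perfume",))])    # 200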
The ROLAP design, however, blends powerful querying tools with third-party optimization software, which is used in conjunction with your existing relational database-management systems. This creates a multilevel architecture that lets the ROLAP client see multidimensional views while keeping the database calculation engine, the "metadata"--the predefined elements of your data warehouse--and all the security code on the ROLAP server.
The object behind a data warehouse is to link all your company's data to a single, user-accessible front end. Naturally, there are a variety of ways to do this, but we will discuss the three main methods here: using your existing operational databases with third-party products, setting up a virtual data warehouse, and using a discrete data warehouse.
If your corporation's MIS equipment is built around a standard, well-supported system architecture and comes from one company, such as DEC, Hewlett-Packard, or IBM, then it's possible to create a fairly inexpensive solution using your existing operational databases. Via customized query engines from SQL-database vendors such as Oracle, you can create a decision-support environment that doesn't require a separate metadata repository. While this approach is inexpensive (since you'll need neither a new database nor data-duplication methods), you'll almost certainly encounter serious performance and flexibility problems when trying to run evolving decision-support queries.
This hybrid solution simply enhances your existing databases by adding data tables dedicated solely to decision support within the operational database. Operational data must then be separated from historical data and organized by subject in those special tables. This will greatly reduce locking conflicts within OLTP applications. Again, while this is certainly a feasible solution, it's best left for a departmental data warehouse, since an enterprise-wide solution would stress a single OLTP environment far too much.
Then there's the virtual data warehouse. This is an alternative for companies that want to expand the previous single-OLTP-based concept over several distributed databases. The virtual data warehouse depends on a piece of software nebulously known as middleware. This ever-changing category of software basically bridges end users' querying tools to the physical databases. In this situation, the user simultaneously accesses multiple databases on multiple systems, but to the user, it seems as if everything is functioning as a single data warehouse. Again, however, the potential drawback here is that the operational databases aren't optimized for decision-support querying. The lack of standardization among different database platforms is another obstacle.
Finally, the only true data-warehouse implementation is called the discrete data warehouse. This system is composed of a separate, discrete database dedicated to decision-support querying and traffic activity. It's populated only with data consistent with true data-warehousing criteria, which are discussed below. While this is certainly the most rewarding approach for an enterprise-wide decision-support platform, it's also the most involved to construct. Typically, building a discrete warehouse requires the involvement of all your users. They must identify the data they require in their daily business activities, help design an appropriate data model, and help create the extraction and cleansing routines.
Selecting the appropriate data-warehouse architecture cannot be based on static performance measurements. You've got to compare performance characteristics with your business process. The real key here is understanding how your business works and what it requires of your data-storage engine to facilitate decision making. That leaves a major gray area wide-open. But within the chaos, you can cling to a few rules when designing an engine.
At a basic level, data warehouses share four fundamental characteristics: They are subject-oriented, integrated, time-variant, and nonvolatile. Regular operational databases, such as order processing and manufacturing, are organized around a single, static business application. This causes companies to store identical information in multiple locations, resulting not only in wasted time and storage space, but also in inaccurately updated information. By contrast, a data warehouse is subject-oriented, meaning it is organized around specific subjects. Subject organization presents the data in a format easier for end users to understand and manipulate more creatively.
Data-warehousing systems are also integrated. Data integration is perhaps one of the trickiest facets of the operation. It is accomplished by dictating complete consistency in how data is formatted, named, stored, manipulated, and more. The political ramifications alone are daunting to any IS manager facing a large and varied corporate milieu. But if you surmount these problems, your data-warehouse information will always be maintained and accessed in a consistent way--which is critical to its success.
Since data warehouses hold and maintain historical and current data, they are considered to be time-variant. Operational databases, however, hold only the most up-to-date data. On a historical scale, data warehouses contain data gleaned from a company's operational databases on a daily, weekly, or even monthly basis, and that data is then maintained for one to five years. This illustrates one of the major differences between the two database technologies: Unlike static data-processing environments, where only the latest record matters, historical information can be of high importance to corporate decision makers. It can be used to better understand business trends and relationships--a virtually impossible task for an operational database.
Finally, data warehouses are nonvolatile, which means that after the informational data is loaded into the warehouse, changes, inserts, or deletes are performed only rarely. Data loaded into the warehouse is actually "transformed" data that stems from the operational databases. The data warehouse reloads that data on a periodic basis and updates itself with transformed data from the operational databases. Apart from this loading process, the information contained in the data warehouse generally remains static. Nonvolatility lets a data warehouse be heavily optimized for query processing.
Building and implementing a decision-support system is only the beginning. Holding countless interdepartmental meetings to discuss consistent data-storage criteria; choosing a multidimensional database engine; creating customized interdepartmental interfaces for that engine; configuring custom or third-party middleware packages to let that database engine communicate with your existing relational databases; and scheduling file transfers, updates, and backups simply get you off to a good start.
Once your system is all in place, it's got to actually do something. Now you must use an ever-growing list of software analysis tools to arrive at a synergy called data mining.
Data mining is designed to reach as deeply as possible into a data store, and the mining tools are designed to find patterns and infer rules from it. You can use those results to answer users' in-depth questions and perform forecasts, and they can help speed analysis by focusing attention on the most germane variables.
Generally, data mining can access five common information types: associations, classifications, clusters, forecasting, and sequences. Associations arise when occurrences are connected by a single event. Classification, probably the most common data-mining activity, recognizes the patterns that describe the group to which an item belongs; it works by examining a set of items that have already been classified and inferring a set of rules from them. Clustering is related to classification, but there are no predefined groups. Through clustering, a data-mining tool can segment warehouse data into distinct groups, again using those groups to make predictions and comparisons. Forecasting and sequence discovery extend these ideas over time, predicting future values and linking events that follow one another.
While these information types are the primary results of a data-mining operation, a variety of tools can access those information types. Which ones you choose depends on your performance requirements, database platforms, data types, and overall business scenario.
One of the most popular tools right now is a neural network. (See the March 1996 article "Brain Waves," p. 566.) A neural net is a software-based operation divided into a fairly large number of virtual nodes, each with its own inputs, outputs, and processing power. Underneath the initial input and output layers, a programmer can code a number of hidden processing layers. By comparing its output with a known outcome described by the net programmer, a neural net can adjust its processes to better adapt to its mission. But this is cutting-edge software development best left to the experts.
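The adjust-by-comparing idea can be shown with a toy example. The Python fragment below trains a single logistic node on a trivial task; it is only a sketch of the principle (real neural-net products stack many such nodes into hidden layers and use far more sophisticated training), and the names and parameters are invented for illustration.

import math, random

def train_neuron(examples, epochs=200, learning_rate=0.5):
    # One logistic node: compare its output with the known outcome and
    # nudge the weights to shrink the error.
    n_inputs = len(examples[0][0])
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            activation = bias + sum(w * x for w, x in zip(weights, inputs))
            output = 1.0 / (1.0 + math.exp(-activation))
            error = target - output
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

# Toy task: learn the logical OR of two binary inputs.
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
print(train_neuron(data))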
A decision tree is a different animal. Decision trees divide your warehouse data into groups, based on set values of their variables. They do this basically by creating a large hierarchy of if-then questions that serve to classify the data. While decision trees aren't as complex as a neural network, they're nevertheless sparking some interest. Their relative simplicity lets them be faster than neural nets on average, and they can be customized to specific business needs.
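The if-then hierarchy a tree tool produces is simple enough to write out by hand. The two-level rule below is only a made-up illustration (the thresholds and categories are invented), but it mirrors the kind of rule set a tree induction tool derives automatically from the data.

def classify_customer(age, income):
    # A hand-written two-level if-then hierarchy of the kind a decision-tree
    # tool would induce automatically; the thresholds here are invented.
    if age <= 44.5:
        if income > 30000:
            return "likely buyer"
        return "possible buyer"
    return "unlikely buyer"

print(classify_customer(age=35, income=42000))   # likely buyer
print(classify_customer(age=60, income=42000))   # unlikely buyer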
But they're not idiot-proof and can't work with all types of data. For instance, some decision trees have trouble dealing with data sets containing continuous sets of data such as sales-over-time figures. They require that these be grouped into ranges before the decision tree can perform. So, yes, a decision tree can be much easier to implement, use, and understand than a neural net, but depending on the business circumstance, an if-then statement can get hairy.
By now, it should be clear that data warehousing and mining comprise some of the most advanced software techniques available. As you might imagine, this kind of code won't run on just any hardware platform. In fact, warehousing and decision support are so demanding that they've rejuvenated the ultra-high-end server market.
Modular systems, or scaleable systems in the new lingo, are back in. With a scaleable system, once you buy the base platform, you can easily upgrade these machines via proprietary plug-in hardware modules, including processor cards, memory modules, and hard drives. These systems are also designed to let advanced client/server environments run with a minimal number of skilled or trained professional support people.
Data warehousing has also put symmetric multiprocessing (SMP) servers and even massively parallel processors (MPPs) back on the map. Basically, these machines have between two and four processors that use special operating systems such as Windows NT, OS/2 Warp, and NetWare SMP to divide up processing loads among themselves in the most efficient manner possible.
MPP systems extend the SMP paradigm, both in terms of the number of simultaneous processors and the degree to which they can communicate and share with one another. These systems are also usually based around high-end individual processor chips such as those found in the IBM RS/6000 SP, ICL Goldrush MegaServer, NCR 5100M, or Unisys OPUS.
Unfortunately, while SMP systems are taken care of by the operating system so a database can run over them without requiring optimization, MPP systems require special versions of database software. Right now, IBM, Oracle, and Sybase are some of the vendors supporting this technology, but not every vendor is on board.
Data warehousing and mining will affect all aspects of the corporate IS department--from software considerations and hardware requirements to interpersonal issues between IS and the other business departments. Everything is touched by its implementation. Have no doubt: Building one of these systems is intense. But you can use the following tips to avoid problems: have users pick only a select few data types on which they require decision support, and start small and perfect your system before attempting to grow. Remember that different departments will want different data types initially, and getting them all to agree on just a few can be a major hassle.
Overall design will depend on the most frequently accessed data elements and the most commonly required dimensional query (be it time, geography, or whatever). If any information has to be aggregated or summarized, it will have to be identified right away. Also at this stage, the metadata (criteria, rules, and so on) relating to the data you've selected to be contained in the warehouse must be defined; these elements will assist your users in understanding the warehouse.
Finally, there's warehouse maintenance. The procedures required to maintain your warehouse cannot be implemented on an as-needed basis. Change is inevitable, so the challenge for database managers is to come up with an effective maintenance plan right away that leaves enough room for flexible alterations to the system as time and technology march on.
Warehousing data and subsequently mining it for cutting-edge information are among the highest forms of corporate computing. This is what managers have wanted since computers first appeared in the business arena. A warehousing system is still expensive to set up and not always as easy to use as you might like. Remember, this is a complex undertaking, directed at large companies that require only the most-sophisticated technology, can staff the most-qualified personnel, and have the largest budgets. You'd be ill-advised to look at data warehousing and mining any other way. But the overall benefit is as rich as the collected data they hold.
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 6 Aug 1996 16:55:32 -0400
From: Gregory Piatetsky-Shapiro (gps0@eureka)
To: kdd
Subject: [raghavan@cacs.usl.edu: help]
Content-Length: 1087
------- Start of forwarded message -------
Date: Thu, 1 Aug 1996 15:37:46 -0500
From: 'Dr. Raghavan' (raghavan@cacs.usl.edu)
Subject: help
Content-Length: 728
Dear Dr. GPS,
I am hoping this reaches you before you leave for the KDD conference in
Portland. As you know, I will be (co-)guest editing a special issue on
data mining for the Journal of the American Society for Information Science.
I would like your help in making an announcement about that at the
conference (since I am unable to attend).
The following URL has the complete call for papers:
http://www.usl.edu/~raghavan/JASIS97.html
If some of the attendees would be willing to serve as referees, they could
get in touch with me by e-mail at raghavan@cacs.usl.edu.
With regards,
Vijay Raghavan
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: 'jpbrown' (jpbrown@hal-pc.org)
Organization: Ultimate Resources
Date: Tue, 6 Aug 1996 22:41:50 -0006
Subject: Complexity and Predictions
Predictions, while resolving the complexity of a database, can
improve the coefficients of determination.
A new link on my website, http://www.hal-pc.org/~jpbrown/hmpg16.html,
shows the results of an iterative application of artificial neural
nets for predictions. The resulting subsets produce a classification which can
provide causal explanations for the apparent complexity of the original
database.
>~~~Siftware:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
From: 'Norman F. Smith' (nfsmith@artnet.net)
Subject: A Related Website
Date: Thu, 8 Aug 1996 14:24:17 -0700
Check out our website at http://www.jurikres.com.
We provide data preprocessing software to neural network users.
Our site also contains an amusing page concerned with 'snake oil' in the financial markets.
Norman Smith
Jurik Research Software
>~~~Positions:~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Date: Tue, 6 Aug 1996 14:50:43 -0700
From: Kamal Ali (ali@almaden.ibm.com)
To: kdd@gte.com
Subject: Data Mining positions at IBM San Jose
IBM DATA MINING POSITIONS
Join the team! We are an entrepreneurial organization within IBM
developing Data Mining solutions. We have three groups: a consulting
services group, a research and development group, and a software
application development group. Currently, we are looking for qualified
individuals in the rapidly expanding consulting services group.
DATA MINING ANALYSTS/CONSULTANTS
Analysts will be responsible for performing consulting engagements in
any of the following areas: finance, insurance, retail,
telecommunications, media, and health care. There will also be
opportunities for teaching business data-mining classes and a few
opportunities for applied research on the kinds of problems that
arise from our data-mining engagements. Familiarity with databases,
statistics, data preparation, and high-end data mining techniques and
tools is required. Familiarity with SAS and previous experience in
applying data mining in a commercial context are big
pluses. Applicants must have advanced degrees in CS, Statistics, or
Mathematics - PhD preferred. Applicants should have good communication
skills, enjoy working with people in a team environment, be willing to
travel, and be application-oriented. These positions provide excellent
customer contact with high-level executives in FORTUNE 500 companies.
These positions will be located at our world-class research lab -
the Almaden Research Center - in sunny San Jose, CA. Almaden sits in
the beautiful rolling hills of Silicon Valley, affording
close contact with top universities such as Stanford University and
UC Berkeley.
For further information, please email your resume to me
(ali@almaden.ibm.com),
preferably in ASCII or PostScript format.
I've been working as an Analyst in IBM's data mining group since December
and it's been a great experience. Feel free to contact me with questions.
(408) 927-1354.
Also check out our web pages, which give some detailed examples of how
we've used our tools to build and visualize models and give
information on previous engagements we have had.
http://www.almaden.ibm.com/stss
(click on 'Data Mining')
==============================================================================
Kamal Mahmood Ali, Ph.D. Phone: 408 927 1354
Consultant and data mining analyst, Fax: 408 927 3025
Data Mining Solutions, Office: ARC D3-250
IBM http://www.almaden.ibm.com/stss/
==============================================================================
>~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~