To get to know about data mining techniques we should have a base knowledge of Artificial Intelligence. Here I wrote about AI from the book ‘Artificial intelligence book by Peter Norvig and Stuart J. Russell’ to better understand how AI algorithms or specifically data mining algorithms work. I will write in four sections to understand easier.
In the first section, we learned what is actually artificial intelligence is
In the second one, we studied what is AI and how environments and agents work.
In the third one, we learned the question of how to solve our problems with searching.
And In this last section, we would learn how local search algorithms and also genetic algorithms work to know specifically how data mining methods work.
So with no time waste let’s start the last section.
What is a local-search algorithm?
In the Other algorithms as BFS, DFS, A*, … we were going to find a path to the goal but in local-search algorithms, From the present state, we are trying to find a state of goal or a better state.
So there are many algorithms for local search. Here I would mention some of them.
in this algorithm, we would choose an ideal state as the present state and in each iteration, the algorithm would choose the better neighbor and move to it. We don’t have a tree in this algorithm and we’re just choosing the nearest (or the better) node in every iteration. So here I put an example to have a better understanding of the algorithm.
In this problem, we are trying to find the node with the maximum f value. To start we would choose randomly a node. For example, we choose node B to start the algorithm. In the first iteration neighbors of B are A and D, D’s value is more so we would choose it, In the next interaction D would look at its neighbors, and because it’s the dominant node the algorithm would stop and return D as the output.
There are some issues in local search specifically hill-climbing algorithm, I’ll mention them here:
- local-maximum: Sometimes we are in a state that its neighbor’s values are lower than itself but we also have nodes with better value. The algorithm would stop ( because of lower neighbor values ) and we could not reach the global maximum. In this scenario, we say that we are stuck in a local maximum.
- local minimum: Same as local maximum, But our goal is to find the minimum value.
- Flat: In this state, our neighbors’ values are the same as the present state, but we also have nodes with better values and can’t reach them.
There are several ways to fix these issues
- If our algorithm stops we would likely start from another random state.
- We can simultaneously start from random states with more than one CPU core, And at the end, we would choose the best state.
- If we are stuck in local-maximum, local-minimum, or a flat state, We can go to a state with not a better value, hope to find a better value from other nodes. ( Note that in the classic hill climbing we were always choosing the better state )
- There is a better opinion for fix no.3 that is, At the starting age of searching, With more probability, We can go to nodes with less value more than the ending times ( or last iterations ). This is the opinion of the Simulated-Annealing algorithm.
So these were some opinions (Also algorithms made from them) to fix the hill-climbing algorithm.
Now we can learn the Genetic algorithm that is the base of a lot of algorithms.
So until here, we learned how different hill-climbing algorithms work. Here we would learn the Genetic Algorithm that is closely associated with datamining algorithms.
Genetic algorithms are based on Darwin's theory. In Darvin’s theory, In every society, the dominant organisms will remain and the weakest ones will be wiped out. In every generation, this will happen and the dominant organisms will produce children and the children would inherit the stronger property of the dominant organism, So in this way the newer generations will be the dominant generations compared to the last ones.
this was the main work of Genetic-Algorithms, I won’t go into detail because our main topic here is to get to know data mining methods better, but if you want to know better I will leave you this link.
The last question here is how Genetic-Algorithms are associated with data-mining methods?
To answer this question, We can say that in Genetic algorithms we had iterations that make better results compare to the last one, So in data mining methods we have the same work. For example, we can say that we made a model to predict house price data, When we are creating the model, It will train itself (generation iterations ) to create the best connection (a mathematical function mainly ) between house properties and the price.
So this was the last part that we learned together how Ai is working in Datamining. I would highly recommend you read the Artificial intelligence book by Peter Norvig and Stuart J. Russell and specifically chapter four to get a much better view of how these algorithms work.
If you want to find out more about how to learn and work as a data-miner you can read Data mining for business analysis by Galit Shmueli, Peter C. Bruce, Peter Gedeck, Nitil R. Patel book after you read Russell’s textbook.
Thank you for reading, If you need more instructions you can leave a comment here or contact me with this email email@example.com, I would be more than happy to help you.