Most modern AI systems need labelled data to start with. The data can be anything, from an image to a database record or even a trajectory; the labels are what the AI is trained on. For example, if you want an AI that can find spoiled apples in a basket, you will need to collect a lot of pictures of apples and label the spoiled ones as “spoiled”.
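In its simplest form, a labelled dataset is just data items paired with their labels. A minimal sketch (the file names here are hypothetical placeholders):

```python
# A labelled dataset: each entry pairs a data item (here, an image
# file name) with a human-assigned label.
labelled_apples = [
    ("apple_001.jpg", "spoiled"),
    ("apple_002.jpg", "fresh"),
    ("apple_003.jpg", "spoiled"),
    ("apple_004.jpg", "fresh"),
]

# Count how many spoiled examples we collected so far.
spoiled = [name for name, label in labelled_apples if label == "spoiled"]
print(len(spoiled))  # → 2
```

In practice you would need far more examples than this, and the labelling itself is often the most expensive part of the whole project.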
What happens next is up to the developer. You could build a complex neural network that takes pictures of apples as input and outputs the number and location of spoiled apples, or you could use a different system, or even a simple equation, to detect them. Here there is complete freedom and, in my opinion, it is a little bit like cooking: you need to find a good recipe, a good combination of approaches and algorithms, to produce a good result.
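To make the "even an equation" option concrete, here is a hand-crafted rule-based detector. This is a hypothetical sketch: the `brown_fraction` input (how much of the image is brown) and the 0.3 threshold are made-up illustrations, not a real spoilage model.

```python
def spoiled_by_rule(brown_fraction: float, threshold: float = 0.3) -> bool:
    """A hand-crafted 'equation' detector: flag an apple as spoiled
    when the fraction of brown pixels exceeds a fixed threshold."""
    return brown_fraction > threshold

print(spoiled_by_rule(0.5))  # True  (mostly brown → spoiled)
print(spoiled_by_rule(0.1))  # False (mostly not brown → fresh)
```

A neural network would learn its own internal version of such rules from the labelled data, instead of having a human write them down.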
Once you have decided what your AI system looks like, you will need to fine-tune it. This is where your previously labelled data comes into play. You feed part of this labelled data to your AI system and use a learning algorithm to automatically tune its parameters. You keep tuning parameters and tweaking your system until you obtain a good result (say, the system finds spoiled apples 92% of the time).
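In a toy version of this step, the "parameter" could be a single detection threshold and the "learning algorithm" a brute-force sweep over candidate values, keeping whichever scores best on the labelled data. Everything here (the brown-fraction values, the labels, the sweep) is a hypothetical sketch of the idea, not a real training procedure:

```python
# Labelled tuning data: (brown_fraction, is_spoiled) pairs.
labelled = [(0.6, True), (0.1, False), (0.4, True), (0.2, False), (0.5, True)]

def accuracy(threshold: float, data) -> float:
    """Fraction of examples the threshold rule classifies correctly."""
    return sum((frac > threshold) == spoiled for frac, spoiled in data) / len(data)

# A crude 'learning algorithm': sweep candidate thresholds, keep the best.
best = max((t / 100 for t in range(100)), key=lambda t: accuracy(t, labelled))
print(f"tuned threshold: {best:.2f}, accuracy: {accuracy(best, labelled):.0%}")
```

Real learning algorithms (gradient descent and friends) do the same thing far more cleverly, adjusting millions of parameters at once instead of sweeping one.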
Finally, you need to validate it. You take the remaining part of the labelled data (or fresh data) and run your system without any further tuning, then evaluate how well it found spoiled apples. This is the most important test, because if you provided bad data when tuning your system (e.g. only apples in brown baskets), or if your system is not as good as you think it is, this is the stage where you will find out.
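Continuing the toy threshold detector from above, validation just means scoring the frozen system on examples it never saw during tuning. The held-out values and the 0.3 threshold are again hypothetical:

```python
# Held-out labelled data, never shown to the system during tuning.
held_out = [(0.7, True), (0.15, False), (0.35, True)]

THRESHOLD = 0.3  # frozen after tuning; no further adjustment here

# Score the system as-is on the unseen examples.
correct = sum((frac > THRESHOLD) == spoiled for frac, spoiled in held_out)
val_acc = correct / len(held_out)
print(f"validation accuracy: {val_acc:.0%}")
```

If the validation score is much lower than the tuning score, that is the classic symptom of overfitting, or of tuning data that did not represent the real world (only brown baskets, for instance).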