In Part 1 of this series, we saw one important splitting criterion for Decision Tree algorithms: Information Gain.
In this Part 2, I’m going to dwell on another splitting criterion, also based on the concept of node heterogeneity: the Gini Index. Before diving deeper into this criterion, let’s have a quick recap of what “node impurity” means, by considering the example used in the previous part:
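For reference, the Gini Index of a node depends only on the class proportions inside it; here is a minimal NumPy sketch (the toy label vectors are made up for illustration, not taken from the article’s example):

```python
import numpy as np

def gini_impurity(labels):
    """Gini Index of a node: 1 minus the sum of squared class proportions.
    It is 0 for a pure node and maximal for a perfectly mixed one."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

pure_node = gini_impurity([1, 1, 1, 1])    # a pure node: impurity 0.0
mixed_node = gini_impurity([0, 0, 1, 1])   # a 50/50 node: impurity 0.5
```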
Decision Trees are popular Machine Learning algorithms used for both regression and classification tasks. Their popularity mainly arises from their interpretability and representability, as they mimic the way the human brain makes decisions.
The mechanism behind decision trees is that of a recursive classification procedure as a function of the explanatory variables (considered one at a time) and supervised by the target variable. More specifically, this mechanism recursively splits the initial sample, along one variable at a time, into two or more subsamples.
The idea is that we start by splitting on the variable which is most able to…
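As a quick sketch of this mechanism (the dataset and hyperparameters below are arbitrary illustrative choices, not the article’s), scikit-learn’s DecisionTreeClassifier implements exactly this recursive splitting; criterion="entropy" corresponds to Information Gain:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# criterion="entropy" makes the tree split on Information Gain
# (criterion="gini", the default, uses the Gini Index instead).
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X, y)
train_accuracy = tree.score(X, y)
```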
Box Plots are very useful graphs used in descriptive statistics. Box plots visually summarize many features of numerical data by displaying their summary statistics, like the median, the quartiles, and so forth.
Visually speaking, a Box Plot looks like the following:
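The statistics a box plot draws can be computed directly; the sketch below (with a made-up sample) recovers the median, the quartiles, and the whisker endpoints under the common 1.5 × IQR rule:

```python
import numpy as np

# A made-up sample; 30 is an intentional outlier.
data = np.array([2, 4, 4, 5, 7, 8, 9, 11, 12, 15, 30])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1  # interquartile range: the height of the box

# Whiskers reach the most extreme points within 1.5 * IQR of the box;
# anything beyond (here, 30) would be drawn as an individual outlier.
lower_whisker = data[data >= q1 - 1.5 * iqr].min()
upper_whisker = data[data <= q3 + 1.5 * iqr].max()
```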
Announced in September 2020 at Microsoft Ignite, Azure Defender has been presented as the evolution of Azure Security Center (ASC). As such, it extends the latter with new features, powered by Microsoft Threat Intelligence, and it offers homogeneous control and security management even across heterogeneous organizations (with assets that might live on-premises, on Azure, and on other cloud platforms at the same time).
In this article, I’m going to introduce you to Azure Defender, showing how to activate it to cover your active subscriptions and services and, finally, how to connect it to Azure Sentinel, the Azure-native SIEM/SOAR.
Artificial Intelligence and its applications have opened innovative paths in societies and organizations. Today we can simultaneously get a transcript, in any language, of someone’s speech, identify individuals via smart cameras, and so on.
Yet it all comes with a price: time and resources. Indeed, training the algorithms behind any AI system is no joke: especially in the computer vision field, where the algorithms of choice are Neural Networks, training could take days. Plus, in order to parallelize and speed up the process, one also needs special hardware powered by…
In the previous articles of this series, we introduced some techniques to deal with imbalanced data in binary classification tasks. Part 1 examined some resampling techniques; Part 2 focused on how to modify the algorithm by changing the threshold value.
In this Part 3, we are going to dwell once more on a way to intervene directly on the algorithm, yet this time on its loss function.
The idea is similar to that of the cutoff value examined in Part 2. Basically, given the following unbalanced situation:
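To make the idea concrete: one common way to intervene on the loss function is to weight each class’s errors differently, so that mistakes on the rare class cost more. A minimal sketch of a class-weighted binary cross-entropy (the weights and toy predictions are made up for illustration, not taken from the article):

```python
import numpy as np

def weighted_log_loss(y_true, p_pred, w_pos, w_neg):
    """Binary cross-entropy where each class contributes with its own weight.
    Up-weighting the minority (positive) class makes its errors cost more."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), 1e-12, 1 - 1e-12)
    losses = -(w_pos * y * np.log(p) + w_neg * (1 - y) * np.log(1 - p))
    return losses.mean()

# Same predictions, but missing the rare positive is penalized 10x as much.
plain = weighted_log_loss([1, 0, 0, 0], [0.3, 0.1, 0.1, 0.1], 1.0, 1.0)
weighted = weighted_log_loss([1, 0, 0, 0], [0.3, 0.1, 0.1, 0.1], 10.0, 1.0)
```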
In Part 1 of this series of articles, I introduced the curse of class imbalance in binary classification tasks and some remedies to address it. More specifically, I focused on how to intervene directly on the dataset with different sampling techniques in order to make it more balanced.
In this article, I’m going to dwell on one of a set of techniques that are applied directly to the training procedure rather than to the dataset. In particular, those techniques are:
Let’s examine the first one (the next ones will…
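One technique in this family is moving the classification threshold away from the default 0.5; a minimal sketch (the scores and the 0.3 cutoff below are made up for illustration):

```python
import numpy as np

def classify(scores, threshold=0.5):
    """Label as positive every instance whose score reaches the threshold."""
    return (np.asarray(scores) >= threshold).astype(int)

scores = np.array([0.15, 0.35, 0.45, 0.55, 0.80])

default = classify(scores)        # standard 0.5 cutoff
lowered = classify(scores, 0.3)   # a lower cutoff catches more positives
```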
Whenever we initialize a task for a Machine Learning model, the very first thing to do is to analyze and reason about the data we are provided with and will be using for training/testing purposes. Indeed, it is often the case that, even before thinking about the model to use, we might need to re-architect the dataset, or at least incorporate some features into the training, to deal with the initial data conditions.
One of those conditions is that of unbalanced data, and in this article, I’m going to focus on unbalanced datasets within binary classification tasks.
We face an imbalance in…
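As a toy illustration of what an unbalanced binary dataset can look like (the class counts below are invented):

```python
import numpy as np

# A made-up unbalanced binary dataset: 95 negatives, only 5 positives.
y = np.array([0] * 95 + [1] * 5)

counts = np.bincount(y)                        # instances per class
imbalance_ratio = counts.max() / counts.min()  # majority vs. minority size
```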
Whenever we train a Machine Learning model, we need a set of tools able to give us an idea of how well our algorithm has performed. In the case of binary classification, the evaluation of performance can be pretty simple: we can directly count how many instances have been correctly labeled by our model. Ideally, we want to maximize this number.
Nevertheless, it is often not possible to do so without facing a trade-off in terms of other metrics. …
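The direct count described above amounts to accuracy; a minimal sketch with made-up labels:

```python
import numpy as np

# Made-up ground truth and model predictions for 8 instances.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Directly count how many instances were labeled correctly.
correct = int((y_true == y_pred).sum())
accuracy = correct / y_true.size
```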
In my previous article, I’ve been talking about Autoencoders and their applications, especially in the Computer Vision field.
Generally speaking, an autoencoder (AE) learns to represent some input information (in our case, input images) by compressing it into a latent space, and then reconstructing the input from its compressed form into a new, auto-generated output image (again in the original domain space).
In this article, I’m going to focus on a particular class of Autoencoders introduced a bit later: Variational Autoencoders (VAEs).
VAEs are a variation of AEs in the sense that their main…
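For context (this detail goes beyond the excerpt above, but is standard VAE background): a VAE’s encoder outputs a distribution over the latent space rather than a single point, and sampling from it is typically done with the reparameterization trick. A minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), so the draw stays
    differentiable with respect to the encoder outputs mu and log_var."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

# The encoder predicts a distribution (mu, log_var) for each input,
# not a single latent point as a plain AE would.
mu = np.zeros(4)
log_var = np.zeros(4)  # log_var = 0 means unit variance
z = reparameterize(mu, log_var)
```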
Cloud Specialist at @Microsoft | MSc in Data Science | Machine Learning, Statistics and Running enthusiast