Main Hyper-Parameter Reference
| Name | Description |
|---|---|
|  | Activation function used in the model layers. |
|  | Number of samples per batch during training. |
|  | Whether to include bias terms in the model layers. |
|  | Dimension of convolution layers when the input and output dimensions must be the same. |
|  | Whether to concatenate the outputs from multiple heads in multi-head attention. |
|  | Dataset to be used for training or evaluation. |
|  | Dropout rate applied to the model to prevent overfitting. |
|  | Dimension of the embedding layer. |
|  | Number of training epochs. |
|  | Dimension of each attention head in multi-head attention mechanisms. |
|  | Dimension of the hidden layers within the model. |
|  | Dimension of the input data. |
|  | Learning rate for training. |
|  | Metadata of graph or tabular data, including node and edge types, and other related information. |
|  | Number of classes in the classification task. |
|  | Number of features in the dataset. |
|  | Number of attention heads in multi-head attention mechanisms. |
|  | Number of layers in the model. |
|  | Dimension of the model’s final output. |
|  | Early stopping criterion, specifying the number of epochs to wait for improvement before halting training. |
|  | Random seed for reproducibility of results. |
|  | Weight decay parameter to regularize the model. |
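
As an illustration of how a subset of these hyper-parameters might be grouped and checked before training, here is a minimal Python sketch. The class name `HyperParams`, every field name, and every default value are hypothetical choices made for this example and are not taken from the table above. The `attention_out_dim` helper assumes the common convention that concatenated heads yield `num_heads * head_dim` features while non-concatenated heads are averaged back to `head_dim`, and `should_stop` is one possible reading of the early stopping criterion.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class HyperParams:
    """Hypothetical container; field names and defaults are assumptions."""

    activation: str = "relu"      # activation function used in the model layers
    batch_size: int = 32          # number of samples per training batch
    bias: bool = True             # include bias terms in the model layers
    concat: bool = True           # concatenate multi-head attention outputs
    dropout: float = 0.5          # dropout rate
    epochs: int = 100             # number of training epochs
    head_dim: int = 8             # dimension of each attention head
    hidden_dim: int = 64          # dimension of the hidden layers
    lr: float = 1e-3              # learning rate
    num_heads: int = 4            # number of attention heads
    num_layers: int = 2           # number of layers in the model
    patience: int = 10            # epochs to wait for improvement
    seed: int = 0                 # random seed
    weight_decay: float = 0.0     # weight decay (regularization strength)

    def validate(self) -> None:
        """Raise ValueError if any value is outside a sensible range."""
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must be in [0, 1)")
        positive = ("batch_size", "epochs", "head_dim", "hidden_dim",
                    "num_heads", "num_layers", "patience")
        for name in positive:
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")
        if self.lr <= 0:
            raise ValueError("lr must be positive")
        if self.weight_decay < 0:
            raise ValueError("weight_decay must be non-negative")

    def attention_out_dim(self) -> int:
        """Attention output width under the assumed concat/average convention."""
        if self.concat:
            return self.num_heads * self.head_dim
        return self.head_dim


def should_stop(val_losses: Sequence[float], patience: int) -> bool:
    """Return True once the best validation loss is at least `patience` epochs old."""
    if not val_losses:
        return False
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch >= patience


if __name__ == "__main__":
    cfg = HyperParams(num_heads=4, head_dim=8, patience=2)
    cfg.validate()
    print(cfg.attention_out_dim())
    print(should_stop([1.0, 0.9, 0.95, 0.97], cfg.patience))
```

Keeping validation in one place means a typo such as a dropout rate above 1 surfaces before training starts rather than midway through a run. A real project would map these fields onto whatever names its own configuration uses.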