This project aims to classify playing cards using deep learning techniques, specifically Convolutional Neural Networks (CNNs). The model is trained on a dataset containing images of various playing cards.
- Introduction
- Dataset Overview
- Model Architecture
- Training Process
- Model Evaluation
- Usage
- Dependencies
- Contributing
- References
In this project, we develop a CNN-based model to classify playing cards into different categories. The model is trained on a dataset consisting of images of playing cards from various decks and suits. The goal is to accurately identify the type of playing card depicted in a given image.
The dataset used for training and evaluation contains a diverse collection of playing card images. It includes images of cards from different decks, suits, and ranks. The dataset is preprocessed to ensure consistency in image size and format.
- Dataset Size: 8154 images (the sum of the train, validation, and test splits listed below)
- Classes: 53 (One class for each type of playing card)
- Image Size: 224 x 224 pixels (RGB format)
- Train-Validation-Test Split: 7624 images / 265 images / 265 images
For more information about the dataset, refer to this link.
The model architecture consists of a series of convolutional layers followed by fully connected layers. Here's an overview of the model architecture:
- The model begins with two convolutional layers (`conv1` and `conv2`), which extract features from the input images.
- Each convolutional layer is followed by a Rectified Linear Unit (ReLU) activation function (`relu1` and `relu2`) to introduce non-linearity to the model.
- Max-pooling layers (`pool1` and `pool2`) are applied after each convolutional layer to downsample the feature maps and reduce spatial dimensions.
- Following the convolutional layers, the feature maps are flattened and passed through two fully connected layers (`fc1` and `fc2`).
- The first fully connected layer (`fc1`) has 512 neurons and applies a ReLU activation function (`relu3`).
- The final fully connected layer (`fc2`) outputs logits for each class without applying an activation function.
- Input images are assumed to have three channels (RGB).
- The output layer has `num_classes` neurons, where `num_classes` represents the number of classes for classification (default: 53).
- During the forward pass, input images (`x`) undergo convolutional operations, followed by activation functions and max-pooling.
- The resulting feature maps are flattened and passed through fully connected layers to generate class logits.
Overall, the `CardClassifierCNN` architecture employs convolutional and fully connected layers to learn hierarchical representations of playing card images and make predictions based on these representations.
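The layer layout described above could be sketched in PyTorch roughly as follows. The layer names, the 512-unit `fc1`, the 3-channel input, and the default of 53 classes come from the description; the filter counts (16 and 32), the 3x3 kernels with padding 1, and the 2x2 pooling are assumptions chosen for illustration, so the actual code in the repository may differ.

```python
import torch
import torch.nn as nn


class CardClassifierCNN(nn.Module):
    """Two conv/ReLU/max-pool blocks followed by two fully connected layers."""

    def __init__(self, num_classes: int = 53):
        super().__init__()
        # Assumed hyperparameters: 16 and 32 filters, 3x3 kernels, padding 1.
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)  # 224 -> 112
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, padding=1)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)  # 112 -> 56
        # With the assumptions above, a 224x224 input flattens to 32 * 56 * 56.
        self.fc1 = nn.Linear(32 * 56 * 56, 512)
        self.relu3 = nn.ReLU()
        self.fc2 = nn.Linear(512, num_classes)  # raw logits, no activation

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool1(self.relu1(self.conv1(x)))
        x = self.pool2(self.relu2(self.conv2(x)))
        x = torch.flatten(x, start_dim=1)  # keep the batch dimension
        x = self.relu3(self.fc1(x))
        return self.fc2(x)
```

Passing a batch of shape `(N, 3, 224, 224)` through this sketch yields logits of shape `(N, num_classes)`.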
For more details, refer to the Model Architecture section in the code.
The model is trained using the Adam optimizer with the Cross-Entropy Loss function. Training is performed over multiple epochs, with early stopping implemented to prevent overfitting. Training progress and performance metrics are monitored using validation data.
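A minimal sketch of such a training loop is shown below, assuming early stopping is driven by validation loss. The function name `train_with_early_stopping`, the `patience`, `max_epochs`, and `lr` values, and the choice to restore the best weights at the end are all hypothetical; the text above only fixes the optimizer (Adam) and the loss (cross-entropy).

```python
import copy

import torch
import torch.nn as nn


def train_with_early_stopping(model, train_loader, val_loader,
                              max_epochs=50, patience=5, lr=1e-3):
    """Train with Adam + cross-entropy; stop when validation loss stalls.

    The loaders are any iterables yielding (images, labels) batches.
    Returns the list of per-epoch validation losses.
    """
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    best_val_loss = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    epochs_without_improvement = 0
    history = []

    for _ in range(max_epochs):
        model.train()
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()

        # Average validation loss over all validation samples.
        model.eval()
        total_loss, total_samples = 0.0, 0
        with torch.no_grad():
            for images, labels in val_loader:
                batch_loss = criterion(model(images), labels).item()
                total_loss += batch_loss * labels.size(0)
                total_samples += labels.size(0)
        val_loss = total_loss / total_samples
        history.append(val_loss)

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # assumed stopping rule: no improvement for `patience` epochs

    model.load_state_dict(best_state)  # restore the best checkpoint seen
    return history
```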
For detailed information about the training process, refer to the Training Process section in the code.
After training, the model is evaluated on a separate test set to assess its performance. The evaluation includes metrics such as accuracy, precision, recall, and F1-score. Additionally, qualitative assessment is performed by visualizing predictions on sample test images.
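The metrics listed above can be computed with scikit-learn along these lines. This is only a sketch: the helper name `summarize_predictions` is hypothetical, and macro averaging across classes is an assumption, since the text does not say how per-class scores are combined.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support


def summarize_predictions(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    accuracy = accuracy_score(y_true, y_pred)
    # zero_division=0 scores a class with no predicted samples as 0
    # instead of emitting a warning.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Here `y_true` and `y_pred` are sequences of integer class indices collected over the test set, for example by taking the argmax of the model's logits for each image.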
For more details, refer to the Model Evaluation section in the code.
To use the model for inference, follow these steps:
- Install the required dependencies (specified in the Dependencies section).
- Clone the repository to your local machine.
- Download the dataset and place it in the appropriate directory.
- Run the provided scripts or execute the code in your preferred environment.
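Once a trained model is available, inference on a batch of preprocessed images might look like the sketch below. The helper name `predict` is hypothetical, and the code assumes the images are already `(N, 3, 224, 224)` float tensors; loading weights and mapping class indices back to card names depend on the repository's scripts and are not shown.

```python
import torch
import torch.nn as nn


def predict(model: nn.Module, images: torch.Tensor):
    """Return the predicted class index and its softmax probability per image."""
    model.eval()  # disable training-only behaviour such as dropout
    with torch.no_grad():
        logits = model(images)
        probabilities = torch.softmax(logits, dim=1)
    confidence, predicted_class = probabilities.max(dim=1)
    return predicted_class, confidence
```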
Ensure you have the following dependencies installed:
- Python (version 3.9)
- PyTorch (version 2.1.2)
- Matplotlib
- NumPy (version 1.26.3)
- scikit-learn
Contributions to this project are welcome. Feel free to open issues, submit pull requests, or provide feedback on the existing implementation.