Exploring Hub by Activeloop: A Tale of Uploading Pokemon Data
What is Hub?
If you know Docker, you know it manages different kinds of applications by packaging each one in an isolated container. Containers are very useful because you can run them on any platform or OS, whatever its configuration.
In the image above, you can see that you write a Dockerfile containing your application's configuration and use it to build a Docker image. That image is hosted on Docker Hub, and you can pull it onto your machine and run it directly without installing any software other than Docker.
This is where Hub by Activeloop comes into play. Machine learning, deep learning and AI rely heavily on data, and that data is often unstructured and disorganized. Data scientists and machine learning engineers spend around 70% of their time preparing it, and the rest on building, deploying and testing models.
Hub aims to minimize the time spent preparing data and lets you use it efficiently as NumPy arrays that can be loaded onto your machine in about 2 seconds.
Hub uses a schema for the dataset to build a meta.json file, much like a Dockerfile, and while uploading it converts each data item and label into a NumPy array. These arrays are stored in the cloud, and you can use them from any device, just like Docker images.
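To make the schema idea a bit more concrete, here is a toy illustration of "a schema describes the fields, and each sample is converted to NumPy arrays". It is a hypothetical sketch of my own: build_meta, to_arrays and the toy_schema layout are made-up names, and the JSON it produces is NOT Hub's real meta.json format.

```python
import json
import numpy as np

# Hypothetical, simplified schema: field name -> dtype and shape.
# This is only an illustration, not the format Hub actually writes.
toy_schema = {
    "image": {"dtype": "uint8", "shape": [120, 120, None], "max_shape": [120, 120, 4]},
    "labels": {"dtype": "int64", "shape": []},
}

def build_meta(schema, num_samples):
    """Serialize the dataset length and schema into a JSON string (a toy 'meta.json')."""
    return json.dumps({"shape": [num_samples], "schema": schema}, indent=2)

def to_arrays(sample, schema):
    """Convert every field of one sample to a NumPy array with the schema's dtype."""
    return {name: np.asarray(sample[name], dtype=spec["dtype"]) for name, spec in schema.items()}

meta = build_meta(toy_schema, num_samples=809)
sample = to_arrays({"image": np.zeros((120, 120, 3)), "labels": 4}, toy_schema)
print(sample["image"].dtype, sample["image"].shape, sample["labels"])
```

The point is only the division of labour: the schema is written once as metadata, while the data itself travels as typed arrays.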
Let's see how to install Hub:
Hub is a Python package hosted on PyPI, which you can install with a single command:
pip install hub
That's all it takes to install Hub.
How I uploaded my Pokemon to Hub:
The previous two sections cover the basics; from here the tale begins. I came across Activeloop and Hub during Hacktoberfest while searching for repositories to contribute to. Once I found it, I started thinking about how to use it and contribute to this awesome community, and a few days later I joined the activeloop/hub Slack workspace, which is a good place to start contributing to machine learning projects.
After learning how to use Hub, I started looking for a suitable dataset to upload and found a good Pokemon dataset. Pokemon is my all-time favourite cartoon, so I wanted to learn Hub in a fun way and use it for my own projects.
The dataset I used is from GitHub: rileynwong/pokemon-images-dataset-by-type, a dataset of Pokemon images sorted by primary type. With a suitable dataset in hand, I was ready to get started.
Importing the libraries and preparing the schema for the dataset:
First, I imported the libraries needed for reading and formatting the data: numpy, opencv (cv2) and hub, plus os for walking the folders.
The code snippet is as follows:
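import os                                 # walking the dataset folders
import numpy as np                        # image arrays
from cv2 import imread                    # reading images with OpenCV
from hub import Dataset                   # Hub dataset class
from hub.schema import ClassLabel, Image  # schema types for labels and images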
With the initial setup ready, I took a look at the dataset's structure. It consists of 19 folders, but not all of them are needed; the folders I used are listed in the code below:
all_labels = ['bug', 'dark', 'dragon', 'electric', 'fairy', 'fighting', 'fire', 'flying', 'ghost', 'grass', 'ground', 'ice', 'normal', 'poison', 'psychic', 'rock', 'steel', 'water']
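Since only 18 of the 19 folders are Pokemon types, one way to avoid picking up the extra entry is to filter the directory listing against all_labels. The helper below is a hypothetical sketch (select_type_folders is my own name, and the example listing with 'README.md' and 'extra' is made up); in the real script you would pass it os.listdir(root).

```python
# The 18 primary types used as labels
all_labels = ['bug', 'dark', 'dragon', 'electric', 'fairy', 'fighting', 'fire', 'flying',
              'ghost', 'grass', 'ground', 'ice', 'normal', 'poison', 'psychic', 'rock',
              'steel', 'water']

def select_type_folders(entries, labels):
    """Hypothetical helper: return the sorted subset of `entries` that are known type labels."""
    wanted = set(labels)
    return sorted(e for e in entries if e in wanted)

# Made-up listing: 'README.md' and 'extra' stand in for non-type entries
print(select_type_folders(['water', 'README.md', 'bug', 'extra', 'fire'], all_labels))
# → ['bug', 'fire', 'water']
```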
With the labels ready, I needed to define the schema for uploading the data to Hub. I couldn't get it right on the first try, but with the help of Abhinav Tuli I finally made it work. It looks like this:
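# The 18 type names become the class labels
classlabel = ClassLabel(names=['bug', 'dark', 'dragon', 'electric', 'fairy', 'fighting', 'fire', 'flying', 'ghost', 'grass', 'ground', 'ice', 'normal', 'poison', 'psychic', 'rock', 'steel', 'water'])

schema = {
    "labels": classlabel,
    # 120 x 120 images; None keeps the channel count dynamic, capped at 4 by max_shape
    "image": Image(
        dtype="uint8",
        shape=(120, 120, None),
        max_shape=(120, 120, 4),
    ),
}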
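Here, the labels field uses the ClassLabel built from the list of type names, and each image is 120 x 120 pixels with a variable number of channels (None), up to a maximum of 4 (for example RGBA).
Next comes reading the images and loading them into memory.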
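My understanding is that ClassLabel stores each label as an integer ID equal to the name's position in the names list, which is what classlabel.str2int relies on later during the upload. The class below is a hypothetical pure-Python stand-in (SimpleClassLabel is my own name, not part of Hub) that sketches that mapping.

```python
class SimpleClassLabel:
    """Hypothetical stand-in for ClassLabel: maps names <-> integer IDs by list position."""

    def __init__(self, names):
        self.names = list(names)
        self._index = {name: i for i, name in enumerate(self.names)}

    def str2int(self, name):
        # Raises KeyError for a name that is not in the list
        return self._index[name]

    def int2str(self, idx):
        return self.names[idx]

    @property
    def num_classes(self):
        return len(self.names)

pokemon_types = SimpleClassLabel(['bug', 'dark', 'dragon', 'electric', 'fairy', 'fighting',
                                  'fire', 'flying', 'ghost', 'grass', 'ground', 'ice',
                                  'normal', 'poison', 'psychic', 'rock', 'steel', 'water'])
print(pokemon_types.str2int('fire'), pokemon_types.int2str(17))  # 6 water
```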
Loading images into lists and uploading to Hub:
I wrote the following code to load the images into memory. Take a look:
store = []        # images as NumPy arrays
store_label = []  # type name (folder name) for each image
store_name = []   # file names, handy for debugging

root = '/home/debo/uploadhub/pokemon-images-dataset-by-type'

for i in sorted(os.listdir(root)):
    # Skip any folder (or file) that is not one of the 18 type labels
    if i not in all_labels:
        continue
    for j in sorted(os.listdir(os.path.join(root, i))):
        # cv2.imread returns a 3-channel array in BGR order by default
        image = imread(os.path.join(root, i, j))
        store.append(np.asarray(image))
        store_label.append(i)
        store_name.append(j)

# classlabel and schema are the ones defined above

ds = Dataset(
    "darkdebo/pokemon_data",
    mode="w+",
    schema=schema,
    shape=(len(store),),  # one entry per image (809 in this dataset)
    cache=2**26,
)
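The lists store and store_label hold the images (as NumPy arrays) and their type names, respectively, while store_name keeps the file names for debugging. The labels are kept as strings here and converted to numerical class IDs with classlabel.str2int during the upload. Loading every image into memory first is simple but not the most memory-efficient approach; for a larger dataset you could write each image to the dataset as soon as it is read.
With that done, here is the final code to upload the data to Hub: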
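How much memory does the in-memory approach actually cost here? Assuming uint8 images of 120 x 120 with 3 channels (what cv2.imread returns by default) and the 809 images mentioned above, a quick back-of-the-envelope calculation is enough. dataset_bytes is just a helper name I chose for this sketch.

```python
def dataset_bytes(num_images, height, width, channels, bytes_per_value=1):
    """Raw size in bytes of num_images arrays of shape (height, width, channels)."""
    return num_images * height * width * channels * bytes_per_value

rgb = dataset_bytes(809, 120, 120, 3)   # 3 channels, as cv2.imread returns by default
rgba = dataset_bytes(809, 120, 120, 4)  # the max_shape limit of 4 channels
print(rgb, round(rgb / 2**20, 1))    # 34948800 33.3
print(rgba, round(rgba / 2**20, 1))  # 46598400 44.4
```

Roughly 33 MiB is fine for this dataset, but the cost grows linearly with the number and size of images, which is why streaming becomes attractive for bigger datasets.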
print("uploading...")
for i, (image, label) in enumerate(zip(store, store_label)):
    ds['image', i] = image
    # Convert the type name (e.g. 'fire') to its integer class ID
    ds['labels', i] = classlabel.str2int(label)
ds.commit()  # commit the dataset to Hub
print("uploaded successfully")
This commits the dataset to Hub, and it is now available from anywhere for every Pokemon fan and machine learning practitioner.
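Before running the upload loop above, a quick check that every loaded image fits the schema can save you from an upload that fails halfway through. This is a hypothetical helper of my own, not part of Hub; the limits simply mirror the Image(shape=(120, 120, None), max_shape=(120, 120, 4), dtype="uint8") schema defined earlier, and unreadable files (for which cv2.imread returns None) are flagged too.

```python
import numpy as np

def find_schema_mismatches(images, height=120, width=120, max_channels=4):
    """Return the indices of images whose shape or dtype does not fit the image schema."""
    bad = []
    for i, img in enumerate(images):
        arr = np.asarray(img)  # None becomes a 0-d object array and fails the ndim check
        ok = (
            arr.ndim == 3
            and arr.shape[0] == height
            and arr.shape[1] == width
            and 1 <= arr.shape[2] <= max_channels
            and arr.dtype == np.uint8
        )
        if not ok:
            bad.append(i)
    return bad

# Demo with synthetic arrays; in the real script you would pass `store`
demo = [np.zeros((120, 120, 3), np.uint8), np.zeros((96, 96, 3), np.uint8), None]
print(find_schema_mismatches(demo))  # [1, 2]
```

With the real lists, [store_name[i] for i in find_schema_mismatches(store)] would give the file names to inspect.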
Some Words
You can check out the dataset at https://app.activeloop.ai/dataset/darkdebo/pokemon_data.
Here are some screenshots of the visualisation tool:
I think the next part of this blog will cover how to load the data using Hub and build an ML model for a task. For now, listen to pokecito on YouTube and check out these links:
Activeloop documentation: https://docs.activeloop.ai/en/latest/
Activeloop website: https://www.activeloop.ai/
GitHub repository of Hub: https://github.com/activeloopai/hub