Help on how to set up OneStor native storage

Hello

I’m trying to deploy the simplest possible cluster, with 2 hosts, to test the new OneStor native storage solution.

I’ve followed the official docs (OneStor Datastore — OpenNebula 6.0.4 documentation), but it seems I’m missing something to make it work.

In summary, this is the workflow:

  1. Install ONE 6.2 on the first host with minione (one-01)
  • minione creates the initial datastores (system: 0, default: 1 and files: 2)
  2. Add a new host to the cluster (one-02) (KVM Node Installation — OpenNebula 6.0.4 documentation)
  3. Create the additional datastores needed by OneStor, as sketched right after this list:
  • 100 system_replica_cluster
  • 101 system_hosts_local
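
For reference, this is roughly how I created the two extra datastores (a sketch; the full templates are shown further down):

cat > system_replica_cluster.txt <<'EOF'
NAME         = "system_replica_cluster"
TYPE         = "SYSTEM_DS"
TM_MAD       = "ssh"
REPLICA_HOST = "one-02"
EOF
onedatastore create system_replica_cluster.txt

cat > system_hosts_local.txt <<'EOF'
NAME   = "system_hosts_local"
TYPE   = "SYSTEM_DS"
TM_MAD = "ssh"
EOF
onedatastore create system_hosts_local.txt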

This is the final configuration:

  ID NAME                                                 SIZE AVA CLUSTERS IMAGES TYPE DS      TM      STAT
 101 system_hosts_local                                      - -   0             0 sys  -       ssh     on
 100 system_replica_cluster                                  - -   0             0 sys  -       ssh     on
   2 files                                                5.8T 95% 0             0 fil  fs      ssh     on
   1 default                                              5.8T 95% 0             1 img  fs      ssh     on
   0 system                                               5.8T 95% -             0 sys  -       qcow2   off

Datastore 1 is intended to be the Image Datastore Replica (cache):

onedatastore show 1
DATASTORE 1 INFORMATION
ID             : 1
NAME           : default
USER           : oneadmin
GROUP          : oneadmin
CLUSTERS       : 0
TYPE           : IMAGE
DS_MAD         : fs
TM_MAD         : ssh
BASE PATH      : /var/lib/one//datastores/1
DISK_TYPE      : FILE
STATE          : READY

DATASTORE CAPACITY
TOTAL:         : 5.8T
FREE:          : 5.5T
USED:          : 420M
LIMIT:         : -

PERMISSIONS
OWNER          : um-
GROUP          : u--
OTHER          : ---

DATASTORE TEMPLATE
ALLOW_ORPHANS="YES"
CLONE_TARGET="SYSTEM"
CLONE_TARGET_SSH="SYSTEM"
DISK_TYPE="FILE"
DISK_TYPE_SSH="FILE"
DRIVER="qcow2"
DS_MAD="fs"
LN_TARGET="SYSTEM"
LN_TARGET_SSH="SYSTEM"
SAFE_DIRS="/var/lib/one/import"
TM_MAD="ssh"
TM_MAD_SYSTEM="ssh"
TYPE="IMAGE_DS"

Datastore 100 is intended to be the one used for replication (to keep the snapshots). From the docs: "Replication is enabled by the presence of the REPLICA_HOST key, with the name of one of the Hosts belonging to the cluster".

onedatastore show 100
DATASTORE 100 INFORMATION
ID             : 100
NAME           : system_replica_cluster
USER           : oneadmin
GROUP          : oneadmin
CLUSTERS       : 0
TYPE           : SYSTEM
DS_MAD         : -
TM_MAD         : ssh
BASE PATH      : /var/lib/one//datastores/100
DISK_TYPE      : FILE
STATE          : READY

DATASTORE CAPACITY
TOTAL:         : -
FREE:          : -
USED:          : -
LIMIT:         : -

PERMISSIONS
OWNER          : um-
GROUP          : u--
OTHER          : ---

DATASTORE TEMPLATE
ALLOW_ORPHANS="YES"
DISK_TYPE="FILE"
DS_MIGRATE="YES"
REPLICA_HOST="one-02"
SHARED="NO"
TM_MAD="ssh"
TYPE="SYSTEM_DS"

Datastore 101 is intended to be the local datastore each host uses to deploy local VMs:

onedatastore show 101
DATASTORE 101 INFORMATION
ID             : 101
NAME           : system_hosts_local
USER           : oneadmin
GROUP          : oneadmin
CLUSTERS       : 0
TYPE           : SYSTEM
DS_MAD         : -
TM_MAD         : ssh
BASE PATH      : /var/lib/one//datastores/101
DISK_TYPE      : FILE
STATE          : READY

DATASTORE CAPACITY
TOTAL:         : -
FREE:          : -
USED:          : -
LIMIT:         : -

PERMISSIONS
OWNER          : um-
GROUP          : u--
OTHER          : ---

DATASTORE TEMPLATE
ALLOW_ORPHANS="YES"
DISK_TYPE="FILE"
DS_MIGRATE="YES"
RESTRICTED_DIRS="/"
SAFE_DIRS="/var/tmp"
SHARED="NO"
TM_MAD="ssh"
TYPE="SYSTEM_DS"

With this setup, deploying a new VM works fine, but the snapshotting mechanism for automatic recovery is not working.

The docs say that to enable automatic snapshots you only need to add the RECOVERY_SNAPSHOT_FREQ option (300s in my test) to the DISK section of the VM:

...
DISK = [
  ALLOW_ORPHANS = "YES",
  CLONE = "YES",
  CLONE_TARGET = "SYSTEM",
  CLUSTER_ID = "0",
  DATASTORE = "default",
  DATASTORE_ID = "1",
  DEV_PREFIX = "vd",
  DISK_ID = "0",
  DISK_SNAPSHOT_TOTAL_SIZE = "0",
  DISK_TYPE = "FILE",
  DRIVER = "qcow2",
  FORMAT = "qcow2",
  IMAGE = "Alpine Linux 3.14",
  IMAGE_ID = "0",
  IMAGE_STATE = "2",
  LN_TARGET = "SYSTEM",
  ORIGINAL_SIZE = "256",
  READONLY = "NO",
  RECOVERY_SNAPSHOT_FREQ = "300",
  SAVE = "NO",
  SIZE = "256",
  SOURCE = "/var/lib/one//datastores/1/fca60a0588a5374cb2f9361d20e98cb9",
  TARGET = "vda",
  TM_MAD = "ssh",
  TYPE = "FILE" ]
...
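
The frequency is the only attribute I add by hand; everything else in the DISK section above is filled in by OpenNebula when the VM is instantiated. A minimal sketch of the relevant part of my VM template:

# take an automatic recovery snapshot every 300 seconds
DISK = [
  IMAGE = "Alpine Linux 3.14",
  RECOVERY_SNAPSHOT_FREQ = "300" ]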

The automation that should launch the snapshot and save it in the datastore (100, I guess) is not happening, and I don’t see anything in the logs that gives me any clue.

Going through the documentation, the configuration seems simple and I can’t find any additional steps.

I would be grateful if any colleague could give me a clue, because at the moment I have no more ideas.

Thanks a lot !!

Hi Jesus

You do not need to define 2 SYSTEM datastores; all the replication happens behind the scenes. You only need:

  1. An Image datastore (local) that holds the golden images; these images are copied (over SSH) to the hypervisors
  2. A System datastore (“distributed”) where each host uses its own storage area to hold a copy. You only need one of this type per cluster. You should see the replica snapshots and the cache that speeds up transfers on the REPLICA_HOST (look in the local /var/lib/one/datastores folder; see the check right after this list)
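
For example, you can verify it on the replica host directly (a sketch, with one-02 as your REPLICA_HOST):

# list the datastore area on the replica host; next to the numeric
# datastore folders you should find the replica snapshot/cache areas
# (e.g. replica_snaps)
ssh one-02 'ls -lh /var/lib/one/datastores/'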

Hope this helps

Hi Ruben

I have deleted the other system datastores and deployed a new VM on datastore 100, the one configured for replication, and everything works fine.

At the beginning I understood that the replication datastore was used only for replication, not for normal deployments; that was the reason I had several system datastores.

On the other hand, I realize that the replication mechanism only occurs for the VMs deployed on the datastore with REPLICA_HOST configured; the other system datastores are regular local storage. Is this right?

Lastly, I noticed that when a VM with replication is terminated, its snapshot stored in /var/lib/one/datastores/replica_snaps is still there. Is this normal? Over time the datastore could fill up with garbage.
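
For example, this is how I check what is left behind on the replica host (one-02 in my case):

# total size of the leftover recovery snapshots on the replica host
ssh one-02 'du -sh /var/lib/one/datastores/replica_snaps'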

Thank you very much for your help

Hi Jesús,

Yes, there should be a periodic cleaning of the cache so it does not fill the whole disk. And yes, the current version only uses one replica host at a time.

Cheers