How to Add Fault Tolerance to the Control Plane

This tutorial and the FT module were designed by Tulio Alberton Ribeiro of the LaSIGE - Large-Scale Informatics Systems Laboratory. Thanks Tulio!

Using the Included FT Module

If you only want to use the FT module as written, and are not interested in developing or understanding the code, you only need to complete this section. That said, going through the entire tutorial will probably be beneficial.

Creating the Keystore

The first thing you need to do is generate the key used in challenge-response authentication, as follows:

# keytool -genkey -alias AliasChallengeResponse -keystore myKey.jceks -keypass "YourPassWord" -storepass "YourPassWord" -storetype JCEKS

The alias passed to keytool is currently hard-coded. It is used by the CryptoUtil class, located at floodlight/src/main/java/org/sdnplatform/sync/internal/util/CryptoUtil.java:

public static final String CHALLENGE_RESPONSE_SECRET = "AliasChallengeResponse";

This means the -alias value passed to keytool must match this constant: the value of CHALLENGE_RESPONSE_SECRET is used to retrieve the key from the keystore. The alias therefore needs to be "AliasChallengeResponse", unless you change it in both places (the keytool command and the CHALLENGE_RESPONSE_SECRET constant).
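
To illustrate why the alias has to match, here is a minimal sketch of an alias lookup using only the JDK's KeyStore API. It is not the CryptoUtil code; the class name KeystoreAliasCheck and both method names are hypothetical, and the sketch assumes the key password equals the store password, as in the keytool command above. The helper writeDemoKeystore exists only so the lookup can be tried without a real keystore.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyStore;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper, not part of Floodlight.
final class KeystoreAliasCheck {
    // Same value as CryptoUtil.CHALLENGE_RESPONSE_SECRET quoted above.
    static final String ALIAS = "AliasChallengeResponse";

    /** Loads a JCEKS keystore and returns the key stored under the alias, or null if absent. */
    static Key readKey(Path keystore, String alias, char[] password) {
        try (InputStream in = Files.newInputStream(keystore)) {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(in, password);
            // Assumption: key password == store password.
            return ks.getKey(alias, password);
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Cannot read keystore " + keystore, e);
        }
    }

    /** Demo only: writes a throwaway JCEKS keystore with one secret key under the alias. */
    static Path writeDemoKeystore(String alias, char[] password) {
        try {
            KeyStore ks = KeyStore.getInstance("JCEKS");
            ks.load(null, password); // start from an empty keystore
            byte[] raw = new byte[16]; // all-zero demo key, never use for real traffic
            ks.setKeyEntry(alias, new SecretKeySpec(raw, "AES"), password, null);
            Path file = Files.createTempFile("demo", ".jceks");
            try (OutputStream out = Files.newOutputStream(file)) {
                ks.store(out, password);
            }
            return file;
        } catch (IOException | GeneralSecurityException e) {
            throw new IllegalStateException("Cannot write demo keystore", e);
        }
    }
}
```

Calling readKey with an alias other than the one used at generation time returns null, which is the situation you would hit if the keytool alias and the constant drifted apart.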

Testing the Keystore

After key generation, you can test the accessibility of your keystore:

# keytool -list -alias AliasChallengeResponse -keystore myKey.jceks -storetype JCEKS
Enter keystore password: YourPassWord /* this should be your password as defined above */
AliasChallengeResponse, 24/Mar/2016, PrivateKeyEntry,
Certificate fingerprint (SHA1): A2:1B:49:1B:18:D8:DC:95:CC:9F:C3:33:94:04:39:EE:44:DD:CF:BE

Defining Controllers

The Primary Controller

First, note that net.floodlightcontroller.core.internal.FloodlightProvider.controllerId and org.sdnplatform.sync.internal.SyncManager.thisNodeId should be set to the same value on a given controller. Note also that every switch defined in net.floodlightcontroller.core.internal.OFSwitchManager.switchesInitialState should be set to ROLE_MASTER on the primary controller and ROLE_SLAVE on the backup controller.

(PRIMARY CONTROLLER) The floodlightdefault.properties file should be defined as follows:
org.sdnplatform.sync.internal.SyncManager.authScheme=CHALLENGE_RESPONSE
org.sdnplatform.sync.internal.SyncManager.keyStorePath=/etc/floodlight/myKey.jceks
org.sdnplatform.sync.internal.SyncManager.dbPath=/var/lib/floodlight/
org.sdnplatform.sync.internal.SyncManager.keyStorePassword=YourPassWord
org.sdnplatform.sync.internal.SyncManager.port=6642
org.sdnplatform.sync.internal.SyncManager.thisNodeId=1
org.sdnplatform.sync.internal.SyncManager.persistenceEnabled=FALSE
org.sdnplatform.sync.internal.SyncManager.nodes=[\
{"nodeId": 1, "domainId": 1, "hostname": "192.168.1.100", "port": 6642},\
{"nodeId": 2, "domainId": 1, "hostname": "192.168.1.100", "port": 6643}\
]
 
net.floodlightcontroller.core.internal.FloodlightProvider.controllerId=1
net.floodlightcontroller.core.internal.OFSwitchManager.switchesInitialState={"00:00:00:00:00:00:00:01":"ROLE_MASTER","00:00:00:00:00:00:00:02":"ROLE_MASTER", "00:00:00:00:00:00:00:03":"ROLE_MASTER", "00:00:00:00:00:00:00:04":"ROLE_MASTER","00:00:00:00:00:00:00:05":"ROLE_MASTER"}

The Backup Controller

Note that thisNodeId and controllerId are both set to 2 in this case (they must differ from the value 1 used above for the primary). Also note that the switch roles defined in switchesInitialState are set to ROLE_SLAVE, as this is the backup controller.

(BACKUP CONTROLLER) The floodlightNodeBackup.properties file should be defined as follows:
org.sdnplatform.sync.internal.SyncManager.authScheme=CHALLENGE_RESPONSE
org.sdnplatform.sync.internal.SyncManager.keyStorePath=/etc/floodlight/key2.jceks
org.sdnplatform.sync.internal.SyncManager.dbPath=/var/lib/floodlight2/
org.sdnplatform.sync.internal.SyncManager.keyStorePassword=PassWord
org.sdnplatform.sync.internal.SyncManager.port=6643
org.sdnplatform.sync.internal.SyncManager.thisNodeId=2
org.sdnplatform.sync.internal.SyncManager.persistenceEnabled=FALSE
org.sdnplatform.sync.internal.SyncManager.nodes=[\
{"nodeId": 1, "domainId": 1, "hostname": "192.168.1.100", "port": 6642},\
{"nodeId": 2, "domainId": 1, "hostname": "192.168.1.100", "port": 6643}\
]
 
net.floodlightcontroller.core.internal.FloodlightProvider.controllerId=2
net.floodlightcontroller.core.internal.OFSwitchManager.switchesInitialState={"00:00:00:00:00:00:00:01":"ROLE_SLAVE","00:00:00:00:00:00:00:02":"ROLE_SLAVE", "00:00:00:00:00:00:00:03":"ROLE_SLAVE", "00:00:00:00:00:00:00:04":"ROLE_SLAVE","00:00:00:00:00:00:00:05":"ROLE_SLAVE"}
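
The two consistency rules stated above (controllerId equals thisNodeId, and a single role per file) can be checked mechanically before starting a controller. The following is a minimal sketch using java.util.Properties, not something Floodlight ships; the class FtConfigCheck and its methods are hypothetical names, and the role check is a plain substring test rather than a real JSON parse.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Floodlight.
final class FtConfigCheck {
    static final String CONTROLLER_ID =
            "net.floodlightcontroller.core.internal.FloodlightProvider.controllerId";
    static final String NODE_ID =
            "org.sdnplatform.sync.internal.SyncManager.thisNodeId";
    static final String INITIAL_STATE =
            "net.floodlightcontroller.core.internal.OFSwitchManager.switchesInitialState";

    /** Parses properties text; trailing backslashes continue a value on the next line. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** True when controllerId and thisNodeId are both present and equal. */
    static boolean idsMatch(Properties props) {
        String controllerId = props.getProperty(CONTROLLER_ID);
        String nodeId = props.getProperty(NODE_ID);
        return controllerId != null && nodeId != null
                && controllerId.trim().equals(nodeId.trim());
    }

    /** True when switchesInitialState mentions the expected role and never the other one. */
    static boolean usesOnlyRole(Properties props, String expected, String other) {
        String state = props.getProperty(INITIAL_STATE, "");
        return state.contains(expected) && !state.contains(other);
    }
}
```

For the primary file you would expect idsMatch(props) and usesOnlyRole(props, "ROLE_MASTER", "ROLE_SLAVE") to hold; for the backup file, swap the two role arguments.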

Running the Module

To run the FT module, make sure it is included in the list of modules to load in floodlightdefault.properties, save the file, and run the controller. It's as simple as that.

Using the Sync Service – A Developer's Guide

This is not a step-by-step guide. The reader is expected to be comfortable with Java and with writing Floodlight modules (tutorial here). You can follow along using the SimpleFT code here.

Initialize

To use the sync service, we need to create two variables: an ISyncService (the IFloodlightService we'll leverage) and an IStoreClient (our module's handle as a user of the sync service):

// Fields of the module class
private ISyncService syncService;
private IStoreClient<String, String> storeFT;

// Inside the module's init(FloodlightModuleContext context) method
this.syncService = context.getServiceImpl(ISyncService.class);

Next, we need to register our store with global scope, which allows us to sync it with other controllers and receive remote updates:

try {
	this.syncService.registerStore("NameOfMyStore", Scope.GLOBAL);
	this.storeFT = this.syncService.getStoreClient("NameOfMyStore",
					String.class,
					String.class);
	this.storeFT.addStoreListener(this);
} catch (SyncException e) {
	throw new FloodlightModuleException("Error while setting up sync service", e);
}

Read/Write Data Operations

To add data to our store:

try {
	this.storeFT.put("Key Y", "Data X");
} catch (SyncException e) {
	e.printStackTrace();
}

To retrieve data from our store:

try {
	// The store is typed <String, String>, so getValue() already yields a String.
	String value = this.storeFT.get("Key Y").getValue();
} catch (SyncException e) {
	e.printStackTrace();
}

Finally, if we wish to monitor our store, our module or monitoring class must implement the IStoreListener<String> interface. (In this case the store uses the String type, but this can vary if you wish to store other types.)

Receiving Updates to the Store (from a Remote Controller)

In the example below, we show how our module can receive and process store updates from remote controllers using the keysModified() callback. If you wish, you can uncomment the debug logging code to see both local and remote updates to your sync store.

@Override
public void keysModified(Iterator<String> keys, org.sdnplatform.sync.IStoreListener.UpdateType type) {

	while(keys.hasNext()){
		String k = keys.next();
		try {
			/*
			logger.debug("keysModified: Key:{}, Value:{}, Type: {}", 
					new Object[] {
							k, 
							storeFT.get(k).getValue().toString(), 
							type.name()
						}
					);
			*/
			// Only react to updates that came from another controller.
			if(type.name().equals("REMOTE")){
				String value = storeFT.get(k).getValue();
				logger.debug("REMOTE: Key:{}, Value:{}", k, value);
			}
		} catch (SyncException e) {
			e.printStackTrace();
		}
	}
}
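
The callback above tests the update type by comparing its name as a string. As a hedged alternative, the sketch below compares enum constants directly and collects the keys of remote updates. So that it compiles without Floodlight, it declares its own UpdateType enum as a stand-in for IStoreListener.UpdateType, assumed here to have the constants LOCAL and REMOTE; the class RemoteKeyFilter and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of Floodlight.
final class RemoteKeyFilter {
    /** Stand-in for IStoreListener.UpdateType (assumed constants). */
    enum UpdateType { LOCAL, REMOTE }

    /** Returns the keys of a REMOTE notification; a LOCAL notification yields an empty list. */
    static List<String> remoteKeys(Iterator<String> keys, UpdateType type) {
        List<String> result = new ArrayList<>();
        if (type != UpdateType.REMOTE) {
            return result; // ignore updates this controller wrote itself
        }
        while (keys.hasNext()) {
            result.add(keys.next());
        }
        return result;
    }
}
```

Comparing enum constants with == or != avoids a silent mismatch if the string literal is mistyped, since a wrong constant name fails at compile time.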

FT Implementation Details

The FT class defines an RPCListener to monitor the RPC connections among the controllers in the cluster and to inform all synced nodes about connect and disconnect events.

When a controller boots up and connects, the SimpleFT module inserts a list of its switches and their roles into the store. When a controller disconnects, the module gets the disconnected node's switch list from the store and sets all of the disconnected controller's switches to MASTER.
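
The connect and disconnect handling described above can be sketched with plain Java collections. This is an illustration of the logic only, not the SimpleFT source: an in-memory map stands in for the synced store, the class FailoverSketch and its methods are hypothetical, and the default role ROLE_SLAVE in roleOf is an assumption made from the backup controller's point of view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the SimpleFT implementation.
final class FailoverSketch {
    // Stand-in for the synced store: node ID -> (switch DPID -> role announced by that node).
    private final Map<Integer, Map<String, String>> store = new HashMap<>();
    // Roles held by the local controller, keyed by switch DPID.
    private final Map<String, String> localRoles = new HashMap<>();

    /** A controller connected: record its switches and their roles in the store. */
    void onNodeConnected(int nodeId, Map<String, String> switchRoles) {
        store.put(nodeId, new HashMap<>(switchRoles));
    }

    /** A controller disconnected: look up its switches and take them over as MASTER. */
    List<String> onNodeDisconnected(int nodeId) {
        Map<String, String> announced = store.getOrDefault(nodeId, Collections.emptyMap());
        List<String> takenOver = new ArrayList<>(announced.keySet());
        Collections.sort(takenOver); // stable order for logging and tests
        for (String dpid : takenOver) {
            localRoles.put(dpid, "ROLE_MASTER");
        }
        return takenOver;
    }

    /** Local role for a switch; assumed to be ROLE_SLAVE until a takeover happens. */
    String roleOf(String dpid) {
        return localRoles.getOrDefault(dpid, "ROLE_SLAVE");
    }
}
```

In the real module the switch list travels through the sync store, so every surviving controller can see what the failed node was managing; the map here only mimics that lookup on a single machine.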

More Information

Write to our email list.

The source code is located on GitHub here.