Got a question that the wiki doesn't answer? Ask on the forum (preferred), or join us on IRC.

BeastNode

CommandHelper/Staged/Persistence Network

From EngineHub.org Wiki
Jump to: navigation, search


The Persistence Network allows for a flexible, efficient, and intuitive way to store your persistence data. The general idea is that your code doesn't need to know exactly where the data is stored, it simply needs to know what it's address is. Much like the DNS system, you don't need to remember 173.194.37.65, you just need to remember google.com. Persistence data is stored the same way as normal, as far as your code is concerned, but there is an extra layer of abstraction on top that allows you to customize precisely where data is stored Like the DNS system, instead of knowing precisely how or where the data is stored, you just remember its "address" (the key name). There are three factors you need to understand when dealing with the Persistence Network: connections, filters, and controls.

Connections

A connection is a read/write or read-only data source, to which persistence data is mapped. There are several supported formats, and there is the potential to add more in the future. In your configuration file, a connection can be aliased, to make re-specifying a connection easier, but the actual connection specification is a URI that maps to a specific data source. For instance, the default SQLite format is simply a pointer to a file:

sqlite:///home/data/persistence.db

There are several different connection types supported, and each has a slightly different requirement:

Type Description Example Since
INI This type stores data in plain text, in a ini style. All the pros and cons of yml apply here, but instead of using the yml style to store the data, values are stored with key=value\n signatures. Pretty print is not supported, since whitespace is relevant to the meta information. ini:///path/to/ini/file.ini 3.3.1
JSON This type stores data in JSON format. All the pros and cons of yml apply here, but instead of using the yml style to store the data, values are stored in a JSON medium. The JSON will be an array, where each namespace is its own array or value, so 'name.of.key' = 'value' would be stored as such: {"name":{"of":{"key":"value"}}}. Due to lack of support for pretty printing in the json library currently used, prettyprint is unsupported, however it is intended to be supported in the future. json:///path/to/file.json 3.3.1
MySQL This type stores data in a MySQL database. Unlike the file based systems, this is extremely efficient, but requires a database connection already set up to work. This also always allows for simultaneous connections from multiple data sink/sources at once, which is not possible without the potential for corruption in file based data sources, without risking either data corruption, or extremely low efficiency. The layout of the table in the database is required to be of a specific format:
CREATE TABLE IF NOT EXISTS `testTable` (
 -- This is an UNHEX(MD5('key')) binary hash of the unlimited length key column, so the table may have a primary key.
 `key_hash` BINARY(16) PRIMARY KEY NOT NULL,
 -- This is the key itself, stored for plaintext readability, and for full text searches for getting values
 `key` TEXT NOT NULL,
 -- The value itself, which may be null
 `value` MEDIUMTEXT
)
 -- The engine is InnoDB, to support transactions
ENGINE = InnoDB,
 -- The charset is utf8, since all keys are utf8, and values are utf8 json
CHARACTER SET = utf8,
 -- The collation is case sensitive
COLLATE = utf8_bin,
 -- Table comment
COMMENT = 'MethodScript storage table'
;
mysql://[user[:password]@]host[:port]/database/table 3.3.1
Redis This type allows a connection to a redis server. A redis server must be set up and running, and if not "localhost," it is heavily recommended to be async as well. Instructions for download and setup can be found at http://redis.io/download though Windows does not appear to be officially supported. The options in the url may be set to provide additional connection information. redis://host:port?timeout=90&password=pass 3.3.1
SQLite This type store data in a SQLite database. All the pros and cons of MySQL apply here. The database will contain a lone table, and the table should be created with the query:
CREATE TABLE IF NOT EXISTS `persistance` (`key` TEXT PRIMARY KEY, `value` TEXT)
sqlite://path/to/db/file.db 3.3.1
Serialized Persistence The default type, this simply uses java serialization to store data. Extremely simple to use, it is less scalable than database driven solutions, but for a file based solution, is relatively efficient, since it is stored as binary data. This means that it cannot be easily edited however. ser:///path/to/persistence.ser 3.0.2
Temporary Memory Creates a temporary database that exists in memory only. Since keys across databases are always unique anyways, the name for databaseName is irrelevant, but is required, so "mem:default" is a recommended configuration. There are no guarantees to how long the data will stay around (in either how short of how long the data will be kept), except that it is guaranteed that within an execution unit, that data will continue to exist. This causes it to work much like import() and export(). Data stored this way is inaccessible to external processes, because it exists only in the process's memory space. mem:databaseName 3.3.1
YML This type stores data in plain text, in a yml file. Extremely simple to use, it is less scalable than database driven solutions, and even the Serialized Persistence will perform better. However, since it is stored in plain text, it is easy to edit locally, with a plain text editor, or using other tools. yml:///path/to/yml/file.yml 3.3.1

In addition, several modifier types can be specified, which modify the connection type. They are specified as extra protocols at the start of the URI.

transient:readonly:yml://persistence.yml

In the above example, the transient and read-only flags have been added to the connection. The specific meaning of each flag is as follows, and they aren't always applicable to all connection types.

Flag Name Description
readonly Makes the connection read-only. That is to say, calls to store_data() on the keys mapped to this data source will always fail.
transient The data from this source is not cached. Note that for file based data sources, this makes it incredibly inefficient for large data sources, but makes it possible for multiple things to read and write to a source at the same time. If the connection is not read-only, a lock file will be created while the file is being written to (which will be the filename with .lock appended), which should be respected by other applications to prevent corruption. During read/write operations, if the lock file exists, the call to retrieve that data will block until the lock file goes away. File based connections that are NOT transient are loaded up at startup, and only writes require file system access from that point on. It is assumed that nothing else will be editing the data source, and so data is not re-read again, which means that leaving off the transient flag makes connections much more efficient. Database driven connections are always transient.
http Makes the connection source be retrieved via http instead of assuming a local file. Connections via http are always read-only. If the connection is also transient, a call to get_value() cannot be used in synchronous mode, and will fail if async mode is not used.
https Makes the connection source be retrieved via https instead of assuming a local file. Connections via http are always read-only. If the connection is also transient, a call to get_value() cannot be used in synchronous mode, and will fail if async mode is not used.
async Forces retrievals to this connection to require asyncronous usage. This is handy if an otherwise blocking data source has gotten too large to allow synchonous connections, or if you are using a medium/large data source transiently.
prettyprint For text based files, where it is applicable and possible, if there is a way to "Pretty Print" the data, do so. This usually comes at the cost of file size, but makes it easier to read in a text editor. For some data sources, this is not possible, due to the file layout requirements of the protocol itself.
ssh Retrieves the file via SSH. This cannot be used in combination with the HTTP or HTTPS flags. The file path must match the syntax used by SCP connections, for instance: ssh:yml://[email protected]:/path/to/file/over/ssh.yml. This will only work with public-key authentication however, since there is no practical way to input your password otherwise. Since this is a remote IO connection, async is implied if this modifier is set.

Invalid modifiers will cause a warning to be raised during startup, but will otherwise be ignored.

A note on file based URIs: The file path is specified after two forward slashes, so an absolute path on unix looks like this: yml:///path/to/file, and an absolute path on windows looks like this: yml://C:/path/to/file (alternatively yml://C:\path\to\file will also work). On all platforms, a relative path would look like this: yml://path/to/file. Additionally, file based connections are usually going to be much faster, but less reliable than SQL based connections, so it is HIGHLY recommended that you use SQL connections, if nothing else, using the zero config SQLite (which is the default). The only case for a file based connection type is when using frequently read/written data, in which case a subset of your keys may be written out to a file based protocol. The ser protocol is the fastest and most compact, but as it stores the data in a lump binary form, it is not (easily) editable by hand, and is prone to total data corruption in the event of any section of the file being corrupted. For a full rundown of the speed comparisons, see the chart below.

There are special implementation considerations you must take into account if you are writing an external system that integrates with the persistence network, (including if you edit the files by hand), so you should read up on the Persistence Network Integration guide before you attempt to edit the output files, or otherwise care about the internal storage specifications.

Connection Aliases

Often times you will want to re-use a connection, but you don't want to have to re-specify the full connection details for each filter. In this case, you can use connection aliases. A connection alias looks just like a filter, but the filter name starts with a dollar sign.

$connection=mysql://username:[email protected]:3304/database/table

Then, elsewhere, instead of rewriting the entire connection string, you may simply use $connection

Filters

Filters are what map namespaces to connections. The configuration file (persistence.ini) is used to specify the actual filters. (An example is shown below). It is important to note that the namespace conventions followed by the filter system map to the REAL namespace conventions, not the namespaces you use in code. For instance, if you were to make a call to store_value('name.of.key', 'value'), the value will actually be stored in storage.name.of.key. For a more detailed description of the namespaces, see this wiki page.

A filter is a simple regex style matcher; if a key matches this filter, it is stored via this connection. Filters are specified as such: filter=connection where connection is either a full connection URI, or an alias, and filter is a matcher as specified below. Filters are matched from best fit to worst fit, top to bottom. The following wildcards are supported:

* Any value in this namespace, but don't cross namespace boundries
** Any value in this namespace, and cross namespace boundries

If we are attempting to store a value in "storage.key.name", and we have the following two filters defined:

storage.**.name=$connection1
storage.**=$connection2

Then it would be stored in $connection1, since that is a more specific match. It is defined as a more specific match, because, minus wildcards, more namespaces match. This mechanism of filter competition allows for very specific control over what data goes where, while also not having to worry about providing specific filters for all possible namespaces. If not otherwise specified, or if the connection is invalid, The filter ** is ALWAYS defined to be a connection to the default serialized persistence file, so all otherwise unmatched keys will go there.

Controls

It is sometimes necessary to transfer data from one data source to another, in which case you can use the data manager tools to do so. This should be done while the server is off, to ensure corruption does not occur, though it is possible to do this with the server running if you're careful. To transfer data, simply specify the keys to match, and the destination connection. The data will be transferred appropriately. If a conflict would occur, you will interactively be prompted with an action. After transferring, these keys, you should update your persistence.ini file to reflect the new mappings. This is the appropriate way to make modifications to your data mappings, while ensuring that no data is lost in the process. Consider the following scenario:

We have data stored in a file, persistence.ser, and we want to change the mapping of storage.player.** over to a database. If we simply changed it in the mapping file, all the existing data would be hidden. Instead, you must export/import the individual data beforehand, then change the mappings. Instead, we can use the transfer tool.

Sometimes, however, you have data in a source that isn't currently mapped in. In this case, you want to use the merge tool. Accordingly, if you want to copy data, and not move it, you also want to use the merge tool. You can also use the data manager to show hidden data, that is, data that is stored in the data store somewhere, but isn't accessible due to bad mappings.

For more information on these tools and more, see this article.

Usage

Your code will not need to change to change where data ends up being stored. To use this feature, you simply need to change the key -> connection mappings in the persistence.ini file. In the configuration file, mappings and connection aliases are stored INI style, as shown below. Local file paths are relative to the configuration file itself.

Example

#Lines starting with # are considered comments

#These are our aliases
$sqlite=sqlite://persistence.db
$sp=ser://persistence.ser
$remote=transient:readonly:http:yml://www.example.com/data.yml

#Catch all default
**=$sp

#User based settings should be stored in the database
storage.players.**=$sqlite

#Assuming the page at www.example.com/data.yml was serving a yml file
#that got server information, we might map that accordingly
storage.server_info.**=$remote

So, now, let's go over what would happen when we run our code.

1   // First call
2   store_value('server_info.start_time'time());
3   //...
4   // Second call
5   store_value('players.' . player() . 'info'pinfo());
6   //...
7   // Third call
8   store_value('other.data', 12345);


The first call would fail, because we are trying to write to a readonly connection.

The second call would store the data in the SQLite database, stored in the file persistence.db. The key will be the full key 'storage.players.player.info' though, it does not presume that the file is inherently aware of the key prefix, even if it is unique to this file.

The third call will store the data in persistence.ser, in the Serialized Persistence format. Notice that our code doesn't care at all where data is actually being stored, or in what format, it is a routing layer on top of the global key=>value storage system.

Data Source Comparisons

This table of data was generated based on the information obtained from the following script:

01   @names = array('ini''mem''sqlite''json''ser''yml')
02   @iterations = array(10, 100, 1000, 5000)
03   foreach(@iterations@iteration){
04           foreach(@names@name){
05                   sys_out('Now profiling '.@name.' with '.@iteration.' values')
06                   @start = time()
07                   foreach(cslice(1,@iteration), @i){
08                           store_value(@name'val'.@i'value')
09                   }
10                   foreach(cslice(1,@iteration), @i){
11                           get_value(@name'val'.@i)
12                   }
13                   @stop = time()
14                   sys_out(@name.' took '.(@stop - @start).'ms to complete with '.@iteration.' values')
15           }
16   }


using the following persistence.ini:

**=sqlite://persistence.db
storage.ini.**=ini://persistence/persistence.ini
storage.sqlite.**=sqlite://persistence/persistence.db
storage.mem.**=mem://persistence/persistence.db
storage.json.**=json://persistence/persistence.json
storage.ser.**=ser://persistence/persistence.ser
storage.yml.**=yml://persistence/persistence.yml

Take what information you will from the data, and feel free to run it on your system to get actual values relevant to your system, not just relative to each other on the test system.

+-------------+--------+-------+--------+-----------+--------------+----------------------------+
| iterations: |    1   |   10  |   100  |    1000   |     5000     | File size with 5000 values |
+-------------+--------+-------+--------+-----------+--------------+----------------------------+
|     yml     |  59 ms | 15 ms | 268 ms | 1.958 sec |    40 sec    |          103.7 kB          |
+-------------+--------+-------+--------+-----------+--------------+----------------------------+
|    redis    |  24 ms |  3 ms |  17 ms |   184 ms  |    602 ms    |           Unknown          |
+-------------+--------+-------+--------+-----------+--------------+----------------------------+
|    sqlite   | 163 ms | 18 ms | 139 ms | 2.890 sec |    36 sec    |          360.4 kB          |
+-------------+--------+-------+--------+-----------+--------------+----------------------------+
|    mysql    | 160 ms | 29 ms | 185 ms | 1.588 sec |   9.770 sec  |           Unknown          |
+-------------+--------+-------+--------+-----------+--------------+----------------------------+
|     ini     |  1 ms  |  1 ms |  28 ms | 2.061 sec | 3.06 minutes |          138.9 kB          |
+-------------+--------+-------+--------+-----------+--------------+----------------------------+
|     mem     |  0 ms  |  1 ms |  2 ms  |   26 ms   |    107 ms    |             N/A            |
+-------------+--------+-------+--------+-----------+--------------+----------------------------+
|     json    |  2 ms  |  1 ms |  18 ms |   293 ms  |   5.274 sec  |          108.9 kB          |
+-------------+--------+-------+--------+-----------+--------------+----------------------------+
|     ser     |  3 ms  |  0 ms |  4 ms  |   36 ms   |    117 ms    |           31.0 kB          |
+-------------+--------+-------+--------+-----------+--------------+----------------------------+

An important observation that could be made based on this data is that SQLite is considerably slower than many of the other protocols. This is because SQLite is less prone to data corruption, and is multiprocess safe. SQLite manages its own locking and journaling systems, so it is unlikely to corrupt if a bad write occurs, or if multiple processes are accessing it at once. Due to this, it is the default storage mechanism, despite its slower runtime. The tradeoff of data protection vs. script speed vs. inter-operability is not something that can be generically decided in all cases though, so feel free to change defaults as you see fit. Each protocol has pros and cons, so you must decide which one to use.



Navigation menu