In online gambling, the poker hand is one of the most fundamental units of game state and can be treated as an object comprising all the events that pertain to a single hand played. When tens of millions of poker hands are produced daily and must be stored and analysed quickly, relational databases no longer provide adequate scalability or stable performance. The purpose of this paper is to present an efficient way of storing and retrieving poker hands in a big data environment. We propose a new, read-optimised storage model that offers significant data access improvements over traditional database systems as well as existing Hadoop file formats such as ORC, RCFile or SequenceFile. Through index-oriented partition elimination, our file format reduces the number of file splits that need to be accessed and improves query response time by up to three orders of magnitude in comparison with other approaches. In addition, our file format supports a range of new indexing structures that facilitate fast row retrieval at the split level. Both index types operate independently of the Hive execution context and allow other big data computational frameworks, such as MapReduce or Spark, to benefit from the optimised data access path to the hand information. Moreover, we present a detailed analysis of our storage model and its supporting index structures, and of how they are organised in the overall data framework. We also describe in detail how predicate-based expression trees are used to build effective file-level execution plans. Our experimental tests, conducted on a production cluster holding nearly 40 billion hands spanning over 4000 partitions, show that multi-way partition pruning outperforms existing file formats, resulting in faster query execution times and better cluster utilisation.
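As a rough illustration of how a predicate-based expression tree can drive split elimination, the following minimal sketch evaluates a query predicate against per-split value ranges and keeps only the splits that might contain matching hands. It is not the storage format or index structure proposed in the paper: the min/max statistics, the class names (`Range`, `And`, `Or`), the functions `may_match` and `prune_splits`, and the column names `hand_id` and `player_id` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Range:
    """Leaf predicate: lo <= column <= hi (hypothetical node type)."""
    column: str
    lo: int
    hi: int


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Range, And, Or]

# Assumed index layout: for each split, column name -> (min, max) value stored.
SplitStats = Dict[str, Tuple[int, int]]


def may_match(expr: Expr, stats: SplitStats) -> bool:
    """Return False only when the split provably holds no matching row."""
    if isinstance(expr, Range):
        if expr.column not in stats:
            return True  # no statistics for this column: cannot eliminate
        lo, hi = stats[expr.column]
        return expr.lo <= hi and lo <= expr.hi  # the two intervals overlap
    if isinstance(expr, And):
        return may_match(expr.left, stats) and may_match(expr.right, stats)
    if isinstance(expr, Or):
        return may_match(expr.left, stats) or may_match(expr.right, stats)
    raise TypeError(f"unsupported expression node: {expr!r}")


def prune_splits(expr: Expr, splits: Dict[str, SplitStats]) -> List[str]:
    """Keep the names of the splits that still have to be read."""
    return [name for name, stats in splits.items() if may_match(expr, stats)]


# Made-up example data: three splits with per-column value ranges.
splits = {
    "split-0": {"hand_id": (1, 1000), "player_id": (10, 50)},
    "split-1": {"hand_id": (1001, 2000), "player_id": (40, 90)},
    "split-2": {"hand_id": (2001, 3000), "player_id": (5, 20)},
}

# hand_id BETWEEN 900 AND 1500
#   AND (player_id BETWEEN 60 AND 70 OR player_id BETWEEN 1 AND 9)
query = And(
    Range("hand_id", 900, 1500),
    Or(Range("player_id", 60, 70), Range("player_id", 1, 9)),
)

print(prune_splits(query, splits))  # ['split-1']
```

The check is deliberately conservative: a split is skipped only when its value ranges rule out every matching row, so pruning can reduce the amount of data read without changing the query result.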