“This document is the Accepted Manuscript version of a Published Work that appeared in final form in “Managing the Computational Chemistry Big Data problem:
the ioChem-BD platform”, copyright © American Chemical Society after peer review
and technical editing by the publisher. To access the final edited and published work
see http://pubs.acs.org/doi/abs/10.1021/ci500593j.”

Managing the Computational Chemistry Big Data
problem: the ioChem-BD platform
M. Álvarez-Moreno, 1,2,* C. de Graaf, 2,4 N. López,1 F. Maseras,1,3 J. M. Poblet,2 C. Bo, 1,2,*
1 Institute of Chemical Research of Catalonia, ICIQ, Av. Països Catalans 16, 43007, Tarragona, Catalonia, Spain
2 Department of Physical and Inorganic Chemistry, Universitat Rovira i Virgili, C/ Marcel·lí Domingo s/n, 43007, Tarragona, Catalonia, Spain
3 Department of Chemistry, Universitat Autònoma de Barcelona, 08193, Bellaterra, Catalonia, Spain
4 Catalan Institution for Research and Advanced Studies, ICREA, Passeig Lluis Companys 23, 08010, Barcelona, Catalonia, Spain
KEYWORDS: Big Data, Chemical Markup Language, XML, XSLT, HTML5, semantic data, digital repository, Density Functional Theory, simulations, catalysis.

ABSTRACT: We present the ioChem-BD platform as a multi-headed tool aimed at managing large volumes of quantum chemistry
results from a diverse group of widely used simulation packages. The platform has an extensible structure, whose key modules
handle the main tasks: (i) uploading output files from common computational chemistry packages, (ii) extracting meaningful data
from the results, and (iii) generating output summaries in user-friendly formats. The intermediate files used by ioChem-BD rely heavily
on the Chemical Markup Language (CML). From these files, using XSL techniques, we manipulate and transform
such chemical datasets to fulfill researchers’ needs in the form of HTML5 reports, supporting information and other research media.

1. INTRODUCTION
Intensive, high performance computing is one of the pillars
to accelerate materials discovery and development in many
fields of science and engineering, most prominently Chemistry, Physics and related areas. The volume of information
generated daily by scientific calculations is increasing exponentially. For instance, the scientists in our lab
(a group of about 10) generate 1.3 TB weekly. Its
conservation on physical media is currently favored by the low
price of storage per bit of information as well as by the increase
in available telecommunications bandwidth.
As a result, public and private centers generate and
store more and more terabytes of information; in our particular
case this information corresponds to the outcome of calculations. At present, storage of computational simulations has not
been identified as a bottleneck in most data and supercomputing centers. However, the main global players in the Internet
business have already introduced the "Big Data"1 concept to
start looking for solutions that keep physical data storage
systems sustainable and provide convenient access to them. Solutions based on what is called "the cloud", along with concepts
derived from "social networks", are leading to a reformulation of how and in what physical space the information shall
be stored. As a result, there is a growing demand for tools to
order the storage, allow the analysis and simplify the presentation of significantly large volumes of growing data in an amenable, transparent, and reliable manner.2
Only a very small percentage of the information currently
stored in data centers is hierarchically indexed.3 In practice, this
means that most of the available information is very difficult to process:
the information bits are hardly usable by anyone other than
their creator. Tim Berners-Lee, one of the original developers of the World Wide Web, identified the need to transform numerical data into "raw data".4 Raw data is the
desirable state where information is meaningful because it is
enriched with labels that contextualize it, the labels being
the "metadata". Once contextualized, searching the ocean of
information is more efficient. Moreover, the search process
becomes a process of knowledge creation, as it is possible to
establish new connections, in what is already called "linked
data".5
The application of these concepts to the field of computational chemistry is hindered by the heterogeneity of the program packages used in the atomistic simulations of molecules
and materials. As a result, the outcome of atomistic simulations addressing chemical or physical problems and based on
the application of the Schrödinger equation is presented in a
dispersed manner, with sparse data showing only some of the
key aspects, such as geometries, energies, and chemical and/or physical
properties. This high degree of diversity in the data formats
requires the definition of standards.6 The ultimate consequence is that the data published in the scientific journals of
Chemistry, Physics, Nanoscience, Biochemistry and related
areas are not homogeneous, are often incomplete, are
hardly ever consulted in bulk and are rarely reused. There have been
technological initiatives,7 like the Quixote project, that implement solutions to multiple aspects of this problem, such as data
format unity and data management.8 Alternatively, purpose-built
databases have been created by groups at
MIT, Berkeley9 and Stanford10.

Scheme 1. ioChem-BD system overview. The web nature of the CREATE and BROWSE modules is highlighted. Both modules share
functionalities like searching and browsing chemical datasets, exporting to third-party formats and publishing results, but without losing
sight of the original private-public sense of each module.

In this paper, we present an alternative platform, ioChem-BD, encompassing a variety of aspects in the definition of
standards for the treatment, hierarchical storage and retrieval of
data. Our platform automates the extraction of relevant data
and its conversion into fully tagged information in a distributed database. It provides tools for the researcher to validate,
enrich, publish and share information, and tools in the cloud to
access and view it.
2. BASE TECHNOLOGIES
The keystone in the definition of the project is that it employs highly reliable software technologies, widely used on
the Internet, that are extended to cover our particular area of
interest. We chose eXtensible Markup Language (XML)11 as
the container of all information for its reliability,
format neutrality and ease of validation (using XSD verification tools).12 More specifically, we chose the Chemical Markup
Language (CML) implementation, because it contains all the
semantics necessary to describe most chemistry.13 With calculations in CML format, querying their content for specific information is easy and efficient using XPath queries.14 Working with XML also provides a wide range of conversion
operations from CML files into any other existing or future
format using eXtensible Stylesheet Language Transformations
(XSLT).15
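As an illustration of how XPath-style queries retrieve specific fields from a CML document, the following minimal sketch uses Python's standard library. The CML fragment is a heavily simplified, hypothetical example, and the dictRef value cc:scfenergy is an assumption made for illustration; real CompChem files produced by the platform are richer and may use different dictionary entries.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified CML fragment (not actual ioChem-BD output).
CML = """<module xmlns="http://www.xml-cml.org/schema" dictRef="cc:job">
  <molecule id="final">
    <atomArray>
      <atom id="a1" elementType="O" x3="0.000" y3="0.000" z3="0.117"/>
      <atom id="a2" elementType="H" x3="0.000" y3="0.757" z3="-0.467"/>
      <atom id="a3" elementType="H" x3="0.000" y3="-0.757" z3="-0.467"/>
    </atomArray>
  </molecule>
  <property dictRef="cc:scfenergy">
    <scalar dataType="xsd:double" units="nonsi:hartree">-76.4089</scalar>
  </property>
</module>"""

# Prefix-to-namespace mapping used by the path expressions below.
NS = {"cml": "http://www.xml-cml.org/schema"}
root = ET.fromstring(CML)

# Select every atom element, at any depth, and read its element symbol.
elements = [atom.get("elementType") for atom in root.findall(".//cml:atom", NS)]

# Select the scalar inside the property tagged with the assumed dictRef.
scalar = root.find(".//cml:property[@dictRef='cc:scfenergy']/cml:scalar", NS)
energy = float(scalar.text)

print(elements, energy)
```

ElementTree implements only a subset of XPath 1.0, which is enough for attribute predicates of this kind; a full XPath or XSLT engine would be needed for the more elaborate queries and transformations described in the text.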
As access to information is becoming more universal,
the data in the system is reachable through the Internet from any
digital device on the market, following the latest
web communication standards.11,16 Users have at their disposal
up-to-date search,17,18 display,19 and data labeling tools,13 as well as
communication channels through which they can propose
new features.
In terms of data storage, information is distributed among
the content generators, creating a mesh topology in which the
service is always available and accessible on the network.
Being a cloud system, it adopts the necessary data definition
standards20-22 to connect to other digital repositories
and external Web services, building up a network of interconnected semantic data that enriches the user
experience.
Finally, to enhance industrial implementation, the platform
provides secure23,24 and reliable channels for communication
and collaboration between users, groups and/or centers, while keeping
the highest privacy standards with respect to third parties. All information has configurable levels of access and licensing,
so it can be adapted to the specific legal needs of each entity.
ARCHITECTURE
The ioChem-BD platform is composed of two main modules that work independently, labeled CREATE and
BROWSE. Both of them are executed as Java web services.
The CREATE module is designed to extract the information
from the output files generated by the computational chemistry
packages and store it in an organized way. It currently handles output files from the Gaussian,25 ADF26 and VASP27 programs.
The list of supported codes is expected to expand in the medium term to include codes such as SIESTA,28 TURBOMOLE,29
MOLCAS,30 and ORCA.31 The CREATE module is intended for
the scientists who execute the simulations (creators).
The BROWSE module is designed as a tool to explore and
use the data contained in the database and resides in the cloud.
The BROWSE module has a much broader scope, and is useful to
researchers interested in accessing the computational Big
Data.
The combination of both modules allows the scientific
community to store and access all the Chemistry Big
Data in the way shown schematically in Scheme 1.
CREATE MODULE
The CREATE module contains two different subunits that
allow: (i) the extraction and structuring of the relevant output
data, and (ii) the publication of such data and its derivatives in
BROWSE.
CREATE MODULE: DATA EXTRACTION FROM
OUTPUT FILES.
The system initially works with input and output files from
the computational chemistry packages described above. In
Gaussian and ADF the results are stored in a single file from which
the data need to be extracted. However, for other computational codes
the CREATE module needs a group of files. This is the particular case of VASP calculations, where relevant data are split
across multiple files, from inputs like POTCAR to output files such as
OUTCAR (summary), CONTCAR (geometries), XDATCAR
(trajectory), and vasprun.xml. The ioChem-BD platform outputs a
single CML file from these sources. The upload of these files
to the CREATE module is a layered process in which
we simultaneously parse and tag all relevant data, infer its
metadata and capture the molecular geometry. A scriptable
shell upload utility performs file conversion and uploads
calculations straight from HPC clusters to the CREATE module;
an alternative mechanism uploads content from
the user's web browser. A detection algorithm decides which format
extraction templates to use based on the calculation file content.
Once the file format is identified, the appropriate templates
for that format are selected and the first conversion is performed: from plain text to CML using a modified version of
the JUMBOconverters library.32 This is followed by a second
conversion that reorders CML tags so they comply with the CompChem convention.33 This process can be applied to individual or
multiple output files, as depicted in Figure 1 and Figure 2.
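The content-based detection step can be pictured with the following minimal sketch. It is not the algorithm implemented in ioChem-BD: the signature strings, the function names and the template file names are all assumptions chosen for illustration, and the actual conversion is performed by the Java JUMBOconverters library rather than by Python code.

```python
from typing import List, Optional

# Illustrative content signatures (assumptions, not ioChem-BD's real rules).
SIGNATURES = {
    "gaussian": "Entering Gaussian System",
    "adf": "Amsterdam Density Functional",
    "vasp": "vasp.",
}


def detect_format(text: str, head_lines: int = 200) -> Optional[str]:
    """Return the first format whose signature occurs in the file header."""
    head = "\n".join(text.splitlines()[:head_lines])
    for fmt, marker in SIGNATURES.items():
        if marker in head:
            return fmt
    return None


def select_templates(fmt: Optional[str]) -> List[str]:
    """Map a detected format to hypothetical template names for the two steps."""
    if fmt is None:
        raise ValueError("unrecognized calculation file format")
    # Step 1: plain text to CML; step 2: CML to CompChem-compliant CML.
    return [f"{fmt}/text-to-cml.xml", f"{fmt}/cml-to-compchem.xsl"]


print(select_templates(detect_format(" Entering Gaussian System, Link 0=g09\n")))
```

Only the file header is inspected, which keeps detection cheap for very large output files; a production implementation would need more robust signatures than these single substrings.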
As an additional feature, the module can directly attach other supporting files such as calculation input files,
graphics, text and any associated gray literature. These
additional files are not processed, so future users of the
database must process them themselves. Such files remain
paired with the calculation CML file throughout its existence inside the ioChem-BD system, to provide further information about the
calculation.

Figure 1. Conversion workflow for individual output files. This
two-step process is the default behavior for file conversion inside
JUMBOconverters library, from output files into CML elements,
and then to compliant CML CompChem.

Figure 2. Conversion workflow for multiple output files. Our
customized JUMBOconverters library accepts multiple output
files for its unification into a single CML file. Uploaded files can
be a mixture of plain text and XML files.

Figure 3. CREATE main panel view. The hierarchical tree on upper section allows browsing all uploaded content. Selecting an element
from it will fill lower panels with more detailed information and its available display actions.

Once the CompChem CML file is generated, a second data
flow is triggered to extract the corresponding metadata fields.
Using XSLT style sheets we infer fields such as type of
calculation, methods used, basis set, charge, multiplicity, and
several others. From the CREATE database we also retrieve
additional structural information (which files are involved in this upload process) and administrative information (how these
files were generated). Figure 4 depicts this process, which ends
by building a METS-compliant file of administrative, descriptive
and structural metadata covering all aspects of the upload.

Prior to data storage by the CREATE module there is a
final step aimed at extracting the final geometry, a key requirement for
repeating the calculation if needed. This feature brings
our computational database close to structural databases
such as the CSD (molecules)34 and the COD (crystal structures).35 In the case of geometry
optimizations, a large number of geometries can appear in the
same file. Again we rely on XSLT templates to contain the
logic needed to retrieve the final geometry. The geometry is then indexed with ChemAxon JChemBase software for
future substructure searches.36
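The idea of retrieving the last geometry of an optimization can be sketched as follows. The platform does this with XSLT templates; the Python version below is only an analogous illustration, and the CML fragment, with one molecule element per optimization step, is a hypothetical simplification of a real CompChem file.

```python
import xml.etree.ElementTree as ET

NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical optimization output: one <molecule> per optimization step.
CML = """<module xmlns="http://www.xml-cml.org/schema">
  <molecule id="step1"><atomArray>
    <atom elementType="H" x3="0.0" y3="0.0" z3="0.00"/>
    <atom elementType="H" x3="0.0" y3="0.0" z3="0.90"/>
  </atomArray></molecule>
  <molecule id="step2"><atomArray>
    <atom elementType="H" x3="0.0" y3="0.0" z3="0.00"/>
    <atom elementType="H" x3="0.0" y3="0.0" z3="0.74"/>
  </atomArray></molecule>
</module>"""


def final_geometry(cml_text):
    """Return [(element, x, y, z), ...] for the last molecule in document order."""
    molecules = ET.fromstring(cml_text).findall(".//cml:molecule", NS)
    if not molecules:
        raise ValueError("no geometry found")
    return [
        (a.get("elementType"), float(a.get("x3")), float(a.get("y3")), float(a.get("z3")))
        for a in molecules[-1].findall(".//cml:atom", NS)
    ]


geometry = final_geometry(CML)
print(geometry)
```

The sketch assumes that the last geometry in document order is the converged one, which holds for the simplified fragment above but would need to be checked against the conventions of each program's output.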

Figure 4. METS file generation workflow. Using XSLT
stylesheets we can extract calculation descriptive metadata fields.
Together with these fields we append structural and administrative information to compose a METS file that fully describes
the newly uploaded result.

Figure 5. CREATE search panel allows users to define multiple
search criteria using boolean logic. Such queries cover administrative metadata, chemistry-related terms and chemical substructure.

Figure 6. Search output can be narrowed by defining a
molecular substructure that refines the results. A visual
HTML5 molecular editor is displayed in the user’s browser to sketch
part of or the entire molecule.

When all these processes are completed, newly uploaded
calculations are accessible in the CREATE module via tree
browsing or search (Figure 3). At this point, users can browse
their uploaded content. Selecting a calculation opens an auxiliary window in the lower right corner with all available actions:
visualize the molecule in the JSmol viewer,37 view an HTML5 summary of the relevant calculation data (or other attached files), and
download and visualize the CML and attached files. To keep the
system extensible, all actions applicable to content are implementations of an abstract Action class, managed
via an ActionManager object that also acts as a class
loader. This allows upgrading the system with new calculation
operations without the need to update its code, just by dropping a
new Action implementation class package into the web server
class path.
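The extension mechanism can be illustrated with a minimal sketch of the same pattern. The platform is written in Java and loads Action classes from the class path; the Python analogue below instead discovers subclasses that have been defined or imported. Apart from the Action and ActionManager names taken from the text, every method and class name here is an assumption.

```python
from abc import ABC, abstractmethod


class Action(ABC):
    """Base class for operations applicable to a calculation."""

    name = ""

    @abstractmethod
    def run(self, calculation: dict) -> str:
        ...


class ActionManager:
    """Discovers every concrete Action subclass defined so far."""

    def actions(self) -> dict:
        found, stack = {}, list(Action.__subclasses__())
        while stack:
            cls = stack.pop()
            stack.extend(cls.__subclasses__())
            # Skip classes that still have unimplemented abstract methods.
            if not cls.__abstractmethods__:
                found[cls.name] = cls()
        return found


# A new operation only needs a new subclass; ActionManager is left unchanged.
class FormulaAction(Action):
    name = "formula"

    def run(self, calculation: dict) -> str:
        counts = {}
        for element in calculation["elements"]:
            counts[element] = counts.get(element, 0) + 1
        return "".join(f"{el}{n if n > 1 else ''}" for el, n in sorted(counts.items()))


manager = ActionManager()
result = manager.actions()["formula"].run({"elements": ["O", "H", "H"]})
print(result)
```

The design choice shown is that the manager never names concrete actions, so adding an operation never requires editing existing code.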

Once uploaded, the stored data can be searched. The
search functionality relies on standard database queries in
conjunction with the JChemBase search engine to filter content.36 As seen in Figures 5 and 6, users can query administrative and descriptive metadata fields and use a molecular editor
to sketch substructures that are used as a search filter.
Results vary depending on the privileges that the user holds
on CREATE calculations. These are defined by fine-grained
access rules set at the user, group and others levels, like
UNIX file permissions.
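A UNIX-like access rule of this kind can be sketched as below. The data model (an owner, a group and an octal mode per calculation) and all names are assumptions made for illustration; the text does not describe how ioChem-BD stores or evaluates its rules.

```python
from dataclasses import dataclass

READ, WRITE = 0o4, 0o2


@dataclass
class Calculation:
    owner: str
    group: str
    mode: int  # e.g. 0o640: owner read/write, group read, others nothing


def allowed(calc, user, user_groups, wanted):
    """Pick the owner, group or others permission triplet, as UNIX does."""
    if user == calc.owner:
        bits = (calc.mode >> 6) & 0o7
    elif calc.group in user_groups:
        bits = (calc.mode >> 3) & 0o7
    else:
        bits = calc.mode & 0o7
    return (bits & wanted) == wanted


calc = Calculation(owner="alice", group="catalysis", mode=0o640)
print(allowed(calc, "bob", {"catalysis"}, READ))
```

A search would then return only those calculations for which the check succeeds with the READ bit for the querying user.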
Next to JSmol visualization, another remarkable action is
the HTML summary (see Figure 7). Using XSLT style sheets,
ioChem-BD generates a fully compliant HTML5
summary with the following features: single-page presentation, datasets exportable to other formats, compact
drop-down content, device-responsive layout and, most valuably,
full customization with new data fields without the
need to upgrade the platform.

CREATE PUBLICATION MECHANISM
Communication between the two ioChem-BD modules is currently unidirectional, from CREATE to BROWSE,
through a process called "content publication". Publishing
imports single calculations or groups of them into the
BROWSE module, to generate assets like reports. To complete
this step, it is only necessary to name the calculations and mark
them for publication. The rest of the process, based on REST API
communications, is invisible to the user. Because both modules are written as Java web services, the publication mechanism
is implemented via servlets, and published files are bundled in DSpace
METS SIP41 format during their ingestion into the BROWSE module.
From this step onwards, published calculations are called
‘items’. As a result of the publication process, a group of URL
handles referring to the published items is presented. These links
point to public HTML pages in the BROWSE module with the
following content: (i) final calculation geometry visualization
with JSmol, (ii) an expandable summary of the item’s metadata, (iii) a summary of the most relevant data in HTML5 format, (iv) a list of downloadable content such as input files, and (v)
support files and gray literature associated with the calculations.
Most of these sections can be mapped to CREATE Actions, as
they share the same conversion style sheets. Therefore, results
are consistent across both modules.

Figure 7. Every uploaded calculation has a group of actions associated with it. One of them is an HTML summary that displays its
most remarkable fields. Such summary can be customized to
fulfill researchers’ needs and to adapt to future requirements.

Another feature delivered in HTML5 reports is the visual
representation of data. A reference to Highcharts (a JavaScript
charting library)38 is included in all generated reports.
This makes it easy to convert plain data into
interactive visual elements using, among others, line, scatter
or column charts. The same approach can be replicated
with the many third-party plugins that exist today in the
chemistry field.
In addition, the report file can contain other rich content objects such as third-party plugins, navigable data tables and
interactive graphics. An example of ioChem-BD’s pluggability with external tools is the
integration of the JCAMP-MOL IR Spectrum Viewer applet38 into the HTML5 report generation
engine.
During the development of this engine, there was a need to
include an IR viewer so that calculated vibrational frequencies
could be displayed as an additional visual field inside the
summary. To do so, a Java servlet was created to convert CML
calculations to JCAMP-DX40 compliant text output using
XSLT transformations. Calling this servlet with a calculation ID returns its vibrational information in JCAMP-DX
format, so appending an applet tag that calls this servlet to
the report was sufficient, and no major code development was
needed.
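To give an idea of the target of this conversion, the following sketch writes vibrational modes as a minimal JCAMP-DX peak table. It is not the servlet's XSLT: the choice of header records, the KM/MOL intensity unit and the numeric values are assumptions, and a given viewer may require additional labeled data records.

```python
def to_jcamp_dx(title, frequencies, intensities):
    """Format vibrational modes as a minimal JCAMP-DX peak table."""
    if len(frequencies) != len(intensities):
        raise ValueError("one intensity is required per frequency")
    lines = [
        f"##TITLE={title}",
        "##JCAMP-DX=4.24",
        "##DATA TYPE=INFRARED PEAK TABLE",
        "##XUNITS=1/CM",
        "##YUNITS=KM/MOL",  # assumed unit for calculated IR intensities
        f"##NPOINTS={len(frequencies)}",
        "##PEAK TABLE=(XY..XY)",
    ]
    # One "frequency, intensity" pair per line.
    lines += [f"{nu:.2f}, {inten:.2f}" for nu, inten in zip(frequencies, intensities)]
    lines.append("##END=")
    return "\n".join(lines)


# Hypothetical frequencies (1/cm) and intensities for a three-mode molecule.
text = to_jcamp_dx("water", [1595.0, 3657.0, 3756.0], [67.0, 2.0, 45.0])
print(text)
```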

BROWSE MODULE
The BROWSE module consists of a heavily modified version of the DSpace digital repository.17 It has been adapted to
fulfill our requirements, mainly in quantum chemistry data
representation and in communication with external services. Some
workflows have been copied from the CREATE module so that
both behave similarly. One of the main features of BROWSE (DSpace) instances is that they can communicate
with each other using the OAI-PMH protocol20 to share
item metadata. This allows building a public distributed network of theoretical chemistry repositories, which will be a
great advance in terms of information sharing.
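Metadata harvesting over OAI-PMH consists of plain HTTP requests carrying a verb and its arguments. The sketch below builds such a request URL; the base URL is a hypothetical placeholder, while verb, ListRecords, metadataPrefix and oai_dc are defined by the protocol itself.

```python
from urllib.parse import urlencode


def oai_request(base_url, verb, **arguments):
    """Build an OAI-PMH GET request URL from a verb and its arguments."""
    return f"{base_url}?{urlencode({'verb': verb, **arguments})}"


# Hypothetical endpoint; each repository instance publishes its own base URL.
BASE = "https://repository.example.org/oai/request"
url = oai_request(BASE, "ListRecords", metadataPrefix="oai_dc")
print(url)
```

The response is an XML document listing the item records in the requested metadata format (here unqualified Dublin Core), which a harvesting repository can parse and index.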
The module works by default with the Dublin Core metadata
schema,42 which is good at capturing the most basic bibliographic information about any digital asset, but cannot hold
the description of quantum chemistry documents. However,
the module is versatile enough to be extended with new metadata schemas, so we have created a schema focused on
the computational chemistry field.
Among other features, the BROWSE module supports browsing and searching content; such content can be
embargoed, exported or syndicated depending on users’ needs.
A notable aspect of the BROWSE module is its ability to
display Supporting Information and other derived chemical
reports built in the CREATE module. As a brief overview, Supporting Information documents are normally composed of one
or several chemical structures (normally in XYZ format) from
a series of related calculations. They can also contain extra information fields such as final energies, vibrational frequencies,
spin angular momentum, etc. Supporting Information documents are normally stored in heterogeneous locations like
public FTP servers, private web servers, cloud storage services,
etc., depending on the data publication policy of each research
center. These documents are later referenced by journal papers as
additional information related to the research. Usually they are
generated manually in a tedious, time-consuming and error-prone
process that sometimes results in non-portable digital documents (one, maybe two stars out of five in the Open Data scheme).43
We try to remove this inefficient procedure with the Supporting Information generator that is integrated in the
CREATE module and whose results are displayed in
BROWSE (Figure 8). It uses XSL-FO, an open formatting-object
definition language, as a bridge between raw data and multi-format output. The report engine is fed with CML calculations from a user selection in the CREATE main panel tree. After
adding them to the session, the user chooses to create a new report
from them. In this case, we choose Supporting Information as the
report type. A fast XPath query returns the additional fields
(like final energies) that exist among these calculations and
that enable or disable additional report generation options.

Using a similar process, ioChem-BD can build a steadily
growing set of reports. In this case, we can opt to generate two
types of output: a ready-to-download multi-format XSL-FO
document (similar to the Supporting Information report), or an
HTML5 web page that opens in a new tab. This last
option is extremely versatile because it opens the door to
adding third-party plugins and other dynamic content to the
report, a more powerful way to display results.
As an example of this functionality, we describe the generation of a reaction
energy profile report. CREATE users need to
select a group of calculations and define a set of formulas that
constitute the energy steps. The report engine builds a
dynamic, device-responsive HTML5 report in the user's browser
displaying an energy profile chart for those calculations (Figure
9).
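The arithmetic behind such a profile can be sketched as follows, under the assumption that each formula is a linear combination of the absolute energies of the selected calculations and that the first step is taken as the zero of energy. The calculation names, the energies and the data layout are hypothetical; the text does not specify how formulas are encoded in the platform.

```python
HARTREE_TO_KCAL_MOL = 627.509

# Hypothetical absolute energies (hartree) keyed by calculation name.
energies = {"A": -100.000, "B": -50.000, "TS": -149.980, "P": -150.020}

# Each step is a formula: calculation name -> stoichiometric coefficient.
steps = [
    ("A + B", {"A": 1, "B": 1}),
    ("TS", {"TS": 1}),
    ("P", {"P": 1}),
]


def energy_profile(energies, steps):
    """Return (label, energy relative to the first step in kcal/mol) pairs."""
    absolute = [
        (label, sum(coef * energies[name] for name, coef in formula.items()))
        for label, formula in steps
    ]
    reference = absolute[0][1]
    return [(label, (e - reference) * HARTREE_TO_KCAL_MOL) for label, e in absolute]


profile = energy_profile(energies, steps)
for label, relative in profile:
    print(f"{label:6s} {relative:8.2f} kcal/mol")
```

The resulting list of labeled relative energies is the kind of series that a charting library can plot directly as a reaction profile.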

Figure 9. An example of dynamically generated report. Based on
user calculation selection and the definition of multiple energy
reaction formulas, our platform is able to build and output reaction energy profile reports.

In terms of programming code, there is an abstract class defined for Reports, so new classes can implement its functions
and the ReportManager class loads them, adding new
report types dynamically with no need to alter the existing
code. Extensions to more complex outputs like R language
code snippets44 or Jmol scripts45 are envisaged.

Figure 8. Supporting Information report generation workflow.
Starting from a user selection of calculations, the module extracts
its molecular geometry (among other fields like final energies) to
bundle them into a single XML file. Following iterations will
convert it to XSL-FO format and then to user’s desired output
format.

After setting up the report fields, the engine extracts the final geometries and other fields from the chosen calculations. Then
they are joined into a single CML file. The next step in report
generation is to convert the CML into an XSL-FO document; with
some more XSLT work we obtain an XSL-FO document ready
to be converted, based on the user's choice, into any kind of digital
document such as PDF, TXT, CSV, etc.
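For the plain-text case, the content of such a report can be sketched without the XSL-FO stage. The example below bundles the final geometries of a selection into concatenated XYZ blocks; the dictionary layout, the function names and the numeric values are assumptions for illustration, and this shortcut does not reproduce the multi-format pipeline described above.

```python
def to_xyz(title, atoms, energy=None):
    """Format one structure as an XYZ block: atom count, comment, coordinates."""
    comment = title if energy is None else f"{title}  E = {energy:.6f} hartree"
    rows = [f"{el:<2} {x:12.6f} {y:12.6f} {z:12.6f}" for el, x, y, z in atoms]
    return "\n".join([str(len(atoms)), comment, *rows])


def supporting_information(calculations):
    """Concatenate the XYZ blocks of all selected calculations."""
    return "\n\n".join(
        to_xyz(c["title"], c["atoms"], c.get("energy")) for c in calculations
    )


# Hypothetical selection; the energy field is optional, as in the report options.
selection = [
    {"title": "H2", "atoms": [("H", 0.0, 0.0, 0.0), ("H", 0.0, 0.0, 0.74)], "energy": -1.1745},
    {"title": "He", "atoms": [("He", 0.0, 0.0, 0.0)]},
]
report = supporting_information(selection)
print(report)
```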

Inside ioChem-BD, all content derived from calculations is
built on demand and then streamed to the user’s browser.
This dynamic generation approach incurs a minimal performance loss, but it greatly reduces disk space requirements and improves data veracity by avoiding the massive
storage of formatted content that over time can become outdated or partial.
Current developments in ioChem-BD are focused on the
publication of calculation reports in the BROWSE module. At
present, reports can only be generated in the CREATE module, but in the near future it will be possible to generate a
public handle inside BROWSE that points to a report generation page that, depending on its URL parameters, outputs its
results in multiple formats.
SYSTEM ADAPTABILITY AND SAFETY CONSIDERATIONS
Dynamic data definition and capture is a requirement of
today's computational chemistry sector. Quantum chemistry software
vendors periodically release new versions of their products with
new functionalities, bug fixes, data representation changes, new chemical properties, calculation methods or
basis sets; on the other side, the chemists who use the software demand more analysis tools and higher levels of data representation.
This constant flow of structural and representational data
changes defines a list of requirements that our software tries to
fulfill with loosely coupled data management rules. With our
customized JUMBOconverters library we can expand our data
capture rules just by extending its XML template definitions.
We can also modify metadata capture and data presentation to the
final user by modifying the inner XSLT style sheets.
Mastering the skills necessary to modify and expand these
rules requires only a modest learning effort, since they are based on
open and well-documented standards. Therefore, every research group can easily adapt its ioChem-BD instance to its
requirements without the need for an external programmer.
The user authentication mechanism has been implemented with
Jasig CAS SSO Server.24 Its session management service
allows us to append new independent web services in a modular fashion without the need to implement user credential
management inside our modules.
In ioChem-BD, the documentation of data processing is as
important as the processes it describes. Outdated
documentation in a system as dynamic as the ioChem-BD
environment unavoidably leads to confusion: users cannot
track recent changes, and the reimplementation of already existing extraction rules becomes hard to avoid. In addition, such rules are defined in XML, a verbose language
that is hard to read unless it is converted to a user-friendly format. These requirements led us to develop a
toolkit that processes JUMBOconverters capture templates to build a
SGML/XML DocBook fileset.46 We use it as a neutral bridge
format for later conversion into a hierarchical group of web
pages in WebHelp format. The documentation generation process
is triggered on every template modification, and the result becomes instantly accessible to all CREATE users for reference. This
effectively prevents the documentation from becoming outdated.
All content managed inside ioChem-BD is under access
control, even published items. In the CREATE module, calculation content is restricted at the user/group/others level. In the
BROWSE module, content can have fine-grained access rules
and content embargoes, depending on third-party publication requirements. Splitting the system into two separate
modules that should be installed on separate web servers
increases the overall security of the system. The CREATE module
holds internal research data and should be deployed on
internal web servers with only the few open ports needed to receive uploaded
calculations from HPC clusters and for the publication mechanism.
The BROWSE module can be moved to a public web server,
where published items reside and are referenced by their
handles. The whole system relies on the HTTPS protocol for
communication among users and modules, to ensure that data
is always encrypted in transit. An additional
CAS module in charge of user validation uses tokens for
single sign-on / single sign-off session management, which
greatly simplifies the session management code and detaches
it from our modules.
CONCLUSIONS

The massive use of simulation techniques in chemical research generates huge amounts of information, a situation that is becoming
known as “the Big Data problem”. The main obstacle to
managing enormous volumes of information is storing it in
a way that facilitates data mining as a strategy to optimize the processes that allow scientists to face the challenges
of sustainability, knowledge, and the rational use of existing
resources. We created ioChem-BD as a group of services in
the cloud to manage computational chemistry input and output
files. Like other database-related projects, our platform relies on well-defined standards, and it implements treatment, hierarchical storage and data recovery tools
to facilitate data mining. This software implements new methodological strategies that promote an optimal re-use of results
and accumulated knowledge, and that improve researchers’
daily productivity. It automates the extraction of relevant data
and transforms numerical data into tagged data inside its database. The platform provides tools for the researcher
to validate, enrich, publish and share information, and tools for
accessing and visualizing data. Other modules allow the automatic creation of both reaction energy profile plots (by combining data from a set of molecular entities) and Supporting
Information files. In addition, the platform is capable of performing
kinetic analysis from reaction energy profiles and QSSR analysis,
or of building data sets for screening, for instance.
The final goal is to build a new reference tool in computational chemistry research that fills the gap between the generation of results and the publication of manuscripts, embedded in
bibliography management and services to third parties. Future
implementations will include integration with a semantic
database, taking advantage of XSLT transformations to
create data triples for every uploaded calculation. With such
information we will be in a position to connect our semantic
data with other external data sources and to develop a REST
API to open bridges between the BROWSE module and third-party data services.43

ASSOCIATED CONTENT
A list of current working instances of ioChem-BD software and a
demo server are accessible at www.iochem-bd.org.

AUTHOR INFORMATION
Corresponding Author
moises.alvarez@urv.cat; cbo@iciq.cat

Author Contributions
The manuscript was written through contributions of all authors.
All authors have given approval to the final version of the manuscript.

ACKNOWLEDGMENTS
Financial support for this work from the AGAUR (ref. 2009 SGR
25, 2014 SGR 199 and 2014 SGR 409) of Generalitat de Catalunya is gratefully acknowledged. We also thank the Spanish Ministry
of Science and Innovation (projects CTQ2011-29054-C02-01/BQU; CTQ2011-29054-C02-02/BQU; CTQ2011-27033/BQU;
CTQ2012-3382/BQU; CTQ2011-23140) and MINECO for support through Severo Ochoa Excellence Accreditation 2014-2018
(SEV-2013-0319). COST Action CM1203 “Polyoxometalate
Chemistry for Molecular Nanoscience (PoCheMoN)” and COST
Action "ECOSTBio CM1305" are also gratefully acknowledged.

REFERENCES
(1) Lynch, C. Big data: How do your data grow? Nature 2008, 455,
28-29.
(2) Harvey, M. J.; Mason, N. J.; Rzepa, H. S. Digital Data Repositories in Chemistry and Their Integration with Journals and Electronic
Notebooks. J. Chem. Inf. Model. 2014. DOI: 10.1021/ci500302p
(accessed Sept 17, 2014).
(3) Ramakrishnan, R.; Dral, P. O.; Rupp, M.; von Lilienfeld, O. A.
Quantum chemistry structures and properties of 134 kilo molecules.
Sci. Data 2014. DOI:10.1038/sdata.2014.22 (accessed Sept 22, 2014).
(4) Berners-Lee, T. The next web. Ted Conference.
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
(accessed Sept 17, 2014).
(5) Frey, J. G.; Bird, C. L. Cheminformatics and the semantic web:
adding value with linked data and enhanced provenance. Wiley Interdiscip. Rev.: Comput. Mol. Sci. 2013, 3, 465-481.
(6) Phadungsukanan, W.; Kraft, M.; Townsend, J.; Murray-Rust, P.
The semantics of Chemical Markup Language (CML) for computational chemistry: CompChem. J. Cheminf. 2012, 4, 15.
(7) Chen, M.; Stott, A. C.; Li, S.; Dixon, D. A. Construction of a
robust, large-scale, collaborative database for raw data in computational chemistry: The Collaborative Chemistry Database Tool
(CCDBT). J. Mol. Graphics Modell. 2012, 34, 67-75.
(8) Adams, S.; de Castro, P.; Echenique, P.; Estrada, J.; Hanwell,
M. D.; Murray-Rust, P.; Sherwood, P.; Thomas, J.; Townsend, J. The
Quixote project: Collaborative and Open Quantum Chemistry data
management in the Internet age. J. Cheminf. 2011, 3, 38.
(9) The Materials Project Home Page.
https://www.materialsproject.org (accessed Sept 22, 2014).
(10) Hummelshøj, J. S.; Abild-Pedersen, F.; Studt, F.; Bligaard, T.;
Nørskov, J. K. CatApp: A Web Application for Surface Chemistry
and Heterogeneous Catalysis. Angew. Chem. Int. Ed. 2012, 51, 272–274.
(11) World Wide Web Consortium. Extensible Markup Language
(XML) 1.0 (Third Edition) specification.
http://www.w3.org/TR/REC-xml (accessed Sept 17, 2014).
(12) Java schema validation class, javadoc definition.
http://docs.oracle.com/javase/7/docs/api/javax/xml/validation/Validator.html
(accessed Sept 17, 2014).
(13) Adams, N.; Cannon, E. O.; Murray-Rust, P. ChemAxiom–an
ontological framework for chemistry in science. Nature Precedings,
2009. DOI:10.1038/npre.2009.3714.1 (accessed Sept 22, 2014).
(14) World Wide Web Consortium. XML Path Language – Version 1.0 http://www.w3.org/TR/xpath (accessed Sept 17, 2014).
(15) World Wide Web Consortium. XSL Transformations
(XSLT) - Version 1.0 - W3C Recommendation 16 November 1999.
http://www.w3.org/TR/xslt (accessed Sept 17, 2014).
(16) HTML5 - A vocabulary and associated APIs for HTML
and XHTML. http://www.w3.org/TR/2012/CR-html5-20121217/
(accessed Sept 17, 2014).
(17) Smith, M.; Barton, M.; Bass, M.; Branschofsky, M.; McClellan, G.; Stuve, D.; Walker, J. H. DSpace: An open source dynamic
digital repository. D-Lib Magazine, Jan 2003, 9.
(18) Apache Lucene. A high-performance, full-featured text search
engine library. http://lucene.apache.org (accessed Sept 17, 2014).
(19) Jmol Home Page. http://jmol.sourceforge.net/ (accessed Sept
17, 2014).
(20) Lagoze, C.; Van de Sompel, H. In The open archives initiative:
building a low-barrier interoperability framework. Proceedings of the
1st ACM/IEEE-CS joint conference on Digital libraries, New York,
NY, USA, ACM, 2001.

(21) Gartner, R. METS: Metadata Encoding and Transmission
Standard. JISC Techwatch report TSW, Oct 2002, 2-5.
(22) Allinson, J.; François, S.; Lewis, S. Sword: Simple web-service offering repository deposit. Ariadne, Jan 2008, 54.
(23) HTTP over TLS description.
https://tools.ietf.org/html/rfc2818/ (accessed Sept 17, 2014).
(24) Addison, M. S.; Battaglia, S.; Petro, A. Jasig CAS Documentation. http://jasig.github.io/cas/4.0.0/index.html (accessed Sept 17,
2014).
(25) Gaussian Home Page. http://www.gaussian.com (accessed
Sept 17, 2014).
(26) ADF Home Page. http://www.scm.com/ADF (accessed Sept
17, 2014).
(27) VASP Home Page. http://www.vasp.at (accessed Sept 17,
2014).
(28) SIESTA Home Page. http://departments.icmab.es/leem/siesta
(accessed Sept 17, 2014).
(29) Turbomole Home Page. http://www.turbomole.com (accessed
Sept 17, 2014).
(30) Molcas Home Page. http://www.molcas.org (accessed Sept 17,
2014).
(31) Orca Home Page. http://cec.mpg.de/forum (accessed Sept 17,
2014).
(32) JUMBOconverters Main project page.
https://bitbucket.org/wwmm/jumbo-converters (accessed Sept 17, 2014).
(33) Murray-Rust, P.; Townsend, J.; Adams, S. E.; Phadungsukanan, W.; Thomas, J. The semantics of Chemical Markup
Language (CML): dictionaries and conventions. J. Cheminf. 2011, 3, 43.
(34) Cambridge Structural Database Home Page.
http://www.ccdc.cam.ac.uk/Solutions/CSDSystem/Pages/CSD.aspx
(accessed Sept 17, 2014).
(35) Crystallography Open Database Home Page.
http://www.crystallography.net/ (accessed Sept 17, 2014).
(36) JChem Base - Chemical interface to relational database engines. http://www.chemaxon.com/products/jchem-base (accessed Sept
17, 2014).
(37) JSmol, sourceforge project.
http://sourceforge.net/projects/jsmol/ (accessed Sept 17, 2014).
(38) Highcharts Home Page. http://www.highcharts.com (accessed
Sept 17, 2014).
(39) Hanson, R. M.; Lancashire, R. J. In JCAMP-MOL: A JCAMP-DX extension to allow interactive model/spectrum exploration using
Jmol and JSpecView. The ACS 2013 symposium on exchangeable
data formats, Sept 11, Indiana, IN, USA, Am. Chem. Soc. 2013.
(40) IUPAC CPEP Subcommittee on Electronic Data Standards
Home Page - http://www.jcamp-dx.org (accessed Sept 17, 2014).
(41) DSpace METS Document Profile for Submission Information
Packages (SIP).
https://wiki.duraspace.org/display/DSPACE/DSpaceMETSSIPProfile
(accessed Sept 17, 2014).
(42) DCMI Metadata Terms definition page.
http://dublincore.org/documents/dcmi-terms/ (accessed Sept 22, 2014).
(43) Five star open data Home Page. http://5stardata.info (accessed
Sept 17, 2014).
(44) The R Project for Statistical Computing. http://www.r-project.org (accessed Sept 17, 2014).
(45) Jmol /JSmol interactive scripting documentation.
http://chemapps.stolaf.edu/jmol/docs (accessed Sept 17, 2014).
(46) Ortiz, I. M.; Moreno, P.; Sierra, J. L.; Manjón, B. F. Using
DocBook and XML Technologies to Create Adaptive Learning Content in Technical Domains. Int. J. Comput. Sci., App. 2006, 3, 91-108.
