Importing data
Introduction
Importing data is an important part of Dédalo projects. Dédalo is a highly structured data system that manages literal data and the relationships between data. Data normalization is at the core of the application. Dédalo mainly uses lists, thesauri, resources and other related sections to define its data.
The Dédalo data model has an abstraction layer that uses the ontology definitions to create components (as fields) and sections (as tables).
About plain text / non-normalized data
Many museum catalogues come from a previous cataloguing system, sometimes built in-house with commercial applications such as FileMaker or Access, and sometimes the data has no structure and is saved as plain text without normalization. This situation creates many data inconsistencies that can be very difficult to resolve. Dédalo can import unstructured plain text, but it is not recommended. If you want to import this kind of data, we recommend cleaning it before importing it into Dédalo.
Format
Dédalo uses standard csv files to import data, with UTF-8 encoding without BOM (Byte Order Mark).
Warning
Encodings other than UTF-8 are not supported. Badly encoded files can break the import process at any time, and the imported data can contain typos and errors.
Byte Order Mark (BOM)
BOM is accepted in some cases, but in general and according to the Unicode standard, the BOM for UTF-8 files is not recommended:
2.6 Encoding Schemes
... Use of a BOM is neither required nor recommended for UTF-8, but may be encountered in contexts where UTF-8 data is converted from other encoding forms that use a BOM or where the BOM is used as a UTF-8 signature. See the “Byte Order Mark” subsection in Section 23.8, Specials, for more information...
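The encoding requirement can be checked before uploading a file. The following is a minimal sketch in Python, not part of Dédalo: the function name `read_import_csv_bytes` is hypothetical. It drops a leading UTF-8 BOM if one is present and fails loudly when the bytes are not valid UTF-8.

```python
import codecs


def read_import_csv_bytes(raw: bytes) -> str:
    """Return the csv text, without BOM, or raise if it is not UTF-8.

    Hypothetical helper for a pre-import check; it is not a Dédalo API.
    """
    # Remove the UTF-8 signature (EF BB BF) when the file starts with it.
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    # Strict decoding raises UnicodeDecodeError on any non UTF-8 byte sequence.
    return raw.decode("utf-8")
```

A file saved as Latin-1 or Windows-1252 with accented characters will normally raise `UnicodeDecodeError` here, which is a cheaper failure than a broken import.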
By default Dédalo uses stringified JSON encoded in UTF-8 for the data; in the csv file every double quote is escaped by doubling it ("").
Example of locator:
[{
"type":"dd151",
"section_id":"2",
"section_tipo":"rsc723"
}]
Will need to be encoded in csv format as:
"[{""type"":""dd151"",""section_id"":""2"",""section_tipo"":""rsc723""}]"
But it is also possible to use plain text to import flat data.
Example of text:
My plain text without double quotes
Will need to be encoded in csv format as:
My plain text without double quotes
Example of text with double quotes inside:
my plain text with "double quotes"
Will need to be encoded in csv format as:
"my plain text with ""double quotes"""
File nomenclature (optional)
It is highly recommended to use only ASCII characters in the names of import files, so try to use names without spaces, accents or any special characters.
Adding the section to the filename
Filenames can be used to detect the section automatically on import. Specify it in this way:
my_name_to_identify_data-section_tipo.csv
Example: a file with interview data to import into the Interviews section oh1:
interviews_2015-oh1.csv
It is also possible to indicate the destination section in the csv import tool.
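The naming rule can be illustrated with a short Python sketch. The helper `section_tipo_from_filename` is hypothetical, and the pattern assumes that a section_tipo is made of lowercase letters followed by digits (as in oh1 or numisdata3); the exact matching rule used by the import tool is not described here.

```python
import re

# Assumption: a section_tipo looks like lowercase letters + digits (oh1, numisdata3).
FILENAME_RE = re.compile(r"^(?P<label>.+)-(?P<section_tipo>[a-z]+[0-9]+)\.csv$")


def section_tipo_from_filename(filename):
    """Return the section_tipo encoded in the filename, or None if absent."""
    match = FILENAME_RE.match(filename)
    return match.group("section_tipo") if match else None
```

When the function returns None, the destination section has to be selected by hand in the import tool.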
Using editors
It is possible to use an editor to create the csv import files. If you use a spreadsheet editor such as LibreOffice Calc, remember to export the csv with UTF-8 encoding.
Using a spreadsheet
Dédalo data can be represented as a spreadsheet with columns and rows: columns represent the components (fields) and rows represent the records.
Every csv import file represents one section; if you need to import multiple sections you will need one csv file per section. To import into the Types section numisdata3, the name of the csv should include this section_tipo, as in:
my_import_types-numisdata3.csv
Every column represents a component (field) and every row represents a record; the data is the cell where the column and the row cross.
column A | column B | column X |
---|---|---|
data1A | data1B | data1X |
data2A | data2B | data2X |
Defining the target component in the column name
Every column in the first row of the file, the header, contains the ontology tipo of the target component of the section to be imported. At least one column must be set as section_id to identify the column with the unique id; by convention it is the first one, but this is not mandatory.
To import the component Key numisdata81 and the component Number numisdata27 as fields of the Types section numisdata3, you will need to create a csv such as:
section_id | numisdata81 | numisdata27 |
---|---|---|
1 | ["key1"] | ["062"] |
2 | ["key2"] | ["685a"] |
Columns with names instead of ontology tipo
It is possible to use "human" names in the columns, but the import tool will not match them with the components and you will need to set them manually before importing.
The previous csv could be named in this way:
id | Key | Number |
---|---|---|
1 | ["key1"] | ["062"] |
2 | ["key2"] | ["685a"] |
But it will not match, and you will need to set the component tipo inside the import tool.
You can find the ontology tipo of a component by selecting it; Dédalo will show it in the info part of the inspector.
Dédalo also shows the component data format, which can be copied. In this case ["062"]
You can also check the ontology here.
Data formats
In general Dédalo imports stringified JSON for every data. But, to make the import process useful and easy, it is possible to use string representations of the data.
Plain text
By default the import model uses the JSON format of its data: an object with lang properties whose values are arrays.
{
"lg-spa" : ["mi dato para importar", "Otro dato"],
"lg-eng" : ["my import data", "Other data to import"]
}
As the Dédalo import uses csv without format, the JSON data needs to be stringified in this way:
The table to import
section_id | oh14 |
---|---|
1 | {"lg-spa": ["mi dato para importar","Otro dato"]} |
Will be encoded in csv format as:
section_id;oh14
1;"{""lg-spa"":[""mi dato para importar"",""Otro dato""]}"
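The doubling of the quotes does not need to be done by hand. The following minimal sketch uses the Python standard library to stringify the JSON value and let the csv writer escape it; the function name `build_import_csv` is hypothetical, and the `;` delimiter is taken from the examples in this page.

```python
import csv
import io
import json


def build_import_csv(header, rows):
    """Build csv text from (section_id, value) pairs.

    Values that are not strings are stringified as compact JSON;
    the csv writer then doubles any double quote inside the cell.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
    writer.writerow(header)
    for section_id, value in rows:
        if not isinstance(value, str):
            # ensure_ascii=False keeps accented characters as UTF-8 text.
            value = json.dumps(value, ensure_ascii=False, separators=(",", ":"))
        writer.writerow([section_id, value])
    return buffer.getvalue()


text = build_import_csv(
    ["section_id", "oh14"],
    [(1, {"lg-spa": ["mi dato para importar", "Otro dato"]})],
)
print(text)
```

To save the result, open the file with `open(path, "w", encoding="utf-8", newline="")`; the `utf-8` codec in Python writes no BOM.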
Alternative formats to import text
- An array of string values
["mi dato para importar", "Otro dato"]
In this case the import process assumes the Dédalo data lang defined by the user in the menu and saves into this lang; if the component is not translatable it uses lg-nolan to save the imported data.
Example:
section_id | oh14 |
---|---|
1 | ["mi dato para importar","Otro dato"] |
- Plain text
new data to import
Example:
section_id | oh14 |
---|---|
1 | new data to import |
In this case the import process assumes the Dédalo data lang defined by the user in the menu and imports the value as the only value in the array; if previous data exists, it is replaced with a new array containing the imported value.
If the data in the database is:
{ "lg-spa" : ["mi dato importado", "Otro dato"], "lg-eng" : ["my imported data", "Other data"] }
and the Dédalo data lang is set to English, after importing the plain text the final data will be:
{ "lg-spa" : ["mi dato importado", "Otro dato"], "lg-eng" : ["new data to import"] }
Plain text is easy to import, but it gives limited control over the data. Take into account the language set in the menu.
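The replacement behaviour of a plain text import can be modelled in a few lines of Python. This is only an illustration of the rule described above, not the Dédalo implementation; `import_plain_text` and its parameters are hypothetical names.

```python
def import_plain_text(existing, value, data_lang):
    """Model of a plain text import for one component.

    Only the array of the current data lang is replaced;
    the other languages keep their previous values.
    """
    updated = dict(existing)      # copy, so the stored data is not mutated
    updated[data_lang] = [value]  # the imported text becomes the only value
    return updated


stored = {
    "lg-spa": ["mi dato importado", "Otro dato"],
    "lg-eng": ["my imported data", "Other data"],
}
result = import_plain_text(stored, "new data to import", "lg-eng")
```

Note that the second English value ("Other data") is lost, which is why the JSON format is preferable when a component holds several values.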
Format text
HTML
Dédalo uses the standard HTML format to import formatted text.
As Dédalo uses ck-editor as its text editor, the accepted HTML tags are the same as in ck-editor.
Dédalo has two editors, text_area and html_text:
- text_area accepts:
<p></p> // Paragraph
<strong></strong> // Bold text
<i></i> // Italic text
<u></u> // Underscore text
- html_text accepts:
<p></p> // Paragraph
<strong></strong> // Bold text
<i></i> // Italic text
<u></u> // Underscore text
<s></s> // Strikethrough text
<code></code> // Programming code
<sub></sub> // Subscript text
<sup></sup> // Superscript text
Besides, the formatted text import supports some compatible elements and css styles:
<b> <* style="font-weight: bold"> // (or numeric values that are greater or equal 600)
<em> <* style="font-style: italic">
<* style="text-decoration: underline">
<del><strike> <* style="text-decoration: line-through">
<* style="word-wrap: break-word">
<* style="vertical-align: sub">
<* style="vertical-align: super">
Note
These elements and styles will be changed to supported elements during the import process.
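If you prefer to normalize the markup yourself before importing, the element part of this conversion can be sketched in Python. The mapping below (b to strong, em to i, del and strike to s) is an assumption read from the order of the lists above, the name `normalize_tags` is hypothetical, and the style-based cases are not covered.

```python
import re

# Assumed equivalences, inferred from the compatibility list above.
TAG_MAP = {"b": "strong", "em": "i", "del": "s", "strike": "s"}
TAG_RE = re.compile(r"<(/?)(b|em|del|strike)\b([^>]*)>", re.IGNORECASE)


def normalize_tags(html):
    """Rename compatible elements to the tag names accepted by the editors."""
    def swap(match):
        slash, name, rest = match.groups()
        # Keep the closing slash and any attributes, change only the tag name.
        return f"<{slash}{TAG_MAP[name.lower()]}{rest}>"
    return TAG_RE.sub(swap, html)
```

The word boundary in the pattern keeps tags such as `<br>` untouched.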
Indexation tags
Dédalo uses non-standard HTML tags to define indexation tags, tc, person, language, notes and references.
The main format of these tags follows these rules:
- The tag is enclosed by []
- The elements are separated by the - character
- The first element is tag_name, the standard name of the tag
- The second element is the state of the tag, with n|r|d options: n=normal, r=to review, d=deleted
- The third element is the unique id of the tag, an int
- data::data encloses the locator when the tag has a link to any data
- The locator is the stringified version, with double quotes " replaced by single quotes '
[tag_name-state-id-label-data:locator:data]
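The rules above can be put together in a small Python sketch that builds a tag string. `build_tag` is a hypothetical helper, not a Dédalo function; it assumes that the locator is written as compact JSON and that neither the label nor the locator values contain quote characters.

```python
import json


def build_tag(tag_name, state, tag_id, label="", locator=None):
    """Compose [tag_name-state-id-label-data:locator:data]."""
    data = ""
    if locator is not None:
        # Stringify the locator and replace double quotes with single quotes.
        data = json.dumps(locator, ensure_ascii=False, separators=(",", ":"))
        data = data.replace('"', "'")
    return f"[{tag_name}-{state}-{tag_id}-{label}-data:{data}:data]"


tag = build_tag(
    "person", "a", 1, "Pedpi",
    {"section_tipo": "rsc197", "section_id": "1", "component_tipo": "oh24"},
)
```

A tag without linked data keeps the empty `data::data` part, as in `build_tag("index", "n", 1, "my tag label")`.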
index
The index tag defines a fragment inside the formatted text. It has an in and an out form; the fragment is the text between these two tags.
indexIn
Marks the initial position of the indexation fragment.
Example:
[index-n-1-my tag label-data::data]
indexOut
Marks the end position of the indexation fragment.
Example:
[/index-n-1-my tag label-data::data]
tc
The tc tag is used to mark a specific audiovisual timecode at the beginning of a paragraph; it creates a time relation between the text and its audiovisual time.
The tc tag has its own format: the timecode is enclosed by the TC_ and _TC marks.
[TC_hh:mm:ss.ms_TC]
[TC_00:01:25.627_TC]
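When the timecodes come from another tool as a number of milliseconds, the tag can be generated with a sketch like this one. The function name `tc_tag` is hypothetical, and three digits for the milliseconds are assumed from the example above.

```python
def tc_tag(milliseconds):
    """Format a position in milliseconds as a [TC_hh:mm:ss.ms_TC] tag."""
    if milliseconds < 0:
        raise ValueError("timecode cannot be negative")
    # Integer arithmetic avoids floating point rounding in the ms part.
    seconds, ms = divmod(milliseconds, 1000)
    minutes, ss = divmod(seconds, 60)
    hh, mm = divmod(minutes, 60)
    return f"[TC_{hh:02d}:{mm:02d}:{ss:02d}.{ms:03d}_TC]"
```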
lang
The lang tag is used to mark a change from the previous language. Example: an interview in Catalan in which the interviewee begins to speak in French.
[lang-a-1-spa-data:['lg-spa']:data]
svg
The svg tag is used to add a graphic within the text. The tag uses a locator to point to the svg section, for example to add an Iberian symbol inside a legend text.
Example:
[svg-n-1--data:{'section_tipo':'sccmk1','section_id':'2','component_tipo':'hierarchy95'}:data]
geo
The geo tag is used to add features such as polygons, points or marks.
Example:
[geo-n-10-10-data::data]
page
The page tag is used to mark a page break inside text.
Example:
[page-n-3]
person
The person tag is used to mark the person who begins to talk. The tag uses a locator to point to the People under study rsc197 section.
[person-a-1-Pedpi-data:{'section_tipo':'rsc197','section_id':'1','component_tipo':'oh24'}:data]
note
The note tag is used to add an annotation to the text. The annotation uses a locator to point to the Annotations rsc326 section. The state of the note can be a | b, a=private, b=public.
Example:
[note-a-1-1-data:{'section_tipo':'rsc326','section_id':1}:data]
reference
The reference tag is used to link to any other section. It uses a locator to point to the target section. The reference works like an HTML <a href></a> element. References have an in and an out tag to indicate the beginning and end of the reference.
referenceIn
[reference-n-1-reference 1-data:[{'section_tipo':'fr1','section_id':'1','type':'dd151'}]:data]
referenceOut
[/reference-n-1-reference 1-data:[{'section_tipo':'fr1','section_id':'1','type':'dd151'}]:data]
By default the import model uses the JSON format of its data: an object with lang properties whose values are arrays.
{
"lg-cat" : ["<p>Les meves dades per <strong>importar</strong></p><p> </p><p>Amb 2 paragraphs</p>"],
"lg-eng" : ["<p>My data to <strong>import</strong></p><p> </p><p>With 2 paragraphs</p>"]
}
As the Dédalo import uses csv without format, the JSON data needs to be stringified in this way:
The table to import
section_id | numisdata18 |
---|---|
1 | {"lg-cat": ["<p>El meu text per <strong>importar</strong></p>","<p>Altra dada</p>"]} |
Will be encoded in csv format as:
section_id;numisdata18
1;"{""lg-cat"": [""<p>El meu text per <strong>importar</strong></p>"",""<p>Altra dada</p>""]}"
Alternative formats to import formatted text
- An array of string values
["<p>El meu text per <strong>importar</strong></p>","<p>Altra dada</p>"]
In this case the import process assumes the Dédalo data lang defined by the user in the menu and saves into this lang; if the component is not translatable it uses lg-nolan to save the imported data.
Example:
section_id | oh14 |
---|---|
1 | ["<p>El meu text per <strong>importar</strong></p>","<p>Altra dada</p>"] |
- Formatted text
<p>Nou text per <strong>importar</strong></p>
Example:
section_id | numisdata18 |
---|---|
1 | <p>Nou text per <strong>importar</strong></p> |
In this case the import process assumes the Dédalo data lang defined by the user in the menu and imports the value as the only value in the array; if previous data exists, it is replaced with a new array containing the imported value.
If the data in the database is:
{ "lg-cat" : ["<p>la meva dada importada</p>", "<p>Altra dada</p>"], "lg-eng" : ["<p>my imported data</p>", "<p>Other data</p>"] }
and the Dédalo data lang is set to Catalan, after importing the formatted text the final data will be:
{ "lg-cat" : ["<p>Nou text per <strong>importar</strong></p>"], "lg-eng" : ["<p>my imported data</p>", "<p>Other data</p>"] }
- Plain text
In some cases the text has no formatting even though the component supports it, so it is possible to import plain text (without HTML).
new data to import
Example:
section_id | numisdata18 |
---|---|
1 | new data to import |
In this case the import process assumes the Dédalo data lang defined by the user in the menu and imports the value as the only value in the array; if previous data exists, it is replaced with a new array containing the imported value.
If the data in the database is:
{ "lg-spa" : ["mi dato importado", "Otro dato"], "lg-eng" : ["my imported data", "Other data"] }
and the Dédalo data lang is set to English, after importing the plain text the final data will be:
{ "lg-spa" : ["mi dato importado", "Otro dato"], "lg-eng" : ["new data to import"] }
Plain text is easy to import, but it gives limited control over the data. Take into account the language set in the menu.
Numbers
By default the import model uses the JSON format of its value. As the component does not use languages, the main import format is an array of values.
[104,-75.35]
As the Dédalo import uses csv without format, the JSON data needs to be stringified in this way:
The table to import
section_id | numisdata133 |
---|---|
1 | [104,-75.35] |
Will be encoded in csv format as:
section_id;numisdata133
1;[104,-75.35]
Alternative formats to import numbers
- Plain number
33.85
Example:
section_id | numisdata133 |
---|---|
1 | 33.85 |
In this case the import process takes this value as the full data; if previous data exists, it is replaced with a new array containing the imported value.
If the data in the database is:
{ "lg-nolan" : [104,-75.35] }
after importing the plain number, the final data will be:
{ "lg-nolan" : [33.85] }
A plain number is easy to import, but it gives limited control over the data.
Dates
By default the import model uses the JSON format of its value. As the component does not use languages, the main import format is an array of dd_date objects.
[{
"start" : {
"year": 1238,
"month": 10,
"day": 9
}
}]
As the Dédalo import uses csv without format, the JSON data needs to be stringified in this way:
The table to import
section_id | tch56 |
---|---|
1 | [{"start":{"year":1238,"month":10,"day":9}}] |
Will be encoded in csv format as:
section_id;tch56
1;"[{""start"":{""year"":1238,""month"":10,""day"":9}}]"
Alternative formats to import dates
- A single date as a flat string:
-205/05
Example:
section_id | tch56 |
---|---|
1 | -205/05 |
It is allowed to use a different format by indicating it in the name of the header, as in tch56_dmy.
section_id | tch56_dmy |
---|---|
1 | 05/-205 |
It is allowed to use different separators between the elements of the date.
-205-05
15-11--50
15.11.-50
- A range of dates as a flat string:
2023/10/26<>2023/10/27
The '<>' separator indicates the range, with the start date on the left and the end date on the right.
[{ "start" : { "year": 2023, "month": 10, "day": 26 }, "end" : { "year": 2023, "month": 10, "day": 27 } }]
It is possible to leave spaces between the dates and the separator.
-150 <> 238
This is a valid date range, but the separator itself must always keep the same form; a space between its marks is not allowed:
-150< >238
is not a valid range.
It is allowed to use a different format by indicating it in the name of the header, as in tch56_mdy.
section_id | tch56_mdy |
---|---|
1 | 10/26/2023<>10/27/2023 |
It is allowed to use different separators between the elements of the date.
10-26-2023<>10-27-2023
10.26.2023<>10.27.2023
- Multiple date values as a flat string:
2023/10/26|1853/02/18
The '|' separator indicates multiple values. The values are not start <> end dates; both are start dates, and the second one is the start date of the second value.
The previous date string will be parsed as:
[ { "start" : { "year": 2023, "month": 10, "day": 26 } }, { "start" : { "year": 1853, "month": 2, "day": 18 } } ]
It is possible to leave spaces between the dates and the separator.
-150 | -25
It is allowed to use a different format by indicating it in the name of the header, as in tch56_mdy.
section_id | tch56_mdy |
---|---|
1 | 10/26/2023\|02/18/1853 |
It is allowed to use different separators between the elements of the date.
10-26-2023|02-18-1853
10.26.2023|02.18.1853
- Combination of multiple values and ranges
2023/10/26<>2023/10/27|1853/02/18
To define multiple values with ranges it is possible to combine the '|' separator, which indicates multiple values, and the '<>' separator, which indicates a range.
The previous date string will be parsed as two date values, the first one being a range with start and end dates:
[ { "start" : { "year": 2023, "month": 10, "day": 26 }, "end" : { "year": 2023, "month": 10, "day": 27 } }, { "start" : { "year": 1853, "month": 2, "day": 18 } } ]
It is possible to leave a part of the range blank:
2023/10/26|<>1853/02/18
[ { "start" : { "year": 2023, "month": 10, "day": 26 } }, { "end" : { "year": 1853, "month": 2, "day": 18 } } ]
It is possible to leave spaces between the dates and the separators.
2023/10/26 | <> 1853/02/18
It is allowed to use a different format by indicating it in the name of the header, as in tch56_mdy.
section_id | tch56_mdy |
---|---|
1 | 10/26/2023\|<>02/18/1853 |
It is allowed to use different separators between the elements of the date.
10-26-2023|<>02-18-1853
10.26.2023|<>02.18.1853
Using other date formats
By default the string date format is [-]y/m/d, but it is possible to import dates in other formats by indicating the format in the column header as a second parameter after the tipo, using the _ character between them.
section_id | tch56_dmy |
---|---|
1 | 05/-205 |
The following formats can be used:
Format | Description |
---|---|
ymd | year/month/day as 2023/10/26 |
mdy | month/day/year as 10/26/2023 |
dmy | day/month/year as 26/10/2023 |
Using other separators
The default separator between day, month and year is / but it is possible to use - and .
20-10-1945
2023-10-26|<>1853-02-18
-200<>50-11|-150-10
11-12--200|28-10-5
20.10.1945
2023.10.26|<>1853.02.18
-200<>50.11|-150.10
11.12.-200|28.10.5
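The flat date rules can be summarized with a Python sketch that turns a cell into the array of dd_date objects shown above. It is an illustration under stated assumptions, not the parser used by Dédalo: the names `split_header` and `parse_dates` are hypothetical, and a date with fewer than three elements is assumed to drop the day first and then the month (so 05/-205 in dmy is read as month and year).

```python
import re

NUM = r"-?\d+"
# Up to three signed integers separated by "/", "-" or ".".
DATE_RE = re.compile(rf"^({NUM})(?:[/.\-]({NUM}))?(?:[/.\-]({NUM}))?$")

# Assumption: partial dates keep the year, then the month, then the day.
ORDERS = {
    "ymd": {1: ["year"], 2: ["year", "month"], 3: ["year", "month", "day"]},
    "dmy": {1: ["year"], 2: ["month", "year"], 3: ["day", "month", "year"]},
    "mdy": {1: ["year"], 2: ["month", "year"], 3: ["month", "day", "year"]},
}


def split_header(header):
    """Split 'tch56_dmy' into ('tch56', 'dmy'); the format defaults to ymd."""
    tipo, _, fmt = header.partition("_")
    return tipo, fmt or "ymd"


def parse_point(text, fmt="ymd"):
    """Parse one date such as -205/05 into {'year': -205, 'month': 5}."""
    match = DATE_RE.match(text.strip())
    if not match:
        raise ValueError(f"invalid date: {text!r}")
    parts = [int(p) for p in match.groups() if p is not None]
    return dict(zip(ORDERS[fmt][len(parts)], parts))


def parse_dates(cell, fmt="ymd"):
    """Parse a cell with '|' (multiple values) and '<>' (range) separators."""
    values = []
    for chunk in cell.split("|"):
        if "<>" in chunk:
            start, end = chunk.split("<>", 1)
            value = {}
            if start.strip():
                value["start"] = parse_point(start, fmt)
            if end.strip():
                value["end"] = parse_point(end, fmt)
        else:
            value = {"start": parse_point(chunk, fmt)}
        values.append(value)
    return values
```

For a column named tch56_dmy, `parse_dates("05/-205", split_header("tch56_dmy")[1])` reads the cell with the dmy order, and a malformed separator such as `-150< >238` raises `ValueError` instead of producing a partial date.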
Related data
Understanding relationships between data
Dédalo uses a data relation model based on locators: sections are connected to each other with locators. Any related data is connected by locators: a list shown in a select is connected by locators, an image inside a section is connected by locators. Dédalo uses locators everywhere.
Locators are extensible connections between data and can point to a full section, a component inside a section or a part of a component inside a section. Besides, locators can create links to external data.
When you want to import data with relations, you will use locators.
A basic locator has only two properties:
- section_tipo
- section_id
section_tipo is the ontology tipo of the target section, section_id is the unique id of the target section.
The locator also has a type, which defines the relation type, and from_component_tipo, which defines the origin component (the field that points to the target section, the portal).
Data linked as:
erDiagram
Types-numisdata3 ||--o{ Mints-numisdata6 : has
Types-numisdata3 {
int section_id PK "1"
locator Mint-numisdata30 "5"
}
Mints-numisdata6 {
int section_id PK "5"
string Name-numisdata16 "Arse Saguntum"
}
It says that Type 1 has a link to Mint 5. The field Mint numisdata30 in the section Types numisdata3 has a link to id 5 of the section Mints numisdata6.
In Dédalo format it will be:
{
"section_id": 1,
"section_tipo": "numisdata3",
"data":
{
"relations":
[
{
"section_id":"5",
"section_tipo":"numisdata6",
"from_component_tipo": "numisdata30"
}
]
}
}
And it could be represented in csv spreadsheet columns in this way:
types-numisdata3.csv
section_id | numisdata30 |
---|---|
1 | [{"section_id":"5","section_tipo":"numisdata6","from_component_tipo": "numisdata30"}] |
Importing
By default the import model uses the JSON format of its data, an array of locators.
[{"type":"dd151","section_id":"2","section_tipo":"rsc723","from_component_tipo":"tch191"}]
As the Dédalo import uses csv without format, the JSON data needs to be stringified in this way:
The table to import
section_id | tch191 |
---|---|
1 | [{"type":"dd151","section_id":"2","section_tipo":"rsc723","from_component_tipo":"tch191"}] |
Will need to be encoded in csv format as:
section_id;tch191
1;"[{""type"":""dd151"", ""section_id"":""2"", ""section_tipo"":""rsc723"", ""from_component_tipo"":""tch191""}]"
It is possible to remove the type and from_component_tipo properties because the header of the column specifies the value of from_component_tipo and the component knows its own type. So, the previous locator can be defined for import in this way:
[{"section_id":"2","section_tipo":"rsc723"}]
Alternative formats to import related data
- A comma-separated list of ints (as destination section_id):
1,4,6
To import this data it is necessary to specify the section_tipo in the column header of the component, using the _ character between them:
component_tipo + '_' + section_tipo
Example:
tch191_rsc723
In this case the import process assumes that all int values are section_id; the section_tipo comes from the second tipo in the column header name, from_component_tipo comes from the first tipo in the column header name, and type is calculated by asking the component on the server.
Example:
section_id | tch191_rsc723 |
---|---|
1 | 1,4,6 |
will be parsed as:
[ {"type":"dd151","section_id":"1","section_tipo":"rsc723","from_component_tipo":"tch191"}, {"type":"dd151","section_id":"4","section_tipo":"rsc723","from_component_tipo":"tch191"}, {"type":"dd151","section_id":"6","section_tipo":"rsc723","from_component_tipo":"tch191"} ]
When the component points to multiple sections, this way of importing will not preserve the values of other sections in its data. Previous data pointing to sections other than the one indicated in the header will be removed.
- Importing unique values
It is possible to import a single int as section_id.
Example:
section_id | tch191_rsc723 |
---|---|
1 | 6 |
- Removing the section_tipo reference in the header
It is possible to remove the section_tipo from the column header when the component points to only 1 section.
Example:
section_id | tch191 |
---|---|
1 | 1,4,6 |
Components used with multiple sections
This possibility is only available when the component points to 1 section. Multiple sections cannot be imported in this way.
In this case the import process asks the component on the server for the section_tipo to be used; if the component has multiple sections the import will fail, to avoid errors and inconsistencies.
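The expansion of a comma-separated cell into locators can be sketched in Python as follows. This is only a model of the rules above: `expand_section_ids` is a hypothetical name, and the relation type and the default section_tipo, which Dédalo obtains from the component on the server, are passed here as plain parameters.

```python
def expand_section_ids(header, cell, relation_type, default_section_tipo=None):
    """Turn '1,4,6' under a header such as tch191_rsc723 into locators."""
    component_tipo, _, section_tipo = header.partition("_")
    # Without a second tipo in the header, fall back to the single target section.
    section_tipo = section_tipo or default_section_tipo
    if not section_tipo:
        raise ValueError("the target section_tipo is unknown")
    return [
        {
            "type": relation_type,
            "section_id": part.strip(),
            "section_tipo": section_tipo,
            "from_component_tipo": component_tipo,
        }
        for part in cell.split(",")
        if part.strip()
    ]


locators = expand_section_ids("tch191_rsc723", "1,4,6", "dd151")
```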
URI
By default the import model uses the JSON format of its value. If the component does not use languages, the main import format is an array of dd_iri objects.
[{
"iri" : "https://dedalo.dev",
"title": "Dédalo website"
}]
As the Dédalo import uses csv without format, the JSON data needs to be stringified in this way:
The table to import
section_id | tch442 |
---|---|
1 | [{"iri":"https://dedalo.dev","title":"Dédalo website"}] |
Will be encoded in csv format as:
section_id;tch442
1;"[{""iri"":""https://dedalo.dev"",""title"":""Dédalo website""}]"
Multiple values
To import multiple values into the same component/field, add a new object to the array in this way:
[
{
"iri" : "https://dedalo.dev",
"title": "Dédalo website"
},
{
"iri" : "http://monedaiberica.org",
"title": "MIB website"
}
]
Languages
The import will mark this data as lg-nolan because the component assumes by default that the URI has no language. But sometimes you will manage multilingual URIs, such as Wikipedia articles; in those cases it is possible to identify the language of the URI in this way:
{
"lg-spa": [{
"iri" : "https://es.wikipedia.org/wiki/Escrituras_paleohispánicas"
}],
"lg-deu": [{
"iri" : "https://de.wikipedia.org/wiki/Althispanische_Schriften"
}]
}
The table to import
section_id | tch442 |
---|---|
1 | {"lg-spa":[{"iri":"https://es.wikipedia.org/wiki/Escrituras_paleohispánicas"}],"lg-deu":[{"iri":"https://de.wikipedia.org/wiki/Althispanische_Schriften"}]} |
Alternative formats to import URIs
- Flat string:
https://dedalo.dev
section_id | tch442 |
---|---|
1 | https://dedalo.dev |
it will be parsed as:
[{ "iri":"https://dedalo.dev" }]
- Array of strings:
Used to import multiple URIs into the component / field.
["https://dedalo.dev","https://dedalo.dev/docs"]
section_id | tch442 |
---|---|
1 | ["https://dedalo.dev","https://dedalo.dev/docs"] |
will be parsed as:
[ {"iri":"https://dedalo.dev"}, {"iri":"https://dedalo.dev/docs"} ]
- String with title
To import the title of the URI, use the , separator in this way:
Dédalo website, https://dedalo.dev
section_id | tch442 |
---|---|
1 | Dédalo website, https://dedalo.dev |
it will be parsed as:
[{ "iri": "https://dedalo.dev", "title": "Dédalo website" }]
Separator format
Dédalo interprets the , separator to split the data into two parts: the text to the left of the comma is the title and the text to the right is the URI. When using this separator it is important to add a space between the comma and the URI, because the title or the URI can themselves contain a comma and the space is what makes the separator identifiable. To minimize errors, make sure that a space follows the comma in the separator.
- String of multiple values
To import multiple values of the URI, use the | separator in this way:
https://dedalo.dev | https://dedalo.dev/docs
section_id | tch442 |
---|---|
1 | https://dedalo.dev \| https://dedalo.dev/docs |
it will be parsed as:
[ {"iri":"https://dedalo.dev"}, {"iri":"https://dedalo.dev/docs"} ]
Separator format
Dédalo interprets the | separator to differentiate two or more values. Some URIs may use this character inside their variables. To minimize errors, make sure that there is a space before and after the separator character.
- String of multiple values with title
It is possible to combine the title separator and the values separator in the same string in this way:
Dédalo website, https://dedalo.dev | https://dedalo.dev/docs
section_id | tch442 |
---|---|
1 | Dédalo website, https://dedalo.dev \| https://dedalo.dev/docs |
it will be parsed as:
[ { "iri":"https://dedalo.dev", "title": "Dédalo website" }, { "iri":"https://dedalo.dev/docs" } ]
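The two separators can be combined in a short Python sketch that reproduces the results shown above. It is not the Dédalo parser: `parse_uri_cell` is a hypothetical name, and the sketch assumes that values are split on " | " (with spaces) and that the last ", " (comma and space) in a value separates the title from the URI.

```python
def parse_uri_cell(cell):
    """Parse 'Title, https://a | https://b' into a list of dd_iri-like objects."""
    values = []
    for chunk in cell.split(" | "):
        # Everything before the last ", " is the title, the rest is the URI.
        title, separator, iri = chunk.strip().rpartition(", ")
        value = {"iri": iri.strip()}
        if separator and title.strip():
            value["title"] = title.strip()
        values.append(value)
    return values


parsed = parse_uri_cell("Dédalo website, https://dedalo.dev | https://dedalo.dev/docs")
```

A URI that itself contains a comma followed by a space would be split in the wrong place by this sketch, which is one more reason to prefer the JSON format for unusual values.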