.. _restservice: REST Service ============ .. Esto es una definición de primer nivel y tenemos que definir un buen diseño. .. Partes de este servicio pueden quedar fuera del prácticum para continuar a .. partir del TFG. Primero centrarnos en la parte servidor de .. predicciones (para poder hacer pruebas). Por orden de prioridad. Server: Debería ofrecer los métodos para buscar entidades similares tanto por id, por uri, como por vector de embedding. Dataset: Creación de datasets desde un método a partir de un SPARQL endpoint y una query semilla o un path a un fichero Ntriples. El servicio debería crear un id único para el dataset para poder pasárselo al algoritmo de training. Algorithm: Encontrar el mejor modelo dado un dataset y rangos de parámetros. /algorithm/1 Crear con petición asíncrona. POST /algorithm?dataset={id}¶m1= ¶m2= etc... El servicio REST está compuesto principalmente de un recurso dataset con distintas operaciones Endpoints --------- Aquí se detallarán todos los endpoints del servicio. El valor de la prioridad que se muestra indica la importancia que se le va a dar a la implementación de ese servicio. Cuanto menor sea, más importancia se le dará. Datasets management ``````````````````` The `/dataset` collection contains several methods to create, add triples to the dataset, train and generate search indexes. It also contains these main params .. sourcecode:: json {"entities", "relations", "triples", "status", "algorithm"} The ``algorithm`` parameter contains all the information the dataset are trained with. See `/algorithm` collection to get more information about this. Dataset will be changing its status when actions such training or indexing are performed. The *status* can only grow up. When a changing status is taking place, the dataset cannot be edited. In this situations, the status will be a negative integer. **status**: ``untrained`` -> ``trained`` -> ``indexed`` .. http:get:: /datasets/(int:dataset_id)/ Get all the information about a dataset, given a ``dataset_id`` **Sample request and response** :http:get:`/datasets/1/` .. sourcecode:: json { "dataset": { "relations": 655, "triples": 3307248, "algorithm": { "id": 2, "embedding_size": 100, "max_epochs": null, "margin": 2 }, "entities": 651759, "status": 2, "name": null, "id": 4 } } :param int dataset_id: Unique *dataset_id* :statuscode 200: Returns all information about a dataset. :statuscode 404: The dataset can't be found. .. http:post:: /datasets/(int:dataset_id)/train?algorithm=(int:id_algorithm) Train a dataset with a given algorithm id. The training process can be quite large, so this REST method uses a asynchronous model to perform each request. The response of this method will only be a ``202 ACCEPTED`` status code, with the ``Location:`` header filled with the task path element. See ``/tasks`` collection to get more information about how tasks are managed on the service. The dataset must be in a 'untrained' (0) state to get this operation done. Also, no operation such as ``add_triples`` must be being processed. Otherwise, a 409 CONFLICT status code will be obtained. :param int dataset_id: Unique *dataset_id* :query int id_algorithm: The wanted algorithm to train the dataset :statuscode 202: The requests has been accepted to the system and a task has been created. See Location header to get more information. :statuscode 404: The dataset or the algorithm can't be found. :statuscode 409: The dataset cannot be trained due to its status. .. http:get:: /datasets/ Gets all datasets available on the system. :statuscode 200: All the datasets are shown correctly .. TODO: This method is not implemented .. http:get:: /dataset_types Obtener todos los tipos de dataset disponibles en el sistema. No pueden ser modificados. :statuscode 200: Se muestran adecuadamente los tipos .. http:post:: /datasets?dataset_type=(int:dataset_type) Creates a new and empty dataset. To fill in you must use other requests. You also must provide ``dataset_type`` query param. This method will create a WikidataDataset (id: 1) by default, but you also can create different datasets providing a different dataset_type. Inside the body of the request you can provide a name and/or a description for the dataset. The name must be unique. For example: **Sample request** :http:post:`/datasets` .. sourcecode:: json {"name": "films", "description": "A dataset with favourite films"} **Sample response** The ``location:`` header of the response will contain the relative URI for the created dataset. Additionally, the body of the response will contain a dataset object with only id argument filled in: ``location: /datasets/32`` .. sourcecode:: json { "dataset": { "id": 32 } } :query int dataset_type: The dataset type to be created. 0 is for a simple Dataset and 1 is for WikidataDataset (default). :statuscode 201: A new dataset has been created successfuly. See ``Location:`` header to get the id and the new resource path. :statuscode 409: The given name already exists on the server. .. http:put:: /datasets/(int:dataset_id) Edits the description from a existing dataset. **Sample request** :http:put:`/datasets` .. sourcecode:: json {"description": "A dataset with most awarded films"} :param int dataset_id: Unique *dataset_id* :statuscode 200: The dataset has been updated successfully. The updated dataset will be returned in the response. :statuscode 404: The provided *dataset_id* does not exist. .. http:post:: /datasets/(int:dataset_id)/triples Adds a triple or a list of triples to the dataset. You must provide a JSON object on the request body, as shown below on the example. The name of the JSON object must be *triples* and must contain a list of all entities to be introduced inside the dataset. These entities must contain ``{"subject", "predicate", "object"}`` params. This notation is similar to other known as *head*, *label* and *tail*. Only triples can be added on a ``untrained`` (0) dataset. **Ejemplo** :http:post:`/datasets/6/triples` .. sourcecode:: json {"triples": [ { "subject": {"value": "Q1492"}, "predicate": {"value": "P17"}, "object": {"value": "Q29"} }, { "subject": {"value": "Q2807"}, "predicate": {"value": "P17"}, "object": {"value": "Q29"} } ] } :param int dataset_id: Unique *dataset_id* :statuscode 200: The request has been successful :statuscode 404: The dataset or the algorithm can't be found. :statuscode 409: The dataset cannot be trained due to its status. .. http:post:: /datasets/(int:dataset_id)/generate_triples Adds triples to dataset doing a sequence of SPARQL queries by levels, starting with a seed vector. This operation is supported only by certain types of datasets (the default one, type=1) The request will use asyncronous operations. This means that the request will not be satisfied on the same HTTP connection. Instead, the service will return a `/task` resource that will be queried with the progress of the task. The ``graph_pattern`` argument must be the where part of a SPARQL query. It **must** contain three variables named as ``?subject``, ``?predicate`` and ``?object``. The service will try to make a query with these names. You also must provide the ``levels`` to make a deep lookup of the entities retrieved from previous queries. The optional param ``batch_size`` is used on the first lookup for SPARQL query. For big queries you must tweak this parameter to avoid server errors as well as to increase performance. It is the LIMIT statement when doing this queries. **Sample request** .. sourcecode:: json { "generate_triples": { "graph_pattern": "SPARQL Query", "levels": 2, "batch_size": 30000 } } **Sample response** The ``location:`` header of the response will contain the relative URI for the created task resouce. Additionally, it is possible to get the task id from the response in json format. ``location: /tasks/32`` .. sourcecode:: json { "message": "Task 32 created successfuly", "status": 202, "task": { "id": 32 } } :param int dataset_id: Unique identifier of dataset :statuscode 404: The provided *dataset_id* does not exist. :statuscode 409: The *dataset_id* does not allow this operation :statuscode 202: A new task has been created. See /tasks resource to get more information. .. http:post:: /datasets/(int:dataset_id)/embeddings Retrieve from the trained dataset the embeddings from a list of entities. If on the request list the user requests for a entity that does not exist, the response won't contain that element. The 404 error is limited to the dataset, not the entities inside the dataset. The dataset must be in trained status (>= 1), because a model must exist to extract triples from. If not, a 409 CONFLICT will be returned. This could be useful if it is used with /similar_entities endpoint, to find similar entities given a different embedding vector. **Sample request** :http:post:`/datasets/6/embeddings` .. sourcecode:: json {"entities": [ "http://www.wikidata.org/entity/Q1492", "http://www.wikidata.org/entity/Q2807", "http://www.wikidata.org/entity/Q1" ] } **Sample response** .. sourcecode:: json { "embeddings": [ [ "Q1", [0.321, -0.178, 0.195, 0.816] ], [ "Q2807", [-0.192, 0.172, -0.124, 0.138] ], [ "Q1492", [0.238, -0.941, 0.116, -0.518] ] ] } *Note: The upper vectors are only shown as illustrative, they are not real values* :param int dataset_id: Unique id of the dataset :statuscode 200: Operation was successful :statuscode 404: The dataset ID does not exist :statuscode 409: The dataset is not on a correct status .. http:post:: /datasets/(int:dataset_id)/generate_autocomplete_index Creates a task to build an autocomplete index The task will perform a request to SPARQL endpoint for each entity. This will extract the labels, description and altLabels and store it on an Elasticsearch database. It is also possible give the languages desired to build the autocomplete index, allowing not only having english language, but others available on the endpoint. You must specify in the body a param named `langs` with a list with all language codes in ISO 639-1 format. **Sample request** :http:post:`/datasets/6/generate_autocomplete_index` .. sourcecode:: json { "langs" : [ "en", "es" ] } **Sample response** .. sourcecode:: json { "status": 202, "message": "Task 73 created successfuly" } :param int dataset_id: Unique id of the dataset :statuscode 202: The request has been accepted in the system and a task has been created. See Location header to get more information. :statuscode 404: The dataset can't be found. :statuscode 409: The dataset cannot be trained due to it's status. Algorithms `````````` The algorithm collection is used mainly to create and see the different algorithms created on the server. The hyperparameters that are allowed currently to tweak are: - `embedding_size`: The size of the embeddigs the trainer will use - ``margin``: The margin used on the trainer - ``max_epochs``: The maximum number of iterations of the algorithm .. http:get:: /algorithms/ Gets a list with all the algorithms created on the service. .. http:get:: /algorithms/(int:algorithm_id) Gets only one algorithm :param int algorithm_id: The algorithm unique identifier .. http:post:: /algorithms/ Create one algorithm on the service. On success, this method will return a 201 CREATED status code and the header parameter `Location:` filled with the relative path to the created resource. The body of the request must contain all parameters for the new algorithm. See the example below: **Sample request** :http:post:`/algorithms` .. sourcecode:: json { "algorithm": { "embedding_size": 50, "margin": 2, "max_epochs": 80 } } **Sample successfull response** The response when creating a new algorithm gives the location header filled with the URI of the new resource. It also returns the HTTP 202 status code, and the body has information about the request in json format. ``location: /algorithm/2`` .. sourcecode:: json { "status": 202, "algorithm": { "id": 2 }, "message": "Algorithm 2 created successfuly" } :statuscode 201: The request has been processed successfuly and a new resource has been created. See ``Location:`` header to get the new path. Tasks ````` The task collection stores all the information that async request need. This collection are made mainly to get the actual state of tasks, but no to edit or delete tasks (Not implemented). .. http:get:: /tasks/(int:task_id)?get_debug_info=(boolean:get_debug_info)&?no_redirect=(boolean:no_redirect) Shows the progress of a task with a ``task_id``. The finished tasks can be deleted from the system without previous advise. Some tasks can inform to the user about its progress. It is done through the progress param, which has *current* and *total* relative arguments, and *current_steps* and *total_steps* absolute arguments. When a task involves some steps and the number of small tasks to be done in next step cannot be discovered, the current and total will only indicate progress in current step, and will not include previous step, expected to be already done, or next step which is expected to be empty. The resource has two optional parameters: ``get_debug_info`` and ``no_redirect``. The first one, ``get_debug_info`` set to true on the query params will return additional information from the task. The other param, ``no_redirect`` will avoid send a 303 status to the client to redirect to the created resource. Instead it will send a simple 200 status code, but with the location header filled. :param int task_id: Unique *task_id* from the task. :statuscode 200: Shows the status from the current task. :statuscode 303: The task has finished. See Location header to find the resource it has created/modified. With ``no_redirect`` query param set to true, the location header will be filled, but a 200 code will be returned instead. :statuscode 404: The given *task_id* does not exist. .. NOT IMPLEMENTED STILL... .. http:delete:: /tasks/(int:task_id) Deletes a task from database. If it is possible to stop a task which is started but not finished, it will be stopped and deleted. If this is not possible, the task resource will be kept as is, and a 409 status code will be sent along a reason why the task cannot be stopped. If the task is deleted, the status will not be queried in a future, but any result produced by the task (such as adding triples to a dataset), will be kept on its own resource. :prioridad: 1 :todo: Not implemented :statuscode 204: The task has been deleted :statuscode 404: The task does not exists and cannot be deleted :statuscode 409: The current state of the task does not allow to delete it Triples prediction `````````````````` .. http:get:: /datasets/(int:dataset_id)/similar_entities/(string:entity)?limit=(int:limit)?search_k=(int:search_k) .. http:post:: /datasets/(int:dataset_id)/similar_entities?limit=(int:limit)?search_k=(int:search_k) Get the *limit* entities most similar to a *entity* inside a *dataset_id*. The given number in *limit* excludes the entity given itself. The POST method allows any representation of the wanted resource. See the example below. You can provide an entity as an URI or other similar representation, even an embedding. The type param inside entity JSON object must be "uri" for a URI or similar representation and "embedding" for an embedding. The ``search_k`` param is used to tweak the results of the search. When this value is greater, the precission of the results are also greater, but the time it takes to find the response is also bigger. **Sample request** :http:get:`/datasets/7/similar_entities?limit=1&search_k=10000` .. sourcecode:: json { "entity": {"value": "http://www.wikidata.org/entity/Q1492", "type": "uri"} } **Sample response** .. sourcecode:: json { "similar_entities": { "response": [ {"distance": 0, "entity": "http://www.wikidata.org/entity/Q1492"}, {"distance": 0.8224636912345886, "entity": "http://www.wikidata.org/entity/Q15090"} ], "entity": "http://www.wikidata.org/entity/Q1492", "limit": 2 }, "dataset": { "entities": 664444, "relations": 647, "id": 1, "status": 2, "triples": 3261785, "algorithm": 100 } } :param int dataset_id: Unique id of the dataset :query int limit: Limit of similar entities requested. By default this is set to 10. :query int search_k: Max number of trees where the lookup is performed. This increase the result quality, but reduces the performance of the request. By default is set to -1 :statuscode 200: The request has been performed successfully :statuscode 404: The dataset or the entity can't be found .. http:post:: /datasets/(int:dataset_id)/distance Returns the distance between two elements. The lower the number is, most probable to be both the same triple. The minimum distance is 0. **Request Example** :http:post:`/datasets/0/similar_entities` .. sourcecode:: json { "distance": [ "http://www.wikidata.org/entity/Q1492", "http://www.wikidata.org/entity/Q5682" ] } *HTTP Response* .. sourcecode:: json { "distance": 1.460597038269043 } :param int dataset_id: Unique id of the dataset :statuscode 200: The request has been performed successfully :statuscode 404: The dataset or the entity can't be found .. TODO: It is unknown the method on kgeserver library to get the wanted value .. http:get:: /datasets/(int:dataset_id)/embedding_probability/(string:embedding) Devuelve la probabilidad de que un vector de *embedding* sea verdadero dentro de un *dataset_id* dado. :prioridad: 0 :todo: 501 Not Implemented :param int dataset_id: Unique id of the dataset :param list embedding: Vector de *embedding* a obtener su probabilidad .. http:post:: /datasets/(int:dataset_id)/suggest_name Gives a list of autocomplete suggestions. For each entity, this will show labels on every language available, descriptions and altLabels. If any suggestion is available, this will return an empty list. **Request Example** :http:post:`/datasets/7/suggest_name` .. sourcecode:: json { "input": "human" } *HTTP Response* .. sourcecode:: json [ { "text": "humano", "entity": { "alt_label": { "es": [ "humano", "Homo sapiens sapiens", "persona", ], "en": [ "people", "person", "human being" ] }, "label": { "en": "human", "es": "ser humano", }, "entity": "Q5", "description": { "en": "common name of Homo sapiens (Q15978631), unique extant species of the genus Homo", "es": "especie animal perteneciente a la familia Hominidae, única superviviente del género Homo", } } } ] :param int dataset_id: Unique id of the dataset :statuscode 200: The request has been performed successfully :statuscode 404: The dataset or the entity can't be found