Automatically generating effective search queries directly from community question-answering questions for finding related questions

Research output: Contribution to publication › Article

  • 1 Citation

Abstract

Community Question-Answering (cQA) platforms are massive knowledge bases of question-answer pairs produced by their members. In order to provide a vibrant service, they are compelled to answer newly posted questions as soon as possible. However, since their dynamics require their own users to answer questions, there is an inherent delay between posting time and the arrival of good answers. In fact, many of these new questions might have already been asked and satisfactorily answered in the past. Ergo, one of the pressing needs of these services is capitalizing on good answers given to related resolved questions across their large-scale knowledge base. To that end, current approaches have studied the effectiveness of human-generated web queries found in search logs for fetching related questions and potential good answers from these community archives. However, this kind of strategy is not suitable for questions without click-through data, in particular recently posted ones, which limits its capability of providing them with real-time answers. In this paper, we propose an approach to finding related questions across the cQA knowledge base that automatically generates effective search strings directly from question titles and bodies. To do so, we first automatically construct a massive corpus of related questions on top of the relationships yielded by their click-through graph, and afterwards generate candidate queries by inspecting dependency paths across the title and body of each question. Then, we use this corpus to automatically annotate the retrieval power of each of these candidates. With this labelled corpus, we study the effectiveness of several learning-to-rank models enriched with assorted linguistically motivated properties, thereby deducing the linguistic structure of automatically generated search strings that are effective in finding related questions.
Since these models are inferred solely from each question itself, they can be used when search log data (i.e., web queries) is unavailable. Overall, our experiments underline the effectiveness of our approach; in particular, our outcomes indicate that named entity recognition is instrumental in structuring and recognizing effective queries of 2–5 terms. Furthermore, we carry out experiments both considering and ignoring question bodies, and we show that relying only on question titles is more promising, but most effective queries are harder to detect. Conversely, adding question bodies makes the retrieval of past related questions noisier, but their content helps to generalize models capable of identifying more effective candidates.
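The annotation step described in the abstract labels each automatically generated candidate query with its retrieval power against a corpus of related questions. A minimal Python sketch can illustrate that idea, but it rests on several simplifying assumptions and is not the paper's method. Contiguous 2–5-term n-grams of the title stand in for candidates derived from dependency paths, and a toy term-overlap ranker stands in for the search engine. "Retrieval power" is taken to be recall@k over a given set of related questions, which stands in for the click-through-derived corpus. The function names (`candidate_queries`, `retrieve`, `retrieval_power`, `annotate`) and the example data are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def candidate_queries(title, min_len=2, max_len=5):
    """Hypothetical stand-in for dependency-path candidates:
    all distinct contiguous n-grams of 2-5 title tokens, sorted."""
    tokens = tokenize(title)
    cands = set()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            cands.add(" ".join(tokens[i:i + n]))
    return sorted(cands)


def retrieve(query, archive, k):
    """Toy ranker: order archived questions by how many distinct query
    terms they share (ties broken by question id) and return the top k."""
    q_terms = set(tokenize(query))
    scored = []
    for qid, text in archive.items():
        overlap = len(q_terms & set(tokenize(text)))
        if overlap > 0:
            scored.append((-overlap, qid))
    scored.sort()
    return [qid for _, qid in scored[:k]]


def retrieval_power(query, archive, related, k=2):
    """Assumed label: fraction of known-related questions found in the top k."""
    hits = set(retrieve(query, archive, k))
    return len(hits & related) / len(related)


def annotate(title, archive, related, k=2):
    """Label every candidate query of a title with its retrieval power."""
    return {c: retrieval_power(c, archive, related, k)
            for c in candidate_queries(title)}


if __name__ == "__main__":
    archive = {
        "q1": "How do I reset my iPhone password",
        "q2": "Forgot iPhone passcode, how to reset",
        "q3": "Best pizza in New York",
        "q4": "How to bake pizza dough",
    }
    # Hypothetical click-through neighbours of the newly posted question.
    related = {"q1", "q2"}
    labels = annotate("How to reset an iPhone password", archive, related)
    for query, power in sorted(labels.items(), key=lambda kv: (-kv[1], kv[0])):
        print(f"{power:.2f}  {query}")
```

In this toy run only the generic candidate "how to" is labelled weaker (0.5), because it also pulls in an unrelated question. In the study, labels of this kind feed learning-to-rank models over linguistically motivated properties. That training step is not sketched here.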

Original language: English
Pages (from - to): 11-19
Number of pages: 9
Journal: Expert Systems with Applications
Volume: 77
DOIs: 10.1016/j.eswa.2017.01.041
State: Published - 1 Jul 2017

Fingerprint

Experiments
Linguistics

Keywords

    ASJC Scopus subject areas

    • Engineering(all)
    • Computer Science Applications
    • Artificial Intelligence

    Cite this

    @article{2d3a50229d1d46ac951c548acdc4e0dd,
    title = "Automatically generating effective search queries directly from community question-answering questions for finding related questions",
    keywords = "Community question answering, Expert systems, Knowledge bases, Knowledge processing, Natural language processing, Question analysis, Real-time intelligent automation",
    author = "Alejandro Figueroa",
    year = "2017",
    month = "7",
    doi = "10.1016/j.eswa.2017.01.041",
    volume = "77",
    pages = "11--19",
    journal = "Expert Systems with Applications",
    issn = "0957-4174",
    publisher = "Elsevier Limited",

    }

    TY - JOUR

    T1 - Automatically generating effective search queries directly from community question-answering questions for finding related questions

    AU - Figueroa, Alejandro

    PY - 2017/7/1

    Y1 - 2017/7/1

    KW - Community question answering

    KW - Expert systems

    KW - Knowledge bases

    KW - Knowledge processing

    KW - Natural language processing

    KW - Question analysis

    KW - Real-time intelligent automation

    UR - http://www.scopus.com/inward/record.url?scp=85011843809&partnerID=8YFLogxK

    U2 - 10.1016/j.eswa.2017.01.041

    DO - 10.1016/j.eswa.2017.01.041

    M3 - Article

    VL - 77

    SP - 11

    EP - 19

    JO - Expert Systems with Applications

    T2 - Expert Systems with Applications

    JF - Expert Systems with Applications

    SN - 0957-4174

    ER -