Italiano resources

Italian verbs, nouns, and adjectives ranked by frequency of use

Attachment download:

📎Word_Frequency_Lists_ITA-main.zip

https://pan.baidu.com/s/1y8J4SEf5Q1-e82dOHxNL5A?pwd=6mlh

Data source:

https://github.com/franfranz/Word_Frequency_Lists_ITA

Word_frequency_Lists_ITA

Handy frequency lists of Italian lexical words, calculated from the ItWac corpus (Baroni, M., Bernardini, S., Ferraresi, A., & Zanchetta, E., 2009).

Contents:

Lists

NOUNS

  • itwac_nouns_lemmas_notail_2_0_0.csv List of word forms tagged as NOUNS. The minimum token frequency in this list is 3. Contains: wordform, lemma, POS, frequency (raw), frequency per million words (fpmw), frequency (Zipf). Encoding: utf-8. Calculated using countlemma_v2.0.0.
  • itwac_nouns_lemmas_raw_2_0_0.zip List of word forms tagged as NOUNS. The minimum token frequency in this list is 1. Contains: wordform, lemma, POS, frequency (raw), frequency per million words (fpmw), frequency (Zipf). Encoding: utf-8. Calculated using countlemma_v2.0.0.
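The three frequency columns are related. A minimal sketch of the usual definitions follows. fpmw is the raw count scaled to one million tokens. The Zipf value is log10 of the frequency per billion words, which equals log10(fpmw) + 3. This is the standard Zipf-scale definition. The lists themselves may compute it differently, for example with smoothing. The corpus size `CORPUS_TOKENS` below is a hypothetical round number, not the real itWaC token count.

```python
import math

def fpmw(raw_count: int, corpus_tokens: int) -> float:
    """Frequency per million words (multiply first to keep integer math exact)."""
    return raw_count * 1_000_000 / corpus_tokens

def zipf(raw_count: int, corpus_tokens: int) -> float:
    """Zipf value: log10 of frequency per billion words, i.e. log10(fpmw) + 3."""
    return math.log10(fpmw(raw_count, corpus_tokens)) + 3

# Hypothetical corpus size for illustration only, not itWaC's actual size.
CORPUS_TOKENS = 2_000_000_000

for raw in (2_000, 20_000, 200_000):
    print(raw, fpmw(raw, CORPUS_TOKENS), round(zipf(raw, CORPUS_TOKENS), 2))
```

On this scale, a word occurring once per million tokens gets Zipf 3. Each tenfold increase in frequency adds one point.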

VERBS

  • itwac_verbs_lemmas_notail_2_1_0.csv List of word forms tagged as lexical VERBS (no auxiliary verbs). Contains: wordform, lemma, POS, modality, POS2 (ideally, functional verbs), frequency (raw), frequency per million words (fpmw), frequency (Zipf). Encoding: utf-8. Calculated using countlemma_verb_2_1_0.
  • itwac_verbs_list_of_lemmas_2_1_0.csv List of lemmas from most to least represented across lexical VERB wordforms. Encoding: utf-8. Calculated using countlemma_verb_2_1_0.
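The lemma list ranks lemmas "from most to least represented across lexical VERB wordforms." A minimal Python sketch of one plausible reading follows: sum the raw frequencies of all wordforms of each lemma and sort the totals in descending order. The function `rank_lemmas` and the toy rows are hypothetical. This is not the logic of countlemma_verb_2_1_0.

```python
from collections import Counter

def rank_lemmas(rows):
    """Aggregate (wordform, lemma, raw_frequency) rows into a lemma ranking.

    Each row is assumed to be a distinct wordform of its lemma, so the number
    of rows per lemma equals its number of attested wordforms.
    Returns (lemma, total_frequency, n_wordforms) tuples sorted by total
    frequency (descending), with ties broken alphabetically by lemma.
    """
    totals = Counter()
    n_forms = Counter()
    for _wordform, lemma, freq in rows:
        totals[lemma] += freq
        n_forms[lemma] += 1
    return sorted(
        ((lemma, totals[lemma], n_forms[lemma]) for lemma in totals),
        key=lambda item: (-item[1], item[0]),
    )

# Hypothetical toy rows, not real itWaC counts.
rows = [
    ("parlo", "parlare", 10),
    ("parla", "parlare", 30),
    ("vado", "andare", 25),
]
for lemma, total, forms in rank_lemmas(rows):
    print(lemma, total, forms)
# prints:
# parlare 40 2
# andare 25 1
```

If "most represented" instead means the number of distinct wordforms, sort on the third field.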

ADJECTIVES

  • itwac_adj_lemmas_notail_2_1_0.csv List of word forms tagged as ADJ. The minimum token frequency in this list is 3. Contains: wordform, lemma, POS, frequency (raw), frequency per million words (fpmw), frequency (Zipf). Encoding: utf-8. Calculated using countlemmaADJ.
  • itwac_adj_lemmas_raw_2_1_0.zip List of word forms tagged as ADJ. The minimum token frequency in this list is 1. Contains: wordform, lemma, POS, frequency (raw), frequency per million words (fpmw), frequency (Zipf). Encoding: utf-8. Calculated using countlemmaADJ.
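The "notail" and "raw" files differ only in their minimum token frequency: 3 versus 1. A "notail"-style subset can therefore be obtained from a raw list by dropping rows below the threshold. The Python sketch below assumes a comma-separated file with a header row. The column name `freq_raw` and the sample rows are hypothetical, so check the real header (and delimiter) before use.

```python
import csv
import io

def filter_min_frequency(csv_text, min_freq=3, freq_column="freq_raw"):
    """Keep only rows whose raw frequency is at least min_freq.

    freq_column is a hypothetical header name; adjust it to the real file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row[freq_column]) >= min_freq]

# Toy excerpt in an assumed layout, not copied from the real files.
sample = """wordform,lemma,pos,freq_raw
bello,bello,ADJ,120
belle,bello,ADJ,3
bellino,bellino,ADJ,1
"""

for row in filter_min_frequency(sample):
    print(row["wordform"], row["freq_raw"])
# prints:
# bello 120
# belle 3
```

For a real file, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `csv.DictReader` directly instead of wrapping a string.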

Code

  • countlemma_v2.0.0.R Code used to provide a frequency list of all the NOUN forms present in Itwac, tagged for POS and lemma. Compared with v_1, this version takes less time to process large files.
  • countlemma_verb_2_1_0.R Code used to provide a frequency list of all the VERB forms present in Itwac, tagged for POS, lemma, modality.
  • countlemma_adj.R Code used to provide a frequency list of all the ADJ forms present in Itwac, tagged for POS and lemma.
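The R scripts above count how often each tagged form occurs in the corpus. The following Python sketch illustrates that counting idea only; it is not a port of the R code. It assumes a vertical corpus layout with one token per line as `wordform<TAB>POS<TAB>lemma` and markup lines such as `<s>` in between. The tag values and the `pos_prefix` parameter are hypothetical.

```python
from collections import Counter

def count_forms(lines, pos_prefix="NOUN"):
    """Count (wordform, lemma, POS) triples whose POS tag starts with pos_prefix.

    Assumes one token per line as wordform<TAB>POS<TAB>lemma, with markup
    lines (e.g. <s>, </s>) interleaved; those lines are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip markup lines such as <s> or </s>
        wordform, pos, lemma = parts
        if pos.startswith(pos_prefix):
            counts[(wordform, lemma, pos)] += 1
    return counts

# Hypothetical toy input, not an excerpt from itWaC.
sample = [
    "<s>",
    "la\tDET:def\til",
    "casa\tNOUN\tcasa",
    "è\tVER:pres\tessere",
    "bella\tADJ\tbello",
    "</s>",
    "<s>",
    "le\tDET:def\til",
    "case\tNOUN\tcasa",
    "casa\tNOUN\tcasa",
    "</s>",
]

for (wordform, lemma, pos), freq in count_forms(sample).most_common():
    print(wordform, lemma, pos, freq)
# prints:
# casa casa NOUN 2
# case casa NOUN 1
```

A prefix match is used so that a family of tags (for example, verb tags that carry a modality suffix) could be selected with one call. Whether the real tagset works this way is an assumption.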
