What is the criteria for Profiler to detect dates in data sources? I have a plain CSV file and one of the columns has "date" inside it as show below. The profiler does not detect this as date at all. 0% is reported under date data-type.
When importing datasets, we need to consider the source of the data when discussing column data types:
Some formats and systems provide column type metadata. If the system provides the metadata, we use this to identify column types. Example: databases, Parquet files, Excel files
Some formats are basically text files. For these formats, When Paxata imports the data, we look at cell contents to identify the predominant type of data. Example: CSV, TSV, Positional format, XML, JSON
Some parsers have options that instruct Paxata to attempt to interpret column data types during import for textual datasets.
Looking a back at your example: You likely had the "Parse cells into numbers, dates, and booleans" option enabled on the CSV during import
This configuration leads to the following:
"store-id": cells contain alphanumeric data, therefore we specify the type as text
"date": There are many formats for dates, so identification by looking at cell contents is far too imprecise. Cells contains numbers and non-numeric "/", therefore we specify the type as text.
"totalsales": cells contain numbers and characters that are typically present in some numeric formats ('$', ','), therefore we specify the type as numeric.
If you disable the "Parse cells into numbers, dates, and booleans" option, each column type will be defined as text.
If you open the CSV in Excel, set the column type to Date and import the resulting Excel file, the "date" column type will be set from the Excel type.